xaminmo: (Josh 2014)
Anyone in the ADSM/TSM/Spectrum Protect land, could you take a moment to add a vote to this RFE?

It's a request for the client to support more than 1 producer thread per filesystem, and more than 4 producer threads per DSMC instance. The current code is dated and doesn't take high-performance disk subsystems into account.

https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=86101

xaminmo: Josh 2016 (Default)
It depends on whether the files are small. I normally have a small-file pool, which is the DIRMC, VMTLMC, and TOCDESTINATION. The offsite copy for this is kept reclaimed down to 1 tape, and I try to restore that primary pool first.

For large-file, TDP, VM, and image full backups, they can ...
http://omnitech.net/reference/2014/10/22/after-dr-of-a-tsm-server-do-you-need-to-restore-the-primary-storage-pool-from-the-copy-cool/
xaminmo: Josh 2016 (Default)
BACKUP STGPOOL for dedupe runs about 6x slower than direct tape to tape.
Why?

1) First, the database has a huge number of random reads for dedupe rehydration.
Tack on any Dedup Deletion activity (SHOW DEDUPDELETEINFO) and anything else that's competing for DB IOPS.
FIX: Put the database on SSD or RAM backed storage.
NOTE: SSD stats are usually lies. Sustained performance is 4,500-12,000 IOPS each, divided by 2 for RAID-1/10, or by 3.5 for RAID-5/6.
FIX: increase server memory and provide more for DB2 bufferpools.
NOTE: This might require manually changing bufferpools, limiting filesystem cache, etc.
FIX: Large amounts of cache for the database containers
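
A rough sketch of the arithmetic in the NOTE above. It assumes the per-drive figure is summed across the array and then divided by the RAID penalty, which is one reading of the note. The function name and the four-drive example are hypothetical; only the 4,500-12,000 range and the divisors 2 and 3.5 come from the text.

```python
# Divisors from the NOTE above: RAID-1/10 divides sustained IOPS by 2,
# RAID-5/6 divides by about 3.5.
RAID_DIVISOR = {"raid1": 2.0, "raid10": 2.0, "raid5": 3.5, "raid6": 3.5}


def sustained_iops(drives, per_drive_iops, raid_level):
    """Rough sustained IOPS for an SSD array (hypothetical helper)."""
    return drives * per_drive_iops / RAID_DIVISOR[raid_level]


if __name__ == "__main__":
    # Four drives is a made-up example size.
    for level in ("raid10", "raid5"):
        low = sustained_iops(4, 4500, level)
        high = sustained_iops(4, 12000, level)
        print(f"{level}: {low:.0f} to {high:.0f} IOPS")
```

Compare the result against the random-read load the database actually sees before sizing the array.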

2) Next, the file class, while sequential, still has a large number of random read IOPS.
TSM Server has no read ahead for this. It reads the chunks in order, rather than requesting a huge buffer full of chunks.
As such, streaming speed will be limited by DB latency, file-class latency, and actual read IO times.
FIX: Reduce the latency for your file class
FIX: Reduce the latency for your database
FIX: Don't do anything else during BACKUP STGPOOL.
FIX: Run your EXPIRE INVENTORY and IDENTIFY DUPLICATE after, not before.
FIX: Submit a Design Change Request (DCR) for larger chunk read cache to be used for BACKUP STGPOOL.
FIX: Submit a Design Change Request (DCR) for larger tape write buffer.

3) Last, tape buffer underruns can kill performance.
If the write buffer empties, then the tape will stop.
Before it begins again, the tape has to be repositioned backward.
For LTO drives, usually the minimum write speed is 50MB/sec.
Anything less, and you have latency and tape life consumed by "shoe shining".
FIX: Fix/improve issues 1 and 2 above.
FIX: Submit a design change request to allow TSM to interleave more threads onto the same tape at once.
FIX: Use tape drives with lower minimum speeds to prevent underruns
FIX: Don't use tape. Use virtual tape, another dedupe disk pool, or a replica target TSM server.
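
To make the underrun risk concrete, here is a small sketch that combines the two numbers from this post: the roughly 6x slowdown and the usual 50MB/sec LTO minimum. The helper names and the 160 MB/sec tape-to-tape figure are made up for illustration; measure your own rates.

```python
LTO_MIN_WRITE_MB_S = 50  # "usually" the LTO minimum write speed, per the post


def dedupe_feed_rate(tape_to_tape_mb_s, slowdown=6):
    """Estimated BACKUP STGPOOL rate out of a dedupe pool (about 6x slower)."""
    return tape_to_tape_mb_s / slowdown


def will_shoe_shine(feed_mb_s, drive_min_mb_s=LTO_MIN_WRITE_MB_S):
    """True when the feed rate cannot keep the drive streaming."""
    return feed_mb_s < drive_min_mb_s


if __name__ == "__main__":
    rate = dedupe_feed_rate(160)  # 160 MB/sec is a hypothetical figure
    print(f"{rate:.1f} MB/sec, shoe-shining: {will_shoe_shine(rate)}")
```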

4) Check TSM server instrumentation.
This will show you where your time is spent, and what to upgrade next.
INSTRUMENTATION BEGIN
BACKUP STGPOOL DEDUP COPYPOOL
wait several minutes
INSTRUMENTATION END FILE=/tsm/instrumentation.out


http://omnitech.net/reference/2014/08/12/tsm-dedup-backup-stgpool-performance/
xaminmo: Josh 2016 (Default)
NDMP backups into a TSM storage pool will not be deduplicated.
If you set ENABLENASDEDUPE YES, that only affects NetApp backups.
IBM doesn't make the NDMP code, so they don't support deduplication of anything but NetApp.
That means neither IBM's v7000 Unified backups, nor any other NDMP device, get deduplicated.

As such, go ahead and have your NDMP backups go to a DISK pool or direct to tape.
Sending to your dedupe pool will just clog things up.



http://omnitech.net/reference/2014/08/12/tsm-and-ndmp/
xaminmo: Josh 2016 (Default)
This is a defect in DB2 10.5 FP1
The defect does not exist in DB2 9.7 FP6
This problem affects TSM 7.1.0.0 customers with billions of extents (over 30TB deduplicated).
... may release late enough to include DB2 10.5 FP3a, ...

In TSM Server 7.1.0.0 on AIX (unknown whether it is limited to AIX),
when RUNSTATS parses BF_AGGREGATED_BITFILES,
and there are more than maxint unique values for BFID,
then COLCARD may become negative.

A negative column cardinality will compromise the index for queries against it,
which will lead to slowdowns and lock escalations within TSM.
This will present as a growing dedupdelete queue, slow expire, slow BACKUP STGPOOL, and slow client backups.

This is not exactly maxint related, as maxint - colcard was higher than the number of columns by about 20%.

You can check for this by logging in to your instance user, and running:

db2 connect to tsmdb1
db2 set schema tsmdb1
db2 'select TABNAME,COLNAME,COLCARD from SYSSTAT.COLUMNS where COLCARD<-1'


The output should say "0 record(s) selected."
If it lists negative values for any table, then that table's index will be compromised.
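
If you want to script this check, a minimal sketch follows. It assumes you have already captured the text output of the select above, and that each data row prints as three whitespace-separated fields (TABNAME, COLNAME, COLCARD). The function name and the sample output are hypothetical, not real output from an affected server.

```python
def negative_colcards(db2_output):
    """Return (tabname, colname, colcard) for rows with COLCARD below -1."""
    rows = []
    for line in db2_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            colcard = int(fields[2])
        except ValueError:
            continue  # header, separator, or the "record(s) selected." line
        if colcard < -1:
            rows.append((fields[0], fields[1], colcard))
    return rows


if __name__ == "__main__":
    # Made-up sample for illustration only.
    sample = """
TABNAME                  COLNAME   COLCARD
------------------------ --------- --------------------
BF_AGGREGATED_BITFILES   BFID                -994967296

  1 record(s) selected.
"""
    for tab, col, card in negative_colcards(sample):
        print(f"{tab}.{col} has negative COLCARD {card}")
```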

There is no fix for TSM Server 7.1, as no patches are available.
TSM 7.1.1 will release with DB2 10.5 FP3, which will not include a fix for this problem.
As of 2014-08-01, the problem has not been isolated yet.

The workaround is to update column cardinality to a reasonable value.
It doesn't need to be exact. An example command might be:

db2 connect to tsmdb1
db2 set schema tsmdb1
db2 "UPDATE SYSSTAT.COLUMNS SET COLCARD=3300000000 WHERE COLNAME='BFID' AND TABNAME='BF_AGGREGATED_BITFILES' AND TABSCHEMA='TSMDB1'"


There is no APAR for this, and no hits on Google for "DB2 'negative column cardinality'".
This seems slightly related to: http://www-01.ibm.com/support/docview.wss?uid=swg1IC99408

NOTE: DO NOT INSTALL A DB2 FIXPACK SEPARATELY. The TSM bundled DB2 is very slightly different. Standard DB2 fixpacks are not supported. If you decide to do this, you may find command or schema problems. If it works, then you may not be able to upgrade TSM afterward without a BACKUP DB, uninstall, reinstall, RESTORE DB -- at best.

If you have a large dedupe database, your options include:
* Stay at TSM 6.x
* Monitor for negative column cardinality
* Wait for an APAR and efix from IBM.
* Wait for TSM 7.1.1.1 or TSM 7.2.0 in 2015 (or whatever versions will contain fixes).

http://omnitech.net/reference/2014/08/04/db2-10-5-0-1-negative-colcard/
xaminmo: Josh 2016 (Default)
In the past, I set up TSM.PWD as root, but this seems to not be what I needed.

I'm posting because the error messages and IBM docs don't cover this.

tsmdbmgr.log shows:
ANS2119I An invalid replication server address return code rc value = 2 was received from the server.

TSM Activity log shows:
ANR2983E Database backup terminated due to environment or setup issue related to DSMI_DIR - DB2 sqlcode -2033 sqlerrmc 168. (SESSION: 1, PROCESS: 9)

db2diag.log shows:

2014-02-26-13.54.12.425089-360 E415619A371 LEVEL: Error
PID : 15138852 TID : 1 PROC : db2vend
INSTANCE: tsminst1 NODE : 000
HOSTNAME: tsmserver
EDUID : 1
FUNCTION: DB2 UDB, database utilities, sqluvint, probe:321
DATA #1 : TSM RC, PD_DB2_TYPE_TSM_RC, 4 bytes
TSM RC=0x000000A8=168 -- see TSM API Reference for meaning.

EDUID : 38753 EDUNAME: db2med.35926.0 (TSMDB1) 0
FUNCTION: DB2 UDB, database utilities, sqluMapVend2MediaRCWithLog, probe:656
DATA #1 : String, 134 bytes
Vendor error: rc = 11 returned from function sqluvint.
Return_code structure from vendor library /tsm/tsminst1/sqllib/adsm/libtsm.a:

DATA #2 : Hexdump, 48 bytes
0x0A00030462F0C4D0 : 0000 00A8 3332 3120 3136 3800 0000 0000 ....321 168.....
0x0A00030462F0C4E0 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0A00030462F0C4F0 : 0000 0000 0000 0000 0000 0000 0000 0000 ................

EDUID : 38753 EDUNAME: db2med.35926.0 (TSMDB1) 0
FUNCTION: DB2 UDB, database utilities, sqluMapVend2MediaRCWithLog, probe:696
MESSAGE : Error in vendor support code at line: 321 rc: 168

RC 168 per dsmrc.h means:
#define DSM_RC_NO_PASS_FILE 168 /* password file needed and user is
not root */
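
The same return code can be read straight out of the db2diag.log hexdump. A minimal sketch, assuming the first four bytes are a big-endian 32-bit return code (consistent with the 0x000000A8 shown above) followed by a NUL-terminated ASCII string. The function name is hypothetical.

```python
import struct


def decode_vendor_rc(hex_words):
    """Split a hexdump row into (return code, trailing ASCII text)."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    (rc,) = struct.unpack(">I", raw[:4])  # big-endian 32-bit integer
    text = raw[4:].split(b"\x00", 1)[0].decode("ascii")
    return rc, text


if __name__ == "__main__":
    # First row of the "DATA #2 : Hexdump" section above.
    print(decode_vendor_rc("0000 00A8 3332 3120 3136 3800 0000 0000"))
    # prints (168, '321 168')
```

The text part, "321 168", matches the probe line and return code reported in the MESSAGE entry.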

Verified everything required for this:
• passworddir points to the right directory
• DSMI_DIR points to the right directory
• dsmtca runs okay
• dsmapipw runs okay

Verified hostname info was correct

dsmffdc.log shows:
[ FFDC_GENERAL_SERVER_ERROR ]: (rdbdb.c:4200) GetOtherLogsUsageInfo failed, rc=2813, archLogDir = /tsm/arch.

Checked, and the log directory inside dsmserv.opt was typoed as /tsm/arch instead of /tsm/arc as was used to create the instance and as exists on the filesystems.

Updated dsmserv.opt and restarted tsm server. No change other than fixing Q LOG

SOLUTION:
The TSM.PWD file must be owned by the instance user, not by root.
Make sure to run dsmapipw as the instance user, or chown the file afterward.
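
A quick way to verify the ownership is sketched below. The function name is hypothetical, and the path in the usage comment is a placeholder for wherever your passworddir points. tsminst1 is the instance name from the db2diag.log above.

```python
import os
import pwd


def owned_by(path, username):
    """Return True if `path` is owned by `username` (Unix only)."""
    return os.stat(path).st_uid == pwd.getpwnam(username).pw_uid


# Example usage (placeholder path, adjust to your passworddir):
#   owned_by("/path/to/passworddir/TSM.PWD", "tsminst1")
```

pwd.getpwnam raises KeyError if the user does not exist, which is itself a useful signal that the instance user name is wrong.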

http://omnitech.net/reference/2014/02/26/tsm-7-1-config/

TSM 7.1

Feb. 20th, 2014 11:03 pm
xaminmo: (Josh 2004 Happy)
GAHHHHHHHHHH! If you install TSM client first on a clean AIX 7.1 system, TSM Server won't install. TSM Client comes with a version of xlsmp.rte that reverts the OS level to an unsupported version. You have to go find the base AIX install media and install the version from there. This is a packaging oversight. Someone thought a 5-year-old prerequisite was ok.

Further, you cannot call your TSM Server "tsmserver". Even though this matches the hostname requirements, Operations Center says nope with ANRI0011E.

Also, I'm absolutely required to use a password with at least 6 characters, one upper, one lower, one digit, and two nonalpha characters from a specific list.

*sigh*

A while back I tried to update TSM to 7.1 on Windows, but I installed the new Operations Center first. After that, the new deployment tool that is non-standard to everything except IBM refused to install TSM server, saying there was nothing to upgrade, but also that it couldn't install because TSM server was already installed.

But, someone, somewhere, is getting their bonus.
xaminmo: Josh 2016 (Default)
If you have 6 filesystems backing a sequential access file storage pool, and you remove one filesystem, TSM cannot calculate free space properly.

Instead of looking at the free space of the remaining filesystems, it takes the total space of the filesystems, minus the volumes in that device class.

Since there may still be old volumes in the "removed" directory, it considers the device class 100% full if everything currently existing cannot fit into the remaining directories.

Note that removing a directory from a device class does not invalidate the existing volumes in that directory. So long as the directory is still accessible, the volumes will be usable.

This is a problem when you want to reduce a filesystem but not migrate 100% off of it, as there is no other way to tell TSM not to allocate new volumes in that directory other than to remove that dir from the device class.
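
The difference between the two calculations can be sketched as follows. This is an illustration of the behavior as described in this post, not TSM's actual code. The function names and the sizes (in GB) are made up.

```python
def free_space_as_described(remaining_fs_sizes, volume_sizes):
    """Total size of the remaining filesystems minus every volume in the
    device class, including volumes still in the removed directory."""
    return max(0, sum(remaining_fs_sizes) - sum(volume_sizes))


def free_space_expected(remaining_fs_free):
    """Sum of the actual free space on the remaining filesystems."""
    return sum(remaining_fs_free)


if __name__ == "__main__":
    remaining_sizes = [100] * 5  # five directories left, 100 GB each
    volumes = [90] * 6           # 90 GB of volumes in each of the six
    remaining_free = [10] * 5    # 10 GB actually free on each remaining one
    print("as described:", free_space_as_described(remaining_sizes, volumes))
    print("expected:", free_space_expected(remaining_free))
```

With these numbers the device class looks 100% full even though 50 GB is free, because the 540 GB of existing volumes cannot fit into the 500 GB of remaining directories.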

http://omnitech.net/reference/2014/01/07/tsm-file-class-design-issue/
xaminmo: (Josh 2004 Happy)
tsm: TSM>show deduppending dedupe
ANR1015I Storage pool DEDUPE has 2,018,762,268,864 duplicate bytes pending removal.

tsm: TSM>SHOW DEDUPDELETE
****Dedup Deletion General Status****
Number of worker threads : 8
Number of active worker threads : 0
Number of chunks waiting in queue : 0

****Dedup Deletion Worker Info****
Worker thread 1 is not active
Worker thread 2 is not active
Worker thread 3 is not active
Worker thread 4 is not active
Worker thread 5 is not active
Worker thread 6 is not active
Worker thread 7 is not active
Worker thread 8 is not active
------------------------------------------
Total worker chunks queued : 0
Total worker chunks deleted : 0

tsm: TSM>q proc

Process Process Description Process Status
Number
-------- -------------------- -------------------------------------------------
1 Identify Duplicates Storage pool: DEDUPE. Volume:
/tsm/dedupe/00036730.BFS. State: active.
State Date/Time: 01/01/14 19:44:36. Current
Physical File(bytes): 13,453,908,907. Total
Files Processed: 32. Total Duplicate Extents
Found: 207,302. Total Duplicate Bytes Found:
27,030,923,224.
2 Identify Duplicates Storage pool: DEDUPE. Volume:
/tsm/dedupe/0003672C.BFS. State: active.
State Date/Time: 01/01/14 19:59:29. Current
Physical File(bytes): 82,217,208,517. Total
Files Processed: 1,110. Total Duplicate Extents
Found: 628,508. Total Duplicate Bytes Found:
99,009,523,025.
3 Identify Duplicates Storage pool: DEDUPE. Volume:
/tsm/dedupe/0003657E.BFS. State: active.
State Date/Time: 01/01/14 19:09:54. Current
Physical File(bytes): 32,356,415,194. Total
Files Processed: 1,799. Total Duplicate Extents
Found: 560,040. Total Duplicate Bytes Found:
87,123,137,741.
4 Identify Duplicates Storage pool: DEDUPE. Volume:
/tsm/dedupe/0003672F.BFS. State: active.
State Date/Time: 01/01/14 19:36:57. Current
Physical File(bytes): 2,147,746,191. Total Files
Processed: 2,701. Total Duplicate Extents Found:
565,790. Total Duplicate Bytes Found:
97,240,779,156.
5 Identify Duplicates Storage pool: DEDUPE. Volume:
/tsm/dedupe/0003672D.BFS. State: active.
State Date/Time: 01/01/14 18:47:32. Current
Physical File(bytes): 22,696,147,854. Total
Files Processed: 54. Total Duplicate Extents
Found: 43,421. Total Duplicate Bytes Found:
7,901,680,314.
6 Identify Duplicates Storage pool: DEDUPE. Volume:
/tsm/dedupe/00036731.BFS. State: active.
State Date/Time: 01/01/14 19:16:13. Current
Physical File(bytes): 24,424,088,494. Total
Files Processed: 6. Total Duplicate Extents
Found: 65,229. Total Duplicate Bytes Found:
14,781,615,514.
xaminmo: (Logo Tivoli Certified)
run all uninst* from /opt/IBM/tivoli
remove contents of /opt/IBM/tivoli
remove contents of /opt/tivoli
remove contents of /home/db2inst1

List the remnants in DE:
cd /usr/ibm/common/acsi/bin/
./de_lsrootiu.sh

Delete the UUID and discriminant (directory). My examples were:
./deleteRootIU.sh 2ADC4A33F09F4E85AD27963E850290C3 /opt/IBM/tivoli/tipv2
./deleteRootIU.sh 3DD9564D2E7442788584C1F35B07F2A2 /opt/IBM/tivoli/tipv2Components/TCRComponent
./deleteRootIU.sh 61AE95EAFC824C45BECFD427C959D5B7 /opt/IBM/tivoli/tipv2Components/TCRComponent
./deleteRootIU.sh 7F15FB682C80DFB90EBE3B0BF5D8EDC6 /opt/IBM/tivoli/tsmac
./deleteRootIU.sh C00DA95AFD9B7E0397153CD944B5A255 /opt/IBM/tivoli/tipv2


TAGS: admincenter admin center deploymentengine deployment engine ibm eserver tivoli force uninstall wipe
xaminmo: Josh 2016 (Default)
Admin center and reporting are installed.  I'm trying to log in to see if it works, and to do basic setup.

If I'm coming through a SOCKS proxy, it doesn't work at all.  "Connection Reset".

If I'm coming through port remapping, it doesn't work - "Connection Refused" - What hostname?  What IP?

If I run FF 3.6.28 (community RPM) or 3.5.13.1 (IBM BFF) on the AIX box where AdminCenter is installed, the javascript goes into a constant reload cycle.  Completely unusable as the page constantly refreshes itself.  There is no newer Firefox for AIX.

If I run FF on a Windows system in-network (Citrix), it works, to a point.  Many selectors, etc don't work in IBM's GUI tools on FF.  Selectors are missing the top item, which sucks if there is only one item.  I can't add the server.

If I run IE8 on a system in-network (Citrix), the tab crashes, and it says "This tab has been recovered."  Eventually, retrying, it gives up because whatever's in the tab just continues to crash.  It does this with all add-ons disabled too.

*sigh*
xaminmo: Josh 2016 (Default)
So, I tried installing the other way around, admin-center first. AC went on fine.
Reporting didn't fail immediately, but it won't let me pick my language.
The solution for this issue is to not pick to install any additional languages.

So, did I misread the docs, or are they backwards?
I am Le Tired, so I'll have to check later.
xaminmo: Josh 2016 (Default)
Various issues I've run into and resolved.
Cut/paste out of a doc I'm working on, so the formatting isn't HTML/LJ pretty.
xaminmo: Josh 2016 (Default)
This was because the symlinks for the libobk shared library were incorrect, and/or permissions on the libtdp_r3.sl were incorrect, and/or the agent.lic was not readable by the DB user.

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on sbt_1 channel at 05/07/2012 17:00:48
ORA-19506: failed to create sequential file, name="WP1_aeimncnu.102340_1", parms=""
ORA-27028: skgfqcre: sbtbackup returned error
ORA-19511: Error received from media manager layer, error text:
BKI9204E: Additional support information: An exception was thrown at position: esd-rmanapplication.cpp(271) (text=Unknown exception.
).
RMAN>
specification does not match any backup in the repository
RMAN>
Recovery Manager complete.
xaminmo: (Baby poop)
This is a new, clean install of the OS, and a new, clean download of the 6.3.1 reporting tool.

daltsmrpt: /install/2012/TSM/631rpt# cat /stdout
rootRA: com.ibm.tivoli.remoteaccess.LocalUNIXProtocol@298a298a
rootRA.isProtocolAvailable(): true
Exception: Userid is not privileged. java.net.ConnectException: CTGRI0002E Session not established.
(X) commiting registry
(X) shutting down service manager
(X) cleaning up temporary directories

daltsmrpt: /install/2012/TSM/631rpt# whoami
root

daltsmrpt: /install/2012/TSM/631ac# oslevel -s
7100-01-02-1150


If I get this sorted out, I'll post about it.
xaminmo: Josh 2016 (Default)
This has been going on intermittently for over a week, and I know I have publib issues in general.

Please fix it. This is unacceptable and really really tiring.


Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /infocenter/tsminfo/v6r3/index.jsp.

Reason: Error reading from remote server

IBM_HTTP_Server/6.1.0.39-PM46234 Apache/2.0.47 (Unix) Server at publib.boulder.ibm.com Port 80
xaminmo: Josh 2016 (Default)
https://www-304.ibm.com/support/docview.wss?uid=swg21265179

To start the script in the background, you need to adjust the following variables:

my $server = "localhost"; # server stanza for UNIX, TCPSERVERADDRESS for MSWin32
my $tcpport = "1500"; # only used in MSWin32
my $administrator = "admin"; # the admin id
my $password = "password"; # guess what?

You can easily adjust the script to collect data as you like, the default set has been found helpful in analysis of TSM server performance.

Please note that the script is unsupported and usage is at your own risk.

Note: APAR IC72251 documents a problem that the SHOW THREADS command can crash the server. The script uses this command frequently on the Windows platform, so please make sure the fix for this is applied. See the link section for APAR details. On platforms other than Windows, the SHOW THREADS usage has been greatly reduced, however it is still recommended to apply the referenced fix.

xaminmo: Josh 2016 (Default)
https://www-304.ibm.com/support/docview.wss?uid=swg21432937

Please make sure to run the script from a user with sufficient DB2 access rights (e.g. the instance user) with the DB2 environment properly initialized. Message DB21061E is thrown if the script fails to connect to the database.

Unlike the V5 script documented in swg21265179, this script is easiest to run on the Tivoli Storage Manager server box itself. If your installation paths differ from the default paths, you need to adjust the path variables accordingly.

Note: APAR IC72251 documents a problem that the SHOW THREADS command can crash the server. The script uses this command frequently on the Windows platform, so please make sure the fix for this is applied. See the link section for APAR details. On platforms other than Windows, the SHOW THREADS usage has been greatly reduced, however it is still recommended to apply the referenced fix.
APAR IC79957 documents a problem that there is a rare chance that SHOW DEDUPDELETEINFO might crash the server. The script has been modified so that the command is commented out. If you have the fix installed you can reactivate the command again.

xaminmo: Josh 2016 (Default)
With performance like this, our 16 hour outage might possibly be only 4 hours long:

ANR1392I EXTRACTDB: Extracted 66,306,868 database entries in 2,901,496 pages and wrote 8,164,504,887 bytes in 0:10:01 (46638.27 megabytes per hour).
ANR1379I INSERTDB: Read 8,321,835,921 bytes and inserted 67,430,573 database entries in 0:10:00 (47616.00 megabytes per hour).
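
The megabytes-per-hour figures in both messages can be reproduced if the byte count is first truncated to whole megabytes (1,048,576 bytes each). That truncation is an inference from these two lines, not documented behavior, and the function name is hypothetical.

```python
def mb_per_hour(byte_count, elapsed_seconds):
    """Rate in MB/hour, truncating the byte count to whole megabytes first."""
    whole_mb = byte_count // (1024 * 1024)
    return round(whole_mb * 3600 / elapsed_seconds, 2)


if __name__ == "__main__":
    print(mb_per_hour(8_164_504_887, 601))  # EXTRACTDB, 0:10:01 -> 46638.27
    print(mb_per_hour(8_321_835_921, 600))  # INSERTDB, 0:10:00 -> 47616.0
```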


Schnazzy. 67GB going in, 56GB on the target, 5 hours and 31 minutes from TSM 5.5.1 to TSM 6.3.

Just a few touchups and rebuild diskpools on the new box. *happy*
xaminmo: Josh 2016 (Default)
I also found that DB2 is wonky in that:
* DB2 put my ACTIVELOG in C:\SERVER1 rather than C:\Tivoli\TSM\LOG as specified during loadformat
* DB2 leaves a whole lot of garbage in C:\ProgramData\IBM\DB2\DB2TSM1\SERVER1
* TSM doesn't clean up the DB2TSM1\SERVER1, even after DSMSERV REMOVEDB and DB2ICRT sequence.
* TSM decides to run a full DBB on every 10 min poll of log space, even when nothing is running and no log space is used.

I've cleared off another 20G of garbage from the drive, mostly DB2 dumps from the last instance, and restarted TSM. No new DBBs yet, which is promising.


Page generated Sep. 23rd, 2017 06:17 pm
Powered by Dreamwidth Studios