1 / 70

Storage Technical Exchange Tivoli Storage Manager DB2 for TSM Administrators - Update

Storage Technical Exchange Tivoli Storage Manager DB2 for TSM Administrators - Update. Dave Canan IBM Advanced Technical Skills ddcanan@us.ibm.com. Dave Canan ddcanan@us.ibm.com Rich Crespo racrespo@us.ibm.com Dave Daun djdaun@us.ibm.com Tom Hepner hep@us.ibm.com.

chul
Download Presentation

Storage Technical Exchange Tivoli Storage Manager DB2 for TSM Administrators - Update

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Storage Technical ExchangeTivoli Storage ManagerDB2 for TSM Administrators - Update Dave Canan IBM Advanced Technical Skills ddcanan@us.ibm.com

  2. Dave Canan ddcanan@us.ibm.com Rich Crespo racrespo@us.ibm.com Dave Daun djdaun@us.ibm.com Tom Hepner hep@us.ibm.com TSM Advanced Technical Support Team

  3. Topics • Objectives for this presentation • DB2 Monitoring and Diagnostic Utilities • TSM Server Instrumentation – DB2 related information • TSM V6 Reorgs Revisited

  4. Objectives for this Presentation DB2 for TSM Admins - Update

  5. Objectives for this Presentation • This presentation was created for TSM admins who have not attended any DB2 classes. • It was built on top of the previous TSM Session given on TSM for TSM Admins (given in December 2010) • This presentation assumes you know how to invoke the db2 command line to issue basic DB2 commands. • The goal here is not to make you a DB2 expert; the goal is to teach you about some basics about some common utilities you need to know in order to monitor and diagnose TSM Server issues. • TSM support recommends that you NOT change the TSM DB without their advice. • We still have the recommendation that you keep up to date on your TSM V6 software levels. As that time of this presentation, the recommended level was V6.3.4.200

  6. DB2 Monitoring and Diagnostic Utilities DB2 for TSM Admins - Update

  7. DB2 Monitoring, Diagnostic Utilities, and Logs • runstats • db2diag.log • db2pd • db2dart • db2top • db2support • db2mtrk • For each utility, I will provide some ways you may want to use it and some recommendations.

  8. runstats DB2 for TSM Admins - Update

  9. DB2 runstats - Introduction • DB2 maintains statistics for tables and indices so that it knows how to optimally execute a given SQL statement • Those statistics are obtained via a DB2 process called runstats • It scans the table determining it’s size and cardinality of the various columns • TSM or DB2 automatically runs runstats periodically on tables that have changed a lot • You shouldn’t have to manually run runstats in order to maintain this information, but you can examine when it was last run for a given table/index

  10. DB2 Table Scans • When the proper indices don’t exist to optimally read data from a table, then a table scan is performed by DB2. • Optimally, DB2 should be avoiding this instead be pulling data from buffers in memory • This can happen for several reasons • Runstats has not been run on that table • The runstats data is obsolete • Data can get obsolete quick, especially on an empty table that is filling fast • Note: newly installed servers can quickly get obsolete runstats data. • Maybe there needs to be an index created - a proper index to service that query simply does not exist. (Open a PMR)

  11. DB2 Table Scans • Symptoms of a table scan: • DB2 is consuming a CPU (or two) and TSM barely registers • i.e. DB2 is consuming 25% of CPU on a 4 core box and TSM is maybe consuming 2% • You have individual queries taking many minutes/hours • DB2 statement snapshots show no index activity • What do I do if I think table scans are happening? • A manual runstats on that table can be run (with the direction of TSM support) • You have to know the table name, may need a trace • DBSTMT trace would be used for this analysis • Call support and open a PMR.

  12. DB2 runstats– How to Determine if it is Running • Look for message ANR0136I - this actlogmessage means that runstats has just been run on a table: • ANR0136I Table updating statistics performed successfully for 1 of 1. • Issue the show runstatscommand. It shows when the last TSM initiated runstats was run • Stats only tracked since server start • Issue the following command to db2: d2 "select tabname, stats_time from syscat.tables where tabschema=‘TSMDB1’ and type=‘T’“ • Shows when DB2 believes runstats was last run

  13. DB2 runstats– How to Issue Manually • Runstats may be run manually via the TSM runstatscommand. Do this if support requests that it be run. Examples of command: runstatsBF.Aggregated.Bitfiles runstats all • Do not use all unless directed to by TSM support!! • Table names are case sensitive • See show runstats output for proper case • Runstats may also be run via DB2 commands. Example: db2 runstats on table tsmdb1.bf_aggregated_bitfiles and indexes all allow write access

  14. DB2 runstats– Checking the History of Executions • DB2 auto runstats doesn’t always succeed • Due to failure to obtain locks, etc. • Runs as low priority process • To see the runstats history collect the db2optstats.* files in ~/sqllib/db2dump/events

  15. DB2 runstats– Looking for Errors (db2diag.log) • An example of failure: 2010-12-10-16.59.47.598047-420 E71536A630 LEVEL: Event PID : 13362 TID : 2199174244624PROC : db2sysc 0 INSTANCE: db2inst1 NODE : 000 DB : TSMDB1 APPHDL : 0-1495 APPID: *LOCAL.db2inst1.101210235955 AUTHID : DB2INST1 EDUID : 70 EDUNAME: db2agent (TSMDB1) 0 FUNCTION: DB2 UDB, SW- optimizer, sqlno_collect_stats, probe:210 COLLECT : TABLE AND INDEX STATS : Object name with schema : AT "2010-12-10-16.59.47.598033" : BY "Synchronous" : DUE TO "Conflict" : failure OBJECT : Object name with schema, 23 bytes TSMDB1 .RESTORE_SRVOBJ IMPACT : None • But also look for subsequent success before running runstats • Given runstats has to scan (nearly) the entire table and it’s indexes, runstats can take a long time • Potentially hours per table if table is large • So, don’t do a runstats all unless directed by TSM support

  16. db2diag.log DB2 for TSM Admins - Update

  17. db2diag.log • Lots of useful information here: • When did DB2/TSM last start on my instance? • What level of DB2 am I using? • How much free memory do you have? Is it enough? • Has my db2 instance gotten any “critical” errors? • Do you need to adjust the DB2 locklist? • Are you experiencing hardware errors? • Am I running out of space on my DB / logs / boot disk? • Am I getting errors when running runstats? (discussed previously)

  18. Db2diag.log – Information Displayed at Start of Instance Timestamp Use eduid to tie entries in the log together—like threadid Db2 build level Phys Mem Watch the free physical memory. If it’s lower than the minimum required, that suggests a problem.

  19. Db2diag.log – How Much Memory Do You Have? • If you want to know how much free memory is available each time the instance starts: • grep ‘Physical Memory (MB)’ db2diag.* Even though there’s 16G of memory, most of it’s in use when the instance starts! See also this slide.

  20. Db2diag.log – Am I Getting any Critical Errors? 2012-06-03-13.36.30.039000-420 E18294014F999 LEVEL: Critical PID : 14496 TID : 10504 PROC : db2syscs.exe INSTANCE: SERVER1 NODE : 000 DB : TSMDB1 APPHDL : 0-43077 APPID: *LOCAL.SERVER1.120603203517 AUTHID : ADMINISTRATOR EDUID : 10504 EDUNAME: db2agent (TSMDB1) FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10 MESSAGE : ADM14001C An unexpected and critical error has occurred: "DBMarkedBad". The instance may have been shutdown as a result. "Automatic" FODC (First Occurrence Data Capture) has been invoked and diagnostic information has been recorded in directory "C:\ProgramData\IBM\DB2\DB2TSM1\SERVER1\FODC_DBMarkedBad_2012-06-03-1 3.36.30.039000_0000\". Please look in this directory for detailed evidence about what happened and contact IBM support if necessary to diagnose the problem.

  21. Db2diag.log – Checking for Out of Space Conditions in Active Log / Archive Logs • Running out of disk space is very bad; it’s even worse if the filesystem that runs out of space is the one that db2diag.log resides on (can be the boot or system drive). • grepfor “DIA8312C” in db2diag.log

  22. db2diag.log – Out of Space Example 1 • Using command: grep ‘DIA8312C’ db2diag.* 2012-10-26-19.01.45.842623-240 E3797A737 LEVEL: Error (OS) PID : 9044114 TID : 4885 PROC : db2sysc 0 INSTANCE: tsm011 NODE : 000 EDUID : 4885 EDUNAME: db2pclnr (TSMDB1) 0 FUNCTION: DB2 UDB, oper system services, sqloLioAIOCollect, probe:100 MESSAGE : ZRC=0x850F000C=-2062614516=SQLO_DISK "Disk full." DIA8312C Disk was full. CALLED : OS, -, aio_return OSERR : ENOSPC (28) "No space left on device" Note the EDUID and find associated messages.

  23. db2diag.log– Out of Space Example 2 (DB Out of Space) 2012-10-26-19.01.45.828204-240 E1507A775 LEVEL: Error PID : 9044114 TID : 4885 PROC : db2sysc 0 INSTANCE: tsm011 NODE : 000 EDUID : 4885 EDUNAME: db2pclnr (TSMDB1) 0 FUNCTION: DB2 UDB, buffer pool services, sqlbClnrAsyncWriteCompletion, probe:0 MESSAGE : ADM6017E The table space "LGTMPTSP" (ID "7") is full. Detected on container "/tsmdbs/tsm011/db7/tsm011/NODE0000/TSMDB1/T0000007/C0000006.TMP" (ID "6"). The underlying file system is full or the maximum allowed space usage for the file system has been reached. It is also possible that there are user limits in place with respect to maximum file size and these limits have been reached. • Using command: grep‘is full’ db2diag.*

  24. db2diag.log – Out of Space Example 3 • (Example of running out of space in archivelogfilesystem) • Using command: grep‘Failed to archive log file’ db2diag.* 2013-03-19-16.08.48.148052+060 E37551034E499 LEVEL: Warning PID : 5576 TID : 140145739163392PROC : db2sysc 0 INSTANCE: tsminst1 NODE : 000 EDUID : 108 EDUNAME: db2logmgr (TSMDB1) 0 FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3150 MESSAGE : ADM1848W Failed to archive log file "S0020872.LOG" to "/tsmbox/archlog/archmeth1/tsminst1/TSMDB1/NODE0000/C0000000/" from "/tsmbox/actlog/NODE0000/". 2013-03-19-16.08.48.157169+060 E37551534E568 LEVEL: Error PID : 5576 TID : 140145739163392PROC : db2sysc 0 INSTANCE: tsminst1 NODE : 000 EDUID : 108 EDUNAME: db2logmgr (TSMDB1) 0 FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3160 MESSAGE : ZRC=0x850F000C=-2062614516=SQLO_DISK "Disk full." DIA8312C Disk was full. DATA #1 : <preformatted> Failed to archive log file S0020872.LOG to /tsmbox/archlog/archmeth1/tsminst1/TSMDB1/NODE0000/C0000000/ from /tsmbox/actlog/NODE0000/.

  25. db2pd DB2 for TSM Admins - Update

  26. db2pd – Pinned Log Situations • Long-running read-write transactions can span many log files, pin the active log and cause the server to halt. • Using db2pd, you can figure out who the culprit is, and cancel it before it crashes your TSM server. • This section provides a series of steps, similar to what the previous “show logpinned” command used to provide for determining the transaction pinning the active log, and then will show how to cancel it. (Note: the “show logpinned” command does not exists in V6.)

  27. db2pd – Pinned Log Situations • This command has many functions. Using db2pd for looking for log pining is just one way it can be used. • Use along with script (available from support) correlateTxnLog.pl • Once you determine the thread / session / application ID, you can take appropriate action to cancel it.

  28. db2pd – Using for pinned log situations (cont.) 1. Assume we have a server with many R/W transactions and we gotten the command output from db2pd –transactions –logs –db tsmdb1 2. Figure out what is the first and last log file being used by every R/W transaction. • To do this, use the correlateTxnLog.pl perl script. • Syntax: perl /pathname/correlateTxnLog.pl txnFilelogFile > outfile 3. Figure out who is using all the log space from the from previous step. Write down the appl ID. 4. Determine the process / thread / application ID that is pinning my active log from previous step from a snapshot. 5. Use DB2 to “force” that application to close

  29. db2pd – Pinned Log Situations $ perl correlatetxnlog.pl FirstLog=S0060695.LOG, lastLog=S0060934.LOG AppId 457, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=31706652, logSpace=62803841 AppId 504, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=13292901, logSpace=25300514 AppId 538, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=13929200, logSpace=26244752 AppId 579, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=9155968, logSpace=17089294 AppId 602, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=12664603, logSpace=23639788 AppId 696, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=49847, logSpace=95985 AppId 695, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=84209, logSpace=185466 AppId 6107, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=23384430, logSpace=53871872 AppId 6101, firstLog=S0060853.LOG, lastLog=S0060853.LOG, spaceReserved=5779293, logSpace=13416988 AppId 3540, firstLog=S0060695.LOG, lastLog=S0060851.LOG, spaceReserved=161816138, logSpace=349631511 Issue command: db2pd –transactions –logs –dbtsmdb1 Use script to display log usage: 3. Note that AppId 3540 spans 156 log files. (60695 – 60851) (156 * 512MB is 78GB of active log space!!) Who is this 3540 application id? We need to look at the application snapshot file

  30. db2pd – Pinned Log Situations 4. Determine the application ID that belongs to the process: • db2 connect to tsmdb1 • db2 get snapshot for all applications >application.out Examine application.out file, find the entry that says: Most recent operation = xxxxx Find the line that says "Application handle" Sample: Application handle = NNNNN (where NNNNN is the actual application handle)

  31. db2pd – Pinned Log Situations Find the application handle that matches : This is index reorg – application name is dsmsvc.exe Application handle = 3540 Application status = UOW Executing Application name = dsmsvc.exe Connection request start timestamp = 03/07/2013 02:47:25.972586 Snapshot timestamp = 03/07/2013 08:21:12.931830 Coordinator agent process or thread ID = 11772 Lock timeout (seconds) = -1 Locks held by application = 4 Lock waits since connect = 0 Buffer pool data logical reads = 4 Buffer pool data physical reads = 3 Buffer pool index logical reads = 19153833 Buffer pool index physical reads = 2307802 Buffer pool index writes = 2 Number of SQL requests since last commit = 0 Commit statements = 1 Dynamic SQL statements attempted = 2 Static SQL statements attempted = 1 Most recent operation = Reorganize

  32. db2pd – Pinned Log Situations 5. Issue the command in the DB2 Command Line Processor Window : • db2 "force application (NNNNN)" Wait . Be patient!! Wait 30 minutes, and then do steps 1-4 again to verify recent entry doesn’t still say same Appl handle.Note: If recent operation is Reorganize, no work is lost with index reorg cancellation as redo logging is in place

  33. db2dart DB2 for TSM Admins - Update

  34. db2dart • The db2dart command can be used to verify the architectural correctness ofdatabases and the objects within them. It can also be used to display thecontents of database control files in order to extract data from tables thatmight otherwise be inaccessible. • Command syntax: db2dart tsmdb1 /db >db2dart.out - inspects entiredatabase • TSM server should be halted (use db2stop to make sure the instance is halted) • This is to make sure the DB is in consistent state • Also will improve the performance of the db2dart process • Use db2dart when you suspect that the database and/or indices have been corrupted. • If db2dart indicates that just indices are corrupted/incorrect, TSM and DB2 L2 will provide instructions on how to drop the corrupted indices and rebuild them. • db2dart has many different options to view and repair databases—DB2 L2 will give specific commands to do so based on their analysis

  35. db2dart • Sample of db2dart output: Database inspection phase end. ______________________________________ DB2DART Processing completed with error! WARNING: The inspection phase did not complete! ERROR: Some tables indexes may be corrupt; errors will be cleared for index objects with invalid state after they have been succesfully rebuilt. DB2DART Processing completed with warning(s)! Warning(s) detected during processing. ______________________________________

  36. db2top DB2 for TSM Admins - Update

  37. Db2top – An Introduction • There are several methods to collect information and diagnose DB2 system performance issues. The snapshot monitor is one of the most commonly used tools to collect information in order to narrow down a problem. However, most entries in snapshots are cumulative values and show the condition of the system at a point in time. Manual work is needed to get delta value for each entry from one snapshot to the next. • The db2top tool comes with DB2 (except Windows), and can be used to calculate delta values for those snapshot entries in real time. This tool provides a GUI under a command line mode, so that users can get a better understanding while reading each entry. This tool also integrates multiple types of DB2 snapshots, categorizes them, and presents them in different screens for the GUI environment.

  38. Starting db2top – (2 ways) • Db2top can be run in two modes: • interactive mode. In interactive mode, the user enters command directly at the terminal text user interface and waits for the system to respond. The left and right arrow keys on the keyboard can be used to scroll columns to left or right. • Batch mode In batch mode a series of jobs are executed without user interaction. Batch mode creates a file which is then viewed later for analysis • Db2top not supported on Windows platform

  39. Starting db2top – interactive mode • 1. Log in as instance owner. • 2. Change directory to location of db2top. (On AIX this is /opt/tivoli/tsm/db2/bin.) • 3. ./db2top –d database-name (for TSM, this is typically tsmdb1)

  40. Db2top – Using the Keyboard When in Interactive mode • To go to a specific display, enter letter for that display (there are a total of 42 letter command). Examples: • d (Database) – good for viewing active log space used, disk performance. • T (Tables) Shows the tables in the the DB2 DB. • B (Bottlenecks) – Shows resource usage • m (memory) – shows memory usage, bufferpools. • h (Help) - Display help for all single letter commands. • If you invoke “h” to show all single letter commands, just hit <enter> to return to previous display

  41. db2top – d Option, Showing Log Space Used • 1. Log in as instance owner. • 2. Change directory to location of db2top. (On AIX this is /opt/tivoli/tsm/db2/bin.) • 3. ./db2top –d database-name (for TSM, this is tsmdb1) • 4. d option to see database information

  42. db2top – d option, Disk Subsystem Performance • Average Physical Read time (AvgPRdTime), • Average Direct Read Time (AvgDRdTime), • Average Physical Write time (AvgPWrTime), • Average Direct Write time (AvgDWrTime). Disk performance info

  43. db2top – m –i 10 option , Memory Usage This will show memory pool allocations, and their % of total. Because we have specified the i option of 10, the display will refresh approximately every 10 seconds. Probably would be better to do this in batch mode, and watch over time segments for memory leaking issues. Note also the package cache. This stores the package and information required for the execution of SQL statements. If it is too small, then your hit% suffers, meaning older statements have to be pushed out to make room for new ones, and aren’t reused as much. (Should be close to 100%, and it usually is.)

  44. Starting db2top – Batch Mode • Log in as instance owner. • Change directory to location of db2top. (On AIX this is /opt/tivoli/tsm/db2/bin.) • ./db2top –d database-name -f /tmp/tsmdb1.collect.file –C –m 120 –i 60 • -f is name of the file to create • -C run in snapshot collector mode • -m nnn Run for “nnn” minutes • -inn run in “nn” intervals • Also should reply N to “Writing to” prompt

  45. db2top – Using the Data File Previously Collected • To perform automatic performance analysis, issue: ./db2top -d tsmdb1 -f /tmp/tsmdb1.collect.file -b l -A • To jump to a certain point in time (example, what happened at 2:00 AM?): ./db2top -d tsmdb1 -f /tmp/tsmdb1.collect.file /02:00:00 Much more info available on db2top. See : http://www.ibm.com/developerworks/data/library/techarticle/dm-0812wang/

  46. db2support DB2 for TSM Admins - Update

  47. Why would you need this command? • Many times when a PMR is opened, there is a need for analysis from DB2 point of view, and db2support zip file is often obtained. • Command syntax: db2support –d tsmdb1 –c –s -g. • This produces a db2support.zip file that can be sent to DB2 L2 for analysis. • Contains a collection of lots of files. • Most of the files are person-readable, and often TSM developers and DB2 developers can gain insight and understanding by examining some of the files.

  48. Structure of db2support.zip File Unzipping will give several directories something like this: Definitions: DB2CONFIG – configuration DB2DUMP – db2diag/traps/FODC/stacks DB2SNAP – Snapshots Db2supp_system.zip – Additional diagnosis -- extract to same directory EVENTS – generally runstats output STMM – Self-tuning Memory Module

  49. db2mtrk DB2 for TSM Admins - Update

  50. Why would you need this command? • During the daily monitoring of your logs, you see the following (TSM actlog):

More Related