Washington Area Informix/DB2 User’s Group July 26, 2006

Washington Area Informix/DB2 User’s Group July 26, 2006 DB2 Performance Tuning: A Practical Approach By Jim Cleveland jimcleve@aol.com Bluepoint Consulting Inc.

Introduction • What is our aim? • Performance tuningis: • Analyzing the system’s configuration • Observing performance characteristics • Formulating recommendations intended to enhance performance • (Rinse, shampoo, repeat) • Metrics driven model involves: • Gathering system performance data (at all levels) • Focusing on a particular quantifiable performance measurement • Adjusting performance parameters or modifying architectural characteristics • Gauging the effect of those changes on overall performance and selected measurement

Look familiar?

The Diagnostic Tools – DB2 • DB2 Snapshot Monitors • Internal counters set at global or session levels using monitor switches • Monitors collect cumulative statistics either from the initiation of a CLP session (switches and stats are local to session), since time of last reboot (global switches set and stats available to all users), or last “reset monitors” command at either level • Useful for collecting point-in-time information regarding performance metrics with respect to overall db/dbm behavior. Also able to provide more specific information on: • Locking – at given moment lists all locks & types held by all applications • Bufferpools – cumulative stats for memory, physical & logical I/O’s, synch/asych • Sorts – provides detailed statistics regard sortheap usage, overflows, # active etc. • Tablespaces – detailed I/O stats and other activity for a given tablespace • UOW – displays status of application Unit of Work at point-in-time • Dynamic SQL – shows contents of package cache and related stats at point-in-time

The Diagnostic Tools – DB2 • DB2 Event Monitors • Event Monitors are the DB2 facility used to collect information regarding a specific event or chain • of events within the context of database operations. • They are event driven versus continuously collected as with snapshots • Must be explicitly created and activated via DB2 commands or API’s • Are the best way to effectively diagnose and resolve deadlock issues from the database perspective, since deadlocks require tracing a chain of events through time • Output may be directed to a DB2 table and results then analyzed using SQL. For example, output from statement Event Monitor can be used to: • identify and rank most frequently executed or most time consuming SQL • track the sequence of statements leading up to a lock timeout or deadlock • track connections to the database • breakdown overall SQL statement execution time into sub-operations such as sort time and use or system CPU time

The Diagnostic Tools - Unix • Concurrent Monitoring of OS Resource Usage is Imperative. • When collecting DB2 snapshot/event data it will be useful (necessary • even?) to have, at a minimum, time stamped output from: • iostat[interval in sec. #iterations] provides detailed information, broken down by hdisk for % of time disk busy, read/write transfer rates and totals, xacts per sec. Particularly useful for identifying heavily used (‘hot’) disks which may be the cause I/O bottlenecks (i.e. – hdisks with continuous activity > 40% deserve attention) • vmstat[interval in sec. #iterations] provides information regarding CPU activity (% for system, user, and time waiting for I/O), available/free memory and paging activity. Useful for identifying shortage of CPU resources - overall activity > 70% indicates there may be inadequate ‘headroom’ for handling peak loads  DB2 processes may back up in wait queue. • Other useful OS performance monitoring/config commands: • filemon- drill down to retrieve detailed I/O statistics on individual devices • netstat- identify latency in transactions due to network data transfer time • ps- provides capability to identify DB2 (or other) processes • sar- system activity report, combines metrics from iostat, vmstat, and others • lsps – describes paging space • lscfg/smit–system hardware inventory command and configuration utilitiy

The Diagnostic Tools – Unix/AIX • In the best of all possible worlds...get topas installed (or nmon):

Resort to bribery….

The DB2 Process Model • To understand the context in which performance issues arise we need to understand how the various DB2 components function together as a whole. That is, how the DB2 process model works –

Identifying the Problem – Step I • Collect Preliminary Configuration and Performance Information • Do a cursory configuration and architecture review (beforehand, if possible) • Capturing db and dbm level snapshot stats over a ‘representative’ period of time provides a good jumping off/drill-down point for non-specific performance issues (i.e. – focus is not on a single transaction or process). • Simple shell script run from Unix prompt or scheduled using crontab will suffice to collect data over specified period (sub # of iterations for i1, interval len for i2): • For example: • #!/bin/ksh • date; x=o; • db2 update monitor switches using lock sort on bufferpool on uow on table on • statement on; • db2 reset monitor all • do while x < i1; # determines number of iterations, total monitoring period • date • db2 get snapshot for dbm; • db2 get snapshot for db on xxxxxx; • sleep i2 # interval in seconds for loop, longer interval for longer monitoring period • x=x+1 • enddo • As mentioned cross time indexed OS level stats should be collected concurrently – same script using iostat/vmstat maybe.

Identifying the Problem – Step II • The question at this stage is: on what tasks and in what proportion is • DB2 spending its time? • To identify common performance issues, breakdown as follows: • Locking - time waiting on resolution of lock & deadlock issues • Sorts - time spent performing sorts, hash joins, (other?) • Logging Activity - time for log writes, reads and overhead • System CPU - time executing system calls • User CPU - time spent on SQL prep, ovhd for locking and sorting activity • General I/O - no single cause, likely a confluence of factors for e.g. ts design, container mapping, bp configuration, db/dbm params etc. (Specific calculations for above available in Admin Guide: Performance)

The Tuning Process – a Roadmap

Sort Issues • SQL containing GROUP BY, ORDER BY, or DISTINCT will result in a sort • if no index is available to order the result set. Also in prep for merge • join. To identify sort problems, look for the following: • Sort overflows are nominally < 3% (Total Overflows/Total Sorts * 100) • More than minimal post-threshold sorts or rejected piped sorts • Sort heap high water mark > SHEAPTHRESH (post-threshold sorts) • The # of active sorts at a given time is greater than x2 ( or 3σ) baseline • Any large or significant small hash join overflows, any hash loops • Rows selected >> rows will generally causes ^ sorts  need for indexes • > 5 sorts per statement (Total sorts/Total Statements)  Problem SQL

Adjusting Sort Parameters • Increasing Sort Heap/Threshold (remember this is treating the symptom!): • may allow for larger in memory sorts preventing disk spills • fewer sort passes/merge joins and may also cut down overall exec time • same benefits for hash joins, dynamic bit index AND’ing, & star joins • Caveats: large sort memory may cause excessive paging, in memory sort rather • than index scan, and sort space may come at the expense of bufferpool space • Rule Of Thumb: • Calculate average sort space = (sort_heap allocated/avg. # active sorts)  use as baseline sortheap then adjust to minimize overflows, pipe rejects, hash overflows • Sheapthresh = max(avg # appls connected, avg. active sorts) * max(sortheap among db’s in instance) adjust to minimize post-thresh sorts, hash overflows etc.

Logging Issues • Transaction logging and overhead can represent a large part of I/O time • in OLTP/mixed environments. To identify logging problems look for: • Hot (> 40%) hdisks where logs located (use db2 snaps for raw devices) • Snapshots showing frequent use of secondary logs, large volume of writes/reads to logs, log high watermark approached, ^ opens/closes • Logging in Non-OLTP environment? Check db2diag for ROLLBACK’s • Potential adjustments/remedies: • Increase size & # of primary logs, SOFTMAX, MINCOMMIT, LOGBUFSIZE • Relocate logs on faster devices? • Design options: compound SQL, group > 5 statements into Stored Proc.

Prefetch Wait Time – Refining the Cause • To determine why prefetch is not happening effectively examine (for both tablespaces and individual tables): • Asynch Read Pct: Asynch Reads/Buffer Pool Physical Reads • Asynch Read Ratio: Asynch Pages Read/Logical Reads • Asynch Pages Per Req (APPR): Asynch Pages Read/Asynch Requests • Asynch Read Time: Asynch Read time in ms/Overall Read Time ms • Number of unread prefetch pages • Is wait caused by an underlying I/O bandwidth limitation? • Bufferpool configuration • Purpose of calculations is to determine: • Is prefetch taking place on correct tables/tablespaces? • In large enough chunks? • In proper proportion to overall logical and synchronous I/O? • Without being overwritten by other pfetchers?

Excessive Prefetch Time – Possible Remedies • Check db/dbm configuration and architecture: • DB2_PARALLEL_IO=* regvar, SEQDETECT=Y, # of IO_SERVERS • Physical parameters - page/extent/prefetch sized/RAID strip size Pfetch size = n * [# devices in array] * extent size where n Є 1,2,3… • Incremental improvement may be achieved by: • Implementing Big-Block I/O • Adjust the number and size of prefetch queues • bp’s only need to be sized to provide for effective pfetch • Need separate bp’s for tables w/high ARR’s or contention will occur • Table/jfs reorganization - Pre-fetch advantage is offset by fragmented DB2/OS data check contiguity at OS level and w/DB2 REORGCHK • I/O bandwidth limiting factor? Adequate parallelism with existing devices?  Solution: map to more containers or more/faster devices.

Detection & Classifying of Locking Issues • To detect and diagnose locking problems, look for: • Snapshots show significant Time Spent Waiting on Locks (compared to overall execution time) • Snapshots show lock escalations • LocklistHigh Water Mark > 50% of total space • db2diag.log shows repeated SQL0911N (rc 1 or 2) meaning Lock timeouts or deadlocks detected and resolved • Look for ROLLBACK’s in diaglog as well  deadlocks

Locking Problem Mitigation • Locking is fundamentally an application logic issue. That said, recourse • on db side includes: • Insure LOCKTIMEOUT changed from default -1 to e.g. 10000 • Set LOCKTABLE for heavily updated tables • Provide sufficient LOCKLIST space to prevent lock escalation • Tune db parameters MAXLOCKS, LOCKTIMEOUT & DLCHKTIME • Set registry variables LOCK_TO_RB , KEEPTABLELOCK, MAX_NON_TABLE_LOCKS, DB2_EVALUNCOMMITTED, DB2_SKIPDELETED and DB2_SKIPINSERTED • Type II indexes ^ concurrency if migrated from v7, Use REORG to convert • Application development principles: • Frequent of COMMIT’s (use event monitor or db2diag.log to track) • Specifying FOR FETCH cursors whenever possible • Lowest isolation level for entire application or single statement

System or User CPU • User CPU is often a result of waiting on locks or sorts. If that’s still the • case after after previous steps, application design consult is required. • Barring that, the need is to identify processes or particular SQL • statement consuming CPU cycles: • Use ps -elf to identify processes at OS level • List applications show detail for connections within DB2 • Event monitors to isolate apps/SQL using extensive CPU resources • Stack trace, explained in the DB2 Problem Determination Guide’s, is • the ultimate means of identifying process (and time) flow within DB2.

Paging Issues – • AIX VMM allows system to assign more memory to a process than • physically exists. Properly tuned DB2 system on dedicated server should • incur NO paging whatsoever. Page faults/sec > 50 or > 10-20 pi/po  problem. • Paging means memory is overcommitted – common sources in 32-bit envirnoments: • Over allocation of bufferpools or sortheap • Database memory allocated based on avg. # applications • Private memory allocated by large number of connections Remedies: • Identify and release overcommitted memory (use db2mtrk/visualizer) • AIX minperm and maxperm parameters • Paging space on disk x2 physical memory • Pin bufferpools in memory using DB2_PINNED_BP • Release agent private memory using DB2 regvars MEMDISCLAIM/MAXFREE • Remember that Intra-parallelism and FCM take up 2 segments (i.e. – less for BP’s)

Whatever is left is I/O, right? • Sorta-kinda-maybe. I/O efficiency can be addressed within context of • existing configuration But, later we’ll need consider performance drivers • inherent in design. • Checklist of common opportunities for improving I/O performance: • ◊ Parallel I/O – Registry variables, num_ioservers, adequate # containers (hdisks) • ◊ Turn off Memory Mapped I/O (MMAP_R/W) but beware of i-node latching • ◊ Consider DMSRAW DEVICEcontainers. DMS file are worst of both worlds • ◊ Examine behavior of HASH JOIN’s – may degrade performance wo/resources • ◊ RUNSTATS and REORG’s – automate! Don’t leave to discretion of overtaxed DBA! • ◊ OLTP specifics – consider optimization 2/3, AVG_APPLS = 1, watch logging closely, • CHNGPGS_THRESH 20 – 30%, smaller pages  more efficient I/O, MINFREE and • PCTFREE to avoid block splitting/overflows • ◊ DSS specifics – consider INTRAPARALLEL, optimization > 5? AA=connections, • watch sorts closesly, 32 K pages? (but remember only 255 rows per • page), DLCHKTIME ^, large result sets - rqioblock to 64 K

Laundry List of I/O issues (continued) • ESS: stripe size is 64 K, stripe ts containers across LUN’s, keep in mind • LUN’s can be on same rank – (same array), do not duplicate RAID at OS • FAStT: stripe size configurable from 8 – 256 K, multiple RAID configs • Less prevalent, but no less debilitating architectural/configuration issues: • Page size correct for row size and access method? • Unless there is a distinct advantage, (synch/asych I/O) fewer bufferpools • Benchmark index/data separation – not always advantageous • Pay attention to victim page steals – adjust # of cleaner/threshold • CPU Speed -1, let optimizer decide • File open/close overhead  MAXFILEOPEN/ulimits settings

AIX Rec’s - The good, the bad & the ugly • General settings, rules, suggestions, incantations: • Use AIX 5.2 with latest maintenance release if possible • Use 64 bit DB2 with 64 bit kernel, 32 bit / 32 bit, recompile & rebind SP’s etc. when migration to 64 bit takes place, point to 64 bit libraries • Steady state load of hardware config (CPU, Memory, I/O) should use no more than 70% of available resources to allow for peak loads/unseen overhead • Number of licensed users and maxuproc set to max DB2 connections • Maxperm/minperm settings along with DB2 reg vars for memory disclaim • Maxmemory/minmemory to regulate AIX • Ulimits – maxprocs increased, unlimited for other params • Make sure asynchronous I/O (aio) is enabled w/adequate # of agents • Paging space at least x2 physical memory • If pinning memory using regvar, vmtune tuning parameter ? • Disabling MMAP reads/writes in DB2 means jfs cache bybassed…can provide significant I/O improvement, but i-node latches  3 containers • Make sure all processors in an SMP environment are enabled! Ask about WLM and DLPAR settings. • Check in smit for LP/PP ratio – is there disk mirroring on top of RAID-5? • db2oscnfg – makes rec’s for OS kernel settings in HP-UX and Solaris

SQL Analysis & Tips • The Golden Rule for Join Performance: • Columns used to join tables or that are used in a group by, order by or distinct clause should be indexed to improve performance • Other useful SQL rules: • Optimizer is all about ‘SARGABLE’ predicates (stage 1, indexable, • For join with N tables, there must be N-1 relationships defined to avoid a Cartesian product -- Corollary: establishing higher selectivity early in plan. • All SQL optimization efforts marginalized with outdated statistics or fragged data • Use optimize/fetch only rows clause if possible • Test alternate optimization classes for problem SQL (i.e. – 5, 7 & 9) • Use singleton select instead of cursor – use routines to identify node in EEE, make cursors READ ONLY where possible  non-deletable  no exclusive locks, BLOCKING ALL option also • Use UR or lowest possible isolation level when possible to avoid waiting on locks • Consider Global Temporary tables – scanning workfile faster sometimes than subquery • Run logic as function instead of procedure

Washington Area Informix/DB2 User’s Group July 26, 2006