Query Performance and Optimizer Specific Type of Issues

Query Performance and Optimizer Specific Type of Issues Speaker name: Anthony E. Reina Email: aereina@ca.ibm.com

IBM Software Accelerated Value Program • The IBM Software Accelerated Value Program delivers a proactive, cost-reducing, and productivity enhancing advisory service. The program pairs you with an assigned team who build a foundational understanding of your overall environment. Through that understanding, the trusted partnet works to facilitate faster deployment, lifecycle leadership, risk mitigation, and more by identifying ways to improve your environment, staff skill set, and processes. http://www-01.ibm.com/software/support/acceleratedvalue/ 2

High-Level Overview Optimizer purpose: • Chooses access plan for data manipulation language (DML) SQL statements. Means of fulfilling the purpose: • Calculates the execution cost of many alternative access plans, and then selects the one with the minimal estimated cost. • Bases calculations on catalog statistics. • Measures cost in timerons: • An abstract unit of measure. • Not directly reciprocal to actual elapsed time. • Rough relative estimate of the resources (cost) required by the database manager to execute an access plan. • Access plan cost is cumulative: each operator cost is added to the next one. 3

High-Level Overview Factors influencing the optimizer: • Database design • Indexes • Tablespace configuration • Table and index statistics • Basis of plan costs • Generated and collected by the RUNSTATS command • Operating environment • Influences plan costs • Various configuration parameters • Optimization class (OPTLEVEL) • Controls the level of modeling considered during compilation • Influences SQL compilation time 4

DB2 OPTIMIZER CODEPATH 5

DB2 OPTIMIZER CODEPATH • Parse Query:analyzes the query to validate the syntax. • Check Semantics:ensures that there are no inconsistencies among parts of the statement. • Rewrite Query:uses the global semantics that are stored in the Query Graph Model (QGM) to transform the query into a form that can be optimized more easily. The result is stored in the QGM, as well. • Pushdown Analysis(Federated DB only): recommends to the optimizer whether an operation can be remotely evaluated or pushed down at a data source. 6

DB2 OPTIMIZER CODEPATH • Optimize Access Plan:using the QGM as input, it generates many alternative access plans, at the end it select the most cost effective plan for satisfying the query. • Remote SQL Generation(Federated DB only): creates an efficient SQL statement for operations that are performed by each remote data source based on the SQL dialect at that data source. • Generate Executable Code:uses the access plan and the QGM to create an executable access plan or section for the query. 7

DB2EXFMT TOOL Explain purpose: • Captures information about the access plan, optimizer inputs, and environment of an SQL query. • Helps to understand how the SQL query is compiled and executed. • Shows how configuration parameter changes impact query performance. • Indispensible tool for query problem determination. 8

DB2EXFMT TOOL Explain information includes: • Sequence of operations to process the query • Cost information • Predicates and selectivity estimates for each predicate • Statistics for all objects referenced in the SQL query at the time the explain information is captured 9

DB2EXFMT TOOL Explain tables: • Must be created before the Explain can be invoked: • SYSINTSALLOBJECTS() procedure call sysproc.sysinstallobjects (‘EXPLAIN’,’C’,NULL,<schema name>) • EXPLAIN.DDL db2 –tvf …/sqllib/misc/EXPLAIN.DDL • Must be created per schema/user or under SYSTOOLS schema • Capture access plan data when the Explain facility is activated 10

DB2EXFMT TOOL Explain tables: 11

DB2EXFMT TOOL Explain tools: db2expln • Provides access plan information for one or more SQL query packages • Shows the actual implementation of the chosen access plan • Does not show detailed optimizer information db2exfmt • Formats the contents of the explain tables • Provides more detailed explain information than db2expln • Shows clues as to why the optimizer has made particular decisions 12

DB2EXFMT TOOL db2exfmt output sections: • Explain instance • Database context • Package context • Original statement • Optimized statement • Access plan • Extended diagnostic information • Plan details • Objects used in the access plan 13

DB2EXFMT TOOL db2exfmt output sections: • Explain instance:general information (e.g., version, time, schema, etc). • Database context: configuration parameters used during the query compilation. • Package context: query type and optimization level. • Original statement: actual query text. • Optimized statement: rewritten query text according to the applied optimization techniques. 14

DB2EXFMT TOOL Explain output sections (continued): • Access plan: graphical representation of the query execution; displays the order of operations along with rows returned, cost and I/O for each operator. • Extended diagnostic information: warnings, informational or error messages returned by the optimizer. • Plan details: detailed information for each operator including arguments, predicates, estimates (filter factor), and input/output streams. • Objects used in the access plan: any objects (table/index) used by the query, and/or the access plan is displayed on this are which includes statistical information. 15

DB2EXFMT TOOL 16

DB2EXFMT TOOL SET CURRENT EXPLAIN MODE statement • Changes the value of the CURRENT EXPLAIN MODE special register • Compiler returns SQL0217W for a successful compilation of SQL run under the EXPLAIN mode • Syntax: 17

DB2 Cfg’s and Env. Registry Settings • Tuning is usually done by making changes to one of the configuration categories that influence the optimizer behaviour: • Database manager configuration • Database configuration • Query compiler registry variables • CURRENT QUERY OPTIMIZATION special register 18

DB2 Cfg’s and Env. Registry Settings Database manager configuration: • Parallelism (INTRA_PARALLEL): indicates whether intra-partition parallelism is enabled. When YES, some parts of query execution (e.g., index sort) can run in parallel within a single database partition. • CPU Speed (CPUSPEED): the optimizer uses the CPU speed (milliseconds per instruction) to estimate the cost of performing certain operations. • Communications speed (COMM_BANDWIDTH): the optimizer uses the value (megabytes per second) in this parameter to estimate the cost of performing certain operations between the database partition servers in a partitioned database environment. 19

DB2 Cfg’s and Env. Registry Settings Database configuration: • Buffer pools (BUFFPAGE): • Determined by the buffpage parameter, if using buffpage as default for one bufferpool, or a calculation based on the contents of syscat. bufferpools. • The number shown is the total number of bufferpool pages allocated for the database. The table below shows an example of how we arrive at the total of 5000 pages: 20

DB2 Cfg’s and Env. Registry Settings Database configuration (continued): • Sort heap size (SORTHEAP):Defines the maximum number of private memory pages to be used for private sorts or the maximum number of shared memory pages to be used for shared sorts. • Database heap size (DBHEAP): Contains metadata for tables, indexes, table spaces and bufferpools. There is one database heap per database. • Lock list size (LOCKLIST): Indicates the amount of storage allocated to the lock list. There is one lock list per database and it contains the locks held by all applications concurrently connected to the database. • Maximum lock list (MAXLOCKS): Defines a percentage of the lock list held by any one application that must be filled before the database manager performs lock escalation. 21

DB2 Cfg’s and Env. Registry Settings Database configuration (continued): • Average applications (AVG_APPLS):Used by the SQL optimizer to help estimate how much bufferpool will be available at run-time for the access plan chosen (since the bufferpool is shared by all active connections). • Optimization class (DFT_QUERYOPT): Tells the optimizer which level of optimization to use when compiling SQL query (default=3). The higher the level, the more complex algorithms are considered. • Query degree (DFT_DEGREE): Represents the degree of intra-partition parallelism for an SQL statement. If set to ANY, the optimizer is sensitive to the actual number of CPUs that are online. If set to ANY and intra_parallel is enabled, then the number of CPUs on test and production should be configured the same. 22

DB2 Cfg’s and Env. Registry Settings Database configuration (continued): • Number of frequent values retained (NUM_FREQVALUES): Allows you to specify the number of "most frequent values" that will be collected when the WITH DISTRIBUTION option is specified with the RUNSTATS command. • Number of quantiles retained (NUM_QUANTILES): Controls the number of quantiles that will be collected when the WITH DISTRIBUTION option is specified on the RUNSTATS command. 23

DB2 Cfg’s and Env. Registry Settings Query compiler variables (common): • DB2_EXTENDED_OPTIMIZATION: Specifies whether or not the query optimizer uses optimization extensions to improve query performance. • DB2_SORT_AFTER_TQ: Determines how the optimizer works with directed table queues in a partitioned database environment when the receiving end requires the data to be sorted, and the number of receiving nodes is equal to the number of sending nodes. • DB2_INLIST_TO_NLJN: Causes the optimizer to favour nested loop joins to join the list of values, using the table that contributes the IN list as the inner table in the join. * See the DB2 V9.7 Info Center for additional variables (search for “Query compiler variables”) 24

DB2 Cfg’s and Env. Registry Settings CURRENT QUERY OPTIMIZATION special register: • Controls the query optimization class for dynamic SQL statements. • Overrides the dft_queryoptdatabase configuration parameter. • Affects the succeeding SQL statements only for the current session. • Assigned by the SET CURRENT QUERY OPTIMIZATION statement. 25

CATALOG STATISTICS • The optimizer is heavily influenced by statistical information about the size of tables, indexes and statistical views. • The optimizer also uses statistics about the distribution of data in columns of tables, indexes and statistical views, if these columns are used to select rows or to join tables. • The optimizer uses distribution statistics in its calculations to estimate the costs of alternative access plans for each query. • Statistical information is stored in system catalog tables and is accessible via the SYSCAT schema views. 26

CATALOG STATISTICS • Statistical information used by the optimizer: • Cluster ratio of indexes (higher is better) • Number of leaf pages in indexes • Number of table rows that overflow their original pages • Number of filled and empty pages in a table (may indicate if table or index reorganization is needed) • Catalog statistics are collected and updated by the RUNSTATS utility. 27

CATALOG STATISTICS DB2LOOK utility: • Mimic option (-m) extracts catalog statistics for all or given tables. • Helps recreate optimizer behaviour in another database. • DB2LOOK for a specific table: db2look –d <dbname> -a –e –m –z <schema> -t <table1 table2 …> -o <outfile> • DB2LOOK for all tables in the database: db2look –d <dbname> -f –m –l –a –e –o <outfile> • Alternative method: query the catalog tables directly using SYSCAT schema views. 28

CATALOG STATISTICS Catalog views • Value of -1 in the columns indicates no statistics collected. • SYSCAT views are read-only; SYSSTAT views are updateable. 29

RUNSTATS RUNSTATS utility: • Updates statistics about the characteristics of a table and/or associated indexes, or statistical views. • For a table, should be run after frequent updates / inserts / deletes are made to the table or after reorganizing the table. • For a statistical view, RUNSTATS should be run when changes to underlying tables have substantially affected the rows returned by the view. 30

RUNSTATS RUNSTATS best practices: • Use RUNSTATS on all tables and indexes frequently accessed by queries. • Use RUNSTATS with FREQUENT and QUANTILE value statistics for complex queries (only collected when using WITH DISTRIBUTION clause). • Rerun RUNSTATS after a significant change in the table data. 31

RUNSTATS RUNSTATS best practices (continued): Use RUNSTATS in the following situations: • After loading data and indexes have been created. • After creating new index on a table. • After running a REORG utility. • After significant insert, update or delete activity. • Before binding application programs. • After executing REDISTRIBUTE DB PARTITION GROUP command. 32

RUNSTATS RUNSTATS examples (basic statistics): • Collect statistics for table only: RUNSTATS ON TABLE <schema>.<tablename> • Collect statistics for indexes only on a given table: RUNSTATS ON TABLE <schema>.<tablename> FOR INDEXES ALL • Collect statistics for both table and its indexes: RUNSTATS ON TABLE <schema>.<tablename> AND INDEXES ALL 33

RUNSTATS RUNSTATS examples (enhanced statistics): • Collect statistics for table including distribution statistics: RUNSTATS ON TABLE <schema>.<tablename> WITH DISTRIBUTION • Collect statistics for indexes only on a given table including extended index statistics: RUNSTATS ON TABLE <schema>.<tablename> FOR DETAILED INDEXES ALL • Collect statistics for table and its indexes including extended index and distribution statistics: RUNSTATS ON TABLE <schema>.<tablename> WITH DISTRIBUTION AND DETAILED INDEXES ALL 34

Cardinality Estimates Cardinality (CARD) Total number of rows in a table. The number of unique values in a column. Filter factor (FF) The percentage of rows estimated by the optimizer to be allowed to filter through a given predicate. Expressed as probability: 0 – 1.0 (0 to 100%). Cardinality estimate The number of rows estimated by the optimizer to be returned for a given predicate based on its filter factor. Calculated as Filter Factor x Cardinality. 35

Cardinality Estimates Filtering can occur with a local or join predicate Movie_title = ‘RANGO’ (local predicate) a.c1 = b.c2 (join predicate) COLCARD Number of distinct values in the column SYSCAT.COLUMNS VALCOUNT Frequency with which the data value occurs in column SYSCAT.COLDIST 36

Cardinality Estimates 37 Calculating FF for equality predicates (c1=value) No statistics available/collected • FF = default value which is typically ¼ • FF = 0.04 / 0.25 Only basic statistics collected • FF = 1/COLCARD • COLCARD is for the column of the predicate being calculated With basic and distribution statistics collected • FF = VALCOUNT / CARD

Cardinality Estimates 38 Example 1 - No statistics collected: Query: SELECT movie_id FROM movie WHERE movie_genre=‘A’ Statistics: • SYSSTAT.TABLE  CARD = -1 • SYSSTAT.COLUMNS  COLCARD = -1 • SYSSTAT.COLDIST  no data Filter Factor: 0.04

Cardinality Estimates Example 1 - No statistics collected: 2) TBSCAN: (Table Scan) Predicates: ---------- 2) Sargable Predicate, Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 0.04 Predicate Text: -------------- (Q1.MOVIE_GENRE = 'A') 39

Cardinality Estimates 40 Example 2 - Only basic statistics collected: Query: SELECT movie_id FROM movie WHERE movie_genre=‘A’ Statistics: • RUNSTATS ON TABLE example.movie • SYSSTAT.TABLE  CARD = 146 • SYSSTAT.COLUMNS  COLCARD = 6 (for movie_genre col) • SYSSTAT.COLDIST  no data Filter Factor: 1/COLCARD = 1/6 = 0.166667

Cardinality Estimates Example 2 - Only basic statistics collected: 2) TBSCAN: (Table Scan) Predicates: ---------- 2) Sargable Predicate, Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 0.166667 Predicate Text: -------------- (Q1.MOVIE_GENRE = 'A') 41

Cardinality Estimates 42 Example 3 - Basic and distribution statistics collected: Query: SELECT movie_id FROM movie WHERE movie_genre=‘A’ Statistics: • RUNSTATS ON TABLE example.movie WITH DISTRIBUTION • SYSSTAT.TABLE  CARD = 146 • SYSSTAT.COLUMNS  COLCARD = 6 (for movie_genre col) • SYSSTAT.COLDIST  VALCOUNT = 72 (movie_genre=‘A’) Filter Factor: VALCOUNT / CARD = 72/146 = 0.493151

Cardinality Estimates Example 3 - Basic and distribution statistics collected: 2) TBSCAN: (Table Scan) Predicates: ---------- 2) Sargable Predicate, Comparison Operator: Equal (=) Subquery Input Required: No Filter Factor: 0.493151 Predicate Text: -------------- (Q1.MOVIE_GENRE = 'A') 43

Statistical Views • Views with statistics that can be used to improve cardinality estimates for queries in which the view definition overlaps with the query definition. • A powerful feature as it provides the optimizer with accurate statistics for determining cardinality estimates for queries with complex sets of (possibly correlated) predicates involving one or more tables. 44

Statistical Views • Mechanism to provide the optimizer with more sophisticated statistics that represent more complex relationships: • Comparisons involving expressions: price > MSRP + Dealer_markup • Relationships spanning multiple tables: product.name = 'Alloy wheels' and product.key = sales.product_key • Anything other than predicates involving independent attributes and simple comparison operations. 45

Statistical Views • Statistical views are able to represent these types of complex relationships, because statistics are collected on the result set returned by the view, rather than the base tables reference by the view. • When a SQL query is compiled, the optimizer matches the SQL query to the available statistical views. When the optimizer computes cardinality estimates for intermediate result sets, it uses the statistics from the view to compute a better estimate. 46

Statistical Views Usage: • Create a view: CREATE VIEW <view name> AS (<select clause>) • Enable the view for optimization using ALTER VIEW statement: ALTER VIEW <view name> ENABLE QUERY OPTIMIZATION • Issue the RUNSTATS command to populate system catalog tables with statistics for the view: RUNSTATS ON <schema>.<view name> <runstats clause> 47

Optimizer Specific Issues SQL0437W rc=1 Cause : This typically happens when there is not enough statement heap (STMTHEAP) available to complete full dynamic join enumeration plan for the query. Resolving the problem : This message does not necessarily mean the query will run nor there is a problem, but at the sametime it does indicate that the query is potentially using a sub-optimal access plan. In detail, the following are way on how to resolve it. • If the execution of the query at the sametime is bad (when this warning message is returned) then increasing the STMTHEAP would be the ideal solution, take note this may increase compile time. • If the execution time is not the issue but the warning message imposes a problem at application end, application change maybe required to avoid/ignore the warning message. • Binding the package with the ignore warning option can also be another solution. For example: db2 bind <package name> blocking all grant public SQLWARN NO • If this is a cli/odbc application, using the IgnoreWarnList keyword in the db2cli.ini file can be used. For example : IgnoreWarnList = “sqlstate1”, “sqlstate2”,….” The sqlstate corresponding to SQL0437 is 01602. Hence: IgnoreWarnList=“01602” 48

Optimizer Specific Issues Query resulting to trap/abnormal interruption Cause : For query executions resulting to trap the cause can range from a defect within the DB2 code to some limitation within DB2 or OS. Resolving the problem : To begin resolving the issue, collecting the “db2support with optimizer data”, trap file, and identifying the condition for the trap to occur (if possible) would be the first action to take. Succeeding actions can be take to determine the root cause and resolution of the issue… • If possible recreate the issue either on the same environment or on a test, using a mimic database or using the db2look data. • Review the db2diag.log entries and trap file generated relating to issue. This will provide information at which OPTIMIZER codepath the problem occurred. • Review the current fixpak listing from the IBM support fixpak site. Including IBM support knowledge base websites. To check for known issues. • If this is a known issue, either apply the workaround and/or the fixpak. 49

Optimizer Specific Issues Query Performance/Incorrect results Cause : Query performance issue can occur from various reasons, some most common are as follows… • Defect from within the DB2 code. • Changes have occurred within DB2 configurations, DB2 environment variables, query, statistics collection, or data load. After a major release/change/maintenance window. • Data load/operation sequence have changed significantly. • Most times its cause by unexpected change within the environment. • Unknown issue which would require assistance from IBM DB2 support site. • Problem could either be during query compilation or runtime. Resolving the problem : • Determine the conditions that exist when the problem occurs. • Review the current fixpak listing from the IBM support fixpak site. Including IBM support knowledge base websites. To check for known issues. • If this is a known issue, either apply the workaround and/or the fixpak. • Compare previous OPTIMIZER related DB2 setting including access plans, to determine if there is a unexpected/accidental change within the environment. • Review the access plan and check if the statistics are up to date, including if the right options are used for the RUNSTATS command. • Use db2batch command to determine if performance issue is within query compilation or runtime. • For query performance issue caused by query compilation, try lowering OPTLEVEL as a workaround. 50

Query Performance and Optimizer Specific Type of Issues