Advanced Performance Diagnostics for SQL in DB2 9.7

1. Advanced Performance Diagnostics for SQL in DB2 9.7 David Kalmuk, IBM. Platform: DB2 for Linux, UNIX, and Windows. DB2 9.7 presents you with important new features for diagnosing and solving your SQL problems. Through a series of practical examples, this session will introduce you to the latest DB2 capabilities available for SQL diagnostics and show you how you can leverage them to solve query problems in your own environment. You will learn how to use the new "time spent" capabilities to identify that you have an SQL problem on your system, and how to find your most expensive statements using the new package cache table functions. You will also learn how to use the new runtime explain capabilities to perform an explain using the actual compiled sections in the package cache or catalogs. Finally, you will be introduced to the new section actuals feature in DB2 9.7 FP1 that allows you to examine the actual plan cardinalities for your statement executions.

2. Objectives Learn how to leverage the latest DB2 performance monitoring features and time spent metrics to identify that you have an SQL problem. Learn how to pinpoint your most expensive statements using the package cache table functions. Learn about the new runtime explain capabilities and how you can perform an explain from the actual compiled section in the catalogs or the package cache to understand the query plan. Learn how to leverage the new section actuals explain capabilities to examine the actual cardinalities from the execution of your problem queries. Take away practical examples you can try out in your own environment.

3. Agenda A quick review of the new Monitoring Capabilities introduced in DB2 9.7 Time Spent Metrics Identifying query related performance problems Identifying high impact queries Analyzing queries using Time Spent Advanced diagnostics using Runtime Explain and Section Actuals Final thoughts

4. A Quick Review of the New Performance Monitoring Capabilities Introduced in DB2 9.7

5. New Lightweight Monitoring Functions A new set of MON_* SQL functions introduced starting in DB2 9.7. Less impact / more efficient than the snapshot functions: direct in-memory access through trusted routines (not fenced wrappers over snapshot), and much less latch contention. Uses new infrastructure that pushes data up to accumulation points rather than forcing monitor queries to do extensive drilldown: lower CPU consumption, significantly faster response time, and less FCM resource usage / internode traffic. Monitoring data collection carries low overhead and is enabled by default on new databases. In 9.7 we introduced a new monitoring infrastructure that both has less overhead in terms of data collection and provides interfaces that carry significantly less overhead and impact than the SNAP_ class of table functions (which are effectively fenced wrappers over top of snapshot API calls). The key points to take away on the new infrastructure are that: Data is pushed up to accumulation levels incrementally (every 10 seconds) during execution rather than collected and maintained at the individual thread / agent level. Monitor queries need only access the metrics directly at the accumulation level rather than performing extensive drilldown across all agents and applications belonging to the object we are reporting on (as is required by snapshot requests). Unlike snapshot, the table functions allow filtering of data at source, so for focused monitoring on DPF or PureScale we only need to flow back the requested data to the coordinator (the snapshot UDFs required all data to be sent back to the coordinator, with filtering performed there).

6. New Monitoring Perspectives and Dimensions Starting in 9.7, DB2 allows monitoring metrics to be accessed through a number of different dimensions, allowing more effective drilldown and different perspectives on the data to help isolate problems. There are three main dimensions, each consisting of a number of reporting points with corresponding SQL functions. System: provides the total perspective of application work being done by the database system; aggregated through the WLM infrastructure. Data objects: provide the perspective of the impact of all activity occurring within the scope of data objects; aggregated through the data storage infrastructure. Activity: provides the perspective of work being done by specific SQL statements; aggregated through the package cache infrastructure. These three perspectives allow you to answer questions along the lines of: (a) What work is being done by applications or groups of applications, and what is their impact? (b) What activity is occurring on individual data objects like tables, bufferpools, and tablespaces? (c) What is the impact of the execution of individual SQL statements, or tracked items like Load?

7. In-Memory Metrics: System Perspective A request comes into an agent and is processed. The agent gathers the metrics as it is running and at regular intervals propagates them upwards to the different aggregation points in the system hierarchy.

8. Access Points: System Perspective MON_GET_UNIT_OF_WORK, MON_GET_WORKLOAD, MON_GET_CONNECTION, MON_GET_SERVICE_SUBCLASS. Also provided are interfaces that produce XML output: MON_GET_UNIT_OF_WORK_DETAILS, MON_GET_WORKLOAD_DETAILS, MON_GET_CONNECTION_DETAILS, MON_GET_SERVICE_SUBCLASS_DETAILS. We report data at several different aggregation levels. We allow drilldown across connections and the current unit of work. We also now allow monitoring based on WLM objects like workloads and service classes. This gives you a lot of power in terms of monitoring since you can create user-defined groups of applications (workloads) and monitor the impact they are having in terms of the group of resources they are assigned (service classes). The XML interfaces are intended to provide an additional flexible format for the monitoring data, allowing us to easily store the whole set of metrics as a group, and also to perform useful transformations to the data, such as translating to relational format using XMLTABLE for analysis, formatting for display in a browser, or use in one or more of the row based metrics formatters we will show later on. The XML interfaces also provide extra granularity in terms of the time spent metrics in DB2 9.7.
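
A minimal sketch of how these access points might be used (not from the original deck; it assumes the standard MON_GET_SERVICE_SUBCLASS columns such as TOTAL_RQST_TIME, TOTAL_WAIT_TIME, and ACT_COMPLETED_TOTAL available in DB2 9.7):

-- Wait time vs. total request time per service subclass, summed across members
select service_superclass_name, service_subclass_name,
       sum(total_rqst_time) as total_rqst_time,
       sum(total_wait_time) as total_wait_time,
       sum(act_completed_total) as acts_completed
from table(mon_get_service_subclass(null,null,-2)) as t
group by service_superclass_name, service_subclass_name
order by total_rqst_time desc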

9. In-Memory Metrics: Data Object Perspective A request comes into an agent and is processed. As the request is processing, the agent increments the appropriate metrics at the different aggregation points in the object hierarchy. Above we show the core hierarchy of data objects involved in query processing. Other data object interfaces report data that's collected and aggregated independently.

10. Access Points: Data Object Perspective MON_GET_TABLE, MON_GET_INDEX, MON_GET_BUFFERPOOL, MON_GET_TABLESPACE, MON_GET_CONTAINER, MON_GET_EXTENT_MOVEMENT_STATUS, MON_GET_APPL_LOCKWAIT [9.7 FP1], MON_GET_LOCKS [9.7 FP1], MON_GET_FCM [9.7 FP2], MON_GET_FCM_CONNECTION_LIST [9.7 FP2]. The data object interfaces provide aggregate data for various data objects of interest in the system. Index metrics are now available for the first time via an SQL interface as of DB2 9.7. The core set of data objects (table, index, bufferpool, tablespace, container) collect metrics through the common hierarchy shown on the previous slide. The other objects shown in the list above are relatively independent and therefore each have data collected independently of each other.
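
As an illustration (not from the original deck), a query along these lines can rank tables by read activity; it assumes the ROWS_READ and TABLE_SCANS columns of MON_GET_TABLE:

-- Hot tables by rows read, summed across members
select tabschema, tabname,
       sum(rows_read) as rows_read,
       sum(table_scans) as table_scans
from table(mon_get_table(null,null,-2)) as t
group by tabschema, tabname
order by rows_read desc
fetch first 10 rows only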

11. In-Memory Metrics: Activity Perspective A request comes into an agent and is processed. If the request is related to an activity, then the agent gathers the metrics from the start of activity execution and at regular intervals aggregates them in the activity control block. When the activity completes, those activity execution metrics are propagated to the package cache and aggregated under the specific cached section that was executed (static and dynamic SQL).

12. Access Points: Activity Perspective MON_GET_PKG_CACHE_STMT (both static and dynamic SQL), MON_GET_PKG_CACHE_STMT_DETAILS (XML based output), MON_GET_ACTIVITY_DETAILS (XML; details for an activity currently in progress). MON_GET_PKG_CACHE_STMT reports metrics on a per-section basis for all SQL in the package cache. Unlike the existing dynamic SQL snapshot, this includes both dynamic as well as static entries. The package cache metrics for a given statement are updated when the statement completes execution. MON_GET_ACTIVITY_DETAILS is provided to allow the user to obtain metrics for an SQL statement that is currently executing (useful for long running queries in problem determination scenarios).
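
For illustration, a simple query of this interface might look like the following sketch (hedged; it assumes the SECTION_TYPE, NUM_EXECUTIONS, and STMT_EXEC_TIME columns of MON_GET_PKG_CACHE_STMT):

-- Both static ('S') and dynamic ('D') entries, most frequently executed first
select section_type, num_executions, stmt_exec_time,
       substr(stmt_text, 1, 60) as stmt
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
order by num_executions desc
fetch first 10 rows only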

13. Time Spent Metrics

14. Time Spent Metrics A new set of metrics is being introduced into DB2 that represents a breakdown of where time is spent within DB2. It represents the sum of time spent by each agent thread in the system (foreground processing), and provides the user with a relative breakdown of time spent, showing which areas are the most expensive during request / query processing. The metrics are available in both the system and activity perspectives; this presentation will focus on analysis from the activity perspective. They can be used for rapid identification and diagnosis of performance problems. Times are divided into: Wait times (time agent threads spend blocking on I/O, network communications, etc.); Processing times (starting in 9.7 FP1; time spent processing in different component areas when the agent was not stuck on a wait); and Summary / total times (total time spent in a particular component area including both processing and wait times). The time spent metrics represent the sum of the time spent by agent threads processing requests on behalf of client applications. We say they provide a relative breakdown of time spent because the times do not correspond directly to wall clock time (i.e. 10 threads will accumulate 10 seconds worth of time in DB2 for every 1 second of clock time), so the most meaningful way to analyze these is to look at the amount of time spent in a given area relative to the total time spent by all agent threads. In terms of understanding the basic time metrics, you should imagine any given interval of time to be broken into two pieces: the time spent performing processing, and the time spent waiting for some event to occur (the completion of an I/O request, or the obtaining of a lock, for example). Each time metric (with a few exceptions) also has a corresponding count, which indicates the number of times the wait or component metric has occurred. This allows the user to perform calculations such as the average time spent per wait, and tells you if you are dealing with elephants vs. mosquitoes (i.e. a few large occurrences vs. many small occurrences).
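
To make the "elephants vs. mosquitoes" point concrete, here is a hedged sketch (not from the deck) that pairs a wait time with its count to compute an average per occurrence, using lock waits at the connection level; it assumes the LOCK_WAIT_TIME and LOCK_WAITS columns of MON_GET_CONNECTION:

-- A few long lock waits vs. many short ones: compare the average per wait
select application_handle, member, lock_waits, lock_wait_time,
       case when lock_waits > 0
            then lock_wait_time / lock_waits
            else 0 end as avg_lock_wait_ms
from table(mon_get_connection(null,-2)) as t
order by lock_wait_time desc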

15. "Time Spent" Metrics: Breakdown of Wait + Processing Times in DB2 This is a representative picture of how a breakdown of wait + component times can be visualized. The times shown here are a subset of the full set of times available. In the above example you will see that our total time involves both waits on particular areas (I/Os, lock waits) as well as processing in different areas (query compilation, section processing, commit / rollback, and some unknown time).

16. Activity Time Spent Hierarchy Here we show the basic hierarchy of times included in the activity dimension. This is a slightly different hierarchy than we would see when examining the times from the system perspective (there are two general hierarchies: one for the system perspective, and one for the activity perspective). This session will focus on the times within the activity perspective, as these provide a method to look at where the time is spent during actual query execution, excluding one-time setup and other system infrastructure costs such as commit and rollback processing (whereas the system perspective presents the total hierarchy of time spent in the server). The basic activity hierarchy consists of an overall statement execution time, which is then subdivided into both a set of waits and processing times incurred by threads performing work for the statement. One important thing to note is that a given statement or compiled query plan may involve the invocation of an arbitrary number of nested query plans in the case of SQL routines and procedures. In the hierarchy, the stmt_exec_time represents the entire amount of time spent executing the parent statement including any time spent in nested statements. The breakdown of waits and processing times, however, only pertains to the specific statement being executed and not the nested statements it invokes, as the package cache aggregates metrics on a per-section basis (i.e. per executable query plan), and each nested query has a separate executable plan / section. To examine the times within nested statements you need to look at the metrics for those individual statements separately in MON_GET_PKG_CACHE_STMT.

17. Navigating the "time spent" hierarchy The row based formatting functions introduced in 9.7 FP1 offer an easy way to navigate the time spent hierarchy in a generic fashion: MON_FORMAT_XML_TIMES_BY_ROW shows the breakdown of waits + processing times; MON_FORMAT_XML_WAIT_TIMES_BY_ROW shows the breakdown of just wait times; MON_FORMAT_XML_COMPONENT_TIMES_BY_ROW shows the breakdown of processing time as well as the overall time spent in each "component" of DB2; MON_FORMAT_XML_METRICS_BY_ROW outputs all metrics in generic row based form. The row based formatting functions provide an easy to navigate generic format for examining and ranking time spent metrics. These formatters accept as input the XML metrics data returned from any of the monitoring interfaces that produce XML (including event monitors). Three of the formatters produce different views of the time metrics specifically, whereas the fourth is a generic formatter for all the metrics encapsulated in the XML document. The basic usage of these functions can best be illustrated through several examples.

18. Example
select r.metric_name, r.parent_metric_name, r.total_time_value as time, r.count, p.member, s.stmt_text as stmt
from table(mon_get_pkg_cache_stmt_details('S',null,null,-2)) as p,
     table(mon_get_pkg_cache_stmt(null,p.executable_id,null,-2)) as s,
     table(mon_format_xml_times_by_row(p.details)) as r
where s.stmt_text like 'UPDATE stock%'
order by total_time_value desc
The example above obtains the full set of wait and processing times for a specific statement. The query itself obtains a listing of all static statements in the package cache, including fetching the related XML details. We then format the XML using the row based formatter, and qualify the statement we are looking for in the where clause. A few important points to note: The row based format is generic, so that metrics can easily be sorted and identified with no specific knowledge by the application programmer of what the metrics of interest are; this also means no extra code is needed when new metrics are added in the future, as the function schema does not change. Metrics include both the time spent as well as the number of occurrences of a time interval, to allow users to determine the relative impact of the individual intervals as well as the overall impact on the system. The relationship of metrics to each other in the hierarchy is shown via the parent_metric_name field, allowing the user to understand and query relationships without prior knowledge of the hierarchy. This also allows programmatic discovery of the hierarchy in applications if desired. The row based formatters in general will work on the XML output produced by any of the new 9.7 monitoring interfaces or event monitors.

19. Example 2
select r.metric_name, r.parent_metric_name, r.total_time_value as time, r.count, p.member, s.stmt_text as stmt
from table(mon_get_pkg_cache_stmt_details('S',null,null,-2)) as p,
     table(mon_get_pkg_cache_stmt(null,p.executable_id,null,-2)) as s,
     table(mon_format_xml_wait_times_by_row(p.details)) as r
where s.stmt_text like 'UPDATE stock%'
order by total_time_value desc
This example shows a similar query, but this time we are only querying the wait time metrics / wait time hierarchy.

20. Example 3
select r.metric_name, r.parent_metric_name as parent_metric, r.total_time_value as total_time,
       r.proc_metric_name, r.parent_proc_metric_name as parent_proc_metric, r.proc_time_value as proc_time,
       r.count, p.member, s.stmt_text as stmt
from table(mon_get_pkg_cache_stmt_details('S',null,null,-2)) as p,
     table(mon_get_pkg_cache_stmt(null,p.executable_id,null,-2)) as s,
     table(mon_format_xml_component_times_by_row(p.details)) as r
where s.stmt_text like 'UPDATE stock%'
order by total_time_value desc
This example shows us obtaining the times for each individual component area for our statement of interest. This provides a different perspective on where time is spent in the statement, as we are showing the total times broken down by component, and the processing time spent in each individual component (as opposed to the individual processing + wait times). This view can be useful to isolate which components are of interest irrespective of whether the time spent in those components is processing vs. wait time. Both the processing + wait times can be identified within the individual component, but we do not capture the individual waits within a specific component due to memory / performance overhead. Also note that the output above shows two hierarchies: both the relationship of total times to each other, as well as the relationship of processing times to each other (which is a subset of the waits + processing times hierarchy). A few additional points that apply to all these examples: In this example we are filtering by comparing against the statement text. Starting in 9.7 we have further introduced an executable_id identifier that allows you to uniquely identify a section in the package cache. If we had obtained this identifier previously we could have passed it directly as input to MON_GET_PKG_CACHE_STMT_DETAILS in order to filter out the statement we wanted at source (note that in this example this was done in order to retrieve the actual statement text for each statement in the cache). Note here that we used MON_GET_PKG_CACHE_STMT in order to retrieve the statement text for each statement in columnar form. This data is already contained in the details field from MON_GET_PKG_CACHE_STMT_DETAILS, so we could alternately have chosen to extract this using the XMLTABLE function.

21. Identifying Query Related Performance Problems

22. Identifying Query Related Performance Problems Generally we will want to try to identify and address any system level bottlenecks / non query related issues first before proceeding to more granular investigation at the statement or query level. We will look at some initial indicators here to help rule out broad system-wide or non query related problems. This will help us narrow down whether the performance issues we are observing are related to statement execution or plan issues specifically rather than other system bottlenecks, and that tuning individual queries will give us some bang for our buck. The following examples will show some of the basic indicators that may be useful in helping to make this determination. This section might alternately have been titled "looking for system level performance problems before proceeding with query tuning".

23. "How much time on the database is spent doing query processing?" This first example shows us two things: how much of the time on the system is being spent during executions of recognized activities (these would be SQL statements, and load utilities in DB2 9.7 and 9.8), and how much of that time is spent performing actual query plan execution vs. just statement processing (total_section_time vs. total_activity_time). If our system is running well we would expect to see a large percentage of our time spent here, indicating we do not have any unexpected bottlenecks outside of query execution. If, however, we see significant time being spent outside of query execution, it may be an indication that further investigation and tuning needs to be performed at the system level before delving deeper into a query tuning exercise. In our example above we see that the activity time represents about half of the total request time on the server, indicating that we are spending a reasonably large portion of our time outside of activity execution. This tells us that we might want to perform a bit more investigation at the system level before we start chasing individual rogue queries. Activity total time is a superset of section time, and here we also see that all the activity time is being spent performing query plan execution. This indicates that our query performance is purely a factor of the speed at which the query plan executes. If the activity time had been significantly larger than the section time, it would generally be an indication that we are spending a portion of our time in non-SQL routines when executing statements, or that we are running Loads on the system. A quick check of the total_load_time would allow us to confirm that the majority of our time is being spent in statement execution. For this example we used an aggregate of metrics at the connection level to look at the division of time for work done by any connections currently on the system, but it's important to note that we could just as easily have done a similar query at the service class or workload level as well. Another point to note is that by looking at the connection level metrics in this way we are looking at the aggregates since the connections were established. To obtain a more recent data sample we could instead have computed deltas for the metrics over a sampling period. The next example shows one simple way to accomplish this.
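
The deck shows only the query output; a hedged sketch of the kind of connection-level aggregation being described might look like this (it assumes the TOTAL_RQST_TIME, TOTAL_ACT_TIME, TOTAL_SECTION_TIME, and TOTAL_LOAD_TIME columns of MON_GET_CONNECTION):

-- How much of the total request time is activity / section execution?
select sum(total_rqst_time)    as total_rqst_time,
       sum(total_act_time)     as total_act_time,
       sum(total_section_time) as total_section_time,
       sum(total_load_time)    as total_load_time
from table(mon_get_connection(null,-2)) as t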

24. "Are there any system-wide bottlenecks?" You'll immediately notice that this example is fairly busy, but at a high level we are doing only a few simple things. First we have created some simple views to help us obtain a delta for our wait time metrics over a 60 second period. This consists of creating a view that obtains the aggregate metrics in row based form for the database by summing all the service subclasses, creating a temporary table where we can cache our initial sample, and creating a third view that will compute a delta for our metrics from the baseline we've stored. Note that this example shows one key benefit of the row based time metrics format, which is that it allows deltas to be computed generically without knowledge of the specific metrics returned.

25. (continued) To obtain our sample we then simply cache a baseline of our database metrics, wait for the sampling period, and then select the delta values from our delta view (in this case we select the top 5 times only in order to fit onto a single slide). Looking at the results, what we see is that we have some significant wait times to investigate. The first wait we need to address is client idle wait time: this time represents both the think time on the client side between requests, and also the time spent processing results sent from the server to the client. In this case we were running an OLTP workload, and the data the queries were processing was comparable to the results returned to the client side, so as long as our applications are not experiencing response time problems, we should be able to safely ignore this time as normal for this case. Note also that although we didn't cover the system time spent hierarchy in this session, client_idle_wait_time is actually not included as a component of total_wait_time since it is not incurred during request processing. Notice also from the output that client_idle_wait_time is considered part of neither total_rqst_time nor total_wait_time for this reason. Moving on, we can see that our request time appears to be basically 50% wait time and 50% section processing, with all the wait time being log disk wait time, indicating it's most likely time spent performing commit related processing. In this case, though, we are running a heavy OLTP workload with frequent commits and not very powerful disks, so once again for our environment this level of waits is expected. Having ruled out these two waits as being indicative of a problem, the majority of our time spent is left in section / query processing. This gives us further confidence that there are no general system-wide issues affecting our query performance and that our investigation should now center on what the queries themselves are doing. As a counter-example, if we had determined instead that most of our time was going towards query compilation, it would have been a strong indicator that the issue was not related to query execution itself, and that would have led our investigation towards package cache tuning and/or excessive usage of literals in the SQL from certain applications / enablement of the statement concentrator, rather than looking at the query plans.
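
The actual view definitions are not reproduced in the deck; the following is a hedged sketch of the approach described above. Object names such as db_time_metrics are hypothetical, and a regular table is used here in place of the temporary table mentioned in the notes:

-- 1) Aggregate the row-based time metrics across all service subclasses
create view db_time_metrics (metric_name, parent_metric_name, time_value, metric_count) as
  select r.metric_name, r.parent_metric_name,
         sum(r.total_time_value), sum(r.count)
  from table(mon_get_service_subclass_details(null,null,-2)) as s,
       table(mon_format_xml_times_by_row(s.details)) as r
  group by r.metric_name, r.parent_metric_name;

-- 2) A table to hold the baseline sample
create table db_time_metrics_baseline as (select * from db_time_metrics) with no data;

-- 3) A view that computes the delta from the baseline
create view db_time_metrics_delta (metric_name, parent_metric_name, time_value, metric_count) as
  select c.metric_name, c.parent_metric_name,
         c.time_value - b.time_value, c.metric_count - b.metric_count
  from db_time_metrics c join db_time_metrics_baseline b
       on c.metric_name = b.metric_name;

-- Cache the baseline, wait ~60 seconds, then read the top deltas
insert into db_time_metrics_baseline select * from db_time_metrics;
-- ... wait for the sampling period ...
select metric_name, parent_metric_name, time_value, metric_count
from db_time_metrics_delta
order by time_value desc
fetch first 5 rows only;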

26. Identifying High Impact / Expensive Queries

27. Identifying Problem Queries Once we've ruled out general system-wide bottlenecks that might be impacting our performance, we'll want to identify the highest impact / worst performing queries on the system as possible candidates for tuning. Time spent metrics can be used to drill down in a number of different useful ways in order to help identify high impact queries. Among the many things we can look at are: queries with the most execution time in the server, queries with the worst velocity in terms of processing vs. wait, CPU intensive queries, and least efficient query plans. The following examples will show you several ways you can leverage the time spent metrics to drill down and identify your problem queries.

28. Finding High Impact Queries The next several examples show a few basic "Top N" type queries for statements in the package cache using time spent metrics and the MON_GET_PKG_CACHE_STMT table function. It's important to note that this function will show data for both static and dynamic statements unless specific filtering is performed, and therefore offers some significant advantages over the older dynamic SQL snapshot (it is also significantly more lightweight). The first example finds the 5 statements with the most request time in the server; these represent the statements that have spent the longest amount of time on the server in comparison to other statements. Note that the way we have constructed the example, this may show statements with long response times, or statements executed many, many times. Refinements can be made using the num_executions count to separate these cases.
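
The deck shows only the output; a hedged sketch of the first "Top N" query it describes could be written as follows (column names as in MON_GET_PKG_CACHE_STMT):

-- Top 5 statements by total time spent in the server
select executable_id, num_executions, stmt_exec_time,
       substr(stmt_text, 1, 60) as stmt
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
order by stmt_exec_time desc
fetch first 5 rows only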

29. More High Impact Queries The second example here ranks statements in terms of CPU time consumption. This shows the most CPU intensive statements on the database and can be very useful to help determine who the biggest contributors to CPU bottlenecks are. The third example ranks statements by the amount of time they have spent performing I/O operations (including both bufferpool and direct read / read LOB activity). This will show the statements that are driving the most I/O activity in the server. Similar queries can be constructed based on any of the available metrics, but the above represent some of the more common ones you might be interested in.
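
Again the actual SQL is not in the deck text; hedged sketches of the second and third examples might look like the following (they assume the TOTAL_CPU_TIME and the bufferpool / direct I/O time columns of MON_GET_PKG_CACHE_STMT):

-- Top 5 statements by CPU time
select total_cpu_time, num_executions, substr(stmt_text, 1, 60) as stmt
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
order by total_cpu_time desc
fetch first 5 rows only;

-- Top 5 statements by time spent in I/O (bufferpool + direct)
select pool_read_time + pool_write_time +
       direct_read_time + direct_write_time as io_wait_time,
       num_executions, substr(stmt_text, 1, 60) as stmt
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
order by io_wait_time desc
fetch first 5 rows only;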

30. Queries with the Worst Relative Velocity This example shows another useful metric we can compute with time spent: the relative velocity. Here we are looking at the relative velocity at the statement level. The relative velocity shows the degree of progress of a statement's execution relative to the progress it would make if it was not spending any time blocking / waiting on subsystems. In the example above we have selected the 5 statements with the worst relative velocities. You can see that only 1-2% of the request time on the server for these statements is actually spent making progress! Also keep in mind that our computation is based on total_act_time, which includes only the time spent actually executing the statement (and not other costs such as compilation / optimization). So this is specifically reflecting the cost of subsystem waits within the statement execution (such as I/O operations, lock waits, etc.). A logical next step, if we wanted to drill down further to understand the execution profile of the statements, would be to turn around and rank the individual wait and component times for each statement of interest. We will look at this in the next section.
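
A hedged sketch of the relative velocity computation described above, using total_act_time and total_act_wait_time from MON_GET_PKG_CACHE_STMT:

-- Relative velocity: percentage of activity time spent actually processing
select ((total_act_time - total_act_wait_time) * 100) / total_act_time
         as relative_velocity_pct,
       num_executions, substr(stmt_text, 1, 60) as stmt
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
where total_act_time > 0
order by relative_velocity_pct asc
fetch first 5 rows only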

31. Queries with the Least Efficient Plans Another common query we can perform using MON_GET_PKG_CACHE_STMT is to look at the ratio of rows read vs. rows returned for a given statement. This is an excellent indication of index effectiveness in our queries. A significantly high ratio of rows read for every row returned tells us that we are doing extensive table scanning, and that we should probably investigate the tables underlying the query for possible missing indexes (depending on how frequently the query is executed). Also, by keeping track of such data for our workloads over time we can detect problematic changes in existing queries (such as a dropped index, for example, suddenly leading to a degradation in the ratio). In the example above we have selected the top 10 SQL statements in the package cache with the worst ratio of rows read to rows returned. The values shown indicate significant inefficiency in two of our queries that would warrant further investigation. In the first one, though, we appear to be doing a count(*), which would account for the table scan. In the second one, however, we are selecting the minimum value for a particular column; this could be sped up significantly by an index, so it would be worth investigating whether this is a query that is executed on a regular basis, which might justify such a change (or whether there was already supposed to be an index on this column). We may not always be able to get to the root cause of the problem in such an easy manner, but this type of query can be very effective as a first line of investigation.
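
A hedged sketch of the rows read vs. rows returned ranking (it assumes the ROWS_READ and ROWS_RETURNED columns of MON_GET_PKG_CACHE_STMT):

-- Statements reading the most rows per row returned
select rows_read / rows_returned as rows_read_per_returned,
       rows_read, rows_returned, num_executions,
       substr(stmt_text, 1, 60) as stmt
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
where rows_returned > 0
order by rows_read_per_returned desc
fetch first 10 rows only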

32. The "Executable ID" In order to facilitate tracking of individual statements, DB2 9.7 introduced the new concept of an "executable ID": a short identifier that uniquely identifies a section in the package cache (not an individual execution, but an individual statement in a particular compilation environment). It can be obtained from activity monitoring interfaces such as MON_GET_PKG_CACHE_STMT / MON_GET_PKG_CACHE_STMT_DETAILS, MON_GET_ACTIVITY_DETAILS, or the package cache and activity event monitors. It can subsequently be used as input to the package cache interfaces to retrieve data for that particular section without the statement text. In the next section we will leverage the executable ID that uniquely identifies our statements of interest in order to perform further drilldown. In our previous examples using MON_GET_PKG_CACHE_STMT we could have trivially extracted the executable_id column in addition to the statement text, which gives us an efficient way to identify or reference that particular statement in future queries (or for joins).

33. Analyzing Individual Statements Using Time Spent

34. Analyzing Individual Queries Using Time Spent Once we have pinpointed our statements of interest, our next step is to drill down into these individual statements to understand where they are spending their time By understanding where the time is being spent in the query we can identify where the database server is spending effort, and look for opportunities for tuning Using the executable id from problem statements identified in the previous section we can now perform an analysis of the time spent breakdown for those statements

35. "Where is my time being spent?"
select p.executable_id, r.metric_name, r.parent_metric_name, r.total_time_value as time, r.count, p.member
from (select stmt_exec_time, executable_id
      from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as s
      order by stmt_exec_time desc fetch first row only) as stmts,
     table(mon_get_pkg_cache_stmt_details(null, stmts.executable_id, null, -2)) as p,
     table(mon_format_xml_times_by_row(p.details)) as r
order by stmts.executable_id, total_time_value desc
Here we query the breakdown of wait and processing times for a statement of interest identified by one of our previous examples to see where we are spending our time. Passing the executable ID for the statement as input to the package cache table function causes us to filter only the statement of interest at source (particularly useful on DPF or PureScale configurations), and is a very efficient way to retrieve these metrics. This statement is really two queries combined into one: we combine the query to find the statement with the most time in the server with passing the executable ID from it directly as input into the mon_get_pkg_cache_stmt_details function to obtain the metrics for that statement.

36. (continued) From the actual output we can see that our statement spends a significant amount of time in non-section processing, and specifically in routine user code processing. This is an indication that the bulk of the statement's cost comes from code the statement executes outside of DB2 (generally user-defined stored procedures or UDFs). Depending on whether the cost of routines is expected to be this high for the statement, this may warrant an investigation into the implementation of the routines themselves. We can also see from the example that the statement has only one wait time, spending a modest percentage of its overall execution time blocking on I/Os. This means the query has been able to do its processing without blocking on lock waits or on other subsystems. An additional note on routine times: for performance reasons, TOTAL_ROUTINE_USER_CODE_PROC_TIME is not collected for NO SQL routines by default. If an investigation into routines is to be performed, it is therefore recommended that the mon_act_metrics database configuration parameter (or the COLLECT ACTIVITY METRICS clause for the associated workload object) be set to EXTENDED.
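For reference, a minimal sketch of how the EXTENDED collection level might be enabled; the database name MYDB and workload name MYWORKLOAD are placeholders, not from the slides, so substitute your own objects and verify the syntax in your environment.

-- Enable extended activity metrics database-wide (placeholder database name).
UPDATE DB CFG FOR MYDB USING mon_act_metrics EXTENDED;

-- Or enable it only for a specific workload (placeholder workload name).
ALTER WORKLOAD MYWORKLOAD COLLECT ACTIVITY METRICS EXTENDED;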

37. Understanding the Time Metrics: Wait Times Wait times represent the time an agent spends blocking / waiting on particular subsystems. A large percentage of wait time in a particular subsystem may indicate that it is a bottleneck and is preventing agents working on statement processing from making optimal progress. In some cases the presence of large wait times may indicate a potential query problem; for example, larger than expected bufferpool read time resulting from unexpected table scan activity might indicate a missing index rather than an I/O problem (though it might also be an indication of a bad disk). In other cases just the presence of a wait time on its own may indicate a problem; for example, direct reads / writes when we expect LOBs to be inlined, or FCM table queue waits for a query that is expected to be co-located on a partitioned system, would directly indicate a problem. A surface discussion is provided here on the characteristics of wait times and processing times in the activity / statement dimension to give further background on time spent. Due to time limitations we will not go into detail on the individual metrics; however, a table of the metrics in the activity hierarchy is included in the presentation for further reference.
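If one wants to see only the wait-time portion of the breakdown, a variation on the query from the earlier slide could use the row-based wait time formatter instead of the full times formatter; this is a sketch under the assumption that MON_FORMAT_XML_WAIT_TIMES_BY_ROW returns the same metric name / value / count columns as the other 9.7 FP1 row-based formatters.

-- Sketch: wait-time breakdown only, for the most expensive statement.
select r.metric_name, r.total_time_value as wait_time, r.count, p.member
from (select stmt_exec_time, executable_id
      from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as s
      order by stmt_exec_time desc
      fetch first row only) as stmts,
     table(mon_get_pkg_cache_stmt_details(null, stmts.executable_id, null, -2)) as p,
     table(mon_format_xml_wait_times_by_row(p.details)) as r
order by r.total_time_value desc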

38. Wait Times (Activity) The table above gives a brief cheat sheet on the major wait times added in 9.7 related to query execution and their interpretation. Most represent subsystem waits where tuning actions may be taken; others help complete the picture of where server threads spend their time waiting.

39. More Wait Times (Activity)

40. Understanding the Time Metrics: Processing / Component Times Processing / component times represent the time an agent thread spends processing (consuming CPU) in a particular component area of DB2. They are broken down into two pieces: the processing portion within the component, as well as the total time in the component. A large percentage of processing time in a particular component indicates that most of our processing is taking place there. Whether this is an issue depends on what the component is; for example, large times in sort might be expected for a particular query, whereas their presence in others might indicate a problem.
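As an illustration of the processing-versus-total distinction, a sketch like the following could compare the section sort component times for the top statements; the column names are taken from the MON_GET_PKG_CACHE_STMT documentation as best recalled, and the interpretation in the comment is an assumption to validate against your own data.

-- Sketch: compare total vs. processing time in the section sort component.
-- A large gap between the two suggests time in sort was spent waiting on
-- some subsystem rather than consuming CPU.
select substr(stmt_text, 1, 50) as stmt_text,
       total_section_sort_time,
       total_section_sort_proc_time,
       total_section_sort_time - total_section_sort_proc_time as sort_wait_portion
from table(mon_get_pkg_cache_stmt(null, null, null, -2)) as t
order by total_section_sort_time desc
fetch first 5 rows only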

41. Processing/Component Times (Activity) The table above gives a brief cheat sheet on the major component areas in 9.7 related to query execution and their interpretation. The components represent general areas in the query execution pipeline where we spend our time. Spending significant time in unexpected component areas may indicate unexpected or inefficient application behavior on the system. By comparing the processing time to the total time spent in a component, one can also differentiate whether the time is actually spent processing in that component, or whether it is spent waiting on some subsystem while in that component area. For the latter, one would typically switch to an examination of the specific wait times in the different dimensions to see which subsystems might be a bottleneck and offer opportunities for tuning.

42. Other Important Time Metrics A few time (and time-related) metrics bear mention here: STMT_EXEC_TIME, the total time spent executing a statement (including the time spent in any nested invocations); TOTAL_ACT_WAIT_TIME, the total time spent blocking on waits while executing this particular section (it does not include wait times for nested invocations, which are reported in the section entries for those individual statements); TOTAL_ROUTINE_TIME, the total amount of time the statement spent executing within routines (UDFs or stored procedures); and TOTAL_CPU_TIME, which is not a metric in the actual "time spent" hierarchy, but is generally a critical time-related metric. These are general summary metrics not found in the summary-of-metrics appendix of this presentation; the first three represent summary times within the time spent hierarchy, while the fourth is an important general metric in its own right. STMT_EXEC_TIME represents the total time spent by all foreground / agent threads in the engine working on behalf of the statement in question, and therefore represents the net effort expended by DB2 towards executing this statement. Note that while the section execution metrics cover individual sections only (and therefore don't include the cost of nested SQL statements), STMT_EXEC_TIME specifically *does* include the time spent executing nested SQL in order to provide an indication of the aggregate cost of the statement. TOTAL_ACT_WAIT_TIME represents the total time threads spent blocking on subsystems within the section, and gives an indication of how efficiently the section processing is proceeding. Computing (TOTAL_ACTIVITY_TIME - TOTAL_ACT_WAIT_TIME) gives you the total amount of time threads were eligible to process or consume CPU during the section execution. This is a proxy for the CPU time consumed by agents on the server, but is not exactly the same thing, because it measures the time spent both when threads are consuming CPU and when they are queued to run on the CPU. If the actual CPU consumption is desired, the TOTAL_CPU_TIME metric indicates the actual CPU consumed by agent threads when performing statement processing. Note that obtaining CPU time from the system is performance intensive, so CPU times are provided only in aggregate rather than per component.
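A small sketch of how these summary metrics might be pulled together from the package cache; the column names (including total_act_time for the total activity time element) are assumptions based on the 9.7 monitoring documentation, and the derived "processing-eligible time" expression simply follows the subtraction described above.

-- Sketch: summary time metrics for the top statements by execution time.
-- total_act_time - total_act_wait_time approximates the time threads were
-- eligible to consume CPU; total_cpu_time is the actual CPU consumed.
select substr(stmt_text, 1, 50) as stmt_text,
       stmt_exec_time,
       total_routine_time,
       total_act_wait_time,
       total_act_time - total_act_wait_time as proc_eligible_time,
       total_cpu_time
from table(mon_get_pkg_cache_stmt(null, null, null, -2)) as t
order by stmt_exec_time desc
fetch first 5 rows only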

43. Advanced Diagnostics Using Runtime Explain and Section Actuals

44. Introducing Runtime Explain There may be cases where a more detailed analysis of query execution is required than basic monitoring metrics such as time spent can provide. In these cases the tool we typically turn to is the EXPLAIN feature of DB2, which we will refer to here as the SQL compiler EXPLAIN. This capability compiles an input SQL statement and allows you to format and view the query plan. It is expected to be a generally accurate approximation of the query you actually ran, but it may differ due to differences in the compilation environment and/or table statistics from when your query was compiled. An exciting new feature introduced in DB2 9.7 FP1 is the ability to perform a "runtime explain" (or "explain from section"), which produces explain output directly from a compiled query plan (or section) in the engine, allowing you to obtain the plan from the actual section you are executing. The key aspect of runtime explain is that it allows an explain to be performed against the actual executable byte code, or "section", that the compiler / optimizer produces when building a query plan. This guarantees the explain you are seeing matches the actual plan used to execute your statement, and it also opens up some interesting capabilities in terms of integrating runtime data with explains, which we will discuss in a few slides.

45. Explain from Section Procedures A set of stored procedures provided in DB2 9.7 FP1 allow you to explain a runtime section into the explain tables: EXPLAIN_FROM_CATALOG, EXPLAIN_FROM_SECTION, EXPLAIN_FROM_ACTIVITY, and EXPLAIN_FROM_DATA. The explain table content can then be processed using any existing explain tools (e.g. db2exfmt). Explain output can be generated from the following sources: static or dynamic statement entries in the package cache; any cache entry (DETAILED) captured by the new package cache event monitor; a static statement from the catalog tables; or a statement execution captured with its section by the activity event monitor. These runtime explain capabilities can be accessed via one of four stored procedures, which allow you to explain section data from multiple sources and store the results in the standard explain tables. EXPLAIN_FROM_CATALOG allows you to perform an explain on a section in a previously compiled static package stored in the database catalog tables. EXPLAIN_FROM_SECTION allows you to perform an explain on a section found in the current package cache, or from the historical package cache table data collected by a package cache event monitor (when detailed data is collected). EXPLAIN_FROM_ACTIVITY allows you to perform an explain on a section captured along with a specific execution of an SQL statement from table data collected using an activity event monitor. EXPLAIN_FROM_DATA provides an alternate formatting mechanism that takes a generic binary section object from an unspecified source and formats it; the binary section data can be obtained, for example, from the catalogs, the package cache, or activity event monitor tables. With this function you have the flexibility to extract and store this data elsewhere for future use without needing to keep the original event monitors around.
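As a sketch of what invoking one of these procedures might look like, the following assumes a statement still resident in the package cache, identified by a placeholder executable ID obtained earlier from MON_GET_PKG_CACHE_STMT; the 'M' source type is assumed to indicate the in-memory package cache, and the exact parameter list should be verified against the 9.7 FP1 documentation before use.

-- Sketch: explain a section directly from the package cache.
-- The hex literal is a placeholder executable ID; the five trailing
-- parameter markers are the procedure's output parameters.
CALL EXPLAIN_FROM_SECTION(
       x'0100000000000000070000000000000000000000020020101108135629359000',
       'M', NULL, 0, NULL, ?, ?, ?, ?, ?)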

46. Section Actuals One of the key benefits of the explain from section capability is the ability to capture and format "section actuals". All explain output contains cardinality estimates for the individual operators in the plan; when a section-based explain is performed on a statement captured by the activity event monitor, we can also capture the actual cardinalities for each operator for that execution. Examining this data gives you an indication of what actually happened during the query execution, and whether the estimates the optimizer used in generating the query plan were accurate. In order to examine the actuals we will need to capture the execution of our SQL statement of interest using the activity event monitor. The section actuals, or runtime statistics, capability is one of the key benefits of leveraging runtime explain: we can integrate the query plan information provided by explain with actual statistics collected at the operator level. In DB2 9.7 FP1 we specifically introduced the ability to capture cardinalities for individual operators within the plan for comparison with the estimates. Future versions will be able to leverage this infrastructure to provide additional detailed metrics within the plan.

47. Capturing Activities to Obtain Actuals The activity event monitor in DB2 allows the capture of execution details for individual SQL statements as well as several other recognized activities (e.g. LOAD). It can be configured to capture a variety of different metrics as well as the section data. Starting in DB2 9.7 FP1, the section data captured by the activity event monitor has been enhanced to include section actuals. Since the capture of individual activities is quite granular, we offer a fair degree of flexibility with the following data capture options: capture data for all activities running in a particular WLM workload; capture data for all activities running in a particular WLM service class; or capture data for activities that violate a particular WLM threshold. We can also enable the capture of activities run by a specific application using the WLM_SET_CONN_ENV procedure first introduced in DB2 9.7 FP2. Our final example will demonstrate how to capture a statement of interest using the activity event monitor and then obtain the section actuals using the new explain capabilities in DB2 9.7 FP1.

48. Step I: Required Setup Steps

create event monitor actEvmon for activities
  write to table
    activity     (table activity     in monitorTBS),
    activityvals (table activityvals in monitorTBS),
    activitystmt (table activitystmt in monitorTBS),
    control      (table control      in monitorTBS)
  manualstart

In order to capture and examine section actuals for statements we first need to do a bit of setup. If we haven't already created explain tables in our database, we need to execute the DDL provided with DB2 to set these up; these are the output tables where the explain data produced by the new explain procedures (as well as the compiler-based EXPLAIN) is written. We then need to create an activity event monitor to capture our statements and their associated section data; in this case we assume the user already has a specific tablespace set aside to collect this data. A separate tablespace is advisable in cases where this event monitor is used heavily, since by capturing significant information at the statement level it may generate large volumes of data. We have also made this a MANUALSTART event monitor so that we can enable it for periods of SQL tracing without the DBMS implicitly enabling it for us after a database restart.
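For reference, a sketch of the explain-table setup mentioned above; treat the script path and procedure parameters as assumptions to verify in your environment.

-- Option 1: run the DDL script shipped with DB2 (typical path; verify locally):
--   db2 connect to sample
--   db2 -tf ~/sqllib/misc/EXPLAIN.DDL

-- Option 2: use the SYSINSTALLOBJECTS procedure to create the explain tables
-- in the default schema and tablespace.
CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C',
       CAST(NULL AS VARCHAR(128)), CAST(NULL AS VARCHAR(128)))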

49. Step II: Capturing the Activity Data We now need to actually capture the execution of our SQL statement(s) of interest using the activity event monitor. In our case we will use the activity event monitor to capture what is effectively an "SQL trace" of statements run over a brief period of time. We'll show two approaches you can use to accomplish this. The first, shown here, involves a case where we want to run a specific piece of SQL and capture the actuals for it. To accomplish this there are several steps to follow. We first enable the event monitor; this will cause it to start capturing the activity data that is collected on the system. Second, we use WLM_SET_CONN_ENV to update the attributes for our connection; we are specifying three things here: (a) collectactdata indicates that each activity that executes should collect both details and section data to the event monitor (the section data being the key element we'll need to obtain the actuals); (b) we specify that collection will occur on all partitions (in our case we are running on a single-node instance so this parameter is superfluous, but if we were running on DPF we would want to ensure the actuals are collected at each partition, since the data is collected on a per-partition basis); and (c) we enable the collection of section actuals so that the cardinality values are actually captured in memory during query execution at runtime. Once these steps are done, any statements executed on this connection will be captured by the activity event monitor, so we now run the statement we are interested in capturing. When we have run our statement of interest, we call WLM_SET_CONN_ENV once again to disable the collection of activity data and of the section actuals on this connection. If we do not intend to use the event monitor further, we would also shut it down.
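Since the command text from the original slide is not reproduced here, the following is a sketch of what these steps typically look like; the XML tag names and values follow the 9.7 FP1/FP2 documentation as best recalled, so verify them before use.

-- 1. Start the activity event monitor created in Step I.
SET EVENT MONITOR actEvmon STATE 1;

-- 2. Enable capture on this connection: details plus section data,
--    on all partitions, with section actuals collected.
CALL WLM_SET_CONN_ENV(NULL,
  '<collectactdata>WITH DETAILS, SECTION</collectactdata>
   <collectactpartition>ALL</collectactpartition>
   <collectsectionactuals>BASE</collectsectionactuals>');

-- 3. Run the statement of interest on this same connection.

-- 4. Turn collection back off for this connection.
CALL WLM_SET_CONN_ENV(NULL,
  '<collectactdata>NONE</collectactdata>
   <collectsectionactuals>NONE</collectsectionactuals>');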

50. Step II: Another Approach The second approach covers the case where you want to capture data for statements of interest that others are running on the database system, as opposed to statements you run directly on your own connection. This approach would be appropriate if you were troubleshooting a live system, or if you were testing a workload where it would be difficult to enable collection on an individual connection basis. The steps we'd take for this approach are the following. As before, we must first enable the event monitor so that it will capture activity data. The next step is to enable the collection of section actuals on the system; in this case we assume we do not know which connections will be running the statements of interest, so we enable this for the entire database. The final step is to enable collection of the appropriate activity data events; once again we enable collection of data with details and with the section. Here we have enabled this collection for the entire sysdefaultuserclass for simplicity (which encompasses the entire user workload unless customized WLM service classes are configured). In practice it may be more desirable to enable the collection for a custom-defined workload or service class, as this allows the volume of data captured to be better managed. Once these steps are done, any statements executed in the default service class will be captured by the activity event monitor. If we are manually running the test we would run the workload now; otherwise we must wait for the event of interest to occur. Once the statement(s) of interest have completed, we alter the sysdefaultsubclass to stop collecting activity data. If we do not intend to use the event monitor or section actuals further, we would also shut down the event monitor and turn off the collection of actuals.
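A sketch of the system-wide variant, under the assumption that the section_actuals database configuration parameter and the WITH DETAILS,SECTION collection clause behave as described in the 9.7 FP1 documentation; the database name SAMPLE is a placeholder.

-- 1. Start the activity event monitor.
SET EVENT MONITOR actEvmon STATE 1;

-- 2. Enable section actuals collection database-wide (placeholder db name).
UPDATE DB CFG FOR SAMPLE USING section_actuals BASE;

-- 3. Collect activity data (with details and section) for everything
--    running in the default user service class.
ALTER SERVICE CLASS SYSDEFAULTSUBCLASS UNDER SYSDEFAULTUSERCLASS
  COLLECT ACTIVITY DATA ON ALL DATABASE PARTITIONS WITH DETAILS,SECTION;

-- ... run or wait for the workload of interest ...

-- 4. Stop collecting when done.
ALTER SERVICE CLASS SYSDEFAULTSUBCLASS UNDER SYSDEFAULTUSERCLASS
  COLLECT ACTIVITY DATA NONE;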

51. Step III: Locating the Activity of Interest

select a.appl_id, a.uow_id, a.activity_id, a.appl_name,
       s.executable_id, s.stmt_text
from activity_actevmon as a,
     activitystmt_actevmon as s
where a.appl_id = s.appl_id
  and a.uow_id = s.uow_id
  and a.activity_id = s.activity_id
  and s.stmt_text like 'select t1.ident, sum(t1.data)%'

Having captured the statements of interest, we now need to identify the captured activity entries and their IDs in order to perform an explain and examine the section actuals. Depending on the scenario in question there are multiple ways we can locate the activity of interest; we will show two methods here. In the example above we select the unique set of activity identifiers (the application ID, UOW ID, and activity ID) for all entries in the activity tables that match the statement text of our statement of interest. In this particular case there is only one entry for the statement we captured, so we can note the identifiers for subsequent use in generating the explain. A variation on this approach would be to set a specific client info field, such as the application name or accounting string, when executing the query, and use the presence of those tags to find the query of interest.
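The client-info variation mentioned above might look something like the following sketch; the tag value is a placeholder, and the idea is simply to filter the captured activity rows on that tag rather than on the statement text.

-- Sketch: tag the connection before running the statement of interest so
-- the captured activity can later be found by its client application name.
-- (The application name value is a placeholder.)
CALL WLM_SET_CLIENT_INFO(NULL, NULL, 'ACTUALS_TEST_APP', NULL, NULL);

-- ... run the statement of interest on this connection ...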

52. Step III: Another Approach

select a.appl_id, a.uow_id, a.activity_id, a.appl_name,
       r.value as total_cpu_time, s.executable_id, s.stmt_text
from activity_actevmon as a,
     activitystmt_actevmon as s,
     table(mon_format_xml_metrics_by_row(a.details_xml)) as r
where a.appl_id = s.appl_id
  and a.uow_id = s.uow_id
  and a.activity_id = s.activity_id
  and r.metric_name = 'TOTAL_CPU_TIME'
order by total_cpu_time desc
fetch first 5 rows only

An alternate way to approach the investigation is to capture a trace of the SQL running on the system over a period of time using the activity event monitor, and then pinpoint the statements of interest using the associated captured metrics. Note that the activity event monitor captures all of the time spent and related metrics we have previously been examining and stores them in XML form, so they can easily be accessed using the previously demonstrated row-based formatters. In the example above we again select the unique set of activity identifiers (the application ID, UOW ID, and activity ID), but this time we use the generic row-based metrics formatter on the captured metrics XML document to obtain the TOTAL_CPU_TIME metric for each statement captured. We then rank by CPU time and obtain the top five most expensive statements captured in terms of CPU consumed. Note that we also display the statement text as well as the executable ID for illustrative purposes. Yet another way to locate our activity of interest would be to find our statement of interest using MON_GET_PKG_CACHE_STMT and, assuming we had captured activity history data over an appropriate period, use the executable ID from that statement to locate the captured activity entry (and many further variations exist).

53. Step IV: Performing and Formatting the Explain from Section Now that we've captured and identified the activity of interest, we can proceed to format it to obtain the explain output with section actuals. We have already created the explain tables, so we can call the EXPLAIN_FROM_ACTIVITY procedure to generate the explain data. We pass in the activity identifiers we collected in the previous step as well as the name of the activity event monitor. The procedure will look up the event monitor tables and search for the relevant activity and section data; if any of the required data cannot be found, an error is returned. Assuming we performed all the previous steps properly, the procedure will indicate that the formatting succeeded. The final step in the process is to use the db2exfmt tool to format the generated data in the explain tables into a human-readable text report. We show the default options here required to format the most recent explain captured on our database (here we are just using the "sample" database).
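A sketch of what this step might look like; the application ID, UOW ID, and activity ID are placeholder values standing in for those noted in Step III, the five trailing markers are the procedure's output parameters, and the db2exfmt flags shown are the usual defaults for formatting the most recent explain (verify the parameter list in the 9.7 FP1 documentation).

-- Explain the captured activity into the explain tables
-- (identifier values are placeholders; ACTEVMON is the event monitor name).
CALL EXPLAIN_FROM_ACTIVITY('*LOCAL.DB2.100203191825', 1, 1,
                           'ACTEVMON', NULL, ?, ?, ?, ?, ?);

-- Then format the explain tables into a readable report from the shell:
--   db2exfmt -d sample -1 -o exfmt_actuals.txt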

54. Step V: Examining the Query Plan Graph

Access Plan:
-----------
        Total Cost:     30.8779
        Query Degree:   1

              Rows
           Rows Actual
             RETURN
             (   1)
              Cost
               I/O
                |
                 1
                 1
              GRPBY
              (   2)
             30.8423
                NA
                |

There are several parts to the formatted explain output that go beyond the scope of this presentation; for our purposes here we will simply focus on the plan graph itself. If you are familiar with the explain facility, you will see that this appears to be a standard explain graph with one specific exception: since this was an explain from section on an activity event monitor, and we had captured the section actuals, we actually see two numbers above each operator. The top number is the cardinality estimate, and the additional number underneath it (highlighted in blue here) is the actual cardinality from our query execution. Looking at our specific example, you can see that the actuals were quite far off from the estimates; this is in no small part due to the fact that we created fresh tables for the example and inserted data without performing any runstats. In a production system, however, discrepancies between actuals and estimates can provide you with important clues about inefficiencies affecting your query performance. Often these will point to stale table statistics, or to inaccuracies that can be rectified by altering the way table statistics are produced using runstats. Of course, as with the SQL compiler explain variant, the more general plan data can be used to identify opportunities for tuning, pointing to things such as missing indexes or inefficient sections of the plan caused by certain query characteristics (many of which can be addressed by simply modifying the way the query is written). All said, the topic of query tuning based on explain data is easily large enough to warrant its own presentation (or two), so I will not attempt to do it justice in a single slide. Hopefully what we have accomplished is to show you how this data can be captured and examined to provide additional ammunition when searching for query problems using explain.

55. Final Thoughts

56. In Closing We've introduced you to some of the new monitoring capabilities in DB2 9.7 and how you can leverage them to help diagnose and solve query performance problems. The key thing to keep in mind is that performance analysis is as much an art as it is a science: there is never a single hard and fast "best" way to approach performance problems, and different problems may be best attacked from the top down, from the bottom up, or even sideways. It is fair to say that it is generally desirable to first isolate and resolve system-level problems before trying to analyze individual queries, but people will find different approaches work best depending on their environment. Hopefully the examples in this presentation have given you some ideas on ways you can leverage the latest DB2 monitoring capabilities in your own environment. As always, there is no hard and fast rule for approaching performance problems, but the methods outlined here provide approaches that will yield useful query performance information in a wide variety of contexts.

57. Questions?

58. DB2 Monitoring Resources

59. DB2 Monitoring Resources DB2 9.7 documentation: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp Related IDUG NA 2011 presentations: New Ways to Solve Your Locking Problems with DB2 9.7 (C06)

60. David Kalmuk IBM dckalmuk@ca.ibm.com Advanced Performance Diagnostics for SQL in DB2 9.7 David Kalmuk is a technical lead and architect for the DB2 for LUW product, specializing in the process model, workload management, and monitoring areas. David has played a lead role in the design and architecture of numerous technologies in DB2 over the years, including the new performance monitoring features in DB2 9.7 and PureScale, the threaded engine, DB2's cross-node messaging architecture for partitioned environments, and the connection concentrator. Most recently he has been leading the development of new workload management features and technologies. David has been a member of the DB2 team since 2000.


