Advanced Performance Diagnostics for SQL in DB2 9.7
Slide 1

Advanced Performance Diagnostics for SQL in DB2 9.7

David Kalmuk

IBM

Platform: DB2 for Linux, Unix, Windows

Slide 2

Objectives

  • Learn how to leverage the latest DB2 performance monitoring features and time spent metrics to identify that you have an SQL problem

  • Learn how to pinpoint your most expensive statements using the package cache table functions

  • Learn about the new runtime explain capabilities and how you can perform an explain from the actual compiled section in the catalogs or the package cache to understand the query plan.

  • Learn how to leverage the new section actuals explain capabilities to examine the actual cardinalities from the execution of your problem queries

  • Take away practical examples you can try out in your own environment.

Slide 3

Agenda

  • A quick review of the new Monitoring Capabilities introduced in DB2 9.7

  • Time Spent Metrics

  • Identifying query related performance problems

  • Identifying high impact queries

  • Analyzing queries using Time Spent

  • Advanced diagnostics using Runtime Explain and Section Actuals

  • Final thoughts

Slide 4

A Quick Review of the New Performance Monitoring Capabilities Introduced in DB2 9.7

Slide 5

New Lightweight Monitoring Functions

  • New set of MON_* SQL functions introduced starting in DB2 9.7

  • Less impact / more efficient than snapshot functions

    • Direct in-memory access through trusted routines (not fenced wrappers over snapshot)

    • Much less latch contention

      • Uses new infrastructure that pushes data up to accumulation points rather than forcing monitor queries to do extensive drilldown

    • Lower CPU consumption

    • Significantly faster response time

    • Less FCM resource usage / internode traffic

  • Monitoring data collection carries low overhead – is enabled by default on new databases

Slide 6

New Monitoring Perspectives and Dimensions

  • Starting in 9.7, DB2 allows monitoring metrics to be accessed through a number of different dimensions

  • Allows more effective drilldown, and different perspectives on the data to help isolate problems

  • Three main dimensions, each consisting of a number of reporting points with corresponding SQL functions

  • System

    • Provide total perspective of application work being done by database system

    • Aggregated through the WLM infrastructure

  • Data objects

    • Provide perspective of impact of all activity occurring within the scope of data objects

    • Aggregated through data storage infrastructure

  • Activity

    • Provide perspective of work being done by specific SQL statements

    • Aggregated through the package cache infrastructure

Slide 7

In-Memory Metrics: System Perspective

(Diagram: the DB2 agent collects request metrics as it processes each database request; the metrics are accumulated (ΣR) at the unit of work / connection level and rolled up through workload definitions and service classes via the WLM infrastructure)

Legend

ΣR = Accumulation of request metrics collected by agent

Slide 8

Access Points: System Perspective

  • MON_GET_UNIT_OF_WORK

  • MON_GET_WORKLOAD

  • MON_GET_CONNECTION

  • MON_GET_SERVICE_SUBCLASS

  • Also provide interfaces that produce XML output:

    • MON_GET_UNIT_OF_WORK_DETAILS

    • MON_GET_WORKLOAD_DETAILS

    • MON_GET_CONNECTION_DETAILS

    • MON_GET_SERVICE_SUBCLASS_DETAILS
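As an illustration of how these system perspective access points are typically queried (a sketch, not from the original deck), the following query totals request and wait time per service subclass:

```sql
-- Sketch: per-service-subclass totals from the system perspective.
-- Nulls match all superclasses/subclasses; -2 means all members.
select varchar(service_superclass_name,30) as superclass,
       varchar(service_subclass_name,30) as subclass,
       sum(total_rqst_time) as rqst_time,
       sum(total_wait_time) as wait_time
from table(mon_get_service_subclass(null,null,-2)) as t
group by service_superclass_name, service_subclass_name
order by rqst_time desc
```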

Slide 9

In-Memory Metrics: Data Object Perspective

(Diagram: the DB2 agent collects metrics as it processes each database request; the metrics are accumulated against the data objects touched – tables (row, LOB, XML, and index data), tablespaces and their containers, temp tablespaces, and bufferpools)

Slide 10

Access Points: Data Object Perspective

  • MON_GET_TABLE

  • MON_GET_INDEX

  • MON_GET_BUFFERPOOL

  • MON_GET_TABLESPACE

  • MON_GET_CONTAINER

  • MON_GET_EXTENT_MOVEMENT_STATUS

  • MON_GET_APPL_LOCKWAIT [9.7FP1]

  • MON_GET_LOCKS [9.7FP1]

  • MON_GET_FCM [9.7FP2]

  • MON_GET_FCM_CONNECTION_LIST [9.7FP2]
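As a quick sketch of drilling into the data object perspective (not part of the original slides), a query like the following surfaces the most heavily read tables:

```sql
-- Sketch: tables ranked by rows read, a common first step when
-- looking for scan-heavy access patterns.
select varchar(tabschema,20) as tabschema,
       varchar(tabname,30) as tabname,
       rows_read, rows_inserted, rows_updated, rows_deleted
from table(mon_get_table(null,null,-2)) as t
order by rows_read desc
fetch first 10 rows only
```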

Slide 11

In-Memory Metrics: Activity Perspective

In-Memory Metrics: Activity Perspective

(Diagram: the DB2 agent collects metrics during the activity execution portion of each database request; the metrics are accumulated (ΣA) at the WLM activity level and against the statement entry in the package cache)

Legend

ΣA = Accumulation of metrics from activity execution portion of request

Slide 12

Access Points: Activity Perspective

  • MON_GET_PKG_CACHE_STMT

    • Both static and dynamic SQL

  • MON_GET_PKG_CACHE_STMT_DETAILS

    • XML based output

  • MON_GET_ACTIVITY_DETAILS (XML)

    • Details for an activity currently in progress

Slide 13

Time Spent Metrics

Slide 14

Time Spent Metrics

  • A new set of metrics has been introduced into DB2 that represents a breakdown of where time is spent within DB2

    • Represents sum of time spent by each agent thread in the system (foreground processing)

    • Provides user with a relative breakdown of time spent, showing which areas are the most expensive during request / query processing

    • Available in both system and activity perspectives

      • This presentation will focus on analysis from the activity perspective

    • Can be used for rapid identification and diagnosis of performance problems

  • Times are divided into:

    • Wait times

      • Time agent threads spend blocking on I/O, network communications, etc.

    • Processing times (starting in 9.7FP1)

      • Time spent processing in different component areas when the agent was not stuck on a wait

    • Summary / total times

      • Total time spent in a particular component area including both processing + wait times
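To make the wait vs processing split concrete, here is a sketch (assuming the standard MON_GET_WORKLOAD columns) that computes the overall wait percentage across all workloads:

```sql
-- Sketch: what fraction of total request time was spent waiting?
select sum(total_rqst_time) as rqst_time,
       sum(total_wait_time) as wait_time,
       (case when sum(total_rqst_time) > 0
             then (sum(total_wait_time) * 100) / sum(total_rqst_time)
             else 0 end) as wait_pct
from table(mon_get_workload(null,-2)) as t
```

A high wait percentage suggests drilling into the individual wait time metrics next; a low one points at the processing (component) times instead.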

Slide 15

“Time Spent” Metrics: Breakdown of Wait + Processing Times in DB2

Slide 16

Activity Time Spent Hierarchy

  • “Time spent” metrics are mutually exclusive and in aggregate form a hierarchy (shown below) that breaks down the time spent executing queries in the database server on behalf of the client. Below we show the hierarchy for the activity perspective.

STMT_EXEC_TIME
  TOTAL_ACT_WAIT_TIME
    LOCK_WAIT_TIME
    LOG_BUFFER_WAIT_TIME
    LOG_DISK_WAIT_TIME
    FCM_SEND/RECV_WAIT_TIME
      FCM_MESSAGE_SEND/RECV_WAIT_TIME
      FCM_TQ_SEND/RECV_WAIT_TIME
    DIAGLOG_WRITE_WAIT_TIME
    POOL_READ/WRITE_TIME
    DIRECT_READ/WRITE_TIME
    AUDIT_SUBSYSTEM_WAIT_TIME
    AUDIT_FILE_WRITE_TIME
  TOTAL_SECTION_PROC_TIME (time performing query plan execution)
    TOTAL_SECTION_SORT_PROC_TIME
  TOTAL_ROUTINE_NON_SECT_PROC_TIME
    TOTAL_ROUTINE_USER_CODE_PROC_TIME
  (Other – nested section execution)

Slide 17

Navigating the “time spent” hierarchy

  • The row based formatting functions introduced in 9.7 FP1 offer an easy way to navigate the time spent hierarchy in a generic fashion

    • MON_FORMAT_XML_TIMES_BY_ROW

      • Shows breakdown of waits + processing times

  • MON_FORMAT_XML_WAIT_TIMES_BY_ROW

    • Shows breakdown of just wait times

  • MON_FORMAT_XML_COMPONENT_TIMES_BY_ROW

    • Shows breakdown of processing time as well as overall time spent in each “component” of DB2

  • MON_FORMAT_XML_METRICS_BY_ROW

    • Outputs all metrics in generic row based form

    Slide 18

    Example

    Show me the full hierarchy of waits + processing times for a particular statement

    select r.metric_name, r.parent_metric_name,

    r.total_time_value as time, r.count, p.member, s.stmt_text as stmt

    from table(mon_get_pkg_cache_stmt_details('S',null,null,-2)) as p,

    table(mon_get_pkg_cache_stmt(null,p.executable_id,null,-2)) as s,

    table(mon_format_xml_times_by_row(p.details)) as r

    where s.stmt_text like 'UPDATE stock%'

    order by total_time_value desc

    METRIC_NAME PARENT_METRIC_NAME TIME COUNT MEMBER STMT

    ------------------------- -------------------- ----------- --------- ------ --------------------

    TOTAL_SECTION_PROC_TIME STMT_EXEC_TIME 1030980 2134223 0 UPDATE stock…

    STMT_EXEC_TIME - 1030980 2134223 0 UPDATE stock…

    WLM_QUEUE_TIME_TOTAL - 0 0 0 UPDATE stock…

    LOCK_WAIT_TIME            TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    DIRECT_READ_TIME          TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    DIRECT_WRITE_TIME         TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    LOG_BUFFER_WAIT_TIME      TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    LOG_DISK_WAIT_TIME        TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    POOL_WRITE_TIME           TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    POOL_READ_TIME            TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    AUDIT_FILE_WRITE_WAIT_TIM TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    AUDIT_SUBSYSTEM_WAIT_TIME TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    DIAGLOG_WRITE_WAIT_TIME   TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    FCM_SEND_WAIT_TIME        TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    FCM_RECV_WAIT_TIME        TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    TOTAL_ACT_WAIT_TIME       STMT_EXEC_TIME       0           -         0      UPDATE stock…

    Slide 19

    Example 2

    Show me just the wait times perspective

    select r.metric_name, r.parent_metric_name,

    r.total_time_value as time, r.count, p.member, s.stmt_text as stmt

    from table(mon_get_pkg_cache_stmt_details('S',null,null,-2)) as p,

    table(mon_get_pkg_cache_stmt(null,p.executable_id,null,-2)) as s,

    table(mon_format_xml_wait_times_by_row(p.details)) as r

    where s.stmt_text like 'UPDATE stock%'

    order by total_time_value desc

    METRIC_NAME PARENT_METRIC_NAME TIME COUNT MEMBER STMT

    ------------------------- -------------------- ----------- --------- ------ --------------------

    WLM_QUEUE_TIME_TOTAL      -                    0           0         0      UPDATE stock…
    LOCK_WAIT_TIME            TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    DIRECT_READ_TIME          TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    DIRECT_WRITE_TIME         TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    LOG_BUFFER_WAIT_TIME      TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    LOG_DISK_WAIT_TIME        TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    POOL_WRITE_TIME           TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    POOL_READ_TIME            TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    AUDIT_FILE_WRITE_WAIT_TIM TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    AUDIT_SUBSYSTEM_WAIT_TIME TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    DIAGLOG_WRITE_WAIT_TIME   TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    FCM_SEND_WAIT_TIME        TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    FCM_RECV_WAIT_TIME        TOTAL_ACT_WAIT_TIME  0           0         0      UPDATE stock…
    TOTAL_ACT_WAIT_TIME       STMT_EXEC_TIME       0           -         0      UPDATE stock…

    Slide 20

    Example 3

    Show me just the times for each individual component area

    select r.metric_name, r.parent_metric_name as parent_metric,

    r.total_time_value as total_time,r.proc_metric_name,

    r.parent_proc_metric_name as parent_proc_metric,

    r.proc_time_value as proc_time, r.count, p.member,

    s.stmt_text as stmt

    from table(mon_get_pkg_cache_stmt_details('S',null,null,-2)) as p,

    table (mon_get_pkg_cache_stmt(null,p.executable_id,null,-2)) as s,

    table(mon_format_xml_component_times_by_row(p.details)) as r

    where s.stmt_text like 'UPDATE stock%'

    order by total_time_value desc

    METRIC_NAME PARENT_METRIC TOTAL_TIME PROC_METRIC_NAME PARENT_PROC_METRIC PROC_TIME COUNT MEMBER STMT

    -------------------- ---------------- ---------- -------------------- ------------------ --------- ------ ------ -------

    STMT_EXEC_TIME - 1031684 - - - 0 0 UPDATE…

    TOTAL_SECTION_TIME TOTAL_RQST_TIME 1031684 TOTAL_SECTION_PROC TOTAL_RQST_TIME 1031684 2134223 0 UPDATE…

    TOTAL_SECTION_SORT_T TOTAL_SECTION_TIME 0 TOTAL_SECTION_SORT_P TOTAL_SECTION_PROC 0 0 0 UPDATE…

    TOTAL_ROUTINE_NON_SE TOTAL_ROUTINE_TIME 0 TOTAL_ROUTINE_NON_SE TOTAL_ROUTINE_TIME 0 0 0 UPDATE…

    TOTAL_ROUTINE_TIME STMT_EXEC_TIME 0 - - - 0 0 UPDATE…

    TOTAL_ROUTINE_USER_C TOTAL_ROUTINE_NON_S 0 TOTAL_ROUTINE_USER_C TOTAL_ROUTINE_NON_S 0 0 0 UPDATE…

    Slide 21

    Identifying Query Related Performance Problems

    Slide 22

    Identifying Query Related Performance Problems

    • Generally we will want to try to identify and address any system level bottlenecks / non query related issues first before proceeding to more granular investigation at the statement or query level

    • We will look at some initial indicators here to help rule out broad system-wide or non query related problems

    • This will help us narrow down whether the performance issues we are observing are related specifically to statement execution or plan issues rather than to other system bottlenecks, and confirm that tuning individual queries will give us some bang for our buck

    • The following examples will show some of the basic indicators that may be useful in helping to make this determination

    Slide 23

    “How much time on the database is spent doing query processing?”

    select sum(total_rqst_time) as rqst_time,

    sum(total_act_time) as act_time,

    sum(total_section_time) as sect_time,

    (case when sum(total_rqst_time) > 0

    then (sum(total_act_time) * 100) / sum(total_rqst_time)

    else 0 end) as act_pct,

    (case when sum(total_rqst_time) > 0

    then (sum(total_section_time) * 100) / sum(total_rqst_time)

    else 0 end) as sect_pct

    from table(mon_get_connection(null,-2)) as t

    Added some percentages for readability

    About 46% of time spent executing query plans

    RQST_TIME ACT_TIME SECT_TIME ACT_PCT SECT_PCT

    ----------------- ----------------- ----------------- ----------------- -----------------

    18188462 8419265 8419265 46 46

    Slide 24

    “Are there any system-wide bottlenecks?”

    create view dbtimemetrics (metric_name, parent_metric_name, total_time_value, count)

    as select u.metric_name, u.parent_metric_name, sum(u.total_time_value), sum(u.count)

    from table(mon_get_service_subclass_details(null,null,-2)) as t,

    table(mon_format_xml_times_by_row(t.details)) as u

    group by metric_name, parent_metric_name

    create global temporary table timesamples

    as (select * from dbtimemetrics) definition only on commit delete rows

    create view dbtimedelta (metric_name, parent_metric_name, total_time_value, count)

    as select t2.metric_name, t2.parent_metric_name,

    t2.total_time_value - t1.total_time_value, t2.count - t1.count

    from dbtimemetrics as t2, timesamples as t1

    where t2.metric_name = t1.metric_name

    A bit of scripting / setup to help obtain deltas

    Slide 25

    (continued)

    Get wait times on database sampled over 1 minute

    insert into timesamples select * from dbtimemetrics

    <sleep for 60s sampling period>

    select * from dbtimedelta order by total_time_value desc

    fetch first 5 rows only

    (Be sure to run with auto-commit disabled when using CLP)

    METRIC_NAME PARENT_METRIC_NAME TOTAL_TIME_VALUE COUNT MEMBER

    ------------------------- ------------------------- -------------------- -------------------- ------

    TOTAL_RQST_TIME - 14694008 31778866 0

    TOTAL_WAIT_TIME TOTAL_RQST_TIME 7471599 - 0

    CLIENT_IDLE_WAIT_TIME - 7230313 - 0

    TOTAL_SECTION_PROC_TIME   TOTAL_RQST_TIME           7220409              29277233             0
    LOG_DISK_WAIT_TIME        TOTAL_WAIT_TIME           7001599              103341               0

    Slide 26

    Identifying High Impact / Expensive Queries

    Slide 27

    Identifying Problem Queries

    • Once we’ve ruled out general system-wide bottlenecks that might be impacting our performance, we’ll want to identify the highest impact / worst performing queries on the system as possible candidates for tuning

    • Time spent metrics can be used to drill down in a number of different useful ways in order to help identify high impact queries.

    • Among the many things we can look at are:

      • Queries with the most execution time in server

      • Queries with the worst velocity in terms of processing vs wait

      • CPU intensive queries

      • Least efficient query plans

    • The following examples will show you several ways you can leverage the time spent metrics to drill down and identify your problem queries

    Slide 28

    Finding High Impact Queries

    Top 5 queries by statement execution time in server

    select stmt_exec_time, num_executions, stmt_text

    from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as s

    order by stmt_exec_time desc fetch first 5 rows only

    Statement with most execution time in the server

    STMT_EXEC_TIME NUM_EXECUTIONS STMT

    -------------------- -------------------- -----------------------------------------------

    3951764 2218111 SELECT s_quantity, s_dist_01, s_dist_02, …

    902078 195866 SELECT c_balance, c_delivery_cnt …

    619547 212999 DECLARE CUST_CURSOR1 CURSOR FOR SELEC …

    480681 221873 SELECT w_tax, c_discount, c_last, c_credit …

    441494 20124 SELECT count(distinct S_I_ID) INTO :H …

    Slide 29

    More High Impact Queries

    Top 5 most CPU intensive queries

    select stmt_exec_time, num_executions,

    (total_cpu_time / 1000) as cpu_time,

    stmt_text

    from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as s

    order by cpu_time desc fetch first 5 rows only

    select stmt_exec_time, num_executions,

    (pool_read_time + pool_write_time +

    direct_read_time + direct_write_time) as io_time

    from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t

    order by io_time desc fetch first 5 rows only

    Top 5 most I/O intensive queries

    Slide 30

    Queries with the Worst Relative Velocity

    Relative velocity shows the degree to which progress of the query is impacted by waits

    select total_act_time, total_act_wait_time,

    (case when total_act_time > 0

    then ((total_act_time - total_act_wait_time) * 100

    / total_act_time)

    else 100

    end) as relvelocity,

    stmt_text

    from table (mon_get_pkg_cache_stmt(null,null,null,-2)) as t

    order by relvelocity fetch first 5 rows only

    Compute percentage of query time where we’re processing

    Majority of query time spent in waits!

    TOTAL_ACT_TIME TOTAL_ACT_WAIT_TIME RELVELOCITY STMT_TEXT

    -------------------- -------------------- -------------------- --------------------

    1481597 1457690 1 DECLARE READ_ORDERLI …

    228 223 2 create view dbtimeme …

    28 27 3 alter table activity …

    30 29 3 create event monitor …

    35 33 5 create event monitor …

    Slide 31

    Queries with the Least Efficient Plans

    This query shows us how much data we processed to produce a single row of results

    select rows_returned, rows_read,

    (case when rows_returned > 0

    then rows_read / rows_returned

    else 0

    end) as ratio,

    stmt_text as stmt

    from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as p

    order by ratio desc

    fetch first 10 rows only

    Ratio of rows read to rows returned

    ROWS_RETURNED ROWS_READ RATIO STMT

    -------------------- -------------------- -------------------- -------------------------

    2 11137814 5568907 select count(*) from acti…

    1 5568907 5568907 select min(time_completed

    3 9 3 select * from syscat.WORK…

    9 9 1 select substr(serviceclas…

    9 9 1 select * from dbtimedelta…

    2843729 2843729 1 DECLARE CUST_CURSOR1 CURS…

    2843729 2843729 1 SELECT w_street_1, w_stre…

    29599464 29599528 1 SELECT s_quantity, s_dist…

    0 14 0 alter table control drop…

    0 13 0 create view dbtimemetrics…

    Slide 32

    The “Executable ID”

    • In order to facilitate tracking of individual statements, DB2 9.7 introduced the new concept of an “executable id”

      • Short identifier that uniquely identifies a section in the package cache (not an individual execution, but an individual statement in a particular compilation environment)

      • Can be obtained from activity monitoring interfaces such as MON_GET_PKG_CACHE_STMT / MON_GET_PKG_CACHE_STMT_DETAILS, MON_GET_ACTIVITY_DETAILS or the package cache and activity event monitors

      • Can be subsequently used as input to the package cache interfaces to retrieve data for that particular section without the statement text

    • In the next section we will leverage the executable ID that uniquely identifies our statements of interest in order to perform further drilldown
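As a minimal sketch of this usage (the hex literal below is a placeholder, not a real executable id), the id can be passed straight back into the package cache interface:

```sql
-- Sketch: retrieve metrics for one specific section by executable id.
-- x'0123' is a placeholder; substitute an id returned by a prior query.
select num_executions, stmt_exec_time, total_cpu_time, total_act_wait_time
from table(mon_get_pkg_cache_stmt(null, x'0123', null, -2)) as t
```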

    Slide 33

    Analyzing Individual Statements Using Time Spent

    Slide 34

    Analyzing Individual Queries Using Time Spent

    • Once we have pinpointed our statements of interest, our next step is to drill down into these individual statements to understand where they are spending their time

    • By understanding where the time is being spent in the query we can identify where the database server is spending effort, and look for opportunities for tuning

    • Using the executable id from problem statements identified in the previous section we can now perform an analysis of the time spent breakdown for those statements

    Slide 35

    “Where is my time being spent?”

    Show me the full hierarchy of waits + processing times for a particular statement

    select p.executable_id, r.metric_name, r.parent_metric_name,

    r.total_time_value as time, r.count, p.member

    from

    (select stmt_exec_time, executable_id

    from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as s

    order by stmt_exec_time desc fetch first row only) as stmts,

    table(mon_get_pkg_cache_stmt_details(null,

    stmts.executable_id,

    null,

    -2)) as p,

    table(mon_format_xml_times_by_row(p.details)) as r

    order by stmts.executable_id, total_time_value desc

    Find statement with most time in server

    Executable ID for our statement(s) of interest

    Format XML details to produce row based format for time spent metrics

    Slide 36

    (continued)

    Row based breakdown of where time is being spent

    EXEC_ID METRIC_NAME PARENT_METRIC_NAME TIME COUNT MEMBER

    ------------ -------------------------------- ---------------------- ------- ------ ------

    x'00000001…' STMT_EXEC_TIME - 6676617 110191 0

    x'00000001…' TOTAL_ROUTINE_NON_SECT_PROC_TIME TOTAL_ROUTINE_TIME 6008956 110191 0

    x'00000001…' TOTAL_ROUTINE_USER_CODE_PROC_TIME TOTAL_ROUTINE_NON_S 6008956 110191 0

    x'00000001…' POOL_READ_TIME TOTAL_ACT_WAIT_TIME 372754 52135 0

    x'00000001…' TOTAL_ACT_WAIT_TIME STMT_EXEC_TIME 372754 - 0

    x'00000001…' TOTAL_SECTION_PROC_TIME STMT_EXEC_TIME 294907 0 0

    x'00000001…' WLM_QUEUE_TIME_TOTAL - 0 0 0

    x'00000001…' FCM_TQ_RECV_WAIT_TIME FCM_RECV_WAIT_TIME 0 0 0

    x'00000001…' FCM_MESSAGE_RECV_WAIT_TIME FCM_RECV_WAIT_TIME 0 0 0

    x'00000001…' FCM_TQ_SEND_WAIT_TIME FCM_SEND_WAIT_TIME 0 0 0

    x'00000001…' FCM_MESSAGE_SEND_WAIT FCM_SEND_WAIT_TIME 0 0 0

    x'00000001…' LOCK_WAIT_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' DIRECT_READ_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' DIRECT_WRITE_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' LOG_BUFFER_WAIT_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' LOG_DISK_WAIT_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' POOL_WRITE_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' AUDIT_FILE_WRITE_WAIT_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' AUDIT_SUBSYSTEM_WAIT_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' DIAGLOG_WRITE_WAIT_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' FCM_SEND_WAIT_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' FCM_RECV_WAIT_TIME TOTAL_ACT_WAIT_TIME 0 0 0

    x'00000001…' TOTAL_SECTION_SORT_PRO TOTAL_SECTION_PROC_T 0 0 0

    Slide 37

    Understanding the Time Metrics: Wait Times

    • Wait Times represent the time an agent spends blocking / waiting on particular subsystems

    • Large % wait times in a particular subsystem may indicate that subsystem is a bottleneck, preventing agents working on statement processing from making optimal progress

    • In some cases the presence of large wait times may indicate a potential query problem

      • For example, larger than expected bufferpool read time resulting from unexpected table scan activity might indicate a missing index rather than an I/O problem

      • It might also be an indication of a bad disk though 

    • In other cases just the presence of a wait time on its own may indicate a problem

      • For example, direct reads / writes when we are expecting LOBs to be inlined, or FCM TQ waits for a query that is expected to be co-located on a partitioned system would directly indicate a problem

    Slide 38

    Wait Times (Activity)

    Slide 39

    More Wait Times (Activity)

    Slide 40

    Understanding the Time Metrics: Processing / Component Times

    • Processing / Component Times represent the time an agent thread spends processing / consuming CPU in a particular component area of DB2

      • Broken down into two pieces - the processing portion in the component as well as the total time in the component

    • Large % processing times in a particular component subsystem indicates that most of our processing is taking place there

      • Whether this is an issue depends on what the component is

        • For example, large times in sort might be expected in a particular query, whereas their presence in others might indicate a problem

    Slide 41

    Processing/Component Times (Activity)

    Slide 42

    Other Important Time Metrics

    • A few time (and time related) metrics that bear mention here:

      • STMT_EXEC_TIME

        • The total time spent executing a statement (including the time spent in any nested invocations)

      • TOTAL_ACT_WAIT_TIME

        • The total time spent blocking on waits while executing this particular section (does not include wait times for nested invocations, which are reported in the section entries for those individual statements)

      • TOTAL_ROUTINE_TIME

        • The total amount of time the statement spent executing within routines (UDFs or Stored Procedures)

      • TOTAL_CPU_TIME

        • Not a metric in the actual “time spent” hierarchy, but generally a critical time-related metric
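Since TOTAL_CPU_TIME sits outside the time spent hierarchy, it is often worth normalizing it per execution; a hedged sketch, not from the original deck:

```sql
-- Sketch: average CPU time per execution can expose statements that
-- are individually expensive even when they run infrequently.
select num_executions,
       (case when num_executions > 0
             then total_cpu_time / num_executions
             else 0 end) as avg_cpu_per_exec,
       stmt_text
from table(mon_get_pkg_cache_stmt(null,null,null,-2)) as t
order by avg_cpu_per_exec desc fetch first 5 rows only
```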

    Slide 43

    Advanced Diagnostics Using Runtime Explain and Section Actuals

    Slide 44

    Introducing Runtime Explain

    • There may be cases when a more detailed analysis of query execution is required than can be provided with basic monitoring metrics such as time spent

    • In these cases the tool we typically turn to is the EXPLAIN feature of DB2 – which herein we will refer to as the SQL Compiler EXPLAIN

      • This capability compiles an input SQL statement and allows you to format and view the query plan

      • Expected to be a generally accurate approximation of the query you actually ran

      • May differ due to differences in compilation environment and/or table statistics from when your query was compiled

    • An exciting new feature introduced in DB2 9.7 FP1 is the ability to perform a “runtime explain” (or “Explain From Section”) which produces explain output directly from a compiled query plan (or section) in the engine

    • Allows you to obtain the plan from the actual section you are executing

    Slide 45

    Explain from Section Procedures

    • A set of stored procedures provided in DB2 9.7 FP1 allow you to explain a runtime section into the explain tables

      • EXPLAIN_FROM_CATALOG

      • EXPLAIN_FROM_SECTION

      • EXPLAIN_FROM_ACTIVITY

      • EXPLAIN_FROM_DATA

    • Explain table content can then be processed using any existing explain tools (e.g. db2exfmt)

    • Explain output can be generated from the following sources:

      • Static or dynamic statement entries in the package cache

      • Any cache entry (DETAILED) captured by the new package cache event monitor

      • Static statement from the catalog tables

      • Statement execution captured with section by the activity event monitor
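A sketch of invoking one of these procedures (parameter order follows the documented EXPLAIN_FROM_SECTION signature; the executable id is a placeholder): source type 'M' explains a section directly from package cache memory into the explain tables, after which db2exfmt can format the plan.

```sql
-- Sketch: explain the in-memory section for a given executable id.
-- Inputs: executable id, source type ('M' = package cache memory),
-- source id, member, explain schema; remaining parameters are outputs.
call explain_from_section(x'0123', 'M', null, 0, null, ?, ?, ?, ?, ?)
```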

    Slide 46

    Section Actuals

    • One of the key benefits of the explain from section capability is the ability to capture and format “section actuals”

      • All Explain output will contain cardinality estimates for individual operators in the plan

      • When a section based explain is performed on a statement captured by the activity event monitor we can also capture the actual cardinalities for each operator for that execution

    • Examining this data gives you an indication of what actually happened during the query execution, and whether the estimates the optimizer used in generating the query plan were accurate

    • In order to examine the actuals we will need to capture the execution of our SQL statement of interest using the activity event monitor

    Slide 47

    Capturing Activities to Obtain Actuals

    • The activity event monitor in DB2 allows the capture of execution details for individual SQL statements as well as several other recognized activities (e.g. LOAD)

    • It can be configured to capture a variety of different metrics as well as the section data

    • Starting in DB2 9.7 FP1 the section data captured by the activity event monitor has been enhanced to now include section actuals

    • Since the capture of individual activities is quite granular we offer a fair degree of flexibility allowing the following data capture options:

      • Capture data for all activities running in a particular WLM workload

      • Capture data for all activities running in a particular WLM service class

      • Capture data for activities that violate a particular WLM threshold

    • We can also enable the capture of activities run by a specific application using the WLM_SET_CONN_ENV procedure first introduced in DB2 9.7 FP2

    • Our final example will demonstrate how to capture a statement of interest using the activity event monitor and then obtain the section actuals using the new explain capabilities in DB2 9.7 FP1

    Slide 48

    Step I: Required Setup Steps

    Create the explain tables…

    Execute ~/sqllib/misc/EXPLAIN.DDL

    Create the activity event monitor

    create event monitor actEvmon for activities write to table

    activity ( table activity, in monitorTBS),

    activityvals ( table activityvals, in monitorTBS ),

    activitystmt ( table activitystmt, in monitorTBS ),

    control ( table control, in monitorTBS )

    manualstart

    Slide 49

    Step II: Capturing the Activity Data

    Enable the event monitor and setup to capture a statement on my connection

    set event monitor actEvmon state 1

    call wlm_set_conn_env(null,

    '<collectactdata>WITH DETAILS, SECTION</collectactdata>

    <collectactpartition>ALL</collectactpartition>

    <collectsectionactuals>BASE</collectsectionactuals>')

    Execute the statement I’m interested in

    select t1.ident, sum(t1.data) as data,

    sum(t2.moredata) as moredata

    from t1,t2

    where t1.ident=t2.ident

    group by t1.ident

    Disable collection and the event monitor once I am done

    call wlm_set_conn_env(null, '<collectactdata>NONE</collectactdata>

    <collectsectionactuals>BASE</collectsectionactuals>')

    set event monitor actEvmon state 0

    Slide 50

    Step II: Another approach

    Enable the event monitor on the default subclass, and collect details and section data

    set event monitor actEvmon state 1

    update db cfg using section_actuals base

    alter service class sysdefaultsubclass under sysdefaultuserclass

    collect activity data on all database partitions with details,section

    ( Queries of interest run and are captured… )
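    While capture is enabled, one way to confirm the configuration update took effect is to query the SYSIBMADM.DBCFG administrative view (a sketch; while actuals collection is enabled the value should read 'BASE'):

    ```sql
    -- Check the current in-memory value of the section_actuals
    -- database configuration parameter
    select name, value
    from sysibmadm.dbcfg
    where name = 'section_actuals'
    ```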

    Disable collection and the event monitor once done

    alter service class sysdefaultsubclass under sysdefaultuserclass

    collect activity data none

    update db cfg using section_actuals none

    set event monitor actEvmon state 0

    Slide 51

    Step III: Locating the activity of interest

    Show me the executions captured for a particular statement

    select a.appl_id, a.uow_id, a.activity_id, a.appl_name,

    s.executable_id, s.stmt_text

    from activity_actevmon as a,

    activitystmt_actevmon as s

    where a.appl_id = s.appl_id and

    a.uow_id = s.uow_id and

    a.activity_id = s.activity_id and

    s.stmt_text like 'select t1.ident, sum(t1.data)%'

    Identifiers for the activity

    APPL_ID                   UOW_ID ACTIVITY_ID EXECUTABLE_ID  STMT_TEXT                  APPL_NAME
    ------------------------- ------ ----------- -------------- -------------------------- ---------
    *LOCAL.davek.100917004844     62           1 x'010000…1E00' select t1.ident, sum(t1.d… db2bp

    Slide 52

    Step III: Another approach

    Find the captured activities with the largest CPU time

    select a.appl_id, a.uow_id, a.activity_id, a.appl_name, r.value as total_cpu_time, s.executable_id, s.stmt_text

    from

    activity_actevmon as a,

    activitystmt_actevmon as s,

    table(mon_format_xml_metrics_by_row(a.details_xml)) as r

    where a.appl_id = s.appl_id and

    a.uow_id = s.uow_id and

    a.activity_id = s.activity_id and

    r.metric_name='TOTAL_CPU_TIME'

    order by total_cpu_time desc fetch first 5 rows only

    Statement executable id

    Identifiers for the activity

    APPL_ID                   UOW_ID ACTIVITY_ID APPL_NAME TOTAL_CPU_TIME EXECUTABLE_ID STMT_TEXT
    ------------------------- ------ ----------- --------- -------------- ------------- -------------------------
    *LOCAL.davek.100917004844     62           1 db2bp              30500 x'0100…01E00' select t1.ident, sum(t1.d…
    *LOCAL.davek.100917004844     64           1 db2bp               5360 x'0100…00900' CALL wlm_set_conn_env(?,?…
    *LOCAL.davek.100917001050    105           1 db2bp               4603 x'0100…04A00' CALL wlm_set_conn_env(?,?…
    *LOCAL.davek.100919015109     20           1 db2bp                444 x'0100…05000' SELECT TABNAME, TABSCHEMA…
    *LOCAL.davek.100919015109     25           1 db2bp                406 x'0100…05000' SELECT TABNAME, TABSCHEMA…

    Slide 53

    Step IV: Performing and Formatting the Explain from Section

    Perform an explain on the activity of interest, using the identifiers for the activity obtained in the previous step:

    call explain_from_activity('*LOCAL.davek.100715194643',
         85, 1, 'ACTEVMON', null,
         ?, ?, ?, ?, ?)

    Now format the most recent data in the explain tables to a readable text file

    db2exfmt -d sample -w -1 -n %% -# 0 -s %% -o explain.txt

    Slide 54

    Step V: Examining the Query Plan Graph

    Access Plan:
    -----------
    Total Cost:   30.8779
    Query Degree: 1

                  Rows
               Rows Actual
                RETURN
                (   1)
                 Cost
                  I/O
                  |
                   1
                   1
                GRPBY
                (   2)
                30.8423
                  NA
                  |
                 3.584
                 160
                HSJOIN
                (   3)
                30.7964
                  NA
            /-----+------\
        4.48              0.8
         80                2
       TBSCAN           TBSCAN
       (   4)           (   5)
       19.3608          11.3862
         NA               NA
         |                |
        112               20
         NA               NA
    TABLE: DAVEK     TABLE: DAVEK
         T2               T1

    Estimates vs Actuals: each operator shows the optimizer's estimated row count above the actual row count observed at run time (for example, the HSJOIN was estimated to produce 3.584 rows but actually produced 160).

    Final Thoughts

    Slide 56

    In Closing

    • We’ve shown you some of the new monitoring capabilities introduced in DB2 9.7 and how you can leverage them to help diagnose and solve query performance problems

    • The key thing to keep in mind is that performance analysis is as much an art as it is a science:

      • There’s never a single hard and fast “best” way to approach performance problems

      • Different problems may be best attacked from the top-down, bottom-up, or even sideways 

      • Generally it’s best to isolate and resolve system-level problems before analyzing individual queries, though the most effective approach will vary with your environment

      • Hopefully the examples in this presentation have given you some ideas on ways you can leverage the latest DB2 Monitoring capabilities in your own environment

    Slide 57

    Questions?

    Slide 58

    DB2 Monitoring Resources

    Slide 59

    DB2 Monitoring Resources

    • DB2 9.7 documentation:

      • http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp

    • Related IDUG NA 2011 presentations:

      • New Ways to Solve Your Locking Problems with DB2 9.7 (C06)

    Slide 60

    David Kalmuk
    IBM
    dckalmuk@ca.ibm.com

    Advanced Performance Diagnostics for SQL in DB2 9.7

