1 / 40

Measuring SQL Execution Outliers (to track performance better)

Measuring SQL Execution Outliers (to track performance better). Maxym Kharchenko. 500 ms. A very important SQL. MERGE INTO orders_table USING dual ON ( dual.dummy IS NOT NULL AND id = :1 AND p_id = :2 AND order_id = :3 AND relevance = :4 AND … . Typical elapsed time: 100 ms

armand
Download Presentation

Measuring SQL Execution Outliers (to track performance better)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Measuring SQL Execution Outliers(to track performance better) Maxym Kharchenko

  2. 500 ms

  3. A very important SQL MERGE INTOorders_tableUSING dual ON (dual.dummyIS NOT NULL AND id = :1ANDp_id = :2 ANDorder_id = :3AND relevance = :4AND … Typical elapsed time: 100 ms *Bad* elapsed time: > 200 ms

  4. SQL Latency

  5. SQL latency metrics Elapsed Elapsed Time Time (s) Executions per Exec (s) %Total %CPU %IO SQL Id ---------------- -------------- ------------- ------ ------ ------ ------------- 635.5 10,090 0.1 31.5 16.5 77.6 fskp2vz7qrza2 Module: MYmodule merge into orders_table using dual on (dual.dummy is not null and id = :1 and p_id = :2 and order_id = :3 and relevance = :4 and …

  6. What exactly is “average” ?

  7. What exactly is “average” ? Average

  8. Most typical value “average” = “most typical” 95 % of all executions

  9. You can make predictions with “average” Probability: >= 200ms: 0.6 % Average: 100 ms

  10. Averageis a pretty decent metric

  11. As long as distribution is normal

  12. Measured Execution Times

  13. Measured Execution Times

  14. Measured Execution Times

  15. Measured Execution Times

  16. Measured Execution Times

  17. What if the real distribution is not normal?

  18. People feel *BAD* variance not the average

  19. Percentiles “average”

  20. Percentiles “average” 99th percentile

  21. Average: (what we think) typical latency is: 102 ms p99: The worst 1% of executions is at least as bad as: 532 ms

  22. SQL latency (but now with: p99)

  23. Ok, so how do we measure percentiles ?

  24. You need to capture individual query times

  25. Application side tracing App start_exec = time() Elapsed = time() – start_exec Db Exec: 4fucahsywt13m:19731969 • “True” user experience • Precise(captures “everything”) • (Lots of)DIY by developers • Captures *not only* db time

  26. Server side (10046) tracing App start_exec = time() Elapsed = time() – start_exec Db Exec: 4fucahsywt13m:19731969 • Precise(captures “everything”) • Detailed: breakdown by events and SQL “stages” • Cumbersome to process (lots of individual trace files and “events”)

  27. Sampling • v$sql.elapsed_time

  28. Sampling with number_generator as ( select level as l from dual connect by level <= 1000 ), target_sqls as ( select /*+ ordered no_merge use_nl(s) */ … from number_generator i, gv$sql s

  29. Sampling SQL> @sqlc fdcz4kx11era5                                     Gets    Ela (ms) LAST  C#   Plan hash   EXECUTIONS       pExecpExec Active---- ----------- ------------ ----------- ----------- ------------   2   245875337    1,700,541      444.62      137.57 +0 00:00:01   7   245875337            2       23.50       21.39 +0 01:15:16   3   245875337            1       26.00       10.38 +27 04:42:52

  30. Sampling SQL> @ssql fdcz4kx11era5 2 1000           Elapsed      CPU           IO      App       CCS  Ex         TIME     TIMETIMETIMETIMEPct- --- ------------ -------- ------------ -------- -------- -----     1          330        0            0        0        0     0    1          340    1,000            0        0        0  3.33    1          786      999            0        0        0  6.67    1        1,518    2,000          188        0        0    10*   2       11,963    1,999       11,103        0        0 13.33    1       14,851    4,999       10,908        0        0 16.67    1       15,724    2,000       14,780        0        0    20    1       16,471    2,000       15,163        0        0 23.33…    1       90,256    5,999       87,365        0        0 86.67    1       97,171    2,000       93,585        0       27    90    1      120,635    1,999      117,660        0        0 93.33    1      142,201    6,999      138,853        0        0 96.67    1      167,552    4,998      165,333        0        0   100

  31. Sampling SQL> @ssql2 fdcz4kx11era5 2 50000 avg 10 Elapsed                                CPU          IO Pct    Execs TIME                                  TIMETIME--- -------- ------------------------------ ----------- ----------- p0       148 .23-7.11                               .89        2.30 p10      148 7.18-14.03                            1.11        9.44 p20      146 14.03-20.26                           1.48       15.82 p30      143 20.39-29.01                           1.86       22.92 p40      146 29.1-40.73                            1.91       32.63 p50      143 40.77-55.21                           2.37       45.50 p60      142 55.22-77.92                           3.15       63.09 p70      145 77.99-113.33                          3.58       90.72 p80      141 113.41-173.64                         4.46      136.22 p90      138 174.34-634.15                         6.83      245.30

  32. Sampling SQL> @ssql3 fdcz4kx11era5 2 50000 avg 10                                                   Elapsed         CPU          IO Bucket Range (ms)              Execs Graph             TIME        TIMETIME------ -------------------- -------- ---------- ----------- ----------- -----------      1 .19-51.81                 686 ##########       22.39        1.51       20.91      2 51.81-103.44              303 ####             76.37        2.89       73.75      3 103.44-155.07             198 ##              127.59        3.55      124.23      4 155.07-206.69              91 #               174.25        4.68      169.82      5 206.69-258.32              46                 224.91        5.47      220.11      6 258.32-309.95              22                 267.26        6.90      261.46      7 309.95-361.57               7                 339.04        9.00      331.30      8 361.57-413.2                8                 264.19        6.90      258.24      9 413.2-464.83                3                 318.62        6.00      311.41     10 464.83-516.45               2                 492.26       10.00      483.53

  33. The scripts are here http://intermediatesql.com

  34. Sampling with i_gen as ( select level as l from dual connect by level <= &REPS ), target_sqls as ( select /*+ ordered no_mergeuse_nl(s) */ … from i_geni, gv$sql s • SQL access to data • Simplified time breakdown • Can capture “hours” • Slightly imprecise (captures 90-95 % of runs) • x$ data: “suspect” ?

  35. Monitoring SQL> descv$session sql_id sql_exec_start sql_exec_id v$sql_monitor /*+ MONITOR */

  36. Monitoring NAME VALUE DESCRIPTION ------------------------------ ------- ------------------------------------------------------------ _sqlmon_binds_xml_format default format of column binds_xml in [G]V$SQL_MONITOR _sqlmon_max_plan 480 Maximum number of plans entry that can be monitored. Defaults to 20 per CPU _sqlmon_max_planlines 300 Number of plan lines beyond which a plan cannot be monitored _sqlmon_recycle_time 60 Minimum time (in s) to wait before a plan entry can be recycled _sqlmon_threshold 5 CPU/IO time threshold before a statement is monitored. 0 is disabled • Precise(captures “everything”) • SQL access to data • Capture size is limited (think: “seconds”)

  37. Can I find worst performers in ASH ? 1 2 3 4 5 6 7 8 9 10 11 1, 2, 3, 7 3, 5, 7, 9 7

  38. Can I find worst performers in ASH ?

  39. Takeaways • Percentiles are better performance metrics than averages • Percentile calculation: requires capturing (most of) individualSQL runs • A number of ways exist to capture and measure individual SQL runs

  40. Thank you!

More Related