1 / 18

Plan Space Analysis Detecting Plan Regressions in Cost-based Query Optimizers

Florian Waas, EMC Corp. Leo Giakoumakis , Microsoft Corp. Shin Zhang, Microsoft Corp 06/13/ 2011. Plan Space Analysis Detecting Plan Regressions in Cost-based Query Optimizers. Tale of a Plan Regression. Standard regression test just don’t cut it!.

cassie
Download Presentation

Plan Space Analysis Detecting Plan Regressions in Cost-based Query Optimizers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Florian Waas, EMC Corp. Leo Giakoumakis, Microsoft Corp. Shin Zhang, Microsoft Corp 06/13/2011 Plan Space AnalysisDetecting Plan Regressions in Cost-based Query Optimizers Florian Waas, EMC/Greenplum

  2. Tale of a Plan Regression Standard regression test just don’t cut it! Florian Waas, EMC/Greenplum • Applied ‘obvious’ improvement to optimizer • Passed all regression tests without problem • Shipped proudly! • Lots of customers complain about plan regressions • Hard conversation between Dev and QA • Dev: Why didn’t you find this? This change affects virtually all queries! • QA: Why can’t you tell me what to look for if it’s so ‘obvious’?

  3. Takeaways Florian Waas, EMC/Greenplum • Conventional testing of optimizer focuses on a singlebest-plan found per query • Ignores massive space of rejected alternatives • Plan Space Analysis • Takes many/all plans considered into account • Quantifies optimizer changes – even if result not affected • Detects regressions early in the development process

  4. Nomenclature Florian Waas, EMC/Greenplum • Optimizer selects best plan found (BPF) • Rejects non-trivial numbers of alternatives • Explicitly or implictly • Plan regression • Code-level change to optimizer leads to bad plan choice • Perceived or actual

  5. Dilemma of Optimizer Testing Quantify – but do not judge Florian Waas, EMC/Greenplum • Optimizers work off theoretical models • All practical models have limitations • Most non-trivial queries exceed limitations of model • May lead to contradicting optimization problems • Get query Q1 right OR query Q2… • Right or wrong is a matter of view point and business priority

  6. Standard Test Procedure Inconclusive, labor intensive, does not scale Florian Waas, EMC/Greenplum • Choose relevant workload • Freeze BPF • Apply modification • Test against frozen BPF • Diff may indicate regression • Manual intervention needed to determine actual impact • In practice: lots of false positives/negatives

  7. Desiderata for better regression tests Calls for abstract evaluation framework Florian Waas, EMC/Greenplum • Simplicity, transparency • Simple number • Meaningful correlation to system • Technology agnostic, targeted • Does not reverse engineer optimizer • ‘understands’ executor • Surgical, specific • Actionable • Applicable to any and every workload • Practical • Easy to compute, robust methodology

  8. Plan Spaces Florian Waas, EMC/Greenplum • Set of alternatives considered by optimizer • Product specific • Non-trivial size • E.g., TPC-H 5: 230+ million alternatives • Contains optimal plan(s) • According to database parameters • Think: statistics • Pairwise relationships based on cost function • E.g., cost(Popt) < cost(P)

  9. Observation Optimizer must assess plans conclusively Florian Waas, EMC/Greenplum Given a query For each plan alternative P There exists a configuration so that P is optimal Even if distinctly suboptimal in original query/configuration

  10. Ideal optimizer Establishes correct order of plan alternatives Florian Waas, EMC/Greenplum • Makes no mistakes • Establishes partial order betweenalternatives according to estimates • Estimated order matches actual execution • Regardless of actual cost values

  11. Plan Space Analysis: Principle Use Correlation as measure for quality/impact Florian Waas, EMC/Greenplum • Enumerate plan alternatives • Have optimizer cost them • Determine order O1 according to estimated cost • Execute all plans alternatives • Determine order O2 according to actual execution cost • Compute correlation of O1 and O2

  12. Plan Space Analysis: Correlation Florian Waas, EMC/Greenplum • Spearman-Coefficient • Value range [-1,1] • 1 perfect monotone function • 0 uncorrelated • etc.

  13. Plan Space Analysis: in Practice Confine PSA to relevant areas of space Florian Waas, EMC/Greenplum • Use sample of space • Uniform sampling • Galindo-Legariaet al. VLDB 1994 • Waas, Galindo-Legaria, SIGMOD 2000 • Simple hints/forcing will do too • Ignore certain plans • cost(P) > cost(Popt) * k • | act(P1) – act(P2) | < d

  14. Experiments Florian Waas, EMC/Greenplum • Commercial query optimizer • Built-in ranking module for sampling • Sample of 20 plans/query • Fixed seed for repeatability • 3 iterations for execution

  15. TPC-H • 1GB scale factor • (Very) good results overall • Known issues Results match expectations Florian Waas, EMC/Greenplum

  16. Sensitivity to Regressions PSA is effective early warning system Florian Waas, EMC/Greenplum • Modified cost model parameter • Costing of hash in HJ • BPF only affected by last modification • Detects any detrimental change immediately • Applies to all types of regressions

  17. Takeaways Florian Waas, EMC/Greenplum • Conventional testing of optimizer focuses on a singlebest-plan found per query • Ignores massive space of rejected alternatives • Plan Space Analysis • Takes many/all plans considered into account • Quantifies optimizer changes – even if result not affected • Detects regressions early in the development process

  18. Florian Waas, EMC Corp. florian.waas@emc.com leogia@microsoft.com Shin.zhang@microsoft.com www.database-research.com Thank you! Florian Waas, EMC/Greenplum

More Related