
Testing Challenges for Extending SQL Server's Query Processor: A Case Study




  1. Testing Challenges for Extending SQL Server's Query Processor: A Case Study Torsten Grabs, Steve Herbert, Xin (Shin) Zhang {torsteng; stevhe; xinzh}@microsoft.com DBTest2008

  2. Agenda • Motivation • Background • Relational Data Warehousing (DW) • SQL Server 2008 Starjoin improvement • Testing Challenge • Extending Enterprise-class Commercial DBMS • Solution • Iterative development process • Multi-dimensional testing • Case Study Results • Conclusions

  3. Motivation • Data warehouses are huge • Billions of rows in fact tables • Multi-terabyte databases • Query response time requirements are strict • Interactive response times desired: <5 sec • Ideally: speed-of-thought response time • Plan choice is CRUCIAL for good performance • User requirements are challenging • Large input space • Zero administration overhead • Do not break existing customer base

  4. Background: Relational DW Business Question: Give me total sales of SQL Server 2005 in the second quarter of year 2006. Example Star Query: SELECT SUM(Dollars) FROM Sales S JOIN Product P ON P.Product_Key = S.Product_Key JOIN Period Pe ON Pe.Date_Key = S.Date_Key WHERE Product_Name = 'SQL Server 2005' AND Quarter_Number = 2 AND Year = 2006 • Fact table: Sales (Date_Key, Product_Key, Qty_Sold, Dollars) • Dimension tables: Product (Product_Key, Product_ID, Product_Name, Category), Period (Date_Key, Quarter_Number, Year)

  5. Background: New Feature • Fact table selectivity matters for plan choice • As fact table selectivity goes from 0% of fact rows qualifying to 100%: high selectivity queries (few qualifying rows) get seek-based plans with nested loop joins; medium selectivity queries get scan-based plans with bitmap hash joins; low selectivity queries (most rows qualify) get scan-based plans with regular hash joins • SQL Server 2008 improves medium selectivity queries
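The selectivity spectrum on this slide can be sketched as a plan-choice function. A minimal, hypothetical sketch in Python; the threshold values below are invented for illustration and are not SQL Server's actual costing boundaries:

```python
# Illustrative sketch: fact-table selectivity (fraction of fact rows that
# qualify) drives which join strategy wins. Thresholds are assumptions.

def choose_plan(qualifying_fraction: float) -> str:
    """Pick a join strategy from the estimated fact-table selectivity."""
    if not 0.0 <= qualifying_fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if qualifying_fraction < 0.01:      # high selectivity: very few rows
        return "seek-based plan with nested loop joins"
    if qualifying_fraction < 0.25:      # medium selectivity: SQL 2008 target
        return "scan-based plan with bitmap hash joins"
    return "scan-based plan with regular hash joins"

print(choose_plan(0.001))  # seek-based plan with nested loop joins
print(choose_plan(0.10))   # scan-based plan with bitmap hash joins
print(choose_plan(0.90))   # scan-based plan with regular hash joins
```

The interesting engineering (and testing) problem is precisely where those boundaries fall, since a wrong estimate flips the plan choice.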

  6. DW-specific Extensions of the SQL Query Processor • Standard optimization: the SQL Server Query Optimizer applies standard (join) query optimizations to produce alternative query plans • Optimization extension for DW: star query detection (schema detection plus selectivity analysis) contributes star query plans to the alternatives • Cost-based plan choice then selects the final query plan
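The flow on this slide can be mimicked in a toy sketch: standard optimization always contributes alternatives, the DW extension adds star-join plans only when star query detection fires, and the final choice is purely cost-based. All names, costs, and the detection rule below are invented for illustration:

```python
# Toy sketch of the optimization pipeline: detection gates the extra
# alternatives, but the cheapest plan wins regardless of its origin.

def is_star_query(query) -> bool:
    # Crude stand-in for schema detection: one fact table, >= 2 dimensions.
    return len(query["dimensions"]) >= 2

def optimize(query) -> str:
    # (plan name, estimated cost) pairs; costs are made up.
    candidates = [("standard hash join plan", 100.0)]
    if is_star_query(query):
        # Selectivity analysis would refine this estimate in practice.
        candidates.append(("star-join plan with semi-join reduction", 40.0))
    return min(candidates, key=lambda plan: plan[1])[0]

print(optimize({"fact": "Sales", "dimensions": ["Product", "Period"]}))
# star-join plan with semi-join reduction
print(optimize({"fact": "Sales", "dimensions": ["Product"]}))
# standard hash join plan
```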

  7. Bitmap-based Semi-join Reduction • Join Reduction Info 1: surrogate key values of rows qualifying the filter over the store dimension table • Join Reduction Info 2: surrogate key values of rows qualifying the filter over the product dimension table • Join reduction processing applies both filters during the fact table scan, so the rowset after join reduction is much smaller than the rowset before it, leaving less work for the hash joins against the store and product dimension tables
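A minimal sketch of the semi-join reduction described above, with Python sets standing in for the real bitmap filters; the table contents and column names are invented:

```python
# Sketch of bitmap-based semi-join reduction: each dimension filter yields
# its "join reduction info" (qualifying surrogate keys), and the fact scan
# drops rows whose keys miss any set before the hash joins run.

def qualifying_keys(dimension_rows, predicate, key="key"):
    """Join reduction info: surrogate keys of rows passing the filter."""
    return {row[key] for row in dimension_rows if predicate(row)}

def reduce_fact_scan(fact_rows, reductions):
    """Apply every dimension's key set during the fact table scan."""
    return [row for row in fact_rows
            if all(row[fk] in keys for fk, keys in reductions.items())]

product = [{"key": 1, "name": "SQL Server 2005"}, {"key": 2, "name": "Office"}]
store = [{"key": 10, "region": "West"}, {"key": 11, "region": "East"}]
fact = [{"product_key": 1, "store_key": 10, "dollars": 5.0},
        {"product_key": 2, "store_key": 10, "dollars": 3.0},
        {"product_key": 1, "store_key": 11, "dollars": 7.0}]

reductions = {
    "product_key": qualifying_keys(product, lambda r: r["name"] == "SQL Server 2005"),
    "store_key": qualifying_keys(store, lambda r: r["region"] == "West"),
}
survivors = reduce_fact_scan(fact, reductions)
print(survivors)  # [{'product_key': 1, 'store_key': 10, 'dollars': 5.0}]
```

Only one of the three fact rows survives both reductions, which is the whole point: the expensive joins see a fraction of the scan's output.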

  8. Testing Challenge • Large input space of queries • Full range of selectivity • Mixed ad-hoc and parameterized queries • Complex schema and workloads • Automatic feature • Correct cost-based plan choice • Smart plan pattern detection • Accurate join selectivity estimation • No knobs – no application changes required • Happy existing customers • Significant improvements • Negligible regressions
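The last two bullets, significant improvements with negligible regressions, imply a per-query comparison of elapsed times with the feature off and on. A hedged sketch of such a check; the 5% tolerance is an assumption, not the threshold the team actually used:

```python
# Classify each workload query by comparing baseline (feature off) and
# new (feature on) elapsed times, flagging anything noticeably slower.

def classify(baseline_ms, new_ms, tolerance=0.05):
    results = {"improved": [], "regressed": [], "unchanged": []}
    for query, old in baseline_ms.items():
        new = new_ms[query]
        if new < old * (1 - tolerance):
            results["improved"].append(query)
        elif new > old * (1 + tolerance):
            results["regressed"].append(query)
        else:
            results["unchanged"].append(query)
    return results

r = classify({"q1": 1000, "q2": 400, "q3": 600},
             {"q1": 300, "q2": 520, "q3": 610})
print(r)  # {'improved': ['q1'], 'regressed': ['q2'], 'unchanged': ['q3']}
```

In the iterative process described later, a non-empty `regressed` bucket is what sends a design back for another iteration.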

  9. Agenda • Motivation • Background • Relational Data Warehousing (DW) • SQL Server 2008 Starjoin improvement • Testing Challenge • Extending Enterprise-class Commercial DBMS • Solution • Iterative development process • Multi-dimensional testing • Case Study Results • Conclusions

  10. Iterative Development Process • In-cycle validation of assumptions • Mitigates risk of major end-of-cycle issues • Especially performance problems • Maximal overlap of testing and development efforts

  11. Multi-Dimensional Testing • Functional testing • Target testing to ensure core functionality • Model-based testing to ensure coverage • Performance testing • Component • Benchmark • Customer Workloads

  12. Functional: Target Testing

  13. Functional: Model-Based Testing • Large number of test dimensions • 10+ test dimensions … • Assuming just 3 variations each … • generates roughly 60K combinations (3^10 = 59,049)! • Two abstract models covering key requirements • Schema model • Database schema and data • Query model • Star-join queries built on top of the schema model
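The combinatorics in the first bullet can be checked directly: 10 test dimensions with 3 variations each give 3**10 = 59,049 combinations, the "60K" quoted above, which is why exhaustive enumeration gives way to abstract models:

```python
# Enumerate every combination of 10 test dimensions x 3 variations each
# to confirm the explosion the slide warns about.
from itertools import product

variations_per_dimension = 3
test_dimensions = 10

combinations = list(product(range(variations_per_dimension),
                            repeat=test_dimensions))
print(len(combinations))  # 59049
```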

  14. Functional: Schema Model

  15. Functional: Query Model

  16. Model-based Test Example • Test scenario • Testing selectivity estimation of a single-fact star schema • Schema model • Number and classification of tables: 1 fact, 5 dimensions • Relationships between tables: star schema • Cardinality: fact 100K rows, dimensions 10 rows each • Data distribution: uniform • Query model • Number of facts: 1 • Number of dimensions: 10 • Dimension selectivity: 0.4~0.8 (5 choices) • Fact aggregation: 1 aggregation (12 possible types) • Nested subqueries: none • Fact selectivity: 0.1~1.0 • A single test scenario covers 5^5 × 12 (37,500) test cases
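A plausible generator for the scenario above, assuming the "5 choices" of dimension selectivity apply independently to the five dimension tables and combine with the 12 aggregate types; the parameter names and aggregate stand-ins are assumptions, not the paper's actual model code:

```python
# Expand the query model's free parameters into concrete test cases via a
# Cartesian product: 5 selectivity choices per dimension table x 12
# aggregate types.
from itertools import product

SELECTIVITIES = [0.4, 0.5, 0.6, 0.7, 0.8]          # 5 choices per dimension
AGGREGATES = [f"AGG_{i}" for i in range(12)]       # 12 aggregate stand-ins

def generate_cases(num_dimensions=5):
    for sels in product(SELECTIVITIES, repeat=num_dimensions):
        for agg in AGGREGATES:
            yield {"dimension_selectivities": sels, "aggregate": agg}

cases = list(generate_cases())
print(len(cases))  # 37500  (= 5**5 * 12)
```

This is the payoff of model-based testing: two small models expand mechanically into tens of thousands of cases, with coverage guaranteed by construction rather than by hand-written tests.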

  17. Performance Testing

  18. Case Study Results • ~10 different workloads • 3 representative results • Decision support workload results • Microsoft sales data warehouse results • Retail workload results

  19. Results: Decision Support Workload • 100GB data • 70+ queries • Typical DSS scenario • Schema • Queries • Limited performance benefit for initial design • Lots of regressions initially • Good convergence over several iterations

  20. Results: Microsoft Sales DW • 750GB data • 50 queries • Complex queries • > 20 joins • Started with a good design for performance • But: too many regressions with the initial design • Converged to a good result over several iterations

  21. Results: Retail Workload • 100GB data • 30 queries • Complex physical design • Indexes • Partitioning • Several iterations to establish the “winning” design • Significant improvements after several iterations • Regressions limited to “2 wrongs make 1 right” (see Giakoumakis/Galindo-Legaria, TKDE 2008)

  22. Conclusions • Extension of SQL Server's query processor for relational DW • New feature with zero administration overhead • Widely deployed system • Identified testing challenges • Balance performance improvement and regression risk • Solution • Iterative development and testing cycles • Multi-dimensional testing (functional, performance) • Iterative development and testing insights • Supports learning and adjustment during development • Delivers well-understood results • Leads to high-quality features

  23. Q&A http://www.microsoft.com/sql/2008 Torsten: torsteng@microsoft.com Steve: stevhe@microsoft.com Shin: shin.zhang@microsoft.com
