1 / 32

Automatic Physical Design Tuning: Workload as a Sequence

Automatic Physical Design Tuning: Workload as a Sequence. Sanjay Agrawal, Microsoft Research Eric Chu, University of Wisconsin-Madison Vivek Narasayya, Microsoft Research. Automatic Physical Design Tuning. DB applications more complex and varied. Considerable time spent on tuning.

said
Download Presentation

Automatic Physical Design Tuning: Workload as a Sequence

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Automatic Physical Design Tuning:Workload as a Sequence Sanjay Agrawal, Microsoft Research Eric Chu, University of Wisconsin-Madison Vivek Narasayya, Microsoft Research

  2. Automatic Physical Design Tuning • DB applications more complex and varied. • Considerable time spent on tuning. • Reduce cost of ownership of RDBMS. • Automatically recommend physical design. • Supported by DB vendors. • Database Engine Tuning Advisor, Microsoft • Design Advisor, IBM • SQL Access Advisor, Oracle SIGMOD 2006

  3. Microsoft Database Engine Tuning Advisor Set of queries, updates Applications Workload Query Optimizer (extended) Database Engine Tuning Advisor “What-if” Set of indexes, materialized views, horizontal partitions Microsoft SQL Server 2005 Recommendation SIGMOD 2006

  4. Create Indexes Drop Indexes Create Indexes Queries Day Updates Night Queries Day Workload as a Sequence: Motivation • Data warehousing • Query by day, update at night. • Set: No index recommended when update costs outweigh benefits. • Sequence: May exploit benefits of indexes without incurring update costs. • Insert “create” and “drop” of indexes to workload. • Exploit order of statements. SIGMOD 2006

  5. Create Indexes Drop Indexes Create Indexes Set VS Sequence • Set-based • Recommendation is robust to changes in order of statement arrival. • Can miss good recommendations compared to sequenced-based approach. • Outputs are different • Set: what indexes to create or drop? • Sequence: what indexes to create or drop and where? Queries Updates Queries SIGMOD 2006

  6. Model Workload as a Sequence • Motivation • Problem Definition • Optimal Algorithm • Disjoint Sequences • Greedy-SEQ • Experiments SIGMOD 2006

  7. C0 C1 C2 C3 CN CN+1 S1 S2 SN S3 Problem Setting Workload: S = [S1, S2, …, SN] • Cost(Si,Ci) – cost of executing Si with Ci. • TC(C1, C2) – transition cost • Sequence execution cost • Nk=1((Cost(Sk,Ck) + TC(Ck-1,Ck)) + TC (CN,CN+1) Si{Select, Insert, Delete, Update} SIGMOD 2006

  8. Problem Definition • Given: • Database D, workload W = [S1, …, SN], initial configuration C0, and storage bound M. • Find configurations C1, C2, …, CN+1 such that • Minimize sequence execution cost: Nk=1((Cost(Sk,Ck) + TC(Ck-1,Ck)) + TC (CN,CN+1) • Storage of Ci≤ M, for all i. SIGMOD 2006

  9. Search Space • Given N statements and M indexes • Sequence-based tuning • 2M distinct configurations for each statement. • 2M(N+1) possible execution sequences. • Set-based tuning • 2M configurations. SIGMOD 2006

  10. Model Workload as a Sequence • Motivation • Problem Definition • Optimal Algorithm • Disjoint Sequences • Greedy Heuristic • Experiments SIGMOD 2006

  11. S1 S2 SN 0 { } { } { } SOURCE 0 0 DESTINATION Ic { } { } Id Ic Id {I} 0 {I} {I} Optimal Algorithm for Single-Index Case • Node costs: Cost(Si, { }) and Cost(Si,{I}). • Edge costs: 0, IC, and ID. • Cost of shortest path includes node and edge costs. DAG for single index, N statements SIGMOD 2006

  12. General Case – Multiple Indexes S1 S2 SN EXHAUSTIVE • At each stage, enumerate all possible configurations from the set of indexes. • Algorithm linear in the number of nodes and edges of DAG. • However, number of nodes in DAG is exponential in the number of indexes. • M indexes => O(N*2M) nodes and O(N*2M) edges. CF1 CF2 CFN C0 Ci1 Ci2 CiN CN+1 C11 C12 C1N C01 C02 C0N SIGMOD 2006

  13. Sequence, Constraints Candidateset of structures Solve sequence using EXHAUSTIVE Recommendation Optimal Solution SIGMOD 2006

  14. Search-Space Pruning Techniques to reduce number of nodes: • Cost-based Pruning • Leverages shortest-path solutions of individual indexes. • Prunes configurations at each stage without loss of optimality. • Disjoint Sequences • Divide-and-conquer approach. • Splits the input sequence and candidate index set. • Greedy-SEQ • Guarantees a polynomial number of nodes. SIGMOD 2006

  15. Model Workload as a Sequence • Motivation • Problem Definition • Optimal Algorithm • Disjoint Sequences • Greedy Heuristic • Experiments SIGMOD 2006

  16. Exploiting Disjoint Sequences • Two sequences X and Y are disjoint if they do not share any statements AND indexes. • Disjoint sequences are common • E.g., server hosts multiple applications that touch different databases. • Approach: • Split workload into disjoint sequences. • Solve each sequence independently. • Merge to get final solution. • Idea: DAG for each disjoint sequence has fewer nodes. SIGMOD 2006

  17. W {I1,I2,I3} W1 {I1} S3 S4 S5 S6 S1 S2 S7 S3 S4 S1 W2 {I2} S5 S6 S2 W3 {I3} S7 Efficiency Gain with Disjoint Sequences 8 nodes at each stage 2 nodes at each stage for each sequence SIGMOD 2006

  18. SRC S1 S3 S4 DEST I1c I1d { } {I1} {I1} { } { } W1 = [S1,S3,S4] S2 S5 S6 DEST SRC I2c I2d { } { } {I2} {I2} { } S7 DEST SRC I3c {I3} {I3} W2 = [S2,S5,S6] { } W3 = [S7] S2 S3 S4 S5 S6 S7 DEST SRC S1 {I1,I2} {I1,I2} {I2} { } {I2} { } {I3} {I3} {I1} Pu is optimal when there are no storage violations. Merge solutions of W1, W2, and W3: No storage violations SIGMOD 2006

  19. { } Pu is not a valid solution as it has configurations with storage violation. DEST S4 S5 S6 S7 DEST SRC S1 S2 S3 SRC S1 S2 S3 S4 S5 S6 S7 {I1} {I1,I2} { } {I1} {I2} {I2} {I2} { } {I3} {I3} {I2} {I1,I2} {I2} { } {I3} {I3} {I1} Merge in the presence of storage violation • Suppose storage bound allows only 1 index. Pu’= Merge P1, P2 and P3 to get a valid solution. • Note that cost of Pu is a lower bound on cost of any valid solution. SIGMOD 2006

  20. Apply Split operator to get disjoint sequences Solve each sequence independently using EXHAUSTIVE Merge results of disjoint sequences Recommendation Solution with Split and Merge Sequence, Constraints Candidateset of structures or GREEDY-SEQ SIGMOD 2006

  21. Model Workload as a Sequence • Motivation • Problem Definition • Optimal Algorithm • Disjoint Sequences • Greedy Heuristic • Experiments SIGMOD 2006

  22. Greedy Approach • Goal: • Explore a polynomial number of good configurations. • Run shortest path over the DAG constructed with these configurations. • Solution close to optimal. • Greedy-SEQ: adaptation of existing greedy technique for the sequence model. SIGMOD 2006

  23. Greedy-SEQ • Steps of Greedy-SEQ: • Get optimal solution for each index. Record configurations. • Initialize current best to be the lowest-cost solution seen so far. • Improve current best by combining with other solutions and resetting current best. Record new configurations of current best. • Repeat until no more improvement. • Run shortest-path over configurations collected. SIGMOD 2006

  24. S0 S1 S2 SK SL SN SN+1 I1 {I1} {I1} {} {} {} {I1} {} I2 {} {I2} {I2} {I2} {} {} {} {I1} {I1} {I1} {} {} {I2} I1,I2 {I2} {I2} {} {} {} {I1,I2} {I1,I2} Combining Two Single-Index Solutions SIGMOD 2006

  25. S0 S1 S2 SK SL SN SN+1 I1 {I1} {I1} {} {} {} {I1} {} I2 {} {I2} {I2} {I2} {} {} {} {I1} {I1} {I1} {} {} {I2} {I2} {} {} {} {I1,I2} {I1,I2} Combining Two Single-Index Solutions {I2} I1,I2 SIGMOD 2006

  26. Greedy-SEQ: Greedy Approach • Get optimal solution for each index. Record configurations. • Initialize current best to be the lowest-cost solution seen so far. • Improve current best by combining with other solutions and resetting current best. Record new configurations of current best. • Repeat Step 3 until no more improvement. • Run shortest-path over configurations collected. SIGMOD 2006

  27. Sequence, Constraints Candidateset of structures Apply split operator to get disjoint sequences Apply cost-based pruning oneach sequence Solve each sequence independently using EXHAUSTIVE or GREEDY-SEQ Merge results of disjoint sequences Recommendation End-to-End Solution SIGMOD 2006

  28. Model Workload as a Sequence • Motivation • Problem Definition • Optimal Algorithm • Disjoint Sequences • Greedy Heuristic • Experiments SIGMOD 2006

  29. Sequence VS Set-based approaches • % improvement relative to the optimal set-based solution. • Sequence is better in the presence of updates and/or storage bound is low. SIGMOD 2006

  30. Greedy-SEQ VS Exhaustive • Greedy-SEQ’s much faster with minimal degradation in quality. SIGMOD 2006

  31. Effectiveness of Split and Merge • With split and merge (SPMR) VS without (WO-SPMR) SIGMOD 2006

  32. Conclusion • Sequence model allows more optimization opportunities than set model. • Model the problem as finding the shortest path over a DAG. • Heuristics give nearly optimal solutions with much better performance. SIGMOD 2006

More Related