1 / 29

Selectivity-Based Partitioning

Selectivity-Based Partitioning. Alkis Polyzotis UC Santa Cruz. Query Optimization. Integral component of declarative query processing Key problem: join ordering Most important (and most complex!) module of a DBMS. R 1 R 2 R 3 R 4. Parser. R 1 R 2 R 3 R 4. Optimizer.

lan
Download Presentation

Selectivity-Based Partitioning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Selectivity-Based Partitioning Alkis Polyzotis UC Santa Cruz

  2. Query Optimization • Integral component of declarative query processing • Key problem: join ordering • Most important (and most complex!) module of a DBMS R1 R2 R3 R4 Parser R1 R2 R3 R4 Optimizer ( (R2 R3)R1) R4 Execution Engine

  3. “Monolithic” Query Optimization • Output: a single join order based on join selectivities between tables Plan:(P E) D

  4. Partition-Based Query Optimization • Output: multiple join orders based on selectivities between fragments of tables Plan: ( (P D2) E )  ( (E D1) P )

  5. Selectivity-Based Partitioning • Divide-and-Union paradigm • Optimization problem and analysis • Partitioning algorithm • Experimental results

  6. Roadmap • Preliminaries • Problem Definition • Partitioning Algorithm • Optimal Splits • Iterative Partitioning • Experimental Results • Conclusions

  7. Data and Query Model • Chain-join queries • Example: R1 R2 R3 R4 • Relations may have optional selections • Relation  Frequency matrix • Left-deep evaluation plans • Example: R3 R2 R4 R1 R1 R4 R3 R2

  8. Problem Definition • Given: query Q, maximum partition count N • Goal: find partitioning of Q in nN partitions that minimizes query cost • On-the-fly partitioning vs. Off-line partitioning • Difficult optimization problem! • Determine the pivot relation • Determine the number of partitions • Compute a partitioning of the pivot • Determine the orderings of partitioned plans R1R21 R4 R3 R1 R2 R3 R4 R3R22 R1 R4

  9. Query Cost Function • One possibility: optimizer’s cost model • Accurate cost estimation • Solution depends on low-level system details • Difficult to gain intuitions • Our approach: query cost = number of intermediate results • Simple function that admits analysis • Sound connections to realistic cost models (Cluet and Moerkotte, ICDT’95) Cost(R3 R2 R4 R1 ) = |R3 R2| + |R3 R2 R4|

  10. Roadmap • Preliminaries • Problem Definition • Partitioning Algorithm • Optimal Splits • Iterative Partitioning • Experimental Results • Conclusions

  11. Partitioning Algorithm - Overview • State space: partitioned join orders • Partitioning algorithm: • Explore a set of states • Compute optimal partitioning for each state • Return global optimum • Our approach: order joins then partition • Another possibility: partition then order joins

  12. Distributing Tuples • Goal: Distribute tuples to minimize cost • Optimal distribution depends on: • Frequency matrices of other relations • Position (m,l)

  13. Optimal Split Theorem • Distribute each value (m,l) independently • Place (m,l) in partition that minimizes g(L,T,m,l)

  14. Partitioning Algorithm - Overview • State space: partitioned join orders • Partitioning algorithm: • Explore a set of states • Compute optimal partitioning for each state • Return global optimum

  15. Search Algorithm • Exhaustive search is impractical [ Pivot, Leading orders, Trailing orders ] • Search heuristics: • Tighter search space: [ Pivot, Optimal Leading orders ] • Iterative Partitioning • Guided search by using lower bounds on cost of partitions

  16. Encoding of State Space • State: [ Pivot , Optimal leading orders ] • Transition: insert relation in a leading order

  17. R21 R21 R22 R22 Iterative Partitioning • Key idea: (Partition, Optimize)+ • Compute optimal split for leading/trailing orders • Optimize trailing orders for the current split • Theorem: query cost can only decrease • Idea extended to more detailed cost models Leading Trailing R3 R5 R4 R3 R4 R5 R1 R2 R3 R4 R1 R5 R5 R1

  18. Search Algorithm • Initial states: single-relation leading orders • Search process: • Compute partitions with IP • Open more states with transition function • Transitions are guided by lower bound on cost function • Same lower bound can also prune states • Stopping criteria: • Search space is exhausted • Time budget is exhausted

  19. System Integration Monolithic Partition-based Parser Parser Partitioner Optimizer Optimizer Execution Engine Execution Engine

  20. Roadmap • Preliminaries • Problem Definition • Partitioning Algorithm • Optimal Splits • Iterative Partitioning • Experimental Results • Conclusions

  21. Effect of Skew Synthetic Data

  22. Execution Time Synthetic Data (Skew=1.5)

  23. Varying Time Budget Synthetic Data (Skew=1.5)

  24. Results on Real-Life Data SwissProt

  25. Conclusions • Monolithic optimization  Missed opportunities • Selectivity-Based Partitioning • Divide & Union approach • Multiple join orders per query • Join selectivity between relation fragments • Partitioning Algorithm • Iterative Partitioning • Experimental Results • Significant reduction of intermediate results

  26. Future Work • Extension to multiple pivots • Partition-then-order optimization • Efficient execution of partitioned plans • Off-line workload-aware partitioning

  27. Thank you!

  28. Partitioning Model • General case: Multi-relation partitioning • Our approach: Single-relation partitioning R1R21 R4 R3 R1 R2 R3 R4 R31R22 R1 R4 R1R22R32 R4

More Related