1 / 42

Multi-Query Optimization and Applications

Multi-Query Optimization and Applications. Prasan Roy Indian Institute of Technology - Bombay. Motivation. Queries often involve repeated computation Queries on overlapping views, stored procedures, nested queries, etc. Update expressions for a set of overlapping materialized views

donoma
Download Presentation

Multi-Query Optimization and Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Multi-Query Optimization andApplications Prasan Roy Indian Institute of Technology - Bombay

  2. Motivation • Queries often involve repeated computation • Queries on overlapping views, stored procedures, nested queries, etc. • Update expressions for a set of overlapping materialized views • Automatically generated queries • XML-QL complex path expressions  SQL query batches • Our focus: Faster query processing by avoiding repeated computation Multi-Query Optimization and Applications

  3. Outline • Multi-query optimization • Application to related problems • Query result caching • Materialized view selection and maintenance • Conclusions and future work Multi-Query Optimization and Applications

  4. Multi-Query Optimization Prasan Roy, S. Seshadri, S. Sudarshan and Siddhesh Bhobe, Efficient and Extensible Algorithms for Multi-Query Optimization, ACM SIGMOD 2000

  5. Motivating Example Best Plan for A JOIN B JOIN C Best Plan for B JOIN C JOIN D 100 100 10 B A 100 100 10 C B C D 10 10 10 10 Total Cost = 460 Foreign Key Dependency: ABCD Multi-Query Optimization and Applications

  6. Motivating Example 100 100 10 10 BC 10 D A 10 10 100 C B 10 10 Total Cost = 370 Benefit = 90 Foreign Key Dependency: ABCD Multi-Query Optimization and Applications

  7. Problem Statement • Find the cheapest plan exploiting transiently materialized common subexpressions (CSEs) • Assumption: No shared pipelines D A C B Common Subexpression Multi-Query Optimization and Applications

  8. Problems • Locally optimal subplans may not be globally optimal • Mutually exclusive alternatives (A JOIN B JOIN C) (B JOIN C JOIN D) (C JOIN D JOIN E) What to share: (B JOIN C) or (C JOIN D) ? • Materializing and sharing a CSE not necessarily cheaper Multi-Query Optimization and Applications

  9. Example Best Plan for A JOIN B JOIN C Best Plan for B JOIN C JOIN D 10 100 10 B A 1 10 10 C B C D 10 1 1 1 Total Cost = 154 Foreign Key Dependency: ABCD Multi-Query Optimization and Applications

  10. Example 100 10 10 10 BC 10 D A 1 10 10 C B 10 1 Total Cost = 172 Benefit = -18 Foreign Key Dependency: ABCD Multi-Query Optimization and Applications

  11. Approach • Set up the search space of execution plans • Explore the search space to find the best execution plan Multi-Query Optimization and Applications

  12. Representation of Plan Space • AND/OR Query DAG BCD ABC Equivalence Class (OR node) Operation (AND node) BC CD AB Example Plan (Solution Graph) C D A B Multi-Query Optimization and Applications

  13. DAG Generation ModificationsUnification • Volcano: Duplicate subexpressions  No CSEs! ABC BCD BC BC CD AB C A B C D B • Modification: Duplicate subexpressions unified Multi-Query Optimization and Applications

  14. DAG Generation ModificationsSubsumption • Volcano: No expression subsumption  Missed CSEs (A>50) (A<10) (A>50) Subsumption derivation (A<10) (A>50) (A>50) (A<10 or A>50) (A>10) • Modification: Subsumption derivations introduced Multi-Query Optimization and Applications

  15. Exploring the Search SpaceAn Exhaustive Algorithm Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • Y = set of equivalence nodes in DAG • Pick X  Y which minimizes BestCost(Q, X) • Return X BestCost(Q, X) = cost of the best plan for Q given that the nodes in X are transiently materialized Too expensive! Need heuristics. Multi-Query Optimization and Applications

  16. Exploring the Search SpaceA Greedy Heuristic Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • X = {}; Y = set of equivalence nodes in DAG • While( Y  {} ) Pick z Y which maximizes Benefit(z | Q, X) If( Benefit(z | Q, X) > 0 ) Y = Y – {z}; X = X U {z} Else Y = {} • Return X Benefit(z | Q, X) = BestCost(Q, X) - BestCost(Q, X U {z}) Appeared in [Gupta, ICDT97]. Our Contribution: improve efficiency Multi-Query Optimization and Applications

  17. Improving EfficiencySummary Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • X = {}; Y = set of equivalence nodes in DAG • While( Y  {} ) Pick z Y which maximizes Benefit(z | Q, X) If( Benefit(z | Q, X) > 0 ) Y = Y – {z}; X = X U {z} Else Y = {} • Return X • Restrict the set of materialization candidates • Compute Benefit efficiently • Heuristically avoid computing Benefit for some nodes   Multi-Query Optimization and Applications

  18. Improving EfficiencyOnly CSEs Materialized • CSEs identified in a bottom-up traversal BCD ABC Common Subexpression BC CD AB C D A B Multi-Query Optimization and Applications

  19. Improving EfficiencySummary Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • X = {}; Y = set of equivalence nodes in DAG • While( Y  {} ) Pick z Y which maximizes Benefit(z | Q, X) If( Benefit(z | Q, X) > 0 ) Y = Y – {z}; X = X U {z} Else Y = {} • Return X • Restrict the set of materialization candidates  • Compute Benefit efficiently • Heuristically avoid computing Benefit for some nodes   Multi-Query Optimization and Applications

  20. Efficient Benefit Computation Incremental Re-optimization X : Set of CSEs already materialized z : unmaterialized CSE Best plan given X materialized  Best plan given X U {z} materialized • Observation Best plans change only for the ancestors of z Multi-Query Optimization and Applications

  21. Incremental Re-optimization Example X = {} BCD ABC 230 230   230 120 120 230 z = (B JOIN C) 100 100 100 100 10 10 BC BC 10 130 CD AB 100 100 100 Best Plan C D A B 10 10 10 10 Multi-Query Optimization and Applications

  22. Incremental Re-optimization Efficient Propagation • Ancestor nodes visited bottom-up in a topological order • Guarantees no revisits • Propagation path pruned if the current node’s best cost remains unchanged Multi-Query Optimization and Applications

  23. Improving EfficiencySummary Input: DAG for query Q Output: Set of nodes to materialize, corresp. best plan • X = {}; Y = set of equivalence nodes in DAG • While( Y  {} ) Pick z Y which maximizes Benefit(z | Q, X) If( Benefit(z | Q, X) > 0 ) Y = Y – {z}; X = X U {z} Else Y = {} • Return X • Restrict the set of materialization candidates  • Compute Benefit efficiently  • Heuristically avoid computing Benefit for some nodes   Multi-Query Optimization and Applications

  24. Avoiding Benefit Computation • Monotonicity Assumption • Benefit of a node does not increase due to materialization of other nodes • Often true • An earlier benefit of a node is an upper bound on its current benefit • Do not recompute a node’s benefit if another node’s current benefit is greater Optimization costs decrease by 90% Multi-Query Optimization and Applications

  25. Experimental Results • TPCD-0.1 on Microsoft SQL Server 6.5 • using SQL rewriting for MQO Multi-Query Optimization and Applications

  26. Alternatives to Greedy Volcano-SH • A lightweight post-pass heuristic • Compute the best plan for each query independently, using Volcano • Find the set of nodes in the best plans to materialize (cost-based) • Similar previous work [Subramanium and Venkataraman, SIGMOD 1998] Multi-Query Optimization and Applications

  27. Alternatives to Greedy Volcano-RU • A lightweight extension of Volcano • Batched queries optimized in sequence Q1, Q2, …, Qn • Find the best plan for query Qi given the best plans for queries Qj, j < i • Cost based materialization of nodes in best plans of Qj, j < i • Plan quality sensitive to the query sequence Multi-Query Optimization and Applications

  28. Experimental Results • TPCD-0.1 query batches Multi-Query Optimization and Applications

  29. Experimental Results • TPCD-0.1 query batches Multi-Query Optimization and Applications

  30. Features • Easily implemented • First MQO implementation integrated with a state-of-the-art optimizer (as far as we know) • Also partially prototyped on Microsoft SQL-Server • Support for index selection • Index modeled as physical property (like “interesting order”) • Extensible and flexible • New operators, data models • Readily adapts to other problems • Query result caching • Materialized view selection/maintenance Multi-Query Optimization and Applications

  31. Query Result Caching P. Roy, K. Ramamritham, S. Seshadri, P. Shenoy and S. Sudarshan, Don’t Trash Your Intermediate Results, Cache ‘em, Submitted for publication

  32. Problem Statement • Minimize the total execution time of an online workload by • Caching intermediate/final results of individual queries, and • Using these cached results to answer later queries Multi-Query Optimization and Applications

  33. System Model Multi-Query Optimization and Applications

  34. Contributions • Intermediate as well as final results cached • Optimizer-driven cache management • Adapts to workload changes • Cache-aware cost-based optimization • Novel framework for cached result matching Multi-Query Optimization and Applications

  35. Experimental Results • Overheads negligible • Performance on 900 query TPCD-1 based uniform cube-point workload Multi-Query Optimization and Applications

  36. Materialized View Selection and Maintenance Hoshi Mistry, Prasan Roy, K. Ramamritham and S. Sudarshan, Materialized View Selection and Maintenance Using Multi-Query Optimization, Submitted for publication

  37. Problem Statement • Speed up maintenance of a set of materialized views by • Exploiting CSEs between different view maintenance expressions • Selecting additional views to be materialized Multi-Query Optimization and Applications

  38. Contributions • Optimization of maintenance expressions • Support for transiently materialized “delta’’ views • Nicely integrates transient vs permanent view materialization choices Multi-Query Optimization and Applications

  39. Experimental Results • Overheads negligible • Performance benefit for maintenance of two TPCD-0.1 based SPJA views Multi-Query Optimization and Applications

  40. Conclusion • MQO is practical • Low overheads, high benefits • Easily implemented and integrated • Leads to novel solutions to related problems • Query result caching • Materialized view selection and maintenance Multi-Query Optimization and Applications

  41. Future Work • Further extensions of MQO • Shared execution pipelines • Query result caching in presence of updates • Other problems • Continuous queries, XML view caching, etc. Multi-Query Optimization and Applications

  42. Other Contributions • Garbage Collection in Object Oriented Databases • Developed a “transaction-aware” cyclic reference counting algorithm • Provided a formal proof of correctness S. Ashwin, Prasan Roy, S. Seshadri, Avi Silberschatz and S. Sudarshan, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, VLDB 1997 Prasan Roy, S. Seshadri, Avi Silberschatz, S. Sudarshan and S. Ashwin, Garbage Collection in Object-Oriented Databases Using Transactional Cyclic Reference Counting, Invited Paper, VLDB Journal, August 1998 Multi-Query Optimization and Applications

More Related