1 / 35

Materialized View Selection and Maintenance using Multi-Query Optimization

Materialized View Selection and Maintenance using Multi-Query Optimization. Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham. Materialized Views. Complex results materialized in order to speed up queries that depend on these results

yanka
Download Presentation

Materialized View Selection and Maintenance using Multi-Query Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Materialized View Selection and Maintenance using Multi-Query Optimization Hoshi Mistry Prasan Roy S. Sudarshan Krithi Ramamritham

  2. Materialized Views Complex results materialized in order to speed up queries that depend on these results • Increasingly being supported by commercial database systems (e.g. Oracle8i) • Crucial in data warehousing environments

  3. Materialized View Maintenance As underlying data changes, the materialized views need to be refreshed Efficient view maintenance crucial! • Need to provide up-to-date query responses growing • Amount of data added to data warehouses increasing • Maintenance time window shrinking

  4. Focus Efficient techniques for maintenance of a set of materialized views (MVs) by • Transient materialization of common subexpressions (CSEs) • Selection of additional MVs • Computation of the best maintenance policy and plan for each MV

  5. Transient Materialization of Common Subexpressions CSEs materialized to reduce maintenance cost by sharing computation, disposed after use • Motivated by Blakeley et al. [SIGMOD86], Ross et al. [SIGMOD96] • Huge search space; considered impractical • Earlier work by Sellis [TODS88] • Efficient heuristic algorithms proposed by Roy et al. [SIGMOD00]

  6. Selection of Additional MVs Additional views materialized permanently to reduce the overall maintenance cost • Motivated by Ross et al. [SIGMOD96] • restricted to incremental maintenance only • do not consider transient materialization • MV selection in general addressed in Roussopolous [TODS82], Agrawal et al. [VLDB00]

  7. Best Maintenance Policy and Plan Computation For each MV, • Determine the best maintenance policy (incremental or recomputation) • Find the corresponding best plan • Earlier work by Vista [EDBT98] • Does not take into account transient materialization of CSEs or presence of other MVs • Current systems need manual specification of the maintenance policy

  8. Contribution A framework that consolidates the choice of • CSEs to be transiently materialized • Additional MVs • Best maintenance plan (incremental/recomputation) Integrated with a state of the art query optimizer (Volcano [ICDE93])

  9. Example initial set incremental refresh recomputation recomputation CDE BCDE ABC permanent permanent permanent merge incremental refresh DE permanent BC merge transient dA B C D dE

  10. Approach • Setting up the search space of maintenance plans • Best maintenance plan computation • Transient/Permanent materialized view selection

  11. Approach • Setting up the search space of maintenance plans • Best maintenance plan computation • Transient/Permanent materialized view selection

  12. Setting Up the Maintenance Plan Space • The Query DAG representation for recomputation plans • Incorporating incremental plans

  13. Representation of the Recomputation Plan Space AND/OR Query DAG BCD ABC Equivalence Class (OR node) Operation (AND node) BC CD AB Best Plan C D A B Additionally incorporates subsumption derivations Details in Roy et al. [SIGMOD00]

  14. Incremental Plans:Propagation Based Differential Generation • Differentials propagated one at a time • For each differential dR • Start at dR and compute node differentials bottom-up along the “best plan” in a topological order • Differential of a node computed as a function of its inputs and their differentials • e.g. d(E1E2) = E1 dE2 U E2dE1 U dE1dE2 where dEi= differential of Ei wrt dR • Refresh the relation R and the affected MVs wrt dR by merging with the differentials computed as above Ross et al. [SIGMOD96]

  15. Incorporating Incremental Plans:Propagation Based Differential Generation Propagation of dA BCdA Equivalence Class (OR node) Operation (AND node) BC BdA Best Plan C dA B

  16. Incorporating Incremental Plans:Propagation Based Differential Generation Propagation of dB CDdB ACdB Equivalence Class (OR node) Operation (AND node) CdB CD AdB Best Plan C D A dB

  17. Incorporating Incremental Plans:Propagation Based Differential Generation Propagation of dC BDdC ABdC Equivalence Class (OR node) Operation (AND node) BdC DdC AB Best Plan dC D A B

  18. Incorporating Incremental Plans:Propagation Based Differential Generation Propagation of dD BCdD Equivalence Class (OR node) Operation (AND node) BC CdD Best Plan C dD B

  19. Incorporating Incremental Plans Merge operator AB Logical representation incremental plan recomputation plan BdA AdB dA B A dB • For each equiv node and each base differential affecting it • Introduce a new equiv node representing its differential • Populate with the differential plans • Maintain statistics for the full expression after successive merges Large space overhead!

  20. Incorporating Incremental Plans Actual space-efficient representation AB BdA AdB dA B A dB Reuse the same structure for successive propagation cycles • separate best plan pointers for each cycle • separate statistics for the full expression after successive merges Also incorporates sort-orders, indices, etc. Roy et al. [SIGMOD00]

  21. Approach • Setting up the search space of maintenance plans • Best maintenance plan computation • Transient/Permanent materialized view selection

  22. Maintenance Plan Computation Given • Set of nodes Mt materialized transiently • can include full results as well as differentials • Set of nodes Mp materialized permanently • includes full results but not differentials compute the best consolidated maintenance plan for Mp

  23. Maintenance Plan Computation Best plan computed using a query optimizer extended as follows: • Plan accessing a materialized view (trans/perm) does not include its computation, only its use • Cost of a maintenance plan totalcost(Mp, Mt) = eMpmaintcost(e | Mp, Mt) + eMttrmatcost(e | Mp, Mt) where maintcost(Mp, Mt): cost of cheapest maintenance plan for e (recomputation/incremental) trmatcost(Mp, Mt): cost of computing and materializing e

  24. Approach • Setting up the search space of maintenance plans • Best maintenance plan computation • Transient/Permanent materialized view selection

  25. Transient/Permanent Materialized View Selection Given set of MVs M already materialized, determine • Set of nodes Mt to materialize transiently • Set of nodes Mp ( M) to materialize permanently such that totalcost(Mp, Mt) is minimized Exhaustive approach too expensive. Need heuristics!

  26. Transient/Permanent Materialized View SelectionA Greedy Heuristic Input: Initial MVs M Output: Mp ( M) , Mt, corresp. best plan Begin Mp = M; Mt = {} S = set of equivalence nodes in the DAG for M While ( S  {} ) Pick z  S which maximizes Benefit(z | Mp, Mt) If ( Benefit(z | Mp, Mt)  0 ) break If ( z is a full result and maintcost(z | Mp, Mt) < trmatcost(z | Mp, Mt) ) Mp =Mp U {z} else Mt = Mt U {z} S = S – {z} Return (Mp, Mt) End How to compute Benefit(z | Mp, Mt)?

  27. Transient/Permanent Materialized View SelectionBenefit Computation Benefit(z | Mp, Mt) = gain(z | Mp, Mt) - investment(z | Mp, Mt) where gain(z | Mp, Mt) = eMp(maintcost(e | Mp, Mt) - maintcost(e | Mp, Mt U {z})) + eMt(trmatcost(e | Mp, Mt) - trmatcost(e | Mp, Mt U {z})) and investment(z | Mp, Mt) = min(maintcost(z | Mp, Mt), trmatcost(z | Mp, Mt)) if z is a full result trmatcost(z | Mp, Mt) if z is a differential Benefit computation expensive. Need efficient techniques!

  28. Transient/Permanent Materialized View SelectionImproving Efficiency of the Greedy Heuristic • Cost-propagation based incremental techniques to efficiently compute Benefit • Monotonicity assumption • Reduces the number of Benefit computations • Techniques to determine if a node can be shared across a given maintenance plan • Reduces the number of nodes considered for transient materialization Adapted from Roy et al. [SIGMOD00]. See paper for details.

  29. Benchmark • Single Views • Same views as above, refreshed separately • Set of Views • 10 views (5 with aggregates, 5 without) on 8 distinct relations, refreshed together

  30. Effect of Transient and Permanent Materialization Single Views Set of Views

  31. Effect of Adaptive Maintenance Policy Selection Single Views Set of Views

  32. Scalability Analysis Optimization Memory Requirements Optimization Time Negligible one-time costs

  33. Conclusion Presented techniques • Automate sharing of computation • Automate view selection • Automate maintenance policy selection and plan computation • Do the above in an integrated manner • leading to benefits greater than could be achieved by considering each dimension individually • Are efficient and scalable • the overall benefits greatly outweigh the one-time cost • Integrate with state-of-the-art optimizers (e.g. MS SQL-Server)

  34. Future Work Extend presented techniques • To handle limited space • To speed up a workload of queries in addition to maintenance of a set of materialized views • To work in dynamic query result caching environments

  35. Questions

More Related