1 / 32

Achieving Scalability in OLAP Materialized View Selection

Achieving Scalability in OLAP Materialized View Selection. Thomas P. Nadeau Toby J. Teorey University of Michigan DOLAP 2002. Topics. Overview of OLAP Exponentiality in View Selection Our Polynomial Greedy Algorithm (PGA) Test Results Conclusions Current Work. Customer. CustID. Name.

chico
Download Presentation

Achieving Scalability in OLAP Materialized View Selection

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Achieving Scalability in OLAP Materialized View Selection Thomas P. Nadeau Toby J. Teorey University of Michigan DOLAP 2002

  2. Topics • Overview of OLAP • Exponentiality in View Selection • Our Polynomial Greedy Algorithm (PGA) • Test Results • Conclusions • Current Work

  3. Customer CustID Name Fact Table City Calendar CustID State/Prov DateID DateID Bind Style BindID Month Cost BindID Quarter Sell Desc Year Example Star Schema

  4. CustID Name City State/Prov DateID Month Quarter Year 1/1/98 Jan 1 1998 00001 U of M Ann Arbor MI 1/2/98 Jan 1 1998 00002 Smith & Co. Toronto Ont       12/31/00 Dec 4 2000 CustID DateID BindID Cost Sell 00002 12/31/00 PB $500 $600 BindID Desc 00222 1/1/99 HC $1100 $1300 PB Paper Back    Many Rows HC Hard Cover Star Schema Viewed with Data Customer Calendar Fact Table Bind Style

  5. Eight Dimensions of Book Database

  6. Combinatorial Explosion d i = 1 • Possible views = ℓi, where d = |dimensions| ℓi = |levels| in dimension i • Book database example • 2 dimensions, 42 = 16 views • 4 dimensions, 44 = 256 views • 6 dimensions, 46 = 4,096 views • 8 dimensions, 48 = 65,536 views

  7. Recap • Materialized views quicken query responses • Disk space limits view materialization • Update window is a constraint • Solution: Select strategic views

  8. Fact Table View Size Estimation Sample Data Estimate Request Estimated View Size Completed Work Initial Data View Selection Strategic Views View Maintenance Incremental Data Update Current Work Current Views Queries Users Query Optimization Quick Responses Our OLAP Optimization Approach

  9. {c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 View Selection:Example of Hypercube Lattice [HRU96] p = Part s = Supplier c = Customer

  10. {c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Iteration 1 {p, s} {c, s} {c, p} {s} {p} {c} {} 5.2M x 4 = 20.8M 0 x 4 = 0 0 x 4 = 0 5.99M x 2 = 11.98M 5.8M x 2 = 11.6M 5.9M x 2 = 11.8M 6M - 1 Example of HRU Algorithm [HRU96] p = Part s = Supplier c = Customer Benefits of Possible Materialization Choices

  11. {c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Iteration 1 {p, s} {c, s} {c, p} {s} {p} {c} {} 5.2M x 4 = 20.8M 0 x 4 = 0 0 x 4 = 0 5.99M x 2 = 11.98M 5.8M x 2 = 11.6M 5.9M x 2 = 11.8M 6M - 1 Example of HRU p = Part s = Supplier c = Customer Benefits of Possible Materialization Choices Iteration 2 0 x 4 = 0 0 x 4 = 0 0.79M x 2 = 1.58M 0.6M x 2 = 1.2M 5.9M x 2 = 11.8M 0.8M - 1

  12. Exponentiality in HRU • O(kn2) time, where k = |views to select|, n = |possible views| • n = 2d in non-hierarchical database, where d = |dimensions| • HRU algorithm is O(k22d) time • Two sources of exponentiality • Each possible view is evaluated • Each view evaluation considers the effect of materialization on every descendent

  13. Polynomial Greedy Algorithm (PGA) Nomination Selection For each candidate Select fact table Evaluate benefit Start new path [more candidates] [path ended] [else] Select view greedily [continuing path] Nominate smallest child view [else] [termination condition met]

  14. {c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Example of PGA [NT02] p = Part s = Supplier c = Customer

  15. {c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Candidates {p, s} {s} {} Example of PGA p = Part s = Supplier c = Customer Nomination

  16. {c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Candidates Iteration 1 {p, s} {s} {} 5.2M x 4 = 20.8M 5.99M x 2 = 11.98M 6M - 1 Example of PGA p = Part s = Supplier c = Customer Nomination Selection

  17. Candidates Iteration 1 {p, s} {s} {} 5.2M x 4 = 20.8M 5.99M x 2 = 11.98M 6M - 1 Example of PGA {c, p, s} 6M p = Part s = Supplier c = Customer {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Nomination Selection Nomination Candidates {c, s} {s} {c} {}

  18. {c, p, s} 6M {p, s} 0.8M {c, s} 6M {c, p} 6M {s} 0.01M {p} 0.2M {c} 0.1M {} 1 Candidates Iteration 1 Candidates Iteration 2 {c, s} {s} {c} {} 0 x 2 = 0 0.79M x 2 = 1.58M 5.9M x 2 = 11.8M 6M - 1 {p, s} {s} {} 5.2M x 4 = 20.8M 5.99M x 2 = 11.98M 6M - 1 Example of PGA p = Part s = Supplier c = Customer Nomination Selection Nomination Selection

  19. Nomination Complexity • Maximum swatch width is d. • Maximum path length is d. • Finding one path is O(d2) time • Our strategy nominates a path each time a view is selected, complexity is O(d2k) time

  20. Evaluating Views in PGA • Polynomial time evaluation requires approximating materialization benefits • Account for smallest ancestor • Account for materialized view with largest overlap in descendants • Complexity of our algorithm is O(d2k2)

  21. Complexities

  22. Near Optimal Selection Query Costs (rows) d=2, ℓ = 4 Materialization Costs (rows)

  23. HRU PGA Query Costs at Four Dimensions Query Costs (thousands of rows) Materialization Costs (thousands of rows)

  24. HRU PGA Query Costs at Six Dimensions Query Costs (millions of rows) Materialization Costs (thousands of rows)

  25. HRU PGA Query Costs at Eight Dimensions Query Costs (millions of rows) Materialization Costs (thousands of rows)

  26. HRU PGA Performance at Four Dimensions Processing Time (seconds) Materialization Costs (thousands of rows)

  27. HRU PGA Performance at Six Dimensions Processing Time (minutes) Materialization Costs (thousands of rows)

  28. HRU PGA Performance at Eight Dimensions Processing Time (minutes) Materialization Costs (thousands of rows)

  29. Conclusions • PGA finds a good set of views for materialization, when HRU fails due to algorithm complexity • PGA extends the usefulness of OLAP systems into higher dimensionality

  30. Fact Table View Size Estimation Sample Data Estimate Request Estimated View Size Completed Work Initial Data View Selection Strategic Views View Maintenance Incremental Data Update Current Work Current Views Queries Users Query Optimization Quick Responses Current Work

  31. Current Work • Design alternative data structures for materialized views in OLAP • Test impact of new data structures on update and query costs. • Integrate our work into an OLAP system

  32. References • [HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of 1996 ACM-SIGMOD Conf., pp. 205 - 216, Montreal, Canada. • [NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp 1 – 13, Toronto, Canada. • [NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version). http://www.eecs.umich.edu/~teorey/cv.html.

More Related