
Quotient Cube: How to Summarize the Semantics of a Data Cube

Quotient Cube: How to Summarize the Semantics of a Data Cube. Laks V.S. Lakshmanan (Univ. of British Columbia) * Jian Pei (State Univ. of New York at Buffalo) * Jiawei Han (Univ. of Illinois at Urbana-Champaign) + * The work is partially supported by NSERC and NCE/IRIS


Presentation Transcript


  1. Quotient Cube: How to Summarize the Semantics of a Data Cube • Laks V.S. Lakshmanan (Univ. of British Columbia)*, Jian Pei (State Univ. of New York at Buffalo)*, Jiawei Han (Univ. of Illinois at Urbana-Champaign)+ • * The work is partially supported by NSERC and NCE/IRIS • + The work is partially supported by NSF, UI, and Microsoft Research

  2. Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary Lakshmanan, Pei & Han. Quotient Cube: How to Summarize the Semantics of a Data Cube

  3. Data Cube • Figure: a base table and the data cube obtained from it by aggregation

  4. Previous Work: Efficient Cube Computation • Compute a cube from a base table: e.g. (Agarwal et al. 96), (Zhao et al. 97) • View materialization with space constraint: e.g. (Harinarayan et al. 96) • Handling sparsity (Ross & Srivastava 97) • Cube compression: e.g. (Sismanis et al. 02), (Shanmugasundaram et al. 99), (Wang et al. 02) • Approximation: e.g. (Barbara & Sullivan 97), (Barbara & Wu 00), (Vitter et al. 98) • Constrained cube construction: e.g. (Beyer & Ramakrishnan 99)

  5. Previous Work: Extracting Semantics From Cubes • General contexts of patterns (Sathe & Sarawagi 01) • Generalize association rules (Imielinski et al. 00) • Cube gradient analysis (Dong et al. 01)

  6. Cube (Cell) Lattice • Many cells have the same aggregate value • Can we summarize the semantics of the cube by grouping cells by aggregate value? • Lattice cells, from the base tuples up to the apex: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9
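
The lattice on this slide can be reproduced from its three base tuples. The following is a minimal sketch, assuming the measure is aggregated with AVG (which is consistent with values such as (*,P1,*):7.5); the names `BASE`, `rolls_up_to`, and `avg_cube` are hypothetical and do not come from the presentation.

```python
from itertools import product

# Base table taken from the slide: (dimension values) -> measure.
BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d (d generalizes c)."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def avg_cube(base):
    """Map every non-empty cell to the AVG of the base tuples it covers."""
    cube = {}
    for t in base:
        # Each generalization of a base tuple is a non-empty cell.
        for cell in product(*[(v, "*") for v in t]):
            if cell not in cube:
                vals = [m for u, m in base.items() if rolls_up_to(u, cell)]
                cube[cell] = sum(vals) / len(vals)
    return cube

CUBE = avg_cube(BASE)
# Print cells level by level, most specific first.
for cell in sorted(CUBE, key=lambda c: c.count("*")):
    print(cell, CUBE[cell])
```

Only non-empty cells are generated, so the result has the same 18 cells as the lattice shown on the slide.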

  7. A Naïve Attempt • Put all cells having the same aggregate value in one class • Classes shown: C1, C2, C3, C4 • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9

  8. Problems w/ the Naïve Attempt • The result is not a lattice anymore! • Anomaly • The rollup/drilldown semantics is lost • Classes shown: C1, C2, C3, C4 • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9
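
The anomaly can be checked mechanically. The sketch below assumes AVG as the aggregate and groups cells purely by value, then looks for "holes": cells outside a class that lie between two members of that class along a rollup path. The helper names (`rolls_up_to`, `avg_cube`, `holes`) are hypothetical.

```python
from itertools import product
from collections import defaultdict

BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def avg_cube(base):
    """AVG value of every non-empty cell."""
    cube = {}
    for t in base:
        for cell in product(*[(v, "*") for v in t]):
            if cell not in cube:
                vals = [m for u, m in base.items() if rolls_up_to(u, cell)]
                cube[cell] = sum(vals) / len(vals)
    return cube

def holes(cls, cube):
    """Cells outside cls that sit between two members of cls."""
    return {e
            for c in cls for d in cls if rolls_up_to(c, d)
            for e in cube
            if e not in cls and rolls_up_to(c, e) and rolls_up_to(e, d)}

CUBE = avg_cube(BASE)

# Naive partition: one class per distinct aggregate value.
naive = defaultdict(set)
for cell, value in CUBE.items():
    naive[value].add(cell)

for value, cls in sorted(naive.items()):
    print(value, len(cls), holes(cls, CUBE))
```

In this example the class of value 9 contains both (S2,P1,f) and (*,*,*), yet the cell (*,P1,*) with value 7.5 lies between them, so rolling up leaves the class and then re-enters it.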

  9. A Better Partitioning • Quotient cube: a partitioning that preserves the rollup/drilldown semantics • Classes shown: C1, C2, C3, C4, C5 • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9

  10. Problem Statement • Given a cube, characterize a good way (quotient cube) of partitioning its cells into classes such that • The partition generates a reduced lattice preserving the rollup/drilldown semantics • The partition is optimal: # classes as small as possible • Compute quotient cubes efficiently

  11. Why Is a Quotient Cube Useful? • Semantic compression • Semantic OLAP browsing • Classes shown: C1, C2, C3, C4, C5 • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9

  12. Why Is a Quotient Cube Useful? • Semantic compression • Semantic OLAP browsing • Classes shown: C1, C2, C4, C5 • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9 • Cells repeated in the expanded view: (S2,P1,f):9 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9 (*,*,f):9 (S2,*,*):9

  13. Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary

  14. Convex Partitions • A convex partition retains semantics • Classes shown: C1, C2, C3, C4, C5 • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9

  15. A Non-convex Partition • Anomaly • The rollup/drilldown semantics is lost • Classes shown: C1, C2, C3, C4 • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9

  16. Connected Partitions • Cells c1 and c2 are connected if a series of rollup/drilldown operations starting from c1 can reach c2 • Intuitively, (each class of) a partition should be connected

  17. Cover Partition • For a cell c, a tuple t in the base table is in c’s cover if t can be rolled up to c • E.g., Cov(S1,*,spring)={(S1,P1,spring), (S1,P2,spring)}

  18. Cover Partitions Are Convex • All cells having the same cover are in a class • (S1,P2,s) and (*,P2,*) cover the same tuples in the base table ⇒ (S1,P2,*) and (*,P2,s) are in the same class. • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9
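
The cover of a cell and the resulting cover partition are straightforward to compute on the running example. This is a minimal sketch under the assumption that a cell is a tuple with "*" as the wildcard; `cover` and `cover_partition` are hypothetical names.

```python
from itertools import product
from collections import defaultdict

# Base tuples from the running example (measures are not needed here).
BASE = [("S1", "P1", "s"), ("S1", "P2", "s"), ("S2", "P1", "f")]

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def cover(cell, base):
    """Base tuples that can be rolled up to the given cell."""
    return frozenset(t for t in base if rolls_up_to(t, cell))

def cover_partition(base):
    """Group all non-empty cells by their cover."""
    cells = {c for t in base for c in product(*[(v, "*") for v in t])}
    classes = defaultdict(set)
    for cell in cells:
        classes[cover(cell, base)].add(cell)
    return classes

CLASSES = cover_partition(BASE)
for cov, cls in CLASSES.items():
    print(sorted(cov), "->", sorted(cls))
```

On this table the class whose cover is {(S1,P2,s)} contains (S1,P2,s), (S1,P2,*), (*,P2,s), and (*,P2,*), which is the situation described in the second bullet.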

  19. Cover Partitions Are Connected • Cells c1 and c2 have the same cover ⇒ there must be some common ancestor c3 of c1 and c2 such that c3 has the same cover • Hence cells c1 and c2 are in the same class and connected • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9

  20. Cover Partitions & Aggregates • All cells in the same class of a cover partition carry the same aggregate value w.r.t. any aggregate function • But cells in a class of MIN() may have different covers • For COUNT() and SUM() (positive), cover equivalence coincides with aggregate equivalence
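
The first two bullets can be illustrated on the running example. The sketch below is an assumption-laden illustration (hypothetical helper names, MIN as the aggregate): it groups cells once by cover and once by MIN value, and shows that a single MIN class can span several covers.

```python
from itertools import product
from collections import defaultdict

BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def cover(cell):
    """Base tuples that can be rolled up to the cell."""
    return frozenset(t for t in BASE if rolls_up_to(t, cell))

def min_of(cell):
    """MIN of the measures in the cell's cover."""
    return min(BASE[t] for t in cover(cell))

CELLS = {c for t in BASE for c in product(*[(v, "*") for v in t])}

def partition(key):
    """Group cells by the value of key(cell)."""
    groups = defaultdict(set)
    for cell in CELLS:
        groups[key(cell)].add(cell)
    return groups

by_cover = partition(cover)
by_min = partition(min_of)

# Every cover class carries a single MIN value ...
uniform = all(len({min_of(c) for c in cls}) == 1 for cls in by_cover.values())
# ... but the MIN = 6 class mixes cells with different covers.
covers_in_min6 = {cover(c) for c in by_min[6]}
print(uniform, len(by_cover), len(by_min), len(covers_in_min6))
```

Here cells are grouped by value only; the third bullet's equivalence for COUNT() and SUM() is not exercised by this sketch.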

  21. Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary

  22. Weak Congruence • Weak congruence preserves semantics • Figure: cells c, c’ belong to Class 1 and cells d, d’ belong to Class 2; if a rollup leads from Class 1 to Class 2 (c to d) and another leads from Class 2 back to Class 1 (d’ to c’), then Class 1 = Class 2

  23. Weak Congruence = Convex • Convex ⇔ no “hole” in the class ⇔ weak congruence • They preserve the rollup/drilldown semantics • Quotient cube lattice is the lattice of convex classes • How to derive the coarsest quotient cube?

  24. Monotone Aggregate Functions • Monotone functions • S ⊆ T ⇒ f(S) ≤ f(T) • S ⊆ T ⇒ f(S) ≥ f(T) • MIN(), MAX(), COUNT(), PSUM(), … • The aggregate function f is monotone ⇒ the partition induced by f is the unique coarsest partition • MIN(): put all cells having the same MIN() value into a class
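
Monotonicity can be spot-checked on the running example, since rolling a cell up can only enlarge its cover. The sketch below tests, for this one table only, whether an aggregate never decreases or never increases along rollups; it is an illustration rather than a proof, and `is_monotone_here` is a hypothetical name.

```python
from itertools import product

BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d."""
    return all(y == "*" or x == y for x, y in zip(c, d))

CELLS = {c for t in BASE for c in product(*[(v, "*") for v in t])}

def measures(cell):
    """Measures of the base tuples covered by the cell."""
    return [m for t, m in BASE.items() if rolls_up_to(t, cell)]

def is_monotone_here(f):
    """True if f never decreases, or never increases, along any rollup."""
    value = {c: f(measures(c)) for c in CELLS}
    pairs = [(c, d) for c in CELLS for d in CELLS if rolls_up_to(c, d)]
    grows = all(value[c] <= value[d] for c, d in pairs)
    shrinks = all(value[c] >= value[d] for c, d in pairs)
    return grows or shrinks

def avg(values):
    return sum(values) / len(values)

# len plays the role of COUNT(); sum is SUM() over positive measures.
RESULTS = {f.__name__: is_monotone_here(f) for f in (min, max, len, sum, avg)}
print(RESULTS)
```

On this table the check passes for min, max, len, and sum, and fails for avg, because (S1,P1,s):6 and (S1,P2,s):12 both roll up to (S1,*,s):9.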

  25. Non-monotone Functions • Bad news: the partition induced by f may or may not be convex (i.e., a weak congruence) • Good news: the cover partition is convex (i.e., a weak congruence) and always yields a quotient cube w.r.t. any aggregate function!

  26. Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary

  27. How to Compute A QC • Aggregate functions • Monotone functions • Non-monotone functions • Settings • The cube is available • Only the base table is available

  28. Monotone Functions • The cube is available: grab all cells with the same aggregate value and put them into a class • Only the base table is available: bottom-up, depth-first search • For a cell, compute its cover, find the upper bound having the same aggregate value • Group lower bounds by upper bounds

  29. Example: Cover QC • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9

  30. Non-monotone Functions • Class merging • Find cover partition classes • Merge classes as long as convexity is retained
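
The two steps above can be sketched for AVG on the running example. This is a greedy illustration, not the presentation's algorithm: it assumes two classes may be merged only when they carry the same AVG, the union has no hole, and some cell of one class is comparable to some cell of the other (so the union stays connected, as asked for on the Connected Partitions slide). All function names are hypothetical.

```python
from itertools import product
from collections import defaultdict

BASE = {("S1", "P1", "s"): 6, ("S1", "P2", "s"): 12, ("S2", "P1", "f"): 9}

def rolls_up_to(c, d):
    """True if cell c can be rolled up to cell d."""
    return all(y == "*" or x == y for x, y in zip(c, d))

def cover(cell):
    return frozenset(t for t in BASE if rolls_up_to(t, cell))

def avg(cell):
    vals = [BASE[t] for t in cover(cell)]
    return sum(vals) / len(vals)

CELLS = {c for t in BASE for c in product(*[(v, "*") for v in t])}

def is_convex(cls):
    """No cell outside cls lies between two members of cls."""
    return not any(rolls_up_to(c, e) and rolls_up_to(e, d)
                   for c in cls for d in cls if rolls_up_to(c, d)
                   for e in CELLS - cls)

def is_linked(a, b):
    """Some cell of a is comparable to some cell of b."""
    return any(rolls_up_to(c, d) or rolls_up_to(d, c) for c in a for d in b)

def merge_classes():
    # Step 1: start from the cover partition classes.
    groups = defaultdict(set)
    for cell in CELLS:
        groups[cover(cell)].add(cell)
    classes = list(groups.values())
    # Step 2: merge pairs greedily until no pair qualifies.
    merged = True
    while merged:
        merged = False
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                a, b = classes[i], classes[j]
                same_value = avg(next(iter(a))) == avg(next(iter(b)))
                if same_value and is_linked(a, b) and is_convex(a | b):
                    classes[i] = a | b
                    del classes[j]
                    merged = True
                    break
            if merged:
                break
    return classes

RESULT = merge_classes()
print(len(RESULT))
```

On this table the six cover classes shrink to five: the class of (S1,*,s) absorbs the apex (*,*,*), while the class of (S2,P1,f) cannot join them because (*,P1,*):7.5 would become a hole.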

  31. Example: AVG QC • Lattice cells: (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9; (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*):9 (*,P1,f):9; (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9; (*,*,*):9

  32. Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary

  33. Reduction Ratio vs. Dimensionality • # base tuples = 200k • Zipf factor = 2.0

  34. Reduction Ratio vs. Zipf Factor • # base tuples = 200k • # dimensions = 6

  35. Reduction Ratio vs. Base Table Size • Zipf factor = 2.0 • # dimensions = 6

  36. Runtime • Zipf factor = 2.0 • # dimensions = 6

  37. Compression Ratio on Weather Data Set

  38. Outline • Introduction and motivation • Cube lattice partitions • Semantics preserving partitions • Algorithms • Experimental results • Discussion and summary

  39. Semantic Cube Exploration • Theoretical foundation for semantic summarization in data cubes • Concept and properties of quotient cubes • Efficient algorithms for quotient cube construction • Quotient cubes can be computed directly from base tables

  40. Ongoing Research • Efficient implementation of a quotient cube-based OLAP system • Data warehouse built using quotient cubes • Hierarchies and constraints • Incremental maintenance • Semantics-based OLAP and mining • Efficient query answering

  41. References (1) • R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994. • S. Agarwal, R. Agrawal, P.M. Deshpande, A. Gupta, J.F. Naughton, R. Ramakrishnan, and S. Sarawagi. On the computation of multidimensional aggregates. VLDB, 1996. • D. Barbara and M. Sullivan. Quasi-cubes: Exploiting approximation in multidimensional databases. SIGMOD Record, 26:12--17, 1997. • D. Barbara and X. Wu. Using loglinear models to compress datacube. In WAIM'2000, pages 311--322, 2000. • K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes. In SIGMOD'99.

  42. References (2) • G. Birkhoff. Lattice Theory, 2nd edition, New York, American Mathematical Society (Colloquium Publications, vol. 25), 1948. • S. Geffner, D. Agrawal, A. El Abbadi, and T. R. Smith. Relative prefix sums: An efficient approach for querying dynamic OLAP data cubes. In ICDE'99. • J. Gray, A. Bosworth, A. Layman, and H. Pirahesh. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE'96. • C.-T. Ho, J. Bruck, and R. Agrawal. Partial-sum queries in data cubes using covering codes. In PODS'97. • J. Han, J. Pei, G. Dong, and K. Wang. Efficient Computation of Iceberg Cubes with Complex Measures. In SIGMOD'01.

  43. References (3) • V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. In SIGMOD'96. • T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing Association Rules. Technical Report, Rutgers University, August 2000. • H. V. Jagadish, J. Madar, and R.T. Ng. Semantic Compression and Pattern Extraction with Fascicles. VLDB'99. • K. Ross and D. Srivastava. Fast computation of sparse datacubes. In VLDB'97. • G. Sathe and S. Sarawagi. Intelligent Rollups in Multidimensional OLAP Data. VLDB'01.

  44. References (4) • J. Shanmugasundaram, U.M. Fayyad, and P. S. Bradley. Compressed Data Cubes for OLAP Aggregate Query Approximation on Continuous Dimensions. SIGKDD'99. • J. S. Vitter, M. Wang, and B. R. Iyer. Data cube approximation and histograms via wavelets. In CIKM'98. • W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed cube: An effective approach to reducing data cube size. In ICDE'02. • Y. Zhao, P. M. Deshpande, and J. F. Naughton. An array-based algorithm for simultaneous multidimensional aggregates. In SIGMOD'97. • G.K. Zipf. Human Behavior and The Principle of Least Effort. Addison-Wesley, 1949.
