
PrefixCube: Prefix-sharing Condensed Data Cube

Jianlin Feng, Qiong Fang, Hulin Ding. Huazhong Univ. of Sci. & Tech. fengjl@mail.hust.edu.cn. Nov 12, 2004.


Presentation Transcript


  1. PrefixCube: Prefix-sharing Condensed Data Cube Jianlin Feng Qiong Fang Hulin Ding Huazhong Univ. of Sci. & Tech. fengjl@mail.hust.edu.cn Nov 12, 2004

  2. Outline • Introduction • Related Work • ODM: Ordered Datacube Model • BST-Condensed Cube • Prefix-sharing Condensed Cube • Comparisons • Conclusions

  3. Introduction • Data Cube (ICDE’96) • N-dimensional cube(A1, A2, …, AN) • 2^N cuboids, i.e. GROUP-BYs • The Huge Size Problem • When the base relation R is sparse, the size of a cuboid can be close to the size of R. • Even the I/O cost of storing the cube result tuples becomes dominant.
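
The 2^N growth can be made concrete with a small sketch. The relation `R`, its column names, and the choice of SUM as the aggregate are hypothetical, chosen only for illustration; this is a naive enumeration, not an efficient cube algorithm.

```python
from itertools import combinations
from collections import defaultdict

def cuboids(dims):
    """Yield every subset of dims; each subset is one GROUP-BY (cuboid)."""
    for k in range(len(dims) + 1):
        yield from combinations(dims, k)

def full_cube(rows, dims, measure):
    """Return {cuboid: {group_key: SUM(measure)}} for all 2^N cuboids."""
    cube = {}
    for cuboid in cuboids(dims):
        groups = defaultdict(int)
        for row in rows:
            key = tuple(row[d] for d in cuboid)
            groups[key] += row[measure]
        cube[cuboid] = dict(groups)
    return cube

# Hypothetical toy relation R(A, B, C, M); M is the measure.
R = [
    {"A": 1, "B": 1, "C": 1, "M": 10},
    {"A": 1, "B": 2, "C": 1, "M": 20},
    {"A": 2, "B": 2, "C": 3, "M": 30},
]
cube = full_cube(R, ("A", "B", "C"), "M")
cuboid_sizes = {c: len(groups) for c, groups in cube.items()}
```

With three dimensions the cube has 8 cuboids, and in this sparse toy relation the cuboids (A, B) and (A, B, C) each hold as many tuples as R itself.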

  4. Related Work • Condensed Cube (ICDE’02) • Dwarf (SIGMOD’02) • Quotient Cube (VLDB’02) • QC-Tree (SIGMOD’03) • Basic idea: remove the redundancies that exist among cube tuples. • prefix redundancy • suffix redundancy

  5. Prefix Redundancy • Given an example cube(A, B, C) • Each value of dimension A occurs in 4 cuboids: cuboid(A), (AB), (AC) and (ABC) • It may also occur many times within each of these cuboids except cuboid(A) • Hence both inter-cuboid and intra-cuboid prefix redundancy
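
Intra-cuboid prefix redundancy can be illustrated by comparing a flat listing of a cuboid's tuples with a listing in which each distinct prefix is stored once. This is only a counting sketch with hypothetical data; it is not the storage layout used by Dwarf or PrefixCube.

```python
def flat_cells(tuples):
    """Dimension values stored when every tuple is written out in full."""
    return sum(len(t) for t in tuples)

def shared_prefix_cells(tuples):
    """Dimension values stored when each distinct prefix is kept only once
    (equivalently, the number of nodes in a prefix tree over the tuples)."""
    prefixes = set()
    for t in tuples:
        for i in range(1, len(t) + 1):
            prefixes.add(t[:i])
    return len(prefixes)

# Hypothetical contents of cuboid(A, B, C), dimension values only.
cuboid_abc = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1)]
flat = flat_cells(cuboid_abc)
shared = shared_prefix_cells(cuboid_abc)
```

Here the value A=1 is written three times in the flat form but once in the shared form, so the cell count drops from 12 to 9.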

  6. Suffix Redundancy • Occurs when cube tuples belonging to different cuboids are actually aggregated from the same group of base relation tuples. • An extreme case • Let the source relation R have a single tuple r(a1, a2, …, an, m); • the 2^n cube tuples can be condensed into one physical tuple: (a1, a2, …, an, V), where V = aggr(r); • together with some information indicating that it is a representative tuple.
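
The extreme case can be checked directly. The sketch below assumes SUM-like aggregation (so aggr(r) is just the measure of r) and uses "*" for ALL; the names are hypothetical.

```python
from itertools import product

ALL = "*"

def cube_tuples_of_single_row(row, measure):
    """All 2^n cube tuples derived from a relation holding only `row`.
    Every one of them aggregates the same single base tuple."""
    out = []
    for mask in product([False, True], repeat=len(row)):
        key = tuple(v if keep else ALL for v, keep in zip(row, mask))
        out.append((key, measure))
    return out

r = ("a1", "a2", "a3")
tuples = cube_tuples_of_single_row(r, 42)
distinct_keys = {key for key, _ in tuples}
distinct_aggregates = {m for _, m in tuples}
```

All 8 cube tuples carry the same aggregate value, which is why one physical tuple plus a marker is enough to represent them.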

  7. Thinking… • Condensed cube • It condenses the cube tuples that are aggregated from one single base tuple into one physical tuple in order to reduce the cube’s size. • Dwarf • Besides suffix coalescing, i.e. multi-base-tuple condensing, it also realizes full prefix-sharing, which makes its size reduction highly effective.

  8. Motivation • HOW can the condensed cube’s size be reduced further while taking into account the kind of query we intend to answer, namely range queries? • By augmenting BST-condensing with the removal of intra-cuboid prefix redundancy!

  9. Ordered Datacube Model • The value ALL (or *) is encoded as 0. • For a dimension D with cardinality C, each dimension value is mapped one-to-one to an integer between 1 and C inclusive. • N dimensions form an N-dimensional space. • The origin O(0, 0, …, 0) represents the grand total.
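
A minimal sketch of this encoding follows. The slide only requires a one-to-one mapping onto 1..C; assigning codes in sorted order, using `None` for ALL on input, and the sample dimension values are all assumptions made for illustration.

```python
ALL = 0  # ODM code for ALL (*)

def build_encoder(values):
    """Map the distinct values of one dimension to 1..C (sorted order assumed)."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}

def encode(cube_tuple, encoders):
    """Encode a cube tuple position by position; None stands for ALL."""
    return tuple(ALL if v is None else enc[v]
                 for v, enc in zip(cube_tuple, encoders))

# Hypothetical two-dimensional example: (city, year).
city_enc = build_encoder(["Wuhan", "Beijing", "Wuhan"])
year_enc = build_encoder([2003, 2004])
encoders = [city_enc, year_enc]

point = encode(("Wuhan", None), encoders)
origin = encode((None, None), encoders)
```

The tuple ("Wuhan", ALL) becomes the point (2, 0), and the grand total lands on the origin (0, 0).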

  10. Ordered Datacube Model • Under ODM, a range query against a data cube can be reduced to a sub-query against one particular cuboid in the cube, or to a union of such sub-queries.
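
One way to read this reduction: a dimension whose range is exactly 0 is fixed to ALL, a dimension whose range lies in 1..C is grouped on, and a range that spans 0 as well as real values splits into both cases. The sketch below follows that interpretation, which is an assumption and not taken from the slides; the function name and the query representation are hypothetical.

```python
from itertools import product

def split_range_query(ranges):
    """Split a per-dimension range query into single-cuboid sub-queries.

    ranges: one (lo, hi) pair per dimension, 0 <= lo <= hi, where 0 means ALL.
    Returns a list of (cuboid, sub_ranges): `cuboid` holds the indexes of the
    grouped dimensions and `sub_ranges` their ranges within 1..C.
    """
    options = []
    for lo, hi in ranges:
        opts = []
        if lo == 0:
            opts.append(None)               # ALL on this dimension
        if hi >= 1:
            opts.append((max(lo, 1), hi))   # real values only
        options.append(opts)
    subqueries = []
    for choice in product(*options):
        cuboid = tuple(i for i, c in enumerate(choice) if c is not None)
        sub = tuple(c for c in choice if c is not None)
        subqueries.append((cuboid, sub))
    return subqueries

single = split_range_query([(2, 5), (0, 0), (1, 3)])
union = split_range_query([(0, 2), (3, 3)])
```

The first query touches only the cuboid over dimensions 0 and 2, while the second becomes a union of two sub-queries, one per cuboid.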

  11. BST-Condensed Cube • Base Single Tuple (BST) • t1 is a BST on SD {A} and {B} • t2 is a BST on SD {B} • A unique minimal BST-condensed cube, the MinCube, can be obtained by taking full advantage of each BST with all of its SDs.
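
The slide does not restate the definition, so the sketch below assumes the usual one from the condensed-cube work cited earlier: a base tuple is a BST on a dimension set SD when no other base tuple shares its values on SD. The rows are hypothetical and do not reproduce the slide's example table; base tuples are assumed to be distinct.

```python
from collections import Counter
from itertools import combinations

def single_dimension_sets(rows, n_dims):
    """For each base tuple, list every dimension set SD (as index tuples)
    on which the tuple is alone in its partition."""
    result = {row: [] for row in rows}
    for k in range(1, n_dims + 1):
        for sd in combinations(range(n_dims), k):
            counts = Counter(tuple(row[i] for i in sd) for row in rows)
            for row in rows:
                if counts[tuple(row[i] for i in sd)] == 1:
                    result[row].append(sd)
    return result

# Hypothetical base relation over dimensions (A, B, C) = indexes (0, 1, 2).
rows = [(1, 1, 1), (2, 2, 1), (2, 3, 1)]
sds = single_dimension_sets(rows, 3)
```

In this toy relation the first tuple is a BST on {A} and on {B}, while the second is a BST on {B} but not on {A}, because another tuple shares A=2.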

  12. BU-BST Condensed Cube • BottomUpBST algorithm (ICDE’02) • Each BST corresponds to only one SD. • Compared with MinCube, it is easier to compute, and normal cube tuples are easier to restore from the condensed cube. • Note: BST condensing is a special kind of prefix-sharing! A group of cube tuples sharing a prefix is represented by one BST!

  13. A BU-BST Condensed Cube Example • Note: • Intra-cuboid prefix redundancy: ct3 and ct4 • Inter-cuboid prefix redundancy: ct2, ct3 and ct5

  14. Prefix-sharing Condensed Cube - PrefixCube • Prefix-sharing: BST condensing + intra-cuboid prefix-sharing → PrefixCube

  15. A PrefixCube Example

  16. Corresponding Dwarf

  17. PrefixCube vs. Dwarf

  18. Effectiveness of Size Reduction • Datasets • synthetic datasets with uniform distribution • # of tuples: 1,000,000 • (a) Cardinality = 100 • (b) Cardinality = 1,000

  19. Effectiveness of Size Reduction • PrefixBUC • Full Cube (computed by BUC) • Prefix-sharing

  20. Impact of Data Density • Datasets • Uniform distribution • # of dimensions: 6 • Cardinality of dimensions: 100 • # of tuples: ranging from 1,000 to 1,000,000

  21. Impact of Data Skewness • Datasets • Zipf distribution • # of tuples: 1,000,000 • Cardinality of dimensions: ranging from 1,000 down to 500 in steps of 100 • Zipf factor: ranging from 0 to 0.8 in steps of 0.2

  22. Real-world Dataset • Datasets • Weather Datasets • # of tuples: 1,015,367

  23. Conclusion • A new cube structure, PrefixCube, was proposed by augmenting BU-BST condensing with intra-cuboid prefix-sharing. • It greatly reduces the data cube’s size compared with the BU-BST condensed cube. • It also reduces the impact of data skew on BU-BST condensing. • It achieves a fairly stable size reduction on both dense and sparse datasets.

  24. The End • Thank you! Any questions?
