
# Cube Tree




## PowerPoint Slideshow about 'Cube Tree' - Anita



• Dimensionality of the space: the number of group-by attributes

• Each relation tuple maps to a point in the space

• Aggregates: projections of all data points onto all the subspaces.

• The intersection between a subspace and the orthogonal hyper-plane stores the aggregates.

• The origin represents the aggregate with no grouping

• A group-by aggregate is queried on the corresponding hyper-plane
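
The mapping above can be illustrated with a minimal sketch. It assumes SUM as the aggregate and positive integer attribute values, so that a coordinate of 0 can mark a dimension that has been projected out; the function names `cube_points` and `query_group_by` are hypothetical and not taken from the slides.

```python
from collections import defaultdict
from itertools import combinations

def cube_points(rows, n_dims):
    """Map each (key..., measure) row to a point and project it onto every subspace.

    Assumption: attribute values are positive, so coordinate 0 means
    "this dimension is not grouped on".
    """
    agg = defaultdict(int)
    for *key, measure in rows:
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                point = tuple(key[i] if i in kept else 0 for i in range(n_dims))
                agg[point] += measure  # SUM aggregate, chosen for illustration
    return dict(agg)

def query_group_by(cube, kept, n_dims):
    """Return the aggregates lying on the hyper-plane of the group-by on `kept`."""
    kept = set(kept)
    return {p: v for p, v in cube.items()
            if all((p[i] != 0) == (i in kept) for i in range(n_dims))}

rows = [(1, 1, 10), (1, 2, 20), (2, 1, 5)]
cube = cube_points(rows, 2)
by_first = query_group_by(cube, [0], 2)
```

Here `cube[(0, 0)]` is the aggregate with no grouping, and `by_first` holds only the points on the axis of the first attribute.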

• Sort-pack (for multi-dimensional data):

• Achieves excellent clustering

• Significantly reduces the overlap and dead space

• A preferred structure for Datacube storage
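
A rough sketch of the sort-pack idea follows: sort the points once, then fill each leaf to capacity so that neighbouring points share a leaf. The lexicographic sort order, the `capacity` parameter, and the dictionary layout of a leaf are assumptions made for illustration; only the leaf level is built here, not a full R-Tree.

```python
def sort_pack(points, capacity):
    """Pack points into leaves bottom-up: sort once, then fill each leaf in turn."""
    ordered = sorted(points)  # assumption: plain lexicographic order
    leaves = []
    for start in range(0, len(ordered), capacity):
        chunk = ordered[start:start + capacity]
        dims = range(len(chunk[0]))
        # Minimum bounding rectangle of the points stored in this leaf.
        low = tuple(min(p[d] for p in chunk) for d in dims)
        high = tuple(max(p[d] for p in chunk) for d in dims)
        leaves.append({"mbr": (low, high), "points": chunk})
    return leaves

points = [(2, 1), (1, 2), (1, 1), (2, 2), (3, 1)]
leaves = sort_pack(points, capacity=2)
```

Because each leaf holds a contiguous run of the sorted points, the bounding rectangles in this small example do not overlap, which is the clustering effect the slide refers to.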

• The full Datacube representation provides good clustering for only half of the total group-bys

• The degradation is due to strong interleaving between the points of these group-bys.

• Dataless Cubetree: contains only aggregate values, no data values

• Better clustering than a full tree in an R-Tree

• Projection points are not interleaved

• Reduced Cubetree: each hyper-plane containing aggregates forms an R-Tree independently

• The dimensionality of each R-Tree is reduced by one.

• Better clustering and query performance

• A set of group-bys is compatible if there exists a sort order that guarantees no dispersion

• Allocate a group-by to one of the N R-Trees

• the set of group-bys for this R-Tree is compatible

• if a group-by cannot find a compatible set

• assign it to a set that contains all of its group-by attributes (false allocation).

• The choice of sort order for the Packed R-Tree is also an important parameter, since it favors some preferred group-bys

• Selectively compute only those partitions that satisfy an aggregate condition

• Aggregates with low support reveal little meaning and make the cube sparse

• Conditions like

• Minimum support of a partition

• Required Range
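
As a small illustration of a minimum-support condition, the sketch below counts the rows in each partition of one group-by and keeps only the partitions that reach the threshold. The function name `iceberg_group_by`, the use of COUNT as the support measure, and the `>=` comparison are assumptions for this example.

```python
from collections import Counter

def iceberg_group_by(rows, dims, minsup):
    """Count rows per partition and keep only partitions with count >= minsup."""
    counts = Counter(tuple(row[d] for d in dims) for row in rows)
    return {key: n for key, n in counts.items() if n >= minsup}

rows = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x")]
frequent_first = iceberg_group_by(rows, dims=[0], minsup=2)
frequent_both = iceberg_group_by(rows, dims=[0, 1], minsup=2)
```

With `minsup=2`, the partition for `"b"` is dropped from the single-attribute group-by, and only the `("a", "x")` partition survives in the two-attribute one.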

Parent to compute the child

• Start from a bottom, single-dimension group-by

• If the current input can be pruned, return

• Partition the data in this group-by

• If a partition satisfies the minimum support (minsup)

• recursively call BUC with the partition as input

• Loop until all dimensions are done
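
The steps above can be sketched as a short recursive function. This is a simplified illustration rather than the original algorithm: it assumes COUNT as the aggregate, partitions in memory with a dictionary, starts from the empty group-by before descending into single dimensions, and uses `None` to mark a dimension that is not grouped on. The signature of `buc` is hypothetical.

```python
from collections import defaultdict

def buc(rows, n_dims, minsup, start=0, prefix=None, out=None):
    """Recursively emit every group-by cell whose row count is at least minsup."""
    if prefix is None:
        prefix = (None,) * n_dims  # None means "not grouped on this dimension"
    if out is None:
        out = {}
    if len(rows) < minsup:         # prune: no refinement of this cell can qualify
        return out
    out[prefix] = len(rows)        # aggregate for the current cell (COUNT here)
    for d in range(start, n_dims):
        partitions = defaultdict(list)
        for row in rows:           # partition the input on dimension d
            partitions[row[d]].append(row)
        for value, part in partitions.items():
            if len(part) >= minsup:
                cell = prefix[:d] + (value,) + prefix[d + 1:]
                buc(part, n_dims, minsup, d + 1, cell, out)
    return out

rows = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x")]
cells = buc(rows, n_dims=2, minsup=2)
```

Partitions below `minsup` are never expanded, so none of their finer group-bys are computed at all.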

• Similar in idea to Apriori-gen

• Apriori generates all the candidates at the same level first (breadth-first)

• BUC proceeds in a depth-first manner.

• To reduce memory requirement
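
To make the contrast concrete, the sketch below lists the order in which group-bys over three dimensions would be visited breadth-first (level by level) and depth-first. Both functions are illustrative helpers with hypothetical names; they enumerate attribute sets only and perform no pruning or aggregation.

```python
from itertools import combinations

def breadth_first(dims):
    """Level-wise order: all group-bys of size 1, then size 2, and so on."""
    return [c for k in range(1, len(dims) + 1) for c in combinations(dims, k)]

def depth_first(dims, start=0, prefix=()):
    """Depth-first order: extend the current group-by fully before backtracking."""
    order = []
    for i in range(start, len(dims)):
        current = prefix + (dims[i],)
        order.append(current)
        order.extend(depth_first(dims, i + 1, current))
    return order

bfs_order = breadth_first("ABC")
dfs_order = depth_first("ABC")
```

In the depth-first order only the group-bys along the current path (for example A, AB, ABC) are active at once, which is one way to read the memory argument on the slide.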

• Dimension ordering: provides better pruning

• Cardinality, Skew & Correlation
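
The slides do not say how these factors determine the order. One commonly described heuristic, assumed here rather than taken from the slides, is to process dimensions with more distinct values first, since they tend to produce smaller partitions that fall below the support threshold sooner. The helper `order_by_cardinality` is hypothetical, and it ignores skew and correlation.

```python
def order_by_cardinality(rows, n_dims):
    """Return dimension indexes sorted from most to fewest distinct values."""
    cardinality = [len({row[d] for row in rows}) for d in range(n_dims)]
    return sorted(range(n_dims), key=lambda d: cardinality[d], reverse=True)

rows = [("a", "x", 1), ("a", "y", 2), ("a", "x", 3)]
order = order_by_cardinality(rows, n_dims=3)
```

In this toy input the third column has three distinct values, the second has two, and the first has one, so the third dimension would be partitioned first.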