1 / 19

Parallel Data Cube

Parallel Data Cube. Data Mining OLAP (On-line analytical processing) cube / group-by operator in SQL. Operational data collected into DW DW used to support multi-dimensional views Views form the basis of OLAP processing Our focus: the OLAP server. Data Warehousing for Decision Support.

kennethk
Download Presentation

Parallel Data Cube

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel Data Cube Data Mining OLAP (On-line analytical processing) cube / group-by operator in SQL

  2. Operational data collected into DW DW used to support multi-dimensional views Views form the basis of OLAP processing Our focus: the OLAP server Data Warehousing for Decision Support

  3. Collection of feature attributes Aggregate along one or more measure attributes Reduce the granularity by collapsing dimensions Multi-dimensional views

  4. Proposed by Gray et al (Microsoft) in 1995 Exploits the relationship between cuboids to compute all 2d cuboids In OLAP views are typically pre-computed to improve query response time Data Cube Generation

  5. Top Down Cube Compute high dimension views first Exploit shared dimensions Pipesort PipeHash Bottom Up Cube Minimizes external memory sorting by partitioning first on single attributes ArrayCube Sequential Solutions

  6. ROLAP relational data representation harde to build and query smaller storage no translation from/to relational model MOLAP array representation easy to build and query large storage needs translation from/to relational model Sequential Solutions

  7. Construct the data cube lattice Estimate the edge costs Find a least cost spanning tree Compute the views by following the “pipes” Top Down Cube (Pipesort)

  8. Optimizations • Share-sorts - sharing sorting cost across multiple group-bys. • Smallest parent - computing a cuboid from the smallest previously computed parent. • Cache results - reduce I/O by caching (in memory) parent views from which other cuboids are computed. • Amortize disk-scans - compute as many child views as possible when scanning each parent.

  9. Bottom Up Cube

  10. Partition large view into memory-sized units Perform sorting operations in memory May significantly reduce external memory processing Bottom Up Cube

  11. Our Results • Parallel top-down ROLAP cube construction for shared disks (Distributed and Parallel Databases, 2002) • Parallel top-down ROLAP cube construction for distributed disks (IPDPS 2002) • Parallel bottom-up ROLAP cube construction for shared and distributed disks (Distributed and Parallel Databases, 2002) • Parallel ROLAP cube indexing for distributed disks (CCGrid 2003)

  12. Our Results • Parallel top-down ROLAP cube construction for shared disks (Distributed and Parallel Databases, 2002) • Parallel top-down ROLAP cube construction for distributed disks (IPDPS 2002) • Parallel bottom-up ROLAP cube construction for shared and distributed disks (Distributed and Parallel Databases, 2002) • Parallel ROLAP cube indexing for distributed disks (CCGrid 2003)

  13. Parallel top-down ROLAP cube Our approach: • Partition the load in advance and assign cuboids to individual processors • Local computation exploits existing optimized sequential algorithms (ROLAP) • Communication is reduced to a single phase in which work lists are distributed

  14. Cut the process tree into p “equal weight” sub-trees Each processor independently generates cuboids from its own sub-tree Load balance/stripe the output Parallel top-down ROLAP cube

  15. Tree Partitioning • Optimal tree partitioning is NP-complete • Min-max tree k-partitioning: Given a tree T with n vertices and a positive weight assigned to each vertex, delete k edges in the tree to obtain k connected components T1, T2, ... Tk+1 such that the largest total weight of a resulting sub-tree is minimized. • O(n) time, Frederickson 1990 • O(Rk(k + log d)+n) time - Becker, Perl and Schach ‘82

  16. Over Sampling

  17. Time vs. #Proc

  18. For more information ... http://cgm.dehne.net

More Related