Multi-way Algorithm for Cube Computation CPS 196.03 Notes 8

Presentation Transcript


  1. Multi-way Algorithm for Cube Computation • CPS 196.03 Notes 8

  2. First Programming Project
     • Individual project, 15 points in final grade
     • Sales(customer_id, item_id, item_group, item_price, purchase_date)
       • Will be provided as a file during demo and for generating performance numbers for project report
     • Task 1: 5 points
       • Interface to enter MIN_SUPPORT (% of customers)
       • Find frequent itemsets using Apriori (sets of item_id’s)
     • Task 2: 5 points (Section 5.5 in the textbook)
       • Interface to enter two constraint types (e.g., SUM(item_price) op const)
       • Use the constraints in Apriori as effectively as possible; study and demonstrate the performance improvement
     • Task 3: 5 points
       • Extension of your choice. Examples include (i) association rules, (ii) complex constraints, (iii) sequential patterns, (iv) variants of Apriori, (v) FP-growth
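
As a rough illustration of Task 1, here is a minimal Apriori sketch. It assumes each customer's purchases have already been collapsed into one set of item_id's, and that MIN_SUPPORT is given as a fraction between 0 and 1 rather than a percentage. The function name `apriori` and its parameters are made up for this example; it is a textbook-style outline, not the required project design.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for all frequent itemsets.

    transactions: list of sets of item ids (assumed: one set per customer).
    min_support: fraction of transactions (0..1), an assumption of this sketch.
    """
    min_count = min_support * len(transactions)

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets that give a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: one scan over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result
```

The prune step is where Task 2's constraints would most naturally hook in, since a candidate that cannot satisfy a constraint can be dropped before it is counted.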

  3. File Format
     • 10,123,3,54,4/4/2008
     • 10,12,4,101,4/5/2008
     • 14,123,3,54,8/4/2008
     • …
     • Caveats:
       • Customer vs. item
       • Three datasets: Toy, Medium, and Large
       • Comma-separated file, one purchase per line in file, no header in file
       • Integers for simplicity
       • Note date format
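
The file format above could be read along these lines. This sketch assumes the columns follow the Sales schema order (customer_id, item_id, item_group, item_price, purchase_date) and that dates are month/day/year; the slide only says to note the date format, so the `"%m/%d/%Y"` pattern is a guess that should be checked against the real data. The helper names are hypothetical.

```python
import csv
from collections import defaultdict
from datetime import datetime

def parse_purchase(row):
    """Turn one CSV row into (customer_id, item_id, item_group, item_price, date)."""
    customer_id, item_id, item_group, item_price = (int(x) for x in row[:4])
    # Assumption: month/day/year. Swap to "%d/%m/%Y" if the data says otherwise.
    purchase_date = datetime.strptime(row[4], "%m/%d/%Y").date()
    return customer_id, item_id, item_group, item_price, purchase_date

def load_baskets(lines):
    """Group item ids by customer; `lines` is any iterable of text lines."""
    baskets = defaultdict(set)
    for row in csv.reader(lines):
        if not row:  # tolerate blank lines
            continue
        customer_id, item_id, *_ = parse_purchase(row)
        baskets[customer_id].add(item_id)
    return dict(baskets)

sample = ["10,123,3,54,4/4/2008", "10,12,4,101,4/5/2008", "14,123,3,54,8/4/2008"]
baskets = load_baskets(sample)
```

Passing an open file object instead of `sample` works the same way, because `csv.reader` accepts any iterable of lines.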

  4. First Programming Project: Milestones
     • Feb 3: Project announced
     • Feb 17: Mid-project report due
       • Describe progress and planned extensions
       • Describe detailed algorithms for all three tasks
     • Feb 17: Sample data file will be provided for generating performance results for project report
     • March 2: Submit code, README file to run code, code documentation, and final project report
     • March 2-4: Project demos (random assignment)
     • March 6: Spring break. Second project announced

  5. Finalized Grading Criteria for Class
     • Homeworks: 15 points
     • Programming projects: 40 points
     • Midterm: 20 points
       • Note: Midterm is on Feb 19 (Thu) in class
     • Final: 25 points

  6. ROLAP Server
     • Relational OLAP Server
     • Special indices, tuning; schema is “denormalized”
     [Figure: tools and utilities on top of a ROLAP server, which sits on a relational DBMS]

  7. MOLAP Server
     • Multi-Dimensional OLAP Server
     • The multi-dimensional server could also sit on a relational DBMS
     [Figure: tools and utilities on top of an M.D. server; a Sales cube with dimensions City (A, B), Product (milk, soda, eggs, soap), and Date (1, 2, 3, 4)]

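  8. MOLAP
     [Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A, Canada, Mexico, sum); one cell is labeled “Total annual sales of TV in U.S.A.” and the corner cell is (All, All, All)]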

  9. MOLAP
     [Figure: 3D array over dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), divided into 64 chunks numbered 1 to 64]

  10. Challenges in MOLAP
     • Storing large arrays for efficient access
       • Row-major, column-major
       • Chunking
       • Compressing sparse arrays
     • Creating array data from data in tables
     • Efficient techniques for cube computation
     These topics are discussed in the paper assigned for reading.
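
To make the chunking idea concrete, the sketch below maps a cell coordinate to a chunk number and an offset inside that chunk. It assumes the first dimension varies fastest, both across chunks and within a chunk, and it numbers chunks from 0 (the figure on slide 9 numbers them from 1). The function name `chunk_address` and the 16 x 16 x 16 array with 4 x 4 x 4 chunks are illustrative choices, not values from the slides.

```python
def chunk_address(coord, dims, chunk_shape):
    """Map a cell coordinate to (chunk_number, offset_in_chunk).

    coord, dims, chunk_shape: tuples listed with the fastest-varying
    dimension first (an assumption of this sketch).
    """
    chunk_no = 0
    offset = 0
    chunk_stride = 1  # how many chunks one step along this dimension skips
    cell_stride = 1   # how many cells inside a chunk one step skips
    for x, size, csize in zip(coord, dims, chunk_shape):
        chunks_along = -(-size // csize)  # ceiling division
        chunk_no += (x // csize) * chunk_stride
        offset += (x % csize) * cell_stride
        chunk_stride *= chunks_along
        cell_stride *= csize
    return chunk_no, offset

dims = (16, 16, 16)
chunk_shape = (4, 4, 4)
print(chunk_address((5, 0, 0), dims, chunk_shape))
```

Because all cells of a chunk share one chunk number, reading a small sub-cube touches only a few chunks, whereas a plain row-major layout may spread the same cells over many disk pages.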

  11. ROLAP vs. MOLAP
     • What do the authors say?
     • What can you do in MOLAP that you cannot do in ROLAP?
     • Can the algorithm in this paper be used in ROLAP?

  12. Array Storage
     • Chunks
     • Compression
       • Chunk-offset compression vs. LZW
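
One way to picture chunk-offset compression: for a sparse chunk, keep only the valid cells as (offset in chunk, value) pairs. The sketch below assumes a chunk is held as a flat list and that `None` marks an empty cell; both the representation and the function names are assumptions made for illustration.

```python
def compress_chunk(dense, empty=None):
    """Keep only the non-empty cells of a chunk as (offset, value) pairs."""
    return [(i, v) for i, v in enumerate(dense) if v is not empty]

def decompress_chunk(pairs, size, empty=None):
    """Rebuild the dense chunk of `size` cells from its (offset, value) pairs."""
    dense = [empty] * size
    for i, v in pairs:
        dense[i] = v
    return dense

chunk = [None, 7, None, None, 3, None, None, None]
pairs = compress_chunk(chunk)
```

Unlike a general-purpose scheme such as LZW, the pairs can be aggregated directly without rebuilding the dense chunk, because each offset still identifies the cell it belongs to.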

  13. Loading Arrays from Tables
     • The easy case: array fits in memory
     • Else:
       • Partitions

  14. Loading Arrays from Tables
     • Suppose there are 1000 chunks and 10 chunks can fit in memory. The partition size is then 10 chunks.
     [Figure: the table split into 100 partitions of 10 chunks each]
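
The partitioned load can be sketched as two passes. The first pass routes each table tuple to the partition that owns its chunk; the second pass handles one partition at a time, which by construction fits in memory, and groups its tuples into chunks. In this sketch the partitions are in-memory lists standing in for files on disk, and `chunk_of` is a hypothetical caller-supplied function that returns a tuple's chunk number.

```python
from collections import defaultdict

def partition_tuples(tuples, chunk_of, chunks_per_partition):
    """Pass 1: route every tuple to the partition that owns its chunk."""
    partitions = defaultdict(list)
    for t in tuples:
        partitions[chunk_of(t) // chunks_per_partition].append(t)
    return partitions

def build_chunks(partition, chunk_of):
    """Pass 2: inside one memory-sized partition, group tuples by chunk."""
    chunks = defaultdict(list)
    for t in partition:
        chunks[chunk_of(t)].append(t)
    return chunks

# The slide's numbers: 1000 chunks, 10 chunks per partition.
# Each toy tuple is (chunk_number, value).
table = [(c, c * 2) for c in range(1000)]
parts = partition_tuples(table, lambda t: t[0], 10)
print(len(parts))
```

With these numbers the table ends up in 100 partitions, and each partition yields exactly the 10 chunks that fit in memory together.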

  15. Basic Array Cubing Algo
     • First find minimum spanning tree
     • Hierarchy of aggregates
     • Compute each (k-1)-dimensional aggregate from its best k-dimensional aggregate
     • One pass through the array in the right order
     Let us look at some basics first.
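
A small sketch of the spanning-tree step: for every group-by, pick the parent with one more dimension that has the fewest cells. It assumes "best" means smallest, and it estimates a group-by's size as the product of its dimension cardinalities, that is, a dense array. The function name and the cardinalities 40, 400, and 4000 (borrowed from the later Memory Used slides) are illustrative.

```python
from itertools import combinations
from math import prod

def smallest_parent_tree(sizes):
    """Map each group-by (a sorted tuple of dimension names) to its smallest parent.

    sizes: {dimension name: number of distinct values}.
    Size of a group-by is taken as the product of its cardinalities (assumed dense).
    """
    dims = sorted(sizes)

    def size_of(group):
        return prod(sizes[d] for d in group)

    tree = {}
    for k in range(len(dims) - 1, -1, -1):
        for group in combinations(dims, k):
            # Parents are the group-bys with exactly one extra dimension.
            parents = [tuple(sorted(group + (d,))) for d in dims if d not in group]
            tree[group] = min(parents, key=size_of)
    return tree

tree = smallest_parent_tree({"A": 40, "B": 400, "C": 4000})
for child, parent in sorted(tree.items(), key=lambda kv: -len(kv[0])):
    print(child, "<-", parent)
```

Here A and B are both computed from AB, C from AC, and the grand total (the empty group-by) from A, since those are the smallest available parents.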

  16. Chunked 3D Array
     • Dimension order CBA
     [Figure: the 64-chunk array over dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), with chunks numbered 1 to 64]

  17. “a0b0” chunk
     [Figure: the scan reads a0b0c0, c1, c2, c3; rows a0b1, a0b2, a0b3 are still to come]

  18. a0b1 chunk
     • Done with a0b0
     [Figure: the scan reads a0b1c0, c1, c2, c3]

  19. a0b2 chunk
     • Done with a0b1
     [Figure: the scan reads a0b2c0, c1, c2, c3]

  20. Table Visualization
     • Done with a0b2
     [Figure: the scan reads a0b3c0, c1, c2, c3]

  21. Table Visualization
     • Done with a0b3
     • Done with a0c*
     [Figure: the scan moves on to a1, reading a1b0c0, c1, c2, c3; rows a1b1, a1b2, a1b3 follow]

  22. a3b3 chunk (last one)
     • Done with a0b3
     • Done with a0c*
     • Done with b*c*
     • Finish
     [Figure: the scan over a3 covers rows a3b0 to a3b3, each with c0, c1, c2, c3]

  23. Memory Used
     • A: 40 distinct values
     • B: 400 distinct values
     • C: 4000 distinct values
     • Each dimension is split into 4 chunks, so the chunk sides are 10 (A), 100 (B), and 1000 (C)
     • CBA: Dimension order
     • Plane AB: Need 1 chunk (10 * 100 * 1 = 1,000)
     • Plane AC: Need 4 chunks (10 * 1000 * 4 = 40,000)
     • Plane BC: Need 16 chunks (100 * 1000 * 16 = 1,600,000)
     • Total memory: 1,641,000

  24. Memory Used
     • A: 40 distinct values
     • B: 400 distinct values
     • C: 4000 distinct values
     • Each dimension is split into 4 chunks, so the chunk sides are 10 (A), 100 (B), and 1000 (C)
     • ABC: Dimension order
     • Plane BC: Need 1 chunk (1000 * 100 * 1 = 100,000)
     • Plane AC: Need 4 chunks (1000 * 10 * 4 = 40,000)
     • Plane AB: Need 16 chunks (100 * 10 * 16 = 16,000)
     • Total memory: 156,000
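
The totals on the two Memory Used slides can be reproduced with a short calculation. The sketch assumes the dimension order lists the fastest-varying dimension first, and that each dimension is cut into 4 chunks (sides 10, 100, and 1000), which is what the slide's products imply. For the plane that drops one dimension, every dimension scanned faster than the dropped one contributes its full size and every slower one contributes a single chunk side. The function name `plane_memory` is made up for this example.

```python
from math import prod

def plane_memory(order, sizes, chunk):
    """Cells of memory needed per 2D plane for a given scan order.

    order: dimension names, fastest-varying first (e.g. "CBA").
    sizes: {dimension: number of distinct values}.
    chunk: {dimension: chunk side length}.
    """
    memory = {}
    for k, dropped in enumerate(order):
        faster, slower = order[:k], order[k + 1:]
        cells = prod(sizes[d] for d in faster) * prod(chunk[d] for d in slower)
        memory["".join(sorted(faster + slower))] = cells
    return memory

sizes = {"A": 40, "B": 400, "C": 4000}
chunk = {"A": 10, "B": 100, "C": 1000}

for order in ("CBA", "ABC"):
    mem = plane_memory(order, sizes, chunk)
    print(order, mem, "total:", sum(mem.values()))
```

Scanning the smallest dimension fastest (order ABC) keeps the largest plane, BC, down to a single chunk, which is where the roughly tenfold saving comes from.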

  25. Basic Array Cubing Algo
     • First find minimum spanning tree
     • Hierarchy of aggregates
     • Compute each (k-1)-dimensional aggregate from its best k-dimensional aggregate
     • One pass through the array in the right order
     • What are the advantages and disadvantages of this algorithm?

  26. Multi-way Array Cubing Algo
     • What is the main idea?
     • Rule 1 on Page 163
     • Minimum memory spanning tree
     • Figure 2
     • Figures 3 and 4
     • Theorem 1
     • Basic idea of multi-pass algorithm
     • Tradeoff between memory usage and number of passes
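
As a hint toward the first question, one common reading of the multi-way idea is that a single scan of the base array updates all of its child aggregates at once, instead of rescanning the array once per aggregate. The toy sketch below shows only that simultaneous update for SUM over a sparse list of cells. It ignores chunking, scan order, and memory limits, which are what Rule 1 and the minimum memory spanning tree address, and the function name is hypothetical.

```python
from collections import defaultdict

def multiway_aggregate(cells, ndims):
    """One pass over (coord, value) cells, updating every (n-1)-dim SUM at once.

    Returns {dropped_dimension_index: {reduced_coord: sum}}.
    """
    planes = {drop: defaultdict(int) for drop in range(ndims)}
    for coord, value in cells:
        for drop, plane in planes.items():
            # Project the cell onto the plane that omits dimension `drop`.
            key = coord[:drop] + coord[drop + 1:]
            plane[key] += value
    return planes

# Coordinates are (a, b, c); index 0 drops A (plane BC), 1 drops B, 2 drops C.
cells = [((0, 0, 0), 1), ((0, 0, 1), 2), ((0, 1, 0), 3), ((1, 0, 0), 4)]
planes = multiway_aggregate(cells, 3)
```

The cost of this shortcut is memory: every plane must be partly resident while the scan runs, which is why the dimension order matters so much on the Memory Used slides.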
