html5-img
1 / 38

Caching Multi-dimensional Queries Using Chunks

Caching Multi-dimensional Queries Using Chunks. Prasad Deshpande Joint work with Jeffrey F. Naughton Karthikeyan Ramasamy Amit Shukla. OLAP Schema. Star Schema of Fact and Dimension tables Example: Product(pname, pid, pcategory) Store(sname, sid, scity, sstate, country)

Download Presentation

Caching Multi-dimensional Queries Using Chunks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Caching Multi-dimensional Queries Using Chunks Prasad Deshpande Joint work with Jeffrey F. Naughton Karthikeyan Ramasamy Amit Shukla

  2. OLAP Schema • Star Schema of Fact and Dimension tables • Example: Product(pname, pid, pcategory) Store(sname, sid, scity, sstate, country) Date(dname, did, dday, dyear) Sales(pid, sid, did, dollar_sales)

  3. OLAP Queries • View data for some dimension members at different levels of aggregation • Example: Select pname, dmonth, sum(dollar_sales) From Sales, Date, Product Where pcategory = “clothes” AND dmonth in {Jan, Feb, Mar} AND Sales.did = Date.did AND Sales.pid = Product.pid Group by pid, dmonth

  4. Outline of Talk • OLAP data model and queries • Caching for OLAP queries • Chunk based caching • Chunked file organization • Implementation and performance results • Summary and future work

  5. Motivation for Caching • Require interactive response time • Queries computationally expensive due to aggregation • Possible to exploit special properties of the OLAP data model and queries

  6. Motivation Select pname, dmonth, sum(dollar_sales) From Sales, Date, Product Where pname = “blaire_cotton_shirts” AND dmonth in {Jan, Feb, Mar, Apr, May, Jun} AND Sales.did = Date.did AND Sales.pid = Product.pid Group by pname, pmonth Select pname, dmonth, sum(dollar_sales) From Sales, Date, Product Where pname = “blaire_cotton_shirts” AND dmonth in {Apr, May, Jun, Jul, Aug, Sep} AND Sales.did = Date.did AND Sales.pid = Product.pid Group by pname, pmonth Overlapping Queries

  7. Locality in OLAP Queries • Temporal • Hierarchical • Due to access patterns along the hierarchies in the dimensions • Data members related by the parent-child or sibling relationships are accessed together • Example : Cities in Wisconsin

  8. Classification • Unit of Caching • Query Level, Table level, Semantic Regions etc. • Nature of caching • Static, Dynamic • Desirable properties • Cache only relevant parts • Dynamic caching

  9. Multi-dimensional Chunking • Multi-dimensional arrays are typically stored in chunked format • Distinct values of each dimension are divided into ranges • Chunks represent semantic regions

  10. Chunking Region Date Product Schema : (Product, Region, Date, Dollar Sales)

  11. Chunk Caching • Motivation - partial/full reuse of results • Chunks -- Unit for caching • Query results • Split into a set of chunks • chunks are cached • Cache contains chunks at different levels of aggregation • Dynamic scheme • different replacement policies

  12. Reusing Cached Chunks Q1 Q3 Region Q2 Product

  13. Closure Property of Chunks • Mapping between chunks at different levels of aggregation leads to efficient computation of missing chunks • Example: 12 13 14 15 0 8 11 9 10 1 Region Region 5 7 6 4 2 3 3 0 1 2 Product

  14. Issues • Imposing order on the domain of dimensions • Use hierarchy on the dimension • Better use of hierarchical locality • Example: Cities in Wisconsin are grouped together for Region • Size of the chunk • Granularity vs. Overhead

  15. Chunked File Organization • Motivation • Reduce cost of a chunk miss • Chunk-based multi-dimensional arrays • Loss of relational access to the data • Apply chunking to relational tables • Data still stored as tuples • Tuples clustered on a chunk basis • A chunk index is built to get access based on chunk numbers

  16. Advantages • Cost of accessing a chunk proportional to the size of chunk rather than entire table • Maintains relational interface • Achieves multi-dimensional clustering • Improves performance of Bitmap Index

  17. Putting it Together • Analyze query selection predicates to get a list of chunk numbers • Group by clause specifies the level of aggregation (denoted by group by identifier) • (Group by id, Chunk number) is a key for looking up in the cache

  18. Processing a Query Client OLAP Server Cache Backend

  19. Implementation • Chunked file implemented in SHORE storage manager of the Paradise Database System • Tuples are stored in a fixed length record file called Fact file • Tuples are clustered on chunk number • A B+-Tree is used to implement the chunk index

  20. Replacement Policies • Simple LRU • Not very suitable for OLAP queries • Chunks at different levels of aggregation have different costs of computation • Benefit Based Policies • Associate a “profit metric (benefit)” with each chunk • Benefit of a chunk is measured by the fraction of base table it represents • Combined CLOCK scheme with benefit

  21. Experimental Setup • Four dimensions with hierarchy sizes 3, 2, 3, 2 • Base data size : 500,000 tuples of 20 bytes each • Cube size : 300 MB • Cache size : 30 MB • Buffer pool size at backend : 8MB • Platform : Dual processor Pentium 133 MHz with 128 MB memory • Query stream consisting of 1500 queries

  22. Query Profile • Designated hot region - a large percentage of the queries access data in the hot region • Q60, Q80 and Q100 - 60, 80 and 100% of the queries access 20% of the cube • Proximity queries - model hierarchical locality • QRandom - 100% randomly generated • QEqual - 50% random, 50% proximity • Qproximity - 20% random, 80% proximity

  23. Comparison With Query Caching

  24. Comparison With Query Caching

  25. Varying the Chunk Range

  26. Replacement Policies

  27. Summary and Future Work • Chunk based caching performs better than traditional query caching • Chunk range has to be chosen optimally • Benefit based LRU performs better than Simple LRU • Future work • Implement aggregation in the cache • Other cache policies such as pre-fetching

  28. OLAP Data Model • OLAP data is multi-dimensional • Dimensions and Metrics • Example: Schema : (Product, Region, Date, Dollar Sales) Product, Region and Date are Dimensions Dollar Sales is a metric • Dimensions have hierarchies

  29. Previous work • Static caching of entire tables [Harinarayan, Rajaraman, Ullman] • Materialized views [Srivastava, Dar, Jagadish, Levy] • Limited by query containment

  30. Benefits of Chunk Caching • Granularity of caching • Only frequently accessed chunks get cached • Query containment not necessary • Better reuse of cached results • Uniformity • Easier to reuse results

  31. Bitmap performance • Bitmaps used for selection • Bits set may be scattered • Chunks lead to clustering of data • Fewer I/Os • Example Product Schema: (Product, Region, $Sales) Select all tuples corresponding to Madison Region Madison

  32. Bitmap Performance

  33. Performance Metrics • Average query execution time • Cost saving ratio (CSR) • Percentage of the total cost of queries saved due to hits in the cache • Gives a more accurate measure of the benefit of caching than the hit ratio

  34. Comparison With Query Caching

  35. Comparison With Query Caching

  36. Varying the Chunk Range

  37. Replacement Policies

  38. Putting it Together • Split chunk number list depending on chunks present in the cache • Chunks present in the cache are answered from the cache • Generate SQL statement for fetching missing chunks from the backend • Backend uses chunked file for efficient retrieval of missing chunks

More Related