1 / 24

Improved Query Performance With Variant Indexes

Improved Query Performance With Variant Indexes. Patrick O’Neil, Dallan Quass. Presented by Bo Han. Motivation. 1. Speed up the queries on data warehoues data warehoues are large read-mostly always perform queries of aggregate, filter, and group the data. Motivation.

murray
Download Presentation

Improved Query Performance With Variant Indexes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Improved Query Performance With Variant Indexes Patrick O’Neil, Dallan Quass Presented by Bo Han

  2. Motivation 1. Speed up the queries on data warehoues • data warehoues are large • read-mostly • always perform queries of aggregate, filter, and group the data

  3. Motivation 2. The first rigorous examination of variant indexes in the literature • Advantages over traditional Value-List indexes for certain classes of queries • More than one type of index available on a column

  4. Motivation 3. Introducing a new indexing approach to support OLAP-type queries • Datacube • Multi-dimensional query • Depends on summary tables

  5. Value-List Index (B+ tree) Leaf Node RID Problem: A key values will have large number of associated RID’s!

  6. Bitmap Indexes Bitmap for Brand Dell 1001000 HP 0100101 Sony 0010000 IBM 0000010 Bitmap for Size 12 0000011 13 0100000 14 1001100 15 0010000 • A Bitmap for a value: an array of bits. The ith bit is set to 1 • if the ith record has the value • A Bitmap index: consists of one bitmap for each value that attribute can take • A Bitmap is an alternate method of representing RID-lists in a Value-List index (low-cardinality)

  7. Bitmap Indexes 1. More space efficient than RID lists in a Value-List index • No compression • |RID|=32bits, #row=n, #distinct value=m • If m<32 m*n<32*n • Compression: Run-length encoding • 2. More CPU efficient for may functions • Boolean operations • ex1: Select Brand From Product • Where Brand=‘HP’ and Size=13 (AND) • ex2: Select Pid From Product • Where Size>12 and Size<15 (OR)

  8. Bitmap Indexes • 2. More CPU efficient for may functions • Count • Select count(*) From Product • Where Brand=‘Dell’ and Size>14 • Each individual bitmap is small and frequently used ones can be cached in memory • 4. Available in most major commercial DBMS

  9. Projection Index A projection index for column duplicates all column values for lookup by ordinal number . • Easy to locate • N=1000*p+s (p: page#, s: slot#) • Few disk I/O projection index for col2

  10. Bit-Sliced Index A set of bitmap slices which are orthogonal to the data held in a projection index. (i.e. a bitwise vertical partition) B5 bit-slice B4 B3 B2 B1 B0 Col2: 20 52 20 62 10 34 1 49 Bnn: bitmap representing set of non null values in the indexed column

  11. Comparison of Indexes (evaluating Single-Column Sum Aggregates) Select SUM(doloar_sales) From Sales Where condition Analyze the disk page I/O cost Plan 1 : Direct access to the rows to calculate the Sum 100million rows, Len(row)=200B, |page|=4K 20rows/page, |Foundset|=2million rows Plan 2 : Calculating Sum through a Projection Index Len(doloar_sales)=4B, 1000rows/page 100,000 pages

  12. Comparison of Indexes (evaluating Single-Column Sum Aggregates) Plan 3 : Calculating the Sum through a Value-List(Bitmap) Index if (COUNT (Bf AND Bnn) = = 0) Return null; SUM = 0.0; for each non-null value v in the index for C { Designate the set of rows with value v as Bv SUM += v * COUNT(Bf AND Bv); } Return SUM; Bf: 100,000,000bits=12,500,000B3125 pages Bv: 100,000,000RIDs of 4 bytes each100,000 pages Total: 103,125pages

  13. Comparison of Indexes (evaluating Single-Column Sum Aggregates) Plan 4 : Calculating the SUM through a Bit-Sliced Index if (COUNT (Bf AND Bnn) = = 0) Return null; SUM = 0.0; for i = 0 to N SUM += * COUNT(Bi AND Bf); Return SUM; Bf: 100,000,000bits=12,500,000B3125 pages 2 million rows: 21Bitmaps Total:22*3125=68,750 pages

  14. Comparison of Indexes (evaluating Single-Column Sum Aggregates)

  15. Evaluating Aggregate Function

  16. Range Evaluation Performance

  17. Evaluating OLAP-style Queries • OLAP approach creates precalculates results of some Grouped Queries and stores them in summary tables. • The expected set of queries is known in advance? • Size of data in summary tables grows as the product of the number of values in the independent dimensions (space requirement?) How to speed up Join and Group By ? Join Indexes and Bitmap-Join-Indexes

  18. Join Indexes A join index: an index on one table that involves a column value from different table through a commonly encountered join.

  19. Bitmap Join Index A Bitmap Join Index spans multiple tables and improves query performance between the joined tables.

  20. Bitmap Join Index Select Sum(Dollar_sales) From Sales S Natural Join Product P Natural Join Customer C Where P.Brand=‘Dell’ AND C.State=‘PA’ = and

  21. Calculating Groupset Aggregates • Select Sum(F.A) From S,D1,D2,D3 Where condition Group by D1.d1, D2.d2, D3.d3 • Using Value-List index to determine Groupset (F.di=Di.di, without join!) • Using Projection index on F.A to get SUM(F.A)

  22. Improved Grouping Efficiency Problem: Groupsets and rows are randomly placed on disk. Segmentation: Partition rows in F into Segments. Query evaluation: one segment at a time. Clustering: Cluster the fact table F D1 =d1-1 111111111111111111111110000000000000000000… =d1-2 000000000000000000000001111111111111111000… …… D2 =d2-1 111111000000000000000001111110000000000000… =d2-2 000000111111100000000000000001111111100000… …… D3 =d3-1 110000110000000000000001111110000000000000… =d3-2 001100001100000000000000000001111111100000… …… =d3-n3 000011000001100000000000000001111111100000… (d1-1, d2-1, d3-1) 11000000000000000000000000000000000000… (d1-1, d2-1, d3-2) 00110000000000000000000000000000000000… Groupset Indexes: Keyvalues are a concatenation of the dimensional primary-key values

  23. Conclusion • Analyze Value-List index, Bitmap index, Projection index and Bit-Sliced index • Combine Bitmap indexing and physical row clustering to evaluate OLAP queries involving aggregation and grouping

  24. Reference • Improved Query Performance With Variant Indexes – Patrick O’Neil and Dallan Quass, Proc. ACM SIGMOD Conf. 1997, Pages 38-49. • Bitmap Index Design and Evaluation – C.Y. Chan and Y.E. Ioannidis 1998. 6 • Database System Implementation – Hector Garcia M., Jeffrey D.U. and Jennifer W., Prentice Hall, 2000 • Encoded Bitmap Indexing for Data Warehouses – M.C. Wu and A.P. Buchmann 1998. 2 • An Efficient Bitmap Encoding Scheme for Selection Queries – C.Y. Chan and Y.E. Ioannidis 1998. 6 • Multidimensional Indexing and Query Coordination for Tertiary Storage Management – A. Shoshani and L.M. Bernardo, etc. 1999. 10 • Multi-Table Joins Through Bitmapped Join Indices – P. O’Neil and G. Graefe 1995. 9

More Related