- 43 Views
- Uploaded on
- Presentation posted in: General

Bitmap Indices for Data Warehouse

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Bitmap Indices for Data Warehouse

Jianlin Feng

School of Software

SUN YAT-SEN UNIVERSITY

Feb 27, 2009

SUM (qty * amt)

WHERE ProdId in [p1.. p10] AND custId < 200

- Ad-Hoc
- Give N dimensions (attributes), every combination is possible: 2Ncombinations
- A Data Cube equals to 2N GROUP-Bys

- High Dimensions ( > 20)
- Large Number of Records

- R-Trees or KD-Trees
- Effective only for moderate number of dimensions
- Efficient only for queries involving all indexed dimensions.

- For Ad-hoc Rang Queries, Projection Index is usually better, and Bitmap Index is even better.

- Fix the order of the records in the base table
- Store

- Project records along some dimension
- i.e, A single Column
- Keeping the record order
- Keeping the duplicates

- Like “array” in C language

base table

Projection Index

- Build an index for each dimension (attribute);
- A Projection Index
- A B-Tree
- 1 Primary B-Tree, N -1 Secondary B-Trees

- For each involved dimension, use the index on that dimension to select records;
- “AND” the records to get the final answer set.

- Projection Index (B-Tree is similar)
- Scan each involved dimension,
- And return a set of RIDs.
- Intersection the RID sets
- Sets have different lengths
- We can use Sort and Merge to do the Intersection

- Life is easier
- when all the sets have the same length and in the same order
- Use 1/0 to record the membership of each record

- Fix the order of records in the base table
- Suppose the base table has m records
- For each dimension
- For each distinct dimension value (as the KEY)
- Build a bitmap with m bits (as the POSITIONS)
- A bitmap is like an Inverted Index

- “AND”, “OR” operations
- realized by bitwise logical operations
- Well supported by hardware

- Number of Bitmap (Indices)
- How to build bitmap indices for dimensions with large distinct values
- Temperature dimension

- Size (i.e., Length) of a Single Bitmap

- Encoding
- Reduce the Number of Bitmaps

- Binning
- Reduce the Number of Bitmaps

- Compression
- Reduce the Size of a Single Bitmap

- Equality-encoded
- Good for equality queries，such as “temperature == 100”
- Basic Bitmap Index

- Bit-sliced index
- Assume dimension A has c distinct values, use log2c bitmap indices to represent each record (its value)

- Range-encoded
- Good for one-sided range queries, such as “Pressure < 56.7”

- Interval-encoded
- Good for two-sided range queries, such as“35.8 < Pressure < 56.7”

- Encoding mainly considers discrete dimension values
- Usually integers

- Basic Ideas of Binning
- Build a bitmap index for a bin instead of for a distinct value
- The Number of Bitmaps has nothing to do with the number of distinct values in a dimension.

- Pros and Cons
- Pros：control the number of bitmap via controling the number of bins.
- Cons：need to check original dimension values to decide if the records really satisfy query conditions.

- General-purpose compression methods
- Software packages are widely available
- Tradeoff between query processing and compression ration
- De-compress data first

- Specific methods
- BBC (Byte-aligned Bitmap Code ), Antoshenkov,1994,1996.
- Adopted since Oracle 7.3

- WAH(Word-aligned Hybrid Bitmap code ), Wu et al 2004, 2006.
- Used in Lawrence Berkeley Lab for high-energy physics

- BBC (Byte-aligned Bitmap Code ), Antoshenkov,1994,1996.

- Based on run-length encoding
- For consecutive 0s or 1s in a bit sequence (part of a bitmap)

- Use machine WORD as the unit for compression
- Instead of BYTE in BBC

- Design Goal：
- reduce the overhead of de-compression, in order to speed-up query response.

- Bit sequence B： 11111111110001110000111111110001001
- fill：a set of consecutive identical bits (all 0s or all 1s)
- The first 10 bits in B
- fill = count“+”bit value
- 1111111111=10 “+” 1

- tail: a set of mizxed 0s and 1s
- The last 8 bits in B

- Run：
- Run = fill + tail

- Basic Ideas of WAH
- Define fill and tail appropriately so that they can be stored in WORDs.

- On one dimensional range queries
- The query response time grows linearly in the number of hits.

- B-tree has the same time complexity
- Records selected by each single dimension can not be easily combined.

- Query response time
- I/O + CPU
- I/O: WAH > BBC
- CPU: BBC > WAH
- In Total：using WAH is 10 times faster than using BBC

- Model 204. (Pat O’Neil,1987)
- The first that adopted bitmap index
- Basic Bitmap Index, No binning, No compression
- Now owned by Computer Corporation of America

- Oracle （1995）
- Adopted compressed bitmap index since 7.3
- Probably use BBC for compression, Equality-encoded, No binning.

- Sybase IQ
- bit-sliced index(Pat O’Neil et al,1997)
- No binning, No compression
- For dimension with small number of distinct values, use Basic Bitmap Index.

- Kurt Stockinger, Kesheng Wu, Bitmap Indices for Data Warehouses, In Wrembel R., Koncilia Ch.: Data Warehouses and OLAP: Concepts, Architectures and Solutions. Idea Group, Inc. 2006.