- 58 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' Bitmap Indices for Data Warehouse' - maisie-humphrey

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Star Schema Vs. Multi-dimensional Range Queries

SUM (qty * amt)

WHERE ProdId in [p1.. p10] AND custId < 200

Characteristics of Multi-Dimensional Range Queries in Data Warehouse

- Ad-Hoc
- Give N dimensions (attributes), every combination is possible: 2Ncombinations
- A Data Cube equals to 2N GROUP-Bys
- High Dimensions ( > 20)
- Large Number of Records

Multi-Dimensional Index Fails!

- R-Trees or KD-Trees
- Effective only for moderate number of dimensions
- Efficient only for queries involving all indexed dimensions.
- For Ad-hoc Rang Queries, Projection Index is usually better, and Bitmap Index is even better.

Projection Index

- Fix the order of the records in the base table
- Store
- Project records along some dimension
- i.e, A single Column
- Keeping the record order
- Keeping the duplicates
- Like “array” in C language

base table

Projection Index

Multi-dimensional Range Queries : A General Idea

- Build an index for each dimension (attribute);
- A Projection Index
- A B-Tree
- 1 Primary B-Tree, N -1 Secondary B-Trees
- For each involved dimension, use the index on that dimension to select records;
- “AND” the records to get the final answer set.

How to make the “AND” operation fast?

- Projection Index (B-Tree is similar)
- Scan each involved dimension,
- And return a set of RIDs.
- Intersection the RID sets
- Sets have different lengths
- We can use Sort and Merge to do the Intersection
- Life is easier
- when all the sets have the same length and in the same order
- Use 1/0 to record the membership of each record

General Ideas of Bitmap Index

- Fix the order of records in the base table
- Suppose the base table has m records
- For each dimension
- For each distinct dimension value (as the KEY)
- Build a bitmap with m bits (as the POSITIONS)
- A bitmap is like an Inverted Index
- “AND”, “OR” operations
- realized by bitwise logical operations
- Well supported by hardware

Size of Bitmap Indices

- Number of Bitmap (Indices)
- How to build bitmap indices for dimensions with large distinct values
- Temperature dimension
- Size (i.e., Length) of a Single Bitmap

Three Solutions

- Encoding
- Reduce the Number of Bitmaps
- Binning
- Reduce the Number of Bitmaps
- Compression
- Reduce the Size of a Single Bitmap

Encoding Strategies

- Equality-encoded
- Good for equality queries，such as “temperature == 100”
- Basic Bitmap Index
- Bit-sliced index
- Assume dimension A has c distinct values, use log2c bitmap indices to represent each record (its value)
- Range-encoded
- Good for one-sided range queries, such as “Pressure < 56.7”
- Interval-encoded
- Good for two-sided range queries, such as“35.8 < Pressure < 56.7”

Binning

- Encoding mainly considers discrete dimension values
- Usually integers
- Basic Ideas of Binning
- Build a bitmap index for a bin instead of for a distinct value
- The Number of Bitmaps has nothing to do with the number of distinct values in a dimension.
- Pros and Cons
- Pros：control the number of bitmap via controling the number of bins.
- Cons：need to check original dimension values to decide if the records really satisfy query conditions.

Compression Strategies

- General-purpose compression methods
- Software packages are widely available
- Tradeoff between query processing and compression ration
- De-compress data first
- Specific methods
- BBC (Byte-aligned Bitmap Code ), Antoshenkov,1994,1996.
- Adopted since Oracle 7.3
- WAH(Word-aligned Hybrid Bitmap code ), Wu et al 2004, 2006.
- Used in Lawrence Berkeley Lab for high-energy physics

WAH(Word-aligned Hybrid Bitmap code )

- Based on run-length encoding
- For consecutive 0s or 1s in a bit sequence (part of a bitmap)
- Use machine WORD as the unit for compression
- Instead of BYTE in BBC
- Design Goal：
- reduce the overhead of de-compression, in order to speed-up query response.

Run-length encoding

- Bit sequence B： 11111111110001110000111111110001001
- fill：a set of consecutive identical bits (all 0s or all 1s)
- The first 10 bits in B
- fill = count“+”bit value
- 1111111111=10 “+” 1
- tail: a set of mizxed 0s and 1s
- The last 8 bits in B
- Run：
- Run = fill + tail
- Basic Ideas of WAH
- Define fill and tail appropriately so that they can be stored in WORDs.

WAH vs. B-tree vs. BBC

- On one dimensional range queries
- The query response time grows linearly in the number of hits.
- B-tree has the same time complexity
- Records selected by each single dimension can not be easily combined.
- Query response time
- I/O + CPU
- I/O: WAH > BBC
- CPU: BBC > WAH
- In Total：using WAH is 10 times faster than using BBC

Characteristics of Industrial Products

- Model 204. (Pat O’Neil,1987)
- The first that adopted bitmap index
- Basic Bitmap Index, No binning, No compression
- Now owned by Computer Corporation of America
- Oracle （1995）
- Adopted compressed bitmap index since 7.3
- Probably use BBC for compression, Equality-encoded, No binning.
- Sybase IQ
- bit-sliced index(Pat O’Neil et al,1997)
- No binning, No compression
- For dimension with small number of distinct values, use Basic Bitmap Index.

References

- Kurt Stockinger, Kesheng Wu, Bitmap Indices for Data Warehouses, In Wrembel R., Koncilia Ch.: Data Warehouses and OLAP: Concepts, Architectures and Solutions. Idea Group, Inc. 2006.

Download Presentation

Connecting to Server..