Bitmap Indices for Data Warehouse

1 / 22

# Bitmap Indices for Data Warehouse - PowerPoint PPT Presentation

Bitmap Indices for Data Warehouse. Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Feb 27, 2009. Star Schema Vs. Multi-dimensional Range Queries. SUM (qty * amt) WHERE ProdId in [p1.. p10] AND custId &lt; 200.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Bitmap Indices for Data Warehouse' - maisie-humphrey

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Bitmap Indices for Data Warehouse

Jianlin Feng

School of Software

SUN YAT-SEN UNIVERSITY

Feb 27, 2009

Star Schema Vs. Multi-dimensional Range Queries

SUM (qty * amt)

WHERE ProdId in [p1.. p10] AND custId < 200

• Give N dimensions (attributes), every combination is possible: 2Ncombinations
• A Data Cube equals to 2N GROUP-Bys
• High Dimensions ( > 20)
• Large Number of Records
Multi-Dimensional Index Fails!
• R-Trees or KD-Trees
• Effective only for moderate number of dimensions
• Efficient only for queries involving all indexed dimensions.
• For Ad-hoc Rang Queries, Projection Index is usually better, and Bitmap Index is even better.
Projection Index
• Fix the order of the records in the base table
• Store
• Project records along some dimension
• i.e, A single Column
• Keeping the record order
• Keeping the duplicates
• Like “array” in C language

base table

Projection Index

Multi-dimensional Range Queries : A General Idea
• Build an index for each dimension (attribute);
• A Projection Index
• A B-Tree
• 1 Primary B-Tree, N -1 Secondary B-Trees
• For each involved dimension, use the index on that dimension to select records;
• “AND” the records to get the final answer set.
How to make the “AND” operation fast?
• Projection Index (B-Tree is similar)
• Scan each involved dimension,
• And return a set of RIDs.
• Intersection the RID sets
• Sets have different lengths
• We can use Sort and Merge to do the Intersection
• Life is easier
• when all the sets have the same length and in the same order
• Use 1/0 to record the membership of each record
General Ideas of Bitmap Index
• Fix the order of records in the base table
• Suppose the base table has m records
• For each dimension
• For each distinct dimension value (as the KEY)
• Build a bitmap with m bits (as the POSITIONS)
• A bitmap is like an Inverted Index
• “AND”, “OR” operations
• realized by bitwise logical operations
• Well supported by hardware
Size of Bitmap Indices
• Number of Bitmap (Indices)
• How to build bitmap indices for dimensions with large distinct values
• Temperature dimension
• Size (i.e., Length) of a Single Bitmap
Three Solutions
• Encoding
• Reduce the Number of Bitmaps
• Binning
• Reduce the Number of Bitmaps
• Compression
• Reduce the Size of a Single Bitmap
Encoding Strategies
• Equality-encoded
• Good for equality queries，such as “temperature == 100”
• Basic Bitmap Index
• Bit-sliced index
• Assume dimension A has c distinct values, use log2c bitmap indices to represent each record (its value)
• Range-encoded
• Good for one-sided range queries, such as “Pressure < 56.7”
• Interval-encoded
• Good for two-sided range queries, such as“35.8 < Pressure < 56.7”
Binning
• Encoding mainly considers discrete dimension values
• Usually integers
• Basic Ideas of Binning
• Build a bitmap index for a bin instead of for a distinct value
• The Number of Bitmaps has nothing to do with the number of distinct values in a dimension.
• Pros and Cons
• Pros：control the number of bitmap via controling the number of bins.
• Cons：need to check original dimension values to decide if the records really satisfy query conditions.
Compression Strategies
• General-purpose compression methods
• Software packages are widely available
• Tradeoff between query processing and compression ration
• De-compress data first
• Specific methods
• BBC (Byte-aligned Bitmap Code ), Antoshenkov,1994,1996.
• WAH(Word-aligned Hybrid Bitmap code ), Wu et al 2004, 2006.
• Used in Lawrence Berkeley Lab for high-energy physics
WAH(Word-aligned Hybrid Bitmap code )
• Based on run-length encoding
• For consecutive 0s or 1s in a bit sequence (part of a bitmap)
• Use machine WORD as the unit for compression
• Instead of BYTE in BBC
• Design Goal：
• reduce the overhead of de-compression, in order to speed-up query response.
Run-length encoding
• Bit sequence B： 11111111110001110000111111110001001
• fill：a set of consecutive identical bits (all 0s or all 1s)
• The first 10 bits in B
• fill = count“+”bit value
• 1111111111=10 “+” 1
• tail: a set of mizxed 0s and 1s
• The last 8 bits in B
• Run：
• Run = fill + tail
• Basic Ideas of WAH
• Define fill and tail appropriately so that they can be stored in WORDs.
WAH vs. B-tree vs. BBC
• On one dimensional range queries
• The query response time grows linearly in the number of hits.
• B-tree has the same time complexity
• Records selected by each single dimension can not be easily combined.
• Query response time
• I/O + CPU
• I/O: WAH > BBC
• CPU: BBC > WAH
• In Total：using WAH is 10 times faster than using BBC
Characteristics of Industrial Products
• Model 204. (Pat O’Neil,1987)
• The first that adopted bitmap index
• Basic Bitmap Index, No binning, No compression
• Now owned by Computer Corporation of America
• Oracle （1995）
• Adopted compressed bitmap index since 7.3
• Probably use BBC for compression, Equality-encoded, No binning.
• Sybase IQ
• bit-sliced index(Pat O’Neil et al,1997)
• No binning, No compression
• For dimension with small number of distinct values, use Basic Bitmap Index.
References
• Kurt Stockinger, Kesheng Wu, Bitmap Indices for Data Warehouses, In Wrembel R., Koncilia Ch.: Data Warehouses and OLAP: Concepts, Architectures and Solutions. Idea Group, Inc. 2006.