Bitmap indices for data warehouse
This presentation is the property of its rightful owner.
Sponsored Links
1 / 22

Bitmap Indices for Data Warehouse PowerPoint PPT Presentation

  • Uploaded on
  • Presentation posted in: General

Bitmap Indices for Data Warehouse. Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Feb 27, 2009. Star Schema Vs. Multi-dimensional Range Queries. SUM (qty * amt) WHERE ProdId in [p1.. p10] AND custId < 200.

Download Presentation

Bitmap Indices for Data Warehouse

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Bitmap indices for data warehouse

Bitmap Indices for Data Warehouse

Jianlin Feng

School of Software


Feb 27, 2009

Star schema vs multi dimensional range queries

Star Schema Vs. Multi-dimensional Range Queries

SUM (qty * amt)

WHERE ProdId in [p1.. p10] AND custId < 200

Characteristics of multi dimensional range queries in data warehouse

Characteristics of Multi-Dimensional Range Queries in Data Warehouse

  • Ad-Hoc

    • Give N dimensions (attributes), every combination is possible: 2Ncombinations

    • A Data Cube equals to 2N GROUP-Bys

  • High Dimensions ( > 20)

  • Large Number of Records

Multi dimensional index fails

Multi-Dimensional Index Fails!

  • R-Trees or KD-Trees

    • Effective only for moderate number of dimensions

    • Efficient only for queries involving all indexed dimensions.

  • For Ad-hoc Rang Queries, Projection Index is usually better, and Bitmap Index is even better.

Projection index

Projection Index

  • Fix the order of the records in the base table

    • Store

  • Project records along some dimension

    • i.e, A single Column

    • Keeping the record order

    • Keeping the duplicates

  • Like “array” in C language

base table

Projection Index

Multi dimensional range queries a general idea

Multi-dimensional Range Queries : A General Idea

  • Build an index for each dimension (attribute);

    • A Projection Index

    • A B-Tree

      • 1 Primary B-Tree, N -1 Secondary B-Trees

  • For each involved dimension, use the index on that dimension to select records;

  • “AND” the records to get the final answer set.

How to make the and operation fast

How to make the “AND” operation fast?

  • Projection Index (B-Tree is similar)

    • Scan each involved dimension,

    • And return a set of RIDs.

    • Intersection the RID sets

      • Sets have different lengths

      • We can use Sort and Merge to do the Intersection

    • Life is easier

      • when all the sets have the same length and in the same order

      • Use 1/0 to record the membership of each record

General ideas of bitmap index

General Ideas of Bitmap Index

  • Fix the order of records in the base table

  • Suppose the base table has m records

  • For each dimension

    • For each distinct dimension value (as the KEY)

    • Build a bitmap with m bits (as the POSITIONS)

    • A bitmap is like an Inverted Index

  • “AND”, “OR” operations

    • realized by bitwise logical operations

    • Well supported by hardware

Basic bitmap index p o neil model 204 1987

Basic Bitmap IndexP. O’Neil, Model 204,1987

Size of bitmap indices

Size of Bitmap Indices

  • Number of Bitmap (Indices)

    • How to build bitmap indices for dimensions with large distinct values

    • Temperature dimension

  • Size (i.e., Length) of a Single Bitmap

Three solutions

Three Solutions

  • Encoding

    • Reduce the Number of Bitmaps

  • Binning

    • Reduce the Number of Bitmaps

  • Compression

    • Reduce the Size of a Single Bitmap

Encoding strategies

Encoding Strategies

  • Equality-encoded

    • Good for equality queries,such as “temperature == 100”

    • Basic Bitmap Index

  • Bit-sliced index

    • Assume dimension A has c distinct values, use log2c bitmap indices to represent each record (its value)

  • Range-encoded

    • Good for one-sided range queries, such as “Pressure < 56.7”

  • Interval-encoded

    • Good for two-sided range queries, such as“35.8 < Pressure < 56.7”



  • Encoding mainly considers discrete dimension values

    • Usually integers

  • Basic Ideas of Binning

    • Build a bitmap index for a bin instead of for a distinct value

    • The Number of Bitmaps has nothing to do with the number of distinct values in a dimension.

  • Pros and Cons

    • Pros:control the number of bitmap via controling the number of bins.

    • Cons:need to check original dimension values to decide if the records really satisfy query conditions.

A binning example values of dimension a lie in 0 100

A Binning Example:Values of Dimension A lie in [0, 100]

Compression strategies

Compression Strategies

  • General-purpose compression methods

    • Software packages are widely available

    • Tradeoff between query processing and compression ration

      • De-compress data first

  • Specific methods

    • BBC (Byte-aligned Bitmap Code ), Antoshenkov,1994,1996.

      • Adopted since Oracle 7.3

    • WAH(Word-aligned Hybrid Bitmap code ), Wu et al 2004, 2006.

      • Used in Lawrence Berkeley Lab for high-energy physics

Wah word aligned hybrid bitmap code

WAH(Word-aligned Hybrid Bitmap code )

  • Based on run-length encoding

    • For consecutive 0s or 1s in a bit sequence (part of a bitmap)

  • Use machine WORD as the unit for compression

    • Instead of BYTE in BBC

  • Design Goal:

    • reduce the overhead of de-compression, in order to speed-up query response.

Run length encoding

Run-length encoding

  • Bit sequence B: 11111111110001110000111111110001001

  • fill:a set of consecutive identical bits (all 0s or all 1s)

    • The first 10 bits in B

    • fill = count“+”bit value

    • 1111111111=10 “+” 1

  • tail: a set of mizxed 0s and 1s

    • The last 8 bits in B

  • Run:

    • Run = fill + tail

  • Basic Ideas of WAH

    • Define fill and tail appropriately so that they can be stored in WORDs.

Word aligned hybrid bitmap code 32 bit word

Word-aligned Hybrid Bitmap code:32-bit WORD

Wah vs b tree vs bbc

WAH vs. B-tree vs. BBC

  • On one dimensional range queries

    • The query response time grows linearly in the number of hits.

  • B-tree has the same time complexity

    • Records selected by each single dimension can not be easily combined.

  • Query response time

    • I/O + CPU

    • I/O: WAH > BBC

    • CPU: BBC > WAH

    • In Total:using WAH is 10 times faster than using BBC

Characteristics of industrial products

Characteristics of Industrial Products

  • Model 204. (Pat O’Neil,1987)

    • The first that adopted bitmap index

    • Basic Bitmap Index, No binning, No compression

    • Now owned by Computer Corporation of America

  • Oracle (1995)

    • Adopted compressed bitmap index since 7.3

    • Probably use BBC for compression, Equality-encoded, No binning.

  • Sybase IQ

    • bit-sliced index(Pat O’Neil et al,1997)

    • No binning, No compression

    • For dimension with small number of distinct values, use Basic Bitmap Index.



  • Kurt Stockinger, Kesheng Wu, Bitmap Indices for Data Warehouses, In Wrembel R., Koncilia Ch.: Data Warehouses and OLAP: Concepts, Architectures and Solutions. Idea Group, Inc. 2006.

  • Login