Bitmap indices for data warehouse
This presentation is the property of its rightful owner.
Sponsored Links
1 / 22

Bitmap Indices for Data Warehouse PowerPoint PPT Presentation


  • 37 Views
  • Uploaded on
  • Presentation posted in: General

Bitmap Indices for Data Warehouse. Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Feb 27, 2009. Star Schema Vs. Multi-dimensional Range Queries. SUM (qty * amt) WHERE ProdId in [p1.. p10] AND custId < 200.

Download Presentation

Bitmap Indices for Data Warehouse

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Bitmap indices for data warehouse

Bitmap Indices for Data Warehouse

Jianlin Feng

School of Software

SUN YAT-SEN UNIVERSITY

Feb 27, 2009


Star schema vs multi dimensional range queries

Star Schema Vs. Multi-dimensional Range Queries

SUM (qty * amt)

WHERE ProdId in [p1.. p10] AND custId < 200


Characteristics of multi dimensional range queries in data warehouse

Characteristics of Multi-Dimensional Range Queries in Data Warehouse

  • Ad-Hoc

    • Give N dimensions (attributes), every combination is possible: 2Ncombinations

    • A Data Cube equals to 2N GROUP-Bys

  • High Dimensions ( > 20)

  • Large Number of Records


Multi dimensional index fails

Multi-Dimensional Index Fails!

  • R-Trees or KD-Trees

    • Effective only for moderate number of dimensions

    • Efficient only for queries involving all indexed dimensions.

  • For Ad-hoc Rang Queries, Projection Index is usually better, and Bitmap Index is even better.


Projection index

Projection Index

  • Fix the order of the records in the base table

    • Store

  • Project records along some dimension

    • i.e, A single Column

    • Keeping the record order

    • Keeping the duplicates

  • Like “array” in C language

base table

Projection Index


Multi dimensional range queries a general idea

Multi-dimensional Range Queries : A General Idea

  • Build an index for each dimension (attribute);

    • A Projection Index

    • A B-Tree

      • 1 Primary B-Tree, N -1 Secondary B-Trees

  • For each involved dimension, use the index on that dimension to select records;

  • “AND” the records to get the final answer set.


How to make the and operation fast

How to make the “AND” operation fast?

  • Projection Index (B-Tree is similar)

    • Scan each involved dimension,

    • And return a set of RIDs.

    • Intersection the RID sets

      • Sets have different lengths

      • We can use Sort and Merge to do the Intersection

    • Life is easier

      • when all the sets have the same length and in the same order

      • Use 1/0 to record the membership of each record


General ideas of bitmap index

General Ideas of Bitmap Index

  • Fix the order of records in the base table

  • Suppose the base table has m records

  • For each dimension

    • For each distinct dimension value (as the KEY)

    • Build a bitmap with m bits (as the POSITIONS)

    • A bitmap is like an Inverted Index

  • “AND”, “OR” operations

    • realized by bitwise logical operations

    • Well supported by hardware


Basic bitmap index p o neil model 204 1987

Basic Bitmap IndexP. O’Neil, Model 204,1987


Size of bitmap indices

Size of Bitmap Indices

  • Number of Bitmap (Indices)

    • How to build bitmap indices for dimensions with large distinct values

    • Temperature dimension

  • Size (i.e., Length) of a Single Bitmap


Three solutions

Three Solutions

  • Encoding

    • Reduce the Number of Bitmaps

  • Binning

    • Reduce the Number of Bitmaps

  • Compression

    • Reduce the Size of a Single Bitmap


Encoding strategies

Encoding Strategies

  • Equality-encoded

    • Good for equality queries,such as “temperature == 100”

    • Basic Bitmap Index

  • Bit-sliced index

    • Assume dimension A has c distinct values, use log2c bitmap indices to represent each record (its value)

  • Range-encoded

    • Good for one-sided range queries, such as “Pressure < 56.7”

  • Interval-encoded

    • Good for two-sided range queries, such as“35.8 < Pressure < 56.7”


Binning

Binning

  • Encoding mainly considers discrete dimension values

    • Usually integers

  • Basic Ideas of Binning

    • Build a bitmap index for a bin instead of for a distinct value

    • The Number of Bitmaps has nothing to do with the number of distinct values in a dimension.

  • Pros and Cons

    • Pros:control the number of bitmap via controling the number of bins.

    • Cons:need to check original dimension values to decide if the records really satisfy query conditions.


A binning example values of dimension a lie in 0 100

A Binning Example:Values of Dimension A lie in [0, 100]


Compression strategies

Compression Strategies

  • General-purpose compression methods

    • Software packages are widely available

    • Tradeoff between query processing and compression ration

      • De-compress data first

  • Specific methods

    • BBC (Byte-aligned Bitmap Code ), Antoshenkov,1994,1996.

      • Adopted since Oracle 7.3

    • WAH(Word-aligned Hybrid Bitmap code ), Wu et al 2004, 2006.

      • Used in Lawrence Berkeley Lab for high-energy physics


Wah word aligned hybrid bitmap code

WAH(Word-aligned Hybrid Bitmap code )

  • Based on run-length encoding

    • For consecutive 0s or 1s in a bit sequence (part of a bitmap)

  • Use machine WORD as the unit for compression

    • Instead of BYTE in BBC

  • Design Goal:

    • reduce the overhead of de-compression, in order to speed-up query response.


Run length encoding

Run-length encoding

  • Bit sequence B: 11111111110001110000111111110001001

  • fill:a set of consecutive identical bits (all 0s or all 1s)

    • The first 10 bits in B

    • fill = count“+”bit value

    • 1111111111=10 “+” 1

  • tail: a set of mizxed 0s and 1s

    • The last 8 bits in B

  • Run:

    • Run = fill + tail

  • Basic Ideas of WAH

    • Define fill and tail appropriately so that they can be stored in WORDs.


Word aligned hybrid bitmap code 32 bit word

Word-aligned Hybrid Bitmap code:32-bit WORD


Wah vs b tree vs bbc

WAH vs. B-tree vs. BBC

  • On one dimensional range queries

    • The query response time grows linearly in the number of hits.

  • B-tree has the same time complexity

    • Records selected by each single dimension can not be easily combined.

  • Query response time

    • I/O + CPU

    • I/O: WAH > BBC

    • CPU: BBC > WAH

    • In Total:using WAH is 10 times faster than using BBC


Characteristics of industrial products

Characteristics of Industrial Products

  • Model 204. (Pat O’Neil,1987)

    • The first that adopted bitmap index

    • Basic Bitmap Index, No binning, No compression

    • Now owned by Computer Corporation of America

  • Oracle (1995)

    • Adopted compressed bitmap index since 7.3

    • Probably use BBC for compression, Equality-encoded, No binning.

  • Sybase IQ

    • bit-sliced index(Pat O’Neil et al,1997)

    • No binning, No compression

    • For dimension with small number of distinct values, use Basic Bitmap Index.


References

References

  • Kurt Stockinger, Kesheng Wu, Bitmap Indices for Data Warehouses, In Wrembel R., Koncilia Ch.: Data Warehouses and OLAP: Concepts, Architectures and Solutions. Idea Group, Inc. 2006.


  • Login