1 / 20

Various Bitmap Indexes for Warehouse Data Sets

kameryn
Download Presentation

Various Bitmap Indexes for Warehouse Data Sets

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    1. Various Bitmap Indexes for Warehouse Data Sets Ivan Dyedov and Daniel Elozory

    2. BITMAP: Sequence of bits representing attribute values One bit for each record BITMAP INDEX: collection of bitmaps used to represent an attribute DEFINITIONS

    3. Example from text book SAMPLE BITMAP INDEX Actually a value-list bitmap indexActually a value-list bitmap index

    4. Bitmap for each distinct attribute value Contains a “1” for each row in the relation where that attribute value is found Contains a “0” for all other rows Similar to RID-list for each key value <key value, RID-list> Value-List Index

    5. Bitmap index on attribute gender Find all females (gender=“Female”) Find all “1”s in the “Female” bitmap and return their rids Don’t have to go to the data level PROCESSING A SIMPLE QUERY

    6. Find the number of females Return the number of bits set to “1” for the “Female” bitmap Don’t have to go into the data level QUERY with COUNT()

    7. Find all people that are married or divorced (status = “married” or status = “divorced”) Same as before, but scan multiple bitmaps Performance is much better than B+ trees! for (i = 0; i < len(Bmarried); i++) Bresult[i] = Bmarried[i] OR Bdivorced[i]; NOT is performed similarly, requiring an extra Existence Bitmap QUERY with AND/OR/NOT/XOR in the WHERE clause Existence map contains a “1” for all rows that actually exist Result = ~B1[i] & EBM[i]Existence map contains a “1” for all rows that actually exist Result = ~B1[i] & EBM[i]

    8. Size of each bitmap (in bits) is equal to the number of rows in the relation Number of bitmaps for an attribute is equal to the number of distinct attribute values of that attribute (cardinality) Total space needed (in bits) = cardinality x number of rows SPACE COMPLEXITY

    9. Density = 1 / cardinality Value-list indexes take up very small amount of space for high density attributes For low density attributes (32+ distinct values) space usage is high Comparable to RID-list index (32 bits per RID) DENSITY

    10. Time needed to read a bitmap = Time to read one block x bitmap size / block size one 6KB block can hold 48K bits Has to scan all of the needed bitmaps only once TIME COMPLEXITY Evaluating 48K records at a timeEvaluating 48K records at a time

    11. Store the indexed attribute values separately In order of appearance in the relation To find a row containing a value of the attribute Scan the projection index instead of data level Number of blocks needed to scan is smaller Best for column product queries Projection Index

    12. A bitmap for each bit used to represent an attribute value Possible to use for numeric values Bi[n] represents the i-th bit of the attribute value in row n of the relation BIT-SLICED INDEX

    13. BIT-SLICED INDEX EXAMPLE

    14. SUM() and AVERAGE() can be calculated with a Bit-sliced index SUM = 0.00 For i = 0 to N // N = number of bits representing attribute value SUM += 2i * COUNT(Bi) If a condition in the WHERE clause exists AND Bi with the foundset Not useful for MIN() or MAX() OTHER AGGREGATES Foundset is the bitmap containing a “1” for each row that is to be processed Foundset can be calculated using the logical operators explained before based on the conditionFoundset is the bitmap containing a “1” for each row that is to be processed Foundset can be calculated using the logical operators explained before based on the condition

    15. Bit-sliced index can be used Generate bitmaps for LT,GT,EQ using logical operators in a single loop Best for wide ranges Good for narrow ranges Value-list is best RANGE QUERIES Look at all the bits from the most significantLook at all the bits from the most significant

    16. Bitmaps are often compressed Because of the possible high space usage with low density attributes Compressing is an advanced topic COMPRESSION

    17. Used in warehouse data sets which are large and are not updated frequently Updated in a batch fashion while the database is down Star-schema with fact and dimension tables Not good for data that is modified regularly Updates will require us to modify ALL the associated bitmap indexes Applications

    18. STAR SCHEMA JOIN INDEX Fact table…Fact table…

    19. In case you missed our names… Ivan Dyedov and Daniel Elozory QUESTIONS / COMMENTS?

    20. Bitmap Index http://en.wikipedia.org/wiki/Bitmap_index Bitmap Index vs. B-tree Index: Which and When? http://www.oracle.com/technology/pub/articles/sharma_indexes.html Understanding Bitmap Indexes http://www.dbazine.com/oracle/or-articles/jlewis3 Database Management Systems (3rd) 2003 Ramakrishnan, Gehrke References

More Related