Approximate Encoding for Direct Access and Query Processing over Compressed Bitmaps

Tan Apaydin – The Ohio State University

Guadalupe Canahuate – The Ohio State University

Hakan Ferhatosmanoglu – The Ohio State University

Ali Saman Tosun – University of Texas at San Antonio

Presentation Outline
  • Motivation
  • Goal
  • Approximate Bitmaps (AB) encoding
  • AB example
  • Theoretical analysis
  • Experiments and Results
  • Conclusion
Motivation
  • Bitmap indices
    • Data warehouses
    • Scientific data
    • Visualization applications
    • Bitwise operations
  • Bitmap Compression
    • Run-length encoders
      • Word Aligned Hybrid (WAH)
      • Byte-aligned Bitmap Code (BBC)
Motivation
  • After compression, the row numbers no longer correspond to the bit positions in the bitmap
  • Queries over a few particular rows
    • Are as expensive as queries asking for all the rows
  • Commonly, users are only interested in a small subset of the dataset at a time.
  • For example:
    • A query over the transactions of the last 7 days
    • Spatial queries over objects in a specific geographical area
Motivation
  • Visualization applications
    • Millions of different readings ordered by their geographic location
    • Users ask range queries over some of the readings for a given area
    • The answers are highlighted on the screen
    • Several degrees of resolution make approximate answers acceptable
Our Goal
  • Enable direct access over any subset of the bitmap
  • Achieve effective compression
  • Maintain bitwise operations for query execution
  • Trade-off efficiency vs. accuracy
    • No false negatives
The approach
  • Our solution is inspired by Bloom Filters
    • A $2^m$-bit array indexed using k independent hash functions
    • A data object is inserted by setting the k positions in the array corresponding to the hash values of the object
    • False positives can happen, but false negatives cannot
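
The Bloom filter behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `BloomFilter` and the way the k positions are derived (salting SHA-1 with the function index) are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Bit array of 2**m bits, probed at k hashed positions per object."""

    def __init__(self, m: int, k: int):
        self.size = 2 ** m
        self.k = k
        self.bits = [0] * self.size

    def _positions(self, obj: str):
        # Assumption: simulate k hash functions by salting SHA-1 with the index.
        for idx in range(self.k):
            digest = hashlib.sha1(f"{idx}:{obj}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def insert(self, obj: str) -> None:
        # Set the k positions that correspond to the object's hash values.
        for pos in self._positions(obj):
            self.bits[pos] = 1

    def might_contain(self, obj: str) -> bool:
        # True if all k positions are set. A True may be a false positive;
        # a False is always correct, so there are no false negatives.
        return all(self.bits[pos] for pos in self._positions(obj))


bf = BloomFilter(m=10, k=3)
bf.insert("row 7")
print(bf.might_contain("row 7"))  # True: an inserted object is always found
```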
Approximate Bitmaps (AB)
  • A bloom filter-like structure
  • Only the set bits are inserted into the AB
  • Three levels of encoding:
    • Per table, per attribute, per bitmap column
  • Parameters:
    • The hash string mapping function, F
    • The k hash functions, {H1(x),…,Hk(x)}
    • The size of the AB, $n = \alpha s = 2^m$
  • Precision in terms of α and k: $P \approx 1 - (1 - e^{-k/\alpha})^k$
AB Example
  • A bitmap table for a dataset with 8 rows and 3 attributes. Each attribute is divided into 3 categories.
  • Bitmap Table Size: 72 bits
  • Number of set bits = 24.
  • F(i,j) = concatenate(i,j) = x
  • H1(x) = x mod 32
  • m = 5
  • AB Size: $2^5 = 32$ bits
AB Example - Insertion
  • Initially all bits in the AB are zero
  • To insert set bit in (1,1)
AB Example - Insertion
  • To insert set bit in (1,1)
    • x = 11
    • H(11) = 11 mod 32 = 11
    • AB(11) = 1
AB Example - Insertion
  • To insert set bit in (5,4)
    • x = 54
    • H(54) = 54 mod 32 = 22
    • AB(22) = 1
AB Example - Insertion
  • After all insertions
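
The two insertions walked through on the preceding slides can be reproduced with a short sketch. It assumes that `concatenate(i,j)` means decimal concatenation of the row and column numbers, which matches x = 11 and x = 54 in the example; the function names are chosen only for illustration.

```python
AB_SIZE = 32  # 2**5 bits, as in the example


def F(i: int, j: int) -> int:
    # Assumption: decimal concatenation of row i and column j, e.g. (5,4) -> 54.
    return int(f"{i}{j}")


def H1(x: int) -> int:
    return x % AB_SIZE


def insert(ab: list, i: int, j: int) -> None:
    # Only set bits of the bitmap table are inserted into the AB.
    ab[H1(F(i, j))] = 1


ab = [0] * AB_SIZE          # initially all bits in the AB are zero
insert(ab, 1, 1)            # x = 11, H1(11) = 11
insert(ab, 5, 4)            # x = 54, H1(54) = 22
print([pos for pos, bit in enumerate(ab) if bit])  # [11, 22]
```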
AB Example - Analysis
  • Estimated Precision:
    • α = ABSize/Set Bits
    • α = 32/24 = 1.33
    • k = 1
    • $FP = (1 - e^{-k/\alpha})^k$
    • $P = 1 - FP$
    • $P = 1 - (1 - e^{-1/1.33})$
    • P = 47%
  • The underlined positions are false positives
  • Only 8 out of the 48 zeros are set in the AB
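
The estimated precision above can be checked numerically. The helper name `estimated_precision` is an assumption for this sketch; the formula is the one given on the slide.

```python
import math


def estimated_precision(ab_size: int, set_bits: int, k: int) -> float:
    alpha = ab_size / set_bits              # alpha = ABSize / Set Bits
    fp = (1 - math.exp(-k / alpha)) ** k    # estimated false positive rate
    return 1 - fp


p = estimated_precision(ab_size=32, set_bits=24, k=1)
print(f"{p:.2f}")  # 0.47
```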
AB Example - Retrieval
  • Consider this query, asking for 4 rows
  • This is a range query over 4 rows, where the third attribute falls into C1 or C2
  • Row 4:
    • (4,7): H(47) = 15
      • AB(15)=0
    • (4,8): H(48) = 16
      • AB(16)=1
  • Row 5:
    • (5,7): H(57) = 25
      • AB(25)=1
      • Stop
AB Example - Retrieval
  • Consider this query, asking for 4 rows
  • Row 6:
    • (6,7): H(67) = 3
      • AB(3)=1
      • Stop
  • Approx Query Answer:
    • {1,1,1,0}
  • Exact Answer:
    • {0,1,1,0}
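
The retrieval steps can be sketched as a loop that probes the queried columns of each row and stops at the first set bit. The AB contents below are hypothetical: only the positions the slides report as set (3, 16, 25) are marked, every other position is assumed to be zero, and the queried rows are assumed to be 4 through 7.

```python
AB_SIZE = 32


def H(i: int, j: int) -> int:
    # Same mapping as the example: decimal concatenation, then mod 32.
    return int(f"{i}{j}") % AB_SIZE


# Hypothetical AB: only the probes shown as set on the slides are set here.
set_positions = {3, 16, 25}
ab = [1 if pos in set_positions else 0 for pos in range(AB_SIZE)]


def query_rows(ab, rows, cols):
    answer = []
    for i in rows:
        # any() stops probing as soon as one queried column is set for row i.
        answer.append(1 if any(ab[H(i, j)] for j in cols) else 0)
    return answer


# Columns 7 and 8 stand for categories C1 and C2 of the third attribute.
print(query_rows(ab, rows=range(4, 8), cols=[7, 8]))  # [1, 1, 1, 0]
```

Row 4 is reported as a match only because position 16 happens to be set, which is the false positive visible when the approximate answer is compared with the exact one.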
Approximate Bitmaps (AB) – Mapping Function F
  • F maps each cell in the bitmap table to a unique string (the hashing string)
  • For one AB per table and one AB per attribute, the bit in row i column j is identified by
    • F(i,j) = i << w || j, where w is large enough to accommodate all j
  • For one AB per column, the bit in row i is identified by
    • F(i,j) = i
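
A minimal sketch of the shift-based mapping, assuming `||` denotes placing j in the low w bits (a bitwise OR after the shift) and that columns are numbered from 1. The helper `make_mapping` is hypothetical.

```python
def make_mapping(num_columns: int):
    # w bits are enough to hold every column number j in 1..num_columns.
    w = num_columns.bit_length()

    def F(i: int, j: int) -> int:
        # Row number in the high bits, column number in the low w bits.
        return (i << w) | j

    return F, w


F, w = make_mapping(num_columns=9)
print(w, F(5, 4))  # 4 84
```

Because j always fits in w bits, two different cells never map to the same hashing string.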
Approximate Bitmaps (AB) – Hash Functions
  • Single Hash Function
    • Called once and the result is divided into pieces.
    • Each piece considered as the value of a different hash function.
    • Secure Hash Algorithm (SHA), published by the National Institute of Standards and Technology (NIST)
  • Multiple Hash Functions
    • Independent hash functions
    • For a large number of hash functions, similar performance

Hash Function   H0                 H1                 H2                 ...   H9
Bits            159..144           143..128           127..112           ...   15..0
SHA Output      0100100010001010   1000010100100001   0111100011100010   ...   0000010101110011
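
The single-hash scheme can be sketched with Python's `hashlib`: one SHA-1 call yields 160 bits, which are cut into ten 16-bit pieces H0..H9. How the hashing string is serialized before hashing, and how a 16-bit piece is reduced to an m-bit AB position, are not stated on the slides; the string encoding and the modulo below are assumptions.

```python
import hashlib


def sha1_pieces(x: int, piece_bits: int = 16):
    # Assumption: the hashing string x is hashed as its decimal ASCII text.
    digest = hashlib.sha1(str(x).encode("ascii")).digest()  # 160 bits
    value = int.from_bytes(digest, "big")
    mask = (1 << piece_bits) - 1
    count = 160 // piece_bits
    # Piece t covers bits (159 - 16t) .. (144 - 16t), so H0 is bits 159..144.
    return [(value >> (160 - piece_bits * (t + 1))) & mask for t in range(count)]


def ab_positions(x: int, k: int, m: int):
    # Assumption: reduce each of the first k pieces to an m-bit AB index.
    return [piece % (1 << m) for piece in sha1_pieces(x)[:k]]


pieces = sha1_pieces(54)
print(len(pieces))                 # 10 pieces from one SHA-1 call
print(ab_positions(54, k=3, m=5))  # three positions in a 32-bit AB
```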

Approximate Bitmaps (AB) – FP Rate
  • FP Rate: Probability that all k bits are set by another data object
  • n is the size of the AB
  • s is the number of set bits
  • n = αs, α = n/s
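
In terms of these symbols, the FP rate follows from the standard Bloom filter analysis. The derivation below is a sketch that assumes the k hash functions are uniform and independent; it agrees with the precision expression $1 - (1 - e^{-k/\alpha})^k$ used elsewhere in the presentation.

```latex
\begin{align*}
\Pr[\text{a given bit is still 0 after inserting } s \text{ set bits}]
  &= \left(1 - \frac{1}{n}\right)^{ks} \approx e^{-ks/n} \\
FP &= \left(1 - \left(1 - \frac{1}{n}\right)^{ks}\right)^{k}
   \approx \left(1 - e^{-ks/n}\right)^{k}
   = \left(1 - e^{-k/\alpha}\right)^{k}
\end{align*}
```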
Approximate Bitmaps (AB) – Size
  • In terms of α:
    • n = αs
    • m = ceil(log2(αs))
  • One AB per dataset:
    • s = |A|*N, where |A| is the number of attributes and N is the number of rows
  • One AB per attribute:
    • s = N
  • One AB per column:
    • s depends on the data distribution
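
The sizing rule can be illustrated with a small helper. The name `ab_bits` is an assumption, and the inputs are the values from the AB example (α = 1.33, s = 3 attributes × 8 rows = 24 set bits).

```python
import math


def ab_bits(alpha: float, s: int):
    # m = ceil(log2(alpha * s)); the AB then holds 2**m bits.
    m = math.ceil(math.log2(alpha * s))
    return m, 2 ** m


num_attributes, num_rows = 3, 8
s = num_attributes * num_rows      # one AB per dataset: s = |A| * N
print(ab_bits(1.33, s))            # (5, 32)
```

Because m is rounded up, the AB can be somewhat larger than αs, so the effective α is at least the requested one.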
Experimental Setup
  • Three datasets:
  • Query by sampling (randomly selecting the columns queried)
  • Varying the number of rows queried from 100 to 10K
Experimental Results - Size
  • Always use the maximum α that produces an AB smaller than, or comparable in size to, the WAH-compressed bitmap
Experimental Results - Precision
  • As α increases, the precision increases steadily and is very close to 1 for larger α
  • Precision increases as k increases, up to an optimum point
  • Beyond that point precision drops, because a large number of hash functions produces more collisions
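
The effect of k can be seen by evaluating the estimated precision $1 - (1 - e^{-k/\alpha})^k$ for a fixed α. The value α = 8 is a hypothetical choice for illustration and is not taken from the experiments; the standard Bloom filter analysis places the optimum near k = α ln 2.

```python
import math


def precision(alpha: float, k: int) -> float:
    return 1 - (1 - math.exp(-k / alpha)) ** k


alpha = 8  # hypothetical value, not one of the experimental settings
by_k = {k: precision(alpha, k) for k in range(1, 13)}
best_k = max(by_k, key=by_k.get)

for k, p in by_k.items():
    print(k, round(p, 4))
print("best k:", best_k, "| alpha * ln 2 =", round(alpha * math.log(2), 2))
```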
Experimental Results – Exec Time
  • Execution time of the AB depends on the number of rows queried, not on the number of rows in the dataset
  • For queries over fewer than 10-15% of the rows, AB execution is up to 3 orders of magnitude faster than WAH
Conclusion
  • AB encoding approximates the bitmaps using multiple hashing of the set bits
  • Allows efficient retrieval of any subset of rows and columns
  • Trade-off between bitmap size and precision
  • Three levels of encoding
  • Approximate query answers are given without database access
Questions and Comments
  • Thank you!

Email: canahuat@cse.ohio-state.edu