Indexing Scientific Data With FastBit

  1. Indexing Scientific Data With FastBit
• Motivating examples
  • Find the collision events with the most distinct signature of Quark Gluon Plasma
  • Find the ignition kernels in a combustion simulation
  • Track a layer of an exploding supernova
• These are not typical database searches:
  • Large, high-dimensional data sets (1000 time steps × 1000 × 1000 × 1000 cells × 100 variables)
  • Most data records are never modified, i.e., the data are append-only
  • Multi-dimensional queries: 500 < Temp < 1000 && CH3 > 10^-4 && …
  • Large answers (hits on thousands or millions of records)
  • Seek collective features, e.g., regions of interest, rather than averages and sums
• New searching technology is needed

  2. A Good Candidate: Bitmap Index
• First commercial version
  • Model 204, P. O’Neil, 1987
• Takes less time to build than B-trees
• Efficient for querying: only bitwise logical operations
  • A < 2 → b0 OR b1
  • A > 2 → b3 OR b4 OR b5
• Efficient for multi-dimensional queries
  • Use bitwise operations to combine the partial results
• Size may be large: one bit per distinct value per row
  • Definition: cardinality = number of distinct values
  • Compact for low-cardinality attributes, say, cardinality < 100
  • Worst case: cardinality = N, the number of rows; index size: N×N bits

Example: data values of attribute A and the bitmaps b0 to b5 (one per value, A = 0 to 5)

    A    b0  b1  b2  b3  b4  b5
         =0  =1  =2  =3  =4  =5
    0    1   0   0   0   0   0
    1    0   1   0   0   0   0
    5    0   0   0   0   0   1
    3    0   0   0   1   0   0
    1    0   1   0   0   0   0
    2    0   0   1   0   0   0
    0    1   0   0   0   0   0
    4    0   0   0   0   1   0
    1    0   1   0   0   0   0
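To make the bitwise evaluation concrete, here is a minimal sketch of an equality-encoded bitmap index built over the slide's example column A. It assumes uncompressed bitmaps stored as Python integers (bit i set means row i holds that value); the helper names and the second attribute B are hypothetical and are not FastBit's API. Range conditions OR together the bitmaps of the qualifying values, and a multi-dimensional condition ANDs the partial results.

```python
# Minimal sketch of an equality-encoded bitmap index, assuming uncompressed
# bitmaps stored as Python ints (bit i set <=> row i has that value).
# build_index, range_query and rows are hypothetical helpers, not FastBit's API.


def build_index(values):
    """Return {value: bitmap}; bitmap has bit i set iff values[i] == value."""
    index = {}
    for row, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row)
    return index


def range_query(index, lo=None, hi=None):
    """OR together the bitmaps of every distinct value v with lo < v < hi."""
    result = 0
    for v, bitmap in index.items():
        if (lo is None or v > lo) and (hi is None or v < hi):
            result |= bitmap
    return result


def rows(bitmap):
    """List the row numbers whose bits are set in bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


A = [0, 1, 5, 3, 1, 2, 0, 4, 1]  # data values from the slide's example
B = [7, 3, 9, 8, 6, 1, 2, 9, 5]  # hypothetical second attribute
idx_a, idx_b = build_index(A), build_index(B)

print(rows(range_query(idx_a, hi=2)))  # A < 2: b0 OR b1 -> [0, 1, 4, 6, 8]
print(rows(range_query(idx_a, lo=2)))  # A > 2: b3 OR b4 OR b5 -> [2, 3, 7]

# Multi-dimensional query: combine the partial results with a bitwise AND.
both = range_query(idx_a, hi=2) & range_query(idx_b, lo=4)
print(rows(both))  # A < 2 AND B > 4 -> [0, 4, 8]

# Uncompressed size: one bit per distinct value per row.
print(len(idx_a) * len(A), "bits")  # 6 distinct values x 9 rows = 54 bits
```

The last line shows why size grows with cardinality: a column with N distinct values over N rows would need N×N bits before compression.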

  3. Compression Makes It Better
• Main idea: use run-length encoding, but...
  • Partition bits into 31-bit groups (not 32-bit groups) on 32-bit machines
  • Merge neighboring groups with identical bits
  • Encode each group using one 32-bit word
• Name: Word-Aligned Hybrid (WAH) code
• Key features:
  • Compressed indices are typically 30% of the raw data size
  • 10X faster in answering queries than the most competitive bitmap index
  • Worst-case index size is 4N words, not N×N bits

Example: 2015 bits (65 groups of 31 bits)
    10000000000000000000011100000000000000000000000000000…00000000000000000000000000000001111111111111111111111111
    31 bits | 31 bits | (62 groups skipped) … | 31 bits
  Encoded as three 32-bit words:
    0 | 31 literal bits          (first group, stored as is)
    1 | 0 | 30-bit count = 63    (fill word: 63 groups of 0s)
    0 | 31 literal bits          (last group, stored as is)
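The encoding rules above can be sketched as a small encoder and decoder for 32-bit words. This is an illustrative sketch under stated assumptions, not FastBit's implementation: the function names are hypothetical, the input length is assumed to be a whole number of 31-bit groups (a real implementation keeps the trailing partial group separately), and even a single uniform group is turned into a fill word. A literal word has MSB 0 followed by 31 raw bits; a fill word has MSB 1, then the fill bit, then a 30-bit group count. The usage reproduces the slide's 2015-bit example.

```python
# Minimal WAH encoder/decoder sketch for 32-bit words. Assumptions: the input
# length is a multiple of 31, and even a single uniform group becomes a fill
# word (implementations may differ). wah_encode/wah_decode are hypothetical names.

GROUP = 31                  # payload bits per 32-bit word
COUNT_MASK = (1 << 30) - 1  # fill words keep the group count in the low 30 bits


def wah_encode(bits):
    """Encode a list of 0/1 values into a list of 32-bit WAH words."""
    assert len(bits) % GROUP == 0, "sketch handles whole 31-bit groups only"
    words = []
    for start in range(0, len(bits), GROUP):
        group = bits[start:start + GROUP]
        if all(b == group[0] for b in group):
            fill_bit = group[0]
            prev = words[-1] if words else None
            # Extend the previous fill word when it holds the same bit value
            # and its 30-bit count has not overflowed.
            if (prev is not None and prev >> 31 == 1
                    and ((prev >> 30) & 1) == fill_bit
                    and (prev & COUNT_MASK) < COUNT_MASK):
                words[-1] = prev + 1
            else:
                # Fill word: MSB 1, then the fill bit, then a count of 1.
                words.append((1 << 31) | (fill_bit << 30) | 1)
        else:
            # Literal word: MSB 0, followed by the 31 bits as they are.
            value = 0
            for b in group:
                value = (value << 1) | b
            words.append(value)
    return words


def wah_decode(words):
    """Expand WAH words back into a list of 0/1 values."""
    bits = []
    for w in words:
        if w >> 31:  # fill word
            fill_bit = (w >> 30) & 1
            bits.extend([fill_bit] * (GROUP * (w & COUNT_MASK)))
        else:        # literal word
            bits.extend((w >> (GROUP - 1 - i)) & 1 for i in range(GROUP))
    return bits


# The 2015-bit example: a literal group, 63 all-zero groups, a literal group.
first_group = [1] + [0] * 20 + [1, 1, 1] + [0] * 7
last_group = [0] * 6 + [1] * 25
example = first_group + [0] * (63 * GROUP) + last_group  # 65 x 31 = 2015 bits

words = wah_encode(example)
print([f"{w:08X}" for w in words])  # ['40000380', '8000003F', '01FFFFFF']
assert wah_decode(words) == example
```

Because every word is aligned to the machine word, logical operations can work on whole words (and skip entire fills) without unpacking individual bits, which is the design choice behind using 31-bit rather than 32-bit groups.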

  4. Handling Collective Features: Regions of Interest
• FastBit has been used in
  • GridCollector for the High-Energy Physics experiment STAR
  • Dexterous Data Explorer (DEX) for query-driven visualization
  • Dynamic histogramming for network traffic analysis
• The figure on the right illustrates our region-growing approach
  • Figure components: Data, FastBit Index, Query, Region Growing, Region Tracking
  • 2-D connected regions are identified with line segments (in green)
  • The line segments come out of FastBit compressed bitmaps
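The slide does not spell out how line segments become connected regions. One generic way, shown here as a sketch rather than FastBit's actual algorithm, is to take each row's runs of 1s as segments and merge segments in adjacent rows whose column ranges overlap, using union-find. Assumptions: the query result is given as plain 0/1 rows standing in for FastBit's compressed bitmaps, connectivity is 4-neighbor, and all names are hypothetical.

```python
# Sketch of region growing from per-row line segments (runs of 1s), assuming
# plain 0/1 rows instead of compressed bitmaps and 4-connectivity.
# row_segments and label_regions are hypothetical names, not FastBit's API.


def row_segments(row):
    """Return (start, end) column pairs, end exclusive, for each run of 1s."""
    segs, start = [], None
    for col, bit in enumerate(row + [0]):  # sentinel 0 closes a trailing run
        if bit and start is None:
            start = col
        elif not bit and start is not None:
            segs.append((start, col))
            start = None
    return segs


def label_regions(grid):
    """Group the line segments of a 2-D 0/1 grid into connected regions."""
    segments = []  # (row, start, end)
    for r, row in enumerate(grid):
        segments.extend((r, s, e) for s, e in row_segments(row))
    parent = list(range(len(segments)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Segments in adjacent rows touch if their column ranges overlap.
    for i, (ri, si, ei) in enumerate(segments):
        for j, (rj, sj, ej) in enumerate(segments):
            if rj == ri + 1 and si < ej and sj < ei:
                union(i, j)

    regions = {}
    for i, seg in enumerate(segments):
        regions.setdefault(find(i), []).append(seg)
    return list(regions.values())


grid = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
]
for region in label_regions(grid):
    print(region)  # each region is a list of (row, start, end) segments
```

The pairwise comparison is quadratic in the number of segments to keep the sketch short; comparing only the segments of consecutive rows, which are already sorted by column, would make it linear.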

  5. Future Plans
• Software development
  • Release FastBit under LGPL (John, March ’07)
  • FastBit integration with ROOT (John, Sept ’07)
  • FastBit integration with HDF5 for particle physics (Kurt)
• Finding regions of interest
  • Existing work only dealt with data on regular meshes
  • Working on extensions to AMR meshes (Kurt), GTC meshes (John), and tetrahedral meshes (Rishi)
• New features (research)
  • Parallel version
  • Table groups / partitions
  • Range join

More Related