Retrieving Objects from Toroidal Mesh Data Using FastBit Technology – A Progress Report Outline Overview of FastBit technology Recent progress John Wu Scientific Data Management, Berkeley Lab http://sdm.lbl.gov/fastbit
FastBit Started In a Big Smash Searching for clues to Quark-Gluon Plasma in a large set of high-energy collisions High-Energy Physics experiment STAR 600 participants / 50 institutions / 12 countries Data rate 200 MB/s Data collected 5 PB ~ 1 Billion collision events, 5 MB per event (equivalent to having millions of variables) Challenge: finding 100 or so events with the best evidence of QGP
FastBit 10x Faster than DBMS Queries on the 12 most queried attributes (2.2 million records) from the STAR High-Energy Physics Experiment, average attribute cardinality 222,000 Experiments confirm that: WAH compressed indexes are 10X faster than bitmap indexes from a DBMS and 5X faster than our own implementation of BBC Size of WAH compressed indexes is only 30% of raw data size (a B+-tree index from a popular DBMS is 3-4X the raw data size) Figure panels: 2-D queries, 5-D queries [Wu, Otoo, Shoshani 2001]
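The idea of answering a multi-dimensional query with bitmap indexes can be sketched in a few lines. This is only an illustration under stated assumptions, not FastBit code: plain Python integers stand in for bitmaps, no WAH or BBC compression is applied, and the attribute names `energy` and `tracks` and their values are hypothetical toy data.

```python
from collections import defaultdict

def build_equality_index(values):
    """Map each distinct value to a bitmap; bit i is set when values[i] equals it."""
    index = defaultdict(int)
    for row, v in enumerate(values):
        index[v] |= 1 << row
    return dict(index)

def range_bitmap(index, lo, hi):
    """OR together the bitmaps of all distinct values in [lo, hi]."""
    result = 0
    for v, bitmap in index.items():
        if lo <= v <= hi:
            result |= bitmap
    return result

def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row numbers."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows

# Hypothetical toy data: two attributes of six records.
energy = [5, 9, 3, 9, 7, 1]
tracks = [2, 4, 4, 1, 4, 4]
energy_index = build_equality_index(energy)
tracks_index = build_equality_index(tracks)

# 2-D query: 5 <= energy <= 9 AND tracks == 4.
# Each condition yields one bitmap; the conditions are combined with bitwise AND.
hits = range_bitmap(energy_index, 5, 9) & range_bitmap(tracks_index, 4, 4)
print(rows_of(hits))
```

Each additional query dimension adds one more bitmap and one more bitwise AND, which is why the cost of the logical operations on the (compressed) bitmaps dominates query time.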
FastBit Grew with a Big Boom Searching for a more fuel efficient combustion engine (Homogeneous-Charge Compression Ignition engine) Requires detailed numerical simulation with hundreds of variables Simulation mesh: 1000 x 1000 x 1000 1000s of time steps per simulation Challenge: finding and tracking ignition kernels
FastBit Finds Volumes Faster Than Best Isocontour Finder FastBit finds volumes of interest efficiently with a compressed representation of the volume In theory, FastBit identifies volumes of interest as efficiently as the best algorithms that identify only the surface (isocontouring) FastBit is three times faster than the best isocontouring algorithm in VTK [Wu, Koegler, Chen, Shoshani 2003] [Stockinger, Shalf, Bethel, Wu 2005]
FastBit Milestones • 2007/08: FastBit speeds up a drug discovery tool (first publication not involving any FastBit developers) • 2007/08: First public release, version a0.7 • 2007/06: Physical design reviewed • 2007/06: First PhD thesis involving FastBit completed • 2006/03: Proved formal optimality • 2006/02: Work on Enron data made headlines at PRIMEUR • 2005/05: Appeared in ACM TechNews • 2005/05: Grid Collector wins ISC Award • 2005/01: CRD news report on FastBit • 2004/12: WAH patent issued
FastBit Progress Report Two-level encoding Feature identification on toroidal mesh http://sdm.lbl.gov/fastbit
Two Levels Are Better Than One • The most commonly used bitmap index is one-level equality encoded (e1) • Multi-level encoding was postulated to possibly improve query performance [Wu, Otoo, Shoshani, 2000] [Sinha, Winslett, 2007] • Through extensive analyses, we found the correct number of coarse-level bins to use, ensuring that the two-level encoding always performs better [Wu, Stockinger, Shoshani] bn = binary encoding, e1 = one-level equality, ee = equality-equality, re = range-equality, ie = interval-equality
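A minimal sketch of a two-level equality-equality (ee) index may help make the idea concrete. It assumes integer values, uncompressed bitmaps held in Python integers, and a hypothetical `bin_width` parameter for the coarse level; it does not reproduce the analysis that picks the number of coarse bins, nor FastBit's actual data layout.

```python
def build_two_level_index(values, bin_width):
    """Return (coarse, fine): coarse maps bin id -> bitmap, fine maps value -> bitmap."""
    coarse, fine = {}, {}
    for row, v in enumerate(values):
        bit = 1 << row
        fine[v] = fine.get(v, 0) | bit          # fine level: one bitmap per value
        b = v // bin_width
        coarse[b] = coarse.get(b, 0) | bit      # coarse level: one bitmap per bin
    return coarse, fine

def range_query(coarse, fine, bin_width, lo, hi):
    """Bitmap of rows with lo <= value <= hi (integer values)."""
    result = 0
    for b in range(lo // bin_width, hi // bin_width + 1):
        b_lo, b_hi = b * bin_width, (b + 1) * bin_width - 1
        if lo <= b_lo and b_hi <= hi:
            # Bin lies fully inside the range: a single coarse bitmap covers it.
            result |= coarse.get(b, 0)
        else:
            # Edge bin: fall back to the fine-level bitmaps.
            for v in range(max(lo, b_lo), min(hi, b_hi) + 1):
                result |= fine.get(v, 0)
    return result

# Hypothetical toy data.
values = [3, 17, 8, 12, 25, 10, 19, 0]
coarse, fine = build_two_level_index(values, bin_width=10)
answer = range_query(coarse, fine, 10, 8, 19)
```

In this sketch a range that spans many values touches one coarse bitmap per fully covered bin and fine bitmaps only in the (at most two) edge bins, which is the source of the expected savings over a one-level index.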
Feature Identification on Toroidal Mesh • Defines connectivity based on the distances computed from (x, y, z) coordinates • Two ways to speed up the feature identification • work with lines instead of points • use an efficient connected component labeling algorithm • 10 – 100 times faster than working with points [Sinha, Winslett, Wu]
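The two speedups above, working with lines instead of points and using an efficient connected component labeling algorithm, can be illustrated together. The sketch below is an assumption-laden stand-in rather than the algorithm used on the toroidal mesh: it uses a regular 2-D grid, treats each run of consecutive selected cells in a row as a "line", and merges overlapping runs in adjacent rows with a union-find structure. All function names are hypothetical.

```python
def find_runs(row):
    """Return [start, end) column spans of consecutive truthy cells in one row."""
    runs, start = [], None
    for col, cell in enumerate(row):
        if cell and start is None:
            start = col
        elif not cell and start is not None:
            runs.append((start, col))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def label_runs(mask):
    """Label 4-connected regions; returns a list of (row, start, end, label)."""
    runs, parent, previous = [], [], []
    for r, row in enumerate(mask):
        current = []
        for start, end in find_runs(row):
            idx = len(runs)
            runs.append((r, start, end))
            parent.append(idx)
            for p in previous:
                _, p_start, p_end = runs[p]
                if p_start < end and start < p_end:  # column spans overlap
                    parent[find(parent, idx)] = find(parent, p)
            current.append(idx)
        previous = current
    return [(r, s, e, find(parent, i)) for i, (r, s, e) in enumerate(runs)]

# Hypothetical selection mask: 1 marks a cell that satisfies the query.
mask = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
]
labeled = label_runs(mask)
```

The labeling work here is driven by the number of runs rather than the number of cells, which is the intuition behind working with lines instead of points. The inner loop over the previous row is kept simple for readability; a two-pointer merge of the sorted run lists would avoid the quadratic worst case.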
Better Approach – Redefine Connectivity • Redefine connectivity based on toroidal coordinates • Node A is connected to B and C on the same circle • To D and E on the circle just below the current one in the same plane • To F and G on the circle of the same radius in the plane just before • By symmetry, there are four more points on circles above and after • A total of 10 neighbors for every node – more than the previous approach • Advantages of this connectivity definition • Neighbors of consecutive nodes on a circle, i.e., arc, also form arcs • These neighboring arcs fall on four different circles • Our labeling algorithm examines only two out of four circles
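The 10-neighbor connectivity can be sketched as a function over toroidal indices. Several details below are assumptions made only for illustration and are not stated in the slides: every circle is taken to hold the same number of nodes, a node is addressed as a hypothetical `(plane, ring, pos)` triple, planes wrap around the torus while rings do not, and the two neighbors on an adjacent circle are taken at positions `pos` and `pos + 1` (below and before) or `pos - 1` and `pos` (above and after) so that the relation is symmetric.

```python
def neighbors(node, n_planes, n_rings, n_per_ring):
    """Neighbors of (plane, ring, pos) under the assumed staggered layout."""
    plane, ring, pos = node
    # B and C: the two adjacent nodes on the same circle.
    result = [
        (plane, ring, (pos - 1) % n_per_ring),
        (plane, ring, (pos + 1) % n_per_ring),
    ]
    # (plane step, ring step, position offsets) for the four adjacent circles:
    # below, above, before, after.
    for d_plane, d_ring, offsets in (
        (0, -1, (0, 1)),
        (0, 1, (-1, 0)),
        (-1, 0, (0, 1)),
        (1, 0, (-1, 0)),
    ):
        r = ring + d_ring
        if not 0 <= r < n_rings:
            continue  # assumption: rings do not wrap, so edge rings have fewer neighbors
        p = (plane + d_plane) % n_planes  # assumption: planes wrap around the torus
        for off in offsets:
            result.append((p, r, (pos + off) % n_per_ring))
    return result

# Hypothetical mesh dimensions.
N_PLANES, N_RINGS, N_PER_RING = 4, 3, 8
interior = neighbors((1, 1, 3), N_PLANES, N_RINGS, N_PER_RING)

# Neighbors of an arc (positions 2..4 on one circle) that lie on the circle below.
arc = [(1, 1, pos) for pos in (2, 3, 4)]
below = sorted({n[2] for a in arc
                for n in neighbors(a, N_PLANES, N_RINGS, N_PER_RING)
                if n[0] == 1 and n[1] == 0})
```

Under these assumptions an interior node has exactly 10 neighbors, and the neighbors of an arc on an adjacent circle are again a run of consecutive positions, which is the property that lets a labeling algorithm operate on arcs rather than individual nodes.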
New Connectivity Improves Region Finding • Preliminary results • Three different labeling methods shown • XYZ: a nearest-neighbor mesh constructed from (x, y, z) coordinates • Torus – 1: connectivity described on previous page, label nodes • Torus – 2: connectivity described previously, label arcs • Speedup = ratio of total time used by two methods
New Approach Scales Well • Approach torus – 1 scales linearly with the number of nodes in the regions of interest • Approach torus – 2 scales linearly with the number of arcs in the regions of interest • Number of arcs <= number of nodes on the boundaries of the regions • O(|arcs|) <= O(|boundary|) • For regions defined with simple range conditions such as “potential >= 1e-8”, where the boundaries of the regions are isocontours, approach torus – 2 scales as well as the best isocontouring algorithms • Need formal proof
Future Plans • GTC data • Wrap up the current work on 3D GTC data • Prepare for new 5D data • Add visualization front-end • Work with particles • FastBit software • Python API? • Other applications • Visualization • ?
FastBit is an efficient searching tool for data-driven science. Key techniques in FastBit have been extensively exercised. If you have an application that requires searching operations, feel free to contact us.