
Testing the In-Memory Column Store for in-database physics analysis


Presentation Transcript


  1. Testing the In-Memory Column Store for in-database physics analysis
     Dr. Maaike Limper

  2. About CERN
     CERN - European Laboratory for Particle Physics
     • Supports the research activities of 10 000 scientists from 110+ nationalities
     • Largest machine in the world, the Large Hadron Collider: 27 km, 6000+ superconducting magnets
     • Four main experiments: ATLAS, ALICE, CMS, LHCb
     Maaike Limper - CERN

  3. Higgs Boson discovery
     4 July 2012: Scientists from ATLAS and CMS present the Higgs discovery result. Plots of the invariant mass of photon pairs produced at the LHC show a significant bump around 125 GeV …
     • Operation of the Large Hadron Collider and its experiments relies on Oracle databases: conditions data, metadata, logging & monitoring data, …
     • … but the data points in these plots did not come out of a database

  4. CERN openlab
     My project: “Test the possibility of using the Oracle database for physics analysis”
     “CERN openlab is a unique public-private partnership between CERN and leading ICT companies. Its mission is to accelerate the development of cutting-edge solutions to be used by the worldwide LHC community”
     http://openlab.web.cern.ch

  5. In-database physics analysis
     [Figure: Higgs decay to 2 photons candidate, event display from the ATLAS experiment]

  6. In-database physics analysis
     Physics analysis database:
     • Separate physics-objects in separate tables
     • Each physics-object is described by hundreds of variables: wide tables!
     Analysis queries:
     • Predicate filtering to quickly apply object quality criteria
     • Each analysis-specific query uses a unique combination of columns
     [Plot labels: J/ψ, Ψ(3686)]

  7. The problem
     • Analysis query performance is typically limited by I/O reads
     • Full table scans run over tables with many columns, while only a few columns are used for each specific analysis
     • The combination of columns is unique for each query
     • Can’t index every column!

  8. In-Memory Column Store
     • Profit from fast in-memory reads
     • Read only the columns relevant for the specific analysis query
     Oracle’s In-Memory Column Store provides a solution to reduce I/O read time, especially for tables with many columns

  9. Compression rates
     17/6/2014
     • COMPRESS FOR QUERY vs CAPACITY HIGH
     • “electron”: typical physics-object data, a mixture of int, float, double
     • “Event Filter”: only booleans (mostly false), best compression
     • “Missing Energy”: table with floats & doubles, worst compression
     The average compression rate of the dataset is 2.1 with query compression and 3.6 with capacity high: physics-objects represent the bulk of the data

  10. Simple query performance
      • IMC vs “read from disk”: 1000x faster
      • IMC vs “read from buffer cache”: 40x faster
      Note: 2x more memory is needed to put the data in the buffer cache than to place it in the In-Memory Column Store!

  11. Complex query performance
      With IMC it takes only 10 s to make this plot, allowing the analyst to quickly optimize results while trying different variable combinations
      • IMC vs “read from disk”: 70x faster
      • IMC vs “read from buffer cache”: 7x faster

  12. Conclusion
      IMC’s STAR story:
      • Situation: In-database physics analysis is limited by I/O
      • Task: Remove the I/O bottleneck for any query using any combination of columns in a table
      • Action: Use Oracle’s In-Memory Column Store
        • Take advantage of fast reads from cache
        • Columnar compression increases the size of data that fits in memory
        • Access only relevant columns and use predicate pruning to further reduce I/O
      • Result: I/O bottleneck removed, real-time in-database physics analysis is now possible*
      *While the Oracle database is not currently used for physics analysis, this study shows promising results using the In-Memory Column Store for in-database physics analysis
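
The argument of slides 7 and 8 (a full scan of a wide table reads every column, while a column store reads only the columns a query touches) can be illustrated with a back-of-envelope model. The Python sketch below is only that: it is not how Oracle's In-Memory Column Store is implemented, and the table shape (1 000 000 rows, 300 columns, 5 columns used, 8 bytes per value) is a hypothetical example, not a figure from the slides. The only number taken from the presentation is the average 2.1 compression rate for query compression, and applying that dataset-wide average to the selected columns is itself an assumption.

```python
def row_store_scan_bytes(n_rows: int, n_columns: int, bytes_per_value: int) -> int:
    """Bytes read by a full table scan when all columns of a row are stored together."""
    return n_rows * n_columns * bytes_per_value


def column_store_scan_bytes(
    n_rows: int,
    n_columns_used: int,
    bytes_per_value: int,
    compression_ratio: float = 1.0,
) -> float:
    """Bytes read when only the columns a query touches are scanned."""
    return n_rows * n_columns_used * bytes_per_value / compression_ratio


# Hypothetical table shape, chosen only for illustration (not from the slides).
N_ROWS = 1_000_000
N_COLUMNS = 300        # "hundreds of variables" per physics-object
N_COLUMNS_USED = 5     # "only a few columns" per analysis query
BYTES_PER_VALUE = 8    # assume every value is a double

row_bytes = row_store_scan_bytes(N_ROWS, N_COLUMNS, BYTES_PER_VALUE)
col_bytes = column_store_scan_bytes(N_ROWS, N_COLUMNS_USED, BYTES_PER_VALUE)

# 2.1 is the average query-compression rate quoted on slide 9; using it for
# these particular columns is an assumption.
col_bytes_compressed = column_store_scan_bytes(
    N_ROWS, N_COLUMNS_USED, BYTES_PER_VALUE, compression_ratio=2.1
)

print(f"row-oriented full scan:      {row_bytes / 1e6:.1f} MB")
print(f"columnar scan, uncompressed: {col_bytes / 1e6:.1f} MB")
print(f"columnar scan, compressed:   {col_bytes_compressed / 1e6:.1f} MB")
print(f"reduction from column pruning alone: {row_bytes / col_bytes:.0f}x")
```

In this simplified model the reduction from column pruning is just the ratio of total columns to columns used. The measured speedups on slides 10 and 11 also reflect reading from memory instead of disk and predicate pruning, which the model does not attempt to capture.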
