



  1. Oracle: A Role in LHC Data Handling? Jamie Shiers, IT-DB. Based on work with early releases of Oracle 9i by IT-DB + experiments

  2. The Story So Far… • 1992: CHEP – DB panel, CLHEP K/O, CVS … • 1994: start of OO projects • 1997: proposal of ODBMS+MSS; BaBar • 2001: CMS change of baseline Objy • 2003: POOL ready for production • POOL production plans include use of EDG Replica Location Service • Deployment on Oracle 9iAS+9iRAC/DB at Tier0/1?

  3. ODBMS • BaBar (SLAC) claims probably the largest DB in the world • 681.8 TB stored in 473,205 files • CERN: COMPASS (300TB) + HARP (30TB) + CMS (300TB) + others • Recently migrated 300TB out of the ODBMS at 120MB/s • Oracle + ‘flat file’ solution • Many high-level similarities to the LHC proposal • Time pressure required a pragmatic solution – could not wait for POOL

  4. Migration History - Data Rates http://lxshare075d:8888/

  5. Data Processing Diagram [Diagram: processing nodes read from 2x200GB input disk pools fed by Castor (9940/9940B tape drives) and write via an output disk pool to Oracle and back to Castor.] 10 MB/s overall data throughput per node; sustained rates of 120MB/s over 24-hour periods

  6. Oracle for LHC? Numerous concrete examples for non-Physics data: • Machine construction / controls • Detector construction / assembly • Physics infrastructure (book-keeping, catalogues etc.)

  7. Oracle & LHC – What is ~Clear • Will continue to be used as part of the EDMS service • Will continue to be used “à la LEP” for logging, monitoring and control of LHC • Will continue to be used for detector construction / assembly / monitoring • Total Data: ~10TB • 2nd Sun cluster for physics apps, ~300GB disk • Possibly growing to 10TB by LHC startup

  8. Oracle & LHC – What is Likely • Will continue to be used as part of the EDMS service • Will continue to be used “à la LEP” for logging, monitoring and control of LHC • Will continue to be used for detector construction / assembly / monitoring • Total Data: ~10TB • 2nd Sun cluster for physics apps, ~300GB disk • CERN Engineering Data Management System: ~300,000 documents, many related to LHC construction

  9. Oracle Usage: Some Examples • Detector DB • Conditions DB • Run Catalogues • The Grid Details in hidden slides

  10. The Grid • Example: POOL file catalogue (based on EDG-RLS)

  11. POOL File Catalogue • Require ~10^6 entries / expt now • Rising to ~10^8 / 10^9 (?) in 2008 / 2020 • A few kB / entry; a few TB total • Implementation based on EDG-RLS • Deployed at Tier0/Tier1 on Oracle 9iAS / Oracle 9iRAC • Have to demonstrate it can meet requirements (# concurrent users / transaction rate / manageability / cost of ownership) • Fall-back: 9iAS + non-RAC (Tomcat/MySQL at Tier2/3) • Open question about event-level meta-data • COMPASS / HARP: 100-200 bytes/event • LEP “collaboration Ntuple”: 200 columns = 1kB/event • Could result in 100TB – 1PB data volumes
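The catalogue and meta-data sizing on this slide can be checked with simple arithmetic. A minimal sketch; the 2 kB entry size ("a few kB") and the event counts of 10^11-10^12 are my assumptions, chosen to reproduce the quoted ranges:

```python
# Back-of-envelope check of the POOL file catalogue figures.
KB, TB, PB = 1024, 1024**4, 1024**5

entry_size = 2 * KB                       # "a few kB / entry" (assumed 2 kB)
catalogue_tb = 10**9 * entry_size / TB    # ~10^9 entries by the LHC era
print(f"catalogue: {catalogue_tb:.1f} TB")        # "a few TB total"

# Event-level meta-data at ~1 kB/event (LEP-style Ntuple):
meta_lo = 10**11 * KB / TB   # 10^11 events -> order 100 TB
meta_hi = 10**12 * KB / PB   # 10^12 events -> order 1 PB
print(f"event meta-data: ~{meta_lo:.0f} TB .. ~{meta_hi:.2f} PB")
```

This reproduces the "a few TB total" catalogue estimate and the quoted 100TB - 1PB range for event-level meta-data.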

  12. Oracle for Physics Data • Focus on scalability issues: • Current Very Large Database (VLDB) market is in the 1-50TB range • Can we really extend this by 3 orders of magnitude?

  13. Oracle for Physics Data - Key Issues • Complexity of data • Oracle’s support for Objects? • C++ binding (OCCI) • Volume of data • Several hundred PB • Oracle 9i technologies: • VLDB support • 9iRAC

  14. Oracle for Physics Data - Key Issues • Complexity of data • Oracle’s support for Objects? • C++ binding • Oracle C++ Call Interface (OCCI) • Object Type Translator (OTT) • Volume of data • Oracle 9i technologies

  15. OCCI / OTT Can handle HEP data models • Define data model using SQL • Generate C++ definitions & code using OTT • Add user attributes & code in classes that inherit from generated ones • Tested for a variety of non-trivial data models • Objects embedded by value and/or reference • Arrays of … • Polymorphic tables • Templated transient classes with multiple inheritance on the transient side

  16. Oracle for Physics DataKey Issues • Complexity of data • Extensive use of Oracle’s support for Objects • C++ binding (OCCI) • Volume of data • Several hundred PB • Oracle 9i technologies: • VLDB support • 9iRAC

  17. Data [Diagram: data flow at Tier0 – RAW (1PB/yr; 1PB/s prior to reduction!), ESD (100TB/yr), AOD (10TB/yr), TAG (1TB/yr); sequential access to RAW, random access by users to TAG; data exported to Tier1.]

  18. LHC Data Volumes
  Data Category                Annual     Total
  RAW                          1-3PB      10-30PB
  Event Summary Data (ESD)     100-500TB  1-5PB
  Analysis Object Data (AOD)   10TB       100TB
  TAG                          1TB        10TB
  Total per experiment         ~4PB       ~40PB
  Grand totals (15 years)      ~16PB      ~250PB
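A quick cross-check of the table: summing the upper ends of the per-category annual figures gives the same order as the quoted per-experiment and grand totals (the quoted ~4PB / ~16PB / ~250PB presumably include rounding and overheads):

```python
# Upper-end annual volumes per experiment, in TB, from the table above.
annual_tb = {"RAW": 3000, "ESD": 500, "AOD": 10, "TAG": 1}

per_expt_pb = sum(annual_tb.values()) / 1000      # ~3.5 PB, i.e. "~4 PB"
all_expts_pb = 4 * per_expt_pb                    # 4 experiments per year
fifteen_years_pb = 15 * all_expts_pb              # programme lifetime

print(f"{per_expt_pb:.2f} PB/expt/yr, "
      f"{all_expts_pb:.1f} PB/yr, {fifteen_years_pb:.0f} PB total")
```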

  19. Divide & Conquer • Split data from different experiments • Split different data categories • Different schema, users, access patterns,… • Focus on mainstream technologies & low-risk solutions • VLDB target: 100TB databases • How do we build 100TB databases? • How do we use 100TB databases to solve a 100PB problem?

  20. Why 100TB DBs? • Possible today • Expected to be mainstream within a few years • Vendors must provide support • (See also hidden slides)

  21. Oracle for Physics Data - Key Issues • Complexity of data • Extensive use of Oracle’s support for Objects • C++ binding (OCCI) • Volume of data • Several hundred PB • Oracle 9i technologies: • 9iRAC • VLDB support

  22. Potential Benefits of 9iRAC • Scalability • Allows 100TB databases to be supported using commodity h/w: Intel/Linux server nodes • Manageability • A small number of RACs is manageable with foreseeable resources; tens to hundreds of smaller single instances are not • Better Resource Utilization • Shared-disk architecture avoids hot-spots and idle / overworked nodes • Shared cache improves performance for frequently accessed read-only data

  23. LHC Data Volumes
  Data Category                Annual     Total
  RAW                          1-3PB      10-30PB
  Event Summary Data (ESD)     100-500TB  1-5PB
  Analysis Object Data (AOD)   10TB       100TB
  TAG                          1TB        10TB
  Total per experiment         ~4PB       ~40PB
  Grand totals (15 years)      ~16PB      ~250PB

  24. 100TB DBs & LHC Data • Analysis data: 100TB OK for ~10 years • One 9iRAC per experiment • Intermediate: 100TB ≈ 1 year’s data • ~40 9iRACs • RAW data: 100TB ≈ 1 month’s data • 400 9iRACs to handle all RAW data • 10 RACs / year × 10 years × 4 experiments
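The RAC count on this slide follows directly from 100 TB per 9iRAC and the ~1 PB/experiment/year of RAW data from the volume table:

```python
# Arithmetic behind "400 9iRACs to handle all RAW data".
RAC_CAPACITY_TB = 100
raw_per_expt_per_year_tb = 1000   # ~1 PB/year of RAW per experiment
years, experiments = 10, 4

racs_per_expt_per_year = raw_per_expt_per_year_tb // RAC_CAPACITY_TB  # 10
total_racs = racs_per_expt_per_year * years * experiments
print(total_racs)  # 400, as quoted
```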

  25. RAW Data: a few PB / year • Access pattern: sequential • Access frequency: ~once per year • Use time partitioning + offline tablespaces • Historic data copied to “tape” • Possibly dropped from the DB catalogue • Restored on demand • 100TB = 10-day time window • Current data (1st RAC); historic data (2nd RAC)
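The "100TB = 10-day time window" figure is consistent with the "few PB / year" rate quoted in the slide title:

```python
# A 100 TB window covering 10 days implies ~10 TB/day of RAW data,
# i.e. a few PB per year.
window_tb, window_days = 100, 10
daily_tb = window_tb / window_days        # 10 TB/day
annual_pb = daily_tb * 365 / 1000         # ~3.7 PB/year
print(daily_tb, round(annual_pb, 2))
```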

  26. Partitions & Files: Limits • Currently limited to 2^16 partitions • 179 years if 1 partition / day • ~500TB DBs with ~10GB files • Current practical limit is 38,000 files / DB • Sufficient to build 100TB DBs • Limits need to be raised at some stage in the future…
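Checking the limit figures on this slide:

```python
# 2^16 partitions at one partition per day lasts ~179 years,
# and the practical file-count limit caps the DB size at a few
# hundred TB with ~10 GB files -- comfortably above the 100 TB target.
partitions = 2 ** 16
print(int(partitions / 365))        # ~179 years, as quoted

files, file_gb = 38_000, 10         # practical limit, ~10 GB files
max_db_tb = files * file_gb / 1000
print(max_db_tb)                    # ~380 TB, the "~500TB DBs" order
```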

  27. Event Summary Data (ESD) • ~100-500TB / experiment / year • Yotta-byte DBs predicted by 2020! (1 YB = 10^12 TB) • Can RAC capabilities grow fast enough to permit just 1 RAC / experiment? • +500TB / year • An open question …

  28. Oracle Deployment [Diagram: DAQ cluster holds current data only (no history) and exports tablespaces to a RAW cluster, which moves data to/from the MSS; reconstruction feeds an ESD cluster (1/year?); analysis uses an AOD/TAG cluster (1 in total?); data flows to and from the RCs.]

  29. VLDB issues • Oracle is addressing the limits of the current architecture • Already permits 2EB databases, theoretically… • Limits on e.g. # files, partitions etc. are expected to be significantly increased beyond Oracle 9i • Limited to 2^16 architecturally, 38K measured • An area of work, but not of concern…

  30. Storage Issues • Oracle Number format (up to 22 bytes) provides greater precision than IEEE double (8 bytes) • Mapping 1000 classes with numeric data members to Oracle Number requires effort! • Solutions being investigated to allow efficient storage of floats / doubles / ints without the user specifying precision / range • Target: next major Oracle release?
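The size/precision trade-off above can be illustrated in Python (a sketch only: `decimal` with 38-digit precision stands in for Oracle's NUMBER, which stores up to 38 decimal digits in at most 22 bytes, versus the 8-byte IEEE double):

```python
# IEEE double: 8 bytes, ~15-17 significant decimal digits.
# Oracle NUMBER (emulated here via decimal): up to 38 decimal digits,
# at up to 22 bytes on disk -- more precise, but larger per value.
import struct
from decimal import Decimal, getcontext

print(struct.calcsize('=d'))   # 8 bytes for an IEEE 754 double

getcontext().prec = 38         # NUMBER-like 38-digit precision
d = Decimal(1) / Decimal(3)
print(len(str(d).replace('0.', '')))   # 38 significant digits kept
```

This is why a naive mapping of every C++ float/double to NUMBER is lossless but storage-inefficient, motivating the native float/double storage mentioned on the slide.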

  31. Oracle & CERN

  32. If You Want to Know More… http://cern.ch/LCG/ http://cern.ch/db/ http://cern.ch/hep-proj-database/

  33. Summary – Oracle for LHC • A clear & important role to play • Likely to be used for non-event data • Hybrid solution (POOL) is the baseline for physics data • RDBMS backend to POOL in progress
