1 / 69

Object Databases as Data Stores for HEP

Object Databases as Data Stores for HEP. Dirk Düllmann CERN, IT/ASD & RD45. Outline Part I. Intro HEP Data Models Data Management for LHC Object Database Features What makes ODBMS suitable for HEP Data Models ? Objectivity/DB Features Scalability Data Management and Distribution

perry-sloan
Download Presentation

Object Databases as Data Stores for HEP

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Object Databases as Data Stores for HEP Dirk Düllmann CERN, IT/ASD & RD45 INFN School on Modern OO s/w Applications and Methodologies

  2. Outline Part I • Intro • HEP Data Models • Data Management for LHC • Object Database Features • What makes ODBMS suitable for HEP Data Models ? • Objectivity/DB Features • Scalability • Data Management and Distribution • Alternative products and approaches • HEP Projects using Objectivity/DB • Prototypes and production systems INFN School on Modern OO s/w Applications and Methodologies

  3. ALICE • Heavy ion experiment at LHC • Studying ultra-relativistic nuclear collisions • Relatively short running period • 1 month/year = 1PB/year • Extremely high data rates • 1.5GB/s INFN School on Modern OO s/w Applications and Methodologies

  4. ATLAS • General-purpose LHC experiment • High Data rates: • 100MB/second • High Data volume • 1PB/year • Test beam projects using Objectivity/DB in preparation: • Calibration database • Expect 600GB raw and analysis data INFN School on Modern OO s/w Applications and Methodologies

  5. CMS • General-purpose LHC experiment • Data rates of 100MB/second • Data volume of 1PB/year • Two test beams projects based on Objectivity successfully completed. • Database used in the complete chain:Test beam DAQ, Re-construction and Analysis INFN School on Modern OO s/w Applications and Methodologies

  6. LHCb • Dedicated experiment looking for CP-violation in the B-meson system. • Lower data rates than other LHC experiments. • Total data volume around 400TB/year. INFN School on Modern OO s/w Applications and Methodologies

  7. Event Tracker Calor. TrackList HitList Track Hit Track Hit Track Hit Track Hit Track Hit HEP Data Models • HEP data models are complex! • Typically hundreds of structure types (classes) • Many relations between them • Different access patterns • LHC experiments rely on OO technology • OO applications deal with networks of objects • Pointers (or references) are used to describe relations INFN School on Modern OO s/w Applications and Methodologies

  8. Data Management at LHC • LHC experiments will store huge data amounts • 1 PB of data per experiment and year • 100 PB over the whole lifetime • Distributed, heterogeneous environment • Some 100 institutes distributed world-wide • (Nearly) any available hardware platform • Data at regional-centers? • Existing solutions do not scale • Solution suggested by RD45: ODBMS coupled to a Mass Storage System • e.g. HPSS INFN School on Modern OO s/w Applications and Methodologies

  9. Object Database Features INFN School on Modern OO s/w Applications and Methodologies

  10. Object Persistency • Persistency • Objects retain their state between two program contexts • Storage entity is a complete object • State of all data members • Object class • OO Language Support • Abstraction • Inheritance • Polymorphism • Parameterised Types (Templates) INFN School on Modern OO s/w Applications and Methodologies

  11. OO Language Binding • User has to deal with copying between program and I/O representations of the same data • User has to traverse the in-memory structure • User has to write and maintain specialised code for I/O of each new class/structure type • Tight Language Binding • ODBMS allow to use persistent objects directly as variables of the OO language • C++, Java and Smalltalk (heterogeneity) • I/O on demand • No explicit store & retrieve calls INFN School on Modern OO s/w Applications and Methodologies

  12. Database# Cont.# Page# Slot# Navigational Access • Unique Object Identifier (OID) per object • Direct access to any object in the distributed store • Natural extension of the pointer concept • OIDs allow to implement networks of persistent objects (“associations”) • Cardinality: 1:1 , 1:n, n:m • uni- or bi-directional (referential integrity!) • OIDs are used via so-called “smart-pointers” INFN School on Modern OO s/w Applications and Methodologies

  13. How do “smart pointers” work? • d_Ref<Track> is a smart pointer to a Track • The database automatically locates objects as they are accessed and reads them. • User does not need to know about physical locations. • No host or file names in the code • Allows de-coupling of logical and physical model INFN School on Modern OO s/w Applications and Methodologies

  14. Event Tracker Calor. TrackList HitList Track Hit Track Hit Track Hit Track Hit Track Hit A Code Example Collection<Event> events; // an event collection Collection<Event>::iterator evt; // a collection iterator // loop over all events in the input collection for(evt = events.begin(); evt != events.end(); evt++) { // access the first track in the tracklist d_Ref<Track> aTrack; aTrack = evt->tracker->trackList[0]; // print the charge of all its hits for (int i = 0; i < aTrack->hits.size(); i++) cout << aTrack->hits[i]->charge << endl; } INFN School on Modern OO s/w Applications and Methodologies

  15. Object Clustering • Goal: Transfer only “useful” data • from disk server to client • from tape to disk • Physically cluster objects according to main access patterns • Clustering by type • e.g. Track objects are “always” accessed with their hits • Main access patterns may change over time • Performance may profit from re-clustering • Clustering of individual objects • e.g. All Higgs events should reside in one file INFN School on Modern OO s/w Applications and Methodologies

  16. Physical Model and Logical Model • Physical model may be changed to optimise performance • Existing applications continue to work INFN School on Modern OO s/w Applications and Methodologies

  17. Concurrent Access • Support for multiple concurrent writers • e.g. Multiple parallel data streams • e.g. Filter or reconstruction farms • e.g. Distributed simulation • Data changes are part of a Transaction • ACID: Atomic, Consistent, Isolated, Durable • Access is co-ordinated by a lock server • MROW: Multiple Reader, One Writer per container (Objectivity/DB) INFN School on Modern OO s/w Applications and Methodologies

  18. Objectivity Specific Features INFN School on Modern OO s/w Applications and Methodologies

  19. Federation Database Container Page Object Objectivity/DB Architecture • Architectural Limitations: OID size 8 bytes • 64K databases • 32K containers per database • 64K logical pages per container • 4GB containers for 64kB page size • 0.5GB containers for 8kB page size • 64K object slots per page • Theoretical limit: 10 000PB • assuming database files of 128TB • RD45 model assumes 6.5PB • assuming database files of 100GB • extension or re-mapping of OID have been requested INFN School on Modern OO s/w Applications and Methodologies

  20. Scalability Tests • Federated Databases of 500GB have been demonstrated • Multiple federations of 20-80GB are used in production • 32 filter nodes writing in parallel into one federated database • 200 parallel readers (Caltech Exemplar) • Objectivity/DB shows expected scalability • Overflow conditions on architectural limits are handledgracefully • Only minor problems found and reported back • 2GB file limit; fixed and tested up to 25GB • Federations of hundreds of TB are possible with the current version INFN School on Modern OO s/w Applications and Methodologies

  21. Application Host Application & Disk Server Application Application Objy Client Objy Client Objy Server ObjyLock Server Objy Server Objy Server HPSS Server HPSS Client Data Server connected to HPSS Disk Server A Distributed Federation INFN School on Modern OO s/w Applications and Methodologies

  22. Site 1 Site 2 Site 3 Data Replication • Objects in a replicated DB exists in all replicas • Multiple physical copies of the same object • Copies are kept in sync by the database • Enhance performance • Clients access a local copy of the data • Enhance availability • Disconnected sites may continue to work on a local replica Wide Area Network INFN School on Modern OO s/w Applications and Methodologies

  23. Schema Evolution • Evolve the object model over the experiment lifetime • migrate existing data after schema changes • minimise impact on existing applications • Supported operations • add, move or remove attributes within classes • change inheritance hierarchy • Migration of existing Objects • immediate: all objects are converted using an upgrade application • lazy: objects are upgraded as they are accessed INFN School on Modern OO s/w Applications and Methodologies

  24. Object Versioning • Maintain multiple versions of a single “logical” object • Example: Versions of calibration data in the BaBar calibration DB package

  25. Other O(R)DBMS Products • Versant • Unix and Windows platforms • Scalable, distributed architecture • Independent databases • Currently most suitable fall back product • O2 • Unix and Windows platforms • Incomplete heterogeneity support • Recently bought by Unidata (RDBMS vendor) and merged with VMARK (data warehousing) INFN School on Modern OO s/w Applications and Methodologies

  26. Other O(R)DBMS Products II • Objectstore - Object Design Inc. • Unix and Windows platforms • Scalability problems • Proprietary compiler, kernel driver • ODI re-focussed on web applications • POET • Windows platform • Low end, scalability problems • What will the big Object Relational Vendors provide? INFN School on Modern OO s/w Applications and Methodologies

  27. Object Relational Databases • Present data either as object or as table • Table rows translate into objects • New data type “Row Id” allows navigation • Abstract data types (ADTs) • C++ or Java based DB API • Some ODBMS features still missing • Inheritance, Polymorphism, Templates • Clustering of individual objects INFN School on Modern OO s/w Applications and Methodologies

  28. ODBMS Architectures • ODBMS are less server centric than RDBMS • ODBMS client cache keeps frequently used objects close to the CPU • ODBMS client implements a large part of the functionality • Improved scalability • Main design parameters • Granularity of data exchange • Granularity of locking • Implementation of OID (logical vs. physical) INFN School on Modern OO s/w Applications and Methodologies

  29. Object Server & Object Locking • Versant DB • Object exchange between client and server • Only requested data is send • Every object needs an network transfer • Server “knows” about objects • Object level queries may be run on the server • Fat server, thin client • Single object locking • Simpler model for application programmer • Degrades scalability and performance INFN School on Modern OO s/w Applications and Methodologies

  30. Page Server & Container Locking • Objectivity/DB • Page exchange between client and server • Page does contain not only requested data • In case of good clustering, it contains other objects that will be requested soon • Server only “knows” about I/O pages • Thin server, fat client • Improved scalability • Locking on container level • All objects in one container are locked at once • Improved scalability and performance INFN School on Modern OO s/w Applications and Methodologies

  31. The ODMG Standard • Standardise on • Data definition language ODL • Data interchange format OIF • Language binding for C++, Java and SmallTalk • Goal: Simplify code migration from one database vendor to another • Most frequently used API elements are covered by the ODMG 2.0 standard. • Performance and the feature set of different ODBMS products differs significantly ! • Any large scale migration will still require a significant effort! INFN School on Modern OO s/w Applications and Methodologies

  32. HEP Projects based on Objectivity/DB INFN School on Modern OO s/w Applications and Methodologies

  33. Production - BaBar • BaBar at SLAC, due to start taking data in 1999 • Objectivity/DB is used to store event, simulation, calibration and analysis data • Expected amount 200TB/year, majority of storage managed by HPSS • Mock Data Challenge 2: • Production of 3-4 Million events in August/September • Partly distributed to remote institutes • Cosmic runs starting in October INFN School on Modern OO s/w Applications and Methodologies

  34. Production - ZEUS • ZEUS is a large detector at the DESY electron-proton collider HERA • Since 1992 study of interactions between electrons and protons • Analysis environment: mainly FORTRAN code based on ADAMO • Objectivity/DB is used for event selection in the analysis phase • Store ~20GB of “tag data” - plan to extend to 200GB • Reported a significant gain in performance and flexibility compared to the old system INFN School on Modern OO s/w Applications and Methodologies

  35. Production - AMS • The Alpha Magnetic Spectrometer already took data on a NASA space shuttle flight this year and will be used later on the International Space Station. • Search for antimatter and dark matter • Data amount 100GB • Objectivity/DB is used to store production data, slow control parameters and NASA auxiliary data INFN School on Modern OO s/w Applications and Methodologies

  36. Production - CERES/NA45 • Heavy ion experiment at the SPS • Study of e+e- pairs in relativistic nuclear collisions • Successful use of Objectivity/DB from a reconstruction farm (32 Meiko CS2 nodes) • Expect to write 30 TB of raw data during 30 days of data taking • Reconstructed and filtered data will be stored using the Objectivity production service. INFN School on Modern OO s/w Applications and Methodologies

  37. Production - CHORUS • Searching for neutrino oscillations • Using Objectivity/DB for an online emulsion scanning database. • Plans to deploy this application at outside sites. • Also studying Objectivity/DB for TOSCA - a proposed follow-on experiment. INFN School on Modern OO s/w Applications and Methodologies

  38. COMPASS • COMPASS expects to begin full data taking in 2000 with a preliminary run in 1999. • Some 300TB of raw data will be acquired per year at rates up to 35MB/second. • Analysis data is expected to be stored on disk, requiring some 3-20TB of disk space. • Some 50 concurrent users and many passes through the data are expected. • Rely on the Objectivity production service at CERN INFN School on Modern OO s/w Applications and Methodologies

  39. Summary • Objectivity/DB based data stores provide • a single logical view of complex object models • integration with multiple OO languages • support for physical clusteringof data • scaling up to PB distributed data stores • seamless integration with MSS like HPSS • Adopted by a large number of HEP experiments • even FORTRAN based experiments evaluate Objectivity/DB for analysis and data conservation • Entering production phase at CERN now • Objectivity service has been set-up for ATLAS, CMS, COMPASS and NA45 INFN School on Modern OO s/w Applications and Methodologies

  40. Tutorial Example 1 • Objective: Populate a database with persistent Events • Define all involved classes • Simple object model consisting of :Event, Tracker, Track, Calo, Cluster • Create federated db, databases and containers • tracking and calo data are kept in separate databases (files) • Create events • Events contain randomly generated tracks and clusters INFN School on Modern OO s/w Applications and Methodologies

  41. Defining a persistent class • Define a C++ class in a .ddl file • very similar to a normal C++ header file • some restrictions apply (see next slides) • some additional features are available • Inherit from the persistent base class class Event : public d_Object {public: int eventNr; ... }; • Introduce the new class to the database schema • Run to Objectivity Schema Processor ooddlx INFN School on Modern OO s/w Applications and Methodologies

  42. DDL Restrictions • Persistent classes may not: • contain other persistent classes as data members • They may contain references to other persistent class though • Late (multiple-) inheritance from d_Object helps to keep transient and persistent classes in sync • contain C++ pointers or references • Neither directly nor through embedded classes • replace C++ pointers by database smart pointers • is the more intrusive change • Type declarationsof pointers referencing persistent objects have to be changed for all clients of a persistent capable class. • Code that uses these variables stays untouched. INFN School on Modern OO s/w Applications and Methodologies

  43. DDL - Additional Features • Persistent classes may use in addition: • variable length arrays as data members • Example: A Tracker object contains a variable number of Track objectsooVarray(Track) trackList; • bi-directional associations • Example: Each Event has one Tracker, each Tracker belongs to one Event. ooHandle(Tracker) itsTracker <-> itsEvent; • 1-to-N or N-to-M associations • Example: One Run object keeps links to all its “N” events:ooHandle(Event) itsEvents[]; INFN School on Modern OO s/w Applications and Methodologies

  44. Schema Handling • Definitions of persistent capable classes made in .DDL files • ooddlx processor generates appropriate headers & source code • Schema is added to federated database • Applications are built using generated files and the Objectivity library INFN School on Modern OO s/w Applications and Methodologies

  45. HepODBMS Layer • Goal: • Independence from vendor and/or release changes • Naming indirection of most prominent API classes • Provide missing features of the ODMG standard • HEP specific high level classes • Session set-up and diagnostic • Transaction control • Clustering Hint classes • Very large collections (> 109 Objects) • Hierarchical Object Naming INFN School on Modern OO s/w Applications and Methodologies

  46. Database Session Control • HepDbApplication - encapsulates db session control • Start/Commit/Abort Transactions • Set Lock handling options, lock wait time, number of retries • High level interface that allows • open/create/find FDBs, DBs and containers • Provide job or transaction level diagnostics for • cache efficiency • disk I/Os • object accesses and updates • container and object extension • steered by API and/or environment variables • heavily based on the ooSession class from Objectivity • small changes for Solaris, NT and transaction abort INFN School on Modern OO s/w Applications and Methodologies

  47. Setting up a DB session using the HepDbApplication class main(){HepDbApplication dbApp; // create an appl. objectdbApp.init(“MyFD”); // init FD connection dbApp.startUpdate(); // update mode transactiondbApp.db(“analysis”); // switch to db “analysis” // create a new container ContRef histCont = dbApp.container(“histos”); // create a histogram in this containerHepRef(Histo1D) h = new(histCont) Histo1D(10,0,5); dbApp.commit(); // Commit all changes} INFN School on Modern OO s/w Applications and Methodologies

  48. Object Clustering • The “new” operator provided by Objectivity allows to specify a clustering hint • may be a db, container or object reference • in which db, which container or close to which other object should the new object go • HepODBMS contains classes to encapsulate the clustering strategy in “Clustering Hint” objects • clustering into single physical containers (< .5 GB for 8kB pages) • clustering into logical containers (infinite size, spread over several db files) • parallel writing without lock contention • parallel load balanced reading • definition of class based clustering through persistent objects INFN School on Modern OO s/w Applications and Methodologies

  49. Clustering by Class // class definition in Track.ddlclass Track : public d_Object { d_Double phi; d_Double theta; d_ULong noOfHits;// more stuffpublic:static HepContainerHint clustering;};[…] // define clustering at startupTrack::clustering = dbApp.container(“tracks”);[…] // use the clustering defined for tracksHepRef(Track) aTrack = new (Track::clustering()) Track; INFN School on Modern OO s/w Applications and Methodologies

  50. ODBMS Stores on a Larger Scale INFN School on Modern OO s/w Applications and Methodologies

More Related