
Persistent User Data using Objectivity




  1. Persistent User Data using Objectivity
     Vincenzo Innocente, CERN/EP/CMC
     The missing Milestone

  2. Introduction
     • The last RD45 milestone was about private persistent data and classes.
     • Although a model was developed and prescriptions were provided, there was no evidence that it would work in a HEP-experiment production environment.
     • In CMS, following and extending the RD45 model, we have developed procedures that allow any physicist:
       • to develop and test private persistent classes
       • to manage their own private persistent objects

  3. HEP Data
     [Diagram labels: User Tag (N-tuple), Tracker Alignment, Ecal calibration, Tracks, Event Collection, Collection Meta-Data, Electrons, Event]
     • Environmental data
     • Detector and Accelerator status
     • Calibrations, Alignments
     • Event-Collection Meta-Data (luminosity, selection criteria, …)
     • …
     • Event Data, User Data
     Navigation is essential for an effective physics analysis.
     Complexity requires coherent access mechanisms.

  4. Requirements
     • Software Development:
       • Physics reconstruction developers should be able to develop, test and integrate persistent classes without interfering with other developments (same as for transient classes)
       • “End Users” should be able to develop and use private persistent classes
     • Data:
       • Physicists (“End Users”) should be able to access any kind of data without interfering with its production
       • Physicists should be able to populate private databases, using and referencing “common objects”, without interfering with production activities
     • Environment:
       • Development and running environment should be the same for system (experiment-wide) and user data
       • Access mechanisms should be the same for system and user data

  5. Technical Solutions
     • FD-shallow-copy: a “federation shallow-copy” is a local copy of .boot and .FDDB, ooinstalled -nocatalog, with all original database files made read-only
     • Development:
       • A few named schemas (5 or so) are used to avoid interference and ease integration
       • Development and tests are performed against the fd-shallow-copy
       • Schema is exchanged using ooschemadump/upgrade
       • Standard scripts (today making use of SCRAM, tomorrow integrated into SCRAM) are provided to parse DDL
       • A rich middleware of C++ classes, often templates, is provided to reduce (to zero?) the Objectivity-specific code that physicists need to know
       • In particular, a user development environment is provided to develop “concrete Tags” of simple structure

  6. Technical Solutions
     • Object shallow-copy: local copy with (one-way) references to constituents
     • Object deep-copy: local copy with local copy of constituents
     • Data:
       • Users always start with a local federation-shallow-copy
       • Events are never modified in place: reconstruction always generates a new event collection and a new event-data structure with a shallow copy of the parent event
       • Users can produce a deep copy of (part of) the event for a selected sample and generate a “user collection”
       • Concrete Tags (user private persistent objects) can be added to a user collection
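The difference between an object shallow-copy and an object deep-copy can be illustrated with a minimal Python sketch. The `Event` and `Track` classes are hypothetical stand-ins, not CMS or Objectivity classes, and ordinary Python references play the role of one-way persistent references here.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    """Hypothetical constituent object."""
    pt: float


@dataclass
class Event:
    """Hypothetical event holding references to its constituents."""
    event_id: int
    tracks: List[Track] = field(default_factory=list)


def shallow_copy(event: Event) -> Event:
    # New event object; constituents are shared with the original by reference.
    return Event(event.event_id, list(event.tracks))


def deep_copy(event: Event) -> Event:
    # New event object with its own private copies of the constituents.
    return copy.deepcopy(event)


original = Event(1, [Track(10.0), Track(25.0)])
shallow = shallow_copy(original)
deep = deep_copy(original)
```

In the shallow copy the constituents stay where they were and only the references are duplicated, which is why the original data can remain read-only; the deep copy is self-contained and can live in a private database.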

  7. Navigation
     • Top Level:
       • User sees and navigates a Unix-like tree structure through a C++ or Python API (Shell)
       • Implementation is by Objy naming (root is a database system name) or any other object-containment mechanism mapped to a Unix-like tree by the “Shell”
     • Collections:
       • We use a fully hierarchical composite collection system with metadata associated with each component
       • It allows sequential and random access with full support for fast user selection on MetaData
       • It can be used to organize any kind of objects that need indexing but only slow updates
     • Event:
       • Navigation in the event structure and from the event to the configuration is implemented using one-way references (pure ooRefs)
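As a rough illustration of the top-level navigation, the sketch below maps a containment hierarchy onto Unix-like paths. The `Node` class, its methods and the path names are all hypothetical; this is not the actual “Shell” API, only one way such a mapping could look.

```python
from typing import Dict, List, Optional


class Node:
    """Hypothetical named node in a Unix-like tree."""

    def __init__(self, name: str, payload: Optional[object] = None):
        self.name = name
        self.payload = payload
        self.children: Dict[str, "Node"] = {}

    def add(self, child: "Node") -> "Node":
        self.children[child.name] = child
        return child

    def resolve(self, path: str) -> "Node":
        # Walk the path one component at a time, ignoring empty components.
        node = self
        for part in path.strip("/").split("/"):
            if part:
                node = node.children[part]
        return node

    def ls(self) -> List[str]:
        # Names of the direct children, like `ls` on a directory.
        return sorted(self.children)


root = Node("")
dataset = root.add(Node("owner")).add(Node("dataset"))
dataset.add(Node("run1", payload=[101, 102]))
dataset.add(Node("run2", payload=[201]))
```

With this structure, `root.resolve("/owner/dataset").ls()` lists the runs of a dataset and `root.resolve("/owner/dataset/run1").payload` reaches the object stored at that path.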

  8. [Diagram labels: Owner Name, DataSet Name, Dataset Collection, MetaData & “User Tag”, “Run” Collection, Rec Event]
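The hierarchical composite collection with per-component metadata (slide 7) can be sketched as follows. The `Collection` class and the `luminosity` key are illustrative assumptions, and events are represented by plain integers; the point is only that a selection on metadata can reject a whole component without opening it.

```python
from typing import Callable, Dict, Iterator, List, Union


class Collection:
    """Hypothetical composite collection: items are events or sub-collections."""

    def __init__(self, name: str, metadata: Dict[str, object]):
        self.name = name
        self.metadata = metadata
        self.items: List[Union["Collection", int]] = []

    def add(self, item):
        self.items.append(item)
        return item

    def events(
        self, select: Callable[[Dict[str, object]], bool] = lambda meta: True
    ) -> Iterator[int]:
        # Skip a whole sub-collection when its metadata fails the selection,
        # so rejected components are never iterated.
        for item in self.items:
            if isinstance(item, Collection):
                if select(item.metadata):
                    yield from item.events(select)
            else:
                yield item


dataset = Collection("dataset", {})
run1 = dataset.add(Collection("run1", {"luminosity": 1.5}))
run1.items.extend([1, 2])
run2 = dataset.add(Collection("run2", {"luminosity": 0.2}))
run2.items.extend([3])

all_events = list(dataset.events())
high_lumi = list(dataset.events(lambda meta: meta.get("luminosity", 0) > 1.0))
```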

  9. User Collection “By Reference”
     [Diagram labels: MetaData & “User Tag”, DB Name (physical location), Context Name, Collection Name, “Run” Collection, Original RecEvent]
     • User Collections are populated by User Filters
     • Multiple User Filters (each populating a different User Collection) are allowed in a single ORCA job
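A minimal sketch of several user filters populating separate “by reference” collections in one pass over the input is given below. `RecEvent`, `run_job` and the filter names are hypothetical and do not reflect the ORCA interfaces; the collections hold references to the original events rather than copies.

```python
from typing import Callable, Dict, List


class RecEvent:
    """Hypothetical stand-in for a reconstructed event."""

    def __init__(self, event_id: int, n_electrons: int):
        self.event_id = event_id
        self.n_electrons = n_electrons


def run_job(
    events: List[RecEvent], filters: Dict[str, Callable[[RecEvent], bool]]
) -> Dict[str, List[RecEvent]]:
    # One output collection per filter; each holds references, not copies.
    collections: Dict[str, List[RecEvent]] = {name: [] for name in filters}
    for event in events:  # single pass over the input collection
        for name, accept in filters.items():
            if accept(event):
                collections[name].append(event)
    return collections


events = [RecEvent(1, 0), RecEvent(2, 2), RecEvent(3, 1)]
user_collections = run_job(events, {
    "with_electrons": lambda ev: ev.n_electrons > 0,
    "di_electron": lambda ev: ev.n_electrons >= 2,
})
```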

  10. RecApplication I/O
     [Diagram labels: Federation, Dataset Collection or User Collection, Histograms & Tags, Create/extend User Collections, Append new Run to a Dataset, Store, RecReader, Request]
     • Output Run is a new event collection containing new “data” (digis & RecObjs) and a reference to or replica of the input data
     • Output User Collections are unmodified sub-samples of the input collection

  11. Top Level Event Structure (COBRA5)
     [Diagram labels: EvId, RecEvent, DigiEvent, SimEvent (repeated), Run, Crossing, Trigger, Pile-up]

  12. Raw Event
     [Diagram labels: RawEvent, Index, ReadOut, ReadOut, ..., RawData, RawData, Vector of Digi, Vector of Digi]
     • RawData are identified by the corresponding ReadOut.
     • RawData belonging to different “detectors” are clustered into different containers. The granularity will be adjusted to optimize I/O performance.
     • An index at RawEvent level is used to avoid accessing all containers in search of a given RawData.
     • A range index at RawData level could be used for fast random access in complex detectors.
     • The index is implemented as an ordered vector of pairs.
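An index implemented as an ordered vector of pairs can be searched by bisection instead of scanning every container. The sketch below assumes integer ReadOut identifiers mapped to a container position; the `RawEventIndex` class and both field meanings are illustrative, not the COBRA data model.

```python
from bisect import bisect_left
from typing import Iterable, Optional, Tuple


class RawEventIndex:
    """Ordered vector of (readout_id, container_position) pairs."""

    def __init__(self, pairs: Iterable[Tuple[int, int]]):
        # Keep the pairs sorted by key so lookups can use binary search.
        self._pairs = sorted(pairs)
        self._keys = [key for key, _ in self._pairs]

    def find(self, readout_id: int) -> Optional[int]:
        # Locate the leftmost position where readout_id could be inserted,
        # then check whether the key stored there actually matches.
        i = bisect_left(self._keys, readout_id)
        if i < len(self._keys) and self._keys[i] == readout_id:
            return self._pairs[i][1]
        return None


index = RawEventIndex([(30, 2), (10, 0), (20, 1)])
position = index.find(20)
missing = index.find(25)
```

A sorted vector is compact and cheap to read back, at the price of slower insertion, which fits data that is written once and read many times.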

  13. CMS Reconstructed Objects
     [Diagram labels: RecEvent, S-Track Reconstructor, “esd”, “rec”, “aod”, Track, Track SecInfo, Track Constituents, Vector of RHits, S Track]
     • Reconstructed Objects produced by a given “algorithm” are managed by a Reconstructor.
     • A Reconstructed Object (Track) is split into several independent persistent objects to allow their clustering according to their access patterns (physics analysis, reconstruction, detailed detector studies, etc.).
     • The top level object acts as a proxy.
     • Intermediate reconstructed objects (RHits) are cached by value into the final objects.
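One way to picture a top-level object acting as a proxy for independently stored parts is sketched below. The `TrackProxy` class, the part names and the load-on-first-access behaviour are assumptions made for illustration; the slide does not specify how the proxy resolves its parts.

```python
from typing import Callable, Dict, List


class TrackProxy:
    """Hypothetical top-level object that loads each part on first access."""

    def __init__(self, loaders: Dict[str, Callable[[], dict]]):
        self._loaders = loaders
        self._loaded: Dict[str, dict] = {}

    def part(self, name: str) -> dict:
        # Fetch a part only when it is first requested, then reuse it.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]


reads: List[str] = []  # records which parts were actually read


def make_loader(name: str, data: dict) -> Callable[[], dict]:
    def load() -> dict:
        reads.append(name)
        return data
    return load


track = TrackProxy({
    "analysis": make_loader("analysis", {"pt": 42.0}),
    "constituents": make_loader("constituents", {"rhits": [1, 2, 3]}),
})
pt = track.part("analysis")["pt"]
pt_again = track.part("analysis")["pt"]
```

An analysis job that only asks for the `analysis` part never touches the constituents, which is the benefit of clustering the parts by access pattern.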

  14. Re-Reconstruction & Clones
     [Diagram labels: RecEvent (Id-1, Id-2), DigiEvent (Id-1, Id-2), SimEvent (Id-0, repeated), Run, Run, Id-1 Local Replica, Crossing, Trigger, Pile-up]

  15. Collection “By Value”
     [Diagram labels: MetaData & “User Tag”, New Owner Name, DataSet Name, Run Collection, New RecEvent with new or cloned Digis & RecObjs]

  16. Physical clustering
