  1. Summary of the Persistency RTAG Report (heavily based on David Malon's slides) - and some personal remarks
  Dirk Düllmann dirk.duellmann@cern.ch
  CCS Meeting, April 18th 2002

  2. SC2 mandate to the RTAG
  • Write the product specification for the Persistency Framework for Physics Applications at LHC
  • Construct a component breakdown for the management of all types of LHC data
  • Identify the responsibilities of experiment frameworks, existing products (such as ROOT) and as-yet-to-be-developed products
  • Develop requirements/use cases to specify (at least) the metadata/navigation component(s)
  • Estimate the resources (manpower) needed to prototype missing components

  3. Guidance from the SC2
  • The RTAG may decide to address all types of data, or may decide to postpone some topics for other RTAGs, once the components have been identified.
  • The RTAG should develop a detailed description at least for the event data management.
  • Issues of schema evolution, dictionary construction and storage, and object and data models should be addressed.

  4. RTAG Composition
  • One member from each experiment, one from IT/DB, one from the ROOT team:
    • Fons Rademakers (Alice)
    • David Malon (ATLAS)
    • Vincenzo Innocente (CMS)
    • Pere Mato (LHCb)
    • Dirk Düllmann (IT/DB)
    • Rene Brun (ROOT)
  Quoting Vincenzo's report at CMS Week (6 March 02):
  "Collaborative, friendly atmosphere"
  "Real effort to define a common product"
  This is already an accomplishment.

  5. Response of RTAG to mandate and guidance (excerpted from report)
  • Intent of this RTAG is to assume an optimistic posture regarding the potential for commonality among the LHC experiments in all areas related to data management
  • Limited time available to the RTAG precludes treatment of all components of a data management architecture at equal depth
    • Will propose areas in which further work, and perhaps additional RTAGs, will be needed

  6. Response of RTAG to mandate and guidance (excerpted from report)
  • Consonant with SC2 guidance, the RTAG has chosen to focus its initial discussions on the architecture of a persistence management service based upon a common streaming layer, and on the associated services needed to support it
  • Even if we cannot accomplish everything we aspire to, we want to ensure that we have provided a solid foundation for a near-term common project
  • While our aim is to define components and their interactions in terms of abstract interfaces that any implementation must respect, it is not our intention to produce a design that requires a clean-slate implementation

  7. Response of RTAG to mandate and guidance (excerpted from report)
  • For the streaming layer and related services, we plan to provide a foundation for an initial common project that can be based upon the capabilities of existing implementations, and upon ROOT's I/O capabilities in particular
  • While new capabilities required of an initial implementation should not be daunting, we do not wish at this point to underestimate the amount of repackaging and refactoring work required to support common project requirements

  8. Status
  • Reasonable agreement on design criteria, e.g.,
    • Component oriented: communication through abstract interfaces, no back channels; components make no assumptions about the implementation technology of the components with which they communicate
    • Persistence for C++ data models is the principal target, but our environments are already multilingual; should avoid constructions that make language migration and multi-language support difficult
    • Architecture should not preclude multiple persistence technologies
    • Experiments' transient data models should not need compile-time/link-time dependencies on persistence technology in order to use persistence services

  9. Status II
  • Reasonable agreement on design criteria, e.g.,
    • Transient object types may have several persistent representations; the type of a transient object restored from a persistent one may differ from the type of the object that was saved; a persistent object cannot assume it "knows" what type of transient object will be built from it
    • …more…
  • Component discussions and requirement discussions have been uneven: extremely detailed and highly technical in some areas, with other areas neglected thus far for lack of time
  • Primary focus has been on issues and components involved in defining a common persistence service
    • Cache manager, persistence manager, storage manager, streamer service, placement service, dictionary service(s), …
    • Object identification, navigation, …

  10. The proposed initial project
  • Charge to the initial project is to deliver the components of a common file-based streaming layer sufficient to support persistence for all four experiments' event models, with management of the resulting files hosted in a relational layer
  • Elaboration of what this means appears in the report
  • Note that the persistence service is intended to support all kinds of data; it is not specific to event data

  11. Initial project components
  • The specification in the report describes agreed-upon common project components, including
    • persistence manager
    • placement service
    • streaming service
    • storage manager
    • externalizable technology-independent references
    • services to support event collections
    • connections to grid-provided replica management and LFN/PFN services

  12. Initial project implementation technologies
  • Streaming layer should be implemented using the ROOT framework's I/O services
  • Components with relational implementations should make no deep assumptions about the underlying technology
  • Nothing intentionally proposed that precludes implementation using such open source products as MySQL

  13. RTAG’s First Component Diagram (under discussion)

  14. Persistence Manager
  • Principal point of contact between experiment-specific frameworks and persistence services
  • Handles requests to store the state of an object, returning a token (e.g., a persistent address)
  • Handles requests to retrieve the state of an object corresponding to a token
  • Like the Gaudi/Athena persistence service
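
A minimal C++ sketch of the kind of abstract interface such a persistence manager implies. The names (Token, IPersistencyMgr) and signatures are illustrative assumptions of this note, not the interface defined in the RTAG report.

// Illustrative sketch only: names and signatures are assumptions,
// not the interface agreed by the RTAG.
#include <string>
#include <typeinfo>

// Opaque, externalizable persistent address returned by a store request.
struct Token {
    std::string external;   // technology-independent external form
};

// Principal point of contact between an experiment framework and the
// persistence services.
class IPersistencyMgr {
public:
    virtual ~IPersistencyMgr() = default;
    // Store the state of a transient object; returns its persistent address.
    virtual Token store(const void* object, const std::type_info& type) = 0;
    // Rebuild a transient object from the state referenced by the token.
    virtual void* retrieve(const Token& token, const std::type_info& asType) = 0;
};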

  15. Dictionary services
  • Manages descriptions of transient (persistence-capable) classes and their persistent representations
  • Interface to obtain reflective information about classes
  • Entries in the dictionary may be generated by disparate sources
    • Rootcint, ADL, LHCb XML, …
  • With automatic converter/streamer generation, it is likely that some persistent representations will be derivable from the transient representation, but the possibility of multiple persistent representations suggests separate dictionaries
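
A hedged C++ sketch of what a reflective dictionary interface of this kind might look like; the type and member names are hypothetical, and real entries would be filled from generators such as rootcint, ADL or the LHCb XML descriptions.

// Hypothetical dictionary sketch; not the RTAG-defined interface.
#include <cstddef>
#include <string>
#include <vector>

struct FieldDescription {
    std::string name;       // data member name
    std::string typeName;   // C++ type of the member
    std::size_t offset;     // byte offset inside the transient object
};

struct ClassDescription {
    std::string name;                      // transient class name
    std::vector<FieldDescription> fields;  // reflective member list
};

class IDictionarySvc {
public:
    virtual ~IDictionarySvc() = default;
    // Register a description produced by one of the dictionary generators.
    virtual void add(const ClassDescription& desc) = 0;
    // Obtain reflective information about a persistence-capable class.
    virtual const ClassDescription* describe(const std::string& className) const = 0;
};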

  16. Streaming or conversion service
  • Consults transient and persistent representation dictionaries to produce a persistent (or "persistence-ready") representation of a transient object, or vice versa
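
To make the role concrete, a minimal sketch of such a streamer interface follows; it reuses the hypothetical ClassDescription from the dictionary sketch above, and all names are assumptions rather than the agreed interface.

// Illustrative streamer/conversion interface (names are assumptions).
#include <vector>

struct ClassDescription;   // hypothetical type from the dictionary sketch above

class IStreamerSvc {
public:
    virtual ~IStreamerSvc() = default;
    // Produce a persistence-ready byte representation of a transient object,
    // guided by its dictionary description.
    virtual std::vector<char> writeObject(const void* object,
                                          const ClassDescription& desc) = 0;
    // Rebuild the transient object from its persistent representation.
    virtual void readObject(const std::vector<char>& buffer,
                            const ClassDescription& desc,
                            void* object) = 0;
};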

  17. Placement service
  • Supports runtime control of physical placement, equivalent to "where" hints in ODMG
  • Intended to support physical clustering within an event, and to separate events written to different physics streams
  • Interface seen by the experiments' application frameworks is independent of persistence technology
  • Insufficiently specified in the RTAG report
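
A minimal sketch, assuming a simple "where" hint object in the spirit of ODMG placement; the class and field names are hypothetical.

// Hypothetical placement-hint sketch; independent of storage technology.
#include <string>

// Describes where an object should be placed physically.
struct Placement {
    std::string database;   // e.g. the file or physics stream to write to
    std::string container;  // e.g. the clustering unit within an event
};

class IPlacementSvc {
public:
    virtual ~IPlacementSvc() = default;
    // Return the placement hint to use for objects of the given type
    // in the current event context.
    virtual Placement placementFor(const std::string& className) = 0;
};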

  18. References and reference services
  • Common class for encapsulating persistent addresses
  • Externalizable
  • Independent of storage technology, likely with an opaque payload that is technology-specific
  • In the hybrid model, the intent is that from a Ref one can determine what "file" is needed without consulting the particular storage technology
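
A sketch of what such a Ref might carry under the assumptions above; the field names are illustrative, and the opaque payload stands in for whatever the chosen storage technology needs internally.

// Illustrative technology-independent reference; field names are assumptions.
#include <string>

struct Ref {
    std::string fileId;     // enough to locate the "file" without the storage technology
    std::string container;  // technology-neutral placement within the file
    std::string payload;    // opaque, technology-specific remainder

    // Externalizable: a Ref can be written out as a string and parsed back.
    std::string toString() const { return fileId + "|" + container + "|" + payload; }
};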

  19. Store manager
  • Stores and retrieves variable-length streams of bytes
  • Deals with issues at the file level
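
A minimal sketch of the byte-stream level this describes; the identifier type and method names are assumptions.

// Illustrative storage-manager interface: variable-length byte streams.
#include <cstdint>
#include <vector>

class IStorageMgr {
public:
    virtual ~IStorageMgr() = default;
    // Write a byte stream and return an identifier for later retrieval.
    virtual std::uint64_t write(const std::vector<char>& bytes) = 0;
    // Read back the byte stream previously written under that identifier.
    virtual std::vector<char> read(std::uint64_t id) = 0;
};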

  20. RefLFN service • Translates an Object Reference into the logical filename of the file containing the referenced objects • Expected that Ref will have some kind of file id, that can be used to determine logical file name in the grid sense

  21. LFN{PFN}PosixName service • Translate a logical filename into the posix name of one physical replica of this file • Expect to get this from grid projects, though common project may need to deliver a service that hides several possible paths behind a single interface

  22. Event collection services
  • Support for explicit event collections (not just collections by containment)
  • Support for collections of event references
  • Queryable collections: like a list of events, together with queryable tags
  • Possibly indexed
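
A sketch of one way a queryable collection of event references plus tags could look; the structures and the selection predicate are illustrative assumptions only.

// Hypothetical queryable event collection: event references plus tag attributes.
#include <map>
#include <string>
#include <vector>

struct EventTag {
    std::string eventRef;                      // externalized reference to the event
    std::map<std::string, double> attributes;  // queryable tag values
};

class IEventCollection {
public:
    virtual ~IEventCollection() = default;
    // Append one event entry to the collection.
    virtual void add(const EventTag& tag) = 0;
    // Return the references of all events whose tags satisfy the predicate,
    // e.g. all events with a tag value above some threshold.
    virtual std::vector<std::string> select(bool (*predicate)(const EventTag&)) const = 0;
};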

  23. One possible mapping to a ROOT implementation (under discussion)
  [Component diagram: PersistencyMgr, CacheMgr/CustomCacheMgr, PlacementSvc, StorageMgr, StreamerSvc, DictionarySvc, SelectorSvc, IteratorSvc, MetaData Catalog and FileCatalog mapped onto ROOT classes such as TGrid, TChain, TEventList, TDSet, TBuffer, TMessage, TRef, TKey, TTree, TStreamerInfo, TClass, TFile, TDirectory and TSocket, via the interfaces IPers, ICache, ICnv, IReflection, IPReflection, IPlacement, IReadWrite, IMCatalog and IFCatalog]

  24. Clarify Resources, Responsibilities & Risks
  • Expected resources at project start and their evolution
  • Commitments from the experiments, the ROOT team and IT
  • Who does what (and until when)?
    • Who develops which software component(s)?
    • Who maintains those components afterwards?
    • Who develops production services around those?
    • What is the procedure for dropping any of those services?
  • Any "in house" development involves a significant maintenance commitment by somebody and a risk for somebody else
  • Need to agree on these commitments

  25. Components & Interfaces
  • Need to rapidly agree on concrete component interfaces
  • Could we classify/prioritize interfaces by risk?
    • e.g. by the damage caused by a later interface change
  • At least the major (aka external) interfaces need the official blessing of the Architects' Forum and cannot be modified without its agreement
  • No component infrastructure defined so far
    • Component inheritance hierarchy, component factories, component mapping to shared libraries etc.
  • LCG approach needs to be "compatible" with
    • several LCG sub-projects
    • several different experiment frameworks
    • several existing HEP packages, e.g. ROOT I/O
    • several RDBMS implementations
  • May need to assume some instability until a solid foundation is accepted for the LCG applications area

  26. Components & Interfaces cont.
  • Fast adoption of existing "interface" classes may be tempting but is also very risky
    • Should not just bless existing header files which were conceived as implementation headers
  • Should take the opportunity to (re-)design a minimal but complete set of abstract interfaces
    • And then implement them using existing technology

  27. Timescales, Functionality & Technologies of an Initial Prototype
  • Experiments have somewhat differing timescales for their first use of LCG components
    • Synchronization of the initial release schedule would definitely improve chances of success
  • Experiments may favor different subsets of the full functionality for a first prototype
    • Need to agree on the main requirements for the prototype s/w and associated services, to guide implementation and technology choices
  • Synchronization of feature content and implementation technology is required
    • Which RDBMS backend? What are the deployment requirements?
    • "Lightweight" system (end-user managed): maybe reduced requirements on scalability and fault tolerance, and even on functionality
    • Fully managed production system: based on established database services (incl. backup, recovery from h/w faults, …)
    • May need prototype implementations for both

  28. Summary
  • The Persistency RTAG has delivered its final report to the SC2
  • Significant agreement on requirements and a component breakdown has been achieved
  • The report does not define all components and their interactions in full or equal depth
  • A common project on object persistency is proposed
  • Possible next steps
    • Clarify available resources and responsibilities
    • Agree on the scope, timescale and deployment model of a first project prototype
    • Rapidly agree on a concrete set of component interfaces and spawn work packages to implement them
    • Continue to resolve remaining open questions
