
Management of Large Scale Data Productions for the CMS Experiment


Presentation Transcript


  1. Management of Large Scale Data Productions for the CMS Experiment
  Presented by L.M. Barone, Università di Roma & INFN
  ACAT2000 - FNAL

  2. The Framework
  • The CMS experiment is producing a large amount of MC data for the development of High Level Trigger (HLT) algorithms for fast data reduction at the LHC
  • Current production is half traditional (Pythia + CMSIM/Geant3) and half OO (ORCA using Objectivity/DB)

  3. The Problem
  We are dealing with actual MC productions, not with 2005 data taking
  • Data size: ~10^6 - 10^7 events at 1 MB/event, i.e. ~10^4 files (typically 500 events/file); a quick check of these numbers follows below
  • Resource dispersion: many production sites (CERN, FNAL, Caltech, INFN, etc.)
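As a rough sanity check of the sizes quoted on this slide, here is a back-of-the-envelope calculation (a sketch only, using the per-event size and events-per-file figures above):

```python
# Back-of-the-envelope check of the production size quoted on the slide:
# up to ~10^7 events at ~1 MB/event, packed ~500 events per file.
events = 10**7            # upper end of the quoted 10^6 - 10^7 range
event_size_mb = 1.0       # ~1 MB per event
events_per_file = 500     # typical file granularity

total_tb = events * event_size_mb / 10**6
n_files = events / events_per_file

print(f"total volume    ~ {total_tb:.0f} TB")   # ~10 TB
print(f"number of files ~ {n_files:.0e}")       # ~2e+04, i.e. O(10^4) files
```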

  4. The Problem (cont'd)
  • Data relocation: data produced at site A are stored centrally (at CERN); site B may need a fraction of them; the combinatorics keep increasing
  • Objectivity/DB does not make life easier (but the problem would exist anyway)

  5. ORCA Production 2000 [workflow diagram]: HEPEVT ntuples and signal Zebra files with HITS from CMSIM MC production; ORCA ooHit Formatter and ORCA digitization (merging signal and minimum bias) filling Objectivity databases via production and MB catalog imports; HLT algorithms writing new reconstructed objects into HLT group Objectivity databases, mirrored in the US, Russia, Italy, ...

  6. The Old Days
  • Question: how was it done before? A mix of ad hoc scripts and programs with a lot of manual intervention... but the problem was smaller and less dispersed

  7. Requirements for a Solution
  • The solution must be as automatic as possible → decrease manpower
  • Tools should be independent of data type and of site
  • Network traffic should be optimized (or minimized?)
  • Users need complete information on data location

  8. Present Status
  • Job creation is managed by a variety of scripts at the different sites
  • Job submission again goes through diverse methods, from plain UNIX commands to LSF or Condor
  • File transfer has been managed up to now by Perl scripts that are neither generic nor site-independent

  9. Present Status (cont'd)
  • The autumn 2000 production round is a trial towards standardization:
    - same layout (OS, installation)
    - same scripts (T. Wildish) for non-Objy data transfer
    - first use of GRID tools (see talk by A. Samar)
    - validation procedure for production sites

  10. Collateral Activities
  • Linux + CMS software automatic installation kit (INFN)
  • Globus installation kit (INFN)
  • Production monitoring tools with a Web interface

  11. What is missing?
  • Scripts and tools are still too specific and not robust enough; we need practice on this scale
  • The information service needs a clear definition in our context and then an effective implementation (see later)
  • File replication management is just appearing and needs careful evaluation

  12. Ideas for Replica Management
  • A case study with Objectivity/DB (thanks to C. Grandi, INFN Bologna)
  • It can be extended to any kind of file

  13. Cloning federations
  • Cloned federations have a local catalog (boot file)
  • Each of them can be managed independently; some databases may be attached (or exist) only at one site
  • “Manual work” is needed to keep the schemas synchronized (this is not the key point today...)

  14. Cloning federations [diagram]: the CERN federation (CERN Boot, CERN FD with DB1, DB2, DB3, ..., DBn) is cloned to regional centres RC1 and RC2, each with its own boot file and federation catalog (RC1 Boot/RC1 FD, RC2 Boot/RC2 FD); some databases (DB_a, DB_b) are attached only at one site

  15. Productions
  • Using a DB-id pre-allocation system, databases can be produced at the RCs and then exported to other sites (a sketch of such a pre-allocation follows below)
  • A notification system is needed to inform other sites when a database is completed
  • Today this is accomplished by GDMP using a publish-subscribe mechanism
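A minimal sketch of what such a DB-id pre-allocation could look like; the site names and range boundaries here are invented for illustration, not taken from the actual CMS setup:

```python
# Hypothetical DB-id pre-allocation: each site gets a disjoint id range, so
# databases produced at different Regional Centres can never collide and can
# later be exported freely.  Range sizes and site names are made up.
PREALLOCATED = {
    "CERN": range(1, 5000),
    "RC1":  range(5000, 10000),
    "RC2":  range(10000, 15000),
}
_next_free = {site: r.start for site, r in PREALLOCATED.items()}

def allocate_dbid(site: str) -> int:
    """Hand out the next unused DBid from the range reserved for this site."""
    dbid = _next_free[site]
    if dbid not in PREALLOCATED[site]:
        raise RuntimeError(f"{site} exhausted its pre-allocated DBid range")
    _next_free[site] = dbid + 1
    return dbid

# Databases produced at RC1 get ids 5000, 5001, ...; CERN keeps using 1, 2, ...
print(allocate_dbid("RC1"), allocate_dbid("RC1"), allocate_dbid("CERN"))
```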

  16. Productions
  • When a site receives a notification, it can (as sketched below):
    - ooattachdb the DB at the remote site
    - copy the DB and ooattachdb it locally
    - ignore it
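The decision a site has to take on such a notification could be sketched as below; the helper functions are placeholders for the transfer and ooattachdb steps, not real GDMP or Objectivity APIs:

```python
# Illustrative handler for a "database completed" notification, acting out the
# three options listed on the slide.  None of these helpers are real GDMP or
# Objectivity calls; they only stand in for the corresponding actions.

def replicate(db_path: str) -> str:
    """Stand-in for the file-transfer step (e.g. performed via GDMP)."""
    return "/data/objy/" + db_path.split("/")[-1]

def attach_to_local_federation(path: str) -> None:
    """Stand-in for running ooattachdb against this site's federation."""
    print("ooattachdb", path)

def on_db_completed(db_path: str, wanted: bool, copy_locally: bool) -> str:
    if not wanted:
        return "ignored"                         # option 3: not needed at this site
    if copy_locally:
        local_path = replicate(db_path)          # option 2: copy the DB here first...
        attach_to_local_federation(local_path)   # ...then attach the local copy
        return "copied and attached: " + local_path
    attach_to_local_federation(db_path)          # option 1: attach the remote DB
    return "attached remotely: " + db_path

print(on_db_completed("/shift/data/DB5.DB", wanted=True, copy_locally=True))
```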

  17. Productions [diagram]: after production, the CERN federation (CERN Boot/CERN FD) holds DB1 ... DBn, RC1 (RC1 Boot/RC1 FD) holds the newly produced DBn+1 ... DBn+m, and RC2 (RC2 Boot/RC2 FD) holds DBn+m+1 ... DBn+m+k, following the pre-allocated id ranges

  18. Analysis
  • Each site needs a complete catalog with the location of all the datasets; some DBs are local and some are remote
  • If more copies of a DB are available, it would be nice to have the closest one in the local catalog (NWS)

  19. Information service
  • Create an Information Service with information about all the replicas of the databases (a GIS?)
  • In each RC there is a reference catalog which is updated taking the available replicas into account (one possible selection rule is sketched below)
  • It is even possible to have a catalog created on the fly, containing only the datasets needed by a job
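One way the reference catalog update could pick the "closest" replica is sketched here; the replica lists and link costs are illustrative, with the costs standing in for NWS-style network measurements:

```python
# Sketch of a site-local reference catalog picking one replica per database
# from the Information Service, preferring the cheapest ("closest") link.
# Replica lists and cost numbers are illustrative only.

def build_local_catalog(replicas_by_dbid: dict, link_cost: dict) -> dict:
    """Return {dbid: chosen_replica} using the smallest link cost per host."""
    catalog = {}
    for dbid, replicas in replicas_by_dbid.items():
        catalog[dbid] = min(
            replicas,
            key=lambda r: link_cost.get(r.split("::")[0], float("inf")),
        )
    return catalog

replicas = {
    12345: ["shift23.cern.ch::/db45/Hmm1.hits.DB",
            "pccms1.bo.infn.it::/data1/Hmm1.hits.DB"],
}
cost = {"pccms1.bo.infn.it": 1.0, "shift23.cern.ch": 10.0}  # as seen from an INFN site
print(build_local_catalog(replicas, cost))   # picks the Bologna copy
```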

  20. Analysis [diagram]: the CERN, RC1 and RC2 reference catalogs (CERN Boot/FD, RC1 Boot/FD, RC2 Boot/FD) each list all databases (DB1 ... DBn, DBn+1 ... DBn+m, DBn+m+1 ... DBn+m+k), pointing either to local copies or to remote replicas

  21. Logical vs Physical Datasets
  • Each dataset is composed of one or more databases; datasets are managed by the application software
  • Each DB is uniquely identified by a DBid; the DBid assignment is the logical-db creation
  • The physical-db is the file: zero, one or more instances may exist
  • The IS manages the link between a dataset, its logical-dbs and its physical-dbs

  22. Logical vs Physical Datasets
  Dataset: H2
  • id=12345, Hmm.1.hits.DB: pccms1.bo.infn.it::/data1/Hmm1.hits.DB; shift23.cern.ch::/db45/Hmm1.hits.DB
  • id=12346, Hmm.2.hits.DB: pccms1.bo.infn.it::/data1/Hmm2.hits.DB; pccms3.pd.infn.it::/data3/Hmm2.hits.DB; shift23.cern.ch::/db45/Hmm2.hits.DB
  • id=12347, Hmm.3.hits.DB: shift23.cern.ch::/db45/Hmm3.hits.DB
  Dataset: H2e
  • id=5678, Hee.1.hits.DB: pccms5.roma1.infn.it::/data/Hee1.hits.DB; shift49.cern.ch::/db123/Hee1.hits.DB
  • id=5679, Hee.2.hits.DB: pccms5.roma1.infn.it::/data/Hee2.hits.DB; shift49.cern.ch::/db123/Hee2.hits.DB
  • id=5680, Hee.3.hits.DB: pccms5.roma1.infn.it::/data/Hee3.hits.DB; shift49.cern.ch::/db123/Hee3.hits.DB
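A minimal data model for the dataset / logical-db / physical-db mapping of the two slides above, populated with part of the H2 example; the class and field names are illustrative, not taken from any CMS tool:

```python
# Minimal data model for the dataset -> logical DB (DBid) -> physical replica
# mapping, filled with part of the H2 example from the slide above.
from dataclasses import dataclass, field

@dataclass
class LogicalDB:
    dbid: int                                       # unique DBid (logical-db creation)
    name: str                                       # e.g. "Hmm.1.hits.DB"
    replicas: list = field(default_factory=list)    # zero, one or more physical files

@dataclass
class Dataset:
    name: str
    logical_dbs: list = field(default_factory=list)

h2 = Dataset("H2", [
    LogicalDB(12345, "Hmm.1.hits.DB",
              ["pccms1.bo.infn.it::/data1/Hmm1.hits.DB",
               "shift23.cern.ch::/db45/Hmm1.hits.DB"]),
    LogicalDB(12347, "Hmm.3.hits.DB",
              ["shift23.cern.ch::/db45/Hmm3.hits.DB"]),
])

# The IS keeps exactly this link: dataset -> logical dbs -> physical replicas.
for db in h2.logical_dbs:
    print(db.dbid, db.name, len(db.replicas), "replica(s)")
```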

  23. Database creation
  • In each production site we have:
    - a production federation, including incomplete databases
    - a reference federation with only complete databases (both local and remote ones)
  • When a DB is completed it is attached to the site's reference federation
  • The IS monitors the reference federations of all the sites and updates the database list (a sketch follows the diagram on the next slide)

  24. Database creation [diagram]: DB5 is produced at pc.rc1.net in the RC1 production federation; once complete it is attached to the RC1 reference federation, and the catalogs (RC1 Ref, CERN FD) are updated so that each entry lists the DBid, the file name and its known locations (e.g. 0005 DB5.DB at pc.rc1.net::/pc/data and shift.cern.ch::/shift/data, alongside 0001-0004 for DB1.DB ... DB4.DB)
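The catalog update implied by this diagram could be sketched as a merge of the sites' reference catalogs into one global list of locations per DBid; the data layout and function name below are assumptions for illustration:

```python
# Sketch of the IS step implied by the diagram: after DB5 is completed at
# pc.rc1.net and attached to the RC1 reference federation, the IS merges all
# reference catalogs into one global {dbid: [locations]} list.

def merge_reference_catalogs(site_catalogs: dict) -> dict:
    """site_catalogs: {site: {dbid: location}} -> {dbid: sorted list of locations}."""
    merged = {}
    for catalog in site_catalogs.values():
        for dbid, location in catalog.items():
            merged.setdefault(dbid, set()).add(location)
    return {dbid: sorted(locs) for dbid, locs in merged.items()}

catalogs = {
    "CERN": {4: "shift.cern.ch::/shift/data/DB4.DB",
             5: "shift.cern.ch::/shift/data/DB5.DB"},
    "RC1":  {4: "pc.rc1.net::/pc/data/DB4.DB",
             5: "pc.rc1.net::/pc/data/DB5.DB"},
}
print(merge_reference_catalogs(catalogs)[5])   # both known locations of DB5
```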

  25. Replica Management
  • In case of multiple copies of the same DB, each site may choose which copy to use:
    - it should be possible to update the reference federation at given times
    - it should be possible to create on the fly a mini-catalog with information only about the datasets requested by a job (a sketch follows the diagram on the next slide)
    - this kind of operation is managed by the application software (e.g. ORCA)

  26. Replica Management [diagram]: DB1, DB2 and DB3 exist at shift.cern.ch (CERN FD); DB1 and DB2 also have copies at pc1.bo.infn.it; the BO Ref and PD Ref catalogs list, for each DBid (0001 DB1.DB, 0002 DB2.DB, 0003 DB3.DB), the locations and hence the replica each site chooses to use
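The on-the-fly mini-catalog mentioned on slide 25 could be sketched as follows; the structures and the preference for a local host are illustrative, since in the scheme described above this operation belongs to application software such as ORCA:

```python
# Sketch of the "mini-catalog on the fly" idea: restrict the full replica list
# to the datasets a job asks for and keep one preferred replica per database.

def mini_catalog(full_catalog: dict, datasets: dict, requested: list,
                 prefer_host: str) -> dict:
    """full_catalog: {dbid: [replicas]}, datasets: {name: [dbids]}."""
    selected = {}
    for name in requested:
        for dbid in datasets[name]:
            replicas = full_catalog[dbid]
            # keep a replica on the preferred (e.g. local) host if one exists
            local = [r for r in replicas if r.startswith(prefer_host)]
            selected[dbid] = (local or replicas)[0]
    return selected

full = {1: ["shift.cern.ch::/shift/data/DB1.DB", "pc1.bo.infn.it::/data/DB1.DB"],
        2: ["shift.cern.ch::/shift/data/DB2.DB", "pc1.bo.infn.it::/data/DB2.DB"],
        3: ["shift.cern.ch::/shift/data/DB3.DB"]}
datasets = {"H2": [1, 2, 3]}
print(mini_catalog(full, datasets, ["H2"], prefer_host="pc1.bo.infn.it"))
```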

  27. Summary of the Case Study
  • The basic functionality of a Replica Manager for production is already implemented in GDMP
  • The use of an Information Service would allow easy synchronization of federations and optimized data access during analysis
  • The same functionality offered by the Objectivity/DB catalog could be implemented for other kinds of files

  28. Conclusions (?)
  • Globus and the various GRID projects try to address the issue of large-scale distributed data access
  • Their effectiveness is still to be proven
  • The problem, again, is not the software; it is the organization
