WP2 - Data Management

WP2 - Data Management. L.M.Barone, Università di Roma & INFN. WP goals: “...to permit the secure access of massive amounts of data...to move and replicate data at high speed from one site to another and to manage the synchronisation of remote data copies” (from the DataGrid Technical Annex).

Presentation Transcript


  1. WP2 - Data Management L.M.Barone Università di Roma & INFN Commissione Nazionale I

  2. WP Goals “...to permit the secure access of massive amounts of data...to move and replicate data at high speed from one site to another and to manage the synchronisation of remote data copies” (from the DataGrid Technical Annex)

  3. Keywords • Automation • Caching • Generic Interface • MetaData • Data Mover • Replica Manager • Security

  4. People
     Site    Name           FTE
     Bari    L.Silvestris   0.3
             G.Zito         0.5 (0.3)
     Pisa    S.Arezzini     0.3 (0.3)
             A.Controzzi    0.5
             F.Donno        0.2 (0.2)
             F.Schifano     0.2
     Roma1   L.M.Barone     0.3 (0.3)
             A.Lonardo      0.3
             A.Michelotti   0.3
             G.Organtini    0.2
             D.Rossetti     0.2 (0.2)

  5. Deliverables • Requirements for Data Location Broker: 5/2001 • Definition of a metadata syntax: 7/2001 • Replica Management at file level: 12/2001

  6. An Example • Ideas for a Replica Manager: • Management of production in a distributed environment: • Data produced at many sites • Data collected at a single reference site • Data analyzed at many sites • Data are sometimes moved and sometimes accessed over the network • A case study with Objectivity/DB • It can be extended to any kind of file

  7. Cloning federations [diagram: boot files and federation catalogs (FD) at CERN, RC1 and RC2; databases DB1 to DBn, DB_a, DB_b; "Clone FD"]

  8. Productions [diagram: federations (boot file and FD) at CERN, RC1 and RC2; databases DB1 to DBn, DBn+1 to DBn+m, DBn+m+1 to DBn+m+k; four GDMP transfer links between the sites]
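
The production flow of slides 6 and 8 (data produced at several regional centres, collected at a single reference site, with GDMP moving the files) can be pictured as a subscription model: the reference site subscribes to each producer and pulls whatever that producer announces. The following is a minimal Python sketch under that assumption. The class `Site` and its methods `subscribe`, `publish` and `pull` are hypothetical and are not GDMP's actual interface; the "transfer" is only a set insertion.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A site (e.g. CERN, RC1, RC2) holding a set of database files.

    Hypothetical model: names and methods are illustrative, not GDMP's API.
    """
    name: str
    files: set[str] = field(default_factory=set)
    subscribers: list["Site"] = field(default_factory=list)  # sites subscribed to this one
    pending: list[tuple[str, str]] = field(default_factory=list)  # (producer, file) notices

    def subscribe(self, producer: "Site") -> None:
        """Ask to be notified of every file `producer` publishes from now on."""
        producer.subscribers.append(self)

    def publish(self, new_files: list[str]) -> None:
        """Register freshly produced files locally and notify all subscribers."""
        for f in new_files:
            self.files.add(f)
            for sub in self.subscribers:
                sub.pending.append((self.name, f))

    def pull(self) -> list[str]:
        """Fetch every announced file not yet held (the copy is a set insert here)."""
        fetched = []
        for _producer, f in self.pending:
            if f not in self.files:
                self.files.add(f)
                fetched.append(f)
        self.pending.clear()
        return fetched


# Production: RC1 and RC2 produce, CERN (the reference site) collects.
cern, rc1, rc2 = Site("CERN"), Site("RC1"), Site("RC2")
cern.subscribe(rc1)
cern.subscribe(rc2)
rc1.publish(["DBn+1", "DBn+m"])
rc2.publish(["DBn+m+1"])
fetched = cern.pull()
```

Pull-on-notification keeps the producers simple: they only announce new files, and each subscriber decides when to fetch them.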

  9. Analysis [diagram: federations (boot file and FD) at CERN, RC1 and RC2; databases DB1 to DBn, DBn+1 to DBn+m, DBn+m+1 to DBn+m+k, with the regional-centre databases present at more than one site]

  10. Logical vs Physical Datasets
      Dataset: H2
        id=12345  Hmm.1.hits.DB
          pccms1.bo.infn.it::/data1/Hmm1.hits.DB
          shift23.cern.ch::/db45/Hmm1.hits.DB
        id=12346  Hmm.2.hits.DB
          pccms1.bo.infn.it::/data1/Hmm2.hits.DB
          shift23.cern.ch::/db45/Hmm2.hits.DB
          pccms3.pd.infn.it::/data3/Hmm2.hits.DB
        id=12347  Hmm.3.hits.DB
          shift23.cern.ch::/db45/Hmm3.hits.DB
      Dataset: H2e
        id=5678  Hee.1.hits.DB
          pccms5.roma1.infn.it::/data/Hee1.hits.DB
          shift49.cern.ch::/db123/Hee1.hits.DB
        id=5679  Hee.2.hits.DB
          pccms5.roma1.infn.it::/data/Hee2.hits.DB
          shift49.cern.ch::/db123/Hee2.hits.DB
        id=5680  Hee.3.hits.DB
          pccms5.roma1.infn.it::/data/Hee3.hits.DB
          shift49.cern.ch::/db123/Hee3.hits.DB

  11. Logical vs Physical Datasets • Each dataset is composed of one or more databases • Datasets are managed by the application software • Each DB is uniquely identified by a DBid • Assigning a DBid creates a logical DB • The physical DB is the file • A logical DB has zero, one or more physical instances • The GIS manages the links between a dataset, its logical DBs and its physical DBs
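
The three-level model of slide 11 (dataset, logical DB identified by a DBid, physical files) maps naturally onto two small record types. A minimal sketch, assuming a plain in-memory structure; `LogicalDB`, `Dataset` and their methods are hypothetical names, and the data are dataset H2 from slide 10.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalDB:
    db_id: int            # DBid: assigning it is what creates the logical DB
    name: str             # logical file name, e.g. "Hmm.1.hits.DB"
    # Physical instances as "host::/path"; zero, one or more are allowed.
    replicas: list[str] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    dbs: dict[int, LogicalDB] = field(default_factory=dict)

    def add_db(self, db: LogicalDB) -> None:
        """Attach a logical DB, refusing a DBid already used in this dataset."""
        if db.db_id in self.dbs:
            raise ValueError(f"DBid {db.db_id} already assigned")
        self.dbs[db.db_id] = db

    def physical_files(self) -> list[str]:
        """Every physical instance of every logical DB in the dataset."""
        return [r for db in self.dbs.values() for r in db.replicas]


# Dataset H2 from slide 10.
h2 = Dataset("H2")
h2.add_db(LogicalDB(12345, "Hmm.1.hits.DB", [
    "pccms1.bo.infn.it::/data1/Hmm1.hits.DB",
    "shift23.cern.ch::/db45/Hmm1.hits.DB"]))
h2.add_db(LogicalDB(12346, "Hmm.2.hits.DB", [
    "pccms1.bo.infn.it::/data1/Hmm2.hits.DB",
    "shift23.cern.ch::/db45/Hmm2.hits.DB",
    "pccms3.pd.infn.it::/data3/Hmm2.hits.DB"]))
h2.add_db(LogicalDB(12347, "Hmm.3.hits.DB", [
    "shift23.cern.ch::/db45/Hmm3.hits.DB"]))
```

Keeping the replicas as a list under each DBid mirrors the rule that a logical DB may have zero, one or more physical instances; in the design described on the slide this mapping lives in the GIS rather than in the application.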

  12. Database creation
      [diagram: CERN FD on shift.cern.ch; RC1 Prod and RC1 Ref on pc.rc1.net; databases DB1 to DB5]
      Catalog, stage 1:
        0001  DB1.DB  shift.cern.ch::/shift/data
        0002  DB2.DB  shift.cern.ch::/shift/data
        0003  DB3.DB  shift.cern.ch::/shift/data
        0004  DB4.DB  pc.rc1.net::/pc/data  shift.cern.ch::/shift/data
        0005  (no file listed)
      Catalog, stage 2: entries 0001 to 0004 as in stage 1, plus
        0005  DB5.DB  pc.rc1.net::/pc/data
      Catalog, stage 3: entries 0001 to 0004 as in stage 1, plus
        0005  DB5.DB  pc.rc1.net::/pc/data  shift.cern.ch::/shift/data
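
Slide 12 shows the catalog in three stages: DBid 0005 appears first with no file, then with the file created at pc.rc1.net, then with a second copy at shift.cern.ch. A minimal sketch of those stages, assuming a single central catalog that hands out DBids sequentially (the slides do not say how DBids are allocated); `Catalog`, `reserve_dbid`, `register_file` and `add_replica` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    name: Optional[str] = None                          # set once the file exists
    replicas: list[str] = field(default_factory=list)   # "host::/directory"


class Catalog:
    """Hypothetical central DBid catalog, laid out like the CERN FD tables."""

    def __init__(self) -> None:
        self.entries: dict[int, Entry] = {}

    def reserve_dbid(self) -> int:
        """Logical-DB creation: hand out the next DBid before any file exists."""
        db_id = max(self.entries, default=0) + 1   # sequential: an assumption
        self.entries[db_id] = Entry()
        return db_id

    def register_file(self, db_id: int, name: str, location: str) -> None:
        """Record the first physical instance, created at `location`."""
        entry = self.entries[db_id]
        if entry.name is not None:
            raise ValueError(f"DBid {db_id:04d} already has a file")
        entry.name = name
        entry.replicas.append(location)

    def add_replica(self, db_id: int, location: str) -> None:
        """Record a copy of an existing file at another site."""
        entry = self.entries[db_id]
        if entry.name is None:
            raise ValueError(f"DBid {db_id:04d} has no file to copy yet")
        if location not in entry.replicas:
            entry.replicas.append(location)

    def dump(self) -> list[str]:
        """One line per DBid, in the slide's 'id name locations' layout."""
        return [f"{i:04d} {e.name or '-'} {' '.join(e.replicas)}".rstrip()
                for i, e in sorted(self.entries.items())]


cat = Catalog()
for name in ("DB1.DB", "DB2.DB", "DB3.DB"):
    cat.register_file(cat.reserve_dbid(), name, "shift.cern.ch::/shift/data")
db4 = cat.reserve_dbid()
cat.register_file(db4, "DB4.DB", "pc.rc1.net::/pc/data")
cat.add_replica(db4, "shift.cern.ch::/shift/data")

db5 = cat.reserve_dbid()                                   # stage 1: DBid only
stage1 = cat.dump()[-1]
cat.register_file(db5, "DB5.DB", "pc.rc1.net::/pc/data")   # stage 2: file at RC1
cat.add_replica(db5, "shift.cern.ch::/shift/data")         # stage 3: copy at CERN
```

Separating `reserve_dbid` from `register_file` reflects the slide 11 distinction between creating a logical DB (assigning its DBid) and creating its physical file.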

  13. Replica Management
      [diagram: CERN FD on shift.cern.ch, BO Ref on pc1.bo.infn.it, PD Ref on pc1.pd.infn.it; databases DB1, DB2, DB3]
      Catalog before:
        0001  DB1.DB  shift.cern.ch::/shift/data  pc1.bo.infn.it::/data
        0002  DB2.DB  shift.cern.ch::/shift/data
        0003  DB3.DB  shift.cern.ch::/shift/data
      Catalog after:
        0001  DB1.DB  shift.cern.ch::/shift/data  pc1.bo.infn.it::/data
        0002  DB2.DB  shift.cern.ch::/shift/data  pc1.bo.infn.it::/data
        0003  DB3.DB  shift.cern.ch::/shift/data
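
Slide 13 records a new replica of DB2.DB at pc1.bo.infn.it. One reasonable ordering for such an operation, sketched here as an assumption rather than as GDMP's actual behaviour, is copy first and register second, so the catalog never points at a missing or half-written file. Local directories under a temporary root stand in for the two sites; `replicate` and the catalog layout (logical name mapped to a list of paths) are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def replicate(catalog: dict[str, list[Path]], db_name: str, dest_dir: Path) -> Path:
    """Copy `db_name` into `dest_dir`, then record the new replica.

    The catalog is updated only after the copy is complete, so it never
    lists a replica that is missing or half-written.
    """
    replicas = catalog[db_name]
    if not replicas:
        raise LookupError(f"{db_name} has no physical instance to copy from")
    target = dest_dir / db_name
    if target in replicas:
        return target                                   # already present here
    dest_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    shutil.copy2(replicas[0], partial)                  # copy from the first known replica
    partial.replace(target)                             # rename into place (atomic on POSIX)
    replicas.append(target)
    return target


# Local directories stand in for the sites of slide 13.
root = Path(tempfile.mkdtemp())
cern_dir = root / "shift.cern.ch" / "shift" / "data"
bo_dir = root / "pc1.bo.infn.it" / "data"
cern_dir.mkdir(parents=True)

catalog: dict[str, list[Path]] = {}
for name in ("DB1.DB", "DB2.DB", "DB3.DB"):
    (cern_dir / name).write_bytes(b"placeholder database contents")
    catalog[name] = [cern_dir / name]

replicate(catalog, "DB1.DB", bo_dir)
replicate(catalog, "DB2.DB", bo_dir)
```

After the two calls the catalog matches the "after" state of the slide: DB1.DB and DB2.DB have two instances each, DB3.DB only one.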

  14. Example Summary • Basic Replica Manager functionality for production will be tested on CMS production (GDMP) by the end of 2000 • Next comes an Information Server to allow easy synchronization of federations and optimized data access during analysis • The same functionality shown for Objectivity/DB may/should be implemented for other kinds of files
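
The summary mentions optimized data access during analysis. One simple policy, given only as an illustration and not as the Information Server's actual algorithm, is to pick the replica nearest the client by host name: same host first, then same domain, then anything else. The helper names and the two-label domain heuristic are assumptions; the replica list is DB id=12346 from slide 10.

```python
def host_of(replica: str) -> str:
    """'host::/path' -> 'host'."""
    return replica.split("::", 1)[0]


def domain_of(host: str) -> str:
    """Crude domain guess: the last two labels, e.g. 'infn.it' or 'cern.ch'."""
    return ".".join(host.split(".")[-2:])


def choose_replica(replicas: list[str], client_host: str) -> str:
    """Prefer a replica on the client's host, then one in its domain, then any."""
    if not replicas:
        raise LookupError("no physical instance available")

    def distance(replica: str) -> int:
        h = host_of(replica)
        if h == client_host:
            return 0
        if domain_of(h) == domain_of(client_host):
            return 1
        return 2

    return min(replicas, key=distance)   # ties keep catalog order


hmm2 = [
    "pccms1.bo.infn.it::/data1/Hmm2.hits.DB",
    "shift23.cern.ch::/db45/Hmm2.hits.DB",
    "pccms3.pd.infn.it::/data3/Hmm2.hits.DB",
]
```

The domain test is deliberately crude: every INFN site shares `infn.it`, so a Roma1 client is treated as equally close to Bologna and Padova; a real service would rely on network measurements or site topology instead.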

  15. Conclusions • Data management tools are needed to cope with the complexity of new-generation experiments (not only at the LHC) • The GRID projects (INFN and EU) are already providing solutions to real-life problems • Milestones and objectives are well defined (meeting them will not be trivial)
