
Data Grid projects in HENP
R. Pordes, Fermilab






  1. Many HENP projects are working on the infrastructure for globally distributed simulated data production, data processing, and analysis. For example:
  • Experiments now taking data: BaBar (0.3 PB/year), D0 (0.5 PB/year in 2002), STAR
  • Funded projects: GriPhyN (USA NSF, $11.9M + $1.6M); PPDG I (USA DOE, $2M); PPDG II (USA DOE, $9.5M); EU DataGrid (EU, $9.3M)
  • Just-funded or proposed projects: iVDGL (USA NSF, $15M + $1.8M + UK); DTF (USA NSF, $45M + $4M/yr); DataTag (EU EC, $2M?); GridPP (UK PPARC, > $15M)
  • Other national projects: UK e-Science (> $100M for 2001-2004); Italy, France, (Japan?)

  2. Current Data Grid Projects at Fermilab
  • D0-SAM
  • The D0 Run 2 Data Grid. The experiment is starting to take data now: 500 physicists, 80 institutions, 17 countries, and 30 TB in the data store.
  • Started 3 years ago by the Fermilab Run 2 Joint Offline project.
  • Currently ~100 active users; 8 sites in the US and Europe.
  • http://d0db.fnal.gov/sam
  • PPDG
  • A three-year, ten-institution, DOE/SciDAC-funded development and integration effort to deploy data grids for HENP.
  • The D0 and CMS Fermilab data handling groups are collaborating on and contributing to the project.
  • "Vertical integration" of Grid middleware components into HENP experiments' ongoing work.
  • A laboratory for "experimental computer science".
  • http://www.ppdg.net

  3. D0 SAM
  • Transparent data replication and caching, with support for multiple transports and interfaces to mass storage systems (bbftp, scp; enstore, SARA).
  • A well-developed metadata catalog for data, job, and physics-parameter tracking.
  • A generalized interface to batch systems (LSF, FBS, PBS, Condor).
  • "Economic concepts" to implement collaboration policies for resource management and scheduling.
  • An interface for physics dataset definition and selection (see the sketch after this slide).
  • Interfaces to user programs in C++ and Python.
  • Support for a robust production service, with restart facilities, self-managing agent servers, and monitoring and logging services throughout.
  • Support for simulation production as well as event-data processing and analysis.
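The slide names C++ and Python interfaces but the transcript carries no code, so the following is a minimal sketch of what dataset definition and cache-aware file access could look like behind such an interface. Every name here (SamClient, define_dataset, get_file, the station and query strings) is invented for illustration; this is not the actual SAM API.

# Hypothetical sketch only: SAM's real Python bindings differ. The class and
# method names below are illustrative stand-ins, not actual SAM calls.
class SamClient:
    def __init__(self, station, cache_dir):
        self.station = station        # e.g. an invented station name
        self.cache_dir = cache_dir    # local disk cache managed by the station
        self.catalog = {}             # dataset name -> list of file names

    def define_dataset(self, name, query):
        # Register a physics dataset as a metadata query (trigger, run range, ...).
        self.catalog[name] = self._run_catalog_query(query)

    def _run_catalog_query(self, query):
        # A real station would consult the central metadata catalog here.
        return [f"file_{i:03d}.raw" for i in range(3)]

    def get_file(self, filename):
        # Return a local path, staging from a peer cache or tape on a miss.
        local = f"{self.cache_dir}/{filename}"
        print(f"staging {filename} via station {self.station} -> {local}")
        return local

sam = SamClient("fnal-analysis", "/cache")
sam.define_dataset("zmumu_2001", "trigger=L3_MU and run>120000")
for f in sam.catalog["zmumu_2001"]:
    path = sam.get_file(f)   # an analysis job then reads from the cached path

A real station would resolve the query against the central metadata catalog and pick a transport (bbftp, scp) and mass-storage backend per file; the sketch only mimics the shape of that workflow.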

  4. Simplified Database Schema
  [Schema diagram. Entities include: MC Process & Decay; Data Tier; Run; Run Conditions; Luminosity; Calibration; Alignment; Physical Data Stream; Trigger Configuration; Events (ID, Event Number, Trigger L1/L2/L3, Off-line Filter, Thumbnail); Files (ID, Name, Format, Size, # Events); Event-File Catalog; Project; File Storage Locations; Creation & Processing Info; Station Config. & Cache Info; Group and User Information; Volume.]
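To make the flattened schema concrete, here is a small, illustrative rendering of three of its boxes (Files, Events, and the Event-File Catalog) as SQL tables in an in-memory SQLite database. Only the column names shown on the slide are taken from it; the types and every other detail are assumptions for the sketch, not the D0 schema.

# Illustrative only: a few of the slide's entities rendered as SQL tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    id        INTEGER PRIMARY KEY,
    name      TEXT,
    format    TEXT,      -- e.g. raw data vs. thumbnail
    size      INTEGER,   -- bytes (unit assumed)
    n_events  INTEGER    -- the slide's "# Events" column
);
CREATE TABLE events (
    id           INTEGER PRIMARY KEY,
    event_number INTEGER,
    trigger_l1   TEXT,
    trigger_l2   TEXT,
    trigger_l3   TEXT
);
-- Many-to-many link, as in the slide's separate Event-File Catalog box:
-- one event can span files, and one file holds many events.
CREATE TABLE event_file_catalog (
    event_id INTEGER REFERENCES events(id),
    file_id  INTEGER REFERENCES files(id)
);
""")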

  5. D0 Monte Carlo Management System
  [Workflow diagram. Input requests arrive from remote sites ("ocean/mountain/prairie"); admin tools review/authorize them, establish priorities, and assign work through MC management DB tables. SAM tape and station caches at the FNAL data-serving station and at remote stations feed analysis projects, backed by MSS tape, with web summaries of progress. The diagram distinguishes existing SAM pieces, remote pieces, and new parts.]
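The "establish priorities / assign work" step in the diagram can be illustrated with a toy priority queue that hands authorized Monte Carlo requests to remote farms. The request fields, the priority scheme, and the farm names are all invented; this is not the D0 MC management code.

# Toy illustration of prioritizing and assigning authorized MC requests.
import heapq

requests = [
    # (priority: lower = more urgent, request id, events requested) -- all invented
    (1, "mcp-101", 500_000),
    (3, "mcp-102", 2_000_000),
    (2, "mcp-103", 250_000),
]
heapq.heapify(requests)            # most urgent request comes out first

remote_farms = ["nikhef", "lancaster", "in2p3"]
while requests:
    prio, req_id, n_events = heapq.heappop(requests)
    farm = remote_farms[len(requests) % len(remote_farms)]  # naive round-robin
    print(f"assign {req_id} ({n_events:,} events, prio {prio}) -> {farm}")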

  6. Data Added to SAM
  [Chart: cumulative data added to SAM – 160k files, 25 TB.]

  7. SAM on the Global Scale
  An interconnected network of primary cache stations, communicating and replicating data where it is needed, backed by MSS over the WAN (a toy routing sketch follows this slide).
  • Current active stations:
  • FNAL (several stations)
  • Lyon, FR (IN2P3)
  • Amsterdam, NL (NIKHEF)
  • Lancaster, UK
  • Imperial College, UK
  • Others in the US
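Here is a toy sketch of the routing idea behind "replicating data where it is needed": prefer the local station's cache, then any peer station's cache, and fall back to the mass storage system (tape) only as a last resort. The station names and the holdings map are invented for illustration; real SAM station logic is far richer.

# Sketch of cache-then-peer-then-tape routing; all data below is invented.
STATIONS = {
    "fnal":   {"file_a.raw", "file_b.raw"},
    "in2p3":  {"file_b.raw"},
    "nikhef": set(),
}

def locate(filename, local_station):
    """Prefer the local cache, then any peer cache, then the mass store."""
    if filename in STATIONS[local_station]:
        return f"cache://{local_station}/{filename}"
    for peer, holdings in STATIONS.items():
        if peer != local_station and filename in holdings:
            # Replicate from the peer and record the new local copy.
            STATIONS[local_station].add(filename)
            return f"replicated://{peer}->{local_station}/{filename}"
    STATIONS[local_station].add(filename)
    return f"mss://tape/{filename}"   # last resort: stage from tape

print(locate("file_b.raw", "nikhef"))   # served from a peer cache, not tape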

  8. PPDG – profile and 3-year goals
  • Funding: US DOE approved 1/1/3/3/3 $M for 1999–2003.
  • Computer science: Globus (Foster), Condor (Livny), SDM (Shoshani), Storage Resource Broker (Moore).
  • Physics: BaBar, D0, STAR, JLAB, ATLAS, CMS.
  • National laboratories: BNL, Fermilab, JLAB, SLAC, ANL, LBNL.
  • Universities: Caltech, SDSS, UCSD, Wisconsin.
  • Hardware/networks: no funding.
  Goals: experiments successfully embrace and deploy grid services throughout their data handling and analysis systems, based on shared experiences and developments; computer science groups evolve common interfaces and services so as to serve not only the needs of a range of High Energy and Nuclear Physics experiments but also other scientific communities.

  9. PPDG – main areas of work
  • Extending Grid services:
  • Storage resource management and interfacing.
  • Robust file replication and information services (see the retry sketch after this slide).
  • Intelligent job and resource management.
  • System monitoring and information capture.
  • End-to-end applications: the experiments' data handling systems, in use now and in the near future, give real-world requirements, testing, and feedback.
  • Error reporting and response.
  • Fault-tolerant integration of complex components.
  • Cross-project activities:
  • Authentication and certificate authorization and exchange.
  • European DataGrid common project for data transfer (Grid Data Management Pilot).
  • SC2001 demo with GriPhyN.
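As a sketch of the "robust file replication" and "error reporting and response" items, here is a retry-with-backoff wrapper around a placeholder transfer call. Nothing here is PPDG code: transfer() stands in for a real transport such as bbftp, and its failures are simulated.

# Hedged sketch: retrying a transfer with exponential backoff and logging.
import logging
import random
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("replicator")

def transfer(src, dst):
    """Placeholder transport: fails randomly to exercise the retry path."""
    if random.random() < 0.5:
        raise IOError(f"transfer {src} -> {dst} failed")

def replicate(src, dst, max_attempts=4, backoff=0.5):
    """Retry the transfer with exponential backoff, logging every outcome."""
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(src, dst)
            log.info("copied %s -> %s on attempt %d", src, dst, attempt)
            return True
        except IOError as err:
            log.warning("attempt %d/%d: %s", attempt, max_attempts, err)
            time.sleep(backoff * 2 ** (attempt - 1))
    log.error("giving up on %s -> %s", src, dst)   # the "error response" hook
    return False

replicate("fnal:/data/file_a.raw", "in2p3:/cache/file_a.raw")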

  10. Management/coordination challenge: align PPDG milestones to experiment data challenges in the first year:
  • ATLAS – production distributed data service – 6/1/02
  • BaBar – analysis across partitioned dataset storage – 5/1/02
  • CMS – distributed simulation production – 1/1/02
  • D0 – distributed analyses across multiple workgroup clusters – 4/1/02
  • STAR – automated dataset replication – 12/1/01
  • JLAB – policy-driven file migration – 2/1/02

  11. Example data grid hierarchy – CMS Tier 1 and Tier 2
  [Diagram: Tier 0 (CERN) at the top feeds the FNAL Tier 1 and other Tier 1 centers; each Tier 1 serves several Tier 2 (T2) sites.]
  Tier 2s are used for simulation production today.
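The tiered topology on the slide can be written down as a tiny data structure to show how a Tier 2 site finds the Tier 1 it publishes to. Only CERN (Tier 0) and FNAL (Tier 1) come from the slide; the other site names are placeholders.

# Minimal sketch of the tiered topology; non-slide site names are invented.
grid = {
    "tier0": "CERN",
    "tier1": {
        "FNAL": ["T2-a", "T2-b"],          # Tier 2 sites hang off their Tier 1
        "other-tier1": ["T2-c", "T2-d"],
    },
}

def route_upward(tier2, topology):
    """Find which Tier 1 center a Tier 2 site sends its simulation output to."""
    for t1, t2_sites in topology["tier1"].items():
        if tier2 in t2_sites:
            return t1
    return topology["tier0"]               # fall back to the Tier 0 at CERN

print(route_upward("T2-b", grid))          # -> FNAL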

  12. PPDG views cross-project coordination as important:
  • Other Grid projects in our field:
  • GriPhyN – Grid Physics Network
  • European DataGrid
  • Storage Resource Management collaboratory
  • HENP Data Grid Coordination Committee
  • Deployed systems:
  • ATLAS, BaBar, CMS, D0, STAR, and JLAB experiment data handling systems
  • iVDGL – International Virtual Data Grid Laboratory
  • Use DTF computational facilities?
  • Standards committees:
  • Internet2 High Energy and Nuclear Physics Working Group
  • Global Grid Forum
