
The DataGRID Project


Presentation Transcript


  1. The DataGRID Project – Fabrizio Gagliardi / Olivier Martin / Mirco Mazzucato / Les Robertson, CERN - IT Division, September 2001

  2. The DataGrid Project – Susanna Tosi (CNR-CED), s.tosi@cedrc.cnr.it, DataGrid Dissemination Office. Edited by the DataGrid Dissemination Office, CNR

  3. The EU DataGrid Project – Fabrizio Gagliardi, CERN, DataGrid General Manager, March 2001, F.Gagliardi@cern.ch

  4. The DataGRID – A Testbed for Worldwide Distributed Scientific Data Analysis – Nordunet Conference, Helsinki – Les Robertson, CERN - IT Division, 29 September 2000, les.robertson@cern.ch

  5. LHC Computing & Grid(s) – Mirco Mazzucato, INFN-Padova

  6. Enabling Worldwide Scientific Collaboration • an example of the problem • the DataGRID solution • concluding remarks

  7. The Beginning of DataGRID The DataGRID project evolved from the conjunction of • the search for a practical solution to building the computing system for CERN’s next accelerator – the Large Hadron Collider (LHC) • and the appearance of Ian Foster and Carl Kesselman’s book – The GRID – Blueprint for a New Computing Infrastructure

  8. The Problems • Vast quantities of data • Enormous computing requirements • Researchers spread all over the world

  9. The Large Hadron Collider Project – 4 detectors (ALICE, ATLAS, CMS, LHCb). Storage: raw recording rate 0.1 – 1 GBytes/sec, accumulating at 5-8 PetaBytes/year, 10 PetaBytes of disk. Processing: 200,000 of today's fastest PCs.
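
  As a rough cross-check of how the raw recording rate turns into the yearly volume, assuming on the order of 10^7 seconds of effective data taking per year (a common planning figure for accelerator experiments; the slide itself does not state the live time):

    \[ 0.1\ \mathrm{GB/s} \times 10^{7}\ \mathrm{s/year} \approx 1\ \mathrm{PB/year}, \qquad 1\ \mathrm{GB/s} \times 10^{7}\ \mathrm{s/year} = 10^{16}\ \mathrm{bytes} = 10\ \mathrm{PB/year} \]

  which brackets the 5-8 PetaBytes/year accumulation quoted above.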

  10. Computing fabric at CERN (2005) – [diagram: thousands of CPU boxes, thousands of disks and hundreds of tape drives, linked by a farm network, a storage network and LAN-WAN routers; real-time detector data arrives through the LAN-WAN routers, and the figures marked on the links are data rates in Gbps]. For one experiment alone: 0.5 M SPECint95, > 5K processors, 0.5 PByte disk, > 5K disks.

  11. LHC Computing Fabric at CERN – estimated computing resources required at CERN for the LHC experiments in 2006 (for comparison, all physics at CERN today: ~10 000 SPECint95, ~1 200 CPUs, ~25 TB disk, ~0.1 PB tape):

                                          ALICE     ATLAS     CMS       LHCb      Total
    CPU capacity (SPECint95), 2006        420 000   520 000   600 000   220 000   1 760 000
    estimated # CPUs in 2006              3 000     3 000     3 000     1 500     10 500
    disk capacity (TB), 2006              800       750       650       450       2 650
    mag. tape capacity (PB), 2006         3.7       3.0       1.8       0.6       9.1
    aggregate I/O rate, disk (GB/sec)     100       100       40        100       340
    aggregate I/O rate, tape (GB/sec)     1.2       0.8       0.8       0.2       3.0

    (the aggregate I/O rates set the effective throughput required of the LAN backbone)
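
  A quick consistency check on the table, reading the "today" column as ~10 000 SPECint95 spread over ~1 200 CPUs (this reading of the original slide layout is an interpretation):

    \[ \frac{1\,760\,000}{10\,500} \approx 170\ \text{SI95 per box in 2006}, \qquad \frac{10\,000}{1\,200} \approx 8\ \text{SI95 per box today} \]

  i.e. roughly a 20x per-box improvement is assumed over about six years, broadly in line with a Moore's-law doubling every 18 months or so (2^4 = 16x).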

  12. Simulated Collision in the ATLAS Detector

  13. Complex Data = More CPU Per Byte – [chart: estimated CPU capacity required at CERN, in K SI95 (axis 0 – 5,000), over the years 1998 – 2010, with separate curves for LHC and for other experiments, plotted against Moore's law – some measure of the capacity technology advances provide for a constant number of processors or investment; marked point: Jan 2000, 3.5K SI95]. les.robertson@cern.ch

  14. Funding – [chart: estimated cost of the facility at CERN, broken down into processors, disks, mass storage, systems administration, physics WAN and R&D testbed, compared with the budget level in 2000 for all physics data handling]. • Requirements growing faster than Moore's law • CERN's overall budget is fixed • Estimated cost of facility at CERN ~ 30% of offline requirements* (*assumes physics in July 2005, rapid ramp-up of luminosity)
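
  A back-of-the-envelope reading of why the two bullets above bite, using the ~10 000 SPECint95 installed for all physics today (slide 11) and the ~1 760 000 SPECint95 required in 2006, and assuming that capacity at constant investment doubles roughly every 18 months (the doubling period is an assumption, not stated on the slide):

    \[ \frac{1\,760\,000}{10\,000} \approx 176\times \ \text{required growth by 2006}, \qquad 2^{\,6/1.5} = 16\times \ \text{from technology alone at fixed cost} \]

  so the requirement grows roughly an order of magnitude faster than a fixed budget can track – which is the case for sharing resources across many sites.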

  15. CERN's Users in the World – Europe: 267 institutes, 4603 users. Elsewhere: 208 institutes, 1632 users.

  16. Solution (i) Large Scale Computing Fabrics
    • Long experience in HEP with large clusters – processors, disk farms, mass storage: reliable, manageable, flexible growth
    • Applications adapted to a well established computing model
    • Currently using thousands of simple PCs, IDE disk servers, Ethernet – everything commodity components, tape storage excepted
    • New developments needed to scale these up by an order of magnitude, to tens of thousands of components:
      • maintaining reliability and availability targets
      • containing management costs
      • deploying Terabit LAN switches
      • new levels of management automation – installation, monitoring, auto-diagnosing, self-healing

  17. Large Scale Computing Fabrics (cont.)
    • But the requirements are greater than can be satisfied at a single site:
      • political/financial arguments against very large facilities
      • national constraints from funding organizations
      • exploiting existing computing center infrastructure
    (Compare with the geographical distribution of super-computing centers)

  18. Solution (ii) Regional Centres – a Multi-tier Model – [diagram: CERN at the centre; Tier 1 regional centres such as FermiLab (USA), Rutherford (UK), IN2P3/Lyon (France), CNAF/Bologna (Italy), NIKHEF (NL); Tier 2 centres; then university physics departments and desktops (Lab a, Lab b, Uni x, Uni y, ...)]. Is this usable? manageable? les.robertson@cern.ch

  19. The Promise of Grid Technology – What does the Grid do for you?
    • you submit your work
    • and the Grid
      • finds convenient places for it to be run
      • optimises use of the widely dispersed resources
      • organises efficient access to your data – caching, migration, replication
      • deals with authentication to the different sites that you will be using
      • interfaces to local site resource allocation mechanisms, policies
      • runs your jobs
      • monitors progress
      • recovers from problems
      • .. and .. tells you when your work is complete

  20. The GRID provides the glue – "When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances" (Ian Foster)

  21. The Grid from a Services View – [layer diagram]:
    • Applications – e.g. High Energy Physics, Cosmology, Chemistry, Environment, Biology
    • Application Toolkits – distributed computing, data-intensive applications, remote visualization, problem solving, remote instrumentation, collaborative applications
    • Grid Services (middleware) – resource-independent and application-independent services: authentication, authorization, resource location, resource allocation, events, accounting, remote data access, information, policy, fault detection
    • Grid Fabric (resources) – resource-specific implementations of basic services, e.g. transport protocols, name servers, differentiated services, CPU schedulers, public key infrastructure, site accounting, directory service, OS bypass

  22. The Globus Team: Layered Grid Architecture (shown alongside the Internet protocol architecture)
    • Application
    • Collective – "Coordinating multiple resources": ubiquitous infrastructure services, app-specific distributed services
    • Resource – "Sharing single resources": negotiating access, controlling use
    • Connectivity – "Talking to things": communication (Internet protocols) & security (cf. Transport / Internet layers)
    • Fabric – "Controlling things locally": access to, and control of, resources (cf. Link layer)
    Reference: The Anatomy of the Grid: Enabling Scalable Virtual Organizations – I. Foster, C. Kesselman, S. Tuecke, Intl J. Supercomputer Applns, 2001. www.globus.org/research/papers/anatomy.pdf

  23. The DataGRID Project www.eu-datagrid.org

  24. DataGRID Partners
    Managing partners: UK – PPARC, Italy – INFN, France – CNRS, Holland – NIKHEF, Italy – ESA/ESRIN, CERN – project management (Fabrizio Gagliardi)
    Industry: IBM (UK), Communications & Systems (F), Datamat (I)
    Associate partners: University of Heidelberg, CEA/DAPNIA (F), IFAE Barcelona, CNR (I), CESNET (CZ), KNMI (NL), SARA (NL), SZTAKI (HU), Finland – Helsinki Institute of Physics & CSC, Swedish Natural Science Research Council (Parallelldatorcentrum–KTH, Karolinska Institute), Istituto Trentino di Cultura, Zuse Institut Berlin

  25. The Data Grid Project – Summary
    • European dimension – EC funding 3 years, ~10M Euro; closely coupled to several national initiatives
    • Multi-science
    • Technology leverage – Globus, Condor, HEP farming & MSS, Monarc, INFN-Grid, Géant
    • Emphasis – data, scaling, reliability; rapid deployment of working prototypes of production quality; collaboration with other European and US projects
    • Status – started 1 January 2001; Testbed 1 scheduled for operation at end of year
    • Open – open source and communication; Global Grid Forum; Industry and Research Forum

  26. Testbed Sites (>40) – [map of HEP sites and ESA sites]: Dubna, Lund, Moscow, Estec, KNMI, RAL, Berlin, IPSL, Prague, Paris, Brno, CERN, Lyon, Santander, Milano, Grenoble, PD-LNL, Torino, Madrid, Marseille, BO-CNAF, Pisa, Lisboa, Barcelona, ESRIN, Roma, Valencia, Catania. Contacts: Francois.Etienne@in2p3.fr – Antonia.Ghiselli@cnaf.infn.it

  27. Applications
    • HEP – the four LHC experiments; live proof-of-concept prototype of the Regional Centre model
    • Earth Observation – ESA-ESRIN; KNMI (Dutch meteo) climatology; processing of atmospheric ozone data derived from ERS GOME and ENVISAT SCIAMACHY sensors
    • Biology – CNRS (France), Karolinska (Sweden); application being defined

  28. DataGRID Challenges • Data • Scaling • Reliability • Manageability • Usability

  29. DataGRID Challenges (cont.) • Large, diverse, dispersed project • but coordinating this European activity is one of the project’s raisons d’être • Collaboration, convergence with US and other Grid activities – this area is very dynamic • Organising adequate Network bandwidth – a vital ingredient for success of a Grid • Keeping the feet on the ground – The GRID is a good idea but not the panacea suggested by some

  30. Programme of work
    Middleware – starting with a firm base in the Globus toolkit
      • Grid workload management, data management, monitoring services
    Fabric
      • fully automated local computing fabric management
      • mass storage
    Production quality testbed
      • testbed integration & network services
      • > 40 sites
      • Géant infrastructure
    Scientific applications
      • Earth Observation, Biology, High Energy Physics
    Overall goal: operate a productive environment for end-to-end applications

  31. Grid Middleware – building on an existing framework (Globus)
    • Workload management
      • the workload is chaotic – unpredictable job arrival rates, data access patterns
      • the goal is maximising the global system throughput (events processed per second)
      • start with Condor Class-Ads (a toy matchmaking sketch follows below)
    • Current issues
      • declaration of data requirements at job submission time – the application discovers the objects it requires during execution; mapping of objects to the files managed by the Grid
      • decomposition of jobs (e.g. moving jobs to where the data is)
      • interactive workloads
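
  To make the Class-Ads idea concrete, here is a minimal, self-contained Python sketch of ClassAd-style matchmaking: the job advertises Requirements and a Rank, the broker keeps the resources that satisfy the requirements and picks the best-ranked one. The attribute names, site names and ranking rule are invented for the illustration; this is not the Condor or DataGrid implementation.

    # Toy ClassAd-style matchmaking. A job "ad" carries a Requirements
    # predicate and a Rank expression; resource "ads" describe what each
    # site offers. Attribute names and values are illustrative only.

    job_ad = {
        "Requirements": lambda r: r["os"] == "linux" and r["free_cpus"] >= 10,
        "Rank":         lambda r: (r["close_to_data"], r["free_cpus"]),
    }

    resource_ads = [
        {"site": "cern",  "os": "linux", "free_cpus": 250, "close_to_data": True},
        {"site": "ral",   "os": "linux", "free_cpus": 40,  "close_to_data": False},
        {"site": "uni_x", "os": "nt",    "free_cpus": 1,   "close_to_data": False},
    ]

    def match(job, resources):
        """Keep resources that satisfy Requirements, return the best by Rank."""
        candidates = [r for r in resources if job["Requirements"](r)]
        return max(candidates, key=job["Rank"]) if candidates else None

    best = match(job_ad, resource_ads)
    print("job scheduled at:", best["site"] if best else "no match")

  Real ClassAd matchmaking evaluates requirements symmetrically on both the job and the machine advertisement; the sketch shows only the one-sided flavour of the idea.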

  32. Data Management & Application Monitoring
    • Data management
      • management of petabyte-scale data volumes, in an environment with limited network bandwidth and heavy use of mass storage (tape)
      • caching, replication, synchronisation (a toy replica-selection sketch follows below)
      • support for the object database model
    • Application monitoring
      • tens of thousands of components, thousands of jobs and individual users
      • end-user – tracking of the progress of jobs and aggregates of jobs
      • understanding application and grid level performance
      • administrator – understanding which global-level applications were affected by failures, and whether and how to recover
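
  As a toy illustration of the caching/replication point, a sketch of replica selection against a replica catalogue: given a logical file name, pick the physical copy that is cheapest to read from the site where the job runs. The catalogue contents, site names and cost weights are invented for the example; the project's actual middleware is not being described here.

    # Toy replica selection. A replica catalogue maps a logical file name
    # to its physical copies; the cheapest copy is chosen, preferring local
    # disk over wide-area transfer and over tape staging.

    replica_catalogue = {
        "lfn:run1234-ozone-profiles": [
            {"site": "cern", "storage": "tape"},
            {"site": "lyon", "storage": "disk"},
            {"site": "cnaf", "storage": "disk"},
        ],
    }

    def access_cost(job_site, replica):
        cost = 0 if replica["site"] == job_site else 10     # wide-area transfer
        cost += 100 if replica["storage"] == "tape" else 1  # tape mount + staging
        return cost

    def select_replica(lfn, job_site):
        replicas = replica_catalogue.get(lfn, [])
        if not replicas:
            raise LookupError("no replica registered for " + lfn)
        return min(replicas, key=lambda r: access_cost(job_site, r))

    chosen = select_replica("lfn:run1234-ozone-profiles", "cern")
    print("read from", chosen["site"], "(" + chosen["storage"] + ")")

  With these invented weights a remote disk copy beats a local tape copy, which is the kind of trade-off the caching and replication services are there to automate.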

  33. Fabric Management
    • Local fabric
      • effective local site management of giant computing fabrics
      • automated installation, configuration management, system maintenance
      • automated monitoring and error recovery – resilience, self-healing (a minimal sketch follows below)
      • performance monitoring
      • characterisation, mapping, management of local Grid resources
    • Mass storage management
      • multi-PetaByte data storage
      • expect tapes to be used only for archiving data
      • "real-time" data recording requirement
      • active tape layer – 1,000s of users
      • uniform mass storage interface
      • exchange of data and meta-data between mass storage systems
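
  A minimal sketch of what "automated monitoring and error recovery" can look like at the fabric level: poll a health check on each node and, after repeated failures, trigger an automatic recovery action instead of paging an operator. The check, the threshold and the recovery action are invented placeholders, not the project's fabric tools.

    # Toy self-healing monitor. Each pass runs a health check per node;
    # after FAILURE_THRESHOLD consecutive failures the node is recovered
    # automatically. Checks, node names and actions are placeholders.

    from collections import defaultdict

    FAILURE_THRESHOLD = 3          # consecutive failures before automatic recovery
    failures = defaultdict(int)    # node -> current consecutive failure count

    def disk_ok(node):
        # Stand-in health check; a real fabric monitor would query sensors.
        return node != "diskserver042"

    def recover(node):
        # Stand-in recovery action: e.g. drain the node and schedule a reinstall.
        print(node, "drained and scheduled for automated reinstallation")

    def monitor_once(nodes):
        for node in nodes:
            if disk_ok(node):
                failures[node] = 0
            else:
                failures[node] += 1
                if failures[node] >= FAILURE_THRESHOLD:
                    recover(node)
                    failures[node] = 0

    cluster = ["diskserver041", "diskserver042", "diskserver043"]
    for _ in range(FAILURE_THRESHOLD):
        monitor_once(cluster)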

  34. Infrastructure – operate a production quality trans-European "testbed"
    • interconnecting clusters in about 40 sites
    • integrate, build and operate successive releases of the project middleware
    • negotiate and manage the network infrastructure – initially TEN-155, migrating to Géant
    • demonstrations, data challenges → performance, reliability
    • production environment for applications
    • inter-working with US projects (GriPhyN, PPDG, DTF, iVDGL)

  35. Concluding remarks
    • The vision – easy and reliable access to very large, shared, worldwide distributed computing facilities, without the user having to know the details
    • The DataGRID project will provide – a large (in capacity and geography) working testbed, and practical experience and tools that can be adapted to the needs of a wide range of scientific and engineering applications
