
Middleware Development and Deployment Status



Presentation Transcript


  1. Middleware Development and Deployment Status Tony Doyle PPE & PPT Lunchtime Talk

  2. Contents
  • What is GridPP doing as part of the International effort?
  • What was GridPP1?
  • Is GridPP a Grid?
  • What is planned for GridPP2?
  • What lies ahead?
  • Summary
  • Why? What? How? When?
  • What are the Challenges?
  • What is the scale?
  • How does the Grid work?
  • What is the status of (EGEE) middleware development?
  • What is the deployment status?

  3. Science generates data and might require a Grid?
  • Earth Observation
  • Bioinformatics
  • Astronomy
  • Digital Curation
  • Healthcare
  • Collaborative Engineering

  4. What are the challenges? Must:
  • share data between thousands of scientists with multiple interests
  • link major (Tier-0 [Tier-1]) and minor (Tier-1 [Tier-2]) computer centres
  • ensure all data is accessible anywhere, anytime
  • grow rapidly, yet remain reliable for more than a decade
  • cope with the different management policies of different centres
  • ensure data security
  • be up and running routinely by 2007

  5. What are the challenges? (Data Management, Security and Sharing)
  1. Software process
  2. Software efficiency
  3. Deployment planning
  4. Link centres
  5. Share data
  6. Manage data
  7. Install software
  8. Analyse data
  9. Accounting
  10. Policies

  6. Tier-1 Scale
  • Step 1: financial planning
  • Step 2: compare to (e.g. Tier-1) experiment requirements
  • Step 3: conclude that more than one centre is needed
  • Step 4: a Grid?
  Ian Foster / Carl Kesselman: "A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities."
  Currently network performance doubles every year (or so) for unit cost.

  7. What is the Grid? The hour-glass model:
  I. Experiment Layer (e.g. Portals)
  II. Application Middleware (e.g. Metadata)
  III. Grid Middleware (e.g. Information Services)
  IV. Facilities and Fabrics (e.g. Storage Services)

  8. How do I start? http://www.gridpp.ac.uk/start/
  • Getting started as a Grid user
  • Quick start guide for LCG2: GridPP guide to starting as a user of the Large Hadron Collider Computing Grid.
  • Getting an e-Science certificate: in order to use the Grid you need a Grid certificate. This page introduces the UK e-Science Certification Authority, which issues certificates to users. You can get a certificate from here.
  • Using the LHC Computing Grid (LCG): CERN's guide on the steps you need to take in order to become a user of the LCG. This includes contact details for support.
  • LCG user scenario: describes in a practical way the steps a user has to follow to send and run jobs on LCG and to retrieve and process the output successfully.
  • Currently being improved..

  9. Job Submission (behind the scenes)
  The original slide is a diagram whose components are: User Interface (UI), Resource Broker, Job Submission Service, Logging & Bookkeeping, Information Service, Replica Catalogue, Authorisation & Authentication, Compute Element (CE) and Storage Element (SE). The flow, reconstructed from the diagram labels:
  • The user runs grid-proxy-init on the UI and submits a job described in JDL together with an input "sandbox" of files.
  • The Resource Broker queries the Information Service (SE & CE info) and the Replica Catalogue (dataset info), expands the JDL and matches the job to a Compute Element.
  • The Job Submission Service translates the request to Globus RSL and submits it to the CE; the Job Submit Event and subsequent job status are published to Logging & Bookkeeping, which the user can query.
  • On completion the output "sandbox" is returned to the user via the Resource Broker.
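To make the JDL step concrete, here is a minimal sketch of a job description of the kind accepted by the EDG/LCG-2 Resource Broker. The attribute names are standard JDL; the script and file names are hypothetical examples:

```
# Minimal JDL sketch (hypothetical file names)
Executable    = "/bin/sh";
Arguments     = "run_analysis.sh";
StdOutput     = "job.out";
StdError      = "job.err";
InputSandbox  = {"run_analysis.sh"};       # shipped to the CE with the job
OutputSandbox = {"job.out", "job.err"};    # returned to the user on completion
```

On LCG-2 such a file would typically be submitted from the UI with `edg-job-submit`, after creating a proxy with `grid-proxy-init`; status is checked with `edg-job-status` and the output sandbox retrieved with `edg-job-get-output`.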

  10. Enabling Grids for E-sciencE (EGEE)
  • Deliver a 24/7 Grid service to European science
  • Build a consistent, robust and secure Grid network that will attract additional computing resources
  • Continuously improve and maintain the middleware in order to deliver a reliable service to users
  • Attract new users from industry as well as science and ensure they receive the high standard of training and support they need
  • 100 million euros over 4 years, funded by the EU
  • >400 software engineers plus service support
  • 70 European partners

  11. Prototype Middleware Status & Plans (I)
  • Workload Management
    • AliEn TaskQueue
    • EDG WMS (plus new TaskQueue and Information Supermarket)
    • EDG L&B
  • Computing Element
    • Globus Gatekeeper + LCAS/LCMAPS
    • Dynamic accounts (from Globus)
    • CondorC
    • Interfaces to LSF/PBS (blahp)
  • "Pull components"
    • AliEn CE
    • gLite CEmon (being configured)
  Blue: deployed on development testbed; Red: proposed
  LHCC Comprehensive Review – November 2004

  12. Prototype Middleware Status & Plans (II)
  • Storage Element
    • Existing SRM implementations: dCache, Castor, …
    • FNAL & LCG DPM
    • gLite-I/O (re-factored AliEn-I/O)
  • Catalogs
    • AliEn FileCatalog – global catalog
    • gLite Replica Catalog – local catalog
    • Catalog update (messaging)
    • FiReMan interface
    • RLS (Globus)
  • Data Scheduling
    • File Transfer Service (Stork + GridFTP)
    • File Placement Service
    • Data Scheduler
  • Metadata Catalog
    • Simple interface defined (AliEn + BioMed)
  • Information & Monitoring
    • R-GMA web service version; multi-VO support

  13. Prototype Middleware Status & Plans (III)
  • Security
    • VOMS as Attribute Authority and VO management
    • myProxy as proxy store
    • GSI security and VOMS attributes as enforcement
    • Fine-grained authorization (e.g. ACLs)
    • Globus to provide a set-uid service on the CE
  • Accounting
    • EDG DGAS (not used yet)
  • User Interface
    • AliEn shell, CLIs and APIs, GAS
  • Catalogs: integrate remaining services
  • Package manager
    • Prototype based on AliEn backend; evolve to final architecture agreed with the ARDA team

  14. GridPP2 management structure (reconstructed from the diagram labels): LCG, ARDA, EGEE and the experiments feed into the Collaboration Board (CB) and Project Management Board (PMB).
  • Deployment Board: Tier-1/Tier-2, testbeds, rollout; service specification & provision
  • User Board: requirements, application development, user feedback
  • Middleware areas: Metadata, Storage, Workload, Network, Security, Information & Monitoring

  15. Middleware Development
  • Network Monitoring
  • Configuration Management
  • Grid Data Management
  • Storage Interfaces
  • Information Services
  • Security

  16. Application Development
  • ATLAS
  • LHCb
  • CMS
  • SAMGrid (FermiLab)
  • BaBar (SLAC)
  • QCDGrid
  • PhenoGrid

  17. GridPP Deployment Status
  • GridPP deployment is part of LCG (currently the largest Grid in the world)
  • The future Grid in the UK is dependent upon LCG releases
  • Three Grids on a global scale in HEP (similar functionality):

                     Sites     CPUs
    LCG (GridPP)     90 (15)   8700 (1500)
    Grid3 [USA]      29        2800
    NorduGrid        30        3200

  18. LCG Overview
  • By 2007: 100,000 CPUs at more than 100 institutes worldwide
  • Building on complex middleware being developed in advanced Grid technology projects, both in Europe (gLite) and in the USA (VDT)
  • Prototype went live in September 2003 in 12 countries
  • Extensively tested by the LHC experiments during this summer

  19. Deployment Status (26/10/04)
  • Incremental releases: significant improvements in reliability, performance and scalability
    • within the limits of the current architecture
    • scalability is much better than expected a year ago
  • Many more nodes and processors than anticipated
    • installation problems of last year overcome
    • many small sites have contributed to MC productions
  • Full-scale testing as part of this year's data challenges
  • GridPP "The Grid becomes a reality" – widely reported, e.g. on the British Embassy (USA) and British Embassy (Russia) technology sites

  20. Data Challenges
  • Ongoing: Grid and non-Grid production; the Grid share is now significant
  • ALICE: 35 CPU years; Phase 1 done, Phase 2 ongoing on LCG
  • CMS: 75 M events and 150 TB; the first of this year's Grid data challenges
  Entering Grid Production Phase..

  21. ATLAS Data Challenge
  • 7.7 M GEANT4 events and 22 TB; UK ~20% of LCG
  • Ongoing.. (3) Grid Production, ~150 CPU years so far
  • Largest total computing requirement, yet still a small fraction of what ATLAS need..
  Entering Grid Production Phase..

  22. LHCb Data Challenge
  • 186 M produced events; Phase 1 completed
  • 424 CPU years (4,000 kSI2k months)
  • Rates: 3-5 × 10^6 events/day with LCG in action (LCG was paused and restarted during the challenge); 1.8 × 10^6/day with DIRAC alone
  • UK's input significant (>1/4 of the total):
    • LCG(UK) resource: Tier-1 7.7%; Tier-2 sites: London 3.9%, South 2.3%, North 1.4%
    • DIRAC: Imperial 2.0%, L'pool 3.1%, Oxford 0.1%, ScotGrid 5.1%
  Entering Grid Production Phase..
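As a quick arithmetic check on the shares above (the percentages are taken straight from the slide; the variable names are mine), summing the LCG(UK) and DIRAC(UK) contributions confirms the ">1/4 of the total" claim:

```python
# UK contributions to the LHCb Data Challenge, as percentages of the total.
lcg_uk = {"Tier-1": 7.7, "London": 3.9, "South": 2.3, "North": 1.4}
dirac_uk = {"Imperial": 2.0, "L'pool": 3.1, "Oxford": 0.1, "ScotGrid": 5.1}

lcg_total = sum(lcg_uk.values())      # 15.3% via LCG
dirac_total = sum(dirac_uk.values())  # 10.3% via DIRAC alone
uk_total = lcg_total + dirac_total    # 25.6%, i.e. just over a quarter

print(round(uk_total, 1))  # 25.6
```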

  23. Paradigm Shift: Transition to Grid…
  424 CPU years in total; monthly split (and share of DC'04 produced that month):
  • May: 89%:11% (11% of DC'04)
  • Jun: 80%:20% (25% of DC'04)
  • Jul: 77%:23% (22% of DC'04)
  • Aug: 27%:73% (42% of DC'04)

  24. More Applications
  • ZEUS uses LCG
    • needs the Grid to respond to increasing demand for MC production
    • 5 million Geant events on the Grid since August 2004
  • QCDGrid, for UKQCD
    • currently a 4-site data grid managing a few hundred gigabytes of data
    • key technologies used: Globus Toolkit 2.4, European DataGrid, eXist XML database

  25. Issues
  "LCG-2 MIDDLEWARE PROBLEMS AND REQUIREMENTS FOR LHC EXPERIMENT DATA CHALLENGES"
  https://edms.cern.ch/file/495809/2.2/LCG2-Limitations_and_Requirements.pdf
  First large-scale Grid production problems being addressed… at all levels

  26. Is GridPP a Grid? Against the three-point checklist of Foster's "What is the Grid?":
  1. Coordinates resources that are not subject to centralized control… YES. This is why development and maintenance of LCG is important.
  2. …using standard, open, general-purpose protocols and interfaces… YES. VDT (Globus/Condor-G) + EDG/EGEE (gLite) ~meet this requirement.
  3. …to deliver nontrivial qualities of service. YES. LHC experiments' data challenges over the summer of 2004.
  http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
  http://agenda.cern.ch/fullAgenda.php?ida=a042133

  27. What was GridPP1? A Success ("The achievement of something desired, planned, or attempted")
  • A team that built a working prototype grid of significant scale:
    • > 1,500 (7,300) CPUs
    • > 500 (6,500) TB of storage
    • > 1,000 (6,000) simultaneous jobs
  • A complex project where 82% of the 190 tasks for the first three years were completed

  28. Aims for GridPP2? From Prototype to Production (2001 → 2004 → 2007)
  • Separate experiment grids (BaBarGrid, SAMGrid, EDG, GANGA, …) with separate resources and multiple accounts evolve into prototype grids and then 'one' production Grid serving BaBar, CDF, D0, ATLAS, CMS, LHCb, ALICE, ARDA, EGEE and LCG.
  • CERN: computer centre → prototype Tier-0 Centre → Tier-0 Centre
  • UK: RAL computer centre → prototype Tier-1/A Centre → Tier-1/A Centre
  • 4 UK prototype Tier-2 Centres (19 UK institutes) → 4 UK Tier-2 Centres

  29. Planning: GridPP2 Project Map
  Structures agreed and in place (except LCG phase-2)

  30. What lies ahead? Some mountain climbing..
  • Annual data storage: 12-14 PetaBytes per year; a CD stack holding one year of LHC data would be ~20 km high (Concorde flies at 15 km)
  • CPU target: 100 Million SPECint2000, i.e. ~100,000 PCs (3 GHz Pentium 4)
  • In production terms, we've made base camp: we are at the 1 km mark
  • Quantitatively, we're ~9% of the way there in terms of CPU (9,000 of 100,000) and disk (3 of 12-14 × 3 years)…
  • Importance of step-by-step planning: pre-plan your trip, carry an ice axe and crampons and arrange for a guide…
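The "~9% of the way there" figure can be reproduced from the numbers on the slide (a 100,000-CPU target and 12-14 PB/year of disk over three years; the variable names are mine):

```python
# CPU: 9,000 CPUs today against a 100,000-CPU target for 2007.
cpus_now, cpus_target = 9_000, 100_000
cpu_fraction = cpus_now / cpus_target  # 0.09, i.e. ~9%

# Disk: ~3 PB today against three years' worth of 12-14 PB/year.
disk_now_pb = 3
disk_target_low, disk_target_high = 12 * 3, 14 * 3  # 36-42 PB
disk_fraction = (disk_now_pb / disk_target_high,
                 disk_now_pb / disk_target_low)     # roughly 7-8%

print(f"CPU: {cpu_fraction:.0%}, disk: {disk_fraction[0]:.0%}-{disk_fraction[1]:.0%}")
```

The disk figure works out at roughly 7-8%, consistent with the slide's "~9%" as an order-of-magnitude statement.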

  31. 1. Why? 2. What? 3. How? 4. When?
  From a Particle Physics perspective, the Grid is:
  1. needed to utilise large-scale computing resources efficiently and securely
  2. a) a working prototype running today on large testbed(s)…
     b) about seamless discovery of computing resources
     c) using evolving standards for interoperation
     d) the basis for computing in the 21st Century
     e) not (yet) as transparent or robust as end-users need
  3. see the GridPP getting started pages (two-day EGEE training courses available)
  4. a) now, at prototype level, for simple(r) applications (e.g. experiment Monte Carlo production)
     b) September 2007 for more complex applications (e.g. data analysis) – ready for LHC
