EGEE A Large-scale Production Grid Infrastructure

Erwin Laure, EGEE Technical Director. ISSGC06, July 16-28, 2006, Ischia, Italy.

Presentation Transcript


  1. EGEE A Large-scale Production Grid Infrastructure • Erwin Laure, EGEE Technical Director • ISSGC06, July 16-28, 2006, Ischia, Italy

  2. Lost in Definitions? Defining the “Grid”: • Access to (high performance) computing power • Distributed parallel computing • Improved resource utilization through resource sharing • Increased storage provision • Controlled access to distributed storage • Interconnection of arbitrary resources (sensors, instruments, …) • Collaboration between users/resources • Higher abstraction layer above network services • Corresponding security • … EGEE - A Large-scale Production Grid Infrastructure

  3. Defining the Grid • A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user. • This interconnection of users, resources, and services for jointly addressing dedicated tasks is called a virtual organization. • Comparison between Grids and Networks: • Networks realize message exchange between endpoints • Grids realize services for the users → higher level of abstraction

  5. The EGEE Project • Aim of EGEE: “to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA)” • EGEE: 1 April 2004 – 31 March 2006; 71 partners in 27 countries, federated in regional Grids • EGEE-II: 1 April 2006 – 31 March 2008; expanded consortium; 91 partners

  7. EGEE Infrastructure • Scale (June 2006): • ~200 sites in 40 countries • ~25 000 CPUs • >10 PB storage • >35 000 jobs per day • >60 Virtual Organizations [Map: countries participating in EGEE]

  8. EGEE Infrastructures • Production service • Scaling up the infrastructure with resource centres around the globe • Stable, well-supported infrastructure, running only well-tested and reliable middleware • Pre-production service • Run in parallel with the production service (restricted number of sites) • First deployment of new versions of the gLite middleware • Test-bed for applications and other external functionality • T-Infrastructure (Training & Education) • Complete suite of Grid elements and applications (Testbed, CA, VO, monitoring, support, …) • Everyone can register and use GILDA for training and testing • 20 sites on 3 continents

  9. EGEE Operations Process • Geographically distributed responsibility for operations: • There is no “central” operation • Regional Operation Centers • Responsible for resource centers in their region • Tools are developed/hosted at different sites: • GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon) • Grid operator on duty • 6 teams working in weekly rotation • CERN, IN2P3, INFN, UK/I, Ru, Taipei • Crucial in improving site stability and management • Expanding to all ROCs in EGEE-II • Operations coordination • Weekly operations meetings • Regular ROC managers meetings • Series of EGEE Operations Workshops • Nov 04, May 05, Sep 05, June 06 • Procedures described in Operations Manual • Introducing new sites • Site downtime scheduling • Suspending a site • Escalation procedures; etc. • Highlights: • Distributed operation • Evolving and maturing procedures • Procedures being introduced into and shared with the related infrastructure projects
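
The weekly operator-on-duty rotation described above can be sketched as a simple modular schedule. This is a minimal illustration only: the team order is copied from the slide, while the rotation start date and the function name `team_on_duty` are assumptions made for this example and are not taken from the EGEE Operations Manual.

```python
import datetime

# Team names as listed on the slide; treating this as the rotation order is an assumption.
TEAMS = ["CERN", "IN2P3", "INFN", "UK/I", "Ru", "Taipei"]

# Hypothetical start of the rotation (a Monday), chosen only for illustration.
ROTATION_START = datetime.date(2006, 1, 2)


def team_on_duty(day, rotation_start=ROTATION_START):
    """Return the team on duty for the week that contains `day`."""
    weeks_elapsed = (day - rotation_start).days // 7
    return TEAMS[weeks_elapsed % len(TEAMS)]


first_week = team_on_duty(datetime.date(2006, 1, 2))
second_week = team_on_duty(datetime.date(2006, 1, 9))
seventh_week = team_on_duty(datetime.date(2006, 2, 13))
```

With six teams, each team is on duty again after six weeks, so `seventh_week` is the same team as `first_week`.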

  11. Production Grid Middleware • Key factors in EGEE Grid Middleware Development: • Strict software process: use industry-standard software engineering methods • Software configuration management, version control, defect tracking, automatic build system, … • Conservative approach in what software to use • Avoid “cutting-edge” software: deployment on over 100 sites cannot assume a homogeneous environment – middleware needs to work with many underlying software flavors • Avoid evolving standards: evolving standards change quickly (and sometimes significantly, cf. OGSI vs. WSRF) – impossible to keep pace on > 100 sites • Long (and tedious) path from prototypes to production

  12. EGEE Middleware: gLite • Exploit experience & existing components • VDT (Condor, Globus) • EDG/LCG • AliEn • … • Develop a lightweight stack of EGEE generic middleware • Dynamic deployment • Pluggable components • Focus is on re-engineering and hardening • March 4, 2006: gLite 3.0 [Timeline figure, 2004 to 2006: LCG-2 and gLite, prototyping to product, leading to gLite 3.0]

  13. Developing • gLite 3.0 now available on production infrastructure • After gLite 3.0: • Continuous release of single components • As needed by users and as made available by developers • Major releases provide a “check-point” • Generally coinciding with major application challenges • Continuing development to • Bring components not yet included in release to maturity • Improve functionality • Increase robustness • Increase usability • Improve compliance with international standards

  14. Grid Interoperability • Leading role in building world-wide grids • Incubator for new Grid projects world-wide • Interoperation efforts • Bilateral: EGEE/OSG, EGEE/NDGF, EGEE/NAREGI • Multilateral: Grid Interoperability Now (GIN) • Experiences and requirements fed back into standardization process (GGF – now OGF) • Strengthening contacts with industry

  15. Building Software for the Grid [Layer diagram] • Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences • Middleware: APST, Globus GT4, Condor • Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH (Image courtesy IBM; slide courtesy David Abramson)

  16. Building Software for the Grid [Layer diagram] • Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences • Middleware, split into Upper Middleware & Tools and Lower Middleware: APST, Globus GT4, Condor • Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH (Images courtesy IBM, Bonds; slide courtesy David Abramson)

  17. Middleware structure • Higher-Level Grid Services may or may not be used by the applications • should help them but not be mandatory • Foundation Grid Middleware is deployed on the infrastructure • should not assume the use of Higher-Level Grid Services • must be complete and robust • should allow interoperation with other major grid infrastructures

  18. gLite Grid Middleware Services • Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf

  19. Job submission [Diagram] • Components: User Interface, Resource Broker, Information System, File and Replica Catalogs, Authorization Service, and the Computing Element and Storage Element at Site X • Arrow labels: submit, query, discover services, retrieve, update credential, publish state
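
The interplay of the components named on this slide can be illustrated with a toy model: sites publish the state of their Computing Elements to an Information System, and a Resource Broker queries it to decide where a submitted job should run. This is only a sketch under stated assumptions. Every class name, host name, and the ranking rule (most free slots wins) is hypothetical and is not the gLite API; authorization, catalogs, and storage are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    name: str
    free_slots: int
    supported_vos: set


@dataclass
class InformationSystem:
    """Holds the state that sites publish about their Computing Elements."""
    published: dict = field(default_factory=dict)

    def publish(self, ce):
        self.published[ce.name] = ce

    def query(self, vo):
        # Return every CE that accepts the VO and still has a free slot.
        return [ce for ce in self.published.values()
                if vo in ce.supported_vos and ce.free_slots > 0]


class ResourceBroker:
    def __init__(self, info_system):
        self.info_system = info_system

    def submit(self, job_name, vo):
        candidates = self.info_system.query(vo)
        if not candidates:
            raise RuntimeError(f"no Computing Element available for VO {vo!r}")
        # Toy ranking (an assumption): prefer the CE with the most free slots.
        best = max(candidates, key=lambda ce: ce.free_slots)
        best.free_slots -= 1
        return best.name


# Hypothetical sites publishing their state.
info = InformationSystem()
info.publish(ComputingElement("ce.site-x.example", 4, {"biomed", "atlas"}))
info.publish(ComputingElement("ce.site-y.example", 9, {"atlas"}))

broker = ResourceBroker(info)
chosen_for_biomed = broker.submit("docking-001", "biomed")
chosen_for_atlas = broker.submit("reco-001", "atlas")
```

Keeping the published state in a separate Information System, rather than inside the broker, mirrors the separation drawn on the slide: sites only publish, and the broker only queries.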

  20. gLite Software Process [Flow diagram] • Activities: JRA1 Development, SA3 Integration, SA3 Testing & Certification, SA1 Pre-Production, SA1 Production Infrastructure • Steps and gates (each marked pass or fail): integration tests, testbed deployment, functional tests, pre-production deployment, scalability tests, release • Other labels: directives, error fixing, software, deployment packages, problem, serious problem, installation guide, release notes, etc.
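
The pass/fail gating in this process can be sketched as a linear pipeline in which the first failing stage sends the software back to development. This is a simplified illustration: the stage names come from the slide, but their exact order, the single "back to development" outcome, and the function name `run_release_pipeline` are assumptions, since the diagram itself did not survive in the transcript.

```python
# Stage order is an assumption reconstructed from the labels on the slide.
STAGES = [
    "integration tests",
    "functional tests",
    "pre-production deployment",
    "scalability tests",
]


def run_release_pipeline(results):
    """Walk the stages in order.

    `results` maps a stage name to True (pass) or False (fail).
    A stage without a recorded result is treated as failed.
    """
    for stage in STAGES:
        if not results.get(stage, False):
            return f"returned to development: failed {stage}"
    return "released to production"


all_pass = run_release_pipeline({stage: True for stage in STAGES})
one_fail = run_release_pipeline(
    {"integration tests": True, "functional tests": False}
)
```

Treating a missing result as a failure matches the conservative approach to production middleware described earlier in the talk: nothing is released unless every gate has explicitly passed.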

  22. EGEE Applications • >20 applications • Astronomy • Biomedicine • Computational Chemistry • Earth Sciences • Financial Simulation • Fusion • Geo-Physics • High Energy Physics • Further applications in evaluation • Applications now moving from testing to routine and daily usage

  23. High Energy Physics • Large Hadron Collider (LHC): • One of the most powerful instruments ever built to investigate matter • 4 Experiments: ALICE, ATLAS, CMS, LHCb • 27 km circumference tunnel • Due to start up in 2007 [Aerial photo labels: Mont Blanc (4810 m), Downtown Geneva]

  24. Accelerating and colliding particles

  25. The LHC Accelerator • The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors

  26. LHC Data • The event rate is reduced by online computers that filter out a few hundred “good” events per second • These are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec • ~15 PetaBytes per year for all four experiments
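
As a rough consistency check on the figures above, the yearly volume can be converted into an average recording rate. This is a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and spreads the data evenly over a 365-day year, which the slide does not state. The function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s
PETABYTE = 10**15                   # decimal units, an assumption
MEGABYTE = 10**6


def average_rate_mb_per_s(petabytes_per_year):
    """Average data rate in MB/s if the volume were spread over a full year."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR / MEGABYTE


avg = average_rate_mb_per_s(15)
print(f"{avg:.0f} MB/s")
```

Under these assumptions the average comes out at roughly 476 MB/s for all four experiments together, which sits inside the 100-1,000 MB/s range quoted on the slide.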

  27. Data Handling and Computation for Physics Analysis [Data-flow diagram] • Processing steps: detector, event filter (selection & reconstruction), reconstruction, event reprocessing, event simulation, batch physics analysis, interactive physics analysis • Data products: raw data, event summary data, processed data, analysis objects (extracted by physics topic) • Phase labels: simulation, analysis (Diagram credit: les.robertson@cern.ch)

  28. LCG depends on two major science grid infrastructures: • EGEE - Enabling Grids for E-Science • OSG - US Open Science Grid

  29. Example: HEP • LHC data and service challenges • Preparing for LHC start-up in 2007 • Ensure key services & infrastructure are in place • Emphasis on providing a service • Computing needs of experiments • E.g. LHCb: ~700 CPU years in 2005 on the EGEE infrastructure • E.g. ATLAS: over 10,000 jobs per day • Massive data transfers > 1.5 GB/s [Plot labels: LHCb, ATLAS]

  30. Example: Addressing emerging diseases • Emerging diseases know no frontiers • Time is a critical factor • International collaboration is required for: • Early detection • Epidemiological watch • Prevention • Search for new drugs • Search for vaccines [Figure: Avian influenza: human casualties]

  31. WISDOM, the first step • WISDOM focuses on drug discovery for neglected and emerging diseases. • Summer 2005: World-wide In Silico Docking On Malaria • 46 million ligands docked in 6 weeks • ~1 million virtual ligands selected • 1 TB of data produced • 1000 computers in 15 countries • Equivalent to 80 CPU years • Spring 2006: drug design against H5N1 neuraminidase involved in virus propagation • impact of selected point mutations on the efficiency of existing drugs • identification of new potential drugs acting on mutated N1 [Figure labels: H5, N1]
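
The malaria data challenge figures above can be related to each other with simple arithmetic: how much CPU time one docking took on average, and how many CPUs were busy on average over the six weeks. This is a sketch under stated assumptions: a CPU year is taken as 365 days of one CPU, the six weeks are taken as exactly 42 days, and the function names are illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_per_docking(cpu_years, dockings):
    """Average CPU time per docked ligand, in seconds."""
    return cpu_years * SECONDS_PER_YEAR / dockings


def average_busy_cpus(cpu_years, wall_clock_days):
    """Average number of CPUs kept busy over the wall-clock period."""
    return cpu_years * 365 / wall_clock_days


per_docking = seconds_per_docking(cpu_years=80, dockings=46_000_000)
busy = average_busy_cpus(cpu_years=80, wall_clock_days=6 * 7)
print(f"{per_docking:.1f} s per docking, {busy:.0f} CPUs busy on average")
```

Under these assumptions each docking cost a little under a minute of CPU time, and roughly 700 CPUs were busy on average, which is compatible with the 1000 computers quoted on the slide.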

  32. Challenges for high throughput virtual docking • Millions of chemical compounds available in laboratories • High Throughput Screening: 2$/compound, nearly impossible • 300,000 chemical compounds: ZINC & chemical combinatorial library • Target (PDB): Neuraminidase (8 structures) • Molecular docking (Autodock): ~100 CPU years, 600 GB data • Data challenge on EGEE, Auvergrid, TWGrid: ~6 weeks on ~2000 computers • Hits sorting and refining • In vitro screening of 100 hits

  33. Example: Pharmacokinetics • A lesion is detected in an MRI study of a patient – start with virtual biopsy • The process requires obtaining a sequence of MRI volumetric images. • Different images are obtained in different breath-holds. • Before analyzing the variation of each voxel, images must be co-registered to minimize deformation due to different breath-holds. • The total computational cost of a clinical trial of 20 patients is around 100 CPU days.

  34. Example: Determining earthquake mechanisms • Seismic software application determines epicentre, magnitude, mechanism • Analysis of Indonesian earthquake (28 March 2005) • Seismic data within 12 hours after the earthquake • Solution found within 30 hours after earthquake occurred • 10 times faster on the Grid than on local computers • Results • Not an aftershock of December 2004 earthquake • Different location (different part of fault line further south) • Different mechanism • Rapid analysis of earthquakes important for relief efforts [Figure labels: Sumatra, March 28, 2005, Mw=8.5; Peru, June 23, 2001, Mw=8.4]

  35. Flood forecasting problem • Many kinds of data • Meteorological, hydrological, hydraulic • Generated by simulations or obtained from sensors • Permanent or periodically updated • Publicly available or with restricted access

  36. ITU-BR system for RRC 2006 • ITU-BR developed a system for RRC 2006 • Run compatibility and complementary analysis • 84 PCs executing 168 parallel tasks • Compatibility analysis < 4h • Great success! • ITU-BR wanted to be sure and do even better • Provide more CPU power • Reduce risks by providing a supplementary system • Gain experience on how to access large and reliable computing resources ‘on demand’ • EGEE used a subset of its Grid for RRC 2006 • Over 400 PCs • Compatibility analysis < 1h

  37. The Future of Grids • Increasing the number of infrastructure users by increasing awareness • Dissemination and outreach • Training and education • Increasing the number of applications by improving application support and middleware functionality • Improved usability through high level grid middleware extensions • Increasing the grid infrastructure • Incubating related projects • Ensuring interoperability between projects • Protecting user investments • Towards a sustainable grid infrastructure

  38. User Information & Support • More than 170 training events and summer schools across many countries • >3000 people trained: induction; application developer; advanced; retreats • Material archive online with ~250 presentations • Public and technical websites • Dissemination material → constantly evolving to expand information and keep it up to date • 4 conferences organized (~460 at Pisa) • Next conference: September 2006 in Geneva, ~600 participants

  39. Industry and EGEE-II • Industry Task Force • Group of industry partners in the project • Links related industry projects (NESSI, BEinGRID, …) • Works with EGEE’s Technical Coordination Group • Collaboration with CERN openlab project • IT industry partnerships for hardware and software development • EGEE Business Associates (EBA) • Companies sponsoring work on joint-interest subjects • Industry Forum • Led by Industry to improve Grid take-up in Industry • Organises industry events and disseminates grid information • e.g. this Wednesday here at the school

  41. Building Software for the Grid [Layer diagram] • Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences • ??? • Middleware, split into Upper Middleware & Tools and Lower Middleware: APST, Globus GT4, Condor • Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH (Images courtesy IBM, Bonds; slide courtesy David Abramson)

  42. Portals on EGEE • P-Grade • Genius

  43. Example: Biomedicine • Parallel simulation of blood flow on the Grid • Online visualization of simulation results on the desktop • Interactive steering of simulation • Grid is “invisible” • Cooperation with University of Amsterdam

  44. Example: Flooding Crisis Support • Simulation of flooding on the Grid • Online visualization of simulation results in the CAVE • Interactive steering of simulation • Grid is “invisible” • Cooperation with Slovak Academy of Sciences

  45. Scientific Visualization • Use your favourite device to connect to the Grid: Sony PSP – PlayStation Portable

  46. Not only portals • Portals are a good way to bring computing power to end-users • In most cases domain specific • Application programmers (and portal programmers) need more powerful interfaces • Workflow engines • Higher level programming abstractions (SAGA, DRMAA, …) • Programming environments (gEclipse) • Compilers? • …

  48. EU GRID Projects related to EGEE

  49. GIN Related Infrastructures
