Status and evolution of the EGEE Project and its Grid Middleware

This presentation by Frédéric Hemmer, the Middleware Manager at CERN, provides an overview of the EGEE Project, including its goals, structure, grid operations, middleware, networking activities, and applications in high energy physics and biomedical fields. It also discusses the geographical extensions and collaborations with other projects. The emphasis is on operating a production grid and supporting end-users.

Presentation Transcript


  1. Status and evolution of the EGEE Project and its Grid Middleware
     By Frédéric Hemmer, Middleware Manager, CERN, Geneva, Switzerland
     International Conference on Next Generation Networks, Brussels, Belgium, June 2, 2005

  2. Contents
     • The EGEE Project
        • Overview and Structure
        • Grid Operations
        • Middleware
        • Networking Activities
     • Applications
        • High Energy Physics
        • Biomedical
     • Summary and Conclusions
     Next Generation Networks - June 2, 2005

  3. EGEE goals
     • Goal of EGEE: develop a service grid infrastructure which is available to scientists 24 hours a day
     • The project concentrates on:
        • building a consistent, robust and secure Grid network that will attract additional computing resources
        • continuously improving and maintaining the middleware in order to deliver a reliable service to users
        • attracting new users from industry as well as science and ensuring they receive the high standard of training and support they need

  4. EGEE
     EGEE is the largest Grid infrastructure project in Europe:
     • 70 leading institutions in 27 countries, federated in regional Grids
     • Leveraging national and regional grid activities
     • ~32 M Euros EU funding for initially 2 years starting 1st April 2004
     • EU review, February 2005 successful
     • Preparing 2nd phase of the project – proposal to EU Grid call September 2005
     • Promoting scientific partnership outside EU

  5. EGEE Geographical Extensions
     • EGEE is a truly international undertaking
     • Collaborations with other existing European projects, in particular:
        • GÉANT, DEISA, SEE-GRID
     • Relations to other projects/proposals:
        • OSG: OpenScienceGrid (USA)
        • Asia: Korea, Taiwan, EU-ChinaGrid
        • BalticGrid: Lithuania, Latvia, Estonia
        • EELA: Latin America
        • EUMedGrid: Mediterranean Area
        • …
     • Expansion of EGEE infrastructure in these regions is a key element for the future of the project and international science

  6. EGEE Activities
     • 48 % service activities (Grid Operations, Support and Management, Network Resource Provision)
     • 24 % middleware re-engineering (Quality Assurance, Security, Network Services Development)
     • 28 % networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)
     Emphasis in EGEE is on operating a production grid and supporting the end-users

  8. Computing Resources – April 2005
     [Map legend: country providing resources; country anticipating joining]
     • 131 sites, 30 countries
     • >12,000 CPUs
     • ~5 PB storage

  9. Infrastructure metrics
     [Figure: countries, sites, and CPUs available in the EGEE production service, shown for EGEE partner regions and other collaborating sites]

  10. Service Usage
     • VOs and users on the production service
        • Active HEP experiments: 4 LHC, D0, CDF, Zeus, Babar
        • Other active VOs: Biomed, ESR (Earth Sciences), Compchem, Magic (Astronomy), EGEODE (Geo-Physics)
        • 6 disciplines
        • Registered users in these VOs: 600
        • In addition to these there are many VOs that are local to a region, supported by their ROCs, but not yet visible across EGEE
     • Scale of work performed:
        • LHC Data challenges 2004:
           • >1 M SI2K years of CPU time (~1000 CPU years)
           • 400 TB of data generated, moved and stored
           • 1 VO achieved ~4000 simultaneous jobs (~4 times CERN grid capacity)
     [Chart: number of jobs processed/month]
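
The two CPU figures on this slide (">1 M SI2K years" and "~1000 cpu years") are consistent only if one CPU is rated at roughly 1000 SI2K (1 kSI2k). That rating is an assumption read off the slide's own numbers, not a figure the presentation states. A minimal sketch of the conversion, with a hypothetical function name:

```python
# Assumption (implied by the slide, not stated in it): 1 CPU is about 1000 SI2K.
SI2K_PER_CPU = 1000


def si2k_years_to_cpu_years(si2k_years, si2k_per_cpu=SI2K_PER_CPU):
    """Convert an amount of work in SI2K-years into equivalent CPU-years."""
    return si2k_years / si2k_per_cpu


# LHC data challenges 2004: a little over 1 M SI2K years.
lhc_2004 = si2k_years_to_cpu_years(1_000_000)
print(f"{lhc_2004:.0f} CPU years")
```

Under the same assumption, the ~400 kSI2k years quoted later for ATLAS would correspond to roughly 400 CPU years.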

  11. Grid Operations
     [Diagram: RC, ROC, CIC and OMC nodes]
     • The grid is flat, but
        • Hierarchy of responsibility
           • Essential to scale the operation
     • CICs act as a single Operations Centre
        • Operational oversight (grid operator) responsibility rotates weekly between CICs
        • Report problems to ROC/RC
        • ROC is responsible for ensuring problem is resolved
        • ROC oversees regional RCs
     • ROCs responsible for organising the operations in a region
        • Coordinate deployment of middleware, etc
     • CERN coordinates sites not associated with a ROC
     RC - Resource Centre; ROC - Regional Operations Centre; CIC - Core Infrastructure Centre
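
The weekly rotation of the grid-operator role between CICs can be pictured as a simple round-robin schedule. The sketch below is illustrative only: the CIC names, the start date and the function name are hypothetical, and the slide does not say how the real rota was computed.

```python
from datetime import date


def on_duty_cic(cics, day, rota_start):
    """Return the CIC holding operational oversight on `day`.

    The role moves to the next CIC in the list every 7 days,
    counting from `rota_start`, and wraps around at the end.
    """
    weeks_elapsed = (day - rota_start).days // 7
    return cics[weeks_elapsed % len(cics)]


# Hypothetical rota: three CICs, rotation starting on a Monday.
cics = ["CIC-A", "CIC-B", "CIC-C"]
rota_start = date(2005, 1, 3)

print(on_duty_cic(cics, date(2005, 1, 5), rota_start))   # first week
print(on_duty_cic(cics, date(2005, 1, 12), rota_start))  # second week
```

Whichever CIC is on duty reports problems to the responsible ROC or RC; the ROC remains responsible for getting them resolved.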

  12. Grid monitoring
     • Operation of Production Service: real-time display of grid operations
     • Accounting information
     • Selection of Monitoring tools:
        • GIIS Monitor + Monitor Graphs
        • Sites Functional Tests
        • GOC Data Base
        • Scheduled Downtimes
        • Live Job Monitor
        • GridIce – VO + fabric view
        • Certificate Lifetime Monitor

  13. LCG Deployment Schedule

  14. LCG Service Challenges
     • “Service Challenge 2”
        • Throughput test from LCG Tier-0 to LCG Tier-1 sites
        • Started 14th March
     • Set up infrastructure to 7 sites
        • NL, IN2P3, FNAL, BNL, FZK, INFN, RAL
     • 100 MB/s to each site
        • 500 MB/s combined to all sites at same time
        • 500 MB/s to a few sites individually
     • Goal: by end March’05, sustained 500 MB/s at CERN

  15. SC2 met its throughput targets
     • >600 MB/s daily average for 10 days was achieved, from midday 23rd March to midday 2nd April
     • Not without outages, but the system showed it could recover its rate again after outages
     • Load reasonably evenly divided over sites (given network bandwidth constraints of Tier-1 sites)
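
To put the sustained rate in perspective, it can be converted into a total data volume. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and treats the reported daily average as a constant rate; the function name is hypothetical.

```python
def transferred_terabytes(rate_mb_per_s, days):
    """Total volume moved at a constant rate, in decimal terabytes."""
    seconds = days * 24 * 60 * 60
    return rate_mb_per_s * seconds / 1_000_000


# SC2: at least 600 MB/s averaged over 10 days.
volume = transferred_terabytes(600, 10)
print(f"at least {volume:.1f} TB moved")
```

At 600 MB/s over 10 days this comes to a little over 500 TB, which is of the same order as the 400 TB generated, moved and stored during the 2004 LHC data challenges.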

  16. Service Challenge 3
     • Throughput phase
        • 2 weeks sustained in July 2005
        • Primary goals:
           • 150 MB/s disk – disk to Tier1s
           • 60 MB/s disk (T0) – tape (T1s)
        • Secondary goals:
           • Include a few named T2 sites (T2 -> T1 transfers)
           • Encourage remaining T1s to start disk – disk transfers
     • Service phase
        • September – end 2005
        • Start with ALICE & CMS, add ATLAS and LHCb October/November
        • All offline use cases except for analysis
        • More components: WMS, VOMS, catalogs, experiment-specific solutions
        • Implies production setup (CE, SE, …)

  18. Future EGEE Middleware - gLite
     • Intended to replace present middleware (LCG-2)
     • Developed mainly from existing components
     • Aims to address present shortcomings and advanced needs from applications
     • Regular, iterative updates for fast user feedback
     • Makes use of web-services where currently feasible
     [Timeline: LCG-1, LCG-2 (Globus 2 based), then gLite-1, gLite-2 (Web services based)]
     Application requirements: http://egee-na4.ct.infn.it/requirements/

  19. Architecture & Design
     • A design team including representatives from Middleware providers (AliEn, Condor, EDG, Globus,…) and Operations, including US partners, produced the middleware architecture and design.
     • Takes into account input and experiences from applications, operations, and related projects
     • Focus on medium term (few months) and commonalities with other projects (e.g. OSG)
        • Effective exchange of ideas, requirements, solutions and technologies
        • Coordinated development of new capabilities
        • Open communication channels
        • Joint deployment and testing of middleware
        • Early detection of differences and disagreements
     • The 2nd release of gLite (v1.1) was made in May’05
        • http://cern.ch/glite/packages/R1.1/R20050430/default.asp
        • http://cern.ch/glite/documentation
     gLite is not “just” a software stack, it is a “new” framework for international collaborative middleware development. Much has been accomplished in the first year. However, this is “just” the first step.

  20. gLite Services in Release 1: Software stack and origin (simplified)
     • Storage Element
        • glite-I/O (AliEn)
        • Reliable File Transfer (EGEE)
        • GridFTP (Globus)
        • SRM: Castor (CERN), dCache (FNAL, DESY), other SRMs
     • Catalog
        • File/Replica & Metadata Catalogs (EGEE)
     • Security
        • GSI (Globus)
        • VOMS (DataTAG/EDG)
        • Authentication for C and Java based (web) services (EDG)
     • Computing Element
        • Gatekeeper (Globus)
        • Condor-C (Condor)
        • CE Monitor (EGEE)
        • Local batch system (PBS, LSF, Condor)
     • Workload Management
        • WMS (EDG)
        • Logging and bookkeeping (EDG)
        • Condor-C (Condor)
     • Information and Monitoring
        • R-GMA (EDG)
     Now doing rigorous scalability and performance tests on pre-production service

  21. Software Process
     • JRA1 Software Process is based on an iterative method
     • It comprises two main 12-month development cycles divided into shorter development-integration-test-release cycles lasting 1 to 4 weeks
     • The two main cycles start with full Architecture and Design phases, but the architecture and design are periodically reviewed and verified.
     • The process is documented in a number of standard documents:
        • Software Configuration Management (SCM) Plan
        • Test Plan
        • Quality Assurance Plan
        • Developer’s Guide

  22. Release Process
     [Flow diagram. Stages: Development, Integration, Testing. Elements: Software Code, Deployment Packages, Integration Tests (Pass/Fail), Testbed Deployment, Functional Tests (Pass/Fail), Fix, Installation Guide, Release Notes, etc]
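
One plausible reading of the release diagram is a loop with two test gates: a build that fails integration tests or functional tests goes back for a fix, and only a build that passes both is released. The sketch below models that reading only. The gate order, the function names and the fix limit are assumptions, not details taken from the slide.

```python
def release_cycle(integration_ok, functional_ok, max_builds=6):
    """Return the number of the first build that passes both gates.

    `integration_ok` and `functional_ok` are hypothetical callables that
    take a build number and return True (Pass) or False (Fail).
    Returns None if no build passes within `max_builds` attempts.
    """
    for build in range(1, max_builds + 1):
        if not integration_ok(build):
            continue  # Fail: fix in development, produce a new build
        if not functional_ok(build):
            continue  # Fail on the testbed: fix and rebuild
        return build  # Pass: ship with installation guide, release notes
    return None


# Hypothetical example: integration passes from build 2, functional from build 3.
released = release_cycle(lambda b: b >= 2, lambda b: b >= 3)
print(f"released build {released}")
```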

  23. Bug Counts and Trends
     May 18, 2005
     # Defects/KLOC: 2.01

  24. gLite: What’s next?
     • Focus and priority are
        • Bug fixing
        • Support to Service Challenge 3
           • File Transfer Service (FTS)
     • New planned features in 1.2
        • VOMS 1.5 (Oracle support)
        • CE: Condor support (in addition to PBS/LSF)
        • WMproxy (Web Services Interface including bulk job submission)
        • Service discovery interface to BDII
        • File Transfer Service improvements
        • R-GMA aligned with the LCG-2 version
     • Beyond 1.2
        • DGAS accounting system
        • Job Provenance
        • Globus Workspace Services
        • Harmonization of Security Models
        • Integration with Service Discovery

  26. Outreach & Training
     • Public and technical websites constantly evolving to expand information available and keep it up to date
     • 3 conferences organised
        • ~300 @ Cork, ~400 @ Den Haag, ~500 @ Athens
        • Pisa 4th project conference 24-28 October ’05
     • More than 70 training events (including the GGF grid school) across many countries
        • ~1000 people trained
        • induction; application developer; advanced; retreats
        • Material archive with more than 100 presentations
     • Strong links with GILDA testbed and GENIUS portal developed in EU DataGrid

  27. Deployment of applications
     • Pilot applications
        • High Energy Physics
        • Biomed applications: http://egee-na4.ct.infn.it/biomed/applications.html
     • Generic applications – deployment under way
        • Computational Chemistry
        • Earth science research
        • EGEODE: first industrial application
        • Astrophysics
     • With interest from
        • Hydrology
        • Seismology
        • Grid search engines
        • Stock market simulators
        • Digital video etc.
        • Industry (provider, user, supplier)

  28. HEP (ATLAS) Utilisation
     [Chart omitted]
     • ~660K jobs in total (LCG, Nordugrid, US Grid3)
     • ~400 kSI2k years of CPU
     • In latest period average ~7K jobs/day with ~5K in LCG

  29. Bioinformatics
     • GPS@: Grid Protein Sequence Analysis
        • NPSA is a web portal offering protein databases and sequence analysis algorithms to bioinformaticians (3000 hits per day)
        • GPS@ is a gridified version with increased computing power
        • Need for large databases and a large number of short jobs
     • xmipp_MLrefine
        • 3D structure analysis of macromolecules from (very noisy) electron microscopy images
        • Maximum likelihood approach for finding the optimal model
        • Very compute intensive
     • Drug discovery
        • Health related area with high performance computation need
        • An application currently being ported in Germany (Fraunhofer institute)

  30. Medical imaging
     • GATE
        • Radiotherapy planning
        • Improvement of precision by Monte Carlo simulation
        • Processing of DICOM medical images
        • Objective: very short computation time compatible with clinical practice
        • Status: development and performance testing
     • CDSS
        • Clinical Decision Support System
        • Assembling knowledge databases
        • Spreading the use of image classification engines
        • Objective: access to knowledge databases from hospitals
        • Status: from development to deployment, some medical end users

  31. Medical imaging
     • SiMRI3D
        • 3D Magnetic Resonance Image Simulator
        • MRI physics simulation, parallel implementation
        • Very compute intensive
        • Objective: offering an image simulator service to the research community
        • Status: parallelized and now running on EGEE resources
     • gPTM3D
        • Interactive tool for medical image segmentation and analysis
        • A non-gridified version is distributed in several hospitals
        • Need for very fast scheduling of interactive tasks
        • Objective: shorten computation time using the grid
        • Status: development of the gridified version being finalized

  32. Status of Biomedical VO
     • RLS, VO LDAP Server: CC-IN2P3
     • 4 RBs: CNAF, IFAE, LAPP, UPV
     • 20 resource centres
        • CEs: >2000 CPUs
        • SEs: >20TB disk
     [Map labels: 15 RBs, 2 RLS, 1 LDAP Server; PADOVA, BARI]

  33. Grid conclusions
     • e-Infrastructures deployment creates a powerful new tool for science – as well as applications from other fields
     • Investments in grid projects and e-Infrastructure are growing world-wide
     • Applications are already benefiting from Grid technologies
     • Open Source is the right approach for publicly funded projects and necessary for fast and wide adoption
     • Europe is strong in the development of e-Infrastructure also thanks to the initial success of EGEE
     • Collaboration across national and international programmes is very important

  34. Summary
     • EGEE is the first attempt to build a worldwide Grid infrastructure for data intensive applications from many scientific domains
     • A large-scale production grid service is already deployed and being used for HEP and BioMed applications with new applications being ported
     • Resources & user groups are expanding
     • A process is in place for migrating new applications to the EGEE infrastructure
     • A training programme has started with many events already held
     • “Next generation” middleware is being tested (gLite)
     • First project review by the EU successfully passed in Feb’05
     • Plans for a follow-on project are being prepared

  35. Contacts
     • EGEE Web Site: http://www.eu-egee.org
     • How to join: http://public.eu-egee.org/join/
     • gLite Web Site: http://www.glite.org
     • EGEE Project Office: project-eu-egee-po@cern.ch
