
EGEE Project and Middleware Overview





  1. EGEE Project and Middleware Overview Marco Verlato Padova 9 / 5 / 2008

  2. Outline • Introduction • The EGEE project • Infrastructure • Applications • Operations and Support • The EGEE Middleware: gLite • Grid access services • Security services • Information & Monitoring services • Data Management services • Job Management services • Conclusions

  3. What is a Grid? • The name “Grid” was chosen by analogy with the electric power grid (Foster and Kesselman, 1997) • Vision: plug in a computer for processing power just like plugging in a toaster for electricity • The concept has been around for decades (distributed computing, metacomputing) • The key difference with the Grid is realising this vision on a global scale

  4. What is a Grid? • “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” – Ian Foster and Carl Kesselman, 1998 • “A grid is a combination of networked resources and the corresponding middleware, which provides services for the user” – Erwin Laure, EGEE Technical Director, ISSGC2007 • The users of a Grid are divided into Virtual Organisations (VOs), abstract entities grouping users, institutions and resources, e.g. the 4 LHC experiments, the community of biomedical researchers, etc.

  5. What is a Grid? • It relies on advanced software, called middleware • Middleware automatically finds the data the scientist needs, and the computing power to analyse it • Middleware balances the load on different resources; it also handles security, accounting, monitoring and much more

  6. Enabling Grids for E-sciencE project • Flagship Grid infrastructure project co-funded by the European Commission, started in April 2004 and now entering its third phase • Application domains: Archeology, Astronomy, Astrophysics, Civil Protection, Computational Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, … • Scale: >250 sites in 48 countries, >50,000 CPUs, >20 PetaBytes of storage, >10,000 users, >150 VOs, >150,000 jobs/day

  7. Disciplines and users • ~8000 users listed in registered VOs • Also digital libraries, disaster recovery, computational sciences, etc. • http://cic.gridops.org/index.php?section=home&page=volist

  8. Types of applications • Simulation – LHC Monte Carlo simulations; fusion; WISDOM – jobs needing significant processing power; large numbers of independent jobs; limited input data; significant output data • Bulk processing – HEP; processing of satellite data – distributed input data; large amounts of input and output data; job management (WMS); metadata services; complex data structures • Parallel jobs – climate models, computational chemistry – large numbers of independent but communicating jobs; need for simultaneous access to a large number of CPUs; MPI libraries • Short response delays – prototyping new applications; grid monitoring; interactivity – limited input & output data and processing needs, but fast response and quality of service required • Workflow – medical imaging; flood analysis – complex analysis algorithms; complex dependencies between jobs • Commercial applications – non-open-source software; Geocluster (seismic platform); FlexX (molecular docking); Matlab, Mathematica, IDL, … – license server associated with the application deployment model

  9. High Energy Physics: machines and detectors • pp collisions at √s = 14 TeV • L = 10³⁴ cm⁻²s⁻¹ and L = 2·10³² cm⁻²s⁻¹ • [Detector figure: muon chambers, tracker, calorimeter] • 2.5 million collisions per second; LVL1: 10 kHz, LVL3: 50–100 Hz; 25 MB/s digitized recording • 40 million collisions per second; LVL1: 1 kHz, LVL3: 100 Hz; 0.1 to 1 GB/s digitized recording

  10. In silico drug discovery • Diseases such as HIV/AIDS, SARS, bird flu etc. are a threat to public health due to worldwide exchanges and circulation of persons • Grids open new perspectives to in silico drug discovery: reduced cost and an accelerating factor in the search for new drugs • International collaboration is required for: early detection; epidemiological watch; prevention; search for new drugs; search for vaccines • [Map: avian influenza bird casualties]

  11. Wide In Silico Docking On Malaria (WISDOM) http://wisdom.healthgrid.org/

  12. Example: Pharmacokinetics • A lesion is detected in an MRI study of a patient – start with a virtual biopsy • The process requires obtaining a sequence of MRI volumetric images • Different images are obtained in different breath-holds • Before analyzing the variation of each voxel, images must be co-registered to minimize deformation due to the different breath-holds • The total computational cost of a clinical trial of 20 patients is around 100 CPU-days

  13. Example: Civil Protection - Fire Risk

  14. EGEE workload in 2007 • Data: 25 PB stored, 11 PB transferred • CPU: 114 million hours • Estimated cost if performed with Amazon’s EC2 and S3: €47,486,548 • http://gridview.cern.ch/GRIDVIEW/same_index.php • http://calculator.s3.amazonaws.com/calc5.html

  15. EGEE-II to EGEE-III • EGEE-III • To be co-funded under European Commission call INFRA-2007-1.2.3 • 32M€ EC funds compared to 36M€ for EGEE-II • Key objectives • Expand/optimise existing EGEE infrastructure, include more resources and user communities • Prepare migration from a project-based model to a sustainable federated infrastructure based on National Grid Initiatives • 2 year period – May 2008 to April 2010 • No gap between EGEE-II and EGEE-III (1 month extension to EGEE-II) • Similar consortium • Now structured on a national basis (National Grid Initiatives/Joint Research Units)

  16. European Grid Initiative (EGI) • Need to prepare a permanent, common Grid infrastructure • Ensure the long-term sustainability of the European e-Infrastructure, independent of short project funding cycles • Coordinate the integration and interaction between National Grid Infrastructures (NGIs) • Operate the production Grid infrastructure on a European level for a wide range of scientific disciplines • There must be no gap in the support of the production grid

  17. EGEE operations • Operations Coordination Centre (OCC): management and oversight of all operational and support activities • Regional Operations Centres (ROC): provide the core of the support infrastructure, each supporting a number of resource centres within its region • Resource Centres (RC): provide the resources (computing, storage, network, …) • At FZK: coordination and management of user support, single point of contact for users

  18. Monitoring Visualization

  19. The Italian Production Grid (http://grid-it.cnaf.infn.it) • ~5000 CPUs, 950 TB (disk) + 750 TB (tape) • 40 resource centres: INFN Grid + SPACI + ENEA + 5 RCs: Istituto Tecnologie Biomediche – CNR/BARI (LIBI Project); PERUGIA University; Istituto Linguistica Computazionale CNR-PISA; Scuola Normale Superiore – PISA; ESA-ESRIN • Significant expansion foreseen thanks to the recent PONs: TriGrid, PI2S2, Cybersar, Scope, Cresco

  20. SPACI: Southern Partnership for Advanced Computational Infrastructure • 1.5 Tflops, IA64 (Itanium 2) • ISUFI/CACT, Center for Advanced Computing Technologies, University of Salento – Director: Prof. Giovanni Aloisio • DMA/ICAR, Dept. of Mathematics and Applications, University of Naples “Federico II” & ICAR (Section of Naples) – Director: Prof. Almerico Murli • MIUR/HPCC, Center of Excellence for High Performance Computing, University of Calabria – Director: Prof. Lucio Grandinetti

  21. GEANT – CNR Tor Vergata • Access to non-standard platforms: AIX – IRIS (AFS pool accounts, LCMAPS, YAIM customized)

  22. The EGEE support infrastructure • [Diagram: the GGUS central system and CIC Portal connect the Resource Centres (RC), Regional Operations Centres (ROC), VO Support and VO TPM units, the Ticket Process Managers (TPM), the COD, and the deployment, middleware and network support units, with links to other Grids]

  23. Italian ROC Support • The Italian ROC provides local front-line support to Virtual Organizations, Users and Resource Centres • The Italian ROC team is organized in daily shifts: 2 people per shift, 2 shifts per day, from Monday to Friday • Activities planned during the shift: log trouble tickets created, updated and closed; track problems on grid services and sites; monitor successful site certification; check the status of production grid services and the GRIS status of production CEs and SEs; check the status of the production sites using the monitoring tools • Periodic (every 15 days) phone conferences between the ROC team and site managers • Provide and write the ROC report for the weekly EGEE operations meeting

  24. Registered Collaborating Projects • Infrastructures: geographical or thematic coverage • Support Actions: key complementary functions • Applications: improved services for academia, industry and the public • 25 projects have registered as of September 2007 (see the project web page)

  25. e-Infrastructures adopting gLite • e-Infrastructures interoperable, or in progress to be made interoperable, with gLite • e-Infrastructure projects & other Grids • ~80 countries “linked” together!

  26. EGEE strategy towards interoperability • The best solution is to have common interfaces, through the development and adoption of standards • The gLite reference forum for standardization activities is the Open Grid Forum (OGF), with many contributions (e.g. OGSA-AUTH, BES, JSDL, the new GLUE-WG, UR, RUS, SAGA, INFOD, NM, …) • Problems: infrastructures are already in production, while standards are still in evolution and often underspecified • OGF-GIN therefore follows a pragmatic approach: a balance between application needs and technology push

  27. Example of Interoperability scenario

  28. The GILDA t-Infrastructure (https://gilda.ct.infn.it) • 20 sites in 3 continents • > 11000 certificates issued, >20% renewed at least once • > 250 courses, training events, official university curricula • > 2,000,000 hits on the web site from >100 different countries • > 4.5 TB of training material downloaded from the web site

  29. The INFN Grid Schools (https://agenda.infn.it/conferenceDisplay.py?confId=89, https://agenda.infn.it/conferenceDisplay.py?confId=85) • Two Grid Schools held in Martina Franca (Taranto, Italy) from the 5th to the 23rd of November 2007 • 1-week Grid Site Administrator Training Course (to prepare the “Grid-in-a-Room” infrastructure used in the following weeks) • 2-week Application Integration Training School • 7 applications from different fields (hadron therapy, data mining, neural networks, environment and civil protection, hydrology, optimization) were completely “gridified” during the school itself • By the end of the school, some applications were also running on the INFN production Grid using the resources of several virtual organizations (GRIDIT, THEOPHYS, PAMELA, BIO) • Full report available at: https://agenda.infn.it/materialDisplay.py?materialId=1&confId=85

  30. EGEE Middleware Distribution • [Timeline: LCG-2 product and gLite prototyping tracks through 2004–2005, converging into the gLite 3.0 product in 2006] • Combines components from different providers: Condor and Globus (via VDT), LCG (LHC Computing Grid), EDG (European Data Grid), others • After prototyping phases in 2004 and 2005, convergence with the LCG-2 distribution was reached in May 2006 • gLite 3.0 released in May 2006; the current release is 3.1 • Develop a lightweight stack of generic middleware useful to EGEE applications • Pluggable components – cater for different implementations • Follow the SOA approach, WS-I compliant where possible • Focus now is on re-engineering and hardening • Business-friendly open source license: Apache 2.0

  31. gLite in the Grid “ecosystem” • [Diagram: gLite’s lineage and neighbours – Condor, Globus, MyProxy, VDT, EDG, LCG, SRM, DataTAG, CrossGrid (around 2001); OSG, EGEE, GridCC (interactive), NextGrid, DEISA and future Grids (2004 onwards) – spanning EU and USA infrastructures]

  32. The middleware structure • Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware • Higher-Level Grid Services are meant to help users build their computing infrastructure, but should not be mandatory • Foundation Grid Middleware will be deployed on the EGEE infrastructure: it must be complete and robust, should allow interoperation with other major grid infrastructures, and should not assume the use of Higher-Level Grid Services

  33. gLite services orchestration • [Diagram: the User Interface submits jobs to the Workload Management System; the WMS queries the Information System to discover services and retrieve resource state, consults the File and Replica Catalogs, updates the credential via the Authorization Service, and submits the job to a Computing Element at site X; the Computing Element and Storage Element publish their state to the Information System, and Logging & Bookkeeping records job state for later retrieval]

  34. gLite services decomposition • Access: CLI, API • Security Services: Authentication, Authorization, Auditing • Information & Monitoring Services: Information & Monitoring, Job Monitoring • Data Services: Storage Element, File & Replica Catalog, Metadata Catalog, Data Movement • Job Management Services: Computing Element, Workload Management, Accounting, Job Provenance, Package Manager • Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf

  35. gLite services decomposition (the service map above, repeated as a section divider introducing the Grid access services that follow)

  36. User Interface (UI) • The access point to the EGEE Grid is the User Interface (UI) • It provides the CLI tools to access the functionalities offered by the gLite services • They allow the user to perform the basic Grid operations: • create the user proxy needed for authentication/authorization • retrieve the status of different resources from the Information System • copy, replicate and delete files on the Grid • list all the resources suitable to execute a given job • submit jobs for execution • cancel jobs • retrieve the output of finished jobs • show the status of submitted jobs • retrieve the logging and bookkeeping information of jobs • It also provides the APIs to allow the development of Grid-enabled applications • (a minimal example session is sketched below)
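
As an illustration, a typical UI session might look like the following sketch (bash; the VO name cms and the file names are illustrative placeholders, and option spellings can vary slightly between gLite releases):

    # Create a VOMS proxy for the (example) "cms" VO
    voms-proxy-init -voms cms

    # Submit a job described in job.jdl through the Workload Management System;
    # -a delegates the proxy automatically, -o saves the job identifier to a file
    glite-wms-job-submit -a -o jobid.txt job.jdl

    # Follow the job and, once it reaches the Done state, fetch the output sandbox
    glite-wms-job-status -i jobid.txt
    glite-wms-job-output -i jobid.txt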

  37. GENIUS Grid Portal • Developed by INFN & NICE s.r.l. • GUI mapped to the gLite command line: write JDL, submit JDL, check status, download results (a sample JDL is sketched below) • GUI for storage access • TRIANA integration: execute DAG workflows
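
For reference, a minimal JDL file of the kind such a portal writes and submits could look like this sketch (bash here-doc; myapp.sh, input.dat and the sandbox contents are invented for illustration):

    # Write a minimal job description in JDL (ClassAd-style syntax)
    cat > job.jdl <<'EOF'
    Executable    = "myapp.sh";
    Arguments     = "input.dat";
    StdOutput     = "std.out";
    StdError      = "std.err";
    InputSandbox  = {"myapp.sh", "input.dat"};
    OutputSandbox = {"std.out", "std.err"};
    EOF

    # Submit it exactly as the portal would do on the user's behalf
    glite-wms-job-submit -a job.jdl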

  38. gLite services decomposition (service map repeated as a section divider, introducing the Security services)

  39. Security: Basic Concepts • Structure of an X.509 certificate – Subject: C=IT, O=INFN, OU=Personal Certificate, L=LNL, CN=Marco Verlato; Issuer: C=IT, O=INFN, CN=INFN CA; Validity: Not Before: Mar 15 13:28:54 2008 GMT, Not After: Mar 15 13:28:54 2009 GMT; Serial Number: 3235 (0xca3); plus the public key and the CA digital signature, protected by a passphrase • GSI authentication is based on the PKI X.509/SSL infrastructure • Certificate Authorities (CA) issue (long-lived) certificates identifying individuals (much like a passport); commonly used in web browsers to authenticate to sites • Trust between CAs and sites is established (offline) • To reduce vulnerability, on the Grid user identification is done using (short-lived) proxies of the certificates • Proxies can: be delegated to a service such that it can act on the user’s behalf; include additional attributes (like VO information via the VO Membership Service VOMS, see next); be stored in an external proxy store (MyProxy); be renewed (in case they are about to expire)
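
The certificate fields above can be inspected on any machine with OpenSSL; a minimal sketch (the ~/.globus path is the usual convention and may differ at your site):

    # Print subject, issuer, validity window and serial number of a user certificate
    openssl x509 -in ~/.globus/usercert.pem -noout -subject -issuer -dates -serial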

  40. VO Membership Service: VOMS • Bare certificates are not enough for defining user capabilities on the Grid • Users belong to VOs, to groups inside a VO, and may have special roles • VOMS provides a way to add attributes to a certificate proxy: after mutual authentication of client and server (and a query to the AuthDB), VOMS produces a signed Attribute Certificate (AC), and the client produces a new proxy that contains the attributes • The attributes are used to grant the user additional capabilities according to the VO policies • Example proxy subject: /C=IT/O=INFN/L=Padova/CN=Marco Verlato/CN=proxy • Example VO cms extension information – VO: cms; subject: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Marco Verlato/Email=Marco.Verlato@pd.infn.it; issuer: /C=CH/O=CERN/OU=GRID/CN=host/voms.cern.ch; attribute: /cms/prod/ROLE=manager/Capability=NULL; attribute: /cms/Role=NULL/Capability=NULL; timeleft: 11:59:45
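
On the UI, an AC like the one above is obtained and inspected with the VOMS client tools; a minimal sketch using the cms VO, group and role from this slide:

    # Request a proxy carrying a VOMS AC for the /cms/prod group with role "manager"
    voms-proxy-init -voms cms:/cms/prod/Role=manager

    # Print the proxy and the FQAN attributes embedded in the AC
    voms-proxy-info -all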

  41. LCAS / LCMAPS • Local Centre Authorization Service (LCAS): checks whether the user is authorized (currently using the grid-mapfile); checks whether the user is banned at the site; checks whether the site accepts jobs at that time • Local Credential Mapping Service (LCMAPS): maps grid credentials to local credentials (e.g. UNIX uid/gid, AFS tokens, etc.); also maps VOMS groups and roles (full support of FQANs), e.g.: "/VO=cms/GROUP=/cms" .cms • "/VO=cms/GROUP=/cms/prod" .cmsprod • "/VO=cms/GROUP=/cms/prod/ROLE=manager" .cmsprodman • (see the sketch below)
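
As a concrete sketch, both mappings live in plain files under /etc/grid-security on the service node; the entry below is illustrative only (the DN is the example used elsewhere in these slides, and a leading dot marks a pool account):

    # Add an illustrative grid-mapfile entry: certificate DN -> "cms" pool account
    echo '"/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Marco Verlato" .cms' \
        >> /etc/grid-security/grid-mapfile

The FQAN-to-group lines shown on the slide would sit in the companion groupmapfile in the same directory.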

  42. Security overview

  43. gLite services decomposition (service map repeated as a section divider, introducing the Information & Monitoring services)

  44. GRIS and BDIIs • BDII = Berkeley Database Information Index • Hierarchy: resource-level BDII (MDS GRIS fed by information providers) → site-level BDII → top-level BDII, refreshed every 2 minutes • Queried by the WMS, UI, FTS and WNs • Based on LDAP, with a standardized information provider (GIP) and the GLUE 1.3 schema • The top level is used with 230+ sites; roughly 60 instances in EGEE
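
Each level of the hierarchy can be queried with standard LDAP tools; a minimal sketch (the top-level BDII host name is an example, and 2170 is the conventional BDII port):

    # List the unique IDs of all Computing Elements known to a top-level BDII
    ldapsearch -x -LLL -H ldap://egee-bdii.cern.ch:2170 -b "o=grid" \
        '(objectClass=GlueCE)' GlueCEUniqueID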

  45. GRIS and BDIIs

  46. R-GMA • To users, R-GMA appears similar to a single relational database • Implementation of OGF’s Grid Monitoring Architecture (GMA) • Rich set of APIs (web browsers, Java, C/C++, Python) • Typical deployment consists of Producer and Consumer Services on a one-per-site basis (the R-GMA/MON box), plus a centralized Registry and Schema • A producer application registers with the Registry Service and publishes tuples via SQL “INSERT”; the Schema Service holds table definitions created via SQL “CREATE TABLE”; a consumer application sends an SQL “SELECT” query, which is located via the Registry and answered with the matching tuples
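
Because R-GMA behaves like one virtual database, the whole producer/consumer exchange reduces to SQL; a sketch using the rgma command-line client shipped with gLite (the table name and columns are invented, and feeding statements on standard input is an assumption about the client):

    # Publish a tuple and query it back through the R-GMA virtual database
    # (table "siteLoad" is hypothetical, for illustration only)
    rgma <<'EOF'
    CREATE TABLE siteLoad (site VARCHAR(50), load REAL)
    INSERT INTO siteLoad (site, load) VALUES ('INFN-PADOVA', 0.73)
    SELECT site, load FROM siteLoad WHERE load > 0.5
    EOF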

  47. GridICE monitoring • The MON box also hosts the GridICE extended GRIS (on port 2136) • Usually deployed together with a SE

  48. GridICE monitoring tool

  49. gLite services decomposition (service map repeated as a section divider, introducing the Data services)

  50. Data Services • Need a common interface to storage resources → Storage Resource Manager (SRM) • Need to keep track of where data is stored → File and Replica Catalogs • Need scheduled, reliable file transfer → File Transfer Services • Need a way to describe the files’ content and query them → Metadata Catalog • Heterogeneity: data is stored on different storage systems using different access technologies • Distribution: data is stored in different locations – in most cases there is no shared file system or common namespace – and needs to be moved between them • Data description: data are stored as files, so we need a way to describe files and locate them according to their contents • (a minimal data-management session is sketched below)
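
From the UI these services are driven by the lcg-utils commands together with the file catalogue; a minimal sketch (the VO, SE host names and LFN path are invented placeholders, and a valid VOMS proxy plus a configured file catalogue are assumed):

    # Copy a local file to a Storage Element and register it in the
    # File & Replica Catalog under a Logical File Name (LFN)
    lcg-cr --vo cms -d se.example.org -l lfn:/grid/cms/user/demo.dat \
        file:///home/user/demo.dat

    # Replicate it to a second SE, then fetch it back by LFN
    lcg-rep --vo cms -d se2.example.org lfn:/grid/cms/user/demo.dat
    lcg-cp  --vo cms lfn:/grid/cms/user/demo.dat file:///tmp/demo.dat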
