gLite Overview

Roberto Barbera, Univ. of Catania and INFN. EELA First Tutorial, Madrid, 22.02.2006. Outline: gLite software processes and releases; components overview; status of subsystems; testing status; deployment status; short-term plans; summary and conclusions.

Presentation Transcript


  1. gLite Overview. Roberto Barbera, Univ. of Catania and INFN. EELA First Tutorial, Madrid, 22.02.2006

  2. Outline • gLite software Processes and Releases • Components Overview • Status of Subsystems • Testing Status • Deployment Status • Short Term Plans • Summary and Conclusions

  3. The Grid “ecosystem” [Figure: diagram of US and EU Grid projects from 2001 to 2004 and the software used in them and in future grids. Labels: Condor, Globus, MyProxy, SRM, EDG, VDT, LCG, OSG, DataTAG, CrossGrid, GridCC, NextGrid, EGEE, DEISA.]

  4. Overview of the gLite Middleware • The gLite Grid services follow a Service Oriented Architecture • facilitate interoperability among Grid services • allow easier compliance with upcoming standards • Architecture is not bound to specific implementations • services are expected to work together • services can be deployed and used independently • The gLite service decomposition has been largely influenced by the work performed in the LCG project

  5. gLite Software Processes • Testing Team • Test Release Candidates on a distributed testbed • CERN, RRZN Uni Hannover, Imperial College • Raise bugs as needed • Iterate with Integrators & Developers • Design Team • Architecture Definition • design description of Services presented in the Architecture document • Implementation work plan • Progress tracked monthly at the EMT • EMT defines release contents • Based on work plan progress • Based on essential items needed • so far mainly for the HEP experiments, BioMed and Operations needs • Decide on target dates for tags • Taking into account enough time for integration & testing • Deployment on Pre-production • Service Challenge • Feedback from a larger number of sites and different levels of competence • Raise Critical bugs as needed • Critical bugs fixed with Quick Fixes when possible • Integration Team • produces Release Candidates • based on received tags • Performs • Build, Smoke Test, Deployment Modules • Iterate with developers • Once a Release Candidate has passed functional tests • produces documentation, release notes and final packaging • announces the release on the gLite Web site and the glite-discussion mailing list • Deployment on Production • Selected set of Services • Based on application needs • FTS, R-GMA, VOMS

  6. gLite Releases and Planning [Figure: release timeline from April 2005 to Feb 2006, plotting functionality against release date.] • gLite 1.0: Condor-C CE, gLite I/O, R-GMA, WMS, L&B, VOMS, Single Catalog • gLite 1.1: File Transfer Service, Metadata catalog • gLite 1.1.1: Special Release for SC, File Transfer Service • gLite 1.1.2: Special Release for SC, File Transfer Service • gLite 1.2: File Transfer Agents, Secure Condor-C • gLite 1.3: File Placement Service, FTS multi-VO, Refactored RGMA & CE • gLite 1.4: VOMS for Oracle, SRMcp for FTS, WMproxy, LBproxy, DGAS • gLite 1.4.1: Service Release, ~60 Defect Fixes • gLite 1.5: Functionality Freeze and Release Date marked on the timeline • Quick Fix tags shown along the timeline: QF1.0.12_01_2005 to QF1.0.12_04_2005; QF1.1.0_05_2005 to QF1.1.0_10_2005; QF1.1.2_11_2005, QF1.1.2_12_2005, QF1.1.2_13_2005, QF1.1.2_16_2005; QF1.2.0_14_2005, QF1.2.0_15_2005; QF1.3.0_17_2005 to QF1.3.0_24_2005; QF1.4.1__25_2005 to QF1.4.1__30_2005

  7. gLite status • gLite v1.5 • Hydra, AMGA and GPbox • LFC and DPM integrated • gLite v1.4 released 30/09/2005 • WM Proxy & LB Proxy, DGAS • FPS & FTS unified, MySQL support added • gLite v1.3 released 05/08/2005 • File Placement Service; File Placement Service clients added to UI and WN modules • new data transfer agents, including architecture refactoring to allow proper inter-VO scheduling • gLite v1.2 released 22/07/2005 • File Transfer Service and the File Transfer Agents • gLite v1.1 released 13/05/2005 • File Transfer Service and the Metadata Catalog • gLite v1.0 released 05/04/2005 • See documentation on http://www.glite.org

  8. Middleware in EGEE (by R. Jones, 24/10) [Figure: layered diagram with Applications on top of Higher-Level Grid Services on top of Foundation Grid Middleware.] • Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflows, Grid economies, etc. • Provide specific solutions for supported applications • Host services from other projects • More rapid changes than Foundation Grid Middleware • Deployed as application software using a procedure provided by grid operations • Foundation Grid Middleware: Security model and infrastructure, Computing (CE) & Storage Elements (SE), Accounting, Information providers and monitoring • Application independent • Evaluate/adhere to new standards • Emphasis on robustness/stability over new functionality • Deployed as a software distribution by grid operations

  9. Middleware in EGEE • Application-independent “Grid Foundation Middleware”: all services that need to be deployed on a production Grid infrastructure in order to provide a consistent, dependable service. It can be regarded as the “Middleware Infrastructure”. • Higher-level “Grid Services”: services that certain, but not all, VOs require.

  10. Grid Foundation Middleware (1/2) • A Service Oriented Architecture (SOA) approach is essential to allow different Grid foundation services to interface with services not developed inside EGEE-II. • Adherence to widely accepted international standards (WSRF) • Components: Security model and infrastructure • define and enforce policies within VOs as well as those needed by the resource provider (i.e. between VOs). • resource access control, resource access auditing and VO membership management. • policy definition and enforcement, dynamic connectivity, auditing, and interoperability with Shibboleth-based authentication and authorisation infrastructures. Computing Element (CE) • set of services that provide homogeneous, managed, and secure access to heterogeneous, remote computing resources. • used to submit jobs directly or to dynamically deploy VO-specific schedulers. • collaboration with Condor and Globus.

  11. Grid Foundation Middleware (2/2) Storage Element (SE) • provides homogeneous access to storage resources, including managed data transfer. • POSIX-like I/O and FTS. Accounting • collects the relevant information locally at the resources and makes it available at a global or VO level (in a secure manner) for statistical, billing, or scheduling purposes. • Prototype. Information and monitoring • Information on resources must be accessible to other services in a dependable and timely manner. • Supports monitoring of resources and user-level monitoring (information and monitoring infrastructure, and service discovery). • Allows free information flow across different Grid flavours.

  12. Grid Services • Grid Services comprise higher-level services, typically VO specific • exploit the Grid Foundation Middleware for achieving their purpose. • Grid Services do not come only from within EGEE-II • Produce a middleware infrastructure that is general and capable of hosting Grid Services coming from other sources and projects. • Services: • workload management services • Grid-scheduler-like functionality • logging, bookkeeping, and job provenance services • keep track of actions performed at the Grid level • replica management • services that reliably schedule data movement and catalog updates • visualization services • for various information collected on the Grid • workflow services • application-specific abstraction of the workflow • Grid economies • economy-based scheduling, advanced reservation, etc.

  13. gLite components overview [Figure: block diagram of the gLite services, with a legend distinguishing components available now from those planned for the near future.] • Access Services: Grid Access Service, API, CLI • Security Services: Authentication, Authorization, Auditing, Dynamic Connectivity • Information & Monitoring Services: Information & Monitoring, Job Monitoring, Service Monitoring, Service Discovery • Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement • Job Management Services: Job Provenance, Package Manager, Accounting, Workload Management, Computing Element • Site Proxy

  14. Status of Security Services • Most Services rely on GSI and MyProxy • Still using the well-understood GT2 implementation • Authentication can be expensive • Several subsystems provide bulk operations • VOMS • Manages VO Membership • Provides support for Groups and Roles • Support for MySQL and Oracle DB back ends • Included in the VDT • VOMS Admin • Support for Oracle and MySQL back ends • VOMS Admin (Oracle) still problematic • Deployed on the Production Infrastructure • Interfaced with OSG’s VOMSRS
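
The Groups and Roles that VOMS attaches to a user are commonly written as a fully qualified attribute name (FQAN) such as `/vo/group/Role=name`. The following is a minimal sketch of how a service might split such a string, assuming that conventional FQAN layout; `parse_fqan` is a hypothetical helper, not part of any VOMS API, and the example VO and group names are invented.

```python
def parse_fqan(fqan):
    """Split a VOMS-style FQAN into VO, group path, role and capability.

    Hypothetical helper for illustration; assumes the conventional
    layout /<vo>[/<subgroup>...][/Role=<role>][/Capability=<cap>].
    """
    if not fqan.startswith("/"):
        raise ValueError("an FQAN is expected to start with '/'")
    groups, role, capability = [], None, None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            role = part[len("Role="):]
        elif part.startswith("Capability="):
            capability = part[len("Capability="):]
        else:
            groups.append(part)
    # "NULL" is the usual placeholder meaning "no role" or "no capability".
    role = None if role == "NULL" else role
    capability = None if capability == "NULL" else capability
    return {
        "vo": groups[0],
        "group": "/" + "/".join(groups),
        "role": role,
        "capability": capability,
    }


attrs = parse_fqan("/biomed/lab/Role=admin/Capability=NULL")
print(attrs["vo"], attrs["group"], attrs["role"])
```

An authorization layer could then compare the `group` and `role` fields against its own policy instead of matching on raw strings.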

  15. Status of Job Management Services • Logging and Bookkeeping (L&B) • Tracks jobs during their lifetime (in terms of events) • LB Proxy • Provides faster, synchronous and more efficient access to LB services • Support for “CE reputability ranking” • Maintains recent statistics of job failures at CEs • Working on inclusion of L&B in the VDT • Computing Element (CE) • Service representing a computing resource • CE moving towards a VO-based local scheduler • Batch Local ASCII Helper (BLAH) • More efficient parsing of log files (these can be left residing on a remote machine) • Support for hold and resume in BLAH • Useful e.g. to put a job on hold while waiting for the staging of the input data • Condor-C GSI enabled • CE Monitor (CEMon) • Better support for the pull mode • Security support • GridIce plugin for CEMon implemented • Included in VDT and used in OSG for resource selection • GPbox • XACML-based policy maintainer, parser and enforcer. • Can be used for authorization checks at various levels.
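
The “CE reputability ranking” item above amounts to keeping recent job-failure statistics per CE and preferring the CEs that fail least. A minimal sketch of that idea follows, assuming a fixed-size window of recent outcomes per CE; the class name, the window size and the ranking rule are illustrative assumptions and do not reflect how L&B implements it.

```python
from collections import defaultdict, deque


class CEFailureStats:
    """Hypothetical tracker of recent job outcomes per Computing Element."""

    def __init__(self, window=100):
        # Only the last `window` outcomes of each CE are remembered,
        # so old failures age out of the statistics.
        self._outcomes = defaultdict(lambda: deque(maxlen=window))

    def record(self, ce, failed):
        """Store one job outcome (True means the job failed) for a CE."""
        self._outcomes[ce].append(bool(failed))

    def failure_rate(self, ce):
        """Fraction of recent jobs that failed; 0.0 for an unknown CE."""
        outcomes = self._outcomes.get(ce)
        if not outcomes:
            return 0.0
        return sum(outcomes) / len(outcomes)

    def ranked(self):
        """CE names ordered from most to least reliable."""
        return sorted(self._outcomes, key=self.failure_rate)


stats = CEFailureStats(window=3)
for failed in (True, True, False):
    stats.record("ce-a.example.org", failed)
stats.record("ce-b.example.org", False)
print(stats.ranked())
```

A matchmaker could use such a ranking as one input among others, for example as a tie-breaker between CEs that satisfy the same job requirements.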

  16. Status of Job Management Services • Workload Management System (WMS) • Backward compatibility with LCG-2 • WMProxy • Web service interface to the WMS • Allows support of bulk submissions and jobs with shared sandboxes • Support for MPI jobs even if the file system is not shared between CE and Worker Nodes (WN) • Support of R-GMA as resource information repository to be used in the matchmaking besides BDII and CEMon • Support for Data management interfaces (DLI and StorageIndex) • Support for shallow resubmission • Resubmission happens in case of failure only when the job didn't start running • Support for execution of all DAG nodes within a single CE • chosen by user or by the WMS matchmaker • Support for file peeking to access files during the execution of the job • Integration with GPbox, considering simple AuthZ policies • Support for pilot jobs • DGAS Accounting • Accumulates Grid accounting information about the usage of Grid resources by users / groups (e.g. VOs) for billing and scheduling policies • CEs can be instrumented with proper sensors to measure the resources used • Job provenance • Long-term job information storage • Useful for • debugging, post-mortem analysis, comparison of job executions in different environments • statistical analysis • WMS, CE, LB are considered for inclusion in the next LCG-2.7.0 release • Currently deployed on the Pre-production service and DILIGENT testbed • Tested on many private instances
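
The shallow resubmission rule above (resubmit after a failure only if the job never started running) can be sketched as a small decision function. The state names and the retry counter below are assumptions made for illustration; they are not the WMS state machine or its JDL attributes.

```python
def should_shallow_resubmit(history, retries_left):
    """Decide whether a failed job qualifies for shallow resubmission.

    `history` is the ordered list of states the job went through.
    The state names ("SUBMITTED", "RUNNING", "FAILED", ...) are
    hypothetical labels, not actual WMS or L&B states.
    """
    failed = bool(history) and history[-1] == "FAILED"
    # The key condition: the job must never have started running.
    started = "RUNNING" in history
    return failed and not started and retries_left > 0


# Failed while still waiting at the CE: eligible.
print(should_shallow_resubmit(["SUBMITTED", "SCHEDULED", "FAILED"], 2))
# Failed after it had started running: not eligible.
print(should_shallow_resubmit(["SUBMITTED", "RUNNING", "FAILED"], 2))
```

Restricting retries to jobs that never ran avoids repeating work that may already have produced side effects, such as partially written output files.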

  17. Status of Data Management Services • FiReMan catalog • Resolves logical filenames (LFN) to physical location of files (URL understood by SRM) and storage elements • Oracle and MySQL versions available • Secure services, using VOMS groups, ACL support • Full set of Command Line tools • Simple API for C/C++ that hides much of the complexity for easy usage • Attribute support • Symbolic link support • Exposing interfaces suitable for matchmaking (StorageIndex and DLI) • Separate catalog available as a keystore for data encryption (‘Hydra’) • Deployed on the Pre-Production Service and DILIGENT testbed • gLite I/O • POSIX-like access to Grid files • Interfaced to Castor, dCache and DPM Storage Resource Managers • Added a remove method to be able to delete files • Configuration using the common Service Discovery interfaces • Improved error reporting • Has been used for the BioMedical Demo in Pisa (Oct’05) • Deployed on the Pre-Production Service and the DILIGENT testbed • AMGA MetaData Catalog • NA4 contribution • Result of JRA1 & NA4 prototyping together with PTF assessment • Used by the LHCb experiment • Has been used for the BioMedical Demo in Pisa (Oct’05)
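
The catalog's core job, resolving a logical filename to the physical replicas and the storage elements that hold them, can be illustrated with a toy in-memory mapping. This sketch assumes replicas are stored as `srm://host/path` URLs; the class, its methods and the example hosts are hypothetical and unrelated to the FiReMan API.

```python
from urllib.parse import urlparse


class ToyFileCatalog:
    """Hypothetical in-memory stand-in for a file and replica catalog."""

    def __init__(self):
        self._replicas = {}  # LFN -> list of physical replica URLs

    def add_replica(self, lfn, surl):
        """Register one physical copy of a logical file."""
        self._replicas.setdefault(lfn, []).append(surl)

    def resolve(self, lfn):
        """Return all known physical URLs for a logical filename."""
        return list(self._replicas.get(lfn, []))

    def storage_elements(self, lfn):
        """Return the hosts holding a replica, the kind of answer
        a matchmaker needs to place a job near its data."""
        return sorted({urlparse(surl).hostname for surl in self.resolve(lfn)})


catalog = ToyFileCatalog()
catalog.add_replica("/grid/demo/run1.dat", "srm://se1.example.org/data/run1.dat")
catalog.add_replica("/grid/demo/run1.dat", "srm://se2.example.org/store/run1.dat")
print(catalog.storage_elements("/grid/demo/run1.dat"))
```

The `storage_elements` lookup mirrors the purpose of the matchmaking interfaces mentioned above: a scheduler asks where the data is and does not need the full physical paths.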

  18. Status of File Transfer Service • Reliable file transfer • Fully scalable implementation • Java Web Service front-end, C++ Agents, Oracle or MySQL database support • Support for • Channel, Site and VO management • gsiftp and Storage Resource Management (SRM) interfaces • Interfaces for management and statistics monitoring • Has been in use by the Service Challenges for the last 6 months. • FTS evolved to include • Support for MySQL and Oracle • Multi-VO support • SRM copy support • MyProxy server as a CLI argument • Many small changes/optimizations revealed by Service Challenges usage

  19. Status of Information Systems • R-GMA • Essentially bug fixes & consolidation • Merging LCG & gLite code base • Secure version • Used in production as monitoring data aggregator • Job status published from Logging & Bookkeeping every 5 minutes • Interfaced from the Workload Management System • Service Discovery • An interface has been defined and implemented for 3 back-ends • R-GMA • BDII • Configuration File • Command Line tool for end users • Used by WMS and Data Management clients • Production Services still using BDII as the Information System • Pre-Production Service has started to use R-GMA
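
The Service Discovery design above, one interface with interchangeable back ends, can be sketched as follows. Only a configuration-file back end is implemented concretely; `StaticBackend` is a stand-in for a remote source such as R-GMA or BDII. All class names, the INI layout and the fall-through order are assumptions for illustration, not the gLite Service Discovery API.

```python
import configparser
from abc import ABC, abstractmethod


class DiscoveryBackend(ABC):
    """Common interface that every back end implements."""

    @abstractmethod
    def find(self, service_type):
        """Return a list of endpoint URLs for the given service type."""


class ConfigFileBackend(DiscoveryBackend):
    """Back end that reads endpoints from INI-style text,
    one section per service type (hypothetical layout)."""

    def __init__(self, ini_text):
        self._cfg = configparser.ConfigParser()
        self._cfg.read_string(ini_text)

    def find(self, service_type):
        if not self._cfg.has_section(service_type):
            return []
        return [url for _, url in self._cfg.items(service_type)]


class StaticBackend(DiscoveryBackend):
    """Stand-in for a remote information system such as R-GMA or BDII."""

    def __init__(self, table):
        self._table = table

    def find(self, service_type):
        return list(self._table.get(service_type, []))


def discover(backends, service_type):
    """Ask each back end in turn and return the first non-empty answer."""
    for backend in backends:
        endpoints = backend.find(service_type)
        if endpoints:
            return endpoints
    return []


ini = "[file-transfer]\nprimary = https://fts.example.org:8443/fts\n"
backends = [StaticBackend({}), ConfigFileBackend(ini)]
print(discover(backends, "file-transfer"))
```

Because clients depend only on `find`, a site could switch its information system from BDII to R-GMA by changing which back ends are configured, without touching client code.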

  20. gLite Services for Release 1.5: Components Summary and Origin (origin given in parentheses) • Computing Element: Gatekeeper, WSS (Globus); Condor-C (Condor); BLAH, CE Monitor (EGEE); Local batch system (PBS, LSF, Condor, SGE, BQS); DGAS Accounting (EDG/EGEE) • Workload Management: WMS, Logging and Bookkeeping (EDG/EGEE); Condor-C (Condor); Job Provenance (EGEE) • Storage Element: File Transfer/Placement (EGEE); glite-I/O (AliEn); GridFTP (Globus); Castor, DPM (CERN); dCache (FNAL, DESY) • Catalog: File and Replica Catalog (EGEE); Metadata Catalog (EGEE/NA4) • Information and Monitoring: R-GMA (EDG/EGEE); Service Discovery (EGEE); BDII (EDG/LCG) • Security: VOMS (DataTAG, EDG/EGEE); GSI, WSS (Globus); LCAS/LCMAPS (EDG/EGEE); Authorization for C and Java based (web) services (EDG/EGEE/Globus); GPbox (EGEE) • User Interface (EDG/EGEE)

  21. Status of gLite Deployment • Production • File Transfer Service (FTS) • R-GMA (Monitoring & Accounting Data Aggregation) • VOMS/VOMS Admin • Preproduction Service • 14 sites • CERN, CNAF, PIC Computing Elements are connected to the production worker nodes • ~1.5M jobs submitted • FTS, WMS/LB/CE, FiReMan, gLite I/O (DPM, Castor), R-GMA • Others • DILIGENT (Digital Library Project) has deployed a number of those services as well • GILDA

  22. gLite Testing Status

  23. Status of parallel activities • Revision of the Architecture, Design and Work plan documents • https://edms.cern.ch/document/594698/ • https://edms.cern.ch/document/573493/ • https://edms.cern.ch/document/606574/ • Advanced Reservation • Architecture proposed • https://edms.cern.ch/file/508055/2-2/EGEE-JRA1-AR-508055-v2-2.pdf • Integration with WMS prototyped • http://agenda.cern.ch/askArchive.php?base=agenda&categ=a052420&id=a052420s3t5/transparencies • Work presented at the 1st IEEE International Conference on e-Science and Grid Computing in Melbourne, December 2005 • OMII & GT4 evaluations • https://edms.cern.ch/document/683456/ • https://edms.cern.ch/document/672123/ • Interfacing of ProActive to gLite • Demonstrated at the 2nd Grid PlugTests event • Hands-on with gLite tutorial • Development of a new Web Services based CE • CREAM: http://grid.pd.infn.it/cream/field.php

  24. Convergence of gLite and LCG-2 • Converge from LCG and gLite to a single middleware stack called gLite. The first version will be gLite 3.0.0 • gLite 1.5.0 and LCG 2.7.0 will be the last independent releases (expected in January) • gLite 3.0.0 will contain the following components: • All components already in LCG 2.7.0 plus upgrades • this already includes new versions of VOMS, R-GMA and FTS • The Workload Management System and the DGAS accounting system of gLite 1.5.0 • The PPS will also have the other gLite 1.5.0 components • After the release, the missing components of gLite 1.5.0 will be included in a new release, or the same functionality will be added to the existing components • Will start from the data management system (FiReMan, gLite I/O, Hydra, AMGA)

  25. Timeline for gLite 3.0.0 • TCG proposed timeline: • Integration: During January • On component level (separate build systems) • Merging the service configuration tools • Taking into account outcome of the site manager’s survey • Testing and Certification: During February/March • Preproduction service as deployment test • Based on existing test suites • Very little time for merging the test frames • Public Release: April/May • Deploying WLM in parallel on large sites • Small sites can afford just one gatekeeper

  26. Releases after gLite 3.0.0 • Driven by the TCG according to the EGEE-II process • other components from gLite 1.5.0 • adaptation of components to meet user needs • Introduction of missing functionality • Input from application Task Forces

  27. Summary and conclusions • The EGEE middleware: • Is exiting the prototyping phase and entering the real production phase (the first real LHC data are only 1 year away!) • Implements a full and complete stack of grid services that can be used all together or separately at the user’s discretion • Closely follows the standardization process under way in GGF and other fora for the standardization of grid middleware

  28. Further information • EGEE http://www.eu-egee.org/ • gLite http://www.glite.org/
