WP2 Infrastructure and Service Management

This review highlights the goals, achievements, challenges, metrics, and conclusions of the ETICS project's infrastructure and service management at CERN. The focus is on local autonomy, supporting heterogeneity, and ensuring reproducibility of operations.

Presentation Transcript


  1. WP2 Infrastructure and Service Management
     Miron Livny, Peter Couvares (University of Wisconsin)
     Marian ZUREK (CERN)
     ETICS Final Review, CERN, Geneva - 15 February 2008

  2. Contents
     • Goals
     • Achievements
     • Challenges
     • Metrics and statistics
     • Conclusions
     ETICS Final Review - WP2 - CERN, 15 February 2008

  3. Goals
     • Identify, set up and maintain the computing resources, providing enough capacity and diversity to test the software produced by pilot projects. Update and expand the resources as identified by the requirements
     • Set up and maintain the project software databases and repositories, adding new libraries or updating the existing items
     • Produce and implement SLAs, support policies and documentation for the ETICS service, describing what users can expect in terms of availability, accessibility and user support

  4. Goals
     • Autonomy
       • The focus is on local autonomy of the ETICS sites:
         • Each site maintains control of local policies and service administration
         • Service deployment reflects the local priorities, constraints, and administrative processes of local resource owners, and is more accountable to their needs
       • Fault-tolerance mechanisms are provided:
         • Technical problems (support can be given by personnel at any site)
         • Backups, service redundancy
         • Social issues (overloaded systems administrators, insufficient resources, etc.)
       • Streamlining operations!

  5. Goals
     • Heterogeneity must be supported
       • Different projects need different platforms, architectures, operating systems, compilers and development tools. Technologies evolve over time, but older versions don't disappear
       • More than 50 different combinations of operating system distributions and CPU architectures are currently available
     • However, operations must be reproducible
       • Use the same service foundations (a common configuration database, the same versions of tools and services) across all sites
       • Critical deployment procedures are established, tested and followed by all sites (as detailed in deliverable D2.4)
       • Leverage mature, well-known software tools and technologies (NMI/Condor, MySQL, Tomcat, Axis, Python, Java)

  6. ETICS Sites Usage
     • CERN focuses on supporting EGEE/gLite, DILIGENT, EGEE/Applications and OMII-Europe
     • UW-Madison focuses on supporting US software projects like TeraGrid, Open Science Grid, VDT and Condor
     • INFN focuses on supporting EGEE/gLite and OMII-Europe

  7. Achievements - Infrastructure
     • A robust, scalable and distributed infrastructure is in place and operating 24/7
     • The number of supported platforms has reached 50+
     • The number of CPUs has reached 200+ and is growing
     • The infrastructure setup has been designed to take into account local site policies (security, access, firewalls, etc.)
     • Redundant backups and service recovery scenarios are well addressed

  8. Achievements - Repository
     • WP2 has deployed and maintained the ETICS Repository
     • Initially the repository was based on a structure developed on top of AFS with a web-server interface (very efficient read-only access)
     • A fully-featured Repository service including web interface, client APIs, support for packages, reports, metrics, etc. is now in place and operating on a dedicated server - please see the separate WP5 presentation and demo

  9. Achievements - IPv6
     • ETICS has established a collaboration (MoU under preparation) with EGEE/SA2 (Network Support) and EUChinaGrid on IPv6 testing
     • A dedicated infrastructure composed of ETICS core services running at CERN and worker nodes operating in Rome (GARR) and Paris (UREC/IN2P3) has been deployed
     • Initial prototype fully integrated in the ETICS production environment
     • Operational tests of the BDII service client-server interaction fully deployed and showcased
     • Development and integration of an IPv6 plug-in for code compliance checking in place

  10. Achievements - IPv6

  11. Achievements - Diligent testing
     • Deployment of a dedicated Diligent testing infrastructure in Budapest (needed to address security/confidentiality issues and some deployment issues with the middleware needed by the Diligent services)
     • Connecting remote resources initially to the CERN pre-production and later to the production infrastructure
     • Dedicated node configurations have been deployed, so that ONLY the Diligent tests are authorized to run on the Budapest nodes (more on this in the WP4 presentation)

  12. Achievements - multi-node tests
     • The infrastructure for the multi-node tests has been deployed
     • Condor functionality has been extended following ETICS requirements (for example to address some node synchronization problems)
     • The system is now deployed and initially opened to a small number of advanced users (gLite, Diligent) to validate some real-world cases (more information in the WP4 presentation and the afternoon demo)

  13. Achievements - Virtualization
     • Needed for reliable testing of privileged operations (e.g. gLite test jobs require privileged access in order to execute properly)
       • Tests which require administrator access and modify a server cannot be securely executed on resources shared between multiple projects and users
       • Isolating such tests inside virtual machines on shared resources enables privileged testing
     • Reproducibility of difficult-to-deploy environments
     • Needed to increase system scalability and maintainability
     • Initial deployment of several servers running SLC4 as host and up to 5 virtual images on each node
     • Use of VMware as virtualization technology

  14. Achievements - Virtualization
     • For the time being deployed as a prototype within the CERN IT infrastructure only; extension to the other sites is being considered in the context of ETICS 2
     • Example of usage
       • EGEE/gLite deployment test using a root environment
       • Quick detection of missing or conflicting software packages for the gLite services
       • Integration within the EGEE SA3 release process (see the test results for org.glite.testsuites in the afternoon demo)

  15. Achievements - job migration
     • One of the major accomplishments of ETICS has been to produce a scalable infrastructure
     • Using features provided by the execution engine, it is possible to migrate build and test jobs from one site to others based on matching user and resource requirements
     • Users can submit build and test requests from a single ETICS portal (or redundant replicas). The jobs are transparently executed where resources are present and available
     • This approach maintains the independence of the sites, while providing the scalability required by complex distributed projects
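
The matching of user and resource requirements described on this slide can be illustrated with a small sketch. This is not the ETICS or Condor implementation: the `Site`, `Job` and `match` names, the platform strings and the slot counts are all hypothetical, chosen only to show how a job might fall through to another site when its preferred site cannot accept it.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    project: str   # project submitting the build/test job
    platform: str  # required OS/architecture combination (hypothetical label)


@dataclass
class Site:
    name: str
    platforms: set
    free_slots: int
    # Empty set means the site accepts jobs from any project (local policy).
    allowed_projects: set = field(default_factory=set)

    def accepts(self, job):
        """Return True if this site can run the job under its local policy."""
        if job.platform not in self.platforms:
            return False
        if self.free_slots <= 0:
            return False
        return not self.allowed_projects or job.project in self.allowed_projects


def match(job, sites, preferred=None):
    """Pick the first site that accepts the job, trying the preferred site first."""
    # sorted() is stable, so the non-preferred sites keep their original order.
    ordered = sorted(sites, key=lambda s: s.name != preferred)
    for site in ordered:
        if site.accepts(job):
            return site.name
    return None  # no site can currently run the job


# Hypothetical pool: platform labels and slot counts are made up.
sites = [
    Site("CERN", {"slc4_ia32"}, free_slots=0),
    Site("INFN", {"slc4_ia32"}, free_slots=3),
    Site("UW-Madison", {"rhel5_x86_64"}, free_slots=5),
    Site("Budapest", {"slc4_ia32"}, free_slots=2, allowed_projects={"Diligent"}),
]

glite_site = match(Job("gLite", "slc4_ia32"), sites, preferred="CERN")
diligent_site = match(Job("Diligent", "slc4_ia32"), sites, preferred="Budapest")
unmatched = match(Job("gLite", "unknown_platform"), sites)
print(glite_site, diligent_site, unmatched)
```

In this toy pool the gLite job migrates away from its preferred site because no slot is free there, while the site restricted to one project only ever receives that project's jobs. Each site's policy stays local to its own `accepts` check, which mirrors the site independence the slide describes.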

  16. Challenges
     • Cross-oceanic relation - time zone issues
       • Europe is about to finish the day when the UW-Madison, Wisconsin colleagues are coming to work (-7 hrs difference). This has been addressed by regular phone conferences and ad-hoc calls in case of burning operational issues
     • Multi-node testing deployment
       • This is cutting-edge technology, which has never been tried before. Many unknown issues arose on the way. We have managed to address them and have provided practical working solutions to our users
     • Job migration
       • Need to take into account each site's local technologies, processes, and policies, e.g., each site needs to be running the latest version of Metronome, sites have different firewall rules, etc.
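
The seven-hour offset behind the time zone challenge can be checked with a short calculation. The sketch below is illustrative only: the 09:00-17:00 working day is an assumption not stated in the presentation, and the fixed UTC offsets (CET = UTC+1, CST = UTC-6) apply to the review date in February, outside daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed winter offsets (assumption: no daylight saving time in February).
CET = timezone(timedelta(hours=1), "CET")   # Geneva
CST = timezone(timedelta(hours=-6), "CST")  # Madison, Wisconsin


def working_window(day, tz, start_hour=9, end_hour=17):
    """Return (start, end) of an assumed local working day as aware datetimes."""
    start = datetime(day.year, day.month, day.day, start_hour, tzinfo=tz)
    end = datetime(day.year, day.month, day.day, end_hour, tzinfo=tz)
    return start, end


def overlap(a, b):
    """Length of the intersection of two (start, end) windows, never negative."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(end - start, timedelta(0))


day = datetime(2008, 2, 15)
geneva = working_window(day, CET)
madison = working_window(day, CST)

offset = CET.utcoffset(None) - CST.utcoffset(None)
shared = overlap(geneva, madison)
print("offset:", offset, "shared working time:", shared)
```

Under these assumptions the two sites share a single hour of the working day, which is consistent with the slide's reliance on scheduled phone conferences and ad-hoc calls.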

  17. Metrics and statistics
     • Deliverables
       • D2.1 - Service Level Agreement document (PM03)
       • D2.2 - Infrastructure installation and usage documentation (PM06)
       • D2.3 - Status of certification, integration and validation test bed setup (prototype) (PM12)
       • D2.4 - Status of certification, integration and validation test bed setup (PM22)
       • D2.5 - Final service usage report (PM24)
     • Milestones
       • M2.1 - Software and tools repository available (PM03)

  18. Metrics and statistics
     • Facilities
       • 3 production sites: CERN, INFN, UW-Madison
       • >50 distinct platforms (arch/OS combinations) supported
       • >200 dedicated CPUs
     • Users
       • 220 registered users
       • 18 registered projects (plus a number of support projects)
     • Production build and test work
       • >53,000 build/test operations submitted by production users
       • Another >15,000 jobs on the test/dev/certification servers
       • >325,000 individual jobs executed as part of the build/test operations

  19. Conclusions
     • ETICS has delivered a production implementation of a reliable infrastructure for continuous build and testing
     • Productive collaboration between sites has been established, which has allowed each to enforce reproducibility of the environment, while maintaining local independence of operations
     • The infrastructure has been profitably used by the major ETICS communities since the beginning
     • We have successfully delivered advanced features providing multi-node testing (co-scheduled resources for testing client/server operations), cross-site submission, execution in privileged environments, and protection of confidential code

  20. Thanks
     http://www.eu-etics.org
