1 / 21

A Draft Model for Discussion Jamie.Shiers@cern.ch Rome, 13-14 March 2008

Grid Operations in EGI / NGIs Operations task-force: Maite Barroso, Jamie Shiers, Nick Thackray (CH), Sven Hermann (DE), Rolf Rumler (FR), Per Öster (FI), Fotis Karayannis (GR), Tiziana Ferrari (IT) + input from John Gordon (GB) and many others. A Draft Model for Discussion

taryn
Download Presentation

A Draft Model for Discussion Jamie.Shiers@cern.ch Rome, 13-14 March 2008

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Grid Operations in EGI / NGIsOperations task-force: Maite Barroso, Jamie Shiers, Nick Thackray (CH), Sven Hermann (DE), Rolf Rumler (FR), Per Öster (FI), Fotis Karayannis (GR), Tiziana Ferrari (IT) + input from John Gordon (GB) and many others... A Draft Model for Discussion Jamie.Shiers@cern.ch Rome, 13-14 March 2008

  2. Disclaimer • What is presented here is clearly a draft for discussion • It is based upon a number of assumptions, which will be explained next • Feedback on this proposal, not only from the NGIs and experts, but also from the Application Communities that they support, is clearly required and is actively solicited www.eu-egi.org

  3. Key Assumptions • Timeline: • There are many applications using Grid resources in a production fashion today • Continuity must be provided to these communities – in other words the move to an “EGI world” must be non-disruptive and on time • Functionality: • Similarly, the key functionality provided in terms of operations must be maintained both during and after the transition phase www.eu-egi.org

  4. What is EGI Operations? • To answer this question, we need a much better idea of what “the EGI Grid” will be… Is it: • A large-scale, production Grid infrastructure – build on National Grids that interoperate seamlessly at many levels, offering reliable and predictable services to a wide range of applications, ranging from “mission critical” to prototyping and research? • A loosely coupled federation of NGIs with little or no cross-grid activity, heterogeneous and sometimes incompatible middleware stacks, no cross-grid accounting, no need for coordinated operations or management

  5. What is EGI Operations? • To answer this question, we need a much better idea of what “the EGI Grid” will be… Focus on: • A large-scale, production Grid infrastructure – build on National Grids that interoperate seamlessly at many levels, offering reliable and predictable services to a wide range of applications, ranging from “mission critical” to prototyping and research • A loosely coupled federation of NGIs with little or no cross-grid activity, heterogeneous and sometimes incompatible middleware stacks, no cross-grid accounting, no need for coordinated operations or management

  6. And the EGI Added Value? • In order to be both attractive and maintainable, Grids need to have the following attributes: • Low cost of entry; • Low cost of ownership. both in terms operations as well as application and user support • The basic principles of reliability and usabilitymust be designed in from the start – adding them later is not consistent with the goals of low cost of ownership.

  7. How is this achieved? • We should not forget one of the key features of the Grid – resilience to failure / scheduled downtime of individual components and / or sites • This significant advantage can only be realised through a sufficient degree of interoperability & interoperation • But gives individual NGIs much more freedom & flexibility! www.eu-egi.org

  8. EGI Operations Principles • Reliability of Grid services and SLAs; • Multi-level operation model; • EGI, NGI and ROC; • Multiple middleware stacks; • Planning, coordination and gathering of new requirements; • Cooperation; • Federation, interoperability and data aggregation. www.eu-egi.org

  9. Reliability / SLAs • A large-scale, production Grid infrastructure – built on National Grids that interoperate seamlessly at many levels, offering reliable and predictable services to a wide range of applications, ranging from “mission critical” to prototyping and research. • It is understood that it will be a long and continuous process to reach this goal, with additional NGIs and/or application communities joining at different times, with varying needs and different levels of “maturity” • In addition, sites of widely varying size, complexity and stage of maturity must clearly be taken into account • The EGI shall negotiate the minimal size and set of functions for an NGI to participate in a wider context, including the associated Service Level Agreements. • This includes the agreement and follow-up of the associated certification processes. In some cases, these requirements may be more stringent than those used within a given NGI. • Only a subset of sites participating within an NGI may satisfy the wider requirements at the EGI level. www.eu-egi.org

  10. Multi-level operation model • Highly centralized models – e.g. for monitoring – have been shown to be both intrusive and non-scalable • This suggests a move to a multi-level operations model • e.g. EGI/regional “cluster”/ NGI … • Whilst building on the positive experience of today’s production Grids, these concerns must nevertheless be taken into account as part of the EGI / NGI architecture. • This includes designing and deploying for low-cost-of entry and ownership, whilst maintaining sufficient flexibility to meet the requirements of the application clusters. • The EGI shall foster agreement on the definition of the key operations infrastructure, its establishment and delivery. • Such functions are preferably located at one or more NGIs • to offer both resilience and scalability www.eu-egi.org

  11. EGI, NGI and ROC • The participation of NGIs to the operation of the European grid infrastructure requires a set of services to be operated in a coherent way. • Currently, within EGEE, this is guaranteed by the ROCs, that either span over several countries (NGIs) or are serving one country only. • The NGIs must assure that the services are operated, either at the NGI level or through associating into ROC equivalents. • Regardless of the technical organization, all the NGIs need to be individually represented in an EGI operations board, where strategies and general problems are discussed. www.eu-egi.org

  12. Multiple middleware stacks • EGI operations will be responsible for guaranteeing support to all the adopted middleware stacks in collaboration with the operations staff from NGIs www.eu-egi.org

  13. Planning, coordination and gathering of new requirements • The EGI operations team is mainly responsible for operations planning and coordination of efforts by the various NGIs and other parties. • Also, EGI operations staff work towards a smooth evolution of tools and operational procedures according to [any] new requirements [ that are] gathered www.eu-egi.org

  14. Cooperation • EGI and NGI operations cooperate to solve problems of common interest such as: • guidelines for robust services, • security best practices, • middleware security issues, • steering of new developments, • site maintenance, • intervention procedures, • incident response, • escalation procedures • and so forth. • For this reason, EGI promotes and coordinates meetings, workshops, EGI and NGI joint working groups, etc. www.eu-egi.org

  15. Federation, interoperability and data aggregation • EGI must federate a variety of operational aspects – some of which are implemented by NGIs and/or component sites. • Consistency of security procedures, user support, incident tracking, monitoring and accounting must be ensured. • EGI ensures interoperability of operational tools/infrastructures for security, monitoring, support, accounting, etc. • In order to aggregate usage information for VOs, users and NGIs, operational data such as • monitoring information, • availability statistics and accounting records – collected by the NGIs need to be aggregated at the EGI level for SLA monitoring in full respect of the relevant national legal constraints. www.eu-egi.org

  16. Core Operations Tasks • Regional Operations coordination; • Coordination and support for roll out of mw updates; • Grid security and incident response coordination; • Interoperations (OSG, EU related projects); • Weekly operations meetings and operations workshops; • Support from mw resident service experts; • Middleware release support; • VO Membership Service; • Service Availability Monitoring; • User support coordination and the global Grid user support (GGUS); • Certification authority for various VOs; • Monitoring; • Pre-production coordination; • Triage of incoming problems and assignment of tickets to second line support units www.eu-egi.org

  17. Operations Resources • Resource estimation from draft document for EGI_DS deliverable 5.1 www.eu-egi.org

  18. EGI Transition Proposal • The EGEE Grid is currently used for large-scale production by a number of scientific VOs. • It will be unacceptable to them to have a disruptive transition to a different operational grid in EGI. • The two options are: • Define the EGI model very quickly to allow a smooth transition during EGEE III • Assume that Day One EGI Operations follow the EGEE model and any subsequent change is evolutionary • Given the experience in previous Grid projects, it is presumably too late for the first so we propose a working assumption of the second. Adapted from proposal by John Gordon, hence focus on EGEE. Requirement forsmooth and timely transition equally valid for other production Grids!

  19. How to achieve this? • The EGEE Operational model has three levels: EGEE-wide, Regional, and National • Don’t forget we already have national duties like CA management. • The migration to EGI will involve a migration of duties down towards NGIs • The migration from Central to Regional has started in EGEE III • Our proposal is that responsibility for the balance between Regional and National be left to the group of NGIs that make up each existing Region. • They have the joint duty to continue the existing EGEE service in their region. • They have the freedom to deliver this any way they choose • at one extreme they may decide to continue with the existing ROC and organise its funding internally. • at the other they may decide to devolve everything to each NGI • More likely is some combination of the two, with some migration from the former to the latter over time. • Leave this to the regions. They can then progress independently as suits regional and national needs and priorities. EGI defines and monitors the operational service definition to ensure a seamless grid for the users. Adapted from proposal by John Gordon, hence focus on EGEE. Requirement forsmooth and timely transition equally valid for other production Grids!

  20. Key Issues • Non-disruptive & timely transition from current Operations scenarios to EGI+NGIs • Ensuring “value-for-money”: • Applications Communities; • NGIs; • Funding agencies; must all be convinced that any money involved is not only well but also optimally spent! www.eu-egi.org

  21. Summary & Conclusions • Over many years we have built up production Grid infrastructures, much experience in their operations, as well as large and diverse application communities • Continuity is a key requirement for the future, as we move to a sustainable, long-term and multi-disciplinary e-infrastructure • Value for money is essential and at all levels www.eu-egi.org

More Related