160 likes | 164 Views
SA1 Execution Plan Status and Issues. Ian Bird EGEE Operations Manager. EGEE is proposed as a project funded by the European Union under contract IST-2003-508833. SA1 Objectives. Core Infrastructure services: Operate essential grid services Grid monitoring and control:
E N D
SA1 Execution PlanStatus and Issues Ian Bird EGEE Operations Manager EGEE is proposed as a project funded by the European Union under contract IST-2003-508833 All Activity Meeting – 14 January 2004 - 1
SA1 Objectives • Core Infrastructure services: • Operate essential grid services • Grid monitoring and control: • Proactively monitor the operational state and performance, • Initiate corrective action • Middleware deployment and resource induction: • Validate and deploy middleware releases • Set up operational procedures for new resources • Resource provider and user support: • Coordinate the resolution of problems from both Resource Centres and users • Filter and aggregate problems, providing or obtaining solutions • Grid management: • Coordinate Regional Operations Centres (ROC) and Core Infrastructure Centres (CIC) • Manage the relationships with resource providers via service-level agreements. • International collaboration: • Drive collaboration with peer organisations in the U.S. and in Asia-Pacific • Ensure interoperability of grid infrastructures and services for cross-domain VO’s • Participate in liaison and standards bodies in wider grid community All Activity Meeting – 14 January 2004 - 2
Operations Infrastructure All Activity Meeting – 14 January 2004 - 3
SA1 Management All Activity Meeting – 14 January 2004 - 4
CERN (OMC, CIC) UK+Ireland (CIC,ROC) France (CIC, ROC) Italy (CIC, ROC) Germany+Switzerland (ROC) Northern Europe (ROC) South West Europe (ROC) South East Europe (ROC) Central Europe (ROC) Russia (CIC – M12, ROC) 48 Partners involved in SA1 ROC’s in several regions are distributed across many sites SA1 Partners Organisational Issues • ROC’s must take responsibility: • Chapter in execution plan • Organisation within region • Reporting to overall SA1 • CIC’s vs ROC’s • Need to clarify overlap/separation – dedicated discussion All Activity Meeting – 14 January 2004 - 5
Milestones & Deliverables All Activity Meeting – 14 January 2004 - 6
Work breakdown – tasks, products • Basic level: • OMC, CIC, ROC, Security • Next levels will depend on refinement of roles and responsibilities All Activity Meeting – 14 January 2004 - 7
Training requirements • New people: • LCG, EDG, Globus – tutorials? • New sites: • LCG procedures, middleware, policies, operations • Longer term: • Grid services, OGSA and associated technologies etc All Activity Meeting – 14 January 2004 - 8
Risk Analysis • People: • Need to identify unfunded effort now • Hiring process will not have people in place by April 1 • Short term contracts – • tends to attract inexperienced people • Continuity • Middleware: • Response to operational needs • Packaging and installation not flexible enough • Scope-creep of middleware – bug fixes vs functional changes • Infrastructures: • National infrastructures cannot integrate in EGEE • UK e-Science, NorduGrid • Must be addressed in execution plans • Use LCG Deployment and Security Risk Analyses • http://lcg.web.cern.ch/LCG/PEB/risk/ , • http://proj-lcg-security.web.cern.ch/proj-lcg-security/RiskAnalysis/risk.html • Quite comprehensive • Many issues in common All Activity Meeting – 14 January 2004 - 9
Issues related to other activities • JRA1: • Understand requirements setting process • Agree on split of responsibilities for integration/certification/testing • Understand support issues for LCG-2 based middleware • Discussions in progress • SA2: • No real issues for Execution plan • Operationally will work together – GOC vs NOC, understanding of SLA’s • Security: • Operational security – relationship to JRA3 • Catch-all CA – whose responsibility? (JRA3?) • JRA2: • Project and task breakdown and reporting structures focussed on products and not services All Activity Meeting – 14 January 2004 - 10
Changes requested to TA • No significant changes foreseen • Execution plan will address • Refinement of roles (CIC, ROC, OMC etc) • Interactions between them • Resource and service delivery All Activity Meeting – 14 January 2004 - 11
General Issues • Peering with US Grid infrastructures • Needs interaction from project start • Federation chapters should detail how national infrastructures integrate/migrate to EGEE • Hiring • (No) People to work on execution plan • Focus on getting LCG-2 running – EGEE will have operational infrastructure at project start • Need to agree that ROC’s (or defined people in each region) have organisational responsibility for the region • Cannot deal with 48 partners directly • All aspects: • Negotiate SLAs with Resource centres and with EGEE • Operational responsibility • Organisational responsibility – planning and reporting (effort reports, etc) All Activity Meeting – 14 January 2004 - 12
Execution plan • Execution plan is not well advanced: • Many people focussed on getting LCG-2 going • Outline exists – sections and contents quite well defined • Template agreed for chapters for each region • Input and ideas for some sections • Much work still to do: • Refine roles and responsibilities for ROC, CIC • How to approach SLA’s • Do we need a “GDB”(OAG)? – ROC and CIC managers All Activity Meeting – 14 January 2004 - 13
Overview – introduction Organisation Clarify roles, responsibilities, interactions Processes Security group Policy group – relation to NA5 Federation organisation Chapter per region -> template Service set up and deployment What services, functions Relation to US, Asia projects Transition process New centres – process Training Policies Operations and tools Certification, Middleware support CA and VO management Operational Security Metrics – measurement, validation of services SLA’s Management PBS WBS Staffing and resources plan From regions Training plan Risk assessment Deliverables and responsibilities Execution Plan - TOC All Activity Meeting – 14 January 2004 - 14
Resources: Resource centres Resources - At month 1, 6, 15? When RC joins EGEE FTE for support (need > min) Name system manager, security manager Policy for: Which VO supported … System support and Availability (on-call hours etc) … Expertise available Ability to run general grid services (IS, RLS, etc) ROC organization: Organization – teams, etc People – and responsibilities – must name unfunded people (and funded if known) Support – how it is organized and managed, especially for distributed ROC’s Operational support – how region is organized and managed Execution plan – ROC template All Activity Meeting – 14 January 2004 - 15
Steps to project start-up • Hiring • Refine roles and interactions between CIC, ROC, NOC and other operational organisations • Clarify interactions with JRA1 • Clarify security issues • Clarify relationship to international grid projects for VO’s that are larger than EGEE (LCG, other HEP,…) • Understand national infrastructure integration issues • What requirements will this place on operations, mw, processes, security, etc. • Set up basic policy frameworks: • How VOs access resources, limitations, rules, etc – sufficient that infrastructure can be exploited from project start • Identify sites willing to run development service (based on EDG tb sites?) • Set up OMC team • Organise membership of QA, Security, Requirements, etc. groups • Define transition process from LCG operations to EGEE operations All Activity Meeting – 14 January 2004 - 16