
Deployment Summary

GridPP11, 15th September 2004. Jeremy Coles, J.Coles@rl.ac.uk


Presentation Transcript


  1. Deployment Summary GridPP11 15th September 2004 Jeremy Coles J.Coles@rl.ac.uk

  2. Overview
     • Where are we now?
     • What is deployment all about anyway?
     • Who is doing it?
     • Planning and metrics
     • Issue 1: Communications
     • Issue 2: Fabric management

  3. Where are we now? Who is flying the plane? Are the developers bailing out? We have paying passengers – do we know where we are going? … oh, and can we keep it working, navigate, land and offer a real service?

  4. Who is flying the plane? … introducing the err…. DTEAM + site system administrators + …

  5. Deployment Board
     • Replaces GridPP1 Technical Board
     • Mandate
       • Determine and oversee execution of tech plan
       • Report to PMB
       • Ensure GridPP-wide issues discussed/solved
       • Provide forum for tech info exchange
       • Oversee deployment and use of GridPP h/w
       • Tier1 – Tier2 coordination/liaison
       • Ensure integration of external tech developments

  6. DB members
     • Production Manager
     • Tier1/A Manager
     • 4 T2 Technical Coordinators
     • HEP SYSMAN chair
     • CERN T0/Deployment
     • Applications Area Coordinator
     • Middleware Area Coordinator
     • Technical experts (invited by DB chair)
     • UK NGS
     • EGEE/Ireland
     • DB chair
     ~18 people

  7. DB relations [diagram; node labels: LCG/EGEE/CERN T0, PMB, UK NGS, DB, UB, GridPP DTEAM, M/S/N, APPS, T1AB, T2B]

  8. What must deployment address?
     • Core infrastructure services
       • Resource brokers
       • Information services
       • Data management services
       • Virtual Organisation management
       • Replica Location Service
       • BDII
     • Grid monitoring
       • Monitor operational performance
       • Monitor operational state
       • Problem resolution + operations support tools
     • Middleware deployment
       • Required local validation of common middleware
       • Feedback issues to LCG/EGEE
       • Continuous upgrade
       • Mechanism(s)
     • Resource induction
       • New site joining procedures
       • Provide support for middleware installation
       • Advise on operational procedures
     • User support
       • Provide a support service for users (filter and distribute)
       • Monitor effectiveness of support
       • Provide training and induction courses
       • Documentation (and quality)
     • Resource support
       • Respond to and coordinate resolution of fabric problems
       • Engage wider community to resolve new problems

  9. Areas (2)
     • Communication
       • Representation within experiments
       • Procedures and mechanisms within community
     • Applications
       • Ensuring local VOs receive support and guidance
       • Participate in testing and validation exercises
     • Network services
       • Network performance monitoring
       • Demand (aggregate traffic) vs supply (performance)
       • Resource allocation/reservation
     • Components
       • Workload management
       • Data management
       • Storage management
       • Information services
     • Inter-grid collaboration
       • Participate in discussions to work closer with other Grids
       • Ensure interoperability of infrastructure and services
     • Service-level agreements
       • Monitor Tier-2 compliance with MoUs
       • Access policies
     • Security
       • Certification authority
       • Implement and monitor policy
       • Incident response
       • Policy management
     • Operations planning
       • Understand usage patterns
       • Capacity planning
       • Monitoring problems log

  10. Navigation
     • No clear plans within LCG for overall deployment – improving
     • Some confusion about EGEE connections
     • GridPP2 project plan is not complete and we have dependencies
     • Currently developing in a “best guess” environment
     • It is not always clear exactly where decisions get made
     • What does the planning environment look like so far?
     • There are already pressing issues to be addressed:
       • What is the UK stance regarding fabric management tools (LCFGng is being phased out)
       • How are we going to measure deployment and operations success – metrics
       • What is the communications plan given that LCG-ROLLOUT has become a gossip column – support, news, problem reporting
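The slide leaves open how deployment and operations success would be measured. As one possible illustration only, and not a metric taken from the talk, the sketch below computes a per-site pass rate from daily test results. The input format (pairs of site name and pass/fail flag) and the function name `pass_rates` are assumptions made for this example.

```python
from collections import defaultdict


def pass_rates(results):
    """Return the fraction of passed test runs per site.

    `results` is an iterable of (site, passed) pairs, one per test run.
    This input shape is a hypothetical choice for the sketch.
    """
    runs = defaultdict(int)    # total test runs seen per site
    passes = defaultdict(int)  # test runs that passed per site
    for site, passed in results:
        runs[site] += 1
        if passed:
            passes[site] += 1
    return {site: passes[site] / runs[site] for site in runs}


if __name__ == "__main__":
    # Made-up sample data; the site names are placeholders.
    sample = [("site-a", True), ("site-a", False), ("site-b", True)]
    for site, rate in sorted(pass_rates(sample).items()):
        print(f"{site}: {rate:.0%} of test runs passed")
```

A ratio like this is easy to track over time, but it says nothing about why a site failed, so it would presumably sit alongside the problem log rather than replace it.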

  11. Are we communicating…?
     Areas
     • Grid news – no well defined broadcast route – e.g. middleware updates
     • Site news – operational incidents on Grid, site updates
     • Support – user, deployment
     • Problems – as found by daily tests or discovered by users
     Issues
     • LCG-ROLLOUT is overloaded!
     • Lack of visibility about what is happening at sites – upgrade, site problem
     • Problems may generate many queries
     • No tracking for support or logging of queries
     • … and therefore poor ability to search for other experiences
     Options
     • Set up a new news area based on RSS (new entries are placed in categories that people can register to receive updates from) – just use of GOC pages?
     • Establish support desk for GridPP – but there are concerns about expertise
     • DTEAM area & better documentation
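The RSS option above (entries filed under categories that people register for) can be sketched roughly as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree`, not a description of any system GridPP actually ran. The category names, the helper names `build_feed` and `items_for`, and the example.org URL are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical category names, loosely based on the four areas on the slide.
CATEGORIES = ("grid-news", "site-news", "support", "problems")


def build_feed(title, link, entries):
    """Build an RSS 2.0 document as a string.

    `entries` is a list of (item_title, category, description) tuples.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Categorised operational news"
    for item_title, category, description in entries:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "category").text = category
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")


def items_for(feed_xml, category):
    """Return the titles of items filed under one category,
    as a subscriber-side filter might."""
    root = ET.fromstring(feed_xml)
    return [
        item.findtext("title")
        for item in root.iter("item")
        if item.findtext("category") == category
    ]


if __name__ == "__main__":
    feed = build_feed(
        "Operations news",           # placeholder feed title
        "http://example.org/news",   # placeholder URL
        [
            ("Middleware update announced", "grid-news", "Placeholder text."),
            ("Site down for upgrade", "site-news", "Placeholder text."),
        ],
    )
    print(items_for(feed, "site-news"))
```

Filing each entry under a single category keeps the subscriber filter trivial; whether one category per entry is enough for real operational news is an open design question.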

  12. An example [LCG-Problems] mail list has 2 members!

  13. Are we going up or down? Metrics Work in progress!

  14. Metrics (2) Work in progress!

  15. Maintenance
     • Migration to SL3 is starting.
     • Next public release of LCG supports SL3 WNs, certification complete.
     • Service nodes remain at RH7.3 for now.
     • LCFGng is not an option for SL3 nodes.
     • LCG supports one install method for SL3.
       • Manual install technique (actually not very manual)
       • Can be built into any framework already in use
       • Kickstart and scripts, Cfengine, NPACI Rocks, Quattor, stateless Linux or even LCFG
     • This release is expected this month.

  16. Quattor
     • Community effort for Quattor installation of LCG2 is nearing completion: 98% done.
     • Quattor has a similar architecture and concept to LCFG, so LCFG effort is not wasted.
     • Advantages
       • CERN and the RAL Tier1/A will use Quattor for LCG, so support and self help will be available for others.
       • LCG M/W will not be tied to or released with Quattor.
     • Disadvantages
       • A lot to learn before any payback.

  17. Steve’s 5 questions
     • Should the UK use or at least favour one fabric management solution?
       Yes – probably Quattor
     • Once SL3 port is available is RH 7.3 still wanted anywhere?
       Maybe on very few shared sites
     • Is an OS other than SL3 needed for GridPP sites and users?
       Need to ask experiments – perhaps if CERN upgrades soon
     • Does any site have a conflict with proposed deployment of LCG into SL3?
       No – most want to move off of RH 7.3
     • Is there a site to work with RAL learning Quattor?
       Manchester?

  18. Summary
     • LCG2 deployed. 1500+ CPUs
     • Smooth running. Easy and seamless deployments. Service quality
     • The DTEAM!
     • The plans (& metrics) are being developed – many dependencies
     • LCG-ROLLOUT needs to migrate to news & helpdesk services
     • LCFG will be phased out. Quattor on SLC3 is coming.
