
Tier 1 status, a summary based upon an internal review

Presentation Transcript


  1. Tier 1 status, a summary based upon an internal review. Volker Gülzow, DESY. LHCC comprehensive review 2006.

  2. Information sources
  Input:
  • Review of Tier 1 readiness, June 8th 2006 @ CERN
  • Reviewers: John Gordon (RAL), Volker Gülzow (DESY, chair), Alessandro de Salvo (INFN Rome), Jeff Templon (NIKHEF), Frank Würthwein (UCSD)
  • A questionnaire to the Tier 1s and questions to the experiments (Tier 1s, middleware, interoperability)
  • Documents from the MB and the CRRB
  • CTDRs + supplement
  • Tier 1 milestone plans
  • LCG wikis
  • Other sources

  3. Review Process I
  Mandate (discussed in the MB): "… review pays specific attention to the following topics:
  • state of readiness of CERN and the Tier-1 centres, including operational procedures and expertise, 24 X 7 support, resource planning to provide the required capacity and performance, site test and validation programme;
  • the essential components and services missing in SC4 and the plans to make these available in time for the initial LHC service;
  • the EGEE-middleware deployment and maintenance process, including the relationship between the development and deployment teams, and the steps being taken to reduce the time taken to deploy a new release;
  • the plans for testing the functionality, reliability and performance of the overall service;
  • interoperability between the LCG sites in EGEE, OSG and NDGF;"
  http://www.cern.ch/lcg/documents/mb/service_review_mandate_jun06.doc

  4. (Slide content not captured in the transcript.)

  5. Tier 1/2 Summary Table (the table itself is not captured in the transcript)
  • 40 Tier 2 centres have their data included in the above table.
  • 9 more centres plan to join as soon as possible.
  Source: Chris Eck, CRRB, April 2006

  6. Overall Comments on the Tier 1s
  • The Tier 1 requirements are currently changing due to the accelerator time schedule; new resource planning from the experiments will appear in October.
  • There is a lot of diversity among the Tier 1s, e.g. in:
  • background
  • technology
  • funding
  • staffing
  • number of experiments, size

  7. Overall Comments on the Tier 1s (June 2006)
  • Not all the Tier 1s have reached the level of readiness required for LHC start-up.
  • Key factors are organisational gaps in implementing off-hours service, funding problems, and communication with the experiments (a two-sided problem).
  • There are severe risks regarding the scalability of the resources.
  • The manpower situation at the Tier 1s was not always transparent during the review.

  8. (Slide content not captured in the transcript.) Source: Les Robertson

  9. Overall Comments on the Tier 1s
  • The overall monitoring of the Tier 0/1/2 complex is of very great importance.
  • The Tier 2 associations are not completely clear; this needs immediate clarification.
  • The support concept for Tier 2/Tier 3 centres by the Tier 1s is not well defined, partly because of unclear requirements from the experiments.
  • At this stage, one should no longer make a distinction between the production and SC4 infrastructures (the experiments complain about this).

  10. Milestone plans: https://twiki.cern.ch/twiki/pub/LCG/MilestonesPlans

  11. „Communication“
  • Clear (and redundant) contact persons (e.g. liaison officers) have to be nominated on both sides.
  • Clear, precise, and well-structured information from the experiments is needed.
  • Web-based monitoring pages for operational issues should be made available by the experiments.

  12. „Communication“
  • The operations meetings (OPS/SCM/RSM) are important; their mandate etc. is to be reviewed by the MB.
  • GGUS is a well-accepted tool and should be used as the main tracking tool. Further improvements are needed (e.g. the GUI, the amount of mail, support for the full set of problem categories, and the question of when a case can be declared closed).

  13. „24x7“
  • Full 24x7 operation, in the sense of live monitoring and alarming with „immediate“ reaction for a certain class of problems, is required. An „on call“ service still has to be set up at many sites. This requires:
  • the right tools, which are often not sufficient. An initiative (e.g. via HEPiX) should be started to sharpen the tool set, which would help Tier 2s and Tier 3s as well (a minimal example of such a probe is sketched below);
  • adequate staff being available. This is a management task and is in the focus of the MB.
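
At their simplest, the tools this slide calls for are automated probes that check critical services around the clock and page the on-call operator when something is unreachable. The sketch below is purely illustrative and not a WLCG tool: the host names and ports are assumptions, and the exit codes follow the common Nagios-style convention.

```python
#!/usr/bin/env python3
"""Minimal availability probe in the spirit of 24x7 live monitoring and
alarming.  Hypothetical host names/ports; Nagios-style exit codes."""

import socket
import sys

# Hypothetical critical services a Tier 1 might watch around the clock.
SERVICES = {
    "srm":     ("srm.example-tier1.org", 8443),
    "gridftp": ("gridftp.example-tier1.org", 2811),
}

OK, CRITICAL = 0, 2  # conventional monitoring exit codes


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main() -> int:
    failed = [name for name, (host, port) in SERVICES.items()
              if not probe(host, port)]
    if failed:
        # In a real on-call setup this would trigger an alarm/page.
        print(f"CRITICAL: unreachable services: {', '.join(failed)}")
        return CRITICAL
    print("OK: all probed services reachable")
    return OK


if __name__ == "__main__":
    sys.exit(main())
```

Such a probe would typically be run every few minutes by a scheduler or a monitoring framework, with its exit code driving the alarm.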

  14. „Management issues“
  • The funding situation is not clear at every centre. A revised ramp-up planning may help; this has to be followed carefully.
  • Clear, up-to-date, and realistic requirements from the experiments would help the Tier 1s to acquire resources on time.
  • At some centres critical work is carried out by temporary staff; depending on the country, this can cause severe problems.

  15. „Middleware“
  • The introduction of gLite 3 was a bit “bumpy”; people were somewhat confused.
  • Many emotions were expressed prior to real experience, which was not helpful.
  • There were lots of complaints but only very little error reporting.
  • The “post mortem” analysis of the process was very much appreciated.

  16. „Middleware“
  • Sites were not able to meet the tight time constraints.
  • Reasons were (and are):
  • lack of manpower,
  • lack of understanding,
  • site localisation,
  • coordination with the needs of non-LHC experiments.

  17. „Middleware“
  • Stable production environments have to be the number one goal today; there is concern about effort being diverted to side projects.
  • The software was not mature enough; we need to find ways to guarantee the readiness of software when it is released.
  • The representation of operational issues in the TCG is not adequate; the Tier 1s should be better represented and their input has to be taken into account.
  • The TCG should include operational issues in the priority list and allow sites to influence the ranking.
  • Full VOMS is needed!
  • The error reporting from the users has to improve.
  • The middleware urgently needs proper operational interfaces (a sketch follows below):
  • logging,
  • diagnostics,
  • service operation interfaces.
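
To make the call for operational interfaces concrete, the sketch below shows one possible shape such a uniform interface could take: every middleware service exposing the same status, log, and diagnostics hooks to site operators. This is purely illustrative; gLite defined no such API, and all class and method names here are assumptions.

```python
"""Illustrative sketch of a uniform service operation interface
(status, logging, diagnostics).  All names are assumptions."""

from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceStatus:
    """Uniform, machine-readable health summary for any service."""
    name: str
    healthy: bool
    detail: str


class OperableService(ABC):
    """Contract every middleware service would expose to site operators."""

    @abstractmethod
    def status(self) -> ServiceStatus:
        """Cheap, scriptable health check (feeds monitoring and alarming)."""

    @abstractmethod
    def tail_log(self, lines: int = 50) -> List[str]:
        """Return the most recent log lines in one agreed format."""

    @abstractmethod
    def diagnose(self) -> Dict[str, str]:
        """Deeper self-tests an operator can run before opening a ticket."""


class ExampleTransferService(OperableService):
    """Toy stand-in for a real data-transfer service."""

    def status(self) -> ServiceStatus:
        return ServiceStatus("transfer", True, "queue empty, 0 failed jobs")

    def tail_log(self, lines: int = 50) -> List[str]:
        return ["2006-06-08 12:00:00 INFO channel CERN-DESY active"][-lines:]

    def diagnose(self) -> Dict[str, str]:
        return {"database": "reachable", "proxy certificate": "valid"}
```

The point of such a contract is that site scripts and monitoring can treat every service the same way, instead of each component having its own ad-hoc way of reporting its state.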

  18. „Interoperability“
  • The experiments should make the importance of the problem clear.
  • Interoperability of the grids needs more attention and manpower than it has today, if it is indeed required.
  • Can we expect uniform testing (SFTs), monitoring, accounting, and metrics for ALL WLCG sites? (A sketch of what uniform testing could mean follows below.)
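
As an illustration of what uniform testing across infrastructures could mean in practice, the sketch below runs one identical battery of functional tests against sites from different grids and emits a single uniform report. The site names, host-name patterns, test names, and report format are assumptions for illustration, not actual SFT behaviour.

```python
"""Hedged sketch: the same small battery of functional tests applied to
every WLCG site, whether in EGEE, OSG, or NDGF.  All names are invented."""

import json
import socket
import time
from typing import Callable, Dict, List


def can_resolve(host: str) -> bool:
    """True if the host name resolves in DNS (a stand-in for a real probe)."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:
        return False


# One identical test battery, applied to every infrastructure.
TESTS: Dict[str, Callable[[str], bool]] = {
    "ce-resolves": lambda site: can_resolve(f"ce.{site}"),  # compute element
    "se-resolves": lambda site: can_resolve(f"se.{site}"),  # storage element
}

# Purely illustrative site names, one per infrastructure.
SITES: List[str] = [
    "example-egee-site.org",
    "example-osg-site.org",
    "example-ndgf-site.org",
]


def run_sft() -> dict:
    """Run every test against every site; emit one uniform report that a
    common monitoring/accounting service could ingest."""
    return {
        "timestamp": time.time(),
        "results": {
            site: {name: test(site) for name, test in TESTS.items()}
            for site in SITES
        },
    }


if __name__ == "__main__":
    print(json.dumps(run_sft(), indent=2))
```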

  19. Conclusion
  • Excellent work was done at the Tier 1s on many tasks.
  • The cultural gap has to be bridged.
  • The 24x7 case is still largely open.
  • Monitoring of sites is strongly recommended.
  • The funding and staffing situation needs careful attention.
  • Middleware robustness and operational hooks are needed.
  • More binding commitments are required in certain areas (at all Tier levels).
  • The new ramp-up plan does not allow anyone to lean back.
