GridPP Report

GridPP Report, by Tony Doyle. Contents: Technical Design Reports; Timescales; Oversight Committee Summary; Current concerns; Actions (and how these were addressed); Feedback from the July 1 (OC7) meeting; “Get Fit” Plan and Problem Solving; Beyond GridPP2.

Presentation Transcript


  1. GridPP Report. Tony Doyle. Collaboration Board Meeting

  2. Contents
     • Technical Design Reports
     • Timescales
     • Oversight Committee Summary
     • Current concerns
     • Actions (and how these were addressed)
     • Feedback from the July 1 (OC7) meeting
     • “Get Fit” Plan and Problem Solving
     • Beyond GridPP2

  3. June Reports
     Computing Technical Design Reports: http://doc.cern.ch/archive/electronic/cern/preprints/lhcc/public/
     • ALICE: lhcc-2005-018.pdf
     • ATLAS: lhcc-2005-022.pdf
     • CMS: lhcc-2005-023.pdf
     • LHCb: lhcc-2005-019.pdf
     • LCG: lhcc-2005-024.pdf
     LCG Baseline Services Group Report: http://cern.ch/LCG/peb/bs/BSReport-v1.0.pdf
     Contains all you (probably) need to know about LHC computing.

  4. Timescales • Service Challenges – UK deployment plans

  5. Functionality. Fits on a page. Concentrate on robustness and scale. Experiments have assigned priorities.

  6. July Documents
     • PPARC Oversight Committee Papers
     • Seventh GridPP Oversight Committee (July 2005)
     • Executive Summary
     • Project Map
     • Link to Project Map Database (Excel) Version (v2)
     • Resource Report
     • LCG Report
     • EGEE Report
     • Deployment Report
     • Middleware/Security/Network Report
     • Applications Report
     • User Board Report
     • Tier-1/A Report
     • Tier-2 Report
     • Dissemination Report
     • UK Analysis
     • Metrics and Deployment
     • Middleware Planning
     • Experiment engagement questionnaire
     • See http://www.gridpp.ac.uk/docs/oversight/
     Addressed various concerns of the OC.

  7. Exec2 Summary
     • GridPP2 has already met 21% of its original targets, with 86% of the metrics within specification
     • “Get fit” plan described (requested by OC)
     • gLite 1 was released in April as planned, but components have not yet been deployed or their robustness tested by the experiments
     • Service Challenge (SC) 2, addressing networking, was a success at CERN and the Tier-1
     • SC3, addressing file transfers for the experiments, is about to commence
     • Long-term concern: hardware at the Tier-1 in 2007-08
     • Short-term concerns: under-utilisation of resources and the deployment of Tier-2 resources

  8. RAL joins labs worldwide in successful Service Challenge 2 • The GridPP team at Rutherford Appleton Laboratory (RAL) in Oxfordshire recently joined computing centres around the world in a networking challenge that saw RAL transfer 60 terabytes of data over a ten-day period. A home user with a 512 kilobit per second broadband connection would be waiting 30 years to complete a download of the same size.
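
The comparison on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 terabyte = 10^12 bytes, 1 kilobit = 1000 bits) and a 365.25-day year, none of which the slide states; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: average year length


def transfer_seconds(size_bytes, rate_bits_per_s):
    """Time needed to move size_bytes over a link of the given bit rate."""
    return size_bytes * 8 / rate_bits_per_s


def average_rate_bits_per_s(size_bytes, duration_s):
    """Average bit rate implied by moving size_bytes in duration_s seconds."""
    return size_bytes * 8 / duration_s


size = 60 * 10**12  # 60 terabytes, assuming 1 TB = 10**12 bytes

# Home user on a 512 kilobit per second connection (assuming 1 kbit = 1000 bits).
home_years = transfer_seconds(size, 512_000) / SECONDS_PER_YEAR

# Average rate implied by the ten-day RAL transfer.
ral_rate = average_rate_bits_per_s(size, 10 * SECONDS_PER_DAY)

print(f"Home download time: {home_years:.1f} years")
print(f"Average RAL rate:   {ral_rate / 1e6:.0f} Mbit/s")
```

Under these assumptions the home download takes just under 30 years, consistent with the slide, and the ten-day transfer corresponds to an average rate of roughly 556 Mbit/s.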

  9. gLite 1

  10. 100 green sites sitting on a grid
     • Thu 16 Jun 2005
     • Last week the UK CIC-on-duty team celebrated the milestone of having 100 sites passing the Sites Functional Test. Thanks to all the sites who responded promptly to trouble tickets raised by the UK team during their shift.

  11. Actions
     GridPP to submit the proposal for LCG phase 2 funding to the Committee prior to its submission to Science Committee (minute 4.9).
     • Done. 27-page report, including input from the OC, at http://www.gridpp.ac.uk/docs/gridpp2/SC_GridPP2_LCG_1.0.doc (unfunded)
     GridPP to clarify the situation with regard to ATLAS production run tests for the next physics workshop (minute 5.3).
     • See News Item http://www.gridpp.ac.uk/news/-1119651840.463358.wlg (and slide)
     GridPP to provide an update on progress resolving problems caused by mismatches between local batch systems and the capabilities of the Grid Resource Broker (minute 6.3).
     • (See slide)
     GridPP to more fully document its alignment with each of the individual experiments (minute 15.2).
     • An experiment engagement questionnaire has been used (initial input in February and further [updated] input in June). See http://www.gridpp.ac.uk/eb/workdoc/gridusebyexpts_0605.doc

  12. Actions
     GridPP to define its usage policy with respect to Tier-1 allocations (minute 15.4).
     • See http://www.gridpp.ac.uk/docs/oversight/GridPP-PMB-57-Tier1A_1.0.doc and documents within (“fair shares” using PPARC Form X information)
     GridPP to produce an updated risk register (minute 15.5).
     • Incorporated in the new Project Map (with 7 “high” risks) at http://www.gridpp.ac.uk/pmb/ProjectManagement/GridPP2_ProjectMap_2.htm
     GridPP to produce a “get-fit” plan for production metrics (minute 15.6).
     • See Metrics and Deployment document http://www.gridpp.ac.uk/docs/oversight/GridPP-PMB-64-Metrics.doc and its incorporation into the Project Map
     GridPP to define its metrics for job success (minute 15.7).
     • Adopted EGEE-wide definition at http://ccjra2.in2p3.fr/EGEE-JRA2/QAmeasurement/showstatsVO.php (See slides)
     GridPP to produce a statement of intent regarding its adoption of gLite (minute 15.8).
     • See Middleware Selection document http://www.gridpp.ac.uk/docs/oversight/GridPP-PMB-65-Middleware.doc

  13. Metrics
     Action: GridPP to define its metrics for job success (minute 15.7).
     • GridPP adopts the EGEE-wide definition at http://ccjra2.in2p3.fr/EGEE-JRA2/QAmeasurement/showstatsVO.php
     The (web-based) QA system accounts for job successes registered by the Workload Management System (WMS), which can then be categorised by Virtual Organisation (VO) or Resource Broker (RB).
     Before introducing the figures, note the caveats:
     • It only measures what the WMS “sees”:
       • it doesn't catch a failure of the WMS to register a job in the first place (but this is a rare occurrence);
       • if a job fails half way through its script (for example, it tries but fails to copy a file) but the script completes successfully, then the WMS sees everything as OK.
     • If a VO (e.g. LHCb) deploys an agent, then the WMS only registers the success of the initial (Python) script. This strategy enables higher overall LHCb performance (combined push-PULL model). (It currently leads to other problems in overall accounting should contention become an issue.)
     • Overall, an end user may see either:
       1. a worse efficiency, because jobs fail for other, hidden reasons (e.g. data management problems); or
       2. a better efficiency, by
          • choosing selected sites according to the Site Functional Test performance index;
          • deploying an agent to initiate real jobs at sites where the agent succeeded.
     • Physicists are “smart” and now “see” > 90% efficiency, but that definition is one defined within a given VO adopting its own methods (and drawing on informed input from people currently submitting jobs to the system).
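
The grouping described on this slide, and the first caveat, can be illustrated with a small sketch. It is not the EGEE QA system's code: the record layout (`vo`, `rb`, `wms_ok`, `user_ok`), the broker names and the sample data are all invented for illustration.

```python
from collections import defaultdict


def success_rate_by(records, key):
    """Fraction of WMS-registered jobs marked successful, grouped by `key`."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for rec in records:
        totals[rec[key]] += 1
        if rec["wms_ok"]:
            successes[rec[key]] += 1
    return {group: successes[group] / totals[group] for group in totals}


# Hypothetical records. "wms_ok" is what the WMS registers; "user_ok" is
# whether the job actually did what the user wanted (invisible to the WMS).
records = [
    {"vo": "atlas", "rb": "rb-a", "wms_ok": True, "user_ok": True},
    {"vo": "atlas", "rb": "rb-a", "wms_ok": False, "user_ok": False},
    # Script completed, but a file copy inside it failed: the WMS sees success.
    {"vo": "lhcb", "rb": "rb-b", "wms_ok": True, "user_ok": False},
    {"vo": "lhcb", "rb": "rb-b", "wms_ok": True, "user_ok": True},
]

by_vo = success_rate_by(records, "vo")
by_rb = success_rate_by(records, "rb")
wms_view = sum(r["wms_ok"] for r in records) / len(records)
user_view = sum(r["user_ok"] for r in records) / len(records)

print("By VO:", by_vo)
print("By RB:", by_rb)
print(f"WMS view: {wms_view:.0%}, end-user view: {user_view:.0%}")
```

In this invented sample the WMS-level figure is higher than what the end user experiences, which is the "worse efficiency" case listed above; the opposite case (agents, site selection) would need a different data model.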

  14. Overview
     Integrated over all VOs and RBs for the first half of 2005:
     • Successes/day: 13806
     • Success rate: 64%
     • Key point: improving from 42% to 78% during 2005
     [For the UK RB (lcgrb01.gridpp.rl.ac.uk): successes/day 319, success rate 69%]
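
As a rough worked example, the two figures on this slide imply a daily job count. The sketch assumes the success percentage is successes divided by all WMS-registered jobs and that the quoted percentages are rounded, so the results are approximate; the function name is illustrative.

```python
def implied_jobs_per_day(successes_per_day, success_fraction):
    """Total registered jobs per day implied by a success count and a success rate."""
    return successes_per_day / success_fraction


# Figures quoted on the slide (percentages taken as exact for this estimate).
all_jobs = implied_jobs_per_day(13806, 0.64)   # all VOs and RBs
uk_rb_jobs = implied_jobs_per_day(319, 0.69)   # UK RB only

print(f"All VOs/RBs: about {all_jobs:.0f} jobs/day")
print(f"UK RB:       about {uk_rb_jobs:.0f} jobs/day")
```

Under these assumptions the totals come out at roughly 21,600 jobs per day overall and about 460 per day through the UK RB.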

  15. OC Preliminary Feedback
     ALL earlier actions were considered as “done” from the OC perspective.
     GridPP to investigate alternative procurement strategies in order to improve Tier-1/A utilisation.
     Actions:
     • Tier-1/A Board: evaluate alternative approaches
     • User Board – GridPP13 MEETING: improve experiment estimates
     GridPP to allocate more resources to technical documentation (for end users and system administrators).
     Actions:
     • Internal advertising: is anyone within GridPP willing/able to take up the role of “Documentation Officer”? (There will be an incentive for this)
     • If this fails, advertise the post using the role description (being drafted)
     Deployment Board – GridPP13 MEETING

  16. OC Preliminary Feedback
     • GridPP to develop a deployment model that works for smaller T2 centres in association with CERN
     • GridPP to provide a gap analysis for LCG (using the baseline services and the [classified] experiment components as described in the TDRs)
     • GridPP to address UB questionnaire outcomes (perceptions as well as actual shortcomings)
     • GridPP to document the high-level "value" GridPP is adding/delivering (using Project Map)
     • OC8 in February 2006 “important” (not “G8 on Wednesday”)

  17. The “Get Fit” Plan • Set SMART (Specific Measurable Achievable Realistic Time-phased) Goals

  18. “I take it plea bargaining is out of the question?” • See Dave’s talk

  19. The “Get Fit” Plan
     • … not (yet) “The Final Solution”
     • We hope this drives the right behaviour
     • Plea bargaining is (probably) OK.

  20. Summary
     • LHC Technical Design Reports define an endpoint
     • Responsive-mode deployment/development
     • Timescales for LHC are soon – first cosmics data taken
     • Oversight Committee – improve “efficiency”
     • Some particular issues discussed at GridPP13:
       • Tier-1/A utilisation
       • Documentation Officer
     • “Get Fit” plan endorsed by OC
       • requires support from everyone to improve metrics
     • There are 14 deployment problems (some interdependency) that need to be solved
     • Many areas are now quantifiable (significant progress here)
     • Service Challenges will help focus attention
     • Improved communication and documentation (become a scientologist?!)
     • Aim: measured end-to-end performance improvements during 2005
     • Beyond GridPP2: input required over the summer to PPARC LHC exploitation planning review
