
User Board Overview

User Board Overview. Dan Tovey, University of Sheffield. Tier-1 Planning: the quarterly UB meeting in April (see minutes) updated the Tier-1 planning figures. A shortfall of T1 resources in future years (especially 2008) is evident.


Presentation Transcript


  1. User Board Overview: Dan Tovey, University of Sheffield

  2. Tier-1 Planning
     • Quarterly UB meeting in April (see minutes) → updated Tier-1 planning figures.
     • Shortfall of T1 resources in future years (especially 2008) is evident.
     • Will need to consider whether experiment requirements can be met by Tier-2 resources → experiments need to demonstrate a clear need for Tier-1 functionality.
     • Requests which can be met by Tier-2 are to be discussed with the Tier-2 board.
     • ‘Other Experiments’ line removed from the Tier-1 schedule following the detailed Tier-1 board plan → all users must make representation to the UB to get access to resources.

  3. Tier-1 Planning
     • Tier-1 utilisation figures frequently fall significantly short of both requests and allocations; this sends the wrong message.
     • Often not the fault of the experiments (e.g. middleware / operational problems), but experiments must work to produce more realistic estimates.
     • Move to strict allocation of disk resources (no over-allocation) → helps the Tier-1 team.
     • Also synchronise with the spending cycle → aim to ensure complete use of all new resources as soon as they are on-line.
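The strict-allocation rule on slide 3 can be illustrated with a small check. This is a minimal Python sketch, not the Tier-1 team's actual tooling: the function names `check_allocations` and `utilisation`, the experiment labels, and every figure below are hypothetical and chosen only for illustration. It refuses any set of allocations whose total exceeds the installed capacity, and reports how much of each allocation is actually used.

```python
def check_allocations(capacity_tb, allocations_tb):
    """Return the unallocated disk (TB); raise if allocations exceed capacity."""
    total = sum(allocations_tb.values())
    if total > capacity_tb:
        raise ValueError(f"over-allocated by {total - capacity_tb} TB")
    return capacity_tb - total


def utilisation(allocations_tb, used_tb):
    """Fraction of each allocation actually used (0.0 if nothing is allocated)."""
    return {
        name: (used_tb.get(name, 0) / alloc if alloc else 0.0)
        for name, alloc in allocations_tb.items()
    }


# Hypothetical figures, for illustration only (not taken from the talk).
capacity = 100
allocations = {"expt_a": 60, "expt_b": 30}
used = {"expt_a": 15, "expt_b": 30}

print(check_allocations(capacity, allocations))  # 10
print(utilisation(allocations, used))            # {'expt_a': 0.25, 'expt_b': 1.0}
```

A low utilisation fraction, as for the hypothetical `expt_a`, is the kind of gap between allocation and use that the slide describes.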

  4. DB Links
     • Stronger links with the Deployment Board are seen as vital → standing invitation for DB representation at UB meetings.

  5. UB Concerns
     • How are experiments that globally are not moving to the Grid to be handled?
     • Site stability and user support.
     • Balance of effort at Tier-1: much used for CMS (SRM) and later LCG SC, but what about smaller user communities?
     • What about ‘non-standard’ OS at Tier-2 sites → can render them useless to some experiments. UB and Tier-2 board need to persuade sites to work towards standardisation.

  6. Questionnaire
     • User Board questionnaire updated for latest OsC process.
     • No big changes from February.
     • Some new comments/concerns:
       • Fragmented support structure
       • All stick and no carrot
       • Held up by problems with establishing the VO
       • Not all experiments supported by large Tier-2s
     • Further details at: http://www.gridpp.ac.uk/eb/workdoc/gridusebyexpts_0605.doc

  7. Pleasure: LHCb
     • Shared data (LHCb RTTC production, May/June)
     • The data reported are preliminary (accuracy at 5%)
     • 5% produced with plain DIRAC sites
     • 95% produced with LCG sites

  8. Pleasure: ATLAS
     • Using the Grid for 100% of Simulation, Digitisation and Reconstruction.
     • 8.5M fully simulated ATLAS events produced.
     • 20% of LCG jobs in UK.
     • Overall throughput good, and improving …

  9. Pain: ATLAS
     • But … experience has been painful!
     • Significant throughput problems experienced in January/February:
       • Production goals descoped (15M events planned vs. 8.5M events actual).
     • Identified problems (highlights; see also questionnaire):
       • System appears to function best when only one person is submitting jobs!
       • Lack of a distributed mechanism for prioritising jobs.
       • Lack of inter-operability between LCG and other Grids: load balancing and data replication have to be done 'by hand'. This leads to production errors (e.g. same sample produced multiple times on different grids).
       • Too much human intervention required to set, adjust and enforce priorities.
       • Could not saturate CPU resources on LCG easily (rate doubled with a simple change of scripts/person!): production time does not scale with CPU requirements.
       • Job definition/submission very (expert) labour intensive.
       • Absolute need for an SE/SRM solution for small files.
       • Urgent need for VOMS, integrated with other grid tools for resource allocation/access/monitoring/accounting.
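Two points on slide 9 lend themselves to a short worked example: the size of the descoping (15M events planned vs. 8.5M produced) and the production error in which the same sample is produced on more than one grid. The Python sketch below is illustrative only and is not ATLAS production tooling. The function names, the record format (sample identifier, grid name), and the sample records are assumptions made for this example; only the two event counts come from the slide.

```python
from collections import defaultdict


def fraction_achieved(planned, actual):
    """Fraction of the planned production that was actually delivered."""
    return actual / planned


def duplicated_samples(records):
    """Map each sample produced on more than one grid to its sorted grid names."""
    grids = defaultdict(set)
    for sample, grid in records:
        grids[sample].add(grid)
    return {s: sorted(g) for s, g in grids.items() if len(g) > 1}


# Event counts quoted on the slide: 15M planned, 8.5M actual.
print(f"{fraction_achieved(15e6, 8.5e6):.1%}")  # 56.7%

# Hypothetical production records, for illustration only.
records = [
    ("sample_001", "LCG"),
    ("sample_002", "LCG"),
    ("sample_001", "other_grid"),
]
print(duplicated_samples(records))  # {'sample_001': ['LCG', 'other_grid']}
```

A cross-grid check of this kind is what currently has to be done 'by hand' according to the slide; an automated version would depend on a shared bookkeeping record, which the transcript does not describe.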

  10. H1 Tests

  11. H1 Tests
     • 30 jobs failed: 22 due to Grid problems (gridproxy/misc.)

  12. H1 Tests
