
GridPP4 Project Management


  1. GridPP4 Project Management Pete Gronbech April 2012 GridPP28 Manchester

  2. Since the last meeting
  • LHC is still building up to full running again after the Christmas technical stop.
  • Tier-1 running well, and also busy with infrastructure upgrades.
  • Tier-2s busy installing new hardware and new networking equipment.
  • GridPP4 1st tranche hardware money spent.
  • Digital Research Infrastructure Grant equipment money spent.

  3. Accelerator Update
  • This year the collision energy is 8 TeV (beam energy 4 TeV), slightly higher than last year's 3.5 TeV per beam.
  • First beams started four weeks ago, mostly for testing of 'safety systems'.
  • First physics at 2 x 4 TeV was one week ago, starting with 2 x 3 bunches. At this very moment collisions are taking place with 2 x 1092 bunches, already giving more collisions than last year's running with 2 x 1380 bunches.
  • In a few days the aim is to be at the nominal number of bunches for this year, 2 x 1380, but with higher luminosities, i.e. more collisions than last year, because of the higher energy and the smaller beta* at the interaction point (a rough scaling sketch follows this slide).
  • On Friday there will be 3 days of machine development, followed by the first Technical Stop of the year. Back to production for data taking at the beginning of May for an 8-week period.
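To make the beta* remark concrete, here is a minimal worked scaling, assuming the textbook expression for the luminosity of head-on bunch collisions; the symbols are generic accelerator quantities rather than figures quoted in the talk, and the crossing-angle reduction factor is omitted.

% Instantaneous luminosity for n_b colliding bunch pairs (illustrative sketch;
% generic textbook scaling, not official 2012 LHC parameters):
\[
  L \;=\; \frac{N^{2}\, n_{b}\, f_{\mathrm{rev}}\, \gamma}
               {4\pi\, \epsilon_{n}\, \beta^{*}}
\]
% N         : protons per bunch
% n_b       : number of colliding bunch pairs (2 x 1380 at the 2012 nominal filling)
% f_rev     : revolution frequency of the machine
% gamma     : relativistic Lorentz factor (grows with the beam energy, 3.5 -> 4 TeV)
% epsilon_n : normalised transverse emittance
% beta*     : betatron function at the interaction point (squeezed in 2012)
% Since L is proportional to gamma / beta*, the higher beam energy and the smaller
% beta* raise the luminosity even at the same bunch count as 2011.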

  4. Tier-1
  • CPU hardware delivered and commissioned in time to meet the WLCG pledge.
  • Both tranches of disk have been delivered and deployed.
  • Upgrade to CASTOR 2.1.11-8 completed.
  • Operations very stable following many upgrades in February.

  5. Tier-2s
  • All grants for the 1st tranche of hardware have been issued and should have been spent; sites should have hardware to meet the 2012 pledge.
  • All sites have been trying to spend the money this financial year.
  • Most sites made significant upgrades and, coupled with the DRI grants, have been able to enhance the infrastructure and networking both within the clusters and across campus to the JANET connections.
  • Future MoUs showed shortfalls in storage capacity more than CPU, which meant an emphasis on disk purchases.
  • Prices were inflated and deliveries extended due to the flooding in Thailand, which caused a worldwide disk shortage.
  • However, prices for networking equipment came down substantially in January, which compensated in part at some sites.

  6. DRI and GridPP4 Grants
  • Instructions for JeS issued 9/11/11.
  • GridPP4 grants were issued very quickly, some in December 2011.
  • DRI bids solicited 8/11/11.
  • The DRI project team reviewed responses very quickly between 18th November and 8th December, and revised them to meet the £3M target once this was known.
  • JeS instructions were sent out on 9th December.
  • Grants issued early January 2012.
  • All equipment on sites by end of March 2012.

  7. UKI CPU contribution (LHC)
  • Charts: LHC CPU contribution since April 2011; country CPU statistics for March 2012 (GStat2.0).

  8. UKI VOs
  • Non-LHC VOs are getting squeezed.
  • Charts: VO usage since March 2011 and for the previous year.

  9. VO support across sites

  10. UKI Tier-1 & Tier-2 contributions Since March 2011 Previous year

  11. Storage
  • Charts: storage as published in GStat2.0 (August 2010, March 2011, April 2012) versus the Quarterly Reported Resources.
  • The truth lies somewhere in between; the Q112 report will help clarify the situation (an illustrative cross-check sketch follows this slide).
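As an aside, the kind of reconciliation the Q112 reports feed into can be sketched in a few lines. The snippet below is purely illustrative: the site names and terabyte figures are invented placeholders, not GridPP numbers, and the 20% tolerance is an arbitrary choice.

# Illustrative cross-check of two storage inventories (hypothetical data only).
# gstat_tb     : capacity as published in the information system (e.g. GStat2.0)
# quarterly_tb : capacity taken from the sites' quarterly reports

gstat_tb = {"SiteA": 1200, "SiteB": 850, "SiteC": 400}       # made-up figures
quarterly_tb = {"SiteA": 1100, "SiteB": 500, "SiteC": 390}   # made-up figures

TOLERANCE = 0.20  # flag sites where the two sources differ by more than 20%

for site in sorted(set(gstat_tb) | set(quarterly_tb)):
    g, q = gstat_tb.get(site), quarterly_tb.get(site)
    if g is None or q is None:
        print(f"{site}: missing from one of the two sources")
        continue
    rel_diff = abs(g - q) / max(g, q)
    status = "CHECK" if rel_diff > TOLERANCE else "ok"
    print(f"{site}: GStat={g} TB, quarterly={q} TB, diff={rel_diff:.0%} [{status}]")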

  12. GridPP4 Project Map Q411

  13. Q411
  • Tier-1: staffing, and service availability for ATLAS due to CASTOR and network issues.
  • ATLAS data availability (92%).
  • CMS red metrics are all due to Bristol.
  • Data group: number of blog posts low, and NFSv4 study late.
  • Security: delay in running the SSC.
  • Execution: number of vacant posts, and review of service to experiments.
  • Outreach: number of news items, press releases and KE meetings low.
  • Q112 reports are due in at the end of this month, or preferably earlier!

  14. Non-LHC storage stats so far

  15. Project map – statistics (charts: metrics and milestones)

  16. Manpower
  • GridPP was running at reduced manpower for the latter part of 2011, with ~2 FTE short at the Tier-2s and ~4 FTE short at RAL.
  • Both the Tier-1 and Tier-2s have now filled the posts, so there should be capacity to do the development work that has been on hold due to the shortages.

  17. Risk register
  • Highlighted risks:
    – Recruitment and retention: still a concern, but currently more stable.
    – Resilience of storage: problems with batches of storage hardware.
    – CASTOR is critical and, although more stable now, has serious consequences when it fails.
    – Insufficient funding for T2 h/w: increased equipment costs (especially disk) and increased experiment resource requests. Mitigated to a certain extent by the DRI investment.
    – Contention for resources: anticipated to become more of an issue as LHC use increases and squeezes the minor VOs.

  18. Timeline (chart spanning 2006–2014, showing GridPP2, GridPP2+, GridPP3 and GridPP4)
  • End of GridPP2: 31 August 2007.
  • Start of GridPP3: 1 April 2008.
  • Start of GridPP4: 1 April 2011.
  • GridPP celebrated its 10th birthday in December 2011.

  19. From the start of GridPP3 to the present time
  • At the start of GridPP4, ~27,000 CPUs and ~7 PB of disk were reported.
  • Now ~31,000 CPUs and ~27 PB (if GStat is to be believed).
  • The UK reported approx. 370 GSI2K hours last year, just ahead of Germany and France, and is still the largest contributor in the EGI grid.

  20. Reporting
  • The main LHC experiments will continue to report on Tier-1 and Tier-2 performance as both analysis and production sites.
  • The Tier-2 sites' reporting continues as before, with reports going via the Production Manager.
  • Slight modifications to enable better tracking of non-LHC VO storage use.
  • Storage, Security, NGI and Dissemination have separate reports.

  21. Summary
  • The first accounting period was completed and the 1st tranche of h/w funding was allocated.
  • Last autumn and this spring were particularly busy with GridPP h/w and DRI grants: tendering, quotes, purchasing and now installations and upgrades.
  • We should plan to be stable in time for the next data taking in May, although in some cases the load seen on Tier-2s is more aligned with physics conferences than with data taking.
  • A reminder that we are in a continuous accounting period which started at the end of the last one, i.e. from 1st November through to a date to be determined, dependent on STFC capital spend profiling.
