Status of the WLCG Tier-2 Centres

M.C. Vetterli, Simon Fraser University and TRIUMF. WLCG Overview Board, CERN, October 27th 2008.

Presentation Transcript
slide1

Status of the WLCG Tier-2 Centres

M.C. Vetterli, Simon Fraser University and TRIUMF

WLCG Overview Board, CERN, October 27th 2008

slide2

Sources of Information

  • Discussions with experiment representatives in July
  • APEL monitoring portal: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php
  • WLCG reliability reports: http://lcg.web.cern.ch/LCG/accounts.htm
  • October GDB mtg; dedicated to Tier-2 issues: http://indico.cern.ch/conferenceDisplay.py?confId=20234
  • Talks from the last OB & LHCC

Slides labeled with a * are from MV’s LHCC rapporteur talk
slide3

Tier-2 Performance Summary*

  • Overall, the Tier-2s are contributing much more now
  • Significant fractions of the Monte Carlo simulations are being done in the T2s for all experiments
  • Reliability is better, but still needs to improve
  • The CCRC’08 exercise is generally considered a success for the Tier-2s
slide4

Tier-2 Centres in CCRC’08 – General*

  • Overall, the Tier-2s and the experiments considered the CCRC’08 exercise to be a success
  • The networking/data transfers were tested extensively; some FTS tuning was needed, but it worked out
  • Experiments tended to continue other activities in parallel which is a good test of the system, although the load was not as high as anticipated
  • While CMS did include significant user analysis activities, the chaotic use of the Grid by a large number of inexperienced people is still to be tested
slide5

Tier-2 Issues/Concerns

As of CB and meetings with experiments this summer

  • Communications: Do Tier-2s have a voice? Is there a good mechanism for disseminating information?
  • Better monitoring: Pledges vs actual vs used
  • Hardware acquisitions: What should be bought? kSI2006?
  • Tier-2 capacity: Size of datasets? Effect of LHC delay?
slide6

Tier-2 Issues/Concerns

  • Upcoming onslaught of users: Some user analysis tests have been done but scaling is a concern
  • User Support: Ticketing system exists but it is not really used for user support issues. This affects Tier-2s especially.
  • Federated Tier-2s: Tools to federate? Monitoring? (averaging)
  • Interoperability of EGEE, OSG, and NDGF should be improved
  • Software/Middleware updates: Could be smoother; too frequent
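
The "averaging" question for federated Tier-2s can be made concrete with a small sketch. It assumes, purely for illustration, that a federation's reliability is reported as a capacity-weighted mean of its member sites; the slides do not say which averaging rule is used, and the site values below are hypothetical.

```python
def federation_reliability(sites):
    """Capacity-weighted mean of per-site reliability.

    sites: list of (reliability, capacity_weight) tuples, with reliability
    in [0, 1]. The weighting rule is an assumption, not a WLCG definition.
    """
    total_weight = sum(weight for _, weight in sites)
    if total_weight <= 0:
        raise ValueError("total capacity weight must be positive")
    return sum(rel * weight for rel, weight in sites) / total_weight


# Hypothetical federation: one large reliable site, one small unreliable one.
sites = [(0.95, 600.0), (0.60, 200.0)]

simple_mean = sum(rel for rel, _ in sites) / len(sites)
weighted_mean = federation_reliability(sites)

print(f"simple mean:   {simple_mean:.4f}")
print(f"weighted mean: {weighted_mean:.4f}")
```

The two rules give different federation-level numbers for the same sites, which is one reason the choice of averaging matters for monitoring.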
slide7

Communications for Tier-2s

  • Identified by the T2s at the last CB as a serious problem. Interesting to me that many in experiment computing management did not share this concern.
  • Should communication be organized according to experiment or to Tier-1 association? There are also differing opinions on this. There are two issues:
      - Grid middleware/operations
      - Experiment software
  • My view after studying this is that the situation is OK for “tightly coupled” Tier-2s, but not for remote and smaller Tier-2s that are not well coupled to a Tier-1.
slide8

Communications for Tier-2s

  • Many lines of communication do indeed exist.
  • Some examples are:CMS hastwo Tier-2 coordinators: Ken Bloom (Nebraska) Giuseppe Bagliesi (INFN)- attend all operations meetings - feed T2 issues back to the operations group - write T2-relevant minutes - organize T2 workshops  ALICE has designated 1 Core Offline person in 3 to have privileged contact with a given T2 site manager- weekly coordination meetings - Tier-2 federations provide a single contact person - A Tier-2 coordinates with its regional Tier-1
slide9

Communications for Tier-2s

  • ATLAS uses its cloud structure for communications
      - Every Tier-2 is coupled to a Tier-1
      - 5 national clouds; others have foreign members (e.g. “Germany” includes Krakow, Prague, Switzerland; Netherlands includes Russia, Israel, Turkey)
      - Each cloud has a Tier-2 coordinator
  • Regional organizations, such as:
      + France Tier-2/3 technical group:
          - coordinates with Tier-1 and with experiments
          - monthly meetings
          - coordinates procurement and site management
      + GRIF: Tier-2 federation of 5 labs around Paris
      + Canada: Weekly teleconferences of technical personnel (T1 & T2) to share information and prepare for upgrades, large production, etc.
      + Many others exist; e.g. in the US and the UK

slide10

Communications for Tier-2s

  • Tier-2 Overview Board reps: Michel Jouvin and Atul Gurtu have just been appointed to the OB to give the Tier-2s a voice there.
  • Tier-2 mailing list: Actually exists and is being reviewed for completeness & accuracy
  • Tier-2 GDB: The October GDB was dedicated to Tier-2 issues
      + reports from experiments: role of the T2s; communications
      + talks on regional organizations
      + discussion of accounting
      + technical talks on storage, batch systems, middleware
    Seems to have been a success; repeat a couple of times per year?
slide13

Tier-2 Installed Resources

  • But how much of this is a problem of under-use rather than under-contribution? A task force has been set up to extract installed capacities from the Glue schema
  • Monthly APEL reports still undergo significant modifications from first draft.
      - Good because communication with T2s is better
      - Bad because APEL accounting still has problems
    Accounting seems to be very finicky; it breaks when the CE or MON box is upgraded
  • How are jobs distributed to the Tier-2s?
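
The distinction between under-contribution and under-use can be sketched as two separate ratios. This is only an illustration under stated assumptions: the class and field names are hypothetical, the numbers are invented, and the capacity unit is left abstract rather than tied to any benchmark.

```python
from dataclasses import dataclass


@dataclass
class SiteCapacity:
    """Hypothetical per-site record, all values in the same capacity unit."""
    pledged: float    # what the site promised
    installed: float  # what is actually deployed
    used: float       # what the accounting reports as consumed

    def contribution_ratio(self) -> float:
        """Installed / pledged: below 1.0 suggests under-contribution."""
        return self.installed / self.pledged

    def utilisation_ratio(self) -> float:
        """Used / installed: below 1.0 suggests under-use."""
        return self.used / self.installed


# Invented example: the site has installed most of its pledge,
# but only half of the installed capacity is being used.
site = SiteCapacity(pledged=1000.0, installed=900.0, used=450.0)

print(f"installed/pledged: {site.contribution_ratio():.2f}")
print(f"used/installed:    {site.utilisation_ratio():.2f}")
```

Comparing used capacity directly against the pledge would fold both effects into one number, which is why installed capacity has to be known separately.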
slide14

Tier-2 Hardware Questions

  • How does the LHC delay affect the requirements and pledges for 2009?
      + We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU capacity and we have seen this for storage as well
slide15

Tier-2 Hardware Questions

  • How does the LHC delay affect the requirements and pledges for 2009?
      + We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU and we are now starting to see this for storage as well
  • We need to use something other than SpecInt2000!
      + this benchmark is totally out-of-date & useless for new CPUs
      + continued delays in SpecHEP can cause sub-optimal decisions
slide16

Tier-2 Hardware Questions

  • Networking to the nodes is now an issue.
      + with 8 cores per node, 1 GigE connection ≈ 16.8 MB/sec/core
      + Tier-2 analysis jobs run on reduced data sets and can do rather simple operations; have seen 7.5 MB/sec at ATLAS and much more (x10?)
      + Do we need to go to Infiniband?
      + We certainly need increased capability for the uplinks; we should have a minimum of fully non-blocking GigE to the worker nodes.
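
The per-core figure can be checked with a few lines of arithmetic. The sketch below is an assumption about how the number was obtained: treating 1 GigE as 2**30 bits/s reproduces 16.8 MB/sec/core, while the nominal 10**9 bits/s gives a slightly lower value. The function name is hypothetical.

```python
def per_core_bandwidth_mb_s(link_bits_per_s: float, cores: int) -> float:
    """Share of a node's network link per core, in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_s = link_bits_per_s / 8  # 8 bits per byte
    return bytes_per_s / cores / 1e6


CORES_PER_NODE = 8  # from the slide

# Nominal Gigabit Ethernet line rate.
decimal_gige = per_core_bandwidth_mb_s(1e9, CORES_PER_NODE)
# Assumed binary interpretation (2**30 bits/s), which matches the slide's figure.
binary_gige = per_core_bandwidth_mb_s(2**30, CORES_PER_NODE)

print(f"10**9 bits/s: {decimal_gige:.1f} MB/s per core")
print(f"2**30 bits/s: {binary_gige:.1f} MB/s per core")

# Observed job rate quoted on the slide, and the "x10?" case.
observed_mb_s = 7.5
print(f"fraction of per-core share used: {observed_mb_s / binary_gige:.2f}")
print(f"x10 case exceeds the share: {observed_mb_s * 10 > binary_gige}")
```

Either way, a job reading at 7.5 MB/sec already takes close to half of one core's share, and a tenfold higher rate would exceed it several times over.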

We need more guidance from the experiments. The next round of purchases is now!

slide17

Summary

  • The role of the Tier-2 centres has increased markedly in the last year; >50% of Monte Carlo simulation is done in the T2s now.
  • The CCRC’08 exercise is considered a success by the Tier-2s and by the experiments.
  • Availability and reliability are up, but still need improvement.
  • Resource acquisition vs pledges is better but still needs work
  • Issues for Tier-2s:
      - communication should be (& is being) improved
      - work should ramp up on chaotic user analysis
      - reporting actual resources should be established
      - improved user support is needed