RuTier-2 status report (Russian Federation)



  1. A.Minaenko, NL Cloud Meeting, 05 April 2011, CERN, Geneva. RuTier-2 status report (Russian Federation)

  2. ATLAS RuTier-2 tasks
  • The Russian Tier-2 (RuTier-2) computing facility is planned to supply computing resources to all 4 LHC experiments, including ATLAS. It is a distributed computing center comprising the computing farms of 6 institutions: RRC-KI (Moscow), JINR (Dubna), IHEP (Protvino), PNPI (St. Petersburg), ITEP and SINP (both Moscow). Two smaller sites, MEPhI and FIAN, are now also present in the TiersOfAtlas list, but they have smaller resources
  • The main RuTier-2 task is to provide facilities for physics analysis of the collected data, using mainly the AOD, DESD and group/user derived data formats
  • A group atlas/ru now exists in the framework of the ATLAS VO. It includes physicists intending to carry out their analysis mostly in RuTier-2, and the group list contains 46 names at the moment. The group will have the privilege of write access to local RuTier-2 disk resources (space token LOCALGROUPDISK)
  • All the data used for analysis should be stored on disk
  • The second important task is production and storage of MC simulated data
  • The full size of the data storage and the CPU needed for analysis are proportional to the collected statistics, so the required resources must grow steadily with the number of collected events. The expected effective data-taking time in 2009, 2010 and 2011 is 2.2, 5.8 and 5.8 × 10^6 s respectively. These numbers define the required resources and their evolution (a scaling sketch is given after this slide)
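As a rough illustration of how the data-taking times above drive the resource request, the sketch below scales the required statistics with cumulative live time. Only the live-time values (2.2, 5.8, 5.8 × 10^6 s) come from the slide; strict proportionality of resources to live time is a simplifying assumption.

```python
# Sketch: resources scale with cumulative collected statistics.
# Live times per year are taken from the slide; strict proportionality
# of disk/CPU to cumulative live time is a simplifying assumption.
live_time_msec = {2009: 2.2, 2010: 5.8, 2011: 5.8}  # units of 10^6 s

cumulative, total = {}, 0.0
for year in sorted(live_time_msec):
    total += live_time_msec[year]
    cumulative[year] = total

for year in sorted(cumulative):
    ratio = cumulative[year] / cumulative[2010]
    print(f"{year}: cumulative {cumulative[year]:.1f} Msec, "
          f"{ratio:.2f}x the end-2010 statistics")
# 2011 comes out at ~1.7x the end-2010 total, consistent with the
# "almost doubled" statement on slide 8.
```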

  3. RuTier-2 resources for ATLAS by the end of 2010
  • Red – sites for user analysis of ATLAS data; the others are for simulation only
  • The total number of CPU cores in 2010 (2009) is about 3200 (2500), an increase of about 40%
  • The ATLAS disk resource in 2010 (2009) is about 980 (560) TB, an increase of 75%. This disk space is sufficient to keep the 2009-2010 ATLAS statistics (AOD + DESD + group data)
  • The main type of LHC grid job is now official production jobs, and CPU resources are at the moment dynamically shared by all 4 LHC VOs
  • The ATLAS share in CPU is about 1/3

  4. ATLAS space tokens at RuTier-2
  • DATADISK – 654 TB
  • 3 group disks are assigned to RuTier-2:
  • RRC-KI – exotic
  • JINR – SM
  • IHEP – JetEtmiss

  5. Space token current status

  6. RuTier-2 CPU resources usage in 2009 and 2010
  [Plots: total consumed CPU time – 26.8 M*kSI2k*hours in 2009, 33.1 M*kSI2k*hours in 2010]
  • The consumed CPU time in 2010 is about 24% larger than in 2009, despite the 40% increase in the number of CPUs, indicating some decrease in efficiency (see the arithmetic sketch below)
  • CPU usage by the different VOs is quite non-uniform in time, and dynamic sharing of the CPU resources strongly increases the efficiency of their usage
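A quick check of the efficiency statement, using only the figures quoted on the slides (a minimal sketch; the assumption that the new capacity was available for the whole year is a simplification):

```python
# Consumed CPU time from the slide, in M * kSI2k * hours.
usage_2009, usage_2010 = 26.8, 33.1
usage_growth = usage_2010 / usage_2009   # ~1.24, the quoted ~24%
capacity_growth = 1.40                   # ~40% more cores (slide 3)

# Per-core utilisation in 2010 relative to 2009, assuming the extra
# capacity was available for the whole year (a simplification).
relative_utilisation = usage_growth / capacity_growth
print(f"usage growth: {usage_growth:.2f}x")
print(f"relative per-core utilisation: {relative_utilisation:.2f}")
# ~0.88, i.e. roughly a 12% drop in per-core usage: the
# "efficiency decrease" mentioned on the slide.
```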

  7. RuTier-2 CPU resources usage in 2010 (all 4 LHC experiments)
  • By VO: ALICE – 35%, ATLAS – 32%, CMS – 25%, LHCb – 8%
  • By site: JINR – 41%, RRC-KI – 26%, IHEP – 12%, ITEP – 9%, PNPI – 6%, SINP – 5%

  8. Estimate of resources needed to fulfil RuTier-2 tasks in 2011
  • By the end of 2011 the ATLAS statistics will be almost doubled in comparison with the end of 2010
  • Correspondingly, the total resources requested by ATLAS for all Tier-2s increase for 2011:
  • CPU: from 239 (2010) up to 278 kHS06
  • Disk: from 20.1 (2010) up to 34.2 PB
  • Knowing the share of Russia+Dubna users in ATLAS, the resources needed for ATLAS RuTier-2 in 2011 can be extrapolated (see the sketch below):
  • CPU for ATLAS RuTier-2 in 2011 (2010): 1500 (1300) cores, or 13.4 (11.6) kHS06
  • Disk for ATLAS RuTier-2 in 2011 (2010): 1640 (960) TB
  • The main problem: 700 TB of new disks are necessary
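The extrapolation can be reproduced from the slide's numbers, assuming the RuTier-2 targets follow a fixed national share of the total ATLAS Tier-2 request (the share itself is inferred here, not stated on the slide):

```python
# ATLAS-wide Tier-2 requests for 2011 (from the slide).
cpu_total_2011_khs06 = 278.0
disk_total_2011_pb = 34.2

# Infer the Russia+Dubna share from the quoted 2011 CPU target.
share = 13.4 / cpu_total_2011_khs06          # ~0.048
print(f"inferred share: {share:.3f}")

# Apply the same share to the 2011 disk request.
disk_rutier2_tb = share * disk_total_2011_pb * 1000.0
print(f"RuTier-2 disk for 2011: ~{disk_rutier2_tb:.0f} TB")  # ~1650 TB

# Gap relative to the ~960 TB installed at the end of 2010.
print(f"new disk needed: ~{disk_rutier2_tb - 960:.0f} TB")   # ~690 TB
```

This reproduces, to rounding, the ~1640 TB target and the ~700 TB of new disk quoted above.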

  9. RuTier-2 resources for 2011 and further
  • At the end of each year we used to get funding from the Ministry of Science for a RuTier-2 resources increase. The last such payment was at the end of 2009, and it was announced that the existing programme is completed. We got nothing in 2010
  • So for 2011 we can rely only on possible resource increases by some institutes from their internal funding
  • JINR has already purchased and installed 480 additional CPU cores and one 40 TB file server. It now has 1584 cores of about 14 kHS06. A contract is almost signed for the purchase of 480 more CPU cores and 7 × 40 TB of file servers; the resources will be available in summer
  • RRC-KI, and probably some other institutes, should purchase additional resources
  • A request to the Ministry of Science is being prepared for funding the ATLAS (+others) upgrade for 5 years beginning from 2012. RuTier-2 computing is included in this request, but the result will be known later

  10. Some ATLAS problems visible at sites
  • There were several overfills of the shared area used for ATLAS software installation at some sites. It is not a problem, in principle, for sites to increase the size of the area, but they should be informed beforehand that this is necessary. The corresponding size requested in the ATLAS VO card is now 300 GB
  • At our sites, and at many others, analysis jobs submitted with the Panda backend often use the lcg-cp command to fetch data from the local SE to the WNs. lcg-cp can use only the gsiftp protocol for data transfer, which is much less efficient than the dcap (rfio) protocol for a dCache (DPM) SE. Use of the "right" protocols could considerably decrease the number of failed analysis jobs
  • The most frequently used mode of fetching data to a WN in the Panda backend is file prestaging: all needed files are copied to the WN and only then does the analysis begin. When a large number of jobs arrive at a site and begin this prestaging, they interfere with each other, data transfer can be considerably slowed down, and CPU time is wasted while jobs wait. The file stager is more efficient in this case: analysis begins after the first file is copied, and the other files are copied in parallel with the analysis (see the sketch below). Why is it not used as the default?
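A minimal sketch of the file-stager behaviour described in the last bullet: the copy of file N+1 overlaps with the analysis of file N, instead of all copies completing before any analysis starts. The copy_from_se and process helpers are hypothetical placeholders, not Panda or lcg-utils APIs.

```python
import threading
import queue

def copy_from_se(lfn):
    # Hypothetical placeholder for a transfer from the local SE
    # (ideally via dcap/rfio rather than gsiftp, as argued above).
    return f"/tmp/{lfn}"

def process(local_path):
    # Hypothetical placeholder for the analysis of one input file.
    print(f"analysing {local_path}")

def file_stager(lfns):
    """Analysis starts as soon as the first file has arrived; the
    remaining files are copied in parallel with the analysis."""
    staged = queue.Queue(maxsize=2)      # small buffer limits local disk use

    def stage_all():
        for lfn in lfns:
            staged.put(copy_from_se(lfn))
        staged.put(None)                 # sentinel: no more files

    threading.Thread(target=stage_all, daemon=True).start()

    while True:
        local_path = staged.get()
        if local_path is None:
            break
        process(local_path)              # overlaps with the next copy

file_stager(["file1.AOD", "file2.AOD", "file3.AOD"])
```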
