
LCG Fabric status



Presentation Transcript


  1. LCG Fabric status Bernd Panzer-Steindel, CERN/IT

  2. Content
     • Major achievements during the last 5 months: Infrastructure, Resources, Automation, Services
     • Major decisions
     • Outlook for the next few months (activities and milestones)

  3. Adjustment of the Level 2 milestones in July: updated Fabric Level 2 milestones. This status covers the milestones and activities of April to June and July to August.

  4. Major achievements during the last 5 months: Infrastructure (I), Computer Center Refurbishment
     • To host the amount of equipment expected in 2007, the space, cooling and electricity infrastructure of the computer center has to be upgraded
     • Today we have a capacity of about 600 kW
     • We are already using ~450 kW
     • The goal is to refurbish the infrastructure to deliver 2.5 MW for electricity and cooling (== 2 x 2.5 MW)
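
The power figures on this slide can be put side by side with a little arithmetic. The sketch below is only an illustration: the variable and function names are made up for this note, and the inputs are the rounded values quoted on the slide ("about 600 kW", "~450 kW", 2.5 MW), so the results are rough.

```python
# Figures quoted on the slide; the names are illustrative, not from the talk.
CURRENT_CAPACITY_KW = 600    # "capacity of about 600 KW"
CURRENT_LOAD_KW = 450        # "already using ~ 450 KW"
TARGET_CAPACITY_KW = 2500    # 2.5 MW goal for electricity (the same again for cooling)


def utilisation(load_kw: float, capacity_kw: float) -> float:
    """Fraction of the available capacity that is in use."""
    return load_kw / capacity_kw


# Power still available today, and how much the refurbishment multiplies capacity.
headroom_kw = CURRENT_CAPACITY_KW - CURRENT_LOAD_KW
growth_factor = TARGET_CAPACITY_KW / CURRENT_CAPACITY_KW

if __name__ == "__main__":
    used = utilisation(CURRENT_LOAD_KW, CURRENT_CAPACITY_KW)
    print(f"utilisation today: {used:.0%}")
    print(f"headroom today:    {headroom_kw} kW")
    print(f"capacity growth:   x{growth_factor:.1f}")
```

With these inputs about three quarters of today's capacity is already in use, which is the motivation the slide gives for the refurbishment.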

  5. Major achievements during the last 5 months: Infrastructure (II), Substation Building
     • Milestone 1.2.3.3: Sub-station civil engineering starts, 01-September 2003 → started on the 18th of August

  6. Major achievements during the last 5 months: Infrastructure (III), Preparing for the electricity refurbishment and upgrade of the right side of the computer center
     • Milestone 1.2.3.2: Right half of m/c room migrated to vault, 01-Aug-2003 → finished on the 15th of August

  7. Major achievements during the last 5 months: Infrastructure (IV), Populating the vault in the computer center [Pictures: disk servers, CPU servers, tape silos and servers]

  8. Major achievements during the last 5 months: Infrastructure (V), New layout plan for the computer center equipment [Figure: floor plan. Labels: 9m double rows of racks for critical servers; aligned normabarres; 18m double rows of racks; 12 Mario racks or 36 19” racks]
     • 528 box PCs: 105 kW
     • 1440 1U PCs: 288 kW
     • 324 disk servers: 120 kW(?)
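
The unit counts and power figures on the layout plan imply a power draw per machine. The following is a rough sketch of that division, with names invented for this note; it takes the slide's numbers at face value, including the disk server figure that the slide itself marks with a question mark.

```python
# (label, unit count, total power in kW) as read from the layout plan.
# The disk server value is marked "(?)" on the slide, so treat it as uncertain.
EQUIPMENT = [
    ("box PCs", 528, 105),
    ("1U PCs", 1440, 288),
    ("disk servers", 324, 120),
]


def watts_per_unit(count: int, total_kw: float) -> float:
    """Average power per machine in watts."""
    return total_kw * 1000 / count


# Sum of the three equipment classes listed on the plan.
total_kw = sum(kw for _, _, kw in EQUIPMENT)

if __name__ == "__main__":
    for label, count, kw in EQUIPMENT:
        print(f"{label:>12}: {watts_per_unit(count, kw):6.1f} W per unit")
    print(f"{'total':>12}: {total_kw} kW")
```

Both PC classes come out at roughly 200 W per box, and the three classes together add up to 513 kW, which fits comfortably within the planned 2.5 MW.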

  9. Major achievements during the last 5 months: Resources (I)
     The movement of the STK tape silos was a prerequisite for the refurbishment of the computer center. During the period March to May:
     • 5 STK tape silos on loan were installed in the vault
     • About 25000 cartridges were redistributed
     • 30 new 9940B tape drives and tape servers were installed
     • A further 30 drives (9840, 3590E) were redistributed
     • The remaining 5 silos in the computer center were dismantled and sent back to STK

  10. Major achievements during the last 5 months: Resources (III), Computing Data Challenge
     The movement of the silos and the replacement of the 9940A tape drives with 9940B tape drives gave the opportunity to use all 50 new tape drives exclusively for a short period → try to reach 1 GB/s in an emulated CDR environment
     • Preparations started in April; the DC ran from 2 to 5 May
     • Very difficult → small time window, system setup, in parallel to production and refurbishment
     • Made heavy use of the openlab equipment (Enterasys router, IA64 HP nodes)
     • Milestone 1.2.7.3, 1 July: Integration of the 10 Gbit equipment into the prototype → done on the first of May
     • “Side effects”:
       • stress test of the equipment (tape servers and drives, new robot installation); we solved a few problems and thus improved the service
       • very good financial deal with STK for the silo movement, and one free drive
       • test of flexibility, fast reconfiguration of equipment

  11. Major achievements during the last 5 months: Resources (IV), 1 Gbyte/s IT Computing Data Challenge → Network Topology
     [Figure: network topology diagram. Labels:
       TAPE: TPSRV101-112, TPSRV113-124, TPSRV050-52, TPSRV054-57, TPSRV058-62, TPSRV001-15, TPSRV002-16, TPSRV018-22, TPSRV023-27, TPSRV028-32
       CPU: TBED001-12, TBED0013-24, TBED0025-36, TBED0037-48
       DISK: LXSHARE001D-12D, LXSHARE013D-24D, LXSHARE025D-36D, LXSHARE108D-119D except LXSHARE115D
       Other: 20 x Hoplapro; 513-V; 613-R Backbone; ST11 to ST15, ST21 to ST25, ST1 + ST2, ST5 + ST6]
     The setup used 40 CPU servers, 60 disk servers and on average 45 tape servers.

  12. Major achievements during the last 5 months: Resources (V), 1 Gbyte/s IT Computing Data Challenge → achieved data rates
     [Figure: data rate versus time in minutes, vertical axis ticks 0.0 to 1.4. Annotations: services started; running in parallel with increasing production service; daytime tape server intervention]
     • 920 MB/s average over a period of 3 days, with an 8 hour period of 1.1 GB/s and peaks of 1.2 GB/s
     • In addition: 600 MB/s into Castor for 12 hours, then the window of opportunity closed
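
To give a feeling for what these rates mean in data volume, the sketch below multiplies each quoted rate by its quoted duration. It is an estimate under assumptions not stated on the slide: decimal units (1 TB = 10^6 MB), the quoted averages holding for exactly the stated durations, and helper names invented for this note. The 8 hour period is presumably part of the 3 days, so the first two volumes should not be added together.

```python
SECONDS_PER_HOUR = 3600


def volume_tb(rate_mb_per_s: float, hours: float) -> float:
    """Data volume in TB (decimal, 1 TB = 10**6 MB) moved at a constant rate."""
    return rate_mb_per_s * hours * SECONDS_PER_HOUR / 1_000_000


# Rates and durations as quoted on the slide.
three_day_average_tb = volume_tb(920, 3 * 24)   # 920 MB/s averaged over 3 days
peak_period_tb = volume_tb(1100, 8)             # 1.1 GB/s sustained for 8 hours
castor_run_tb = volume_tb(600, 12)              # 600 MB/s into Castor for 12 hours

if __name__ == "__main__":
    print(f"3 day average:  {three_day_average_tb:.1f} TB")
    print(f"8 hour period:  {peak_period_tb:.1f} TB")
    print(f"Castor, 12 h:   {castor_run_tb:.1f} TB")
```

Under these assumptions the three day run corresponds to a bit under 240 TB moved, and the 12 hour Castor run to roughly 26 TB.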

  13. Major achievements during the last 5 months: Resources (VI), 2003 CPU and disk resources
     Old milestones, 2-3 months delay:
     • L3M: CPU and disk capacity upgrade of Lxbatch, 2/24/03
     • L3M: Node capacity upgrade of the Prototype
     Delivery and installation of the 2003 disk and processor resources:
     • took place in April for 350 CPU nodes and in June/July for 55 disk servers
     • was late due to CERN purchasing procedures and delivery delays
     The missing resources were covered through the resources from the LCG prototype:
     • disk and processor nodes on loan for different experiments

  14. Major achievements during the last 5 months: Resources (VIII), Preliminary estimation of resource requests and resource availability for the 2004 Physics Data Challenges (CERN) [Figure: capacity in KSI2000 (axis 0 to 2000) versus time (ticks 1 to 18, January 2004 to December 2004). Series: LHCb, ATLAS, CMS, ALICE, LHC baseline, non-LHC, Lxbatch]

  15. Major achievements during the last 5 months: Resources (IX), Re-costing
     • Re-costing exercise during April and May
     • Representatives from IT and the 4 LHC experiments
     • Review of the equipment cost for phase 2 of LCG and for 2009-2010, taking into account slight changes in the model and the adjusted requirements from the experiments → Excel table → explanation note
     • LCG seminar in July and paper in August

  16. Major achievements during the last 5 months: Resources (X), Summary of the results from the re-costing exercise. All units in [million CHF]. * A bug in the original paper is corrected here.

  17. Major achievements during the last 5 months: Services (I), CASTOR
     • In Q1 of 2003 CASTOR was split into a development part and a service part
     • The service team is about 2 FTE
     • Development is now about 5 FTE (one IT person joined on the 1st of September)
     • The new CASTOR architecture and design has been presented at the user meeting on 24th June and at the PEB on 12th August
     • The paper has been distributed to key people for feedback
     • Information exchange at the next HEPIX (especially with Fermilab)
     Milestones:
     • October-03: Demonstrate concept of pluggable scheduler and high rate request handling
     • February-04: Integrated prototype of the whole system
     • April-04: Production system ready for deployment

  18. Major achievements during the last 5 months: Services (II), GRID-Fabric interface
     • Now that LCG-1 is stabilizing, the coordination work between the LCG-1 team and the Fabric service teams has restarted (installation procedures, security, network access, LSF issues, etc.)
     • Milestone 1.2.4.2.1 probably delayed until November, maybe merge 1 and 2

  19. Major achievements during the last 5 months: Automation (I), Installation and Configuration
     quattor is a system administration toolkit being designed and implemented by the EDG WP4 group and the IT/FIO group. It has several components, which are or will be used to administer the CERN T0/T1 center:
     • The Software Package Management Agent (SPMA)
     • The Node Configuration Manager (NCM) subsystem
     • The Automated Installation Infrastructure (AII)
     • The Configuration Database (CDB)
     • The Fault Tolerant System (FT)
     Some of the packages (SPMA, CDB) are already deployed on 1200 systems in the center and were used 2 weeks ago to upgrade 800 batch nodes to LSF 5.1 automatically (no LSF interruption, no user noticed).
     • Milestone 1.2.2.1: SPMA, full production, 01.08.2003 → was on 1200 nodes on 18th August 2003
     • Milestone 1.2.2.7: FT, development started, 01.09.2003 → started 01.09.2003

  20. Important decisions during the last 5 months
     • Tape costs for the LHC experiments are to be paid by IT
       • costs: 2004 → 600 KCHF, 2005 → 1000 KCHF
       • no extra budget for IT, thus it has to come from the CPU+disk budget
       • proposal to EP by the Division Leader for cost sharing
     • IBM joined openlab; 20 TB StorageTank installation in July
     • RedHat Linux policy changes
       • free distribution lifetime = 12 months, support stops after that
       • longer support for the RH advanced server version (licenses)
       • need to go to a 12 month Linux certification cycle (wanted to have 18-24 months)
       • other sites have the same problem, some coordination ongoing (HEPIX)
       • currently preparing to certify RH10

  21. Outlook for the next 3 months (I): Coming Milestones + 1 month

  22. Outlook for the next 3 months (II)
     • Delivery and installation of new resources in November 2003 (2004 budget) → 440 CPU nodes → 60 disk servers
     • Changes in the network backbone to host more GB connections; arrangements with openlab and the prototype are necessary
