
Distributed Facilities for U.S. ATLAS



Presentation Transcript


  1. Distributed Facilities for U.S. ATLAS Rob Gardner Indiana University PCAP Review of U.S. ATLAS Physics and Computing Project Argonne National Laboratory OCTOBER 30, 2001

  2. Outline • Requirements • Approach • Organization • Resource Requirements, current funding • Schedule • Grid Testbed and Prototype Tier 2 development • US LHC Common Computing Projects • Summary Rob Gardner Distributed Facilities for U.S. ATLAS

  3. Distributed IT Infrastructure • A wide area computational infrastructure for U.S. ATLAS • A network of distributed computing devices • A network of distributed data caches & stores • Connectivity • Physicists with data • Computers with data (at all scales) • Physicists with each other (collaboration) • Distributed information, portals • Efforts • Data Grid R&D (see talks by Wenaus, Schopf) • Prototype Tier 2 sites at Boston and Indiana • Networking and Grid Testbed coordinated by Ed May (ANL) Rob Gardner Distributed Facilities for U.S. ATLAS

  4. Requirements • Access • Efficient access to resources at the Tier 1 facility • Data distribution to remote computing devices • Information • A secure infrastructure to locate, monitor and manage collections of distributed resources • Analysis planning framework • Resource estimation • “Matchmaker” tools to optimally connect physicist + CPU + data + etc. (a toy sketch follows below) • Scalable • Add arbitrarily large numbers of computing devices as they become available • Add arbitrarily large numbers of data sources as they become available Rob Gardner Distributed Facilities for U.S. ATLAS
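The “matchmaker” requirement above can be made concrete with a toy sketch. This is a minimal illustration, not any tool from the project: the site names, job attributes, and scoring rule are all hypothetical, and the point is only to show the kind of physicist + CPU + data matching the slide has in mind.

```python
# Toy "matchmaker" sketch: pick a testbed site for an analysis job.
# Site names, attributes, and the scoring rule are hypothetical -- this only
# illustrates the physicist + CPU + data matching idea on the slide.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_cpus: int        # currently idle CPUs
    datasets: set         # dataset names cached locally
    wan_mbps: float       # nominal WAN bandwidth

@dataclass
class Job:
    dataset: str          # input dataset the physicist needs
    cpus: int             # CPUs requested

def score(site: Site, job: Job) -> float:
    """Prefer sites that already hold the data and have enough idle CPUs."""
    if site.free_cpus < job.cpus:
        return float("-inf")                 # cannot run here at all
    locality = 1.0 if job.dataset in site.datasets else 0.0
    return 10.0 * locality + site.free_cpus / 100.0 + site.wan_mbps / 1000.0

def match(job: Job, sites: list) -> Site:
    return max(sites, key=lambda s: score(s, job))

if __name__ == "__main__":
    sites = [
        Site("tier1",  free_cpus=300, datasets={"dc1.aod", "dc1.esd"}, wan_mbps=622),
        Site("tier2a", free_cpus=80,  datasets={"dc1.aod"},            wan_mbps=155),
        Site("tier2b", free_cpus=200, datasets=set(),                  wan_mbps=155),
    ]
    job = Job(dataset="dc1.aod", cpus=50)
    print("best site:", match(job, sites).name)  # -> tier1 (data present, most idle CPUs)
```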

  5. Approach • ~5 strategic remote sites (Tier 2s) • Scale of each facility: • MONARC estimates • ATLAS NCB/WWC (World Wide Computing Group) • National Tier 1 facility • 209K Spec95 • 365TB Online disk • 2 PB tertiary • Tier 2 = Tier 1 * 20% • Networking Rob Gardner Distributed Facilities for U.S. ATLAS
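As a quick arithmetic check of the “Tier 2 = Tier 1 * 20%” rule, the sketch below simply scales the Tier 1 numbers quoted on this slide; the results are close to the “typical Tier 2” sizing given a few slides later (50K SpecInt95 and 70 TB of disk). Only the 20% factor and the Tier 1 figures come from the talk; the rest is arithmetic.

```python
# Arithmetic check of the sizing rule on this slide: Tier 2 ~ 20% of Tier 1.
TIER1 = {"cpu_si95": 209_000, "disk_tb": 365, "tape_pb": 2.0}  # Tier 1 figures from the slide
FRACTION = 0.20

tier2 = {key: value * FRACTION for key, value in TIER1.items()}
print(tier2)
# -> {'cpu_si95': 41800.0, 'disk_tb': 73.0, 'tape_pb': 0.4}
# Compare the "Typical Tier 2" slide: ~50K SpecInt95 of CPU, 70 TB of disk,
# and 0.3-0.5 PB of tertiary store at the two Tier 2 sites with mass storage.
```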

  6. Organization • Facilities Subproject 2.3.2 • Coordination: • Tier 2 centers (Gardner, Shank) • Testbed (May) • Networking (McKee) • Grid monitoring (Yu) Rob Gardner Distributed Facilities for U.S. ATLAS

  7. Role of Tier 2 Centers • User Analysis • Standard configuration optimized for analysis at the AOD level • ESD objects required for some analyses • Primary Resource for Monte Carlo Simulation • Data distribution caches (depends on distribution model) • Remote tertiary data stores • HSM services for quick, local AOD archival • MC data of all types (GEN, RAW, ESD, AOD, LOD) from all Tier 2’s & users • Relieve pressure, improve efficiency of Tier 1 systems • Effective use with grid software Rob Gardner Distributed Facilities for U.S. ATLAS

  8. Typical Tier 2 • CPU: 50KSpecInt95 (t1: 209K SI95) • Commodity Pentium/Linux • Estimated 144 Dual Processor Nodes (t1: 640 nodes) • Online Storage: 70 TB Disk (t1: 365 TB) • High Performance Storage Area Network • Baseline: Fiber Channel Raid Array Rob Gardner Distributed Facilities for U.S. ATLAS

  9. ‘Remote’ Data Stores • Exploit existing infrastructure • mass store infrastructure at 2 of the 5 Tier 2 centers • Assume existing HPSS or equivalent license, tape silo, robot • Augment with drives, media, mover nodes, and disk cache • Each site contributes 0.3-0.5 PB store • AOD archival, MC ESD+AOD archival Rob Gardner Distributed Facilities for U.S. ATLAS

  10. Personnel (Lehman 11/00) Manpower estimate summary in FTEs (WBS No. 2; Funding Type: Infrastructure; Description: US ATLAS Computing; Institutions: All; Funding Source: All; report dated 11/13/00)

Lehman 11/00 profile (FTEs):
Category       FY01  FY02  FY03  FY04  FY05  FY06  Total
IT I            1.0   4.0   6.0  10.0  10.0   7.0   38.0
IT II           0.0   1.0   2.0   2.0   5.0   5.0   15.0
Physicist       1.0   1.0   1.0   1.0   1.0   0.0    5.0
TOTAL LABOR     2.0   6.0   9.0  13.0  16.0  12.0   58.0

Update 10/01, NSF-funded FTEs (GriPhyN and ½ iVDGL reorganized into Software):
Source           FY01  FY02  FY03  FY04  FY05  FY06  Total
GriPhyN           1.0   1.0   1.0   1.0   1.0   0.0    5.0
iVDGL             0.0   3.0   3.0   3.0   3.0   3.0   15.0
PPDG              0.0   0.5   0.5   0.5   0.0   0.0    1.5
ITR2 telemetry    0.0   0.8   1.0   1.0   0.0   0.0    2.8
iVDGL grid ops    0.0   0.5   1.0   1.0   1.0   1.0    4.5
Total             1.0   5.8   6.5   6.5   5.0   4.0   28.8

Rob Gardner Distributed Facilities for U.S. ATLAS

  11. Tier 2 Costs (Lehman 11/01) Rob Gardner Distributed Facilities for U.S. ATLAS

  12. Funding for Tier 2 Centers • Additional funding for prototype tier 2 centers and for permanent tier 2 centers will need to be found Rob Gardner Distributed Facilities for U.S. ATLAS

  13. Schedule • R&D Tier 2 centers • Two prototype tier 2 sites selected in 01: Boston U and Indiana U • Initial (university funded) centers established in 01 • Support analysis of DC1 data in summer 02 • DC2 production and analysis • Production Tier 2’s – FY ‘04 & FY ‘05 • Operation – FY ‘05, FY ‘06 & beyond • Full Scale System Operation, 20% (‘05) to 100% (‘06) (as for Tier 1) Rob Gardner Distributed Facilities for U.S. ATLAS

  14. Persistent Grid Testbed for US-ATLAS ATLAS-US PCAP Meeting at ANL Oct 30, 2001 Ed May Argonne National Laboratory E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  15. Background & Motivation • Based on previous meetings of the US groups, in particular the Summer 2000 meeting at IU and an organizational meeting in Winter 2000-2001 at UM. • Establish a persistent grid testbed of US-ATLAS Tier 1, Tier 2 and other sites, April 2001. • Participating sites: ANL, BNL, LBNL, BU, UM, IU, OU and UTA. • Provide a focus for working with PPDG and GriPhyN, and ultimately with CERN & EDG. E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  16. Participants • ANL HEP: Ed May, Jerry Gieraltowski • LBNL (PDSF): Stu Loken, Shane Canon • BNL: Rich Baker, Torre Wenaus, Danton Yu • Boston U: Saul Youssef, Jim Shank • Indiana U: Rob Gardner • Univ. of Michigan: Shawn McKee, Eric Myers • Univ. of Oklahoma: Horst Severini, Pat Skubic • Univ. of Texas at Arlington: Kaushik De • More information: http://www.usatlas.bnl.gov/computing/grid/ E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  17. 8 Sites in Testbed, ’01 [Map of testbed sites and their network connectivity: U Michigan, Boston University, UC Berkeley / LBNL-NERSC, Argonne National Laboratory, Brookhaven National Laboratory, Oklahoma University, Indiana University, and University of Texas at Arlington; network paths include ESnet, MREN, NPACI, Abilene, CalREN, and NTON; HPSS sites marked.] Rob Gardner Distributed Facilities for U.S. ATLAS

  18. Planning & Implementation • During one year (2001), implement the testbed with Globus 1.1.3 and 1.1.4 • Provide an environment for Grid developers and testers: a relatively small number of friendly users, not production use (a minimal usage sketch follows below) • Establishment of a technical working group with regular phone/VRVS meetings E. May Rob Gardner Distributed Facilities for U.S. ATLAS
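To make the “friendly user” workflow concrete, here is a minimal sketch of how a testbed user might exercise a Globus 1.1.x gatekeeper from Python. It assumes the standard Globus client tools (grid-proxy-info, globus-job-run) are installed and that a proxy has already been created with grid-proxy-init; the gatekeeper contact string is a placeholder, not an actual testbed host.

```python
# Minimal sketch of exercising a Globus gatekeeper on the testbed.
# Assumes the Globus client tools are on the PATH and a grid proxy has already
# been created with grid-proxy-init. The contact string is a placeholder.
import subprocess

GATEKEEPER = "gatekeeper.example.edu/jobmanager-fork"   # hypothetical contact string

def have_valid_proxy() -> bool:
    """Return True if grid-proxy-info reports an existing, unexpired proxy."""
    result = subprocess.run(["grid-proxy-info", "-exists"], capture_output=True)
    return result.returncode == 0

def run_remote(command: str) -> str:
    """Run a simple command at the remote gatekeeper via globus-job-run."""
    result = subprocess.run(
        ["globus-job-run", GATEKEEPER, command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    if have_valid_proxy():
        print(run_remote("/bin/hostname"))   # sanity check: which remote node answered?
    else:
        print("No valid proxy; run grid-proxy-init first.")
```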

  19. P & I continued • Long list of technical issues concerning which services to provide and how to manage them. • Scope of interest varies widely by institution, e.g.: • Data Cataloging and Replication (BNL) • Objectivity Database issues (ANL) • User & Account management (UM, IU) • Remote job execution (BU) E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  20. Activities & Accomplishments • GridView grid testbed status (UTA) • Magda distributed data manager prototype (BNL) • Pacman package manager (BU, BNL) • GRIPE, a grid sign-up tool (IU) • Distributed job management prototyping with Condor (BU, UTA, OU); see the sketch below • Testing of distributed data replication (Magda, GDMP, Globus) with ATLAS applications (Tilecal testbeam, ATLFast in Athena) (ANL, BU, OU) • Network performance and monitoring (UM, BNL) E. May Rob Gardner Distributed Facilities for U.S. ATLAS
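As an illustration of the Condor-based job management prototyping listed above, the sketch below writes a plain vanilla-universe submit description and hands it to condor_submit. The executable, file names, and queue count are hypothetical; this shows only the general shape of a submission, not any group's actual scripts.

```python
# Hypothetical sketch of driving a Condor submission from Python, in the spirit
# of the distributed job management prototyping on this slide.
# Assumes condor_submit is on the PATH; executable and file names are made up.
import subprocess

SUBMIT_FILE = "atlfast_test.sub"

# A minimal vanilla-universe submit description (10 identical jobs).
submit_description = """\
universe   = vanilla
executable = run_atlfast.sh
arguments  = --events 1000
output     = atlfast_$(Process).out
error      = atlfast_$(Process).err
log        = atlfast.log
queue 10
"""

with open(SUBMIT_FILE, "w") as f:
    f.write(submit_description)

# Hand the description to the local Condor scheduler.
subprocess.run(["condor_submit", SUBMIT_FILE], check=True)
```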

  21. Testbed Software • Testbed has been functional for ~ 8 months • Accounts (individual user, group) created at all sites • Grid credentials (based on globus CA) distributed • To be updated with ESnet CA credentials • Grid software at each node in the site: • Globus 1.1.4 • Condor 6.3 • ATLAS core software distribution at some of the sites (for developers) • ATLAS related grid software: Pacman, Magda, Gridview • Start grid-wide applications in 02 Rob Gardner Distributed Facilities for U.S. ATLAS

  22. Future Activities for Testbed • Focus on Environments for Applications • Compatibility with EDG • Preparations for Atlas Data Challenge 1 and 2 E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  23. IU Tier 2 Configuration • Gateway: atlas.uits.iupui.edu • Nodes: atlas01 – atlas16 • 400 MHz PII, 256 MB • 4.3 GB SCSI local disk • 100 Mb/s NIC • Switch • HP ProCurve 4000M 10/100Base-TX • Disk and Storage • /usr/lhc1 60 GB • /usr/lhc2 60 GB • lhc1.uits.iupui.edu 200 GB attached RAID, AFS • Generic atlas account into local HPSS Rob Gardner Distributed Facilities for U.S. ATLAS

  24. IU Notes • Tertiary storage • Currently IBM 3494 robot with ~10 TB ATLAS-dedicated storage • New StorageTek silo to be installed in Feb 02 (capacity 360 TB) • HPSS software • Connectivity: • Located at Indianapolis campus (IUPUI) (OC12) • Better connectivity than Bloomington (DS3) • Bloomington-Indianapolis dark fiber project >1/2 complete; future installations could be located at IUB • Machine room adjacent to Abilene NOC, Global NOC • IU will develop a grid operations center (iGOC) as part of iVDGL • Trouble-ticket system, monitoring, administrative support Rob Gardner Distributed Facilities for U.S. ATLAS

  25. Boston University Tier 2 [Site diagram: campus network access through the BU router to the NoX and on to the Internet, Internet2, Tufts, MIT, and Harvard; RAID disk array; IBM R30 mass store, 100 TB; SGIs, 230 CPUs; IBM SP, 64 CPUs; Linux farm, 128 PIII; GRID conference center; high-end graphics lab with SGI Onyx (4 RE II) and 9 O2s; ATLAS-dedicated and shared resources; OC12 622 Mb/s link.] Rob Gardner Distributed Facilities for U.S. ATLAS

  26. BU Notes • The 100-terabyte mass store will be upgraded to 150 terabytes. • Upgrading the local 100 Mb/s Ethernet to Gigabit is being considered. • Nominal network bandwidth in the previous slide's diagram is proportional to the thickness of the purple lines. • For both the IU and BU clusters, hardware funding in FY 02 will be used to optimize support for analysis of data challenge production • BU: large RAID purchase • IU: some RAID plus CPU upgrades Rob Gardner Distributed Facilities for U.S. ATLAS

  27. US LHC Common Computing Projects • Meeting of ATLAS & CMS PMs, CERN 1/01 • Identify possible common work areas (facilities, networking, database, grid) • Facilities Workshops (R. Baker, V. O'Dell): • BNL (3/01), FNAL (7/01), LBL (10/01) • Networking Workshops (S. McKee): • IU (6/01), UM (10/01) Rob Gardner Distributed Facilities for U.S. ATLAS

  28. Summary • Prototype Tier 2 centers chosen; university-funded resources in place • Persistent testbed for grid projects (PPDG, GriPhyN/iVDGL, EDG) established • US LHC coordination for facilities and networking established, working groups formed • Hiring for prototype Tier 2 centers & ATLAS grid integration begun • Facilities grid planning document coherent with Software grid development; includes networking and facilities infrastructure Rob Gardner Distributed Facilities for U.S. ATLAS
