
Distributed Facilities for U.S. ATLAS



Presentation Transcript


  1. Distributed Facilities for U.S. ATLAS Rob Gardner Indiana University PCAP Review of U.S. ATLAS Physics and Computing Project Argonne National Laboratory OCTOBER 30, 2001

  2. Outline • Requirements • Approach • Organization • Resource Requirements, current funding • Schedule • Grid Testbed and Prototype Tier 2 development • US LHC Common Computing Projects • Summary Rob Gardner Distributed Facilities for U.S. ATLAS

  3. Distributed IT Infrastructure • A wide area computational infrastructure for U.S. ATLAS • A network of distributed computing devices • A network of distributed data caches & stores • Connectivity • Physicists with data • Computers with data (at all scales) • Physicists with each other (collaboration) • Distributed information, portals • Efforts • Data Grid R&D (see talks by Wenaus, Schopf) • Prototype Tier 2 sites at Boston and Indiana • Networking and Grid Testbed coordinated by Ed May (ANL) Rob Gardner Distributed Facilities for U.S. ATLAS

  4. Requirements • Access • Efficient access to resources at the Tier 1 facility • Data distribution to remote computing devices • Information • A secure infrastructure to locate, monitor and manage collections of distributed resources • Analysis planning framework • Resource estimation • “Matchmaker” tools to optimally connect physicist + CPU + data + etc. (a toy sketch follows below) • Scalable • Add arbitrarily large numbers of computing devices as they become available • Add arbitrarily large numbers of data sources as they become available Rob Gardner Distributed Facilities for U.S. ATLAS
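The “matchmaker” requirement above can be made concrete with a toy sketch. This is a minimal illustration, not any tool from the project: the site names, job attributes, and scoring rule are all hypothetical, and the point is only to show the kind of physicist + CPU + data matching the slide has in mind.

```python
# Toy "matchmaker" sketch: pick a testbed site for an analysis job.
# Site names, attributes, and the scoring rule are hypothetical -- this only
# illustrates the physicist + CPU + data matching idea on the slide.
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    free_cpus: int        # currently idle CPUs
    datasets: set         # dataset names cached locally
    wan_mbps: float       # nominal WAN bandwidth

@dataclass
class Job:
    dataset: str          # input dataset the physicist needs
    cpus: int             # CPUs requested

def score(site: Site, job: Job) -> float:
    """Prefer sites that already hold the data and have enough idle CPUs."""
    if site.free_cpus < job.cpus:
        return float("-inf")                 # cannot run here at all
    locality = 1.0 if job.dataset in site.datasets else 0.0
    return 10.0 * locality + site.free_cpus / 100.0 + site.wan_mbps / 1000.0

def match(job: Job, sites: list) -> Site:
    return max(sites, key=lambda s: score(s, job))

if __name__ == "__main__":
    sites = [
        Site("tier1",  free_cpus=300, datasets={"dc1.aod", "dc1.esd"}, wan_mbps=622),
        Site("tier2a", free_cpus=80,  datasets={"dc1.aod"},            wan_mbps=155),
        Site("tier2b", free_cpus=200, datasets=set(),                  wan_mbps=155),
    ]
    job = Job(dataset="dc1.aod", cpus=50)
    print("best site:", match(job, sites).name)  # -> tier1 (data present, most idle CPUs)
```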

  5. Approach • ~5 strategic remote sites (Tier 2s) • Scale of each facility: • MONARC estimates • ATLAS NCB/WWC (World Wide Computing Group) • National Tier 1 facility • 209K Spec95 • 365TB Online disk • 2 PB tertiary • Tier 2 = Tier 1 * 20% • Networking Rob Gardner Distributed Facilities for U.S. ATLAS
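As a quick arithmetic check of the “Tier 2 = Tier 1 * 20%” rule, the sketch below simply scales the Tier 1 numbers quoted on this slide; the results are close to the “typical Tier 2” sizing given a few slides later (50K SpecInt95 and 70 TB of disk). Only the 20% factor and the Tier 1 figures come from the talk; the rest is arithmetic.

```python
# Arithmetic check of the sizing rule on this slide: Tier 2 ~ 20% of Tier 1.
TIER1 = {"cpu_si95": 209_000, "disk_tb": 365, "tape_pb": 2.0}  # Tier 1 figures from the slide
FRACTION = 0.20

tier2 = {key: value * FRACTION for key, value in TIER1.items()}
print(tier2)
# -> {'cpu_si95': 41800.0, 'disk_tb': 73.0, 'tape_pb': 0.4}
# Compare the "Typical Tier 2" slide: ~50K SpecInt95 of CPU, 70 TB of disk,
# and 0.3-0.5 PB of tertiary store at the two Tier 2 sites with mass storage.
```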

  6. Organization • Facilities Subproject 2.3.2 • Coordination: • Tier 2 centers (Gardner, Shank) • Testbed (May) • Networking (McKee) • Grid monitoring (Yu) Rob Gardner Distributed Facilities for U.S. ATLAS

  7. Role of Tier 2 Centers • User Analysis • Standard configuration optimized for analysis at the AOD level • ESD objects required for some analyses • Primary Resource for Monte Carlo Simulation • Data distribution caches (depends on distribution model) • Remote tertiary data stores • HSM services for quick, local AOD archival • MC data of all types (GEN, RAW, ESD, AOD, LOD) from all Tier 2’s & users • Relieve pressure, improve efficiency of Tier 1 systems • Effective use with grid software Rob Gardner Distributed Facilities for U.S. ATLAS

  8. Typical Tier 2 • CPU: 50KSpecInt95 (t1: 209K SI95) • Commodity Pentium/Linux • Estimated 144 Dual Processor Nodes (t1: 640 nodes) • Online Storage: 70 TB Disk (t1: 365 TB) • High Performance Storage Area Network • Baseline: Fiber Channel Raid Array Rob Gardner Distributed Facilities for U.S. ATLAS

  9. ‘Remote’ Data Stores • Exploit existing infrastructure • mass store infrastructure at 2 of the 5 Tier 2 centers • Assume existing HPSS or equivalent license, tape silo, robot • Augment with drives, media, mover nodes, and disk cache • Each site contributes 0.3-0.5 PB store • AOD archival, MC ESD+AOD archival Rob Gardner Distributed Facilities for U.S. ATLAS

  10. Personnel (Lehman 11/00) Manpower estimate summary in FTEs (WBS No. 2; Funding Type: Infrastructure; Description: US ATLAS Computing; Institutions: All; Funding Source: All; report dated 11/13/00)

Lehman 11/00 profile (FTEs):
Category       FY01  FY02  FY03  FY04  FY05  FY06  Total
IT I            1.0   4.0   6.0  10.0  10.0   7.0   38.0
IT II           0.0   1.0   2.0   2.0   5.0   5.0   15.0
Physicist       1.0   1.0   1.0   1.0   1.0   0.0    5.0
TOTAL LABOR     2.0   6.0   9.0  13.0  16.0  12.0   58.0

Update 10/01, NSF-funded FTEs (GriPhyN and ½ iVDGL reorganized into Software):
Source           FY01  FY02  FY03  FY04  FY05  FY06  Total
GriPhyN           1.0   1.0   1.0   1.0   1.0   0.0    5.0
iVDGL             0.0   3.0   3.0   3.0   3.0   3.0   15.0
PPDG              0.0   0.5   0.5   0.5   0.0   0.0    1.5
ITR2 telemetry    0.0   0.8   1.0   1.0   0.0   0.0    2.8
iVDGL grid ops    0.0   0.5   1.0   1.0   1.0   1.0    4.5
Total             1.0   5.8   6.5   6.5   5.0   4.0   28.8

Rob Gardner Distributed Facilities for U.S. ATLAS

  11. Tier 2 Costs (Lehman 11/01) Rob Gardner Distributed Facilities for U.S. ATLAS

  12. Funding for Tier 2 Centers • Additional funding for prototype tier 2 centers and for permanent tier 2 centers will need to be found Rob Gardner Distributed Facilities for U.S. ATLAS

  13. Schedule • R&D Tier 2 centers • Two prototype tier 2 sites selected in 01: Boston U and Indiana U • Initial (university funded) centers established in 01 • Support analysis of DC1 data in summer 02 • DC2 production and analysis • Production Tier 2’s – FY ‘04 & FY ‘05 • Operation – FY ‘05, FY ‘06 & beyond • Full Scale System Operation, 20% (‘05) to 100% (‘06) (as for Tier 1) Rob Gardner Distributed Facilities for U.S. ATLAS

  14. Persistent Grid Testbed for US-ATLAS ATLAS-US PCAP Meeting at ANL Oct 30, 2001 Ed May Argonne National Laboratory E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  15. Background & Motivation • Based on previous meetings of the US groups, in particular the Summer 2000 meeting at IU and an organizational meeting in Winter 2000-2001 at UM. • Establish a persistent grid testbed of US-ATLAS Tier 1, Tier 2 and other sites, April 2001. • Participating sites: ANL, BNL, LBNL, BU, UM, IU, OU and UTA. • Provide a focus for working with PPDG and GriPhyN, and ultimately with CERN & EDG. E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  16. Participants • ANL HEP: Ed May, Jerry Gieraltowski • LBNL (PDSF): Stu Loken, Shane Canon • BNL: Rich Baker, Torre Wenaus, Danton Yu • Boston U: Saul Youssef, Jim Shank • Indiana U: Rob Gardner • Univ. of Michigan: Shawn McKee, Eric Myers • Univ. of Oklahoma: Horst Severini, Pat Skubic • Univ. of Texas at Arlington: Kaushik De • More information: http://www.usatlas.bnl.gov/computing/grid/ E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  17. 8 Sites in Testbed, ’01 [Map of testbed sites and their network connectivity: U Michigan, Boston University, UC Berkeley / LBNL-NERSC, Argonne National Laboratory, Brookhaven National Laboratory, Oklahoma University, Indiana University, and University of Texas at Arlington; network paths include ESnet, MREN, NPACI, Abilene, CalREN, and NTON; HPSS sites marked.] Rob Gardner Distributed Facilities for U.S. ATLAS

  18. Planning & Implementation • During one year (2001), implement the testbed with Globus 1.1.3 and 1.1.4 • Provide an environment for Grid developers and testers: a relatively small number of friendly users, not production use (a minimal usage sketch follows below) • Establishment of a technical working group with regular phone/VRVS meetings E. May Rob Gardner Distributed Facilities for U.S. ATLAS
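To make the “friendly user” workflow concrete, here is a minimal sketch of how a testbed user might exercise a Globus 1.1.x gatekeeper from Python. It assumes the standard Globus client tools (grid-proxy-info, globus-job-run) are installed and that a proxy has already been created with grid-proxy-init; the gatekeeper contact string is a placeholder, not an actual testbed host.

```python
# Minimal sketch of exercising a Globus gatekeeper on the testbed.
# Assumes the Globus client tools are on the PATH and a grid proxy has already
# been created with grid-proxy-init. The contact string is a placeholder.
import subprocess

GATEKEEPER = "gatekeeper.example.edu/jobmanager-fork"   # hypothetical contact string

def have_valid_proxy() -> bool:
    """Return True if grid-proxy-info reports an existing, unexpired proxy."""
    result = subprocess.run(["grid-proxy-info", "-exists"], capture_output=True)
    return result.returncode == 0

def run_remote(command: str) -> str:
    """Run a simple command at the remote gatekeeper via globus-job-run."""
    result = subprocess.run(
        ["globus-job-run", GATEKEEPER, command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    if have_valid_proxy():
        print(run_remote("/bin/hostname"))   # sanity check: which remote node answered?
    else:
        print("No valid proxy; run grid-proxy-init first.")
```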

  19. P & I continued • Long list of technical issues concerning which services to provide and how to manage them. • Scope of interest varies widely by institution, e.g.: • Data Cataloging and Replication (BNL) • Objectivity Database issues (ANL) • User & Account management (UM, IU) • Remote job execution (BU) E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  20. Activities & Accomplishments • GridView grid testbed status (UTA) • Magda distributed data manager prototype (BNL) • Pacman package manager (BU, BNL) • GRIPE, a grid sign-up tool (IU) • Distributed job management prototyping with Condor (BU, UTA, OU); see the sketch below • Testing of distributed data replication (Magda, GDMP, Globus) with ATLAS applications (Tilecal testbeam, ATLFast in Athena) (ANL, BU, OU) • Network performance and monitoring (UM, BNL) E. May Rob Gardner Distributed Facilities for U.S. ATLAS
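As an illustration of the Condor-based job management prototyping listed above, the sketch below writes a plain vanilla-universe submit description and hands it to condor_submit. The executable, file names, and queue count are hypothetical; this shows only the general shape of a submission, not any group's actual scripts.

```python
# Hypothetical sketch of driving a Condor submission from Python, in the spirit
# of the distributed job management prototyping on this slide.
# Assumes condor_submit is on the PATH; executable and file names are made up.
import subprocess

SUBMIT_FILE = "atlfast_test.sub"

# A minimal vanilla-universe submit description (10 identical jobs).
submit_description = """\
universe   = vanilla
executable = run_atlfast.sh
arguments  = --events 1000
output     = atlfast_$(Process).out
error      = atlfast_$(Process).err
log        = atlfast.log
queue 10
"""

with open(SUBMIT_FILE, "w") as f:
    f.write(submit_description)

# Hand the description to the local Condor scheduler.
subprocess.run(["condor_submit", SUBMIT_FILE], check=True)
```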

  21. Testbed Software • Testbed has been functional for ~ 8 months • Accounts (individual user, group) created at all sites • Grid credentials (based on globus CA) distributed • To be updated with ESnet CA credentials • Grid software at each node in the site: • Globus 1.1.4 • Condor 6.3 • ATLAS core software distribution at some of the sites (for developers) • ATLAS related grid software: Pacman, Magda, Gridview • Start grid-wide applications in 02 Rob Gardner Distributed Facilities for U.S. ATLAS

  22. Future Activities for Testbed • Focus on Environments for Applications • Compatibility with EDG • Preparations for Atlas Data Challenge 1 and 2 E. May Rob Gardner Distributed Facilities for U.S. ATLAS

  23. IU Tier 2 Configuration • Gateway: atlas.uits.iupui.edu • Nodes: atlas01 – atlas16 • 400 MHz PII, 256 MB • 4.3 GB SCSI local disk • 100 Mb/s NIC • Switch • HP ProCurve 4000M 10/100Base-TX • Disk and Storage • /usr/lhc1 60 GB • /usr/lhc2 60 GB • lhc1.uits.iupui.edu 200 GB attached RAID, AFS • Generic atlas account into local HPSS Rob Gardner Distributed Facilities for U.S. ATLAS

  24. IU Notes • Tertiary storage • Currently IBM 3494 robot with ~10 TB ATLAS-dedicated storage • New StorageTek silo to be installed in Feb 02 (capacity 360 TB) • HPSS software • Connectivity: • Located at Indianapolis campus (IUPUI) (OC12) • Better connectivity than Bloomington (DS3) • Bloomington-Indianapolis dark fiber project >1/2 complete; future installations could be located at IUB • Machine room adjacent to Abilene NOC, Global NOC • IU will develop a grid operations center (iGOC) as part of iVDGL • Trouble-ticket system, monitoring, administrative support Rob Gardner Distributed Facilities for U.S. ATLAS

  25. Boston University Tier 2 [Site diagram: campus network access through the BU router to the NoX and on to the Internet, Internet2, Tufts, MIT, and Harvard; RAID disk array; IBM R30 mass store, 100 TB; SGIs, 230 CPUs; IBM SP, 64 CPUs; Linux farm, 128 PIII; GRID conference center; high-end graphics lab with SGI Onyx (4 RE II) and 9 O2s; ATLAS-dedicated and shared resources; OC12 622 Mb/s link.] Rob Gardner Distributed Facilities for U.S. ATLAS

  26. BU Notes • The 100-terabyte mass store will be upgraded to 150 terabytes. • Upgrading the local 100 Mb/s Ethernet to Gigabit is being considered. • Nominal network bandwidth in the previous slide's diagram is proportional to the thickness of the purple lines. • For both the IU and BU clusters, hardware funding in FY 02 will be used to optimize support for analysis of data challenge production • BU: large RAID purchase • IU: some RAID plus CPU upgrades Rob Gardner Distributed Facilities for U.S. ATLAS

  27. US LHC Common Computing Projects • Meeting of ATLAS & CMS PMs, CERN 1/01 • Identify possible common work areas (facilities, networking, database, grid) • Facilities Workshops (R. Baker, V. O'Dell): • BNL (3/01), FNAL (7/01), LBL (10/01) • Networking Workshops (S. McKee): • IU (6/01), UM (10/01) Rob Gardner Distributed Facilities for U.S. ATLAS

  28. Summary • Prototype Tier 2 centers chosen; university-funded resources in place • Persistent testbed for grid projects (PPDG, GriPhyN/iVDGL, EDG) established • US LHC coordination for facilities and networking established, working groups formed • Hiring for prototype Tier 2 centers & ATLAS grid integration begun • Facilities grid planning document coherent with Software grid development; includes networking and facilities infrastructure Rob Gardner Distributed Facilities for U.S. ATLAS
