
Case Studies: MGRID and DZero





Presentation Transcript


  1. Case Studies: MGRID and DZero Abhijit Bose The University of Michigan Ann Arbor, MI 48109 abose@eecs.umich.edu 2003 Summer Computing Institute, SDSC Slides contributed by Shawn McKee, Lee Lueking, Igor Terekhov, Gabriele Garzoglio and Jianming Qian

  2. Just What is a Grid, Anyway? • Three key criteria: • Coordinates distributed resources … • using standard, open protocols and interfaces … • to deliver non-trivial qualities of service to users and applications • What is not a Grid? • A cluster, network attached storage device, scientific instrument, network, etc. • Each is an important component of a Grid, but by itself does not constitute a Grid

  3. Why Grids? • A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour • 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data • Civil engineers collaborate to design, execute, & analyze shake table experiments • Climate scientists visualize, annotate, & analyze terabyte simulation datasets • An emergency response team couples real time data, weather model, population data

  4. Why Grids? (contd) • A multidisciplinary analysis in aerospace couples code and data in four companies • An application service provider purchases cycles from compute cycle providers • Scientists use a specialized (and expensive) microscope at a remote facility on a time-shared basis The bandwidth, reliability and ubiquity of networks enable us to envision transparent access to geographically distributed resources: computers, storage, instruments, communication devices and data.

  5. The Advent of Grid Computing • I would argue we have crossed a threshold in technology that has enabled this new paradigm to arise. • There are numerous examples of the advent of grid computing…

  6. Network for Earthquake Engineering Simulation • NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other • On-demand access to experiments, data streams, computing, archives, collaboration • NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

  7. ATLAS • A Toroidal LHC ApparatuS • Collaboration • 150 institutes • 1850 physicists • Detector • Inner tracker • Calorimeter • Magnet • Muon • United States ATLAS • 29 universities, 3 national labs • 20% of ATLAS

  8. ATLAS

  9. How Much Data is Involved? [Chart: Level-1 trigger rate (Hz) versus event size (bytes), log-log, showing LHCb, ATLAS, CMS, HERA-B, KLOE, TeV II, CDF/D0, H1, ZEUS, ALICE, NA49, UA1 and LEP; annotations mark the high Level-1 trigger (1 MHz), high number of channels / high bandwidth (500 Gbit/s), and high data archive (petabyte) regimes. Source: Hans Hoffman, DOE/NSF Review, Nov 2000]
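
  To read the chart: the data rate an experiment must move is just the Level-1 rate multiplied by the event size. A quick back-of-the-envelope check in Python, using illustrative values picked off the chart axes rather than any experiment's official figures:

      # Back-of-the-envelope: data rate = (Level-1 rate) x (event size).
      # Illustrative values read off the chart axes, not official figures.
      level1_rate_hz = 1e5          # 10^5 events per second after Level-1
      event_size_bytes = 1e6        # ~1 MB per event
      rate_bytes_per_s = level1_rate_hz * event_size_bytes
      print(f"{rate_bytes_per_s:.1e} B/s = {rate_bytes_per_s * 8 / 1e9:.0f} Gbit/s")
      # -> 1.0e+11 B/s = 800 Gbit/s, i.e. the "high bandwidth" corner of the chart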

  10. Where does Michigan Fit? • ATLAS (and D0) will require grid technologies to meet their computational and data handling needs • Many other fields are or will be encountering equivalent problems • Michigan has a long history of leadership in networking and computer science R&D

  11. MGRID – www.mgrid.umich.edu Michigan Grid Research and Infrastructure Development • A center to develop, deploy, and sustain an institutional grid at Michigan • Many groups across the University participate in compute/data/network-intensive research grants – increasingly Grid is the solution • ATLAS, DZero, NPACI, NEESGrid, Visible Human, MCBI, NFSv4, NMI • MGRID allows work on common infrastructure instead of custom solutions

  12. MGRID Center • Central core of technical staff (new hires) • Faculty and staff from participating units • Executive committee from participating units and the provost office • Technical Leadership Team to guide design and development • Collaborative grid research and development with technical staff from participating units • Primary goal: develop and deploy an Institutional Grid for the University of Michigan

  13. MGrid Research Project Partners • Center for Advanced Computing (cac.engin.umich.edu) • College of LS&A (Physics) (www.lsa.umich.edu) • Center for Information Technology Integration (www.citi.umich.edu) • Michigan Center for BioInformatics (www.ctaalliance.org) • Visible Human Project (vhp.med.umich.edu) • Mental Health Research Institute (www.med.umich.edu/mhri) • ITCom (www.itcom.itd.umich.edu) • School of Information (si.umich.edu) • Electron Microbeam Analysis Laboratory (emalwww.engin.umich.edu/emal/fset.html)

  14. MGRID: Goals • Provide participating units knowledge, support and a framework to deploy Grid technologies • Provide testbeds for existing and emerging Grid technologies • Coordinate activities within the national Grid community (GGF, GlobusWorld, …) • Provide a context for the University to invest in computational and other Grid resources

  15. MGRID activities for Year 1 • Grid-enable the CAC computer clusters and mass store. • Install NMI and NPACKage on development computers and assess changes necessary for NMI and NPACKage to support UM IT infrastructure. • Enable the AccessGrid across the campus backbone and within the units that elect to utilize the AccessGrid.

  16. MGRID Activities for Year 1 (cont.) • Issue a Request for Expression of Interest to faculty and staff across the UM. These projects may include instructional activity or other scholarly endeavor as a "research application" of grid computing. • Develop project plans and schedules with research project staff to identify modifications necessary to MGRID to support projects.

  17. MGRID Activities for Year 1 (cont.) • Work with network experts (e.g., the Internet2 End-to-End Performance Initiative Technical Advisory Group) to design and deploy campus network monitors and beacons to measure baseline campus network performance. • Establish collaborative efforts within the UM community to develop and submit proposals to funding agencies to support grid computing at Michigan and to support the use of MGRID resources and services for research and instructional activities. • Pursue interactions with Internet2

  18. Longer term goals • We hope to make significant contributions to general grid problems related to: • Sharing resources among multiple VOs (virtual organizations) • Network monitoring and QoS issues for grids • Integration of middleware with domain-specific applications • Grid File Systems • We want to enable Michigan researchers to take advantage of grid resources both on campus and beyond

  19. [Diagram: MGRID Resource / MGRID User]

  20. An Application Grid Domain Using NPACI Resources: DZero/SAM-Grid Deployment at Michigan (slides contributed by Lee Lueking, Igor Terekhov, and Gabriele Garzoglio at FNAL)

  21. Timelines • Planning meetings (CAC and Fermilab): Sep-Oct 2002 • Demonstration of SAM-Grid at SC2002: Nov 2002 • Deployment/site customization: Dec 2002 - Mar 2003 • NPACI production runs at Michigan: July 2003 (plus site visits; students spent part of their time at Fermilab) • Moving on to scheduling and data movement issues at DZero resource sites

  22. Michigan DZero NPACI Processing and Grid Team: a group of physicists and computer scientists • Project leaders: Jianming Qian (Physics), Abhijit Bose (CAC, EECS) • Graduate students: Brian Wickman (CAC, EECS), James Degenhardt (Physics), Jeremy Herr (Physics), Glenn Lopez (Physics) • Core DZero faculty in Physics: Jianming Qian, Bing Zhou, Drew Alten • Need both science/application experts and computer scientists

  23. The D0 Collaboration • 500+ Physicists • 72 institutions • 18 Countries

  24. Scale of Challenges [Trigger diagram: ~2 MHz collision rate reduced to 5 kHz by Level-1, 1 kHz by Level-2, and 50 Hz by Level-3] • Computing: sufficient CPU cycles and storage, good network bandwidth • Software: efficient reconstruction program, realistic simulation, easy data access, … • A luminosity-dependent physics menu leads to an approximately constant Level-3 output rate • With a combined duty factor of 50%, we are writing at 25 Hz DC, corresponding to 800 million events a year
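
  The 800-million figure follows directly from the quoted rates; a one-line check:

      # Check of the quoted event count: 50 Hz Level-3 output at a 50% combined
      # duty factor is 25 Hz time-averaged ("DC").
      seconds_per_year = 365 * 24 * 3600       # ~3.15e7 s
      avg_rate_hz = 50 * 0.5                   # 25 Hz DC
      events_per_year = avg_rate_hz * seconds_per_year
      print(f"{events_per_year:.2e} events/year")   # ~7.9e8, i.e. ~800 million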

  25. Data Path [Diagram of the data path: raw data, Monte Carlo from offsite farms, the Fermilab farm, the data handling system, and robotic storage, both offsite and at Fermilab]

  26. Major Software • Trigger algorithms (Level-2/Level-3) and simulation: filtering through firmware programming and/or (partial) reconstruction, event building and monitoring; simulating Level-1 hardware and wrapping Level-2 and Level-3 code for offline simulation. • Data management and access: SAM (Sequential Access to data via Meta-data) is used to catalog data files produced by the experiment and provides distributed data storage and access to production and analysis sites. • Event reconstruction: reconstructing all physics object candidates, producing Data Summary Tapes (DST) and Thumbnails (TMB) for further analyses. • Physics and detector simulation: "off the shelf" event generators to simulate physics processes, and home-grown Geant-based (slow) and parameterized (fast) programs to simulate detector responses. • Event display: important tools for code development and physics analyses.
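
  The key idea behind SAM is that users request files by metadata (data tier, run range, stream) rather than by physical location, and the system worries about where the copies live. The toy Python sketch below illustrates that idea only; the record layout, file names, and query function are invented for illustration and are not the actual SAM schema or API.

      from dataclasses import dataclass

      @dataclass
      class FileRecord:
          name: str
          data_tier: str        # e.g. "raw", "dst", "tmb"
          run: int
          location: str         # station / storage system holding a copy

      # Toy catalog; in SAM the metadata lives in a central database.
      CATALOG = [
          FileRecord("d0_run167890_001.raw", "raw", 167890, "fnal-enstore"),
          FileRecord("d0_run167890_001.tmb", "tmb", 167890, "umich-cac"),
      ]

      def query(data_tier, run_min, run_max):
          """Return files matching a metadata query, wherever the copies live."""
          return [f for f in CATALOG
                  if f.data_tier == data_tier and run_min <= f.run <= run_max]

      print(query("tmb", 167000, 168000))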

  27. Computing Architecture [Diagram: central data storage and dØmino at Fermilab, the DØ Central Analysis Backend (CAB), the CluEDØ cluster of Linux desktops, a central farm, remote farms, and remote analysis]

  28. Storage and Disk Needs (for two-year running)
      Data events           1.6 billion
        Raw data            250 kB/event    400 TB
        DST                 150 kB/event    240 TB
        TMB                  10 kB/event     16 TB
      Monte Carlo (Geant)   200 million
        MC DST              400 kB/event     80 TB
        MC TMB               20 kB/event      4 TB
      Monte Carlo (PMCS)    600 million
        MC TMB               20 kB/event     12 TB
      Storage: store all officially produced and user-derived datasets in the Fermilab robotic tape system, ~1.5 PB in total
      Disk: all data and some MC TMBs are disk resident, with sufficient disk cache for user files, 40+ TB at analysis centers
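
  Each volume in the table is simply the event count multiplied by the per-event size; a short check:

      # Sanity check: volume = (number of events) x (event size).
      samples = {
          "raw data":        (1.6e9, 250e3),   # (events, bytes/event)
          "DST":             (1.6e9, 150e3),
          "TMB":             (1.6e9,  10e3),
          "MC DST (Geant)":  (200e6, 400e3),
          "MC TMB (Geant)":  (200e6,  20e3),
          "MC TMB (PMCS)":   (600e6,  20e3),
      }
      total_tb = 0.0
      for name, (n_events, size_bytes) in samples.items():
          tb = n_events * size_bytes / 1e12
          total_tb += tb
          print(f"{name:16s} {tb:6.0f} TB")
      print(f"{'total':16s} {total_tb:6.0f} TB")   # ~750 TB before user-derived data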

  29. Analysis CPU Needs CPU needs are estimated based on the layered analysis approach: • DST based: resource intensive, limited to physics, object ID, and detector groups. Example: insufficient TMB information, improved algorithms, bug fixes, … • TMB based: medium resources required, expected to be done mostly by subgroups. Example: creating derived datasets, direct analyses on TMB, … • Derived datasets: done daily by individuals on their desktops and/or laptops. Example: Root-tree level analyses, selection optimization, … The total CPU need is about 4 THz for a data sample of two-year running

  30. Analysis Computing [Diagram: dØmino, CAB, CluEDØ, and a Regional Analysis Center (RAC)] • dØmino and its backend (CAB) at the Fermilab Computing Center: provided and managed by the Fermilab Computing Division • dØmino is a cluster of SGI O2000 CPUs; it provides limited CPU power but large disk caches and high-performance I/O • CAB is a Linux farm of 160 dual 1.8 GHz AMD CPU nodes on the dØmino backend and should provide the majority of analysis computing at Fermilab • CluEDØ at DØ: over 200 Linux desktop PCs from collaborating institutions, managed by "volunteers" from the collaboration • CAB and CluEDØ are expected to provide half of the estimated analysis CPU needs; the remainder is to be provided by regional analysis centers
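
  For a rough sense of scale, and assuming "160 dual 1.8 GHz AMD CPU" means 160 dual-CPU nodes, CAB's aggregate clock is:

      # Aggregate clock of CAB, assuming 160 dual-CPU nodes at 1.8 GHz each.
      # A clock sum is only a crude throughput proxy, but it is the unit ("THz")
      # used in the ~4 THz analysis estimate on the previous slide.
      cab_ghz = 160 * 2 * 1.8
      print(f"CAB ~ {cab_ghz:.0f} GHz ~ {cab_ghz / 1000:.2f} THz")   # ~0.58 THz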

  31. Overview of SAM (Sequential Access to data via Meta-data) [Diagram: globally shared services (name server, database server(s) with the central database, global resource manager(s), log server) serving station servers at sites 1 through n, each with locally shared mass storage system(s); arrows indicate control and data flow]

  32. Components of a SAM Station [Diagram: station & cache manager, file storage server, file storage clients, file stager(s), project managers, and producers/consumers on worker nodes, with temp disk and cache disk, exchanging data with the MSS or other stations; arrows indicate control and data flow]

  33. SAM Deployment [Map: MC sites/RACs and analysis sites; only the most active sites are shown] • The success of the SAM data handling system is the first step towards utilizing offsite computing resources • SAM stations deployed at collaborating institutions provide easy data storage and access

  34. SAM-GRID SAM-GRID is a Particle Physics Data Grid project. It integrates Job and Information Management (JIM) with the SAM data management system. A "first" version of SAM-GRID was successfully demonstrated at Supercomputing 2002 (SC2002) in Baltimore. SAM-GRID could be an important job management tool for our offsite analysis efforts.

  35. Objectives of SAMGrid • Bring standard grid technologies (Globus and Condor) to the Run II experiments. • Enable globally distributed computing for D0 and CDF. • JIM complements SAM by adding job management and monitoring to data handling. • Together, JIM + SAM = SAMGrid

  36. Principal Functionality • Enable all authorized users to use computing resources away from the Fermilab site • Provide a standard interface for all job submission and monitoring • Jobs can be: 1. Analysis, 2. Reconstruction, 3. Monte Carlo, 4. Generic (vanilla) • JIM v1 features: submission to the SAM station of the user's choice; automatic selection of a SAM station based on the amount of input data cached at each station (see the sketch below); web-based monitoring
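
  The "automatic selection" feature boils down to a brokering rule: prefer the station that already caches the most of the job's input data. The Python sketch below is a toy version of that rule only; the real JIM matchmaking builds on the Condor and Globus machinery named on the previous slide, and the station and file names here are invented.

      # Toy station-selection rule: prefer the station caching the most input data.
      # Station names and file names below are invented for illustration.
      def pick_station(input_files, station_caches):
          """station_caches maps station name -> set of files currently cached."""
          return max(station_caches,
                     key=lambda st: len(input_files & station_caches[st]))

      job_inputs = {"f1.tmb", "f2.tmb", "f3.tmb"}
      station_caches = {
          "fnal-cab":  {"f1.tmb"},
          "umich-cac": {"f1.tmb", "f2.tmb", "f3.tmb"},
          "gridka":    set(),
      }
      print(pick_station(job_inputs, station_caches))   # -> "umich-cac"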

  37. [Diagram: job management architecture. A user interface and submission client hand a job to the job management layer, where a broker with a matchmaking service, a queuing system, and an information collector choose an execution site; each execution site (#1 through #n) provides computing elements, storage elements, grid sensors, and a data handling system]

  38. Site Requirements • Linux i386 hardware architecture machines • SAM station with a working "sam submit" • For MC clusters, mc_runjob installed • Submission and execution sites continuously run SAM and JIM servers, with auto-restart procedures provided • Execution sites can configure their Grid users and batch queues to avoid self-inflicted denial of service, e.g., all users mapped to one local account "d0grid" with limited resources (see the sketch below) • Firewalls must have specific ports open for incoming connections to SAM and JIM; client hosts may include FNAL and all submission sites • Execution sites trust grid credentials used by D0, including DOE Science Grid, FNAL KCA, and others by agreement; to use the FNAL KCA, a Kerberos client must be installed at the submission site
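
  The "one user d0grid" policy is an identity-mapping decision: any trusted grid credential is mapped to a single limited local account. The sketch below illustrates the idea in Python only; real sites implement it with Globus grid-mapfile entries, and the certificate-authority prefixes and the DN shown here are made up.

      # Illustration of the "map every D0 grid user to one local account" policy.
      # Real sites do this with a Globus grid-mapfile; the CA prefixes and the
      # DN below are made up for the example.
      TRUSTED_DN_PREFIXES = ("/DC=org/DC=ExampleGrid", "/DC=gov/DC=ExampleLab")

      def map_to_local_account(subject_dn):
          """Map a grid certificate subject to a local account, or reject it."""
          if subject_dn.startswith(TRUSTED_DN_PREFIXES):
              return "d0grid"   # everyone shares one account with limited resources
          return None           # untrusted credential: no mapping, job refused

      print(map_to_local_account("/DC=org/DC=ExampleGrid/OU=People/CN=Jane Doe"))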

  39. JIM V1 Deployment • A site can join SAM-Grid with any combination of services: monitoring, execution, submission • April 1, 2003: expect 5 initial sites for SAMGrid deployment and 20 submission sites • May 1, 2003: a few additional sites, depending on the success and use of the initial deployment • Summer 2003: continue to add execution and submission sites; hope to grow to dozens of execution sites and hundreds of submission sites • CAB is a powerful resource at FNAL, but Globus software is not well supported on IRIX (the CAB station server runs on d0mino), and the FNAL computer security team restricts Grid jobs to in-situ executables, or requires KCA certificates for user-supplied executables

  40. SAMGrid Dependencies

  41. Expectations from D0 • By March 15: need 2 volunteers to help set up beta sites and conduct submission tests • We expect runjob to be interfaced with the JIM v1 release to run MC jobs • In the early stages, it may require ½ FTE at each site to deploy, troubleshoot, and fix problems • Initial deployment expectations: GridKa – analysis site; Imperial College and Lancaster – MC sites; U. Michigan (NPACI) – reconstruction center • Second round of deployments: Lyon (ccin2p3), Manchester, MSU, Princeton, UTA • Others include NIKHEF and Prague; need to understand EDG/LCG implications

  42. How Do We Spell Success? • Figures of merit: • Number of jobs successfully started in remote execution batch queues (if a job crashes, it's beyond our control; issues related to data delivery could be included as a special failure mode) • How effectively CPU is used at remote sites (we may change the scheduling algorithm for job submission and/or tune queue configuration at sites; requires cooperation of participating sites) • Ease of use and how much work gets done on the Grid
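
  These figures of merit are easy to compute once per-job records are collected; below is a hedged sketch with a made-up record format:

      # Figures of merit from per-job records; the record format here is made up:
      # (started_ok, cpu_seconds_used, wall_seconds_held).
      jobs = [
          (True,  3200, 3600),
          (True,  1800, 3600),
          (False,    0,    0),   # never started in the remote batch queue
      ]
      started = [j for j in jobs if j[0]]
      start_rate = len(started) / len(jobs)
      cpu_efficiency = sum(j[1] for j in started) / sum(j[2] for j in started)
      print(f"job-start success rate:         {start_rate:.0%}")      # 67%
      print(f"CPU efficiency at remote sites: {cpu_efficiency:.0%}")  # 69%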
