
LHC Experiments and the PACI: A Partnership for Global Data Analysis
Harvey B. Newman, Caltech
Advisory Panel on CyberInfrastructure, National Science Foundation
November 29, 2001
http://l3www.cern.ch/~newman/LHCGridsPACI.ppt


Presentation Transcript


  1. LHC Experiments and the PACI: A Partnership for Global Data Analysis
Harvey B. Newman, Caltech
Advisory Panel on CyberInfrastructure, National Science Foundation
November 29, 2001
http://l3www.cern.ch/~newman/LHCGridsPACI.ppt

  2. Global Data Grid Challenge “Global scientific communities, served by networks with bandwidths varying by orders of magnitude, need to perform computationally demanding analyses of geographically distributed datasets that will grow by at least 3 orders of magnitude over the next decade, from the 100 Terabyte to the 100 Petabyte scale [from 2000 to 2007]”

  3. The Large Hadron Collider (2006-)
• The next-generation particle collider, and the largest superconducting installation in the world
• Bunch-bunch collisions at 40 MHz, each generating ~20 interactions
• Only one collision in a trillion may lead to a major physics discovery
• Real-time data filtering: Petabytes per second down to Gigabytes per second
• Accumulated data of many Petabytes/Year
• Large data samples explored and analyzed by thousands of globally dispersed scientists, in hundreds of teams
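A rough consistency check of the last two data-rate bullets, assuming (as round numbers not stated on the slide) about 10^7 seconds of LHC running per year and an average filtered output of order 1 GB/s:

```latex
% Round-number check (assumed inputs, not figures from the slide)
\[
  \sim\!1\ \mathrm{GB/s}\;\times\;\sim\!10^{7}\ \mathrm{s/yr}
  \;\approx\; 10\ \mathrm{PB/yr}
\]
```

This is of the same order as the "many Petabytes/Year" quoted above.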

  4. Four LHC Experiments: The Petabyte to Exabyte Challenge
ATLAS, CMS, ALICE, LHCb: Higgs + new particles; Quark-Gluon Plasma; CP Violation
• Data stored: ~40 Petabytes/Year and up; CPU: 0.30 Petaflops and up
• 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (2007) (~2012?) for the LHC Experiments

  5. Evidence for the Higgs at LEP at M ~ 115 GeV
The LEP Program Has Now Ended

  6. LHC: Higgs Decay into 4 muons
1000X the LEP data rate: 10^9 events/sec, selectivity 1 in 10^13 (1 person in a thousand world populations)
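The quoted selectivity implies only a handful of selected events per day:

```latex
\[
  10^{9}\ \mathrm{events/s}\times 10^{-13}
  \;=\; 10^{-4}\ \mathrm{events/s}
  \;\approx\; 9\ \mathrm{events/day}
\]
```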

  7. LHC Data Grid Hierarchy
[Diagram, flattened in this transcript: Online System to CERN Tier 0+1, then Tier 1 centers, Tier 2 centers, Tier 3 institutes, Tier 4 workstations; HPSS at CERN and the Tier 1 sites.]
• CERN/Outside resource ratio ~1:2; Tier0 : (all Tier1) : (all Tier2) ~ 1:1:1
• Experiment to Online System: ~PByte/sec; Online System to CERN Tier 0+1 (700k SI95, ~1 PB disk, tape robot): ~100-400 MBytes/sec
• Tier 0+1 to Tier 1 centers (e.g. FNAL: 200k SI95, 600 TB; IN2P3, INFN, RAL centers): ~2.5 Gbits/sec
• Tier 1 to Tier 2 centers: ~2.5 Gbps; Tier 2 to Tier 3 institute servers (~0.25 TIPS): ~2.5 Gbps
• Tier 3 to Tier 4 (physics data cache, workstations): 100 - 1000 Mbits/sec
• Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
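For readers who prefer a machine-readable summary, the hierarchy above can be written down as a configuration-like structure. The following Python sketch only re-encodes the planning numbers quoted on the slide; the field names are invented for illustration:

```python
# Illustrative encoding of the LHC Data Grid hierarchy as quoted on this slide.
# The figures are the slide's planning numbers, not measurements.
lhc_grid_hierarchy = {
    "tier0_1": {"site": "CERN", "cpu_si95": 700_000,
                "storage": "~1 PB disk + tape robot (HPSS)",
                "ingest": "~100-400 MB/s from the online system"},
    "tier1": {"example": "FNAL", "cpu_si95": 200_000, "disk_tb": 600,
              "others": ["IN2P3", "INFN", "RAL"],
              "link_to_tier0": "~2.5 Gb/s"},
    "tier2": {"link_to_tier1": "~2.5 Gbps"},
    "tier3": {"unit": "institute server", "cpu": "~0.25 TIPS",
              "link_to_tier4": "100-1000 Mb/s"},
    "tier4": {"unit": "physics data cache + workstations"},
    "resource_ratios": {"CERN:outside": "~1:2", "Tier0:Tier1:Tier2": "~1:1:1"},
}
```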

  8. A Preview of the Grid Hierarchy and Networks of the LHC Era: TeraGrid (NCSA, ANL, SDSC, Caltech)
• StarLight: Int'l Optical Peering Point (see www.startap.net)
• DTF Backplane (4x: 40 Gbps) linking Pasadena (Caltech), San Diego (SDSC), Urbana (NCSA/UIUC) and Chicago (Starlight / NW Univ, UIC, Ill Inst of Tech, Univ of Chicago, ANL), with multiple carrier hubs and Abilene connectivity via Chicago and Indianapolis (Abilene NOC)
• Links: OC-48 (2.5 Gb/s, Abilene); multiple 10 GbE (Qwest); multiple 10 GbE (I-WIRE dark fiber)
• Solid lines in place and/or available in 2001; dashed I-WIRE lines planned for Summer 2002
Source: Charlie Catlett, Argonne

  9. Current Grid Challenges: Resource Discovery, Co-Scheduling, Transparency
• Discovery and Efficient Co-Scheduling of Computing, Data Handling, and Network Resources
• Effective, Consistent Replica Management
• Virtual Data: Recomputation Versus Data Transport Decisions (illustrated in the sketch below)
• Reduction of Complexity in a "Petascale" World
• "GA3": Global Authentication, Authorization, Allocation
• VDT: Transparent Access to Results (and Data When Necessary)
• Location Independence of the User Analysis, Grid, and Grid-Development Environments
• Seamless Multi-Step Data Processing and Analysis: DAGMan (Wisc), MOP+IMPALA (FNAL)
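To illustrate the recomputation-versus-transport decision named in the list above, here is a minimal hedged sketch. The cost model and all inputs are caller-supplied estimates, not anything prescribed by the Grid projects mentioned on the slide:

```python
def cheaper_to_recompute(size_gb: float, wan_mbps: float,
                         cpu_hours: float, free_cpus: int) -> bool:
    """Crude 'virtual data' decision: rematerialize a derived dataset locally
    if that is estimated to finish sooner than shipping it over the WAN.
    All inputs are estimates supplied by the caller."""
    transfer_s = size_gb * 8_000.0 / wan_mbps          # GB -> Mbit, then Mbit / Mbps
    recompute_s = cpu_hours * 3600.0 / max(free_cpus, 1)
    return recompute_s < transfer_s

# Example: a 100 GB derived collection, a shared 155 Mbps transatlantic link,
# 50 CPU-hours of regeneration work, 25 idle CPUs at the local site
print(cheaper_to_recompute(100, 155, 50, 25))
# ~5160 s to transfer vs ~7200 s to recompute -> False (transfer wins here)
```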

  10. CMS Production: Event Simulation and Reconstruction ("Grid-Enabled", Automated)
Worldwide production at 12 sites, using common production tools (IMPALA) and GDMP, covering simulation and digitization (with and without pile-up).
Sites: CERN, FNAL, Moscow, INFN, Caltech, UCSD, UFL, Imperial College, Bristol, Wisconsin, IN2P3, Helsinki.
[Original slide shows a per-site status table, flattened in this transcript: most steps fully operational, some steps in progress at Moscow and not yet operational at IN2P3 and Helsinki.]

  11. US CMS TeraGrid Seamless Prototype
• Caltech/Wisconsin Condor/NCSA Production; Simple Job Launch from Caltech
• Authentication Using Globus Security Infrastructure (GSI); Resources Identified Using Globus Information Infrastructure (GIS)
• CMSIM Jobs (Batches of 100, 12-14 Hours, 100 GB Output) Sent to the Wisconsin Condor Flock Using Condor-G
• Output Files Automatically Stored in NCSA Unitree (GridFTP)
• ORCA Phase: Read-in and Process Jobs at NCSA; Output Files Automatically Stored in NCSA Unitree
• Future: Multiple CMS Sites; Storage in Caltech HPSS Also, Using GDMP (With LBNL's HRM)
• Animated Flow Diagram of the DTF Prototype: http://cmsdoc.cern.ch/~wisniew/infrastructure.html
(A schematic driver for this job chain is sketched below.)
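A schematic driver for the chain described above might look like the following sketch. Only the tool names (Condor-G's condor_submit and the GridFTP client globus-url-copy) correspond to things named on the slide; the hosts, paths, and submit-file contents are illustrative placeholders, not the production configuration:

```python
"""Schematic driver for the Caltech -> Wisconsin -> NCSA chain sketched above.
Only the tool names (condor_submit, globus-url-copy) come from the slide;
the hosts, paths, and submit-file contents are illustrative placeholders."""
import pathlib
import subprocess
import textwrap

def submit_cmsim_batch(batch_id: int) -> None:
    # Condor-G submit description (schematic; attribute values are placeholders)
    submit = textwrap.dedent(f"""\
        universe        = globus
        globusscheduler = condor.example.wisc.edu/jobmanager-condor
        executable      = run_cmsim.sh
        arguments       = --batch {batch_id} --events 100
        output          = cmsim_{batch_id}.out
        error           = cmsim_{batch_id}.err
        queue
        """)
    sub_file = pathlib.Path(f"cmsim_{batch_id}.sub")
    sub_file.write_text(submit)
    subprocess.run(["condor_submit", str(sub_file)], check=True)

def archive_output(batch_id: int) -> None:
    # Run after the batch completes: stage its output into mass storage via
    # GridFTP (both URLs are placeholders)
    src = f"gsiftp://condor.example.wisc.edu/data/cmsim_{batch_id}.fz"
    dst = f"gsiftp://unitree.example.ncsa.edu/cms/cmsim_{batch_id}.fz"
    subprocess.run(["globus-url-copy", src, dst], check=True)
```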

  12. Baseline BW for the US-CERN Link: HENP Transatlantic WG (DOE+NSF)
Transoceanic networking integrated with the TeraGrid, Abilene, regional nets and continental network infrastructures in the US, Europe, Asia, and South America.
US-CERN Plans: 155 Mbps to 2 x 155 Mbps this Year; 622 Mbps in April 2002; DataTAG 2.5 Gbps Research Link in Summer 2002; 10 Gbps Research Link in ~2003

  13. Transatlantic Net WG (HN, L. Price): Bandwidth Requirements [*]
[*] Installed BW, with a maximum link occupancy of 50% assumed.
The network challenge is shared by both next- and present-generation experiments.
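The 50% maximum link occupancy assumption simply means the installed bandwidth is sized at twice the required average throughput:

```latex
\[
  \mathrm{BW}_{\mathrm{installed}} \;=\; \frac{\mathrm{throughput}_{\mathrm{required}}}{0.5}
  \;=\; 2\times \mathrm{throughput}_{\mathrm{required}}
\]
```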

  14. Internet2 HENP Networking WG [*] Mission
• To help ensure that the required national and international network infrastructures, standardized tools and facilities for high-performance and end-to-end monitoring and tracking, and collaborative systems are developed and deployed in a timely manner, and used effectively to meet the needs of the US LHC and other major HENP programs, as well as the general needs of our scientific community.
• To carry out these developments in a way that is broadly applicable across many fields, within and beyond the scientific community.
[*] Co-Chairs: S. McKee (Michigan), H. Newman (Caltech); with thanks to R. Gardner and J. Williams (Indiana)

  15. Grid R&D: Focal Areas for NPACI/HENP Partnership
• Development of Grid-Enabled User Analysis Environments
• CLARENS (+IGUANA) Project for Portable Grid-Enabled Event Visualization, Data Processing and Analysis
• Object Integration: backed by an ORDBMS, and File-Level Virtual Data Catalogs
• Simulation Toolsets for Systems Modeling and Optimization; for example, the MONARC System
• Globally Scalable Agent-Based Realtime Information Marshalling Systems, to face the next-generation challenge of Dynamic Global Grid design and operations
• Self-learning (e.g. SONN) optimization
• Simulation (Now-Casting) enhanced: to monitor, track and forward-predict site, network and global system state
• 1-10 Gbps Networking development and global deployment; work with the TeraGrid, STARLIGHT, Abilene, the iVDGL GGGOC, HENP Internet2 WG, Internet2 E2E, and DataTAG
• Global Collaboratory Development: e.g. VRVS, Access Grid

  16. CLARENS: a Data Analysis Portal to the Grid: Steenberg (Caltech)
• A highly functional graphical interface, Grid-enabling the working environment for "non-specialist" physicists' data analysis
• Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol; this ensures implementation independence
• The server is implemented in C++ to give access to the CMS OO analysis toolkit
• The server will provide a remote API to Grid tools: security services provided by the Grid (GSI); the Virtual Data Toolkit (object collection access); data movement between Tier centers using GSI-FTP; CMS analysis software (ORCA/COBRA)
• Current prototype is running on the Caltech Proto-Tier2
• More information at http://heppc22.hep.caltech.edu, along with a web-based demo
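Because Clarens talks to its clients over commodity XML-RPC, a client can be very thin. The sketch below uses Python's standard xmlrpc.client; the endpoint URL (port and path) and the method names are assumptions for illustration only, not the actual Clarens API:

```python
# Minimal XML-RPC client sketch for a Clarens-style analysis server.
# The endpoint and the method names below are illustrative placeholders.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://heppc22.hep.caltech.edu:8080/clarens")

# Hypothetical calls: list available object collections, then open one by name
collections = server.catalog.list()
handle = server.data.open("/cms/orca/example_collection")
print(collections, handle)
```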

  17. Modeling and Simulation: the MONARC System
• Modelling and understanding current systems, their performance and limitations, is essential for the design of the future large-scale distributed processing systems.
• The simulation program developed within the MONARC (Models Of Networked Analysis At Regional Centers) project is based on a process-oriented approach to discrete event simulation. It is built on Java(TM) technology and provides a realistic modelling tool for such large-scale distributed systems.
SIMULATION of Complex Distributed Systems
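MONARC itself is a process-oriented Java simulator; purely to illustrate the discrete-event idea it builds on, here is a minimal, self-contained event-queue sketch (not MONARC code, and the job timings are made up):

```python
# Minimal discrete-event loop illustrating the approach MONARC builds on
# (MONARC itself is a far richer, process-oriented Java system).
import heapq
import itertools

_counter = itertools.count()   # tie-breaker for events at the same time

def simulate(initial_events, horizon):
    """initial_events: iterable of (time, action); action(t) may return
    further (delay, action) pairs to schedule relative to the current time."""
    queue = [(t, next(_counter), a) for t, a in initial_events]
    heapq.heapify(queue)
    while queue:
        t, _, action = heapq.heappop(queue)
        if t > horizon:
            break
        for dt, new_action in action(t) or []:
            heapq.heappush(queue, (t + dt, next(_counter), new_action))

# Example: a job arrives every 100 s and keeps a CPU busy for 250 s
def job_arrival(t):
    print(f"{t:7.1f} s  job submitted")
    return [(100.0, job_arrival), (250.0, job_done)]

def job_done(t):
    print(f"{t:7.1f} s  job finished")
    return []

simulate([(0.0, job_arrival)], horizon=600.0)
```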

  18. MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 9)
[Figure: three regional centres (CERN, 30 CPUs; CALTECH, 25 CPUs; NUST, 20 CPUs) connected by links of 1 MB/s (150 ms RTT), 1.2 MB/s (150 ms RTT) and 0.8 MB/s (200 ms RTT); efficiencies at Day 9: <E> = 0.83, 0.73 and 0.66.]

  19. Maximizing US-CERN TCP Throughput (S. Ravot, Caltech)
TCP Protocol Study: Limits
• We determined precisely the parameters which limit the throughput over a high-BW, long-delay (170 msec) network, and how to avoid intrinsic limits and unnecessary packet loss
Methods Used to Improve TCP
• Linux kernel programming in order to tune TCP parameters; we modified the TCP algorithm; a Linux patch will soon be available
Result: the Current State of the Art for Reproducible Throughput
• 125 Mbps between CERN and Caltech; 135 Mbps between CERN and Chicago
Status: Ready for Tests at Higher BW (622 Mbps) in Spring 2002
[Figure: congestion window behavior of a TCP connection over the transatlantic line, showing reproducible 125 Mbps between CERN and Caltech/CACR. Annotations: (1) a packet is lost; (2) Fast Recovery, a temporary state to repair the loss; (3) back to slow start when Fast Recovery cannot repair the loss and the lost packet is detected by timeout (cwnd = 2 MSS). Losses occur when the cwnd is larger than 3.5 MByte.]
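The figures on this slide hang together through the standard window-limited-throughput relation, using the 170 ms RTT quoted above:

```latex
\[
  \mathrm{throughput} \;\lesssim\; \frac{\mathrm{cwnd}}{\mathrm{RTT}},
  \qquad
  \mathrm{cwnd} \;\approx\; \frac{125\ \mathrm{Mb/s}\times 0.17\ \mathrm{s}}{8\ \mathrm{b/B}}
  \;\approx\; 2.7\ \mathrm{MB}
\]
```

So sustaining 125 Mbps over the 170 ms path requires a congestion window of roughly 2.7 MB, which is why a loss near cwnd ~ 3.5 MByte followed by a return to slow start is so costly.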

  20. Agent-Based Distributed System: JINI Prototype (Caltech/Pakistan)
[Diagram: Lookup Discovery Service and Lookup Services, service listeners with remote notification and registration, and Station Servers connected through a Proxy Exchange.]
• Includes "Station Servers" (static) that host mobile "Dynamic Services"
• Servers are interconnected dynamically to form a fabric in which mobile agents travel, with a payload of physics analysis tasks
• Prototype is highly flexible and robust against network outages
• Amenable to deployment on leading-edge and future portable devices (WAP, iAppliances, etc.)
• "The" system for the travelling physicist
• The design and studies with this prototype use the MONARC Simulator, and build on SONN studies
See http://home.cern.ch/clegrand/lia/

  21. Globally Scalable Monitoring Service
[Diagram: Lookup Services (with discovery), a proxy, clients (other services), registration, and per-farm "Farm Monitor" components feeding an RC Monitor Service; data gathered by push & pull via rsh/ssh, existing scripts, and snmp.]
• Component Factory
• GUI marshaling
• Code Transport
• RMI data access
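As a hedged illustration of the "pull" mode listed above (running existing scripts over ssh and collecting their output), the host names and probe command below are placeholders; a real deployment would use the site's own scripts or snmp:

```python
# Pull-mode farm monitoring sketch: run an existing probe command over ssh on
# each farm node and collect the output. Host names and the probe command are
# placeholders for illustration.
import subprocess

FARM_NODES = ["node01.example.org", "node02.example.org"]
PROBE_CMD = "uptime"   # stand-in for an existing monitoring script

def poll_farm():
    readings = {}
    for host in FARM_NODES:
        try:
            result = subprocess.run(["ssh", host, PROBE_CMD],
                                    capture_output=True, text=True, timeout=30)
            readings[host] = result.stdout.strip() or result.stderr.strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            readings[host] = f"unreachable: {exc}"
    return readings

if __name__ == "__main__":
    for host, value in poll_farm().items():
        print(f"{host}: {value}")
```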

  22. Examples
• GLAST meeting: 10 participants connected via VRVS (and 16 participants in audio only)
• VRVS: 7300 hosts; 4300 registered users in 58 countries; 34 reflectors, 7 in Internet2; annual growth 250%
• US CMS will use the CDF/KEK remote control room concept for Fermilab Run II as a starting point. However, we will (1) expand the scope to encompass a US-based physics group and US LHC accelerator tasks, and (2) extend the concept to a Global Collaboratory for realtime data acquisition + analysis

  23. Next Round Grid Challenges: Global Workflow Monitoring, Management, and Optimization • Workflow Management, Balancing Policy Versus Moment-to-moment Capability to Complete Tasks • Balance High Levels of Usage of Limited Resources Against Better Turnaround Times for Priority Jobs • Goal-Oriented; According to (Yet to be Developed) Metrics • Maintaining a Global View of Resources and System State • Global System Monitoring, Modeling, Quasi-realtime simulation; feedback on the Macro- and Micro-Scales • Adaptive Learning: new paradigms for execution optimization and Decision Support (eventually automated) • Grid-enabled User Environments

  24. PACI, TeraGrid and HENP
• The scale, complexity and global extent of the LHC data analysis problem are unprecedented
• The solution of the problem, using globally distributed Grids, is mission-critical for frontier science and engineering
• HENP has a tradition of deploying new highly functional systems (and sometimes new technologies) to meet its technical and ultimately its scientific needs
• HENP problems are mostly "embarrassingly" parallel, but potentially "overwhelming" in their data- and network-intensiveness
• HENP/Computer Science synergy has increased dramatically over the last two years, focused on Data Grids, with successful collaborations in GriPhyN, PPDG, and the EU Data Grid
• The TeraGrid (present and future) and its development program is scoped at an appropriate level of depth and diversity to tackle the LHC and other "Petascale" problems, over a 5-year time span matched to the LHC schedule, with full operations in 2007

  25. Some Extra Slides Follow

  26. Computing Challenges: LHC Example
• Geographical dispersion: of people and resources (5000+ Physicists, 250+ Institutes, 60+ Countries)
• Complexity: the detector and the LHC environment
• Scale: Tens of Petabytes per year of data
• Major challenges associated with: communication and collaboration at a distance; network-distributed computing and data resources; remote software development and physics analysis
• R&D: New Forms of Distributed Systems: Data Grids

  27. Why Worldwide Computing? Regional Center Concept Goals • Managed, fair-shared access for Physicists everywhere • Maximize total funding resources while meeting the total computing and data handling needs • Balance proximity of datasets to large central resources, against regional resources under more local control • Tier-N Model • Efficient network use: higher throughput on short paths • Local > regional > national > international • Utilizing all intellectual resources, in several time zones • CERN, national labs, universities, remote sites • Involving physicists and students at their home institutions • Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region • And/or by Common Interests (physics topics, subdetectors,…) • Manage the System’s Complexity • Partitioning facility tasks, to manage and focus resources

  28. HENP Related Data Grid Projects
Funded Projects
• PPDG I (USA, DOE): $2M, 1999-2001
• GriPhyN (USA, NSF): $11.9M + $1.6M, 2000-2005
• EU DataGrid (EU, EC): €10M, 2001-2004
• PPDG II (CP) (USA, DOE): $9.5M, 2001-2004
• iVDGL (USA, NSF): $13.7M + $2M, 2001-2006
• DataTAG (EU, EC): €4M, 2002-2004
About to be Funded
• GridPP* (UK, PPARC): >$15M?, 2001-2004 [* = in final stages of approval]
Many national projects of interest to HENP
• Initiatives in US, UK, Italy, France, NL, Germany, Japan, ...
• EU networking initiatives (Géant, SURFNet)
• US Distributed Terascale Facility ($53M, 12 TFlops, 40 Gb/s network)

  29. Network Progress and Issues for Major Experiments
• Network backbones are advancing rapidly to the 10 Gbps range: "Gbps" end-to-end data flows will soon be in demand
• These advances are likely to have a profound impact on the major physics experiments' Computing Models
• We need to work on the technical and political network issues: share technical knowledge of TCP (windows, multiple streams, OS kernel issues); provide a user toolset
• Getting higher bandwidth to regions outside W. Europe and the US (China, Russia, Pakistan, India, Brazil, Chile, Turkey, etc.), even to enable their collaboration
• Advanced integrated applications, such as Data Grids, rely on seamless "transparent" operation of our LANs and WANs, with reliable, quantifiable (monitored), high performance
• Networks need to become part of the Grid(s) design
• New paradigms of network and system monitoring and use need to be developed, in the Grid context

  30. Grid-Related R&D Projects in CMS: Caltech, FNAL, UCSD, UWisc, UFl
• Installation, Configuration and Deployment of Prototype Tier2 Centers at Caltech/UCSD and Florida
• Large Scale Automated Distributed Simulation Production
• DTF "TeraGrid" (Micro-)Prototype: CIT, Wisconsin Condor, NCSA
• Distributed MOnte Carlo Production (MOP): FNAL
• "MONARC" Distributed Systems Modeling; simulation system applications to Grid hierarchy management (site configurations, analysis model, workload)
• Applications to strategy development, e.g. inter-site load balancing using a "Self Organizing Neural Net" (SONN)
• Agent-based System Architecture for Distributed Dynamic Services
• Grid-Enabled Object Oriented Data Analysis

  31. MONARC Simulation System Validation
[Figure: comparison of MONARC simulation with measurements on the CMS Proto-Tier1 production farm at FNAL and the CMS farm at CERN; mean measured value ~48 MB/s; Muon <0.90> and Jet <0.52> samples, measurement versus simulation.]

  32. MONARC SONN: 3 Regional Centres Learning to Export Jobs (Day 0)
[Figure: the same three regional centres as slide 18 (CERN, 30 CPUs; CALTECH, 25 CPUs; NUST, 20 CPUs) with links of 1 MB/s (150 ms RTT), 1.2 MB/s (150 ms RTT) and 0.8 MB/s (200 ms RTT), shown at Day 0 before any learning has taken place.]

  33. US CMS Remote Control Room for the LHC

  34. Bandwidth Greedy Grid-enabled Object Collection Analysis for Particle Physics (SC2001 Demo)
Julian Bunn, Ian Fisk, Koen Holtman, Harvey Newman, James Patton
[Setup: a Denver client holding a "Tag" database of ~140,000 small objects, connected by parallel tuned GSI FTP to a Caltech Tier2 server (full event database of ~100,000 large objects) and a San Diego Tier2 server (full event database of ~40,000 large objects).]
The object of this demo is to show grid-supported interactive physics analysis on a set of 144,000 physics events. Initially we start out with 144,000 small Tag objects, one for each event, on the Denver client machine. We also have 144,000 LARGE objects, containing full event data, divided over the two Tier2 servers.
• Using the local Tag event database, the user plots event parameters of interest
• The user selects a subset of events to be fetched for further analysis
• Lists of matching events are sent to Caltech and San Diego
• The Tier2 servers begin sorting through their databases, extracting the required events
• For each required event, a new large virtual object is materialized in the server-side cache; this object contains all tracks in the event
• The database files containing the new objects are sent to the client using Globus FTP; the client adds them to its local cache of large objects
• The user can now plot event parameters not available in the Tag
• Future requests take advantage of previously cached large objects in the client (see the client-side sketch below)
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
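The client-side flow of the demo can be sketched as follows. The server names and the transfer helper are illustrative stubs; the real demo moved database files with tuned parallel GSI FTP / Globus FTP, and each Tier2 server held a disjoint subset of the events:

```python
# Client-side sketch of the SC2001 demo flow: cut on the local Tag collection,
# send the matching event ids to the Tier2 servers, and cache the large objects
# that come back so later requests reuse earlier transfers.
TIER2_SERVERS = ["tier2.caltech.example", "tier2.ucsd.example"]
large_object_cache = {}          # event_id -> full event object

def request_large_objects(server, event_ids):
    """Stub standing in for the GSI FTP transfer of database files holding the
    materialized large objects (in the real demo each server returns only the
    events it actually holds)."""
    return [{"event_id": eid, "tracks": [], "source": server} for eid in event_ids]

def select_events(tags, predicate):
    """Steps 1-2: plot/cut on Tag parameters, keep the interesting event ids."""
    return [t["event_id"] for t in tags if predicate(t)]

def fetch_events(event_ids):
    """Steps 3-8: request only the events not already cached, then analyze
    from the local cache."""
    missing = [eid for eid in event_ids if eid not in large_object_cache]
    for server in TIER2_SERVERS:
        for obj in request_large_objects(server, missing):
            large_object_cache[obj["event_id"]] = obj
    return [large_object_cache[eid] for eid in event_ids]

# Tiny usage example with fake Tag data
tags = [{"event_id": i, "pt": i * 0.5} for i in range(10)]
chosen = select_events(tags, lambda t: t["pt"] > 3.0)
print(len(fetch_events(chosen)), "events available locally")
```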
