Condensed Matter, Atomic Physics, Biological Physics, Physics, Grid, Astrophysics, Symmetry, Beauty, Nuclear Physics, Large Hadron Collider, Particle Physics, Cosmology
Avoiding Gridlock
Tony Doyle, Particle Physics Masterclass, Glasgow, 11 June 2009
Outline
• The Icemen Cometh
• Introduction: origin, and why?
• What is the Grid?
• How does the Grid work?
• When will it be ready?
Historical Perspective: The World Wide Web
A global information system which users can read and write via computers connected to the Internet. "Born" on 13 March 1989, when Tim Berners-Lee at CERN submitted a proposal on "Information Management".
• 1989-91: Development. The first three years were a phase of persuasion to get the Web adopted.
• 1992-1995: Growth. The load on the first Web server ("info.cern.ch") rose steadily by a factor of 10 every year.
• 1996-1998: Commercialisation. Google and other search engines.
• 1999-2001: "Dot-com" boom (and bust).
• 2002-present: The ubiquitous Web. Web 2.0: blogs and RSS.
Data is everywhere…
Q: How much data have humans produced? Roughly 1 zettabyte, i.e. 10^21 bytes (1,000,000,000,000,000,000,000 bytes), and the amount is roughly doubling each year. According to IDC, as of 2006 the total amount of digital data in existence was 0.161 zettabytes; the same paper estimates that by 2010 the rate of digital data generated worldwide will be 0.988 zettabytes per year.
Q: What is done with the data? Nothing, or we read it, listen to it, watch it, analyse it, calculate what the weather is going to do, calculate how proteins fold.
A computer program that does such a calculation is a "Job". The toy example on the slide:
Read A (= 2)
Read B (= 3)
C = A + B
Print C (= 5)
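As a minimal sketch (Python is used here purely for illustration; it is not part of the original slides), the toy "Read A, Read B" job above looks like this when written out as a runnable program:

```python
# A toy "job": read two values, add them, print the result.
# This mirrors the slide's example (A = 2, B = 3, C = 5);
# the function name is illustrative, not from the talk.

def run_job(a: int, b: int) -> int:
    """Read A, read B, compute C = A + B, print C."""
    c = a + b
    print(c)
    return c

if __name__ == "__main__":
    run_job(2, 3)  # prints 5
```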
Why "Grid"?
• Analogy with the electricity power grid: power stations, a distribution infrastructure, and a 'standard interface' for users.
The computing Grid: computing and data centres, connected by the fibre optics of the Internet.
Why do particle physicists need the Grid?
The LHC at CERN: the world's most powerful particle accelerator, with four large experiments.
Why do particle physicists need the Grid?
An example from the LHC, starting from a single recorded event:
• ~100,000,000 electronic channels
• 800,000,000 proton-proton interactions per second
• 0.0002 Higgs per second
• 10 PBytes of data a year (10 million GBytes, about 14 million CDs)
One year's data from the LHC would fill a stack of CDs 20 km high, far taller than Mt. Blanc (4.8 km) and higher than Concorde flies (15 km).
We are looking for the Higgs "signature" among all of this. Selectivity: 1 in 10^13, like looking for 1 person in a thousand world populations, or for a needle in 20 million haystacks!
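To see where the "14 million CDs, 20 km high" figures come from, here is a back-of-the-envelope check (a sketch only; the 700 MB capacity and 1.2 mm thickness per CD are my assumptions, not numbers from the slide):

```python
# Rough check of the CD-stack claim for one year of LHC data.
# Assumed CD parameters (not from the slide): 700 MB capacity, 1.2 mm thick.

DATA_PER_YEAR_BYTES = 10e15          # 10 PBytes
CD_CAPACITY_BYTES = 700e6            # ~700 MB per CD (assumption)
CD_THICKNESS_M = 1.2e-3              # ~1.2 mm per CD (assumption)

cds = DATA_PER_YEAR_BYTES / CD_CAPACITY_BYTES
stack_km = cds * CD_THICKNESS_M / 1000

print(f"CDs per year: {cds / 1e6:.1f} million")   # ~14 million
print(f"Stack height: {stack_km:.0f} km")         # ~17 km, of the order of the quoted 20 km
```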
The Data Grid
• The Grid enables us to analyse all the data that comes from the LHC
• Petabytes of storage and around 100,000 CPUs, distributed around the world
• Now used in many other areas
Why (particularly) the LHC?
1. Rare phenomena, huge background: the Higgs is buried nine orders of magnitude below the rate of all interactions.
2. Complexity of each collision.
"When you are face to face with a difficulty you are up against a discovery." (Lord Kelvin)
Four LHC Experiments
• ATLAS: general purpose (origin of mass, supersymmetry, micro black holes?); 2,000 scientists from 34 countries
• CMS: general purpose detector; 1,800 scientists from 150 institutes
• LHCb: to study the differences between matter and antimatter, producing over 100 million B and anti-B mesons each year
• ALICE: heavy-ion collisions, to create quark-gluon plasmas, with 50,000 particles in each collision
• "One Grid to Rule Them All"?
The Challenges I: Real-Time Event Selection
Interesting events must be selected in real time, as the data arrive, from rates spanning nine orders of magnitude.
The Challenges II: Real-Time Complexity
• Many events: ~10^9 events/experiment/year, at >~1 MB/event of raw data, with several reconstruction passes required
• Worldwide Grid computing requirement (2008): ~300 TeraIPS (100,000 of today's fastest processors connected via a Grid)
The online data path:
• Detectors: 16 million channels at a 40 MHz collision rate, buffered in 3 gigacells (charge, time, pattern)
• Level-1 trigger: reduces the rate to 100 kHz, using energy and track information; ~1 MByte of event data
• Readout: 1 Terabit/s into 200 GByte buffers (50,000 data channels, 500 readout memories)
• Event builder: 500 Gigabit/s networks
• Event filter: 20 TeraIPS, writing out at Gigabit/s to a PetaByte archive service
• Grid computing service (300 TeraIPS over the LAN and wide area) to understand/interpret the data via numerically intensive simulations
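The numbers in this chain hang together; the short sanity check below is a sketch (the ~200 Hz rate written to permanent storage is an assumed typical figure, not stated on the slide):

```python
# Sanity check of the real-time data-reduction chain.

COLLISION_RATE_HZ = 40e6     # 40 MHz bunch-crossing rate at the detector
LEVEL1_RATE_HZ = 100e3       # 100 kHz after the Level-1 trigger
EVENT_SIZE_BYTES = 1e6       # ~1 MB of raw data per event
STORAGE_RATE_HZ = 200.0      # ~200 Hz to archive (assumption, not on the slide)
EVENTS_PER_YEAR = 1e9        # ~10^9 events/experiment/year

# Bandwidth out of the Level-1 trigger, into the readout and event builder.
l1_bandwidth_tbit_s = LEVEL1_RATE_HZ * EVENT_SIZE_BYTES * 8 / 1e12
print(f"Level-1 output: {l1_bandwidth_tbit_s:.1f} Tbit/s")   # ~0.8 Tbit/s, cf. '1 Terabit/s'

# Bandwidth from the event filter to the archive service.
archive_gbit_s = STORAGE_RATE_HZ * EVENT_SIZE_BYTES * 8 / 1e9
print(f"Archive rate: {archive_gbit_s:.1f} Gbit/s")          # of the order 'Gigabit/s'

# Yearly raw-data volume per experiment.
petabytes = EVENTS_PER_YEAR * EVENT_SIZE_BYTES / 1e15
print(f"Raw data per year: {petabytes:.0f} PB")              # ~1 PB raw; ~10 PB once copies
                                                             # and derived data are included
```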
Solution: Build a Grid
• Share more than information: data, computing power, applications
• Efficient use of resources at many institutes
• Leverage over other sources of funding
• Join local communities
Challenges:
• share data between thousands of scientists with multiple interests
• link major and minor computer centres
• ensure all data is accessible anywhere, anytime
• grow rapidly, yet remain reliable for more than a decade
• cope with the different management policies of different centres
• ensure data security
• be up and running routinely
Middleware is the Key
On a single PC, your programs (Word/Excel, games, email/web) run on top of an operating system, which manages the CPU and disks.
On the Grid, your program runs on top of middleware: a user interface machine, a resource broker, an information service, a replica catalogue and a bookkeeping service, which together manage CPU clusters and disk servers at many sites.
Middleware is the operating system of a distributed computing system.
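To make the middleware roles concrete, here is a toy sketch (Python, entirely illustrative; the class names, site names and numbers are mine, not real gLite interfaces) of a resource broker matching a job against what an information service publishes:

```python
from __future__ import annotations
from dataclasses import dataclass

# Toy middleware sketch: an "information service" publishes what each site
# offers, and a "resource broker" matches a job's requirements against it.

@dataclass
class Site:
    name: str
    free_cpus: int
    datasets: set[str]

@dataclass
class Job:
    executable: str
    needed_cpus: int
    input_dataset: str

def broker(job: Job, information_service: list[Site]) -> Site | None:
    """Pick a site with enough free CPUs that already holds the input data
    (a stand-in for what a replica catalogue would tell a real broker)."""
    candidates = [s for s in information_service
                  if s.free_cpus >= job.needed_cpus
                  and job.input_dataset in s.datasets]
    # Prefer the least-loaded matching site.
    return max(candidates, key=lambda s: s.free_cpus, default=None)

if __name__ == "__main__":
    sites = [Site("ScotGrid-Glasgow", 120, {"higgs-sim-v1"}),
             Site("RAL-Tier1", 800, {"higgs-sim-v1", "raw-2009"}),
             Site("SomeTier2", 50, {"raw-2009"})]
    job = Job("analyse.sh", 64, "higgs-sim-v1")
    chosen = broker(job, sites)
    print("Job dispatched to:", chosen.name if chosen else "no match")
```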
… or this: what actually happens when you submit a job
• On the user interface machine (gridui), the submitter first runs voms-proxy-init to obtain a VOMS proxy certificate.
• The job is described in a JDL file and submitted to the Resource Broker / Workload Management System (WLMS).
• The broker consults the information system (BDII) for available Grid-enabled resources and the file catalogue (LFC) for the location of the input data, then dispatches the job to CPU nodes with access to suitable storage.
• The Logging & Bookkeeping service tracks progress; the user queries the job status and finally retrieves the output.
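On a gLite-style user interface of that era this workflow reduces to a handful of commands. The sketch below is illustrative only: the JDL contents, VO name and file names are made up, and the exact command options may differ between middleware releases.

```python
import subprocess
from pathlib import Path

# Hedged sketch of a gLite-style job submission from a user interface (UI).
# The JDL contents, VO name and file names are invented for illustration;
# check your own UI's documentation for the exact command options.

JDL = """\
Executable    = "analyse.sh";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"analyse.sh"};
OutputSandbox = {"std.out", "std.err"};
"""

def run(cmd):
    """Print and execute one shell command, stopping on failure."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    Path("job.jdl").write_text(JDL)

    # 1. Obtain a VOMS proxy (proves who you are and which VO you belong to).
    run(["voms-proxy-init", "--voms", "atlas"])          # VO name is an example

    # 2. Submit the JDL to the workload management system (RB/WMS),
    #    recording the job identifier for later queries.
    run(["glite-wms-job-submit", "-a", "-o", "jobid.txt", "job.jdl"])

    # 3. Poll the job status and, once done, retrieve the output sandbox.
    run(["glite-wms-job-status", "-i", "jobid.txt"])
    run(["glite-wms-job-output", "-i", "jobid.txt"])
```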
… or this: many Grids to work with (LCG, OSG, NDG, NGS, …). An open "operating system" does not only have advantages: the different Grids have to agree on common standards to interoperate.
Who do you trust? No one? It depends on what you want… (here, assume it is scientific collaboration).
Data Structure: REAL and SIMULATED data
• Simulated data: physics models feed Monte Carlo event generation ("Monte Carlo truth"), then detector simulation produces MC raw data; reconstruction turns this into MC event summary data (ESD) and MC event tags.
• Real data: the trigger system and data acquisition (Level-3 trigger, run conditions, calibration data) produce raw data and trigger tags; reconstruction turns these into event summary data (ESD) and event tags.
Physics Analysis
Starting from the ESD (data or Monte Carlo) and event tags, collaboration-wide event selection, calibration and skims at Tier 0/1 produce Analysis Object Data (AOD). Analysis groups at Tier 2 reduce the AOD to physics objects, which individual physicists analyse at Tier 3/4. Data volume and data flow increase as you move back up the chain towards the raw data.
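A toy sketch of this reduction chain is given below (Python; the per-event sizes are ballpark assumptions for illustration, not figures from the slides):

```python
# Toy model of the data-reduction chain RAW -> ESD -> AOD -> physics objects.
# Per-event sizes are ballpark assumptions, for illustration only.

EVENT_SIZE_BYTES = {
    "RAW":  1_000_000,   # ~1 MB: full detector readout
    "ESD":    100_000,   # ~100 kB: event summary data after reconstruction
    "AOD":     10_000,   # ~10 kB: analysis object data
    "PHYS":     1_000,   # ~1 kB: final physics objects per event
}

TIER = {
    "RAW":  "Tier 0/1 (collaboration-wide)",
    "ESD":  "Tier 0/1 (collaboration-wide)",
    "AOD":  "Tier 2 (analysis groups)",
    "PHYS": "Tier 3/4 (individual physicists)",
}

def yearly_volume(events: float) -> None:
    """Print the yearly volume at each stage for a given number of events."""
    for stage, size in EVENT_SIZE_BYTES.items():
        tb = events * size / 1e12
        print(f"{stage:4s}  ~{tb:8.1f} TB/year   held at {TIER[stage]}")

if __name__ == "__main__":
    yearly_volume(1e9)   # ~10^9 events per experiment per year
```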
Grid Infrastructure
• Tier 0: the CERN computer centre and offline farm, fed by the online system
• Tier 1: 11 national centres, e.g. RAL (UK) and centres in Spain, Germany, Italy, France, …
• Tier 2: regional groups, in the UK: ScotGrid, NorthGrid, SouthGrid, London (ScotGrid = Glasgow, Edinburgh, Durham)
• Tier 3/4: institutes and workstations
This structure was chosen for particle physics; other disciplines organise differently.
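One simple way to picture the hierarchy is as a nested mapping (a sketch; it only repeats the sites named on the slide, with "..." standing for the centres not spelled out there):

```python
# The tiered structure from the slide, written out as a nested mapping.
# Only the sites actually named on the slide are listed; "..." marks
# the other Tier-1 and Tier-2 centres not spelled out there.

GRID_TIERS = {
    "Tier 0": ["CERN computer centre / offline farm"],
    "Tier 1": ["RAL (UK)", "Spain", "Germany", "Italy", "France", "..."],  # 11 in total
    "Tier 2": {
        "ScotGrid": ["Glasgow", "Edinburgh", "Durham"],
        "NorthGrid": ["..."],
        "SouthGrid": ["..."],
        "London": ["..."],
    },
    "Tier 3/4": ["Institutes", "Workstations"],
}

def list_sites(tiers: dict) -> None:
    """Walk the hierarchy and print each tier with its member sites."""
    for tier, sites in tiers.items():
        if isinstance(sites, dict):
            for region, members in sites.items():
                print(f"{tier}: {region} -> {', '.join(members)}")
        else:
            print(f"{tier}: {', '.join(sites)}")

if __name__ == "__main__":
    list_sites(GRID_TIERS)
```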
An Example: ScotGrid, just in time for the LHC (the machine room downstairs).
The Grid today
• Application areas: archaeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, …
• Scale: >250 sites in 48 countries, >50,000 CPUs, >20 PetaBytes of storage, >10,000 users, >150 virtual organisations (VOs), >150,000 jobs/day
1. Why? 2. What? 3. How? 4. When?
From the particle physics perspective, the Grid is:
1. needed to utilise large-scale computing resources efficiently and securely
2. a) a working system running today on large resources
   b) about seamless discovery of computing resources
   c) using evolving standards for interoperation
   d) the basis for computing in the 21st century
3. built using middleware
4. available now, ready for LHC data
Avoiding Gridlock?
Gridlock is avoided provided you have a star network (the basis of the Internet): computing is then almost limitless. Avoid computer lock-up by using a Grid.