The ATLAS Trigger and Data Acquisition: a brief overview of concept, design and realization
John Erik Sloper, ATLAS TDAQ group, CERN - Physics Dept.
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
Introduction • Me: • John Erik Sloper, originally from Bergen, Norway • With the TDAQ group for 3½ years • Computer science background, currently enrolled at the University of Warwick, Coventry, for a PhD in engineering • Today: • An overview of Trigger and Data AcQuisition (TDAQ) from a practical viewpoint, using the real ATLAS TDAQ system as an example • We will go through the entire architecture of the TDAQ system, from readout to storage • A brief overview of the status of the installation
Data acquisition and triggering • Data acquisition • Gathering • Receiving data from all read-out links for the entire detector • Processing • “Building the events” – Collecting all data that correspond to a single event • Serving the triggering system with data • Storing • Transporting the data to mass storage • Triggering • The trigger has the job of selecting the bunch-crossings of interest for physics analysis, i.e. those containing interactions of interest • Tells the data acquisition system what should be used for further processing
What is an “event” anyway? • In high-energy particle colliders (e.g. Tevatron, HERA, LHC), the particles in the counter-rotating beams are bunched • Bunches cross at regular intervals; interactions only occur during the bunch-crossings • In this presentation “event” refers to the record of all the products of a given bunch-crossing • The term “event” is not uniquely defined! Some people use “event” for the products of a single interaction between the incident particles • People sometimes unwittingly use “event” interchangeably to mean different things!
Trigger menus • Typically, trigger systems select events according to a “trigger menu”, i.e. a list of selection criteria • An event is selected by the trigger if one or more of the criteria are met • Different criteria may correspond to different signatures for the same physics process • Redundant selections lead to high selection efficiency and allow the efficiency of the trigger to be measured from the data • Different criteria may reflect the wish to concurrently select events for a wide range of physics studies • HEP “experiments” — especially those with large general-purpose “detectors” (detector systems) — are really experimental facilities • The menu has to cover the physics channels to be studied, plus additional event samples required to complete the analysis: • Measure backgrounds, check the detector calibration and alignment, etc.
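As a concrete illustration of the menu idea above, here is a minimal sketch (not the real ATLAS trigger code): a menu is a list of named criteria, and an event is accepted if at least one of them fires. The item names and thresholds below are invented for illustration.

```python
# Minimal sketch of a "trigger menu": a list of named selection criteria;
# an event is accepted if at least one criterion fires.
# The thresholds and item names are illustrative, not the real ATLAS menu.

from dataclasses import dataclass

@dataclass
class Event:
    electrons_pt: list   # transverse momenta of electron candidates, in GeV
    muons_pt: list
    missing_et: float    # missing transverse energy, in GeV

# Each menu item: (name, predicate on the event)
TRIGGER_MENU = [
    ("e25",  lambda ev: any(pt > 25 for pt in ev.electrons_pt)),
    ("mu20", lambda ev: any(pt > 20 for pt in ev.muons_pt)),
    ("2e15", lambda ev: sum(pt > 15 for pt in ev.electrons_pt) >= 2),
    ("xe70", lambda ev: ev.missing_et > 70),
]

def passes_menu(event: Event):
    """Return the list of menu items satisfied by this event (empty = rejected)."""
    return [name for name, criterion in TRIGGER_MENU if criterion(event)]

if __name__ == "__main__":
    ev = Event(electrons_pt=[27.0, 12.0], muons_pt=[], missing_et=35.0)
    print(passes_menu(ev))   # -> ['e25']: accepted by the single-electron item
```

Redundant items that can select the same physics (here the single- and di-electron items) raise the overall efficiency and allow trigger efficiencies to be measured from the data, as noted above.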
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
[Diagram: the ATLAS detector and TDAQ - the detector is 44 m long, 22 m in diameter and weighs 7000 t]
Particle multiplicity • η = pseudorapidity = -ln tan(θ/2) (longitudinal dimension) • u_ch = no. of charged particles per unit of η • n_ch = no. of charged particles per interaction • N_ch = total no. of charged particles per bunch crossing (BC) • N_tot = total no. of particles per BC • n_ch = u_ch × Δη = 6 × 7 = 42 • N_ch = n_ch × 23 ≈ 900 • N_tot = N_ch × 1.5 ≈ 1400 • The LHC flushes each detector with ~1400 particles every 25 ns (p-p operation) … still much more complex than a LEP event (the arithmetic is spelled out below)
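A back-of-the-envelope check of the numbers on this slide, using its own round inputs (a sketch, nothing more):

```python
# Multiplicity arithmetic from the slide's round numbers.

u_ch      = 6       # charged particles per unit of pseudorapidity, per interaction
delta_eta = 7       # pseudorapidity range covered
n_overlap = 23      # overlapping p-p interactions per bunch crossing at design luminosity
neutral_factor = 1.5  # rough total/charged particle ratio

n_ch  = u_ch * delta_eta        # charged particles per interaction    -> 42
N_ch  = n_ch * n_overlap        # charged particles per bunch crossing -> ~900 (966)
N_tot = N_ch * neutral_factor   # all particles per bunch crossing     -> ~1400 (1449)

print(n_ch, round(N_ch), round(N_tot))
```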
The challenge • How to extract this… (a Higgs -> 4μ decay) … from this … (the same event overlaid with ~30 minimum-bias interactions) … at a rate of ONE in every ~10^13, and without knowing where to look: the Higgs could be anywhere up to ~1 TeV, or even nowhere…
Global requirements • No. of overlapping events per 25 ns: 23 • No. of particles in ATLAS per 25 ns: ~1400 • Data throughput: • At the detectors (40 MHz): equivalent to 10's of TB/s • -> After LVL1 accepts: O(100) GB/s • -> To mass storage: O(100) MB/s • (a rough arithmetic check follows below)
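The throughput figures above follow from the event size and the rates quoted later in the talk; the sketch below simply multiplies them out (the ~1.5 MB event size and ~200 Hz storage rate are taken from later slides):

```python
# Rough data-volume arithmetic behind the requirements above.

event_size   = 1.5e6   # bytes per event (~1.5 MB, quoted later in the talk)
bc_rate      = 40e6    # bunch-crossing rate, 40 MHz
lvl1_rate    = 75e3    # design LVL1 accept rate, 75 kHz
storage_rate = 200     # events/s to mass storage

print(f"at detectors : {bc_rate      * event_size / 1e12:.0f} TB/s (equivalent)")
print(f"after LVL1   : {lvl1_rate    * event_size / 1e9:.0f} GB/s")
print(f"to storage   : {storage_rate * event_size / 1e6:.0f} MB/s")
# -> ~60 TB/s, ~112 GB/s, ~300 MB/s: an overall reduction of ~5 orders of magnitude
```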
Immediate observations • Very high rate • New data arrive every 25 ns - it is virtually impossible to make real-time decisions at this rate • There is not even time for signals to propagate through the electronics • Huge amount of data • TB/s (equivalent) • This obviously cannot be stored directly: no affordable hardware or network exists that can handle such a data volume • Even if it could be stored, analyzing all of it would be extremely time-consuming • The TDAQ system must therefore reduce the amount of data by several orders of magnitude
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building • Event filtering • Current status of installation
Global view • The ATLAS TDAQ architecture is based on a three-level trigger hierarchy: • Level 1 • Level 2 • Event Filter • It uses a Level-2 selection mechanism based on a subset of the event data -> the Region of Interest • This reduces the amount of data needed for LVL2 filtering, and therefore greatly reduces the demand on dataflow power • Note that ATLAS differs from CMS on this point
Nick Ellis, Seminar, Lecce, October 2007 Multi-level triggers • Multi-level triggers provide • Rapid rejection of high-rate backgrounds without incurring (much) dead-time • Fast first-level trigger (custom electronics) • Needs high efficiency, but rejection power can be comparatively modest • High overall rejection power to reduce output to mass storage to affordable rate • Progressive reduction in rate after each stage of selection allows use of more and more complex algorithms at affordable cost • Final stages of selection, running on computer farms, can use comparatively very complex (and hence slow) algorithms to achieve the required overall rejection power
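To make the "progressive reduction in rate" concrete, the sketch below computes the rejection factor at each level from the approximate rates quoted elsewhere in this talk (round numbers, for illustration only):

```python
# Progressive rate reduction through the three trigger levels,
# using the approximate rates quoted elsewhere in this talk.

rates = {
    "bunch crossings": 40e6,   # Hz
    "after LVL1":      75e3,
    "after LVL2":      2e3,
    "after EF":        200.0,
}

stages = list(rates.items())
for (prev_name, prev_rate), (name, rate) in zip(stages, stages[1:]):
    print(f"{prev_name:16s} -> {name:12s}: rejection factor ~{prev_rate / rate:,.0f}")
# LVL1 rejects ~500x with fast custom electronics, LVL2 ~40x using only RoI data,
# and the Event Filter ~10x with slower, offline-quality algorithms on full events.
```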
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
[Diagram: ARCHITECTURE (functional elements and their connections) - the calorimeters, muon trigger chambers and other detectors feed front-end (FE) pipelines at the 40 MHz bunch-crossing rate; LVL1 decides within 2.5 μs and issues LVL1 accepts]
LVL1 selection criteria • Features that distinguish new physics from the bulk of the cross-section for Standard Model processes at hadron colliders are: • In general, the presence of high-pT particles (or jets) • e.g. these may be the products of the decays of new heavy particles • In contrast, most of the particles produced in minimum-bias interactions are soft (pT ~ 1 GeV or less) • More specifically, the presence of high-pT leptons (e, μ, τ), photons and/or neutrinos • e.g. the products (directly or indirectly) of new heavy particles • These give a clean signature compared with the low-pT hadrons of the minimum-bias case, especially if they are “isolated” (i.e. not inside jets) • The presence of known heavy particles • e.g. W and Z bosons may be produced in Higgs particle decays • Leptonic W and Z decays give a very clean signature • Also interesting for physics analysis and detector studies
LVL1 signatures and backgrounds • LVL1 triggers therefore search for: • High-pT muons • Identified beyond the calorimeters; a pT cut is needed to control the rate from π→μν and K→μν decays, as well as from semi-leptonic beauty and charm decays • High-pT photons • Identified as narrow EM calorimeter clusters; a cut on ET is needed; cuts on isolation and a hadronic-energy veto strongly reduce the rates from high-pT jets • High-pT electrons • Same as for photons (a matching track is required in the subsequent selection) • High-pT taus (decaying to hadrons) • Identified as narrow clusters in the EM + hadronic calorimeters • High-pT jets • Identified as clusters in the EM + hadronic calorimeters; need to cut at very high pT to control the rate (jets are the dominant high-pT process) • Large missing ET or total scalar ET
Level-1 latency • Interactions occur every 25 ns … in 25 ns particles travel only 7.5 m • Cable lengths are ~100 metres … in 25 ns signals travel only 5 m • Total Level-1 latency = TOF + cables + processing + distribution = 2.5 μs • For those 2.5 μs, all signals must be stored in electronics pipelines (there are ~10^8 channels!) - the required pipeline depth is sketched below
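A quick sketch of what the 2.5 μs latency implies for the front-end pipelines (simple division, using the numbers on this slide):

```python
# Why the front-end pipelines must be ~100 bunch crossings deep: while LVL1 is
# deciding, every channel keeps its samples in a fixed-length pipeline whose
# depth is the LVL1 latency divided by the 25 ns bunch spacing.

lvl1_latency = 2.5e-6   # s (time-of-flight + cables + processing + distribution)
bc_spacing   = 25e-9    # s between bunch crossings

pipeline_depth = lvl1_latency / bc_spacing
print(f"pipeline depth : {pipeline_depth:.0f} bunch crossings")   # -> 100

n_channels = 1e8        # ~10^8 detector channels
print(f"samples held at any moment : ~{pipeline_depth * n_channels:.0e}")  # ~1e+10
```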
[Diagram: Global Architecture - Level 1. LHC beam crossings pass through the detector (inner tracker, calorimeters, muon trigger chambers, muon tracker); the calorimetry and muon trigger chamber data feed the Level-1 Central Trigger Processor, whose Level-1 Accept is distributed to the front-end pipelines of the data acquisition]
[Diagram: ARCHITECTURE (functional elements and their connections) - on a LVL1 accept, event data move from the front-end pipelines through the Read-Out Drivers (RODs) and over the Read-Out Links into the Read-Out Buffers (ROBs)]
LVL1 trigger rates • ATLAS - A Toroidal LHC ApparatuS - trigger rates and event size (rates in kHz, no safety factor included!) • LVL1 trigger rate (high luminosity) = 40 kHz • Total event size = 1.5 MB • Total no. of ROLs = 1600 • The system is designed for 75 kHz -> 120 GB/s (upgradeable to 100 kHz, 150 GB/s); a bandwidth cross-check follows below
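The 120 GB/s design figure and the per-link load can be cross-checked from the event size and the number of Read-Out Links (a sketch using the slide's numbers):

```python
# Cross-check of the readout bandwidth figures above.

lvl1_design_rate = 75e3      # Hz (design), upgradeable to 100 kHz
event_size       = 1.5e6     # bytes (~1.5 MB)
n_rols           = 1600      # Read-Out Links

total_bw   = lvl1_design_rate * event_size   # ~1.1e11 B/s, i.e. the ~120 GB/s quoted
per_rol_bw = total_bw / n_rols               # ~70 MB/s per link,
                                             # i.e. ~1 kB fragment per link per accept
print(f"total   : {total_bw / 1e9:.0f} GB/s")
print(f"per ROL : {per_rol_bw / 1e6:.0f} MB/s")
```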
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
[Diagram: ARCHITECTURE (functional elements and their connections) - LVL1 accepts (75 kHz) drive the readout: event data flow from the RODs over the Read-Out Links into the Read-Out Buffers at 120 GB/s, while the Region-of-Interest information from LVL1 is sent to the RoI Builder (ROIB)]
Region of Interest - Why? • The Level-1 selection is dominated by local signatures, based on coarse granularity (calorimeters, muon trigger chambers) and without access to the inner tracking • Important further rejection can be gained from a local analysis of full-granularity detector data • The geographical addresses of the interesting signatures identified by LVL1 (the Regions of Interest) allow access to the local data of each relevant detector, sequentially • Typically there are 1-2 RoIs per event accepted by LVL1: <RoIs/event> = ~1.6 • The resulting total amount of RoI data is minimal: a few % of the Level-1 throughput
RoI mechanism - Implementation • There is a simple correspondence, η-φ region <-> ROB number(s), for each detector -> for each RoI, the list of ROBs holding the corresponding data from each detector is quickly identified by the LVL2 processors (a sketch of this lookup follows below) • This mechanism provides a powerful and economic way to add an important rejection factor before full event building -> the ATLAS RoI-based Level-2 trigger • … a ReadOut network roughly one order of magnitude smaller … • … at the cost of higher control traffic … • The example shown, with 4 RoI η-φ addresses, is atypical; the average number of RoIs per event is ~1.6
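The lookup itself can be pictured as a static map from η-φ bins to ROB identifiers, one per detector. The sketch below is purely illustrative (not the real dataflow code): the detector names, the binning and the ROB numbering are all invented.

```python
# Illustrative RoI -> ROB lookup: for each LVL1 Region of Interest (an eta-phi
# address), the LVL2 processor finds which ROBs hold the corresponding data for
# each detector and requests only those fragments.

from math import floor

# Hypothetical static map, built once per detector: (eta_bin, phi_bin) -> ROB ids
ROI_TO_ROBS = {
    "LAr":  {(e, p): [1000 + 10 * e + p] for e in range(10) for p in range(8)},
    "Tile": {(e, p): [2000 + 10 * e + p] for e in range(10) for p in range(8)},
}

def robs_for_roi(eta: float, phi: float) -> dict:
    """Return, per detector, the ROBs whose data overlap the given RoI centre."""
    eta_bin = floor((eta + 2.5) / 0.5)        # assumed binning: 10 bins over |eta| < 2.5
    phi_bin = floor((phi % 6.2832) / 0.7854)  # assumed binning: 8 bins over 2*pi
    return {det: table[(eta_bin, phi_bin)] for det, table in ROI_TO_ROBS.items()}

# A single electron/photon RoI touches only a handful of ROBs, so LVL2 moves
# only a few per cent of the event data instead of the full ~1.5 MB.
print(robs_for_roi(eta=0.3, phi=1.2))
```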
[Diagram: ARCHITECTURE (functional elements and their connections) - the RoI Builder feeds the L2 Supervisor (L2SV), which assigns each event to a L2 Processing Unit (L2P) over the L2 network (L2N); the L2PUs request RoI data from the Read-Out Sub-systems (ROS) and produce the LVL2 accept] • There are typically only 1-2 RoIs per event • Only the RoI Builder and the ROB inputs see the full LVL1 rate (over the same simple point-to-point link)
[Diagram: LVL2 traffic - the detector front-end and Level-1 trigger fill the readout buffers; the LVL2 supervisor assigns events to the L2 processors over the L2 network, the processors request RoI data from the readout buffers, and the decisions flow back to the supervisor and the controls]
[Diagram: L2 & EB networks, full system on the surface (SDX15) - the ROS systems in USA15 connect over Gb Ethernet (144 links) through the L2 and cross switches to the L2SV and L2PU nodes (~80) in SDX15]
Level-2 Trigger • Three parameters characterise the RoI-based Level-2 trigger: • the amount of data required: 1-2% of the total • the overall time budget in the L2 processors: 10 ms on average • the rejection factor: ×30 • (what these imply for the farm size is sketched below)
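These three parameters fix the size of the LVL2 farm to first order; the sketch below multiplies them out (rough arithmetic, ignoring overheads and inefficiencies):

```python
# What the three LVL2 parameters imply for the farm size.

lvl1_rate = 75e3    # Hz into LVL2
mean_time = 10e-3   # s average decision time per event in an L2 processor
rejection = 30      # LVL2 rejection factor

concurrent_events = lvl1_rate * mean_time   # events "in flight" at any moment
print(f"processing slots needed : ~{concurrent_events:.0f}")                 # ~750
print(f"LVL2 output rate        : ~{lvl1_rate / rejection / 1e3:.1f} kHz")   # ~2.5 kHz
# With a few cores per PC this is consistent with the ~500-PC LVL2 farm and the
# ~2 kHz LVL2 accept rate quoted elsewhere in the talk.
```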
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
[Diagram: ARCHITECTURE (functional elements and their connections) - LVL2 accepts (~2 kHz) trigger event building: the Dataflow Manager (DFM) assigns each event to a Sub-Farm Input (SFI), which collects the fragments from the Read-Out Sub-systems over the Event Building network (EBN) and passes the built event to the Event Filter network (EFN). RoI data = 1-2% of the event, LVL2 latency ~10 ms, readout throughput 120 GB/s]
[Diagram: Event Building - the detector front-end and the Level-1 + Level-2 triggers fill the Read-Out Systems; the DataFlow Manager and the builder networks feed the event-building nodes, which pass complete events to the Event Filter]
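A minimal sketch of the event-building idea (illustrative only, not the real ATLAS dataflow software): an SFI collects one fragment per Read-Out System for a given event identifier and declares the event "built" once all fragments have arrived. The class and field names are invented.

```python
# Illustrative event builder: fragments keyed by event id are accumulated
# until one fragment per ROS is present, then the full event is released.

class SFI:
    def __init__(self, n_ros: int):
        self.n_ros = n_ros
        self.partial = {}           # event_id -> {ros_id: fragment bytes}

    def add_fragment(self, event_id: int, ros_id: int, data: bytes):
        frags = self.partial.setdefault(event_id, {})
        frags[ros_id] = data
        if len(frags) == self.n_ros:            # all fragments present
            return self.partial.pop(event_id)   # full event, ready for the Event Filter
        return None                             # still incomplete

sfi = SFI(n_ros=3)
for ros in range(3):
    event = sfi.add_fragment(event_id=42, ros_id=ros, data=b"\x00" * 1024)
print("event built" if event else "incomplete")   # -> event built
```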
[Diagram: ARCHITECTURE with rates and bandwidths - 40 MHz bunch crossings, equivalent to 10's of TB/s (~100 000 CDs/s) at the detectors; LVL1 accepts at 75 kHz, 120 GB/s into the readout buffers (data resident ~40 ms); LVL2 accepts ~2 kHz, moving ~2+4 GB/s; event building delivers ~4 GB/s to the Event Filter, which spends ~4 s per event and accepts ~0.2 kHz (~200 Hz); the Sub-Farm Output writes ~300 MB/s to mass storage, i.e. about 1 CD per second]
[Diagram: L2 & EB networks, full system on the surface (SDX15) - the ROS systems in USA15 connect over Gb Ethernet (144 + 144 links) through the L2, EB and cross switches to the L2SV/L2PU nodes (~80), the DFM and SFI nodes (~110) and the SFOs feeding the Event Filter. GbE switches of this size can be bought today]
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering • Event Building • Event filtering • Current status of installation
[Diagram: TDAQ ARCHITECTURE summary - detector readout (RODs, ROS) behind the 2.5 μs LVL1; the ~10 ms LVL2 (ROIB, L2SV, L2N, L2P); event building (DFM, EBN, SFI) and the ~seconds-per-event Event Filter (EFN, EFPs) feeding the SFO; the dataflow and HLT parts shown side by side]
[Diagram: Trigger/DAQ architecture - data of events accepted by the first-level trigger are pushed from the VME Read-Out Drivers (RODs) in UX15 over 1600 Read-Out Links and dedicated links to ~150 Read-Out Subsystem (ROS) PCs, at up to 100 kHz with 1600 fragments of ~1 kByte each]
Read-Out System (ROS) • 153 ROSs installed in USA15 (completed in August 2006) • 149 = 100% of those foreseen for the “core” detectors, plus 4 hot spares • All fully equipped with ROBINs • PC properties: • 4U, 19" rack-mountable PC • Motherboard: Supermicro X6DHE-XB • CPU: single 3.4 GHz Xeon • RAM: 512 MB • Redundant power supply; network booted (no local hard disk); remote management via IPMI • Network: 2 GbE onboard (1 for the control network) + 4 GbE on a PCI-Express card (1 for LVL2 + 1 for event building) • Spares: 12 full systems in the pre-series as hot spares; 27 PCs purchased (7 for new detectors) • ROBINs (Read-Out Buffer Inputs): • Input: 3 ROLs (S-Link interface); output: 1 PCI interface and 1 GbE interface (for upgrade if needed); buffer memory: 64 MB (~32k fragments per ROL) - see the buffering sketch below • ~700 ROBINs installed, plus spares; + 70 ROBINs ordered (new detectors and more spares) • System is complete: no further purchasing/procurement foreseen
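The "~32k fragments per ROL" figure can be rationalised with a little arithmetic. The sketch below assumes that roughly half of the 64 MB is usable for ~1 kB fragments once memory organisation and headers are accounted for; that fraction is an assumption chosen to match the quoted number, not something stated on the slide.

```python
# Rough check of the ROBIN buffering figures above.

buffer_per_rol  = 64e6   # bytes of buffer memory per Read-Out Link
fragment_size   = 1e3    # bytes, typical fragment size per ROL per event
usable_fraction = 0.5    # ASSUMPTION: overhead for paging/headers, tuned to match ~32k

fragments = buffer_per_rol * usable_fraction / fragment_size
print(f"fragments buffered per ROL : ~{fragments / 1e3:.0f}k")   # ~32k

lvl1_rate = 100e3   # Hz, maximum LVL1 accept rate
print(f"buffering time if nothing is deleted : ~{fragments / lvl1_rate:.2f} s")  # ~0.32 s
```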
[Diagram: Trigger/DAQ architecture, continued - the RoI Builder in USA15 sends the Regions of Interest to the LVL2 supervisor and the ~500-node LVL2 farm in SDX1; the pROS stores the LVL2 output; the DataFlow Manager and ~100 SubFarm Inputs (SFIs) sit on Gigabit Ethernet switches and pull event data (partial events at up to 100 kHz, full events at ~3 kHz) from the ROSs via event data requests, requested event data and delete commands; built events go to the ~1800-node Event Filter (EF)]
Central Switches • Data network (event-builder and LVL2 traffic): Force10 E1200, 1 chassis + blades for 1 and 10 GbE as required • For July 2008: 14 blades in 2 chassis, ~700 GbE ports at line speed • Final system: extra blades • Back-end network (to the Event Filter and the SFOs): Force10 E600, 1 chassis + blades for 1 GbE • For July 2008: blades as required, following the EF evolution • Final system: extra blades • Core online/control network (run control, databases, monitoring samplers): Force10 E600, 1 chassis + blades for 1 GbE • For July 2008: 2 chassis + blades, following the full-system evolution • Final system: extra blades
HLT Farms • Final size for the maximum L1 rate (TDR): ~500 PCs for L2, ~1800 PCs for the EF (multi-core technology -> same number of boxes, more applications) • Recently decided: ~900 of the above will be XPUs, connected to both the L2 and EF networks • 161 XPU PCs installed: 130 x 8 cores (CPU: 2 x Intel E5320 quad-core 1.86 GHz, RAM: 1 GB/core, i.e. 8 GB) + 31 x 4 cores • Cold-swappable power supply; network booted; local hard disk; remote management via IPMI • Network: 2 GbE onboard, 1 for the control network, 1 for the data network; VLAN for the connection to both the data and back-end networks • For July 2008: a total of 9 L2 + 27 EF racks, as in the TDR for the 2007 run (1 rack = 31 PCs) • Final system: a total of 17 L2 + 62 EF racks, of which 28 (of 79) racks with XPU connection • (a farm-sizing sketch follows below)
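As a sanity check on these farm sizes, the sketch below works out the Event Filter core count implied by the rates and the ~4 s per event quoted earlier (round numbers only):

```python
# Event Filter farm sizing implied by the numbers in this talk.

eb_rate        = 2e3    # Hz, event rate into the Event Filter (LVL2 accepts)
ef_time        = 4.0    # s, average processing time per event in the EF
cores_per_node = 8      # the newer XPU nodes above are 2 x quad-core

cores_needed = eb_rate * ef_time
print(f"EF processing slots needed : ~{cores_needed:.0f}")                     # ~8000
print(f"EF nodes (8 cores each)    : ~{cores_needed / cores_per_node:.0f}")    # ~1000
# Of the same order as the ~1800 EF PCs foreseen at the maximum LVL1 rate, where the
# EF input is ~3 kHz and not all nodes have 8 cores.
```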
DAQ/HLT in SDX1 - ~90 racks (mostly empty…)
[Diagram: complete Trigger/DAQ architecture - from the VME RODs in UX15 (1600 Read-Out Links, ~1 kByte fragments pushed at up to 100 kHz) to ~150 ROS PCs in USA15; the RoI Builder, LVL2 supervisor and ~500-node LVL2 farm; the DataFlow Manager, pROS and ~100 SFIs pulling partial events at up to 100 kHz and full events at ~3 kHz over Gigabit Ethernet; the ~1800 dual-CPU Event Filter nodes in SDX1; 6 SubFarm Outputs (SFOs) with local storage sending data at an event rate of ~200 Hz to the CERN computer centre; the Timing, Trigger and Control (TTC) system closes the loop back to the first-level trigger and the detectors]
SubFarm Output • 6 SFO PCs installed (5 + 1 hot spare) • 5U, 19" rack-mountable PC • Motherboard: Intel • CPU: 2 x Intel E5320 dual-core 2.0 GHz • RAM: 4 GB • Quadruple cold-swappable power supply; network booted; local hard disk (recovery from crashes, pre-loading events for testing, e.g. FDR); remote management via IPMI • Disks: 3 SATA RAID controllers, 24 disks x 500 GB = 12 TB • Network: 2 GbE onboard (1 for control and IPMI, 1 for data) + 3 GbE on a PCIe card for data • System is complete, no further purchasing foreseen • The required bandwidth (300 MB/s) is already available, to facilitate detector commissioning and calibrations in the early phase (the storage arithmetic is sketched below)
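A last bit of arithmetic on the output stage, using the figures above. Treating the 6 x 12 TB of local disk as a buffer for periods when data cannot be shipped onward to the CERN computer centre is my reading of the design, not a statement from the slides.

```python
# Rough arithmetic on the SFO output stage.

output_rate = 300e6        # bytes/s aggregate to mass storage (~1 "CD" per second)
local_disk  = 6 * 12e12    # bytes: 6 SFO PCs x 12 TB of local RAID disk (assumed usable)

print(f"events/s to storage at 1.5 MB/event : ~{output_rate / 1.5e6:.0f}")   # ~200 Hz
print(f"hours of local buffering at full output rate : "
      f"~{local_disk / output_rate / 3600:.0f} h")   # ~67 h, i.e. a couple of days
```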
TDAQ System operational at Point-1 • Routinely used for: • Debugging and standalone commissioning of all detectors after installation • TDAQ technical runs - using physics selection algorithms in the HLT farms on simulated data pre-loaded into the Read-Out System • Commissioning runs of integrated ATLAS - taking cosmic data through the full TDAQ chain (up to Tier-0) after final detector integration