The ATLAS Trigger and Data Acquisition: a brief overview of concept, design and realization
John Erik Sloper, ATLAS TDAQ group, CERN - Physics Dept.
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
Introduction • Me: • John Erik Sloper, originally from Bergen, Norway • With the TDAQ group for 3½ years • Computer science background, currently enrolled at the University of Warwick, Coventry, for a PhD in engineering • Today: • An overview of Trigger and Data AcQuisition (TDAQ) from a practical viewpoint, using the real ATLAS TDAQ system as an example • We will go through the entire architecture of the TDAQ system, from readout to storage • A brief overview of the status of the installation
Data acquisition and triggering • Data acquisition • Gathering • Receiving data from all read-out links for the entire detector • Processing • “Building the events” – Collecting all data that correspond to a single event • Serving the triggering system with data • Storing • Transporting the data to mass storage • Triggering • The trigger has the job of selecting the bunch-crossings of interest for physics analysis, i.e. those containing interactions of interest • Tells the data acquisition system what should be used for further processing
What is an “event” anyway? • In high-energy particle colliders (e.g. Tevatron, HERA, LHC), the particles in the counter-rotating beams are bunched • Bunches cross at regular intervals; interactions only occur during the bunch-crossings • In this presentation “event” refers to the record of all the products of a given bunch-crossing • The term “event” is not uniquely defined! Some people use “event” for the products of a single interaction between the incident particles • People sometimes unwittingly use “event” interchangeably to mean different things!
Trigger menus • Typically, trigger systems select events according to a “trigger menu”, i.e. a list of selection criteria • An event is selected by the trigger if one or more of the criteria are met • Different criteria may correspond to different signatures for the same physics process • Redundant selections lead to high selection efficiency and allow the efficiency of the trigger to be measured from the data • Different criteria may reflect the wish to concurrently select events for a wide range of physics studies • HEP “experiments” — especially those with large general-purpose “detectors” (detector systems) — are really experimental facilities • The menu has to cover the physics channels to be studied, plus additional event samples required to complete the analysis: • Measure backgrounds, check the detector calibration and alignment, etc.
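As a concrete illustration of the menu idea above, here is a minimal sketch (not the real ATLAS trigger code): a menu is a list of named criteria, and an event is accepted if at least one of them fires. The item names and thresholds below are invented for illustration.

```python
# Minimal sketch of a "trigger menu": a list of named selection criteria;
# an event is accepted if at least one criterion fires.
# The thresholds and item names are illustrative, not the real ATLAS menu.

from dataclasses import dataclass

@dataclass
class Event:
    electrons_pt: list   # transverse momenta of electron candidates, in GeV
    muons_pt: list
    missing_et: float    # missing transverse energy, in GeV

# Each menu item: (name, predicate on the event)
TRIGGER_MENU = [
    ("e25",  lambda ev: any(pt > 25 for pt in ev.electrons_pt)),
    ("mu20", lambda ev: any(pt > 20 for pt in ev.muons_pt)),
    ("2e15", lambda ev: sum(pt > 15 for pt in ev.electrons_pt) >= 2),
    ("xe70", lambda ev: ev.missing_et > 70),
]

def passes_menu(event: Event):
    """Return the list of menu items satisfied by this event (empty = rejected)."""
    return [name for name, criterion in TRIGGER_MENU if criterion(event)]

if __name__ == "__main__":
    ev = Event(electrons_pt=[27.0, 12.0], muons_pt=[], missing_et=35.0)
    print(passes_menu(ev))   # -> ['e25']: accepted by the single-electron item
```

Redundant items that can select the same physics (here the single- and di-electron items) raise the overall efficiency and allow trigger efficiencies to be measured from the data, as noted above.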
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
[Diagram: the ATLAS detector and TDAQ - the detector is 44 m long, 22 m in diameter and weighs 7000 t]
Particle multiplicity • η = pseudorapidity = -ln tan(θ/2) (longitudinal dimension) • u_ch = no. of charged particles per unit of η • n_ch = no. of charged particles per interaction • N_ch = total no. of charged particles per bunch crossing (BC) • N_tot = total no. of particles per BC • n_ch = u_ch × Δη = 6 × 7 = 42 • N_ch = n_ch × 23 ≈ 900 • N_tot = N_ch × 1.5 ≈ 1400 • The LHC flushes each detector with ~1400 particles every 25 ns (p-p operation) … still much more complex than a LEP event (the arithmetic is spelled out below)
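A back-of-the-envelope check of the numbers on this slide, using its own round inputs (a sketch, nothing more):

```python
# Multiplicity arithmetic from the slide's round numbers.

u_ch      = 6       # charged particles per unit of pseudorapidity, per interaction
delta_eta = 7       # pseudorapidity range covered
n_overlap = 23      # overlapping p-p interactions per bunch crossing at design luminosity
neutral_factor = 1.5  # rough total/charged particle ratio

n_ch  = u_ch * delta_eta        # charged particles per interaction    -> 42
N_ch  = n_ch * n_overlap        # charged particles per bunch crossing -> ~900 (966)
N_tot = N_ch * neutral_factor   # all particles per bunch crossing     -> ~1400 (1449)

print(n_ch, round(N_ch), round(N_tot))
```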
The challenge • How to extract this… (a Higgs -> 4μ decay) … from this … (the same event overlaid with ~30 minimum-bias interactions) … at a rate of ONE in every ~10^13, and without knowing where to look: the Higgs could be anywhere up to ~1 TeV, or even nowhere…
Global requirements • No. of overlapping events per 25 ns: 23 • No. of particles in ATLAS per 25 ns: ~1400 • Data throughput: • At the detectors (40 MHz): equivalent to 10's of TB/s • -> After LVL1 accepts: O(100) GB/s • -> To mass storage: O(100) MB/s • (a rough arithmetic check follows below)
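The throughput figures above follow from the event size and the rates quoted later in the talk; the sketch below simply multiplies them out (the ~1.5 MB event size and ~200 Hz storage rate are taken from later slides):

```python
# Rough data-volume arithmetic behind the requirements above.

event_size   = 1.5e6   # bytes per event (~1.5 MB, quoted later in the talk)
bc_rate      = 40e6    # bunch-crossing rate, 40 MHz
lvl1_rate    = 75e3    # design LVL1 accept rate, 75 kHz
storage_rate = 200     # events/s to mass storage

print(f"at detectors : {bc_rate      * event_size / 1e12:.0f} TB/s (equivalent)")
print(f"after LVL1   : {lvl1_rate    * event_size / 1e9:.0f} GB/s")
print(f"to storage   : {storage_rate * event_size / 1e6:.0f} MB/s")
# -> ~60 TB/s, ~112 GB/s, ~300 MB/s: an overall reduction of ~5 orders of magnitude
```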
Immediate observations • Very high rate • New data arrive every 25 ns - it is virtually impossible to make real-time decisions at this rate • There is not even time for signals to propagate through the electronics • Huge amount of data • TB/s (equivalent) • This obviously cannot be stored directly: no affordable hardware or network exists that can handle such a data volume • Even if it could be stored, analyzing all of it would be extremely time-consuming • The TDAQ system must therefore reduce the amount of data by several orders of magnitude
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building • Event filtering • Current status of installation
Global view • The ATLAS TDAQ architecture is based on a three-level trigger hierarchy: • Level 1 • Level 2 • Event Filter • It uses a Level-2 selection mechanism based on a subset of the event data -> the Region of Interest • This reduces the amount of data needed for LVL2 filtering, and therefore greatly reduces the demand on dataflow power • Note that ATLAS differs from CMS on this point
Nick Ellis, Seminar, Lecce, October 2007 Multi-level triggers • Multi-level triggers provide • Rapid rejection of high-rate backgrounds without incurring (much) dead-time • Fast first-level trigger (custom electronics) • Needs high efficiency, but rejection power can be comparatively modest • High overall rejection power to reduce output to mass storage to affordable rate • Progressive reduction in rate after each stage of selection allows use of more and more complex algorithms at affordable cost • Final stages of selection, running on computer farms, can use comparatively very complex (and hence slow) algorithms to achieve the required overall rejection power
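To make the "progressive reduction in rate" concrete, the sketch below computes the rejection factor at each level from the approximate rates quoted elsewhere in this talk (round numbers, for illustration only):

```python
# Progressive rate reduction through the three trigger levels,
# using the approximate rates quoted elsewhere in this talk.

rates = {
    "bunch crossings": 40e6,   # Hz
    "after LVL1":      75e3,
    "after LVL2":      2e3,
    "after EF":        200.0,
}

stages = list(rates.items())
for (prev_name, prev_rate), (name, rate) in zip(stages, stages[1:]):
    print(f"{prev_name:16s} -> {name:12s}: rejection factor ~{prev_rate / rate:,.0f}")
# LVL1 rejects ~500x with fast custom electronics, LVL2 ~40x using only RoI data,
# and the Event Filter ~10x with slower, offline-quality algorithms on full events.
```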
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
[Diagram: ARCHITECTURE (functional elements and their connections) - the calorimeters, muon trigger chambers and other detectors feed front-end (FE) pipelines at the 40 MHz bunch-crossing rate; LVL1 decides within 2.5 μs and issues LVL1 accepts]
LVL1 selection criteria • Features that distinguish new physics from the bulk of the cross-section for Standard Model processes at hadron colliders are: • In general, the presence of high-pT particles (or jets) • e.g. these may be the products of the decays of new heavy particles • In contrast, most of the particles produced in minimum-bias interactions are soft (pT ~ 1 GeV or less) • More specifically, the presence of high-pT leptons (e, μ, τ), photons and/or neutrinos • e.g. the products (directly or indirectly) of new heavy particles • These give a clean signature compared with the low-pT hadrons of the minimum-bias case, especially if they are “isolated” (i.e. not inside jets) • The presence of known heavy particles • e.g. W and Z bosons may be produced in Higgs particle decays • Leptonic W and Z decays give a very clean signature • Also interesting for physics analysis and detector studies
LVL1 signatures and backgrounds • LVL1 triggers therefore search for: • High-pT muons • Identified beyond the calorimeters; a pT cut is needed to control the rate from π→μν and K→μν decays, as well as from semi-leptonic beauty and charm decays • High-pT photons • Identified as narrow EM calorimeter clusters; a cut on ET is needed; cuts on isolation and a hadronic-energy veto strongly reduce the rates from high-pT jets • High-pT electrons • Same as for photons (a matching track is required in the subsequent selection) • High-pT taus (decaying to hadrons) • Identified as narrow clusters in the EM + hadronic calorimeters • High-pT jets • Identified as clusters in the EM + hadronic calorimeters; need to cut at very high pT to control the rate (jets are the dominant high-pT process) • Large missing ET or total scalar ET
Level-1 latency • Interactions occur every 25 ns … in 25 ns particles travel only 7.5 m • Cable lengths are ~100 metres … in 25 ns signals travel only 5 m • Total Level-1 latency = TOF + cables + processing + distribution = 2.5 μs • For those 2.5 μs, all signals must be stored in electronics pipelines (there are ~10^8 channels!) - the required pipeline depth is sketched below
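A quick sketch of what the 2.5 μs latency implies for the front-end pipelines (simple division, using the numbers on this slide):

```python
# Why the front-end pipelines must be ~100 bunch crossings deep: while LVL1 is
# deciding, every channel keeps its samples in a fixed-length pipeline whose
# depth is the LVL1 latency divided by the 25 ns bunch spacing.

lvl1_latency = 2.5e-6   # s (time-of-flight + cables + processing + distribution)
bc_spacing   = 25e-9    # s between bunch crossings

pipeline_depth = lvl1_latency / bc_spacing
print(f"pipeline depth : {pipeline_depth:.0f} bunch crossings")   # -> 100

n_channels = 1e8        # ~10^8 detector channels
print(f"samples held at any moment : ~{pipeline_depth * n_channels:.0e}")  # ~1e+10
```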
[Diagram: Global Architecture - Level 1. LHC beam crossings pass through the detector (inner tracker, calorimeters, muon trigger chambers, muon tracker); the calorimetry and muon trigger chamber data feed the Level-1 Central Trigger Processor, whose Level-1 Accept is distributed to the front-end pipelines of the data acquisition]
[Diagram: ARCHITECTURE (functional elements and their connections) - on a LVL1 accept, event data move from the front-end pipelines through the Read-Out Drivers (RODs) and over the Read-Out Links into the Read-Out Buffers (ROBs)]
LVL1 trigger rates • ATLAS - A Toroidal LHC ApparatuS - trigger rates and event size (rates in kHz, no safety factor included!) • LVL1 trigger rate (high luminosity) = 40 kHz • Total event size = 1.5 MB • Total no. of ROLs = 1600 • The system is designed for 75 kHz -> 120 GB/s (upgradeable to 100 kHz, 150 GB/s); a bandwidth cross-check follows below
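The 120 GB/s design figure and the per-link load can be cross-checked from the event size and the number of Read-Out Links (a sketch using the slide's numbers):

```python
# Cross-check of the readout bandwidth figures above.

lvl1_design_rate = 75e3      # Hz (design), upgradeable to 100 kHz
event_size       = 1.5e6     # bytes (~1.5 MB)
n_rols           = 1600      # Read-Out Links

total_bw   = lvl1_design_rate * event_size   # ~1.1e11 B/s, i.e. the ~120 GB/s quoted
per_rol_bw = total_bw / n_rols               # ~70 MB/s per link,
                                             # i.e. ~1 kB fragment per link per accept
print(f"total   : {total_bw / 1e9:.0f} GB/s")
print(f"per ROL : {per_rol_bw / 1e6:.0f} MB/s")
```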
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
[Diagram: ARCHITECTURE (functional elements and their connections) - LVL1 accepts (75 kHz) drive the readout: event data flow from the RODs over the Read-Out Links into the Read-Out Buffers at 120 GB/s, while the Region-of-Interest information from LVL1 is sent to the RoI Builder (ROIB)]
Region of Interest - Why? • The Level-1 selection is dominated by local signatures, based on coarse granularity (calorimeters, muon trigger chambers) and without access to the inner tracking • Important further rejection can be gained from a local analysis of full-granularity detector data • The geographical addresses of the interesting signatures identified by LVL1 (the Regions of Interest) allow access to the local data of each relevant detector, sequentially • Typically there are 1-2 RoIs per event accepted by LVL1: <RoIs/event> = ~1.6 • The resulting total amount of RoI data is minimal: a few % of the Level-1 throughput
RoI mechanism - Implementation • There is a simple correspondence, η-φ region <-> ROB number(s), for each detector -> for each RoI, the list of ROBs holding the corresponding data from each detector is quickly identified by the LVL2 processors (a sketch of this lookup follows below) • This mechanism provides a powerful and economic way to add an important rejection factor before full event building -> the ATLAS RoI-based Level-2 trigger • … a ReadOut network roughly one order of magnitude smaller … • … at the cost of higher control traffic … • The example shown, with 4 RoI η-φ addresses, is atypical; the average number of RoIs per event is ~1.6
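The lookup itself can be pictured as a static map from η-φ bins to ROB identifiers, one per detector. The sketch below is purely illustrative (not the real dataflow code): the detector names, the binning and the ROB numbering are all invented.

```python
# Illustrative RoI -> ROB lookup: for each LVL1 Region of Interest (an eta-phi
# address), the LVL2 processor finds which ROBs hold the corresponding data for
# each detector and requests only those fragments.

from math import floor

# Hypothetical static map, built once per detector: (eta_bin, phi_bin) -> ROB ids
ROI_TO_ROBS = {
    "LAr":  {(e, p): [1000 + 10 * e + p] for e in range(10) for p in range(8)},
    "Tile": {(e, p): [2000 + 10 * e + p] for e in range(10) for p in range(8)},
}

def robs_for_roi(eta: float, phi: float) -> dict:
    """Return, per detector, the ROBs whose data overlap the given RoI centre."""
    eta_bin = floor((eta + 2.5) / 0.5)        # assumed binning: 10 bins over |eta| < 2.5
    phi_bin = floor((phi % 6.2832) / 0.7854)  # assumed binning: 8 bins over 2*pi
    return {det: table[(eta_bin, phi_bin)] for det, table in ROI_TO_ROBS.items()}

# A single electron/photon RoI touches only a handful of ROBs, so LVL2 moves
# only a few per cent of the event data instead of the full ~1.5 MB.
print(robs_for_roi(eta=0.3, phi=1.2))
```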
[Diagram: ARCHITECTURE (functional elements and their connections) - the RoI Builder feeds the L2 Supervisor (L2SV), which assigns each event to a L2 Processing Unit (L2P) over the L2 network (L2N); the L2PUs request RoI data from the Read-Out Sub-systems (ROS) and produce the LVL2 accept] • There are typically only 1-2 RoIs per event • Only the RoI Builder and the ROB inputs see the full LVL1 rate (over the same simple point-to-point link)
[Diagram: LVL2 traffic - the detector front-end and Level-1 trigger fill the readout buffers; the LVL2 supervisor assigns events to the L2 processors over the L2 network, the processors request RoI data from the readout buffers, and the decisions flow back to the supervisor and the controls]
[Diagram: L2 & EB networks, full system on the surface (SDX15) - the ROS systems in USA15 connect over Gb Ethernet (144 links) through the L2 and cross switches to the L2SV and L2PU nodes (~80) in SDX15]
Level-2 Trigger • Three parameters characterise the RoI-based Level-2 trigger: • the amount of data required: 1-2% of the total • the overall time budget in the L2 processors: 10 ms on average • the rejection factor: ×30 • (what these imply for the farm size is sketched below)
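These three parameters fix the size of the LVL2 farm to first order; the sketch below multiplies them out (rough arithmetic, ignoring overheads and inefficiencies):

```python
# What the three LVL2 parameters imply for the farm size.

lvl1_rate = 75e3    # Hz into LVL2
mean_time = 10e-3   # s average decision time per event in an L2 processor
rejection = 30      # LVL2 rejection factor

concurrent_events = lvl1_rate * mean_time   # events "in flight" at any moment
print(f"processing slots needed : ~{concurrent_events:.0f}")                 # ~750
print(f"LVL2 output rate        : ~{lvl1_rate / rejection / 1e3:.1f} kHz")   # ~2.5 kHz
# With a few cores per PC this is consistent with the ~500-PC LVL2 farm and the
# ~2 kHz LVL2 accept rate quoted elsewhere in the talk.
```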
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering & Region of interest • Event Building & Event filtering • Current status of installation
[Diagram: ARCHITECTURE (functional elements and their connections) - LVL2 accepts (~2 kHz) trigger event building: the Dataflow Manager (DFM) assigns each event to a Sub-Farm Input (SFI), which collects the fragments from the Read-Out Sub-systems over the Event Building network (EBN) and passes the built event to the Event Filter network (EFN). RoI data = 1-2% of the event, LVL2 latency ~10 ms, readout throughput 120 GB/s]
[Diagram: Event Building - the detector front-end and the Level-1 + Level-2 triggers fill the Read-Out Systems; the DataFlow Manager and the builder networks feed the event-building nodes, which pass complete events to the Event Filter]
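A minimal sketch of the event-building idea (illustrative only, not the real ATLAS dataflow software): an SFI collects one fragment per Read-Out System for a given event identifier and declares the event "built" once all fragments have arrived. The class and field names are invented.

```python
# Illustrative event builder: fragments keyed by event id are accumulated
# until one fragment per ROS is present, then the full event is released.

class SFI:
    def __init__(self, n_ros: int):
        self.n_ros = n_ros
        self.partial = {}           # event_id -> {ros_id: fragment bytes}

    def add_fragment(self, event_id: int, ros_id: int, data: bytes):
        frags = self.partial.setdefault(event_id, {})
        frags[ros_id] = data
        if len(frags) == self.n_ros:            # all fragments present
            return self.partial.pop(event_id)   # full event, ready for the Event Filter
        return None                             # still incomplete

sfi = SFI(n_ros=3)
for ros in range(3):
    event = sfi.add_fragment(event_id=42, ros_id=ros, data=b"\x00" * 1024)
print("event built" if event else "incomplete")   # -> event built
```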
[Diagram: ARCHITECTURE with rates and bandwidths - 40 MHz bunch crossings, equivalent to 10's of TB/s (~100 000 CDs/s) at the detectors; LVL1 accepts at 75 kHz, 120 GB/s into the readout buffers (data resident ~40 ms); LVL2 accepts ~2 kHz, moving ~2+4 GB/s; event building delivers ~4 GB/s to the Event Filter, which spends ~4 s per event and accepts ~0.2 kHz (~200 Hz); the Sub-Farm Output writes ~300 MB/s to mass storage, i.e. about 1 CD per second]
[Diagram: L2 & EB networks, full system on the surface (SDX15) - the ROS systems in USA15 connect over Gb Ethernet (144 + 144 links) through the L2, EB and cross switches to the L2SV/L2PU nodes (~80), the DFM and SFI nodes (~110) and the SFOs feeding the Event Filter. GbE switches of this size can be bought today]
Overview • Introduction • Challenges & Requirements • ATLAS TDAQ Architecture • Readout and LVL1 • LVL2 Triggering • Event Building • Event filtering • Current status of installation
[Diagram: TDAQ ARCHITECTURE summary - detector readout (RODs, ROS) behind the 2.5 μs LVL1; the ~10 ms LVL2 (ROIB, L2SV, L2N, L2P); event building (DFM, EBN, SFI) and the ~seconds-per-event Event Filter (EFN, EFPs) feeding the SFO; the dataflow and HLT parts shown side by side]
[Diagram: Trigger/DAQ architecture - data of events accepted by the first-level trigger are pushed from the VME Read-Out Drivers (RODs) in UX15 over 1600 Read-Out Links and dedicated links to ~150 Read-Out Subsystem (ROS) PCs, at up to 100 kHz with 1600 fragments of ~1 kByte each]
Read-Out System (ROS) • 153 ROSs installed in USA15 (completed in August 2006) • 149 = 100% of those foreseen for the “core” detectors, plus 4 hot spares • All fully equipped with ROBINs • PC properties: • 4U, 19" rack-mountable PC • Motherboard: Supermicro X6DHE-XB • CPU: single 3.4 GHz Xeon • RAM: 512 MB • Redundant power supply; network booted (no local hard disk); remote management via IPMI • Network: 2 GbE onboard (1 for the control network) + 4 GbE on a PCI-Express card (1 for LVL2 + 1 for event building) • Spares: 12 full systems in the pre-series as hot spares; 27 PCs purchased (7 for new detectors) • ROBINs (Read-Out Buffer Inputs): • Input: 3 ROLs (S-Link interface); output: 1 PCI interface and 1 GbE interface (for upgrade if needed); buffer memory: 64 MB (~32k fragments per ROL) - see the buffering sketch below • ~700 ROBINs installed, plus spares; + 70 ROBINs ordered (new detectors and more spares) • System is complete: no further purchasing/procurement foreseen
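The "~32k fragments per ROL" figure can be rationalised with a little arithmetic. The sketch below assumes that roughly half of the 64 MB is usable for ~1 kB fragments once memory organisation and headers are accounted for; that fraction is an assumption chosen to match the quoted number, not something stated on the slide.

```python
# Rough check of the ROBIN buffering figures above.

buffer_per_rol  = 64e6   # bytes of buffer memory per Read-Out Link
fragment_size   = 1e3    # bytes, typical fragment size per ROL per event
usable_fraction = 0.5    # ASSUMPTION: overhead for paging/headers, tuned to match ~32k

fragments = buffer_per_rol * usable_fraction / fragment_size
print(f"fragments buffered per ROL : ~{fragments / 1e3:.0f}k")   # ~32k

lvl1_rate = 100e3   # Hz, maximum LVL1 accept rate
print(f"buffering time if nothing is deleted : ~{fragments / lvl1_rate:.2f} s")  # ~0.32 s
```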
[Diagram: Trigger/DAQ architecture, continued - the RoI Builder in USA15 sends the Regions of Interest to the LVL2 supervisor and the ~500-node LVL2 farm in SDX1; the pROS stores the LVL2 output; the DataFlow Manager and ~100 SubFarm Inputs (SFIs) sit on Gigabit Ethernet switches and pull event data (partial events at up to 100 kHz, full events at ~3 kHz) from the ROSs via event data requests, requested event data and delete commands; built events go to the ~1800-node Event Filter (EF)]
Central Switches • Data network (event-builder and LVL2 traffic): Force10 E1200, 1 chassis + blades for 1 and 10 GbE as required • For July 2008: 14 blades in 2 chassis, ~700 GbE ports at line speed • Final system: extra blades • Back-end network (to the Event Filter and the SFOs): Force10 E600, 1 chassis + blades for 1 GbE • For July 2008: blades as required, following the EF evolution • Final system: extra blades • Core online/control network (run control, databases, monitoring samplers): Force10 E600, 1 chassis + blades for 1 GbE • For July 2008: 2 chassis + blades, following the full-system evolution • Final system: extra blades
HLT Farms • Final size for the maximum L1 rate (TDR): ~500 PCs for L2, ~1800 PCs for the EF (multi-core technology -> same number of boxes, more applications) • Recently decided: ~900 of the above will be XPUs, connected to both the L2 and EF networks • 161 XPU PCs installed: 130 x 8 cores (CPU: 2 x Intel E5320 quad-core 1.86 GHz, RAM: 1 GB/core, i.e. 8 GB) + 31 x 4 cores • Cold-swappable power supply; network booted; local hard disk; remote management via IPMI • Network: 2 GbE onboard, 1 for the control network, 1 for the data network; VLAN for the connection to both the data and back-end networks • For July 2008: a total of 9 L2 + 27 EF racks, as in the TDR for the 2007 run (1 rack = 31 PCs) • Final system: a total of 17 L2 + 62 EF racks, of which 28 (of 79) racks with XPU connection • (a farm-sizing sketch follows below)
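As a sanity check on these farm sizes, the sketch below works out the Event Filter core count implied by the rates and the ~4 s per event quoted earlier (round numbers only):

```python
# Event Filter farm sizing implied by the numbers in this talk.

eb_rate        = 2e3    # Hz, event rate into the Event Filter (LVL2 accepts)
ef_time        = 4.0    # s, average processing time per event in the EF
cores_per_node = 8      # the newer XPU nodes above are 2 x quad-core

cores_needed = eb_rate * ef_time
print(f"EF processing slots needed : ~{cores_needed:.0f}")                     # ~8000
print(f"EF nodes (8 cores each)    : ~{cores_needed / cores_per_node:.0f}")    # ~1000
# Of the same order as the ~1800 EF PCs foreseen at the maximum LVL1 rate, where the
# EF input is ~3 kHz and not all nodes have 8 cores.
```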
DAQ/HLT in SDX1 - ~90 racks (mostly empty…)
[Diagram: complete Trigger/DAQ architecture - from the VME RODs in UX15 (1600 Read-Out Links, ~1 kByte fragments pushed at up to 100 kHz) to ~150 ROS PCs in USA15; the RoI Builder, LVL2 supervisor and ~500-node LVL2 farm; the DataFlow Manager, pROS and ~100 SFIs pulling partial events at up to 100 kHz and full events at ~3 kHz over Gigabit Ethernet; the ~1800 dual-CPU Event Filter nodes in SDX1; 6 SubFarm Outputs (SFOs) with local storage sending data at an event rate of ~200 Hz to the CERN computer centre; the Timing, Trigger and Control (TTC) system closes the loop back to the first-level trigger and the detectors]
SubFarm Output • 6 SFO PCs installed (5 + 1 hot spare) • 5U, 19" rack-mountable PC • Motherboard: Intel • CPU: 2 x Intel E5320 dual-core 2.0 GHz • RAM: 4 GB • Quadruple cold-swappable power supply; network booted; local hard disk (recovery from crashes, pre-loading events for testing, e.g. FDR); remote management via IPMI • Disks: 3 SATA RAID controllers, 24 disks x 500 GB = 12 TB • Network: 2 GbE onboard (1 for control and IPMI, 1 for data) + 3 GbE on a PCIe card for data • System is complete, no further purchasing foreseen • The required bandwidth (300 MB/s) is already available, to facilitate detector commissioning and calibrations in the early phase (the storage arithmetic is sketched below)
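A last bit of arithmetic on the output stage, using the figures above. Treating the 6 x 12 TB of local disk as a buffer for periods when data cannot be shipped onward to the CERN computer centre is my reading of the design, not a statement from the slides.

```python
# Rough arithmetic on the SFO output stage.

output_rate = 300e6        # bytes/s aggregate to mass storage (~1 "CD" per second)
local_disk  = 6 * 12e12    # bytes: 6 SFO PCs x 12 TB of local RAID disk (assumed usable)

print(f"events/s to storage at 1.5 MB/event : ~{output_rate / 1.5e6:.0f}")   # ~200 Hz
print(f"hours of local buffering at full output rate : "
      f"~{local_disk / output_rate / 3600:.0f} h")   # ~67 h, i.e. a couple of days
```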
TDAQ System operational at Point-1 • Routinely used for: • Debugging and standalone commissioning of all detectors after installation • TDAQ technical runs - using physics selection algorithms in the HLT farms on simulated data pre-loaded into the Read-Out System • Commissioning runs of integrated ATLAS - taking cosmic data through the full TDAQ chain (up to Tier-0) after final detector integration