
An Overview over Online Systems at the LHC

Invited Talk at NSS-MIC 2012, Anaheim CA, 31 October 2012

Beat Jost, CERN

Acknowledgments and Disclaimer

I would like to thank David Francis, Frans Meijers and Pierre vande Vyvre for lots of material on their experiments

I would also like to thank Clara Gaspar and Niko Neufeld for many discussions

There are surely errors and misunderstandings in this presentation which are entirely due to my shortcomings

Outline
  • Data Acquisition Systems
    • Front-end Readout
    • Event Building
  • Run Control
    • Tools and Architecture
  • Something New – Deferred Triggering
  • Upgrade Plans

Role of the Online System
  • In today’s HEP experiments millions of sensors are distributed over hundreds of m² and read out dozens of millions of times per second
  • The data of all these sensors have to be collected and assembled in one point (computer, disk, tape), after rate reduction through event selection
    • This is the Data Acquisition (DAQ) system
  • This process has to be controlled and monitored (by the operator)
    • This is the Run Control System
  • Together they form the Online system

And, by the way, it’s a pre-requisite for any physics analysis

Setting the Scene – DAQ Parameters

A generic LHC DAQ system

[Diagram: generic DAQ data flow. Sensors feed the Front-End Electronics (on/near detector), which feed Aggregation/(Zero Suppression) stages; off detector follow Zero Suppression/Data Formatting/Data Buffering, the Event-Building Network, the HLT Farm and Permanent Storage.]

  • Today’s data rates are too big to let all the data flow through a single component

Implementations – Front-End Readout
  • The DAQ system can be viewed as a gigantic funnel, collecting the data from the sensors into a single point (CPU, storage) after selecting interesting events.
  • In general the responses of the sensors on the detector are transferred (digitized or analogue) over point-to-point links to some form of first-level concentrator
    • Often there is already a concentrator on the detector electronics, e.g. the readout chips of silicon detectors.
    • The further upstream in the system, the more the technologies at this level differ, also within the experiments
      • In LHCb the data of the vertex detector are transmitted in analogue form to the aggregation layer and digitized there
  • The subsequent level of aggregation is usually also used to buffer the data and format them for the event builder and high-level trigger
  • Somewhere along the way, zero suppression is performed (see the sketch below)
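
To make the zero-suppression step concrete, here is a minimal Python sketch (illustration only: the threshold, pedestals and data layout are assumptions, and the real implementations live in detector-specific front-end firmware or software): only channels whose pedestal-subtracted signal exceeds a threshold are kept, so empty channels consume no bandwidth downstream.

# Minimal zero-suppression sketch (hypothetical data layout)
def zero_suppress(samples, pedestals, threshold):
    """Keep only (channel, signal) pairs significantly above the pedestal."""
    kept = []
    for channel, adc in enumerate(samples):
        signal = adc - pedestals[channel]   # per-channel pedestal subtraction
        if signal > threshold:              # drop channels without a real hit
            kept.append((channel, signal))
    return kept

# 8 channels, mostly noise: only channels 2 and 6 survive
print(zero_suppress([3, 2, 40, 3, 1, 2, 55, 2], [2] * 8, threshold=10))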

Readout Links of LHC Experiments

[Table: readout links of the LHC experiments (contents not transcribed); includes flow-control information]

Implementations – Event Building
  • Event building is the process of collecting all the data fragments belonging to one trigger in one point, usually the memory of a processor in a farm (see the sketch after this list).
  • The implementation typically uses a switched network
    • ATLAS, ALICE and LHCb use Ethernet
    • CMS builds events in two steps, first over Myrinet, then over Ethernet
  • Of course the implementations in the different experiments differ in detail from the ‘generic’ one, sometimes quite drastically.
    • ATLAS implements an additional level of trigger, thus reducing the overall requirements on the network capacity
    • CMS does event building in two steps, with Myrinet (fibre) and 1 GbE (copper) links
    • ALICE implements the HLT in parallel to the event builder, thus allowing it to be bypassed completely
    • LHCb and ALICE use only one level of aggregation downstream of the front-end electronics.
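
As a toy illustration of the event-building step described above (the source names and fragment format are invented; the real event builders do this over a network at orders of magnitude higher rates), fragments are keyed by their event number and an event is complete once every readout source has contributed:

# Toy event-builder sketch (hypothetical sources and payloads)
from collections import defaultdict

class EventBuilder:
    def __init__(self, expected_sources):
        self.expected = set(expected_sources)
        self.pending = defaultdict(dict)        # event_id -> {source: payload}

    def add_fragment(self, event_id, source, payload):
        self.pending[event_id][source] = payload
        if set(self.pending[event_id]) == self.expected:
            return self.pending.pop(event_id)   # complete event, hand to the HLT
        return None                             # still waiting for fragments

builder = EventBuilder(expected_sources=["ECAL", "TRACKER", "MUON"])
builder.add_fragment(42, "ECAL", b"...")
builder.add_fragment(42, "TRACKER", b"...")
print(sorted(builder.add_fragment(42, "MUON", b"...")))   # event 42 is complete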

Event Building in the LHC Experiments

Controls Software – Run Control
  • The main task of the run control is to guarantee that all components of the readout system are configured in a coherent manner according to the desired DAQ activity.
    • 10000s of electronics components and software processes
    • 100000s of readout sensors
  • Topologically implemented in a deep hierarchical tree-like architecture with the operator at the top
  • In general the configuration process has to be sequenced so that the different components can collaborate properly; this leads to Finite State Machines (FSMs), illustrated by the toy sketch after this list
  • Inter-Process(or) communication (IPC) is an important ingredient to trigger transitions in the FSMs
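
As a toy sketch of this idea (states and commands are hypothetical, and the experiments use dedicated frameworks such as SMI++ rather than anything this simple), a hierarchical FSM propagates commands down the control tree and only reports a coherent state upwards once all of its children have followed:

# Toy hierarchical run-control FSM (hypothetical states and commands)
class ControlNode:
    TRANSITIONS = {("NOT_READY", "configure"): "READY",
                   ("READY", "start"): "RUNNING",
                   ("RUNNING", "stop"): "READY"}

    def __init__(self, name, children=()):
        self.name, self.children, self.state = name, list(children), "NOT_READY"

    def command(self, cmd):
        for child in self.children:             # sequence the tree top-down
            child.command(cmd)
        new_state = self.TRANSITIONS.get((self.state, cmd))
        if new_state and all(c.state == new_state for c in self.children):
            self.state = new_state              # report a coherent state upwards

daq = ControlNode("DAQ", [ControlNode("ECAL"), ControlNode("TRACKER")])
daq.command("configure")
daq.command("start")
print(daq.state)                                # -> RUNNING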

Control Tools and Architecture

[Diagram: example of the LHCb controls architecture, showing the Detector Control and Run Control hierarchies]

GUI Example – LHCb Run Control
  • Main operation panel for the shift crew
  • Each sub-system can (in principle) also be driven independently

Error Recovery and Automation

Snippet of forward chaining (Big Brother in LHCb):

object: BigBrother
   state: READY
      when ( LHCb_LHC_Mode in_state PHYSICS ) do PREPARE_PHYSICS
      when ( LHCb_LHC_Mode in_state BEAMLOST ) do PREPARE_BEAMLOST
      ...
   action: PREPARE_PHYSICS
      do Goto_PHYSICS LHCb_HV
      wait ( LHCb_HV )
      move_to READY
   action: PREPARE_BEAMLOST
      do STOP_TRIGGER LHCb_Autopilot
      wait ( LHCb_Autopilot )
      if ( VELOMotion in_state {CLOSED,CLOSING} ) then
         do Open VELOMotion
      endif
      do Goto_DUMP LHCb_HV
      wait ( LHCb_HV, VELOMotion )
      move_to READY
   ...

  • No system is perfect. There are always things that go wrong
    • E.g. de-synchronisation of some components
  • Two approaches to recovery
    • Forward chaining
      • We’re in the mess. How do we get out of it?
        • ALICE and LHCb: SMI++ automatically acts to recover
        • ATLAS: DAQ Assistant (CLIPS) gives operator assistance
        • CMS: DAQ Doctor (Perl) gives operator assistance
    • Backward chaining
      • We’re in the mess. How did we get there?
        • ATLAS: Diagnostic and Verification System (DVS)
  • Whatever one does: One needs lots of diagnostics to know what’s going on.

Summary
  • All LHC Experiments are taking data with great success
    • All implementations work nicely
    • The systems are coping with the extreme running conditions, sometimes way beyond the original requirements
      • ATLAS and CMS have to cope with up to 40 interactions per bunch crossing (the requirement was ~20-25); LHCb runs with ~1.8 interactions instead of the 0.4 foreseen.
      • Significantly bigger event sizes
      • Significantly longer HLT processing
  • The availability of the DAQ systems is above 99%
    • Usually it’s not the DAQ hardware that doesn’t work
  • The automatic recovery procedures implemented keep the overall efficiency typically above 95%, mainly by faster reaction and avoidance of operator mistakes.

Something New – Deferred Trigger

[Diagram: deferred-trigger data flow in a farm node. MEPrx receives MEPs; a 'MEP buffer full?' decision routes events either into the Moore HLT processes, whose Results go to DiskWr, or via OvrWr into an Overflow area on local disk, from which the Reader feeds them back for processing.]


The inter-fill gaps (dump to stable-beams) of the LHC can be significant (many hours, sometimes days)

During this time the HLT farm is basically idle

The idea is to use this idle CPU time for executing the HLT algorithms on data that was written to a local disk during the operation of the LHC.
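
A toy sketch of the deferral decision in a farm node (buffer size, file layout and names are assumptions for illustration, not the actual LHCb code): events that do not fit into the HLT input buffer during a fill are written to local disk instead, to be processed in the next inter-fill gap.

# Toy sketch of trigger deferral (hypothetical names and sizes)
import collections, pathlib, pickle

BUFFER_CAPACITY = 1000                          # assumed HLT input-buffer depth
hlt_input = collections.deque()
overflow_dir = pathlib.Path("overflow")
overflow_dir.mkdir(exist_ok=True)

def receive_event(event_id, event):
    if len(hlt_input) < BUFFER_CAPACITY:
        hlt_input.append(event)                 # normal path: HLT runs now
    else:                                       # deferred path: park the event on
        path = overflow_dir / f"{event_id}.raw" # local disk for the inter-fill gap
        path.write_bytes(pickle.dumps(event))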

Deferred Trigger – Experience

[Plot: number of overflow files on disk versus time, annotated with beam dumps, new fills, starts of data taking, the starts and end of deferred-HLT processing, and a period of online troubles]

  • Currently deferring ~25% of the L0 trigger rate
    • ~250 kHz of triggers
  • Data are stored on 1024 nodes equipped with 1 TB local disks
  • Great care has to be taken
    • to keep track of which nodes hold files of which runs
    • to ensure events are not duplicated
      • during deferred HLT processing, files are deleted from disk as soon as they are opened by the reader (see the sketch below)
  • Starting and stopping is automated according to the state of the LHC
    • No stress for the shift crew
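
The duplicate-prevention point above can be sketched as follows (file layout and names are assumptions; the sketch simply relies on the POSIX behaviour that an unlinked file stays readable through an already-open handle): each overflow file is deleted the moment it is opened, so a crashed or restarted reader can never process the same file twice.

# Toy sketch of the deferred-HLT reader (hypothetical file layout)
import os, pathlib, pickle

def replay_deferred(overflow_dir, run_hlt):
    for path in sorted(pathlib.Path(overflow_dir).glob("*.raw")):
        with open(path, "rb") as f:
            os.unlink(path)        # delete on open: no later pass can see this file
            event = pickle.load(f) # the open handle still reads the data
        run_hlt(event)

During a fill the reader simply does not run; starting and stopping it according to the LHC state is the part that is automated for the shift crew.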

Upgrade Plans
  • All four LHC experiments have upgrade plans for the nearer or more distant future
    • Timescale 2015
      • CMS
        • integration of a new point-to-point link (~10 Gbps) to the new back-end electronics (in µTCA) of new trigger/detector systems
        • replacement of Myrinet with 10 GbE (TCP/IP) for data aggregation into PCs, and InfiniBand (56 Gbps) or 40 GbE for event building
      • ATLAS: merging of the L2 and HLT networks and CPUs
        • Each CPU in the farm will run both triggers
    • Timescale 2019
      • ALICE: increase of the acceptable trigger rate from 1 kHz to 50 kHz for heavy-ion operation
        • New front-end readout link
        • TPC continuous readout
      • LHCb: elimination of the hardware trigger (readout rate 40 MHz)
        • Read out the front-end electronics for every bunch crossing
          • New front-end electronics
          • Zero suppression on/near the detector
        • Network/farm capacity increase by a factor of 40 (3.2 TB/s, ~4000 CPUs)
        • Network technology: InfiniBand or 10/40/100 Gb Ethernet
        • No architectural changes
    • Timescale 2022 and beyond
      • CMS & ATLAS: implementation of a hardware track trigger running at 40 MHz, and surely many other changes…
