


Design, Deployment and Functional Tests of the online Event Filter for the ATLAS experiment


Andrea Negri, INFN Pavia

on behalf of the ATLAS HLT Group



ATLAS T/DAQ system

ATLAS parameters:

  • CM energy: 14 TeV
  • Luminosity: 10³⁴ cm⁻²s⁻¹
  • Collision rate: 40 MHz
  • Event rate: ~1 GHz
  • Detector channels: ~10⁸
  • 1 selected event every million

Trigger/DAQ chain (fed by the Muon, Calo and Inner detectors; rates and latencies per level):

  • Level 1 trigger (LVL1)
    • Hardware based
    • Coarse granularity calo/muon data
    • Input 40 MHz from the pipeline memories; latency ~2 μs; output ~75 kHz to the Readout Drivers (ROD)
  • Level 2 trigger (LVL2)
    • Detector sub-region (RoI) processed
    • Full granularity for all subdetectors
    • Fast rejection steering
    • Reads from ~1600 Readout Buffers (ROB); latency ~10 ms; output ~2 kHz to the Event Builder network
  • Event Filter (EF)
    • Full event access
    • “Seeded” by LVL2 result
    • Algorithms inherited from offline
    • EF farm of ~1000 CPUs; latency ~1 s; output ~200 Hz
  • Storage: ~300 MB/s
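A quick consistency check on these figures (the ~1.5 MB average raw event size is an assumption, not quoted on the slide): 200 Hz × ~1.5 MB/event ≈ 300 MB/s, matching the storage rate above.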


Event Filter system: Constraints and Requirements

[Diagram: Readout system → Event Builder network → SFIs (SubFarm Input) → EF sub-farms → SFOs (SubFarm Output) → Storage]

  • The computing power of the EF is organized as a set of independent sub-farms, connected to different output ports of the EB switch

    • This makes it possible to partition the EF resources and run multiple concurrent DAQ instances (e.g. for calibration and commissioning purposes)
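A minimal sketch of what such a partitioning could look like in configuration terms; the structure and names below are purely illustrative assumptions, not the actual TDAQ configuration schema:

```cpp
// Illustrative only: one way to describe which EF sub-farms belong to
// which concurrent DAQ instance (e.g. physics vs. calibration).
#include <string>
#include <vector>

struct SubFarm {
    std::string name;                 // e.g. "EF-subfarm-03" (hypothetical)
    std::vector<std::string> nodes;   // processing hosts behind one EB switch port
};

struct DaqInstance {
    std::string purpose;              // "physics", "calibration", "commissioning", ...
    std::vector<SubFarm> subfarms;    // sub-farms assigned exclusively to this instance
};
```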


  • General requirements

    • Scalability, flexibility and modularity

    • Hardware independence in order to follow technology trends

    • Reliability and fault tolerance

      • Avoid data losses

      • This can be critical, since the EF algorithms are inherited from the offline ones


A common framework for offline and online, and similar reconstruction algorithms:

  • Avoids duplication of work

  • Simplifies performance/validation studies

  • Avoids selection biases

  • Common database access tools


Design features


  • Each processing node manages its own connection with the SFI and SFO elements, which implement the server part of the communication protocol

    • Allows dynamic insertion/removal of sub-farms in the EF or of processing hosts in a sub-farm

    • Allows geographically distributed implementations

    • Supports multiple SFI connections: dynamic re-routing in case of SFI malfunction (depending on the network topology)

    • Avoids single points of failure: a faulty processing host does not interfere with the operations of the other sub-farm elements

  • In order to ensure data security in case of event-processing problems, the design is based on decoupling the data-processing and data-flow functionalities
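A minimal sketch of the client-initiated connection handling described above, in which a processing node tries a list of SFI endpoints and falls back to the next one if an SFI is unreachable; hostnames, ports and the absence of any real protocol are assumptions for illustration:

```cpp
// Sketch (C++/POSIX): node-side connection to one of several SFI servers,
// falling back to the next endpoint if one is down or unreachable.
#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>
#include <vector>

int connect_to_any_sfi(const std::vector<std::string>& sfi_hosts, const char* port) {
    for (const auto& host : sfi_hosts) {
        addrinfo hints{}, *res = nullptr;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host.c_str(), port, &hints, &res) != 0) continue;
        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd >= 0 && connect(fd, res->ai_addr, res->ai_addrlen) == 0) {
            freeaddrinfo(res);
            return fd;                 // this SFI becomes the node's event source
        }
        if (fd >= 0) close(fd);
        freeaddrinfo(res);             // SFI unreachable: try the next one
    }
    return -1;                         // no SFI available
}
```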


DataFlow / DataProcessing decoupling

[Diagram: incoming events flow from the SFI into the EFD on node n, are handed to PTs #1…#n through the PTIO interface, and accepted events are sent on to the SFO]

In each EF processing host:

  • Data-flow functionalities are provided by the Event Filter Dataflow (EFD) process, which:

    • Manages the communication with SFI and SFO

    • Stores the events during their transit through the Event Filter

    • Makes the events available to the Processing Tasks

  • The Processing Tasks (PTs) perform the data processing and event selection, running the EF algorithms in the standard ATLAS offline framework

    • A pluggable interface (PTIO) allows PTs to access the data-flow part via a Unix domain socket
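A minimal sketch of the kind of request/reply exchange a PT could perform over such a Unix domain socket; the socket path and the message layout are illustrative assumptions, not the actual PTIO protocol:

```cpp
// Sketch (C++/POSIX): a processing task asking the local EFD for work over
// a Unix domain socket. Path and messages are hypothetical.
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>
#include <cstdio>

int main() {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_un addr{};
    addr.sun_family = AF_UNIX;
    std::strncpy(addr.sun_path, "/tmp/efd_ptio.sock", sizeof(addr.sun_path) - 1);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("connect"); return 1;
    }

    const char request[] = "GET_EVENT";          // ask for the next event
    write(fd, request, sizeof(request));

    std::uint64_t reply[2] = {0, 0};             // [offset, size] inside the sharedHeap
    read(fd, reply, sizeof(reply));
    std::printf("event at heap offset %llu, %llu bytes\n",
                static_cast<unsigned long long>(reply[0]),
                static_cast<unsigned long long>(reply[1]));

    // ... run the EF algorithms on the event, then report the decision ...
    const char decision[] = "ACCEPT";
    write(fd, decision, sizeof(decision));
    close(fd);
    return 0;
}
```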



Fault Tolerance: the sharedHeap (1)


  • When an event enters the processing node it is stored in a shared memory area (the sharedHeap) used to provide events to the PTs

  • A PT, using the PTIO interface (socket):

    • Requests an event

    • Obtains a pointer to the sharedHeap portion that contains the event to be processed (the PTIO maps this portion into memory)

    • Processes the event

    • Communicates the filtering decision back to the EFD

  • A PT cannot corrupt the events because its map is read-only

    • Only the EFD manages the sharedHeap

    • If a PT crashes, the event is still owned by the EFD, which may assign it to another PT or force-accept it
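A minimal sketch of how a PT-side read-only mapping of its event could look, assuming the sharedHeap is backed by a file and the offset handed over by the EFD is page-aligned; the function and file names are hypothetical:

```cpp
// Sketch (C++/POSIX): map one event read-only. Any accidental write through
// this mapping faults instead of corrupting the copy owned by the EFD,
// which keeps the only writable mapping of the sharedHeap.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

const unsigned char* map_event_read_only(const char* heap_file,
                                         off_t offset, std::size_t size) {
    int fd = open(heap_file, O_RDONLY);              // sharedHeap backing file
    if (fd < 0) return nullptr;

    void* p = mmap(nullptr, size, PROT_READ, MAP_SHARED, fd, offset);
    close(fd);                                       // the mapping survives the close
    return p == MAP_FAILED ? nullptr
                           : static_cast<const unsigned char*>(p);
}
```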



Fault Tolerance: the sharedHeap (2)


  • To provide fault tolerance also in case of an EFD crash, the sharedHeap is implemented as a memory-mapped file

    • The OS itself directly manages the actual write operations, avoiding useless disk I/O overhead

  • The raw events can be recovered by reloading the sharedHeap file at EFD restart

  • The system can be left out of sync only in case of a power cut, OS crash or disk failure

    • These occurrences are completely decoupled from the event type and topology and therefore do not introduce physics biases in the recorded data
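A minimal sketch of such a file-backed sharedHeap as the EFD side could create it; the path, size and function name are illustrative assumptions:

```cpp
// Sketch (C++/POSIX): a file-backed sharedHeap. With MAP_SHARED the kernel
// owns the dirty pages and flushes them lazily, so the EFD event loop never
// issues explicit write() calls, yet the events can be recovered from the
// file after an EFD restart.
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>

void* create_shared_heap(const char* path, std::size_t heap_size) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(heap_size)) < 0) { close(fd); return nullptr; }

    void* heap = mmap(nullptr, heap_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);                         // the mapping keeps the file usable
    return heap == MAP_FAILED ? nullptr : heap;
}
```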



Flexibility and Modularity

[Diagram: implementation example of a configurable EFD dataflow on node n, with Input tasks from two SFIs, Monitoring, Sorting, Calibration and Trash tasks, ExtPTs tasks serving PTs (#1, #2, #3, #a, #b) via PTIO, and Output tasks feeding separate SFOs for the main output stream, the calibration data and a debugging channel]

  • The EFD functionality is divided into different specific tasks that can be dynamically interconnected to form a configurable EF dataflow network

  • The internal dataflow is based on reference passing

    • Only the pointer to the event (stored in the sharedHeap) flows between the different tasks

  • Tasks that implement interfaces to external components are executed by independent threads (multi-threaded design)

    • In order to absorb communication latencies and enhance performance
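A minimal sketch of the reference-passing idea described above: only the event pointer travels between task objects, which can be wired together at configuration time. The class names are illustrative, not the real EFD ones:

```cpp
// Sketch (C++): EFD-style tasks exchanging only a pointer to the event,
// whose payload stays in the sharedHeap and is never copied.
struct Event;                                   // lives in the sharedHeap

class Task {
public:
    virtual ~Task() = default;
    virtual void process(const Event* ev) = 0;  // handle and/or forward the pointer
    void connect(Task* next) { next_ = next; }  // wiring is done at configuration time
protected:
    void forward(const Event* ev) { if (next_) next_->process(ev); }
    Task* next_ = nullptr;
};

class Sorting : public Task {                   // e.g. route physics vs. calibration
public:
    void process(const Event* ev) override { forward(ev); }
};

class Output : public Task {                    // e.g. hand the event over to an SFO
public:
    void process(const Event* ev) override { /* send to SFO, then release */ (void)ev; }
};
```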


Functional Tests

[Plot (quad Xeon 2.5 GHz, 4 GB node): curves for a dummy PT and a real PT, with the node memory limit indicated]

  • Verified the robustness of the architecture

    • Week-long runs (>10⁹ events) without crashes or event losses (even when randomly killing PTs)

  • The EFD-PT communication mechanism scales with the number of running PTs

  • SFI-EFD-SFO communication protocol

    • Exploits gigabit links for realistic event sizes

    • Rate limitations for small event sizes (or remote-farm implementations)

      • The EFD asks for a new event only after the previous one has been received

      • The rate is therefore limited by the round-trip time (a rough estimate follows this list)

      • Improvements under evaluation

  • Scalability tests carried out on 230 nodes

    • Up to: 21 subFarms, 230 EFDs, 16000 PTs
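A rough illustration of that round-trip limit, with an assumed (not measured) RTT: if the EFD only requests the next event after the previous one has arrived, the per-connection rate is bounded by 1/RTT; an RTT of 1 ms therefore caps a single SFI-EFD connection at about 1 kHz regardless of the link bandwidth, which is why small events and long-distance remote-farm links expose the limitation.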



ATLAS Combined Test Beam



Test Beam Layout

[Diagram: each subdetector of the combined test beam (Pixel, SCT, TRT, RPC, MDT, TGC, CSC, LAr, Tile, LVL1calo, LVL1mu, i.e. tracker, calorimeters and muon system) feeds its own ROS; a local LVL2 farm and a pROS, which contains the LVL2 result that steers/seeds the EF processing, are connected through the Event Builder (DFM, SFI) and the GbE data network to a local EF farm and an SFO writing to storage; a further EF farm at Meyrin (a few km away), plus remote farms in Poland, Canada and Denmark used for infrastructure tests only, complete the EF resources, together with monitoring, run control and a gateway]



Test Beam Online Event Processing

[Screenshot: presenter main window]

  • Online event monitoring

    • Online histograms obtained by merging data published by the different PTs and gathered by a TDAQ monitoring process (the Gatherer); a minimal merging sketch follows this list

  • Online event reconstruction

    • E.g. track fitting

  • Online event selection

    • Beam composed of μ, π, e

    • Track reconstruction in the muon chambers allowed the selection of μ events

    • Events labelled according to the selection and/or sent to different output streams

  • Validation of the HLT muon slice (work in progress)

    • Transfer of the LVL2 result to the EF (via the pROS) and its decoding

    • Steering and seeding of the EF algorithm
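In the simplest case, the merging performed for such monitoring histograms amounts to a bin-wise sum over the histograms published by the individual PTs. A minimal sketch of that operation, using plain vectors in place of the actual histogram classes (the real Gatherer API is not shown):

```cpp
// Sketch (C++): bin-wise merge of per-PT histograms into one farm-wide
// histogram; counts from independent PTs simply add.
#include <cstddef>
#include <vector>

using Histogram = std::vector<double>;            // one entry per bin

Histogram merge(const std::vector<Histogram>& per_pt) {
    if (per_pt.empty()) return {};
    Histogram total(per_pt.front().size(), 0.0);
    for (const auto& h : per_pt)
        for (std::size_t bin = 0; bin < total.size(); ++bin)
            total[bin] += h[bin];
    return total;
}
```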



Online Event Processing

[Plot: residuals of the segment fits in the muon chambers, σ = 61 μm]




Online Event Processing

[Event displays: hits in the muon chambers; energy deposition in the calorimeter cells]




Online Event Processing

[Diagram: path of the LVL2 result ("L2 Result") from the local LVL2 farm into the pROS, through the data network, DFM and SFI event building together with the detector ROS data, and on to the local EF farm]



Conclusions

  • Design: the EF is designed to cope with the challenging online requirements

    • Scalable design, in order to allow dynamic hot-plug of processing resources, to follow technology trends and to allow geographically distributed implementations

    • High level of data security and fault tolerance, via the decoupling between data-processing and data-flow functionalities and the use of a memory-mapped file

    • Modularity and flexibility, in order to allow different EF dataflows

  • Functional tests: the design was validated on different test beds

    • Proved the robustness, scalability and data security mechanisms of the design

    • No design limitations observed

  • Deployment on test beam setup

    • Online event processing, reconstruction and selection

    • Online validation of the HLT muon full slice

