
New LHCb Trigger and DAQ Strategy: Gigabit-Ethernet-based System Architecture

This paper presents the system architecture of the new LHCb Trigger and Data Acquisition (DAQ) strategy, which is based on Gigabit Ethernet technology. It covers the two software trigger levels, the front-end electronics, the event-building network, the CPU farm, and queuing latencies, and discusses the challenges of context-switching latency and scheduling on a multi-tasking OS. The architecture is scalable and built from commercial components.


Presentation Transcript


  1. The New LHCb Trigger and DAQ Strategy: A System Architecture based on Gigabit-Ethernet
  RT2003, Montreal
  Niko Neufeld, CERN-EP & Univ. de Lausanne, for the LHCb Collaboration

  2. LHCb Trigger

  3. Two Software Trigger Levels
  • Both run on commercial PCs
  • Level-1
    • uses a reduced data set: only part of the sub-detectors (mostly the vertex detector and some tracking), with limited-precision data
    • has a limited latency, because the data must be buffered in the front-end electronics while it decides
    • reduces the event rate from 1.1 MHz to 40 kHz by selecting events with displaced secondary vertices
  • High Level Trigger (HLT)
    • uses all detector information
    • reduces the event rate from 40 kHz to 200 Hz for permanent storage

  4. Features
  • Two data streams to handle:
    • Level-1 trigger: 4.8 kB @ 1.1 MHz
    • High Level Trigger: 38 kB @ 40 kHz
  • Fully built from commercial components
  • (Gigabit) Ethernet throughout
  • Push-through protocol, no re-transmissions
  • Centralized flow control
  • Latency control for Level-1 at several stages
  • Scalable by adding CPUs and/or switch ports
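As a rough cross-check of the two streams, a back-of-the-envelope payload calculation (a minimal Python sketch using only the fragment sizes and rates quoted above; the architecture slide quotes somewhat higher bandwidths, presumably because transport overhead and headroom are included):

    # Aggregate payload throughput of the two streams (sizes and rates from the slide above)
    streams = {
        "Level-1": {"event_size_kB": 4.8, "rate_Hz": 1.1e6},
        "HLT":     {"event_size_kB": 38.0, "rate_Hz": 40e3},
    }
    for name, s in streams.items():
        gb_per_s = s["event_size_kB"] * 1e3 * s["rate_Hz"] / 1e9
        print(f"{name}: {gb_per_s:.1f} GB/s payload")   # Level-1: 5.3 GB/s, HLT: 1.5 GB/s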

  5. Architecture
  [Dataflow diagram, Gigabit Ethernet throughout, with Level-1, HLT and mixed traffic shown as separate paths: the front-end electronics (FE, plus the TRM) send the HLT stream over 349 links at 40 kHz (2.3 GB/s) and the Level-1 stream over 126-240 links at 44 kHz (5.5-11.0 GB/s); a multiplexing layer of 31 switches (HLT) and 62-83 switches (Level-1) concentrates these onto 64-157 links at 88 kHz plus 33 links at 1.7 GB/s into the Readout Network; 90-153 links (5.5-10 GB/s) lead to 90-153 Sub-farm Controllers (SFCs) in front of a CPU farm of ~1400 CPUs; the L1-Decision Sorter, the TFC system and the storage system are also connected.]

  6. Front-end electronics
  • Separation of the Level-1 and HLT paths: two Ethernet links into the network
  • Data must be packaged into IPv4 packets
  • Must be able to pack several events into “super-events” to reduce the packet rate into the network (see the sketch below)
  • Must provide sufficient buffer space to allow the Level-1 trigger algorithm to decide (53 ms total)
  • Must assign the destination, which is distributed centrally (with the trigger system)
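A minimal sketch of the super-event idea (the packet layout and names are illustrative assumptions, not the actual front-end format): concatenating a fixed number of consecutive event fragments into one IPv4 datagram divides the packet rate into the network by the packing factor.

    PACKING_FACTOR = 25          # events per "super-event" (chosen parameter, see slide 13)
    L1_INPUT_RATE_HZ = 1.1e6     # Level-1 input rate

    def pack_super_event(fragments):
        """Concatenate (event_id, payload) fragments into one datagram payload.
        A 2-byte header records how many events are inside, so the receiving
        SFC can split the super-event up again."""
        header = len(fragments).to_bytes(2, "big")
        body = b"".join(eid.to_bytes(4, "big") + data for eid, data in fragments)
        return header + body

    print(f"packet rate into the network: {L1_INPUT_RATE_HZ / PACKING_FACTOR / 1e3:.0f} kHz")   # 44 kHz

The 44 kHz Level-1 rate quoted on the architecture slide is exactly 1.1 MHz divided by this packing factor of 25.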

  7. Event Building Network
  • Built from Gigabit Ethernet switches (1000BaseT, a.k.a. UTP copper)
  • Try to optimise the link load (~80%, or 100 MB/s) by using (cheap) office switches to multiplex the links from the front-end
  • Need a large core switch with ~100 x 100 ports; it can be built from smaller elements
  • Need switches with sufficient buffering and good internal congestion control
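The link count follows directly from the target load: at roughly 100 MB/s of useful payload per Gigabit Ethernet link (~80% load), the minimum number of links is the aggregate throughput divided by that figure. A sketch (the counts on the architecture slide are larger because they include headroom and partitioning, which are not modelled here):

    import math

    LINK_PAYLOAD_MB_S = 100.0     # ~80% of a Gigabit Ethernet link

    def links_needed(throughput_GB_s):
        """Minimum number of GbE links for a given aggregate throughput."""
        return math.ceil(throughput_GB_s * 1000.0 / LINK_PAYLOAD_MB_S)

    print(links_needed(5.5))    # 55 links for 5.5 GB/s
    print(links_needed(10.0))   # 100 links for 10 GB/s
    print(links_needed(2.3))    # 23 links for the 2.3 GB/s HLT stream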

  8. CPU farm
  • More than 1000 PCs, partitioned into sub-farms, each consisting of
    • a Sub-farm Controller (SFC), acting as the gateway into the readout network
    • a number of worker CPUs, known only to their sub-farm
  • The SFC
    • builds events from the “super-event” fragments it receives
    • distributes them among its workers in a load-balancing manner
    • receives the trigger decisions from the workers and
      • passes them on to permanent storage (HLT events)
      • passes them to the decision sorter (Level-1 events)
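A toy model of the SFC logic described above (hypothetical class and method names; the real SFC software is not part of the slides): fragments are collected until a super-event is complete, then it is handed to the least-loaded worker.

    from collections import defaultdict

    class SubFarmController:
        """Toy SFC: event building plus load-balanced dispatch to workers."""

        def __init__(self, n_sources, workers):
            self.n_sources = n_sources               # front-end links feeding this SFC
            self.in_flight = {w: 0 for w in workers}
            self.partial = defaultdict(dict)         # super-event id -> {source: fragment}

        def receive(self, sev_id, source, fragment):
            self.partial[sev_id][source] = fragment
            if len(self.partial[sev_id]) == self.n_sources:   # event building complete
                self.dispatch(sev_id, self.partial.pop(sev_id))

        def dispatch(self, sev_id, fragments):
            worker = min(self.in_flight, key=self.in_flight.get)   # least-loaded worker
            self.in_flight[worker] += 1
            print(f"super-event {sev_id} -> {worker}")

        def decision(self, worker, accepted):
            # Worker returns its trigger decision; it would be forwarded to storage
            # (HLT) or to the L1-Decision Sorter (Level-1); the worker is free again.
            self.in_flight[worker] -= 1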

  9. Latencies
  [The architecture diagram again (front-end electronics, multiplexing layer, Readout Network, TFC system, L1-Decision Sorter, SFCs, CPU farm), annotated with where Level-1 latency accumulates:
  • queuing latencies in the network (switch buffers),
  • queuing in the SFC (“all nodes are busy with a L1 event”),
  • reception of the event and invocation of the trigger algorithm on a worker CPU.]

  10. Latencies due to queuing in the network or the farm
  • Latencies in the network can only be estimated from simulation, because they arise from competition between large packets for the same output port (the forwarding latency of a packet in a switch is negligible)
  • Latencies in the sub-farm are due to statistical fluctuations in the Level-1 processing time
  • Simulation using simulated raw data shows that the fraction of events which run into the Level-1 time-out because of this is very small (< 10⁻⁴)
  • It goes down as the sub-farms grow in number of workers (see the toy model below)
  • This can and will be measured
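The last two bullets can be illustrated with a toy Monte Carlo (the rate, processing-time distribution and latency budget below are purely illustrative assumptions, not the detailed simulation with simulated raw data mentioned above): events arrive at a sub-farm, each waits for the first free worker, and one counts how often waiting plus processing exceeds the budget.

    import heapq
    import random

    def timeout_fraction(n_events=200_000, n_workers=20, rate_hz=1_000.0,
                         mean_proc_ms=10.0, sigma_ms=5.0, budget_ms=50.0, seed=1):
        """Fraction of events whose waiting + processing time exceeds the budget."""
        random.seed(seed)
        free_at = [0.0] * n_workers           # time (ms) at which each worker is free
        heapq.heapify(free_at)
        t, late = 0.0, 0
        for _ in range(n_events):
            t += random.expovariate(rate_hz) * 1000.0      # next arrival, in ms
            start = max(t, heapq.heappop(free_at))         # wait for a free worker
            finish = start + max(0.0, random.gauss(mean_proc_ms, sigma_ms))
            heapq.heappush(free_at, finish)
            if finish - t > budget_ms:
                late += 1
        return late / n_events

    print(timeout_fraction())                  # essentially zero for these parameters
    print(timeout_fraction(n_workers=10))      # much larger for a smaller sub-farm

Increasing the number of workers per sub-farm at a fixed input rate pushes the time-out fraction down sharply, which is the behaviour claimed on the slide.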

  11. Context Switching Latency
  • What is it? On a multi-tasking OS, whenever the OS switches from one process to another, it needs a certain time to do so
  • Why do we worry? Because we run the L1 and the HLT algorithms concurrently on each CPU node
  • Why do we want this concurrency? Because we want to use every available CPU cycle

  12. Scheduling and Latency
  • Using Linux 2.5.55 we have established two facts about the scheduler:
    • Soft realtime priorities work: the Level-1 task is never interrupted until it finishes
    • The context-switch latency is low: < 10.1 ± 0.2 µs
  • The measurements were done on a high-end server (2.4 GHz Pentium 4 Xeon, 400 MHz FSB); we should have machines at least 2x faster in 2007
  • Conclusion: the scheme of running both tasks concurrently is sound
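On Linux, "soft realtime priorities" correspond to the POSIX real-time scheduling classes (SCHED_FIFO/SCHED_RR): a real-time task is not preempted by ordinary time-sharing tasks, so the Level-1 process runs to completion while the HLT process uses whatever cycles are left. A minimal sketch of the mechanism (not the actual LHCb farm software; it needs root or CAP_SYS_NICE, and the priority value is an arbitrary choice):

    import os

    def make_level1_realtime(priority=50):
        """Give the calling process a SCHED_FIFO ("soft realtime") priority so it
        is not preempted by ordinary SCHED_OTHER tasks such as the HLT process
        sharing the same node."""
        try:
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        except PermissionError:
            print("need root or CAP_SYS_NICE to switch to SCHED_FIFO")

    if __name__ == "__main__":
        make_level1_realtime()
        print("scheduling policy:", os.sched_getscheduler(0))   # 1 == SCHED_FIFO on Linux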

  13. System Design
  • God-given parameters: trigger rates, transport overheads, raw-data size distributions per front-end link
  • Chosen parameters: number of CPUs (1400), average link load (80%), maximum acceptable event-building rate at an SFC (80 kHz), packing factor of events into “super-events” for transport (25)
  • Munch through a huge spread-sheet, apply some reasonable rounding, take care of partitioning, and voilà! (A compressed version is sketched below.)
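A heavily compressed version of that spread-sheet (a sketch only; N_SFCS is an assumed round number inside the 90-153 range quoted on the architecture slide, and transport overheads, the 80 kHz SFC limit and partitioning are not modelled):

    # Inputs quoted on this and earlier slides
    L1_RATE_HZ     = 1.1e6     # Level-1 input rate
    PACKING_FACTOR = 25        # events per "super-event"
    N_CPUS         = 1400      # chosen farm size
    N_SFCS         = 100       # assumption: a round number within the quoted 90-153 range

    super_event_rate_khz = L1_RATE_HZ / PACKING_FACTOR / 1e3     # 44 kHz into the network
    rate_per_sfc_khz     = super_event_rate_khz / N_SFCS         # ~0.44 kHz of super-events per SFC
    workers_per_subfarm  = N_CPUS / N_SFCS                       # 14 worker CPUs per sub-farm

    print(super_event_rate_khz, rate_per_sfc_khz, workers_per_subfarm)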

  14. Some Numbers

  15. Summary
  • LHCb’s new software trigger system operates two read-out streams, at 40 kHz and 1.1 MHz, on the same infrastructure
  • One event stream requires hard latency restrictions to be obeyed
  • The system is based on Gigabit Ethernet and uses commercial, mostly commodity, hardware throughout
  • The system could be built today and will be affordable three years from now
