
The ATLAS Liquid Argon Calorimeters ReadOut Drivers



Presentation Transcript


  1. The ATLAS Liquid Argon Calorimeters ReadOut Drivers: a design based on 600 MHz TMS320C6414 DSPs. Julie PRAST, LAPP CNRS, FRANCE

  2. The LHC • LHC: Large Hadron Collider (27 km circumference). • The LHC is an accelerator ring in which proton beams are accelerated to an energy of 7 TeV. • The goal of the LHC is to have protons from one beam collide with protons from the other. • 4 experiments.

  3. The ATLAS experiment • Goal: explore the fundamental nature of matter and the basic forces that shape our universe. • About the size of a five-story building. • Collaboration of 2000 physicists. • 150 universities and laboratories in 34 countries.

  4. The electromagnetic calorimeter • ATLAS comprises several sub-detectors. • The electromagnetic calorimeter identifies electrons and photons and measures the energy carried by these particles. • 200 000 cells to be read at 40 MHz.

  5. The calorimeter electronic chain [Block diagram. Labels: detector; front-end electronics (FEB: amplifier, shaping, analog memory (SCA), 12-bit ADC); 1600 Glink optical links; back-end electronics (ROD, ROB); 800 Slink optical links; Timing Trigger Control (TTC).]

  6. The ROD modules • Calculate the precise energy and timing of calorimeter signals from discrete time samples (sampling period 25 ns). • Perform monitoring. • Format data for the next element in the electronics chain.

  7. The ROD module goals • 200 modules, each receiving data from 1024 calorimeter cells. • Calculate the energy from these data using optimal filtering weights: $E = \sum_i a_i (S_i - \mathrm{PED})$. • If $E >$ threshold (< 10% of cells), calculate the timing and the pulse quality factor: $E\tau = \sum_i b_i (S_i - \mathrm{PED})$ and $\chi^2 = \sum_i (S_i - \mathrm{PED} - E g_i)^2$. • Build histograms of $E$, $\tau$, $\chi^2$, ... • During calibration runs, perform signal averaging to calculate calibration constants for each channel.
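The optimal-filtering step on slide 7 can be sketched in C as follows. This is a minimal illustration, not the ROD code: it assumes five samples per pulse (as on the linear-assembly slide), uses `double` arithmetic for readability where the DSP would use integer arithmetic, and the names `of_result` and `optimal_filter` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_SAMPLES 5 /* assumption: five samples per pulse */

/* Hypothetical result record for one calorimeter cell. */
typedef struct {
    double energy;       /* E = sum a_i (S_i - PED) */
    double time;         /* tau, only computed above threshold */
    double chi2;         /* pulse quality factor, only computed above threshold */
    int above_threshold; /* 1 if E > threshold, else 0 */
} of_result;

/* Apply the optimal-filtering formulas to one cell.
 * s: samples, ped: pedestal, a/b: weights, g: normalized pulse shape. */
of_result optimal_filter(const double s[N_SAMPLES], double ped,
                         const double a[N_SAMPLES], const double b[N_SAMPLES],
                         const double g[N_SAMPLES], double threshold)
{
    of_result r = {0.0, 0.0, 0.0, 0};
    double e_tau = 0.0;

    /* A non-negative threshold guarantees E > 0 before dividing by E. */
    assert(threshold >= 0.0);

    for (size_t i = 0; i < N_SAMPLES; ++i) {
        r.energy += a[i] * (s[i] - ped);
    }

    if (r.energy > threshold) {
        r.above_threshold = 1;
        for (size_t i = 0; i < N_SAMPLES; ++i) {
            double d = s[i] - ped;
            double resid = d - r.energy * g[i];
            e_tau += b[i] * d;        /* E * tau */
            r.chi2 += resid * resid;  /* chi^2 */
        }
        r.time = e_tau / r.energy;
    }
    return r;
}
```

Only cells above threshold pay for the second loop, which matches the slide's remark that fewer than 10% of cells need the timing and quality factor.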

  8. Requirements • The ROD module must be able to process an event in less than 10 µs, including histograms. • Use of a commercial programmable processor. • A natural choice is a digital signal processor (DSP): efficient computing power for this kind of algorithm and high I/O bandwidth. • Modular design: basic components should be easy to change or upgrade. • Low power consumption.

  9. The ROD: a 9U VME board

  10. The ROD Motherboard

  11. The Staging Mode • At the beginning of LHC operation, the ROD is equipped with half of the PUs. • Level 1 trigger rate < 50 kHz. • Data from 4 FEBs are routed to one PU. • 1 DSP processes 256 channels instead of 128.

  12. The DSP Processing Unit [Block diagram. Labels: two TMS320C6414 DSPs (ports EMIFA 64-bit, EMIFB 16-bit, McBSP0, McBSP1, McBSP2, HPI 16-bit, EXT_INT); two input FPGAs (Apex 20k160) receiving FEB1/FEB3 and FEB2/FEB4; two FIFOs (4k × 16); Acex 1k30; TTC interface (BCID, TType); VME interface; JTAG config. Legend: data stream, TTC, VME.]

  13. The DSP Processing Unit [Board photo. Labels: FIFO, output FPGA, DSP, input FPGA.]

  14. PU Software Summary • Input data: serial data in FEB format. • Input FPGA: parallelized data in DSP format. • DSP: for 128 channels per event, calculation of $E$, or of $E$, $\tau$ and $\chi^2$. • Output data: 16-bit integer $E$; or 16-bit integer $E$ plus 32 bits for $\tau$, $\chi^2$ and gain; or 32-bit $E$ plus 32 bits for $\tau$, $\chi^2$ and gain. • Output FPGA: TTC data, VME interface, histograms. [Diagram labels: in, ROD, out; "programmable" part; fixed part.]

  15. The TMS320C6414: a latest-generation DSP from TI [Block diagram. Labels: CPU core C64x (instruction decoding, 64 registers, 8 calculation units); cache memory 16 kB data (shown twice); central memory 1 MB; DMA controller; external memory interface; peripherals.]

  16. The DSP code structure

  17. DSP Software • Developed with Code Composer Studio (CCS). • The whole code is written in C, except the physics loops, which are written in linear assembly and then optimized using CCS. • Limited code complexity. • Good legibility and maintainability.

  18. Example of Linear Assembly • Calculation of the cell energy: $E = \sum_i a_i (s_i - p)$. • Let the compiler do all the laborious work of parallelizing, pipelining and register allocation. • mpy s1,a1,sa1 ($a_1 s_1$) • dotp2 a23,s23,sa23 ($a_2 s_2 + a_3 s_3$) • dotp2 s45,a45,sa45 ($a_4 s_4 + a_5 s_5$) • add sa23,sa45,sa25 ($\sum_{i=2}^{5} a_i s_i$) • add sa1,sa25,sa15 ($\sum_{i=1}^{5} a_i s_i$) • sub sa15,px,e ($E = \sum_i a_i s_i - \sum_i a_i p$)
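The six instructions on slide 18 can be mirrored in portable C to show what each one contributes. This is a sketch under assumptions: `dotp2` below is a plain-C stand-in for the C64x packed 16-bit dot-product instruction, `pack16`, `sext16` and `cell_energy` are hypothetical helper names, and `px` is taken to be the precomputed pedestal term $p \sum_i a_i$, which is what the final `sub` implies.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v to a 32-bit signed value. */
int32_t sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Pack two signed 16-bit values into one 32-bit word (hi in the upper half). */
uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Stand-in for DOTP2: multiply the 16-bit halves pairwise and add the products. */
int32_t dotp2(uint32_t x, uint32_t y)
{
    int64_t sum = (int64_t)sext16(x >> 16) * sext16(y >> 16)
                + (int64_t)sext16(x) * sext16(y);
    assert(sum >= INT32_MIN && sum <= INT32_MAX); /* result must fit 32 bits */
    return (int32_t)sum;
}

/* Same dataflow as the slide: one multiply, two packed dot products,
 * two adds, and one subtraction of the precomputed pedestal term px. */
int32_t cell_energy(int16_t a1, int16_t s1,
                    uint32_t a23, uint32_t s23,
                    uint32_t a45, uint32_t s45,
                    int32_t px)
{
    int32_t sa1  = (int32_t)a1 * s1;  /* mpy   s1,a1,sa1    */
    int32_t sa23 = dotp2(a23, s23);   /* dotp2 a23,s23,sa23 */
    int32_t sa45 = dotp2(s45, a45);   /* dotp2 s45,a45,sa45 */
    int32_t sa25 = sa23 + sa45;       /* add   sa23,sa45,sa25 */
    int32_t sa15 = sa1 + sa25;        /* add   sa1,sa25,sa15  */
    return sa15 - px;                 /* sub   sa15,px,e      */
}
```

Precomputing `px` once per channel removes five subtractions from the inner loop, and packing two 16-bit values per register halves the number of multiply instructions.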

  19. DSP software results • Physics calculation of 128 channels: 3.5 µs. • Includes all the necessary histograms. • $\tau$, $\chi^2$ for a fraction of 10% of high-energy cells. • 30 to 40% of the time is due to stall cycles. • Cycles are lost because data are not in the cache.

  20. The Cache Memory [Block diagram. Labels: CPU core C64x; cache memory 16 kB data (shown twice); central memory 1 MB; DMA controller; peripherals.] • When a data item or an instruction is not in the cache memory, the CPU stalls for 6 cycles until it is copied from the central memory to the cache. • For the E calculation: 6 data items to be read => 36 wait cycles. • The behaviour of the cache memory must be understood to improve these numbers.

  21. Which improvements? [Block diagram. Labels: CPU core C64x; cache memory 16 kB data (shown twice); central memory 1 MB; DMA controller; peripherals.] • L1D mapping: take care of which data is loaded, from which address and in what order. • L1D pipelining: use consecutive loads (1 miss: 6 wait cycles; 2 misses: 8 wait cycles; 4 misses: 12 wait cycles). • L1D access optimization: sample preloading, interleaved histograms.
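The miss-pipelining figures on slide 21 (1 miss: 6 cycles, 2 misses: 8, 4 misses: 12) fit a simple linear model: 6 cycles for the first miss plus 2 for each additional consecutive miss. The sketch below encodes that model; it is only a fit to the three quoted figures, so values for other miss counts are extrapolations, and the function names are hypothetical.

```c
#include <assert.h>

enum {
    FIRST_MISS_STALL = 6, /* wait cycles for an isolated miss (from the slides) */
    EXTRA_MISS_STALL = 2  /* extra cycles per additional consecutive miss (fitted) */
};

/* Stall cycles when `misses` loads that miss are issued back to back. */
unsigned pipelined_stall_cycles(unsigned misses)
{
    if (misses == 0) {
        return 0;
    }
    return FIRST_MISS_STALL + EXTRA_MISS_STALL * (misses - 1);
}

/* Stall cycles when every miss is served on its own, with no overlap. */
unsigned isolated_stall_cycles(unsigned misses)
{
    return FIRST_MISS_STALL * misses;
}

/* Stall cycles when `total` misses are issued in consecutive groups of `group`. */
unsigned grouped_stall_cycles(unsigned total, unsigned group)
{
    assert(group > 0);
    return (total / group) * pipelined_stall_cycles(group)
         + pipelined_stall_cycles(total % group);
}
```

Under this model the 6 reads of the E calculation cost 36 wait cycles when isolated, 24 when issued in pairs and 20 when issued as a group of four followed by a pair, which is why the ordering of loads matters.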

  22. DSP software results • Physics calculation of 128 channels: 3.5 µs. • Includes all the necessary histograms. • $\tau$, $\chi^2$ for a fraction of 10% of high-energy cells. • 30 to 40% of the time is due to stall cycles. • Cycles are lost because data are not in the cache. • The complete code takes about 7 µs (600 MHz DSP). • Includes the RTX kernel, synchronization and send tasks, ... • 30% of margin for further improvements.
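The timing figures can be cross-checked with a little arithmetic: at 600 MHz, 7 µs is 4200 cycles and 3.5 µs is 2100 cycles, or about 16.4 cycles per channel for 128 channels, and 7 µs against the 10 µs requirement leaves the quoted 30% margin. The helper names below are hypothetical, and the sketch assumes the margin is measured against the 10 µs limit from the requirements slide.

```c
#include <assert.h>

/* Number of CPU cycles in `time_us` microseconds at `clock_mhz` MHz. */
double cycles_for(double time_us, double clock_mhz)
{
    return time_us * clock_mhz;
}

/* Fraction of the time budget left unused. */
double margin_fraction(double used_us, double budget_us)
{
    assert(budget_us > 0.0);
    return (budget_us - used_us) / budget_us;
}

/* Average cycles spent per channel for one event. */
double cycles_per_channel(double time_us, double clock_mhz, unsigned channels)
{
    assert(channels > 0);
    return cycles_for(time_us, clock_mhz) / channels;
}
```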

  23. Agenda • Mid-March: motherboard + PU assembled. • May 2003: validation in standalone mode. • Fall 2003: system test in the experiment environment. • Spring 2004: production launch. • Summer 2004: board installation at the LHC.

  24. Conclusion: the ROD • Calculates the precise energy and timing of the calorimeter signals. • 1 motherboard and 4 Processing Units. • 1 PU = two 600 MHz TMS320C6414 DSPs. • 30% of margin for future improvements. • 200 RODs to be produced in 2004.

  25. Thank You
