CMM++ and Phase-1 topics


  1. CMM++ and Phase-1 topics Samuel Silverstein, Stockholm University • G-link readout emulation • Firmware collaboration • CMM++ → TP data format • Topological processing in CMM++ vs. TP

  2. Current CMM [Block diagram: input from JEM/CPM modules over the backplane (40 MHz) and LVDS merger links feed the Virtex-E crate and system FPGAs; VME-- control via a VME CPLD; TTCrx timing; two G-links for DAQ and ROI readout; LVDS outputs to CTP]

  3. CMM++ concept We want CMM++ to be backward-compatible. [Block diagram: a single Virtex-6 FPGA keeps the legacy interfaces (VME--, two G-links for DAQ and ROI readout, LVDS merger cables, legacy LVDS outputs to CTP), takes backplane input from JEM/CPM at 160 MHz, and adds 4-6 SNAP12 optical Tx and/or Rx modules (12-fiber bundles) at 6.4 Gbit/s per fiber]

  4. CMM emulation (first try) Implement in Virtex 6 (XC6VHX565T-2FF1924), based on existing Jet CMM code. [Block diagram: same structure as the current CMM, with the VME CPLD, TTCrx, two G-links for DAQ and ROI readout, backplane input from JEM/CPM modules (40 MHz), LVDS merger links and LVDS outputs to CTP, but with the crate and system logic in one FPGA.] Note: no SNAP12 transceivers yet

  5. Works, but high I/O use
  • Available I/O (excluding transceivers): 640 pins
  • Real-time data path: backplane input (16 × 25): 400; cables (3 × 25): +75; CTP output (2 × 33): +66 = 541 pins
  • Control and timing: VME-- from CPLD: 35; TTC (L1A, BCR, deskew 1 and 2): +4; crystal clock: +1; clr_pe, rst, rst_dll, pbk_en, can_mc, en_cblout: +6 = 46 pins
  • Readout: Glink data outputs (2 × 20): 40; DAV pins: +2 = 42 pins
  • Indicator LEDs: 8 = 8 pins
  • TOTAL: 637 / 640 pins (FF1924 package)
  • Note: FF1923 package has 720 I/Os (but 8 fewer GTX transceivers)

  6. Emulating Glinks in the FPGA saves about 40 pins [Block diagram: as on slide 4, but the external G-link chips and their 2 × 20 data outputs are replaced by a readout/Glink block inside the Virtex 6]

  7. G-link emulation in Virtex 6 [Datapath: DAQ or ROI data word (20b) with DAV (data available), at 40 MHz → G-link encoder → 24b encoded word at 40 MHz → 24b → 8b multiplexer → 8b at 120 MHz → GTX TX → 960 Mbit/s serial.] A sketch of the multiplexer stage follows below.
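
  A minimal VHDL sketch of the 24b → 8b stage, assuming the 120 MHz byte clock is generated from, and phase-aligned with, the 40 MHz BC clock. Entity and signal names are illustrative, not taken from the actual CMM++ firmware, and word-boundary synchronization is reduced to a free-running phase counter:

    library ieee;
    use ieee.std_logic_1164.all;

    entity glink_byte_mux is
      port (
        clk120   : in  std_logic;                      -- 3x the 40 MHz BC clock
        word_24b : in  std_logic_vector(23 downto 0);  -- encoded word, one per BC
        byte_8b  : out std_logic_vector(7 downto 0)    -- to GTX TX (960 Mbit/s)
      );
    end entity glink_byte_mux;

    architecture rtl of glink_byte_mux is
      -- Phase counter selecting which byte of the current word is sent.
      -- A real design must align phase 0 with the 40 MHz word boundary;
      -- that synchronization is omitted here for brevity.
      signal phase : integer range 0 to 2 := 0;
      signal hold  : std_logic_vector(23 downto 8) := (others => '0');
    begin
      process (clk120)
      begin
        if rising_edge(clk120) then
          case phase is
            when 0 =>
              hold    <= word_24b(23 downto 8);  -- save the two upper bytes
              byte_8b <= word_24b(7 downto 0);   -- low byte goes out first
              phase   <= 1;
            when 1 =>
              byte_8b <= hold(15 downto 8);
              phase   <= 2;
            when others =>
              byte_8b <= hold(23 downto 16);
              phase   <= 0;
          end case;
        end if;
      end process;
    end architecture rtl;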

  8. CMM++ readout block [Block symbol with ports: inputs daq_in, roi_in, daq_reset, roi_reset and clocks Clk40, Clk120; outputs gdaq (20), groi (20), daq_dav, roi_dav, daq_locked, roi_locked] Easy to drop into top-level design; a hypothetical instantiation follows below.
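
  To illustrate the "drop into top-level design" claim, an architecture-body fragment instantiating such a block, using the port names from the slide. The entity name readout_glink_block, the port directions and the signal names on the right are guesses for illustration, not the actual CMM++ interface:

    -- Hypothetical instantiation; names and widths assumed from the slide.
    readout : entity work.readout_glink_block
      port map (
        clk40      => clk40,       -- BC clock for the input words
        clk120     => clk120,      -- byte clock for the G-link emulation
        daq_in     => daq_word,    -- DAQ readout data, one word per BC
        daq_reset  => daq_reset,
        gdaq       => gdaq_out,    -- emulated DAQ G-link output (20)
        daq_dav    => daq_dav,     -- DAQ data available
        daq_locked => daq_locked,  -- DAQ link lock status
        roi_in     => roi_word,    -- ROI readout data, one word per BC
        roi_reset  => roi_reset,
        groi       => groi_out,    -- emulated ROI G-link output (20)
        roi_dav    => roi_dav,
        roi_locked => roi_locked
      );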

  9. First simulation results [Simulated waveform: after Reset the link locks and sends empty frames (zeros); DAV is asserted when non-zero data appears on the link]

  10. There is still work to do... • Implement in ISE, study full timing simulation • Implement on Virtex-6 development board, study timing and jitter • Test with real 9U ROD?

  11. Firmware collaboration • SVN repository • Latest "day-1" jet algo and jet-CMM firmware projects now uploaded to CERN SVN: • https://svnweb.cern.ch/cern/wsvn/atl1calo/firmware/ • Need to do same for upgrade • Full designs (e.g. Jet CMM++) • "Standalone" modules (Glink readout, TP algorithms, etc) • Where? • Also in atl1calo/firmware directory? • Create a separate directory tree or repository? • Reminder: VHDL style guide for L1Calo: • https://edms.cern.ch/file/1100579/0.5/VHDL_guidlines.doc

  12. CMM++ → TP data formats Original paradigm: send full backplane contents immediately to the TP, unprocessed. [Diagram: backplane → CMM++ → 12-fiber bundle → TP. From backplane: 16 modules × 24 bits × 160 Mbit/s = 61.44 Gbit/s; to TP: 12 fibers × 5120 Mbit/s = 61.44 Gbit/s. Matched bandwidths.]

  13. Problems with this paradigm • Large total bandwidth to TP • 10-12 CMM++ modules → 120-144 fibers • Not including other possible inputs, for instance muons • Most backplane data will be empty (at least in p-p) • Backplane format accommodates the worst-case ROI multiplicity for a CPM/JEM • A single FPGA cannot receive this many links • Must first reduce the size of the data set before executing "global" algorithms on full event topology • Thus our current quadrant/system paradigm • Can we do better?

  14. Idea: reduce data volume already in the CMM++ • Have each CMM++ produce a zero-suppressed list of ROIs: • Absolute coordinates: 9 bits for jets, 13 bits for clusters • 8 and 11 bits, respectively, if crate number is implicit • Thresholds: 8 bits per ROI • Additional information (energy, etc.): 10-12 bits per ROI • Total: ~31 bits per ROI • One 6.4 Gbit/s fiber has a 128-bit data payload per BC • Up to four ROIs per fiber per BC • 4 fibers → 16 ROIs / CMM++ (3× data reduction) • 6 fibers → 24 ROIs / CMM++ (2× data reduction) • Use bit 32 to signal "last ROI" • Lets us know if the max number of ROIs was exceeded (a packing sketch follows below)
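
  To make the packing concrete, a minimal VHDL sketch of a 32-bit jet-ROI word along the lines suggested above: 9 coordinate bits, 8 threshold bits, 12 bits of additional information, and the "last ROI" flag in the top bit. The field order, the 12-bit information width and all names are assumptions for illustration:

    library ieee;
    use ieee.std_logic_1164.all;

    package roi_pack is
      function pack_jet_roi (
        coord    : std_logic_vector(8 downto 0);   -- absolute jet coordinate
        thresh   : std_logic_vector(7 downto 0);   -- passed-threshold bits
        info     : std_logic_vector(11 downto 0);  -- energy etc. (width assumed)
        last_roi : std_logic                       -- '1' on the final ROI
      ) return std_logic_vector;
    end package roi_pack;

    package body roi_pack is
      function pack_jet_roi (
        coord    : std_logic_vector(8 downto 0);
        thresh   : std_logic_vector(7 downto 0);
        info     : std_logic_vector(11 downto 0);
        last_roi : std_logic
      ) return std_logic_vector is
      begin
        -- 1 + 2 (pad) + 12 + 8 + 9 = 32 bits, so four ROIs fill the 128-bit
        -- payload of one 6.4 Gbit/s fiber per bunch crossing.
        return last_roi & "00" & info & thresh & coord;
      end function pack_jet_roi;
    end package body roi_pack;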

  15. With 6 fibers per CMM++, one FPGA can receive all L1Calo data Example: CMM++-only solution (TP also simplified). [System diagram: the CP crates CP0-CP3 and JEP crates JEP0-JEP1, each holding CMM++ modules for EM/cluster, jet and energy merging, connected by the legacy LVDS cables; the cluster, jet and energy results converge on one CMM++ acting as the "TP". ~56 L1Calo fibers to the "TP" (48 CP + 6 Jet + 2 Energy)]

  16. TP implementation • Two baseline options: • Designated CMM++ in system • Topological processor (TP) crate • TP crate is the preferred choice • Scalable: several FPGAs could perform algorithms in parallel • Room for more inputs (e.g. muons) • Prepares us better for Phase II • But should keep CMM++ option open • If TP is delayed, for instance

  17. Implications of CMM++ option • Algorithm limitations • If all L1Calo ROIs available, can probably perform any L1Calo-only algorithm • But only one FPGA, so we don't have unlimited resources to perform many algorithms • Latency • Probably no significant difference

  18. Implications of CMM++ option • Readout • Probably will read out TP to 9U RODs (Glink) • TP-CMM must read out crate-level CMM++ I/O plus TP inputs/outputs • Need two sets of G-link readout ports on CMM++ • Readout FIFOs • 160 MHz backplane format gives 1600 bits/BC, plus cables and outputs, to be pipelined for CMM++ readout • TP will also have many bits of input data to be pipelined and read out. • So a combined CMM++ and TP FPGA might need to use a lot of its memory resources just for readout. • Add external pipeline memories?
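
  To give a feel for the memory cost, a minimal VHDL sketch of one 32-bit slice of such a readout pipeline: a fixed-latency circular buffer that delays input data until the L1A decision. All names and the depth are illustrative assumptions; at 1600 bits/BC the backplane data alone would need 50 such slices, i.e. 1600 × DEPTH bits of storage, before the TP inputs and derandomizing FIFOs are counted:

    library ieee;
    use ieee.std_logic_1164.all;

    entity pipeline_slice is
      generic (DEPTH : positive := 128);  -- roughly the L1 latency in BCs (assumed)
      port (
        clk40 : in  std_logic;
        din   : in  std_logic_vector(31 downto 0);  -- one slice of the input data
        dout  : out std_logic_vector(31 downto 0)   -- delayed copy, to readout FIFO
      );
    end entity pipeline_slice;

    architecture rtl of pipeline_slice is
      type ram_t is array (0 to DEPTH - 1) of std_logic_vector(31 downto 0);
      signal ram  : ram_t := (others => (others => '0'));
      signal wptr : integer range 0 to DEPTH - 1 := 0;
    begin
      process (clk40)
      begin
        if rising_edge(clk40) then
          dout      <= ram(wptr);  -- oldest entry, written DEPTH BCs ago
          ram(wptr) <= din;        -- overwrite with this BC's data
          if wptr = DEPTH - 1 then
            wptr <= 0;
          else
            wptr <= wptr + 1;
          end if;
        end if;
      end process;
    end architecture rtl;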
