JEM FDR: Design and Implementation


  1. JEM FDR: Design and Implementation
     • JEP system requirements
     • Architecture
     • Modularity
     • Data Formats
     • Data Flow
     • Challenges:
       • Latency
       • Connectivity, high-speed data paths
     • JEM revisions
     • JEM 1.1 - implementation details
     • Daughter modules
     • Energy sum algorithms
     • FPGA resource use
     • Performance
     • Production tests
     JEM FDR

  2. JEP system requirements
     • Process –4.9 < η < 4.9 region
       • ~32×32×2 = 2k trigger towers of Δη×Δφ = 0.2×0.2
       • 9-bit input data (0-511 GeV)
       • 32×32 10-bit “jet elements” after em/had pre-sum
     • 2 multiplications per jet element: ET (EX, EY)
     • 3 adder trees spanning the JEP (JEMs, CMMs)
     • Sliding-window jet algorithm, variable window size within 3×3 environment
     • Output data to CTP
       • Thresholded total ET, missing ET
       • Jet hit count
     • Output data to RODs
       • Intermediate results, mainly captured from module boundaries
       • RoI data for RoIB

  3. JEP system design considerations
     • Moderate data processing power
     • Tough latency requirements
     • Large number of signals to be processed → partition into parallel operating modules
     • Algorithm requires the environment of each jet element → high-bandwidth inter-module lanes
     • Data concentrator functionality, many → few
     • Severely pin-bound design, dominated by input connectivity
       • Modules
       • Processors (FPGAs)
     • Benefit from similarities to cluster processor
       • Common infrastructure (backplane)
       • Common serial link technology

  4. System modularity
     • Two crates, each processing two quadrants in φ → 32 × 8 bins (jet elements) per quadrant
     • η range split over 8 JEMs → 4 × 8 jet elements per JEM
     • Four input processors per JEM
     • Single jet processor per JEM
     • Single sum processor per JEM
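The partitioning on this slide can be cross-checked with a few lines of arithmetic. The sketch below only multiplies out the counts quoted above; the constant and function names are illustrative and not taken from any JEM firmware or software.

```python
# Counts quoted on the slide; the names are hypothetical.
CRATES = 2
QUADRANTS_PER_CRATE = 2
JEMS_PER_QUADRANT = 8
ETA_BINS_PER_QUADRANT = 32
PHI_BINS_PER_QUADRANT = 8


def modularity():
    """Derive per-JEM and system-wide jet element counts."""
    eta_bins_per_jem = ETA_BINS_PER_QUADRANT // JEMS_PER_QUADRANT
    core_cells_per_jem = eta_bins_per_jem * PHI_BINS_PER_QUADRANT
    total_jems = CRATES * QUADRANTS_PER_CRATE * JEMS_PER_QUADRANT
    return {
        "eta_bins_per_jem": eta_bins_per_jem,
        "core_cells_per_jem": core_cells_per_jem,
        "total_jems": total_jems,
        "total_jet_elements": total_jems * core_cells_per_jem,
    }


if __name__ == "__main__":
    for name, value in modularity().items():
        print(f"{name}: {value}")
```

The total of 32 JEMs with 32 core cells each reproduces the 32×32 jet element grid quoted in the system requirements.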

  5. Replication of environment elements - system and crate level
     • JEM has 32 core algorithm cells
       • 4 × 8 jet elements
       • Directly mapped: 4 PPMs (e, h) → 1 JEM
     • JEM operates on a total of 77 jet elements including ‘environment’: 7 × 11
     • Replication in φ via multiple copies of PPM output data
     • Replication in η via backplane fan-out
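As a rough illustration of how the 77 elements split into core and environment, the sketch below counts the cells of a 7 × 11 window around a 4 × 8 core. It assumes the extra η columns span the full φ height of the window (which would match the 3 × 11 elements exchanged over the backplane quoted elsewhere in the talk); how the margin is divided between the two sides of the core is not stated in the slides and is not modelled. The function name is hypothetical.

```python
def environment_counts(core=(4, 8), window=(7, 11)):
    """Count core and environment jet elements for one JEM.

    core and window are (eta_bins, phi_bins) tuples.
    """
    core_eta, core_phi = core
    win_eta, win_phi = window
    core_cells = core_eta * core_phi
    total_cells = win_eta * win_phi
    # Assumption: extra eta columns cover the full phi height of the window.
    eta_environment = (win_eta - core_eta) * win_phi
    # Remaining environment: extra phi rows above/below the core columns.
    phi_environment = core_eta * (win_phi - core_phi)
    return {
        "core": core_cells,
        "total": total_cells,
        "eta_environment": eta_environment,
        "phi_environment": phi_environment,
    }


if __name__ == "__main__":
    print(environment_counts())
```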

  6. JEM data formats – real-time data
     • JEM inputs from PPM:
       • Physical layer: LVDS, 10 bits, 12-bit encoded with start/stop bit
       • D0: odd parity bit
       • D(9:1): 9-bit data, D1 = LSB = 1 GeV
     • Jet elements to jet processor:
       • No parity bit
       • D(9:0): 10-bit data, D0 = LSB = 1 GeV
       • 10 data bits muxed to 5 lines, least significant first
     • Energy sums to sum processor:
       • No parity bit
       • ET(11:0): 12-bit data, D0 = LSB = 1 GeV
       • EX(13:0): 14-bit data, D0 = LSB = 0.25 GeV
       • EY(13:0): 14-bit data, D0 = LSB = 0.25 GeV
     • JEM output to CMM:
       • J(23:0): 8 × 3-bit saturating jet hits sent on bottom port
       • J24: odd parity bit
       • S(23:0): 3 × 8-bit quad-linear encoded energy sums on top port
         • 6-bit energy
         • 2-bit range
         • Resolution 1 GeV, 4 GeV, 16 GeV, 64 GeV
       • S24: odd parity bit
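The bit layouts above can be illustrated with a short sketch. Several points are assumptions rather than facts from the slides: that the odd parity bit D0 covers the nine data bits (so that all ten bits together hold an odd number of ones), that the quad-linear encoder picks the finest range whose 6-bit value does not overflow, and that it truncates rather than rounds. All function names are hypothetical, and the packing order of range and energy bits within the 8-bit field is deliberately left open.

```python
def decode_ppm_word(word):
    """Split a 10-bit PPM input word into (energy_gev, parity_ok).

    D0 is the odd parity bit, D(9:1) carry 9 bits of data with LSB = 1 GeV.
    """
    word &= 0x3FF
    energy_gev = word >> 1
    # Assumed coverage: all ten bits together contain an odd number of ones.
    parity_ok = bin(word).count("1") % 2 == 1
    return energy_gev, parity_ok


def mux_jet_element(value):
    """Split a 10-bit jet element into two 5-bit transfers, low half first."""
    value &= 0x3FF
    return value & 0x1F, value >> 5


def quad_linear_encode(energy_gev):
    """Return (range, energy) for the 2-bit range / 6-bit energy format.

    Assumed scheme: range r has a step of 4**r GeV (1, 4, 16, 64 GeV);
    the finest range that fits is used; values beyond the top range saturate.
    """
    for rng in range(4):
        scaled = energy_gev // 4 ** rng
        if scaled < 64:
            return rng, scaled
    return 3, 63


if __name__ == "__main__":
    print(decode_ppm_word(0b0000001011))
    print(mux_jet_element(0b1000000011))
    print(quad_linear_encode(300))
```

Under these assumptions the four ranges reach up to 63, 252, 1008 and 4032 GeV respectively.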

  7. JEM data formats - readout
     • Physical layer: 16 bits, 20-bit encoded (CIMT, alternating flag bit, fill-frames 1A/1B, HDMP1022 format)
     • Event separator: minimum of 1 fill-frame sent after each event's worth of data
     • All data streams odd-parity protected (serial parity)
     • DAQ readout: 67-long stream per L1A / slice being read out
       • Input data on D(14:0): 11 bits per channel (9 bits data, 1 bit parity error, 1 bit link error)
       • 12-bit Bcnum & 25-bit sum & 25-bit jet hits on D15
     • RoI readout: 45-long stream per L1A
       • D(1:0): total of 8 RoIs
         • 2 bits location & saturation flag & 8 bits threshold passed
       • D2: 12 bits Bcnum
       • D(4:3): used on FCAL JEMs only (forward jets)
       • D(15:5): always zero
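The stream lengths quoted above can be sanity-checked against the per-item bit counts. The sketch below assumes the 8 RoIs are shared evenly between lines D0 and D1 and that the saturation flag is a single bit; the slide states neither explicitly. Under those assumptions each RoI line carries 44 payload bits, leaving one bit of the 45-long stream, which would be consistent with the serial parity bit. Function names are hypothetical.

```python
ROI_STREAM_LENGTH = 45   # bits per line per L1A, from the slide
DAQ_STREAM_LENGTH = 67   # bits per line per L1A / slice, from the slide

# Location + saturation flag (assumed 1 bit) + thresholds passed.
ROI_BITS = 2 + 1 + 8


def roi_line_payload(n_rois=8, n_lines=2):
    """Payload bits per RoI line, assuming an even split over the lines."""
    return (n_rois // n_lines) * ROI_BITS


def daq_d15_payload():
    """Bits carried on D15: bunch-crossing number, energy sums, jet hits."""
    return 12 + 25 + 25


if __name__ == "__main__":
    print("RoI payload per line:", roi_line_payload())
    print("RoI spare bits:", ROI_STREAM_LENGTH - roi_line_payload())
    print("D15 payload:", daq_d15_payload(), "of", DAQ_STREAM_LENGTH)
```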

  8. JEM data flow
     • Multiple protocols, data speeds and signalling levels used throughout the board
     • Multiplexing up and down takes a considerable fraction of the latency budget
     • Re-synchronisation of data generally required on each chip and board boundary
       • FIFO buffers
       • Phase adjustment with firmware-based detection
       • Delay scans
     [Figure: data-flow diagram. Labels: 400 Mbit/s serial data (480 Mbit/s with protocol); LVDS deserialiser; 40 MHz parallel; input processor; 80 Mb/s; 40 Mb/s; jet processor + readout controller; sum processor + readout controller; 40 Mb/s parallel to CMM (×2); link PHY (×2); 640 Mbit/s serial data (800 Mbit/s with protocol); not synchronous to bunch clock]
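The serial rates on this slide follow from the word widths quoted in the data-format slides if one word is transferred per 40 MHz tick. The sketch below makes that relation explicit; the nominal 40 MHz word rate for the readout link is an assumption for illustration, since the slide notes that the readout link is not synchronous to the bunch clock. The function name is hypothetical.

```python
WORD_RATE_MHZ = 40  # nominal word rate used on the slide


def line_rates(payload_bits, encoded_bits, word_rate_mhz=WORD_RATE_MHZ):
    """Return (payload, on-the-wire) rates in Mbit/s for one serial link."""
    return payload_bits * word_rate_mhz, encoded_bits * word_rate_mhz


if __name__ == "__main__":
    # LVDS input: 10 data bits, 12-bit encoded with start/stop bits.
    print("input  :", line_rates(10, 12))
    # Readout: 16 data bits, 20-bit CIMT encoded.
    print("readout:", line_rates(16, 20))
```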

  9. Challenges: latency & connectivity
     • Latency budget for energy sum processor: 18.5 ticks (TDR)
       • Input cables: ~2 ticks
       • CMM: ~5 ticks
       • Transmission to CTP: <2 ticks
       • ~9.5 ticks available on JEM from cable connector to backplane outputs to CMM
     • Module dimensions imposed by use of common backplane
       • Large module: 9U × 40 cm
       • Full height of backplane used for data transmission due to high signal count
         → long high-speed tracks unavoidable
         → need to use terminated lines throughout
         → need to properly adjust timing
     • High input count: 88 differential cables
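The ~9.5 ticks left for the JEM are simply what remains of the 18.5-tick budget after the other contributions. A minimal sketch, using the approximate figures from the slide and taking the CTP transmission at its 2-tick upper bound (the dictionary keys are illustrative names):

```python
TOTAL_BUDGET_TICKS = 18.5  # energy sum path, from the TDR figure on the slide

# Approximate contributions outside the JEM, in bunch-clock ticks.
OTHER_CONTRIBUTIONS = {
    "input_cables": 2.0,
    "cmm": 5.0,
    "transmission_to_ctp": 2.0,  # quoted as "< 2", upper bound used here
}


def jem_budget(total=TOTAL_BUDGET_TICKS, others=OTHER_CONTRIBUTIONS):
    """Ticks remaining for the JEM, cable connector to backplane output."""
    return total - sum(others.values())


if __name__ == "__main__":
    ticks = jem_budget()
    print(f"JEM budget: {ticks} ticks (~{ticks * 25:.0f} ns at 25 ns per tick)")
```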

  10. Connectivity: high-density input cabling
      • 24 4-pair cable assemblies arranged in 6 blocks of 4 (2 φ bins × em, had)
      • Same coordinate system now on cables and crate: φ upwards, η left to right (as seen from front)
        • V cable rotated
      • Different cabling for FCAL JEMs: re-map FCAL channels in jet FPGA firmware

  11. Connectivity: details of differential data paths
      • Differential 100 Ω termination at sink
      • 400 (480) Mbit/s input data
        • Use de-serialisers compatible with DS92LV1021 (LVDS signal level, not DC-balanced)
        • 88 signals per JEM arriving on shielded parallel pairs
        • Run via long cables (<15 m) and short tracks (few cm)
        • Require pre-compensation on transmitting end
      • 640 (800) Mbit/s readout data
        • PECL level → electro-optical translator
        • HDMP1022 protocol, 16-bit mode
        • Use compatible low-power PHY

  12. Connectivity: details of single-ended data paths
      • CMOS signals
        • Point-to-point
        • 60 Ω DCI source termination throughout on all FPGAs
      • 40 Mb/s (25 ns)
        • At 1.5 V, no phase control
          • Energy sum path into sum processor: 40 lines per input processor
          • General control paths
        • At 2.5 V: CMM merger signals via backplane (phase adjustment on receiving end)
      • 80 Mb/s (12.5 ns) at 1.5 V: jet elements
        • 7 × 11 × 5 bit = 385 lines into jet processor
        • 2 × 3 × 11 × 5 bit = 330 lines on backplane from/to adjacent modules
        • Global phase adjustment via TTCrx
        • All signals latched into jet processor on same clock edge
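The line counts on this slide can be reproduced from the data formats given earlier in the talk. The sketch below assumes that the 40 energy-sum lines per input processor are the 12-bit ET plus the two 14-bit EX and EY words, and that each jet element occupies 5 lines because its 10 bits are multiplexed two-to-one at 80 Mb/s. Function names are hypothetical.

```python
LINES_PER_JET_ELEMENT = 5  # 10 bits muxed onto 5 lines at 80 Mb/s


def jet_processor_input_lines(eta_bins=7, phi_bins=11):
    """Lines entering the jet processor for the full 7 x 11 window."""
    return eta_bins * phi_bins * LINES_PER_JET_ELEMENT


def backplane_lines(directions=2, eta_columns=3, phi_bins=11):
    """Backplane lines from/to adjacent modules."""
    return directions * eta_columns * phi_bins * LINES_PER_JET_ELEMENT


def energy_sum_lines(et_bits=12, ex_bits=14, ey_bits=14):
    """Lines per input processor on the 40 Mb/s energy sum path (assumed split)."""
    return et_bits + ex_bits + ey_bits


def bit_period_ns(rate_mbps):
    """Bit period in nanoseconds for a single-ended line."""
    return 1000 / rate_mbps


if __name__ == "__main__":
    print(jet_processor_input_lines(), backplane_lines(), energy_sum_lines())
    print(bit_period_ns(40), bit_period_ns(80))
```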

  13. JEM history
      • JEM 0.0 built from Dec. 2000
        • LVDS de-serialiser DS92LV1224
        • 11 input processors covering one phi bin each, Spartan2
        • Main processor performing jet and energy algorithms, Virtex-E
        • Control FPGA, ROC, HDMP1022 PHY, coaxial output
        • Complete failure due to assembly company
      • JEM 0.x built from Dec. 2003
        • Minor design correction with respect to JEM 0.0
        • New manufacturer (PCB / assembly)
        • Fully functional prototype except CAN slow control and FPGA flash configuration
        • TTC interface not to specs due to lack of final TTCrx chip
        • Successfully tested all available functionality

  14. JEM 0
      [Figure: JEM 0 board. Labels: 11 input processors; VME interface; 2 × HDMP1022; backplane connector; Main; ROC; TTCrx; CAN; 88 × DS92LV1224]

  15. JEM history (2)
      • JEM 1.0 built in 2003
        • All processors Virtex-2
        • Input processors on daughter modules (R, S, T, U)
          • LVDS de-serialiser SCAN921260 (6-channel)
          • 4 input processors covering three phi bins each
        • 1 jet processor on main board
        • 1 sum processor on main board
        • 1 board control CPLD (CC)
        • Readout links (PHY & opto) on daughter module (RM)
        • Flash configurator: SystemACE
        • Slow control / CAN: Fujitsu microcontroller
      • Successfully tested algorithms and all interfaces
        • Some tuning required on SystemACE clock
        • CAN not to new specs (L1Calo common design)

  16. History: JEM 1.0
      [Figure: JEM 1.0 board. Labels: VME; CC; RM; U; Sum; T; TTC; Jet; CAN; S; ACE; R; Flash; power]
      JEM 1.0 successfully tested
      • Algorithms
      • All interfaces
        • LVDS in
        • FIO inter-module links
        • Merger out
        • Optical readout
        • VME
        • CAN slow control
      • Mainz, RAL slice test, CERN test beam

  17. JEM 1.1
      • JEM 1.1 in production now
      • Identical to JEM 1.0
      • Additional daughter module: Control Module (CM)
        • CAN
        • VME control
        • Fan-out of configuration lines
      • Expected back from assembly soon

  18. JEM details – main board
      • 9U × 40 cm × 2 mm, bracing bars, ESD strips, shielded backplane connector
      • 4 signal layers incl. top, bottom; 2 × Vcc; 4 × GND → total 10 layers
      • Micro vias on top, bottom; buried vias
      • All tracks controlled impedance: controlled / measured by manufacturer
        • Single-ended 60 Ω
        • Differential 100 Ω
      • Point-to-point links only
      • All hand-routed
      • 60 Ω DCI source termination on processors (CMOS levels)
      • Power distribution
        • All circuitry supplied by local step-down regulators, fused 10 A (estimated maximum consumption < 5 A on any supply, 50 W total)
        • 10 A capacity, separate 1.5 V regulator for daughter modules
        • Defined ramp-up time (Virtex2 requirement)
        • Staged bypass capacitors, low ESR
        • VME buffers scannable 3.3 V (DTACK: open drain 3 × 24 mA), short stubs on signal lines, 20-75 mm
        • Vccaux for FPGAs: dedicated quiet 3.3 V
        • Merger signals (directly driven by processors) on 2.5 V banks
        • FPGA core and inter-processor and inter-module links 1.5 V

  19. JEM details – main board (2)
      • Timing
        • TTC signals terminated and buffered (LVPECL, DC) near backplane
        • TTCdec module with PLL and crystal clock automatic backup
        • DESKEW1 bunch clock used as a general-purpose clock
        • Low-skew buffers (within TTCdec PLL loop) with series terminators
        • DESKEW2 clock used for phase-controlled sampling of 80 Mb/s jet element data (local & FIO) on jet processor only
      • VME
        • Synchronised to bunch clock
        • Sum processor acts as VME controller
        • Basic pre-configure VME access through CM
      • Readout located on RM (ROCs on sum and jet processor)
      • DCS/CAN located on CM (except PHY - near backplane)
      • Configuration via SystemACE / CF
        • P2P links to keep ringing at bay
        • Multiple configurations, slot-dependent choice

  20. JEM details – main board (3)
      • JTAG available on most active components. Separate chains:
        • FPGAs (through SystemACE)
        • Non-programmable devices on input daughters
        • TTCdec and Readout Module
        • Buffers
        • Control Module
      • JTAG used for
        • Connectivity tests at manufacturer & MZ
        • CPLD configuration
        • FPGA configuration (ACE)

  21. Input modules
      • 24 LVDS data channels per module
      • 12-layer PCB with micro vias
      • Impedance-controlled tracks
        • 60 Ω single-ended
        • 100 Ω differential
      • LVDS signals entering via 100 Ω differential connector on short tracks (<1 cm)
      • Differential termination close to de-serialiser
      • 4 × SCAN921260 6-channel de-serialiser
        • PLL and analogue supply voltage only (3.3 V) supplied from backplane
        • Digital supply from step-down regulator on main board
        • Reference clock supplied via FPGA
      • XC2V1500 input processor
        • 1.5 V CMOS 60 Ω DCI signals to sum and jet processor
      • SMBus device for Vcc and temperature monitoring (new)

  22. Readout Module RM
      • 2 channels, 640 Mb/s
      • 16-bit → 20-bit CIMT coded, fill-frame FF1, alternating flag bit, as defined in HDMP1022 specs
      • 2 × PHY, 2 × SFP opto transceiver, so far 2-layer boards
      • High-speed tracks <1 cm
      • PHYs tested:
        • HDMP1022 serialiser: 2.4 W/chip (reference, tested in 16-bit and 20-bit mode)
        • HDMP1032A serialiser: 660 mW/chip, €27.86 @ 80 pc (16-bit)
        • TLK1201A serdes: 250 mW/chip, < €5.00 @ 80 pc, uncoded, requires data formatter firmware in ROC (16-bit, 20-bit)
      • Successfully run off bunch clock
      • Converted to Xtal clock due to unknown jitter situation on ATLAS TTC clock
        • Problems with Xtal clock distribution to ROI PHY (RAL, MZ)
        • RM seems to work with clock linked from DAQ PHY to ROI PHY
        • Want a local crystal oscillator on RM
      • Need new iteration of RM (HDMP1032A, TLK1201A)
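To put the PHY options side by side, the sketch below multiplies the per-chip power figures quoted above by the two PHYs carried on each Readout Module. It only restates the slide's numbers; the dictionary and function names are illustrative.

```python
PHYS_PER_RM = 2  # one DAQ link and one RoI link per Readout Module

# Per-chip power in milliwatts, as quoted on the slide.
PHY_POWER_MW = {
    "HDMP1022": 2400,
    "HDMP1032A": 660,
    "TLK1201A": 250,
}


def rm_phy_power_mw(phy, count=PHYS_PER_RM):
    """Total PHY power per Readout Module in milliwatts."""
    return PHY_POWER_MW[phy] * count


if __name__ == "__main__":
    reference = rm_phy_power_mw("HDMP1022")
    for name in PHY_POWER_MW:
        power = rm_phy_power_mw(name)
        saving = reference - power
        print(f"{name:10s} {power:5d} mW per RM, {saving:5d} mW below reference")
```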

  23. Control Module CM
      Combines CAN/DCS, VME pre-configure access and JTAG fan-out
      • CAN
        • Controller to L1Calo specs now (common design for all processors, see CMM/CPM)
        • Link to main board via SMBus only (Vcc, temperatures)
      • VME CPLD (pinout error corrected)
        • Generating DTACK for all accesses within module sub-address range to avoid bus timeout
        • Providing basic access for
          • FPGA configuration via VME
          • Configuration reset
          • ACE configuration selection / slot dependent
          • ACE configuration selection via VME
      • Buffers for SystemACE-generated JTAG signals to FPGAs
      • TTCdec parallel initialisation (ID from geographical address)
