http://www.staff.uni-mainz.de/uschaefe/browsable/Meeting/2013/neu/http://www.staff.uni-mainz.de/uschaefe/browsable/Meeting/2013/neu/ jFEX Mainz Logos ? jFEX (inc. specific software/firmware & Tilecal input signal, options) 30'
jFEX (inc. specific software/firmware & Tilecal input signal, options) 30' Assume we have to cover all that’s in the document, but more details, where we feel it helps. Did I forget to cover any important section - No physics justification • Jet processing general • L1Calo phase 1 scheme • Algorithms • Data replication • jFEX description #4 • Density, fibre count etc. • Input options #4 • Firmware (also software to be mentioned?) • Demonstrators/current designs (GOLD, Topo, MiniPOD/V7) #4 • Schedule, manpower req. jfex input HD+MZ
Jet processing Phase-0 jet system consisting of • Pre-Processor • Analogue signal conditioning • Digitization • Digital signal processing • Jet element pre-summation to 0.2 x 0.2 (η×φ) • Jet processor • Sliding window processor for jet finding • Jet multiplicity determination • Jet feature extraction into L1Topo (pre-phase1) At phase-1: complement with jet feature extractor jFEX • LAr signals optically from digital processor system • Tilecal signals from analogue Pre-Processor / JEP … • … eventually Tilecal optical data off detector, and possible retirement of current L1Calo system
L1Calo Phase-1 System RTM eFEX Hub Hub Hub Opt. Plant ROD ROD ROD From Digital Processing System RTM jFEX Hub Hub Hub ROD ROD ROD L1Topo L1Topo L1Topo JEM JEM JEM CMX CMX CMX PPR PPR JMM JMM JMM CPM CPM CPM CMX CMX CMX New at Phase 1
Algorithms, now and then Sliding window algorithm : Find and disambiguate ROIs Calculate jet energy in differently sized windows (programmable) • Improve granularity by factor of four, to 0.1×0.1 (η×φ) • Slightly increase environment • Allow for flexibility in jet definition (non-square jet shape, Gaussian filter, …) • Fat jets to be calculated from high granularity small jets • Optionally increase jet environment (baseline 0.9 × 0.9) Phase 0 Phase 1 tower 0.2 x 0.2 0.1 x 0.1 ROI 0.4 x 0.4 three jet windows up to 0.8 × 0.8 0.9 × 0.9limited by data duplication
Data replication Sliding window algorithm requiring large scale replication of data • Forward duplication only (fan-out), no re-transmission • Baseline: no replication of any source into more than two sinks • Fan-out in eta (or phi) handled at source only (DPS) • Duplication at the parallel end (on-FPGA), using additional Multi-Gigabit Transceivers • Allowing for differently composed streams • Minimizing latency • Fan-out in phi (or eta) handled at destination only • Baseline “far end PMA loopback” • Looking into details and alternatives
Initial baseline 8+ Modules, each covering full phi, limited eta range • Environment of 0.9 in eta (core bin +/- 4 neighbours) • Each module receives fully duplicated data in eta :1.6 eta worth of data required for a core of 0.8 • 16 eta bins including environment 8 FPGAs per module, each: • Environment 0.9×0.9 • Each FPGA receives fully duplicated data in eta and phi:1.6×1.6 worth of data required for a core of 0.8×0.8 • 256 bins @ 0.1×0.1 in η×φ, e/m + had 512 numbers • With baseline 6.4 , 64 Multi-Gb/s receivers • Hierfasercount
fibre count / density • Erstdurch 2 und dann mal 8 nachoben an slide 7 • 64 * 8 channels per FPGA • Due to full duplication in phi direction, exactly half of all 512 signals are routed into the modules optically on fibres • 256 fibres • 22 × 12-channel opto receivers • 4 × 72-way fibre bundles / MTP connectors • Option • For larger jets window we require larger FPGAs, some more fibres and replication factor > ×2 • Aim at higher line rates (currently FPGAs support 13 Gb/s, microPOD 10 Gb/s) • Allow for even finer granularity / larger jets / smaller FPGA devices : • If digital processor baseline allows for full duplication of 6.4Gb/s signals, the spare capacity, when run at higher rate, can be used to achieve a replication of more than 2-fold, so as to support a larger jet environment.
How to fit on a module ? • ATCA • 8 processors (~XC7VX690T) • 4 microPODs each • fan-out passive or “far end PMA loopback” • Small amount of control logic / non-realtime (ROD) • Nein! Might add 9th processor for consolidation of results • Opto connectors in Zone 3 • Module control !!! • Maximise module payload with help of small-footprint ATCA power brick and tiny IPMC mini-DIMM Z3
jFEX system • Need to handle both fine granularity and large jet environment (minimum 0.9×0.9) • Require high density / high bandwidth per moduleNeed that density to keep input replication factor at acceptable level • ~ 8 modules (+FCAL ?) BildneuerTeil, e,j, e ausgegraut • Single crate go for ATCA shelf / blades: • Sharing infrastructure with eFEX • Handling / splitting of fibre bundles • ROD design • Hub design • RTM Input signals: • Granularity .1×.1 (η×φ) • One electromagnetic, one hadronic tower – nachoben • --- nachunten • Unlike eFEX, no “BCMUX” scheme due to consecutive non-zero data • 6.4 Gb/s line rate, 8b/10b encoding, 128 bit per BC • For now, assume 16bit per tower, 8 towers per fibre
Tilecal input options • List 3 options and mention DPS approach • For following slides/drawings:Merge Sam and HCSC slides • Do we attach preferences, do we talk about advantages/disadvantages ? • My preference: NO!!!
Background • Phase-1 eFEX and jFEX receive digital EM layer data from LAr DPS • But equivalent Tile data path not available before Phase 2 • So: need to extract digital hadronic tower sums produced from the current analog sums sent to L1Calo • Three points where this can be done • See next slide
Alternatives Barrel sector logic Muon Trigger MuCTPi • Can extract Tile tower sums from: • Tile receiver stations • PreProcessor modules • JEM modules in JEP Endcap sector logic Muon detector eFEX Central Trigger L1Topo CTP DPS EM calorimeter digital readout C O R E EM data to FEX jFEX Hadronic data to FEX Tile tower “DPS” 3 2 1 JEP PreProcessor C M X Analog sums from Tile/LAr Receiver stations JEM nMCM CP New/upgraded Hardware C M X CTP output L1Calo Trigger Topological info
Considerations • Latency • Dynamic range • Current L1Calo towers have 8 bit dynamic range with 1GeV/LSB • Would like 9 or 10 bits, if possible • Cost to implement • Risk of disruption to existing system
Option 1: Tile Rx stations • Signals extracted at arrival point in USA15, so latency cost is minimal • Must build a new system to digitize and process analog signals • No constraints on dynamic range • Cost is high – essentially need to build new receiver and PreProcessor systems • High risk of disruption to current L1Calo: • Analog data path ahead of L1Calo rearranged • Where do we fit the new systems that do this?
Option 2: PreProcessor • New MCM (Phase 0) • FPGA based tower processing • Can drive higher-speed data to the LVDS link driver card(blue arrows) • Replacement link card (Phase 1) • Send tower data electrically to CP and JEP (same as now) • An FPGA and parallel-optic transmitter (e.g. minipod) produce hadronic output to FEX • Fiber ribbon takes data from link card to an MTP/MPO output port (probably on front panel) nMCM prototype
Option 2: PreProcessor • Minimal latency: • Essentially equal to option 1; • Can extend dynamic range: • nMCM can drive outputs at higher rates, so more bits per tower possible • ‘Easy’ to get 9 bits, 10 bits probably possible • Relatively low cost • nMCM will already exist • A few (small) LVDS link boards • Possibly need to replace some PreProcessor mother boards (8 layers, low component count) • Low disruption: Only upgrading existing boards
Option 3: JEM Upgrade Upgraded input cards Double-ratetower data from upgraded PPM(960 Mbit/s) High-speed links to FEXfrom input cards to front panel (lowest latency) (hadronic tower sums)
Option 3: JEM upgrade • Higher latency: • Serial transmission from PPr to JEP adds multiple BCs to latency • Limited dynamic range: • BCMUX protocol consumes some bandwidth • 9 bits possible (by removing parity), 10 bits probably not possible • Similar cost to Option 2 • PreProcessornMCM and link cards still get replaced (but not PPr mother boards?) • Plans to upgrade JEM daughter boards anyway • Low disruption: Again, similar to Option 2
Current System to DAQ r/o data RGTM J 2 power LCD (f/o & routing) CP1 ch1 BCMUX LVDS-Tx ch2 FPGA 32x 10 C P ch3 CP2 to CP (LVDS cables) BCMUX LVDS-Tx ch4 480Mb/s FPGA 10 SUM LVDS-Tx FPGA (Spartan-6) to JEP (LVDS cables) MCM #1 FPGA J E P 16x JEP 480Mb/s FPGA MCM #16 PPM Virtex-II 2 L1Calo Weekly Meeting, 10/01/2013 V.Andrei, KIP
Phase-I: first solution to DAQ r/o data RGTM J 2 power nLCD (f/o & routing) CP1 ch1 BCMUX LVDS-Tx ch2 FPGA 32x 10 C P ch3 CP2 to CP (LVDS cables) BCMUX LVDS-Tx ch4 480Mb/s FPGA MUX + LVDS-Tx to JEP (LVDS cables) FPGA (Spartan-6) MCM #1 FPGA J E P 16x JEP 960Mb/s FPGA MCM #16 PPM Spartan-6/Artix-7 3 L1Calo Weekly Meeting, 10/01/2013 V.Andrei, KIP
Phase-I: second solution (A) Rear Extension to DAQ r/o data RGTM J 2 power to jFEX (optic fibers) nLCD (f/o & routing) CP1 ch1 SNAP12 BCMUX LVDS-Tx ch2 FPGA 32x 10 C P C P ch3 CP2 to CP (LVDS cables) BCMUX LVDS-Tx ch4 480Mb/s FPGA MUX + LVDS-Tx to JEP (LVDS cables) FPGA (Spartan-6) MCM #1 J E P FPGA J E P 16x JEP FPGA ? 960Mb/s FPGA MCM #16 PPM Spartan-6/Artix-7 Xilinx 7 Series 4 L1Calo Weekly Meeting, 10/01/2013 V.Andrei, KIP
Phase-I: second solution (B) Rear Extension to DAQ r/o data RGTM J 2 power to jFEX (optic fibers) LCD (f/o & routing) CP1 ch1 SNAP12 BCMUX LVDS-Tx ch2 FPGA 32x 10 C P C P ch3 CP2 to CP (LVDS cables) BCMUX LVDS-Tx FPGA ch4 480Mb/s FPGA 10 SUM LVDS-Tx to JEP (LVDS cables) FPGA (Spartan-6) MCM #1 J E P FPGA J E P 16x JEP 480Mb/s FPGA MCM #16 PPM Virtex-II Xilinx 7 Series 5 L1Calo Weekly Meeting, 10/01/2013 V.Andrei, KIP
Firmware • jFEX • Sliding window algorithm • Infrastructure for high speed links • Module control • DAQ (buffers and embedded ROD functionality, common effort) • Example : JEM based Tilecal inputs • Serialization nMCM 960Mb/s • Serialization JMM 6.4Gb/s • Re-target existing JEM input firmware to new FPGA
Demonstrator projects : “GOLD” Try out technologies and schemes for L1Topo • Fibre input from the backplane (MTP-CPI connectors) • Up to 10Gb/s o/e data paths via industry standard converters on mezzanine • Using mid range FPGAs (XC6VLX240T) up to 6.4Gb/s, 24 channels per device • Typical sort/dφ algorithm (successfully implemented) takes ~ 13% logic resources • Real-time output via opto links on the front panel (currently used as data source for latency measurements etc.) • Will continue to be used as source/sink for L1Topo tests
Recent GOLD results (Virtex-6) • Jitter analysis on cleaned TTC clock (σ = 2.9 ps) • Signal integrity: sampled in several positions along the chain • MGT and o/e converters settings optimization • Bit Error Rate (BER) < 10-16 at 6.4 Gbps / 12 channels • Eye widths above 60 ps (out of 156 ps) • Crosstalk among channels measured in some cases but with negligible effect Sampled after fan-out chip
GOLD latency (Virtex-6) Latency measured along the real-time path in various points at 6.4 Gbs, 16 bit data width, and 8b/10b encoding • Far End PMA loopback: 34 ns latency • Electrical (LVDS) output: 63 ns latency • Far End PCS loopback: 78 ns latency • Parallel loopback in fabric: 86 ns latency Firmware mod for latency measurement including algorithm, and electrical out towards CTP under way (Volker almost got there…)
Topo processor details – post PDR • Real-time path: • 14 fibre-optical 12-way inputs (miniPOD) • Via four 48-way backplane connectors • 4-fold segmented reference clock tree, 3 Xtal clocks each, plus jitter-cleaned LHC bunch clock multiple • Two processors XC7V690T (prototype XC7V485T) • Interlinked by 238-way LVDS path • 12-way (+) optical output to CTP • 32-way electrical (LVDS) output to CTP via mezzanine • Full ATCA compliance / respective circuitry on mezzanine • Module control via Kintex and Zynq processors • Initially via VMEbus extension • Eventually via Kintex or Zynq processor (Ethernet / base interface)
floor plan, so far… • Processor FPGA configuration via SystemACE and through module controller • Module controller configuration via SPI and SD-card • DAQ and ROI interface • Two SFPs, L1Calo style • up to 12 opto fibres (miniPOD) • Hardware to support both L1Calo style ROD interface, and embedded ROD / S-Link interface on these fibres
Tests miniPOD on VC707 • From Eduard • Note : microPOD same o/e engine as miniPOD • (no pics on MicroPOD, probably confidential…)
Schedule • Extract from Ian's Gantt chart, and that’s it? • Check with Ian whether anything to be updated before the session • Also manpower estimates ? • If so, anything to be said in addition to what’s in the document ?
conclusion • The 8-module jFEX seems possible with ~2013’s technology • Key technologies explored already (GOLD, L1Topo,…) • Use of microPODs challenging for thermal and mechanical reasons, but o/e engine is the same as in popular miniPODs • Scheme allows for both fine granularity and large environment at 6.4Gb/s line rate and a limit of 100% duplication of input channels • Rather dense circuitry, but comparable to GOLD demonstrator • For even finer granularity and / or larger jets things get more complicated Need to explore higher transmission rates • DPS needs to handle the required duplication (in eta) Details of fibre organization and content cannot be presented now • Started work on detailed specifications, in parallel exploring higher data rates… • Tilecal signals required in FEXes in fibre-optical format • Three options for generating them • All seem viable, but probably at different cost • Need to agree on a baseline before TDR