Presentation Transcript


  1. Study on buffer usage and data packing at the FE LHCb Electronics Upgrade Meeting 11 April 2013 Federico Alessio, with inputs from Richard, Ken, Guillaume

  2. Scope • Attempt to study: • Impact of TFC commands on behaviour of FE buffer in upgraded readout architecture • Feasibility of packing algorithm across GBT link as specified in readout architecture specifications • This presentation is NOT intended to show you how to pack across the GBT link or how to use the buffer. • However, it IS intended to stimulate discussions using a practical example on possible solutions/implications at FE in the global readout architecture. • There is a published internal note which contains what I'm presenting here: LHCb-INT-2013-015. • It is not final, it is meant only to stimulate discussions → feedback! 2

  3. TFC simulation testbench • First simulation testbench (est. 2009) developed in Visual Elite from Mentor Graphics. Includes: • S-ODIN • SOL40 (only TFC) • LHC clock • LHC filling scheme • LLT emulation (based on current L0) • Custom-made FE emulation block • Generic FE emulation • OT-like • CALO-like • No TELL40 emulation, throttle is faked • Everything is an HDL entity, portable to other simulation platforms • Basically, the aim is to simulate a (very small) slice of the readout system • === Mini-DAQ including FE emulation • Could add a few FE channels with different occupancies • *only problem is simulation time 3

  4. (Simplified) TFC simulation testbench 4

  5. FE emulation, why? • Needed to develop an FE emulation block to simulate the generation of detector data • Used to • study impact of TFC commands on FE buffer behaviour • demonstrate feasibility of packing mechanism at FE as written in specs • emulate FE data generator to spy on sub-detectors for FE reviews… • Proposed to use it as a practical example of a generic FE data generator for the readout architecture simulation framework until sub-detectors' codes become available • Description of the code here • Simulation results • Considerations on packing mechanism • Considerations on buffer usage • Synthesis results • Practical proof of how important simulating code is… 5

  6. Generic FE channel as in specs • FE channel contains a buffer: • No trigger at FE, so buffer is actually a derandomizer. • Used to pipe data @ 40 MHz to be packed and sent over GBT link. • If no TFC command and occupancy too high, buffer will fill up very, very quickly • We are running at 40 MHz! It's 40 times faster than now… • Mechanism to empty buffer • TFC commands come in handy • DATA coming out on GBT link: • No empty spaces, no unexpected 0s • Fully dynamic packing algorithm across GBT frame-width • Ideally, data should be in order… 6
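
To make the derandomizer plus dynamic-packing idea above concrete, here is a minimal behavioural sketch in Python. It is not the HDL described on the next slides: the constants mirror scenario 1 of the note (10% occupancy, 50 x 5-bit channels, derandomizer depth 75, 80-bit GBT payload, 24-bit header), while the event bookkeeping and function names are assumptions made purely for illustration.

```python
# Behavioural sketch only: events are generated every bunch crossing, queued in
# a derandomizer, and shipped out by filling the whole GBT payload each clock,
# crossing event boundaries so that no frame bit is left empty.
import random

GBT_PAYLOAD  = 80      # bits per GBT frame (4-bit GBT header not packed here)
HEADER_BITS  = 24      # 12 BXID + 8 data size + 4 info
CHANNEL_BITS = 5       # width of one FE channel word
N_CHANNELS   = 50
OCCUPANCY    = 0.10    # mean fraction of channels firing per bunch crossing
DERAND_DEPTH = 75      # derandomizer depth in events

def simulate(n_bx: int, seed: int = 0) -> int:
    """Return the maximum derandomizer occupancy seen over n_bx bunch crossings."""
    random.seed(seed)
    derandomizer = []   # pending event sizes, in bits
    sent = 0            # bits of the head event already shipped
    max_fill = 0
    for bx in range(n_bx):
        # event generation: header plus one word per firing channel
        hits = sum(random.random() < OCCUPANCY for _ in range(N_CHANNELS))
        derandomizer.append(HEADER_BITS + hits * CHANNEL_BITS)
        if len(derandomizer) > DERAND_DEPTH:
            raise RuntimeError(f"derandomizer overflow at BX {bx}")
        # dynamic packing: consume the head of the queue until the frame is full
        budget = GBT_PAYLOAD
        while budget > 0 and derandomizer:
            take = min(budget, derandomizer[0] - sent)
            budget -= take
            sent += take
            if sent == derandomizer[0]:
                derandomizer.pop(0)
                sent = 0
        max_fill = max(max_fill, len(derandomizer))
    return max_fill

if __name__ == "__main__":
    print("max derandomizer occupancy:", simulate(10_000))
```

With scenario 1 numbers the average event (24 + 0.10·50·5 = 49 bits) fits within one 80-bit frame; at 25% occupancy (scenario 2) the average grows to about 86 bits per crossing, which is why the buffer behaviour under TFC commands becomes the interesting question.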

  7. The code: FE data generator 7

  8. The code: FE buffer manager 8

  9. The code: GBT dynamic packing Very important to analyze simulation output bit-by-bit and clock-by-clock! 9

  10. The code: configuration • FE generic data generator is fully programmable: • Number of channels associated to GBT link • Width of each channel • Derandomizer depth • Mean occupancy of the channels associated to GBT link • Size of GBT frame (80 bits or WideBus + GBT header 4 bits) • Extremely flexible and easy to configure with parameters • Covers almost all possibilities (almost…) • Including flexible transmission of NZS and ZS • Including TFC commands as defined in specs • Study dependency of FE buffer behaviour with TFC commands • Study effect of packing algorithm on TELL40 • Study synchronization mechanism at beginning of run • Study re-synchronization mechanism when de-synchronized • Etc… etc… etc… • And it is fully synthesizable… 10
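
For readers who want a feel for that parameter set, the sketch below mirrors those knobs as a plain Python configuration object; the field names and default values are assumptions made for illustration (defaults taken from scenario 1), not the actual VHDL generics.

```python
# Illustrative stand-in for the generator's configuration parameters.
from dataclasses import dataclass

@dataclass
class FEGeneratorConfig:
    n_channels: int = 50          # channels associated to one GBT link
    channel_width: int = 5        # bits per channel word
    derand_depth: int = 75        # derandomizer depth in events
    mean_occupancy: float = 0.10  # average fraction of firing channels per BX
    gbt_payload: int = 80         # 80 bits (or WideBus), plus the 4-bit GBT header
    zero_suppressed: bool = True  # ZS by default, NZS when requested via TFC

# Scenario 2 of the note differs from scenario 1 only in the occupancy
scenario_2 = FEGeneratorConfig(mean_occupancy=0.25)
```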

  11. Simulation results • Simulated 11 different scenarios: • fixed GBT size to 80 bits + 4 bits GBT header • fixed width of data header to 24 bits in three fields (12 for BXID, 8 for data size, 4 for info) • fixed width of data channel to 5 bits as practical example • Numbers scale relatively: lower occupancy, more channels 11
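
A hedged sketch of that fixed 24-bit data header, assuming the field order BXID / size / info from most to least significant bits (the exact bit layout is not given on the slide):

```python
def pack_header(bxid: int, size: int, info: int) -> int:
    """Pack the assumed 24-bit header: 12-bit BXID, 8-bit data size, 4-bit info."""
    assert 0 <= bxid < 4096 and 0 <= size < 256 and 0 <= info < 16
    return (bxid << 12) | (size << 4) | info

def unpack_header(word: int) -> tuple[int, int, int]:
    """Recover (bxid, size, info) from a packed 24-bit header word."""
    return (word >> 12) & 0xFFF, (word >> 4) & 0xFF, word & 0xF

# e.g. BXID 3563 (last bunch of the LHC orbit), 7 channel words, no info flags
assert unpack_header(pack_header(3563, 7, 0)) == (3563, 7, 0)
```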

  12. Simulation results Scenario 1: 10% occupancy, 50 x 5-bit channels, derandomizer depth 75 Scenario 2: 25% occupancy, 50 x 5-bit channels, derandomizer depth 75 12

  13. Simulation results Scenario 8: 40% occupancy, 32 x 5-bit channels, derandomizer depth 165 Scenario 9: 40% occupancy, 32 x 5-bit channels, derandomizer depth 165 + NO BX VETO sent from TFC 13

  14. Simulation results [simulation waveforms: filling scheme, TFC commands, FE data generated, derandomizer occupancy, GBT output] For a bit-by-bit zoom in please come to my office 14

  15. Simulation results For a bit-by-bit zoom in please come to my office or ask for the code 15

  16. Synthesis results • Using Altera Quartus 12.1 SP1 • No synthesis optimization done, let fitter free, no pinout defined, no timing constraint • No memory cells used • Doable, can be further improved though. 16

  17. FYI, simulation outlook • Simulation should be a coordinated effort • Personal drive in order to be able to produce a (complex) code for TFC on time • FE generic code + TFC code should be merged with TELL40 effort • To test both FE packing algorithm and FE buffer management • To test decoding at TELL40 and investigate consequences/solutions • To analyze effects of TFC commands on global system (including TELL40) • Effort already ongoing between me and Guillaume to do so • We would very, very much appreciate to have the code (emulation) of each sub-detector • an FE generic code is useful to study things on paper, but real code is something different • Proposal is to use this simulation effort to validate FE code • simulation performed by me and Guillaume to investigate solutions, issues in FE 17

  18. Conclusions • Packing mechanism as specified in our document is feasible. • Will be used temporarily to emulate FE generated data in global readout and TFC simulation. • However, very big open questions: • Is your FE compatible with such scheme? What about such code in an ASIC? • Behaviour of FE derandomizer will strongly depend on your compression or suppression mechanism. • If dynamic, it could create big latencies • If your data does not come out in order, it can become quite complicated… • Behaviour of FE derandomizer will strongly depend on TFC commands • FE buffer depth should not rely on having a BX VETO! Aim at a bandwidth for fully 40 MHz readout → BX VETO solely to discard events synchronously. • What about SYNCH command? When do you think you can apply it? Ideally after derandomizer and after suppression/compression, but… • How many clock cycles do you need to recover from an NZS event? • Can you handle consecutive NZS events? 18

  19. Qs & As? 19

  20. System and functional requirements • Bidirectional communication network • Clock jitter, and phase and latency control • At the FE, but also at TELL40 and between S-TFC boards • Partitioning to allow running with any ensemble and parallel partitions • LHC interfaces • Event rate control • Low-Level-Trigger input • Support for old TTC-based distribution system • Destination control for the event packets • Sub-detector calibration triggers • S-ODIN data bank • Information about transmitted events • Test-bench support 20

  21. The S-TFC system at a glance • S-ODIN responsible for controlling upgraded readout system • Distributing timing and synchronous commands • Manages the dispatching of events to the EFF • Rate regulates the system • Support old TTC system: hybrid system! • SOL40 responsible for interfacing FE+TELL40 slice to S-ODIN • Fan-out TFC information to TELL40 • Fan-in THROTTLE information from TELL40 • Distributes TFC information to FE • Distributes ECS configuration data to FE • Receives ECS monitoring data from FE 21

  22. S-TFC concept reminder 22

  23. The upgraded physical readout slice • Common electronics board for upgraded readout system: Marseille's ATCA board with 4 AMC cards • S-ODIN AMC card • LLT → AMC card • TELL40 → AMC card • LHC Interfaces specific AMC card 23

  24. Latest S-TFC protocol to TELL40 We will provide the TFC decoding block for the TELL40: VHDL entity with inputs/outputs • «Extended» TFC word to TELL40 via SOL40: • → 64 bits sent every 40 MHz = 2.56 Gb/s (on backplane) • → packed with 8b/10b protocol (i.e. total of 80 bits) • → no dedicated GBT buffer, use ALTERA GX simple 8b/10b encoder/decoder • MEP accept command when MEP ready: • Take MEP address and pack to FARM • No need for special address, dynamic • Constant latency after BXID • THROTTLE information from each TELL40 to SOL40: • no change: 1 bit for each AMC board + BXID for which the throttle was set • 16 bits in 8b/10b encoder • same GX buffer as before (as same decoder!) 24
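
A quick sanity check of the link budget quoted above; the 3.2 Gb/s line rate is derived here from the 80-bit 8b/10b-encoded word and is not stated explicitly on the slide.

```python
# Backplane budget for the "extended" TFC word to TELL40.
TFC_WORD_BITS = 64            # payload bits per bunch crossing
BX_RATE_HZ    = 40_000_000

payload_rate = TFC_WORD_BITS * BX_RATE_HZ      # 2.56 Gb/s, as quoted
line_rate    = payload_rate * 10 // 8          # 8b/10b: 80 bits per word -> 3.2 Gb/s

print(payload_rate / 1e9, "Gb/s payload,", line_rate / 1e9, "Gb/s on the wire")
```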

  25. S-TFC protocol to FE, no change • TFC word on downlink to FE via SOL40 embedded in GBT word: • → 24 bits in each GBT frame every 40 MHz = 0.98 Gb/s • → all commands associated to BXID in TFC word • Put local configurable delays for each TFC command • GBT does not support individual delays for each line • Need for «local» pipelining: detector delays + cables + operational logic (i.e. laser pulse?) • DATA SHOULD BE TAGGED WITH THE CROSSING TO WHICH IT BELONGS! • TFC word will arrive before the actual event takes place • To allow use of commands/resets for particular BXID • Accounting of delays in S-ODIN: for now, 16 clock cycles earlier + time to receive • Aligned to the furthest FE (simulation, then in situ calibration!) • TFC protocol to FE has implications on GBT configuration and ECS to/from FE • see specs document! 25
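
The «local» pipelining point can be pictured as one small delay line per TFC command: S-ODIN sends the word some 16 clock cycles early, and the FE delays each command by its own configurable amount so that it fires on the intended bunch crossing. The sketch below is a behavioural illustration with assumed names, not the FE implementation.

```python
from collections import deque

class CommandDelay:
    """Delay one TFC command bit by a configurable number of 40 MHz clock cycles."""
    def __init__(self, delay_cycles: int):
        self.pipe = deque([0] * delay_cycles)

    def clock(self, bit: int) -> int:
        # shift the command through the local pipeline, one bit per clock
        self.pipe.append(bit)
        return self.pipe.popleft()

# e.g. a calibration pulse needing 5 extra BX of local delay, a reset needing none
calib_delay = CommandDelay(5)
reset_delay = CommandDelay(0)
```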

  26. SODIN firmware v1r0 – block diagram 26

  27. Timing distribution • From TFC point of view, we ensure constant: • LATENCY: Alignment with BXID • FINE PHASE: Alignment with best sampling point • Some resynchronization mechanisms envisaged: • Within TFC boards • With GBT • No impact on FE itself • Loopback mechanism: • re-transmit TFC word back • allows for latency measurement + monitoring of TFC commands and synchronization 27

  28. How to decode TFC in FE chips? FE electronic block • Use of TFC+ECS GBTs in FE is 100% common to everybody!! • dashed lines indicate the detector specific interface parts • please pay particular attention to the clock transmission: the TFC clock must be used by FE to transmit data, i.e. low jitter! • Kapton cable, crate, copper between FE ASICs and GBTX 28

  29. The TFC+ECS GBT [GBTX block diagram: external clock reference, CDR, phase-shifter, ePLLs, phase-aligners and Ser/Des for the e-ports, GBTIA/GBLD, control logic and configuration, GBT-SCA with I2C and JTAG ports, e-links to the FE modules] • Clock[7:0]: these clocks should be the main clocks for the FE • 8 programmable phases • 4 programmable frequencies (40, 80, 160, 320 MHz) • Used to: • sample TFC bits • drive Data GBTs • drive FE processes 29

  30. The TFC+ECS GBT protocol to FE • → TFC protocol has direct implications in the way in which GBT should be used everywhere • 24 e-links @ 80 Mb/s dedicated to TFC word: • use 80 MHz phase-shifter clock to sample TFC parallel word • TFC bits are packed in GBT frame so that they all come out on the same clock edge • We can repeat the TFC bits also on consecutive 80 MHz clock edge if needed • Leftover 17 e-links dedicated to GBT-SCAs for ECS configuring and monitoring (see later) 30

  31. Words come out from GBT at 80 Mb/s • In simple words: • Odd bits of GBT protocol on rising edge of 40 MHz clock (first, msb), • Even bits of GBT protocol on falling edge of 40 MHz clock (second, lsb) 31
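
A tiny illustration of that ordering, assuming 0-based bit indices within the GBT word; which index counts as "odd" is an assumption here, the point is only that each pair of bits splits across the two edges of the 40 MHz clock.

```python
def split_edges(gbt_bits: list[int]) -> tuple[list[int], list[int]]:
    """Split a GBT word into the bits seen on the rising and falling clock edges."""
    rising  = gbt_bits[1::2]   # odd-indexed bits: sent first, MSB of each pair
    falling = gbt_bits[0::2]   # even-indexed bits: sent second, LSB of each pair
    return rising, falling

rising, falling = split_edges([1, 0, 1, 1])   # a 4-bit slice as an example
assert (rising, falling) == ([0, 1], [1, 1])
```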

  32. TFC decoding at FE after GBT • This is crucial!! • we can already specify where each TFC bit will come out on the GBT chip • this is the only way in which FE designers still have minimal freedom with GBT chip • if TFC info was packed to come out on only 12 e-links (first odd then even), then decoding in FE ASIC would be mandatory! • which would mean that the GBT bus would have to go to each FE ASIC for decoding of TFC command • there is also the idea to repeat the TFC bits on even and odd bits in TFC protocol • would that help? • FE could tie logical blocks directly on GBT pins… 32

  33. Now, what about the ECS part? • Each pair of bits from ECS field inside GBT can go to a GBT-SCA • One GBT-SCA is needed to configure the Data GBTs (EC one for example?) • The rest can go to either FE ASICs or DCS objects (temperature, pressure) via other GBT-SCAs • GBT-SCA chip has already everything for us: interfaces, e-links ports… • → No reason to go for something different! • However, «silicon for SCA will come later than silicon for GBTX»… • → We need something while we wait for it! 33

  34. SOL40 encoding block to FE! • Protocol drivers build GBT-SCA packets with addressing scheme and bus type for associated GBT-SCA user busses to selected FE chip • Basically each block will build one of the GBT-SCA supported protocols • Memory Map with internal addressing scheme for GBT-SCA chips + FE chips addressing, e-link addressing and bus type: content of memory loaded from ECS 34
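
As a purely hypothetical illustration of what one entry of that memory map might carry, the record below bundles the addressing pieces the slide mentions (target e-link, GBT-SCA user bus, bus type, FE chip address); the field names and the bus list are assumptions, not the GBT-SCA packet format.

```python
from dataclasses import dataclass
from enum import Enum

class BusType(Enum):    # user buses exposed by a GBT-SCA (illustrative subset)
    I2C = 0
    SPI = 1
    GPIO = 2
    JTAG = 3

@dataclass
class ECSRoute:
    elink: int          # e-link carrying the target GBT-SCA
    sca_channel: int    # user bus channel on that GBT-SCA
    bus: BusType        # protocol the driver block must speak
    fe_address: int     # address of the FE chip on that bus

# Example entry, as it could be loaded from ECS: talk to FE chip 0x21 over I2C
route = ECSRoute(elink=5, sca_channel=2, bus=BusType.I2C, fe_address=0x21)
```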

  35. Usual considerations … • TFC+ECS Interface has the ECS load of an entire FE cluster for configuring and monitoring • 34 bits @ 40 MHz = 1.36 Gb/s on single GBT link • ~180 Gb/s for full TFC+ECS Interface (132 links) • Single CCPC might become bottleneck… • Clara & us, December 2011 • How long to configure FE cluster? • how many bits / FE? • how many FEs / GBT link? • how many FEs / TFC+ECS Interface? • → Numbers to be pinned down soon + GBT-SCA interfaces and protocols. 35
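
The two bandwidth figures follow directly from the numbers on the slide; a one-line check:

```python
# ECS load per TFC+ECS Interface, using the figures quoted above.
per_link = 34 * 40_000_000        # 34 bits every 25 ns -> 1.36 Gb/s per GBT link
total    = per_link * 132         # 132 links           -> ~180 Gb/s per interface

print(per_link / 1e9, "Gb/s per link,", total / 1e9, "Gb/s total")
```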

  36. Old TTC system support and running two systems in parallel • We already suggested the idea of a hybrid system: • reminder: L0 electronics relying on TTC protocol • part of the system runs with old TTC system • part of the system runs with the new architecture • How? • Need connection between S-ODIN and ODIN (bidirectional) • → use dedicated RTM board on S-ODIN ATCA card • In an early commissioning phase ODIN is the master, S-ODIN is the slave • S-ODIN task would be to distribute new commands to new FE, to new TELL40s, and run processes in parallel to ODIN • ODIN tasks are the ones today + S-ODIN controls the upgraded part • In this configuration, upgraded slice will run at 40 MHz, but positive triggers will come only at maximum 1.1 MHz… • Great testbench for development + tests + apprenticeship… • By-product: improve LHCb physics programme in 2015-2018… • 3. In the final system, S-ODIN is the master, ODIN is the slave • → ODIN task is only to interface the L0 electronics path to S-ODIN and to provide clock resets on old TTC protocol 36

  37. S-ODIN on Marseille’s ATCA board 37

  38. TFC+ECS Interface on Marseille's ATCA board 38
