Study on buffer usage and data packing at the FE

LHCb Electronics Upgrade Meeting

11 April 2013

Federico Alessio,

with inputs from Richard, Ken, Guillaume



Scope

  • Attempt to study:
  • Impact of TFC commands on the behaviour of the FE buffer in the upgraded readout architecture
  • Feasibility of the packing algorithm across the GBT link as specified in the readout architecture specifications
  • This presentation is NOT intended to show you how to pack across the GBT link or how to use the buffer.
  • However, it IS intended to stimulate discussions, using a practical example, on possible solutions/implications at the FE in the global readout architecture.
  • There is a published internal note which contains what I'm presenting here: LHCb-INT-2013-015.
  • It is not final, it is meant only to stimulate discussions → feedback!



TFC simulation testbench

  • First simulation testbench (est. 2009) developed in Visual Elite from Mentor Graphics. Includes:
    • S-ODIN
    • SOL40 (only TFC)
    • LHC clock
    • LHC filling scheme
    • LLT emulation (based on current L0)
    • Custom-made FE emulation block
      • Generic FE emulation
      • OT-like
      • CALO-like
    • No TELL40 emulation, throttle is faked
    • Everything is an HDL entity, portable to other simulation platforms
  • Basically, the aim is to simulate a (very small) slice of the readout system
    • === Mini-DAQ including FE emulation
    • Could add a few FE channels with different occupancies
    • *only problem is simulation time
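The LHC filling scheme in such a testbench can be modelled very simply as a repeating pattern of filled and empty bunch crossings. A minimal Python sketch of the idea (illustrative only: the real testbench is HDL, and the train/gap lengths below are placeholders, not the real LHC pattern):

```python
# Illustrative model of an LHC-like filling scheme: one orbit of 3564
# bunch crossings (BX), built from trains of filled BXs separated by
# empty gaps. Train/gap lengths here are examples, not the real pattern.

def make_filling_scheme(n_orbit=3564, train=72, gap=8):
    """Return a list of booleans, True = filled bunch crossing."""
    scheme = []
    while len(scheme) < n_orbit:
        scheme.extend([True] * train)   # a train of filled BXs
        scheme.extend([False] * gap)    # an empty gap
    return scheme[:n_orbit]             # truncate to one orbit

scheme = make_filling_scheme()
print(len(scheme), sum(scheme))
```

The FE emulation can then be driven BX-by-BX from this mask, generating data only on filled crossings.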



FE emulation, why?

  • Needed to develop a FE emulation block to simulate the generation of detector data
    • Used to:
    • study the impact of TFC commands on FE buffer behaviour
    • demonstrate feasibility of the packing mechanism at the FE as written in the specs
    • emulate a FE data generator to spy on sub-detectors for FE reviews…
  • Proposed to use it as a practical example of a generic FE data generator for the readout architecture simulation framework until sub-detectors' codes become available
    • Description of the code here
    • Simulation results
    • Considerations on packing mechanism
    • Considerations on buffer usage
    • Synthesis results
  • Practical proof of how important simulating code is…



Generic FE channel as in specs

  • FE channel contains a buffer:
  • No trigger at the FE, so the buffer is actually a derandomizer.
  • Used to pipe data @ 40 MHz to be packed and sent over the GBT link.
  • If there is no TFC command and the occupancy is too high, the buffer will fill up very quickly
    • We are running at 40 MHz! It's 40 times faster than now…
    • Mechanism to empty the buffer
    • TFC commands come in handy
  • DATA coming out on the GBT link:
  • No empty spaces, no unexpected 0s
  • Fully dynamic packing algorithm across the GBT frame-width
  • Ideally, data should be in order…
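The fill/drain balance of the derandomizer can be illustrated with a toy Python model (not the actual HDL): each BX a zero-suppressed event enters the buffer, and the GBT link drains one fixed-size frame per 25 ns clock. When the mean event size stays below the frame size the backlog is bounded; above it, the buffer fills up.

```python
import random

# Toy model of the FE derandomizer (illustrative, not the VHDL): each
# bunch crossing an event of header + (hit channels x 5 bits) enters the
# buffer, while the GBT link drains 80 bits per 40 MHz clock.

def simulate(n_bx=10000, n_ch=50, ch_bits=5, occupancy=0.10,
             header_bits=24, frame_bits=80, seed=1):
    random.seed(seed)
    backlog = 0                 # bits waiting in the derandomizer
    peak = 0
    for _ in range(n_bx):
        hits = sum(random.random() < occupancy for _ in range(n_ch))
        backlog += header_bits + hits * ch_bits   # event enters buffer
        backlog = max(0, backlog - frame_bits)    # GBT drains one frame
        peak = max(peak, backlog)
    return peak

print(simulate(occupancy=0.10), simulate(occupancy=0.25))
```

At 10% occupancy the mean event (24 + 50·0.1·5 = 49 bits) fits in one frame, so the peak backlog stays small; at 25% the mean (86.5 bits) exceeds 80 bits and the backlog grows without a TFC mechanism to empty the buffer.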



The code: GBT dynamic packing

Very important to analyze simulation output bit-by-bit and clock-by-clock!
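The core idea of the dynamic packing can be sketched in a few lines of Python (illustrative only, not the VHDL): variable-length events, each with the 24-bit header used in the scenarios (12-bit BXID, 8-bit size, 4-bit info), are concatenated into one continuous bit stream that is cut into fixed 80-bit GBT frames with no gaps.

```python
# Sketch of fully dynamic packing across the GBT frame width: events are
# concatenated bit-by-bit, so an event may start in one 80-bit frame and
# finish in the next. Header layout: 12-bit BXID, 8-bit size, 4-bit info.

def pack_events(events, frame_bits=80):
    """events: list of (bxid, payload) with payload a '0'/'1' string.
    Returns the list of fixed-width frames."""
    stream = ""
    for bxid, payload in events:
        header = f"{bxid:012b}" + f"{len(payload):08b}" + "0000"
        stream += header + payload          # no empty spaces between events
    if len(stream) % frame_bits:            # pad only the very last frame
        stream += "0" * (frame_bits - len(stream) % frame_bits)
    return [stream[i:i + frame_bits] for i in range(0, len(stream), frame_bits)]

frames = pack_events([(1, "1" * 30), (2, "1" * 50)])
print(len(frames))
```

Note that the second event's header starts in the middle of the first frame: this boundary-crossing is exactly what makes the decoding at the TELL40 side non-trivial and worth simulating bit-by-bit.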



The code: configuration

  • FE generic data generator is fully programmable:
    • Number of channels associated to GBT link
    • Width of each channel
    • Derandomizer depth
    • Mean occupancy of the channels associated to GBT link
    • Size of GBT frame (80 bits or WideBus + GBT header 4 bits)
  • Extremely flexible and easy to configure with parameters
  • Covers almost all possibilities (almost…)
    • Including flexible transmission of NZS and ZS
  • Including TFC commands as defined in specs
    • Study dependency of FE buffer behaviour with TFC commands
    • Study effect of packing algorithm on TELL40
    • Study synchronization mechanism at beginning of run
    • Study re-synchronization mechanism when de-synchronized
    • Etc… etc… etc…
  • And it is fully synthesizable…
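The parameter set above maps naturally onto a single configuration record. A hedged Python sketch (the field names are mine, standing in for the VHDL generics):

```python
from dataclasses import dataclass

# Sketch of the generator's compile-time parameters gathered in one
# structure, mirroring the configurable items listed above. Names are
# illustrative, not the actual VHDL generic names.

@dataclass
class FEGeneratorConfig:
    n_channels: int = 50         # channels associated to the GBT link
    channel_bits: int = 5        # width of each channel
    derandomizer_depth: int = 75
    occupancy: float = 0.10      # mean occupancy of the channels
    frame_bits: int = 80         # GBT frame (80 bits or WideBus) + 4-bit header

cfg = FEGeneratorConfig(occupancy=0.25)
print(cfg.occupancy, cfg.n_channels)
```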




  • Simulated 11 different scenarios:
  • fixed GBT size to 80 bits + 4 bits GBT header
  • fixed width of data header to 24 bits in three fields (12 for BXID, 8 for data size, 4 for info)
  • fixed width of data channel to 5 bits as practical example
  • Numbers scale relatively: lower occupancy goes with a larger number of channels
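A quick back-of-the-envelope check of how the scenarios scale: the mean event size per BX is the header plus n_channels × occupancy × 5 bits, to be compared with the 80-bit frame drained per clock.

```python
# Mean event size per bunch crossing for some of the scenarios below,
# compared against the 80-bit GBT frame drained per 40 MHz clock.

HEADER = 24    # 12-bit BXID + 8-bit size + 4-bit info
CH_BITS = 5

def mean_event_bits(n_ch, occupancy):
    return HEADER + n_ch * occupancy * CH_BITS

for label, n_ch, occ in [("scenario 1", 50, 0.10),
                         ("scenario 2", 50, 0.25),
                         ("scenario 8", 32, 0.40)]:
    print(label, mean_event_bits(n_ch, occ))
```

Scenario 1 (49 bits/BX) sits comfortably inside one frame; scenarios 2 and 8 (86.5 and 88 bits/BX) exceed it on average, which is why the derandomizer depth and the TFC commands matter there.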




Scenario 1: 10% occupancy, 50x5bits channels, derandomizer depth 75

Scenario 2: 25% occupancy, 50x5bits channels, derandomizer depth 75




Scenario 8: 40% occupancy, 32x5bits channels, derandomizer depth 165

Scenario 9: 40% occupancy, 32x5bits channels, derandomizer depth 165 + NO BX VETO sent from TFC




[Simulation waveforms: filling scheme, TFC commands, FE data generated, derandomizer occupancy, GBT output]

For a bit-by-bit zoom in, please come to my office




For a bit-by-bit zoom in, please come to my office or ask for the code




  • Using Altera Quartus II 12.1 SP1
  • No synthesis optimization done: fitter left free, no pinout defined, no timing constraints
  • No memory cells used
  • Doable, though it can be further improved.



FYI, simulation outlook

  • Simulation should be a coordinated effort
  • Personal drive in order to be able to produce a (complex) code for TFC on time
  • FE generic code + TFC code should be merged with the TELL40 effort
  • To test both the FE packing algorithm and FE buffer management
  • To test decoding at the TELL40 and investigate consequences/solutions
  • To analyze effects of TFC commands on the global system (including TELL40)
    • Effort already ongoing between me and Guillaume to do so
  • We would very much appreciate having the (emulation) code of each sub-detector
  • a generic FE code is useful to study things on paper, but real code is something different
  • Proposal is to use this simulation effort to validate FE code
  • simulation performed by me and Guillaume to investigate solutions and issues in the FE




  • The packing mechanism as specified in our document is feasible.
    • Will be used temporarily to emulate FE generated data in the global readout and TFC simulation.
  • However, very big open questions:
    • Is your FE compatible with such a scheme? What about such code in an ASIC?
    • Behaviour of the FE derandomizer will strongly depend on your compression or suppression mechanism.
      • If dynamic, it could create big latencies
      • If your data does not come out in order, it can become quite complicated…
    • Behaviour of the FE derandomizer will strongly depend on TFC commands
      • FE buffer depth should not rely on having a BX VETO! Aim at a bandwidth for fully 40 MHz readout → BX VETO solely to discard events synchronously.
      • What about the SYNCH command? When do you think you can apply it? Ideally after the derandomizer and after suppression/compression, but…
    • How many clock cycles do you need to recover from an NZS event?
      • Can you handle consecutive NZS events?


System and functional requirements
  • Bidirectional communication network
  • Clock jitter, and phase and latency control
    • At the FE, but also at the TELL40 and between S-TFC boards
  • Partitioning to allow running with any ensemble and parallel partitions
  • LHC interfaces
  • Event rate control
  • Low-Level-Trigger input
  • Support for old TTC-based distribution system
  • Destination control for the event packets
  • Sub-detector calibration triggers
  • S-ODIN data bank
    • Information about transmitted events
  • Test-bench support


The S-TFC system at a glance
  • S-ODIN responsible for controlling the upgraded readout system
  • Distributing timing and synchronous commands
  • Manages the dispatching of events to the EFF
  • Rate-regulates the system
  • Supports old TTC system: hybrid system!


  • SOL40 responsible for interfacing a FE+TELL40 slice to S-ODIN
  • Fan-out TFC information to TELL40
  • Fan-in THROTTLE information from TELL40
  • Distributes TFC information to FE
  • Distributes ECS configuration data to FE
  • Receives ECS monitoring data from FE




The upgraded physical readout slice
  • Common electronics board for the upgraded readout system: Marseille's ATCA board with 4 AMC cards
    • S-ODIN AMC card
    • LLT AMC card
    • TELL40 AMC card
    • LHC Interfaces specific AMC card



Latest S-TFC protocol to TELL40

We will provide the TFC decoding block for the TELL40: VHDL entity with inputs/outputs

  • «Extended» TFC word to TELL40 via SOL40:
  • → 64 bits sent at 40 MHz = 2.56 Gb/s (on backplane)
  • → packed with 8b/10b protocol (i.e. total of 80 bits)
  • → no dedicated GBT buffer, use ALTERA GX simple 8b/10b encoder/decoder
  • MEP accept command when MEP ready:
  • Take MEP address and pack to FARM
  • No need for special address, dynamic

Constant latency after BXID

  • THROTTLE information from each TELL40 to SOL40:
    • no change: 1 bit for each AMC board + BXID for which the throttle was set
      • 16 bits in 8b/10b encoder
      • same GX buffer as before (and same decoder!)
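The arithmetic behind the numbers on this slide: 64 payload bits per 40 MHz clock, expanded to 80 bits on the wire by the 8b/10b encoding.

```python
# Bandwidth check for the extended TFC word to TELL40: 64 bits every
# 40 MHz clock, with 8b/10b encoding adding a 10/8 overhead on the wire.

CLOCK_HZ = 40e6
payload_bits = 64
payload_rate = payload_bits * CLOCK_HZ    # payload bandwidth, b/s
line_rate = payload_rate * 10 / 8         # wire rate after 8b/10b

print(payload_rate / 1e9, line_rate / 1e9)
```

This reproduces the quoted 2.56 Gb/s payload rate; the corresponding line rate on the backplane is 3.2 Gb/s.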



S-TFC protocol to FE, no change

  • TFC word on downlink to FE via SOL40 embedded in GBT word:
  • → 24 bits in each GBT frame every 40 MHz clock = 0.96 Gb/s
  • → all commands associated to a BXID in the TFC word
  • Put local configurable delays for each TFC command
    • GBT does not support individual delays for each line
    • Need for «local» pipelining: detector delays + cables + operational logic (i.e. laser pulse?)
  • TFC word will arrive before the actual event takes place
    • To allow use of commands/resets for a particular BXID
    • Accounting of delays in S-ODIN: for now, 16 clock cycles earlier + time to receive
    • Aligned to the furthest FE (simulation, then in-situ calibration!)
  • TFC protocol to FE has implications on GBT configuration and ECS to/from FE
    • see specs document!
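The «local» pipelining mentioned above can be pictured as one configurable shift-register delay line per TFC command bit at the FE, since the GBT link itself has no per-line delay. A hedged Python sketch of the idea (depths below are arbitrary examples, not real detector delays):

```python
from collections import deque

# One configurable delay line per TFC command bit: a shift register of
# programmable depth, clocked at 40 MHz. Depth 0 means pass-through.

class DelayLine:
    def __init__(self, depth):
        self.reg = deque([0] * depth, maxlen=depth) if depth else None

    def clock(self, bit):
        if self.reg is None:        # zero delay: pass through
            return bit
        out = self.reg[0]           # bit written `depth` clocks ago
        self.reg.append(bit)        # maxlen deque drops the oldest bit
        return out

veto = DelayLine(2)                 # e.g. a BX VETO delayed by 2 clocks
print([veto.clock(b) for b in [1, 0, 0, 0]])
```

Each command would get its own depth, tuned so that it acts on the intended BXID after detector delays, cables and operational logic.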



Timing distribution

  • From the TFC point of view, we ensure constant:
  • LATENCY: alignment with BXID
  • FINE PHASE: alignment with the best sampling point
  • Some resynchronization mechanisms envisaged:
  • Within TFC boards
  • With GBT
    • No impact on the FE itself
  • Loopback mechanism:
  • re-transmit the TFC word back
  • allows for latency measurement + monitoring of TFC commands and synchronization



How to decode TFC in FE chips?

FE electronics block

  • Use of TFC+ECS GBTs in the FE is 100% common to everybody!!
  • dashed lines indicate the detector-specific interface parts
  • please pay particular care to the clock transmission: the TFC clock must be used by the FE to transmit data, i.e. low jitter!
    • Kapton cable, crate, copper between FE ASICs and GBTX





[GBTX block diagram: external clock reference, CLK Reference/xPLL, Phase-Shifter, Phase-Aligners + Ser/Des for E-Ports, CLK Manager, Control Logic (e-Fuses + reg-Bank), I2C Slave, I2C Master, JTAG port, I2C port; E-Ports with 80, 160 and 320 Mb/s ports, one 80 Mb/s port, I2C (light)]

  • These clocks should be the main clocks for the FE
  • 8 programmable phases
  • 4 programmable frequencies (40, 80, 160, 320 MHz)
  • Used to:
  • sample TFC bits
  • drive Data GBTs
  • drive FE processes

The TFC+ECS GBT protocol to FE

  • → TFC protocol has direct implications on the way in which the GBT should be used everywhere
    • 24 e-links @ 80 Mb/s dedicated to the TFC word:
      • use the 80 MHz phase-shifter clock to sample the TFC parallel word
    • TFC bits are packed in the GBT frame so that they all come out on the same clock edge
      • We can repeat the TFC bits also on consecutive 80 MHz clock edges if needed
  • Leftover 17 e-links dedicated to GBT-SCAs for ECS configuring and monitoring (see later)



Words come out from the GBT at 80 Mb/s

  • In simple words:
  • Odd bits of the GBT protocol on the rising edge of the 40 MHz clock (first, MSB),
  • Even bits of the GBT protocol on the falling edge of the 40 MHz clock (second, LSB)
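This double-data-rate scheme can be sketched in Python (illustrative: which index counts as "odd" is an assumption of the sketch, not taken from the GBT manual): every 40 MHz period an e-link carries two bits, one per clock edge.

```python
# Sketch of the 80 Mb/s DDR scheme: split a GBT bit list into
# (rising_edge_bit, falling_edge_bit) pairs, one pair per 40 MHz period.

def to_ddr_pairs(bits):
    odd = bits[0::2]     # sent first, on the rising edge (MSB)
    even = bits[1::2]    # sent second, on the falling edge (LSB)
    return list(zip(odd, even))

print(to_ddr_pairs([1, 0, 1, 1, 0, 0]))
```

The FE-side sampling is the inverse operation: latch one bit on each edge of the 40 MHz clock and reassemble the pairs into the parallel word.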



TFC decoding at FE after GBT

  • This is crucial!!
  • we can already specify where each TFC bit will come out on the GBT chip
  • this is the only way in which FE designers still have minimal freedom with the GBT chip
    • if the TFC info was packed to come out on only 12 e-links (first odd, then even), then decoding in the FE ASIC would be mandatory!
    • which would mean that the GBT bus would have to go to each FE ASIC for decoding of the TFC command
  • there is also the idea to repeat the TFC bits on even and odd bits in the TFC protocol
    • would that help?
    • FE could tie logical blocks directly to GBT pins…



Now, what about the ECS part?

  • Each pair of bits from the ECS field inside the GBT can go to a GBT-SCA
    • One GBT-SCA is needed to configure the Data GBTs (the EC one, for example?)
    • The rest can go to either FE ASICs or DCS objects (temperature, pressure) via other GBT-SCAs
      • The GBT-SCA chip already has everything for us: interfaces, e-link ports…
        • → No reason to go for something different!
      • However, «silicon for SCA will come later than silicon for GBTX»…
        • → We need something while we wait for it!



SOL40 encoding block to FE!

  • Protocol drivers build GBT-SCA packets with addressing scheme and bus type for associated GBT-SCA user busses to selected FE chip
  • Basically each block will build one of the GBT-SCA supported protocols

Memory Map with internal addressing scheme for GBT-SCA chips + FE chip addressing, e-link addressing and bus type: content of memory loaded from ECS
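The memory-map idea can be sketched as a lookup table from a target FE chip to its GBT-SCA address, e-link and bus type, from which the protocol driver builds a packet. All names and fields below are hypothetical placeholders, not the actual SOL40 map:

```python
# Hypothetical sketch of the SOL40 memory map: ECS loads a table mapping
# each target FE chip to (GBT-SCA address, e-link, bus type); a protocol
# driver then builds one GBT-SCA packet per transaction.

MEMORY_MAP = {
    "fe_chip_0": {"sca": 3, "elink": 5, "bus": "I2C"},
    "fe_chip_1": {"sca": 3, "elink": 6, "bus": "JTAG"},
}

def build_packet(target, payload):
    entry = MEMORY_MAP[target]
    return {"sca": entry["sca"], "elink": entry["elink"],
            "bus": entry["bus"], "payload": payload}

pkt = build_packet("fe_chip_0", b"\x2a")
print(pkt["bus"], pkt["elink"])
```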



Usual considerations…

  • The TFC+ECS Interface carries the ECS load of an entire FE cluster for configuring and monitoring
  • 34 bits @ 40 MHz = 1.36 Gb/s on a single GBT link
    • ~180 Gb/s for the full TFC+ECS Interface (132 links)
    • Single CCPC might become a bottleneck…
    • Clara & us, December 2011
  • How long to configure a FE cluster?
    • how many bits / FE?
    • how many FEs / GBT link?
    • how many FEs / TFC+ECS Interface?
  • → Numbers to be pinned down soon + GBT-SCA interfaces and protocols.
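The bandwidth arithmetic from this slide, spelled out:

```python
# ECS bandwidth check: 34 ECS bits per GBT frame at 40 MHz per link,
# times 132 links for the full TFC+ECS Interface.

CLOCK_HZ = 40e6
ecs_bits_per_frame = 34
link_rate = ecs_bits_per_frame * CLOCK_HZ   # per GBT link, b/s
total_rate = link_rate * 132                # full TFC+ECS Interface

print(link_rate / 1e9, total_rate / 1e9)
```

This gives 1.36 Gb/s per link and about 179.5 Gb/s in total, the ~180 Gb/s quoted above; the open questions (bits per FE, FEs per link) are what set the actual configuration time.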



Old TTC system support and

running two systems in parallel

  • We already suggested the idea of a hybrid system:
    • reminder: L0 electronics relying on the TTC protocol
    • part of the system runs with the old TTC system
    • part of the system runs with the new architecture
  • How?
  • 1. Need connection between S-ODIN and ODIN (bidirectional)
    • → use dedicated RTM board on the S-ODIN ATCA card
  • 2. In an early commissioning phase, ODIN is the master, S-ODIN is the slave
      • S-ODIN's task would be to distribute new commands to the new FE and new TELL40s, and run processes in parallel to ODIN
      • ODIN's tasks are the ones of today + S-ODIN controls the upgraded part
        • In this configuration, the upgraded slice will run at 40 MHz, but positive triggers will come only at maximum 1.1 MHz…
          • Great testbench for development + tests + apprenticeship…
          • By-product: improve the LHCb physics programme in 2015-2018…
  • 3. In the final system, S-ODIN is the master, ODIN is the slave
    • → ODIN's task is only to interface the L0 electronics path to S-ODIN and to provide clock resets on the old TTC protocol