Federico Alessio, with inputs from Richard, Ken, Guillaume


Study on buffer usage and data packing at the FE. LHCb Electronics Upgrade Meeting, 11 April 2013. Federico Alessio, with inputs from Richard, Ken, Guillaume. Scope. Attempt to study: impact of TFC commands on the behaviour of the FE buffer in the upgraded readout architecture


Study on buffer usage and data packing at the FE

LHCb Electronics Upgrade Meeting

11 April 2013

Federico Alessio,

with inputs from Richard, Ken, Guillaume


Scope

  • Attempt to study:

  • Impact of TFC commands on the behaviour of the FE buffer in the upgraded readout architecture

  • Feasibility of the packing algorithm across the GBT link as specified in the readout architecture specifications

  • This presentation is NOT intended to show you how to pack across the GBT link or how to use the buffer.

  • However, it IS intended to stimulate discussions, using a practical example, on possible solutions/implications at the FE in the global readout architecture.

  • There is a published internal note which contains what I’m presenting here: LHCb-INT-2013-015.

  • It is not final; it is meant only to stimulate discussions and feedback!


TFC simulation testbench

  • First simulation testbench (est. 2009) developed in Visual Elite from Mentor Graphics. Includes:

    • S-ODIN

    • SOL40 (only TFC)

    • LHC clock

    • LHC filling scheme

    • LLT emulation (based on current L0)

    • Custom-made FE emulation block

      • Generic FE emulation

      • OT-like

      • CALO-like

    • No TELL40 emulation, throttle is faked

    • Everything is an HDL entity, portable to other simulation platforms

  • Basically, the aim is to simulate a (very small) slice of the readout system

    • Equivalent to a Mini-DAQ including FE emulation

    • Could add a few FE channels with different occupancies

    • The only problem is simulation time


(Simplified) TFC simulation testbench


FE emulation, why?

  • Needed to develop an FE emulation block to simulate the generation of detector data

    • Used to

    • study the impact of TFC commands on FE buffer behaviour

    • demonstrate feasibility of the packing mechanism at the FE as written in the specs

    • emulate an FE data generator to spy on sub-detectors for FE reviews…

  • Proposed to use it as a practical example of a generic FE data generator for the readout architecture simulation framework until sub-detectors’ codes become available

    • Description of the code here

    • Simulation results

    • Considerations on packing mechanism

    • Considerations on buffer usage

    • Synthesis results

  • Practical proof of how important simulating code is…


Generic FE channel as in specs

  • FE channel contains a buffer:

  • No trigger at FE, so the buffer is actually a derandomizer.

  • Used to pipe data @ 40 MHz to be packed and sent over the GBT link.

  • If there is no TFC command and the occupancy is too high, the buffer will fill up very very quickly

    • We are running at 40 MHz! It’s 40 times faster than now…

    • Mechanism to empty buffer

    • TFC commands come in handy

  • DATA coming out on GBT link:

  • No empty spaces, no unexpected 0s

  • Fully dynamic packing algorithm across GBT frame-width

  • Ideally, data should be in order…
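The buffer behaviour described on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not the testbench code: every bunch crossing adds one event (a 24-bit header plus 5 bits per hit channel) to the derandomizer, the GBT link drains 80 bits per crossing, and a vetoed crossing is assumed to add nothing. The function name `simulate_backlog` and the periodic veto pattern are hypothetical.

```python
import random


def simulate_backlog(n_crossings, n_channels, occupancy, veto_period=0,
                     header_bits=24, channel_bits=5, frame_bits=80, seed=1):
    """Return the peak derandomizer backlog (in bits) over n_crossings.

    Assumptions (illustrative only): one event per crossing, each channel
    is hit independently with probability `occupancy`, and every
    `veto_period`-th crossing is vetoed and adds nothing to the buffer.
    """
    rng = random.Random(seed)
    backlog = 0
    peak = 0
    for bx in range(n_crossings):
        vetoed = veto_period > 0 and bx % veto_period == 0
        if not vetoed:
            hits = sum(rng.random() < occupancy for _ in range(n_channels))
            backlog += header_bits + hits * channel_bits
        # The GBT link ships one frame worth of bits per crossing.
        backlog = max(0, backlog - frame_bits)
        peak = max(peak, backlog)
    return peak


if __name__ == "__main__":
    for occ in (0.10, 0.25, 0.40):
        print(occ, simulate_backlog(3564, 50, occ))
```

In this model the backlog stays at zero whenever the event fits in one frame and grows steadily once the average event is wider than the frame, which is why a mechanism to empty the buffer matters.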


The code: GBT dynamic packing

Very important to analyze simulation output bit-by-bit and clock-by-clock!


The code: configuration

  • FE generic data generator is fully programmable:

    • Number of channels associated to GBT link

    • Width of each channel

    • Derandomizer depth

    • Mean occupancy of the channels associated to GBT link

    • Size of GBT frame (80 bits or WideBus + GBT header 4 bits)

  • Extremely flexible and easy to configure with parameters

  • Covers almost all possibilities (almost…)

    • Including flexible transmission of NZS and ZS

  • Including TFC commands as defined in specs

    • Study dependency of FE buffer behaviour on TFC commands

    • Study effect of packing algorithm on TELL40

    • Study synchronization mechanism at beginning of run

    • Study re-synchronization mechanism when de-synchronized

    • Etc… etc… etc…

  • And it is fully synthesizable… 



  • Simulated 11 different scenarios:

  • fixed GBT size to 80 bits + 4 bits GBT header

  • fixed width of data header to 24 bits in three fields (12 for BXID, 8 for data size, 4 for info)

  • fixed width of data channel to 5 bits as practical example

  • Numbers scale relative to each other: lower occupancy allows a larger number of channels
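The fixed parameters above are enough to sketch what dynamic packing across the frame width means. The following is a minimal illustration, not the author's HDL: the header field order (BXID, data size, info), the unit of the size field (bits), and the helper names `build_event` and `pack_frames` are assumptions made for the example.

```python
def build_event(bxid, hits, info=0, channel_bits=5):
    """Return one event as a bit string: 24-bit header, then the hit data.

    Assumed header layout: 12 bits BXID, 8 bits data size (in bits),
    4 bits info. `hits` holds the values of the channels that fired.
    """
    size = len(hits) * channel_bits
    if not 0 <= bxid < 2 ** 12:
        raise ValueError("BXID must fit in 12 bits")
    if size >= 2 ** 8:
        raise ValueError("data size must fit in 8 bits")
    if any(not 0 <= h < 2 ** channel_bits for h in hits):
        raise ValueError("hit value does not fit in the channel width")
    header = f"{bxid:012b}{size:08b}{info:04b}"
    payload = "".join(f"{h:0{channel_bits}b}" for h in hits)
    return header + payload


def pack_frames(events, frame_bits=80):
    """Concatenate events back to back and cut the stream into frames.

    Returns (full_frames, leftover). The leftover bits would stay in the
    buffer until the next event arrives, so no frame carries padding.
    """
    stream = "".join(events)
    n_full = len(stream) // frame_bits
    frames = [stream[i * frame_bits:(i + 1) * frame_bits]
              for i in range(n_full)]
    return frames, stream[n_full * frame_bits:]


if __name__ == "__main__":
    e1 = build_event(1, [31, 0])
    e2 = build_event(2, list(range(1, 11)))
    frames, leftover = pack_frames([e1, e2])
    print(len(frames), len(leftover))
```

Note that the header of the second event starts in the middle of the first frame: the receiver has to follow the size field to find event boundaries, which is one reason the decoding side needs to be simulated as well.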



Scenario 1: 10% occupancy, 50x5bits channels, derandomizer depth 75

Scenario 2: 25% occupancy, 50x5bits channels, derandomizer depth 75



Scenario 8: 40% occupancy, 32x5bits channels, derandomizer depth 165

Scenario 9: 40% occupancy, 32x5bits channels, derandomizer depth 165 + NO BX VETO sent from TFC
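A quick back-of-the-envelope check helps to read these scenarios. The sketch below assumes that a zero-suppressed event is simply the 24-bit header plus 5 bits per hit channel, with no channel address; that is an assumption for illustration, and `mean_event_bits` is a hypothetical helper. It compares the mean event size with the 80-bit GBT frame.

```python
FRAME_BITS = 80  # GBT payload per 40 MHz clock cycle


def mean_event_bits(occupancy, n_channels, channel_bits=5, header_bits=24):
    """Mean event size in bits under the assumption stated above."""
    return header_bits + occupancy * n_channels * channel_bits


# (occupancy, number of channels) for some of the scenarios listed above.
scenarios = {1: (0.10, 50), 2: (0.25, 50), 8: (0.40, 32)}

for number, (occupancy, n_channels) in scenarios.items():
    bits = mean_event_bits(occupancy, n_channels)
    verdict = "fits in" if bits <= FRAME_BITS else "exceeds"
    print(f"Scenario {number}: mean event of {bits:.1f} bits {verdict} one frame")
```

Under this assumption, a scenario whose mean event exceeds one frame can only be sustained if some crossings produce little or no data, for example empty bunches in the filling scheme.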



[Figure: simulation waveforms showing the filling scheme, TFC commands, FE data generated, derandomizer occupancy and GBT output]

For a bit-by-bit zoom-in, please come to my office or ask for the code.



  • Using Altera Quartus 12.1 SP1

  • No synthesis optimization done, fitter left free, no pinout defined, no timing constraints

  • No memory cells used

  • Doable, can be further improved though.


FYI, simulation outlook

  • Simulation should be a coordinated effort

  • Personal drive in order to be able to produce a (complex) code for TFC on time

  • FE generic code + TFC code should be merged with TELL40 effort

  • To test both FE packing algorithm and FE buffer management

  • To test decoding at TELL40 and investigate consequences/solutions

  • To analyze effects of TFC commands on global system (including TELL40)

    • Effort already ongoing between me and Guillaume to do so

  • We would very very much appreciate having the code (emulation) of each sub-detector

  • a FE generic code is useful to study things on paper, but real code is something different

  • Proposal is to use this simulation effort to validate FE code

  • simulation performed by me and Guillaume to investigate solutions, issues in FE



  • Packing mechanism as specified in our document is feasible.

    • Will be used temporarily to emulate FE generated data in global readout and TFC simulation.

  • However, very big open questions:

    • Is your FE compatible with such scheme? What about such code in an ASIC?

    • Behaviour of FE derandomizer will strongly depend on your compression or suppression mechanism.

      • If it is dynamic, it could create big latencies

      • If your data does not come out in order, it can become quite complicated…

    • Behaviour of FE derandomizer will strongly depend on TFC commands

      • FE buffer depth should not rely on having a BX VETO! Aim at a bandwidth for full 40 MHz readout; the BX VETO is solely there to discard events synchronously.

      • What about SYNCH command? When do you think you can apply it? Ideally after derandomizer and after suppression/compression, but…

    • How many clock cycles do you need to recover from an NZS event?

      • Can you handle consecutive NZS events?


System and functional requirements

  • Bidirectional communication network

  • Clock jitter, and phase and latency control

    • At the FE, but also at TELL40 and between S-TFC boards

  • Partitioning to allow running with any ensemble and parallel partitions

  • LHC interfaces

  • Event rate control

  • Low-Level-Trigger input

  • Support for old TTC-based distribution system

  • Destination control for the event packets

  • Sub-detector calibration triggers

  • S-ODIN data bank

    • Information about transmitted events

  • Test-bench support


The S-TFC system at a glance

  • S-ODIN responsible for controlling the upgraded readout system

  • Distributing timing and synchronous commands

  • Manages the dispatching of events to the EFF

  • Rate regulates the system

  • Support old TTC system: hybrid system!


  • SOL40 responsible for interfacing the FE+TELL40 slice to S-ODIN

  • Fan-out TFC information to TELL40

  • Fan-in THROTTLE information from TELL40

  • Distributes TFC information to FE

  • Distributes ECS configuration data to FE

  • Receives ECS monitoring data from FE




The upgraded physical readout slice

  • Common electronics board for the upgraded readout system: Marseille’s ATCA board with 4 AMC cards

    • S-ODIN AMC card

    • LLT AMC card

    • TELL40 AMC card

    • LHC Interfaces specific AMC card


Latest S-TFC protocol to TELL40

We will provide the TFC decoding block for the TELL40: VHDL entity with inputs/outputs

  • «Extended» TFC word to TELL40 via SOL40:

  • 64 bits sent every 40 MHz = 2.56 Gb/s (on backplane)

  • packed with 8b/10b protocol (i.e. total of 80 bits)

  • no dedicated GBT buffer, use ALTERA GX simple 8b/10b encoder/decoder

  • MEP accept command when MEP ready:

  • Take MEP address and pack to FARM

  • No need for special address, dynamic

Constant latency after BXID

  • THROTTLE information from each TELL40 to SOL40:

    • no change: 1 bit for each AMC board + BXID for which the throttle was set

      • 16 bits in 8b/10b encoder

      • same GX buffer as before (same decoder!)
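The 16-bit THROTTLE word can be pictured as follows. The slides only state "1 bit for each AMC board + BXID" and "16 bits"; the split into 4 throttle bits (one per AMC card of an ATCA board) plus a 12-bit BXID, and the bit positions chosen here, are assumptions made for this sketch. `pack_throttle` and `unpack_throttle` are hypothetical names.

```python
N_AMC = 4        # AMC cards per ATCA board (assumed to map to 4 throttle bits)
BXID_BITS = 12   # assumed BXID width, giving 4 + 12 = 16 bits in total


def pack_throttle(amc_throttle, bxid):
    """Pack per-AMC throttle flags and a BXID into one 16-bit word.

    Assumed layout: bits 15..12 = throttle flags (AMC 3..0),
    bits 11..0 = BXID for which the throttle was set.
    """
    if len(amc_throttle) != N_AMC:
        raise ValueError("expected one throttle flag per AMC card")
    if not 0 <= bxid < (1 << BXID_BITS):
        raise ValueError("BXID must fit in 12 bits")
    flags = 0
    for index, throttled in enumerate(amc_throttle):
        flags |= int(bool(throttled)) << index
    return (flags << BXID_BITS) | bxid


def unpack_throttle(word):
    """Inverse of pack_throttle: return (list of flags, bxid)."""
    bxid = word & ((1 << BXID_BITS) - 1)
    flags = word >> BXID_BITS
    return [bool((flags >> i) & 1) for i in range(N_AMC)], bxid


if __name__ == "__main__":
    word = pack_throttle([True, False, False, True], 100)
    print(f"{word:016b}", unpack_throttle(word))
```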


S-TFC protocol to FE, no change

  • TFC word on downlink to FE via SOL40 embedded in GBT word:

  • 24 bits in each GBT frame every 40 MHz = 0.96 Gb/s

  • all commands associated to BXID in TFC word

  • Put local configurable delays for each TFC command

    • GBT does not support individual delays for each line

    • Need for «local» pipelining: detector delays + cables + operational logic (i.e. laser pulse?)


  • TFC word will arrive before the actual event takes place

    • To allow use of commands/resets for a particular BXID

    • Accounting of delays in S-ODIN: for now, 16 clock cycles earlier + time to receive

    • Aligned to the furthest FE (simulation, then in situ calibration!)

  • TFC protocol to FE has implications on GBT configuration and ECS to/from FE

    • see specs document!
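The local configurable delay per TFC command mentioned on this slide amounts to a small pipeline on each command bit. The sketch below models it as a shift register clocked at 40 MHz; it is an illustration only, the class name `CommandDelay` is hypothetical and the delay values are arbitrary.

```python
from collections import deque


class CommandDelay:
    """Delay one TFC command bit by a configurable number of clock cycles."""

    def __init__(self, delay):
        if delay < 0:
            raise ValueError("delay must be non-negative")
        # The pipeline starts out filled with zeros (no command active).
        self._pipe = deque([0] * delay)

    def clock(self, bit):
        """Shift in the current bit and return the bit from `delay` cycles ago."""
        self._pipe.append(bit)
        return self._pipe.popleft()


if __name__ == "__main__":
    # Hypothetical example: the command arrives early and is delayed
    # locally by 3 cycles so that it lines up with the data of its BXID.
    delay_line = CommandDelay(3)
    print([delay_line.clock(b) for b in [1, 0, 0, 0, 0]])
```

One such delay line per command lets each FE absorb its own cable and detector delays after the word has been aligned to the furthest FE.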


S-ODIN firmware v1r0 – block diagram


Timing distribution

  • From the TFC point of view, we ensure constant:

  • LATENCY: Alignment with BXID

  • FINE PHASE: Alignment with best sampling point

  • Some resynchronization mechanisms envisaged:

  • Within TFC boards

  • With GBT

    • No impact on FE itself

  • Loopback mechanism:

  • re-transmit TFC word back

  • allows for latency measurement + monitoring of TFC commands and synchronization
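A loopback latency measurement can be reduced to a modular subtraction on bunch-crossing identifiers. This is a sketch under assumptions: the BXID is taken to wrap at the LHC orbit length of 3564 bunch slots, the round-trip latency is assumed to be shorter than one orbit, and the function names are hypothetical.

```python
ORBIT_LENGTH = 3564  # bunch slots per LHC orbit


def loopback_latency(sent_bxid, received_at_bxid):
    """Round-trip latency in clock cycles of a looped-back TFC word.

    `sent_bxid` is the BXID carried by the word, `received_at_bxid` is the
    local BXID when the word comes back. Valid only for latencies shorter
    than one orbit.
    """
    return (received_at_bxid - sent_bxid) % ORBIT_LENGTH


def latency_is_constant(measurements):
    """True if every (sent, received) pair yields the same latency."""
    latencies = {loopback_latency(s, r) for s, r in measurements}
    return len(latencies) <= 1


if __name__ == "__main__":
    print(loopback_latency(3560, 10))
    print(latency_is_constant([(100, 130), (3550, 16), (0, 30)]))
```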


How to decode TFC in FE chips?

FE electronic block

  • Use of TFC+ECS GBTs in FE is 100% common to everybody!!

  • dashed lines indicate the detector-specific interface parts

  • please pay particular care to the clock transmission: the TFC clock must be used by FE to transmit data, i.e. low jitter!

    • Kapton cable, crate, copper between FE ASICs and GBTX




[Figure: GBTX block diagram showing the external clock reference, CLK Reference/xPLL, CLK Manager, Phase-Shifter, Phase-Aligners + Ser/Des for E-Ports (80, 160 and 320 Mb/s ports, and one 80 Mb/s port), Control Logic (e-Fuses + reg-Bank), I2C Master, I2C Slave, I2C (light), JTAG port and I2C port]

  • These clocks should be the main clocks for the FE

  • 8 programmable phases

  • 4 programmable frequencies (40, 80, 160, 320 MHz)

  • Used to:

  • sample TFC bits

  • drive Data GBTs

  • drive FE processes


The TFC+ECS GBT protocol to FE

  • TFC protocol has direct implications on the way in which GBT should be used everywhere

    • 24 e-links @ 80 Mb/s dedicated to TFC word:

      • use 80 MHz phase shifter clock to sample TFC parallel word

    • TFC bits are packed in GBT frame so that they all come out on the same clock edge

      • We can repeat the TFC bits also on consecutive 80 MHz clock edge if needed

  • Leftover 17 e-links dedicated to GBT-SCAs for ECS configuring and monitoring (see later)


Words come out from GBT at 80 Mb/s

  • In simple words:

  • Odd bits of GBT protocol on rising edge of 40 MHz clock (first, msb),

  • Even bits of GBT protocol on falling edge of 40 MHz clock (second, lsb)
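The odd/even rule above can be written down as a small mapping from frame bits to e-link outputs. The sketch assumes that e-link k carries frame bits 2k+1 and 2k, with the odd bit on the rising edge and the even bit on the falling edge of the 40 MHz clock; that numbering is an assumption for illustration and should be checked against the GBT documentation. Function names are hypothetical.

```python
def elink_outputs(frame_bits):
    """Map frame bits onto 80 Mb/s e-links.

    `frame_bits[i]` is bit i of the frame. Returns one (rising, falling)
    pair per e-link: the odd bit comes out first on the rising edge of the
    40 MHz clock, the even bit second on the falling edge.
    """
    if len(frame_bits) % 2:
        raise ValueError("an even number of bits is required")
    return [(frame_bits[2 * k + 1], frame_bits[2 * k])
            for k in range(len(frame_bits) // 2)]


def rebuild_frame(pairs):
    """Inverse mapping: recover the frame bits from the e-link pairs."""
    bits = []
    for rising, falling in pairs:
        bits.extend([falling, rising])  # even bit index first, then odd
    return bits


if __name__ == "__main__":
    frame = [0, 1, 1, 0]
    pairs = elink_outputs(frame)
    print(pairs, rebuild_frame(pairs) == frame)
```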


TFC decoding at FE after GBT

  • This is crucial!!

  • we can already specify where each TFC bit will come out on the GBT chip

  • this is the only way in which FE designers still have minimal freedom with GBT chip

    • if TFC info was packed to come out on only 12 e-links (first odd then even), then decoding in FE ASIC would be mandatory!

    • which would mean that the GBT bus would have to go to each FE ASIC for decoding of TFC command

  • there is also the idea to repeat the TFC bits on even and odd bits in TFC protocol

    • would that help?

    • FE could tie logical blocks directly on GBT pins…


Now, what about the ECS part?

  • Each pair of bits from the ECS field inside GBT can go to a GBT-SCA

    • One GBT-SCA is needed to configure the Data GBTs (EC one for example?)

    • The rest can go to either FE ASICs or DCS objects (temperature, pressure) via other GBT-SCAs

      • GBT-SCA chip has already everything for us: interfaces, e-link ports…

        • No reason to go for something different!

      • However, «silicon for SCA will come later than silicon for GBTX»…

        • We need something while we wait for it!


SOL40 encoding block to FE!

  • Protocol drivers build GBT-SCA packets with addressing scheme and bus type for associated GBT-SCA user busses to selected FE chip

  • Basically each block will build one of the GBT-SCA supported protocols

Memory Map with internal addressing scheme for GBT-SCA chips + FE chips addressing, e-link addressing and bus type: content of memory loaded from ECS


Usual considerations …

  • TFC+ECS Interface has the ECS load of an entire FE cluster for configuring and monitoring

  • 34 bits @ 40 MHz = 1.36 Gb/s on a single GBT link

    • ~180 Gb/s for full TFC+ECS Interface (132 links)

    • Single CCPC might become bottleneck…

    • Clara & us, December 2011

  • How long to configure FE cluster?

    • how many bits / FE?

    • how many FEs / GBT link?

    • how many FEs / TFC+ECS Interface?

  • Numbers to be pinned down soon + GBT-SCA interfaces and protocols.
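The numbers on this slide follow from simple arithmetic, and the open questions can be turned into a formula once the inputs are known. The sketch below reproduces the quoted link rates and gives a lower bound on the configuration time of one GBT link; the values passed for bits per FE and FEs per link are placeholders, not figures from the slides, and protocol overhead (GBT-SCA framing, addressing) is ignored.

```python
def link_rate_gbps(bits_per_crossing, clock_mhz=40.0):
    """Raw payload rate of one link in Gb/s."""
    return bits_per_crossing * clock_mhz / 1000.0


def config_time_s(bits_per_fe, fes_per_link, bits_per_crossing=34,
                  clock_mhz=40.0):
    """Lower bound on the time to push the configuration through one link.

    Ignores protocol overhead, so real configuration takes longer.
    """
    total_bits = bits_per_fe * fes_per_link
    return total_bits / (bits_per_crossing * clock_mhz * 1e6)


if __name__ == "__main__":
    per_link = link_rate_gbps(34)
    print(f"ECS field per link: {per_link:.2f} Gb/s")
    print(f"132 links: {132 * per_link:.1f} Gb/s")
    # Placeholder inputs, for illustration only.
    print(f"config time: {config_time_s(100_000, 10) * 1e3:.3f} ms")
```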


Old TTC system support and running two systems in parallel

  • We already suggested the idea of a hybrid system:

    • reminder: L0 electronics relying on TTC protocol

    • part of the system runs with old TTC system

    • part of the system runs with the new architecture

  • How?

  • 1. Need connection between S-ODIN and ODIN (bidirectional)

  • use dedicated RTM board on S-ODIN ATCA card

  • 2. In an early commissioning phase, ODIN is the master and S-ODIN is the slave

    • S-ODIN task would be to distribute new commands to new FE, to new TELL40s, and run processes in parallel to ODIN

    • ODIN’s tasks are the same as today, plus S-ODIN controls the upgraded part

      • In this configuration, the upgraded slice will run at 40 MHz, but positive triggers will come only at maximum 1.1 MHz…

        • Great testbench for development + tests + apprenticeship…

        • By-product: improve LHCb physics programme in 2015-2018…

  • 3. In the final system, S-ODIN is the master and ODIN is the slave

    • ODIN task is only to interface the L0 electronics path to S-ODIN and to provide clock resets on old TTC protocol


    S-ODIN on Marseille’s ATCA board


    TFC+ECS Interface on Marseille’s ATCA board