
Data buffering



  1. Data buffering

  2. Buffering requirements
  • Data buffering during trigger latency
    • Location: Pixel region (PR)
    • Size driven by: hit rate, trigger latency, PR organization/sharing, data per hit (TOT/ADC, BX-ID, etc.)
    • Buffer type: FIFO
      • FIFO/RAM
      • Latches
      • Data buffer with individual counters
      • Registers + latches
  • Getting data out of the PR to the EOC
    • Location: PR region
    • Size driven by: hits x trigger rate x data, bandwidth to EOC, sharing with other PRs
    • Buffer type: small FIFO, or back pressure to the latency buffer
  • Assembling and compressing all hits related to one event
    • FIFO buffer per pixel column to align event fragments
    • Buffers for compression/extraction?
    • Output FIFO
  • Acceptable losses:
    • Hit loss: wish <0.1%; acceptable <1% in worst-case locations?
    • Event loss: never, as it implies system de-sync.

  3. Assumptions
  • Hit rate: 500 MHz/cm2 (tracks) x 4 (cluster size) = ~2 GHz/cm2
    • Conservative for both track rate and cluster size!
  • Trigger latency: 20 us (10 us)
  • Pixel Region (PR): 4 x 4 (options 1 x 1, 2 x 2)
    • Large PR (4 x 4) favoured, to have "few" large memories (still 16k memories per chip!) instead of many small ones: much more efficient when using RAM.
    • 2 x 2 PR optimal for minimal number of bits to store
      • But does not seem optimal if using a RAM block (very good if using latches); this also depends on the assumed size and shape of pixel clusters.
  • Storage type: area per bit
    • Normal memory: 1.05 x 0.5 um2 (TSMC RAM cell): 1x
      • Will not work in our radiation environment!
    • Rad-tol memory: 1.5 x 2.6 um2: 7.5x
    • Latch: 3.8 x 1.8 um2 (small standard cell): 13x
      • Could a special optimized latch be made for in-pixel storage?
      • A small latch of 2.8 x 1.4 um2 exists, but is it sufficiently rad hard? 7.5x (same as rad-hard RAM!)
    • Does not include overhead: address decoding, sense amps, etc.!
      • RAM: ~25% overhead (TBC), strong dependency on RAM size
      • Latch: ~50% overhead (TBC), readout mux + P&R
  • Data per PR: 10-bit BX + N x 4-bit TOT/ADC
    • 1x1: 14; 2x2: 26; 3x3: 46; 4x4: 74; 5x5: 110 bits
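As a cross-check of the bit counts above, a minimal sketch assuming the stated scheme of one shared 10-bit BX tag plus one 4-bit TOT/ADC field per pixel in the region:

```python
def pr_word_bits(n_pixels, bx_bits=10, tot_bits=4):
    """Bits per buffered word in a pixel region (PR):
    one shared BX tag plus one TOT/ADC field per pixel."""
    return bx_bits + n_pixels * tot_bits

# Word widths for square pixel regions from 1x1 to 5x5
widths = {f"{n}x{n}": pr_word_bits(n * n) for n in range(1, 6)}
print(widths)  # {'1x1': 14, '2x2': 26, '3x3': 46, '4x4': 74, '5x5': 110}
```

This reproduces the 14/26/46/74/110-bit progression quoted on the slide.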

  4. Pixel Region effect
  • The pixel region rate is a fraction of the hit rate: 1 - 1/cluster-size
  • Statistical model:
    • 3x3 cluster model: middle barrel with the highest rates
    • Verified with other cluster models

  5. Buffer depth
  • Hits (PR hits) random in time, in 25 ns BX
  • Store hits during latency: 20 us or 10 us (ATLAS: 6 us)
  • Hit loss:
    • 0.1% would be comfortable
    • 1% could possibly be acceptable in the hottest regions (especially if we really want small pixels)
  • Buffer depth required per pixel region: plots and table
    • Does NOT scale linearly!
  • Buffer bits required per pixel is the real physical requirement:
    • (10-bit BX + N x 4-bit TOT/ADC) * Depth / N
  • Conclusions:
    • Baseline: 20 us, 0.1%, 4 x 4: 16 buffers of 74 bits = 1184 bits
    • Accept 1% loss: 2/16 = ~15% size reduction
    • Shorter latency (10 us, 0.1%): 5/16 = ~30% size reduction
    • 10 us, 1% loss: 8/16 = ~50% size reduction
    • 2x2 instead of 4x4: ~25% reduction in number of bits (but not in area)
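The required depth can be estimated with a simple statistical model: if PR hits arrive randomly in time, the number of words occupied during one latency window is approximately Poisson-distributed, and the hit loss is the tail probability beyond the buffer depth. A minimal sketch (the mean occupancy `mu` below is an illustrative assumption, not a number from the slides):

```python
import math

def overflow_prob(depth, mu):
    """P(N > depth) for N ~ Poisson(mu): fraction of latency windows
    in which more hits arrive than the buffer can hold."""
    p = math.exp(-mu)   # P(N = 0)
    cdf = p
    for k in range(1, depth + 1):
        p *= mu / k     # P(N = k) from P(N = k-1)
        cdf += p
    return 1.0 - cdf

def required_depth(mu, max_loss):
    """Smallest depth keeping the overflow probability below max_loss."""
    d = 0
    while overflow_prob(d, mu) > max_loss:
        d += 1
    return d

# Illustrative: mean occupancy of 8 words per latency window
print(required_depth(8.0, 1e-3))  # depth needed for <0.1% loss
print(required_depth(8.0, 1e-2))  # relaxing to <1% shrinks the buffer
```

The Poisson tail falls off slower than linearly in the mean, which matches the slide's remark that the required depth "does NOT scale linearly" with rate or latency.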

  6. What if smaller/larger clusters?
  • Track rate: 500 MHz/cm2, latency: 20 us
  • Smaller clusters: hit rate 500 MHz/cm2 x 2 = 1 GHz/cm2
    • 4x4, 16 buffers: loss < 0.01% (a buffer depth of 11 would have been acceptable)
  • Larger clusters: hit rate 500 MHz/cm2 x 7.5 = 3.7 GHz/cm2!
    • 4x4, 16 buffers: loss < 1%, which is still acceptable
  • Same conclusion for 2x2 with 8 buffers

  7. Physical buffer size: PR 4x4
  • 16 words x 74 bits = 1184 bits
  • 4x4 pixel region:
    • Total area: ~25 um x 100 um x 16 = 40,000 um2
    • Digital: ~1/2 = 20,000 um2
  • Total memory size (with guessed overhead):
    • Standard RAM: 1.05 x 0.5 um2 x 1184 = 621 um2 x 1.25 = 777 um2
    • Rad-tol RAM: 1.5 x 2.6 um2 x 1184 = 4617 um2 x 1.25 = 5771 um2
    • Latches: 1.8 x 3.8 um2 x 1184 = 8099 um2 x 1.5 = 12147 um2
    • A 16 x 74 memory with minimum latches has been implemented in 7136 um2 (half the size of the conservative guess!, comparable to rad-hard RAM). Thanks to S. Bonacini.
  • Percentage of digital area used for the latency buffer:
    • Standard RAM: 4% — not rad hard
    • Rad-tol RAM: 29% — OK, overhead TBC
    • Latches: 60% — too large
    • Mini latches: 35% — would be OK; rad tol?

  8. Physical buffer size: PR 2x2
  • 9 words x 26 bits = 234 bits
  • 2x2 pixel region:
    • Total area: ~25 um x 100 um x 4 = 10,000 um2
    • Digital: ~1/2 = 5,000 um2
  • Total memory size (with guessed overhead):
    • Standard RAM: 1.05 x 0.5 um2 x 234 = 123 um2 x 1.25 = 154 um2
    • Rad-tol RAM: 1.5 x 2.6 um2 x 234 = 913 um2 x 1.25 = 1140 um2
    • Latches: 1.8 x 3.8 um2 x 234 = 1601 um2 x 1.5 = 2400 um2
  • Percentage of digital area used for the latency buffer:
    • Standard RAM: 3% — not rad hard
    • Rad-tol RAM: 23% — overhead largely underestimated?
    • Latches: 50% — could be OK
    • Mini latches: 27% — looks promising; rad tol?

  9. Physical buffer size: PR 1x1
  • 5 words x 14 bits = 70 bits
  • Impractically small RAM (use latches)
  • 1x1 pixel region:
    • Total area: ~25 um x 100 um = 2500 um2
    • Digital: ~1/2 = 1250 um2
  • Total memory size (with guessed overhead):
    • Standard RAM: 1.05 x 0.5 um2 x 70 = 37 um2 x 1.25 = 50 um2
    • Rad-tol RAM: 1.5 x 2.6 um2 x 70 = 273 um2 x 1.25 = 340 um2
    • Latches: 1.8 x 3.8 um2 x 70 = 478 um2 x 1.5 = 720 um2
  • Percentage of digital area used for the latency buffer:
    • Standard RAM: 4% — not rad hard
    • Rad-tol RAM: 27% — overhead largely underestimated
    • Latches: 58% — large!
    • Mini latches: ~35% — could be OK; rad tol?
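The memory areas on the three slides above all follow one pattern: cell area per bit x number of bits x overhead factor. A small sketch to reproduce the arithmetic (the slides round intermediate values, so a few quoted figures differ by ~1 um2):

```python
# Cell areas (um^2 per bit) and guessed overhead factors from the slides
CELLS = {
    "standard RAM": (1.05 * 0.5, 1.25),  # TSMC RAM cell, ~25% overhead
    "rad-tol RAM":  (1.5 * 2.6,  1.25),
    "latches":      (1.8 * 3.8,  1.50),  # small std cell, ~50% overhead
}

def memory_area(bits, cell_area, overhead):
    """Total memory area in um^2, including guessed overhead."""
    return bits * cell_area * overhead

for pr, bits in [("4x4", 16 * 74), ("2x2", 9 * 26), ("1x1", 5 * 14)]:
    for name, (cell, ovh) in CELLS.items():
        print(f"{pr} {name}: {memory_area(bits, cell, ovh):.0f} um2")
```

Dividing each result by the corresponding digital area budget (20,000 / 5,000 / 1,250 um2) gives the area percentages quoted per option.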

  10. Latency and buffer conclusions
  • A 20 us latency buffer with <0.1% loss seems feasible for a 25 um x 100 um pixel size
    • ~1/2 of the area assigned to the analog front-end
    • ~1/2 of the area assigned to digital:
      • Buffer shared across pixels in pixel regions
      • 4x4 PR: ~30% of digital area needed for 16 buffers of rad-tol RAM
      • 2x2 PR: <50% of digital area needed for latches
      • 1x1 PR: <60% of digital area needed for latches
  • 10 us latency: ~30% memory reduction
  • Accept 1% loss: ~15% memory reduction
  • 10 us, <1% loss: ~50% memory reduction
  • Based on conservative assumptions (inner layer):
    • Track/particle rate: 500 MHz/cm2
    • Cluster size: ~4 (middle barrel)
    • What if cluster size is ~2x larger (~4 GHz/cm2)?
      • Loss < 1%, acceptable for small high-rate regions
  • Details of rad-hard RAM/latch to be confirmed:
    • Area overhead for small memories
    • Radiation tolerance to 1 Grad

  11. Pixel readout

  12. Basic CMS HL-LHC assumptions
  • Cluster size: ~4 (average)
    • Cluster size and shape vary significantly over the pixel detector: middle barrel, end barrel, end-cap disks, tracks from the collision point, machine background/halo, loopers, monsters, etc.
    • Depends on sensor type, thickness and radiation damage.
  • Rate: worst case HL-LHC (layer locations as in Phase 1)
    • Layer 1 (3.0 cm): ~500 MHz/cm2 tracks -> ~2 GHz/cm2 hits (TBC)
    • Layer 2 (6.8 cm): ~1/2 of layer 1
    • Layer 3 (10.2 cm): ~1/2 of layer 2 -> ~1/4 of layer 1
    • Layer 4 (16.0 cm): ~1/2 of layer 3 -> ~1/10 of layer 1 (50 MHz/cm2 tracks, 200 MHz/cm2 hits)
    • End-cap disks? The usual 1/r2 scaling has not been used, to be conservative (too conservative?)
  • Pixel size: ~25 x 100 um = 2500 um2
  • Pixel chip: ~6.5 cm2, ~256k pixels
  • Pixel Regions (PR): 4 x 4 or 2 x 2
  • Tracks/hits per ReadOut Chip (ROC) per BX:
    • Layer 1: 50 kHz/pixel, 75 tracks/ROC/BX, 300 hits/ROC/BX
    • Layer 4: 5 kHz/pixel, 7.5 tracks/ROC/BX, 30 hits/ROC/BX
  • L1 Trigger: 1 MHz (500 kHz), 20 us (10 us)
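The per-ROC numbers follow directly from rate x chip area / bunch-crossing frequency; a quick cross-check (the slide's ~75 and ~300 are rounded):

```python
BX_RATE = 40e6    # LHC bunch-crossing frequency, Hz
CHIP_AREA = 6.5   # pixel chip area, cm^2

def per_roc_per_bx(rate_per_cm2):
    """Average count per readout chip per bunch crossing."""
    return rate_per_cm2 * CHIP_AREA / BX_RATE

print(per_roc_per_bx(500e6))  # layer-1 tracks/ROC/BX -> 81.25 (slide: ~75)
print(per_roc_per_bx(2e9))    # layer-1 hits/ROC/BX   -> 325.0 (slide: ~300)
```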

  13. General pixel (chip) architecture
  • Pixels: ~256k, chip size = ~2.6 cm x ~3 cm
  • ~1G transistors, ~3 W

  14. Readout rate and data formatting
  • Readout link bandwidth depends on many aspects:
    • Hit rates, trigger rates, required data (e.g. 4b ADC), readout format, use of Pixel Regions (PR), clustering approach (extraction).
  • Not (yet) using readout re-formatting across Pixel Regions:
    • Data reduction: 10-20%
  • Not (yet) using on-chip track position extraction (cluster centre):
    • Requires intelligence in the EOC: correlate data from different pixel columns
    • Handling of "strange"/broken clusters
    • Data reduction: 25% - 50%
    • Keep this option as a possible "safety factor" for the future.
  • Readout formatting, as a function of pixel region and data:
    • Event header: 32-bit event header (12b B-ID, 12b E-ID, 8b diverse)
    • 1 x 1 binary: 18-bit pixel address = 18 bit
    • 1 x 1 TOT: 18-bit address + 4-bit TOT = 22 bit
    • 2 x 2 binary: 16-bit PR address + 4-bit hit map = 20 bit
    • 2 x 2, 4 TOT: 16-bit PR address + 4 x 4-bit TOT = 32 bit
    • 2 x 2, 1-4 TOT: 16-bit PR address + 4b hit map + 1-4 x 4b TOT = 24 - 36 bit
    • 4 x 4 binary: 14-bit PR address + 16b hit map = 30 bit
    • 4 x 4, 16 TOT: 14-bit PR address + 16 x 4-bit TOT = 78 bit
    • 4 x 4 hit map, 1-16 TOT: 14-bit PR address + 16b hit map + 1-16 x 4-bit TOT = 34 - 94 bit
    • 4 x 4, 1-16 adr + TOT: 14-bit PR address + 1-16 x (4-bit adr + 4-bit TOT) = 22 - 142 bit
    • Track extraction: 18-bit centre pixel adr + 3+3-bit interpolation + 8-bit charge = 32 bit
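The per-hit record sizes in the list above all decompose into address + optional hit map + a number of 4-bit TOT/ADC fields; a small sketch covering a few of the listed formats:

```python
def pr_format_bits(adr_bits, hitmap_bits=0, n_tot=0, tot_bits=4):
    """Encoded size of one pixel-region hit record:
    PR (or pixel) address + optional hit map + n_tot TOT/ADC fields."""
    return adr_bits + hitmap_bits + n_tot * tot_bits

# 1x1 with TOT: 18-bit address + one TOT field = 22 bit
print(pr_format_bits(18, 0, 1))
# 2x2, hit map + 1-4 TOT fields: ranges from 24 to 36 bit
print(pr_format_bits(16, 4, 1), pr_format_bits(16, 4, 4))
# 4x4, 16 TOT fields, no hit map: 78 bit
print(pr_format_bits(14, 0, 16))
```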

  15. LP-GBT: optical link interface chip
  • LP-GBT user bandwidth, 2 x GBT: ~6.4 Gbits/s
    • Single and multiple bit error correction
    • 5 E-links @ 1.2 Gbits/s
    • 10 E-links @ 640 Mbits/s
    • 20 E-links @ 320 Mbits/s
    • 40 E-links @ 160 Mbits/s
  • What if ~10 Gbits/s user bandwidth (no data protection)?
    • (4 E-links @ 2.4 Gbits/s)
    • 8 E-links @ 1.2 Gbits/s
    • 16 E-links @ 640 Mbits/s
    • We do not care about a few corrupted hits (noise) as long as event sync is not lost.
  • E-links at 1.2 Gbits/s:
    • Feasible on 2-10 m low-mass electrical links
    • Cable: Alu/copper micro coax or twisted pair, Alu-Kapton flat cable, ?
    • Limited-complexity cable driver/receiver
    • Feasible with limited power in 65 nm and at very high radiation
    • Enables very nice readout/system flexibility
    • Also appropriate if the link chip is on the module (local on-module connections)
  • Gain of using higher speed E-links?
  • Assumption: opto part not on the pixel module (radiation, size)

  16. Readout of inner barrel layer @ 1 MHz
  • Inner layer: hit rate 2 GHz/cm2, 1 MHz trigger
  • Data rates calculated with a simplified statistical model:
    • Generic cluster size distribution
    • Pixel Region organization
  • First conclusions:
    • ~1 LP-GBT (6.4 Gbits/s) required per pixel chip for layer 1
    • ~30% reduction if only binary
    • ~30% effect of encoding and PR when having TOT/ADC
    • ~50% reduction possible using on-chip track position extraction
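A back-of-envelope version of that estimate is hits per triggered event x bits per hit x trigger rate. The ~20 bit/hit average encoding cost below is an illustrative assumption (it depends on the PR format chosen on the previous slide), not a number from the slides:

```python
HITS_PER_EVENT = 325   # layer-1 hits/ROC/BX (2 GHz/cm2 x 6.5 cm2 / 40 MHz)
TRIGGER_RATE = 1e6     # L1 trigger rate, Hz
BITS_PER_HIT = 20      # assumed average encoded hit size

readout_bw = HITS_PER_EVENT * BITS_PER_HIT * TRIGGER_RATE
print(f"{readout_bw / 1e9:.1f} Gbits/s")  # 6.5 Gbits/s -> ~1 LP-GBT per chip
```

This lands at roughly one 6.4 Gbits/s LP-GBT per layer-1 chip, consistent with the slide's first conclusion.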

  17. Opto links per pixel module
  • 1 MHz trigger rate
  • Opto link rate: 6.4 Gbits/s with multi-bit corrections (needed?)
  • Module types:
    • Layer 1-2: 4 pixel chips with 2 - 4 LPGBT links
    • Layer 3-4: 8 pixel chips with 1 - 2 LPGBT links
  • If ~10 Gbits/s link (links per module):
    • Module type A (inner layers, 4 ROCs) needs 1-2 LPGBT links
    • Module type B (outer layers, 8 ROCs) needs 1-2 LPGBT links
  • End-cap disks can most likely be made of module type B (or a possible C type?)

  18. Pixel readout cabling
  • ~2 m cabling from pixel module to LPGBT
    • Assumes that the LPGBT and opto cannot survive on the pixel module (1 Grad) but can survive on the service cylinder (100 Mrad)
  • Data rate: 1.2 Gb/s (initial proposal)
  • Cable types:
    • PSI twisted pair: 400 Mb/s over 1 m
    • Habia UBT 3601 ST 2
    • ATLAS tracker cable: 4-6 Gb/s over 4-6 m

  19. Estimated material from cables
  • PSI twisted pair:
    • Acceptable material
    • Does it have the bandwidth?
    • Very fragile
  • Habia:
    • Has the bandwidth
    • Too heavy for a 1 MHz trigger
  • Solution: somewhere in between
    • Cu-plated Alu wire?
    • Increased cross section
    • Alu shielding?
    • Highly optimized cable driver/receiver (cable equalization)
  • Dedicated study required! (D. Abbaneo)

  20. Pixel modules for 1 MHz trigger
  • 1x4 pixel chip modules for layers 1 & 2
  • 2x4 pixel chip modules for layers 3 & 4 and the forward disks
  • Electrical links: 1.2 Gbits/s on Kapton, micro coax/pair
    • Only point-to-point connections
    • Pre-emphasis drivers
    • Equalizer (filter) receivers
  • Pixel chip:
    • 1-8 x 1.2 Gbits/s data links
    • 1 x 320 Mbits/s control/timing link
    • Power: 0.5 - 1 W/cm2
    • No pixel module controller
  • LPGBT:
    • 8 x 1.2 Gbits/s data links = 10 Gbits/s (user data!)
    • 8 x 320 Mbits/s control/timing links = < 5 Gbits/s down link (much less would be OK)

  21. 4x1 pixel module: A
  • Same 4x1 pixel module for layers 1 & 2
  • Different link configurations:
    • Layer 1: 2 x 10 Gb/s links (3 x 6.4 Gb/s links)
    • Layer 2: 1 x 10 Gb/s link (2 x 6.4 Gb/s links)

  22. 4x2 pixel module: B
  • Same 4x2 pixel module for layers 3 & 4 & disks
  • Different link configurations:
    • Layer 3: 2 x 10 Gb/s links (3 x 6.4 Gb/s links)
    • Layer 4: 1 x 10 Gb/s link (2 x 6.4 Gb/s links)

  23. Option B: LP-GBT on pixel module
  • Two variants of each pixel module type with 1/2 links?
  • Where to put the LP-GBT? Cooling: it will not be such a low-power chip (1/2 - 1 W?)
    • In the middle of the hybrid, on top of the sensor
    • Extensions at the ends
    • On the side
  • Is it sufficiently radiation hard?
    • Likely not, as it runs at very high speed at the limit of the technology
    • We will not know before very late!
    • Use it at 1/2 speed? But then we need twice as many.
    • SEU rates in our extreme radiation environment?
  • Delicate high-speed copper/alu link
    • Special, very radiation-hard cable driver and receiver
  • How to power it?
    • It may need a higher supply voltage than our highly power-optimized pixel chip design.
    • Will not have on-chip DC/DC.
    • What if we use serial power? It will most likely need its own dedicated power.
  • In the ideal world where we could put the opto on the pixel module (radiation tolerance, compactness?), one would most likely use the same pixel chip - LPGBT communication (no cable driver/receiver).
  • Make the pixel chip (and LPGBT) so both options can be used (cable drivers/receivers).

  24. Option C: "LP-GBT" in pixel chip
  • We still have the problem of the opto and the related drivers/receivers to/from the remote board
  • One opto link per pixel chip is not optimal for the outer layers
  • Build in an "LP-GBT" concentrator and fan-out function, and only use the link for 1/2 or 1/4 of the chips
  • What link speed?
    • 10G: speed degradation with radiation is a significant risk
    • 5G: more links, and possibly still non-trivial to integrate and get working at our very high radiation levels?
  • Low-speed down link for timing/control: 320 Mbits/s
  • Other options:
    • Dedicated module controller and link chip
    • Only a serializer in the pixel chip, with timing/control fanned out from an LPGBT on a remote module
    • ?

  25. Pixel phase 2 spaghetti
  • Pixel detector: ~1000 modules
    • Each module needs 12 - 24 x 1.2 Gbits/s E-links
    • Needs ~15k electrical "low mass" links
  • What is a low-mass electrical link?
    • Distance, speed, cable, driver, receiver, ...
    • Routing/installation?
    • Material and physics impact?
  • Baseline: low-mass twisted pair as used in CMS pixel phase 1
    • Need to confirm that we can get 1.2 Gbits/s on this with optimized drivers/receivers.
  • How to improve:
    • Lower trigger rate: 1/2 (500 kHz)
      • (Confirm hit rates)
    • On-chip track/cluster extraction: 1/2?
      • Local intelligence in a very hostile radiation environment
    • Higher speed links: 2 - 10 Gbits/s?
      • How much "copper" is needed for this?
      • High-speed circuits in a radiation environment?
    • Opto on pixel module (readout = VCSEL)
      • Can it survive the radiation?
      • Speed: 5 - 10 Gbits/s?
      • Power, compactness, ...

  26. Summary of 1 MHz trigger
  • A 1 MHz trigger will be a significant challenge for the pixel system (not so much for the pixel chip)
    • Low-mass requirements
    • Opto and high-speed serializer at 2-10 m distance
  • Requires 1-2 x 10 Gbits/s opto links per pixel module (both type A and B)
    • Data reduction: lower trigger rate, track location extraction
  • Remote opto: connection from pixel module to optical link: 2-10 m
    • Modest E-link speed to LPGBT: 1.2 Gbits/s
    • 12-24 E-links per pixel module
    • ~15k low-mass cables!
    • Flexible and modular approach to deal with different data rates on different modules, with a factor-2 safety margin in available ROC output bandwidth (use only 4 of 8 outputs)
    • CABLE SPAGHETTI AND MATERIAL!
  • LPGBT on module: 10 Gbits/s up + 5 Gbits/s down
    • Few (2 x 2) high-speed electrical links per module
    • High-mass cable + complex high-speed link driver/receiver required?
    • Can the LPGBT survive the radiation?
  • Built-in modest-speed "GBT" in the pixel chip:
    • Multiple intermediate-speed (2-5 Gbits/s) electrical links
  • Local opto:
    • Radiation hardness?
    • Compactness?
    • Link speed?
    • LPGBT or custom chip or built-in?
    • Only the driver (laser) local, with a remote receiver (PIN)?

  27. Summary & conclusions
  • A 1 MHz trigger rate is a major challenge for the readout of the pixel detector:
    • Highest hit rates of all detectors
    • Tightest constraints on space/mass/power
    • Extremely hostile radiation environment
    • Seems feasible, at a cost in material budget!
  • A 20 us trigger latency can imply larger pixels:
    • Pixel size will be dictated by what we have to put in each pixel of the ROC: analog + buffering
    • Storage (SRAM) cells will have to be large because of radiation
    • To be studied in detail
  • Pixel contribution to trigger:
    • Data bandwidth out of the pixel detector is the bottleneck
    • Complex processing in the ROC will be difficult: radiation, power, area, complexity
    • Limited time to evaluate ideas and global gain
  • Differences to ATLAS may require a dedicated CMS pixel chip
    • ATLAS L0: 500 kHz, 6 us, (10% ROI)
    • (ATLAS L1: 200 kHz, 20 us)
  • Technology and building blocks will come from RD53

  28. What we need to study/understand
  • Can opto be used within the pixel volume?
    • At least for readout: VCSEL + driver + serializer
    • Radiation, size, power
  • Where can (opto) services be located: 2 m?
    • Full detector layout required?
  • What is a low-mass electrical link?
    • Rate, length, mass, driver/receiver complication, power
  • Confirm the assumed hit rates: track rate, cluster size
    • How does this depend on sensor and layout?
  • Additional data for a pixel trigger at L1?
    • Seems difficult!
  • Question: why does CMS "need" a 1 MHz trigger rate when it has a track trigger?
    • ATLAS aims at 500 kHz, without a track trigger!

  29. Backup

  30. 4x2 pixel module with link redundancy
  • Requirements:
    • Two control inputs per pixel chip with automatic selection of the correct/active one.
  • Redundant mode:
    • Only one of the links active
    • 1/2 data bandwidth
  • Full bandwidth mode:
    • Data from one pixel chip will be split across the two links
    • Control from one link, and data to two links
  • Notice: only point-to-point links

  31. Pixel contribution to trigger?
  • Many questions and challenges
  • Self-seeded (push): on what basis?
    • Double-layer Pt cut a la TT: NO
      • Requires very high rate connections between pixel chips on different layers, plus chip overlaps
      • Chip-to-chip bandwidth (minimum): ~50 Gbits/s (inner): 75 tracks/IC/BX x 40 MHz x ~18 bits (optimistic)
      • Connectivity nightmare in a "low" mass pixel detector
    • "Pt cut" based on cluster shape: NO
      • Depends strongly on sensor type and sensor thickness, which are not yet fixed
      • Requires local intelligence to analyze all clusters in real time
      • We are already very tight on resources for buffering/logic/triggering
      • Has to be done in an extremely hostile radiation environment
      • Can this work across the whole pixel barrel + endcap?
      • What is the rate reduction, and what is the physics gain?
    • Send coarse information for each cluster @ 40 MHz: NO
      • Requires local and intelligent cluster extraction
      • Requires ~50 Gbits/s per chip on top of the normal readout (5 Gbits/s) (inner)
    • Possibly do one of these for the outer layers only (1/10 of the inner-layer rate): ?
  • ROI based (pull, as in ATLAS)
    • Two-level trigger?: rates, latency and ROI fraction
      • High-rate L0 trigger: 10 MHz?, 3-6 us?, 10% ROI?
      • Low-rate L2 trigger: 100 kHz?, 100 us?
    • What to send? (Full or only coarse cluster information)
  • Groups are still working on this, in case there is some significant gain at reasonable cost (complexity and data rates from the pixel front-ends)
  • Data bandwidth out of the pixel detector is the bottleneck (+ ROC complexity)
    • Low power, logic/buffer density, radiation, communication and links, etc.
  • New realistic ideas must be verified soon!
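The ~50 Gbits/s minimum chip-to-chip bandwidth quoted above is straightforward to reproduce from the slide's own inputs:

```python
TRACKS_PER_CHIP_PER_BX = 75   # inner layer, per readout chip
BX_RATE = 40e6                # bunch crossings per second, Hz
BITS_PER_TRACK = 18           # optimistic encoding, bits

bw = TRACKS_PER_CHIP_PER_BX * BX_RATE * BITS_PER_TRACK
print(f"{bw / 1e9:.0f} Gbits/s")  # prints "54 Gbits/s", i.e. ~50 Gbits/s
```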
