FE-I4 Architecture & Performance

Presentation Transcript

  1. FE-I4 Architecture & Performance. Marlon Barbero, Universität Bonn. 2nd ATLAS CMS Electronics for SLHC, CERN, Mar. 04th 2009.

  2. FE-I4 for IBL & sLHC • IBL (~2014): inserted layer @ 3.7cm in current pixel detector. • sLHC tentative layout (>2017): pixel layers at 3.7cm, 7.5cm, 16cm, 20cm (note: discussion on boundary pixel / short strips, …). [Figures: IBL; tentative ID layout for sLHC, showing IBL at R~37, 4 layers of pixels (FE-I4), 3 layers of short strips and 2 layers of long strips, with fixed and removable parts. Source: M. Garcia-Sciveres, ACES, Mar. 03rd 09.]

  3. FE-I4: Some Specifications • Pixel size: 50×250 μm². • Pixel array: 80 columns × 336 rows = 26880 pixels/FE. • Dimensions FE-I4: ~20×19 mm². • Analog goals: 1.5V, 10μA/pixel. Digital goals: 1.2V, 10μA/pixel. • Analog information: ToT coded on 4 bits. • Pseudo-LVDS output: 160Mb/s. • Rad.-hardness: >200Mrad ionizing dose (FE-I3: >50Mrad). • Minimal guidelines: no ELT, nmos guard rings for analog & sensitive digital circuitry. • Sensor capacitance: 0-0.5pF. • Low noise at low cap. (~100e⁻). • DC leakage current tolerance: I_leak > 100nA. (A. Mekkaoui, ACES, Mar. 04th 09)
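The specification numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the talk: it assumes the 80 columns run along the 250 μm pixel side (inferred from the ~20 mm chip dimension), and it assumes the per-chip power is simply the per-pixel goals summed over all pixels, ignoring the periphery.

```python
# Values quoted on the slide.
N_COLUMNS, N_ROWS = 80, 336
PIXEL_W_UM, PIXEL_H_UM = 250, 50      # pixel size in micrometres
ANALOG_V, DIGITAL_V = 1.5, 1.2        # supply voltage goals in volts
CURRENT_UA_PER_PIXEL = 10             # current goal for each supply, in microamps

n_pixels = N_COLUMNS * N_ROWS

# Assumption: columns are spaced by the long (250 um) pixel side.
active_width_mm = N_COLUMNS * PIXEL_W_UM / 1000
active_height_mm = N_ROWS * PIXEL_H_UM / 1000

# Assumption: chip power = per-pixel goals times pixel count (no periphery).
power_uw_per_pixel = (ANALOG_V + DIGITAL_V) * CURRENT_UA_PER_PIXEL
chip_power_w = power_uw_per_pixel * n_pixels / 1e6

print(n_pixels, active_width_mm, active_height_mm, round(chip_power_w, 2))
```

Under these assumptions the active area comes out at about 20 mm by 16.8 mm, which fits inside the ~20×19 mm² chip dimensions quoted above.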

  4. Digital Readout Architecture. Both FE readouts are based on a double-column (DC) structure. FE-I3 (bottleneck): • All hit pixels are shipped to the EoC buffer. • A hit pixel needs to transfer its data to EoC before accepting a new hit → congestion. • Each pixel is logically independent inside the DC. FE-I4 (local storage, low traffic on DC bus): • Store data locally in the DC until L1T. • Only 0.25% of pixel hits are shipped to EoC → DC bus traffic “low”. Warning: local buffer congestion? • Each pixel is tied to its neighbors (time info; real hits are clustered). TW out.

  5. Simulation (David Arutinov, Bonn). Two sources of inefficiency are identified in the FE-I4 architecture: • Pile-up inefficiency ∝ (hit rate; mean ToT; area) → untie neighbor pixels if needed & aggressive return to baseline. • Local buffer overflow → increase Logic Unit / Local Buffer Region (averaging-out effect) & increase # of cells per Local Buffer. Pile-up model: n/m = 1 + n·τ, with n: true interaction rate, m: recorded count rate, τ: mean ToT. [Plot: pile-up inefficiency, analytical vs. simulation, mean ToT = 4: LHC 0.13%, 3xLHC 0.56%, sLHC 1.9%.]
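The pile-up relation on this slide reads as the usual dead-time model n/m = 1 + nτ, which gives a lost fraction 1 - m/n = nτ / (1 + nτ). The sketch below evaluates it for one pixel. The inputs are assumptions chosen for illustration: a hit rate of about 12 pixel hits per bx per cm² (the order of magnitude listed for the 37 mm layer at 3×LHC elsewhere in this transcript), the 50×250 μm² pixel area, and a mean ToT of 4 bx. The function name is hypothetical.

```python
def pileup_inefficiency(true_rate_per_bx, mean_tot_bx):
    """Fraction of hits lost to pile-up, from n/m = 1 + n*tau."""
    load = true_rate_per_bx * mean_tot_bx
    return load / (1.0 + load)

# Assumed inputs (illustrative, see lead-in).
HITS_PER_BX_PER_CM2 = 12.0
PIXEL_AREA_CM2 = 250e-4 * 50e-4      # 250 um x 50 um expressed in cm
MEAN_TOT_BX = 4

rate_per_pixel = HITS_PER_BX_PER_CM2 * PIXEL_AREA_CM2
inefficiency = pileup_inefficiency(rate_per_pixel, MEAN_TOT_BX)
print(f"{100 * inefficiency:.2f} %")
```

With these inputs the result is a bit below 0.6%, the same order as the 0.56% quoted for 3xLHC. The full simulation also accounts for area and clustering effects, so the agreement should be read as a plausibility check only.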

  6. Local Buffer Overflow (2x2), 3xLHC • Local buffer overflow inefficiency for the 3.7cm layer (latency 120 BX): 0.5% with 5 cells; 0.1% with 6 cells; 0.01% with ~7-8 cells. [Plot: overflow inefficiency vs. latency (BX), simulation and analytical, for 3xLHC and sLHC; annotation: x6.]

  7. Inefficiency FE-I4 (2x2). At 3 times LHC luminosity, r~3.7cm, FE-I4 inefficiencies should be in an acceptable range. [Plot annotations: x6; 0.6%; mean ToT = 4.]

  8. Towards a reference design. Now all pixels in the buffer area are ‘semi’-tied together. Due to the smaller radius (3.7 cm vs. 5.05 cm), charge sharing in Z becomes comparable with r/phi. Buffer occupancy follows the Erlang-B function: p_n = (ρ^n / n!) / Σ_{i=0..k} (ρ^i / i!), with n: # of buffers occupied; k: total # of buffers; ρ = λμ, where λ: hit probability and μ: busy time. [Figure: 4-pixel region in a DC, at η=0 and η=2.5.]
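The slide names the Erlang-B function with n occupied buffers, k buffers in total, and load ρ = λμ. A minimal sketch of the textbook Erlang formulas follows. It is not the simulation used in the talk, the function names are hypothetical, and the load value in the example loop is an arbitrary illustration rather than a number from the slides.

```python
from math import factorial

def erlang_state_probability(n, k, rho):
    """Probability that n of k buffers are occupied at offered load rho."""
    norm = sum(rho**i / factorial(i) for i in range(k + 1))
    return (rho**n / factorial(n)) / norm

def erlang_b(k, rho):
    """Blocking (overflow) probability, i.e. all k buffers occupied.

    Uses the standard recursion, which avoids large factorials.
    """
    blocking = 1.0
    for j in range(1, k + 1):
        blocking = rho * blocking / (j + rho * blocking)
    return blocking

# Illustrative load only (assumption): rho = lambda * mu = 0.5.
for cells in range(4, 9):
    print(cells, f"{erlang_b(cells, 0.5):.2e}")
```

The loop shows the qualitative behaviour the slides rely on: each extra buffer cell reduces the overflow probability by a large factor at fixed load.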

  9. Region Schematic • A 4-pixel unit with these functionalities: • Time-stamping (up to 5 stored at a time). • ToT coded on 4 bits: no hit, small hit, long hit, analog values. • Neighbor bit. • Small hit → available to neighbor region. [Schematic: four discriminators (top left, top right, bottom left, bottom right) feeding hit processing (TS / small / big / ToT); 5 ToT memories per pixel; 5 latency counters per region; signals: Token, Read & Trigger, L1T, Read, Neighbor.]

  10. Digital Column Architecture • 168 regions + CLK + buffering scheme → 1 Double-Column. • Simple buffer. • H-tree. • Delay compensated for skew balancing.
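The figure of 168 regions per double-column follows from the 80×336 array if each region is a 2×2 pixel block spanning both columns of a DC. The sketch below spells that out. The zero-based numbering and the function name `locate` are hypothetical, chosen only for illustration; the slides do not define an addressing convention.

```python
N_COLUMNS, N_ROWS = 80, 336
COLUMNS_PER_DC = 2
REGION_COLS, REGION_ROWS = 2, 2   # assumption: 2x2 regions spanning one DC

n_double_columns = N_COLUMNS // COLUMNS_PER_DC
regions_per_dc = (COLUMNS_PER_DC * N_ROWS) // (REGION_COLS * REGION_ROWS)

def locate(column, row):
    """Map a pixel (column, row) to (double-column index, region index).

    Hypothetical zero-based numbering, for illustration only.
    """
    if not (0 <= column < N_COLUMNS and 0 <= row < N_ROWS):
        raise ValueError("pixel outside the 80 x 336 array")
    return column // COLUMNS_PER_DC, row // REGION_ROWS

print(n_double_columns, regions_per_dc, locate(79, 335))
```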

  11. Work in progress (Tomasz Hemperek, Bonn). [Figures: region symbol; region layout (labels as extracted: 188mm, 94mm, 50m); delay matching of clk; DC schematic (x8); drop on vdd; addresses.]

  12. FE-I4 Performance • Inefficiency at 3.7cm @ 3xLHC: 0.56% (double-hit) + 0.05% (5-deep buffer overflow) (~0.35% + ~0.0065% for 16cm at sLHC, 50ns bx). • Area: → cells from provider, 100 × 102 μm². • Power: → 1 hit/bx/DC, 100kHz L1T: 2.6μW / pixel. Warning: this is before adding any buffering, clock distribution… → 5-6μW digital total?
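To put the per-pixel digital power in context, the sketch below scales it to a full chip of 80×336 = 26880 pixels and compares it with the digital goal of 1.2 V at 10 μA per pixel stated in the specifications. It assumes the per-pixel figure applies uniformly to every pixel, which the slide does not claim; the function name is hypothetical.

```python
N_PIXELS = 80 * 336                    # pixel array size from the specifications
DIGITAL_GOAL_UW = 1.2 * 10             # 1.2 V x 10 uA per pixel, in microwatts

def chip_power_mw(power_uw_per_pixel, n_pixels=N_PIXELS):
    """Scale a per-pixel power in microwatts to a per-chip power in milliwatts."""
    return power_uw_per_pixel * n_pixels / 1000.0

for label, uw in [("region logic only", 2.6),
                  ("digital total, low guess", 5.0),
                  ("digital total, high guess", 6.0),
                  ("digital goal", DIGITAL_GOAL_UW)]:
    print(f"{label}: {uw} uW/pixel -> {chip_power_mw(uw):.1f} mW/chip")
```

Under this assumption even the 5-6 μW guess stays at roughly half of the 12 μW per-pixel digital goal.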

  13. Needs in Periphery • Focus needs to shift to periphery. • Command decoder → L1T / configuration. • Ctrl block → handles token pass, read request to DC, readout from DC. • Data formatting → data output protocol, compression, 8b10b. • Data transmission → pseudo-LVDS output from fast CLK. • Power blocks → regulator. • Pad frame.

  14. Status of FE-I4 Periphery. [Block diagram. Pixel array: 80×336 pixels, 40 DCs each with EoC, 28 b × 40 DC data path, control signals L1T, token, read. Periphery blocks: data compression; bypass-able data formatting (protocol) with error detection (parity/CRC?); asynchronous FIFO; trigger FIFO; ctrl block; pixel config; global config; config. monitoring; DACs; interface; clk select (40MHz); bypass-able PLL (40MHz in, 160MHz out); ‘LVDS’ out at 160Mb/s; powering; aux. Legend: each block is marked either “in advanced stage” or “effort needed”.]

  15. Summary FE-I4 Architecture • A lot of work was performed during the last 4-6 months on the digital region + digital Double-Column. → Will remain high on the priority list in coming months (performance studies, improvements, optimization). • Focus has started shifting to the FE periphery. → Much effort needed there (interface, data output protocol, control block…). • Validation, testability. • Milestones 2009: • Reviews foreseen for 2009, March and early summer. • Full-scale design completed: fall 2009. Needless to say, this is an aggressive schedule.

  16. FE-I4_proto1 collaboration • Participating institutes: Bonn, CPPM, Genova, LBNL, Nikhef. Bonn: D. Arutinov, M. Barbero, T. Hemperek, M. Karagounis. CPPM: D. Fougeron, M. Menouni. Genova: R. Beccherle, G. Darbo. LBNL: R. Ely, M. Garcia-Sciveres, D. Gnani, A. Mekkaoui. Nikhef: R. Kluit, J.D. Schipper. [Chip photos with labels: FE-I4-P1 (dimension labels 3mm, 4mm): 61x14 array, LDO regulator, control block, charge pump, capacitance measurement, DACs, current reference, ShuLDO + tri-state LVDS. Other test chips: SEU test IC; LVDS/LDO/10b-DAC; 4-LVDS Rx/Tx.]

  17. Backup slides

  18. FE-I4 • Originally developed as an IC for the b-layer upgrade. • Similar bandwidth for IBL and outer layers at sLHC ~2017 + construction of sLHC outer layers scheduled sooner than insertable inner layers → FE-I4 is a good fit for both projects. • FE-I4 for IBL requires: • hit rate ×4-5 wrt FE-I3, 5cm. • small pixel & big chip (active fraction). • compatible w. present RO & ctrl. • compatible w. different sensor types. • FE-I4 for outer layers @ sLHC requires: • big chip for cost reduction. • compatible w. sLHC RO & ctrl. • lower current & compatible w. new powering schemes.

  19. Motivation for re-design of FE: FE-I3 → FE-I4 • Need for new FE: • Smaller b-layer radius + potential luminosity increase → higher hit rate; FE-I3 column-drain architecture saturated. → FE-I4 has new digital architecture. → FE-I4 has smaller pixel (reduced cross-section). • Enhancements brought to FE-I4: • Improved active area ratio (<¾ → 0.9): → bigger IC; reduced periphery; cost. • Power: analog design for reduced currents; decrease of digital activity (digital logic sharing for neighbor pixels); new powering concepts. • Adapt to sensor technologies with different cap. / leak. • New technology: → availability, rad-hard, higher integration density for digital circuits. [Plot: inefficiency vs. hit prob. / DC for FE-I3 (5cm), with LHC, 3xLHC and sLHC marked. Technology label: 0.25μm → 130nm.]

  20. 4-pixel / 8-pixel • Local buffer overflow inefficiency: quadri-pixel vs. octo-pixel. Averaging-out effect.

  21. FE-I4 geometry • Pixel: 250 μm × 50 μm. • Array: 80 columns × 336 rows. • No bricking. • Active area: FE-I3 74%, FE-I4 ~89%. • Vendor’s max chip size: 21mm × 19.5mm (review when above 20mm). [Drawing, FE-I4 on IBM reticule: 20.2mm wide, ~19mm high, 16.8mm active, ~2mm periphery, ~200μm margin. FE-I3 on Chartered reticule (24 x 32): 7.6mm wide, 8mm active, 2.8mm periphery.]

  22. Some target specs for FE-I4 • Rad.-hardness: >200Mrad ionizing dose (FE-I3: >50Mrad). • Minimal guidelines: no ELT, nmos guard rings for analog & sensitive digital circuitry. • Sensor capacitance: 0-0.5pF. • Low noise at low cap. (~100e⁻). • DC leakage current tolerance: > 100nA.

  23. Clock Multiplier (I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force) • For IBL, need to transmit data out at a BW of 160Mb/s. • 2 options: • (1) Send an 80MHz CLK to the FE and use both edges to transmit: needs modification of BOC / ROD to produce higher speed TTC; needs a synchronization protocol on the FE between 80MHz clock & beam crossing; a new DORIC needs to decode CLK at twice the frequency. • (2) Send a 40MHz CLK to the FE and multiply the clock on the FE: needs a clock multiplier on chip; note: synergy with what the strip MCC needs. • In FE-I4, we will provide both options: • Clock multiplier from the 40MHz input clock. • AUX: possibility to send the 80MHz to the FE.

  24. 8b10b encoder (I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force) • For IBL, need to transmit data out at a BW of 160Mb/s. • At BOC/ROD: data rate is 4 times the clock rate; phase adjustment; use Clock Data Recovery (CDR) mechanism. • CDR requires an output data stream with good engineering properties. • 8b10b: adequate for this purpose, enough transitions for reliable CDR; widely used → easy to implement; provides some level of error detection; provides comma for frame identification & synchronization.
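The rates on this slide can be tied together with simple arithmetic. The sketch below checks that 160 Mb/s is four times a 40 MHz clock, that an 80 MHz clock used on both edges gives the same line rate, and what payload remains after 8b10b coding, which by construction sends 10 line bits for every 8 data bits. Treating the quoted 160 Mb/s as the line rate (after encoding) is an assumption; the slide does not say which side of the encoder it refers to.

```python
LINE_RATE_MBPS = 160.0     # output bandwidth quoted on the slide
BEAM_CLOCK_MHZ = 40.0      # clock referred to as "the clock rate"

def ddr_rate_mbps(clock_mhz):
    """Bit rate when one bit is sent on each clock edge."""
    return 2 * clock_mhz

def payload_rate_mbps(line_rate_mbps):
    """Data rate left after 8b10b coding (8 payload bits per 10 line bits)."""
    return line_rate_mbps * 8 / 10

print(LINE_RATE_MBPS / BEAM_CLOCK_MHZ)       # data bits per 40 MHz clock period
print(ddr_rate_mbps(80.0))                   # 80 MHz clock, both edges
print(payload_rate_mbps(LINE_RATE_MBPS))     # payload if 160 Mb/s is the line rate
```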

  25. PLL Overview. [Block diagram: Phase Frequency Detector, Charge Pump, Loop Filter, Voltage Controlled Oscillator, Frequency Divider, Conversion and Buffering; frequency labels: 40 MHz and 640 MHz.]
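The two frequency labels in the diagram suggest the divider ratio of the loop. The sketch below assumes the 640 MHz label is the VCO frequency and the 40 MHz label is the reference, so that the feedback divider brings the VCO back to the reference for the phase comparison. The additional division by 4 to reach a 160 MHz output is an assumption based on the 160 MHz / 160 Mb/s figures elsewhere in the talk, not something the diagram states.

```python
REFERENCE_MHZ = 40      # assumed to be the PLL input (reference) clock
VCO_MHZ = 640           # assumed to be the VCO frequency in lock

def divider_ratio(fast_mhz, slow_mhz):
    """Integer division ratio between two clocks, or an error if not integer."""
    ratio, remainder = divmod(fast_mhz, slow_mhz)
    if remainder:
        raise ValueError("clocks are not related by an integer ratio")
    return ratio

feedback_divider = divider_ratio(VCO_MHZ, REFERENCE_MHZ)
output_divider = divider_ratio(VCO_MHZ, 160)   # assumption: 160 MHz output clock
print(feedback_divider, output_divider)
```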

  26. Analog Readout Chain • In FE-I4_proto1 (FE-I4 prototype submitted spring 2008): • 2-stage architecture optimized for low power, low noise, fast rise time. → Additional gain, Cc/Cf2 ~ 6. → More flexibility on choice of Cf1. → Qcoll less dependent on Cdetect. → 2nd stage decoupled from leakage-related DC potential shift. • 12b configuration: → FDAC: tuning of feedback current. → TDAC: tuning of discriminator threshold. → Local charge injection circuitry. [Layout, 50 μm × 145 μm: Preamp, Amp2, discriminator, FDAC, TDAC, Config Logic.]

  27. Irradiation in 2008 • Sept., Los Alamos, 800MeV p+ → FE-I4-proto1 FE, #1 (50Mrad) & #2 (100Mrad). • Oct., CERN, 20GeV p+ → SEU test chip + LVDS test chip (used for interface; received a low parasitic dose). • Dec., Los Alamos, 800MeV p+ → FE-I4-proto1 chips #2 (an additional 100Mrad) and #3 (200Mrad); LVDS chips #1, #2 and #3, #4. [Photo: setup along the beam line, showing laser, LVDS RxTx, FE-I4-proto1 and beam stop.]

  28. SEU-hardened latch • CPPM has studied the influence of various layouts of a DICE latch (Calin et al., IEEE Trans. Nucl. Sci., vol. 43, no. 6, 1996) on the SEU cross-section. • Physical separation of sensitive node pairs. Latch5.1 and latch5.2; area: 12μm × 4μm = 48 μm²; nMOS separation: 7μm; pMOS separation: 3μm. • X-section [cm²/bit]: standard latch: ~5×10⁻¹⁴; DICE w. improved layout: ~3×10⁻¹⁶. • Triple Redundant Logic with interleaved layout: X-section < 1×10⁻¹⁷. [Layout sketch: interleaved cells 1.a 2.a 3.a / 1.b 2.b 3.b.]

  29. LVDS transceiver, IBM 130nm • For IBL and outer layers at sLHC, need for a 320Mb/s BW LVDS I/O. • LVDS transceiver IC irradiated up to ~180Mrad. No degradation observed. • Tests with differential probe and 100 Ω on-board termination @ 1.2V supply. [Scope plots: TX output; chained RxTx output @ 320MHz clock; grid of clock rates (320MHz, 160MHz, 40MHz) vs. common mode voltages (1050mV, 600mV, 150mV). Chip photo with dimension labels 1.8mm and 0.8mm.]

  30. Output Stage: PLL & 8b10b (I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force) • Compatibility w. current BOC / ROD. • Clock multiplier from the 40MHz input clock: • Classic PLL design: Phase Freq. Detector, Loop Filter, Voltage Controlled Oscill., Freq. Divider. • Phase Frequency Detector w. Upset Detection Unit. • Settling in 1.2μs; fast recovery from SEU in divider & Vctrl. • 8b10b: • higher frequency clk & data → recover clk from data. • balanced coding for Clock Data Recovery in BOC / ROD. • some nice features (error detection, frame alignment). • Both blocks by-passable for maximum flexibility.

  31. Out-Stage: tri-state pseudo-LVDS • MUXing FE output for outer layers. • M3-M6 steered by tri-state logic block → all switches can be left open → high-Z. • Tri-state LVDS submitted. Testing is starting. [Schematic: tri-state logic.]

  32. Others • Note: • Low power comparator. • Failsafe mechanism of LVDS receiver. • Pad-frame. • LDO with new 0-cell. • ShuLDO. • Vin = 1.6V. • Vout = 1.2V-1.5V. • A zero is introduced in the open-loop transfer function by a frequency-dependent voltage-controlled current source. • Less peaking of Vout in comparison with compensation by R_ESR of the output capacitor Cout. (Talk: M. Karagounis, ID Powering, Tue. 24th 2009)

  33. MC events • Events (Pythia generator): WH (120GeV); H→bb̄, overlaid with 24 / 75 / 240 / 400 events pileup: “LHC” / “3×LHC” / “sLHC” (25ns / 50ns bx). • Sensor: un-irradiated planar sensor, 260μm width. Note: 3D simulation in progress. • Geometry (Geant3 simulation package): • pixel size: FE-I3: 400×50 μm²; FE-I4: 250×50 μm². • first: 4 barrels, 3.7cm (FE-I4) & 5.05/8.85/12.25cm radius (FE-I3). • new: 6 barrels, 3.7/5.05/8.85/12.25/16/21cm radius (FE-I4). • Threshold: first → 3750e⁻; new → down to 1000e⁻. (V. Kostyukhin, 3D Si, Mon. 23rd 2009)

  34. Foreword: Minimum Bias events • FE-I4 for: • b-layer upgrade: luminosity? radius? → 75 ev. pile-up & 3.7cm. • sLHC: lumi.? radius? → 240/400 ev. pile-up & outer layer. • Extrapolation to LHC energy: → extrapolation @ 14TeV: uncertainty ~30%? (first years of operation crucial to feed back into the simulation). [Plots: <pt of charged particles> at η=0; <# charged particles> / interaction.]

  35. 3×LHC / b-layer replacement. Rates given in [pixel hits/bx/cm²], one value per module along z. Legend: FE-I3, 50μm×400μm; FE-I4 simul., 50μm×250μm. [Plot: r [mm] vs. z [mm] (0 to 600), with η lines from 0.1 to 3.5.]
       r [mm]   rates along z
       122.5     1.41   1.24   1.26   1.26   1.37   1.34   1.33
        88.5     2.55   2.56   2.54   2.55   2.64   2.65   2.64
        50.5     6.30   6.46   6.03   5.85   5.91   6.46   6.11
        37      12.10  11.53  12.01  11.85  11.72  12.11   8.02

  36. 10×LHC (25ns bx) / sLHC. Rates given in [pixel hits/bx/cm²]. Mean rates per layer, from outermost to innermost: 2.3, 4.7, 7.8, 19.5, 35. Per-module rates along z for the innermost layer (r = 37mm): 36.89 35.76 35.97 36.46 35.94 33.26 23.23. [Plot: r [mm] vs. z [mm] (0 to 600; extra z labels 324, 524), η lines from 0 to 3.5, radius labels 37, 50.5, 70, 88.5, 122.5, 131, 150, 201, 210. Legend: FE-I4, 50μm×250μm; FE-I4 simul., 50μm×250μm; FE-I4 Nigel, 50μm×250μm; FE-I4 sdtf 220908, 50×250 μm².]

  37. 10×LHC (50ns bx) / sLHC. Rates given in [pixel hits/bx/cm²]. Mean rates per layer, from outermost to innermost: 3.9, 8.4, 13.4, 34, 60. Per-module rates along z for the innermost layer (r = 37mm): 61.18 58.74 60.02 60.12 59.15 55.10 38.67. [Plot: r [mm] vs. z [mm] (0 to 600; extra z labels 324, 524), η lines from 0 to 3.5, radius labels 37, 50.5, 70, 88.5, 122.5, 131, 150, 201, 210. Legend: FE-I4, 50μm×250μm; FE-I4 simul., 50μm×250μm; FE-I4 Nigel, 50μm×250μm; FE-I4 sdtf 220908, 50×250 μm².]

  38. Extrapolations to other radii. [Two plots: hits/mm² vs. r [cm].] • sLHC, 50ns bx / 400 events pileup: reasonable fit with exp(1.34-0.57*R)+0.15-0.0053*R. • sLHC, 25ns bx / 240 events pileup: reasonable fit with exp(0.86-0.58*R)+0.088-0.0031*R.
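The two fit expressions can be evaluated directly to extrapolate the hit density to any radius. In the sketch below the pairing of each fit with a bunch-crossing scenario is an assumption: the first expression is taken as the 50 ns / 400 pileup case and the second as the 25 ns / 240 pileup case, because at R = 3.7 cm they give about 59 and 35 hits/cm², close to the mean rates of 60 and 35 quoted for the innermost layer on the two preceding slides. Function names are hypothetical.

```python
from math import exp

def hits_per_mm2_50ns(r_cm):
    """Fit assumed to describe sLHC with 50 ns bx / 400 pileup events."""
    return exp(1.34 - 0.57 * r_cm) + 0.15 - 0.0053 * r_cm

def hits_per_mm2_25ns(r_cm):
    """Fit assumed to describe sLHC with 25 ns bx / 240 pileup events."""
    return exp(0.86 - 0.58 * r_cm) + 0.088 - 0.0031 * r_cm

# Radii of the six-barrel layout, in cm; results converted to hits/cm^2.
for r_cm in (3.7, 5.05, 8.85, 12.25, 16.0, 21.0):
    print(r_cm,
          round(100 * hits_per_mm2_50ns(r_cm), 1),
          round(100 * hits_per_mm2_25ns(r_cm), 1))
```

The fits are only described as "reasonable" on the slide, so values far outside the simulated radii should be treated with caution.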

  39. Pixel occupancy → Data bandwidth • Pixel hit rate → FE output bandwidth: • # bits / pixel transmitted? • address 7+9 bits, analog info 4+2 bits → 22b? • data output protocol? • Reduce data output by taking into account the clustered nature of real physics hits. [Plots: number of pixels, for FE-I4 central modules in the 3.7cm layer and the 21cm layer, at 3xLHC and 10xLHC.]

  40. Pixel occupancy → Data bandwidth (preliminary assumption: 100kHz L1T, 336×80 pixels FE-I4) • Example 3: clustered data out with fixed format. • Compression factor (all at 3×LHC), 3.7cm (vs. 21cm), η=0. Rates in 10⁶ counts/FE/s, multiplied by bits per word, normalized to individual pixels:
       format            rate          × bits         = relative BW [A.U.]
       indiv. pixels     4.09 (0.25)   × (7+9+4+2)    = 1.00 (1.00)
       static 1×2        3.45 (0.18)   × (7+8+2×4+2)  = 0.96 (0.83)
       dynamic 1×2       3.02 (0.15)   × (7+9+2×4+2)  = 0.87 (0.74)
       static 1×4        2.86 (0.17)   × (6+8+4×4+4)  = 1.08 (1.08)
       dyn. in-DC 1×4    2.43 (0.15)   × (6+9+4×4+4)  = 0.95 (0.95)
       dynamic 1×4       2.13 (0.14)   × (7+9+4×4+4)  = 0.85 (0.94)
     Dyn. 1×4 better at small R? (larger η!) Dyn. 1×2 at large R? Disclaimer: no header, trailer, DC-balancing, error correction… For reference in backup slides: same at higher η. [Word format diagram labels: column, DC (×40), row (×336), NL, ToT.]
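The relative bandwidth figures on this slide are the product of word rate and word length, normalized to the individual-pixel format. The sketch below recomputes the 3.7 cm column from the quoted rates and bit counts. Reading the rates as 10⁶ words per FE per second is an assumption taken from the unit label on the slide, and the variable and function names are hypothetical.

```python
# (format name, word rate at 3.7 cm in 1e6 words/FE/s, bits per word)
FORMATS = [
    ("indiv. pixels",  4.09, 7 + 9 + 4 + 2),
    ("static 1x2",     3.45, 7 + 8 + 2 * 4 + 2),
    ("dynamic 1x2",    3.02, 7 + 9 + 2 * 4 + 2),
    ("static 1x4",     2.86, 6 + 8 + 4 * 4 + 4),
    ("dyn. in-DC 1x4", 2.43, 6 + 9 + 4 * 4 + 4),
    ("dynamic 1x4",    2.13, 7 + 9 + 4 * 4 + 4),
]

def relative_bandwidth(formats):
    """Rate x bits for each format, normalized to the first entry."""
    reference = formats[0][1] * formats[0][2]
    return {name: rate * bits / reference for name, rate, bits in formats}

relative = relative_bandwidth(FORMATS)
for name, value in relative.items():
    print(f"{name:15s} {value:.2f}")

# Absolute rate of the reference format, in Mb/s, under the unit assumption.
reference_mbps = FORMATS[0][1] * FORMATS[0][2]
print(round(reference_mbps, 2))
```

Under the unit assumption the individual-pixel format needs roughly 90 Mb/s before any header, trailer or line coding, which is within the 160 Mb/s output bandwidth discussed in the talk.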

  41. Pixel occupancy → Data bandwidth (preliminary assumption: 100kHz L1T, 336×80 pixels FE-I4) • Example 3: clustered data out with fixed format. • Compression factor (all at 3×LHC), 3.7cm mod.4 (vs. 21cm mod.6). Rates in 10⁶ counts/FE/s, multiplied by bits per word, normalized to individual pixels:
       format            rate          × bits         = relative BW [A.U.]
       indiv. pixels     3.96 (0.26)   × (7+9+4+2)    = 1.00 (1.00)
       static 1×2        3.38 (0.20)   × (7+8+2×4+2)  = 0.97 (0.87)
       dynamic 1×2       3.05 (0.18)   × (7+9+2×4+2)  = 0.91 (0.79)
       static 1×4        2.28 (0.17)   × (6+8+4×4+4)  = 0.89 (1.01)
       dyn. in-DC 1×4    2.00 (0.15)   × (6+9+4×4+4)  = 0.80 (0.91)
       dynamic 1×4       1.88 (0.14)   × (7+9+4×4+4)  = 0.78 (0.85)
     Dyn. 1×4 better at small R? (larger η!) Dyn. 1×2 at large R? Disclaimer: no header, trailer, DC-balancing, error correction… [Word format diagram labels: column, DC (×40), row (×336), NL, ToT.]

  42. Data BW for IBL @ 3.7cm

  43. Data BW for sLHC