Interconnect Optimizations

karan

Presentation Transcript


  1. Interconnect Optimizations

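  2. A scaling primer
     [Figure: transistor (G, S, D) and wire cross-sections labeled with width w, spacing S, height h, and length l, plus the scaled dimensions ws, Ss, hs, and ls]
     • Ideal process scaling:
       • Device geometries shrink by s (= 0.7x)
       • Device delay shrinks by s
       • Wire geometries shrink by s
       • R/µm: $\rho/(ws \cdot hs) = r/s^2$
       • Cc/µm: $(hs)\,\varepsilon/(Ss) = C_c$
       • C/µm: similar
     • R/µm doubles; C/µm and Cc/µm are unchanged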

  3. Interconnect role
     • Short (local) interconnect
       • Used to connect nearby cells
       • Minimize wire C, i.e., use short min-width wires
     • Medium to long-distance (global) interconnect
       • Size wires to trade off area vs. delay
       • Increasing width → capacitance increases, resistance decreases
       • Need to find an acceptable tradeoff: the wire sizing problem
     • “Fat” wires
       • Thicker cross-sections in higher metal layers
       • Useful for reducing delays for global wires
       • Inductance issues, sharing of a limited resource

  4. Cross-Section of A Chip

  5. Block scaling
     • Block area often stays the same
       • # cells and # nets double
       • Wiring histogram shape is invariant
     • Global interconnect lengths don’t shrink
     • Local interconnect lengths shrink by s

  6. Interconnect delay scaling
     • Delay of a wire of length l: $t_{int} = (rl)(cl) = rcl^2$ (first order)
     • Local interconnects: $t_{int} = (r/s^2)(c)(ls)^2 = rcl^2$
       • Local interconnect delay is unchanged (compare to faster devices)
     • Global interconnects: $t_{int} = (r/s^2)(c)(l)^2 = rcl^2/s^2$
       • Global interconnect delay doubles: unsustainable!
     • Interconnect delay is increasingly dominant
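
The scaling argument on this slide can be checked numerically. The sketch below assumes normalized values (r = c = l = 1) and s = 0.7; the function names are illustrative and do not come from the slides.

```python
def wire_delay(r, c, length):
    """First-order distributed RC wire delay: (r*l)*(c*l) = r*c*l^2."""
    return r * c * length ** 2


def scaled_delays(r, c, length, s=0.7):
    """Return (local, global) wire delay after one ideal scaling step.

    Resistance per unit length grows to r/s^2, capacitance per unit length
    is unchanged, local wires shrink to s*length, global wires keep their length.
    """
    r_scaled = r / s ** 2
    local_delay = wire_delay(r_scaled, c, s * length)
    global_delay = wire_delay(r_scaled, c, length)
    return local_delay, global_delay


base = wire_delay(1.0, 1.0, 1.0)
local_delay, global_delay = scaled_delays(1.0, 1.0, 1.0)
print(local_delay / base)   # ratio close to 1: local delay unchanged
print(global_delay / base)  # ratio 1/s^2, roughly 2 for s = 0.7
```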

  7. Buffer Insertion For Delay Reduction

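  8. Analysis of Simple RC Circuit
     [Figure: RC circuit with source vT(t) (input waveform), series resistor R carrying current i(t), and capacitor C whose voltage v(t) is the state variable]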

  9. Analysis of Simple RC Circuit
     • Step input: $v_0 u(t)$
     • Match the initial state
     • Output response for step input: $v(t) = v_0(1 - e^{-t/RC})\,u(t)$

  10. Delays of Simple RC Circuit
     • $v(t) = v_0(1 - e^{-t/RC})$: waveform under step input $v_0 u(t)$
     • $v(t) = 0.5v_0$ → $t = 0.69RC$
       • i.e., delay = 0.69RC (50% delay)
     • $v(t) = 0.1v_0$ → $t = 0.1RC$; $v(t) = 0.9v_0$ → $t = 2.3RC$
       • i.e., rise time = 2.2RC (if defined as the time from 10% to 90% of Vdd)
     • Commonly used metric: $T_D = RC$ (= Elmore delay)
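
The threshold times quoted on this slide follow from inverting the step response, t = -RC ln(1 - fraction). A short sketch, with RC normalized to 1 and an illustrative function name:

```python
import math


def time_to_fraction(fraction, rc=1.0):
    """Time for v(t) = v0*(1 - exp(-t/RC)) to reach fraction*v0."""
    return -rc * math.log(1.0 - fraction)


t50 = time_to_fraction(0.5)   # 50% delay, about 0.69*RC
t10 = time_to_fraction(0.1)   # about 0.1*RC
t90 = time_to_fraction(0.9)   # about 2.3*RC
rise_time = t90 - t10         # 10%-to-90% rise time, about 2.2*RC
print(t50, t10, t90, rise_time)
```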

  11. Elmore Delay

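  12. Elmore Delay
     [Figure: buffer B with output resistance R(B) driving node n1 (capacitance C1), then wire resistance R(w) to node n2 (capacitance C2)]
     • Driver is modeled as R
     • Driver intrinsic gate delay t(B)
     • Delay = $\sum_{\text{all } R_i} \; \sum_{\text{all } C_j \text{ downstream from } R_i} R_i C_j$
     • Elmore delay at n1: $R(B)(C_1 + C_2)$
     • Elmore delay at n2: $R(B)(C_1 + C_2) + R(w)\,C_2$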
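
The double sum above can be sketched as a walk from a node back to the root of an RC tree, adding (edge resistance) x (downstream capacitance) at each step. The dictionary-based tree encoding and the numeric values are assumptions made for illustration; the intrinsic delay t(B) is left out.

```python
def elmore_delay(parent, resistance, cap, node):
    """Elmore delay from the root to `node` in an RC tree.

    parent[n]     -> parent of n (the root maps to None)
    resistance[n] -> resistance of the edge entering n
    cap[n]        -> capacitance lumped at n
    """
    def downstream_cap(n):
        # Capacitance at n plus everything below it in the tree.
        return cap[n] + sum(downstream_cap(k) for k, p in parent.items() if p == n)

    delay = 0.0
    while parent[node] is not None:
        delay += resistance[node] * downstream_cap(node)
        node = parent[node]
    return delay


# The two-node example from the slide: src --R(B)--> n1 --R(w)--> n2
R_B, R_w, C1, C2 = 2.0, 3.0, 4.0, 5.0   # arbitrary illustrative values
parent = {"src": None, "n1": "src", "n2": "n1"}
resistance = {"n1": R_B, "n2": R_w}
cap = {"src": 0.0, "n1": C1, "n2": C2}

delay_n1 = elmore_delay(parent, resistance, cap, "n1")
delay_n2 = elmore_delay(parent, resistance, cap, "n2")
```

With these values, `delay_n1` equals R(B)(C1 + C2) and `delay_n2` adds R(w)C2 on top, matching the two expressions on the slide.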

  13. Elmore Delay
     [Figure: uniform wire of length x with unit wire resistance r and unit wire capacitance c, driving load C]
     • For a uniform wire
     • No matter how the wire is lumped, the Elmore delay is the same
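
The lumping claim can be checked with a small sketch. It assumes each segment is a pi model (half of the segment capacitance at each end), which is an assumption about the figure; under that model the result equals rx(cx/2 + C) for any number of segments. Names and values are illustrative.

```python
def lumped_wire_delay(r, c, x, load, segments):
    """Elmore delay of a wire split into equal pi-model segments.

    Each segment has resistance r*x/segments and places half of its
    capacitance c*x/segments at each end; `load` sits at the far end.
    """
    seg_r = r * x / segments
    seg_c = c * x / segments
    delay = 0.0
    for i in range(segments):
        # Downstream of resistor i: its far-end half cap, the near-end half
        # caps and full caps of all later segments, and the load.
        downstream = seg_c / 2 + (segments - 1 - i) * seg_c + load
        delay += seg_r * downstream
    return delay


r, c, x, load = 0.5, 0.2, 10.0, 3.0
closed_form = r * x * (c * x / 2 + load)
delays = [lumped_wire_delay(r, c, x, load, n) for n in (1, 2, 10, 100)]
```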

  14. Delay for Buffer
     [Figure: buffer between nodes u and v with input capacitance C(b), driving load C; labels: driver resistance, input capacitance, intrinsic buffer delay]

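  15. Buffers Reduce Wire Delay
     [Figure: wire of length x between driver resistance R and load C, split by a buffer into two halves of length x/2, each with resistance rx/2 and capacitance cx/4 at each end; the delay plot marks ∆t versus x]
     • $t_{unbuf} = R(cx + C) + rx(cx/2 + C)$
     • $t_{buf} = 2R(cx/2 + C) + rx(cx/4 + C) + t_b$
     • $t_{buf} - t_{unbuf} = RC + t_b - rcx^2/4$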
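
The three expressions above can be evaluated directly. As the formulas imply, the sketch assumes the buffer has the same output resistance R and input capacitance C as the driver and load; the parameter values are arbitrary.

```python
def unbuffered_delay(R, C, r, c, x):
    """Elmore delay of driver R, wire of length x, load C."""
    return R * (c * x + C) + r * x * (c * x / 2 + C)


def buffered_delay(R, C, r, c, x, tb):
    """Same wire with one buffer (resistance R, input cap C, delay tb) at x/2."""
    return 2 * R * (c * x / 2 + C) + r * x * (c * x / 4 + C) + tb


def buffer_helps(R, C, r, c, x, tb):
    """The buffer pays off once r*c*x^2/4 exceeds R*C + tb."""
    return r * c * x ** 2 / 4 > R * C + tb


R = C = r = c = tb = 1.0
x = 4.0
t_unbuf = unbuffered_delay(R, C, r, c, x)
t_buf = buffered_delay(R, C, r, c, x, tb)
print(t_unbuf, t_buf, t_buf - t_unbuf)
```

Because the benefit term grows as x squared, a buffer only helps on wires longer than a break-even length; with these unit parameters `buffer_helps` is true for x = 4 and false for x = 2.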

  16. Combinational Logic Delay
     [Figure: combinational logic between two clocked registers; labels: primary input, primary output, clock]
     • Combinational logic delay <= clock period

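  17. Buffered global interconnects: Intuition
     [Figure: wire of length l divided by buffers into segments l1, l2, l3, ..., ln]
     • Interconnect delay = $r c\, l^2$
     • Now, interconnect delay = $\sum_i r c\, l_i^2 < r c\, l^2$ (where $l = \sum_j l_j$), since $\sum_j l_j^2 < \left(\sum_j l_j\right)^2$
     • (Of course, account for buffer delay also)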

  18. Optimal inter-buffer length
     [Figure: line of total length L driven through a chain of buffers spaced l apart]
     • Rd: on-resistance of inverter
     • Cg: gate input capacitance
     • r, c: resistance and capacitance per micron
     • First-order (lumped parasitic, Elmore delay) analysis
     • Assume N identical buffers with equal inter-buffer length l (L = Nl)
     • For minimum delay, use the optimal inter-buffer length $l_{opt}$
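
The closed-form result for this slide is not preserved in the transcript. Under the stated first-order model, one standard derivation is sketched below. It assumes each stage is a driver of resistance $R_d$, a wire of length $l$ modeled as a single pi segment, and a load $C_g$, and it ignores buffer intrinsic delay; the slide's own expression may include additional terms.

```latex
\begin{aligned}
t_{\text{stage}}(l) &= R_d\,(c\,l + C_g) + r\,l\left(\tfrac{c\,l}{2} + C_g\right) \\
T(l) &= \frac{L}{l}\,t_{\text{stage}}(l)
      = L\left(\frac{R_d C_g}{l} + R_d\,c + \frac{r c\,l}{2} + r\,C_g\right) \\
\frac{dT}{dl} &= L\left(-\frac{R_d C_g}{l^2} + \frac{r c}{2}\right) = 0
  \;\Longrightarrow\; l_{opt} = \sqrt{\frac{2 R_d C_g}{r c}} \\
T(l_{opt}) &= L\left(R_d\,c + r\,C_g + \sqrt{2 R_d C_g\, r c}\right)
\end{aligned}
```

Every term in the last line is proportional to $L$, so under these assumptions the buffered delay is linear in the line length.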

  19. Optimal interconnect delay
     • Substituting $l_{opt}$ back into the interconnect delay expression shows that delay grows linearly with L (instead of quadratically)
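
The linear-versus-quadratic behavior can be illustrated numerically. This sketch assumes the stage model Rd(cl + Cg) + rl(cl/2 + Cg) with no intrinsic buffer delay, uses l_opt = sqrt(2 Rd Cg / (rc)) derived under that assumption, and treats the stage count L/l as a real number; none of these choices are confirmed by the transcript.

```python
import math


def optimal_segment_length(Rd, Cg, r, c):
    """Inter-buffer length minimizing delay under the assumed stage model."""
    return math.sqrt(2 * Rd * Cg / (r * c))


def buffered_line_delay(L, seg, Rd, Cg, r, c):
    """Elmore delay of a length-L line split into L/seg identical stages."""
    stage = Rd * (c * seg + Cg) + r * seg * (c * seg / 2 + Cg)
    return (L / seg) * stage


def unbuffered_line_delay(L, Rd, Cg, r, c):
    """Elmore delay of the same line with no buffers."""
    return Rd * (c * L + Cg) + r * L * (c * L / 2 + Cg)


Rd = Cg = r = c = 1.0
l_opt = optimal_segment_length(Rd, Cg, r, c)
for L in (10.0, 20.0, 40.0):
    print(L,
          buffered_line_delay(L, l_opt, Rd, Cg, r, c),
          unbuffered_line_delay(L, Rd, Cg, r, c))
```

Doubling L doubles the buffered delay, while the unbuffered delay grows roughly fourfold once the quadratic term dominates.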

  20. Total buffer count
     [Figure: chart of % cells used to buffer nets (0 to 80) at the 90nm, 65nm, 45nm, and 32nm nodes; series: clk-buf, buf, tot-buf]
     • Ever-increasing fractions of total cell count will be buffers
     • 70% in 32nm

  21. ITRS projections
     [Figure: relative delay (log scale, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm); curves: gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, global interconnect without repeaters. Source: ITRS, 2003]

  22. Buffers Improve Slack
     • RAT = Required Arrival Time
     • Slack = RAT - Delay
     • Without buffer: slackmin = -50
       • RAT = 300, Delay = 350, Slack = -50
       • RAT = 700, Delay = 600, Slack = 100
     • With buffer (decouples capacitive load from the critical path): slackmin = 50
       • RAT = 300, Delay = 250, Slack = 50
       • RAT = 700, Delay = 400, Slack = 300

  23. Timing Driven Buffering Problem Formulation
     • Given
       • A Steiner tree
       • RAT at each sink
       • A buffer type
       • RC parameters
       • Candidate buffer locations
     • Find a buffer insertion solution such that the slack at the driver is maximized

  24. Candidate Buffering Solutions

  25. Candidate Solution Characteristics
     • Each candidate solution is associated with
       • vi: a node
       • ci: downstream capacitance
       • qi: RAT
     • If vi is a sink, ci is the sink capacitance
     • v is an internal node

  26. Van Ginneken’s Algorithm
     • Candidate solutions are propagated toward the source
     • Dynamic programming

  27. Solution Propagation: Add Wire
     [Figure: wire of length x from (v1, c1, q1) up to (v2, c2, q2)]
     • $c_2 = c_1 + cx$
     • $q_2 = q_1 - rcx^2/2 - rxc_1$
     • r: wire resistance per unit length
     • c: wire capacitance per unit length

  28. Solution Propagation: Insert Buffer
     [Figure: buffer inserted at v1, turning (v1, c1, q1) into (v1, c1b, q1b)]
     • $c_{1b} = C_b$
     • $q_{1b} = q_1 - R_b c_1 - t_b$
     • Cb: buffer input capacitance
     • Rb: buffer output resistance
     • tb: buffer intrinsic delay

  29. Solution Propagation: Merge
     [Figure: left branch (v, cl, ql) and right branch (v, cr, qr) meeting at node v]
     • $c_{merge} = c_l + c_r$
     • $q_{merge} = \min(q_l, q_r)$

  30. Solution Propagation: Add Driver
     [Figure: driver added at v0, turning (v0, c0, q0) into (v0, c0d, q0d)]
     • $q_{0d} = q_0 - R_d c_0 = \text{slack}_{min}$
     • Rd: driver resistance
     • Pick the solution with max slackmin

  31. Example of Solution Propagation
     • r = 1, c = 1
     • Rb = 1, Cb = 1, tb = 1
     • Rd = 1
     • Start at sink v1: (v1, 1, 20)
     • Add wire (length 2): (v2, 3, 16)
     • Insert buffer at v2: (v2, 1, 12)
     • Add wire (length 2) to each candidate: (v3, 5, 8) unbuffered, (v3, 3, 8) buffered
     • Add driver: slack = 3 unbuffered, slack = 5 buffered
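
The numbers in this example can be reproduced with the add-wire, insert-buffer, and add-driver rules from the preceding slides. In the sketch below candidates are stored as (c, q) pairs with the node left implicit, and both wires are taken to have length 2; the function names are illustrative.

```python
def add_wire(cand, x, r, c):
    """Propagate (c1, q1) across a wire of length x: more load, less RAT."""
    cap, q = cand
    return (cap + c * x, q - r * c * x ** 2 / 2 - r * x * cap)


def insert_buffer(cand, Rb, Cb, tb):
    """A buffer hides the downstream load behind its input capacitance Cb."""
    cap, q = cand
    return (Cb, q - Rb * cap - tb)


def add_driver(cand, Rd):
    """Slack at the driver for a candidate that reached the source."""
    cap, q = cand
    return q - Rd * cap


r = c = 1
Rb = Cb = tb = 1
Rd = 1

at_v2 = add_wire((1, 20), 2, r, c)                # (3, 16)
at_v2_buffered = insert_buffer(at_v2, Rb, Cb, tb)  # (1, 12)
at_v3 = add_wire(at_v2, 2, r, c)                   # (5, 8)
at_v3_buffered = add_wire(at_v2_buffered, 2, r, c)  # (3, 8)
slack_unbuffered = add_driver(at_v3, Rd)           # 3
slack_buffered = add_driver(at_v3_buffered, Rd)    # 5
```

The buffered candidate wins at the driver because its smaller load costs less across the driver resistance, even though both candidates arrive at v3 with the same q.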

  32. Example of Merging
     [Figure: left candidates, right candidates, and the resulting merged candidates]

  33. Solution Pruning
     • Two candidate solutions
       • (v, c1, q1)
       • (v, c2, q2)
     • Solution 1 is inferior if
       • c1 > c2: larger load
       • and q1 < q2: tighter timing
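
A direct, quadratic-time sketch of this dominance test follows. Candidates at one node are (c, q) pairs. Ties are treated as inferior too (equal load with worse timing, or equal timing with larger load); that is an assumption beyond the strict inequalities on the slide, chosen so that the deck's later example, where (5, 0) is inferior to (5, 70), is covered.

```python
def is_inferior(a, b):
    """True if candidate a = (c, q) is dominated by candidate b."""
    return a != b and a[0] >= b[0] and a[1] <= b[1]


def prune(cands):
    """Keep only candidates that no other candidate dominates (O(n^2))."""
    return [a for a in cands if not any(is_inferior(a, b) for b in cands)]


# Candidate values taken from the Van Ginneken example later in the deck.
survivors = prune([(45, 50), (5, 0), (20, 100), (5, 70)])
print(survivors)
```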

  34. Pruning When Inserting a Buffer
     • All buffered candidates have the same load capacitance Cb, so only the one with max q is kept

  35. Generating Candidates
     [Figure: candidate generation, steps (1) to (3). From Dr. Charles Alpert]

  36. Pruning Candidates
     [Figure: steps (3) and (4), with candidates (a) and (b)]
     • Both (a) and (b) “look” the same to the source. Throw out the one with the worst slack

  37. Candidate Example Continued
     [Figure: steps (4) and (5)]

  38. Candidate Example Continued
     [Figure: step (5), after pruning]
     • At the driver, compute which candidate maximizes slack. The result is optimal.

  39. Merging Branches
     [Figure: left candidates and right candidates]

  40. Pruning Merged Branches
     [Figure: merged branches with pruning; the critical branch is marked]

  41. Van Ginneken Example
     • Start at the sink: (20, 400)
     • Wire (C = 10, d = 150): (30, 250)
     • Buffer (C = 5, d = 30): (5, 220)
     • Wire (C = 15): (30, 250) → (45, 50) with d = 200; (5, 220) → (20, 100) with d = 120
     • Buffer (C = 5): (45, 50) → (5, 0) with d = 50; (20, 100) → (5, 70) with d = 30

  42. Van Ginneken Example Cont’d
     • Candidates: (45, 50), (5, 0), (20, 100), (5, 70)
     • (5, 0) is inferior to (5, 70); (45, 50) is inferior to (20, 100)
     • Wire (C = 10): (20, 100) → (30, 10); (5, 70) → (15, -10)
     • Pick the solution with the largest slack, then follow the arrows to get the solution

  43. Basic Data Structure
     [Figure: list (c1, q1), (c2, q2), (c3, q3); load cap gets worse and timing gets better from left to right]
     • Sorted list such that c1 < c2 < c3
     • If there are no inferior candidates, q1 < q2 < q3

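  44. Prune Solution List
     [Figure: flowchart over the list (c1, q1), (c2, q2), (c3, q3), (c4, q4) in order of increasing c. Each test compares the last kept candidate with the next one, e.g. q1 < q2?. On Y both are kept and the scan advances (q2 < q3?); on N the later candidate is pruned (Prune 2) and the kept one is compared with the following candidate (q1 < q3?)]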
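
The scan in this flowchart can be sketched as a single pass over the candidates in order of increasing c, keeping a candidate only if its q beats the last one kept. The sort at the top is an added convenience for unsorted input or equal c values (best q first among ties); on an already sorted list only the linear scan matters. Names are illustrative.

```python
def prune_sorted(cands):
    """Remove inferior (c, q) candidates with one pass over a sorted list."""
    # Order by increasing load; among equal loads put the best timing first.
    ordered = sorted(cands, key=lambda cq: (cq[0], -cq[1]))
    kept = []
    for cap, q in ordered:
        if kept and q <= kept[-1][1]:
            continue  # no less load than the last kept one, and no better timing
        kept.append((cap, q))
    return kept


print(prune_sorted([(1, 10), (2, 8), (3, 12), (4, 11), (5, 15)]))
print(prune_sorted([(45, 50), (5, 0), (20, 100), (5, 70)]))
```

After pruning, both c and q increase strictly along the list, which is the invariant the sorted data structure relies on.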

  45. Pruning In Merging
     • Left candidates: (cl1, ql1), (cl2, ql2), (cl3, ql3)
     • Right candidates: (cr1, qr1), (cr2, qr2)
     • ql1 < ql2 < qr1 < ql3 < qr2
     • Merged candidates: (cl1+cr1, ql1), (cl2+cr1, ql2), (cl3+cr1, qr1), (cl3+cr2, ql3)
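
The merged list on this slide matches a two-pointer walk over the two pruned lists: emit the combination of the current left and right candidates, then advance whichever side limits the timing. The sketch below assumes both inputs are already pruned and sorted by increasing c (and therefore increasing q); the numeric values are made up to satisfy the ordering on the slide.

```python
def merge_branches(left, right):
    """Merge two pruned candidate lists without forming all pairs."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        (cl, ql), (cr, qr) = left[i], right[j]
        merged.append((cl + cr, min(ql, qr)))
        # Advance the branch whose q is the bottleneck; on a tie advance both,
        # since keeping either one would only add load without improving q.
        if ql <= qr:
            i += 1
        if qr <= ql:
            j += 1
    return merged


# ql1 < ql2 < qr1 < ql3 < qr2, as on the slide
left = [(1, 10), (2, 20), (3, 40)]
right = [(4, 30), (5, 50)]
print(merge_branches(left, right))
```

The output has at most len(left) + len(right) - 1 entries, which is why merging is additive rather than multiplicative in the number of candidates.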

  46. Van Ginneken Complexity
     • Generate candidates from sinks to source
     • Quadratic runtime
       • Adding a wire does not change #candidates
       • Adding a buffer adds only one new candidate
       • Merging branches is additive, not multiplicative
       • Linear-time solution list pruning
     • Optimal for the Elmore delay model

  47. Multiple Buffer Types
     • r = 1, c = 1
     • Rb1 = 1, Cb1 = 1, tb1 = 1
     • Rb2 = 0.5, Cb2 = 2, tb2 = 0.5
     • Rd = 1
     • Start at sink v1: (v1, 1, 20)
     • Add wire (length 2): (v2, 3, 16)
     • Insert buffer type 2 at v2: (v2, 2, 14)
     • Insert buffer type 1 at v2: (v2, 1, 12)
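
With several buffer types, the insert-buffer step yields one new candidate per type at each location. A minimal sketch that reproduces the numbers above, with buffer types given as (Rb, Cb, tb) tuples and illustrative names:

```python
def buffer_candidates(cand, buffer_types):
    """One buffered candidate (Cb, q - Rb*c - tb) per buffer type."""
    cap, q = cand
    return [(Cb, q - Rb * cap - tb) for (Rb, Cb, tb) in buffer_types]


buffer_types = [(1, 1, 1), (0.5, 2, 0.5)]   # (Rb, Cb, tb) for types 1 and 2
unbuffered = (3, 16)                        # candidate at v2 after the wire
candidates_at_v2 = [unbuffered] + buffer_candidates(unbuffered, buffer_types)
print(candidates_at_v2)
```

None of the three candidates at v2 dominates another (load and q both increase from (1, 12) to (2, 14) to (3, 16)), so all three are carried toward the source.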
