
ECE260B – CSE241A Winter 2005 Power Distribution


Presentation Transcript


  1. ECE260B – CSE241A Winter 2005: Power Distribution • Website: http://vlsicad.ucsd.edu/courses/ece260b-w05

  2. Motivation • Power supply noise is a serious issue in deep-submicron (DSM) design • Noise is getting worse as technology scales • Noise margins shrink as supply voltage scales down • Power supply noise may slow down circuit performance • Power supply noise may cause logic failures

  3. Power = … • Routing resources • 20-40% of all metal tracks used by Vcc, Vss • Increased power → denser power grid • Pins • Each Vcc or Vss pin carries 0.5-1 W of power • Pentium 4 uses 423 pins; 223 are Vcc or Vss • More pins → more expensive package (+ package development, motherboard redesign, …) • Battery cost • A 1 kg NiCad battery powers a Pentium 4 alone for less than 1 hour • Performance • High chip temperatures degrade circuit performance • Large across-chip temperature variations induce clock skew • High chip power limits the use of high-performance circuits • Power transients determine the minimum power supply voltage

  4. Power = Package [Figure: Pentium 4 package stack: fan, heat sink, integrated heat spreader, processor, decoupling capacitors, OLGA pins, interposer, package pins] • Pentium 4 die is about 1.5 g and less than 1 cm³ • Pentium 4 in package with interposer, heat sink, and fan can be 500 g and 150 cm³ • Modern processor packaging is complex and adds significantly to product cost • http://www.intel.com/support/processors/procid/ptype.htm Courtesy M. McDermott, UT-Austin

  5. Planning for Power • Early simulation of major power dissipation components • Early quantification of chip power • Total chip power • Maximum power density • Total chip power fluctuations • Inherent & added fluctuations due to clock gating • Early power distribution analysis (DC, AC, & multi-cycle) • i.e., average, maximum, multi-cycle fluctuations • Early allocation & coordination of chip resources • Wiring tracks for power grid • Low-Vt devices • Dynamic circuits • Clock gating • Placement and quantity of added decoupling capacitors

  6. Power and Ground Routing • Floorplanning includes planning how power, ground, and clock should be routed • Power supply distribution • Tree: trunk must supply current to all branches • Resistance must be very small, since when a gate switches, its current flows through the supply lines • If the resistance of the supply lines is too large, the voltage supplied to gates will drop, which can cause gates to malfunction • Usually, want at most 5-10% IR drop due to supply resistance • Supply routing usually sits on the top metal layers, then is distributed to lower wiring layers
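
To make the 5-10% IR-drop budget concrete, the sketch below sizes a single supply rail. It assumes (our simplification, not from the slide) a straight rail fed from one end with its total current drawn uniformly along its length, for which the worst drop, at the far end, is ρ□·(L/W)·I/2. The sheet resistance, length, and current are hypothetical numbers.

```python
def far_end_drop(rho_sq, length, width, current):
    """IR drop at the far end of a rail fed from one end, with the total current
    drawn uniformly along its length: V = rho_sq * (length / width) * current / 2.
    length/width is the number of squares, so both must use the same unit (um)."""
    return rho_sq * (length / width) * current / 2.0


def min_rail_width(rho_sq, length, current, vdd, max_drop_frac):
    """Smallest rail width that keeps the far-end drop within max_drop_frac * vdd."""
    max_drop = max_drop_frac * vdd
    return rho_sq * length * current / (2.0 * max_drop)


# Hypothetical numbers: 0.04 ohm/sq top metal, 2000 um rail, 100 mA, 1.0 V, 10% budget
w = min_rail_width(0.04, 2000.0, 0.1, 1.0, 0.10)
print(f"minimum width = {w:.1f} um, drop = {far_end_drop(0.04, 2000.0, w, 0.1):.3f} V")
# -> minimum width = 40.0 um, drop = 0.100 V
```

Halving the allowed drop doubles the required width, which is why tight IR budgets consume so many top-layer tracks.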

  7. Planar Power Distribution • Topology of VDD/VSS networks • Inter-digitated • Design each macrocell such that its VDD terminals and VSS terminals are on opposite sides • If the floorplan places all macrocells with VDD on the same side, then VDD and VSS never cross [Figure: inter-digitated VDD/VSS networks feeding macrocells A, B, C; cut line vs. no cut line] Courtesy K. Yang, UCLA

  8. Gridded Power Distribution • With more metal layers, power is striped • Connecting the stripes forms a power grid • Minimizes series resistance • Lower-layer layout/cells connect to the grid through vias • Note that planar supply routing is often still needed for a strong lower-layer connection • There may not be sufficient area to make a strong connection in the middle of a design (connect better at the periphery of the die) Courtesy K. Yang, UCLA

  9. Power Supply Drop/Noise • Supply noise = variations in power supply voltage that act as a noise source for logic gates • Power supply wiring resistance → voltage variations with current surges • Current surges depend on the dynamic behavior of the circuit • Solution approach • Measure the maximum current required by each block • Redesign the power/ground network to reduce resistance • Worst case: move activity to another clock cycle to reduce peak current → scheduling problem • Example: drive a 32-bit bus, where each bus wire has a total load of 2 pF, with delay 0.5 ns • R for each driver transistor must be < 0.25 kΩ to meet RC = 0.5 ns • Effective R of all 32 drivers switching together is 250/32 ≈ 7.8 Ω • For < 10% drop, power distribution R must be below roughly 1 Ω Courtesy K. Yang, UCLA
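
The bus example's arithmetic can be reproduced in a few lines. Modeling the 10% criterion as a resistive divider between the supply resistance and the 32 drivers in parallel is our assumption; it gives about 0.87 Ω, consistent with the slide's figure of about 1 Ω.

```python
def bus_supply_budget(c_wire, delay, n_bits, max_drop_frac):
    """Return (max driver R, effective R of all drivers, max supply R) for a bus."""
    r_driver = delay / c_wire              # RC = delay  ->  R = delay / C for each bit
    r_eff = r_driver / n_bits              # all bits switch together: drivers in parallel
    # Assumed model: supply resistance r_s in series with r_eff forms a divider, so
    # r_s / (r_s + r_eff) <= f   ->   r_s <= r_eff * f / (1 - f)
    r_supply_max = r_eff * max_drop_frac / (1.0 - max_drop_frac)
    return r_driver, r_eff, r_supply_max


r_drv, r_eff, r_sup = bus_supply_budget(c_wire=2e-12, delay=0.5e-9, n_bits=32, max_drop_frac=0.10)
print(f"driver R < {r_drv:.0f} ohm, effective R = {r_eff:.2f} ohm, supply R < {r_sup:.2f} ohm")
```

A cruder rule, "supply R below 10% of the effective driver R", gives about 0.78 Ω; either way the budget is under 1 Ω.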

  10. Electromigration • Physical migration of metal atoms due to “electron wind” can eventually create a break in a wire • MTTF (mean time to failure) ∝ 1/J², where J = current density • Current density must not exceed the specification: for each wire i, I_i/w_i < J_spec • Specified as mA per µm of wire width (e.g., 1 mA/µm) or mA per via cut • EM occurs both in signal wires (AC = bidirectional) and power wires (DC = unidirectional) • Much worse for DC than AC; DC occurs inside cells and in power buses • May need more contacts on transistor sources and drains to meet EM limits • Width of power buses must satisfy both IR and EM requirements • Issues in IR and EM constraint generation • Topology is most likely not a tree • How do we determine current patterns? • Effects of R, L
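
A minimal sketch of the EM checks listed above: the width bound from I/w < J_spec, the per-cut via count, and the MTTF ∝ 1/J² scaling. The current values, the 0.5 mA per-cut limit, and the function names are hypothetical; temperature, which also strongly affects MTTF, is held fixed.

```python
import math


def min_width_for_em(current_ma, j_spec_ma_per_um):
    """EM rule I/w < J_spec  ->  the wire must be wider than I / J_spec (um)."""
    return current_ma / j_spec_ma_per_um


def min_via_cuts(current_ma, i_per_cut_ma):
    """Via limits are per cut, so round up to a whole number of cuts."""
    return math.ceil(current_ma / i_per_cut_ma)


def mttf_ratio(j_new, j_old, exponent=2.0):
    """MTTF_new / MTTF_old when MTTF is proportional to 1 / J**exponent
    (exponent 2 as on the slide; temperature assumed unchanged)."""
    return (j_old / j_new) ** exponent


# Hypothetical DC power-bus segment: 12 mA, 1 mA/um wire limit, 0.5 mA per via cut
print(min_width_for_em(12.0, 1.0))   # -> 12.0 (um)
print(min_via_cuts(12.0, 0.5))       # -> 24 (cuts)
print(mttf_ratio(2.0, 1.0))          # -> 0.25: doubling J quarters the lifetime
```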

  11. What Happens? • Example of an AlCu line seen under a microscope • Accelerated by higher temperature and high currents • Voids form on grain boundaries • Metal atoms move with current away from voids and collect at boundaries [Figure caption: catastrophic failure] Courtesy K. Yang, UCLA

  12. Taken from http://www.nd.edu/~micro/fig20.html Taken from Sverre Sjøthun, “Electromigration In-Depth,” from www.dpwg.com Courtesy S. Sapatnekar, UMinn

  13. Power Supply Rules of Thumb • Rules depend on technology • Tech file has rules for resistance and electromigration • Examples: • Must have a contact for each 16λ of transistor width (more is better) • Wire must carry less than 1 mA/µm of width • Power/Gnd width = length of wire × Σ(widths of all transistors connected to wire) / 3×10⁶λ (very approximate) • For small designs, power supply design is a non-issue Courtesy K. Yang, UCLA
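
The contact and width rules of thumb translate directly into code. This sketch reads the slide's garbled width denominator as 3×10⁶ λ (an assumption) and interprets "all transistors connected to wire" as the sum of their widths in λ; the example numbers are hypothetical. The current-density rule is the same I/w check as in slide 10.

```python
import math

LAMBDA_PER_CONTACT = 16          # one contact per 16 lambda of transistor width
WIDTH_DIVISOR_LAMBDA = 3e6       # slide's denominator, read as 3 x 10^6 lambda (assumption)


def contacts_needed(transistor_width_lambda):
    """At least one contact per 16 lambda of transistor width (more is better)."""
    return math.ceil(transistor_width_lambda / LAMBDA_PER_CONTACT)


def rail_width_lambda(wire_length_lambda, transistor_widths_lambda):
    """Very approximate P/G width: length * sum(connected transistor widths) / 3e6 lambda."""
    return wire_length_lambda * sum(transistor_widths_lambda) / WIDTH_DIVISOR_LAMBDA


print(contacts_needed(40))                     # -> 3
print(rail_width_lambda(3000, [1000] * 100))   # -> 100.0
```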

  14. Basic Methodology Concepts • Reliability (slotting, splitting) • Alignment of hierarchical rings, stripes • Isolation of analog power • Styles of power distribution • Rings and trunks • Uniform grid • Bottom-up grid generation • Depends on: • Package: flip-chip vs. wire-bond; I/O count (fewer pads → denser grid) • Power budget • IR drop limits • Floorplan constraints (hard macros, etc.)

  15. Metal Slotting vs. Splitting • Slotting is required by metal layout rules for uniform CMP (planarization) • Split power wires • Easy connections through standard via arrays • Less data than traditional slotting • More accurate R/C analysis of power mesh • Not supported by all tools [Figure: split GND wires on M1 vs. a slotted GND wire on M1; the slotted wire is difficult to connect: where should vias go?] Courtesy Cadence Design Systems, Inc.

  16. Trunks and Rings Methodology • Each block has its own ring • Rings may be inside the blocks or part of the top level • Each block has trunks connecting the top level to the block • Rings may be shared with abutted blocks [Figure: blocks 1-5 with VDD (V) and GND (G) rings; individual trunks connect blocks to the top level] Courtesy Cadence Design Systems, Inc.

  17. Trunks and Rings • Advantages • Power tailored to the demands of each block (flexible) • More area efficient, since the demands of each block are uniquely met • Simple implementation supported by many tools • Rings can be shared by abutted blocks • Disadvantages • Limited redundancy; power grid built to match needs • Design assumptions may change or turn out to be invalid • Non-regular structure requires more detailed IR drop/EM analysis • Missing vias/connections are fatal • Rings will require slotting/splitting due to wide widths • Increase in data volume Courtesy Cadence Design Systems, Inc.

  18. Uniform Chip Grid Methodology • Robust and redundant power network • Mainly in microprocessors and high-end large ASICs • Implementation • Primary distribution through upper metal layers • Lower layers in blocks connect to the top through via stacks • Typically pushed into blocks • Blocks typically abut • Requires block grids to align • Rows/followpins should align with block pins • Global buffer insertion [Figure: global grid on higher layers; fine, custom, or no grid on lower layers; blocks 1-5 with VDD (V) and GND (G) connections] Courtesy Cadence Design Systems, Inc.

  19. Uniform Chip Grid • Advantages • Easily implemented • Lends itself to straightforward hand calculations • Path redundancy makes it less sensitive to changes in current pattern • Mesh of power/ground provides shielding (for capacitance) and current returns (for inductance) • Top-down propagation is easy to use with this style • Disadvantages • Takes up significant routing resources (20-40% of all routing tracks if not already reserved for power/ground) • Fine grids may slow down P&R tools • Imposes a grid structure on each block, which may be unnecessary • Top level and blocks are closely coupled if top-level routing is pushed into blocks • Changes to block or top must be reflected in the other Courtesy Cadence Design Systems, Inc.

  20. Bottom-Up Grid Generation Methodology • Design and optimize power grid for each block, merge at top • Advantages • Able to tailor grid for routing resource efficiency in each block • Flexibility to choose the best grid for the block (e.g., ring and stripe, power plane, grid) • Disadvantages • Designing the grid in the context of the “big picture” is more difficult • Block grid may present challenging connections to the top level • Assumptions for the block grid’s connection to the top level must be analyzed and validated Courtesy Cadence Design Systems, Inc.

  21. Power Routing in Area-Based P&R • Power routing approaches • (1) Pre-route parts of power grid during floorplanning • (2) Build grid (except connections to standard cells) before P&R • (3) Build entire grid before P&R • N.B.: Area-based P&R tools respect pre-routes absolutely • Cadence tools: power routing done inside SE, all other tasks (clock, place, route, scan, …) done by point tools • Lab 5 tomorrow has a tiny bit of power routing (rings, stripes) • Miscellany • ECOs: What happens to rings and trunks if blocks change size? • Layer choices: What is the cost of skipping layers (to get from thick top-layer metal down to finer layers)? • How wide should power wires be? • Post-processing strategies Courtesy Cadence Design Systems, Inc.

  22. Power Routing Wire Width Considerations • Slotting rules: choose the maximum width below the slotting width • Halation (width-dependent spacing) rules: do as much power routing as possible below the wide-wire width to save routing space • Choose power routing widths carefully to avoid blocking extra tracks (and use the space if blocking the track!) [Figure: candidate power wire widths and the routing tracks they block: “What is the better power width here?”] Courtesy Cadence Design Systems, Inc.
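
The track-blocking question can be checked with simple geometry. This sketch assumes a power wire centred on a routing track, a single minimum-spacing rule (it ignores the extra wide-wire spacing that halation rules add), and hypothetical dimensions in nm. In the example, every width from 101 to 500 nm blocks the same three tracks, so if three tracks are lost anyway, 500 nm uses them fully.

```python
def blocked_tracks(power_width, pitch, wire_width, spacing):
    """Tracks made unusable by a power wire centred on a routing track.

    A track n pitches away stays usable only if the gap between its wire and the
    power wire is at least `spacing`:  n*pitch - power_width/2 - wire_width/2 >= spacing.
    Both sides are doubled so integer nanometre inputs stay exact.
    """
    n = 1
    while 2 * n * pitch - power_width - wire_width < 2 * spacing:
        n += 1
    return 1 + 2 * (n - 1)  # centre track plus (n - 1) blocked tracks on each side


def widest_width(blocked, pitch, wire_width, spacing):
    """Widest power wire that blocks no more than `blocked` (odd) tracks."""
    k = (blocked - 1) // 2
    return 2 * (k + 1) * pitch - 2 * spacing - wire_width


# Hypothetical rules (nm): 100 nm signal wires, 100 nm spacing, 200 nm pitch
for w in (300, 500, 501):
    print(w, blocked_tracks(w, 200, 100, 100))   # 300 -> 3, 500 -> 3, 501 -> 5
print(widest_width(3, 200, 100, 100))            # -> 500
```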

  23. Power Routing Tool Usage • 4-layer power grid example (HVHV) • Turn on via stacking • Route metal2 vertically • Route metal4 vertically (use the same coordinates) • Route metal3 horizontally (make coincident with every Nth metal1 route) • Turn off via stacking • Route metal1 horizontally [Figure: metal2/metal4 coincident; metal1 inside cells; metal3 every n microns] Courtesy Cadence Design Systems, Inc.

  24. Post-Processing Flows (DEF or Layout Editing) [Figure: power routing during P&R vs. after post-processing] Courtesy Cadence Design Systems, Inc.

  25. (Tree) Supply Network Design • Tree topology assumption not very useful in practice, but illustrates some basic ideas • Assume R dominates, L and C negligible • Marginally permissible assumption • Current drawn at various points in the tree (time-varying waveform) • Current causes a V = IR drop • “Ground” is not at 0 V • “Vdd” is not at its intended level [Figure: tree from the supply to the current sinks] Courtesy S. Sapatnekar, UMinn

  26. IR Drop Constraints • Chowdhury and Breuer, TCAD 7/88 • Can write V drop to each sink as • Σ R_i I_i < V_spec (summed over the wires on the path from the supply to that sink) for all sink current patterns made available • Tree structure: can compute I_i easily • R_i ∝ l_i / w_i • Change w_i to reduce IR drop • Objective: minimize Σ a_i w_i • Current density must never exceed a specification • For each wire, I_i/w_i < J_spec [Figure: supply tree] Courtesy S. Sapatnekar, UMinn
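
For a tree, the drop constraint is easy to evaluate because each wire's current is just the sum of the sink currents below it. The sketch below assumes R_i = ρ□·l_i/w_i and takes a_i = l_i, so the objective is total wire area; the tree, sheet resistance, and currents are hypothetical.

```python
def tree_ir_drops(edges, sink_currents, rho_sq):
    """IR drop from the supply root to every sink of a resistive tree.

    edges: {node: (parent, length, width)}; the root is any node that is not a key.
    R of an edge = rho_sq * length / width; its current = sum of downstream sink currents.
    """
    edge_current = {node: 0.0 for node in edges}
    for sink, i in sink_currents.items():
        node = sink
        while node in edges:             # add this sink's current to every edge up to the root
            edge_current[node] += i
            node = edges[node][0]
    drops = {}
    for sink in sink_currents:
        v, node = 0.0, sink
        while node in edges:             # sum R_i * I_i along the path to the root
            parent, length, width = edges[node]
            v += rho_sq * length / width * edge_current[node]
            node = parent
        drops[sink] = v
    return drops


def wire_area(edges):
    """Objective with a_i = l_i (assumption): sum of l_i * w_i."""
    return sum(length * width for _, length, width in edges.values())


# Hypothetical tree: a trunk from "vdd" to "a", then two branches to sinks s1 and s2
edges = {"a": ("vdd", 1000, 10), "s1": ("a", 500, 2), "s2": ("a", 500, 2)}
print(tree_ir_drops(edges, {"s1": 0.01, "s2": 0.02}, rho_sq=0.05))
# trunk: 5 ohm * 30 mA = 0.15 V; branches: 12.5 ohm * 10 or 20 mA -> s1 about 0.275 V, s2 about 0.4 V
print(wire_area(edges))   # -> 12000
```

Widening the s2 branch lowers only s2's drop, while widening the trunk helps both sinks at a higher area cost; that trade-off is what the Σ a_i w_i objective arbitrates.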

  27. P/G Mesh Optimization (R only) • Dutta and Marek-Sadowska, DAC 89 • Cost function: Σ a_i l_i w_i = Σ a_i ρ c_i l_i² (total wire area, since c_i = conductance = w_i/(ρ l_i)) • Constraints • EM: |I_i| ≤ σ w_i (current density I/w less than an upper bound σ) • Substitute I_i = (v_p − v_q) w_i/(ρ l_i) (I = V/R, with p and q the end nodes of wire i), divide by w_i and multiply by ρ l_i: |v_p − v_q| ≤ σ ρ l_i • Wire width constraints: W_min ≤ w_i ≤ W_max (translate to c_i) • Voltage drop constraints: v_a − v_b ≤ V_spec1 and/or a bound on node voltages v_i (V_spec2) • Circuit equations that determine the v’s • Variables: c_i’s (the v_i’s depend on the c_i’s) Courtesy S. Sapatnekar, UMinn
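
The circuit equations that determine the v's are nodal equations G·v = i, with G assembled from the conductances. This is a minimal sketch for a uniform n×n resistive VDD mesh with pads held at VDD and loads drawing current; the mesh size, uniform conductance, and pad positions are hypothetical, and dense Gaussian elimination stands in for the sparse solvers a real tool would use.

```python
def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (fine for a toy mesh)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mesh_voltages(n, g_seg, pads, vdd, loads):
    """Node voltages of an n x n mesh where every segment has conductance g_seg.
    pads: set of (row, col) held at vdd; loads: {(row, col): current drawn in A}."""
    idx = {}
    for r in range(n):
        for c in range(n):
            if (r, c) not in pads:
                idx[(r, c)] = len(idx)
    G = [[0.0] * len(idx) for _ in idx]
    b = [0.0] * len(idx)
    for (r, c), i in idx.items():
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (r + dr, c + dc)
            if not (0 <= nb[0] < n and 0 <= nb[1] < n):
                continue
            G[i][i] += g_seg
            if nb in pads:
                b[i] += g_seg * vdd          # known pad voltage moves to the right-hand side
            else:
                G[i][idx[nb]] -= g_seg
        b[i] -= loads.get((r, c), 0.0)       # KCL: current leaving the node through its load
    v = solve(G, b)
    return {node: v[i] for node, i in idx.items()}


n = 5
pads = {(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)}        # hypothetical corner pads
loads = {(r, c): 0.01 for r in range(n) for c in range(n) if (r, c) not in pads}
v = mesh_voltages(n, g_seg=10.0, pads=pads, vdd=1.0, loads=loads)
worst = min(v, key=v.get)
print(f"worst node {worst}: {v[worst]:.4f} V")
```

In the optimization, each move changes G through the c_i's, and these equations must be re-solved to check the drop and EM constraints.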

  28. Solution Technique • Method of feasible directions • Find an initial feasible solution (satisfies all constraints) • Choose a direction that maintains feasibility • Make a move in that direction to reduce cost function • Given a set of ci’s, must find corresponding vi’s • Feasible direction method: move from point c* to c+ • c* and c+ must be close to each other (i.e., if you have the solution at c*, the solution at c+ corresponds to a minor change in conductances) • Solving for vi’s : solving a system of linear equations • Solution at c* is a good guess for the solution at c+ • Converges in a few iterations Courtesy S. Sapatnekar, UMinn
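
The warm-start argument can be seen on a toy problem. This sketch uses a 1-D supply chain and Gauss-Seidel iteration (our choice of linear solver; the slide does not name one), widens one segment by 10% to mimic a small move from c* to c+, and counts sweeps with and without starting from the c* solution. All values are hypothetical.

```python
def gauss_seidel(c, i_load, vdd, v0, tol=1e-9, max_iter=100_000):
    """Gauss-Seidel solve of a 1-D supply chain.

    Node 0 is the pad at vdd; c[k] is the conductance of the segment feeding node k+1,
    and every node 1..N draws i_load. Returns (voltages of nodes 1..N, sweeps used).
    """
    n = len(c)
    v = list(v0)
    for sweep in range(1, max_iter + 1):
        biggest = 0.0
        for k in range(n):
            left = vdd if k == 0 else v[k - 1]
            num, den = c[k] * left - i_load, c[k]
            if k + 1 < n:                       # segment to the next node, if any
                num += c[k + 1] * v[k + 1]
                den += c[k + 1]
            new = num / den                     # KCL at this node solved for its voltage
            biggest = max(biggest, abs(new - v[k]))
            v[k] = new
        if biggest < tol:
            return v, sweep
    raise RuntimeError("Gauss-Seidel did not converge")


N, VDD, I = 20, 1.0, 1e-3                       # hypothetical chain: 20 nodes, 1 mA each
c_star = [10.0] * N                             # current point c* (siemens)
v_star, _ = gauss_seidel(c_star, I, VDD, [VDD] * N)

c_plus = list(c_star)
c_plus[2] *= 1.1                                # small move c* -> c+: widen one segment 10%
_, cold = gauss_seidel(c_plus, I, VDD, [VDD] * N)   # restart from scratch
_, warm = gauss_seidel(c_plus, I, VDD, v_star)      # start from the c* solution
print(f"sweeps from scratch: {cold}, warm-started from c*: {warm}")
```

The warm start begins with an error of about 0.16 mV instead of about 21 mV, so it needs noticeably fewer sweeps to reach the same tolerance.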

  29. Modeling Gate Currents • Currents in the supply grid are caused by charging/discharging of capacitances by logic gates • All analyses require generation of a “worst-case switching” scenario • Enumeration is infeasible → two basic approaches • Simulation-based methods: designer supplies “hot” vectors, or we try to generate these hot vectors automatically • “Pattern-independent” methods: try to estimate the worst case (can be expensive, very inaccurate) • Once current patterns are available, apply them to the supply network to find out if constraints are satisfied Courtesy S. Sapatnekar, UMinn

  30. Complexity of Hot Vector Generation • Devadas et al., TCAD 3/92: • Assume zero gate delays for simplicity • Find the maximum current drawn by a block of gates • Using a current model for each gate • Find a set of input patterns so that the total current is maximized • Boolean assignment problem: equivalent to Weighted Max-Satisfiability • Given a Boolean formula in conjunctive normal form (product of sums), is there an assignment of truth values to the variables such that the formula evaluates to True? • Checking for satisfiability (k-SAT, k > 2) is NP-complete • ⇒ Difficult even under the zero-gate-delay assumption Courtesy S. Sapatnekar, UMinn
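
Under the zero-delay assumption, the exact answer is an enumeration over all input-vector pairs: 4ⁿ evaluations for n inputs, which is what becomes infeasible at scale. The toy block below (gates, functions, and current weights are hypothetical) makes the search explicit; it is a brute-force illustration, not the Max-SAT formulation of Devadas et al.

```python
from itertools import product

# Hypothetical toy block: gate -> (logic function, fan-in names, current weight)
GATES = {
    "g1": (lambda a, b: not (a and b), ("x1", "x2"), 1.0),   # NAND
    "g2": (lambda a, b: not (a or b), ("x2", "x3"), 1.0),    # NOR
    "g3": (lambda a, b: not (a and b), ("g1", "g2"), 2.0),   # NAND with a larger load
}
ORDER = ["g1", "g2", "g3"]          # topological order
INPUTS = ["x1", "x2", "x3"]


def evaluate(vector):
    """Zero-delay logic values of every net for one input vector."""
    values = dict(zip(INPUTS, vector))
    for g in ORDER:
        fn, fanin, _ = GATES[g]
        values[g] = fn(*(values[i] for i in fanin))
    return values


def max_switching_current():
    """Exhaustive search over all (v1, v2) input pairs: 4**n evaluations for n inputs."""
    vectors = list(product([False, True], repeat=len(INPUTS)))
    states = {v: evaluate(v) for v in vectors}
    best, best_pair = -1.0, None
    for v1 in vectors:
        for v2 in vectors:
            # zero delay: a gate switches iff its output differs between v1 and v2
            current = sum(w for g, (_, _, w) in GATES.items() if states[v1][g] != states[v2][g])
            if current > best:
                best, best_pair = current, (v1, v2)
    return best, best_pair


best, pair = max_switching_current()
print(best, pair)   # all three gates can switch together, so best is 4.0
```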

  31. Pattern-Independent Methods • iMAX approach: Kriplani et al., TCAD 8/95 • Current model for a single gate [Figure: gate current pulse with peak I_peak and duration set by the gate delay] • Gates switch at different times • Total current drawn from Vdd (ignoring supply network C) is the sum of these time-shifted waveforms • Objective: find the worst-case waveform Courtesy S. Sapatnekar, UMinn

  32. Example [Figure, not to scale] • Maximum current is not just a sum of individual maximum currents • Temporal dependencies • [Using deliberate clock skews can reduce the peak current, as we saw in the Useful-Skew discussion] Courtesy S. Sapatnekar, UMinn

  33. Maximum Envelope Current (MEC) • Find the time interval during which a gate may switch • Manufacturing process variations can cause changes • Actual switching event can cause changes • Switching at the second gate can occur at t = 1 or at t = 2 • In general, a large number of paths can go through a gate; assume (conservatively) that switching occurs in t ∈ [1, 2] • Assume that all gate inputs can switch independently; this provides an upper bound on the switching current (unit gate delays) Courtesy S. Sapatnekar, UMinn
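
The MEC windows can be computed by propagating earliest and latest switching times through the netlist in topological order. This sketch assumes unit gate delays, primary inputs switching only at t = 0, and a discretized envelope in which each gate may draw a constant peak current anywhere in its window (real iMAX uses current waveforms rather than this step model). The three-gate netlist is hypothetical; its second gate reproduces the slide's [1, 2] window.

```python
def switching_windows(fanin, order, delay=1):
    """[earliest, latest] switching time of each gate under unit delays.

    Primary inputs (names not in `fanin`) switch only at t = 0. A gate can switch
    one delay after its earliest input and one delay after its latest input.
    """
    win = {}
    for g in order:                                  # order must be topological
        times = [win.get(i, (0, 0)) for i in fanin[g]]
        win[g] = (min(lo for lo, _ in times) + delay, max(hi for _, hi in times) + delay)
    return win


def max_envelope_current(win, peak, horizon):
    """Upper bound on total current at each integer time step, assuming (as MEC does)
    that every gate may draw its peak current anywhere inside its window."""
    env = [0.0] * (horizon + 1)
    for g, (lo, hi) in win.items():
        for t in range(lo, min(hi, horizon) + 1):
            env[t] += peak[g]
    return env


# Hypothetical 3-gate block with primary inputs a and b
fanin = {"g1": ["a", "b"], "g2": ["g1", "b"], "g3": ["g2", "g1"]}
win = switching_windows(fanin, ["g1", "g2", "g3"])
print(win)   # -> {'g1': (1, 1), 'g2': (1, 2), 'g3': (2, 3)}
print(max_envelope_current(win, {"g1": 1.0, "g2": 1.0, "g3": 1.0}, horizon=3))
# -> [0.0, 2.0, 2.0, 1.0]
```

Because each window is filled independently, the envelope is pessimistic whenever correlated or reconvergent signals prevent gates from actually switching together.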

  34. (Large) Errors in MEC Approach • Correlation problem • Switching at G0, G1, G2 and G3 is not independent • G0 = 0 implies that G1, G2, G3 switch; G0 = 1 means that other inputs will determine gate activity • If the other inputs cannot make the gate switch in the same time window, then iMAX estimates are pessimistic • Reconvergent fanout problem • Signals that diverge at G0 reconverge at Gk → inputs to Gk are not independent • Assumption of independent switching is not valid • Many heuristic refinements proposed, but guardbanding (error) of power estimation is still a huge issue [Figure: G0 driving G1, G2, G3; reconvergent fanout from G0 to Gk] Courtesy S. Sapatnekar, UMinn

  35. Outline • Motivation • Power Supply Noise Estimation • Decoupling Capacitance (decap) Budget • Allocation of Decoupling Capacitance • Experiment Results • Conclusion

  36. Why Decoupling Capacitance • Frequency point of view • Decaps form low-pass filters • They cancel anti- effects • Physical point of view • Decaps serve as charge reservoirs • They shortcut supply current paths and reduce voltage drop • No effect on DC supply currents

  37. Power Supply Network: RLC Mesh [Figure: RLC mesh with switching current sources; each VDD pin modeled by Rp and Lp] Slide courtesy of S Zhao, K Roy & C.-K. Kok

  38. Current Distribution in Power Supply Mesh [Figure: current contributions and current flowing paths (1)-(6) from VDD pins through connection points to modules A, B, C] Slide courtesy of S Zhao, K Roy & C.-K. Kok

  39. Current Distribution in Power Supply Network • Distribute switching current for each module in the power supply mesh • Observation: Currents tend to flow along the least-impedance paths • Approximation: Consider only those paths with minimal impedance --shortest, second shortest, … Slide courtesy of S Zhao, K Roy & C.-K. Kok

  40. Current Flowing Paths and Power Supply Noise Calculation • Power supply noise at a target module is the voltage difference between the VDD pin and the module • Apply KVL [Figure: current paths from the VDD pin through R1, L1 and R2, L2 to node k, with currents i1(t), i2(t), i3(t) and capacitors C1, C2] Slide courtesy of S Zhao, K Roy & C.-K. Kok
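
The slide's own KVL equation did not survive extraction, so the sketch below shows only the basic form of the calculation for one series path with no decap: v(t) = R·i(t) + L·di/dt. The path R and L, the triangular current pulse, and the sample grid are hypothetical. It illustrates why fast current edges matter: during the rising edge the inductive term is several times larger than the resistive one.

```python
def triangle(t, t0, rise, fall, i_peak):
    """Triangular switching-current pulse (times in ps, current in A)."""
    if t < t0 or t >= t0 + rise + fall:
        return 0.0
    if t < t0 + rise:
        return i_peak * (t - t0) / rise
    return i_peak * (t0 + rise + fall - t) / fall


def rl_path_noise(times_ps, currents, r, l):
    """v(t) = R*i(t) + L*di/dt, with a backward-difference di/dt (0 at the first sample)."""
    v = []
    for k, i in enumerate(currents):
        didt = 0.0
        if k > 0:
            didt = (i - currents[k - 1]) / ((times_ps[k] - times_ps[k - 1]) * 1e-12)
        v.append(r * i + l * didt)
    return v


# Hypothetical path and pulse: R = 0.5 ohm, L = 0.1 nH, 100 mA peak, 50 ps rise, 100 ps fall
times = list(range(0, 251, 5))
i_t = [triangle(t, 20, 50, 100, 0.1) for t in times]
v_t = rl_path_noise(times, i_t, r=0.5, l=0.1e-9)
print(f"peak noise {max(v_t):.3f} V (R*i alone peaks at {0.5 * max(i_t):.3f} V)")
```

The negative samples during the falling edge are the inductive overshoot in the opposite direction.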

  41. Why Decoupling Capacitance? • P/G network wiresizing won’t change the voltage drop frequency spectrum • To reduce Vdrop by k times, wires must be sized up by k times along the supply current path [Figure: VDD path through R1, L1 and R2, L2 with currents i1(t), i2(t), i3(t) and capacitors C1, C2] • Decoupling caps act as a low-pass filter • Efficient at removing high-frequency components of Vdrop
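
A lumped model makes the low-pass behavior visible. This sketch assumes the module sees a series R + jωL supply path in parallel with a single decap C (values hypothetical) and compares |Z| with and without the decap: at low frequency both approach R, so the DC drop is unchanged, while at high frequency the decap's impedance 1/(ωC) takes over.

```python
import math


def supply_impedance(f_hz, r, l, c=None):
    """|Z| seen by a module: series R + jwL supply path, optionally in parallel
    with a decap C. f_hz must be > 0 when c is given."""
    w = 2 * math.pi * f_hz
    z_path = complex(r, w * l)
    if c is None:
        return abs(z_path)
    z_cap = 1 / complex(0, w * c)
    return abs(z_path * z_cap / (z_path + z_cap))


# Hypothetical values: R = 0.5 ohm, L = 0.1 nH, decap = 1 nF
for f in (1e6, 1e8, 1e10):
    print(f"{f:8.0e} Hz  no decap {supply_impedance(f, 0.5, 1e-10):7.3f} ohm"
          f"  with decap {supply_impedance(f, 0.5, 1e-10, 1e-9):7.3f} ohm")
```

Widening wires only scales R and L down together, whereas the decap changes the shape of |Z| versus frequency, which is the slide's point.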

  42. Decoupling Capacitance Budget • Decap budget for each module can be determined based on its noise level • Initial budget can be estimated as follows: • Iterations are performed if necessary until noise at each module in the floorplan is kept under certain limit Slide courtesy of S Zhao, K Roy & C.-K. Kok

  43. Allocation of Decoupling Capacitance • Decap needs to be placed in the vicinity of each target module • Decap requires white space (WS) to be manufactured in • Use MOS capacitors • Decap allocation is reduced to WS allocation • Two-phase approach: • Allocate the existing WS in the floorplan • Insert additional WS into the floorplan if required Slide courtesy of S Zhao, K Roy & C.-K. Kok

  44. Allocation of Existing White Space [Figure: white-space regions w1, w2, w3 among modules A-E] Slide courtesy of S Zhao, K Roy & C.-K. Kok

  45. Allocation of Existing WS: Linear Programming (LP) Approach • Objective: maximize the utilization of available WS • Existing WS can be allocated to neighboring modules using LP [Equations: notation and LP formulation] Slide courtesy of S Zhao, K Roy & C.-K. Kok
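
The slide's LP notation was not recoverable, so the formulation here is an assumption: the variables are the areas x(w, m) that white space w gives to an adjacent module m, bounded by each WS area and each module's decap budget, with the objective of maximizing the total. That particular LP is a bipartite transportation problem, so the sketch solves it as a max-flow (Edmonds-Karp) rather than with a general LP solver; the optimum value is the same. Areas, budgets, and adjacencies are hypothetical; any budget left over is what would drive the WS insertion step on the next slide.

```python
from collections import defaultdict, deque


def allocate_white_space(ws_area, budget, adjacent):
    """Maximise the white-space area handed to neighbouring modules.

    ws_area:  {ws: area available}
    budget:   {module: decap area still needed}
    adjacent: {ws: [modules that touch this ws]}
    Returns {(ws, module): area assigned}, via max-flow source -> ws -> module -> sink.
    """
    src, snk = ("src",), ("snk",)                   # tuples cannot clash with string names
    res = defaultdict(lambda: defaultdict(float))   # residual capacities
    for w, area in ws_area.items():
        res[src][w] += area
        for m in adjacent.get(w, []):
            res[w][m] += area
    for m, need in budget.items():
        res[m][snk] += need

    while True:                                     # Edmonds-Karp: BFS augmenting paths
        parent, queue = {src: None}, deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, cap in list(res[u].items()):
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            break
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push                       # reverse residual = flow sent on (u, v)

    return {(w, m): res[m][w]
            for w in ws_area for m in adjacent.get(w, []) if res[m][w] > 1e-12}


ws = {"w1": 4, "w2": 3, "w3": 2}                    # hypothetical WS areas
need = {"A": 2, "B": 5, "C": 1, "D": 1, "E": 1}     # hypothetical decap budgets
adj = {"w1": ["A", "B"], "w2": ["B", "C", "D"], "w3": ["D", "E"]}
alloc = allocate_white_space(ws, need, adj)
used = sum(alloc.values())
left = {m: need[m] - sum(a for (_, mm), a in alloc.items() if mm == m) for m in need}
print(alloc, used, left)
```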

  46. Insert Additional WS into Floorplan If Necessary • Update the decap budget for each module after existing WS has been allocated • If additional WS is required, insert WS into the floorplan by extending it horizontally and vertically • Two-phase procedure: • Insert WS bands between rows based on the decap budgets of the modules in each row • Insert WS bands between columns based on the decap budgets of the modules in each column Slide courtesy of S Zhao, K Roy & C.-K. Kok

  47. Moving Modules to Insert WS Slide courtesy of S Zhao, K Roy & C.-K. Kok

  48. Experimental Results: Comparison of Decap Budgets (Ours vs. “Greedy Solution”)

  49. Experimental Results for MCNC Benchmark Circuits

  50. Floorplan of playout Before/After WS Insertion
