Directions in Low-Power CAD

Dennis Sylvester

University of Michigan


http://vlsida.eecs.umich.edu

With acknowledgements to: Prof. David Blaauw, Dr. Sarvesh Kulkarni, Saumil Shah, Kavi Chopra


Topics

  • A new dual-Vth assignment formulation

  • Dual-Vdd power distribution

  • Approaches to parametric yield optimization: statistical leakage + delay


Motivation

  • We require high-performance yet low-power circuits

  • Leakage power contributes significantly to total power

  • All-high-Vth implementation too slow

  • All-low-Vth implementation too leaky

  • Dual-Vth processes popular

  • Problem Definition

    • Minimize

      • Total Circuit Power

    • Subject to

      • Circuit Delay Constraint

      • Sizing Constraints

    • Optimization Variables

      • Gate Sizes

      • Gate Threshold Voltages

S. Narendra et al. [ICCAD ’03]

[Figure: total power split into switching power and subthreshold leakage power]


Gate Sizing + Vth Assignment Problem: Prior Work

  • Traditionally a discrete problem

  • Previous approaches

    • Separate Sizing and Vth Assignment

    • Mixed Integer Non-Linear Programming

    • Sensitivity-based methods (DUET, etc.)

    • Continuous formulation [Chen, ASP-DAC ‘05]

      • Very reliant on discretization heuristic


Proposed Approach – Self-snapping formulation

  • Continuous formulation: allows the use of a wide variety of algorithms and powerful nonlinear optimizers

  • Solution has almost all gates assigned to one of the two available threshold voltages

  • The small fraction of gates with intermediate Vths can be handled heuristically

  • Discretization algorithm has negligible power impact and can be very simple


Proposed Approach – Mixed-Vth Gates

  • Consider each gate to be a parallel combination of high and low Vth gates

  • RC Delay Model

[Figure: HVt gate, LVt gate, and the mixed gate formed by placing them in parallel]

  • Linear Power Model

[Figure: HVt gate and LVt gate]
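The model equations did not survive extraction. The sketch below shows one plausible way to model a mixed-Vth gate, under stated assumptions. Each part's on-resistance scales as r/width, with hypothetical unit-width resistances `r_h` and `r_l`, so the drive conductances of the parallel HVt and LVt parts add. Power is linear in the two widths, with hypothetical per-unit-width coefficients `p_h` and `p_l`. None of the names or numbers come from the slides.

```python
def mixed_gate_delay(w_h, w_l, r_h, r_l, c_load):
    """RC delay of a mixed-Vth gate: an HVt part of width w_h in parallel
    with an LVt part of width w_l. Assumes on-resistance r/width for each
    part, so the drive conductances add. At least one width must be > 0."""
    conductance = w_h / r_h + w_l / r_l
    return c_load / conductance


def mixed_gate_power(w_h, w_l, p_h, p_l):
    """Linear power model: p_h and p_l are hypothetical power-per-unit-width
    coefficients (leakage plus switching) of the HVt and LVt parts."""
    return p_h * w_h + p_l * w_l


# Illustrative unit-width parameters (not from the slides): LVt is faster but leakier.
R_H, R_L = 2.0, 1.0
P_H, P_L = 0.1, 1.0
C_LOAD = 3.0

pure_hvt = mixed_gate_delay(1.0, 0.0, R_H, R_L, C_LOAD)   # 3.0 / 0.5
pure_lvt = mixed_gate_delay(0.0, 1.0, R_H, R_L, C_LOAD)   # 3.0 / 1.0
mixed = mixed_gate_delay(1.0, 1.0, R_H, R_L, C_LOAD)      # 3.0 / 1.5
print(pure_hvt, pure_lvt, mixed)                          # prints 6.0 3.0 2.0
```

The delay is nonlinear in the widths. However, once the delay is fixed, the condition "delay ≤ target" can be rearranged to be linear in the widths, which is the property the W-phase argument relies on.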


Complete Dual- Vth Problem Formulation

  • Similar to single-Vth gate sizing problem, with simple gate delays replaced with High Vth/Low Vth parallel combinations

  • Minimize

  • Subject to:


Proof of Discretized Solution

  • Conceptually separate optimization process into two distinct phases:

    • D-Phase: Fix delays of all gates

    • W-Phase: Find the minimum-power sizing solution that satisfies the chosen D vector

  • Hypothetical separation for proof – Not implemented in actual optimization procedure


W-Phase

  • Sufficient to prove that the W-phase optimum is discrete for an arbitrary D vector

  • W-Phase formulation

  • Minimize

  • Subject to:


W-Phase

  • Linear programming problem

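  • 2n variables (one high-Vth and one low-Vth width per gate) and n constraints: n basic variables, n non-basic variables

  • Therefore, at most n non-zero variables

  • Each gate needs at least one non-zero width, so every gate is snapped to either high-Vth or low-Vth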

  • Addition of upper and lower bounds on total size leads to some non-snapped gates

  • Number extremely small – simple heuristic achieves good results
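The W-phase equations were lost. The block below writes out one consistent form of the LP and the counting argument. It reuses the mixed-gate model from the earlier slide: unit-width resistances R_H and R_L, power per unit width p_H and p_L, and gate delay equal to load over summed conductance. It adds three further assumptions that are not on the slides. Input capacitance is c_in per unit width for both Vth flavors. Every gate has a fixed load C_{w,i} > 0. FO(i) is the fanout set of gate i.

```latex
\begin{aligned}
\min_{w_{H,i},\,w_{L,i} \ge 0}\quad & \sum_{i=1}^{n} \left( p_H\, w_{H,i} + p_L\, w_{L,i} \right) \\
\text{s.t.}\quad & D_i \left( \frac{w_{H,i}}{R_H} + \frac{w_{L,i}}{R_L} \right)
  - c_{in} \sum_{j \in FO(i)} \left( w_{H,j} + w_{L,j} \right) \;\ge\; C_{w,i},
  \qquad i = 1, \dots, n
\end{aligned}
```

Each constraint is the delay condition $C_{load,i} / (w_{H,i}/R_H + w_{L,i}/R_L) \le D_i$ multiplied through by the conductance, where $C_{load,i} = C_{w,i} + c_{in} \sum_{j \in FO(i)} (w_{H,j} + w_{L,j})$. The LP has $n$ constraints. In standard form, a basic optimal solution therefore has at most $n$ non-zero entries among the $2n$ widths and $n$ slacks. Because $C_{w,i} > 0$, every gate needs $w_{H,i} + w_{L,i} > 0$. Each of the $n$ gates thus has exactly one non-zero width, which makes it pure HVt or pure LVt. Adding bounds on $w_{H,i} + w_{L,i}$ adds constraints, and therefore basic variables, which is why a few gates can end up mixed.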


Practical Constraint – Fixed-Width Input Drivers

  • Sequential elements driving the combinational circuit

  • Delay of these elements affected by primary input widths

  • Modeled as fixed-width drivers


Extension of Discretization Analysis

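  • n+m constraints in the optimization problem (still 2n width variables)

  • n+m basic variables, n-m non-basic variables

  • Therefore, at most n+m positive variables

  • Each of the n gates needs at least one positive width, so at most m gates are non-snapped: the number of non-snapped gates is bounded by the number of inputs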

    • Once again, small in number; can be handled heuristically

    • In practice, number of non-snapped gates found to be much less than the number of inputs


Discretization Heuristics

  • Iterative snapping

    • Round gates to the closer Vth and re-optimize until a fully snapped solution is achieved

  • Single-pass Vth assignment

    • Fix all gates to closer Vth and re-optimize only for gate sizes

  • Second heuristic faster with negligible power impact
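The snapping step of single-pass Vth assignment can be sketched as follows. Two rules here are assumptions, not the authors' method. First, "closer Vth" is taken to mean the Vth flavor that carries the larger share of the gate's width. Second, the snapped width is chosen to preserve the gate's drive conductance w_h/r_h + w_l/r_l under the same hypothetical r/width resistance model. The sizing-only re-optimization that follows the snap is not shown.

```python
from dataclasses import dataclass


@dataclass
class MixedGate:
    w_h: float  # width of the high-Vth part
    w_l: float  # width of the low-Vth part


def is_snapped(gate, tol=1e-9):
    """A gate is snapped when one of its two widths is (numerically) zero."""
    return gate.w_h <= tol or gate.w_l <= tol


def snap(gate, r_h, r_l):
    """Snap to the Vth carrying most of the width, keeping the drive conductance
    w_h/r_h + w_l/r_l unchanged (hypothetical rule). The gate's own delay is
    preserved, but its input capacitance changes, hence the later re-sizing."""
    g = gate.w_h / r_h + gate.w_l / r_l
    if gate.w_h >= gate.w_l:
        return MixedGate(w_h=g * r_h, w_l=0.0)
    return MixedGate(w_h=0.0, w_l=g * r_l)


def single_pass_assign(gates, r_h, r_l):
    """Snap every non-snapped gate once; sizing-only re-optimization not shown."""
    return [g if is_snapped(g) else snap(g, r_h, r_l) for g in gates]


gates = [MixedGate(2.0, 0.0), MixedGate(0.9, 0.1), MixedGate(0.2, 0.8)]
snapped = single_pass_assign(gates, r_h=1.5, r_l=1.0)
print(snapped)
```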


Results

  • Snapping properties of some circuits

  • # of non-snapped gates is very small

  • Non-snapped gates are dominated by gates at the upper and lower size bounds

  • Approach is easily extendable to multi-Vth AND multi-Lgate


Results

  • Power and runtime comparisons between proposed approach and sensitivity-based algorithm at 2% timing backoff (results shown for larger circuits only)

  • Average: 31% leakage reduction vs. previous approaches


Topics

  • A new dual-Vth assignment formulation

  • Dual-Vdd power distribution

  • Approaches to parametric yield optimization: statistical leakage + delay


[Figure: need for level conversion: a VDDL gate (output swing limited to VDDL) driving a VDDH gate causes DC current; flip-flops (FF) at the path boundaries]

Multiple supply design

  • Relies on applying a lower supply (VDDL) to gates along non-critical paths, reducing power while still meeting timing

  • A flexible, fine-grained VDD assignment scheme promises the best power reduction

    • Gate-level Extended Clustered Voltage Scaling (ECVS)

  • However, physical design and power delivery are complicated
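The slides do not spell out the assignment algorithm. The sketch below shows the usual clustered voltage scaling (CVS) idea on a toy gate-level DAG. Gates are visited from the outputs backwards. A gate moves to VDDL only if all of its fanouts are already VDDL, so no VDDL gate drives a VDDH gate, and only if the circuit still meets timing. The uniform `slowdown` factor for VDDL gates and the example circuit are assumptions for illustration.

```python
def circuit_delay(order, fanout, delay):
    """Longest-path delay of a gate-level DAG; `order` must be topological."""
    start = {g: 0.0 for g in order}
    worst = 0.0
    for g in order:
        finish = start[g] + delay[g]
        worst = max(worst, finish)
        for h in fanout[g]:
            start[h] = max(start[h], finish)
    return worst


def cvs_assign(order, fanout, d_high, slowdown, t_max):
    """CVS sketch: walk gates from the outputs backwards and move a gate to VDDL
    only if every fanout is already VDDL and the circuit still meets t_max.
    Assumption: a gate at VDDL is `slowdown` times slower than at VDDH."""
    delay = dict(d_high)
    low = set()
    for g in reversed(order):
        if not all(h in low for h in fanout[g]):
            continue
        delay[g] = d_high[g] * slowdown      # tentatively move g to VDDL
        if circuit_delay(order, fanout, delay) <= t_max:
            low.add(g)
        else:
            delay[g] = d_high[g]             # revert: timing would be violated
    return low


# Hypothetical circuit: a -> b -> d is critical, a -> c is a short side branch.
order = ["a", "b", "c", "d"]
fanout = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
d_high = {g: 1.0 for g in order}
print(cvs_assign(order, fanout, d_high, slowdown=1.5, t_max=3.0))   # prints {'c'}
```

ECVS relaxes the "all fanouts already at VDDL" condition by allowing level converters inside the logic. That is what enables the finer-grained assignment, and it is also what complicates physical design and power delivery.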


[Figure: CVS vs. ECVS supply assignment between IN and OUT, showing critical and non-critical gates]

Implications of using multiple supplies

  • Circuits: level shifting

  • Algorithms: VDD assignment

  • Physical design: VDD granularity (fine-grained vs. islanding)

  • Power delivery: distribution, generation


Power delivery for dual-VDD circuits

  • Power grid integrity vital for circuit performance

  • Dual-VDD circuits require two supply voltages for operation

  • Fine-grained dual-VDD can place VDDL/VDDH gates arbitrarily on the die

  • Implications at the board, package and die level

    • Fixed resources need to be split between VDDL and VDDH

  • However, load on each supply is lower than on original single supply:

    Power supply current demanded by a dual-VDD circuit is significantly lower than the corresponding single-VDD circuit, allowing robust power delivery within available resources (decap, C4, wiring)


[Figure: supply current of the single-VDD (VDD) design vs. the dual-VDD (ECVS) design]

Reduced current load on VDDL/VDDH

  • Gate level comparison

    • Avg. 54% (33%) for VDDL = 0.8V (0.6V)

  • Circuit level comparison

    • Avg. 49% (51%) and 28% (14%) for VDDH and VDDL for 0.8V (0.6V)


[Figure: board, package and die power delivery model for dual-VDD: separate VDDH and VDDL paths through motherboard (Rmb1/Lmb1, Rmb2/Lmb2), socket (Rskt/Lskt) and package (Rpkg/Lpkg), with package decap (Rpkg_cap, Lpkg_cap, Cpkg_cap), high-frequency decap (Rhf, Lhf, Chf), die (Rdie, Cdie) and block-level (Rblk, Lblk, Cblk) elements feeding the loads I(VDDH) and I(VDDL)]

Package level results

  • Two VRMs on board to supply VDDL and VDDH

  • Ground path can be shared by VDDL and VDDH

  • Decoupling capacitance divided in the ratio of current loads

  • Similar power supply noise with same resources as single-VDD case (decoupling capacitance, C4s)

Intel, “Intel Pentium 4 processor in the 432 pin/Intel 850 Chipset Platform,” 2002.
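Dividing decoupling capacitance "in the ratio of current loads" is simple proportional arithmetic, shown below. The 100-unit budget is hypothetical. The example loads reuse the circuit-level averages quoted earlier for VDDL = 0.8V (49% for VDDH, 28% for VDDL), under the assumption that those are fractions of the single-VDD current.

```python
def split_decap(c_total, i_high, i_low):
    """Divide a fixed decoupling-capacitance budget between the VDDH and VDDL
    networks in proportion to their current loads."""
    total = i_high + i_low
    return c_total * i_high / total, c_total * i_low / total


# Hypothetical 100-unit decap budget; loads assumed relative to the single-VDD current.
c_h, c_l = split_decap(100.0, 0.49, 0.28)
print(round(c_h, 1), round(c_l, 1))
```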


[Figure: single-VDD, dual-VDD segregated and dual-VDD fine-grained row layouts with VDDH, VDDL and GND rails]

Dual-VDD physical design alternatives

Segregated placement constrains the placer, leading to higher core area and wirelength

C. Yeh, et al., “Layout techniques supporting the use of dual supply voltages for cell-based designs,” Proc. DAC, 1999.

M. Igarashi, et al., “A low-power design method using multiple supply voltages,” Proc. ISLPED, 1997.


Dual-VDD standard cell topologies

  • Single-VDD: VDD and GND rails

  • Dual-VDD Shared-GND (3-rail cell): VDDH, VDDL and a shared GND

  • Dual-VDD Dual-GND (4-rail cell): VDDH, GNDH, VDDL and GNDL

Dual-VDD power grid alternatives

  • Routing the power supply rails

  • Dual-VDD Dual-GND requires two separate grounds off-chip and complicates timing analysis and design of the board itself

  • Multi-rail standard cells can be used to realize the dual-VDD grids → allows the placer to operate without constraints


Dual-VDD on-chip power grid design

  • Guidelines while designing the dual-VDD grid:

    • Scale wires with respect to the single-VDD considering how the current demand has scaled

    • VDDL gates are more sensitive to grid noise → important since ground is shared

      • 120mV noise is 10% for a 1.2V gate, but 20% for a 0.6V gate

    • Placement of VDDL and VDDH gates → assign more wiring resources to the VDDL grid in areas with higher VDDL current demand

    • Consider effects that arise from the board and package level such as shared C4s

      • Fewer C4s lead to higher effective package R, L


[Flow: D-Place dual-VDD grid design]

  • Start from the original single-VDD design and obtain the dual-VDD design (single-VDD and dual-VDD .lib files)

  • Obtain the current consumption of the single- and dual-VDD designs (SPICE)

  • Break down the die into "local" and "regional" areas using the placement database (Cadence)

  • Calculate local, regional, global and effective α and β for each wire segment

  • Size each wire segment in each local area using the effective α and β, and simulate the grid (VDDH, VDDL, GND)

  • Measure voltage droop/bounce and wire congestion

Proposed technique: D-Place

  • Partition the chip floorplan

  • Obtain effective α and β as follows

  • Let α = I(VDDH)/I(VDD) and β = I(VDDL)/I(VDD)

  • Scale wires as follows
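The wire-scaling equations were lost. The sketch below is one simple reading, under stated assumptions. It computes α and β exactly as defined on the slide. It then blends the local, regional and global values into an effective value using hypothetical weights that are placeholders, not values from the source. Finally, it scales each single-VDD wire width by the effective α for VDDH and the effective β for VDDL, following the guideline to scale wires according to how the current demand has scaled.

```python
def current_ratios(i_vdd, i_vddh, i_vddl):
    """alpha = I(VDDH)/I(VDD) and beta = I(VDDL)/I(VDD), as defined for D-Place."""
    return i_vddh / i_vdd, i_vddl / i_vdd


def effective_ratio(local, regional, global_, weights=(0.5, 0.3, 0.2)):
    """Hypothetical weighted blend of local, regional and global ratios;
    the weights are placeholders, not taken from the source."""
    wl, wr, wg = weights
    return wl * local + wr * regional + wg * global_


def scale_wire(width_single, alpha_eff, beta_eff):
    """Assumed proportional scaling: VDDH and VDDL wire widths follow the share
    of the single-VDD current that each supply now carries."""
    return width_single * alpha_eff, width_single * beta_eff


# Hypothetical currents for one local area, in arbitrary units.
alpha, beta = current_ratios(10.0, 4.9, 2.8)
a_eff = effective_ratio(alpha, 0.5, 0.49)
b_eff = effective_ratio(beta, 0.3, 0.28)
print(scale_wire(2.0, a_eff, b_eff))
```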


Peak voltage drop comparisons

[Figure: peak voltage drop for VDDL = 0.6V and VDDL = 0.8V]

  • D-Place grids better than single-VDD grids in AVG cases

  • Inferior by < 2.6% (≈15mV) in some MAX cases

  • 0.6V VDDL as robust as 0.8V

  • 0.6V also provides higher power savings

  • Proposed approach better by 2-7% (AVG) and 7-12% (MAX) compared to prior approaches


Voltage variation across die

  • Voltage drop contours

  • Wiring congestion similar for dual-Vdd vs. single Vdd grids

  • Lower current demands allow less decoupling capacitance and hence lower decap leakage (or the same decap can be kept for better performance)

Dual-VDD grid no less robust than single-VDD grid


Topics

  • A new dual-Vth assignment formulation

  • Dual-Vdd power distribution

  • Approaches to parametric yield optimization: statistical leakage + delay


Introduction

[Figure: process variations (optical proximity effects, chemical mechanical polishing) spread the process parameter space (Leff, Vth) and map into the chip performance space (delay, power). Dies with low leakage have poor timing (timing yield loss); dies with good timing have high leakage (power yield loss).]

This work: optimize the timing and power yield using gate sizing


Problem Description

  • Nonlinear Continuous Optimization

    • Objective: Maximize Timing and Power Yield

      Yield: a utility function defined w.r.t. the JPDF of leakage and timing

    • Decision Variables: Gate Size

[Figure: yield region bounded by the timing constraint Tconst and the power constraint Pconst]

  • Efficient implementation requires

    • Computing yield as a function of the decision variables (gate sizes)

    • Fast and Accurate Gradient computation


Power and Timing Yield Analysis (see DAC05 for more detail)

  • Timing analysis [Sapatnekar03, Chandu05]: delay distribution (μd, σd)

  • Power analysis: Log(Leakage) distribution (μl, σl)

  • Correlation (1 parameter): ρ

  • Delay and power bivariate JPDF (μd, σd, μl, σl, ρ)

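Cut Set SSTA: Intuition

  • Consider the timing graph

[Figure: timing graph with nodes 1 to 10, in which gate 7 is sized up. Traditional incremental timing re-propagates arrival times from the change. Cut Set SSTA keeps an unperturbed left sub-graph (arrival times, AT) and an unperturbed right sub-graph (required arrival times, RT), and evaluates the cut edge times (CT) and their maximum, the MaxCut Edge Time.]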

  • If forward SSTA ≡ reverse SSTA, Cut Set SSTA gives exactly the same sensitivities as the naïve approach, which recomputes yield by propagating through all nodes, most of them unchanged
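The cut-set idea can be illustrated with a deterministic simplification. In the sketch, plain max and add replace the statistical max and sum used in SSTA. Forward arrival times (AT) are kept for the unperturbed left sub-graph and reverse times (RT, longest delay to the sink) for the unperturbed right sub-graph. After the perturbation, the circuit delay is recomputed only over the cut edges. The sketch assumes a single source, a single sink, and a cut taken at a topological prefix, so every source-to-sink path crosses the cut exactly once. The graph, gate names and perturbation are hypothetical, and sizing up g2 is modeled only as faster output edges (its effect on fanin load is ignored).

```python
def forward_at(order, edges):
    """AT(v): longest path delay from the source to v (nonnegative delays)."""
    pos = {v: i for i, v in enumerate(order)}
    at = {v: 0.0 for v in order}
    for u, v, d in sorted(edges, key=lambda e: pos[e[0]]):
        at[v] = max(at[v], at[u] + d)
    return at


def reverse_rt(order, edges):
    """RT(u): longest path delay from u to the sink (reverse propagation)."""
    pos = {v: i for i, v in enumerate(order)}
    rt = {v: 0.0 for v in order}
    for u, v, d in sorted(edges, key=lambda e: pos[e[1]], reverse=True):
        rt[u] = max(rt[u], d + rt[v])
    return rt


def cut_delay(order, edges, k, at, rt):
    """Circuit delay evaluated only on edges crossing from order[:k] to order[k:]."""
    left = set(order[:k])
    return max(at[u] + d + rt[v] for u, v, d in edges if u in left and v not in left)


order = ["s", "g1", "g2", "g3", "g4", "t"]
edges = [("s", "g1", 1.0), ("s", "g2", 2.0), ("g1", "g3", 3.0), ("g2", "g3", 1.0),
         ("g2", "g4", 2.5), ("g3", "t", 1.5), ("g4", "t", 2.0)]
at, rt = forward_at(order, edges), reverse_rt(order, edges)
k = order.index("g2") + 1          # left sub-graph: s, g1, g2

# Hypothetically size up g2: its output edges become 2x faster.
perturbed = [(u, v, d * 0.5 if u == "g2" else d) for u, v, d in edges]
incremental = cut_delay(order, perturbed, k, at, rt)   # reuses unperturbed AT and RT
full = forward_at(order, perturbed)["t"]               # naive full recomputation
print(incremental, full)                               # prints 5.5 5.5
```

In the deterministic case the two results always agree. With statistical max operations, the agreement holds only when forward and reverse SSTA are consistent, which is the condition stated on the slide.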


Statistical Yield Optimization Results

  • Initial yield ~0-2% due to inverse correlation

  • Gate sizing alone provides good improvements

  • Combined with Lgate biasing, provides outstanding results

Chopra, et al., ICCAD05


Another approach to statistical optimization

  • General statistical optimization

    • Method relies on efficient deterministic formulations and variation space sampling to drive statistical optimization

    • Applicable to many mainstream VLSI design problems: gate sizing, Vth assignment, Leff biasing as well as potential new levers


[Figure: body-bias (BB) controller]

Statistically Optimized Body Bias Clustering for Post-Silicon Tuning

  • Concept: speed up critical gates using FBB and slow down non-critical gates using RBB to meet timing and power constraints

  • Traditional view: a centralized body bias generator controlling different die regions

    • Ineffective for compensating intra-die variations

    • Highly suboptimal power

[Figure: critical and non-critical gates]


[Figure: critical gates; delay and power correlated]

Coarse Body Bias Assignment

[Figure: one bias for all gates]

  • Simplified assignment minimizing routing overheads

  • Biasing dictated by placement instead of gate criticality

  • Disregards complex dependence of gate criticality on:

    • Circuit topology

    • Correlations in process variations

  • Effective in tightening delay but leads to high power

→ Important to cluster gates to leverage ABB effectively


Proposed New Optimization Framework

  • Generate sample scenarios: per-gate Leff values (Leff_i.1, Leff_i.2, ..., Leff_i.x for gate i in scenarios 1 to x), sampled with gate-to-gate correlations ρi,j

  • Solve BB assignment for each scenario: DETERMINISTICALLY optimize each scenario (i.e., tune each gate for each die scenario)

  • Generate PDFs of optimal actions (a BB-PDF per gate)

  • Clustering

  • Post-silicon tuning
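The framework's flow has four steps: sample die scenarios, optimize each one deterministically, collect a per-gate PDF of optimal actions, then cluster. The toy sketch below walks through that flow. Every modeling choice in it is a hypothetical stand-in. Correlation is modeled as a shared die-level Leff term plus an independent local term. The per-scenario "optimizer" is a simple threshold rule, with FBB if a gate misses its delay budget and RBB if it has more than 20% spare. Clustering snaps each gate's expected bias to k evenly spaced levels. The real work solves a full deterministic optimization per scenario.

```python
import random

# Hypothetical encoding of body-bias actions: -1 = RBB, 0 = zero bias, +1 = FBB.
ACTIONS = (-1, 0, 1)


def sample_scenarios(n_gates, n_scen, sigma_die, sigma_local, seed=0):
    """Sample normalized Leff for every gate in every die scenario.
    Assumption: Leff = 1 + shared die-level term + independent local term,
    a crude stand-in for the gate-to-gate correlations rho_ij."""
    rng = random.Random(seed)
    scenarios = []
    for _ in range(n_scen):
        die = rng.gauss(0.0, sigma_die)
        scenarios.append([1.0 + die + rng.gauss(0.0, sigma_local) for _ in range(n_gates)])
    return scenarios


def optimize_scenario(leff, nominal_delay, budget):
    """Toy deterministic per-scenario optimizer (hypothetical rule): FBB for gates
    that miss their delay budget, RBB for gates with more than 20% spare budget,
    zero bias otherwise. Delay is assumed to scale linearly with Leff."""
    actions = []
    for l, d0, b in zip(leff, nominal_delay, budget):
        delay = d0 * l
        if delay > b:
            actions.append(1)
        elif delay < 0.8 * b:
            actions.append(-1)
        else:
            actions.append(0)
    return actions


def action_pdfs(scenarios, nominal_delay, budget):
    """Empirical PDF of the optimal body-bias action of each gate."""
    counts = [{a: 0 for a in ACTIONS} for _ in nominal_delay]
    for leff in scenarios:
        for i, a in enumerate(optimize_scenario(leff, nominal_delay, budget)):
            counts[i][a] += 1
    return [{a: c / len(scenarios) for a, c in cnt.items()} for cnt in counts]


def cluster(pdfs, k):
    """Group gates into k >= 2 bias clusters by snapping each gate's expected
    optimal bias to the nearest of k evenly spaced levels in [-1, 1]."""
    levels = [-1.0 + 2.0 * j / (k - 1) for j in range(k)]
    labels = []
    for pdf in pdfs:
        mean = sum(a * p for a, p in pdf.items())
        labels.append(min(range(k), key=lambda j: abs(levels[j] - mean)))
    return labels


# Hypothetical 4-gate example: gates 0 and 1 are near-critical, 2 and 3 have slack.
nominal = [1.0, 1.0, 0.6, 0.5]
budget = [1.02, 1.02, 1.0, 1.0]
pdfs = action_pdfs(sample_scenarios(4, 1000, 0.03, 0.02), nominal, budget)
labels = cluster(pdfs, 3)
print(labels)
```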


Results vs. Traditional Dual-Vth

  • Delay

    • 3-9X tighter σ

  • Leakage power

    • Dual-Vth vs. 2-4 ABB clusters

      • Avg. 28-38% (51-59%) lower μ (95th)

  • Area

    • Capo generates contiguous regions of similarly clustered cells while minimally displacing cells

      • 5-8% increase in wirelength and area


A few conclusions

  • Parametric yield is a critical design objective going forward

    • Requires accurate estimation and fast optimization approaches to this key metric

    • Envision all tools in 4-6 years being yield-driven, rather than timing or power alone

  • Lots of room for improvement in many ‘well-studied’ CAD problems today

    • Recent examples: dual-Vth + sizing, placement (Cong et al.)

