Directions in Low-Power CAD

Dennis Sylvester

University of Michigan

[email protected]

http://vlsida.eecs.umich.edu

With acknowledgements to: Prof. David Blaauw, Dr. Sarvesh Kulkarni, Saumil Shah, Kavi Chopra


Topics

  • A new dual-Vth assignment formulation

  • Dual-Vdd power distribution

  • Approaches to parametric yield optimization: statistical leakage + delay


Motivation

  • We require high-performance yet low-power circuits

  • Leakage power contributes significantly to total power

  • An all-high-Vth implementation is too slow

  • An all-low-Vth implementation is too leaky

  • Dual-Vth processes are popular

  • Problem Definition

    • Minimize

      • Total Circuit Power

    • Subject to

      • Circuit Delay Constraint

      • Sizing Constraints

    • Optimization Variables

      • Gate Sizes

      • Gate Threshold Voltages

[Figure: total power split into switching and subthreshold leakage components. Source: S. Narendra et al. [ICCAD ’03]]


Gate Sizing + Vth Assignment Problem: Prior Work

  • Traditionally a discrete problem

  • Previous approaches

    • Separate Sizing and Vth Assignment

    • Mixed Integer Non-Linear Programming

    • Sensitivity-based methods (DUET, etc.)

    • Continuous formulation [Chen, ASP-DAC ’05]

      • Very reliant on discretization heuristic


Proposed Approach – Self-snapping formulation

  • Continuous formulation: a wide variety of algorithms and powerful non-linear optimizers can be used

  • The solution assigns almost all gates to one of the two available threshold voltages

  • The small fraction of gates left with intermediate Vth values can be handled heuristically

  • The discretization algorithm has negligible power impact and can be very simple


Proposed Approach – Mixed-Vth Gates

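  • Consider each gate to be a parallel combination of high-Vth (HVt) and low-Vth (LVt) gates

[Figure: an HVt gate and an LVt gate connected in parallel form a mixed gate]

  • RC delay model

  • Linear power model: mixed-gate power is the sum of the HVt gate and LVt gate contributions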
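One way the two models could be written down is sketched here. The notation ($w_i^H$, $w_i^L$, $g_H$, $g_L$, $p_H$, $p_L$, $c$, $C_i^{\text{wire}}$) is an assumption introduced for illustration; the slide's own equations may differ in detail.

```latex
% Mixed gate i: HVt device of width w_i^H in parallel with LVt device of width w_i^L.
% Assumed per-unit-width parameters (not taken from the slides):
%   g_H < g_L : drive conductance (LVt is faster)
%   p_H < p_L : power             (LVt is leakier)
%   c         : input capacitance
\begin{align}
  R_i &= \frac{1}{g_H\, w_i^H + g_L\, w_i^L}
      && \text{(parallel drive resistance)} \\
  d_i &= R_i \Bigl( C_i^{\text{wire}}
         + c \sum_{j \in \mathrm{fo}(i)} \bigl( w_j^H + w_j^L \bigr) \Bigr)
      && \text{(RC delay)} \\
  P_i &= p_H\, w_i^H + p_L\, w_i^L
      && \text{(linear power)}
\end{align}
```

Setting $w_i^L = 0$ or $w_i^H = 0$ recovers an ordinary HVt or LVt gate, which is why a solution with one of the two widths at zero counts as "snapped".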


Complete Dual-Vth Problem Formulation

  • Similar to the single-Vth gate sizing problem, with simple gate delays replaced by high-Vth/low-Vth parallel combinations

  • Minimize

  • Subject to:
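The objective and constraints are not reproduced in this transcript. A plausible shape, written as a standard arrival-time sizing formulation, is sketched below. All symbols are assumptions for illustration: $w_i^H, w_i^L$ are the HVt and LVt widths of gate $i$, $g_H, g_L$ and $p_H, p_L$ are per-unit-width drive conductance and power, $c$ is input capacitance per unit width, $a_i$ is the arrival time at the output of gate $i$, and $T_{\max}$ is the delay constraint.

```latex
\begin{align}
  \text{minimize}\quad
    & \sum_i \bigl( p_H\, w_i^H + p_L\, w_i^L \bigr) \\
  \text{subject to}\quad
    & a_j + d_i \le a_i
      && \forall\, j \in \mathrm{fi}(i) \\
    & a_i \le T_{\max}
      && \forall\, i \in \text{primary outputs} \\
    & d_i = \frac{C_i^{\text{wire}} + c \sum_{k \in \mathrm{fo}(i)} \bigl( w_k^H + w_k^L \bigr)}
                 {g_H\, w_i^H + g_L\, w_i^L} \\
    & w_i^H \ge 0, \quad w_i^L \ge 0, \quad
      w_{\min} \le w_i^H + w_i^L \le w_{\max}
\end{align}
```

The variables are continuous, so no integer Vth choice appears anywhere; discreteness has to emerge from the structure of the optimum.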


Proof of Discretized Solution

  • Conceptually separate optimization process into two distinct phases:

    • D-Phase : Fix delays of all gates

    • W-Phase : Find the minimum-power sizing solution that satisfies the chosen D vector

  • The separation is hypothetical and used only for the proof; it is not implemented in the actual optimization procedure


W-Phase

  • It is sufficient to prove that the optimal solution is discrete under an arbitrary D vector

  • W-Phase formulation

  • Minimize

  • Subject to:


W-Phase

  • With the D vector fixed, the W-Phase is a linear programming problem

  • n basic variables, n non-basic variables

  • Therefore, only n non-zero variables

  • Every gate is snapped to either high-Vth or low-Vth

  • Adding upper and lower bounds on total size leads to some non-snapped gates

  • Their number is extremely small, so a simple heuristic achieves good results
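The vertex argument above can be checked on a toy instance. The sketch below is an illustration under stated assumptions, not the talk's implementation: a two-gate chain, made-up per-unit-width parameters, and delay constraints treated as equalities so that there are n = 2 constraints over 2n = 4 width variables. It enumerates every feasible basic solution and picks the cheapest.

```python
from itertools import combinations

# Hypothetical per-unit-width parameters (illustrative values, not from the talk).
G_H, G_L = 1.0, 2.0      # drive conductance: low-Vth is faster
P_H, P_L = 1.0, 2.5      # power: low-Vth is leakier
C_IN = 1.0               # input capacitance per unit width
D1, D2 = 0.5, 1.0        # gate delays fixed in the D-phase
C_LOAD = 4.0             # fixed load on gate 2

# Variables x = [wH1, wL1, wH2, wL2]; fixed-delay constraints as A x = b.
K = C_IN / D1
A = [[G_H, G_L, -K, -K],     # gate 1 must drive the input cap of gate 2 in D1
     [0.0, 0.0, G_H, G_L]]   # gate 2 must drive C_LOAD in D2
b = [0.0, C_LOAD / D2]
cost = [P_H, P_L, P_H, P_L]

def basic_solutions():
    """Yield every feasible basic solution (vertex) of {A x = b, x >= 0}."""
    for i, j in combinations(range(4), 2):
        det = A[0][i] * A[1][j] - A[0][j] * A[1][i]
        if abs(det) < 1e-12:
            continue  # columns i and j do not form a basis
        # Cramer's rule for the 2x2 system in the two basic variables.
        xi = (b[0] * A[1][j] - A[0][j] * b[1]) / det
        xj = (A[0][i] * b[1] - b[0] * A[1][i]) / det
        if xi < -1e-12 or xj < -1e-12:
            continue  # violates non-negativity
        x = [0.0] * 4
        x[i], x[j] = xi, xj
        yield x

def power(x):
    return sum(c * v for c, v in zip(cost, x))

vertices = list(basic_solutions())
best = min(vertices, key=power)
best_power = power(best)
print(best, best_power)
```

In this instance every feasible vertex has exactly one non-zero width per gate, and the cheapest one uses a high-Vth gate 1 and a low-Vth gate 2. Once size bounds are added as extra constraints, more variables can be basic, which matches the slide's remark that bounds introduce a few non-snapped gates.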


Practical Constraint – Fixed-Width Input Drivers

  • Sequential elements driving the combinational circuit

  • Delay of these elements affected by primary input widths

  • Modeled as fixed-width drivers


Extension of Discretization Analysis

  • m+n constraints in the optimization problem

  • n+m basic variables, n-m non-basic variables

  • Therefore, n+m positive variables

  • Total number of non-snapped gates bounded by number of inputs

    • Once again, small in number; can be handled heuristically

    • In practice, number of non-snapped gates found to be much less than the number of inputs


Discretization Heuristics

  • Iterative snapping

    • Round gates to the closer Vth and re-optimize, repeating until no non-snapped gates remain

  • Single-pass Vth assignment

    • Fix all gates to the closer Vth and re-optimize only for gate sizes

  • The second heuristic is faster, with negligible power impact
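The single-pass heuristic could be sketched as follows. This is a minimal illustration, assuming "closer Vth" means the Vth flavor that supplies most of a mixed gate's drive. The function name, the `(w_high, w_low)` representation, and the conductance values are hypothetical, and the sizing-only re-optimization that follows is not shown.

```python
def snap_single_pass(widths, g_h=1.0, g_l=2.0):
    """Snap each mixed gate to the Vth that supplies most of its drive.

    widths: list of (w_high, w_low) pairs from the continuous solution.
    g_h, g_l: assumed drive conductance per unit width for HVt and LVt.
    Returns a list of 'H' / 'L' labels, one per gate.
    """
    labels = []
    for w_high, w_low in widths:
        drive_high = g_h * w_high
        drive_low = g_l * w_low
        labels.append('H' if drive_high >= drive_low else 'L')
    return labels

# Two already-snapped gates followed by two gates with intermediate Vth.
continuous = [(4.0, 0.0), (0.0, 2.0), (1.5, 0.2), (0.3, 1.0)]
labels = snap_single_pass(continuous)
print(labels)
```

Because only a handful of gates end up with intermediate values, the labels change for very few gates, which is consistent with the claim that the faster heuristic costs little power.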


Results

  • Snapping properties of some circuits

  • # of non-snapped gates is very small

  • Dominated by gates at upper and lower size bounds

  • Approach is easily extendable to multi-Vth AND multi-Lgate


Results

  • Power and runtime comparisons between proposed approach and sensitivity-based algorithm at 2% timing backoff (results shown for larger circuits only)

  • Average: 31% leakage reduction vs. previous approaches


Topics

  • A new dual-Vth assignment formulation

  • Dual-Vdd power distribution

  • Approaches to parametric yield optimization: statistical leakage + delay


Multiple supply design

[Figure: VDDH and VDDL gates between flip-flops (FF). A VDDL-swing input (IN) driving a VDDH gate causes DC current, hence the need for level conversion.]

  • Relies on applying a lower supply (VDDL) to gates along non-critical paths thus reducing power while meeting timing

  • A flexible fine-grained VDD assignment scheme promises best power reduction

    • Gate-level Extended Clustered Voltage Scaling

  • However, physical design and power delivery are complicated


Implications of using multiple supplies

[Figure: CVS vs. ECVS assignment of gates between IN and OUT along critical and non-critical paths]

Coupled issues:

  • Circuits: level shifting

  • Algorithms: VDD assignment

  • Physical design: VDD granularity (fine-grained vs. islanding)

  • Power delivery: distribution and generation


Power delivery for dual-VDD circuits

  • Power grid integrity vital for circuit performance

  • Dual-VDD circuits require two supply voltages for operation

  • Fine-grained dual-VDD can place VDDL/VDDH gates arbitrarily on the die

  • Implications at the board, package and die level

    • Fixed resources need to be split between VDDL and VDDH

  • However, load on each supply is lower than on original single supply:

    Power supply current demanded by a dual-VDD circuit is significantly lower than the corresponding single-VDD circuit, allowing robust power delivery within available resources (decap, C4, wiring)


Reduced current load on VDDL/VDDH

[Figure legend: VDD, ECVS]

  • Gate level comparison

    • Avg. 54% (33%) for VDDL = 0.8V (0.6V)

  • Circuit level comparison

    • Avg. 49% (51%) and 28% (14%) for VDDH and VDDL for 0.8V (0.6V)


Package level results

[Figure: power delivery model from board to die with separate VDDH and VDDL paths. Each path includes motherboard (Rmb1/Lmb1, Rmb2/Lmb2), socket (Rskt/Lskt), and package (Rpkg/Lpkg) parasitics, plus hf, pkg_cap, blk, and die R/L/C elements, feeding the VDDH load I(VDDH) and the VDDL load I(VDDL).]

  • Two VRMs on board to supply VDDL and VDDH

  • Ground path can be shared by VDDL and VDDH

  • Decoupling capacitance divided in the ratio of current loads

  • Similar power supply noise with same resources as single-VDD case (decoupling capacitance, C4s)

Intel, “Intel Pentium 4 processor in the 432 pin/Intel 850 Chipset Platform,” 2002.


Dual-VDD physical design alternatives

[Figure: row-based layouts for single-VDD, dual-VDD segregated, and dual-VDD fine-grained placement, showing VDDH, VDDL, and GND rails and rows labeled “VDDH + VDDL row”]

Segregated placement constrains placer leading to higher core area and wirelength

C. Yeh, et al., “Layout techniques supporting the use of dual supply voltages for cell-based designs,” Proc. DAC, 1999.

M. Igarashi, et al., “A low-power design method using multiple supply voltages,” Proc. ISLPED, 1997.


Dual-VDD power grid alternatives

[Figure: dual-VDD standard cell topologies. Single-VDD: VDD and GND rails. Dual-VDD shared-GND: 3-rail cell with VDDH, VDDL, and a shared GND. Dual-VDD dual-GND: 4-rail cell with VDDH/GNDH and VDDL/GNDL.]

  • Routing the power supply rails

  • Dual-VDD Dual-GND requires two separate grounds off-chip and complicates timing analysis and design of the board itself

  • Multi-rail standard cells can be used to realize the dual-VDD grids, which allows the placer to operate with no constraints


Dual-VDD on-chip power grid design

  • Guidelines while designing the dual-VDD grid:

    • Scale wires with respect to the single-VDD grid according to how the current demand has scaled

    • VDDL gates are more sensitive to grid noise, which matters because ground is shared

      • 120mV noise is 10% for a 1.2V gate, but 20% for a 0.6V gate

    • Placement of VDDL and VDDH gates: assign more wiring resources to the VDDL grid in areas where there is more demand for VDDL current

    • Consider effects that arise from the board and package level, such as shared C4s

      • Fewer C4s lead to higher effective package R, L
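The noise-sensitivity figures in the guidelines are simple ratios of noise amplitude to supply voltage. A minimal sketch (the function name is hypothetical):

```python
def noise_fraction(noise_mv, vdd_mv):
    """Return supply noise as a fraction of the gate's supply voltage."""
    return noise_mv / vdd_mv

# The same 120 mV of shared-ground noise seen by a 1.2 V gate and a 0.6 V gate.
for vdd_mv in (1200, 600):
    print(f"{vdd_mv} mV supply: {noise_fraction(120, vdd_mv):.0%} noise")
```

The same absolute bounce on the shared ground therefore takes twice the relative margin from a 0.6V gate, which is why the VDDL grid deserves the extra attention.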


Proposed technique D-Place

[Figure: D-Place flow. Inputs: the original single-VDD design and the derived dual-VDD design, single-VDD and dual-VDD Lib files, and the placement database (Cadence). Steps: obtain current consumption of the single/dual-VDD designs (SPICE); break down the die into “local” and “regional” areas; calculate local, regional, global, and effective α and β for each wire segment; size each wire segment in each local area using effective α, β and simulate the grid (VDDH, VDDL, GND); measure voltage droop/bounce and wire congestion.]

  • Partition the chip floorplan

  • Obtain effective α and β as follows

  • Let α = I(VDDH)/I(VDD) and β = I(VDDL)/I(VDD)

  • Scale wires as follows
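The scaling rule itself is not reproduced in this transcript. As a purely illustrative sketch, suppose the effective ratios are a weighted blend of the local, regional, and global current ratios, and the single-VDD wire width is split between the VDDH and VDDL rails in proportion to them. Both choices, the weights, and the function names are assumptions and not the D-Place equations.

```python
def effective_ratio(local, regional, global_, weights=(0.5, 0.3, 0.2)):
    """Blend local, regional and global current ratios (hypothetical weights)."""
    w_local, w_regional, w_global = weights
    return w_local * local + w_regional * regional + w_global * global_

def split_wire_width(single_vdd_width, alpha_eff, beta_eff):
    """Split one single-VDD wire width between VDDH and VDDL rails.

    alpha_eff ~ I(VDDH)/I(VDD), beta_eff ~ I(VDDL)/I(VDD) for this segment.
    Total metal is kept equal to the single-VDD budget (an assumption).
    """
    total = alpha_eff + beta_eff
    vddh_width = single_vdd_width * alpha_eff / total
    vddl_width = single_vdd_width * beta_eff / total
    return vddh_width, vddl_width

alpha_eff = effective_ratio(local=0.6, regional=0.5, global_=0.4)
beta_eff = effective_ratio(local=0.2, regional=0.25, global_=0.3)
vddh_width, vddl_width = split_wire_width(10.0, alpha_eff, beta_eff)
print(vddh_width, vddl_width)
```

A segment in a region dominated by VDDL gates gets a larger local β and therefore a wider VDDL rail, which is the behavior the guidelines ask for.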


Peak voltage drop comparisons

[Figure: peak voltage drop comparisons, one panel for VDDL = 0.6V and one for VDDL = 0.8V]

  • D-Place grids better than single-VDD grids in AVG cases

  • Inferior by < 2.6% (≈15mV) in some MAX cases

  • 0.6V VDDL as robust as 0.8V

  • 0.6V also provides higher power savings

  • Proposed approach better by 2-7% (AVG) and 7-12% (MAX) compared to prior approaches


Voltage variation across die

  • Voltage drop contours

  • Wiring congestion is similar for dual-VDD and single-VDD grids

  • Lower current demand allows a smaller amount of decoupling capacitance and hence lower leakage (or the same decap can be kept for better performance)

Dual-VDD grid no less robust than single-VDD grid


Topics

  • A new dual-Vth assignment formulation

  • Dual-Vdd power distribution

  • Approaches to parametric yield optimization: statistical leakage + delay


Introduction

[Figure: mapping from process parameter space (Vth, Leff), subject to optical proximity effect variation and chemical mechanical polishing variation, to chip performance space (delay, power). Low-leakage dies with poor timing cause timing yield loss; dies with good timing but high leakage cause power yield loss.]

This Work: Optimize the timing and power yield using gate sizing


Problem Description

  • Nonlinear Continuous Optimization

    • Objective: Maximize Timing and Power Yield

      Yield: A utility function defined w.r.t the JPDF of leakage and timing

    • Decision Variables: Gate Size

[Figure: yield region bounded by the timing constraint Tconst and the power constraint Pconst]

  • Efficient implementation requires

    • Computing yield as function of decision variables - gate size

    • Fast and Accurate Gradient computation
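To make the yield definition concrete, the sketch below estimates yield by Monte Carlo sampling, assuming delay is Gaussian and log(leakage) is Gaussian with correlation ρ between them. This sampling estimator is a stand-in for illustration only; the work described here computes yield analytically so that gradients with respect to gate size are available. The function name and all numeric values are hypothetical.

```python
import math
import random

def mc_yield(mu_d, sigma_d, mu_l, sigma_l, rho, t_const, p_const,
             n_samples=20000, seed=0):
    """Estimate P(delay <= t_const and leakage <= p_const).

    Delay ~ N(mu_d, sigma_d^2); log(leakage) ~ N(mu_l, sigma_l^2);
    rho is the correlation between delay and log(leakage).
    """
    rng = random.Random(seed)
    log_p_const = math.log(p_const)
    passed = 0
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        delay = mu_d + sigma_d * z1
        # Correlate log-leakage with delay through the shared draw z1.
        log_leak = mu_l + sigma_l * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        if delay <= t_const and log_leak <= log_p_const:
            passed += 1
    return passed / n_samples

# Constraints placed at the medians of both distributions.
y_uncorrelated = mc_yield(1.0, 0.1, 0.0, 0.5, 0.0, t_const=1.0, p_const=1.0)
y_inverse = mc_yield(1.0, 0.1, 0.0, 0.5, -0.8, t_const=1.0, p_const=1.0)
print(y_uncorrelated, y_inverse)
```

With both constraints at the medians, the bivariate-normal orthant formula gives a yield of 1/4 + arcsin(ρ)/(2π), so a strong inverse correlation between delay and leakage lowers the joint yield well below the uncorrelated value of 25%.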


Power and Timing Yield Analysis (see DAC05 for more detail)

  • Timing analysis [Sapatnekar03, Chandu05]: delay d described by (μd, σd)

  • Power analysis: log(leakage) l described by (μl, σl)

  • Correlation (1 parameter): ρ

  • Delay and power bivariate JPDF (μd, σd, μl, σl, ρ)


Cut Set SSTA: Intuition

  • Consider a timing graph

[Figure: timing graph with nodes 1 to 10, compared against traditional incremental timing. When gate 7 is sized up, the left and right subgraphs are unperturbed, so their arrival times (AT) and required arrival times (RT) are reused. Circuit delay is the max over the cut edge times (CT).]

  • If forward SSTA and reverse SSTA are equivalent, cut-set SSTA gives exactly the same sensitivities as the naïve approach, which recomputes yield over all nodes even though most are unchanged
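The cut-set idea can be illustrated with a deterministic analogue, where delays are plain numbers rather than distributions and max is an ordinary max. The graph, delays, and function names below are made up for illustration; the statistical version replaces these sums and maxes with their SSTA counterparts.

```python
# Edge delays of a small timing graph: (u, v) -> delay of edge u -> v.
EDGES = {
    ('s', 'a'): 2.0, ('s', 'b'): 1.0,
    ('a', 'c'): 3.0, ('b', 'c'): 2.0, ('b', 'd'): 4.0,
    ('c', 't'): 1.0, ('d', 't'): 2.0,
}
TOPO = ['s', 'a', 'b', 'c', 'd', 't']  # a topological order of the nodes

def forward_at(edges):
    """Arrival time: longest path from the source to each node."""
    at = {n: 0.0 for n in TOPO}
    for v in TOPO:
        for (u, w), d in edges.items():
            if w == v:
                at[v] = max(at[v], at[u] + d)
    return at

def backward_time(edges):
    """Longest path from each node to the sink (reverse traversal)."""
    bt = {n: 0.0 for n in TOPO}
    for u in reversed(TOPO):
        for (x, v), d in edges.items():
            if x == u:
                bt[u] = max(bt[u], d + bt[v])
    return bt

def cut_delay(edges, cut, at, bt):
    """Circuit delay as the max over cut edges of AT + edge delay + time to sink."""
    return max(at[u] + edges[(u, v)] + bt[v] for (u, v) in cut)

at = forward_at(EDGES)
bt = backward_time(EDGES)
CUT = [('a', 'c'), ('b', 'c'), ('b', 'd')]  # separates {s, a, b} from {c, d, t}
nominal = cut_delay(EDGES, CUT, at, bt)

# Perturb one cut edge (e.g. a resized gate); AT left of the cut and the
# backward times right of the cut are unchanged, so they are reused as-is.
perturbed = dict(EDGES)
perturbed[('a', 'c')] = 5.0
updated = cut_delay(perturbed, CUT, at, bt)
print(nominal, updated)
```

Only the cut edges are re-evaluated after the perturbation, yet the result matches a full forward pass over the perturbed graph.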


Statistical Yield Optimization Results

  • Initial yield ~0-2% due to inverse correlation

  • Gate sizing alone provides good improvements

  • Combined with Lgate biasing, provides outstanding results

Chopra, et al., ICCAD05


Another approach to statistical optimization

  • General statistical optimization

    • Method relies on efficient deterministic formulations and variation space sampling to drive statistical optimization

    • Applicable to many mainstream VLSI design problems: gate sizing, Vth assignment, Leff biasing as well as potential new levers


Statistically Optimized Body Bias Clustering for Post-Silicon Tuning

  • Concept: speed up critical gates using forward body bias (FBB) and slow down non-critical gates using reverse body bias (RBB) to meet timing and power constraints

  • Traditional view: centralized body bias generator controlling different die regions

    • Ineffective for compensating intra-die variations

    • Highly suboptimal power

[Figure: body bias (BB) controller driving regions of critical and non-critical gates]


Coarse Body Bias Assignment

[Figure: one bias for all gates; critical gates highlighted, with delay and power shown as correlated]

  • Simplified assignment minimizing routing overheads

  • Biasing dictated by placement instead of gate criticality

  • Disregards complex dependence of gate criticality on:

    • Circuit topology

    • Correlations in process variations

  • Effective in tightening delay but leads to high power

Therefore, it is important to cluster gates to leverage adaptive body bias (ABB) effectively


Proposed New Optimization Framework

[Figure: framework flow on a seven-gate example, where each gate i has effective channel length Leff_i.k in scenario k and gate parameters are correlated (ρi,j)]

  • Generate sample scenarios (Scenario ‘1’, Scenario ‘2’, ..., Scenario ‘x’)

  • Solve the BB assignment for each scenario: DETERMINISTICALLY optimize each scenario (i.e., tune each gate for each die scenario)

  • Generate PDFs of optimal actions (a BB-PDF per gate)

  • Clustering

  • Post-silicon tuning
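The flow can be sketched end to end on toy data. Everything below is a hypothetical stand-in: the per-scenario "optimizer" is a fixed rule rather than a real deterministic optimization, and gates are clustered by the mean of their optimal-bias samples rather than by their full PDFs. It is meant only to show how sampled scenarios turn into per-gate bias distributions and then into clusters.

```python
import random

def sample_scenarios(n_gates, n_scenarios, sigma=0.02, seed=1):
    """Draw per-gate Leff deviations for each sampled die scenario."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(n_gates)]
            for _ in range(n_scenarios)]

def solve_bias(leff_dev, critical):
    """Toy stand-in for the per-scenario deterministic optimizer.

    Positive values mean forward body bias (speed up); negative values
    mean reverse body bias (slow down, lower leakage).
    """
    return [(0.2 if crit else -0.2) + 2.0 * dev
            for dev, crit in zip(leff_dev, critical)]

def cluster_by_mean_bias(bias_samples, n_clusters):
    """Group gates whose optimal-bias samples have similar means."""
    n_gates = len(bias_samples[0])
    means = [sum(s[g] for s in bias_samples) / len(bias_samples)
             for g in range(n_gates)]
    order = sorted(range(n_gates), key=lambda g: means[g])
    size = -(-n_gates // n_clusters)  # ceiling division
    clusters = [0] * n_gates
    for rank, g in enumerate(order):
        clusters[g] = rank // size
    return clusters

critical = [True, True, False, False, True, False]
scenarios = sample_scenarios(len(critical), 200)
bias_samples = [solve_bias(s, critical) for s in scenarios]
clusters = cluster_by_mean_bias(bias_samples, 2)
print(clusters)
```

After fabrication, each cluster would receive one tunable bias, so the number of clusters sets the trade-off between routing overhead and how closely the per-gate optimal biases can be followed.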


Results vs. Traditional Dual-Vth

  • Delay

    • 3-9X tighter σ

  • Leakage power

    • Dual-Vth vs. 2-4 ABB clusters

      • Avg. 28-38% (51-59%) lower μ (95th)

  • Area

    • Capo generates contiguous regions of similarly clustered cells while minimally displacing cells

      • 5-8% increase in wirelength and area


A few conclusions

  • Parametric yield is a critical design objective going forward

    • Requires accurate estimation and fast optimization approaches to this key metric

    • Envision all tools in 4-6 years being yield-driven, rather than timing or power alone

  • Lots of room for improvement in many ‘well-studied’ CAD problems today

    • Recent examples: dual-Vth + sizing, placement (Cong et al.)

