Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction

Yan Lin and Lei He

EE Department, UCLA

Partially supported by NSF.

Address comments to [email protected]


Outline

  • Review and Motivation

  • Chip-level Vdd-level Assignment Algorithms

  • Experimental Results

  • Conclusions


FPGA Power Reduction

  • Existing FPGAs are power inefficient compared to ASICs [Kusse, ISLPED’98]

  • Power-aware FPGA CAD algorithms for existing FPGA architectures

    • CAD algorithms to minimize power-delay product [Lamoureux et al, ICCAD’03]

    • Configuration inversion for leakage reduction [Anderson et al, FPGA’04]

  • Power-efficient FPGA circuits and architectures

    • Dual-Vdd and Vdd-programmable FPGA logic blocks [Li et al, FPGA’04] [Li et al, DAC’04]

    • Vdd-programmable FPGA interconnects [Li et al, ICCAD’04] [Gayasen et al, FPL’04] [Anderson et al, ICCAD’04]


Vdd-programmable Interconnects [Li et al, ICCAD’04]

Power transistor

  • Conventional routing switch

  • Vdd-programmable switch

    • Vdd selection for used switch

    • Power-gating unused switch

      • Reduce leakage by 300X

    • Configurable Vdd-level conversion

      • Avoid excessive leakage when low-Vdd switch drives high-Vdd switches

  • Segment based Vdd-level converter insertion (SLC)

    • Area overhead

      • 35% area overhead for MCNC benchmark circuits

    • Leakage overhead

      • 29% leakage overhead for MCNC benchmark circuits


Previous Approaches w/o LCs

  • [Gayasen et al, FPL’04]

    • Level converters inserted at CLB inputs (outputs)

    • All the routing trees driven by (driving) the source (sink) CLB have the same Vdd-level as the source (sink) CLB

      • Lacking in flexibility

    • A path-based Vdd-level assignment is performed for CLBs and interconnects

  • [Anderson et al, ICCAD’04]

    • VT drop of NMOS is used to generate low-Vdd

    • Positive feedback PMOS is used to tolerate low-Vdd switch driving high-Vdd switches

      • Alternative design of level converter

      • Still has delay and power penalty


Our Major Contributions

  • Proposed two ways to avoid using level converters in interconnects

    • Tree based level converter insertion (TLC)

      • All the switches in one routing tree have the same Vdd-level

    • Dual-Vdd tree based level converter insertion (dTLC)

      • Within one tree, a high-Vdd switch may drive low-Vdd switches, but not vice versa

  • Proposed a few Vdd-level assignment algorithms

    • Sensitivity based algorithms

      • TLC-S and dTLC-S for TLC and dTLC, respectively

    • Linear programming (LP) based algorithm

      • dTLC-LP for dTLC


Problem Formulations

  • Assign Vdd-level to each interconnect switch to minimize interconnect power

    • Meet the delay target Tspec

    • Vdd-level converters

      • are removed within interconnects

      • are inserted at CLB inputs/outputs and can be used when needed

  • Tree based LC insertion (TLC)

    • Allows one type of Vdd-level within one routing tree

  • Dual-Vdd tree based LC insertion (dTLC)

    • Allows a high-Vdd switch to drive low-Vdd switches, but not vice versa


Delay & Power Model with Dual-Vdd

  • Dynamic power

  • Leakage power is pre-characterized using SPICE

  • To incorporate dual-Vdd into timing analysis

    • Pre-characterize the intrinsic delay and effective driving resistance of each switch using SPICE

    • Calculate routing delay using the Elmore delay model
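
The Elmore delay step can be illustrated with a minimal sketch. It assumes a routing tree modeled as an RC tree in which each node has one incoming branch resistance and one lumped capacitance; the function name, the dictionary layout, and the numeric values are hypothetical and are not taken from the slides.

```python
def elmore_delays(parent, resistance, capacitance):
    """Elmore delay from the driver to every node of an RC tree.

    parent[v]      -- parent of node v (None for the root)
    resistance[v]  -- resistance of the branch entering v
    capacitance[v] -- capacitance lumped at v
    All names and the data layout are illustrative assumptions.
    """
    children = {v: [] for v in parent}
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)

    # Breadth-first order: every node appears after its parent.
    order = [root]
    for v in order:
        order.extend(children[v])

    # Total capacitance at or below each node, accumulated leaves-first.
    downstream = dict(capacitance)
    for v in reversed(order):
        if parent[v] is not None:
            downstream[parent[v]] += downstream[v]

    # Each branch adds (its resistance) * (capacitance it has to charge).
    delay = {}
    for v in order:
        upstream = 0.0 if parent[v] is None else delay[parent[v]]
        delay[v] = upstream + resistance[v] * downstream[v]
    return delay


# Hypothetical tree: src drives a, which fans out to sinks k1 and k2.
toy_parent = {"src": None, "a": "src", "k1": "a", "k2": "a"}
toy_r = {"src": 1.0, "a": 2.0, "k1": 1.0, "k2": 3.0}
toy_c = {"src": 1.0, "a": 1.0, "k1": 2.0, "k2": 1.0}
toy_delay = elmore_delays(toy_parent, toy_r, toy_c)
print(toy_delay["k1"], toy_delay["k2"])  # 15.0 16.0
```

Under dual-Vdd, the per-switch resistance (and an intrinsic delay term, omitted here) would be looked up from the SPICE pre-characterization for that switch's Vdd-level.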


Chip-level Assignment Algorithms

  • Tree based level converter insertion (TLC)

    • Sensitivity based algorithm TLC-S

  • Dual-Vdd tree based level converter insertion (dTLC)

    • Sensitivity based algorithm dTLC-S

    • Linear programming (LP) based algorithm dTLC-LP


Sensitivity Based Algorithm TLC-S

  • Iterative assignment

    • Assign low-Vdd to the ‘untried’ tree with maximum power sensitivity in each iteration

    • Reject the assignment if critical path increases

    • Iteration terminates after all trees are ‘tried’

  • Power sensitivity

    • The power reduction by changing Vdd from high-Vdd to low-Vdd

    • Power includes both dynamic and leakage power
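
The iteration above can be sketched as a greedy loop. This is a minimal illustration under stated assumptions, not the authors' implementation: each tree's power sensitivity is taken as a fixed number, "critical path increases" is read as exceeding a delay target `t_spec`, and the timing analysis is replaced by a hypothetical toy function. All names and data are invented for the example.

```python
def tlc_s(trees, sensitivity, critical_path_delay, t_spec):
    """Greedy tree-level assignment; returns the trees assigned low-Vdd."""
    low_vdd = set()
    # Sensitivities are assumed fixed, so one sort visits the untried tree
    # with maximum power sensitivity in each iteration.
    for tree in sorted(trees, key=sensitivity, reverse=True):
        low_vdd.add(tree)                       # tentatively assign low-Vdd
        if critical_path_delay(low_vdd) > t_spec:
            low_vdd.remove(tree)                # reject: timing is violated
    return low_vdd


# Hypothetical toy data: two timing paths, each crossing some routing trees.
TLC_PATHS = {"A": (8.0, ["n1", "n2"]), "B": (9.0, ["n3"])}
TLC_PENALTY = {"n1": 1.0, "n2": 2.0, "n3": 1.0}   # delay added at low-Vdd
TLC_SAVING = {"n1": 5.0, "n2": 3.0, "n3": 4.0}    # power sensitivity


def tlc_toy_delay(low_vdd):
    """Stand-in for timing analysis: the worst path delay."""
    return max(
        base + sum(TLC_PENALTY[t] for t in trees if t in low_vdd)
        for base, trees in TLC_PATHS.values()
    )


tlc_result = tlc_s(TLC_SAVING, TLC_SAVING.get, tlc_toy_delay, t_spec=10.0)
print(sorted(tlc_result))  # ['n1', 'n3']
```

In the toy run, n2 is rejected because lowering it would push path A to 11.0, above the target of 10.0.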


Sensitivity Based Algorithm dTLC-S

  • A “candidate switch” is defined as either

    • A switch that does not drive any other switch, or

    • A switch whose fanout switches have all been assigned low-Vdd

  • Iterative assignment

    • Assign low-Vdd to a candidate switch with maximum power sensitivity in each iteration

    • Reject assignment if critical path increases

    • Iteration terminates when there is no candidate switch
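
A minimal sketch of this switch-level loop follows. It assumes fixed per-switch sensitivities, treats "critical path increases" as exceeding a delay target `t_spec`, and marks a rejected switch as tried so that it is not revisited; the function name, the toy tree, and the toy delay model are hypothetical.

```python
def dtlc_s(fanout, sensitivity, critical_path_delay, t_spec):
    """Greedy switch-level assignment; returns the switches assigned low-Vdd.

    fanout[s] lists the switches driven by switch s (empty for a leaf).
    """
    low_vdd, tried = set(), set()
    while True:
        # Candidate: untried, and every fanout switch is already low-Vdd.
        # all() over an empty list is True, which covers leaf switches.
        candidates = [
            s for s in fanout
            if s not in tried and all(f in low_vdd for f in fanout[s])
        ]
        if not candidates:
            return low_vdd
        best = max(candidates, key=sensitivity)
        tried.add(best)
        low_vdd.add(best)
        if critical_path_delay(low_vdd) > t_spec:
            low_vdd.remove(best)   # reject; its drivers can never qualify


# Hypothetical tree: root drives a and b. Each low-Vdd switch adds 1.0.
DTLC_FANOUT = {"root": ["a", "b"], "a": [], "b": []}
DTLC_SAVING = {"root": 6.0, "a": 2.0, "b": 3.0}


def dtlc_toy_delay(low_vdd):
    """Stand-in for timing analysis."""
    return 10.0 + len(low_vdd)


dtlc_result = dtlc_s(DTLC_FANOUT, DTLC_SAVING.get, dtlc_toy_delay, t_spec=12.0)
print(sorted(dtlc_result))  # ['a', 'b']
```

Although root has the largest sensitivity in this toy case, it only becomes a candidate after a and b are low-Vdd, and by then no slack remains for it.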


LP Based Algorithm dTLC-LP: Overview

Single-Vdd placed and routed netlist → Chip-level Time Slack Allocation → Net-level Bottom-up Assignment → Refinement → Dual-Vdd netlist


dTLC-LP: Single-Net Estimation

  • Slack is represented in multiples of a unit delay

    • The unit is the delay increase of an interconnect segment caused by changing its Vdd from high-Vdd to low-Vdd

  • An example

[Figure: routing tree with switches b1 to b4 driving sink1 and sink2, shown under several slack allocations with s1 = 1 or 2 and s2 = 1 or 3]


dTLC-LP: Single-Net Estimation (Cont.)

  • Given the allocated slacks, estimate the number of low-Vdd switches

  • s_ik: slack for the kth sink in the ith routing tree

  • l_ik: number of switches in the path from the source to the kth sink in the ith tree

  • SL_ij: set of sinks in the fanout cone of the jth switch in the ith tree

  • An example

[Figure: routing tree from Source to sinks with slacks s1, s2 and s3, built up step by step with the annotations s1/l1, s2/l2, s3/l3 and Min(sk/lk)]

  • Theorem: The estimate is a lower bound on the number of low-Vdd switches that can be achieved
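
One plausible reading of this estimate can be sketched in a few lines. The per-switch rule used here, min(1, min over k in SL_ij of s_ik / l_ik) summed over the switches of a tree, is an assumption inferred from the definitions and the Min(sk/lk) annotation, not a formula stated on the slides. The function name and the four-switch tree are hypothetical.

```python
def estimate_low_vdd_switches(path_len, slack, cone_sinks):
    """Estimated number of low-Vdd switches in one routing tree.

    path_len[k]   -- l_ik, switches on the path from the source to sink k
    slack[k]      -- s_ik, slack of sink k in units of the per-switch delay
    cone_sinks[j] -- SL_ij, sinks in the fanout cone of switch j
    ASSUMED rule: switch j contributes min(1, min_k s_ik / l_ik).
    """
    total = 0.0
    for sinks in cone_sinks.values():
        tightest = min(slack[k] / path_len[k] for k in sinks)
        total += min(1.0, tightest)
    return total


# Hypothetical tree: b1 drives b2 and b3; b2 reaches sink1; b3 drives b4,
# which reaches sink2.
EST_PATH_LEN = {"sink1": 2, "sink2": 3}
EST_SLACK = {"sink1": 1, "sink2": 3}
EST_CONES = {
    "b1": {"sink1", "sink2"},
    "b2": {"sink1"},
    "b3": {"sink2"},
    "b4": {"sink2"},
}
est_value = estimate_low_vdd_switches(EST_PATH_LEN, EST_SLACK, EST_CONES)
print(est_value)  # 3.0
```

In this toy tree, three switches (b2, b3, b4) can actually be set to low-Vdd with the given slacks, so the estimate of 3.0 does not exceed what is achievable, in line with the lower-bound theorem.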


dTLC-LP: Full-chip Time Slack Allocation

  • Objective function

    • fs(i): transition density of the ith tree

    • Fn(i): estimated number of low-Vdd switches in the ith tree

    • Directly minimizes dynamic power

    • May help minimize leakage power, which depends exponentially on Vdd-level

  • Constraints

    • Net-based timing constraints

      • For PIs and POs

      • For edges corresponding to routing

      • For edges other than routing

    • Upper bound for useful slack

  • Theorem: The time slack allocation problem is an LP problem
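
One way to read the bullets above as a linear program is sketched below. This is an assumed form, not the authors' exact formulation: a(v) (arrival time at node v), d(u,v) (edge delay at high-Vdd), Δ (the delay increase of one segment going from high-Vdd to low-Vdd), and the per-switch term inside F_n are all introduced here for illustration.

```latex
\begin{align*}
\text{maximize}\quad
  & \sum_{i} f_s(i)\,F_n(i),
  \qquad F_n(i) = \sum_{j} \min\Bigl(1,\ \min_{k \in SL_{ij}} \frac{s_{ik}}{l_{ik}}\Bigr) \\
\text{subject to}\quad
  & a(v) = 0 && \text{for each PI } v \\
  & a(v) \le T_{spec} && \text{for each PO } v \\
  & a(u) + d(u,v) + s_{ik}\,\Delta \le a(v)
      && \text{for a routing edge from the source } u \text{ of tree } i
         \text{ to its } k\text{th sink } v \\
  & a(u) + d(u,v) \le a(v) && \text{for every other edge } (u,v) \\
  & 0 \le s_{ik} \le l_{ik} && \text{upper bound for useful slack}
\end{align*}
```

Under this reading, each min term can be replaced by an auxiliary variable z_ij with z_ij ≤ 1 and z_ij ≤ s_ik / l_ik for every k in SL_ij. Because the objective maximizes a non-negative weighted sum of the z_ij, the problem stays linear, which is consistent with the theorem.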


dTLC-LP: Net-level Bottom-up Assignment

  • Perform bottom-up assignment within each tree to leverage the allocated slacks

  • Bottom-up assignment

    • Assign low-Vdd to switches in the routing tree in a bottom-up fashion

    • Slack is reduced by one unit delay in each step

    • Stop when no slack is left

  • Theorem: The bottom-up assignment is optimal
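
The bottom-up pass can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: slacks are integers in units of the per-switch delay increase, a switch may go to low-Vdd only if all switches it drives are already low-Vdd (the dTLC rule), and every sink in its fanout cone must still have at least one unit of slack. The function name and the toy tree are hypothetical.

```python
def bottom_up_assign(fanout, cone_sinks, slack):
    """Assign low-Vdd from the sinks toward the source of one routing tree.

    fanout[s]     -- switches driven by switch s (empty for a leaf)
    cone_sinks[s] -- sinks in the fanout cone of switch s
    slack[k]      -- allocated slack of sink k, in per-switch delay units
    """
    remaining = dict(slack)
    low_vdd = set()

    # Post-order traversal: every switch is visited after its fanout.
    order, seen = [], set()

    def visit(s):
        if s in seen:
            return
        seen.add(s)
        for f in fanout[s]:
            visit(f)
        order.append(s)

    for s in fanout:
        visit(s)

    for s in order:
        fanout_is_low = all(f in low_vdd for f in fanout[s])
        has_slack = all(remaining[k] >= 1 for k in cone_sinks[s])
        if fanout_is_low and has_slack:
            low_vdd.add(s)
            for k in cone_sinks[s]:
                remaining[k] -= 1   # one unit of slack is used per step
    return low_vdd


# Hypothetical tree: b1 drives b2 and b3; b2 reaches sink1; b3 drives b4,
# which reaches sink2.
BU_FANOUT = {"b1": ["b2", "b3"], "b2": [], "b3": ["b4"], "b4": []}
BU_CONES = {
    "b1": {"sink1", "sink2"},
    "b2": {"sink1"},
    "b3": {"sink2"},
    "b4": {"sink2"},
}
bu_result = bottom_up_assign(BU_FANOUT, BU_CONES, {"sink1": 1, "sink2": 3})
print(sorted(bu_result))  # ['b2', 'b3', 'b4']
```

Here b1 stays at high-Vdd because sink1 has already spent its single unit of slack on b2.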


Outline

  • Review and Motivation

  • Modeling and Problem Formulations

  • Chip-level Vdd-level Assignment Algorithms

  • Experimental Results

  • Conclusions


Experimental Setting

  • Cluster-based Island Style FPGA Structure

    • 100% buffered interconnects, subset switch block

    • Uniform length 4 for all wire segments

  • ITRS 100nm technology

  • Use VPR [Betz-Rose-Marquardt] for placement and routing

  • Use fpgaEva-LP2 [Lin et al, FPGA’05] for power calculation

    • Considering short-circuit power, glitch power and input vector

    • 8% average error compared to SPICE simulation


Interconnect Power Comparison between TLC-S, dTLC-S and dTLC-LP

[Figure: interconnect power (watt, axis from 0 to 0.05), broken down into leakage power and dynamic power, for TLC-S, dTLC-S and dTLC-LP]

  • dTLC-S and dTLC-LP achieve 6.7% and 6.9% less interconnect power compared to TLC-S, respectively

  • Interconnect power breakdown

    • TLC-S, dTLC-S and dTLC-LP have almost the same leakage

    • dTLC-S and dTLC-LP achieve 13.8% and 15.8% less interconnect dynamic power compared to TLC-S, respectively


dTLC-LP compared to SLC and h2lLCi

[Figure: two plots against critical path delay (ns, 12.00 to 15.50) for dTLC-LP, SLC and h2lLCi: percentage of VddL switches and interconnect power (watt); annotated values include 64% and 19%]

  • SLC [Li et al, ICCAD ’04]

    • Segment based level converter inserted in interconnects

    • Sensitivity based assignment algorithm

  • h2lLCi [Gayasen et al, FPL’04]

    • All the routing trees driven by the source CLB have the same Vdd-level as the source CLB

    • Path based assignment algorithm

  • dTLC-LP, SLC and h2lLCi achieve 77.54%, 74.70% and 41.80% low-Vdd switches w/o relaxing Tspec

  • At different delays, dTLC-LP achieves

    • The highest number of low-Vdd switches

    • The lowest power consumption


Runtime Comparison between TLC-S, dTLC-S and dTLC-LP

[Figure: runtime (s, axis from 0.E+00 to 1.E+04) of TLC-S, dTLC-S and dTLC-LP on the MCNC benchmarks alu4, apex2, apex4, elliptic, ex1010, frisc, pdc, s38417 and s38584]

  • TLC-S runs the fastest

  • dTLC-S versus dTLC-LP

    • dTLC-S runs 3X faster than dTLC-LP

    • But achieves similar power consumption


Conclusions and Future Work

  • Proposed two ways to avoid using level converters in Vdd-programmable interconnects

    • Tree based level converter insertion (TLC)

    • Dual-Vdd tree based level converter insertion (dTLC)

  • Developed chip-level dual-Vdd assignment algorithms w/o level converters

    • Sensitivity based algorithms TLC-S and dTLC-S

    • LP based algorithm dTLC-LP

  • Developed dTLC-LP that reduces interconnect power by 64%

  • Developed dTLC-S that obtains slightly smaller power reduction with 3X speedup compared to dTLC-LP

  • Extend chip-level Vdd-level assignment to interconnects using wire segments of different lengths

  • Allocate time slack to logic blocks and interconnects in a uniform fashion


Thank you!