### Leakage Efficient Chip-Level Dual-Vdd Assignment with Time Slack Allocation for FPGA Power Reduction

Yan Lin and Lei He

EE Department, UCLA

Partially supported by NSF.

Address comments to [email protected]

Outline

- Review and Motivation
- Chip-level Vdd-level Assignment Algorithms
- Experimental Results
- Conclusions

FPGA Power Reduction

- Existing FPGAs are power inefficient compared to ASICs [Kusse, ISLPED’98]
- Power aware FPGA CAD algorithms for existing FPGA architectures
  - CAD algorithms to minimize power-delay product [Lamoureux et al, ICCAD’03]
  - Configuration inversion for leakage reduction [Anderson et al, FPGA’04]

- Power efficient FPGA circuits and architectures
  - Dual-Vdd and Vdd-programmable FPGA logic blocks [Li et al, FPGA’04] [Li et al, DAC’04]
  - Vdd-programmable FPGA interconnects [Li et al, ICCAD’04] [Gayasen et al, FPL’04] [Anderson et al, ICCAD’04]

Vdd-programmable Interconnects [Li et al, ICCAD’04]

[Figure label: Power transistor]

- Conventional routing switch

- Vdd-programmable switch
  - Vdd selection for used switch
  - Power-gating unused switch
  - Reduce leakage by 300X

- Configurable Vdd-level conversion
  - Avoid excessive leakage when low-Vdd switch drives high-Vdd switches

- Segment based Vdd-level converter insertion (SLC)
  - Area overhead
    - 35% area overhead for MCNC benchmark circuits
  - Leakage overhead
    - 29% leakage overhead for MCNC benchmark circuits

Previous Approaches w/o LCs

- [Gayasen et al, FPL’04]
  - Level converters inserted at CLB inputs (outputs)
  - All the routing trees driven by (driving) the source (sink) CLB have the same Vdd-level as the source (sink) CLB
  - Lacking in flexibility
  - A path-based Vdd-level assignment is performed for CLBs and interconnects

- [Anderson et al, ICCAD’04]
  - VT drop of NMOS is used to generate low-Vdd
  - Positive feedback PMOS is used to tolerate a low-Vdd switch driving high-Vdd switches
  - Alternative design of level converter
  - Still has delay and power penalty

Our Major Contributions

- Proposed two ways to avoid using level converters in interconnects
  - Tree based level converter insertion (TLC)
    - All the switches in one routing tree have the same Vdd-level
  - Dual-Vdd tree based level converter insertion (dTLC)
    - Only a high-Vdd switch may drive low-Vdd switches in one tree

- Proposed a few Vdd-level assignment algorithms
  - Sensitivity based algorithms
    - TLC-S and dTLC-S for TLC and dTLC, respectively
  - Linear programming (LP) based algorithm
    - dTLC-LP for dTLC

- Tree based LC insertion (TLC)
  - Allows one Vdd-level within one routing tree

- Dual-Vdd tree based LC insertion (dTLC)
  - Allows a high-Vdd switch to drive low-Vdd switches, but not vice versa

- Assign a Vdd-level to each interconnect switch to minimize interconnect power
  - Meet the delay target Tspec
- Vdd-level converters
  - are removed within interconnects
  - are inserted at CLB inputs/outputs and can be used when needed
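
The two legality rules can be written as small checks on one routing tree. The following is a minimal sketch, not the authors' code: the tree is assumed to be given as a map from each switch to the switch that drives it, and the names `satisfies_tlc`, `satisfies_dtlc`, `parent` and the example tree are hypothetical.

```python
from typing import Dict, Optional

HIGH, LOW = "H", "L"


def satisfies_tlc(vdd: Dict[str, str]) -> bool:
    """TLC: every switch in one routing tree uses the same Vdd-level."""
    return len(set(vdd.values())) <= 1


def satisfies_dtlc(parent: Dict[str, Optional[str]], vdd: Dict[str, str]) -> bool:
    """dTLC: a low-Vdd switch must never drive a high-Vdd switch."""
    for switch, driver in parent.items():
        if driver is not None and vdd[driver] == LOW and vdd[switch] == HIGH:
            return False
    return True


# Hypothetical routing tree: b1 is the root switch and drives b2 and b3.
parent = {"b1": None, "b2": "b1", "b3": "b1"}
mixed = {"b1": HIGH, "b2": LOW, "b3": HIGH}    # high drives low: legal for dTLC only
illegal = {"b1": LOW, "b2": HIGH, "b3": LOW}   # low drives high: illegal for both
```

Under these assumptions, `mixed` fails the TLC check but passes the dTLC check, which is the extra freedom dTLC offers.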

Outline

- Review and Motivation
- Chip-level Vdd-level Assignment Algorithms
- Experimental Results
- Conclusions

- Interconnect power

- Dynamic power

- Leakage power is pre-characterized using SPICE

- To incorporate dual-Vdd into timing analysis
- Pre-characterize the intrinsic delay and effective driving resistance of switch using SPICE
- Calculate routing delay using Elmore delay model
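
A dual-Vdd delay calculation along one fully buffered path might look like the sketch below. It assumes each switch is a buffer whose wire and fanout capacitance are lumped into a single load, so the Elmore delay of a stage reduces to the driving resistance times the load capacitance. The table values and the names `SWITCH_MODEL`, `stage_delay` and `path_delay` are hypothetical placeholders, not SPICE data from the talk.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-characterized values per Vdd-level:
# (intrinsic delay in ns, effective driving resistance in kOhm).
SWITCH_MODEL: Dict[str, Tuple[float, float]] = {
    "H": (0.10, 1.0),
    "L": (0.15, 1.5),
}


def stage_delay(vdd: str, load_cap_pf: float) -> float:
    """Intrinsic delay plus the Elmore term R * C of one buffered stage."""
    intrinsic, resistance = SWITCH_MODEL[vdd]
    return intrinsic + resistance * load_cap_pf  # kOhm * pF = ns


def path_delay(stages: List[Tuple[str, float]]) -> float:
    """Sum the stage delays from source to sink; each stage is (Vdd-level, load in pF)."""
    return sum(stage_delay(vdd, cap) for vdd, cap in stages)


all_high = path_delay([("H", 0.2), ("H", 0.2)])
one_low = path_delay([("H", 0.2), ("L", 0.2)])
```

With these placeholder numbers, switching the second stage to low-Vdd raises the path delay from 0.6 ns to 0.75 ns; that increase is what the assignment algorithms must fit into the available slack.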

Chip-level Assignment Algorithms

- Tree based level converter insertion (TLC)
- Sensitivity based algorithm TLC-S

- Dual-Vdd tree based level converter insertion (dTLC)
- Sensitivity based algorithm dTLC-S
- Linear programming (LP) based algorithm dTLC-LP

Sensitivity Based Algorithm TLC-S

- Iterative assignment
  - Assign low-Vdd to the ‘untried’ tree with maximum power sensitivity in each iteration
  - Reject the assignment if the critical path delay increases
  - Iteration terminates after all trees are ‘tried’

- Power sensitivity
  - The power reduction obtained by changing Vdd from high-Vdd to low-Vdd
  - Power includes both dynamic and leakage power
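
The greedy loop can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: timing is modeled as a fixed list of paths, each a sequence of routing trees, and a tree at low-Vdd adds a fixed delay penalty. The names `critical_path`, `tlc_s` and all numbers are hypothetical.

```python
from typing import Dict, List, Set


def critical_path(paths: List[List[str]], base_delay: Dict[str, float],
                  penalty: Dict[str, float], low: Set[str]) -> float:
    """Longest path delay when the trees in `low` run at low-Vdd."""
    return max(
        sum(base_delay[t] + (penalty[t] if t in low else 0.0) for t in path)
        for path in paths
    )


def tlc_s(paths: List[List[str]], base_delay: Dict[str, float],
          penalty: Dict[str, float], sensitivity: Dict[str, float],
          t_spec: float) -> Set[str]:
    """Try each tree once, in order of decreasing power sensitivity."""
    low: Set[str] = set()
    for tree in sorted(sensitivity, key=sensitivity.get, reverse=True):
        low.add(tree)
        if critical_path(paths, base_delay, penalty, low) > t_spec:
            low.remove(tree)  # reject: the delay target would be violated
    return low


paths = [["A", "B"], ["C"]]
base_delay = {"A": 2.0, "B": 2.0, "C": 1.0}
penalty = {"A": 0.5, "B": 0.5, "C": 0.5}
sensitivity = {"A": 3.0, "B": 2.0, "C": 1.0}
tight = tlc_s(paths, base_delay, penalty, sensitivity, t_spec=4.0)
relaxed = tlc_s(paths, base_delay, penalty, sensitivity, t_spec=4.5)
```

Because the sensitivities in this sketch do not change between iterations, sorting once is equivalent to picking the untried tree with maximum sensitivity in each iteration.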

Sensitivity Based Algorithm dTLC-S

- A “candidate switch” is defined as
  - A switch that does not drive any switch, or
  - A switch whose fanout switches have all been assigned low-Vdd

- Iterative assignment
  - Assign low-Vdd to a candidate switch with maximum power sensitivity in each iteration
  - Reject the assignment if the critical path delay increases
  - Iteration terminates when there is no candidate switch
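
A compact sketch of this loop is given below. It assumes that a rejected switch is never reconsidered (otherwise the loop could not terminate) and takes the timing check as a caller-supplied function; `dtlc_s`, `delay_ok`, `within_slack` and the example data are hypothetical names and values.

```python
from typing import Callable, Dict, List, Set


def dtlc_s(children: Dict[str, List[str]], sensitivity: Dict[str, float],
           delay_ok: Callable[[Set[str]], bool]) -> Set[str]:
    """Greedy dTLC-S sketch: returns the set of switches assigned low-Vdd."""
    low: Set[str] = set()
    rejected: Set[str] = set()
    while True:
        # A candidate drives nothing, or drives only low-Vdd switches.
        candidates = [
            s for s in children
            if s not in low and s not in rejected
            and all(c in low for c in children[s])
        ]
        if not candidates:
            return low
        best = max(candidates, key=sensitivity.get)
        low.add(best)
        if not delay_ok(low):
            low.remove(best)      # reject the assignment
            rejected.add(best)    # assumption: do not try it again


# Hypothetical tree: b1 drives b2 and b3; slack is counted in low-Vdd switches per path.
children = {"b1": ["b2", "b3"], "b2": [], "b3": []}
sensitivity = {"b1": 5.0, "b2": 2.0, "b3": 1.0}
path_slack = {("b1", "b2"): 1, ("b1", "b3"): 2}


def within_slack(low: Set[str]) -> bool:
    return all(sum(s in low for s in path) <= slack
               for path, slack in path_slack.items())


result = dtlc_s(children, sensitivity, within_slack)
```

Here `b1` has the largest sensitivity but only becomes a candidate after both of its fanout switches are low, and it is then rejected because the path through `b2` has a single unit of slack.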

LP Based Algorithm dTLC-LP: Overview

Single-Vdd placed and routed netlist → Chip-level Time Slack Allocation → Net-level Bottom-up Assignment → Refinement → Dual-Vdd netlist

dTLC-LP: Single-Net Estimation

- Slack is represented in multiples of the delay increase that an interconnect segment incurs when its Vdd changes from high-Vdd to low-Vdd

- An example

[Figure: routing tree with switches b1 to b4 driving sink1 and sink2, shown in several panels annotated with integer slack values for s1 and s2]

dTLC-LP: Single-Net Estimation (Cont.)

- Given the allocated slacks, estimate number of low-Vdd switches

- s_ik: Slack for kth sink in ith routing tree
- l_ik: Number of switches in the path from source to kth sink in ith tree
- SL_ij: Set of sinks in the fanout cone of jth switch in ith tree

- An example

[Figure: routing tree from the source to sinks s1, s2 and s3; the switches on each path are annotated with s1/l1, s2/l2 and s3/l3, and a shared switch with Min(sk/lk)]

- Theorem: The estimation gives a lower bound of number of low-Vdd switches that can be achieved
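
The estimation formula itself did not survive in this transcript. The sketch below follows one reading of the example's annotations and should be treated as an assumption: each switch contributes the minimum, over the sinks in its fanout cone, of sink slack divided by path length, capped at 1. The function name `estimate_low_vdd_switches` and the example tree are hypothetical.

```python
from typing import Dict, List


def estimate_low_vdd_switches(paths: Dict[str, List[str]],
                              slack: Dict[str, float]) -> float:
    """paths maps each sink to the switches on its source-to-sink path."""
    # Collect, for every switch, the sinks in its fanout cone.
    fanout_sinks: Dict[str, List[str]] = {}
    for sink, switches in paths.items():
        for sw in switches:
            fanout_sinks.setdefault(sw, []).append(sink)

    total = 0.0
    for sw, sinks in fanout_sinks.items():
        ratio = min(slack[k] / len(paths[k]) for k in sinks)
        total += min(1.0, ratio)  # assumed cap: one switch counts at most once
    return total


# Hypothetical tree: b1 drives b2 (to sink1) and b3 (to sink2).
paths = {"sink1": ["b1", "b2"], "sink2": ["b1", "b3"]}
slack = {"sink1": 2, "sink2": 1}
estimate = estimate_low_vdd_switches(paths, slack)
```

For this tree the estimate is 0.5 + 1 + 0.5 = 2, and two low-Vdd switches (`b2` and `b3`) are indeed achievable with these slacks, consistent with the lower-bound claim.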

dTLC-LP: Full-chip Time Slack Allocation

- Objective function
  - fs(i): transition density of ith tree
  - Fn(i): estimated number of low-Vdd switches in ith tree
  - Directly minimize dynamic power
  - May help minimize leakage power, which depends exponentially on Vdd-level

- Constraints
  - Net-based timing constraints
    - For PIs and POs
    - For edges corresponding to routing
    - For edges other than routing
  - Upper bound for useful slack

- Theorem: The time slack allocation problem is an LP problem
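
The objective and constraint formulas are missing from this transcript, so the following is only a plausible sketch of why such a problem can be linear. It assumes that the objective maximizes the transition-density-weighted number of low-Vdd switches and that the per-switch estimate is a capped minimum of slack over path length. The auxiliary variables $x_{ij}$ are hypothetical and stand for the estimated contribution of the jth switch in the ith tree.

```latex
\begin{aligned}
\max_{s,\,x} \quad & \sum_i f_s(i) \sum_j x_{ij} \\
\text{s.t.} \quad & x_{ij} \le \frac{s_{ik}}{l_{ik}} \qquad \forall k \in SL_{ij}, \\
& 0 \le x_{ij} \le 1, \\
& \text{net-based timing constraints on the slacks } s_{ik}.
\end{aligned}
```

Since each $l_{ik}$ is a constant of the routed netlist, every constraint is linear in $s_{ik}$ and $x_{ij}$. With positive weights $f_s(i)$, each $x_{ij}$ is pushed up to $\min\bigl(1, \min_{k \in SL_{ij}} s_{ik}/l_{ik}\bigr)$ at the optimum, so the min terms need no nonlinear treatment.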

dTLC-LP: Overview

Single-Vdd placed and routed netlist → Chip-level Time Slack Allocation → Net-level Bottom-up Assignment → Refinement → Dual-Vdd netlist

dTLC-LP: Net-level Bottom-up Assignment

- Perform bottom-up assignment within each tree to leverage the allocated slacks

- Bottom-up assignment
  - Assign low-Vdd to switches in the routing tree in a bottom-up fashion
  - Slack is reduced by one unit (the delay increase of one segment switched to low-Vdd) in each step
  - Stop when no slack is left

- Theorem: the bottom-up assignment is optimal
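
One way to sketch the bottom-up pass on a single tree is shown below. It assumes integer slack units per sink, that a low-Vdd switch consumes one unit from every sink in its fanout cone, and that a switch may only go low once all switches it drives are low. The names `bottom_up_assign`, `sinks_of` and the example tree are hypothetical.

```python
from typing import Dict, List, Set


def bottom_up_assign(root: str, children: Dict[str, List[str]],
                     sinks_of: Dict[str, List[str]],
                     slack: Dict[str, int]) -> Set[str]:
    """Return the switches set to low-Vdd, visiting fanout switches first."""
    remaining = dict(slack)  # leave the caller's slack allocation untouched
    low: Set[str] = set()

    def visit(switch: str) -> None:
        for child in children[switch]:
            visit(child)
        fanout_low = all(c in low for c in children[switch])
        has_slack = all(remaining[k] >= 1 for k in sinks_of[switch])
        if fanout_low and has_slack:
            low.add(switch)
            for k in sinks_of[switch]:
                remaining[k] -= 1  # one slack unit used on each affected sink

    visit(root)
    return low


# Hypothetical tree: b1 drives b2 (to sink1) and b3 (to sink2).
children = {"b1": ["b2", "b3"], "b2": [], "b3": []}
sinks_of = {"b1": ["sink1", "sink2"], "b2": ["sink1"], "b3": ["sink2"]}
assigned = bottom_up_assign("b1", children, sinks_of, {"sink1": 2, "sink2": 1})
```

With slacks of 2 and 1, the two leaf switches go low and `b1` stays at high-Vdd because `sink2` has no slack left; raising the slack of `sink2` to 2 lets all three switches go low.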

dTLC-LP: Overview

Single-Vdd placed and routed netlist → Chip-level Time Slack Allocation → Net-level Bottom-up Assignment → Refinement → Dual-Vdd netlist

Outline

- Review and Motivation
- Modeling and Problem Formulations
- Chip-level Vdd-level Assignment Algorithms
- Experimental Results
- Conclusions

Experimental Setting

- Cluster-based Island Style FPGA Structure
  - 100% buffered interconnects, subset switch block
  - Uniform length 4 for all wire segments

- ITRS 100nm technology
- Use VPR [Betz-Rose-Marquardt] for placement and routing
- Use fpgaEva-LP2 [Lin et al, FPGA’05] for power calculation
  - Considering short-circuit power, glitch power and input vector
  - 8% average error compared to SPICE simulation

Interconnect Power Comparison between TLC-S, dTLC-S and dTLC-LP

[Figure: interconnect power (watt), broken down into leakage power and dynamic power, for TLC-S, dTLC-S and dTLC-LP]

- dTLC-S and dTLC-LP achieve 6.7% and 6.9% less interconnect power compared to TLC-S, respectively
- Interconnect power breakdown
  - TLC-S, dTLC-S and dTLC-LP have almost the same leakage
  - dTLC-S and dTLC-LP achieve 13.8% and 15.8% less interconnect dynamic power compared to TLC-S, respectively

dTLC-LP compared to SLC and h2lLCi

[Figure: two plots for dTLC-LP, SLC and h2lLCi against Critical Path Delay (ns), from 12.00 to 15.50: Interconnect Power (watt) and % of VddL Switches]

- SLC [Li et al, ICCAD’04]
  - Segment based level converter inserted in interconnects
  - Sensitivity based assignment algorithm

- h2lLCi [Gayasen et al, FPL’04]
  - All the routing trees driven by the source CLB have the same Vdd-level as the source CLB
  - Path based assignment algorithm

- dTLC-LP, SLC and h2lLCi achieve 77.54%, 74.70% and 41.80% low-Vdd switches w/o relaxing Tspec
- At different delays, dTLC-LP achieves
  - The highest number of low-Vdd switches
  - The lowest power consumption

Runtime Comparison between TLC-S, dTLC-S and dTLC-LP

[Figure: runtime (s) of TLC-S, dTLC-S and dTLC-LP on the MCNC benchmarks alu4, apex2, apex4, elliptic, ex1010, frisc, pdc, s38417 and s38584]

- TLC-S runs the fastest
- dTLC-S versus dTLC-LP
  - dTLC-S runs 3X faster than dTLC-LP
  - But achieves similar power consumption

Conclusions and Future Work

- Proposed two ways to avoid using level converters in Vdd-programmable interconnects
  - Tree based level converter insertion (TLC)
  - Dual-Vdd tree based level converter insertion (dTLC)

- Developed chip-level dual-Vdd assignment algorithms w/o level converters
  - Sensitivity based algorithms TLC-S and dTLC-S
  - LP based algorithm dTLC-LP

- Developed dTLC-LP that reduces interconnect power by 64%
- Developed dTLC-S that obtains slightly smaller power reduction with 3X speedup compared to dTLC-LP
- Future work
  - Extend chip-level Vdd-level assignment to interconnects using wire segments of different lengths
  - Allocate time slack to logic blocks and interconnects in a uniform fashion

Thank you!
