
# A Method for Fast Delay/Area Estimation

A Method for Fast Delay/Area Estimation. EE219b Semester Project Mike Sheets May 16, 2000. Overview. Problem statement Proposed solution Constant delay paradigm Zero-slack algorithm Implementation Incorporation into SIS Library characterization Results Conclusions Future Work.


### A Method for Fast Delay/Area Estimation

EE219b Semester Project

Mike Sheets

May 16, 2000

• Problem statement

• Proposed solution

• Constant delay paradigm

• Zero-slack algorithm

• Implementation

• Incorporation into SIS

• Library characterization

• Results

• Conclusions

• Future Work

• Given a Boolean network, estimate the area of an implementation that meets given required-time constraints

• Estimation should be fast and reasonably accurate

• Examine how technology independent logic optimization affects the estimation

• Constant area (traditional) model

• Composed of discretely sized gates with constant area

• Mapping involves calculating delay as a function of load

• Constant delay model

• Composed of mathematical functions relating area to size

• Mapping involves calculating size (area) as a function of load

Constant area model (gate ND2X1 driving load CL):

• Area = constant from library

• Size = constant from library

• Delay = dint + k*CL

Constant delay model (gate ND2 driving load CL):

• Area = Aint + Aslope*size

• Size = k*CL/(Delay - dint)

• Delay = constant
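The constant delay relations can be tried out with a few lines of Python. This is only a sketch: the `GateClass` container, the function names, and the numeric parameters are made up for illustration and do not come from any real library.

```python
from dataclasses import dataclass


@dataclass
class GateClass:
    """Estimation parameters for one class of gates (hypothetical container)."""
    d_int: float    # intrinsic gate delay (dint)
    k: float        # drive factor (k)
    a_int: float    # area line y-intercept (Aint)
    a_slope: float  # area line slope (Aslope)


def size_for_delay(gate, load, delay):
    """Size = k*CL / (Delay - dint); only defined when delay exceeds dint."""
    if delay <= gate.d_int:
        raise ValueError("delay budget must exceed the intrinsic delay")
    return gate.k * load / (delay - gate.d_int)


def area_for_size(gate, size):
    """Area = Aint + Aslope*size."""
    return gate.a_int + gate.a_slope * size


# Made-up numbers for a two-input NAND class, not taken from any library.
nd2 = GateClass(d_int=0.25, k=2.0, a_int=4.0, a_slope=3.0)
size = size_for_delay(nd2, load=0.5, delay=0.75)
area = area_for_size(nd2, size)
print(size, area)
```

Tightening the delay budget toward dint makes the computed size, and therefore the area, grow without bound.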

Given input arrival times {ai} and output required times {rk}, assign gate delays as follows:

1. Initialize all internal required and arrival times to “unknown”.

2. Select the path(s) with the minimum value of (rk-ai)/lp, where lp is the length of the path in number of gates:

• For each node, working from primary inputs to primary outputs, calculate the (ai, li) pairs from all fanin edges

• Discard dominated pairs and save the union of the undominated pairs

• When all primary outputs are reached, calculate the minimum (rk-ai)/lp

3. Assign this minimum as the delay of each gate in the selected path(s).

4. Update the arrival and required times on all fanin and fanout edges of the gates with newly assigned delays.

5. Repeat steps 2-4 until all gates are assigned delays.
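As a small illustration of the selection criterion, the sketch below computes the per-gate delay budget (rk-ai)/lp for a handful of explicitly listed paths and picks the tightest one. Listing paths explicitly is a simplification for the example: the algorithm above avoids it by propagating (ai, li) pairs. The function name and the path tuples are hypothetical.

```python
import math


def select_critical_paths(paths):
    """paths: list of (arrival, required, n_gates) tuples, one per path.

    Returns the minimum per-gate budget (required - arrival) / n_gates
    and the indices of all paths that attain it.
    """
    if not paths:
        raise ValueError("at least one path is required")
    budgets = [(r - a) / n for (a, r, n) in paths]
    best = min(budgets)
    chosen = [i for i, b in enumerate(budgets) if math.isclose(b, best)]
    return best, chosen


# Three made-up paths: (input arrival, output required time, gate count).
example_paths = [(0.0, 6.0, 3), (1.0, 6.0, 2), (0.0, 4.0, 4)]
budget, critical = select_critical_paths(example_paths)
print(budget, critical)
```

Every gate on a selected path would then be assigned the returned budget as its delay.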

Pair domination defined:

Pair (ai, li) dominates (aj, lj) if

ai ≥ aj and li ≥ lj

If either (a1, l1) or (a2, l2) dominates the other, the four possible paths through n can be reduced to two, since the dominated path is “faster” than necessary.

[Figure: node n with fanin nodes n1 and n2, labeled (a1, l1) and (a2, l2), and fanout nodes n3 and n4, labeled (r3, l3) and (r4, l4).]
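Discarding dominated pairs amounts to a small Pareto filter. The sketch below assumes that a pair dominates another when its arrival time is no earlier and its path length is no shorter, which matches the remark that the dominated path is "faster" than necessary. The function names are illustrative.

```python
def dominates(p, q):
    """True if pair p = (arrival, length) dominates a different pair q.

    Assumption: p dominates q when p arrives no earlier and has at least
    as many gates, so q always has the looser per-gate budget.
    """
    return p != q and p[0] >= q[0] and p[1] >= q[1]


def prune_dominated(pairs):
    """Return the undominated (arrival, length) pairs, sorted, without duplicates."""
    unique = set(pairs)
    return sorted(p for p in unique
                  if not any(dominates(q, p) for q in unique))


# (2.0, 1) is dominated by (3.0, 2); the other two pairs are incomparable.
kept = prune_dominated([(3.0, 2), (2.0, 1), (1.0, 4), (3.0, 2)])
print(kept)
```

The quadratic comparison is fine for the handful of pairs that typically survive at one node; a sort-based sweep would be the usual choice for larger sets.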

Select an allowable slack threshold sthresh (if it is zero, the algorithm yields the same result as the previous one), then:

1. Compute the forward level lj and arrival time aj of all nodes in the network using a forward trace.

2. Compute the reverse level kj and required time rj of all nodes in the network using a backward trace.

3. Update the delay of every node as dj = dj + (rj-aj)/(lj+kj).

4. While the slack of any node exceeds sthresh, repeat steps 1-3.
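A minimal Python sketch of this iterative slack distribution follows. Several details are assumptions rather than part of the slides: all delays start at zero, lj counts the gates up to and including node j, kj counts the gates after node j, the gate dictionary is given in topological order, every gate has at least one fanin, and every gate without fanout has a required time. An iteration cap is added as a safeguard.

```python
def distribute_slack(fanins, arrival, required, s_thresh=1e-6, max_iters=1000):
    """Assign gate delays by repeatedly spreading slack over path lengths.

    fanins:   dict gate -> list of fanin names (gates or primary inputs),
              listed in topological order (assumption)
    arrival:  dict primary input -> arrival time
    required: dict primary output gate -> required time
    """
    order = list(fanins)
    fanouts = {g: [] for g in order}
    for g in order:
        for f in fanins[g]:
            if f in fanouts:
                fanouts[f].append(g)
    delay = {g: 0.0 for g in order}  # assumed starting point
    for _ in range(max_iters):
        a, l = {}, {}
        for g in order:  # forward trace: arrival times and forward levels
            a[g] = max(arrival[f] if f in arrival else a[f]
                       for f in fanins[g]) + delay[g]
            l[g] = max(0 if f in arrival else l[f] for f in fanins[g]) + 1
        r, k = {}, {}
        for g in reversed(order):  # backward trace: required times, reverse levels
            candidates = [r[o] - delay[o] for o in fanouts[g]]
            if g in required:
                candidates.append(required[g])
            r[g] = min(candidates)
            k[g] = max((k[o] + 1 for o in fanouts[g]), default=0)
        slack = {g: r[g] - a[g] for g in order}
        if max(slack.values()) <= s_thresh:
            break
        for g in order:  # dj = dj + (rj - aj) / (lj + kj)
            delay[g] += slack[g] / (l[g] + k[g])
    return delay


# Three-gate chain with a required time of 3.0 at the output.
chain = {"g1": ["in"], "g2": ["g1"], "g3": ["g2"]}
chain_delays = distribute_slack(chain, {"in": 0.0}, {"g3": 3.0})
print(chain_delays)
```

On the chain, each gate receives one third of the available time after a single update. On unbalanced networks the short branches approach their final delay over several iterations, which is where the threshold matters.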

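[Figure: flow diagram in SIS. A BLIF netlist is loaded with read_blif and passed through tech. independent optimization (script.algebraic, script.boolean, etc.). One branch loads the tech. library with read_library and runs tech. dependent optimization (map); the other loads the estimation library with read_estim and runs fast delay/area estimation (estimate). Remaining labels: Area, Manual analysis, Area/delay tradeoff curve.]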

• Commercial standard cell libraries may contain several gates that implement the same equation

• Each gate in the library has characteristics:

• Size

• Delays from all input pins to the output pin for all transitions and several loads

• Capacitance for all input pins

• Maximum load

• Area

• We need estimation parameters for each class of gates (i.e., gates with the same equation):

• Intrinsic gate delay (dint)

• Drive factor (k)

• Area line y-intercept (Aint)

• Area line slope (Aslope)

• Input capacitance line y-intercept (cint)

• Input capacitance line slope (cslope)

• Inverter delay scales linearly with load/size

• Slope is k

• Y-intercept is dint

• Inverter area scales linearly with size

• Slope is Aslope

• Y-intercept is Aint
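The slope and intercept extraction can be sketched as an ordinary least-squares line fit. The function below also returns the coefficient of determination, which is a convenient check on how well a gate class follows the linear model. The data points are invented for illustration and do not come from any cell library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope*x.

    Returns (intercept, slope, r_squared). Needs at least two distinct x values.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# Made-up inverter data: delay against load/size, and area against size.
d_int, k, r2_delay = fit_line([1.0, 2.0, 4.0], [0.3, 0.5, 0.9])
a_int, a_slope, r2_area = fit_line([1.0, 2.0, 4.0], [5.0, 8.0, 14.0])
print(d_int, k, a_int, a_slope)
```

With only two points per class the fit is exact by construction, so the coefficient of determination is only informative for classes that have three or more sizes.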

• Requires at least two gates per class in the library

• Additionally, some gates have poor accuracy (trend lines have poor coefficients of determination)

• Further research shows the reason is CMOS implementation (below)

• Future work might replace linear model with piece-wise linear model for more accuracy

[Figure: NAND-gate CMOS schematic for smaller sizes]

[Figure: NAND-gate CMOS schematic for larger sizes]

• These issues are evident in the table

• OAI31 and OAI32 have Aslope of 0.0, meaning that the two cells in the library had the same area

• NOR3, NOR4 had poor coefficients of determination

• Many gates in the library had only one size

• Sweep mode

• User specifies a range of required times to sweep (possibly only one) and a step size

• Estimation starts with the largest required time and steps down until the network fails the zero-slack algorithm (i.e., negative slack is encountered)

• Binary search mode

• Used to find the minimum possible required time (period) given infinite area

• Starts at a user-specified maximum and performs a binary search until a pass limit is reached
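Both modes can be sketched in a few lines of Python. The callbacks `estimate_area` and `feasible` are hypothetical stand-ins for a run of the estimator at one required time, and the binary search assumes that feasibility is monotonic in the required time and that a required time of zero is infeasible.

```python
def sweep(estimate_area, t_start, t_stop, step):
    """Step the required time down from t_start to t_stop.

    estimate_area(t) is a hypothetical callback returning the estimated
    area, or None when negative slack is encountered. The sweep stops at
    the first failure and returns the (required_time, area) pairs so far.
    """
    results = []
    i = 0
    while True:
        t = t_start - i * step
        if t < t_stop - 1e-9:
            break
        area = estimate_area(t)
        if area is None:
            break
        results.append((t, area))
        i += 1
    return results


def min_required_time(feasible, t_max, passes=20):
    """Binary search for the smallest feasible required time at or below t_max.

    feasible(t) is a hypothetical callback returning True when the network
    can meet required time t. Returns None if even t_max fails.
    """
    if not feasible(t_max):
        return None
    lo, hi = 0.0, t_max  # lo is assumed infeasible, hi is known feasible
    for _ in range(passes):
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy model: the network fails below 3.7 time units, and area grows near the limit.
toy_area = lambda t: None if t < 3.7 else 100.0 / (t - 3.0)
table = sweep(toy_area, t_start=6.0, t_stop=2.0, step=1.0)
t_min = min_required_time(lambda t: t >= 3.7, t_max=10.0, passes=30)
print(table, t_min)
```

The pass limit fixes the resolution of the search: each pass halves the interval, so the result lies within t_max divided by 2 to the power of the pass count of the true minimum.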

• Various sized combinational logic benchmarks

• MCNC c17, c880, c1908, c3540

• Various sized sequential logic benchmarks

• Interpretation of required time is clock period (assuming all flip-flops are clocked synchronously)

• MCNC s713, s838, s953, s1196, s1238, s1423

• Tested four scripts

• script.none (no optimization), script.algebraic, script.boolean, script.rugged

• Sweep mode allows multiple required times (clock periods) to be easily tabulated

• When delay is non-critical (i.e., as required time approaches infinity)

• Area within 20% of no optimization

• Variation between optimization scripts mostly under 10%

• Sometimes more optimization yields worse results

• As required times become smaller, more paths become critical requiring larger sizes (area)

• Area increases quickly before failure

• From the benchmarks shown, estimation is relatively insensitive to technology independent optimization with infinite required times

• Accuracy

• Relate estimated areas to actual areas from a good mapping using the full technology library

• Use more complex delay equations to handle different rise/fall times

• Modify the algorithm to handle the case where a primary input cannot drive the required load

• Characterization

• Revise characterization to support piece-wise linear functional forms

• Automate process so only the actual technology library is required as an input

• Mapping

• Examine how various mapping options affect estimation

• Use buffered fanout trees (Touati) after sizing gates

• Speed

• Compare speed of total estimation procedure to traditional flow

• Power estimation