A Method for Fast Delay/Area Estimation

EE219b Semester Project

Mike Sheets

May 16, 2000

- Problem statement
- Proposed solution
- Constant delay paradigm
- Zero-slack algorithm

- Implementation
- Incorporation into SIS
- Library characterization
- Results

- Conclusions
- Future Work

- Given a Boolean network, estimate its area when implemented under particular required-time constraints
- Estimation should be fast and reasonably accurate

- Examine how technology-independent logic optimization affects the estimation

- Constant area (traditional) model
- Composed of discretely sized gates with constant area
- Mapping involves calculating delay as a function of load

- Constant delay model
- Composed of mathematical functions relating area to size
- Mapping involves calculating size (area) as a function of load

Constant Area Model (gate ND2X1 driving load CL):
- Area = constant from library
- Size = constant from library
- Delay = dint + k*CL

Constant Delay Model (gate ND2 driving load CL):
- Area = Aint + Aslope*size
- Size = k*CL/(Delay - dint)
- Delay = constant
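As a worked illustration of the constant delay model, the Python sketch below inverts the delay relation to get a gate's size and then evaluates the area line. It assumes the model's delay is dint + k*CL/size, which is the relation that Size = k*CL/(Delay - dint) inverts. The class name GateClass and all numeric values are illustrative assumptions, not values from the project's library.

```python
from dataclasses import dataclass


@dataclass
class GateClass:
    """Estimation parameters for one gate class (illustrative names)."""
    dint: float     # intrinsic delay
    k: float        # drive factor
    a_int: float    # area-line y-intercept (Aint)
    a_slope: float  # area-line slope (Aslope)

    def size_for_delay(self, delay: float, load: float) -> float:
        """Constant delay model: Size = k*CL / (Delay - dint)."""
        if delay <= self.dint:
            raise ValueError("target delay must exceed the intrinsic delay dint")
        return self.k * load / (delay - self.dint)

    def area_for_size(self, size: float) -> float:
        """Area = Aint + Aslope*size."""
        return self.a_int + self.a_slope * size


# Hypothetical ND2 class; the numbers are made up for illustration.
nd2 = GateClass(dint=0.25, k=0.5, a_int=2.0, a_slope=1.5)
size = nd2.size_for_delay(delay=0.75, load=2.0)  # 0.5 * 2.0 / 0.5 = 2.0
area = nd2.area_for_size(size)                   # 2.0 + 1.5 * 2.0 = 5.0
```

Because size is proportional to 1/(Delay - dint), halving the delay margin doubles the size, so area climbs steeply as the target delay approaches dint.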

Given input arrival times {ai} and output required times {rk}, assign gate delays as follows:

1. Initialize all internal required/arrival times to “unknown”
2. Select the path(s) with the minimum value of (rk - ai)/lp, where lp is the length of the path in number of gates:
   - For each node, from primary inputs to primary outputs:
     - Calculate all the (ai, li) pairs from all fanin edges
     - Discard dominated pairs and save the union of the undominated pairs
   - When all primary outputs are reached, calculate the minimum (rk - ai)/lp
3. For each node from primary inputs to primary outputs, assign the delay of each gate in the selected path(s) to this minimum
4. Update the arrival and required times on all fanin (fi) and fanout (fo) edges of the newly assigned gates
5. Repeat steps 2-4 until all gates are assigned delays

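Pair domination defined:

[Figure: node n with fanin nodes n1 (arrival a1, length l1) and n2 (arrival a2, length l2), and fanout nodes n3 (required r3, length l3) and n4 (required r4, length l4)]

Pair (ai, li) dominates (aj, lj) if ai ≥ aj and li ≥ lj

If either (a1, l1) or (a2, l2) dominates the other, the four possible paths through n can be reduced to two, since the dominated path is “faster” than necessary: it arrives no later and has no more gates, so for any continuation to an output it never has less slack per gate than the dominating path.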
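The pair bookkeeping in step 2 of the delay-assignment algorithm can be sketched in Python for the first iteration only, when every internal arrival time is still unknown. Each pair then records the arrival time ai of a path's primary input and the number of gates li on that path so far. The network representation (a dict from each gate to its fanin names) and the function names prune_dominated and most_critical_ratio are hypothetical; steps 3-5 (assigning delays and updating times) are not shown.

```python
from graphlib import TopologicalSorter


def prune_dominated(pairs):
    """Keep only (a, l) pairs that no other pair dominates.

    (a_i, l_i) dominates (a_j, l_j) if a_i >= a_j and l_i >= l_j: a path that
    arrives no earlier and has no fewer gates never has more slack per gate.
    """
    unique = set(pairs)
    return sorted(
        (a, l)
        for a, l in unique
        if not any(a2 >= a and l2 >= l and (a2, l2) != (a, l) for a2, l2 in unique)
    )


def most_critical_ratio(fanins, arrival, required):
    """First pass of step 2: minimum (r_k - a_i) / l_p over all PI-to-PO paths.

    fanins:   gate -> list of fanin names (other gates or primary inputs)
    arrival:  primary input -> arrival time a_i
    required: primary-output gate -> required time r_k
    """
    # TopologicalSorter expects node -> predecessors; keep gate-to-gate edges only.
    graph = {g: [f for f in fs if f in fanins] for g, fs in fanins.items()}
    pairs = {}
    for g in TopologicalSorter(graph).static_order():
        candidates = []
        for f in fanins[g]:
            if f in fanins:  # fanin is a gate: extend each of its paths through g
                candidates.extend((a, l + 1) for a, l in pairs[f])
            else:  # fanin is a primary input: a new path of one gate
                candidates.append((arrival[f], 1))
        pairs[g] = prune_dominated(candidates)
    return min((r - a) / l for g, r in required.items() for a, l in pairs[g])
```

For example, with fanins = {"g1": ["x"], "g2": ["g1"], "g3": ["g2", "y"]}, arrival times x = 0 and y = 4, and g3 required at 10, the pairs reaching g3 are (0, 3) and (4, 1). Neither dominates the other, so both survive, and the minimum ratio is 10/3 from the three-gate path.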

Select an allowable slack threshold sthresh (if sthresh is zero, the algorithm yields the same result as the previous algorithm)

1. Compute the forward level lj and arrival time aj of every node in the network using a forward trace
2. Compute the reverse level kj and required time rj of every node using a backward trace
3. Update the delay of every node as dj = dj + (rj - aj)/(lj + kj)
4. While the slack of any node exceeds sthresh, repeat steps 1-3
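A compact Python sketch of the zero-slack loop follows. It assumes every gate has at least one fanin, every gate without fanouts is a primary output, and all delays start at 0. Levels are taken as longest-path gate counts (lj includes gate j, kj counts the gates after it), so lj + kj is the longest path through j; this convention, the function name zero_slack, and the max_iter safeguard are assumptions, not details from the project.

```python
from graphlib import TopologicalSorter


def zero_slack(fanins, arrival, required, sthresh=1e-6, max_iter=1000):
    """Zero-slack delay assignment (sketch).

    fanins:   gate -> list of fanin names (other gates or primary inputs)
    arrival:  primary input -> arrival time
    required: primary-output gate -> required time
    Returns gate -> assigned delay.
    """
    graph = {g: [f for f in fs if f in fanins] for g, fs in fanins.items()}
    order = list(TopologicalSorter(graph).static_order())
    fanouts = {g: [] for g in order}
    for g in order:
        for f in graph[g]:
            fanouts[f].append(g)

    delay = {g: 0.0 for g in order}
    for _ in range(max_iter):
        # Step 1: forward trace for arrival time a_j and forward level l_j.
        a, lev = {}, {}
        for g in order:
            a[g] = max(a[f] if f in fanins else arrival[f] for f in fanins[g]) + delay[g]
            lev[g] = 1 + max((lev[f] for f in graph[g]), default=0)
        # Step 2: backward trace for required time r_j and reverse level k_j.
        r, rev = {}, {}
        for g in reversed(order):
            r[g] = min([required.get(g, float("inf"))] + [r[o] - delay[o] for o in fanouts[g]])
            rev[g] = max((rev[o] + 1 for o in fanouts[g]), default=0)
        slack = {g: r[g] - a[g] for g in order}
        # Small tolerance so floating-point rounding is not reported as failure.
        if min(slack.values()) < -1e-9:
            raise ValueError("negative slack: the required times cannot be met")
        # Step 4: stop once no node's slack exceeds the threshold.
        if max(slack.values()) <= sthresh:
            break
        # Step 3: spread each node's slack over the longest path through it.
        for g in order:
            delay[g] += slack[g] / (lev[g] + rev[g])
    return delay  # may be unconverged if max_iter was reached
```

Dividing by the longest path through each node is a deliberate choice: every node on a path P has slack no larger than P's slack and lj + kj no smaller than P's gate count, so the increments along any path sum to at most that path's slack and a single update never drives slack negative. Off-critical branches only close their slack gradually (for a side branch of one gate hanging off a shared gate, the remaining slack halves each pass), which is why the threshold sthresh is needed to stop the loop.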

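[Figure: incorporation into SIS. A BLIF network is read with read_blif and passed through technology-independent optimization (script.algebraic, script.boolean, etc.). The traditional path reads the technology library with read_library and runs technology-dependent optimization (map) to obtain area. The estimation path reads an estimation library, derived from the technology library by manual analysis, with read_estim and runs fast delay/area estimation (estimate) to produce an area/delay tradeoff curve.]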

- A commercial standard cell library may contain multiple gates (sizes) that implement the same equation
- Each gate in the library has characteristics:
- Size
- Delays from all input pins to the output pin for all transitions and several loads
- Capacitance for all input pins
- Maximum load
- Area

- We need estimation parameters for each class of gates (i.e., gates with the same equation):
- Intrinsic gate delay (dint)
- Drive factor (k)
- Area line y-intercept (Aint)
- Area line slope (Aslope)
- Input capacitance line y-intercept (cint)
- Input capacitance line slope (cslope)
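Once delays are assigned, these parameters are enough to size every gate and total its area. The Python sketch below assumes input capacitance follows a line in size just as area does (Cin = cint + cslope*size), that a gate's load is the sum of its fanouts' input capacitances plus any external primary-output load, and that gates are sized from the outputs back toward the inputs so every fanout's size is known first. The function name estimate_area and its argument layout are hypothetical, and primary inputs are not checked for drive strength.

```python
from graphlib import TopologicalSorter


def estimate_area(fanins, delay, params, po_load):
    """Size every gate under the constant delay model; return (sizes, total area).

    fanins:  gate -> list of fanin names (gates or primary inputs)
    delay:   gate -> assigned delay (e.g. from a delay-assignment pass)
    params:  gate -> dict with keys dint, k, a_int, a_slope, c_int, c_slope
    po_load: primary-output gate -> external load capacitance
    """
    graph = {g: [f for f in fs if f in fanins] for g, fs in fanins.items()}
    order = list(TopologicalSorter(graph).static_order())
    fanouts = {g: [] for g in order}
    for g in order:
        for f in graph[g]:
            fanouts[f].append(g)

    size, area = {}, 0.0
    # Visit gates from outputs back to inputs so every fanout is sized first.
    for g in reversed(order):
        p = params[g]
        load = po_load.get(g, 0.0) + sum(
            params[o]["c_int"] + params[o]["c_slope"] * size[o] for o in fanouts[g]
        )
        margin = delay[g] - p["dint"]
        if margin <= 0:
            raise ValueError(f"delay of {g} does not exceed its intrinsic delay")
        size[g] = p["k"] * load / margin            # Size = k*CL/(Delay - dint)
        area += p["a_int"] + p["a_slope"] * size[g]  # Area = Aint + Aslope*size
    return size, area
```

For a two-gate chain where both gates use dint = 0.25, k = 0.5, Aint = 1.0, Aslope = 2.0, cint = 0.5, cslope = 1.0 (made-up values), with delays of 0.75 and an output load of 1.0, the output gate gets size 1.0, the first gate sees a load of 1.5 and gets size 1.5, and the total area is 7.0.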

- Inverter delay scales linearly with load/size
- Slope is k
- Y-intercept is dint

- Inverter area scales linearly with size
- Slope is Aslope
- Y-intercept is Aint
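The characterization step amounts to fitting straight lines, so here is a minimal Python sketch of an ordinary least-squares fit that also reports the coefficient of determination. The function name fit_line and the synthetic inverter data are illustrative assumptions; real data points would come from the library's delay tables (delay against load/size) and cell areas (area against size).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r2), where r2 is the coefficient of determination.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) points, i.e. two gate sizes per class")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # r2 is undefined when every y is identical (e.g. two cells with equal area).
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return intercept, slope, r2


# Synthetic inverter data (made up for illustration, not from any real library).
loads_over_size = [0.5, 1.0, 2.0]
delays = [0.5, 0.75, 1.25]          # lies exactly on 0.25 + 0.5 * x
dint, k, r2_delay = fit_line(loads_over_size, delays)

sizes = [1.0, 2.0, 4.0]
areas = [3.0, 4.5, 7.5]             # lies exactly on 1.5 + 1.5 * size
a_int, a_slope, r2_area = fit_line(sizes, areas)
```

Two caveats follow directly from the arithmetic. With exactly two sizes any line passes through both points, so the coefficient of determination is 1 whenever it is defined and only flags poor fits for classes with three or more sizes. Two cells of equal area give an area slope of 0.0 and an undefined coefficient.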

- Characterization requires at least two gates (sizes) per class in the library
- Additionally, some gate classes fit poorly (their trend lines have low coefficients of determination)
- Further investigation shows the cause is the CMOS implementation of the gates (below)
- Future work might replace the linear model with a piece-wise linear model for more accuracy

[Figure: NAND-gate CMOS schematic for smaller sizes]

[Figure: NAND-gate CMOS schematic for larger sizes]

- These issues are evident in the table
- OAI31 and OAI32 have Aslope of 0.0, meaning that the two cells in the library had the same area
- NOR3 and NOR4 had poor coefficients of determination
- Many gates in the library had only one size

- Sweep mode
- User specifies a range of required times to sweep (possibly only one) and a step size
- Estimation starts with the largest required time and steps down until the network fails the zero-slack algorithm (i.e., negative slack is encountered)

- Binary search mode
- Used to find the minimum possible required time (period) given infinite area
- Starts at a user-specified maximum and performs a binary search until a pass limit is reached
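Both modes can be sketched in Python as drivers around an estimator that either returns an area or reports failure. The names sweep, min_required_time, and chain_area are hypothetical. chain_area is a toy stand-in for the real estimate command (a chain of identical gates with the required time split evenly), and "pass limit" is taken here to mean a fixed number of bisection passes, which is an assumption.

```python
def chain_area(required, n_gates=4, dint=0.25, k=0.5, a_int=1.0, a_slope=1.0, load=1.0):
    """Toy estimator (hypothetical): n identical gates in a chain, each driving `load`.

    The required time is split evenly (constant delay per gate). Returns the total
    area, or None when the per-gate delay does not exceed dint (negative slack).
    """
    d = required / n_gates
    if d <= dint:
        return None
    size = k * load / (d - dint)
    return n_gates * (a_int + a_slope * size)


def sweep(estimate, t_max, t_min, step):
    """Sweep mode: step down from t_max until estimation fails or t_min is passed."""
    results = []
    i = 0
    while t_max - i * step >= t_min:
        t = t_max - i * step
        area = estimate(t)
        if area is None:  # negative slack: stop the sweep
            break
        results.append((t, area))
        i += 1
    return results


def min_required_time(feasible, t_max, passes):
    """Binary search mode: bisect [0, t_max] for a fixed number of passes.

    Assumes feasibility is monotone (any larger required time also passes) and
    that a required time of 0 fails. Returns the smallest passing time found.
    """
    if not feasible(t_max):
        raise ValueError("t_max itself fails")
    lo, hi = 0.0, t_max
    for _ in range(passes):
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


table = sweep(chain_area, t_max=4.0, t_min=0.0, step=1.0)
# Infinite area: only feasibility matters, so the area value is ignored.
t_best = min_required_time(lambda t: chain_area(t) is not None, t_max=8.0, passes=40)
```

In this toy the sweep records required times 4.0, 3.0, and 2.0 with areas of about 6.67, 8.0, and 12.0 and stops at 1.0, where the per-gate delay reaches dint. The binary search converges toward 1.0 (four gates times dint), the delay achievable only as gate sizes grow without bound.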

- Various sized combinational logic benchmarks
- MCNC c17, c880, c1908, c3540

- Various sized sequential logic benchmarks
- Required time is interpreted as the clock period (assuming all flip-flops are clocked synchronously)
- MCNC s713, s838, s953, s1196, s1238, s1423

- Tested four scripts
- script.none (no optimization), script.algebraic, script.boolean, script.rugged

- Sweep mode allows multiple required times (clock periods) to be easily tabulated

- When delay is non-critical (i.e., as required time approaches infinity)
- Estimated area is within 20% of the unoptimized (script.none) result
- Variation between optimization scripts is mostly under 10%

- Sometimes more optimization yields worse results
- As required times become smaller, more paths become critical requiring larger sizes (area)
- Area increases quickly before failure

- From the benchmarks shown, estimation is relatively insensitive to technology-independent optimization when required times are infinite

- Accuracy
- Relate estimated areas to actual areas from a good mapping using the full technology library
- Use more complex delay equations to handle different rise/fall times
- Modify the algorithm to handle the case where a primary input cannot drive the required load

- Characterization
- Revise characterization to support piece-wise linear functional forms
- Automate process so only the actual technology library is required as an input

- Mapping
- Examine how various mapping options affect estimation
- Use buffered fanout trees (Touati) after sizing gates

- Speed
- Compare speed of total estimation procedure to traditional flow

- Power estimation