A Method for Fast Delay/Area Estimation

### A Method for Fast Delay/Area Estimation

EE219b Semester Project

Mike Sheets

May 16, 2000

Overview
• Problem statement
• Proposed solution
• Constant delay paradigm
• Zero-slack algorithm
• Implementation
• Incorporation into SIS
• Library characterization
• Results
• Conclusions
• Future Work
Problem Statement
• Given a Boolean network, estimate its area when implemented under particular required time constraints
• Estimation should be fast and reasonably accurate
• Examine how technology-independent logic optimization affects the estimation
Area/Delay Models
• Constant area (traditional) model
  • Composed of discretely sized gates with constant area
  • Mapping involves calculating delay as a function of load
• Constant delay model
  • Composed of mathematical functions relating area to size
  • Mapping involves calculating size (area) as a function of load

Constant Area Model (gate ND2X1 driving load CL)
• Area = constant from library
• Size = constant from library
• Delay = dint + k*CL

Constant Delay Model (gate ND2 driving load CL)
• Area = Aint + Aslope*size
• Size = k*CL / (Delay - dint)
• Delay = constant
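
The constant delay relations above can be tried out in a few lines of Python. This is a minimal sketch rather than the project's code: the function names and all numeric parameters are hypothetical and are not taken from any real library.

```python
def constant_delay_size(c_load, delay, d_int, k):
    """Size needed to hit a fixed delay: size = k*CL / (delay - dint)."""
    if delay <= d_int:
        # No finite size can make the gate faster than its intrinsic delay.
        raise ValueError("target delay must exceed the intrinsic delay")
    return k * c_load / (delay - d_int)


def constant_delay_area(size, a_int, a_slope):
    """Area as a linear function of size: area = Aint + Aslope*size."""
    return a_int + a_slope * size


# Hypothetical parameters for one gate class (illustration only).
size = constant_delay_size(c_load=4.0, delay=0.5, d_int=0.1, k=0.2)
area = constant_delay_area(size, a_int=10.0, a_slope=3.0)
print(size, area)
```

Note how the size (and therefore the area) grows without bound as the target delay approaches dint, which matches the later observation that area increases quickly just before the required time becomes infeasible.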

Zero Slack Algorithm

Given input arrival times {ai} and output required times {rk}, assign gate delays as follows:

1. Initialize all internal required/arrival times to “unknown”
2. Select the path(s) with the minimum value of (rk-ai)/lp, where lp is the length of the path in number of gates
   • For each node from primary inputs to primary outputs
     • Calculate all the (ai, li) pairs from all fanin edges
     • Discard dominated pairs, save the union of the undominated pairs
   • When all primary outputs are reached, calculate the minimum (rk-ai)/lp
3. Assign the delay of each gate in the selected path(s) to this minimum
4. Update arrival and required times for all fanin and fanout edges of the gates with newly assigned delays
5. Repeat steps 2-4 until all gates are assigned delays

Pair domination defined:

Pair (ai, li) dominates (aj, lj) if ai ≥ aj and li ≥ lj

If either (a1, l1) or (a2, l2) dominates the other, the four possible paths through n can be reduced to two, since the dominated path is “faster” than necessary.

[Figure: node n with fanin nodes n1 and n2 (arrival times a1, a2; path lengths l1, l2) and fanout nodes n3 and n4 (required times r3, r4; path lengths l3, l4)]
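
The pruning of dominated pairs can be illustrated with a short Python sketch. It assumes that domination means a later (or equal) arrival time together with a longer (or equal) path length, which is the reading consistent with the dominated path being "faster" than necessary; the function names and the sample pairs are hypothetical.

```python
def dominates(p, q):
    """True if pair p = (arrival, length) dominates q (assumed: both >=)."""
    return p[0] >= q[0] and p[1] >= q[1]


def prune_dominated(pairs):
    """Keep only the (arrival, length) pairs that no other pair dominates."""
    unique = sorted(set(pairs))  # drop exact duplicates first
    return [p for p in unique
            if not any(q != p and dominates(q, p) for q in unique)]


# Hypothetical pairs collected from the fanin edges of one node.
fanin_pairs = [(1.0, 2), (2.0, 3), (3.0, 1), (2.0, 3)]
kept = prune_dominated(fanin_pairs)
print(kept)
```

In this example (1.0, 2) is dropped because (2.0, 3) arrives later over a longer path, while (2.0, 3) and (3.0, 1) both survive because neither is worse in both coordinates.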

Faster Approximation

Select an allowable slack threshold sthresh (if zero, the algorithm yields the same result as the previous one)

1. Compute the forward level lj and arrival time aj of all nodes in the network using a forward trace
2. Compute the reverse level kj and required time rj of all nodes in the network using a backward trace
3. Update the delay of every node as dj = dj + (rj-aj)/(lj+kj)
4. While the slack of any node exceeds sthresh, repeat steps 1-3
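
The iterative slack distribution can be sketched in Python as follows. This is an illustrative reading of the steps, not the SIS implementation. It assumes that all delays start at zero, that levels count gates along the longest path, that every gate has at least one fanin, and that every gate without fanout has a required time. The `max_iters` cap is a safeguard added here, and all names are hypothetical.

```python
def distribute_slack(fanins, arrival, required, s_thresh, max_iters=100):
    """fanins: dict gate -> list of fanin names, in topological order.
    arrival: dict primary input -> arrival time.
    required: dict primary output gate -> required time.
    Returns dict gate -> assigned delay."""
    gates = list(fanins)
    fanouts = {g: [] for g in gates}
    for g in gates:
        for f in fanins[g]:
            if f in fanouts:  # primary inputs are not gates
                fanouts[f].append(g)
    delay = {g: 0.0 for g in gates}  # assumption: start from zero delay

    for _ in range(max_iters):
        # Forward trace: arrival time a and forward level l.
        a, l = dict(arrival), {pi: 0 for pi in arrival}
        for g in gates:
            a[g] = max(a[f] for f in fanins[g]) + delay[g]
            l[g] = max(l[f] for f in fanins[g]) + 1
        # Backward trace: required time r and reverse level k.
        r, k = {}, {}
        for g in reversed(gates):
            times = [r[o] - delay[o] for o in fanouts[g]]
            if g in required:
                times.append(required[g])
            r[g] = min(times)
            k[g] = max((k[o] + 1 for o in fanouts[g]), default=0)
        # Stop once no node's slack exceeds the threshold.
        if max(r[g] - a[g] for g in gates) <= s_thresh:
            break
        # Give every gate its share of the remaining slack.
        for g in gates:
            delay[g] += (r[g] - a[g]) / (l[g] + k[g])
    return delay


# Hypothetical three-gate chain: in -> g1 -> g2 -> g3, required time 6.
chain = {"g1": ["in"], "g2": ["g1"], "g3": ["g2"]}
delays = distribute_slack(chain, {"in": 0.0}, {"g3": 6.0}, s_thresh=0.0)
print(delays)
```

On a simple chain the slack is split evenly in a single pass. On networks with short side branches the slack shrinks over several passes rather than vanishing at once, which is where a nonzero threshold saves iterations.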
Incorporation into SIS

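[Figure: flow diagram. A BLIF network passes through technology-independent optimization (script.algebraic, script.boolean, etc.). The traditional path then uses the technology library and technology-dependent optimization (map) to obtain the area. The estimation path uses an estimation library, obtained from the technology library by manual analysis, and the fast delay/area estimation command (estimate) to obtain the area.]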

Library Characterization
• A commercial standard cell library may contain multiple gates that implement the same equation
• Each gate in the library has characteristics:
  • Size
  • Delays from all input pins to the output pin for all transitions and several loads
  • Capacitance for all input pins
  • Area
• We need estimation parameters for each class of gates (i.e., gates with the same equation):
  • Intrinsic gate delay (dint)
  • Drive factor (k)
  • Area line y-intercept (Aint)
  • Area line slope (Aslope)
  • Input capacitance line y-intercept (cint)
  • Input capacitance line slope (cslope)
Inverter Characterization (1)
• Inverter delay scales linearly with load/size
  • Slope is k
  • Y-intercept is dint
Inverter Characterization (2)
• Inverter area scales linearly with size
  • Slope is Aslope
  • Y-intercept is Aint
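
Extracting a slope and y-intercept from library data points amounts to an ordinary least-squares line fit. The sketch below shows one way to do it in plain Python and also reports the coefficient of determination. The function name and the data points are hypothetical and are not measurements from any library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope*x; also returns R^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - (residual SS / total SS).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return intercept, slope, r2


# Hypothetical inverter data: delay versus load/size.
load_over_size = [1.0, 2.0, 4.0, 8.0]
delay = [0.15, 0.20, 0.30, 0.50]
d_int, k, r2 = fit_line(load_over_size, delay)
print(d_int, k, r2)
```

The same function would serve for the area-versus-size fit (giving Aint and Aslope) and the input-capacitance-versus-size fit (giving cint and cslope).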
Characterization Issues
• Characterization requires at least two gates per class in the library (two points are needed to fit a line)
• Additionally, some gates have poor accuracy (trend lines have poor coefficients of determination)
  • Further research shows the reason is the CMOS implementation (below)
  • Future work might replace the linear model with a piecewise linear model for more accuracy

[Figure: NAND-gate CMOS schematic for smaller sizes]

[Figure: NAND-gate CMOS schematic for larger sizes]

Estimation Library
• These issues are evident in the table
  • OAI31 and OAI32 have an Aslope of 0.0, meaning that the two cells in the library had the same area
  • NOR3 and NOR4 had poor coefficients of determination
  • Many gates in the library had only one size
Estimation Modes
• Sweep mode
  • User specifies a range of required times to sweep (possibly only one) and a step size
  • Estimation starts with the largest required time and steps down until the network fails the zero-slack algorithm (i.e., negative slack is encountered)
• Binary search mode
  • Used to find the minimum possible required time (period) given infinite area
  • Starts at a user-specified maximum and performs a binary search until a pass limit is reached
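
The two modes can be sketched in Python around a generic estimator callback. This is only an illustration under stated assumptions: `estimate(t)` is a hypothetical function that returns an area, or `None` when the delay assignment fails at required time `t`; the lower search bound of zero and the toy estimator are also assumptions made for the example.

```python
def sweep(estimate, t_max, t_min, step):
    """Step the required time down from t_max; stop at the first failure."""
    results = []
    n = 0
    while t_max - n * step >= t_min - 1e-12:
        t = t_max - n * step
        area = estimate(t)
        if area is None:  # negative slack encountered
            break
        results.append((t, area))
        n += 1
    return results


def min_required_time(estimate, t_max, passes=20):
    """Binary search for the smallest passing required time below t_max."""
    if estimate(t_max) is None:
        return None  # even the user-specified maximum fails
    lo, hi = 0.0, t_max  # assumption: a required time of zero always fails
    for _ in range(passes):  # the pass limit bounds the search
        mid = (lo + hi) / 2.0
        if estimate(mid) is None:
            lo = mid
        else:
            hi = mid
    return hi


def toy_estimate(t):
    """Hypothetical stand-in: fails below 3.0, area grows near the limit."""
    return None if t < 3.0 else 10.0 + 5.0 / (t - 2.9)


table = sweep(toy_estimate, t_max=5.0, t_min=1.0, step=0.5)
t_best = min_required_time(toy_estimate, t_max=8.0, passes=30)
print(table, t_best)
```

The sweep returns one (required time, area) row per passing step, which is convenient for tabulating several clock periods at once; the binary search only reports the smallest passing required time it found within the pass limit.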
Experimentation
• Combinational logic benchmarks of various sizes
  • MCNC c17, c880, c1908, c3540
• Sequential logic benchmarks of various sizes
  • Required time is interpreted as the clock period (assuming all flip-flops are clocked synchronously)
  • MCNC s713, s838, s953, s1196, s1238, s1423
• Tested four scripts
  • script.none (no optimization), script.algebraic, script.boolean, script.rugged
• Sweep mode allows multiple required times (clock periods) to be easily tabulated
Sensitivity to Optimization Script
• When delay is non-critical (i.e., as required time approaches infinity)
  • Area is within 20% of the area with no optimization
  • Variation between optimization scripts is mostly under 10%
Conclusions
• Sometimes more optimization yields worse results
• As required times become smaller, more paths become critical, requiring larger sizes (area)
  • Area increases quickly before failure
• From the benchmarks shown, estimation is relatively insensitive to technology-independent optimization with infinite required times
Possible Future Work
• Accuracy
  • Relate estimated areas to actual areas from a good mapping using the full technology library
  • Use more complex delay equations to handle different rise/fall times
  • Modify the algorithm to handle the case where a primary input cannot drive the required load
• Characterization
  • Revise characterization to support piecewise linear functional forms
  • Automate process so only the actual technology library is required as an input
• Mapping
  • Examine how various mapping options affect estimation
  • Use buffered fanout trees (Touati) after sizing gates
• Speed
  • Compare speed of total estimation procedure to traditional flow
• Power estimation