A method for fast delay area estimation
A Method for Fast Delay/Area Estimation. EE219b Semester Project Mike Sheets May 16, 2000. Overview. Problem statement Proposed solution Constant delay paradigm Zero-slack algorithm Implementation Incorporation into SIS Library characterization Results Conclusions Future Work.

A Method for Fast Delay/Area Estimation

EE219b Semester Project

Mike Sheets

May 16, 2000


Overview

  • Problem statement

  • Proposed solution

    • Constant delay paradigm

    • Zero-slack algorithm

  • Implementation

    • Incorporation into SIS

    • Library characterization

    • Results

  • Conclusions

  • Future Work


Problem Statement

  • Given a Boolean network, estimate the area of its implementation under particular required-time constraints

    • Estimation should be fast and reasonably accurate

  • Examine how technology independent logic optimization affects the estimation


Area/Delay Models

  • Constant area (traditional) model

    • Composed of discretely sized gates with constant area

    • Mapping involves calculating delay as a function of load

  • Constant delay model

    • Composed of mathematical functions relating area to size

    • Mapping involves calculating size (area) as a function of load

Constant Area Model (example: gate ND2X1 driving load CL)

  • Area = constant from library

  • Size = constant from library

  • Delay = dint + k*CL

Constant Delay Model (example: gate ND2 driving load CL)

  • Area = Aint + Aslope*size

  • Size = k*CL/(Delay - dint)

  • Delay = constant
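The two models can be compared side by side with a minimal Python sketch. The class name, function names, and all numeric values below are illustrative assumptions, not taken from the slides; only the three formulas come from the slide.

```python
from dataclasses import dataclass


@dataclass
class GateClass:
    """Estimation parameters for one class of gates (hypothetical container)."""
    d_int: float    # intrinsic gate delay (dint)
    k: float        # drive factor (k)
    a_int: float    # area line y-intercept (Aint)
    a_slope: float  # area line slope (Aslope)


def constant_area_delay(gate: GateClass, c_load: float) -> float:
    """Constant area model: the gate is fixed, so delay follows the load."""
    return gate.d_int + gate.k * c_load


def constant_delay_size(gate: GateClass, c_load: float, delay: float) -> float:
    """Constant delay model: the delay is fixed, so size follows the load."""
    if delay <= gate.d_int:
        raise ValueError("delay must exceed the intrinsic delay dint")
    return gate.k * c_load / (delay - gate.d_int)


def constant_delay_area(gate: GateClass, c_load: float, delay: float) -> float:
    """Area of the sized gate: Aint + Aslope * size."""
    return gate.a_int + gate.a_slope * constant_delay_size(gate, c_load, delay)


# Made-up parameters for a two-input NAND class, for illustration only.
nd2 = GateClass(d_int=0.5, k=2.0, a_int=1.0, a_slope=3.0)
print(constant_area_delay(nd2, c_load=1.0))             # delay at a fixed size
print(constant_delay_area(nd2, c_load=1.0, delay=1.5))  # area at a fixed delay
```

The guard on `delay <= d_int` reflects that the size formula has no finite solution once the target delay reaches the intrinsic delay.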


Zero Slack Algorithm

Given input arrival times {ai} and output required times {rk}, assign gate delays as follows:

  1. Initialize all internal required/arrival times to “unknown”

  2. Select the path(s) with the minimum value of (rk-ai)/lp, where lp is the length of the path in number of gates

    • For each node from primary inputs to primary outputs

      • Calculate all the (ai, li) pairs from all fanin edges

      • Discard dominated pairs, save the union of the undominated pairs

    • When all primary outputs are reached, calculate the minimum (rk-ai)/lp

  3. Assign the delay of each gate in the selected path(s) to this minimum

  4. Update arrival and required times for all fanin and fanout edges of the newly assigned delays

  5. Repeat steps 2-4 until all gates are assigned delays

Pair domination defined:

Pair (ai, li) dominates (aj, lj) if ai ≥ aj and li ≥ lj

[Figure: node n with fanin nodes n1 and n2 (arrival times a1, a2; path lengths l1, l2) and fanout nodes n3 and n4 (required times r3, r4; path lengths l3, l4)]

If either (a1, l1) or (a2, l2) dominates the other, the four possible paths through n can be reduced to two, since the dominated path is “faster” than necessary.
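The pruning of dominated pairs can be illustrated with a short Python sketch. It assumes that a pair dominates another when both its arrival time and its path length are at least as large, and that every path has non-negative slack (r - a ≥ 0), so that discarding dominated pairs cannot change the minimum of (r - a)/l. The function names and sample numbers are hypothetical.

```python
def dominates(p, q):
    """True if pair p = (a, l) dominates q: arrival and length both >= q's."""
    return p[0] >= q[0] and p[1] >= q[1]


def prune_dominated(pairs):
    """Keep only the undominated (arrival, length) pairs."""
    unique = set(pairs)
    return sorted(
        p for p in unique
        if not any(q != p and dominates(q, p) for q in unique)
    )


def min_delay_per_gate(pairs, required):
    """Smallest (r - a) / l over the given pairs at a primary output."""
    return min((required - a) / l for a, l in pairs)


# Made-up (arrival time, path length) pairs reaching one primary output.
pairs = [(0.0, 3), (1.0, 3), (2.0, 1), (0.5, 2)]
kept = prune_dominated(pairs)
print(kept)                            # the undominated pairs
print(min_delay_per_gate(kept, 10.0))  # per-gate delay for the selected path
```

Here (1.0, 3) dominates both (0.0, 3) and (0.5, 2), while (2.0, 1) survives because it arrives later than any other pair.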


Faster Approximation

Select an allowable slack threshold sthresh (if zero, the algorithm yields the same result as the zero-slack algorithm)

  1. Compute the forward level lj and arrival time aj of all nodes in the network using a forward trace

  2. Compute the reverse level kj and required time rj of all nodes in the network using a backward trace

  3. Update the delay of every node as dj = dj + (rj-aj)/(lj+kj)

  4. While the slack of any node exceeds sthresh, repeat steps 1-3
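The iteration above can be sketched in Python as follows. This is one possible reading, not the SIS implementation: it assumes lj counts gates on the longest path from the primary inputs up to and including node j, kj counts gates after node j on the longest path to an output, all delays start at zero, every gate without fanout is a primary output, and the constraints are feasible. The data layout and names are hypothetical.

```python
def fast_delay_assignment(fanins, arrivals, required,
                          s_thresh=1e-6, max_iters=1000):
    """Distribute slack over gate delays until every slack is <= s_thresh.

    fanins:   dict gate -> list of fanin names (gates or primary inputs),
              with gates listed in topological order
    arrivals: dict primary input -> arrival time
    required: dict primary output gate -> required time
    """
    gates = list(fanins)
    fanouts = {g: [] for g in gates}
    for g in gates:
        for f in fanins[g]:
            if f in fanouts:
                fanouts[f].append(g)

    delay = {g: 0.0 for g in gates}
    slack = {}
    for _ in range(max_iters):
        # Step 1: forward trace for level l_j and arrival time a_j.
        level, arr = {}, dict(arrivals)
        for g in gates:
            level[g] = 1 + max(level.get(f, 0) for f in fanins[g])
            arr[g] = delay[g] + max(arr[f] for f in fanins[g])
        # Step 2: backward trace for reverse level k_j and required time r_j.
        rlevel, req = {}, {}
        for g in reversed(gates):
            rlevel[g] = max((1 + rlevel[h] for h in fanouts[g]), default=0)
            times = [req[h] - delay[h] for h in fanouts[g]]
            if g in required:
                times.append(required[g])
            req[g] = min(times)
        slack = {g: req[g] - arr[g] for g in gates}
        if max(slack.values()) <= s_thresh:
            break
        # Step 3: d_j = d_j + (r_j - a_j) / (l_j + k_j).
        for g in gates:
            delay[g] += slack[g] / (level[g] + rlevel[g])
    return delay, slack


# Toy network: g1 = f(a), g2 = f(g1, b), g3 = f(b); outputs are g2 and g3.
fanins = {"g1": ["a"], "g2": ["g1", "b"], "g3": ["b"]}
delay, slack = fast_delay_assignment(
    fanins, arrivals={"a": 0.0, "b": 0.0}, required={"g2": 4.0, "g3": 4.0})
print(delay)
```

In this toy case the two-gate path shares its budget equally and the single-gate path receives the whole budget, after which all slacks are zero. A negative slack in the returned dictionary would indicate that the required times cannot be met.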


Incorporation into SIS

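[Flow diagram]

  • BLIF netlist → read_blif → technology-independent optimization (script.algebraic, script.boolean, etc.)

  • Traditional flow: technology library → read_library → technology-dependent optimization (map) → area

  • Estimation flow: technology library → manual analysis → estimation library → read_estim → fast delay/area estimation (estimate) → area/delay tradeoff curve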


Library Characterization

  • A commercial standard cell library may contain multiple gates that implement the same equation

  • Each gate in the library has characteristics:

    • Size

    • Delays from all input pins to the output pin for all transitions and several loads

    • Capacitance for all input pins

    • Maximum load

    • Area

  • We need estimation parameters for each class of gates (i.e., gates with the same equation):

    • Intrinsic gate delay (dint)

    • Drive factor (k)

    • Area line y-intercept (Aint)

    • Area line slope (Aslope)

    • Input capacitance line y-intercept (cint)

    • Input capacitance line slope (cslope)


Inverter Characterization (1)

  • Inverter delay scales linearly with load/size

    • Slope is k

    • Y-intercept is dint


Inverter Characterization (2)

  • Inverter area scales linearly with size

    • Slope is Aslope

    • Y-intercept is Aint
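Both trend lines can be obtained with an ordinary least-squares fit. The sketch below is a generic straight-line fit in plain Python, not the procedure actually used for the slides; the data points are made up so that the fitted parameters are easy to check, and the returned coefficient of determination shows how well a line describes the class.

```python
def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical inverter data: delay versus load/size, and area versus size.
load_over_size = [1.0, 2.0, 3.0, 4.0]
delays = [2.5, 4.5, 6.5, 8.5]
k, d_int, r2_delay = fit_line(load_over_size, delays)

sizes = [1.0, 2.0, 4.0]
areas = [4.0, 7.0, 13.0]
a_slope, a_int, r2_area = fit_line(sizes, areas)

print(k, d_int, r2_delay)
print(a_slope, a_int, r2_area)
```

With only two cells in a class the fit is exact by construction, so the coefficient of determination carries no information in that case.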


Characterization Issues

  • Requires at least two gates per class in the library

  • Additionally, some gates have poor accuracy (trend lines have poor coefficients of determination)

  • Further research shows the reason is CMOS implementation (below)

  • Future work might replace linear model with piece-wise linear model for more accuracy

[Figure: NAND-gate CMOS schematic for smaller sizes]

[Figure: NAND-gate CMOS schematic for larger sizes]


Estimation Library

  • These issues are evident in the table

    • OAI31 and OAI32 have Aslope of 0.0, meaning that the two cells in the library had the same area

    • NOR3, NOR4 had poor coefficients of determination

    • Many gates in the library had only one size


Estimation Modes

  • Sweep mode

    • User specifies a range of required times to sweep (possibly only one) and a step size

    • Estimation starts with the largest required time and steps down until the network fails the zero slack algorithm (i.e., negative slack is encountered)

  • Binary search mode

    • Used to find the minimum possible required time (period) given infinite area

    • Starts at a user-specified maximum and performs a binary search until a pass limit is reached
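The two modes can be sketched as thin drivers around an estimator. In the Python sketch below, `estimate_area` is a hypothetical callback that returns the estimated area for a required time, or `None` when negative slack is encountered; the toy estimator and its numbers are invented for illustration. The binary search additionally assumes that pass/fail is monotonic in the required time and that a required time of zero fails.

```python
def sweep(estimate_area, t_max, t_min, step):
    """Step the required time down from t_max, stopping at the first failure."""
    results = []
    steps = int(round((t_max - t_min) / step))
    for i in range(steps + 1):
        t = t_max - i * step
        area = estimate_area(t)
        if area is None:        # negative slack was encountered
            break
        results.append((t, area))
    return results


def min_required_time(estimate_area, t_max, passes):
    """Binary search for the smallest required time that still passes."""
    if estimate_area(t_max) is None:
        return None
    lo, hi = 0.0, t_max         # hi always passes; lo is assumed to fail
    for _ in range(passes):
        mid = (lo + hi) / 2
        if estimate_area(mid) is None:
            lo = mid
        else:
            hi = mid
    return hi


def toy_estimate(t):
    """Single constant-delay gate: dint = 0.5, k = 2, load 1, Aint = 1, Aslope = 3."""
    if t <= 0.5:
        return None
    return 1.0 + 3.0 * (2.0 * 1.0 / (t - 0.5))


curve = sweep(toy_estimate, t_max=2.5, t_min=0.5, step=0.5)
t_min_found = min_required_time(toy_estimate, t_max=4.0, passes=20)
print(curve)
print(t_min_found)
```

The sweep yields the points of an area/delay tradeoff curve, while the binary search closes in on the intrinsic delay of the toy gate from above, where the area grows without bound.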


Experimentation

  • Various sized combinational logic benchmarks

    • MCNC c17, c880, c1908, c3540

  • Various sized sequential logic benchmarks

    • Interpretation of required time is clock period (assuming all flip-flops are clocked synchronously)

    • MCNC s713, s838, s953, s1196, s1238, s1423

  • Tested four scripts

    • script.none (no optimization), script.algebraic, script.boolean, script.rugged


Tradeoff Curves

  • Sweep mode allows multiple required times (clock periods) to be easily tabulated


Sensitivity to Optimization Script

  • When delay is non-critical (i.e., as required time approaches infinity)

    • Area within 20% of no optimization

    • Variation between optimization scripts mostly under 10%


Conclusions

  • Sometimes more optimization yields worse results

  • As required times become smaller, more paths become critical requiring larger sizes (area)

    • Area increases quickly before failure

  • From the benchmarks shown, estimation is relatively insensitive to technology independent optimization with infinite required times


Possible Future Work

  • Accuracy

    • Relate estimated areas to actual areas from a good mapping using the full technology library

    • Use more complex delay equations to handle different rise/fall times

    • Modify the algorithm to handle the case where a primary input cannot drive the required load

  • Characterization

    • Revise characterization to support piece-wise linear functional forms

    • Automate process so only the actual technology library is required as an input

  • Mapping

    • Examine how various mapping options affect estimation

    • Use buffered fanout trees (Touati) after sizing gates

  • Speed

    • Compare speed of total estimation procedure to traditional flow

  • Power estimation

