Traditional SOC Design Flow

- Key problem: the timing assumed during pre-layout synthesis differs widely from the post-layout reality.
- This happens because interconnect delay dominates the overall propagation delay in DSM (Deep Sub-Micron) technologies.
- As a result, achieving timing closure becomes a challenge.

Source: Advanced ASIC Chip Synthesis. 2nd Ed. Himanshu Bhatnagar. Kluwer Academic Publishers

[Figure: synthesis flow and the Design Compiler commands used at each step]

- Develop HDL files

- Specify libraries (library objects)
- link_library, target_library, symbol_library, synthetic_library

- Read design
- analyze, elaborate, read_file

- Define design environment
- set_operating_conditions, set_wire_load_model, set_drive, set_driving_cell, set_load, set_fanout_load, set_min_library

- Set design constraints
- Design rule constraints: set_max_transition, set_max_fanout, set_max_capacitance
- Design optimisation constraints: create_clock, set_clock_latency, set_propagated_clock, set_clock_uncertainty, set_clock_transition, set_input_delay, set_output_delay, set_max_area

- Select compile strategy
- Top down, bottom up

- Optimize the design
- compile

- Analyze and resolve design problems
- check_design, report_area, report_constraint, report_timing

- Save the design database
- write

- .synopsys_dc.setup
- Library paths
- Company-wide and project-wide design-environment variables and commands
- UNIX variables

- Three files at three locations. All three are read, in the following order:
- Synopsys root - $SYNOPSYS/admin/setup
- Affects all users. Only the system administrator can modify it. In a small startup with a single ASIC project, this is the place to enforce project-wide discipline.

- Home directory
- Its content affects all DC activities. Project-wide enforcement could happen at this level if the designer is involved in a single project (less likely).

- Working directory
- Affects only the current invocation of DC. If a person is working on more than one Synopsys project (more likely), project-wide enforcement should happen at this level, with one working directory per project.

- Repeated commands are overridden: a setting in a file read later replaces the same setting from a file read earlier.

Technology Library

Created by the ASIC vendor in Synopsys format, which is now an open standard.

Cells are defined by their names, function, timing, net delay and parasitic information, along with units for time, resistance, capacitance etc.

Target Library

The technology library that Design Compiler maps to during optimization.

Link Library

The technology library that contains the definitions of the cells used in the mapped design. In principle it should be the same as target_library unless a technology translation is being performed.

- Symbol Library
Definition of graphics symbols. Cells in Symbol Library must match

- DesignWare Library
A DesignWare component library is a collection of reusable circuit-design building blocks that are tightly integrated into the Synopsys synthesis environment.

- GTECH Library
The GTECH library is the Synopsys generic technology library. It is technology-independent and included with Design Compiler software.

GTECH parts are Synopsys unmapped representations of Boolean functions (library cell placeholders). GTECH instantiation allows for a technology-independent HDL description and the accuracy of instantiation.

- Search_path
If the library variables only specify file names, search_path is used to locate libraries. By default points to current working directory and $SYNOPSYS/libraries/syn

- Design
A circuit that performs one or more logical functions

- Cell
An instance of a design or library primitive within a design

- Reference
The name of the original design that a cell instance points to

- Port
The input or output of a design

- Pin
The input or output of a cell

- Net
A wire that connects ports to ports, ports to pins, or pins to pins

- Clock
A timing reference object to describe a waveform for timing analysis

Read about these commands from Synopsys Documentation

Find and Filter

Read / Analyze / Elaborate

Compile

Report_timing

Also read about what are Attributes and Variables

Synopsys Design Environment Essentials

CMOS essentials for logic synthesis

Constraint Classification

Load and Drive Constraints

Clocking constraints

Operating Conditions Constraints

Static Timing Analysis

Chip Level Timing and Multiple Clock Domains

Source: MIT. Course 6.375. Lecture L06. 2006

This is also known as Elmore Delay model

The width of a transistor is found by multiplying its scaling factor (16/8/2/1) by the minimum transistor width, which is 0.5 µm.

Multiply Cg,N/Cg,P/Cd,N/Cd,P by the width of the transistor to get the gate/drain capacitances of the N and P transistors.

A wider transistor has more capacitance.

Divide Reff,N/Reff,P by the width of the transistor to get the resistance of the N and P transistors.

A wider transistor has less resistance.

The factor 2.2 comes from the 10% to 90% Vdd swing:

loge(0.9Vdd / 0.1Vdd) = loge(9) ≈ 2.2

The sheet resistance (0.07) is per unit square.

Since the wire width is 0.25 µm, the resistance of a 1 µm × 0.25 µm piece of wire is 0.07/0.25. This factor is multiplied by the wire length, 250 µm.

The wire capacitance is made up of two parts. The bottom (area) capacitance is found using 250 × 0.25 (area) × CA,M2.

The side capacitance is found by multiplying the length, 250, by CL,M32.
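The arithmetic in these notes can be checked with a short sketch. It only covers the two quantities whose numbers appear in the text (the 2.2 factor and the wire resistance); the function names are hypothetical, and the sheet resistance is assumed to be in ohms per square with lengths in µm.

```python
import math


def swing_factor(low=0.1, high=0.9):
    """Time, in units of RC, for a first-order RC node to charge
    from low*Vdd to high*Vdd (ln(0.9/0.1) for the 10%-90% swing)."""
    return math.log((1.0 - low) / (1.0 - high))


def wire_resistance(sheet_res, length, width):
    """Resistance of a rectangular wire: sheet resistance times the
    number of squares (length / width). Units are assumed consistent."""
    return sheet_res * (length / width)


# Values quoted in the notes: 0.07 per square, 250 um long, 0.25 um wide.
print(round(swing_factor(), 2))
print(wire_resistance(0.07, 250.0, 0.25))
```

The wire is 1000 squares long, so its resistance comes out at roughly 70 in the units of the sheet resistance. The capacitance terms are left out because the values of CA,M2 and the side-capacitance coefficient are not given in the text.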

- Optimisation Constraints
- Performance – clock
- Area
- Power

- Technology, Operating and Manufacturing Constraints
- Max rise time, max capacitance
- Operating Conditions –
- Vdd, Temperature
- Drive current, Load

- Process Variations
- Fast corner, Slow corner

- Physical Design
- Antenna rules

[Figure: design loop. Design creates a solution subject to the technology, operating and manufacturing constraints and the optimisation constraints; analysis evaluates the solution until the constraints are met.]

- Exhaustively verifies that
- the timing constraints (clock) are met for a design
- for given technology (Standard Cell Library) and
- a set of specified operating conditions

- Limitations of the alternative – Simulation
- Not Exhaustive
- Accuracy
- RTL
- Gate Level
- SDF back annotation
- Dependent on STA

- Circuit-level SPICE simulation is impractical

- Time (STA also takes time, but is bounded)

PROCESS (clk)
BEGIN
  IF rising_edge (clk) THEN
    s <= a * b;  -- registered multiply: s is updated on each rising edge of clk
  END IF;
END PROCESS;

- Untimed
- Transaction Level - SystemC
- Multiple Cycles
- Bus Transactions, Transmit/Receive, Encode/Decode

- Cycle Accurate – RTL
- What happens in each clock cycle is accurately known

- Gate Level – Event Driven
- Physical details of computation, storage and interconnect operations known
- Delay in wire is not known
- Clock is ideal

- Layout Level
- Delay in wire known
- Clock is real
- Relative position of standard cell is known

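[Figure: gate with inputs A = 1 and B and output Z. Waveforms of B and Z swing to Vdd and are annotated with the 0.3 Vdd, 0.5 Vdd and 0.7 Vdd thresholds (points P, Q, R and x, y, z) at times t1 and t2.]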

- The intrinsic delays and the slews are characterised using SPICE simulation by sweeping the many parameters that affect the intrinsic delay and slew
- All the paths are exhaustively covered

[Figure: delay computation loop in timing analysis. Inputs: library and design, and environment conditions for analysis. Steps: (A) delay computation through gate, (B) delay and slew at gate output, (C) delay computation through wire, (D) delay and slew at next gate input.]

- Paths
- Start point: Input ports or clock pins of sequential devices and
- End point: Output ports or Data input pins of sequential devices.

- Paths are organised in groups identified by clocks controlling their endpoints.

- positive unate timing arc:
- Combines rise delays with rise delays, and fall delays with fall delays. An example is an AND gate cell delay or an interconnect (net) delay.

- negative unate timing arc:
- Combines incoming rise delays with local fall delays, and incoming fall delays with local rise delays. An example is a NAND gate.

- nonunate timing arc:
- Combines local delay with the worst-case incoming delay value. Nonunate timing arcs are present in logic functions whose output value change cannot be predicted by the direction of the change on the input value. An example is an XOR gate.
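The three arc types above can be sketched as a small propagation rule. This is a minimal illustration, not how any particular timing tool is implemented; the function name, the string labels and the tuple layout are assumptions made for the example.

```python
def propagate_arrival(sense, in_rise, in_fall, arc_rise, arc_fall):
    """Combine incoming rise/fall arrival times with the local rise/fall
    delays of one timing arc and return (out_rise, out_fall)."""
    if sense == "positive_unate":
        # Rise pairs with rise, fall pairs with fall (AND gate, net).
        return in_rise + arc_rise, in_fall + arc_fall
    if sense == "negative_unate":
        # Output rises when the input falls, and vice versa (NAND gate).
        return in_fall + arc_rise, in_rise + arc_fall
    if sense == "non_unate":
        # Direction cannot be predicted (XOR gate): use the worst input.
        worst = max(in_rise, in_fall)
        return worst + arc_rise, worst + arc_fall
    raise ValueError(f"unknown timing sense: {sense}")


# Input rises at 1.0 and falls at 2.0; the arc adds 0.5 (rise) and 0.25 (fall).
for sense in ("positive_unate", "negative_unate", "non_unate"):
    print(sense, propagate_arrival(sense, 1.0, 2.0, 0.5, 0.25))
```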

- Accuracy of estimates is critical
- Intrinsic Delays are accurate after logic synthesis
- Slew and Net Delays are estimated and known accurately only after physical synthesis

Discrete factors:

- Geometry and dimension
- Specific path
- Transition direction
- Related pin

[Figure: 4 Input NAND gate, with transistors P1, P2, N1 and N2, inputs A and B, and output Z]

- Load on the Gate
- Load of all the inputs that this output has to drive
- Load of the interconnect wires
- Tri-stated wires

- Input Slew
- Transition time at the previous gate
- The interconnect
- Primary input – drive strength, driver cell

- Technology Constraints
- Max Transition
- Max Fanout
- Max Capacitance
- Min Capacitance

- Design Constraints
- Set Load
- Set Drive (inverse of resistance)

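[Figure: gates with input A and outputs Z1, Z2 and Z3, annotated with set_driving_cell or set_drive on the input side and set_load on the output side. Labels: technology constraint (cannot be relaxed) and design constraint.]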

- If drive or driving cell is not specified, the synthesis tool assumes infinite drive strength
- If load is not specified, the synthesis tool assumes zero load

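Piece Wise Linear Model

[Figure: delay table indexed by load and slew. The delay D at load L and slew S is interpolated from the table entries D11, D12, D21 and D22 at loads L1, L2 and slews S1, S2, via the intermediate values D1 and D2.]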
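A piece-wise linear delay model of this kind can be sketched as a two-step linear interpolation between four table entries. The index convention used here (first index for slew, second for load, so D12 is the delay at slew S1 and load L2) is an assumption, as is the function name.

```python
def interpolate_delay(s, l, s1, s2, l1, l2, d11, d12, d21, d22):
    """Interpolate the delay at slew s and load l from the four
    surrounding table entries (assumed layout: d<slew index><load index>)."""
    # Step 1: interpolate along the load axis at each tabulated slew.
    t = (l - l1) / (l2 - l1)
    d1 = d11 + t * (d12 - d11)   # delay at (s1, l)
    d2 = d21 + t * (d22 - d21)   # delay at (s2, l)
    # Step 2: interpolate between the two results along the slew axis.
    u = (s - s1) / (s2 - s1)
    return d1 + u * (d2 - d1)


# Midpoint of a table cell with made-up entries.
print(interpolate_delay(1.0, 2.0, 0.0, 2.0, 0.0, 4.0, 1.0, 3.0, 2.0, 4.0))
```

At a table corner the function returns the tabulated entry itself, and inside the cell it varies linearly along each axis.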

[Figure: delay as a function of process, temperature and voltage, each curve marked with best, nominal and worst cases. Together these define the operating conditions.]

Name    Library  Process  Temp  Volt  Interconnect Model
WCCOM   my_lib   1.50     70    1.1   worst_case_tree
WCIND   my_lib   1.50     80    1.1   worst_case_tree
WCMIL   my_lib   1.50     125   1.0   worst_case_tree
BCCOM   my_lib   1.50     0     1.2   best_case_tree
BCIND   my_lib   1.50     -40   1.2   best_case_tree
BCMIL   my_lib   1.50     -55   1.3   best_case_tree

Consider a minimum-size NMOS device in a 1.2 µm CMOS process, with VGS = VDS = 5 V.

The nominal saturation current for the device size W = 1.8 µm, Leff = 0.9 µm

Now consider the variation in the following parameters:

- 25% variation in threshold voltage, Vt
- 10% variation in transconductance k'n, mainly due to variation in oxide thickness
- ±0.15 µm (about 10%) variation in W and L. Variations in W and L are uncorrelated as they are
- ±0.5 V (10%) variation in power supply voltage

The speed of a device is proportional to its drain current, so these variations translate into variation in the speed of the circuit.

Libraries are characterized for various operating conditions

Further characterisation is done to see how the delay model responds to change in process, voltage and temperature. This is done by holding two parameters constant and sweeping the third.

This yields derating factors for Process, Voltage and Temperature

Timing checks are defined on the timing relationship between:

- Two input pins
- Setup
- Hold
- Recovery
- Removal

- Two consecutive events on the same input pin
- Pulse width
- Width of the high and low phases of clocks
- Width of the active level of asynchronous inputs such as reset

[Figure: pulse width requirement on rst_n. If not met, the reset may have no effect.]

Data should be stable for the setup time before the arrival of the clock edge.

What happens if the setup time is violated?

[Figure: setup requirement between data and clk. If not met, new data may not get latched.]

Data should be stable for the hold time after the arrival of the clock edge.

What happens if the hold time is violated?

[Figure: hold requirement between clk and data. If not met, old data may not get latched.]

Recovery: minimum time between the de-assertion of an asynchronous control signal and the next active clock edge. It can be formulated as a setup check.

Removal: minimum time after an active clock edge for which an asynchronous control signal should remain asserted. It can be formulated as a hold check.

[Figure: recovery requirement between rst_n and clk. If not met, clk may not have effect.]

[Figure: removal requirement between clk and rst_n. If not met, clk may override rst_n.]

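[Figure: two cross-coupled inverters (Vin2 = Vout1, Vin1 = Vout2) and their voltage transfer characteristics plotted on the axes Vin1, Vout2 and Vin2, Vout1, with the intersection points a, b and c.]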

http://www.edn.com/design/analog/4371393/Understanding-the-basics-of-setup-and-hold-time

The time it takes data D to reach node Z is called the setup time.

The time it takes data D to reach node W is called the hold time.

Setup Constraint

Assume C1 is zero.

clk reaches F1 before the data has arrived at F1, so F1 registers the wrong data.

To avoid this, the data should stabilize D1 before the arrival of clk.

In reality C1 is never zero, so the data should stabilize D1 - C1 before the arrival of clk.

As there are multiple D1 paths and multiple C1 paths, the complete and safe setup constraint is max (data path delays) - min (clock path delays).

[Figure: boundary of the flop. data reaches the latching element F1 through delay D1, and clk reaches it through delay C1.]

Hold Constraint

Assume D1 is zero.

Data reaches F1 before clk has arrived at F1. When clk arrives, new data has already overwritten the previous data.

To avoid this, the data should remain stable for C1 after the arrival of clk.

In reality D1 is never zero, so the data should remain stable for C1 - D1 after the arrival of clk.

The complete and safe hold constraint is max (clock path delays) - min (data path delays).

Typically clock paths are well buffered and faster.

There can be substantial data path delay, especially in scan flops.

max (data path delays) - min (clock path delays) is always positive. This implies that the setup constraint is never negative.

max (clock path delays) - min (data path delays) can be negative. This implies that the hold constraint can be negative.

[Figure: boundary of the flop, with delay D1 on data and delay C1 on clk in front of F1. Waveforms of clk and data (Stable, New) are shown at the device interface and at the latching element. Caption: negative hold, seen at the device interface.]

Setup + Hold (cannot be negative) = max (clock path delays) + max (data path delays) - min (clock path delays) - min (data path delays)
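The setup and hold constraints at the flop boundary can be worked through numerically. The sketch below assumes the constraints are exactly max(data) - min(clock) for setup and max(clock) - min(data) for hold, as stated in these notes; the function name and the delay values are made up for illustration.

```python
def interface_constraints(data_path_delays, clock_path_delays):
    """Return (setup, hold) constraints seen at the device interface,
    given the internal data-path and clock-path delays to the flop."""
    setup = max(data_path_delays) - min(clock_path_delays)
    hold = max(clock_path_delays) - min(data_path_delays)
    return setup, hold


# Hypothetical delays: data paths of 3 and 5 units, clock paths of 1 and 2.
setup, hold = interface_constraints([3, 5], [1, 2])
print("setup =", setup, "hold =", hold, "setup + hold =", setup + hold)
```

With these numbers the hold constraint is negative while the sum of setup and hold stays positive, which matches the observation that only the sum is bounded below by zero.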

Good design practice mandates that inBlock does not have combinatorial logic ("m") driving its outputs.

These days "m" is more likely to be the result of global interconnect delay.

Early floorplanning is a good way to estimate the delay due to "m".

If floorplanning is not done, a good bet is 50-60% of the clock cycle.

The characterize command automatically calculates the input delay from the parent design.

set_input_delay -clock Clock 8 "data_in_2"

set_output_delay -clock Clk -max -fall 10 {"Z<0>" "Z<1>"}

[Figure: timing paths through a design clocked by clk, with input ports I1 and I2, flops F1, F2 and F3, combinational blocks C0 to C4, and output ports O1 and O2]

Four kinds of path groups exist:

- Input to output, e.g. I2 to O2
- Input to register, e.g. I1 to F1
- Register to register, e.g. F1 to F2
- Register to output, e.g. F3 to O1

Notation:

- TI1 and TI2 are input delays
- DQ1, DQ2 and DQ3 are clk-to-Q delays
- S1, S2 and S3 are setup constraints
- H1, H2 and H3 are hold constraints
- C0 to C4 are combinatorial delays
- P is the clock period

Input to output (I2 to O2):

O2 = TI2 + C4

Input to register (I1 to F1):

TI1 + C0 ≤ P - S1
TI1 + C0 ≥ H1
Setup slack: P - S1 - TI1 - C0
Hold slack: TI1 + C0 - H1

Register to register (F1 to F2):

DQ1 + C1 ≤ P - S2
DQ1 + C1 ≥ H2
Setup slack: P - S2 - DQ1 - C1
Hold slack: DQ1 + C1 - H2

Setup and hold slacks should be positive.
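The slack expressions above share one shape: the launch delay (an input delay or a clk-to-Q delay) plus the combinatorial delay is compared against the clock period minus the setup constraint, and against the hold constraint. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def path_slacks(period, launch_delay, comb_delay, setup, hold):
    """Setup and hold slack for a path ending at a flop.

    launch_delay is the input delay (input-to-register path) or the
    clk-to-Q delay of the launching flop (register-to-register path).
    """
    arrival = launch_delay + comb_delay
    setup_slack = period - setup - arrival
    hold_slack = arrival - hold
    return setup_slack, hold_slack


# Hypothetical register-to-register path: P = 10, clk-to-Q = 1,
# combinatorial delay = 6, setup = 2, hold = 1 (all in the same unit).
setup_slack, hold_slack = path_slacks(10, 1, 6, 2, 1)
print(setup_slack, hold_slack)
print("timing met:", setup_slack >= 0 and hold_slack >= 0)
```

A single combinatorial delay is used for both checks here, as in the notes; a real analysis would use the maximum delay for the setup check and the minimum delay for the hold check.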

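[Figure: the gate-level design and the timing library feed the timing analysis tool, which writes an SDF file; the simulator reads the gate-level design, the simulation library and the SDF file.]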

Clock Skew in Alpha Processor

The basic assumption in a synchronous system is that all the sequential elements in the design sample their inputs at the same time, marked by a clock signal. In reality, the clock signal does not arrive at the sequential elements at the same time. The difference in time between the reference clock signal and the local clock signal at a sequential element is called the clock skew.

In fact, clock skew would not be a problem if the clock signal were uniformly delayed at all the sequential elements. It is the non-uniform delay of the clock signal that creates the problem. The delay depends on the distance of the sequential element from the clock source and on the local load.

The primary reason for the delay is the large load seen by the clock signal. The load consists of all the sequential elements in the design and the clock net itself, which behaves as a distributed RC line (or a higher-order model) and can be several centimetres long in a large chip.

The total capacitance of a single clock line easily measures hundreds of pF and can reach into the nF range. The total clock capacitance of the Alpha processor equals 3.25 nF, which is 40% of the total switching capacitance of the entire chip.

Each synchronous module is composed of combinational logic CL and a flop, and is characterised by six timing parameters: the min. and max. propagation (pg.) delays of the register, tr,min and tr,max; the min. and max. propagation delays of the combinational logic, tl,min and tl,max; the propagation delay of the interconnect, ti; and the local clock skew, tφ.

The max. pg. delay corresponds to the time taken by the slowest output to respond to any transition at the input. This delay constrains the maximum allowable clock speed.

The min. pg. delay corresponds to the time taken by at least one output to start responding to a transition at the input. This delay is typically much smaller than the max. delay and determines the amount of skew a circuit can tolerate before a race condition occurs. If δ is greater than tr,min + ti + tl,min, then the inputs at R2 can change before the previous inputs are latched.

With δ = tφ'' - tφ' denoting the skew between the two registers, the race condition is avoided when

tφ'' ≤ tφ' + tr,min + ti + tl,min, or equivalently δ ≤ tr,min + ti + tl,min

and the clock period T must satisfy

tφ'' + T ≥ tφ' + tr,max + ti + tl,max, or equivalently T ≥ tr,max + ti + tl,max - δ

- Positive skew: δ > 0
- In this case the clock is routed in the same direction as the data, and the first inequality needs to be satisfied. Violating it will result in malfunctioning of the circuit. Observe that lengthening the clock period does not help. Positive skew actually helps improve the clock speed, as it appears as a negative term in the constraint on the clock period T.

- Negative skew: δ < 0
- Negative skew occurs when the data is routed in the direction opposite to the clock signal. The first inequality is unconditionally satisfied and the circuit works correctly independent of the skew. Unfortunately, negative skew limits the clock speed and thus lowers the performance, as predicted by the second inequality: the skew reduces the time available for computation by |δ|.
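The effect of positive and negative skew can be tried out numerically. The sketch assumes the two constraints read skew ≤ tr,min + ti + tl,min (race check) and T ≥ tr,max + ti + tl,max - skew (period check); the function name and all delay values are hypothetical.

```python
def skew_checks(skew, tr_min, tr_max, ti, tl_min, tl_max):
    """Return (race_free, min_period) for a given clock skew.

    race_free  : True when skew <= tr_min + ti + tl_min
    min_period : smallest clock period T with T >= tr_max + ti + tl_max - skew
    """
    race_free = skew <= tr_min + ti + tl_min
    min_period = tr_max + ti + tl_max - skew
    return race_free, min_period


# Hypothetical delays: register 1..2, interconnect 1, logic 1..6.
for skew in (1, -1, 4):
    print(skew, skew_checks(skew, 1, 2, 1, 1, 6))
```

With these numbers a skew of +1 shortens the minimum period to 8, a skew of -1 lengthens it to 10, and a skew of +4 exceeds the minimum path delay of 3, so the race check fails regardless of the clock period.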

[Figure: launch clock and capture clock waveforms with data events a, b, c and d, shown for three cases: (1) setup time met, hold time met; (2) setup time violated, hold time violated; (3) setup time violated, hold time met.]

[Figure: a path from the startpoint FF 1 through logic to the endpoint FF 2, showing the setup relationship and the hold relationship.]

Setup Violations result from worst case timing

Hold Violations result from best case timing

[Figure: two views of a chip floorplan with blocks 1 to 8 and a CGU.]

Blocks 4 and 8 communicate and need their clocks to be skew aligned.

The data signals between blocks 4 and 8 could take more than one clock cycle and can get routed through blocks 5 and 6.

This makes chip-level timing closure difficult and sensitive to geometry.

A hierarchical design style is preferable, in which each chiplet is timing closed independently and the chip is composed from such chiplets. Solution: latency-insensitive design.

[Figure: classification of synchronization schemes into data-based and clock-based synchronization, compared by latency ambiguity, constraints and complexity. Schemes shown: GS, double latch, GALS, handshake (2-phase, 4-phase), GRLS (KTH technology), asynchronous 2-clock FIFO.]

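[Figure: double-latch synchronizer. The payload PS from the source S (clock CLKs) crosses the ACL and is sampled by D/Q flops clocked by CLKD to give PD at the destination D.]

ACL: Asynchronous Communication Link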

- Advantages
- Good choice for single-bit control data
- Gray-coded multi-bit data payloads are also suitable

- Disadvantages
- No flow control: send and forget
- A metastable signal fanned out to multiple targets could resolve to different values
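Gray-coded payloads suit this scheme because consecutive values differ in a single bit, so a sample taken during a transition resolves to either the old or the new value. A minimal sketch of that property, using the standard binary-reflected Gray code (the helper names are hypothetical):

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Going from 7 to 8 flips four bits in plain binary but one bit in Gray code.
print(bits_changed(7, 8), bits_changed(to_gray(7), to_gray(8)))
```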

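[Figure: handshake synchronizer across the ACL. The source S (clock CLKs, payload Ps) and the destination D (clock CLKD, payload PD) each contain an FSM; the signals RS, RD, AS and AD cross between the two sides through D/Q flops.]

Pd: Destination Payload

Ps: Source Payload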

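The data payload rate is limited by the worst-case round-trip delay of the flow control: a new payload cannot be sent until the previous handshake has completed. Ts and Td are the source and destination clock periods.

2-phase

3Ts + 3Td ≥ TPs

4-phase

6Ts + 6Td ≥ TPs

Example:

Source: 27 MHz, Destination: 200 MHz

Maximum isochronous data rate using the 2-phase protocol:

3 × (37 ns) + 3 × (5 ns) = 126 ns, i.e. about 7.9 MHz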

[Figure: handshake timing. The 2-phase round trip takes 3Ts + 3Td and the 4-phase round trip takes 6Ts + 6Td; TPs is the period for which the data remains valid/asserted.]

Note that TPs does not decide the data payload frequency. TPs is less than the round-trip delay so that the next payload can be transferred immediately after the round trip is over.

The period TPL corresponding to the data payload frequency has to be at least the worst-case round-trip delay, i.e. 3Ts + 3Td ≤ TPL for the 2-phase protocol and 6Ts + 6Td ≤ TPL for the 4-phase protocol. This is illustrated by the 27 MHz / 200 MHz example above.
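The round-trip arithmetic can be reproduced with a few lines of code. The sketch assumes the round trip costs 3 source plus 3 destination clock periods for the 2-phase protocol and 6 plus 6 for the 4-phase protocol, as stated in these notes; the function name is hypothetical.

```python
def max_payload_rate_hz(f_src_hz, f_dst_hz, phases):
    """Upper bound on the payload rate for a 2- or 4-phase handshake."""
    cycles = {2: 3, 4: 6}[phases]           # clock periods per side, per round trip
    t_src = 1.0 / f_src_hz                  # source clock period Ts
    t_dst = 1.0 / f_dst_hz                  # destination clock period Td
    round_trip = cycles * t_src + cycles * t_dst
    return 1.0 / round_trip


# Source at 27 MHz, destination at 200 MHz.
for phases in (2, 4):
    rate = max_payload_rate_hz(27e6, 200e6, phases)
    print(f"{phases}-phase: {rate / 1e6:.2f} MHz")
```

For the 2-phase case this gives a little over 7.9 MHz, in line with the 126 ns round trip quoted in the example, and the 4-phase rate is half of that.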

- Fail Safe, Self Correcting:
- Write logic could think the FIFO is full when it is not
- Read logic could think that the FIFO is empty when it is not

- Not suitable for Island hopping:
- Storage in Write Island is a problem
- Typically the read side needs to be read every cycle

Source: ETH, Zurich

- Synchronous design – phase and skew aligned
- Mesochronous design – same clock frequency and phase aligned
- Ratiochronous design (KTH research)
- Different clock frequencies with a rational relationship – phase aligned

- Plesiochronous
- No rational clock relationship – phase relationship drifts

- Asynchronous
- During the initial phase of synthesis the clock is ideal
- The set_auto_disable_drc_nets command should be used to prevent DC from wasting time fixing DRC violations on high-fanout nets such as resets and clocks
- Model skew and jitter effects using the set_clock_uncertainty command
- Model clock network latency using the set_clock_latency command
- Once the clock tree has been inserted, use the set_propagated_clock command to use the actual clock. Back annotation using the read_sdf command is required