
Logic Decomposition of Asynchronous Circuits Using STG Unfoldings


Victor Khomenko

School of Computing Science,

Newcastle University, UK

- The traditional synchronous (clocked) designs lack flexibility to cope with contemporary design technology challenges

Asynchronous circuits – no clocks:

- Low power consumption and EMI
- Tolerant of voltage, temperature and manufacturing process variations
- Modularity – no problems with the clock skew and related subtle issues

[ITRS’09]: 22% of designs will be driven by ‘handshake clocking’ in 2013, and 40% in 2020

- Synthesis algorithms are complicated
- Computationally hard to synthesize efficient circuits

- Logic decomposition is one of the most difficult tasks in logic synthesis
- The quality of the resulting circuit (in terms of area and latency) depends to a large extent on the way logic decomposition was performed

[Figure: gate model, an instant evaluator of the function F followed by a delay]

- Gates are atomic (so no internal hazards)
- Gates’ delays are positive and unbounded (and perhaps variable)
- Wire delays are negligible (SI) or, alternatively, wire forks are isochronic (QDI)

[Figure: logic decomposition of the gate F into a gate G and gates H1, …, Hk, each with its own delay]

[Figure: VME bus controller connected to the Bus, the Device and the Data Transceiver through the signals dsr, dtack, lds, ldtack and d, together with its STG (transitions dsr±, dtack±, lds±, ldtack±, d±, csc±)]

This gate may not be in the gate library, in which case it has to be decomposed

[Figure: VME bus controller circuit (signals d, lds, dtack, dsr, csc, ldtack) decomposed using an internal signal x; two events in its STG are marked "Unexpected!"]

Insert a new signal dec whose implementation is [dec] = ldtack + csc

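[Figure: STG of the VME bus controller with the new transitions dec+ and dec- inserted]

[Figure: the corresponding circuit with signals d, lds, dtack, dsr, csc, dec and ldtack; annotation: multiway acknowledgement]

[Figure: two further circuits over the signals d, lds, dtack, dsr, csc and ldtack, the second one using a C-element]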

Only possible because there is no globally reachable state at which dsr=ldtack=0 and csc=1

State Graphs:

- Relatively easy theory
- Many algorithms
- Not visual
- State space explosion problem

Unfoldings:

- Alleviate the state space explosion problem
- More visual than state graphs
- Proven efficient for model checking
- Quite complicated theory
- Not sufficiently investigated
- Relatively few algorithms

Function-guided signal insertion

forever do
    for all non-input signals x do
        S[x] ← ∅
        for all G ∈ {latches, gates} do
            S[x] ← S[x] ∪ decompositions(x, G)
        bestH[x] ← best SI candidate in S[x]
    if for each x, bestH[x] is implementable
        Library matching
        stop
    if for each x, bestH[x] = UNDEFINED
        fail
    H ← the most complex bestH
    Insert a new signal z implementing H into the STG

[Cortadella et al, ’99]
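The loop above can be sketched on a toy model. Everything in this sketch is an assumption made for illustration: each non-input signal is an AND of a set of inputs, the library contains only AND gates with at most two inputs, every candidate is treated as speed-independent unless `is_si` says otherwise, "best" simply means lexicographically first, and inserting a signal is a plain substitution rather than an STG transformation. The names `candidates`, `decompose`, `MAX_FANIN` and `z0`, `z1`, … are hypothetical.

```python
from itertools import combinations, count

MAX_FANIN = 2  # assumed library: AND gates with at most two inputs


def candidates(func):
    """Yield (inputs, parent_size, whole) candidates for one AND function."""
    if len(func) <= MAX_FANIN:
        yield (tuple(sorted(func)), len(func), True)   # fits the library as is
    else:
        for pair in combinations(sorted(func), MAX_FANIN):
            yield (pair, len(func), False)             # a divisor to extract


def decompose(circuit, is_si=lambda x, h: True, max_rounds=100):
    """Mutate circuit (signal -> set of AND inputs); return True on success."""
    fresh = (f"z{i}" for i in count())
    for _ in range(max_rounds):
        best = {}
        for x, func in circuit.items():
            si = [h for h in candidates(func) if is_si(x, h)]
            best[x] = min(si) if si else None          # None plays UNDEFINED
        if all(h is not None and h[2] for h in best.values()):
            return True                                # library matching would follow
        if all(h is None for h in best.values()):
            return False
        # Most complex = the candidate coming from the largest function.
        inputs, _, whole = max((h for h in best.values() if h is not None),
                               key=lambda h: h[1])
        if whole:
            return False                               # nothing left to extract
        z = next(fresh)
        for x, func in list(circuit.items()):
            if set(inputs) < func:                     # proper subset: substitute z
                circuit[x] = (func - set(inputs)) | {z}
        circuit[z] = set(inputs)
    return False


circuit = {"y": {"a", "b", "c", "d"}}
ok = decompose(circuit)
print(ok, circuit)
```

In this toy run a four-input AND is split by two successive insertions. A real implementation has to check speed-independence on the STG for every candidate, which is where the cost of the method lies.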

Problem: given a Boolean function F, insert a new signal dec (i.e. a set of new transitions labelled dec+ or dec-) with the implementation [dec] = F into the STG. Only the unfolding prefix (rather than the state graph) may be used.

Sequential pre-insertion

Sequential post-insertion

Concurrent insertion

- Validity criteria: safeness & bisimilarity
  - can be checked before the transformation is performed, i.e. on the original prefix (to avoid backtracking)
- Perform the insertion directly on the prefix
  - avoid re-unfolding
- good for visualization (re-unfolding can dramatically change the look of the prefix)
- Can transfer some information between the iterations of the algorithm

- The suite of transformations is good in practice for resolution of encoding conflicts

The suite of transformations is not sufficient for logic decomposition; intuitively:

- only a linear (in the PN size) number of sequential pre- and post-insertions (assuming that the pre- and postset sizes are bounded)
- only a quadratic (in the PN size) number of concurrent insertions
- an exponential number of ‘cuts’ in the PN where a Boolean expression can change its value

[Figure: STG of the imec-sbuf-ram-write benchmark (signals prbar, req, wen, precharged, wsen, done, ack, wsldin, wsld, wenin) with the inserted transitions dec+ and dec-]

Implementation of prbar:

(csc2 req) csc1 wsldin

[Figure: insertion of dec between the sources s1, s2, s3 and the destinations d1, d2]

- All previously listed good points hold for GTIs as well
- Exponentially many GTIs can exist:
  - more likely that an appropriate transformation exists
  - no longer practical to enumerate them all
  - can enumerate only the ‘potentially useful’ (for logic decomposition) GTIs

[Figure: a configuration C enabling x, whose firing triggers the insertion I and changes the value of F]

An insertion I is compatible with F if, whenever an x can fire and trigger I, $F'_x = 1$, where

$F'_x = F|_{x=0} \oplus F|_{x=1}$

Intuitively, when x fires, the value of F must change, as I becomes enabled.
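The Boolean difference of F with respect to x is the XOR of the two cofactors of F, so it can be computed directly from its definition. The sketch below is illustrative only: the function `f` and the signal names `a`, `b`, `c` are hypothetical and are not taken from the slides.

```python
def boolean_difference(f, x):
    """Return F'_x: a function that is 1 exactly where flipping x flips f."""
    def diff(state):
        f0 = f({**state, x: 0})  # cofactor F|x=0
        f1 = f({**state, x: 1})  # cofactor F|x=1
        return f0 ^ f1
    return diff


# Hypothetical function, for illustration only: f = a AND (b OR c).
def f(s):
    return s["a"] & (s["b"] | s["c"])


d_b = boolean_difference(f, "b")
# Firing b changes f only when a = 1 and c = 0.
print(d_b({"a": 1, "c": 0}))  # 1
print(d_b({"a": 1, "c": 1}))  # 0
```

In the first state a transition of b would change the value of f, so an insertion triggered by b could be compatible there. In the second state it could not.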

[Figure: prefix of the VME bus controller STG unfolding (events dsr±, dtack±, lds±, ldtack±, d±, csc±) with regions labelled F=0 and F=1]

F =ldtack csc

[ACSD’07]

Find a satisfying assignment, optimal w.r.t. a heuristic cost function, of the Boolean formula

$\mathit{MUTEX} \wedge \mathit{SA} \wedge \mathit{CUTOFF} \wedge \mathit{FUN}$

depending on the variables I1, ..., Ik corresponding to the compatible insertions, and conveying that:

- no two insertions are non-commuting, or concurrent, or in auto-conflict, or one of them can trigger the other (MUTEX)
- consistent assignment of signs is possible in the prefix (SA) and beyond cut-offs (CUTOFF)
- F is a possible implementation of the newly inserted signal (FUN)

The cost function is parameterised by the user and takes into account:

- the delay introduced by the insertion
- the number of syntactic triggers of all non-input signals
- the number of inserted transitions of a signal
- the number of signals which are not locked with the newly inserted signal
- …
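A minimal sketch of the optimisation step, under heavy simplifications: the search is brute force rather than a SAT solver, only MUTEX-like and FUN-like constraints are modelled (SA and CUTOFF are omitted), and the variables `I1`, `I2`, `I3`, the constraints and the weights are all hypothetical.

```python
from itertools import product


def best_assignment(variables, constraints, cost):
    """Return the cheapest assignment satisfying every constraint, or None."""
    best = None
    for bits in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if all(check(a) for check in constraints) and (
                best is None or cost(a) < cost(best)):
            best = a
    return best


variables = ["I1", "I2", "I3"]

# MUTEX-like constraint: I1 and I2 cannot be selected together.
mutex = lambda a: not (a["I1"] and a["I2"])
# FUN-like constraint: two clauses, each a disjunction of insertions.
fun = lambda a: (a["I1"] or a["I3"]) and (a["I2"] or a["I3"])

# Hypothetical per-insertion costs standing in for the heuristic cost function.
weights = {"I1": 2, "I2": 2, "I3": 3}
cost = lambda a: sum(weights[v] for v in variables if a[v])

chosen = best_assignment(variables, [mutex, fun], cost)
print(chosen)
```

Here selecting I3 alone is the cheapest choice: I1 and I2 together would cover both clauses but are excluded by the MUTEX-like constraint.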

[Figure: a configuration C enabling x, with the insertion I and the value of F before and after x fires]

Let C be a configuration enabling some x, $F'_x = 1$, and $\mathcal{I}$ be the set of compatible insertions such that:

Then the clause $\bigvee_{I \in \mathcal{I}} I$ is in FUN.

One can build a Boolean formula FUNGEN depending on C and compatible insertions whose SAT assignments satisfy this condition.

Problem: it is infeasible to enumerate all configurations.

Idea 1: The same clause can be generated by many different configurations, and hence once one such configuration is found, the others can be excluded from the search.

Idea 2: Clauses subsumed by already generated ones can be excluded from the search.

It is enough to add a clause $\bigvee_{I \in \mathcal{I}} \neg I$ to FUNGEN whenever a new clause $\bigvee_{I \in \mathcal{I}} I$ is computed.
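Idea 2 can be illustrated on its own, outside any SAT solver. In this sketch a clause is represented by the set of insertions it mentions, and a clause is subsumed when its set contains the set of an already generated clause. The names `subsumed`, `collect` and the insertions `I1`, `I2`, `I3` are hypothetical.

```python
def subsumed(new, generated):
    """True if some already generated clause is a subset of the new one."""
    return any(old <= new for old in generated)


def collect(stream):
    """Keep only the clauses not subsumed by previously kept ones."""
    kept = []
    for clause in stream:
        clause = frozenset(clause)
        if not subsumed(clause, kept):
            kept.append(clause)
    return kept


stream = [{"I1", "I2"}, {"I1", "I2", "I3"}, {"I2"}, {"I1", "I2"}]
kept = collect(stream)
print([sorted(c) for c in kept])  # [['I1', 'I2'], ['I2']]
```

The second clause is dropped because it is weaker than the first, and the last one because it repeats the first, which also covers Idea 1 for configurations that yield the same clause.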

[Figure: prefix of the VME bus controller STG unfolding with two configurations C marked and regions labelled F=0 and F=1]

F =ldtack csc

- Implemented in MPSAT (library matching not implemented yet) and compared with PETRIFY
- Assorted small benchmarks:
  - similar failure rates and quality of circuits
  - structural insertions seem sufficient
  - the tests reflect the quality of heuristics in choosing the decomposition in each step rather than the quality of the signal insertion routine
- Large benchmarks:
  - tend to be non-decomposable by both tools
  - only one series (scalable pipelines) was useful
    - can be solved by a single insertion, hence minimizes the impact of heuristics and reflects the quality of the signal insertion routine
    - huge reachability graphs, so unfoldings win

- Unfolding-based decomposition algorithm
  - alleviates state space explosion
  - completes the design cycle based fully on unfoldings (i.e. state graphs are never built)
- All advantages of state-based decomposition are retained:
  - multiway acknowledgement
  - latch utilisation
  - highly optimised circuits

Thank you!

Any questions?