Communicating in Systems with Heterogeneous Timing. Alex Yakovlev, Asynchronous Systems Laboratory, University of Newcastle upon Tyne. Edinburgh, 11 Jan. 2001.

Communicating in Systems with Heterogeneous Timing

Alex Yakovlev,

Asynchronous Systems Laboratory

University of Newcastle upon Tyne

Edinburgh, 11 Jan. 2001


Objectives

  • To study a range of asynchronous communication mechanisms (ACMs) that can be used in constructing (distributed and concurrent) systems with heterogeneous timing

  • To develop hardware implementations for ACMs, using self-timed circuits for potential use in Systems-On-a-Chip (SOCs) and embedded (miniature, low power and EMC) applications

  • Work is done within a collaborative EPSRC research project COMFORT with King’s College London.


Heterogeneously Timed Nets (hets)

[Figure: a net of elements A1 to A4 and C1 to C3, shown three times with different labels]

  • A1 to A4: time/event/data-driven data processing elements (active)

  • C1 to C3: data communication elements (passive), i.e. ACMs


Previous work

  • Real-time networks and MASCOT approach – from RSRE/Phillips(67), BAe/Simpson(86) – for software systems

    • high time heterogeneity but relatively low speed

  • Globally-Asynchronous-Locally-Synchronous (GALS) – Chapiro(84), Muttersbach(00), Ginosar(00) – for VLSI circuits

    • high speed but very limited time heterogeneity (mesochronous or source-synchronous)


Interaction between system parts

[Figure: parts A and B interacting through a communication mechanism (e.g. shared memory)]


Terminology on timing

  • Temporal relationship between parts A and B in a system can be:

    • (Globally, locally for A/B) clocked = synchronous on (global, local for A/B) clock

    • Self-timed = synchronous on handshakes and/or by some time constraints, e.g. I/O and fundamental modes

    • (Mutually) asynchronous = NOT synchronous (on global clock or on handshakes); hence asynchronous is neither self-timed nor globally clocked


Globally clocked

[Figure: A and B interacting through a communication mechanism (e.g. shared memory), both driven by a global clock]


Self-timed (via handshake)

[Figure: A and B interacting through a communication mechanism (e.g. shared memory), synchronised by Req/Ack handshake(s), possibly with a bounded buffer in between]


Fully Asynchronous

[Figure: A and B interacting through a communication mechanism (e.g. shared memory); A and B each have their own timing, separated by a temporal firewall]


Evolution of timing (1)

  • Globally clocked systems:

    Good: deterministic and predictable for real-time, safety-critical systems

    Bad: prone to clock skew, bad for power consumption and EMC: indiscriminate data-crunching


Evolution of timing (2)

  • Self-timed systems (with micropipelines and handshakes):

    Good: no skew problems, good for power and EMC if data-driven

    Bad: temporal non-determinism, lockable handshakes, hence bad for real-time


Evolution of timing (3)

  • Fully or partially Asynchronous systems:

    Good: distributed and heterogeneous clocking; real-time applied locally – fully predictable; self-timing can be applied where possible for power saving and EMC

    Bad: potential loss of information where full asynchrony (e.g. due to real-time) is applied


Asynchronous Communication Mechanisms (ACMs)

[Figure: a Writer and a Reader connected through an ACM]

The level of asynchrony is defined by the WRITE and READ rules.


Classification of ACMs

Hugo Simpson’s classification:


Difficulty with Simpson’s classification

  • The Destructive/Non-destructive division does not intuitively suggest a temporal (Wait/No-wait) division, but what is meant is that:

    • Destructive (non-destructive) write cannot (can) wait

    • Destructive (non-destructive) read can (cannot) wait

  • There is symmetry (duality) between Pool and Channel but no symmetry between Signal and Constant, because Constant allows ‘constructive’ write only once - yet ‘constructive’ writes are also allowed by Signal
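
The wait rules in the first bullet can be written down as a small lookup. This is only an illustrative sketch: the assignment of destructive and non-destructive operations to the four ACM types follows the usual reading of Simpson's classification and is an assumption here, since the classification table itself is not reproduced in this transcript. The names `ACM_TYPES`, `write_can_wait` and `read_can_wait` are hypothetical.

```python
# (destructive_write, destructive_read) per ACM type.
# Assumed mapping: the original table is not preserved in the transcript.
ACM_TYPES = {
    "Signal":   (True,  True),
    "Pool":     (True,  False),
    "Channel":  (False, True),
    "Constant": (False, False),
}


def write_can_wait(acm: str) -> bool:
    """A destructive write cannot wait; a non-destructive write can."""
    destructive_write, _ = ACM_TYPES[acm]
    return not destructive_write


def read_can_wait(acm: str) -> bool:
    """A destructive read can wait; a non-destructive read cannot."""
    _, destructive_read = ACM_TYPES[acm]
    return destructive_read


if __name__ == "__main__":
    for name in ACM_TYPES:
        print(name, "write waits:", write_can_wait(name),
              "read waits:", read_can_wait(name))
```

Under this reading, Pool (neither side waits) and Channel (both sides may wait) are duals, which matches the symmetry noted in the second bullet.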


Petri net capture of Simpson’s protocols

[Figure: Petri nets for Signal, Pool, Channel and Constant over places 'empty' and 'full', with transitions labelled as destructive or non-destructive write and read; the constructive writes are marked]


Another interpretation

[Figure: Petri nets for Signal, Pool, Channel and Command over data states 'unread' and 'read', with transitions write, over-write, read and re-read. Constant is a special case of Command.]

[The same figure is repeated four times, highlighting in turn: Busy Writer, Lazy Writer, Busy Reader, Lazy Reader.]


Another classification of ACMs


Signal vs Pool

[Figure: a Pool between two real-time (busy) domains, Real time 1 and Real time 2; a Signal between a real-time (busy) domain and a data-driven (lazy) domain. The Signal case is annotated 'Low Power!']


Problems with the above Petri net definitions

  • These Petri nets assumed:

    • Data capacity (max value of the data state of the ACM) equals 1 (this can be easily generalised to any finite n>0 for Channel, defined as an n-place buffer with a wide range of known hardware implementations); do we semantically need other ACMs with n>1?

    • Write and Read access are held up only by the data state of the ACM and not by the Read and Write operations themselves – those are treated as atomic and taking no time; in reality they are not and should be assumed to take arbitrary time
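
The n-place generalisation of Channel mentioned in the first bullet can be sketched as a bounded buffer. This is a software illustration under the same atomicity assumption that the second bullet criticises: each operation is treated as instantaneous. The class name `Channel` and the `try_write`/`try_read` methods are hypothetical; returning `False` or `None` stands in for "the caller must wait".

```python
from collections import deque


class Channel:
    """n-place Channel with atomic access: writer waits when full, reader when empty."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be a finite n > 0")
        self.capacity = capacity
        self.items = deque()

    def try_write(self, item) -> bool:
        # Non-destructive write: never overwrites, so it is held up when full.
        if len(self.items) == self.capacity:
            return False
        self.items.append(item)
        return True

    def try_read(self):
        # Destructive read: removes the oldest item, held up when empty.
        if not self.items:
            return None
        return self.items.popleft()
```

With `capacity=1` this reduces to the single-item Channel of the Petri net definitions.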


Breaking the atomicity

[Figure: Petri net of Signal with atomic access (transitions write, over-write, read; places unread, read) next to Signal with non-atomic access (added places in-writing, not-in-writing, reading)]


Breaking the atomicity

[Figure: as on the previous slide, with places in-reading and not-in-reading added to the non-atomic net]

Read may be held up by a write being in progress, but write is not held up by reading!


But …

[Figure: Petri net of Signal with non-atomic access]

What if Reading begins just before Writing?

There is a problem with data integrity if only one data slot (one data token) is available.


Required Properties of Signal(1)

  • Data states and their updating:

    • Signal’s capacity is 1 (at any time, it has either 0 or 1 unread data items)

    • At the end of write access, Signal’s state is set to unread (1)

    • At the end of read access, Signal’s state is set to read (0)


Required Properties of Signal(2)

  • Conditional asynchrony for the reader:

    • Read access may start only when Signal’s data state is unread (1) and no write access is in progress

    • Read access can be arbitrarily long

  • Unconditional asynchrony for the writer:

    • Write must be allowed to start and complete access at any time, regardless of Signal’s data state and the status of read access.
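
The access rules above, together with the data-state rules of the preceding slide, can be modelled as a tiny control-state machine. The sketch below tracks only the control state (no data slots) and applies the listed rules literally; how a write that completes during a read should affect the data state is settled later by the state graph, not by this sketch. The class `SignalControl` and its method names are hypothetical.

```python
class SignalControl:
    """Control-state model of Signal: data state plus access-in-progress flags."""

    def __init__(self):
        self.unread = False   # data state: False = read (0), True = unread (1)
        self.writing = False  # a write access is in progress
        self.reading = False  # a read access is in progress

    def can_start_read(self) -> bool:
        # Conditional asynchrony: unread data and no write in progress.
        return self.unread and not self.writing

    def start_write(self) -> None:
        # Unconditional asynchrony: a write is never refused.
        self.writing = True

    def end_write(self) -> None:
        self.writing = False
        self.unread = True    # end of write sets the state to unread (1)

    def start_read(self) -> bool:
        if not self.can_start_read():
            return False      # the reader has to wait
        self.reading = True
        return True

    def end_read(self) -> None:
        self.reading = False
        self.unread = False   # end of read sets the state to read (0)
```

Note that `start_write` succeeds even while `reading` is set, which is exactly the situation that forces more than one data slot.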


Required Properties of Signal(3)

  • Data coherence:

    • Any item of data that is read from Signal must not have been changed since it was written (i.e. no writing or reading in part)

  • Data freshness:

    • Any read access must obtain the data item designated as the current unread item in Signal, i.e. the data item made available by the latest completed write access


Data slots and Signal

  • “Data slot” is a unique portion of the shared memory which may contain one item of data of arbitrary (but bounded) size

  • A Signal implemented with only ONE slot cannot satisfy all of the above properties

  • Let us construct a Signal with TWO data slots

  • First, a formal specification in the form of a State Graph (or Transition System) must be built


Formal spec of Signal

[Figure: automaton for Signal with four actions: write slot 0 (wr0), write slot 1 (wr1), read slot 0 (rd0), read slot 1 (rd1)]

Problem: construct a maximally permissive automaton, over the alphabet {wr0, wr1, rd0, rd1}, satisfying the required properties of the Signal ACM


State Graph constraints

1. Data states, their updates and asynchrony:

A wr action is enabled in every state

[Figure: state diagrams from a state s with transitions wri, wrj, rdi and rdj]

2. Data coherence:

[Figure: state diagrams from a state s combining wri and rdj, allowed only if i<>j]


State Graph constraints

3. Data freshness (slot swapping):

[Figure: an if/then implication between state diagrams over wri, wrj, rdi and rdj from states s and s’, with i<>j]

4. No “re-try loops” (persistency in reading):

[Figure: a path of wri and wrj events, annotated 'there is no rdi on this path']


State Graph for 2-slot Signal

[Figure: state graph with states s0 to s5, one of them marked as the initial state, and transitions wr0, wr1, rd0 and rd1]


How to implement 2-slot Signal?

[Figure: the state graph of the 2-slot Signal, repeated]

  • In order to implement Signal we must distribute states and events between elements of implementation architecture.

  • For that we must first separate states using a behavioural model of the implementation


Implementation architecture

The following structure must be kept in mind:

[Figure: Writer and Reader have data access to the data slots, steered by wr0, wr1, rd0 and rd1, and control access to the Signal control block via the Wreq/Wack and Rreq/Rack handshakes]

In a hardware implementation of Signal control, latches and logic will be used to generate signals corresponding to the steering events wri and rdi, events on the handshakes with the writer and reader, and some internal events


Behavioural model for Signal

  • Petri nets can be used as a behavioural model (algorithm) for Signal:

    • A 1-safe Petri net can be synthesised from a finite Transition System using theory of regions (Ehrenfeucht, Rozenberg et al)

    • A 1-safe Petri net can be implemented in a self-timed circuit using either direct translation techniques or logic synthesis from Signal Transition Graphs (Yakovlev,Koelmans98)


State Graph refinement

[Figure: the state graph (states s0 to s5) and its refinement with internal events l0 to l3, m0 and m1 inserted]

This Transition System cannot be synthesised into a 1-safe Petri net with unique event labelling; it requires refinement because it violates some separation conditions. There is also arbitration (a conflict relation) between rdi and wrj events, which is a problem because in a physical implementation one cannot disable output actions


State Graph refinement

[Figure: refined state graph with events wr0, wr1, rd0, rd1 and internal events l0 to l3, m0 and m1]

Now arbitration is between internal events, while wri and rdj are persistent


Distributing states between Write and Read parts

Write part:

[Figure: the refined state graph annotated with Write elementary states and Write superstates, numbered 1 to 6]


Distributing states between Write and Read parts

Read part:

[Figure: the refined state graph annotated with Read elementary states and Read superstates, numbered 7 to 12]


Completing the Petri net model

[Figure: Petri net with places 1 to 12 and transitions wr0, wr1, rd0, rd1, l0 to l3, m0 and m1]


Introducing binary control variables

[Figure: the Petri net with events w+, w-, r+, r- and tests w=0, w=1, r=0, r=1 added]

‘w’ encodes the slot being accessed for writing

‘r’ encodes the slot being accessed for reading


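Towards circuit implementation

[Figure: Data-in feeds Slot 0 and Slot 1 under control of wr0 and wr1; the slots feed Data-out under control of rd0 and rd1. The Write part handshakes via Wreq/Wack and the Read part via Rreq/Rack; the control variables w and r are connected to the two parts through set/reset and test lines.]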


Direct translation of PNs to circuits

[Figure, three animation steps: places p1 and p2 on either side of a transition (Controlled Operation), the corresponding circuit with a request 'To Operation', and the signal values changing (0->1, 1->0) as the token moves from p1 to p2]

  • This method associates places with latches (flip-flops) – so the state memory (marking) of PN is directly mimicked in the circuit’s state memory

  • Transitions are associated with controlled actions (e.g. activations of data path units or lower level control blocks – by using handshake protocols)

  • Modelling discrepancy (be careful!):

    • in Petri nets removal of a token from pre-places and adding tokens in post-places is instantaneous (i.e. no intermediate states)

    • in circuits the “move of a token” has a duration and there is an intermediate state
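
The modelling discrepancy can be made concrete with a toy marking model. The sketch below is an assumption-laden illustration, not the translation method itself: a marking is a set of marked places, `fire` is the usual instantaneous firing rule for a 1-safe net, and `fire_in_two_steps` imitates a circuit by first setting the post-places and only then clearing the pre-places, so that an intermediate state becomes visible. The ordering of the two steps and all function names are hypothetical.

```python
def enabled(marking, pre, post):
    """Transition is enabled if all pre-places are marked and firing keeps the net 1-safe."""
    return pre <= marking and not (post & (marking - pre))


def fire(marking, pre, post):
    """Petri net semantics: tokens move instantaneously, no intermediate state."""
    if not enabled(marking, pre, post):
        raise ValueError("transition is not enabled")
    return (marking - pre) | post


def fire_in_two_steps(marking, pre, post):
    """Circuit-like semantics (assumed order): set post-places, then clear pre-places."""
    if not enabled(marking, pre, post):
        raise ValueError("transition is not enabled")
    intermediate = marking | post
    final = (intermediate - pre) | post
    return intermediate, final


if __name__ == "__main__":
    m0 = frozenset({"p1"})
    print(sorted(fire(m0, {"p1"}, {"p2"})))
    mid, end = fire_in_two_steps(m0, {"p1"}, {"p2"})
    print(sorted(mid), sorted(end))
```

In the two-step version both `p1` and `p2` are marked for a while, a state that does not exist in the Petri net; any circuit built this way has to tolerate it.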


Translation in brief

This method has been used for designing control of a token ring adaptor

[Yakovlev, Varshavsky, Marakhovsky, Semenov, IEEE Conf. on Asynchronous Design Methodologies, London, 1995]


Refining the Write part

[Figure: the Write part of the Petri net (places 1 to 6, 11 and 12; events wr0, wr1, l0 to l3, w+, w-; tests r=0, r=1, w=0, w=1) and its refinement with places 1 to 4 and 21, 23, 41, 43]


Control circuit for Write part

[Figure: control circuit derived from the refined Write-part net (places 1 to 4, 21, 23, 41, 43; events wr0, wr1, l1, l3, w+, w-; tests r=0, r=1, w=0, w=1)]


Implementing David cells (1)

[Figure: two circuit schematics, a speed-independent version and an “aggressive” relative-timing version]


Implementing David cells (2)

This is a peephole-optimised solution for two David cells (places 1 and 3) and the interface to the handshake with the Writer


Implementing ‘sync’ blocks

[Figure: circuit of a ‘sync’ block with signals r, ck1, r_0 and r_1, annotated with signal values (0) and (1)]


Simulation using Cadence toolkit

[Figure: simulated waveforms showing metastability inside the mutex, the Write response time, and the input and output of sync]


Cycle times (ns) for 0.6 micron


Improving performance

[Figure: the state graph of the 2-slot Signal and its refinement, repeated]

In case of repetitive writing (of, e.g., slot 1), read access may have to wait for the completion of a write just because of a timing clash on the same slot, and not because of the absence of new data in the ACM (the original aim of Signal)

This problem cannot be resolved within the TWO-slot ACM because of coherence violation. Can we do it with an extra slot?


Towards 3-slot Signal

Idea:

After writing a slot (e.g. 2) for the first time, the writer alternates between 3 and 2

The reader can then, after finishing reading slot 1, read slot 2 or 3, whichever is free

[Figure: three animation steps showing the two alternatives for the reader]


3-slot Signal refined

Algorithm:

Write part:

write slot w; l:=w; w:=differ(l,r)

Read part:

if (r<>l) r:=l else wait; read slot r;

Control variables: r (read), w (write), l (last)

[Figure annotations: l21(32): w(2->1), l(3->2); m32: r(3->2)]
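
The Write and Read parts above can be exercised with a sequential model. This is a minimal sketch under stated assumptions: every statement is executed atomically and in program order (so it says nothing about metastability or true concurrency), the initial values of `w`, `l` and `r` are chosen arbitrarily, `differ` breaks ties by picking the lowest free slot, and `try_read` returns `None` where the algorithm says "wait". The class name `ThreeSlotSignal` is hypothetical.

```python
SLOTS = (1, 2, 3)


def differ(l, r):
    """Return a slot different from both l and r (tie-breaking rule is assumed)."""
    return next(s for s in SLOTS if s not in (l, r))


class ThreeSlotSignal:
    def __init__(self):
        self.slot = {s: None for s in SLOTS}
        self.l = 1                        # last slot written
        self.r = 1                        # slot owned by the reader
        self.w = differ(self.l, self.r)   # slot to be written next

    def write(self, item):
        # write slot w; l := w; w := differ(l, r)
        self.slot[self.w] = item
        self.l = self.w
        self.w = differ(self.l, self.r)

    def try_read(self):
        # if (r <> l) r := l else wait; read slot r
        if self.r == self.l:
            return None                   # nothing unread: the reader waits
        self.r = self.l
        return self.slot[self.r]
```

Because `w` is always chosen different from `r`, `l` can only equal `r` when nothing has been written since the last read, so the test `r <> l` doubles as the "unread" flag.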


3-slot Pool

In Pool we must have read asynchrony.

Algorithm:

Write part:

write slot w; l:=w; w:=differ(l,r)

Read part:

r:=l; read slot r;

Control variables: r (read), w (write), l (last)


Three-slot algorithm (due to Hugo Simpson)

Writer:

wr: d[n]:=input

w0: l:=n

w1: n:=differ(l,r)

Reader:

r0: r:=l

rd: output:=d[r]

n (next), l (last), r (read): 3-valued variables
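
The five labelled statements can be stepped through one at a time to explore interleavings. The sketch below is a sequential simulation, not a concurrent implementation: each labelled statement is assumed to be atomic, the initial values are arbitrary, and `differ` is taken to be any function that returns a slot different from both arguments (the slides give a specific table). The class name `ThreeSlotPool` is hypothetical; the method names mirror the statement labels.

```python
SLOTS = (1, 2, 3)


def differ(l, r):
    """Return a slot different from both l and r (tie-breaking rule is assumed)."""
    return next(s for s in SLOTS if s not in (l, r))


class ThreeSlotPool:
    def __init__(self, initial=None):
        self.d = {s: initial for s in SLOTS}
        self.l = 1
        self.r = 1
        self.n = differ(self.l, self.r)

    # Writer statements
    def wr(self, item):
        self.d[self.n] = item

    def w0(self):
        self.l = self.n

    def w1(self):
        self.n = differ(self.l, self.r)

    def write(self, item):
        self.wr(item)
        self.w0()
        self.w1()

    # Reader statements
    def r0(self):
        self.r = self.l

    def rd(self):
        return self.d[self.r]

    def read(self):
        self.r0()
        return self.rd()
```

For example, calling `r0()`, then several `write(...)` calls, then `rd()` shows that the writer never selects the slot the reader has claimed, and that re-reading without an intervening write returns the same item.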


Three-slot algorithm

differ(l,r):

         r=1   r=2   r=3
  l=1     2     3     2
  l=2     3     3     1
  l=3     2     1     1
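
The defining property of the table, that the result never equals either argument, is easy to check mechanically. In this sketch the table is transcribed as a dictionary; which index is the row and which the column does not matter because the table is symmetric. The names `DIFFER`, `differ`, `always_differs` and `symmetric` are hypothetical.

```python
# differ table transcribed from the slide: (l, r) -> next slot
DIFFER = {
    (1, 1): 2, (1, 2): 3, (1, 3): 2,
    (2, 1): 3, (2, 2): 3, (2, 3): 1,
    (3, 1): 2, (3, 2): 1, (3, 3): 1,
}


def differ(l, r):
    return DIFFER[(l, r)]


SLOTS = (1, 2, 3)

# The chosen slot must differ from both the last-written and the read slot.
always_differs = all(differ(l, r) not in (l, r) for l in SLOTS for r in SLOTS)

# Swapping the arguments gives the same answer.
symmetric = all(differ(l, r) == differ(r, l) for l in SLOTS for r in SLOTS)

if __name__ == "__main__":
    print(always_differs, symmetric)
```

When `l` and `r` are different the result is forced; only the diagonal entries (`l == r`) involve a free choice between the two remaining slots.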


Three-slot Pool

[Figure: animation of the three-slot Pool in eleven steps. Slots s1, s2 and s3 initially hold items 23.12, 27.12 and 30.12, and the pointers read, next and last select slots. The Writer stores new items 02.01, 03.01 and 05.01 in turn, ending with s1 = 02.01, s2 = 03.01 and s3 = 05.01, while the Reader obtains item 02.01.]


3-slot ACM design

[Figure, shown twice: block diagram with a write control block and a read control block connected through a mutex (requests Rw0 and Rr0, grants Gw0 and Gr0). The write control uses the w0-req/ack and w1-req/ack handshakes, the read control uses r0-req/ack. The control variables are held in 'reg l', 'differ & reg n' and 'reg r', connected by signals l, n and r.]


Differ and register logic

[Figure: gate-level schematic of the differ block and the register, with inputs l1, l2, l3 and r1, r2, r3, outputs n1, n2, n3, and the w1-req/w1-ack handshake]



Write control circuit: STG

[Figure: Signal Transition Graph of the write control circuit]


Write control circuit: from Petrify

[Figure: write control circuit as produced by Petrify]


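Four-slot Pool

[Figure: Writer and Reader sharing four data slots d[0,0], d[0,1], d[1,0] and d[1,1], which hold items dated 23.12, 24.12, 28.12 and 30.12, with a new item 02.01 arriving. The pointers read, next and last and the bits s[0], s[1], v[0] and v[1] select the slot to access.]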


Four-slot Pool algorithm (H. Simpson)

Writer:

wr: d[n,¬s[n]]:=input

w0: s[n]:= ¬s[n]

w1: l:=n || n:=¬r

Reader:

r0: r:=l

r1: v:=s

rd: output:=d[r,v[r]]

n (next), l (last), r (read): binary variables
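
The six labelled statements can be simulated step by step to see why the writer never touches the slot a reader has selected. This is a sequential sketch, not a concurrent implementation: each labelled statement is assumed atomic, the parallel assignment in w1 is modelled as a tuple assignment, `s` and `v` are taken to be two-element bit arrays, and the initial values are arbitrary. The class name `FourSlotPool` is hypothetical.

```python
class FourSlotPool:
    def __init__(self, initial=None):
        self.d = [[initial, initial], [initial, initial]]  # d[pair][index]
        self.s = [0, 0]   # index of the latest item within each pair
        self.v = [0, 0]   # reader's private copy of s
        self.n = 1        # next pair to write
        self.l = 0        # pair written last
        self.r = 0        # pair being read

    # Writer statements
    def wr(self, item):
        self.d[self.n][1 - self.s[self.n]] = item

    def w0(self):
        self.s[self.n] = 1 - self.s[self.n]

    def w1(self):
        # l := n || n := not r (both right-hand sides use the old values)
        self.l, self.n = self.n, 1 - self.r

    def write(self, item):
        self.wr(item)
        self.w0()
        self.w1()

    # Reader statements
    def r0(self):
        self.r = self.l

    def r1(self):
        self.v = list(self.s)

    def rd(self):
        return self.d[self.r][self.v[self.r]]

    def read(self):
        self.r0()
        self.r1()
        return self.rd()
```

After the reader has executed `r0` and `r1`, the slot `d[r][v[r]]` stays untouched however many writes follow: the writer either works in the other pair or, within the same pair, at the opposite index.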


3-slot vs 4-slot performance

Time for control statements


Are we in the end fully asynchronous?

  • Circuit implementations involve use of latches, which may go metastable.

  • Metastability always implies a trade-off, in terms of noise, between data-domain and time-domain error.

  • In a “truly busy (real-time)” environment, where the ack signal is not used, the corresponding process (e.g., writer) must allow for a small interval (3-4 ns for 0.6 micron CMOS), sufficient for metastability to be resolved with a probability that is practically 1.

  • Our h/w solutions for “busy” domains aim at maximising the “wait-free” aspect of communication but theoretically cannot fully eliminate mutual dependency between processes (hidden within ACM control variable circuits).


Concluding remarks

  • Constructing ACMs to interface sub-systems with different time and energy requirements, and implementing them in high-speed hardware, proves feasible.

  • Application of hets in control or image processing (e.g. via neural networks) is needed to fully assess their potential for future application-specific SOCs

  • More work on mathematical modelling of hets and on developing an extensive parametrised library of ACM circuits is needed.


VLSI design layout (chip fab’ed in June 2000 via EUROPRACTICE)

[Figure: chip layout, with the 4-slot Pool ACM marked]


4-slot ACM part

Physically tested and found correct (details of the testing are in the 9th Async UK Forum paper)


Acknowledgements and References

  • Members of the COMFORT team:

    At KCL – Tony Davies, Ian Clark, David Fraser, Sergio Velastin

    At NCL – Fei Xia, David Kinniment, Albert Koelmans, Delong Shang, Alex Bystrov

  • BAe colleagues: Hugo Simpson and Eric Campbell

  • Project COMFORT web site:

    http://www.eee.kcl.ac.uk/~comfort

  • Work supported by EPSRC, EU (ACiD-WG) and reported and published at Async2000, AINT’2000, Async2001 etc.

