Communicating in Systems with Heterogeneous Timing

Alex Yakovlev, Asynchronous Systems Laboratory, University of Newcastle upon Tyne
Edinburgh, 11 Jan. 2001

Objectives
• To study a range of asynchronous communication mechanisms (ACMs) that can be used in constructing (distributed and concurrent) systems with heterogeneous timing
• To develop hardware implementations for ACMs, using self-timed circuits, for potential use in Systems-On-a-Chip (SOCs) and embedded (miniature, low power and EMC) applications
• The work is done within COMFORT, a collaborative EPSRC research project with King’s College London

Heterogeneously Timed Nets (hets)
[Diagram: a net of elements A1 to A4 and C1 to C3]

Hets: time/event/data-driven data processing elements (active)
[Diagram: the net of elements A1 to A4 and C1 to C3]

Hets: data communication elements (passive), the ACMs
[Diagram: the net of elements A1 to A4 and C1 to C3]

Previous work
• Real-time networks and the MASCOT approach, from RSRE/Phillips (67) and BAe/Simpson (86), for software systems
  • high time heterogeneity but relatively low speed
• Globally-Asynchronous-Locally-Synchronous (GALS), Chapiro (84), Muttersbach (00), Ginosar (00), for VLSI circuits
  • high speed but very limited time heterogeneity (mesochronous or source-synchronous)

Interaction between system parts
[Diagram: parts A and B interacting through a communication mechanism (e.g. shared memory)]

Terminology on timing
• The temporal relationship between parts A and B in a system can be:
  • (Globally, locally for A/B) clocked = synchronous on a (global, local for A/B) clock
  • Self-timed = synchronous on handshakes and/or by some time constraints, e.g. I/O and fundamental modes
  • (Mutually) asynchronous = NOT synchronous (on a global clock or on handshakes); hence asynchronous is neither self-timed nor globally clocked

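Globally clocked
[Diagram: A and B share a communication mechanism (e.g. shared memory) and are both driven by a global clock]

Self-timed (via handshake)
[Diagram: A and B share a communication mechanism (e.g. shared memory) and are linked by Req/Ack handshake(s), possibly with a bounded buffer in between]

Fully Asynchronous
[Diagram: A and B share a communication mechanism (e.g. shared memory); A and B each have their own timing, separated by a temporal firewall]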

Evolution of timing (1)
• Globally clocked systems:
  Good: deterministic and predictable for real-time, safety-critical systems
  Bad: prone to clock skew; bad for power consumption and EMC (indiscriminate data-crunching)

Evolution of timing (2)
• Self-timed systems (with micropipelines and handshakes):
  Good: no skew problems; good for power and EMC if data-driven
  Bad: temporal non-determinism and lockable handshakes, hence bad for real-time

Evolution of timing (3)
• Fully or partially asynchronous systems:
  Good: distributed and heterogeneous clocking; real-time applied locally and fully predictable; self-timing can be applied where possible for power saving and EMC
  Bad: potential loss of information where full asynchrony (e.g. due to real-time) is applied

Asynchronous Communication Mechanisms (ACMs)
[Diagram: a Writer and a Reader connected through an ACM]
The level of asynchrony is defined by the WRITE and READ rules

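Classification of ACMs
Hugo Simpson’s classification:
                         Destructive read    Non-destructive read
Destructive write        Signal              Pool
Non-destructive write    Channel             Constant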

Difficulty with Simpson’s classification
• The Destructive/Non-destructive distinction does not intuitively imply a temporal Wait/No-wait division, but what is meant is that:
  • A destructive write cannot wait; a non-destructive write can wait
  • A destructive read can wait; a non-destructive read cannot wait
• There is symmetry (duality) between Pool and Channel but no symmetry between Signal and Constant, because Constant allows a ‘constructive’ write only once, yet ‘constructive’ writes are also allowed by Signal
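The Wait/No-wait reading of the four protocols can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: capacity is 1, accesses are atomic, and "waiting" is modelled as an access simply not being enabled. The `Protocol` class and its method names are hypothetical, not part of the original material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protocol:
    """One of Simpson's four protocols, reduced to two flags (illustrative)."""
    name: str
    destructive_write: bool
    destructive_read: bool

    def write_enabled(self, full: bool) -> bool:
        # A destructive write may overwrite an unread item, so it never waits.
        # A non-destructive write waits while the ACM is full.
        return self.destructive_write or not full

    def read_enabled(self, full: bool) -> bool:
        # A destructive read consumes an item, so it waits while the ACM is empty.
        # A non-destructive read never waits (it may re-read).
        return full or not self.destructive_read

    def after_write(self, full: bool) -> bool:
        # Any completed write leaves an unread item behind.
        return True

    def after_read(self, full: bool) -> bool:
        # A destructive read empties the ACM; a non-destructive read
        # leaves the data state unchanged.
        return False if self.destructive_read else full


SIGNAL = Protocol("Signal", destructive_write=True, destructive_read=True)
POOL = Protocol("Pool", destructive_write=True, destructive_read=False)
CHANNEL = Protocol("Channel", destructive_write=False, destructive_read=True)
CONSTANT = Protocol("Constant", destructive_write=False, destructive_read=False)

if __name__ == "__main__":
    for p in (SIGNAL, POOL, CHANNEL, CONSTANT):
        for full in (False, True):
            state = "full" if full else "empty"
            print(f"{p.name:8} {state:5} "
                  f"write_enabled={p.write_enabled(full)} "
                  f"read_enabled={p.read_enabled(full)}")
```

In this model the Pool/Channel duality shows up as a swap of the two flags, while Signal and Constant differ in both flags at once.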

Petri net capture of Simpson’s protocols
[Figure: four Petri nets, one each for Signal, Pool, Channel and Constant, over the places empty and full, with transitions destr write, non-destr write, destr read and non-destr read; annotation on the Signal/Pool nets: Constructive writes]

Another interpretation
[Figure: state diagrams for Signal, Pool, Channel and Command over the data states read and unread, with transitions write, over-write, read and re-read]
Constant is a special case of Command

Another interpretation
[Figure: the same four state diagrams; annotation: Busy Writer]

Another interpretation
[Figure: the same four state diagrams; annotation: Lazy Writer]

Another interpretation
[Figure: the same four state diagrams; annotation: Busy Reader]

Another interpretation
[Figure: the same four state diagrams; annotation: Lazy Reader]

Another classification of ACMs

Signal vs Pool
[Diagram: a Pool between Real time 1 (busy domain) and Real time 2 (busy domain); a Signal between Real time (busy domain) and Data-driven (lazy domain). Annotation on Signal: Low Power!]

Problems with the above Petri net definitions
• These Petri nets assumed:
  • Data capacity (the maximum value of the data state of the ACM) equals 1. This can easily be generalised to any finite n>0 for Channel, defined as an n-place buffer with a wide range of known hardware implementations. Do we semantically need other ACMs with n>1?
  • Write and Read accesses are held up only by the data state of the ACM and not by the Read and Write operations themselves, which are treated as atomic and taking no time. In reality they are not atomic and should be assumed to take arbitrary time.
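The generalisation of Channel to an n-place buffer can be sketched as follows. This is a software illustration only, keeping the atomic-access assumption that the slide goes on to question; the class name `Channel` and the methods `try_write` and `try_read` are hypothetical, and "waiting" is modelled by the call reporting failure.

```python
from collections import deque


class Channel:
    """An n-place buffer: writes are held up when full, reads when empty."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be a finite n > 0")
        self.capacity = capacity
        self.items = deque()

    def try_write(self, item) -> bool:
        # Non-destructive write: never overwrites, so it fails when full.
        if len(self.items) == self.capacity:
            return False
        self.items.append(item)
        return True

    def try_read(self):
        # Destructive read: removes the oldest item, so it fails when empty.
        if not self.items:
            return (False, None)
        return (True, self.items.popleft())


if __name__ == "__main__":
    ch = Channel(2)
    print(ch.try_write("a"), ch.try_write("b"), ch.try_write("c"))
    print(ch.try_read(), ch.try_read(), ch.try_read())
```

With `capacity=1` this reduces to the single-token Channel of the Petri net definitions.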

Breaking the atomicity
[Figure: Petri net of Signal with atomic access (write, over-write, read; places read, unread) next to Signal with non-atomic access (additional places in writing, not-in-writing, reading)]

Breaking the atomicity
[Figure: the same two nets, with places in reading and not-in-reading added to the non-atomic net]
Read may be held up by write being in progress … but not write by reading!

But …
[Figure: Signal with non-atomic access (write, over-write, read; places in writing, not-in-writing, reading, unread)]
What if Reading begins just before Writing?
Problem with data integrity if only one data slot (one data token) is available

Required Properties of Signal (1)
• Data states and their updating:
  • Signal’s capacity is 1 (at any time, it has either 0 or 1 unread data items)
  • At the end of a write access, Signal’s state is set to unread (1)
  • At the end of a read access, Signal’s state is set to read (0)

Required Properties of Signal (2)
• Conditional asynchrony for the reader:
  • A read access may start only when Signal’s data state is unread (1) and no write access is in progress
  • A read access can be arbitrarily long
• Unconditional asynchrony for the writer:
  • A write must be allowed to start and complete its access at any time, regardless of Signal’s data state and the status of the read access

Required Properties of Signal (3)
• Data coherence:
  • Any item of data that is read from Signal must not have been changed since it was written (i.e. no writing or reading in part)
• Data freshness:
  • Any read access must obtain the data item designated as the current unread item in Signal, i.e. the data item made available by the latest completed write access
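Some of these properties can be checked mechanically on a recorded trace of slot accesses. The sketch below is a hypothetical checker, not part of the original material. It assumes a trace is a list of `(event, slot)` pairs with the made-up event names `wr_begin`, `wr_end`, `rd_begin` and `rd_end`, and it covers only data coherence, data freshness and the rule that a read may not start while a write is in progress. It does not track the unread/read data state.

```python
def check_trace(trace):
    """Return a list of violation messages for a trace of (event, slot) pairs."""
    writing = None   # slot currently being written, or None
    reading = None   # slot currently being read, or None
    latest = None    # slot filled by the latest completed write
    violations = []
    for step, (event, slot) in enumerate(trace):
        if event == "wr_begin":
            # Coherence: the writer must not touch the slot being read.
            if reading == slot:
                violations.append(f"step {step}: coherence, write to slot {slot} during a read")
            writing = slot
        elif event == "wr_end":
            writing = None
            latest = slot
        elif event == "rd_begin":
            # Conditional asynchrony: no read may start during a write.
            if writing is not None:
                violations.append(f"step {step}: read started while a write is in progress")
            # Freshness: the read must use the latest completed write.
            if slot != latest:
                violations.append(f"step {step}: freshness, slot {slot} is not the latest written")
            reading = slot
        elif event == "rd_end":
            reading = None
        else:
            raise ValueError(f"unknown event {event!r}")
    return violations


if __name__ == "__main__":
    two_slots = [("wr_begin", 0), ("wr_end", 0), ("rd_begin", 0),
                 ("wr_begin", 1), ("wr_end", 1), ("rd_end", 0),
                 ("rd_begin", 1), ("rd_end", 1)]
    one_slot = [("wr_begin", 0), ("wr_end", 0), ("rd_begin", 0),
                ("wr_begin", 0), ("wr_end", 0), ("rd_end", 0)]
    print(check_trace(two_slots))
    print(check_trace(one_slot))
```

The `one_slot` trace shows the integrity problem with a single slot: since the writer can never be held up, a write that starts during a long read has nowhere to go but the slot being read.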

Data slots and Signal
• A “data slot” is a unique portion of the shared memory which may contain one item of data of arbitrary (but bounded) size
• Signal cannot be implemented using one slot only and still satisfy all of the above properties
• Let us construct a Signal with TWO data slots
• First, a formal specification, a State Graph (or Transition System), must be built

Formal spec of Signal
[Diagram: an automaton for Signal with events Write slot 0 (wr0), Write slot 1 (wr1), Read slot 0 (rd0) and Read slot 1 (rd1)]
Problem: construct a maximally permissible automaton, on the alphabet {wr0, wr1, rd0, rd1}, satisfying the required properties of the Signal ACM

State Graph constraints
1. Data states, their updates and asynchrony: a wr action is enabled in every state
2. Data coherence: [Diagram: state-graph fragments over states s with arcs wri, rdi, wrj, rdj] only if i<>j

State Graph constraints
3. Data freshness (slot swapping): [Diagram: an if/then rule on state-graph fragments with arcs wri, wrj, rdi, rdj between states s and s’, i<>j]
4. No “re-try loops” (persistency in reading): [Diagram: a path of wri and wrj arcs, annotated: there is no rdi on this path]
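Constraint 1 (a wr action is enabled in every state) is easy to check on any candidate transition system. The sketch below assumes a transition system stored as a dictionary from state to `{label: next_state}`; the function name and the toy graph are hypothetical, and the toy graph is deliberately not the 2-slot Signal state graph from the slides.

```python
def states_blocking_writer(transitions):
    """Return the states in which no wr0/wr1 action is enabled."""
    states = set(transitions)
    for arcs in transitions.values():
        states.update(arcs.values())   # include states that only appear as targets
    return sorted(
        s for s in states
        if not any(label.startswith("wr") for label in transitions.get(s, {}))
    )


if __name__ == "__main__":
    # Toy example only: state "c" offers a read but no write,
    # so it violates constraint 1.
    toy = {
        "a": {"wr0": "b"},
        "b": {"wr1": "a", "rd0": "c"},
        "c": {"rd0": "a"},
    }
    print(states_blocking_writer(toy))
```

An empty result means the writer is never held up, which is the unconditional asynchrony required of Signal's writer.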

State Graph for 2-slot Signal
[Figure: state graph with states s0 (init state) to s5 and arcs labelled wr0, wr1, rd0 and rd1]

How to implement 2-slot Signal?
[Figure: the state graph of the 2-slot Signal, states s0 (init state) to s5]
• In order to implement Signal we must distribute states and events between the elements of the implementation architecture
• For that we must first separate states using a behavioural model of the implementation

Implementation architecture
The following structure must be kept in mind:
[Diagram: the Writer and the Reader each have data access to the data slots (wr0, wr1 and rd0, rd1) and control access to the Signal control block (Wreq/Wack for the Writer, Rreq/Rack for the Reader)]
In a hardware implementation of Signal control, latches and logic will be used to generate signals corresponding to the steering events wri and rdi, events on handshakes with the writer and reader, and some internal events

Behavioural model for Signal
• Petri nets can be used as a behavioural model (algorithm) for Signal:
  • A 1-safe Petri net can be synthesised from a finite Transition System using the theory of regions (Ehrenfeucht, Rozenberg et al.)
  • A 1-safe Petri net can be implemented in a self-timed circuit using either direct translation techniques or logic synthesis from Signal Transition Graphs (Yakovlev, Koelmans 98)

State Graph refinement
[Figure: the state graph of the 2-slot Signal (states s0 to s5) next to a refined graph with additional labels l0 to l3, m0 and m1]
This Transition System cannot be synthesised into a 1-safe Petri net with unique event labelling: it violates some separation conditions and requires refinement. There is also arbitration (a conflict relation) between rdi and wrj events; in a physical implementation one cannot disable output actions

State Graph refinement
[Figure: the refined graph with labels wr0, wr1, rd0, rd1, l0 to l3, m0 and m1]
Now arbitration is between internal events, while wri and rdj are persistent

Distributing states between Write and Read parts
Write part: [Figure: the refined graph partitioned into Write elementary states 1 to 6 and Write superstates]

Distributing states between Write and Read parts
Read part: [Figure: the refined graph partitioned into Read elementary states 7 to 12 and Read superstates]

Completing the Petri net model
[Figure: Petri net with places 1 to 12 and labels wr0, wr1, rd0, rd1, l0 to l3, m0 and m1]

Introducing binary control variables
[Figure: the Petri net annotated with transitions w+, w-, r+, r- and conditions w=0, w=1, r=0, r=1]
‘w’ encodes the slot being accessed for writing
‘r’ encodes the slot being accessed for reading

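Towards circuit implementation
[Diagram: Data-in and Data-out connected to Slot 0 and Slot 1; a Write part (Wreq/Wack) issuing wr0 and wr1 and a Read part (Rreq/Rack) issuing rd0 and rd1; control variables w and r with set/reset and test connections between the two parts]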

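Direct translation of PNs to circuits
[Figure: places p1 and p2 of a Petri net and the corresponding circuit cells, with signal values (0), (1) and 1*, a Controlled Operation, and an output To Operation]

Direct translation of PNs to circuits
[Figure: the same cells with signal changes 0->1 and 1->0, and an output To Operation]

Direct translation of PNs to circuits
[Figure: the same cells with signal changes 1->0, 0->1 and 1->0->1, and an output To Operation]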
