Introduction to Silicon Programming in the Tangram/Haste language

Material adapted from lectures by:

Prof.dr.ir Kees van Berkel

[Dr. Johan Lukkien]

[Dr.ir. Ad Peeters]

at the Technical University of Eindhoven, the Netherlands


Handshake signaling and data

push channel versus pull channel

[Figure: a push channel and a pull channel, each drawn between an active side and a passive side, with request wire ar, acknowledge wire ak, and data wires ad.]

Handshake signaling: push channel

[Figure: timing diagram over time for a push channel, showing req ar, ack ak, and the data windows for early ad, broad ad, and late ad.]

Data bundling

To maintain event ordering at both sides of a channel, the circuit must satisfy the data bundling constraint:

  • for a push channel: the delay along the request wire must exceed the delay of the data wires;

  • for a pull channel: the delay along the acknowledge wire must exceed the delay of the data wires.
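The constraint can be phrased as a simple check. The following is a minimal sketch in Python rather than Tangram; the function name `bundling_ok` and the delay figures are hypothetical and serve only as illustration. The control wire is the request wire for a push channel and the acknowledge wire for a pull channel.

```python
def bundling_ok(control_delay, data_delays):
    """Return True when the control wire is slower than every data wire.

    control_delay: delay of the request wire (push) or acknowledge wire (pull).
    data_delays:   delays of the individual data wires, in the same unit.
    """
    return all(control_delay > d for d in data_delays)


# Hypothetical delays in nanoseconds, for illustration only.
print(bundling_ok(1.2, [0.8, 1.0, 1.1]))  # True: control is the slowest wire
print(bundling_ok(1.0, [0.8, 1.0]))       # False: one data wire is not faster
```

A strict comparison is used because the slide asks for the control delay to exceed the data delay, not merely to match it.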


Handshake signaling: pull channel

When data wires are invalid: multiple and incomplete transitions allowed.

[Figure: timing diagram over time for a pull channel, showing req ar, ack ak, and the data windows for early ad, broad ad, and late ad.]


Tangram assignment x:= f(y,z)

Handshake circuit

[Figure: handshake circuit for the assignment, built from variables x, y, and z, the operator f, and components marked |; labelled ports include yw, zw, xw0, xw1, and xr.]


Four-phase data transfer

[Figure: timing diagram over time for a transfer between channels b and c, with signal traces r / br, ba / cr, ca / a, and bd / cd, and instants numbered 1 to 5.]


Handshake latch

[Figure: handshake latch x with write port w and read port r; labelled wires include wr, wd, and rd.]

[ [ w ; [w : rd:= wd] [] r ; r] ]

  • 1-bit handshake latch: wd ∧ wr → rd↑, ¬wd ∧ wr → rd↓, wk = wr, rk = rr


N-bit handshake latch

[Figure: N-bit handshake latch with control wires wr, wk, rr, and rk, write data wires wd1 .. wdN, and read data wires rd1 .. rdN.]

area, delay, energy

  • area: 2(N+1) gate equivalents

  • delay per cycle: 4 gate delays

  • energy per write cycle: 4 + 0.5*2N transitions, on average
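To get a feel for these figures, they can be evaluated for a given width. This is a small Python sketch that only restates the three formulas from the slide; the function name and dictionary keys are hypothetical.

```python
def latch_costs(n_bits):
    """Evaluate the slide's cost figures for an N-bit handshake latch."""
    return {
        "area_gate_eqs": 2 * (n_bits + 1),           # 2(N+1) gate equivalents
        "delay_gate_delays": 4,                      # independent of N
        "energy_transitions": 4 + 0.5 * 2 * n_bits,  # average per write cycle
    }


print(latch_costs(8))
# {'area_gate_eqs': 18, 'delay_gate_delays': 4, 'energy_transitions': 12.0}
```

The factor 0.5 appears to reflect that, on average, half of the data bits change on a write; that reading is an assumption, not something the slide states.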


Transferrer

[Figure: transferrer with ports a, b, and c; labelled wires are ar, ak, br, bk, bd, cr, ck, and cd.]

[ [ a : (b ; c)] ; [ a : (b ; cd:= bd ; c ; cd:= )] ]


Multiplexer

[Figure: multiplexer, drawn as a component marked |, with ports a, b, and c.]

[ [ a : c ; a : (cd:= ad; c ; cd:= )[] b : c ; b : (cd:= bd; c ; cd:= )] ]

Restriction: arbr must hold at all times!


Multiplexer realization

[Figure: multiplexer realization, split into a control circuit and a data circuit.]


Logic/arithmetic operator

[Figure: operator f connected to channels a, b, and c.]

[ [ a : (b || c) ]; [ a : ((b || c) ; ad:= f(bd , cd ))]]

Cheaper realization (delay sensitive):

[ [ a : (b || c) ]; [ a : ((b || c) ; ad:= f(bd , cd ))]; “delay” ; ad:= ]


A one-place fifo buffer

[Figure: BUF1 with input channel a and output channel b.]

byte = type [0..255]

& BUF1 = main proc(a?chan byte & b!chan byte).
  begin x: var byte
  | forever do a?x ; b!x od
  end


[Figure: handshake circuit compiled from BUF1, composed of sequencer components (;), channels a and b, and variable x.]


2-place buffer

[Figure: two BUF1 instances in series, connecting channel a to channel c through the internal channel b.]

byte = type [0..255]

& BUF1 = proc (a?chan byte & b!chan byte).
  begin x: var byte
  | forever do a?x ; b!x od
  end

& BUF2: main proc (a?chan byte & c!chan byte).
  begin b: chan byte
  | BUF1(a,b) || BUF1(b,c)
  end



Two-place wagging buffer

[Figure: wag2 with input channel a and output channel b.]

byte = type [0..255]

& wag2: main proc(a?chan byte & b!chan byte).
  begin x,y: var byte
  | a?x ;
    forever do (a?y || b!x) ; (a?x || b!y) od
  end
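The behavior of wag2 can be mimicked with a Python generator. This is a sketch under stated assumptions, not a Tangram semantics: each parallel pair (a?y || b!x) is serialized as "output first, then input", which is only one of the interleavings the parallel composition allows, and the end of the input list stands in for an environment that stops sending.

```python
def wag2(inputs):
    """Behavioral model of the two-place wagging buffer.

    Values alternate between x and y; they are never copied from one
    variable to the other.
    """
    source = iter(inputs)
    try:
        x = next(source)      # a?x
        while True:
            yield x           # b!x   (one interleaving of a?y || b!x)
            y = next(source)  # a?y
            yield y           # b!y   (one interleaving of a?x || b!y)
            x = next(source)  # a?x
    except StopIteration:
        return                # environment stopped sending


print(list(wag2([10, 20, 30, 40, 50])))  # [10, 20, 30, 40, 50]
```

The output order equals the input order, so wag2 behaves as a FIFO while performing no variable-to-variable assignments.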


Two-place ripple register

…
  begin x0, x1: var byte
  | forever do b!x1 ; x1:=x0 ; a?x0 od
  end


4-place ripple register

byte = type [0..255]

& rip4: main proc (a?chan byte & b!chan byte).
  begin x0, x1, x2, x3: var byte
  | forever do b!x3 ; x3:=x2 ; x2:=x1 ; x1:=x0 ; a?x0 od
  end


4-place ripple register

[Figure: ripple register drawn as a chain of variables x0, x1, x2, x3.]

  • area: N (Avar + Aseq)

  • cycle time: Tc = (N+1) T:=

  • cycle energy: Ec = N E:=
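A behavioral model makes the shifting and the cost formulas concrete. The Python sketch below follows the rip4 loop step by step. Two assumptions are made: the initial register contents are unknown and modeled as `None`, and T:= and E:= are read as the time and energy of a single assignment. All function names are hypothetical.

```python
def ripple_register(inputs, n=4):
    """Model of: forever do b!x[n-1] ; x[n-1]:=x[n-2] ; ... ; x1:=x0 ; a?x0 od."""
    x = [None] * n                 # x[0] .. x[n-1], contents initially unknown
    outputs = []
    for value in inputs:
        outputs.append(x[n - 1])   # b!x3
        for i in range(n - 1, 0, -1):
            x[i] = x[i - 1]        # x3:=x2 ; x2:=x1 ; x1:=x0
        x[0] = value               # a?x0
    return outputs


def cycle_time(n, t_assign):
    """Tc = (N+1) T:= from the slide."""
    return (n + 1) * t_assign


def cycle_energy(n, e_assign):
    """Ec = N E:= from the slide."""
    return n * e_assign


print(ripple_register([1, 2, 3, 4, 5, 6]))  # [None, None, None, None, 1, 2]
print(cycle_time(4, 1.0), cycle_energy(4, 1.0))  # 5.0 4.0
```

Each value appears at b four cycles after it entered at a, and both cycle time and cycle energy grow linearly with N, which is what motivates the alternatives on the following slides.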


Introducing vacancies

…
  begin x0, x1, x2, x3, v: var byte
  | forever do
      (b!x3 ; x3:=x2 ; x2:=v) || (v:=x1 ; x1:=x0 ; a?x0)
    od
  end

  • what is wrong?


Introducing vacancies

forever do ((b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0 ; a?x0)) ; x2:=v od

or:

forever do ((b!x3 ; x3:=x2) || (v:=x1 ; x1:=x0)) ; (x2:=v || a?x0) od


“synchronous” 4-place ripple register

[Figure: register drawn at several stages, with variables m0, m1, m2, m3 and s0, s1, s2, and with labels x0 and b.]

forever do
  (s0:=m0 || s1:=m1 || s2:=m2 || b!m3) ;
  (a?m0 || m1:=s0 || m2:=s1 || m3:=s2)
od
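The two-phase structure of this loop can be modeled directly. In the Python sketch below, each parallel group is executed as one step in which all right-hand sides are read before any left-hand side is written; that is an assumption about how to serialize ||, and it is safe here because no variable is both read and written within one group. Initial contents are modeled as `None`.

```python
def synchronous_register(inputs):
    """Model of the two-phase m/s register: copy m to s, then s back to m."""
    m = [None] * 4   # m0 .. m3
    s = [None] * 3   # s0 .. s2
    outputs = []
    for value in inputs:
        # Phase 1: s0:=m0 || s1:=m1 || s2:=m2 || b!m3
        s = m[:3]
        outputs.append(m[3])
        # Phase 2: a?m0 || m1:=s0 || m2:=s1 || m3:=s2
        m = [value] + s
    return outputs


print(synchronous_register([1, 2, 3, 4, 5, 6]))
# [None, None, None, None, 1, 2]
```

The cycle consists of two steps regardless of the register length, at the price of the extra s variables.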


4-place wagging register

[Figure: wagging register between input a and output b, drawn at several stages, with variables labelled x0, x1, x2, x3, y0, and y1.]

forever do
  b!x1 ; x1:=x0 ; a?x0 ;
  b!y1 ; y1:=y0 ; a?y0
od
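The wagging register reaches the same four-place delay with fewer moves per value. The Python sketch below models the loop and counts variable-to-variable assignments; the move counter and all names are hypothetical additions for illustration, and initial contents are modeled as `None`.

```python
def wagging_register(inputs):
    """Model of: forever do b!x1 ; x1:=x0 ; a?x0 ; b!y1 ; y1:=y0 ; a?y0 od.

    Returns the output sequence and the number of internal assignments.
    """
    halves = {"x": [None, None], "y": [None, None]}  # [v0, v1] per half
    outputs = []
    moves = 0
    for i, value in enumerate(inputs):
        half = halves["x" if i % 2 == 0 else "y"]    # halves take turns
        outputs.append(half[1])                      # b!x1  or  b!y1
        half[1] = half[0]                            # x1:=x0 or y1:=y0
        moves += 1
        half[0] = value                              # a?x0  or  a?y0
    return outputs, moves


outputs, moves = wagging_register([1, 2, 3, 4, 5, 6])
print(outputs)  # [None, None, None, None, 1, 2]
print(moves)    # 6: one internal assignment per value
```

Each value is moved once inside the register, where the 4-place ripple register moves it three times.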


8-place register

4-way wagging

forever do
  b!u1 ; u1:=u0 ; a?u0 ;
  b!v1 ; v1:=v0 ; a?v0 ;
  b!x1 ; x1:=x0 ; a?x0 ;
  b!y1 ; y1:=y0 ; a?y0
od


Four 8×8 shift registers compared


Tangram/Haste

  • Purpose: programming language for asynchronous VLSI circuits.

  • Creator: Tangram team @ Philips Research Labs (proto-Tangram 1986; release 2 in 1998).

  • Inspiration: Hoare’s CSP, Dijkstra’s GCL.

  • Lectures: no formal introduction; manual hand-out (learn by example, learn by doing).

  • Main tools: compiler, analyzer, simulator, viewer.


2-place buffer

[Figure: two BUF1 instances in series, connecting channel a to channel c through the internal channel b.]

byte = type [0..255]

& BUF1 = proc (a?chan byte & b!chan byte).
  begin x: var byte
  | forever do a?x ; b!x od
  end

& BUF2: main proc (a?chan byte & c!chan byte).
  begin b: chan byte
  | BUF1(a,b) || BUF1(b,c)
  end


Median filter

[Figure: Median with input channel a and output channel b.]

median: main proc(a? chan W & b! chan W).
  begin x,y,z: var W
      & xy, yz, zx: var bool
  | forever do
      ((z:=y ; y:=x) || yz:=xy) ;
      a?x ;
      (xy:= x<=y || zx:= z<=x) ;
      if zx=xy then b!x
      or xy=yz then b!y
      or yz=zx then b!z
      fi
    od
  end
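The selection logic is compact enough to be worth checking by hand. The Python sketch below mirrors the loop body; the initial window (all samples equal to `initial`, hence xy true) is an assumption, since the program leaves the initial state open. The booleans mean xy = (x <= y), yz = (y <= z), zx = (z <= x). After the shift, the old xy is exactly the new yz, which is why only two comparisons are needed per sample.

```python
def median_filter(samples, initial=0):
    """Running median of the last three samples, following the Tangram loop."""
    x = y = z = initial
    xy = True                          # x <= y holds for the assumed start
    outputs = []
    for sample in samples:
        z, y, yz = y, x, xy            # (z:=y ; y:=x) || yz:=xy
        x = sample                     # a?x
        xy, zx = x <= y, z <= x        # xy:= x<=y || zx:= z<=x
        if zx == xy:                   # x lies between z and y
            outputs.append(x)
        elif xy == yz:                 # y lies between x and z
            outputs.append(y)
        else:                          # yz == zx: z lies between y and x
            outputs.append(z)
    return outputs


print(median_filter([5, 1, 4, 2, 8, 3]))  # [0, 1, 4, 2, 4, 3]
```

Of three booleans at least two are equal, so one of the three guards always holds and the final branch needs no test of its own.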


Greatest Common Divisor

[Figure: GCD with input channel ab and output channel c.]

gcd: main proc (ab?chan <<byte,byte>> & c!chan byte).
  begin x,y: var byte
  | forever do
      ab?<<x,y>> ;
      do x<y then y:= y-x
      or x>y then x:= x-y
      od ;
      c!x
    od
  end
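The inner do ... od loop is the subtractive form of Euclid's algorithm: it repeats while one of the guards holds and ends when x = y. A Python sketch of one iteration of the outer loop follows. The input check is an added assumption: with a zero operand the subtraction never changes anything, so the loop would not terminate.

```python
def gcd_subtractive(x, y):
    """One pass of the Tangram gcd body: read <<x,y>>, reduce, output x."""
    if x <= 0 or y <= 0:
        raise ValueError("both operands must be positive")
    while x != y:          # do ... od runs while some guard holds
        if x < y:
            y = y - x      # x<y then y:= y-x
        else:
            x = x - y      # x>y then x:= x-y
    return x               # c!x


print(gcd_subtractive(12, 18))  # 6
```

Both branches are auto assignments (the target also appears on the right-hand side), which becomes relevant when the circuit cost of this program is discussed.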


Nacking Arbiter

[Figure: nacking arbiter with channels a and b.]

nack: main proc (a?chan bool & b!chan bool).
  begin na,nb: var bool
  | <<na,nb>> := <<true,true>> ;
    forever do
      sel probe(a) then a!nb || na:= na#nb
      or  probe(b) then b!na || nb:= nb#na
      les
    od
  end


C: Tangram → handshake circuit

[Figure: handshake circuits for C(T) and for C(R;S), the latter built from a sequencer (;) and the circuits of R and S; channels are labelled a, b, and c.]


C: Tangram → handshake circuit

[Figure: two drawings of C(R;S) as a sequencer (;) over R and S, with channels a, b, and c; one drawing includes a component marked |.]


C: Tangram → handshake circuit

C(R||S) =

[Figure: handshake circuit for C(R||S), built from a parallel component (||) over R and S; further labels are |, i, o, and rx.]


Tangram Compilation

[Figure: diagram relating a Tangram program T, its handshake process (via H), its handshake circuit (via C), and the VLSI circuit (via E), with parallel composition ||.]

 · H · T = || · C · T


VLSI programming of asynchronous circuits

[Figure: design flow. The compiler turns a Tangram program into a handshake circuit, and the expander turns that into an asynchronous circuit (netlist of gates); the simulator provides feedback on behavior, area, time, energy, and test coverage.]


Tangram tool box

Let Rlin4.tg be a Tangram program:

  • htcomp -B Rlin4

    • compiles Rlin4.tg into Rlin4.hcl, a handshake circuit

  • htmap Rlin4

    • produces Rlin4*.v files, a CMOS standard-cell circuit

  • htsim Rlin4 a b

    • executes Rlin4.hcl with files a, b for input/output

  • htview Rlin4

    • provides interactive viewing of simulation results


Tangram program “Conway”

[Figure: pipeline of P, Q, and R, connected by channels a, b, c, and d.]

B1 = type [0..1]
& B2 = type <<B1,B1>>
& B3 = type <<B1,B1,B1>>
& P = …
& Q = …
& R = …
& conway: main proc(a?chan B2 & d!chan B3).
  begin b,c: chan B1
  | P(a,b) || Q(b,c) || R(c,d)
  end


Tangram program “Conway”

& P = proc(a?chan B2 & b!chan B1).
  begin x: var B2
  | forever do a?x ; b!x.0 ; b!x.1 od
  end

& Q = proc(b?chan B1 & c!chan B1).
  begin y: var B1
  | forever do b?y ; c!y od
  end

& R = proc(c?chan B1 & d!chan B3).
  begin x,y,z: var B1
  | forever do c?x ; c?y ; c?z ; d!<<x,y,z>> od
  end
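The three processes repackage a stream of pairs into a stream of triples. The Python sketch below models each process as a generator and each channel as the iteration between them. This captures the data flow only, not the concurrency or handshaking; a finite input list stands in for the environment, and the lowercase function names are hypothetical.

```python
def p(a):
    """P: split every pair from a into two single elements on b."""
    for x in a:            # a?x
        yield x[0]         # b!x.0
        yield x[1]         # b!x.1


def q(b):
    """Q: one-place buffer from b to c."""
    for y in b:            # b?y
        yield y            # c!y


def r(c):
    """R: collect three elements from c into one triple on d."""
    source = iter(c)
    while True:
        try:
            x = next(source)   # c?x
            y = next(source)   # c?y
            z = next(source)   # c?z
        except StopIteration:
            return             # input exhausted; an incomplete triple is dropped
        yield (x, y, z)        # d!<<x,y,z>>


def conway(a):
    """P(a,b) || Q(b,c) || R(c,d), modeled as a generator pipeline."""
    return r(q(p(a)))


print(list(conway([(0, 1), (1, 1), (0, 0)])))  # [(0, 1, 1), (1, 0, 0)]
```

Three input pairs carry six elements and therefore produce exactly two output triples.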


VLSI programming for …

  • Low costs:

    • introduce resource sharing.

  • Low delay (high throughput):

    • introduce parallelism.

  • Low energy (low power):

    • reduce activity; …


VLSI programming for low costs

  • Keep it simple!!

  • Introduce resource sharing: commands, auxiliary variables, expressions, operators.

  • Enable resource sharing, by:

    • reducing parallelism

    • making similar commands equal


Command sharing

[Figure: command S, drawn twice and then once behind a component marked | with inputs 0 and 1.]

P : proc(). S

P() ; … ; P()

S ; … ; S


Command sharing: example

[Figure: the input command on channel a writing xw, drawn twice and then once behind components marked | with inputs 0 and 1.]

ax : proc(). a?x

ax() ; … ; ax()

a?x ; … ; a?x


Procedure definition vs declaration

Procedure definition: P = proc (). S

  • provides a textual shorthand (expansion)

  • each call generates a copy of the resource, i.e. no sharing

Procedure declaration: P : proc (). S

  • defines a sharable resource

  • each call generates access to this resource


Command sharing

  • Applies only to sequentially used commands.

  • Saves resources almost always (i.e. when the command is more costly than a mixer).

  • Impact on delay and energy is often favorable.

  • Introduced by means of a procedure declaration.

  • Makes the Tangram program less readable. Therefore, apply it after the program is correct & sound.

  • Should really be applied by the compiler.


Sharing of auxiliary variables

  • x:=E is an auto assignment when E depends on x. This is compiled as aux:=E; x:=aux, where aux is a “fresh” auxiliary variable.

  • With multiple auto assignments to x, as in:

    x:=E; ... ; x:=F

    auxiliary variables can be shared, as in:

    aux:=E; aux2x(); ... ; aux:=F; aux2x()

    with aux2x : proc(). x:=aux


Expression sharing

[Figure: expression E, drawn twice with outputs e0 and e1 and then once behind a component marked | that serves e0 and e1.]

f : func(). E

x:=f() ; … ; a!f()

x:=E ; … ; a!E


Expression sharing

  • Applies only to sequentially used expressions.

  • Often saves resources (i.e. when the expression is more costly than the demultiplexer).

  • Introduced by means of function declarations.

  • Makes the Tangram program less readable. Therefore, apply it after the program is correct & sound.

  • Should really be applied by the compiler.


Operator sharing

  • Consider x0 := y0+z0 ; … ; x1 := y1+z1 .

  • Operator + can be shared by introducing

    add : func(a,b? var T): T. a+b

    and applying it as in x0 := add(y0,z0) ; … ; x1 := add(y1,z1) .


Operator sharing: the costs

  • Operator sharing may introduce multiplexers to (all) inputs of the operator and a demultiplexer to its output.

  • This form of sharing only reduces costs when:

    • operator is expensive,

    • some input(s) and/or output are common.
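The trade-off can be made tangible with a toy cost model in Python. Everything numeric here is hypothetical: the model simply charges one operator per use when nothing is shared, and one operator plus one multiplexer per non-common input plus one demultiplexer for a non-common output when the operator is shared. Real gate counts depend on the operator, the word width, and the cell library.

```python
def shared_is_cheaper(op_cost, uses, mux_cost, muxed_inputs,
                      demux_cost, output_demuxed):
    """Compare total cost with and without sharing one operator.

    muxed_inputs:   number of operator inputs that differ between uses.
    output_demuxed: True when the uses write to different targets.
    """
    unshared = uses * op_cost
    shared = op_cost + muxed_inputs * mux_cost
    if output_demuxed:
        shared += demux_cost
    return shared < unshared


# x0 := y0+z0 ; x1 := y1+z1  (both inputs and the output differ)
print(shared_is_cheaper(10, 2, 4, 2, 4, True))    # False: 22 versus 20

# x := y+z0 ; x := y+z1      (input y and output x are common)
print(shared_is_cheaper(10, 2, 4, 1, 4, False))   # True: 14 versus 20
```

With these made-up figures, sharing only pays off once an input and the output are common, which is the situation the second bullet describes.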


Operator sharing: example

  • Consider x := y+z0 ; … ; x := y+z1 .

  • Operator + can be shared by introducing

    add2y : proc(b? var T). x:=y+b

    and applying it as in add2y(z0) ; … ; add2y(z1) .


Greatest Common Divisor

[Figure: GCD with input channel ab and output channel c.]

gcd: main proc (ab?chan <<byte,byte>> & c!chan byte).
  begin x,y: var byte
  | forever do
      ab?<<x,y>> ;
      do x<y then y:= y-x
      or x>y then x:= x-y
      od ;
      c!x
    od
  end


Assignment: make GCD smaller

  • Both assignments (y:= y-x and x:= x-y) are auto assignments and hence require an auxiliary variable.

  • The program requires 4 arithmetic resources (twice < and -).

  • Reduce the costs of GCD by saving on auxiliary variables and arithmetic resources. (Beware the costs of multiplexing!)

  • Use of ff variables is not allowed for this exercise.

