Embedded Computer Architecture 2 (BOCA: Bijzondere Onderwerpen Computer Architectuur, i.e. Special Topics in Computer Architecture). Block A: Introduction. The aims of the course: show the relation between the algorithm and the architecture; derive the architecture from the algorithm; explain and formalize the design process.
Embedded Computer Architecture 2
Bijzondere Onderwerpen
Computer Architectuur
Block A
Introduction
A design description may express behavior, structure or geometry.
Pure behavioral, structural or geometrical descriptions do not exist in practice.
[Figure: chart with the axes Behavior, Geometry and Structure. Level labels in the figure: Application, Algorithm, Basic operator, Boolean logic, Physical level, Board level, Layout, Cell, Block level, Processing element, Basic block, Transistor.]
Verification:
The implementation i is the specification for the implementation i+1.
For practical reasons a specification must be executable.
[Figure: refinement chain Idea, Spec 0, Spec 1, ..., Spec N. The step from Idea to Spec 0 is verified by simulation only; the later steps by simulation or formal verification.]
Specification overloading means that the specification gives a possibly unwanted implementation suggestion,
i.e. the behavioral specification expresses structure
In practice:
A behavioral specification always contains structure.
Definition:
Architecture is the way in which hardware and software is structured;
the structure is usually based on grandiose design philosophies.
Architecture deals with fundamental elements that affect the way a system operates and thus its capabilities and its limitations.
The New American Computer Dictionary
Array processor
An array processor is a structure in which identical processing elements (PEs) are arranged regularly.
[Figure: arrays of PEs arranged in 1 dimension and in 2 dimensions]
Systolic array
In a systolic array processor all communication paths contain at least one unit delay (register).
[Figure: array of PEs; the marker on each connection is a register or delay]
Delay constraints are local. Therefore the array can be extended without limit, without changing the cells.
Many simple calculations on a lot of data in a short time.
General purpose processors do not provide sufficient processing power.
Application areas: filters, matrix algebra, transformations like the Fourier transform and the Z transform, sorting, ...
idea -> program (imperative) -> single assignment code (functional) -> recurrent relations -> dependency graph
Sorting: inserting one element
Identical descriptions of swapping:
if (x >= m[j])
{ y = m[j];
  m[j] = x;
  x = y;
}
if (x >= m[j]) swap(m[j], x);
m[j], x = MaxMin(m[j], x);
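Inserting an element x into a sorted array m of i elements such that the order is preserved. MaxMin keeps the larger value in m[j], so m is sorted in descending order and the free slot m[i] starts at -infinite:
m[i] = -infinite;
for(j = 0; j < i+1; j++)
{ m[j],x = MaxMin(m[j],x);
}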
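The insertion step can be tried out with a minimal Python sketch. The names `max_min` and `insert` are illustrative and not from the slides; the sketch assumes descending order and uses `-math.inf` as the sentinel for the free slot.

```python
import math

def max_min(a, b):
    """Return (larger, smaller); stands in for MaxMin on the slides."""
    return (a, b) if a >= b else (b, a)

def insert(m, i, x):
    """Insert x into m[0:i], which is sorted in descending order.

    m must have room for at least i+1 elements; m is changed in place.
    """
    m[i] = -math.inf              # sentinel: smaller than every real value
    for j in range(i + 1):
        m[j], x = max_min(m[j], x)
    return m

m = [9, 5, 2, None]               # three sorted entries, one free slot
insert(m, 3, 6)
print(m)                          # [9, 6, 5, 2]
```

Each position keeps the larger of its own value and the travelling value x, and passes the smaller one on.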
Sorting N elements in an array consists of inserting an element, N times, into a sorted array with room for N elements such that the order is preserved. An empty array is ordered.
int in[0:N-1], x[0:N-1], m[0:N-1];
input:
for(int i = 0; i < N; i++)
{ x[i] = in[i]; m[i] = -infinite; }
body:
for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[j],x[i] = MaxMin(m[j],x[i]);}
}
output:
for(int j = 0; j < N; j++)
{ out[j] = m[j];}
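The three phases (input, body, output) can be sketched in Python as follows. The function name `insertion_sort_desc` is hypothetical, and the sketch assumes that the sentinel is minus infinity and that the result is in descending order.

```python
import math

def max_min(a, b):
    """Return (larger, smaller); stands in for MaxMin on the slides."""
    return (a, b) if a >= b else (b, a)

def insertion_sort_desc(inp):
    n = len(inp)
    # input: copy the data and fill m with the sentinel
    x = list(inp)
    m = [-math.inf] * n
    # body: insert x[i] into the sorted prefix m[0:i]
    for i in range(n):
        for j in range(i + 1):
            m[j], x[i] = max_min(m[j], x[i])
    # output
    return list(m)

print(insertion_sort_desc([3, 1, 4, 1, 5]))   # [5, 4, 3, 1, 1]
```

Note that m[j] and x[i] are overwritten many times here; this is the imperative form that the following slides turn into single assignment code.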
Sorting: Towards ‘Single assignment’
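Single assignment:
Each scalar variable is assigned only once.
Why?
Code:
x=a+b;
x=c*d;
[Figure: the two statements as graph nodes, a '+' node with inputs a and b and output x, and a '*' node with inputs c and d and output x. How do you connect these?]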
The description x=a+b; x=c*d; is already optimized towards implementation: memory optimization.
But, fundamentally you produce two different values, e.g. x1 and x2.
Start with m[j]:
m[j] at loop index i depends on the value at loop index i-1, so
for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[j],x[i] = MaxMin(m[j],x[i]);}
}
becomes
for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[i,j],x[i] = MaxMin(m[i-1,j],x[i]);}
}
Continue with x[i]:
x[i] at loop index j depends on the value at loop index j-1, so the loop nest becomes
for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1]);}
}
Sorting: The algorithm in ‘single assignment’
int in[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1];
input:
for(int i = 0; i < N; i++)
{ x[i,-1] = in[i]; m[i-1,i] = -infinite; }
body:
for(int i = 0; i < N; i++)
{ for(j = 0; j < i+1; j++)
  { m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1]);}
}
output:
for(int j = 0; j < N; j++)
{ out[j] = m[N-1,j];}
All scalar variables are assigned only once.
The algorithm satisfies the single assignment property.
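The single assignment property can be checked mechanically. The following Python sketch stores x and m in a dictionary that rejects a second write to the same index pair; `WriteOnce` and `sort_single_assignment` are illustrative names, and the index ranges (x starting at column -1, m starting at row -1) are an assumption based on the declarations of the algorithm.

```python
import math

class WriteOnce(dict):
    """Dictionary that refuses a second assignment to the same key."""
    def __setitem__(self, key, value):
        if key in self:
            raise ValueError(f"{key} assigned twice")
        super().__setitem__(key, value)

def max_min(a, b):
    """Return (larger, smaller); stands in for MaxMin on the slides."""
    return (a, b) if a >= b else (b, a)

def sort_single_assignment(inp):
    n = len(inp)
    x, m = WriteOnce(), WriteOnce()
    # input
    for i in range(n):
        x[i, -1] = inp[i]
        m[i - 1, i] = -math.inf
    # body
    for i in range(n):
        for j in range(i + 1):
            m[i, j], x[i, j] = max_min(m[i - 1, j], x[i, j - 1])
    # output
    return [m[n - 1, j] for j in range(n)]

print(sort_single_assignment([3, 1, 4, 1, 5]))   # [5, 4, 3, 1, 1]
```

The run completes without a `ValueError`, so no element of x or m is written more than once.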
[Figure, shown in several animation steps: the (i, j) index space of the algorithm, with both axes running from -1 to n-1. It shows where in, x, m and out are located and which MaxMin (MM) nodes are evaluated by the loop nest above.]
A description in single assignment can be directly translated into a recurrent relation:
declaration: in[0:N-1], out[0:N-1], x[0:N-1, -1:N-1], m[-1:N-1, 0:N-1];
input: x[i,-1] = in[i]
       m[i-1,i] = -infinite
body: m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1])
output: out[j] = m[N-1,j]
area: { 0 <= i < N; 0 <= j < i+1 }
Notice that the order of these relations is arbitrary.
[Figure: a MaxMin node with inputs m[i-1,j] and x[i,j-1] and outputs m[i,j] and x[i,j]]
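That the order of the relations is arbitrary can be illustrated with a demand-driven evaluation: nothing is computed until an output asks for it. This Python sketch is only one possible reading; the helper names `cell`, `m` and `x` are introduced here, and the index conventions (x[i,-1] = in[i], m[i-1,i] = -infinite) are assumed as in the single assignment algorithm.

```python
import math
from functools import lru_cache

def sort_by_recurrence(inp):
    n = len(inp)

    @lru_cache(maxsize=None)
    def cell(i, j):
        """Body relation: return (m[i,j], x[i,j])."""
        a, b = m(i - 1, j), x(i, j - 1)
        return (a, b) if a >= b else (b, a)

    def m(i, j):
        # input relation m[i-1,i] = -infinite, otherwise the body
        return -math.inf if j == i + 1 else cell(i, j)[0]

    def x(i, j):
        # input relation x[i,-1] = in[i], otherwise the body
        return inp[i] if j == -1 else cell(i, j)[1]

    # output relation, requested in reverse order on purpose
    out = [None] * n
    for j in reversed(range(n)):
        out[j] = m(n - 1, j)
    return out

print(sort_by_recurrence([3, 1, 4, 1, 5]))   # [5, 4, 3, 1, 1]
```

The relations only state which values depend on which; the evaluation order follows from the dependencies, not from the textual order.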
Sorting: Body in two dimensions
body: m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1])
The body is executed for all i and j. Hence two dimensions.
[Figure: the MaxMin node at location (i, j) in the i-j plane, with inputs m[i-1,j] and x[i,j-1] and outputs m[i,j] and x[i,j]]
Sorting: Body implementation
body: m[i,j],x[i,j] = MaxMin(m[i-1,j],x[i,j-1])
if( m[i-1,j] <= x[i,j-1])
{ m[i,j] = x[i,j-1]; x[i,j] = m[i-1,j]; }
else
{ m[i,j] = m[i-1,j]; x[i,j] = x[i,j-1]; }
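Sorting: Implementation N = 4
PE = MaxMin
[Figure: triangular array of PEs for N = 4; row i (i = 0..3) contains i+1 PEs. Row i receives x[i,-1] as input; the boundary inputs m[-1,0], m[0,1], m[1,2] and m[2,3] are set to -infinite; the outputs are m[3,0], m[3,1], m[3,2] and m[3,3].]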
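Cartesian product: the set of all tuples, $X \times Y = \{ (x, y) \mid x \in X,\ y \in Y \}$.
The number of tuples in the set: $|X \times Y| = |X| \cdot |Y|$.
If Q is a set and P is a subset of Q, then P is an element of the set of all subsets of Q, the power set $\mathcal{P}(Q)$.
The number of subsets of Q is $2^{|Q|}$.
Hence, the set of all subsets of $X \times Y$ is $\mathcal{P}(X \times Y)$,
and the number of subsets of $X \times Y$ is $2^{|X| \cdot |Y|}$.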
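The counting rules for tuples and subsets can be checked on small sets. This Python sketch assumes the standard definitions (Cartesian product and power set); the sets `X` and `Y` and the helper `power_set` are made up for the illustration.

```python
from itertools import combinations, product

def power_set(q):
    """Return all subsets of q as a list of frozensets."""
    items = list(q)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

X = {0, 1}
Y = {"a", "b", "c"}

pairs = set(product(X, Y))        # Cartesian product X x Y
subsets = power_set(pairs)        # all subsets of X x Y

print(len(pairs))                 # 6  = |X| * |Y|
print(len(subsets))               # 64 = 2 ** (|X| * |Y|)
```

Every function from X to Y is one of these 64 subsets, which is why the number of subsets is of interest.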
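Something on functions
$X \to Y$ is the set of all functions with domain X and codomain Y.
Each element of the domain X is mapped by F on a single element of the codomain Y.
Hence F can be represented as a set of tuples: F is a function in $X \to Y$ if and only if $F \subseteq X \times Y$ and for every $x \in X$ there is exactly one $y \in Y$ with $(x, y) \in F$.
Hence, $(X \to Y) \subseteq \mathcal{P}(X \times Y)$.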
Functions, Arrays, Tuples, Sequences, ....
Arrays, tuples and sequences are all representations of the same set of functions $D_{l,u} \to V$,
in which $D_{l,u} = \{ z \in Z \mid l \le z \le u \}$ is a closed subset of the set of integers Z
and V is some value codomain.
So an array y[l:u] corresponds to a function $y \in D_{l,u} \to V$.
Hence, $y_i$, y(i) and y[i] are syntactically different notations for the function value in i.
Functions on more than one variable: Currying
A function on two variables can be represented in three different ways:
$F \in (X \times Y) \to V$, $F \in X \to (Y \to V)$ and $F \in Y \to (X \to V)$.
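A small Python sketch of the three representations, assuming they are the uncurried form on tuples and the two curried forms (one per argument order). The function names and the example function are made up for the illustration.

```python
def f_pair(args):
    """Uncurried: one argument that is a tuple (x, y)."""
    x, y = args
    return 10 * x + y

def f_x_first(x):
    """Curried on x: returns a function that still expects y."""
    return lambda y: f_pair((x, y))

def f_y_first(y):
    """Curried on y: returns a function that still expects x."""
    return lambda x: f_pair((x, y))

print(f_pair((2, 3)))      # 23
print(f_x_first(2)(3))     # 23
print(f_y_first(3)(2))     # 23
```

The curried form `f_x_first(2)` is itself a function, which is the reading used for notations such as h*(i)(z): h*(i) is a function on time, and h*(i)(z) is its value at z.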
Linear Time Invariant Systems
[Figure: a system F with an input stream and the output stream y, each plotted against time z]
x and y are streams.
Time is represented by the set of integers Z,
so F maps functions on functions: $F \in (Z \to V) \to (Z \to V)$.
Obviously, this class of functions also models systems that cannot exist in reality, for example non-causal systems.
Why?
Linear: because they can easily be described.
Time-invariant: because electrical systems like transistors, resistors, capacitances and inductances satisfy this property.
The behavior of a linear time-invariant system can be fully described by its impulse response h, i.e. the response on the output to a single unit pulse on the input.
The response y on the output to an input stream x then follows from:
$y(z) = \sum_{i=-\infty}^{\infty} x(i) \cdot h(z - i)$
or
$y = x * h$
We will derive this convolution operation for time discrete signals.
z represents time,
i represents the location of the unit pulse.
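The convolution algorithm
Let the unit sample sequence $\delta(i)$ be defined by $\delta(i)(z) = 1$ if $z = i$ and $\delta(i)(z) = 0$ otherwise.
Consider a linear time-invariant system F.
Let $h^*(i)$ be the response of this system to the unit sample sequence $\delta(i)$, i.e. $h^*(i) = F(\delta(i))$.
[Figure: $\delta(i)$ enters F and $h^*(i)$ leaves it; plots of $\delta(i)(z)$ and $h^*(i)(z)$ against z]
F is time-invariant, so $h^*(i)(z) = h^*(0)(z - i)$. Writing h for $h^*(0)$ gives $h^*(i)(z) = h(z - i)$.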
Example
[Figure: plots against z of three inputs and the corresponding responses of F: $\delta(0)(z)$ with $h^*(0)(z)$, $\delta(1)(z)$ with $h^*(1)(z)$, and $\frac{1}{2} \cdot \delta(2)(z)$ with $\frac{1}{2} \cdot h^*(2)(z)$]
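From the preceding we derive:
$x = \sum_{i=-\infty}^{\infty} x(i) \cdot \delta(i)$, in which x(i) is a scalar and $\delta(i)$ is a function on Z.
F is linear and x(i) is a scalar, hence
$y = F(x) = \sum_{i=-\infty}^{\infty} x(i) \cdot F(\delta(i)) = \sum_{i=-\infty}^{\infty} x(i) \cdot h^*(i)$
Recall that, with $h = h^*(0)$, $h^*(i)(z) = h(z - i)$, so
$y(z) = \sum_{i=-\infty}^{\infty} x(i) \cdot h(z - i)$
This is called the convolution operation, denoted by $y = x * h$.
We will apply this formula several times.
With j = z - i, we obtain:
$y(z) = \sum_{j=-\infty}^{\infty} h(j) \cdot x(z - j)$
and if the impulse response h is finite (bounded), i.e. $h(j) = 0$ for $j < 0$ and for $j \ge N$,
we get
$y(z) = \sum_{j=0}^{N-1} h(j) \cdot x(z - j)$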
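A minimal Python sketch of the finite convolution sum, under the assumption that both h and x are zero outside the index range of their lists. `convolve` follows the sum over j directly; `superpose` builds the same result as in the derivation, by adding scaled and shifted copies of the impulse response. Both function names are illustrative.

```python
def convolve(h, x):
    """y(z) = sum over j of h(j) * x(z - j), for finite lists h and x."""
    n_out = len(x) + len(h) - 1
    y = [0] * n_out
    for z in range(n_out):
        for j in range(len(h)):
            if 0 <= z - j < len(x):       # x is zero outside its list
                y[z] += h[j] * x[z - j]
    return y

def superpose(h, x):
    """Sum over i of x(i) times the impulse response shifted by i."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

h = [1, 2, 1]
x = [1, 0, 3]
print(convolve(h, x))    # [1, 2, 4, 6, 3]
print(superpose(h, x))   # [1, 2, 4, 6, 3]
```

Both loops compute the same products; they only differ in which index is held fixed, which is the substitution j = z - i.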
Dependency Graphs and Signal Flow Graphs
[Figure: array of PEs connected by arrows]
Hence, the graph describes the dependencies of the data that is communicated, or said differently:
the graph describes the way in which the data values at the outputs of a processing element depend on the data at the outputs of the other processing elements.
So we may consider it as a Dependency Graph or a Signal Flow Graph.
Dependency Graph:
All communicated values are scalars and the processing elements are functions on scalars. Each arrow carries only one value. Time does not play a role.
$PE \in V^N \to V^N$, in which V is the value domain and number of inputs = number of outputs = N.
[Figure: array of PEs connected by arrows]
Signal Flow Graph:
The communicated values are streams, i.e. functions on time, and the processing elements are functions on streams.
$PE \in (Z \to V)^N \to (Z \to V)^N$, in which Z represents time.
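For simple algorithms the transformation from single assignment code to a recurrent relation is simple.
We will study this transformation by means of an example: Matrix-Vector multiplication, $c_i = \sum_{j=0}^{N-1} a_{i,j} \cdot b_j$.
The basic cell is described by: $s_{i,j} = s_{i,j-1} + a_{i,j} \cdot b_j$.
We have two indices i and j, so the dependency graph can be described as a two-dimensional array.
[Figure: the basic cell PE at (i, j), a multiplier and an adder, with inputs $b_j$, $a_{i,j}$ and $s_{i,j-1}$ and outputs $s_{i,j}$ and $b_j$]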
i
b1
b2
j
s0,1
S0,0
s0,1
s0,2=c0
0
PE
PE
PE
s1,0
0
PE
PE
s1,2=c1
PE
s2,0
s2,2=c2
0
PE
PE
PE
s3,0
s3,1
i
0
s3,2=c3
PE
PE
PE
DG1 of the Matrix Vector multiplication
(K = 4)
(N = 3)
b0, b1 and b2 are global dependencies.
Therefore this graph is called a Globally recursive Graph
i
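DG2 of the Matrix Vector multiplication (K = 4, N = 3)
[Figure: 4 x 3 array of PEs, rows i = 0..3 and columns j = 0..2. The partial sums run in the opposite direction: row i starts with $s_{i,3} = 0$ and ends with $c_i = s_{i,0}$; $b_0$, $b_1$ and $b_2$ are again fed to every PE in their column.]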
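The two dependency graphs can be compared with a short Python sketch. It assumes, from the edge labels in the figures, that DG1 accumulates the partial sums from j = 0 upward (starting at s[i,-1] = 0) and DG2 from j = N-1 downward (starting at s[i,N] = 0). The function names and the test matrix are made up.

```python
def matvec_dg1(a, b):
    """s[i,-1] = 0; s[i,j] = s[i,j-1] + a[i][j]*b[j]; c[i] = s[i,N-1]."""
    k, n = len(a), len(b)
    s = {}
    for i in range(k):
        s[i, -1] = 0
        for j in range(n):
            s[i, j] = s[i, j - 1] + a[i][j] * b[j]
    return [s[i, n - 1] for i in range(k)]

def matvec_dg2(a, b):
    """s[i,N] = 0; s[i,j] = s[i,j+1] + a[i][j]*b[j]; c[i] = s[i,0]."""
    k, n = len(a), len(b)
    s = {}
    for i in range(k):
        s[i, n] = 0
        for j in reversed(range(n)):
            s[i, j] = s[i, j + 1] + a[i][j] * b[j]
    return [s[i, 0] for i in range(k)]

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]   # K = 4, N = 3
b = [1, 0, 2]
print(matvec_dg1(a, b))   # [7, 16, 25, 3]
print(matvec_dg2(a, b))   # [7, 16, 25, 3]
```

Both give the same result because addition is associative; only the direction of the partial sums differs.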
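Variable naming and index assignment
A variable associated to an arrow gets the indices of the processing element that delivers its value.
[Figure: $PE_{i,j}$ at location (i, j) with inputs $a_{i,j-1}$, $b_{i-1,j-1}$ and $c_{i-1,j}$ and outputs $a_{i,j}$, $b_{i,j}$ and $c_{i,j}$]
Local constants get the indices of the processing element that they are in.
[Figure: $PE_{i,j}$ containing the local constant $v_{i,j}$]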
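Recurrent relations: Conclusion
The equation $c_i = \sum_{j=0}^{N-1} a_{i,j} \cdot b_j$
results in $s_{i,-1} = 0$, $s_{i,j} = s_{i,j-1} + a_{i,j} \cdot b_j$, $c_i = s_{i,N-1}$
and also results in $s_{i,N} = 0$, $s_{i,j} = s_{i,j+1} + a_{i,j} \cdot b_j$, $c_i = s_{i,0}$.
The associative operations $\sum$ and $\prod$ result in two different recurrent relations and thus in two different dependency graphs.
Other associative operations are for example ‘AND’ and ‘OR’.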
Changing global data dependencies into local data dependencies
Global data dependencies resist manipulating the dependency graph.
$c_i = \sum_{j=0}^{N-1} a_{i,j} \cdot b_j$
[Figure: a PE with a global data dependency, where $b_j$ is delivered directly to every PE in column j, next to a PE with a local data dependency, where $b_j$ arrives from the neighbouring PE as $d_{i-1,j}$]
[Figure: 4 x 3 array of PEs (K = 4, N = 3) in which $b_j$ enters at the top as $d_{-1,j}$ and is passed on as $d_{0,j}$, $d_{1,j}$, ...; row i starts with $s_{i,-1} = 0$ and ends with $s_{i,2} = c_i$]
So the matrix-vector multiplication becomes:
Relations:
$d_{-1,j} = b_j$
$d_{i,j} = d_{i-1,j}$
$s_{i,-1} = 0$
$s_{i,j} = s_{i,j-1} + a_{i,j} \cdot d_{i-1,j}$
$c_i = s_{i,N-1}$
This is a Locally recursive graph.
Alternative transformation from global data dependencies to local data dependencies
$c_i = \sum_{j=0}^{N-1} a_{i,j} \cdot b_j$
[Figure: a PE with a global data dependency on $b_j$ next to a PE with a local data dependency, where $b_j$ is passed on as $d_{i,j}$]
[Figure: 4 x 3 array of PEs (K = 4, N = 3) in which $b_j$ enters at the bottom as $d_{4,j}$ ($b_0 = d_{4,0}$, $b_1 = d_{4,1}$, $b_2 = d_{4,2}$) and is passed upward as $d_{2,j}$, $d_{1,j}$, ...; row i starts with $s_{i,-1} = 0$ and ends with $s_{i,2} = c_i$]
So the alternative locally recursive graph becomes:
Relations:
$d_{K,j} = b_j$
$d_{i,j} = d_{i+1,j}$
$s_{i,-1} = 0$
$s_{i,j} = s_{i,j-1} + a_{i,j} \cdot d_{i+1,j}$
$c_i = s_{i,N-1}$
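Both localizations can be sketched in one Python function. The sketch assumes that b enters either above row 0 (as d[-1,j]) or below row K-1 (as d[K,j]) and is then handed from PE to PE; the name `matvec_local` and the flag `from_top` are hypothetical.

```python
def matvec_local(a, b, from_top=True):
    """Matrix-vector product in which each PE reads b_j from a neighbour."""
    k, n = len(a), len(b)
    # edge: row index at which b enters; step: offset to the delivering PE
    edge, step = (-1, -1) if from_top else (k, 1)
    d = {(edge, j): b[j] for j in range(n)}
    s = {}
    rows = range(k) if from_top else reversed(range(k))
    for i in rows:
        s[i, -1] = 0
        for j in range(n):
            d[i, j] = d[i + step, j]                       # pass b_j on
            s[i, j] = s[i, j - 1] + a[i][j] * d[i + step, j]
    return [s[i, n - 1] for i in range(k)]

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [1, 0, 1]]   # K = 4, N = 3
b = [1, 0, 2]
print(matvec_local(a, b, from_top=True))    # [7, 16, 25, 3]
print(matvec_local(a, b, from_top=False))   # [7, 16, 25, 3]
```

Every PE now depends only on its direct neighbours, at the cost of the extra variable d.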
Consider an N-dimensional dependency graph with processing elements PE at locations (i,j,k, ...).
Base: (1,0,0,..), (0,1,0,..), (0,0,1,...), ... .
If, for any (i,j,k, ...) and for any input x of the PE at (i,j,k, ...) that is delivered by the output x of the PE at (p,q,r,... ), the input x of the PE at (i,j+1,k,...) is delivered by the output x of the PE at (p,q+1,r,... ), then the graph is called shift-invariant in the direction (0,1,0,..).
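The definition can be turned into a small check. In this Python sketch a graph is a dictionary that maps (PE location, input name) to the location of the delivering PE; this representation and the names `add`, `is_shift_invariant`, `regular` and `irregular` are assumptions made for the illustration. Only intermediate edges are checked: pairs whose shifted counterpart is not in the dictionary (the boundary of a finite graph) are skipped.

```python
def add(p, e):
    """Component-wise sum of a location p and a direction e."""
    return tuple(pi + ei for pi, ei in zip(p, e))

def is_shift_invariant(deps, direction):
    """True if shifting any edge by `direction` gives an edge of the graph."""
    for (p, name), q in deps.items():
        shifted = (add(p, direction), name)
        if shifted in deps and deps[shifted] != add(q, direction):
            return False
    return True

# 4 x 3 grid in which input s of the PE at (i, j) comes from (i, j-1).
regular = {((i, j), "s"): (i, j - 1) for i in range(4) for j in range(1, 3)}

# Same graph, but the PE at (2, 2) takes s from the PE above it.
irregular = dict(regular)
irregular[(2, 2), "s"] = (1, 2)

print(is_shift_invariant(regular, (1, 0)))     # True
print(is_shift_invariant(regular, (0, 1)))     # True
print(is_shift_invariant(irregular, (1, 0)))   # False
```

A single deviating edge is enough to lose shift invariance in every direction that maps a regular edge onto it.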
Shift-invariant graphs (Examples)
[Figure: example graphs drawn in the i-j plane, labelled: shift-invariant in direction i; shift-invariant in direction i and j; shift-invariant in direction j; shift-invariant in no direction]
Dependency Graphs
Conclusions:
Associative operations give two alternative DGs.
Transformation from global to local dependencies gives two alternative DGs.
Input, output and intermediate edges will be treated separately.