# (BOCA) - PowerPoint PPT Presentation




Embedded Computer Architecture 2

### (BOCA)

Bijzondere Onderwerpen Computer Architectuur (Special Topics in Computer Architecture), Block A: Introduction

The aims of the course
• Show the relation between the algorithm and the architecture.
• Derive the architecture from the algorithm.
• Explain and formalize the design process.
• Explain the distinction between structure and behavior.
• Explain some architectures.

The design process

A design description may express:

• Behavior: expresses the relation between the input and the output value-streams of the system.
• Structure: describes how the system is decomposed into subsystems and how these subsystems are connected.
• Geometry: describes where the different parts are located.

Pure behavioral, structural or geometrical descriptions do not exist in practice.

Abstraction levels

Each of the three description domains has its own abstraction levels:

• Behavior: application, algorithm, basic operator, Boolean logic.
• Structure: processing element, block level, basic block, transistor.
• Geometry: physical level, board level, cell, layout.

[Figure: the three domains drawn as axes, with the abstraction levels ordered along each axis.]

The Design Process

Verification: the implementation i is the specification for the implementation i+1.

Idea → Spec 0: verified by simulation only.
Spec 0 → Spec 1: by simulation and formal verification.
... → Spec N: by simulation and formal verification.

For practical reasons a specification must be executable.

Descriptions
• Predicate logic
• Algebra (the language Z, SDL, VDM)
• Process algebras: CCS, CSP, LOTOS
• VHDL, Verilog
• Silage, ...

Specification overloading means that the specification gives a possibly unwanted implementation suggestion, i.e. the behavioral specification expresses structure.

In practice: a behavioral specification always contains structure.

Example:

The same function (the same behavior), written as two different expressions, has different structure, and the two expressions therefore suggest two different designs.

[Figure: two expression trees over the inputs a, b and x, built from '+' and '×' nodes, computing the same output z.]

Architecture

Definition:

Architecture is the way in which hardware and software are structured; the structure is usually based on grandiose design philosophies.

Architecture deals with fundamental elements that affect the way a system operates, and thus its capabilities and its limitations.

(The New American Computer Dictionary)

Our focus
• Array processors.
• Systolic arrays.
• Wave-front array processors.
• Architectures for embedded algorithms, such as digital signal processing algorithms.

Array processor

An array processor is a structure in which identical processing elements (PEs) are arranged regularly, in one or two dimensions.

[Figure: a 1-dimensional row of PEs and a 2-dimensional grid of PEs.]

Array processor: 3 dimensions

[Figure: a 3-dimensional arrangement of PEs.]

Systolic array

In a systolic array processor all communication paths contain at least one unit delay (register).

[Figure: a grid of PEs in which every connection passes through a register or delay element.]

Delay constraints are local. Therefore the array can be extended without limit, without changing the cells.

Wave-front array

[Figure: a 2-dimensional grid of PEs forming a wave-front array.]

Array Processors
• Can be approached from:
  • Application
  • Algorithm
  • Architecture
  • Technology
• We will focus on:
  • Algorithm → Architecture
  • Derive the architecture from the algorithm
Array processors: Application areas
• Speech processing
• Image processing (video, medical, ...)
• Weather
• Medical signal processing
• Geology
• ...

Many simple calculations, on a lot of data, in a short time: general-purpose processors do not provide sufficient processing power.

Example: video processing
• 1000 operations per pixel (is not that much)
• 1024 × 1024 pixels per frame (high-density TV)
• 50 frames per second (100 Hz TV)
• → 50 G operations per second, with < 1 Watt available
• Pentium at 2 GHz: 2 G operations per second, at > 30 Watt
• → requires 25 Pentiums, ≈ 750 Watt
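The arithmetic above is easy to check; a small sketch (the 50 G figure on the slide rounds the exact product):

```python
# Back-of-the-envelope check of the video-processing workload above.
ops_per_pixel = 1000             # operations per pixel
pixels_per_frame = 1024 * 1024   # high-density TV frame
frames_per_second = 50           # 100 Hz TV

ops_per_second = ops_per_pixel * pixels_per_frame * frames_per_second
print(ops_per_second)            # 52_428_800_000, i.e. ~50 G operations/s

pentium_ops = 2_000_000_000      # Pentium at 2 GHz: ~2 G operations/s
print(ops_per_second / pentium_ops)  # ~26 (the slide rounds 50 G / 2 G to 25)
```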
Description of the algorithms
• In practice the algorithms are described (specified) in some programming language.
• In our (toy) examples we use:
  • programming languages
  • algebraic descriptions
Examples of algorithms we will use:
• Filters
• Matrix algebra
• Transformations like the Fourier transform and the Z transform
• Sorting
• ...

Graphs
• Graphs are applicable for describing:
  • behavior
  • structure
• Dependency graphs consist of:
  • nodes, expressing operations or functions
  • edges, expressing data dependencies or the flow of data
• So, graphs are suitable to describe the design flow from algorithm to architecture.
Design flow example: Sorting

idea → program (imperative) → single assignment code (functional) → recurrent relations → dependency graph

Sorting: the idea

[Figure: a new element is compared ('>') against a sorted sequence; the smaller elements are shifted one position to create the empty place where the new element is inserted.]

Sorting: inserting one element

[Figure: the values m[j-1], m[j], m[j+1] and x; the assignments y := m[j], m[j] := x, x := y exchange m[j] and x.]

Identical descriptions of swapping:

if (x >= m[j])
  { y = m[j];
    m[j] = x;
    x = y;
  }

if (x >= m[j]) swap(m[j], x);

m[j], x = MaxMin(m[j], x);

Inserting an element into a sorted array of i elements such that the order is preserved:

m[i] = -infinite;
for (j = 0; j < i+1; j++)
  { m[j], x = MaxMin(m[j], x); }

Sorting: The program

Sorting N elements in an array is composed from N times inserting an element into a sorted array such that the order is preserved. An empty array is ordered.

int in[0:N-1], x[0:N-1], m[0:N-1];

input:

for (int i = 0; i < N; i++)
  { x[i] = in[i]; m[i] = -infinite; }

body:

for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
      { m[j], x[i] = MaxMin(m[j], x[i]); }
  }

output:

for (int j = 0; j < N; j++)
  { out[j] = m[j]; }
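The program above runs as written once MaxMin is defined; a minimal executable sketch (Python stands in for the C-like pseudocode, `max_min` and `sort` are our names, and `-infinite` becomes `float('-inf')`):

```python
def max_min(a, b):
    """MaxMin from the slides: returns the pair (max, min) of its two arguments."""
    return (a, b) if a >= b else (b, a)

def sort(inp):
    N = len(inp)
    x = list(inp)                    # input: x[i] = in[i]
    m = [float('-inf')] * N          # m[i] = -infinite; an empty array is ordered

    # body: insert each x[i] into the sorted prefix m[0..i]
    for i in range(N):
        for j in range(i + 1):
            m[j], x[i] = max_min(m[j], x[i])

    return m                         # output: the elements in descending order

print(sort([3, 1, 4, 1, 5]))         # [5, 4, 3, 1, 1]
```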

Sorting: Towards ‘Single assignment’

• Single assignment: each scalar variable is assigned only once.
• Why? The goal is a data dependency graph:
  • nodes expressing operations or functions
  • edges expressing data dependencies or the flow of data

Code, nodes, graph:

x = a + b;
x = c * d;

[Figure: a '+' node with inputs a, b and output x, and a '*' node with inputs c, d and output x.]

How do you connect these?

Sorting: Towards ‘Single assignment’

Assigning x twice, as in

x = a + b;
x = c * d;

is an implementation choice: a memory optimization. But fundamentally you produce two different values, e.g. x1 and x2. Hence,

for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
      { m[i,j], x[i] = MaxMin(m[i-1,j], x[i]); }
  }

Sorting: Towards ‘Single assignment’

m[j] at loop index i depends on the value at loop index i-1:

for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
      { m[j], x[i] = MaxMin(m[j], x[i]); }
  }

hence,

for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
      { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
  }

Sorting: Towards ‘Single assignment’

Similarly, x[i] at loop index j depends on the value at loop index j-1:

for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
      { m[i,j], x[i] = MaxMin(m[i-1,j], x[i]); }
  }

Sorting: The algorithm in ‘single assignment’

int in[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1];

input:

for (int i = 0; i < N; i++)
  { x[i,-1] = in[i]; m[i-1,i] = -infinite; }

body:

for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
      { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
  }

output:

for (int j = 0; j < N; j++)
  { out[j] = m[N-1,j]; }

All scalar variables are assigned only once: the algorithm satisfies the single assignment property.
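The single assignment property can be made explicit by storing every m[i,j] and x[i,j] separately. The sketch below is a hypothetical Python rendering (dictionaries stand in for the negatively indexed arrays); the `assign` helper asserts that no cell is ever written twice:

```python
def sort_sa(inp):
    """Sort in descending order, writing each m[i,j] and x[i,j] exactly once."""
    N = len(inp)
    m, x = {}, {}

    def assign(arr, key, value):
        assert key not in arr, "single-assignment property violated"
        arr[key] = value

    # input: x[i,-1] = in[i]; m[i-1,i] = -infinite
    for i in range(N):
        assign(x, (i, -1), inp[i])
        assign(m, (i - 1, i), float('-inf'))

    # body: m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])
    for i in range(N):
        for j in range(i + 1):
            a, b = m[(i - 1, j)], x[(i, j - 1)]
            assign(m, (i, j), max(a, b))
            assign(x, (i, j), min(a, b))

    # output: out[j] = m[N-1,j]
    return [m[(N - 1, j)] for j in range(N)]

print(sort_sa([3, 1, 4, 1, 5]))
```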

Sorting: The algorithm in ‘single assignment’

[Figure: animation of the body over the (i,j) grid of MaxMin (MM) cells, 0 ≤ i ≤ n-1, 0 ≤ j ≤ i: the inputs in[i] enter as x[i,-1], the boundary values m[i-1,i] = -infinite enter on the diagonal, and the results leave as out[j] = m[n-1,j].]

for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
      { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
  }

Sorting: Recurrent relation

A description in single assignment can be directly translated into a recurrent relation:

declaration: in[0:N-1], out[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1]
input: x[i,-1] = in[i]; m[i-1,i] = -infinite
body: m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])
output: out[j] = m[N-1,j]
area: { 0 <= i < N; 0 <= j < i+1 }

Notice that the order of these relations is arbitrary.

Sorting: Body in two dimensions

body: m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])

The body is executed for all i and j, hence two dimensions.

[Figure: the MaxMin node at (i,j), with input m[i-1,j] arriving in the i direction and input x[i,j-1] arriving in the j direction, and with outputs m[i,j] and x[i,j].]

Sorting: Body implementation

body: m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])

if (m[i-1,j] <= x[i,j-1])
  { m[i,j] = x[i,j-1]; x[i,j] = m[i-1,j]; }
else
  { m[i,j] = m[i-1,j]; x[i,j] = x[i,j-1]; }

Sorting: Implementation N = 4

[Figure: a triangular array of PEs (PE = MaxMin) with rows i = 0..3: row i contains i+1 PEs, the input x[i,-1] enters row i from the left, the boundary values m[-1,0], m[0,1], m[1,2], m[2,3] = -infinite enter on the diagonal, and the results m[3,0], m[3,1], m[3,2], m[3,3] leave at the bottom.]

Something on sets

Tuple: $(x, y)$ with $x \in X$ and $y \in Y$.

Cartesian product: $X \times Y$ is the set of all tuples $(x, y)$. The number of tuples in the set is $|X \times Y| = |X| \cdot |Y|$.

If $Q$ is a set and $P$ is a subset of $Q$, then the set of all subsets of $Q$ is $2^Q$. The number of subsets of $Q$ is $2^{|Q|}$.

Hence, the set of all subsets of $X \times Y$ is $2^{X \times Y}$, and the number of subsets of $X \times Y$ is $2^{|X| \cdot |Y|}$.

Something on functions

$Y^X$ is the set of all functions with domain $X$ and co-domain $Y$. $F$ is a function in $Y^X$ if and only if $F \subseteq X \times Y$ and each element of the domain $X$ is mapped by $F$ on a single element of the co-domain $Y$.

Hence $F \in 2^{X \times Y}$, and $|Y^X| = |Y|^{|X|}$.

$F$ can be represented as a set of tuples: $F = \{(x, F(x)) \mid x \in X\}$.

Functions, Arrays, Tuples, Sequences, ...

Arrays, tuples and sequences are all representations of the same set of functions $V^{D_{l,u}}$, in which $D_{l,u} = \{ i \in \mathbb{Z} \mid l \le i \le u \}$ is a closed subset of the set of integers $\mathbb{Z}$, and $V$ is some value co-domain.

So $y \in V^{D_{l,u}}$ corresponds to the array $y[l:u]$. Hence $y_i$, $y(i)$ and $y[i]$ are syntactically different notations for the function value in $i$.

Functions on more than one variable: Currying

A function on two variables can be represented in three different ways: uncurried as $F \in Z^{X \times Y}$ with $F(x, y) = z$, or curried as $F \in (Z^Y)^X$ with $F(x)(y) = z$, or as $F \in (Z^X)^Y$ with $F(y)(x) = z$.

[Figure: a block F with inputs x and y and output z; x, y and z are streams over time.]

Linear Time Invariant Systems

x and y are streams, i.e. functions on $\mathbb{Z}$, where $\mathbb{Z}$ represents time. So a system $F$ with $y = F(x)$ maps functions on functions.

Obviously, this class of functions also models systems that cannot exist in reality, for example non-causal systems.

[Figure: an input stream x and an output stream y plotted against time.]

Linear functions, linear systems

Definition: a system $F$ is called linear if

$F(a \cdot x_1 + b \cdot x_2) = a \cdot F(x_1) + b \cdot F(x_2)$

or, equivalently: if $x_1 \mapsto y_1$ and $x_2 \mapsto y_2$, then $x_1 + x_2 \mapsto y_1 + y_2$ (and $a \cdot x_1 \mapsto a \cdot y_1$).

Time invariant systems

Definition: a system $F$ is called time invariant if, whenever $x_1 \mapsto y_1$ and $x_2(z) = x_1(z - k)$, then $y_2(z) = y_1(z - k)$: shifting the input in time shifts the output over the same amount.

Linear time-invariant systems

Why?

Linear: because linear systems can easily be described.

Time-invariant: because electrical components like transistors, resistors, capacitors and inductors satisfy this property.

The convolution algorithm

The behavior of a linear time-invariant system can be fully described by its impulse response h, i.e. the response on the output to a single unit pulse on the input. The response y on the output to an input stream x then follows from

$y(z) = \sum_i x(i) \cdot h(z - i)$, or $y = x * h$.

We will derive this convolution operation for time-discrete signals, in which z represents time and i represents the location of the unit pulse.

The convolution algorithm

Let the unit sample sequence $\delta(i)$ be defined by

$\delta(i)(z) = 1$ if $z = i$, and $\delta(i)(z) = 0$ otherwise.

[Figure: $\delta(i)$ plotted against z: a single pulse of height 1 at z = i.]

The convolution algorithm

• Step 1: express x using a delta function

The convolution algorithm

Then

$x = \sum_i x(i) \cdot \delta(i)$

in which $\delta(i)$ is a function on $\mathbb{Z}$ and $x(i)$ is a scalar.

[Figure: a sample stream x decomposed into scaled unit pulses at positions 1 through 7.]

The convolution algorithm

• Step 1: express x using a delta function
• Step 2: rewrite time-shifted delta function

The convolution algorithm

Shifting $\delta(0)$ over $i$ positions gives $\delta(i)$. Hence

$\delta(i)(z) = \delta(0)(z - i)$

[Figure: $\delta(0)$ and the shifted pulse $\delta(i)$ plotted against z.]

The convolution algorithm

• Step 1: express x using a delta function
• Step 2: rewrite time-shifted delta function
• Step 3: rewrite impulse response using time invariance property

The convolution algorithm

Consider a linear time-invariant system F. Let $h^*(i)$ be the response of this system to the unit sample sequence $\delta(i)$.

F is time-invariant, so

$h^*(i)(z) = h^*(0)(z - i)$

[Figure: the pulse $\delta(i)(z)$ entering F and the response $h^*(i)(z)$ leaving it.]

The convolution algorithm

• Step 1: express x using a delta function
• Step 2: rewrite time-shifted delta function
• Step 3: rewrite impulse response using time invariance property
• Step 4: rewrite impulse response using linearity property

The convolution algorithm

Example: by linearity, scaling or negating the input pulse scales or negates the response.

[Figure: F maps $\delta(0)(z)$ to $h^*(0)(z)$, $-\delta(1)(z)$ to $-h^*(1)(z)$, and $\tfrac{1}{2}\,\delta(2)(z)$ to $\tfrac{1}{2}\,h^*(2)(z)$.]

The convolution algorithm

• Step 1: express x using a delta function
• Step 2: rewrite time-shifted delta function
• Step 3: rewrite impulse response using time invariance property
• Step 4: rewrite impulse response using linearity property
• Step 5: rewrite general expression by means of algebraic manipulation using result from step 4.

The convolution algorithm

$y = F(x)$ with $y(z) = \sum_i x(i) \cdot h(z - i)$, in which h is called the impulse response of the system F.

The convolution algorithm

From the preceding we derive (recall that $x(i)$ is a scalar and $\delta(i)$ is a function on $\mathbb{Z}$):

$y = F(x) = F\!\left(\sum_i x(i) \cdot \delta(i)\right)$

F is linear and $x(i)$ is a scalar, hence

$y = \sum_i x(i) \cdot F(\delta(i)) = \sum_i x(i) \cdot h^*(i)$

The convolution algorithm

Recall $h^*(i)(z) = h^*(0)(z - i)$, and write $h = h^*(0)$. Then

$y(z) = \sum_i x(i) \cdot h(z - i)$

This is called the convolution operation, denoted by $y = x * h$. We will apply this formula several times.

The convolution algorithm

With $j = z - i$, we obtain

$y(z) = \sum_j h(j) \cdot x(z - j)$

and if the impulse response h is finite (bounded), i.e. $h(j) = 0$ for $j < 0$ and for $j \ge K$, we get

$y(z) = \sum_{j=0}^{K-1} h(j) \cdot x(z - j)$
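The finite convolution formula $y(z) = \sum_{j=0}^{K-1} h(j) \cdot x(z-j)$ translates directly into code; a small Python sketch (`convolve` is our name, and the finite input stream x is taken as 0 outside its range):

```python
def convolve(h, x):
    """y(z) = sum_{j=0}^{K-1} h(j) * x(z-j) for a finite impulse response h."""
    K, N = len(h), len(x)
    y = []
    for z in range(N + K - 1):       # all z where the sum can be non-zero
        acc = 0
        for j in range(K):
            if 0 <= z - j < N:       # x is 0 outside its defined range
                acc += h[j] * x[z - j]
        y.append(acc)
    return y

# Convolving a unit pulse with h reproduces the impulse response:
print(convolve([1, 0.5, 0.25], [1, 0, 0]))   # [1, 0.5, 0.25, 0, 0]
```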

Dependency Graphs and Signal Flow Graphs

The array processor description covers the way in which the processors are arranged, and the way in which the data is communicated between the processing elements.

[Figure: a grid of PEs with the communication links between them.]

Hence, the graph describes the dependencies of the data that is communicated, or said differently: the graph describes the way in which the data values at the outputs of a processing element depend on the data at the outputs of the other processing elements.

So we may consider it as a Dependency Graph or a Signal Flow Graph.

Dependency graphs and Signal Flow Graphs

Dependency Graph: all communicated values are scalars and the processing elements are functions on scalars. Each arrow carries only one value. Time does not play a role. V is the value domain; the number of inputs equals the number of outputs, N.

Signal Flow Graph: the communicated values are streams, i.e. functions on time, and the processing elements are functions on streams. $\mathbb{Z}$ represents time.

[Figure: the same grid of PEs read as a dependency graph (scalars) or as a signal flow graph (streams).]

Recurrent relations

For simple algorithms the transformation from single assignment code to a recurrent relation is simple.

• How do recurrent relations influence the dependency graph?
• How can recurrent relations be manipulated such that the behavior remains the same but the structure of the dependency graph changes?

We will answer these questions by means of an example: Matrix-Vector multiplication.

Matrix Vector multiplication

$c_i = \sum_{j=0}^{N-1} a_{i,j} \cdot b_j$, for $0 \le i < K$

Recurrent relations:

$s_{i,-1} = 0$;  $s_{i,j} = s_{i,j-1} + a_{i,j} \cdot b_j$;  $c_i = s_{i,N-1}$

Alternative (because + is associative):

$s_{i,N} = 0$;  $s_{i,j} = s_{i,j+1} + a_{i,j} \cdot b_j$;  $c_i = s_{i,0}$
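Both accumulation orders of $c_i = \sum_j a_{i,j} \cdot b_j$ can be executed directly; a hedged Python sketch (the names `matvec_forward` and `matvec_backward` are ours, `s` is the running sum of the basic cells):

```python
def matvec_forward(a, b):
    """c[i] = sum_j a[i][j]*b[j], accumulating from j = 0 upward."""
    K, N = len(a), len(b)
    c = [0] * K
    for i in range(K):
        s = 0                        # s[i,-1] = 0
        for j in range(N):
            s = s + a[i][j] * b[j]   # s[i,j] = s[i,j-1] + a[i,j]*b[j]
        c[i] = s                     # c[i] = s[i,N-1]
    return c

def matvec_backward(a, b):
    """The same sum accumulated from j = N-1 downward (+ is associative)."""
    K, N = len(a), len(b)
    c = [0] * K
    for i in range(K):
        s = 0                        # s[i,N] = 0
        for j in reversed(range(N)):
            s = s + a[i][j] * b[j]   # s[i,j] = s[i,j+1] + a[i,j]*b[j]
        c[i] = s                     # c[i] = s[i,0]
    return c

a = [[1, 2, 3], [4, 5, 6]]
b = [1, 1, 2]
print(matvec_forward(a, b), matvec_backward(a, b))  # [9, 21] [9, 21]
```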

Matrix Vector multiplication

The basic cell is described by $s_{i,j} = s_{i,j-1} + a_{i,j} \cdot b_j$; the value $b_j$ is passed on unchanged.

We have two indices i and j, so the dependency graph can be described as a two-dimensional array.

DG-1 of the Matrix Vector multiplication (K = 4, N = 3)

[Figure: a K × N grid of PEs; row i accumulates $s_{i,-1} = 0$ through $s_{i,N-1} = c_i$ from left to right, while each $b_j$ is broadcast to the whole column j.]

b0, b1 and b2 are global dependencies. Therefore this graph is called a globally recursive graph.

DG-2 of the Matrix Vector multiplication (K = 4, N = 3)

[Figure: the alternative grid: row i accumulates from right to left, from $s_{i,3} = 0$ down to $c_i = s_{i,0}$, again with each $b_j$ broadcast to the whole column j.]

Variable naming and index assignment

A variable associated to an arrow gets the indices of the processing element that delivers its value. Local constants get the indices of the processing element that they are in.

[Figure: the PE at position (i,j), with incoming arrows labeled $c_{i-1,j}$, $b_{i-1,j-1}$ and $a_{i,j-1}$, outgoing arrows labeled $c_{i,j}$, $b_{i,j}$ and $a_{i,j}$, and local constant $v_{i,j}$.]

Recurrent relations: Conclusion

Accumulating an associative operation in the two possible directions results in two different recurrent relations, and thus in two different dependency graphs. Other associative operations are, for example, ‘AND’ and ‘OR’.

Changing global data dependencies into local data dependencies

Global data dependencies resist manipulating the dependency graph. The global broadcast of $b_j$ can be replaced by passing $b_j$ from cell to cell: each cell receives $d_{i-1,j}$, uses it, and passes it on as $d_{i,j} = d_{i-1,j}$, with $b_j = d_{-1,j}$.

[Figure: the grid of PEs with $b_j = d_{-1,j}$ entering at the top of column j and flowing downward from cell to cell, while the sums $s_{i,-1} = 0$ through $s_{i,2} = c_i$ flow to the right as before.]

Changing global data dependencies into local data dependencies

So the matrix-vector multiplication becomes (K = 4, N = 3):

Relations: $d_{-1,j} = b_j$;  $d_{i,j} = d_{i-1,j}$;  $s_{i,-1} = 0$;  $s_{i,j} = s_{i,j-1} + a_{i,j} \cdot d_{i-1,j}$;  $c_i = s_{i,N-1}$

This is a locally recursive graph.

Alternative transformation from global data dependencies to local data dependencies

Alternatively, the $b_j$ values can enter at the bottom of the array and flow upward: $b_j = d_{K,j}$ and $d_{i,j} = d_{i+1,j}$.

[Figure: the grid of PEs with $b_j = d_{4,j}$ entering at the bottom of column j and flowing upward from cell to cell, while the sums $s_{i,-1} = 0$ through $s_{i,2} = c_i$ flow to the right.]

Changing global data dependencies into local data dependencies

So the alternative locally recursive graph becomes (K = 4, N = 3):

Relations: $d_{K,j} = b_j$;  $d_{i,j} = d_{i+1,j}$;  $s_{i,-1} = 0$;  $s_{i,j} = s_{i,j-1} + a_{i,j} \cdot d_{i+1,j}$;  $c_i = s_{i,N-1}$

Shift-invariant graph

Consider an N-dimensional dependency graph with processing elements PE at locations (i,j,k,...), with base vectors (1,0,0,...), (0,1,0,...), (0,0,1,...), ... .

If for any (i,j,k,...) and for any input x of the PE at (i,j,k,...) that is delivered by the output x of the PE at (p,q,r,...), it holds that the input x of the PE at (i,j+1,k,...) is delivered by the output x of the PE at (p,q+1,r,...), then the graph is called shift-invariant in the direction (0,1,0,...).

[Figure: example graphs that are shift-invariant in direction i, and in directions i and j.]
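The definition of shift-invariance can be phrased as a small check. The sketch below is a hypothetical encoding in Python: a graph is given as a map from (node, input-name) to the node that delivers that input, and a direction is a base vector:

```python
def shift_invariant(edges, direction):
    """Check shift-invariance in `direction`.

    edges: dict mapping (node, input_name) -> source node, where nodes are
           coordinate tuples such as (i, j).
    The graph is shift-invariant if, whenever the shifted sink is present,
    its input is delivered by the correspondingly shifted source.
    """
    def shift(node):
        return tuple(c + d for c, d in zip(node, direction))

    for (node, name), src in edges.items():
        shifted = (shift(node), name)
        if shifted in edges and edges[shifted] != shift(src):
            return False
    return True

# Each cell takes input 'm' from the cell above it: a regular grid.
edges = {((i, j), 'm'): (i - 1, j) for i in range(3) for j in range(3)}
print(shift_invariant(edges, (0, 1)))   # True

# One irregular edge breaks shift-invariance in direction j.
edges2 = dict(edges)
edges2[((1, 1), 'm')] = (0, 0)
print(shift_invariant(edges2, (0, 1)))  # False
```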

Shift-invariant graphs (Examples)

[Figure: further example graphs: shift-invariant in no direction, in direction j only, and in directions i and j.]

Shift-invariant graphs

• Because the inputs and outputs often negatively influence the shift-invariance property, the inputs and outputs are treated separately.
• Hence, we always distinguish between:
  • input edges,
  • output edges and
  • intermediate edges.

Dependency Graphs

Conclusions:

• Associative operations give two alternative DGs.
• Transformation from global to local dependencies gives two alternative DGs.
• Input, output and intermediate edges will be treated separately.