Embedded Computer Architecture 2 (BOCA)

Bijzondere Onderwerpen Computer Architectuur (Special Topics in Computer Architecture)

Block A: Introduction

The aims of the course
  • Show the relation between the algorithm and the architecture.
  • Derive the architecture from the algorithm.
  • Explain and formalize the design process.
  • Explain the distinction between structure and behavior.
  • Explain some architectures.
The design process

A design description may express:

  • Behavior: expresses the relation between the input and output value streams of the system.
  • Structure: describes how the system is decomposed into subsystems and how these subsystems are connected.
  • Geometry: describes where the different parts are located.

Pure behavioral, structural or geometrical descriptions do not exist in practice.

Abstraction levels

[Figure: abstraction levels in the three views Behavior, Structure and Geometry. The behavioral levels are Application, Algorithm, Basic operator and Boolean logic; the structural and geometrical levels range over Board level, Block level, Processing element, Basic block, Physical level, Layout, Cell and Transistor.]

The Design Process

Verification: implementation i is the specification for implementation i+1.

[Figure: refinement chain Idea → Spec 0 → Spec 1 → ... → Spec N. The step from the idea to Spec 0 can only be verified by simulation; every further refinement step can be verified by simulation and by formal verification.]

For practical reasons a specification must be executable.

Descriptions
  • Predicate logic
  • Algebra (languages such as Z, SDL, VDM)
  • Process algebras: CCS, CSP, LOTOS
  • VHDL, Verilog
  • Silage, ...
Specification overloading

Specification overloading means that the specification gives a possibly unwanted implementation suggestion, i.e. the behavioral specification expresses structure.

In practice, a behavioral specification always contains structure.

Example:

[Figure: two different expressions for the same function of a, b and x computing z.]

Same function, same behavior; different expressions, different structure; each expression suggests a different design.

Architecture

Definition:

"Architecture is the way in which hardware and software is structured; the structure is usually based on grandiose design philosophies. Architecture deals with fundamental elements that affect the way a system operates and thus its capabilities and its limitations."

(The New American Computer Dictionary)

Our focus
  • Array processors.
  • Systolic arrays.
  • Wave-front array processors.
  • Architectures for embedded algorithms such as digital signal processing algorithms.
Array processor

An array processor is a structure in which identical processing elements are arranged regularly.

[Figure: a 1-dimensional and a 2-dimensional array of processing elements (PE).]

Array processor: 3 dimensions

[Figure: a 3-dimensional array of processing elements.]

Systolic array

In a systolic array processor all communication paths contain at least one unit delay (register).

[Figure: a 2-dimensional array of PEs in which every communication path contains a register (delay).]

Delay constraints are local. Therefore the array can be extended without limit and without changing the cells.

Wave-front array

[Figure: a 2-dimensional wave-front array of processing elements.]

Array Processors
  • Can be approached from:
    • Application
    • Algorithm
    • Architecture
    • Technology
  • We will focus on Algorithm → Architecture:
    • Derive the architecture from the algorithm.
Array processors: Application areas
  • Speech processing
  • Image processing (video, medical, ...)
  • Radar
  • Weather
  • Medical signal processing
  • Geology
  • ...

These applications require many simple calculations on a lot of data in a short time. General-purpose processors do not provide sufficient processing power.

Example: video processing
  • 1000 operations per pixel (which is not that much)
  • 1024 x 1024 pixels per frame (high-definition TV)
  • 50 frames per second (100 Hz TV)
        • 1000 x 1024 x 1024 x 50 ≈ 50 G operations per second
        • < 1 Watt available
  • Pentium at 2 GHz: 2 G operations per second
        • > 30 Watt
  • So roughly 25 Pentiums would be required, dissipating about 750 Watt.
Description of the algorithms
  • In practice the algorithms are described (specified) in some programming language.
  • In our (toy) examples we use:
    • programming languages
    • algebraic descriptions
Examples of algorithms we will use:
  • Filters
  • Matrix algebra
  • Transformations like the Fourier transform and the Z transform
  • Sorting
  • ...

Graphs
  • Graphs are applicable for describing
    • behavior
    • structure
  • Dependency graphs consist of:
    • nodes, expressing operations or functions
    • edges, expressing data dependencies or the flow of data
  • So graphs are suitable to describe the design flow from algorithm to architecture.
Design flow example: Sorting

  idea → program (imperative) → single assignment code (functional) → recurrent relations → dependency graph

Sorting: the idea

[Figure: inserting the value 8 into a descending sorted array (12, 10, 9, 8, 5, 3, 2, 1). The new value is compared against the stored values; the smaller values are shifted one position to create an empty place, and the new value is inserted there.]

[Figure: inserting one element by passing it along the sorted array. Each cell j holds a value m[j] and compares it with the incoming value x; if x >= m[j] the two are swapped (y := m[j]; m[j] := x; x := y), otherwise x is passed on unchanged.]

Sorting: inserting one element

Three identical descriptions of the conditional swap:

  if (x >= m[j])
  { y = m[j];
    m[j] = x;
    x = y;
  }

  if (x >= m[j]) swap(m[j], x);

  m[j], x = MaxMin(m[j], x);

Inserting an element x into a sorted array of i elements such that the order is preserved:

  m[i] = -infinite;
  for (j = 0; j < i+1; j++)
  { m[j], x = MaxMin(m[j], x); }
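A minimal runnable C sketch of this insertion step (the helper name max_min and the use of INT_MIN for "-infinite" are assumptions of this sketch, not from the course material; C has no tuple assignment, so the pair is passed by pointer):

  #include <limits.h>
  #include <stdio.h>

  #define MINUS_INF INT_MIN   /* stands for "-infinite" on the slide */

  /* MaxMin cell: keep the maximum in *mj, pass the minimum on in *x */
  static void max_min(int *mj, int *x)
  {
      if (*x >= *mj) { int y = *mj; *mj = *x; *x = y; }
  }

  /* insert x into the sorted (descending) array m[0..i-1]; m[i] must be MINUS_INF */
  static void insert(int m[], int i, int x)
  {
      for (int j = 0; j < i + 1; j++)
          max_min(&m[j], &x);
  }

  int main(void)
  {
      int m[4] = { 9, 5, 2, MINUS_INF };                 /* sorted array of i = 3 elements */
      insert(m, 3, 7);
      for (int j = 0; j < 4; j++) printf("%d ", m[j]);   /* prints: 9 7 5 2 */
      return 0;
  }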

Sorting: The program

Sorting N elements in an array is composed of N times inserting an element into a sorted array such that the order is preserved. An empty array is ordered.

  int in[0:N-1], x[0:N-1], m[0:N-1];

  input:
  for (int i = 0; i < N; i++)
  { x[i] = in[i]; m[i] = -infinite; }

  body:
  for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
    { m[j], x[i] = MaxMin(m[j], x[i]); }
  }

  output:
  for (int j = 0; j < N; j++)
  { out[j] = m[j]; }
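The slide's pseudo-code is not legal C (array bounds like [0:N-1], tuple assignment, "-infinite"), so here is a runnable C sketch under those substitutions (MINUS_INF for "-infinite", a pointer-based max_min for the tuple assignment; the input values are only an example):

  #include <limits.h>
  #include <stdio.h>

  #define N 8
  #define MINUS_INF INT_MIN

  static void max_min(int *mj, int *xi)
  {
      if (*xi >= *mj) { int y = *mj; *mj = *xi; *xi = y; }
  }

  int main(void)
  {
      int in[N] = { 3, 9, 1, 12, 8, 5, 2, 10 };
      int x[N], m[N], out[N];

      for (int i = 0; i < N; i++) { x[i] = in[i]; m[i] = MINUS_INF; }   /* input  */

      for (int i = 0; i < N; i++)                                       /* body   */
          for (int j = 0; j < i + 1; j++)
              max_min(&m[j], &x[i]);

      for (int j = 0; j < N; j++) out[j] = m[j];                        /* output */

      for (int j = 0; j < N; j++) printf("%d ", out[j]);   /* 12 10 9 8 5 3 2 1 */
      return 0;
  }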

Sorting: Towards ‘Single assignment’

  • Single assignment: each scalar variable is assigned only once.
  • Why? The goal is a data dependency graph:
    • nodes expressing operations or functions
    • edges expressing data dependencies or the flow of data
Sorting: Towards ‘Single assignment’

Single assignment: each scalar variable is assigned only once. Why?

  Code:
  x = a + b;
  x = c * d;

[Figure: the two corresponding nodes, one computing x = a + b and one computing x = c * d, both with an output labelled x. How do you connect these?]

Sorting: Towards ‘Single assignment’

  Code:
  x = a + b;
  x = c * d;

This description is already optimized towards an implementation (memory optimization). But fundamentally you produce two different values, e.g. x1 and x2.
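In C terms, a minimal sketch of the single-assignment rewrite (the names x1, x2 and the operand values are just illustrative):

  #include <stdio.h>

  int main(void)
  {
      int a = 1, b = 2, c = 3, d = 4;
      int x1 = a + b;   /* first value, assigned once  */
      int x2 = c * d;   /* second value, assigned once */
      printf("%d %d\n", x1, x2);
      return 0;
  }

Each value now has its own name, so each assignment becomes a separate node with its own output edge in the dependency graph.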

Sorting: Towards ‘Single assignment’

Start with m[j]: m[j] at loop index i depends on the value at loop index i-1.

  for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
    { m[j], x[i] = MaxMin(m[j], x[i]); }
  }

hence,

  for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
    { m[i,j], x[i] = MaxMin(m[i-1,j], x[i]); }
  }

Sorting: Towards ‘Single assignment’

Similarly, x[i] at loop index j depends on the value at loop index j-1.

  for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
    { m[i,j], x[i] = MaxMin(m[i-1,j], x[i]); }
  }

hence,

  for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
    { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
  }

Sorting: The algorithm in ‘single assignment’

  int in[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1];

  input:
  for (int i = 0; i < N; i++)
  { x[i,-1] = in[i]; m[i-1,i] = -infinite; }

  body:
  for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
    { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
  }

  output:
  for (int j = 0; j < N; j++)
  { out[j] = m[N-1,j]; }

All scalar variables are assigned only once: the algorithm satisfies the single-assignment property.
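A runnable C sketch of this single-assignment version (an assumption of this sketch: since C arrays cannot start at index -1, the indices are shifted by one, so X[i][j+1] plays the role of x[i,j] and M[i+1][j] the role of m[i,j]; MINUS_INF again stands for "-infinite"):

  #include <limits.h>
  #include <stdio.h>

  #define N 6
  #define MINUS_INF INT_MIN

  static void max_min(int *m, int *x)
  {
      if (*x >= *m) { int y = *m; *m = *x; *x = y; }
  }

  int main(void)
  {
      int in[N] = { 4, 1, 9, 7, 2, 5 }, out[N];
      int X[N][N + 1];     /* X[i][j+1] represents x[i,j], j = -1 .. N-1 */
      int M[N + 1][N];     /* M[i+1][j] represents m[i,j], i = -1 .. N-1 */

      for (int i = 0; i < N; i++) {                       /* input  */
          X[i][0] = in[i];              /* x[i,-1]  = in[i]          */
          M[i][i] = MINUS_INF;          /* m[i-1,i] = -infinite      */
      }
      for (int i = 0; i < N; i++)                         /* body   */
          for (int j = 0; j < i + 1; j++) {
              M[i + 1][j] = M[i][j];    /* start from m[i-1,j]       */
              X[i][j + 1] = X[i][j];    /* start from x[i,j-1]       */
              max_min(&M[i + 1][j], &X[i][j + 1]);
          }
      for (int j = 0; j < N; j++)                         /* output */
          out[j] = M[N][j];             /* m[N-1,j]                  */

      for (int j = 0; j < N; j++) printf("%d ", out[j]);  /* 9 7 5 4 2 1 */
      return 0;
  }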

Sorting: The algorithm in ‘single assignment’ (dependency graph)

[Figure, built up over several slides: the index spaces of in[0:N-1], x[0:N-1,-1:N-1], m[-1:N-1,0:N-1] and out[0:N-1], with a MaxMin (MM) node placed at every point (i, j) with 0 <= i < N and 0 <= j <= i. Following the body

  for (int i = 0; i < N; i++)
  { for (j = 0; j < i+1; j++)
    { m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1]); }
  }

the m values flow in the i direction and the x values flow in the j direction, which yields a triangular dependency graph.]

Sorting: Recurrent relation

A description in single assignment can be directly translated into a recurrent relation:

  declaration:  in[0:N-1], out[0:N-1], x[0:N-1, -1:N-1], m[-1:N-1, 0:N-1];
  input:        x[i,-1]  = in[i]
                m[i-1,i] = -infinite
  body:         m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])
  output:       out[j]  = m[N-1,j]
  area:         0 <= i < N;  0 <= j < i+1

Notice that the order of these relations is arbitrary.

Sorting: Body in two dimensions

  body:  m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])

The body is executed for all i and j; hence the dependency graph has two dimensions.

[Figure: one MaxMin node at position (i, j) with inputs m[i-1,j] (coming from the i direction) and x[i,j-1] (coming from the j direction) and outputs m[i,j] and x[i,j].]

Sorting: Body implementation

  body:  m[i,j], x[i,j] = MaxMin(m[i-1,j], x[i,j-1])

  if ( m[i-1,j] <= x[i,j-1] )
  { m[i,j] = x[i,j-1];  x[i,j] = m[i-1,j]; }
  else
  { m[i,j] = m[i-1,j];  x[i,j] = x[i,j-1]; }

[Figure: an implementation of the body with a comparator and two multiplexers that select between m[i-1,j] and x[i,j-1] for the outputs m[i,j] and x[i,j].]

Sorting: Implementation N = 4

[Figure: a triangular array of PEs (PE = MaxMin) with rows i = 0..3 and columns j = 0..3; row i contains i+1 PEs. The inputs x[0,-1] .. x[3,-1] enter the rows from the left, the constants m[-1,0], m[0,1], m[1,2] and m[2,3] (all -infinite) enter along the diagonal, and the sorted results m[3,0], m[3,1], m[3,2], m[3,3] leave at the bottom row.]

Something on functions

A tuple (x1, ..., xn) takes one element from each of the sets X1, ..., Xn. The Cartesian product X1 × ... × Xn is the set of all such tuples; the number of tuples in the set is |X1| · ... · |Xn|.

If Q is a set and P is a subset of Q, then the set of all subsets of Q is the power set 2^Q, and the number of subsets of Q is 2^|Q|. Hence, the set of all subsets of X × Y is 2^(X × Y), and the number of subsets of X × Y is 2^(|X|·|Y|).

Y^X is the set of all functions F with domain X and co-domain Y. F is a function in Y^X if and only if each element of the domain X is mapped by F on a single element of the co-domain Y. Hence F can be represented as a set of tuples (x, F(x)), so F is a subset of X × Y and Y^X is a subset of 2^(X × Y); the number of functions in Y^X is |Y|^|X|.

Functions, Arrays, Tuples, Sequences, ....

Arrays, tuples and sequences are all representations of the same set of functions V^D[l,u], in which D[l,u] = {l, l+1, ..., u} is a closed subset of the set of integers Z and V is some value co-domain.

So a function y in V^D[l,u] corresponds to an array y[l:u] with values in V. Hence y_i, y(i) and y[i] are syntactically different notations for the function value in i.

Functions on more than one variable: Currying

A function on two variables can be represented in three different ways:

  F in Z^(X × Y)   applied as F(x, y)
  F in (Z^Y)^X     applied as F(x)(y)
  F in (Z^X)^Y     applied as F(y)(x)

Linear Time-Invariant Systems

[Figure: a system F mapping an input stream to an output stream, both drawn as functions of time.]

x and y are streams. Time is represented by the set of integers Z, so F maps functions on functions.

Obviously, this class of functions also models systems that cannot exist in reality, for example non-causal systems.

Adding functions

x and y are streams modeled by functions on Z. Their sum z = x + y is again a stream, defined pointwise: (x + y)(t) = x(t) + y(t) for every time instant t.

[Figure: two streams added sample by sample.]

Linear functions, linear systems

Definition:

A system F is called linear if

  F(a·x1 + b·x2) = a·F(x1) + b·F(x2)

or, equivalently, in a picture: if F maps x1 to y1 and x2 to y2, then F maps x1 + x2 to y1 + y2.

Time-invariant systems

Definition:

A system F is called time invariant if a shift of the input in time only shifts the output by the same amount: if F maps x1 to y1 and x2(z) = x1(z - d), then F maps x2 to y2 with y2(z) = y1(z - d).

Linear time-invariant systems

Why this class?

Linear: because linear systems can easily be described.

Time-invariant: because electrical components like transistors, resistors, capacitors and inductors satisfy this property.

The convolution algorithm

The behavior of a linear time-invariant system can be fully described by its impulse response h, i.e. the response at the output to a single unit pulse at the input. The response y at the output to an input stream x then follows from

  y = x * h,   i.e.   y(z) = Σ_i x(i)·h(z - i)

We will derive this convolution operation for time-discrete signals.

The convolution algorithm

Let the unit sample sequence δ(i) be defined by

  δ(i)(z) = 1  if z = i
  δ(i)(z) = 0  otherwise

in which z represents time and i represents the location of the unit pulse.

[Figure: a single pulse of height 1 at position z = i.]


The convolution algorithm

  • Step 1: express x using a delta function
The convolution algorithm

Then

  x = Σ_i x(i)·δ(i),   i.e.   x(z) = Σ_i x(i)·δ(i)(z)

in which δ(i) is a function on Z and x(i) is a scalar.

[Figure: a sample sequence x decomposed into scaled and shifted unit pulses.]


The convolution algorithm

  • Step 1: express x using a delta function
  • Step 2: rewrite time-shifted delta function
The convolution algorithm

Shifting the unit pulse δ(0) over i positions gives δ(i):

  δ(i)(z) = δ(0)(z - i)

[Figure: the pulse δ(0)(z) and the shifted pulse δ(i)(z), drawn on the z axis.]

Hence a time-shifted delta function can be written in terms of δ(0).


The convolution algorithm

  • Step 1: express x using a delta function
  • Step 2: rewrite time-shifted delta function
  • Step 3: rewrite impulse response using time invariance property
The convolution algorithm

Consider a linear time-invariant system F.

[Figure: δ(i) → F → h*(i), with δ(i)(z) and h*(i)(z) drawn as functions of z.]

Let h*(i) be the response of this system to the unit sample sequence δ(i). F is time-invariant, so

  h*(i)(z) = h*(0)(z - i)


The convolution algorithm

  • Step 1: express x using a delta function
  • Step 2: rewrite time-shifted delta function
  • Step 3: rewrite impulse response using time invariance property
  • Step 4: rewrite impulse response using linearity property
The convolution algorithm

Example:

[Figure: δ(i) → F → h*(i). The responses to δ(0)(z), -δ(1)(z) and ½·δ(2)(z) are h*(0)(z), -h*(1)(z) and ½·h*(2)(z): scaled and shifted input pulses give correspondingly scaled and shifted impulse responses.]


The convolution algorithm

  • Step 1: express x using a delta function
  • Step 2: rewrite time-shifted delta function
  • Step 3: rewrite impulse response using time invariance property
  • Step 4: rewrite impulse response using linearity property
  • Step 5: rewrite general expression by means of algebraic manipulation using result from step 4.
The convolution algorithm

[Figure: x → F → y]

  y = x * h

in which h is called the impulse response of the system F.

The convolution algorithm

From the preceding we derive:

  y = F(x) = F( Σ_i x(i)·δ(i) )

in which x(i) is a scalar and δ(i) is a function on Z. F is linear and x(i) is a scalar, hence

  y = Σ_i x(i)·F(δ(i)) = Σ_i x(i)·h*(i)

The convolution algorithm

Continuing, and evaluating the streams at time z:

  y(z) = Σ_i x(i)·h*(i)(z)

Recall that h*(i)(z) = h*(0)(z - i); writing h for h*(0):

  y(z) = Σ_i x(i)·h(z - i)

The convolution algorithm

  y(z) = Σ_i x(i)·h(z - i)

This is called the convolution operation, denoted by y = x * h. We will apply this formula several times.

The convolution algorithm

With j = z - i, we obtain:

  y(z) = Σ_j h(j)·x(z - j)

and if the impulse response h is finite (bounded), i.e. h(j) = 0 outside 0 <= j < N, we get

  y(z) = Σ (j = 0 .. N-1) h(j)·x(z - j)
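A minimal runnable C sketch of this finite convolution (the tap count N_TAPS, the coefficients and the input samples are illustrative assumptions, not values from the course):

  #include <stdio.h>

  #define N_TAPS 4

  /* y(z) = sum over j = 0 .. N_TAPS-1 of h(j) * x(z - j);
   * x must provide the samples x[z - N_TAPS + 1] .. x[z]. */
  static double convolve_at(const double h[N_TAPS], const double x[], int z)
  {
      double y = 0.0;
      for (int j = 0; j < N_TAPS; j++)
          y += h[j] * x[z - j];
      return y;
  }

  int main(void)
  {
      double h[N_TAPS] = { 0.25, 0.25, 0.25, 0.25 };   /* moving-average filter */
      double x[8]      = { 0, 0, 0, 4, 4, 4, 4, 4 };   /* input stream          */

      for (int z = N_TAPS - 1; z < 8; z++)             /* only where x(z-j) exists */
          printf("y(%d) = %.2f\n", z, convolve_at(h, x, z));
      return 0;
  }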

Dependency Graphs and Signal Flow Graphs

The description of the array processor gives:
  • the way in which the processors are arranged, and
  • the way in which the data is communicated between the processing elements.

[Figure: a 2-dimensional array of PEs with its communication edges.]

Hence, the graph describes the dependencies of the data that is communicated, or said differently: the graph describes the way in which the data values at the outputs of a processing element depend on the data at the outputs of the other processing elements.

So we may consider it as a Dependency Graph or a Signal Flow Graph.

Dependency Graphs and Signal Flow Graphs

Dependency Graph: all communicated values are scalars and the processing elements are functions on scalars, PE in (V^N -> V^N), where V is the value domain and the number of inputs = the number of outputs = N. Each arrow carries only one value; time does not play a role.

Signal Flow Graph: the communicated values are streams, i.e. functions on time, and the processing elements are functions on streams, PE in ((V^Z)^N -> (V^Z)^N), where Z represents time.

[Figure: the same array of PEs read as a dependency graph (scalar edges) or as a signal flow graph (stream edges).]

Recurrent relations

For simple algorithms the transformation from single assignment code to a recurrent relation is simple.

Questions to answer:
  • How do recurrent relations influence the dependency graph?
  • How can recurrent relations be manipulated such that the behavior remains the same while the structure of the dependency graph changes?

We will answer these questions by means of an example: Matrix-Vector multiplication.

Matrix Vector multiplication

  c[i] = Σ (j = 0 .. N-1) a[i,j]·b[j],   0 <= i < K

Recurrent relations:

  s[i,-1] = 0
  s[i,j]  = s[i,j-1] + a[i,j]·b[j]
  c[i]    = s[i,N-1]

Alternative (because + is associative):

  s[i,N] = 0
  s[i,j] = s[i,j+1] + a[i,j]·b[j]
  c[i]   = s[i,0]

Matrix Vector multiplication

The basic cell is described by:

  s[i,j] = s[i,j-1] + a[i,j]·b[j]   (b[j] is passed on unchanged)

We have two indices i and j, so the dependency graph can be described as a two-dimensional array.

[Figure: one PE with inputs b[j] and s[i,j-1], a local coefficient a[i,j], a multiplier and an adder, and outputs b[j] and s[i,j].]
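A runnable C sketch of the matrix-vector multiplication written as these recurrences (the matrix and vector values are only an example; the two loops accumulate from j = 0 upwards and from j = N-1 downwards, corresponding to the two recurrent relations):

  #include <stdio.h>

  #define K 4   /* number of rows    */
  #define N 3   /* number of columns */

  int main(void)
  {
      int a[K][N] = { {1,2,3}, {4,5,6}, {7,8,9}, {1,0,1} };
      int b[N]    = { 1, 1, 2 };
      int c1[K], c2[K];

      for (int i = 0; i < K; i++) {
          int s = 0;                          /* s[i,-1] = 0                 */
          for (int j = 0; j < N; j++)
              s = s + a[i][j] * b[j];         /* s[i,j] = s[i,j-1] + a*b     */
          c1[i] = s;                          /* c[i] = s[i,N-1]             */
      }
      for (int i = 0; i < K; i++) {
          int s = 0;                          /* s[i,N] = 0                  */
          for (int j = N - 1; j >= 0; j--)
              s = s + a[i][j] * b[j];         /* s[i,j] = s[i,j+1] + a*b     */
          c2[i] = s;                          /* c[i] = s[i,0]               */
      }
      for (int i = 0; i < K; i++)
          printf("c1[%d] = %d, c2[%d] = %d\n", i, c1[i], i, c2[i]);   /* equal */
      return 0;
  }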

DG-1 of the Matrix Vector multiplication (K = 4, N = 3)

[Figure: a 4 x 3 array of PEs. Row i receives s[i,-1] = 0 from the left and produces c[i] = s[i,2] at the right; b0, b1 and b2 are broadcast to all PEs in their column.]

b0, b1 and b2 are global dependencies. Therefore this graph is called a globally recursive graph.

DG-2 of the Matrix Vector multiplication (K = 4, N = 3)

[Figure: the same 4 x 3 array of PEs, but built from the alternative recurrence: row i receives s[i,3] = 0 from the right and produces c[i] = s[i,0] at the left; b0, b1 and b2 are still broadcast to their columns.]

Variable naming and index assignment

A variable associated with an arrow gets the indices of the processing element that delivers its value.

[Figure: PE(i,j) at position (i, j) with incoming edges c[i-1,j], b[i-1,j-1] and a[i,j-1], outgoing edges c[i,j], b[i,j] and a[i,j], and a local constant v[i,j].]

Local constants get the indices of the processing element that they are in.

Recurrent relations: Conclusion

The recurrence that accumulates c[i] = Σ (j = 0 .. N-1) a[i,j]·b[j] from j = 0 upwards results in DG-1; the alternative recurrence that accumulates from j = N-1 downwards results in DG-2.

Because the operation + is associative, the two recurrent relations have the same behavior but give two different dependency graphs. Other associative operations are, for example, 'AND' and 'OR'.

Changing global data dependencies into local data dependencies

  c[i] = Σ (j = 0 .. N-1) a[i,j]·b[j]

Global data dependencies resist manipulation of the dependency graph.

[Figure: left, the cell with a global data dependency, in which b[j] is broadcast to every row that produces c[i]; right, the cell with only local data dependencies, in which b[j] arrives as d[i-1,j], is used, and is passed on to the next row together with s[i,j].]

Changing global data dependencies into local data dependencies

So the matrix-vector multiplication c[i] = Σ (j = 0 .. N-1) a[i,j]·b[j] becomes:

Relations:

  d[-1,j] = b[j]
  d[i,j]  = d[i-1,j]
  s[i,-1] = 0
  s[i,j]  = s[i,j-1] + a[i,j]·d[i-1,j]
  c[i]    = s[i,N-1]

[Figure (K = 4, N = 3): the 4 x 3 PE array in which b[j] enters the top row as d[-1,j] and is passed downwards as d[i,j], while s[i,j] flows to the right and c[i] = s[i,2] leaves at the right. This is a locally recursive graph.]

Alternative transformation from global data dependencies to local data dependencies

  c[i] = Σ (j = 0 .. N-1) a[i,j]·b[j]

[Figure: left, the cell with the global data dependency on b[j]; right, the alternative localized cell in which b[j] is passed from row to row as d[i,j] in the opposite direction, together with s[i,j] and c[i].]

Changing global data dependencies into local data dependencies

So the alternative locally recursive graph for c[i] = Σ (j = 0 .. N-1) a[i,j]·b[j] becomes:

Relations:

  d[K,j]  = b[j]
  d[i,j]  = d[i+1,j]
  s[i,-1] = 0
  s[i,j]  = s[i,j-1] + a[i,j]·d[i+1,j]
  c[i]    = s[i,N-1]

[Figure (K = 4, N = 3): the same 4 x 3 PE array, but now b[j] enters the bottom row as d[4,j] and is passed upwards, while s[i,j] still flows to the right and c[i] = s[i,2] leaves at the right.]

Shift-invariant graph

Consider an N-dimensional dependency graph with processing elements PE at locations (i, j, k, ...) and base vectors (1,0,0,...), (0,1,0,...), (0,0,1,...), ... .

If for any (i, j, k, ...) and for any input x of the PE at (i, j, k, ...) that is delivered by the output x of the PE at (p, q, r, ...), it holds that the input x of the PE at (i, j+1, k, ...) is delivered by the output x of the PE at (p, q+1, r, ...), then the graph is called shift-invariant in the direction (0,1,0,...).

[Figure: two example graphs in the (i, j) plane, one shift-invariant in the direction i only and one shift-invariant in the directions i and j.]
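A small C sketch of this definition for a 2-dimensional graph (an illustration only: the edge list, array sizes and the omission of named ports are assumptions of this sketch; edges whose shifted endpoints fall outside the array are skipped, in line with treating input and output edges separately):

  #include <stdbool.h>
  #include <stdio.h>

  typedef struct { int pi, pj, qi, qj; } Edge;   /* output of PE(pi,pj) feeds PE(qi,qj) */

  static bool has_edge(const Edge *e, int n, Edge probe)
  {
      for (int k = 0; k < n; k++)
          if (e[k].pi == probe.pi && e[k].pj == probe.pj &&
              e[k].qi == probe.qi && e[k].qj == probe.qj)
              return true;
      return false;
  }

  /* shift-invariant in direction (di,dj): every edge, shifted by (di,dj),
   * is again an edge, as long as both shifted PEs still lie in the array */
  static bool shift_invariant(const Edge *e, int n, int di, int dj, int ni, int nj)
  {
      for (int k = 0; k < n; k++) {
          Edge s = { e[k].pi + di, e[k].pj + dj, e[k].qi + di, e[k].qj + dj };
          if (s.pi < 0 || s.pi >= ni || s.pj < 0 || s.pj >= nj) continue;
          if (s.qi < 0 || s.qi >= ni || s.qj < 0 || s.qj >= nj) continue;
          if (!has_edge(e, n, s)) return false;
      }
      return true;
  }

  int main(void)
  {
      /* 2 x 3 array with edges only in the j direction: PE(i,j) -> PE(i,j+1) */
      Edge e[] = { {0,0,0,1}, {0,1,0,2}, {1,0,1,1}, {1,1,1,2} };
      printf("shift-invariant in direction j: %d\n", shift_invariant(e, 4, 0, 1, 2, 3));
      printf("shift-invariant in direction i: %d\n", shift_invariant(e, 4, 1, 0, 2, 3));
      return 0;
  }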

Shift-invariant graphs (Examples)

[Figure: four example graphs in the (i, j) plane: one shift-invariant in no direction, one shift-invariant in the directions i and j, one shift-invariant in the direction j only, and one more that is shift-invariant in no direction.]

Shift-invariant graphs

  • Because the inputs and outputs often negatively influence the shift-invariance property, the inputs and outputs are treated separately.
  • Hence, we always distinguish between:
    • Input edges,
    • Output edges, and
    • Intermediate edges.
Dependency Graphs

Conclusions:

  • Associative operations give two alternative DGs.
  • The transformation from global to local dependencies gives two alternative DGs.
  • Input, output and intermediate edges will be treated separately.