# Chapter 5 Unfolding - PowerPoint PPT Presentation

### Chapter 5 Unfolding

#### Definitions

Unfolding transforms a loop so that several iterations are unrolled into a single iteration.

Also known as (a.k.a.):

- Loop unrolling (in compilers for parallel programs)
- Block processing

Applications:

- Reducing the sampling period to achieve the iteration bound (desired throughput rate) T.
- Parallel (block) processing to execute several iterations concurrently.
- Digit-serial or bit-serial processing.

(C) 1997-2006 by Yu Hen Hu

#### An example

Before unfolding:

    For n = 0 to N-1,
        y(n) = a*y(n-9) + x(n)
    end

Unfolding once (J = 2):

    For k = 0 to N/2-1,
        y(2k)   = a*y(2k-9) + x(2k)
        y(2k+1) = a*y(2k-8) + x(2k+1)
    end

Unfolding twice (J = 3):

    For k = 0 to N/3-1,
        y(3k)   = a*y(3k-9) + x(3k)
        y(3k+1) = a*y(3k-8) + x(3k+1)
        y(3k+2) = a*y(3k-7) + x(3k+2)
    end

Block processing formulation:

J = 3, 9/J = 3 (an integer)

$X(k) = [x(3k)\; x(3k+1)\; x(3k+2)]^T$

$Y(k) = [y(3k)\; y(3k+1)\; y(3k+2)]^T$

$Y(k) = a\,Y(k-3) + X(k)$

J = 2, 9/J = 4.5 (not an integer)

$X(k) = [x(2k)\; x(2k+1)]^T$

$Y(k) = [y(2k)\; y(2k+1)]^T$

Here the recurrence cannot be written with a single block delay. The first component, y(2k) = a*y(2(k-5)+1) + x(2k), uses the second element of Y(k-5), while the second component, y(2k+1) = a*y(2(k-4)) + x(2k+1), uses the first element of Y(k-4).
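The claim that the unfolded loops compute the same sequence as the original can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `original` and `unfolded`, the zero initial conditions (y(n) = 0 for n < 0), and the test input are all assumptions made for illustration.

```python
def original(x, a, w=9):
    """Reference loop: y(n) = a*y(n-w) + x(n), assuming y(n) = 0 for n < 0."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        prev = y[n - w] if n >= w else 0.0
        y[n] = a * prev + x[n]
    return y


def unfolded(x, a, J, w=9):
    """J-unfolded loop: each pass of the outer loop produces J outputs."""
    assert len(x) % J == 0, "this sketch assumes N is a multiple of J"
    y = [0.0] * len(x)
    for k in range(len(x) // J):
        for i in range(J):  # stands in for the J statements of the unrolled body
            n = J * k + i
            prev = y[n - w] if n >= w else 0.0
            y[n] = a * prev + x[n]
    return y


x = [float(n % 7) for n in range(36)]  # 36 is divisible by both 2 and 3
ref = original(x, a=0.5)
same_J2 = unfolded(x, a=0.5, J=2) == ref
same_J3 = unfolded(x, a=0.5, J=3) == ref
```

Both comparisons hold because unfolding only regroups the iterations; the arithmetic performed for each output sample is unchanged.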

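Implementation with J = 3:

[Figure: a serial-to-parallel converter distributes the input samples x(0), x(1), x(2), x(3), x(4), x(5), ... onto three parallel lines; three identical sections, each drawn with an adder (+), a multiplier (X) and a delay (D), process them; a parallel-to-serial converter interleaves the results into y(0), y(1), y(2), y(3), y(4), y(5), ... Timing labels: Ts on the serial streams, 3Ts on the parallel side.]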

#### Unfolding the DFG

Rewrite the algorithm formulation so that every index has the form 2(k - d) + j, with j equal to 0 or 1:

    y(2k)   = a*y(2k-9) + x(2k)      becomes   y(2k)   = a*y(2(k-5)+1) + x(2k)
    y(2k+1) = a*y(2k-8) + x(2k+1)    becomes   y(2k+1) = a*y(2(k-4))   + x(2k+1)

After J-fold unfolding, the clock period is T = J Ts, where Ts is the data sampling period.

[Figure: the original DFG, clocked at T = Ts, and the unfolded DFG, clocked at T = J Ts.]
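The index rewriting above generalizes to any unfolding factor J and delay w: write Jk + i - w as J(k - d) + j with 0 <= j < J. A small sketch follows; the helper name `rewrite_index` is hypothetical and introduced only for illustration.

```python
def rewrite_index(i, J, w):
    """Write J*k + i - w as J*(k - d) + j with 0 <= j < J; return (d, j)."""
    q, j = divmod(i - w, J)  # Python's divmod floors, so j always lies in [0, J)
    return -q, j


# (d, j) for each statement i of the unrolled body, with w = 9 delays
terms_J2 = [rewrite_index(i, J=2, w=9) for i in range(2)]
terms_J3 = [rewrite_index(i, J=3, w=9) for i in range(3)]
```

For J = 2 this gives (d, j) = (5, 1) for y(2k) and (4, 0) for y(2k+1), matching the rewritten equations. For J = 3 every statement gets d = 3 and j = i, which is why the J = 3 case reduces to a single block delay.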

#### Timing Diagram

The timing diagram assumes that the sampling period Ts remains unchanged, so the clock period T increases J-fold.

Since 9/2 is not an integer, the outputs (y(0), y(1)) of one iteration are needed by two different future iterations: y(0) feeds y(9), computed 4T later, and y(1) feeds y(10), computed 5T later.

[Figure: timing diagram. Top: the original schedule with T = Ts, showing y(0) through y(13) in sequence and a dependency spanning 9T. Bottom: the J = 2 schedule with T = 2Ts, showing the even outputs y(0), y(2), ..., y(12) on one row and the odd outputs y(1), y(3), ..., y(13) on the other, with dependencies spanning 4T and 5T.]

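#### General DFG Unfolding Method

Notation: $\lfloor x \rfloor$ is the floor of x, and a % b is the remainder of a divided by b.

Step 1. For each node U in the original DFG, draw J nodes $\{U_i : 0 \le i \le J-1\}$ in the unfolded DFG.

Step 2. For each edge from U to V with w delays in the original DFG, draw J edges from $U_i$ to $V_{(i+w)\%J}$ with $\lfloor (i+w)/J \rfloor$ delays, for $i = 0, 1, \ldots, J-1$.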
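The two steps translate almost directly into code. The sketch below is illustrative only: the edge-list representation, the function name `unfold`, and the two-node model of y(n) = a*y(n-9) + x(n) (an adder "A" and a multiplier "M", with all 9 delays placed on the edge from A to M) are assumptions, not taken from the slides.

```python
def unfold(edges, J):
    """Unfold a DFG given as (U, V, w) edges; nodes become (name, index) pairs."""
    unfolded = []
    for u, v, w in edges:
        for i in range(J):
            # Step 2: edge U_i -> V_{(i+w) % J} carrying floor((i+w)/J) delays
            unfolded.append(((u, i), (v, (i + w) % J), (i + w) // J))
    return unfolded


# Assumed model of y(n) = a*y(n-9) + x(n): adder A feeds multiplier M through 9 delays
dfg = [("A", "M", 9), ("M", "A", 0)]
result = unfold(dfg, J=2)
total_delays = sum(delays for _, _, delays in result)
```

With J = 2 the 9-delay edge splits into A0 to M1 with 4 delays and A1 to M0 with 5 delays, which agrees with the 4T and 5T dependencies of the timing diagram. The total number of delays stays at 9.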

#### Another DFG Unfolding Example

J = 2

[Figure: the original DFG has nodes Q, R, S and T, with edge delays labelled 3D and 2D, and is marked T = 3. The unfolded DFG has nodes Q0, R0, S0, T0 and Q1, R1, S1, T1; it is built up over three slides and in its final form carries edge delays labelled D, 2D, D and D and is marked T = 6.]

Step 1. Duplicate J copies of each node.

Step 2. Add all edges with 0 delay on them.

Step 3. Use the table on the left of the slide to work out the edges with delays.

#### Properties of Unfolding

- Unfolding a loop with w delays J times yields gcd(w, J) loops in the unfolded DFG. Each of these loops contains w/gcd(w, J) delays and J/gcd(w, J) copies of each node that appears in the original loop.
- Unfolding a DFG with iteration bound T results in a J-unfolded DFG with iteration bound JT.
- A path with w (< J) delays in a DFG leads to J-w paths with no delays and w paths with 1 delay each in the J-unfolded DFG.
- Any path in the original DFG containing J or more delays leads to J paths with 1 or more delays each. Therefore, it cannot create a critical path in the J-unfolded DFG.
- Any clock period that can be achieved by retiming a J-unfolded DFG can also be achieved by retiming the original DFG and then J-unfolding it.
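The loop property and the short-path property can be checked by brute force. The sketch below models a loop as a single node with a self-edge of w delays, which is a simplifying assumption, and the names `unfolded_loops`, `predicted_loops` and `path_delays` are hypothetical.

```python
from math import gcd


def unfolded_loops(w, J):
    """Unfold a single-node loop with w delays; return (copies, delays) per loop."""
    seen, loops = set(), []
    for start in range(J):
        if start in seen:
            continue
        i, copies, delays = start, 0, 0
        while True:  # follow U_i -> U_{(i+w) % J} until the loop closes
            seen.add(i)
            copies += 1
            delays += (i + w) // J
            i = (i + w) % J
            if i == start:
                break
        loops.append((copies, delays))
    return loops


def predicted_loops(w, J):
    """gcd(w, J) loops, each with J/gcd copies of the node and w/gcd delays."""
    g = gcd(w, J)
    return [(J // g, w // g)] * g


def path_delays(w, J):
    """Delays on the J unfolded copies of a path that has w delays."""
    return [(i + w) // J for i in range(J)]


loops_9_2 = unfolded_loops(9, 2)   # one loop through both copies, 9 delays
loops_6_4 = unfolded_loops(6, 4)   # two loops, each with 2 copies and 3 delays
short_path = path_delays(2, 5)     # w < J: J-w zero-delay paths, w one-delay paths
```

For w = 9 and J = 2 the single resulting loop keeps all 9 delays while containing two copies of the node, consistent with the iteration bound growing from T to JT.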