Loop Restructuring

# Loop Restructuring - PowerPoint PPT Presentation

## PowerPoint Slideshow about 'Loop Restructuring' - becca

Presentation Transcript
Loop Restructuring
• Loop unswitching
• Loop peeling
• Loop fusion
• Loop alignment for fusion
• Loop reversal
• Loop fission
• Loop alignment
• Loop index set splitting
• Loop interchange
• Scalar expansion
Unswitching

    DO I = 1, N
      DO J = 2, N
        IF T(I) > 0 THEN
          A(I,J) = A(I,J-1)*T(I)+B(I)
        ELSE
          A(I,J) = 0.0
        ENDIF
      ENDDO
    ENDDO

• Loop unswitching moves loop-invariant conditionals out of the loop (here the test on T(I) does not depend on J)
• Reduces the frequency of executing branches
• But: leads to code expansion

    DO I = 1, N
      IF T(I) > 0 THEN
        DO J = 2, N
          A(I,J) = A(I,J-1)*T(I)+B(I)
        ENDDO
      ELSE
        DO J = 2, N
          A(I,J) = 0.0
        ENDDO
      ENDIF
    ENDDO
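The equivalence of the two versions can be checked by running them side by side. The following is a minimal Python sketch, used here only as a runnable stand-in for the slide notation; the function names and sample data are illustrative assumptions, and index 0 is left unused to mimic 1-based arrays.

```python
def original(a0, t, b, n):
    """Branch on t[i] is re-evaluated in every J iteration."""
    a = [row[:] for row in a0]
    for i in range(1, n + 1):
        for j in range(2, n + 1):
            if t[i] > 0:
                a[i][j] = a[i][j - 1] * t[i] + b[i]
            else:
                a[i][j] = 0.0
    return a


def unswitched(a0, t, b, n):
    """Branch on t[i] is evaluated once per I iteration; the J loop is duplicated."""
    a = [row[:] for row in a0]
    for i in range(1, n + 1):
        if t[i] > 0:
            for j in range(2, n + 1):
                a[i][j] = a[i][j - 1] * t[i] + b[i]
        else:
            for j in range(2, n + 1):
                a[i][j] = 0.0
    return a


# Hypothetical sample data (index 0 unused).
n = 4
t = [0.0, 2.0, -1.0, 0.5, 0.0]
b = [0.0, 1.0, 1.0, 1.0, 1.0]
a0 = [[1.0] * (n + 1) for _ in range(n + 1)]
assert original(a0, t, b, n) == unswitched(a0, t, b, n)
```

The duplicated J loop in `unswitched` is the code expansion the slide warns about.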

Peeling

    J = 0
    K = M
    DO I = 0, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO

• Loop peeling moves the first (or last) iteration of a loop into separate code outside the loop
• Enables loop fusion by changing bounds of one loop to match bounds of another
• But: leads to code expansion

    J = 0
    K = M
    A(K) = B(J) - B(K)
    K = J
    J = J + 1
    DO I = 1, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO
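A small Python sketch of the same peeling step, assuming N >= 0 so that the original loop runs at least once. The dictionary `a` and the sample values are illustrative assumptions. The inner assertion records an observation that follows from executing the code rather than from the slide text: once the first iteration is peeled, K equals I-1 and J equals I at the top of every remaining iteration.

```python
def original(b, m, n):
    a = {}
    j, k = 0, m
    for i in range(0, n + 1):
        a[k] = b[j] - b[k]
        k = j
        j += 1
    return a


def peeled(b, m, n):
    a = {}
    j, k = 0, m
    # Peeled first iteration (I = 0).
    a[k] = b[j] - b[k]
    k = j
    j += 1
    # Remaining iterations I = 1..N.
    for i in range(1, n + 1):
        assert (k, j) == (i - 1, i)  # K and J now track the loop index
        a[k] = b[j] - b[k]
        k = j
        j += 1
    return a


# Hypothetical sample data: b has indices 0..N, and M lies within that range.
b = [10, 20, 30, 40, 50, 60]
assert original(b, 3, 5) == peeled(b, 3, 5)
```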

Fusion

    S1  B(1) = T(1)*X(1)
    S2  DO I = 2, N
    S3    B(I) = T(I)*X(I)
    S4  ENDDO
    S5  DO I = 2, N
    S6    A(I) = B(I) - B(I-1)
    S7  ENDDO

• Combine two consecutive loops with the same induction variable (IV) and loop bounds into one
• Fused loop must preserve all dependence relations of the original loop
• Enables more effective scalar optimizations in fused loop
• But: may reduce temporal locality, since the fused body touches more data per iteration

S1 S6S3 S6

    S1  B(1) = T(1)*X(1)
    Sx  DO I = 2, N
    S3    B(I) = T(I)*X(I)
    S6    A(I) = B(I) - B(I-1)
    Sy  ENDDO

S1 S6S3(=)S6S3(<)S6

Original code has dependencesS1 S6 and S3 S6Fused loop has dependencesS1 S6 and S3(=)S6 and S3(<)S6
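As a quick check that fusion preserves the results here, the following Python sketch runs the separate and fused forms on the same input. The function names and sample values are illustrative assumptions; index 0 is unused to mimic 1-based arrays.

```python
def separate(t, x, n):
    a = [0] * (n + 1)
    b = [0] * (n + 1)
    b[1] = t[1] * x[1]                # S1
    for i in range(2, n + 1):
        b[i] = t[i] * x[i]            # S3
    for i in range(2, n + 1):
        a[i] = b[i] - b[i - 1]        # S6
    return a, b


def fused(t, x, n):
    a = [0] * (n + 1)
    b = [0] * (n + 1)
    b[1] = t[1] * x[1]                # S1
    for i in range(2, n + 1):
        b[i] = t[i] * x[i]            # S3
        a[i] = b[i] - b[i - 1]        # S6 reads b[i] (same iteration) and b[i-1] (earlier)
    return a, b


# Hypothetical sample data.
t = [0, 1, 2, 3, 4]
x = [0, 5, 6, 7, 8]
assert separate(t, x, 4) == fused(t, x, 4)
```

Every value S6 reads has already been written when it executes in the fused loop, which is why both dependence directions, (=) and (<), remain forward.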

Example

Which of the three fused loops is legal?

Original:

    S1  DO I = 1, N
    S2    A(I) = B(I) + 1
    S3  ENDDO
    S4  DO I = 1, N
    S5    C(I) = A(I)/2
    S6  ENDDO
    S7  DO I = 1, N
    S8    D(I) = 1/C(I+1)
    S9  ENDDO

a)

    S1  DO I = 1, N
    S2    A(I) = B(I) + 1
    S3  ENDDO
    Sx  DO I = 1, N
    S5    C(I) = A(I)/2
    S8    D(I) = 1/C(I+1)
    Sy  ENDDO

b)

    Sx  DO I = 1, N
    S2    A(I) = B(I) + 1
    S5    C(I) = A(I)/2
    Sy  ENDDO
    S7  DO I = 1, N
    S8    D(I) = 1/C(I+1)
    S9  ENDDO

c)

    Sx  DO I = 1, N
    S2    A(I) = B(I) + 1
    S5    C(I) = A(I)/2
    S8    D(I) = 1/C(I+1)
    Sy  ENDDO
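One way to explore the question is to execute each variant and compare it with the unfused code. The Python sketch below does that; the helper `run`, the initial value 7.0 for C, and the sample B are illustrative assumptions. A matching result on one input does not prove legality in general, but a mismatch does show that a fusion is illegal. In variants a) and c), S8 reads C(I+1) before S5 has written it, so the flow dependence S5 δ S8 of the original is reversed.

```python
def run(groups, b, n):
    """Run the loops; each group lists the statements fused into one I loop."""
    a = [0.0] * (n + 2)
    c = [7.0] * (n + 2)   # assumed "old" contents of C before the loops
    d = [0.0] * (n + 2)

    def s2(i):
        a[i] = b[i] + 1

    def s5(i):
        c[i] = a[i] / 2

    def s8(i):
        d[i] = 1 / c[i + 1]

    body = {"S2": s2, "S5": s5, "S8": s8}
    for group in groups:
        for i in range(1, n + 1):
            for name in group:
                body[name](i)
    return a, c, d


n = 5
b = [float(i) for i in range(n + 2)]   # hypothetical input
reference = run([["S2"], ["S5"], ["S8"]], b, n)
variants = {
    "a": [["S2"], ["S5", "S8"]],
    "b": [["S2", "S5"], ["S8"]],
    "c": [["S2", "S5", "S8"]],
}
legal = {name: run(groups, b, n) == reference for name, groups in variants.items()}
print(legal)
```

On this input only variant b) reproduces the original results.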

Alignment for Fusion

    S1  DO I = 1, N
    S2    B(I) = T(I)/C
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO

• Alignment for fusion changes iteration bounds of one loop to enable fusion when dependences would otherwise prevent fusion

S2 S5

    S1  DO I = 0, N-1
    S2    B(I+1) = T(I+1)/C
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO

Dependence: S2 δ S5

    Sx  B(1) = T(1)/C
    S1  DO I = 1, N-1
    S2    B(I+1) = T(I+1)/C
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO
    Sy  A(N) = B(N+1) - B(N-1)

Loop dependences: S2 δ(=) S5, S2 δ(<) S5
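The aligned and fused form can be checked against the original two loops with a short Python sketch. It assumes N >= 1, and that B(0) and B(N+1) hold pre-existing values, since neither loop writes them; the names and sample data are illustrative.

```python
def original(t, c, b0, n):
    b = b0[:]
    a = [0.0] * (n + 2)
    for i in range(1, n + 1):
        b[i] = t[i] / c
    for i in range(1, n + 1):
        a[i] = b[i + 1] - b[i - 1]
    return a, b


def aligned_fused(t, c, b0, n):
    b = b0[:]
    a = [0.0] * (n + 2)
    b[1] = t[1] / c                     # Sx: peeled first iteration of the shifted loop
    for i in range(1, n):
        b[i + 1] = t[i + 1] / c         # S2, shifted by one iteration
        a[i] = b[i + 1] - b[i - 1]      # S5 now finds b[i+1] already written
    a[n] = b[n + 1] - b[n - 1]          # Sy: peeled last iteration of the second loop
    return a, b


# Hypothetical sample data; b0[0] and b0[n + 1] act as the old boundary values.
n = 5
t = [float(i) for i in range(n + 2)]
b0 = [9.0] * (n + 2)
assert original(t, 2.0, b0, n) == aligned_fused(t, 2.0, b0, n)
```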

Reversal

    S1  DO I = 1, N
    S2    B(I) = T(I)*X(I)
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1)
    S6  ENDDO

• Reverse the direction of the iteration
• Only legal for loops that have no carried dependences
• Enables loop fusion by ensuring dependences are preserved between loop statements

S2 S5

    S1  DO I = N, 1, -1
    S2    B(I) = T(I)*X(I)
    S3  ENDDO
    S4  DO I = N, 1, -1
    S5    A(I) = B(I+1)
    S6  ENDDO

Dependence: S2 δ S5

    S1  DO I = N, 1, -1
    S2    B(I) = T(I)*X(I)
    S5    A(I) = B(I+1)
    S6  ENDDO

Dependence: S2 δ(<) S5

Fission (1)

    S1  DO I = 1, 10
    S2    DO J = 1, 10
    S3      A(I,J) = B(I,J) + C(I,J)
    S4      D(I,J) = A(I,J-1) * 2.0
    S5    ENDDO
    S6  ENDDO

• Loop fission (or loop distribution) splits a single loop into multiple loops
• Enables vectorization
• Enables parallelization of separate loops if original loop is sequential
• Loop fission must preserve all dependence relations of the original loop

S3(=,<)S4

    S1  DO I = 1, 10
    S2    DO J = 1, 10
    S3      A(I,J) = B(I,J) + C(I,J)
    Sx    ENDDO
    Sy    DO J = 1, 10
    S4      D(I,J) = A(I,J-1) * 2.0
    S5    ENDDO
    S6  ENDDO

Dependence: S3 δ(=,<) S4

    S1  PARALLEL DO I = 1, 10
    S3    A(I,1:10)=B(I,1:10)+C(I,1:10)
    S4    D(I,1:10)=A(I,0:9) * 2.0
    S6  ENDDO

Dependence: S3 δ(=,<) S4

Fission (2)

    S1  DO I = 1, 10
    S2    A(I) = A(I) + B(I-1)
    S3    B(I) = C(I-1)*X + Z
    S4    C(I) = 1/B(I)
    S5    D(I) = sqrt(C(I))
    S6  ENDDO

• Compute the acyclic condensation of the dependence graph to find a legal order of the loops

S3(<)S2S4(<)S3 S3(=)S4S4(=)S5

S2

    S1  DO I = 1, 10
    S3    B(I) = C(I-1)*X + Z
    S4    C(I) = 1/B(I)
    Sx  ENDDO
    Sy  DO I = 1, 10
    S2    A(I) = A(I) + B(I-1)
    Sz  ENDDO
    Su  DO I = 1, 10
    S5    D(I) = sqrt(C(I))
    Sv  ENDDO

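[Figure: dependence graph over S2, S3, S4 and S5 with edges labeled by dependence distance (0 or 1), and its acyclic condensation, in which S3 and S4 form a single node]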
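A minimal Python sketch of the condensation step, using the four dependences of this example as the graph. It is an illustration under stated assumptions, not the algorithm a production compiler would use: components are found by pairwise reachability, which is fine for a handful of statements, and the ordering relies on `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Dependence edges source -> sinks, taken from the example.
deps = {"S2": [], "S3": ["S2", "S4"], "S4": ["S3", "S5"], "S5": []}


def reachable(graph, start):
    """Nodes reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def condensation_order(graph):
    reach = {v: reachable(graph, v) for v in graph}
    # Two statements share a component when each reaches the other (a cycle).
    comp = {v: frozenset({v} | {w for w in reach[v] if v in reach[w]}) for v in graph}
    # TopologicalSorter expects a mapping node -> predecessors.
    preds = {c: set() for c in comp.values()}
    for v, sinks in graph.items():
        for w in sinks:
            if comp[v] != comp[w]:
                preds[comp[w]].add(comp[v])
    return [sorted(c) for c in TopologicalSorter(preds).static_order()]


order = condensation_order(deps)
print(order)
```

Each component becomes one loop. The cycle between S3 and S4 keeps them in the same loop, which must come first; the loops for S2 and S5 may follow in either order.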

Alignment

    S1  DO I = 2, N
    S2    A(I) = B(I) + C(I)
    S3    D(I) = A(I-1) * 2.0
    S4  ENDDO

• Align statements in a loop body by expanding the iteration set
• Attempts to transform loop-carried dependences into loop-independent dependences
• Enables loop parallelization

S2(<)S3

    S1  DO i = 1, N
    S2    IF (i>1) A(i) = B(i) + C(i)
    S3    IF (i<N) D(i+1) = A(i) * 2.0
    S4  ENDDO

Dependence: S2 δ(=) S3

[Figure: statement instances S1 and S2 across iterations, before and after alignment]
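The aligned loop can be compared with the original using a short Python sketch. The function names and sample data are illustrative assumptions; A(1) is assumed to hold a value computed before the loop, since the original loop only reads it.

```python
def original(b, c, a0, n):
    a = a0[:]
    d = [0.0] * (n + 1)
    for i in range(2, n + 1):
        a[i] = b[i] + c[i]
        d[i] = a[i - 1] * 2.0      # reads a value written in the previous iteration
    return a, d


def aligned(b, c, a0, n):
    a = a0[:]
    d = [0.0] * (n + 1)
    for i in range(1, n + 1):      # iteration set expanded by one
        if i > 1:
            a[i] = b[i] + c[i]
        if i < n:
            d[i + 1] = a[i] * 2.0  # reads a value written in the same iteration
    return a, d


# Hypothetical sample data (index 0 unused).
n = 5
b = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
c = [0.0, 10.0, 10.0, 10.0, 10.0, 10.0]
a0 = [0.0, 7.0, 0.0, 0.0, 0.0, 0.0]
assert original(b, c, a0, n) == aligned(b, c, a0, n)
```

In `aligned`, no iteration reads a value produced by another iteration, which is what makes the loop a candidate for parallel execution.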

Index Set Splitting

    S1  DO I = 1, 100
    S2    A(I) = B(I) + C(I)
    S3    IF I > 10 THEN
    S4      D(I) = A(I) + A(I-10)
    S5    ENDIF
    S6  ENDDO

• Divide index set into two portions
• Removes conditionals to enable other transformations
• General case handles affine conditions in multi-dimensional loops by detecting a hyperplane through the iteration space polytope
• But: code expansion

    S1  DO I = 1, 10
    S2    A(I) = B(I) + C(I)
    Sx  ENDDO
    Sy  DO I = 11, 100
    S2    A(I) = B(I) + C(I)
    S4    D(I) = A(I) + A(I-10)
    Su  ENDDO

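[Figure: two-dimensional iteration space with axes I and J, split by the condition 3*J > I into the regions Loop1 and Loop2]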
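For the one-dimensional example, the split can be checked with the Python sketch below. The function names and input values are illustrative assumptions, and the bounds 10 and 100 are taken from the slide.

```python
def original(b, c):
    a = [0] * 101
    d = [0] * 101
    for i in range(1, 101):
        a[i] = b[i] + c[i]
        if i > 10:
            d[i] = a[i] + a[i - 10]
    return a, d


def split(b, c):
    a = [0] * 101
    d = [0] * 101
    for i in range(1, 11):           # index set 1..10: condition is always false
        a[i] = b[i] + c[i]
    for i in range(11, 101):         # index set 11..100: condition is always true
        a[i] = b[i] + c[i]
        d[i] = a[i] + a[i - 10]
    return a, d


# Hypothetical sample data (index 0 unused).
b = list(range(101))
c = [1] * 101
assert original(b, c) == split(b, c)
```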

Loop Interchange (1)

    S1  DO I = 1, N
    S2    DO J = 1, M
    S3      A(I,J) = A(I,J-1) + B(I,J)
    S4    ENDDO
    S5  ENDDO

• Changes the nesting order of nested loops
• Loop interchange must preserve all dependence relations of the original loop
• Enables vectorization of an outer loop
• Can be used to improve spatial locality

S3(=,<)S3

    S2  DO J = 1, M
    S1    DO I = 1, N
    S3      A(I,J) = A(I,J-1) + B(I,J)
    S4    ENDDO
    S5  ENDDO

Dependence: S3 δ(<,=) S3

    S2  DO J = 1, M
    S3    A(1:N,J)=A(1:N,J-1)+B(1:N,J)
    S5  ENDDO

Dependence: S3 δ(<,=) S3

Loop Interchange (2)

    S1  DO I = 1, N
    S2    DO J = 1, M
    S3      DO K = 1, L
    S4        A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
    S5      ENDDO
    S6    ENDDO
    S7  ENDDO

• Compute the direction matrix and find which columns can be permuted without violating dependence relations in original loop nest

S4(<,<,=)S4S4(<,=,>)S4

< < =< = >

< < =< = >

< = <= > <

Invalid

Direction matrix

< < =< = >

< < == < >

Valid
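The column-permutation test can be written in a few lines of Python. This sketch enumerates all six loop orders for the direction matrix of this example and keeps those in which every permuted row still has "<" as its leftmost non-"=" entry; the function name `is_legal` is an illustrative assumption.

```python
from itertools import permutations

# Rows of the direction matrix; columns correspond to loops I, J, K.
DIRECTIONS = [("<", "<", "="), ("<", "=", ">")]


def is_legal(perm, rows=DIRECTIONS):
    """perm[p] gives the original column placed at position p (outermost first)."""
    for row in rows:
        permuted = [row[p] for p in perm]
        leftmost = next((d for d in permuted if d != "="), "=")
        if leftmost == ">":
            return False          # dependence would run backward
    return True


legal = [
    "".join("IJK"[p] for p in perm)
    for perm in permutations(range(3))
    if is_legal(perm)
]
print(legal)
```

For this matrix the loop orders IJK, IKJ and JIK pass the test, while every order that places K outside I fails.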

Scalar Expansion

    S1  DO I = 1, N
    S2    T = A(I) + B(I)
    S3    C(I) = T + 1/T
    S4  ENDDO

• Breaks anti-dependence relations by expanding or promoting a scalar into an array
• Scalar anti-dependence relations prevent certain loop transformations such as loop fission and loop interchange

S2(=)S3S2-1(<)S3

    Sx  IF N > 0 THEN
    Sy    ALLOC Tx(1:N)
    S1    DO I = 1, N
    S2      Tx(I) = A(I) + B(I)
    S3      C(I) = Tx(I) + 1/Tx(I)
    S4    ENDDO
    Sz    T = Tx(N)
    Su  ENDIF

Dependence: S2 δ(=) S3

Example

    S1  DO I = 1, 10
    S2    T = A(I,1)
    S3    DO J = 2, 10
    S4      T = T + A(I,J)
    S5    ENDDO
    S6    B(I) = T
    S7  ENDDO

    S1  DO I = 1, 10
    S2    Tx(I) = A(I,1)
    S3    DO J = 2, 10
    S4      Tx(I) = Tx(I)+A(I,J)
    S5    ENDDO
    S6    B(I) = Tx(I)
    S7  ENDDO

S2(=)S4S4(=,<)S4S4(=)S6S2-1(<)S6

S2(=)S4S4(=,<)S4S4(=)S6

    S1  DO I = 1, 10
    S2    Tx(I) = A(I,1)
    Sx  ENDDO
    S1  DO I = 1, 10
    S3    DO J = 2, 10
    S4      Tx(I) = Tx(I) + A(I,J)
    S5    ENDDO
    Sy  ENDDO
    Sz  DO I = 1, 10
    S6    B(I) = Tx(I)
    S7  ENDDO

    S2  Tx(1:10) = A(1:10,1)
    S3  DO J = 2, 10
    S4    Tx(1:10) = Tx(1:10)+A(1:10,J)
    S5  ENDDO
    S6  B(1:10) = Tx(1:10)

Dependences after distribution: S2 δ S4, S4 δ(=,<) S4, S4 δ S6

Dependences after interchange: S2 δ S4, S4 δ(<,=) S4, S4 δ S6

Other Loop Restructuring Transformations
• Loop skewing: denormalize iteration vectors to change the shape of the iteration space (skew) to allow loop interchange
• Strip mining: decompose a single loop into two nested loops (where the inner loop computes a strip of the data)
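• Loop tiling: the iteration space is divided into tiles (blocks), typically by strip mining followed by loop interchange, so that each tile's data can be reused while it is still in cache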
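Strip mining is the simplest of these to show in code. The Python sketch below splits one loop into an outer loop over strips and an inner loop over the elements of one strip; the strip size of 4, the function names and the doubling operation are illustrative assumptions.

```python
def plain(x):
    y = [0] * len(x)
    for i in range(len(x)):
        y[i] = 2 * x[i]
    return y


def strip_mined(x, strip=4):
    n = len(x)
    y = [0] * n
    for start in range(0, n, strip):                     # outer loop over strips
        for i in range(start, min(start + strip, n)):    # inner loop over one strip
            y[i] = 2 * x[i]
    return y


# Hypothetical sample data; the length is deliberately not a multiple of the strip size.
x = list(range(10))
assert plain(x) == strip_mined(x)
```

The `min` in the inner bound handles the last, possibly shorter, strip.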