PowerPoint Slideshow about 'Loop Restructuring' - becca


Loop Restructuring
  • Loop unswitching
  • Loop peeling
  • Loop fusion
  • Loop alignment for fusion
  • Loop reversal
  • Loop fission
  • Loop alignment
  • Loop index set splitting
  • Loop interchange
  • Scalar expansion
Unswitching

    DO I = 1, N
      DO J = 2, N
        IF (T(I) > 0) THEN
          A(I,J) = A(I,J-1)*T(I) + B(I)
        ELSE
          A(I,J) = 0.0
        ENDIF
      ENDDO
    ENDDO

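  • Loop unswitching moves a loop-invariant conditional out of the loop (here, T(I) does not change inside the J loop)
  • Reduces the frequency of executing branches: the test runs once per I iteration instead of once per (I,J) iteration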
  • But: leads to code expansion

    DO I = 1, N
      IF (T(I) > 0) THEN
        DO J = 2, N
          A(I,J) = A(I,J-1)*T(I) + B(I)
        ENDDO
      ELSE
        DO J = 2, N
          A(I,J) = 0.0
        ENDDO
      ENDIF
    ENDDO
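As a sanity check, the following minimal Python sketch (a stand-in for the slide's Fortran, not part of the original material) runs the original and the unswitched loop nests on the same random data. It assumes 2-D arrays are lists of lists indexed with the slide's 1-based subscripts, with row and column 0 unused; the function names are illustrative.

```python
import random


def original(A, T, B, N):
    for i in range(1, N + 1):
        for j in range(2, N + 1):
            # The test on T(I) is evaluated on every (I, J) iteration
            if T[i] > 0:
                A[i][j] = A[i][j - 1] * T[i] + B[i]
            else:
                A[i][j] = 0.0


def unswitched(A, T, B, N):
    for i in range(1, N + 1):
        # T(I) is invariant in the J loop, so it is tested once per I
        if T[i] > 0:
            for j in range(2, N + 1):
                A[i][j] = A[i][j - 1] * T[i] + B[i]
        else:
            for j in range(2, N + 1):
                A[i][j] = 0.0


N = 6
random.seed(1)
T = [0.0] + [random.uniform(-1, 1) for _ in range(N)]
B = [0.0] + [random.uniform(-1, 1) for _ in range(N)]
A0 = [[random.uniform(-1, 1) for _ in range(N + 1)] for _ in range(N + 1)]
A1 = [row[:] for row in A0]
A2 = [row[:] for row in A0]
original(A1, T, B, N)
unswitched(A2, T, B, N)
print(A1 == A2)  # True
```

Both versions perform the same arithmetic in the same order, so the results match exactly; the unswitched nest evaluates `T[i] > 0` only N times instead of N*(N-1) times.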

Peeling

    J = 0
    K = M
    DO I = 0, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO

  • Loop peeling moves the first (or last) iteration of a loop out of the loop into separate code
  • Enables loop fusion by changing bounds of one loop to match bounds of another
  • But: leads to code expansion

    J = 0
    K = M
    A(K) = B(J) - B(K)
    K = J
    J = J + 1
    DO I = 1, N
      A(K) = B(J) - B(K)
      K = J
      J = J + 1
    ENDDO
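In this example K is a wrap-around variable: it equals M only in the first iteration and afterwards holds the previous value of J. The Python sketch below (lists indexed with the slide's subscripts; function names and test data are illustrative assumptions) checks that the peeled code matches the original and asserts that J == I and K == I - 1 in every iteration of the remaining loop, so K could there be rewritten as I - 1.

```python
def original(A, B, M, N):
    J, K = 0, M
    for I in range(0, N + 1):
        A[K] = B[J] - B[K]
        K = J            # K "wraps around": it holds the previous J
        J = J + 1


def peeled(A, B, M, N):
    J, K = 0, M
    # Peeled first iteration (I = 0): the only one in which K == M
    A[K] = B[J] - B[K]
    K = J
    J = J + 1
    for I in range(1, N + 1):
        # In the remaining loop, J and K are simple functions of I
        assert J == I and K == I - 1
        A[K] = B[J] - B[K]
        K = J
        J = J + 1


M, N = 8, 5
size = max(M, N) + 1
B = [float(x * x) for x in range(size)]
A1 = [0.0] * size
A2 = [0.0] * size
original(A1, B, M, N)
peeled(A2, B, M, N)
print(A1 == A2)  # True
```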

Fusion

    S1  B(1) = T(1)*X(1)
    S2  DO I = 2, N
    S3    B(I) = T(I)*X(I)
    S4  ENDDO
    S5  DO I = 2, N
    S6    A(I) = B(I) - B(I-1)
    S7  ENDDO

  • Combine two consecutive loops with the same induction variable (IV) and loop bounds into one
  • The fused loop must preserve all dependence relations of the original loops
  • Enables more effective scalar optimizations in the fused loop
  • But: may reduce temporal locality

S1 S6S3 S6

S1 B(1) = T(1)*X(1)Sx DO I = 2, NS3 B(I) = T(I)*X(I)S6 A(I) = B(I) - B(I-1)Sy ENDDO

S1 S6S3(=)S6S3(<)S6

Original code has dependencesS1 S6 and S3 S6Fused loop has dependencesS1 S6 and S3(=)S6 and S3(<)S6
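A minimal Python sketch of the same fusion (lists indexed with the slide's subscripts, index 0 unused; the function names and test data are illustrative) confirms that the fused loop produces the same A and B as the two separate loops.

```python
def unfused(T, X, N):
    B = [0.0] * (N + 1)
    A = [0.0] * (N + 1)
    B[1] = T[1] * X[1]                 # S1
    for i in range(2, N + 1):
        B[i] = T[i] * X[i]             # S3
    for i in range(2, N + 1):
        A[i] = B[i] - B[i - 1]         # S6
    return A, B


def fused(T, X, N):
    B = [0.0] * (N + 1)
    A = [0.0] * (N + 1)
    B[1] = T[1] * X[1]                 # S1
    for i in range(2, N + 1):
        B[i] = T[i] * X[i]             # S3 writes B(I) ...
        A[i] = B[i] - B[i - 1]         # S6 reads it in the same iteration (=)
                                       # and B(I-1) from the previous one (<)
    return A, B


N = 10
T = [float(i) for i in range(N + 1)]
X = [0.5 * i for i in range(N + 1)]
print(unfused(T, X, N) == fused(T, X, N))  # True
```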

Example

Original code:

    S1  DO I = 1, N
    S2    A(I) = B(I) + 1
    S3  ENDDO
    S4  DO I = 1, N
    S5    C(I) = A(I)/2
    S6  ENDDO
    S7  DO I = 1, N
    S8    D(I) = 1/C(I+1)
    S9  ENDDO

a)

    S1  DO I = 1, N
    S2    A(I) = B(I) + 1
    S3  ENDDO
    Sx  DO I = 1, N
    S5    C(I) = A(I)/2
    S8    D(I) = 1/C(I+1)
    Sy  ENDDO

b)

    Sx  DO I = 1, N
    S2    A(I) = B(I) + 1
    S5    C(I) = A(I)/2
    Sy  ENDDO
    S7  DO I = 1, N
    S8    D(I) = 1/C(I+1)
    S9  ENDDO

c)

    Sx  DO I = 1, N
    S2    A(I) = B(I) + 1
    S5    C(I) = A(I)/2
    S8    D(I) = 1/C(I+1)
    Sy  ENDDO

Which of the three fused loops is legal?
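A quick empirical check of the question, as a Python sketch under the same list-indexing assumption (the `run` helper and its variant names are hypothetical). One input cannot prove legality, but the result agrees with the dependence argument: S8 reads C(I+1), which the original code computes in an earlier loop, while in (a) and (c) C(I+1) is only computed in the next iteration of the fused loop, so S8 reads a stale value. Fusing S2 and S5 as in (b) keeps the flow dependence on A(I) within one iteration.

```python
def run(variant, B, C0, N):
    A = [0.0] * (N + 2)
    C = C0[:]                          # C(N+1) keeps its initial value in every variant
    D = [0.0] * (N + 2)
    if variant == "original":
        for i in range(1, N + 1):
            A[i] = B[i] + 1            # S2
        for i in range(1, N + 1):
            C[i] = A[i] / 2            # S5
        for i in range(1, N + 1):
            D[i] = 1 / C[i + 1]        # S8
    elif variant == "a":               # S5 and S8 fused
        for i in range(1, N + 1):
            A[i] = B[i] + 1
        for i in range(1, N + 1):
            C[i] = A[i] / 2
            D[i] = 1 / C[i + 1]        # C(I+1) not computed yet: stale value
    elif variant == "b":               # S2 and S5 fused
        for i in range(1, N + 1):
            A[i] = B[i] + 1
            C[i] = A[i] / 2            # A(I) from the same iteration (=)
        for i in range(1, N + 1):
            D[i] = 1 / C[i + 1]
    elif variant == "c":               # all three fused
        for i in range(1, N + 1):
            A[i] = B[i] + 1
            C[i] = A[i] / 2
            D[i] = 1 / C[i + 1]        # same problem as (a)
    return A, C, D


N = 5
B = [float(i) for i in range(N + 2)]
C0 = [100.0] * (N + 2)                 # initial C differs from the computed values
reference = run("original", B, C0, N)
legal = {v: run(v, B, C0, N) == reference for v in ("a", "b", "c")}
print(legal)  # {'a': False, 'b': True, 'c': False}
```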

Alignment for Fusion

    S1  DO I = 1, N
    S2    B(I) = T(I)/C
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1) - B(I-1)
    S6  ENDDO

  • Alignment for fusion changes the iteration bounds of one loop to enable fusion when dependences would otherwise prevent fusion

S2 S5

S1 DO I = 0, N-1S2 B(I+1) = T(I+1)/CS3 ENDDO

S4 DO I = 1, NS5 A(I) = B(I+1) - B(I-1)S6 ENDDO

S2 S5

Sx B(1) = T(1)/CS1 DO I = 1, N-1S2 B(I+1) = T(I+1)/CS5 A(I) = B(I+1) - B(I-1)S6 ENDDOSy A(N) = B(N+1) - B(N-1)

Loop deps:S2(=)S5S2(<)S5

Reversal

    S1  DO I = 1, N
    S2    B(I) = T(I)*X(I)
    S3  ENDDO
    S4  DO I = 1, N
    S5    A(I) = B(I+1)
    S6  ENDDO

  • Reverse the direction of the iteration
  • Only legal for loops that have no carried dependences
  • Enables loop fusion by ensuring dependences are preserved between loop statements

S2 S5

S1 DO I = N, 1, -1S2 B(I) = T(I)*X(I)S3 ENDDOS4 DO I = N, 1, -1S5 A(I) = B(I+1)S6 ENDDO

S2 S5

S1 DO I = N, 1, -1S2 B(I) = T(I)*X(I)S5 A(I) = B(I+1)S6 ENDDO

S2(<)S5

Fission (1)

    S1  DO I = 1, 10
    S2    DO J = 1, 10
    S3      A(I,J) = B(I,J) + C(I,J)
    S4      D(I,J) = A(I,J-1) * 2.0
    S5    ENDDO
    S6  ENDDO

  • Loop fission (or loop distribution) splits a single loop into multiple loops
  • Enables vectorization
  • Enables parallelization of separate loops if the original loop is sequential
  • Loop fission must preserve all dependence relations of the original loop

S3(=,<)S4

S1 DO I = 1, 10S2 DO J = 1, 10S3 A(I,J) = B(I,J) + C(I,J)Sx ENDDO

Sy DO J = 1, 10S4 D(I,J) = A(I,J-1) * 2.0S5 ENDDO

S6 ENDDO

S3(=,<)S4

S1 PARALLEL DO I = 1, 10S3 A(I,1:10)=B(I,1:10)+C(I,1:10)S4 D(I,1:10)=A(I,0:9) * 2.0S6 ENDDO

S3(=,<)S4

Fission (2)

    S1  DO I = 1, 10
    S2    A(I) = A(I) + B(I-1)
    S3    B(I) = C(I-1)*X + Z
    S4    C(I) = 1/B(I)
    S5    D(I) = sqrt(C(I))
    S6  ENDDO

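  • Statements in the same strongly connected component of the dependence graph must stay in one loop; a topological order of the acyclic condensation (the graph of components) gives a legal order for the resulting loops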

S3(<)S2S4(<)S3 S3(=)S4S4(=)S5

S2

S1 DO I = 1, 10S3 B(I) = C(I-1)*X + ZS4 C(I) = 1/B(I)Sx ENDDO

Sy DO I = 1, 10S2 A(I) = A(I) + B(I-1)Sz ENDDO

Su DO I = 1, 10S5 D(I) = sqrt(C(I))Sv ENDDO

1

S3 S4

S3

1

0

S2

S5

S4

0

Acyclic condensation

S5

Dependence graph
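The ordering step can be sketched in Python: find the strongly connected components with Tarjan's algorithm, build the acyclic condensation, and sort it topologically with the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). The edges are transcribed from the dependences above, running from the source to the sink of each dependence; everything else, including the function name, is illustrative.

```python
from graphlib import TopologicalSorter

# Dependence graph transcribed from the slide: source -> list of sinks
edges = {
    "S2": [],
    "S3": ["S2", "S4"],   # S3 δ(<) S2, S3 δ(=) S4
    "S4": ["S3", "S5"],   # S4 δ(<) S3, S4 δ(=) S5
    "S5": [],
}


def strongly_connected_components(graph):
    """Tarjan's algorithm; returns the components as frozensets."""
    index, low = {}, {}
    stack, on_stack, components = [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    for v in graph:
        if v not in index:
            visit(v)
    return components


sccs = strongly_connected_components(edges)
scc_of = {v: c for c in sccs for v in c}

# Acyclic condensation: map each component to the components it depends on
preds = {c: set() for c in sccs}
for src, sinks in edges.items():
    for dst in sinks:
        if scc_of[src] != scc_of[dst]:
            preds[scc_of[dst]].add(scc_of[src])

# Each component becomes one loop; a topological order is a legal loop order
loop_order = list(TopologicalSorter(preds).static_order())
print([sorted(c) for c in loop_order])
```

The first loop must contain S3 and S4 together, as in the fissioned code; S2 and S5 depend only on that component and not on each other, so either order of their two loops is legal.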

Alignment

    S1  DO I = 2, N
    S2    A(I) = B(I) + C(I)
    S3    D(I) = A(I-1) * 2.0
    S4  ENDDO

  • Align statements in a loop body by expanding the iteration set
  • Attempts to transform loop-carried dependences into loop-independent dependences
  • Enables loop parallelization

S2(<)S3

S1 DO i = 1, NS2 IF (i>1) A(i) = B(i) + C(i)S3 IF (i<N) D(i+1) = A(i) * 2.0S4 ENDDO

S2(=)S3

S1

Before

S2

S1

After

S2
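A Python sketch of the alignment (illustrative names and data; lists indexed with the slide's subscripts). It also runs the aligned loop in reverse iteration order: after alignment no iteration reads a value written by another iteration, so the order does not change the result, which is what makes the loop parallelizable.

```python
def original(B, C, A0, N):
    A, D = A0[:], [0.0] * (N + 1)
    for i in range(2, N + 1):
        A[i] = B[i] + C[i]          # S2
        D[i] = A[i - 1] * 2.0       # S3: A(I-1) from the previous iteration (<)
    return A, D


def aligned(B, C, A0, N, order=None):
    if order is None:
        order = range(1, N + 1)
    A, D = A0[:], [0.0] * (N + 1)
    for i in order:
        if i > 1:
            A[i] = B[i] + C[i]      # S2
        if i < N:
            D[i + 1] = A[i] * 2.0   # S3: A(i) from this same iteration (=)
    return A, D


N = 7
B = [float(k) for k in range(N + 1)]
C = [float(2 * k) for k in range(N + 1)]
A0 = [-1.0] * (N + 1)               # A(1) is read but never written
print(original(B, C, A0, N) == aligned(B, C, A0, N))                           # True
print(original(B, C, A0, N) == aligned(B, C, A0, N, order=range(N, 0, -1)))    # True
```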

Index Set Splitting

    S1  DO I = 1, 100
    S2    A(I) = B(I) + C(I)
    S3    IF (I > 10) THEN
    S4      D(I) = A(I) + A(I-10)
    S5    ENDIF
    S6  ENDDO

  • Divide the index set into two portions
  • Removes conditionals to enable other transformations
  • General case handles affine conditions in multi-dimensional loops by detecting a hyperplane through the iteration space polytope
  • But: code expansion

    S1  DO I = 1, 10
    S2    A(I) = B(I) + C(I)
    Sx  ENDDO
    Sy  DO I = 11, 100
    S2    A(I) = B(I) + C(I)
    S4    D(I) = A(I) + A(I-10)
    Su  ENDDO

[Figure: two-dimensional iteration space with axes I and J, split by the condition 3*J > I into Loop1 and Loop2]
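A Python sketch comparing the two versions (illustrative names and data; lists indexed with the slide's subscripts). After splitting, neither loop contains a conditional, yet both compute the same A and D.

```python
def original(B, C, N):
    A, D = [0.0] * (N + 1), [0.0] * (N + 1)
    for i in range(1, N + 1):
        A[i] = B[i] + C[i]              # S2
        if i > 10:                      # S3: tested on every iteration
            D[i] = A[i] + A[i - 10]     # S4
    return A, D


def split(B, C, N):
    A, D = [0.0] * (N + 1), [0.0] * (N + 1)
    for i in range(1, 11):              # I = 1, 10: the condition is always false
        A[i] = B[i] + C[i]
    for i in range(11, N + 1):          # I = 11, 100: the condition is always true
        A[i] = B[i] + C[i]
        D[i] = A[i] + A[i - 10]
    return A, D


N = 100
B = [float(k % 7) for k in range(N + 1)]
C = [float(k % 5) for k in range(N + 1)]
print(original(B, C, N) == split(B, C, N))  # True
```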

Loop Interchange (1)

    S1  DO I = 1, N
    S2    DO J = 1, M
    S3      A(I,J) = A(I,J-1) + B(I,J)
    S4    ENDDO
    S5  ENDDO

  • Changes the nesting order of nested loops
  • Loop interchange must preserve all dependence relations of the original loop
  • Enables vectorization of an outer loop by moving it to the innermost position
  • Can be used to improve spatial locality

S3(=,<)S3

S2 DO J = 1, MS1 DO I = 1, NS3 A(I,J) = A(I,J-1) + B(I,J)S4 ENDDOS5 ENDDO

S3(<,=)S3

S2 DO J = 1, MS3 A(1:N,J)=A(1:N,J-1)+B(1:N,J)S5 ENDDO

S3(<,=)S3

Loop Interchange (2)

    S1  DO I = 1, N
    S2    DO J = 1, M
    S3      DO K = 1, L
    S4        A(I+1,J+1,K) = A(I,J,K) + A(I,J+1,K+1)
    S5      ENDDO
    S6    ENDDO
    S7  ENDDO

  • Compute the direction matrix and find which columns can be permuted without violating dependence relations in original loop nest

S4(<,<,=)S4S4(<,=,>)S4

< < =< = >

< < =< = >

< = <= > <

Invalid

Direction matrix

< < =< = >

< < == < >

Valid
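The legality test for a permutation can be sketched in Python: permute the columns of the direction matrix and require every row to stay lexicographically positive, that is, its leftmost non-"=" entry must be "<". This sketch assumes the matrix contains only "<", "=", and ">" entries (no "*" or other unknown directions); the names are illustrative.

```python
from itertools import permutations

# Rows are dependences; columns are the loops I, J, K (from the slide)
loops = ("I", "J", "K")
direction_matrix = [
    ("<", "<", "="),   # S4 δ(<,<,=) S4
    ("<", "=", ">"),   # S4 δ(<,=,>) S4
]


def lexicographically_positive(row):
    """True if the leftmost non-'=' entry is '<' (or every entry is '=')."""
    for d in row:
        if d != "=":
            return d == "<"
    return True


def legal_orders(matrix):
    result = []
    for perm in permutations(range(len(loops))):
        permuted = [tuple(row[p] for p in perm) for row in matrix]
        if all(lexicographically_positive(r) for r in permuted):
            result.append("".join(loops[p] for p in perm))
    return result


print(legal_orders(direction_matrix))  # ['IJK', 'IKJ', 'JIK']
```

Besides the original order, the check accepts the J, I, K order from the slide and also I, K, J, and it rejects every order that starts with K as well as the J, K, I order.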

Scalar Expansion

    S1  DO I = 1, N
    S2    T = A(I) + B(I)
    S3    C(I) = T + 1/T
    S4  ENDDO

  • Breaks anti-dependence relations by expanding or promoting a scalar into an array
  • Scalar anti-dependence relations prevent certain loop transformations such as loop fission and loop interchange

S2(=)S3S2-1(<)S3

Sx IF N > 0 THENSyALLOC Tx(1:N)S1 DO I = 1, NS2Tx(I) = A(I) + B(I)Sx C(I) = Tx(I) + 1/Tx(I)S4 ENDDOSz T = Tx(N)Su ENDIF

S2(=)S3

Example

    S1  DO I = 1, 10
    S2    T = A(I,1)
    S3    DO J = 2, 10
    S4      T = T + A(I,J)
    S5    ENDDO
    S6    B(I) = T
    S7  ENDDO

Dependences: S2 δ(=) S4, S4 δ(=,<) S4, S4 δ(=) S6, S2 δ⁻¹(<) S6

After scalar expansion of T:

    S1  DO I = 1, 10
    S2    Tx(I) = A(I,1)
    S3    DO J = 2, 10
    S4      Tx(I) = Tx(I) + A(I,J)
    S5    ENDDO
    S6    B(I) = Tx(I)
    S7  ENDDO

Dependences: S2 δ(=) S4, S4 δ(=,<) S4, S4 δ(=) S6

After loop fission:

    S1  DO I = 1, 10
    S2    Tx(I) = A(I,1)
    Sx  ENDDO
    S1  DO I = 1, 10
    S3    DO J = 2, 10
    S4      Tx(I) = Tx(I) + A(I,J)
    S5    ENDDO
    Sy  ENDDO
    Sz  DO I = 1, 10
    S6    B(I) = Tx(I)
    S7  ENDDO

Dependences: S2 δ S4, S4 δ(=,<) S4, S4 δ S6

After loop interchange and vectorization:

    S2  Tx(1:10) = A(1:10,1)
    S3  DO J = 2, 10
    S4    Tx(1:10) = Tx(1:10) + A(1:10,J)
    S5  ENDDO
    S6  B(1:10) = Tx(1:10)

Dependences: S2 δ S4, S4 δ(<,=) S4, S4 δ S6

Other Loop Restructuring Transformations
  • Loop skewing: denormalize iteration vectors to change the shape of the iteration space (skew) to allow loop interchange
  • Strip mining: decompose a single loop into two nested loops (where the inner loop computes a strip of the data)
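  • Loop tiling: the iteration space is divided into tiles (blocks), and the loop nest is restructured to execute one tile at a time, typically to improve cache locality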
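Strip mining and tiling can be illustrated by listing the iterations each loop structure visits. The Python sketch below assumes a strip size of 4 and 4 x 2 tiles, values chosen only for illustration, and the function names are hypothetical. Strip mining alone keeps the original iteration order; tiling visits the same iteration points in a different order, so, as with interchange, it is legal only when the dependences allow that reordering.

```python
def strip_mined(n, strip):
    """DO I = 1, N split into a loop over strips and a loop within each strip."""
    order = []
    for start in range(1, n + 1, strip):                       # strip loop
        for i in range(start, min(start + strip - 1, n) + 1):  # element loop
            order.append(i)
    return order


def tiled(n, m, ti, tj):
    """2-D loop nest over I = 1..n, J = 1..m, executed one ti x tj tile at a time."""
    order = []
    for ii in range(1, n + 1, ti):                             # tile loops
        for jj in range(1, m + 1, tj):
            for i in range(ii, min(ii + ti - 1, n) + 1):       # loops within a tile
                for j in range(jj, min(jj + tj - 1, m) + 1):
                    order.append((i, j))
    return order


print(strip_mined(10, 4) == list(range(1, 11)))  # True: same iterations, same order
untiled = [(i, j) for i in range(1, 7) for j in range(1, 6)]
t = tiled(6, 5, 4, 2)
print(sorted(t) == untiled, t == untiled)  # True False
```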