Value Evolution Graph

The Value Evolution Graph and Its Applications to Automatic Parallelization. Silvius Rus, Dongmin Zhang, and Lawrence Rauchwerger.


  1. The Value Evolution Graph and Its Applications to Automatic Parallelization
     Silvius Rus, Dongmin Zhang, and Lawrence Rauchwerger

  2. Motivating Example: Parallelization

     Sample code:

         q = 0
         DO i = 1, 100
           q = q+1
           B(q) = 1
         ENDDO

     Classic solution:
     • Induction variable substitution: q → f(i) = i
     • Dependence test: 1 ≤ i1 ≤ 100, 1 ≤ i2 ≤ 100, i1 ≠ i2, f(i1) = f(i2)

     Automatically parallelized:

         !$OMP PARALLEL DO
         DO i = 1, 100
           B(i) = 1
         ENDDO
         q = 100
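The dependence test on slide 2 asks whether two distinct iterations can write the same element of B. The sketch below checks this by brute-force enumeration over the iteration space. It is an illustration only: a compiler would solve the constraint system symbolically, and the helper name `has_output_dependence` is an assumption, not something from the slides.

```python
def has_output_dependence(f, lo, hi):
    """Return True if two distinct iterations in [lo, hi] write the same index f(i)."""
    seen = set()
    for i in range(lo, hi + 1):
        idx = f(i)
        if idx in seen:
            return True   # found i1 != i2 with f(i1) == f(i2)
        seen.add(idx)
    return False

# After substitution q = f(i) = i, no two iterations collide.
closed_form_independent = not has_output_dependence(lambda i: i, 1, 100)

# A hypothetical subscript that does repeat, for contrast.
repeating_dependent = has_output_dependence(lambda i: i // 2, 1, 100)
```

With f(i) = i the system has no solution, which is what allows the loop to be marked `!$OMP PARALLEL DO`.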

  3. Motivating Example: Parallelization

     Sample code:

          1  old = p
          2  q = 0
          3  DO i = 1, old
          4    q = q+1
          5    B(q) = 1
          6    IF (A(q).GT.0)
          7      p = p+1
          8      A(p) = 0
          9    ENDIF
         10  ENDDO

     • q is substituted with the closed form q = i
     • p cannot be substituted with a closed form

     After induction variable recognition/substitution:

          1  old = p
          3  DO i = 1, old
          5    B(i) = 1
          6    IF (A(i).GT.0)
          7      p = p+1
          8      A(p) = 0
          9    ENDIF
         10  ENDDO

     • Array B is independent
     • Array A is dependent (dependence arrows in the figure: anti, flow, output)

  4. Motivating Example: Parallelization

     After induction variable recognition/substitution:

          1  old = p
          3  DO i = 1, old
          6    IF (A(i).GT.0)
          7      p = p+1
          8      A(p) = 0
          9    ENDIF
         10  ENDDO

     Condition attached to each dependence on A, where p(8) is the value of p at statement 8:
     • Anti: p(8) ∉ [1:old]
     • Flow: p(8) ∉ [1:old]
     • Output: p(8) non-repeating

  5. Recurrence Properties: STEP

          1  old = p
          3  DO i = 1, old
          6    IF (A(i).GT.0)
          7      p = p+1
          8      A(p) = 0
          9    ENDIF
         10  ENDDO

     Cross-iteration accesses are mutually independent if p is strictly increasing, or
     step(p|i=k, p|i=k+1) > 0, ∀ k ∈ [1:old]

  6. Recurrence Properties: STEP and IMAGE

          1  old = p
          3  DO i = 1, old
          6    IF (A(i).GT.0)
          7      p = p+1
          8      A(p) = 0
          9    ENDIF
         10  ENDDO

     Independent if p and i belong to disjoint sets, or
     image(p | i ∈ [1:old]) ∩ image(i | i ∈ [1:old]) = ∅
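The STEP and IMAGE conditions can be observed on a concrete run of the loop. The sketch below simulates the slide's loop for one hypothetical input array, records which elements of A are read and written, and then checks both properties. This is a dynamic check on a single input, not the static proof the slides are after. The function name `trace_loop` and the sample data are assumptions.

```python
def trace_loop(A, p):
    """Simulate the loop; A is 1-based (index 0 unused). Returns (reads, writes)."""
    old = p
    reads, writes = [], []
    for i in range(1, old + 1):
        reads.append(i)          # read of A(i) in the IF condition
        if A[i] > 0:
            p += 1
            A[p] = 0             # write of A(p) at statement 8
            writes.append(p)
    return reads, writes

A = [None, 3, -1, 4, 0, 0, 0, 0, 0]   # hypothetical data, p starts at 4
reads, writes = trace_loop(A, 4)

# STEP: the written subscripts are strictly increasing, so no output dependence.
strictly_increasing = all(a < b for a, b in zip(writes, writes[1:]))

# IMAGE: the written subscripts never fall in [1:old], so no flow/anti dependence.
disjoint = set(reads).isdisjoint(writes)
```

For this input the writes land at positions 5 and 6, above every position that is read.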

  7. A Simple Value Evolution Graph

     Sample code:

         p = 0
         IF (cond)
           p = p+5
         ELSE
           p = p+7
         ENDIF
         IF (p>0)
           …
         ENDIF

     Static Single Assignment form:

         p1 = 0
         IF (cond)
           p2 = p1+5
         ELSE
           p3 = p1+7
         ENDIF
         p4 = γ(p2, p3, cond)
         IF (p4>0)
           …
         ENDIF

     VEG: p1:0 → p2 (5), p1:0 → p3 (7), p2 → p4 (0), p3 → p4 (0)

     • Path through p2: p4 = p1 + 5 + 0, so p4 > p1 and p4 > 0
     • Path through p3: p4 = p1 + 7 + 0, so p4 > p1 and p4 > 0

  8. The Value Evolution Graph

          1  old = p0
          3  DO i = 1, old
               p1 = μ(p0, p3)
          5    B(i) = 1
          6    IF (A(i).GT.0)
          7      p2 = p1+1
          8      A(p2) = 0
          9    ENDIF
               p3 = γ(p1, p2, A(i).GT.0)
         10  ENDDO
             p4 = η(p0, p1)

  9. Our Solution: The Value Evolution Graph

          1  old = p0
          3  DO i = 1, old
               p1 = μ(p0, p3)
          5    B(i) = 1
          6    IF (A(i).GT.0)
          7      p2 = p1+1
          8      A(p2) = 0
          9    ENDIF
               p3 = γ(p1, p2, A(i).GT.0)
         10  ENDDO
             p4 = η(p0, p1)

     VEG for the loop body: p1 → p2 (1), p1 → p3 (0), p2 → p3 (0)

     • VEG:
       • acyclic graph, GSA names as nodes
       • one for each loop body/subprogram

  10. Our Solution: The Value Evolution Graph

          1  old = p0
          3  DO i = 1, old
               p1 = μ(p0, p3)
          5    B(i) = 1
          6    IF (A(i).GT.0)
          7      p2 = p1+1
          8      A(p2) = 0
          9    ENDIF
               p3 = γ(p1, p2, A(i).GT.0)
         10  ENDDO
             p4 = η(p0, p1)

     VEG for the outer context (figure): nodes old, p0, p1, p4; edge labels 0, 0, 0, [0:old]

     • VEG:
       • acyclic graph, GSA names as nodes
       • one for each loop body/subprogram

  11. Our Solution: The Value Evolution Graph

          1  old = p0
          3  DO i = 1, old
               p1 = μ(p0, p3)
          5    B(i) = 1
          6    IF (A(i).GT.0)
          7      p2 = p1+1
          8      A(p2) = 0
          9    ENDIF
               p3 = γ(p1, p2, A(i).GT.0)
         10  ENDDO
             p4 = η(p0, p1)

     VEG for the outer context (figure): nodes old, p0, p1, p4; edge labels 0, 0, 0, [0:old]
     VEG for the loop body: p1 → p2 (1), p1 → p3 (0), p2 → p3 (0)

     • VEG:
       • acyclic graph, GSA names as nodes
       • one for each loop body/subprogram
       • hierarchical relations among VEGs

  12. VEG Nodes

         p0 = 0
         DO i = 1, N
           p1 = μ(p0, p4)
           IF (A(i).GT.0)
             p2 = p1+1
           ELSE
             p3 = 0
           ENDIF
           p4 = γ(p2, p3, A(i).GT.0)
         ENDDO

     Nodes in the figure: p1 (μ), p3:0 (Input), p2 (Regular), p4 (Back)
     Edges: p1 → p2 (1), p2 → p4 (0), p3 → p4 (0)

     • Input: result of assignment of loop invariant
     • μ: merges value from outside with loop-back
     • Back: last value in one iteration
     • Regular: all others

  13. VEG Edges

         p1 = …
         IF (A(i).GT.0)
           p2 = p1+1
         ENDIF
         p3 = γ(p1, p2, A(i).GT.0)

     Edges, labeled (weight, predicate):
     • p1 → p2: (+1, .TRUE.)
     • p1 → p3: (+0, A(i).LE.0)
     • p2 → p3: (+0, A(i).GT.0)

  14. VEG Distance

         p1 = …
         IF (A(i).GT.0)
           p2 = p1+1
         ENDIF
         p3 = γ(p1, p2, A(i).GT.0)

     VEG: p1 → p2 (1), p1 → p3 (0), p2 → p3 (0)

     distance(p1,p3) = [ ShortestPath(p1,p3) : LongestPath(p1,p3) ]
     distance(p1,p3) = [0:1]
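Because the VEG is acyclic, the shortest and longest path weights between two nodes can be computed with one memoized traversal. The sketch below does this for a graph stored as an adjacency dictionary. The representation and the function name `distance` are assumptions made for illustration. The slides do not show how Polaris stores the graph, and edge predicates are ignored here.

```python
from functools import lru_cache

def distance(edges, src, dst):
    """Return (shortest, longest) path weight from src to dst in a DAG, or None.

    edges maps a node name to a list of (successor, weight) pairs.
    """
    @lru_cache(maxsize=None)
    def go(node):
        if node == dst:
            return (0, 0)
        best = None
        for succ, weight in edges.get(node, []):
            sub = go(succ)
            if sub is None:
                continue                      # dst not reachable through succ
            lo, hi = sub[0] + weight, sub[1] + weight
            best = (lo, hi) if best is None else (min(best[0], lo), max(best[1], hi))
        return best
    return go(src)

# VEG from slide 14: p2 = p1+1 on the taken branch, p3 merges p1 and p2.
veg_slide_14 = {"p1": [("p2", 1), ("p3", 0)], "p2": [("p3", 0)]}
d14 = distance(veg_slide_14, "p1", "p3")      # (0, 1), i.e. [0:1]

# VEG from slide 7: p4 is reached through +5 or +7.
veg_slide_7 = {"p1": [("p2", 5), ("p3", 7)], "p2": [("p4", 0)], "p3": [("p4", 0)]}
d7 = distance(veg_slide_7, "p1", "p4")        # (5, 7), so p4 > p1
```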

  15. Recurrence Properties

         p0 = 0
         DO i = 1, N
           p1 = μ(p0, p3)
           IF (A(i).GT.0)
             p2 = p1+1
           ENDIF
           p3 = γ(p1, p2)
         ENDDO

     VEG: p1 (μ-node) → p2 (1), p1 → p3 (0), p2 → p3 (0); p3 is the back node

     step(p2|i=k, p2|i=k+1) = distance(p2, p3) + distance(p1, p2) = 0 + 1 = 1

  16. Recurrence Properties

         p0 = 0
         DO i = 1, N
           p1 = μ(p0, p3)
           IF (A(i).GT.0)
             p2 = p1+1
           ENDIF
           p3 = γ(p1, p2)
         ENDDO

     VEG: p1 (μ-node) → p2 (1), p1 → p3 (0), p2 → p3 (0); p3 is the back node

     step(p2|i=k, p2|i=k+1) = distance(p2, p3) + distance(p1, p2) = 0 + 1 = 1

     image(p2 | i ∈ [1:N]) = initial value(p1) + step(p1|i=k, p1|i=k+1) * [0:N-1] + distance(p1, p2)
                           = 0 + [0:1]*[0:N-1] + 1
                           = [1:N]

  17. Recurrence Properties

         p0 = 0
         DO i = 1, N
           p1 = μ(p0, p3)
           IF (A(i).GT.0)
             p2 = p1+1
           ENDIF
           p3 = γ(p1, p2)
         ENDDO

     VEG: p1 (μ-node) → p2 (1), p1 → p3 (0), p2 → p3 (0); p3 is the back node

     step(p2|i=k, p2|i=k+1) = distance(p2, p3) + distance(p1, p2) = 0 + 1 = 1

     image(p2 | i ∈ [1:N]) = initial value(p1) + step(p1|i=k, p1|i=k+1) * [0:N-1] + distance(p1, p2)
                           = 0 + [0:1]*[0:N-1] + 1
                           = [1:N]

     last value(p1 | i=N) = initial value(p1) + step(p1|i=k, p1|i=k+1) * N
                          = 0 + [0:1]*N
                          = [0:N]
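The image and last-value formulas are plain interval arithmetic, so they can be checked numerically for a concrete trip count. The sketch below represents a range [a:b] as a Python tuple `(a, b)`. The helper names and the choice N = 10 are assumptions for illustration.

```python
def iv_add(a, b):
    """Interval sum: [a0:a1] + [b0:b1]."""
    return (a[0] + b[0], a[1] + b[1])

def iv_mul(a, b):
    """Interval product: the hull of all endpoint products."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))

def image_p2(initial, step, dist_p1_p2, n):
    # initial value(p1) + step * [0:N-1] + distance(p1, p2)
    return iv_add(iv_add(initial, iv_mul(step, (0, n - 1))), dist_p1_p2)

def last_value_p1(initial, step, n):
    # initial value(p1) + step * N
    return iv_add(initial, iv_mul(step, (n, n)))

N = 10
image = image_p2((0, 0), (0, 1), (1, 1), N)   # (1, 10), i.e. [1:N]
last = last_value_p1((0, 0), (0, 1), N)       # (0, 10), i.e. [0:N]
```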

  18. Recurrence Properties

         old = p0
         q0 = 0
         DO i = 1, old
           q1 = μ(q0, q2)
           p1 = μ(p0, p3)
           q2 = q1+1
           B(q2) = 1
           IF (A(i).GT.0)
             p2 = p1+1
             A(p2) = 0
           ENDIF
           p3 = γ(p1, p2)
         ENDDO

     VEG for q (closed form): q1 → q2 (1)
     VEG for p (no closed form): p1 → p2 (1), p1 → p3 (0), p2 → p3 (0)

     • step(q2, q2) = 1 ⇒ B(q2) independent
     • step(p2, p2) = 1 ⇒ A(p2) independent

  19. Logic Inference on the VEG

          1  f1 = 0
          2  IF (c1)
          3    f2 = 1
          4  ENDIF
          5  f3 = γ(f1,f2,c1)
          6  IF (c2)
          7    value = …
          8  ELSE
          9    f4 = f3+2
         10  ENDIF
         11  f5 = γ(f3,f4,c2)
         12  IF (f5.EQ.1)
         13    PRINT *, value
         14  ENDIF

     VEG: f1:0 → f3 (0), f2:1 → f3 (0), f3 → f4 (2), f3 → f5 (0, c2), f4 → f5 (0)

     Question: (f5.EQ.1) ⇒ c2 ?

     • Extract range: f5.EQ.1 ⇒ f5 ∈ [1:1]
     • Trace range backwards from f5 ∈ [1:1]:
       • through f3: f3+0 ∈ [1:1], so f1 ∈ [1:1] (0 ∈ [1:1]) or f2 ∈ [1:1] (1 ∈ [1:1])
       • through f4: f4+0 ∈ [1:1], so f3+2 ∈ [1:1], so f1+2 ∈ [1:1] (2 ∈ [1:1]) or f2+2 ∈ [1:1] (3 ∈ [1:1])
     • Propagate value from 7 to 13
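The inference (f5.EQ.1) ⇒ c2 can be confirmed by enumerating every combination of the two branch conditions. The sketch below does that by brute force. It checks the conclusion but is not the backward range tracing on the VEG that the slide describes, and the function name `f5_value` is an assumption.

```python
from itertools import product

def f5_value(c1, c2):
    """Evaluate f5 for one outcome of the conditions c1 and c2."""
    f1 = 0
    f3 = 1 if c1 else f1          # f3 = γ(f1, f2, c1) with f2 = 1
    return f3 if c2 else f3 + 2   # f5 = γ(f3, f4, c2) with f4 = f3 + 2

outcomes = {(c1, c2): f5_value(c1, c2) for c1, c2 in product([False, True], repeat=2)}

# Every outcome with f5 == 1 has c2 true, so 'value' is defined at statement 13.
f5_eq_1_implies_c2 = all(c2 for (c1, c2), f5 in outcomes.items() if f5 == 1)
```

The four outcomes give f5 the values 0, 1, 2 and 3, and only the path through the c2 edge yields 1.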

  20. VEG Pruning

         A(p1) = …
         f1 = 0
         IF (cond)
           f2 = 1
           p2 = p1+1
         ENDIF
         p3 = γ(p1, p2, cond)
         f3 = γ(f1, f2, cond)
         IF (f3.GT.0)
           p4 = p3-1
         ENDIF
         p5 = γ(p3, p4, f3.GT.0)
         IF (f3.EQ.1)
           … = A(p5)
         ENDIF

     Is … = A(p5) covered by A(p1) = …?

     VEG for p: p1 → p2 (1), p1 → p3 (0), p2 → p3 (0), p3 → p4 (-1), p3 → p5 (0), p4 → p5 (0)
     VEG for f: f1:0 → f3, f2:1 → f3

     Inferences: f3.EQ.1 ⇒ cond, f3.EQ.1 ⇒ f3.GT.0

     • VEG before pruning: p5 ∈ [p1-1 : p1+1]
     • After GSA-path pruning [Tu, Padua, ICS95]: p5 ∈ [p1-1 : p1]
     • After VEG-based GSA-path pruning: p5 = p1
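The three ranges for p5 can be reproduced by enumerating which branch outcomes each pruning level still considers feasible. The sketch below does so for the offset p5 - p1. The mapping of each inference to a pruning level (f3.EQ.1 ⇒ f3.GT.0 for GSA-path pruning, f3.EQ.1 ⇒ cond added by the VEG) is an assumption read off the order of the slide, and `offsets` is a hypothetical helper.

```python
def offsets(cond_outcomes, gt_outcomes):
    """Set of possible p5 - p1 values for the allowed branch outcomes.

    cond_outcomes: allowed truth values of 'cond'      (True adds +1 via p2)
    gt_outcomes:   allowed truth values of 'f3.GT.0'   (True adds -1 via p4)
    """
    return {(1 if c else 0) - (1 if g else 0)
            for c in cond_outcomes for g in gt_outcomes}

both = (False, True)

before_pruning = offsets(both, both)        # branches treated as independent
after_gsa = offsets(both, (True,))          # guard f3.EQ.1 implies f3.GT.0
after_veg = offsets((True,), (True,))       # guard f3.EQ.1 also implies cond

ranges = [(min(s), max(s)) for s in (before_pruning, after_gsa, after_veg)]
```

The resulting ranges are [-1:1], [-1:0] and [0:0], matching the three results listed on the slide.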

  21. Automatic Parallelization Framework [Rus, Rauchwerger, Hoeflinger 2002]

     • PARALLELIZATION: Generation of Parallel Code, Privatization Analysis, Dependence Analysis
     • DATAFLOW: Memory Classification Analysis

  22. Memory Classification Analysis [Hoeflinger 1998]

     • Memory reference set partition
     • Provides array dataflow/dependence information
     • Relies heavily on closed forms

         A(3) = A(1) + A(2)
         A(1) = A(3) + A(2)

     ReadOnly (A) = { 2 }
     WriteFirst (A) = { 3 }
     ReadWrite (A) = { 1 }
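The partition on slide 22 can be reproduced by replaying the accesses of the two statements in program order. In the sketch below, an element is WriteFirst if its first access is a write, ReadOnly if it is never written, and ReadWrite otherwise. It works on an explicit access list rather than on symbolic descriptors, and the function name `classify` and the `("R"/"W", index)` encoding are assumptions.

```python
def classify(accesses):
    """Partition array indices into ReadOnly, WriteFirst and ReadWrite sets.

    accesses is a list of (op, index) pairs in execution order, op is "R" or "W".
    """
    first_op, written = {}, set()
    for op, idx in accesses:
        first_op.setdefault(idx, op)   # remember only the first access kind
        if op == "W":
            written.add(idx)
    sets = {"ReadOnly": set(), "WriteFirst": set(), "ReadWrite": set()}
    for idx, op in first_op.items():
        if op == "W":
            sets["WriteFirst"].add(idx)
        elif idx in written:
            sets["ReadWrite"].add(idx)
        else:
            sets["ReadOnly"].add(idx)
    return sets

# A(3) = A(1) + A(2) followed by A(1) = A(3) + A(2)
trace = [("R", 1), ("R", 2), ("W", 3), ("R", 3), ("R", 2), ("W", 1)]
partition = classify(trace)
```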

  23. Memory Reference Sequences

     Stack push (P3M / PP_do100):

         DO i = 1, N
           p = 0
           DO j = 1, M
             IF (…)
               p = p+1
               A(p) = …
             ENDIF
           ENDDO
           DO j = 1, p
             … = A(j)
           ENDDO
         ENDDO

     WF: pred_write [p : p+length_write]
     Recurrence: pred_step { p = p + length_step }

     • Contiguous: pred_step […] pred_write, length_step […] length_write
     • Increasing: pred_step […] pred_write, length_step […] length_write
     • Consecutive: pred_step […] pred_write, length_step = length_write

     Is A privatizable in the outer loop? Yes, contiguous write in inner loop.
     Is the inner loop independent? Yes, increasing in inner loop.

  24. Pushback Sequences

     Conditional Pushback (HYDRO2D WNFLE_do10):

         DO i = 1, N
           IF (C(i).EQ.1)
             A(p) = …
             p = p+1
           ENDIF
         ENDDO

  25. Pushback Sequences

     Conditional Pushback (HYDRO2D WNFLE_do10):

         DO i = 1, N
           IF (C(i).EQ.1)
             A(p) = …
             p = p+1
           ENDIF
         ENDDO

     Conditional Pushback, Stack lookup (TRACK FPTRAK_do300):

         old = p
         DO i = 1, N
           next = p+1
           same = 0
           A(next) = …
           DO j = 1, old
             IF (A(j).EQ.A(next))
               same = 1
             ENDIF
           ENDDO
           IF (same.EQ.0)
             p = next
           ENDIF
         ENDDO

  26. Pushback Sequences

     Conditional Pushback (HYDRO2D WNFLE_do10):

         DO i = 1, N
           IF (C(i).EQ.1)
             A(p) = …
             p = p+1
           ENDIF
         ENDDO

     Conditional Pushback, Stack lookup (TRACK FPTRAK_do300):

         old = p
         DO i = 1, N
           next = p+1
           same = 0
           A(next) = …
           DO j = 1, old
             IF (A(j).EQ.A(next))
               same = 1
             ENDIF
           ENDDO
           IF (same.EQ.0)
             p = next
           ENDIF
         ENDDO

     Conditional Pushback, Stack lookup & update (TRACK / EXTEND_do400):

         old = p
         DO i = 1, N
           ifdata = p+1
           DO k = 1, M
             A(p+1) = …
             DO j = ifdata, p
               IF (A(1,j).EQ.A(1,p+1))
                 A(2,j) = A(2,j)+A(2,p+1)
                 same = 1
               ENDIF
             ENDDO
             IF (same.EQ.0)
               p = p+1
             ENDIF
           ENDDO
         ENDDO

  27. Pushback Sequences

     • Detection
       • Consecutive WF
     • Parallelization
       • Accumulation to private storage
       • Simple copy-out to shared storage
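The parallelization scheme on slide 27 (accumulate privately, then copy out) can be sketched for the simple conditional pushback loop. Each worker pushes the selected elements of its own chunk of iterations into a private list, and the private lists are concatenated in iteration order afterwards. Everything here is an assumption made for illustration: the chunking, the use of Python threads, and `pushed_value`, which stands in for the right-hand side that the slides elide.

```python
from concurrent.futures import ThreadPoolExecutor

def pushed_value(i):
    # Placeholder for the elided right-hand side of "A(p) = ..." on the slides.
    return 10 * i

def sequential_pushback(C):
    """Reference version: the original conditional pushback loop."""
    A = []
    for i, c in enumerate(C, start=1):
        if c == 1:
            A.append(pushed_value(i))
    return A

def chunk_pushback(chunk):
    """Accumulate the pushes of one chunk of iterations into private storage."""
    return [pushed_value(i) for i, c in chunk if c == 1]

def parallel_pushback(C, workers=4):
    indexed = list(enumerate(C, start=1))
    size = max(1, -(-len(indexed) // workers))           # ceiling division
    chunks = [indexed[k:k + size] for k in range(0, len(indexed), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        private = list(pool.map(chunk_pushback, chunks)) # results keep chunk order
    shared = []
    for part in private:                                 # copy-out to shared storage
        shared.extend(part)
    return shared

C = [1, 0, 1, 1, 0, 0, 1, 0, 1]
seq_result = sequential_pushback(C)
par_result = parallel_pushback(C)
```

The copy-out is a plain concatenation because the sequence is consecutive: each chunk's final position in the shared array depends only on how many elements the earlier chunks pushed.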

  28. Implementation in Polaris

     • PARALLELIZATION: Generation of Parallel Code, Privatization Analysis, Dependence Analysis
     • VEG-based Analysis
     • DATAFLOW: Memory Classification Analysis

  29. Implementation in Polaris

     • PARALLELIZATION: Generation of Parallel Code, Privatization Analysis, Dependence Analysis
     • VEG-based Analysis
     • DATAFLOW: Memory Classification Analysis

     Partially aggregated descriptors are fed to VEG-based analysis

  30. Implementation in Polaris

     • PARALLELIZATION: Generation of Parallel Code, Privatization Analysis, Dependence Analysis
     • VEG-based Analysis
     • DATAFLOW: Memory Classification Analysis

     Contiguous sequences lead to more accurate dataflow information

  31. Implementation in Polaris

     • PARALLELIZATION: Generation of Parallel Code, Privatization Analysis, Dependence Analysis
     • VEG-based Analysis
     • DATAFLOW: Memory Classification Analysis

     More storage dependences eliminated by privatization

  32. Implementation in Polaris

     • PARALLELIZATION: Generation of Parallel Code, Privatization Analysis, Dependence Analysis
     • VEG-based Analysis
     • DATAFLOW: Memory Classification Analysis

     Closer value ranges, increasing sequences ⇒ fewer false dependences

  33. Implementation in Polaris

     • PARALLELIZATION: Generation of Parallel Code, Privatization Analysis, Dependence Analysis
     • VEG-based Analysis
     • DATAFLOW: Memory Classification Analysis

     Efficient pushback sequence parallelization

  34. Experimental Results

     Seq% = Sequential Time (loop) / Sequential Time (whole application)

  35. Pushbacks in PERFECT

     More in C and C++ codes!

  36. Related Work

  37. Conclusions

     Diagram labels: Value Evolution Graph, Memory Reference Analysis, Comparison, Array Dataflow, Range, Privatization, Recurrences, Dependence Analysis, Logic Inferences, Pushback Parallelization

  38. Sample VEGs

     Figures: EXTEND_do400, EXTEND_do300
