
Lawrence Rauchwerger and David A. Padua PLDI 1995 Presented by Seung-Jai Min




  1. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization
    Lawrence Rauchwerger and David A. Padua, PLDI 1995. Presented by Seung-Jai Min

  2. Introduction
    • Motivation: current parallelizing compilers cannot handle access patterns that are complex or insufficiently defined at compile time (e.g., input-dependent or run-time-dependent conditions, subscripted subscripts).
    • LRPD Test
      - Speculatively executes the loop as a doall
      - Applies a fully parallel test for cross-iteration data dependences
      - If the test fails, re-executes the loop serially

  3. Inspector-Executor Method
    • Inspector/Executor
      - Inspector: extract and analyze the memory access pattern of the loop
      - Executor: transform the loop if necessary, then execute it
    • Disadvantages
      - Cost and side effects: if the address computation of the array under test depends on data computed in the loop, the inspector has to replicate that computation
      - The inspector loop itself cannot always be executed in parallel
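
To make the inspector/executor split concrete, here is a minimal Python sketch for a loop of the form `A(W(i)) = f(A(R(i)))`, assuming the index arrays (called `reads` and `writes` here, hypothetical names) are fully known before the loop runs. The inspector scans only the index arrays for cross-iteration conflicts; the executor then runs the loop, using a reversed iteration order as a stand-in for an arbitrary parallel schedule when no conflicts were found.

```python
from typing import Callable, List


def inspector(reads: List[int], writes: List[int]) -> bool:
    """Return True if no element written by one iteration is read or
    written by a different iteration (the loop is then a doall)."""
    writer = {}
    for i, w in enumerate(writes):
        if w in writer:
            return False  # element written by two iterations: output dependence
        writer[w] = i
    for i, r in enumerate(reads):
        if r in writer and writer[r] != i:
            return False  # read of an element written elsewhere: flow/anti dependence
    return True


def executor(a: List[float], reads: List[int], writes: List[int],
             f: Callable[[float], float]) -> str:
    """Run a[writes[i]] = f(a[reads[i]]) for every iteration i."""
    order = list(range(len(reads)))
    if inspector(reads, writes):
        # Independent iterations: any order gives the serial result, so
        # they could be spread over processors; reversal simulates that.
        order.reverse()
        mode = "parallel"
    else:
        mode = "serial"
    for i in order:
        a[writes[i]] = f(a[reads[i]])
    return mode
```

The inspector can be split off here only because `reads` and `writes` exist before the loop starts. If the subscripts were computed from values produced inside the loop, the inspector would have to replicate that computation, which is the first disadvantage on the slide.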

  4. Speculative Run-Time Parallelization
    [Figure (flowchart): compile time: Polaris static analysis and run-time transformations. Run time: checkpoint, reorder heuristic, speculative parallel execution, then the test; on pass, the speculative results are kept; on fail, restore, then sequential execution.]

  5. Hazards (during the speculative execution)
    • Exceptions
      - Invalidate the parallel execution
      - Recovery: clear the exception flag, restore the values of any altered variables, and execute the loop serially
    • Cross-iteration dependences in the loop
      - Detected by the LRPD Test
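
The checkpoint, speculate, test, and restore-then-serial cycle can be sketched in Python as follows. Assumptions: `speculate` is a hypothetical helper, the dependence test is supplied by the caller as a function (the role the LRPD test plays), only arithmetic exceptions are treated as hazards, and the "parallel" run is simulated by executing the iterations in reverse order.

```python
from typing import Callable, List


def speculate(a: List[float], n: int,
              body: Callable[[List[float], int], None],
              test: Callable[[], bool]) -> str:
    """Run body(a, i) for i in range(n) speculatively as a doall.

    `test` is called after the speculative run and must report whether
    the run was valid (no cross-iteration dependences).
    """
    checkpoint = list(a)  # checkpoint the shared data before speculating
    try:
        for i in reversed(range(n)):  # stand-in for a parallel schedule
            body(a, i)
        if test():
            return "parallel"
    except ArithmeticError:
        # An exception invalidates the speculative run; catching it here
        # plays the role of clearing the exception flag.
        pass
    a[:] = checkpoint  # restore the values of all altered elements
    for i in range(n):  # re-execute the loop serially
        body(a, i)
    return "serial"
```

Both failure paths (a failed test or an exception) end the same way: the state is rolled back to the checkpoint and the original sequential loop produces the result.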

  6. LPD Test (The Lazy Privatizing doall Test)
    1. Marking Phase (performed during the speculative execution)
      - For each shared array A[1:s], allocate read, write, and not-private shadow arrays Ar[1:s], Aw[1:s], and Anp[1:s]
      (a) Uses: if this array element has not been modified in the current iteration, set the corresponding element in Ar and Anp
      (b) Defs: set the corresponding element in Aw, and clear it in Ar if set
      (c) twi(A): count the elements of A marked as written in this iteration (i: iteration number)
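
A minimal Python sketch of the marking phase for one shared array, assuming 0-based indices and assuming that each iteration's marks are kept separately and OR-ed into the shared shadow arrays when the iteration ends (one way to give "this iteration" its meaning; the paper's own implementation may differ). `Shadow` and `IterationMarker` are hypothetical names.

```python
from typing import List, Set


class Shadow:
    """Shadow arrays for one shared array A[0:s] (0-based in this sketch)."""

    def __init__(self, s: int) -> None:
        self.ar = [False] * s       # Ar: read marks
        self.aw = [False] * s       # Aw: write marks
        self.anp = [False] * s      # Anp: not-private marks
        self.tw_i: List[int] = []   # twi(A), one entry per iteration


class IterationMarker:
    """Collects the marks made by a single iteration."""

    def __init__(self) -> None:
        self.reads: Set[int] = set()
        self.not_private: Set[int] = set()
        self.writes: Set[int] = set()

    def markread(self, k: int) -> None:
        # (a) Use: mark Ar and Anp only if A[k] has not been modified
        # earlier in this iteration.
        if k not in self.writes:
            self.reads.add(k)
            self.not_private.add(k)

    def markwrite(self, k: int) -> None:
        # (b) Def: mark Aw and clear this iteration's Ar mark, if any.
        self.writes.add(k)
        self.reads.discard(k)

    def merge_into(self, sh: Shadow) -> None:
        """OR this iteration's marks into the shared shadow arrays."""
        for k in self.reads:
            sh.ar[k] = True
        for k in self.not_private:
            sh.anp[k] = True
        for k in self.writes:
            sh.aw[k] = True
        # (c) twi(A): number of elements marked as written in this iteration.
        sh.tw_i.append(len(self.writes))
```

For instance, an iteration that calls `markread(1)` and then `markwrite(1)` leaves Ar[1] clear but sets Anp[1] and Aw[1]: the element is used before it is defined in that iteration (so it is not privatizable), but the read by itself does not create a cross-iteration dependence.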

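  7. LPD Test (The Lazy Privatizing doall Test)
    2. Analysis Phase (performed after the speculative execution)
      (a) Compute
        (i) $tw(A) = \sum_i tw_i(A)$, summed over all iterations i
        (ii) $tm(A) = \sum_{k=1}^{s} Aw[k]$, the number of marked elements in Aw
        (iii) $tm(A) \neq tw(A)$ means some element was written in more than one iteration: a cross-iteration output dependence
      (b) If any(Aw[:] & Ar[:]), the loop is not a doall and the phase ends: some location is written in one iteration and its value is used in a different iteration (a flow or anti dependence)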

  8. LPD Test (The Lazy Privatizing doall Test)
    2. Analysis Phase (performed after the speculative execution)
      (c) Else if tw(A) == tm(A), the loop is a doall (without privatizing the array A)
      (d) Else if any(Aw[:] & Anp[:]), the array A is not privatizable: in at least one iteration some element of A was used before it was modified. The loop is not a doall and the phase ends.
      (e) Otherwise, the loop was made into a doall by privatizing the shared array A
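
Steps (a) through (e) of the analysis phase translate almost directly into code. A minimal Python sketch, assuming the shadow arrays are lists of booleans and `tw_i` holds the per-iteration counts twi(A); `lpd_analysis` and its return strings are hypothetical.

```python
from typing import Sequence


def lpd_analysis(ar: Sequence[bool], aw: Sequence[bool],
                 anp: Sequence[bool], tw_i: Sequence[int]) -> str:
    """Analysis phase of the LPD test for one shared array A."""
    tw = sum(tw_i)  # (a)(i): tw(A), summed over all iterations
    tm = sum(aw)    # (a)(ii): tm(A), number of marked elements of Aw
    if any(w and r for w, r in zip(aw, ar)):
        return "not doall"                 # (b) flow or anti dependence
    if tw == tm:
        return "doall"                     # (c) no privatization needed
    if any(w and p for w, p in zip(aw, anp)):
        return "not doall"                 # (d) A is not privatizable
    return "doall with privatization"      # (e) output dependence removed by privatizing A
```

The order of the checks matters: (c) is only reached when (b) found no flow or anti dependence, and (d) only when tw(A) and tm(A) differ, i.e. when privatization is actually needed.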

  9. Dynamic dead reference elimination
    • To avoid introducing false dependences, the marking of the read and not-private shadow arrays, Ar and Anp, can be postponed until the value of the shared variable is actually used.
    • Definition: a dynamic dead read reference in a loop is a read access of a shared variable that does not contribute to the computation of any other shared variable that is live at loop end.
    • The "lazy" marking employed by the LPD test (i.e., the dynamic dead reference elimination technique) allows it to qualify more loops as parallel than the PD test.

  10. Original loop
      Do i=1, 5
        z = A(K(i))
        if (B1(i)) then
          A(L(i)) = z + C(i)
        endif
      enddo

    PD Test
      Do i=1, 5
        markread(K(i))
        z = A(K(i))
        if (B1(i)) then
          markwrite(L(i))
          A(L(i)) = z + C(i)
        endif
      enddo

    Input: B1(1:5) = (1 0 1 0 1), K(1:5) = (1 2 3 4 1), L(1:5) = (2 2 4 4 2)

  11. PD Test
      Do i=1, 5
        markread(K(i))
        z = A(K(i))
        if (B1(i)) then
          markwrite(L(i))
          A(L(i)) = z + C(i)
        endif
      enddo

    Original loop
      Do i=1, 5
        z = A(K(i))
        if (B1(i)) then
          A(L(i)) = z + C(i)
        endif
      enddo

    Input: B1(1:5) = (1 0 1 0 1), K(1:5) = (1 2 3 4 1), L(1:5) = (2 2 4 4 2)

  12. LPD Test
      Do i=1, 5
        z = A(K(i))
        if (B1(i)) then
          markread(K(i))
          markwrite(L(i))
          A(L(i)) = z + C(i)
        endif
      enddo

    Original loop
      Do i=1, 5
        z = A(K(i))
        if (B1(i)) then
          A(L(i)) = z + C(i)
        endif
      enddo

    Input: B1(1:5) = (1 0 1 0 1), K(1:5) = (1 2 3 4 1), L(1:5) = (2 2 4 4 2)
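
The PD-versus-LPD example with this input can be checked mechanically. The Python sketch below simulates only the marking (the values of A and C do not affect the test), once with the read marked at `z = A(K(i))` as in the PD test and once lazily inside the `if` as in the LPD test, and then applies the analysis phase. Assumptions: per-iteration marks are merged into the shadow arrays at the end of each iteration, indices stay 1-based to match the slides, and `run`, `IterationMarks` and `lazy` are hypothetical names.

```python
from typing import List, Set

# Input data from the slides (1-based values, as in the Fortran loop).
B1 = [True, False, True, False, True]  # B1(1:5) = (1 0 1 0 1)
K = [1, 2, 3, 4, 1]                    # K(1:5)
L = [2, 2, 4, 4, 2]                    # L(1:5)
S = 4                                  # elements A(1:4) are accessed


class IterationMarks:
    """Marks made by one iteration (hypothetical helper)."""

    def __init__(self) -> None:
        self.reads: Set[int] = set()        # candidate Ar marks
        self.not_private: Set[int] = set()  # Anp marks
        self.writes: Set[int] = set()       # Aw marks

    def markread(self, k: int) -> None:
        if k not in self.writes:  # use not preceded by a def in this iteration
            self.reads.add(k)
            self.not_private.add(k)

    def markwrite(self, k: int) -> None:
        self.writes.add(k)
        self.reads.discard(k)     # clear the Ar mark if set


def run(lazy: bool) -> str:
    """Mark the example loop PD-style (lazy=False) or LPD-style (lazy=True),
    then apply the LPD analysis phase."""
    ar: List[bool] = [False] * (S + 1)   # index 0 unused (1-based)
    aw: List[bool] = [False] * (S + 1)
    anp: List[bool] = [False] * (S + 1)
    tw = 0
    for i in range(len(K)):
        m = IterationMarks()
        if not lazy:
            m.markread(K[i])      # PD: mark at z = A(K(i))
        if B1[i]:
            if lazy:
                m.markread(K[i])  # LPD: mark only when z is used
            m.markwrite(L[i])
        for k in m.reads:
            ar[k] = True
        for k in m.not_private:
            anp[k] = True
        for k in m.writes:
            aw[k] = True
        tw += len(m.writes)       # twi(A)
    tm = sum(aw)
    if any(w and r for w, r in zip(aw, ar)):
        return "not doall"        # (b) flow/anti dependence
    if tw == tm:
        return "doall"            # (c)
    if any(w and p for w, p in zip(aw, anp)):
        return "not doall"        # (d) not privatizable
    return "doall with privatization"  # (e)


print("PD ->", run(lazy=False))   # PD -> not doall
print("LPD ->", run(lazy=True))   # LPD -> doall with privatization
```

Under the PD marking, the dead reads of A(2) and A(4) in iterations 2 and 4 coincide with elements written in iterations 1, 3 and 5, so Aw & Ar is non-empty. The lazy marking skips those reads, and the remaining output dependence on A(2) (written in iterations 1 and 5) is removed by privatization.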

  13. Run-time Reduction Parallelization
    • Two parts: recognizing reduction variables, and parallelizing the reductions
    • Identification by pattern matching
      - The data dependence (DD) test that qualifies a statement as a reduction statement cannot be performed statically when the access pattern is input dependent.
      - Syntactic pattern matching cannot identify all potential reduction variables (e.g., those accessed through subscripted subscripts).
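
Once a statement such as `A(R(i)) = A(R(i)) + exp()` is known to be a reduction, a standard way to parallelize it (not necessarily the exact scheme used in the paper) is to give each processor a private copy of the reduction array initialized to the identity element, accumulate into it, and combine the copies at the end. A minimal Python sketch, assuming a block schedule over `nprocs` threads; `parallel_reduce` and `value(i)` (standing in for `exp()`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parallel_reduce(a: List[float], r: List[int],
                    value: Callable[[int], float], nprocs: int = 4) -> None:
    """Compute a[r[i]] += value(i) for all i using private partial arrays."""
    n = len(r)
    chunk = (n + nprocs - 1) // nprocs  # block schedule: iterations per worker

    def work(p: int) -> List[float]:
        private = [0.0] * len(a)  # private copy, initialized to the identity for +
        for i in range(p * chunk, min(n, (p + 1) * chunk)):
            private[r[i]] += value(i)  # no sharing, so no synchronization needed
        return private

    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        partials = list(pool.map(work, range(nprocs)))
    # Final merge: add every worker's partial sums into the shared array.
    for private in partials:
        for k, v in enumerate(private):
            a[k] += v
```

Because the merge reassociates the additions, floating-point results can differ slightly from the serial loop's. With CPython threads this shows the structure of the transformation rather than an actual speedup.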

  14. The LRPD Test: Extending the LPD Test for Reduction Validation
    (a) Source program
      do i = 1, n
      S1: A(K(i)) = ………
      S2: ……… = A(L(i))
      S3: A(R(i)) = A(R(i)) + exp()
      enddo

    (b) Transformed program
      doall i = 1, n
        markwrite(K(i))
        markredux(K(i))
      S1: A(K(i)) = ………
        markread(L(i))
        markredux(L(i))
      S2: ……… = A(L(i))
        markwrite(R(i))
      S3: A(R(i)) = A(R(i)) + exp()
      enddo

    • Anx: a shadow array used only to check that a reduction variable is not accessed outside the single reduction statement
    • The markredux operation sets the corresponding element of Anx to true

  15. LRPD Test
    • Modified Analysis Phase
      - 2(d') Else if any(Aw[:] & Anp[:] & Anx[:]), then some element of A written in the loop is neither a reduction variable nor privatizable. Thus, the loop is not a doall and the phase ends.
      - 2(e') Otherwise, the loop was made into a doall by privatization and reduction parallelization.
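
The sketch below simulates the marking of the transformed program on the reduction-validation slide (with markredux recorded in a shadow array `anx`) and applies the analysis phase with steps 2(d') and 2(e') exactly as the slides state them. Assumptions: 0-based indices, per-iteration marks merged at the end of each iteration, and hypothetical names `lrpd_test` and `IterationMarks`; the elided right-hand sides and `exp()` do not influence the test, so they are omitted.

```python
from typing import List, Sequence, Set


class IterationMarks:
    """Marks made by one iteration (hypothetical helper)."""

    def __init__(self) -> None:
        self.reads: Set[int] = set()
        self.not_private: Set[int] = set()
        self.writes: Set[int] = set()

    def markread(self, k: int) -> None:
        if k not in self.writes:  # use not preceded by a def in this iteration
            self.reads.add(k)
            self.not_private.add(k)

    def markwrite(self, k: int) -> None:
        self.writes.add(k)
        self.reads.discard(k)


def lrpd_test(s: int, K: Sequence[int], L: Sequence[int],
              R: Sequence[int]) -> str:
    """Mark S1: A(K(i)) = ...; S2: ... = A(L(i)); S3: A(R(i)) = A(R(i)) + exp()
    as in the transformed program, then run the LRPD analysis phase."""
    ar: List[bool] = [False] * s
    aw: List[bool] = [False] * s
    anp: List[bool] = [False] * s
    anx: List[bool] = [False] * s  # set by markredux
    tw = 0
    for i in range(len(K)):
        m = IterationMarks()
        m.markwrite(K[i])          # S1: markwrite(K(i))
        anx[K[i]] = True           # S1: markredux(K(i))
        m.markread(L[i])           # S2: markread(L(i))
        anx[L[i]] = True           # S2: markredux(L(i))
        m.markwrite(R[i])          # S3: only markwrite(R(i))
        for k in m.reads:
            ar[k] = True
        for k in m.not_private:
            anp[k] = True
        for k in m.writes:
            aw[k] = True
        tw += len(m.writes)        # twi(A)
    tm = sum(aw)
    if any(w and r for w, r in zip(aw, ar)):
        return "not doall"                           # (b)
    if tw == tm:
        return "doall"                               # (c)
    if any(w and p and x for w, p, x in zip(aw, anp, anx)):
        return "not doall"                           # (d')
    return "doall with privatization and reduction"  # (e')


print(lrpd_test(6, [0, 1, 2], [0, 1, 2], [3, 3, 4]))  # doall with privatization and reduction
print(lrpd_test(6, [0, 1, 2], [0, 1, 3], [3, 3, 4]))  # not doall
```

In the first call, S1 and S2 touch each of elements 0 to 2 only within a single iteration, and element 3 is updated in two iterations only by S3, so the repeated write is accepted as a reduction. In the second call, S2 reads element 3 in an iteration other than those whose S3 updates it, and step (b) rejects the loop.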

  16. Performance (1)

  17. Performance (2)

  18. Experimental Results Summary

  19. Other Run-time Parallelization Papers
    • “Techniques for Speculative Run-Time Parallelization of Loops”, Manish Gupta and Rahul Nim, SC’98.
      - More efficient run-time array privatization
      - No rollback of the entire loop computation; instead, the loop is completed (by generating synchronization)
      - Early hazard detection

  20. Other Run-time Parallelization Papers
    • “Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors”, Ye Zhang, Lawrence Rauchwerger, and Josep Torrellas, HPCA 1998.
      - Run-time parallelization techniques are often computationally expensive and not general enough.
      - Idea: execute the code in parallel speculatively and let extended cache coherence protocol hardware detect any dependence violations.
      - Performance: a speedup of 7.3 on 16 processors, and 50% faster than the software-only scheme
