The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

Lawrence Rauchwerger and David A. Padua

PLDI 1995

Presented by Seung-Jai Min


Introduction

  • Motivation

    : Current parallelizing compilers cannot handle complex access patterns, or access patterns that are insufficiently defined at compile time (input-dependent accesses, run-time-dependent conditions, subscripted subscripts, etc.).

  • LRPD Test

    - Speculatively executes the loop as a doall

    - Applies a fully parallel test for cross-iteration data dependences

    - If the test fails, re-executes the loop serially


Inspector-Executor Method

  • Inspector/Executor

    - Inspector: extract and analyze the memory access pattern

    - Executor: transform the loop if necessary, then execute it

  • Disadvantages

    - Cost and side effects: if the address computation of the array under test depends on the actual data computation, the inspector must replicate part of that computation, which is expensive and may have side effects.

    - Parallel execution of the inspector loop is not always possible.
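The inspector/executor scheme can be sketched in Python for a loop of the form `A(L(i)) = A(K(i)) + C(i)`. This is a hedged illustration, not the paper's code: it assumes the subscript arrays K and L (0-based here) are known before the loop runs, and the function names `inspector` and `executor` are hypothetical. The inspector reads only the subscripts; the executor runs the iterations concurrently when no element is touched by two different iterations with at least one of them writing, and serially otherwise.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def inspector(K: List[int], L: List[int]) -> bool:
    """Return True if A[L[i]] = A[K[i]] + C[i] has no cross-iteration dependence.

    Only the subscript arrays are examined; A itself is never touched.
    """
    writer = {}                      # element of A -> iteration that writes it
    for i, x in enumerate(L):
        if x in writer:              # written by two iterations: output dependence
            return False
        writer[x] = i
    # A read of an element written by a *different* iteration is a flow/anti dependence.
    return all(writer.get(x, i) == i for i, x in enumerate(K))


def executor(A: List[float], C: List[float], K: List[int], L: List[int]) -> None:
    def body(i: int) -> None:
        A[L[i]] = A[K[i]] + C[i]

    if inspector(K, L):
        # Iterations are independent, so any schedule is valid.
        with ThreadPoolExecutor() as pool:
            list(pool.map(body, range(len(K))))
    else:
        for i in range(len(K)):      # fall back to the original serial order
            body(i)
```

The thread pool only demonstrates the control flow, not a speedup. If L were itself computed from values of A inside the loop, `inspector` could not be run ahead of the loop without replicating that computation, which is the cost and side-effect problem listed above.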


Speculative Run-Time Parallelization

  • Compile time: static analysis (Polaris)

  • Run time: run-time transformations (diagram labels: "reorder", "heuristic"), then checkpoint → speculative parallel execution → test

    - pass: the result of the parallel execution is kept

    - fail: restore the checkpointed values, then sequential execution

Hazards During the Speculative Execution

  • Exceptions

    - An exception invalidates the parallel execution.

    - Clear the exception flag, restore the values of any altered variables, and re-execute the loop serially.

  • Cross-iteration dependences in the loop

    - Detected by the LRPD test
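The speculate, test, and restore control flow, together with both hazards above, can be sketched as follows. This is a simplified, sequential Python model under stated assumptions: a reversed iteration order stands in for an arbitrary parallel schedule, only `ArithmeticError` is treated as a speculation-invalidating exception, and `test_passes` is a hypothetical callback standing in for the LRPD analysis over shadow arrays filled during the speculative run.

```python
from typing import Callable, List


def speculative_doall(A: List[float], n: int,
                      body: Callable[[List[float], int], None],
                      test_passes: Callable[[], bool]) -> str:
    """Speculatively run a loop over shared array A; fall back to serial on failure.

    Returns "parallel" if the speculative result is kept, "serial" if the
    loop had to be restored and re-executed in the original order.
    """
    checkpoint = list(A)                   # checkpoint the shared array
    try:
        # Stand-in for a doall: iterations run in an arbitrary order
        # (reversed here), as they could under parallel execution.
        for i in reversed(range(n)):
            body(A, i)
        ok = test_passes()                 # e.g. the LRPD analysis phase
    except ArithmeticError:
        ok = False                         # an exception invalidates the speculation
    if ok:
        return "parallel"
    A[:] = checkpoint                      # restore the altered values
    for i in range(n):                     # re-execute the loop serially
        body(A, i)
    return "serial"
```

For instance, a loop body that divides by a value an earlier iteration has not yet written raises `ZeroDivisionError` under the reversed order; the driver then restores the checkpoint and the serial re-execution produces the correct result.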


LPD Test (The Lazy Privatizing Doall Test)

1. Marking Phase

- For each shared array A[1:s], maintain read, write, and not-private shadow arrays Ar[1:s], Aw[1:s], and Anp[1:s].

(a) Uses: if this array element has not been modified in the current iteration, set the corresponding elements in Ar and Anp.

(b) Defs: set the corresponding element in Aw, and clear it in Ar if it is set.

(c) twi(A): count the total number of write accesses to A that are marked in iteration i.
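The marking phase can be sketched in Python for a single shared array. This is a hedged sketch, not the paper's implementation: each iteration is given as a hypothetical list of ("r"/"w", index) accesses in program order, indices are 0-based, per-iteration marks are OR-ed into the shadow arrays (standing in for the per-processor shadow copies a parallel implementation would need), and twi(A) is interpreted as the number of distinct elements marked as written in iteration i. The names `Access` and `lpd_mark` are illustrative.

```python
from typing import List, Tuple

Access = Tuple[str, int]   # ("r" or "w", element index of A), in program order


def lpd_mark(iterations: List[List[Access]],
             s: int) -> Tuple[List[bool], List[bool], List[bool], int]:
    """Marking phase for one shared array A of size s.

    Returns the shadow arrays Ar, Aw, Anp and tw, the sum of twi(A) over all iterations.
    """
    Ar, Aw, Anp = [False] * s, [False] * s, [False] * s
    tw = 0
    for accesses in iterations:
        # Marks for this iteration only; OR-ed into the shadow arrays below.
        r_i, w_i, np_i = set(), set(), set()
        for kind, x in accesses:
            if kind == "r":
                if x not in w_i:       # (a) use of an element not yet modified in this iteration
                    r_i.add(x)
                    np_i.add(x)
            elif kind == "w":
                w_i.add(x)             # (b) def: mark Aw ...
                r_i.discard(x)         # ... and clear the Ar mark if it is set
            else:
                raise ValueError(f"unknown access kind: {kind!r}")
        tw += len(w_i)                 # (c) twi(A): elements written in this iteration
        for x in r_i:
            Ar[x] = True
        for x in w_i:
            Aw[x] = True
        for x in np_i:
            Anp[x] = True
    return Ar, Aw, Anp, tw
```

A read that follows a write of the same element in the same iteration leaves no mark, because the value read was produced inside that iteration.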



2. Analysis Phase (performed after the speculative execution)

(a) Compute

(i) tw(A) = sum of twi(A) over all iterations i

(ii) tm(A) = sum(Aw[1:s])

(iii) If tm(A) != tw(A), some element of A was written in more than one iteration (a cross-iteration output dependence).

(b) If any(Aw[:] & Ar[:]), the loop is not a doall and the phase ends.

(Some location is defined in one iteration and used in another: a flow or anti dependence.)



(c) Else, if tw(A) == tm(A), the loop is a doall (without privatizing array A).

(d) Else, if any(Aw[:] & Anp[:]), array A is not privatizable and the loop is not a doall.

(In at least one iteration, some element of A was used before it was modified.)

(e) Otherwise, the loop is made into a doall by privatizing the shared array A.
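Steps (a) through (e) of the analysis phase translate directly into a small decision function. A minimal Python sketch, assuming the shadow arrays are lists of booleans and tw is the per-iteration write count summed over all iterations; the function name `lpd_analyze` and its return strings are illustrative.

```python
from typing import List


def lpd_analyze(Ar: List[bool], Aw: List[bool], Anp: List[bool], tw: int) -> str:
    """Analysis phase of the LPD test for one shared array A."""
    tm = sum(Aw)                                       # (a)(ii) distinct elements written
    if any(w and r for w, r in zip(Aw, Ar)):           # (b) flow/anti dependence
        return "not doall"
    if tw == tm:                                       # (c) no element written twice
        return "doall"
    if any(w and p for w, p in zip(Aw, Anp)):          # (d) some element used before modified
        return "not doall"
    return "doall with privatization"                  # (e)
```

Step (b) is checked before step (c): a flow or anti dependence disqualifies the loop even when no element is written in more than one iteration.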


Dynamic Dead Reference Elimination

  • To avoid introducing false dependences, the marking of the read and not-private shadow arrays, Ar and Anp, can be postponed until the value read from the shared variable is actually used.

  • Definition: a dynamic dead read reference in a loop is a read access of a shared variable that does not contribute to the computation of any other shared variable that is live at loop end.

  • The “lazy” marking employed by the LPD test, i.e., the dynamic dead reference elimination technique, allows it to qualify more loops as doall loops than the PD test.


PD Test

Do i=1, 5
  z = A(K(i))
  if (B1(i)) then
    A(L(i)) = z + C(i)
  endif
enddo

Instrumented for the PD test:

Do i=1, 5
  markread(K(i))
  z = A(K(i))
  if (B1(i)) then
    markwrite(L(i))
    A(L(i)) = z + C(i)
  endif
enddo

B1(1:5) = (1 0 1 0 1)
K(1:5) = (1 2 3 4 1)
L(1:5) = (2 2 4 4 2)



LPD Test

Instrumented for the LPD test (the read is marked only when z is actually used):

Do i=1, 5
  z = A(K(i))
  if (B1(i)) then
    markread(K(i))
    markwrite(L(i))
    A(L(i)) = z + C(i)
  endif
enddo

Original loop:

Do i=1, 5
  z = A(K(i))
  if (B1(i)) then
    A(L(i)) = z + C(i)
  endif
enddo

B1(1:5) = (1 0 1 0 1)
K(1:5) = (1 2 3 4 1)
L(1:5) = (2 2 4 4 2)
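The effect of lazy marking on this example can be checked with a short Python sketch that replays the marks for the given B1, K, and L (1-based element indices, as on the slides) and applies the analysis steps. It is a hedged model: it hard-codes this one loop body rather than instrumenting code, represents each shadow array by the set of marked elements, and the names `shadow_marks` and `verdict` are illustrative.

```python
from typing import Set, Tuple

# Example data from the slides (1-based element indices of A).
B1 = [True, False, True, False, True]
K = [1, 2, 3, 4, 1]
L = [2, 2, 4, 4, 2]


def shadow_marks(lazy: bool) -> Tuple[Set[int], Set[int], Set[int], int]:
    """Ar, Aw, Anp (as sets of marked elements) and tw for the example loop.

    In every iteration A(K(i)) is read before A(L(i)) is written, so each
    marked read also sets Anp; a write clears the Ar mark of the same
    element within the same iteration.
    """
    Ar: Set[int] = set()
    Aw: Set[int] = set()
    Anp: Set[int] = set()
    tw = 0
    for i in range(5):
        read_used = B1[i] or not lazy     # lazy: mark the read only if z is used
        r_i = {K[i]} if read_used else set()
        w_i = {L[i]} if B1[i] else set()
        Anp |= r_i
        Ar |= r_i - w_i
        Aw |= w_i
        tw += len(w_i)
    return Ar, Aw, Anp, tw


def verdict(lazy: bool) -> str:
    Ar, Aw, Anp, tw = shadow_marks(lazy)
    if Aw & Ar:                            # step (b)
        return "not doall (flow/anti dependence)"
    if tw == len(Aw):                      # step (c): tm(A) == tw(A)
        return "doall"
    if Aw & Anp:                           # step (d)
        return "not doall (not privatizable)"
    return "doall with privatization"      # step (e)


print("PD: ", verdict(lazy=False))
print("LPD:", verdict(lazy=True))
```

Under these assumptions the PD marking fails at step (b): iterations 2 and 4 read A(2) and A(4), which iterations 1 and 3 write. With lazy marking those dead reads are never marked; A(2) is still written in iterations 1 and 5 (so tw != tm), but no written element is used before being modified, so the loop qualifies as a doall with privatization.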


Run-time Reduction Parallelization

  • Two tasks: recognizing reduction variables and parallelizing the reductions

  • Identification by pattern matching has limits:

    - The data dependence (DD) test needed to qualify a statement as a reduction statement cannot be performed statically in the presence of input-dependent access patterns.

    - Syntactic pattern matching cannot identify all potential reduction variables (e.g., those accessed through subscripted subscripts).
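Once a reduction such as A(R(i)) = A(R(i)) + exp() has been validated, one common way to parallelize it is to give each processor a private, zero-initialized copy of A, accumulate into it, and add the copies into A at the end. A hedged Python sketch of that scheme, with hypothetical names `parallel_reduction` and `expr`, assuming + is the reduction operator and that reordering the floating-point additions is acceptable:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def parallel_reduction(A: List[float], R: List[int], expr: Callable[[int], float],
                       workers: int = 4) -> None:
    """Parallelize  do i: A(R(i)) = A(R(i)) + expr(i)  with private partial sums.

    Each worker accumulates into its own zero-initialized copy of A (0 is the
    identity of +), so workers never write shared data; the partial arrays are
    then added into A.
    """
    n = len(R)
    chunks = [range(w, n, workers) for w in range(workers)]   # cyclic distribution

    def partial(iters: range) -> List[float]:
        acc = [0.0] * len(A)
        for i in iters:
            acc[R[i]] += expr(i)
        return acc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial, chunks))
    for acc in partials:                  # final cross-processor merge
        for x, v in enumerate(acc):
            A[x] += v
```

Because each partial array covers all of A, the subscripts R(i) need not be known in advance; the price is memory proportional to the number of workers times the array size.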


The LRPD Test: Extending the LPD Test for Reduction Validation

(a) Source program

do i = 1, n
  S1: A(K(i)) = ...
  S2: ... = A(L(i))
  S3: A(R(i)) = A(R(i)) + exp()
enddo

(b) Transformed program

doall i = 1, n
  markwrite(K(i))
  markredux(K(i))
  S1: A(K(i)) = ...
  markread(L(i))
  markredux(L(i))
  S2: ... = A(L(i))
  markwrite(R(i))
  S3: A(R(i)) = A(R(i)) + exp()
enddo

Anx: used only to check that the reduction variable is not accessed outside the single reduction statement. The markredux operation sets the corresponding element of the shadow array Anx to true.


LRPD Test

  • Modified Analysis Pass

    - 2(d’) Else, if any(Aw[:] & Anp[:] & Anx[:]), some element of A written in the loop is neither a reduction variable nor privatizable. Thus, the loop is not a doall and the phase ends.

    - 2(e’) Otherwise, the loop is made into a doall by privatization and reduction parallelization.
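The modified analysis can be sketched as the LPD decision steps with step (d) replaced by (d’). A minimal Python sketch, assuming the shadow arrays are lists of booleans, tw is the per-iteration write count summed over all iterations, and Anx holds the markredux marks; the function name `lrpd_analyze` is illustrative, and the shadow arrays in the usage lines are hand-built rather than produced by an instrumented loop.

```python
from typing import List


def lrpd_analyze(Ar: List[bool], Aw: List[bool], Anp: List[bool],
                 Anx: List[bool], tw: int) -> str:
    """Analysis phase of the LRPD test for one shared array A."""
    tm = sum(Aw)                                           # (a)(ii)
    if any(w and r for w, r in zip(Aw, Ar)):               # (b) flow/anti dependence
        return "not doall"
    if tw == tm:                                           # (c) no element written twice
        return "doall"
    if any(w and p and x for w, p, x in zip(Aw, Anp, Anx)):   # (d') neither reduction nor privatizable
        return "not doall"
    return "doall with privatization and reduction parallelization"   # (e')


# Hand-built shadow arrays: element 0 is written in two iterations and used before
# being modified, but never accessed outside the reduction statement (Anx false).
print(lrpd_analyze([False, False], [True, False], [True, False], [False, False], 2))
```

With Anx false for element 0, the call above is accepted even though Aw & Anp is set there, which is exactly the case the unmodified step (d) would reject.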


Performance (1)


Performance (2)



Other Run-time Parallelization Papers

  • “Techniques for Speculative Run-Time Parallelization of Loops”, Manish Gupta and Rahul Nim, SC’98.

    - More efficient run-time array privatization

    - No rollback of the entire loop computation when the test fails: the loop is completed instead (by generating synchronization)

    - Early hazard detection



  • “Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors”, Ye Zhang, Lawrence Rauchwerger, and Josep Torrellas, HPCA 1998.

    - Run-time parallelization techniques are often computationally expensive and not general enough.

    - Idea: execute the code in parallel speculatively and let extended cache-coherence protocol hardware detect any dependence violations.

    - Performance: a speedup of 7.3 on 16 processors, 50% faster than the software-only scheme