The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization

Lawrence Rauchwerger and David A. Padua

PLDI 1995

Presented by Seung-Jai Min

Introduction
  • Motivation

- Current parallelizing compilers cannot handle access patterns that are complex or insufficiently defined at compile time (input-dependent or run-time-dependent conditions, subscripted subscripts, etc.).

  • LRPD Test

- Speculatively executes the loop as a doall

- Applies a fully parallel cross-iteration data dependence test

- If the test fails, the loop is re-executed serially

Inspector-Executor Method
  • Inspector/Executor

- extract and analyze the memory access pattern

- transform the loop if necessary and execute

  • Disadvantage

- Cost and side effects: if the address computation of the array under test depends on the actual data computation, the inspector becomes expensive and may have side effects.

- parallel execution of the inspector loop is not always possible
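
A minimal sketch of the inspector/executor split, in Python. The loop body (A[w[i]] = A[r[i]] + 1), the index arrays r and w, and all function names are hypothetical and not taken from the paper. The inspector can run without side effects here only because the indices do not depend on data the loop computes, which is exactly the situation the disadvantages above rule out.

```python
from concurrent.futures import ThreadPoolExecutor

def inspector(r, w):
    """Return True if no element is written by one iteration and touched by another."""
    writer = {}
    for i, idx in enumerate(w):
        if idx in writer:              # two iterations write the same element
            return False
        writer[idx] = i
    # A read conflicts only if a different iteration writes that element.
    return all(writer.get(idx, i) == i for i, idx in enumerate(r))

def executor(A, r, w, parallel):
    """Run the (hypothetical) loop body either as a doall or serially."""
    def body(i):
        A[w[i]] = A[r[i]] + 1
    if parallel:
        with ThreadPoolExecutor() as pool:
            list(pool.map(body, range(len(w))))
    else:
        for i in range(len(w)):
            body(i)
    return A

def run(A, r, w):
    ok = inspector(r, w)               # inspect first, then choose a schedule
    return ok, executor(list(A), r, w, parallel=ok)
```

For example, run([0, 0, 0, 0], [0, 1], [2, 3]) finds no conflict and executes the iterations in parallel, while run([0, 0, 0], [0, 1], [1, 2]) falls back to the serial loop because the second iteration reads the element that the first one writes.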

Speculative Run-Time Parallelization

[Flow diagram]

Compile time

- Polaris: static analysis

Run time

- Run-time transformations (boxes: checkpoint, reorder, heuristic)

- Speculative parallel execution, followed by the test

- pass: the loop is done

- fail: restore, then sequential execution

Hazards (during the speculative execution)
  • Exceptions

- invalidate the parallel execution

- clear the exception flag, restore the values of any altered variables, and execute serially.

  • Cross-iteration dependencies in the loop

- LRPD Test
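
The control flow around these two hazards might look like the following Python sketch. The function run_speculatively and its callback parameters are hypothetical names, and a deep copy of a dictionary stands in for the checkpoint; a real run-time system would presumably save only the variables the loop can modify.

```python
import copy

def run_speculatively(state, parallel_loop, test_passed, serial_loop):
    """Run parallel_loop speculatively; on any hazard, restore state and run serial_loop."""
    checkpoint = copy.deepcopy(state)       # checkpoint before speculating
    try:
        parallel_loop(state)                # speculative doall execution
        if test_passed(state):              # cross-iteration dependence test
            return "parallel"
    except Exception:
        pass                                # an exception invalidates the parallel run
    state.clear()                           # restore every altered variable
    state.update(checkpoint)
    serial_loop(state)                      # re-execute the loop serially
    return "serial"
```

The caller sees the same final state either way; only the returned label says whether the speculation paid off.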

LPD Test (The Lazy Privatizing Doall Test)

1. Marking Phase

- For each shared array A[1:s], allocate read, write, and not-privatizable shadow arrays Ar[1:s], Aw[1:s], and Anp[1:s]

(a) Uses: if this array element has not been modified in this iteration, set the corresponding elements in Ar and Anp

(b) Defs: set the corresponding element in Aw, and clear it in Ar if it is set

(c) twi(A): count the write accesses to A that are marked in this iteration (i is the iteration number)
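
A minimal Python sketch of the marking rules (a) to (c), simulated one iteration at a time. The class name Shadow, the 0-based indexing, and the helper sets are assumptions made for illustration. In particular, reading "clear in Ar if set" as "clear only if this same iteration set it" is an interpretation, chosen so that a read-only mark left by another iteration is not lost. A parallel implementation would also need per-processor shadow copies or synchronization, which is not shown.

```python
class Shadow:
    """Shadow arrays for one shared array A with s elements (0-based here)."""

    def __init__(self, s):
        self.ar = [False] * s       # read and not written in the same iteration
        self.aw = [False] * s       # written
        self.anp = [False] * s      # read before being written in some iteration
        self.tw = 0                 # running sum of the per-iteration counts twi(A)
        self._written = set()       # elements written in the current iteration
        self._read = set()          # elements whose Ar mark this iteration created

    def begin_iteration(self):
        self._written.clear()
        self._read.clear()

    def markread(self, j):
        if j not in self._written:  # (a) use with no earlier def in this iteration
            if not self.ar[j]:
                self._read.add(j)
            self.ar[j] = True
            self.anp[j] = True

    def markwrite(self, j):
        if j not in self._written:  # (c) count each element once per iteration
            self._written.add(j)
            self.tw += 1
        self.aw[j] = True           # (b) def
        if j in self._read:         # assumption: clear only marks set by this iteration
            self.ar[j] = False
            self._read.discard(j)

# Made-up trace: iteration 0 reads then writes element 0, iteration 1 reads
# element 1, iteration 2 writes then reads element 2.
sh = Shadow(3)
sh.begin_iteration(); sh.markread(0); sh.markwrite(0)
sh.begin_iteration(); sh.markread(1)
sh.begin_iteration(); sh.markwrite(2); sh.markread(2)
```

After this trace only element 1 remains marked in Ar, elements 0 and 2 are marked in Aw, elements 0 and 1 are marked in Anp, and tw is 2.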


2. Analysis Phase (performed after the speculative execution)

(a) Compute

(i) tw(A) = Σ twi(A), summed over all iterations i

(ii) tm(A) = sum(Aw[1:s])

(iii) If tm(A) != tw(A), there is a cross-iteration output dependence

(b) If any(Aw[:] & Ar[:]), the loop is not a doall and the phase ends: a def and a use of the same location occur in different iterations (flow or anti dependence)


(c) Else if tw(A) == tm(A), the loop is a doall (without privatizing the array A)

(d) Else if any(Aw[:] & Anp[:]), the array A is not privatizable, so the loop is not a doall (there is at least one iteration in which some element of A was used before it was modified)

(e) Otherwise, the loop was made into a doall by privatizing the shared array A
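
The decision sequence (a) to (e) can be written as a short Python function. This is a sketch: the name lpd_analysis, the list representation of the shadow arrays, and the returned strings are choices made here, not part of the paper.

```python
def lpd_analysis(aw, ar, anp, tw):
    """Apply steps (a)-(e) to the shadow arrays of one shared array A."""
    tm = sum(1 for w in aw if w)                  # (a) distinct elements written
    if any(w and r for w, r in zip(aw, ar)):      # (b) flow or anti dependence
        return "not doall"
    if tw == tm:                                  # (c) no output dependence
        return "doall"
    if any(w and n for w, n in zip(aw, anp)):     # (d) A is not privatizable
        return "not doall"
    return "doall with privatization"             # (e)
```

For instance, lpd_analysis([1, 1, 0], [0, 0, 1], [0, 0, 1], 2) returns "doall", because no written element is also marked as read and every element is written by only one iteration.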

Dynamic dead reference elimination
  • To avoid introducing false dependences, the marking of the read and not-privatizable shadow arrays, Ar and Anp, can be postponed until the value of the shared variable is actually used.
  • Definition: a dynamic dead read reference in a loop is a read access of a shared variable that does not contribute to the computation of any other shared variable that is live at loop end.
  • The “lazy” marking employed by the LPD test, i.e., the dynamic dead reference elimination technique, allows it to qualify more loops than the PD test.
PD Test

Original loop:

do i = 1, 5
  z = A(K(i))
  if (B1(i) .eq. .true.) then
    A(L(i)) = z + C(i)
  endif
enddo

Loop instrumented for the PD test:

do i = 1, 5
  markread(K(i))
  z = A(K(i))
  if (B1(i) .eq. .true.) then
    markwrite(L(i))
    A(L(i)) = z + C(i)
  endif
enddo

Input data:

B1(1:5) = (1 0 1 0 1)
K(1:5) = (1 2 3 4 1)
L(1:5) = (2 2 4 4 2)


LPD Test

Loop instrumented for the LPD test (markread is deferred until z is actually used):

do i = 1, 5
  z = A(K(i))
  if (B1(i) .eq. .true.) then
    markread(K(i))
    markwrite(L(i))
    A(L(i)) = z + C(i)
  endif
enddo

The original loop and the input data B1, K, and L are the same as for the PD test.
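
To see how the two marking schemes differ on the example data, the following Python sketch simulates both for the five iterations and prints the quantities that the analysis phase inspects. The function mark and its lazy flag are hypothetical names; shadow arrays are represented as sets of marked element numbers, and the iterations are simulated sequentially.

```python
B1 = [1, 0, 1, 0, 1]
K = [1, 2, 3, 4, 1]
L = [2, 2, 4, 4, 2]

def mark(lazy):
    """Return (Ar, Aw, Anp, tw): sets of marked element numbers plus the write count."""
    Ar, Aw, Anp, tw = set(), set(), set(), 0
    for b, k, l in zip(B1, K, L):
        set_here = None                 # element whose Ar mark this iteration created
        if b or not lazy:               # PD marks every read; LPD only reads that are used
            if k not in Ar:
                set_here = k
            Ar.add(k)
            Anp.add(k)
        if b:
            Aw.add(l)
            tw += 1                     # one marked write in this iteration
            if l == set_here:           # read first in this iteration: not read-only
                Ar.discard(l)
    return Ar, Aw, Anp, tw

pd = mark(lazy=False)
lpd = mark(lazy=True)
for name, (Ar, Aw, Anp, tw) in (("PD", pd), ("LPD", lpd)):
    print(name, "Aw&Ar =", sorted(Aw & Ar), "tw =", tw, "tm =", len(Aw),
          "Aw&Anp =", sorted(Aw & Anp))
```

Under PD marking, elements 2 and 4 appear in both Aw and Ar, so the test fails at step (b). Under LPD marking the dead reads in iterations 2 and 4 are never marked, both Aw & Ar and Aw & Anp are empty, and because tw (3) differs from tm (2) the loop qualifies as a doall only with A privatized.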

Run-time Reduction Parallelization
  • Two steps: recognizing the reduction variable and parallelizing the reduction
  • Pattern matching identification

- The data dependence (DD) test that qualifies a statement as a reduction statement cannot be performed statically in the presence of input-dependent access patterns.

- Syntactic pattern matching cannot identify all potential reduction variables (e.g., with subscripted subscripts).
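
Once a statement has been validated as a reduction, one common way to parallelize it is to let each worker accumulate into a private partial result and merge the partial results afterwards. The Python sketch below illustrates this for a sum reduction through a subscripted subscript; the function name, the cyclic split of iterations over workers, and the dictionary accumulators are assumptions for illustration, and the scheme relies on the reduction operator being associative and commutative.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum_reduction(A, R, values, workers=2):
    """Compute A[R[i]] += values[i] for all i using private partial accumulators."""
    n = len(R)
    chunks = [range(p, n, workers) for p in range(workers)]

    def partial(chunk):
        private = {}                        # private accumulator, never shared
        for i in chunk:
            private[R[i]] = private.get(R[i], 0) + values[i]
        return private

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial, chunks))

    result = list(A)
    for private in partials:                # merge step, done sequentially here
        for j, v in private.items():
            result[j] += v
    return result
```

With integer data the result does not depend on how iterations are split; with floating-point data the reordering may change rounding.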

The LRPD Test : Extending the LPD Test for Reduction Validation

(a) Source program

do i = 1, n
  S1: A(K(i)) = ………
  S2: ……… = A(L(i))
  S3: A(R(i)) = A(R(i)) + exp()
enddo

(b) Transformed program

doall i = 1, n
  markwrite(K(i))
  markredux(K(i))
  S1: A(K(i)) = ………
  markread(L(i))
  markredux(L(i))
  S2: ……… = A(L(i))
  markwrite(R(i))
  S3: A(R(i)) = A(R(i)) + exp()
enddo

Anx: a shadow array used only to check that the reduction variable is not accessed outside the single reduction statement. The markredux operation sets the corresponding element of Anx to true.

LRPD Test
  • Modified Analysis Phase

- 2(d’) Else if any(Aw[:] & Anp[:] & Anx[:]), then some element of A written in the loop is neither a reduction variable nor privatizable. Thus, the loop is not a doall and the phase ends.

- 2(e’) Otherwise, the loop was made into a doall by parallelizing reductions and privatizing the shared array A.
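
The modified analysis can be sketched in Python as the LPD decision sequence with the extra Anx term in step 2(d’). The function name, the list representation, and the returned strings are assumptions made here. The function takes already-marked shadow arrays as input, since the slides do not spell out how the reduction statement itself updates Ar and Anp.

```python
def lrpd_analysis(aw, ar, anp, anx, tw):
    """Apply steps 2(b), 2(c), 2(d') and 2(e') to the shadow arrays of one array A."""
    tm = sum(1 for w in aw if w)                             # distinct elements written
    if any(w and r for w, r in zip(aw, ar)):                 # 2(b)
        return "not doall"
    if tw == tm:                                             # 2(c)
        return "doall"
    if any(w and n and x for w, n, x in zip(aw, anp, anx)):  # 2(d')
        return "not doall"
    return "doall with privatization and reduction parallelization"  # 2(e')
```

With made-up inputs, lrpd_analysis([1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1], 5) passes through to 2(e’): element 0 is written and not privatizable but was never flagged in Anx, so it still counts as a valid reduction variable. Setting Anx for element 0 makes the same call fail at 2(d’).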

Other Run-time Parallelization Papers
  • “Techniques for Speculative Run-Time Parallelization of Loops”, Manish Gupta and Rahul Nim, SC’98.

- More efficient run-time array privatization

- No rollback of the entire loop computation; the loop is completed by generating synchronization

- Early hazard detection

  • “Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors”, Ye Zhang, Lawrence Rauchwerger, and Josep Torrellas, HPCA 1998.

- Run-time parallelization techniques are often computationally expensive and not general enough.

- Idea: execute the code in parallel speculatively and let extended cache coherence protocol hardware detect any dependence violations.

- Performance: speedup of 7.3 on 16 processors, and 50% faster than the software-only scheme