
An Evaluation of Auto-Scoping in OpenMP

Michael Voss, Eric Chiu, Patrick Chow, Catherine Wong and Kevin Yuen

ECE Department

University of Toronto



An Overview of Auto-scoping

  • Dieter an Mey proposed Auto-scoping as an extension to OpenMP (www.cOMPunity.org)

  • Relieves users of the burden of explicit scoping, which is

    • error-prone

    • tedious

  • A compromise between explicit and automatic parallelization

  • The required analysis is similar to that of automatic parallelization

    • successful in 1 of 2 scientific programs

WOMPAT 2004


Using DEFAULT(AUTO)

With explicit scoping:

C$OMP PARALLEL DO SHARED(A,B)
C$OMP&PRIVATE(I,J)
      DO I = 1,100
        DO J = 1,100
          A(I,J) = A(J,I)
     &           + B(I,J)
        ENDDO
      ENDDO
C$OMP END PARALLEL DO

With DEFAULT(AUTO):

C$OMP PARALLEL DO
C$OMP&DEFAULT(AUTO)
      DO I = 1,100
        DO J = 1,100
          A(I,J) = A(J,I)
     &           + B(I,J)
        ENDDO
      ENDDO
C$OMP END PARALLEL DO


Outline of Talk

  • Introduction

  • Implementing DEFAULT(AUTO) in Polaris

  • An evaluation of DEFAULT(AUTO) in Polaris

    • comparison with the Early Access (EA) Sun Studio 9 Fortran 95 compiler

  • A discussion of runtime support

  • Related Work

  • Conclusion


Implementing DEFAULT(AUTO) in Polaris

  • Polaris is an auto-parallelizer for Fortran 77

  • Supports a range of advanced techniques

    • The Range Test

    • The Omega Test

    • Array and Scalar Privatization

    • Array and Scalar Reduction Recognition

    • Induction Variable Substitution

    • Interprocedural Constant Propagation

    • Most interprocedural optimization done by inlining


Polaris as an OMP to OMP Translator

[Figure: Polaris compilation paths]

  • Original automatic parallelization path: Fortran 77 → Parser → DDtest pass → Reduction pass → Privatization pass → OpenMP Backend → Fortran 77 + OpenMP

  • OpenMP to explicitly threaded code path: Fortran 77 + OpenMP → Parser → Moerae Backend → Fortran 77 + Moerae calls

  • New OpenMP to OpenMP path: Fortran 77 + OpenMP → Fortran 77 + OpenMP


Supporting DEFAULT(AUTO)

  • Parse DEFAULT(AUTO)

  • React appropriately to user directives

    • selective loop parallelization

    • regions without DEFAULT(AUTO) are left unchanged

    • user scoping overrides Polaris scoping

      • can parallelize loops that cannot be fully auto-scoped

  • Limitations

    • only regions with PARALLEL DO semantics

    • bails out on general parallel regions
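The override rules above can be modelled in a few lines. This is a minimal, hypothetical Python sketch, not Polaris code: `resolve_scoping`, its dictionaries and the `UNKNOWN` marker are illustrative names, and we assume that a loop with a variable neither the user nor the analysis can scope is simply left serial (the slides do not say exactly how Polaris reacts in that case).

```python
from typing import Dict, Iterable, Optional

UNKNOWN = "UNKNOWN"  # hypothetical marker: the analysis could not scope the variable


def resolve_scoping(variables: Iterable[str],
                    user: Dict[str, str],
                    inferred: Dict[str, str],
                    default_auto: bool) -> Optional[Dict[str, str]]:
    """Combine user-supplied and analysis-inferred scopes for one region.

    Returns the final scope of every variable, or None when some variable
    is left unscoped (assumed here to mean the loop stays serial).
    """
    if not default_auto:
        # No DEFAULT(AUTO): the region is passed through unchanged.
        return dict(user)
    final: Dict[str, str] = {}
    for v in variables:
        if v in user:
            # Explicit user scoping always overrides the analysis.
            final[v] = user[v]
        elif inferred.get(v, UNKNOWN) != UNKNOWN:
            final[v] = inferred[v]
        else:
            return None
    return final


# Illustrative analysis result: D cannot be scoped automatically.
inferred = {"I": "PRIVATE", "T": "PRIVATE", "S": "REDUCTION(+)", "D": UNKNOWN}
print(resolve_scoping(["I", "T", "S", "D"], {"D": "SHARED"}, inferred, True))
print(resolve_scoping(["I", "T", "S", "D"], {}, inferred, True))  # None
```

With a user-supplied SHARED(D), every variable ends up scoped and the loop can run in parallel; without it, the same analysis result leaves the region unscoped.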


Example 1: No explicit scoping

!$OMP PARALLEL DEFAULT(AUTO)
      DO N = 1,7
        DO M = 1,7
!$OMP DO
          DO L = LSS(itsub),LEE(itsub)
            I = IG(L)
            J = JG(L)
            K = KG(L)
            LIJK = L2IJK(L)
            RHS(L,M) = RHS(L,M)
     +        - FJAC(LIJK,LM00,M,N)*DQCO(i-1,j,k,n,NB)*FM00(L)
     +        - FJAC(LIJK,LP00,M,N)*DQCO(i+1,j,k,n,NB)*FP00(L)
     +        - FJAC(LIJK,L0M0,M,N)*DQCO(i,j-1,k,n,NB)*F0M0(L)
     +        - FJAC(LIJK,L0P0,M,N)*DQCO(i,j+1,k,n,NB)*F0P0(L)
          ENDDO
!$OMP END DO NOWAIT
        ENDDO
      ENDDO
!$OMP END PARALLEL


Example 1: No explicit scoping (Polaris output)

!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(M,L,N)
      DO n = 1, 7, 1
      DO m = 1,7, 1
!$OMP DO
      DO l = lss(itsub), lee(itsub), 1
      rhs(l, m) = rhs(l, m)+(-dqco(ig(l), (-1)+jg(l), kg(l), n, nb))*
     *f0m0(l)*fjac(l2ijk(l), l0m0, m, n)+(-dqco(ig(l), 1+jg(l), kg(l), n
     *, nb))*f0p0(l)*fjac(l2ijk(l), l0p0, m, n)+(-dqco((-1)+ig(l), jg(l)
     *, kg(l), n, nb))*fjac(l2ijk(l), lm00, m, n)*fm00(l)+(-dqco(1+ig(l)
     *, jg(l), kg(l), n, nb))*fjac(l2ijk(l), lp00, m, n)*fp00(l)
      ENDDO
!$OMP END DO NOWAIT
      ENDDO
      ENDDO
!$OMP END PARALLEL
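Polaris's output no longer mentions I, J, K or LIJK: their defining expressions have been substituted into the RHS statement, which is why the PRIVATE clause lists only M, L and N. A toy Python sketch of that scalar forward substitution, working on source text with regular expressions; `forward_substitute` is a hypothetical name, the matching is case-sensitive although Fortran is not, and a real compiler works on its IR and must first prove the substitution is legal.

```python
import re
from typing import Dict


def forward_substitute(expr: str, defs: Dict[str, str]) -> str:
    """Replace each whole-word use of a scalar temporary by its definition.

    Assumes (as a real compiler would have to prove) that the right-hand
    sides are not modified between definition and use, and that the
    temporaries are not needed after the loop.
    """
    for name, value in defs.items():
        # \b keeps I from matching inside LIJK or IG; the lambda stops
        # re.sub from interpreting characters in value as escapes.
        expr = re.sub(rf"\b{re.escape(name)}\b", lambda _m: value, expr)
    return expr


defs = {"I": "IG(L)", "J": "JG(L)", "K": "KG(L)", "LIJK": "L2IJK(L)"}
term = "FJAC(LIJK,LM00,M,N)*DQCO(I-1,J,K,N,NB)*FM00(L)"
print(forward_substitute(term, defs))
# FJAC(L2IJK(L),LM00,M,N)*DQCO(IG(L)-1,JG(L),KG(L),N,NB)*FM00(L)
```

Once every use has been replaced, the assignments to the temporaries are dead, so there is nothing left to scope for them.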


Example 2: Explicit scoping

      SUBROUTINE RECURSION(n,k,a,b,c,d,e,f,g,h,s)
      REAL*8 A(*),B(*),C(*),D(*),E(*),F(*),G(*),H(*)
      REAL*8 T,S
      INTEGER N,K,I
      S = 0.0D0
C$OMP PARALLEL SHARED(D)
C$OMP+DEFAULT(AUTO)
C$OMP DO
      DO I = 1,N
        T = F(I) + G(I)
        A(I) = B(I) + C(I)
        D(I+K) = D(I) + E(I)
        H(I) = H(I) * T
        S = S + H(I)
      END DO
C$OMP END DO
C$OMP END PARALLEL
      END
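D is scoped explicitly here, presumably because whether it may be shared depends on K, which is unknown at compile time. For subscripts of the form I + c in DO I = 1, N, a write to D(I+K) and a read of D(I) touch the same element in two different iterations exactly when the distance d = K satisfies 0 < |d| < N, whereas A(I) and H(I) have distance 0 and are safe to share. A minimal Python sketch of this distance check with a brute-force cross-check; the function names are hypothetical, and this simple test only stands in for the more general Range and Omega tests Polaris implements.

```python
def conflicts(write_off: int, read_off: int, n: int) -> bool:
    """Closed form: do X(I+write_off) (written) and X(I+read_off) (read)
    refer to the same element in two different iterations of DO I = 1, N?"""
    d = write_off - read_off
    return d != 0 and abs(d) < n


def conflicts_brute_force(write_off: int, read_off: int, n: int) -> bool:
    """Check every pair of distinct iterations explicitly."""
    for i_w in range(1, n + 1):
        for i_r in range(1, n + 1):
            if i_w != i_r and i_w + write_off == i_r + read_off:
                return True
    return False


# D(I+K) = D(I) + E(I): write offset K, read offset 0, here with N = 5.
for k in range(-6, 7):
    assert conflicts(k, 0, 5) == conflicts_brute_force(k, 0, 5)
print([k for k in range(-6, 7) if conflicts(k, 0, 5)])
# [-4, -3, -2, -1, 1, 2, 3, 4]
```

Only K = 0 or |K| >= N makes the loop safe, so an analysis that cannot bound K must leave D unscoped.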


Example 2: Explicit scoping (Polaris output)

      SUBROUTINE recursion(n, k, a, b, c, d, e, f, g, h, s)
      DOUBLE PRECISION a, b, c, d, e, f, g, h, s, t
      INTEGER*4 i, k, n
      DIMENSION a(*), b(*), c(*), d(*), e(*), f(*), g(*), h(*)
      s = 0.0D0
!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(T,I)
!$OMP DO
!$OMP+REDUCTION(+:s)
      DO i = 1, n, 1
      t = f(i)+g(i)
      a(i) = b(i)+c(i)
      d(i+k) = d(i)+e(i)
      h(i) = h(i)*t
      s = h(i)+s
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
      RETURN
      END
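The scalar clauses in this output follow simple rules: T is written before it is read in every iteration (PRIVATE), S is only updated as S = S + expression (REDUCTION), and K and N are only read (shared via DEFAULT(SHARED)); the loop index I is private under OpenMP's own rules. A simplified Python sketch of such rules, assuming a straight-line loop body summarized as the ordered accesses to each scalar; `classify_scalar` and the access codes are hypothetical, and real auto-scoping must also handle control flow, procedure calls and values that are live after the loop.

```python
from typing import List

# Access codes for one scalar within one iteration, in program order:
#   "R" = plain read, "W" = plain write, "+=" = update of the form X = X + expr
def classify_scalar(accesses: List[str]) -> str:
    if all(a == "R" for a in accesses):
        return "SHARED"          # never written inside the loop
    if all(a == "+=" for a in accesses):
        return "REDUCTION(+)"    # only accumulated into
    if accesses[0] == "W" and "+=" not in accesses:
        return "PRIVATE"         # defined before any use in every iteration
    return "UNKNOWN"             # these rules cannot scope it


body = {
    "T": ["W", "R"],   # T = F(I) + G(I), then H(I) = H(I) * T
    "S": ["+="],       # S = S + H(I)
    "K": ["R"],        # read in D(I+K)
    "N": [],           # used only in the loop bounds
}
print({v: classify_scalar(a) for v, a in body.items()})
# {'T': 'PRIVATE', 'S': 'REDUCTION(+)', 'K': 'SHARED', 'N': 'SHARED'}
```

A scalar whose first access in an iteration is a read of a value written later (for example ["R", "W"]) carries a value across iterations and comes back as UNKNOWN.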


Evaluation of DEFAULT(AUTO)

  • Fortran 77 benchmarks from SPEC OpenMP

    • removed all explicit scoping

    • added DEFAULT(AUTO) to all regions

    • used the Omni OpenMP compiler as the backend (-O2)

  • Speedup with explicit scoping vs. speedup with auto-scoping

    • four-processor Xeon server

    • 1.8 GHz processors, 16 GB main memory

    • Hyper-Threading capable, but only 1 thread per CPU used

  • Also used the EA Sun Studio 9 Fortran 95 compiler

    • supports DEFAULT(__AUTO)

    • reports the number of regions it auto-scoped


Performance of Auto-scoping

Sun results are for the Early Access Version of the Sun Microsystems Studio 9 Fortran 95 compiler.


Discussion

  • Many regions were not fully analyzable

    • Polaris could not fully inline the regions

    • several regions were general parallel regions

  • Early Access Sun Studio 9 compiler

    • auto-scoped fewer regions in general

    • missed important regions in Swim and Mgrid

    • regions could be parallelized but not auto-scoped

  • Sun compiler could auto-scope some regions that Polaris could not

    • can analyze general parallel regions


A general parallel region from Wupwise: Polaris fails but the Sun compiler succeeds

C$OMP PARALLEL DEFAULT(AUTO)
      LSCALE = ZERO
      LSSQ = ONE
C$OMP DO
      DO IX = 1, 1 + (N - 1) *INCX, INCX
        IF (DBLE (X(IX)) .NE. ZERO) THEN
          ...
          LSSQ = ONE + LSSQ* (LSCALE / TEMP) ** 2
          LSCALE = TEMP
        END IF
        ...
      END DO
C$OMP END DO
C$OMP CRITICAL
      IF (SCALE .LT. LSCALE) THEN
        SSQ = ((SCALE / LSCALE) ** 2) * SSQ + LSSQ
        SCALE = LSCALE
      ELSE
        SSQ = SSQ + ((LSCALE / SCALE) ** 2) * LSSQ
      END IF
C$OMP END CRITICAL
C$OMP END PARALLEL
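Scoping this region correctly requires looking at the whole region rather than the loop alone: LSCALE and LSSQ hold per-thread partial results and must be private, while SCALE and SSQ are shared and updated only inside the CRITICAL section. The Python sketch below mimics that structure with threads and a lock on real-valued data. The per-element update is the standard LAPACK-style scaled sum-of-squares recurrence, assumed here because the slide elides part of the loop body, and the early return for an all-zero chunk is a guard added by this sketch; all function names are hypothetical.

```python
import math
import threading
from typing import List, Sequence, Tuple


def local_ssq(chunk: Sequence[float]) -> Tuple[float, float]:
    """Per-thread loop: returns (LSCALE, LSSQ) with LSCALE**2 * LSSQ == sum of x**2."""
    lscale, lssq = 0.0, 1.0                # LSCALE = ZERO, LSSQ = ONE
    for x in chunk:
        if x != 0.0:
            temp = abs(x)
            if lscale < temp:
                lssq = 1.0 + lssq * (lscale / temp) ** 2
                lscale = temp
            else:
                lssq += (temp / lscale) ** 2
    return lscale, lssq


def merge(scale: float, ssq: float, lscale: float, lssq: float) -> Tuple[float, float]:
    """Body of the CRITICAL section: fold one thread's result into SCALE/SSQ."""
    if lscale == 0.0:                      # all-zero or empty chunk (guard added here)
        return scale, ssq
    if scale < lscale:
        return lscale, (scale / lscale) ** 2 * ssq + lssq
    return scale, ssq + (lscale / scale) ** 2 * lssq


def parallel_norm(xs: List[float], nthreads: int = 4) -> float:
    shared = {"scale": 0.0, "ssq": 1.0}    # SCALE, SSQ: shared
    lock = threading.Lock()                # stands in for C$OMP CRITICAL

    def worker(chunk: List[float]) -> None:
        lscale, lssq = local_ssq(chunk)    # LSCALE, LSSQ: private
        with lock:
            shared["scale"], shared["ssq"] = merge(
                shared["scale"], shared["ssq"], lscale, lssq)

    # Any split of the iterations works; a cyclic split is used here.
    threads = [threading.Thread(target=worker, args=(xs[t::nthreads],))
               for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["scale"] * math.sqrt(shared["ssq"])


print(parallel_norm([3.0, 4.0, 0.0, 12.0]))
```

If LSCALE or LSSQ were shared instead, the threads would overwrite each other's partial sums; the scaling also keeps the result finite for inputs such as [1e200, 1e200], where a plain sum of squares overflows.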


Runtime Support for Auto-scoping

  • Add a SPECULATE clause for regions that cannot be auto-scoped

  • Applies to very few regions in SPEC OpenMP

    • requires interprocedural marking of reads/writes

    • only 2 of the regions that were not auto-scoped can be fully analyzed

!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(U51K,U41K,U31K,Q,U21K,M,K,I,U41,U31KM1,U51KM1,U21KM1)
!$OMP+PRIVATE(U41KM1,TMP,J)
!$OMP+SPECULATE(UTMP,RTMP)
!$OMP DO
!$OMP+LASTPRIVATE(FLUX2)
      DO j = jst, jend, 1
      ...
      ENDDO
!$OMP END DO
!$OMP END PARALLEL

(a region from the RHS subroutine of Applu)
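The slides do not spell out how SPECULATE would be validated at run time. One approach from the runtime dependence-testing work cited under related work (for example Rauchwerger's LRPD test) marks, for each element of a speculated array, which iterations read and wrote it, and checks the marks after the parallel execution. The Python sketch below is a simplified, hypothetical post-execution check over a recorded access trace, not the test itself. It returns SHARED if no element is touched by two iterations with at least one write, and PRIVATE if every iteration writes each element it touches before reading it (ignoring values needed after the loop). It returns FAIL, meaning the loop would be re-executed serially, in all other cases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Access = Tuple[int, str, int]  # (iteration, "R" or "W", array index)


def check_speculation(trace: Iterable[Access]) -> str:
    """trace must list each iteration's accesses in program order."""
    writers: Dict[int, Set[int]] = defaultdict(set)
    touchers: Dict[int, Set[int]] = defaultdict(set)
    first_op: Dict[Tuple[int, int], str] = {}
    for it, op, idx in trace:
        touchers[idx].add(it)
        if op == "W":
            writers[idx].add(it)
        first_op.setdefault((it, idx), op)  # first access to idx in iteration it

    conflict = any(writers[idx] and len(its) > 1 for idx, its in touchers.items())
    if not conflict:
        return "SHARED"    # no element written by one iteration and used by another
    if all(op == "W" for op in first_op.values()):
        return "PRIVATE"   # each iteration defines every element before using it
    return "FAIL"          # possible cross-iteration dependence: rerun serially


print(check_speculation([(1, "W", 1), (2, "W", 2)]))                             # SHARED
print(check_speculation([(1, "W", 0), (1, "R", 0), (2, "W", 0), (2, "R", 0)]))   # PRIVATE
print(check_speculation([(1, "W", 2), (2, "R", 2)]))                             # FAIL
```

The check is conservative: an element that is only read, in a region that also needs privatization, makes it report FAIL even though a FIRSTPRIVATE-style copy would be safe.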


Related Work

  • DEFAULT(AUTO) proposed by Dieter an Mey

  • Many commercial and research auto-parallelizers

    • Polaris, SUIF, CAPO, …

    • Perform parallelization and scoping

  • The EA Sun Studio 9 Fortran 95 Compiler

    • paper also presented here at WOMPAT

    • thanks to Yuan Lin for pointing me to it

  • Runtime dependence testing

    • Saltz, Rauchwerger, …


Conclusion

  • Implemented DEFAULT(AUTO) in Polaris

    • created full OpenMP to OpenMP translator

    • added facilities for auto-scoping

  • Evaluated implementation

    • 2 of 5 benchmarks fully auto-scoped

    • remainder showed significant loss of speedup

    • results differ from those of the EA Sun compiler

      • performance not portable across compilers

  • Discussed speculative parallelization support


Conclusion (cont.)

  • Combination of loop and region analyzers

    • Polaris auto-scoped more regions

    • Sun compiler can handle general regions

  • Performance may not be portable across compilers

    • never is but…

    • sacrifice performance for convenience

    • perhaps a useful tool during manual parallelization

  • Future work

    • general region support in Polaris
