High Performance Fortran (HPF)



Presentation Transcript


  1. High Performance Fortran (HPF) Source: Chapter 7 of "Designing and Building Parallel Programs" (Ian Foster, 1995)

  2. Question
  • Can't we just have a clever compiler generate a parallel program from a sequential program?
  • Fine-grained parallelism:
        x = a*b + c*d
  • Trivial parallelism:
        for i := 1 to 100 do
          for j := 1 to 100 do
            C[i,j] := dotproduct(A[i,*], B[*,j]);
          od
        od

  3. Automatic parallelism
  Automatic parallelization of arbitrary programs is extremely hard. Solutions:
  • Make restrictions on the source program
  • Restrict the kind of parallelism used
  • Use a semi-automatic approach
  • Use application-domain oriented languages

  4. High Performance Fortran (HPF)
  • Designed by a forum of industry, government, and universities
  • Extends Fortran 90
  • Intended for computationally expensive numerical applications
  • Portable to SIMD machines, vector processors, shared-memory MIMD, and distributed-memory MIMD

  5. Fortran 90 - Base language of HPF
  Extends Fortran 77 with 'modern' features:
  • abstract data types, modules
  • recursion
  • pointers, dynamic storage
  Array operators:
        A = B + C
        A = A + 1.0
        A(1:7) = B(1:7) + B(2:8)
        WHERE (X /= 0) X = 1.0/X
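  To make the array syntax concrete, here is a minimal, self-contained Fortran 90 sketch; the array names and contents are illustrative, not from the slides:

        program array_ops
          implicit none
          real :: A(8), B(8), X(8)
          integer :: i

          B = (/ (real(i), i = 1, 8) /)      ! B = 1.0, 2.0, ..., 8.0
          X = (/ 0.0, 2.0, 0.0, 4.0, 0.0, 6.0, 0.0, 8.0 /)

          A = B + 1.0                        ! elementwise, no explicit loop
          A(1:7) = B(1:7) + B(2:8)           ! shifted array sections
          WHERE (X /= 0) X = 1.0/X           ! masked elementwise update

          print *, A
          print *, X
        end program array_ops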

  6. Data parallelism
  • Data parallelism: the same operation applied to different data elements in parallel
  • Data parallel program: a sequence of data parallel operations
  • Overall approach:
    - Programmer does the domain decomposition
    - Compiler partitions the operations automatically
  • Data may be regular (array) or irregular (tree, sparse matrix)
  • Most data parallel languages only deal with arrays

  7. Data parallelism - Concurrency
  Explicit parallel operations:
        A = B + C    ! A, B, and C are arrays
  Implicit parallelism:
        do i = 1,m
          do j = 1,n
            A(i,j) = B(i,j) + C(i,j)
          enddo
        enddo

  8. Compiling data parallel programs
  • Programs are translated automatically into parallel SPMD (Single Program Multiple Data) programs
  • Each processor executes the same program on a subset of the data
  • Owner-computes rule:
    - Each processor owns a subset of the data structures
    - Operations required for an element are executed by its owner
    - Each processor may read (but not modify) elements owned by others
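  As a rough illustration of owner-computes, the generated SPMD code turns a global loop into a loop over the locally owned index range. This is a hand-written sketch, not actual compiler output; myid (0-based) and nprocs are assumed to be supplied by the runtime system:

        ! SPMD sketch of the owner-computes rule (illustrative only).
        integer, parameter :: N = 100
        integer :: myid, nprocs, lb, ub, i
        real :: A(N), B(N), C(N)

        ! ... myid and nprocs are set by the runtime system ...
        lb = myid * (N / nprocs) + 1      ! first index this processor owns
        ub = (myid + 1) * (N / nprocs)    ! last index (assumes nprocs divides N)

        do i = lb, ub                     ! update only the locally owned block
          A(i) = B(i) + C(i)              ! B(i), C(i) are local if aligned with A;
        end do                            ! a stencil access like B(i-1) would need
                                          ! communication at block boundaries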

  9. Example
        real s, X(100), Y(100)    ! s is scalar, X and Y are arrays
        X = X * 3.0               ! multiply each X(i) by 3.0
        do i = 2,99
          Y(i) = (X(i-1) + X(i+1))/2    ! communication required
        enddo
        s = SUM(X)                ! communication required
  • X and Y are distributed (partitioned)
  • s is replicated on each machine

  10. HPF primitives for data distribution
  • Directives:
    - PROCESSORS: shape & size of abstract processors
    - ALIGN: align elements of different arrays
    - DISTRIBUTE: distribute (partition) an array
  • Directives affect the performance of the program, not its result
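  The three directives are typically used together. A minimal combined sketch (the array and processor names are made up; each directive is covered individually on the following slides):

  !HPF$ PROCESSORS procs(4)                ! 4 abstract processors
        real A(100), B(100)
  !HPF$ ALIGN B(I) WITH A(I)               ! B(i) lives with A(i)
  !HPF$ DISTRIBUTE A(BLOCK) ONTO procs     ! A(1:25) on procs(1), etc.
        A = A + B                          ! runs with no communication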

  11. Processors directive
        !HPF$ PROCESSORS P(32)
        !HPF$ PROCESSORS Q(4,8)
  • The mapping of abstract to physical processors is not specified in HPF (implementation-dependent)

  12. Alignment directive
  • Aligns an array with another array
  • Specifies that specific elements should be mapped to the same processor
        real A(50), B(50)
        !HPF$ ALIGN A(I) WITH B(I)      ! A(1) on same CPU as B(1), etc.
        !HPF$ ALIGN A(I) WITH B(I+2)    ! A(1) on same CPU as B(3), etc.

  13. Distribution directive
  • Specifies how elements should be partitioned among the local memories
  • Each dimension can be distributed as follows:
    - *           no distribution
    - BLOCK (n)   block distribution
    - CYCLIC (n)  cyclic distribution
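  A sketch of the per-dimension options on a two-dimensional array (the names are illustrative):

  !HPF$ PROCESSORS procs(4)
        real M(100, 100)
  !HPF$ DISTRIBUTE M(BLOCK, *) ONTO procs    ! rows 1:25 on procs(1), etc.;
                                             ! columns are not distributed
  ! With CYCLIC instead, e.g. !HPF$ DISTRIBUTE M(CYCLIC, *) ONTO procs,
  ! row i would go to procs(mod(i-1,4)+1).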

  14. Figure 7.7 from Foster's book

  15. Example: Successive Over-Relaxation (SOR)
  Recall the algorithm discussed in the Introduction:
        float G[1:N, 1:M], Gnew[1:N, 1:M];
        for (step = 0; step < NSTEPS; step++)
          for (i = 2; i < N; i++)    /* update grid */
            for (j = 2; j < M; j++)
              Gnew[i,j] = f(G[i,j], G[i-1,j], G[i+1,j], G[i,j-1], G[i,j+1]);
          G = Gnew;

  16. Parallel SOR with message passing
        float G[lb-1:ub+1, 1:M], Gnew[lb-1:ub+1, 1:M];
        for (step = 0; step < NSTEPS; step++)
          SEND(cpuid-1, G[lb]);         /* send first row left */
          SEND(cpuid+1, G[ub]);         /* send last row right */
          RECEIVE(cpuid-1, G[lb-1]);    /* receive from left */
          RECEIVE(cpuid+1, G[ub+1]);    /* receive from right */
          for (i = lb; i <= ub; i++)    /* update my rows */
            for (j = 2; j < M; j++)
              Gnew[i,j] = f(G[i,j], G[i-1,j], G[i+1,j], G[i,j-1], G[i,j+1]);
          G = Gnew;

  17. Finite differencing (~ SOR) in HPF
  See Ian Foster, Program 7.2; uses a convergence criterion instead of a fixed number of steps:
        program hpf_finite_difference
  !HPF$ PROCESSORS pr(4)                    ! use 4 CPUs
        real X(100, 100), New(100, 100)     ! data arrays
  !HPF$ ALIGN New(:,:) WITH X(:,:)
  !HPF$ DISTRIBUTE X(BLOCK,*) ONTO pr       ! row-wise
        New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                           X(2:99, 1:98) + X(2:99, 3:100))/4
        diffmax = MAXVAL(ABS(New - X))
        end
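  The slide shows a single update step; Foster's full Program 7.2 wraps it in a convergence loop. A hedged sketch of such a loop (the 0.0001 tolerance is an assumed value, not from the slide):

        diffmax = 1.0
        do while (diffmax > 0.0001)
          New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                             X(2:99, 1:98) + X(2:99, 3:100))/4
          diffmax = MAXVAL(ABS(New(2:99,2:99) - X(2:99,2:99)))
          X(2:99, 2:99) = New(2:99, 2:99)   ! copy interior back for the next step
        end do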

  18. Changing the distribution
  Use a block distribution instead of a row distribution:
        program hpf_finite_difference
  !HPF$ PROCESSORS pr(2,2)                  ! use a 2x2 grid
        real X(100, 100), New(100, 100)     ! data arrays
  !HPF$ ALIGN New(:,:) WITH X(:,:)
  !HPF$ DISTRIBUTE X(BLOCK, BLOCK) ONTO pr  ! block-wise
        New(2:99, 2:99) = (X(1:98, 2:99) + X(3:100, 2:99) + &
                           X(2:99, 1:98) + X(2:99, 3:100))/4
        diffmax = MAXVAL(ABS(New - X))
        end

  19. Performance
  The distribution affects:
  • Load balance
  • Amount of communication
  Example (communication costs):
  !HPF$ PROCESSORS pr(3)
        integer A(8), B(8), C(8)
  !HPF$ ALIGN B(:) WITH A(:)
  !HPF$ DISTRIBUTE A(BLOCK) ONTO pr
  !HPF$ DISTRIBUTE C(CYCLIC) ONTO pr
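  For concreteness, the element-to-processor mapping these directives produce (with 8 elements on 3 processors, BLOCK uses blocks of ceiling(8/3) = 3):

        ! A, B (BLOCK):   pr(1): 1,2,3    pr(2): 4,5,6    pr(3): 7,8
        ! C    (CYCLIC):  pr(1): 1,4,7    pr(2): 2,5,8    pr(3): 3,6
        ! So A(:) = B(:) needs no communication, while A(:) = C(:)
        ! would move most elements between processors.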

  20. Figure 7.9 from Foster's book

  21. Historical Evaluation
  • See: "The rise and fall of High Performance Fortran: an historical object lesson" by Ken Kennedy, Charles Koelbel, and Hans Zima. In: Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages, June 2007. [Optional, obtainable from the ACM Digital Library]

  22. Problems with HPF
  • Immature compiler technology:
    - Upgrading to Fortran 90 was complicated
    - Implementing the HPF extensions took much time
    - The HPC community was impatient and started using MPI instead
  • Missing features:
    - Support for sparse arrays and other irregular data structures
  • Obtaining portable performance was difficult
  • Performance tuning was difficult

  23. Impact of HPF
  • Huge impact on parallel language design; very frequently cited
  • Some impact on OpenMP (the shared-memory standard)
  • Impact on programming systems for GPUs
  • Influenced a new wave of High Productivity Computing Systems (HPCS) languages: Chapel (Cray), Fortress (Sun), X10 (IBM)
  • Used in extended form (HPF/JA) for the Japanese Earth Simulator

  24. Conclusions
  • High-level model:
    - User specifies the data distribution
    - Compiler generates the parallel program + communication
  • More restrictive than the general message-passing model (data parallelism only)
  • Restricted to array-based data structures
  • HPF programs are easy to modify, which enhances portability:
    - Changing the data distribution only requires changing directives
