Introduction to Compilers HPC Workshop – University of Kentucky May 9, 2007 – May 10, 2007

Introduction to Compilers HPC Workshop – University of Kentucky May 9, 2007 – May 10, 2007 . Andrew Komornicki, Ph. D. Balaji Veeraraghavan, Ph. D. Agenda. Introduction Availability of compilers, GNU, Intel and IBM Compiler naming and default setting Memory management


Presentation Transcript


  1. Introduction to Compilers HPC Workshop – University of Kentucky, May 9, 2007 – May 10, 2007 • Andrew Komornicki, Ph. D. • Balaji Veeraraghavan, Ph. D.

  2. Agenda • Introduction • Availability of compilers, GNU, Intel and IBM • Compiler naming and default setting • Memory management • 32 vs. 64-bit, Memory allocation • Compiler performance optimization • Profile • Optimization level 0-5 • -qhot • Target machine specification • Many others

  3. Quick Reference Page – Cheat Sheet • Which IBM Fortran compiler to use • Compiler options for performance • -O3 -qarch=pwr5 -qtune=pwr5 (use these at minimum) • -qhot (High Order Transformation) • -pg (profiling) • -qstrict (do not alter the semantics of a program) • -qipa (interprocedural analysis)

  4. XLF Fortran Documentation • 1. Installation Guide - XL Fortran Enterprise Edition V10.1 for AIX • 2. Getting Started - XL Fortran Enterprise Edition V10.1 for AIX • 3. Language Reference - XL Fortran Enterprise Edition V10.1 for AIX • 4. Compiler Reference - XL Fortran Enterprise Edition V10.1 for AIX • 5. Optimization and Programming Guide - XL Fortran Enterprise Edition V10.1 for AIX • 6. Readme File - XL Fortran Enterprise Edition V10.1 for AIX • 7. Readme updates for XL Fortran Enterprise Edition V10.1 for AIX • (and many similar volumes for C, C++, and Linux)

  5. Compiler Documentation on Your System • The "man" page • $ man xlf (this one works) • $ man xlC (I did not find a man page)

  6. Compiler documentation on the Internet • www.software.ibm.com/ • Products A-Z • X • XLFORTRAN • XLC/C++ • Editions: • Linux on pSeries • AIX • Many IBM customers place IBM documentation online, where it is often easier to find • www.google.com

  7. IBM Compiler Names There are a lot more, including fort77, cc99_128, xlc128_r7…

  8. IBM Compiler Versions

  9. C Compiler Invocations Two C compilers: • C and C++ • C is a subset of C++

  10. Fortran Compiler Invocations One Fortran compiler, multiple invocations.

  11. Finding your compiler and path • r78n06:/u/komornic:516> . setup.compilers (just source this script) Using Visual Age C/C++ Version: 8.0 path to vacpp: /opt/ibmcmp/vacpp/8.0/bin Using XL Fortran Version: 10.1 path to xlf Fortran: /opt/ibmcmp/xlf/10.1/bin

  12. xlf_r and mpxlf Example: Hello, World
      Source file hello.f:
          program hello
          print *, 'Hello, World'
          end
      % xlf_r hello.f -o hello      <<< using xlf_r
      % hello
      Hello, World
      % mpxlf hello.f -o hello      <<< using mpxlf
      % hello
      ERROR: 0031-808 Hostfile or pool must be used to request nodes
      % hello -procs 4 -hostfile hostfile
      Hello, World
      Hello, World
      Hello, World
      Hello, World
      mpxlf enables the binary to run in SPMD mode across multiple CPUs.

  13. Environment Variables • LANG=en_US • NLSPATH=/usr/lib/nls/msg/%L/%N:/usr/lib/nls/msg/%L/%N.cat • For AIX: • libxlf90.a should be in /usr/lib, or set the path: • LIBPATH=/my_xlf90_lib_path:/usr/lib • For Linux, the path should be: • LD_LIBRARY_PATH=/usr/lib • You may also need LD_RUN_PATH (runtime library search path)

  14. xlf Version 10.1 • Traditional allowable extensions: • .f • .F (will pass through cpp before compiling) • New allowable extensions: • .f77 • .f90 • .f95

  15. Shifting to a Next Topic • 32-bit, 64-bit and memory management

  16. Address Mode: -q{32,64} • Available application modes: • -q32 (Default) • -q64 • Also: environment variable OBJECT_MODE • export OBJECT_MODE={32,64} • Cannot mix -q32 objects with -q64 objects • Be aware of AIX kernel modes: • 32-bit • 64-bit • Application address mode is independent of AIX kernel mode

  17. One more thing about 64-bit… • If you use -q64: • Your job can use much more memory than with -q32 • INTEGER*8 or long long operations are faster • If you use -q32: • You may run approximately 10% faster • Fewer bytes are used storing and moving pointers • You will have to learn the AIX link option -bmaxdata • -bmaxdata:0x10000000 = 256 Mbyte = default • -bmaxdata:0x80000000 = 2 Gbyte • -bmaxdata:0xC0000000 = a not widely publicized trick to use more than 2 Gbyte with -q32; "C" is the maximum • With -q64: • -bmaxdata:0 = default = unlimited • Other -bmaxdata values will be enforced if set
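The -bmaxdata values above are plain byte counts written in hexadecimal, so the sizes quoted on the slide can be checked with a little arithmetic. The following is a minimal sketch in C; the helper names `maxdata_to_mbytes` and `maxdata_units` are hypothetical and are not part of any IBM tool.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a -bmaxdata value (a byte count) to whole Mbytes.
   Hypothetical helper, for checking the slide's arithmetic only. */
uint64_t maxdata_to_mbytes(uint64_t maxdata_bytes)
{
    return maxdata_bytes >> 20;   /* 2^20 bytes per Mbyte */
}

/* Count how many 256-Mbyte units (0x10000000 bytes each) a value spans.
   Assumes the value is a whole multiple of 0x10000000. */
unsigned maxdata_units(uint64_t maxdata_bytes)
{
    assert(maxdata_bytes % UINT64_C(0x10000000) == 0);
    return (unsigned)(maxdata_bytes / UINT64_C(0x10000000));
}
```

For example, `maxdata_to_mbytes(0x80000000)` gives 2048 Mbytes (2 Gbyte), and `maxdata_units(0xC0000000)` gives 12 units of 256 Mbytes, that is, 3 Gbyte.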

  18. Even more on 64-bit... (because it is so often confused) • 64-bit floating point representation is higher precision • Fortran: REAL*8, DOUBLE PRECISION • C/C++: double • You can use 64-bit floating point with -q32 or -q64 • 64-bit addressing is totally different. It refers to how many bits are used to store memory addresses and ultimately how much memory one can access. • Compile and link with -q64 • Use the file command (file a.out myobj.o) to query the addressing mode • The AIX kernel can be a build that uses either 32-bit or 64-bit addressing for kernel operations, but that does not affect an application's addressability. • Use ls -l /unix to find out which kernel is in use • Certain system limits depend on the kernel chosen

  19. Suggested Fortran Compiler Usage: xlf90_r -q64 • Fortran 90 is the most portable standard • Consistent storage • Reentrant code (..._r) • Required for: • pthreads • Many other programming utilities • 64-bit addressing: • Memory management

  20. Suggested C Compiler Usage: xlc_r -q64 • Reentrant code (..._r) • Required for: • pthreads • Many other programming utilities • 64-bit addressing: • Better memory management

  21. Address Modes • ILP32 • Integers, Long integers and Pointers are 32 bits • LP64 • Long integers and Pointers are 64 bits • Standard C and C++ relationship: • sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
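The two address modes above can be told apart from inside a program by looking at type sizes. The sketch below is one way to do that in C; the function name `data_model_name` is hypothetical, and the classification uses only the size rules listed on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Name the data model this file was compiled for, judged only from
   the sizes of int, long and pointers. Hypothetical helper. */
const char *data_model_name(void)
{
    /* The size ordering quoted on the slide, checked at run time. */
    assert(sizeof(char) <= sizeof(short));
    assert(sizeof(short) <= sizeof(int));
    assert(sizeof(int) <= sizeof(long));

    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";   /* int, long and pointers are all 32 bits */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";    /* long and pointers are 64 bits */
    return "other";
}
```

Compiling the same file once with -q32 and once with -q64 would be expected to return "ILP32" and "LP64" respectively; printing the result with `puts(data_model_name())` is a quick way to confirm which mode a build used.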

  22. C and C++ Data Type Sizes • long and pointer change size with -q{32,64}

  23. Fortran Data Type Sizes • pointers change size with -q{32,64}

  24. Fortran Data Type Sizes

  25. Fortran Data Type Sizes

  26. Memory Management • -bmaxdata: extends addressability to 2 GB in 32-bit mode, e.g. "-bmaxdata:0x80000000" (0x70000000 for MPI) • -bmaxstack: similar to -bmaxdata, but for the stack

  27. ALLOCATE and malloc Arrays • Allocation occurs at statement execution • Heap operation • Inexpensive • Limited size • Maximum size specification (for 32-bit only): $(LDR) ... -bmaxdata:0x80000000
      Fortran:
          Subroutine sub(n)
          Integer, allocatable, Dimension(:) :: A
          …
          Allocate(A(n))
          …
          Deallocate(A)
          end
      C:
          void my_proc(int n) {
            long *A;
            …
            A = (long *) malloc(n*sizeof(long));
            …
            free(A);
          }
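The malloc pattern on this slide leaves out error handling. A minimal, self-contained sketch of heap allocation with the checks filled in might look like the following; the function `heap_sum` and its fill-and-sum workload are hypothetical, chosen only to give the array something to do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n longs on the heap, fill them with 0..n-1, and return
   their sum. Returns -1 if the request cannot be satisfied.
   Hypothetical helper for illustration. */
long heap_sum(size_t n)
{
    assert(n > 0);                       /* precondition: non-empty array */
    if (n > SIZE_MAX / sizeof(long))     /* n * sizeof(long) would overflow */
        return -1;

    long *a = malloc(n * sizeof *a);     /* allocation happens here, at run time */
    if (a == NULL)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        a[i] = (long)i;
        sum += a[i];
    }
    free(a);                             /* release the heap block */
    return sum;
}
```

In a 32-bit build the largest request that can succeed is bounded by the -bmaxdata setting, which is why checking the malloc result matters.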

  28. Dynamic and Automatic Arrays • Memory allocation occurs at subroutine entry • Stack operation • Inexpensive • Limited size • Maximum size specification: $(LDR) ... -bmaxstack:256000000
      C:
          void sub(int n) {
            long A[n];
            …
            A[i] = …
            …
            return;
          }
      Fortran:
          Subroutine sub(n)
          integer A(n)
          …
          A(i) = …
          …
          return
          end
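For comparison with the heap case, here is a minimal sketch of an automatic array in C, using a C99 variable-length array. The function `stack_sum` is hypothetical; the point is only that the array is created on the stack at function entry and disappears on return.

```c
#include <assert.h>

/* Sum 0..n-1 using a C99 variable-length array, which is placed on
   the stack. n must be small enough to fit within the stack limit.
   Hypothetical helper for illustration. */
long stack_sum(int n)
{
    assert(n > 0);            /* a variable-length array needs a positive length */
    long a[n];                /* storage is reserved on the stack, no malloc */
    long sum = 0;
    for (int i = 0; i < n; i++) {
        a[i] = i;
        sum += a[i];
    }
    return sum;               /* a[] is released automatically here */
}
```

Unlike malloc, there is no failure return to test: if n is too large for the stack, the program typically crashes, which is why the stack size limit deserves attention.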

  29. Dynamic and Automatic Arrays Note: Extra concern with pthreads or OpenMP Default SLAVE stack is only 4 Mbyte. Use XLSMPOPTS=stack=…

  30. Memory Allocation: Summary • Programming advice: use ALLOCATE in Fortran and malloc in C

  31. Shifting to Next Topic • Compiler options

  32. Fortran Compiler Options – 10 categories

  33. Summary: Commonly Used Options • -q32, -q64 • -O0, -O2, -O3, -O4, -O5 • Large/medium memory page setup • -qmaxmem=-1 (allow maximum memory for compiling) • -qarch=, -qtune= • -qhot (High Order Transformation) • -g (debugging) • -p, -pg (profiling) • -qstrict (do not alter the semantics of a program) • -qstatic • -qipa (interprocedural analysis) • -qieee • -qlist (assembly language report) • -qsmp • -qreport (SMP listing when -qsmp is also used)

  34. Profiling Your Code • Compile the code with -p (or -pg); the compiler sets up the object file for profiling (or graph profiling) • Execute the program; a mon.out (or gmon.out) file is created • Use the prof (or gprof) command to generate a profile • Or run xprofiler a.out gmon.out (if you can open an X Window display) • Example with -p: • xlf95 -p needs_tuning.f • a.out • mon.out created • prof • Example with -pg: • xlf95 -pg needs_tuning.f • a.out • gmon.out created • gprof a.out gmon.out • xprofiler a.out gmon.out

  35. Large Pages Option -qlargepage • This option enables large page usage • It instructs the compiler to exploit large page heaps available on POWER4 and POWER5 systems • HINT to the compiler: • Heap data will be allocated from large page pool • Actual control is from loader option –blpdata or LDR_CNTRL=LARGE_PAGE_DATA • Compiler may divert large data from the stack to the heap • Compiler may bias optimization of heap or static data references

  36. Compiler Optimizations for Performance Optimization • 5 levels of optimization • No specification: same as -qnoopt, -O0 • -O0: no optimization, same as -qnoopt; equivalent to the default • -O2, -O3, -O4, -O5 • Target machine specification • -qhot (High Order Transformation) • -qipa (Interprocedural Analysis) • Many others • INLINE • UNROLL • etc.

  37. Optimization Levels (strongly recommended) • -O0 (not specified, -qnoopt): default • -O, -O2: low-level optimization; not enough for performance • -O3: extensive optimization; may change semantics • -O4: aggressive optimization; adds -qipa and -qhot • -O5: adds ipa=level2

  38. Optimization Level 0, 2

  39. Optimization Level 2 - 5 • Strongly recommended for performance • * Use -qstrict to ensure that semantics are unchanged

  40. Optimization Level Hierarchy * Limit memory to be used by compiler for optimization

  41. Effect of -O2 vs. -O3 • Wider optimization scope • Replaces divide with reciprocal • Unrolls inner loops • Precision tradeoffs • Not strictly IEEE floating point rules
      Original loop:
          for (i=0;i<n;i++) b[i] = a[i]/s;
      After -O3 (illustrative; assumes n is even):
          rs = 1/s;
          for (i=0;i<n;i=i+2) { b[i] = a[i]*rs; b[i+1] = a[i+1]*rs; }
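The transformation on this slide can be written out by hand to see both the speed idea and the precision tradeoff. The sketch below is a hand-coded approximation of what the compiler might do, not its actual output; the function names are hypothetical, and a remainder step is added so that odd n also works.

```c
#include <assert.h>

/* Reference loop: one floating point divide per element. */
void divide_each(double *b, const double *a, int n, double s)
{
    assert(s != 0.0);
    for (int i = 0; i < n; i++)
        b[i] = a[i] / s;
}

/* Hand-written version of the transformation: the reciprocal is
   computed once, and the loop is unrolled by two. */
void scale_by_reciprocal(double *b, const double *a, int n, double s)
{
    assert(s != 0.0);
    const double rs = 1.0 / s;        /* single divide, hoisted out of the loop */
    int i = 0;
    for (; i + 1 < n; i += 2) {       /* two elements per iteration */
        b[i]     = a[i]     * rs;
        b[i + 1] = a[i + 1] * rs;
    }
    if (i < n)                        /* leftover element when n is odd */
        b[i] = a[i] * rs;
}
```

The two functions agree exactly when 1/s is exactly representable (for example s = 4.0), but `a[i] * rs` can differ from `a[i] / s` in the last bit otherwise, which is the "not strictly IEEE" tradeoff the slide mentions.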

  42. Effect of -O2 vs. -O3 • Replaces divide with reciprocal • Unrolls inner loops • "Regular" code runs well with -O2 • (Chart label: 1.3 GHz POWER4)

  43. -qessl Option • -qessl allows the use of ESSL routines in place of Fortran 90 intrinsic procedures; it does not link in the other ESSL libraries • Rules: • Be sure to add -lessl (or -lesslsmp) to the link command • Use a thread-safe compiler invocation (xlf_r, xlf90_r or xlf95_r), since libessl.so and libesslsmp.so depend on libxlf90_r.so, or specify -lxlf90_r on the link command line • Example: c=MATMUL(a,b) may use ESSL routines • ESSL libraries are not shipped with the XLF compiler

  44. High Order Transformations (HOT) • -qhot [=[no]vector | arraypad[=n]] • Transformation of loop nests • Hardware prefetch • Balance loop computation • Vector intrinsic library • Included at -O4 and higher level optimization

  45. -qhot Transformation: Merge
      Original:
          do i=1,n
            A(i) = A(i) + B(i)*s
          end do
          do i=1,n
            C(i) = C(i) + A(i)*s
          end do
      After -qhot:
          do i=1,n
            A(i) = A(i) + B(i)*s
            C(i) = C(i) + A(i)*s
          end do
      (Chart label: 1.3 GHz POWER4)
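The merge on this slide is legal because each C(i) depends only on the A(i) updated in the same iteration. A hand-written C sketch of the two forms follows; the function names are hypothetical, and the code assumes the three arrays do not overlap in memory.

```c
#include <assert.h>

/* Two separate passes, as in the original loops. */
void update_separate(double *a, double *c, const double *b, int n, double s)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        a[i] = a[i] + b[i] * s;
    for (int i = 0; i < n; i++)
        c[i] = c[i] + a[i] * s;
}

/* One merged pass: a[i] is still in a register when c[i] needs it,
   so the array a is read from memory only once. */
void update_fused(double *a, double *c, const double *b, int n, double s)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = a[i] + b[i] * s;
        c[i] = c[i] + a[i] * s;
    }
}
```

Both versions perform the same arithmetic per element in the same order, so the results should match; the benefit of the merged form is fewer passes over memory.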

  46. -qhot Transformation: Loop Interchange
      Original:
          do i=1,m
            do j=1,n
              sum = sum + X(i)*A(i,j)
            end do
          end do
      After -qhot:
          do j=1,n
            do i=1,m
              sum = sum + X(i)*A(i,j)
            end do
          end do
      (Chart label: 1.3 GHz POWER4)
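Loop interchange helps here because Fortran stores A(i,j) column by column, so making i the inner loop walks memory with stride 1. The C sketch below mimics that layout with a flat array indexed as `a[i + j*m]`; this indexing and the function names are assumptions made to mirror the Fortran example, since native C two-dimensional arrays are stored row by row.

```c
#include <assert.h>

/* i outer, j inner: consecutive accesses to a[] are m elements apart. */
double sum_i_outer(const double *x, const double *a, int m, int n)
{
    assert(m >= 0 && n >= 0);
    double sum = 0.0;
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            sum += x[i] * a[i + j * m];
    return sum;
}

/* j outer, i inner: consecutive accesses to a[] are adjacent in memory. */
double sum_j_outer(const double *x, const double *a, int m, int n)
{
    assert(m >= 0 && n >= 0);
    double sum = 0.0;
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            sum += x[i] * a[i + j * m];
    return sum;
}
```

The two orders add the same terms in a different sequence, so floating point results can differ in the last bits for general data, while the interchanged version is usually friendlier to the cache.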

  47. -qhot Transformation: Vectorization • Extract intrinsic function, compute in batches • Lower latency • Better register utilization • Pipelined

  48. Vectorization Example
          SUBROUTINE VD(A,B,C,N)
          REAL*8 A(N),B(N),C(N)
          DO I = 1, N
            A(I) = C(I) / SQRT(B(I))
          END DO
          END

  49. -qhot Transformation: Vectorization
      Original:
          do i=1,n
            A(i) = exp(B(i))
          end do
      After -qhot:
          CALL __vexp(a,b,n)
      (Chart label: 1.3 GHz POWER4)

  50. Target Machine Specification -qarch • -qarch=[com,auto,ppc,pwr3,pwr4,pwr5,...] • Generates code that uses a specific subset of the Power instruction set
