TAU MPI Wrapper: Simplified Profiling and Instrumentation for MPI Applications

The TAU MPI wrapper interposition library uses the standard MPI Profiling Interface (PMPI) to add TAU's performance instrumentation to MPI applications. Because the wrapper library is interposed at link time, users can generate performance data without changing the source code: they only relink the application against the TAU wrapper library. Where MPI is available as a shared library, the wrapper can also be preloaded at run time, so an already compiled application needs neither recompilation nor relinking. TAU's compiler wrapper scripts additionally instrument Fortran and C/C++ source code automatically.

Presentation Transcript


  1. TAU’s MPI Wrapper Interposition Library
     • Uses standard MPI Profiling Interface
     • Provides name-shifted interface
     • MPI_Send = PMPI_Send
     • Weak bindings
     • Interpose TAU’s MPI wrapper library between MPI and TAU
     • -lmpi replaced by -lTauMpi -lpmpi -lmpi
     • No change to the source code!
     • Just re-link the application to generate performance data
     • setenv TAU_MAKEFILE <dir>/<arch>/lib/Makefile.tau-mpi-[options]
     • Use tau_cxx.sh, tau_f90.sh and tau_cc.sh as compilers
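
To make the name-shifted interface concrete, here is a minimal C sketch of the wrapper pattern. So that it compiles without an MPI installation, it uses hypothetical stand-ins: PDemo_Send plays the role of PMPI_Send and Demo_Send plays the role of MPI_Send. The send_stats structure and its accessor are likewise invented for illustration and are not TAU's data structures. In a real wrapper library the wrapper would be named MPI_Send and would forward to PMPI_Send.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the library's name-shifted entry point.
   In a real MPI library this role is played by PMPI_Send. */
int PDemo_Send(const void *buf, int count, int dest)
{
    (void)buf;
    (void)dest;
    return 0;                       /* 0 plays the role of MPI_SUCCESS */
}

/* Bookkeeping kept by the wrapper (hypothetical, not TAU's layout). */
struct send_stats {
    long calls;                     /* number of intercepted calls */
    long elements;                  /* total element count passed through */
    double seconds;                 /* CPU time spent inside the real routine */
};

static struct send_stats g_send_stats;

/* The wrapper has the public name the application already calls.
   It measures, forwards to the name-shifted routine, and records. */
int Demo_Send(const void *buf, int count, int dest)
{
    assert(count >= 0);             /* wrapper-side sanity check */

    clock_t start = clock();
    int rc = PDemo_Send(buf, count, dest);
    clock_t end = clock();

    g_send_stats.calls++;
    g_send_stats.elements += count;
    g_send_stats.seconds += (double)(end - start) / CLOCKS_PER_SEC;
    return rc;
}

/* Read-only access to the collected statistics. */
const struct send_stats *demo_send_stats(void)
{
    return &g_send_stats;
}
```

The caller's code does not change: it still calls Demo_Send, and the bookkeeping happens inside the wrapper before and after it forwards to the name-shifted entry point.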

  2. Runtime MPI Shared Library Instrumentation
     • We can now interpose the MPI wrapper library for applications that have already been compiled
     • No re-compilation or re-linking necessary!
     • Uses LD_PRELOAD for Linux
     • On AIX, TAU uses MPI_EUILIB / MPI_EUILIBPATH
     • Simply compile TAU with MPI support and prefix your MPI program with tauex:
         % mpirun -np 4 tauex a.out
     • Requires a shared-library MPI; does not work on XT3
     • Approach will work with other shared libraries
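
The preloading idea can be sketched in C for an ordinary libc routine. This illustrates the general LD_PRELOAD technique and is not TAU's implementation: for MPI, a preloaded wrapper can forward to the PMPI_ entry points, whereas libraries without a profiling interface are commonly wrapped by looking up the original symbol with dlsym(RTLD_NEXT, ...), as assumed here. The choice of getpid as the intercepted routine and the accessor intercepted_getpid_calls are hypothetical.

```c
#define _GNU_SOURCE                 /* exposes RTLD_NEXT on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

static long g_getpid_calls;         /* number of intercepted calls */

/* Same name and signature as the libc routine, so this definition is
   found first when the object is preloaded ahead of libc. */
pid_t getpid(void)
{
    static pid_t (*real_getpid)(void);

    if (real_getpid == NULL) {
        /* Find the next definition in search order: the original one.
           The void** form is the POSIX idiom for storing a dlsym result
           in a function pointer. */
        *(void **)(&real_getpid) = dlsym(RTLD_NEXT, "getpid");
        assert(real_getpid != NULL);
    }

    g_getpid_calls++;               /* the "measurement" in this sketch */
    return real_getpid();
}

/* Hypothetical accessor; a real tool would report at program exit. */
long intercepted_getpid_calls(void)
{
    return g_getpid_calls;
}
```

Built as a shared object (for example `cc -shared -fPIC -o libcount.so count.c -ldl`, with hypothetical file names) and activated with `LD_PRELOAD=./libcount.so ./a.out`, the replacement is found before the libc definition, so a.out itself is neither recompiled nor relinked.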

  4. Automatic Instrumentation
     • We now provide compiler wrapper scripts
     • Simply replace mpxlf90 with tau_f90.sh
     • Automatically instruments Fortran source code and links with the TAU MPI wrapper libraries
     • Use tau_cc.sh and tau_cxx.sh for C/C++

     Before:

         CXX = mpCC
         F90 = mpxlf90_r
         CFLAGS =
         LIBS = -lm
         OBJS = f1.o f2.o f3.o … fn.o

         app: $(OBJS)
                 $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

         .cpp.o:
                 $(CXX) $(CFLAGS) -c $<

     After:

         CXX = tau_cxx.sh
         F90 = tau_f90.sh
         CFLAGS =
         LIBS = -lm
         OBJS = f1.o f2.o f3.o … fn.o

         app: $(OBJS)
                 $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

         .cpp.o:
                 $(CXX) $(CFLAGS) -c $<

  5. I/O notes
     • Application file I/O performance often highly variable
       • depends on load on shared filesystem/network resources
       • and application/system configuration at time of measurement
       • tuning requires very careful, extensive benchmarking
       • worst-case performance very different from typical case
       • current tools don't deal well with this
     • Optimal I/O is no I/O!
       • preferable to eliminate non-essential I/O during measurement
       • configure tools to avoid intermediate measurement I/O (e.g., trace buffer flushes) where appropriate
       • configure measurement or analysis to exclude I/O phases
       • typically part of one-off application initialization/finalization cost which would be amortized in long production execution
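
Excluding I/O phases from a measurement can be pictured as a timer that is paused around them. The C sketch below is only an illustration under that assumption; the phase_timer type and the pt_* functions are hypothetical and are not part of TAU or any other tool. Timestamps are passed in explicitly so the arithmetic is easy to follow.

```c
#include <assert.h>

/* Hypothetical timer that separates I/O time from measured time. */
struct phase_timer {
    double measured;    /* seconds attributed to non-I/O work */
    double excluded;    /* seconds spent in I/O phases */
    double mark;        /* timestamp of the last state change */
    int in_io;          /* nonzero while inside an I/O phase */
    int running;        /* nonzero between pt_start and pt_stop */
};

/* Begin measuring at time `now` (seconds). */
void pt_start(struct phase_timer *t, double now)
{
    t->measured = 0.0;
    t->excluded = 0.0;
    t->mark = now;
    t->in_io = 0;
    t->running = 1;
}

/* Entering an I/O phase: bank the work done so far, then pause. */
void pt_enter_io(struct phase_timer *t, double now)
{
    assert(t->running && !t->in_io);
    t->measured += now - t->mark;
    t->mark = now;
    t->in_io = 1;
}

/* Leaving an I/O phase: record its duration as excluded, then resume. */
void pt_leave_io(struct phase_timer *t, double now)
{
    assert(t->running && t->in_io);
    t->excluded += now - t->mark;
    t->mark = now;
    t->in_io = 0;
}

/* Stop measuring; the final interval goes to whichever phase is active. */
void pt_stop(struct phase_timer *t, double now)
{
    assert(t->running);
    if (t->in_io)
        t->excluded += now - t->mark;
    else
        t->measured += now - t->mark;
    t->mark = now;
    t->in_io = 0;
    t->running = 0;
}
```

For a run that starts at 0 s, reads its input from 2 s to 7 s, and stops at 10 s, the timer attributes 5 s to the measured region and 5 s to the excluded I/O phase.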
