TAU's MPI wrapper interposition library uses the standard MPI Profiling Interface to add TAU performance instrumentation to MPI applications. Interposing the wrapper library requires no source changes: relinking the application with the TAU wrapper library ahead of the MPI libraries is enough to generate performance data. The approach relies on MPI's name-shifted interface and weak bindings, is configured through the TAU_MAKEFILE environment variable, and is complemented by compiler wrapper scripts that automatically instrument Fortran and C/C++ programs. Where MPI is available as a shared library, already-compiled applications can be instrumented at run time without recompiling or relinking.
TAU’s MPI Wrapper Interposition Library
• Uses standard MPI Profiling Interface
• Provides name-shifted interface
• MPI_Send = PMPI_Send
• Weak bindings
• Interpose TAU’s MPI wrapper library between MPI and TAU
• -lmpi replaced by -lTauMpi -lpmpi -lmpi
• No change to the source code!
• Just re-link the application to generate performance data
• setenv TAU_MAKEFILE <dir>/<arch>/lib/Makefile.tau-mpi-[options]
• Use tau_cxx.sh, tau_f90.sh and tau_cc.sh as compilers
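The name-shifted interface can be illustrated with a minimal sketch. To keep it compilable without an MPI installation, it uses hypothetical stand-ins: DEMO_Send plays the role of MPI_Send and PDEMO_Send the role of PMPI_Send. Neither name belongs to MPI or TAU, and the counters are only an assumption about the kind of data a measurement wrapper might keep. With a real MPI library, the wrapper would define MPI_Send with the full MPI signature and forward to PMPI_Send.

```c
#include <stddef.h>

/* Stand-in for the library's real entry point, reachable under its
   name-shifted ("P"-prefixed) name. Hypothetical; not an MPI call. */
int PDEMO_Send(const void *buf, int count, int dest)
{
    (void)buf;
    (void)dest;
    return count >= 0 ? 0 : 1;   /* 0 means success in this sketch */
}

/* Data the wrapper collects; a real tool would also record timings. */
static long demo_send_calls = 0;
static long demo_send_items = 0;

/* Wrapper under the public name: record the call, then forward to the
   name-shifted entry point so the original behaviour is preserved. */
int DEMO_Send(const void *buf, int count, int dest)
{
    demo_send_calls += 1;
    if (count > 0)
        demo_send_items += count;
    return PDEMO_Send(buf, count, dest);
}

/* Accessors for the collected data. */
long demo_send_call_count(void) { return demo_send_calls; }
long demo_send_item_count(void) { return demo_send_items; }
```

The application keeps calling the public name unchanged. Because the MPI library exports its public names as weak bindings, the wrapper's definition is the one the linker picks up when the wrapper library precedes the MPI library on the link line.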
Runtime MPI Shared Library Instrumentation
• We can now interpose the MPI wrapper library for applications that have already been compiled
• No re-compilation or re-linking necessary!
• Uses LD_PRELOAD for Linux
• On AIX, TAU uses MPI_EUILIB / MPI_EUILIBPATH
• Simply compile TAU with MPI support and prefix your MPI program with tauex
  % mpirun -np 4 tauex a.out
• Requires a shared-library MPI; does not work on XT3
• Approach will work with other shared libraries
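On Linux, LD_PRELOAD holds a list of shared objects (separated by colons or spaces) that the dynamic loader loads before all others, which is what lets a wrapper library take precedence in an already-linked program. The sketch below only shows how a launcher might compose that value so the wrapper comes first. It is not taken from tauex, and build_preload_value is a hypothetical helper name.

```c
#include <stdio.h>
#include <string.h>

/* Compose an LD_PRELOAD value with wrapper_lib listed first.
   `existing` is the current LD_PRELOAD value and may be NULL or empty.
   Returns 0 on success, -1 if `out` is too small for the result. */
int build_preload_value(const char *wrapper_lib, const char *existing,
                        char *out, size_t out_size)
{
    int n;

    if (existing == NULL || existing[0] == '\0')
        n = snprintf(out, out_size, "%s", wrapper_lib);
    else
        n = snprintf(out, out_size, "%s:%s", wrapper_lib, existing);

    /* snprintf reports the length it needed; reject truncated results. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

A launcher would export the resulting string as LD_PRELOAD before starting the unmodified executable. Any path passed in as wrapper_lib is a placeholder here; the actual library name and location depend on the TAU installation.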
Automatic Instrumentation
• We now provide compiler wrapper scripts
• Simply replace mpxlf90 with tau_f90.sh
• Automatically instruments Fortran source code and links with the TAU MPI wrapper libraries
• Use tau_cc.sh and tau_cxx.sh for C/C++

Before:
CXX = mpCC
F90 = mpxlf90_r
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o

app: $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

.cpp.o:
	$(CXX) $(CFLAGS) -c $<

After:
CXX = tau_cxx.sh
F90 = tau_f90.sh
CFLAGS =
LIBS = -lm
OBJS = f1.o f2.o f3.o … fn.o

app: $(OBJS)
	$(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)

.cpp.o:
	$(CXX) $(CFLAGS) -c $<
I/O notes
• Application file I/O performance is often highly variable
  • depends on load on shared filesystem/network resources
  • and on application/system configuration at time of measurement
  • tuning requires very careful, extensive benchmarking
  • worst-case performance is very different from the typical case
  • current tools don't deal well with this
• Optimal I/O is no I/O!
  • preferable to eliminate non-essential I/O during measurement
  • configure tools to avoid intermediate measurement I/O (e.g., trace buffer flushes) where appropriate
  • configure measurement or analysis to exclude I/O phases
  • typically part of one-off application initialization/finalization cost which would be amortized in long production execution
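Excluding I/O phases at analysis time can be as simple as tagging each measured phase and leaving the tagged ones out of the total. The following is a minimal sketch under that assumption. The PhaseTiming record and total_seconds function are hypothetical and do not correspond to any TAU data structure or API.

```c
#include <stddef.h>

/* One measured phase of a run. Hypothetical layout for illustration. */
typedef struct {
    const char *name;     /* label of the phase */
    double      seconds;  /* measured wall-clock time */
    int         is_io;    /* nonzero if the phase is dominated by file I/O */
} PhaseTiming;

/* Sum the phase times, optionally leaving out phases marked as I/O. */
double total_seconds(const PhaseTiming *phases, size_t n, int include_io)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (include_io || !phases[i].is_io)
            total += phases[i].seconds;
    }
    return total;
}
```

Comparing the totals with and without the I/O phases shows how much of a short measurement run comes from one-off initialization and finalization I/O rather than from the computation being tuned.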