PowerPoint Slideshow about 'TotalView' - jana


TotalView

Strategies for debugging hybrid codes with TotalView on the IBM clusters


Overview of this presentation

  • Print statements are the easiest form of debugging

  • TotalView functionality and limitations

  • Settings needed to run a job on an IBM cluster (whether you use TotalView or not)

  • Demonstration of a simple TotalView session

  • Run the CCSM under TotalView


Print statement debugging

  • Simple to learn when errors are easy to find

  • A postmortem operation that must be repeated for every run

  • Only works when you place the print statements in the right places

  • Uses system resources each time a job re-runs

  • No flexibility to explore other parts of the code while job is running


TotalView functionality

  • Interactive debugger

  • Debugs serial, threaded, and message-passing codes, and any combination of these

  • Supports source debugging in Fortran 77, Fortran 90-95, C, and C++

  • Interoperates with IBM's Parallel Operating Environment (POE), and with OpenMP/Pthreads


Debugging with TotalView

  • You can watch variables as the code runs and interactively refine your investigation

  • You can calculate derived quantities based on data calculated during the run

  • You can interactively stop the run when you see abnormal events occur

  • You can set conditions that will stop the run; this helps uncover trouble spots and shortens the time needed to discover the problem


TotalView limitations

  • Its use is limited to LoadLeveler's interactive pool (also called the interactive class or queue)

  • It can run at most the following number of parallel processes per job in the interactive pool (threads within a process are not counted):

    • bluesky: 96 processes

    • blackforest: 26 processes

    • babyblue: 48 processes

  • It is an X application and requires a high-bandwidth network connection
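The per-machine limits above lend themselves to a pre-flight check before requesting an interactive session. This POSIX shell sketch encodes the three limits from this slide; the helper names `tv_max_procs` and `tv_fits` are hypothetical, not tools installed on these systems.

```shell
#!/bin/sh
# Hypothetical helpers (not site-provided): check a requested MPI task
# count against the interactive-pool TotalView limits listed on the slide.
# Threads inside a process do not count toward these limits.

# Print the maximum number of processes for a machine (0 if unknown).
tv_max_procs() {
    case "$1" in
        bluesky)     echo 96 ;;
        blackforest) echo 26 ;;
        babyblue)    echo 48 ;;
        *)           echo 0 ;;
    esac
}

# Exit status 0 if $2 tasks fit under the limit for machine $1.
tv_fits() {
    max=$(tv_max_procs "$1")
    [ "$max" -gt 0 ] && [ "$2" -le "$max" ]
}

# Example: check a 32-process job on two of the machines.
for m in babyblue blackforest; do
    if tv_fits "$m" 32; then
        echo "$m: 32 processes OK"
    else
        echo "$m: 32 processes exceed the limit of $(tv_max_procs "$m")"
    fi
done
```

With the 32-process CCSM demonstration described later, the check passes on babyblue and fails on blackforest.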


Building applications for debugging sessions

Include the following compiler options on the command line or in the FFLAGS/CFLAGS variables in the makefile:

-g (generates symbol table)

-Ox (use the minimum optimization level that reproduces the problem)

-qsmp=omp:noopt (if threads are used, don't optimize them)

-qfullpath (generates full path representations to sources, objects, and executables in the symbol table so that TotalView can find the sources)
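As a sketch, the flags above can be collected into FFLAGS for an XL Fortran build. The compiler driver `mpxlf90_r` (the thread-safe MPI Fortran wrapper on AIX), the source file `demo.f90`, and the choice of `-O0` as the minimum optimization level are assumptions; substitute whatever level still reproduces your problem.

```shell
#!/bin/sh
# Sketch: debug build flags from the slide. OPT_LEVEL defaults to -O0
# (an assumption); override it with the lowest level that still shows the bug.
OPT_LEVEL="${OPT_LEVEL:--O0}"

# -g: symbol table; -qsmp=omp:noopt: OpenMP without optimizing threaded code;
# -qfullpath: record full source paths so TotalView can find the sources.
FFLAGS="-g $OPT_LEVEL -qsmp=omp:noopt -qfullpath"

# Compile only if the (assumed) compiler and source file are present;
# otherwise just show the command that would be run.
# $FFLAGS is left unquoted on purpose so it splits into separate options.
if command -v mpxlf90_r >/dev/null 2>&1 && [ -f demo.f90 ]; then
    mpxlf90_r $FFLAGS -o demo demo.f90
else
    echo "would run: mpxlf90_r $FFLAGS -o demo demo.f90"
fi
```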


POE runtime resource parameters

POE parameters needed before using TotalView:

  • Resource pool

    MP_RMPOOL=1 (necessary)

  • stdout/stderr organization

    MP_INFOLEVEL=3 (useful)

    MP_LABELIO=yes (useful)

    MP_STDOUTMODE=ordered (can be useful)

  • Location of executables

    MP_PGMMODEL=[spmd/mpmd] (spmd is the default)

    MP_CMDFILE=cmdfile (if mpmd, then this is necessary)
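A minimal sketch of these settings in a POSIX shell, run before launching TotalView. It switches to MPMD mode to show how a command file is built; that choice and the executable names `./atm`, `./ocn`, and `./cpl` are hypothetical placeholders, not the CCSM's actual build products.

```shell
#!/bin/sh
# POE settings from the slide, exported before starting TotalView.
export MP_RMPOOL=1            # interactive pool (necessary)
export MP_INFOLEVEL=3         # more detailed POE messages
export MP_LABELIO=yes         # prefix output lines with the task number
export MP_STDOUTMODE=ordered  # collect stdout and print it in task order

# SPMD (one executable for all tasks) is the default. In MPMD mode POE
# reads one program name per task, one per line, from MP_CMDFILE.
export MP_PGMMODEL=mpmd
export MP_CMDFILE="${MP_CMDFILE:-./cmdfile}"

# Hypothetical layout: 2 tasks of ./atm, 1 of ./ocn, 1 of ./cpl.
: > "$MP_CMDFILE"
for exe in ./atm ./atm ./ocn ./cpl; do
    echo "$exe" >> "$MP_CMDFILE"
done
```

Task numbers follow line order, so in this sketch tasks 0 and 1 run `./atm`, task 2 runs `./ocn`, and task 3 runs `./cpl`.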


POE runtime resource parameters

Node resources: specify two of these three:

  • MP_NODES

  • MP_PROCS

  • MP_TASKS_PER_NODE
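For example, two of the three node-resource variables might be set as follows. The values (4 nodes with 8 tasks each, 32 tasks in total) are illustrative assumptions; the loop simply confirms that exactly two of the three variables are set.

```shell
#!/bin/sh
# Illustrative values: 4 nodes x 8 tasks per node = 32 tasks.
export MP_NODES=4
export MP_TASKS_PER_NODE=8
unset MP_PROCS    # the third value is implied by the other two

# Count how many of the three node-resource variables are set.
count=0
for v in MP_NODES MP_PROCS MP_TASKS_PER_NODE; do
    eval "val=\${$v:-}"       # indirect lookup: val=${MP_NODES:-}, etc.
    if [ -n "$val" ]; then
        count=$((count + 1))
    fi
done

echo "variables set: $count"
echo "implied task count: $((MP_NODES * MP_TASKS_PER_NODE))"
```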

Or specify a LoadLeveler command file for POE to submit, for example to set a task geometry: MP_LLFILE=llfile

Note: Some things cannot be set with POE parameters, so they must be set using LoadLeveler. MP_LLFILE allows POE to access LoadLeveler.
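A sketch of the MP_LLFILE alternative: the script below writes a small LoadLeveler command file that fixes the task geometry (tasks 0 to 3 on one node, 4 to 7 on another) and points POE at it. The class name `interactive` and the file name `llfile` are assumptions; use your site's interactive class name.

```shell
#!/bin/sh
# Write a LoadLeveler command file for POE to use.
# The class name "interactive" is an assumption; sites name it differently.
LLFILE="${LLFILE:-./llfile}"
cat > "$LLFILE" <<'EOF'
# @ job_type      = parallel
# @ class         = interactive
# @ task_geometry = {(0,1,2,3) (4,5,6,7)}
# @ queue
EOF

# Point POE at the command file. The node layout now comes from
# task_geometry, so the node-count variables are left unset to avoid
# conflicting requests.
unset MP_NODES MP_PROCS MP_TASKS_PER_NODE
export MP_LLFILE="$LLFILE"
```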


POE runtime resource parameters

Communication path and node usage

  • On node:

    MP_SHARED_MEMORY=yes

    MP_CPU_USE=multiple

  • Off node:

    MP_EUIDEVICE=csss

    MP_ADAPTER_USE=shared

    MP_EUILIB=ip
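The communication settings can be exported the same way. The final lines start POE under TotalView using the `totalview poe -a program` form; the program name `./demo` is a hypothetical placeholder, and the launch is skipped when `totalview` or the program is not available.

```shell
#!/bin/sh
# On node: tasks on the same node exchange messages through shared memory,
# and the node's CPUs may be shared with other tasks.
export MP_SHARED_MEMORY=yes
export MP_CPU_USE=multiple

# Off node: switch device, shared adapter, IP protocol (values from the slide).
export MP_EUIDEVICE=csss
export MP_ADAPTER_USE=shared
export MP_EUILIB=ip

# Start POE under TotalView; arguments after -a are passed to poe.
if command -v totalview >/dev/null 2>&1 && [ -x ./demo ]; then
    totalview poe -a ./demo
else
    echo "would run: totalview poe -a ./demo"
fi
```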


Demo: Basic TotalView skills

  • An MPI/OpenMP demonstration code

  • Understanding what TotalView shows you

    • Root Window

    • Program Window

      • Source code

      • Stack trace

      • Stack frame

      • Allows you to view processes and threads to see their asynchronous behavior


Demo: Establishing Action Points

Using the source window to establish:

  • Breakpoints and global barriers

  • Watch points

  • Evaluation points

  • Evaluation at any action point


Demo: Diving into subprograms

  • How to "dive" in the Frame Stack and the Program Source subwindows

  • How to find variables and data structures in the Frame Stack subwindow


Demo: Exploring threads and processes with the Program Window

  • Tabbing through processes and threads with the P and T tabs

  • Using the Root Window to select processes


Demo: Configuring TotalView for the run

  • Setting the TotalView search paths

  • Setting the signal processing environment


TotalView demo using the CCSM

This demonstration runs an "active" model configuration of the current CCSM version on 32 processes with threads. It traces the coupler's startup communication, examines land model data structures, and explores the threads.


Defining CCSM3 TotalView machine on babyblue

Actions to define a case "dan" and set up a machine "totalview":

$ROOTDIR/scripts/create_newcase -case "dan"
$ROOTDIR/scripts/dan/addmach totalview
$ROOTDIR/scripts/dan/configure -mach totalview

Files created or affected in $ROOTDIR/scripts/ccsm_utils/Machines:

  • env_mach_pes.totalview

  • run.ibm.totalview

  • batch.ibm.totalview

  • env.totalview
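After `addmach` runs, a quick check can confirm that the four machine files listed above exist. The function name `check_mach_files` is hypothetical; it takes the Machines directory as its argument and assumes ROOTDIR points at the CCSM3 source tree.

```shell
#!/bin/sh
# Hypothetical helper: report which of the "totalview" machine files
# are missing from the given Machines directory. Returns 1 if any are missing.
check_mach_files() {
    dir="$1"
    missing=0
    for f in env_mach_pes.totalview run.ibm.totalview \
             batch.ibm.totalview env.totalview; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: only runs when ROOTDIR is set.
if [ -n "${ROOTDIR:-}" ]; then
    check_mach_files "$ROOTDIR/scripts/ccsm_utils/Machines" ||
        echo "machine setup for \"totalview\" is incomplete"
fi
```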


Defining CCSM3 TotalView machine on babyblue (continued)

Files created or affected in $ROOTDIR/models/{utils, bld}:

Changes to makefiles

Changes to Macros.AIX

set FFLAGS = -g -qsmp=omp:noopt