
Bronis R. de Supinski and Jeffrey S. Vetter
Center for Applied Scientific Computing
August 15, 2000

Umpire: Making MPI Programs Safe

Umpire
  • Writing correct MPI programs is hard
  • Unsafe or erroneous MPI programs
    • Deadlock
    • Resource errors
  • Umpire
    • Automatically detect MPI programming errors
    • Dynamic software testing
    • Shared memory implementation
Umpire Architecture

[Architecture diagram: MPI application tasks 0 through N-1 are interposed using the MPI profiling layer; each task sends transactions via Unix shared memory to the Umpire manager, which runs the verification algorithms, while the calls pass through to the MPI runtime system.]
Collection system
  • Calling task
    • Use MPI profiling layer (sketched after this list)
    • Perform local checks
    • Communicate with manager if necessary
      • Call parameters
      • Return program counter (PC)
      • Call-specific information (e.g., buffer checksum)
  • Manager
    • Allocate Unix shared memory
    • Receive transactions from calling tasks
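
A minimal sketch of what this interposition looks like in practice, assuming a hypothetical umpire_record_call helper for the local checks and the manager transaction (shown with the MPI-3 const-qualified prototype; Umpire's actual internals are not given in the slides):

#include <mpi.h>

/* Hypothetical helper: runs local checks and forwards call parameters,
   the return PC, and any call-specific data (e.g., a buffer checksum)
   to the manager over shared memory.  Stubbed out here. */
static void umpire_record_call(const char *name, const void *buf, int count,
                               MPI_Datatype datatype, int dest, int tag,
                               MPI_Comm comm)
{
    (void)name; (void)buf; (void)count; (void)datatype;
    (void)dest; (void)tag; (void)comm;
}

/* Profiling-layer wrapper: the application's MPI_Send resolves here,
   and the real implementation is reached through PMPI_Send. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    umpire_record_call("MPI_Send", buf, count, datatype, dest, tag, comm);
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}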
Manager
  • Detects global programming errors
  • Unix shared memory communication
  • History queues (see the sketch after this list)
    • One per MPI task
    • Chronological lists of MPI operations
  • Resource registry
    • Communicators
    • Derived datatypes
    • Required for message matching
  • Performs verification algorithms
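
An illustrative layout for these structures; the field and type names are my own assumptions rather than Umpire's declarations:

#include <mpi.h>

/* One entry per recorded MPI operation; entries arrive from the tasks
   as transactions and are removed once they are safely matched. */
typedef struct history_entry {
    int op;                          /* which MPI call (an enum value, say) */
    unsigned long pc;                /* return program counter of the call */
    struct history_entry *next;      /* chronological order within the queue */
} history_entry;

/* One queue per MPI task. */
typedef struct {
    history_entry *head;
    history_entry *tail;
} history_queue;

/* Registry of communicators and derived datatypes, needed so the
   manager can match messages correctly. */
typedef struct resource_record {
    MPI_Comm comm;
    MPI_Datatype datatype;
    struct resource_record *next;
} resource_record;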
Configuration Dependent Deadlock
  • Unsafe MPI programming practice
  • Code result depends on:
    • MPI implementation limitations
    • User input parameters
  • Classic example code:

Task 0          Task 1
MPI_Send        MPI_Send
MPI_Recv        MPI_Recv
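
A runnable sketch of this pattern, assuming exactly two ranks; whether it hangs depends on the message size relative to the implementation's internal buffering, which is exactly what makes it configuration dependent:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    const int n = 1 << 20;           /* illustrative size; large messages may not be buffered */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *sendbuf = malloc(n * sizeof *sendbuf);
    double *recvbuf = malloc(n * sizeof *recvbuf);
    int peer = 1 - rank;             /* assumes ranks 0 and 1 only */

    /* Both ranks send first: each MPI_Send may block until the peer
       posts its receive, and neither receive is ever reached. */
    MPI_Send(sendbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}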

Mismatched Collective Operations
  • Erroneous MPI programming practice
  • Simple example code (a runnable sketch follows this list):

Tasks 0, 1, & 2 Task 3

MPI_Bcast MPI_Barrier

MPI_Barrier MPI_Bcast

  • Possible code results:
    • Deadlock
    • Correct message matching
    • Incorrect message matching
    • Mysterious error messages
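
A sketch of the example above, assuming it is run with at least four ranks (the root and buffer are illustrative):

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != 3) {
        /* tasks 0, 1, 2: broadcast, then barrier */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);
    } else {
        /* task 3: same collectives, opposite order */
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}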
Deadlock detection
  • MPI history queues
    • One per task in Manager
    • Track MPI messaging operations
      • Items added through transactions
      • Removed when safely matched
  • Automatically detect deadlocks
    • MPI operations only
    • Wait-for graph
    • Recursive algorithm (sketched after this list)
    • Invoke when queue head changes
  • Also support timeouts
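
A sketch of a recursive wait-for-graph cycle check of the kind described here; the data layout and names are assumptions for illustration, not Umpire's code:

#define MAX_TASKS 64

/* waits_for[i][j] != 0 means the operation at the head of task i's
   history queue cannot complete until task j makes progress. */
static int waits_for[MAX_TASKS][MAX_TASKS];
static int ntasks;

/* state: 0 = unvisited, 1 = on the current path, 2 = finished */
static int visit(int task, int *state)
{
    if (state[task] == 1) return 1;      /* back edge: a cycle, i.e., deadlock */
    if (state[task] == 2) return 0;
    state[task] = 1;
    for (int j = 0; j < ntasks; j++)
        if (waits_for[task][j] && visit(j, state))
            return 1;
    state[task] = 2;
    return 0;
}

/* Invoked whenever the head of some history queue changes. */
int deadlock_detected(void)
{
    int state[MAX_TASKS] = {0};
    for (int i = 0; i < ntasks; i++)
        if (visit(i, state))
            return 1;
    return 0;
}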
Deadlock Detection Example

[Animation: tasks 0, 1, and 2 issue MPI_Bcast followed by MPI_Barrier while task 3 issues MPI_Barrier first; as each task's transactions reach the head of its history queue in the manager, the dependency check finds that the operations cannot all complete and reports the error.]

Resource Tracking Errors
  • Many MPI features require resource allocations
    • Communicators, datatypes and requests
    • Detect “leaks” automatically
  • Simple “lost request” example (filled in after this list):

MPI_Irecv (..., &req);

MPI_Irecv (..., &req);

MPI_Wait (&req,…)

  • Complicated by request handle assignment
  • Also detect errant writes to send buffers
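
A filled-in version of the lost-request pattern above, with illustrative buffers, tags, and source rank:

#include <mpi.h>

void lost_request_example(int src, MPI_Comm comm)
{
    int a, b;
    MPI_Request req;

    MPI_Irecv(&a, 1, MPI_INT, src, 0, comm, &req);
    MPI_Irecv(&b, 1, MPI_INT, src, 1, comm, &req);   /* overwrites req: the first request is lost */
    MPI_Wait(&req, MPI_STATUS_IGNORE);               /* completes only the second receive */
}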
Conclusion
  • First automated MPI debugging tool
    • Detects deadlocks
    • Eliminates resource leaks
    • Assures correct non-blocking sends
  • Performance
    • Low overhead (21% for sPPM)
    • Located a deadlock in code set-up
  • Limitations
    • MPI_Waitany and MPI_Cancel
    • Shared memory implementation
    • Prototype only
Future Work
  • Further prototype testing
  • Improve user interface
  • Handle all MPI calls
  • Tool distribution
    • LLNL application group testing
    • Exploring mechanisms for wider availability
  • Detection of other errors
    • Datatype matching
    • Others?
  • Distributed memory implementation

UCRL-VG-139184

Work performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory, under Contract W-7405-Eng-48.