FT-MPI

Fault Tolerant MPI

Brian Alexander

CSS 534, Spring 2014


The problem

  • Every member of MPI_COMM_WORLD is expected to complete its task. If one process fails, then all communicators that include that process become invalid.

  • In most MPI implementations there’s no way to recover from a failed task.

  • With only a few nodes, failures are relatively rare and easy to recover from. With hundreds or thousands of nodes, a failure somewhere in the job becomes much more likely.
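
A rough model shows why scale matters. Assuming each node fails independently with probability p during a run (an assumption made for illustration, not a figure from the presentation), the chance that at least one of n nodes fails is 1 - (1 - p)^n. The helper names below are hypothetical:

```c
#include <stdio.h>

/* Probability that at least one of n nodes fails during a run, assuming
 * each node fails independently with probability p (illustrative model). */
double job_failure_probability(double p, int n)
{
    double all_survive = 1.0;
    for (int i = 0; i < n; i++) {
        all_survive *= (1.0 - p);   /* every node must survive */
    }
    return 1.0 - all_survive;
}

/* Print the failure probability for a few cluster sizes. */
void print_failure_table(double p)
{
    const int sizes[] = {4, 100, 1000};
    for (int i = 0; i < 3; i++) {
        printf("%5d nodes: %.3f\n", sizes[i],
               job_failure_probability(p, sizes[i]));
    }
}
```

With p = 0.001, the model gives roughly a 0.4% chance of a failure on 4 nodes and about 63% on 1000 nodes.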

[Diagram: four user processes, ranks 0 to 3. Labels, in order: Init( ): barrier, Send( ), Data Processing, Recv( ), Crash (marked on all four ranks), Finalize( ): barrier.]


The (not-optimal) solution

  • The whole system needs to be checkpointed so that if a process fails, the program can restart without losing all progress.

  • This hurts performance: the checkpoint operation isn’t free, and the ranks need to be synchronized and message buffers cleared before each checkpoint.
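
A minimal sketch of the save and restore halves of a checkpoint, assuming the application state fits in one small struct. `AppState`, `save_checkpoint`, and `load_checkpoint` are hypothetical names. In a real MPI program each rank would synchronize first (for example with MPI_Barrier) so that all ranks checkpoint at a consistent point; that step is left out here so the sketch compiles without MPI.

```c
#include <stdio.h>

/* Hypothetical application state; a real program would save whatever
 * it needs to resume (iteration counters, partial results, and so on). */
typedef struct {
    int    iteration;
    double partial_sum;
} AppState;

/* Write the state to a checkpoint file. Returns 0 on success, -1 on error. */
int save_checkpoint(const char *path, const AppState *state)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) {
        return -1;
    }
    size_t written = fwrite(state, sizeof *state, 1, f);
    if (fclose(f) != 0 || written != 1) {
        return -1;
    }
    return 0;
}

/* Read the state back. Returns 0 on success, -1 if no usable checkpoint exists. */
int load_checkpoint(const char *path, AppState *state)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    size_t read_count = fread(state, sizeof *state, 1, f);
    fclose(f);
    return (read_count == 1) ? 0 : -1;
}
```

On restart, a program would call `load_checkpoint` first and fall back to a fresh start when it returns -1.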

[Diagram: four user processes, ranks 0 to 3. Labels, in order: Init( ): barrier, Send( ), Checkpoint( ), Data Processing, Recv( ), Crash, Finalize( ): barrier.]


What is FT-MPI?

  • Developed in the late 1990s/early 2000s as part of the HARNESS (Heterogeneous Adaptable Reconfigurable Networked Systems) project, funded by the Department of Energy.

  • It’s an implementation of MPI that allows for user-level recovery after a process has failed.

  • Fault recovery is done without affecting the performance of MPI as a whole (bandwidth/message size is as good as other MPI implementations).

From http://web.eecs.utk.edu/~dongarra/lyon2002/Fagg.ppt


FT-MPI solution

  • Instead of having only two states for processes (OK, FAILED), FT-MPI adds a third (OK, PROBLEM, FAILED) and gives recovery options to the user.

  • Instead of crashing, FT-MPI can:

    • “Blank” (ignore) the failed process.

    • Shrink the size of MPI_COMM_WORLD by removing the failed process

    • Rebuild failed processes or rebuild all processes (like starting from a checkpoint)

  • FT-MPI doesn’t handle data or process recovery. That has to be done by the user.
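
The three states and the three recovery options can be pictured with a toy model. This is a sketch under stated assumptions, not FT-MPI's API: `ProcState`, `RecoveryMode`, `ToyComm`, and `toy_recover` are names invented here, and the communicator is just an array of per-rank states.

```c
#include <stddef.h>

/* The three process states named on the slide. */
typedef enum { PROC_OK, PROC_PROBLEM, PROC_FAILED } ProcState;

/* Recovery choices named on the slide (names are illustrative, not FT-MPI's). */
typedef enum { MODE_BLANK, MODE_SHRINK, MODE_REBUILD } RecoveryMode;

/* Toy communicator: ranks[i] is the state of rank i. */
typedef struct {
    ProcState ranks[16];
    size_t    size;
} ToyComm;

/* Apply a recovery mode in place and return the new communicator size. */
size_t toy_recover(ToyComm *comm, RecoveryMode mode)
{
    size_t kept = 0;
    switch (mode) {
    case MODE_BLANK:
        /* Leave failed ranks as holes; size and numbering are unchanged. */
        break;
    case MODE_SHRINK:
        /* Drop failed ranks and renumber the survivors contiguously. */
        for (size_t i = 0; i < comm->size; i++) {
            if (comm->ranks[i] != PROC_FAILED) {
                comm->ranks[kept++] = comm->ranks[i];
            }
        }
        comm->size = kept;
        break;
    case MODE_REBUILD:
        /* Replace failed ranks with fresh processes at the same rank. */
        for (size_t i = 0; i < comm->size; i++) {
            if (comm->ranks[i] == PROC_FAILED) {
                comm->ranks[i] = PROC_OK;
            }
        }
        break;
    }
    return comm->size;
}
```

In the rebuild case the replacement process starts with no application data, which is why the user still has to restore it.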

[Diagram: four user processes, ranks 0 to 3. Labels, in order: Init( ): barrier, Send( ), Data Processing, Recv( ), ProcessFailed(), Rebuild( ), Recover, Finalize( ): barrier.]


FT-MPI Recovery

Normal startup (all ranks):

  • MPI_Init(…)

  • Install Error Handler & Set LongJMP

  • SolveProblem(…)

    • On error: ErrorHandler(…) → Recover() → Make LongJMP

  • MPI_Finalize(…)

Recovery (failed rank):

  • MPI_Init(…)

  • I Am Recovered → Recover()

  • SolveProblem(…)

  • MPI_Finalize(…)
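
The recovery flow above relies on C's setjmp/longjmp: the application saves a jump target before solving, and the error handler jumps back to it once recovery is done. A minimal sketch of that control flow, with no MPI dependency. The names `recover`, `error_handler`, `solve_problem`, and `run_with_recovery` are stand-ins invented here, and the failure is triggered manually rather than by an MPI library.

```c
#include <setjmp.h>

static jmp_buf restart_point;      /* where the error handler jumps back to */
static int     recover_calls = 0;  /* counts how often recovery ran */
static int     failures_left = 0;  /* simulated failures still to happen */

/* Stand-in for user recovery code (restoring data, rebuilding communicators). */
static void recover(void)
{
    recover_calls++;
}

/* Stand-in for the error handler the MPI library would invoke on a failure. */
static void error_handler(void)
{
    recover();
    longjmp(restart_point, 1);     /* resume at the setjmp below */
}

/* Hypothetical solver: hits a simulated failure until none are left. */
static void solve_problem(void)
{
    if (failures_left > 0) {
        failures_left--;
        error_handler();           /* does not return */
    }
}

/* Mirrors the normal-startup path: set the jump target, then solve.
 * Returns the number of times recovery ran. */
int run_with_recovery(int simulated_failures)
{
    failures_left = simulated_failures;
    recover_calls = 0;

    if (setjmp(restart_point) != 0) {
        /* Arrived here via longjmp after a recovery; fall through and retry. */
    }
    solve_problem();
    return recover_calls;
}
```

The counters are file-scope statics on purpose: non-volatile local variables changed between setjmp and longjmp have indeterminate values after the jump.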


FT-MPI limitations

  • Implementation was created in the early 2000s based on the MPI 1.2 spec (with parts of the MPI-2 spec) and hasn’t been updated.

  • Some MPI implementations (like OpenMPI) claim to include some of FT-MPI’s features, but those features generally don’t work. For these implementations, checkpointing is the most commonly used method of fault tolerance.

  • The implementation is at the MPI level. If you’re working in a system that already includes MPI at a lower level, you may need to rewrite some of the lower-level code to handle the new error states.

  • FT-MPI is built into HARNESS and also works with PVM (Parallel Virtual Machine), but if you don’t want to use HARNESS it may be more complicated to use FT-MPI.


Recommendations

  • For small, simple problems:

    • Checkpointing and the fault-tolerance features in MPI implementations like OpenMPI should be good enough.

    • Use checkpointing if your program is long-running enough that a crash would cost considerable data or time.

  • For large-scale or grid systems:

    • If it makes sense for your problem to be able to recover on the fly and you can do so without losing data, then you may want to use FT-MPI.

    • Also make sure you don’t need any features outside of MPI 1.2 before you choose FT-MPI.

    • If map-reduce makes sense for the problem, use Hadoop.


Resources and references

  • http://icl.cs.utk.edu/harness/

  • http://icl.cs.utk.edu/graphics/posters/files/FT-MPI-2006.pdf

  • http://web.eecs.utk.edu/~dongarra/lyon2002/Fagg.ppt

  • Fagg, G. E., & Dongarra, J. (2000). FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. Paper presented at the Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 346-353. Retrieved from http://dl.acm.org/citation.cfm?id=648137.746632

  • Gropp, W., & Lusk, E. (2004). Fault tolerance in message passing interface programs. International Journal of High Performance Computing Applications, 18(3), 363-372.

  • HARNESS: http://www.csm.ornl.gov/harness/

