Presentation Transcript


  1. FT-MPI: Fault Tolerant MPI. Brian Alexander, CSS 534, Spring 2014

  2. The problem
  • Every member of MPI_COMM_WORLD is expected to complete its task. If one process fails, then all communicators that include that process become invalid.
  • In most MPI implementations there is no way to recover from a failed task.
  • With only a few nodes, failures are relatively rare and easy to recover from. With hundreds or thousands of nodes the problem is much bigger.
  [Slide diagram: four user processes (ranks 0-3) synchronize at the Init() barrier, exchange data with Send()/Recv(), and meet again at the Finalize() barrier; a crash of any one process at any point brings the whole job down.]
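  For context, a bare MPI program has nothing to fall back on here: the default error handler on MPI_COMM_WORLD is MPI_ERRORS_ARE_FATAL, so one failure aborts every rank, and even switching to MPI_ERRORS_RETURN only yields an error code, not a usable communicator. A minimal sketch (the ring exchange is illustrative, not taken from the slide):

    /* Plain MPI offers no recovery path: by default (MPI_ERRORS_ARE_FATAL)
     * any failure aborts all ranks; with MPI_ERRORS_RETURN you merely get
     * an error code back while the communicator is left unusable.        */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, out, in, rc;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Ask MPI to return error codes instead of aborting immediately. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        out = rank;
        rc = MPI_Sendrecv(&out, 1, MPI_INT, (rank + 1) % size, 0,
                          &in,  1, MPI_INT, (rank + size - 1) % size, 0,
                          MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rc != MPI_SUCCESS)
            fprintf(stderr, "rank %d: exchange failed (code %d); "
                            "standard MPI gives no way to repair the job\n",
                    rank, rc);

        MPI_Finalize();
        return 0;
    }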

  3. The (not-optimal) solution
  • The whole system needs to be checkpointed so that if a process fails, the program can restart without losing all progress.
  • This hurts performance because the checkpoint operation isn't free, and the ranks need to be synchronized and message buffers cleared before the checkpoint.
  [Slide diagram: ranks 0-3 pass the Init() barrier, call Checkpoint() after their Sends, and when one rank crashes during Recv() the job restarts from the checkpoint and still reaches the Finalize() barrier.]
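  As a rough illustration of what such a coordinated checkpoint involves, here is a sketch in C. The file naming, state layout, and the use of a barrier as the synchronization step are assumptions for illustration, not part of any particular checkpointing library:

    /* Sketch of a coordinated application-level checkpoint: every rank
     * synchronizes, then writes its own state to disk.  The barrier stands
     * in for the quiescing step the slide mentions; a real protocol must
     * also make sure no messages are still in flight.  File naming and
     * state layout are invented for illustration.                        */
    #include <mpi.h>
    #include <stdio.h>

    void checkpoint(MPI_Comm comm, const double *state, int n, int step)
    {
        int rank;
        char path[64];
        FILE *f;

        MPI_Comm_rank(comm, &rank);

        MPI_Barrier(comm);                 /* all ranks stop computing here */

        snprintf(path, sizeof path, "ckpt_step%d_rank%d.bin", step, rank);
        f = fopen(path, "wb");
        if (f != NULL) {
            fwrite(state, sizeof(double), (size_t)n, f);
            fclose(f);
        }

        MPI_Barrier(comm);                 /* resume only once every rank is done */
    }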

  4. What is FT-MPI?
  • Developed in the late 90s/early 2000s as part of the HARNESS (Heterogeneous Adaptable Reconfigurable Networked Systems) project funded by the Department of Energy.
  • It's an implementation of MPI that allows for user-level recovery after a process has failed.
  • Fault recovery is done without affecting the performance of MPI as a whole (bandwidth versus message size is as good as in other MPI implementations).
  From http://web.eecs.utk.edu/~dongarra/lyon2002/Fagg.ppt

  5. FT-MPI solution
  • Instead of having only two states for processes (OK, FAILED), FT-MPI adds a third (OK, PROBLEM, FAILED) and gives recovery options to the user.
  • Instead of crashing, FT-MPI can:
    • "Blank" (ignore) the failed process.
    • Shrink the size of MPI_COMM_WORLD by removing the failed process.
    • Rebuild the failed processes, or rebuild all processes (like starting over from a checkpoint).
  • FT-MPI doesn't handle data or process recovery; that has to be done by the user.
  [Slide diagram: ranks 0-3 pass the Init() barrier and exchange data with Send()/Recv(); when one rank fails, the survivors see ProcessFailed(), call Rebuild() and Recover, and continue to the Finalize() barrier.]
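  Whichever mode is chosen, the membership of MPI_COMM_WORLD can look different afterwards (shrinking renumbers ranks, rebuilding brings in fresh processes with empty memory), so a recovered application typically re-queries its rank and size and then restores its own data. A small sketch using only standard MPI calls; redistribute_work is a hypothetical user routine standing in for that user-level data recovery:

    /* After recovery the communicator's membership may have changed, and
     * FT-MPI restores communication only, not application data.          */
    #include <mpi.h>

    void redistribute_work(MPI_Comm comm, int rank, int size);  /* hypothetical */

    void after_recovery(MPI_Comm world)
    {
        int new_rank, new_size;

        /* Rank and size must be re-queried: shrinking renumbers ranks and
         * rebuilding replaces failed processes with fresh ones.          */
        MPI_Comm_rank(world, &new_rank);
        MPI_Comm_size(world, &new_size);

        /* User-level recovery: reload a checkpoint, recompute, or hand
         * the lost portion of the work to the surviving/new ranks.       */
        redistribute_work(world, new_rank, new_size);
    }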

  6. FT-MPI Recovery
  Normal startup (all ranks): MPI_Init(…) → Install Error Handler & Set LongJMP → SolveProblem(…) → MPI_Finalize(…)
  On a failure, surviving ranks: ErrorHandler(…) → Recover() → Make LongJMP (back to the point set just after MPI_Init)
  Recovery (failed rank): MPI_Init(…) → I Am Recovered → Recover() → SolveProblem(…) → MPI_Finalize(…)
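  The flow above maps onto the classic error-handler-plus-setjmp structure. The sketch below uses only standard MPI-1 calls for the handler installation; the actual communicator repair and the "I am recovered" check after MPI_Init are FT-MPI-specific and are only indicated in comments, so treat this as an outline of the pattern rather than working FT-MPI code:

    /* Outline of the recovery flow: install an error handler and a setjmp
     * point after MPI_Init; on a peer failure the handler runs, performs
     * recovery, and longjmps back so SolveProblem can be re-entered.      */
    #include <mpi.h>
    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf restart_point;

    /* Called by MPI on surviving ranks when a communication error
     * (e.g. a failed peer) is detected.                                   */
    static void recover_handler(MPI_Comm *comm, int *errcode, ...)
    {
        fprintf(stderr, "error %d detected, recovering\n", *errcode);
        /* FT-MPI-specific: trigger the chosen mode (blank/shrink/rebuild)
         * on MPI_COMM_WORLD here before resuming.                         */
        longjmp(restart_point, 1);         /* "Make LongJMP" in the diagram */
    }

    static void solve_problem(int restarted)
    {
        /* User code; if 'restarted' is set, reload or recompute lost data
         * first, since FT-MPI does not restore application state.         */
        (void)restarted;
    }

    int main(int argc, char **argv)
    {
        MPI_Errhandler handler;
        int restarted = 0;

        MPI_Init(&argc, &argv);
        /* A rank respawned by FT-MPI would detect "I am recovered" at this
         * point and take its Recover() path (FT-MPI-specific, not shown).  */

        /* MPI-1 style handler installation (FT-MPI targets MPI 1.2). */
        MPI_Errhandler_create(recover_handler, &handler);
        MPI_Errhandler_set(MPI_COMM_WORLD, handler);

        if (setjmp(restart_point) != 0)    /* the "Set LongJMP" point */
            restarted = 1;                 /* we returned here after recovery */

        solve_problem(restarted);

        MPI_Finalize();
        return 0;
    }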

  7. FT-MPI limitations
  • The implementation was created in the early 2000s based on the MPI 1.2 spec (with parts of the MPI-2 spec) and hasn't been updated.
  • Some MPI implementations (like OpenMPI) claim to include some of FT-MPI's features, but they generally don't work. For these, checkpointing is the most commonly used method of fault tolerance.
  • The implementation is at the MPI level. If you're working in a system that already includes MPI at a lower level, you may need to rewrite some of that lower-level code to handle the new error states.
  • FT-MPI is built into HARNESS and also works with PVM (Parallel Virtual Machine), but if you don't want to use HARNESS it may be more complicated to use FT-MPI.

  8. Recommendations
  • For small, simple problems:
    • Checkpointing and the fault-tolerance features in MPI implementations like OpenMPI should be good enough.
    • Use checkpointing if your program is long-running enough that a crash would cause considerable lost data or time.
  • For large-scale or grid systems:
    • If it makes sense for your problem to recover on the fly and you can do so without losing data, then you may want to use FT-MPI.
    • Also make sure you don't need any features beyond the MPI 1.2 spec before committing to FT-MPI.
    • If map-reduce makes sense for the problem, use Hadoop.

  9. Resources and references
  • http://icl.cs.utk.edu/harness/
  • http://icl.cs.utk.edu/graphics/posters/files/FT-MPI-2006.pdf
  • http://web.eecs.utk.edu/~dongarra/lyon2002/Fagg.ppt
  • Fagg, G. E., & Dongarra, J. (2000). FT-MPI: Fault tolerant MPI, supporting dynamic applications in a dynamic world. In Proceedings of the 7th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface (pp. 346-353). Retrieved from http://dl.acm.org/citation.cfm?id=648137.746632
  • Gropp, W., & Lusk, E. (2004). Fault tolerance in message passing interface programs. International Journal of High Performance Computing Applications, 18(3), 363-372.
  • HARNESS: http://www.csm.ornl.gov/harness/
