EFFICIENT DYNAMIC VERIFICATION ALGORITHMS FOR MPI APPLICATIONS. Dissertation Defense Sarvani Vakkalanka. Committee: Prof. Ganesh Gopalakrishnan (advisor), Prof. Mike Kirby (co-advisor), Prof. Suresh Venkatasubramanian, Prof. Matthew Might, Prof. Stephen Siegel (Univ. of Delaware).
Necessity for Verification • Software testing is ad hoc. • Software errors are expensive – $59.5 billion/yr (2002 NIST study). • Software written today is complex and builds on many existing libraries. • Our focus – contribute to • Parallel scientific software written using MPI
Motivation • Concurrent software debugging is hard! • Very little formal support for Message Passing concurrency. • Active testing (schedule enforcement) is important. • Reducing redundant (equivalent) verification runs is crucial. • Verification for portability – another important requirement.
Approaches to Verification • Testing methods suffer from bug omissions. • Static-analysis-based methods generate many false alarms. • Model-based verification is tedious. • Dynamic verification – no false alarms.
Contributions • New dynamic verification algorithms for MPI. • New Happens-Before models for Message Passing concurrency. • Verification that handles resource dependency. • MPI dynamic verification tool ISP that handles non-trivial codes for safety properties.
Agenda • Intro to Dynamic Verification • Intro to MPI • Four MPI Operations (S, R, W, B). • MPI Ordering Guarantees. • Applying DPOR to MPI • Dynamic verification algorithms avoiding redundant searches and handling resource dependencies • Formal MPI Transition System • Experimental Results • Conclusions
Growing Importance of Dynamic Verification • Code is written using mature libraries (MPI, OpenMP, PThreads, …). • API calls are made from real programming languages (C, Fortran, C++). • Runtime semantics are determined by realistic compilers and runtimes. • Dynamic verification avoids having to model these details (static analysis and model-based verification can play important supportive roles).
Exponential number of TOTAL interleavings – most are EQUIVALENT – generate only the RELEVANT ones!! (Diagram: five processes P0–P4, each executing local actions such as a++ and b--, plus two conflicting writes g=2 and g=3. TOTAL > 10 billion interleavings!!)
Dynamic Partial Order Reduction (Same five-process diagram: only the writes g=2 and g=3 are dependent actions; all other actions are pairwise independent. Only these 2 interleavings are RELEVANT!!!)
DPOR • A state σ maintains the following sets: • enabled(σ) • backtrack(σ): a sufficient subset of enabled(σ) to explore. • If backtrack(σ) = enabled(σ) at every state, the full state space is explored. • DPOR also requires: • Co-enabledness of transitions • Dependence among transitions
Co-enabledness & Dependence (Diagram: two transitions t1 and t2 with candidate backtrack sets {t1}, {t2}, and {t1, t2}.)
DPOR Concepts • DPOR requires the identification of dependence and co-enabledness among transitions. • Identifying dependence is simple: • Two lock accesses on the same mutex. • Two writes to the same global variable. • Similar concepts apply to MPI. • Identifying co-enabledness is difficult (i.e., whether two transitions can ever execute in parallel).
Illustration of DPOR Concepts
P1: lock(l); x = 1; x = 2; unlock(l)
P2: lock(l); y = 1; x = 2; unlock(l)
Thread Verification vs MPI Verification • Thread verification – well studied! • Well-known dynamic verification tools exist for threads [CHESS, INSPECT]. • Thread verification follows traditional dynamic partial order reduction. • MPI verification – not so! • DPOR does not extend directly to MPI. • Dependence must be formally defined. • MPI has out-of-order completion semantics.
The Ubiquity of MPI IBM Blue Gene (Picture Courtesy IBM) LANL's Petascale machine "Roadrunner" (AMD Opteron CPUs and IBM PowerX Cell) • The choice for ALL large-scale parallel simulations (earthquake, weather, …). • Runs "everywhere". • Very mature codes exist in MPI – tens of person years. • Performs critical simulations in science and engineering.
Overview of Message Passing Interface (MPI) API • One of the major Standardization Successes. • Lingua franca of Parallel Computing • Runs on parallel machines of a WIDE range of sizes • Standard is published at www.mpi-forum.org • MPI 2.0 includes over 300 functions
MPI Execution Environment • The MPI execution environment consists of two main components: • the MPI processes, and • the MPI runtime daemon. • All processes are created statically. • Each process has a rank between 0 and n-1. • The MPI processes issue calls into the MPI runtime. • The MPI runtime implements and executes the MPI library.
MPI Execution Contd… • Every process starts execution with MPI_Init(int *argc, char ***argv); • MPI_Finalize() – at the end.
MPI_Isend (void *buff, …, int dest, int tag, MPI_Comm comm, MPI_Request *handle); • Abbreviated as S.
MPI_Irecv (void *buff, …, int src, int tag, MPI_Comm comm, MPI_Request *handle); • Abbreviated as R.
MPI_Wait (MPI_Request *handle, MPI_Status *status); • Abbreviated as W
MPI_Barrier (MPI_Comm comm); • Abbreviated as B. • All processes must invoke B before any can get past it.
Applying DPOR to MPI • Programs like this are almost impossible to test on real platforms.
Modifying Runtime Doesn’t Help! • Assume that the MPI runtime is modified to support verification • The sends are matched with receives in the order they are issued to the MPI runtime • Is this sufficient?
Crooked Barrier Example
P0: Isend(1, req); Barrier; Wait(req)
P1: Irecv(*, req); Barrier; Irecv(2, req1); Wait(req); Wait(req1)
P2: Barrier; Isend(1, req); Wait(req)
Verification support in the runtime does not work!
Our Main Algorithms • Partial Order avoiding Elusive Interleavings (POE). • POEOPT: reduces interleavings even further. • POEMSE: handles resource dependencies.
Illustration of POE
(Animation: the ISP scheduler sits between the processes and the MPI runtime, intercepting each MPI call and issuing it itself via dynamic "sendNext" instructions.)
P0: Isend(1, req); Barrier; Wait(req)
P1: Irecv(*, req); Barrier; Recv(2); Wait(req)
P2: Barrier; Isend(1, req); Wait(req)
POE collects operations until every process is at a fence (Barrier/Wait), then forms and fires match-sets (IntraCB edges order each process's operations). Matching Irecv(*) with P0's Isend gives one interleaving; on replay, matching it with P2's Isend leaves P1's Recv(2) with no matching send – Deadlock!
Notations • MPI_Isend: Si,j(k), where • i is the process issuing the send, • j is the dynamic execution count of S in process i, and • k is the destination process rank where the message is to be sent. • MPI_Irecv: Ri,j(k) • k is the source. • MPI_Barrier: Bi,j • MPI_Wait: Wi,j'(hi,j) • hi,j is the request handle of Si,j(k) or Ri,j(k).
POE Issue: Redundancy • POE explores both match-sets, resulting in 2 interleavings where just 1 is sufficient. • Naive solution: explore only one of the match-sets of the wildcard receive. DOES NOT WORK! It BREAKS PERSISTENCE.
POE and Persistent Sets • Add only this match-set to the backtrack set. • Maintaining persistent backtrack sets is important; otherwise the verification algorithm is broken.
POE Issue: Buffering Deadlocks • When no sends are buffered – Deadlock!
POE Issue: Redundancy • Simple optimization: if there are no more sends targeting a wildcard receive, then add only one of the match-sets to the backtrack set.
Detecting Matching • Exploring all non-deterministic matchings in a state is not a solution. • The IntraHB relation is not sufficient to detect matchings across processes. • We introduce the notion of InterHB.
Redundancy: POEOPT
P0: S0,1(1); W0,2(h0,1)
P1: R1,1(*); W1,2(h1,1); S1,3(3); W1,4(h1,3); R1,5(*); W1,6(h1,5)
P2: R2,1(*); W2,2(h2,1)
P3: S3,1(2); W3,2(h3,1); R3,3(1); W3,4(h3,3); S3,5(1); W3,6(h3,5)
Redundancy: POEOPT
P0: S0,1(1); W0,2(h0,1)
P1: R1,1(*); W1,2(h1,1); R1,3(3); W1,4(h1,3)
P2: R2,1(*); W2,2(h2,1)
P3: S3,1(2); W3,2(h3,1); S3,3(1); W3,4(h3,3)
P4: R4,1(*); W4,2(h4,1)
P5: S5,1(1); W5,2(h5,1)
NO PATH
Slack/Buffering Deadlocks • Deadlocks only when S0,1 or S1,1 (or both) are buffered.
Buffer All Sends??? • Does not work either – some deadlocks manifest only at ZERO SLACK (no buffering).