EFFICIENT DYNAMIC VERIFICATION ALGORITHMS FOR MPI APPLICATIONS. Dissertation Defense Sarvani Vakkalanka. Committee: Prof. Ganesh Gopalakrishnan (advisor), Prof. Mike Kirby (co-advisor), Prof. Suresh Venkatasubramanian, Prof. Matthew Might, Prof. Stephen Siegel (Univ. of Delaware).
Necessity for Verification • Software testing is ad hoc. • Software errors are expensive – $59.5 billion/yr (2002 NIST study). • Software written today is complex and builds on many existing libraries. • Our focus – contribute to • Parallel scientific software written using MPI
Motivation • Concurrent software debugging is hard! • Very little formal support for Message Passing concurrency. • Active testing (schedule enforcement) is important. • Reducing redundant (equivalent) verification runs is crucial. • Verification for portability – another important requirement.
Approaches to Verification • Testing methods suffer from bug omissions. • Static-analysis-based methods generate many false alarms. • Model-based verification is tedious. • Dynamic verification – no false alarms.
Contributions • New dynamic verification algorithms for MPI. • New Happens-Before models for Message Passing concurrency. • Verification that handles resource dependency. • MPI dynamic verification tool ISP that handles non-trivial codes for safety properties.
Agenda • Intro to Dynamic Verification • Intro to MPI • Four MPI Operations (S, R, W, B). • MPI Ordering Guarantees. • Applying DPOR to MPI • Dynamic verification algorithms avoiding redundant searches and handling resource dependencies • Formal MPI Transition System • Experimental Results • Conclusions
Growing Importance of Dynamic Verification • Code is written using mature libraries (MPI, OpenMP, PThreads, …). • API calls are made from real programming languages (C, Fortran, C++). • Runtime semantics are determined by realistic compilers and runtimes. • Dynamic verification avoids having to model these details (static analysis and model-based verification can play important supportive roles).
Exponential number of TOTAL interleavings – most are EQUIVALENT – generate only the RELEVANT ones!! (Diagram: five processes P0–P4, each executing local actions such as a++ and b--, plus two conflicting writes g=2 and g=3. TOTAL > 10 billion interleavings!!)
Dynamic Partial Order Reduction (Same five-process diagram: only the writes g=2 and g=3 are dependent actions; all other actions are pairwise independent. Only these 2 interleavings are RELEVANT!!!)
DPOR • A state σ maintains the following sets: • enabled(σ) • backtrack(σ): a sufficient subset of enabled(σ) to explore. • If backtrack(σ) = enabled(σ) at every state, the full state space is explored. • DPOR also requires: • Co-enabledness of transitions • Dependence among transitions
Co-enabledness & Dependence (Diagram: two transitions t1 and t2 with candidate backtrack sets {t1}, {t2}, and {t1, t2}.)
DPOR Concepts • DPOR requires the identification of dependence and co-enabledness among transitions. • Identifying dependence is simple: • Two lock accesses on the same mutex. • Two writes to the same global variable. • Similar concepts apply to MPI. • Identifying co-enabledness is difficult (i.e., whether two transitions can ever execute in parallel).
Illustration of DPOR Concepts
P1: lock(l); x = 1; x = 2; unlock(l)
P2: lock(l); y = 1; x = 2; unlock(l)
Thread Verification vs MPI Verification • Thread verification – well studied! • Well-known dynamic verification tools exist for threads [CHESS, INSPECT]. • Thread verification follows traditional dynamic partial order reduction. • MPI verification – not so! • DPOR does not extend directly to MPI. • Dependence must be formally defined. • MPI has out-of-order completion semantics.
The Ubiquity of MPI IBM Blue Gene (Picture Courtesy IBM) LANL's Petascale machine "Roadrunner" (AMD Opteron CPUs and IBM PowerX Cell) • The choice for ALL large-scale parallel simulations (earthquake, weather, …). • Runs "everywhere". • Very mature codes exist in MPI – tens of person years. • Performs critical simulations in science and engineering.
Overview of Message Passing Interface (MPI) API • One of the major Standardization Successes. • Lingua franca of Parallel Computing • Runs on parallel machines of a WIDE range of sizes • Standard is published at www.mpi-forum.org • MPI 2.0 includes over 300 functions
MPI Execution Environment • The MPI execution environment consists of two main components: • the MPI processes, and • the MPI runtime daemon. • All processes are created statically. • Each process has a rank between 0 and n-1. • The MPI processes issue calls into the MPI runtime. • The MPI runtime implements and executes the MPI library.
MPI Execution Contd… • Every process starts execution with MPI_Init(int *argc, char ***argv); • MPI_Finalize() – at the end.
MPI_Isend (void *buff, …, int dest, int tag, MPI_Comm comm, MPI_Request *handle); • Abbreviated as S.
MPI_Irecv (void *buff, …, int src, int tag, MPI_Comm comm, MPI_Request *handle); • Abbreviated as R.
MPI_Wait (MPI_Request *handle, MPI_Status *status); • Abbreviated as W
MPI_Barrier (MPI_Comm comm); • Abbreviated as B. • All processes must invoke B before any can get past it.
Applying DPOR to MPI • Programs like this are almost impossible to test on real platforms.
Modifying Runtime Doesn’t Help! • Assume that the MPI runtime is modified to support verification • The sends are matched with receives in the order they are issued to the MPI runtime • Is this sufficient?
Crooked Barrier Example
P0: Isend(1, req); Barrier; Wait(req)
P1: Irecv(*, req); Barrier; Irecv(2, req1); Wait(req); Wait(req1)
P2: Barrier; Isend(1, req); Wait(req)
Verification support in the runtime does not work!
Our Main Algorithms • Partial Order avoiding Elusive Interleavings (POE). • POEOPT: reduces interleavings even further. • POEMSE: handles resource dependencies.
Illustration of POE
(Animation: the ISP scheduler sits between the processes and the MPI runtime, intercepting each MPI call and issuing it itself via dynamic "sendNext" instructions.)
P0: Isend(1, req); Barrier; Wait(req)
P1: Irecv(*, req); Barrier; Recv(2); Wait(req)
P2: Barrier; Isend(1, req); Wait(req)
POE collects operations until every process is at a fence (Barrier/Wait), then forms and fires match-sets (IntraCB edges order each process's operations). Matching Irecv(*) with P0's Isend gives one interleaving; on replay, matching it with P2's Isend leaves P1's Recv(2) with no matching send – Deadlock!
Notations • MPI_Isend: Si,j(k), where • i is the process issuing the send, • j is the dynamic execution count of S in process i, and • k is the destination process rank where the message is to be sent. • MPI_Irecv: Ri,j(k) • k is the source. • MPI_Barrier: Bi,j • MPI_Wait: Wi,j'(hi,j) • hi,j is the request handle of Si,j(k) or Ri,j(k).
POE Issue: Redundancy • POE explores both match-sets, resulting in 2 interleavings where just 1 is sufficient. • Naive solution: explore only one of the match-sets of the wildcard receive. DOES NOT WORK! It BREAKS PERSISTENCE.
POE and Persistent Sets • Add only this match-set to the backtrack set. • Maintaining persistent backtrack sets is important; otherwise the verification algorithm is broken.
POE Issue: Buffering Deadlocks • When no sends are buffered – Deadlock!
POE Issue: Redundancy • Simple optimization: if there are no more sends targeting a wildcard receive, then add only one of the match-sets to the backtrack set.
Detecting Matching • Exploring all non-deterministic matchings in a state is not a solution. • The IntraHB relation is not sufficient to detect matchings across processes. • We introduce the notion of InterHB.
Redundancy: POEOPT
P0: S0,1(1); W0,2(h0,1)
P1: R1,1(*); W1,2(h1,1); S1,3(3); W1,4(h1,3); R1,5(*); W1,6(h1,5)
P2: R2,1(*); W2,2(h2,1)
P3: S3,1(2); W3,2(h3,1); R3,3(1); W3,4(h3,3); S3,5(1); W3,6(h3,5)
Redundancy: POEOPT
P0: S0,1(1); W0,2(h0,1)
P1: R1,1(*); W1,2(h1,1); R1,3(3); W1,4(h1,3)
P2: R2,1(*); W2,2(h2,1)
P3: S3,1(2); W3,2(h3,1); S3,3(1); W3,4(h3,3)
P4: R4,1(*); W4,2(h4,1)
P5: S5,1(1); W5,2(h5,1)
NO PATH
Slack/Buffering Deadlocks • Deadlocks only when S0,1 or S1,1 (or both) are buffered.
Buffer All Sends??? • Does not work either – some deadlocks manifest only at ZERO SLACK (no buffering).