
Gauss: A Framework for Verifying Scientific Software



Presentation Transcript


  1. Gauss: A Framework for Verifying Scientific Software
  Robert Palmer, Steve Barrus, Yu Yang, Ganesh Gopalakrishnan, Robert M. Kirby (University of Utah)
  Supported in part by NSF Award ITR-0219805

  2. Motivations • “One of the simulations will run for 30 days. A Cray supercomputer built in 1995 would take 60,000 years to perform the same calculations.” • 12,300 GFLOPS • Need permission/grant to use it.

  3. Motivations • 136,800 GFLOPS • Max $10k/week on Blue Gene (180 GFLOPS) at IBM’s Deep Computing Lab

  4. Motivations • 50% of the development of parallel scientific codes is spent in debugging [Vetter and de Supinski 2000] • Programmers come from a variety of backgrounds, often not computer science

  5. Overview • What scientific programs look like • What challenges are faced by scientific code developers • How formal methods can help • The Utah Gauss project

  6. SPMD Programs • Single Program, Multiple Data • The same image runs on each node in the grid • Processes do different things based on their rank (a minimal sketch follows below) • Possible to impose a virtual topology within the program
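  A minimal SPMD sketch (not from the slides): every process runs the same program text and branches on its rank to play a different role.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv){
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank        */
      MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes  */
      if(rank == 0)
        printf("rank 0: coordinating %d processes\n", size);
      else
        printf("rank %d: worker\n", rank);
      MPI_Finalize();
      return 0;
    }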

  7. MPI Library for communication • MPI is to HPC what PThreads is to systems programming or OpenGL is to graphics • More than 60% of HPC applications use MPI libraries in some form • There are proprietary and open-source implementations • Provides both communication primitives and virtual topologies in MPI-1

  8. Concurrency Primitives
  • Point-to-point communications that
    • don't specify system buffering (but might have it in some implementations) and
      • block, or
      • don't block
    • use user-program-provided buffering (with possibly hard or soft limitations) and
      • block, or
      • don't block
  • Collective communications that
    • “can (but are not required to) return as soon as their participation in the collective communication is complete.” [MPI-1.1 Standard, pg. 93, lines 10-11]
  A sketch of these call families follows below.
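  The sketch below is not from the slides; assuming exactly two ranks that have already called MPI_Init, it illustrates the call families listed above: a standard send (which the implementation may buffer or may block until matched), a buffered send using program-attached buffering, a nonblocking send, and a collective operation.

    #include <mpi.h>
    #include <stdlib.h>

    /* Sketch of the MPI-1 point-to-point and collective call families. */
    void send_families(int rank){
      int data = rank, in = 0;
      int bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
      void *buf = malloc(bufsize);
      MPI_Status st;
      MPI_Request req;

      if(rank == 0){
        /* Standard send: may use system buffering or block until matched. */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        /* Buffered send: uses buffering the program attaches itself. */
        MPI_Buffer_attach(buf, bufsize);
        MPI_Bsend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Buffer_detach(&buf, &bufsize);

        /* Nonblocking send: returns immediately; MPI_Wait completes it. */
        MPI_Isend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &st);
      } else if(rank == 1){
        MPI_Recv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
        MPI_Recv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
        MPI_Recv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &st);
      }

      /* Collective operation: every rank in the communicator must call it. */
      MPI_Barrier(MPI_COMM_WORLD);
      free(buf);
    }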

  9. MPI Tutorial

    #include <mpi.h>
    #define CNT 1
    #define TAG 1

    int main(int argc, char ** argv){
      int mynode = 0, totalnodes = 0, recvdata0 = 0, recvdata1 = 0;
      MPI_Status status;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &totalnodes);
      MPI_Comm_rank(MPI_COMM_WORLD, &mynode);
      if(mynode % 2 == 0){
        MPI_Send(&mynode, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD);
        MPI_Send(&mynode, CNT, MPI_INT, (mynode-1+totalnodes)%totalnodes, TAG, MPI_COMM_WORLD);
      } else {
        MPI_Recv(&recvdata0, CNT, MPI_INT, (mynode-1+totalnodes)%totalnodes, TAG, MPI_COMM_WORLD, &status);
        MPI_Recv(&recvdata1, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD, &status);
      }
      MPI_Barrier(MPI_COMM_WORLD);
      if(mynode % 2 == 1){
        MPI_Send(&mynode, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD);
        MPI_Send(&mynode, CNT, MPI_INT, (mynode-1+totalnodes)%totalnodes, TAG, MPI_COMM_WORLD);
      } else {
        MPI_Recv(&recvdata0, CNT, MPI_INT, (mynode-1+totalnodes)%totalnodes, TAG, MPI_COMM_WORLD, &status);
        MPI_Recv(&recvdata1, CNT, MPI_INT, (mynode+1)%totalnodes, TAG, MPI_COMM_WORLD, &status);
      }
      MPI_Barrier(MPI_COMM_WORLD);
      MPI_Finalize();
    }

  10.–20. MPI Tutorial (P0, P1, P2, P3). Slides 10 through 20 repeat the code above while stepping through its execution on four processes: the even ranks send to both neighbors, the odd ranks receive from both neighbors, all ranks meet at the barrier, then the roles swap before the second barrier and MPI_Finalize. The per-step animation graphics are not preserved in this transcript.

  21. Why is parallel scientific programming hard? • Portability • Scaling • Performance

  22. Variety of bugs that are common in parallel scientific programs • Deadlock • Race Conditions • Misunderstanding the semantics of MPI procedures • Resource related assumptions • Incorrectly matched send/receives
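  As a concrete illustration of the deadlock bullet (a sketch not taken from the slides): with two ranks, if each process calls a blocking MPI_Send before its MPI_Recv and the implementation does not buffer standard sends, both processes block in MPI_Send and neither ever posts the matching receive.

    #include <mpi.h>

    /* Minimal send/send deadlock sketch (assumes exactly two ranks).
     * Whether it hangs depends on the implementation: if MPI_Send is
     * buffered it completes; if it is synchronous, both ranks block here.
     * Reordering one rank's calls, or using MPI_Sendrecv or nonblocking
     * sends, removes the deadlock. */
    int main(int argc, char **argv){
      int rank, other, out, in;
      MPI_Status st;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      other = 1 - rank;
      out = rank;
      MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);  /* may never return */
      MPI_Recv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &st);
      MPI_Finalize();
      return 0;
    }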

  23. State of the art in Debugging
  • TotalView
    • Parallel debugger with trace visualization
  • Parallel DBX
  • gdb
  • MPICHECK
    • Does some deadlock checking
    • Uses trace analysis

  24. Related work
  • Verification of wildcard-free models [Siegel, Avrunin, 2005]
    • Deadlock-free with zero-length buffers implies deadlock-free with buffers of length greater than zero
  • SPIN models of MPI programs [Avrunin, Siegel, Siegel, 2005] and [Siegel, Mironova, Avrunin, Clarke, 2005]
    • Compare serial and parallel versions of numerical computations for numerical equivalence

  25. Automatic Formal Analysis • Correctness could be proved by hand in a theorem prover • Don't want to spend time building models by hand • The approach should be completely automatic (intended for use by the scientific community at large)

  26. The Big Picture
  • Automatic model extraction
  • Improved static analysis
  • Model checking
    • Better partial-order reduction
    • Parallel state-space enumeration
    • Symmetry
    • Abstraction refinement
  • Integration with existing tools
    • Visual Studio
    • TotalView

  27. The Big Picture (tool-flow slide; the diagram graphics are not preserved in this transcript)

  The example MPI program on the slide:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char** argv){
      int myid;
      int numprocs;
      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      if(myid == 0){
        int i;
        for(i = 1; i < numprocs; ++i){
          MPI_Send(&i, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        }
        printf("%d Value: %d\n", myid, myid);
      } else {
        int val;
        MPI_Status s;
        MPI_Recv(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &s);
        printf("%d Value: %d\n", myid, val);
      }
      MPI_Finalize();
      return 0;
    }

  The surrounding diagram relates this MPI program and its compiled MPI binary to the verification tool chain: a hand-written MPI library model, sketched on the slide as proctypes such as proctype MPI_Send(chan out, int c){ out!c; }, proctype MPI_Bsend(chan out, int c){ out!c; }, proctype MPI_Isend(chan out, int c){ out!c; }, and a typedef MPI_Status { int MPI_SOURCE; int MPI_TAG; int MPI_ERROR; }; a small example program model with two active proctypes, T1 and T2, that race on a shared variable y and end with assert(y == 0); a Compiler + Program Model; a Model Generator + Environment Model; an Error Simulator, Abstractor, and Refinement loop; the Zing model checker with a Result Analyzer; and an MC Server fanning out to many MC Clients, reporting OK when no error is found.

  28. Environment modeling
  • C
    • Very prevalent among HPC developers
    • Want to analyze code as it is written, for performance
  • Zing has it all
    • Numeric types
    • Pointers, arrays, casting, recursion, …
    • Missing one thing only: "&", the address-of operator (not bitwise and)
  • We provide a layer that makes this possible
    • Can also track pointer arithmetic and unsafe casts
    • Also provides a variety of stubs for system calls

  29. Environment Example

    class pointer {
      object reference;
      static object addressof(pointer p){
        pointer ret;
        ret = new pointer;
        ret.reference = p;
        return ret;
      }
      …
    }

  • Data encapsulated in Zing objects makes it possible to handle additional C-isms

  30. MPI Library • MPI Library modeled carefully by hand from the MPI Specification • Preliminary shared memory based implementation • Send, Recv, Barrier, BCast, Init, Rank, Size, Finalize.

  31. Library Example

    integer MPI_Send(pointer buf, integer count, integer datatype,
                     integer dest, integer tag, integer c){
      …
      comm = getComm(c);
      atomic {
        ret = new integer;
        msg1 = comm.create(buf, count, datatype, _mpi_rank, dest, tag, true);
        msg2 = comm.find_match(msg1);
        if (msg1 != msg2) {
          comm.copy(msg1, msg2);
          comm.remove(msg2);
          msg1.pending = false;
        } else {
          comm.add(msg1);
        }
        ret.value = 0;
      }
      select {
        wait(!msg1.pending) -> ;
      }
      …
    }

  32. Model Extraction
  • Map C onto Zing (using CIL)
    • First through the C preprocessor (cpp)
  • Processes to Zing threads
  • File to Zing class
    • Structs and unions also extracted to classes
  • Integral data types to the environment layer
    • All numeric types
    • Pointer class

  33. Extraction Example

  Original C call:

    MPI_Recv(&recvdata1,CNT,MPI_INT,(mynode+1)%totalnodes,TAG,MPI_COMM_WORLD,&status);

  Extracted Zing code:

    __cil_tmp45 = integer.addressof(recvdata1);
    __cil_tmp46 = integer.create(1);
    __cil_tmp47 = integer.create(6);
    __cil_tmp48 = integer.create(1);
    __cil_tmp49 = integer.add(mynode, __cil_tmp48);
    __cil_tmp50 = integer.mod(__cil_tmp49, totalnodes);
    __cil_tmp51 = integer.create(1);
    __cil_tmp52 = integer.create(91);
    __cil_tmp53 = __anonstruct_MPI_Status_1.addressof(status);
    MPI_Recv(__cil_tmp45, __cil_tmp46, __cil_tmp47, __cil_tmp50,
             __cil_tmp51, __cil_tmp52, __cil_tmp53);

  34. Experimental Results
  • Correct example
    • 2 processes: 12,882 states
    • 4 processes: does not complete
  • Deadlock example
    • 24 processes: 2,522 states

  35. Possible Improvements • Atomic regions • Constant reuse • More formal extraction semantics

  36. Looking ahead (in the coming year)
  • Full MPI-1.1 Library Model in Zing
    • All point-to-point and collective communication primitives
    • Virtual Topologies
  • ANSI C Capable Model Extractor
    • Dependencies: CIL, GCC, CYGWIN
  • Preliminary Tool Integration
    • Error visualization and simulation
  • Textbook and validation-suite examples

  37. Looking ahead (beyond) • Better Static Analysis through • Partial-order reduction • MPI library model is intended to leverage transaction based reduction • Can improve by determining transaction independence

  38. Looking ahead (beyond) • Better Static Analysis through • Abstraction Refinement • Control flow determined mostly by rank • Nondeterministic over-approximation

  39. Looking ahead (beyond) • Better Static Analysis through • Distributed computing • Grid based • Client server based

  40. Looking ahead (beyond) • More library support • MPI-2 Library • One sided communication • PThread Library • Mixed MPI/PThread concurrency • More languages • FORTRAN • C++ • Additional static analysis techniques

  41. Can we get more performance? • Can we phrase a performance bug as a safety property? • There does not exist a communication chain longer than N • Is there a way to leverage formal methods to reduce synchronizations? • Can formal methods help determine the right balance between MPI and PThreads for concurrency?

  42. Questions?
