1 / 25

Thread Contracts for Safe Parallelism

Thread Contracts for Safe Parallelism. Rajesh Karmani , P Madhusudan , Brandon Moore University of Illinois at Urbana-Champaign PPoPP 2011. NSF. What’s the problem? . Data-races in parallel programs is just plain wrong. For example, C++ gives no semantics for programs with races!

Download Presentation

Thread Contracts for Safe Parallelism

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Thread Contracts for Safe Parallelism Rajesh Karmani, P Madhusudan, Brandon Moore University of Illinois at Urbana-Champaign PPoPP 2011 NSF

  2. What’s the problem? Data-races in parallel programs is just plain wrong. • For example, C++ gives no semantics for programs with races! But no mechanism currently for building large data-race-free programs, without significantly changing the programming style Proposal of this paper: • An annotation mechanism for data-parallel programs that helps building large data-race-free programs • Annotation augments existing code • Annotation  Race-freedom (automatically checked) • Program  Annotation (tested)

  3. Data-race and language semantics “simultaneous” accesses to a memory location by two different threads, where one of them is a write. • Memory models for programming languages • Specifies what exactly will be ensured for a read instruction in the program • Java memory model [Manson et al., POPL ‘05] • A complex , buggy model for programs with races • C++ (new version) [Boehm, Adve, PLDI ’08] • No semantics for programs with races! Consensus – Data-Race-Free (DRF) Guarantee: Memory model assures that Data-race-free programs  sequentially consistent

  4. ACCORD (Annotations for Concurrent Co-ORDination) • Light-weight annotation that help develop parallel software • Formally express the coordination strategy that the programmer has in mind • Check the strategy (for data-race-freedom) and its implementation • Annotations have been successful in avoiding memory errors in sequential programs • Eiffel, JML, Microsoft Code Contracts, Cofoja • NOT a type system to express memory regions or program synchronization

  5. In this work… Annotations for data race freedom in • data-parallel programs that access arrays, vectors, matrices • with fork-join synchronization and locks • Typical for OpenMP applications Annotations express the sharing strategy

  6. Illustration: Parallel Matrix Multiplication void mm (int[m,n] A, int[n,p] B) { for (inti:=0; i < m; i := i + 1) for (int j:=0; j < n; j := j + 1) foreach (int k:=0; k < p; k := k+1) { C[i,k] := C[i,k] + (A[i,j] * B[j,k]) ; } } • foreach loop: • semantically creates p threads; one for each iteration of the loop • implicit barrier at end of foreach loop

  7. Illustration: Parallel Matrix Multiplication ACCORDannotation void mm (int[m,n] A, int[n,p] B) { for (inti:=0; i < m; i := i + 1) for (int j:=0; j < n; j := j + 1) foreach (int k:=0; k < p; k := k+1) reads A[i,j], B[j,k], C[i,k] writes C[i,k] { C[i,k] := C[i,k] + (A[i,j] * B[j,k]) ; } } Annotation specifies the readand write sets for kth thread, for each k. Sets can be an over-approximation. Annotation utilizes currentvalues of the program variables (i,j) and thread-id (k)

  8. Fully Parallel Matrix Multiplication void mm ( int [m,n] A, int[n,p] B ) { foreach (inti:=0; i < m; i := i + 1) readsA[i,$x], B[$x, $y], C[i, $y] writes C[i, $y] where 0 <=$x and $x< n and 0 <= $y and $y <p { for (int j:=0; j < n; j := j + 1) foreach (int k:=0; k < p ; k := k+1) reads A[i,j], B[j,k], C[i,k] writes C[i,k] { C[i,k] := C[i,k] + (A[i,j] * B[j,k]) ; } } Aux variables ($x,$y) are quantified over specified ranges whereclause: arithmetic and logical constraints

  9. Checking data-race-freedom Given a program with an ACCORD annotation, program has no data-races iff: I. ACCORD annotations imply race-freedom? • Check the strategy, independent of the program II. Program satisfies the ACCORD annotations? • Check program implementation against strategy

  10. Task I: Annotations imply race-freedom? • Annotation aLogical Formula alpha (automatically) • alpha is satisifiableiff annotations allow a race • Is alpha satisfiable? Ask an SMT solver Thread 0 Thread 1 foreach (inti=0; i < 2; i++) writes A[$x] where ($x % 2 = i) Can two threads i1 and i2 write to the same position in A[] ? • i1, i2, x1, x2.(i1  i2  x1%2 = i1  x2%2 = i2  x1 = x2) Write to the Same index Thread ids are different Constraints on $x

  11. SMT solver: What is that? • Satisfiability Modulo Theory solvers • Solvers for logical and arithmetic constraints • Extremely fast automatic decision procedures for logic developed in the verification community • We use the Z3 SMT Solver from Microsoft Research

  12. Buggy Parallel Matrix Multiplication void mm ( int [m,n] A, int [n,p] B ) { foreach (inti:=0; i < m; i := i + 1) reads A[i,x], B[x,y], C[i,y] writes C[i,y] where 0 <=x and x< n and 0 <= y and y <p foreach (int j:=0; j < n; j := j + 1) reads A[i,j], B[j,$y], C[i,$y] writes C[i,$y] where 0 <= $y and $y <p foreach (int k:=0; k < p ; k := k+1) reads A[i,j], B[j,k], C[i,k] writes C[i,k]{ C[i,k] := C[i,k] + (A[i,j] * B[j,k]) ; } } • j1, j2 (j1  j2  0  y  y < p  i=i  y=y) • Satisfiable; hence annotation does not imply race-freedom

  13. Task II: Program satisfies annotations? • Does a program P satisfy its ACCORDannotations? • Transformation (automatic): Program P with ACCORD annotations  Program P’ with assertions • P’ checks whether annotations hold at run-time

  14. Task II: Program satisfies annotations? void mm ( int [m,n] A, int[n,p] B ) { foreach (inti:=0; i < m; i := i + 1) // reads A[i,$x], B[$x, $y], C[i, $y] writes C[i, $y] // where 0 <=$x and $x< n and 0 <= $y and $y <p { n’ := n; p’ := p; for (int j:=0; j < n; j := j + 1) for(int k:=0; k < p ; k := k+1){ assert ( i=i and 0 <= j and j < n’); // A[i,$x] assert ( 0 <= j and j < n’ and 0 <= k and k < p’); // B[$x,$y] assert ( i=i and 0 <= k and k < p’); // C[i, $y] C[i,k] := C[i,k] + (A[i,j] * B[j,k]) ; } } Memoize variables Convert annotation to assertions

  15. is P + ACCORD data-race-free? TASK I Yes Transformer I SMT Solver (Z3) Formula ACCORDAnnotation No P No Transformer II Runtime Testing P’ with Assertions TASK II

  16. Successive Over-Relaxation Checker board pattern -Phase Green: write to green elements, read blue elements, in parallel -Barrier -Phase Blue: write to blue elements, read green elements, in parallel

  17. Successive Over-Relaxation // Phase Green: write to greens, read from blues foreach (id:=0; id < m; id := id + 1) readsw, A[id,$j], A[id-1,$j], A[id+1,$j], A[id,$j-1], A[id,$j+1] writesA[id,$j] where0 <= $j and $j < n and ((id+$j) % 2 = 0) { for (k := 1-(id%2); k<n; k := k + 2) { A[id,k] := (1-w)*A[id,k] + w*0.25*(A[id-1,k]+ A[id+1,k]+A[id,k-1]+A[id,k+1]); } } // Phase Blue: write to blues, read from greens … Modulo arithmetic constraints

  18. Montecarlo void mc (int[nTasks] tasks) { int slice := (nTasks+nThreads-1)/nThreads; foreach (inti:=0;i<nThreads;i:=i+1) reads next under lockgl, tasks[$k] writesnext,results[$j] under lockgl where (i*slice)<= $k and $k<((i+1)*slice) and 0 <= $j and $j < nTasks { intilow := i*slice; intiupper := (i+1)*slice; if (i = nThreads-1) iupper := nTasks; for(int run:=ilow; run<iupper; run:=run+1) { int result := simulate(tasks[run]); synchronized (gl) { next := next + 1; results[next] := result; } } } } Annotations for expressing accesses protected by a lock Threads read, write to a global location by first acquiring a lock

  19. Quicksort void qsort(int[n] A, inti, int j) reads i, j writes A[$k] where (i <= $k and $k < j) { if (j-i < 2) return; int pivot := A[i]; //first element intp_index := partition(A, pivot, i, j); // swap 1st element and element at p_index par requires p_index >= i and p_index < j { thread writes A[$k] where (i<=$k and $k<p_index) { qsort(A, i, p_index); } with thread writes A[$k] where (p_index<$k and $k<j) { qsort(A, p_index + 1, j); } } }

  20. Sparse Matrix Vector Multiplicationsparsematmult-jgf writes SparseMatmult.yt[row[$j]] where $j >= low[t-id] and $j <= high[t-id] requires forall$i1, $i2,$x, $y. ($i1 != $i2 and low[$i1] <= $x and $x <= high[$i1] and low[$i2] <= $y and $y <= high[$i2] implies row[$x] != row [$y]) Thread 0 Low Thread 1 High SparseMatmult.yt Row

  21. Experience with ACCORD QF_LIA – Quantifier Free Linear Integer Arithmetic, QF_NIA – Quantifier Free Non-linear Integer Arithmetic, QF_UFLIA+MA –Quantifier Free Linear Integer Arithmetic with Uninterpreted Functions and MultiplicationAxioms, AUFLIA – Arrays and Linear Integer Arithmetic with Uninterpreted Functions

  22. Experience with ACCORD (Spec OMP) QF_LIA – Quantifier Free Linear Integer Arithmetic, QF_NIA – Quantifier Free Non-linear Integer Arithmetic, QF_UFLIA+MA –Quantifier Free Linear Integer Arithmetic with Uninterpreted Functions and MultiplicationAxioms, AUFLIA – Arrays and Linear Integer Arithmetic with Uninterpreted Functions

  23. Experience with ACCORD (Spec OMP) • Used the ACCORD methodology to discover long-unknown and erroneous data races in applu_l benchmark in Spec OMP2001 suite (v3.1c) • Recovered the sharing strategy of complex parallel fma3d benchmark from Spec OMP2001 suite (v3.1c) • Summarized and tested the sharing strategy of 10k+ lines using 7 lines of annotation

  24. Applications • Serve as formal specification and documentation of data sharing strategy among parallel threads • Incrementally parallelizing sequential code • May enable runtime and hardware simplification (DeNovo project at Illinois, other architectures with simple or no cache-coherence) • Safe and efficient sharing of large data among Actors and MPI processes on shared memory

  25. Take home message “Thread Contracts for Safe Parallelism,” Rajesh Karmani, P Madhusudan, Brandon Moore. • ACCORD: a light-weight, formal annotation language that allows programmers to document the sharing strategy among threads • Reduces the task of checking race-freedom to two simpler problems of constraint satisfaction and program testing • Does NOT change the style of programming • Only for fork-join parallelism on arrays/matrices now • Future work: • Combine Static regions of dynamic heaps (See work on DPJ) and Dynamic regions (ACCORD) • Support aliasing and richer synchronization idioms • Help in efficiently testing parallel programs

More Related