Inherent limitations facilitate design and verification of concurrent programs

Hagit Attiya, Technion

Concurrent Programs
• Core challenge is synchronization
• Correct synchronization is hard to get right
• Efficient synchronization is even harder
Ad-hoc vs. principled
Manual vs. automatic

Example I: Verifying locking protocols
Work with Ramalingam and Rinetzky (POPL 2010)

The Goal: Sequential Reductions
Verify concurrent data structures
• Pre-execution static analysis
E.g., linked list with hand-over-hand locking
• No memory leaks, shape (it’s a list), serializability
Find sequential reductions
• Consider only sequential executions
• But conclude that properties hold in all executions

Back-of-the-envelope estimate of gain
Static analysis of a linked-list algorithm [Amit, Rinetzky, Reps, Sagiv, Yahav, CAV 2007]
• Verifies, e.g., memory safety, sortedness, pointed-to by a variable, heap sharing

Serializability [Papadimitriou ‘79]
[Figure: an interleaved execution that, locally to each thread, matches a complete non-interleaved execution]

Serializability assists verification
Concurrent code M
Π = all executions of M
φ = a property local to the threads
cni-Π = complete non-interleaved executions of M (a small subset of Π)
• If M is serializable
• Then Π ⊨ φ ⟺ cni-Π ⊨ φ
Easily derived from [Papadimitriou ‘79]

How do we know that M is serializable without considering all executions?
E.g., from only complete non-interleaved executions
• If M is serializable
• Then Π ⊨ φ ⟺ cni-Π ⊨ φ

Special (and common) case: Disciplined programming with locks
Guard access to data with locks
• Lock() acquires the lock
• Unlock() releases the lock
Only one process holds the lock at any time
Follow a locking protocol that guarantees conflict serializability
E.g., two-phase locking (2PL) or tree locking (TL)

Two-phase locking [Papadimitriou ‘79]
• A lock-acquire (grow) phase followed by a lock-release (shrink) phase
• No lock is acquired after some lock is released
[Figure: lock history H of threads t1 and t2]
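
The two-phase rule can be checked mechanically on one thread's sequence of lock events. The following is a minimal sketch, not code from the talk: the function name `follows_2pl` and the trace encoding as `("acq" | "rel", lock)` pairs are assumptions made for illustration.

```python
def follows_2pl(trace):
    """Return True if one thread's lock events obey two-phase locking.

    `trace` is a list of (op, lock) pairs with op in {"acq", "rel"};
    this encoding is an assumption made for illustration.
    """
    released_any = False
    for op, _lock in trace:
        if op == "rel":
            released_any = True      # the shrink phase has started
        elif op == "acq" and released_any:
            return False             # acquire after some release: not two-phase
    return True


growing_then_shrinking = [("acq", "A"), ("acq", "B"), ("rel", "A"), ("rel", "B")]
acquire_after_release = [("acq", "B"), ("rel", "B"), ("acq", "A")]
```

The first trace passes the check; the second fails because lock A is acquired after lock B was released.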

Tree (hand-over-hand) locking [Kedem & Silberschatz ‘76] [Samadi ‘76] [Bayer & Schkolnick ‘77]
• Except for the first lock, acquire a lock only when holding the lock on its parent
• No lock is acquired after being released
[Figure: lock history H of threads t1 and t2]
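
The two tree-locking rules above can likewise be checked on a single thread's lock events. This is a minimal sketch under stated assumptions: the function name `follows_tree_locking`, the `("acq" | "rel", node)` trace encoding, and the `parent` dictionary are all hypothetical and not taken from the talk.

```python
def follows_tree_locking(trace, parent):
    """Check one thread's lock events against the tree (hand-over-hand) rule.

    `trace`: list of (op, node) pairs with op in {"acq", "rel"}.
    `parent`: dict mapping each node to its parent (the root maps to None).
    Both encodings are assumptions made for illustration.
    """
    held = set()
    ever_released = set()
    first = True
    for op, node in trace:
        if op == "acq":
            if node in ever_released:
                return False                  # lock re-acquired after release
            if not first and parent.get(node) not in held:
                return False                  # parent's lock is not held
            held.add(node)
            first = False
        else:  # "rel"
            held.discard(node)
            ever_released.add(node)
    return True


# A three-node list n1 -> n2 -> n3, viewed as a degenerate tree.
list_parent = {"n1": None, "n2": "n1", "n3": "n2"}
hand_over_hand = [("acq", "n1"), ("acq", "n2"), ("rel", "n1"),
                  ("acq", "n3"), ("rel", "n2"), ("rel", "n3")]
skips_parent = [("acq", "n1"), ("rel", "n1"), ("acq", "n2")]
```

Note that `hand_over_hand` acquires n3 after releasing n1, so it is a valid tree-locking trace that is not two-phase.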


void p() {
  acquire(B);
  B = 0;
  release(B);
  int b = B;
  if (b) acquire(A);
}
void q() {
  acquire(B);
  B = 1;
  release(B);
}
Not two-phase locked, but only in interleaved executions
Yes!
• For databases
• A concurrency control monitor ensures that M follows the locking policy at run-time ⇒ M is serializable
No!
• For static analysis
• No central monitor
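
The claim that p() breaks two-phase locking only in interleaved executions can be checked by brute force. The sketch below is an illustration rather than the analysis from the talk: it models p and q as Python generators, and the scheduler `run`, the request tuples, and all helper names are hypothetical.

```python
def p():
    # Each yield hands one atomic request to the scheduler below.
    yield ("acq", "B")
    yield ("write", "B", 0)
    yield ("rel", "B")
    b = yield ("read", "B")      # unguarded read of B, as on the slide
    if b:
        yield ("acq", "A")

def q():
    yield ("acq", "B")
    yield ("write", "B", 1)
    yield ("rel", "B")

def run(schedule):
    """Replay `schedule` (a tuple of thread names); None if it is infeasible."""
    mem, owner = {"A": 0, "B": 0}, {}
    threads = {"p": p(), "q": q()}
    pending = {name: next(gen) for name, gen in threads.items()}
    lock_trace = {"p": [], "q": []}
    for name in schedule:
        req = pending[name]
        if req is None or (req[0] == "acq" and req[1] in owner):
            return None                       # thread finished or blocked
        result = None
        if req[0] == "acq":
            owner[req[1]] = name
        elif req[0] == "rel":
            del owner[req[1]]
        elif req[0] == "write":
            mem[req[1]] = req[2]
        else:                                 # "read"
            result = mem[req[1]]
        if req[0] in ("acq", "rel"):
            lock_trace[name].append(req)
        try:
            pending[name] = threads[name].send(result)
        except StopIteration:
            pending[name] = None
    return lock_trace, pending

def complete_schedules(prefix=()):
    """Yield (schedule, lock_trace) for every complete execution."""
    outcome = run(prefix)
    if outcome is None:
        return
    lock_trace, pending = outcome
    if all(req is None for req in pending.values()):
        yield prefix, lock_trace
        return
    for name in ("p", "q"):
        yield from complete_schedules(prefix + (name,))

def follows_2pl(trace):
    released = False
    for op, _lock in trace:
        if op == "rel":
            released = True
        elif released:
            return False                      # acquire after a release
    return True

def is_non_interleaved(schedule):
    # At most one switch between threads: one runs to completion, then the other.
    return sum(a != b for a, b in zip(schedule, schedule[1:])) <= 1

violations = [s for s, t in complete_schedules() if not follows_2pl(t["p"])]
```

In this model every schedule in `violations` switches threads at least twice, so inspecting only non-interleaved executions of this program would miss the violation.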

Our Goal
Statically verify that M follows a locking policy
Applies to local conflict-serializable locking protocols
• Depending only on the thread’s local variables & global variables locked by it
E.g., two-phase locking, tree locking, (dynamic) DAG locking…
But not protocols that rely on a concurrency control monitor!

Our contribution: Easy step
ni-Π: complete non-interleaved executions of M
For any local conflict-serializable locking policy LP (two-phase locking, tree locking, dynamic tree locking, dynamic DAG locking):
Π ⊨ LP ⟺ ni-Π ⊨ LP
For any thread-local property φ:
Π ⊨ φ ⟺ ni-Π ⊨ φ

Reduction to non-interleaved executions: Proof idea
σ is the shortest execution that does not follow LP
• σ’ follows LP, guarantees conflict-serializability
• ⇒ non-interleaved execution “equivalent” to σ’
[Figure: σ drawn as σ’ followed by the step (t,e)]


Reduction to non-interleaved executions: Proof idea
σ is the shortest execution that does not follow LP
• σ’ follows LP, guarantees conflict-serializability
• ⇒ non-interleaved execution “similar” to σ’ where LP is violated
[Figure: σ drawn as σ’ followed by the step (t,e), next to the non-interleaved execution σni]

Further reduction
acni-Π: almost-complete non-interleaved executions of M
For any local conflict-serializable (LCS) locking policy LP:
Π ⊨ LP ⟺ acni-Π ⊨ LP

Reduction to non-interleaved executions: A complication
Need to argue about termination
int X = 0, Y = 0;
void p() {
  acquire(Y);
  int y = Y;
  release(Y);
  if (y != 0) {          // observes Y == 1 & violates 2PL
    acquire(X);
    X = 3;
    release(X);
  }
}
void q() {
  if (random(5) == 3) {
    acquire(Y);
    Y = 1;
    release(Y);
    while (true) nop;    // Y is set to 1 & the method enters an infinite loop
  }
}

Reduction to non-interleaved executions: Termination
⇒ Can use sequential reduction to verify termination
For any “terminating” local conflict-serializable locking policy LP:
Π ⊨ LP ⟺ acni-Π ⊨ LP

Initial analysis results
Shape analysis of hand-over-hand lists*
* Does not verify sortedness of the list and fails to verify linearizability in some cases
Shape analysis of hand-over-hand trees (for the first time)

What’s next?
• Extend to shared (read) locks
• Extend to software transactional memory
  • aborted transactions
  • non-locking, non-conflict-based serializability (e.g., using timestamps)
• Combine with other reductions [Guerraoui, Henzinger, Jobstmann, Singh]

Example II: Required memory orderings
Work with Guerraoui, Hendler, Kuznetsov, Michael and Vechev (POPL 2011)

Relaxed memory models
Out-of-order execution of memory accesses, to compensate for slow writes
Optimize by issuing a read before an earlier write, if they access different locations
Reordering may lead to inconsistency

Read-after-write (RAW) reordering
Process P: Write(X,1); Read(Y)
Process Q: Write(Y,1); Read(X)
[Figure: timelines of P and Q in which P’s R(Y) is moved ahead of its W(X,1)]

Avoiding out-of-order: Read-after-write (RAW) fence
Process P: Write(X,1); FENCE; Read(Y)
Process Q: Write(Y,1); FENCE; Read(X)
[Figure: timelines of P and Q with R(Y) kept after W(X,1)]
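
The effect of the fence can be seen on a toy machine in which each process has a FIFO store buffer. The sketch below is an illustration, not a model from the talk: the instruction encoding, the function `outcomes`, and the store-buffer semantics (writes are buffered, reads consult the process's own buffer first, a FENCE waits until the buffer is drained) are all assumptions.

```python
def outcomes(programs):
    """Enumerate all results of running `programs` on a toy store-buffer machine.

    Each program is a list of ("W", loc, val), ("R", loc) or ("FENCE",).
    Returns the set of tuples holding the value read by each process.
    The locations X and Y are hard-coded, matching the slide's example.
    """
    n = len(programs)
    results, seen = set(), set()

    def explore(pcs, bufs, mem, regs):
        state = (pcs, bufs, mem, regs)
        if state in seen:
            return
        seen.add(state)
        if all(pcs[i] == len(programs[i]) for i in range(n)) and not any(bufs):
            results.add(regs)
            return
        for i in range(n):
            # Option 1: flush the oldest buffered write of process i to memory.
            if bufs[i]:
                (loc, val), rest = bufs[i][0], bufs[i][1:]
                new_mem = tuple(sorted({**dict(mem), loc: val}.items()))
                explore(pcs, bufs[:i] + (rest,) + bufs[i + 1:], new_mem, regs)
            # Option 2: execute the next instruction of process i.
            if pcs[i] < len(programs[i]):
                ins = programs[i][pcs[i]]
                new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if ins[0] == "W":
                    new_buf = bufs[i] + ((ins[1], ins[2]),)
                    explore(new_pcs, bufs[:i] + (new_buf,) + bufs[i + 1:], mem, regs)
                elif ins[0] == "R":
                    # Read the latest own buffered write to loc, else memory.
                    own = [v for (l, v) in bufs[i] if l == ins[1]]
                    val = own[-1] if own else dict(mem)[ins[1]]
                    explore(new_pcs, bufs, mem, regs[:i] + (val,) + regs[i + 1:])
                elif ins[0] == "FENCE" and not bufs[i]:
                    # A fence can only complete once the buffer is empty.
                    explore(new_pcs, bufs, mem, regs)

    init_mem = tuple(sorted({"X": 0, "Y": 0}.items()))
    explore((0,) * n, ((),) * n, init_mem, (None,) * n)
    return results


relaxed = outcomes([[("W", "X", 1), ("R", "Y")],
                    [("W", "Y", 1), ("R", "X")]])
fenced = outcomes([[("W", "X", 1), ("FENCE",), ("R", "Y")],
                   [("W", "Y", 1), ("FENCE",), ("R", "X")]])
```

In this model `relaxed` contains the outcome `(0, 0)`, where both processes miss each other's write, while `fenced` does not.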

Avoiding out-of-order: Atomic operations
Atomic operations: atomic-write-after-read (AWAR)
E.g., CAS, TAS, Fetch&Add,…
atomic {
  read(Y)
  …
  write(X,1)
}
RAW fences / AWAR are ~60 slower than (remote) memory accesses

Our result Any concurrent program in a certain class must use RAW/AWARs

Which programs?
• Concurrent data types:
  • queues, counters, hash tables, trees,…
  • Non-commutative operations
  • Linearizable solo-terminating implementations
• Mutual exclusion

Non-commutative operations
Operation A is non-commutative if there is an operation B where: A influences B and B influences A

Example: Queue
enq(v) adds v to the end of the queue
deq() dequeues the item at the head of the queue
Q.deq():1; Q.deq():2
Q.deq():2; Q.deq():1
deq() operations influence each other
Q.enq(3):ok; Q.deq():1
Q.deq():1; Q.enq(3):ok
enq() is not non-commutative
[Figure: queue Q holding 1 2, and holding 1 2 3 after enq(3)]
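
The queue example can be replayed against a sequential specification. The sketch below assumes a simplified reading of "influences" (running A first changes the response of B from a given state); that reading, and the helper names `apply`, `influences`, and `mutually_influence`, are assumptions for illustration and not definitions from the talk.

```python
def apply(op, queue):
    """Sequential queue specification: return (new_queue, response)."""
    if op[0] == "enq":
        return queue + (op[1],), "ok"
    if queue:                       # deq on a non-empty queue
        return queue[1:], queue[0]
    return queue, None              # deq on an empty queue

def influences(a, b, queue):
    """True if running `a` first changes the response of `b` from `queue`."""
    _, alone = apply(b, queue)
    after_a, _ = apply(a, queue)
    _, following = apply(b, after_a)
    return alone != following

def mutually_influence(a, b, queue):
    return influences(a, b, queue) and influences(b, a, queue)


start = (1, 2)                      # the queue contents used on the slide
deq_vs_deq = mutually_influence(("deq",), ("deq",), start)
enq_vs_deq = mutually_influence(("enq", 3), ("deq",), start)
```

From the state 1 2, two deq() operations influence each other, whereas enq(3) and deq() do not: deq() returns 1 and enq(3) returns ok in either order.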

Proof Intuition: Writing
If an operation does not write, it does not influence anyone
It would be commutative
[Figure: two deq operations that both return 1; with no shared write, the deq operations do not influence each other]

Proof Intuition: Read
If an operation does not read, it is not influenced by anyone
It would be commutative
[Figure: two deq operations that both return 1; with no shared read, the deq operations do not influence each other]

Proof Intuition: RAW
[Figure: with a write (W) but no RAW, two deq operations both return 1, shown next to their linearization]

Mutual exclusion (Mutex)
Two processes do not hold the lock at the same time
(Deadlock-freedom) If a process calls Lock() then some process acquires the lock
Two Lock() operations influence each other!
Every successful lock acquire incurs a RAW/AWAR fence
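
To see the pattern in a concrete lock, one can scan the shared-memory accesses of a Lock() call for a write followed by a read of a different location with no intervening write to that location. The sketch below is illustrative only: the `("W" | "R", location)` encoding and the function `has_raw` are hypothetical, and Peterson's two-process lock is used as a familiar example that the talk does not discuss.

```python
def has_raw(accesses):
    """True if a write to X is later followed by a read of some Y != X,
    with no write to Y in between (the read-after-write pattern)."""
    for i, (kind_i, loc_i) in enumerate(accesses):
        if kind_i != "W":
            continue
        overwritten = set()
        for kind_j, loc_j in accesses[i + 1:]:
            if kind_j == "W":
                overwritten.add(loc_j)
            elif loc_j != loc_i and loc_j not in overwritten:
                return True
    return False


# Entry section of Peterson's lock for process 0:
# flag0 = true; turn = 1; then read flag1 and turn.
peterson_lock_p0 = [("W", "flag0"), ("W", "turn"), ("R", "flag1"), ("R", "turn")]
read_then_write = [("R", "X"), ("W", "X")]
```

The Peterson entry section writes flag0 and then reads flag1, so `has_raw` reports the pattern; on a relaxed memory model that read would need to be ordered after the write by a fence.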

Who should care?
• Concurrent programmers: when is it futile to avoid expensive synchronization
• Hardware designers: motivation to lower the cost of specific synchronization constructs
• API designers: choice of API affects synchronization
• Verification engineers: declare incorrect when synchronization is missing
“…although I hope that these shortcomings will be addressed, I hasten to add that they are insignificant compared to the huge step forward that this paper represents….” -- Linux Weekly News, Jan 26, 2011

What else?
• Weaker operations? E.g., idempotent work stealing
• Tight lower bounds?
• Other patterns
  • Read-after-read, write-after-write, barriers

And beyond…
• The cost of verifying adherence to a locking policy
• (Semi-)automatic insertion of lock acquire / release commands or fences

Thank you!
