
Software Transactions: A Programming-Languages Perspective


Presentation Transcript


  1. Software Transactions: A Programming-Languages Perspective Dan Grossman University of Washington 29 January 2008

  2. Atomic
  An easier-to-use and harder-to-implement primitive.

    // with locks: explicit lock acquire/release
    void deposit(int x) {
      synchronized(this) {
        int tmp = balance;
        tmp += x;
        balance = tmp;
      }
    }

    // with transactions: (behaves as if) no interleaved computation
    void deposit(int x) {
      atomic {
        int tmp = balance;
        tmp += x;
        balance = tmp;
      }
    }

  3. Viewpoints
  Software transactions good for:
  • Software engineering (avoid races & deadlocks)
  • Performance (optimistic “no conflict” without locks)
  Research should be guiding:
  • New hardware with transactional support
  • Software support
  • Semantic mismatch between language & hardware
  • Prediction: hardware for the common/simple case
  • May be fast enough without hardware
  • Lots of nontransactional hardware exists

  4. PL Perspective
  Complementary to lower-level implementation work
  Motivation:
  • What is the essence of the advantage over locks?
  Language design:
  • Rigorous high-level semantics
  • Interaction with rest of the language
  Language implementation:
  • Interaction with modern compilers
  • New optimization needs
  Answers urgently needed for the multicore era

  5. Today, part 1
  Language design, semantics:
  • Motivation: Example + the GC analogy [OOPSLA07]
  • Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
  • Interaction w/ other features [ICFP05] [SCHEME07] [POPL08]
  * Joint work with Intel PSL

  6. Today, part 2
  Implementation:
  • On one core [ICFP05] [SCHEME07]
  • Static optimizations for strong isolation [PLDI07]*
  • Multithreaded transactions
  * Joint work with Intel PSL

  7. Code evolution

    void deposit(…)  { synchronized(this) { … } }
    void withdraw(…) { synchronized(this) { … } }
    int balance(…)   { synchronized(this) { … } }

  8. Code evolution

    void deposit(…)  { synchronized(this) { … } }
    void withdraw(…) { synchronized(this) { … } }
    int balance(…)   { synchronized(this) { … } }

    void transfer(Acct from, int amt) {
      if(from.balance() >= amt && amt < maxXfer) {
        from.withdraw(amt);
        this.deposit(amt);
      }
    }

  9. Code evolution

    void deposit(…)  { synchronized(this) { … } }
    void withdraw(…) { synchronized(this) { … } }
    int balance(…)   { synchronized(this) { … } }

    void transfer(Acct from, int amt) {
      synchronized(this) { // race
        if(from.balance() >= amt && amt < maxXfer) {
          from.withdraw(amt);
          this.deposit(amt);
        }
      }
    }

  10. Code evolution

    void deposit(…)  { synchronized(this) { … } }
    void withdraw(…) { synchronized(this) { … } }
    int balance(…)   { synchronized(this) { … } }

    void transfer(Acct from, int amt) {
      synchronized(this) {
        synchronized(from) { // deadlock (still)
          if(from.balance() >= amt && amt < maxXfer) {
            from.withdraw(amt);
            this.deposit(amt);
          }
        }
      }
    }
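  For concreteness, here is a hypothetical, self-contained Java demo of the hazard on slide 10: each transfer locks this and then from, so two transfers in opposite directions can each grab one lock and wait forever for the other. The elided bodies are filled in, and the main method and constants are illustrative, not from the talk.

    class Acct {
        private int balance;
        private static final int maxXfer = 1000;   // illustrative limit

        Acct(int b) { balance = b; }

        synchronized void deposit(int x)  { balance += x; }
        synchronized void withdraw(int x) { balance -= x; }
        synchronized int  balance()       { return balance; }

        void transfer(Acct from, int amt) {
            synchronized (this) {
                synchronized (from) {   // lock order depends on the caller: deadlock risk
                    if (from.balance() >= amt && amt < maxXfer) {
                        from.withdraw(amt);
                        this.deposit(amt);
                    }
                }
            }
        }

        public static void main(String[] args) {
            Acct a = new Acct(1000), b = new Acct(1000);
            // a.transfer(b, 1) locks a then b; b.transfer(a, 1) locks b then a.
            // Run long enough, each thread holds one lock and waits on the other.
            new Thread(() -> { for (int i = 0; i < 1_000_000; i++) a.transfer(b, 1); }).start();
            new Thread(() -> { for (int i = 0; i < 1_000_000; i++) b.transfer(a, 1); }).start();
        }
    }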

  11. Code evolution

    void deposit(…)  { atomic { … } }
    void withdraw(…) { atomic { … } }
    int balance(…)   { atomic { … } }

  12. Code evolution

    void deposit(…)  { atomic { … } }
    void withdraw(…) { atomic { … } }
    int balance(…)   { atomic { … } }

    void transfer(Acct from, int amt) { // race
      if(from.balance() >= amt && amt < maxXfer) {
        from.withdraw(amt);
        this.deposit(amt);
      }
    }

  13. Code evolution

    void deposit(…)  { atomic { … } }
    void withdraw(…) { atomic { … } }
    int balance(…)   { atomic { … } }

    void transfer(Acct from, int amt) {
      atomic { // correct and parallelism-preserving!
        if(from.balance() >= amt && amt < maxXfer) {
          from.withdraw(amt);
          this.deposit(amt);
        }
      }
    }

  14. But can we generalize?
  So transactions sure look appealing… but what is the essence of the benefit?
  Transactional Memory (TM) is to shared-memory concurrency
  as Garbage Collection (GC) is to memory management

  15. GC in 60 seconds
  [Diagram: roots pointing into heap objects]
  • Allocate objects in the heap
  • Deallocate objects to reuse heap space
    • If too soon, dangling-pointer dereferences
    • If too late, poor performance / space exhaustion
  Automate deallocation via reachability approximation

  16. GC Bottom-line
  Established technology with widely accepted benefits
  • Even though it can perform arbitrarily badly in theory
  • Even though you can’t always ignore how GC works (at a high level)
  • Even though still an active research area after 40 years
  Now about that analogy…

  17. The problem, part 1
  [Slide overlays the concurrency analogues: concurrent programming, race conditions, loss of parallelism, deadlock, lock, lock acquisition]
  Why memory management is hard:
  • Balance correctness (avoid dangling pointers)
    and performance (no space waste or exhaustion)
  Manual approaches require whole-program protocols
  Example: manual reference count for each object
  • Must avoid garbage cycles

  18. The problem, part 2
  [Slide overlays the concurrency analogues: synchronization, release, locks are held, concurrent]
  Manual memory management is non-modular:
  • Caller and callee must know what each other access or deallocate to ensure the right memory is live
  • A small change can require wide-scale code changes
  • Correctness requires knowing what data subsequent computation will access

  19. The solution
  [Slide overlays the concurrency analogues: TM, thread-shared, thread-local]
  Move the whole-program protocol to the language implementation
  • One-size-fits-most implemented by experts
    • Usually inside the compiler and run-time
  • GC system uses subtle invariants, e.g.:
    • Object header-word bits
    • No unknown mature pointers to nursery objects

  20. So far…

  21. Incomplete solution
  [Slide overlays the concurrency analogues: memory conflict, TM, run-in-parallel, open nested txns, unique-id generation, locking, TM]
  GC is a bad idea when “reachable” is a bad approximation of “cannot-be-deallocated”
  Weak pointers overcome this fundamental limitation
  • Best used by experts for well-recognized idioms (e.g., software caches)
  In the extreme, programmers can encode manual memory management on top of GC
  • Destroys most of GC’s advantages

  22. Circumventing TM

    class SpinLock {
      private boolean b = false;
      void acquire() {
        while(true)
          atomic {
            if(b) continue;
            b = true;
            return;
          }
      }
      void release() {
        atomic { b = false; }
      }
    }
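  For comparison (not from the talk): the atomic block above is playing exactly the role of a compare-and-set, so the same spin lock can be written against real Java’s java.util.concurrent.atomic with no transactions at all. A minimal sketch:

    import java.util.concurrent.atomic.AtomicBoolean;

    class SpinLockCAS {
        private final AtomicBoolean b = new AtomicBoolean(false);

        void acquire() {
            // Spin until we atomically flip b from false (unlocked) to true (locked).
            while (!b.compareAndSet(false, true)) { /* busy-wait, like the loop above */ }
        }

        void release() {
            b.set(false);
        }
    }

  Either way the programmer has rebuilt blocking lock acquisition on top of the underlying primitive, which is the kind of circumvention this slide is warning about.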

  23. It really keeps going (see the essay)

  24. Lesson
  Transactional memory is to shared-memory concurrency
  as garbage collection is to memory management:
  huge but incomplete help for correct, efficient software.
  The analogy should help guide transactions research.

  25. Today, part 1
  Language design, semantics:
  • Motivation: Example + the GC analogy [OOPSLA07]
  • Semantics: strong vs. weak isolation [PLDI07]* [POPL08] [Katherine Moore]
  • Interaction w/ other features [ICFP05] [SCHEME07] [POPL08]
  * Joint work with Intel PSL

  26. “Weak” isolation
  Widespread misconception: “weak” isolation violates the “all-at-once” property only if the corresponding lock code has a race.
  (May still be a bad thing, but smart people disagree.)

    initially y == 0

    // Thread 1 (in a transaction):
    atomic {
      y = 1;
      x = 3;
      y = x;
    }

    // Thread 2 (not in a transaction):
    x = 2;
    print(y); // 1? 2? 5577?

  27. It’s worse
  Privatization: one of several examples where lock code works and weak-isolation transactions do not.
  (Example adapted from [Rajwar/Larus] and [Hudson et al])

    initially ptr.f == ptr.g   [diagram: ptr points to an object with fields f and g]

    // Thread 1:
    sync(lk) {
      r = ptr;
      ptr = new C();
    }
    assert(r.f == r.g);

    // Thread 2:
    sync(lk) {
      ++ptr.f;
      ++ptr.g;
    }

  28. It’s worse
  Every published weak-isolation system lets the assertion fail!
  • Eager-update or lazy-update

    initially ptr.f == ptr.g   [diagram: ptr points to an object with fields f and g]

    // Thread 1:
    atomic {
      r = ptr;
      ptr = new C();
    }
    assert(r.f == r.g);

    // Thread 2:
    atomic {
      ++ptr.f;
      ++ptr.g;
    }
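  To see why slide 27 says the lock code works, here is a runnable Java expansion of that lock-based version (class C and the field names follow the slides; the Privatization wrapper, thread setup, and main are filled in for illustration). Both threads order their critical sections through lk, so r really is private when the assertion runs and it cannot fail (run with -ea to enable assertions); the weak-isolation transactional version above gives no such guarantee.

    class C {
        int f, g;   // the slides’ setup: initially f == g
    }

    public class Privatization {
        static final Object lk = new Object();
        static C ptr = new C();              // initially ptr.f == ptr.g (both 0)

        public static void main(String[] args) throws InterruptedException {
            Thread updater = new Thread(() -> {
                synchronized (lk) {           // slide 27: sync(lk){ ++ptr.f; ++ptr.g; }
                    ++ptr.f;
                    ++ptr.g;
                }
            });
            Thread privatizer = new Thread(() -> {
                C r;
                synchronized (lk) {           // slide 27: sync(lk){ r = ptr; ptr = new C(); }
                    r = ptr;
                    ptr = new C();            // after this, no other thread can reach r
                }
                assert r.f == r.g;            // with locks this never fires
            });
            updater.start();
            privatizer.start();
            updater.join();
            privatizer.join();
        }
    }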

  29. The need for semantics
  • Which is wrong: the privatization code or the transactions implementation?
  • What other “gotchas” exist?
  • What language/coding restrictions suffice to avoid them?
  • Can programmers correctly use transactions without understanding their implementation?
  • What makes an implementation correct?
  Only rigorous source-level semantics can answer

  30. What we did
  Formal operational semantics for a collection of similar languages that have different isolation properties.
  Program state allows at most one live transaction:

    a; H; e1 || … || en   →   a'; H'; e1' || … || en'

  Multiple languages, including:
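  Spelled out, one step of the whole program relates configurations of the form below (a minimal gloss, not stated on the slide: a records which thread, if any, is inside the single live transaction, H is the shared heap, and e_1 … e_n are the n threads):

    \[
      a;\, H;\, e_1 \parallel \cdots \parallel e_n \;\longrightarrow\; a';\, H';\, e_1' \parallel \cdots \parallel e_n'
    \]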

  31. What we did
  Formal operational semantics for a collection of similar languages that have different isolation properties.
  Program state allows at most one live transaction:

    a; H; e1 || … || en   →   a'; H'; e1' || … || en'

  Multiple languages, including:
  1. “Strong”: If one thread is in a transaction, no other thread may use shared memory or enter a transaction

  32. What we did
  Formal operational semantics for a collection of similar languages that have different isolation properties.
  Program state allows at most one live transaction:

    a; H; e1 || … || en   →   a'; H'; e1' || … || en'

  Multiple languages, including:
  2. “Weak-1-lock”: If one thread is in a transaction, no other thread may enter a transaction

  33. What we did
  Formal operational semantics for a collection of similar languages that have different isolation properties.
  Program state allows at most one live transaction:

    a; H; e1 || … || en   →   a'; H'; e1' || … || en'

  Multiple languages, including:
  3. “Weak-undo”: Like weak, plus a transaction may abort at any point, undoing its changes and restarting

  34. A family
  Now we have a family of languages:
  • “Strong”: … other threads can’t use memory or start transactions
  • “Weak-1-lock”: … other threads can’t start transactions
  • “Weak-undo”: like weak, plus undo/restart
  So we can study how family members differ and conditions under which they are the same.
  Oh, and we have a kooky, ooky name: The AtomsFamily

  35. Easy Theorems
  • Theorem: Every program behavior in strong is possible in weak-1-lock
  • Theorem: weak-1-lock allows behaviors strong does not
  • Theorem: Every program behavior in weak-1-lock is possible in weak-undo
  • Theorem (slightly more surprising): weak-undo allows behavior weak-1-lock does not

  36. Hard theorems
  Consider a (formally defined) type system that ensures any mutable memory is either:
  • only accessed in transactions, or
  • only accessed outside transactions
  Theorem: If a program type-checks, it has the same possible behaviors under strong and weak-1-lock
  Theorem: If a program type-checks, it has the same possible behaviors under weak-1-lock and weak-undo
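  As an informal illustration of the discipline this type system enforces, here is a tiny example in the talk’s pseudo-Java (atomic is not real Java; the class and field names are made up):

    class Counters {
      int inTx;   // mutable location used only inside atomic blocks
      int outTx;  // mutable location used only outside atomic blocks

      void txIncrement()    { atomic { inTx++; } }   // accepted: inTx stays transaction-only
      void plainIncrement() { outTx++; }             // accepted: outTx never appears in a transaction
      // Rejected: atomic { outTx++; }  or a bare  inTx++;
      // either one would use the same mutable location on both sides of the divide.
    }

  For programs that obey this partition, the theorems above say strong, weak-1-lock, and weak-undo all coincide.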

  37. A few months in 1 picture
  [Diagram relating the four languages: strong-undo, strong, weak-1-lock, weak-undo]

  38. Lesson
  Weak isolation has surprising behavior; formal semantics lets us model the behavior and prove sufficient conditions for avoiding it.
  In other words: with a (too) restrictive type system, we get the semantics of strong and the performance of weak.

  39. Today, part 1
  Language design, semantics:
  • Motivation: Example + the GC analogy [OOPSLA07]
  • Semantics: strong vs. weak isolation [PLDI07]* [POPL08]
  • Interaction w/ other features [ICFP05] [SCHEME07] [POPL08]
  * Joint work with Intel PSL

  40. What if…
  Real languages need precise semantics for all feature interactions. For example:
  • Native calls [Ringenburg]
  • Exceptions [Ringenburg, Kimball]
  • First-class continuations [Kimball]
  • Thread creation [Moore]
  • Java-style class-loading [Hindman]
  • Open: bad interactions with the memory-consistency model
    (see joint work with Manson and Pugh [MSPC06])

  41. Today, part 2
  Implementation:
  • On one core [ICFP05] [SCHEME07] [Michael Ringenburg, Aaron Kimball]
  • Static optimizations for strong isolation [PLDI07]*
  • Multithreaded transactions
  * Joint work with Intel PSL

  42. Interleaved execution
  The “uniprocessor (and then some)” assumption: threads communicating via shared memory don't execute in “true parallel”.
  Important special case:
  • Uniprocessors still exist
  • Many language implementations assume it (e.g., OCaml, Scheme48)
  • Multicore may assign one core to an application

  43. Implementing atomic
  Key pieces:
  • Execution of an atomic block logs writes
  • If the scheduler pre-empts a thread during atomic, roll back the thread
  • Duplicate code or bytecode-interpreter dispatch so non-atomic code is not slowed by logging

  44. Logging example
  Executing the atomic block builds a LIFO log of old values:

    int x=0, y=0;
    void f() { int z = y+1; x = z; }
    void g() { y = x+1; }
    void h() {
      atomic {
        y = 2;
        f();
        g();
      }
    }

    [Diagram of the log: y:0  z:?  x:0  y:2]

  Rollback on pre-emption:
  • Pop the log, doing the assignments
  • Set the program counter and stack to the beginning of atomic
  On exit from atomic:
  • Drop the log
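  A minimal Java sketch of the LIFO undo log these slides describe (the talk’s implementations are not in Java; the class name, method names, and the int[] stand-in for a heap location are all illustrative assumptions):

    import java.util.ArrayDeque;
    import java.util.Deque;

    final class UndoLog {
        // One entry per logged write: just enough to restore the old value.
        private interface Entry { void undo(); }

        private final Deque<Entry> entries = new ArrayDeque<>();

        // Called by instrumented code just before a write inside an atomic block.
        void recordIntWrite(int[] cell, int index) {
            final int old = cell[index];
            entries.push(() -> cell[index] = old);   // LIFO: the most recent write is undone first
        }

        // On pre-emption inside atomic: pop the whole log, restoring old values;
        // the scheduler then restarts the block from its beginning.
        void rollback() {
            while (!entries.isEmpty()) entries.pop().undo();
        }

        // On normal exit from atomic: simply drop the log.
        void commit() { entries.clear(); }
    }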

  45. Logging efficiency
  Keep the log small:
  • Don’t log reads (key uniprocessor advantage)
  • Need not log memory allocated after atomic was entered
    • Particularly initialization writes
  • Need not log an address more than once
  • To keep logging fast, switch from an array to a hashtable after “many” (50) log entries
  [Log diagram from the previous slide repeated]

  46. Evaluation
  Strong isolation on uniprocessors at little cost
  • See the papers for “in the noise” performance
  • Memory-access overhead (recall initialization writes need not be logged)
  • Rare rollback

  47. Lesson
  Implementing transactions in software for a uniprocessor is so efficient it deserves special-casing.
  Note: don’t run other multicore services on a uniprocessor either.

  48. Today, part 2
  Implementation:
  • On one core [ICFP05] [SCHEME07]
  • Static optimizations for strong isolation [PLDI07]* [Steven Balensiefer, Benjamin Hindman]
  • Multithreaded transactions
  * Joint work with Intel PSL

  49. Strong performance problem
  Recall the uniprocessor overhead: [performance chart omitted from the transcript]
  With parallelism: [performance chart omitted from the transcript]

  50. Optimizing away strong’s cost
  New: static analysis for not-accessed-in-transaction …
  [Diagram: thread-local, not accessed in transaction, immutable]
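  An illustrative example of what the not-accessed-in-transaction analysis can exploit, again in the talk’s pseudo-Java (atomic is not real Java; the names are made up):

    class Worker {
      int shared;   // appears inside an atomic block: its accesses keep their instrumentation
      int stats;    // never appears inside any atomic block anywhere in the program

      void work()   { atomic { shared++; } }
      void report() { stats++; }   // the analysis can let this nontransactional write skip the barrier
    }

  Thread-local and immutable data get the same treatment.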
