1 / 72

Transactional Memory

Transactional Memory. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit. Traditional Software Scaling. 7x. Speedup. 3.6x. 1.8x. User code. Traditional Uniprocessor. Time: Moore’s law. 7x. 3.6x. Speedup. 1.8x. Multicore Software Scaling.

ttraylor
Download Presentation

Transactional Memory

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Transactional Memory Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Art of Multiprocessor Programming

  2. Traditional Software Scaling 7x Speedup 3.6x 1.8x User code Traditional Uniprocessor Time: Moore’s law

  3. 7x 3.6x Speedup 1.8x Multicore Software Scaling User code Multicore Unfortunately, not so simple…

  4. Real-World Multicore Scaling Speedup 2.9x 2x 1.8x User code Multicore Parallelization and Synchronization require great care…

  5. Why? Amdahl’s Law: Speedup = 1/(ParallelPart/N + SequentialPart) Pay for N = 8 cores SequentialPart = 25% Speedup = only 2.9 times! As num cores grows the effect of 25% becomes more accute 2.3/4, 2.9/8, 3.4/16, 3.7/32….

  6. Fine grained parallelism has huge performance benefit The reason we get only 2.9 speedup c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c Shared Data Structures Fine Grained Coarse Grained 25% Shared 25% Shared 75% Unshared 75% Unshared

  7. b c d a A FIFO Queue Head Tail Dequeue() => a Enqueue(d)

  8. Head Tail a b c d Q: Enqueue(d) P: Dequeue() => a A Concurrent FIFO Queue Simple Code, easy to prove correct Object lock Contention and sequential bottleneck

  9. a b c d Fine Grain Locks Finer Granularity, More Complex Code Head Tail Q: Enqueue(d) P: Dequeue() => a Verification nightmare: worry about deadlock, livelock…

  10. a a b c d b Head Tail Fine Grain Locks Complex boundary cases: empty queue, last item Head Tail Q: Enqueue(b) P: Dequeue() => a Worry how to acquire multiple locks

  11. Moreover: Locking Relies on Conventions • Relation between • Lock bit and object bits • Exists only in programmer’s mind Actual comment from Linux Kernel (hat tip: Bradley Kuszmaul) • /* • * When a locked buffer is visible to the I/O layer • * BH_Launder is set. This means before unlocking • * we must clear BH_Launder,mb() on alpha and then • * clear BH_Lock, so no reader can see BH_Launder set • * on an unlocked buffer and then risk to deadlock. • */ Art of Multiprocessor Programming 11

  12. a b c d Lock-Free (JDK 1.5+) Even Finer Granularity, Even More Complex Code Head Tail Q: Enqueue(d) P: Dequeue() => a Worry about starvation, subtle bugs, hardness to modify…

  13. a b a d b c d c Composing Objects Complex: Move data atomically between structures Enqueue(Q2,a) Head Head Tail Tail P: Dequeue(Q1,a) More than twice the worry…

  14. a b c d Transactional Memory Great Performance, Simple Code Head Tail Q: Enqueue(d) P: Dequeue() => a Don’t worry about deadlock, livelock, subtle bugs, etc…

  15. a a b c d b Head Tail Promise of Transactional Memory Don’t worry which locks need to cover which variables when… Head Tail Q: Enqueue(d) P: Dequeue() => a TM deals with boundary cases under the hood

  16. a b a d b c d c Composing Objects Will be easy to modify multiple structures atomically Enqueue(Q2,a) Head Head Tail Tail P: Dequeue(Q1,a) ProvideComposability…

  17. The Transactional Manifesto • Current practice inadequate • to meet the multicore challenge • Research Agenda • Replace locking with a transactional API • Design languages to support this model • Implement the run-time to be fast enough Art of Multiprocessor Programming 17

  18. Transactions • Atomic • Commit: takes effect • Abort: effects rolled back • Usually retried • Serizalizable • Appear to happen in one-at-a-time order Art of Multiprocessor Programming 18

  19. Atomic Blocks atomic{ x.remove(3); y.add(3);}atomic{ y =null; } Art of Multiprocessor Programming 19

  20. Atomic Blocks atomic { x.remove(3); y.add(3);}atomic {y =null; } No data race Art of Multiprocessor Programming 20

  21. Designing a FIFO Queue Public void LeftEnq(item x) { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } Write sequential Code Art of Multiprocessor Programming 21

  22. Designing a FIFO Queue Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } } Art of Multiprocessor Programming 22

  23. Designing a FIFO Queue Public void LeftEnq(item x) { atomic { Qnode q = new Qnode(x); q.left = this.left; this.left.right = q; this.left = q; } } Enclose in atomic block Art of Multiprocessor Programming 23

  24. Not always this simple Conditional waits Enhanced concurrency Complex patterns But often it is Works for sadistic homework Warning Art of Multiprocessor Programming 24

  25. Composition Public void Transfer(Queue<T> q1, q2) { atomic { T x = q1.deq(); q2.enq(x); } } Trivial or what? Art of Multiprocessor Programming 25

  26. Roll Back Public T LeftDeq() { atomic { if (this.left == null) retry; … } } Roll back transaction and restart when something changes Art of Multiprocessor Programming 26

  27. OrElse Composition atomic { x = q1.deq(); } orElse { x = q2.deq(); } Run 1st method. If it retries … Run 2nd method. If it retries … Entire statement retries Art of Multiprocessor Programming 27

  28. Transactional Memory • Software transactional memory (STM) • Hardware transactional memory (HTM) • Hybrid transactional memory (HyTM, try in hardware and default to software if unsuccessful) Art of Multiprocessor Programming 28

  29. Hardware versus Software • Do we need hardware at all? • Analogies: • Virtual memory: yes! • Garbage collection: no! • Probably do need HW for performance • Do we need software? • Policy issues don’t make sense for hardware Art of Multiprocessor Programming 29

  30. Transactional Consistency • Memory Transactions are collections of reads and writes executed atomically • Tranactions should maintain internal and external consistency • External: with respect to the interleavings of other transactions. • Internal: the transaction itself should operate on a consistent state.

  31. External Consistency Invariant x = 2y 4 x Transaction A: Write x Write y 8 2 y 4 Transaction B: Read x Read y Compute z = 1/(x-y) = 1/2 Application Memory Art of Multiprocessor Programming

  32. Simple Lock-Based STM • STMs come in different forms • Lock-based • Lock-free • Here we will describe a simple lock-based STM Art of Multiprocessor Programming

  33. Synchronization • Transaction keeps • Read set: locations & values read • Write set: locations & values to be written • Deferred update • Changes installed at commit • Lazy conflict detection • Conflicts detected at commit Art of Multiprocessor Programming

  34. STM: Transactional Locking Map V# Array of Versioned- Write-Locks Application Memory V# V# Art of Multiprocessor Programming 34

  35. V# V# V# V# Reading an Object Mem Locks V# • Put V#s & value in RS • If not already locked Art of Multiprocessor Programming 35

  36. To Write an Object V# V# V# V# Mem Locks V# • Add V# and new value to WS Art of Multiprocessor Programming 36

  37. To Commit V# V# Mem Locks V# • Acquire W locks • Check V#s unchanged • In RS & WS • Install new values • Increment V#s • Release … X V#+1 V# Y V#+1 V# Art of Multiprocessor Programming 37

  38. Problem: Internal Inconsistency • A Zombie is a currently active transaction that is destined to abort because it saw an inconsistent state • If Zombies see inconsistent states errors can occur and the fact that the transaction will eventually abort does not save us

  39. Internal Consistency Invariant x = 2y 4 x Transaction B: Read x = 4 8 2 Transaction A: Write x (kills B) Write y y 4 Transaction B: (zombie) Read y = 4 Compute z = 1/(x-y) Application Memory DIV by 0 ERROR Art of Multiprocessor Programming

  40. Solution: The “Global Clock” • Have one shared global clock • Incremented by (small subset of) writing transactions • Read by all transactions • Used to validate that state worked on is always consistent Art of Multiprocessor Programming

  41. Read-Only Transactions 100 Shared Version Clock Mem Locks • Copy V Clock to RV • Read lock,V# • Read mem • Check unlocked • Recheck V# unchanged • Check V# < RV 12 Reads form a snapshot of memory. No read set! 32 56 100 19 17 Private Read Version (RV) Art of Multiprocessor Programming 41

  42. Regular Transactions 100 Shared Version Clock Mem Locks • Copy V Clock to RV • On read/write, check: • Unlocked • V# ≤ RV • Add to R/W set 12 32 56 19 100 17 Private Read Version (RV) Art of Multiprocessor Programming 42

  43. Regular Transactions Mem Locks • Acquire locks • WV = F&Inc(V Clock) • Check each V# ≤ RV • Update memory • Set write V#s to WV 12 x 100 32 56 19 100 100 101 100 y 17 Private Read Version (RV) Shared Version Clock Art of Multiprocessor Programming 43

  44. Hardware Transactional Memory • Exploit Cache coherence • Already almost does it • Invalidation • Consistency checking • Speculative execution • Branch prediction = optimistic synch! Art of Multiprocessor Programming 44

  45. T HW Transactional Memory read active caches Interconnect memory Art of Multiprocessor Programming 45

  46. T T Transactional Memory read active active caches memory Art of Multiprocessor Programming 46

  47. T T Transactional Memory active committed active caches memory Art of Multiprocessor Programming 47

  48. D T Transactional Memory write committed active caches memory Art of Multiprocessor Programming 48

  49. D T T Rewind write aborted active active caches memory Art of Multiprocessor Programming 49

  50. Transaction Commit • At commit point • If no cache conflicts, we win. • Mark transactional entries • Read-only: valid • Modified: dirty (eventually written back) • That’s all, folks! • Except for a few details … Art of Multiprocessor Programming 50

More Related