
Two Ways of Speeding Up Transactional Memory Algorithms

Two Ways of Speeding Up Transactional Memory Algorithms. Vincent Gramoli Joint work with Pascal Felber, Rachid Guerraoui, Derin Harmanci. Roadmap. Motivations Transactional Memory Problems of Efficiency Input Acceptance Elastic Transactions Conclusion. Single CPU Limitations.



Presentation Transcript


  1. Two Ways of Speeding Up Transactional Memory Algorithms Vincent Gramoli Joint work with Pascal Felber, Rachid Guerraoui, Derin Harmanci

  2. Roadmap • Motivations • Transactional Memory • Problems of Efficiency • Input Acceptance • Elastic Transactions • Conclusion

  3. Single CPU Limitations • Transistor size is still decreasing [Moore’s law] • The resulting overheating disturbs computation • Clock speed has stopped doubling since 2004 [“The free lunch is over” by Herb Sutter]

  4. Manufactured Multicores SUN Niagara 2 w/ 8 cores & 64 HW threads Intel COO announces Multicore revolution AMD announces the 2-core Opteron Intel announces 6-core Xeon 7000 series Intel announces 4-core Xeon 5000 series SUN announces the 8-core Niagara AMD announces the 4-core Opteron Intel announces 8-core Nehalem EX

  5. Concurrent Programming • Difficult task: • Using locks, how to avoid deadlock? Thread1 {lock(x); lock(y);} // Thread2 {lock(y); lock(x);}

  6. Concurrent Programming • Difficult task: • Using locks, how to avoid deadlock? Thread1 {lock(x); lock(y);} // Thread2 {lock(y); lock(x);} • Using lock-free (LF) primitives, how can composition preserve atomicity? LF-move(x,y) ≠ LF-delete(x) + LF-insert(y)

  7. Concurrent Programming • Difficult task: • Using locks, how to avoid deadlock? Thread1 {lock(x); lock(y);} // Thread2 {lock(y); lock(x);} • Using lock-free (LF) primitives, how can composition preserve atomicity? LF-move(x,y) ≠ LF-delete(x) + LF-insert(y) • Dedicated to expert programmers: • Database programmers • Scientific computing programmers • What about other programmers?

  8. Concurrent Programming • Difficult task: • Using locks, how to avoid deadlock? Thread1 {lock(x); lock(y);} // Thread2 {lock(y); lock(x);} • Using lock-free (LF) primitives, how can composition preserve atomicity? LF-move(x,y) ≠ LF-delete(x) + LF-insert(y) • Dedicated to expert programmers: • Database programmers • Scientific computing programmers • What about other programmers? • Democratizing multicores requires new programming abstractions
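The deadlock on slides 5 to 8 arises because Thread1 and Thread2 take `x` and `y` in opposite orders. One common lock-based workaround, sketched below, is to impose a single global acquisition order. The helper name `run_with_both` and the choice of `id()` as the ordering key are illustrative assumptions, not something taken from the talk.

```python
import threading

lock_x = threading.Lock()
lock_y = threading.Lock()
counter = 0


def run_with_both(first, second, action):
    """Acquire both locks in one global order (here: by id()), then run action.

    Hypothetical helper: sorting by id() is just one way to get a total order.
    """
    a, b = sorted((first, second), key=id)
    with a, b:
        action()


def bump():
    global counter
    counter += 1  # safe: both locks are held while this runs


def worker(first, second, rounds):
    for _ in range(rounds):
        run_with_both(first, second, bump)


def demo(rounds=1000):
    # Thread 1 asks for (x, y); thread 2 asks for (y, x), as on the slide.
    t1 = threading.Thread(target=worker, args=(lock_x, lock_y, rounds))
    t2 = threading.Thread(target=worker, args=(lock_y, lock_x, rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return counter
```

The discipline works only if every piece of code that takes these locks follows the same order, which is the kind of global reasoning the slides argue non-expert programmers should not have to do.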

  9. Roadmap • Motivations • Transactional Memory • Problems of Efficiency • Input Acceptance • Elastic Transactions • Conclusion

  10. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v) END_TX Assume we want to read (R) and write (W) a shared bank account ‘act’ atomically. We simply have to label the region of the sequential code using transaction delimiters BEGIN_TX and END_TX

  11. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently after this point, operations will be handled by the TM BEGIN_TX R(act) W(act,v) END_TX TM

  12. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v) END_TX read through the TM? TM

  13. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v) END_TX read through the TM? Sounds good, I keep track of your read TM

  14. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v) END_TX you can return v1 TM

  15. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v’) END_TX BEGIN_TX R(act) W(act,v) END_TX write through the TM? TM

  16. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v’) END_TX BEGIN_TX R(act) W(act,v) END_TX write through the TM? Sounds good, I keep track of your write TM

  17. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v’) END_TX BEGIN_TX R(act) W(act,v) END_TX write has been scheduled TM

  18. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v) END_TX write through the TM? TM

  19. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v) END_TX write through the TM? No way, there is a risk of safety violation TM

  20. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v) END_TX abort, roll-back, and restart the whole transaction later on No way, there is a risk of safety violation TM

  21. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently BEGIN_TX R(act) W(act,v) END_TX after this point, all operations become unprotected again
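The dialogue on slides 10 to 21 (track reads, schedule writes, abort and restart on a safety risk) can be illustrated with a toy software TM. This is a minimal sketch under strong assumptions: one global commit lock, per-variable version numbers, and validation at commit time. The names `TVar`, `Transaction`, `Abort` and `atomically` are hypothetical and do not correspond to any STM named in the talk.

```python
import threading

_commit_lock = threading.Lock()  # assumption: one global lock guards commits


class Abort(Exception):
    """Raised when the toy TM detects a risk of safety violation."""


class TVar:
    def __init__(self, value):
        self.value = value
        self.version = 0  # incremented on every committed write


class Transaction:
    def __init__(self):
        self.read_versions = {}  # TVar -> version observed ("I keep track of your read")
        self.write_buffer = {}   # TVar -> pending value ("write has been scheduled")

    def read(self, tvar):
        if tvar in self.write_buffer:
            return self.write_buffer[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        if self.read_versions.setdefault(tvar, version) != version:
            raise Abort()
        return value

    def write(self, tvar, value):
        self.write_buffer[tvar] = value

    def commit(self):
        with _commit_lock:
            for tvar, version in self.read_versions.items():
                if tvar.version != version:
                    raise Abort()  # someone committed a write to what we read
            for tvar, value in self.write_buffer.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) between BEGIN_TX and END_TX, restarting it on abort."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Abort:
            continue  # roll back (drop the buffers) and restart the transaction
```

A usage sketch: `atomically(lambda tx: tx.write(act, tx.read(act) + v))` plays the role of `BEGIN_TX R(act) W(act,v) END_TX` for a `TVar` named `act`. The toy validates only at commit, so a doomed transaction may briefly observe inconsistent values before it is restarted.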

  22. Transactional Memory An abstraction: a black box that encapsulates all synchronizations • all read/write accesses to shared data are protected transparently • atomicity is preserved under transaction composition delete(acc, amt) { BEGIN_TX v = R(acc) W(acc,v-amt) END_TX } insert(acc, amt) { BEGIN_TX v = R(acc) W(acc,v+amt) END_TX } move(acc1, acc2, amt) { BEGIN_TX delete(acc1, amt) insert(acc2, amt) END_TX }
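The composition on slide 22 can be sketched as nested transactional blocks. The stand-in below uses one global re-entrant lock instead of a real TM, which is a deliberate simplification (an assumption, not the talk's design). It only shows that `move` stays atomic when built from `delete` and `insert`. The `transaction` context manager and the `accounts` dictionary are hypothetical.

```python
import threading
from contextlib import contextmanager

# Assumption: a single re-entrant lock stands in for the TM black box.
_tm_lock = threading.RLock()

accounts = {"acc1": 100, "acc2": 0}


@contextmanager
def transaction():
    """Plays the role of BEGIN_TX ... END_TX; nesting is flattened by the RLock."""
    with _tm_lock:
        yield


def delete(acc, amt):
    with transaction():
        v = accounts[acc]        # R(acc)
        accounts[acc] = v - amt  # W(acc, v - amt)


def insert(acc, amt):
    with transaction():
        v = accounts[acc]        # R(acc)
        accounts[acc] = v + amt  # W(acc, v + amt)


def move(acc1, acc2, amt):
    # The outer transaction makes delete + insert appear as one atomic step.
    with transaction():
        delete(acc1, amt)
        insert(acc2, amt)


def total():
    with transaction():
        return sum(accounts.values())
```

Because `total` runs in its own transaction, it can never observe the state between `delete` and `insert`, which is exactly the guarantee that the lock-free composition on slide 6 fails to provide.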

  23. Roadmap • Motivations • Transactional Memory • Problems of Efficiency • Input Acceptance • Elastic Transactions • Conclusion

  24. 1st Problem: Wasted Effort Transactions waste effort when aborting and rolling back Some aborts are unnecessary BEGIN_TX W(x) END_TX BEGIN_TX R(x) END_TX (1) (2) (3) (4) Although both transactions can commit safely, one is aborted by common STMs: TL2, WSTM, DSTM, TinySTM

  25. 2nd Problem: Lack of Concurrency Transactions ensure stronger guarantees than necessary Example: sorted linked list implementation of integer set insert(x)/ search(z) x y z t h search(z) insert(x) BEGIN_TX R(h) R(y) R(z) END_TX BEGIN_TX … W(h) END_TX

  26. 2nd Problem: Lack of Concurrency Transactions ensure stronger guarantees than necessary Example: sorted linked list implementation of integer set Both transactions could commit w/o violating linked list linearizability, but transactional models consider read/write atomicity. insert(x)/ search(z) x y z t h search(z) insert(x) BEGIN_TX R(h) R(y) R(z) END_TX BEGIN_TX … W(h) END_TX
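Why the two list operations conflict under read/write atomicity can be seen by comparing their read and write sets. The sketch below is illustrative only. The dictionary layout and the `conflicts` helper are assumptions, and the operations elided as `…` on the slide are deliberately left out rather than guessed.

```python
def conflicts(tx_a, tx_b):
    """Classic read/write conflict test: one transaction writes a location
    that the other one reads or writes."""
    a_hits_b = tx_a["writes"] & (tx_b["reads"] | tx_b["writes"])
    b_hits_a = tx_b["writes"] & tx_a["reads"]
    return bool(a_hits_b or b_hits_a)


# search(z): BEGIN_TX R(h) R(y) R(z) END_TX
search_z = {"reads": {"h", "y", "z"}, "writes": set()}

# insert(x): BEGIN_TX ... W(h) END_TX  (elided operations omitted on purpose)
insert_x = {"reads": set(), "writes": {"h"}}

# A second search is included only to show that read-only transactions commute.
search_t = {"reads": {"h", "y", "z", "t"}, "writes": set()}
```

Here `conflicts(search_z, insert_x)` is true because both touch `h`, even though, as the slide notes, committing both would still leave the linked list linearizable.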

  27. Roadmap Motivations Transactional Memory Problems of Efficiency Input Acceptance Elastic Transactions Conclusion

  28. A Metric for Input Acceptance • TM efficiency depends on • Execution speed • Number of successful (committed) transactions

  29. A Metric for Input Acceptance • TM efficiency depends on • Execution speed • Number of successful (committed) transactions TM

  30. A Metric for Input Acceptance • TM efficiency depends on • Execution speed • Number of successful (committed) transactions • Input acceptance is the ability of a TM to commit transactions • The commit-abort ratio is “σ”: # committed tx / # complete tx TM

  31. How do STMs perform w.r.t. this metric? • Ideal goal: no abort (σ = 1) • A TM accepts an input if σ = 1 • What is accepted by the existing STMs?
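The metric on slides 30 and 31 is simple enough to state as code. The sketch assumes that a "complete" transaction is one that has either committed or aborted. The slides do not spell that out, so treat it as an interpretation. The function names are hypothetical.

```python
def commit_abort_ratio(committed, aborted):
    """sigma = # committed tx / # complete tx.

    Assumption: complete = committed + aborted.
    """
    complete = committed + aborted
    if complete == 0:
        raise ValueError("no complete transaction, sigma is undefined")
    return committed / complete


def accepts(committed, aborted):
    """A TM accepts an input if sigma = 1, i.e. nothing was aborted."""
    return commit_abort_ratio(committed, aborted) == 1.0
```

For instance, an execution with 3 commits and 1 abort gives σ = 0.75, so that input is not accepted.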

  32. Identifying TM designs

  33. Formalizing Workload as an Input Events (i.e., an alphabet): si: start event of transaction i wxi: write request of transaction i on location x rxi: read request of transaction i on location x π(x)i: any event of transaction i (on location x) ci: commit request of transaction i An input pattern is a totally ordered set of events (i.e., a word) An input class is a set of input patterns (i.e., a language): | represents the choice (e.g., “a | b” means “a” or “b”) * represents the Kleene closure (e.g., “a*” means “ε|a|aa|…”) ¬ represents the complement (e.g., “¬a” means “any event but a”)

  34. Input Acceptance Upper-bound of VWIR • Theorem. There is no VWIR design that accepts the following input class: • C2 = π∗ (rxi ¬ci∗ wxj ¬ci∗ cj | wxj ¬cj∗ rxi) π∗ .

  35. Input Acceptance Upper-bound of VWIR • Theorem. There is no VWIR design that accepts the following input class: • C2 = π∗ (rxi ¬ci∗ wxj ¬ci∗ cj | wxj ¬cj∗ rxi) π∗ . BEGIN_TX W(x) END_TX BEGIN_TX R(x) END_TX
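Membership in the class C2 can be checked mechanically on a concrete input pattern. The sketch below encodes events as small tuples and tests the two alternatives of C2 directly. The `Event` type and the assumption that i and j denote distinct transactions are mine. The slides define only the alphabet and the expression.

```python
from typing import NamedTuple, Optional, Sequence


class Event(NamedTuple):
    kind: str                  # "s" start, "r" read, "w" write, "c" commit
    tx: int                    # transaction index
    loc: Optional[str] = None  # location, for reads and writes only


def _no_commit_of(tx: int, events: Sequence[Event]) -> bool:
    return all(not (e.kind == "c" and e.tx == tx) for e in events)


def in_c2(events: Sequence[Event]) -> bool:
    """True if the pattern matches
    pi* ( r^x_i (not c_i)* w^x_j (not c_i)* c_j | w^x_j (not c_j)* r^x_i ) pi*,
    assuming i != j."""
    n = len(events)
    for a in range(n):
        first = events[a]
        for b in range(a + 1, n):
            second = events[b]
            if first.loc is None or second.loc != first.loc or second.tx == first.tx:
                continue
            if not _no_commit_of(first.tx, events[a + 1:b]):
                continue
            # Second alternative: a write followed by a read before the writer commits.
            if first.kind == "w" and second.kind == "r":
                return True
            # First alternative: read, then write, then the writer commits first.
            if first.kind == "r" and second.kind == "w":
                for later in events[b + 1:]:
                    if later.kind == "c" and later.tx == first.tx:
                        break
                    if later.kind == "c" and later.tx == second.tx:
                        return True
    return False
```

The slide's example, a `W(x)` transaction overlapping an `R(x)` transaction, is in C2, whereas the same two transactions run one after the other are not.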

  36. Going further • Other classes: • C1 = π∗ (πxi ¬ci∗ wxj | wxj ¬cj∗ πxi) π∗ • C3 = π∗ (rxi ¬ci∗ wxj | wxj ¬cj∗ rxi) ¬ci∗ cj π∗ • C4 = (¬wx)∗ rxi ¬ci∗ wxj ¬ci∗ cj ¬ci∗ sk ¬(ci | ck | rxk)∗ wyk ¬(ci | ck | rxk)∗ ck ¬ci∗ ryi π∗ • Other impossibility results: • Theorem 1. VWVR design does not accept input class C1. • Theorem 3. IWIR design does not accept input class C3. • Theorem 4. CTR design does not accept input class C4.

  37. Input Acceptance Classification VWVR (e.g. SXM) ~C1

  38. Input Acceptance Classification VWVR (e.g. SXM) ~C1 ~C2 VWIR (e.g., DSTM, TinySTM)

  39. Input Acceptance Classification IWIR (e.g., WSTM TL2) VWVR (e.g. SXM) ~C1 ~C2 ~C3 VWIR (e.g., DSTM, TinySTM)

  40. Input Acceptance Classification IWIR (e.g., WSTM TL2) VWVR (e.g. SXM) ~C1 ~C2 ~C3 ~C4 VWIR (e.g., DSTM, TinySTM) CTR (e.g., TSTM)

  41. Input Acceptance Classification Serializable STM needs to track all conflicts IWIR (e.g., WSTM TL2) RTR (e.g., SSTM) VWVR (e.g. SXM) ~C1 ~C2 ~C3 ~C4 VWIR (e.g., DSTM, TinySTM) ~C5 CTR (e.g., TSTM) C5 = Ø

  42. Experimental Validation: Scalability 20% Update operations: 10% linked-list insert, 10% linked-list delete 80% Other operations: linked-list contains Dual quad-core Intel Xeon
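The operation mix on slide 42 (10% insert, 10% delete, 80% contains) could be generated as follows. This is only a sketch of a workload generator, not the benchmark used in the talk. The key range, the seed and the function name are assumptions.

```python
import random


def make_workload(n_ops, update_ratio=0.2, key_range=256, seed=0):
    """Return a list of (operation, key) pairs for an integer-set benchmark.

    Half of the updates are inserts and half are deletes. Everything else
    is a contains. key_range and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    ops = []
    for _ in range(n_ops):
        u = rng.random()
        if u < update_ratio / 2:
            name = "insert"
        elif u < update_ratio:
            name = "delete"
        else:
            name = "contains"
        ops.append((name, rng.randrange(key_range)))
    return ops
```

Each benchmark thread would then replay its own slice of such a list against the linked list under test.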

  43. Roadmap Motivations Transactional Memory Problems of Efficiency Input Acceptance Elastic Transactions Conclusion

  44. Software Transactional Memories • TinySTM, LSA-STM, SSTM, SwissTM: efficient? insert(x)/ search(z) x y z t h

  45. Software Transactional Memories • TinySTM, LSA-STM, SSTM, SwissTM: efficient? insert(x)/ search(z) x y z t h search(z) insert(x) BEGIN_TX R(h) R(y) R(z) END_TX BEGIN_TX … W(h) END_TX

  46. Software Transactional Memories • TinySTM, LSA-STM, SSTM, SwissTM: efficient? insert(x)/ search(z) x y z t h search(z) insert(x) BEGIN_TX R(h) R(y) R(z) END_TX BEGIN_TX … W(h) END_TX The two transactions cannot both commit, because read/write atomicity would be violated, even though linked-list linearizability would still be guaranteed.

  47. Elastic Transactional Memory (ε-STM) • Elastic transactions: weaker than normal ones insert(x)/ search(z) x y z t h search(z) insert(x) BEGIN_TX R(h) R(y) R(z) END_TX BEGIN_TX … W(h) END_TX The goal is to cut transactions into sub-parts

  48. Elastic Transactional Memory (ε-STM) • Elastic transactions: weaker than normal ones search(z) insert(x) search(z) insert(x) BEGIN_TX R(h) R(y) R(z) END_TX BEGIN_TX … W(h) END_TX BEGIN_EL_TX R(h) R(y) R(z) END_TX BEGIN_EL_TX … W(h) END_TX Cut • It is cut in 2 parts w/ resp. ops π(x,*) and π(y,*) if: • there are no two writes on x and y in between; • all writes are in the same part; • the first op of any part is a read.

  49. Elastic Transactional Memory (ε-STM) • Elastic transactions: weaker than normal ones insert(x)/ search(z) x y z t h • The key idea is that when reading element e: • the predecessor has not changed since it has been read • or e has not changed since the predecessor has been read. • This ensures that the parsing is always consistent although atomicity is relaxed.
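The rule on slide 49 can be sketched with version numbers drawn from a global clock. This is a single-threaded toy model meant only to make the two conditions concrete. `ToyClock`, `Node`, `write_next` and `ElasticReader` are hypothetical names, and the timestamp comparison is an assumed way of testing "has not changed since", not the ε-STM implementation.

```python
class ToyClock:
    def __init__(self):
        self.now = 0

    def tick(self):
        self.now += 1
        return self.now


class Node:
    def __init__(self, key, nxt=None):
        self.key = key
        self.next = nxt
        self.version = 0  # clock value of the last write to this node


def write_next(clock, node, new_next):
    """A committed write to node.next, stamped with a fresh clock value."""
    node.next = new_next
    node.version = clock.tick()


class ElasticReader:
    """Keeps only the previous read, the sliding window of an elastic traversal."""

    def __init__(self, clock):
        self.clock = clock
        self.pred = None
        self.pred_version = None
        self.pred_read_time = None

    def read(self, node):
        """Return True if reading node is consistent with the previous read."""
        ok = True
        if self.pred is not None:
            pred_unchanged = self.pred.version == self.pred_version
            node_unchanged = node.version <= self.pred_read_time
            ok = pred_unchanged or node_unchanged
        if ok:  # slide the window forward
            self.pred = node
            self.pred_version = node.version
            self.pred_read_time = self.clock.now
        return ok
```

In this model a `search(z)` that has already moved past `h` is not disturbed by an `insert(x)` writing `h`, because only the most recent predecessor is checked. A normal transaction would have kept `h` in its read set and aborted.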

  50. Elastic Transactional Memory (ε-STM) • Elastic transactions: • Weaker than normal ones (cannot implement sum) • Compatible with normal ones (retain simplicity)
