1 / 43

Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory. TRANSACT 2014. Irina Calciu Justin Gottschlich Tatiana Shpeisman Gilles Pokam Maurice Herlihy. Multicore Performance Scaling. 2. Hardware Transactional Memory (HTM).

azana
Download Presentation

Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory TRANSACT 2014 Irina Calciu Justin Gottschlich Tatiana Shpeisman Gilles Pokam Maurice Herlihy

  2. Multicore Performance Scaling 2

  3. Hardware Transactional Memory (HTM) IBM’s Blue Gene/Q & System Z & Power Architecture Intel’s Haswell TSX: RTM & HLE Low overhead (cache based) 3

  4. Haswell RTM if (_xbegin() == _XBEGIN_STARTED) Speculate Execution _xend() else Abort Handler Speculate Execution, without any locks Read and Write Sets Abort on memory conflict 4

  5. Haswell RTM if (_xbegin() == _XBEGIN_STARTED) Speculate Execution _xend() _xbegin() _xbegin() Add to Write Set Write X ABORT Add to Read Set Read X Add to Write Set Write Y Write Y Add to Write Set _xend() COMMIT _xend() Make the change to Y visible 5

  6. Lock Elision <HLE_Aquire_Prefix> Lock(L) Atomic region executed as a transaction or mutually exclusive on L <HLE_Release_Prefix> Release(L) Execute optimistically, without any locks Track Read and Write Sets Abort on memory conflict: rollback acquire lock 6

  7. 7 [Anand Tech]

  8. Haswell RTM Small & Medium Transactions Best-effort Conflicts Overflow Unsupported Instructions Interrupts Needs software fallback 8

  9. Overview • Best-effort Hardware Transactional Memory Description • Lazy SGL Correctness Results • Bloom Filter SGL 9

  10. Single Global Lock HyTM (simple and common) Begin HW txn Read L Try_SPEC: Wait until Lock is free Transactional_Read(Lock) If Lock is taken ABORT Speculate critical section End speculation End HW txn Begin SW txn Acquire L On_ABORT: If try_lock(Lock) Critical section Release(Lock) Else Try_SPEC Does not abort! Release L End SW txn 10

  11. Single Global Lock HyTM (simple and common) Begin HW txn Read L Begin HW txn Read L Begin SW txn Acquire L Begin HW txn Read L X X X Begin HW txn Read L End HW txn (1) Time End HW txn (2) X Release L End SW txn End HW txn (3) End HW txn (4) Legend: X = ABORT 11

  12. Thread 2 Thread 1 Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Acquire(L) CRITICAL SECTION (SW TXN) Time Release(L) Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION End_HW_TXN (L) Execution Time 1 12

  13. Thread 2 Thread 1 Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Acquire(L) CRITICAL SECTION (SW TXN) Begin_HW_TXN(L) CRITICAL SECTION Time Release(L) End_HW_TXN(L) Begin_HW_TXN (L) Begin_HW_TXN (L) CRITICAL SECTION CRITICAL SECTION End_HW_TXN (L) End_HW_TXN (L) Execution Time 2 Execution Time 1 13

  14. Lazy SGL Begin HW txn Try_SPEC: Speculate critical section Transactional_Read(Lock) If Lock is taken ABORT End speculation Read L End HW txn Begin SW txn Acquire L On_ABORT: If try_lock(Lock) Critical section Release(Lock) Else Try_SPEC Does not abort! Release L End SW txn 14 14

  15. Lazy SGL Begin HW txn Begin HW txn Begin SW txn Acquire L Begin HW txn X Begin HW txn Read L End HW txn (1) X Read L End HW txn (2) Time Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 15

  16. Overview • Best-effort Hardware Transactional Memory Description • Lazy SGL Correctness Results • Bloom Filter SGL 16

  17. Transactional Memory Correctness Transaction 1 SW Order T2 BEFORE T1 Transaction 2 HW Time COMMIT Order T2 AFTER T1 COMMIT 17

  18. Case 1: HW begins SW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b … TXN_END Acquire Lock … X = a … Release Lock X value: Time b a Correct: a Actual: b Check Lock Correct: a Actual: a ABORT 18

  19. Case 2: SW begins HW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a … Release Lock TXN_BEGIN … X = b … TXN_END X value: Time Correct: a Actual: a Correct: a Actual: b Check Lock ABORT 19

  20. Case 3: SW begins HW begins SW ends HW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a … Release Lock TXN_BEGIN … X = b … TXN_END X value: Time b a Check Lock Correct: b Actual: b COMMIT 20

  21. Case 4: HW begins SW begins SW ends HW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b … TXN_END Acquire Lock … X = a … Release Lock Time X value: Correct: b Actual: b Check Lock COMMIT 21

  22. Hardware Sandboxing Thread 1 (SW) Thread 2 (HW) X = 5; Y = 6 Acquire Lock … ++X … ++Y … Release Lock TXN_BEGIN … Z = 1/(Y-X) … TXN_END Time Z = 1/0 !!! 22

  23. Indirect Jumps Thread 1 (SW) Thread 2 (HW) _xbegin … if (X == Y) *p = garbage p() … if (lock) abort _xend X = 5; Y = 6 Acquire Lock … ++X … ++Y … Release Lock _xend Indirect jump to garbage location Time 23

  24. Overview • Best-effort Hardware Transactional Memory Description • Lazy SGL Correctness Results • Bloom Filter SGL 24

  25. Intruder (medium txns) Better 25

  26. Improved Lock Acquisition Rate Vacation Low (medium txns) Intruder (medium txns) Better Kmeans High (small txns) Labyrinth (large txns) 26

  27. No single thread overhead Slowdown relative to sequential for 1 thread 27

  28. Overview • Best-effort Hardware Transactional Memory Description • Lazy SGL Correctness Results • Bloom Filter SGL 28

  29. Bloom Filters • Efficient probabilistic data structure to compute fast set intersection • Can admit false positives • No false negatives • Used in TM for Conflict Detection 29

  30. Lazy SGL Begin HW txn Begin HW txn Begin SW txn Acquire L Begin HW txn X Begin HW txn Read L End HW txn (1) X Read L End HW txn (2) Time Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 30

  31. BF SGL Begin HW txn Begin HW txn Begin SW txn Acquire L Begin HW txn Begin HW txn Check BF End HW txn (1) Time Check BF End HW txn (2) Release L End SW txn Read L End HW txn (3) Read L End HW txn (4) Legend: X = ABORT COMMIT COMMIT 31

  32. Case 1: HW begins SW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) TXN_BEGIN … X = b … TXN_END Acquire Lock … X = a … Release Lock X value: Time b a Correct: a Actual: b Check BF Check Lock Correct: a Actual: a ABORT If BFs intersect: ABORT Else: COMMIT 32

  33. Case 2: SW begins HW begins HW ends SW ends Thread 1 (SW) Thread 2 (HW) Acquire Lock … X = a … Release Lock TXN_BEGIN … X = b … TXN_END X value: Time Correct: a Actual: a Correct: a Actual: b Check Lock Check BF ABORT If BFs intersect: ABORT Else: COMMIT 33

  34. Conclusions • HTMs are becoming more available • Best-effort – need software fallback • Eager SGL • simple and fast fallback, • often preferred to more efficient solutions 34

  35. Conclusions • Lazy SGL • as simple as Eager SGL • more efficient • Bloom Filter SGL • more accurate conflict detection • Slower • Can be implemented directly in hardware 35

  36. References http://de.sap.info/wp-content/uploads/2012/02/In_Memory_Technologie.jpg http://www.avoiceformen.com/wp-content/uploads/sites/2/2013/01/Questions.jpg

  37. Medium transactions 38

  38. Small transactions 39

  39. Large transactions 40

  40. 41

  41. 42

More Related