
A Dynamic Binary-Rewriting Approach to Software Transactional Memory

Marek Olszewski, Jeremy Cutler, Greg Steffan. University of Toronto.


Presentation Transcript


  1. A Dynamic Binary-Rewriting Approach to Software Transactional Memory. Marek Olszewski, Jeremy Cutler, Greg Steffan. University of Toronto.

  2. The Parallel Programming Challenge
     • Coarse-grained locking
       • Easy to program
       • Scales poorly
     • Fine-grained locking
       • Scales well
       • Hard to get right (e.g., deadlock, priority inversion)
     • The promise of Transactional Memory
       • As easy to program as coarse-grained locking
       • Performance/scalability of fine-grained locking

  3. Transactional Memory (TM)
     Source code (one such region per thread): ... atomic { ... access_shared_data(); ... } ...
     • Programmer: specifies transactions in source code
     • TM system: executes transactions optimistically in parallel
       1) Checkpoints execution
       2) Detects conflicts
       3) Commits, or aborts and re-executes
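The three steps on this slide can be illustrated with a minimal single-threaded sketch. It is not how any particular TM system is implemented: `setjmp`/`longjmp` stand in for checkpointing and re-execution, and `conflict_detected` is a hypothetical stub that simply pretends the first two attempts conflict.

```c
#include <setjmp.h>

static jmp_buf checkpoint;   /* saved execution state */
static int attempts;         /* how many times the body has run */

/* Hypothetical stand-in for real conflict detection:
 * pretend the first two attempts conflict with another thread. */
static int conflict_detected(void) { return attempts < 3; }

/* Adds 1 to *shared "transactionally"; returns the number of attempts. */
int run_transaction(int *shared)
{
    int speculative;

    attempts = 0;
    setjmp(checkpoint);              /* 1) checkpoint execution */
    attempts++;
    speculative = *shared + 1;       /* work on a private copy */
    if (conflict_detected())         /* 2) detect conflicts */
        longjmp(checkpoint, 1);      /* 3) abort and re-execute */
    *shared = speculative;           /* 3) commit */
    return attempts;
}
```

Starting from a shared value of 41, the sketch retries twice and commits 42 on the third attempt.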

  4. TM Implementations
     • Flavors of TM:
       • Hardware (HTM), Software (STM), Hybrid (HyTM)
     • STM is especially compelling
       • Exploit current commodity hardware (multicores)
       • Learn about real TM systems and apps
     • Current STM systems:
       • Java: DSTM, ASTM
       • C or C++: McRT icc, TL2, RSTM, OSTM
       • object-based or programmer intensive (or both)
     Our focus: arbitrary C/C++, realistic environment

  5. Programming with STM
     Source code:
       #include <glib.h>
       GTree *tree;
       ...
       atomic { g_tree_insert(tree, &key, &val); }
       ...
     [Diagram: an STM compiler builds the source into the executable my_app; the loader runs my_app together with the pre-compiled shared library glib, which contains "legacy locks", on top of the kernel, which is reached through system calls.]
     Legacy locks in pre-compiled binaries and system calls are not handled by current compiler/library-based STMs.

  6. JudoSTM: An Overview
     • Key design choices:
       • Dynamic Binary Rewriting (DBR)
         • insert instrumentation to implement STM
       • Value-based conflict detection
     • Resulting key features:
       • Privileged transactions (support system calls)
       • Legacy lock elision
       • Efficient invisible readers

  7. JudoSTM Design Choice 1
     • Dynamic Binary Rewriting (DBR)
       • Judo DBR Framework (user-space version of JIFL†)
     † JIT Instrumentation - A Novel Approach To Dynamically Instrument Operating Systems, SIGOPS EuroSys 2007

  8. Dynamic Binary Rewriting [Diagram: Judo copies basic blocks from the original code into a code cache. Original code: bb1, bb2, bb3, bb4. Code cache: bb1.]

  9. Dynamic Binary Rewriting [Diagram: Original code: bb1, bb2, bb3, bb4. Code cache: bb1, bb2.]

  10. Dynamic Binary Rewriting [Diagram: Original code: bb1, bb2, bb3, bb4. Code cache: bb1, bb2, bb4.]
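The copy-on-first-execution behaviour in these three slides can be sketched as a tiny model. This is only an illustration under assumptions: blocks are plain indices (bb1 to bb4 become 0 to 3), "copying" is just a flag, and the names `code_cache`, `cache_dispatch`, and `run_path` are hypothetical rather than part of Judo.

```c
#include <stddef.h>

enum { NUM_BLOCKS = 4 };   /* bb1..bb4 are indices 0..3 */

typedef struct {
    int cached[NUM_BLOCKS];  /* 1 once a block has been copied into the cache */
    int copies;              /* number of blocks copied so far */
} code_cache;

/* Dispatch to a block: copy (and, in a real system, instrument) it on the
 * first visit, reuse the cached copy afterwards. Returns 1 on a cache miss. */
int cache_dispatch(code_cache *cc, int block)
{
    if (cc->cached[block])
        return 0;
    cc->cached[block] = 1;
    cc->copies++;
    return 1;
}

/* Follow an execution path and report how many blocks had to be copied. */
int run_path(code_cache *cc, const int *path, size_t len)
{
    for (size_t i = 0; i < len; i++)
        cache_dispatch(cc, path[i]);
    return cc->copies;
}
```

A path that visits bb1, bb2, and bb4 repeatedly copies exactly three blocks, and bb3 is never copied because it never executes.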

  11. Judo - Performance [Chart: normalized runtime overhead.] Overhead low enough to implement STM?

  12. JudoSTM Design Choice 2 • Value-Based Conflict Detection • (as opposed to location-based)

  13. Location-Based Conflict Detection [Diagram: main memory holds 6, 2, 3, 5 and is divided into strips, each with a version number (initially 0). Transaction 1 reads 2, 3, and 5. Legend: read, written.]

  14. Location-Based Conflict Detection [Diagram: Transaction 1 has recorded the values 2, 3, 5 together with the strip versions it observed (0).]

  15. Location-Based Conflict Detection [Diagram: Transaction 2 reads 6 and buffers a write of 9 over the 2.]

  16. Location-Based Conflict Detection [Diagram: Transaction 2 commits.] Commit step 1) Validate read set. Commit step 2) Publish writes (and increment version #s).

  17. Location-Based Conflict Detection [Diagram: Transaction 1 tries to commit; main memory now holds 6, 9 and the written strip has version 1.] Commit step 1) Validate read set: fails. Abort! Note: all transactions must maintain strip version #s.
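The walkthrough above can be replayed with a small single-threaded sketch of version-based validation. The strip size, array sizes, and the `loc_*` names are assumptions made for illustration; the sketch has no overflow checks, no commit lock, and does not redirect a transaction's reads to its own buffered writes.

```c
enum { MEM_WORDS = 8, WORDS_PER_STRIP = 2, NUM_STRIPS = 4, MAX_LOG = 8 };

static int memory[MEM_WORDS];
static unsigned strip_version[NUM_STRIPS];

typedef struct {
    int n_reads, n_writes;
    int read_strip[MAX_LOG];         /* which strip each read touched */
    unsigned read_version[MAX_LOG];  /* strip version seen at read time */
    int write_addr[MAX_LOG];
    int write_val[MAX_LOG];
} loc_txn;

int loc_read(loc_txn *t, int addr)
{
    int s = addr / WORDS_PER_STRIP;
    t->read_strip[t->n_reads] = s;
    t->read_version[t->n_reads] = strip_version[s];
    t->n_reads++;
    return memory[addr];
}

void loc_write(loc_txn *t, int addr, int val)
{
    t->write_addr[t->n_writes] = addr;   /* buffered until commit */
    t->write_val[t->n_writes] = val;
    t->n_writes++;
}

/* Returns 1 on commit, 0 on abort. */
int loc_commit(loc_txn *t)
{
    for (int i = 0; i < t->n_reads; i++)        /* step 1: validate read set */
        if (strip_version[t->read_strip[i]] != t->read_version[i])
            return 0;
    for (int i = 0; i < t->n_writes; i++) {     /* step 2: publish writes */
        memory[t->write_addr[i]] = t->write_val[i];
        strip_version[t->write_addr[i] / WORDS_PER_STRIP]++;
    }
    return 1;
}
```

With memory 6, 2, 3, 5, a transaction that read 2, 3, 5 fails validation once another transaction has committed 9 over the 2, because the strip version moved from 0 to 1.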

  18. Value-Based Conflict Detection [Diagram: main memory holds 6, 2, 3, 5. Transaction 1 reads 2, 3, and 5 and records the values. Legend: read, written.]

  19. Value-Based Conflict Detection [Diagram: Transaction 2 reads 6 and buffers a write of 9 over the 2.]

  20. Value-Based Conflict Detection [Diagram: Transaction 2 commits.] Commit step 1) Validate read set. Commit step 2) Publish writes.

  21. Value-Based Conflict Detection [Diagram: Transaction 1 tries to commit; main memory now holds 6, 9, so the recorded value 2 no longer matches.] Commit step 1) Validate read set: fails. Abort! Note: no version information to maintain.
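Value-based validation needs only the logged addresses and values. The following is a minimal sketch of that idea, not JudoSTM's actual code; `read_set`, `logged_read`, and `validate` are hypothetical names, and the log has no overflow check.

```c
enum { MAX_READS = 8 };

typedef struct {
    int n_reads;
    const int *addr[MAX_READS];  /* where each value was read from */
    int value[MAX_READS];        /* what was read */
} read_set;

/* Perform a read and remember the (address, value) pair. */
int logged_read(read_set *rs, const int *addr)
{
    rs->addr[rs->n_reads] = addr;
    rs->value[rs->n_reads] = *addr;
    rs->n_reads++;
    return *addr;
}

/* Returns 1 if every logged location still holds the logged value. */
int validate(const read_set *rs)
{
    for (int i = 0; i < rs->n_reads; i++)
        if (*rs->addr[i] != rs->value[i])
            return 0;
    return 1;
}
```

One consequence of comparing values rather than versions: a store that rewrites the same value leaves the read set valid, while a store of a different value, from any source, is detected.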

  22. JudoSTM Feature 1: • Privileged transaction • can execute system calls

  23. Privileged Transactions [Diagram: main memory holds 6, 2, 3, 5. Transaction 1 has read 2, 3, and 5. Transaction 2 is executing natively. Legend: read, written.]

  24. Privileged Transactions [Diagram: Transaction 2 writes 9 over the 2 in main memory.] Executing natively, it can write directly to memory and may be uninstrumented.

  25. Privileged Transactions [Diagram: Transaction 1 tries to commit.] Commit step 1) Validate read set: fails. Abort! Value-based conflict detection facilitates system calls within transactions!

  26. JudoSTM Feature 2: • Legacy Lock Elision • Safely ignore locks within legacy code

  27. Legacy Lock Elision [Diagram: main memory includes a lock word, initially 0. Transaction 1 performs a lock acquire: it reads the lock as 0 and buffers a write of 1. Legend: read/write, read, written.]

  28. Legacy Lock Elision [Diagram: Transaction 2 also performs a lock acquire; the lock word in main memory is still 0, so it too reads 0 and buffers a write of 1.]

  29. Legacy Lock Elision [Diagram: Transaction 2 reads 6 and buffers a write of 9, then performs a lock release, which sets its buffered lock value back to 0.]

  30. Legacy Lock Elision [Diagram: Transaction 2 commits.] Commit step 1) Validate read set. Commit step 2) Publish writes; the write of 0 to the lock word is a silent store.

  31. Legacy Lock Elision [Diagram: Transaction 1 reads 5 and buffers a write of 7, then performs a lock release.]

  32. Legacy Lock Elision [Diagram: Transaction 1 commits.] Commit step 1) Validate read set: succeeds.

  33. Legacy Lock Elision Commit step 2) Publish writes. Value-based conflict detection facilitates the elision of legacy locks!
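The lock-elision sequence can be reproduced with a small sketch in which a legacy lock is just an `int` that transactions read and write through buffered accesses. All names here (`txn`, `tx_load`, `tx_store`, `legacy_trylock`, `legacy_unlock`, `tx_commit`) are hypothetical, the sketch is single-threaded, and redirecting loads to a transaction's own buffered stores is omitted for brevity.

```c
enum { MAX_LOG = 8 };

typedef struct {
    int n_reads, n_writes;
    int *raddr[MAX_LOG]; int rval[MAX_LOG];   /* read-set  */
    int *waddr[MAX_LOG]; int wval[MAX_LOG];   /* write-set */
} txn;

int tx_load(txn *t, int *addr)
{
    t->raddr[t->n_reads] = addr;              /* log address and value */
    t->rval[t->n_reads] = *addr;
    t->n_reads++;
    return *addr;
}

void tx_store(txn *t, int *addr, int val)
{
    for (int i = 0; i < t->n_writes; i++)     /* one entry per address */
        if (t->waddr[i] == addr) { t->wval[i] = val; return; }
    t->waddr[t->n_writes] = addr;
    t->wval[t->n_writes] = val;
    t->n_writes++;
}

/* Legacy lock code as it would behave inside a transaction. */
int legacy_trylock(txn *t, int *lock)
{
    if (tx_load(t, lock) != 0) return 0;      /* lock looks held */
    tx_store(t, lock, 1);                     /* buffered: invisible to others */
    return 1;
}

void legacy_unlock(txn *t, int *lock) { tx_store(t, lock, 0); }

/* Returns 1 on commit, 0 on abort. */
int tx_commit(txn *t)
{
    for (int i = 0; i < t->n_reads; i++)      /* value-based validation */
        if (*t->raddr[i] != t->rval[i]) return 0;
    for (int i = 0; i < t->n_writes; i++)     /* publish; lock entry is 0 over 0 */
        *t->waddr[i] = t->wval[i];
    return 1;
}
```

Two transactions can both "acquire" the same lock because each acquire stays in a private buffer. At commit the lock entry holds 0 again, so publishing it is a silent store that does not invalidate the other transaction's read of the lock.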

  34. JudoSTM Feature 3: • Efficient Invisible Readers

  35. Supporting Invisible Readers
     • Invisible readers: don't report reads to others
       • good performance
       • but can lead to inconsistent read data: errors!
     • Data errors: segfault, divide by zero
       • Cheap solution: catch with trap/signal handlers
     • Control errors: jump to non-instrumented code
       • Typical solution: verify read-set after every load
         • Expensive! O(N^2)
       • DBR solution: prevented by sandboxing
         • DBR instruments all code as it executes
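The quadratic cost of verifying the read-set after every load follows from a short count. This assumes, for illustration, that a transaction performs N loads, that each load adds one new read-set entry, and that a validation compares every entry logged so far. The i-th validation then performs i comparisons, so the total is

```latex
\[
  \sum_{i=1}^{N} i \;=\; \frac{N(N+1)}{2} \;=\; O(N^2),
\]
```

compared with N comparisons when the read-set is validated only once, at commit.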

  36. JudoSTM Details • Programming with JudoSTM

  37. Programming with JudoSTM
     Library (judostm.h):
       #ifndef JUDOSTM_H
       #define JUDOSTM_H
       extern void judostm_start(void);
       extern void judostm_stop(void);
       #define atomic \
         asm __volatile__ ("":::"eax", "ecx", "edx", "ebx", "edi", \
                           "esi", "flags", "memory");\
         int __count = 0; \
         judostm_start();\
         for (; __count < 1; judostm_stop(), __count++)
       #endif
     Source code:
       #include <glib.h>
       #include <judostm.h>
       GTree *tree;
       ...
       atomic { g_tree_insert(tree, &key, &val); }
       ...
     [Diagram: gcc compiles the source, with the judoSTM library, into the executable my_app. At run time the loader starts my_app with the shared library glib, and my_app + glib run instrumented from the code cache, on top of the kernel.]
     • Easy to use, with no compiler support!
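The `for`-loop trick in the macro can be tried in isolation. The sketch below is a portable variant under assumptions: the x86 register-clobbering `asm` statement is left out, `stub_start` and `stub_stop` are hypothetical counters standing in for `judostm_start` and `judostm_stop`, and the macro is renamed `atomic_sketch` to avoid suggesting it is the real one.

```c
static int starts, stops;                     /* call counters for the stubs */
static void stub_start(void) { starts++; }
static void stub_stop(void)  { stops++; }

/* Runs the following block exactly once: stub_start() before it,
 * stub_stop() in the loop's increment step after it. */
#define atomic_sketch \
    int tx_once_ = 0; \
    stub_start(); \
    for (; tx_once_ < 1; stub_stop(), tx_once_++)

int increment_atomically(int *counter)
{
    atomic_sketch {
        (*counter)++;
    }
    return *counter;
}
```

Because the macro declares a variable, it can appear only once per scope, and a `break` or `return` inside the block would skip the stop call.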

  38. JudoSTM Details • Implementation

  39. Goal: Perform These Efficiently
     • For all non-stack write instructions
       • Track write addresses and values (write-set)
       • Buffer the values from regular memory
     • For all non-stack read instructions
       • Redirect to buffered values
       • If miss: track read addr. and value (read-set)
     • When a transaction completes:
       • Acquire commit lock(s)
       • Validate read-set (value-based conflict detection)
       • Commit write-set to memory
       • Release commit lock(s)
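The read, write, and commit steps listed above can be put together in a compact sketch. It rests on several assumptions: linear arrays replace the hashtables, a single C11 `atomic_flag` spinlock stands in for the commit lock(s), all data are `int`s, and the names `txn`, `tx_read`, `tx_write`, and `tx_commit` are hypothetical. It also ignores the memory-model subtleties of racing on plain `int`s.

```c
#include <stdatomic.h>

enum { MAX_ENTRIES = 16 };

typedef struct { int *addr; int value; } entry;

typedef struct {
    entry reads[MAX_ENTRIES];   int n_reads;    /* read-set  */
    entry writes[MAX_ENTRIES];  int n_writes;   /* write-set */
} txn;

static atomic_flag commit_lock = ATOMIC_FLAG_INIT;  /* one global commit lock */

/* Called in place of a non-stack store: buffer the value. */
void tx_write(txn *t, int *addr, int value)
{
    for (int i = 0; i < t->n_writes; i++)
        if (t->writes[i].addr == addr) { t->writes[i].value = value; return; }
    t->writes[t->n_writes++] = (entry){ addr, value };
}

/* Called in place of a non-stack load. */
int tx_read(txn *t, int *addr)
{
    for (int i = 0; i < t->n_writes; i++)           /* redirect to buffered value */
        if (t->writes[i].addr == addr) return t->writes[i].value;
    int v = *addr;                                  /* miss: track addr and value */
    t->reads[t->n_reads++] = (entry){ addr, v };
    return v;
}

/* Returns 1 if committed, 0 if the transaction must abort and retry. */
int tx_commit(txn *t)
{
    int ok = 1;
    while (atomic_flag_test_and_set(&commit_lock))  /* acquire commit lock */
        ;
    for (int i = 0; ok && i < t->n_reads; i++)      /* validate read-set */
        ok = (*t->reads[i].addr == t->reads[i].value);
    for (int i = 0; ok && i < t->n_writes; i++)     /* commit write-set */
        *t->writes[i].addr = t->writes[i].value;
    atomic_flag_clear(&commit_lock);                /* release commit lock */
    t->n_reads = t->n_writes = 0;                   /* ready for reuse or retry */
    return ok;
}
```

A buffered write is visible to the writing transaction's own reads but reaches memory only at commit; if a logged location changes in the meantime, the commit reports failure and leaves memory untouched.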

  40. Read/Write Buffer Implementation
     • Linear-probed, open-addressed hashtables
     [Diagram: a read hashtable and a write hashtable, each keyed by address, pointing into the read buffer and the write buffer.]
     • Efficient lookup: 5 insts for a hit (+ state-saving?)
     • Efficient validate and commit?
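A linear-probed, open-addressed table that maps an accessed address to a buffer slot might look like the sketch below. The table size, the hash function, the use of address 0 as the "empty" marker, and the names `addr_table`, `table_find`, and `table_insert` are all assumptions; this C version also makes no attempt to match the five-instruction hit path mentioned on the slide.

```c
#include <stdint.h>

enum { TABLE_SIZE = 16 };            /* power of two, so masking works */

typedef struct {
    uintptr_t addr[TABLE_SIZE];      /* key: accessed address (0 = empty) */
    int slot[TABLE_SIZE];            /* value: index into the buffer */
} addr_table;

static unsigned hash_addr(uintptr_t a)
{
    return (unsigned)(a >> 2) & (TABLE_SIZE - 1);   /* drop word-alignment bits */
}

/* Returns the buffer slot recorded for addr, or -1 on a miss. */
int table_find(const addr_table *t, uintptr_t addr)
{
    unsigned i = hash_addr(addr);
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        if (t->addr[i] == addr) return t->slot[i];
        if (t->addr[i] == 0)    return -1;          /* empty: not present */
        i = (i + 1) & (TABLE_SIZE - 1);             /* linear probe */
    }
    return -1;
}

/* Records addr -> slot. Returns 0 on success, -1 if the table is full. */
int table_insert(addr_table *t, uintptr_t addr, int slot)
{
    unsigned i = hash_addr(addr);
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        if (t->addr[i] == 0 || t->addr[i] == addr) {
            t->addr[i] = addr;
            t->slot[i] = slot;
            return 0;
        }
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return -1;
}
```

Two addresses that hash to the same index end up in adjacent table entries, and a lookup stops at the first empty entry it meets.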

  41. Efficient Commit: Executable Write-Buffer [Diagram: the write hashtable and a top pointer refer to slots in the write buffer.]
     Write buffer:
       movl $0x00000000,0x00000000   (8 slots, all empty)
       ret
     • Pre-allocated buffer of move instructions
     • Emit value-address pairs as transaction executes

  42. Efficient Commit: Executable Write-Buffer
     Write buffer:
       movl $0x00000000,0x00000000   (7 empty slots)
       movl $0x00000025,0x80B10BB8
       ret
     • Pre-allocated buffer of move instructions
     • Emit value-address pairs as transaction executes

  43. Efficient Commit: Executable Write-Buffer
     Write buffer:
       movl $0x00000000,0x00000000   (6 empty slots)
       movl $0x0000ab42,0x80B10BCC
       movl $0x00000025,0x80B10BB8
       ret
     • Pre-allocated buffer of move instructions
     • Emit value-address pairs as transaction executes

  44. Efficient Commit: Executable Write-Buffer
     Write buffer:
       movl $0x00000000,0x00000000   (5 empty slots)
       movl $0x80B10CFC,0x80B10CA4
       movl $0x0000ab42,0x80B10BCC
       movl $0x00000025,0x80B10BB8
       ret
     • Pre-allocated buffer of move instructions
     • Emit value-address pairs as transaction executes

  45. Efficient Commit: Executable Write-Buffer
     Write buffer:
       movl $0x00000000,0x00000000   (5 empty slots)
       movl $0x80B10CFC,0x80B10CA4
       movl $0x0000ab42,0x80B10BCC
       movl $0x00000025,0x80B10BB8
       ret
     Execute the write-buffer to commit!
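How value-address pairs could be emitted as machine code can be sketched by encoding the instructions by hand. The sketch assumes 32-bit x86, where `movl $imm32, addr32` encodes as `C7 05`, then the address, then the value (both little-endian), and `ret` is `C3`. It only fills a byte array and never executes it; the buffer size and the names `write_buffer`, `wb_reset`, and `wb_emit` are hypothetical. Like the slides, it fills the buffer from the `ret` upward.

```c
#include <stdint.h>
#include <stddef.h>

enum { MOV_LEN = 10, MAX_WRITES = 8, CODE_LEN = MAX_WRITES * MOV_LEN + 1 };

typedef struct {
    uint8_t code[CODE_LEN];
    size_t top;              /* offset of the first instruction to run */
} write_buffer;

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Empty buffer: just the final ret. */
void wb_reset(write_buffer *wb)
{
    wb->top = CODE_LEN - 1;
    wb->code[wb->top] = 0xC3;            /* ret */
}

/* Emit "movl $value, addr" just above the current top.
 * Returns 0 on success, -1 if the buffer is full. */
int wb_emit(write_buffer *wb, uint32_t value, uint32_t addr)
{
    if (wb->top < MOV_LEN) return -1;
    wb->top -= MOV_LEN;
    uint8_t *p = wb->code + wb->top;
    p[0] = 0xC7; p[1] = 0x05;            /* mov r/m32, imm32; absolute disp32 */
    put_le32(p + 2, addr);
    put_le32(p + 6, value);
    return 0;
}
```

Committing would then amount to calling the code at offset `top`, which requires the buffer to live in executable memory; that part is deliberately left out here.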

  46. Efficient Validation: Executable Read-Buffer [Diagram: the read hashtable and a top pointer refer to slots in the read buffer.]
     Read buffer:
       cmp $0x00000000, 0x00000000
       jne,pn judostm_trans_abort    (4 such slots, all empty)
       ret
     • Pre-allocated buffer of compare & jump instructions
     • Emit value-address pairs as transaction executes

  47. Efficient Validation: Executable Read-Buffer
     Read buffer:
       cmp $0x00000000, 0x00000000
       jne,pn judostm_trans_abort    (3 empty slots)
       cmp $0x00000a34, 0x80B10CA4
       jne,pn judostm_trans_abort
       ret
     • Pre-allocated buffer of compare & jump instructions
     • Emit value-address pairs as transaction executes

  48. Efficient Validation: Executable Read-Buffer
     Read buffer:
       cmp $0x00000000, 0x00000000
       jne,pn judostm_trans_abort    (2 empty slots)
       cmp $0x00000005, 0x80B10BB8
       jne,pn judostm_trans_abort
       cmp $0x00000a34, 0x80B10CA4
       jne,pn judostm_trans_abort
       ret
     • Pre-allocated buffer of compare & jump instructions
     • Emit value-address pairs as transaction executes

  49. Efficient Validation: Executable Read-Buffer
     Read buffer:
       cmp $0x00000000, 0x00000000
       jne,pn judostm_trans_abort    (1 empty slot)
       cmp $0x00000100, 0x80B10BCC
       jne,pn judostm_trans_abort
       cmp $0x00000005, 0x80B10BB8
       jne,pn judostm_trans_abort
       cmp $0x00000a34, 0x80B10CA4
       jne,pn judostm_trans_abort
       ret
     • Pre-allocated buffer of compare & jump instructions
     • Emit value-address pairs as transaction executes

  50. Efficient Validation: Executable Read-Buffer
     Read buffer:
       cmp $0x00000000, 0x00000000
       jne,pn judostm_trans_abort    (1 empty slot)
       cmp $0x00000100, 0x80B10BCC
       jne,pn judostm_trans_abort
       cmp $0x00000005, 0x80B10BB8
       jne,pn judostm_trans_abort
       cmp $0x00000a34, 0x80B10CA4
       jne,pn judostm_trans_abort
       ret
     Execute the read-buffer to validate the read-set!
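The compare-and-jump slots can be encoded by hand in the same spirit. This sketch assumes 32-bit x86: `cmpl $imm32, addr32` encodes as `81 3D`, then the address, then the value, and `jne rel32` as `0F 85` followed by a displacement measured from the end of the jump. The branch-hint suffix shown on the slides is omitted, and `base` (the run-time address of the buffer) and `abort_target` (the address of the abort routine) are hypothetical inputs. The bytes are only written, never executed.

```c
#include <stdint.h>
#include <stddef.h>

enum { CHECK_LEN = 16, MAX_READS = 4, CODE_LEN = MAX_READS * CHECK_LEN + 1 };

typedef struct {
    uint8_t code[CODE_LEN];
    size_t top;              /* offset of the first check to run */
    uint32_t base;           /* assumed run-time address of code[0] */
    uint32_t abort_target;   /* assumed address of the abort routine */
} read_buffer;

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

void rb_reset(read_buffer *rb, uint32_t base, uint32_t abort_target)
{
    rb->top = CODE_LEN - 1;
    rb->code[rb->top] = 0xC3;            /* ret: reached if all checks pass */
    rb->base = base;
    rb->abort_target = abort_target;
}

/* Emit "cmpl $value, addr; jne abort" just above the current top.
 * Returns 0 on success, -1 if the buffer is full. */
int rb_emit(read_buffer *rb, uint32_t value, uint32_t addr)
{
    if (rb->top < CHECK_LEN) return -1;
    rb->top -= CHECK_LEN;
    uint8_t *p = rb->code + rb->top;
    uint32_t next = rb->base + (uint32_t)rb->top + CHECK_LEN; /* address after jne */
    p[0] = 0x81; p[1] = 0x3D;            /* cmp r/m32, imm32; absolute disp32 */
    put_le32(p + 2, addr);
    put_le32(p + 6, value);
    p[10] = 0x0F; p[11] = 0x85;          /* jne rel32 */
    put_le32(p + 12, rb->abort_target - next);
    return 0;
}
```

Validation would then be a call to the code at offset `top`: any mismatching value jumps to the abort routine, and reaching the `ret` means the read-set is still valid.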
