A Dynamic Binary-Rewriting Approach to Software Transactional Memory


Marek Olszewski, Jeremy Cutler, Greg Steffan. University of Toronto. Appeared in PACT 2007, Brasov, Romania.


The Parallel Programming Challenge
  • Coarse-grained locking
    • Easy to program
    • Scales poorly
  • Fine-grained locking
    • Scales well
    • Hard to get right
      • e.g., deadlock, priority inversion, etc.
  • The promise of Transactional Memory
    • As easy to program as coarse-grained locking
    • Performance/scalability of fine-grained locking
Transactional Memory (TM)

Source Code:

...
atomic {
    ...
    access_shared_data();
    ...
}
...

[Figure: three copies of this source code are shown as transactions handled by the TM System.]

Programmer:

Specifies threads/transactions in source code

TM System:

Executes transactions optimistically in parallel

1) Checkpoints execution

2) Detects conflicts

3) Commits or aborts and re-executes
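
The three steps above can be sketched as a retry loop. This is a minimal single-threaded model, not JudoSTM's implementation: `shared_counter`, `body`, and the `interfere` parameter (which simulates a concurrent writer) are all hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical shared word touched by the transaction. */
static int shared_counter = 0;

/* Transaction body: computes a new value from the snapshot it was given. */
static int body(int snapshot) { return snapshot + 1; }

/* Runs the body optimistically and retries until it commits.
 * `interfere` simulates another thread changing the shared word during
 * the first `interfere` attempts. Returns the number of attempts used. */
int run_transaction(int interfere)
{
    assert(interfere >= 0);
    int attempts = 0;
    for (;;) {
        attempts++;
        int snapshot = shared_counter;     /* 1) checkpoint */
        int result = body(snapshot);       /* speculative work */
        if (interfere > 0) {               /* simulated concurrent writer */
            shared_counter += 10;
            interfere--;
        }
        if (shared_counter == snapshot) {  /* 2) detect conflicts */
            shared_counter = result;       /* 3) commit */
            return attempts;
        }
        /* 3) abort: drop `result` and re-execute from the checkpoint */
    }
}
```

With no interference the body commits on the first attempt; each simulated conflict costs one extra attempt.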

TM Implementations
  • Flavors of TM:
    • Hardware (HTM), Software (STM), Hybrid (HyTM)
  • STM is especially compelling
    • Exploit current commodity hardware (multicores)
    • Learn about real TM systems and apps
  • Current STM Systems:
    • Java: DSTM, ASTM
    • C or C++: McRT icc, TL2, RSTM, OSTM
      • object-based or programmer intensive (or both)

Our focus: arbitrary C/C++, realistic environment

Programming with STM

Source Code:

#include <glib.h>

GTree *tree;
...
atomic {
    g_tree_insert(tree, &key, &val);
}
...

[Figure: the STM compiler turns the source code into the executable my_app. The loader starts the running application, which consists of my_app, the shared library glib (a pre-compiled binary containing "legacy locks"), and system calls into the kernel.]

Not handled by current compiler/library-based STMs

JudoSTM: An Overview
  • Key design choices:
    • Dynamic Binary Rewriting (DBR)
      • insert instrumentation to implement STM
    • Value-based conflict detection
  • Resulting key features:
    • Privileged transactions (support system calls)
    • Legacy lock elision
    • Efficient invisible readers
JudoSTM Design Choice 1
  • Dynamic Binary Rewriting (DBR)
    • Judo DBR Framework (user-space version of JIFL†)
  • † JIT Instrumentation - A Novel Approach To Dynamically Instrument Operating Systems, SIGOPS EuroSys 2007
Dynamic Binary Rewriting

[Figure, three animation frames: Judo copies basic blocks from the original code (bb1, bb2, bb3, bb4) into the code cache as execution reaches them.]
Judo - Performance

[Figure: normalized runtime overhead]

Overhead low enough to implement STM?

DBR-Based STM Goal: Perform These Efficiently
  • For all non-stack write instructions
    • Track write addresses and values (write-set)
    • Write-buffer the values from regular memory
  • For all non-stack read instructions
    • Redirect to the write-buffer
    • On a miss: track read addresses and values (read-set)
  • When a transaction completes:
    • Acquire commit lock(s)
    • Validate read-set (value-based conflict detection)
    • Commit write-set to memory
    • Release commit lock(s)
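
The bookkeeping listed above can be sketched in plain C. This is an illustrative model only: the names (`tx_t`, `tx_read`, `tx_write`, `tx_commit`) and the fixed capacity are assumptions, lookups use a linear scan rather than a hashtable, and the commit lock is left as a comment because the sketch is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 16  /* hypothetical fixed capacity */

typedef struct { int *addr; int value; } entry_t;

typedef struct {
    entry_t writes[MAX_ENTRIES]; size_t n_writes;  /* write-set / write-buffer */
    entry_t reads[MAX_ENTRIES];  size_t n_reads;   /* read-set */
} tx_t;

void tx_begin(tx_t *tx) { tx->n_writes = 0; tx->n_reads = 0; }

static entry_t *find_write(tx_t *tx, const int *addr)
{
    for (size_t i = 0; i < tx->n_writes; i++)
        if (tx->writes[i].addr == addr) return &tx->writes[i];
    return NULL;
}

/* Instrumented store: buffer the value, leave regular memory untouched. */
void tx_write(tx_t *tx, int *addr, int value)
{
    entry_t *e = find_write(tx, addr);
    if (e) { e->value = value; return; }
    assert(tx->n_writes < MAX_ENTRIES);
    tx->writes[tx->n_writes++] = (entry_t){ addr, value };
}

/* Instrumented load: hit in the write-buffer, else read memory and log it. */
int tx_read(tx_t *tx, int *addr)
{
    entry_t *e = find_write(tx, addr);
    if (e) return e->value;
    assert(tx->n_reads < MAX_ENTRIES);
    int value = *addr;
    tx->reads[tx->n_reads++] = (entry_t){ addr, value };
    return value;
}

/* Commit: validate the read-set by value, then publish the write-set.
 * A real system would hold the commit lock(s) around this function. */
bool tx_commit(tx_t *tx)
{
    for (size_t i = 0; i < tx->n_reads; i++)
        if (*tx->reads[i].addr != tx->reads[i].value) return false;
    for (size_t i = 0; i < tx->n_writes; i++)
        *tx->writes[i].addr = tx->writes[i].value;
    return true;
}
```

A failed `tx_commit` leaves memory untouched, so the caller can simply call `tx_begin` and re-execute.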
DBR: Attractive Properties for STM
  • Performance: overheads are amortized
    • code cache
  • Can handle arbitrary code and shared libraries
    • any/all code is transactionalized as it executes
  • Sandboxed Transactions
    • Typical STM:
      • inconsistent values could cause execution to stray
        • i.e., stray into non-transactionalized code (very bad!)
      • solution: frequent & costly read-set validation
    • DBR-based STM:
      • any/all code is transactionalized as it executes

Tough problems for conventional STMs addressed by DBR

JudoSTM Design Choice 2
  • Value-Based Conflict Detection
    • (as opposed to location-based)
Location-Based Conflict Detection

[Figure, five animation frames: Transaction 1 and Transaction 2 read and write values in main memory, which is divided into strips that each carry a version number.]

Transaction 2 commits:
Commit step 1) Validate Read Set
Commit step 2) Publish Writes (and increment version #s)

Transaction 1 then attempts to commit:
Commit step 1) Validate Read Set
Abort!

Note: all transactions must maintain strip version #s
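
The version bookkeeping can be sketched as follows. This is a simplified single-threaded model under stated assumptions: one word per strip, four strips, and hypothetical names (`versioned_read`, `versioned_publish`, `versions_valid`); the initial values mirror the numbers in the figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_STRIPS 4  /* assumption: one word per strip */

static int      memory[N_STRIPS]   = { 6, 2, 3, 5 };
static unsigned versions[N_STRIPS] = { 0, 0, 0, 0 };

typedef struct { size_t strip; unsigned version; } read_rec_t;

/* Read a strip and remember which version was seen. */
int versioned_read(read_rec_t *rec, size_t strip)
{
    assert(strip < N_STRIPS);
    rec->strip = strip;
    rec->version = versions[strip];
    return memory[strip];
}

/* Publish a write: every writer must also bump the strip version. */
void versioned_publish(size_t strip, int value)
{
    assert(strip < N_STRIPS);
    memory[strip] = value;
    versions[strip]++;
}

/* Validation compares versions, not values. */
bool versions_valid(const read_rec_t *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (versions[recs[i].strip] != recs[i].version) return false;
    return true;
}
```

Because validation looks only at version numbers, a write that bypasses `versioned_publish` would go unnoticed, which is why every writer has to maintain them.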

Value-Based Conflict Detection

[Figure, four animation frames: Transaction 1 and Transaction 2 read and write values in main memory; no strip versions are kept.]

Transaction 2 commits:
Commit step 1) Validate Read Set
Commit step 2) Publish Writes

Transaction 1 then attempts to commit:
Commit step 1) Validate Read Set
Abort!

Note: no version information to maintain

JudoSTM Feature 1:
  • Privileged transactions
    • Can execute (but not roll back) system calls
    • Grab commit lock(s) when about to make a syscall
      • Release when transaction completes
    • Only one privileged transaction exists at a time
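
The locking rule above can be sketched with a single try-lock. This is an assumption-laden model, not JudoSTM's code: `commit_lock`, `ptx_t`, `tx_become_privileged`, and `tx_finish` are hypothetical names, and the slides do not say what else happens at the transition (for example, whether the read-set is validated first).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global commit lock, modelled as a C11 atomic flag. */
static atomic_flag commit_lock = ATOMIC_FLAG_INIT;

typedef struct { bool privileged; } ptx_t;

/* Called just before a system call. Returns false if another transaction
 * already holds the commit lock; the caller would then wait or retry. */
bool tx_become_privileged(ptx_t *tx)
{
    assert(tx != NULL);
    if (tx->privileged) return true;           /* already holds the lock */
    if (atomic_flag_test_and_set(&commit_lock))
        return false;                          /* someone else is privileged */
    tx->privileged = true;
    return true;
}

/* Called when the transaction completes: release the lock if held. */
void tx_finish(ptx_t *tx)
{
    assert(tx != NULL);
    if (tx->privileged) {
        atomic_flag_clear(&commit_lock);
        tx->privileged = false;
    }
}
```

Holding the commit lock from the system call until completion means no other transaction can commit in between, so the privileged transaction never needs to roll back.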
Privileged Transactions

[Figure, three animation frames: Transaction 1 reads values from main memory; Transaction 2 (privileged, syscalls) writes to main memory.]

Privileged: can write directly to memory
(may be uninstrumented)

Transaction 1 then attempts to commit:
Commit step 1) Validate Read Set
Abort!

Value-based conflict detection facilitates system calls within transactions!

JudoSTM Feature 2:
  • Legacy Lock Elision
    • Safely ignore locks within legacy code
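
Why the locks can be ignored is easiest to see in a small model. The sketch below is hypothetical (names such as `slot_t`, `legacy_trylock`, and `legacy_unlock` are invented, and it is single-threaded): a legacy lock acquire and release inside a transaction only touch the write-buffer, so the lock word committed to memory equals the value it already had.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 8  /* hypothetical capacity */

typedef struct {
    int *addr; int seen; int pending;
    bool was_read, was_written;
} slot_t;

typedef struct { slot_t slots[MAX_SLOTS]; size_t n; } tx_t;

/* Find the slot for addr, creating it (and recording the value seen). */
static slot_t *slot_for(tx_t *tx, int *addr)
{
    for (size_t i = 0; i < tx->n; i++)
        if (tx->slots[i].addr == addr) return &tx->slots[i];
    assert(tx->n < MAX_SLOTS);
    slot_t *s = &tx->slots[tx->n++];
    *s = (slot_t){ addr, *addr, *addr, false, false };
    return s;
}

int tx_read(tx_t *tx, int *addr)
{
    slot_t *s = slot_for(tx, addr);
    if (!s->was_written) s->was_read = true;  /* miss in the write-buffer */
    return s->pending;
}

void tx_write(tx_t *tx, int *addr, int value)
{
    slot_t *s = slot_for(tx, addr);
    s->pending = value;
    s->was_written = true;
}

/* Legacy lock code, running instrumented inside a transaction. */
bool legacy_trylock(tx_t *tx, int *lock)
{
    if (tx_read(tx, lock) != 0) return false;
    tx_write(tx, lock, 1);
    return true;
}

void legacy_unlock(tx_t *tx, int *lock) { tx_write(tx, lock, 0); }

/* Value-based validation, then publish. */
bool tx_commit(tx_t *tx)
{
    for (size_t i = 0; i < tx->n; i++)
        if (tx->slots[i].was_read && *tx->slots[i].addr != tx->slots[i].seen)
            return false;
    for (size_t i = 0; i < tx->n; i++)
        if (tx->slots[i].was_written)
            *tx->slots[i].addr = tx->slots[i].pending;
    return true;
}
```

Two transactions that take the same lock but touch different data both commit here, because the lock word is rewritten with the value it already held. A version-based scheme would count that store as a change and abort the second transaction.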
Legacy Lock Elision

[Figure, seven animation frames: Transaction 1 and Transaction 2 each perform a lock acquire and a lock release on the same legacy lock word in main memory while also reading and writing other data. Labels in the figure: lock acquire, lock release, silent store.]

Each transaction commits in turn:
Commit step 1) Validate Read Set
Commit step 2) Publish Writes

Value-based conflict detection facilitates the elision of legacy locks!

JudoSTM Feature 3:
  • Efficient Invisible Readers
Supporting Invisible Readers
  • Invisible Readers: don’t report reads to others
    • good performance
    • but can lead to inconsistent read data: errors!
  • Data errors: segfault, divide by zero
    • Cheap solution: catch with trap/signal handlers
  • Control errors: jump to non-instrumented code
    • Typical solution: verify read-set after every load
      • Expensive! O(N²)
    • DBR solution: prevented by sandboxing
      • DBR instruments all code as it executes
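
The "catch with trap/signal handlers" idea for data errors can be sketched with POSIX `sigsetjmp`/`siglongjmp`. This is an illustration only: `run_sandboxed`, `clean_body`, and `faulting_body` are hypothetical names, the fault is simulated with `raise(SIGSEGV)` instead of a real bad access, and the handler is process-wide where a real system would need per-thread handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf abort_point;

/* On a data error, jump back to the start of the transaction. */
static void on_fault(int sig) { (void)sig; siglongjmp(abort_point, 1); }

/* Example bodies: one that finishes, one that simulates a segfault. */
void clean_body(void) { }
void faulting_body(void) { raise(SIGSEGV); }

/* Runs body(); returns true if it finished, false if a fault was caught
 * (the caller would then abort and re-execute the transaction). */
bool run_sandboxed(void (*body)(void))
{
    assert(body != NULL);
    struct sigaction sa, old_segv, old_fpe;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGFPE, &sa, &old_fpe);

    volatile bool completed = false;
    if (sigsetjmp(abort_point, 1) == 0) {  /* 1 = also save the signal mask */
        body();
        completed = true;
    }
    sigaction(SIGSEGV, &old_segv, NULL);   /* restore previous handlers */
    sigaction(SIGFPE, &old_fpe, NULL);
    return completed;
}
```

Saving the signal mask in `sigsetjmp` matters: without it the signal would stay blocked after the first caught fault and a second fault would not reach the handler.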
JudoSTM Details
  • Implementation
(Reminder) Goal: Perform These Efficiently
  • For all non-stack write instructions
    • Track write addresses and values (write-set)
    • Buffer the values from regular memory
  • For all non-stack read instructions
    • Redirect to the write-buffer
    • On a miss: track read addresses and values (read-set)
  • When a transaction completes:
    • Acquire commit lock(s)
    • Validate read-set (value-based conflict detection)
    • Commit write-set to memory
    • Release commit lock(s)
Read/Write Buffer Implementation

Linear probed open-addressed hashtables

[Figure: a read hashtable and a write hashtable, each keyed by address, alongside the read buffer and the write buffer.]

Efficient lookup: 5 insts for a hit (+ state-saving?)

Efficient validate and commit?
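
A linear-probed, open-addressed table keyed by address can be sketched as below. The table size, the hash function, and all names (`addr_table_t`, `table_probe`, `table_insert`, `table_lookup`) are assumptions for illustration; the instruction count quoted on the slide refers to JudoSTM's generated code, not to this C version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64  /* hypothetical; must be a power of two */

typedef struct { uintptr_t addr; size_t buf_index; } bucket_t;  /* addr 0 = empty */
typedef struct { bucket_t buckets[TABLE_SIZE]; size_t used; } addr_table_t;

/* Assumed hash: drop the low alignment bits, mask to the table size. */
static size_t hash_addr(uintptr_t a) { return (size_t)(a >> 2) & (TABLE_SIZE - 1); }

/* Returns the bucket holding addr, or the empty bucket where it belongs. */
static bucket_t *table_probe(addr_table_t *t, uintptr_t addr)
{
    assert(addr != 0);
    size_t i = hash_addr(addr);
    while (t->buckets[i].addr != 0 && t->buckets[i].addr != addr)
        i = (i + 1) & (TABLE_SIZE - 1);  /* linear probing */
    return &t->buckets[i];
}

/* Maps addr to an index in the read or write buffer. */
void table_insert(addr_table_t *t, uintptr_t addr, size_t buf_index)
{
    bucket_t *b = table_probe(t, addr);
    if (b->addr == 0) {
        assert(t->used < TABLE_SIZE - 1);  /* keep one bucket empty */
        b->addr = addr;
        t->used++;
    }
    b->buf_index = buf_index;
}

bool table_lookup(addr_table_t *t, uintptr_t addr, size_t *buf_index)
{
    bucket_t *b = table_probe(t, addr);
    if (b->addr != addr) return false;  /* miss */
    *buf_index = b->buf_index;
    return true;
}
```

Keeping at least one bucket empty guarantees that every probe sequence terminates.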

Efficient Commit: Executable Write-Buffer

[Figure, five animation frames: a write hashtable and a "Top ptr" point into the write buffer, which fills from the bottom up.]

Pre-allocated buffer of move instructions

Emit value-address pairs as transaction executes

Write Buffer (after three writes):

movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x80B10CFC,0x80B10CA4
movl $0x0000ab42,0x80B10BCC
movl $0x00000025,0x80B10BB8
ret

Execute the write-buffer to commit!
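
Emitting such an entry amounts to writing ten bytes. The sketch below encodes `movl $imm32, addr32` for 32-bit x86 (opcode `C7`, ModRM `05`, then the address and the immediate, both little-endian) into a byte array. It only builds the bytes and never executes them; the names (`write_buffer_t`, `wb_emit`), the slot count, and the assumption that execution would start at `top` are illustrative and not taken from JudoSTM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOV_LEN  10  /* C7 05 <addr32> <imm32> */
#define WB_SLOTS 8   /* hypothetical number of pre-allocated entries */

typedef struct {
    uint8_t code[WB_SLOTS * MOV_LEN + 1];  /* entries plus the trailing ret */
    size_t  top;                           /* offset of the newest entry */
} write_buffer_t;

static void put_u32(uint8_t *p, uint32_t v)  /* little-endian store */
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Empty buffer: just the ret, with top pointing at it. */
void wb_reset(write_buffer_t *wb)
{
    wb->top = WB_SLOTS * MOV_LEN;
    wb->code[wb->top] = 0xC3;  /* ret */
}

/* Emit one value-address pair as a move instruction, filling bottom-up. */
void wb_emit(write_buffer_t *wb, uint32_t value, uint32_t addr)
{
    assert(wb->top >= MOV_LEN);  /* buffer not full */
    wb->top -= MOV_LEN;
    uint8_t *p = &wb->code[wb->top];
    p[0] = 0xC7;                 /* mov r/m32, imm32 */
    p[1] = 0x05;                 /* ModRM: absolute 32-bit address */
    put_u32(p + 2, addr);
    put_u32(p + 6, value);
}
```

In this model, committing would mean calling into the buffer at offset `top` so that every emitted store runs and the final `ret` returns to the caller.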

Efficient Validation: Executable Read-Buffer

[Figure, five animation frames: a read hashtable and a "Top ptr" point into the read buffer, which fills from the bottom up.]

Pre-allocated buffer of compare & jump instructions

Emit value-address pairs as transaction executes

Read Buffer (after three reads):

cmp $0x00000000, 0x00000000
jne,pn judostm_trans_abort
cmp $0x00000100, 0x80B10BCC
jne,pn judostm_trans_abort
cmp $0x00000005, 0x80B10BB8
jne,pn judostm_trans_abort
cmp $0x00000a34, 0x80B10CA4
jne,pn judostm_trans_abort
ret

Execute the read-buffer to validate the read-set!
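
A compare-and-jump pair can be encoded the same way. The sketch below builds `cmpl $imm32, addr32` (`81 3D`, address, immediate) followed by `jne rel32` (`0F 85`, displacement) for 32-bit x86, again without executing anything. It omits the branch-hint prefix implied by `,pn`, and the names (`read_buffer_t`, `rb_emit`), the slot count, and the `base` and `abort_target` addresses are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CMP_LEN  10                   /* 81 3D <addr32> <imm32> */
#define JNE_LEN  6                    /* 0F 85 <rel32> */
#define PAIR_LEN (CMP_LEN + JNE_LEN)
#define RB_SLOTS 4                    /* hypothetical number of entries */

typedef struct {
    uint8_t  code[RB_SLOTS * PAIR_LEN + 1];
    size_t   top;           /* offset of the newest entry */
    uint32_t base;          /* assumed 32-bit address of code[0] */
    uint32_t abort_target;  /* assumed address of the abort routine */
} read_buffer_t;

static void put_u32(uint8_t *p, uint32_t v)  /* little-endian store */
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

void rb_reset(read_buffer_t *rb, uint32_t base, uint32_t abort_target)
{
    rb->top = RB_SLOTS * PAIR_LEN;
    rb->code[rb->top] = 0xC3;  /* ret: reached only if every compare matched */
    rb->base = base;
    rb->abort_target = abort_target;
}

/* Emit "compare memory at addr with the value read; abort if different". */
void rb_emit(read_buffer_t *rb, uint32_t value, uint32_t addr)
{
    assert(rb->top >= PAIR_LEN);  /* buffer not full */
    rb->top -= PAIR_LEN;
    uint8_t *p = &rb->code[rb->top];
    p[0] = 0x81; p[1] = 0x3D;     /* cmp m32, imm32 with absolute address */
    put_u32(p + 2, addr);
    put_u32(p + 6, value);
    p[10] = 0x0F; p[11] = 0x85;   /* jne rel32 */
    uint32_t next = rb->base + (uint32_t)rb->top + PAIR_LEN;
    put_u32(p + 12, rb->abort_target - next);  /* relative to next instruction */
}
```

Running such a buffer re-reads every logged address and compares it with the logged value, which is value-based validation expressed as straight-line code.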

Evaluation
  • JudoSTM performance
    • Comparison with Rochester’s RSTM†
  • † http://www.cs.rochester.edu/research/synchronization/rstm
RSTM vs JudoSTM: Design

JudoSTM more flexible, less intrusive; but performance?

Experimental Framework
  • RSTM micro-benchmarks
    • Linked List, Hash Table, RBTree
    • Equal mix of insert, remove, and lookup
    • Measure throughput (transactions/sec)
  • Test platform
    • 4-way SMP Intel Pentium 4 Xeon - 2.8GHz
    • L1d/L2/L3 cache sizes: 8KB/512KB/2MB
    • Linux 2.6.17.13
      • with per-thread signal handler support
Linked List

Coarse-grained locking best, but not scaling

Linked List – Zoomed in

Single-lock JudoSTM scaling nicely; RSTM flatlined

Hash Table

Distributed-lock JudoSTM beats CG-locking, tracks RSTM

RBTree

JudoSTM on track to scale past CG-locking; RSTM flatlined

Conclusions
  • Judo: highly-efficient DBR framework
    • Beats DynamoRIO on SPEC benchmarks
  • JudoSTM: First STM based on DBR
    • Value-based conflict detection
    • Executable read/write buffers
  • Desirable features:
    • Efficient invisible readers (sandboxing)
    • Legacy lock elision
    • Privileged transactions (system call support)
    • Performance comparable to RSTM

Facilitates STM for real programs & environments!

JudoSTM Details
  • Programming with JudoSTM
Programming with JudoSTM

Library:

#ifndef JUDOSTM_H
#define JUDOSTM_H

extern void judostm_start(void);
extern void judostm_stop(void);

#define atomic \
    asm __volatile__ ("":::"eax", "ecx", "edx", "ebx", "edi", \
                      "esi", "flags", "memory"); \
    int __count = 0; \
    judostm_start(); \
    for (; __count < 1; judostm_stop(), __count++)

#endif
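
The `for`-loop trick in the macro can be tried out with stub functions. The sketch below is a variation, not the header above: `tx_start` and `tx_stop` are hypothetical stand-ins for `judostm_start` and `judostm_stop`, the register-clobbering `asm` statement is left out, and the loop variable is declared inside the `for` (C99) so the macro can appear more than once in the same scope.

```c
#include <assert.h>

/* Hypothetical stand-ins that only count calls. */
static int depth = 0;      /* transactions currently open */
static int completed = 0;  /* transactions finished */

static void tx_start(void) { depth++; }
static void tx_stop(void)  { assert(depth > 0); depth--; completed++; }

/* The body runs exactly once: tx_start() in the initializer,
 * tx_stop() in the increment step after the body falls through. */
#define ATOMIC \
    for (int once_ = (tx_start(), 0); once_ < 1; tx_stop(), once_++)

/* Uses the macro twice in one scope; returns the transactions completed. */
int increment_twice(int *value)
{
    ATOMIC { *value += 1; }
    ATOMIC { *value += 1; }
    return completed;
}
```

As with the original macro, leaving the block early with `break` or `return` would skip the stop call, so the body has to fall through to its closing brace.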

Source Code:

#include <glib.h>
#include <judostm.h>

GTree *tree;
...
atomic {
    g_tree_insert(tree, &key, &val);
}
...

As shown in the figure once the atomic block is expanded:

...
judostm_start()
g_tree_insert(tree, &key, &val);
judostm_stop()
...

[Figure: gcc compiles the source code, together with the judoSTM library, into the executable my_app. The loader starts the running application, in which my_app and the shared library glib run instrumented from the code cache, on top of the kernel.]

  • Easy to use, with no compiler support!