A Dynamic Binary-Rewriting Approach to Software Transactional Memory
Marek Olszewski, Jeremy Cutler, Greg Steffan (University of Toronto). A Dynamic Binary-Rewriting Approach to Software Transactional Memory. Appeared in PACT 2007, Brasov, Romania.

The Parallel Programming Challenge

  • Coarse-grained locking

    • Easy to program 

    • Scales poorly 

  • Fine-grained locking

    • Scales well 

    • Hard to get right 

      • e.g., deadlock, priority inversion

  • The promise of Transactional Memory

    • As easy to program as coarse-grained locking 

    • Performance/scalability of fine-grained locking 


Transactional Memory (TM)

Source Code (each of three threads contains an atomic block like this):

...
atomic {
    ...
    access_shared_data();
    ...
}
...

  • Programmer:

    • Specifies threads/transactions in source code

  • TM System:

    • Executes transactions optimistically in parallel:

      1) Checkpoints execution

      2) Detects conflicts

      3) Commits or aborts and re-executes


TM Implementations

  • Flavors of TM:

    • Hardware (HTM), Software (STM), Hybrid (HyTM)

  • STM is especially compelling

    • Exploit current commodity hardware (multicores)

    • Learn about real TM systems and apps

  • Current STM Systems:

    • Java: DSTM, ASTM

    • C or C++: McRT icc, TL2, RSTM, OSTM

      • object-based or programmer-intensive (or both)

Our focus: arbitrary C/C++, realistic environment


Programming with STM

Source Code:

#include <glib.h>

GTree *tree;
...
atomic {
    g_tree_insert(tree, &key, &val);
}
...

Source Code → STM Compiler → Executable (my_app) → Loader → Running Application:

  • my_app

  • Shared Library: glib (pre-compiled binary, with "legacy locks")

  • kernel (system calls)

Pre-compiled binaries, legacy locks, and system calls are not handled by current compiler/library-based STMs.


JudoSTM: An Overview

  • Key design choices:

    • Dynamic Binary Rewriting (DBR)

      • insert instrumentation to implement STM

    • Value-based conflict detection

  • Resulting key features:

    • Privileged transactions (support system calls)

    • Legacy lock elision

    • Efficient invisible readers


JudoSTM Design Choice 1

  • Dynamic Binary Rewriting (DBR)

    • Judo DBR Framework (user-space version of JIFL†)

  • † JIT Instrumentation - A Novel Approach To Dynamically Instrument Operating Systems, SIGOPS EuroSys 2007


Dynamic Binary Rewriting

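[Figure: as the program runs, Judo copies each basic block of the original code (bb1 to bb4) into a code cache, inserting instrumentation, and executes the cached copy.]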
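A toy simulation of the code-cache idea, assuming a program whose basic blocks are numbered 0 to 3 (standing in for bb1 to bb4). `dispatch`, `code_cache` and `cached_copy` are hypothetical names; Judo's real translator emits instrumented x86 code, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

#define NBLOCKS 4   /* hypothetical program: bb1..bb4 have ids 0..3 */

typedef void (*translated_block)(void);

static translated_block code_cache[NBLOCKS];   /* NULL means "not translated yet" */
static int translations;                        /* how many times the rewriter ran */

/* Stands in for an instrumented copy of a basic block. */
static void cached_copy(void) { }

/* Dispatcher: translate a block the first time it is reached, then always run
   the cached copy, so the translation cost is paid once per block. */
static void dispatch(size_t bb)
{
    assert(bb < NBLOCKS);
    if (code_cache[bb] == NULL) {
        translations++;               /* a real rewriter would emit machine code here */
        code_cache[bb] = cached_copy;
    }
    code_cache[bb]();
}
```

Running the hypothetical trace bb1, bb2, bb3, bb2, bb3, bb2, bb4 invokes the translator only four times, once per distinct block; repeated executions reuse the cache, which is how DBR amortizes its overhead.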



Judo - Performance

[Figure: normalized runtime overhead of Judo]

Overhead low enough to implement STM?


DBR-Based STM Goal: Perform These Efficiently

  • For all non-stack write instructions

    • Track write addresses and values (write-set)

    • Buffer the written values, keeping them out of regular memory

  • For all non-stack read instructions

    • Redirect to the write-buffer

    • If miss: track read addresses and values (read-set)

  • When a transaction completes:

    • Acquire commit lock(s)

    • Validate read-set (value-based conflict detection)

    • Commit write-set to memory

    • Release commit lock(s)
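A minimal single-file sketch of these steps, assuming 32-bit words, fixed-capacity arrays instead of hashtables, and a single global commit lock built from a C11 `atomic_flag`. `tx_begin`, `tx_read`, `tx_write` and `tx_commit` are hypothetical names standing in for the instrumentation that JudoSTM inserts via DBR; they are not JudoSTM's API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_MAX 64   /* hypothetical fixed capacity per set */

typedef struct { uint32_t *addr; uint32_t val; } tx_entry;

typedef struct {
    tx_entry reads[TX_MAX];  size_t nreads;   /* read-set: address + value seen */
    tx_entry writes[TX_MAX]; size_t nwrites;  /* write-set: buffered stores */
} tx_t;

static atomic_flag commit_lock = ATOMIC_FLAG_INIT;   /* single global commit lock */

static void tx_begin(tx_t *tx) { tx->nreads = 0; tx->nwrites = 0; }

/* Instrumented store: record it in the write buffer, not in regular memory. */
static void tx_write(tx_t *tx, uint32_t *addr, uint32_t val)
{
    for (size_t i = 0; i < tx->nwrites; i++)
        if (tx->writes[i].addr == addr) { tx->writes[i].val = val; return; }
    assert(tx->nwrites < TX_MAX);
    tx->writes[tx->nwrites++] = (tx_entry){ addr, val };
}

/* Instrumented load: a write-buffer hit returns the buffered value; on a miss,
   read memory (plain load, for brevity) and log address + value in the read-set. */
static uint32_t tx_read(tx_t *tx, uint32_t *addr)
{
    for (size_t i = 0; i < tx->nwrites; i++)
        if (tx->writes[i].addr == addr) return tx->writes[i].val;
    uint32_t val = *addr;
    assert(tx->nreads < TX_MAX);
    tx->reads[tx->nreads++] = (tx_entry){ addr, val };
    return val;
}

/* Commit: acquire lock, validate read-set by value, publish write-set, release.
   Returns false when the transaction must abort and re-execute. */
static bool tx_commit(tx_t *tx)
{
    bool ok = true;
    while (atomic_flag_test_and_set(&commit_lock))
        ;   /* spin until the commit lock is free */
    for (size_t i = 0; ok && i < tx->nreads; i++)
        ok = (*tx->reads[i].addr == tx->reads[i].val);
    for (size_t i = 0; ok && i < tx->nwrites; i++)
        *tx->writes[i].addr = tx->writes[i].val;
    atomic_flag_clear(&commit_lock);
    return ok;
}
```

The read path checks the write buffer first, so a transaction sees its own stores, and validation compares values rather than version numbers. Stack accesses are assumed to bypass these calls entirely, as on the slide.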


DBR: Attractive Properties for STM

  • Performance: overheads are amortized

    • code cache

  • Can handle arbitrary code and shared libraries

    • any/all code is transactionalized as it executes

  • Sandboxed Transactions

    • Typical STM:

      • inconsistent values could send execution astray

        • i.e., into non-transactionalized code (very bad!)

      • solution: frequent & costly read-set validation

    • DBR-based STM:

      • any/all code is transactionalized as it executes, so even a stray path runs instrumented (sandboxed)

Tough problems for conventional STMs addressed by DBR


JudoSTM Design Choice 2

  • Value-Based Conflict Detection

    • (as opposed to location-based)


Location-Based Conflict Detection

[Figure: main memory holds the values 6, 2, 3 and 5; each memory strip carries a version number, initially 0. Transaction 1 reads 2, 3 and 5; Transaction 2 reads 6 and writes 9 to the location holding 2.]


Transaction 2 commits first:

Commit step 1) Validate Read Set

Commit step 2) Publish Writes (and inc version #s): memory now holds 6, 9, 3, 5, and the version of the strip holding 9 goes from 0 to 1

Transaction 1 then tries to commit:

Commit step 1) Validate Read Set: a strip it read has a new version number. Abort!

Note: all transactions must maintain strip version #s
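The walkthrough above can be sketched in C, assuming a four-word memory in which every word is its own strip and a single thread plays both transactions. `loc_read`, `loc_validate` and `loc_publish` are hypothetical names, and a real system would also hold a commit lock around validation and publication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NWORDS 4      /* hypothetical: tiny memory, one word per strip */
#define MAX_READS 8

static uint32_t memory[NWORDS] = { 6, 2, 3, 5 };
static uint32_t strip_version[NWORDS];          /* all start at 0 */

typedef struct { size_t idx[MAX_READS]; uint32_t ver[MAX_READS]; size_t n; } loc_readset;

/* Read: record which strip was read and its version at that moment. */
static uint32_t loc_read(loc_readset *rs, size_t i)
{
    assert(i < NWORDS && rs->n < MAX_READS);
    rs->idx[rs->n] = i;
    rs->ver[rs->n] = strip_version[i];
    rs->n++;
    return memory[i];
}

/* Commit step 1: valid only if no strip that was read has a new version. */
static bool loc_validate(const loc_readset *rs)
{
    for (size_t k = 0; k < rs->n; k++)
        if (strip_version[rs->idx[k]] != rs->ver[k]) return false;
    return true;
}

/* Commit step 2: publish a write and bump its strip's version number. */
static void loc_publish(size_t i, uint32_t val)
{
    memory[i] = val;
    strip_version[i]++;
}
```

Every writer, including one that writes the same value back, must bump the version, which is the bookkeeping burden the slide's note refers to.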


Value-Based Conflict Detection

[Figure: main memory holds the values 6, 2, 3 and 5, with no version numbers. Transaction 1 reads 2, 3 and 5; Transaction 2 reads 6 and writes 9 to the location holding 2.]


Transaction 2 commits first:

Commit step 1) Validate Read Set

Commit step 2) Publish Writes: memory now holds 6, 9, 3, 5

Transaction 1 then tries to commit:

Commit step 1) Validate Read Set: it read 2, but that location now holds 9. Abort!

Note: no version information to maintain
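The same scenario with value-based detection, under the same simplifying assumptions (four words, single thread, commit lock omitted); `val_read` and `val_validate` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NWORDS 4
#define MAX_READS 8

static uint32_t memory[NWORDS] = { 6, 2, 3, 5 };   /* no version numbers needed */

typedef struct { size_t idx[MAX_READS]; uint32_t val[MAX_READS]; size_t n; } val_readset;

/* Read: record the location and the value that was seen there. */
static uint32_t val_read(val_readset *rs, size_t i)
{
    assert(i < NWORDS && rs->n < MAX_READS);
    rs->idx[rs->n] = i;
    rs->val[rs->n] = memory[i];
    rs->n++;
    return memory[i];
}

/* Commit step 1: valid iff every location still holds the value that was read. */
static bool val_validate(const val_readset *rs)
{
    for (size_t k = 0; k < rs->n; k++)
        if (memory[rs->idx[k]] != rs->val[k]) return false;
    return true;
}
```

Writers need no extra bookkeeping: a plain store is enough for validation to notice a changed value. Writing 2 back over 2 (a silent store) passes validation, whereas a scheme that bumps a version number on every published write would abort Transaction 1 in that case. The check is only sound when validation and publication happen together under the commit lock(s).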


JudoSTM Feature 1:

  • Privileged transactions

    • Can execute (but not roll back) system calls

    • Grab commit lock(s) when about to make a syscall

      • Release when transaction completes

    • Only one privileged transaction exists at a time


Privileged Transactions

[Figure: main memory holds the values 6, 2, 3 and 5; Transaction 1 reads 2, 3 and 5.]


Transaction 2 is privileged (it makes system calls):

  • it can write directly to memory (here, 9 over the 2)

  • its code may be uninstrumented

Transaction 1 then tries to commit:

Commit step 1) Validate Read Set: the 2 it read is now 9. Abort!

Value-based conflict detection facilitates system calls within transactions! Because conflicts are detected by comparing values, the privileged transaction's direct, uninstrumented writes are still caught, without it having to maintain any version numbers.


JudoSTM Feature 2:

  • Legacy Lock Elision

    • Safely ignore locks within legacy code


Legacy Lock Elision

[Figure: main memory holds a legacy lock word (0, i.e. free) and data words. Transaction 1 executes a lock acquire: it reads the lock as 0 and writes 1 into its own write buffer.]


Transaction 2 also executes a lock acquire: it too reads 0 and buffers a 1, so it does not block.

Transaction 2 reads 6, writes 9, and executes a lock release, buffering a 0 for the lock.

Transaction 2 commits:

Commit step 1) Validate Read Set: the lock still holds the 0 it read

Commit step 2) Publish Writes: writing 0 to the lock is a silent store

Transaction 1 reads 5, writes 7, and executes a lock release.

Transaction 1 commits:

Commit step 1) Validate Read Set: the lock still holds the 0 it read

Commit step 2) Publish Writes

Value-based conflict detection facilitates the elision of legacy locks!
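The lock-elision walkthrough can be replayed in a few lines, assuming a three-word memory (a lock word plus two data words) and one thread running both transactions. `ltx_t`, `legacy_lock` and the other names are hypothetical stand-ins for instrumented legacy code; as the walkthrough suggests, in JudoSTM the lock operations are ordinary loads and stores redirected by DBR rather than special calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { LOCK = 0, WORD_A = 1, WORD_B = 2, NWORDS = 3 };   /* hypothetical layout */
static uint32_t memory[NWORDS] = { 0, 6, 5 };            /* lock free, A = 6, B = 5 */

typedef struct {
    bool     has_read[NWORDS], has_write[NWORDS];
    uint32_t read_val[NWORDS], write_val[NWORDS];
} ltx_t;   /* per-word read/write sets: enough for this tiny memory */

static uint32_t ltx_read(ltx_t *t, size_t i)
{
    if (t->has_write[i]) return t->write_val[i];          /* write-buffer hit */
    if (!t->has_read[i]) { t->has_read[i] = true; t->read_val[i] = memory[i]; }
    return t->read_val[i];
}

static void ltx_write(ltx_t *t, size_t i, uint32_t v)
{
    t->has_write[i] = true;
    t->write_val[i] = v;
}

/* Legacy lock code run inside a transaction: acquire and release only touch
   the write buffer, so another transaction taking the same lock never blocks. */
static void legacy_lock(ltx_t *t)
{
    uint32_t held = ltx_read(t, LOCK);   /* reads 0: the lock looks free */
    assert(held == 0);
    (void)held;
    ltx_write(t, LOCK, 1);
}

static void legacy_unlock(ltx_t *t) { ltx_write(t, LOCK, 0); }

/* Value-based commit (commit lock omitted in this single-threaded demo). */
static bool ltx_commit(ltx_t *t)
{
    for (size_t i = 0; i < NWORDS; i++)
        if (t->has_read[i] && memory[i] != t->read_val[i]) return false;
    for (size_t i = 0; i < NWORDS; i++)
        if (t->has_write[i]) memory[i] = t->write_val[i];   /* lock gets 0: silent store */
    return true;
}
```

Both transactions "hold" the lock at once, yet both commit: each read the lock as 0, and the only value ever published to the lock word is 0. If a non-transactional thread acquired the lock before a transaction committed, the lock word would no longer hold 0 and validation would abort that transaction.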


JudoSTM Feature 3:

  • Efficient Invisible Readers


Supporting Invisible Readers

  • Invisible Readers: don’t report reads to others

    • good performance

    • but can lead to inconsistent read data: errors!

  • Data errors: segfault, divide by zero

    • Cheap solution: catch with trap/signal handlers

  • Control errors: jump to non-instrumented code

    • Typical solution: verify read-set after every load

      • Expensive! O(N2)

    • DBR solution: prevented by sandboxing

      • DBR instruments all code as it executes
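A minimal POSIX sketch of the cheap solution for data errors, assuming a single thread and one transaction checkpoint taken with `sigsetjmp`: a SIGSEGV or SIGFPE raised inside a transaction is turned into an abort and re-execution. `tx_run`, `tx_fault_handler` and `demo_body` are hypothetical names, and `raise(SIGFPE)` stands in for a real divide-by-zero trap. Control errors are not covered; in JudoSTM those are prevented by DBR sandboxing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf tx_checkpoint;           /* hypothetical: one thread, one checkpoint */
static volatile sig_atomic_t tx_active;

/* A fault inside a transaction is treated as a symptom of an inconsistent read:
   jump back to the checkpoint (abort) instead of crashing. */
static void tx_fault_handler(int sig)
{
    if (tx_active)
        siglongjmp(tx_checkpoint, sig);
    signal(sig, SIG_DFL);                  /* genuine fault: take the default action */
    raise(sig);
}

static void tx_install_fault_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = tx_fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGFPE, &sa, NULL);
}

/* Runs body as a transaction and returns how many times a fault aborted it.
   A real STM would also discard the read/write sets on each abort. */
static int tx_run(void (*body)(int attempt))
{
    volatile int aborts = 0;
    assert(body != NULL);
    if (sigsetjmp(tx_checkpoint, 1) != 0)  /* 1: also restore the signal mask */
        aborts++;                          /* arrived here via siglongjmp */
    tx_active = 1;
    body(aborts);                          /* (re-)execute the transaction */
    tx_active = 0;
    return aborts;
}

/* Demo body: the first attempt "divides by a stale zero" and faults. */
static void demo_body(int attempt)
{
    if (attempt == 0)
        raise(SIGFPE);   /* stands in for a real divide-by-zero trap */
}
```

The handler costs nothing until a fault actually occurs, which is why this is far cheaper than re-validating the read-set after every load.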


JudoSTM Details

  • Implementation



Read/Write Buffer Implementation

Linear-probed, open-addressed hashtables, indexed by address:

  • Read Hashtable → Read Buffer

  • Write Hashtable → Write Buffer

Efficient lookup: 5 instructions for a hit (+ state-saving?)

Efficient validate and commit?
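A minimal sketch of a linear-probed, open-addressed table keyed by address, assuming word-aligned addresses, a fixed power-of-two size, and that the stored value is the position of the entry in the corresponding buffer. The names are hypothetical, and this C version makes no attempt at the 5-instruction hit path quoted on the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HT_SIZE 256   /* hypothetical; a power of two so masking replaces modulo */

/* Maps a data address to the index of its entry in the read or write buffer.
   Key 0 marks an empty slot, so address 0 cannot be stored. */
typedef struct {
    uintptr_t key[HT_SIZE];
    uint32_t  idx[HT_SIZE];
    size_t    count;
} addr_table;

static size_t ht_hash(uintptr_t addr)
{
    return (size_t)(addr >> 2) & (HT_SIZE - 1);   /* drop the low bits of word-aligned addresses */
}

/* Linear probing: scan forward from the home slot until the key or an empty slot. */
static uint32_t *ht_lookup(addr_table *t, uintptr_t addr)
{
    assert(addr != 0);
    for (size_t i = ht_hash(addr);; i = (i + 1) & (HT_SIZE - 1)) {
        if (t->key[i] == addr) return &t->idx[i];
        if (t->key[i] == 0) return NULL;
    }
}

static void ht_insert(addr_table *t, uintptr_t addr, uint32_t buf_idx)
{
    assert(addr != 0 && t->count < HT_SIZE - 1);   /* keep one slot empty so probes end */
    size_t i = ht_hash(addr);
    while (t->key[i] != 0 && t->key[i] != addr)
        i = (i + 1) & (HT_SIZE - 1);
    if (t->key[i] == 0) t->count++;
    t->key[i] = addr;
    t->idx[i] = buf_idx;
}
```

Two addresses exactly 4 × HT_SIZE bytes apart share a home slot, so the second lands in the next slot; lookup walks forward until it finds the key or an empty slot.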


Efficient Commit: Executable Write-Buffer

Write Hashtable and Write Buffer (with Top ptr):

movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
ret

Pre-allocated buffer of move instructions

Emit value-address pairs as transaction executes



After three stores, the emitted value-address pairs fill the buffer upward from the bottom, with Top ptr at the newest:

movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x00000000,0x00000000
movl $0x80B10CFC,0x80B10CA4
movl $0x0000ab42,0x80B10BCC
movl $0x00000025,0x80B10BB8
ret

Execute the write-buffer, from Top ptr to ret, to commit! Starting at Top ptr skips the unused all-zero slots.
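A portable sketch of the write-buffer's bookkeeping, assuming 32-bit values and a fixed number of slots. Each `movl_slot` stands in for one pre-allocated `movl $value,address` instruction, and a loop from the top pointer to the end plays the role of executing the buffer (JudoSTM executes real x86 code instead). The names are hypothetical, and a linear scan replaces the write hashtable that finds an address's existing slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WB_SLOTS 8   /* hypothetical capacity, like the 8 pre-allocated movl slots */

/* One slot stands in for one "movl $value, address" instruction. */
typedef struct { uint32_t value; uint32_t *address; } movl_slot;

typedef struct {
    movl_slot slot[WB_SLOTS];
    size_t top;                /* index of the most recently emitted slot */
} write_buffer;

static void wb_init(write_buffer *wb) { wb->top = WB_SLOTS; }   /* empty: top past the end */

/* Emit (or patch) the value-address pair for a transactional store. */
static void wb_emit(write_buffer *wb, uint32_t *address, uint32_t value)
{
    for (size_t i = wb->top; i < WB_SLOTS; i++)          /* stand-in for the hashtable */
        if (wb->slot[i].address == address) { wb->slot[i].value = value; return; }
    assert(wb->top > 0);
    wb->top--;                                           /* grow toward the start */
    wb->slot[wb->top] = (movl_slot){ value, address };
}

/* "Execute the write-buffer": perform every emitted store, top to end. */
static void wb_commit(const write_buffer *wb)
{
    for (size_t i = wb->top; i < WB_SLOTS; i++)
        *wb->slot[i].address = wb->slot[i].value;
}
```

A second store to the same address patches the value in its existing slot instead of emitting a new one, so each address is written exactly once at commit and the order of slots does not matter.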


Efficient Validation: Executable Read-Buffer

Read Hashtable and Read Buffer (with Top ptr):

cmp $0x00000000, 0x00000000
jne,pn judostm_trans_abort
cmp $0x00000000, 0x00000000
jne,pn judostm_trans_abort
cmp $0x00000000, 0x00000000
jne,pn judostm_trans_abort
cmp $0x00000000, 0x00000000
jne,pn judostm_trans_abort
ret

Pre-allocated buffer of compare & jump instructions

Emit value-address pairs as transaction executes




After three reads, the emitted value-address pairs fill the buffer upward from the bottom, with Top ptr at the newest:

cmp $0x00000000, 0x00000000
jne,pn judostm_trans_abort
cmp $0x00000100, 0x80B10BCC
jne,pn judostm_trans_abort
cmp $0x00000005, 0x80B10BB8
jne,pn judostm_trans_abort
cmp $0x00000a34, 0x80B10CA4
jne,pn judostm_trans_abort
ret

Execute the read-buffer, from Top ptr to ret, to validate the read-set! Any mismatch jumps to judostm_trans_abort.
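A portable sketch of the executable read-buffer: each `cmp_slot` represents one `cmp $value, address` / `jne,pn judostm_trans_abort` pair, and returning false replaces the jump to the abort handler. The names and the slot count are hypothetical; the read hashtable, which keeps an address from being emitted twice, is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SLOTS 4   /* hypothetical capacity, like the 4 pre-allocated cmp/jne pairs */

/* One slot stands in for "cmp $value, address; jne,pn judostm_trans_abort". */
typedef struct { uint32_t value; const uint32_t *address; } cmp_slot;

typedef struct { cmp_slot slot[RB_SLOTS]; size_t top; } read_buffer;

static void rb_init(read_buffer *rb) { rb->top = RB_SLOTS; }   /* empty */

/* Emit the value-address pair for a load that missed in the write buffer. */
static void rb_emit(read_buffer *rb, const uint32_t *address, uint32_t value)
{
    assert(rb->top > 0);
    rb->top--;                                   /* grow toward the start */
    rb->slot[rb->top] = (cmp_slot){ value, address };
}

/* "Execute the read-buffer": false is where the real code would jump to abort. */
static bool rb_validate(const read_buffer *rb)
{
    for (size_t i = rb->top; i < RB_SLOTS; i++)
        if (*rb->slot[i].address != rb->slot[i].value)
            return false;
    return true;
}
```

Validation is a straight run of compares with no per-entry lookup, which is the point of making the buffer executable.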


Evaluation

  • JudoSTM performance

    • Comparison with Rochester’s RSTM†

  • † http://www.cs.rochester.edu/research/synchronization/rstm


RSTM vs JudoSTM: Design

JudoSTM more flexible, less intrusive; but performance?


Experimental Framework

  • RSTM micro-benchmarks

    • Linked List, Hash Table, RBTree

    • Equal mix of insert, remove, and lookup

    • Measure throughput (transactions/sec)

  • Test platform

    • 4-way SMP Intel Pentium 4 Xeon - 2.8GHz

    • L1d/L2/L3 cache sizes: 8KB/512KB/2MB

    • Linux 2.6.17.13

      • with per-thread signal handler support


Linked List

Coarse-grained locking best, but not scaling


Linked List – Zoomed in

Single-lock JudoSTM scaling nicely; RSTM flatlined


Hash Table

Distributed-lock JudoSTM beats CG-locking, tracks RSTM


RBTree

JudoSTM on track to scale past CG-locking; RSTM flatlined 


Conclusions

  • Judo: highly-efficient DBR framework

    • Beats DynamoRIO on SPEC benchmarks

  • JudoSTM: First STM based on DBR

    • Value-based conflict detection

    • Executable read/write buffers

  • Desirable features:

    • Efficient invisible readers (sandboxing)

    • Legacy lock elision

    • Privileged transactions (system call support)

    • Performance comparable to RSTM

Facilitates STM for real programs & environments!


Backups


JudoSTM Details

  • Programming with JudoSTM


Programming with JudoSTM

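Library (judostm.h):

#ifndef JUDOSTM_H
#define JUDOSTM_H

extern void judostm_start(void);
extern void judostm_stop(void);

/* atomic { ... }: the empty asm statement's clobber list tells GCC that these
   registers, the flags and memory may change here, so no values stay cached
   in them across the start of the transaction. The transaction then starts,
   the block runs exactly once, and judostm_stop() ends it.
   Note: the macro declares __count, so only one atomic block fits per scope. */
#define atomic \
    asm __volatile__ ("" ::: "eax", "ecx", "edx", "ebx", "edi", \
                      "esi", "flags", "memory"); \
    int __count = 0; \
    judostm_start(); \
    for (; __count < 1; judostm_stop(), __count++)

#endif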

Source Code:

#include <glib.h>
#include <judostm.h>

GTree *tree;
...
atomic {
    g_tree_insert(tree, &key, &val);
}
...

The atomic macro brackets the block with calls into the library, so in effect gcc compiles:

...
judostm_start();
g_tree_insert(tree, &key, &val);
judostm_stop();
...

gcc → Executable: my_app

Running Application (on the kernel): the loader starts my_app with the shared library glib, and judoSTM runs the instrumented my_app + glib from its code cache

  • Easy to use, with no special compiler support needed!

