sigrace signature based data race detection n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
SigRace: Signature-Based Data Race Detection PowerPoint Presentation
Download Presentation
SigRace: Signature-Based Data Race Detection

Loading in 2 Seconds...

play fullscreen
1 / 124

SigRace: Signature-Based Data Race Detection - PowerPoint PPT Presentation


  • 99 Views
  • Uploaded on

SigRace: Signature-Based Data Race Detection. Abdullah Muzahid , Dario Suarez*, Shanxiang Qi & Josep Torrellas. Computer Science Department University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu. *Universidad de Zaragoza, Spain. Debugging Multithreaded Programs.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'SigRace: Signature-Based Data Race Detection' - briallen


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
sigrace signature based data race detection

SigRace: Signature-Based Data Race Detection

Abdullah Muzahid, Dario Suarez*, Shanxiang Qi & Josep Torrellas

Computer Science DepartmentUniversity of Illinois at Urbana-Champaignhttp://iacoma.cs.uiuc.edu

*Universidad de Zaragoza, Spain

debugging multithreaded programs
Debugging Multithreaded Programs

“Debugging a multithreaded program has a lot in

common with medieval torture methods”

-- Random quote found via Google search

data race

T1

T2

lock L

x++

unlock L

x++

Data Race
  • Two threads access the same variable without intervening synchronization and at least one is a write
  • Common bug
  • Hard to detect and reproduce
dynamic data race detection
Dynamic Data Race Detection
  • Mainly two approaches
dynamic data race detection1
Dynamic Data Race Detection
  • Mainly two approaches
    • Lockset: Finds violation of locking discipline
dynamic data race detection2
Dynamic Data Race Detection
  • Mainly two approaches
    • Lockset: Finds violation of locking discipline
    • Happened-Before: Finds concurrent conflicting accesses
happened before approach1
Happened-Before Approach

Thread 0

Thread 1

[0, 0]

[0, 0]

happened before approach2
Happened-Before Approach

Thread 0

Thread 1

[0, 0]

[0, 0]

Lock L

[1, 0]

happened before approach3
Happened-Before Approach

Thread 0

Thread 1

[0, 0]

[0, 0]

Lock L

[1, 0]

Unlock L

[2, 0]

happened before approach4
Happened-Before Approach

Thread 0

Thread 1

[0, 0]

[0, 0]

Lock L

[1, 0]

Unlock L

[2, 0]

happened before approach5
Happened-Before Approach

Thread 0

Thread 1

[0, 0]

[0, 0]

Lock L

[1, 0]

Unlock L

[2, 0]

Lock L

happened before approach6
Happened-Before Approach

Thread 0

Thread 1

[0, 0]

[0, 0]

Lock L

[1, 0]

[0, 1]

Unlock L

[2, 0]

Lock L

happened before approach7
Happened-Before Approach

Thread 0

Thread 1

[0, 0]

[0, 0]

Lock L

[1, 0]

[0, 1]

Unlock L

[2, 0]

+

Lock L

[2, 1]

happened before approach8
Happened-Before Approach

Thread 0

Thread 1

[0, 0]

[0, 0]

  • Epoch : sync to sync

Lock L

[1, 0]

Unlock L

[2, 0]

Lock L

[2, 1]

happened before approach9
Happened-Before Approach

Thread 0

Thread 1

a

[0, 0]

[0, 0]

  • Epoch : sync to sync

d

Lock L

b

[1, 0]

Unlock L

[2, 0]

c

Lock L

[2, 1]

e

happened before approach10
Happened-Before Approach

Thread 0

Thread 1

a

[0, 0]

[0, 0]

d

Lock L

b

[1, 0]

Unlock L

[2, 0]

c

Lock L

[2, 1]

e

  • Epoch : sync to sync
happened before approach11
Happened-Before Approach

Thread 0

Thread 1

a

[0, 0]

[0, 0]

d

Lock L

b

[1, 0]

Unlock L

[2, 0]

c

Lock L

[2, 1]

e

  • Epoch : sync to sync
happened before approach12
Happened-Before Approach

Thread 0

Thread 1

a

[0, 0]

[0, 0]

d

Lock L

b

[1, 0]

Unlock L

[2, 0]

c

Lock L

[2, 1]

e

  • Epoch : sync to sync
happened before approach13
Happened-Before Approach

Thread 0

Thread 1

a

[0, 0]

[0, 0]

d

Lock L

b

[1, 0]

Unlock L

[2, 0]

c

Lock L

[2, 1]

e

  • Epoch : sync to sync
  • a, b happened before e
happened before approach14
Happened-Before Approach

Thread 0

Thread 1

a

[0, 0]

[0, 0]

d

Lock L

b

[1, 0]

Unlock L

[2, 0]

c

Lock L

[2, 1]

e

  • Epoch : sync to sync
  • a, b happened before e
happened before approach15
Happened-Before Approach

Thread 0

Thread 1

a

[0, 0]

[0, 0]

d

Lock L

b

[1, 0]

Unlock L

[2, 0]

c

Lock L

[2, 1]

e

  • Epoch : sync to sync
  • a, b happened before e
  • c, d unordered
happened before approach16
Happened-Before Approach

Thread 0

Thread 1

a

[0, 0]

[0, 0]

d

Lock L

= x

b

[1, 0]

Data Race

Unlock L

[2, 0]

c

Lock L

[2, 1]

x =

e

  • Epoch : sync to sync
  • a, b happened before e
  • c, d unordered
software implementation
Software Implementation
  • Need to instrument every memory access
  • 10x – 50x slowdown
  • Not suitable for production runs
hardware implementation3

P1

P2

Hardware Implementation

ts1

x

C1

C2

TS

TS

hardware implementation4

P1

P2

Hardware Implementation

WR

ts1

x

ts2

x

C1

C2

TS

TS

ts2

hardware implementation5

P1

P2

Hardware Implementation

check

WR

ts1

x

ts2

x

ts2

C1

C2

TS

TS

ts2

limitations of hw approaches1
Limitations of HW Approaches

P1

P2

  • Modify cache and coherence protocol

C1

C2

limitations of hw approaches2
Limitations of HW Approaches

P1

P2

check

  • Modify cache and coherence protocol
  • Perform checking at least on every coherence transaction

ts1

C1

C2

ts2

ts2

limitations of hw approaches3
Limitations of HW Approaches

P1

P2

  • Modify cache and coherence protocol
  • Perform checking at least on every coherence transaction
  • Lose detection ability when cache line is displaced or invalidated

C1

C2

our contributions
Our Contributions
  • SigRace: Novel HW mechanism for race detection based on signatures
  • Simple HW
    • Cache and coherence protocol are unchanged
  • Higher coverage than existing HW schemes
    • Detect races even if the line is displaced/invalidated
  • Usable on-the-fly in production runs
  • SigRace finds 150% more injected races than a state-of-the-art HW proposal
outline
Outline
  • Motivation
  • Main Idea
  • Implementation
  • Results
  • Conclusions
main idea
Main Idea

Address Signature + Happened-before

hardware address signatures

ROB

ld/st

address

Permute

[Ceze et al, ISCA06]

Hardware Address Signatures
hardware address signatures2
Hardware Address Signatures
  • Logical AND for intersection
  • Has false positives but not false negatives
using signatures for race detection2

Epoch

Using Signatures for Race Detection

sync

  • Block is a fixed number of dynamic instructions (not a cache block or basic block or atomic block)

Block

TS1

Sig1

sync

using signatures for race detection3

Epoch

Using Signatures for Race Detection

sync

Block

TS1

Sig1

sync

Race Detection Module

using signatures for race detection4

Epoch

Using Signatures for Race Detection

sync

Block

TS1

Ø

sync

Race Detection Module

using signatures for race detection5

Epoch

Using Signatures for Race Detection

sync

Block

TS1

Sig2

sync

Race Detection Module

using signatures for race detection6

Epoch

Using Signatures for Race Detection

sync

Block

TS1

Sig2

sync

Race Detection Module

using signatures for race detection7

Epoch

Using Signatures for Race Detection

sync

Block

TS1

Ø

sync

Race Detection Module

using signatures for race detection8

Epoch

Using Signatures for Race Detection

sync

sync

Block

TS2

Sig3

Race Detection Module

using signatures for race detection9

Epoch

Using Signatures for Race Detection

sync

sync

Block

TS2

Sig3

Race Detection Module

using signatures for race detection10

Epoch

Using Signatures for Race Detection

sync

sync

Block

TS2

Sig3

Race Detection Module

using signatures for race detection11

Epoch

Using Signatures for Race Detection

sync

sync

Sig Ո Sig

Block

TS2

Sig3

Race Detection Module

on chip race detection module rdm5
On Chip Race Detection Module (RDM)

P1

P2

T2 R2 W2

T1 R1 W1

Q1

Q2

RDM

Chip

on chip race detection module rdm6
On Chip Race Detection Module (RDM)

P1

P2

T2 R2 W2

T1 R1 W1

Q1

Q2

RDM

Chip

on chip race detection module rdm7
On Chip Race Detection Module (RDM)

P1

P2

T2 R2 W2

T1 R1 W1

Q1

Q2

RDM

Chip

on chip race detection module rdm8
On Chip Race Detection Module (RDM)

If T2 & TJ unordered

R2 Ո WJ

W2 Ո WJ

W2 Ո RJ

Else stop

P1

P2

TJ RJ WJ

T2 R2 W2

T1 R1 W1

Q1

Q2

RDM

Chip

on chip race detection module rdm9
On Chip Race Detection Module (RDM)

If T2 & TJ` unordered

R2 Ո WJ`

W2 Ո WJ`

W2 Ո RJ`

Else stop

P1

P2

T2 R2 W2

T1 R1 W1

TJ` RJ` WJ`

Q1

Q2

RDM

Chip

on chip race detection module rdm10
On Chip Race Detection Module (RDM)

If T2 & TJ” unordered

R2 Ո WJ”

W2 Ո WJ”

W2 Ո RJ”

Else stop

P1

P2

T2 R2 W2

T1 R1 W1

TJ” RJ” WJ”

Q1

Q2

RDM

Chip

on chip race detection module rdm11
On Chip Race Detection Module (RDM)

If T2 & TJ”` unordered

R2 Ո WJ”`

W2 Ո WJ”`

W2 Ո RJ”`

Else stop

P1

P2

T2 R2 W2

T1 R1 W1

TJ”` RJ”` WJ”`

Q1

Q2

RDM

Chip

on chip race detection module rdm12
On Chip Race Detection Module (RDM)

If T2 & TJ”` unordered

R2 Ո WJ”`

W2 Ո WJ”`

W2 Ո RJ”`

Else stop

P1

P2

Done in Background

T2 R2 W2

T1 R1 W1

TJ”` RJ”` WJ”`

Q1

Q2

RDM

Chip

on chip race detection module rdm13
On Chip Race Detection Module (RDM)

If T2 & TJ”` unordered

R2 Ո WJ”`

W2 Ո WJ”`

W2 Ո RJ”`

Else stop

P1

P2

False Positives

T2 R2 W2

T1 R1 W1

TJ”` RJ”` WJ”`

Q1

Q2

RDM

Chip

re execution
Re-execution
  • Needed for
  • Identify the accesses involved
  • Discard if a false positive
support for re execution
Support for Re-execution
  • Take periodic checkpoints: ReVive [Prvulovic et al, ISCA02]
  • Log inputs (interrupts, sys calls, etc)
  • Save synchronization history in TS Log
    • Timestamp at sync points
modes of operation
Modes of Operation
  • Re-Execution:Bring the program to just before the race
  • Normal Execution
  • Race Analysis:Pinpoint the racy accesses or discarding the false positive
sigrace re execution mode
SigRace Re-execution Mode
  • Can be done in another machine
sigrace re execution mode1

checkpoint

T0

T1

T2

sync

sync

sync

sync

sync

SigRace Re-execution Mode
  • Can be done in another machine
  • Periodic checkpoint of memory state
sigrace re execution mode2

checkpoint

T0

T1

T2

sync

sync

sync

sync

s2

sync

Data Race

s1

SigRace Re-execution Mode
  • Can be done in another machine
  • Periodic checkpoint of memory state
sigrace re execution mode3
SigRace Re-execution Mode
  • Can be done in another machine
  • Periodic checkpoint of memory state

checkpoint

T0

T1

T2

sync

sync

sync

sync

s2

sync

Data Race

s1

Ո

Conflict Sig

Conflict Sig

sigrace re execution mode4
SigRace Re-execution Mode
  • Can be done in another machine
  • Periodic checkpoint of memory state

checkpoint

checkpoint

T0

T1

T2

T0

T1

T2

sync

sync

sync

sync

s2

sync

Data Race

s1

Ո

Conflict Sig

Conflict Sig

sigrace re execution mode5
SigRace Re-execution Mode
  • Can be done in another machine
  • Periodic checkpoint of memory state

checkpoint

checkpoint

T0

T1

T2

T0

T1

T2

sync

sync

sync

sync

sync

sync

sync

sync

s2

sync

sync

Data Race

s1

Ո

Use the TS Log

Conflict Sig

Conflict Sig

sigrace analysis mode
SigRace Analysis Mode

checkpoint

T0

T1

T2

sync

sync

sync

sync

sync

sigrace analysis mode1
SigRace Analysis Mode

checkpoint

T0

T1

T2

sync

sync

sync

sync

sync

sigrace analysis mode2
SigRace Analysis Mode

checkpoint

T0

T1

T2

sync

sync

sync

sync

ld

Ո

sync

Conflict Sig

Conflict Sig

sigrace analysis mode3
SigRace Analysis Mode

checkpoint

T0

T1

T2

sync

sync

sync

log

sync

ld

Ո

sync

Conflict Sig

Conflict Sig

sigrace analysis mode4
SigRace Analysis Mode

checkpoint

T0

T1

T2

sync

sync

sync

log

sync

ld

Ո

sync

Conflict Sig

Conflict Sig

sync

sync

sigrace analysis mode5
SigRace Analysis Mode

checkpoint

T0

T1

T2

sync

sync

sync

log

sync

sync

log

sync

sync

sigrace analysis mode6
SigRace Analysis Mode

checkpoint

T0

T1

T2

sync

sync

sync

log

sync

sync

log

sync

sync

  • Pinpoints racy addresses or,
  • Identifies and discards false positives
outline1
Outline
  • Motivation
  • Main Idea
  • Implementation
  • Results
  • Conclusions
new instructions
New Instructions
  • collect_on
    • Enable R and W address collection in current thread
new instructions1
New Instructions
  • collect_on
    • Enable R and W address collection in current thread
  • collect_off
    • Disable R and W address collection in current thread
new instructions2
New Instructions
  • sync_reached
new instructions3
New Instructions

TS R W

P

  • sync_reached
    • Dump TS, R and W

Network

RDM

new instructions4
New Instructions

TS ØØ

P

  • sync_reached
    • Dump TS, R and W
    • Clear signatures

Network

RDM

new instructions5
New Instructions

TS` Ø Ø

P

  • sync_reached
    • Dump TS, R and W
    • Clear signatures
    • Update TS

Network

RDM

modifications in sync libraries1

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
modifications in sync libraries2

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Unlock macro

UNLOCK (‘{

unlock($1.lock)

}’)

modifications in sync libraries3

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Unlock macro

UNLOCK (‘{

TS R W

P

sync_reached

Network

unlock($1.lock)

RDM

}’)

modifications in sync libraries4

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Unlock macro

UNLOCK (‘{

TS Ø Ø

P

sync_reached

Network

unlock($1.lock)

RDM

}’)

modifications in sync libraries5

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Unlock macro

UNLOCK (‘{

TS Ø Ø

P

sync_reached

Network

unlock($1.lock)

RDM

}’)

modifications in sync libraries6

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Unlock macro

UNLOCK (‘{

sync_reached

lock

TS

$1.timestamp = TS

unlock($1.lock)

}’)

modifications in sync libraries7

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Unlock macro

UNLOCK (‘{

sync_reached

$1.timestamp = TS

unlock($1.lock)

}’)

modifications in sync libraries8

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Unlock macro

UNLOCK (‘{

sync_reached

$1.timestamp = TS

unlock($1.lock)

AppendtoTSLog(TS)

TS

TS Log

}’)

modifications in sync libraries9

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Lock macro

LOCK (‘{

lock($1.lock)

}’)

modifications in sync libraries10

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Lock macro

LOCK (‘{

lock($1.lock)

TS = GenerateTS (TS,

lock

timestamp

$1.timestamp)

}’)

modifications in sync libraries11

variable

timestamp

Modifications in Sync Libraries
  • Synchronization object
  • Lock macro

Transparent to Application Code

LOCK (‘{

lock($1.lock)

TS = GenerateTS (TS,

lock

timestamp

$1.timestamp)

}’)

other topics in paper
Other Topics in Paper
  • Easy to virtualize
  • Queue Overflow
  • Detailed HW structures
outline2
Outline
  • Motivation
  • Main Idea
  • Implementation
  • Results
  • Conclusions
experimental setup
Experimental Setup
  • PIN – Binary Instrumention Tool
  • Default parameters
  • Benchmarks: SPLASH2, PARSEC
  • # of proc: 8
  • Signature size: 2 Kbits
  • Block size: 2,000 ins
  • Queue size: 16 entries
  • Checkpoint interval: 1 Million ins
race detection ability
Race Detection Ability
  • Three configurations
  • SigRace Default
  • SigRace Ideal : Stores every signature between 2 checkpoints
  • ReEnact [Prvulovic et al, ISCA03]: Cache based approach with timestamp per word
race detection ability2
Race Detection Ability
  • More coverage than ReEnact

Total

95

90

70

race detection ability3
Race Detection Ability
  • More coverage than ReEnact
  • Coveragecomparable to ideal configuration

Total

95

90

70

injected races
Injected Races
  • Each application runs 25 times with diff sync elimination
  • Removed one dynamic sync per run
injected races2
Injected Races
  • More overall coverage than ReEnact
injected races3
Injected Races
  • More overall coverage than ReEnact
  • 150% more coverage
injected races4
Injected Races
  • More overall coverage than ReEnact
  • 150% more coverage
conclusions
Conclusions
  • Simple HW
    • Cache and coherence protocol are unchanged
  • Proposed SigRace:
  • Higher coverage than existing HW schemes
    • Detect races even if the line is displaced/invalidated
  • Usable on-the-fly in production runs
  • SigRace finds 150% more injected races than word-based ReEnact
sigrace signature based data race detection1

SigRace: Signature-Based Data Race Detection

Computer Science DepartmentUniversity of Illinois at Urbana-Champaignhttp://iacoma.cs.uiuc.edu

Abdullah Muzahid, Dario Suarez*, Shanxiang Qi & Josep Torrellas

*Universidad de Zaragoza, Spain

execution overhead
Execution Overhead
  • Additional instructions are negligible
  • No overhead in generating signatures (HW)
  • Main overheads
  • Checkpointing (ReVive – 6.3%)
  • Network traffic (63 bytes per 1000 ins - compressed)
  • Re-execution (depends on false positives & race position)
    • Can be done offline
network traffic overhead
Network Traffic Overhead

̴ 1 cache line

63

re execution overhead
Re-execution Overhead
  • Instructions re-executed until the first true data race is analyzed are shown as overhead
  • In this process, it may also encounter many false positive races
  • Instructions re-executed to analyze only the true race are shown as true overhead
  • Instructions re-executed to filter out the false positives are shown as false overhead
re execution overhead1
Re-execution Overhead

Modest overhead

22%

false positives
False Positives
  • Parallel bloom filters with H3 hash function

Low False Positive

1.57%

virtualization1
Virtualization
  • RDM uses as many queues as the number of threads
  • Timestamp is accessed by thread id
  • Thread id remains same even after migration
  • Timestamps, flags, conflict signature are saved and restored at context switch
  • RDM intersects incoming signatures against all other threads’(even inactive ones) signatures
  • Threads can be re-executed without any scheduling constraints
scalability
Scalability
  • The operation of RDM can be pipelined
    • Simple repetitive operation
  • For small # of proc., scalability is not a problem
  • Network traffic (compressed message) around 63Bytes/thousand ins
  • Checkpoint is an issue.