
On-the-Fly Data-Race Detection in Multithreaded Programs

Prepared by Eli Pozniansky under the supervision of Prof. Assaf Schuster


Table of Contents

  • What is a Data-Race?

  • Why Are Data-Races Undesired?

  • How Can Data-Races Be Prevented?

  • Can Data-Races be Easily Detected?

  • Feasible and Apparent Data-Races

  • Complexity of Data-Race Detection

    • NP and Co-NP

    • Program Execution Model & Ordering Relations

    • Complexity of Computing Ordering Relations

    • Proof of NP/Co-NP Hardness


Table of Contents – Cont.

  • So How Can Data-Races Be Detected?

    • Lamport’s Happens-Before Approximation

  • Approaches to Detection of Apparent Data-Races:

    • Static Methods

    • Dynamic Methods:

      • Post-Mortem Methods

      • On-The-Fly Methods


Table of Contents – Cont.

  • Closer Look at Dynamic Methods:

    • DJIT+

      • Local Time Frames

      • Vector Time Frames

      • Logging Mechanism

      • Data-Race Detection Using Vector Time Frames

      • Which Accesses to Check?

      • Which Time Frames to Check?

      • Access History & Algorithm

      • Coherency

      • Results


Table of Contents – Cont.

  • Lockset

    • Locking Discipline

    • The Basic Algorithm & Explanation

    • Which Accesses to Check?

    • Improving Locking Discipline

      • Initialization

      • Read-Sharing

      • Barriers

    • False Alarms

    • Results

  • Combining DJIT+ and Lockset

  • Summary

  • References


    What is a Data-Race?

    • Concurrent accesses to a shared location by two or more threads, where at least one is for writing

    Example (variable X is global and shared):

    Thread 1        Thread 2
    X = 1           T = Y
    Z = 2           T = X

    Usually indicative of a bug!
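
    A minimal C++ sketch of the example above (names are illustrative; the unsynchronized write of X in Thread 1 and read of X in Thread 2 form the race):

      #include <cstdio>
      #include <thread>

      int X = 0, Y = 0, Z = 0;               // global, shared, unprotected

      int main() {
          std::thread t1([] { X = 1; Z = 2; });                 // Thread 1: writes X
          std::thread t2([] { int T = Y; T = X; (void)T; });    // Thread 2: reads X
          t1.join();
          t2.join();
          // The write 'X = 1' and the read 'T = X' may execute concurrently,
          // so this program contains a data race (undefined behavior in C++).
          std::printf("X=%d Z=%d\n", X, Z);
      }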


    Why Are Data-Races Undesired?

    • Programs with data-races:

      • Usually demonstrate unexpected and even non-deterministic behavior.

      • The outcome might depend on specific execution order (A.K.A threads’ interleaving).

      • Re-executing may not always produce the same results/same data-races.

    • Thus, hard to debug and hard to write correct programs.


    Why Are Data-Races Undesired? – Example

    Machine code for 'X++':  reg ← X;  incr reg;  X ← reg

    • First interleaving:
      Thread 1              Thread 2
      1. reg1 ← X
      2. incr reg1
      3. X ← reg1
                            4. reg2 ← X
                            5. incr reg2
                            6. X ← reg2

    • Second interleaving:
      Thread 1              Thread 2
      1. reg1 ← X
      2. incr reg1
                            3. reg2 ← X
                            4. incr reg2
                            5. X ← reg2
      6. X ← reg1

    At the beginning: X = 0. At the end: X = 1 or X = 2?

    Depends on the scheduling order
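
    A minimal C++ sketch of the same effect (illustrative; the unsynchronized X++ compiles to the read-increment-write sequence shown above, so one update can be lost):

      #include <cstdio>
      #include <thread>

      int X = 0;                           // shared counter, no synchronization

      void increment() { X++; }            // read X, increment, write back: not atomic

      int main() {
          std::thread t1(increment);
          std::thread t2(increment);
          t1.join();
          t2.join();
          // Depending on how the two read-modify-write sequences interleave,
          // the result may be 2 or (after a lost update) 1.
          std::printf("X = %d\n", X);
      }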


    Execution Order

    • Each thread has a different execution speed.

    • The speed may change over time.

    • For an external observer of the time axis, instructions appear in execution order.

    • Any order is legal.

    • Execution order for a single thread is called program order.


    How Can Data-Races Be Prevented?

    • Explicit synchronization between threads:

      • Locks

      • Critical Sections

      • Barriers

      • Mutexes

      • Semaphores

      • Monitors

      • Events

      • Etc.

    Thread 1              Thread 2
    Lock(m)
    X++
    Unlock(m)
                          Lock(m)
                          T=X
                          Unlock(m)


    Synchronization – "Bad" Bank Account Example

    Thread 1                           Thread 2
    Deposit( amount ) {                Withdraw( amount ) {
      balance += amount;                 if (balance < amount)
    }                                      print( "Error" );
                                         else
                                           balance -= amount;
                                       }

    • ‘Deposit’ and ‘Withdraw’ are not “atomic”!!!

    • What is the final balance after a series of concurrent deposits and withdrawals?


    Synchronization – "Good" Bank Account Example

    Thread 1                           Thread 2
    Deposit( amount ) {                Withdraw( amount ) {
      Lock( m );                         Lock( m );
      balance += amount;                 if (balance < amount)
      Unlock( m );                         print( "Error" );
    }                                    else
                                           balance -= amount;
                                         Unlock( m );
                                       }

    • The locked regions are critical sections. Since critical sections protected by the same lock can never execute concurrently, this version exhibits no data-races.
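
    A possible C++ rendering of the "good" version, assuming a single global lock m guarding balance (names are illustrative):

      #include <cstdio>
      #include <mutex>
      #include <thread>

      std::mutex m;                          // the lock 'm' from the slide
      int balance = 0;                       // shared account balance

      void Deposit(int amount) {
          std::lock_guard<std::mutex> lock(m);   // critical section on m
          balance += amount;
      }

      void Withdraw(int amount) {
          std::lock_guard<std::mutex> lock(m);   // critical section on the same m
          if (balance < amount)
              std::printf("Error\n");
          else
              balance -= amount;
      }

      int main() {
          std::thread t1(Deposit, 100);
          std::thread t2(Withdraw, 50);
          t1.join();
          t2.join();
          std::printf("balance = %d\n", balance);
      }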


    Is This Enough?

    • Theoretically – YES

    • Practically – NO

    • What if the programmer accidentally forgets to add the correct synchronization?

    • How can all such data-race bugs be detected in a large program?

    • How to eliminate redundant synchronization?


    Can Data-Races Be Easily Detected? – No!

    • The problem of deciding whether a given program contains potential data races (called feasible) is NP-hard [Netzer&Miller 1990]

      • Input size = # instructions performed

      • Even for 2 threads only

      • Even with no loops/recursion

    • Lots of execution orders: (#threads)^(thread_length·#threads)

    • Also all possible inputs should be tested

    • Side effects of the detection code can eliminate all data races

    Thread 1              Thread 2
    a
    lock(m)
    ...
    unlock(m)
                          lock(m)
                          b
                          unlock(m)


    Feasible Data-Races

    • Based on the possible behavior of the program (i.e., the semantics of the program's computation).

    • The actual (!) data-races that can possibly happen in some program execution.

    • Require fully analyzing the program's semantics to determine whether the execution could have allowed two accesses to the same shared variable to execute concurrently.


    Apparent Data-Races

    • Approximations of the feasible data races

    • Based only on the behavior of the program's explicit synchronization (and not on program semantics)

    • Important since data-races are usually result of improper synchronization

    • Easier to locate

    • Less accurate

    • Exist iff at least one feasible data race exists 

    • Exhaustively locating all apparent data races is still NP-hard (and, in fact, undecidable) 


    Apparent Data-Races – Cont.

    Initially: grades = oldDatabase; updated = false;

    Thread T.A.                        Thread Lecturer
    grades := newDatabase;             while (updated == false);
    updated := true;                   X := grades.gradeOf(lecturersSon);

    • Accesses a and b to the same shared variable in some execution are ordered if there is a chain of corresponding explicit synchronization events between them.

    • a and b are said to have potentially executed concurrently if no explicit synchronization prevented them from doing so.


    Feasible vs. Apparent

    [Initially F = false]

    Thread 1              Thread 2
    X++
    F=true
                          if (F==true)
                              X--

    Race 1 is on F (between 'F=true' and 'if (F==true)'); race 2 is on X (between 'X++' and 'X--').

    • Apparent data-races in the execution above – 1 & 2.

    • Feasible data-races – 1 only!!! No feasible execution exists in which 'X--' is performed before 'X++' (suppose 'F' is false at start).

    • Protecting 'F' only will protect 'X' as well.


    Feasible vs. Apparent

    [Initially F = false]

    Thread 1              Thread 2
    X++                   Lock( m )
    Lock( m )             T = F
    F=true                Unlock( m )
    Unlock( m )           if (T==true)
                              X--

    • No feasible or apparent data-races exist under any execution order!!!

    • 'F' is protected by a lock. 'X++' and 'X--' are always ordered and properly synchronized.

    • Either there is a synchronization chain of Unlock(m)–Lock(m) between 'X++' and 'X--', or only 'X++' executes.


    Complexity of Data-Race Detection

    • Exactly locating the feasible data-races is an NP-hard problem.

      • The apparent races, which are simpler to locate, must be detected for debugging.

      • Apparent data-races exist if and only if at least one feasible data-race exists somewhere in the execution.

      • The problem of exhaustively locating all apparent data-races is still NP-hard.


    Reminder: NP and Co-NP

    • There is a set of problems, called NP, for which:

      • No polynomial-time solution is known.

      • An exponential-time solution exists.

    • A problem is NP-hard if every problem in NP has a polynomial reduction to it.

    • A problem is NP-complete if it is NP-hard and it resides in NP.

    • Intuitively – for a yes/no problem in NP, a 'yes' answer can be confirmed efficiently (and we can stop), while confirming a 'no' answer may take more than polynomial time.


    Reminder: NP and Co-NP – Cont.

    • The set of Co-NP problems is complementary to the set of NP problems.

    • A problem is Co-NP-hard if, intuitively, only 'no' answers can be confirmed efficiently.

    • It is not known whether every problem that is both in NP and Co-NP is also in P (i.e., has a polynomial solution).

    • The problem of checking whether a boolean formula is satisfiable is NP-complete.

      • Answer 'yes' if a satisfying assignment of the variables is found.

    • The same problem for non-satisfiability is Co-NP-complete.


    Why is Data-Race Detection NP-Hard?

    • Question: How can we know that in a program P two accesses, a and b, to the same shared variable are concurrent?

    • Answer: We must check all execution orders of P and see.

      • If we discover an execution order, in which a and b are concurrent, we can report on data-race and stop.

      • Otherwise we should continue checking.


    Program Execution Model

    • Consider a class of multi-threaded programs that synchronize by counting semaphores.

    • Program execution is described by collection of events and two relations over the events.

    • Synchronization event – instance of some synchronization operation (e.g. signal, wait).

    • Computation event – instance of a group of statements in same thread, none of which are synchronization operations (e.g. x=x+1).


    Program Execution Model – Events' Relations

    • Temporal ordering relation – a T→ b means that a completes before b begins (i.e., the last action of a can affect the first action of b).

    • Shared data dependence relation – a D→ b means that a accesses a shared variable that b later accesses, and at least one of the accesses modifies the variable.

      • Indicates when one event causally affects another.


    Program Execution Model – Program Execution

    • Program executionP – a triple <E,T→,D→>, where E is a finite set of events, and T→ and D→ are the above relations that satisfy the following axioms:

      • A1: T→ is an irreflexive partial order (a T↛ a).

      • A2: If a T→b T↮ c T→ d then a T→ d.

      • A3: If a D→ b then b T↛ a.

    • Notes:

      • a ↛ b is a shorthand for ¬(a → b).

      • a ↮ b is a shorthand for ¬(a → b) ⋀ ¬(b → a).

      • Notice that A1 and A2 imply transitivity of T→.


    Program Execution Model – Feasible Program Execution

    • Feasible program execution for P – execution of a program that:

      • performs exactly the same events as P

      • May exhibit different temporal ordering.

    • Definition: P’=<E’,T’→,D’→> is a feasible program execution for P=<E,T→,D→> (potentially occurred) if

      • F1: E’=E (i.e. exactly the same events), and

      • F2: P’ satisfies the axioms A1 - A3 of the model, and

      • F3: a D→ b ⇒ a D’→ b (i.e. same data dependencies)

    • Note: Any execution with same shared-data dependencies as P will execute exactly the same events as P.


    Program Execution Model – Ordering Relations

    • Given a program execution, P=<E,T→,D→>, and the set, F(P), of feasible program executions for P, six ordering relations are defined: the must-have relations MHB, MCW, MOW and the could-have relations CHB, CCW, COW (happened-before, concurrent-with and ordered-with, respectively).

      • They summarize the temporal orderings present in the feasible program executions.


    Program Execution Model – Ordering Relations – Explanation

    • The must-have relations describe orderings that are guaranteed to be present in all feasible program executions in F(P).

    • The could-have relations describe orderings that could potentially occur in at least one of the feasible program executions in F(P).

    • The happened-before relations show events that execute in a specific order.

    • The concurrent-with relations show events that execute concurrently.

    • The ordered-with relations show events that execute in either order but not concurrently.


    Complexity of Computing Ordering Relations

    • The problem of computing any of the must-have ordering relations (MHB, MCW, MOW) is Co-NP-hard.

    • The problem of computing any of the could-have relations (CHB, CCW, COW) is NP-hard.

    • Theorem 1: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a MHB→ b, a MCW↔ b or a MOW↔ b (any of the must-have orderings) is Co-NP-hard.


    Proof of Theorem 1 – Notes

    • The proof is a reduction from 3CNFSAT such that any boolean formula is not satisfiable iff a MHB→ b for two events, a and b, defined in the reduction.

    • The problem of checking whether 3CNFSAT formula is not satisfiable is Co-NP-complete.

    • The presented proof is only for the must-have-happened-before (MHB) relation.

      • Proofs for the other relations are analogous.

    • The proof can also be extended to programs that use binary semaphores, event style synchronization and other synchronization primitives (and even single counting semaphore).


    Proof of Theorem 1 – 3CNFSAT

    • An instance of 3CNFSAT is given by:

      • A set of n variables, V={X1,X2, …,Xn}.

      • A boolean formula B consisting of conjunction of m clauses, B=C1⋀C2⋀…⋀Cm.

      • Each clause Cj=(L1⋁L2⋁L3) is a disjunction of three literals.

      • Each literal Lk is any variable from V or its negation - Lk=Xi or Lk=⌐Xi.

      • Example: B=(X1⋁X2⋁⌐X3)⋀(⌐X2⋁⌐X5⋁X6)⋀(X1⋁X4⋁⌐X5)


    Proof of Theorem 1 – Idea of the Proof

    • Given an instance of 3CNFSAT formula, B, we construct a program consisting of 3n+3m+2 threads which use 3n+m+1 semaphores (assumed to be initialized to 0).

    • The execution of this program simulates a nondeterministic evaluation of B.

    • Semaphores are used to represent the truth values of each variable and clause.

    • The execution exhibits certain orderings iff B is not satisfiable.


    Proof of Theorem 1 – The Construction per Variable

    • For each variable, Xi, the following three threads are constructed:

      Thread 1:            Thread 2:                Thread 3:
      wait( Ai )           wait( Ai )               signal( Ai )
      signal( Xi )         signal( not-Xi )         wait( Pass2 )
      . . .                . . .                    signal( Ai )
      signal( Xi )         signal( not-Xi )

    • ". . ." indicates as many signal(Xi) (or signal(not-Xi)) operations as the number of occurrences of the literal Xi (or ⌐Xi) in the formula B.


    Proof of Theorem 1 – The Construction per Variable

    • The semaphores Xi and not-Xi are used to represent the truth value of variable Xi.

    • Signaling the semaphore Xi (or not-Xi) represents the assignment of True (or False) to variable Xi.

    • The assignment is accomplished by allowing either signal(Xi) or signal(not-Xi) to proceed, but not both (due to concurrent wait(Ai) operations in two leftmost threads).


    Proof of Theorem 1 – The Construction per Clause

    • For each clause, Cj, the following three threads are constructed:

      Thread 1:            Thread 2:            Thread 3:
      wait( L1 )           wait( L2 )           wait( L3 )
      signal( Cj )         signal( Cj )         signal( Cj )

    • L1, L2 and L3 are the semaphores corresponding to the literals in clause Cj (i.e. Xi or not-Xi).

    • The semaphore Cj represents the truth value of clause Cj. It is signaled iff the truth assignments to the variables cause the clause Cj to evaluate to True.


    Proof of Theorem 1 – Explanation of Construction

    • The first 3n threads operate in two phases:

      • The first pass is a non-deterministic guessing phase in which:

        • Each variable used in the boolean formula B is assigned a unique truth value.

        • Only one of the Xi and not-Xi semaphores is signaled.

      • The second pass (begins after semaphore Pass2 is signaled) is used to ensure that the program doesn’t deadlock:

        • The semaphore operations that were not allowed to execute during the first pass are allowed to proceed.


    Proof of Theorem 1 – The Final Construction

    • Two additional threads are created:

      Thread 1:            Thread 2:
      wait( C1 )           a: skip
      . . .                signal( Pass2 )
      wait( Cm )           . . .
      b: skip              signal( Pass2 )

    • There are n 'signal(Pass2)' operations – one for each variable.

    • There are m 'wait(Cj)' operations – one for each clause.


    Proof of Theorem 1 – Putting It All Together

    • Event b is reached only after semaphore Cj, for each clause j, has been signaled.

    • The program contains no conditional statements or shared variables.

      • Every execution of the program executes the same events and exhibits the same shared-data dependencies (i.e. none).

    • Claim: For any execution a MHB→ b iff B is not satisfiable.


    Proof of Theorem 1 – Proving the "if" Part

    • Assume that B is not satisfiable.

    • Then there is always some clause, Cj, that is not satisfied by the truth values guessed during the first pass. Thus, no signal(Cj) operation is performed during the first pass.

    • Event b can’t execute until this signal(Cj) operation is performed, which can then only be done during the second pass.

    • The second pass doesn’t occur until after event a executes, so event a must precede event b.

    • Therefore, a MHB→ b.


    Proof of Theorem 1 – Proving the "only if" Part

    • Assume that a MHB→ b.

    • This means that there is no execution in which b either precedes a or executes concurrently with a.

    • Assume by way of contradiction that B is satisfiable.

    • Then some truth assignment can be guessed during the first pass that satisfies all of the clauses.

    • Event b can then execute before event a, contradicting the assumption.

    • Therefore, B is not satisfiable.


    Complexity of Computing Ordering Relations – Cont.

    • Since a MHB→ b iff B is not satisfiable, the problem of deciding a MHB→ b is Co-NP-hard.

    • By similar reductions, programs can be constructed such that the non-satisfiability of B can be determined from the MCW or MOW relations. The problem of deciding these relations is therefore also Co-NP-hard.

    • Theorem 2: Given a program execution, P=<E,T→,D→>, that uses counting semaphores, the problem of deciding whether a CHB→ b, a CCW↔ b or a COW↔ b (any of the could-have orderings) is NP-hard.

    • Proof by similar reductions …


    Complexity of Race Detection – Conditions, Loops and Input

    • The presented model is too simplistic.

    • What if the “if” and “while” statements are used? What if the user’s input is allowed?

    (In the slide's example: if Y ≥ 0 there is a data-race; otherwise it is not possible, since statement [1] is never reached.)


    Complexity of Race Detection – "NP-Harder"?

    • The proof above does not use conditional statements, loops or input from outside.

    • The problem of data-race detection is much harder than deciding an NP-complete problem.

      • Intuitively – there is not even an exponential solution, since it is not known whether the program will ever stop.

    • Thus, in the general case, it is undecidable.


    So How Can Data-Races Be Detected? – Approximations

    • Deciding whether a CHB→ b or a CCW↔ b will reveal feasible data-races.

    • Since this is an intractable problem, the temporal ordering relation T→ should be approximated and apparent data-races located instead.

    • Recall that apparent data-races exist if and only if at least one feasible race exists.

    • Yet, it remains a hard problem to locate all apparent data-races.


    Approximation Example – Lamport's Happens-Before

    • The happens-before partial order, denoted a hb→ b, is defined for access events (reads, writes, releases and acquires) that happen in a specific execution, as follows:

      • Program Order: a and b are events performed by the same thread, with a preceding b.

      • Release and Acquire: a is a release of some sync object S and b is a corresponding acquire.

      • Transitivity: a hb→ c and c hb→ b imply a hb→ b.

    • Shared accesses a and b are concurrent, a hb↮ b, if neither a hb→ b nor b hb→ a holds.
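
    A small sketch of how the three rules combine, assuming a mutex m (unlock acts as the release, lock as the corresponding acquire; names are illustrative):

      #include <mutex>
      #include <thread>

      std::mutex m;
      int x = 0;

      void writer() {
          std::lock_guard<std::mutex> g(m);  // acquire; release on scope exit
          x = 1;                             // access (a)
      }

      void reader() {
          std::lock_guard<std::mutex> g(m);  // acquire
          int r = x;                         // access (b)
          (void)r;
      }

      // In an execution where writer's release precedes reader's acquire:
      //   (a) hb-> release (program order), release hb-> acquire (rule 2),
      //   acquire hb-> (b) (program order), so (a) hb-> (b) by transitivity.
      // In the opposite execution (b) hb-> (a); either way they are never concurrent.

      int main() {
          std::thread t1(writer), t2(reader);
          t1.join();
          t2.join();
      }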


    Approaches to Detection of Apparent Data-Races – Static

    There are two main approaches to detection of apparent data-races (sometimes a combination of both is used):

    • Static – perform a compile-time analysis of the code.

      – Too conservative:

        • Can't know or understand the semantics of the program.

        • Results in excessive false alarms that hide the real data-races.

      + Tests the program globally:

        • Sees the whole code of the tested program.

        • Can warn about all possible errors in all possible executions.


    Approaches to Detection of Apparent Data-Races – Dynamic

    • Dynamic – use tracing mechanism to detect whether a particular execution actually exhibited data-races.

      + Detect only those apparent data-races that actually occur during a feasible execution.

      – Test the program locally:

      • Consider only one specific execution path of the program each time.

    • Post-Mortem Methods – after the execution terminates, analyze the trace of the run and warn about possible data-races that were found.

    • On-The-Fly Methods – buffer partial trace information in memory, analyze it and detect races as they occur.


    Approaches to Detection of Apparent Data-Races

    • No “silver bullet” exists.

    • The accuracy is of great importance (especially in large programs).

    • There is always a tradeoff between the number of missed races (false negatives) and false alarms (false positives).

    • The space and time overheads imposed by the techniques are significant as well.


    Closer Look at Dynamic Methods

    • We show two dynamic methods for on-the-fly detection of apparent data-races in multi-threaded programs with locks and barriers:

      • DJIT+ – based on Lamport's happens-before partial order relation and Mattern's virtual time (vector clocks). Implemented in the Millipede and MultiRace systems.

      • Lockset – based on a locking discipline and lockset refinement. Implemented in the Eraser tool and the MultiRace system.


    DJIT+ – Description

    • Detects apparent data-races in a program execution as they actually occur.

    • Based on the happens-before partial order.

    • Can announce data-races race-by-race.

    • After the cause of the race is verified, the search for other races can proceed.

    • The main disadvantage of the technique is that it is highly dependent on the scheduling order.


    DJIT+ – Local Time Frames (LTF)

    • The execution of each thread is split into a sequence of time frames

    • A new time frame starts on each release (unlock/barrier)

    • For every access there is a time stamp = a vector built from LTFs of all threads at the moment of the access


    DJIT+ – Local Time Frames

    Claim 1: Let a in thread ta and b in thread tb be two accesses, where a occurs at time frame Ta, and the release in ta, corresponding to the latest acquire in tb which precedes b, occurs at time frame Tsync in ta. Then a hb→ b iff Ta < Tsync.


    DJIT+ – Local Time Frames

    Proof:

    - If Ta < Tsync then (a hb→ release), and since (release hb→ acquire) and (acquire hb→ b), we get (a hb→ b).

    - If (a hb→ b), then since a and b are in distinct threads, by definition there exists a pair of corresponding release and acquire such that (a hb→ release) and (acquire hb→ b). It follows that Ta < Trelease ≤ Tsync.


    DJIT+ – Vector Time Frames (VTF)

    • A vector st_t[·] for each thread t

      • Vector size = maxthreads (the maximum number of threads that may execute)

      • Thread ID = thread index

    • st_t[t] is the LTF of t

      • Holds the number of releases actually made by t

    • st_t[u] stores the latest LTF of thread u known to t

    • If u is an acquirer of t's release, then u's vector is updated:

      for k = 0 to maxthreads-1

      st_u[k] = max( st_u[k], st_t[k] )
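
    A sketch of this update, assuming a fixed maxthreads and a global table st[t][u] (illustrative layout, not the actual MultiRace data structures):

      #include <algorithm>
      #include <vector>

      const int kMaxThreads = 8;             // "maxthreads" from the slide

      // st[t][u] = latest local time frame of thread u known to thread t
      std::vector<std::vector<int>> st(kMaxThreads, std::vector<int>(kMaxThreads, 0));

      // Thread t performs a release (unlock / barrier): a new local time frame starts.
      void onRelease(int t) {
          // In a real implementation the released vector would be snapshotted and
          // attached to the sync object; here the acquirer reads it directly for brevity.
          st[t][t] += 1;
      }

      // Thread u acquires what thread t released: merge t's knowledge into u's vector.
      void onAcquire(int u, int t) {
          for (int k = 0; k < kMaxThreads; ++k)
              st[u][k] = std::max(st[u][k], st[t][k]);
      }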


    DJIT+ – Vector Time Frames

    • In this way, the vector of u is informed of:

      • The latest time frame of t.

      • The latest time frames of other threads according to the knowledge of t.

    • Note that a thread can learn about a release performed by another thread through “gossip”, when this information is transferred through a chain of corresponding release-acquire pairs.




    DJIT+ – Vector Time Frames

    Claim 2: Let a and b be two accesses in respective threads ta and tb, which happened during respective local time frames Ta and Tb. Let f denote the value of st_tb[ta] at the time when b occurs. Then a hb→ b iff Ta < f.


    DJIT+ – Vector Time Frames

    Proof:

    - If (a hb→ b), then since a and b are in distinct threads, there exists a chain of releases and corresponding acquires whose first release is in ta and whose last acquire is in tb, such that (a hb→ first release), (first release hb→ last acquire) and (last acquire hb→ b). The information on ta's local time frame is transferred through that chain, reaches tb and is stored in st_tb[ta] (= f). Thus it follows that Ta < Tfirst release ≤ f.

    - If Ta < f then there is a sequence of corresponding release-acquire pairs which transfers the local time frame from ta to tb, finally resulting in tb "hearing" that ta entered a time frame later than Ta. This same sequence can be used to transitively apply the hb→ relation from a to b.


    DJIT+ – Logging Mechanism

    • We assume the existence of some logging mechanism, which is:

      • Capable of logging all the accesses to all shared locations as they occur.

      • Accesses are logged ‘atomically’ (no data-races on the accesses to the log)

      • Agrees with the happens-before partial order:

        • If a hb→ b, then a is logged prior to b.

        • It also follows that if a and b are accesses to the same shared location v and a is logged prior to b, then b hb↛ a.


    DJIT+ – Data-Race Detection Using VTF

    Theorem 1: Let a and b be two accesses to the same shared variable in respective threads ta and tb during respective local time frames Ta and Tb. Suppose that at least one of a or b is a write. Assume that a was logged and tested for races prior to b. Then a and b form a data-race iff, at the time when b is logged, it holds that st_tb[ta] ≤ Ta.


    DJIT+ – Data-Race Detection Using VTF

    Proof:

    - If st_tb[ta] ≤ Ta then, by Claim 2, a hb→ b does not hold. Since b is only currently being logged, it cannot hold that b hb→ a. Thus a and b are concurrent and form a data-race (since at least one of them is a write).

    - If a and b form a data-race then a hb→ b does not hold. Thus, by Claim 2, st_tb[ta] ≤ Ta.


    DJIT+ – Data-Race Detection Predicate

    P(a,b) ≜ ( a.type = write ⋁ b.type = write ) ⋀ ( a.time_frame ≥ st_{b.thread_id}[a.thread_id] )

    • P gets two accesses, a and b, such that:

      • a and b are in different threads

      • a and b access same shared location

      • a was logged and tested earlier

      • b is currently logged

    • P returns TRUE iff a and b form a data race

    – Obviously, checking P for every pair of accesses is very expensive.
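
    The same predicate written out in C++ (field and table names mirror the slide; the types are hypothetical):

      #include <vector>

      enum AccessType { READ, WRITE };

      struct Access {
          AccessType type;
          int        thread_id;
          int        time_frame;   // LTF of the accessing thread when the access occurred
      };

      // st[t][u]: latest LTF of thread u known to thread t (maintained as shown earlier)
      std::vector<std::vector<int>> st(8, std::vector<int>(8, 0));

      // a: previously logged and race-tested access; b: access being logged now.
      // Both touch the same shared location and come from different threads.
      bool P(const Access& a, const Access& b) {
          bool oneIsWrite = (a.type == WRITE) || (b.type == WRITE);
          bool concurrent = a.time_frame >= st[b.thread_id][a.thread_id];
          return oneIsWrite && concurrent;   // TRUE iff a and b form a data-race
      }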


    DJIT+ – Which Accesses to Check?

    • We have assumed that there is a logging mechanism, which records all accesses.

    • Logging all accesses in all threads and testing the predicate P for each pair of them will impose a great overhead on the system.

    • Actually some of the accesses can be discarded.


    DJIT+ – Which Accesses to Check?

    Claim 3: Consider an access a in thread ta during time frame Ta, and accesses b and c in thread tb=tc during time frame Tb=Tc. Assume that c precedes b in the program order. If a and b are concurrent, then a and c are concurrent as well.


    DJIT+ – Which Accesses to Check?

    • Proof:

      - Let fb and fc denote the respective values of st_tb[ta] when b and c happen. Since st_tb[ta] is monotonically increasing and c precedes b, we know that fb ≥ fc. Since a hb→ b does not hold, we know by Claim 2 that Ta ≥ fb. Thus Ta ≥ fc, and again by Claim 2 we get that a hb→ c is false.

      - Let fa denote the value of st_ta[tb] when a happens. Since b hb→ a does not hold, we know by Claim 2 that Tb ≥ fa. Since Tb = Tc we get that Tc ≥ fa. Thus by Claim 2, c hb→ a is false.


    DJIT+ – Which Accesses to Check? – Example

    • Accesses b and c previously logged in thread t1 in a same time frame

    • Access b precedes access c in the program order

    • Access a currently logged in thread t2

    • If a and b are synchronized, then a and c are synchronized as well

    • It is sufficient to log and test only the first read access and the first write access to every variable in each time frame!


    DJIT+ – Which Time Frames to Check?

    • Assume that in thread ta an access a is currently being logged, and that in thread tb we previously logged a write b in time frame Tb and another previous write c in time frame Tc, such that Tb < Tc.


    DJIT+ – Which Time Frames to Check?

    • Claim 4: If a forms a data-race with b then it certainly forms a data-race with c.

    • Proof: Easy, since Tc > Tb ≥ st_ta[tb].

    • Either pair a-b or a-c can be considered to be the apparent data-race to be reported.

    • Also, if there is no data-race between a and c, then there is also no data-race between a and b.

      • Therefore, the a-b pair should not be checked.


    DJIT+ – Which Time Frames to Check?

    • For a current read access to a shared variable v, it is enough to check it against the last time frame in which each of the other threads wrote to v.

    • For a current write access to v, it is enough to check it against the last time frame in which each of the other threads read from v, and the last time frame in which each of the other threads wrote to v.


    DJIT+ – Access History & Algorithm

    Access history kept for each shared variable v:
      r-tf1, r-tf2, ..., r-tfn – time frames of the most recent reads from v, one per thread
      w-tf1, w-tf2, ..., w-tfn – time frames of the most recent writes to v, one per thread

    • Each variable v holds for each of the threads:

      • The last time frames in which they read from v

      • The last time frames in which they wrote to v

    • On its first read and first write to v in a time frame, each thread updates the access history of v with its LTF.

    • If the access to v is a read, the thread checks all recent writes to v by other threads.

    • If the access is a write, the thread checks all recent reads, as well as all recent writes, to v by other threads.

    • To support a weak memory model, the history must be kept atomic and coherent.
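
    A simplified sketch of the per-variable access history and of the check done on the first read/first write in a time frame (single-threaded view; a real detector must update the history atomically):

      #include <vector>

      const int kMaxThreads = 8;

      // st[t][u]: latest LTF of thread u known to thread t (maintained as shown earlier)
      std::vector<std::vector<int>> st(kMaxThreads, std::vector<int>(kMaxThreads, 0));

      // Access history kept for one shared variable v (-1 = no access logged yet).
      struct History {
          int lastReadTf[kMaxThreads];
          int lastWriteTf[kMaxThreads];
          History() { for (int i = 0; i < kMaxThreads; ++i) lastReadTf[i] = lastWriteTf[i] = -1; }
      };

      // Does a logged access by thread 'other' in frame 'otherTf' race with the
      // current access by thread 'me'? (The predicate P without the write test.)
      static bool races(int me, int other, int otherTf) {
          return otherTf >= 0 && otherTf >= st[me][other];
      }

      // Called on the FIRST read / FIRST write of v by thread t in its current frame tf.
      bool checkAndLog(History& v, int t, int tf, bool isWrite) {
          bool race = false;
          for (int u = 0; u < kMaxThreads; ++u) {
              if (u == t) continue;
              race = race || races(t, u, v.lastWriteTf[u]);            // reads and writes check recent writes
              if (isWrite) race = race || races(t, u, v.lastReadTf[u]); // writes also check recent reads
          }
          if (isWrite) v.lastWriteTf[t] = tf; else v.lastReadTf[t] = tf;  // update the history
          return race;   // true = apparent data-race detected
      }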


    DJIT+ – Coherency

    • In fact, the presented algorithm uses only the coherency assumption on the access history.

    • Coherency means that:

      • For each variable v there is an agreed-among-all-threads global order Rv on all accesses to v.

      • The reads always return the most recently written value.

    • Hence, the algorithm described above is correct also for weakly ordered systems.

    • E.g., the data-race-free-1 memory model only requires that, in the total absence of data-races, the program executes as if it were sequentially consistent.

    • (The history in the slide's illustration is coherent but not sequentially consistent.)


    DJIT+ – Results

    • The DJIT+ algorithm was implemented in several academic systems – Millipede and MultiRace.

      + No false alarms

      + No missed races in the given feasible execution

      – Very sensitive to differences in threads' scheduling

        • Should be applied each time the program executes (and not only in debug mode)

        – Requires an enormous number of runs

      • Yet cannot prove that the tested program is race-free


    Lockset – Locking Discipline

    • Lockset detects violations of locking discipline

    • The locking discipline is a programming policy that ensures total absence of data races

    • A common and simple locking discipline is that every shared location is consistently protected by the same lock on each access

    • The main drawback is a possibly excessive number of false alarms


    Lockset – What is the Difference?

    • First example: [1] hb→ [2] in the observed execution, yet there is a feasible data-race under a different scheduling.

    • Second example: no locking discipline protects Y, yet [1] and [2] are ordered under all possible schedulings.


    Lockset – The Basic Algorithm

    • C(v) – the set of all locks that have consistently protected v in the execution so far

    • locks_held(t) – the set of all locks currently acquired by thread t

    The algorithm:

    - For each v, init C(v) to the set of all possible locks

    - On each access to v by thread t:

      - lh_v ← locks_held(t)

      - if this is a read, then lh_v ← lh_v ∪ {readers_lock}

      - C(v) ← C(v) ∩ lh_v

      - if C(v) = ∅, issue a warning
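
    A direct C++ transcription of the basic algorithm (illustrative: locks are plain integer ids and readers_lock is a distinguished fake id):

      #include <cstdio>
      #include <set>

      using LockSet = std::set<int>;            // locks identified by integer ids
      const int kReadersLock = -1;              // fake lock added on read accesses

      struct VarState {
          bool    first = true;                 // C(v) still equals "all possible locks"
          LockSet C;                            // candidate set C(v), valid once !first
      };

      static LockSet intersect(const LockSet& a, const LockSet& b) {
          LockSet r;
          for (int x : a) if (b.count(x)) r.insert(x);
          return r;
      }

      // Called on every access to v by a thread currently holding 'locksHeld'.
      void onAccess(VarState& v, LockSet locksHeld, bool isRead, const char* name) {
          if (isRead) locksHeld.insert(kReadersLock);          // lh_v = locks_held(t) ∪ {readers_lock}
          if (v.first) { v.C = locksHeld; v.first = false; }   // first refinement of "all locks"
          else         v.C = intersect(v.C, locksHeld);        // C(v) = C(v) ∩ lh_v
          if (v.C.empty())
              std::printf("Warning: locking discipline for %s is violated\n", name);
      }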


    Lockset – Explanation

    • The process is called lockset refinement.

    • It ensures that any lock that consistently protected v is contained in C(v).

    • A lock m is in C(v) if in execution up to that point, every thread that has accessed v was holding m at the moment of access.

    • If some lock m consistently protects v, it will remain in C(v) till the termination of the program.

    • The addition of fake readers_lock lock ensures that concurrent reads are not interpreted as data races.

    • The first write to v permanently removes readers_lock from C(v).


    Lockset – Example

    RL = readers_lock – prevents multiple reads from generating false alarms

    Operation            C(v)
    lock( L1 )
    read v               {L1, RL}
    unlock( L1 )         {L1, RL}
    lock( L2 )
    write v              { }        ⇒ Warning: the locking discipline for v is violated!!!
    unlock( L2 )         { }


    Extended Lockset – Which Accesses to Check?

    • Two accesses, a and b, to v

      • Both in same thread

      • Both in same time frame

      • Access a precedes access b

    • Then: Locksa(v) ⊆ Locksb(v)

      • Locksu(v) is the set of real locks acquired by the thread during access u to v

    Accesses [1], [2], [3] are all in same time frame


    Extended Lockset – Which Accesses to Check?

    • It follows that:

      • 1) [C(v) ∩ Locks_a(v)] ⊆ [C(v) ∩ Locks_b(v)]

      • 2) If C(v) ∩ Locks_a(v) ≠ ∅ then C(v) ∩ Locks_b(v) ≠ ∅

      ⇒ Only the first access in each time frame needs to be logged and checked!!!

    • The addition of readers_lock forces us to check both the first read and the first write in each time frame.

    • Lockset needs the same logging mechanism as DJIT+!


    Extended Lockset – Improving Locking Discipline

    • The locking discipline described above is too strict.

    • There are common programming practices that violate the discipline, yet are free from data-races:

      • Initialization: Shared variables are usually initialized without holding any locks.

      • Read-Shared Data: Some shared variables are written during initialization only and are read-only thereafter.

      • Barriers: Threads can synchronize through barriers, which are not supported by the notion of locking discipline. For data-race-free programs using barriers only, the basic Lockset will report false alarms on every pair of accesses from different threads.


    Extended Lockset – Initialization

    • When initializing newly allocated data there is no need to lock it, since other threads can not hold a reference to it yet.

    • Unfortunately, there is no easy way of knowing when initialization is complete.

    • Therefore, a shared variable is initialized when it is first accessed by a second thread.

    • As long as a variable is accessed by a single thread, reads and writes don’t update C(v).


    Extended Lockset – Read-Shared Data

    • There is no need to protect a variable if it’s initialized once and thereafter is read-only.

    • To support unlocked read-sharing, the fake readers_lock was added.

    • Still, some additional mechanism is needed so that the initialization will not permanently remove the readers_lock from C(v).

    • Note: The fake lock does not prevent threads from executing the reads concurrently.


    Extended Lockset – Supporting Barriers

    • A barrier is a global synchronization primitive

      • (locks, by contrast, synchronize only two threads at a time)

    • In order to pass the barrier, all threads must reach it first and only then continue.

    • Observations:

      • reaching a barrier ≅ starting new execution

      • No races between accesses from different sides of a barrier

    • Idea – restart Lockset detection each time barrier is reached by all threads.


    Extended Lockset – Supporting Barriers

    • Variable v is considered initialized when:

      • It is first accessed by a second thread, or

      • The thread that first accessed v reaches a barrier.

    • A state transition diagram is employed for each variable, with states Virgin, Initializing, Exclusive, Shared, Empty and Clean; transitions are triggered by reads/writes (by the same thread, by a new thread, with C(v) empty or non-empty) and by barriers.


    Extended Lockset – States Explanation

    • Virgin – The variable is new and has not been referenced by any thread.

    • Initializing – The variable is being initialized by only one thread. C(v) is not updated in this state.

    • Shared – The data is accessed by more than one thread. C(v) is updated on each access.

    • Empty – C(v) became empty. A data-race warning is announced only the first time this state is reached.

    • Clean – A barrier was reached by all threads. C(v) is re-initialized to hold the set of all possible locks.

    • Exclusive – Similar to the Initializing state: after reaching the barrier, the variable is accessed by only one thread. It is assumed to be already initialized, so C(v) is updated on each access, but a data-race is announced only if another thread accesses v and C(v) is empty.
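
    A rough C++ sketch of this state machine, encoding only the transitions spelled out in the explanation above (the full diagram has more edges; barrier transitions are collapsed into a single handler here):

      // States of one shared variable in the barrier-aware Lockset.
      enum class State { Virgin, Initializing, Shared, Empty, Clean, Exclusive };

      struct Var {
          State state = State::Virgin;
          // C(v), first-accessing thread id, etc. omitted for brevity
      };

      // Barrier reached by all threads: C(v) is re-initialized to "all locks".
      void onBarrier(Var& v) {
          v.state = State::Clean;
      }

      // A read/write of v; 'newThread' = the access comes from a thread other than
      // the one that has accessed v so far, 'cEmpty' = C(v) has become empty.
      void onAccess(Var& v, bool newThread, bool cEmpty) {
          switch (v.state) {
          case State::Virgin:       v.state = State::Initializing; break;   // first ever access
          case State::Initializing: if (newThread) v.state = State::Shared; break;
          case State::Clean:        v.state = State::Exclusive; break;      // first access after a barrier
          case State::Exclusive:    if (newThread)
                                        v.state = cEmpty ? State::Empty : State::Shared;
                                    break;
          case State::Shared:       if (cEmpty) v.state = State::Empty;     // announce the warning once
                                    break;
          case State::Empty:        break;                                  // warning already announced
          }
      }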


    Lockset – Still False Alarms

    The refined algorithm will still produce a false alarm in the following simple case:


    Lockset – Additional False Alarms

    • Additional possible false alarms are:

      • A queue that implicitly protects its elements by accessing the queue only through locked head and tail fields.

      • A main thread that passes arguments to a worker thread. Since the main thread and the worker thread never access the arguments concurrently, they do not use any locks to serialize their accesses.

      • Privately implemented locks, which don't communicate with Lockset.

      • True data-races that don't affect the correctness of the program (for example, benign races).

    if (f == 0) {
        lock(m);
        if (f == 0)
            f = 1;
        unlock(m);
    }


    Lockset – Results

    • The basic Lockset was implemented in a full-scale testing tool, Eraser, which is used in industry (not "on paper" only).

    • The extended Lockset was implemented in the academic MultiRace system.

      + Less sensitive to differences in threads' scheduling

      + Detects a superset of all apparently raced locations in an execution of a program

        • Possible races are only rarely missed

      + Our extension for barriers can be used to check programs that employ barriers only and no locks

      – Still lots of false alarms

      – Still dependent on scheduling

        • Cannot prove that the tested program is race-free


    Combining DJIT+ and Lockset

    Sets illustrated in the slide's diagram:
      S – all shared locations in some program P
      F – all feasibly raced locations in program P
      A – all apparently raced locations in program P
      L – violations detected by Lockset in execution E of P
      D – raced locations detected by DJIT+ in execution E of P

    • Lockset can detect suspected races in more execution orders

    • DJIT+ can filter out the spurious warnings reported by Lockset

      • Every completed data race is also a locking discipline violation

    • For many types of programs L tends to cover A – we detect a subset and a superset of all raced locations!!!

    • The number of checks performed by DJIT+ can be reduced with the help of Lockset

      • If C(v) is not empty yet, DJIT+ should not check v for races

    • The implementation overhead comes mainly from the access logging mechanism

      • Can be shared by both algorithms
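
    A sketch of the combined per-access check, reusing the Lockset set C(v) and the DJIT+ predicate from the earlier slides (illustrative glue code, not the MultiRace implementation):

      struct Access { bool isWrite; int threadId; int timeFrame; };

      // cIsEmpty   : Lockset's verdict for v so far (C(v) == ∅)
      // knownFrame : st[cur.threadId][prev.threadId] at the moment cur is logged
      bool reportRace(bool cIsEmpty, int knownFrame, const Access& prev, const Access& cur) {
          if (!cIsEmpty)
              return false;                                  // discipline intact: skip the DJIT+ check
          bool oneIsWrite = prev.isWrite || cur.isWrite;
          bool concurrent = prev.timeFrame >= knownFrame;    // DJIT+ predicate P
          return oneIsWrite && concurrent;                   // Lockset suspects, DJIT+ confirms
      }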


    Dynamic Data-Race Detection – Summary

    • The solutions are not universal.

      • Not all located apparent data-races are feasible.

      • Still requires a large number of runs to check as many execution paths as possible.

      • Still cannot prove the program to be data-race free.

      • Since slowdowns can be high, satisfactory testing can take months.

      • Different (or new) types of synchronization might require different detection techniques.

      • Inserting detection code into a program can perturb the threads' interleaving so that races disappear (Lockset is less sensitive to this).

      • Maybe combine with some static analysis?

      • Maybe better approximations can be found...?



    References

    • S. Adve and M. D. Hill. A Unified Formalization of Four Shared-Memory Models. Technical Report, University of Wisconsin, Sept. 1992.

    • A. Itzkovitz, A. Schuster, and O. Zeev-Ben-Mordechai. Towards Integration of Data Race Detection in DSM System. In The Journal of Parallel and Distributed Computing (JPDC), 59(2): pp. 180-203, Nov. 1999

    • L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. In Communications of the ACM, 21(7): pp. 558-565, Jul. 1978.

    • F. Mattern. Virtual Time and Global States of Distributed Systems. In Parallel & Distributed Algorithms, pp. 215-226, 1989.


    References – Cont.

    • R. H. B. Netzer and B. P. Miller. What Are Race Conditions? Some Issues and Formalizations. In ACM Letters on Programming Languages and Systems, 1(1): pp. 74-88, Mar. 1992.

    • R. H. B. Netzer and B. P. Miller. On the Complexity of Event Ordering for Shared-Memory Parallel Program Executions. In 1990 International Conference on Parallel Processing, 2: pp. 93-97, Aug. 1990.

    • R. H. B. Netzer and B. P. Miller. Detecting Data Races in Parallel Program Executions. In Advances in Languages and Compilers for Parallel Processing, MIT Press 1991, pp. 109-129.


    References – Cont.

    • E. Pozniansky. Efficient On-The-Fly Data Race Detection in Multithreaded C++ Programs. Research Thesis, May 2003.

    • S. Savage, M. Burrows, G. Nelson, P. Sobalvarro, and T.E. Anderson. Eraser: A Dynamic Data Race Detector for Multithreaded Programs. In ACM Transactions on Computer Systems, 15(4): pp. 391-411, 1997

    • O. Zeev-Ben-Mordehai. Efficient Integration of On-The-Fly Data Race Detection in Distributed Shared Memory and Symmetric Multiprocessor Environments. Research Thesis, May 2001.

