DRFx

DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages. Dan Marino, Abhay Singh, Todd Millstein, Madan Musuvathi, Satish Narayanasamy. UC Los Angeles; MSR, Redmond; University of Michigan.

Presentation Transcript
DRFx

A Simple and Efficient Memory Model for Concurrent Programming Languages

Dan Marino (UC Los Angeles)

Abhay Singh (University of Michigan)

Todd Millstein (UC Los Angeles)

Madan Musuvathi (MSR, Redmond)

Satish Narayanasamy (University of Michigan)

State of the Art: SC for Data Race Free Memory Models

  • sequential consistency [Lamport 79]

    • intuitive for programmers

    • limits compiler and hardware optimizations

  • DRF0 [Adve&Hill 90] models balance performance and ease of programming

    • SC behavior guaranteed for race-free programs

    • most optimizations allowed

  • e.g. Java and C++0x memory models [Manson et al. 2005] [Boehm et al. 2008]

Program Behavior under DRF0

X* x = null;
bool init = false;

// Thread t                // Thread u
A: x = new X();            C: if (init)
B: init = true;            D:     x->f++;

Optimizing Compiler and Hardware


B doesn’t depend on A.

It might be faster to reorder them!
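
The reordering hazard can be checked mechanically. The following is a minimal C++ sketch, not taken from the presentation: it enumerates every sequentially consistent interleaving of the two threads and asks whether statement D can run while x is still null. The names State, step, and can_null_deref are hypothetical, and x is modeled as a flag rather than a real pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the slide's shared variables.
struct State {
    bool x_allocated = false;  // true once "x = new X()" has run
    bool init = false;
    bool entered_if = false;   // thread u observed init == true at C
    bool null_deref = false;   // D ran while x was still null
};

// Apply one labeled statement (A-D, as on the slide) to the state.
void step(State& s, char label) {
    switch (label) {
        case 'A': s.x_allocated = true; break;   // x = new X();
        case 'B': s.init = true; break;          // init = true;
        case 'C': s.entered_if = s.init; break;  // if (init)
        case 'D':                                // x->f++;
            if (s.entered_if && !s.x_allocated) s.null_deref = true;
            break;
    }
}

// Explore all interleavings of thread t's and thread u's statements,
// keeping each thread's own order, and report whether any of them
// dereferences a null x.
bool can_null_deref(const std::string& t, const std::string& u,
                    std::size_t i = 0, std::size_t j = 0,
                    State s = State{}) {
    if (i == t.size() && j == u.size()) return s.null_deref;
    if (i < t.size()) {
        State next = s;
        step(next, t[i]);
        if (can_null_deref(t, u, i + 1, j, next)) return true;
    }
    if (j < u.size()) {
        State next = s;
        step(next, u[j]);
        if (can_null_deref(t, u, i, j + 1, next)) return true;
    }
    return false;
}
```

Under these assumptions, can_null_deref("AB", "CD") is false for the source order, while can_null_deref("BA", "CD") is true once A and B are swapped, which is the non-SC behavior a racy program can observe under DRF0.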

Deficiencies of DRF0

  • weak or no semantics for racy programs

  • unintentional data races easy to introduce

  • problematic for

    • debuggability: programmer must assume non-SC behavior for all programs

    • safety: optimization + data race = jump to arbitrary code! [Boehm et al., PLDI 2008]

    • compiler correctness: Java must maintain safety at the cost of complexity [Ševčík & Aspinall, ECOOP 2008]

Our Solution: The DRFx Memory Model






Programming Error

Fatal Runtime Error

  • debuggability: SC for all executions

  • safety: halt program before non-SC behavior exhibited

  • compiler correctness: most sequentially valid optimizations permitted

DRFx Allows Relaxed Data Race Detection

[Diagram: source program (data race free; has data races) mapped to observed behavior (SC behavior). Annotation: simplify detection.]

precise runtime data race detection is slow in software and complex in hardware [Flanagan & Freund 2009] [Prvulovic & Torrellas 2003]

Detecting an SC Violation

X* x = null;
bool init = false;

// Thread t                // Thread u
A: x = new X();            C: if (init)
B: init = true;            D:     x->f++;

Races need not be reported between regions that do not execute concurrently!

region serializable for compiled ⇒ SC for source


region fence

B: init = true;

region fence

C: if(init)

D: x->f++;

region fence

A: x = new X();

region fence

data race, but no SC violation

Insight: compiler can communicate to runtime the regions in which reordering may have occurred

runtime must detect conflicting accesses in regions that execute concurrently.
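
The detection rule can be written down as a small check. The following is a minimal C++ sketch under stated assumptions, not the presentation's hardware design: a region is modeled as a thread id, a logical time interval, and read and write sets of named locations. Region, intersects, concurrent, conflicting, and must_trap are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of one dynamic region: the code a thread executes
// between two region fences.
struct Region {
    int thread;
    int start;  // logical time the region began
    int end;    // logical time the region ended (end > start)
    std::set<std::string> reads;
    std::set<std::string> writes;
};

// True if the two sets share at least one location.
bool intersects(const std::set<std::string>& a,
                const std::set<std::string>& b) {
    for (const auto& loc : a)
        if (b.count(loc)) return true;
    return false;
}

// Two regions execute concurrently if their time intervals overlap.
bool concurrent(const Region& r, const Region& s) {
    return r.start < s.end && s.start < r.end;
}

// Conflicting accesses: different threads touch the same location and
// at least one of the accesses is a write.
bool conflicting(const Region& r, const Region& s) {
    if (r.thread == s.thread) return false;
    return intersects(r.writes, s.writes) ||
           intersects(r.writes, s.reads) ||
           intersects(r.reads, s.writes);
}

// The runtime needs to report only conflicts between concurrent regions.
bool must_trap(const Region& r, const Region& s) {
    return concurrent(r, s) && conflicting(r, s);
}
```

As an illustration, assume thread t runs A and B in one region that writes init and x, and thread u runs C and D in one region that reads init and x and updates x.f. If u's region starts only after t's region has ended, must_trap is false even though the source program has a data race; if the two intervals overlap, must_trap is true.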

DRFx Compiler and Runtime Requirements

  • DRFx Compiler

    • communicate regions in which optimizations were made by using fence instructions

    • synchronization operations in their own regions

    • no speculative memory accesses

  • DRFx Execution Environment

    • trap on conflicting accesses in concurrent regions

    • global order on region fences

    • memory order consistent with fence order


  • compiler requirements

    • how program is split into regions

    • permitted optimizations

      • all non-speculative, sequentially valid optimizations

  • execution environment requirements

    • when conflict may/must be reported

    • memory orderings allowed w.r.t. fences

  • prove

    • no MM exception ⇒ SC behavior for source program

    • MM exception ⇒ data race in source program

Efficient & Simple Conflict Detection

  • perform detection in hardware

  • like transactional memory hardware – but simpler

    • no rollback

    • we control region boundaries

  • compiler bounds number of memory locations dynamically accessed in a region

    • limits optimization opportunities

    • distinguish “bounding” region fence

  • hardware can merge regions separated by a bounding fence when resources available
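
The merging step in the last bullet can be sketched in a few lines. This is an illustrative C++ model, not the presentation's hardware mechanism: each entry is the number of locations one region accessed, consecutive entries are separated by bounding fences, and capacity is an assumed size for the conflict-detection buffers. The name merge_regions is hypothetical, and counts are simply added, which is conservative when neighboring regions touch the same locations.

```cpp
#include <cassert>
#include <vector>

// Greedily merge neighboring regions while their combined access count
// still fits in the assumed hardware capacity.
std::vector<int> merge_regions(const std::vector<int>& sizes, int capacity) {
    std::vector<int> merged;
    for (int n : sizes) {
        if (!merged.empty() && merged.back() + n <= capacity) {
            merged.back() += n;   // resources available: skip the fence
        } else {
            merged.push_back(n);  // fence remains a region boundary
        }
    }
    return merged;
}
```

With region sizes {4, 4, 4, 8, 2}, a capacity of 8 yields {8, 4, 8, 2} and a capacity of 16 yields {12, 10}: more detection resources mean fewer region boundaries at run time.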

Compiler Implementation

  • built conservative DRFx-compliant compiler

    • LLVM [Lattner & Adve 2004]

    • naïve bounding analysis

    • bounding fence at all loop back edges

    • disable speculative optimizations

  • measured performance

    • PARSEC benchmark suite

    • stock x86 hardware – no architectural simulator

DRFx Overhead on PARSEC Benchmarks

slowdown over unmodified, fully optimizing LLVM

Related Work

  • memory models, e.g. [Lamport 1979], [Dubois et al. 1986], [Adve & Hill 1990]

  • hardware race detection, e.g. [Adve et al. 1991], [Muzahid et al. 2009], [Prvulovic & Torrellas 2003]

  • software race detection, e.g. [Yu et al. 2005], [Flanagan & Freund 2009], [Elmas et al. 2007]

  • detecting SC violations [Gharachorloo & Gibbons, SPAA 1991]

  • conflict exception [Lucia et al., ISCA 2010]

    • stronger guarantee : serializability of sync-free regions

    • requires unbounded detection scheme

    • focused on hardware

DRFx Conclusion


lightweight form of data race detection

MM Exception


programmer gets understandable behavior for all programs

compiler may perform most sequentially valid optimizations within regions

straightforward hardware support

compiler restrictions ⇒ only 0% - 7% slowdown