
Analysis of Multithreaded Programs

Martin Rinard

Laboratory for Computer Science

Massachusetts Institute of Technology

What is a multithreaded program?

NOT general parallel programs

No message passing

No tuple spaces

No functional programs

No concurrent constraint programs

NOT just multiple threads of control

No continuations

No reactive systems

[Figure: multiple parallel threads of control perform lock acquire and release operations and read and write a shared mutable memory.]

Why do programmers use threads?
  • Performance (parallel computing programs)
    • Single computation
    • Execute subcomputations in parallel
    • Example: parallel sort
  • Program structuring mechanism (activity management programs)

    • Multiple activities
    • Thread for each activity
    • Example: web server
  • Properties have big impact on analyses
Practical Implications
  • Threads are useful and increasingly common
    • POSIX threads standard for C, C++
    • Java has built-in thread support
    • Widely used in industry
  • Threads introduce complications
    • Programs viewed as more difficult to develop
    • Analyses must handle new model of execution
  • Lots of interesting and important problems!
Outline
  • Examples of multithreaded programs
    • Parallel computing program
    • Activity management program
  • Analyses for multithreaded programs
  • Handling data races
  • Future directions
Example - Divide and Conquer Sort

[Figure, built up across four slides: the array 7 4 6 1 3 5 8 2 is divided into quarters (7 4 | 6 1 | 3 5 | 8 2); each quarter is sorted in parallel in the conquer step (4 7 | 1 6 | 3 5 | 2 8); pairs of sorted quarters are merged into halves in the combine step (1 4 6 7 | 2 3 5 8); and the halves are merged to give 1 2 3 4 5 6 7 8.]
Divide and Conquer Algorithms

  • Lots of Recursively Generated Concurrency
    • Recursively Solve Subproblems in Parallel
    • Combine Results in Parallel
“Sort n Items in d, Using t as Temporary Storage”

void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    spawn sort(d,t,n/4);
    spawn sort(d+n/4,t+n/4,n/4);
    spawn sort(d+2*(n/4),t+2*(n/4),n/4);
    spawn sort(d+3*(n/4),t+3*(n/4),n-3*(n/4));
    sync;
    spawn merge(d,d+n/4,d+n/2,t);
    spawn merge(d+n/2,d+3*(n/4),d+n,t+n/2);
    sync;
    merge(t,t+n/2,t+n,d);
  } else insertionSort(d,d+n);
}

The successive slides annotate this code as follows:
  • Divide the array into subarrays and recursively sort the subarrays in parallel
  • Subproblems are identified using pointers into the middle of the array: d, d+n/4, d+n/2, d+3*(n/4)
  • Sorted results are written back into the input array
  • “Merge Sorted Quarters of d Into Halves of t” (into t and t+n/2)
  • “Merge Sorted Halves of t Back Into d”
  • “Use a Simple Sort for Small Problem Sizes” (insertionSort on d through d+n)
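The spawn/sync code above is Cilk-style pseudocode; the same recursive structure can be expressed with Java's fork/join framework. A minimal sketch, assuming an illustrative class name, cutoff, and merge helper (none of these are from the slides):

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class DCSort extends RecursiveAction {
    static final int CUTOFF = 32;
    final int[] d, t;                 // sort d[lo..lo+n), using t as temporary storage
    final int lo, n;
    DCSort(int[] d, int[] t, int lo, int n) { this.d = d; this.t = t; this.lo = lo; this.n = n; }

    protected void compute() {
        if (n <= CUTOFF) { Arrays.sort(d, lo, lo + n); return; }   // simple sort for small sizes
        int q = n / 4;
        invokeAll(new DCSort(d, t, lo, q),                 // "spawn" four subsorts, then "sync"
                  new DCSort(d, t, lo + q, q),
                  new DCSort(d, t, lo + 2*q, q),
                  new DCSort(d, t, lo + 3*q, n - 3*q));
        merge(d, lo, lo + q, lo + 2*q, t, lo);             // quarters of d into halves of t
        merge(d, lo + 2*q, lo + 3*q, lo + n, t, lo + 2*q);
        merge(t, lo, lo + 2*q, lo + n, d, lo);             // halves of t back into d
    }

    // Merge the sorted runs src[from..mid) and src[mid..to) into dst starting at out.
    static void merge(int[] src, int from, int mid, int to, int[] dst, int out) {
        int i = from, j = mid;
        while (i < mid || j < to)
            dst[out++] = (j >= to || (i < mid && src[i] <= src[j])) ? src[i++] : src[j++];
    }

    public static void main(String[] args) {
        int[] a = {7, 4, 6, 1, 3, 5, 8, 2};
        new ForkJoinPool().invoke(new DCSort(a, new int[a.length], 0, a.length));
        System.out.println(Arrays.toString(a));
    }
}

Unlike the slide code, the two quarter-merges here run sequentially; they could also be forked in parallel, exactly as the spawn/sync version does.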
Key Properties of Parallel Computing Programs
  • Structured form of multithreading
    • Parallelism confined to small region
    • Single thread coming in
    • Multiple threads exist during computation
    • Single thread going out
  • Deterministic computation
    • Tasks update disjoint parts of data structure in parallel without synchronization
    • May also have parallel reductions
[Figure, built up across several slides: the structure of the server. The main loop repeatedly accepts a new connection and starts a new client thread; each client thread repeatedly waits for input and produces output.]
Main Loop

class Main {
  static public void loop(ServerSocket s) {
    Counter c = new Counter();
    while (true) {
      Socket p = s.accept();          // accept new connection
      Worker t = new Worker(p, c);    // start new client thread
      t.start();
    }
  }
}

Worker Threads

class Worker extends Thread {
  Socket s; Counter c;
  public void run() {
    out = s.getOutputStream();
    in = s.getInputStream();
    while (true) {
      inputLine = in.readLine();           // wait for input
      c.increment();                       // increment counter
      if (inputLine == null) break;
      out.writeBytes(inputLine + "\n");    // produce output
    }
  }
}

Synchronized Shared Counter

class Counter {
  int contents = 0;
  synchronized void increment() {   // acquire lock
    contents++;                     // increment counter
  }                                 // release lock
}

Simple Activity Management Programs
  • Fixed, small number of threads
  • Based on functional decomposition

[Figure: a device management thread, a user interface thread, and a compute thread.]

Key Properties of Activity Management Programs
  • Threads manage interactions
    • One thread per client or activity
    • Blocking I/O for interactions
  • Unstructured form of parallelism
  • Object is unit of sharing
    • Mutable shared objects (mutual exclusion)
    • Private objects (no synchronization)
    • Read shared objects (no synchronization)
    • Inherited objects passed from parent to child
Why analyze multithreaded programs?

  • Discover or certify absence of errors (multithreading introduces new kinds of errors)
  • Discover or verify application-specific properties (interactions between threads complicate analysis)
  • Enable optimizations (new kinds of optimizations with multithreading; complications with traditional optimizations)

Deadlock

Deadlock if circular waiting for resources (typically mutual exclusion locks)

Thread 1:        Thread 2:
lock(l);         lock(m);
lock(m);         lock(l);
x = x + y;       y = y * x;
unlock(m);       unlock(l);
unlock(l);       unlock(m);

The slides step through one schedule: threads 1 and 2 start execution; thread 1 acquires lock l; thread 2 acquires lock m; now thread 1 holds l and waits for m, while thread 2 holds m and waits for l. Neither can make progress.
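The same schedule is easy to reproduce in Java; a minimal sketch using java.util.concurrent.locks (the class name and initial values are illustrative):

import java.util.concurrent.locks.ReentrantLock;

class DeadlockDemo {
    static final ReentrantLock l = new ReentrantLock();
    static final ReentrantLock m = new ReentrantLock();
    static int x = 1, y = 2;

    public static void main(String[] args) {
        new Thread(() -> {                  // thread 1: acquires l, then m
            l.lock(); m.lock();
            x = x + y;
            m.unlock(); l.unlock();
        }).start();
        new Thread(() -> {                  // thread 2: acquires m, then l (opposite order)
            m.lock(); l.lock();
            y = y * x;
            l.unlock(); m.unlock();
        }).start();
        // If thread 1 gets l and thread 2 gets m before either takes its second lock,
        // both lock() calls block forever. The standard fix is a global lock order:
        // every thread acquires l before m.
    }
}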
Data Races

Data race if two parallel threads access same memory location and at least one access is a write

Data race:     A[i] = v || A[j] = w    (parallel threads; i may equal j)
No data race:  A[i] = v; A[j] = w;     (sequential accesses in one thread)

Synchronization and Data Races

No data race if synchronization separates accesses

Thread 1:

lock(l);

x = x + 1;

unlock(l);

Thread 2:

lock(l);

x = x + 2;

unlock(l);

Synchronization protocol: associate a lock with the data; acquire the lock to update the data atomically

Why are data races errors?
  • Exist correct programs which contain races
  • But most races are programming errors
    • Code intended to execute atomically
    • Synchronization omitted by mistake
  • Consequences can be severe
    • Nondeterministic, timing-dependent errors
    • Data structure corruption
    • Complicates analysis and optimization
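The lost-update pattern behind most of these errors fits in a few lines; a small demonstration sketch (the class name and iteration count are illustrative):

class RaceDemo {
    static int x = 0;   // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable r = () -> {
            for (int i = 0; i < 100_000; i++)
                x = x + 1;                 // read-modify-write, intended to be atomic
        };
        Thread a = new Thread(r), b = new Thread(r);
        a.start(); b.start();
        a.join(); b.join();
        // Typically prints less than 200000: increments from the two threads
        // interleave and overwrite each other, nondeterministically.
        System.out.println(x);
    }
}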
Overview of Analyses for Multithreaded Programs

Key problem: interactions between threads

  • Flow-insensitive analyses
  • Escape analyses
  • Dataflow analyses
    • Explicit parallel flow graphs
    • Interference summary analysis
  • State space exploration
Program With Allocation Sites

[Figure, built up across three slides: a program with procedures main, compute, evaluate, multiplyAdd, abs, scale, multiply, and add, their bodies elided, containing object allocation sites. Captions: correlate lifetimes of objects with lifetimes of computations; objects allocated at a given site do not escape the computation of this method.]
Classical Approach
  • Reachability analysis
  • If an object is reachable only from local variables of current procedure, then object does not escape that procedure
Escape Analysis for Multithreaded Programs
  • Extend analysis to recognize when objects do not escape to parallel thread – OOPSLA 1999
    • Blanchet
    • Bogda, Hoelzle
    • Choi, Gupta, Serrano, Sreedhar, Midkiff
    • Whaley, Rinard
  • Analyze interactions to recapture objects that do not escape multithreaded subcomputation
    • Salcianu, Rinard – PPoPP 2001
Applications
  • Synchronization elimination
  • Stack allocation
  • Region-based allocation
  • Data race detection

Eliminate accesses to captured objects as source of data races

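A tiny Java example of the kind of object these analyses capture (the method and class names are illustrative):

class EscapeDemo {
    static int sum(int[] a) {
        // v is reachable only from a local variable: it is never stored in a field,
        // returned, or passed to another thread, so it does not escape sum.
        java.util.Vector<Integer> v = new java.util.Vector<>();
        for (int x : a) v.add(x);     // Vector.add is synchronized; since v is captured,
                                      // the lock acquire/release can be eliminated
        int s = 0;
        for (int x : v) s += x;
        return s;                     // v is also a candidate for stack/region allocation
    }
}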
Parallel Flow Graphs

Basic Idea: Do dataflow analysis on parallel flow graph

[Figure: thread 1 executes p = &x; *p = &y; p = &z and thread 2 executes q = &a; *q = &b; the heap contains x, y, z, a, b and the pointers p and q. Intrathread control-flow edges connect consecutive statements within a thread; interthread control-flow edges connect statements in parallel threads to model interleavings.]
Infeasible Paths Issue

Infeasible paths cause analysis to lose precision

[Figure: the same parallel flow graph. Following interthread edges, information can flow from p = &z back through thread 2 to *p = &y, a path no real execution takes. Because of this infeasible path, the analysis thinks p may point to z at *p = &y, and so concludes that z, as well as x, may point to y.]
Analysis Time Issue

[Figure: the parallel flow graph needs interthread edges between every pair of statements in parallel threads (thread 1: p = &x; *p = &y; p = &z; thread 2: q = &a; *q = &b), so the graph, and the analysis time, can blow up.]

Potential Solutions
  • Partial order approaches - remove edges between statements in independent regions
    • How to recognize independent regions? Seems like might need analysis…
  • Control flow/synchronization analysis
    • Synchronization may prevent m from immediately preceding n in execution
    • If so, no edge from m to n

Thread 1:        Thread 2:
x = 1            y = 1
lock(a)          lock(a)
x = x + v        y = y + w
y = y + 1        x = x + 1
unlock(a)        unlock(a)

No edges between the statements inside the two critical sections (the lock serializes them).

Experience
  • Lots of research in field over last two decades
    • Deadlock detection
    • Data race detection
    • Control analysis for multithreaded programs (mutual exclusion, precedence properties)
    • Finite-state properties
  • Scope – simple activity management programs
    • Inlinable programs
    • Bounded threads and objects
References
  • FLAVERS
    • Dwyer, Clarke - FSE 1994
    • Naumovich, Avrunin, Clarke – FSE 1999
    • Naumovich, Clarke, Cobleigh – PASTE 1999
  • Masticola, Ryder
    • ICPP 1990 (deadlock detection)
    • PPoPP 1993 (control-flow analysis)
  • Duesterwald, Soffa - TAV 1991
    • Handles procedures
  • Blieberger, Burgstaller, Scholz – Ada Europe 2000
    • Symbolic analysis for dynamic thread creation
  • Scope
  • Inlinable programs
  • Bounded objects and threads
Dataflow Analysis for Bitvector Problems
  • Knoop, Steffen, Vollmer – TOPLAS 1996
  • Bitvector problems
    • Dataflow information is a vector of bits
    • Transfer function for one bit does not depend on values of other bits
    • Examples
      • Reaching definitions
      • Available expressions
  • As efficient and precise as sequential version!
Available Expressions Example

Where is x + y available?

a = x + y
parbegin
  Thread 1: x = b; c = x + y
  Thread 2: b = x + y
parend
d = x + y

  • Available after a = x + y, and still available at the entry of both threads
  • Not available at c = x + y (killed by x = b)
  • Not available at b = x + y (x = b may execute in parallel)
  • Available at d = x + y

Three Interleavings

[x = b; c = x + y; b = x + y], [x = b; b = x + y; c = x + y], and [b = x + y; x = b; c = x + y]. In every interleaving, c = x + y recomputes x + y after the kill x = b, so x + y is available at d = x + y; at statements that may run in parallel with x = b, it is not.

Key Concept: Interference

  • x=b interferes with x+y
  • x+y not available at any statement that executes in parallel with x=b
  • Nice algorithm:
    • Precompute interference
    • Propagate information along sequential control-flow edges only!
    • Handle parallel joins specially


Limitations
  • No procedures
  • Bitvector problems only (no pointer analysis)
  • But can remove these limitations
    • Integrate interference into abstraction
      • Adjust rules to flow information from end of thread to start of parallel threads
      • Iteratively compute interactions
    • Summary-based approach for procedures
    • Lose precision for non-bitvector problems
Pointer Analysis for Multithreaded Programs
  • Dataflow information is a triple <C, I, E>:
    • C = current points-to information
    • I = interference points-to edges from parallel threads
    • E = set of points-to edges created by current thread
  • Interference: I_k = U_{j != k} E_j, where t_1, …, t_n are the n parallel threads
  • Invariant: I ⊆ C
    • Within each thread, interference points-to edges are always added to the current information
Analysis for Example

p = &x;
parbegin
  *p = 1;   ||   p = &y;
parend
*p = 2;

Where does p point to at *p = 1? Where does p point to at *p = 2?

The slides step through the analysis, writing the triples <C, I, E> with points-to edges like p→x:

  • After p = &x: <{p→x}, ∅, {p→x}>
  • At parbegin, each thread starts with <{p→x}, ∅, ∅>; E is reset so it records only the edges the thread itself creates
  • Thread 1: *p = 1 creates no points-to edges, so its triple is unchanged
  • Thread 2: after p = &y, its triple is <{p→y}, ∅, {p→y}>
  • Interference: thread 2's created edge p→y is added to thread 1's C and I, giving <{p→x, p→y}, {p→y}, ∅> at *p = 1, so p may point to x or y there; thread 2 is unaffected because E1 = ∅
  • At parend (thread join), the parent combines the exit triples: <C1 U C2, I, E U E1 U E2> = <{p→x, p→y}, ∅, {p→x, p→y}>

Final Result: at *p = 2, p may point to x or y.
General Dataflow Equations

Parent thread before parbegin:  <C, I, E>
Thread 1 entry:  <C U E2, I U E2, ∅>        Thread 2 entry:  <C U E1, I U E1, ∅>
Thread 1 exit:   <C1, I U E2, E1>           Thread 2 exit:   <C2, I U E1, E2>
Parent thread after parend:  <C1 U C2, I, E U E1 U E2>

Each thread's entry information includes the edges created by its parallel sibling, in both C and I; at parend the parent takes the union of the threads' exit points-to sets and accumulates all created edges.
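As a sanity check, the parbegin/parend rules can be written down directly over edge sets; a toy Java encoding (the representation and names are illustrative, not from the slides):

import java.util.HashSet;
import java.util.Set;

class Triple {
    final Set<String> C, I, E;     // points-to edges encoded as strings like "p->x"
    Triple(Set<String> C, Set<String> I, Set<String> E) { this.C = C; this.I = I; this.E = E; }

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> r = new HashSet<>(a);
        r.addAll(b);
        return r;
    }

    // Entry triple for one thread at parbegin, given the parent's triple and the
    // edges created by the parallel sibling thread: <C U Eother, I U Eother, empty>.
    static Triple forkEntry(Triple parent, Set<String> siblingE) {
        return new Triple(union(parent.C, siblingE), union(parent.I, siblingE), new HashSet<>());
    }

    // Parent triple after parend: <C1 U C2, I, E U E1 U E2>.
    static Triple join(Triple parent, Triple t1, Triple t2) {
        return new Triple(union(t1.C, t2.C), parent.I, union(parent.E, union(t1.E, t2.E)));
    }
}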
Compositionality Extension
  • Compositional at thread level
    • Analyze each thread once in isolation
    • Abstraction captures potential interactions
  • Compute interactions whenever need information
  • Combine with escape analysis to obtain partial program analysis
Experience & Expectations
  • Limited implementation experience
    • Pointer analysis (Rugina, Rinard – PLDI 2000)
    • Compositional pointer and escape analysis (Salcianu, Rinard – PPoPP 2001)
    • Small but real programs
  • Promising approach
    • Scales like analyses for sequential programs
    • Partial program analyses
Issues
  • Developing abstractions
    • Need interference abstraction
    • Need fork/join rules
    • Need interaction analysis
  • Analysis time
  • Precision for richer abstractions
State Space Exploration for Multithreaded Programs

/* a controls x, b controls y */

lock a, b;

int x, y;

Thread 1:

lock(a)

lock(b)

t = x

x = y

y = t

unlock(b)

unlock(a)

Thread 2:

lock(b)

lock(a)

s = y

y = x

x = s

unlock(a)

unlock(b)

State Space Exploration

[Figure: a search tree over the interleavings. From the initial state, either thread 1 executes lock(a) or thread 2 executes lock(b); exploring both orders reaches the states in which thread 1 holds a and thread 2 holds b - the deadlocked states.]
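A toy version of this search over just the lock-acquire prefixes of the two threads; a sketch under an illustrative encoding (releases are omitted):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class DeadlockSearch {
    // The lock-acquire prefix of each thread from the example:
    // thread 1 takes a then b, thread 2 takes b then a.
    static final String[][] prog = { {"a", "b"}, {"b", "a"} };

    public static void main(String[] args) {
        explore(new int[]{0, 0}, new HashMap<>());
    }

    // pc[t] = index of thread t's next acquire; held maps each taken lock to its owner.
    static void explore(int[] pc, Map<String, Integer> held) {
        boolean ran = false, unfinished = false;
        for (int t = 0; t < prog.length; t++) {
            if (pc[t] >= prog[t].length) continue;        // thread t already finished
            unfinished = true;
            String lock = prog[t][pc[t]];
            if (held.containsKey(lock)) continue;         // thread t is blocked on this lock
            ran = true;
            Map<String, Integer> h = new HashMap<>(held);
            h.put(lock, t);
            int[] p = pc.clone();
            p[t]++;
            explore(p, h);                                // explore this interleaving branch
        }
        if (!ran && unfinished)                           // someone is stuck and nobody can move
            System.out.println("deadlocked state: held=" + held + ", pc=" + Arrays.toString(pc));
    }
}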
Strengths
  • Conceptually simple (at least at first…)
  • Harmony with other areas of computer science

(simple search often beats more sophisticated approaches)

  • Can test for lots of properties and errors
  • Lots of technology and momentum in this area
    • Packaged model checkers
    • Big successes in hardware verification
Challenges
  • Analysis time
  • Unbounded program features
    • Dynamic thread creation
    • Dynamic object creation
  • Potential solutions
    • Sophisticated abstractions (increases complexity…)
      • Cousot, Cousot - 1984
      • Chow, Harrison – POPL 1992
      • Yahav – POPL 2001
    • Granularity coarsening/partial-order techniques
      • Chow, Harrison – ICCL 1994
      • Valmari – CAV 1990
      • Godefroid, Wolper – LICS 1991
Granularity Coarsening

Basic Idea: Eliminate Analysis of Interleavings from Independent Statements

[Figure: thread 1 executes x = 1; y = 2 and thread 2 executes a = 3; b = 4. The statements are independent, so instead of exploring every interleaving of the four assignments, each thread's sequence can be analyzed as one coarse atomic step.]
Issue: Aliasing

Are these two statements independent?

x = 1        *p = 3

Depends… (on whether p can point to x)

  • Potential Solution: Layered analysis (Ball, Rajamani - PLDI 2001)

    Program → Pointer Analysis → Model Extraction → Model Checking → Properties

    • Potential Problem: Information from later analyses may be needed or useful in previous analyses
Experience
  • Program analysis style
    • Has been used for very detailed properties
    • Analysis time issues limit to tiny programs
  • Explicit model extraction/model checking style
    • Still exploring how to work for software in general, not just multithreaded programs
    • No special technology required for multithreaded programs (at first …)
Expectations

In principle, approach should be quite useful

  • Multithreaded programs typically have sparse interaction patterns
  • But the sparsity is not obvious from the code
  • Need some way to target the tool at only those interactions that can actually occur or are interesting
  • Pointer preanalysis seems like a promising approach
Application to safety problems
  • Deadlock detection
    • Variety of existing approaches
    • Complex programs can have very simple synchronization behavior
    • Ripe for model extraction/model checking
  • Data race detection
    • More complicated problem
    • Largely unsolved
    • Very important in practice
Why data races are so important
  • Inadvertent atomicity violations
    • Timing-dependent data structure corruption
    • Nondeterministic, irreproducible failures
  • Architecture effects
    • Data races expose weak memory consistency models
    • Destroy abstraction of single shared memory
  • Compiler optimization effects
    • Data races expose effect of standard optimizations
    • Compiler can change meaning of program
  • Analysis complications
Atomicity Violations

class list {
  static int length = 0;
  static list head = null;
  list next; int value;
  static void insert(int i) {
    list n = new list(i);
    n.next = head;
    head = n;
    length++;
  }
}

[Figure, stepped through across several slides: insert(5) || insert(6) run in parallel on a list whose head points to a node 4, with length 1. Both threads read the same head, so both new nodes set next to the node 4; the writes to head and the two length++ updates then interleave, leaving the list structure and the length counter inconsistent.]

Atomicity Violation Solution

class list {
  static int length = 0;
  static list head = null;
  list next; int value;
  synchronized static void insert(int i) {
    list n = new list(i);
    n.next = head;
    head = n;
    length++;
  }
}

Making insert synchronized forces each call to execute atomically, so the interleaving above cannot occur.
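A self-contained harness makes the corruption observable; a hypothetical sketch (the constructor, the trial loop, and the check are added for illustration, not from the slides):

class RacyList {
    static int length = 0;
    static RacyList head = null;
    RacyList next; int value;
    RacyList(int i) { value = i; }

    static void insert(int i) {            // unsynchronized, as on the slide
        RacyList n = new RacyList(i);
        n.next = head;
        head = n;
        length++;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int trial = 0; trial < 10_000; trial++) {
            head = new RacyList(4);        // start from a one-element list
            length = 1;
            Thread t1 = new Thread(() -> insert(5));
            Thread t2 = new Thread(() -> insert(6));
            t1.start(); t2.start();
            t1.join(); t2.join();
            int nodes = 0;
            for (RacyList n = head; n != null; n = n.next) nodes++;
            if (nodes != 3 || length != 3)  // a node or an increment was lost
                System.out.println("trial " + trial + ": nodes=" + nodes + ", length=" + length);
        }
    }
}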
Analysis Complications

The analysis is unsound if it does not take the effects of data races into account

  • Desirable to analyze program at granularity of atomic operations
    • Reduces state space
    • Required to extract interesting properties
  • But must verify that operations are atomic!
  • Complicated analysis problem
    • Extract locking protocol
    • Verify that program obeys protocol

Architecture Effects

Weak Memory Consistency Models

Initially: x = 0, y = 1

Thread 1:        Thread 2:
y = 0            z = x + y
x = 1

What is the value of z?

Three Interleavings

[z = x + y; y = 0; x = 1]  gives z = 1
[y = 0; z = x + y; x = 1]  gives z = 0
[y = 0; x = 1; z = x + y]  gives z = 1

So z can be 0 or 1? INCORRECT REASONING! The memory system can reorder writes as long as it preserves the illusion of sequential execution within each thread, and different threads can observe different orders. If thread 2 observes the write x = 1 but not the earlier write y = 0, it computes z = 1 + 1. So z can be 0 or 1 OR 2!

Analysis Complications

  • Interleaving semantics is incorrect
    • No soundness guarantee for current analyses
    • Formal semantics of weak memory consistency models still under development
      • Maessen, Arvind, Shen – OOPSLA 2000
      • Manson, Pugh – Java Grande/ISCOPE 2001
    • Unclear how to prove ANY analysis sound…
  • State space is larger than one might think
    • Complicates state space exploration
    • Complicates human reasoning
How does one write a correct program?

Initially: x = 0, y = 1

Thread 1:        Thread 2:
lock(l)          lock(l)
y = 0            z = x + y
x = 1            unlock(l)
unlock(l)

  • Operations not reordered across synchronizations
  • If synchronization separates conflicting actions from parallel threads, then reorderings not visible
  • Race-free programs can use interleaving semantics

What is the value of z? z is 1
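In Java the same discipline looks like this; a minimal sketch (the class and lock names are illustrative):

class SafeReaderWriter {
    static final Object lock = new Object();
    static int x = 0, y = 1, z;

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            synchronized (lock) {     // writes cannot be reordered past the lock release
                y = 0;
                x = 1;
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lock) {     // reader sees either both writes or neither
                z = x + y;
            }
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(z);        // always 1: 0+1 if t2 ran first, 1+0 if t1 ran first
    }
}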
Compiler Optimization Effects
  • Standard optimizations assume single thread
  • With interleaving semantics, optimizations may change meaning of program
  • Even if optimizations are applied only within serial parts of the program!
    • Superset of reordering effects
    • Midkiff, Padua – ICPP 1990
Options
  • Rethink and reimplement all compilers
    • Lee, Padua, Midkiff – PPoPP 1999
  • Transform program to restore sequential memory consistency model
    • Shasha, Snir – TOPLAS 1998
    • Lee, Padua – PACT 2000
  • No optimizations across synchronizations
    • Java memory model (Pugh - JavaGrande 1999)
    • Semantics no longer interleaving semantics
Program Analysis

Analyze program, verify absence of data races

  • Appealing option
  • Unlikely to be feasible for full range of programs
    • Reconstruct association between locks, data that they protect, threads that access data
      • Dynamic object and thread creation
      • References and pointers
      • Diversity of locking protocols
    • Whole-program analysis
  • Exception: simple activity management programs

Eliminate races at language level
  • Type system formalizes sharing patterns
  • Check accesses properly synchronized
  • Not as difficult as fully automatic approach
    • Separate analysis of each module
    • No need to reconstruct locking protocol
    • Types provide locking information
  • Limits sharing patterns program can use
  • Key question: Is limitation worth benefit?
    • Depends on expressiveness, flexibility, intrusiveness, perceived value of system
Standard Sharing Patterns for Activity Management Programs
  • Private data - single thread ownership
  • Mutual exclusion data - lock protects data; acquire lock to get ownership
  • Migrating data - ownership moves between threads in response to data structure insertions and removals
  • Published data - distributed for read-only access
General Principle of Ownership
  • Formalize as ownership relation
  • Relation between data items and threads
  • Basic requirement for reads
    • When a thread reads a data item
    • Must own item (but can share ownership with other threads)
  • Basic requirement for writes
    • When a thread writes data item
    • Must be sole owner of item
Typical Actions to Change Ownership
  • Object creation (creator owns new object)
  • Synchronization operations
    • Lock acquire (acquire data that lock protects)
    • Lock release (release data)
    • Similarly for post/wait, Ada accept, …
  • Thread creation (thread inherits data from parent)
  • Thread termination (parent gets data back)
  • Unique reference acquisition and release (acquire or release referenced data)

Proposed Systems
  • Monitors + copy in/copy out
    • Concurrent Pascal (Brinch Hansen TSE 1975)
    • Guava (Bacon, Strom, Tarafdar – OOPSLA 2000)
  • Mutual exclusion data + private data
    • Flanagan, Abadi – ESOP 2000
    • Flanagan, Freund – PLDI 2000
  • Mutual exclusion data + private data + linear/ownership types
    • DeLine, Fahndrich – PLDI 2001
    • Boyapati, Rinard – OOPSLA 2001

Basic Approach

  • Thread + Private Data
  • Private data identified as such in type system
  • Type system ensures reachable only from
    • Local variables
    • Other private data
  • Lock + Shared Data
  • Type system identifies correspondence
  • Type system ensures
    • Threads hold lock when access data
    • Data accessible only from other data protected by same lock

Copy model of communication

Extension: Unique References

The same structure as the basic approach, plus: the type system ensures at most one reference to a transferred object. The slides step through a hand-off between threads:
  • Step One: Grab Lock
  • Step Two: Transfer Reference
  • Step Three: Release Lock
  • Result: Transferred Object

Ownership Relation Changes Over Time
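A minimal Java sketch of this hand-off pattern (plain Java cannot enforce the uniqueness itself; the type systems cited above do - the class here is illustrative):

class Mailbox<T> {
    private T slot;                  // shared data protected by the mailbox's lock

    synchronized void put(T msg) {   // step one: grab lock
        slot = msg;                  // step two: transfer the reference
    }                                // step three: release lock

    synchronized T take() {
        T m = slot;
        slot = null;                 // the mailbox gives up its reference
        return m;
    }
}

// Intended usage: after mb.put(data), the sending thread also clears its own
// local reference (data = null) so the object has a single owner at a time.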
Prospects
  • Remaining challenge: general data structures
    • Objects with multiple references
    • Ownership changes correlated with movements between data structures
    • Recognize insertions and deletions
  • Language-level solutions are the way to go for activity management programs
    • Tractable for typical sharing patterns
    • Big impact in practice
Benefits of ownership formalization
  • Identification of atomic regions
    • Weak memory invisible to programmer
    • Enables coarse-grain program analysis
  • Promote lots of new and interesting analyses
    • Component interaction analyses
    • Object propagation analyses
  • Better understanding of software structure
    • Analysis and transformation
    • Software engineering
Parallel Computing Sharing Patterns

Specialized Sharing Patterns

  • Unsynchronized accesses to disjoint regions of a single aggregate structure
    • Threads update disjoint regions of array
    • Threads update disjoint subtrees
  • Generalized reductions
    • Commuting updates
    • Reduction trees
Parallel Computing Prospects
  • No language-level solution likely to be feasible
    • Race freedom depends on arbitrarily complicated properties of updated data structures
  • Impact of data races not as large
    • Parallelism confined to specific algorithms
  • Range of targeted analysis algorithms
    • Parallel loops with dense matrices
    • Divide and conquer programs
    • Generalized reduction recognition
Integrating Specifications
  • Past focus: discovering properties
  • Future focus: verifying properties
  • Understanding atomicity structure crucial
    • Assume race-free programs
    • Type system or previous analysis
  • Enable Owicki/Gries style verification
    • Assume property holds
    • Show that each atomic action preserves it
    • Consider only actions that affect property
Failure Containment
  • Threads as unit of partial failure
  • Partial executions of failed atomic actions
    • Rollback mechanism
    • Optimization opportunity
  • New analyses and transformations
    • Failure propagation analysis
    • Failure response transformations
Model Checking
  • Avalanche of model checking research
    • Layered analyses for model extraction
    • Flow-insensitive pointer analysis
  • Initial focus on control problems
    • Deadlock detection
    • Operation sequencing constraints
    • Checking finite-state properties
Steps towards practicality
  • Java threads prompt experimentation
    • Threads as standard part of safe language
    • Available multithreaded benchmarks
    • Open Java implementation platforms
  • More implementations
    • Interprocedural analyses
    • Scalability emerges as key concern
    • Directs analyses to relevant problems
Summary
  • Multithreaded programs common and important
  • Two kinds of multithreaded programs
    • Parallel computing programs
    • Activity management programs
  • Data races as key analysis problem
    • Programming errors
    • Complicate analysis and transformation
  • Different solutions for different programs
    • Language solution for activity management
    • Targeted analyses for parallel computing
  • Future directions – specifications, failure containment, model checking, practical implementations