Program Analysis and Synthesis of Parallel Systems
PowerPoint presentation, 95 slides – Roman Manevich, Ben-Gurion University

Presentation Transcript
Program Analysis and Synthesis of Parallel Systems

Roman Manevich, Ben-Gurion University

Three papers

A Shape Analysis for Optimizing Parallel Graph Programs [POPL'11]

Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA'12]

Parameterized Verification of Transactional Memories [PLDI'10]

What's the connection?

From analysis to language design:
  A Shape Analysis for Optimizing Parallel Graph Programs [POPL'11]
  → Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA'12]
  (creates opportunities for more optimizations; requires other analyses)

Similarities between abstract domains:
  A Shape Analysis [POPL'11] → Parameterized Verification of Transactional Memories [PLDI'10]


A Shape Analysis for Optimizing Parallel Graph Programs

Roman Manevich², Kathryn S. McKinley¹, Dimitrios Prountzos¹, Keshav Pingali¹,²

1: Department of Computer Science, The University of Texas at Austin
2: Institute for Computational Engineering and Sciences, The University of Texas at Austin

Motivation
  • Graph algorithms are ubiquitous: computer graphics, computational biology, social networks
  • Goal: compiler analysis for optimization of parallel graph algorithms

Boruvka's Minimum Spanning Tree Algorithm

[Figure: example graph with nodes a–g and weighted edges; node a merges with its lightest neighbor lt, contracting into node a,c]

  • Build MST bottom-up:
    repeat {
      pick arbitrary node 'a'
      merge with lightest neighbor 'lt'
      add edge 'a-lt' to MST
    } until graph is a single node
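The loop above can be sketched as runnable sequential Java. This is a hypothetical adjacency-map representation, not the Galois Graph ADT from the paper; contracting a component merges adjacency maps, keeping the lighter of any resulting parallel edges.

```java
import java.util.*;

// Sequential sketch of the Boruvka loop from the slide.
public class Boruvka {
    // adjacency: node -> (neighbor -> edge weight), kept symmetric
    public static int mstWeight(Map<Integer, Map<Integer, Integer>> g) {
        int mst = 0;
        Deque<Integer> wl = new ArrayDeque<>(g.keySet());
        while (!wl.isEmpty()) {
            int a = wl.poll();                        // pick arbitrary node 'a'
            Map<Integer, Integer> aN = g.get(a);
            if (aN == null || aN.isEmpty()) continue; // already merged / isolated
            int lt = -1, minW = Integer.MAX_VALUE;    // lightest neighbor 'lt'
            for (Map.Entry<Integer, Integer> e : aN.entrySet())
                if (e.getValue() < minW) { minW = e.getValue(); lt = e.getKey(); }
            mst += minW;                              // add edge 'a-lt' to MST
            aN.remove(lt);
            Map<Integer, Integer> ltN = g.remove(lt); // merge lt into a
            ltN.remove(a);
            for (Map.Entry<Integer, Integer> e : ltN.entrySet()) {
                int n = e.getKey(), w = e.getValue();
                g.get(n).remove(lt);                  // redirect n's edge lt -> a
                Integer an = aN.get(n);
                int best = (an == null) ? w : Math.min(an, w);
                aN.put(n, best);
                g.get(n).put(a, best);
            }
            wl.add(a);                                // re-process merged node
        }
        return mst;
    }

    static void edge(Map<Integer, Map<Integer, Integer>> g, int u, int v, int w) {
        g.computeIfAbsent(u, k -> new HashMap<>()).put(v, w);
        g.computeIfAbsent(v, k -> new HashMap<>()).put(u, w);
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Integer>> g = new HashMap<>();
        edge(g, 0, 1, 4); edge(g, 0, 2, 1); edge(g, 1, 2, 2); edge(g, 2, 3, 7);
        System.out.println(mstWeight(g)); // MST uses edges of weight 1, 2, 7
    }
}
```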
Parallelism in Boruvka

[Figure: the same example graph; several nodes can merge with their lightest neighbors independently]

  • Build MST bottom-up:
    repeat {
      pick arbitrary node 'a'
      merge with lightest neighbor 'lt'
      add edge 'a-lt' to MST
    } until graph is a single node
Non-conflicting iterations

[Figure: two disjoint merges (a with c, f with g) touch different parts of the graph and can proceed in parallel]

  • Build MST bottom-up:
    repeat {
      pick arbitrary node 'a'
      merge with lightest neighbor 'lt'
      add edge 'a-lt' to MST
    } until graph is a single node
Non-conflicting iterations

[Figure: the graph after both parallel merges, with contracted nodes a,c and f,g]

  • Build MST bottom-up:
    repeat {
      pick arbitrary node 'a'
      merge with lightest neighbor 'lt'
      add edge 'a-lt' to MST
    } until graph is a single node
Conflicting iterations

[Figure: two overlapping merges touch the same nodes and conflict; only one of them may proceed]

  • Build MST bottom-up:
    repeat {
      pick arbitrary node 'a'
      merge with lightest neighbor 'lt'
      add edge 'a-lt' to MST
    } until graph is a single node
Optimistic parallelization in Galois

[Figure: three speculative activities i1, i2, i3 operating on different regions of the graph]

  • Programming model
    • Client code has sequential semantics
    • Library of concurrent data structures
  • Parallel execution model
    • Thread-level speculation (TLS)
    • Activities executed speculatively
  • Conflict detection
    • Each node/edge has associated exclusive lock
    • Graph operations acquire locks on read/written nodes/edges
    • Lock owned by another thread ⇒ conflict ⇒ iteration rolled back
    • All locks released at the end
  • Two main overheads
    • Locking
    • Undo actions
Generic optimization structure

Program → Program Analyzer → Annotated Program → Program Transformer → Optimized Program

Overheads (I): locking
  • Optimizations
    • Redundant locking elimination
    • Lock removal for iteration private data
    • Lock removal for lock domination
  • ACQ(P): set of definitely acquired locks per program point P
  • Given method call M at P:

Locks(M) ⊆ ACQ(P) ⇒ redundant locking
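The check can be illustrated with a toy must-analysis over a straight-line sequence of steps. The `Step` representation is hypothetical; the real analysis runs over the program's CFG and an abstract heap, but the subset test is the same.

```java
import java.util.*;

// Toy illustration of the redundant-locking check: ACQ(P) is the set of locks
// definitely acquired before point P; a call whose requirement Locks(M) is a
// subset of ACQ(P) needs no locking of its own.
public class LocksetSketch {
    // Each step either acquires locks or calls a method needing locks.
    record Step(boolean isAcquire, Set<String> locks) {}

    // Returns indices of call steps whose locking is redundant.
    public static List<Integer> redundantCalls(List<Step> steps) {
        Set<String> acq = new HashSet<>();   // must-be-held locks (only grows here)
        List<Integer> redundant = new ArrayList<>();
        for (int i = 0; i < steps.size(); i++) {
            Step s = steps.get(i);
            if (s.isAcquire()) acq.addAll(s.locks());
            else if (acq.containsAll(s.locks())) redundant.add(i); // Locks(M) ⊆ ACQ(P)
        }
        return redundant;
    }

    public static void main(String[] args) {
        List<Step> steps = List.of(
            new Step(true,  Set.of("a", "lt")),    // locks on a and lt acquired
            new Step(false, Set.of("a")),          // needs only a: redundant
            new Step(false, Set.of("a", "n")));    // also needs n: not redundant
        System.out.println(redundantCalls(steps)); // prints [1]
    }
}
```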

Overheads (II): undo actions

foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}

The lockset grows during the first graph operations of the iteration; once it is stable, the remaining program points are failsafe.

Program point P is failsafe if: ∀Q . Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)

Lockset analysis

GSet<Node> wl = new GSet<Node>();
wl.addAll(g.getNodes());
GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}

    • Redundant locking: Locks(M) ⊆ ACQ(P)
    • Undo elimination: ∀Q . Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
  • Need to compute ACQ(P)

The optimization, technically
    • Each graph method m(arg1,…,argk, flag) takes an optimization-level flag:
      • flag=LOCK – acquire locks
      • flag=UNDO – log undo (backup) data
      • flag=LOCK_UNDO – (default) acquire locks and log undo
      • flag=NONE – no extra work
  • Example: Edge e = g.getEdge(lt, n, NONE)
Analysis challenges
  • The usual suspects:
    • Unbounded memory ⇒ undecidability
    • Aliasing, destructive updates
  • Specific challenges:
    • Complex ADTs: unstructured graphs
    • Heap objects are locked
    • Adapt abstraction to ADTs
  • We use Abstract Interpretation [CC'77]
    • Balance precision and realistic performance
Shape analysis overview

ADT specifications (with virtual @rep fields):
  Graph { @rep nodes; @rep edges }
  Set { @rep cont }

The shape analysis (with predicate discovery) takes Boruvka.java plus the Graph and Set specs and produces an optimized Boruvka.java. The concrete ADT implementations in the Galois library (HashMap-based Graph, tree-based Set) are assumed to satisfy the specs.

ADT specification

Abstract ADT state by virtual set fields:

Graph<ND,ED> {
  @rep set<Node> nodes
  @rep set<Edge> edges

  @locks(n + n.rev(src) + n.rev(src).dst +
         n.rev(dst) + n.rev(dst).src)
  @op(
    nghbrs = n.rev(src).dst + n.rev(dst).src,
    ret = new Set<Node<ND>>(cont=nghbrs)
  )
  Set<Node> neighbors(Node n);
}

Client code (Boruvka.java):
  ...
  Set<Node> S1 = g.neighbors(n);
  ...

Assumption: the implementation satisfies the spec.

Modeling ADTs

Graph<ND,ED> {
  @rep set<Node> nodes
  @rep set<Edge> edges
  @locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)
  @op(
    nghbrs = n.rev(src).dst + n.rev(dst).src,
    ret = new Set<Node<ND>>(cont=nghbrs)
  )
  Set<Node> neighbors(Node n);
}

[Figure: concrete graph with nodes a, b, c connected through edge objects via src/dst fields]

Modeling ADTs

(Same Graph spec as above.)

[Figure: abstract state after calling neighbors(n): the virtual sets nodes and edges, the nghbrs set computed from src/dst edges, and the returned Set object ret with cont = nghbrs]

Abstraction scheme

[Figure: two set objects S1, S2 with locked contents, abstracted to the aliasing configuration (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)]

  • Parameterized by a set of LockPaths:
    L(Path) ≡ ∀o . o ∈ Path ⇒ Locked(o)
    • Tracks a subset of must-be-locked objects
  • Abstract domain elements have the form: Aliasing-configs → 2^LockPaths → …
Joining abstract states

[Figure: joining two abstract states, each a disjunction of (aliasing-configuration, LockPaths) pairs; matching configurations join their LockPath sets]

  • Aliasing is crucial for precision
  • May-be-locked does not enable our optimizations
  • #Aliasing-configs: small constant (6)

Example invariant in Boruvka

GSet<Node> wl = new GSet<Node>();
wl.addAll(g.getNodes());
GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}

The immediate neighbors of a and lt are locked:

(a ≠ lt)
∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst))
∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src)
∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src))
∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst)
∧ …

Heuristics for finding LockPaths

[Figure: type hierarchy S : Set<Node> –cont→ Node –nd→ NodeData]

  • Hierarchy Summarization (HS)
    • Paths of the form x.(fld)*
    • Type hierarchy graph acyclic ⇒ bounded number of paths
    • Preflow-Push: L(S.cont) ∧ L(S.cont.nd)
      • Nodes in set S and their data are locked
Footprint graph heuristic
  • Footprint Graphs (FG) [Calcagno et al. SAS'07]
    • All acyclic paths from arguments of an ADT method to locked objects
    • Paths of the form x.(fld | rev(fld))*
    • Delaunay Mesh Refinement:
      L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst))
      ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src)
    • Nodes in set S and all of their immediate neighbors are locked
  • Composition of HS and FG
    • Preflow-Push: L(a.rev(src).ed)

Experimental evaluation
  • Implement on top of TVLA
    • Encode abstraction by 3-Valued Shape Analysis [SRW TOPLAS’02]
  • Evaluation on 4 Lonestar Java benchmarks
  • Inferred all available optimizations
  • # abstract states practically linear in program size
Impact of optimizations for 8 threads

8-core Intel Xeon @ 3.00 GHz

Note 1
  • How to map the abstract domain presented so far to TVLA?
    • Example invariant: (x≠y ∧ L(y.nd)) ∨ (x=y ∧ L(x.nd))
    • Unary abstraction predicate x(v) for pointer x
    • Unary non-abstraction predicate L[x.p] for pointer x and path p
    • Use partial join
    • Resulting abstraction similar to the one shown
Note 2
  • How to come up with abstraction for similar problems?
    • Start by constructing a manual proof
      • Hoare Logic
    • Examine resulting invariants and generalize into a language of formulas
      • May need to be further specialized for a given program – interesting problem (machine learning/refinement)
    • How to get sound transformers?
Note 3

How did we avoid considering all interleavings?

Proved non-interference side theorem

Elixir: A System for Synthesizing Concurrent Graph Programs

Dimitrios Prountzos¹, Roman Manevich², Keshav Pingali¹

1: The University of Texas at Austin
2: Ben-Gurion University of the Negev

Goal
  • Graph algorithms are ubiquitous

Social network analysis, Computer graphics, Machine learning, …

  • Difficult to parallelize due to their irregular nature
  • Best algorithm and implementation usually
    • Platform dependent
    • Input dependent
  • Need to easily experiment with different solutions
  • Focus: Fixed graph structure
    • Only change labels on nodes and edges
    • Each activity touches a fixed number of nodes

Allow the programmer to easily implement correct and efficient parallel graph algorithms.

Example: Single-Source Shortest-Path

[Figure: weighted graph with source S; relaxing edge (A,C) lowers dist(C)]

if dist(A) + W_AC < dist(C)
  dist(C) = dist(A) + W_AC

  • Problem formulation
    • Compute the shortest distance from source node S to every other node
  • Many algorithms
    • Bellman-Ford (1957)
    • Dijkstra (1959)
    • Chaotic relaxation (Miranker 1969)
    • Delta-stepping (Meyer et al. 1998)
  • Common structure
    • Each node has a label dist with the known shortest distance from S
  • Key operation
    • relax-edge(u,v)
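The key operation can be sketched in Java over a plain dist array, with array indices standing in for nodes. This is an illustration only, not Elixir's generated code.

```java
// relax-edge(u,v): lower dist[v] through u if that yields a shorter path.
public class Relax {
    // Returns true if dist[v] changed (i.e., new activities may be enabled).
    public static boolean relaxEdge(int[] dist, int u, int v, int w) {
        if (dist[u] != Integer.MAX_VALUE && dist[u] + w < dist[v]) {
            dist[v] = dist[u] + w;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[] dist = {0, Integer.MAX_VALUE};   // dist(S)=0, dist(A)=infinity
        boolean changed = relaxEdge(dist, 0, 1, 5);
        System.out.println(changed + " " + dist[1]); // prints: true 5
    }
}
```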
Dijkstra's algorithm

[Figure: Dijkstra's run on the example graph; the priority queue holds pairs such as <C,3>, <B,5>, then <B,5>, <E,6>, then <B,5>, <D,7> as nodes are settled]

Scheduling of relaxations:

  • Use a priority queue of nodes, ordered by label dist
  • Iterate over nodes u in priority order
  • On each step: relax all neighbors v of u
    • Apply relax-edge to all (u,v)
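A runnable Java sketch of this schedule, using a hypothetical nested-array adjacency representation; stale priority-queue entries are skipped rather than decreased in place.

```java
import java.util.*;

// Dijkstra scheduling of relaxations: a priority queue ordered by dist
// decides which node's outgoing edges are relaxed next.
public class Dijkstra {
    // adj[u] holds {v, w} pairs for each outgoing edge (u,v) of weight w.
    public static int[] sssp(int n, int[][][] adj, int src) {
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt((int[] x) -> x[1]));
        pq.add(new int[]{src, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) continue;      // stale queue entry
            for (int[] e : adj[u]) {             // relax all neighbors v of u
                int v = e[0], w = e[1];
                if (dist[u] + w < dist[v]) {
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        int[][][] adj = {
            {{1, 5}, {2, 2}},   // 0 -> 1 (w=5), 0 -> 2 (w=2)
            {{3, 1}},           // 1 -> 3 (w=1)
            {{1, 1}, {3, 7}},   // 2 -> 1 (w=1), 2 -> 3 (w=7)
            {}
        };
        System.out.println(Arrays.toString(sssp(4, adj, 0))); // prints: [0, 3, 2, 4]
    }
}
```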
Chaotic relaxation

[Figure: the example graph mid-run; the worklist holds edges (C,D), (B,C), (S,A), (C,E)]

  • Scheduling of relaxations:
  • Use unordered set of edges
  • Iterate over edges (u,v) in any order
  • On each step:
    • Apply relax-edge to edge (u,v)
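A matching Java sketch of chaotic relaxation, using the same hypothetical array-based edge representation as the Dijkstra sketch. Any processing order converges to the same distances, possibly doing more work.

```java
import java.util.*;

// Chaotic relaxation: an unordered worklist of edges, relaxed in arbitrary
// order; when dist(v) drops, v's outgoing edges become active again.
public class Chaotic {
    // edges: array of {u, v, w} triples for directed edges of weight w.
    public static int[] sssp(int n, int[][] edges, int src) {
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        Deque<int[]> wl = new ArrayDeque<>(Arrays.asList(edges));
        while (!wl.isEmpty()) {
            int[] e = wl.poll();                 // pick any edge (u,v,w)
            int u = e[0], v = e[1], w = e[2];
            if (dist[u] != Integer.MAX_VALUE && dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;           // relax-edge(u,v)
                for (int[] out : edges)
                    if (out[0] == v) wl.add(out); // v's edges may relax again
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        int[][] edges = { {0,1,5}, {0,2,2}, {1,3,1}, {2,1,1}, {2,3,7} };
        System.out.println(Arrays.toString(sssp(4, edges, 0))); // prints: [0, 3, 2, 4]
    }
}
```

Note that this processes some edges more than once; the scheduling choice trades redundant work for the freedom to run relaxations in any order.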


Insights behind Elixir

Parallel graph algorithm = Operators (what should be done) + Schedule (how it should be done)
  • Operators: unordered/ordered algorithms; the operator delta identifies new activities
  • Schedule: static schedule and dynamic schedule; orders activity processing
(Terminology from "The TAO of parallelism", PLDI 2011; an activity is one operator application.)

Insights behind Elixir

The decomposition, instantiated for a Dijkstra-style algorithm:

q = new PrQueue
q.enqueue(SRC)
while (!q.empty) {
  a = q.dequeue
  foreach e = (a,b,w) {
    if dist(a) + w < dist(b) {
      dist(b) = dist(a) + w
      q.enqueue(b)
    }
  }
}

The conditional update is the operator; the priority queue is the dynamic schedule; the loop structure is the static schedule; enqueue(b) identifies new activities.

Contributions


  • Language
    • Operators/Schedule separation
    • Allows exploration of implementation space
  • Operator Delta Inference
    • Precise Delta required for efficient fixpoint computations
  • Automatic Parallelization
    • Inserts synchronization to atomically execute operators
    • Avoids data-races / deadlocks
    • Specializes parallelization based on scheduling constraints


SSSP in Elixir

Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]    ← graph type

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ]
      ➔ [ bd = ad + w ]                              ← operator

sssp = iterate relax ≫ schedule                      ← fixpoint statement

Operators

Cautious by construction – easy to generalize.

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)      ← redex pattern
          bd > ad + w ]                  ← guard
      ➔ [ bd = ad + w ]                  ← update

sssp = iterate relax ≫ schedule

[Figure: applying relax to an edge (a,b) with weight w: if bd > ad + w, b's dist becomes ad + w]

Fixpoint statement

sssp = iterate relax ≫ schedule

Apply the relax operator until fixpoint; ≫ attaches a scheduling expression.

Scheduling examples

Locality-enhanced label-correcting:
  sssp = iterate relax ≫ group b ≫ unroll 2 ≫ approx metric ad

Dijkstra-style (corresponds to the priority-queue pseudocode shown earlier):
  sssp = iterate relax ≫ metric ad ≫ group b

Operator Delta Inference

Delta inference determines which new activities become active after an operator application – the "identify new activities" box of the decomposition.

Delta Inference Example

[Figure: relax1 rewrites edge (a,b,w1); does relax2 on incoming edge (c,b,w2) become active?]

Query program sent to the SMT solver:
  assume(da + w1 < db)         // relax1's guard held
  assume ¬(dc + w2 < db)       // (c,b) was inactive before
  db_post = da + w1            // effect of relax1
  assert ¬(dc + w2 < db_post)  // is (c,b) still inactive?

The SMT solver proves the assertion ⇒ (c,b) does not become active.
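The solver query above can be mimicked by a brute-force search over small bounded integer values. This is purely an illustrative stand-in for the SMT solver: if no bounded valuation satisfies the assumptions while violating the assertion, the edge stays inactive on that domain.

```java
// Brute-force stand-in for the SMT query: search small integer values for a
// counterexample where relax1 fires, relax2 on (c,b) was inactive before,
// yet becomes active after relax1's update.
public class DeltaCheck {
    // Returns true if ¬(dc + w2 < db_post) holds for all bounded valuations
    // satisfying the assumptions.
    public static boolean edgeCbStaysInactive(int bound) {
        for (int da = 0; da < bound; da++)
            for (int db = 0; db < bound; db++)
                for (int dc = 0; dc < bound; dc++)
                    for (int w1 = 0; w1 < bound; w1++)
                        for (int w2 = 0; w2 < bound; w2++) {
                            if (!(da + w1 < db)) continue;      // assume relax1 guard
                            if (dc + w2 < db) continue;         // assume (c,b) inactive
                            int dbPost = da + w1;               // effect of relax1
                            if (dc + w2 < dbPost) return false; // (c,b) became active
                        }
        return true;   // no counterexample: the assertion holds on this domain
    }

    public static void main(String[] args) {
        System.out.println(edgeCbStaysInactive(12)); // prints: true
    }
}
```

The reason the search finds nothing: the assumptions force dc + w2 ≥ db > db_post, so dc + w2 < db_post is unsatisfiable, which is exactly what the solver proves symbolically.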

Delta inference example – active

Apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a.

[Figure: relax1 rewrites edge (a,b,w1); relax2 considers outgoing edge (b,c,w2)]

Query program sent to the SMT solver:
  assume(da + w1 < db)
  assume ¬(db + w2 < dc)
  db_post = da + w1
  assert ¬(db_post + w2 < dc)

Here the solver finds a counterexample: (b,c) may become active.

Influence patterns

[Figure: the ways a rewritten edge (a,b) can overlap a candidate edge (c,d): b=c, a=d, a=c, b=d, a=c ∧ b=d, a=d ∧ b=c]
System architecture

Algorithm Spec → Elixir (synthesizes code, inserts synchronization) → C++ Program → Galois/OpenMP Parallel Runtime (parallel thread-pool, graph implementations, worklist implementations)

Experiments

Compare against hand-written parallel implementations

SSSP results

Group + unroll improve locality across implementation variants.

24-core Intel Xeon @ 2 GHz; USA Florida road network (1 M nodes, 2.7 M edges).

Breadth-First Search results

Scale-free graph: 1 M nodes, 8 M edges. USA road network: 24 M nodes, 58 M edges.

Conclusion
  • Graph algorithm = Operators + Schedule
    • Elixir language: imperative operators + declarative schedule
  • Allows exploring implementation space
  • Automated reasoning for efficiently computing fixpoints
  • Correct-by-construction parallelization
  • Performance competitive with hand-parallelized code
Motivation
  • Transactional memories [Herlihy ‘93]
    • Programmer writes code with coarse-grained atomic blocks
    • Transaction manager takes care of conflicts, providing the illusion of sequential execution
  • Strict serializability – correctness criterion
    • Formalizes “illusion of sequential execution”
  • Parameterized verification
    • Formal proof for given implementation
    • For every number of threads
    • For every number of memory objects
    • For every number and length of transactions
STM terminology
  • Statements: reads, writes, commit, abort
  • Transaction: reads and writes of variables, followed by commit (committing transaction) or abort (aborting transaction)
  • Word: interleaved sequence of transactions of different threads
  • Conflict: two statements conflict if
    • one is a read of variable X and the other is a commit of a transaction that writes to X
    • both are commits of transactions that write to X
Safety property: strict serializability

There is a serialization for the committing threads such that order of conflicts is preserved

Order of non-overlapping transactions remains the same

Safety property: strict serializability
Example word:
  (rd X t1), (rd Y t2), (wr X t2), (commit t2), (commit t1)
can be serialized to:
  (rd X t1), (commit t1), (rd Y t2), (wr X t2), (commit t2)
with the order of conflicts (e.g., between (rd X t1) and (commit t2), a commit of a transaction writing X) preserved.

Main results

Challenging – requires reasoning on both universal and existential properties

  • First automatic verification of strict serializability for transactional memories
    • TPL, DSTM, TL2
  • New proof technique:
    • Template-based invisible invariant generation
    • Abstract checking algorithm to check inductive invariants
Outline

Strict serializability verification approach

Automating the proof

Experiments

Conclusion

Related work

Proof roadmap 1

Goal: prove model M is strictly serializable

  • Given a strict-serializability reference model RSS, reduce checking strict serializability to checking that M refines RSS
  • Reduce refinement to checking safety
    • Safety property SIM: whenever M can execute a statement, so can RSS
    • Check SIM on the product system M × RSS
Proof roadmap 2
  • Model STMs M and RSS in first-order logic
    • TM models use set data structures and typestate bits
  • Check safety by generating strong enough candidate inductive invariant and checking inductiveness
  • Use observations on structure of transactional memories
  • Use sound first-order reasoning
Reference strict-serializability model
  • Guerraoui, Singh, Henzinger, Jobstmann [PLDI’08]
  • RSS: most liberal specification of a strictly serializable system
    • Allows the largest language of strictly serializable executions
  • M is strictly serializable iff every word of M is also a word of RSS
    • Language(M) ⊆ Language(RSS)
    • M refines RSS
Modeling transactional memories
  • Mn,k = (predicates, actions)
    • Predicate: ranked relation symbol p(t), q(t,v), …
    • Binary predicates are used for sets, so instead of rs(t,v) I'll write v∈rs(t)
    • Action: a(t,v) = if pre(a) then p'(v)=…, q'(u,v)=…
  • Universe = set of k thread individuals and n memory individuals
  • State S = a valuation of the predicates
Reference model (RSS) predicates
  • Typestates:
    • RSS.finished(t), RSS.started(t), RSS.pending(t), RSS.invalid(t)
  • Read/write sets
    • RSS.rs(t,v), RSS.ws(t,v)
  • Prohibited read/write sets
    • RSS.prs(t,v), RSS.pws(t,v)
  • Weak-predecessor
    • RSS.wp(t1,t2)
DSTM predicates
  • Typestates:
    • DSTM.finished(t), DSTM.validated(t), DSTM.invalid(t), DSTM.aborted(t)
  • Read/own sets
    • DSTM.rs(t,v), DSTM.os(t,v)
RSS commit(t) action

For executing thread t (action precondition: ¬(RSS.invalid(t) ∨ RSS.wp(t,t))):

if ¬(RSS.invalid(t) ∨ RSS.wp(t,t)) then
  ∀t1,t2 . RSS.wp'(t1,t2) ⟺ t1≠t ∧ t2≠t ∧
    (RSS.wp(t1,t2) ∨ RSS.wp(t,t2) ∧ (RSS.wp(t1,t) ∨ ∃v . v∈RSS.ws(t1) ∧ v∈RSS.ws(t)))
  …

Here wp' is the post-state predicate and wp the current-state predicate; the ∃v conjunct expresses a write-write conflict.
DSTM commit(t) action

if DSTM.validated(t) then
  ∀t1 . DSTM.validated'(t1) ⟺ t1≠t ∧ DSTM.validated(t1) ∧ ¬∃v . v∈DSTM.rs(t1) ∧ v∈DSTM.os(t1)
  …

(the ∃v conjunct expresses a read-own conflict)

FOTS states and execution

[Figure: DSTM state S1: thread individuals t1, t2 and a memory-location individual v; statement rd v t1 executes]

FOTS states and execution

[Figure: state S2 after rd v t1: predicate evaluations DSTM.started(t1)=1 and DSTM.rs(t1,v)=1]

FOTS states and execution

[Figure: state S3 after wr v t2: DSTM.started holds for t1 and t2, DSTM.rs relates t1 and v, DSTM.ws relates t2 and v]

Product system

The product of two systems A × B:
  Predicates = A.predicates ∪ B.predicates
  Actions: commit(t) = { if (A.pre ∧ B.pre) then … }, rd(t,v) = { if (A.pre ∧ B.pre) then … }, …

M refines RSS iff on every execution SIM holds: ∀action a . M.pre(a) ⇒ RSS.pre(a)

Checking DSTM refines RSS

Proof rule:
  DSTM × RSS ⊨ SIM
  ────────────────
  DSTM refines RSS

The only precondition in RSS is for commit(t), so how do we check this safety property? We need to check that
  SIM = ∀t . DSTM.validated(t) ⇒ ¬(RSS.invalid(t) ∨ RSS.wp(t,t))
holds for DSTM × RSS in all reachable states.

Checking safety by invisible invariants

  • How do we prove that property Ψ holds for all reachable states of system M?
  • Pnueli, Ruah, Zuck [TACAS'01]
  • Come up with an inductive invariant φ that contains the reachable states of M and strengthens Ψ:

I1: Initial ⇒ φ    I2: φ ∧ transition ⇒ φ'    I3: φ ⇒ Ψ
────────────────────────────────────────────
M ⊨ Ψ
Strict serializability proof rule

I1: Initial ⇒ φ    I2: φ ∧ transition ⇒ φ'    I3: φ ⇒ SIM
────────────────────────────────────────────
DSTM × RSS ⊨ SIM

Proof roadmap:
  1. Divine a candidate invariant φ
  2. Prove I1, I2, I3

Two challenges

I1: Initial ⇒ φ    I2: φ ∧ transition ⇒ φ'    I3: φ ⇒ SIM
────────────────────────────────────────────
DSTM × RSS ⊨ SIM

Proof roadmap:
  1. Divine a candidate invariant φ — but how do we find a candidate? The space of possibilities is infinite.
  2. Prove I1, I2, I3 — but given a candidate φ, how do we check the proof rule? Checking A ⇒ B is undecidable for first-order logic.

Our solution

Utilize insights on transactional memory implementations:
  1. To find a candidate φ: use templates and iterative weakening.
  2. To check the proof rule for a given candidate φ: use abstract checking.

Invariant for DSTM × RSS

P1: t, t1 . RSS.wp(t,t1) RSS.invalid(t)  RSS.pending(t)  v . vRSS.ws(t1) vRSS.ws(t)

P2: t, v . vRSS.rs(t)  DSTM.aborted(t)  vDSTM.rs(t)

P3: t, v . vRSS.ws(t)  DSTM.aborted(t)  vDSTM.os(t)

P4: t . DSTM.validated(t)  RSS.wp(t,t)

P5: t . DSTM.validated(t)  RSS.invalid(t)

P6: t . DSTM.validated(t)  RSS.pending(t)

Inductive invariant involving only RSS – can use for all future proofs

Templates for DSTM × RSS

P1: t, t1 . 1(t,t1)  2(t)   3(t)  v . v4(t1)  v5(t)

P2: t, v . v1(t)  2(t)  v3(t)

P3: t, v . V1(t)  2(t)  v3(t)

P4: t . 1(t)  2(t,t)

P5: t . 1(t)  2(t)

P6: t . 1(t)  2(t)

Templates for DSTM × RSS

∀t,t1 . ρ1(t,t1) ∧ ρ2(t) ∧ ρ3(t) ⇒ ∃v . v∈ρ4(t1) ∧ v∈ρ5(t)

∀t,v . v∈ρ1(t) ∧ ρ2(t) ⇒ v∈ρ3(t)

∀t . ρ1(t) ⇒ ρ2(t,t)

  • Why templates?
  • Makes the invariant separable
  • Controls the complexity of invariants
  • Adding templates enables refinement
Mining candidate invariants
  • Use a predefined set of templates to specify the structure of candidate invariants
    • ∀t,v . ρ1 ∧ ρ2 ⇒ ρ3
    • ρ1, ρ2, ρ3 are predicates of M or their negations
    • Existential formulas capture 1-level conflicts: ∃v . v∈ρ4(t1) ∧ v∈ρ5(t2)
  • Mine candidate invariants from a concrete execution
Iterative invariant weakening

I1: Initial ⇒ φ    I2: φ ∧ transition ⇒ φ'    I3: φ ⇒ SIM
────────────────────────────────────────────
DSTM × RSS ⊨ SIM

  • Initial candidate invariant C0 = P1 ∧ P2 ∧ … ∧ Pk
  • Try to prove I2: φ ∧ transition ⇒ φ'
    • C1 = { Pi | C0 ∧ transition ⇒ Pi', for Pi ∈ C0 }
  • If C1 = C0 then we have an inductive invariant
  • Otherwise, compute C2 = { Pi | C1 ∧ transition ⇒ Pi', for Pi ∈ C1 }
  • Repeat until either
    • an inductive invariant is found – then check I3: Ck ⇒ SIM
    • we reach top {} – the trivial inductive invariant
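The weakening loop above is essentially a Houdini-style fixpoint. It can be sketched generically over a `preserved` oracle, a hypothetical interface standing in for the real system's abstract first-order entailment check: given the current conjunction and one conjunct Pi, the oracle answers whether the conjunction and the transition imply Pi in the post-state.

```java
import java.util.*;
import java.util.function.BiPredicate;

// Generic sketch of iterative invariant weakening: drop conjuncts that are
// not preserved under the current conjunction until the set stabilizes.
public class Weakening {
    public static <P> Set<P> weaken(Set<P> candidates, BiPredicate<Set<P>, P> preserved) {
        Set<P> current = new LinkedHashSet<>(candidates);
        boolean changed = true;
        while (changed) {                       // iterate until C(k+1) = C(k)
            changed = false;
            for (Iterator<P> it = current.iterator(); it.hasNext(); ) {
                P p = it.next();
                if (!preserved.test(current, p)) { it.remove(); changed = true; }
            }
        }
        return current;     // inductive; may be empty (the trivial invariant 'top')
    }

    public static void main(String[] args) {
        // Toy oracle: "B" is preserved only while "A" is present; "C" never is.
        BiPredicate<Set<String>, String> ok = (cur, p) ->
            p.equals("A") || (p.equals("B") && cur.contains("A"));
        System.out.println(weaken(new LinkedHashSet<>(List.of("A", "B", "C")), ok));
        // prints: [A, B]
    }
}
```

Note that dropping one failing conjunct can make another conjunct fail in a later pass, which is why the loop repeats until no conjunct is removed.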
Abstract proof rule

I1:(Initial)   I2:abs_transition(())  ’ I3: ()  SIM

I1: Initial   I2:   transition  ’ I3:   SIM

DSTM  RSS SIM

DSTM  RSS SIM

Abstracttransformer

Approximateentailment

Formulaabstraction

Conclusion
  • Novel invariant generation using templates – extends applicability of invisible invariants
  • Abstract domain and reasoning to check invariants without state explosion
  • Proved strict-serializability for TPL, DSTM, TL2
    • BLAST and TVLA failed
Insights on transactional memories

  • The transition relation is symmetric – thread identifiers are not compared: p'(t,v) = … t1 … t2 …
  • Executing thread t interacts only with an arbitrary thread or a conflict-adjacent thread
    • Arbitrary thread: ∃v . v∈TL2.rs(t1) ∨ v∈TL2.ws(t1)
    • Conflict-adjacent: ∃v . v∈DSTM.rs(t1) ∧ v∈DSTM.ws(t)

Conflict adjacency

[Figure: thread t's readset overlaps t2's writeset on v1 (read-write conflict); t2's writeset overlaps t3's writeset on v2 (write-write conflict)]

∃v . v∈rs(t) ∧ v∈DSTM.ws(t2)
∃v . v∈ws(t1) ∧ v∈DSTM.ws(t2)

Related work
  • Reduction theorems – Guerraoui et al. [PLDI'08, CONCUR'08]
  • Manually supplied invariants – fixed number of threads and variables + PVS – Cohen et al. [FMCAD'07]
  • Predicate abstraction + shape analysis
    • SLAM, BLAST, TVLA
Related work

Invisible invariants – Arons et al. [CAV'01], Pnueli et al. [TACAS'01]

Templates – for arithmetic constraints

Indexed predicate abstraction – Shuvendu et al. [TOCL'07]

Thread quantification – Berdine et al. [CAV'08], Segalov et al. [APLAS'09]
