
Program Analysis and Synthesis of Parallel Systems

Roman Manevich Ben-Gurion University

Three papers

A Shape Analysis for Optimizing Parallel Graph Programs [POPL'11]

Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA'12]

Parameterized Verification of Transactional Memories [PLDI'10]

What's the connection?

From analysis to language design

A Shape Analysis for Optimizing Parallel Graph Programs [POPL'11]

Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA'12]

Creates opportunities for more optimizations. Requires other analyses

Similarities between abstract domains

Parameterized Verification of Transactional Memories [PLDI'10]


A Shape Analysis for Optimizing Parallel Graph Programs

Roman Manevich², Kathryn S. McKinley¹, Dimitrios Prountzos¹, Keshav Pingali¹,²

1: Department of Computer Science, The University of Texas at Austin

2: Institute for Computational Engineering and Sciences, The University of Texas at Austin

Motivation
  • Graph algorithms are ubiquitous: computer graphics, computational biology, social networks
  • Goal: Compiler analysis for optimization of parallel graph algorithms

Boruvka's Minimum Spanning Tree Algorithm

[Figure: example weighted graph over nodes a–g; node a merges with its lightest neighbor lt = c (weight 2), yielding the contracted node a,c]

  • Build MST bottom-up
    • repeat {
    • pick arbitrary node ‘a’
    • merge with lightest neighbor ‘lt’
    • add edge ‘a-lt’ to MST
    • } until graph is a single node
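The merge-with-lightest-neighbor loop above can be sketched sequentially as follows. This is not the paper's Galois code, just a minimal self-contained Java sketch over an adjacency-map graph (the `addEdge` and `mstWeight` helpers are illustrative names); with distinct edge weights, the cut property makes every merged edge a safe MST edge.

```java
import java.util.*;

public class Boruvka {
    // Undirected weighted graph as adjacency maps: g.get(u).get(v) = w(u,v).
    static void addEdge(Map<Integer, Map<Integer, Integer>> g, int u, int v, int w) {
        g.computeIfAbsent(u, k -> new HashMap<>()).put(v, w);
        g.computeIfAbsent(v, k -> new HashMap<>()).put(u, w);
    }

    // Repeatedly pick an arbitrary node 'a', merge it with its lightest
    // neighbor 'lt', and add the merged edge's weight to the MST total.
    static int mstWeight(Map<Integer, Map<Integer, Integer>> g) {
        int total = 0;
        while (g.size() > 1) {
            int a = g.keySet().iterator().next();        // pick arbitrary node 'a'
            Map<Integer, Integer> nbrs = g.get(a);
            int lt = -1, minW = Integer.MAX_VALUE;
            for (Map.Entry<Integer, Integer> e : nbrs.entrySet())
                if (e.getValue() < minW) { minW = e.getValue(); lt = e.getKey(); }
            total += minW;                               // edge 'a-lt' joins the MST
            for (Map.Entry<Integer, Integer> e : g.get(lt).entrySet()) {
                int n = e.getKey(), w = e.getValue();
                if (n == a) continue;                    // avoid a self-loop
                Integer old = nbrs.get(n);
                int nw = (old == null) ? w : Math.min(old, w);
                nbrs.put(n, nw);                         // keep the lighter parallel edge
                g.get(n).put(a, nw);
                g.get(n).remove(lt);
            }
            nbrs.remove(lt);
            g.remove(lt);                                // contract lt into a
        }
        return total;
    }
}
```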
Parallelism in Boruvka

[Figure: the same example graph; merges of non-adjacent components can proceed in parallel]

  • Build MST bottom-up
    • repeat {
    • pick arbitrary node ‘a’
    • merge with lightest neighbor ‘lt’
    • add edge ‘a-lt’ to MST
    • } until graph is a single node
Non-conflicting iterations

[Figure: the example graph with two activities that touch disjoint neighborhoods]

  • Build MST bottom-up
    • repeat {
    • pick arbitrary node ‘a’
    • merge with lightest neighbor ‘lt’
    • add edge ‘a-lt’ to MST
    • } until graph is a single node
Non-conflicting iterations

[Figure: the graph after both merges: contracted nodes a,c and f,g]

  • Build MST bottom-up
    • repeat {
    • pick arbitrary node ‘a’
    • merge with lightest neighbor ‘lt’
    • add edge ‘a-lt’ to MST
    • } until graph is a single node
Conflicting iterations

[Figure: the example graph with two activities whose neighborhoods overlap]

  • Build MST bottom-up
    • repeat {
    • pick arbitrary node ‘a’
    • merge with lightest neighbor ‘lt’
    • add edge ‘a-lt’ to MST
    • } until graph is a single node
Optimistic parallelization in Galois

[Figure: activities i1, i2, i3 executing speculatively on the graph]

  • Programming model
    • Client code has sequential semantics
    • Library of concurrent data structures
  • Parallel execution model
    • Thread-level speculation (TLS)
    • Activities executed speculatively
  • Conflict detection
    • Each node/edge has associated exclusive lock
    • Graph operations acquire locks on read/written nodes/edges
    • Lock owned by another thread ⇒ conflict ⇒ iteration rolled back
    • All locks released at the end
  • Two main overheads
    • Locking
    • Undo actions
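The speculative execution model described above can be sketched as follows. This is an illustrative Java sketch, not the Galois API: the `Node` and `runActivity` types are assumptions. Each activity tries to lock the nodes it touches, logs an undo action before each write, rolls back on conflict, and releases all locks at the end.

```java
import java.util.*;
import java.util.concurrent.locks.ReentrantLock;

public class SpeculativeIteration {
    // Per-node state with an associated exclusive lock.
    static class Node {
        final ReentrantLock lock = new ReentrantLock();
        int data;
        Node(int d) { data = d; }
    }

    static class ConflictException extends RuntimeException {}

    // A lock held by another thread means a conflict: abort this iteration.
    static void acquire(Node n, Set<Node> held) {
        if (held.contains(n)) return;                  // already owned by this iteration
        if (!n.lock.tryLock()) throw new ConflictException();
        held.add(n);
    }

    // Execute one activity speculatively: log an undo action before the
    // write, run undos in reverse order on conflict, release locks at the end.
    static boolean runActivity(Node a, int newValue) {
        List<Runnable> undoLog = new ArrayList<>();
        Set<Node> held = new HashSet<>();
        try {
            acquire(a, held);
            final int old = a.data;
            undoLog.add(() -> a.data = old);           // undo action for the write
            a.data = newValue;
            return true;                               // commit
        } catch (ConflictException e) {
            for (int i = undoLog.size() - 1; i >= 0; i--) undoLog.get(i).run();
            return false;                              // rolled back
        } finally {
            for (Node n : held) n.lock.unlock();       // all locks released at the end
        }
    }
}
```

The two overheads named above show up directly: the `tryLock` calls (locking) and the `undoLog` entries (undo actions).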
Generic optimization structure

Program → Program Analyzer → Annotated Program → Program Transformer → Optimized Program

Overheads (I): locking
  • Optimizations
    • Redundant locking elimination
    • Lock removal for iteration-private data
    • Lock removal for lock domination
  • ACQ(P): set of definitely acquired locks per program point P
  • Given method call M at P:

Locks(M) ⊆ ACQ(P) ⇒ redundant locking
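The redundant-locking condition is a simple set check, and ACQ itself is a "must" property, so joins intersect. The sketch below uses illustrative names, not the paper's implementation:

```java
import java.util.*;

public class RedundantLocking {
    // A call that would acquire Locks(M) needs no locking code at P
    // when Locks(M) ⊆ ACQ(P), the set of locks definitely held at P.
    static boolean lockingRedundant(Set<String> locksM, Set<String> acqP) {
        return acqP.containsAll(locksM);
    }

    // ACQ is a must-analysis fact: at a control-flow join it is the
    // intersection of the predecessors' lock sets.
    static Set<String> joinAcq(Set<String> acq1, Set<String> acq2) {
        Set<String> out = new HashSet<>(acq1);
        out.retainAll(acq2);
        return out;
    }
}
```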

Overheads (II): undo actions

foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}

The lockset grows through the iteration until it stabilizes; past that point the iteration is failsafe.

Program point P is failsafe if:
∀Q: Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)

Lockset analysis

GSet<Node> wl = new GSet<Node>();
wl.addAll(g.getNodes());
GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) {
  … // same Boruvka loop body as in the previous slide
}

    • Redundant locking: Locks(M) ⊆ ACQ(P)
    • Undo elimination: ∀Q: Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
  • Need to compute ACQ(P)
The optimization technically
    • Each graph method m(arg1,…,argk, flag) takes an optimization-level flag
      • flag=LOCK – acquire locks
      • flag=UNDO – log undo (backup) data
      • flag=LOCK_UNDO – (default) acquire locks and log undo
      • flag=NONE – no extra work
  • Example: Edge e = g.getEdge(lt, n, NONE)
Analysis challenges
  • The usual suspects:
    • Unbounded memory ⇒ undecidability
    • Aliasing, destructive updates
  • Specific challenges:
    • Complex ADTs: unstructured graphs
    • Heap objects are locked
    • Adapt abstraction to ADTs
  • We use Abstract Interpretation [CC’77]
    • Balance precision and realistic performance
Shape analysis overview

Inputs: Boruvka.java and ADT specifications (Graph spec: @rep nodes, @rep edges; Set spec: @rep cont).
The shape analysis, together with predicate discovery, produces an optimized Boruvka.java.
Concrete ADT implementations in the Galois library (e.g., a HashMap-based graph, a tree-based set) are abstracted by their specifications.

ADT specification

Abstract ADT state by virtual set fields

Graph<ND,ED> {
  @rep set<Node> nodes
  @rep set<Edge> edges

  @locks(n + n.rev(src) + n.rev(src).dst +
         n.rev(dst) + n.rev(dst).src)
  @op(
    nghbrs = n.rev(src).dst + n.rev(dst).src,
    ret = new Set<Node<ND>>(cont=nghbrs)
  )
  Set<Node> neighbors(Node n);
}

Client code (Boruvka.java):
  ...
  Set<Node> S1 = g.neighbors(n);
  ...

Assumption: the implementation satisfies the spec

Modeling ADTs

[Figure: concrete heap for the spec above: edge objects with src/dst pointers linking nodes a, b, c]

Modeling ADTs

[Figure: abstract state for neighbors(n): the graph's nodes and edges set individuals, plus the returned set ret whose cont field holds the nghbrs nodes n.rev(src).dst + n.rev(dst).src]

Abstraction scheme

[Figure: two set objects S1, S2 with cont edges, abstracted as (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)]

  • Parameterized by a set of LockPaths:

L(Path) ≜ ∀o. o ∈ Path ⇒ Locked(o)

    • Tracks a subset of must-be-locked objects
  • Abstract domain elements have the form: Aliasing-configs × 2^LockPaths × …
Joining abstract states

(x≠y ∧ L(y.nd))
(x≠y ∧ L(y.nd) ∧ L(x.rev(src)))
((x=y) ∧ L(x.nd))

Join: (x≠y ∧ L(y.nd)) ∨ ((x=y) ∧ L(x.nd))

Aliasing is crucial for precision

May-be-locked does not enable our optimizations

#Aliasing-configs: small constant (6)

Example invariant in Boruvka

GSet<Node> wl = new GSet<Node>();
wl.addAll(g.getNodes());
GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) {
  … // same Boruvka loop body as before
}

  • The immediate neighbors of a and lt are locked:

( a ≠ lt )
∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst))
∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src)
∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src))
∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst)
…

Heuristics for finding LockPaths

[Figure: type chain S : Set<Node> –cont→ Node –nd→ NodeData]

  • Hierarchy Summarization (HS)
    • Paths of the form x.(fld)*
    • Type hierarchy graph acyclic ⇒ bounded number of paths
    • Preflow-Push:
      • L(S.cont) ∧ L(S.cont.nd)
      • Nodes in set S and their data are locked
Footprint graph heuristic
  • Footprint Graphs (FG) [Calcagno et al. SAS'07]
    • All acyclic paths from arguments of an ADT method to locked objects
    • Paths of the form x.(fld | rev(fld))*
    • Delaunay Mesh Refinement:

L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst))
∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src)

    • Nodes in set S and all of their immediate neighbors are locked
  • Composition of HS and FG
    • Preflow-Push: L(a.rev(src).ed)

Experimental evaluation
  • Implemented on top of TVLA
    • Encode abstraction by 3-Valued Shape Analysis [SRW TOPLAS’02]
  • Evaluation on 4 Lonestar Java benchmarks
  • Inferred all available optimizations
  • # abstract states practically linear in program size
Impact of optimizations for 8 threads

8-core Intel Xeon @ 3.00 GHz

Note 1
  • How to map the abstract domain presented so far to TVLA?
    • Example invariant: (x≠y ∧ L(y.nd)) ∨ (x=y ∧ L(x.nd))
    • Unary abstraction predicate x(v) for pointer x
    • Unary non-abstraction predicate L[x.p] for pointer x and path p
    • Use partial join
    • Resulting abstraction similar to the one shown
Note 2
  • How to come up with abstraction for similar problems?
    • Start by constructing a manual proof
      • Hoare Logic
    • Examine resulting invariants and generalize into a language of formulas
      • May need to be further specialized for a given program – interesting problem (machine learning/refinement)
    • How to get sound transformers?
Note 3

How did we avoid considering all interleavings?

Proved non-interference side theorem

Elixir: A System for Synthesizing Concurrent Graph Programs

Dimitrios Prountzos¹, Roman Manevich², Keshav Pingali¹

1. The University of Texas at Austin

2. Ben-Gurion University of the Negev

Goal
  • Graph algorithms are ubiquitous

Social network analysis, Computer graphics, Machine learning, …

  • Difficult to parallelize due to their irregular nature
  • Best algorithm and implementation usually
    • Platform dependent
    • Input dependent
  • Need to easily experiment with different solutions
  • Focus: Fixed graph structure
    • Only change labels on nodes and edges
    • Each activity touches a fixed number of nodes

Allow the programmer to easily implement correct and efficient parallel graph algorithms

Example: Single-Source Shortest-Path

[Figure: example weighted graph from source S; relaxing edge (A,C) with weight W_AC:]

if dist(A) + W_AC < dist(C)
  dist(C) = dist(A) + W_AC

  • Problem Formulation
    • Compute the shortest distance from source node S to every other node
  • Many algorithms
    • Bellman-Ford (1957)
    • Dijkstra (1959)
    • Chaotic relaxation (Miranker 1969)
    • Delta-stepping (Meyer et al. 1998)
  • Common structure
    • Each node has a label dist with the known shortest distance from S
  • Key operation
    • relax-edge(u,v)
Dijkstra's algorithm

[Figure: Dijkstra's algorithm on the example graph; the priority queue holds entries such as <C,3>, <B,5>, <E,6>, <D,7>]

Scheduling of relaxations:

  • Use a priority queue of nodes, ordered by label dist
  • Iterate over nodes u in priority order
  • On each step: relax all neighbors v of u
    • Apply relax-edge to all (u,v)
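This priority-ordered scheduling is ordinary Dijkstra. A compact illustrative Java sketch (using stale-entry skipping instead of a decrease-key queue; the graph encoding is an assumption, not the paper's):

```java
import java.util.*;

public class Dijkstra {
    // Dijkstra-style scheduling: a priority queue of nodes ordered by dist;
    // pop the closest node and apply relax-edge to all its outgoing edges.
    // g.get(u) maps neighbor v -> w(u,v); nodes are 0..n-1.
    static int[] sssp(List<Map<Integer, Integer>> g, int src) {
        int n = g.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        // Entries are {node, dist at enqueue time}; stale entries are skipped.
        PriorityQueue<int[]> pq = new PriorityQueue<>((p, q) -> Integer.compare(p[1], q[1]));
        pq.add(new int[]{src, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist[u]) continue;            // stale entry
            for (Map.Entry<Integer, Integer> e : g.get(u).entrySet()) {
                int v = e.getKey(), w = e.getValue();
                if (dist[u] + w < dist[v]) {           // relax-edge(u,v)
                    dist[v] = dist[u] + w;
                    pq.add(new int[]{v, dist[v]});
                }
            }
        }
        return dist;
    }
}
```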
Chaotic relaxation

  • Scheduling of relaxations:
  • Use an unordered set of edges
  • Iterate over edges (u,v) in any order
  • On each step:
    • Apply relax-edge to edge (u,v)

[Figure: the example graph; the worklist holds edges (C,D), (B,C), (S,A), (C,E)]
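The unordered schedule can be sketched with a plain worklist of edges; with non-negative weights, any pop order converges to the same distances. An illustrative Java sketch (not Elixir output; the graph encoding is an assumption):

```java
import java.util.*;

public class ChaoticRelaxation {
    // Unordered scheduling: a worklist of edges {u, v, w} processed in
    // arbitrary order; a successful relaxation re-enqueues v's out-edges.
    static int[] sssp(List<Map<Integer, Integer>> g, int src) {
        int n = g.size();
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        Deque<int[]> wl = new ArrayDeque<>();
        for (Map.Entry<Integer, Integer> e : g.get(src).entrySet())
            wl.add(new int[]{src, e.getKey(), e.getValue()});
        while (!wl.isEmpty()) {
            int[] edge = wl.poll();                    // any removal order is correct
            int u = edge[0], v = edge[1], w = edge[2];
            if (dist[u] != Integer.MAX_VALUE && dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;                 // relax-edge(u,v)
                for (Map.Entry<Integer, Integer> e : g.get(v).entrySet())
                    wl.add(new int[]{v, e.getKey(), e.getValue()});
            }
        }
        return dist;
    }
}
```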

Insights behind Elixir

Parallel Graph Algorithm = Operators (what should be done) + Schedule (how it should be done)
  • Operators: identify new activities (operator delta)
  • Schedule: order activity processing (static schedule + dynamic schedule)
  • Unordered/ordered algorithms; an activity is one operator application
("The TAO of parallelism", PLDI 2011)

Insights behind Elixir

A Dijkstra-style algorithm decomposes the same way:

q = new PrQueue
q.enqueue(SRC)
while (!q.empty) {
  a = q.dequeue
  for each e = (a,b,w) {
    if dist(a) + w < dist(b) {
      dist(b) = dist(a) + w
      q.enqueue(b)
    }
  }
}

Contributions
  • Language
    • Operators/Schedule separation
    • Allows exploration of implementation space
  • Operator Delta Inference
    • Precise Delta required for efficient fixpoint computations
  • Automatic Parallelization
    • Inserts synchronization to atomically execute operators
    • Avoids data-races / deadlocks
    • Specializes parallelization based on scheduling constraints


SSSP in Elixir

Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]     // Graph type

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ]
     ➔  [ bd = ad + w ]                               // operator

sssp = iterate relax ≫ schedule                       // fixpoint statement

Operators

Cautious by construction – easy to generalize

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ]          // redex pattern + guard
     ➔  [ bd = ad + w ]          // update

[Figure: if bd > ad + w, the operator rewrites the redex, setting bd to ad + w]

Fixpoint statement

sssp = iterate relax ≫ schedule

Apply the operator until fixpoint; the expression after ≫ is the scheduling expression.

Scheduling examples

sssp = iterate relax ≫ schedule

Locality-enhanced label-correcting:
  group b ≫ unroll 2 ≫ approx metric ad

Dijkstra-style:
  metric ad ≫ group b

Operator Delta Inference

Delta Inference Example

[Figure: relax1 has been applied to edge (a,b,w1); does relax2 on edge (c,b,w2) become active?]

Query program for the SMT solver:
  assume(da + w1 < db)
  assume ¬(dc + w2 < db)
  db_post = da + w1
  assert ¬(dc + w2 < db_post)

The solver proves the assertion ⇒ (c,b) does not become active

Delta inference example – active

Apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a

[Figure: relax1 on edge (a,b,w1) may activate relax2 on edge (b,c,w2)]

Query program for the SMT solver:
  assume(da + w1 < db)
  assume ¬(db + w2 < dc)
  db_post = da + w1
  assert ¬(db_post + w2 < dc)

Influence patterns

[Figure: ways an applied redex (a,b) can overlap a candidate redex (c,d): b=c; a=d ∧ b=c; a=c; a=c ∧ b=d; b=d; a=d]
System architecture

Algorithm Spec → Elixir (synthesize code, insert synchronization) → C++ Program → Galois/OpenMP Parallel Runtime (parallel thread-pool, graph implementations, worklist implementations)

Experiments

Compare against hand-written parallel implementations

SSSP results

[Chart: speedup by implementation variant; Group + Unroll improve locality]

24-core Intel Xeon @ 2 GHz

USA Florida Road Network (1 M nodes, 2.7 M edges)

Breadth-First Search results

Scale-free graph: 1 M nodes, 8 M edges

USA road network: 24 M nodes, 58 M edges

Conclusion
  • Graph algorithm = Operators + Schedule
    • Elixir language : imperative operators + declarative schedule
  • Allows exploring implementation space
  • Automated reasoning for efficiently computing fixpoints
  • Correct-by-construction parallelization
  • Performance competitive with hand-parallelized code
Motivation
  • Transactional memories [Herlihy '93]
    • Programmer writes code with coarse-grained atomic blocks
    • Transaction manager takes care of conflicts, providing the illusion of sequential execution
  • Strict serializability – correctness criterion
    • Formalizes “illusion of sequential execution”
  • Parameterized verification
    • Formal proof for given implementation
    • For every number of threads
    • For every number of memory objects
    • For every number and length of transactions
STM terminology
  • Statements: reads, writes, commit, abort
  • Transaction: reads and writes of variables followed by commit (committing transaction) or abort (aborting transaction)
  • Word: interleaved sequence of transactions of different threads
  • Conflict: two statements conflict if
    • One is a read of variable X and the other is a commit of a transaction that writes to X
    • Both are commits of transactions that write to X
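The conflict definition above is directly executable. A small illustrative Java sketch (the `Stmt` model and names are assumptions, not the paper's formalism):

```java
import java.util.*;

public class Conflicts {
    static final String READ = "read", COMMIT = "commit";

    // A statement is a read (vars = the one variable read) or a commit
    // (vars = the committing transaction's write set).
    static class Stmt {
        final String kind;
        final Set<String> vars;
        Stmt(String kind, Set<String> vars) { this.kind = kind; this.vars = vars; }
    }

    // Conflict: a read of X against a commit of a transaction writing X,
    // or two commits of transactions that both write X.
    static boolean conflict(Stmt s1, Stmt s2) {
        if (s1.kind.equals(READ) && s2.kind.equals(COMMIT)
                || s1.kind.equals(COMMIT) && s2.kind.equals(READ)
                || s1.kind.equals(COMMIT) && s2.kind.equals(COMMIT))
            return !Collections.disjoint(s1.vars, s2.vars);
        return false;                                  // e.g. two reads never conflict
    }
}
```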
Safety property: strict serializability

There is a serialization for the committing threads such that order of conflicts is preserved

Order of non-overlapping transactions remains the same

Safety property: strict serializability

[Figure: the two conflicts in the example word below are marked]

Example word:

(rd X t1), (rd Y t2), (wr X t2), (commit t2), (commit t1)

=>

Can be serialized to :

(rd X t1), (commit t1), (rd Y t2), (wr X t2), (commit t2)

Main results

Challenging – requires reasoning about both universal and existential properties

  • First automatic verification of strict serializability for transactional memories
    • TPL, DSTM, TL2
  • New proof technique:
    • Template-based invisible invariant generation
    • Abstract checking algorithm to check inductive invariants
Outline

Strict serializability verification approach

Automating the proof

Experiments

Conclusion

Related work

Proof roadmap 1

Goal: prove model M is strictly serializable

  • Given a strict-serializability reference model RSS, reduce checking strict serializability to checking that M refines RSS
  • Reduce refinement to checking safety
    • Safety property SIM: whenever M can execute a statement, so can RSS
    • Check SIM on the product system M × RSS
Proof roadmap 2
  • Model STMs M and RSS in first-order logic
    • TM models use set data structures and typestate bits
  • Check safety by generating strong enough candidate inductive invariant and checking inductiveness
  • Use observations on structure of transactional memories
  • Use sound first-order reasoning
Reference strict serializability model
  • Guerraoui, Singh, Henzinger, Jobstmann [PLDI'08]
  • RSS: the most liberal specification of a strictly serializable system
    • Allows the largest language of strictly-serializable executions
  • M is strictly serializable iff every word of M is also a word of RSS
    • Language(M) ⊆ Language(RSS)
    • M refines RSS
Modeling transactional memories
  • M(n,k) = (predicates, actions)
    • Predicate: ranked relation symbol p(t), q(t,v), …
    • Binary predicates are used for sets, so instead of rs(t,v) I'll write v∈rs(t)
    • Action: a(t,v) = if pre(a) then p'(v)=…, q'(u,v)=…
  • Universe = set of k thread individuals and n memory individuals
  • State S = a valuation of the predicates
Reference model (RSS) predicates
  • Typestates:
    • RSS.finished(t), RSS.started(t), RSS.pending(t), RSS.invalid(t)
  • Read/write sets
    • RSS.rs(t,v), RSS.ws(t,v)
  • Prohibited read/write sets
    • RSS.prs(t,v), RSS.pws(t,v)
  • Weak-predecessor
    • RSS.wp(t1,t2)
DSTM predicates
  • Typestates:
    • DSTM.finished(t), DSTM.validated(t), DSTM.invalid(t), DSTM.aborted(t)
  • Read/own sets
    • DSTM.rs(t,v), DSTM.os(t,v)
RSS commit(t) action

if ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) then
  ∀t1,t2. RSS.wp'(t1,t2) ⟺ t1≠t ∧ t2≠t ∧
    (RSS.wp(t1,t2) ∨ (RSS.wp(t,t2) ∧ (RSS.wp(t1,t) ∨ ∃v. v∈RSS.ws(t1) ∧ v∈RSS.ws(t))))
  …

(t is the executing thread; the if-condition is the action precondition; RSS.wp' is a post-state predicate and RSS.wp a current-state predicate; the ∃v conjunct expresses a write-write conflict)
DSTM commit(t) action

if DSTM.validated(t) then
  ∀t1. DSTM.validated'(t1) ⟺ t1≠t ∧ DSTM.validated(t1) ∧ ¬∃v. v∈DSTM.rs(t1) ∧ v∈DSTM.os(t)
  …

(the ∃v conjunct expresses a read-own conflict)

FOTS states and execution

[Figure: state S1 before executing (rd v t1): thread individuals t1, t2 and memory-location individual v]

FOTS states and execution

[Figure: state S2 after (rd v t1): predicate evaluations DSTM.started(t1)=1 and DSTM.rs(t1,v)=1]

FOTS states and execution

[Figure: state S3 after (wr v t2): DSTM.started holds for both t1 and t2; DSTM.rs(t1,v)=1 and DSTM.ws(t2,v)=1]

Product system

The product of two systems A × B:
  Predicates = A.predicates ∪ B.predicates
  Actions: commit(t) = { if (A.pre ∧ B.pre) then … }, rd(t,v) = { if (A.pre ∧ B.pre) then … }, …

M refines RSS iff on every execution SIM holds: ∀ action a. M.pre(a) ⇒ RSS.pre(a)

Checking DSTM refines RSS

How do we check this safety property?

Proof rule:
  DSTM × RSS ⊨ SIM
  ─────────────────
  DSTM refines RSS

The only precondition in RSS is for commit(t), so we need to check that

SIM = ∀t. DSTM.validated(t) ⇒ ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t)

holds in all reachable states of DSTM × RSS.

Checking safety by invisible invariants
  • How do we prove that property ψ holds for all reachable states of system M?
  • Pnueli, Ruah, Zuck [TACAS'01]
  • Come up with an inductive invariant φ that contains the reachable states of M and strengthens ψ:

I1: Initial ⇒ φ
I2: φ ∧ transition ⇒ φ'
I3: φ ⇒ ψ
─────────────
M ⊨ ψ
Strict serializability proof rule

I1: Initial ⇒ φ
I2: φ ∧ transition ⇒ φ'
I3: φ ⇒ SIM
─────────────────
DSTM × RSS ⊨ SIM

Proof roadmap:
  1. Divine a candidate invariant φ
  2. Prove I1, I2, I3

Two challenges

  • But how do we find a candidate φ? Infinite space of possibilities.
  • Given a candidate φ, how do we check the proof rule? Checking A ⇒ B is undecidable for first-order logic.

Our solution

Utilize insights on transactional memory implementations:
  • How do we find a candidate φ? Use templates and iterative weakening.
  • Given a candidate φ, how do we check the proof rule? Use abstract checking.

Invariant for DSTM × RSS

P1: ∀t,t1. RSS.wp(t,t1) ⇒ RSS.invalid(t) ∨ RSS.pending(t) ∨ ∃v. v∈RSS.ws(t1) ∧ v∈RSS.ws(t)

P2: ∀t,v. v∈RSS.rs(t) ⇒ DSTM.aborted(t) ∨ v∈DSTM.rs(t)

P3: ∀t,v. v∈RSS.ws(t) ⇒ DSTM.aborted(t) ∨ v∈DSTM.os(t)

P4: ∀t. DSTM.validated(t) ⇒ ¬RSS.wp(t,t)

P5: ∀t. DSTM.validated(t) ⇒ ¬RSS.invalid(t)

P6: ∀t. DSTM.validated(t) ⇒ RSS.pending(t)

P1 is an inductive invariant involving only RSS – it can be reused for all future proofs

Templates for DSTM × RSS

P1: ∀t,t1. φ1(t,t1) ⇒ φ2(t) ∨ φ3(t) ∨ ∃v. v∈φ4(t1) ∧ v∈φ5(t)

P2: ∀t,v. v∈φ1(t) ⇒ φ2(t) ∨ v∈φ3(t)

P3: ∀t,v. v∈φ1(t) ⇒ φ2(t) ∨ v∈φ3(t)

P4: ∀t. φ1(t) ⇒ φ2(t,t)

P5: ∀t. φ1(t) ⇒ φ2(t)

P6: ∀t. φ1(t) ⇒ φ2(t)

Templates for DSTM × RSS

∀t,t1. φ1(t,t1) ⇒ φ2(t) ∨ φ3(t) ∨ ∃v. v∈φ4(t1) ∧ v∈φ5(t)

∀t,v. v∈φ1(t) ⇒ φ2(t) ∨ v∈φ3(t)

∀t. φ1(t) ⇒ φ2(t,t)

  • Why templates?
  • Makes invariant separable
  • Controls complexity of invariants
  • Adding templates enables refinement
Mining candidate invariants
  • Use a predefined set of templates to specify the structure of candidate invariants
    • ∀t,v. φ1 ∨ φ2 ∨ φ3
    • φ1, φ2, φ3 are predicates of M or their negations
    • Existential formulas capturing 1-level conflicts: ∃v. v∈φ4(t1) ∧ v∈φ5(t2)
  • Mine candidate invariants from a concrete execution
Iterative invariant weakening

  • Initial candidate invariant C0 = P1 ∧ P2 ∧ … ∧ Pk
  • Try to prove I2: φ ∧ transition ⇒ φ'
    C1 = { Pi ∈ C0 | C0 ∧ transition ⇒ Pi' }
  • If C1 = C0 then we have an inductive invariant
  • Otherwise, compute C2 = { Pi ∈ C1 | C1 ∧ transition ⇒ Pi' }
  • Repeat until either
    • an inductive invariant is found – then check I3: Ck ⇒ SIM
    • the top {} is reached – the trivial inductive invariant
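The weakening loop is a straightforward fixpoint computation over the candidate set. The sketch below abstracts the entailment check `C ∧ transition ⇒ Pi'` behind an oracle predicate (an assumption; in the paper this check is discharged by the abstract checker):

```java
import java.util.*;
import java.util.function.BiPredicate;

public class InvariantWeakening {
    // Iteratively drop conjuncts that are not preserved relative to the
    // current candidate set, until the set stabilizes (a fixpoint).
    // preserved.test(current, p) stands in for the entailment check
    //   (conjunction of current) ∧ transition ⇒ p'
    static <P> Set<P> weaken(Set<P> initial, BiPredicate<Set<P>, P> preserved) {
        Set<P> current = new HashSet<>(initial);
        boolean changed = true;
        while (changed) {
            changed = false;
            Set<P> next = new HashSet<>();
            for (P p : current)
                if (preserved.test(current, p)) next.add(p);
            if (!next.equals(current)) { current = next; changed = true; }
        }
        return current;   // every surviving conjunct is preserved: inductive
    }
}
```

If the loop ends with the empty set, only the trivial invariant is inductive under the given templates, and more templates are needed.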
Abstract proof rule

I1: α(Initial) ⊑ φ
I2: abs_transition(γ(φ)) ⊑ φ'
I3: γ(φ) ⊨ SIM
─────────────────
DSTM × RSS ⊨ SIM

(abs_transition: abstract transformer; ⊑: approximate entailment; α: formula abstraction; γ: concretization)

Conclusion
  • Novel invariant generation using templates – extends applicability of invisible invariants
  • Abstract domain and reasoning to check invariants without state explosion
  • Proved strict-serializability for TPL, DSTM, TL2
    • BLAST and TVLA failed
Insights on transactional memories

  • The transition relation is symmetric – thread identifiers are not used: p'(t,v) = … with no mention of specific threads t1, t2
  • The executing thread t interacts only with an arbitrary thread or a conflict-adjacent thread
    • Arbitrary thread: ∃v. v∈TL2.rs(t1) ∨ v∈TL2.ws(t1)
    • Conflict adjacent: ∃v. v∈DSTM.rs(t1) ∧ v∈DSTM.ws(t)

Conflict adjacency

[Figure: thread t's read set overlaps t2's write set on v1 (read-write conflict); t1's write set overlaps t2's write set on v2 (write-write conflict)]

∃v. v∈rs(t) ∧ v∈DSTM.ws(t2)
∃v. v∈ws(t1) ∧ v∈DSTM.ws(t2)

Related work
  • Reduction theorems: Guerraoui et al. [PLDI'08, CONCUR'08]
  • Manually-supplied invariants – fixed number of threads and variables + PVS: Cohen et al. [FMCAD'07]
  • Predicate abstraction + shape analysis
    • SLAM, BLAST, TVLA
Related work
  • Invisible invariants: Arons et al. [CAV'01], Pnueli et al. [TACAS'01]
  • Templates – for arithmetic constraints
  • Indexed predicate abstraction: Shuvendu et al. [TOCL'07]
  • Thread quantification: Berdine et al. [CAV'08], Segalov et al. [APLAS'09]