Program analysis and synthesis of parallel systems

Roman Manevich, Ben-Gurion University


Three papers

A Shape Analysis for Optimizing Parallel Graph Programs[POPL’11]

Elixir: a System for Synthesizing Concurrent Graph Programs[OOPSLA’12]

Parameterized Verification of Transactional Memories[PLDI’10]


What’s the connection?

From analysis to language design:

A Shape Analysis for Optimizing Parallel Graph Programs [POPL’11]

Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA’12]

Creates opportunities for more optimizations. Requires other analyses.

Similarities between abstract domains:

Parameterized Verification of Transactional Memories [PLDI’10]




A Shape Analysis for Optimizing Parallel Graph Programs

Roman Manevich², Kathryn S. McKinley¹, Dimitrios Prountzos¹, Keshav Pingali¹,²

1: Department of Computer Science, The University of Texas at Austin

2: Institute for Computational Engineering and Sciences, The University of Texas at Austin


Motivation

  • Graph algorithms are ubiquitous: computer graphics, computational biology, social networks

  • Goal: Compiler analysis for optimization of parallel graph algorithms


Minimum Spanning Tree Problem

[Figure: example weighted graph with nodes a–g]




Boruvka’s Minimum Spanning Tree Algorithm

[Figure: one Boruvka step – node a merges with its lightest neighbor c; the contracted node a,c keeps the lighter of any parallel edges]

  • Build MST bottom-up:

    repeat {
      pick arbitrary node ‘a’
      merge with lightest neighbor ‘lt’
      add edge ‘a-lt’ to MST
    } until graph is a single node


Parallelism in Boruvka

[Figure: example weighted graph – independent Boruvka activities can proceed in parallel (pseudocode as above)]


Non-conflicting iterations

[Figure: two Boruvka activities touching disjoint neighborhoods (pseudocode as above)]


Non-conflicting iterations

[Figure: resulting graph after both merges – contracted nodes a,c and f,g (pseudocode as above)]


Conflicting iterations

[Figure: two Boruvka activities touching overlapping neighborhoods (pseudocode as above)]


Optimistic parallelization in Galois


  • Programming model

    • Client code has sequential semantics

    • Library of concurrent data structures

  • Parallel execution model

    • Thread-level speculation (TLS)

    • Activities executed speculatively

  • Conflict detection

    • Each node/edge has associated exclusive lock

    • Graph operations acquire locks on read/written nodes/edges

    • Lock owned by another thread ⇒ conflict ⇒ iteration rolled back

    • All locks released at the end

  • Two main overheads

    • Locking

    • Undo actions
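The speculative discipline above can be sketched for a single activity: acquire per-element locks up front, log backups before each mutation, roll back on abort, and release everything at the end. The class and method names are illustrative, not the actual Galois API:

```java
import java.util.*;
import java.util.concurrent.locks.ReentrantLock;

// Minimal sketch of optimistic execution with per-node locks and an undo log.
class SpeculativeIteration {
    static class Node {
        final ReentrantLock lock = new ReentrantLock();
        int data;
        Node(int d) { data = d; }
    }

    // Returns true if the activity committed, false if it was rolled back.
    static boolean run(List<Node> touched, int newValue) {
        List<Node> acquired = new ArrayList<>();
        Deque<Runnable> undo = new ArrayDeque<>();       // undo actions, LIFO
        try {
            for (Node n : touched) {
                if (!n.lock.tryLock()) return false;     // lock owned elsewhere: conflict
                acquired.add(n);
            }
            for (Node n : touched) {
                int old = n.data;
                undo.push(() -> n.data = old);           // log backup before mutating
                n.data = newValue;
            }
            undo.clear();                                // commit: discard the undo log
            return true;
        } finally {
            while (!undo.isEmpty()) undo.pop().run();    // abort path: roll back in LIFO order
            for (Node n : acquired) n.lock.unlock();     // all locks released at the end
        }
    }
}
```

The two overheads named above are visible here directly: the `tryLock` calls (locking) and the closure pushed per mutation (undo actions) — exactly what the analysis tries to eliminate.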


Generic optimization structure

Program → Program Analyzer → Annotated Program → Program Transformer → Optimized Program


Overheads (I): locking

  • Optimizations

    • Redundant locking elimination

    • Lock removal for iteration private data

    • Lock removal for lock domination

  • ACQ(P): set of definitely acquired locks per program point P

  • Given method call M at P:

    Locks(M) ⊆ ACQ(P) ⇒ Redundant Locking


Overheads (II): undo actions

foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}

In the first part of the loop body the lockset grows; once the lockset is stable, the rest of the iteration is failsafe.

Program point P is failsafe if:
∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)


Lockset analysis

GSet<Node> wl = new GSet<Node>();
wl.addAll(g.getNodes());
GBag<Weight> mst = new GBag<Weight>();
// ... Boruvka loop as above ...

  • Redundant locking: Locks(M) ⊆ ACQ(P)

  • Undo elimination: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)

  • Need to compute ACQ(P); operations that cannot be optimized remain a runtime overhead


    The optimization, technically

    • Each graph method m(arg1,…,argk, flag) takes an optimization-level flag

      • flag=LOCK – acquire locks

      • flag=UNDO – log undo (backup) data

      • flag=LOCK_UNDO – (default) acquire locks and log undo

      • flag=NONE – no extra work

  • Example: Edge e = g.getEdge(lt, n, NONE)


  • Analysis challenges

    • The usual suspects:

      • Unbounded memory ⇒ undecidability

      • Aliasing, Destructive updates

    • Specific challenges:

      • Complex ADTs: unstructured graphs

      • Heap objects are locked

      • Adapt abstraction to ADTs

    • We use Abstract Interpretation [CC’77]

      • Balance precision and realistic performance


    Shape analysis overview

    • Inputs: Boruvka.java plus ADT specifications
      • Graph spec: Graph { @rep nodes @rep edges }
      • Set spec: Set { @rep cont }
    • The specs abstract the concrete ADT implementations in the Galois library (e.g., HashMap-based Graph, tree-based Set)
    • Shape analysis with predicate discovery produces the optimized Boruvka.java


    ADT specification

    Abstract ADT state by virtual set fields

    Graph<ND,ED> {
      @rep set<Node> nodes
      @rep set<Edge> edges

      @locks(n + n.rev(src) + n.rev(src).dst +
             n.rev(dst) + n.rev(dst).src)
      @op(
        nghbrs = n.rev(src).dst + n.rev(dst).src,
        ret = new Set<Node<ND>>(cont=nghbrs)
      )
      Set<Node> neighbors(Node n);
    }

    Client code (Boruvka.java):
    ...
    Set<Node> S1 = g.neighbors(n);
    ...

    Assumption: Implementation satisfies Spec


    Modeling ADTs

    Graph<ND,ED> {
      @rep set<Node> nodes
      @rep set<Edge> edges
      @locks(n + n.rev(src) + n.rev(src).dst +
             n.rev(dst) + n.rev(dst).src)
      @op(
        nghbrs = n.rev(src).dst + n.rev(dst).src,
        ret = new Set<Node<ND>>(cont=nghbrs)
      )
      Set<Node> neighbors(Node n);
    }

    [Figure: concrete heap – nodes a, b, c connected by edge objects through src/dst fields]


    Modeling ADTs

    Abstract State

    (same Graph spec as above)

    [Figure: abstract state – the @rep sets nodes and edges, the nghbrs nodes reached via src/dst, and the returned set ret with cont pointing to them]


    Abstraction scheme

    [Figure: two distinct set objects S1, S2 whose cont elements are locked, abstracted as (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)]

    • Parameterized by a set of LockPaths:

      L(Path) ≡ ∀o . o ∈ Path ⇒ Locked(o)

      • Tracks a subset of the must-be-locked objects

    • Abstract domain elements have the form: Aliasing-configs × 2^LockPaths × …


    Joining abstract states

    ( x≠y ∧ L(y.nd) ) ⊔ ( x≠y ∧ L(y.nd) ∧ L(x.rev(src)) ) ⊔ ( (x=y) ∧ L(x.nd) )
    = ( x≠y ∧ L(y.nd) ) ∨ ( (x=y) ∧ L(x.nd) )

    • Aliasing is crucial for precision

    • May-be-locked does not enable our optimizations

    • #Aliasing-configs: small constant (6)


    Example invariant in Boruvka

    (Boruvka code as above)

    • The immediate neighbors of a and lt are locked:

    ( a ≠ lt )
    ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst))
    ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src)
    ∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src))
    ∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst)
    ∧ …


    Heuristics for finding LockPaths

    [Figure: type hierarchy – S : Set<Node> –cont→ Node –nd→ NodeData]

    • Hierarchy Summarization (HS)

      • Paths of the form x.(fld)*

      • Type hierarchy graph acyclic ⇒ bounded number of paths

      • Preflow-Push:

        • L(S.cont) ∧ L(S.cont.nd)

        • Nodes in set S and their data are locked


    Footprint graph heuristic

    • Footprint Graphs (FG) [Calcagno et al. SAS’07]

      • All acyclic paths from arguments of an ADT method to locked objects

      • Paths of the form x.(fld | rev(fld))*

      • Delaunay Mesh Refinement:

        L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst))

        ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src)

      • Nodes in set S and all of their immediate neighbors are locked

    • Composition of HS and FG

      • Preflow-Push: L(a.rev(src).ed)


    Experimental evaluation

    • Implemented on top of TVLA

      • Encode abstraction by 3-Valued Shape Analysis [SRW TOPLAS’02]

    • Evaluation on 4 Lonestar Java benchmarks

    • Inferred all available optimizations

    • # abstract states practically linear in program size


    Impact of optimizations for 8 threads

    [Chart: speedups measured on an 8-core Intel Xeon @ 3.00 GHz]


    Note 1

    • How to map abstract domain presented so far to TVLA?

      • Example invariant: (x≠y ∧ L(y.nd)) ∨ (x=y ∧ L(x.nd))

      • Unary abstraction predicate x(v) for pointer x

      • Unary non-abstraction predicate L[x.p] for pointer x and path p

      • Use partial join

      • Resulting abstraction similar to the one shown


    Note 2

    • How to come up with abstraction for similar problems?

      • Start by constructing a manual proof

        • Hoare Logic

      • Examine resulting invariants and generalize into a language of formulas

        • May need to be further specialized for a given program – interesting problem (machine learning/refinement)

      • How to get sound transformers?


    Note 3

    How did we avoid considering all interleavings?

    Proved non-interference side theorem


    Elixir: A System for Synthesizing Concurrent Graph Programs

    Dimitrios Prountzos1

    Roman Manevich2

    Keshav Pingali1

    1. The University of Texas at Austin

    2. Ben-Gurion University of the Negev


    Goal

    • Graph algorithms are ubiquitous

      Social network analysis, Computer graphics, Machine learning, …

    • Difficult to parallelize due to their irregular nature

    • Best algorithm and implementation usually

      • Platform dependent

      • Input dependent

    • Need to easily experiment with different solutions

    • Focus: Fixed graph structure

      • Only change labels on nodes and edges

      • Each activity touches a fixed number of nodes

    Allow the programmer to easily implement correct and efficient parallel graph algorithms


    Example: Single-Source Shortest-Path

    [Figure: weighted directed graph with source node S and tentative node distances]

    relax-edge:
    if dist(A) + w(A,C) < dist(C) then
      dist(C) = dist(A) + w(A,C)

    • Problem formulation

      • Compute the shortest distance from source node S to every other node

    • Many algorithms

      • Bellman-Ford (1957)

      • Dijkstra (1959)

      • Chaotic relaxation (Miranker 1969)

      • Delta-stepping (Meyer et al. 1998)

    • Common structure

      • Each node has a label dist holding the known shortest distance from S

    • Key operation

      • relax-edge(u,v)


    Dijkstra’s algorithm

    [Figure: Dijkstra execution – priority queue holds entries such as <C,3>, <B,5>, <E,6>, <D,7>]

    Scheduling of relaxations:

    • Use a priority queue of nodes, ordered by label dist

    • Iterate over nodes u in priority order

    • On each step: relax all neighbors v of u

      • Apply relax-edge to all (u,v)
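This priority-ordered schedule can be sketched in Java. A minimal version assuming an adjacency-map graph; the representation and names are illustrative, not the Elixir/Galois graph type:

```java
import java.util.*;

// Dijkstra-style schedule: a priority queue of nodes ordered by dist,
// relaxing all out-edges of the minimum node. g.get(u) = list of {v, w} edges.
class Dijkstra {
    static Map<Integer, Integer> sssp(Map<Integer, List<int[]>> g, int src) {
        Map<Integer, Integer> dist = new HashMap<>();
        PriorityQueue<int[]> pq =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[1]));
        dist.put(src, 0);
        pq.add(new int[]{src, 0});
        while (!pq.isEmpty()) {
            int[] top = pq.poll();
            int u = top[0];
            if (top[1] > dist.get(u)) continue;          // skip stale queue entries
            for (int[] e : g.getOrDefault(u, List.of())) {
                int v = e[0], w = e[1];
                int nd = dist.get(u) + w;                // relax-edge(u,v)
                if (nd < dist.getOrDefault(v, Integer.MAX_VALUE)) {
                    dist.put(v, nd);
                    pq.add(new int[]{v, nd});
                }
            }
        }
        return dist;
    }
}
```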


    Chaotic relaxation

    Scheduling of relaxations:

    • Use an unordered set of edges

    • Iterate over edges (u,v) in any order

    • On each step:

      • Apply relax-edge to edge (u,v)

    [Figure: graph with a worklist of edges (C,D), (B,C), (S,A), (C,E)]
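The unordered schedule amounts to applying relax until no edge changes a label. A minimal Bellman-Ford-style sketch in Java, assuming edges given as {u, v, w} triples (an illustrative encoding):

```java
import java.util.*;

// Chaotic relaxation: relax edges in arbitrary order until fixpoint.
// With non-negative weights this converges to the same distances as
// Dijkstra's ordered schedule, typically doing more work.
class ChaoticRelaxation {
    static Map<Integer, Integer> sssp(List<int[]> edges, int src) {
        Map<Integer, Integer> dist = new HashMap<>();
        dist.put(src, 0);
        boolean changed = true;
        while (changed) {                  // iterate relax until fixpoint
            changed = false;
            for (int[] e : edges) {        // any edge order works
                Integer du = dist.get(e[0]);
                if (du == null) continue;  // source label still unknown
                int nd = du + e[2];
                if (nd < dist.getOrDefault(e[1], Integer.MAX_VALUE)) {
                    dist.put(e[1], nd);    // relax-edge(u,v)
                    changed = true;
                }
            }
        }
        return dist;
    }
}
```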


    Insights behind Elixir

    Parallel graph algorithm = Operators (what should be done) + Schedule (how it should be done)

    • Schedule = static schedule + dynamic schedule

      • Unordered/ordered algorithms; order of activity processing

      • Identifying new activities: the operator delta

    • Terminology follows the “TAO of parallelism” [PLDI 2011]


    Insights behind Elixir

    Parallel graph algorithm

    q = new PrQueue
    q.enqueue(SRC)
    while (!q.empty) {
      a = q.dequeue
      for each e = (a,b,w) {
        if dist(a) + w < dist(b) {
          dist(b) = dist(a) + w
          q.enqueue(b)
        }
      }
    }

    A Dijkstra-style algorithm: the operator is the relaxation in the inner loop; the schedule (static + dynamic) is the priority-queue ordering and the enqueueing of new activities


    Contributions

    Parallel Graph Algorithm

    • Language

      • Operators/Schedule separation

      • Allows exploration of implementation space

    • Operator Delta Inference

      • Precise Delta required for efficient fixpoint computations

    • Automatic Parallelization

      • Inserts synchronization to atomically execute operators

      • Avoids data-races / deadlocks

      • Specializes parallelization based on scheduling constraints



    SSSP in Elixir

    Graph [ nodes(node : Node, dist : int)
            edges(src : Node, dst : Node, wt : int) ]        // Graph type

    relax = [ nodes(node a, dist ad)
              nodes(node b, dist bd)
              edges(src a, dst b, wt w)
              bd > ad + w ] ➔ [ bd = ad + w ]                // Operator

    sssp = iterate relax ≫ schedule                          // Fixpoint statement


    Operators

    Cautious by construction – easy to generalize

    relax = [ nodes(node a, dist ad)
              nodes(node b, dist bd)
              edges(src a, dst b, wt w)                      // Redex pattern
              bd > ad + w ]                                  // Guard
            ➔ [ bd = ad + w ]                                // Update

    sssp = iterate relax ≫ schedule

    [Figure: redex before/after – if bd > ad + w, node b’s label becomes ad + w]


    Fixpoint statement

    Graph [ nodes(node : Node, dist : int) edges(src : Node, dst : Node, wt : int) ]

    relax = [ nodes(node a, dist ad) nodes(node b, dist bd)
              edges(src a, dst b, wt w) bd > ad + w ] ➔ [ bd = ad + w ]

    sssp = iterate relax ≫ schedule                          // Scheduling expression

    Apply the operator until fixpoint


    Scheduling examples

    (label-correcting pseudocode as above)

    Graph [ nodes(node : Node, dist : int) edges(src : Node, dst : Node, wt : int) ]

    relax = [ nodes(node a, dist ad) nodes(node b, dist bd)
              edges(src a, dst b, wt w) bd > ad + w ] ➔ [ bd = ad + w ]

    sssp = iterate relax ≫ schedule

    Locality-enhanced label-correcting:
    group b ≫ unroll 2 ≫ approx metric ad

    Dijkstra-style:
    metric ad ≫ group b


    Operator Delta Inference



    Identifying the delta of an operator

    [Figure: after relax1 updates node b, which edges incident to a and b may newly satisfy the guard?]

    Delta Inference Example

    [Figure: relax1 is applied to edge (a,b) with weight w1; edge (c,b) with weight w2 points into b]

    Query program (checked by an SMT solver):

    assume(da + w1 < db)
    assume ¬(dc + w2 < db)
    db_post = da + w1
    assert ¬(dc + w2 < db_post)

    The assertion holds ⇒ (c,b) does not become active
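The validity of this particular query can be sanity-checked without an SMT solver by enumerating small integer values. The brute-force stand-in below follows the slide’s variable names; it is an illustration of the query, not the system’s actual solver interface:

```java
// Enumerate small values for da, db, dc, w1, w2: wherever the assumptions
// hold, the assertion must also hold, so edge (c,b) cannot become active
// after relax1 lowers db to da + w1.
class DeltaCheck {
    static boolean cbStaysInactive(int range) {
        for (int da = 0; da < range; da++)
            for (int db = 0; db < range; db++)
                for (int dc = 0; dc < range; dc++)
                    for (int w1 = 0; w1 < range; w1++)
                        for (int w2 = 0; w2 < range; w2++) {
                            if (!(da + w1 < db)) continue;      // assume(da + w1 < db)
                            if (dc + w2 < db) continue;         // assume ¬(dc + w2 < db)
                            int dbPost = da + w1;               // db_post = da + w1
                            if (dc + w2 < dbPost) return false; // assert ¬(dc + w2 < db_post)
                        }
        return true;                                            // assertion never violated
    }
}
```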


    Delta inference example – active

    Apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a

    [Figure: relax1 on edge (a,b) with weight w1, then relax2 on outgoing edge (b,c) with weight w2]

    Query program (checked by an SMT solver):

    assume(da + w1 < db)
    assume ¬(db + w2 < dc)
    db_post = da + w1
    assert ¬(db_post + w2 < dc)

    The assertion can fail ⇒ (b,c) may become active


    Influence patterns

    [Figure: the ways an updated edge (a,b) can overlap a candidate redex edge (c,d): b=c, a=d, a=c, a=c ∧ b=d, b=d, a=d ∧ b=c]


    System architecture

    Algorithm Spec → Elixir (synthesize code, insert synchronization) → C++ Program

    Runtime support: Galois/OpenMP parallel runtime – parallel thread pool, graph implementations, worklist implementations


    Experiments

    Compare against hand-written parallel implementations


    SSSP results

    [Chart: runtime per implementation variant – group + unroll improve locality]

    24-core Intel Xeon @ 2 GHz, USA Florida road network (1M nodes, 2.7M edges)


    Breadth-First search results

    [Charts: scale-free graph (1M nodes, 8M edges) and USA road network (24M nodes, 58M edges)]


    Conclusion

    • Graph algorithm = Operators + Schedule

      • Elixir language : imperative operators + declarative schedule

    • Allows exploring implementation space

    • Automated reasoning for efficiently computing fixpoints

    • Correct-by-construction parallelization

    • Performance competitive with hand-parallelized code


    Parameterized Verification of Software Transactional Memories

    Michael Emmi, Rupak Majumdar, Roman Manevich


    Motivation

    • Transactional memories [Herlihy ‘93]

      • Programmer writes code with coarse-grained atomic blocks

      • Transaction manager takes care of conflicts providing illusion of sequential execution

    • Strict serializability – correctness criterion

      • Formalizes “illusion of sequential execution”

    • Parameterized verification

      • Formal proof for given implementation

      • For every number of threads

      • For every number of memory objects

      • For every number and length of transactions


    STM terminology

    • Statements: reads, writes, commit, abort

    • Transaction: reads and writes of variables followed by commit (committing transaction) or abort (aborting transaction)

    • Word: interleaved sequence of transactions of different threads

    • Conflict: two statements conflict if

      • One is a read of variable X and the other is a commit of a transaction that writes to X

      • Both are commits of transactions that write to X
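The conflict definition above transcribes directly into code. A small sketch where a statement is modeled as a kind plus the relevant variable set (a read carries the variable it reads; a commit carries its transaction’s write set) — an illustrative encoding, not from the paper:

```java
import java.util.*;

// Direct transcription of the two conflict cases defined above.
class Conflicts {
    enum Kind { READ, COMMIT }

    // READ: vars = {X}; COMMIT: vars = write set of the committing transaction
    record Stmt(Kind kind, Set<String> vars) {}

    static boolean conflict(Stmt s1, Stmt s2) {
        boolean sharesVar = !Collections.disjoint(s1.vars(), s2.vars());
        // case 1: a read of X vs. a commit of a transaction writing X
        if (s1.kind() == Kind.READ && s2.kind() == Kind.COMMIT) return sharesVar;
        if (s1.kind() == Kind.COMMIT && s2.kind() == Kind.READ) return sharesVar;
        // case 2: two commits of transactions that both write X
        if (s1.kind() == Kind.COMMIT && s2.kind() == Kind.COMMIT) return sharesVar;
        return false;   // two reads never conflict
    }
}
```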


    Safety property: strict serializability

    There is a serialization for the committing threads such that order of conflicts is preserved

    Order of non-overlapping transactions remains the same


    Safety property: strict serializability

    Example word:

    (rd X t1), (rd Y t2), (wr X t2), (commit t2), (commit t1)

    can be serialized to:

    (rd X t1), (commit t1), (rd Y t2), (wr X t2), (commit t2)


    Main results

    Challenging – requires reasoning on both universal and existential properties

    • First automatic verification of strict serializability for transactional memories

      • TPL, DSTM, TL2

    • New proof technique:

      • Template-based invisible invariant generation

      • Abstract checking algorithm to check inductive invariants


    Outline

    Strict serializability verification approach

    Automating the proof

    Experiments

    Conclusion

    Related work


    Proof roadmap 1

    Goal: prove model M is strictly serializable

    • Given a strict-serializability reference model RSS, reduce checking strict serializability to checking that M refines RSS

    • Reduce refinement to checking safety

      • Safety property SIM: whenever M can execute a statement, so can RSS

      • Check SIM on the product system M × RSS


    Proof roadmap 2

    • Model STMs M and RSS in first-order logic

      • TM models use set data structures and typestate bits

    • Check safety by generating strong enough candidate inductive invariant and checking inductiveness

    • Use observations on structure of transactional memories

    • Use sound first-order reasoning


    Reference strict serializability model

    • Guerraoui, Singh, Henzinger, Jobstmann [PLDI’08]

    • RSS: the most liberal specification of a strictly serializable system

      • Allows the largest language of strictly serializable executions

    • M is strictly serializable iff every word of M is also a word of RSS

      • Language(M) ⊆ Language(RSS)

      • M refines RSS


    Modeling transactional memories

    • Mn,k = (predicates, actions)

      • Predicate: ranked relation symbol p(t), q(t,v), …

      • Binary predicates are used for sets, so instead of rs(t,v) I’ll write v ∈ rs(t)

      • Action: a(t,v) = if pre(a) then p’(v)=…, q’(u,v)=…

    • Universe = set of k thread individuals and n memory individuals

    • State S = a valuation to the predicates


    Reference model (RSS) predicates

    • Typestates:

      • RSS.finished(t), RSS.started(t), RSS.pending(t), RSS.invalid(t)

    • Read/write sets

      • RSS.rs(t,v), RSS.ws(t,v)

    • Prohibited read/write sets

      • RSS.prs(t,v), RSS.pws(t,v)

    • Weak-predecessor

      • RSS.wp(t1,t2)


    DSTM predicates

    • Typestates:

      • DSTM.finished(t), DSTM.validated(t), DSTM.invalid(t), DSTM.aborted(t)

    • Read/own sets

      • DSTM.rs(t,v), DSTM.os(t,v)


    RSS commit(t) action

    • if ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) then
        ∀t1,t2 . RSS.wp’(t1,t2) ⇔ t1≠t ∧ t2≠t ∧
          ( RSS.wp(t1,t2) ∨ ( RSS.wp(t,t2) ∧ ( RSS.wp(t1,t) ∨ ∃v . v ∈ RSS.ws(t1) ∧ v ∈ RSS.ws(t) ) ) ) …

    Here t is the executing thread; ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t) is the action precondition, primed predicates refer to the post-state, unprimed ones to the current state, and the last conjunct captures a write-write conflict.

    DSTM commit(t) action

    if DSTM.validated(t) then
      ∀t1 . DSTM.validated’(t1) ⇔ t1≠t ∧ DSTM.validated(t1) ∧
        ¬∃v . v ∈ DSTM.rs(t1) ∧ v ∈ DSTM.os(t1) …

    (read-own conflict)


    FOTS states and execution

    [Figure: state S1 – thread individuals t1, t2 and memory-location individual v; t1 executes “rd v t1”]

    [Figure: state S2 – after the read, predicate evaluations DSTM.started(t1)=1 and DSTM.rs(t1,v)=1]

    [Figure: state S3 – after “wr v t2”, DSTM.started holds for t2 and v is in DSTM.ws(t2)]


    Product system

    The product of two systems: AB

    Predicates = A.predicates B.predicates

    Actions =commit(t) = { if (A.pre  B.pre) then … }rd(t,v) = { if (A.pre  B.pre) then … }…

    M refines RSS iff on every executionSIM holds: action a M.pre(a)  RSS.pre(a)


    Checking DSTM refines RSS

    Proof rule: DSTM × RSS ⊨ SIM implies DSTM refines RSS

    The only precondition in RSS is for commit(t), so we need to check that

    SIM = ∀t . DSTM.validated(t) ⇒ ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t)

    holds in all reachable states of DSTM × RSS. But how do we check this safety property?

    Checking safety by invisible invariants

    • How do we prove that property ψ holds for all reachable states of system M?

    • Pnueli, Ruah, Zuck [TACAS’01]

    • Come up with an inductive invariant φ that contains the reachable states of M and strengthens ψ:

      I1: Initial ⊆ φ    I2: φ ∧ transition ⇒ φ’    I3: φ ⇒ ψ
      ⟹ M ⊨ ψ

    Strict serializability proof rule

    I1: Initial ⊆ φ    I2: φ ∧ transition ⇒ φ’    I3: φ ⇒ SIM
    ⟹ DSTM × RSS ⊨ SIM

    Proof roadmap:

    1. Divine candidate invariant φ

    2. Prove I1, I2, I3


    Two challenges

    I1: Initial ⊆ φ    I2: φ ∧ transition ⇒ φ’    I3: φ ⇒ SIM
    ⟹ DSTM × RSS ⊨ SIM

    But:

    • How do we find a candidate φ? Infinite space of possibilities

    • Given a candidate φ, how do we check the proof rule? Checking A ⇒ B is undecidable for first-order logic


    Our solution

    I1: Initial ⊆ φ    I2: φ ∧ transition ⇒ φ’    I3: φ ⇒ SIM
    ⟹ DSTM × RSS ⊨ SIM

    Utilize insights on transactional memory implementations:

    • How do we find a candidate φ? Use templates and iterative weakening

    • Given a candidate φ, how do we check the proof rule? Use abstract checking


    Invariant for DSTMRSS

    P1: t, t1 . RSS.wp(t,t1) RSS.invalid(t)  RSS.pending(t)  v . vRSS.ws(t1) vRSS.ws(t)

    P2: t, v . vRSS.rs(t)  DSTM.aborted(t)  vDSTM.rs(t)

    P3: t, v . vRSS.ws(t)  DSTM.aborted(t)  vDSTM.os(t)

    P4: t . DSTM.validated(t)  RSS.wp(t,t)

    P5: t . DSTM.validated(t)  RSS.invalid(t)

    P6: t . DSTM.validated(t)  RSS.pending(t)

    Inductive invariant involving only RSS – can use for all future proofs


    Templates for DSTMRSS

    P1: t, t1 . 1(t,t1)  2(t)   3(t)  v . v4(t1)  v5(t)

    P2: t, v . v1(t)  2(t)  v3(t)

    P3: t, v . V1(t)  2(t)  v3(t)

    P4: t . 1(t)  2(t,t)

    P5: t . 1(t)  2(t)

    P6: t . 1(t)  2(t)


    Templates for DSTMRSS

    t, t1 . 1(t,t1)  2(t)   3(t)  v . v4(t1)  v5(t)

    t, v . v1(t)  2(t)  v3(t)

    t . 1(t)  2(t,t)

    • Why templates?

    • Makes invariant separable

    • Controls complexity of invariants

    • Adding templates enables refinement


    Mining candidate invariants

    • Use predefined set of templates to specify structure of candidate invariants

      • t,v . 1 2 3

      • 1, 2, 3 are predicates of M or their negations

      • Existential formulas capturing 1-level conflictsv . v4(t1)  v5(t2)

    • Mine candidate invariants from concrete execution


    Iterative invariant weakening

    I1: Initial ⊆ φ    I2: φ ∧ transition ⇒ φ’    I3: φ ⇒ SIM
    ⟹ DSTM × RSS ⊨ SIM

    • Initial candidate invariant C0 = P1 ∧ P2 ∧ … ∧ Pk

    • Try to prove I2: φ ∧ transition ⇒ φ’

      C1 = { Pi | C0 ∧ transition ⇒ Pi’, for Pi ∈ C0 }

    • If C1 = C0 then we have an inductive invariant

    • Otherwise, compute C2 = { Pi | C1 ∧ transition ⇒ Pi’, for Pi ∈ C1 }

    • Repeat until either

      • An inductive invariant is found – then check I3: Ck ⇒ SIM

      • The top {} is reached – the trivial inductive invariant
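The weakening loop above is a Houdini-style fixpoint over candidate conjuncts. A generic sketch, where the `preserved` parameter stands in for the (abstract) check "C ∧ transition ⇒ Pi′"; this interface is an assumption for illustration, not the paper’s actual entailment procedure:

```java
import java.util.*;
import java.util.function.BiPredicate;

// Repeatedly drop every conjunct not preserved by the transition relation
// under the current candidate; the surviving set is inductive by construction.
class InvariantWeakening {
    static <P> Set<P> weaken(Set<P> candidate, BiPredicate<Set<P>, P> preserved) {
        Set<P> current = new HashSet<>(candidate);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Iterator<P> it = current.iterator(); it.hasNext(); ) {
                P p = it.next();
                if (!preserved.test(current, p)) {  // "current ∧ transition ⇒ p'" fails
                    it.remove();                    // weaken: drop the conjunct
                    changed = true;
                    break;                          // re-check against the weaker set
                }
            }
        }
        return current;                             // fixpoint: inductive subset (possibly {})
    }
}
```

Dropping a conjunct can invalidate others that relied on it, which is why the loop restarts until the set is stable.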


    Weakening illustration

    [Figure: candidate conjuncts P1, P2, P3 are dropped one by one until the remaining conjunction is inductive]


    Abstract proof rule

    I1:(Initial)   I2:abs_transition(())  ’ I3: ()  SIM

    I1: Initial   I2:   transition  ’ I3:   SIM

    DSTM  RSS SIM

    DSTM  RSS SIM

    Abstracttransformer

    Approximateentailment

    Formulaabstraction


    Conclusion

    • Novel invariant generation using templates – extends applicability of invisible invariants

    • Abstract domain and reasoning to check invariants without state explosion

    • Proved strict-serializability for TPL, DSTM, TL2

      • BLAST and TVLA failed


    Verification results

    [Table: verification results for TPL, DSTM, TL2]


    Insights on transactional memories

    • Transition relation is symmetric – thread identifiers are not used explicitly: p’(t,v) = … t1 … t2

    • Executing thread t interacts only with an arbitrary thread or a conflict-adjacent thread

    • Arbitrary thread: ∃v . v ∈ TL2.rs(t1) ∨ v ∈ TL2.ws(t1)

    • Conflict-adjacent: ∃v . v ∈ DSTM.rs(t1) ∧ v ∈ DSTM.ws(t)


    Conflict adjacency

    [Figure: thread t’s read set overlaps t2’s write set on v1 (read-write conflict); t2’s write set overlaps t3’s write set on v2 (write-write conflict)]

    ∃v . v ∈ rs(t) ∧ v ∈ DSTM.ws(t2)
    ∃v . v ∈ ws(t1) ∧ v ∈ DSTM.ws(t2)


    Conflict adjacency

    [Figure: chain of conflict-adjacent threads t, t2, t3, …]


    Related work

    • Reduction theorems – Guerraoui et al. [PLDI’08, CONCUR’08]

    • Manually supplied invariants for a fixed number of threads and variables + PVS – Cohen et al. [FMCAD’07]

    • Predicate Abstraction + Shape Analysis

      • SLAM, BLAST, TVLA


    Related work

    Invisible invariants – Arons et al. [CAV’01], Pnueli et al. [TACAS’01]

    Templates – previously used for arithmetic constraints

    Indexed predicate abstraction – Lahiri et al. [TOCL’07]

    Thread quantification – Berdine et al. [CAV’08], Segalov et al. [APLAS’09]


    Thank You!

