
Program Analysis and Synthesis of Parallel Systems

Roman Manevich, Ben-Gurion University


Three papers

A Shape Analysis for Optimizing Parallel Graph Programs [POPL'11]

Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA'12]

Parameterized Verification of Transactional Memories [PLDI'10]


What’s the connection?

From analysis to language design: the shape analysis for optimizing parallel graph programs [POPL'11] led to Elixir, a system for synthesizing concurrent graph programs [OOPSLA'12]. Elixir creates opportunities for more optimizations and requires other analyses. The abstract domains of the shape analysis are similar to those used for parameterized verification of transactional memories [PLDI'10].




A Shape Analysis for Optimizing Parallel Graph Programs

Roman Manevich², Kathryn S. McKinley¹, Dimitrios Prountzos¹, Keshav Pingali¹,²

1: Department of Computer Science, The University of Texas at Austin

2: Institute for Computational Engineering and Sciences, The University of Texas at Austin


Motivation

  • Graph algorithms are ubiquitous: computer graphics, computational biology, social networks

  • Goal: compiler analysis for optimization of parallel graph algorithms


Minimum Spanning Tree Problem

[Figure: example weighted graph on nodes a–g with edge weights 1–7.]




Boruvka’sMinimum Spanning Tree Algorithm

[Figure: one Boruvka step on the example graph; node a is merged with its lightest neighbor lt = c into a single node a,c, and the edge a–c is added to the MST.]

  • Build MST bottom-up

    • repeat {

    • pick arbitrary node ‘a’

    • merge with lightest neighbor ‘lt’

    • add edge ‘a-lt’ to MST

    • } until graph is a single node


Parallelism in Boruvka

[Figure: the example graph; Boruvka steps on different parts of the graph can run in parallel.]

  • Build MST bottom-up

    • repeat {

    • pick arbitrary node ‘a’

    • merge with lightest neighbor ‘lt’

    • add edge ‘a-lt’ to MST

    • } until graph is a single node


Non-conflicting iterations

[Figure: iterations that merge a with c and f with g touch disjoint neighborhoods and do not conflict.]

  • Build MST bottom-up

    • repeat {

    • pick arbitrary node ‘a’

    • merge with lightest neighbor ‘lt’

    • add edge ‘a-lt’ to MST

    • } until graph is a single node




Conflicting iterations

[Figure: the example graph; two iterations whose neighborhoods overlap conflict and cannot proceed in parallel.]

  • Build MST bottom-up

    • repeat {

    • pick arbitrary node ‘a’

    • merge with lightest neighbor ‘lt’

    • add edge ‘a-lt’ to MST

    • } until graph is a single node


Optimistic parallelization in Galois

[Figure: activities i1, i2, i3 executing speculatively on the graph.]

  • Programming model

    • Client code has sequential semantics

    • Library of concurrent data structures

  • Parallel execution model

    • Thread-level speculation (TLS)

    • Activities executed speculatively

  • Conflict detection

    • Each node/edge has associated exclusive lock

    • Graph operations acquire locks on read/written nodes/edges

    • Lock owned by another thread ⇒ conflict ⇒ iteration rolled back

    • All locks released at the end

  • Two main overheads

    • Locking

    • Undo actions
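To make the execution model concrete, here is a minimal Java sketch of the speculate/commit/rollback pattern described above; the class and method names (Speculation, UndoLog, runIteration, ConflictException) are illustrative, not the actual Galois API.

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Illustrative sketch of thread-level speculation (hypothetical names, not the Galois API).
    final class Speculation {
        static class ConflictException extends RuntimeException {}

        // Undo log: each graph operation records an inverse action before mutating shared state.
        static final class UndoLog {
            private final Deque<Runnable> inverses = new ArrayDeque<>();
            void record(Runnable inverse) { inverses.push(inverse); }
            void rollback()               { while (!inverses.isEmpty()) inverses.pop().run(); }
            void commit()                 { inverses.clear(); }
        }

        // One speculative iteration: run the operator; commit on success,
        // roll back and report failure if a lock owned by another thread was hit.
        static boolean runIteration(Runnable operator, UndoLog undo) {
            try {
                operator.run();
                undo.commit();
                return true;                  // activity committed
            } catch (ConflictException e) {
                undo.rollback();
                return false;                 // activity will be re-executed later
            }
            // in the real system, all locks acquired by the operator are released at this point
        }
    }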


Generic optimization structure

Program → Program Analyzer → Annotated Program → Program Transformer → Optimized Program


Overheads (I): locking

  • Optimizations

    • Redundant locking elimination

    • Lock removal for iteration private data

    • Lock removal for lock domination

  • ACQ(P): set of definitely acquired locks per program point P

  • Given method call M at P:

    Locks(M) ⊆ ACQ(P)  ⇒  Redundant Locking
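As a small illustration (not the paper's implementation; all names here are made up), the redundancy check is just a subset test between the locks a call needs and ACQ(P):

    import java.util.Set;

    // Illustrative sketch: a call M at program point P needs no locking if Locks(M) ⊆ ACQ(P).
    final class LockRedundancy {
        static boolean isRedundantLocking(Set<String> locksOfCall, Set<String> acqAtPoint) {
            return acqAtPoint.containsAll(locksOfCall);   // Locks(M) ⊆ ACQ(P)
        }

        public static void main(String[] args) {
            Set<String> acq = Set.of("a", "a.rev(src)", "a.rev(dst)");   // ACQ(P)
            Set<String> callLocks = Set.of("a", "a.rev(src)");           // Locks(M)
            System.out.println(isRedundantLocking(callLocks, acq));      // true
        }
    }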


Overheads (II): undo actions

foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0)
        w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}

Annotations on the code above: the lockset grows during the first part of the loop body and then becomes stable; from the point where it is stable the iteration is failsafe, so undo logging after that point is unnecessary.

Program point P is failsafe if:   ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
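A corresponding sketch of the failsafe test (again, illustrative names only): P is failsafe when no graph operation reachable from P can need a lock outside ACQ(P), so the iteration can no longer abort past P.

    import java.util.List;
    import java.util.Set;

    // Illustrative sketch: P is failsafe iff every operation Q reachable from P satisfies Locks(Q) ⊆ ACQ(P).
    final class FailsafeCheck {
        static boolean isFailsafe(Set<String> acqAtP, List<Set<String>> locksOfReachableCalls) {
            for (Set<String> locksQ : locksOfReachableCalls) {
                if (!acqAtP.containsAll(locksQ)) {
                    return false;   // some reachable operation may hit an unowned lock -> may still roll back
                }
            }
            return true;            // no future conflict possible; undo actions after P can be dropped
        }
    }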


Lockset analysis

GSet<Node> wl = new GSet<Node>();
wl.addAll(g.getNodes());
GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) {
  // ... same Boruvka loop body as above ...
}

  • Redundant locking:   Locks(M) ⊆ ACQ(P)

  • Undo elimination:   ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)

  • Need to compute ACQ(P)

  • Locking and undo logging are the runtime overheads being eliminated


    The optimization technically

    • Each graph method m(arg1, …, argk, flag) takes an optimization-level flag

      • flag=LOCK – acquire locks

      • flag=UNDO – log undo (backup) data

      • flag=LOCK_UNDO – (default) acquire locks and log undo

      • flag=NONE – no extra work

  • Example: Edge e = g.getEdge(lt, n, NONE)
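For example, a hedged before/after sketch of what the transformer emits (the surrounding variables are from the Boruvka code above, the flag constants from this slide):

    // Default: acquire locks on the touched nodes/edge and log undo data.
    Edge e = g.getEdge(lt, n, LOCK_UNDO);

    // After the analysis proves the locks are already held and the point is failsafe:
    Edge e = g.getEdge(lt, n, NONE);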


    Analysis challenges

    • The usual suspects:

      • Unbounded memory ⇒ undecidability

      • Aliasing, Destructive updates

    • Specific challenges:

      • Complex ADTs: unstructured graphs

      • Heap objects are locked

      • Adapt abstraction to ADTs

    • We use Abstract Interpretation [CC’77]

      • Balance precision and realistic performance


    Shape analysis overview

    The analysis pipeline: Boruvka.java plus the ADT specifications are the input to the shape analysis, which performs predicate discovery and produces an optimized Boruvka.java.

    ADT specifications:
      Graph { @rep nodes  @rep edges }
      Set   { @rep cont }

    The concrete ADT implementations in the Galois library (e.g. a HashMap-based Graph, a tree-based Set) are assumed to conform to these specifications.


    ADT specification

    Abstract ADT state by virtual set fields

    Graph<ND,ED> {
      @rep set<Node> nodes
      @rep set<Edge> edges

      @locks(n + n.rev(src) + n.rev(src).dst +
             n.rev(dst) + n.rev(dst).src)
      @op(
        nghbrs = n.rev(src).dst + n.rev(dst).src,
        ret = new Set<Node<ND>>(cont=nghbrs)
      )
      Set<Node> neighbors(Node n);
    }

    Client code in Boruvka.java:
      ...
      Set<Node> S1 = g.neighbors(n);
      ...

    Assumption: the implementation satisfies the spec.
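To unpack the @op specification, here is a small self-contained Java sketch (illustrative classes, not the Galois implementation) of what nghbrs = n.rev(src).dst + n.rev(dst).src means on a plain edge-object representation:

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative sketch of the neighbors specification on an edge-list graph.
    final class NeighborsSpec {
        static final class Node {}
        static final class Edge { final Node src, dst; Edge(Node s, Node d) { src = s; dst = d; } }

        // neighbors(n) = n.rev(src).dst + n.rev(dst).src :
        // destinations of edges whose src is n, plus sources of edges whose dst is n.
        static Set<Node> neighbors(Set<Edge> edges, Node n) {
            Set<Node> nghbrs = new HashSet<>();
            for (Edge e : edges) {
                if (e.src == n) nghbrs.add(e.dst);   // n.rev(src).dst
                if (e.dst == n) nghbrs.add(e.src);   // n.rev(dst).src
            }
            return nghbrs;
        }
    }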


    Modeling ADTs

    (Graph spec with @locks and @op for neighbors, as above.)

    [Figure: a concrete state with nodes a, b, c; each edge object points to its endpoint nodes through its src and dst fields.]


    Modeling ADTs

    Abstract state

    [Figure: the state after Set<Node> ret = g.neighbors(n): the graph's @rep sets nodes and edges, the computed nghbrs individuals, and the returned Set object whose virtual cont field holds them.]


    Abstraction scheme

    [Figure: two set objects S1 and S2 with cont fields; the abstract fact (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont) records that the contents of both sets are locked.]

    • Parameterized by a set of LockPaths:

      L(Path) ≡ ∀o . o ∈ Path ⇒ Locked(o)

      • Tracks a subset of the must-be-locked objects

    • Abstract domain elements have the form: Aliasing-config × 2^LockPaths × …
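A rough Java sketch of such a domain element (an illustrative encoding, not the TVLA representation): a map from an aliasing configuration to the set of LockPaths known to be locked under it, with a per-configuration partial join that intersects the lock-path sets.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Illustrative sketch of an abstract element: aliasing configuration -> must-be-locked LockPaths.
    final class LockAbstraction {
        final Map<String, Set<String>> elem = new HashMap<>();   // e.g. "S1!=S2" -> {"S1.cont","S2.cont"}

        // Partial join: keep configurations separate; within a shared configuration only
        // paths locked on both sides survive (must-be-locked information).
        LockAbstraction join(LockAbstraction other) {
            LockAbstraction out = new LockAbstraction();
            Set<String> configs = new HashSet<>(elem.keySet());
            configs.addAll(other.elem.keySet());
            for (String c : configs) {
                Set<String> mine = elem.get(c), theirs = other.elem.get(c);
                if (mine != null && theirs != null) {
                    Set<String> both = new HashSet<>(mine);
                    both.retainAll(theirs);
                    out.elem.put(c, both);
                } else {
                    out.elem.put(c, new HashSet<>(mine != null ? mine : theirs));
                }
            }
            return out;
        }
    }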


    Joining abstract states

    Example (the join is partial, per aliasing configuration):

      ( x ≠ y, { L(y.nd) } )  ⊔  ( x ≠ y, { L(y.nd), L(x.rev(src)) } )  =  ( x ≠ y, { L(y.nd) } )

      ( x = y, { L(x.nd) } ) is kept as a separate disjunct

      Result: ( x ≠ y, { L(y.nd) } ) ∨ ( x = y, { L(x.nd) } )

    Aliasing is crucial for precision

    May-be-locked does not enable our optimizations

    #Aliasing-configs: small constant (6)


    Example invariant in Boruvka

    (Boruvka client code as shown earlier.)

    • The immediate neighbors of a and lt are locked:

      ( a ≠ lt )
      ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst))
      ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src)
      ∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src))
      ∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst)
      ∧ …


    Heuristics for finding LockPaths

    [Figure: hierarchy path from a set S: Set<Node> –cont→ Node –nd→ NodeData.]

    • Hierarchy Summarization (HS)

      • x.( fld )*

      • Type hierarchy graph acyclic ⇒ bounded number of paths

      • Preflow-Push:

        • L(S.cont) ∧ L(S.cont.nd)

        • Nodes in set S and their data are locked


    Footprint graph heuristic

    • Footprint Graphs (FG) [Calcagno et al. SAS'07]

      • All acyclic paths from arguments of ADT method to locked objects

      • x.( fld | rev(fld) )*

      • Delaunay Mesh Refinement:

        L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst))

        ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src)

      • Nodes in set S and all of their immediate neighbors are locked

    • Composition of HS, FG

      • Preflow-Push: L(a.rev(src).ed)



    Experimental evaluation

    • Implemented on top of TVLA

      • Encode abstraction by 3-Valued Shape Analysis [SRW TOPLAS’02]

    • Evaluation on 4 Lonestar Java benchmarks

    • Inferred all available optimizations

    • # abstract states practically linear in program size


    Impact of optimizations for 8 threads

    [Chart: runtime impact of the optimizations with 8 threads, measured on an 8-core Intel Xeon @ 3.00 GHz.]


    Note 1

    • How to map abstract domain presented so far to TVLA?

      • Example invariant: (x≠y ∧ L(y.nd)) ∨ (x=y ∧ L(x.nd))

      • Unary abstraction predicate x(v) for pointer x

      • Unary non-abstraction predicate L[x.p] for pointer x and path p

      • Use partial join

      • Resulting abstraction similar to the one shown


    Note 2

    • How to come up with abstraction for similar problems?

      • Start by constructing a manual proof

        • Hoare Logic

      • Examine resulting invariants and generalize into a language of formulas

        • May need to be further specialized for a given program – interesting problem (machine learning/refinement)

      • How to get sound transformers?


    Note 3

    How did we avoid considering all interleavings?

    Proved non-interference side theorem


    Elixir: A System for Synthesizing Concurrent Graph Programs

    Dimitrios Prountzos¹, Roman Manevich², Keshav Pingali¹

    1. The University of Texas at Austin

    2. Ben-Gurion University of the Negev


    Goal

    • Graph algorithms are ubiquitous

      Social network analysis, Computer graphics, Machine learning, …

    • Difficult to parallelize due to their irregular nature

    • Best algorithm and implementation usually

      • Platform dependent

      • Input dependent

    • Need to easily experiment with different solutions

    • Focus: Fixed graph structure

      • Only change labels on nodes and edges

      • Each activity touches a fixed number of nodes

    Goal: allow the programmer to easily implement correct and efficient parallel graph algorithms.


    Example: Single-Source Shortest-Path

    [Figure: example graph with source S and nodes A–G; node labels show the current dist values and edge labels show weights.]

    relax-edge(A,C):  if dist(A) + w(A,C) < dist(C)  then  dist(C) = dist(A) + w(A,C)

    • Problem Formulation

      • Compute the shortest distance from source node S to every other node

    • Many algorithms

      • Bellman-Ford (1957)

      • Dijkstra (1959)

      • Chaotic relaxation (Miranker 1969)

      • Delta-stepping (Meyer et al. 1998)

    • Common structure

      • Each node has a label dist with the known shortest distance from S

    • Key operation

      • relax-edge(u,v)
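A minimal Java sketch of relax-edge on an array-based distance labeling (illustrative only, not Elixir-generated code):

    // relax-edge(u,v): if going through u improves v's distance, update it.
    static boolean relaxEdge(int[] dist, int u, int v, int w) {
        if (dist[u] != Integer.MAX_VALUE && dist[u] + w < dist[v]) {
            dist[v] = dist[u] + w;   // shorter path to v found via u
            return true;             // edges out of v may need relaxing next
        }
        return false;
    }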


    Dijkstra’s algorithm

    [Figure: Dijkstra's algorithm on the example graph; the priority queue holds entries such as <C,3>, <B,5>, <E,6>, <D,7>, and nodes are processed in order of their dist label.]

    Scheduling of relaxations:

    • Use a priority queue of nodes, ordered by label dist

    • Iterate over nodes u in priority order

    • On each step: relax all neighbors v of u

      • Apply relax-edge to all (u,v)
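A self-contained Java sketch of this schedule (illustrative; a label-setting Dijkstra over an adjacency list, assuming non-negative weights):

    import java.util.Arrays;
    import java.util.PriorityQueue;

    // Illustrative Dijkstra-style SSSP; graph[u] is an array of {v, w} pairs.
    final class DijkstraSketch {
        static int[] shortestDistances(int[][][] graph, int src) {
            int n = graph.length;
            int[] dist = new int[n];
            Arrays.fill(dist, Integer.MAX_VALUE);
            dist[src] = 0;
            PriorityQueue<int[]> pq = new PriorityQueue<>((x, y) -> Integer.compare(x[1], y[1]));
            pq.add(new int[] { src, 0 });
            while (!pq.isEmpty()) {
                int[] top = pq.poll();
                int u = top[0], du = top[1];
                if (du > dist[u]) continue;          // stale queue entry
                for (int[] e : graph[u]) {           // e = {v, w}
                    int v = e[0], w = e[1];
                    if (du + w < dist[v]) {          // relax-edge(u, v)
                        dist[v] = du + w;
                        pq.add(new int[] { v, dist[v] });
                    }
                }
            }
            return dist;
        }
    }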


    Chaotic relaxation


    • Scheduling of relaxations:

    • Use unordered set of edges

    • Iterate over edges (u,v) in any order

    • On each step:

      • Apply relax-edge to edge (u,v)

    [Figure: chaotic relaxation on the example graph; the unordered worklist holds edges such as (S,A), (B,C), (C,D), (C,E).]


    Insights behind Elixir

    Parallel graph algorithm = Operators (what should be done) + Schedule (how it should be done).

    The schedule orders activity processing (static schedule) and identifies new activities via the operator delta (dynamic schedule); unordered and ordered algorithms differ in their schedules.

    (Terminology follows "The TAO of Parallelism", PLDI 2011; each dot in the original figure is an activity.)


    Insights behind Elixir

    Parallel graph algorithm = Operators + Schedule. A Dijkstra-style algorithm instantiates both:

      q = new PrQueue
      q.enqueue(SRC)
      while (!q.empty) {
        a = q.dequeue
        for each e = (a,b,w) {
          if dist(a) + w < dist(b) {
            dist(b) = dist(a) + w
            q.enqueue(b)
          }
        }
      }

    The conditional update is the operator; dequeuing in priority order is the static schedule (order of activity processing); enqueuing b identifies new activities (dynamic schedule).


    Contributions

    • Language

      • Operators/Schedule separation

      • Allows exploration of implementation space

    • Operator Delta Inference

      • Precise Delta required for efficient fixpoint computations

    • Automatic Parallelization

      • Inserts synchronization to atomically execute operators

      • Avoids data-races / deadlocks

      • Specializes parallelization based on scheduling constraints



    SSSP in Elixir

    Graph type:
      Graph [ nodes(node : Node, dist : int)
              edges(src : Node, dst : Node, wt : int) ]

    Operator:
      relax = [ nodes(node a, dist ad)
                nodes(node b, dist bd)
                edges(src a, dst b, wt w)
                bd > ad + w ] ➔ [ bd = ad + w ]

    Fixpoint statement:
      sssp = iterate relax ≫ schedule


    Operators

    Cautious by construction – easy to generalize.

    In relax, the nodes/edges part is the redex pattern, bd > ad + w is the guard, and bd = ad + w is the update.

    [Figure: applying relax to an edge (a,b) with weight w: if bd > ad + w, then b's dist label is updated to ad + w.]


    Fixpoint statement

    sssp = iterate relax ≫ schedule

    iterate applies the operator until fixpoint; the expression after ≫ is the scheduling expression.


    Scheduling examples

    sssp = iterate relax ≫ schedule

    Example scheduling expressions:

      Locality-enhanced label-correcting:   group b ≫ unroll 2 ≫ approx metric ad

      Dijkstra-style:                       metric ad ≫ group b

    The Dijkstra-style schedule corresponds to the priority-queue pseudocode shown earlier.


    Operator Delta Inference

    Delta inference addresses the dynamic part of the schedule: identifying the new activities created by applying an operator.


    Identifying the delta of an operator

    [Figure: after relax1 is applied to edge (a,b), which edges around a and b might become active?]

    Delta Inference Example

    [Figure: relax1 applied to edge (a,b) with weight w1; relax2 considered for edge (c,b) with weight w2.]

    Query program posed to the SMT solver:

      assume(da + w1 < db)
      assume ¬(dc + w2 < db)
      db_post = da + w1
      assert ¬(dc + w2 < db_post)

    The assertion holds, so (c,b) does not become active.


    Delta inference example – active

    [Figure: relax1 applied to edge (a,b) with weight w1; relax2 considered for the outgoing edge (b,c) with weight w2.]

    Query program posed to the SMT solver:

      assume(da + w1 < db)
      assume ¬(db + w2 < dc)
      db_post = da + w1
      assert ¬(db_post + w2 < dc)

    Here the assertion can fail, so relax must be applied to all outgoing edges (b,c) such that dc > db + w2 and c ≠ a.


    Influence patterns

    [Figure: influence patterns – the ways a just-applied redex (a,b) can overlap a potential redex (c,d): b=c, a=d, a=c, b=d, a=c ∧ b=d, a=d ∧ b=c.]


    System architecture

    Algorithm spec → Elixir (synthesize code, insert synchronization) → C++ program → Galois/OpenMP parallel runtime (parallel thread pool, graph implementations, worklist implementations).


    Experiments

    Compare against hand-written parallel implementations


    SSSP results

    [Chart: running time per implementation variant; group + unroll improve locality.]

    24-core Intel Xeon @ 2 GHz; input: USA Florida road network (1 M nodes, 2.7 M edges).


    Breadth-first search results

    [Charts: BFS running times on a scale-free graph (1 M nodes, 8 M edges) and on the USA road network (24 M nodes, 58 M edges).]


    Conclusion

    • Graph algorithm = Operators + Schedule

      • Elixir language : imperative operators + declarative schedule

    • Allows exploring implementation space

    • Automated reasoning for efficiently computing fixpoints

    • Correct-by-construction parallelization

    • Performance competitive with hand-parallelized code


    Parameterized Verification of Software Transactional Memories

    Michael Emmi, Rupak Majumdar, Roman Manevich


    Motivation

    • Transactional memories [Herlihy ‘93]

      • Programmer writes code with coarse-grained atomic blocks

      • Transaction manager takes care of conflicts providing illusion of sequential execution

    • Strict serializability – correctness criterion

      • Formalizes “illusion of sequential execution”

    • Parameterized verification

      • Formal proof for given implementation

      • For every number of threads

      • For every number of memory objects

      • For every number and length of transactions


    STM terminology

    • Statements: reads, writes, commit, abort

    • Transaction: reads and writes of variables followed by commit (committing transaction)or abort (aborting transaction)

    • Word: interleaved sequence of transactions of different threads

    • Conflict: two statements conflict if

      • One is a read of variable X and the other is a commit of a transaction that writes to X

      • Both are commits of transactions that write to X


    Safety property: strict serializability

    There is a serialization for the committing threads such that order of conflicts is preserved

    Order of non-overlapping transactions remains the same



    Example word:

    (rd X t1), (rd Y t2), (wr X t2), (commit t2), (commit t1)

    =>

    Can be serialized to :

    (rd X t1), (commit t1), (rd Y t2), (wr X t2), (commit t2)


    Main results

    Challenging – requires reasoning on both universal and existential properties

    • First automatic verification of strict serializability for transactional memories

      • TPL, DSTM, TL2

    • New proof technique:

      • Template-based invisible invariant generation

      • Abstract checking algorithm to check inductive invariants


    Outline

    Strict serializability verification approach

    Automating the proof

    Experiments

    Conclusion

    Related work


    Proof roadmap 1

    Goal: prove model M is strictly serializable

    • Given a strict-serializability reference model RSS, reduce checking strict serializability to checking that M refines RSS

    • Reduce refinement to checking safety

      • Safety property SIM: whenever M can execute a statement so can RSS

      • Check SIM on the product system M × RSS


    Proof roadmap 2

    • Model STMs M and RSS in first-order logic

      • TM models use set data structures and typestate bits

    • Check safety by generating strong enough candidate inductive invariant and checking inductiveness

    • Use observations on structure of transactional memories

    • Use sound first-order reasoning


    Reference strict serializability model

    • Guerraoui, Singh, Henzinger, Jobstmann [PLDI’08]

    • RSS: the most liberal specification of a strictly serializable system

      • Allows largest language of strictly-serializable executions

    • M is strictly serializable iff every word of M is also a word of RSS

      • Language(M)  Language(RSS)

      • M refines RSS


    Modeling transactional memories

    • Mn,k = (predicates, actions)

      • Predicate: a ranked relation symbol p(t), q(t,v), …

      • Binary predicates are used for sets, so instead of rs(t,v) I'll write v ∈ rs(t)

      • Action: a(t,v) = if pre(a) then p'(v) = …, q'(u,v) = …

    • Universe = set of k thread individuals and n memory individuals

    • State S = a valuation to the predicates
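A small Java sketch of this kind of state and action (an illustrative encoding only, not the verifier's representation): typestate predicates as sets of thread ids, read/own sets as per-thread sets of variable ids, and an action as a guarded update of the valuation.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Illustrative encoding of a first-order state: a valuation of the predicates.
    final class StmState {
        final Set<Integer> started = new HashSet<>();            // e.g. DSTM.started(t)
        final Map<Integer, Set<Integer>> rs = new HashMap<>();   // v ∈ rs(t)
        final Map<Integer, Set<Integer>> os = new HashMap<>();   // v ∈ os(t)

        // Action rd(t, v): update the valuation by adding v to t's read set.
        void read(int t, int v) {
            started.add(t);
            rs.computeIfAbsent(t, k -> new HashSet<>()).add(v);
        }
    }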


    Reference model (RSS) predicates

    • Typestates:

      • RSS.finished(t), RSS.started(t), RSS.pending(t), RSS.invalid(t)

    • Read/write sets

      • RSS.rs(t,v), RSS.ws(t,v)

    • Prohibited read/write sets

      • RSS.prs(t,v), RSS.pws(t,v)

    • Weak-predecessor

      • RSS.wp(t1,t2)


    DSTM predicates

    • Typestates:

      • DSTM.finished(t), DSTM.validated(t), DSTM.invalid(t), DSTM.aborted(t)

    • Read/own sets

      • DSTM.rs(t,v), DSTM.os(t,v)


    RSS commit(t) action

    if ¬(RSS.invalid(t) ∨ RSS.wp(t,t)) then
      ∀t1,t2 . RSS.wp'(t1,t2) ⟺ t1≠t ∧ t2≠t ∧
          (RSS.wp(t1,t2) ∨ (RSS.wp(t,t2) ∧ (RSS.wp(t1,t) ∨ ∃v . v ∈ RSS.ws(t1) ∧ v ∈ RSS.ws(t))))
      …

    (t is the executing thread; the if-condition is the action precondition; RSS.wp' is a post-state predicate and RSS.wp a current-state predicate; the existential conjunct captures a write-write conflict.)


    DSTM commit(t) action

    if DSTM.validated(t) then
      ∀t1 . DSTM.validated'(t1) ⟺ t1≠t ∧ DSTM.validated(t1) ∧ ¬∃v . v ∈ DSTM.rs(t1) ∧ v ∈ DSTM.os(t)
      …

    (the existential conjunct captures a read-own conflict)


    FOTS states and execution

    state S1

    [Figure: the universe contains thread individuals t1, t2 and a memory-location individual v; thread t1 is about to execute rd v t1.]

    state S2

    [Figure: after rd v t1, the predicate evaluations DSTM.started(t1) = 1 and DSTM.rs(t1,v) = 1 hold.]

    state S3

    [Figure: after wr v t2, DSTM.started(t2) and DSTM.ws(t2,v) hold as well.]


    Product system

    The product of two systems: AB

    Predicates = A.predicates B.predicates

    Actions =commit(t) = { if (A.pre  B.pre) then … }rd(t,v) = { if (A.pre  B.pre) then … }…

    M refines RSS iff on every executionSIM holds: action a M.pre(a)  RSS.pre(a)


    Checking DSTM refines RSS

    Proof rule:   DSTM × RSS ⊨ SIM   implies   DSTM refines RSS

    The only precondition in RSS is for commit(t), so we need to check that

      SIM = ∀t . DSTM.validated(t) ⇒ ¬RSS.invalid(t) ∧ ¬RSS.wp(t,t)

    holds in all reachable states of DSTM × RSS.

    How do we check this safety property?

    Checking safety by invisible invariants

    I1: Initial ⇒ φ   I2: φ ∧ transition ⇒ φ'   I3: φ ⇒ ψ   ⟹   M ⊨ ψ

    • How do we prove that a property ψ holds for all reachable states of a system M?

    • Pnueli, Ruah, Zuck [TACAS'01]

    • Come up with an inductive invariant φ that contains the reachable states of M and strengthens ψ:

    Strict serializability proof rule

    I1: Initial ⇒ φ   I2: φ ∧ transition ⇒ φ'   I3: φ ⇒ SIM   ⟹   DSTM × RSS ⊨ SIM

    Proof roadmap:

    Divine a candidate invariant φ

    Prove I1, I2, I3


    Two challenges

    I1: Initial ⇒ φ   I2: φ ∧ transition ⇒ φ'   I3: φ ⇒ SIM   ⟹   DSTM × RSS ⊨ SIM

    Proof roadmap: divine a candidate invariant φ, then prove I1, I2, I3.

    But how do we find a candidate φ? There is an infinite space of possibilities.

    Given a candidate φ, how do we check the proof rule? Checking A ⇒ B is undecidable for first-order logic.


    Our solution

    I1: Initial ⇒ φ   I2: φ ∧ transition ⇒ φ'   I3: φ ⇒ SIM   ⟹   DSTM × RSS ⊨ SIM

    Utilizing insights on transactional memory implementations:

    How do we find a candidate φ? Use templates and iterative weakening.

    Given a candidate φ, how do we check the proof rule? Use abstract checking.


    Invariant for DSTM × RSS

    P1: t, t1 . RSS.wp(t,t1) RSS.invalid(t)  RSS.pending(t)  v . vRSS.ws(t1) vRSS.ws(t)

    P2: t, v . vRSS.rs(t)  DSTM.aborted(t)  vDSTM.rs(t)

    P3: t, v . vRSS.ws(t)  DSTM.aborted(t)  vDSTM.os(t)

    P4: t . DSTM.validated(t)  RSS.wp(t,t)

    P5: t . DSTM.validated(t)  RSS.invalid(t)

    P6: t . DSTM.validated(t)  RSS.pending(t)

    Inductive invariant involving only RSS – can use for all future proofs


    Templates for DSTM × RSS

    P1: t, t1 . 1(t,t1)  2(t)   3(t)  v . v4(t1)  v5(t)

    P2: t, v . v1(t)  2(t)  v3(t)

    P3: t, v . V1(t)  2(t)  v3(t)

    P4: t . 1(t)  2(t,t)

    P5: t . 1(t)  2(t)

    P6: t . 1(t)  2(t)


    Templates for DSTM × RSS

    ∀t,t1 . ◻1(t,t1) ∧ ◻2(t) ∧ ◻3(t) ⇒ ∃v . v ∈ ◻4(t1) ∧ v ∈ ◻5(t)

    ∀t,v . v ∈ ◻1(t) ∧ ◻2(t) ⇒ v ∈ ◻3(t)

    ∀t . ◻1(t) ⇒ ◻2(t,t)

    • Why templates?

    • They make the invariant separable

    • They control the complexity of invariants

    • Adding templates enables refinement


    Mining candidate invariants

    • Use a predefined set of templates to specify the structure of candidate invariants

      • ∀t,v . ◻1 ∧ ◻2 ⇒ ◻3

      • ◻1, ◻2, ◻3 are predicates of M or their negations

      • Existential formulas capturing 1-level conflicts: ∃v . v ∈ ◻4(t1) ∧ v ∈ ◻5(t2)

    • Mine candidate invariants from a concrete execution

    Iterative invariant weakening

    I1: Initial ⇒ φ   I2: φ ∧ transition ⇒ φ'   I3: φ ⇒ SIM   ⟹   DSTM × RSS ⊨ SIM

    • Initial candidate invariant C0 = P1 ∧ P2 ∧ … ∧ Pk

    • Try to prove I2: φ ∧ transition ⇒ φ'. If it fails, compute C1 = { Pi | C0 ∧ transition ⇒ Pi', for Pi ∈ C0 }

    • If C1 = C0 then we have an inductive invariant

    • Otherwise, compute C2 = { Pi | C1 ∧ transition ⇒ Pi', for Pi ∈ C1 }

    • Repeat until either

      • an inductive invariant is found – then check I3: Ck ⇒ SIM

      • the top element {} is reached – the trivial inductive invariant
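A compact Java sketch of this weakening loop (Houdini-style; illustrative only — the check of each Pi is performed by the abstract checker described next):

    import java.util.HashSet;
    import java.util.Set;
    import java.util.function.BiPredicate;

    // Illustrative iterative weakening: drop conjuncts until the remaining set is inductive.
    final class Weakening {
        // preserved.test(current, p) should answer: does (∧ current) ∧ transition imply p'?
        static <P> Set<P> weaken(Set<P> candidates, BiPredicate<Set<P>, P> preserved) {
            Set<P> current = new HashSet<>(candidates);
            boolean changed = true;
            while (changed) {
                changed = false;
                Set<P> next = new HashSet<>();
                for (P p : current) {
                    if (preserved.test(current, p)) next.add(p);   // p remains inductive relative to current
                    else changed = true;                           // drop p and iterate again
                }
                current = next;
            }
            return current;   // greatest inductive subset; the empty set is the trivial invariant
        }
    }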


    Weakening illustration

    [Figure: weakening illustrated on candidates P1, P2, P3 – conjuncts that are not preserved by the transition (e.g. P3) are dropped, leaving P1 ∧ P2.]


    Abstract proof rule

    I1:(Initial)   I2:abs_transition(())  ’ I3: ()  SIM

    I1: Initial   I2:   transition  ’ I3:   SIM

    DSTM  RSS SIM

    DSTM  RSS SIM

    Abstracttransformer

    Approximateentailment

    Formulaabstraction


    Conclusion

    • Novel invariant generation using templates – extends applicability of invisible invariants

    • Abstract domain and reasoning to check invariants without state explosion

    • Proved strict-serializability for TPL, DSTM, TL2

      • BLAST and TVLA failed


    Verification results


    Insights on transactional memories

    The transition relation is symmetric – thread identifiers are not used: p'(t,v) = … t1 … t2 …

    The executing thread t interacts only with an arbitrary thread or with a conflict-adjacent thread.

    Arbitrary thread:   ∃v . v ∈ TL2.rs(t1) ∧ v ∈ TL2.ws(t1)

    Conflict adjacent:   ∃v . v ∈ DSTM.rs(t1) ∧ v ∈ DSTM.ws(t)


    Conflict adjacency

    [Figure: thread t's read set overlaps t2's write set on v1 (a read-write conflict), and t2's write set overlaps t3's write set on v2 (a write-write conflict).]

    ∃v . v ∈ rs(t) ∧ v ∈ DSTM.ws(t2)      ∃v . v ∈ ws(t1) ∧ v ∈ DSTM.ws(t2)


    [Figure: a chain of conflict-adjacent threads t, t2, t3, …]


    Related work

    • Reduction theorems – Guerraoui et al. [PLDI'08, CONCUR'08]

    • Manually-supplied invariants for a fixed number of threads and variables + PVS – Cohen et al. [FMCAD'07]

    • Predicate Abstraction + Shape Analysis

      • SLAM, BLAST, TVLA



    Invisible invariants – Arons et al. [CAV'01], Pnueli et al. [TACAS'01]

    Templates – for arithmetic constraints

    Indexed predicate abstraction – Lahiri et al. [TOCL'07]

    Thread quantification – Berdine et al. [CAV'08], Segalov et al. [APLAS'09]


    Thank You!

