
A Shape Analysis for Optimizing Parallel Graph Programs

A Shape Analysis for Optimizing Parallel Graph Programs. Dimitrios Prountzos 1 Keshav Pingali 1,2. Roman Manevich 2 Kathryn S. McKinley 1. 1: Department of Computer Science, The University of Texas at Austin


Presentation Transcript


  1. A Shape Analysis for Optimizing Parallel Graph Programs • Dimitrios Prountzos (1), Keshav Pingali (1,2), Roman Manevich (2), Kathryn S. McKinley (1) • 1: Department of Computer Science, The University of Texas at Austin • 2: Institute for Computational Engineering and Sciences, The University of Texas at Austin

  2. Motivation • Graph algorithms are ubiquitous • Goal: Compiler analysis for optimization of parallel graph algorithms (Figure labels: Computer Graphics, Computational Biology, Social Networks)

  3. Organization • Parallelization of graph algorithms in Galois system • Speculative execution • Example: Boruvka MST algorithm • Optimization opportunities • Reduce speculation overheads • Analysis problem: LockSet shape analysis • Lockset shape analysis • Abstract Data Type (ADT) modeling • Hierarchy summarization abstraction • Predicate discovery • Evaluation • Fast and infers all available optimizations • Optimizations give speedup up to 12x

  4. Boruvka’s Minimum Spanning Tree Algorithm (Figure: example weighted graph on nodes a to g, before and after merging a with its lightest neighbor c) • Build MST bottom-up • repeat { • pick arbitrary node ‘a’ • merge with lightest neighbor ‘lt’ • add edge ‘a-lt’ to MST • } until graph is a single node
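The loop on this slide can be sketched sequentially as follows. This is only an illustration: the class name `BoruvkaSketch`, the adjacency-map representation, and the string node names are assumptions, not part of the Galois code shown later in the deck, and the graph is assumed to be connected, undirected, and free of self-loops.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sequential sketch of the edge-contraction loop (not the Galois implementation).
final class BoruvkaSketch {
    // Undirected weighted graph: adj.get(u).get(v) is the weight of edge u-v.
    static void addEdge(Map<String, Map<String, Integer>> adj, String u, String v, int w) {
        adj.computeIfAbsent(u, k -> new HashMap<>()).put(v, w);
        adj.computeIfAbsent(v, k -> new HashMap<>()).put(u, w);
    }

    // Contracts edges until one node remains and returns the weights added to the MST.
    // Assumes a connected, non-empty graph; the input map is consumed.
    static List<Integer> mstWeights(Map<String, Map<String, Integer>> adj) {
        List<Integer> mst = new ArrayList<>();
        while (adj.size() > 1) {
            String a = adj.keySet().iterator().next();   // pick an arbitrary node
            String lt = null;
            int minW = Integer.MAX_VALUE;
            for (Map.Entry<String, Integer> e : adj.get(a).entrySet()) {
                if (e.getValue() < minW) {               // find the lightest neighbor
                    minW = e.getValue();
                    lt = e.getKey();
                }
            }
            adj.get(a).remove(lt);                       // remove edge a-lt
            Map<String, Integer> ltEdges = adj.remove(lt);
            ltEdges.remove(a);
            for (Map.Entry<String, Integer> e : ltEdges.entrySet()) {
                String n = e.getKey();
                int w = e.getValue();
                adj.get(n).remove(lt);                   // redirect edge lt-n to a-n
                Integer wan = adj.get(a).get(n);
                if (wan != null && wan < w) {
                    w = wan;                             // keep the lighter parallel edge
                }
                adj.get(a).put(n, w);
                adj.get(n).put(a, w);
            }
            mst.add(minW);                               // edge a-lt joins the MST
        }
        return mst;
    }
}
```

On a triangle with weights 1, 2, and 3, any choice of starting node yields two MST edges of total weight 3, which matches the "don't care" non-determinism the deck describes.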

  5. Parallelism in Boruvka • Algorithm = repeated application of operator to graph • Active node: • Node where computation is needed • Activity: • Application of operator to active node • Neighborhood: • Sub-graph read/written to perform activity • Unordered algorithms: • Active nodes can be processed in any order • Amorphous data-parallelism • Parallel execution of activities, subject to neighborhood constraints • Neighborhoods are functions of runtime values • Parallelism cannot be uncovered at compile time in general (Figure: activities i1, i2, i3)

  6. Optimistic Parallelization in Galois • Programming model • Client code has sequential semantics • Library of concurrent data structures • Parallel execution model • Thread-level speculation (TLS) • Activities executed speculatively • Conflict detection • Each node/edge has associated exclusive lock • Graph operations acquire locks on read/written nodes/edges • Lock owned by another thread ⇒ conflict ⇒ iteration rolled back • All locks released at the end • Two main overheads • Locking • Undo actions (Figure: activities i1, i2, i3)
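The conflict-detection scheme above (exclusive per-object locks, rollback on conflict, release of all locks at the end) might look roughly like the sketch below. Everything here is hypothetical: `SpeculationSketch`, `Activity`, and the single synchronized lock table are illustrative simplifications, not the Galois runtime's actual data structures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of lock-based conflict detection with undo logging.
final class SpeculationSketch {
    // Maps each locked object to the id of the activity that owns its lock.
    private final Map<Object, Integer> owner = new HashMap<>();

    // Per-activity state: the locks it holds and the undo actions it has logged.
    static final class Activity {
        final int id;
        final List<Object> held = new ArrayList<>();
        final Deque<Runnable> undoLog = new ArrayDeque<>();
        Activity(int id) { this.id = id; }
    }

    // Tries to lock obj for act; returns false if another activity owns it (a conflict).
    synchronized boolean acquire(Activity act, Object obj) {
        Integer cur = owner.get(obj);
        if (cur == null) {
            owner.put(obj, act.id);
            act.held.add(obj);
            return true;
        }
        return cur == act.id;   // re-acquiring an owned lock succeeds (and is redundant work)
    }

    // Rollback: run undo actions most-recent-first, then release every lock.
    synchronized void abort(Activity act) {
        while (!act.undoLog.isEmpty()) {
            act.undoLog.pop().run();
        }
        release(act);
    }

    // Successful end of the activity: discard the undo log and release every lock.
    synchronized void commit(Activity act) {
        act.undoLog.clear();
        release(act);
    }

    private void release(Activity act) {
        for (Object o : act.held) {
            owner.remove(o);
        }
        act.held.clear();
    }
}
```

Both overheads named on the slide are visible here: every `acquire` call costs a table lookup even when the lock is already held, and every mutation has to push a `Runnable` onto `undoLog` in case of a later abort.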

  7. Overheads (I): Locking • Optimizations • Redundant locking elimination • Lock removal for iteration private data • Lock removal for lock domination • ACQ(P): set of definitely acquired locks per program point P • Given method call M at P: Locks(M) ⊆ ACQ(P) ⇒ Redundant Locking

  8. Overheads (II): Undo actions foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } (Figure: the loop body foreach (Node a : wl) { … } annotated with a "Lockset Grows" phase, the "Failsafe" point, and a "Lockset Stable" phase) • Program point P is failsafe if: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)

  9. Lockset Analysis • Redundant Locking • Locks(M) ⊆ ACQ(P) • Undo elimination • ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P) • Need to compute ACQ(P) GSet<Node> wl = new GSet<Node>(); wl.addAll(g.getNodes()); GBag<Weight> mst = new GBag<Weight>(); foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } (Figure annotations: runtime overhead)
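Once ACQ(P) is known, both conditions reduce to set-inclusion checks: a call is redundant when its locks are contained in ACQ(P), and a point is failsafe when that holds for every call reachable from it. The sketch below shows this for straight-line code only, with locks modeled as plain strings. That is a strong simplification of the shape analysis in this deck, and the names `LocksetSketch`, `acquiredBefore`, and `firstFailsafe` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: lockset conditions on a straight-line sequence of ADT calls.
final class LocksetSketch {
    // acq.get(i) is the set of locks definitely held just before call i.
    static List<Set<String>> acquiredBefore(List<Set<String>> locksPerCall) {
        List<Set<String>> acq = new ArrayList<>();
        Set<String> held = new HashSet<>();
        for (Set<String> locks : locksPerCall) {
            acq.add(new HashSet<>(held));
            held.addAll(locks);
        }
        return acq;
    }

    // Redundant locking: the locks of the call are contained in ACQ at that point.
    static boolean isRedundant(Set<String> locksOfCall, Set<String> acqAtPoint) {
        return acqAtPoint.containsAll(locksOfCall);
    }

    // Index of the earliest failsafe point: every call from that index onward only
    // needs locks already held there. Returns locksPerCall.size() if no call qualifies.
    static int firstFailsafe(List<Set<String>> locksPerCall) {
        List<Set<String>> acq = acquiredBefore(locksPerCall);
        Set<String> needed = new HashSet<>();   // locks used by calls at or after point i
        int first = locksPerCall.size();
        for (int i = locksPerCall.size() - 1; i >= 0; i--) {   // backward scan
            needed.addAll(locksPerCall.get(i));
            if (!acq.get(i).containsAll(needed)) {
                break;   // earlier points hold fewer locks and need more, so stop
            }
            first = i;
        }
        return first;
    }
}
```

Stopping at the first failure is sound in this setting because the held set only grows going forward while the needed set only grows going backward. Real programs have loops and branches, which is why the deck computes ACQ(P) with abstract interpretation instead.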

  10. Analysis Challenges • The usual suspects: • Unbounded Memory ⇒ Undecidability • Aliasing, Destructive updates • Specific challenges: • Complex ADTs: unstructured graphs • Heap objects are locked • Adapt abstraction to ADTs • We use Abstract Interpretation [CC’77] • Balance precision and realistic performance

  11. Organization • Parallelization of graph algorithms in Galois system • Speculative execution • Example: Boruvka MST algorithm • Optimization opportunities • Reduce speculation overheads • Analysis problem: LockSet shape analysis • Lockset shape analysis • Abstract Data Type (ADT) modeling • Hierarchy summarization abstraction • Predicate discovery • Evaluation • Fast and infers all available optimizations • Optimizations give speedup up to 12x

  12. Shape Analysis Overview Predicate Discovery Graph { @rep nodes @rep edges … } Set { @rep cont … } … … HashMap-Graph Set Spec Shape Analysis Graph Spec Tree-based Set Concrete ADT Implementations in Galois library ADTSpecifications Boruvka.java Optimized Boruvka.java

  13. ADT Specification Abstract ADT state by virtual set fields Graph<ND,ED> { @rep set<Node> nodes @rep set<Edge> edges Set<Node> neighbors(Node n); } ... Set<Node> S1 = g.neighbors(n); ... @locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src) @op( nghbrs = n.rev(src).dst + n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Boruvka.java Graph Spec Assumption: Implementation satisfies Spec
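To make the path expressions in the `neighbors` specification concrete, the sketch below evaluates them on an explicit edge list. Here `n.rev(src)` is read as "edges whose `src` field is `n`" and `n.rev(src).dst` as the destinations of those edges. The class `NeighborsSpecSketch`, its `Edge` helper, and the string labels are assumptions for illustration and do not reflect how the Galois library or the analysis represent graphs.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical evaluation of the @op and @locks expressions of neighbors(n).
final class NeighborsSpecSketch {
    static final class Edge {
        final String src;
        final String dst;
        Edge(String src, String dst) { this.src = src; this.dst = dst; }
        String label() { return src + "->" + dst; }
    }

    // nghbrs = n.rev(src).dst + n.rev(dst).src
    static Set<String> neighbors(List<Edge> edges, String n) {
        Set<String> out = new LinkedHashSet<>();
        for (Edge e : edges) {
            if (e.src.equals(n)) out.add(e.dst);   // n.rev(src).dst
            if (e.dst.equals(n)) out.add(e.src);   // n.rev(dst).src
        }
        return out;
    }

    // locks = n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src
    static Set<String> locks(List<Edge> edges, String n) {
        Set<String> out = new LinkedHashSet<>();
        out.add(n);                                // n
        for (Edge e : edges) {
            if (e.src.equals(n) || e.dst.equals(n)) {
                out.add(e.label());                // n.rev(src) and n.rev(dst)
            }
        }
        out.addAll(neighbors(edges, n));           // both neighbor path expressions
        return out;
    }
}
```

For edges a->b, c->a, and b->d, calling `locks(edges, "a")` gives the node a, its two incident edges, and the neighbors b and c; the edge b->d and node d stay unlocked.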

  14. Modeling ADTs Graph<ND,ED> { @rep set<Node> nodes @rep set<Edge> edges @locks(n + n.rev(src) + n.rev(src).dst+n.rev(dst) + n.rev(dst).src) @op( nghbrs= n.rev(src).dst+ n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n); } c src dst src dst dst src a b Graph Spec

  15. Modeling ADTs Abstract State Graph<ND,ED> { @rep set<Node> nodes @rep set<Edge> edges @locks(n + n.rev(src) + n.rev(src).dst+n.rev(dst) + n.rev(dst).src) @op( nghbrs= n.rev(src).dst+ n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n); } nodes edges c src dst ret cont src dst dst src a b Graph Spec nghbrs

  16. Organization • Parallelization of graph algorithms in Galois system • Speculative execution • Example: Boruvka MST algorithm • Optimization opportunities • Reduce speculation overheads • Analysis problem: LockSet shape analysis • Lockset shape analysis • Abstract Data Type (ADT) modeling • Hierarchy summarization abstraction • Predicate discovery • Evaluation • Fast and infers all available optimizations • Optimizations give speedup up to 12x

  17. Abstraction Scheme • Parameterized by set of LockPaths: L(Path) ≡ ∀o . o ∊ Path ⇒ Locked(o) • Tracks subset of must-be-locked objects • Abstract domain elements have the form: Aliasing-configs → 2^LockPaths • Example: (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont) (Figure: sets S1 and S2 whose cont members are summarized as L(S1.cont) and L(S2.cont))

  18. Joining Abstract States (Figure: join of abstract states over the lock paths L(y.nd), L(x.rev(src)), and L(x.nd)) • Aliasing is crucial for precision • May-be-locked does not enable our optimizations • #Aliasing-configs: small constant (6)
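Because the domain tracks must-be-locked paths per aliasing configuration, a natural reading of the join is a pointwise intersection: a path survives for a configuration only if both incoming states guarantee it. The sketch below encodes that reading. It is an assumption drawn from the "must" semantics stated in the deck, not a transcription of the analysis, and it treats a configuration missing from one state as unreachable there. Configurations and paths are plain strings and `JoinSketch` is a made-up name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical join for states of the form: aliasing configuration -> set of lock paths.
final class JoinSketch {
    static Map<String, Set<String>> join(Map<String, Set<String>> s1,
                                         Map<String, Set<String>> s2) {
        Map<String, Set<String>> out = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : s1.entrySet()) {
            out.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        for (Map.Entry<String, Set<String>> e : s2.entrySet()) {
            Set<String> cur = out.get(e.getKey());
            if (cur == null) {
                // Configuration only reachable through s2: keep its paths unchanged.
                out.put(e.getKey(), new HashSet<>(e.getValue()));
            } else {
                // Configuration reachable through both: keep only paths locked in both.
                cur.retainAll(e.getValue());
            }
        }
        return out;
    }
}
```

Keeping the configurations apart is what preserves precision: intersecting across different aliasing configurations would discard paths such as L(x.nd) that hold in only one of them.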

  19. Example Invariant in Boruvka GSet<Node> wl = new GSet<Node>(); wl.addAll(g.getNodes()); GBag<Weight> mst = new GBag<Weight>(); foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } • The immediate neighbors of a and lt are locked: ( a ≠ lt ) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src) ∧ L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src)) ∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst) …

  20. Heuristics for Finding Paths • Hierarchy Summarization (HS) • x.( fld )* • Type hierarchy graph acyclic ⇒ bounded number of paths • Preflow-Push: • L(S.cont) ∧ L(S.cont.nd) • Nodes in set S and their data are locked (Figure: type hierarchy in which S : Set<Node> reaches Node via cont, and Node reaches NodeData via nd)
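The claim that an acyclic type hierarchy bounds the number of paths can be checked with a small enumeration of x.( fld )* candidates. The sketch below assumes the hierarchy is given as a map from type name to its fields and their target types; `HierarchyPathsSketch` and this encoding are hypothetical, and the bare variable (the empty path) is deliberately left out of the result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical enumeration of candidate lock paths x.(fld)* over an acyclic type hierarchy.
final class HierarchyPathsSketch {
    // fields.get(T) maps each field name declared by type T to the type it refers to.
    static List<String> paths(Map<String, Map<String, String>> fields,
                              String var, String type) {
        List<String> out = new ArrayList<>();
        collect(fields, var, type, out);
        return out;
    }

    // Depth-first walk; terminates only because the hierarchy is assumed acyclic.
    private static void collect(Map<String, Map<String, String>> fields,
                                String prefix, String type, List<String> out) {
        Map<String, String> declared =
            fields.getOrDefault(type, Collections.<String, String>emptyMap());
        for (Map.Entry<String, String> f : declared.entrySet()) {
            String path = prefix + "." + f.getKey();
            out.add(path);
            collect(fields, path, f.getValue(), out);
        }
    }
}
```

With `Set<Node>` having field `cont` of type `Node`, and `Node` having field `nd` of type `NodeData`, the variable `S` yields exactly `S.cont` and `S.cont.nd`, the two paths in the Preflow-Push example.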

  21. Footprint Graph Heuristic • Footprint Graphs (FG) [Calcagno et al. SAS’07] • All acyclic paths from arguments of ADT method to locked objects • x.( fld | rev(fld) )* • Delaunay Refinement: L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst)) ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src) • Nodes in set S and all of their immediate neighbors are locked • Composition of HS, FG • Preflow-Push: L(a.rev(src).ed) (Figure labels: FG, HS)

  22. Organization • Parallelization of graph algorithms in Galois system • Speculative execution • Example: Boruvka MST algorithm • Optimization opportunities • Reduce speculation overheads • Analysis problem: LockSet shape analysis • Shape analysis • Abstract Data Type modeling • Hierarchy summarization abstraction • Predicate discovery • Evaluation • Fast and infers all available optimizations • Optimizations give speedup up to 12x

  23. Experimental Evaluation • Implement on top of TVLA • Encode abstraction by 3-Valued Shape Analysis [SRW TOPLAS’02] • Evaluation on 4 Lonestar Java benchmarks • Inferred all available optimizations • # abstract states practically linear in program size

  24. Impact of Optimizations for 8 Threads 8-core Intel Xeon @ 3.00 GHz

  25. Related Work • Safe programmable speculative parallelism [Prabhu et al. PLDI’10] • Focused on value speculation on ordered algorithms • Different rollback freedom condition • Transactional Memory compiler optimizations [Harris et al. PLDI’06, Dragojevic et al. SPAA’09] • Similar optimizations • Don’t target rollback freedom • Imprecise for unbounded data-structures • Optimizations for parallel graph programs [Mendez-Lojo et al. PPOPP’10] • Manual optimizations • Failsafe subsumes cautious • Verifying conformance of ADT implementation to specification • The Jahob project (Kuncak, Rinard, Wies et al.)

  26. Conclusion • New application for static analysis • Optimization of optimistically parallelized graph programs • Novel shape analysis • Utilize observations on the structure of concrete states and programming style • Enables optimizations crucial for performance

  27. Thank You!

  28. Backup

  29. Outline of Boruvka MST Code
        GSet<Node> wl = new GSet<Node>();
        wl.addAll(g.getNodes());
        GBag<Weight> mst = new GBag<Weight>();
        // Pick arbitrary worklist node
        foreach (Node a : wl) {
          // Find lightest neighbor
          Set<Node> aNghbrs = g.neighbors(a);
          Node lt = null;
          for (Node n : aNghbrs) {
            minW,lt = minWeightEdge((a,lt), (a,n));
          }
          g.removeEdge(a, lt);
          // Update neighbors of lightest
          Set<Node> ltNghbrs = g.neighbors(lt);
          for (Node n : ltNghbrs) {
            Edge e = g.getEdge(lt, n);
            Weight w = g.getEdgeData(e);
            Edge an = g.getEdge(a, n);
            if (an != null) {
              Weight wan = g.getEdgeData(an);
              if (wan.compareTo(w) < 0) w = wan;
              g.setEdgeData(an, w);
            } else {
              g.addEdge(a, n, w);
            }
          }
          // Update worklist and MST
          g.removeNode(lt);
          mst.add(minW);
          wl.add(a);
        }

  30. Approximating Sets of Locked Objects Reachability-based Scheme S1 S2 S1 S2 cont cont cont cont Reach(S1.cont) Reach(S2.cont) Reach(S1.cont) Reach(S2.cont) Reach(S2.cont) Reach(S2.cont) Reach(S2.cont)

  31. Approximating Sets of Locked Objects Hierarchy Summarization Scheme • L(Path) ≡ ∀o . o ∊ Path ⇒ Locked(o) • Example: (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont) (Figure: sets S1 and S2 whose cont members are summarized as L(S1.cont) and L(S2.cont))

  32. Enabling Optimizations in Galois • Graph API uses flags to enable/disable locking and storing undo actions • removeEdge(Node src, Node dst, Flag f); ALL LOCKS UNDO Lock(src) Lock(dst) Lock( (src,dst) ) addEdge(src, dst); NONE Challenge: Find minimal flag per ADT method call Solution: Lockset analysis
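Given the two analysis results for a call site, choosing the flag is a two-bit decision. The sketch below assumes a particular reading of the four flag names on this slide: `ALL` performs locking and undo logging, `LOCKS` only locking, `UNDO` only undo logging, and `NONE` neither. The slide does not spell out these meanings, and `FlagSketch` and `minimalFlag` are illustrative names rather than Galois API.

```java
// Hypothetical flag selection from the two lockset-analysis results for one call site.
final class FlagSketch {
    // Assumed meanings: ALL = lock + undo, LOCKS = lock only, UNDO = undo only, NONE = neither.
    enum Flag { ALL, LOCKS, UNDO, NONE }

    // locksRedundant: every lock the call needs is already in ACQ at the call site.
    // failsafe: the call site is at or after a failsafe point, so no rollback can follow.
    static Flag minimalFlag(boolean locksRedundant, boolean failsafe) {
        boolean needLocks = !locksRedundant;
        boolean needUndo = !failsafe;
        if (needLocks && needUndo) return Flag.ALL;
        if (needLocks) return Flag.LOCKS;
        if (needUndo) return Flag.UNDO;
        return Flag.NONE;
    }
}
```

Under this reading, a call whose locks are all already held and that sits past the failsafe point gets `NONE`, which is the case shown for `addEdge(src, dst)` on the slide.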

  33. Optimization Conditions • ACQ(P): set of definitely acquired locks per program point P • Given method call M at P: Locks(M) ⊆ ACQ(P) ⇒ Redundant Locking • Program point P is failsafe if: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P) • Infer … from … by simple backward analysis

  34. Speculation Overheads and Optimizations

  35. Modeling ADTs Abstract State Graph<ND,ED> { @rep set<Node> ns; // nodes @rep set<Edge> es; // edges @locks(n + n.rev(src) + n.rev(src).dst+n.rev(dst) + n.rev(dst).src) @op( nghbrs= n.rev(src).dst+ n.rev(dst).src , ret = new Set<Node<ND>>(cont=nghbrs) ) Set<Node> neighbors(Node n); } ret ns es nghbrs a b cont 7 5 c d 3 dst src 2 4 ed a b Graph Spec 5

  36. Failsafe Points – Eliminating Undo Actions a lt CAUTIOUS OPERATOR [Mendez et al. PPOPP’10] g.neighbors(lt); GSet<Node> wl = new GSet<Node>(); wl.addAll(g.getNodes()); GBag<Weight> mst = new GBag<Weight>(); foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } 7 c d 3 2 4 a b 5 Graph Node Graph Edge Edge Data

  37. Boruvka’s Minimum Spanning Tree Algorithm (Figure: example weighted graph on nodes a to g) • Build MST bottom-up • repeat { • pick arbitrary active node ‘a’ • merge with lightest neighbor ‘lt’ • add edge ‘a-lt’ to MST • } until graph is a single node

  38. Failsafe Points – Eliminating Undo Actions a lt CAUTIOUS OPERATOR [Mendez et al. PPOPP’10] g.neighbors(lt); GSet<Node> wl = new GSet<Node>(); wl.addAll(g.getNodes()); GBag<Weight> mst = new GBag<Weight>(); foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } Graph Node 7 Graph Edge c d Edge Data 3 2 4 a b 5 New Lock Redundant Acquired

  39. lt Redundant Locking Example a GSet<Node> wl = new GSet<Node>(); wl.addAll(g.getNodes()); GBag<Weight> mst = new GBag<Weight>(); foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } Graph Node 7 Graph Edge c d Edge Data 3 2 4 a b 5 New Lock Redundant Acquired

  40. Boruvka’s Minimum Spanning Tree Algorithm GSet<Node> wl = new GSet<Node>(); wl.addAll(g.getNodes()); GBag<Weight> mst = new GBag<Weight>(); foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } (Figure: example weighted graph on nodes a to g) • Build MST iteratively • Pick random active node • Contract edge with lightest neighbor

  41. Parallelism in Boruvka’s Algorithm GSet<Node> wl = new GSet<Node>(); wl.addAll(g.getNodes()); GBag<Weight> mst = new GBag<Weight>(); foreach (Node a : wl) { Set<Node> aNghbrs = g.neighbors(a); Node lt = null; for (Node n : aNghbrs) { minW,lt = minWeightEdge((a,lt), (a,n)); } g.removeEdge(a, lt); Set<Node> ltNghbrs = g.neighbors(lt); for (Node n : ltNghbrs) { Edge e = g.getEdge(lt, n); Weight w = g.getEdgeData(e); Edge an = g.getEdge(a, n); if (an != null) { Weight wan = g.getEdgeData(an); if (wan.compareTo(w) < 0) w = wan; g.setEdgeData(an, w); } else { g.addEdge(a, n, w); } } g.removeNode(lt); mst.add(minW); wl.add(a); } 7 1 6 c d e f f 3 2 4 4 a b g 5 • Dependences between activities are functions of runtime values • Parallelism cannot be uncovered at compile time in general • Don’t Care Non-Determinism • All produced MSTs correct and optimal

  42. Abstraction of a Single State • State after first loop in Boruvka: ( a ≠ lt ) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src) ∧ UniqueRef(EdgeData) (Figure: example graph on nodes a, b, c, d with a and lt marked)

  43. Hierarchy Summarization Intuition (Figure: type hierarchy graph. Variables: aNghbrs, ltNghbrs, mst, nIter, wl, g; e, an; a, lt, n; w, wan, minW. Types: Iterator<Node>, Set<Node>, GSet<Node>, GBag<Weight>, Graph, Edge, Node, Weight, Void. Fields: all, ns, cont, gcont, es, bcont, past/at/future, src, dst, ed, nd)
