Hawkeye: Effective Discovery of Dataflow Impediments to Parallelization

Omer Tripp, John Field, Greta Yorsh, Mooly Sagiv

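Dataflow Impediments to Parallelization
```java
public void setAndProcess(Object o) {
    set(o);
    process();
}

public void set(Object o) {
    this.f = calc_f(o);
}

public void process() {
    Object o = this.f;
    if (o == null) {
        doA();
    } else {
        doB();
    }
}
```
Can set(o) and process() run in parallel, i.e., set(o) || process()? No: process() reads this.f, which set(o) writes, so there is a read-after-write (RAW) dependency.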
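The RAW dependency can be phrased in terms of the locations each operation reads and writes. The sketch below is a minimal illustration under that assumption: each operation is summarized by hand-written read and write sets (the class `DependenceKinds`, the record `Op`, and the location name `"this.f"` are hypothetical, and whatever `calc_f` reads is ignored). A later operation depends on an earlier one if it reads what the earlier one wrote (RAW), writes what the earlier one read (WAR), or writes what the earlier one wrote (WAW).

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: classify dependences between two operations from their read/write sets. */
final class DependenceKinds {
    enum Kind { RAW, WAR, WAW }

    /** An operation summarized by the locations it reads and writes. */
    record Op(String name, Set<String> reads, Set<String> writes) {}

    private static boolean intersects(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return !common.isEmpty();
    }

    /** Dependences of {@code later} on {@code earlier}, assuming they run in that order. */
    static Set<Kind> classify(Op earlier, Op later) {
        Set<Kind> kinds = EnumSet.noneOf(Kind.class);
        if (intersects(earlier.writes(), later.reads())) kinds.add(Kind.RAW);
        if (intersects(earlier.reads(), later.writes())) kinds.add(Kind.WAR);
        if (intersects(earlier.writes(), later.writes())) kinds.add(Kind.WAW);
        return kinds;
    }

    public static void main(String[] args) {
        // set(o) writes this.f; process() reads this.f (location names are illustrative).
        Op set = new Op("set(o)", Set.of(), Set.of("this.f"));
        Op process = new Op("process()", Set.of("this.f"), Set.of());
        System.out.println(classify(set, process)); // prints [RAW]
    }
}
```

An empty result would mean the two calls could, as far as these sets are concerned, be reordered or run in parallel.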

Sometimes It’s Less Obvious
```java
for (Vertex cutpoint : this.cutpoints) {
    UndirectedGraph subgraph = new SimpleGraph();
    subgraph.addVertex(cutpoint);
    this.cutpointGraphs.put(cutpoint, subgraph);
    this.addVertex(subgraph);
    Set<UndirectedGraph> blocks = this.vertex2blocks.get(cutpoint);
    for (UndirectedGraph block : blocks) {
        int oldHitCount = this.block2hits.get(block);
        this.block2hits.put(block, oldHitCount + 1);
        this.addEdge(subgraph, block);
    }
}
```
Simplified version of the JGraphT algorithm for building a block-cutpoint graph
• This code admits a lot of parallelism, but a few impediments must be addressed before it can be parallelized. How can we pinpoint these dependencies precisely and concisely?
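One of these impediments can be made concrete by replaying only the hit-counting part of the loop. The sketch below assumes made-up data (cutpoints c1, c2, c3 and blocks B1 to B4 are hypothetical, blocks are plain strings rather than JGraphT graphs, `getOrDefault` stands in for counters the original assumes are already initialized, and `HitCountConflicts` is an illustrative name). It records, per iteration, which `block2hits` keys are read and written, and reports each later iteration that reads a key an earlier one wrote. Two iterations conflict exactly when their cutpoints share a block, because each performs a read-modify-write of the same counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical replay of the block2hits updates, logging which keys each iteration touches. */
final class HitCountConflicts {
    /** Returns "i->j:key" for each later iteration j that reads a key written by an earlier iteration i. */
    static List<String> conflicts(Map<String, List<String>> vertex2blocks) {
        Map<String, Integer> block2hits = new HashMap<>();
        List<Set<String>> readKeys = new ArrayList<>();
        List<Set<String>> writtenKeys = new ArrayList<>();
        for (List<String> blocks : vertex2blocks.values()) { // one iteration per cutpoint
            Set<String> reads = new LinkedHashSet<>();
            Set<String> writes = new LinkedHashSet<>();
            for (String block : blocks) {
                int oldHitCount = block2hits.getOrDefault(block, 0); // read of key `block`
                reads.add(block);
                block2hits.put(block, oldHitCount + 1);              // write of key `block`
                writes.add(block);
            }
            readKeys.add(reads);
            writtenKeys.add(writes);
        }
        List<String> result = new ArrayList<>();
        for (int j = 0; j < readKeys.size(); j++) {
            for (int i = 0; i < j; i++) {
                for (String key : readKeys.get(j)) {
                    if (writtenKeys.get(i).contains(key)) result.add(i + "->" + j + ":" + key);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> vertex2blocks = new LinkedHashMap<>(); // keeps iteration order
        vertex2blocks.put("c1", List.of("B1", "B2"));
        vertex2blocks.put("c2", List.of("B2", "B3"));
        vertex2blocks.put("c3", List.of("B4"));
        System.out.println(conflicts(vertex2blocks)); // prints [0->1:B2]
    }
}
```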

Field-based Dependence Analysis
• Static dependence analysis is challenged by dynamic containers, aliasing, etc.
• So let’s use dynamic dependence analysis instead…

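m = new ConcurrentHashMap(); m.put(k,1); m.put(k’,2); m.put(k,2);
[Figure: the concrete ConcurrentHashMap state after these calls: the table array (buckets [0] … [8]), chained entries with key, value, and next fields for K and K’, and a modcount field with successive values 7, 8, 9.]
• Spurious dependencies, e.g., through modcount, inhibit m.put(k,1) || m.put(k’,2)!
• The semantic dependency, between m.put(k,1) and m.put(k,2), gets “lost” in the noise!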
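To see where the spurious dependencies come from, the sketch below uses a toy chained hash map (hypothetical, not the `ConcurrentHashMap` source) that reports the concrete locations each `put` writes. Like `java.util.HashMap`, it bumps `modCount` only when a new key is inserted. Field-level tracking then reports a write-write overlap between `put("k", 1)` and `put("k'", 2)` on `modCount`, even though the keys differ, while their logical write sets in the (M,K), (M,K,V) notation of the Hawkeye algorithm slide do not overlap. The two keys land in different buckets of the 16-slot table, so `modCount` is the only shared location.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical toy chained hash map whose put() reports the concrete locations it writes. */
final class SpuriousDependences {
    private static final class Node {
        final String key;
        int value;
        Node next;
        Node(String key, int value, Node next) { this.key = key; this.value = value; this.next = next; }
    }

    private final Node[] table = new Node[16];
    private int modCount; // as in java.util.HashMap, bumped only on structural changes

    /** Performs put(key, value) and returns the concrete locations it wrote. */
    Set<String> put(String key, int value) {
        Set<String> writes = new LinkedHashSet<>();
        int b = Math.floorMod(key.hashCode(), table.length);
        for (Node n = table[b]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value;                    // overwrite: no structural change
                writes.add("node(" + key + ").value");
                return writes;
            }
        }
        table[b] = new Node(key, value, table[b]); // insert at the head of the chain
        writes.add("table[" + b + "]");
        modCount++;
        writes.add("modCount");
        return writes;
    }

    /** Logical write set of put(key, value) for a key not yet in the map. */
    static Set<String> logicalWrites(String key, int value) {
        return Set.of("(M," + key + ")", "(M," + key + "," + value + ")");
    }

    static Set<String> common(Set<String> a, Set<String> b) {
        Set<String> c = new HashSet<>(a);
        c.retainAll(b);
        return c;
    }

    public static void main(String[] args) {
        SpuriousDependences m = new SpuriousDependences();
        Set<String> w1 = m.put("k", 1);
        Set<String> w2 = m.put("k'", 2);
        System.out.println(common(w1, w2));                                        // prints [modCount]
        System.out.println(common(logicalWrites("k", 1), logicalWrites("k'", 2))); // prints []
    }
}
```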

Eureka: Let’s Use Abstraction
• Prior work exploits ADT semantics:
  • Exploiting commutativity in DB transactions (Bernstein, 66)
  • Using ADT semantics in DB concurrency control (Muth et al., 93)
  • Leveraging ADT semantics in STM conflict detection (Galois, Abstract Locking)
• But…
  • We need a predictive tool; our code is still sequential.
  • We want the tool to pinpoint impediments to parallelization before applying parallelization transformations.

The Hawkeye Analysis Tool
• Dynamic analysis tool
• Uses abstraction while tracking (certain) dependencies
• User specifies a representation function for data structures of choice; the rest is tracked concretely
• Allows concentrating on semantic dependencies while suppressing spurious dependencies
[Figure: a representation function maps the concrete Map state (table, buckets, next pointers, modcount) to the Map ADT state, a key/value table with K ↦ 1 and K’ ↦ 2.]
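A representation function can be sketched as a plain Java method. The version below is an illustration under stated assumptions, not Hawkeye’s API: it maps any `java.util.Map` to a set of logical locations written as strings, one `(M,K)` per key and one `(M,K,V)` per entry, following the notation of the algorithm slide; `MapAbstraction` and `abstractState` are hypothetical names. Everything the function does not mention (buckets, `next` pointers, `modcount`) disappears from the abstract state.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical representation function: concrete java.util.Map to a set of logical locations. */
final class MapAbstraction {
    /** One (M,K) location per key and one (M,K,V) location per entry of the map named mapId. */
    static Set<String> abstractState(String mapId, Map<?, ?> m) {
        Set<String> adtState = new TreeSet<>(); // sorted only to make printing deterministic
        for (Map.Entry<?, ?> e : m.entrySet()) {
            adtState.add("(" + mapId + "," + e.getKey() + ")");
            adtState.add("(" + mapId + "," + e.getKey() + "," + e.getValue() + ")");
        }
        return adtState;
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("K", 1);
        m.put("K'", 2);
        System.out.println(abstractState("M", m));
    }
}
```

Two concrete maps with different bucket layouts but the same entries yield the same abstract state, which is what lets layout-only changes drop out of the dependence report.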

Specification Language
Map:
```
foreach key k in m.keySet()
    adtState.add(m -> k);
foreach entry (k,v) in m.entrySet()
    adtState.add(k -> v);
```
Graph:
```
foreach node n in g.nodes()
    adtState.add(g -> n);
foreach edge (n1,n2) in g.edges()
    adtState.add(n1 -> n2);
```
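For comparison, the Graph specification can be transcribed into ordinary Java. This is a hypothetical rendering, not Hawkeye’s specification engine or the JGraphT API: `Graph` and `Edge` are minimal stand-in records, and the abstract state is the set of logical edges `g -> n` (graph to node) and `n1 -> n2` (node to node).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical Java transcription of the Graph specification. */
final class GraphSpec {
    /** A logical edge of the abstract state. */
    record Edge(String from, String to) {}

    /** Minimal stand-in for a graph type (not the JGraphT API). */
    record Graph(String id, List<String> nodes, List<Edge> edges) {}

    static Set<Edge> adtState(Graph g) {
        Set<Edge> adtState = new LinkedHashSet<>();
        for (String n : g.nodes()) {
            adtState.add(new Edge(g.id(), n));        // foreach node n: adtState.add(g -> n)
        }
        for (Edge e : g.edges()) {
            adtState.add(new Edge(e.from(), e.to())); // foreach edge (n1,n2): adtState.add(n1 -> n2)
        }
        return adtState;
    }

    public static void main(String[] args) {
        Graph g = new Graph("G", List.of("a", "b"), List.of(new Edge("a", "b")));
        System.out.println(adtState(g)); // prints [Edge[from=G, to=a], Edge[from=G, to=b], Edge[from=a, to=b]]
    }
}
```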

Specification Language
DistanceFunction:
```
foreach instance i1 in instances()
    foreach instance i2 in instances()
        adtState.add((i1,i2) -> distance(i1,i2));
…
```

Specification Language
• No need to model ADT operations
• No need for a commutativity spec
• Hawkeye uses heuristics for a (sound) approximation of the footprint of an ADT operation
• The user can refine the approximation (though our experience shows that the default is mostly accurate)

The Hawkeye Algorithm
• Our assumptions:
  • encapsulation, for state abstraction
  • linearizability, for trace abstraction
[Figure: the concrete hash-map state M (table, buckets, next pointers, modcount 7, 8, 9) next to the logical state, a graph over the abstract locations M, (M,K), (M,K,1), (M,K,2), (M,K’), and (M,K’,2).]
m.put(k,1);   (R: {}, W: {(M,K), (M,K,1)})
m.put(k’,2);  (R: {}, W: {(M,K’), (M,K’,2)})
m.put(k,2);   (R: {}, W: {(M,K), (M,K,1), (M,K,2)})
The write sets of m.put(k,1) and m.put(k,2) overlap on (M,K) and (M,K,1): a WAW dependency. m.put(k’,2) shares no abstract location with either.
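Given per-operation read and write sets over logical locations, the dependences in this trace can be found in one pass. The sketch below is a minimal version under that assumption (not Hawkeye’s implementation; `TraceDependences` and `Step` are hypothetical names): it remembers, for every location, the last writing step and the steps that read it since that write, and reports RAW, WAW, and WAR edges. On the three `put` calls above it reports only the WAW dependency of `m.put(k,2)` on `m.put(k,1)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: RAW/WAR/WAW dependences along a trace of (R, W) summaries. */
final class TraceDependences {
    record Step(String label, Set<String> reads, Set<String> writes) {}

    static List<String> dependences(List<Step> trace) {
        Map<String, Integer> lastWriter = new HashMap<>();             // location -> last writing step
        Map<String, Set<Integer>> readersSinceWrite = new HashMap<>(); // location -> readers since then
        Set<String> found = new LinkedHashSet<>();                     // deduplicates repeated reports
        for (int j = 0; j < trace.size(); j++) {
            Step s = trace.get(j);
            for (String loc : s.reads()) {
                Integer i = lastWriter.get(loc);
                if (i != null) found.add("RAW " + trace.get(i).label() + " -> " + s.label());
            }
            for (String loc : s.writes()) {
                Integer i = lastWriter.get(loc);
                if (i != null) found.add("WAW " + trace.get(i).label() + " -> " + s.label());
                for (int r : readersSinceWrite.getOrDefault(loc, Set.of())) {
                    found.add("WAR " + trace.get(r).label() + " -> " + s.label());
                }
            }
            // Update the bookkeeping only after checking, so a step never depends on itself.
            for (String loc : s.reads()) {
                readersSinceWrite.computeIfAbsent(loc, k -> new HashSet<>()).add(j);
            }
            for (String loc : s.writes()) {
                lastWriter.put(loc, j);
                readersSinceWrite.remove(loc);
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        List<Step> trace = List.of(
            new Step("m.put(k,1)", Set.of(), Set.of("(M,K)", "(M,K,1)")),
            new Step("m.put(k',2)", Set.of(), Set.of("(M,K')", "(M,K',2)")),
            new Step("m.put(k,2)", Set.of(), Set.of("(M,K)", "(M,K,1)", "(M,K,2)")));
        System.out.println(dependences(trace)); // prints [WAW m.put(k,1) -> m.put(k,2)]
    }
}
```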

Challenges
• What is the meaning of dependencies under abstraction?
• How can we track both concrete and abstract dependencies simultaneously?
We’ve developed a uniform framework for tracking data dependencies…

Best Write Set
• The write set of a transition $\tau = (\sigma, p, \sigma')$ is the union of
  • the locations whose value was changed by $\tau$;
  • the locations allocated by $\tau$; and
  • the locations de-allocated by $\tau$.
Intuitively, the write set of a transition is its observable effect, i.e., the delta between the entry and exit states.
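The write-set definition translates directly into a comparison of entry and exit states. The sketch below assumes states are modelled as maps from location names to integer values, with a location absent from the map counting as unallocated (`BestWriteSet` is a hypothetical name).

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the "best" write set: the delta between entry and exit states. */
final class BestWriteSet {
    /** States map location names to values; a location absent from a state is unallocated. */
    static Set<String> writeSet(Map<String, Integer> entry, Map<String, Integer> exit) {
        Set<String> w = new TreeSet<>();
        for (String loc : entry.keySet()) {
            if (!exit.containsKey(loc)) w.add(loc);                              // de-allocated
            else if (!Objects.equals(entry.get(loc), exit.get(loc))) w.add(loc); // value changed
        }
        for (String loc : exit.keySet()) {
            if (!entry.containsKey(loc)) w.add(loc);                             // allocated
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(writeSet(Map.of("y", 3), Map.of("y", 4))); // prints [y]
        System.out.println(writeSet(Map.of("y", 3), Map.of("y", 3))); // prints []
        System.out.println(writeSet(Map.of("x", 1), Map.of("y", 1))); // prints [x, y]
    }
}
```

Note that writing a value equal to the old one, as in the second call, produces an empty write set: nothing observable changed.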

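Best Read Set (More Tricky)
• A set of locations $R$ is a sufficient read set of a transition $\tau = (\sigma, p, \sigma')$ iff for every transition $\hat{\tau} = (\hat{\sigma}, p, \hat{\sigma}')$ such that $\sigma$ and $\hat{\sigma}$ agree on $R$, $\mathit{write}(\tau) \equiv \mathit{write}(\hat{\tau})$.
• The read set of a transition $\tau$ is the union of all its minimal sufficient read sets.
Intuitively, the read set of a transition is the set of locations whose values determine the observable effect of the transition.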

Simple Example
• ([y=3], set(y,4), [y=4])
  • Read set: { y }
  • Write set: { y }
  • Reading y secures y=4 in the exit state.
• ([y=3], set(y,3), [y=3])
  • Read set: { y }
  • Write set: { }
  • Reading y secures the empty write set: had y not been 3, the statement would have written y.
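The read-set definition can be checked mechanically on toy examples. The sketch below assumes a tiny finite state space (two integer locations, x and y, with values 0 through 4; `BestReadSet` and its helpers are hypothetical names). It enumerates every candidate set R, keeps those that are sufficient, and returns the union of the minimal ones, comparing observable effects as maps from changed locations to their exit values. It reproduces both transitions above. Brute force works here only because the state space is finite and tiny, which fits the point that the best read set is not computable in general.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Hypothetical brute-force "best" read set over locations x, y with values 0..4. */
final class BestReadSet {
    static final List<String> LOCS = List.of("x", "y");
    static final int MAX_VALUE = 4;

    /** Observable effect: changed locations mapped to their exit values. */
    static Map<String, Integer> effect(Map<String, Integer> entry, UnaryOperator<Map<String, Integer>> p) {
        Map<String, Integer> exit = p.apply(entry);
        Map<String, Integer> delta = new HashMap<>();
        for (String loc : LOCS) {
            if (!entry.get(loc).equals(exit.get(loc))) delta.put(loc, exit.get(loc));
        }
        return delta;
    }

    /** All states over x and y with values in 0..MAX_VALUE. */
    static List<Map<String, Integer>> allStates() {
        List<Map<String, Integer>> states = new ArrayList<>();
        for (int x = 0; x <= MAX_VALUE; x++)
            for (int y = 0; y <= MAX_VALUE; y++)
                states.add(Map.of("x", x, "y", y));
        return states;
    }

    /** R is sufficient iff every state agreeing with sigma on R yields an equivalent effect. */
    static boolean sufficient(Set<String> r, Map<String, Integer> sigma, UnaryOperator<Map<String, Integer>> p) {
        Map<String, Integer> expected = effect(sigma, p);
        for (Map<String, Integer> other : allStates()) {
            boolean agrees = r.stream().allMatch(loc -> other.get(loc).equals(sigma.get(loc)));
            if (agrees && !effect(other, p).equals(expected)) return false;
        }
        return true;
    }

    /** The subset of LOCS selected by the bits of mask. */
    static Set<String> subset(int mask) {
        Set<String> s = new TreeSet<>();
        for (int i = 0; i < LOCS.size(); i++) if ((mask & (1 << i)) != 0) s.add(LOCS.get(i));
        return s;
    }

    /** Union of all minimal sufficient read sets. */
    static Set<String> readSet(Map<String, Integer> sigma, UnaryOperator<Map<String, Integer>> p) {
        List<Integer> sufficientMasks = new ArrayList<>();
        for (int mask = 0; mask < (1 << LOCS.size()); mask++) {
            if (sufficient(subset(mask), sigma, p)) sufficientMasks.add(mask);
        }
        Set<String> union = new TreeSet<>();
        for (int mask : sufficientMasks) {
            boolean minimal = true;
            for (int other : sufficientMasks) {
                if (other != mask && (other & mask) == other) { minimal = false; break; } // proper subset
            }
            if (minimal) union.addAll(subset(mask));
        }
        return union;
    }

    /** Statement set(loc, v): write v to loc, leaving other locations unchanged. */
    static UnaryOperator<Map<String, Integer>> set(String loc, int v) {
        return s -> {
            Map<String, Integer> out = new HashMap<>(s);
            out.put(loc, v);
            return out;
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> sigma = Map.of("x", 0, "y", 3);
        System.out.println(readSet(sigma, set("y", 4))); // prints [y]
        System.out.println(readSet(sigma, set("y", 3))); // prints [y]
    }
}
```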

Approximating the “Best” Definitions
• The good news: the “best” definitions apply in both the concrete and the abstract semantics.
• The bad news: the “best” read set is not computable in general.
• An approximation $r, w$ of $\mathit{read}, \mathit{write}$ is sound iff
  • $\mathit{read} \subseteq r \cup w$
  • $\mathit{write} \subseteq w$
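The soundness condition is easy to check once read and write sets are represented as sets of locations. The sketch below (hypothetical names) uses the condition read ⊆ r ∪ w and write ⊆ w and applies it to the transition ([y=3], set(y,4), [y=4]), whose best read and write sets are both { y }. It shows that an approximation may leave a location out of r when it already reports that location in w; presumably this is acceptable because any dependence through that location is still detected through w, as a write-write rather than a read-write conflict.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical check of the soundness condition: read is a subset of (r union w), write a subset of w. */
final class SoundApproximation {
    static boolean isSound(Set<String> read, Set<String> write, Set<String> r, Set<String> w) {
        Set<String> rUnionW = new HashSet<>(r);
        rUnionW.addAll(w);
        return rUnionW.containsAll(read) && w.containsAll(write);
    }

    public static void main(String[] args) {
        // Best sets of ([y=3], set(y,4), [y=4]): read = {y}, write = {y}.
        Set<String> read = Set.of("y");
        Set<String> write = Set.of("y");
        System.out.println(isSound(read, write, Set.of(), Set.of("y")));         // prints true
        System.out.println(isSound(read, write, Set.of("y"), Set.of()));         // prints false
        System.out.println(isSound(read, write, Set.of("x", "y"), Set.of("y"))); // prints true
    }
}
```

The second approximation fails because it misses the write of y; the third over-approximates the reads, which soundness permits.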

Usage Scenario
[Figure: the concrete hash-map state (table array, buckets, next pointers, key/value entries for K and K’, modcount).]
Hmmm… Too many dependencies!

Usage Scenario
[Figure: the abstract Map state, a key/value table with K ↦ 1 and K’ ↦ 2.]
Now I understand what’s going on!


[Figure: Number of inter-iteration dependencies at the level of ADT operations, with and without abstraction; only the built-in specification (Java collections).]

[Figure: Number of inter-iteration dependencies at the level of ADT operations, with and without abstraction; including user specifications (for user types).]

[Figure: a concrete hash-map state (table, next pointers, modcount 7, 8, 9) whose entries spell “THANK YOU!”.]

Backup

Preliminaries
• A state maps memory locations to values.
• A transition is a triple $(\sigma, p, \sigma')$, where $p$ is a program statement and $\sigma, \sigma'$ are states, such that executing $p$ in $\sigma$ yields $\sigma'$.
• A program trace is a sequence of transitions.
• We assume an interleaving semantics of concurrency.

Approximate Read Set
• Take 1: all the locations reachable from the arguments
• Take 2: all the locations reachable from the arguments that were accessed during the statement’s execution
• Take 3: all the locations reachable from the arguments that were accessed during the statement’s execution, with a user specification of the frame
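Takes 1 and 2 can be sketched over a toy heap model. In the sketch below (all names hypothetical; objects are string ids, and each field either references another object or holds a primitive or null value, written as an empty string), Take 1 collects every field of every object reachable from the arguments, and Take 2 intersects that with the locations logged as accessed during the call. Take 3’s user-specified frame is not modelled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the "Take 1" and "Take 2" read-set approximations over a toy heap. */
final class ApproxReadSet {
    /** Take 1. Heap: object id -> (field -> referenced object id, or "" for a primitive/null field). */
    static Set<String> reachableLocations(Map<String, Map<String, String>> heap, List<String> roots) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        Set<String> locations = new TreeSet<>();
        while (!work.isEmpty()) {
            String obj = work.pop();
            for (Map.Entry<String, String> f : heap.getOrDefault(obj, Map.of()).entrySet()) {
                locations.add(obj + "." + f.getKey());                        // the field is a location
                String target = f.getValue();
                if (!target.isEmpty() && seen.add(target)) work.push(target); // follow references once
            }
        }
        return locations;
    }

    /** Take 2: Take 1 restricted to the locations actually accessed while the statement ran. */
    static Set<String> take2(Map<String, Map<String, String>> heap, List<String> roots, Set<String> accessed) {
        Set<String> result = reachableLocations(heap, roots);
        result.retainAll(accessed);
        return result;
    }

    public static void main(String[] args) {
        // m.table -> t, t[0] -> entry e1; m.modCount and e1's fields hold no references.
        Map<String, Map<String, String>> heap = Map.of(
            "m", Map.of("table", "t", "modCount", ""),
            "t", Map.of("0", "e1"),
            "e1", Map.of("key", "", "value", "", "next", ""));
        List<String> callArgs = List.of("m");
        System.out.println(reachableLocations(heap, callArgs).size()); // prints 6
        Set<String> accessed = Set.of("m.table", "t.0", "e1.key", "e1.value", "tmp.x");
        System.out.println(take2(heap, callArgs, accessed)); // prints [e1.key, e1.value, m.table, t.0]
    }
}
```

Take 2 drops `m.modCount` and `e1.next` (reachable but not accessed) and `tmp.x` (accessed but not reachable from the arguments).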
