Approximating Inclusion-based Points-to Analysis

Rupesh Nasre. Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India. MSPC 2011, June 05, 2011.

Placement of Pointer Analysis. Clients of pointer analysis: parallelizing compiler, lock synchronizer, memory leak detector, data flow analyzer, string vulnerability finder, affine expression analyzer, type analyzer, program slicer. Benefits: improved runtime, secure code, better compile time, better debugging.

Inclusion-based Points-to Analysis • p = &q (address-of) • p = q (copy) • p = *q (load) • *p = q (store)
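The four constraint forms are usually read as subset (inclusion) requirements on points-to sets. The block below writes out that standard Andersen-style reading; the `ptsto` notation follows the later slides, and the quantified variable `v` is introduced here only for illustration.

```latex
\begin{align*}
p = \&q  &:\quad q \in \mathrm{ptsto}(p) \\
p = q    &:\quad \mathrm{ptsto}(p) \supseteq \mathrm{ptsto}(q) \\
p = {*q} &:\quad \forall v \in \mathrm{ptsto}(q).\ \mathrm{ptsto}(p) \supseteq \mathrm{ptsto}(v) \\
{*p} = q &:\quad \forall v \in \mathrm{ptsto}(p).\ \mathrm{ptsto}(v) \supseteq \mathrm{ptsto}(q)
\end{align*}
```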

Inclusion-based Points-to Analysis. Program: main ( ) { if (...) { ... } } ... Constraints: ... a = &x, c = b, d = *b, *b = a ... Points-to sets: ... a → {x,y}, b → {a,z}, c → {a,z,x} ...

Optimizations • Online cycle elimination (Fahndrich et al., 1998) • Offline variable substitution (Rountev and Chandra, 2000) • Pointer and location equivalence (Hardekopf and Lin, 2007) These optimizations preserve the precision of the underlying analysis.

Pointer Equivalence (PE). Figure: original points-to sets, with pointers P1 and P2 as separate nodes over objects x1 to x7, and modified points-to sets, with P1 and P2 merged into a single node.

Location Equivalence (LE). Figure: original points-to sets, with objects x1 and x2 as separate nodes pointed to by pointers P1 to P7, and modified points-to sets, with x1 and x2 merged into a single node.

Issues and Learnings • Cubic time complexity. • High absolute running times. • Approximations are inevitable for scalability.

Basic Idea: Approximate Pointer Equivalence (APE). Figure: original points-to sets, with pointers P1 and P2 as separate nodes over objects x1 to x7, and approximate points-to sets, with P1 and P2 merged into a single node.

Basic Idea: Approximate Location Equivalence (ALE). Figure: original points-to sets, with objects x1 and x2 as separate nodes pointed to by pointers P1 to P7, and approximate points-to sets, with x1 and x2 merged into a single node.

Our Contributions • Approximate pointer and location equivalence • Sound algorithm to compute APE and ALE online • Optimizations: • Proximity merge • Eager/lazy merging • Merge order • Equivalence identification frequency • Extensive empirical evaluation

APE and ALE. Pointers P1 and P2 are approximately pointer equivalent with similarity α if sim(ptsto(P1), ptsto(P2)) ≥ α. Objects x1 and x2 are approximately location equivalent with similarity β if sim(ptdby(x1), ptdby(x2)) ≥ β. Here sim(s1, s2) = |s1 ∩ s2| / |s1 ∪ s2|.

Examples
ptsto(p1) = {x,y,z}, ptsto(p2) = {y,z,w}, ptsto(p3) = {x,w}
ptdby(x) = {p1, p3}, ptdby(y) = {p1, p2}, ptdby(z) = {p1, p2}, ptdby(w) = {p2, p3}
α = 0.5: p1 and p2 are APE with similarity 2/4 = 0.5; p1 and p3 are not APE with similarity 1/4 = 0.25
β = 0.7: y and z are ALE with similarity 2/2 = 1.0; x and w are not ALE with similarity 1/3 = 0.33
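The similarity checks in these examples can be reproduced with a few lines of Python. This is a minimal sketch: the helper names `sim`, `invert`, `is_ape` and `is_ale` are made up for illustration, and scoring two empty sets as 0 is an assumption the slides do not address.

```python
def sim(s1, s2):
    """Ratio |s1 ∩ s2| / |s1 ∪ s2|; two empty sets score 0.0 (an assumption)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def invert(ptsto):
    """Derive pointed-to-by sets (ptdby) from points-to sets (ptsto)."""
    ptdby = {}
    for pointer, objects in ptsto.items():
        for obj in objects:
            ptdby.setdefault(obj, set()).add(pointer)
    return ptdby

# Points-to sets from the examples slide.
ptsto = {"p1": {"x", "y", "z"}, "p2": {"y", "z", "w"}, "p3": {"x", "w"}}
ptdby = invert(ptsto)
alpha, beta = 0.5, 0.7

def is_ape(p, q):
    """Approximately pointer equivalent at threshold alpha."""
    return sim(ptsto[p], ptsto[q]) >= alpha

def is_ale(x, y):
    """Approximately location equivalent at threshold beta."""
    return sim(ptdby[x], ptdby[y]) >= beta

print(is_ape("p1", "p2"), is_ape("p1", "p3"))  # True False
print(is_ale("y", "z"), is_ale("x", "w"))      # True False
```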

Approximate Points-to Analysis
Input: set of constraints, α, β.
Process address-of constraints
Add edges to constraint graph G using copy constraints
repeat
    Propagate points-to information in G
    for each variable pair (x, y) do
        simα = sim(ptsto(x), ptsto(y))
        simβ = sim(ptdby(x), ptdby(y))
        if simα ≥ α or simβ ≥ β then
            merge(x, y)
        end if
    end for
    Add edges to G using load and store constraints
until fixed point

Example
Input constraints: a = &u, a = &v, a = &w, a = &x, b = &y, c = &z, u = &v, v = &u, p = &a, q = &b, r = &a, d = a, *p = c, *q = c
α = 0.5, β = 0.7

Constraint graph after each step (the leading number is the iteration):
0: Processing address-of constraints: a {u,v,w,x}, b {y}, c {z}, d { }, p {a}, q {b}, r {a}, u {v}, v {u}
0: Processing copy constraints: points-to sets unchanged
1: Propagate points-to information: d {u,v,w,x}
1: merge(a, d): [a,d] {u,v,w,x}
1: merge(p, r): [p,r] {a}
1: merge(w, x): [a,d] {u,v,[w,x]}
1: Processing load/store constraints: points-to sets unchanged
2: Propagate points-to information: [a,d] {u,v,[w,x],z}, b {y,z}
2: merge(b, c): [b,c] {y,z}
3: Propagate points-to information: [a,d] {u,v,[w,x],y,z}
Fixed point: [a,d] {u,v,[w,x],y,z}, [p,r] {a}, q {b}, [b,c] {y,z}, u {v}, v {u}

Exact analysis: 18 points-to pairs. Approximate analysis: 19 points-to pairs.
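The exact count can be cross-checked by running a naive inclusion-based solver with no merging over the same constraints. The sketch below is a plain fixed-point iteration, not the author's implementation; the function name `solve` and the tuple encoding of constraints are assumptions made for illustration.

```python
def solve(address_of, copy, load, store):
    """Naive fixed-point solver for inclusion constraints, without merging."""
    pts = {}

    def get(v):
        return pts.setdefault(v, set())

    def include(dst, src):
        """Enforce ptsto(dst) ⊇ ptsto(src); return True if ptsto(dst) grew."""
        target = get(dst)
        before = len(target)
        target |= get(src)
        return len(target) != before

    for p, q in address_of:          # p = &q
        get(p).add(q)

    changed = True
    while changed:
        changed = False
        for p, q in copy:            # p = q
            changed |= include(p, q)
        for p, q in load:            # p = *q
            for v in list(get(q)):
                changed |= include(p, v)
        for p, q in store:           # *p = q
            for v in list(get(p)):
                changed |= include(v, q)
    return pts

# Constraints from the example slide, encoded as (lhs, rhs) pairs.
address_of = [("a", "u"), ("a", "v"), ("a", "w"), ("a", "x"), ("b", "y"),
              ("c", "z"), ("u", "v"), ("v", "u"), ("p", "a"), ("q", "b"),
              ("r", "a")]
copy = [("d", "a")]
load = []
store = [("p", "c"), ("q", "c")]

pts = solve(address_of, copy, load, store)
total = sum(len(s) for s in pts.values())
print(total)  # 18
```

In this run a and d each end with {u,v,w,x,z} and b with {y,z}, so the extra pair in the approximate result comes from y reaching [a,d] after merge(b, c).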

Proximity Merge. Similarity of a node is checked only against other nodes that are at most k-reachable from it. (Figure: graph with nodes labeled 0, 1, 1, 2, 2, 2, 3, 3.) A proximity of k=10 gives a 38% improvement in analysis time and the precision loss is only 2.6%.
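One way to pick the candidates for a proximity-limited similarity check is a breadth-first search that stops at depth k. This is a sketch under assumptions: the function `within_k`, the toy graph, and the choice to follow directed edges are all illustrative, since the slide does not specify how k-reachability is computed.

```python
from collections import deque

def within_k(graph, start, k):
    """Map each node reachable from start via at most k edges to its distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:          # do not expand beyond depth k
            continue
        for succ in graph.get(node, ()):
            if succ not in dist:
                dist[succ] = dist[node] + 1
                queue.append(succ)
    del dist[start]                  # a node is not compared with itself
    return dist

# Hypothetical constraint graph given as adjacency lists.
graph = {"n0": ["n1", "n2"], "n1": ["n3"], "n2": ["n3", "n4"],
         "n3": ["n5"], "n4": [], "n5": []}

print(within_k(graph, "n0", 2))  # {'n1': 1, 'n2': 1, 'n3': 2, 'n4': 2}
```

Only the returned nodes would then be passed to the similarity test, which bounds the number of pairs examined per node.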

Lazy Merging When to merge matters. Example: α = 0.5 ptsto(p1) = {a,b}, ptsto(p2) = {b,c}, ptsto(p3) = {c,d}, ptsto(p4) = {d,e}, ptsto(p5) = {e,f}, ptsto(p6) = {f,g} Eager merging: [p1,p2], [p3,p4], [p5,p6] Lazy merging: [p1,p2,p3,p4,p5,p6] Lazy merging improves analysis time over eager merging by 8%, but reduces precision.
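The two groupings can be reproduced with a small sketch. It reads eager merging as "merge as soon as a similar pair is found, so later checks see the merged set" and lazy merging as "find all similar pairs on the original sets first, then merge transitively"; that reading and the function names are assumptions. With sim taken as |s1 ∩ s2| / |s1 ∪ s2|, adjacent sets in this example score 1/3, so the sketch uses a threshold of 0.3 rather than 0.5 to obtain the groupings shown.

```python
def sim(s1, s2):
    """Ratio |s1 ∩ s2| / |s1 ∪ s2|; two empty sets score 0.0 (an assumption)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def eager(ptsto, alpha):
    """Merge into the first similar group immediately, growing its set."""
    groups = []                              # each entry: [members, merged set]
    for name, pts in ptsto.items():
        for group in groups:
            if sim(group[1], pts) >= alpha:
                group[0].append(name)
                group[1] |= pts
                break
        else:
            groups.append([[name], set(pts)])
    return [members for members, _ in groups]

def lazy(ptsto, alpha):
    """Identify all similar pairs on the original sets, then merge transitively."""
    names = list(ptsto)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if sim(ptsto[x], ptsto[y]) >= alpha:
                parent[find(y)] = find(x)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())

ptsto = {"p1": {"a", "b"}, "p2": {"b", "c"}, "p3": {"c", "d"},
         "p4": {"d", "e"}, "p5": {"e", "f"}, "p6": {"f", "g"}}

print(eager(ptsto, 0.3))  # [['p1', 'p2'], ['p3', 'p4'], ['p5', 'p6']]
print(lazy(ptsto, 0.3))   # [['p1', 'p2', 'p3', 'p4', 'p5', 'p6']]
```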

Merge Order Merge order matters. Example: α = 0.5 ptsto(a) = {x,y,z}, ptsto(b) = {w,y,z}, ptsto(c) = {w,x} Order (a,c), (b,c), (a,b) merges a and b. Order (a,b), (a,c), (b,c) merges all a, b and c. We arrange nodes in non-increasing and non-decreasing similarities and find that the former reduces analysis time by 10% while the latter requires 8% more time than not ordering the variables.
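The effect of the pair order can be checked with a short sketch that visits pairs in a given order and merges eagerly, so each merge enlarges the set seen by later comparisons. The function `merge_in_order` and its representative map are illustrative assumptions, not the author's data structures.

```python
def sim(s1, s2):
    """Ratio |s1 ∩ s2| / |s1 ∪ s2|; two empty sets score 0.0 (an assumption)."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0

def merge_in_order(ptsto, pairs, alpha):
    """Check pairs in the given order and merge a pair as soon as it qualifies."""
    rep = {v: v for v in ptsto}                    # representative of each variable
    sets = {v: set(s) for v, s in ptsto.items()}   # points-to set per representative

    def find(v):
        while rep[v] != v:
            v = rep[v]
        return v

    for x, y in pairs:
        rx, ry = find(x), find(y)
        if rx != ry and sim(sets[rx], sets[ry]) >= alpha:
            rep[ry] = rx
            sets[rx] |= sets.pop(ry)               # merged node takes the union
    groups = {}
    for v in ptsto:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

ptsto = {"a": {"x", "y", "z"}, "b": {"w", "y", "z"}, "c": {"w", "x"}}

print(merge_in_order(ptsto, [("a", "c"), ("b", "c"), ("a", "b")], 0.5))
# [['a', 'b'], ['c']]
print(merge_in_order(ptsto, [("a", "b"), ("a", "c"), ("b", "c")], 0.5))
# [['a', 'b', 'c']]
```

In the second order, merging a and b first yields {w,x,y,z}, whose similarity with ptsto(c) = {w,x} is 2/4 = 0.5, which is why c is absorbed as well.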

Effect of α Precision improves non-linearly with α.

Effect of β Precision improves almost linearly with β.

Equivalence Identification Frequency A client provides MAXMEM and the analysis updates α, β to choose the maximum amount of precision possible. Note: observations are averaged across benchmarks.

Conclusions • APE is more important than ALE. • Proximity merging helps the analysis scale. • Lazy vs eager merging and the order of merging offer useful trade-offs. • A client can specify MAXMEM so that the analysis uses the available memory judiciously.

Rupesh Nasre. nasre@csa.iisc.ernet.in Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India MSPC 2011 June 05, 2011 Approximating Inclusion-based Points-to Analysis
