Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis


Stephen Adams, Tom Ball, Manuvir Das (Microsoft Research)

Sorin Lerner, Mark Seigle (University of Washington)

Westley Weimer (UC Berkeley)

- Static analysis for program verification
- Complex dataflow analyses are popular
- SLAM, ESP, BLAST, CQual, …
- Flow-Sensitive
- Interprocedural
- Expensive!

- Cut down on “data flow facts”
- Without losing anything important

- If complex analysis is worse than O(N)
- And you have a cheap analysis that
- Is O(N)
- Reduces N

- Then composing them saves time
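To make the argument concrete, assume for illustration only that the complex analysis is quadratic, costing c·N² on N facts, and that the cheap pass costs d·N and leaves n < N facts. Composing the two then pays off exactly when:

```latex
T_{\text{alone}} = c\,N^{2}, \qquad
T_{\text{composed}} = d\,N + c\,n^{2}, \qquad
T_{\text{composed}} < T_{\text{alone}} \iff d\,N < c\,(N^{2} - n^{2})
```

For a fixed fraction n/N < 1, the right-hand side grows quadratically while the left grows linearly, so the inequality holds for all sufficiently large N.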

- Variant of a points-to graph
- Encodes the flow of values in the program
- Conservative approximation
- Lightweight, fast to compute and query
- Early queries can safely reduce
- data-flow facts considered
- program points considered

- Like slicing a program with respect to value flow

- Use a subtyping-based pointer analysis
- We used One-Level Flow [Das]

- Process all assignments
- Not just those involving pointers

- Represent constant values explicitly
- Put them in the graph

- Label graph with source locations
- Encodes program slices

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: the One Level Flow Graph and the Value Flow Graph for this program. Legend: Source “Address” nodes (a), Expr nodes (x), points-to edges, and flow edges. In the Value Flow Graph the constant 7 appears as a node, and edges carry the source lines that induce them (2, 3, and 2,3).]

- Computed in almost-linear time
- Get points-to sets from VFG in linear time
- Backwards reachability via flow edges
- Gather up all variables

- Get value flow from VFG in linear time
- Backwards reachability via flow edges
- Follow points-to edges up one
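The points-to query above can be sketched in C as a backward walk over flow edges. This is a minimal illustration, not the authors' implementation: the `Vfg` structure, the node kinds, and all function names are hypothetical. The same walk also collects the source-line labels on the traversed edges, which is how a labeled graph can encode a program slice. For simplicity the walk rescans the edge array at each step; reaching the linear-time bound cited on the slide would require reverse adjacency lists.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 32
#define MAX_EDGES 64

/* Node kinds from the slides: "Address" source nodes, expression nodes, constants. */
typedef enum { NODE_ADDR, NODE_EXPR, NODE_CONST } NodeKind;

typedef struct {
    NodeKind kind;
    const char *name; /* for NODE_ADDR: the variable whose address this is */
} Node;

/* A flow edge src -> dst, labeled with the source line that induced it. */
typedef struct {
    int src, dst, line;
} FlowEdge;

typedef struct {
    Node nodes[MAX_NODES];
    int n_nodes;
    FlowEdge edges[MAX_EDGES];
    int n_edges;
} Vfg;

static int vfg_add_node(Vfg *g, NodeKind kind, const char *name) {
    assert(g->n_nodes < MAX_NODES);
    g->nodes[g->n_nodes] = (Node){kind, name};
    return g->n_nodes++;
}

static void vfg_add_flow(Vfg *g, int src, int dst, int line) {
    assert(g->n_edges < MAX_EDGES);
    g->edges[g->n_edges++] = (FlowEdge){src, dst, line};
}

/* Backward reachability: mark every node whose value may flow into `start`. */
static void vfg_backward(const Vfg *g, int start, bool reached[MAX_NODES]) {
    int worklist[MAX_NODES]; /* each node is pushed at most once */
    int top = 0;
    for (int i = 0; i < g->n_nodes; i++) reached[i] = false;
    reached[start] = true;
    worklist[top++] = start;
    while (top > 0) {
        int cur = worklist[--top];
        for (int e = 0; e < g->n_edges; e++) {
            int src = g->edges[e].src;
            if (g->edges[e].dst == cur && !reached[src]) {
                reached[src] = true;
                worklist[top++] = src;
            }
        }
    }
}

/* Points-to query: gather the variables named by reached "Address" nodes. */
static int vfg_points_to(const Vfg *g, int expr, const char *out[MAX_NODES]) {
    bool reached[MAX_NODES];
    int count = 0;
    vfg_backward(g, expr, reached);
    for (int i = 0; i < g->n_nodes; i++)
        if (reached[i] && g->nodes[i].kind == NODE_ADDR) out[count++] = g->nodes[i].name;
    return count;
}

/* Slice query: bit k is set if an edge on some path into `expr` is labeled line k. */
static unsigned vfg_slice_lines(const Vfg *g, int expr) {
    bool reached[MAX_NODES];
    unsigned lines = 0;
    vfg_backward(g, expr, reached);
    for (int e = 0; e < g->n_edges; e++) {
        if (!reached[g->edges[e].dst]) continue;
        assert(g->edges[e].line >= 0 && g->edges[e].line < 32); /* fits the bitmask */
        lines |= 1u << g->edges[e].line;
    }
    return lines;
}
```

For the earlier three-line program, an address node for `a`, an expression node for `x`, and one flow edge between them labeled 2 make `vfg_points_to` on `x` return only `a`, and `vfg_slice_lines` on `x` report line 2.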


- Computed in almost-linear time
- Queries complete in linear time
- Approximates flow of values in program
- Show two applications that benefit
- ESP
- SLAM

- Verification tool for large C++ programs
- Tracks “typestate” of values
- Encoded as a finite state machine
- Special Error state
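To make the typestate idea concrete, here is a hypothetical FILE * property written as a finite state machine in C. The states, operations, and function names are assumptions for illustration; the slides do not show ESP's actual property specifications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical typestate for a FILE * handle; TS_ERROR is the special error state. */
typedef enum { TS_CLOSED, TS_OPENED, TS_ERROR } Typestate;
typedef enum { OP_OPEN, OP_WRITE, OP_CLOSE } Operation;

/* FSM transition function; once in TS_ERROR, the value stays there. */
static Typestate ts_step(Typestate s, Operation op) {
    switch (s) {
    case TS_CLOSED:
        return op == OP_OPEN ? TS_OPENED : TS_ERROR; /* write or close before open */
    case TS_OPENED:
        if (op == OP_WRITE) return TS_OPENED;
        if (op == OP_CLOSE) return TS_CLOSED;
        return TS_ERROR; /* opening a handle that is already open */
    case TS_ERROR:
        return TS_ERROR;
    }
    return TS_ERROR;
}

/* Track one value's typestate through a sequence of operations, starting closed. */
static Typestate ts_run(const Operation *ops, int n) {
    assert(n == 0 || ops != NULL);
    Typestate s = TS_CLOSED;
    for (int i = 0; i < n; i++) s = ts_step(s, ops[i]);
    return s;
}
```

Open, write, close ends back in TS_CLOSED, while open, close, write ends in TS_ERROR. A flow-sensitive engine would track such a state for each value at every program point rather than along one straight-line sequence.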

- Core: interprocedural data-flow engine
- Flow sensitive: state at every point

- Performed bottom-up on call graph
- Requires function summaries

- Consider stateful memory locations
- Summarize function behavior for each location
- Reducing the number of locations would be good!
- But C has evil casts, so types cannot be used to rule out locations

- Worst case set of locations:
- All globals and formal parameters
- Everything transitively reachable from there

- Location L needs to be considered in F if
- Some expression E has its state changed in F
- Value held by L at entry to F can flow into E

- Assuming state-changing ops are known
- Query VFG to find values that flow in

FILE *e, *f, *g, *h;

void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
}

Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }


- (1) Compute VFG
- (2) Query value flow on *p
- (3) Reduced locations to consider for foo() summary: { e, f }
- (4) Reduce lines to consider for dataflow
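The location-filtering rule can be sketched over a may-flow relation. This is a hypothetical model, not ESP's implementation: it assumes the VFG queries have already produced direct may-flow facts, closes them transitively with Warshall's algorithm, and keeps a location only if its entry value may reach an expression whose state the function changes. For foo(), one plausible encoding (our assumption) is that the entry values of e and f may flow into *p, because p may point to either, while h flows only into a through the cast. With *p as the only state-changing expression, the sketch keeps e and f and drops g and h.

```c
#include <assert.h>
#include <stdbool.h>

#define N_MAX 16

/* flows[i][j] is true when the value of node i may flow into node j. */
typedef struct {
    int n;
    bool flows[N_MAX][N_MAX];
} FlowRel;

/* Warshall's algorithm: close the direct may-flow relation transitively. */
static void flow_close(FlowRel *r) {
    assert(r->n <= N_MAX);
    for (int k = 0; k < r->n; k++)
        for (int i = 0; i < r->n; i++)
            if (r->flows[i][k])
                for (int j = 0; j < r->n; j++)
                    if (r->flows[k][j]) r->flows[i][j] = true;
}

/* Keep location L if its entry value may flow into (or is) some expression
   whose state the function changes. Returns the number of locations kept. */
static int needed_locations(const FlowRel *r, const int *locs, int n_locs,
                            const int *changed, int n_changed, int *out) {
    int kept = 0;
    for (int i = 0; i < n_locs; i++) {
        for (int c = 0; c < n_changed; c++) {
            if (locs[i] == changed[c] || r->flows[locs[i]][changed[c]]) {
                out[kept++] = locs[i];
                break; /* one witness expression is enough */
            }
        }
    }
    return kept;
}
```

In the foo() encoding, only the direct facts e → *p, f → *p, and h → a are needed; the closure matters when values pass through several copies before reaching a state-changing expression.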

- FILE * output in GCC
- 140 KLOC, 2149 functions, 66 files, 1068 globals

- VFG queries take 200 seconds
- Reduce average number of locations per function summary from 1100 to <1
- Median of 15 among functions with at least one location

- Verification takes 15 minutes
- Infeasible otherwise

- Validates temporal safety properties
- Boolean abstraction
- Interprocedural dataflow analysis
- Counterexample-driven refinement

- Convert C program to Boolean program
- Exhaustive dataflow analysis
- No errors? Program is safe.
- Real error? Program has a bug.
- False error? Add predicates, repeat.
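The refinement loop above can be sketched as a small C driver. Everything here is hypothetical: the hook names, the result codes, and the toy checker that succeeds once enough predicates exist stand in for SLAM's real abstraction and model-checking phases, which the slides do not detail.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of checking the Boolean program (hypothetical interface). */
typedef enum { CHECK_SAFE, CHECK_REAL_ERROR, CHECK_FALSE_ERROR } CheckResult;
typedef enum { VERDICT_SAFE, VERDICT_BUG, VERDICT_GAVE_UP } Verdict;

/* Hypothetical hooks for the two tool phases. */
typedef struct {
    CheckResult (*abstract_and_check)(void *ctx); /* build Boolean program, run dataflow */
    void (*add_predicates)(void *ctx);            /* refine after a false error */
    void *ctx;
} CegarHooks;

static Verdict cegar(const CegarHooks *h, int max_iterations) {
    assert(h != NULL);
    for (int i = 0; i < max_iterations; i++) {
        switch (h->abstract_and_check(h->ctx)) {
        case CHECK_SAFE:        return VERDICT_SAFE; /* no errors: program is safe */
        case CHECK_REAL_ERROR:  return VERDICT_BUG;  /* real error: program has a bug */
        case CHECK_FALSE_ERROR:                      /* false error: add predicates, repeat */
            h->add_predicates(h->ctx);
            break;
        }
    }
    return VERDICT_GAVE_UP;
}

/* Toy stand-in: the proof succeeds once `needed` predicates exist. */
typedef struct { int predicates, needed; } ToyCtx;

static CheckResult toy_check(void *ctx) {
    ToyCtx *t = ctx;
    return t->predicates >= t->needed ? CHECK_SAFE : CHECK_FALSE_ERROR;
}

static void toy_refine(void *ctx) { ((ToyCtx *)ctx)->predicates++; }
```

With the toy checker, each missing predicate costs one full abstract-and-check round, so starting with some predicates already present removes rounds; that is the effect the VFG-derived predicates later in this talk aim for.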

C Program:

int x,y;
x = 5;
y = 6;
x = x * 2;
y = y * 2;
assert(x < y);

Boolean Program:

bool p,q;
p = 1;
q = 1;
p = 0; q = 0;
q = 1;
assert(q);

Predicates (important!):

p means “x == 5”
q means “x < y”

- Hard to come up with good predicates
- Counterexample-driven refinement
- Picks good predicates
- Is very slow

- Taking all possible predicates
- Is even slower

- Want “all the useful” predicates

- For a simple subset of C
- Similar to “Copy Constants”
- Use VFG to find a sufficient set of predicates
- Provably sufficient for this subset

- If this set fails to prove the real program
- Fall back on counterexample-driven refinement

s ::= vi = n               // constants
    | vi = vj              // variable copy
    | if (*) s1 else s2    // condition ignored
    | vi = fun(vj, …)      // function call
    | return(vi)           // function return
    | assert(vi op vj)     // safety property; op is a relational operator
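One way to encode this grammar in C is a tagged union. The sketch below is an illustration under stated assumptions: it covers only constants, copies, and `if (*)`, omits calls and returns, and the type and function names are hypothetical. It also shows how each statement induces value-flow edges; because the condition is ignored, both branches of an `if (*)` contribute.

```c
#include <assert.h>
#include <stddef.h>

/* Statement forms of the simple language; calls and returns are omitted. */
typedef enum { S_CONST, S_COPY, S_IF } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    int dst;                   /* vi: target variable (S_CONST, S_COPY) */
    int src;                   /* n for S_CONST, vj for S_COPY */
    const struct Stmt *then_s; /* s1 for S_IF */
    const struct Stmt *else_s; /* s2 for S_IF */
} Stmt;

/* A value-flow edge into variable `dst` from a constant or another variable. */
typedef struct {
    int src_is_const;
    int src;
    int dst;
} Edge;

/* Append the edges induced by `s` at out[n...]; returns the new edge count. */
static int stmt_edges(const Stmt *s, Edge *out, int n) {
    assert(s != NULL);
    switch (s->kind) {
    case S_CONST:
        out[n] = (Edge){1, s->src, s->dst};
        return n + 1;
    case S_COPY:
        out[n] = (Edge){0, s->src, s->dst};
        return n + 1;
    case S_IF: /* condition ignored: either branch may execute */
        n = stmt_edges(s->then_s, out, n);
        return stmt_edges(s->else_s, out, n);
    }
    return n;
}
```

For `if (*) c = 2; else c = 4;` the sketch yields two edges, 2 → c and 4 → c.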

- High-level idea
- Each flow edge in the VFG means “values may flow from X to Y”
- Add predicates to see if they do

- For each assert(vi op vj), where op is a relational operator
- Consider the chain of values flowing to vi, vj
- Add an equality predicate for each link
- Use constants to resolve scoping
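The chain-walking step can be sketched as a backward traversal that emits one equality per flow edge. This is a hypothetical illustration: the graph layout and function names are assumptions, and the sketch does not perform the scope resolution with constants mentioned in the last bullet, so a cross-function link such as f == a is emitted as-is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 16
#define MAX_EDGES 32
#define MAX_PREDS 32

/* Value flow graph over variables and constants; edge k is src[k] -> dst[k]. */
typedef struct {
    const char *names[MAX_NODES];
    int n_nodes;
    int src[MAX_EDGES], dst[MAX_EDGES];
    int n_edges;
} Graph;

typedef struct {
    char text[MAX_PREDS][32];
    int count;
} Preds;

/* Walk backward from `v`; each flow edge src -> v yields the predicate "v == src". */
static void chain_preds(const Graph *g, int v, bool seen[MAX_NODES], Preds *out) {
    if (seen[v]) return;
    seen[v] = true;
    for (int e = 0; e < g->n_edges; e++) {
        if (g->dst[e] != v) continue;
        assert(out->count < MAX_PREDS);
        snprintf(out->text[out->count], sizeof out->text[0], "%s == %s",
                 g->names[v], g->names[g->src[e]]);
        out->count++;
        chain_preds(g, g->src[e], seen, out);
    }
}

/* Predicates for assert(vi op vj): one equality for each link in either chain. */
static Preds assert_preds(const Graph *g, int vi, int vj) {
    Preds out = {.count = 0};
    bool seen[MAX_NODES] = {false};
    chain_preds(g, vi, seen, &out);
    chain_preds(g, vj, seen, &out);
    return out;
}

static bool preds_contains(const Preds *p, const char *text) {
    for (int i = 0; i < p->count; i++)
        if (strcmp(p->text[i], text) == 0) return true;
    return false;
}
```

On the sel/main example below, with flow edges r → b, f → r, 3 → r, a → f, 1 → a, 2 → c, and 4 → c added in that order, asserting b > c yields seven predicates: b == r, r == f, f == a, a == 1, r == 3, c == 2, c == 4.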

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a,b,c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: value flow graph for this example, with variable nodes b, r, f, a, c and constant nodes 1, 2, 3, 4.]


Predicates:
b == r
r == 3
r == f
f == a // no scope!
a == 1


Predicates, before → after using constants to resolve scoping:
b == r → b == r
r == 3 → r == 3
r == f → r == f
f == a // no scope! → f == 1, f == 3
a == 1 → a == 1, a == 3

- Simple language
- No arithmetic, etc.
- Just copying around initial values

- Knowing final values of variables
- Completely decides safety condition

- Still related to real life
- Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.
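Since every value is a copied constant, the safety check reduces to comparing finite sets. In the sel/main example, b may end as 1 (through r = f) or 3, and c as 2 or 4, so assert(b > c) can fail (b = 1, c = 2). The sketch below is hypothetical and assumes the choices producing vi and vj are independent, as the two `if (*)` statements are here; correlated choices would need path information.

```c
#include <assert.h>
#include <stdbool.h>

/* True if vi > vj for every pair of possible final values, assuming the
   value sets are exact and every pair is realizable on some run. */
static bool assert_gt_always_holds(const int *vi_vals, int n_i,
                                   const int *vj_vals, int n_j) {
    assert(n_i > 0 && n_j > 0);
    for (int i = 0; i < n_i; i++)
        for (int j = 0; j < n_j; j++)
            if (!(vi_vals[i] > vj_vals[j])) return false; /* a violating pair */
    return true;
}
```

With b ∈ {1, 3} and c ∈ {2, 4} the function returns false; raising b's possible values to {5, 6} would make it return true.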

Generated predicates cover between two-thirds and all of the necessary predicates. However, since SLAM must iterate once to generate 3-7 missing predicates, the net performance increase is more than linear.

Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).

- Complex interprocedural analyses can benefit from inexpensive value-flow analysis
- VFG encodes value flow
- Constructed and queried quickly

- Prune the set of dataflow facts and program points considered
- Large net performance increase