
Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Stephen Adams, Tom Ball, Manuvir Das (Microsoft Research)

Sorin Lerner, Mark Seigle (University of Washington)

Westley Weimer (UC Berkeley)


Motivation

  • Static analysis for program verification

  • Complex dataflow analyses are popular

    • SLAM, ESP, BLAST, CQual, …

    • Flow-Sensitive

    • Interprocedural

    • Expensive!

  • Cut down on “data flow facts”

  • Without losing anything important


General Idea

  • If complex analysis is worse than O(N)

  • And you have a cheap analysis that

    • Is O(N)

    • Reduces N

  • Then composing them saves time
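
As a rough worked illustration of why composition pays off, the arithmetic below assumes a quadratic cost for the complex analysis, a linear cost $cN$ for the cheap one, and a tenfold reduction in $N$. All three are assumptions chosen to keep the numbers simple, not figures from the talk.

```latex
\begin{align*}
T_{\text{direct}}   &= N^2 \\
T_{\text{composed}} &= cN + (N')^2 = cN + \frac{N^2}{100}, \qquad N' = \frac{N}{10} \\
T_{\text{composed}} < T_{\text{direct}}
  &\iff cN < \frac{99}{100}\,N^2
   \iff N > \frac{100}{99}\,c
\end{align*}
```

Under these assumptions the composed analysis wins as soon as $N$ exceeds roughly the constant factor $c$ of the cheap analysis.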


Value Flow Graph (VFG)

  • Variant of a points-to graph

  • Encodes the flow of values in the program

  • Conservative approximation

  • Lightweight, fast to compute and query

  • Early queries can safely reduce

    • data-flow facts considered

    • program points considered

  • Like slicing a program with respect to value flow


Computing a VFG

  • Use a subtyping-based pointer analysis

    • We used One-Level Flow [Das]

  • Process all assignments

    • Not just those involving pointers

  • Represent constant values explicitly

    • Put them in the graph

  • Label graph with source locations

    • Encodes program slices


Example Points-To Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: points-to graph for this code, with a points-to edge from x to a. Legend: a is a source (“address”) node; x is an expression node.]


One-Level Flow Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: one-level flow graph for this code, containing flow edges as well as the points-to edge. Legend: a is a source (“address”) node; x is an expression node.]


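Value Flow Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: value flow graph for this code, with flow edges and points-to edges, a node for the constant 7, and edges labelled with source lines (2, 3, and “2,3”). Legend: a is a source (“address”) node; x is an expression node.]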


VFG Properties

  • Computed in almost-linear time

  • Get points-to sets from VFG in linear time

    • Backwards reachability via flow edges

    • Gather up all variables

  • Get value flow from VFG in linear time

    • Backwards reachability via flow edges

    • Follow points-to edges up one
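
The backwards-reachability queries listed above can be sketched in C as follows. This is a deliberately simplified model and not the One-Level Flow construction: it assumes pointer dereferences have already been resolved into direct flow edges, and the node names, the `may_flow` and `may_point_to` functions, and the edge table are all hypothetical. It also scans a flat edge list on every step, so it is not linear-time; adjacency lists would be needed for that.

```c
#include <assert.h>

/* Hypothetical, simplified value-flow model for:
 *   1: int a, *x;
 *   2: x = &a;
 *   3: *x = 7;
 * Assumption: dereferences are already resolved into direct flow edges. */
enum kind { KIND_VAR, KIND_ADDR, KIND_CONST };
enum { NODE_X, NODE_ADDR_A, NODE_A, NODE_7, N_NODES };

struct node { const char *name; enum kind kind; };
struct edge { int src, dst, line; };  /* a value may flow src -> dst at `line` */

static const struct node nodes[N_NODES] = {
    { "x",  KIND_VAR   },
    { "&a", KIND_ADDR  },
    { "a",  KIND_VAR   },
    { "7",  KIND_CONST },
};

static const struct edge edges[] = {
    { NODE_ADDR_A, NODE_X, 2 },  /* &a flows into x at line 2 */
    { NODE_7,      NODE_A, 3 },  /* 7 flows into a (through *x) at line 3 */
};
enum { N_EDGES = sizeof edges / sizeof edges[0] };

/* Returns 1 if a value held by `src` may flow into `dst`.
 * Worklist-based backwards reachability over flow edges, starting at dst. */
int may_flow(int src, int dst)
{
    int work[N_NODES];
    int visited[N_NODES] = { 0 };
    int top = 0;

    assert(src >= 0 && src < N_NODES && dst >= 0 && dst < N_NODES);
    visited[dst] = 1;
    work[top++] = dst;
    while (top > 0) {
        int cur = work[--top];
        for (int e = 0; e < N_EDGES; e++) {
            int from = edges[e].src;
            if (edges[e].dst == cur && !visited[from]) {
                if (from == src)
                    return 1;
                visited[from] = 1;      /* each node is pushed at most once */
                work[top++] = from;
            }
        }
    }
    return 0;
}

/* Points-to query: may pointer `ptr` hold the address represented by `addr`? */
int may_point_to(int ptr, int addr)
{
    return nodes[addr].kind == KIND_ADDR && may_flow(addr, ptr);
}
```

In this model `may_point_to(NODE_X, NODE_ADDR_A)` holds because of line 2, and `may_flow(NODE_7, NODE_A)` holds because of line 3.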


VFG Query: Points-To of x

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: the value flow graph for this code (constant node 7; edges labelled with source lines 2, 3, and “2,3”), used to answer the points-to query for x.]


VFG Query: Value Flow into a

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: the value flow graph for this code (constant node 7; edges labelled with source lines 2, 3, and “2,3”), used to answer the value-flow query for a.]


VFG Summary

  • Computed in almost-linear time

  • Queries complete in linear time

  • Approximates flow of values in program

  • We show two applications that benefit

    • ESP

    • SLAM


Application 1: ESP

  • Verification tool for large C++ programs

  • Tracks “typestate” of values

    • Encoded as Finite State Machine

    • Special Error state

  • Core: interprocedural data-flow engine

    • Flow sensitive: state at every point

  • Performed bottom-up on call graph

  • Requires function summaries
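
To make the "typestate encoded as a finite state machine with a special Error state" idea concrete, here is a minimal C sketch of such a machine for a file handle. The states, events, and transition table are illustrative assumptions (for example, that opening an already open handle is an error); they are not ESP's actual property specification.

```c
#include <assert.h>

/* Hypothetical typestate property for a file handle.
 * States and events are assumptions made for illustration. */
enum state { ST_CLOSED, ST_OPEN, ST_ERROR, N_STATES };
enum event { EV_OPEN, EV_PRINT, EV_CLOSE, N_EVENTS };

static const enum state transition[N_STATES][N_EVENTS] = {
    /*               EV_OPEN   EV_PRINT  EV_CLOSE  */
    /* ST_CLOSED */ { ST_OPEN,  ST_ERROR, ST_ERROR  },
    /* ST_OPEN   */ { ST_ERROR, ST_OPEN,  ST_CLOSED },
    /* ST_ERROR  */ { ST_ERROR, ST_ERROR, ST_ERROR  },  /* Error is absorbing */
};

/* One transition of the typestate machine. */
enum state step(enum state s, enum event e)
{
    assert(s < N_STATES && e < N_EVENTS);
    return transition[s][e];
}

/* Runs a sequence of events starting from ST_CLOSED; returns the final state. */
enum state run(const enum event *events, int n)
{
    enum state s = ST_CLOSED;
    for (int i = 0; i < n; i++)
        s = step(s, events[i]);
    return s;
}
```

A flow-sensitive analysis would track which of these states each value may be in at every program point and report any path that reaches `ST_ERROR`.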


ESP Function Summaries

  • Consider stateful memory locations

  • Summarize function behavior for each loc

    • Reducing number of locs would be good!

    • But C has evil casts, so types cannot be used

  • Worst case set of locations:

    • All globals and formal parameters

    • Everything transitively reachable from there


Reduce Location Set

  • Location L needs to be considered in F if

    • Some exp E has its state changed in F

    • Value held by L at entry to F can flow into E

  • Assuming state-changing ops are known

  • Query VFG to find values that flow in


ESP Example

FILE *e, *f, *g, *h;

void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
}

Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }


ESP Example

(Same code as the previous slide.)

  • (1) Compute VFG

  • (2) Query value flow on *p

  • (3) Reduced locations to consider for foo() summary: { e, f }

  • (4) Reduce lines to consider for dataflow
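
The steps above can be sketched in C as a union over the state-changing operations in a function. The sketch assumes the VFG value-flow query has already been answered and hard-codes its result for foo(); the bitmask encoding, `struct state_change`, and both functions are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

/* Hypothetical encoding of the foo() example: one bit per global. */
enum { LOC_E = 1 << 0, LOC_F = 1 << 1, LOC_G = 1 << 2, LOC_H = 1 << 3 };

/* A state-changing operation, plus the locations whose entry values may
 * flow into the expression it changes. In a real tool this set would come
 * from a VFG query; here it is hard-coded as an assumption. */
struct state_change { const char *expr; unsigned flows_in; };

static const struct state_change foo_changes[] = {
    { "*p = fopen(...)", LOC_E | LOC_F },  /* p = &e or p = &f */
};

/* Union of the query results over all state-changing operations. */
unsigned summary_locations(const struct state_change *changes, int n)
{
    unsigned locs = 0;
    for (int i = 0; i < n; i++)
        locs |= changes[i].flows_in;
    return locs;
}

/* Reduced location set for the foo() summary. */
unsigned foo_summary_locations(void)
{
    return summary_locations(foo_changes,
                             (int)(sizeof foo_changes / sizeof foo_changes[0]));
}
```

Note that h never enters the set: it is read and cast in foo(), but no state-changing operation is applied to it.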


ESP Results

  • FILE * output in GCC

    • 140 KLOC, 2149 functions, 66 files, 1068 globals

  • VFG Queries take 200 seconds

  • Reduce average number of locations per function summary from 1100 to <1

    • Median of 15 for functions with >0

  • Verification takes 15 minutes

    • Infeasible otherwise


Application 2: SLAM

  • Validates temporal safety properties

    • Boolean abstraction

    • Interprocedural dataflow analysis

    • Counterexample-driven refinement

  • Convert C program to Boolean program

  • Exhaustive dataflow analysis

    • No errors? Program is safe.

    • Real error? Program has a bug.

    • False error? Add predicates, repeat.


Boolean Programs

C Program          Boolean Program
int x,y;           bool p,q;
x = 5;             p = 1;
y = 6;             q = 1;
x = x * 2;         p = 0; q = 0;
y = y * 2;         q = 1;
assert(x<y)        assert(q)

Predicates (important!):
p means “x == 5”
q means “x < y”
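
As a small sanity check of this abstraction, the C sketch below runs the concrete program and the Boolean program in lockstep and checks that p and q agree with their predicates after every statement. The function names are hypothetical, and this only replays the single example from the slide; it is not how SLAM constructs or checks Boolean programs.

```c
#include <assert.h>

/* Returns 1 if Boolean variables p and q agree with the predicates
 * "x == 5" and "x < y" for the given concrete values. */
static int consistent(int x, int y, int p, int q)
{
    return p == (x == 5) && q == (x < y);
}

/* Replays the slide's C program and Boolean program side by side.
 * Returns 1 if the abstraction tracks the program and assert(q) holds. */
int abstraction_tracks_program(void)
{
    int x, y;       /* concrete program */
    int p, q;       /* Boolean program (0 or 1) */
    int ok = 1;

    x = 5;      p = 1;          /* q is not yet meaningful: y is unset */
    ok &= (p == (x == 5));
    y = 6;      q = 1;
    ok &= consistent(x, y, p, q);
    x = x * 2;  p = 0; q = 0;   /* x is now 10 */
    ok &= consistent(x, y, p, q);
    y = y * 2;  q = 1;          /* y is now 12 */
    ok &= consistent(x, y, p, q);

    return ok && q;             /* assert(q) in the Boolean program */
}
```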


SLAM Predicates

  • Hard to come up with good predicates

  • Counterexample-driven refinement

    • Picks good predicates

    • Is very slow

  • Taking all possible predicates

    • Is even slower

  • Want “all the useful” predicates


Speeding Up SLAM

  • For a simple subset of C

    • Similar to “Copy Constants”

    • Use VFG to find a sufficient set of predicates

    • Provably sufficient for this subset

  • If this set fails to prove the real program

    • Fall back on counterexample-driven refinement


A Simple Language

s ::= vi = n                // constants
    | vi = vj               // variable copy
    | if (*) s1 else s2     // condition ignored
    | vi = fun(vj, …)       // function call
    | return(vi)            // function return
    | assert(vi ≈ vj)       // safety property


Predicate Discovery

  • High-level idea

    • Each flow edge in the VFG means “values may flow from X to Y”

    • Add predicates to see if they do

  • For each assert(vi ≈ vj)

    • Consider the chain of values flowing to vi, vj

    • Add an equality predicate for each link

    • Use constants to resolve scoping


SLAM Example

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a,b,c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: value flow graph for this code. Flow edges: 1 → a → f → r → b; 3 → r; 2 → c; 4 → c.]


Predicates For “b”

(Code as in the SLAM Example slide.)

[Figure: value-flow chain into b: 1 → a → f → r → b, and 3 → r.]

Predicates:
b == r
r == 3
r == f
f == a
a == 1
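
The predicate list above can be produced mechanically by walking the flow edges backwards from the asserted variable and emitting one equality per edge. The C sketch below does this for the sel()/main() example; the edge table is transcribed by hand from the code, and `discover_predicates` and `count_predicates` are hypothetical names. It deliberately ignores scoping, so it still emits "f == a", which the talk goes on to replace with constant comparisons.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Flow edges for the sel()/main() example: a value may flow src -> dst.
 * Hand-transcribed from the code; a real tool would read them from the VFG. */
struct flow { const char *src, *dst; };

static const struct flow flows[] = {
    { "1", "a" }, { "a", "f" }, { "f", "r" }, { "3", "r" },
    { "r", "b" }, { "2", "c" }, { "4", "c" },
};
enum { N_FLOWS = sizeof flows / sizeof flows[0], PRED_LEN = 16 };

/* Writes one "dst == src" predicate into `out` for every flow edge on a
 * chain leading into `var`, and returns how many were written.
 * `out` must have room for N_FLOWS entries. Scoping is not handled. */
int discover_predicates(const char *var, char out[][PRED_LEN])
{
    const char *work[N_FLOWS + 1];   /* start node + one push per edge */
    int used[N_FLOWS] = { 0 };
    int top = 0, count = 0;

    work[top++] = var;
    while (top > 0) {
        const char *cur = work[--top];
        for (int i = 0; i < N_FLOWS; i++) {
            if (!used[i] && strcmp(flows[i].dst, cur) == 0) {
                used[i] = 1;         /* each edge yields one predicate */
                snprintf(out[count++], PRED_LEN, "%s == %s",
                         flows[i].dst, flows[i].src);
                work[top++] = flows[i].src;
            }
        }
    }
    return count;
}

/* Convenience wrapper: number of predicates discovered for `var`. */
int count_predicates(const char *var)
{
    char preds[N_FLOWS][PRED_LEN];
    return discover_predicates(var, preds);
}
```

For b this yields the same five predicates as the slide, in worklist order rather than slide order.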


Predicates For “b”

(Code and value-flow chain as in the previous slide.)

Predicates:
b == r
r == 3
r == f
f == a // no scope! (f is local to sel, a is local to main)
a == 1


Predicates For “b”

(Code and value-flow chain as in the previous slide.)

Predicates:

Before                  After using constants to resolve scope
b == r                  b == r
r == 3                  r == 3
r == f                  r == f
f == a // no scope!     f == 1, f == 3
a == 1                  a == 1, a == 3


Why does this work?

  • Simple language

    • No arithmetic, etc.

    • Just copying around initial values

  • Knowing final values of variables

    • Completely decides safety condition

  • Still related to real life

    • Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.


Some SLAM Results

The generated predicates make up between two-thirds and all of the necessary predicates. Because SLAM must run one refinement iteration for every 3-7 missing predicates, the net performance increase is more than linear.

Predicates can be specialized or simplified if the assert() condition uses a common relational operator (e.g., x==y, x<y, x==5).


Conclusions

  • Complex interprocedural analyses can benefit from inexpensive value-flow

  • VFG encodes value flow

    • Constructed and queried quickly

  • Prune the set of dataflow facts and program points considered

  • Large net performance increase

