Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Stephen Adams, Tom Ball, Manuvir Das

Sorin Lerner, Mark Seigle

Westley Weimer

Microsoft Research

University of Washington

UC Berkeley


Motivation

  • Static analysis for program verification

  • Complex dataflow analyses are popular

    • SLAM, ESP, BLAST, CQual, …

    • Flow-Sensitive

    • Interprocedural

    • Expensive!

  • Cut down on “data flow facts”

  • Without losing anything important


General Idea

  • If complex analysis is worse than O(N)

  • And you have a cheap analysis that

    • Is O(N)

    • Reduces N

  • Then composing them saves time
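
A worked instance of this argument, under assumptions that are not in the slides: the complex analysis is taken to be quadratic, the cheap analysis linear, and c_1, c_2 are hypothetical constant factors. N' is the number of facts left after the cheap analysis has pruned them.

```latex
\[
T_{\text{direct}}(N) = c_2 N^2,
\qquad
T_{\text{composed}}(N) = c_1 N + c_2 (N')^2,
\quad N' \le N .
\]
\[
\text{Composition wins when } c_1 N < c_2 \bigl(N^2 - (N')^2\bigr).
\]
\[
N = 1000,\; N' = 100:
\qquad
T_{\text{direct}} = 10^{6}\, c_2,
\qquad
T_{\text{composed}} = 10^{3}\, c_1 + 10^{4}\, c_2 .
\]
```

With these illustrative numbers the quadratic term shrinks by a factor of 100, so the linear pre-pass pays for itself unless c_1 is several hundred times larger than c_2.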


Value Flow Graph (VFG)

  • Variant of a points-to graph

  • Encodes the flow of values in the program

  • Conservative approximation

  • Lightweight, fast to compute and query

  • Early queries can safely reduce

    • data-flow facts considered

    • program points considered

  • Like slicing a program with respect to value flow


Computing a VFG

  • Use a subtyping-based pointer analysis

    • We used One-Level Flow [Das]

  • Process all assignments

    • Not just those involving pointers

  • Represent constant values explicitly

    • Put them in the graph

  • Label graph with source locations

    • Encodes program slices


Example Points-To Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: points-to graph for the code above, relating x and a. Legend: points-to edge; source “address” node (a); expression node (x).]

One Level Flow Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: one-level flow graph for the code above. Legend: flow edge; points-to edge; source “address” node (a); expression node (x).]

Value Flow Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: value flow graph for the code above. It adds a node for the constant 7 and labels edges with source lines (2, 3, and 2,3). Legend: flow edge; points-to edge; source “address” node (a); expression node (x).]


VFG Properties

  • Computed in almost-linear time

  • Get points-to sets from VFG in linear time

    • Backwards reachability via flow edges

    • Gather up all variables

  • Get value flow from VFG in linear time

    • Backwards reachability via flow edges

    • Follow points-to edges up one
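
Both queries start with the same step, backwards reachability over flow edges. The following is a minimal sketch of that step, assuming a simplified graph that stores only flow edges labeled with source lines. The names Vfg, FlowEdge, vfg_init, vfg_add_flow and vfg_backward_reach, and the node numbering, are hypothetical and not taken from the presentation's tools. This version rescans the whole edge list for every visited node, so it is not linear; per-node predecessor lists would be needed to match the linear bound stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16
#define MAX_EDGES 32

/* Hypothetical, simplified VFG: flow edges only. Each edge records the
   source line that caused it, which is what makes the graph usable as a
   slice. */
typedef struct { int src, dst, line; } FlowEdge;

typedef struct {
    int num_nodes;
    int num_edges;
    FlowEdge edges[MAX_EDGES];
} Vfg;

void vfg_init(Vfg *g, int num_nodes) {
    assert(num_nodes > 0 && num_nodes <= MAX_NODES);
    g->num_nodes = num_nodes;
    g->num_edges = 0;
}

/* Records "a value may flow from src to dst because of this line". */
void vfg_add_flow(Vfg *g, int src, int dst, int line) {
    assert(g->num_edges < MAX_EDGES);
    assert(src >= 0 && src < g->num_nodes);
    assert(dst >= 0 && dst < g->num_nodes);
    g->edges[g->num_edges++] = (FlowEdge){ src, dst, line };
}

/* Sets reached[n] for every node n whose value may flow into target
   (including target itself), by walking flow edges backwards. */
void vfg_backward_reach(const Vfg *g, int target, bool reached[MAX_NODES]) {
    int stack[MAX_NODES];   /* each node is pushed at most once */
    int top = 0;

    assert(target >= 0 && target < g->num_nodes);
    memset(reached, 0, MAX_NODES * sizeof(bool));
    reached[target] = true;
    stack[top++] = target;

    while (top > 0) {
        int n = stack[--top];
        for (int i = 0; i < g->num_edges; i++) {
            int s = g->edges[i].src;
            if (g->edges[i].dst == n && !reached[s]) {
                reached[s] = true;
                stack[top++] = s;
            }
        }
    }
}
```

For the three-line example, one possible hand encoding numbers the nodes x = 0, a = 1 and the constant 7 = 2, and adds a single flow edge from 7 to a labeled with line 3. Querying a then marks a and 7 as reached and leaves x unmarked.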


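VFG Query: Points-To of x

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: the value flow graph for the code above, illustrating the points-to query for x. Edges are labeled with source lines (2, 3, and 2,3). Legend: flow edge; points-to edge; source “address” node (a); expression node (x).]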

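VFG Query: Value Flow into a

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: the value flow graph for the code above, illustrating the value-flow query for a. Edges are labeled with source lines (2, 3, and 2,3). Legend: flow edge; points-to edge; source “address” node (a); expression node (x).]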

VFG Summary

  • Computed in almost-linear time

  • Queries complete in linear time

  • Approximates flow of values in program

  • Show two applications that benefit

    • ESP

    • SLAM


Application 1: ESP

  • Verification tool for large C++ programs

  • Tracks “typestate” of values

    • Encoded as Finite State Machine

    • Special Error state

  • Core: interprocedural data-flow engine

    • Flow sensitive: state at every point

  • Performed bottom-up on call graph

  • Requires function summaries


ESP Function Summaries

  • Consider stateful memory locations

  • Summarize function behavior for each loc

    • Reducing number of locs would be good!

    • But C has evil casts, so types cannot be used

  • Worst case set of locations:

    • All globals and formal parameters

    • Everything transitively reachable from there


Reduce Location Set

  • Location L needs to be considered in F if

    • Some exp E has its state changed in F

    • Value held by L at entry to F can flow into E

  • Assuming state-changing ops are known

  • Query VFG to find values that flow in


ESP Example

FILE *e, *f, *g, *h;
void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
}

Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }


ESP Example

FILE *e, *f, *g, *h;
void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
}

  • (1) Compute VFG

  • (2) Query value flow on *p

  • (3) Reduced locations to consider for foo() summary: { e, f }

  • (4) Reduce lines to consider for dataflow
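
The reduction in steps (2) and (3) can be sketched as a backwards walk from each state-changing expression, followed by a filter over the candidate locations. This is an illustration under stated assumptions, not ESP's implementation: the flow edges are encoded by hand instead of being computed by a pointer analysis, only the four globals are treated as candidates, and every identifier (FlowEdge, mark_sources, reduce_locations, foo_summary_locations, the LOC_ and EXPR_ names) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical node numbering for the foo() example. */
enum { LOC_E, LOC_F, LOC_G, LOC_H, VAR_A, EXPR_STAR_P };

typedef struct { int src, dst; } FlowEdge;

/* Marks every node whose value may flow into target. The reached array
   is shared across calls so several targets can be accumulated. */
void mark_sources(const FlowEdge *edges, int num_edges, int target,
                  bool reached[MAX_NODES]) {
    int stack[MAX_NODES];   /* each node is pushed at most once */
    int top = 0;

    assert(target >= 0 && target < MAX_NODES);
    if (reached[target]) return;
    reached[target] = true;
    stack[top++] = target;

    while (top > 0) {
        int n = stack[--top];
        for (int i = 0; i < num_edges; i++) {
            int s = edges[i].src;
            assert(s >= 0 && s < MAX_NODES);
            if (edges[i].dst == n && !reached[s]) {
                reached[s] = true;
                stack[top++] = s;
            }
        }
    }
}

/* Keeps a candidate location only if its value can flow into some
   expression whose state changes in the function. Returns how many
   candidates were kept. */
int reduce_locations(const FlowEdge *edges, int num_edges,
                     const int *changed, int num_changed,
                     const int *candidates, int num_candidates,
                     int *kept) {
    bool reached[MAX_NODES] = { false };
    int count = 0;

    for (int i = 0; i < num_changed; i++)
        mark_sources(edges, num_edges, changed[i], reached);
    for (int i = 0; i < num_candidates; i++)
        if (reached[candidates[i]])
            kept[count++] = candidates[i];
    return count;
}

/* Hand-encoded flow edges for foo(); an assumption, not analysis output. */
int foo_summary_locations(int kept[4]) {
    const FlowEdge edges[] = {
        { LOC_E, EXPR_STAR_P },   /* p = &e: *p may hold e's value */
        { LOC_F, EXPR_STAR_P },   /* p = &f: *p may hold f's value */
        { LOC_H, VAR_A },         /* a = (int)h                    */
    };
    const int changed[] = { EXPR_STAR_P };   /* *p = fopen(...) */
    const int candidates[] = { LOC_E, LOC_F, LOC_G, LOC_H };

    return reduce_locations(edges, 3, changed, 1, candidates, 4, kept);
}
```

Calling foo_summary_locations keeps two locations, LOC_E and LOC_F, which corresponds to the reduced set { e, f } in step (3). g is dropped because nothing connects it to *p, and h is dropped because it only flows into a, whose state never changes.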


ESP Results

  • FILE * output in GCC

    • 140 KLOC, 2149 functions, 66 files, 1068 globals

  • VFG Queries take 200 seconds

  • Reduce average number of locations per function summary from 1100 to <1

    • Median of 15 for functions with >0

  • Verification takes 15 minutes

    • Infeasible otherwise


Application 2: SLAM

  • Validates temporal safety properties

    • Boolean abstraction

    • Interprocedural dataflow analysis

    • Counterexample-driven refinement

  • Convert C program to Boolean program

  • Exhaustive dataflow analysis

    • No errors? Program is safe.

    • Real error? Program has a bug.

    • False error? Add predicates, repeat.


Boolean Programs

C Program        Boolean Program
int x,y;         bool p,q;
x = 5;           p = 1;
y = 6;           q = 1;
x = x * 2;       p = 0; q = 0;
y = y * 2;       q = 1;
assert(x<y)      assert(q)

Predicates (important!):
p means “x == 5”
q means “x < y”
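
One way to see why the Boolean program is a faithful abstraction here is to run both programs side by side and compare p and q against the predicates they stand for after each statement. The sketch below does that for this single example; the function name is hypothetical, and the check says nothing about how SLAM itself constructs Boolean programs.

```c
#include <assert.h>
#include <stdbool.h>

/* Runs the C program and the Boolean program in lockstep. Returns true
   if, after every statement, p equals (x == 5) and q equals (x < y)
   wherever both sides are defined. */
bool boolean_program_tracks_predicates(void) {
    int x, y;
    bool p, q;
    bool ok = true;

    x = 5;      p = 1;           /* p means "x == 5"; y is not set yet */
    ok = ok && (p == (x == 5));

    y = 6;      q = 1;           /* q means "x < y": 5 < 6 */
    ok = ok && (q == (x < y));

    x = x * 2;  p = 0; q = 0;    /* x is 10: not 5, and not below 6 */
    ok = ok && (p == (x == 5)) && (q == (x < y));

    y = y * 2;  q = 1;           /* y is 12: 10 < 12 */
    ok = ok && (p == (x == 5)) && (q == (x < y));

    assert(q);                   /* the Boolean program's assert(q) */
    return ok;
}
```

The function returns true, and the final assert(q) passes for the same reason assert(x<y) passes in the C program: x is 10 and y is 12 at that point.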


SLAM Predicates

  • Hard to come up with good predicates

  • Counterexample-driven refinement

    • Picks good predicates

    • Is very slow

  • Taking all possible predicates

    • Is even slower

  • Want “all the useful” predicates


Speeding Up SLAM

  • For a simple subset of C

    • Similar to “Copy Constants”

    • Use VFG to find a sufficient set of predicates

    • Provably sufficient for this subset

  • If this set fails to prove the real program

    • Fall back on counterexample-driven refinement


A Simple Language

s ::= vi = n                 // constants
    | vi = vj                // variable copy
    | if (*) s1 else s2      // condition ignored
    | vi = fun(vj, …)        // function call
    | return(vi)             // function return
    | assert(vi » vj)        // safety property


Predicate Discovery

  • High-level idea

    • Each flow edge in the VFG means “values may flow from X to Y”

    • Add predicates to see if they do

  • For each assert(vi » vj)

    • Consider the chain of values flowing to vi, vj

    • Add an equality predicate for each link

    • Use constants to resolve scoping


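SLAM Example

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a,b,c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: value flow graph for the code above, with nodes b, 3, r, f, a, 1 (values flowing to b) and 2, c, 4 (values flowing to c).]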


Predicates For “b”

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a,b,c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: chain of values flowing to b, with nodes b, 3, r, f, a, 1.]

Predicates:
b == r
r == 3
r == f
f == a
a == 1
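
The predicate list above can be reproduced by walking the flow edges backwards from b and emitting one equality per edge. The sketch below assumes the flow edges are already known and encodes them by hand. All names (FlowEdge, discover_predicates, sel_example_predicates) are hypothetical, and the scoping step that replaces out-of-scope predicates with constants is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_NODES 16
#define MAX_PREDS 32
#define PRED_LEN  32

typedef struct { int src, dst; } FlowEdge;

/* Walks flow edges backwards from target and writes one "dst == src"
   predicate for each edge found on the way. Returns the number of
   predicates written. */
int discover_predicates(const char *const *names,
                        const FlowEdge *edges, int num_edges,
                        int target, char preds[MAX_PREDS][PRED_LEN]) {
    bool visited[MAX_NODES] = { false };
    int stack[MAX_NODES];   /* each node is pushed at most once */
    int top = 0;
    int count = 0;

    assert(target >= 0 && target < MAX_NODES);
    visited[target] = true;
    stack[top++] = target;

    while (top > 0) {
        int n = stack[--top];
        for (int i = 0; i < num_edges; i++) {
            int s = edges[i].src;
            if (edges[i].dst != n) continue;
            assert(s >= 0 && s < MAX_NODES);
            assert(count < MAX_PREDS);
            snprintf(preds[count++], PRED_LEN, "%s == %s", names[n], names[s]);
            if (!visited[s]) {
                visited[s] = true;
                stack[top++] = s;
            }
        }
    }
    return count;
}

/* Hand-encoded chain for the sel()/main() example; an assumption, not
   the output of a pointer analysis. */
int sel_example_predicates(char preds[MAX_PREDS][PRED_LEN]) {
    enum { B, R, THREE, F, A, ONE };
    const char *const names[] = { "b", "r", "3", "f", "a", "1" };
    const FlowEdge edges[] = {
        { R, B },       /* b = sel(a) returns r */
        { THREE, R },   /* r = 3                */
        { F, R },       /* r = f                */
        { A, F },       /* argument a bound to f */
        { ONE, A },     /* a = 1                */
    };
    return discover_predicates(names, edges, 5, B, preds);
}
```

With this encoding the function writes five predicates in the order b == r, r == 3, r == f, f == a, a == 1. A fuller version would also have to handle predicates such as f == a, whose two variables live in different functions.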


Predicates For “b”

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a,b,c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: chain of values flowing to b, with nodes b, 3, r, f, a, 1.]

Predicates:
b == r
r == 3
r == f
f == a // no scope!
a == 1


Predicates For “b”

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a,b,c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: chain of values flowing to b, with nodes b, 3, r, f, a, 1.]

Predicates (original list on the left, list after using constants to resolve scoping on the right):
b == r                 b == r
r == 3                 r == 3
r == f                 r == f
f == a // no scope!    f == 1   f == 3
a == 1                 a == 1   a == 3


Why does this work?

  • Simple language

    • No arithmetic, etc.

    • Just copying around initial values

  • Knowing final values of variables

    • Completely decides safety condition

  • Still related to real life

    • Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.
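
Since the simple language only copies initial values around, each variable can end up holding only one of the constants that flow into it, and an assert can be checked against those candidates directly. The sketch below illustrates this for a greater-than assertion. It is an assumption-laden simplification: the function name is hypothetical, and checking every pair independently is conservative, because it ignores any correlation between the two variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if lhs > rhs holds for every pairing of a candidate final
   value of the left variable with a candidate final value of the right
   variable. */
bool assert_gt_holds(const int *lhs_vals, size_t num_lhs,
                     const int *rhs_vals, size_t num_rhs) {
    assert(lhs_vals != NULL && rhs_vals != NULL);
    for (size_t i = 0; i < num_lhs; i++)
        for (size_t j = 0; j < num_rhs; j++)
            if (!(lhs_vals[i] > rhs_vals[j]))
                return false;
    return true;
}
```

In the sel()/main() example the constants flowing to b are 1 and 3 and those flowing to c are 2 and 4, so the check returns false for assert(b > c): the pairing b = 1, c = 2 already violates it.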


Some SLAM Results

The generated predicates cover between two-thirds and all of the necessary predicates. Because each SLAM refinement iteration generates only 3-7 missing predicates, the net performance increase is more than linear.

Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).


Conclusions

  • Complex interprocedural analyses can benefit from inexpensive value-flow analysis

  • VFG encodes value flow

    • Constructed and queried quickly

  • Prune the set of dataflow facts and program points considered

  • Large net performance increase

