Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis

Stephen Adams, Tom Ball, Manuvir Das

Sorin Lerner, Mark Seigle

Westley Weimer

Microsoft Research

University of Washington

UC Berkeley


Motivation

  • Static analysis for program verification

  • Complex dataflow analyses are popular

    • SLAM, ESP, BLAST, CQual, …

    • Flow-Sensitive

    • Interprocedural

    • Expensive!

  • Cut down on “data flow facts”

  • Without losing anything important


General Idea

  • If complex analysis is worse than O(N)

  • And you have a cheap analysis that

    • Is O(N)

    • Reduces N

  • Then composing them saves time
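
The composition argument can be made concrete with a small cost model. The model is an illustrative assumption, not a formula from the talk: suppose the complex analysis costs $aN^k$ with $k > 1$, the cheap analysis costs $cN$, and the cheap analysis shrinks the input from $N$ to $\rho N$ with $0 < \rho < 1$.

```latex
\begin{align*}
T_{\text{alone}}(N) &= a N^{k} \\
T_{\text{composed}}(N) &= c N + a (\rho N)^{k} \\
T_{\text{composed}}(N) < T_{\text{alone}}(N)
  &\iff c N < a \left(1 - \rho^{k}\right) N^{k} \\
  &\iff N^{k-1} > \frac{c}{a \left(1 - \rho^{k}\right)}
\end{align*}
```

Under these assumptions, composing the two analyses wins for every sufficiently large N, and the break-even size drops as the cheap analysis prunes more (smaller $\rho$).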


Value Flow Graph (VFG)

  • Variant of a points-to graph

  • Encodes the flow of values in the program

  • Conservative approximation

  • Lightweight, fast to compute and query

  • Early queries can safely reduce

    • data-flow facts considered

    • program points considered

  • Like slicing a program with respect to value flow


Computing a VFG

  • Use a subtyping-based pointer analysis

    • We used One-Level Flow [Das]

  • Process all assignments

    • Not just those involving pointers

  • Represent constant values explicitly

    • Put them in the graph

  • Label graph with source locations

    • Encodes program slices


Example Points-To Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: points-to graph for the code above, with a points-to edge from x to a. Legend: a is a source ("address") node; x is an expr node.]

One Level Flow Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: one-level flow graph for the code above. It shows flow edges in addition to the points-to edge from x to a. Legend: a is a source ("address") node; x is an expr node.]


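Value Flow Graph

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: value flow graph for the code above. It contains nodes for x, a, and the constant 7, connected by flow edges and points-to edges that are labeled with source lines (2, 3, and "2,3"). Legend: a is a source ("address") node; x is an expr node.]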


VFG Properties

  • Computed in almost-linear time

  • Get points-to sets from VFG in linear time

    • Backwards reachability via flow edges

    • Gather up all variables

  • Get value flow from VFG in linear time

    • Backwards reachability via flow edges

    • Follow points-to edges up one
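
The backwards-reachability step behind both queries can be sketched in C. This is a simplified model, not the authors' data structure: it stores only flow edges (points-to edges and source-location labels are left out), and the names `struct vfg`, `vfg_add_flow`, `vfg_flows_into`, and `build_example` are hypothetical. The encoding of the three-line example (the constant 7 flows into a because x may point to a) is an assumption about the figure, not a transcription of it.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16
#define MAX_EDGES 32

/* Flow edges are stored as per-node lists of incoming edges. */
struct vfg {
    int node_count;
    int edge_count;
    int edge_src[MAX_EDGES];  /* source node of each flow edge */
    int edge_next[MAX_EDGES]; /* next incoming edge of the same target, or -1 */
    int in_head[MAX_NODES];   /* first incoming edge of each node, or -1 */
};

void vfg_init(struct vfg *g, int node_count) {
    assert(node_count >= 0 && node_count <= MAX_NODES);
    g->node_count = node_count;
    g->edge_count = 0;
    for (int i = 0; i < node_count; i++)
        g->in_head[i] = -1;
}

/* Records that values may flow from src to dst. */
void vfg_add_flow(struct vfg *g, int src, int dst) {
    assert(g->edge_count < MAX_EDGES);
    assert(src >= 0 && src < g->node_count);
    assert(dst >= 0 && dst < g->node_count);
    int e = g->edge_count++;
    g->edge_src[e] = src;
    g->edge_next[e] = g->in_head[dst];
    g->in_head[dst] = e;
}

/* Marks every node whose value may flow into target (target included).
 * Each node is pushed at most once and each edge is examined at most
 * once, so the walk is linear in the size of the graph. */
void vfg_flows_into(const struct vfg *g, int target, bool reached[]) {
    int stack[MAX_NODES];
    int top = 0;
    assert(target >= 0 && target < g->node_count);
    for (int i = 0; i < g->node_count; i++)
        reached[i] = false;
    reached[target] = true;
    stack[top++] = target;
    while (top > 0) {
        int n = stack[--top];
        for (int e = g->in_head[n]; e != -1; e = g->edge_next[e]) {
            int s = g->edge_src[e];
            if (!reached[s]) {
                reached[s] = true;
                stack[top++] = s;
            }
        }
    }
}

enum { NODE_X, NODE_A, NODE_CONST_7, NODE_COUNT };

/* Assumed encoding of "x = &a; *x = 7;": the store through x lets the
 * constant 7 flow into a. */
void build_example(struct vfg *g) {
    vfg_init(g, NODE_COUNT);
    vfg_add_flow(g, NODE_CONST_7, NODE_A);
}
```

Calling `vfg_flows_into(&g, NODE_A, reached)` on the graph from `build_example` marks a and the constant 7 but not x. A fuller version would also follow points-to edges one level, as the last bullet describes.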


VFG Query: Points-To of x

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: the value flow graph for the code above (nodes x, a, and 7; edges labeled with source lines 2, 3, and "2,3"), shown for the points-to query on x.]


VFG Query: Value Flow into a

1: int a, *x;
2: x = &a;
3: *x = 7;

[Figure: the value flow graph for the code above (nodes x, a, and 7; edges labeled with source lines 2, 3, and "2,3"), shown for the value-flow query on a.]


VFG Summary

  • Computed in almost-linear time

  • Queries complete in linear time

  • Approximates flow of values in program

  • Show two applications that benefit

    • ESP

    • SLAM


Application 1: ESP

  • Verification tool for large C++ programs

  • Tracks “typestate” of values

    • Encoded as Finite State Machine

    • Special Error state

  • Core: interprocedural data-flow engine

    • Flow sensitive: state at every point

  • Performed bottom-up on call graph

  • Requires function summaries


ESP Function Summaries

  • Consider stateful memory locations

  • Summarize function behavior for each loc

    • Reducing number of locs would be good!

    • But C has evil casts, so types cannot be used

  • Worst case set of locations:

    • All globals and formal parameters

    • Everything transitively reachable from there


Reduce Location Set

  • Location L needs to be considered in F if

    • Some exp E has its state changed in F

    • Value held by L at entry to F can flow into E

  • Assuming state-changing ops are known

  • Query VFG to find values that flow in
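
The rule above can be sketched as a small C program over the foo() function from the ESP example. Everything here is an illustrative assumption rather than ESP's implementation: the graph is a hand-built bitmask of "may flow into" relations, only the four globals e, f, g, h are modeled as candidate locations (not *e, *f, and so on), and the names `flow_graph`, `flows_into`, `summary_locations`, and `build_foo_graph` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { LOC_E, LOC_F, LOC_G, LOC_H, LOC_A, EXPR_DEREF_P, NODE_COUNT };

/* preds[n] has bit m set when the value of node m may flow into node n. */
typedef struct {
    uint32_t preds[NODE_COUNT];
} flow_graph;

/* Returns the set of nodes whose value may flow into target,
 * including target itself (backwards reachability over flow edges). */
uint32_t flows_into(const flow_graph *g, int target) {
    assert(target >= 0 && target < NODE_COUNT);
    uint32_t reached = 0;
    uint32_t work = UINT32_C(1) << target;
    while (work != 0) {
        int n = 0;
        while (((work >> n) & 1u) == 0)
            n++;                            /* pick the lowest pending node */
        work &= ~(UINT32_C(1) << n);
        reached |= UINT32_C(1) << n;
        work |= g->preds[n] & ~reached;     /* queue unvisited predecessors */
    }
    return reached;
}

/* A candidate location is kept if its value may flow into some
 * expression whose state is changed in the function. */
uint32_t summary_locations(const flow_graph *g, uint32_t state_changing,
                           uint32_t candidates) {
    uint32_t needed = 0;
    for (int n = 0; n < NODE_COUNT; n++)
        if ((state_changing >> n) & 1u)
            needed |= flows_into(g, n);
    return needed & candidates;
}

/* Assumed flow relations for foo(): p may point to e or f, so the
 * values of e and f may flow into *p; h is only copied into a. */
flow_graph build_foo_graph(void) {
    flow_graph g = {{0}};
    g.preds[EXPR_DEREF_P] = (1u << LOC_E) | (1u << LOC_F);
    g.preds[LOC_A] = 1u << LOC_H;
    return g;
}
```

With `*p = fopen(…)` as the only state-changing operation, `summary_locations` keeps e and f and drops g and h, which matches the reduced set shown on the ESP example slide.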


ESP Example

FILE *e, *f, *g, *h;

void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
}

Locations to consider for foo() summary:
{ e, *e, f, *f, g, *g, h, *h }


ESP Example

FILE *e, *f, *g, *h;

void foo() {
    FILE **p;
    int a = (int)h;
    if (…) p = &e;
    else p = &f;
    *p = fopen(…);
}

  • (1) Compute VFG

  • (2) Query value flow on *p

  • (3) Reduced locations to consider for foo() summary: { e, f }

  • (4) Reduce lines to consider for dataflow


ESP Results

  • FILE * output in GCC

    • 140 KLOC, 2149 functions, 66 files, 1068 globals

  • VFG Queries take 200 seconds

  • Reduce average number of locations per function summary from 1100 to <1

    • Median of 15 among functions with more than 0 locations

  • Verification takes 15 minutes

    • Infeasible otherwise


Application 2: SLAM

  • Validates temporal safety properties

    • Boolean abstraction

    • Interprocedural dataflow analysis

    • Counterexample-driven refinement

  • Convert C program to Boolean program

  • Exhaustive dataflow analysis

    • No errors? Program is safe.

    • Real error? Program has a bug.

    • False error? Add predicates, repeat.


Boolean Programs

C Program          Boolean Program
int x,y;           bool p,q;
x = 5;             p = 1;
y = 6;             q = 1;
x = x * 2;         p = 0; q = 0;
y = y * 2;         q = 1;
assert(x<y)        assert(q)

Predicates (important!):
p means “x == 5”
q means “x < y”


SLAM Predicates

  • Hard to come up with good predicates

  • Counterexample-driven refinement

    • Picks good predicates

    • Is very slow

  • Taking all possible predicates

    • Is even slower

  • Want “all the useful” predicates


Speeding Up SLAM

  • For a simple subset of C

    • Similar to “Copy Constants”

    • Use VFG to find a sufficient set of predicates

    • Provably sufficient for this subset

  • If this set fails to prove the real program

    • Fall back on counterexample-driven refinement


A Simple Language

s ::= vi = n                 // constants
    | vi = vj                // variable copy
    | if (*) s1 else s2      // condition ignored
    | vi = fun(vj, …)        // function call
    | return(vi)             // function return
    | assert(vi » vj)        // safety property


Predicate Discovery

  • High-level idea

    • Each flow edge in the VFG means “values may flow from X to Y”

    • Add predicates to see if they do

  • For each assert(vi » vj)

    • Consider the chain of values flowing to vi, vj

    • Add an equality predicate for each link

    • Use constants to resolve scoping
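
The "one equality predicate per link" step can be sketched in C as follows. This is a minimal illustration, not SLAM's implementation: the flow edges are written by hand for the sel()/main() program used in the SLAM example, the names `flow_edge`, `pred_set`, and `discover_predicates` are hypothetical, and the final step (using constants to resolve scoping) is not modeled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PREDS 16
#define PRED_LEN 32

/* A flow edge: values may flow from src to dst. */
struct flow_edge {
    const char *src;
    const char *dst;
};

/* Hand-written flow edges for the sel()/main() example. */
const struct flow_edge example_edges[] = {
    {"r", "b"}, /* b = sel(a); ... return(r); */
    {"3", "r"}, /* r = 3;                     */
    {"f", "r"}, /* r = f;                     */
    {"a", "f"}, /* actual a bound to formal f */
    {"1", "a"}, /* a = 1;                     */
    {"2", "c"}, /* c = 2;                     */
    {"4", "c"}, /* c = 4;                     */
};
const int example_edge_count =
    (int)(sizeof example_edges / sizeof example_edges[0]);

struct pred_set {
    int count;
    char text[MAX_PREDS][PRED_LEN];
};

int pred_set_contains(const struct pred_set *ps, const char *pred) {
    for (int i = 0; i < ps->count; i++)
        if (strcmp(ps->text[i], pred) == 0)
            return 1;
    return 0;
}

/* Adds "dst == src" for every flow edge on a chain leading into var.
 * An edge whose predicate is already present is skipped, which also
 * stops the recursion on cyclic graphs. */
void discover_predicates(const struct flow_edge *edges, int edge_count,
                         const char *var, struct pred_set *out) {
    for (int i = 0; i < edge_count; i++) {
        char buf[PRED_LEN];
        if (strcmp(edges[i].dst, var) != 0)
            continue;
        snprintf(buf, sizeof buf, "%s == %s", edges[i].dst, edges[i].src);
        if (pred_set_contains(out, buf))
            continue;
        assert(out->count < MAX_PREDS);
        strcpy(out->text[out->count++], buf);
        discover_predicates(edges, edge_count, edges[i].src, out);
    }
}
```

For `assert(b > c)`, the sketch would be called once with `"b"` and once with `"c"`. Starting from an empty set, the call for `"b"` collects five predicates, one for each link in the chain of values flowing to b.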


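SLAM Example

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a,b,c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: value flow graph for the program above, with nodes b, 3, r, f, a, 1 (values flowing to b) and 2, c, 4 (values flowing to c).]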


Predicates For “b”

int sel(int f) {
    int r;
    if (*) r = f;
    else r = 3;
    return(r);
}

void main() {
    int a,b,c;
    a = 1;
    b = sel(a);
    if (*) c = 2;
    else c = 4;
    assert(b > c);
}

[Figure: the part of the value flow graph that flows into b, with nodes b, 3, r, f, a, 1.]

Predicates               With scoping resolved
b == r                   b == r
r == 3                   r == 3
r == f                   r == f
f == a   // no scope!    f == 1, f == 3
a == 1                   a == 1, a == 3


Why does this work?

  • Simple language

    • No arithmetic, etc.

    • Just copying around initial values

  • Knowing final values of variables

    • Completely decides safety condition

  • Still related to real life

    • Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.


Some SLAM Results

The generated predicates cover between two-thirds and all of the necessary predicates. Because SLAM must run one refinement iteration for every 3-7 missing predicates, the net performance increase is more than linear.

Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).


Conclusions

  • Complex interprocedural analyses can benefit from inexpensive value-flow analysis

  • VFG encodes value flow

    • Constructed and queried quickly

  • Prune the set of dataflow facts and program points considered

  • Large net performance increase

