- 103 Views
- Uploaded on
- Presentation posted in: General

Pointer Analysis

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Pointer Analysis

G. Ramalingam

Microsoft Research, India

x = 3;

y = 4;

z = x + 5;

- x is always 3 here
- can replace x by 3
- and replace x+5 by 8
- and so on

x = 3;

*p = 4;

z = x + 5;

- Is x always 3 here?

p = &y;

x = 3;

*p = 4;

z = x + 5;

if (?)

p = &x;

else

p = &y;

x = 3;

*p = 4;

z = x + 5;

p = &x;

x = 3;

*p = 4;

z = x + 5;

pointers affect

most program analyses

x is always 3

x is always 4

x may be 3 or 4

(i.e., x is unknown in our lattice)

p = &y;

x = 3;

*p = 4;

z = x + 5;

if (?)

p = &x;

else

p = &y;

x = 3;

*p = 4;

z = x + 5;

p = &x;

x = 3;

*p = 4;

z = x + 5;

p always

points-to y

p always

points-to x

p may point-to x or y

- Determine the set of targets a pointer variable could point-to (at different points in the program)
- “p points-to x” == “*p has l-value x”
- targets could be variables or locations in the heap (dynamic memory allocation)
- p = &x;
- p = new Foo(); or p = malloc (…);

- must-point-to vs. may-point-to

Can *p denote the

same location as *q?

*q = 3;

*p = 4;

z = *q + 5;

what values can

this take?

- *p and *q are said to be aliases (in a given concrete state) if they represent the same location
- Alias analysis
- determine if a given pair of references could be aliases at a given program point
- *pmay-alias *q
- *pmust-alias*q

- Points-To Analysis
- may-point-to
- must-point-to

- Alias Analysis
- may-alias
- must-alias

p = &x;

q = &y;

if (?) {

q = p;

}

x = &a;

y = &b;

z = *q;

p = &x;

q = &y;

if (?) {

q = p;

}

x = &a;

y = &b;

z = *q;

x = &a;

y = &b;

if (?) {

p = &x;

} else {

p = &y;

}

*x = &c;

*p = &c;

x: a

x: {a,c}

x: a

y: {b,c}

y: b

y: b

p: {x,y}

p: {x,y}

p: {x,y}

a: c

a: c

a: null

How should we handle

this statement?

Weak update

Weak update

Strong update

- When is it correct to use a strong update? A weak update?
- Is this points-to analysis precise?
- What does it mean to say
- p must-point-to x at pgm point u
- p may-point-to x at pgm point u
- p must-not-point-to x at u
- p may-not-point-to x at u

- We must formally define what we want to compute before we can answer many such questions

- A static program analysis computes approximateinformation about the runtime behavior of a given program
- The set of valid programs is defined by the programming language syntax
- The runtime behavior of a given program is defined by the programming language semantics
- The analysis problem defines whatinformation is desired
- The analysis algorithm determines what approximation to make

- A program consists of
- a set of variables Var
- a directed graph (V,E,entry) with a distinguished entry vertex, with every edge labelled by a primitive statement

- A primitive statement is of the form
- x = null
- x = y
- x = *y
- x = &y;
- *x = y
- skip
(where x and y are variables in Var)

- Omitted (for now)
- Dynamic memory allocation
- Pointer arithmetic
- Structures and fields
- Procedures

1

Vars = {x,y,p,a,b,c}

x = &a;

y = &b;

if (?) {

p = &x;

} else {

p = &y;

}

*x = &c;

*p = &c;

x = &a

2

y = &b

3

p = &x

p = &y

4

5

skip

skip

6

*x = &c

7

*p = &c

8

- Operational semantics == an interpreter (defined mathematically)
- State
- Data-State ::= Var -> (Var U {null})
- PC ::= V (the vertex set of the CFG)
- Program-State ::= PC x Data-State

- Initial-state:
- (entry, \x. null)

Vars = {x,y,p,a,b,c}

Initial data-state

1

x: N, y:N, p:N, a:N, b:N, c:N

x = &a

2

Initial program-state

y = &b

x: N, y:N, p:N, a:N, b:N, c:N

<1, >

3

p = &x

p = &y

4

5

skip

skip

6

*x = &c

7

*p = &c

8

- Meaning of primitive statements
- CS[stmt] : Data-State -> Data-State

- CS[ x = y ] s = s[x s(y)]
- CS[ x = *y ] s = s[x s(s(y))]
- CS[ *x = y ] s = s[s(x) s(y)]
- CS[ x = null ] s = s[x null]
- CS[ x = &y ] s = s[x y]

must say what happens if null is dereferenced

- Meaning of program
- a transition relation on program-states
- Program-State X Program-State
- state1state2means that the execution of some edge in the program can transform state1 into state2

- Defining
- (u,s) (v,s’) iff the program contains a control-flow edge u->vlabelled with a statement stmt such that M[stmt]s = s’

- A sequence of states s1s2 … snis said to be an execution (of the program) iff
- s1is the Initial-State
- sisi+1for 1 <= I < n

- A state s is said to be a reachable stateiff there exists some execution s1s2 … snis such that sn= s.
- Define RS(u) = { s | (u,s) is reachable }

- A sequence of states s1s2 … snis said to be an execution (of the program) iff
- s1 is the Initial-State
- si| si+1 for 1 <= I < n

- A state s is said to be a reachable state iff there exists some execution s1s2 … snis such that sn= s.
- Define RS(u) = { s | (u,s) is reachable }

All of this formalism for this one definition

- Let u denote a vertex in the CFG
- Define IdealMustPT (u) to be
{ (p,x) | forall s in RS(u). s(p) == x }

- Define IdealMayPT (u) to be
{ (p,x) | exists s in RS(u). s(p) == x }

May Point-To Analysis

Compute R: V -> 2Vars’ such that

R(u) IdealMayPT(u)

(where Var’ = Var U {null})

For every vertex u in the CFG,

compute a set R(u) such that

R(u) { (p,x) | $sÎRS(u). s(p) == x }

- An algorithm is said to be correct if the solution R it computes satisfies
"uÎV. R(u) IdealMayPT(u)

- An algorithm is said to be precise if the solution R it computes satisfies
"uÎV. R(u)=IdealMayPT(u)

- An algorithm that computes a solution R1 is said to be more precise than one that computes a solution R2 if
"uÎV. R1(u) ÍR2(u)

Compute R: V -> 2Vars’ such that

R(u) IdealMayPT(u)

p = &x;

q = &y;

if (?) {

q = p;

}

x = &a;

y = &b;

z = *q;

- Is this algorithm correct?
- Is this algorithm precise?
- Let’s first completely and formally define the algorithm.

- Define semi-lattice of abstract-values
- AbsDataState ::= Var -> 2Var’
- f1 f2 = \x. (f1 (x) È f2 (x))
- bottom = \x.{}

- Define initial abstract-value
- InitialAbsState = \x. {null}

- Define transformers for primitive statements
- AS[stmt] : AbsDataState -> AbsDataState

- Let st(v,u) denote stmt on edge v->u
- Compute the least-fixed-point of the following “dataflow equations”
- x(entry) = InitialAbsState
- x(u) = v->u AS(st(v,u)) x(v)

x(v)

x(w)

v

w

st(v,u)

st(w,u)

u

x(u)

- Abstract transformers for primitive statements
- AS[stmt] : AbsDataState -> AbsDataState

- AS[ x = y ] s = s[x s(y)]
- AS[ x = null ] s = s[x {null}]
- AS[ x = &y ] s = s[x {y}]
- AS[ x = *y ] s = s[x s*(s(y))]
where s*({v1,…,vn}) = s(v1) È … È s(vn)

- AS[ *x = y ] s = ???

- We have a complete & formal definition of the problem.
- We have a complete & formal definition of a proposed solution.
- How do we reason about the correctness & precision of the proposed solution?

- Concrete Domain
- Concrete states: C
- Semantics: For every statement st,
- CS[st] : C -> C

- AbstractDomain
- Asemi-lattice(A, b)
- TransferFunctions
- For every statement st,
- AS[st] : A -> A

- For every statement st,

- Concrete (Collecting) Domain
- Asemi-lattice(2C, b)
- TransferFunctions
- For every statement st,
- CS*[st] : 2C -> 2C

- For every statement st,

a

g

2Data-State

2Var x Var’

a(Y) = { (p,x) | exists s in Y. s(p) == x }

MayPT(u)

a

Í

a

RS(u)

IdealMayPT(u)

2Data-State

2Var x Var’

IdealMayPT (u) = a ( RS(u) )

c is said to be correctly

approximated by a

iff

a(c) Ía

c1

correctly

approximated by

a1

f

c2

f#

correctly

approximated by

a2

C

A

concretization

g

c1

a1

f

abstraction

a

c2

f#

a2

requirement:

f#(a1) ≥a (f( g(a1))

C

A

- CS[stmt] : Data-State -> Data-State
- CS[ x = y ] s = s[x s(y)]
- CS[ x = *y ] s = s[x s(s(y))]
- CS[ *x = y ] s = s[s(x) s(y)]
- CS[ x = null ] s = s[x null]
- CS*[stmt] : 2Data-State -> 2Data-State
- CS*[st] X = { CS[st]s | s Î X }

- AS[stmt] : AbsDataState -> AbsDataState
- AS[ x = y ] s = s[x s(y)]
- AS[ x = null ] s = s[x {null}]
- AS[ x = *y ] s = s[x s*(s(y))]
where s*({v1,…,vn}) = s(v1) È … È s(vn)

- AS[ *x = y ] s = ???

x: &y

y: &x

z: &a

g

x: {&y}

y: {&x,&z}

z: {&a}

x: &y

y: &z

z: &a

f

f#

*y = &b;

*y = &b;

x: &b

y: &x

z: &a

x: {&y,&b}

y: {&x,&z}

z: {&a,&b}

x: &y

y: &z

z: &b

a

x: &y

y: &x

z: &a

g

x: {&y}

y: {&x,&z}

z: {&a}

x: &y

y: &z

z: &a

f

f#

*x = &b;

*x = &b;

x: &y

y: &b

z: &a

x: {&y}

y: {&b}

z: {&a}

x: &y

y: &b

z: &a

a

- Transformer for “*p = q”