Symbolic Analysis in Program Variables: Understanding Affine Expressions and Induction Variables

Symbolic Analysis
• Symbolic analysis tracks the values of program variables symbolically, as expressions over input variables and other variables. The variables these expressions are written in terms of are called reference variables.
• Expressing several variables in terms of the same set of reference variables exposes useful relationships among them.

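An Example
    x = input();
    y = x - 1;
    z = y - 1;
    A[x] = 10;
    A[y] = 11;
    if (z > x)
        z = x;
• Taking x as the reference variable, y = x - 1 and z = x - 2.
• Since y = x - 1, &A[x] ≠ &A[y]: the two stores never write the same element.
• z > x is never true, so the if statement can be removed.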
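The reasoning in this example can be sketched in a few lines of Python. This is only an illustration, not a full analysis: the `(coeff, const)` pair representation and the helper names `sub_const` and `difference` are assumptions made for the sketch, with x as the single reference variable.

```python
# Each symbolic value is a pair (coeff, const) meaning coeff * x + const,
# where x (the program input) is the only reference variable.

def sub_const(value, k):
    """Symbolic value of `value - k`."""
    coeff, const = value
    return (coeff, const - k)

def difference(u, v):
    """Symbolic value of `u - v`."""
    return (u[0] - v[0], u[1] - v[1])

x = (1, 0)            # x = input();  x itself
y = sub_const(x, 1)   # y = x - 1;
z = sub_const(y, 1)   # z = y - 1;    so z = x - 2

# z - x has no x term left, so its sign is known at analysis time.
z_minus_x = difference(z, x)
branch_can_be_taken = not (z_minus_x[0] == 0 and z_minus_x[1] <= 0)

# x - y is a nonzero constant, so A[x] and A[y] are different elements.
x_minus_y = difference(x, y)
may_alias = not (x_minus_y[0] == 0 and x_minus_y[1] != 0)
```

Both flags come out false, which is what licenses removing the branch and treating the two array stores as independent.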

Abstract Domain
• Since we cannot create succinct, closed-form symbolic expressions for every value computed, we choose an abstract domain and approximate each computation with the most precise expression within that domain.
• Constant propagation: { constants, UNDEF, NAC }
• Symbolic analysis: { affine expressions, NAA }

Affine Expressions
• An expression is affine with respect to variables $v_1, v_2, \dots, v_n$ if it can be expressed as $c_0 + c_1 v_1 + \dots + c_n v_n$, where $c_0, c_1, \dots, c_n$ are constants.
• An affine expression is linear if $c_0$ is zero.
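One possible way to represent affine expressions in code is sketched below. The `Affine` class and its methods (`make`, `scale`, `is_linear`, `evaluate`) are hypothetical names chosen for this illustration; the sketch only shows that sums and constant multiples of affine expressions stay affine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Affine:
    """c0 + c1*v1 + ... + cn*vn with integer constants."""
    const: int = 0
    coeffs: tuple = ()   # sorted tuple of (variable, coefficient), zeros dropped

    @staticmethod
    def make(const=0, **coeffs):
        nonzero = tuple(sorted((v, c) for v, c in coeffs.items() if c != 0))
        return Affine(const, nonzero)

    def __add__(self, other):
        merged = dict(self.coeffs)
        for v, c in other.coeffs:
            merged[v] = merged.get(v, 0) + c
        return Affine.make(self.const + other.const, **merged)

    def scale(self, k):
        """Multiply the whole expression by the constant k."""
        return Affine.make(k * self.const, **{v: k * c for v, c in self.coeffs})

    def is_linear(self):
        """An affine expression is linear when c0 is zero."""
        return self.const == 0

    def evaluate(self, env):
        """Value of the expression for concrete variable values in env."""
        return self.const + sum(c * env[v] for v, c in self.coeffs)

# 3 + 2*v1 - v2
e = Affine.make(3, v1=2, v2=-1)
```

Products of two non-constant expressions are deliberately not supported, since they would leave the affine domain.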

Induction Variables
• An affine expression can also be written in terms of the count of iterations through the loop.
• Variables whose values can be expressed as $c_1 i + c_0$, where $i$ is the count of iterations through the closest enclosing loop, are known as induction variables.

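An Example
    for (m = 10; m < 20; m++) {
        x = m * 3;
        A[x] = 0;
    }
• With i counting iterations from 0, m = i + 10 and x = 30 + 3 * i, so both m and x are induction variables.
• The multiplication can be replaced by an addition:
    x = 27;
    for (m = 10; m < 20; m++) {
        x = x + 3;
        A[x] = 0;
    }
• With x used as a pointer, m is no longer needed:
    for (x = &A + 30; x <= &A + 57; x = x + 3) {
        *x = 0;
    }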
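As a quick sanity check, the three loop versions can be mirrored in Python and compared against the closed form x = 30 + 3 * i. The function names are made up for this sketch, and array offsets stand in for the pointer arithmetic of the last version.

```python
def indices_multiply():
    """Original loop: x = m * 3 on every iteration."""
    written = []
    for m in range(10, 20):
        x = m * 3
        written.append(x)
    return written

def indices_strength_reduced():
    """Multiplication replaced by a running addition."""
    x = 27
    written = []
    for m in range(10, 20):
        x = x + 3
        written.append(x)
    return written

def indices_pointer_style():
    """Loop over the offset itself: 30, 33, ..., 57."""
    return list(range(30, 58, 3))

def closed_form():
    """x = 30 + 3 * i for iteration counts i = 0, ..., 9."""
    return [30 + 3 * i for i in range(10)]
```

All four functions return the same ten offsets, so each rewrite touches exactly the elements of A that the original loop does.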

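Other Reference Variables
• If a variable is not a linear function of the reference variables already chosen, we have the option of treating its value as a reference for future operations.
    a = f();
    b = a + 10;
    c = a + 11;
• Here the value returned by f() is unknown, so a becomes a new reference variable. Both b and c are then affine in a, which shows that c = b + 1.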

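A Running Example
    a = 0;
    for (f = 100; f < 200; f++) {
        a = a + 1;
        b = 10 * a;
        c = 0;
        for (g = 10; g < 20; g++) {
            d = b + c;
            c = c + 1;
        }
    }
Flow graph, with the loop counters rewritten as i and j:
    B1: a = 0; i = 1;
    B2: a = a + 1; b = 10 * a; c = 0; j = 1;
    B3: d = b + c; c = c + 1; j = j + 1; if j <= 10 goto B3
    B4: i = i + 1; if i <= 100 goto B2
Regions: R5 is the inner loop (B3), R6 is the body of the outer loop (B2, R5, B4), R7 is the outer loop (R6), and R8 is the whole program (B1, R7).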

Data-Flow Values: Symbolic Maps
• The domain of data-flow values for symbolic analysis is symbolic maps: functions that map each variable in the program to a value.
• The value is either an affine function of reference variables or the special symbol NAA, which represents a non-affine expression.
• If there is only one variable, the bottom value of the semilattice is the map that sends the variable to NAA.
• The semilattice for n variables is the product of the individual semilattices.
• We use $m_{\text{NAA}}$ to denote the bottom of the semilattice, which maps all variables to NAA.

The Running Example
Values taken by the variables in the flow graph (blocks B1 to B4, regions R5 to R8):

    var   i = 1, j = 1, …, 10     1 ≤ i ≤ 100, j = 1, …, 10
    a     1                       i
    b     10                      10i
    c     1, …, 10                1, …, 10
    d     10, …, 19               10i, …, 10i + 9

The Running Example
    m          m(a)     m(b)       m(c)     m(d)
    IN[B1]     NAA      NAA        NAA      NAA
    OUT[B1]    0        NAA        NAA      NAA
    IN[B2]     i - 1    NAA        NAA      NAA
    OUT[B2]    i        10i        0        NAA
    IN[B3]     i        10i        j - 1    NAA
    OUT[B3]    i        10i        j        10i + j - 1
    IN[B4]     i        10i        j        10i + j - 1
    OUT[B4]    i - 1    10i - 10   j        10i + j - 11

The Running Example
Substituting the symbolic values into the program gives:
    a = 0;
    for (i = 1; i <= 100; i++) {
        a = i;
        b = 10 * i;
        c = 0;
        for (j = 1; j <= 10; j++) {
            d = 10 * i + j - 1;
            c = j;
        }
    }
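One way to gain confidence in these closed forms is to run both versions of the running example and compare the values after every inner iteration. The sketch below does that in Python; the function names and the `trace` list are assumptions introduced for the check.

```python
def original():
    """Run the original program; record (a, b, c, d) after each inner iteration."""
    trace = []
    a = 0
    for f in range(100, 200):
        a = a + 1
        b = 10 * a
        c = 0
        for g in range(10, 20):
            d = b + c
            c = c + 1
            trace.append((a, b, c, d))
    return trace

def rewritten():
    """Run the version in which every value is an affine function of i and j."""
    trace = []
    a = 0
    for i in range(1, 101):
        a = i
        b = 10 * i
        c = 0
        for j in range(1, 11):
            d = 10 * i + j - 1
            c = j
            trace.append((a, b, c, d))
    return trace
```

The two traces agree entry by entry, so each assignment in the rewritten program computes the same value as the statement it replaces.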

Transfer Functions
• The transfer functions in symbolic analysis send symbolic maps to symbolic maps.
• The transfer function of statement s, denoted $f_s$, is defined as follows:
• If s is not an assignment, then $f_s = I$, the identity function.
• If s is an assignment to variable x, then
$$f_s(m)(v) = \begin{cases} m(v) & \text{for all variables } v \neq x, \\ c_0 + c_1 m(y) + c_2 m(z) & \text{if } v = x \text{ and } x \text{ is assigned } c_0 + c_1 y + c_2 z, \\ \text{NAA} & \text{otherwise.} \end{cases}$$
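The definition above can be sketched as follows. This is a minimal illustration under stated assumptions: symbolic maps are Python dicts, an affine value is a dict from reference-variable names to coefficients with the key `"1"` holding the constant, and the helper names (`affine`, `combine`, `assign_affine`, `assign_other`) are invented for the sketch. An NAA operand with a nonzero coefficient is assumed to make the result NAA.

```python
NAA = "NAA"  # marker for "not an affine expression"

def affine(const=0, **coeffs):
    """Affine value const + sum(coeff * reference variable), stored as a dict."""
    value = {"1": const}
    value.update({name: k for name, k in coeffs.items() if k != 0})
    return value

def combine(c0, terms):
    """Return c0 + sum(c * value) over (c, value) in terms, or NAA."""
    result = {"1": c0}
    for c, value in terms:
        if c == 0:
            continue              # operand is not actually used
        if value == NAA:
            return NAA            # an NAA operand makes the result NAA
        for name, k in value.items():
            result[name] = result.get(name, 0) + c * k
    return {n: k for n, k in result.items() if n == "1" or k != 0}

def assign_affine(m, x, c0, terms):
    """f_s for the statement  x = c0 + sum(c * y)  over (c, y) in terms."""
    out = dict(m)                 # every variable other than x keeps m(v)
    out[x] = combine(c0, [(c, m[y]) for c, y in terms])
    return out

def assign_other(m, x):
    """f_s for an assignment whose right-hand side is not affine."""
    out = dict(m)
    out[x] = NAA
    return out

# IN[B2] from the running example: a = i - 1, everything else unknown.
in_b2 = {"a": affine(-1, i=1), "b": NAA, "c": NAA, "d": NAA}
m = assign_affine(in_b2, "a", 1, [(1, "a")])   # a = a + 1
m = assign_affine(m, "b", 0, [(10, "a")])      # b = 10 * a
out_b2 = assign_affine(m, "c", 0, [])          # c = 0
```

Applying the three statements of B2 in order turns the map (i - 1, NAA, NAA, NAA) into (i, 10i, 0, NAA), which is the OUT[B2] row of the running example.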

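Composition of Transfer Functions
• If $f_2(m)(v) = \text{NAA}$, then $(f_2 \circ f_1)(m)(v) = \text{NAA}$.
• If $f_2(m)(v) = c_0 + \sum_i c_i m(v_i)$, then
$$(f_2 \circ f_1)(m)(v) = \begin{cases} \text{NAA} & \text{if } f_1(m)(v_i) = \text{NAA for some } i \neq 0 \text{ with } c_i \neq 0, \\ c_0 + \sum_i c_i f_1(m)(v_i) & \text{otherwise.} \end{cases}$$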
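A small sketch of this composition rule follows. The encoding is an assumption made for the example: a transfer function is a dict that maps each variable v either to `NAA` or to a pair `(c0, {vi: ci})` standing for $c_0 + \sum_i c_i m(v_i)$. The names `substitute` and `compose` are hypothetical.

```python
NAA = None  # marker for a non-affine result

def substitute(expr, f1):
    """Rewrite c0 + sum(ci * m(vi)) by replacing each m(vi) with f1(m)(vi)."""
    if expr is NAA:
        return NAA
    c0, coeffs = expr
    const, terms = c0, {}
    for vi, ci in coeffs.items():
        if ci == 0:
            continue                  # unused operand, cannot poison the result
        inner = f1[vi]
        if inner is NAA:
            return NAA                # f1(m)(vi) = NAA with ci != 0
        inner_c0, inner_coeffs = inner
        const += ci * inner_c0
        for w, k in inner_coeffs.items():
            terms[w] = terms.get(w, 0) + ci * k
    return (const, {w: k for w, k in terms.items() if k != 0})

def compose(f2, f1):
    """Return f2 . f1: apply f1 first, then f2."""
    return {v: substitute(expr, f1) for v, expr in f2.items()}

# Transfer functions of blocks B2 and B3 in the running example.
f_b2 = {"a": (1, {"a": 1}), "b": (10, {"a": 10}), "c": (0, {}), "d": (0, {"d": 1})}
f_b3 = {"a": (0, {"a": 1}), "b": (0, {"b": 1}), "c": (1, {"c": 1}),
        "d": (0, {"b": 1, "c": 1})}

b3_after_b2 = compose(f_b3, f_b2)
```

Here `b3_after_b2` describes one pass through B2 followed by one pass through B3, expressed entirely in terms of the values on entry to B2.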

The Running Example
    f       f(m)(a)     f(m)(b)        f(m)(c)     f(m)(d)
    f_B1    0           m(b)           m(c)        m(d)
    f_B2    m(a) + 1    10m(a) + 10    0           m(d)
    f_B3    m(a)        m(b)           m(c) + 1    m(b) + m(c)
    f_B4    m(a)        m(b)           m(c)        m(d)

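Solutions to Data-Flow Problem
Superscripts give the outer-loop iteration i and the inner-loop iteration j at which a block is entered or left.
• $\text{OUT}[B_k] = f_{B_k}(\text{IN}[B_k])$, for all $B_k$
• $\text{OUT}[B_1] \geq \text{IN}^{1}[B_2]$
• $\text{OUT}^{i}[B_2] \geq \text{IN}^{i,1}[B_3]$, $1 \leq i \leq 100$
• $\text{OUT}^{i,j-1}[B_3] \geq \text{IN}^{i,j}[B_3]$, $1 \leq i \leq 100$, $2 \leq j \leq 10$
• $\text{OUT}^{i,10}[B_3] \geq \text{IN}^{i}[B_4]$, $1 \leq i \leq 100$
• $\text{OUT}^{i-1}[B_4] \geq \text{IN}^{i}[B_2]$, $2 \leq i \leq 100$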

Meet of Transfer Functions
• The meet of two transfer functions:
$$(f_1 \wedge f_2)(m)(v) = \begin{cases} f_1(m)(v) & \text{if } f_1(m)(v) = f_2(m)(v), \\ \text{NAA} & \text{otherwise.} \end{cases}$$
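In code, the meet is a variable-by-variable comparison. The sketch below assumes the same hypothetical encoding as a dict from variables to `NAA` or `(c0, {vi: ci})` pairs, and meets the identity function with the transfer function of B3 from the running example.

```python
NAA = None  # marker for a non-affine result

def meet(f1, f2):
    """Keep an expression only where both transfer functions agree on it."""
    return {v: f1[v] if f1[v] == f2[v] else NAA for v in f1}

identity = {v: (0, {v: 1}) for v in "abcd"}
f_b3 = {"a": (0, {"a": 1}), "b": (0, {"b": 1}), "c": (1, {"c": 1}),
        "d": (0, {"b": 1, "c": 1})}

# Zero passes through B3 (identity) versus one pass through B3.
merged = meet(identity, f_b3)
```

Only a and b survive the meet, because they are the variables on which both paths produce the same expression.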

Parameterized Function Compositions
• If $f(m)(x) = m(x) + c$, then $f^i(m)(x) = m(x) + ci$ for all $i \geq 0$; x is a basic induction variable.
• If $f(m)(x) = m(x)$, then $f^i(m)(x) = m(x)$ for all $i \geq 0$; x is a symbolic constant.
• If $f(m)(x) = c_0 + c_1 m(x_1) + \dots + c_n m(x_n)$, where each $x_k$ is either a basic induction variable or a symbolic constant, then $f^i(m)(x) = c_0 + c_1 f^{i-1}(m)(x_1) + \dots + c_n f^{i-1}(m)(x_n)$ for all $i \geq 1$, because $f^i = f \circ f^{i-1}$; x is an induction variable.
• In all other cases, $f^i(m)(x) = \text{NAA}$.

Parameterized Function Compositions
• The effect of executing a fixed number of iterations is obtained by replacing i above by that number.
• If the number of iterations is unknown, the value at the start of the last iteration is given by $f^*$:
$$f^*(m)(v) = \begin{cases} m(v) & \text{if } f(m)(v) = m(v), \\ \text{NAA} & \text{otherwise.} \end{cases}$$
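The closed forms for $f^i$ and $f^*$ can be sketched as below for a concrete iteration count. Assumptions of the sketch: transfer functions are dicts from variables to `NAA` or `(c0, {vi: ci})` pairs, the helper names `step`, `power`, and `star` are invented here, and a derived induction variable is expanded by applying f once to the values its operands hold after i - 1 iterations, which follows from $f^i = f \circ f^{i-1}$.

```python
NAA = None  # marker for a non-affine result

def step(f, x):
    """Return c if f(m)(x) = m(x) + c (c = 0 for a symbolic constant), else None."""
    expr = f[x]
    if expr is not NAA and expr[1] == {x: 1}:
        return expr[0]
    return None

def power(f, i):
    """Closed form of f^i for a concrete iteration count i >= 1."""
    result = {}
    for x, expr in f.items():
        c = step(f, x)
        if c is not None:
            # basic induction variable or symbolic constant: m(x) + c * i
            result[x] = (c * i, {x: 1})
        elif expr is not NAA and all(step(f, xk) is not None for xk in expr[1]):
            # derived induction variable: operands are taken after i - 1 iterations
            c0, coeffs = expr
            const = c0 + sum(ck * step(f, xk) * (i - 1) for xk, ck in coeffs.items())
            result[x] = (const, dict(coeffs))
        else:
            result[x] = NAA
    return result

def star(f):
    """f*: only variables that f leaves unchanged keep a known value."""
    return {x: f[x] if step(f, x) == 0 else NAA for x in f}

# Transfer function of block B3:  d = b + c; c = c + 1;
f_b3 = {"a": (0, {"a": 1}), "b": (0, {"b": 1}), "c": (1, {"c": 1}),
        "d": (0, {"b": 1, "c": 1})}
ten_iterations = power(f_b3, 10)
unknown_count = star(f_b3)
```

With a known trip count the result stays affine for every variable of B3, while `star` keeps only the loop-invariant variables a and b.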

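The Running Example
$$f^i_{B3}(m)(v) = \begin{cases} m(a) & \text{if } v = a, \\ m(b) & \text{if } v = b, \\ m(c) + i & \text{if } v = c, \\ m(b) + m(c) + i - 1 & \text{if } v = d \text{ and } i \geq 1. \end{cases}$$
• B3 assigns d = b + c before it increments c, so after i iterations d reflects only i - 1 increments of c.
$$f^*_{B3}(m)(v) = \begin{cases} m(a) & \text{if } v = a, \\ m(b) & \text{if } v = b, \\ \text{NAA} & \text{if } v = c, \\ \text{NAA} & \text{if } v = d. \end{cases}$$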

A Region-Based Algorithm
• The effect of execution from the start of the loop region R to the entry of the ith iteration:
$$f_{R,i,\text{IN}[S]} = \Bigl(\bigwedge_{B \in \text{pred}(S)} f_{S,\text{OUT}[B]}\Bigr)^{i-1}$$
• If the number of iterations of a region is known, replace i with the actual count.
• In the top-down pass, compute $f_{R,i,\text{IN}[B]}$.
• If $m(v) = \text{NAA}$, introduce a new reference variable t; all references to m(v) are replaced by t.

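The Running Example
• $f_{R5,j,\text{IN}[B3]} = f^{j-1}_{B3}$
• $f_{R5,j,\text{OUT}[B3]} = f^{j}_{B3}$
• $f_{R6,\text{IN}[B2]} = I$
• $f_{R6,\text{IN}[R5]} = f_{B2}$
• $f_{R6,\text{OUT}[B4]} = I \circ f_{R5,10,\text{OUT}[B3]} \circ f_{B2}$
• $f_{R7,i,\text{IN}[R6]} = f^{i-1}_{R6,\text{OUT}[B4]}$
• $f_{R7,i,\text{OUT}[B4]} = f^{i}_{R6,\text{OUT}[B4]}$
• $f_{R8,\text{IN}[B1]} = I$
• $f_{R8,\text{IN}[R7]} = f_{B1}$
• $f_{R8,\text{OUT}[B4]} = I \circ f_{R7,100,\text{OUT}[B4]} \circ f_{B1}$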
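The way these summaries are built from the inside out can be imitated with ordinary function composition. The sketch below is a deliberate simplification: its transfer functions act on concrete program states (dicts of integers) rather than on symbolic maps, and the trip counts 10 and 100 are plugged in directly. The names `compose`, `power`, and the `f_r*_out` variables are assumptions of the sketch.

```python
def compose(*fs):
    """compose(f3, f2, f1) applies f1 first, then f2, then f3."""
    def composed(state):
        for f in reversed(fs):
            state = f(state)
        return state
    return composed

def power(f, n):
    """f applied n times in a row."""
    return compose(*([f] * n))

def identity(s):
    return s

def f_b1(s):                      # a = 0;
    return {**s, "a": 0}

def f_b2(s):                      # a = a + 1; b = 10 * a; c = 0;
    a = s["a"] + 1
    return {**s, "a": a, "b": 10 * a, "c": 0}

def f_b3(s):                      # d = b + c; c = c + 1;
    return {**s, "d": s["b"] + s["c"], "c": s["c"] + 1}

f_b4 = identity                   # B4 only updates the loop counter i

f_r5_out = power(f_b3, 10)                    # f_{R5,10,OUT[B3]}
f_r6_out = compose(f_b4, f_r5_out, f_b2)      # f_{R6,OUT[B4]}
f_r7_out = power(f_r6_out, 100)               # f_{R7,100,OUT[B4]}
f_r8_out = compose(f_b4, f_r7_out, f_b1)      # f_{R8,OUT[B4]}

final = f_r8_out({"a": None, "b": None, "c": None, "d": None})
```

The symbolic algorithm performs the same nesting, but replaces the repeated application in `power` with the closed forms for $f^i$, so no loop is ever unrolled.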

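The Running Example
    f                   f(m)(a)         f(m)(b)         f(m)(c)         f(m)(d)
    f_{R5,j,IN[B3]}     m(a)            m(b)            m(c) + j - 1    NAA
    f_{R5,j,OUT[B3]}    m(a)            m(b)            m(c) + j        m(b) + m(c) + j - 1
    f_{R6,IN[B2]}       m(a)            m(b)            m(c)            m(d)
    f_{R6,IN[R5]}       m(a) + 1        10m(a) + 10     0               m(d)
    f_{R6,OUT[B4]}      m(a) + 1        10m(a) + 10     10              10m(a) + 19
    f_{R7,i,IN[R6]}     m(a) + i - 1    NAA             NAA             NAA
    f_{R7,i,OUT[B4]}    m(a) + i        10m(a) + 10i    10              10m(a) + 10i + 9
    f_{R8,IN[B1]}       m(a)            m(b)            m(c)            m(d)
    f_{R8,IN[R7]}       0               m(b)            m(c)            m(d)
    f_{R8,OUT[B4]}      100             1000            10              1009
The d entry of f_{R6,OUT[B4]} is m(b) + m(c) + 9 evaluated after B2, where m(b) = 10m(a) + 10 and m(c) = 0. The last row sets i = 100 and m(a) = 0 in f_{R7,i,OUT[B4]}.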

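The Running Example
• $\text{IN}[B1] = m_{\text{NAA}}$
• $\text{OUT}[B1] = f_{B1}(\text{IN}[B1])$
• $\text{IN}^{i}[B2] = f_{R7,i,\text{IN}[R6]}(\text{OUT}[B1])$
• $\text{OUT}^{i}[B2] = f_{B2}(\text{IN}^{i}[B2])$
• $\text{IN}^{i,j}[B3] = f_{R5,j,\text{IN}[B3]}(\text{OUT}^{i}[B2])$
• $\text{OUT}^{i,j}[B3] = f_{B3}(\text{IN}^{i,j}[B3])$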

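An Example
    for (i = 1; i < n; i++) {
        a = input();
        for (j = 1; j < 10; j++) {
            a = a - 1;
            b = j + a;
            a = a + 1;
        }
    }
• The value read by input() is not an affine function of the existing reference variables, so a new reference variable t is introduced for it. Every value in the inner loop can then be written in terms of t and j:
    for (i = 1; i < n; i++) {
        a = input();
        t = a;
        for (j = 1; j < 10; j++) {
            a = t - 1;
            b = t - 1 + j;
            a = t;
        }
    }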
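To check that introducing t preserves behavior, both loops can be run on the same inputs and compared. In this sketch the list `inputs` stands in for the successive results of input(), one per outer iteration; that list and the function names are assumptions made for the test harness.

```python
def original(inputs):
    """Loop as written: a is decremented, used, then restored."""
    trace = []
    for a in inputs:                  # a = input();
        for j in range(1, 10):
            a = a - 1
            b = j + a
            a = a + 1
            trace.append((a, b))
    return trace

def with_reference_variable(inputs):
    """Loop after the analysis: every value is affine in t and j."""
    trace = []
    for a in inputs:                  # a = input();
        t = a                         # new reference variable
        for j in range(1, 10):
            a = t - 1
            b = t - 1 + j
            a = t
            trace.append((a, b))
    return trace
```

Both functions produce identical traces of (a, b) pairs for any list of integers, which is the property the rewritten form relies on.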
