Level by Level: Making Flow- and Context-Sensitive Pointer Analysis Scalable for Millions of Lines of Code
Hongtao Yu, Zhaoqing Zhang, Xiaobing Feng, Wei Huo
Institute of Computing Technology, Chinese Academy of Sciences
{ htyu, zqzhang, fxb, huowei }@ict.ac.cn
Jingling Xue
University of New South Wales
jingling@cse.unsw.edu.au
Outline • Introduction • Framework • Analyzing a Level • Experiments • Conclusion
Introduction • Motivation • Who needs flow- and context-sensitive (FSCS) pointer analysis? • Software checking tools • Program understanding • Parallelization tools • Hardware synthesis • Existing methods do not scale to large real programs • Goal: analyze millions of lines of C code
Improving scalability • For flow-sensitivity • Fewer iterations in dataflow analysis • Less space for points-to graphs • For context-sensitivity • Summary-based • Low cost to store summaries • Low cost to apply summaries
Idea • Level-by-level analysis • Analyze the pointers in decreasing order of their points-to levels • E.g., given int **q, *p, x; q has level 2, p has level 1, and x has level 0 • Fast flow-sensitive analysis on full-sparse SSA form • Fast and accurate context-sensitive analysis using full transfer functions
Contributions • Performs a full-sparse flow-sensitive pointer analysis using a flow-insensitive algorithm • Performs a context-sensitive pointer analysis efficiently with precise full transfer functions • Yields flow- and context-sensitive interprocedural may/must mod/ref information on a compact SSA form • Analyzes millions of lines of code in minutes, faster than state-of-the-art FSCS pointer analysis algorithms
Framework • Compute points-to levels • For each points-to level, from highest to lowest: • Bottom-up: evaluate transfer functions • Top-down: propagate points-to sets • Build the call graph incrementally • Figure 1. Level-by-level pointer analysis (LevPA).
Points-to level • Property 1. If a variable x may be pointed to by a pointer y, then ptl(x) ≤ ptl(y). • Property 2. If a variable y may be assigned to x, then ptl(x) = ptl(y). • Points-to levels are computed by a unification-based pointer analysis
Example • int obj, t; • main() { • L1: int **x, **y; • L2: int *a, *b, *c, *d, *e; • L3: x = &a; y = &b; • L4: foo(x, y); • L5: *b = 5; • L6: if ( … ) { x = &c; y = &e; } • L7: else { x = &d; y = &d; } • L8: c = &t; • L9: foo(x, y); • L10: *e = 10; } • void foo(int **p, int **q) { • L11: *p = *q; • L12: *q = &obj; • } • ptl(x, y, p, q) = 2 • ptl(a, b, c, d, e) = 1 • ptl(t, obj) = 0 • Analysis order: • first { x, y, p, q } • then { a, b, c, d, e } • last { t, obj }
Bottom-up analyze level 2 • void foo( int **p, int **q) { • L11: *p = *q; • L12: *q = &obj; } • main() { • L1: int **x, **y; • L2: int *a, *b, *c, *d, *e; • L3: x = &a; y = &b; • L4: foo(x, y); • L5: *b = 5; • L6: if ( … ) { x = &c; y = &e; } • L7: else { x = &d; y = &d; } • L8: c = &t; • L9: foo( x, y); • L10: *e = 10; }
Bottom-up analyze level 2 • void foo( int **p, int **q) { • L11: *p1 = *q1; • L12: *q1 = &obj; } • p1’s points-to set depends on formal-in p • q1’s points-to set depends on formal-in q • main() { • L1: int **x, **y; • L2: int *a, *b, *c, *d, *e; • L3: x = &a; y = &b; • L4: foo(x, y); • L5: *b = 5; • L6: if ( … ) { x = &c; y = &e; } • L7: else { x = &d; y = &d; } • L8: c = &t; • L9: foo( x, y); • L10: *e = 10; }
Bottom-up analyze level 2 • void foo( int **p, int **q) { • L11: *p1 = *q1; • L12: *q1 = &obj; } • p1’s points-to set depends on formal-in p • q1’s points-to set depends on formal-in q • main() { • L1: int **x, **y; • L2: int *a, *b, *c, *d, *e; • L3: x1 = &a; y1 = &b; • L4: foo(x1, y1); • L5: *b = 5; • L6: if ( … ) { x2 = &c; y2 = &e; } • L7: else { x3 = &d; y3 = &d; } • x4=ϕ (x2, x3); y4=ϕ (y2, y3) • L8: c = &t; • L9: foo( x4, y4); • L10: *e = 10; } • x1 → { a } • y1 → { b } • x2 → { c } • y2 → { e } • x3 → { d } • y3 → { d } • x4 → { c, d } • y4 → { e, d }
Full-sparse Analysis • Achieves flow-sensitivity with a flow-insensitive algorithm • Treats each SSA name as a distinct variable • Uses a set-constraint-based pointer analysis • Full-sparse • Saves time • Saves space
Top-down analyze level 2 • void foo( int **p, int **q) { • L11: *p = *q; • L12: *q = &obj; } • main: propagate points-to sets to each call site • main() { • L1: int **x, **y; • L2: int *a, *b, *c, *d, *e; • L3: x = &a; y = &b; • L4: foo(x, y); • L5: *b = 5; • L6: if ( … ) { x = &c; y = &e; } • L7: else { x = &d; y = &d; } • L8: c = &t; • L9: foo( x, y); • L10: *e = 10; } • L4: foo.p → { a }, foo.q → { b } • L9: foo.p → { c, d }, foo.q → { d, e } • Merged: foo.p → { a, c, d }, foo.q → { b, d, e }
Top-down analyze level 2 • void foo( int **p, int **q) { • L11: *p = *q; • L12: *q = &obj; } • foo: Expand pointer dereferences • void foo( int **p, int **q) { • μ(b, d, e) • L11: *p1 = *q1; • χ(a, c, d) • L12: *q1 = &obj; • χ(b, d, e) • } • main() { • L1: int **x, **y; • L2: int *a, *b, *c, *d, *e; • L3: x = &a; y = &b; • L4: foo(x, y); • L5: *b = 5; • L6: if ( … ) { x = &c; y = &e; } • L7: else { x = &d; y = &d; } • L8: c = &t; • L9: foo( x, y); • L10: *e = 10; } • Calling contexts are merged here
Context Condition • To be context-sensitive • Points-to relation cᵢ: • p ⟹ v (p → v): p must (may) point to v, where p is a formal parameter • Context condition ℂ(c1,…,ck) • A Boolean function over higher-level points-to relations • Context-sensitive μ and χ • μ(vᵢ, ℂ(c1,…,ck)) • vᵢ₊₁ = χ(vᵢ, M, ℂ(c1,…,ck)) • M ∈ {may, must} indicates a weak/strong update
Context-sensitive μ and χ void foo( int **p, int **q) { μ(b, q⟹b) μ(d,q→d) μ(e,q→e) L11: *p1 = *q1; a=χ(a , must, p⟹a) c=χ(c , may, p→c) d=χ(d , may, p→d) L12: *q1 = &obj; b=χ(b , must, q⟹b) d=χ(d , may, q→d) e=χ(e , may, q→e) }
Bottom-up analyze level 1 void foo( int **p, int **q) { μ(b1, q⟹b) μ(d1,q→d) μ(e1,q→e) L11: *p1 = *q1; a2=χ(a1 , must, p⟹a) c2=χ(c1 , may, p→c) d2=χ(d1 , may, p→d) L12: *q1 = &obj; b2=χ(b1 , must, q⟹b) d3=χ(d2 , may, q→d) e2=χ(e1 , may, q→e) }
Points-to Set • Local points-to set • Loc(p) = { <v, ℂ(c1,…,ck)> | ℂ(c1,…,ck) is a context condition } • p can point to v if and only if ℂ(c1,…,ck) holds • Computed explicitly during the bottom-up analysis • Dependence set • Dep(p) = { <q, ℂ(c1,…,ck)> | q is a formal-in parameter of level lev and ℂ(c1,…,ck) is a context condition } • Ptr(p) includes Ptr(q) if and only if ℂ(c1,…,ck) holds
Transfer function • Trans(proc, v) = < Loc(v), Dep(v), ℂ(c1,…,ck), M > • v is a formal-out parameter • ℂ(c1,…,ck) is a context condition: v can be modified at a call site invoking proc only if ℂ(c1,…,ck) holds at that call site • M ∈ {may, must} indicates a may/must mod effect • Trans(proc) • The set of all individual transfer functions Trans(proc, v)
Bottom-up analyze level 1 • Trans(foo, a) = < { }, { <b, q⟹b> , < d, q→d>, < e, q→e>} , p⟹a, must > void foo( int **p, int **q) { μ(b1, q⟹b) μ(d1, q→d) μ(e1, q→e) L11: *p1 = *q1; a2=χ(a1 , must, p⟹a) c2=χ(c1 , may, p→c) d2=χ(d1 , may, p→d) L12: *q1 = &obj; b2=χ(b1 , must, q⟹b) d3=χ(d2 , may, q→d) e2=χ(e1 , may, q→e) } • Trans(foo, c) = < { }, { <b, q⟹b> , < d, q→d>, < e, q→e>} , p→c, may > • Trans(foo, b) = < {< obj, q⟹b> }, { } , q⟹b, must > • Trans(foo, e) = < {< obj, q→e> }, { } , q→e, may > • Trans(foo, d) = < {< obj, q→d> }, { <b, p→d ∧ q⟹b> , < d, p→d>, < e, p→d ∧ q→e> } , p→d ∨ q→d, may >
Bottom-up analyze level 1 • int obj, t; • main() { • L1: int **x, **y; • L2: int *a, *b, *c, *d, *e; • L3: x1 = &a; y1 = &b; • μ(b1, true) • L4: foo(x1, y1); • a2=χ(a1, must, true) • b2=χ(b1, must, true) • L5: *b2 = 5; • L6: if ( … ) { x2 = &c; y2 = &e; } • L7: else { x3 = &d; y3 = &d; } • x4=ϕ (x2, x3) y4=ϕ (y2, y3) • L8: c1 = &t; • μ(d1, true) • μ(e1, true) • L9: foo(x4, y4); • c2=χ(c1, may, true) • d2=χ(d1, may, true) • e2=χ(e1, may, true) • L10: *e2 = 10; } • At L4, p ⟹ a and q ⟹ b hold • At L9, p → c, p → d, q → e, and q → d hold
BDD and context condition • Context conditions are implemented using BDDs • Compact representation • Efficient Boolean operations • Variable x1 represents p → a, x2 represents q → a, and x3 represents p → b • BDD for ℂ = (p → a ∧ q → a) ∨ p → b • [Figure: BDD for ℂ over variables x1, x2, x3] • If only p → b holds at a call site, we evaluate the restriction ℂ|x1=0, x2=0, x3=1 to see whether ℂ holds at that call site
Experiments • Analyzes millions of lines of code in minutes • Faster than state-of-the-art FSCS pointer analysis algorithms • Table 2. Performance (secs).
Conclusion • We present a scalable method for flow- and context-sensitive pointer analysis • Analyzes the pointers in a program level by level, in order of their points-to levels • Fast flow-sensitive analysis on full-sparse SSA form • Fast and accurate context-sensitive analysis using full transfer functions represented by BDDs • Analyzes millions of lines of C code in minutes, faster than state-of-the-art methods