Set-Based Analysis Jaeho Shin <netj@ropas.snu.ac.kr> 2004-11-01 ROPAS Show & Tell
Overview
• Treating program variables as sets of values
  • is simple and intuitive.
  • requires no abstract domain (if no further approximation is used).
• Ignores dependencies between
  • different variables.
  • different occurrences of the same variable.
  • the domain and codomain of functions.
• Set-based analysis (especially in [He1994])
  • makes no a priori requirement that sets be finitely presentable.
  • represents an upper bound on the accuracy of systems that ignore dependencies between variables.
Inter-Variable Dependencies
• {u ↦ 1, v ↦ 2}, {u ↦ 3, v ↦ 4}
• {x ↦ 1, ran(f) ↦ [1,1]}, {x ↦ 2, ran(f) ↦ [2,2]}
• {dom(g) ↦ 1, ran(g) ↦ 2}, {dom(g) ↦ 2, ran(g) ↦ 3}
Ignoring Inter-Variable Dependencies
• { u ↦ {1, 3}, v ↦ {2, 4} }
• { x ↦ {1, 2}, ran(f) ↦ {[1,1], [1,2], [2,1], [2,2]} }
• { dom(g) ↦ {1, 2}, ran(g) ↦ {2, 3} }
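The loss of precision on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names `collapse` and `concretize` are hypothetical, and environments are modeled as plain dictionaries.

```python
from itertools import product

def collapse(envs):
    """Merge a collection of environments into one set environment."""
    set_env = {}
    for env in envs:
        for var, value in env.items():
            set_env.setdefault(var, set()).add(value)
    return set_env

def concretize(set_env):
    """List every environment the set environment cannot rule out."""
    names = sorted(set_env)
    choices = [sorted(set_env[n]) for n in names]
    return [dict(zip(names, combo)) for combo in product(*choices)]

# The two environments from the first example above.
envs = [{"u": 1, "v": 2}, {"u": 3, "v": 4}]
set_env = collapse(envs)          # u maps to {1, 3}, v maps to {2, 4}
candidates = concretize(set_env)  # four environments, two of them spurious
```

After collapsing, the pairing of u = 1 with v = 2 is forgotten, so an environment such as {u ↦ 1, v ↦ 4} becomes indistinguishable from the real ones.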
Target Language
• An ML-like, simple call-by-value functional language.
Set-Based Operational Semantics • Approximates execution by collapsing all environments into one single set environment.
Set-Based Approximation
• Local safety conditions are needed for a safe approximation.
  • The set-based semantics defined here is non-deterministic and, without these conditions, may lead to an unsound approximation.
• The set-based approximation of a term e0 is the set of values derived from the minimal safe set environment Emin.
Algorithm for Computing sba(e0)
• Representation of values
  • Forgets the environment part of closures.
• The algorithm in [He1994] computes the representation of sba(e0) in basically two steps:
  • Construct set constraints from the given term.
  • Simplify the constructed set constraints.
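The first step, constructing constraints, might look roughly like the following for a tiny lambda calculus. This is a sketch in the spirit of the slide, not the construction from [He1994]: the tuple encoding of terms, the names `gen` and `constraints_of`, and the `X_`/`ran_` naming scheme are all assumptions, and bound variables are assumed to have unique names.

```python
import itertools

def gen(term, out, fresh):
    """Return the set variable for `term`, appending constraints to `out`.

    Each constraint is a pair (X, se) read as "X ⊇ se".
    """
    kind = term[0]
    if kind == "var":                      # x: reuse the variable's own set variable
        return "X_" + term[1]
    x = next(fresh)
    if kind == "const":                    # c: X ⊇ c
        out.append((x, ("const", term[1])))
    elif kind == "lam":                    # λp.body: X ⊇ λp and ran(λp) ⊇ X_body
        _, param, body = term
        body_var = gen(body, out, fresh)
        out.append((x, ("lam", param)))
        out.append(("ran_" + param, ("var", body_var)))
    elif kind == "app":                    # e1 e2: X ⊇ apply(X1, X2)
        f = gen(term[1], out, fresh)
        a = gen(term[2], out, fresh)
        out.append((x, ("apply", f, a)))
    else:
        raise ValueError("unknown term kind: %r" % (kind,))
    return x

def constraints_of(term):
    out = []
    fresh = ("X%d" % i for i in itertools.count())
    root = gen(term, out, fresh)
    return root, out

# The term (λx. x) 1
term = ("app", ("lam", "x", ("var", "x")), ("const", 1))
root, cs = constraints_of(term)
```

Each node of the term is visited once and contributes at most two constraints, so the output stays proportional to the size of the term.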
Set Variable Set Expression Set Constraint
Meaning of Constraints
• Interpretation I
  • maps set expressions to sets of set constraint values.
Correspondence of C with sba(e0)
• An interpretation I is a model of the conjunction of constraints C
  • if, for each constraint X ⊇ se, I(se) is defined and I(X) ⊇ I(se).
• Ordering interpretations by
  • I1 ⊇ I2 if I1(X) ⊇ I2(X) for all X,
  • there is a least model lm(C) of C.
• It can be proved that
  • if e0 ▷ (X, C) and Ilm = lm(C),
  • then Ilm(X) = ||sba(e0)||.
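To make "model" and "least model" concrete, here is a minimal sketch restricted to the two simplest forms of set expression, a constant and a set variable. The function names and the pair encoding of constraints are hypothetical, and the naive iteration is chosen for clarity rather than efficiency.

```python
def interpret(se, I):
    """I(se) for a constant ("const", c) or a set variable ("var", Y)."""
    tag, payload = se
    return {payload} if tag == "const" else set(I.get(payload, set()))

def is_model(I, constraints):
    """I is a model if I(X) ⊇ I(se) for every constraint X ⊇ se."""
    return all(I.get(x, set()) >= interpret(se, I) for x, se in constraints)

def least_model(constraints):
    """Grow the empty interpretation until every constraint holds."""
    I = {}
    changed = True
    while changed:
        changed = False
        for x, se in constraints:
            new = interpret(se, I) - I.get(x, set())
            if new:
                I.setdefault(x, set()).update(new)
                changed = True
    return I

# X ⊇ 1, Y ⊇ X, Y ⊇ 2
C = [("X", ("const", 1)), ("Y", ("var", "X")), ("Y", ("const", 2))]
lm = least_model(C)                       # X maps to {1}, Y maps to {1, 2}
bigger = {"X": {1, 5}, "Y": {1, 2, 5}}    # also a model, but not the least one
```

Both `lm` and `bigger` satisfy C, and `bigger` contains `lm` pointwise, which is the ordering used on the slide.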
Remarks on the Algorithm
• The simplification algorithm outputs the explicit form of C.
  • The explicit form contains only constraints with atomic expressions,
  • where an atomic expression is an abstraction or a constant with all subparts atomic.
  • The explicit form represents a regular grammar for the possible values.
• Time complexity is O(n^3).
  • Construction of constraints is linear in the size of e0.
  • At most O(n^2) new constraints can be added by the simplification.
  • Determining what other new constraints need to be added, when adding each new constraint, can be bounded by O(n).
• Space complexity is O(n^2).
• Also computes the least set environment safe w.r.t. e0.
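The cost argument above can be seen in a small worklist solver. This is a generic closure-analysis-style sketch under assumed encodings, not the simplification algorithm of [He1994]: constraints are pairs (X, se), a lambda is identified by its parameter name, and the `X_`/`ran_` variable names are hypothetical.

```python
from collections import defaultdict

def solve(constraints):
    sol = defaultdict(set)        # set variable -> atomic values found so far
    edges = defaultdict(set)      # Y -> {X : X ⊇ Y}
    applies = defaultdict(list)   # F -> [(X, A)] for each X ⊇ apply(F, A)
    work = []

    def add_value(x, v):
        if v not in sol[x]:
            sol[x].add(v)
            work.append((x, v))

    def add_edge(x, y):           # record X ⊇ Y and copy what Y already holds
        if x not in edges[y]:
            edges[y].add(x)
            for v in list(sol[y]):
                add_value(x, v)

    for x, se in constraints:
        if se[0] in ("const", "lam"):
            add_value(x, se)
        elif se[0] == "var":
            add_edge(x, se[1])
        elif se[0] == "apply":
            applies[se[1]].append((x, se[2]))

    while work:
        x, v = work.pop()
        for target in list(edges[x]):          # propagate along X ⊇ Y edges
            add_value(target, v)
        if v[0] == "lam":                      # a function reached an application
            param = v[1]
            for result, arg in applies[x]:
                add_edge("X_" + param, arg)        # parameter ⊇ argument
                add_edge(result, "ran_" + param)   # result ⊇ function body
    return sol

# Constraints for (λx. x) 1, with X0 standing for the whole term.
cs = [
    ("X1", ("lam", "x")),
    ("ran_x", ("var", "X_x")),
    ("X2", ("const", 1)),
    ("X0", ("apply", "X1", "X2")),
]
sol = solve(cs)
```

There are at most O(n^2) (variable, value) pairs and O(n^2) edges, and handling each one touches at most O(n) neighbours, which loosely mirrors the O(n^3) time and O(n^2) space bounds stated on the slide.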
Application: Finding Links in Web Pages
• Goal
  • Find all possible links (URLs) from a given web page written in HTML and JavaScript.
• Observation
  • URLs in HTML can be found trivially.
  • For JavaScript, the strings assigned to variables named *.href or *.src are the URLs.
• Solution
  • Transform the given web page into an intermediate representation.
  • Construct set constraints from the intermediate program.
  • Simplify the constraints.
  • Gather all strings that may be assigned to variables named *.href or *.src.
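The last two steps of the solution can be sketched on a toy intermediate representation. Everything here is an assumption made for illustration: the program is a flat list of assignments whose right-hand side is either a string literal or another variable, and the names `solve` and `links` are hypothetical. The real intermediate representation is not described in the slides.

```python
def solve(assignments):
    """Least set environment for assignments of the form (target, expr)."""
    env = {}
    changed = True
    while changed:
        changed = False
        for target, (tag, payload) in assignments:
            # ("str", s) contributes the literal; ("var", y) contributes y's set
            values = {payload} if tag == "str" else env.get(payload, set())
            new = values - env.get(target, set())
            if new:
                env.setdefault(target, set()).update(new)
                changed = True
    return env

def links(env):
    """Gather every string that may reach a variable named *.href or *.src."""
    found = set()
    for name, values in env.items():
        if name.endswith((".href", ".src")):
            found |= values
    return sorted(found)

program = [
    ("base", ("str", "http://example.org/a.html")),
    ("alt", ("str", "http://example.org/b.html")),
    ("target", ("var", "base")),
    ("target", ("var", "alt")),
    ("location.href", ("var", "target")),
    ("img.src", ("str", "logo.png")),
    ("title", ("str", "not a link")),
]
urls = links(solve(program))
```

Because `target` may hold either page, both URLs are reported for `location.href`, while the string assigned to `title` is ignored.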
Finding Links in Web Pages: Future Work
• Demand-driven analysis
  • To analyze only the variables named *.href or *.src
  • Using the idea in [ChYi2002]
• Increase precision
  • Process undeclared global variables and nested functions.
  • Distinguish different occurrences of the same variable.
  • Handle arithmetic in a more sophisticated way.
• Consider using regular expressions instead of strings with *'s for the final concrete output.
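As one possible reading of the last item, a result string in which * stands for an unknown part could be turned into a regular expression as follows. The function name `star_to_regex` is hypothetical, and the sketch assumes that every * in the input marks an unknown part, so a literal asterisk in a URL cannot be represented.

```python
import re

def star_to_regex(pattern):
    """Compile a string with '*' placeholders into a regular expression."""
    # Escape the known parts so characters such as '.' and '?' match literally.
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile(".*".join(parts), re.DOTALL)

rx = star_to_regex("http://example.org/*.html")
matches = rx.fullmatch("http://example.org/page.html") is not None
rejects = rx.fullmatch("http://example.org/page.htmlx") is None
```

Using `fullmatch` keeps the pattern anchored at both ends, so only complete URLs of the analyzed shape are accepted.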
References
• [He1994] Nevin Heintze, “Set-Based Analysis for ML Programs”, in Proceedings of the SIGPLAN Conference on Lisp and Functional Programming, 1994.
• [ChYi2002] Woongshik Choi and Kwang Yi, “Demand-driven Set-Based Analysis”, Tech. Memo. ROPAS-2002-18, Research On Program Analysis System, Korea Advanced Institute of Science and Technology, October 2002. http://ropas.kaist.ac.kr/memo