String Analysis for Dynamic Field Access

Esben Andreasen and Magnus Madsen

Department of Computer Science

Aarhus University

Motivation
• Static Analysis of JavaScript
• type analysis, bug finding or refactoring
• a key component is Points-To analysis
• Analysis of JavaScript is difficult:
• a flexible object model (prototype-based)
• dynamic field access
• coercions and eval
• non-standard scope rules
• ...

Today

Static Field Access: the field name is fixed in the program text.

Dynamic Field Access (DFA): v = o[e], where e is any expression.

v = o["p"]

v = o["p" + "q"]

v = o[???]

Fields that o[???] may read, where o is an array:

• Array: concat, every, filter, forEach, indexOf, join, lastIndexOf, length, map, pop, push, reduce, reduceRight, reverse, shift, slice, some, sort, splice, unshift
• Object: __defineGetter__, __defineSetter__, __lookupGetter__, __lookupSetter__, constructor, hasOwnProperty, (4 more)
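To make the cost of an unknown field name concrete, the sketch below enumerates every name a dynamic read on an array could resolve to. The helper name reachableFields is hypothetical and not part of the talk; it only walks the prototype chain with standard reflection calls.

```javascript
// Hypothetical helper: collect every field name visible on obj,
// including names inherited through the prototype chain.
function reachableFields(obj) {
  const names = new Set();
  for (let cur = obj; cur !== null; cur = Object.getPrototypeOf(cur)) {
    for (const name of Object.getOwnPropertyNames(cur)) {
      names.add(name);
    }
  }
  return names;
}

const o = [1, 2, 3];
const fields = reachableFields(o);

// A dynamic read with a computed name resolves exactly like a static one.
const viaStatic = o.push;
const viaDynamic = o["pu" + "sh"];
console.log(viaStatic === viaDynamic); // true

// Own, Array and Object fields are all candidates for o[???].
console.log(fields.has("length"), fields.has("push"), fields.has("hasOwnProperty"));
```

An analysis that knows nothing about the string in o[???] has to assume the read may return any of these fields.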

Dynamic Writes

o["p"] = v

o["p" + "q"] = v

o[???] = v

All fields of o may now point to v!

o[e1][e2] = v

where e1 may be e.g. "prototype" and e2 may be e.g. "toString".

All fields of Object may now point to v!
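The effect of a dynamic write with an unknown name can be sketched with a toy abstract object. Everything here (UNKNOWN, makeAbstractObject, abstractWrite) is a hypothetical illustration rather than the analysis from the talk, and it ignores that an unknown write could also create new fields.

```javascript
// Sentinel for "the analysis knows nothing about this string" (hypothetical).
const UNKNOWN = Symbol("unknown string");

// Abstract object: maps each known field name to the set of values it may hold.
function makeAbstractObject(fieldNames) {
  const fields = new Map();
  for (const name of fieldNames) fields.set(name, new Set());
  return fields;
}

// Weak update for o[key] = value.
function abstractWrite(fields, key, value) {
  if (key === UNKNOWN) {
    // Any field could be the target, so every field may now point to value.
    for (const targets of fields.values()) targets.add(value);
  } else {
    if (!fields.has(key)) fields.set(key, new Set());
    fields.get(key).add(value);
  }
}

// o["p"] = v touches only the field p.
const precise = makeAbstractObject(["p", "q", "toString"]);
abstractWrite(precise, "p", "v");

// o[???] = v pollutes every field.
const imprecise = makeAbstractObject(["p", "q", "toString"]);
abstractWrite(imprecise, UNKNOWN, "v");
```

After the second write, even toString may point to v, which is the kind of spurious flow the rest of the talk tries to avoid.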

Spurious Event Handlers

var elm = $("#button");

elm.onclick = function() {}

elm[???] = function() {}

The function is registered as all possible event handlers

Usage in Practice

Survey by Richards et al. [1]:

• 8.3% of all reads are dynamic
• 10.3% of all writes are dynamic

Dynamic field access is prevalent in libraries:

• jQuery, Mootools and Prototype: 300+ DFAs

[1]: Gregor Richards, Sylvain Lebresne, Brian Burg, and Jan Vitek. An Analysis of the Dynamic Behavior of JavaScript Programs. In PLDI, 2010.

Proposed Solution

What we need: A way to distinguish strings flowing into dynamic field accesses.

Solution: A set of light-weight lattices.

• focus on concatenate, equal and join
• compact and efficient
• ideally O(1) time and space
(Simple) Lattices
• Constant String (CS)

single concrete string, e.g. "push".

• String Set (SS)

set of k strings, e.g. "pop" and "push".

• Length Interval (LI)

min. and max. length, e.g. the interval [3; 4] for the strings "pop" and "push".
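The three simple lattices can be sketched by their join operations. This is a minimal illustration under assumptions: TOP is a hypothetical marker for "any string", the bound K = 3 is chosen arbitrarily, and bottom elements are left out for brevity.

```javascript
const TOP = Symbol("top"); // hypothetical marker for "any string"

// Constant String (CS): a single concrete string, or TOP.
function joinCS(a, b) {
  return a === b ? a : TOP;
}

// String Set (SS): a set of at most K strings, or TOP.
const K = 3; // assumed bound
function joinSS(a, b) {
  if (a === TOP || b === TOP) return TOP;
  const merged = new Set([...a, ...b]);
  return merged.size <= K ? merged : TOP;
}

// Length Interval (LI): a pair [min, max] of string lengths.
function abstractLI(s) {
  return [s.length, s.length];
}
function joinLI(a, b) {
  return [Math.min(a[0], b[0]), Math.max(a[1], b[1])];
}

const cs = joinCS("pop", "push"); // TOP: two different constants
const ss = joinSS(new Set(["pop"]), new Set(["push"])); // both strings kept
const li = joinLI(abstractLI("pop"), abstractLI("push")); // [3, 4]
```

Each join touches a bounded amount of data, which matches the goal of compact and efficient lattices.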

Character Inclusion (CI)

Sets of characters which may and must occur.

Example for the strings "pop" and "push": the characters h, o, p, s and u may occur, and the character p must occur.
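The character inclusion example can be reproduced with a short sketch. The function names abstractCI, joinCI and mayDenote are hypothetical, and the representation as two JavaScript sets is an assumption.

```javascript
// Abstract a single string: every character both may and must occur.
function abstractCI(s) {
  const chars = new Set(s);
  return { may: chars, must: new Set(chars) };
}

// Join: may-sets are unioned, must-sets are intersected.
function joinCI(a, b) {
  return {
    may: new Set([...a.may, ...b.may]),
    must: new Set([...a.must].filter((ch) => b.must.has(ch))),
  };
}

// Could the abstract value describe the concrete string name?
function mayDenote(ci, name) {
  const chars = new Set(name);
  const onlyAllowed = [...chars].every((ch) => ci.may.has(ch));
  const hasRequired = [...ci.must].every((ch) => chars.has(ch));
  return onlyAllowed && hasRequired;
}

const ci = joinCI(abstractCI("pop"), abstractCI("push"));
const sorted = (set) => [...set].sort().join("");

console.log(sorted(ci.may)); // hopsu
console.log(sorted(ci.must)); // p
console.log(mayDenote(ci, "map")); // false: m and a cannot occur
console.log(mayDenote(ci, "shop")); // true: a false positive of this lattice
```

The last line shows the price of the compact representation: "shop" is admitted although it was never one of the strings.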

Prefixes and Suffixes
• Prefix-Suffix Characters (PS)

first and last character, e.g. "pop" has first character p and last character p, and "push" has first character p and last character h.

• Prefix-Suffix Inclusion (PSI)

may and must sets of characters for the first and last character (like the character inclusion lattice).

Index Predicate

A boolean-valued predicate which may/must hold at each of a fixed number of leading string indices.

Examples:

• isUppercase (useful for e.g. "lastIndexOf")
• isUnderscore (useful for e.g. "__defineGetter__")
• isDigit (useful for array indices)
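An index predicate value can be pictured as two bit vectors, one for may and one for must. The sketch below is an assumption-laden illustration: the number of tracked indices (K = 4) and the names abstractIP and joinIP are hypothetical, and indices past the end of a string are treated as not satisfying the predicate.

```javascript
const K = 4; // assumed number of tracked leading indices

const isDigit = (ch) => ch >= "0" && ch <= "9";
const isUnderscore = (ch) => ch === "_";

// Abstract one string: bit i records whether the predicate holds at index i.
function abstractIP(s, predicate) {
  const bits = [];
  for (let i = 0; i < K; i++) {
    bits.push(i < s.length && predicate(s[i]));
  }
  return { may: bits, must: [...bits] };
}

// Join: may-bits are or-ed, must-bits are and-ed.
function joinIP(a, b) {
  return {
    may: a.may.map((bit, i) => bit || b.may[i]),
    must: a.must.map((bit, i) => bit && b.must[i]),
  };
}

const getter = abstractIP("__defineGetter__", isUnderscore);
const joined = joinIP(getter, abstractIP("_x", isUnderscore));
const index = abstractIP("42", isDigit);

console.log(getter.may); // [ true, true, false, false ]
console.log(joined.must); // [ true, false, false, false ]
console.log(index.may); // [ true, true, false, false ]
```

With isUnderscore, a value whose first must-bit is false can never name __defineGetter__, which is enough to separate it from ordinary field names.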
Index Predicate: Concatenation

[Figure: concatenation (+) of index predicate bit vectors, may-case]

String Hash
• Pick a distributive hash function h : String → U, where U is some universe of a fixed size.
• Distributive: the hash of a concatenation is determined by the hashes of its parts.
• The lattice is the powerset lattice of U.
• Length Hash (LH)
• Hashes the length instead of the string itself.
String Hash: Example

foo = (the quick brown fox)

4

33

29

40

13

• String Constant
• Prefix Suffix
• Character Inclusion
• Index Predicate
String Hash: Concatenation

Example: Let and

Assume a universe of size then:

Easy to compute in iterations.

Paper has solution in time.
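As an illustration of hashing and abstract concatenation, the sketch below assumes one particular distributive hash: the sum of character codes modulo the universe size. The hash function, the universe size of 64 and all names are assumptions made for this example; the slides do not preserve the function actually used. Concatenation is done with the naive pairwise loop, not the faster algorithm mentioned for the paper.

```javascript
const U_SIZE = 64; // assumed universe size

// Assumed hash: sum of UTF-16 code units modulo U_SIZE.
function hash(s) {
  let sum = 0;
  for (let i = 0; i < s.length; i++) {
    sum = (sum + s.charCodeAt(i)) % U_SIZE;
  }
  return sum;
}

// Distributivity for this hash: hash(s1 + s2) equals combine(hash(s1), hash(s2)).
function combine(h1, h2) {
  return (h1 + h2) % U_SIZE;
}

// Abstract a collection of strings as the set of their hashes.
function abstractSH(strings) {
  return new Set(strings.map(hash));
}

// Naive abstract concatenation: combine every pair of hash values.
function concatSH(a, b) {
  const out = new Set();
  for (const h1 of a) {
    for (const h2 of b) {
      out.add(combine(h1, h2));
    }
  }
  return out;
}

const left = abstractSH(["a", "b"]);
const right = abstractSH(["x"]);
const result = concatSH(left, right);

console.log(result.has(hash("ax")), result.has(hash("bx"))); // true true
console.log(result.has(hash("push"))); // false
```

Because the abstract value is a subset of a fixed universe, it can be stored as a bit vector of U_SIZE bits regardless of how many strings it summarises.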

Overview
• The hybrid lattice (H) is the (reduced) product of
• the string set lattice (SS)
• the character inclusion lattice (CI)
• the string hash lattice (SH)
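The idea of the product can be sketched as "a field name is possible only if every component admits it". The code below is a simplified illustration, not the lattice from the paper: it keeps only the may-part of CI, represents an overflowed string set as null, and assumes the same sum-of-character-codes hash with a universe of size 64. All names are hypothetical.

```javascript
const U_SIZE = 64; // assumed universe size

// Assumed hash: sum of UTF-16 code units modulo U_SIZE.
function hash(s) {
  let sum = 0;
  for (let i = 0; i < s.length; i++) {
    sum = (sum + s.charCodeAt(i)) % U_SIZE;
  }
  return sum;
}

// Abstract a list of strings in all three components at once.
function abstractH(strings, k) {
  return {
    ss: strings.length <= k ? new Set(strings) : null, // null: string set overflowed
    ciMay: new Set(strings.join("")), // characters that may occur
    sh: new Set(strings.map(hash)), // hash values that may occur
  };
}

// Component checks, then the product: all components must admit the name.
const admitsSS = (h, name) => h.ss === null || h.ss.has(name);
const admitsCI = (h, name) => [...name].every((ch) => h.ciMay.has(ch));
const admitsSH = (h, name) => h.sh.has(hash(name));
const admitsH = (h, name) => admitsSS(h, name) && admitsCI(h, name) && admitsSH(h, name);

// Four strings with a bound of two: the string set component gives up.
const h = abstractH(["pop", "push", "shift", "unshift"], 2);

console.log(admitsH(h, "push")); // true
console.log(admitsH(h, "map")); // false: rejected by character inclusion
console.log(admitsCI(h, "pops"), admitsH(h, "pops")); // true false: rejected by the hash
```

Each component rules out names the others cannot, so the combination stays useful after the string set has overflowed.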
Evaluation

Q1: How precise are the lattices, independent of any particular analysis, for reasoning about strings used in DFAs?

Q2: To what degree does a more precise string lattice, for DFAs, improve overall precision and performance of a static analysis?

Evaluation: Dynamic Analysis

We perform a record and replay of several popular JavaScript libraries:

Record:

• The history of every string flowing to a DFA.
• The field names of every receiver object at a DFA.

Replay:

• For every DFA merge the histories of strings and determine for each lattice if it has a false positive.

(example next slide)

Dynamic Analysis: Example

e = (c ? "a" : "b") + "x"

o = {"ax": 1, "bx": 2}

v = o[e];

[Figure: expression tree for e, with + applied to join("a", "b") and "x"]

What is the result of evaluating o[e] in the abstract?
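The replay step for this example can be sketched for two lattices. This is an illustration under assumptions: a false positive is taken to mean that the abstract value admits a receiver field which no recorded string named, the receiver field list includes the inherited name toString so that over-approximation is visible, and the string set is left unbounded. All names are hypothetical.

```javascript
const TOP = Symbol("top"); // hypothetical marker for "any string"

// Constant String lattice: join and concatenation.
const joinCS = (a, b) => (a === b ? a : TOP);
const concatCS = (a, b) => (a === TOP || b === TOP ? TOP : a + b);
const admitsCS = (abs, name) => abs === TOP || abs === name;

// String Set lattice (unbounded in this sketch): join and concatenation.
const joinSS = (a, b) => new Set([...a, ...b]);
function concatSS(a, b) {
  const out = new Set();
  for (const x of a) {
    for (const y of b) out.add(x + y);
  }
  return out;
}
const admitsSS = (abs, name) => abs.has(name);

// Recorded data for the DFA o[e] (assumed for this example).
const observed = new Set(["ax", "bx"]);
const receiverFields = ["ax", "bx", "toString"];

// Receiver fields admitted by the lattice but never actually accessed.
function falsePositives(admits) {
  return receiverFields.filter((f) => admits(f) && !observed.has(f));
}

// Replay e = join("a", "b") + "x" in both lattices.
const eCS = concatCS(joinCS("a", "b"), "x");
const eSS = concatSS(joinSS(new Set(["a"]), new Set(["b"])), new Set(["x"]));

const fpCS = falsePositives((f) => admitsCS(eCS, f));
const fpSS = falsePositives((f) => admitsSS(eSS, f));

console.log(fpCS); // [ 'toString' ]
console.log(fpSS); // []
```

The constant string lattice loses everything at the join, while the string set recovers exactly "ax" and "bx".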

Evaluation: Dynamic Analysis

[Chart: DFAs with zero false positives, per lattice; labels: Const. Str. (insufficient), Hybrid, Prefix/Suffix Incl.]

Evaluation: Static Analysis

Flow-sensitive dataflow analysis for JavaScript

• inter-procedural, context-insensitive
• instantiated with the constant and hybrid lattices

Benchmarks

• Mozilla Sunspider + Google Octane
• Various GitHub projects
Summary
• Dynamic field access is common in JavaScript.
• Simple constant propagation is insufficient for reasoning about dynamic field accesses.
• The proposed hybrid lattice improves precision and performance for 7 out of 10 benchmark programs.

Thank You!

Arrays

var a = [1, 2, 3];

var i = 0;

while (...) {

var x = a[i++];

}

String Patterns

"a|b|c|d".split("|")

Number- & Type String Lattices
• Number String (N)

A powerset lattice of the strings:

{Infinity, -Infinity, NaN, 0, 1, ...}

• Type String (T)

A powerset lattice of the strings:

{Boolean, Function, Object, String, Undefined}