Verifying Dereference Safety via Expanding-Scope Analysis. Alexey Loginov (GrammaTech, Inc.) Joint work with: E. Yahav, S. Chandra, S. Fink (IBM TJ Watson) N. Rinetzky (Tel-Aviv University) M.G. Nanda (IBM IRL).
Why Null-Dereference Analysis? • Common problem • …or symptom of other problems • Null-dereference warning may help in identifying root cause • Relevant to all software • Specification is obvious (absence of NPE) • Requires no user interaction
Why Sound Null-Dereference Analysis? • Safety guarantees are important in some domains • Results can become an in-code specification, e.g., via JSR 305 • Annotations can help with code understanding • Annotations can simplify future analyses (e.g., after modifications) • Precise and efficient sound analysis is challenging • Lessons carry over to other static analyses
Example answers expected • class A { • final A a = new A(); • static main() { • B b = new B(); • initB(b); • a.foo(b); // okay • } • foo(B b) { • b.f.fun(); // okay • b.f.f.gun(); // null-deref. • } • static initB(B b) { • b.f = new F(); // okay • b.f.f = null; // okay • } • } • Interprocedural information is needed often • Allocations in callers (e.g., new B()) common • Allocations in callees (e.g., new F()) common
Common approaches • Most existing tools perform intraprocedural analysis • Have to make assumptions about callers/callees • Option 1: pessimistic assumptions about callers/callees • Result: a sea of false alarms
Results of pessimistic intraproc. analysis • class A { • final A a = new A(); • static main() { • B b = new B(); • initB(b); • a.foo(b); // null deref. • } • foo(B b) { • b.f.fun(); // two null derefs. • b.f.f.gun(); // null deref. • } • static initB(B b) { • b.f = new F(); // null deref. • b.f.f = null; // okay • } • } • Reports four false alarms • Only real error is on line 10
Common approaches • Most existing tools perform intraprocedural analysis • Have to make assumptions about callers/callees • Option 2: optimistic assumptions about callers/callees • Result: missing real errors (catching the most glaring ones)
Results of optimistic intraproc. analysis • class A { • final A a = new A(); • static main() { • B b = new B(); • initB(b); • a.foo(b); // okay • } • foo(B b) { • b.f.fun(); // okay • b.f.f.gun(); // okay • } • static initB(B b) { • b.f = new F(); // okay • b.f.f = null; // okay • } • } • Misses the real error on line 10
Common approaches • Most existing tools perform intraprocedural analysis • Have to make assumptions about callers/callees • Option 3: mostly optimistic assumptions • Detects inconsistencies in programmer’s beliefs • Test x == null: belief that x could be null before test • Dereference of x without a test: belief that x cannot be null • Allow analysis to dismiss assumptions contradicted by beliefs • Result: missing real errors, reporting safe dereferences as unsafe • Generally, few false alarms but many missed errors • Same result as option 2 (optimistic assumptions) in our example
Prospects for interprocedural analysis • Whole-program analysis cannot scale to large software • Majority of instructions are relevant to null-dereference analysis • Can’t prune down program to a small relevant subset • Need mechanism to break down a program’s complexity
Expanding-Scope Analysis • Holy Grail • Cost: INTRAprocedural analysis • Precision: INTERprocedural (whole-program) analysis • Staged approach • Analyze dereferences with limited interprocedural context • Verify dereferences with the least amount of context • Increase interprocedural context for harder cases • In simplest form • Start with local analysis (with pessimistic assumptions) • Verify some dereferences without considering context • Consider remaining dereferences with extra level of context • Verify some dereferences within a call subtree of immediate callers • … • We refer to individual analyses as Limited-Scope Analyses
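The staged loop described above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class ExpandingScopeSketch, the LimitedScopeAnalysis interface, and the TOY oracle are hypothetical names, and the oracle simply hard-codes how much caller context each dereference of the running example is assumed to need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of an expanding-scope driver. */
class ExpandingScopeSketch {

    /** A limited-scope analysis: can `deref` be proven safe with `scope` levels of caller context? */
    interface LimitedScopeAnalysis {
        boolean verifies(String deref, int scope);
    }

    /** Toy oracle (assumption): fixed context requirements for the running example. */
    static final LimitedScopeAnalysis TOY = (deref, scope) -> {
        if (deref.equals("b.f = new F()")) return scope >= 1; // b is allocated in the caller
        if (deref.equals("b.f.fun()")) return scope >= 1;     // b.f is set by a sibling callee
        return false;                                         // b.f.f.gun() is a real error
    };

    /** Tries scopes 0..maxScope in order; returns the dereferences that were never verified. */
    static Set<String> run(List<String> derefs, LimitedScopeAnalysis analysis, int maxScope) {
        Set<String> remaining = new LinkedHashSet<>(derefs);
        for (int scope = 0; scope <= maxScope && !remaining.isEmpty(); scope++) {
            List<String> verified = new ArrayList<>();
            for (String deref : remaining) {
                if (analysis.verifies(deref, scope)) {
                    verified.add(deref);
                }
            }
            // Only the harder cases are carried over to the next, more expensive scope.
            remaining.removeAll(verified);
        }
        return remaining;
    }

    public static void main(String[] args) {
        List<String> derefs = Arrays.asList("b.f = new F()", "b.f.fun()", "b.f.f.gun()");
        System.out.println(run(derefs, TOY, 2)); // prints [b.f.f.gun()]
    }
}
```

The point of the loop is that each dereference is charged only for the smallest scope that verifies it; whatever survives the final scope limit is reported as a warning.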
Expanding-Scope Analysis • [Figure: call graph of methods labeled f surrounding a dereference site f.foo()]
Expanding-Scope Analysis • [Figure: call graph of the running example] • main: B b = new B(); initB(b); a.foo(b); • initB: b.f = new F(); b.f.f = null; • foo: b.f.fun(); b.f.f.gun();
Abstract Domain • Product of three abstract domains • Abstract domain for may-alias analysis • Implementation: flow- & context-insensitive Andersen-style • Abstract domain for must-alias analysis • Implementation: demand-driven (based on def-use chains) • Set APnn of non-null access paths • Access paths denote l-value expressions: • (VarId | StaticFieldId).InstanceFieldId* • Finiteness of domain guaranteed by (parameterized) bounds on • Size of APnn • Maximal length of access paths in APnn • Only the final component (set of non-null access paths APnn) changes
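A bounded set of non-null access paths might look like the sketch below. It is only an illustration under stated assumptions: the class name BoundedNonNullPaths and its methods are hypothetical, path length is counted here as the number of components including the root (the slides do not say how length is measured), and the may-alias and must-alias components of the product are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a bounded set of non-null access paths (the APnn component only). */
class BoundedNonNullPaths {
    private final int maxPaths;   // bound on the size of the set
    private final int maxLength;  // bound on components per path (assumption: root included)
    private final Set<List<String>> paths = new LinkedHashSet<>();

    BoundedNonNullPaths(int maxPaths, int maxLength) {
        this.maxPaths = maxPaths;
        this.maxLength = maxLength;
    }

    /**
     * Records a path such as ("b", "f") as non-null if the bounds allow it.
     * Refusing to record is safe: a path absent from the set is treated as possibly null.
     */
    boolean add(String... components) {
        if (components.length == 0 || components.length > maxLength) {
            return false;
        }
        List<String> path = new ArrayList<>(Arrays.asList(components));
        if (paths.contains(path)) {
            return true;
        }
        if (paths.size() >= maxPaths) {
            return false;
        }
        return paths.add(path);
    }

    boolean isNonNull(String... components) {
        return paths.contains(Arrays.asList(components));
    }

    /** At a control-flow merge, a path stays non-null only if it is non-null on both branches. */
    void joinWith(BoundedNonNullPaths other) {
        paths.retainAll(other.paths);
    }

    public static void main(String[] args) {
        BoundedNonNullPaths nn = new BoundedNonNullPaths(2, 2);
        System.out.println(nn.add("b"));           // true
        System.out.println(nn.add("b", "f"));      // true
        System.out.println(nn.add("b", "f", "f")); // false: path too long
        System.out.println(nn.add("a"));           // false: set is full
    }
}
```

Both bounds only ever cause facts to be dropped, which is why they keep the domain finite without affecting soundness; the cost is precision on long field chains.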
Transfer Functions (statements) Let = InstanceFieldId* (sequences of instance fields)
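The slide's table of transfer functions is not reproduced in this transcript, so the following is only a plausible sketch of how a few statement forms could update a set of non-null access paths. Everything in it is an assumption: the class NonNullTransferSketch, the string encoding of paths, and the deliberately conservative treatment of field stores, which kills every path through the written field because this sketch has no alias information.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical transfer functions over a set of non-null access paths such as "b" or "b.f". */
class NonNullTransferSketch {

    /** Drops x and every path rooted at x (x.f, x.f.g, ...). */
    static void killRoot(Set<String> nn, String x) {
        nn.removeIf(p -> p.equals(x) || p.startsWith(x + "."));
    }

    /** Drops every path that traverses field f; without alias information any of them may change. */
    static void killField(Set<String> nn, String f) {
        nn.removeIf(p -> p.contains("." + f + ".") || p.endsWith("." + f));
    }

    /** x = new T(): x is non-null afterwards, and old facts about x.* no longer hold. */
    static void assignNew(Set<String> nn, String x) {
        killRoot(nn, x);
        nn.add(x);
    }

    /** x = null: nothing rooted at x is known to be non-null afterwards. */
    static void assignNull(Set<String> nn, String x) {
        killRoot(nn, x);
    }

    /** x = y.f: returns whether the dereference of y is verified. */
    static boolean loadField(Set<String> nn, String x, String y, String f) {
        boolean safe = nn.contains(y);
        boolean valueNonNull = nn.contains(y + "." + f);
        killRoot(nn, x);
        if (!x.equals(y)) {
            nn.add(y); // execution continues only if y was non-null
        }
        if (valueNonNull) {
            nn.add(x);
        }
        return safe;
    }

    /** x.f = v, with x a variable: returns whether the dereference of x is verified. */
    static boolean storeField(Set<String> nn, String x, String f, boolean valueNonNull) {
        boolean safe = nn.contains(x);
        killField(nn, f);
        nn.add(x); // execution continues only if x was non-null
        if (valueNonNull) {
            nn.add(x + "." + f);
        }
        return safe;
    }

    public static void main(String[] args) {
        Set<String> nn = new HashSet<>();
        assignNew(nn, "b");                                  // B b = new B();
        System.out.println(storeField(nn, "b", "f", true));  // b.f = new F();  prints true
        System.out.println(loadField(nn, "t", "b", "f"));    // t = b.f;        prints true
        System.out.println(storeField(nn, "t", "f", false)); // t.f = null;     prints true
        System.out.println(nn.contains("b.f"));              // false: killed conservatively
    }
}
```

The last line shows the price of ignoring aliasing: the store through t also discards the fact about b.f. A may-alias component in the product, as listed on the previous slide, is the kind of information that could limit such kills to paths that may actually be affected.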
Staged Analysis in SALSA (Scalable Analysis via Lazy Scope expAnsion) • Real OO applications (e.g., web applications) have wide call graphs • High scope limits are too expensive to analyze • New stages help stave off the need for high scope limits • Pruning • Verifies dereferences of (non-null) final and stationary fields • Special local (scope-0) analyses • Caller-guarantee analysis (top-down in call graph) • Propagates callers’ guarantees to callees • E.g., for references passed as arguments down deep call chains • Callee-guarantee analysis (bottom-up in call graph) • Propagates callees’ guarantees up to callers • E.g., for field initializations in deep initialization call chains
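One way to picture the caller-guarantee stage is the sketch below, which pushes "this argument is non-null" facts from callers to callee parameters. It is a simplified illustration, not SALSA's implementation: the class CallerGuaranteeSketch, the CallSite encoding, and the FRESH/UNKNOWN markers are hypothetical, and the call graph is assumed to be acyclic with call sites listed callers-first.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of top-down propagation of callers' non-null guarantees. */
class CallerGuaranteeSketch {
    static final int FRESH = -1;   // argument is known non-null at the call site (e.g., new B())
    static final int UNKNOWN = -2; // argument may be null

    /** A call site; args[i] is FRESH, UNKNOWN, or k >= 0 meaning "forwards the caller's parameter k". */
    static final class CallSite {
        final String caller;
        final String callee;
        final int[] args;

        CallSite(String caller, String callee, int... args) {
            this.caller = caller;
            this.callee = callee;
            this.args = args;
        }
    }

    /**
     * Parameter i of a method is guaranteed non-null only if every call site passes a
     * non-null argument. Assumes sites are ordered so that all calls to a method come
     * before any call made from it.
     */
    static Map<String, boolean[]> propagate(List<CallSite> sitesTopDown) {
        Map<String, boolean[]> guarantees = new LinkedHashMap<>();
        for (CallSite site : sitesTopDown) {
            boolean[] callerParams = guarantees.get(site.caller); // null for entry points
            boolean[] here = new boolean[site.args.length];
            for (int i = 0; i < site.args.length; i++) {
                int arg = site.args[i];
                if (arg == FRESH) {
                    here[i] = true;
                } else if (arg >= 0) {
                    here[i] = callerParams != null && arg < callerParams.length && callerParams[arg];
                }
                // UNKNOWN leaves here[i] false
            }
            boolean[] seen = guarantees.get(site.callee);
            if (seen == null) {
                guarantees.put(site.callee, here);
            } else {
                for (int i = 0; i < Math.min(seen.length, here.length); i++) {
                    seen[i] = seen[i] && here[i];
                }
            }
        }
        return guarantees;
    }

    public static void main(String[] args) {
        List<CallSite> sites = Arrays.asList(
            new CallSite("main", "foo", FRESH),
            new CallSite("main", "log", UNKNOWN),
            new CallSite("foo", "bar", 0),   // foo forwards its own parameter
            new CallSite("foo", "log", 0));
        Map<String, boolean[]> g = propagate(sites);
        System.out.println(g.get("bar")[0]); // true: guarantee travels main -> foo -> bar
        System.out.println(g.get("log")[0]); // false: one caller passes a possibly-null value
    }
}
```

A callee-guarantee pass would run in the opposite direction, summarizing which fields or return values a callee is guaranteed to leave non-null and handing those facts back to its callers.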
Staged Analysis in SALSA (Scalable Analysis via Lazy Scope expAnsion) • [Figure: pipeline of analysis stages] • pruning • limited-scope data-flow analyses • caller-guarantee • callee-guarantee • scope-1: subtrees of depth 1 from parents • scope-2: subtrees of depth 2 from grandparents • … • symbolic: splits remaining warnings into high priority and low priority
Steps of staged interproc. analysis • class A { • final A a = new A(); • static main() { • B b = new B(); • initB(b); // b.f ∈ APnn • a.foo(b); • } • foo(B b) { // b ∈ APnn • b.f.fun(); • b.f.f.gun(); • } • static initB(B b) { // b ∈ APnn • b.f = new F(); • b.f.f = null; • } • } • Pruning (final & stationary fields) • Limited-scope analysis • Scope-0 (local analysis) • Caller-guarantee (local) analysis • Callee-guarantee (local) analysis • Scope-1 analysis
Experimental results • 21 (mostly open-source) applications • ~3K-465K bytecodes; ~300-37K dereferences • Avg: ~90% of dereferences verified soundly and automatically • ~8% dismissed by Pruning • ~77% dismissed by caller-guarantee analysis • ~5% dismissed by remaining stages • Final scope limit: between 2 and 5 (chosen heuristically) • Diminishing returns after local analyses (caller-/callee-guarantee) • Higher scope limits useful in the absence of caller/callee guarantees • Max. access-path length: 2 for all but four applications • Higher access-path lengths had no effect for most applications • Helped C-like applications (direct field dereferences without getters)
Experimental results • Expected many false alarms due to simple abstract domain • Implemented heuristic symbolic path-validity checking • This phase selected ~20% as high-priority warnings • Surprisingly low incidence of false alarms due to path-correlation • Biggest domain shortcoming: not tracking access-path types • Causes unnecessarily high cost of verifying certain dereferences • Includes too many irrelevant code portions when verifying a dereference • Produces false alarms due to examining type-infeasible paths • Results are encouraging for the simplicity of the domain
Tool-User Interaction • The output includes suggested annotations • Ordered by the number of warnings guaranteed to be dismissed • Actual number would require an alternate abstract domain • Current annotation options • Field f is non-null • Parameter p or return value of method foo() is non-null • User may choose to accept some annotations • We studied annotations for 8 benchmarks with high warning counts • A few hours of effort on unfamiliar code • Result: 30% decrease in warning counts
Summary • Novel expanding-scope analysis • Applicable to multiple abstract domains • Scalable and precise null-dereference analysis • Staged analysis makes a simple abstract domain effective • Vision: improve programs’ specifications and robustness • Cleanse programs by examining warnings and suggested annotations • Check accepted annotations with assertions or symbolic techniques • Extend the program’s specification and analyzability via annotations