Static Source Code Analysis

Static Source Code Analysis CPSC 410

Static Analysis
• Want to determine execution properties of source code
  1. Without executing all possible test cases
  2. Without reverse-engineering it in our heads
• Execution properties
  • Properties that do not depend on the code's structure
  • Properties that are invariant under refactoring

Static Analysis Overview
• Provides
  • Automated abstract reasoning
• Uses
  • Dataflow analysis over control-flow graphs
  • Abstract program operations
  • Branch predicate analysis

Abstract Reasoning
• Asks questions about program variable properties without executing the program.
• Example: Does the value from input() affect the value of the variable x?
    y = input();
    z = y + 5;
    y = 6;
    x = z;
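The question in the example can be answered mechanically by tracking which variables depend on input() instead of tracking their values. The following is a minimal sketch for straight-line code only; the class `InputDependence` and its `Assign` helper are hypothetical names introduced for illustration, not part of the course material.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InputDependence {
    // One straight-line assignment: target = f(sources), possibly calling input().
    static final class Assign {
        final String target;
        final List<String> sources;
        final boolean readsInput;

        Assign(String target, List<String> sources, boolean readsInput) {
            this.target = target;
            this.sources = sources;
            this.readsInput = readsInput;
        }
    }

    // Returns the variables whose final value depends on input().
    static Set<String> dependsOnInput(List<Assign> program) {
        Set<String> dependent = new HashSet<>();
        for (Assign a : program) {
            boolean dep = a.readsInput;
            for (String s : a.sources) {
                if (dependent.contains(s)) dep = true;
            }
            if (dep) {
                dependent.add(a.target);
            } else {
                dependent.remove(a.target); // overwritten with an input-free value
            }
        }
        return dependent;
    }

    // The four statements from the slide, encoded by hand.
    static Set<String> slideExample() {
        return dependsOnInput(List.of(
            new Assign("y", List.of(), true),      // y = input();
            new Assign("z", List.of("y"), false),  // z = y + 5;
            new Assign("y", List.of(), false),     // y = 6;
            new Assign("x", List.of("z"), false)   // x = z;
        ));
    }

    public static void main(String[] args) {
        System.out.println(slideExample()); // contains x and z, but not y
    }
}
```

Under these assumptions x does depend on input(): the dependence flows through z before y is overwritten with the constant 6.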

Abstract Reasoning Examples
• If a program takes no negative inputs, has no negative constants, uses no subtraction or bit operators, …
  • It won’t have a negative output
• If a program initializes all variables when they are declared
  • It won’t have a null pointer exception
• A variable in an if-branch, when not assigned to in the branch
  • can’t have a value that contradicts the if-condition

Dataflow Motivation
• Determine the paths that data follows so that we can apply abstract reasoning at a fine granularity
Example
• If we know foo uses no negative numbers
• and bar is only called by foo
    bar(x) { …sqrt(x); }
• then sqrt(x) in bar never receives a negative argument

Abstract Operator Motivation
• Interpret program operators on variable properties instead of values
• Addition examples
    Pos + Pos = Pos
    Neg + Neg = Neg
    Pos + Neg = ?
    ? + Pos = ?
• Multiplication examples
    Pos * Pos = Pos
    Neg * Neg = Pos
    Pos * Neg = Neg
    Pos * ? = ?
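The addition and multiplication tables above can be written as operators over a small sign domain. This is a sketch under the same simplification as the slide (zero is not modelled); the names `SignDomain`, `Sign`, `add` and `mul` are hypothetical, and `UNKNOWN` stands for the slide's "?".

```java
class SignDomain {
    // Abstract values: the property of a variable, not its concrete value.
    enum Sign { POS, NEG, UNKNOWN }

    // Abstract addition: the sign is only known when both operands agree.
    static Sign add(Sign a, Sign b) {
        if (a == Sign.UNKNOWN || b == Sign.UNKNOWN) return Sign.UNKNOWN;
        return a == b ? a : Sign.UNKNOWN;
    }

    // Abstract multiplication: equal signs give POS, different signs give NEG.
    static Sign mul(Sign a, Sign b) {
        if (a == Sign.UNKNOWN || b == Sign.UNKNOWN) return Sign.UNKNOWN;
        return a == b ? Sign.POS : Sign.NEG;
    }

    public static void main(String[] args) {
        System.out.println(add(Sign.POS, Sign.NEG)); // UNKNOWN
        System.out.println(mul(Sign.NEG, Sign.NEG)); // POS
    }
}
```

A fuller domain would usually add a ZERO value; it is left out here to stay with the slide's three cases.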

Branch Analysis Motivation
    if (y < 0) { throw new IllegalArgumentException(); }
    if (x > y) { z = x * -1; } else { z = 1; }
Can z be negative at the end of this code?

Data-flow Requires Control-flow Graph
Can base analysis on a model of the control flow of the program
A node in a control-flow graph (CFG) represents a statement
An edge (i,j) represents a possible transfer of control from node i to node j
• Consider a single-method CFG
• Ignoring exceptions

Single Method Control-Flow
• Use two special nodes to denote entry and exit of method
  • Start points to the first statement
  • All return statements point to Exit
• Connect with other nodes for method body
  • Assignments
  • Declarations
  • Conditionals/Loops/Logical Operators
  • Input/Output

Statements
• Use Statement Level for Dataflow Analysis
• Not Block Level!
Example (in red):
    if ( a > b ) { a = b; x = x + 1; }
    print(x);
[Figure: the block's statements a = b and x = x + 1 drawn as separate CFG nodes]

Conditional
Conditionals have outgoing arcs labeled true or false
Example:
    if ( a > b ) { a = b; x = x + 1; }
    print(x);
[Figure: CFG with a > b -true-> a = b -> x = x + 1 -> print(x), and a > b -false-> print(x)]

Loop
Last statement in loop has a back edge to loop condition
    while (x > y) { x = x + 1; y = y * 2; }
    return y;
[Figure: CFG with x > y -true-> x = x + 1 -> y = y * 2 -> back edge to x > y, and x > y -false-> return y;]

Method
    int method(int a, int b, int x, int y) {
      if ( a > b ) { a = b; x = x + 1; }
      print(x);
      while (x > y) { x = x + 1; y = y * 2; }
      return y;
    }
[Figure: CFG of the method from Start to Exit, combining the conditional and the loop graphs]
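One way to hold this method's CFG in memory is a map from each node to its successors. The sketch below is an assumption about representation, not a prescribed one: the class `MethodCfg`, the string labels, and the "(if)" / "(loop)" suffixes that distinguish the two x = x + 1 statements are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MethodCfg {
    // Successor lists keyed by statement label; one node per statement.
    static Map<String, List<String>> build() {
        Map<String, List<String>> succ = new LinkedHashMap<>();
        succ.put("Start", List.of("a > b"));
        succ.put("a > b", List.of("a = b", "print(x)"));            // true, false
        succ.put("a = b", List.of("x = x + 1 (if)"));
        succ.put("x = x + 1 (if)", List.of("print(x)"));
        succ.put("print(x)", List.of("x > y"));
        succ.put("x > y", List.of("x = x + 1 (loop)", "return y")); // true, false
        succ.put("x = x + 1 (loop)", List.of("y = y * 2"));
        succ.put("y = y * 2", List.of("x > y"));                    // back edge
        succ.put("return y", List.of("Exit"));
        succ.put("Exit", List.of());
        return succ;
    }

    // Depth-first walk that collects every node reachable from the given node.
    static Set<String> reachable(Map<String, List<String>> succ, String from) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(from);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (seen.add(n)) {
                for (String s : succ.get(n)) work.push(s);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Every node, including Exit, should be reachable from Start.
        System.out.println(reachable(build(), "Start"));
    }
}
```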

Iterative Data-Flow Analysis
• Iterative Data-Flow Analysis Framework
  • Theoretical framework for many dataflow analyses
  • Iterates over CFG and annotates nodes/edges with sets of assertions
• Each analysis chooses:
  1. Domain
  2. Approximation
  3. Direction
  4. Transfer functions for each CFG node type
• Examples
  • Liveness
  • Available Expressions
  • Reaching Definitions
  • Information flow security

Data-Flow Analysis Framework
• Domain
  • What kind of solution is the analysis looking for?
  • Ex. Variables that have not yet been defined
  • Algorithm assigns a set of assertions to each node/edge
• Approximation
  • Analyses of useful data-flow properties are never 100% accurate
    • Rice’s Theorem, from 1953
  • Lower approximation is called a MUST analysis
    • Set of solutions found may be smaller than the set of actual solutions
  • Upper approximation is called a MAY analysis
    • Set of solutions found may be larger than the set of actual solutions

Data-Flow Analysis Framework
• Direction
  • Forwards: For each node/edge, computes information about past behavior
  • Backwards: For each node/edge, computes information about future behavior
• Transfer Functions
  • JOIN: Specifies how information from adjacent nodes/edges is propagated
    • MAY: Union of adjacent edges
    • MUST: Intersection of adjacent edges
  • GEN: Specifies which possible solutions are generated at the node/edge
  • KILL: Specifies which possible solutions are removed at that node/edge

Data-Flow Algorithm
• Start at the top (bottom) of the CFG
  • Forwards: top
  • Backwards: bottom
• At each node compute: (JOIN(node) - KILL(node)) ∪ GEN(node)
• At each branch:
  • Follow all paths, in any order, up to the node where the paths merge
  • Once all paths up to the merge are complete, continue at the merge node
• If not all JOIN edges have been computed yet, use for the missing ones
  • the empty set (MAY)
  • the universal set (MUST)
• For loops:
  • repeat until the solution for all nodes in the loop doesn’t change
  • Called the “fixed-point”
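The per-node computation can be sketched as three set operations. This is an illustrative sketch only: the class `DataflowStep` and its method names are hypothetical, assertions are represented as plain strings, and an edge that has not been computed yet is simply left out of the input list, which has the same effect as the empty set (MAY) or the universal set (MUST).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DataflowStep {
    // JOIN for a MAY analysis: union of the sets on the adjacent edges.
    static Set<String> joinMay(List<Set<String>> inputs) {
        Set<String> out = new HashSet<>();          // starts as the empty set
        for (Set<String> s : inputs) out.addAll(s);
        return out;
    }

    // JOIN for a MUST analysis: intersection of the sets on the adjacent edges.
    static Set<String> joinMust(List<Set<String>> inputs, Set<String> universe) {
        Set<String> out = new HashSet<>(universe);  // starts as the universal set
        for (Set<String> s : inputs) out.retainAll(s);
        return out;
    }

    // (JOIN - KILL(node)) U GEN(node)
    static Set<String> transfer(Set<String> join, Set<String> kill, Set<String> gen) {
        Set<String> out = new HashSet<>(join);
        out.removeAll(kill);
        out.addAll(gen);
        return out;
    }

    public static void main(String[] args) {
        Set<String> join = joinMay(List.of(Set.of("x"), Set.of("z")));
        // Node kills z and generates y, leaving x and y.
        System.out.println(transfer(join, Set.of("z"), Set.of("y")));
    }
}
```

A full analysis would apply `transfer` to every node repeatedly, in the chosen direction, until no set changes.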

Liveness
• A variable is live at a node if its current value can be read during the remaining execution of the program
  • i.e. it holds a value needed in the future.
• Domain: program variables
• Backwards MAY analysis

Liveness Example

Liveness Transfer Functions
• Exit
  • GEN(exit) = { }
  • KILL(exit) = { }
• Conditions and Output
  • GEN(stmt) = Set of all variables appearing in the statement
  • KILL(stmt) = { }
• Assignment
  • GEN(assignment) = Set of all variables appearing on the right-hand side
  • KILL(assignment) = Set with variable being assigned to
• Declaration
  • GEN(declaration) = { }
  • KILL(declaration) = Set of variables being declared
• Other
  • GEN(other) = { }
  • KILL(other) = { }
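As a worked illustration, these transfer functions can be run to a fixed point on the four-parameter method from the Method slide. The sketch below makes several assumptions: the class `LivenessSketch` is hypothetical, the CFG and the GEN/KILL sets are encoded by hand, Start is treated as an "Other" node, and the sets computed are the variables live on entry to each node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class LivenessSketch {
    static final String[] LABEL = {
        "Start", "a > b", "a = b", "x = x + 1 (if)", "print(x)",
        "x > y", "x = x + 1 (loop)", "y = y * 2", "return y", "Exit"
    };
    // Successor indices for each node; node 7 has the back edge to node 5.
    static final int[][] SUCC = {{1}, {2, 4}, {3}, {4}, {5}, {6, 8}, {7}, {5}, {9}, {}};
    static final List<Set<String>> GEN = List.of(
        Set.of(), Set.of("a", "b"), Set.of("b"), Set.of("x"), Set.of("x"),
        Set.of("x", "y"), Set.of("x"), Set.of("y"), Set.of("y"), Set.of());
    static final List<Set<String>> KILL = List.of(
        Set.of(), Set.of(), Set.of("a"), Set.of("x"), Set.of(),
        Set.of(), Set.of("x"), Set.of("y"), Set.of(), Set.of());

    // Returns, for each node, the set of variables live on entry to it.
    static List<Set<String>> liveIn() {
        List<Set<String>> in = new ArrayList<>();
        for (int i = 0; i < SUCC.length; i++) in.add(new TreeSet<>()); // MAY: start empty
        boolean changed = true;
        while (changed) {                                  // repeat until the fixed point
            changed = false;
            for (int n = SUCC.length - 1; n >= 0; n--) {   // backwards: bottom first
                Set<String> live = new TreeSet<>();
                for (int s : SUCC[n]) live.addAll(in.get(s)); // JOIN = union (MAY)
                live.removeAll(KILL.get(n));
                live.addAll(GEN.get(n));
                if (!live.equals(in.get(n))) {
                    in.set(n, live);
                    changed = true;
                }
            }
        }
        return in;
    }

    public static void main(String[] args) {
        List<Set<String>> in = liveIn();
        for (int n = 0; n < LABEL.length; n++) {
            System.out.println(LABEL[n] + " : " + in.get(n));
        }
    }
}
```

In this encoding a is not live on entry to the statement after a = b, so the value written by a = b is never read; that is the kind of finding an optimizer can act on.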

Liveness Example
[Figure: the example CFG from START to END, with each edge annotated with its live-variable set, such as {x, y}, {x}, {x, z} and { }]

Liveness Application
• Memory Allocation
  • Since y and z are never live at the same time, they can share the same memory location
• Performance Optimization
  • The value assigned by z = z - 1 is never used

Data-Flow Framework Summary
• Generic framework for different analyses
• Each analysis defines
  • Domain
  • Approximation
  • Direction
  • Transfer Functions
• Used for optimization, verification, and testing
