
Building a Model for Static Analysis: Lexical, Parsing, Abstract Syntax, Semantic Analysis

This chapter covers how to build a model for static analysis, including lexical analysis, parsing, abstract syntax, semantic analysis, tracking control flow, tracking dataflow, taint propagation, and pointer aliasing.

wernera

Presentation Transcript


  1. Static Analysis Chapter 4

  2. Summary (1) • Building a model of the program: • Lexical analysis • Parsing • Abstract syntax • Semantic Analysis • Tracking control flow • Tracking Dataflow • Taint propagation • Pointer aliasing

  3. Summary (2) • Analysis Algorithms • Checking Assertions • Naive local analysis • Approaches to local analysis • Global analysis • Research tools • Rules • Rule formats • Rules for taint propagation • Rules in print • Reporting • Reporting results • Eliminating unwanted results • Explaining the significance of the results

  4. Introduction

  5. Building a model: Lexical Analysis • Decompose the input into a sequence of “tokens”. • Ignore comments and whitespace. • Often uses regular expressions
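
A minimal sketch of the lexical analysis step, assuming a tiny C-like token set. The token names and patterns are hypothetical and not from the slides. Each token kind is one regular expression; comments and whitespace are matched but dropped.

```python
import re

# Hypothetical token set for a tiny C-like language (an assumption for illustration).
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("WS",      r"\s+"),
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=();{}<>]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
                    re.DOTALL)

def tokenize(src):
    """Return a list of (kind, text) pairs, skipping comments and whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at offset {pos}")
        if m.lastgroup not in ("COMMENT", "WS"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = y + 42; // set x"))
```

COMMENT is listed before OP so that "//" is consumed as a comment rather than as two division operators.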

  6. Building a model: Parsing • Uses a “context-free grammar” to match the token stream. • A grammar consists of a set of productions which describe symbols in the language. • The parser performs a derivation by matching the production rules in the grammar to produce a “parse tree”. • There are many facilities to build scanners and parsers; lex and yacc are just one pair
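
To make the idea of productions and a parse tree concrete, here is a hand-written recursive-descent sketch for a hypothetical three-production expression grammar. The grammar, the (kind, text) token shape, and the tuple-based tree are all assumptions for illustration; a generated parser (for example from yacc) would look different.

```python
# Hypothetical grammar (not from the slides):
#   expr   -> term (('+' | '-') term)*
#   term   -> factor (('*' | '/') factor)*
#   factor -> NUMBER | IDENT | '(' expr ')'

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens      # list of (kind, text) tuples
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        children = [self.term()]
        while self.peek() in (("OP", "+"), ("OP", "-")):
            children.append(self.take())
            children.append(self.term())
        return ("expr", children)

    def term(self):
        children = [self.factor()]
        while self.peek() in (("OP", "*"), ("OP", "/")):
            children.append(self.take())
            children.append(self.factor())
        return ("term", children)

    def factor(self):
        kind, text = self.peek()
        if kind in ("NUMBER", "IDENT"):
            return ("factor", [self.take()])
        if (kind, text) == ("OP", "("):
            lparen = self.take()
            inner = self.expr()
            if self.peek() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return ("factor", [lparen, inner, self.take()])
        raise SyntaxError(f"unexpected token {text!r}")

def parse(tokens):
    """Parse a full token list into a parse tree, one node per production used."""
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree

print(parse([("NUMBER", "1"), ("OP", "+"), ("NUMBER", "2")]))
```

Note that the tree records every nonterminal on the derivation path, even for a bare number; this is the redundancy that an abstract syntax tree removes.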

  7. Building a model: Abstract syntax • Abstract syntax tree = parse tree minus “garbage nonterminals” • Sometimes, the AST may simplify the code: for example, all loops could be converted to a single canonical kind of loop. If a tool is multilingual, the ASTs for different languages may be similar.
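
A small sketch of what removing “garbage nonterminals” can mean in practice. It assumes a parse tree shaped as nested ("label", children) tuples with hypothetical expr/term/factor nonterminals and (kind, text) leaf tokens; none of these names come from the slides.

```python
def to_ast(node):
    """Collapse a parse tree into an AST with only operators and operands."""
    label, payload = node
    if label in ("NUMBER", "IDENT"):          # leaf token: keep as is
        return (label, payload)
    if label == "factor":
        # Either [token] or ['(', expr, ')']; parentheses carry no meaning in an AST.
        return to_ast(payload[0]) if len(payload) == 1 else to_ast(payload[1])
    # expr / term: operand (operator operand)*, folded left to right.
    ast = to_ast(payload[0])
    for i in range(1, len(payload), 2):
        operator = payload[i][1]
        ast = ("binop", operator, ast, to_ast(payload[i + 1]))
    return ast

# Parse tree for: 1 + (x) * 2
tree = ("expr", [
    ("term", [("factor", [("NUMBER", "1")])]),
    ("OP", "+"),
    ("term", [
        ("factor", [("OP", "("),
                    ("expr", [("term", [("factor", [("IDENT", "x")])])]),
                    ("OP", ")")]),
        ("OP", "*"),
        ("factor", [("NUMBER", "2")]),
    ]),
])
print(to_ast(tree))
```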

  8. Building a model: Semantic Analysis • During parsing, a symbol table is also built; • For each symbol, it points to the definition • and contains its type • This information is needed to make type conversions explicit, and can also support some method checking. • Often, the AST is now converted to a form more suited to the analysis; there could be more than one such form.
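
The symbol table described above might be sketched as a chain of scopes, each mapping a name to its type and definition site. The class, the field names, and the `conversion_needed` helper are hypothetical; real analyzers store far more per symbol.

```python
class SymbolTable:
    """One lexical scope; lookups fall back to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def define(self, name, type_, line):
        if name in self.symbols:
            raise NameError(f"{name} redefined in the same scope")
        self.symbols[name] = {"type": type_, "line": line}

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"{name} is not defined")

def conversion_needed(table, target, source):
    """For `target = source`, return the conversion to make explicit, or None."""
    target_type = table.lookup(target)["type"]
    source_type = table.lookup(source)["type"]
    return None if target_type == source_type else f"({target_type}) {source}"

globals_ = SymbolTable()
globals_.define("n", "int", line=1)
body = SymbolTable(parent=globals_)
body.define("x", "double", line=3)
print(conversion_needed(body, "x", "n"))
```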

  9. Building a model: Tracking control flow • Construct a control flow graph from the intermediate representation: • nodes are basic blocks of code, • edges are potential control flows between the different nodes. • Back edges represent possible loops. • A call graph traces calls between functions (its nodes)
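
A sketch of control flow graph construction over a made-up intermediate representation: a list of (opcode, argument) pairs where "branch" is a conditional jump to an instruction index, "jump" is unconditional, and jump targets are assumed valid. Back edges are found here with a depth-first search, which matches the usual definition on reducible graphs; this is a simplification.

```python
def build_cfg(ir):
    """Split the IR into basic blocks (keyed by first index) and connect them."""
    # Leaders: the first instruction, every jump target, and whatever follows a transfer.
    leaders = {0}
    for i, (op, arg) in enumerate(ir):
        if op in ("jump", "branch"):
            leaders.add(arg)
        if op in ("jump", "branch", "return") and i + 1 < len(ir):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = {s: list(range(s, e)) for s, e in zip(starts, starts[1:] + [len(ir)])}
    edges = {s: [] for s in starts}
    for s, body in blocks.items():
        last = body[-1]
        op, arg = ir[last]
        if op in ("jump", "branch"):
            edges[s].append(arg)                 # taken edge
        if op not in ("jump", "return") and last + 1 < len(ir):
            edges[s].append(last + 1)            # fall-through edge
    return blocks, edges

def back_edges(edges, entry=0):
    """Edges that point to a block still on the DFS stack, i.e. possible loops."""
    found, on_stack, visited = [], set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for succ in edges[node]:
            if succ in on_stack:
                found.append((node, succ))
            elif succ not in visited:
                dfs(succ)
        on_stack.discard(node)

    dfs(entry)
    return found

# i = 0; while (i < n) { work; i++ }; return
ir = [
    ("assign", "i = 0"),       # 0
    ("branch", 5),             # 1: if i >= n goto 5
    ("assign", "work"),        # 2
    ("assign", "i = i + 1"),   # 3
    ("jump", 1),               # 4
    ("return", None),          # 5
]
blocks, edges = build_cfg(ir)
print(blocks, edges, back_edges(edges))
```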

  10. Building a model: Tracking dataflow • Using the control flow analysis, we can now see how variables get set: this leads to converting a program to Static Single Assignment form (SSA) • SSA = each variable is assigned a value only once (so each assignment gets its own sub-index). • Where control flows merge, use a Φ-function to “resolve” which version is meant: Φ(v1,v2)
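
The renaming behind SSA can be sketched for the simplest case: straight-line code plus one if/else merge. This is not a general SSA construction (no dominance frontiers, no loops); the `SSABuilder` class and the (target, expression) statement shape are assumptions, and "phi" stands in for Φ.

```python
import re

class SSABuilder:
    def __init__(self):
        self.counter = {}       # variable -> highest sub-index handed out so far

    def fresh(self, var):
        self.counter[var] = self.counter.get(var, 0) + 1
        return f"{var}{self.counter[var]}"

    def rename(self, stmts, current):
        """Rename straight-line (target, expr) pairs; `current` maps var -> SSA name."""
        current = dict(current)
        out = []
        for target, expr in stmts:
            # Uses see the version that is current before this assignment.
            expr = re.sub(r"[A-Za-z_]\w*",
                          lambda m: current.get(m.group(), m.group()), expr)
            current[target] = self.fresh(target)
            out.append((current[target], expr))
        return out, current

    def if_else(self, then_stmts, else_stmts, current):
        """Rename both branches, then add a phi for each variable that differs."""
        then_out, then_cur = self.rename(then_stmts, current)
        else_out, else_cur = self.rename(else_stmts, current)
        merged, phis = dict(current), []
        for var in sorted(set(then_cur) | set(else_cur)):
            a, b = then_cur.get(var), else_cur.get(var)
            if a is None or b is None:
                continue        # defined on one path only: left out of this sketch
            if a == b:
                merged[var] = a
            else:
                merged[var] = self.fresh(var)
                phis.append((merged[var], f"phi({a}, {b})"))
        return then_out, else_out, phis, merged

# x = 0; if (...) x = x + 1; else x = x * 2; y = x
builder = SSABuilder()
pre, cur = builder.rename([("x", "0")], {})
then_out, else_out, phis, cur = builder.if_else([("x", "x + 1")], [("x", "x * 2")], cur)
post, cur = builder.rename([("y", "x")], cur)
print(pre, then_out, else_out, phis, post)
```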

  11. Building a model: Taint propagation • Taint propagation = tracing where user-controlled variables/values can end up
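
As a rough illustration, taint can be propagated through assignments by iterating to a fixed point. The sketch below is flow-insensitive (statement order is ignored) and assumes each assignment is given as a target plus the variables it reads; both the representation and the variable names are hypothetical.

```python
def tainted_variables(assignments, sources):
    """assignments: list of (target, [operands]); sources: initially tainted names."""
    tainted = set(sources)
    changed = True
    while changed:                      # repeat until no new variable becomes tainted
        changed = False
        for target, operands in assignments:
            if target not in tainted and tainted.intersection(operands):
                tainted.add(target)
                changed = True
    return tainted

assignments = [
    ("query", ["prefix", "name"]),      # query = prefix + name
    ("name", ["request_param"]),        # name = request_param
    ("count", ["limit"]),               # count = limit
]
print(sorted(tainted_variables(assignments, {"request_param"})))
```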

  12. Building a model: Pointer aliasing • Detecting pointer aliasing is important in static analysis for taint propagation. • Many possible relationships, such as: • “must alias” • “may alias” • “cannot alias” • etc
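
One common way to approximate these relationships is to compute points-to sets and compare them. The sketch below is a flow-insensitive, inclusion-style analysis over only two hypothetical statement forms (p = &x and p = q), and it assumes every pointer assignment is visible and every target is a single concrete object. Under weaker assumptions the "must alias" and "cannot alias" answers would not be safe.

```python
def points_to(statements):
    """statements: ("addr", p, x) for p = &x, ("copy", p, q) for p = q."""
    pts = {}
    changed = True
    while changed:
        changed = False
        for kind, dst, src in statements:
            new = {src} if kind == "addr" else pts.get(src, set())
            before = len(pts.setdefault(dst, set()))
            pts[dst] |= new
            changed |= len(pts[dst]) != before
    return pts

def alias_relation(pts, p, q):
    a, b = pts.get(p, set()), pts.get(q, set())
    if not a & b:
        return "cannot alias"           # no common target
    if len(a) == 1 and a == b:
        return "must alias"             # same single target (see assumptions above)
    return "may alias"

statements = [
    ("addr", "p", "x"),   # p = &x
    ("copy", "q", "p"),   # q = p
    ("addr", "r", "y"),   # r = &y
    ("addr", "s", "x"),   # s = &x   (on one path)
    ("addr", "s", "y"),   # s = &y   (on another path)
]
pts = points_to(statements)
print(alias_relation(pts, "p", "q"), alias_relation(pts, "p", "r"), alias_relation(pts, "p", "s"))
```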

  13. Analysis Algorithms: Introduction • The reason for analyzing a program is to determine context. For example: • cin >> is bad only when used with character arrays • The danger of strcpy depends on parameter sizes. • Two parts to analysis: • Analysis within procedures, aka intraprocedural analysis. • Analysis of procedure interactions, aka interprocedural analysis. • Because the two terms are so similar, the terms “local” and “global” are used instead.

  14. Analysis Algorithms: Checking Assertions • The easiest way to check situations is to check assertions. For example, strcpy(dest, src) is safe if (and only if) assert(alloc_size(dest) > strlen(src)); holds. • Will use this approach to check for problems. • Three kinds of problems: • Trusting input for badly behaving data • Buffer overflows • Variable/type state.
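
A sketch of how an analyzer might evaluate the strcpy assertion statically. It assumes the analyzer already knows an allocation size for some buffers and an upper bound on the length of some strings; the function name, the dictionaries, and the three result labels are hypothetical.

```python
def check_strcpy(alloc_size, max_strlen, dest, src):
    """Evaluate alloc_size(dest) > strlen(src) from statically known facts."""
    size = alloc_size.get(dest)
    length = max_strlen.get(src)        # upper bound on strlen(src), if known
    if size is None or length is None:
        return "unknown"                # not enough information to decide
    return "safe" if size > length else "possible violation"

alloc_size = {"buf": 16}                      # char buf[16];
max_strlen = {"label": 7, "path": 255}        # known upper bounds

print(check_strcpy(alloc_size, max_strlen, "buf", "label"))
print(check_strcpy(alloc_size, max_strlen, "buf", "path"))
print(check_strcpy(alloc_size, max_strlen, "buf", "user_input"))
```

The strict comparison leaves room for the terminating NUL byte, which strlen does not count.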

  15. Analysis Algorithms: Naive local analysis • Need to have an idea of local variable values to check assertions. • Straight-line code is relatively simple. • Gets complicated when program logic gets complicated. • Loops are worse than if-then-else

  16. Analysis Algorithms: Approaches to local analysis • Abstract interpretation: abstract away irrelevant properties. • For loops, do a flow-insensitive analysis. • Predicate transformers: • Weakest precondition necessary to satisfy a postcondition. (Dijkstra) • Model checking (for example, a FSA)
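
To illustrate the first bullet only, here is a sketch of abstract interpretation with an interval domain: each integer variable is abstracted to a (low, high) pair, and the two branches of an if/else are merged with a join. The choice of intervals and all function names are assumptions; loops, which would need widening or the flow-insensitive treatment mentioned above, are not handled.

```python
def const(n):
    return (n, n)

def add(a, b):
    """Abstract addition: add the bounds."""
    return (a[0] + b[0], a[1] + b[1])

def join(a, b):
    """Merge two control flow paths: the smallest interval covering both."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def may_overflow(index, size):
    """True if some value in the interval is outside 0 .. size-1."""
    return index[0] < 0 or index[1] >= size

# if (c) n = 8; else n = 16;
# n = n + 1;
# buf[n] = 0;          with char buf[16]
n = join(const(8), const(16))
n = add(n, const(1))
print(n, may_overflow(n, 16))
```

The result is imprecise (the interval also contains values such as 12 that never occur) but it is enough to flag the write as a possible overflow.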

  17. Analysis Algorithms: Global Analysis • Cannot be safely ignored. (For example, an environment variable is passed to a procedure as a character array, which is then copied, opening a vulnerability). • Naive approach: inlining... • More flexible approach: function summaries: • Short statement specifying pre-conditions and post-conditions of the function; often replace the function during analysis. • Program analysis then becomes similar to a graph traversal.
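
A sketch of function summaries applied to the environment variable example above. Each summary is reduced to one precondition (a maximum string length per parameter) and one postcondition (a bound on the returned string's length, None meaning unbounded). The summary format and the functions `copy_name` and `short_label` are hypothetical; the bound for `getenv` reflects that its result has no fixed maximum length.

```python
SUMMARIES = {
    "getenv":      {"post_max_len": None},     # returned string is unbounded
    "short_label": {"post_max_len": 15},       # hypothetical: returns at most 15 chars
    "copy_name":   {"pre_max_len": [63]},      # hypothetical: copies arg 0 into char[64]
}

def check_calls(calls, summaries):
    """calls: list of (result_var or None, function, [argument vars])."""
    max_len = {}        # variable -> known upper bound on strlen, None if unbounded
    findings = []
    for result, func, args in calls:
        summary = summaries[func]
        # Check preconditions against what is known about each argument.
        for arg, limit in zip(args, summary.get("pre_max_len", [])):
            actual = max_len.get(arg)          # unknown variables count as unbounded
            if limit is not None and (actual is None or actual > limit):
                findings.append(f"{func}({arg}): length may exceed {limit}")
        # Apply the postcondition instead of analyzing the function body.
        if result is not None:
            max_len[result] = summary.get("post_max_len")
    return findings

calls = [
    ("home", "getenv", []),
    (None, "copy_name", ["home"]),
    ("tag", "short_label", []),
    (None, "copy_name", ["tag"]),
]
print(check_calls(calls, SUMMARIES))
```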

  18. Analysis Algorithms: Research Tools (1) • ARCHER = Array CHEckeR • BOON: checks array indices, imprecise • Cqual uses type analysis to do taint analysis; requires taint declarations • Eau Claire uses a theorem prover • LAPSE: Light Analysis for Program Security in Eclipse (for J2EE apps) • MOPS: Model checking Programs for Security properties • SATURN uses Boolean satisfiability for temporal safety problems

  19. Analysis Algorithms: Research Tools (2) • Splint, with added annotations, can be used to find abstraction violations, such as global variable modifications, use before init, and array bounds errors. • Pixy detects XSS vulnerabilities in PHP programs • Xg++ uses templates to find uses of untrusted data. • Some others: ESP, SLAM, BLAST

  20. Rules • The need for rules. • Automatic? • Need for security, define library behavior, etc. • Built-in or can be added?

  21. Rule formats • Best to write the rules independently of the analyzer. But need formats: • (page 98 for two variants) • Annotations in the code (Java, Microsoft SAL) • There were many ad-hoc systems for writing rules which were unsuccessful. • PQL (page 101)

  22. Rules for taint propagation • Source rules • Sink rules • Pass-through rules • Cleanse rules • Entry-points (invoked by attacker) • Taint flags: taint types.
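
The four main rule kinds can be sketched as data kept separate from the analyzer that applies them. The rule table format and every function name in it are hypothetical; entry points and taint flags are not modeled, and results of functions without a rule are treated as clean, which a real tool might not do.

```python
RULES = {
    "source":       {"read_param"},        # return value is tainted
    "pass_through": {"concat", "trim"},    # taint flows from any argument to the result
    "cleanse":      {"escape_html"},       # result is considered clean
    "sink":         {"write_response"},    # a tainted argument is reported
}

def analyze(calls, rules=RULES):
    """calls: list of (result_var or None, function, [argument vars]), in program order."""
    tainted, findings = set(), []
    for result, func, args in calls:
        dirty = any(arg in tainted for arg in args)
        if func in rules["sink"] and dirty:
            findings.append((func, [arg for arg in args if arg in tainted]))
        if result is None:
            continue
        if func in rules["source"] or (func in rules["pass_through"] and dirty):
            tainted.add(result)
        else:
            tainted.discard(result)        # cleanse rules and unknown functions
    return findings

calls = [
    ("name", "read_param", []),
    ("msg", "concat", ["greeting", "name"]),
    (None, "write_response", ["msg"]),         # reported: msg is tainted
    ("safe", "escape_html", ["msg"]),
    (None, "write_response", ["safe"]),        # not reported: cleansed
]
print(analyze(calls))
```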

  23. Reports • Careful with output! • False positives • Try to be part of a Unified SDE • Try to make result formatting and ordering an option. • Sometimes assumptions may simplify the output. • Rank by severity (most severe first) (buffer overflow more severe than null pointer dereference) • Confidence is important. (p 107 for a graph)
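
Ranking by severity first and confidence second might look like the sketch below. The numeric severity scale, the confidence values, and the finding fields are all assumptions; only the relative order of the two issue kinds comes from the slide.

```python
# Hypothetical severity scale; higher means more severe.
SEVERITY = {"buffer overflow": 3, "null pointer dereference": 1}

def rank(findings):
    """Most severe first; within one severity, most confident first."""
    return sorted(findings,
                  key=lambda f: (-SEVERITY.get(f["kind"], 0), -f["confidence"]))

findings = [
    {"kind": "null pointer dereference", "confidence": 0.9, "line": 12},
    {"kind": "buffer overflow", "confidence": 0.4, "line": 80},
    {"kind": "buffer overflow", "confidence": 0.8, "line": 31},
]
for finding in rank(findings):
    print(finding["kind"], finding["confidence"], finding["line"])
```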

  24. Reports (2) • Eliminating unwanted results: use pragmas or annotations, or eliminate whole categories. If code modification is not possible, a combination of line numbering and pattern matching is necessary. • Result significance, see examples in book.
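
When the code cannot be annotated, suppressions have to live outside it. The sketch below combines a recorded line number with a pattern on the source text, as the slide suggests; the suppression format and the `drift` tolerance (how far the line may have moved since the suppression was recorded) are assumptions.

```python
import re

def is_suppressed(finding, suppressions, drift=3):
    """True if a suppression matches the file, roughly the line, and the source text."""
    for s in suppressions:
        if (s["file"] == finding["file"]
                and abs(s["line"] - finding["line"]) <= drift
                and re.search(s["pattern"], finding["source_text"])):
            return True
    return False

suppressions = [{"file": "a.c", "line": 40, "pattern": r"strcpy\(buf"}]

moved = {"file": "a.c", "line": 42, "source_text": "strcpy(buf, name);"}
other = {"file": "a.c", "line": 60, "source_text": "strcpy(buf, name);"}
print(is_suppressed(moved, suppressions), is_suppressed(other, suppressions))
```

Matching on the text as well as the line keeps a suppression from silently hiding a different statement that later lands on the same line.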
