
### Relational String Verification Using Multi-track Automata

Fang Yu, Tevfik Bultan, and Oscar Ibarra

Department of Computer Science

University of California, Santa Barbara

Web software

- Web software is becoming increasingly dominant
- Web applications are used extensively in many areas:
  - Commerce: online banking, online shopping, …
  - Entertainment: online music & videos, …
  - Interaction: social networks
- We will rely on web applications more in the future:
  - Health records
    - Google Health, Microsoft HealthVault
  - Controlling and monitoring of national infrastructures:
    - Google Powermeter
- Web software is also rapidly replacing desktop applications
  - Cloud computing + software-as-a-service
    - Google Docs, Google …

One Major Road Block

- Web applications are not secure!
- Web applications are notorious for security vulnerabilities
- Their global accessibility makes them a target for many malicious users

- As web applications become increasingly dominant and their use in safety-critical areas grows
  - Their security is becoming a critical issue

Web applications are not secure

- There are many well-known security vulnerabilities that exist in many web applications. Here are some examples:
  - Malicious file execution: where a malicious user causes the server to execute malicious code
  - SQL injection: where a malicious user executes SQL commands on the back-end database by providing specially formatted input
  - Cross-site scripting (XSS): where an attacker causes a malicious script to execute in a user's browser
- These vulnerabilities are typically due to
  - errors in user input validation or
  - lack of user input validation

Web Application Vulnerabilities

- The top two vulnerabilities of the Open Web Application Security Project (OWASP)’s top ten list in 2007
- Cross Site Scripting (XSS)
- Injection Flaws (such as SQL Injection)

- The top two vulnerabilities of the OWASP top ten list in 2010
- Injection Flaws (such as SQL Injection)
- Cross Site Scripting (XSS)

Why are web applications error prone?

- Extensive string manipulation:
  - Web applications use extensive string manipulation
    - To construct HTML pages, to construct database queries in SQL, etc.
  - The user input comes in string form and must be validated and sanitized before it can be used
    - This requires the use of complex string manipulation functions such as string-replace
  - String manipulation is error prone

String Related Vulnerabilities

- String-related web application vulnerabilities occur when:
  - a sensitive function is passed a malicious string input from the user
  - this input contains an attack
  - the user input is not properly sanitized before it reaches the sensitive function
- String analysis: discover these vulnerabilities automatically

XSS Vulnerability

- A PHP Example:
1:<?php

2: $www = $_GET["www"];

3: $l_otherinfo = "URL";

4: echo "<td>" . $l_otherinfo . ": " . $www . "</td>";

5:?>

- The echo statement in line 4 is a sensitive function
- It contains a Cross Site Scripting (XSS) vulnerability: a www parameter such as <script ... is echoed into the page without sanitization

String Analysis

- String analysis determines all possible values that a string expression can take during any program execution
- Using string analysis we can identify all possible input values of the sensitive functions
- Then we can check if inputs of sensitive functions can contain attack strings

- How can we characterize attack strings?
- Use regular expressions to specify the attack patterns
- Attack pattern for XSS: Σ∗<scriptΣ∗

- If string analysis determines that the intersection of the attack pattern and the possible inputs of the sensitive function is empty
  - then we can conclude that the program is secure with respect to that attack pattern
- If the intersection is not empty, then we conclude that the program might be vulnerable
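The attack pattern Σ∗<scriptΣ∗ is a regular language, so it has a DFA. The slides do not say how the pattern automaton is built. The sketch below uses the standard Knuth-Morris-Pratt construction as one possible choice, computing transitions on the fly rather than tabulating them. The function names are hypothetical.

```python
def substring_dfa(pattern):
    """DFA for Σ* pattern Σ*, built from the Knuth-Morris-Pratt failure function.

    State k is the length of the longest suffix of the input read so far that is a
    prefix of pattern; state len(pattern) is accepting and absorbing."""
    n = len(pattern)
    fail = [0] * n  # fail[i] = longest proper border of pattern[:i + 1]
    k = 0
    for i in range(1, n):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k

    def step(state, ch):
        if state == n:
            return n  # once the attack substring has occurred, every extension matches
        while state and ch != pattern[state]:
            state = fail[state - 1]  # fall back to the longest border that can still grow
        return state + 1 if ch == pattern[state] else 0

    return step, n


def matches_attack(pattern, text):
    """Run the DFA over text; True iff text is in Σ* pattern Σ*."""
    step, accepting = substring_dfa(pattern)
    state = 0
    for ch in text:
        state = step(state, ch)
    return state == accepting


print(matches_attack("<script", "URL: <script>alert(1)</script>"))  # True
print(matches_attack("<script", "URL: <scri pt>"))                  # False
```

Because the automaton reads each character once, the membership check takes time linear in the length of the input string.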

String Systems

stmt ::= id := sexp; |

id := call id (sexp); |

if exp then goto l; (where l is a stmt label) |

goto L; (where L is a set of stmt labels) |

input id; |

output exp; |

assert exp;

exp ::= bexp | exp and exp | exp or exp | not exp

bexp ::= atom = sexp

sexp ::= sexp . atom | atom | suffix(id) | prefix(id)

atom ::= id | c (where c is a string constant)

Basic String System Categorization

We use the following categorization:

- N/D: nondeterministic or deterministic
- U/B/K: unary, binary or arbitrary alphabet
- The set of variables
- The types of statements
- The types of branch conditions
Example: NB(X1, X2) Xi := Xi.c; X1 = X2

Nondeterministic, binary alphabet, variables X1, X2, statements of the form Xi := Xi.c, branch conditions of the form X1 = X2

We define the reachability problem for string systems as follows:

Given a string system and a configuration (an instruction label and values for the variables), is that configuration reachable?

Decidability Results

Reachability problem for:

- NB(X1,X2) Xi := Xi.c; X1 = X2 is undecidable
- Reduction from Post Correspondence Problem

- DU(X1,X2,X3) Xi := Xi.c; X1 = X3, X2 = X3 is undecidable
- Can simulate 2-counter machines

- NK(X1, . . . ,Xk) Xi := d.Xi.c; c = Xi, c = prefix(Xi), c=suffix(Xi) is decidable
- Reduction to emptiness check for multi-tape automaton

- DK(X1, . . . ,Xk) Xi := Xi . a, Xi := a . Xi; X1 = X2, c = Xi, c = prefix(Xi), c = suffix(Xi) is decidable.
- Can bound the execution steps if there is no infinite loop
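The PCP reduction above suggests why an analysis can only search up to a bound. The sketch below explores a hypothetical NB(X1, X2)-style system. On each step it nondeterministically picks a tile i and runs X1 := X1.u_i; X2 := X2.v_i. It then asks whether the branch condition X1 = X2 can hold after at least one step, which is exactly a PCP solution. The tile list is a small PCP instance with a known solution, and the function name is hypothetical. A bounded breadth-first search can find solutions. Because the problem is undecidable, however, no fixed bound can rule them out in general.

```python
from collections import deque


def bounded_reach(tiles, max_steps):
    """Breadth-first search over configurations (X1, X2) of a string system whose loop
    nondeterministically picks tile i and executes X1 := X1.u_i; X2 := X2.v_i.

    Returns the shortest tile-index sequence after which X1 = X2 holds, or None if no
    such configuration is reachable within max_steps steps."""
    queue = deque([("", "", [])])
    while queue:
        x1, x2, seq = queue.popleft()
        if seq and x1 == x2:
            return seq
        if len(seq) == max_steps:
            continue
        for i, (u, v) in enumerate(tiles):
            n1, n2 = x1 + u, x2 + v
            # If neither string is a prefix of the other, appending more text can never
            # make them equal, so this configuration cannot lead to X1 = X2.
            if n1.startswith(n2) or n2.startswith(n1):
                queue.append((n1, n2, seq + [i]))
    return None


TILES = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(bounded_reach(TILES, 4))  # [2, 1, 2, 0]
print(bounded_reach(TILES, 3))  # None
```

With a bound of 3 the search reports None even though a solution exists one step further, which is the inherent limitation of any bounded search.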

Automata-based String Analysis

- Finite state automata can be used to characterize sets of string values
- We use automata-based string analysis
  - Associate each string expression in the program with an automaton
  - The automaton accepts an over-approximation of all possible values that the string expression can take during program execution
- Using this automata representation, we symbolically execute the program, paying attention only to string manipulation operations

String Analysis Stages

- Convert PHP programs to dependency graphs
- Use symbolic reachability analysis to compute an over-approximation of reachable configurations
- Forward analysis
- Assume that the user input can be any string
- Propagate this information on the dependency graph
- When a sensitive function is reached, intersect with attack pattern

- Result
  - If the intersection is not empty, there might be a vulnerability
  - If the intersection is empty, the program is not vulnerable with respect to the attack pattern

[Figure: analysis pipeline with the components PHP Program, Front End, Reachability Analysis, Attack patterns, and Vulnerability Report]

Dependency Graphs

Given a PHP program, we first construct its dependency graph:

1:<?php

2: $www = $_GET["www"];

3: $l_otherinfo = "URL";

4: echo $l_otherinfo . ": " . $www;

5:?>

[Figure: dependency graph for the program above, with nodes labeled by expression and line number: "URL", 3; $l_otherinfo, 3; $_GET[www], 2; $www, 2; ": ", 4; two str_concat, 4 nodes; echo, 4]
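The forward analysis visits each node after the nodes it depends on. The sketch below records the graph for this program as a Python adjacency map and gets a topological order from the standard-library `graphlib` (Python 3.9+). It then tracks only whether a node can depend on user input. That is a much coarser property than the automata the analysis actually attaches to nodes. The edges are our reading of the PHP snippet; the `#1`/`#2` suffixes and the function name are hypothetical.

```python
from graphlib import TopologicalSorter

# Each node (expression, line) maps to the nodes whose values it uses.
DEPS = {
    ('"URL"', 3): set(),
    ("$l_otherinfo", 3): {('"URL"', 3)},
    ("$_GET[www]", 2): set(),
    ("$www", 2): {("$_GET[www]", 2)},
    ('": "', 4): set(),
    ("str_concat#1", 4): {("$l_otherinfo", 3), ('": "', 4)},
    ("str_concat#2", 4): {("str_concat#1", 4), ("$www", 2)},
    ("echo", 4): {("str_concat#2", 4)},
}


def tainted_nodes(deps, sources):
    """Visit nodes in topological order (inputs first) and mark every node whose
    value can depend on a user-input source node."""
    tainted = set()
    for node in TopologicalSorter(deps).static_order():
        if node in sources or deps[node] & tainted:
            tainted.add(node)
    return tainted


ORDER = list(TopologicalSorter(DEPS).static_order())
TAINTED = tainted_nodes(DEPS, {("$_GET[www]", 2)})
print(("echo", 4) in TAINTED)  # True: user input reaches the sensitive sink
```

The same visiting order works when a DFA rather than a flag is computed at each node, because every operand is processed before the operation that uses it.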

Symbolic Reachability Analysis

- Using the dependency graph, we conduct symbolic reachability analysis
  - An automata-based forward fixpoint computation that identifies the possible string values of each node
- Each node in the dependency graph is associated with a DFA
  - The DFA accepts an over-approximation of the string values that the string expression represented by that node can take at runtime
  - The DFAs for the input nodes accept Σ∗
- Intersecting the DFA for the sink nodes with the DFA for the attack pattern identifies the vulnerabilities

Forward Analysis

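Attack pattern = Σ*<Σ*; the forward analysis starts from Σ* at the input node $_GET[www], 2

[Figure: the dependency graph annotated with the language computed at each node]

- "URL", 3 and $l_otherinfo, 3: URL
- $_GET[www], 2 and $www, 2: Σ*
- ": ", 4: ": "
- first str_concat, 4: URL:
- second str_concat, 4 and echo, 4: URL: Σ*

L(URL: Σ*) ∩ L(Σ*<Σ*) = L(URL: Σ*<Σ*) ≠ Ø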
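The forward analysis above can be reproduced at small scale with explicit DFAs. The sketch makes four simplifying choices:

- It assumes a tiny stand-in alphabet: the characters of "URL: ", the symbol <, and x for every other character.
- It represents each DFA as a (start, step, accepting) triple.
- It intersects DFAs with the product construction.
- It checks emptiness by breadth-first search, which also yields a shortest witness string.

The helper names are hypothetical.

```python
from collections import deque

# Stand-in alphabet: the characters of "URL: ", "<", and "x" for every other character.
SIGMA = sorted(set("URL: <x"))


def prefix_then_anything(prefix):
    """DFA for prefix . Σ*: state k <= len(prefix) has matched k characters; -1 is dead."""
    n = len(prefix)

    def step(q, ch):
        if q == -1 or q == n:
            return q
        return q + 1 if ch == prefix[q] else -1

    return 0, step, lambda q: q == n


def contains_char(c):
    """DFA for Σ* c Σ*: the state records whether c has been seen."""
    return False, lambda q, ch: q or ch == c, lambda q: q


def lacks_char(c):
    """DFA for strings without c: the state stays True until c is seen."""
    return True, lambda q, ch: q and ch != c, lambda q: q


def intersect(a, b):
    """Product construction: run both DFAs in lockstep; accept when both accept."""
    (sa, da, fa), (sb, db, fb) = a, b
    return (sa, sb), lambda q, ch: (da(q[0], ch), db(q[1], ch)), lambda q: fa(q[0]) and fb(q[1])


def shortest_witness(dfa):
    """Emptiness check by BFS over reachable states; returns a shortest accepted string or None."""
    start, step, accepting = dfa
    seen, queue = {start}, deque([(start, "")])
    while queue:
        q, w = queue.popleft()
        if accepting(q):
            return w
        for ch in SIGMA:
            r = step(q, ch)
            if r not in seen:
                seen.add(r)
                queue.append((r, w + ch))
    return None


echo = prefix_then_anything("URL: ")  # language of the echo node: URL: Σ*
print(repr(shortest_witness(intersect(echo, contains_char("<")))))  # 'URL: <'
print(shortest_witness(intersect(intersect(echo, lacks_char("<")), contains_char("<"))))  # None
```

The first result is a concrete witness of the non-empty intersection. In the second call, a hypothetical sanitizer restricts $www to strings without <, and the intersection becomes empty.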

Relational String Analysis

- Earlier work on string analysis uses multiple single-track DFAs during symbolic reachability analysis
  - One DFA per variable per program location
- Our approach: use one multi-track DFA per program location
  - Each track represents the values of one string variable
- Using multi-track DFAs:
  - Identifies the relations among string variables
  - Improves the precision of the path-sensitive analysis
  - Can be used to prove properties that depend on relations among string variables, e.g., $file = $usr.txt

Multi-track Automata

- Let X (the first track) and Y (the second track) be two string variables
- λ is the padding symbol
- A multi-track automaton that encodes the word equation X = Y.txt:

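[Figure: multi-track automaton for X = Y.txt, with a loop on identical pairs (a,a), (b,b), … followed by a chain of transitions labeled (λ,t), (λ,x), (λ,t)]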
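An aligned two-track word can be checked against X = Y.txt with a four-state automaton. It reads identical pairs while both tracks carry Y. It then reads t, x, t on the X track while the Y track is padded with λ. The sketch below writes each pair as (X symbol, Y symbol), following the track order stated above, and codes the automaton as a loop rather than a transition table. The function names are hypothetical.

```python
LAMBDA = "λ"


def align(*words):
    """Aligned multi-track encoding: pad every word on the right with λ to a common length."""
    n = max(map(len, words))
    return list(zip(*(w + LAMBDA * (n - len(w)) for w in words)))


def accepts_x_eq_y_txt(x, y):
    """Two-track DFA for X = Y.txt, reading pairs (X symbol, Y symbol).

    State 0 loops on identical pairs (a,a), (b,b), ...; states 1-3 read the suffix
    "txt" on track X while track Y is padded; state 3 is the only accepting state."""
    state = 0
    for a, b in align(x, y):
        if state == 0 and a == b:
            continue
        if state < 3 and b == LAMBDA and a == "txt"[state]:
            state += 1
            continue
        return False  # missing transition: reject
    return state == 3


print(align("abtxt", "ab"))               # [('a', 'a'), ('b', 'b'), ('t', 'λ'), ('x', 'λ'), ('t', 'λ')]
print(accepts_x_eq_y_txt("abtxt", "ab"))  # True
print(accepts_x_eq_y_txt("abtx", "ab"))   # False
```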

Alignment

- To conduct relational string analysis, we need to compute the "intersection" of multi-track automata
  - Aligned multi-track automata are closed under intersection
  - In an aligned multi-track automaton, λs are right-justified in all tracks, e.g., abλλ instead of aλbλ
- However, there exist unaligned multi-track automata that are not equivalent to any aligned multi-track automaton
- We propose an alignment algorithm that constructs aligned automata that over- or under-approximate unaligned ones
  - Over-approximation: generates an aligned multi-track automaton that accepts a superset of the language recognized by the unaligned multi-track automaton
  - Under-approximation: generates an aligned multi-track automaton that accepts a subset of the language recognized by the unaligned multi-track automaton
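For a single tuple, alignment is easy: move each track's λs to its right end. The sketch below uses hypothetical helpers to check and produce that form. The hard part is doing the same for the whole language of an unaligned automaton. As stated above, that is not always possible with an aligned automaton, which is why the alignment algorithm approximates.

```python
LAMBDA = "λ"


def is_aligned(tracks):
    """True iff no track has a real symbol after a λ, i.e. padding is right-justified."""
    return all(LAMBDA not in track.rstrip(LAMBDA) for track in tracks)


def right_justify(track):
    """Move every λ of one track to its right end, keeping the real symbols in order."""
    symbols = track.replace(LAMBDA, "")
    return symbols + LAMBDA * (len(track) - len(symbols))


word = ["aλbλ", "abcd"]  # encodes the tuple ("ab", "abcd") with unaligned padding
aligned = [right_justify(t) for t in word]
print(is_aligned(word), aligned, is_aligned(aligned))  # False ['abλλ', 'abcd'] True
```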

Symbolic Reachability Analysis

- Transitions and configurations of a string system can be represented using word equations
- Word equations can be represented/approximated using aligned multi-track automata which are closed under intersection, union, complement and projection
- Operations required for reachability analysis (such as equivalence checking) can be computed on DFAs

Word Equations

- Word equations: Equality of two expressions that consist of concatenation of a set of variables and constants
- Example: X = Y . txt

- Word equations and their combinations (using Boolean connectives) can be expressed using only equations of the form X = Y . c, X = c . Y, c = X . Y, X = Y. Z, Boolean connectives and existential quantification
- Our goal:
- Construct multi-track automata from basic word equations
- The automata should accept tuples of strings that satisfy the equation

- Boolean connectives can be handled using intersection, union and complement
- Existential quantification can be handled using projection


Word Equations to Automata

- Basic equations X = Y . c, X = c . Y, c = X . Y and their Boolean combinations can be represented precisely using multi-track automata
- The size of the aligned multi-track automaton for X = c . Y is exponential in the length of c
- The nonlinear equation X = Y . Z cannot be represented precisely using an aligned multi-track automaton
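The exponential size for X = c . Y shows up when the aligned automaton is explored explicitly. At position t the X track must carry Y[t - |c|], so the automaton has to buffer the last |c| symbols of Y that X has not yet reproduced. The sketch below is our own construction, not the slides' algorithm. It puts X on track 1, uses hypothetical function names, and counts reachable states. The buffer states are pairwise distinguishable, since each requires a different continuation of X. The count therefore grows at least like |Σ|^|c|.

```python
from itertools import product

LAMBDA = "λ"


def step(c, state, x, y):
    """One transition of the aligned two-track automaton for X = c . Y.

    state = (i, pending, y_done): i symbols of c matched on track X so far,
    pending = Y symbols already read that X has not yet reproduced,
    y_done = track Y has switched to padding. Returns None when there is no transition."""
    i, pending, y_done = state
    if x == LAMBDA or (y_done and y != LAMBDA):
        return None  # X is never the shorter track; λ on Y must be right-justified
    if y == LAMBDA:
        y_done = True
    else:
        pending = pending + (y,)
    if i < len(c):
        return (i + 1, pending, y_done) if x == c[i] else None
    if not pending or x != pending[0]:
        return None
    return (i, pending[1:], y_done)


def accepts(c, x, y):
    """Run the automaton on the aligned encoding of (x, y)."""
    n = max(len(x), len(y))
    state = (0, (), False)
    for a, b in zip(x.ljust(n, LAMBDA), y.ljust(n, LAMBDA)):
        state = step(c, state, a, b)
        if state is None:
            return False
    return state[0] == len(c) and not state[1]


def reachable_states(c, sigma):
    """All states reachable from the start state over the alphabet sigma (plus λ)."""
    start = (0, (), False)
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        for x, y in product(list(sigma) + [LAMBDA], repeat=2):
            nxt = step(c, state, x, y)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(accepts("ab", "abba", "ba"))  # True
for k in range(1, 6):
    print(k, len(reachable_states("a" * k, "ab")))
```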

Word Equations to Automata

- When we cannot represent an equation precisely, we can generate an over- or under-approximation of it
  - Over-approximation: the automaton accepts all string tuples that satisfy the equation and possibly more
  - Under-approximation: the automaton accepts only string tuples that satisfy the equation, but possibly not all of them
- We implement a function CONSTRUCT(equation, sign)
  - It takes a word equation and a sign and constructs a multi-track automaton that over-approximates (sign +) or under-approximates (sign −) the equation

Post condition computation

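- During symbolic reachability analysis, we compute the post-conditions of statements using the function CONSTRUCT

Given a multi-track automaton M and an assignment statement X := sexp, Post(M, X := sexp) denotes the post-condition of X := sexp with respect to M:

$$\mathrm{Post}(M, X := sexp) = \big(\exists X .\; M \cap \mathrm{CONSTRUCT}(X' = sexp, +)\big)[X/X']$$

That is, M is intersected with an over-approximation of X' = sexp, the old value of X is projected away, and X' is renamed to X.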

- We implement a symbolic forward reachability computation using the post-condition operations
- It is a least fixpoint computation
- We use widening to achieve convergence
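The post-condition steps (intersect with X' = sexp, project away the old X, rename X' to X) can be mimicked on finite sets of tuples that stand in for the language of a multi-track automaton. This is a simplification. The right-hand side is given as a Python function, so the constraint is exact and the sign plays no role. The names are hypothetical. The example shows that the relation between X and Y survives X := Y.txt.

```python
def post(m, var, sexp):
    """Finite-set stand-in for Post(M, X := sexp).

    m: set of tuples, each a frozenset of (variable, value) pairs (what M would accept).
    sexp: function computing the new value of var from a tuple given as a dict."""
    primed = var + "'"
    # M ∩ CONSTRUCT(X' = sexp): extend each tuple with a primed track holding sexp's value.
    joined = [{**dict(t), primed: sexp(dict(t))} for t in m]
    # ∃X: project away the old value of X.
    projected = [{k: v for k, v in t.items() if k != var} for t in joined]
    # [X/X']: rename the primed track back to X.
    return {frozenset((var if k == primed else k, v) for k, v in t.items()) for t in projected}


M = {frozenset({("X", ""), ("Y", "a")}), frozenset({("X", ""), ("Y", "b")})}
after = post(M, "X", lambda s: s["Y"] + "txt")
print(sorted(sorted(t) for t in after))
# [[('X', 'atxt'), ('Y', 'a')], [('X', 'btxt'), ('Y', 'b')]]
```

A non-relational analysis that kept one set per variable would only know that X ∈ {atxt, btxt} and Y ∈ {a, b}. That also admits the infeasible pair (atxt, b).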

Widening

- The string verification problem is undecidable
  - The forward fixpoint computation is not guaranteed to converge in the presence of loops and recursion
- We compute a sound approximation
  - During the fixpoint computation, we compute an over-approximation of the least fixpoint that corresponds to the reachable states
- We use an automata-based widening operation to over-approximate the fixpoint
  - The widening operation over-approximates the union operations and accelerates the convergence of the fixpoint computation
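Automata widening does not fit in a short example, so the sketch below shows the role of widening in a swapped-in, much simpler domain: intervals over the length of X. For a loop that runs X := X . c, plain iteration produces the never-ending chain [0,0], [0,1], [0,2], …. Interval widening instead jumps the growing upper bound to infinity, and the iteration converges one step later. This is the classic interval widening, not the automata-based operator from the slides. The function names are hypothetical.

```python
import math


def join(a, b):
    """Least upper bound of two length intervals (lo, hi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old, new):
    """Interval widening: a bound that is still moving outward jumps to its extreme."""
    lo = old[0] if new[0] >= old[0] else 0  # lengths are never negative
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)


def analyze_loop(init_len, append_len, use_widening, max_iters=100):
    """Possible lengths of X at the head of: X := c0; while (...) { X := X . c },
    where |c0| = init_len and |c| = append_len. Returns (interval, iterations),
    or (None, max_iters) if the iteration has not stabilized."""
    entry = (init_len, init_len)
    current = entry
    for i in range(max_iters):
        body = (current[0] + append_len, current[1] + append_len)  # effect of X := X . c
        nxt = join(entry, body)
        if use_widening:
            nxt = widen(current, nxt)
        if nxt == current:
            return current, i
        current = nxt
    return None, max_iters


print(analyze_loop(0, 1, use_widening=False))  # (None, 100)
print(analyze_loop(0, 1, use_widening=True))   # ((0, inf), 1)
```

The widened result is sound but coarse. Trading precision for guaranteed termination is the same trade-off the automata-based widening makes.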

Summarization

- We developed techniques for handling function calls using summarization
- We generate a transducer that is the summary of a function
- It represents a relation between the arguments of the function and the value it returns
- We generate a multi-track automaton for the function summary
- We also generate the function summary using forward fixpoint computation and widening

- We use the function summaries during reachability analysis to handle function calls

Symbolic Automata Representation

- We used the MONA DFA Package for automata manipulation
- [Klarlund and Møller, 2001]

- Compact representation:
  - The transition relation of the DFA is represented as a multi-terminal BDD (MTBDD)
- Exploits the MTBDD structure in the implementation of DFA operations
  - Union, intersection, and emptiness checking
  - Projection and minimization
- Cannot handle nondeterminism:
  - We extended the alphabet with dummy bits to encode nondeterminism
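The dummy-bit idea can be sketched as follows. Characters are encoded as 8-bit vectors, which is an assumption since the slides do not give the encoding. A symbol with several successors gets k extra bits, with 2^k at least the number of successors, so each extended symbol has exactly one successor. Projecting the extra bits away recovers the nondeterministic automaton. This only illustrates the idea; it does not use MONA's API, and the function names are hypothetical.

```python
def encode(ch, bits=8):
    """Encode a character as a fixed-width bit vector, most significant bit first."""
    code = ord(ch)
    if code >= 2 ** bits:
        raise ValueError(f"{ch!r} does not fit in {bits} bits")
    return tuple((code >> i) & 1 for i in reversed(range(bits)))


def add_dummy_bits(nfa_delta, bits=8):
    """Make an NFA transition relation deterministic by extending every symbol with k
    dummy bits, where 2**k is at least the largest number of successors of any
    (state, character) pair. Returns the deterministic transition map and k."""
    branching = max(len(targets) for targets in nfa_delta.values())
    k = (branching - 1).bit_length()  # smallest k with 2**k >= branching
    dfa_delta = {}
    for (state, ch), targets in nfa_delta.items():
        targets = sorted(targets)
        if not targets:
            continue
        for j in range(2 ** k):
            dummy = tuple((j >> i) & 1 for i in reversed(range(k)))
            # Surplus bit patterns reuse the last successor, so each one still has a target.
            dfa_delta[(state, encode(ch, bits) + dummy)] = targets[min(j, len(targets) - 1)]
    return dfa_delta, k


NFA = {("q0", "a"): {"q1", "q2"}, ("q1", "b"): {"q1"}}
DFA, K = add_dummy_bits(NFA)
print(K)                                  # 1
print(DFA[("q0", encode("a") + (0,))])    # q1
print(DFA[("q0", encode("a") + (1,))])    # q2
```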

Stranger: A String Analysis Tool

Stranger is available at:

www.cs.ucsb.edu/~vlab/stranger

- Uses Pixy [Jovanovic et al., 2006] as a PHP front end
- Uses MONA [Klarlund and Møller, 2001] automata package for automata manipulation

[Figure: Stranger architecture. The PHP program and attack patterns go in; the Pixy Front End (Parser, Dependency Analyzer) produces a CFG and dependency graphs; the String Analyzer performs symbolic string analysis using String/Automata Operations and DFAs from the Automata Based String Manipulation Library (Stranger Automata, MONA Automata Package); the output is a String Analysis Report (Vulnerability Signatures).]

Experiments

- XSS (Cross-Site Scripting) benchmarks (contain vulnerabilities)
  - We check whether the input to a sensitive function can contain the string <script
  - S1: MyEasyMarket-4.1, trans.php (218)
  - S2: PBLguestbook-1.32, pblguestbook.php (1210)
  - S3: Aphpkb-0.71, saa.php (87)
  - S4: BloggIT 1.0, admin.php (23)
- MFE (Malicious File Execution) benchmarks (do not contain vulnerabilities):
  - We check whether the retrieved files and the external inputs are consistent with the security policy
  - M1: PBLguestbook-1.32, pblguestbook.php (536)
  - M2, M3: MyEasyMarket-4.1, prod.php (94, 189)
  - M4, M5: php-fusion-6.01, db_backup.php (111), forums_prune.php (28)

Case Study

- Schoolmate 1.5.4
- Number of PHP files: 63
- Lines of code: 8181

- Forward Analysis results
- After manual inspection we found the following:

Case Study – False Positives

- Why false positives?
  - Path insensitivity: 39
    - The path to the vulnerable program point is not feasible
  - Unmodeled built-in PHP functions: 6
  - Unfound user-written functions: 3
    - PHP programs have more than one execution entry point
- We can remove all these false positives by extending our analysis to be path sensitive and by modeling more PHP functions

Case Study - Sanitization

- We patched all actual vulnerabilities by adding sanitization routines
- We ran Stranger a second time
- Stranger proved that our patches are correct with respect to the attack pattern we are using

Related Work: String Analysis

- String analysis based on context free grammars: [Christensen et al., SAS’03] [Minamide, WWW’05]
- String analysis based on symbolic/concolic execution: [Bjorner et al., TACAS’09]
- Bounded string analysis: [Kiezun et al., ISSTA’09]
- Automata based string analysis: [Xiang et al., COMPSAC’07] [Shannon et al., MUTATION’07]
- Application of string analysis to web applications: [Wassermann and Su, PLDI’07, ICSE’08] [Halfond and Orso, ASE’05, ICSE’06]

Related Work

- Size Analysis
- Size analysis: [Hughes et al., POPL’96] [Chin et al., ICSE’05] [Yu et al., FSE’07] [Yang et al., CAV’08]
- Composite analysis: [Bultan et al., TOSEM’00] [Xu et al., ISSTA’08] [Gulwani et al., POPL’08] [Halbwachs et al., PLDI’08]

- Vulnerability Signature Generation
- Test input/Attack generation: [Wassermann et al., ISSTA’08] [Kiezun et al., ICSE’09]
- Vulnerability signature generation: [Brumley et al., S&P’06] [Brumley et al., CSF’07] [Costa et al., SOSP’07]

Our Other String Analysis Publications

- Yu et al. Stranger: An Automata-based String Analysis Tool for PHP [TACAS’10]
- Yu et al. Generating Vulnerability Signatures for String Manipulating Programs Using Automata-based Forward and Backward Symbolic Analyses [ASE’09]
- Yu et al. Symbolic String Verification: Combining String Analysis and Size Analysis [TACAS’09]
- Yu et al. Symbolic String Verification: An Automata-based Approach [SPIN’08]

Current and Future Work

- Vulnerability signature generation
- A characterization of all the inputs that might exploit a vulnerability

- Automated sanitization generation
- Automatically fixing a vulnerability by modifying the input in a minimal way

- Client side string analysis
- JavaScript
