Intrusion Prevention and Detection for Web Databases

Intrusion Prevention and Detection for Web Databases by Ehud Gudes and Alex Roichman June 2008

Outline • Web Applications and Databases • Architecture • Attack surface • Multilayered security solution for web databases: • Parameterized View prevention method • Session Key parameter • Rolling Key parameter • Implementation and analysis • DIWeDa – intrusion detection method • Architecture • Profile building algorithm • Intrusion detection algorithm • Implementation and evaluation • Summary

Two-tier architecture • Users are fixed and known • Users’ number is limited • All transactions are passed via the same connection and users connect directly to the DB Traditional DB

Traditional DB Access Control • DB can identify and authenticate its users • DB can authorize users by traditional user/role-based access control • It is quite easy to follow up single user transactions to seek signs of intrusion • Views can be used to determine for a user the only part of the database that interests her

Web Databases Three-tier architecture Users are casualand unknown Users’ number is not limited Users do not connect directly to the DB

Connection Pooling • Different web users can run their SQL statements on the same DB connection and one user can run her SQL statements on different connections • This technique contributes to application efficiency since the time to open and close the connection is saved per each request • It has serious implications on the database’s access control mechanism

Web DB Access Control • Web DB does not identify the real application user who accesses it • The only user accessing the database is the user of the web application server – most often with full access privileges (administrator or “super-user”) • No user-based access control can be applied • No or very limited RBAC can be applied • The principle of minimal privilege is violated • No more fine-grained access control to the web DB exists

Web DB is vulnerable • There is no way to limit the web user privileges at the database level of the web databases. • There are still many secured web applications, but their security can be achieved only by application means (writing lines of code that implement security policies) and not by database access control means (GRANT/REVOKE and VIEWS) • This situation is very problematic: • Implementing access control by writing code is a time consuming task • Enterprise may have one DB but many applications, then access control is distributed and in many cases not consistent • Programmers must be security specialists • There is no confidence that all accesses to DB are authorized • As a result: "70% of websites are at immediate risk of being hacked!" (http://www.acunetix.com)

Attack example: SQL Injection • In many Web application the SQL sentence is structured as a string and the user’s parameter is concatenated to this string: strSQL= "SELECT Salary FROM Salary_Table WHERE Employee_No = 123 AND Salary_Date = '" + dateParam + "'" • But the intruder can type `01.2007' or '1' = '1` and retrieve the entire table: SELECT Salary FROM Salary_Table WHERE Employee_No = 123 AND Salary_Date = '01/2007' OR '1' = '1'

Attack example: Business Logic Violation • Another attack on web databases is the Business Logic Violation (BLV)Attack. • For example: an enterprise can require that each buyer must key up her paying details prior to buying some book in an online bookstore application. Buying without entering a payment details will violate a business rule of an enterprise.

Web attacks reports • Many web applications have serious security holes and vulnerability reports found on the Internet show how the existing situation is critical: • On November 01, 2005 a high school student used SQL injection to break into the site of a Taiwanese information security magazine and steal customer's information (http://www.webappsec.org/projects/whid/list_id_2005-46.shtml). • On January 13, 2006 hackers broke into a Rhode Island government web site and stole credit card data from individuals who have done business online with state agencies (http://www.webappsec.org/projects/whid/list_id_2006-3.shtml). • On March 29, 2007 a SQL injection flaw in official Indian government site was discovered (http://www.webappsec.org/projects/whid/list_id_2007-12.shtml). • On June 29, 2007 hackers defaces Microsoft U.K. web page by using an SQL injection attack (http://www.cgisecurity.com/2007/06/27). • On August 12, 2007 the United Nations web site was defaced by using SQL injection (http://hackademix.net/2007/08/12/united-nations-vs-sql-injections/). • …

Web attacks reports –SQL Injection is very popular • SQL injection attacks are extremely prevalent, and ranked as the second most common form of attack on web applications in 2006 in CVE (Common Vulnerabilities and Exposures list). • The percentage of these attacks among the overall number of attacks reported rose from 5.5% in 2004 to 14% in 2006 (http://cve.mitre.org). • Recent reports suggest that a large number of applications on the web are vulnerable to SQL injection attacks (http://www.darkreading.com/document.asp?docid=103774&WT.svl=news1_3), • The number of attacks are on the increase, and SQL Injection is on the list of most prevalent forms of attack (http://www.computerweekly.com/Articles/2004/04/16/201840/Top+five+threats.htm) • The recent SQL Injection attack on CardSystems Solutions that exposed several hundreds of thousands of credit card numbers is an example of how such attack can victimize an organization and members of the general public (http://www.secureworks.com/press/20060718-sql.html).

Related work • Conventional methods for providing protection in databases rely heavily on the identity of the entity accessing the database • Several suggestions were published in the literature to prevent SQL injection attacks • There is research that proposes to refer to the structure of each SQL statement possible in the system and to fingerprint that structure • An additional approach is to refer to some interesting properties of each SQL sentence such as referenced tables and fields • Another approach to database IDS is not to build profiles for each SQL sentence, but for each database user • Another approach is to search for data dependencies among the data items in a database

Related work criticism • The proposed prevention methods are applicative and cannot be implemented by the database built-in mechanisms; they try to shield databases from attacks instead of making databases resistant to them • The proposed protection methods do not fit web databases since they do not propose any algorithm for dealing with web session traceability problem; they do not work at the session level and their approach is not role-based – thus they cannot detect many attacks from web applications

Our Approach • We propose the two layered security model for web databases: • The purpose of the first line of defense is to prevent intrusions. • The purpose of the second line of defense is to detect attacks which elude the prevention layer. • We use the Parameterized Views method - a novel fine-grained access control mechanism at the first prevention line of defense. • We use DIWeDa (Detecting Intrusion in Web Databases) as our method of intrusion detection at the second line of defense. • Two layers work together to achieve the highest level of web database protection.

Prevention - the first line of defense • The Parameterized View method solves the problems of absence of web database internal access control and enable using of views as a means of access restriction • This method is capable of preventing attacks to the DB by the existing native database protection mechanisms • It is attack-independent and significantly decreases the database attack surface • It enables real-life development of secure Internet database by using commercially available tools

Parameterized View • Our concept is based on the use of parameterized views as the means to supervise the accesses to the DB from web systems • The parameterized view transfers the user's identity to the DB and the view displays the relevant data to this user accordingly, thus providing fine-grained access control to web DB • Example of the Parameterized View definition: CREATE VIEW Student_Marks_View WITH pStudent_No SELECT * FROM Student_Marks_Table WHERE Student_No = :pStudent_No

Traditional Parameterized Views without random parameters SELECT * FROM Student_Marks_View(1) WHERE Course_No = 12345 • Example of select from the parameterized view: • But if instead of the course number an intruder types: 12345 UNION Select * From Student_Marks_View(2) then the SQL injection will still be possible: • The solution to this problem is not to use the explicit user identity in the SQL statement but instead use a run-time generated identifier which will be very difficult to fake. SELECT * FROM Student_Marks_View(1) WHERE Course_No = 12345 UNION SELECT * FROM Student_Marks_View(2)

Parameterized views flow of events

Our Parameterized View • The main requirement is that the parameter will be difficult to fake • The application will be authorized to parameterized views only • In the parameterized view, the parameter will contain the identity of the user • We present two design solutions for the parameterized views: • Application Session Key parameter • Rolling Key parameter • For details See our SACMAT2007 paper

Detection - the second line of defense • Although access control can prevent many attacks on web databases, it cannot prevent all of them. • One kind of an attack on web databases which cannot be prevented is the Business Logic Violation(BLV) • This attack cannot be prevented ether by a database not by the parameterized views since the existing database/views access control can grant or revoke access to resources only according to the accesor identity/role/parameter and cannot rely on the business logic of an enterprise. • If many data-centric attacks cannot be affectively prevented, it make a sense at least to detect them.

Detecting BLV at the DB level vs. Application level • BLV can be detected either at the application or at the database level, but- • An enterprise in general has several applications, but one database. • Applications are changed frequently, but business rules which are seen at the database level are stable. • Thus it is preferable to have only one IDS at the database level to enforce stable business rules and not to have IDS for each application and to cope with continues application's changes. • The conclusion – it is preferable to detect BLV at the database level by a web database Intrusion Detection System.

DIWeDa – our solution for Web Database Intrusion Detection Systems • Intrusion Detection Systems (IDS) for network and operating systems have existing for over 20 years. • IDS for databases is relatively new field of research that has surfaced in the last few years. • IDS for web databases is almost not exist. • We present DIWeDa, which is able to detect not only most known attacks such as SQL injections, but also more complicated and complex kinds of attacks such as business logic violations.

DIWeDa – role based approach • Since different roles of an enterprise have different authorizations, the best strategy for web database IDS is to build profiles not per an enterprise but per enterprise roles. • Role-based models are widely used for old desktop applications, but most of the web applications do not use roles: • the real user of a web session is unknown at the database level due to connection pooling issues, thus it is impossible to apply role base access control to web databases; • sometimes the actual role a user uses is determined dynamically only at run time. • DIWeDa identifies database roles from the learnt profiles and look for intrusions from one role to another.

Employing many intrusion classifiers • We use data mining techniques to detect intrusions and employ two different intrusion classifiers: • SQL content classifier (this paper) • SQL parameters classifier • The combination of intrusion classifiers help us to achieve very high accuracy and preciseness in detecting of attacks. • The combination of intrusion classifiers was previously used for network or operation systems IDS, but we first use it for detecting intrusions for web databases.

DIWeDa’s System Architecture

Profile Builder algorithm • During the training phase DIWeDa learns an application behavior and builds profiles which will be used later during the detection phase. • Each DIWeDa’s profile consists of: • the SQL Fingerprint-Set that represents the enterprise SQL space • Statistical characteristics of parameters for each fingerprint • The Cluster-Set that represents the SQL content of each enterprise role

Profile Builder: Fingerprint-Set • SQL fingerprint represents the structure of a SQL sentence. • DIWeDa learns all possible fingerprints of an application and save them in the Fingerprint-Set. • Some attacks, such as SQL injection, always change the structure of SQL, so if a session contains an SQL statement without known fingerprint – this session might be under attack. • But more complicated attacks such as logic violations cannot be detected only by looking for a statement level SQL structure anomaly- we also must look for a session level SQL content anomaly.

Profile Builder: Session Vector • Each application session can be presented by its SQL Session Vector. • A Session Vector is a binary vector SV with the length equal to the number of fingerprints in the application, where the ith bit is 1 if the application session submits SQL with the ith fingerprint, else bit i is 0. (note order is immaterial) • For example, if some application session runs SQL statements with corresponding fingerprints 1, 3, 5 and there are 7 different fingerprints at all, we have the following session vector: SV = {1, 0, 1, 0, 1, 0, 0}

Profile Builder: SQL space • The Session Vector enables us to formally define the session SQL contents. • We can think about the application SQL space as an n-dimensional space, where n is the number of fingerprints for the application. • Then each session's SQL content can be seen as a vector (or a point with n coordinates) in the n-dimensional space.

How Sessions are identified? Connection pooling makes identifying and separating under sessions difficult But with parameterized views the SQL statements issued by a user are uniquely identified by the unique random key of the parameterized view. Thus, sessions can be identified and monitored.

Profile Builder: distribution in SQL space • If a session's SQL contentwas absolutely random, then the distribution of vectors in the space should be uniform. • But in reality, a session's SQLcontent is not random- this means that a session can submit only SQL statements that a session's role is supposed to submit. • The groups of points in the last figure represent different access roles and show that SQL contents of sessions from the same role are very similar, while SQL contents of sessions from different roles are far apart.

Profile Builder: looking for intrusions • We will find clusters (application roles) by using data mining techniques • We will look for intrusion by looking for sessions with vectors which are far from the found clusters

Profile Builder: Cluster • The Cluster is a group of highly similar Session Vectors which belong to the same role. • The Cluster Centroid is a vector CC with vector values that are the respective means of the cluster vectors. • The distance D between two clusters represented by their centroids is computed as:

Profile Builder: Clustering Algorithm • Now we can use the data mining technique (hierarchical clustering algorithm) and cluster application sessions of the same role. • This algorithm builds a tree which is called the cluster Dendrogram tree.

Profile Builder: Building Dendrogram Hierarchical_Clustering_Algorithm //Building Dendrogram tree { • For each application session build its Session Vector. • Start with each Session Vector as a separate cluster. • Save all clusters received at this stage in a Cluster-Set and initialize i to 1. • Select two closest clusters to merge into a single cluster. • Compute the new cluster centroid for the merged cluster. • Save all clusters received at this stage in a corresponding Cluster-Set and advance i by 1. • Repeat steps 4-6 until we get a single cluster. }

Profile Builder: Cluster-Set • Now we will choose the best Cluster-Set (layer of the dendrogram tree) to serve as a profile for DIWeDa. • We recommend to choose a Cluster-Set with the Minimum Total Distance to serve as the profile for DIWeDa. • This approach finds a cluster-set with small specific clusters that are far from a global centroid. • By using this approach, DIWeDa will find specific separated roles (clusters).

Profile Builder: Cluster-Set with Minimum Total Distance • The Minimum Total Distance is computed by the following formula: • where Intra-Cluster Distance is computed by: • and Inter-Cluster Distance is computed by:

Profile Builder: Algorithm To summarize, our algorithm of building cluster-based profile is as follows: Build_Cluster_Based_Profile { • Find all application fingerprints with support > SIR and save them in Fingerprint-Set • Run Build_Dendrogram • Select the appropriate set of clusters (layer in Dendrogram) with the Minimum Total Distance and save it in Cluster-Set • Delete Clusters from Cluster-Set with a small Support }

Detection Phase: Session Anomaly by SQL content • The probability of an analyzed session represented by its session vector SV to have an abnormal SQL content is defined by the formula: where MD (Minimum Distance) is the distance between SV and the closest cluster centroid from the cluster-set and NUS is the number of unexpected statements in the analyzed session. • This formula is based on the following assumption: • The further a session vector is from any existing cluster, the more abnormal a session is. • The more unexpected statement a session has, the more abnormal a session is.

Detection Phase: Intrusion Threshold • Any session with probability to be abnormal greater than the Session Intrusion Threshold will be classified as intrusion. • The Session Intrusion Threshold is the IDS sensitivity and is represented by a number from the range [0, 1], where each session with the probability to be abnormal greater than this threshold will be classified as intrusion. • The interesting thing that was learned from our evaluation is that the best threshold is equal to the Cluster-SetMaximum Radius.

Detection Phase: Maximum Radius • The Cluster-Set Maximum Radius as the maximum between Cluster Maximum Distances over all clusters in the cluster-set. • The idea is that any session with a vector inside a cluster will be classified as legal and outside a cluster – as intrusion

DIWeDa: Implementation and Evaluation • We implemented a prototype of the proposed system and used it to evaluate the system. • The prototype was developed with C# and SQL Server 2005. • The online bookstore application was used as a web application benchmark. • The profiles were built by manually operating this application. • The analyzed sessions were created by synthetic data

Inputs to our system - DB log

Test case patterns results

DIWeDa: Implementation and Evaluation • We found that our proposal for the system threshold of 0.07 was the best as can be seen from the following ROC. • At this threshold we achieved TPR = 92.5% and FPR=5%. • The following table shows the evaluation summary:

Evaluation summary • Our True Positive Rate was about 92.5% • Our False Positive Rate was about 5%

Future work • Improving accuracy • increasing TPR and decreasing FPR may be achieved by employing more than two intrusion classifiers and by finding a smart algorithm of classifiers fusion • Adapting the proposed algorithms to new kind of web attacks

Summary • Multilayered security model for web databases: • the novel fine-grained access control mechanism as the first prevention line of defense: • general attack independent prevention mechanism • prevention at DB layer • user/session traceability at DB layer • the session anomaly detection mechanism as the second intrusion detection line of defense: • role-based approach • detection of attacks which can be seen only at session level • detection accuracy due to the combination of different intrusion classifiers • observing the normal working application with no assumption that the learning period is clear from attacks • Our solution proposes a very high protection level for existing web database environments, and this is the main significant advantage of this research.

More on Parameterized Views

Intrusion Prevention and Detection for Web Databases