Intrusion prevention and detection for web databases
1 / 67

Intrusion Prevention and Detection for Web Databases - PowerPoint PPT Presentation

  • Uploaded on

Intrusion Prevention and Detection for Web Databases. by Ehud Gudes and Alex Roichman June 2008. Outline. Web Applications and Databases Architecture Attack surface Multilayered security solution for web databases: Parameterized View prevention method Session Key parameter

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Intrusion Prevention and Detection for Web Databases' - dorie

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Intrusion prevention and detection for web databases

Intrusion Prevention and Detection for Web Databases


Ehud Gudes and Alex Roichman

June 2008


  • Web Applications and Databases

    • Architecture

    • Attack surface

  • Multilayered security solution for web databases:

    • Parameterized View prevention method

      • Session Key parameter

      • Rolling Key parameter

      • Implementation and analysis

    • DIWeDa – intrusion detection method

      • Architecture

      • Profile building algorithm

      • Intrusion detection algorithm

      • Implementation and evaluation

  • Summary

Traditional db

  • Two-tier architecture

  • Users are fixed and known

  • Users’ number is limited

  • All transactions are passed via the same connection and users connect directly to the DB

Traditional DB

Traditional db access control
Traditional DB Access Control

  • DB can identify and authenticate its users

  • DB can authorize users by traditional user/role-based access control

  • It is quite easy to follow up single user transactions to seek signs of intrusion

  • Views can be used to determine for a user the only part of the database that interests her

Web databases

Web Databases

Three-tier architecture

Users are casualand unknown

Users’ number is not limited

Users do not connect directly to the DB

Connection pooling
Connection Pooling

  • Different web users can run their SQL statements on the same DB connection and one user can run her SQL statements on different connections

    • This technique contributes to application efficiency since the time to open and close the connection is saved per each request

    • It has serious implications on the database’s access control mechanism

Web db access control
Web DB Access Control

  • Web DB does not identify the real application user who accesses it

  • The only user accessing the database is the user of the web application server – most often with full access privileges (administrator or “super-user”)

  • No user-based access control can be applied

  • No or very limited RBAC can be applied

  • The principle of minimal privilege is violated

  • No more fine-grained access control to the web DB exists

Web db is vulnerable
Web DB is vulnerable

  • There is no way to limit the web user privileges at the database level of the web databases.

  • There are still many secured web applications, but their security can be achieved only by application means (writing lines of code that implement security policies) and not by database access control means (GRANT/REVOKE and VIEWS)

  • This situation is very problematic:

    • Implementing access control by writing code is a time consuming task

    • Enterprise may have one DB but many applications, then access control is distributed and in many cases not consistent

    • Programmers must be security specialists

    • There is no confidence that all accesses to DB are authorized

  • As a result: "70% of websites are at immediate risk of being hacked!" (

Attack example sql injection
Attack example: SQL Injection

  • In many Web application the SQL sentence is structured as a string and the user’s parameter is concatenated to this string:

strSQL= "SELECT Salary FROM Salary_Table WHERE Employee_No = 123

AND Salary_Date = '" + dateParam + "'"

  • But the intruder can type `01.2007' or '1' = '1` and retrieve the entire table:

SELECT Salary FROM Salary_Table

WHERE Employee_No = 123

AND Salary_Date = '01/2007' OR '1' = '1'

Attack example business logic violation
Attack example: Business Logic Violation

  • Another attack on web databases is the Business Logic Violation (BLV)Attack.

    • For example: an enterprise can require that each buyer must key up her paying details prior to buying some book in an online bookstore application. Buying without entering a payment details will violate a business rule of an enterprise.

Web attacks reports
Web attacks reports

  • Many web applications have serious security holes and vulnerability reports found on the Internet show how the existing situation is critical:

    • On November 01, 2005 a high school student used SQL injection to break into the site of a Taiwanese information security magazine and steal customer's information (

    • On January 13, 2006 hackers broke into a Rhode Island government web site and stole credit card data from individuals who have done business online with state agencies (

    • On March 29, 2007 a SQL injection flaw in official Indian government site was discovered (

    • On June 29, 2007 hackers defaces Microsoft U.K. web page by using an SQL injection attack (

    • On August 12, 2007 the United Nations web site was defaced by using SQL injection (

Web attacks reports sql injection is very popular
Web attacks reports –SQL Injection is very popular

  • SQL injection attacks are extremely prevalent, and ranked as the second most common form of attack on web applications in 2006 in CVE (Common Vulnerabilities and Exposures list).

  • The percentage of these attacks among the overall number of attacks reported rose from 5.5% in 2004 to 14% in 2006 (

  • Recent reports suggest that a large number of applications on the web are vulnerable to SQL injection attacks (,

  • The number of attacks are on the increase, and SQL Injection is on the list of most prevalent forms of attack (

  • The recent SQL Injection attack on CardSystems Solutions that exposed several hundreds of thousands of credit card numbers is an example of how such attack can victimize an organization and members of the general public (

Related work
Related work

  • Conventional methods for providing protection in databases rely heavily on the identity of the entity accessing the database

  • Several suggestions were published in the literature to prevent SQL injection attacks

  • There is research that proposes to refer to the structure of each SQL statement possible in the system and to fingerprint that structure

  • An additional approach is to refer to some interesting properties of each SQL sentence such as referenced tables and fields

  • Another approach to database IDS is not to build profiles for each SQL sentence, but for each database user

  • Another approach is to search for data dependencies among the data items in a database

Related work criticism
Related work criticism

  • The proposed prevention methods are applicative and cannot be implemented by the database built-in mechanisms; they try to shield databases from attacks instead of making databases resistant to them

  • The proposed protection methods do not fit web databases since they do not propose any algorithm for dealing with web session traceability problem; they do not work at the session level and their approach is not role-based – thus they cannot detect many attacks from web applications

Our approach
Our Approach

  • We propose the two layered security model for web databases:

  • The purpose of the first line of defense is to prevent intrusions.

  • The purpose of the second line of defense is to detect attacks which elude the prevention layer.

  • We use the Parameterized Views method - a novel fine-grained access control mechanism at the first prevention line of defense.

  • We use DIWeDa (Detecting Intrusion in Web Databases) as our method of intrusion detection at the second line of defense.

  • Two layers work together to achieve the highest level of web database protection.

Prevention the first line of defense
Prevention - the first line of defense

  • The Parameterized View method solves the problems of absence of web database internal access control and enable using of views as a means of access restriction

  • This method is capable of preventing attacks to the DB by the existing native database protection mechanisms

  • It is attack-independent and significantly decreases the database attack surface

  • It enables real-life development of secure Internet database by using commercially available tools

Parameterized view
Parameterized View

  • Our concept is based on the use of parameterized views as the means to supervise the accesses to the DB from web systems

  • The parameterized view transfers the user's identity to the DB and the view displays the relevant data to this user accordingly, thus providing fine-grained access control to web DB

  • Example of the Parameterized View definition:

CREATE VIEW Student_Marks_View

WITH pStudent_No

SELECT * FROM Student_Marks_Table

WHERE Student_No = :pStudent_No

Traditional parameterized views without random parameters
Traditional Parameterized Views without random parameters


FROM Student_Marks_View(1)

WHERE Course_No = 12345

  • Example of select from the parameterized view:

  • But if instead of the course number an intruder types: 12345 UNION Select * From Student_Marks_View(2) then the SQL injection will still be possible:

  • The solution to this problem is not to use the explicit user identity in the SQL statement but instead use a run-time generated identifier which will be very difficult to fake.


FROM Student_Marks_View(1)

WHERE Course_No = 12345



FROM Student_Marks_View(2)

Parameterized views flow of events
Parameterized views flow of events

Our parameterized view
Our Parameterized View

  • The main requirement is that the parameter will be difficult to fake

  • The application will be authorized to parameterized views only

  • In the parameterized view, the parameter will contain the identity of the user

  • We present two design solutions for the parameterized views:

    • Application Session Key parameter

    • Rolling Key parameter

  • For details See our SACMAT2007 paper

Detection the second line of defense
Detection - the second line of defense

  • Although access control can prevent many attacks on web databases, it cannot prevent all of them.

  • One kind of an attack on web databases which cannot be prevented is the Business Logic Violation(BLV)

  • This attack cannot be prevented ether by a database not by the parameterized views since the existing database/views access control can grant or revoke access to resources only according to the accesor identity/role/parameter and cannot rely on the business logic of an enterprise.

  • If many data-centric attacks cannot be affectively prevented, it make a sense at least to detect them.

Detecting blv at the db level vs application level
Detecting BLV at the DB level vs. Application level

  • BLV can be detected either at the application or at the database level, but-

    • An enterprise in general has several applications, but one database.

    • Applications are changed frequently, but business rules which are seen at the database level are stable.

  • Thus it is preferable to have only one IDS at the database level to enforce stable business rules and not to have IDS for each application and to cope with continues application's changes.

  • The conclusion – it is preferable to detect BLV at the database level by a web database Intrusion Detection System.

Diweda our solution for web database intrusion detection systems
DIWeDa – our solution for Web Database Intrusion Detection Systems

  • Intrusion Detection Systems (IDS) for network and operating systems have existing for over 20 years.

  • IDS for databases is relatively new field of research that has surfaced in the last few years.

  • IDS for web databases is almost not exist.

  • We present DIWeDa, which is able to detect not only most known attacks such as SQL injections, but also more complicated and complex kinds of attacks such as business logic violations.

Diweda role based approach
DIWeDa – role based approach

  • Since different roles of an enterprise have different authorizations, the best strategy for web database IDS is to build profiles not per an enterprise but per enterprise roles.

  • Role-based models are widely used for old desktop applications, but most of the web applications do not use roles:

    • the real user of a web session is unknown at the database level due to connection pooling issues, thus it is impossible to apply role base access control to web databases;

    • sometimes the actual role a user uses is determined dynamically only at run time.

  • DIWeDa identifies database roles from the learnt profiles and look for intrusions from one role to another.

Employing many intrusion classifiers
Employing many intrusion classifiers

  • We use data mining techniques to detect intrusions and employ two different intrusion classifiers:

    • SQL content classifier (this paper)

    • SQL parameters classifier

  • The combination of intrusion classifiers help us to achieve very high accuracy and preciseness in detecting of attacks.

  • The combination of intrusion classifiers was previously used for network or operation systems IDS, but we first use it for detecting intrusions for web databases.

Diweda s system architecture
DIWeDa’s System Architecture

Profile builder algorithm
Profile Builder algorithm

  • During the training phase DIWeDa learns an application behavior and builds profiles which will be used later during the detection phase.

  • Each DIWeDa’s profile consists of:

    • the SQL Fingerprint-Set that represents the enterprise SQL space

    • Statistical characteristics of parameters for each fingerprint

    • The Cluster-Set that represents the SQL content of each enterprise role

Profile builder fingerprint set
Profile Builder: Fingerprint-Set

  • SQL fingerprint represents the structure of a SQL sentence.

  • DIWeDa learns all possible fingerprints of an application and save them in the Fingerprint-Set.

  • Some attacks, such as SQL injection, always change the structure of SQL, so if a session contains an SQL statement without known fingerprint – this session might be under attack.

  • But more complicated attacks such as logic violations cannot be detected only by looking for a statement level SQL structure anomaly- we also must look for a session level SQL content anomaly.

Profile builder session vector
Profile Builder: Session Vector

  • Each application session can be presented by its SQL Session Vector.

  • A Session Vector is a binary vector SV with the length equal to the number of fingerprints in the application, where the ith bit is 1 if the application session submits SQL with the ith fingerprint, else bit i is 0. (note order is immaterial)

  • For example, if some application session runs SQL statements with corresponding fingerprints 1, 3, 5 and there are 7 different fingerprints at all, we have the following session vector:

    SV = {1, 0, 1, 0, 1, 0, 0}

Profile builder sql space
Profile Builder: SQL space

  • The Session Vector enables us to formally define the session SQL contents.

  • We can think about the application SQL space as an n-dimensional space, where n is the number of fingerprints for the application.

  • Then each session's SQL content can be seen as a vector (or a point with n coordinates) in the n-dimensional space.

How sessions are identified
How Sessions are identified?

Connection pooling makes identifying and separating under sessions difficult

But with parameterized views the SQL statements issued by a user are uniquely identified by the unique random key of the parameterized view. Thus, sessions can be identified and monitored.

Profile builder distribution in sql space
Profile Builder: distribution in SQL space

  • If a session's SQL contentwas

    absolutely random, then the distribution

    of vectors in the space should be uniform.

  • But in reality, a session's SQLcontent

    is not random- this means that

    a session can submit only SQL statements

    that a session's role is supposed to submit.

  • The groups of points in the last figure represent different access roles and show that SQL contents of sessions from the same role are very similar, while SQL contents of sessions from different roles are far apart.

Profile builder looking for intrusions
Profile Builder: looking for intrusions

  • We will find clusters (application roles) by using data mining techniques

  • We will look for intrusion by looking for sessions with vectors which are far from the found clusters

Profile builder cluster
Profile Builder: Cluster

  • The Cluster is a group of highly similar Session Vectors which belong to the same role.

  • The Cluster Centroid is a vector CC with vector values that are the respective means of the cluster vectors.

  • The distance D between two clusters represented by their centroids is computed as:

Profile builder clustering algorithm
Profile Builder: Clustering Algorithm

  • Now we can use the data mining technique (hierarchical clustering algorithm) and cluster application sessions of the same role.

  • This algorithm builds a tree which is called the cluster Dendrogram tree.

Profile builder building dendrogram
Profile Builder: Building Dendrogram


//Building Dendrogram tree


  • For each application session build its Session Vector.

  • Start with each Session Vector as a separate cluster.

  • Save all clusters received at this stage in a Cluster-Set and initialize i to 1.

  • Select two closest clusters to merge into a single cluster.

  • Compute the new cluster centroid for the merged cluster.

  • Save all clusters received at this stage in a corresponding Cluster-Set and advance i by 1.

  • Repeat steps 4-6 until we get a single cluster.


Profile builder cluster set
Profile Builder: Cluster-Set

  • Now we will choose the best Cluster-Set (layer of the dendrogram tree) to serve as a profile for DIWeDa.

  • We recommend to choose a Cluster-Set with the Minimum Total Distance to serve as the profile for DIWeDa.

  • This approach finds a cluster-set with small specific clusters that are far from a global centroid.

  • By using this approach, DIWeDa will find specific separated roles (clusters).

Profile builder cluster set with minimum total distance
Profile Builder: Cluster-Set with Minimum Total Distance

  • The Minimum Total Distance is computed by the following formula:

  • where Intra-Cluster Distance is computed by:

  • and Inter-Cluster Distance is computed by:

Profile builder algorithm1
Profile Builder: Algorithm

To summarize, our algorithm of building cluster-based profile is as follows:



  • Find all application fingerprints with support > SIR and save them in Fingerprint-Set

  • Run Build_Dendrogram

  • Select the appropriate set of clusters (layer in Dendrogram) with the Minimum Total Distance and save it in Cluster-Set

  • Delete Clusters from Cluster-Set with a small Support


Detection phase session anomaly by sql content
Detection Phase: Session Anomaly by SQL content

  • The probability of an analyzed session represented by its session vector SV to have an abnormal SQL content is defined by the formula:

    where MD (Minimum Distance) is the distance between SV and the closest cluster centroid from the cluster-set and NUS is the number of unexpected statements in the analyzed session.

  • This formula is based on the following assumption:

    • The further a session vector is from any existing cluster, the more abnormal a session is.

    • The more unexpected statement a session has, the more abnormal a session is.

Detection phase intrusion threshold
Detection Phase: Intrusion Threshold

  • Any session with probability to be abnormal greater than the Session Intrusion Threshold will be classified as intrusion.

  • The Session Intrusion Threshold is the IDS sensitivity and is represented by a number from the range [0, 1], where each session with the probability to be abnormal greater than this threshold will be classified as intrusion.

  • The interesting thing that was learned from our evaluation is that the best threshold is equal to the Cluster-SetMaximum Radius.

Detection phase maximum radius
Detection Phase: Maximum Radius

  • The Cluster-Set Maximum Radius as the maximum between Cluster Maximum Distances over all clusters in the cluster-set.

  • The idea is that any session with a vector inside a cluster will be classified as legal and outside a cluster – as intrusion

Diweda implementation and evaluation
DIWeDa: Implementation and Evaluation

  • We implemented a prototype of the proposed system and used it to evaluate the system.

  • The prototype was developed with C# and SQL Server 2005.

  • The online bookstore application was used as a web application benchmark.

  • The profiles were built by manually operating this application.

  • The analyzed sessions were created by synthetic data

Diweda implementation and evaluation1
DIWeDa: Implementation and Evaluation

  • We found that our proposal for the system threshold of 0.07 was the best as can be seen from the following ROC.

  • At this threshold we achieved TPR = 92.5% and FPR=5%.

  • The following table shows the evaluation summary:

Evaluation summary
Evaluation summary

  • Our True Positive Rate was about 92.5%

  • Our False Positive Rate was about 5%

Future work
Future work

  • Improving accuracy

    • increasing TPR and decreasing FPR may be achieved by employing more than two intrusion classifiers and by finding a smart algorithm of classifiers fusion

  • Adapting the proposed algorithms to new kind of web attacks


  • Multilayered security model for web databases:

    • the novel fine-grained access control mechanism as the first prevention line of defense:

      • general attack independent prevention mechanism

      • prevention at DB layer

      • user/session traceability at DB layer

    • the session anomaly detection mechanism as the second intrusion detection line of defense:

      • role-based approach

      • detection of attacks which can be seen only at session level

      • detection accuracy due to the combination of different intrusion classifiers

      • observing the normal working application with no assumption that the learning period is clear from attacks

  • Our solution proposes a very high protection level for existing web database environments, and this is the main significant advantage of this research.

Our parameterized view1
Our Parameterized View

  • The main requirement is that the parameter will be difficult to fake

  • The application will be authorized to parameterized views only

  • In the parameterized view, the parameter will contain the identity of the user

  • We present two design solutions for the parameterized views:

    • Application Session Key parameter

    • Rolling Key parameter

  • For details See our SACMAT2007 paper

Application session key
Application Session Key

  • A user requests to perform a process of identification when she provides a username and a password to the application

  • The application runs a DB stored procedure that accepts the user's username and password and returns a random number (AS_key)

  • DB stores the random number in a table of active users

  • The application knows the user who works with it, and stores the AS_key as well. Each SQL sentence that will be run on behalf of the user will be run with a parameter of user's corresponding AS_key

  • The AS_key is cleared when a user disconnects from the application

Example of application session key
Example of Application Session Key

  • When Alice entering the system, she is authenticated and receives a random key like 11011…

  • The application is authorized to access only the parameterized views

  • In order the view will return the data, it must get a valid parameter

  • Now Alice may access information about her salary from different months, but not the salary of different employees.

Example of application session key1
Example of Application Session Key

  • Each SQL of Alice will look as follows:

  • And the definition of the view looks as follows:


FROM Salary_View(110011…)

WHERE Salary_Date = '01/2007'

  • CREATE VIEW Salary_View

  • WITH pAS_key

  • SELECT *

  • FROM Salary_Table

  • WHERE Employee_No IN

  • (

    • SELECT Employee_No

  • FROM Users_Table

  • WHERE Users_Table.AS_key=:pAS_key

  • )

Rolling key
Rolling Key

  • The first three steps are similar to the previous parameter method based on the application session key, except that now AS_key also serves as the seed of the rolling key

  • DB and the application agree on a common encryption key (Enc_key). This encryption key is used to generate the next rolling key from the current one

  • DB stores the AS_key, Enc_key and the current Roll_Key in a table of active users

Rolling key1

Rolling Key

Now, when the SQL sentence arrives, it contains a request for a parameterized view with 2 parameters: the AS_key and the rolling key. As a result, two things occur:

The view returns the filtered data that belongs only to the user that the session and rolling keys belong to

The rolling key is advanced to the next number both in the application and in the DB

Example of rolling key
Example of Rolling Key

  • When Alice entering the system, she is authenticated and receives two random keys

  • Now she can run a statement like:


FROM Salary_View(110111, 000111)

WHERE Salary_Date = '01/2007'

  • But if she wants to run another statement, she will submit:


FROM Salary_View(110111, 111010)

WHERE Salary_Date = '02/2007'

Comparison of our two methods
Comparison of our two methods

  • Both approaches significantly decrease the attack surface of web DB

  • The safety of the parameter method rests with the difficulties of guessing the various keys

  • Example of SQL Injection into the parameterized view:

SELECT Salary FROM Salary_View(11011…)

WHERE Salary_Date = '01/2007' OR '1' = '1'

  • Because Salary_View returns only the data of the attacker, she may access information about her salary from different months, but not the salary of different employees.

Comparison of our two methods1
Comparison of our two methods

  • AS key method is preferable from the efficiency point of view: the calculation of the next key in the rolling key method is expensive

  • Rolling key method is preferable from security point of view: the rolling key method is resistant to replay attacks and a correct guess of a code is only valid for running one sentence

  • Both methods allow the identification of a real user at the DB layer, thus fine-grained access control can be applied at the DB layer


  • The parameterized views are not yet the part of the SQL standard

  • But we can implement our method with existing functions stored in DB (table functions)

  • These functions are supported in most databases and they can be used in standard DML sentences:

SELECT * FROM Student_Marks_Func(10101…)

  • The advantage of table functions is their ability to support delete and update actions, while limiting the activity of them to the return value of a function that depends on the parameter passed

Profile builder characterizing sql parameters
Profile Builder - Characterizing SQL Parameters

  • Detecting anomalous SQL sentences cannot rely only on the SQL structure (fingerprints), we also want to characterize the SQL parameter values.

  • Our goal is to capture normal characteristics of parameters and then use these characteristics during the detecting phase of DIWeDa to discover anomalies in user parameters that are applied to SQL statements.

  • We use two statistical models for characterizing parameters: parameter length and parameter character distribution.

Profile builder characterizing sql parameters1
Profile Builder - Characterizing SQL Parameters

  • By computing the parameter length mean and variance, we can find anomalies in a parameter, when a malicious input is passed.

  • Legal SQL parameters have regular structures and contain only human-readable characters.

  • In the case of attacks, such as the parameter buffer overflow or parameter guessing by a brute force attack, the character distribution is changed dramatically.

Profile builder characterizing sql parameters3
Profile Builder - Characterizing SQL Parameters



For each fingerprint F from the Fingerprint Set do:

For each parameter P from F do:

Find characteristics of P (AVG, STD and

Character Distribution) and save these

characteristics under the indices of F and P


Detection phase session anomaly by parameters
Detection Phase: Session Anomaly by Parameters

  • Testing parameter length by the following formula:

  • Testing character distribution by Pearson test:

  • Combining the two tests:

Detection phase classifiers combiner
Detection Phase: Classifiers Combiner

  • Our method compute the probability of a session to be legal based on following features:

    • Session SQL content anomaly

    • Session parameter anomaly

  • Two classifiers support us with a value in the same range [0, 1] that can be used as evidence to what degree the analyzed session is abnormal.

  • Now we combine them. We rely on the fact that our two classifiers are independent: values of parameters do not influence SQL structures and vise versa, thus we can compute: