one class support vector machines for detecting anomalous windows registry accesses n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses PowerPoint Presentation
Download Presentation
One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses

Loading in 2 Seconds...

play fullscreen
1 / 24

One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses - PowerPoint PPT Presentation


  • 165 Views
  • Uploaded on

One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses. Krysta Svore Joint Work with Katherine Heller, Angelos Keromytis, and Salvatore Stolfo. Anomaly Detection. We would like to be able to detect anomalous behavior by monitoring Windows registry queries

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses' - ziv


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
one class support vector machines for detecting anomalous windows registry accesses

One Class Support Vector Machines for Detecting Anomalous Windows Registry Accesses

Krysta Svore

Joint Work with Katherine Heller, Angelos Keromytis, and Salvatore Stolfo

anomaly detection
Anomaly Detection
  • We would like to be able to detect anomalous behavior by monitoring Windows registry queries
  • We studied a host-based intrusion detection system that applies a one class support vector machine algorithm within the Registry Anomaly Detection (RAD) system
key results
Key Results
  • We show that the Probabilistic Anomaly Detection (PAD) algorithm dramatically outperforms the OCSVM algorithm due to its hierarchical prior
  • To improve the performance of the OCSVM algorithm, we need knowledge on how to construct well-defined kernels
host based intrusion detection systems ids
Host-based Intrusion Detection Systems (IDS)
  • Most often attacked operating system is Microsoft Windows
  • Combative methods
    • Virus scanner and security patches
      • Unable to combat unknown attacks
      • Requires frequent updates
    • Host-based IDS
      • Detects intrusions by monitoring system accesses
      • Used with data mining techniques for improved detection ability
our approach
Our Approach
  • Use the Registry Anomaly Detection (RAD) system to monitor Windows registry queries
  • Apply a One Class Support Vector Machine (OCSVM) algorithm within the RAD system
  • Compare our system with previous work using the Probabilistic Anomaly Detection (PAD) algorithm within the RAD system
the windows registry
The Windows Registry
  • Stores configuration settings for programs, security information, user profiles, and other system information
  • Consists of registry keys and their associated values
  • Queried by programs for information by accessing a specific registry key
        • Process: EXPLORER.EXE
        • Query: OpenKey
        • Key: HKCR\CLSID\B41DB860-8EE4-11D2-9906-E49FADC173CA\shellex\MayChangeDefaultMenu
        • Response: SUCCESS
        • ResultValue: NOTFOUND
the windows registry1
The Windows Registry
  • Activity varies from several accesses per minute to several thousand accesses per minute
  • Almost all programs use the registry
registry anomaly detection rad system
Registry Anomaly Detection (RAD) System
  • Three components:
    • Audit Sensor
      • Analyzes the registry access
    • Model Generator
      • Develops a model of normal behavior
    • Anomaly Detector
      • Classifies a new registry access as normal or attack according to the generated model
probabilistic anomaly detection pad algorithm
Probabilistic Anomaly Detection (PAD) Algorithm
  • Developed by E. Eskin et al., 2002
  • Relies on consistency checks over normal data and labels a record anomalous if it fails any one of these tests
    • First order consistency check
      • Verifies a value is consistent with observed values of that feature in the normal data set
      • 5 total checks
    • Second order consistency check
      • Determines the conditional probability of a feature value given another feature value
      • 20 total checks
slide10
PAD
  • Probability of an observed element
    • P(X = i) = C(Ni + α)/(kα+N)
      • where N - total number of observations

Ni - number of observations of symbol i

α - “pseudo count” for each observed symbol

k - number of observed symbols

  • Probability of an unobserved element
    • P(X = i) = (1-C)/(L-k)

L - number of possible symbols

  • Scaling factor
    • C= N/(N+L-k)
an example
An Example
  • LAN
    • We expect the probability of seeing a previously unobserved IP address to be small
  • Peering Center
    • We expect the probability of seeing a previously unobserved IP address to be almost 1
one class support vector machine ocsvm algorithm
One Class Support Vector Machine (OCSVM) Algorithm
  • Maps input data into a high dimensional feature space
  • Iteratively finds the maximal margin in the hyperplane which best separates the training data from the origin
  • Solves optimization problem to find rule f with maximal margin
    • f(x)=‹w,x›+b
    • If f(x)<0 , label x as anomalous
ocsvm kernels
OCSVM: Kernels
  • Equivalent to solving the dual quadratic programming problem
    • minα (1/2) ∑i,j αiαjK(xi,xj) s.t. 0≤αi≤1/(νl) , ∑i αi = 1
      • where αi - Lagrange multiplier

v - parameter to control trade-off between distance of hyperplane from the origin and number of points in training dataset

l - number of points in training dataset

  • Kernel function projects input vectors into a feature space allowing for nonlinear decision boundaries
    • Feature map: Φ: X → RN
    • Kernel Function: K(xi,xj) = ‹Φ(xi), Φ(xj)›
efficiency comparison
Efficiency Comparison
  • Complexity of PAD
    • Time: O(v R )
    • Space: O(vR )
      • where v – number of unique record values

R – number of record components

  • Complexity of OCSVM
    • Time: O(dL )
    • Space: O(d(L+T))
      • where d – number of dimensions

L – number of records in training dataset

T – number of records in test dataset

2

2

2

3

experimental data
Experimental Data
  • Collected training data on Windows NT 4.0
  • Used approximately 500,000 attack-free records to train system
  • Tested system using approximately 300,000 records with 2,000 labeled as attack
experiments
Experiments
  • Used three common kernels
      • Linear: K(x,y) = (x·y)
      • Polynomial: K(x,y) = (x·y+1)d , where d is the degree of the polynomial
      • Gaussian: K(x,y) = e -║x-y║2/(2σ2) , where σ2 is the variance
  • Used two types of feature vectors
      • Binary
        • One dimension for every unique entry for each of the five given record values
      • Frequency-based
        • Each feature corresponds to the number of occurrences of the corresponding record component in the training set
dimensionality of binary and frequency based feature vectors
Dimensionality of Binary and Frequency-based Feature Vectors
  • Binary feature vector
    • Dimension = Number of possible entries for feature value 1 * number of possible entries for feature value 2 * … * number of possible entries for feature value 5
  • Frequency-based feature vector
    • Dimension = 5
evaluation
Evaluation
  • Used two statistics to compare the system’s accuracy
    • Detection rate
      • Percentage of attack records that have been correctly identified
    • False positive rate
      • Percentage of normal records that have been mislabeled as anomalous
  • Plotted results with a Receiver Operator Characteristic (ROC) curve
    • Plots percentage of false positives versus percentage of true positives
results2
Results
  • PAD algorithm outperforms OCSVM algorithm due to the use of a hierarchical prior to estimate the probabilities
    • Knows the likelihood of encountering a previously unencountered feature value
  • Ability of OCSVM to detect anomalies relies on the choice of the kernel
  • Need a novel, well-defined kernel which accounts for highly discriminative information
conclusions
Conclusions
  • Registry activity provides a good platform for anomaly detection
  • PAD is more reliable than OCSVM
  • Currently no known way to learn a “most optimal” kernel
  • By analyzing different algorithms, we can identify what needs to be captured in the kernel definition
future work
Future Work
  • Improve kernel definition to incorporate more discriminative information
  • Update the model as new data is labeled
    • Requires an efficient way of remodeling the data over time
  • Test our system on the UNIX platform