Information-Theoretic Measures for Anomaly Detection

Presentation Transcript


  1. Information-Theoretic Measures for Anomaly Detection Wenke Lee and Dong Xiang (North Carolina State University) IEEE Security and Privacy, 2001 Speaker: Chang Huan Wu 2009/4/14

  2. Outline • Introduction • Information-Theoretic Measures • Case Studies • Conclusions

  3. Introduction (1/2) • Misuse detection • Uses the “signatures” of known attacks • Anomaly detection • Uses established normal profiles • The basic premise for anomaly detection: there is regularity in audit data that is consistent with normal behavior and thus distinct from abnormal behavior

  4. Introduction (2/2) • Most anomaly detection models are built solely on “expert” knowledge or intuition • This paper provides theoretical foundations as well as useful tools that can facilitate the IDS development process and improve the effectiveness of ID technologies

  5. Information-Theoretic Measures (1/7) • Entropy • Use entropy as a measure of the regularity of audit data
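
The measure here is the standard Shannon entropy, H(X) = -Σx P(x) log2 P(x): the lower the entropy, the more regular the audit data. A minimal sketch of estimating it from an event stream (the event names are hypothetical):

```python
import math
from collections import Counter

def entropy(events):
    """Shannon entropy (bits) of the empirical event distribution."""
    counts = Counter(events)
    total = len(events)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical audit trace: low entropy means high regularity.
trace = ["open", "read", "read", "write", "close", "open", "read", "close"]
print(f"H = {entropy(trace):.4f} bits")
```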

  6. Information-Theoretic Measures (2/7) • Conditional Entropy • Let X be a collection of sequences where each is (e1, e2, …, en-1, en) and each ei is an audit event; let Y be the collection of subsequences where each is (e1, e2, …, ek), k < n • H(X | Y) tells us how much uncertainty remains for the rest of the audit events in a sequence x after we have seen y
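
Since the prefix y is determined by the full sequence x, the conditional entropy can be estimated as H(X | Y) = H(X, Y) - H(Y) = H(X) - H(Y) over the empirical distributions. A sketch under that identity (the window data is hypothetical):

```python
import math
from collections import Counter

def entropy_of_counts(counter):
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())

def conditional_entropy(sequences):
    """H(X | Y) with X a full length-n sequence and Y its length-(n-1)
    prefix; since Y is a function of X, H(X | Y) = H(X) - H(Y)."""
    full = Counter(tuple(s) for s in sequences)
    prefixes = Counter(tuple(s[:-1]) for s in sequences)
    return entropy_of_counts(full) - entropy_of_counts(prefixes)

# Hypothetical length-3 system call windows.
windows = [("open", "read", "close"), ("open", "read", "close"),
           ("open", "read", "write"), ("open", "write", "close")]
print(f"H(X|Y) = {conditional_entropy(windows):.4f} bits")
```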

  7. Information-Theoretic Measures (3/7) • Relative Entropy • Relative entropy measures the distance of the regularities between two datasets • Training dataset and testing dataset
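
Relative entropy here is the Kullback-Leibler divergence, relEntropy(p | q) = Σx p(x) log(p(x)/q(x)). A sketch comparing training and testing event distributions; the eps smoothing for events unseen in q is my assumption (the divergence is otherwise undefined there):

```python
import math
from collections import Counter

def relative_entropy(p_events, q_events, eps=1e-9):
    """KL divergence relEntropy(p || q) between the empirical event
    distributions of two datasets. eps smooths events missing from q
    (the smoothing scheme is an assumption, not the paper's)."""
    p, q = Counter(p_events), Counter(q_events)
    n_p, n_q = len(p_events), len(q_events)
    return sum((c / n_p) * math.log2((c / n_p) / max(q[e] / n_q, eps))
               for e, c in p.items())

train = ["open", "read", "read", "close"] * 10
test  = ["open", "read", "write", "close"] * 10
print(f"relEntropy(test || train) = {relative_entropy(test, train):.4f} bits")
```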

  8. Information-Theoretic Measures (4/7) • When we use conditional entropy to measure the regularity of sequential dependencies, we can use relative conditional entropy to measure the distance between two audit datasets
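
Relative conditional entropy extends this to the per-prefix distributions: relCondEntropy(p | q) = Σ(x,y) p(x, y) log(p(x|y)/q(x|y)). A sketch over the same sequence/prefix setup as above (smoothing is again an assumption):

```python
import math
from collections import Counter

def relative_conditional_entropy(p_seqs, q_seqs, eps=1e-9):
    """relCondEntropy(p || q) over (full sequence x, prefix y) pairs.
    eps handles pairs unseen in q (smoothing choice is an assumption)."""
    def dists(seqs):
        joint = Counter(tuple(s) for s in seqs)
        prefix = Counter(tuple(s[:-1]) for s in seqs)
        return joint, prefix, len(seqs)
    pj, pp, pn = dists(p_seqs)
    qj, qp, _ = dists(q_seqs)
    total = 0.0
    for x, c in pj.items():
        y = x[:-1]
        p_cond = c / pp[y]                            # p(x | y)
        q_cond = (qj[x] / qp[y]) if qp[y] else eps    # q(x | y)
        total += (c / pn) * math.log2(p_cond / max(q_cond, eps))
    return total

train = [("open", "read", "close")] * 8 + [("open", "read", "write")] * 2
test  = [("open", "read", "close")] * 5 + [("open", "read", "write")] * 5
print(f"relCondEntropy(test || train) = {relative_conditional_entropy(test, train):.4f}")
```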

  9. Information-Theoretic Measures (5/7) • Intrusion detection can be cast as a classification problem • When constructing a classifier, a classification algorithm needs to search for features with high information gain • When the dataset is partitioned according to the values of such a feature, the subsets have lower entropy

  10. Information-Theoretic Measures (6/7) • Information Gain: Gain(X, A) = H(X) - Σv (|Xv| / |X|) * H(Xv), where Xv is the subset of X for which feature A has value v

  11. Information Gain (worked example) • H(X) = -((4/16)*log2(4/16) + (12/16)*log2(12/16)) = 0.8113 • E(age) = (6/16)*H(<35) + (10/16)*H(>35) = 0.7946 • Gain(age) = H(X) - E(age) = 0.0167 • Gain(gender) = 0.0972 • Gain(household income) = 0.0177
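
The slide's arithmetic can be checked directly; a small sketch reproducing it (the 4-of-16 class split, the 0.7946 conditional entropy for age, and the other gains are taken from the slide as given):

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a class-count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

H_X = entropy([4, 12])                    # 4 of 16 samples in one class
print(f"H(X) = {H_X:.4f}")                # 0.8113, matching the slide
E_age = 0.7946                            # (6/16)*H(<35) + (10/16)*H(>35), per the slide
print(f"Gain(age) = {H_X - E_age:.4f}")   # 0.0167
```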

  12. Information-Theoretic Measures (7/7) • Intuitively, the more information we have, the better the detection performance • There is always a cost for any gain • We can define information cost as the average time for processing an audit record and checking it against the detection model
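
Under that definition, information cost is simply measured time per record. A minimal sketch; the detection-model interface below is a placeholder, not the paper's:

```python
import time

class DummyModel:
    """Placeholder detection model (an assumption, not the paper's model)."""
    def check(self, record):
        return len(record) % 2 == 0  # stand-in for real per-record work

def average_cost(records, model):
    """Average time (seconds) to process one audit record and check it
    against the detection model -- the paper's notion of information cost."""
    start = time.perf_counter()
    for r in records:
        model.check(r)
    return (time.perf_counter() - start) / len(records)

print(f"{average_cost(['open', 'read', 'close'] * 1000, DummyModel()):.2e} s/record")
```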

  13. UNM sendmail System Call Data (1/6) • University of New Mexico (UNM) sendmail system call data • Each trace contains the consecutive system calls made by the run-time processes • Used the first 80% of the traces as the training data and the last 20% as part of the testing data

  14. UNM sendmail System Call Data (2/6) • H(length-n sequences | subsequences of length n-1) • Measures the regularity of how the first n-1 system calls determine the n-th system call => Conditional entropy drops as sequence length increases
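
Those (first n-1 calls, n-th call) samples come from sliding a length-n window over each trace; a sketch (the function name is mine, not the paper's):

```python
def windows(trace, n):
    """Yield length-n sliding windows over a system call trace."""
    for i in range(len(trace) - n + 1):
        yield tuple(trace[i:i + n])

trace = ["open", "read", "mmap", "read", "write", "close"]
for w in windows(trace, 3):
    prefix, nxt = w[:-1], w[-1]   # the first n-1 calls predict the n-th
    print(prefix, "->", nxt)
```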

  15. UNM sendmail System Call Data (3/6) • For normal data, the trend of misclassification rate coincides with the trend of conditional entropy

  16. UNM sendmail System Call Data (4/6) • Misclassification rates for the intrusion traces are much higher • This suggests that we can use the range of the misclassification rate as the indicator of whether a given trace is normal or abnormal (intrusion)
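
A sketch of that scheme: learn the most likely n-th call for each length-(n-1) prefix from normal traces, then flag a trace whose misclassification rate leaves the normal range. The 0.3 threshold below is purely illustrative, and counting unseen prefixes as misclassifications is my assumption:

```python
from collections import Counter, defaultdict

def train_model(traces, n):
    """Map each length-(n-1) prefix to its most frequent next call."""
    table = defaultdict(Counter)
    for t in traces:
        for i in range(len(t) - n + 1):
            table[tuple(t[i:i + n - 1])][t[i + n - 1]] += 1
    return {p: c.most_common(1)[0][0] for p, c in table.items()}

def misclassification_rate(trace, model, n):
    """Fraction of windows where the predicted n-th call is wrong."""
    preds = [(tuple(trace[i:i + n - 1]), trace[i + n - 1])
             for i in range(len(trace) - n + 1)]
    wrong = sum(model.get(p) != nxt for p, nxt in preds)
    return wrong / len(preds)

normal = [["open", "read", "read", "write", "close"]] * 5
model = train_model(normal, 3)
rate = misclassification_rate(["open", "read", "exec", "exec", "close"], model, 3)
print(f"rate = {rate:.2f}, abnormal = {rate > 0.3}")  # 0.3 is illustrative
```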

  17. UNM sendmail System Call Data (5/6) • When the training and testing normal datasets differ more, the misclassification rate on the testing normal data is also higher

  18. UNM sendmail System Call Data (6/6) • The cost is a linear function of the sequence length • Length ↑, accuracy ↑, but cost also ↑

  19. MIT Lincoln Lab sendmail BSM Data (1/6) • BSM data developed and distributed by MIT Lincoln Lab for the 1999 DARPA evaluation • Each audit record corresponds to a system call made by sendmail • Contains additional information (e.g., user and group IDs, the object name)

  20. MIT Lincoln Lab sendmail BSM Data (2/6) • UNM data: (s1, s2, …, sl) • BSM data • so: (s1_o1, s2_o2, …, sl_ol) • s-o: (s1, o1, s2, o2, …, sl, ol) • s: system call, o: object name category (system, user, or other)
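
A sketch of building the two richer encodings from parallel call and object streams (the helper names are mine):

```python
def encode_so(calls, objs):
    """'so' encoding: each event is the combined pair call_obj."""
    return [f"{s}_{o}" for s, o in zip(calls, objs)]

def encode_s_o(calls, objs):
    """'s-o' encoding: calls and objects interleaved as separate events."""
    out = []
    for s, o in zip(calls, objs):
        out.extend([s, o])
    return out

calls = ["open", "read", "close"]
objs  = ["user", "user", "system"]   # object name category per call
print(encode_so(calls, objs))    # ['open_user', 'read_user', 'close_system']
print(encode_s_o(calls, objs))   # ['open', 'user', 'read', 'user', 'close', 'system']
```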

  21. MIT Lincoln Lab sendmail BSM Data (3/6) • Conditional entropy drops as sequence length increases

  22. MIT Lincoln Lab sendmail BSM Data (4/6) • For in-bound mails, the testing data have clearly higher misclassification rates than the training data

  23. MIT Lincoln Lab sendmail BSM Data (5/6) • Out-bound mails have much smaller relative conditional entropy than in-bound mails

  24. MIT Lincoln Lab sendmail BSM Data (6/6) • Though the performance with the object name is slightly better, if we consider cost, it is actually better to use the system call name only

  25. MIT Lincoln Lab Network Data (1/4) • tcpdump data developed and distributed by MIT Lincoln Lab for the 1998 DARPA evaluation • Each record describes a connection using the following features: timestamp, duration, source port, source host, service…

  26. MIT Lincoln Lab Network Data (2/4) • Destination host was used for partitioning the data into per-host subsets
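
That partitioning is a plain group-by on the destination-host feature; a sketch with hypothetical connection records:

```python
from collections import defaultdict

def partition_by_dst(records):
    """Group connection records into per-destination-host subsets."""
    parts = defaultdict(list)
    for r in records:
        parts[r["dst_host"]].append(r)
    return parts

records = [{"dst_host": "10.0.0.1", "service": "smtp"},
           {"dst_host": "10.0.0.2", "service": "http"},
           {"dst_host": "10.0.0.1", "service": "ftp"}]
for host, subset in partition_by_dst(records).items():
    print(host, len(subset))
```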

  27. MIT Lincoln Lab Network Data (3/4) • We can see from the figure that intrusion datasets have much higher misclassification rates • Models from the (more) partitioned datasets have much better performance

  28. MIT Lincoln Lab Network Data (4/4) • Conditional entropy decreases as window size grows

  29. Conclusion • Proposed using information-theoretic measures (entropy, conditional entropy, relative entropy, relative conditional entropy, and information gain) for anomaly detection

  30. Comments • Provides theoretical foundations and uses quantitative results to support the conclusions • Plentiful experimental results
