
Data Mining Approach for Network Intrusion Detection

Zhen Zhang. Advisor: Dr. Chung-E Wang. 04/24/2002. Department of Computer Science, California State University, Sacramento.



  1. Data Mining Approach for Network Intrusion Detection • Zhen Zhang • Advisor: Dr. Chung-E Wang • 04/24/2002 • Department of Computer Science, California State University, Sacramento

  2. Outline • Background • Intrusion Detection: promises and challenges • Data Mining in IDS: how can it help • Motivation • Approaches, tasks, problems and my contributions • Results • Conclusion and future work

  3. Intrusion Detection: Building a Secure Network • Primary assumptions • System activities are observable • Normal and intrusive activities have distinct evidence • Main techniques • Misuse detection: patterns of well-known attacks • Anomaly detection: deviation from normal usage

  4. Data Mining in IDS • Shortfalls of current IDS (mostly misuse detection) • Variants: Intrusions change easily and frequently. • False positives: Normal activity is flagged as intrusive, which makes real intrusions difficult to pick out. • False negatives: Attacks for which there are no known signatures go undetected. • Data overload: The amount of data grows rapidly.

  5. What is Data Mining • Data mining: take data and pull patterns or deviations from it. • Many different types of algorithms: decision tree, link analysis, clustering, association, rule abduction, deviation analysis, and sequence analysis. • Software and tools: • MS SQL Server 2000 • RIPPER and many others

  6. How can Data Mining help • Variants • Anomaly detection does not rely on signatures, so variants in exploit code are of no great concern. • False positives • Identify recurring sequences of alarms in order to help recognize valid network activity. • False negatives • Attacks for which signatures have not been developed might be detected. • Data overload • Data mining plays a vital role in handling the data volume.

  7. Summary of my work • Identify objective • Distinguish network attacks from normal traffic • New area, several research projects, no commercial products • Focus on the principle and basic implementation of concepts • Data Collection • Data Pre-processing on tcpdump dataset • Apply data mining on processed data • Investigate results • Software packages used: Visual Basic, Microsoft SQL Server 2000 with Analysis Server, Tcpdump

  8. Data Collection • Tcpdump data (http://iris.cs.uml.edu:8080/) • Tcpdump was executed on the gateway to capture the traffic between the LAN and external networks, plus broadcast packets within the LAN • Headers only, no user data • Filters were used to keep only TCP and UDP packets • Baseline and 4 simulated attacks

  9. TCPDUMP data format • TCP packet • Time stamp • Source IP address • Source port • Destination IP address • Destination port • Flags (SYN, FIN, PUSH, RST, or .) • Data sequence number of this packet • Data sequence number of the data expected in return • Number of bytes of receive buffer space available • Indication of whether or not the data is urgent

  10. Tcpdump data format • UDP packet • Time stamp • Source IP address • Source port • Destination IP address • Destination port • Length of the packet • Example data

  11. Example tcpdump data
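A minimal sketch of how the TCP and UDP fields listed on the two format slides might be pulled out of one line of tcpdump text output. The regular expressions, the field names, and the sample line in the usage note are illustrative assumptions, not taken from the dataset; real tcpdump output varies with version and command-line options.

```python
import re
from typing import Optional

# Assumed line layout: "<time> <src_ip>.<src_port> > <dst_ip>.<dst_port>: <rest>"
IP = r"\d+\.\d+\.\d+\.\d+"
HEAD = (r"^(?P<time>\S+) (?P<src_ip>" + IP + r")\.(?P<src_port>\d+) > "
        r"(?P<dst_ip>" + IP + r")\.(?P<dst_port>\d+): ")

# UDP: only the packet length follows the addresses.
UDP_LINE = re.compile(HEAD + r"udp (?P<length>\d+)")

# TCP: flags, then optional sequence range, ack, window, and urgent fields.
TCP_LINE = re.compile(
    HEAD
    + r"(?P<flags>[SFPR.]+)"
    + r"(?: (?P<seq_start>\d+):(?P<seq_end>\d+)\((?P<length>\d+)\))?"
    + r"(?: ack (?P<ack>\d+))?"
    + r"(?: win (?P<win>\d+))?"
    + r"(?: urg (?P<urg>\d+))?"
)

INT_FIELDS = {"src_port", "dst_port", "seq_start", "seq_end",
              "length", "ack", "win", "urg"}


def parse_line(line: str) -> Optional[dict]:
    """Return a dict of header fields, or None if the line is not recognized."""
    for protocol, pattern in (("udp", UDP_LINE), ("tcp", TCP_LINE)):
        match = pattern.match(line.strip())
        if match:
            record = {"protocol": protocol}
            for name, value in match.groupdict().items():
                if name in INT_FIELDS and value is not None:
                    value = int(value)
                record[name] = value
            return record
    return None
```

For example, `parse_line("09:32:43.910000 10.0.0.5.1032 > 10.0.0.9.80: S 1382726990:1382726990(0) win 4096")` yields a TCP record with source port 1032, destination port 80, flag `S`, and window 4096; fields absent from the line (here `ack` and `urg`) are left as `None`.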

  12. Data Pre-processing: 80% to 90% of the work • Convert packet-level information to connection level • Group by same source/destination IP/port • Use flags and acks to determine the status of the connection • SF, REJ, S0, S1, S2, S3, S4, RSTOSn, RSTRSn, SS, SH, SHR, OOS1, OOS2 • Record start time, duration, protocol • Calculate bytes in, bytes out, resent rate • UDP is connectionless, so simply treat each packet as a connection
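The packet-to-connection step can be sketched as follows. This is a simplified illustration, not the presenter's Visual Basic implementation: the `Packet` and `Connection` types are hypothetical, only three of the listed status codes (S0, REJ, SF) are derived, the rules used for them are assumptions, and `OTHER` is a catch-all invented for this sketch. Reuse of the same address/port pair by a later connection is not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Packet:
    time: float
    src: str
    sport: int
    dst: str
    dport: int
    flags: str      # tcpdump-style flag characters, e.g. "S", "F", "R", "."
    length: int     # payload bytes carried by this packet


@dataclass
class Connection:
    start: float
    end: float
    bytes_out: int = 0                      # originator to responder
    bytes_in: int = 0                       # responder to originator
    orig_flags: Set[str] = field(default_factory=set)
    resp_flags: Set[str] = field(default_factory=set)

    @property
    def duration(self) -> float:
        return self.end - self.start


Key = Tuple[str, int, str, int]


def build_connections(packets: List[Packet]) -> Dict[Key, Connection]:
    """Group packets by endpoint pair; the first packet seen fixes the originator."""
    conns: Dict[Key, Connection] = {}
    for p in sorted(packets, key=lambda pkt: pkt.time):
        fwd = (p.src, p.sport, p.dst, p.dport)
        rev = (p.dst, p.dport, p.src, p.sport)
        if fwd in conns:
            conn, from_orig = conns[fwd], True
        elif rev in conns:
            conn, from_orig = conns[rev], False
        else:
            conn, from_orig = Connection(start=p.time, end=p.time), True
            conns[fwd] = conn
        conn.end = max(conn.end, p.time)
        if from_orig:
            conn.bytes_out += p.length
            conn.orig_flags.update(p.flags)
        else:
            conn.bytes_in += p.length
            conn.resp_flags.update(p.flags)
    return conns


def status(conn: Connection) -> str:
    """Assumed, simplified rules for three of the status codes."""
    if "S" in conn.orig_flags and not conn.resp_flags:
        return "S0"    # SYN sent, no reply seen
    if "S" in conn.orig_flags and "R" in conn.resp_flags and "S" not in conn.resp_flags:
        return "REJ"   # SYN answered with a reset
    if {"S", "F"} <= conn.orig_flags and {"S", "F"} <= conn.resp_flags:
        return "SF"    # established and closed by both sides
    return "OTHER"
```

Keying on the first packet's direction keeps bytes in and bytes out separate, which the later features depend on; a fuller version would also expire idle entries so that a reused port pair starts a new connection.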

  13. First round of processing Intrinsic Features

  14. Establish more information Same Destination Temporal and Statistical Attributes (last 2 seconds)

  15. Establish more information Same Service Temporal and Statistical Attributes (last 2 seconds)
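A rough sketch of one way to derive same-destination and same-service attributes over the last 2 seconds. The specific attributes on these slides are not listed in the transcript, so the two counts computed here are assumed examples, and the record layout `(time, dst_host, service)` is hypothetical.

```python
from collections import Counter, deque
from typing import List, Tuple

Record = Tuple[float, str, str]   # (start time in seconds, destination host, service)


def window_counts(records: List[Record], window: float = 2.0) -> List[Tuple[int, int]]:
    """For each connection, count earlier connections within `window` seconds
    that share its destination host and, separately, its service.

    `records` must be sorted by start time. The current connection is not
    counted in its own totals.
    """
    recent = deque()              # connections still inside the time window
    by_host = Counter()
    by_service = Counter()
    features = []
    for t, host, service in records:
        # Drop connections that started more than `window` seconds ago.
        while recent and recent[0][0] < t - window:
            _, old_host, old_service = recent.popleft()
            by_host[old_host] -= 1
            by_service[old_service] -= 1
        features.append((by_host[host], by_service[service]))
        recent.append((t, host, service))
        by_host[host] += 1
        by_service[service] += 1
    return features
```

Keeping running counters alongside the queue means each connection is added and removed once, so the pass is linear in the number of records rather than rescanning the window for every connection.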

  16. Second round of processing Same Destination Temporal and Statistical Attributes

  17. Final round of processing • Final, but important • Reduce data amount • Remove noise or trivial information • Reorganize data, add new features if necessary • Challenges • Hard to tell which data to reduce/remove • Requires tremendous domain knowledge • Needs experiments and adjustments

  18. Data Mining • Decision Tree Algorithm • Microsoft SQL Server 2000 Analysis Server • Steps: • Use 80% of the baseline (normal) dataset as training data • Use the remaining 20% as validation data, compute misclassification. • Use 20% of each of the four intrusion datasets as prediction data, compute misclassification.
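The evaluation steps above can be sketched in a few lines. This only illustrates the 80/20 split and the misclassification computation; the model itself was built in SQL Server 2000 Analysis Server, so `predict` here is a hypothetical stand-in for whatever classifier is used, and the function names and the fixed random seed are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_80_20(rows: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Shuffle the rows and return (training 80%, validation 20%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def misclassification_rate(predict: Callable, rows: Sequence, true_label: str) -> float:
    """Fraction of rows whose predicted label differs from `true_label`.

    All rows in one dataset share a label here: "normal" for the baseline
    validation rows, or the attack label for an intrusion dataset.
    """
    if not rows:
        raise ValueError("no rows to evaluate")
    wrong = sum(1 for row in rows if predict(row) != true_label)
    return wrong / len(rows)
```

With a trained model wrapped as `predict`, the validation error would be `misclassification_rate(predict, validation_rows, "normal")`, and the same function applied to a 20% sample of each intrusion dataset gives the rate at which attacks are missed.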

  19. Dependency Network

  20. Decision Tree

  21. Apply Data Mining Model to Validate/Predict

  22. Results

  23. Conclusion and future improvement • Accuracy • Preliminary experiments using DM on the tcpdump data showed promising results • Depends on sufficient training data and the right feature set. • Performance • 6 hours on one dataset (628,775 records) • Size of time window • 2 seconds or larger? • Automated process • Call MSSQL DM and DTS procedures within VB • Real-time monitoring and alarms

  24. References • Intrusion Detection, Rebecca Gurley Bace, Macmillan Technical Publishing, 2000 • Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers, 2001 • Data Mining with Microsoft SQL Server 2000, Claude Seidman, Microsoft Press, 2001 • http://www.cs.columbia.edu/~sal/hpapers/USENIX/usenix.html • http://iris.cs.uml.edu:8080/network.html • http://www-nrg.ee.lbl.gov/, Network Research Group (NRG) of the Information and Computing Sciences Division (ICSD) at Lawrence Berkeley National Laboratory (LBNL) in Berkeley, California.

  25. Thank You!
