This presentation explores using data mining for network intrusion detection to combat cyber threats. Learn about detection models, approaches, related work, and experimental evaluations. Discover how anomaly detection and misuse detection models work for identifying novel attacks and rare classes in network data.
Paul Dokas, Levent Ertoz, Vipin Kumar, Aleksandar Lazarevic, Jaideep Srivastava, Pang-Ning Tan • Computer Science Department, University of Minnesota • CS685 Presentation • Data Mining for Network Intrusion Detection • Presented by: Song.Yuan@uky.edu
Outline • Motivation • Related Work • Detection Models and Approaches • Experimental Evaluation • Conclusion
Motivation • Organizations are becoming increasingly vulnerable to cyber threats such as network intrusions. • [Figure: cyber incidents reported to CERT/CC]
Motivation (cont.) • Intrusion Detection System (IDS) • collect signatures of known attacks • input attack signatures into IDS signature databases • extract features from various audit streams • compare these features with the attack signatures • raise an alarm when a possible intrusion occurs • Limitations of traditional signature-based methods • manual update of the signature database • inability to detect emerging cyber threats
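The signature-based workflow listed on this slide can be sketched in a few lines. This is a minimal illustration only: the `Signature` class, the feature names, and the two signatures are hypothetical and are not taken from any real IDS or from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """A hypothetical signature: every listed feature must match exactly."""
    name: str
    conditions: tuple  # tuple of (feature_name, expected_value) pairs


def match_signatures(features, signatures):
    """Return the names of all signatures whose conditions the record satisfies."""
    return [
        sig.name
        for sig in signatures
        if all(features.get(key) == value for key, value in sig.conditions)
    ]


# Hypothetical signature database (illustrative values only).
SIGNATURE_DB = [
    Signature("telnet-root-login", (("dst_port", 23), ("user", "root"))),
    Signature("syn-without-ack", (("flags", "S"), ("ack_seen", False))),
]

# Features extracted from one audit record (also hypothetical).
record = {"dst_port": 23, "user": "root", "flags": "SA", "ack_seen": True}
alarms = match_signatures(record, SIGNATURE_DB)
if alarms:
    print("ALARM:", alarms)
```

A record produced by an attack with no entry in `SIGNATURE_DB` returns an empty list, which is the second limitation named on the slide.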
Motivation (cont.) • Why data mining? • large volumes of network data • different data mining techniques • clustering, classification
Related Work • Data mining based intrusion detection techniques • anomaly detection • Build models of normal data • Detect any deviation from normal data • Flag deviation as suspect • Identify new types of intrusions as deviations from normal behavior • misuse detection • Label all instances in the data set (“normal” or “intrusion”) • Run learning algorithms over the labeled data to generate classification rules • Automatically retrain intrusion detection models on different input data
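The anomaly detection recipe (model normal data, then flag deviations) can be illustrated with the simplest possible model. The sketch below assumes a single numeric feature, a mean and standard deviation profile, and a three-sigma threshold. All of these are illustrative choices, not the models used in the presented work.

```python
from statistics import mean, stdev


def fit_normal_profile(values):
    """Summarise attack-free training values by their mean and standard deviation."""
    return mean(values), stdev(values)


def is_suspect(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    return abs(value - mu) > threshold * sigma


# Hypothetical bytes-per-connection values observed during normal operation.
normal_bytes = [480, 510, 495, 505, 520, 490, 500, 515]
profile = fit_normal_profile(normal_bytes)

print(is_suspect(505, profile))   # a value close to the training data
print(is_suspect(5000, profile))  # a value far from anything seen in training
```

No labeled intrusions are needed here, which is why this family of methods can flag attack types that were never seen before. Misuse detection, by contrast, needs labeled examples of both classes.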
Related Work --- misuse detection • Classification Model • Bayesian classifier • Decision tree • Association rule • Support vector machine • Learning from rare class
Related Work --- anomaly detection • Anomaly Detection Model • Association rule • Neural network • Unsupervised SVM • Outlier detection
Detection Models • misuse detection • rare class prediction model • known intrusions and their variations • anomaly detection • outlier detection model • novel attacks whose nature is unknown
Learning from Rare Class • Problem: how to build a classification model for a data set with a skewed class distribution? • intrusion class << normal class • Mining for a needle in a haystack
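A small worked example shows why a skewed class distribution is a problem. The class counts below (990 normal, 10 intrusion) are made up for illustration. A classifier that always answers "normal" scores 99% accuracy while detecting nothing.

```python
def evaluate(y_true, y_pred, positive="intrusion"):
    """Return (accuracy, precision, recall) with `positive` as the rare class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical data set: 1% intrusions.
y_true = ["normal"] * 990 + ["intrusion"] * 10
# A trivial classifier that always predicts the majority class.
y_pred = ["normal"] * 1000

accuracy, precision, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

For rare classes, recall and precision on the intrusion class are therefore more informative than overall accuracy.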
Learning from Rare Class (cont.) • Novel classification algorithms • PN-rule • P-rules: cover most of the intrusive examples • N-rules: eliminate false alarms • SMOTEBoost • SMOTE (Synthetic Minority Over-sampling TEchnique) • Boosting
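The over-sampling idea behind SMOTE can be sketched as follows: each synthetic example is placed at a random position on the line segment between a minority-class point and one of its nearest minority-class neighbours. This is a simplified sketch that picks base points at random and covers only the SMOTE step, not the boosting part of SMOTEBoost. The function name, parameters, and data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours.

    Assumes `minority` holds at least two numeric tuples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature intrusion examples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
print(new_points)
```

Because every new point lies between two existing minority points, the synthetic examples stay inside the region the rare class already occupies rather than duplicating records exactly.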
Anomaly Detection • Novel attacks/intrusions • deviation from normal behavior • Outlier detection algorithm • Nearest neighbor approach • Distance based approach • Density based approach • Unsupervised support vector machines
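As a rough illustration of the nearest neighbor approach, each record can be scored by the distance to its k-th nearest neighbour, so that isolated records receive high scores. The sketch below assumes Euclidean distance and a brute-force search. Both are illustrative choices, and the data points are made up.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(distances[k - 1])
    return scores


# Four tightly grouped points and one isolated point (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)

# Rank records from most to least anomalous.
ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
print(ranked[0], scores[ranked[0]])
```

The brute-force search is quadratic in the number of records, so a real deployment on large volumes of network data would need indexing or sampling.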
Anomaly Detection • Density based approach (LOF)
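The local outlier factor (LOF) compares the local density around a point with the density around its neighbours. A value near 1 means the point is about as dense as its neighbourhood, while a much larger value marks an outlier. The following is a minimal brute-force sketch under simplifying assumptions: Euclidean distance, exactly k neighbours per point (ties broken by index), and no duplicate points.

```python
import math


def lof_scores(points, k=2):
    """Compute a simplified local outlier factor for every point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbour.
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        """Reachability distance of point i with respect to neighbour j."""
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [
        k / sum(reach_dist(i, j) for j in neighbours[i])
        for i in range(n)
    ]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: a dense group of four points and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
for point, score in zip(points, lof_scores(points, k=2)):
    print(point, round(score, 2))
```

Unlike a plain distance score, LOF is relative to the local neighbourhood, so a point in a sparse but uniform region is not penalised merely for having distant neighbours.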
Anomaly Detection • Identify normal behavior • Construct a useful set of features • Define a similarity function • Flag deviation as suspect
Experimental Evaluation • Public data sets • DARPA 1998 Intrusion Detection Evaluation Data Set • prepared and managed by MIT Lincoln Lab • training data and test data • KDD Cup 1999 Data • an extension of DARPA’98 • training data and test data • Real network data • Network data from the University of Minnesota
Experimental Evaluation --- feature construction • Purpose: • derive a more informative data set from the public data sets • Method: • connection records • label connection records • ‘normal’ or ‘intrusion’ • features for each connection record • # of {packets, bytes}, {ACK, Re-Tx} packets, SYN/FIN, … • time-based features (DoS attacks) • connection-based features (PROBING attacks)
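One plausible reading of the two feature families is sketched below: a time-based feature counts earlier connections to the same destination within the last T seconds, and a connection-based feature counts earlier connections from the same source among the last N connections. The field names, feature names, and window sizes are assumptions for illustration, not the exact definitions used in the evaluation.

```python
from collections import deque


def add_window_features(records, window_seconds=5.0, window_connections=100):
    """Add windowed counts to connection records sorted by start time.

    Each record is a dict with 'time', 'src' and 'dst' keys (hypothetical schema).
    """
    in_time_window = deque()                 # connections in the last T seconds
    last_n = deque(maxlen=window_connections)  # the last N connections
    enriched = []
    for rec in records:
        # Drop connections that started more than T seconds before this one.
        while in_time_window and rec["time"] - in_time_window[0]["time"] > window_seconds:
            in_time_window.popleft()
        enriched.append({
            **rec,
            # Time-based: earlier connections to the same destination.
            "same_dst_last_t": sum(r["dst"] == rec["dst"] for r in in_time_window),
            # Connection-based: earlier connections from the same source.
            "same_src_last_n": sum(r["src"] == rec["src"] for r in last_n),
        })
        in_time_window.append(rec)
        last_n.append(rec)
    return enriched


records = [
    {"time": 0.0, "src": "a", "dst": "x"},
    {"time": 0.5, "src": "b", "dst": "x"},
    {"time": 1.0, "src": "a", "dst": "x"},
    {"time": 20.0, "src": "a", "dst": "x"},
]
for row in add_window_features(records):
    print(row)
```

A burst of connections in a short time shows up in the time-based count, while a slow scan spread over a long period is invisible to the time window but still accumulates in the connection-based count.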
Experimental Evaluation --- single connection attacks • [Figure: ROC curves for single connection attacks]
Experimental Evaluation --- bursty attacks • [Figure: ROC curves for bursty attacks]
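For readers unfamiliar with ROC curves, the sketch below shows how one is traced from anomaly scores: the alarm threshold is lowered one record at a time, and each step yields a (false alarm rate, detection rate) pair. The scores and labels are made up, and the sketch assumes distinct scores and at least one record of each class.

```python
def roc_points(scores, labels):
    """Return (false_alarm_rate, detection_rate) pairs, one per threshold.

    `labels` holds 1 for intrusion and 0 for normal.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one record at a time, highest score first.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical anomaly scores with ground-truth labels.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(roc_points(scores, labels))
```

A detector whose curve rises toward the top-left corner catches most intrusions while raising few false alarms.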
Experimental Evaluation --- real network data • Why? • Limitations of the DARPA’98 data set • How? • Detect network intrusions in live network traffic • Result? • Successfully identified some novel intrusions (among the top-ranked outliers)
Conclusion • promising intrusion detection models • performance of algorithm (on-line detection) • new classification and anomaly detection algorithms
Thanks! Questions?