Presentation Transcript

  1. A Proposal of New Benchmark Data to Evaluate Mining Algorithms for Intrusion Detection
     Jungsuk SONG†, Hiroki TAKAKURA‡, Yasuo OKABE‡
     †Graduate School of Informatics, Kyoto Univ.
     ‡Academic Center for Computing and Media Studies, Kyoto Univ.
     oaktree@net.ist.i.kyoto-u.ac.jp, takakura@media.kyoto-u.ac.jp, okabe@i.kyoto-u.ac.jp

  2. Overview
     • Introduction
       • Intrusion Detection System
       • Intrusion Detection Evaluation Data
     • KDD Cup 99 Data Set
       • Details
       • Problems
     • Our Experimental Results
     • Our Proposal
     23rd Asia Pacific Advanced Network Meeting

  3. Introduction
     • Intrusion Detection System (IDS)
       • A combination of software and hardware that attempts to perform intrusion detection
       • Raises an alarm when a possible intrusion or suspicious pattern is observed
     [Figure: network diagram with the labels Attacker, The Internet, Intrusion, Firewall, IDS, and Internal Network]

  4. Introduction
     • Why do we need an IDS?
       • Unknown weaknesses or bugs
       • Complex, unforeseen attacks
       • Firewalls, security policies
     • Using the detected information, we can
       • Recover a compromised system
       • Understand the attack mechanism
       • Detect novel attacks
       • Defend our systems

  5. Introduction
     • We need evaluation data for IDS
       • Performance improvement
       • Technical progress
       • Research guide…
     • KDD Cup 99 Data Set
       • The most commonly used evaluation data, but…
     • We propose new benchmark data

  6. KDD Cup 99 Data Set
     • A modification of the DARPA 1998 data set
     • DARPA 1998 data set
       • Managed by Lincoln Lab. (under DARPA sponsorship)
       • Nine weeks of simulated raw TCP dump data
     • Attacks
       • 38 different attacks against Unix/Linux machines
       • DoS, scan, buffer overflow, and so on
     • Normal traffic
       • Thousands of virtual hosts and hundreds of user automata

  7. KDD Cup 99 Data Set
     • Each connection ⇒ 41-dimensional vector
     • Samples
         5,tcp,smtp,SF,959,337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,144,192,0.70,0.02,0.01,0.01,0.00,0.00,0.00,0.00,normal.
         0,tcp,http,SF,54540,8314,0,0,0,2,0,1,1,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,118,118,1.00,0.00,0.01,0.00,0.00,0.00,0.02,0.02,back.
         0,tcp,http_443,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,114,2,1.00,1.00,0.00,0.00,0.02,0.06,0.00,255,2,0.01,0.07,0.00,0.00,1.00,1.00,0.00,0.00,neptune.
     • Numerical: 34, Categorical: 7
     • Basic features: “duration”, “protocol”…
     • Statistical features: “number of connections to the same host as the current connection in the past two seconds”…
     • Label ⇒ “normal” or the name of the attack
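The record layout on this slide can be illustrated with a minimal parsing sketch. It assumes each record is a single comma-separated line with 41 feature fields followed by a label that ends in a period, as in the samples above. The helper names `parse_record` and `to_number` are hypothetical and are not part of the data set or its tooling.

```python
# One sample record copied from the slide: 41 features followed by a label.
RECORD = ("5,tcp,smtp,SF,959,337,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,144,192,0.70,0.02,0.01,"
          "0.01,0.00,0.00,0.00,0.00,normal.")

N_FEATURES = 41  # dimensionality stated on the slide


def parse_record(line):
    """Split one record into (features, label); hypothetical helper."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) != N_FEATURES + 1:
        raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(fields)}")
    *features, label = fields
    # Labels in the samples carry a trailing period ("normal.").
    return features, label.rstrip(".")


def to_number(value):
    """Convert a field to float when possible, else keep it as a string."""
    try:
        return float(value)
    except ValueError:
        return value


features, label = parse_record(RECORD)
values = [to_number(f) for f in features]
print(len(values), label)  # feature count and label of the sample
print(values[:4])          # first fields: duration, protocol, and two more symbolic fields
```

Fields such as `tcp` or `SF` stay as strings here; how the categorical fields are encoded before mining is left open, since the slides do not specify it.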

  8. KDD Cup 99 Data Set
     • Problems
       • Attacks
         • Cannot reflect current malicious activities
         • Stealthy scan ⇒ short time interval, no multiple IP address scan
         • No attacks against Windows machines
       • Protocol types
         • Only TCP, UDP, ICMP
         • Cannot detect attacks such as ARP spoofing
       • Simplicity
         • Only 3 real victim hosts
         • Thousands of virtual hosts and hundreds of user automata (custom software)

  9. Our Experimental Results
     • PCA (Principal Component Analysis)
       • A technique for reducing the dimensions of a data set
       • Transforms the data to a new coordinate system
     • What we learn from PCA
       • The number of dimensions actually required to represent the original data
     • Cumulative contribution ratio
       • Indicates what percentage of the original data can be represented
       • For example, 2 dimensions ⇒ 90%: two dimensions represent 90% of the original data
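The cumulative (accumulative) contribution ratio described above can be sketched with NumPy. This is an illustration under stated assumptions, not the authors' experiment: the data is synthetic, PCA is computed from the covariance matrix without feature scaling, and the function names `cumulative_contribution_ratio` and `components_needed` are hypothetical.

```python
import numpy as np


def cumulative_contribution_ratio(X):
    """Cumulative share of total variance captured by the top-k principal components."""
    centered = X - X.mean(axis=0)               # center each feature
    cov = np.cov(centered, rowvar=False)        # feature-by-feature covariance
    eigvals = np.linalg.eigvalsh(cov)[::-1]     # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)       # guard against tiny negative round-off
    return np.cumsum(eigvals) / eigvals.sum()


def components_needed(ratios, threshold=0.90):
    """Smallest number of components whose cumulative ratio reaches the threshold."""
    return int(np.searchsorted(ratios, threshold) + 1)


# Synthetic data (an assumption for illustration): the third column is the
# sum of the first two, so the data really occupies only two dimensions.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + b])

ratios = cumulative_contribution_ratio(X)
print(ratios)                     # one value per number of retained components
print(components_needed(ratios))  # dimensions needed to reach 90%
```

If only a few components already reach a high ratio on a benchmark, most of its dimensions carry little additional information, which is the kind of observation the slide refers to.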

  10. Our Experimental Results
     • There is no guarantee that their performance will also be good in a real environment

  11. Our Proposal
     • New benchmark data
       • IDS
       • Honeypots
     • Privacy problems
       • Sanitize IP addresses
       • Remove payload data
     • Goal
       • Comparative analysis of IDS alerts and honeypot traffic data
       • Detect the attacks that are missed by the IDS
     • Slide annotations: KDD Cup 99 format, open, updated every month
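The two privacy steps listed above can be sketched as follows. The slides do not say how addresses are sanitized, so the keyed-hash pseudonymization here is a swapped-in technique chosen for illustration. The record fields (`src_ip`, `dst_ip`, `payload`) and the function names are likewise hypothetical.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ip(ip, key):
    """Map an IP address to a stable pseudonym using HMAC-SHA256 (assumed scheme)."""
    packed = ipaddress.ip_address(ip).packed   # also validates the address
    digest = hmac.new(key, packed, hashlib.sha256).hexdigest()
    return "ip-" + digest[:12]


def sanitize_record(record, key):
    """Return a copy of the record with the payload removed and IPs pseudonymized."""
    clean = {k: v for k, v in record.items() if k != "payload"}
    for field in ("src_ip", "dst_ip"):         # hypothetical field names
        if field in clean:
            clean[field] = pseudonymize_ip(clean[field], key)
    return clean


secret = b"replace-with-a-private-key"         # placeholder key for the example
raw = {"src_ip": "192.0.2.10", "dst_ip": "192.0.2.20",
       "dst_port": 80, "payload": b"GET / HTTP/1.0"}
print(sanitize_record(raw, secret))
```

A keyed hash keeps the same host mapped to the same pseudonym across records, so per-host statistics remain computable, while the original address is not published.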

  12. Thank you for your attention!