Analyzing Anomaly Detection Systems Through Benchmarking

Benchmarking Anomaly-Based Detection Systems Written by: Roy A. Maxion Kymie M.C.Tan Presented by: Yi Hu

Agenda • Introduction • Benchmarking Approach • Structure in categorical data • Constructing the benchmark datasets • Experiment one • Experiment two • Conclusion & Suggestion

Introduction • Application of detection of anomaly; • Problems; • Difference in data regularities • Environment variation

Benchmarking Approach • Methodology that can provide quantitative results of running an anomaly detector on various datasets containing different structure. • Address: environment variation - structuring of the data

Structure in Categorical Data • Perfect regularity and perfect randomness(0—perfect regularity; 1—perfect randomness) • Entropy to measure the randomness

Benchmarking datasets • Training data(background) • Testing data(background+anomaly) • Anomaly data

Benchmarking datasets (cont’d) • Defining the sequence ; • Alphabet symbols.(English) • Alphabet size.(2,4,6,8,10– 5 suites) • Regularity.(0~1 at 0.1 intervals) • Sequence length.(All datasets-500,000 characters)

Defining the anomalies • Anomalies: • Foreign-symbol anomalies;(Q from A,B,C,D,E) • Foreign n-gram anomalies;(CC, not the input of A,B,C,D , but it is the bi-gram of datasets) • Rare n-gram anomalies;(Usually <0.05)

Generating the training and test data • 500,000 random numbers in table • 11 transition matrices used to produce the desired regularities. • Regularity indices between 0~1, with .1 increments

Generating the anomaly • Independent of generating the test data. • Each of the anomaly types is generated in a different way.

Injecting the anomalies into test data • The system determines the maximum number of anomalies.(Not more than .24% un-injected data.) • Select the injection intervals.

Experiment one: • Data sets: • Training dataset with rare-4-gram anomalies less than 5% occurrence; • All variables were held constant except for dataset regularity; • Total 275 benchmark datasets, 165 of which were anomaly-injected;

Experiment one: • steps: • Training the detector—11 training datasets and 55 training session are conducted; • Testing the detector—For each of the 5 alphabet sizes, the detector was run on 33 test datasets, 11 for each anomaly type. • Scoring the detection outcomes—event outcomes; ground truth;threshold; scope and presentation of results

Experiment one: • ROC analysis: • Relative operating characteristic curve; • Compare two aspects of detection systems: hits---Y axis and false alarm--- X axis

Experiment one: • Results: None of the curves overlap until they reach the 100% hit rate, demonstrating that regularity does influence detector performance. If regularity had no effect, all the ROC curve will superimpose each others.

Experiment one: • Results: The false alarm rate rises as the regularity index grows(data become more and more random) also shows regularity do affect the detection performance.

Experiment two • Natural Dataset: Y-axis: regularity index X-axis: users Data are taken from an undergraduate student computer. This diagram demonstrate clearly that regularity is a characteristic of natural data;

Conclusion • In the experiments conducted here, all variables were held constant except regularity, and it was established that a strong relationship exist between detector accuracy and regularity. • An anomaly detector cannot be evaluated on the basis of its performance on a dataset of one regularity. • Different regularity occur not only between different users and environment, but also within user sessions.

Suggestion • Overcoming this obstacle may require a mechanism to swap anomaly detectors or change the parameters of the current anomaly detector whenever regularity changes.

Analyzing Anomaly Detection Systems Through Benchmarking

Analyzing Anomaly Detection Systems Through Benchmarking

Presentation Transcript

Global Router-based Anomaly/Intrusion Detection (GRAID) Systems

Anomaly Detection

Network Payload-based Anomaly Detection and Content-based Alert Correlation

Anomaly Detection

Benchmarking Anomaly-based Detection Systems

Anomaly Detection Systems

Signature Based and Anomaly Based Network Intrusion Detection

ecs236 Winter 2007: Intrusion Detection #2: Explanation-based Anomaly Detection

Rule-Based Anomaly Detection on IP Flows

Traffic Anomaly Detection

An Algorithm for Anomaly-based Botnet Detection

An Auctioning Reputation System Based on Anomaly Detection

ELISHA: A Visual-Based Anomaly Detection System

Anomaly Detection Systems

Volume Anomaly Detection

PANACEA: AUTOMATING ATTACK CLASSIFICATION FOR ANOMALY-BASED NETWORK INTRUSION DETECTION SYSTEMS

Rule-based Anomaly Detection on IP Flows

Anomaly Detection of Web-based Attacks

ITEC 810 Entropy based anomaly detection systems

RAIDM: Router-based Anomaly/Intrusion Detection and Mitigation

Rule-Based Anomaly Detection on IP Flows

Benchmarking Anomaly-based Detection Systems