Explore the methodology and results of benchmarking anomaly-based detection systems on various datasets to understand the impact of data regularities on detection performance. The experiments delve into dataset structure, injection of anomalies, and evaluation metrics for detecting anomalies effectively. Key findings underline the significance of dataset regularity in influencing detector accuracy. Suggestions include adapting detectors to changing regularity dynamics for robust anomaly detection.
Benchmarking Anomaly-Based Detection Systems • Written by: Roy A. Maxion and Kymie M. C. Tan • Presented by: Yi Hu
Agenda • Introduction • Benchmarking Approach • Structure in categorical data • Constructing the benchmark datasets • Experiment one • Experiment two • Conclusion & Suggestion
Introduction • Applications of anomaly detection • Problems: • Differences in data regularity • Environment variation
Benchmarking Approach • A methodology that provides quantitative results from running an anomaly detector on datasets with different structure. • Addresses environment variation, that is, differences in the structure of the data.
Structure in Categorical Data • Regularity is measured on a scale from 0 (perfect regularity) to 1 (perfect randomness) • Entropy is used to measure the randomness
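The slide does not say which entropy measure is used, so the following is only a minimal sketch under one assumption: the regularity index is taken to be the first-order conditional entropy of the next symbol given the previous one, divided by log2 of the alphabet size so that it falls between 0 and 1. The function name `regularity_index` is hypothetical.

```python
import math
from collections import Counter


def regularity_index(sequence, alphabet_size):
    """Normalized conditional entropy H(next | previous) of a symbol sequence.

    Assumption (not from the slides): first-order conditional entropy,
    scaled by log2(alphabet_size). 0 means perfectly regular, 1 means
    perfectly random.
    """
    if alphabet_size < 2 or len(sequence) < 2:
        raise ValueError("need at least 2 symbols and an alphabet of size >= 2")
    pair_counts = Counter(zip(sequence, sequence[1:]))  # (previous, next) pairs
    prev_counts = Counter(sequence[:-1])                # how often each symbol is a 'previous'
    total_pairs = len(sequence) - 1
    cond_entropy = 0.0
    for (prev, _next), n in pair_counts.items():
        p_pair = n / total_pairs                 # P(previous, next)
        p_next_given_prev = n / prev_counts[prev]  # P(next | previous)
        cond_entropy -= p_pair * math.log2(p_next_given_prev)
    return cond_entropy / math.log2(alphabet_size)


perfectly_regular = regularity_index("AB" * 100, alphabet_size=2)
all_transitions_equal = regularity_index("AABBA", alphabet_size=2)
```

A strictly alternating sequence scores 0 because each symbol fully determines the next, while a sequence in which every transition is equally likely scores 1.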
Benchmarking datasets • Training data (background) • Testing data (background + anomaly) • Anomaly data
Benchmarking datasets (cont’d) • Defining the sequence: • Alphabet symbols (English) • Alphabet size (2, 4, 6, 8, 10: 5 suites) • Regularity (0 to 1 at 0.1 intervals) • Sequence length (all datasets: 500,000 characters)
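Defining the anomalies • Anomalies: • Foreign-symbol anomalies (e.g., Q when the alphabet is A, B, C, D, E) • Foreign n-gram anomalies (e.g., the bigram CC: its symbols belong to the alphabet A, B, C, D, but the bigram itself does not occur in the training data) • Rare n-gram anomalies (n-grams that occur infrequently in the training data, usually with relative frequency < 0.05)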
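The three anomaly types can be told apart with a simple frequency count over the training data. The sketch below is illustrative only: `classify_ngram` and `ngrams` are hypothetical helper names, and the 0.05 cutoff is the "usually" value quoted on the slide.

```python
from collections import Counter


def ngrams(sequence, n):
    """All overlapping n-grams of a string, in order."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def classify_ngram(candidate, training, rare_threshold=0.05):
    """Label a candidate n-gram relative to the training data."""
    alphabet = set(training)
    # Foreign symbol: at least one symbol never seen in training.
    if any(symbol not in alphabet for symbol in candidate):
        return "foreign-symbol"
    grams = ngrams(training, len(candidate))
    counts = Counter(grams)
    # Foreign n-gram: known symbols, but this combination never occurs.
    if candidate not in counts:
        return "foreign-n-gram"
    # Rare n-gram: occurs, but below the relative-frequency threshold.
    if counts[candidate] / len(grams) < rare_threshold:
        return "rare-n-gram"
    return "common"


training = "ABCDE" * 20
labels = [classify_ngram(g, training) for g in ("Q", "CC", "AB")]
rare_label = classify_ngram("AA", "AB" * 50 + "AA")
```

Here `Q` is a foreign symbol, `CC` is a foreign bigram, `AB` is common, and `AA` in the second training string is rare because it occurs once among 101 bigrams.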
Generating the training and test data • A table of 500,000 random numbers • 11 transition matrices are used to produce the desired regularities • Regularity indices range from 0 to 1 in increments of 0.1
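Generating a sequence from a transition matrix can be sketched as a first-order Markov chain. The slides do not show how the 11 matrices for the intermediate regularities were built, so the example covers only the two extremes; `generate_sequence`, `cyclic_matrix`, and `uniform_matrix` are hypothetical names, and Python's seeded generator stands in for the random-number table.

```python
import random


def generate_sequence(alphabet, transition, length, seed=0):
    """Walk a first-order Markov chain and return the emitted symbols."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    index = 0                  # start at the first symbol of the alphabet
    out = []
    for _ in range(length):
        out.append(alphabet[index])
        # Row `index` of the matrix holds the next-symbol probabilities.
        index = rng.choices(range(len(alphabet)), weights=transition[index])[0]
    return "".join(out)


def cyclic_matrix(size):
    """Each symbol is always followed by the next one: perfect regularity."""
    return [[1.0 if j == (i + 1) % size else 0.0 for j in range(size)]
            for i in range(size)]


def uniform_matrix(size):
    """Every symbol is equally likely to follow: perfect randomness."""
    return [[1.0 / size] * size for _ in range(size)]


regular = generate_sequence("ABCD", cyclic_matrix(4), 8)
random_like = generate_sequence("ABCD", uniform_matrix(4), 1000)
```

Matrices for the intermediate regularity indices would lie between these two extremes.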
Generating the anomalies • Anomalies are generated independently of the test data. • Each of the anomaly types is generated in a different way.
Injecting the anomalies into test data • The system determines the maximum number of anomalies (no more than 0.24% of the un-injected data). • Select the injection intervals.
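The injection step can be sketched as follows, under assumptions the slide does not confirm: the 0.24% limit is read as a cap on the number of injected symbols, anomalies overwrite background symbols rather than being inserted between them, and the injection points are evenly spaced. `inject_anomalies` is a hypothetical name, and the cap is passed as 24 per 10,000 to keep the arithmetic in integers.

```python
def inject_anomalies(background, anomaly, max_per_10000=24):
    """Overwrite evenly spaced stretches of `background` with `anomaly`.

    Returns the injected sequence and the start positions (ground truth).
    """
    # Symbol budget: at most 0.24% of the un-injected data by default.
    budget = len(background) * max_per_10000 // 10000
    count = budget // len(anomaly)  # maximum number of whole anomalies
    if count == 0:
        return background, []
    interval = len(background) // (count + 1)  # spacing between injections
    symbols = list(background)
    positions = []
    for k in range(1, count + 1):
        start = k * interval
        symbols[start:start + len(anomaly)] = anomaly  # same-length overwrite
        positions.append(start)
    return "".join(symbols), positions


injected, positions = inject_anomalies("A" * 10000, "QQQQ")
```

The returned positions serve as the ground truth against which detector alarms can later be scored.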
Experiment one: • Datasets: • Rare-4-gram anomalies are 4-grams with less than 5% occurrence in the training dataset; • All variables were held constant except for dataset regularity; • A total of 275 benchmark datasets, 165 of which were anomaly-injected.
Experiment one: • Steps: • Training the detector: 11 training datasets; 55 training sessions were conducted. • Testing the detector: for each of the 5 alphabet sizes, the detector was run on 33 test datasets, 11 for each anomaly type. • Scoring the detection outcomes: event outcomes; ground truth; threshold; scope and presentation of results
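The counts quoted for experiment one follow from the benchmark dimensions given earlier in the deck (5 alphabet sizes, 11 regularity levels, 3 anomaly types). The short check below just reproduces that arithmetic; the variable names are illustrative.

```python
alphabet_sizes = [2, 4, 6, 8, 10]
regularity_levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
anomaly_types = ["foreign-symbol", "foreign-n-gram", "rare-n-gram"]

# One training session per (alphabet size, regularity level) pair.
training_sessions = len(alphabet_sizes) * len(regularity_levels)

# Per alphabet size: one test dataset per (anomaly type, regularity level).
tests_per_alphabet_size = len(anomaly_types) * len(regularity_levels)

# Across all alphabet sizes: the anomaly-injected datasets.
anomaly_injected_datasets = len(alphabet_sizes) * tests_per_alphabet_size

print(training_sessions, tests_per_alphabet_size, anomaly_injected_datasets)
```

This gives 55 training sessions, 33 test datasets per alphabet size, and 165 anomaly-injected datasets. The slides do not break down the remaining 275 - 165 = 110 datasets.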
Experiment one: • ROC analysis: • Relative operating characteristic curve; • Compares two aspects of detection systems: hit rate (Y axis) and false-alarm rate (X axis)
Experiment one: • Results: None of the curves overlap until they reach the 100% hit rate, demonstrating that regularity does influence detector performance. If regularity had no effect, all the ROC curves would be superimposed on one another.
Experiment one: • Results: The false-alarm rate rises as the regularity index grows (that is, as the data become more random), which also shows that regularity affects detection performance.
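An ROC curve of this kind can be traced by sweeping the detection threshold and recording the hit rate and false-alarm rate at each setting. The following is a generic sketch, not the scoring procedure from the paper; `roc_points` and the sample scores are made up for illustration.

```python
def roc_points(scores, labels, thresholds):
    """Return (false_alarm_rate, hit_rate) for each threshold.

    scores: detector output per event (higher means more anomalous).
    labels: ground truth per event, 1 for anomaly and 0 for normal.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one anomaly and one normal event")
    points = []
    for t in thresholds:
        # An event raises an alarm when its score reaches the threshold.
        hits = sum(1 for s, y in zip(scores, labels) if y and s >= t)
        false_alarms = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        points.append((false_alarms / negatives, hits / positives))
    return points


scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
curve = roc_points(scores, labels, thresholds=[1.0, 0.5, 0.2, 0.0])
```

Lowering the threshold moves the operating point from (0, 0) toward (1, 1). Plotting one such curve per regularity level gives the kind of comparison described on this slide.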
Experiment two • Natural dataset: data are taken from an undergraduate student computer. • [Figure: regularity index (Y axis) by user (X axis)] • The diagram clearly demonstrates that regularity is a characteristic of natural data.
Conclusion • In the experiments conducted here, all variables were held constant except regularity, and it was established that a strong relationship exists between detector accuracy and regularity. • An anomaly detector cannot be evaluated on the basis of its performance on a dataset of a single regularity. • Different regularities occur not only between different users and environments, but also within user sessions.
Suggestion • Overcoming this obstacle may require a mechanism to swap anomaly detectors or change the parameters of the current anomaly detector whenever regularity changes.
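One way to picture such a mechanism is a lookup that maps the regularity index measured over successive windows to a detector configuration, and switches whenever the index crosses a boundary. The slides propose the idea only in general terms, so everything below is a hypothetical sketch: the profile names, the boundaries, and the functions `profile_for` and `plan_switches` are all assumptions.

```python
def profile_for(index, profiles):
    """Pick the first profile whose upper bound covers the regularity index.

    profiles: list of (upper_bound, name), sorted by upper_bound.
    """
    for upper_bound, name in profiles:
        if index <= upper_bound:
            return name
    return profiles[-1][1]  # fall back to the last profile


def plan_switches(window_indices, profiles):
    """Return (window_number, profile_name) at each point where the profile changes."""
    switches = []
    current = None
    for position, index in enumerate(window_indices):
        name = profile_for(index, profiles)
        if name != current:
            switches.append((position, name))
            current = name
    return switches


# Hypothetical profiles and per-window regularity indices.
profiles = [(0.3, "strict"), (0.7, "moderate"), (1.0, "lenient")]
switches = plan_switches([0.1, 0.2, 0.5, 0.6, 0.9, 0.2], profiles)
```

In a real system each profile name would stand for a different detector or a different parameter set for the same detector.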