
INFOCOM’2011 Shanghai, China

Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection

INPUT: multiple simple atomic detectors

OUTPUT: optimization-based combination mostly consistent with all atomic detectors

Jing Gao1, Wei Fan2, Deepak Turaga2, Olivier Verscheure2, Xiaoqiao Meng2,

Lu Su1,Jiawei Han1

1 Department of Computer Science

University of Illinois

2 IBM TJ Watson Research Center

Network Traffic Anomaly Detection

(Diagram: traffic flows from computers across the network; the task is to decide whether each piece of network traffic is anomalous or normal.)

Challenges
  • Normal behavior can be too complicated to describe.
  • Some normal data can look very similar to true anomalies.
  • Labeling current anomalies is expensive and slow.
  • Network attacks adapt themselves continuously: what we knew in the past may not work today.
The Problem
  • Simple rules (or atomic rules) are relatively easy to craft.
  • Problem:
    • there can be far too many simple rules
    • each rule can have a high false-alarm (false-positive) rate
  • Challenge: can we find a non-trivial combination (per event, per detector) that significantly improves accuracy?
Why Combine Detectors?

(Chart: three atomic detectors thresholding Count and Entropy over the ranges 0.1–0.5, 0.3–0.7, and 0.5–0.9, compared against the true label; each individual detector raises too many alarms.)

Combined view is better than individual views!

Combining Detectors
  • Combining detectors is non-trivial:
    • We aim to find a consolidated solution without any knowledge of the true anomalies (unsupervised).
    • But we can improve it with limited supervision, and incrementally (semi-supervised and incremental).
    • We do not know which atomic detectors are better and which are worse.
    • At any given moment, the right answer may be a non-trivial, dynamic combination of atomic detectors.
    • There may be more bad base detectors than good ones, so majority voting cannot work.
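The last point is easy to see with a toy example (entirely ours, not from the slides): one accurate detector outvoted by two systematically wrong ones.

```python
# Toy example: majority voting fails when bad detectors outnumber good ones.
truth = [1, 0, 1, 0, 1, 0]           # 1 = anomaly, 0 = normal
good = truth[:]                      # one detector that is always right
bad1 = [1 - y for y in truth]        # two detectors that are always wrong
bad2 = [1 - y for y in truth]

def majority_vote(votes):
    """Label 1 when more than half of the detectors say 1."""
    return [1 if sum(v) * 2 > len(v) else 0 for v in votes]

combined = majority_vote(list(zip(good, bad1, bad2)))
accuracy = sum(c == t for c, t in zip(combined, truth)) / len(truth)
print(accuracy)  # 0.0 -- the vote follows the wrong majority every time
```

Any scheme that weights detectors by their agreement with the overall consensus faces the same trap, which is why the detectors' quality must be estimated jointly with the records' labels.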
Problem Formulation

Which one is an anomaly?

            A1   A2   …   Ak-1   Ak
Record 1     Y    N   …    N      N
Record 2     N    Y   …    Y      N
Record 3     Y    N   …    N      N
Record 4     Y    Y   …    N      Y
Record 5     N    N   …    Y      Y
Record 6     N    N   …    N      N
Record 7     N    N   …    N      N

  • Combine atomic detectors into one!
  • We propose a non-trivial combination.
  • Consensus: mostly consistent with all atomic detectors, via an optimization-based framework.

How to Combine Atomic Detectors?
  • Linear models
    • As long as one detector is correct, there always exist weights that combine them linearly.
    • Question: how do we find these weights?
    • Per example and per detector
      • Different from majority voting and model averaging
  • Principles
    • Consensus considers performance over a set of examples and weights each detector by its performance relative to the others, i.e., examples are no longer treated as i.i.d.
    • Consensus: mostly consistent among all atomic detectors.
    • Atomic detectors are better than random guessing and systematic flipping.
    • Atomic detectors should be weighted according to their detection performance.
    • Records should be ranked by their probability of being an anomaly.
  • Algorithm
    • Reach consensus among multiple atomic anomaly detectors:
      • unsupervised
      • semi-supervised
      • incremental
    • Automatically derive weights for detectors and records, per detector and per event: no single weight works for all situations.
Framework

(Diagram: a bipartite graph. Record nodes i on one side, detector nodes A1, …, Ak on the other, connected by adjacency edges. Each node carries a probability of being (anomaly, normal); the detector side starts from the initial probabilities [1 0] and [0 1].)

Objective

(Diagram: the same bipartite graph of records and detectors A1, …, Ak, with initial labels [1 0] and [0 1].)

Minimize disagreement:
  • a record connected to a detector should have a similar probability of being an anomaly
  • probabilities should not deviate much from the initial probabilities

Methodology

(Diagram: on the bipartite graph of records and detectors A1, …, Ak, iterate until convergence, alternately updating the detector probabilities and the record probabilities.)
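The alternating update (detector probabilities, then record probabilities, iterated until convergence) can be sketched in code. This is our own simplified, unsupervised illustration, not the paper's exact update rules: each detector contributes an "anomaly" group node anchored to [1 0] and a "normal" group node anchored to [0 1]; the variable names, the toy vote matrix, and the anchor weight `alpha` are all illustrative assumptions.

```python
import numpy as np

# votes[i, j] = 1 if detector j flags record i as anomalous, else 0.
votes = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)
n_rec, n_det = votes.shape

# Each detector contributes two group nodes: its "anomaly" output,
# anchored to [1, 0], and its "normal" output, anchored to [0, 1].
adj = np.hstack([votes, 1.0 - votes])          # records x (2 * n_det) groups
grp_init = np.vstack([np.tile([1.0, 0.0], (n_det, 1)),
                      np.tile([0.0, 1.0], (n_det, 1))])

alpha = 1.0                                    # pull toward the initial labels
rec = np.full((n_rec, 2), 0.5)                 # records start undecided [0.5, 0.5]
for _ in range(50):                            # iterate until convergence
    # Group update: average of connected records, anchored to the initial label.
    grp = (adj.T @ rec + alpha * grp_init) / (adj.sum(0)[:, None] + alpha)
    # Record update: average of the connected group probabilities.
    rec = (adj @ grp) / adj.sum(1)[:, None]

anomaly_score = rec[:, 0]                      # consensus P(anomaly) per record
```

Records flagged by many detectors (the fourth row, with three Y votes) end up with a high anomaly probability, while records flagged by none converge toward normal.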

Propagation Process

(Diagram: a numeric example of the propagation. Detector and record nodes start from their initial probabilities ([1 0], [0 1], [0.7 0.3], or the undecided [0.5 0.5]) and converge to values such as [0.357 0.643], [0.5285 0.4715], [0.6828 0.3172], and [0.7514 0.2486].)

Consensus Combination Reduces Expected Error
  • Detector A
    • has prior probability P(A)
    • outputs P(y|x, A) for record x, where y = 0 (normal) or y = 1 (anomalous)
  • Expected error of a single detector
  • Expected error of the combined detector
  • The combined detector has a lower expected error
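The claim follows from the convexity of the squared loss; a sketch of the standard argument in our own notation (the paper's exact expressions may differ):

```latex
% Per-record squared error of a single detector A:
e_A(x) = \sum_{y \in \{0,1\}} \bigl(P(y \mid x) - P(y \mid x, A)\bigr)^2
% Combined detector: average the detector outputs,
\bar{P}(y \mid x) = \sum_{A} P(A)\, P(y \mid x, A)
% By Jensen's inequality (squared loss is convex in its second argument):
e_{\mathrm{comb}}(x)
  = \sum_{y} \bigl(P(y \mid x) - \bar{P}(y \mid x)\bigr)^2
  \le \sum_{A} P(A)\, e_A(x)
```

So the combined detector's expected error is at most the P(A)-weighted average of the single-detector errors.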
Extensions
  • Semi-supervised
    • Know the labels of a few records in advance
    • Improve the performance of the combined detector by incorporating this knowledge
  • Incremental
    • Records arrive continuously
    • Incrementally update the combined detector
Incremental

(Diagram: when a new record arrives, it is attached to the bipartite graph of records and detectors A1, …, Ak, and the detector and record probabilities are updated.)

Semi-supervised

(Diagram: the same bipartite graph and iteration until convergence, except that a few record nodes are labeled in advance and the rest remain unlabeled.)

Benchmark Data Sets
  • IDN
    • Data: a sequence of events (DoS flood, SYN flood, port scanning, etc.) partitioned into intervals
    • Detector: threshold two high-level measures describing the probability of observing events during each interval
  • DARPA
    • Data: a series of TCP connection records collected by MIT Lincoln Labs; each record contains 34 continuous derived features, including duration, number of bytes, and error rate
    • Detector: randomly select a subset of features and apply an unsupervised distance-based anomaly detection algorithm
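One common unsupervised distance-based score is the distance to a record's k-th nearest neighbor. A sketch of such an atomic detector on a random feature subset (the helper name, toy data, and choice of k are ours, not necessarily what the paper used):

```python
import numpy as np

# Atomic detector sketch: pick a random subset of the 34 features, then
# score each record by its distance to its k-th nearest neighbor (a common
# unsupervised distance-based anomaly score).
rng = np.random.default_rng(0)

def knn_score(X, k=3):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    return np.sort(d, axis=1)[:, k]   # column 0 is the distance to self (0)

X = rng.normal(size=(50, 34))         # 50 records, 34 derived features
X[0] += 10.0                          # plant one obvious outlier

feats = rng.choice(X.shape[1], size=8, replace=False)  # random feature subset
scores = knn_score(X[:, feats])
print(scores.argmax())                # the planted record ranks highest
```

Running many such detectors with different random subsets yields the heterogeneous atomic detectors that the consensus framework then combines.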
Benchmark Datasets
  • LBNL
    • Data: an enterprise traffic dataset collected at the edge routers of the Lawrence Berkeley National Lab; the packet traces were aggregated into 1-minute intervals
    • Detector: threshold six metrics, including the number of TCP SYN packets, the number of distinct source or destination IPs, the maximum number of distinct IPs that any single source or destination IP has contacted, and the maximum pairwise distance between the distinct IPs an IP has contacted
Experiment Setup
  • Baseline methods
    • base detectors
    • majority voting
    • consensus maximization
    • semi-supervised (2% labeled)
    • stream (30% batch, 70% incremental)
  • Evaluation measure
    • area under the ROC curve (AUC; ranges 0–1, 1 is best)
    • ROC curve: the trade-off between detection rate and false-alarm rate
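AUC can be computed directly as a rank statistic, without tracing the ROC curve; a small sketch (the function name and toy inputs are ours):

```python
# AUC as a rank statistic: the probability that a randomly chosen anomaly
# scores higher than a randomly chosen normal record (ties count 1/2).
# This equals the area under the ROC curve.
def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0: perfect ranking
print(auc([0.1, 0.9, 0.3, 0.8], [1, 1, 0, 0]))  # 0.5: no better than chance
```

Because AUC depends only on the ranking of the scores, it suits methods like this one that output a probability of being an anomaly per record.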
AUC on Benchmark Data Sets

(Chart: AUC of the worst, best, and average atomic detectors, majority voting, and the unsupervised, semi-supervised, and incremental versions of consensus combination.)

Consensus combination improves anomaly detection performance!

Stream Computing

Continuous ingestion

Continuous, complex analysis at low latency

Conclusions
  • Consensus combination
    • combines multiple atomic anomaly detectors into a more accurate one, in an unsupervised way
  • We give
    • a theoretical analysis of the error reduction from detector combination
    • extensions of the method to incremental and semi-supervised learning scenarios
    • experimental results on three network traffic datasets
Thanks!
  • Any questions?

Code available upon request