COD (Cluster Onset Detection): Online Temporal Clustering for Outbreak Detection. Tomas Singliar (U. Pitt.), Denver H. Dash (Intel Research, U. Pitt.). AAAI’07 (American Association for AI National Conference).


COD (Cluster Onset Detection): Online Temporal Clustering for Outbreak Detection

Tomas Singliar (U. Pitt.),

Denver H. Dash (Intel Research, U. Pitt.)

AAAI’07 (American Association for AI National Conference)



Reference

  • When Gossip is Good: Distributed Probabilistic Inference for Detection of Slow Network Intrusions

    • Denver H. Dash et al.

    • AAAI’06

  • COD: Online Temporal Clustering for Outbreak Detection

    • Tomas Singliar, Denver H. Dash

    • AAAI’07

Speaker: Li-Ming Chen



Challenge: Slowly Propagating Attacks

  • Worm attacks tend toward two opposite extremes:

    • 1. Much faster, to allow rapid spread

    • 2. Much slower, to avoid detection

  • Most existing detection techniques rely on the fact that worms reproduce quickly

  • Slow propagation attacks

    • Difficult to detect – under the veil of normal network traffic

    • Still dangerous – can propagate exponentially


Other Challenges

  • Global Infection:

    • IDSes (individual entities) can see only a partial picture of the worm’s network-wide behavior

    • → Requires collaborative detection (AAAI’06)

  • Homogeneity assumption:

    • Detection techniques treat the population as a monolithic entity

    • → However, hosts and detectors (collaborators) are not always homogeneous (AAAI’07)



Architecture Model

[Figure: a Global Detector (GD) connected to several “weak” host-based Local Detectors (LDs)]

  • Global Detector (GD):

    • Aggregates messages from the LDs

    • Performs probabilistic inference to determine whether an infection is present

  • Concept of collaborative detection:

    • LDs (designed to be weak but general classifiers) may raise false alarms at a relatively high frequency

    • The GD can combine the LDs’ weak information to infer the existence of an attack

    • Where to place the GDs in the network?

      • Centralized/distributed placement


Paper 1

  • When Gossip is Good: Distributed Probabilistic Inference for Detection of Slow Network Intrusions

    • Denver H. Dash et al.

    • AAAI’06


Architecture



About the “Weak” LDs

  • A binary classifier: normal or abnormal

  • Detects by heuristic:

    • Counts the number of new outgoing connections to unique destination addresses and ports

    • Observation → see figure

    • For slow worm detection, the threshold is set to 4 (CPI)

  • The space of LDs:

    • Inward-looking

    • Outward-looking

[Figure: distribution of connection counts within 37 hosts, with the LD threshold predefined as 4 (CPI) and the propagation rates of previous worms (Blaster, Slapper, CR2, Slammer, Witty) marked]

  • Observing 37 hosts for 5 weeks gives 37*5*7*24*60*60/50 = 2,237,760 observations, from which the distribution is computed
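The LD heuristic and the observation count above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `ld_fires` and `total_observations` and the `(timestamp, address, port)` tuple layout are assumptions, and "new" connections are approximated as destinations that are unique within one interval.

```python
from typing import Iterable, Tuple

INTERVAL_SECONDS = 50  # interval length used on the slides
LD_THRESHOLD = 4       # connections-per-interval threshold from the slides

# Assumed record layout: (timestamp in seconds, destination address, destination port)
Connection = Tuple[float, str, int]


def ld_fires(connections: Iterable[Connection], interval_start: float,
             threshold: int = LD_THRESHOLD) -> bool:
    """Return True if the number of unique destinations in the interval exceeds the threshold."""
    interval_end = interval_start + INTERVAL_SECONDS
    unique_destinations = {
        (addr, port)
        for ts, addr, port in connections
        if interval_start <= ts < interval_end
    }
    return len(unique_destinations) > threshold


def total_observations(hosts: int, weeks: int,
                       interval_seconds: int = INTERVAL_SECONDS) -> int:
    """Number of per-host intervals observed, following the slide's arithmetic."""
    return hosts * weeks * 7 * 24 * 60 * 60 // interval_seconds
```

Calling `total_observations(37, 5)` reproduces the slide's figure of 2,237,760 observations.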


4 possible GD models

  • Traditional collaborative counting schemes:

    • PosCount

      • Tests whether Σ(positive counts) > threshold or not

    • CuSum

      • Detect changes in the trend of a statistic

  • DBN-based schemes:

    • CP-DBN

      • A simplified causal model

      • Models an attack as occurring uniformly across the population or not at all

    • E-DBN

      • Models the dynamics of a system that is being swept by an epidemic outbreak
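The two counting schemes can be illustrated with a short Python sketch. The CuSum shown here is the textbook one-sided form; the exact variant and parameters used in the paper are not given in this transcript, so `expected` and `slack` are assumed inputs.

```python
from typing import List, Sequence


def pos_count_alarm(ld_states: Sequence[int], threshold: int) -> bool:
    """PosCount: alarm if the number of LDs currently firing exceeds the threshold."""
    return sum(ld_states) > threshold


def cusum_scores(counts: Sequence[float], expected: float,
                 slack: float = 0.0) -> List[float]:
    """One-sided CUSUM: accumulate positive deviations from the expected count."""
    scores = []
    running = 0.0
    for count in counts:
        # The statistic grows while counts stay above expected + slack and resets at 0.
        running = max(0.0, running + (count - expected - slack))
        scores.append(running)
    return scores
```

A sustained upward shift in the counts makes the CuSum score climb steadily, while PosCount looks only at the current snapshot.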


How do GDs work?

  • Input of a GD: L_t, a binary subset of LD observations at time t

  • Output of a GD: S_t, some measure of how likely a global anomaly is to be occurring at time t

  • The system of GDs makes up an ensemble

    • Many ensemble techniques could be used

    • This paper uses only the max function to determine whether a global alarm should be raised
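As a minimal sketch of the max-based ensemble decision, assuming each GD reports a numeric score S_t and that a global alarm is raised when the largest score exceeds a cutoff (the cutoff `alarm_threshold` is an assumption, not a value from the slides):

```python
from typing import Sequence


def global_alarm(gd_scores: Sequence[float], alarm_threshold: float) -> bool:
    """Combine the outputs S_t of several GDs with max and compare to a threshold."""
    return max(gd_scores) > alarm_threshold
```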


How do GDs work? (cont’d)

  • Traditional collaborative counting schemes:

    • PosCount

      • Tests whether Σ(positive counts) > threshold or not

    • CuSum

      • Detect changes in the trend of a statistic

  • DBN-based schemes:

    • CP-DBN

      • A simplified causal model

      • Models an attack as occurring uniformly across the population or not at all

    • E-DBN

      • Models the dynamics of a system that is being swept by an epidemic outbreak


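CP-DBN

  • A_i ∈ {T, F}: whether an attack has taken place at time i (hidden states)

  • O_i^l ∈ {on, off}: whether LD l is on or off at time i (observable states)

[Figure: DBN unrolled over the observation time T, with hidden attack states and observable states for LD 0 through a total of M LDs; the links to the LD states are labeled with the TP rate and FP rate]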
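One way to read the CP-DBN is as a two-state hidden Markov model filtered forward in time. The sketch below makes several assumptions that are not stated on the slide: an attack persists once it has started, it begins with a per-step probability `p_onset`, and each of the M LDs fires independently with the TP rate under attack and the FP rate otherwise.

```python
from typing import List, Sequence


def cp_dbn_filter(fired_counts: Sequence[int], num_lds: int, tp_rate: float,
                  fp_rate: float, p_onset: float) -> List[float]:
    """Return P(attack at step i | observations up to step i) for each step."""
    p_attack = 0.0
    posteriors = []
    for fired in fired_counts:
        # Predict: an attack, once started, is assumed to persist.
        prior = p_attack + (1.0 - p_attack) * p_onset
        # Update: likelihood of seeing `fired` of the LDs on under each hypothesis.
        # The binomial coefficient is the same in both and cancels out.
        quiet = num_lds - fired
        like_attack = tp_rate ** fired * (1.0 - tp_rate) ** quiet
        like_normal = fp_rate ** fired * (1.0 - fp_rate) ** quiet
        numerator = prior * like_attack
        p_attack = numerator / (numerator + (1.0 - prior) * like_normal)
        posteriors.append(p_attack)
    return posteriors
```

With rates strictly between 0 and 1, the posterior stays low while few LDs fire and rises sharply once many LDs fire in the same step.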


E-DBN

  • To model the exponentially growing trend:

    • T denotes the observation time

    • A_t ∈ {0, 1}: the anomaly state at time t (hidden state)

    • N_t ∈ {0, …, N}: number of infected hosts (hidden state)

    • S is the spreading rate

    • O_t ∈ {0, …, N}: number of observed LDs that fired (observable state)

[Figure: DBN showing the state transitions between the unobserved state variables]


E-DBN (cont’d)

  • Assuming a worm attack, the growth in the number of infected hosts, ΔN_{t+1}, is modeled by a binomial (equation not preserved in this transcript)

  • The likelihood of o_t detectors firing when n_t hosts are infected is modeled by a binomial (equation not preserved in this transcript)

  • Slide annotations on the equations: “susceptible”, “chance of a hit”
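Since the slide's equations are not preserved here, the following Python sketch shows only one plausible form of binomial growth, under clearly labeled assumptions: at each step every still-susceptible host is infected independently with a "chance of a hit" that depends on how many hosts are already infected. The parameters `per_scan_hit` and `scans_per_step` are hypothetical and do not come from the paper.

```python
import random
from typing import List


def simulate_growth(total_hosts: int, initially_infected: int, per_scan_hit: float,
                    scans_per_step: int, steps: int, seed: int = 0) -> List[int]:
    """Simulate the number of infected hosts over time with binomial increments."""
    rng = random.Random(seed)
    infected = initially_infected
    history = [infected]
    for _ in range(steps):
        susceptible = total_hosts - infected
        # Assumed chance that a given susceptible host is hit by at least one scan.
        p_hit = 1.0 - (1.0 - per_scan_hit) ** (infected * scans_per_step)
        # Binomial draw: each susceptible host is infected independently.
        newly_infected = sum(1 for _host in range(susceptible) if rng.random() < p_hit)
        infected += newly_infected
        history.append(infected)
    return history
```

Because the hit probability grows with the infected count, early growth is roughly exponential and then saturates as susceptible hosts run out.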


How do DBN-based GDs work?

  • Given the DBN model, infer the anomaly A_m at the most likely time m, based on the observations from t-T to t

  • Then make the ensemble decision (using the max function)


Performance Evaluation

  • Parameters:

    • Spread rate S = 1 connection per 20 sec.

    • Address density = 1/1000 (ratio of vulnerable hosts)

    • LD threshold = 4 connections per 50 sec.

    • LDs communicate with the GD every 10 sec.

  • PosCount raises a detection only after the entire network is infected

[Figure: detection results, annotated with the desired FP rate and the “better” direction]


Paper 2

  • COD: Online Temporal Clustering for Outbreak Detection

    • Tomas Singliar, Denver H. Dash

    • AAAI’07


New Approach: COD (Cluster Onset Detection)

  • What to cluster?

    • Partition the population (e.g., hosts) into subgroups,

    • then COD tries to detect susceptible subgroups

  • Why clustering?

    • Traditional outbreak detection methods treat the population as a monolithic entity

    • Real populations are heterogeneous

      • Different subpopulations are susceptible to different degrees

    • Clustering can boost the signal-to-noise ratio for detection


COD Model – detection architecture

  • “Weak” host-based LDs

    • Periodically send their status to a GD

    • Use the same feature and rule:

      • Fire whenever the number of outgoing connections exceeds 4 in a 50 second interval

  • Centralized GD

    • Collects messages and determines whether the positive local detections corroborate each other

    • Periodically outputs a signal that represents its belief of infection being present


COD Model – data

[Figure: data matrix with one row per LD i and one column per time interval j; each entry is a sum of alarms, some of which might be false positives]

  • Dataset X

    • Row: X_i corresponds to a single LD i

    • Column: X_{*j} corresponds to the value of a feature function in a discrete time interval j

  • Use temporal stratified sampling

    • Each time interval has a fixed position

      • E.g., 12am-1am, 1am-2am, etc.

    • This accounts for the obvious diurnal behavior in the system
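The data layout can be sketched as follows. This is an illustration under assumptions: one-hour slots at fixed clock positions, alarms given as `(ld_id, timestamp)` pairs with timestamps in seconds from midnight of the first day, and counts from different days pooled into the same daily slot. The slides do not spell out the pooling, and `build_count_matrix` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

SECONDS_PER_HOUR = 3600
SLOTS_PER_DAY = 24  # assumed one-hour slots, e.g. 12am-1am, 1am-2am


def build_count_matrix(alarms: Iterable[Tuple[int, float]],
                       num_lds: int) -> List[List[int]]:
    """Return X where X[i][j] is the number of alarms from LD i in daily slot j."""
    matrix = [[0] * SLOTS_PER_DAY for _ in range(num_lds)]
    for ld_id, timestamp in alarms:
        # Map the timestamp to its fixed hour-of-day position.
        slot = int(timestamp // SECONDS_PER_HOUR) % SLOTS_PER_DAY
        matrix[ld_id][slot] += 1
    return matrix
```

Each row of the result is the feature vector for one LD, and each column corresponds to one fixed time-of-day interval.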


COD Model – clustering

  • Assuming that different classes generate their detections randomly at different rates, and that the counts can take a fairly large range of values, X_ij can be assumed to be Poisson distributed

  • Naïve Bayes clustering model

    • The NB features are the positive local detection counts X_ij arriving from a machine i during a time interval j

    • F() = sum(alarms) for each machine

    • In a time interval, an LD may fire several times
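Under the Poisson assumption, the Naïve Bayes likelihood of one machine's counts given a cluster is a product of Poisson probabilities, one per interval. A minimal Python sketch, assuming each cluster has one strictly positive rate per interval (the function names are illustrative):

```python
import math
from typing import Sequence


def poisson_log_pmf(count: int, rate: float) -> float:
    """log P(X = count) for a Poisson distribution with the given rate (> 0)."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def host_log_likelihood(counts: Sequence[int], cluster_rates: Sequence[float]) -> float:
    """Naive Bayes: intervals are independent given the cluster, so log-probabilities add."""
    return sum(poisson_log_pmf(x, lam) for x, lam in zip(counts, cluster_rates))
```

Working in log space avoids underflow when many intervals are multiplied together.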


COD Model – clustering (cont’d)

  • Some details:

    • How to determine the number m of clusters?

      • A greedy heuristic is used to find the best value

    • Nothing is said about λkjx

    • At the end of each interval,

      • The feature values are updated and the model is re-learned

    • How to cluster?

      • The posterior on the cluster variable M defines the assignment of local detectors to clusters (the equation is not preserved in this transcript)
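The posterior over the cluster variable follows from Bayes' rule: it is proportional to the cluster prior times the likelihood of the detector's counts under that cluster. The sketch below assumes the per-cluster log-priors and log-likelihoods are already available, and uses a hard assignment to the most probable cluster, which is an assumption rather than something the slide states.

```python
import math
from typing import List, Sequence


def cluster_posterior(log_priors: Sequence[float],
                      log_likelihoods: Sequence[float]) -> List[float]:
    """Bayes' rule in log space: P(M = k | x_i) is proportional to P(M = k) * P(x_i | M = k)."""
    joint = [lp + ll for lp, ll in zip(log_priors, log_likelihoods)]
    peak = max(joint)  # subtract the max before exponentiating to avoid underflow
    weights = [math.exp(value - peak) for value in joint]
    total = sum(weights)
    return [w / total for w in weights]


def assign_cluster(posterior: Sequence[float]) -> int:
    """Hard assignment: index of the most probable cluster."""
    return max(range(len(posterior)), key=lambda k: posterior[k])
```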


COD Model, example

[Figure: cluster assignment by host ID over time (hr), with the burn-in period marked]

  • A typical example of how the hosts in the dataset are assigned to clusters

  • 5 clusters (colors) and a 1-day burn-in period

  • Clusters are rather stable, and cluster membership rarely changes

  • At the end, most hosts have been infected


COD Model, demonstrating the daily pattern

[Figure: local detection count per time interval, by host ID over time (hr)]

  • Clustering → groups hosts according to the daily pattern of their local detection activity

  • 5 groups (two of which consist of a single host)

  • The grouping reflects the applications and habits of each host and can provide better estimates for detection


4-step Cluster Interpretation

  • Detect the “highly active” cluster (presumably infected):

    • 1. Compute the “average detection rate” for each host, based on the number of positive detections at host i

    • 2. Compute the “average (local) detection rate” for each cluster and identify the most active cluster

    • 3. Perform a one-sided, unbalanced-design t-test with the null hypothesis:

      • Host detection rates in the most active cluster and in the remainder of the population are the same

    • 4. Compare the outcome of the t-test to a historical histogram of values to determine whether the system is in an anomalous state
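The interpretation steps can be sketched in Python as below. The sketch assumes a pooled-variance two-sample t statistic (which allows unequal group sizes) and replaces the "historical histogram" comparison with a simple quantile cutoff; both choices are assumptions, since the transcript gives no formulas. Each group needs at least two hosts for the variance to be defined.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence


def most_active_cluster(host_rates: Sequence[float], assignments: Sequence[int]) -> int:
    """Return the cluster id with the highest average host detection rate."""
    by_cluster: Dict[int, List[float]] = {}
    for rate, cluster in zip(host_rates, assignments):
        by_cluster.setdefault(cluster, []).append(rate)
    return max(by_cluster, key=lambda c: mean(by_cluster[c]))


def pooled_t_statistic(active: Sequence[float], rest: Sequence[float]) -> float:
    """Two-sample t statistic with pooled variance; group sizes may differ."""
    n1, n2 = len(active), len(rest)
    pooled_var = ((n1 - 1) * variance(active) + (n2 - 1) * variance(rest)) / (n1 + n2 - 2)
    return (mean(active) - mean(rest)) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))


def is_anomalous(t_value: float, historical_t_values: Sequence[float],
                 quantile: float = 0.99) -> bool:
    """Flag the state as anomalous if t exceeds the chosen quantile of past values."""
    ranked = sorted(historical_t_values)
    cutoff = ranked[min(len(ranked) - 1, int(quantile * len(ranked)))]
    return t_value > cutoff
```

The test is one-sided because only a large positive t value, meaning the most active cluster is well above the rest, is treated as evidence of an outbreak.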


Experimental Evaluation

  • Some details in configuration:

    • Normal traffic trace: 5 weeks of traces from 37 hosts

    • Worm traffic is injected for testing

    • LDs send a message every 10 seconds

    • Metrics of interest: FAR, TTD (FI)

      • False Alarm Rate, Time To Detect, Fraction of Infection

      • Aim to control the FAR at 1 per week

    • Results are compared with E-DBN (the baseline)

    • The traffic trace is recycled to simulate more hosts

    • Observe the effects of the number of clusters, network size, and interval length


COD vs. E-DBN

  • AMOC: plots the expected time to detection (since the outbreak began) as a function of the false alarm rate

  • COD outperforms E-DBN (FI is reduced)

  • COD/adaptive performs better, but is more costly to run
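An AMOC curve can be traced by sweeping an alarm threshold and recording, for each value, the false alarm rate on attack-free data and the time to detection on outbreak data. The sketch below is a simplified illustration with assumed inputs: one series of detector scores from normal traffic and one from a single outbreak run starting at index 0. A real evaluation would average the detection time over many runs to obtain the expected value.

```python
from typing import List, Optional, Sequence, Tuple


def false_alarm_rate(normal_scores: Sequence[float], threshold: float) -> float:
    """Fraction of attack-free time steps on which the detector would alarm."""
    return sum(1 for s in normal_scores if s > threshold) / len(normal_scores)


def time_to_detect(outbreak_scores: Sequence[float], threshold: float) -> Optional[int]:
    """Steps from outbreak onset (index 0) to the first alarm, or None if never detected."""
    for step, score in enumerate(outbreak_scores):
        if score > threshold:
            return step
    return None


def amoc_points(normal_scores: Sequence[float], outbreak_scores: Sequence[float],
                thresholds: Sequence[float]) -> List[Tuple[float, Optional[int]]]:
    """One (false alarm rate, time to detect) pair per threshold."""
    return [(false_alarm_rate(normal_scores, th), time_to_detect(outbreak_scores, th))
            for th in thresholds]
```

Raising the threshold lowers the false alarm rate but delays detection, which is the trade-off the AMOC plot makes visible.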


Scaling with Network Size

  • The performance actually improves with scaling of the system

  • Larger number of datapoints gives the model more information

    and refines the clustering


Effect of Interval Length

  • Interval length affects performance in two (opposite) ways:

    • More frequent re-clustering eliminates part of the “mid-interval” blind spot

    • Longer intervals yield features with less variance

  • The results show that:

    • Better performance is achieved with longer intervals (better smoothing over random fluctuations)

    • A lower frequency of detection algorithm invocation gives fewer false alarms

    • And for slow worms, delayed detection is okay!

[Figure: results plot with an axis labeled standard deviation (in a day)]


Conclusion

  • Use a distributed scheme and collaborative inference to support slow worm detection

  • Dividing the population into subgroups according to susceptibility increases the SNR and can boost detection performance

    • Subgroups are more homogeneous in their usage and application patterns

    • Does not require prior knowledge of the population


My Comments

  • Can other features on a host reveal diurnal patterns?

  • Host-based LD can acquire rich information about the attack, but building a host-based distributed detection system is much harder

  • Clustering is a way to deal with stealthy attacks
