
COD (Cluster Onset Detection): Online Temporal Clustering for Outbreak Detection



Tomas Singliar (U. Pitt.),

Denver H. Dash (Intel Research, U. Pitt.)

AAAI’07 (American Association for AI National Conference)

- When Gossip is Good: Distributed Probabilistic Inference for Detection of Slow Network Intrusions
- Denver H. Dash, et al.
- AAAI’06

- COD: Online Temporal Clustering for Outbreak Detection
- Tomas Singliar, Denver H. Dash
- AAAI’07

Speaker: Li-Ming Chen

- Worm attacks tend toward 2 opposite extremes:
- 1. Much faster, to allow rapid spread
- 2. Much slower, to evade detection

- Most existing detection techniques rely on the fact that worms reproduce quickly
- Slow propagation attacks:
- Difficult to detect: hidden under the veil of normal network traffic
- Still dangerous: can propagate exponentially


- Global infection:
- IDSes (individual entities) can see only a partial picture of the worm's larger, network-wide behavior
- This requires collaborative detection (AAAI’06)

- Homogeneity assumption:
- Detection techniques treat the population as a monolithic entity
- However, hosts or detectors (collaborators) are not always homogeneous (AAAI’07)


[Figure: “weak” host-based Local Detectors (LDs) reporting to a Global Detector (GD)]

- Global Detector:
- Aggregates messages from the LDs
- Performs probabilistic inference to determine whether an infection is present or not

- Concept of collaborative detection:
- LDs (designed to be weak but general classifiers) may raise false alarms at a relatively high frequency
- The GD can combine the LDs’ weak information to infer the existence of an attack
- Where should the GDs be placed in the network?
- Centralized or distributed placement


- Local Detector (LD): a binary classifier (normal or abnormal)
- Detection heuristic: counts the number of new outgoing connections to unique destination addresses and ports
- For slow worm detection, the threshold is set to 4 (CPI)
- Observing 37 hosts for 5 weeks yields 37 × 5 × 7 × 24 × 60 × 60 / 50 = 2,237,760 observations (one per 50-second interval), from which the distribution is computed

[Figure: the space of LDs (inward-looking vs. outward-looking) within 37 hosts, with the LD threshold pre-defined as 4 (CPI) and the propagation rates of previous worms (Blaster, Slapper, CR2, Slammer, Witty)]
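The LD heuristic and the observation count above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each host's outgoing connections arrive as hypothetical `(timestamp, dst_addr, dst_port)` tuples, reads "new ... unique" as "distinct (address, port) pairs within the current 50-second window", and fires when the count exceeds 4, following the rule stated later in the slides. The names `ld_fires` and `num_observations` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SEC = 50  # LD observation interval in seconds
THRESHOLD = 4    # fire when the count in a window exceeds this value

# Hypothetical record format: (timestamp in seconds, destination address, destination port)
Conn = Tuple[float, str, int]


def ld_fires(conns: List[Conn]) -> Dict[int, bool]:
    """Map each 50-second window index to whether the LD fires in it.

    Counts distinct (address, port) destinations per window; windows with
    no outgoing connections do not appear in the result.
    """
    unique_dsts: Dict[int, set] = defaultdict(set)
    for ts, addr, port in conns:
        unique_dsts[int(ts // WINDOW_SEC)].add((addr, port))
    return {w: len(dsts) > THRESHOLD for w, dsts in unique_dsts.items()}


def num_observations(hosts: int, weeks: int) -> int:
    """Number of 50-second LD observations collected over the whole trace."""
    seconds = weeks * 7 * 24 * 60 * 60
    return hosts * seconds // WINDOW_SEC


if __name__ == "__main__":
    trace = [(1.0, "10.0.0.1", 80), (2.0, "10.0.0.2", 80), (3.0, "10.0.0.3", 445),
             (4.0, "10.0.0.4", 445), (5.0, "10.0.0.5", 135), (60.0, "10.0.0.1", 80)]
    print(ld_fires(trace))          # {0: True, 1: False}
    print(num_observations(37, 5))  # 2237760
```

A real LD might instead keep a longer-lived set of previously contacted destinations; the per-window set is simply the easiest reading to illustrate.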


- Traditional collaborative counting schemes:
- PosCount: tests whether Σ(positive counts) > threshold
- CuSum: detects changes in the trend of a statistic

- DBN-based schemes:
- CP-DBN: a simplified causal model that models an attack as occurring uniformly across the population or not at all
- E-DBN: models the dynamics of a system that is being swept by an epidemic outbreak
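The two counting baselines can be sketched as below. PosCount follows the slide's definition directly. For CuSum the slide only says it detects trend changes, so this sketch assumes the standard one-sided (upward) Page CUSUM applied to the per-step number of positive LD reports, with a hypothetical reference mean `mu0`, slack `k` and decision threshold `h`.

```python
from typing import List, Sequence


def pos_count(ld_bits: Sequence[int], threshold: int) -> bool:
    """PosCount: alarm when the number of positive LD reports exceeds a threshold."""
    return sum(ld_bits) > threshold


def cusum(counts: Sequence[float], mu0: float, k: float, h: float) -> List[bool]:
    """One-sided CUSUM over per-step positive counts.

    g_t = max(0, g_{t-1} + (x_t - mu0 - k)); an alarm is raised whenever g_t > h.
    """
    g = 0.0
    alarms = []
    for x in counts:
        g = max(0.0, g + (x - mu0 - k))
        alarms.append(g > h)
    return alarms


if __name__ == "__main__":
    print(pos_count([1, 0, 1, 1, 0], threshold=2))  # True
    # Background of about 2 positives per step, then a slow upward drift.
    series = [2, 1, 3, 2, 2, 4, 5, 5, 6]
    print(cusum(series, mu0=2.0, k=0.5, h=4.0))  # alarms only at the last two steps
```

Unlike PosCount, which looks at one step at a time, CuSum accumulates small excesses over `mu0 + k`, which is why it reacts to a sustained drift rather than a single noisy step.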


- Input of a GD: Lt, a binary subset of LD observations at time t
- GD output: St, a measure of how likely a global anomaly is to be occurring at time t
- The system of GDs makes up an ensemble
- Many ensemble techniques could be used
- This paper uses only the max function to determine whether a global alarm should be raised
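The max-based ensemble decision can be sketched as follows, assuming each GD emits one numeric score St per time step and that a global alarm is raised when the largest score crosses a hypothetical `alarm_threshold` (the slides do not give its value).

```python
from typing import Dict, List


def global_alarm(gd_scores: Dict[str, List[float]], alarm_threshold: float) -> List[bool]:
    """Raise a global alarm at step t when max over GDs of S_t exceeds the threshold.

    gd_scores maps a (hypothetical) GD name to its score series; all series
    are assumed to have the same length.
    """
    # zip(*...) regroups the per-GD series into one tuple of scores per time step.
    return [max(step) > alarm_threshold for step in zip(*gd_scores.values())]


if __name__ == "__main__":
    scores = {"gd_a": [0.1, 0.2, 0.9], "gd_b": [0.3, 0.7, 0.4]}
    print(global_alarm(scores, alarm_threshold=0.8))  # [False, False, True]
```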


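- Ai = {T, F}: whether an attack has taken place at time i (hidden states)
- Oli = {on, off}: whether LD l is on or off at time i (observable states)
- There are M LDs in total (LD0, …), observed over observation time T

[Figure: CP-DBN structure; each LD observation depends on the attack state through the LD's TP rate and FP rate]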
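Inference in a model of this shape can be sketched as a forward filter. This is an illustration under stated assumptions, not the paper's implementation: once an attack has taken place it stays in place, a hypothetical per-step `hazard` gives the prior chance that an attack starts at any step, all LDs share one TP rate and one FP rate, and LDs are conditionally independent given Ai.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # keeps probabilities strictly inside (0, 1)


def cp_dbn_filter(obs: Sequence[Sequence[int]], tp: float, fp: float,
                  hazard: float, prior: float = 0.0) -> List[float]:
    """Forward-filter P(A_i = T | O_1..O_i) for each time step i.

    obs[i][l] is 1 if LD l is on at time i, else 0.
    tp, fp: probability that an LD is on when an attack has / has not taken place
            (assumed identical for all LDs).
    hazard: hypothetical prior probability that an attack starts at any step.
    """
    p_attack = prior
    posteriors = []
    for bits in obs:
        # Transition: once an attack has taken place it stays in place.
        p_attack = p_attack + (1.0 - p_attack) * hazard
        on = sum(bits)
        off = len(bits) - on
        # Log-likelihood ratio of the observations, LDs independent given the state.
        llr = on * math.log(tp / fp) + off * math.log((1.0 - tp) / (1.0 - fp))
        p = min(max(p_attack, EPS), 1.0 - EPS)
        log_odds = math.log(p / (1.0 - p)) + llr
        log_odds = max(min(log_odds, 700.0), -700.0)  # avoid exp overflow
        p_attack = 1.0 / (1.0 + math.exp(-log_odds))
        posteriors.append(p_attack)
    return posteriors


if __name__ == "__main__":
    quiet = [0] * 10
    noisy = [1] + [0] * 9
    burst = [1] * 6 + [0] * 4
    post = cp_dbn_filter([quiet, noisy, quiet, burst, burst], tp=0.6, fp=0.05, hazard=0.01)
    print([round(p, 4) for p in post])
```

Working in log-odds keeps the update stable when many LDs report at once, since the product of per-LD likelihoods would otherwise underflow.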


- E-DBN, to model the exponentially growing trend:
- T denotes the observation time
- At = {0, 1}: the anomaly state at time t (hidden state)
- Nt = {0, …, N}: the number of infected hosts (hidden state)
- S: the spreading rate
- Ot = {0, …, N}: the number of observed LDs that fired (observable state)

[Figure: E-DBN structure, showing state transitions between the unobserved state variables]


- Assuming a worm attack, the growth in the number of infected hosts, ΔNt+1, is modeled by a binomial over the susceptible hosts, each with some chance of a hit
- The likelihood of ot detectors firing when nt hosts are infected is also modeled by a binomial
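The exact binomial expressions did not survive on this slide, so the sketch below only illustrates the general shape of the two terms, under loudly stated assumptions. New infections are Binomial(N − nt, p_hit), where the per-step chance of a hit `p_hit` is simply passed in (in the paper it presumably depends on S and nt). The number of firing LDs is approximated by one binomial over all N LDs whose firing probability averages a TP rate on infected hosts and an FP rate on clean ones. All function names are hypothetical and neither expression is claimed to be the paper's formula.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials of probability p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def growth_likelihood(delta_n: int, n_t: int, total_hosts: int, p_hit: float) -> float:
    """P(Delta N_{t+1} = delta_n | N_t = n_t).

    Each of the total_hosts - n_t susceptible hosts is hit independently with
    the hypothetical per-step chance of a hit p_hit.
    """
    return binom_pmf(delta_n, total_hosts - n_t, p_hit)


def firing_likelihood(o_t: int, n_t: int, total_hosts: int, tp: float, fp: float) -> float:
    """P(O_t = o_t | N_t = n_t), approximated by one binomial over all LDs.

    Assumption: an LD fires with probability tp on an infected host and fp on
    a clean one; the binomial uses the population-average firing probability.
    """
    p_fire = (n_t * tp + (total_hosts - n_t) * fp) / total_hosts
    return binom_pmf(o_t, total_hosts, p_fire)


if __name__ == "__main__":
    N, n_t = 37, 3
    growth = [growth_likelihood(d, n_t, N, p_hit=0.02) for d in range(N - n_t + 1)]
    firing = [firing_likelihood(o, n_t, N, tp=0.6, fp=0.05) for o in range(N + 1)]
    print(round(sum(growth), 6), round(sum(firing), 6))  # 1.0 1.0
```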


- Given the DBN model and the observations from t−T to t, the GD computes the anomaly Am at the most likely time m
- The GDs' outputs are then combined by ensemble decision making (using the max function)
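One way to read “the anomaly Am at the most likely time m” is as a change-point search over the window from t−T to t: for each candidate onset m, score the window under the hypothesis that the attack began at m, and keep the best candidate. The sketch below assumes a CP-DBN-style observation model (each LD is on with probability `tp` after the onset and `fp` before it) and scores candidates by their log-likelihood ratio against “no attack”. The function name and parameters are hypothetical; this is not the paper's exact inference procedure.

```python
import math
from typing import Sequence, Tuple


def most_likely_onset(window: Sequence[Sequence[int]], tp: float, fp: float) -> Tuple[int, float]:
    """Return (m, score): the offset m that maximizes the log-likelihood ratio
    of 'attack started at m' vs. 'no attack'.

    window[i][l] is 1 if LD l fired at step i of the window (step 0 = t - T).
    A negative best score means 'no attack' explains the window better.
    """
    # Per-step log-likelihood ratio of "attack present" vs. "no attack".
    step_llr = []
    for bits in window:
        on = sum(bits)
        off = len(bits) - on
        step_llr.append(on * math.log(tp / fp) + off * math.log((1 - tp) / (1 - fp)))
    # An attack starting at m contributes the sum of step LLRs from m to the end,
    # so scan backwards and keep a running suffix sum.
    best_m, best_score = 0, -math.inf
    suffix = 0.0
    for m in range(len(window) - 1, -1, -1):
        suffix += step_llr[m]
        if suffix > best_score:
            best_m, best_score = m, suffix
    return best_m, best_score


if __name__ == "__main__":
    quiet, noisy, burst = [0] * 10, [1] + [0] * 9, [1] * 6 + [0] * 4
    m, score = most_likely_onset([quiet, noisy, quiet, burst, burst, burst], tp=0.6, fp=0.05)
    print(m, round(score, 2))  # m is 3: the first burst step
```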


- Parameters:
- Spread rate S = 1 connection per 20 sec.
- Address density = 1/1000 (ratio of vulnerable hosts)
- LD threshold = 4 connections per 50 sec.
- LDs communicate with the GD every 10 sec.

- PosCount raises a detection only after the entire network is infected

[Figure: detection results as a function of the FP rate, with the desired ("better") region marked]


- What to cluster?
- Partition the population (e.g., hosts) into subgroups,
- then COD tries to detect susceptible subgroups

- Why clustering?
- Traditional outbreak detection methods treat the population as a monolithic entity
- Real populations are heterogeneous
- Different subpopulations are susceptible to different degrees

- Clustering can boost the signal-to-noise ratio for detection


- “Weak” host-based LDs
- Periodically send their status to a GD
- Use the same feature and rule:
- Fire whenever the number of outgoing connections exceeds 4 in a 50 second interval

- Centralized GD
- Collects messages and determines whether the positive local detections corroborate each other
- Periodically outputs a signal that represents its belief that an infection is present


- Dataset X:
- Row: Xi corresponds to a single LD i
- Column: X*j corresponds to the value of a feature function in a discrete time interval j
- Each entry is a sum of alarms (some of which might be FPs)

- Use temporal stratified sampling:
- Each time interval has a fixed position (e.g., 12am-1am, 1am-2am, etc.)
- This accounts for the obvious diurnal behavior of the system

[Figure: dataset X, with one row per LD i and one column per time interval j]
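Building X with temporal stratification can be sketched as follows. Assumptions: one-hour slots (as in the 12am-1am example), alarm timestamps in seconds since a trace start at midnight, the feature function F = sum of alarms, and, as a simplification, alarms from different days that fall into the same hour of the day pooled into the same column. The helper names `slot_of` and `build_x` are hypothetical.

```python
from typing import Dict, List

SLOT_SEC = 3600     # one-hour interval (12am-1am, 1am-2am, ...)
SLOTS_PER_DAY = 24


def slot_of(ts: float) -> int:
    """Fixed diurnal position (hour of day) of a timestamp in seconds since midnight of day 0."""
    return int(ts // SLOT_SEC) % SLOTS_PER_DAY


def build_x(alarms: Dict[str, List[float]], hosts: List[str]) -> List[List[int]]:
    """X[i][j] = number of LD alarms of host i whose timestamp falls in hour-of-day slot j."""
    x = [[0] * SLOTS_PER_DAY for _ in hosts]
    for i, host in enumerate(hosts):
        for ts in alarms.get(host, []):
            x[i][slot_of(ts)] += 1  # F = sum of alarms
    return x


if __name__ == "__main__":
    alarms = {"h1": [10.0, 20.0, 3700.0, 86415.0], "h2": [7200.0]}
    x = build_x(alarms, ["h1", "h2"])
    print(x[0][:3], x[1][:3])  # [3, 1, 0] [0, 0, 1]
```

Because each column is tied to an hour of the day, a host that is routinely busy every morning looks alike across days, which is what lets the clustering pick up diurnal habits instead of mistaking them for anomalies.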


- Assuming that different classes generate their detections randomly at different rates, and that the counts can take a fairly large range of values, Xij can be modeled as Poisson distributed
- Naïve Bayes clustering model: the NB features are the positive local detection counts Xij arriving from machine i during time interval j
- Feature function F() = sum(alarms) for each machine, since an LD may fire several times within one time interval


- Some details:
- How is the number m of clusters determined? A greedy heuristic is used to search for the optimal value
- How λkjx is estimated is not mentioned
- At the end of each interval, the feature values are updated and the model is re-learned

- How are hosts clustered?
- The posterior on the cluster variable M defines the assignment of local detectors to clusters
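The Poisson Naive Bayes clustering can be sketched as a mixture model in which host i belongs to cluster k with posterior P(M = k | Xi) ∝ P(M = k) ∏j Poisson(Xij; λkj), where λkj (our notation) is the rate of cluster k in interval j. Because the slides do not say how the rates are estimated, this sketch assumes plain EM with a fixed number of clusters m, a deterministic initialization, and a small pseudo-count that keeps rates positive. `fit_poisson_nb` is a hypothetical helper, not the paper's greedy model-selection procedure.

```python
import math
from typing import List, Tuple


def log_poisson(x: int, lam: float) -> float:
    """log Poisson(x; lam)."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def posterior(row: List[int], weights: List[float], rates: List[List[float]]) -> List[float]:
    """P(M = k | X_i) for one host: prior times a product of per-interval Poissons."""
    logs = [math.log(w) + sum(log_poisson(x, lam) for x, lam in zip(row, rates_k))
            for w, rates_k in zip(weights, rates)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]  # subtract the max for numerical stability
    total = sum(unnorm)
    return [u / total for u in unnorm]


def fit_poisson_nb(x: List[List[int]], m: int, iters: int = 50
                   ) -> Tuple[List[float], List[List[float]], List[int]]:
    """EM for a mixture of m clusters with independent Poisson features.

    Returns (cluster weights, rates[k][j], most probable cluster of each host).
    """
    n, n_cols = len(x), len(x[0])
    # Deterministic start: seed clusters with hosts spread across total activity.
    order = sorted(range(n), key=lambda i: sum(x[i]))
    step = (n - 1) / max(m - 1, 1)
    rates = [[v + 0.5 for v in x[order[round(k * step)]]] for k in range(m)]
    weights = [1.0 / m] * m
    for _ in range(iters):
        resp = [posterior(row, weights, rates) for row in x]  # E-step
        for k in range(m):                                   # M-step
            r_k = sum(r[k] for r in resp)
            weights[k] = max(r_k / n, 1e-12)
            for j in range(n_cols):
                weighted = sum(r[k] * row[j] for r, row in zip(resp, x))
                # Small pseudo-counts keep every rate strictly positive.
                rates[k][j] = (weighted + 0.5) / (r_k + 1.0)
    labels = []
    for row in x:
        p = posterior(row, weights, rates)
        labels.append(p.index(max(p)))
    return weights, rates, labels


if __name__ == "__main__":
    x = [[0, 1, 0, 1], [1, 0, 0, 1], [0, 0, 1, 0],
         [9, 10, 8, 11], [10, 9, 12, 10], [11, 8, 9, 10]]
    _, _, labels = fit_poisson_nb(x, m=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

Re-running `fit_poisson_nb` on the updated X at the end of every interval corresponds to the "re-learned" step described above; choosing m is left to whatever search the system uses.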


[Figure: cluster assignment of each host ID over time (hr), after a burn-in period]

- A typical example of how the hosts in the dataset are assigned to clusters
- 5 clusters (colors) and a 1-day burn-in period
- Clusters are rather stable, and cluster membership changes rarely
- At the end, most hosts have been infected


[Figure: local detection count per time interval for each host ID over time (hr)]

- Clustering groups hosts according to the daily pattern of their local detection activity
- 5 groups (two of which consist of a single host)
- The grouping reflects the applications and habits of the hosts and can provide better estimates for detection


- Detect the “highly active” cluster (presumably infected):
- Compute the “average detection rate” for each host from the number of positive detections at host i
- Compute the “average (local) detection rate” for each cluster and identify the most active cluster
- Perform a one-sided, unbalanced-design t-test with the null hypothesis that host detection rates in the most active cluster and in the remainder of the population are the same

- Compare the outcome of the t-test to a historical histogram of values to determine whether the system is in an anomalous state
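The detection step can be sketched as below. Assumptions: a host's average detection rate is its number of positive detections divided by the number of observed intervals; “unbalanced-design” is read as unequal group sizes and handled with Welch's t statistic (the slides do not say which t-test variant was used); and the historical comparison is the empirical fraction of past t values below the current one. All names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence


def detection_rates(positives: Dict[str, int], intervals: int) -> Dict[str, float]:
    """Average detection rate per host: positive detections per observed interval."""
    return {host: count / intervals for host, count in positives.items()}


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for H1: mean(a) > mean(b), unequal sizes and variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        return math.inf if diff > 0 else 0.0
    return diff / se


def active_cluster_t(rates: Dict[str, float], labels: Dict[str, int]) -> float:
    """t statistic of the most active cluster against the rest of the population.

    Assumes at least one cluster with two or more hosts and at least two hosts
    outside it, since a sample variance needs two values.
    """
    clusters: Dict[int, List[float]] = {}
    for host, rate in rates.items():
        clusters.setdefault(labels[host], []).append(rate)
    # Single-host clusters cannot be t-tested, so they are not candidates.
    candidates = [k for k, rs in clusters.items() if len(rs) >= 2]
    top = max(candidates, key=lambda k: mean(clusters[k]))
    rest = [r for k, rs in clusters.items() if k != top for r in rs]
    return welch_t(clusters[top], rest)


def anomaly_level(t_now: float, history: Sequence[float]) -> float:
    """Fraction of historical t values below the current one (empirical percentile)."""
    return sum(1 for t in history if t < t_now) / len(history)


if __name__ == "__main__":
    counts = {"h1": 1, "h2": 2, "h3": 1, "h4": 20, "h5": 25, "h6": 22}
    labels = {"h1": 0, "h2": 0, "h3": 0, "h4": 1, "h5": 1, "h6": 1}
    t = active_cluster_t(detection_rates(counts, intervals=100), labels)
    print(round(t, 2), anomaly_level(t, [0.5, 1.0, 2.0, 4.0]))
```

A high percentile against the historical histogram, rather than a fixed t cutoff, is what flags the system as anomalous here, so the normal spread of t values on a given network sets the bar.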


- Some details of the configuration:
- Normal traffic trace: 5 weeks of traces from 37 hosts
- Worm traffic is injected for testing
- LDs send a message every 10 seconds
- Metrics: FAR, TTD (FI), i.e., False Alarm Rate, Time To Detect, Fraction of Infection
- Aim: keep the FAR at 1 per week

- Compare the results with E-DBN (the baseline)
- Traffic traces are recycled to simulate more hosts
- Observe the effects of the number of clusters, network size and interval length


- AMOC (Activity Monitoring Operating Characteristic): plots the expected time to detection (since the outbreak began) as a function of the false alarm rate
- COD outperforms E-DBN (reduced FI)
- COD/adaptive performs better, but is more costly to run
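AMOC points can be computed from detector score traces roughly as follows. Assumptions: one score per step of `step_sec` seconds (10 s by default, matching the LD reporting period), an attack-free score trace for counting false alarms per week, and attack runs with a known onset index for measuring the time to detection. Sweeping the alarm threshold traces out the trade-off the AMOC curve plots. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def false_alarms_per_week(clean: Sequence[float], threshold: float, step_sec: float) -> float:
    """Alarms per week on an attack-free score trace; every step above the threshold counts."""
    alarms = sum(1 for s in clean if s > threshold)
    weeks = len(clean) * step_sec / SECONDS_PER_WEEK
    return alarms / weeks


def time_to_detect(run: Sequence[float], onset: int, threshold: float,
                   step_sec: float) -> Optional[float]:
    """Seconds from the outbreak onset to the first alarm (None if never detected)."""
    for i in range(onset, len(run)):
        if run[i] > threshold:
            return (i - onset) * step_sec
    return None


def amoc(clean: Sequence[float], runs: Sequence[Tuple[Sequence[float], int]],
         thresholds: Sequence[float], step_sec: float = 10.0) -> List[Tuple[float, float]]:
    """AMOC points: (false alarms per week, mean time to detection in seconds).

    Missed runs are simply left out of the mean here; a fuller evaluation
    would penalize them.
    """
    points = []
    for th in thresholds:
        delays = [time_to_detect(run, onset, th, step_sec) for run, onset in runs]
        hits = [d for d in delays if d is not None]
        if hits:
            points.append((false_alarms_per_week(clean, th, step_sec), sum(hits) / len(hits)))
    return points


if __name__ == "__main__":
    clean = [0.1] * 60479 + [0.9]  # exactly one week of 10-second steps, one false alarm
    runs = [([0.1, 0.1, 0.2, 0.6, 0.9], 1)]
    print(amoc(clean, runs, [0.5]))  # [(1.0, 20.0)]
```

Reporting TTD as the fraction of infected hosts (FI) instead of seconds would only change what `time_to_detect` returns at the first alarm.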


- Performance actually improves as the system scales
- A larger number of data points gives the model more information and refines the clustering


- Interval length affects performance in two (opposite) ways:
- More frequent re-clustering eliminates part of the “mid-interval” blind spot
- Longer intervals yield features with less variance

- The results show that:
- Better performance is achieved with longer intervals (better smoothing over random fluctuations)
- Invoking the detection algorithm less frequently gives fewer false alarms
- For a slow worm, delayed detection is acceptable

[Figure: performance vs. interval length, with standard deviation (in a day)]


- A distributed scheme and collaborative inference support slow worm detection
- Dividing the population into subgroups according to susceptibility increases the SNR and can boost detection performance
- Subgroups are more homogeneous in their usage and application patterns
- No prior knowledge of the population is required


- Can other host features reveal diurnal patterns?
- Host-based LDs can acquire rich information about the attack, but building a host-based distributed detection system is much harder
- Clustering is a way to deal with stealthy attacks
