
Mining Anomalies Using Traffic Feature Distributions

Anukool Lakhina, Mark Crovella (cs.bu), Christophe Diot (Intel). SIGCOMM 2005.


Presentation Transcript


  1. Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina, Mark Crovella (cs.bu), Christophe Diot (Intel) SIGCOMM 2005

  2. Reference • SIGCOMM 2004 – “Diagnosing Network-Wide Traffic Anomalies” • SIGCOMM 2005 – “Mining Anomalies Using Traffic Feature Distributions” • Authors: • Anukool Lakhina (Ph.D. @ Boston Univ.) • Mark Crovella (Professor @ Boston Univ.) • Christophe Diot (@ Intel Research Lab.) Speaker: Li-Ming Chen

  3. Outline • Network-wide observation • Using subspace method to detect volume anomalies (SIGCOMM’04) • Volume vs. Traffic Feature Distribution (SIGCOMM’05) • Anomaly Diagnosis Methodology • Anomaly Detection • Anomaly Classification • Conclusion & comments

  4. Anomaly Diagnosis • Is my network experiencing unusual conditions? • e.g., an attack, a spreading worm, an equipment outage, a misconfiguration, or something unknown • Anomaly diagnosis consists of: • Detection – is there an unusual event? • Identification – what is the best explanation? • Quantification – how serious is the problem?

  5. Previous Work on Anomaly Detection • Largely focused on: • Point solutions for specific types of anomalies • E.g., port scans, worms, DoS… • Not a general approach • Single-link traffic data • Not a network-wide view • Rule-based classification • Not unsupervised • A general, unsupervised method for reliably detecting and classifying network anomalies is needed

  6. Network-wide Observation • Study the proposed anomaly detection and classification framework using sampled flow data collected from all access links of backbone networks • Backbone networks: Abilene, Géant and Sprint • An OD (origin-destination) flow is the traffic that enters at an origin PoP and exits at a destination PoP of a backbone network • PoP: Point of Presence

  7. Volume Anomaly Detection: Problem Statement • A volume anomaly is a sudden change in the traffic of an OD flow • i.e., point-to-point traffic • Given link traffic measurements, diagnose the volume anomalies

  8. Why care about OD Flows? • If we only monitor traffic on network links, a volume anomaly arising from a single OD flow may not be noticeable • Thus, the naïve approach won’t work if OD flow information isn’t available • Problem: a network with n PoPs has n^2 OD flows, so OD flows are high-dimensional data

  9. Subspace Analysis of Link Traffic • Even if OD flow information is not available and only link traffic information is available, PCA can be applied and the subspace technique can detect volume anomalies • PCA: Principal Component Analysis • Link traffic info: the data consist of time samples of traffic volumes at all m links in the network • Y is the t x m traffic measurement matrix • An arbitrary row y of Y denotes one sample • Reasons this works: • Links share OD flows • The set of OD flows is also low-dimensional • Use PCA to separate normal and anomalous traffic

  10. The Subspace Method • An approach to separate normal from anomalous traffic • Normal subspace, $S$: space spanned by the first k principal components • Anomalous subspace, $\tilde{S}$: space spanned by the remaining principal components • Then, decompose traffic on all links by projecting onto $S$ and $\tilde{S}$ to obtain $y = \hat{y} + \tilde{y}$, where $y$ is the traffic vector of all links at a particular point in time, $\hat{y}$ is the normal traffic vector, and $\tilde{y}$ is the residual traffic vector
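
The projection behind this decomposition can be written out explicitly. The notation here is an assumption made for illustration: $y$ is treated as a column vector of length m, and $P$ is the m x k matrix whose columns are the first k principal components (orthonormal).

```latex
\hat{y} = P P^{T} y, \qquad
\tilde{y} = \left(I - P P^{T}\right) y, \qquad
y = \hat{y} + \tilde{y}
```

Because the columns of $P$ are orthonormal, $\hat{y}$ and $\tilde{y}$ are orthogonal, so $\|y\|^2 = \|\hat{y}\|^2 + \|\tilde{y}\|^2$.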

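  11. A Geometric Illustration • In general, anomalous traffic results in a large value of the residual $\tilde{y}$ • Capture the size of the residual vector using the squared prediction error (SPE): $\mathrm{SPE} = \|\tilde{y}\|^2$ • [Figure: traffic vector $y$ plotted on axes "Traffic on Link 1" and "Traffic on Link 2"]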

  12. Subspace Analysis Results • Note that during an anomaly, the normal component doesn’t change much, while the residual component changes a lot • Thus, anomalies can be detected by setting a threshold on the residual component
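
The detection idea on slides 10 to 12 can be sketched on a two-link toy example. This is a minimal illustration, not the authors' implementation: it assumes a normal subspace of dimension k = 1, finds the first principal axis by power iteration, and uses a hand-picked threshold (`THRESHOLD` is hypothetical). The traffic values are made up.

```python
import math

def first_principal_axis(rows, iters=200):
    """Return (mean, unit vector) of the first principal axis via power iteration."""
    m = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    centred = [[r[j] - mean[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / len(rows) for b in range(m)]
           for a in range(m)]
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def spe(y, mean, v):
    """Squared norm of the part of y that lies outside the normal axis."""
    c = [y[j] - mean[j] for j in range(len(y))]
    proj = sum(c[j] * v[j] for j in range(len(y)))
    residual = [c[j] - proj * v[j] for j in range(len(y))]
    return sum(x * x for x in residual)

# Made-up data: link 2 normally carries about twice the traffic of link 1.
Y = [[x, 2 * x + (0.5 if x % 2 else -0.5)] for x in range(10, 20)]
Y.append([15, 40])  # injected anomaly: link 2 is far above its usual level

THRESHOLD = 5.0  # hypothetical; chosen by hand for this toy data
mean, axis = first_principal_axis(Y)
spe_values = [spe(y, mean, axis) for y in Y]
flagged = [t for t, s in enumerate(spe_values) if s > THRESHOLD]
print(flagged)
```

A real deployment would choose k and the threshold from the data rather than by hand; the fixed value here only keeps the sketch short.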

  13. Outline • Network-wide observation • Using subspace method to detect volume anomalies (SIGCOMM’04) • Volume vs. Traffic Feature Distribution (SIGCOMM’05) • Anomaly Diagnosis Methodology • Anomaly Detection • Anomaly Classification • Conclusion & comments

  14. Introduction • Challenges for automatically detecting and classifying anomalies: • Anomalies are a moving target (can span a vast range of events) • New anomalies will continue to arise • Anomalies present in network-wide traffic data are buried like needles in a haystack • Goal of this paper: • Seek methods that are able to detect a diverse and general set of network anomalies • With high detection rate and low false alarm rate • Seek to mine the anomalies from the data by discovering and interpreting the patterns present in network-wide traffic

  15. Traffic Feature Distributions • Most anomalies share a common characteristic: they change the distribution of traffic features • Anomalies can be detected and distinguished by inspecting traffic features: • 4-tuple: SrcIP, SrcPort, DstIP, DstPort

  16. Volume vs. Traffic Feature Distribution • Volume-based detection schemes have been successful in isolating large traffic changes • But a large number of anomalies do NOT cause detectable disruptions in traffic volume • Using traffic feature distributions • Augments volume-based anomaly detection • Traffic distributions can reveal valuable information about the structure of anomalies • -> information which is not present in traffic volume measures

  17. Traffic Feature Distributions • Summarize using the sample entropy of histogram X: $H(X) = -\sum_{i} (n_i/S) \log_2 (n_i/S)$, where symbol i occurs $n_i$ times and S is the total number of observations • Dispersed histogram -> high entropy • Concentrated histogram -> low entropy • [Figure: histograms of # packets over destination ports and destination IPs, for typical traffic and during a port scan; during the port scan ~450 new destination ports appear, and one destination (victim) dominates the destination IPs]
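
The sample entropy on this slide is straightforward to compute from a list of observed feature values. The sketch below assumes the observations are, for example, the destination ports of the packets seen in one time bin; the port lists are made up for illustration.

```python
import math
from collections import Counter

def sample_entropy(observations):
    """H(X) = -sum_i (n_i / S) * log2(n_i / S) over the histogram of observations."""
    counts = Counter(observations)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Made-up destination ports: typical traffic concentrates on a few ports,
# while a port scan spreads packets over many distinct ports.
typical_ports = [80] * 90 + [443] * 10
scan_ports = list(range(1000, 1100))

print(sample_entropy(typical_ports))  # concentrated histogram: low entropy
print(sample_entropy(scan_ports))     # dispersed histogram: high entropy
```

The entropy is 0 when a single value accounts for every observation and reaches its maximum, log2 of the number of distinct values, when all values are equally frequent.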

  18. Port scan anomalies viewed in terms of traffic volume and in terms of entropy • Port scan is dwarfed in volume metrics… • But stands out in feature entropy, which also reveals its structure

  19. Entropy-based scheme • In the volume-based scheme, the variable was the # of packets or bytes per time slot. • In the entropy-based scheme, the variable is the entropy of every traffic feature in every time slot. • This gives us a three-way data matrix H. • H(t, p, k) denotes the entropy at time t of OD flow p for traffic feature k. • To apply the subspace method, we need to unfold it into a single-way representation.

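  20. Multiway Subspace Method (multi-way to single-way) • Unfold H into a single-way matrix by placing the t x p entropy matrices of the four traffic features side by side, giving a t x 4p matrix • Now apply the usual subspace decomposition (PCA) • Every row $h$ of the matrix will be decomposed into $h = \hat{h} + \tilde{h}$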
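
The unfolding step can be sketched as follows. The nested-list layout `H[t][p][k]` and the feature order in `FEATURES` are assumptions made for this illustration; each output row concatenates one block of p entropy values per feature.

```python
FEATURES = ["srcIP", "srcPort", "dstIP", "dstPort"]  # assumed order

def unfold(H):
    """Turn H[t][p][k] into t rows of length 4p: one block of p values per feature."""
    rows = []
    for per_flow in H:                 # per_flow[p][k] for one time bin
        row = []
        for k in range(len(FEATURES)):
            row.extend(flow[k] for flow in per_flow)
        rows.append(row)
    return rows

# Toy tensor with t = 2 time bins and p = 3 OD flows; each value encodes its
# own index as 100*t + 10*p + k so the layout is easy to check by eye.
H = [[[100 * t + 10 * p + k for k in range(4)] for p in range(3)] for t in range(2)]
unfolded = unfold(H)
print(len(unfolded), len(unfolded[0]))
```

Each row of the unfolded matrix can then be treated like a traffic vector in the single-way subspace method.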

  21. Comparing Entropy Detections with Detections in Volume Metrics (1) • [Figure: scatter plot with regions labeled "Found in Entropy Only", "Found in both metrics", and "Found in Volume Only"] • Points that lie to the right of the vertical line are volume-detected anomalies, and points that lie above the horizontal line are detected in entropy.

  22. Comparing Entropy Detections with Detections in Volume Metrics (2)

  23. Detection Rate by Injecting Real Anomalies • Evaluation Methodology • Superimpose known anomaly traces onto OD flows • Test sensitivity at varying anomaly intensities by thinning the trace • Results are averaged over a sequence of experiments • [Figure labels: 12%, 6.3%, 1.3%, 0.63%]
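
The injection experiment can be sketched roughly as below. Everything here is an assumption for illustration: packets are modeled as (destination IP, destination port) tuples, thinning keeps 1 out of every n packets, and intensity is measured as anomaly packets relative to background packets. The data are made up and do not reproduce the slide's percentages.

```python
def thin(trace, n):
    """Keep 1 out of every n packets of the anomaly trace (assumed thinning rule)."""
    return trace[::n]

def intensity(anomaly, background):
    """Anomaly packets as a fraction of the background packets in the same time bin."""
    return len(anomaly) / len(background)

def inject(background, anomaly):
    """Superimpose the anomaly packets on the background OD-flow packets."""
    return background + anomaly

# Made-up background traffic and a made-up port scan against one host.
background = [("10.0.0.%d" % (i % 50), 80) for i in range(10000)]
anomaly = [("10.0.0.7", port) for port in range(2000)]

for n in (1, 10, 100):
    thinned = thin(anomaly, n)
    mixed = inject(background, thinned)
    print(n, len(thinned), len(mixed), f"{intensity(thinned, background):.2%}")
```

The mixed trace for each thinning factor would then be fed to the detector, and the detection rate recorded per intensity.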

  24. Classifying Anomalies by Clustering • Enables unsupervised classification • Each anomaly is a point in 4-D space: • h = [H(srcIP), H(dstIP), H(srcPort), H(dstPort)] • Questions: • Do anomalies form clusters in this space? • Are the clusters meaningful? • Internally consistent, Externally different • What can we learn from the clusters? • Use Hierarchical Agglomerative Algorithm for determining clusters • Minimizes intra-cluster variation and maximizes inter-cluster variation
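
A bottom-up clustering of anomaly points can be sketched as follows. This is a simplified illustration, not the procedure used in the paper: the merge rule (closest centroids), the stopping rule (a fixed number of clusters `k`), and the 4-D entropy vectors are all assumptions.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(c) / len(points) for c in zip(*points)]

def agglomerate(points, k):
    """Start with one cluster per point; repeatedly merge the two closest centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters

# Hypothetical anomaly vectors h = [H(srcIP), H(dstIP), H(srcPort), H(dstPort)].
points = [
    [-0.9, -0.8, 0.1, 0.9], [-0.8, -0.9, 0.0, 0.8], [-1.0, -0.7, 0.1, 1.0],
    [0.9, -0.9, 0.8, -0.7], [0.8, -0.8, 0.9, -0.8],
]
clusters = agglomerate(points, k=2)
print([len(c) for c in clusters])
```

The pairwise search is quadratic per merge, which is fine for a handful of points but would need a proper library implementation for real datasets.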

  25. Clustering Known Anomalies (2-D view) • [Figure: clusters labeled Code Red scanning, multi-source DoS attack, and single-source DoS attack]

  26. Abilene anomaly clusters (3-D view) • Results of both clustering algorithms are consistent • Heuristics identify about 10 clusters in the dataset

  27. Anomaly Clusters in Abilene data

  28. Conclusion • Feature distributions as summarized by entropy are promising for general anomaly diagnosis • Network-Wide Detection: • Entropy significantly augments volume metrics • Highly sensitive: Detection rates of 90% possible, even when anomaly is 1% of background traffic • Anomaly Classification: • Clusters are meaningful, and reveal new anomalies

  29. Comments • The paper only discusses anomaly detection on offline data. Can it be extended to online anomaly detection? • We still need volume-based detection because feature distributions do not identify all anomalies. • Can other fields in the packet header be used for anomaly detection?
