1 / 24

Network Anomography

Network Anomography. Yin Zhang yzhang@cs.utexas.edu Joint work with Zihui Ge, Albert Greenberg, Matthew Roughan Internet Measurement Conference 2005 Berkeley, CA, USA. Network Anomaly Detection. Is the network experiencing unusual conditions? Call these conditions anomalies

bayle
Download Presentation

Network Anomography

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Network Anomography Yin Zhangyzhang@cs.utexas.edu Joint work withZihui Ge, Albert Greenberg, Matthew Roughan Internet Measurement Conference 2005 Berkeley, CA, USA

  2. Network Anomaly Detection • Is the network experiencing unusual conditions? • Call these conditions anomalies • Anomalies can often indicate network problems • DDoS, worms, flash crowds, outages, misconfigurations … • Need rapid detection and diagnosis • Want to fix the problem quickly • Questions of interest • Detection • Is there an unusual event? • Identification • What’s the best explanation? • Quantification • How serious is the problem?

  3. B C A Network Anomography • What we want • Volume anomalies [Lakhina04]Significant changes in an Origin-Destination flow, i.e., traffic matrix element • What we have • Link traffic measurements • It is difficult to measure traffic matrix directly • Network Anomography • Infer volume anomalies from link traffic measurements

  4. An Illustration Courtesy: Anukool Lakhina [Lakhina04]

  5. Anomography =Anomalies + Tomography

  6. Only measure at links 1 route 3 link 1 router 2 route 2 link 2 route 1 3 link 3 Mathematical Formulation Problem: Infer changes in TM elements (xt) given link measurements (bt)

  7. Only measure at links 1 route 3 link 1 router 2 route 2 link 2 route 1 3 link 3 Mathematical Formulation bt = At xt (t=1,…,T) Typically massively under-constrained!

  8. Only measure at links 1 route 3 link 1 router 2 route 2 link 2 route 1 3 link 3 Static Network Anomography B = AX Time-invariantAt (= A), B=[b1…bT], X=[x1…xT]

  9. Anomography Strategies • Early Inverse • Inversion • Infer OD flows X by solving bt=Axt • Anomaly extraction • Extract volume anomalies X from inferred X Drawback: errors in step 1 may contaminate step 2 • Late Inverse • Anomaly extraction • Extract link traffic anomalies B from B • Inversion • Infer volume anomalies X by solving bt=Axt Idea: defer “lossy” inference to the last step     

  10. Extracting Link Anomalies B  • Temporal Anomography: B = BT • ARIMA modeling • Diff: ft = bt-1 bt = bt – ft • EWMA: ft = (1-) ft-1 +  bt-1 bt = bt – ft • Fourier / wavelet analysis • Link anomalies = the high frequency components • Temporal PCA • PCA = Principal Component Analysis • Project columns onto principal link column vectors • Spatial Anomography: B = TB • Spatial PCA [Lakhina04] • Project rows onto principal link row vectors   

  11. Extracting Link Anomalies B  • Temporal Anomography: B = BT • Self-consistent • Tomography equation: B = AX • Post-multiply by T: BT = AXT B = AX • Spatial Anomography: B = TB • No longer self-consistent   

  12.  Solving bt = A xt   • Pseudoinverse: xt = pinv(A) bt • Shortest minimal L2-norm solution • Minimize |xt|2subject to |bt – A xt|2 is minimal • Maximize sparsity (i.e. minimize |xt|0) • L0-norm is not convex  hard to minimize • Greedy heuristic • Greedily add non-zero elements to xt • Minimize |bt – A xt|2 with given |xt|0 • L1-norm approximation • Minimize |xt|1 (can be solved via LP) • With noise  minimize |xt|1 +  |bt-Axt|1            

  13. Dynamic Network Anomography • Time-varying At is common • Routing changes • Missing data • Missing traffic measurement on a link setting the corresponding row of At to 0 in bt=Atxt • Solution • Early inverse: Directly applicable • Late inverse: Apply ARIMA modeling • L1-norm minimization subject to link constraints • minimize |xt|1subject to xt = xt – xt-1, bt=Atxt, bt-1=At-1xt-1 • Reduce problem size by eliminating redundancy  

  14. Performance Evaluation: Inversion • Fix one anomaly extraction method • Compare “real” and “inferred” anomalies • “real” anomalies: directly from OD flow data • “inferred” anomalies: from link data • Order them by size • Compare the size • How many of the top N do we find • Gives detection rate: | top N”real” top Ninferred | / N

  15. Inference Accuracy  Tier-1 ISP (10/6/04 – 10/12/04) Diff (bt = bt = bt – bt-1) Sparsity-L1 works best among all inference techniques

  16. Inference Accuracy  Tier-1 ISP (10/6/04 – 10/12/04) Diff (bt = bt = bt – bt-1) detection rate = | top Nreal top Ninferred | / N Sparsity-L1 works best among all inference techniques

  17. Impact of Routing Changes  Tier-1 ISP (10/6/04 – 10/12/04) Diff (bt = bt = bt – bt-1) Late inverse (sparsity-L1) beats early inverse (tomogravity)

  18. Performance Evaluation: Anomography • Hard to compare performance • Lack ground-truth: what is an anomaly? • So compare events from different methods • Compute top M “benchmark” anomalies • Apply an anomaly extraction method directly on OD flow data • Compute top N “inferred” anomalies • Apply another anomography method on link data • Report min(M,N) - | top Mbenchmarktop Ninferred | • M < N  “false negatives” # big “benchmark” anomalies not considered big by anomography • M > N  “false positives” # big “inferred” anomalies not considered big by benchmark method • Choose M, N similar to numbers of anomalies a provider is willing to investigate, e.g. 30-50 per week

  19. Anomography: “False Negatives” • Diff/EWMA/H.-W./ARIMA/Fourier/Wavelet all largely consistent • PCA methods not consistent (even with each other) • - PCA cannot detect anomalies in the “normal” subspace • - PCA insensitive to reordering of [b1…bT]  cannot utilize all temporal info • Spatial methods (e.g. spatial PCA) are not self-consistent

  20. Anomography: “False Positives” • Diff/EWMA/H.-W./ARIMA/Fourier/Wavelet all largely consistent • PCA methods not consistent (even with each other) • - PCA cannot detect anomalies in the “normal” subspace • - PCA insensitive to reordering of [b1…bT]  cannot utilize all temporal info • Spatial methods (e.g. spatial PCA) are not self-consistent

  21. Summary of Results • Inversion methods • Sparsity-L1 beats Pseudoinverse and Sparsity-Greedy • Late-inverse beats early-inverse • Anomography methods • Diff/EWMA/H-W/ARIMA/Fourier/Wavelet all largely consistent • PCA methods not consistent (even with each other) • PCA methods cannot detect anomalies in “normal” subspace • PCA methods cannot fully exploit temporal information in {xt} • Reordering of [b1…bT] doesn’t change results! • Spatial methods (e.g. spatial PCA) are not self-consistent • Temporal methods are • The method of choice: ARIMA + Sparsity-L1 • Accurate, consistent with Fourier/Wavelet • Robust against measurement noise, insensitive to choice of • Works well in the presence of missing data, routing changes • Supports both online and offline analysis

  22. Conclusions • Anomography = Anomalies + Tomography • Find anomalies in {xt} given bt=Atxt (t=1,…,T) • Contributions • A general framework for anomography methods • Decouple anomaly extraction and inference components • A number of novel algorithms • Taking advantage of the range of choices for anomaly extraction and inference components • Choosing between spatial vs. temporal approaches • The first algorithm for dynamic anomography • Extensive evaluation on real traffic data • 6-month Abilene and 1-month Tier-1 ISP • The method of choice: ARIMA + Sparsity-L1

  23. Future Work • Correlate traffic with other types of data • BGP routing events • Router CPU utilization • Anomaly response • Maybe with an effective response system, false positives become less important? • Anomography for performance diagnosis • Inference of link performance based on end-to-end measurements can be formulated as bt=Axt • Beyond networking • Detecting anomalies in other inverse problems • Are we just reinventing the wheel?

  24. Thank you !

More Related