1 / 44

Network Anomaly Diagnosis

Network Anomaly Diagnosis. Status Report 15-03-2006. Contents. Network performance metrics Active monitoring tools Preparation of canonical dataset Anomaly detection Anomaly diagnosis. Network performance metrics. One-Way Delay (OWD) Serialization Delay Propagation Delay

waylon
Download Presentation

Network Anomaly Diagnosis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Network Anomaly Diagnosis Status Report 15-03-2006

  2. Contents • Network performance metrics • Active monitoring tools • Preparation of canonical dataset • Anomaly detection • Anomaly diagnosis

  3. Network performance metrics • One-Way Delay (OWD) • Serialization Delay • Propagation Delay • Queuing Delay • Forwarding Delay • Round-Trip Time (RTT) • Delay Variation (Jitter) • Packet Loss • Congestion • Errors • Packet Reordering • Maximum Transmission Unit (MTU) • Bandwidth Delay Product (BDP)

  4. One-Way Delay (OWD) • The time it takes for a packet to reach its end-to-end destination • It can be broken down into: • per-hop one-way delays, and these in turn into: • per-linkand • per-node delay components • The per-link component of one-way delay consists of two sub-components: propagation delay and serialization delay. • The per-node component of one-way delay also consists of two sub- components: forwarding delay and queuing delay.

  5. Serialization Delay • Its the time taken to separate a packet into sequential link transmission units (bits). • It is obtained by dividing the packet size (in bits) by the capacity of the link (in bits per second). • Nowadays, as links increasingly have a higher bit rate, serialization delay is less relevant.

  6. Propagation Delay • Propagation delay is the duration of time forsignals to move from the transmitting to the receiving end of a link. • On simple links, this is the product of: • the link's physical length and • the characteristic propagation speed of media. • On high-speed wide-area network (WAN) paths, delay is usually dominated by propagation times.

  7. Queuing Delay • Queuing delay is defined as the time a packet has to spend inside a node such as a router while waiting for availability of the output link. • It depends on • the amount of traffic competing to send packets towards the output link and • on the priorities of the packet

  8. Forwarding Delay • Processing at the node • Reading forwarding-relevant information • Dest address + other headers • Forwarding decision • Based on the routing table

  9. Round-Trip Time (RTT) • Its the sum of the one-way delays from source to destination plus time it takes B to formulate the response. • Large RTT values can cause problems for TCP and other window-based transport protocols. • The round-trip time influences the achievable throughput, as there can only be a window's worth of unacknowledged data in the network.

  10. Delay Variation (Jitter) • OWD is not constant on a real network because of competing traffic and contention for processing resources. • The difference between a given packet’s actual and average OWD is termed ‘delay variation’ or jitter. • It only compares the delays experienced by packets of equal size, as OWD is dependent on packet size because of serialization delay

  11. Packet Loss • Packet loss is determined as the probability of a packet being lost in transit from a source A to a destination B. • Applications requiring reliable transmission e.g. Bulk data transfers, use retransmission, which reduces performance. • In addition, congestion-sensitive protocols such as standard TCP assume that packet loss is due to congestion, and respond by reducing their transmission rate accordingly. • Congestion and errors are the two main reasons for packet loss.

  12. Congestion • When the offered load exceeds the capacity of a part of the network, packets are buffered in queues. • Since these buffers are also of limited capacity, congestion can lead to queue overflows, which leads to packet losses. • Congestion can be caused by • moderate overload condition maintained for an extended amount of time or • by the sudden arrival of a very large amount of traffic (traffic burst).

  13. Errors • Another reason for loss of packets is corruption, where parts of the packet are modified in-transit. • When such corruptions happen on a link (due to noisy lines etc.), this is usually detected by a link-layer checksum at the receiving end, which then discards the packet.

  14. Packet Reordering • The Internet Protocol (IP) does not guarantee the packet ordering. • Packet reordering concerns packets of different sizes. • larger packets take longer to transfer and may be overtaken by smaller packets in transit. • It can be measured by: • injecting the same traffic pattern via traffic generator and calculating the reordering. • To measure maximal reordering: short burst of long packets immediately followed by a short burst of short packets

  15. Maximum Transmission Unit (MTU) • It describes the maximum size of an IP packet that can be transferred over the link without fragmentation. • Common MTU sizes are: • 1500 bytes (Ethernet, 802.11 WLAN) • 4470 bytes (FDDI, common default for POS and serial links) • 9000 bytes (Internet2 and GÉANT convention, limit of some Gigabit Ethernet adapters) • 9180 bytes (ATM, SMDS)

  16. Bandwidth Delay Product (BDP) • The Bandwidth Delay Product (BDP) of an end-to-end path is the product of the bottleneck bandwidth and the delay of the path. • It is often useful to think of BDP as the "memory capacity" of a path, i.e. the amount of data that fits entirely into the path between two end-systems. • This relates to throughput, which is the rate at which data is sent and received. • Network paths with a large BDP are called Long Fat Networks or LFNs. • BDP is an important parameter for the performance of window-based protocols such as TCP.

  17. Available bandwidth • dynamic bandwidth capacity (Cap) by analyzing the minimum inter-packet delay; • the cross-traffic (Xtr) byanalyzing the inter-packet delay variability; • and the available bandwidth • (Abw = Cap – Xtr)

  18. Active Probing Tools • Some publicly available active network monitoring tools are: • Throughput & Delay Measurement Tools • Ping • Traceroute • Iperf • Thrulay • Path Characterization & Bandwidth Estimation • pathChirp • Pathload • ABwE • Netperf • Nettest

  19. Ping • Uses ICMP echo mechanism • Measures • RTT • Packet loss • Reachability • Source & Destination use timestamps to calculate RTT • Packet loss can be measured by sending a set of packets from a source to a destination and comparing the number of received packets against the number of packets sent.

  20. Remote host Ping example Repeat count Packet size RTT syrup:/home$ ping -c 6 -s 64 thumper.bellcore.com PING thumper.bellcore.com (128.96.41.1): 64 data bytes 72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms 72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms 72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms 72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms 72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms --- thumper.bellcore.com ping statistics --- 6 packets transmitted, 5 packets received, 16% packet loss round-trip min/avg/max = 482.1/880.5/1447.4 ms Missing seq # Summary

  21. Traceroute • Determines the route a packet takes through the Internet to reach its destination; i.e. the number of "hops" it takes. • It sends "probe" packets with TTL values incrementing from one, and uses ICMP "Time Exceeded" messages to detect "hops" on the way to the specified destination. • It also records "response" times for each hop, and displays losses and other types of failures • Used to measure • hop-by-hop connectivity and • RTT

  22. Traceroute Max hops Remote host Probes/hop • UDP/ICMP tool to show route packets take from local to remote host 17cottrell@flora06:~>traceroute -q 1 -m 20 lhr.comsats.net.pk traceroute to lhr.comsats.net.pk (210.56.16.10), 20 hops max, 40 byte packets 1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2) 0.642 ms 2 RTR-MSFC-DMZ.SLAC.Stanford.EDU (134.79.135.21) 0.616 ms 3 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.66) 0.716 ms 4 snv-slac.es.net (134.55.208.30) 1.377 ms 5 nyc-snv.es.net (134.55.205.22) 75.536 ms 6 nynap-nyc.es.net (134.55.208.146) 80.629 ms 7 gin-nyy-bbl.teleglobe.net (192.157.69.33) 154.742 ms 8 if-1-0-1.bb5.NewYork.Teleglobe.net (207.45.223.5) 137.403 ms 9 if-12-0-0.bb6.NewYork.Teleglobe.net (207.45.221.72) 135.850 ms 10 207.45.205.18 (207.45.205.18) 128.648 ms 11 210.56.31.94 (210.56.31.94) 762.150 ms 12 islamabad-gw2.comsats.net.pk (210.56.8.4) 751.851 ms 13 * 14 lhr.comsats.net.pk (210.56.16.10) 827.301 ms No response: Lost packet or router ignores

  23. Iperf • It reports bandwidth, delay jitter, datagram loss. • It has two modes • TCP • UDP • In TCP mode it is used to measure thebandwidth. • It reports MSS (Maximum Segment Size) and the observed read sizes. It supports TCP window size adjustment via socket buffers. • In UDP mode packet loss and delay jitter can be measured.

  24. Thrulay • Thrulay stands for THRUput and deLAY. • It measures the capacity of a network by sending a bulk TCP stream over it. • One-way delay can be measured by sending a Poisson stream (arrivals at random) of very precisely positioned UDP packets.

  25. Pathload • Pathload is based on the technique of Self-Loading Periodic Streams (SLoPS), for measuringavailable bandwidth. • A periodic stream in SLoPS consists of K packets of size L, sent to the path at a constant rate R. • If the stream rate R is higher than the available bandwidth, the one-way delays of successive packets at the receiver show an increasing trend. • In pathload, sender uses UDP for packet stream and TCP connection for controlling measurements. • Sender timestamps each packet upon transmission, which provides relative one-way delay at the receiver side.

  26. pathChirp • pathChirp is used to calculate unused capacity or available bandwidth. • Based on the concept of self-induced congestion. • Probing rate < available bw  no delay increase • Probing rate > available bw  delay increases • Unique to pathChirp is an exponentially spaced chirp probing train.

  27. ABwE • ABwE is a light weight available bandwidth measurement tool based on the packet pair dispersion technique. • It uses fixed size packets with specific delay between each packet pair. • The observed packet pair delay is converted into available bandwidth calculations. • The tool can be used in continuous mode and detects all substantial bandwidth changes caused by improper routing or by congestions.

  28. Netperf • Netperf is a benchmark that can be used to measure the performance of many different types of networking. • It provides tests for both unidirectional throughput, and end-to-end latency. • Essentially, these tests will measure how fast one system can send data to another and/or how fast that other system can receive it. • Request/response performance is the second area that can be investigated with netperf.

  29. Selection of tools • Shriram et al. concludes his comparison of public End-to-End bandwidth estimation tools on High-Speed Links with following remarks: • “Pathload and Pathchirp are the most accurate. Iperf requires maximum buffer size and is sensitive to small packet loss.” • Ribeiro et al. in comparisons of pathChirp with pathload and TOPP, clearly show that pathChirp performs far batter than the other two tools in single-hop, multi-hop scenarios and on real internet as well. • Cottrel et al. states “Ping utility does not require any extra installation/deployment efforts and has become a de-facto standard for measuring RTT and packet loss.” • Tools selected are pathChirp, pinger, thrulay and traceroute.

  30. References • Cottrell L., Internet Monitoring, Presented at NUST Institute of Information Technology (NIIT) Rawalpindi, Pakistan, March 15, 2005 • Cottrell L., Matthews W. and Logg C., Tutorial on Internet Monitoring & PingER at SLAC • Jacobson V., “pathchar — a tool to infer characteristics of Internet paths” • Ribeiro V., Riedi R., Baraniuk R., Navratil J., Cottrell L. pathChirp: Efficient Available Bandwidth Estimation for Network Paths. • Jain M., Dovrolis C., End-to-End Available Bandwidth: Measurement Methodology, Dynamics, and Relation with TCP Throughput. ACM SIGCOMM 2002. • Navratil J. and. Cottrell L., ABwE: A Practical Approach to Available Bandwidth Estimation. • http://thrulay-hd.sourceforge.net

  31. References • http://www.netperf.org/netperf/training/Netperf.html • Beginner's Guide to Network-Distributed Resource Usage • Cottrell, L., Logg, C.: Overview of IEPM-BW Bandwidth Testing of Bulk Data Transfer. In: Sc2002: High Performance Networking and Computing. (2002) • http://dast.nlanr.net/Projects/Iperf/iperfdocs_1.7.0.html • http://www-didc.lbl.gov/NCS/netest/ • Shriram A., Murray M., Hyun Y., Brownlee N., Broido A., Fomenkov M., claffy k., “Comparison of Public End-to-End Bandwidth Estimation tools on High-Speed Links” • Prasad R. S., Dovrolis C., “Capacity estimation using packet dispersion techniques : recent progress and open issues” Presentation • http://www.academ.com/nanog/june1997/performance.html • Labit Y., Owezarski P., Larrieu N., Evaluation of Active Measurement Tools for Bandwidth Estimation in Real Environment. • Dovrolis C., Jain M., “End-to-end available bandwidth estimation using Pathload” Presentation • claffy k., Dovrolis C., “Bandwidth estimation: measurement methodologies and applications” Presentation • GÉANT2 Performance Enhancement and Response Team (PERT) User Guide and Best Practice Guide

  32. Anomaly Detection Techniques • Exponential weighted moving average (EWMA) • Holt-Winters forecasting algorithm • Plateau Algorithm • Principal Component Analysis (PCA) • ARIMA-based Box-Jenkins forecasting models • Signal analysis techniques, e.g. the wavelet analysis scheme

  33. EWMA • EWMA, a forecasting based techniques • EWMA predicts the next value in a given timeseries based on recent history. • Specifically, if ztdenotes the traffic at time t, then the EWMA prediction for time t+1 is given by ^zt+1 as: • Anomalies can then be quantified by taking the difference between the forecasted and the actual value, i.e.,

  34. Holt-Winters (HW) Algorithm • Service network variable time series exhibit the following regularities: • A trend over time (i.e., a gradual increase in application daemon requests over a two month period due to increased subscriber load). • A seasonal trend or cycle (i.e., every day bytes per second increases in the morning hours, peaks in the afternoon and declines late at night). • Seasonal variability. (i.e., application requests fluctuate wildly minute by minute during the peak hours of 4-8 pm, but at 1 am application requests hardly vary at all).

  35. Holt-Winters (HW) Algorithm • The Holt-Winters (HW) algorithm uses a triple Exponential Weighted Moving Average (EWMA) approximation to characterize the time series behavior as a superposition of three components: a baseline, a linear trend and a seasonal effect.

  36. Plateau Algorithm • It is used to detect a significant change in the base RTT of a path. • Base RTT means a change in most of the RTT values measured, new values are based on the new, higher level. • The plateau detector is based around two windows, • the summary window and • the sample window. • Summary window is used to characterize the normal state of the path which is based on the previous samples. • The number of samples stored in the summary window is a user settable parameter.

  37. Plateau Algorithm • Samples are first stored in summary window, trigger selector decides if a sample forms part of a trigger. • In this case, the sample is stored in the sample window. • This decision is made on the value of the sample, the mean and variance of the summary window, and on parameters that the user has chosen. • RTT > samplemean + (samplevariance* sensitivty)

  38. Plateau Algorithm • A trigger is generated when a number of samples (a user settable parameter) meet the above equation are received. • Each time a sample, that fulfils the above equation, is received a counter is incremented. • For a non-zero counter when a sample that does not meet the above equation is received the counter is decremented. • If the counter reaches the user parameter trigger duration a trigger is generated. If the counter returns to zero the trigger is aborted. • In either case, data in the sample window is passed to the summary window.

  39. Principal Component Analysis (PCA) • PCA is a coordinate transformation method that maps a given set of data points onto new axes. These axes are called the principal axes or principal components. • When working with zero-mean • data, each principal component has the property that it points in • the direction of maximum variance remaining in the data, given • the variance already accounted for in the preceding components. • As such, the first principal component captures the variance of the • data to the greatest degree possible on a single axis. The next principal • components then each capture the maximum variance among • the remaining orthogonal directions. Thus, the principal axes are • ordered by the amount of data variance that they capture.

  40. References • McGregor A.J. and Braun H-W, ”Automated Event Detection for Active Measurement Systems”, Proceedings of PAM2001, Amsterdam, Netherlands, April 2001

  41. Preparation of Canonical dataset • Fetch data from monitoring nodes (Pathchirp, Thrulay, Ping, Traceroute) - Adnan, Ali • Study/Analyze Data - Adnan, Ali • Run Run-period - Adnan • Analyze Alerts - Adnan, Ali • Design a format - Adnan, Ali • Rearrange data according to new format - Adnan, Ali

  42. Monitoring Node Monitoring Node Monitoring Node Monitoring Node Monitored Nodes = 21 Topology to be used

  43. Data being collected using this topology • Pinger: Round trip Time (Min, Max, Avg) • Pathchirp: Throughput (Avg, Max, Min, Stdev) • Thrulay: Throughput (Avg, Max, Min, Stdev) RTT (Avg, Max, Min, Stdev) • trace route summaries • Host Monitoring data (e.g. data from LISA, Nagios, Ganglia) --- candidate

  44. Format of canonical data set • Not sure, needs some more investigation of available data from multiple sources. • Data is fetched from 1st Jan, 06 to 28th Feb 2006.

More Related