Internet Monitoring - Results

Internet Monitoring - Results. Les Cottrell & Warren Matthews, SLAC <cottrell@slac.stanford.edu> <warrenm@slac.stanford.edu>. Presented at the XIWT Meeting, San Francisco, Mar 1998. http://www.slac.stanford.edu/grp/scs/net/talk/xiwt-mar98/


Presentation Transcript


  1. Internet Monitoring - Results Les Cottrell & Warren Matthews SLAC <cottrell@slac.stanford.edu> <warrenm@slac.stanford.edu> Presented at the XIWT Meeting, San Francisco, Mar 1998 http://www.slac.stanford.edu/grp/scs/net/talk/xiwt-mar98/ Partially funded by MICS joint SLAC/LBL proposal on Internet End-to-end Performance Monitoring (IEPM) \\pcbackup\users\cottrell\xiwg\xiwt-mar98.ppt

  2. Outline of Talk • What, why & how are we (ESnet/HENP community) measuring? • What PingER measurement reports are available and what do they show • short, intermediate & long term • Traffic volume & Traceroute measurements • Summary • Deployment/development, Internet Performance, Next Steps • Collaborations

  3. Why go to the effort? • Apparent quality of Internet getting worse as size and demands increase • Internet woefully under-measured & under-instrumented • Internet very diverse - no single path typical • Users need: • realistic expectations, planning information • guidelines for setting SLAs • information to help in identifying problems • help to decide where to apply resources

  4. Why the focus on Ping • “Universally available”, easy to understand • no software for clients to install • Low network impact • Select hosts carefully, concerns over routers, loaded hosts etc. • Provides end-to-end (user view vs network infrastructure view) loss, response time, reachability, unpredictability

  5. Importance of Response Time • Time is the scarcest and most valuable commodity • Studies in the late 70s and early 80s showed the economic value of Rapid Response Time • 0-0.4s High productivity interactive response • 0.4-2s Fully interactive regime • 2-12s Sporadically interactive regime • 12s-600s Break in contact regime • >600s Batch regime • Threshold around 4-5s, above which complaints increase rapidly • Voice has threshold around 100ms
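
The regimes listed on this slide can be read as a simple lookup table. The following is a minimal Python sketch, not code from the talk: the function name `response_regime` is hypothetical, and placing a value that falls exactly on a boundary in the faster regime is an assumption, since the slide does not say which side the boundaries belong to.

```python
import bisect

# Upper bounds (in seconds) of the first four regimes, as listed on the slide.
REGIME_BOUNDS = [0.4, 2.0, 12.0, 600.0]
REGIME_NAMES = [
    "high productivity interactive",
    "fully interactive",
    "sporadically interactive",
    "break in contact",
    "batch",
]


def response_regime(seconds: float) -> str:
    """Return the regime name for a response time given in seconds."""
    if seconds < 0:
        raise ValueError("response time cannot be negative")
    # Assumption: a value equal to a bound belongs to the faster regime,
    # which is what bisect_left gives us.
    return REGIME_NAMES[bisect.bisect_left(REGIME_BOUNDS, seconds)]
```

For example, a 5 second response falls in the sporadically interactive regime, which is also where the slide places the 4-5s complaint threshold.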

  6. Perception of Poor Packet Loss • Above 4-6% packet loss video conferencing becomes irritating, and non-native language speakers become unable to communicate. • The occurrence of long delays of 4 seconds or more at a frequency of 4-5% or more is also irritating for interactive activities such as telnet and X windows. • Above 10-12% packet loss there is an unacceptable level of back-to-back loss of packets and extremely long timeouts, connections start to get broken, and video conferencing is unusable.

  7. Our Main Metric is Ping • “Universally available”, easy to understand • no software for clients to install • Low network impact • select hosts carefully, concerns over routers, loaded hosts etc. • Provides loss, response time, reachability, unpredictability • Provides useful real world measures

  8. Ping Response vs Web Response [Plot: HTTP GET response (ms) against minimum ping response (ms)]

  9. Method • Measurement • Each Collection site keeps list of remote hosts to ping at sites it is interested in • Every 30 mins ping each remote host with 11 * 100 byte followed by 10 * 1000 byte pings • Min separation of pings is 1 second, timeout 20 seconds • Throw away first ping • Measure response, packet loss, host unreachable (no answer to any ping) • Record loss, min/avg/max response time and make available • Have Poisson sampling & median measurement in beta
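
The recording step of the method above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the PingER code: the function name `summarize_batch`, the input layout (one round-trip time per ping, with `None` for a ping that timed out), and the choice to judge reachability on the pings that remain after the first is discarded are all assumptions.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_batch(rtts_ms: Sequence[Optional[float]],
                    discard_first: bool = True) -> dict:
    """Summarize one batch of pings sent to one remote host.

    rtts_ms holds one entry per ping, in the order sent: the round-trip
    time in ms, or None when no reply arrived before the timeout.
    """
    # "Throw away first ping", as the slide describes.
    kept = list(rtts_ms[1:] if discard_first else rtts_ms)
    replies = [r for r in kept if r is not None]
    sent = len(kept)
    return {
        "sent": sent,
        "received": len(replies),
        "loss_pct": 100.0 * (sent - len(replies)) / sent if sent else None,
        # Host counts as unreachable when no kept ping was answered.
        "unreachable": sent > 0 and not replies,
        "min_ms": min(replies) if replies else None,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }
```

With 11 pings of 100 bytes and the first one discarded, each batch yields 10 samples, so loss is reported in steps of 10%.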

  10. Architecture [Diagram labels: Remote hosts; Pings; Collecting sites; Cache; Archive; Analysis sites (e.g. HEPNRC, e.g. SLAC); Reports & Data via HTTP/WWW]

  11. Long Term Reports • Currently only available from SLAC • Tabular reports generated automatically by SAS • Monthly averages: • Response time, packet loss for prime time (SLAC) • Quiescent frequency • Reachability & Unpredictability

  12. Monthly Packet Loss • Links to Excel data and 180 day plot • Sorted to show worst at top • Colored for quality

  13. Graphical Analysis • Use Excel manually or with macros for more detailed analysis • graphs • means, medians, standard deviations, distributions, percentiles • fits

  14. Ranked packet loss for 3 months [Plot series: Stanford, Rome, UK, Cincinnati]

  15. Sawtooth Effect [Plot annotations: 2 * capacity (+ 2Mbps); Added 45 Mbps (quadrupled capacity); 3 * capacity + 9 Mbps; Holiday]

  16. RAL Last 180 Days plot • Lines are simply cubic spline fits to aid the eye • Upper green and black points are response time in ms • Red & blue are weekday loss • Cyan are weekend loss • Note weekend/weekday differences • Note Xmas/New Year lull • Also note quick onset of saturation at end August & September

  17. Ping Response & Loss between HEPNRC & Manchester Dec-Jan ‘97/98

  18. Italian sites look similar to each other

  19. Representative International HENP Site Loss Jan-95 thru Nov-97 • Note RL (UK) saw-tooths as UK-US bandwidth is added (Apr-96, Feb-97, Aug-97)

  20. Aggregation • Group measurements, for example: • by area (e.g. N. America E, N. America E, W. Europe/Japan, others, by country) • trans-oceanic links • separation e.g. number of hops, time zones crossed, IXPs crossed • ISP (ESnet, vBNS/I2, ...) • by monitoring site • one site seen from multiple sites • common interest/affiliation (XIWT, HENP …) • user selectable

  21. Group Selection for Ping Loss Plots • Allow wild cards • Allow pre-selected groups • In beta test
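
Selecting hosts either by wild card or by a pre-selected group name could look roughly like the Python sketch below. Everything specific in it is hypothetical: the group names, the patterns, the `select_hosts` function and the example host names are invented for illustration, and shell-style matching via `fnmatch` is an assumption about what "wild cards" means here, not a description of the PingER tools.

```python
from fnmatch import fnmatchcase

# Hypothetical pre-selected groups: name -> list of host name patterns.
GROUPS = {
    "uk": ["*.ac.uk"],
    "italy": ["*.infn.it"],
}


def select_hosts(hosts, selector):
    """Return the hosts matching a group name or a shell-style wild card.

    If selector names a pre-selected group, its patterns are used;
    otherwise selector itself is treated as a wild-card pattern.
    """
    patterns = GROUPS.get(selector, [selector])
    return [h for h in hosts
            if any(fnmatchcase(h, p) for p in patterns)]
```

A call such as `select_hosts(hosts, "uk")` uses the stored patterns, while `select_hosts(hosts, "*.edu")` matches the pattern directly.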

  22. Group Response Time Jan-95 thru Nov-97 • Improved between 1 and 2.5% / month • Response & loss show similar improvements

  23. Network Quiescence • Frequency of zero packet loss (for all time - not cut on prime time)
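
Read literally, the quiescence figure is the share of measurement samples that saw no packet loss at all. A minimal Python sketch of that reading follows; the function name `quiescence_pct` and the convention of skipping missing samples (`None`) are assumptions.

```python
def quiescence_pct(loss_pcts):
    """Percentage of samples with zero packet loss.

    loss_pcts is an iterable of per-sample loss percentages covering all
    hours (no prime-time cut); None marks a missing sample and is skipped.
    """
    samples = [x for x in loss_pcts if x is not None]
    if not samples:
        return None
    zero_loss = sum(1 for x in samples if x == 0)
    return 100.0 * zero_loss / len(samples)
```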

  24. Ping Loss Quality • Want quick to grasp indicator of link quality • Loss is the most sensitive indicator • loss of packet requires ~ 4 sec TCP retry timeout • Studies on economic value of response time by IBM showed there is a threshold around 4-5 secs where complaints increase • 0-1% = Good • 1-2.5% = Acceptable • 2.5-5% = Poor • 5-12% = Very Poor • > 12% = Bad
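
The quality bands on this slide map directly onto a small classifier. The Python sketch below is illustrative only: the name `loss_quality` is hypothetical, and treating each upper bound as inclusive (so exactly 12% is still "Very Poor", in line with "> 12% = Bad") is an assumption for the other boundaries, which the slide leaves ambiguous.

```python
# (upper bound in % packet loss, label), in the order given on the slide.
LOSS_BANDS = [
    (1.0, "Good"),
    (2.5, "Acceptable"),
    (5.0, "Poor"),
    (12.0, "Very Poor"),
]


def loss_quality(loss_pct: float) -> str:
    """Classify a packet loss percentage into the slide's quality bands."""
    if not 0 <= loss_pct <= 100:
        raise ValueError("loss must be a percentage between 0 and 100")
    for upper, label in LOSS_BANDS:
        # Assumption: upper bounds are inclusive.
        if loss_pct <= upper:
            return label
    return "Bad"
```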

  25. Quality Distributions • ESnet median good quality • All other groups poor or very poor • Critical to have good peering

  26. Traffic Growth • Read out of external router • Exponential growth of 2.5-6% per month
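
To give a feel for what monthly growth of this size means, the sketch below converts a compound monthly growth rate into a doubling time. It is a worked illustration in Python, not part of the talk; it assumes the growth compounds monthly at a constant rate, and the function name `doubling_time_months` is hypothetical.

```python
import math


def doubling_time_months(monthly_growth_pct: float) -> float:
    """Months needed for traffic to double at a constant monthly growth rate."""
    rate = monthly_growth_pct / 100.0
    if rate <= 0:
        raise ValueError("growth rate must be positive")
    # Solve (1 + rate) ** n == 2 for n.
    return math.log(2) / math.log1p(rate)
```

Under that assumption, 2.5% per month doubles traffic in roughly 28 months, and 6% per month in roughly 12 months.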

  27. Traffic Volume for Germany (DFN) [Plots: DFN T1 utilization for 15 Jan ‘98 (5 min averages), green = to US, blue = from US; # of 2 min periods in Dec-96 with peak utilization > y %, to US and from US (axis label: # samples)]

  28. Traffic Volume for ESnet Italy Link [Plots: INFN T1 link utilization for 16 Jan’98; # of 2 min periods in Dec-96 with peak utilization > y %, from US and to US]

  29. Traceroute • Reverse traceroute servers • provides traceroute from Web server to client • available at about 30 HENP & ESnet sites

  30. TracePing • Multiple routes seen

  31. Traceroute From TRIUMF • Reverse traceroute servers • Traceping • TopologyMap • Ellipses show node on route • Open ellipse is measurement node • Blue ellipse not reachable • Keeping history

  32. Summary • Deployment/Development • ESnet/HENP has 14 Collection sites in 8 countries collecting data on > 500 links involving 22 countries • 600MB/month/link, 6 bps/link, .25 FTE @ analysis site, 1.5-2.5 FTE on analysis • HEPNRC gathering, archiving • Long term reports being ported to HEPNRC from SLAC • Long term analysis today requires tool like SAS • Cost of SAS (or Oracle) license is a problem for analysis site • XIWT/IPWT deployed ~ 6 collection sites using PingER tools

  33. Summary • Deployment/Development • Internet Performance • Performance within ESnet is good • Performance between ESnet & other sites is poor to very poor on average • one of main causes is congestion points, so peering is critical • ESnet traffic accepted from major HENP labs growing by 2.5-6% per month • Response time improving by 1-2% / month • Packet loss improving between SLAC & other sites by 3% / month

  34. Summary • Deployment/Development • Internet Performance (continued): • Links to sites outside N. America vary from good (KEK) to bad • Some of the bad sites are to be expected, e.g. FSU, China, Czech Republic; some are surprises, such as UK • CERN, France, Germany acceptable to poor

  35. Summary • Deployment/Development • Internet Performance • Next Steps • Improve tools: • Deploy Poisson sampling & median measurements • Extend MapPing & bring to production (work with NLANR), port traceping to Unix, extend deployment of traceroute topology map • Make long term reports at Analysis site available & understandable • Get group defining/selection going • Look at & compare site performance seen from multiple sites • Look at new visualization techniques • Look into prediction (extrapolations, develop models, configure and validate with data) • Pursue IETF Surveyor & NIMI deployment
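
Poisson sampling, listed among the next steps above, replaces a fixed measurement interval with random gaps drawn from an exponential distribution, so that measurements do not lock onto periodic network behavior. The Python sketch below shows the general technique only. It is not the PingER implementation: the function name `poisson_schedule` is hypothetical, and the slides do not state what mean interval the Poisson mode uses, so the 30 minute figure in the usage note is simply borrowed from the fixed schedule.

```python
import random


def poisson_schedule(mean_interval_s, duration_s, seed=None):
    """Return sample times (seconds from start) with exponential gaps.

    Gaps between consecutive samples are drawn independently from an
    exponential distribution with the given mean, which makes the
    sample times a Poisson process.
    """
    if mean_interval_s <= 0:
        raise ValueError("mean interval must be positive")
    rng = random.Random(seed)  # seed allows a reproducible schedule
    times = []
    t = rng.expovariate(1.0 / mean_interval_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(1.0 / mean_interval_s)
    return times
```

Calling `poisson_schedule(1800, 86400)` gives one day of sample times averaging one every 30 minutes, but with irregular spacing.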

  36. National Internet Measurement Infrastructure (NIMI) • Secure, scalable infrastructure for scheduling monitoring, gathering data • Minimal amount of human intervention • Inexpensive probe built on PC FreeBSD platform • Dynamic - can add/modify measurement suites, initially includes: • Traceroute • TReno - measures bulk transfer thruput • Poip - one way ping

  37. Asymmetric One-way Delays [Plots: loss (0% to 20%) and delay (0 ms to 300 ms) for U Chicago to Advanced and for Advanced to U Chicago]

  38. Summary • Deployment/Development • Internet Performance • Next Steps • Lots of collaboration: • SLAC & HEPNRC • 14 collection sites, ~ 400 remote sites • Collection site tools CERN & CNAF/ICFA • Oxford/TracePing • MapPing/MAPNet/NLANR • TRIUMF Traceroute topology Map • NIMI/LBNL & Surveyor/IETF • XIWT/IPWT • Talks at IETF, XIWT, ICFA, ESCC ...

  39. Summary • Deployment/Development • Internet Performance • Next Steps • Lots of collaboration: • To join: • Collection site needs: • perl5 & HTTP server • install timeping & pingdata (need only cgi-bin access, not root) • decide on links to monitor • Get an analysis site to retrieve & generate graphs, or at least get connectivity.pl & ping_data_plot.pl • Need volunteers to work on analysis scripts (some of it will require SAS); also need Java applets to visualize

  40. More Information • ICFA Monitoring WG home page (links to status report, meeting notes, how to access data, and code) • http://www.slac.stanford.edu/xorg/icfa/ntf/home.html • WAN Monitoring at SLAC has lots of links • http://www.slac.stanford.edu/comp/net/wan-mon.html • Tutorial on WAN Monitoring • http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html • MapPing Tool: • http://www.slac.stanford.edu/~warrenm/work/java/newjava/mapping.html • NIMI http://www.psc.edu/~mahdavi/nimi_paper/NIMI.html

  41. Internet Monitoring
