
End-to-end Performance over Research Networks

Simon Leinen, SWITCH. Wizard gap, PERT, performance monitoring, Premium IP.


Presentation Transcript


  1. End-to-end Performance over Research Networks
     Simon Leinen, SWITCH
     Wizard gap, PERT, performance monitoring, Premium IP

  2. End-to-end Performance Issues
     - Performance seen by end users hasn't followed backbone upgrades
       - “Wizard gap” (ordinary users vs. land speed record heroes)
     - Issues solving multi-domain performance problems
     - Issues solving multi-layer performance problems
     - Lack of performance-oriented network monitoring
     -> The “ends” must be included in network performance work!
       - endpoints, i.e. hosts, operating systems, applications (users even)
       - campus networks and their administrators

  3. Various efforts to improve e2e performance
     - Internet2 “e2epi” (end-to-end performance initiative)
       - Performance workshops
     - Web100 kernel instrumentation and other TCP enhancements for Linux
       - enable end-user tools such as NDT (e.g. ndt.switch.ch)
       - auto-tuning for TCP buffers
       - experimental TCP variants (Vegas, Westwood, HS-TCP, BIC, S-TCP, H-TCP...)
     - GN2 PERT (Performance Enhancement and Response Team)
       - “like a CERT but for performance”
       - chartered to “own” performance issues (no fingerpointing)
       - collect knowledge, produce documentation (to make itself obsolete)
     - Premium IP and other backbone-specific enhancements

  4. Bandwidth is not everything
     - Most transfers over the Internet (including the GTREN) limited by RTT
       - TCP window-size limitations for “LFNs” (Long Fat Networks)
       - short flows
       - delay-sensitive applications (conversational A/V, RPC, games...)
     -> what works well in the LAN won't always do so over the WAN
       - help users tune TCP (Web100/NDT very useful here)
       - provide assistance with application design and engineering
       - alternatives to TCP etc.
     - RTT harder to improve than bandwidth
       - speed-of-light issue (btw. router hop-count quickly becoming irrelevant)
       - some inter-continental connections more useful than others
         e.g. TEIN link through Siberia reduces EU-China RTT by half
     - Other important performance indicators: availability, predictability...
     -> using capacity as prime “connectivity” metric no longer justified.
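
The RTT argument on this slide can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the presentation: the function names, the 150 ms RTT, the 1 Gb/s link, the 64 KiB window, and the assumed signal speed in fibre (about two thirds of c) are all illustrative assumptions.

```python
# Illustrative sketch: all numeric inputs below are assumptions, not slide data.

# Rule-of-thumb signal speed in optical fibre, roughly two thirds of c (assumed).
SPEED_IN_FIBER_M_PER_S = 2.0e8


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


def min_rtt_s(path_km: float) -> float:
    """Propagation-only lower bound on RTT for a fibre path of this length."""
    return 2 * path_km * 1000 / SPEED_IN_FIBER_M_PER_S


if __name__ == "__main__":
    rtt = 0.150  # hypothetical inter-continental RTT in seconds
    print(f"BDP at 1 Gb/s: {bdp_bytes(1e9, rtt) / 1e6:.2f} MB")
    print(f"64 KiB window: {window_limited_throughput_bps(65535, rtt) / 1e6:.2f} Mb/s")
    print(f"Minimum RTT over 10000 km: {min_rtt_s(10_000) * 1000:.0f} ms")
```

Under these assumptions an untuned 64 KiB window caps a single flow at a few Mb/s regardless of backbone capacity, while the propagation bound shows why a shorter physical route is the only way to lower the RTT itself.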

  5. Example from right here (how NOT to do it)

                                   My traceroute  [v0.71]
     agathe (0.0.0.0)                                    Wed May 24 10:24:32 2006
     Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                      Packets                  Pings
         Host                        Loss%   Snt   Last    Avg   Best   Wrst  StDev
      1. 10.129.21.252                0.0%   377    5.1    8.3    2.5  181.5   15.5
      2. 10.64.1.8                    1.3%   377  531.6  507.7  125.1  992.6  152.5
      3. 172.28.95.109                2.1%   377  544.3  506.3   98.1  1003.  157.6
      4. 172.28.74.22                 1.6%   377  499.9  509.9  123.5  1204.  162.7
      5. 172.28.76.19                 1.6%   377  479.8  512.4  117.8  1155.  160.2
      6. 172.28.76.33                 2.7%   377  475.0  513.0  110.3  1134.  159.7
      7. 172.28.75.17                 2.7%   377  421.9  515.9  135.5  1102.  158.2
      8. 172.28.87.4                  2.9%   376  424.8  517.4  119.1  1067.  154.8
      9. 172.28.218.241               2.1%   376  583.6  522.1  113.3  1096.  159.4
     10. 193.158.5.13                 2.9%   376  536.9  513.6  107.3  919.3  156.1
     11. zrh-e4.ZRH.CH.net.DTAG.DE    3.7%   376  556.2  526.1  106.6  1027.  154.3
     12. swiix1-g2-1.switch.ch        2.9%   376  511.2  534.6  120.0  1087.  158.8
     13. 130.59.36.249                2.9%   376  533.0  529.7  139.7  1053.  152.1
     14. swiCS3-10GE-1-1.switch.ch    2.7%   376  527.4  525.6  111.8  1052.  148.1
     15. swiNM1-G1-0-25.switch.ch     1.6%   376  529.3  528.9  125.7  1090.  150.4
     16. swiLM1-V610.switch.ch        2.4%   376  510.2  526.9  136.2  1037.  153.8
     17. diotima.switch.ch            1.9%   375  575.2  526.9  149.9  959.0  152.4
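
To get a feel for what such a path means for a bulk TCP transfer, the final-hop figures (average RTT 526.9 ms, 1.9% loss) can be fed into the well-known Mathis et al. approximation, rate ≈ (MSS / RTT) · C / √p. This is a rough sketch only: the formula is not part of the slides, the 1460-byte MSS and the constant C = √(3/2) are assumptions, and the loss seen by mtr's probes is merely a proxy for the loss a TCP flow would experience.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate steady-state TCP throughput: (MSS / RTT) * C / sqrt(p)."""
    c = math.sqrt(3 / 2)  # constant for the simplest variant of the model (assumed)
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    # RTT and loss are read from the last hop of the trace; the MSS is assumed.
    estimate = mathis_throughput_bps(mss_bytes=1460, rtt_s=0.5269, loss_rate=0.019)
    print(f"Estimated TCP throughput: {estimate / 1e3:.0f} kb/s")
```

With these inputs the estimate comes out at roughly 200 kb/s, which illustrates why high delay combined with even moderate loss dominates over raw link capacity.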

  6. GN2 PERT
     - Part of SA3 (Service Activity – End-to-end Quality of Service)
       - also called PACE - “Performance and Allocated Capacity for End-Users”
     - PERT Case Managers
       - mostly from several NRENs
       - duty CMs, rotating weekly (with videoconference briefings)
       - dedicated CMs for some cases
       - reachable through PTS (PERT Ticket System) or pert-report@geant2.net
     - Subject Matter Experts (SMEs)
       - participation issues of “recruiting” and involvement (on demand vs. interest-based)
     - PERT Knowledge Base (KB)
       - currently Wiki-based - http://kb.pert.switch.ch/
       - “Performance Guides” published as deliverables

  7. GN2 PERT Ticket System (PTS)

  8. PERT Knowledge Base (KB)

  9. GN2 PERT Cases (closed)
     - DEISA TCP Throughput Reduction
       - solved – due to GEANT packet reordering with heavy cross-traffic
       - will partly go away with GEANT2 (some of the routers are upgraded)
     - DEISA-Teragrid Performance (TCP throughput)
       - closed, but not solved in due time (until demo was over)
     - DEISA TCP Throughput issues with some sites
       - found RTT dependency, GEANT->GEANT2 changes explain variations
     - Loss of large packets on one of the e-VLBI (-> JIVE) paths
       - resolved by configuration
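
Packet reordering, the cause found in the first DEISA case, hurts TCP because out-of-order arrivals generate duplicate ACKs that the sender can mistake for loss. The sketch below shows one simple way to quantify reordering from a list of arrival sequence numbers, in the spirit of the RFC 4737 "reordered packet" definition. It is not the method used by the PERT; the function name and the sample sequence are hypothetical.

```python
from typing import Iterable


def count_reordered(arrivals: Iterable[int]) -> int:
    """Count packets that arrive with a sequence number below the next expected one."""
    next_expected = 0
    reordered = 0
    for seq in arrivals:
        if seq >= next_expected:
            # In-order (or first after a gap): advance the expectation.
            next_expected = seq + 1
        else:
            # Arrived after a packet with a higher sequence number.
            reordered += 1
    return reordered


if __name__ == "__main__":
    # Hypothetical arrival order: packets 3 and 7 are overtaken by later packets.
    sample = [1, 2, 4, 5, 3, 6, 8, 7]
    print(f"Reordered packets: {count_reordered(sample)} of {len(sample)}")
```

A count like this, taken from active probes or a packet capture at the receiver, gives a first indication of whether reordering is frequent enough to explain reduced TCP throughput.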

  10. GN2 PERT cases (ongoing)
      - ITER VPN
        - information-gathering phase – VPN makes traditional diagnostics hard
      - e-VLBI
        - ongoing investigation – infrequent tests and network changes over time
      - EU->US routing through Japan
        - ongoing, but maybe not really a case for PERT?
        - or, should we have all (GTREN) BGP geeks participate as SMEs?

  11. GN2 PERT Experience
      Weaknesses
      - Few, and often difficult (but interesting!) cases
        - Mostly large groups: DEISA, e-VLBI (JIVE), DESY/FNAL, ITER...
        - Trying to open up to larger customer base
      - It's hard to close cases!
        - lack of clear success indicators
      - Friction can be further reduced
        - weekly Case Manager handover, PTS, SME involvements
      Strengths
      - Brings users (researchers) closer to NOCs
        - Mutual learning experience
        - Bodes well for PERT Knowledge Base
      - Provides vital input on measurement infrastructure requirements
      - Inspires PERT activities in NRENs

  12. SWITCH PERT Example: Opera oberta
      - Opera oberta
        - high-quality multicast transmissions of opera from Barcelona and Madrid
        - mostly Spanish participants, but a few in FR, MX, and now CH
        - currently 9 Mb/s DVB+D5.1, experimenting with HDTV (~15 Mb/s)
      - Customer (EPFL) contacted us early
        - tests were unsatisfactory (due to problems at source, it turns out)
        - set up NOC support (awareness, test participation, monitoring)
        - one transmission still failed (due to misconfigured SWITCH router)
        - fixed problem, improved NOC support (out-of-hours service)
        - next transmission (last night) a success – it had to be...
      -> include aspects of availability and support in “performance” notion

  13. Conclusions
      - significant potential for service improvements on current infrastructure
        - end-host tuning, delay-robust protocols, better NOC cooperation
      - PERT concept really helps
        - improves customers' “reach” into backbones
        - “user interface” can still be improved
      - Leverage new developments in the future
        - backbone measurement instrumentation, e.g. GN2 JRA1 PerfSONAR
        - Premium IP and other “on-demand” services
      - Long-term benefits
        - smart users + dumb networks -> unexpected performance and innovation
        - The end-to-end principles are honoured!
