
NASA EOS Active Network Performance Testing Using Web100




1. NASA EOS Active Network Performance Testing Using Web100
   Andy Germain, Swales Aerospace
   1 August 2002
   Andy.Germain@gsfc.nasa.gov, 301-902-4352

2. EOS Active Testing Overview
   • End-to-end, user-level test
   • Active testing, no visibility into network internals
   • Communities
     • EOS Internal Network: 9 sites, 8 sources, 13 sinks
       • "Production" flows, dedicated bandwidth
     • EOS Science Users: about 50 sites, tested from the EOS DAACs
       • "QA" and science flows, often via Abilene
     • CEOS: about 20 international sites
       • Earth Observation data sharing
   • Purposes
     • Verify that networks as implemented meet SLAs and/or requirements
     • Assess whether networks can support intended applications
     • Resolve user complaints: a network problem -- or something elsewhere?
     • Determine bottlenecks -- seek routing alternatives
     • Provide a basis for allocation of additional resources
   • Results at http://corn.eos.nasa.gov/networks

3. Test Process
   • Test script runs hourly to each site:
   • Traceroute (one-way)
     • Number of hops -- route stability -> Hops chart
   • Pings
     • 100 pings prior to the thruput test and/or 100/300 during
     • Round-trip time -> RTT chart
     • Packet loss -> Packet Loss chart
   • TCP throughput
     • Iperf -> Thruput chart
       • Keeps the send buffer full for 30 seconds
     • Netstat -> packets retransmitted (if pings are blocked)
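The ping step above reduces each batch of probes to a loss percentage and an RTT figure for the charts. A minimal sketch of that reduction, assuming Linux-style ping summary output (the deck does not show the script's actual parsing, so the function name and regexes here are illustrative):

```python
import re

def parse_ping_summary(output):
    """Extract packet loss (%) and average RTT (ms) from ping's summary lines."""
    loss = rtt = None
    m = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    if m:
        loss = float(m.group(1))
    # Linux ping prints: rtt min/avg/max/mdev = <min>/<avg>/<max>/<mdev> ms
    m = re.search(r"= [\d.]+/([\d.]+)/", output)
    if m:
        rtt = float(m.group(1))
    return loss, rtt

sample = (
    "100 packets transmitted, 97 received, 3% packet loss, time 9912ms\n"
    "rtt min/avg/max/mdev = 59.1/60.4/77.2/2.1 ms\n"
)
print(parse_ping_summary(sample))  # -> (3.0, 60.4)
```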

4. EOS Performance Test Sites
   [Map of North American test sites, including ASF, Toronto, CCRS, MIT, BU, ORST, SUNY-SB, Penn State, EDC, Colo St., GSFC, UMD, NOAA, SLAC, UVA, NCAR, NSIDC, LaRC, NCDC, UCSB, JPL, LANL, MSFC/NSSTC, UCSD, Texas, USF, Miami. Key: EOS DAACs, NASA nodes, SCFs, QA, other nodes]

5. EOS International Test Sites
   [World map of international test sites, including RAL and Oxford (Aura), IRE-RAS, CAO (SAGE III), UCL (Terra), Toronto (Terra), KNMI (Aura), JRC, MITI (Terra), ESRIN, NASDA (ADEOS, TRMM, Aura, Aqua), Israel, AIT, RFD, GISTDA, INPE (Aqua), IDN, CONAE, CSIRO. Key: mission partner, PI (QA/IST), EOSDIS, CEOS]

6. Uses of Web100
   • One of our sources at GSFC runs Web100
     • King = "GSFC MAX"
     • Connected to MAX by GigE
   • Typical use is in problem solving
     • DTB, Triage
     • Window size (easier to use than tcpdump)
       • Vs. circuit limitations vs. packet loss
   • Also ANL iperf
     • Window size again
   • Plan: extract packet drops from Web100, not pings or netstat

7. A Recent Case
   • Sending data from LaRC to JPL via a project-dedicated 20 Mbps ATM VC
   • Problem surfaced after a firewall was installed
     • Portus "proxy" firewall
   • RTT of 60 ms requires 150 KB windows
     • To fill the pipe with a single TCP stream
   • Iperf worked well -- a single stream typically got over 15 Mbps
   • But ftp got < 8 Mbps
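The 150 KB figure above is the bandwidth-delay product: the window a single TCP stream needs to keep the path full, since at most one window can be in flight per round trip. Checking the slide's numbers:

```python
def window_bytes(bandwidth_bps, rtt_s):
    """Minimum TCP window (bytes) to fill a path: the bandwidth-delay product."""
    return bandwidth_bps * rtt_s / 8  # bits in flight per RTT, converted to bytes

# The LaRC-to-JPL case: 20 Mbps ATM VC at 60 ms RTT
print(window_bytes(20e6, 0.060))  # -> 150000.0 bytes, i.e. 150 KB
```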

8. A Recent Case (2)
   • The problem, of course, was window size
     • Looked like the ftp application, since iperf performance showed the O/S was OK
     • But which end?
   • Ran ftps from both nodes to the Web100 node
     • Used DTB to capture window size
     • Problem: small disk quota -> FTPs were quick
     • FTP data session not established until the ftp started
       • So we had to be quick to capture data with DTB
   • DTB showed one site had 64 KB windows
   • But the problem was in the O/S (IRIX), not ftp
     • tcp_recvspace and tcp_sendspace
     • Iperf can exceed O/S defaults!
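The last bullet is the key to the diagnosis: iperf explicitly requests large socket buffers, so it is not bound by the system defaults that capped ftp at 64 KB. A hedged sketch of how an application does this through the portable socket API (illustrative only; iperf's own implementation is in C):

```python
import socket

def open_with_windows(size):
    """Create a TCP socket requesting `size`-byte send/receive buffers.

    This must happen before connect(), so that a window large enough is
    advertised (and window scaling negotiated) in the SYN. The kernel may
    round or cap the value -- Linux, for instance, doubles the request.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return s

s = open_with_windows(150_000)  # the 150 KB needed for 20 Mbps at 60 ms
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
s.close()
```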

9. Case #2
   • Another case of limited thruput
     • This time iperf was limited
     • From one source to several destinations
     • Limit inversely proportional to RTT -> window size
   • But source and destination clearly used large windows
   • Testing to the Web100 box showed the source was not using extended windows
     • Tcpdump on the source showed it was!
   • Problem turned out to be a PIX firewall
     • It NOP'd out the WSCALE option!
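Windows larger than 64 KB only work because RFC 1323 window scaling multiplies the 16-bit TCP window field by 2^wscale, with the shift negotiated once in the SYN. A firewall that NOPs out the WSCALE option therefore silently pins both ends to 65535 bytes, exactly the symptom above. The minimum shift for a given window, as a sketch:

```python
def wscale_needed(window_bytes):
    """Smallest RFC 1323 window-scale shift such that the 16-bit window
    field can represent the window: 65535 << shift >= window_bytes.
    Returns 0 when no scaling is needed."""
    shift = 0
    while (65535 << shift) < window_bytes:
        shift += 1
    return shift

print(wscale_needed(65_535))     # -> 0: fits in the raw 16-bit field
print(wscale_needed(150_000))    # -> 2: the 150 KB windows in the LaRC case
print(wscale_needed(1_000_000))  # -> 4: the 1 MB windows in Case #3
```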

10. Case #3
   • Iperf from GSFC to the Tokyo XP
     • Via MAX, Abilene, Seattle, TransPac
   • Thruput appears to ramp up linearly for about 5 minutes (when there is no loss)
   • Then becomes window limited:
     • 1 MB window @ 188 ms RTT -> 42.5 Mbps
   • Repeatable (more or less)
   • Low or no packet loss
   • Web100 Triage usually reports 100% path limited
     • But can't show the early part of the session (?)
   • What causes this ramp-up?
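The 42.5 Mbps ceiling is again just window divided by RTT: once the window is the bottleneck, only one window's worth of data can be in flight per round trip. Checking the slide's figures (assuming 1 MB is counted as 10^6 bytes, which matches the 42.5 Mbps quoted):

```python
def window_limited_mbps(window_bytes, rtt_s):
    """Maximum single-stream TCP throughput (Mbit/s) when window-limited:
    one full window delivered per round trip."""
    return window_bytes * 8 / rtt_s / 1e6

# Case #3: 1 MB window at 188 ms RTT (GSFC to Tokyo XP)
print(window_limited_mbps(1_000_000, 0.188))  # -> ~42.55 Mbit/s
```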

11. Traceroute
traceroute to perf.jp.apan.net (203.181.248.44), 30 hops max, 38 byte packets
 1  enpl-rtr1-ge (198.10.49.57)  0.427 ms  0.325 ms  0.396 ms
 2  169.154.192.49 (169.154.192.49)  0.397 ms  0.375 ms  0.275 ms
 3  169.154.192.2 (169.154.192.2)  0.740 ms  1.266 ms  1.225 ms
 4  gsfc-wash.maxgigapop.net (206.196.177.13)  1.093 ms  1.169 ms  0.907 ms
 5  dcne-so3-1-0.maxgigapop.net (206.196.178.45)  1.434 ms  1.621 ms  1.410 ms
 6  abilene-wash-oc48.maxgigapop.net (206.196.177.2)  1.073 ms  1.439 ms  1.352 ms
 7  nycm-wash.abilene.ucaid.edu (198.32.8.46)  5.436 ms  5.570 ms  5.680 ms
 8  clev-nycm.abilene.ucaid.edu (198.32.8.29)  17.747 ms  17.954 ms  17.764 ms
 9  ipls-clev.abilene.ucaid.edu (198.32.8.25)  24.006 ms  24.380 ms  24.072 ms
10  kscy-ipls.abilene.ucaid.edu (198.32.8.5)  33.335 ms  33.263 ms  33.321 ms
11  dnvr-kscy.abilene.ucaid.edu (198.32.8.13)  43.781 ms  43.977 ms  43.756 ms
12  sttl-dnvr.abilene.ucaid.edu (198.32.8.49)  72.129 ms  72.286 ms  72.004 ms
13  TRANSPAC-PWAVE.pnw-gigapop.net (198.32.170.46)  72.204 ms  72.404 ms  72.220 ms
14  192.203.116.34 (192.203.116.34)  188.150 ms  188.216 ms  187.811 ms
15  perf.jp.apan.net (203.181.248.44)  187.786 ms  188.103 ms  188.040 ms

12. Typical Ramp-Up
Client connecting to perf.jp.apan.net, TCP port 5002
TCP window size: 1000 KByte (WARNING: requested 500 KByte)
------------------------------------------------------------
[  3] local 198.10.49.62 port 3623 connected with 203.181.248.44 port 5002
[ ID] Interval         Transfer     Bandwidth
[  3]   0.0-  1.5 sec    808 KBytes    4.4 Mbits/sec
[  3]   1.5-  2.1 sec    856 KBytes   12.1 Mbits/sec
[  3]   2.1-  3.0 sec    1.4 MBytes   12.3 Mbits/sec
[  3]   3.0-  4.2 sec    1.7 MBytes   12.6 Mbits/sec
[  3]  13.0- 14.2 sec    2.0 MBytes   14.8 Mbits/sec
[  3]  14.2- 15.1 sec    1.7 MBytes   15.2 Mbits/sec
[  3]  15.1- 16.0 sec    1.7 MBytes   15.2 Mbits/sec
[  3] 104.0-105.1 sec    2.8 MBytes   21.1 Mbits/sec
[  3] 105.1-106.0 sec    2.5 MBytes   22.4 Mbits/sec
[  3] 106.0-107.0 sec    2.8 MBytes   23.8 Mbits/sec
