The Performance Bottleneck: Application, Computer, or Network
Richard Carlson, Internet2

Presentation Transcript
Outline
  • Why there is a problem
  • What can be done to find/fix problems
  • Tools you can use
  • Ramblings on what’s next
Basic Premise
  • An application's performance should meet your expectations!
  • If it doesn't, you should complain!
Questions
  • How many times have you said:
    • What’s wrong with the network?
    • Why is the network so slow?
  • Do you have any way to find out?
    • Tools to check local host
    • Tools to check local network
    • Tools to check end-to-end path
Underlying Assumption
  • When problems exist, it's the network's fault!
Simple Network Picture

[Diagram: Bob's Host <-> Network Infrastructure <-> Carol's Host]
Network Infrastructure

[Diagram: routers R1-R9 and Switches 1-4 forming a meshed network infrastructure]
Possible Bottlenecks
  • Network infrastructure
  • Host computer
  • Application design
Network Infrastructure Bottlenecks
  • Links too small
    • Using standard Ethernet instead of FastEthernet
  • Links congested
    • Too many hosts crossing this link
  • Scenic routing
    • End-to-end path is longer than it needs to be
  • Broken equipment
    • Bad NIC, broken wire/cable, cross-talk
  • Administrative restrictions
    • Firewalls, filters, shapers, restrictors
Host Computer Bottlenecks
  • CPU utilization
    • What else is the processor doing?
  • Memory limitations
    • Main memory and network buffers
  • I/O bus speed
    • Getting data into and out of the NIC
  • Disk access speed
Application Behavior Bottlenecks
  • Chatty protocol
    • Lots of short messages between peers
  • High reliability protocol
    • Send packet and wait for reply before continuing
  • No run-time tuning options
    • Use only default settings
  • Blaster protocol
    • Ignore congestion control feedback
TCP 101
  • Transmission Control Protocol (TCP)
    • Provides applications with a reliable in-order delivery service
    • The most widely used Internet transport protocol
      • Web, File transfers, email, P2P, Remote login
  • User Datagram Protocol (UDP)
    • Provides applications with an unreliable delivery service
      • RTP, DNS
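The two transport services map directly onto the two standard socket types; a minimal Python sketch of the distinction:

```python
import socket

# TCP: connection-oriented, reliable, in-order byte stream
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# UDP: connectionless datagrams with no delivery or ordering guarantees
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

print(tcp_sock.type, udp_sock.type)

tcp_sock.close()
udp_sock.close()
```

Applications that need reliability (web, file transfer, email) pick SOCK_STREAM; loss-tolerant applications (RTP, DNS) pick SOCK_DGRAM.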
Summary – Part 1
  • Problems can exist at multiple levels
    • Network infrastructure
    • Host computer
    • Application design
  • Multiple problems can exist at the same time
  • All problems must be found and fixed before things get better
Summary – Part 2
  • Every problem exhibits the same symptom
    • The application's performance doesn't meet the user's expectations!
Outline
  • Why there is a problem
  • What can be done to find/fix problems
  • Tools you can use
  • Ramblings on what’s next
Real Life Examples
  • I know what the problem is
  • Bulk transfer with multiple problems
Example 1 - SC’04 experience
  • Booth having trouble getting an application to run from Amsterdam to Pittsburgh
  • Tests between the remote SGI and a local PC showed throughput limited to < 20 Mbps
  • Assumption: PC buffers too small
  • Question: how do we set the WinXP send/receive window size?
SC’04 Determine WinXP info

http://www.dslreports.com/drtcp

SC’04 Confirm PC settings
  • DrTCP reported 16 MB buffers, but the test program was still slow. Q: how to confirm?
  • Run test to SC NDT server (PC has Fast Ethernet Connection)
    • Client-to-Server: 90 Mbps
    • Server-to-Client: 95 Mbps
    • PC Send/Recv window size: 16 Mbytes (wscale 8)
    • NDT Send/Recv window Size: 8 Mbytes (wscale 7)
    • Reported TCP RTT: 46.2 msec
      • approximately 600 Kbytes of data in TCP buffer
    • Min window size / RTT: 1.3 Gbps
SC’04 Local PC Configured OK
  • No problem found
  • Able to run at line rate
  • Confirmed that PC’s TCP window values were set correctly
SC’04 Remote SGI
  • Run test from remote SGI to SC show floor (SGI is Gigabit Ethernet connected).
    • Client-to-Server: 17 Mbps
    • Server-to-Client: 16 Mbps
    • SGI Send/Recv window size: 256 Kbytes (wscale 3)
    • NDT Send/Recv window Size: 8 Mbytes (wscale 7)
    • Reported RTT: 106.7 msec
    • Min window size / RTT: 19 Mbps
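The last bullet is the key arithmetic: TCP can send at most one window of data per round trip, so throughput is bounded by window size divided by RTT. A quick check with the SGI's numbers:

```python
# TCP sends at most one window of data per round trip,
# so achievable throughput <= window / RTT.
window_bytes = 256 * 1024   # SGI's 256 KB send/receive window
rtt = 0.1067                # reported 106.7 msec RTT

limit_bps = window_bytes * 8 / rtt
print(f"{limit_bps / 1e6:.1f} Mbps")  # ~19.7 Mbps, in line with the slide's 19 Mbps
```

This bound explains the measured 16-17 Mbps regardless of the gigabit NIC.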
SC’04 Remote SGI Results
  • Needed to download and compile command line client
  • SGI TCP window is too small to fill transatlantic pipe (19 Mbps max)
  • User reluctant to make changes to SGI network interface from SC show floor
  • NDT client tool allows application to change buffer (setsockopt() function call)
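The setsockopt() call the NDT client relies on has a direct Python analogue; a minimal sketch (the kernel may clamp the request to its configured maximum, so always read the value back):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Request 2 MB send/receive buffers before connecting
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 2 * 1024 * 1024)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2 * 1024 * 1024)

# Read back what the kernel actually granted (Linux reports double
# the requested value to account for bookkeeping overhead)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("kernel granted", granted, "bytes")
sock.close()
```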
SC’04 Remote SGI (tuned)
  • Re-run test from remote SGI to SC show floor.
    • Client-to-Server: 107 Mbps
    • Server-to-Client: 109 Mbps
    • SGI Send/Recv window size: 2 Mbytes (wscale 5)
    • NDT Send/Recv window Size: 8 Mbytes (wscale 7)
    • Reported RTT: 104 msec
    • Min window size / RTT: 153.8 Mbps
SC’04 Debugging Results
  • Team spent over 1 hour looking at Win XP config, trying to verify window size
  • Single NDT test verified this in under 30 seconds
  • 10 minutes to download and install NDT client on SGI
  • 15 minutes to discuss options and run client test with set buffer option
SC’04 Debugging Results
  • 8 minutes to find SGI limits and determine the maximum allowable window setting (2 MB)
  • Total time: 34 minutes to verify the problem was the remote SGI's TCP send/receive window size
  • Network path verified, but the application still performed poorly until it was also tuned
Example 2 – SCP file transfer
  • Bob and Carol are collaborating on a project. Bob needs to send a copy of the data (50 MB) to Carol every ½ hour. Bob and Carol are 2,000 miles apart. How long should each transfer take?
    • 5 minutes?
    • 1 minute?
    • 5 seconds?
What should we expect?
  • Assumptions:
    • 100 Mbps Fast Ethernet is the slowest link
    • 50 msec round trip time
  • Bob & Carol calculate:
    • 50 MB * 8 = 400 Mbits
    • 400 Mb / 100 Mb/sec = 4 seconds
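Bob and Carol's back-of-the-envelope estimate, as code:

```python
# Best-case transfer time: payload size / slowest-link rate,
# ignoring TCP slow start and protocol overhead.
file_bits = 50 * 10**6 * 8   # 50 MB file
link_bps = 100 * 10**6       # Fast Ethernet bottleneck

seconds = file_bits / link_bps
print(seconds)  # 4.0
```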
Initial Test Results
  • The first transfer took 18 minutes. This is unacceptable!
  • First look for network infrastructure problem
    • Use NDT tester to examine both hosts
NDT Found Duplex Mismatch
  • Investigation found that the switch port was hard-set to 100 Mbps full-duplex, so the auto-negotiating host fell back to half-duplex
    • The network administrator corrects the configuration and asks for a re-test
Intermediate Results
  • Time dropped from 18 minutes to 40 seconds.
  • But our calculations said it should take 4 seconds!
    • 400 Mb / 40 sec = 10 Mbps
    • Why are we limited to 10 Mbps?
    • Are you satisfied with 1/10th of the possible performance?
Calculating the Window Size
  • Remember Bob found the round-trip time was 50 msec
  • Calculate the throughput limit of the default 85.3 KB window
    • 85.3 KB * 8 b/B = 698,777 b
    • 698,777 b / .050 s = 13.98 Mbps
  • Calculate the new window size
    • (100 Mb/s * .050 s) / 8 b/B = 610.3 KB
    • Use 1 MB as a minimum
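The same bandwidth-delay-product arithmetic in Python, using the slide's numbers (85.3 KB is taken to be the host's default window):

```python
link_bps = 100 * 10**6   # Fast Ethernet bottleneck
rtt = 0.050              # 50 msec round trip

# Throughput ceiling imposed by the default 85.3 KB window
default_window_bytes = 85.3 * 1024
limit_bps = default_window_bytes * 8 / rtt
print(f"{limit_bps / 1e6:.2f} Mbps")   # 13.98 Mbps, the ~10 Mbps-class limit observed

# Window needed to fill the link: bandwidth * delay
needed_bytes = link_bps * rtt / 8
print(f"{needed_bytes / 1024:.0f} KB") # ~610 KB, so round up to 1 MB
```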
Steps so far
  • Found and fixed Duplex Mismatch
    • Network Infrastructure problem
  • Found and fixed TCP window values
    • Host configuration problem
  • Are we done yet?
Intermediate Results
  • SCP still runs slower than expected
    • Hint: SCP uses internal buffers
    • Patch available from PSC
Final Results
  • Fixed infrastructure problem
  • Fixed host configuration problem
  • Fixed Application configuration problem
    • Achieved target time of 4 seconds to transfer 50 MB file over 2000 miles
Why is it hard to Find/Fix Problems?
  • Network infrastructure is complex
  • Network infrastructure is shared
  • Network infrastructure consists of multiple components
Shared Infrastructure
  • Other applications accessing the network
    • Remote disk access
    • Automatic email checking
    • Heartbeat facilities
  • Other computers are attached to the closet switch
    • Uplink to campus infrastructure
  • Other users on and off site
    • Uplink from campus to gigapop/backbone
Other Network Components
  • DHCP (Dynamic Host Configuration Protocol)
    • At least 2 packets exchanged to configure your host
  • DNS (Domain Name System)
    • At least 2 packets exchanged to translate an FQDN into an IP address
  • Network Security Devices
    • Intrusion Detection, VPN, Firewall
Network Infrastructure
  • Large complex system with potentially many problem areas
Why is it hard to Find/Fix Problems?
  • Computers have multiple components
  • Each Operating System (OS) has a unique set of tools to tune the network stack
  • Application Appliances come with few knobs and limited options
Computer Components
  • Main CPU (clock speed)
  • Front & Back side bus
  • Main Memory
  • I/O Bus (ATA, SCSI, SATA)
  • Disk (access speed and size)
Computer Issues
  • Lots of internal components with multi-tasking OS
  • Lots of tunable TCP/IP parameters that need to be ‘right’ for each possible connection
Why is it hard to Find/Fix Problems?
  • Applications depend on default system settings
  • Problems scale with distance
  • More access to remote resources
Default System Settings
  • For Linux 2.6.13 there are:
    • 11 tunable IP parameters
    • 45 tunable TCP parameters
    • 148 Web100 variables (TCP MIB)
      • Currently no OS ships with default settings that work well over trans-continental distances
  • Some applications allow run-time setting of some options
    • 30 settable/viewable IP parameters
    • 24 settable/viewable TCP parameters
    • There are no standard ways to set run-time option ‘flags’
Application Issues
  • Setting tunable parameters to the ‘right’ value
  • Getting the protocol ‘right’
How Do You Set Realistic Expectations?
  • Assume network bandwidth exists, or find out what the limits are
    • Local LAN connection
    • Site access link
  • Monitor the link utilization occasionally
    • Weathermap
    • MRTG graphs
  • Look at your host config/utilization
    • What is the CPU utilization?
Ethernet, FastEthernet, Gigabit Ethernet
  • 10/100/1000 auto-sensing NICs are common today
  • Most campuses have installed 10/100 switched infrastructure
  • Access network links are currently the limiting factor in most networks
  • Backbone networks are 10 Gigabit/sec
Site Access and Backbone
  • Campus access via Regional ‘GigaPoP’
    • Confirm with campus admin
  • Abilene Backbone
    • 10 Gbps POS links coast-to-coast
  • Other Federal backbone networks
  • Other commercial networks
  • Other institutions, sites, and networks
Tools, Tools, Tools
  • Ping
  • Traceroute
  • Iperf
  • Tcpdump
  • Tcptrace
  • BWCTL
  • NDT
  • OWAMP
  • AMP
  • Advisor
  • Thrulay
  • Web100
  • MonaLisa
  • pathchar
  • NPAD
  • Pathdiag
  • Surveyor
  • Ethereal
  • CoralReef
  • MRTG
  • Skitter
  • Cflowd
  • Cricket
  • Net100
Active Measurement Tools
  • Tools that inject packets into the network to measure some value
    • Available Bandwidth
    • Delay/Jitter
    • Loss
  • Requires bi-directional traffic or synchronized hosts
Passive Measurement Tools
  • Tools that monitor existing traffic on the network and extract some information
    • Bandwidth used
    • Jitter
    • Loss rate
  • May generate some privacy and/or security concerns
Outline
  • Why there is a problem
  • What can be done to find/fix problems
  • Tools you can use
  • Ramblings on what’s next
Focus on 3 tools
  • Existing NDT tool
    • Allows users to test a network path for a limited number of common problems
  • Existing NPAD tool
    • Allows users to test local network infrastructure while simulating a long path
  • Emerging PerfSonar tool
    • Allows users to retrieve network path data from major national and international REN networks
Network Diagnostic Tool (NDT)
  • Measure performance to the user's desktop
  • Identify real problems for real users
    • Network infrastructure is the problem
    • Host tuning issues are the problem
  • Make tool simple to use and understand
  • Make tool useful for users and network administrators
NDT user interface
  • Web-based Java applet allows testing from any browser
  • Command-line client allows testing from a remote login shell
NDT test suite
  • Looks for specific problems that affect a large number of users
    • Duplex Mismatch
    • Faulty Cables
    • Bottleneck link capacity
    • Achievable throughput
    • Ethernet duplex setting
    • Congestion on this network path
Duplex Mismatch Detection
  • Developing an analytical model to describe how the network operates (no prior art?)
  • Expanding the model to describe UDP and TCP flows
  • Testing models in LAN, MAN, and WAN environments

NIH/NLM grant funding

Four Cases of Duplex Setting
  • FD-FD
  • FD-HD
  • HD-FD
  • HD-HD
Bottleneck Link Detection
  • What is the slowest link in the end-to-end path?
    • Monitors packet arrival times using the libpcap routine
    • Uses TCP dynamics to create packet pairs
    • Quantizes results into link-type bins (no fractional or bonded links)

Cisco URP grant work
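A simplified sketch of the packet-pair idea behind this detection (not the actual NDT code, and the bin table here is illustrative): back-to-back packets leave the bottleneck link separated by its serialization time, so capacity is roughly packet size divided by arrival gap, quantized into link-type bins.

```python
# Illustrative link-type bins (label, nominal capacity in bits/sec)
LINK_BINS = [
    ("Ethernet", 10e6),
    ("FastEthernet", 100e6),
    ("GigabitEthernet", 1e9),
]

def classify(packet_bytes, gap_seconds):
    # Capacity estimate from packet-pair spacing
    est_bps = packet_bytes * 8 / gap_seconds
    # Choose the bin whose nominal rate is closest to the estimate
    return min(LINK_BINS, key=lambda b: abs(est_bps / b[1] - 1))[0]

# 1500-byte packets arriving 120 microseconds apart -> ~100 Mbps
print(classify(1500, 120e-6))
```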

Normal congestion detection
  • Shared network infrastructures will cause periodic congestion episodes
    • Detect/report when TCP throughput is limited by cross traffic
    • Detect/report when TCP throughput is limited by own traffic
Faulty Hardware/Link Detection
  • Detect non-congestive loss due to
    • Faulty NIC/switch interface
    • Bad Cat-5 cable
    • Dirty optical connector
  • Preliminary work shows that it is possible to distinguish between congestive and non-congestive loss
Full/Half Link Duplex setting
  • Detect half-duplex link in E2E path
    • Identify when throughput is limited by half-duplex operations
  • Preliminary work shows detection possible when link transitions between blocking states
Finding Results of Interest
  • Duplex Mismatch
    • This is a serious error and nothing will work right. Reported on main page and on Statistics page
  • Packet Arrival Order
    • Inferred value based on TCP operation. Reported on Statistics page, (with loss statistics) and order: value on More Details page
Finding Results of Interest
  • Packet Loss Rates
    • Calculated value based on TCP operation. Reported on Statistics page, (with out-of-order statistics) and loss: value on More Details page
  • Path Bottleneck Capacity
    • Measured value based on TCP operation. Reported on main page
Additional Functions and Features
  • Provide basic tuning information
  • Basic Features
    • Basic configuration file
    • FIFO scheduling of tests
    • Simple server discovery protocol
    • Federation mode support
    • Command line client support
  • Created sourceforge.net project page
NPAD/pathdiag
  • A new tool from researchers at the Pittsburgh Supercomputing Center
  • Finds problems that affect long network paths
  • Uses a Web100-enhanced, Linux-based server
  • Web-based Java client
Long Path Problem
  • E2E application performance is dependent on the distance between hosts
  • Full size frame time at 100 Mbps
    • Frame = 1500 Bytes
    • Time = 0.12 msec
    • In flight for 1 msec RTT = 8 packets
    • In flight for 70 msec RTT = 583 packets
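The arithmetic on this slide, checked in Python:

```python
# Packets that must be in flight to keep a 100 Mbps link busy for one RTT
frame_bits = 1500 * 8
link_bps = 100 * 10**6

frame_time = frame_bits / link_bps   # 0.00012 s = 0.12 msec per frame
lan_pkts = round(0.001 / frame_time) # 1 msec RTT
wan_pkts = round(0.070 / frame_time) # 70 msec RTT
print(lan_pkts, wan_pkts)            # 8 583
```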
Long Path Problem

[Diagram: the router/switch mesh with hosts H1, H2, and H3 attached; the H1-H2 path has a 1 msec RTT, the H1-H3 path has a 70 msec RTT, with a point marked X on the long path]
TCP Congestion Avoidance
  • Cut the number of packets in flight by ½
  • Increase by 1 packet per RTT
    • LAN (RTT=1msec)
      • In flight changes to 4 packets
      • Time to increase back to 8 is 4msec
    • WAN (RTT = 70 msec)
      • In flight changes to 292 packets
      • Time to increase back to 583 is 20.4 seconds
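The recovery times above follow directly from Reno's additive increase of one packet per RTT; a quick check:

```python
# After a loss, Reno halves the congestion window, then grows it by
# one packet per RTT, so recovery takes (full - half) round trips.
def recovery_seconds(full_window_pkts, rtt):
    half = full_window_pkts // 2
    return (full_window_pkts - half) * rtt

print(recovery_seconds(8, 0.001))    # LAN: 0.004 s
print(recovery_seconds(583, 0.070))  # WAN: ~20.4 s
```

This is why a loss that is invisible on a LAN costs tens of seconds on a transcontinental path.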
PerfSonar – Next Steps in Performance Monitoring
  • New Initiative involving multiple partners
    • ESnet (DOE labs)
    • GEANT (European Research and Education network)
    • Internet2 (Abilene and connectors)
PerfSonar – Router stats on a path
  • Demo ESnet tool

https://performance.es.net/cgi-bin/perfsonar-trace.cgi

Paste output from Traceroute into the window and view the MRTG graphs for the routers in the path

Author: Joe Metzger ESnet

The Wizard Gap*

* Courtesy of Matt Mathis (PSC)

Google it!
  • Enter “tuning tcp” into the Google search engine.
  • Top 2 hits are:

http://www.psc.edu/networking/perf_tune.html

http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html

Internet2 Land Speed Record
  • Challenge to the community to demonstrate how to run fast, long-distance flows
  • 2000 record – 751 Mbps over 5,262 km
  • 2005 record - 7.2 Gbps over 30,000 km
Conclusions
  • Applications can fully utilize the network
  • All problems have a single symptom
    • All problems must be found and fixed before things get better
    • Some people stop investigating before finding all problems
  • Tools exist, and more are being developed, to make it easier to find problems
Outline
  • Why there is a problem
  • What can be done to find/fix problems
  • Tools you can use
  • Ramblings on what’s next
Introduction
  • Where have we been and where are we headed?
    • Technology and hardware
    • Transport Protocols
Basic Assumption
  • The Internet was designed to improve communications between people
What does the future hold?
  • Moore’s Law shows no signs of slowing down
    • The original law says the number of transistors on a chip doubles every 18 months
    • Now it simply means that everything gets faster
PC Hardware
  • CPU processing power (flops) is increasing
  • Front/back side bus clock rate is increasing
  • Memory size is increasing
  • HD size is increasing too
    • For the past 10 years, every HD I’ve purchased cost $130
Scientific Workstation
  • PC or Sparc class computer
    • Fast CPU
    • 1 GB RAM
    • 1 TB disk
    • 10 Gbps NIC
  • Today’s cost ~ $5,000
Network Capability
  • LAN networks (includes campus)
  • MAN/RON network
  • WAN network
  • Remember the 80/20 rule
Network NIC costs
  • 10 Mbps NICs were $50 - $150 circa 1985
  • 100 Mbps NICs were $50 - $150 circa 1995
  • 1,000 Mbps NICs are $50 - $150 circa 2005
  • 10 Gbps NICs are $1,500 - $2,500 today
  • Note: today 10/100/1000 cards are common and 10/100 cards are < $10
Ethernet Switches
  • Unmanaged 5 port 10/100 switch ~ $25.00
  • Unmanaged 5 port 10/100/1000 switch ~ $50
  • Managed switches have more ports and are more expensive ($150 - $400 per port)
Network Infrastructure
  • Campus
  • Regional
  • National
  • International
Campus Infrastructure
  • Consists of switches, routers, and cables
  • Limited funds make it hard to upgrade
Regional Infrastructure
  • Many states have optical networks
    • Illinois has I-Wire
  • Metro area optical gear is ‘reasonably’ priced
  • Move by some to own fiber
  • Flexible way to cut operating costs, but requires larger up-front investment
National Infrastructure
  • Commercial vendors have pulled fiber to major metro areas
  • NLR – n x 10 Gbps
  • Abilene - 1 x 10 Gbps (Qwest core)
  • FedNets - (DoE, DoD, and NASA all run national networks)
  • CA*net – n x 10 Gbps
  • Almost 500 Gbps into SC|05 conference in Seattle
International Infrastructure
  • Multiple trans-atlantic 10 Gbps links
  • Multiple trans-pacific 10 Gbps links
  • Gloriad
Interesting sidebar
  • China's demand for copper, aluminum, and steel has caused an increase in theft
    • Man hole covers
    • Street lamps
    • Parking meters
    • Phone cable
  • One possible solution is to replace copper wires with FTTH solutions
Transport Protocol
  • TCP Reno has known problems with loss at high speeds
    • Linear growth following packet loss
    • No memory of past achievements
  • TCP research groups are actively working on solutions:
    • HighSpeed-TCP, Scaleable-TCP, Hamilton-TCP, BIC, CUBIC, FAST, UDT, Westwood+
    • Linux (2.6.13) has run-time support for these stacks
What drives prices?
  • Electronic component prices are driven by the number of units produced
    • Try buying a brand-new i386 CPU
    • Try upgrading your PC’s CPU
    • NIC’s are no different