
Presentation Transcript


  1. TCP Tuning and E2E Performance
  Anders Magnusson
  TREFpunkt - October 20, 2004

  2. The speed-of-light problem
  • The sender must store every sent packet until it has received an ACK from the receiver
  • Due to speed-of-light limitations this may take a while, even in a small country like Sweden
  • The theoretical RTT Luleå-Stockholm is (1000 km / 300000 km/s) * 2 ≈ 6.7 ms; in reality it is about 20 ms
  • The TCP window size needed to keep up with 1 Gbit/s is then (1000/8 Mbyte/s) * 0.02 s = 2.5 Mbyte
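The slide's arithmetic can be checked with a short sketch; the figures (1000 km path, 300 000 km/s, a measured 20 ms RTT, 1 Gbit/s) are the slide's own:

```python
# Sketch of the slide's bandwidth-delay arithmetic.

def rtt_ms(distance_km, speed_km_s=300_000):
    """Theoretical round-trip time in milliseconds."""
    return distance_km / speed_km_s * 2 * 1000

def window_bytes(rate_bit_s, rtt_s):
    """Bandwidth-delay product: bytes in flight during one RTT."""
    return rate_bit_s / 8 * rtt_s

print(round(rtt_ms(1000), 1))              # 6.7 (ms, Lulea-Stockholm)
print(window_bytes(1_000_000_000, 0.020))  # 2500000.0 (bytes, 2.5 Mbyte)
```

The same bandwidth-delay product reappears throughout the talk: it is the minimum window, socket buffer, and in-flight storage the path requires.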

  3. Operating system buffers
  • Inside the operating system kernel there are usually a number of different buffers that affect performance
  • The term "buffers" is somewhat misleading: usually it is just some data structure used to reference data in memory (though in theory it could be real buffers as well)

  4. TCP window buffers
  • The TCP window sizes can be adjusted on virtually all operating systems
  • There are two windows, send and receive
  • The window size for one direction of flow is MIN(sender's send window, receiver's receive window)
  • The send window must be large enough to hold all segments sent during the RTT

  5. Socket buffers
  • Limit the amount of data an application may write to the kernel before being blocked
  • Often combined with the TCP send window; as ACKs are received, the socket buffer contents are adjusted accordingly
  • Must be >= the TCP window size to avoid limiting throughput
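On the application side, larger per-socket buffers can be requested with setsockopt(); a minimal sketch (the 2.5 Mbyte value is the window size from the speed-of-light example, not a recommendation):

```python
import socket

# Request larger send/receive buffers for one socket. The kernel caps
# the request at its configured maximum (e.g. kern.sbmax on NetBSD,
# net.core.wmem_max on Linux), so read back what was actually granted.
WANTED = 2_500_000  # bytes; the 2.5 Mbyte window from the earlier example

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, WANTED)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, WANTED)

granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print("send buffer granted:", granted)
s.close()
```

Note that on Linux the value read back is roughly twice the amount requested (the kernel doubles it to account for bookkeeping overhead), and it is bounded by net.core.wmem_max.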

  6. MBUF clusters
  • There are limits on how many network buffers (called MBUFs in many OSes) may be allocated
  • MBUFs may have external storage associated with them, allocated out of a separate (limited) area
  • These buffers are often allocated at compile time, and it is not uncommon for physical memory to be statically allocated for them

  7. Other knobs to turn
  RFC 1323
  • Turns on the "window scaling" option, needed to use TCP windows larger than 64k
  Initial window size
  • Avoid slow start by injecting more packets into the network at connection startup
  Interface queues
  • Must be able to hold the packets that are ready to send until the network interface can transmit them

  8. Problems often seen: packet loss
  • On a long-distance high-speed connection, packet loss in a TCP flow reduces throughput significantly
  • If the sender enters congestion avoidance, the congestion window opens linearly, and with large windows this is really slow
  • With an RTT of 185 ms and a window size of 25 Mbyte it takes around 50 minutes to get back to full speed
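The 50-minute figure follows from congestion avoidance growing the window by roughly one segment per RTT; a quick check (the 1460-byte MSS is an assumption matching a standard 1500-byte MTU, not stated on the slide):

```python
# In congestion avoidance the window grows by about one MSS per RTT,
# so recovering a full window takes (window / MSS) round trips.
def recovery_minutes(window_bytes, mss_bytes, rtt_s):
    rtts_needed = window_bytes / mss_bytes
    return rtts_needed * rtt_s / 60

print(round(recovery_minutes(25_000_000, 1460, 0.185)))  # 53 (minutes)
```

A larger MTU shortens this proportionally, which is one reason jumbo frames matter on long fat pipes.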

  9. Problems often seen: packet bursts
  • During the startup of a TCP bulk flow, the exponential increase in packet injection during slow start may cause packet bursts on links with a large bandwidth-delay product
  • The result may be that intermediate switches/routers must drop packets, even though TCP's self-clocking would otherwise not permit more packets to be sent than could be received

  10. Problems often seen: ACKs and window updates
  • The traditional approach for bulk flows is for the receiver to send an ACK for every second received packet
  • Window updates are sent as soon as data is delivered to the receiving process
  • This causes the return traffic to be more than half as many packets as the forward traffic
  • Interrupt and packet handling for these in the sending host may consume a significant amount of CPU

  11. Problems often seen: ARP timeouts
  • When an ARP entry times out, it is usually just removed from the ARP cache, and the next packet initiates a new ARP request
  • If there is an ongoing packet flow, this approach may cause packets to be dropped until an ARP reply is received

  12. Tuning of NetBSD
  • sysctl -w net.inet.tcp.rfc1323=1
    Activate the window scaling and timestamp options from RFC 1323
  • sysctl -w kern.somaxkva=[sbmax]
    Set the maximum size of all socket buffers together in the system
  • sysctl -w kern.sbmax=[sbmax]
    Set the maximum socket buffer size for one TCP flow
  • sysctl -w net.inet.tcp.recvspace=[wstd]
  • sysctl -w net.inet.tcp.sendspace=[wstd]
    Set the max size of the TCP windows
  • sysctl kern.mbuf.nmbclusters
    View the maximum number of mbuf clusters, used for storage of data packets to/from the network interface. Can only be set by recompiling your kernel

  13. Tuning of FreeBSD
  • sysctl net.inet.tcp.rfc1323=1
    Activate the window scaling and timestamp options from RFC 1323
  • sysctl kern.ipc.maxsockbuf=[sbmax]
    Set the maximum socket buffer size (and thereby the largest possible TCP window)
  • sysctl net.inet.tcp.recvspace=[wstd]
  • sysctl net.inet.tcp.sendspace=[wstd]
    Set the max size of the TCP windows
  • sysctl kern.ipc.nmbclusters
    View the maximum number of mbuf clusters, used for storage of data packets to/from the network interface. Can only be set at boot time

  14. Tuning of Linux
  • echo "1" > /proc/sys/net/ipv4/tcp_window_scaling
    Activate window scaling according to RFC 1323
  • echo [wmax] > /proc/sys/net/core/rmem_max
  • echo [wmax] > /proc/sys/net/core/wmem_max
    Set the maximum size of the TCP windows
  • echo [wmax] > /proc/sys/net/core/rmem_default
  • echo [wmax] > /proc/sys/net/core/wmem_default
    Set the default size of the TCP windows
  • echo "[wmin] [wstd] [wmax]" > /proc/sys/net/ipv4/tcp_rmem
  • echo "[wmin] [wstd] [wmax]" > /proc/sys/net/ipv4/tcp_wmem
    Set the min, default, and max window sizes, used by the autotuning function
  • echo "[bmin] [bdef] [bmax]" > /proc/sys/net/ipv4/tcp_mem
    Set the maximum total TCP buffer space allocatable, used by the autotuning function

  15. Tuning of Windows (2k, XP, 2k3)
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Tcp1323Opts = 1
    Turn on the window scaling option
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\TcpWindowSize = [wmax]
    Set the maximum size of the TCP window

  16. How to set a Land Speed Record
  Recipe:
  • Really high-quality networks
  • Hardware capable of sending/receiving fast enough
  • An operating system without foolish bottlenecks
  • Enthusiasts who spend weekends sending an obscene amount of data between Luleå and San Jose

  17. SUNET Internet Land Speed Record: network setup
  • End host in Luleå, Sweden, attached with 10GE to the GigaSunet OC-192 core
  • GigaSunet connected via OC-192 to the Sprintlink OC-192 core
  • End host in San Jose, CA, attached with 10GE
  • The network path consists of 42(!) router hops, using paths shared with other users of the networks

  18. Records submitted September 12
  • 1 966 080 000 000 bytes in 3648 real seconds = 4310 Mbit/second
  • 1831 Gbytes in almost exactly an hour
  • 120 000 packets/second transferred with an MTU of 4470 bytes
  • The record submitted for the IPv4 single and multiple stream classes is 124.935 petabit-meters/second (a 78% increase over our previous record)
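The record arithmetic can be re-run as a sanity check; the 28,983 km distance is taken from the fiber-path slide, and small differences from the submitted 124.935 figure come from rounding in the slide's numbers:

```python
# Sanity-check the submitted record figures.
bytes_sent = 1_966_080_000_000
seconds = 3648
mbit_s = bytes_sent * 8 / seconds / 1e6
print(round(mbit_s))        # 4312, close to the slide's rounded 4310

distance_m = 28_983_000     # Lulea to San Jose, from the fiber-path slide
pbm_s = mbit_s * 1e6 * distance_m / 1e15
print(pbm_s)                # about 125 petabit-meters/second
```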

  19. Compared with others
  Compared to the previous record, we can note that we achieved this using:
  • Less powerful end hosts
  • 200% longer distance
  • Less than half the MTU size (which generates a heavier CPU load on the end hosts)
  • The normal GigaSunet and Sprintlink production infrastructures

  20. Fiber path for the Internet LSR
  • The distance from Luleå, Sweden to San Jose, CA is approximately 28,983 km (18,013 miles)

  21. Network load (graph)

  22. More to read…
  • http://proj.sunet.se/LSR
    Describes how the Land Speed Record(s) were achieved
  • http://proj.sunet.se/E2E
    About end-to-end performance in GigaSunet
