Transport Protocols (continued) Relates to Lab 5. UDP and TCP
TCP: Flow Control and Congestion Control
What is Flow/Congestion Control? • Flow Control: Algorithms that prevent the sender from overrunning the receiver's buffer • Congestion Control: Algorithms that prevent the sender from overloading the network • The sliding window implements both control mechanisms.
Congestion Control [Figure: TCP connection 1 through a bottleneck router with capacity R]
TCP Flow Control TCP uses the sliding window algorithm to implement flow control
Sliding Window Flow Control • The sliding window protocol operates at the byte level. Here, the sender can transmit sequence numbers 6, 7, 8.
Sliding Window: “Window Opens” • An acknowledgement is received that enlarges the window to the right (AckNo = 5, Win = 6). [Figure: sequence numbers 1 to 11, before and after AckNo = 5, Win = 6 is received] • A receiver opens the window when its TCP buffer empties (meaning that data has been delivered to the application).
Window Management in TCP • The receiver returns two parameters to the sender: AckNo and Win • The interpretation is: I am ready to receive new data with SeqNo = AckNo, AckNo+1, ..., AckNo+Win-1 • The receiver can acknowledge data without opening the window • The receiver can change the window size without acknowledging data
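The interpretation of AckNo and Win can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the `next_seq` parameter (the first sequence number the sender has not yet transmitted) are assumptions introduced for the example and do not appear in the slides.

```python
def sendable_seqnos(ack_no: int, win: int, next_seq: int) -> range:
    """Sequence numbers the sender may still transmit.

    ack_no   -- AckNo returned by the receiver (next byte it expects)
    win      -- Win returned by the receiver (advertised window size)
    next_seq -- hypothetical: first sequence number not yet transmitted
    """
    window_end = ack_no + win        # first sequence number outside the window
    start = max(next_seq, ack_no)    # never send below the acknowledged point
    return range(start, max(start, window_end))


# With AckNo = 5 and Win = 6 the window covers 5..10.
# If 5 and 6 are already in flight, 7..10 may still be sent.
print(list(sendable_seqnos(ack_no=5, win=6, next_seq=7)))   # [7, 8, 9, 10]
```

An empty range means the window is closed and the sender has to wait for a new AckNo or a larger Win.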
TCP Congestion Control • Keep a sender from congesting the network. • The sender has two internal parameters: • Congestion window (cwnd) • Slow-start threshold (ssthresh) • The sliding window size is set to min(cwnd, receiver's advertised window) • Congestion control works in two modes: • Slow start (cwnd < ssthresh): probe the available bandwidth • Congestion avoidance (cwnd >= ssthresh): try not to overload the network.
Slow Start • Initial value: cwnd = 1 • Note: The unit here is one segment. TCP actually counts bytes and increments cwnd by 1 MSS (maximum segment size) • Modern TCP implementations may set the initial cwnd to 2 • Each time the sender receives an ACK, the congestion window is increased by 1 segment: cwnd = cwnd + 1 • If an ACK acknowledges two segments, cwnd is still increased by only 1 segment. • Even if an ACK acknowledges a segment that is shorter than MSS bytes, cwnd is increased by 1 maximum segment size. • Question: how can you accelerate your TCP download?
Slow Start Example • Not really slow: the congestion window grows very rapidly • For every ACK, cwnd is increased by 1, irrespective of the number of segments ACK’ed • TCP slows down the increase of cwnd once cwnd >= ssthresh
Congestion Avoidance • The congestion avoidance phase starts when cwnd has reached the slow-start threshold • If cwnd >= ssthresh, then each time an ACK is received, cwnd is incremented as follows: • cwnd = cwnd + 1/cwnd • So cwnd increases by about one segment only after all cwnd segments have been acknowledged, i.e., roughly once per round-trip time.
Example of Slow Start/Congestion Avoidance • Assume that ssthresh = 8 [Figure: cwnd (in segments) versus round-trip times, with ssthresh marked]
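The growth pattern in this example can be reproduced with a small simulation. This is a sketch under simplifying assumptions that are not in the slides: cwnd is counted in segments, the sender transmits int(cwnd) segments per round trip, every segment is acknowledged individually, and nothing is lost. The function names are made up for the illustration.

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """Update cwnd (in segments) for one received ACK."""
    if cwnd < ssthresh:
        return cwnd + 1.0           # slow start: +1 segment per ACK
    return cwnd + 1.0 / cwnd        # congestion avoidance: +1/cwnd per ACK


def simulate(ssthresh: float, round_trips: int) -> list:
    """Return cwnd at the start of each round trip (assumes no loss)."""
    cwnd = 1.0
    history = [cwnd]
    for _ in range(round_trips):
        for _ in range(int(cwnd)):  # one ACK per segment sent this round
            cwnd = on_ack(cwnd, ssthresh)
        history.append(cwnd)
    return history


for rtt, cwnd in enumerate(simulate(ssthresh=8, round_trips=6)):
    print(f"RTT {rtt}: cwnd = {cwnd:.2f}")
```

With ssthresh = 8 the window doubles per round trip (1, 2, 4, 8) and afterwards grows by slightly less than one segment per round trip.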
Responses to Congestion • TCP uses packet loss as congestion signal • A TCP sender can detect lost packets via: • Receipt of a duplicate ACK • Timeout of a retransmission timer
Response to Timeout • TCP interprets a timeout as a severe congestion signal. When a timeout occurs, the sender performs the following steps: • ssthresh is set to half of the current size of the congestion window: ssthresh = cwnd / 2 • cwnd is then reset to one: cwnd = 1 • slow start is entered
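The order of the two assignments matters: ssthresh has to be derived from the window size as it was before the reset. A minimal sketch, with cwnd counted in segments and a hypothetical function name:

```python
def on_timeout(cwnd: float) -> tuple:
    """Return (new_cwnd, new_ssthresh) after a retransmission timeout."""
    new_ssthresh = cwnd / 2   # half of the window in use when the timeout fired
    new_cwnd = 1.0            # restart from one segment; slow start follows
    return new_cwnd, new_ssthresh


print(on_timeout(16.0))   # (1.0, 8.0)
```

Because the new cwnd is below the new ssthresh, the sender is automatically back in slow start.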
Reaction to Duplicate ACKs • Fast retransmit: • Three duplicate ACKs indicate a packet loss • Retransmit without waiting for a timeout • Fast recovery: • Avoid slow start • Retransmit the “lost packet” • ssthresh = cwnd/2 • cwnd = ssthresh + 3 • Increment cwnd by one for each additional duplicate ACK • When an ACK arrives that acknowledges “new data”, set cwnd = ssthresh and enter congestion avoidance
Flavors of TCP Congestion Control • TCP Tahoe (1988, 4.3BSD Tahoe) • Slow Start • Congestion Avoidance • Fast Retransmit • TCP Reno (1990, 4.3BSD Reno) • Fast Recovery • Modern TCP implementations • New Reno (1996) • SACK (1996)
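The duplicate-ACK handling can be written as a small state machine. The sketch below is illustrative only: the class and method names are invented, cwnd is counted in segments, and on the third duplicate ACK the window is set to ssthresh + 3 as described in RFC 2581 (listed in the references).

```python
class RenoSender:
    """Toy model of fast retransmit / fast recovery (units: segments)."""

    def __init__(self, cwnd: float, ssthresh: float):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_recovery = False

    def on_dup_ack(self) -> bool:
        """Handle one duplicate ACK; return True if a retransmit is due."""
        self.dup_acks += 1
        if self.dup_acks == 3 and not self.in_recovery:
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh + 3   # per RFC 2581
            self.in_recovery = True
            return True                     # fast retransmit, no timeout
        if self.in_recovery:
            self.cwnd += 1                  # each further duplicate ACK
        return False

    def on_new_ack(self) -> None:
        """Handle an ACK that acknowledges new data."""
        if self.in_recovery:
            self.cwnd = self.ssthresh       # continue in congestion avoidance
            self.in_recovery = False
        self.dup_acks = 0


s = RenoSender(cwnd=16, ssthresh=64)
print([s.on_dup_ack() for _ in range(3)], s.cwnd, s.ssthresh)
# [False, False, True] 11.0 8.0
```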
TCP Tahoe [Figure omitted; source not given]
TCP Reno (Jacobson 1990) [Figure omitted; source not given. Labeled phases: SS (slow start), CA (congestion avoidance), fast retransmit/fast recovery]
Retransmissions in TCP • A TCP sender retransmits a segment when it assumes that the segment has been lost: • No ACK has been received and a timeout occurs • Duplicate ACKs have been received for the same segment
Retransmission Timer • The TCP sender maintains one retransmission timer per connection • When the timer reaches the retransmission timeout (RTO) value, the sender retransmits the first segment that has not been acknowledged • The timer is started when: • a packet with payload is transmitted and the timer is not running • an ACK arrives that acknowledges new data • a segment is retransmitted • The timer is stopped when: • all segments are acknowledged
How to set the timer • Retransmission Timer: • The setting of the retransmission timer is crucial for good TCP performance • A timeout value that is too small results in unnecessary retransmissions • A timeout value that is too large results in a long wait before a retransmission can be issued • A problem is that the delays in the network are not fixed • Therefore, the retransmission timer must be adaptive
Setting the value of RTO: • The RTO value is set based on round-trip time (RTT) measurements that each TCP sender performs • Each TCP connection measures the time difference between the transmission of a segment and the receipt of the corresponding ACK • There is only one measurement ongoing at any time (i.e., measurements do not overlap) • The figure on the right shows three RTT measurements
Setting the RTO value • RTO is calculated from the RTT measurements • An exponential moving average estimates the RTT (srtt) and the variance of the RTT (rttvar) from the RTT samples • The influence of past samples decreases exponentially • Each new measurement RTT updates the estimators srtt and rttvar:
$\mathrm{srtt}_{n+1} = a \cdot \mathrm{RTT} + (1 - a) \cdot \mathrm{srtt}_n$
$\mathrm{rttvar}_{n+1} = b \cdot |\mathrm{RTT} - \mathrm{srtt}_n| + (1 - b) \cdot \mathrm{rttvar}_n$
$\mathrm{RTO}_{n+1} = \mathrm{srtt}_{n+1} + 4 \cdot \mathrm{rttvar}_{n+1}$
• The gains are set to a = 1/8 and b = 1/4 (the values given in RFC 2988)
Setting the RTO value (cont’d) • Initial value for RTO: the sender should set $\mathrm{RTO}_0 = 3$ seconds • RTO calculation after the first RTT measurement has arrived:
$\mathrm{srtt}_1 = \mathrm{RTT}$
$\mathrm{rttvar}_1 = \mathrm{RTT} / 2$
$\mathrm{RTO}_1 = \mathrm{srtt}_1 + 4 \cdot \mathrm{rttvar}_1$
• When a timeout occurs, the RTO value is doubled, up to a maximum of 64 seconds:
$\mathrm{RTO}_{n+1} = \min(2 \cdot \mathrm{RTO}_n, 64)$ seconds
This is called exponential backoff.
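The RTO rules can be collected into one small estimator. This is a sketch, not a reference implementation: the class and method names are made up, the gains follow RFC 2988 (1/8 for srtt, 1/4 for rttvar), doubling is capped at 64 seconds, and the one-second lower bound on RTO that RFC 2988 also specifies is left out for brevity.

```python
class RtoEstimator:
    """Toy RTO estimator; all times are in seconds."""

    A = 1 / 8        # gain for srtt (RFC 2988)
    B = 1 / 4        # gain for rttvar (RFC 2988)
    MAX_RTO = 64.0   # upper bound for exponential backoff

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = 3.0               # initial RTO before any measurement

    def on_rtt_sample(self, rtt: float) -> float:
        if self.srtt is None:        # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # rttvar is updated first because it uses the old srtt
            self.rttvar = (1 - self.B) * self.rttvar + self.B * abs(rtt - self.srtt)
            self.srtt = (1 - self.A) * self.srtt + self.A * rtt
        self.rto = self.srtt + 4 * self.rttvar
        return self.rto

    def on_timeout(self) -> float:
        self.rto = min(2 * self.rto, self.MAX_RTO)   # exponential backoff
        return self.rto


est = RtoEstimator()
print(est.on_rtt_sample(1.0))   # 3.0
print(est.on_rtt_sample(2.0))   # 3.625
print(est.on_timeout())         # 7.25
```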
Karn’s Algorithm • If an ACK for a retransmitted segment is received, the sender cannot tell whether the ACK belongs to the original transmission or to the retransmission. The RTT measurement is ambiguous in this case. • Karn’s Algorithm: Don’t update the RTT estimate with measurements from segments that have been retransmitted
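In code, Karn's rule amounts to a guard in front of the RTT estimator. A minimal sketch, with a hypothetical function name and timestamps in seconds:

```python
from typing import Optional


def rtt_sample(send_time: float, ack_time: float,
               was_retransmitted: bool) -> Optional[float]:
    """Return an RTT sample, or None if the measurement is ambiguous."""
    if was_retransmitted:
        return None                  # Karn: discard, do not update the estimate
    return ack_time - send_time


print(rtt_sample(10.0, 10.25, was_retransmitted=False))   # 0.25
print(rtt_sample(10.0, 12.0, was_retransmitted=True))     # None
```

A caller would feed only non-None samples into its srtt/rttvar update.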
Summary • UDP: connectionless, unreliable datagram service • TCP: reliable, connection-oriented byte-stream service • TCP header • Connection management • Delayed ACKs and Nagle’s algorithm • TCP flow control • TCP congestion control • TCP retransmission and timeout • References • TCP/IP Illustrated, Vol. 1, chapters 11 and 17-24 • RFC 793 (Transmission Control Protocol) • RFC 768 (User Datagram Protocol) • RFC 2581 (TCP Congestion Control) • RFC 2988 (Computing TCP’s Retransmission Timer) • RFC 3390 (Increasing TCP’s Initial Window)
Lab 7 Problems • Exercise 3(B), Part 4-5 • TCP Path MTU Discovery • How does it work? • TCP uses the minimum of the MSS values announced by the two hosts during connection establishment • Every TCP segment has the DF (Don't Fragment) bit set in the IP header • If a router along the path has a smaller MTU than the size of the IP datagram carrying the TCP segment, it returns an “ICMP unreachable” error report containing its MTU, e.g. MTUr • TCP sets the MSS to (MTUr - 40), where 40 bytes account for the IP and TCP headers, and tries again. • The process continues until no “ICMP unreachable” error message is received.
TCP Path MTU Discovery Example
Hosts: PC1 (10.0.1.11/24), PC3/Router (10.0.1.33/24 and 10.0.2.33/24), PC2 (10.0.2.22/24)
Interface MTUs as labeled in the figure: 1500, 1500, 500, 1500
Receiver: ttcp -rs -l1024 -n2 -p4444
Sender: ttcp -ts -l1024 -n2 -p4444 -D 10.0.2.22
Packet sequence: SYN, MSS=1460 • SYN, ACK, MSS=1460 • ACK • PUSH, LEN=1460 • ICMP Unreachable, MTU=500 • PUSH, LEN=460 • ACK
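The discovery loop can be simulated in a few lines. This sketch is an illustration, not the kernel's algorithm: the function name is made up, the list of hop MTUs [1500, 500, 1500] is an assumed path whose smallest MTU is 500, and 40 bytes are assumed for the IP and TCP headers without options.

```python
def path_mtu_discovery(mss_a: int, mss_b: int, hop_mtus: list,
                       header_bytes: int = 40) -> tuple:
    """Return (final MSS, number of send attempts) for a given path."""
    mss = min(mss_a, mss_b)          # minimum of the two announced MSS values
    attempts = 0
    while True:
        attempts += 1
        # First hop whose MTU is too small for the datagram (DF bit is set)
        blocking = next((mtu for mtu in hop_mtus
                         if mss + header_bytes > mtu), None)
        if blocking is None:
            return mss, attempts     # no "ICMP unreachable" came back
        mss = blocking - header_bytes   # react to the reported MTU and retry


print(path_mtu_discovery(1460, 1460, [1500, 500, 1500]))   # (460, 2)
```

The result matches the trace in the example: the first 1460-byte segment triggers the ICMP error, and the second attempt carries 460 bytes.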
Exercise 6A: TCP Bulk Data Transfer (fast link) • PC1 (eth1, 10.0.5.11/24): ttcp -ts -l1000 -n500 -p4444 -D 10.0.5.22 • PC2 (eth1, 10.0.5.22/24): ttcp -rs -l1000 -n500 -p4444 • Step 4(5): Determine whether or not the TCP sender generally transmits the maximum amount of data allowed by the advertised window. Explain your answer.
Exercise 6B: TCP Bulk Data Transfer (slow link) • PC1 (eth0, 10.0.1.11/24) and PC2 (eth0, 10.0.2.22/24), connected through Router 1 and Router 2 • Sender: ttcp -ts -l1000 -n500 -p4444 -D 10.0.2.22 • Receiver: ttcp -rs -l1000 -n500 -p4444 • Step 4(4): Does the TCP sender generally transmit the maximum amount of data allowed by the advertised window? Explain your answer.
Troubleshooting • “I did everything by following the instructions, but it does not work and I can’t ping other hosts/routers. Why?” • “Well, something must be wrong. Let’s figure it out.”
Troubleshooting Techniques • The key to troubleshooting is to isolate the problem, i.e., to find out what the problem is and where in the network it occurs. • What is the output of the ping command? • “Network unreachable”: check the local routing table • “PING x.x.x.x (x.x.x.x) 56(84) bytes of data”, and it hangs there: run “traceroute x.x.x.x” to see where the ICMP Echo Request packet failed.