TCP Variations:Tahoe, Reno, New Reno, Vegas, Sack Paul D. Amer, Professor Computer & Information Sciences University of Delaware
What are TCP Variations? • Implementations of TCP that use different algorithms to achieve end-to-end congestion control. • Tahoe • Reno • NewReno • Vegas
Evolution of TCP 1984 Nagel’s algorithm to reduce overhead of small packets; predicts congestion collapse 1990 4.3BSD Reno fast recovery delayed ACK’s 1987 Karn’s algorithm to better estimate round-trip time 1975 Three-way handshake Ray Tomlinson In SIGCOMM 75 1988 Van Jacobson’s algorithms slow start, congestion avoidance, fast retransmit (all implemented in 4.3BSD Tahoe) SIGCOMM 88 1983 BSD Unix 4.2 supports TCP/IP 1986 Congestion collapse 1st observed 1974 TCP described by Vint Cerf, Bob Kahn In IEEE Trans Comm 1981 TCP & IP RFC 793 & 791 1990 1975 1980 1985
Evolution of TCP 1996 NewReno modified fast recovery SACK TCP Selective Ack (Floyd et al) 1994 T/TCP Transaction TCP (Braden) 1994 ECN Explicit Congestion Notification (Floyd) 1993 TCP Vegas(not implemented) real congestion avoidance (Brakmo et al) 1996 FACK TCP Forward Ack extension to SACK (Mathis et al) 1996 Improving TCP startup (Hoe) 1994 1993 1996
How did TCP cause Congestion? (original recipe TCP) • Poor Efficiency • In telnet-like applications, TCP sends 1 byte of data with 4000% overhead. • Sending too much, too soon • Unnecessary retransmits • Send window too big • Very little change in behavior due to congestion
Pr Pb Sender Receiver Ab As Ar Self-clocking or ACK Clock • Self-clocking systems tend to be very stable under a wide range of bandwidths and delays. • The principal issue with self-clocking systems is getting them started.
TCP Algorithms: • Four intertwined algorithms used commonly in TCP implementations • Slow Start - Every ack increases the sender’s window (cwnd) size by 1 • Congestion Avoidance - Reducing sender’s window size by half at experience of loss, and increase the sender’s window at the rate of about one packet per RTT (NOTE: not per ack) • Fast Retransmit - Don’t wait for retransmit timer to go off, retransmit packet if 3 duplicate acks received • Fast Recovery - Since duplicate ack came through, one packet has left the wire. Perform congestion avoidance, don’t jump down to slow start
Packet Ack TCP Sender TCP Receiver pkt 0 ack 0 pkts 1,2 acks 1,2 pkts3,4,5,6 Slow Start t=0r cwnd=1 t=1r cwnd=2 new cwnd = old cwnd + # acks recd t=2r cwnd=4 t=3r cwnd=8
Packet Ack TCP Sender TCP Receiver Congestion Avoidance After cwnd reaches a certain threshold t=xr cwnd=4 t=(x+1)r cwnd=5 new cwnd = old cwnd + # acks/cwnd t=(x+2)r cwnd=6 t=(x+3)r cwnd=7
R S cwnd = 1 Example: Slow Start/Congestion Avoidance cwnd = 2 assume ssthresh = 8*MSS cwnd = 4 Eight TCP-PDUs cnwd = 8 ssthresh Eight ACKs nine TCP-PDUs cwnd = 9 nineACKs ten TCP-PDUs cwnd = 10 ten ACKs cwnd = 11
Slow Start & Congestion Avoidance • Initally: • - cwnd = 1*MSS • - ssthresh = very high • If a new ACK comes: • - if cwnd < ssthresh update cwnd according to slow start • if cwnd > ssthresh update cwnd according to congestion avoidance • If cwnd = ssthresh either • If timeout (i.e. loss) : • - ssthresh = flight size/2; • - cwnd = 1*MSS (initial) ssthresh cwnd Loss, e.g. timeout ssthresh time slow start – in green congestion avoidance – in blue
Packet Ack TCP Sender TCP Receiver pkt 15 pkt 16 pkt 17 pkt 18 pkt 19 pkt 20 ack 15 ack 16 ack 16 3 dup acks ack 16 ack 16 pkt 17 (retransmit) Fast Retransmit At a random point in the transfer t=xr t=(x+1)r Fast Retransmit
segment 1 cwnd = 1 TCP Tahoe’s Fast Retransmit S R • Sender receives 3 dupACKS. • Sender infers that the segment is lost. • Sender re-sends the segment immediately! • Sender returns to slow-start. ACK 1 cwnd = 2 segment 2 segment 3 ACK 2 ACK 3 cwnd = 4 segment 4 segment 5 segment 6 segment 7 ACK 3 3 duplicate ACKs ACK 3 ACK 3 segment 4 fast-retransmit of segment 4
TCP Variants : • TCP-Tahoe: • implements the slow start, congestion avoidance, and fast retransmit algorithms • TCP-Reno: • implements the slow start, congestion avoidance, fast retransmit, and fast recovery algorithms • Among other implementations are Vegas,NewReno (the most commonly implemented on webservers today, according to a survey) and SACK TCP.
TCP Tahoe Trace(with one dropped segment) Lost segment Fast Retransmit Begin congestion avoidance Begin slow-start
Could Tahoe be Improved? • Receipt of dupACKs tells the sender that the receiver is still getting new segments, i.e. there is still data flowing between sender and receiver • Why does sender go back to slow start after fast retransmit? • Why does sender let Ack clock die?
TCP Variation: TCP Reno • 2nd Improvement was TCP Reno (1990) • Nagle’s algorithm • Improved RTO calculation and back-off • AIMD congestion avoidance with slow-start • Fast retransmit & fast recovery
Fast Recovery Concept: • After fast retransmit, reduce cwnd by half, and continue sending segments at this reduced level. Problems: • Sender has too many outstanding segments. • How does sender transmit packets on a dupACK? Need to use a “trick” - inflate cwnd. cwnd (initial) ssthresh fast-retransmit fast-retransmit timeout new ACK new ACK Time Slow Start Congestion Avoidance “inflating” cwnd with dupACKs “deflating” cwnd with a new ACK
Fast Retransmit & Fast Recovery • After receiving 3 dupACKS: • Retransmit the lost segment. • Set ssthresh = flight size/2. • Set cwnd = ssthresh, and ndupacks = 3. N.B. In Reno: send_win = min ( rwnd, cwnd + ndupacks ). • If dupACK arrives: • ++ ndupacks • Transmit new segment, if allowed. • If new ACK arrives: • ndupacks = 0 • Exit fast recovery. • If RTO expires: • ndupacks = 0 • Perform slow-start - ( ssthresh = flight size/2, cwnd = 1 )
What if There are MultipleLosses in a Window? • With two losses in a window, Reno will occasionally timeout. • With three losses in a window, Reno will usually timeout. • With four losses in a window, Reno is guaranteed to timeout! • With three or more losses in a window, Tahoe typically out performs Reno!
TCP Variation:TCP NewReno • 3rd Improvement was TCP NewReno (1995) • Nagle’s algorithm • Improved RTO calculation and back-off • AIMD congestion avoidance with slow-start • Fast retransmit & modified fast recovery
Modifications to Fast Recovery • Partial ACKs: An ACK that acknowledges some but not all the segments that were outstanding at the start of fast recovery. NewReno interprets this as an indication of multiple loss. • If partial ACK received, re-transmit the next lost segment immediately and set ndupacks = 0 (deflate send_win). • Sender remains in fast recovery until all data outstanding when fast recovery was initiated is ACK’ed. Additional dupACK’s increase ndupacks.
Is There a Better Way? • The only way Tahoe, Reno and NewReno can detect congestion is by creating congestion! • They carefully probe for congestion by slowly increasing their sending rate. • When they find (create), congestion, they cut sending rate at least in half! • This slow advance and rapid retreat approach results in a saw-toothed sending rate, and highly erratic throughput. • What if TCP could detect congestion without causing congestion?
TCP Variation: TVP Vegas(True Congestion Avoidance) • Introduced by Brakmo and Peterson (1994) • Three changes to TCP Reno • Modified congestion avoidance • Don’t wait for a timeout, if actual throughput < expected throughput decrease the congestion window. (AIAD!) • New retransmission mechanism • motivation:what if sender never receives 3-dupACKs (due to lost segments or window size is too small.) • mechanism: sender does retransmission after a dupACK received, if RTT estimate > timeout. • Modified slow start • motivation: sender tries finding correct window size without causing a loss.
Vegas vs. NewReno TCP NewReno throughput with simulated background traffic TCP Vegas throughput with simulated background traffic Source: Brakmo and Peterson, TCP Vegas: End to End Congestion Avoidance on a Global Internet, IEEE JSAC, Vol 13, No. 8, Oct. 1995, pp. 1465 – 1480
What Variations are being used? • Experimental results obtained by testing 3728 web servers: • NewReno 42% • Tahoe 27% (w/o Fast Retransmit) • Reno 18% • Other 8% • Tahoe 5% Source: Padhye and Floyd, SIGCOMM ‘01, August 27-31, San Diego, CA
When entering slow start, if connection is new, ssthresh = arbitrarily large value cwnd = 1. else, ssthresh = max(flight size/2, 2*MSS) cwnd = 1. In slow start ++cwnd on new ACK When entering either fast recovery or modified fast recovery, ssthresh = max(flight size/2, 2*MSS) cwnd = ssthresh. In congestion avoidance cwnd += 1*MSS per RTT Summary of TCP Behavior
TCP without SACK • TCP uses cumulative ACKs • Receiver identifies the last byte of data successfully received • Out of order segments are not ACKed • Receiver sends duplicate ACKs • TCP without SACK forces the TCP sender • Either to wait an RTT to find out a segment was lost • Or, unnecessarily retransmit data that has been correctly received • Can result in reduced overall throughput
TCP with Selective Ack (SACK) • SACK + Selective Repeat Retransmission Policy allows • receiver informs sender about all segments that are successfully received. • sender fast retransmits only the missing data segments • SACK is implemented using two TCP Options • SACK-Permitted Option • SACK Option
1 - 100 1-100 101-200 101 - 200 ACK 201 201-300 301-400 501 - 600 401 - 500 ACK 201 SACK 401-601 1-100 101-200 401-500 501-600 SACK Example receiver’s buffer sender receiver
SACK Rules • With SACKs, the ACK field is still a cum ACK • A SACK cannot be sent unless the SACK-Permitted option has been received (in the SYN) • The 1st SACK block MUST specify the contiguous block of data containing the segment which triggered this acknowledgment • If SACKs are sent, SACK option should be included in all ACK’s which do not ACK the highest sequence number in the data receiver’s queue
Generating SACKs – data receiver behavior • If the data receiver has not received a SACK-Permitted Option for a given connection, the receiver must not send SACK options on that connection • The receiver should send an ACK for every valid segment that arrives containing new data • The data receiver should include as many distinct SACK blocks as possible in the SACK option • SACK option should be filled out by repeating the most recently reported SACK blocks • The data receiver provides the sender with the most up-to-date info about the state of the network and the receiver’s queue
Interpreting SACKs - Data Sender behavior • The sender records the SACK for future reference • Maintains a retransmission queue containing unacknowledged segments • One possible implementation • Turns on SACK bit for the segment in retransmission queue when it receives a SACK • Skips SACKed data during any later fast retransmission • On fast retransmit, retransmits data not SACKed so far and less than the highest SACKed data • Turns off SACK bit after retransmission time out
100-299 1099 699 ACK300 100 299 300 500 500 699 300-499 500-699 900-1099 ACK300, SACK 500-700 700-899 1100-1299 300 900 ACK300, SACK 900-1100, 500-700 Another SACK Example Receiver Buffer sender receiver
1099 1099 1099 ACK1100 300 900 500 500 500 300-499 700-899 900 300 700 699 699 900 300 ACK700, SACK 900-1100 1100-1299 1100 Another SACK Example (cont’d) sender receiver
200-599 300-399 100-199 100-199 200-299 500-599 300-399 400-499 sender receiver ACK600 ACK600 ACK200 ACK200 ACK200 ACK200 ACK200 sender receiver 400-499 ACK200, SACK 300-400 500-599 ACK200, SACK 300-500 200-299 200-299 ACK200, SACK 300-600 fast retransmit fast retransmit Without SACK vs. With SACK TCP with SACK TCP without SACK
SACK Observations • SACK TCP follows standard TCP congestion control; Adding SACK to TCP does not change the basic underlying congestion control algorithms • SACK TCP has major advantages when compared TCP Tahoe, Reno, Vegas and New Reno, as PDUs have been provided with additional information due to the SACK • Difference in behavior when multiple packets are dropped from one window of data • SACK information allows the sender to better decide what to retransmit and what not to
Data Receiver Reneging Reneging – fail to fulfill a promise or obligation • Data receiver is permitted to discard data in its queue that has not been acknowledged to the data sender, even if the data has already been SACKed • Such discarding of SACKed segments is discouraged, but may occur if the receiver must give buffer space back to the OS • If reneging occurs • first SACK should reflect the newest segment even if its going to be discarded • Except for the newest segment, all SACK blocks MUST NOT report any old data which is no longer actually held by the receiver
100 199 100-199 200 200 200 399 200-299 ACK200 300-399 reneg occurs; window decreases 200 400-499 599 ACK 200; SACK 500-600 ACK 200; SACK 300-400 window increases 500-599 300 500 Reneging Example sender receiver
Consequences of Reneging • Sender must maintain normal TCP timeouts • Data cannot be considered “communicated” until a cum ACK is sent • Sender must retransmit the data at the left window edge after a retransmit timeout, even if that data has been SACKed by the receiver • Sender MUST NOT discard data before being acked by the Cum Ack
TCP header length 1 6 options kind=4 length=2 kind=1 kind=1 SYN bit SACK-permitted NOP NOP SACK-Permitted Option • Sack–Permitted option • is allowed only in a SYN Segment. • indicates sender handles SACKs, and receiver should send SACKs if possible. • SACK option can be used once connection is established TCP Header Source port address Destination port address Sequence Number Cumulative Ack No. Window size Checksum Urgent pointer
Source port address Destination port address Sequence Number Cumulative Ack No. Window size Checksum Urgent pointer kind=1 kind=1 Source port address Destination port address Sequence Number Source port address Destination port address Cumulative Ack No. Sequence Number SYN “SACK-permitted” Window size Cumulative Ack No. TCP connection establishment phase Checksum Urgent pointer Window size 1 SYN/ACK options kind=1 kind=1 Checksum Urgent pointer “SACK-permitted” options ACK kind=1 kind=1 NOP NOP NOP NOP SYN bit 1 1 1 data transfer phase cum ack and optional SACKs kind=4 kind=4 length=2 length=2 ACK bit SYN bit ACK bit SACK-permitted SACK-permitted SACK-Permitted Option and SACK RECEIVER SENDER
SACK Option • Length of SACK with n blocks? Source port address Destination port address = (2 + 8 * n) bytes Sequence Number Cumulative Ack No. • Max number bytes available for TCP Options? HLEN Window size Checksum Urgent pointer = 40 bytes Kind=1 Kind=5 Length=?? Kind=1 Left edge of 1st block • Max number of SACK blocks possible? Right edge of 1st block = 4 SACK blocks (barring no other TCP Options) Left edge of nth block Right edge of nth block