Evolution of TCP: Tutorial Part III - Intelligent Network Laboratory @ KUT

TCP Tutorial - Part III- Laboratory of Intelligent Network @ KUT (http://link.kut.ac.kr) Youn-Hee Han It is licensed under a Creative Commons Attribution 2.5 License

Evolution of TCP KUT

Evolution of TCP 1984 Nagel’s algorithm to reduce overhead of small packets; predicts congestion collapse 1990 4.3BSD Reno fast recovery delayed ACK’s 1987 Karn’s algorithm to better estimate round-trip time 1975 Three-way handshake Ray Tomlinson In SIGCOMM 75 1988 Van Jacobson’s algorithms slow start, congestion avoidance, fast retransmit (all implemented in 4.3BSD Tahoe) SIGCOMM 88 1983 BSD Unix 4.2 supports TCP/IP 1986 Congestion collapse 1st observed 1974 TCP described by Vint Cerf, Bob Kahn In IEEE Trans Comm 1981 TCP & IP RFC 793 & 791 1990 1975 1980 1985 The slides are provided by Bradley Durandetta, Dept. of CIS, Univ. of Delaware. KUT

Evolution of TCP 1996 NewReno modified fast recovery SACK TCP Selective Ack (Floyd et al) 1994 T/TCP Transaction TCP (Braden) 1994 ECN Explicit Congestion Notification (Floyd) 1993 TCP Vegas(not implemented) real congestion avoidance (Brakmo et al) 1996 FACK TCP Forward Ack extension to SACK (Mathis et al) 1996 Improving TCP startup (Hoe) 1994 1993 1996 The slides are provided by Bradley Durandetta, Dept. of CIS, Univ. of Delaware. KUT

TCP Tahoe • TCP Tahoe • RFC 2001, Jan. 1997 • Unconditionally (either type of loss event) cuts its cwin to 1 MSS and enter slow start phase • Slow Start, Congestion Avoidance, Fast Retransmit • Slow Start (Exponential growth of cwnd) • Each RTT : cwnd  2 * cwnd • Congestion Avoidance (Linear growth of cwnd) • Each RTT : cwnd  cwnd + 1 • Fast Retransmit • Retransmits after 3 dupACKs without waiting for timeout window SS : slow start CA : congestion avoidance time SS CA SS CA KUT

TCP Tahoe Trace(with one dropped segment) Lost segment Detect the packet loss Begin congestion avoidance Begin slow-start KUT

TCP Reno • TCP Reno • developed by Van Jacobson in 1990 • standardized as RFC 2001, Jan. 1997 Van Jacobson in January 2006 Van Jacobson joined PARC as a research fellow in August 2006. He previously served as chief scientist at Packet Design LLC[1].Prior to that, he was chief scientist at Cisco Systems and group leader for the Network Research Group at Lawrence Berkeley Laboratory. KUT

TCP Reno • TCP Reno • Slow start, Congestion Avoidance, Fast Retransmit + Fast Recovery • Basic ideas • Fast recovery avoid slow start • 3 dupACKs  fast retransmit + fast recovery • Timeout  slow start KUT

TCP Reno • Fast Retransmit • when the sender receives 3 “duplicative ACKs”, i.e. 4 identical ACKs without the arrival of any other intervening packets, • Then, it assumes that the segment was lost, • Sender retransmits the segment and moves to Fast Recovery phase (instead of moving to Slow Start phase) • Fast Recovery • the sender decreases cwnd the half of its original size, adds 3 (3 packets have left the network and buffered by the receiver) • It continues to send new segments (if allowed by the cwnd value) until receiving new different ACK, which should acknowledge receiving all segments sent till moving to Fast Recovery phase (assuming that no more segments were lost). • For each additional duplicated ACK received, increment cwnd by 1 KUT

TCP Reno 0 1 Ack(1) • TCP Reno Example 2 3 Ack(1) Flight Size =No. of Unacknowledged segments 4 Ack(1) 5 Ack(1) 6 Ack(1) Ack(1) cwnd=8 7 8 Ack(1) Ack(1) • Fast Retransmit • Fast Recovery • - cwnd=8/2+3=7 • - ssthresh=8/2=4 1 cwnd=8 Ack(9) cwnd=9 9 Ack(10) cwnd=10 10 cwnd=11 11 • Exit Fast Recovery • Congestion Avoidance • - cwnd=ssthresh=4 12 KUT

TCP Reno Trace(with one dropped segment) 500000 48 Sent Segment 44 ACK'ed Segment cwnd 480000 40 ssthresh cwnd in Fast Recovery 36 460000 32 Detect the packet loss by 3 DupACKs 28 MSS 440000 24 Sequence # 20 420000 16 12 400000 8 4 380000 0 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 KUT Time (s)

TCP Tahoe & Reno Trace(with one dropped segment) KUT

TCP Newreno • TCP Newreno • RFC 2582, April 1999 • It enables to the algorithm to manage a loss of more than one packetwithout changing the TCP message structure • Another improvement (TCP SACK – RFC 2018) enables to cope with a loss of more than one packet by changing message structure (using TCP options) KUT

TCP Newreno • Limitation of TCP Reno • If cwnd size is too small (smaller than 4 packets) then it’s not possible to get 3 duplicate acks and run the algorithm • TCP Renocan not managea loss of multiple packets from a single window of data • It will cause a use of retransmission time out • TCP Reno doesn’t manage a loss of packets during the Fast Recovery stage • What if There are Multiple Losses in a Window? • With two losses in a window, Reno will occasionally timeout. • With three losses in a window, Reno will usually timeout. • With four losses in a window, Reno is guaranteed to timeout! • With three or more losses in a window, Tahoe typically out performs Reno! KUT

TCP Newreno 0 • Limitation of TCP Reno 1 Ack(1) 2 Flight Size =No. of Unacknowledged segments 3 Ack(1) 4 5 Ack(1) 6 Ack(1) cwnd=8 7 8 Ack(1) Ack(1) • Fast Retransmit • Fast Recovery • - cwnd=8/2+3=7 • - ssthresh=8/2=4 1 Ack(3) cwnd=8 cwnd=9 9 Ack(3) • Exit Fast Recovery • Congestion Avoidance • - cwnd=ssthresh=4 Flight Size(=7) > cwnd => No new segments KUT Segment #3 will be time-out

TCP Newreno TCP Reno Trace(with two dropped segments) 500000 48 Sent Segment 44 ACK'ed Segment cwnd 480000 40 ssthresh cwnd in Fast Recovery 36 460000 32 28 MSS 440000 24 Sequence # 20 420000 16 12 400000 8 4 380000 0 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 KUT Time (s)

TCP Newreno • Idea: • The sender remembers a number of the last segment that was sent before entering the Fast Retransmit phase • The sender deals with a situation when a “new” ACK (which is not DupAcks) does not cover the last remembered segment (“partial ACK”) • The sender consider the partial ACK as a situation when more packets were lost before entering the Fast Retransmit. • After discovering such situation the sender will retransmit the new lost packet too and will stay at the Fast Recovery stage • The sender will finish the Fast Recovery stage when it will get ACK that covers last segment sent before the Fast Retransmit KUT

TCP Newreno • Modifications to Fast Recovery • Partial ACKs • An ACK that acknowledges some but not all the segments that were sent before the start of fast recovery. • NewReno interprets “Partial ACK” as an indication of multiple loss. • If partial ACK received, re-transmit the next lost segment immediately and deflate cwnd as follows: • cwnd = cwnd – the amount of new acknowledged data + 1 • Sender remains in fast recovery until all data outstanding sent before fast recovery are ACK’ed. • In this fast recovery phase, an additional dupACK’s increase cwnd by 1*MSS KUT

TCP NewReno Trace(with two dropped segments) 500000 48 Sent Segment 44 ACK'ed Segment cwnd 480000 40 ssthresh cwnd in fast recovery 36 460000 32 28 MSS 440000 24 Sequence # 20 420000 16 7= 16 – 10 + 1 12 400000 8 4 380000 0 4.9 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 KUT Time(s)

Tahoe, Reno & NewReno Trace(with two dropped segments) KUT

Is There a Better Way? • The only way Tahoe, Reno and NewReno can detect congestion is by creating congestion! • They carefully probe for congestion by slowly increasing their sending rate. • When they find (create) congestion, they cut sending rate at least in half! (basically AIMD) • This slow advance and rapid retreat approach results in a saw-toothed sending rate and highly erratic throughput. • What if TCP could detect congestion without causing congestion? KUT

Is There a Better Way? Congestion window (cwnd) Avg. source send rate Buffer space at bottleneck router In shaded region the cogestion widow increases. We expect throughput to increase but it cannot increase beyond available bandwidth. Any increase in the window size only results in packets taking up buffer space at bottleneck router KUT

TCP Vegas • Introduced by Brakmo and Peterson (1994) • L. Brakmo, S. O'Malley, and L. Peterson. TCP Vegas: New techniques for congestion detection and avoidance. In Proceedings of the SIGCOMM '94 Symposium (Aug. 1994) pages 24-35. • L. Brakmo and L. Peterson. TCP Vegas: End to End Congestion Avoidance on a Global Internet. IEEE Journal on Selected Areas in Communication, Vol 13, No. 8 (October 1995) pages 1465-1480. • TCP Vegas vs. TCP (New)reno • TCP Vegas provides a proactive response to congestion • TCP (New)reno is reactive with respect to congestion Lawrence S. Brakmo @ The University of Arizona KUT

cwnd cwnd Duplicate ACK  CA Timeout  SS cwndcwnd/2 + 3 cwnd 1 SS CA time time SS CA SS CA FR TCP Vegas TCP Tahoe vs. TCP (New)reno vs. TCP Vegas cwnd Duplicate ACK  CA Timeout  SS TCP Tahoe cwnd1 cwnd 1 time SS SS CA SS CA CA TCP (New)reno TCP Vegas KUT

TCP Vegas • Modified congestion avoidance • source watches for some sign that router’s queue is building up and congestion will happen too • Sets congestion window size based on the difference between the expected and actual data rates • Define… • cwnd: current congestion window size • rtt*: the minimum of all measured round-trip time • It is commonly the RTT of the first segment sent by the connection • Expected Throughput = cwnd/rtt* • rtt: actual (with congestion) round-trip time • Actual Throughput = cwnd/rtt • Diff = Expected Throughput – Actual Throughput • Note that Diff is always positive (Diff>0) • Diff < 0 means that we need to change rtt* KUT

TCP Vegas • Modified congestion avoidance • Estimated flight size  •  = (cwnd/rtt* – cwnd/rtt)  rtt* •  : low threshold for  (want  >  ) •  : high threshold for  (want  <  ) • TCP Vegas attempts to control cwnd as follows: • If  < , increase cwnd by 1 * MSS • If  > , decrease cwnd by 1 * MSS • Otherwise ( ), cwnd is not changed. • The overall goal is to keep between  and  extra bytes in the network. KUT

TCP Vegas • Example • Assumption • A connection with rtt* of 100ms • MSS = 1KBytes • cwnd = 3KBytes • Expected Throughput = 3KBytes/100ms = 30KBytes/s • Actual Sampling • rtt=300ms • Actual Throughput = 3KBytes/300ms = 10KBytes/s • Estimated flight size  = (30 – 10) KBytes/s * 0.1s = 2 KBytes  2 segments • Cwnd Control Example • If =3KBytes (3 segments) and =5KBytes (5 segments), <  • Result: cwnd = cwnd + 1 * MSS = 3 KBytes + 1 KBytes = 4 Kbytes • If =1KBytes (1 segments) and =3KBytes (3 segments),     • Result: No change • If =1KBytes (1 segments) and =1.5KBytes (1.5 segments), <  • Result: cwnd = cwnd - 1 * MSS = 3 KBytes - 1 KBytes = 2 Kbytes KUT

TCP Vegas • Alpha & Beta • Initially, sets  to one and  to three. • =1 means… • at least one segment will be left on the buffer of the bottleneck router. • Then, when the aggregate traffic from the other connections decrease, our connection can take advantage of the extra available bandwidth immediately without having to wait for the one RTT delay necessary for the linear increase to occur. • - =2 means… • Small occasional changes in the available bandwidth will not create oscillations in the window size. • The - region provides a damping effects KUT

congestion window Throughput TCP Vegas Consideration =1 and =3 Black line = actual rate Green line = expected rate Shaded = region between a and b The goal is to keep the ActualRate between these two thresholds, that is, within the shaded region. Whenever ActualRate falls below shaded region (i.e., gets too far from Expected Rate), TCP Vegas decreases the congestion window because it fears that too many packets are being buffered in the network. Whenever ActualRate goes above the shaded region (i.e., gets too close to the Expected Rate), TCP Vegas increases the congestion window because it fears that it is underutilizing the network. KUT 33

TCP Vegas • TCP Vegas generally outperforms TCP Reno in a homogeneous environment • TCP Vegas achieves between 40% and 70% better throughput • TCP Vegas has 20% to 50% of the losses compared to the TCP Reno TCP Reno TCP Vegas KUT

TCP Selective Ack • TCP SACK • RFC 2018, Oct. 1996 • Selective Acknowledgment (SACK) allows… • the receiver to inform the sender about all segments that have been successfully received. • the sender to retransmit only those segments that have been lost. sender receiver 5000-5499 ACK 5500 5500-5999 6000-6499 6500-6999 ACK 5500; SACK=6000-6500 ACK 5500; SACK=6000-7000 KUT

TCP Selective Ack • TCP SACK-related Options • SACK-Permitted Option: from sender to receiver • SACK Option: from receiver to sender • TCP SACK Rules • A SACK cannot be sent unless the SACK-permitted option has been received (in the SYN). • The first SACK block must contain the most recently received segment that is to be SACKed. • The second block must contain the second most recently received segment that is to be SACKed, and so forth. • Notice this can result in some data in the receiver’s buffers which should be SACKed but is not (if there are more segments to SACK than available space in the TCP header). KUT

TCP Selective Ack • SACK Example (assuming Tahoe & long timeout) sender 5000-5499 receiver 5500-5999 ACK 5500 6000-6499 6500-6999 ACK 5500; SACK=6000-6500 7000-7499 7500-7999 ACK 5500; SACK=7000-7500, 6000-6500 8000-8499 8500-8999 ACK 5500; SACK=8000-8500, 7000-7500, 6000-6500 Assuming 3 maximum blocks 9000-9499 ACK 5500; SACK=9000-9500, 8000-8500, 7000-7500 KUT

TCP Selective Ack • What should the Sender do? • The sender must keep a buffer of unacknowledged data. • When it receives a SACK option, it should turn on a SACK-flag bit for all segments in the transmit buffer that are wholly contained within one of the SACK blocks. • After this SACK flag bit has been turned on, the sender should skip that segment during any later retransmission. • Rough Example • Assume packets 5-25 are transmitted • Let packets 5, 12, and 18 be lost • Receiver sends back a CACK=5, and SACK=(6-11,13-17,19-25) • Sender knows that packets 5, 12, and 18 are lost and retransmits them immediately KUT

TCP Selective Ack • SACK TCP at the Sender Example sender receiver 5000-5499 5500-5999 6000-6499 6500-6999 7000-7499 ACK 5500; SACK=6000-6500 ACK 5500; SACK=6000-7500 ACK 5500; SACK=6000-7000 SENDERTIMEOUT 5500-5999 7000-7499 KUT

TCP Selective Ack • SACK TCP Observation • SACK TCP follows standard TCP congestion control; it should not damage the network. • This information allows the sender to better decide what it needs to retransmit and what it does not. • This can only serve to help the sender, and should not adversely affect other TCPs (Tahoe, Reno, NewReno, Vegas). • While it is still possible for a SACK TCP to needlessly retransmit segments, the number of these retransmissions has been shown to be quite low in simulations, relative to Reno and Tahoe TCP. KUT

TCP Selective Ack • SACK TCP Implementation • Windows 2000 • Windows 98 / Windows ME • Solaris 7 and later • Linux kernel 2.1.90 and later • FreeBSD and NetBSD have optional modules • ACIRI has measured the behavior of 2278 random web servers that claim to be SACK-enabled. • Out of these, 2133 (93.6%) appeared to ignore SACK data and only 145 (6.4%) appeared to actually use the SACK data. • why is SACK deployment so much slower than RenoandNew-Reno? KUT

Evolution of TCP: Tutorial Part III - Intelligent Network Laboratory @ KUT

Evolution of TCP: Tutorial Part III - Intelligent Network Laboratory @ KUT

Presentation Transcript

TCP - Part II

TCP - Part II

TCP (Part 1)

TCP - Part III

MacSim Tutorial Part-III

TCP - Part I

TCP - Part I

TCP Tutorial - Part IV -

TCP Tutorial - Part I -

TCP (Part II)

APA Style Tutorial- Part III: Referencing

TCP - Part II

TCP Tutorial - Part II -

EZNEC Tutorial – Part III

Tutorial: TCP 101

TCP Tutorial - Part I -

TCP - Part II

TCP - Part I

TCP - Part I

TCP - Part II

TCP - Part II