TCP Part III: Error Control, Congestion Control, Timers
Background on ARQ Error Control 1 • Two types of errors: • Lost packets • Damaged packets • Error control schemes that involve error detection and retransmission of lost or corrupted frames are referred to as Automatic Repeat reQuest (ARQ) error control • Most Error Control techniques are based on: 1. Error Detection Scheme (Parity checks, CRC). 2. Retransmission Scheme.
Background on ARQ Error Control 2 • All retransmission schemes use all or a subset of the following procedures: • Positive acknowledgments (ACK) • Negative acknowledgment (NACK) • Selective acknowledgment (SACK) • All retransmission schemes (using ACK, NACK, SACK or all) rely on the use of timers • The most common ARQ retransmission schemes are: Stop-and-Wait ARQ Go-Back-N ARQ Selective Repeat ARQ
Error Control in TCP • TCP maintains multiple timers for each connection • TCP couples error control and congestion control (i.e., it assumes that errors are caused by congestion)
TCP Timers • TCP maintains multiple timers: • Retransmission Timer: • The timer is started during a transmission. A timeout causes a retransmission • Persist Timer • Ensures that window size information is transmitted even if no data is transmitted • Keepalive Timer • Detects crashes on the other end of the connection • Other timers • Delayed ACK timer, timeout of connection setup, abort timeout (total timeout - keeps retransmitting till this timeout, then it kills the connection), 2MSL timeout (when closing connection)
TCP Retransmission Timer • Retransmission Timer: • The setting of the retransmission timer is crucial for efficiency • Timeout value too small -> results in unnecessary retransmissions • Timeout value too large -> long waiting time before a retransmission can be issued • A problem is that the delays in the network are not fixed • Therefore, the retransmission timers must be adaptive
Measuring TCP Retransmission Timers Transfer file from aida to rigoletto Unplug Ethernet cable in the middle of file transfer
tcpdump Trace
10:42:01.704681 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:01.705603 aida.40001 > rigoletto.ftp-data: . 162649:164109(1460) ack 1 win 17520
10:42:01.706753 aida.40001 > rigoletto.ftp-data: . 164109:165569(1460) ack 1 win 17520
10:42:02.741764 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:05.741788 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:11.741828 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:23.741951 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:42:47.742176 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:43:35.742587 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:44:39.743140 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:45:43.743702 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:46:47.744271 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:47:51.752138 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:48:55.745547 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:49:59.746123 aida.40001 > rigoletto.ftp-data: . 161189:162649(1460) ack 1 win 17520
10:51:03.745839 aida.40001 > rigoletto.ftp-data: R 165569:165569(0) ack 1 win 17520
Interpreting the Measurements • The interval between retransmission attempts in seconds is: 1.03, 3, 6, 12, 24, 48, 64, 64, 64, 64, 64, 64, 64. • Time between retransmissions is doubled each time (Exponential Backoff Algorithm) • Timer is not increased beyond 64 seconds • TCP gives up after 13th attempt and 9 minutes (total timeout, tcp_ip_abort_interval is 2 mins in Solaris and can be programmed by administrator - 9 mins is the commonly used old timeout value)
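The intervals listed on this slide can be recomputed from the trace. The following is a minimal sketch, assuming the timestamps of the first transmission of segment 161189, its retransmissions, and the final RST are copied by hand from the tcpdump output; the names STAMPS and intervals are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the tcpdump trace: first transmission of
# segment 161189, its twelve retransmissions, and the final RST.
STAMPS = [
    "10:42:01.704681", "10:42:02.741764", "10:42:05.741788",
    "10:42:11.741828", "10:42:23.741951", "10:42:47.742176",
    "10:43:35.742587", "10:44:39.743140", "10:45:43.743702",
    "10:46:47.744271", "10:47:51.752138", "10:48:55.745547",
    "10:49:59.746123", "10:51:03.745839",
]


def intervals(stamps):
    """Return the gaps in seconds between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


gaps = intervals(STAMPS)
rounded = [round(g) for g in gaps]
print(rounded)              # doubling up to the 64-second ceiling
print(sum(gaps) / 60.0)     # total time before the connection is reset, in minutes
```

The rounded gaps double from 3 seconds up to the 64-second ceiling, and the total comes to roughly 9 minutes, matching the slide.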
TCP timers • The first timeout depends on when the timer was initialized. • This explains why the first timeout occurs at 1.03 sec and not 1.5. • If the base timer clock is 500 ms, the first timeout occurs after 3 timer ticks. This happens to occur at 1.03 sec after the first segment was sent. Subsequent retransmissions occur at 3 sec, 6 sec, 12 sec, etc.
Adaptive mechanism • The retransmission mechanism of TCP is adaptive • The retransmission timers are set based on round-trip time (RTT) measurements that TCP performs • The RTT is based on time difference between segment transmission and ACK • But: • TCP does not ACK each segment • Can’t start a second RTT measurement if timing on one segment is in progress • Each connection has only one timer
Computation of RTO in adaptive scheme • Retransmission timer is set to a Retransmission Timeout (RTO) value. • RTO is calculated based on the RTT measurements. • The RTT measurements are smoothed by the following estimators A (smoothed mean RTT) and D (smoothed mean deviation of RTT), where M is the newly measured RTT:
Err = M - A
A ← A + g·Err = (1 - g)·A + g·M
D ← D + h·(|Err| - D) = (1 - h)·D + h·|Err|
RTO = A + 4D (latest formula) (book also says A+2D for initial value; we’ll use A+4D)
• The gains are set to h = 1/4 and g = 1/8 • In the formula for computing the new smoothed mean RTT A, 0.125 times the newly measured value (M) is added to 0.875 times the old smoothed value of A
Example of RTO computation (adaptive) • Assume A=1, D=1 (initial values) Err = 2 -1 =1 (since M, the measured RTT is 2) A = 1 + 0.125×1= 1.125; D = 1+0.25 (1-1)=1 RTO = A+4D=1.125+4 = 5.125 This is why in the figure below when segment 2 is lost, it is retransmitted after 5.125 sec.
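The update rule and the worked example can be checked with a few lines of code. This is a minimal sketch of the estimator as given on these slides; the function name update_rto is an assumption made for illustration, not part of any TCP implementation.

```python
def update_rto(a, d, m, g=0.125, h=0.25):
    """Apply one RTT measurement m to the estimators A and D.

    Returns the updated (A, D, RTO), with RTO = A + 4D.
    """
    err = m - a                  # Err = M - A
    a = a + g * err              # A <- A + g * Err
    d = d + h * (abs(err) - d)   # D <- D + h * (|Err| - D)
    return a, d, a + 4 * d


# Worked example from the slide: A = 1, D = 1, measured RTT M = 2.
a, d, rto = update_rto(1.0, 1.0, 2.0)
print(a, d, rto)  # 1.125 1.0 5.125
```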
Karn’s Algorithm • If an ACK for a retransmitted segment is received, the sender cannot tell if the ACK belongs to the original or the retransmission. • Karn’s Algorithm: • Don’t update A or D on any segments that have been retransmitted.
RTO Calculation: Example • At t1: RTO = 6 sec • At t2: RTO= 2 * 6 = 12 sec (exponential backoff) • At t3: RTO is not updated (Due to Karn’s algorithm)
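The interaction of exponential backoff and Karn's algorithm can be sketched as below. The class RtoState and its method names are hypothetical; the starting values A = 2 and D = 1 are chosen only so that the initial RTO is 6 seconds as in the example, and the 64-second ceiling is assumed from the measurements earlier in these slides.

```python
class RtoState:
    """Retransmission timeout state for one connection (illustrative only)."""

    def __init__(self, a, d, max_rto=64.0):
        self.a = a                # smoothed RTT estimate A
        self.d = d                # smoothed mean deviation D
        self.max_rto = max_rto    # assumed backoff ceiling in seconds
        self.rto = a + 4 * d

    def on_timeout(self):
        """Exponential backoff: double the RTO up to the ceiling."""
        self.rto = min(2 * self.rto, self.max_rto)

    def on_ack(self, measured_rtt, was_retransmitted, g=0.125, h=0.25):
        """Update A, D and RTO unless the segment was retransmitted."""
        if was_retransmitted:
            return                # Karn's algorithm: ignore ambiguous samples
        err = measured_rtt - self.a
        self.a += g * err
        self.d += h * (abs(err) - self.d)
        self.rto = self.a + 4 * self.d


state = RtoState(a=2.0, d=1.0)
rto_t1 = state.rto                                      # t1: RTO = 6
state.on_timeout()
rto_t2 = state.rto                                      # t2: backoff doubles it
state.on_ack(measured_rtt=3.0, was_retransmitted=True)
rto_t3 = state.rto                                      # t3: unchanged (Karn)
print(rto_t1, rto_t2, rto_t3)  # 6.0 12.0 12.0
```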
Congestion control (Second topic of this lecture) • Most often, a packet loss in a network is due to an overflow at a congested router (rather than due to a transmission error) • A sender can detect lost packets through a: • Timeout of a retransmission timer • Receipt of a duplicate ACK • TCP assumes that a packet loss is caused by congestion and reduces the size of the sending window (cwnd) • Algorithms that reduce and then reopen the sending window as packets are lost: • Congestion Avoidance • Fast retransmit and Fast recovery
Recall Slow Start / Congestion Avoidance • Here we give a recap of the normal operation of Slow Start and Congestion Avoidance
if cwnd <= ssthresh then
    /* Slow Start Phase */
    Each time an ACK is received: cwnd = cwnd + segsize
else /* cwnd > ssthresh */
    /* Congestion Avoidance Phase */
    Each time an ACK is received: cwnd = cwnd + segsize * segsize / cwnd + segsize / 8
endif
Congestion Avoidance Algorithm • When congestion occurs (indicated by a timeout or the receipt of duplicate ACKs): • ssthresh is set to half the current window size (the minimum of the advertised window (AW) and cwnd): ssthresh = min(cwnd, AW) / 2, but at least 2 segments • In case of a timeout only, cwnd is reset to one segment: cwnd = 1 segsize = 1 MSS bytes • When new data is acknowledged, cwnd is increased according to whether the connection is in slow start or CA
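The reaction to a loss can be sketched as a small function. This is a hedged illustration rather than a definitive implementation: the name on_congestion is hypothetical, all sizes are in bytes, and the rounding of ssthresh down to a multiple of the MSS follows the RTO timeout example later in these slides.

```python
def on_congestion(cwnd, adv_window, mss, timeout):
    """Return the new (cwnd, ssthresh) after a loss is detected.

    timeout=True models a retransmission timeout; timeout=False models
    duplicate ACKs, where cwnd is adjusted separately by fast recovery.
    """
    half = min(cwnd, adv_window) // 2
    half = (half // mss) * mss        # round down to a multiple of the MSS
    ssthresh = max(half, 2 * mss)     # never below two segments
    if timeout:
        cwnd = mss                    # restart from one segment
    return cwnd, ssthresh


print(on_congestion(3222, 5120, 512, timeout=True))   # (512, 1536)
print(on_congestion(3222, 5120, 512, timeout=False))  # (3222, 1536)
```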
Slow Start / Congestion Avoidance • A typical plot of cwnd for a TCP connection (segsize = 1500 bytes) :
Accelerated retransmissions (Fast retransmit) • TCP allows accelerated retransmissions (Fast Retransmit) • If the receiver gets a segment out of order, it sends an ACK carrying the sequence number it expects next (a duplicate ACK). • If the sender receives only one or two duplicate ACKs, it assumes the segments were merely reordered; once the expected segment reaches the receiver, the receiver sends the correct ACK. • If a third duplicate ACK arrives, the sender assumes the segment was lost and retransmits it immediately, without waiting for the retransmission timer to expire. Hence the name fast retransmit.
Fast Retransmit and Fast Recovery • After the third duplicate ACK (that is, the fourth ACK with the same ACK number) is received, the sender retransmits a single segment without waiting for a timeout to expire.
1. When the third duplicate ACK is received:
   ssthresh = min(cwnd, receiver’s advertised window) / 2
   cwnd = ssthresh + 3 segsize
   then retransmit the segment
   Reason: the TCP receiver has to issue an ACK every time it receives a new segment. Three duplicate ACKs therefore imply that three segments got through the network successfully, so the sender inflates cwnd.
2. For each additional duplicate ACK received:
   cwnd = cwnd + segsize
   and transmit a segment if allowed by the new value of cwnd
3. When an ACK arrives that acknowledges new data:
   set cwnd = ssthresh (this should be the ACK for the retransmission from step 1); this ACK also acknowledges the intermediate segments sent between the lost packet and the receipt of the third duplicate ACK, so set cwnd = cwnd + segsize; the connection is now in the CA phase
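The duplicate-ACK handling on this slide can be sketched as a small state machine. The class FastRecovery and its method names are hypothetical, sizes are in bytes, and the sketch follows the steps exactly as the slide states them (including the extra one-segment increment when new data is acknowledged); real stacks may differ in detail.

```python
class FastRecovery:
    """Sender-side duplicate-ACK handling (illustrative sketch only)."""

    def __init__(self, cwnd, ssthresh, adv_window, mss):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.adv_window = adv_window
        self.mss = mss
        self.dup_acks = 0
        self.in_recovery = False

    def on_dup_ack(self):
        """Return True when the missing segment should be retransmitted."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            half = min(self.cwnd, self.adv_window) // 2
            self.ssthresh = max(half, 2 * self.mss)
            self.cwnd = self.ssthresh + 3 * self.mss   # inflate by 3 segments
            self.in_recovery = True
            return True
        if self.dup_acks > 3:
            self.cwnd += self.mss    # one more segment has left the network
        return False

    def on_new_ack(self):
        """Handle an ACK that acknowledges new data."""
        if self.in_recovery:
            # Deflate the window, then add one segment as the slide states.
            self.cwnd = self.ssthresh + self.mss
            self.in_recovery = False
        self.dup_acks = 0


fr = FastRecovery(cwnd=3072, ssthresh=2560, adv_window=5120, mss=512)
results = [fr.on_dup_ack() for _ in range(3)]
print(results, fr.ssthresh, fr.cwnd)  # [False, False, True] 1536 3072
fr.on_dup_ack()
print(fr.cwnd)                        # 3584
fr.on_new_ack()
print(fr.cwnd)                        # 2048
```

Normal window growth on new ACKs outside recovery is left out on purpose; it is the slow start / congestion avoidance rule recapped earlier.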
Example of slow start and congestion avoidance (MSS=512 bytes; advertised window =5120 bytes) • Normal operation Enter congestion avoidance
Example: computation of cwnd on previous slide • Up to and including ack 2561, this TCP connection is in slow start, and cwnd is increased by 1 MSS bytes each time an ACK is received. • Note that when cwnd = ssthresh, slow start is still applied. Hence when ack 2561 is received, cwnd = 2560 + 512 = 3072. • When the last ack shown on the previous slide is received, the TCP connection is in congestion avoidance since cwnd > ssthresh. Therefore, cwnd = cwnd + MSS × MSS / cwnd + MSS / 8 = 3072 + 512 × 512 / 3072 + 512 / 8 = 3222
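The two updates in this example can be replayed in code. This is a minimal sketch: the function name on_ack is hypothetical, and ssthresh = 2560 is inferred from the statement that slow start still applies when cwnd = ssthresh = 2560. In floating point the congestion avoidance step evaluates to about 3221.3, which the slide reports as 3222.

```python
def on_ack(cwnd, ssthresh, mss):
    """Return the new cwnd (bytes) after an ACK for new data."""
    if cwnd <= ssthresh:
        # Slow start: grow by one full segment per ACK.
        return cwnd + mss
    # Congestion avoidance: grow by roughly one segment per round trip.
    return cwnd + mss * mss / cwnd + mss / 8


MSS = 512
SSTHRESH = 2560

cwnd_after_2561 = on_ack(2560, SSTHRESH, MSS)          # still slow start
cwnd_after_next = on_ack(cwnd_after_2561, SSTHRESH, MSS)  # congestion avoidance
print(cwnd_after_2561)
print(cwnd_after_next)
```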
Example: RTO timeout (see congestion avoidance algorithm) • Example of a retransmit based on a timeout • When the segment is retransmitted, ssthresh is dropped to half of the minimum of cwnd and the advertised window. The advertised window is 5120 bytes in this example, so the minimum is cwnd = 3222; half of 3222 is 1611, which is rounded down to the next multiple of the MSS, 3 × 512 = 1536 (see page 316 for this rounding down concept).
Example: duplicate ACKs (congestion avoidance algorithm and fast retransmit/recovery algorithm) • In case of duplicate ACKs, both the congestion avoidance algorithm and the fast retransmit/recovery algorithm apply • For the reason for the last cwnd increase to 2048, see the last case in Fig. 21.11
Repacketization • When TCP does a retransmission, it can send the missing data in differently sized segments • Increase segment size (if allowed by MSS limit) to improve efficiency (new data arrives after first transmitted segment was lost)
Persist Timer in TCP • Assume the window size goes down to zero and the ACK that opens the window gets lost • If ACK (see figure) is lost, both sides are blocked. • Persist Timer: Forces the sender to periodically query the receiver about its window size (window probes)
Persist Timer • The persist timer is started by the sender when the sliding window is zero • Persist timer uses exponential backoff (initial value is 1.5 seconds), but it is bounded to the range [5 sec, 60 sec] • So the intervals between timeouts are: 5, 5, 6, 12, 24, 48, 60, 60, … • The first two are 5 because the first two timer values, 1.5 and 3, are both raised to fall within the bound [5, 60] • The window probe packet contains one byte of data • TCP allows the sender to send one byte beyond the close of the receiver window • Persist timer never gives up (until the connection gets aborted)
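The bounded backoff sequence can be reproduced with a short function. This is a minimal sketch using the values from this slide; the name persist_intervals and its parameters are illustrative only.

```python
def persist_intervals(count, initial=1.5, low=5.0, high=60.0):
    """Return the first `count` persist timer intervals in seconds."""
    out = []
    value = initial
    for _ in range(count):
        # Clamp the raw backoff value into the range [low, high].
        out.append(min(max(value, low), high))
        # Double for the next round; beyond `high` the result is clamped anyway.
        value = min(value * 2, high)
    return out


print(persist_intervals(8))  # [5.0, 5.0, 6.0, 12.0, 24.0, 48.0, 60.0, 60.0]
```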
Keepalive Timer in TCP • When a TCP connection has been idle for a long time, a Keepalive timer reminds a station to check if the other side is still there. • A probe packet is sent if the connection has been idle for 2 hours • Assume a probe has been sent from A to B:
(1) B is up and running: B responds with an ACK
(2) B has crashed and is down: A will send 10 more probes, each 75 seconds apart. If A does not get a response, it will close the connection
(3) B has rebooted: B will send a RST segment
(4) B is up, but unreachable: Looks to A the same as (2)
TCP Summary • TCP Header - fields • TCP connection open/close (SYN/FIN) • Interactive TCP data transfer: • Delayed ACKs • Nagle’s algorithm
TCP Summary Contd. • Bulk TCP data transfer: • Flow control: sliding window (receiver paces sender) • Error control: time-outs and retransmissions • exponential backoff (in case of retransmits) • RTO changing adaptively to measured RTTs • Karn’s algorithm • Congestion control: congestion window (sender has window) • Slow start and congestion avoidance phases (normal operation) • Lost packets (timeout or duplicate ACKs) • congestion avoidance algorithm • fast retransmit and fast recovery algorithm • Because of the congestion recovery schemes, TCP’s ARQ scheme behaves like Go-back-N when a loss is detected by a retransmission time-out, but like selective repeat when a loss is detected by triple duplicate ACKs. • Repacketization • Persist and Keep-alive timers
Different schemes for determining RTO • Exponential backoff if a segment is retransmitted • Adaptive RTO as a function of RTT (A+4D) • If an RTT measurement is in progress when a new segment is sent, no RTT measurement is taken for the new segment • Karn’s algorithm • no RTT measurement on a retransmitted segment