
Chapter 3 Transport Layer



Presentation Transcript


  1. Chapter 3 Transport Layer. Part 4: Congestion control. Computer Networking: A Top-Down Approach, 5th edition. Jim Kurose, Keith Ross, Addison-Wesley, April 2009.

  2. Chapter 3: Transport Layer. Our goals: understand principles behind transport-layer services: multiplexing/demultiplexing, reliable data transfer, flow control, congestion control; learn about transport-layer protocols in the Internet: UDP (connectionless transport), TCP (connection-oriented transport), TCP congestion control.

  3. Chapter 3 outline: 3.1 Transport-layer services; 3.2 Multiplexing and demultiplexing; 3.3 Connectionless transport: UDP; 3.4 Principles of reliable data transfer; 3.5 Connection-oriented transport: TCP (segment structure, reliable data transfer, flow control, connection management); 3.6 Principles of congestion control; 3.7 TCP congestion control.

  4. Principles of congestion control. Congestion, informally: "too many sources sending too much data too fast for the network to handle". Different from flow control! Manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers). A top-10 problem!

  5. Causes/costs of congestion: scenario 1. Two senders, two receivers; one router with infinite buffers; no retransmission. Hosts A and B each transmit original data at an average rate of λin bytes/sec (e.g. 50 MB/s) into an unlimited shared output link buffer; the receiver takes data in at a rate of λout bytes/sec; the link capacity is C. Result of sharing the link between two hosts: the maximum achievable throughput is C/2 per connection, with large delays when congested. When λin > C/2, the average number of queued packets in the router is unbounded and the average delay between source and destination becomes infinite, because storing packets takes time.

  6. Causes/costs of congestion: scenario 2. One router, finite buffers (packets are dropped when the buffers are exceeded); the sender retransmits lost packets (reliable data transfer). λin: original data; λ'in: original data plus retransmitted data (the offered load). Hosts A and B share finite output link buffers.

  7. Causes/costs of congestion: scenario 2, case a: Host A magically knows when a buffer in the router is free, so there is no loss and λ'in = λin. Always: λout = λin (goodput). [Figure: three plots (a, b, c) of λout vs. λin; goodput reaches R/2 in case a, R/3 in case b, and R/4 in case c.] "Costs" of congestion: more work (retransmissions) for a given goodput; unneeded retransmissions: the link carries multiple copies of a packet.

  8. Causes/costs of congestion: scenario 2, case b: "perfect" retransmission: retransmit only when a loss is known, so λ'in > λin. On average, of 0.5R units/sec sent, 0.333R bytes/sec is original data (the amount of new data sent) and 0.166R bytes/sec is retransmissions. [Figure: λout vs. λin plots (a, b, c), as before.] "Costs" of congestion: more work (retransmissions) for a given goodput.

  9. Causes/costs of congestion: scenario 2, case c: retransmission of delayed (not lost) packets. The sender may time out prematurely and retransmit a packet that is merely delayed in the queue, so the receiver receives multiple copies of some packets; this makes λ'in larger (than in the perfect case) for the same λout. Assuming (for no good reason) that on average each packet is forwarded twice, goodput saturates at R/4. [Figure: λout vs. λin plots (a, b, c), as before.] "Costs" of congestion: more work (retransmissions) for a given goodput; unneeded retransmissions: the link carries multiple copies of a packet.

  10. Causes/costs of congestion: scenario 3. Four senders, multihop paths, timeout/retransmit. All hosts transmit original data at an average rate of λin bytes/sec (e.g. 50 MB/s); λ'in: original data plus retransmitted data. All routers (R1, R2, R3, R4) have capacity R bytes/sec and finite shared output link buffers. Q: what happens as λin and λ'in increase?

  11. Low traffic. Consider the connection A to C, which goes through routers R1 and R2; A–C shares router R1 with D–B and router R2 with B–D. For extremely small values of λin, buffer overflows are rare and throughput approximately equals the offered load, i.e. λout = λin. With slightly larger values of λin, λout increases by the same amount, since buffer overflows are still rare.

  12. Large traffic. Again consider the connection A to C through routers R1 and R2; A–C shares router R2 with B–D. For extremely large values of λ'in from B–D, R2 overflows, and segments from A forwarded through R1 to R2 tend to get discarded. Result: throughput from A to C goes toward 0 (see next slide).

  13. Approaches towards congestion control. Two broad approaches: (1) end-end congestion control: no explicit feedback from the network; congestion is inferred from end-system observed loss and delay; the approach taken by TCP. (2) Network-assisted congestion control: routers provide feedback to end systems, either a single bit indicating congestion (SNA, DECbit, TCP/IP ECN (proposed), ATM) or an explicit rate at which the sender should transmit.

  14. Case study: ATM ABR congestion control. ABR (available bit rate) is an "elastic service": if the sender's path is underloaded, the sender should use the available bandwidth; if the sender's path is congested, the sender is throttled to a minimum guaranteed rate. RM (resource management) cells: sent by the sender, interspersed with data cells (~1 every 32 data cells). Bits in the RM cell are set by switches ("network-assisted"): NI bit: no increase in rate (mild congestion); CI bit: congestion indication; ER setting: 2-byte explicit rate field. RM cells are returned to the sender by the receiver, with the bits intact.

  15. Case study: ATM* ABR congestion control. Two-byte ER (explicit rate) field in the RM cell: a congested switch may lower the ER value in the cell, so the sender's send rate is set to the maximum supportable rate on the path. EFCI** bit in data cells: set to 1 by a congested switch; if the data cell preceding an RM cell has EFCI set, the receiver sets the CI bit in the returned RM cell. (* ATM (asynchronous transfer mode) is a dedicated-connection switching technology that organizes digital data into 53-byte cells. ** EFCI: explicit forward congestion indication.)

  16. Chapter 3 outline: 3.1 Transport-layer services; 3.2 Multiplexing and demultiplexing; 3.3 Connectionless transport: UDP; 3.4 Principles of reliable data transfer; 3.5 Connection-oriented transport: TCP (segment structure, reliable data transfer, flow control, connection management); 3.6 Principles of congestion control; 3.7 TCP congestion control.

  17. TCP congestion control: • TCP must use end-to-end congestion control; IP will not help! • Each sender limits the rate at which it sends data into its connection as a function of perceived network congestion. • Q: how does TCP limit the rate at which it sends traffic? • Q: how does a TCP sender perceive that there is congestion on the path between itself and the destination? • Q: what algorithm should the sender use to change its send rate as a function of perceived end-to-end congestion?

  18. TCP congestion control: Q: how does TCP limit the rate at which it sends traffic? • Limiting rate at which it sends traffic • Each side of TCP connection has receive buffer, send buffer, variables (LastByteRead, rwnd, etc.) • TCP congestion control keeps another variable, the congestion window (cwnd): LastByteSent – LastByteAcked <= min{cwnd, rwnd}
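The window constraint above can be expressed as a short check. A minimal sketch (the function name and the byte values are illustrative, not from any TCP implementation):

```python
def allowed_to_send(last_byte_sent, last_byte_acked, cwnd, rwnd):
    """How many more unACKed bytes the sender may inject, per
    LastByteSent - LastByteAcked <= min(cwnd, rwnd)."""
    in_flight = last_byte_sent - last_byte_acked
    return max(0, min(cwnd, rwnd) - in_flight)

# 4000 bytes in flight, cwnd = 8000, rwnd = 16000: cwnd is the binding limit
print(allowed_to_send(14000, 10000, 8000, 16000))  # -> 4000
```

When cwnd shrinks below the bytes already in flight, the sender simply may not inject anything new until ACKs drain the pipeline.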

  19. TCP congestion control: • Limiting rate at which it sends traffic: LastByteSent – LastByteAcked <= min{cwnd, rwnd} • To simplify, assume the receive buffer is so large that rwnd can be ignored • Also assume that the sender always has data to send • Assume loss and packet delays are negligible • At the beginning of each RTT, the sender sends cwnd bytes • At the end of the RTT, the sender gets ACKs • So the sender's send rate is cwnd/RTT bytes/sec

  20. TCP congestion control: • Q: how does a TCP sender perceive that there is congestion on the path between itself and the destination? • goal: TCP sender should transmit as fast as possible, but without congesting network • Q: how to find rate just below congestion level • decentralized: each TCP sender sets its own rate, based on implicit feedback: • ACK: segment received (a good thing!), network not congested, so increase sending rate • lost segment: assume loss due to congested network, so decrease sending rate

  21. TCP congestion control: bandwidth probing. "Probing for bandwidth": increase the transmission rate on receipt of ACKs until eventually loss occurs, then decrease the transmission rate; continue to increase on ACK, decrease on loss (since the available bandwidth changes, depending on the other connections in the network). [Figure: TCP's "sawtooth" behavior: sending rate vs. time; the rate climbs while ACKs are received and drops at each loss event (X).] Q: how fast to increase/decrease? Details to follow.

  22. TCP congestion control: • TCP congestion-control algorithm • Slow start (required in TCP) • Congestion avoidance (required in TCP) • Fast recovery (recommended in TCP)

  23. TCP congestion control: details. cwnd: congestion window, in bytes; differs from rwnd (how, why?). The sender limits its rate by limiting the number of unACKed bytes "in pipeline": LastByteSent – LastByteAcked <= min(cwnd, rwnd). Roughly, rate = cwnd/RTT bytes/sec as ACKs arrive; cwnd is dynamic, a function of perceived network congestion.

  24. TCP congestion control: more details. Segment loss event: reduce cwnd. Timeout (no response from receiver): cut cwnd to 1 MSS (max segment size). 3 duplicate ACKs (at least some segments are getting through; recall fast retransmit): cut cwnd in half, less aggressively than on timeout. ACK received: increase cwnd. Slow-start phase: increase exponentially fast (despite the name) at connection start, or following a timeout. Congestion avoidance: increase linearly.

  25. TCP slow start. When a connection begins, cwnd = 1 MSS (max segment size). Example: MSS = 500 bytes & RTT = 200 msec gives an initial rate of only 20 kbps; the available bandwidth may be >> MSS/RTT, so it is desirable to quickly ramp up to a respectable rate. Increase the rate exponentially until the first loss event or until a threshold is reached: double cwnd every RTT, done by incrementing cwnd by 1 MSS for every ACK received. [Figure: Hosts A and B; the sender sends one segment in the first RTT, two in the next, then four, with cwnd growing by 1 MSS per ACK: 2 MSS, 3 MSS, 4 MSS, ...]
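The slide's numbers can be replayed with a short sketch of the exponential ramp-up (assumed simplifications: no loss, exactly one doubling per RTT):

```python
MSS = 500   # bytes, from the slide's example
RTT = 0.2   # seconds (200 msec)

def slow_start_rates(rounds):
    """Sending rate (bits/sec) in each of the first `rounds` RTTs:
    cwnd starts at 1 MSS and doubles every RTT (+1 MSS per ACK received)."""
    cwnd, rates = MSS, []
    for _ in range(rounds):
        rates.append(cwnd * 8 / RTT)  # bits sent this round / RTT
        cwnd *= 2
    return rates

print(slow_start_rates(4))  # [20000.0, 40000.0, 80000.0, 160000.0] bps
```

The first entry is the 20 kbps initial rate (500 bytes x 8 bits / 0.2 s); four RTTs later the rate has already grown eightfold.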

  26. Transitioning into/out of slow start. ssthresh: a cwnd threshold maintained by TCP. On a loss event: set ssthresh to cwnd/2, then set cwnd to 1 MSS (we want to avoid congestion, so ssthresh becomes the new limit). When cwnd >= ssthresh: transition from slow start to the congestion-avoidance phase. FSM fragment: slow start begins with cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; on each new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed; on duplicate ACK: dupACKcount++; on timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit the missing segment; when cwnd > ssthresh: move to congestion avoidance (where a timeout likewise sets ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, and retransmits the missing segment).

  27. TCP: congestion avoidance. When cwnd > ssthresh, grow cwnd linearly: increase cwnd by 1 MSS per RTT, approaching possible congestion more slowly than in slow start. Implementation: cwnd = cwnd + MSS·(MSS/cwnd) for each ACK received.

  28. TCP: congestion avoidance. When does this increase stop? On timeout: ssthresh is updated to half the value of cwnd; cwnd is set to 1 MSS (the same behavior as for slow start). Example: increase cwnd by MSS·(MSS/cwnd) whenever a new ACK arrives. If MSS is 1,460 bytes and cwnd is 14,600 bytes, then 10 segments are sent within an RTT; each arriving ACK increases the congestion window by about 1/10 MSS, so when all 10 segments are ACKed, cwnd has increased by roughly 1 MSS.
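The per-ACK arithmetic above can be checked directly; this is only an illustration of the increment rule, not a real TCP implementation:

```python
MSS = 1460.0
cwnd = 14600.0                      # 10 segments outstanding this RTT
start = cwnd

for _ in range(int(start // MSS)):  # one ACK arrives per segment sent
    cwnd += MSS * (MSS / cwnd)      # additive increase per arriving ACK

# Growth over the round is a little under 1 MSS, since cwnd (the divisor)
# grows as the ACKs arrive; with a fixed divisor it would be exactly 1 MSS.
print(cwnd - start)
```

The first ACK adds exactly 146 bytes (1/10 MSS); later ACKs add slightly less, which is why real stacks track the fractional remainder.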

  29. TCP: congestion avoidance. When does this increase stop? On receiving 3 duplicate ACKs: ssthresh is updated to half the value of cwnd; cwnd is set to half of cwnd (+3 MSS, to account for the duplicate ACKs received); then enter the fast-recovery state.

  30. TCP: fast recovery. cwnd is increased by 1 MSS for every duplicate ACK received for the missing segment that caused TCP to enter the fast-recovery state. When an ACK arrives for the missing segment: deflate cwnd and transition to the congestion-avoidance state. If a timeout occurs: cwnd is set to 1 MSS, ssthresh is set to half the original cwnd, and TCP transitions to the slow-start state.

  31. TCP congestion-control FSM: overview. States: slow start, congestion avoidance, fast recovery. Slow start moves to congestion avoidance when cwnd > ssthresh; a loss detected by timeout returns any state to slow start. If fast recovery is not implemented (TCP Tahoe), a loss signalled by 3 duplicate ACKs is handled like a timeout; if fast recovery is implemented (TCP Reno), a 3-dupACK loss enters fast recovery, and a new ACK there returns to congestion avoidance.

  32. TCP congestion-control FSM: details. Slow start (initially cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0): new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s) as allowed; duplicate ACK: dupACKcount++; dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment, enter fast recovery; cwnd > ssthresh: enter congestion avoidance; timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment. Congestion avoidance: new ACK: cwnd = cwnd + MSS·(MSS/cwnd), dupACKcount = 0, transmit new segment(s) as allowed; duplicate ACK: dupACKcount++; dupACKcount == 3: ssthresh = cwnd/2, cwnd = ssthresh + 3, retransmit missing segment, enter fast recovery; timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, enter slow start. Fast recovery: duplicate ACK: cwnd = cwnd + MSS, transmit new segment(s) as allowed; new ACK: cwnd = ssthresh, dupACKcount = 0, enter congestion avoidance; timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, enter slow start.
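The three states and their transitions can be collected into a small event-driven simulator. This is a hypothetical sketch of the Reno FSM described on this slide (byte-counted cwnd, 64 KB initial ssthresh as on the slide), not production TCP code:

```python
MSS = 1000  # bytes; illustrative segment size

class RenoSender:
    """Toy TCP Reno congestion-control state machine."""

    def __init__(self):
        self.cwnd = 1 * MSS
        self.ssthresh = 64_000
        self.dup_acks = 0
        self.state = "slow_start"

    def on_new_ack(self):
        if self.state == "fast_recovery":
            self.cwnd = self.ssthresh             # deflate the window
            self.state = "congestion_avoidance"
        elif self.state == "slow_start":
            self.cwnd += MSS                      # exponential growth
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        else:                                     # congestion avoidance
            self.cwnd += MSS * MSS // self.cwnd   # ~1 MSS per RTT
        self.dup_acks = 0

    def on_dup_ack(self):
        if self.state == "fast_recovery":
            self.cwnd += MSS                      # inflate per dup ACK
            return
        self.dup_acks += 1
        if self.dup_acks == 3:                    # fast retransmit
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh + 3 * MSS
            self.state = "fast_recovery"

    def on_timeout(self):                         # any state -> slow start
        self.ssthresh = self.cwnd // 2
        self.cwnd = 1 * MSS
        self.dup_acks = 0
        self.state = "slow_start"
```

Feeding it events reproduces the slide's transitions: the third duplicate ACK halves ssthresh and enters fast recovery, and the next new ACK deflates cwnd to ssthresh and returns to congestion avoidance.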

  33. Popular "flavors" of TCP. [Figure: window size (in segments) vs. transmission round for TCP Tahoe and TCP Reno, with each flavor's cwnd and ssthresh marked.]

  34. Popular "flavors" of TCP. [Figure: window size (in segments) vs. transmission round. Both flavors slow-start from 1 MSS; at a triple-duplicate-ACK event with cwnd at 12 MSS, ssthresh is set to 6 MSS; TCP Reno (the newest version of TCP) enters fast recovery and then congestion avoidance from about half the window, while TCP Tahoe falls back to 1 MSS and slow-starts again.]

  35. Summary: TCP Congestion Control • when cwnd < ssthresh, sender in slow-start phase, window grows exponentially. • when cwnd >= ssthresh, sender is in congestion-avoidance phase, window grows linearly. • when triple duplicate ACK occurs, ssthresh set to cwnd/2, cwnd set to ~ ssthresh • when timeout occurs, ssthresh set to cwnd/2, cwnd set to 1 MSS.

  36. AIMD retrospective. AIMD: Additive Increase, Multiplicative Decrease. ACKs: increase cwnd by 1 MSS per RTT: additive increase. Loss: cut cwnd in half (for non-timeout-detected loss): multiplicative decrease. [Figure: TCP's "sawtooth" behavior: sending rate vs. time; the rate climbs while ACKs are received and halves at each loss event (X).]

  37. TCP throughput. Q: what is the average throughput of TCP as a function of window size and RTT? Ignoring slow start: let W be the window size when loss occurs. When the window is W, throughput is W/RTT; just after a loss, the window drops to W/2 and throughput to W/2RTT. Average throughput: 0.75 W/RTT.
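The averaging step, as a one-line sketch (the window and RTT values in the example are assumed for illustration):

```python
def avg_tcp_throughput(w_bytes, rtt_s):
    """Mean of a sawtooth oscillating between (W/2)/RTT and W/RTT:
    ((W/2 + W)/2) / RTT = 0.75 * W / RTT, in bytes/sec."""
    return 0.75 * w_bytes / rtt_s

# Example: W = 100 kB window at loss, RTT = 100 ms
print(avg_tcp_throughput(100_000, 0.1))  # -> 750000.0 bytes/sec
```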

  38. TCP futures: TCP over "long, fat pipes". TCP was invented for SMTP, FTP, and Telnet, not HTTP and streaming video! Example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput; this requires a window size of W = 83,333 in-flight segments. Throughput in terms of loss rate L: throughput = 1.22·MSS / (RTT·√L) ➜ L = 2·10⁻¹⁰. Wow, that's one loss event per 5,000,000,000 segments! New versions of TCP for high-speed environments are being researched.
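Both numbers on this slide follow from the throughput formula; a quick check with the slide's constants (the variable names are mine):

```python
# Throughput = 1.22 * MSS / (RTT * sqrt(L)), rearranged for W and L.
MSS_BITS = 1500 * 8     # segment size in bits
RTT = 0.1               # seconds
TARGET = 10e9           # 10 Gbps

w_segments = TARGET * RTT / MSS_BITS                 # window to fill the pipe
loss_rate = (1.22 * MSS_BITS / (RTT * TARGET)) ** 2  # tolerable loss rate

print(round(w_segments))   # -> 83333 in-flight segments
print(loss_rate)           # ~2e-10: one loss per ~5 billion segments
```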

  39. TCP fairness. Fairness goal: if K TCP sessions share the same bottleneck link of bandwidth R, each should have an average rate of R/K. [Figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R.] Bottleneck link: assume that all other links that are used have plenty of bandwidth; assume no UDP.

  40. Why is TCP fair? Two competing sessions: two TCP connections share a link with transmission rate R; same MSS and RTT; no other connections or UDP; ignore slow start. Additive increase gives a slope of 1 as throughput increases; multiplicative decrease reduces throughput proportionally. [Figure: Connection 2 throughput vs. Connection 1 throughput, each axis from 0 to R, showing the equal-bandwidth-share line (the goal!) and the full-bandwidth-utilization line; congestion avoidance adds throughput along slope 1 and each loss halves the window, so the operating point converges toward the equal share.]

  41. Fairness (more). Fairness in practice: if the ideal conditions from the previous slide are not met, connections may get unequal bandwidth. Fairness and UDP: multimedia apps often do not use TCP because they do not want their rate throttled by congestion control; instead they use UDP, pumping audio/video at a constant rate and tolerating packet loss. Fairness and parallel TCP connections: nothing prevents an app from opening parallel connections between 2 hosts, and web browsers do this. Example: a link of rate R supports 9 connections; a new app that asks for 1 TCP connection gets rate R/10, but a new app that asks for 11 TCP connections gets about R/2!
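The parallel-connection example can be sketched as simple arithmetic (assuming the link splits its rate evenly among all TCP connections crossing it):

```python
def new_app_share(R, existing, new):
    """Fraction of link rate R a new app gets by opening `new` parallel
    TCP connections alongside `existing` ones, under an even split."""
    return R * new / (existing + new)

R = 1.0
print(new_app_share(R, 9, 1))   # -> 0.1  (one connection among 10: R/10)
print(new_app_share(R, 9, 11))  # -> 0.55 (11 of 20 connections: about R/2)
```

Per-connection fairness thus rewards whoever opens the most connections, which is why per-flow fairness and per-application fairness are not the same thing.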

  42. Chapter 3: summary. Principles behind transport-layer services: multiplexing, demultiplexing; reliable data transfer; flow control; congestion control. Instantiation and implementation in the Internet: UDP, TCP. Next: leaving the network "edge" (application and transport layers), into the network "core".
