Transport layer


Transport Layer. Michalis Faloutsos Many slides from Kurose-Ross. Transport Layer Functionality. Hide network from application layer Transport layer resides at end points Sees the network as a black box. Transport Layers of the Internet. TCP: reliable protocol


Presentation Transcript

Transport Layer

Michalis Faloutsos

Many slides from Kurose-Ross

Transport Layer Functionality

  • Hide network from application layer

  • Transport layer resides at end points

  • Sees the network as a black box

Transport Layers of the Internet

  • TCP: reliable protocol

    • Guarantees end-to-end delivery

    • Self-controls rate: congestion and flow control

    • Connection oriented: handshake, state

    • Ordered delivery of packets to application

  • UDP: unreliable protocol

    • Non-regulated sending rate

    • Multiplexing-demultiplexing

TCP overview

TCP: What and How

For more: RFCs 793, 1122, 1323, 2018, 2581

  • Point-to-point: one sender, one receiver

  • Reliable, in-order byte stream: no “message boundaries”

  • Pipelined: TCP congestion and flow control set window size

  • Send & receive buffers

  • Full duplex data: bi-directional data flow in same connection

    • MSS: maximum segment size

  • Connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange

  • Flow controlled: sender will not overwhelm receiver

TCP segment structure

[Figure: TCP segment layout, 32 bits wide]

  • source port #, dest port #

  • sequence number, acknowledgement number: counting by bytes of data (not segments!)

  • flags:

    • URG: urgent data (generally not used)

    • PSH: push data now (generally not used)

    • RST, SYN, FIN: connection estab (setup, teardown)

  • rcvr window size: # bytes rcvr willing to accept

  • checksum (as in UDP)

  • ptr urgent data

  • Options (variable length)

  • application data (variable length)

TCP overview

  • TCP is a sliding window protocol

    • Sender can have (Window) bytes in flight

  • Operates with cumulative ACKs

  • It includes control for the sending rate

    • Flow control: receiver-set sending rate

    • Congestion control: network-aware sending rate


TCP seq. #’s and ACKs

  • Seq. #’s: byte stream “number” of first byte in segment’s data

  • ACKs: seq # of next byte expected from other side

    • cumulative ACK

  • Q: how does the receiver handle out-of-order segments?

    • A: the TCP spec doesn’t say; it is up to the implementor

Simple telnet scenario (Host A, Host B):

  • A to B: Seq=42, ACK=79, data = ‘C’

  • B to A: Seq=79, ACK=43, data = ‘C’ (host ACKs receipt of ‘C’, echoes back ‘C’)

  • A to B: Seq=43, ACK=80 (host ACKs receipt of echoed ‘C’)
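The numbering in the telnet scenario can be replayed in a few lines. This is only an illustrative sketch: the `Segment` class and the `telnet_echo` function are hypothetical names, and the starting sequence numbers 42 and 79 are taken from the scenario.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int            # byte-stream number of the first data byte
    ack: int            # next byte expected from the other side
    data: bytes = b""


def telnet_echo(a_seq: int, b_seq: int, char: bytes) -> list[Segment]:
    """Replay the one-character echo exchange between hosts A and B."""
    # A -> B: A sends the typed character.
    first = Segment(seq=a_seq, ack=b_seq, data=char)
    # B -> A: B ACKs the byte it received and echoes it back.
    second = Segment(seq=b_seq, ack=first.seq + len(first.data), data=char)
    # A -> B: A ACKs the echoed byte; this segment carries no data.
    third = Segment(seq=second.ack, ack=second.seq + len(second.data))
    return [first, second, third]


for seg in telnet_echo(42, 79, b"C"):
    print(f"Seq={seg.seq}, ACK={seg.ack}, data={seg.data!r}")
```

Each ACK value is the sender's view of the next byte it expects, which is why the one-byte payload moves each counter by exactly one.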

TCP in a nutshell

I. Slow start phase (actually this is fast increase)

  • Start with a window of 1 (or 2)

  • Successful ACK: increase window by 1 max-size segment

  • Do this up to a threshold: ssthresh

II. Congestion control phase

  • Increase window by 1 max-size segment every RTT

  • Drop window in half if there is congestion

    • Packet loss: duplicate ACKs

    • Timer expiration

TCP Congestion Control

  • end-end control (no network assistance)

  • transmission rate limited by congestion window size, Congwin, over segments

  • w segments, each with MSS bytes, sent in one RTT:

    throughput = (w * MSS) / RTT

TCP congestion control: Intuition

TCP is “probing” for usable bandwidth:

  • ideally: transmit as fast as possible (Congwin as large as possible) without loss

  • increase Congwin until loss (congestion)

  • loss: decrease Congwin, then begin probing (increasing) again

TCP congestion control

TCP has two “phases”:

  • slow start: start from small, increase quickly

  • congestion avoidance: Additive Increase Multiplicative Decrease

Important variables:

  • Congwin: congestion window size

  • threshold: defines the boundary between the slow start phase and the congestion control phase

TCP Slowstart

Slowstart algorithm

initialize: Congwin = 1
for (each segment ACKed)
    Congwin++
until (loss event OR Congwin > threshold)

  • exponential increase (per RTT) in window size

  • loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)

[Figure: Host A sends one segment, then two segments, then four segments to Host B]
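A rough sketch of why a per-ACK increment doubles the window each round trip. The function name `slow_start_rounds` is hypothetical, and the model assumes no loss and that every segment sent in one RTT is ACKed before the next RTT begins.

```python
def slow_start_rounds(threshold: int, initial_window: int = 1) -> list[int]:
    """Return Congwin (in segments) at the start of each RTT during slow start."""
    congwin = initial_window
    history = [congwin]
    while congwin <= threshold:
        # One ACK arrives for every segment sent in this RTT.
        for _ in range(congwin):
            congwin += 1                 # Congwin++ for each segment ACKed
            if congwin > threshold:      # until (Congwin > threshold)
                break
        history.append(congwin)
    return history


print(slow_start_rounds(threshold=16))
```

With a threshold of 16 the window goes 1, 2, 4, 8, 16 and then leaves slow start on the first ACK of the next round.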

Why Call it Slow Start ?

  • The original version of TCP suggested that the sender transmit as much as the Advertised Window permitted.

  • Routers may not be able to cope with this “burst” of transmissions.

  • Slow start is slower than the above version -- ensures that a transmission burst does not happen at once.

TCP Congestion Avoidance

Congestion avoidance

/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
    every w segments ACKed:
        Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart [1]

[1] TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
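The combined behaviour of slow start, congestion avoidance and the reaction to loss can be sketched as a per-RTT trace. This is a simplified Tahoe-style model in whole segments, not a faithful implementation: `tahoe_trace` and its `loss_rounds` argument are hypothetical, and loss is supplied as an input rather than inferred from timeouts or duplicate ACKs.

```python
def tahoe_trace(rounds: int, threshold: int, loss_rounds: set[int]) -> list[int]:
    """Congwin (in segments) at the start of each RTT for a Tahoe-style sender."""
    congwin = 1
    trace = []
    for rtt in range(rounds):
        trace.append(congwin)
        if rtt in loss_rounds:
            threshold = max(congwin // 2, 1)   # threshold = Congwin/2
            congwin = 1                        # Congwin = 1, perform slowstart
        elif congwin <= threshold:
            # Slow start: double per RTT, leaving the phase just above threshold.
            congwin = min(congwin * 2, threshold + 1)
        else:
            congwin += 1                       # congestion avoidance: +1 per RTT
    return trace


print(tahoe_trace(rounds=12, threshold=8, loss_rounds={7}))
```

The trace shows the familiar sawtooth: a fast climb to the threshold, a linear climb above it, and a restart from one segment after the loss with the threshold halved.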

TCP Congestion: Real Life is Hairy!

Remember: bytes vs packets!

Thres = Max(2 * MSS, InFlightData / 2)

  • MSS: max segment size

  • InFlightData: un-ACK-ed data

RFC 2581: TCP Congestion Control
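In bytes, RFC 2581 sets the threshold after a loss to the larger of two segments and half of the un-ACK-ed data. A minimal sketch, with a hypothetical function name and example values chosen only for illustration:

```python
def threshold_after_loss(in_flight_bytes: int, mss: int) -> int:
    """Thres = Max(2 * MSS, InFlightData / 2), all quantities in bytes."""
    return max(2 * mss, in_flight_bytes // 2)


# Large window: half of the data in flight dominates.
print(threshold_after_loss(in_flight_bytes=20_000, mss=1460))
# Small window: the floor of two segments applies.
print(threshold_after_loss(in_flight_bytes=3_000, mss=1460))
```

The floor of 2 * MSS keeps the threshold from collapsing to nothing when very little data was in flight at the time of the loss.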


TCP Fairness and AIMD

Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity

TCP congestion avoidance:

  • AIMD: additive increase, multiplicative decrease

    • increase window by 1 per RTT

    • decrease window by factor of 2 on loss event

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck of capacity R]

Why is TCP fair?

Two competing sessions:

  • Additive increase gives slope of 1, as throughput increases

  • Multiplicative decrease decreases throughput proportionally

[Figure: Connection 2 throughput plotted against Connection 1 throughput, with the “equal bandwidth share” line; the trajectory alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”]
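The convergence argument can be checked numerically. The sketch below is a toy model under a strong assumption: both sessions detect loss in the same RTT whenever their combined throughput exceeds the bottleneck capacity. The function name and the starting values are hypothetical.

```python
def aimd_two_flows(x1: float, x2: float, capacity: float, rtts: int) -> tuple[float, float]:
    """Evolve two sessions' throughputs under synchronized AIMD."""
    for _ in range(rtts):
        if x1 + x2 > capacity:
            x1, x2 = x1 / 2, x2 / 2      # multiplicative decrease on loss
        else:
            x1, x2 = x1 + 1, x2 + 1      # additive increase, slope of 1
    return x1, x2


a, b = aimd_two_flows(x1=1.0, x2=40.0, capacity=50.0, rtts=500)
print(f"gap between the two sessions: {abs(a - b):.6f}")
```

Additive increase leaves the gap between the two sessions unchanged, while every multiplicative decrease halves it, so the pair drifts toward the equal-share line.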


Macroscopic Description of Throughput

  • Assume window toggling: W/2 to W

  • High rate: W * MSS / RTT

  • Low rate: W * MSS / (2 * RTT)

  • Rate increases linearly between the two extremes

  • Average throughput:

    • 0.75 * W * MSS / RTT
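The 0.75 factor is just the midpoint of a linear ramp from the low rate to the high rate. A small sketch with hypothetical example values (W = 20 segments, MSS = 1460 bytes, RTT = 0.1 s):

```python
def average_throughput(w: int, mss: int, rtt: float) -> float:
    """Average rate when the window ramps linearly from W/2 to W segments."""
    low = (w / 2) * mss / rtt     # rate right after a loss
    high = w * mss / rtt          # rate right before the next loss
    return (low + high) / 2       # mean of a linear ramp


rate = average_throughput(w=20, mss=1460, rtt=0.1)
print(f"{rate:.0f} bytes/sec")
```

Since (1/2 + 1) / 2 = 3/4, the result matches 0.75 * W * MSS / RTT for any choice of W, MSS and RTT.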

TCP: reliable data transfer

Simplified sender, assuming:

  • one way data transfer

  • no flow, congestion control

Events and actions:

  • event: data received from application above

    • create, send segment

  • event: timer timeout for segment with seq # y

    • retransmit segment

  • event: ACK received, with ACK # y

    • ACK processing

TCP sender

sendbase = initial_sequence_number
nextseqnum = initial_sequence_number

loop (forever) {
    switch (event)

    event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)

    event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y

    event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
            cancel all timers for segments with sequence numbers < y
            sendbase = y
        }
        else { /* a duplicate ACK for already ACKed segment */
            increment number of duplicate ACKs received for y
            if (number of duplicate ACKs received for y == 3) {
                /* TCP fast retransmit */
                resend segment with sequence number y
                restart timer for segment y
            }
        }
} /* end of loop forever */
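The three sender events can be turned into a small runnable sketch. It is not a real TCP sender: the class name `SimplifiedSender` is hypothetical, timers are represented only by membership in a dictionary, actions are appended to a log instead of being passed to IP, and sequence-number wraparound is ignored.

```python
class SimplifiedSender:
    """Event handlers for the simplified sender; actions are logged, not sent."""

    def __init__(self, initial_seq: int):
        self.sendbase = initial_seq
        self.nextseqnum = initial_seq
        self.unacked = {}    # seq -> data for segments whose timer is running
        self.dup_acks = {}   # ACK value -> number of duplicate ACKs seen
        self.log = []        # stand-in for "pass segment to IP"

    def on_app_data(self, data: bytes) -> None:
        seq = self.nextseqnum
        self.unacked[seq] = data               # timer for this segment starts
        self.log.append(("send", seq))
        self.nextseqnum += len(data)

    def on_timeout(self, seq: int) -> None:
        if seq in self.unacked:
            self.log.append(("retransmit", seq))   # timer would be restarted

    def on_ack(self, y: int) -> None:
        if y > self.sendbase:
            # Cumulative ACK: every byte below y has been delivered.
            for seq in [s for s in self.unacked if s < y]:
                del self.unacked[seq]          # cancel that segment's timer
            self.sendbase = y
        else:
            # Duplicate ACK for data that was already ACKed.
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3 and y in self.unacked:
                self.log.append(("fast retransmit", y))


sender = SimplifiedSender(initial_seq=92)
sender.on_app_data(b"x" * 8)     # occupies bytes 92..99
sender.on_app_data(b"y" * 20)    # occupies bytes 100..119
sender.on_ack(100)               # cumulative ACK for the first segment
for _ in range(3):
    sender.on_ack(100)           # three duplicate ACKs
print(sender.log)
```

The usage lines reuse the Seq=92 and Seq=100 segment sizes from the retransmission scenarios; the third duplicate ACK triggers the fast retransmit without waiting for a timeout.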




TCP Receiver: ACK generation [RFC 1122, RFC 2581]

  • Event: in-order segment arrival, no gaps, everything else already ACKed

    • TCP Receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK

  • Event: in-order segment arrival, no gaps, one delayed ACK pending

    • TCP Receiver action: immediately send single cumulative ACK

  • Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected

    • TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte

  • Event: arrival of segment that partially or completely fills gap

    • TCP Receiver action: immediate ACK if segment starts at lower end of gap
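The event/action pairs read naturally as a decision function. This is a sketch only: the function name, its parameters and the returned strings are hypothetical, and the final branch (a segment below the expected sequence number) is an assumption that is not part of the rules listed here.

```python
def receiver_ack_action(seg_seq: int, expected_seq: int,
                        delayed_ack_pending: bool, gap_exists: bool) -> str:
    """Pick the receiver's action for an arriving segment.

    expected_seq is the next in-order byte; gap_exists says whether
    out-of-order data is already buffered above expected_seq.
    """
    if seg_seq > expected_seq:
        # Out-of-order arrival with a higher-than-expected seq. #: gap detected.
        return "send duplicate ACK"
    if seg_seq == expected_seq and gap_exists:
        # Segment starts at the lower end of the gap.
        return "send immediate ACK"
    if seg_seq == expected_seq and delayed_ack_pending:
        return "send single cumulative ACK"
    if seg_seq == expected_seq:
        return "delay ACK up to 500 ms"
    # Assumption: already-received data is simply re-ACKed.
    return "send duplicate ACK"


print(receiver_ack_action(100, 100, delayed_ack_pending=False, gap_exists=False))
```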

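TCP: retransmission scenarios

[Figure: lost ACK scenario. Host A sends Seq=92, 8 bytes data; the ACK from Host B is lost; after the timeout, Host A resends Seq=92, 8 bytes data.]

[Figure: premature timeout, cumulative ACKs. Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; the Seq=92 timeout fires before its ACK arrives, so Host A resends Seq=92, 8 bytes data; the Seq=100 timeout interval is also marked.]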

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

  • longer than RTT

    • note: RTT will vary

  • too short: premature timeout

    • unnecessary retransmissions

  • too long: slow reaction to segment loss

Q: how to estimate RTT?

  • SampleRTT: measured time from segment transmission until ACK receipt

    • ignore retransmissions, cumulatively ACKed segments

  • SampleRTT will vary, want estimated RTT “smoother”

    • use several recent measurements, not just current SampleRTT

TCP Round Trip Time and Timeout

EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT

  • Exponential weighted moving average

  • influence of given sample decreases exponentially fast

  • typical value of x: 0.1

Setting the timeout

  • EstimatedRTT plus “safety margin”

  • large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + 4*Deviation

Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
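A minimal sketch of the estimator, using EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT, Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT| and Timeout = EstimatedRTT + 4*Deviation. The class name `RttEstimator` is hypothetical. Two choices are assumptions rather than something the slides specify: the estimate is seeded with the first sample and a zero deviation, and the deviation term uses the estimate from before the current update.

```python
class RttEstimator:
    """EWMA of SampleRTT plus a deviation-based safety margin."""

    def __init__(self, first_sample: float, x: float = 0.1):
        self.x = x
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.deviation = 0.0                # assumption: start with no deviation

    def update(self, sample_rtt: float) -> None:
        # Distance between the new sample and the previous estimate.
        error = abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - self.x) * self.estimated_rtt + self.x * sample_rtt
        self.deviation = (1 - self.x) * self.deviation + self.x * error

    @property
    def timeout(self) -> float:
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator(first_sample=100.0)   # milliseconds
est.update(200.0)                        # one unusually slow sample
print(round(est.estimated_rtt, 3), round(est.deviation, 3), round(est.timeout, 3))
```

A single slow sample moves the estimate only a tenth of the way toward it, but the deviation term widens the timeout noticeably, which is the "larger safety margin" effect.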


A problem

  • When there are retransmissions, it is unclear if the ACK is for the original transmission or for a retransmission.

    • How do we overcome this?

The Karn/Partridge Algorithm

  • Take SampleRTT measurements only for segments that have been sent once!

  • This eliminates the possibility that wrong RTT estimates are factored into the estimation.

  • Another change: each time TCP retransmits, it sets the next timeout to 2 × the last timeout. This is called exponential back-off (primarily for avoiding congestion).
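Both rules fit in a few lines. The sketch below is illustrative only: `KarnTimer` and its method names are hypothetical, and the caller is assumed to know how many times each segment was transmitted.

```python
class KarnTimer:
    """Tracks the retransmission timeout under the Karn/Partridge rules."""

    def __init__(self, initial_timeout: float):
        self.timeout = initial_timeout
        self.samples = []                 # SampleRTT values that were accepted

    def on_retransmit(self) -> None:
        self.timeout *= 2                 # exponential back-off

    def on_ack(self, sample_rtt: float, transmissions: int) -> None:
        if transmissions == 1:            # only segments sent exactly once
            self.samples.append(sample_rtt)
        # ACKs for retransmitted segments are ambiguous and are ignored.


timer = KarnTimer(initial_timeout=1.0)
timer.on_retransmit()
timer.on_retransmit()
print(timer.timeout)
```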

Jacobson/Karels Algorithm

  • An issue with the Karn/Partridge scheme is that it does not take into account the variation between RTT samples.

  • New method proposed: the Jacobson/Karels algorithm.

  • EstimatedRTT = EstimatedRTT + d × Difference

    • Difference = SampleRTT - EstimatedRTT

  • Deviation = Deviation + d × (|Difference| - Deviation)

  • Timeout = m × EstimatedRTT + f × Deviation

  • The values of m and f are chosen based on experience: typically m = 1 and f = 4.
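Expanding the two update rules shows that they are the same exponential weighted moving averages used for EstimatedRTT and Deviation earlier, with d playing the role of x (this identification of d with x is an observation, not something the slides state):

```latex
\begin{align*}
\text{EstimatedRTT} + d\,(\text{SampleRTT} - \text{EstimatedRTT})
  &= (1-d)\,\text{EstimatedRTT} + d\,\text{SampleRTT} \\
\text{Deviation} + d\,\bigl(|\text{Difference}| - \text{Deviation}\bigr)
  &= (1-d)\,\text{Deviation} + d\,|\text{Difference}|
\end{align*}
```

With m = 1 and f = 4, the timeout becomes EstimatedRTT + 4 × Deviation.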

Silly Window Syndrome

  • Suppose an MSS worth of data is collected and the advertised window is MSS/2.

  • What should the sender do? Transmit half-full segments, or wait to send a full MSS when the window opens?

  • Early implementations were aggressive: transmit MSS/2.

  • Doing this aggressively would consistently result in small segment sizes, called the Silly Window Syndrome.


Issues

  • We cannot eliminate the possibility of small segments being sent.

  • However, we can introduce methods to coalesce small chunks.

    • Delaying ACKs: the receiver does not send ACKs as soon as it receives segments.

      • How long to delay? Not very clear.

    • The ultimate solution falls to the sender: when should I transmit?

Nagle’s Algorithm

  • If sender waits too long --> bad for interactive connections.

  • If it does not wait long enough -- silly window syndrome.

  • How do we solve this?

  • Timer -- clock based

    • If both available data and Window ≥ MSS, send full segment.

    • Else, if there is unACKed data in flight, buffer new data until ACK returns.

    • Else, send new data now.

  • Note: the socket interface allows applications to turn off Nagle’s algorithm by setting the TCP_NODELAY option.
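The three rules translate directly into a decision function. This is a sketch, not kernel behaviour: `nagle_decision` and `disable_nagle` are hypothetical helper names, and the returned strings are only labels. The `setsockopt` call with `socket.TCP_NODELAY` is the standard way to switch the algorithm off for one socket in Python.

```python
import socket


def nagle_decision(available: int, window: int, mss: int,
                   unacked_in_flight: bool) -> str:
    """Return what the sender does with newly available application data."""
    if available >= mss and window >= mss:
        return "send full segment"
    if unacked_in_flight:
        return "buffer until ACK"
    return "send now"


def disable_nagle(sock: socket.socket) -> None:
    """Turn Nagle's algorithm off for one socket via TCP_NODELAY."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


print(nagle_decision(available=10, window=4000, mss=1460, unacked_in_flight=True))
```

Because the wait ends when an ACK returns, an interactive connection sends at most one small segment per round trip instead of one per keystroke.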

TCP Connection Management

Recall: TCP sender, receiver establish “connection” before exchanging data segments

  • initialize TCP variables:

    • seq. #s

    • buffers, flow control info (e.g. RcvWindow)

  • client: connection initiator

    Socket clientSocket = new Socket("hostname","port number");

  • server: contacted by client

    Socket connectionSocket = welcomeSocket.accept();

TCP Set-up

Three way handshake:

Step 1: client end system sends TCP SYN control segment to server

  • specifies initial seq #

Step 2: server end system receives SYN, replies with SYNACK control segment

  • ACKs received SYN

  • allocates buffers

  • specifies the server’s initial seq. # (server -> client direction)

Step 3: client replies with an ACK (using the server’s seq number)
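The sequence and ACK numbers carried by the three segments can be sketched as follows. The function name, the dictionary layout and the example initial sequence numbers are hypothetical; a real stack chooses its own initial sequence numbers. The one fact relied on beyond the steps above is that a SYN consumes one sequence number, so each side ACKs the peer's initial seq # plus one.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list[dict]:
    """List the three control segments exchanged during TCP set-up."""
    # Step 1: client sends SYN carrying its initial seq #.
    syn = {"from": "client", "flags": "SYN", "seq": client_isn}
    # Step 2: server ACKs the SYN and supplies its own initial seq #.
    synack = {"from": "server", "flags": "SYN+ACK",
              "seq": server_isn, "ack": client_isn + 1}
    # Step 3: client ACKs the server's SYN.
    ack = {"from": "client", "flags": "ACK",
           "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, synack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```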

TCP Connection Management (cont.)

Closing a connection:

client closes socket: clientSocket.close();

Step 1: client end system sends TCP FIN control segment to server

Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.

[Figure: client and server exchange FIN and ACK segments; the client then enters timed wait]

TCP Connection Management (cont.)

Step 3: client receives FIN, replies with ACK.

  • Enters “timed wait”: will respond with ACK to received FINs

Step 4: server receives ACK. Connection closed.

Last ACK is never ACK-ed!
