CS514: Intermediate Course in Operating Systems

Professor Ken Birman • TA: Ben Atkin • Lecture 4: Sept. 5

TCP Streams • One week ago we very briefly saw how TCP overcomes failures • TCP is the workhorse of the Internet • Under load, many routers drop all non-TCP traffic first! • This is because TCP is a “good citizen” • Every web operation uses its own TCP connection! • Today: look at TCP in more detail • In what sense is the Internet “the bottom half of the TCP protocol”?

TCP is a “stream” protocol • Basic concepts • Implementation issues, usual optimizations • Where are the costs? • Van Jacobson optimizations for TCP • Routers: RED, RSVP and RIO • Reliability and consistency

Streams concept • Reliable, point-to-point communication channel • Like a telephone connection: • Information received in order sent, no loss or duplication • Call setup required before communication is possible (in contrast with basic message transport via UDP) • No message structure: abstraction is a stream of bytes • Automatic flow control, error correction

TCP sliding window • [Figure sequence: sender on one side, receiver on the other, a window of k slots at each end] • Sender provides data; the window has k “segments”, initially empty at both ends • IP packets carry segments mi, mi+1, ..., mi+k from sender to receiver • Receiver replies with acks and nacks; sender resends missing data • Receiver's window may have gaps (e.g. mi+k-2 and mi+k-3 present, others still missing) • When an acknowledgement is received, the segment number keeps incrementing but the slot number is reused (mi+k+1 takes the slot freed by mi) • Receiver consumes data
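The slot-reuse behaviour in the figure sequence can be sketched roughly as follows. This is a toy model, not TCP itself: the class name `SlidingWindowSender`, the cumulative-ack convention, and the "slot = segment number modulo k" mapping are assumptions chosen for illustration.

```python
class SlidingWindowSender:
    """Toy sender: at most k unacknowledged segments, slots reused modulo k."""

    def __init__(self, k):
        self.k = k
        self.slots = [None] * k   # slot index = segment number % k (assumed mapping)
        self.next_seq = 0         # next segment number to assign
        self.base = 0             # oldest unacknowledged segment number

    def can_send(self):
        # The window is full once k segments are outstanding.
        return self.next_seq - self.base < self.k

    def send(self, data):
        if not self.can_send():
            raise RuntimeError("window full: sender must wait")
        seq = self.next_seq
        self.slots[seq % self.k] = data   # keep a copy for possible resend
        self.next_seq += 1
        return seq

    def ack(self, seq):
        """Cumulative ack: everything up to and including seq was received."""
        while self.base <= seq and self.base < self.next_seq:
            self.slots[self.base % self.k] = None   # free the slot for reuse
            self.base += 1

    def resend(self, seq):
        """Return the buffered data for a nacked segment still in the window."""
        if not (self.base <= seq < self.next_seq):
            raise KeyError(seq)
        return self.slots[seq % self.k]
```

With k = 3, sending three segments fills the window; acknowledging segment 0 frees slot 0, and segment 3 then lands in that same slot while the segment number keeps counting up.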

Typical implementation issues? • When to send the ack • Send early: inefficient, channel clogged with acks • Send late: sender side fills window and waits • When to send the nack • Send early: sender will send duplicates of all msgs • Send late: long delay waiting for desired data • How big to make the window • Send messages in “bursts”?

Where are the costs? • Excess packets sent/received: very costly • Hence want minimal number of acks, nacks • Also want to avoid excess retransmissions • Notice the “tension” between sending acks/nacks or retransmitting too soon and doing so too late • Too soon: consumes bandwidth • Too late: leaves processes idle

Costs (cont) • Delays on sender side: • Overheads associated with scheduling (e.g. if window fills up) • Avoiding “nervous” scheduling: • High-water/low-water mark scheme lets sender sit idle until there are several window slots free • Ideally, seek window size at which sender, receiver are rate matched and neither ever waits
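One way to picture the high-water/low-water scheme is as a gate with hysteresis. The sketch below is an illustration under assumptions: the class name `WatermarkGate` and the specific thresholds are invented here, and a real implementation would block and wake a thread rather than return a flag.

```python
class WatermarkGate:
    """Blocks the sender at the high-water mark, releases it at the low-water mark."""

    def __init__(self, low, high):
        if not 0 <= low < high:
            raise ValueError("need 0 <= low < high")
        self.low, self.high = low, high
        self.blocked = False

    def update(self, in_flight):
        """Return True if the sender may run, given in_flight occupied slots."""
        if in_flight >= self.high:
            self.blocked = True      # window (nearly) full: put the sender to sleep
        elif in_flight <= self.low:
            self.blocked = False     # several slots are free again: wake it up
        # Between the marks the previous state is kept, which avoids
        # waking the sender for every single freed slot.
        return not self.blocked
```

The gap between the two marks is what suppresses “nervous” scheduling: a sender blocked at 8 outstanding segments stays idle at 5 and only resumes once the count falls to the low mark.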

Costs (cont) • Delays on receiver side • Want a large enough window so that any error correction is “in the future” for receiver • Don’t want to delay nacks too long (else retransmission delayed too long) • Nervous scheduling less of an issue here • Don’t use high-water/low-water scheme in receiver

Timed approach • Measure round-trip time (e.g. perhaps 1ms) • Track rate of transmission for recent past • Use to calibrate various constants: • Nack if a missing packet is late by 50% of expected time • Calibrate window to be 50-75% full in steady state • Experience: very hard to make it work; variability in network load/latencies too big
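The "nack if late by 50% of expected time" rule could be sketched as below. This is only an illustration: the exponentially weighted average, the smoothing factor `alpha`, and the name `NackTimer` are assumptions, since the slide does not say how the expected time is tracked.

```python
class NackTimer:
    """Decides when a missing packet is late enough to justify a nack."""

    def __init__(self, initial_expected, alpha=0.125, lateness=0.5):
        self.expected = initial_expected  # smoothed estimate of the expected arrival time
        self.alpha = alpha                # weight of the newest sample (assumed value)
        self.lateness = lateness          # 0.5 means "late by 50% of expected time"

    def observe(self, sample):
        # Blend the new measurement into the running estimate.
        self.expected = (1 - self.alpha) * self.expected + self.alpha * sample

    def should_nack(self, waited):
        # Nack once we have waited more than expected * (1 + lateness).
        return waited > (1 + self.lateness) * self.expected
```

The slide's caveat shows up directly in this model: when measured times vary widely, `expected` lags behind reality and the threshold fires either too early or too late.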

Van Jacobson optimizations • Dynamically adjust window size: while no loss detected, repeatedly increase size (linearly) • Detect loss: halve size (“exponential” backoff: repeated losses shrink the window geometrically) • Experience is very positive, many TCPs use this • Also optimize to suppress unchanging header fields
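The window adjustment rule (linear increase, halving on loss) can be sketched in a few lines. The function names, the increment of one segment per step, and the minimum window of 1 are assumptions for illustration; real TCP stacks add slow start and other refinements that this sketch omits.

```python
def next_window(window, loss_detected, increment=1, minimum=1):
    """One step of additive increase, multiplicative decrease."""
    if loss_detected:
        return max(minimum, window // 2)   # halve on loss, never below minimum
    return window + increment              # otherwise grow linearly


def trace(initial, events):
    """Window sizes over a sequence of loss flags (True means loss detected)."""
    sizes = [initial]
    for loss in events:
        sizes.append(next_window(sizes[-1], loss))
    return sizes
```

For example, `trace(4, [False, False, True, False])` returns `[4, 5, 6, 3, 4]`: two loss-free steps grow the window to 6, the loss halves it to 3, and growth resumes.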

Dealing with failures • Packets lost, duplicated, out of order: easy, just use sequence numbers (TCP calls these “segment” numbers) • Sender or receiver fails, or line breaks: • After excessive retransmissions, or • After excessive wait for missing data, or • After not seeing “keepalives” for too long ... break the connection and report “end of file”
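The three "break the connection" conditions can be modelled as a simple monitor. This is a sketch under assumptions: the class name `ConnectionMonitor`, the limits, and the choice to fold "excessive wait for missing data" and "no keepalives" into a single silence timeout are all illustrative, not how any particular TCP stack does it.

```python
class ConnectionMonitor:
    """Declares a connection broken after too many retransmissions or too much silence."""

    def __init__(self, max_retransmits=5, silence_timeout=30.0):
        self.max_retransmits = max_retransmits    # assumed limit
        self.silence_timeout = silence_timeout    # assumed limit, in seconds
        self.retransmits = 0
        self.last_heard = 0.0

    def on_retransmit(self):
        self.retransmits += 1

    def on_receive(self, now):
        # Any data, ack, or keepalive from the peer resets both conditions.
        self.last_heard = now
        self.retransmits = 0

    def is_broken(self, now):
        too_many = self.retransmits > self.max_retransmits
        too_quiet = now - self.last_heard > self.silence_timeout
        return too_many or too_quiet
```

Note that each endpoint runs its own monitor with its own clock and its own view of the traffic, so nothing forces two endpoints, or two different connections, to reach the same verdict at the same time.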

Problems with this approach? • Channel can break because of a transient condition! • Example: overloaded machine, connection that temporarily fails, router crashes and must reboot itself (all are relatively common conditions) • Systems with many TCP channels: some may break but others stay connected!

Inconsistently broken TCP channels • [Figure: clients, primary, backup] • Clients initially connected to primary, which keeps backup up to date (for example, in a database system)

Inconsistently broken TCP channels • [Figure: clients, primary, backup] • Transient problem causes some links to break but not all • Backup thinks it is now primary, primary thinks backup is down

Inconsistently broken TCP channels • [Figure: clients, primary, backup] • Some clients still connected to primary, but one has switched to backup and one is completely disconnected from both

Why should this matter? • Suppose that primary and backup are a service used for air traffic control • Service tells controllers which parts of airspace are “available” for routing flights towards airport • Primary and backup may try and give different controllers access to the same airspace! Each thinks it is “in charge” for the system as a whole!

Subtle semantics questions • Are the “reliability semantics” of TCP actually different from those of RPC? • In both cases, what you “know” is limited to what has been explicitly acknowledged • Both can report “failures” when none has occurred • Ultimately, TCP and RPC give same guarantees! • Many systems run RPC over TCP as the “reliable” RPC option. Is this different from normal RPC?

Reliability/Consistency summary • TCP connections can overcome loss of individual packets in communication layer • RPC protocols also overcome such loss • Both report failures inconsistently • Not clear how either could be used to implement a “safe” primary-backup server for our ATC example!

TCP and Router Issues • [Figure: labels “Overload!” and “Server”]

TCP and Router Issues • Designers of routers need to deal with overload • Very hard to predict! • Most studies show that load on routers is nearly random • At any point in time, most load is from some set of TCP connections • Goal: TCP flow control should kick in before router is totally overloaded

Some options? • We could just wait for the overload to go away • Eventually load will presumably drop • Or routes will adapt • But this could take a long time • Jahanian study: can take hours for route changes to propagate • Usually, however, routes adapt within a few minutes if better options exist

Some options? • The router could send some sort of “I’m getting overloaded” message back to the TCP sender • This could be done: TCP packets are recognizable by their IP headers • But it might be slow and when one router is overloaded, perhaps many are – potential for a storm of such messages • Also seems to violate end-to-end philosophy • Can we signal without extra msgs?

TCP and Router Issues • [Figure: labels “Overload!”, “Server”, and “Ouch! Slow down”]

Some options • Also, keep in mind that at any point in time, a router might be handling thousands of TCP connections! • The networks crowd calls them “flows” • So the router, faced with load, might have to send thousands of separate “ouch” messages!

Some options? • What about adding a bit to the TCP/IP header: “encountered an overloaded router” • Router would set the bit if it was overloaded • But during overload, packets often must be dropped • Also, the bit would be seen at the destination… not the sender

TCP and Router Issues • [Figure: labels “Overload!”, “Ouch!”, and “Server”]

Some options • Can we detect problems without extra messages? • Sender might notice a problem because of NACKs • Receiver could notice • Missing packets • Changing inter-packet spacing (assumes that TCP normally achieves a very regular spacing, which only happens under good conditions) • Problem is that by the time we notice these things, the router may be in deep doo-doo!

Random Early Detection • Work was done by Van Jacobson with Sally Floyd • They used a network simulator to understand how it would work. • You can use simulators too, for your projects • The best one is from Estrin’s group and is called NS-2 (widely used to evaluate network protocols) • Abbreviated as RED

Random Early Detection • Idea is very simple • Router senses that load is increasing • It simply notices that it has less available memory for buffering • This is because packets are entering faster than they can be forwarded • Picks a packet at random and discards it • Even though perhaps it could be forwarded • Takes “unreliability” to a new level! • E.g. “upper level of the bridge is crowded, so toss a few cars off the edge”

Random Early Detection • Receiver detects the loss and sends a NACK • The network isn’t completely overloaded yet so the NACK gets through • Sender chokes back • Often combined with flow control that senses changing inter-packet spacing
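A simplified sketch of the early-drop decision follows. It uses the commonly described form of RED, in which drop probability rises linearly with a smoothed queue length between two thresholds. The threshold values, `max_p`, the smoothing `weight`, and the class name `RedQueue` are illustrative assumptions, and the sketch drops the arriving packet rather than picking a random queued one.

```python
import random
from collections import deque


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability as a function of the averaged queue length."""
    if avg_queue < min_th:
        return 0.0      # lightly loaded: never drop early
    if avg_queue >= max_th:
        return 1.0      # heavily loaded: drop every arrival
    # In between, the probability rises linearly from 0 to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


class RedQueue:
    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.2, rng=None):
        self.min_th, self.max_th, self.max_p = min_th, max_th, max_p
        self.weight = weight              # smoothing factor for the average
        self.avg = 0.0
        self.queue = deque()
        self.rng = rng or random.Random()

    def enqueue(self, packet):
        """Return True if the packet was queued, False if it was dropped early."""
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        p = red_drop_probability(self.avg, self.min_th, self.max_th, self.max_p)
        if self.rng.random() < p:
            return False                  # dropped even though buffer space remains
        self.queue.append(packet)
        return True

    def dequeue(self):
        return self.queue.popleft() if self.queue else None
```

The key point from the slides survives the simplification: drops begin while buffer space is still available, so the resulting loss signal reaches the sender before the router is completely overloaded.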

How Internet Companies think of the Network • [Figure: the network drawn as “Layers 1-3”]

How Internet Companies think of the Network • [Figure: TCP and Server at Layer 4 (“end-to-end”), above “Layers 1-3”, with the message “ease off!”]

TCP is a good citizen • We view the Internet as the bottom half of the TCP protocol • And TCP is a good citizen that behaves itself • Chokes back as requested • An elegant dialog between the network and the protocol • Notice that it is entirely stateless • Cooperation between “network” protocols, not “distributed system”

TCP issues • We’ve seen that connections can break inappropriately • And now have seen that TCP can choke back because of congestion • What if we want to run audio or video over the Internet?

Styles of Audio/Video • Asynchronous • Play back a pre-recorded CD or a radio broadcast • Download a copy of a short news video • For these cases, we don’t have any “real time” requirements • Synchronous or real-time • More like a telephone conversation • Need the data with short latencies

TCP challenges • TCP works well for file transfer, fetching web pages, email… etc • The technology is not very good for any sort of real-time use • Telephone over the Internet • Media delivery that lasts a long time and can’t be transferred in advance, like a live broadcast • Also, not very robust against various forms of attack by intruders

Research on better TCP • One idea is to reserve resources • RSVP: Resource Reservation Protocol • Proposed by Floyd and others • Idea is to set aside resources needed for this TCP stream • What’s a resource? • Buffering space in routers • Guarantee of a percentage of bandwidth on the links out of routers

RSVP • How it works: • When making the TCP connection, user specifies desired quality of service (QoS) • A reservation request is sent to the destination • Hop by hop we set aside the needed resources • Called a “lease” • Upon successful traversal of the network, the TCP session can start • Now, each time a packet arrives, router must do flow classification

Keeping RSVP stateless • How can we avoid a form of shared state between clients and routers? • Leases are designed to vanish if not renewed • They have a timeout, perhaps 10s • Renewal benefits from QoS properties of the connection!
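The lease idea can be sketched as a table of expiry times that a router purges on its own. The class name `LeaseTable`, the explicit `now` argument (used instead of a real clock so the example is deterministic), and the flow identifiers are assumptions for illustration.

```python
class LeaseTable:
    """Soft-state reservations: each lease vanishes unless it is renewed in time."""

    def __init__(self, timeout=10.0):
        self.timeout = timeout    # lease lifetime in seconds (the slide suggests ~10s)
        self.expiry = {}          # flow id -> time at which the lease lapses

    def renew(self, flow_id, now):
        # Creating and renewing a lease are the same operation.
        self.expiry[flow_id] = now + self.timeout

    def is_reserved(self, flow_id, now):
        return self.expiry.get(flow_id, float("-inf")) > now

    def expire(self, now):
        """Remove lapsed leases and return their flow ids."""
        dead = [f for f, t in self.expiry.items() if t <= now]
        for f in dead:
            del self.expiry[f]
        return dead
```

Because a lease that is not renewed simply disappears, a router never needs to be told that a client has crashed or a route has moved; the stale reservation cleans itself up.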

RSVP criticisms? • Doesn’t work well if network routes change dynamically or failures occur • Router slows down because • Flow classification is hard work • Needs enough resource for guarantees • Very hard to bill the user • Resources cost real money! • In this case, many ISPs participate in session: how to split costs? • ISPs may not want to disclose their route information!

RSVP: A dead standard? • Everyone knows the acronym • Corresponds to an IETF standard • But seems unlikely to be used • Core issue is cost • Number of reservations could rise with number of endpoints squared! • And most resource is mostly unused… • And the billing issue may sound silly, but not to ISPs • Road Runner is a typical ISP: Internet Service Provider (seller of Internet access)

How else can we get QoS? • One could argue that RSVP is not really an end-to-end solution • Problem is that routers have a form of shared state, even if only leased • Led to proposals by Clark and others at MIT for an end-to-end approximation with similar behavior • Called Diffsrv: Differentiated Services

Diffsrv idea • Basic idea is that reservation is tracked at entry to the network • Need a form of network service to figure out if reservation, theoretically, can be satisfied • Packets are marked “in profile” or “out of profile” • E.g., I reserve 100 kbit/s and am in profile if I send < 100 kbit/s, out of profile if I exceed my reservation • Requires a single bit per packet
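Marking at the network edge could look roughly like the following. The slide does not say how the sending rate is measured; the token bucket used here, the burst allowance, and the name `ProfileMarker` are assumptions picked because a token bucket is a common way to meter a rate.

```python
class ProfileMarker:
    """Marks each packet "in" or "out" of profile against a reserved rate."""

    def __init__(self, rate_bits_per_s, burst_bits):
        self.rate = rate_bits_per_s   # reserved rate, e.g. 100_000 for 100 kbit/s
        self.burst = burst_bits       # assumed allowance for short bursts
        self.tokens = burst_bits      # credit currently available, in bits
        self.last = 0.0               # time of the previous packet

    def mark(self, size_bits, now):
        # Credit accumulates at the reserved rate, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if size_bits <= self.tokens:
            self.tokens -= size_bits
            return "in"               # within the reservation
        return "out"                  # exceeds the reservation
```

The returned marking is the single bit per packet that the slide mentions; routers further along the path only need to read it, not track the reservation themselves.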

RIO: RED with In/Out bits • Routers now implement RED but selectively drop out-of-profile (or unreserved) packets in preference to in-profile packets • In limit, router drops all packets except in-profile packets • Statistically should average out much as if real reservations were done… but…
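The preference for dropping out-of-profile packets can be sketched by giving each marking its own drop curve. Everything numeric here is hypothetical: the threshold pairs, the maximum probabilities, and the use of one shared queue average for both markings are simplifying assumptions made for illustration.

```python
def drop_probability(avg_queue, min_th, max_th, max_p):
    """Linear early-drop curve: 0 below min_th, 1 at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical parameters as (min_th, max_th, max_p): out-of-profile
# packets start being dropped earlier and more aggressively.
RIO_PARAMS = {
    "in": (40, 70, 0.02),
    "out": (10, 30, 0.5),
}


def rio_drop_probability(avg_queue, marking):
    """Drop probability for a packet marked "in" or "out" of profile."""
    return drop_probability(avg_queue, *RIO_PARAMS[marking])
```

With these numbers, a queue averaging 35 packets drops every out-of-profile packet while still forwarding every in-profile one, which is the limiting behaviour the slide describes.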
