CS 620 Advanced Operating Systems

CS 620 Advanced Operating Systems Lecture 7 – Communication Professor Timothy Arndt BU 331

Layered Protocols • As we saw previously, network software is often structured as a layered protocol suite. We will now examine these protocols in somewhat more detail. • Protocol: An agreement between communicating parties on how communication is to proceed. • Error correction codes. • Blocksize. • Ack/Nak.

Layered Protocols • Layered protocol: The protocol decisions concern very different things • How many volts is 1 or zero? How wide is the pulse? (low level details) • Error correction • Routing • Sequencing (higher level details) • As a result you have many routines that work on the various aspects. They are called layered.

Layered Protocols • Layer X of the sender acts as if it is directly communicating with layer X of the receiver, but in fact it is communicating with layer X-1 of the sender. • Similarly, layer X of the sender acts as a virtual layer X+1 of the receiver to layer X+1 of the sender. • A famous example is the ISO OSI Reference Model (International Organization for Standardization, Open Systems Interconnection).

Layered Protocols • So, for example, the network layer sends messages intended for the other network layer but in fact sends them to the data link layer. • Also, the network layer must accept messages from the transport layer, which it then sends to the other network layer (really its own data link layer). • What a layer really does to a message it receives is add a header (and maybe a trailer) that is to be interpreted by its corresponding layer in the receiver.

Layered Protocols • So the network layer adds a header (in front of the transport layer's header) and sends to the other network layer (really its own data link layer that adds a header in front of the network layer's and a trailer). • So headers get added as you go down the sender's layers (often called the Protocol Stack or Protocol Suite). • They get used (and stripped off) as the message goes up the receiver's stack.
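The header handling described above can be sketched in a few lines. This is only an illustration: the bracketed header format and the `send_down` / `receive_up` names are made up for the example, and real headers carry addresses, lengths, checksums and so on.

```python
# Hypothetical layer list and header format, for illustration only.
# The physical layer is omitted because it adds no header.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "datalink"]
TRAILER = b"[dl-trailer]"

def send_down(payload: bytes) -> bytes:
    """Go down the sender's stack: each layer puts its header in front."""
    msg = payload
    for name in LAYERS:
        msg = f"[{name}]".encode() + msg
        if name == "datalink":
            msg += TRAILER          # only the data link layer adds a trailer
    return msg

def receive_up(wire: bytes) -> bytes:
    """Go up the receiver's stack: each layer strips its own header."""
    msg = wire
    for name in reversed(LAYERS):
        header = f"[{name}]".encode()
        if not msg.startswith(header):
            raise ValueError(f"missing {name} header")
        msg = msg[len(header):]
        if name == "datalink":
            if not msg.endswith(TRAILER):
                raise ValueError("missing data link trailer")
            msg = msg[:-len(TRAILER)]
    return msg
```

Calling `receive_up(send_down(b"hello"))` gives back `b"hello"`; on the wire the data link header is outermost because it was added last.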

Layered Protocols • It all starts with process A sending a message. By the time it reaches the wire it has 6 headers (the physical layer doesn't add one - Why?) and one trailer. • The nice thing is that the layers are independent. You can change one layer and not change the others. • Physical layer: hardware, i.e. voltages, speeds, connectors. • Data link layer: Error correction and detection. "Group the bits into units called frames".

Layered Protocols • Frames contain error detection (and correction) bits. • This is what the pair of data link layers do when viewed as an extension of the physical. • But when being used, the sending DL layer gets a packet from the network layer and breaks it into frames and adds the error detection bits.
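A minimal sketch of the sending and receiving data link layers, assuming a CRC-32 trailer (detection only, no correction) and an unrealistically small frame size so the example is easy to follow. The names `to_frames`, `check_frame` and `FRAME_PAYLOAD` are invented for the illustration.

```python
import struct
import zlib
from typing import List, Optional

FRAME_PAYLOAD = 4  # bytes of packet data per frame (hypothetical, tiny on purpose)

def to_frames(packet: bytes) -> List[bytes]:
    """Break a network-layer packet into frames, each followed by a CRC-32."""
    frames = []
    for off in range(0, len(packet), FRAME_PAYLOAD):
        chunk = packet[off:off + FRAME_PAYLOAD]
        frames.append(chunk + struct.pack("!I", zlib.crc32(chunk)))
    return frames

def check_frame(frame: bytes) -> Optional[bytes]:
    """Return the frame's data if the CRC matches, else None (frame damaged)."""
    chunk, trailer = frame[:-4], frame[-4:]
    (crc,) = struct.unpack("!I", trailer)
    return chunk if zlib.crc32(chunk) == crc else None
```

A receiver that gets `None` back would send a nak (or simply not ack) so that the sender retransmits that frame.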

Data Link Layer • Discussion between a receiver and a sender in the data link layer. 2-3

Layered Protocols • Network layer: Routing. • Connection oriented network-layer protocol: X.25 or ATM. • Send a message to destination and establish a route that will be used for further messages during this connection (a connection number is given). • Like a telephone call. • Connectionless: IP (Internet Protocol). • Each packet (message between the network layers) is routed separately. • Like the post office.

Layered Protocols • Transport layer: make reliable and ordered (but not always). • Break incoming message into packets and send to corresponding transport layer (really send to ...). They are sequence numbered. • Header contains info as to which packets have been sent and received. • These sequence numbers are for the end to end message.

Layered Protocols • E.g., if grail.cba.csuohio.edu sends a message to www.microsoft.com, the transport layer breaks the message into packets and numbers the packets. • These packets may take different routes. • On any one hop the data link layer keeps the frames ordered. • If you use a connection-oriented network layer there is little for the transport layer to do. • If you use IP for the network layer, there is a lot to do. • If we use connection-oriented TCP for the transport layer of a client-server system, it is slower than need be. • Can use transactional TCP.
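The end-to-end sequence numbering can be sketched as follows. The function names and the packet representation (a `(seq, data)` tuple) are assumptions made for the example, not any real protocol's format.

```python
from typing import List, Tuple

def packetize(message: bytes, size: int) -> List[Tuple[int, bytes]]:
    """Sending transport layer: split the message and number the packets."""
    return [(seq, message[off:off + size])
            for seq, off in enumerate(range(0, len(message), size))]

def reassemble(packets: List[Tuple[int, bytes]]) -> bytes:
    """Receiving transport layer: packets may arrive out of order or twice."""
    by_seq = dict(packets)                      # duplicates collapse onto one key
    return b"".join(by_seq[seq] for seq in sorted(by_seq))
```

Because the packets may take different routes, the receiver cannot rely on arrival order; the sequence numbers are what restore it.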

Client-Server TCP • Normal operation of TCP. • Transactional TCP. 2-4

Layered Protocols • Session Layer: dialog and synchronization. • Dialog control • Synchronization facilities • Presentation layer: Describes "meaning" of fields. • Record definition • Application layer: For specific applications (e.g. mail, news, ftp). • Middleware logically resides in the application layer, but contains functionality that is quite general • Authentication • Authorization • Multicast, etc. • This leads to a slightly modified reference model

Middleware Protocols • An adapted reference model for networked communication. 2-5

Remote Procedure Call (RPC) • Developed by Birrell and Nelson (1984). • Recall how different the client code for copying a file was from the normal centralized (uniprocessor) code. • Let’s make the client server request-reply look like a normal procedure call and return. • Notice that getchar in the centralized version turns into a read system call. The following is for Unix: • read looks like a normal procedure to its caller.

Remote Procedure Call (RPC) • read is a user mode program. • read manipulates registers and then does a trap to the kernel. • After the trap, the kernel manipulates registers and then calls a C-language routine, and lots of work gets done (drivers, disks, etc). • After the I/O, the process gets unblocked, the kernel read manipulates registers, and returns. The user mode read manipulates registers and returns to the original caller. • Let’s do something similar with request reply:

Remote Procedure Call (RPC) • User (client) does a subroutine call to getchar (or read). • Client knows nothing about messages. • We link in a user mode program called the client stub (analogous to the user mode read above). • This takes the parameters to read and converts them to a message (marshalls the arguments). • Sends a message to machine containing the server directed to a server stub. • Does a blocking receive (of the reply message).

Remote Procedure Call (RPC) • The server stub is linked with the server. • It receives the message from the client stub. • Unmarshalls the arguments and calls the server (as a subroutine). • The server procedure does what it does and returns (to the server stub). • Server knows nothing about messages. • Server stub now converts the return value to a reply message sent to the client stub. • Marshalls the result.

Remote Procedure Call (RPC) • Client stub unblocks and receives the reply. • Unmarshalls the result. • Returns to the client. • Client believes (correctly) that the routine it calls has returned just like a normal procedure does.
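The whole round trip can be sketched in one process. Everything here is an assumption made for illustration: JSON stands in for the marshalling format, `transport` stands in for the network send plus blocking receive, and the "file" is a hard-coded string.

```python
import json

def server_read(fd: int, nbytes: int) -> str:
    """The real server procedure. It knows nothing about messages."""
    files = {3: "hello, world"}          # pretend file contents, keyed by descriptor
    return files[fd][:nbytes]

def server_stub(request: bytes) -> bytes:
    call = json.loads(request)                       # unmarshal the arguments
    result = server_read(*call["args"])              # ordinary subroutine call
    return json.dumps({"result": result}).encode()   # marshal the result

def transport(request: bytes) -> bytes:
    """Stand-in for: send to the server machine, then block for the reply."""
    return server_stub(request)

def read(fd: int, nbytes: int) -> str:
    """Client stub: looks like the local routine to its caller."""
    request = json.dumps({"proc": "read", "args": [fd, nbytes]}).encode()
    reply = transport(request)                       # blocking receive of the reply
    return json.loads(reply)["result"]               # unmarshal and return
```

The client simply writes `read(3, 5)` and gets `"hello"` back; neither the client nor `server_read` ever sees a message.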

Passing Value Parameters (1) • Steps involved in doing remote computation through RPC 2-8

Remote Procedure Call (RPC) • Heterogeneity: Machines have different data formats. • How can we handle these differences in RPC? • Have conversions between all possibilities. • Done during marshalling and unmarshalling. • Adopt a standard and convert to/from it.

Passing Value Parameters (2) • Original message on the Pentium • The message after receipt on the SPARC • The message after being inverted. The little numbers in boxes indicate the address of each byte
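The byte-order problem, and the "adopt a standard and convert to/from it" fix, can be demonstrated with Python's `struct` module. The helper names are made up; network byte order (big-endian) is used as the agreed standard in this sketch.

```python
import struct

def misread_on_big_endian(value: int) -> int:
    """A little-endian machine sends raw bytes; a big-endian machine reads them."""
    raw = struct.pack("<I", value)        # sender's in-memory layout
    return struct.unpack(">I", raw)[0]    # receiver interprets the same bytes

def marshal_int(value: int) -> bytes:
    """Convert to the agreed standard (network byte order) while marshalling."""
    return struct.pack("!I", value)

def unmarshal_int(wire: bytes) -> int:
    """Convert from the standard back to the local representation."""
    return struct.unpack("!I", wire)[0]
```

Here `misread_on_big_endian(5)` yields 83886080 (5 × 2^24), while `unmarshal_int(marshal_int(5))` yields 5 on any machine. Blindly inverting every 4-byte word would fix integers but scramble character strings, which is why the stubs need to know the type of each field.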

Remote Procedure Call (RPC) • Pointers: Avoid them for RPC! • Can put the object pointed to into the message itself (assuming you know its length). • Convert call-by-reference to copyin/copyout • If we have in or out parameters (instead of in out) can eliminate one of the copies • Change the server to handle pointers in a special way. • Callback to client stub
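A sketch of replacing call-by-reference with copyin/copyout. The procedure `double_in_place` and both stub names are hypothetical; a Python list plays the role of the object the pointer refers to, and a direct function call plays the role of the network.

```python
from typing import List

def double_in_place(buf: List[int]) -> None:
    """Server procedure that writes through its 'pointer' argument."""
    for i in range(len(buf)):
        buf[i] *= 2

def remote_double_server_stub(message: List[int]) -> List[int]:
    local = list(message)        # copy-in: the server works on its own copy
    double_in_place(local)
    return local                 # copy-out: ship the modified object back

def remote_double(buf: List[int]) -> None:
    """Client stub: sends the object itself, then overwrites the original."""
    reply = remote_double_server_stub(list(buf))   # object placed in the message
    buf[:] = reply                                 # copy the result into place
```

If the parameter were known to be input-only, the copy back could be skipped; if output-only, the copy into the request could be skipped.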

Registering and name servers • As we said before, we can use a name server. • This permits the server to move using the following process. • deregister from the name server • move • reregister • This is sometimes called dynamic binding.

Registering and name servers • The client stub calls the name server (binder) the first time to get a handle to use for the future. • There is a callback from the binder to the client stub if the server deregisters or we could have the attempt to use the handle fail so that the client stub will go to the binder again.
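A sketch of the second option (let the stale handle fail and go back to the binder). The `Binder` and `ClientStub` classes are invented for the example, and a dictionary from address to callable stands in for the network.

```python
class Binder:
    """Toy name server: maps a service name to the server's current address."""
    def __init__(self):
        self._table = {}
    def register(self, name, address):
        self._table[name] = address
    def deregister(self, name):
        self._table.pop(name, None)
    def lookup(self, name):
        return self._table[name]

class ClientStub:
    def __init__(self, binder, name, network):
        self.binder, self.name, self.network = binder, name, network
        self.handle = None                       # cached after the first lookup
    def call(self, arg):
        if self.handle is None:
            self.handle = self.binder.lookup(self.name)
        server = self.network.get(self.handle)
        if server is None:                       # stale handle: the server moved
            self.handle = self.binder.lookup(self.name)
            server = self.network[self.handle]
        return server(arg)
```

A server moves by deregistering, relocating, and reregistering; the next call through the stub notices the dead handle and rebinds without the client code changing.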

RPC Failures • This gets hard and ugly. • Can't find the server. • Need some sort of out-of-band response from the client stub to the client. • Ada exceptions • C signals • Multithread the client and start the "exception" thread. • This loses transparency (centralized systems don't have this).

RPC Failures • Lost request message. • This is easy if known. That is, if we are sure the request was lost. • Also easy if idempotent and we think it might be lost. • Simply retransmit the request. • Assumes the client still knows the request. • Lost reply message. • If it is known the reply was lost, have server retransmit.

RPC Failures • Assumes the server still has the reply. • How long should the server hold the reply? • Wait forever for the reply to be ack'ed? No! • Discard after "enough" time. • Discard after we receive another request from this client. • Ask the client if the reply was received. • Keep resending reply. • What if we are not sure of whether we lost the request or the reply? • If the server is stateless, it doesn't know and the client can't tell! • If idempotent, simply retransmit the request.

RPC Failures • What if the server is not idempotent and can't tell if we lost the request or the reply? • Use sequence numbers so server can tell that this is a new request not a retransmission of a request it has already done. • Doesn't work for stateless servers. • Server crashes • Did it crash before or after doing some nonidempotent action? • Can't tell from messages.
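A sketch of duplicate suppression with sequence numbers. The `AccountServer` class is hypothetical; note that the `last` table is exactly the per-client state a stateless server does not have.

```python
class AccountServer:
    """Server with a non-idempotent operation (deposit)."""
    def __init__(self):
        self.balance = 0
        self.last = {}            # client id -> (sequence number, saved reply)

    def deposit(self, client, seq, amount):
        seen = self.last.get(client)
        if seen is not None and seen[0] == seq:
            return seen[1]        # retransmission: resend the reply, do not redo
        self.balance += amount    # new request: perform the action once
        reply = self.balance
        self.last[client] = (seq, reply)   # older reply for this client is discarded
        return reply
```

Keeping only the most recent reply per client corresponds to the "discard after we receive another request from this client" policy.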

RPC Failures • From databases, we get the idea of transactions and commits. • This really does solve the problem but is not cheap. • Fairly easy to get “at least once” (try request again if timer expires) or “at most once” (give up if timer expires) semantics. Hard to get “exactly once” without transactions. • To be more precise: a transaction either happens exactly once or not at all (sounds like at most once) and the client knows which.
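The two cheap semantics can be sketched as client-side retry policies. The names are invented, and `LoseFirstReply` is a test double that executes every request but loses the first reply, which is the case the client cannot distinguish from a lost request.

```python
class Timeout(Exception):
    """Raised by the (simulated) transport when no reply arrives in time."""

def at_least_once(send, request, max_tries=5):
    """Retry on timeout. The request may execute more than once."""
    for _ in range(max_tries):
        try:
            return send(request)
        except Timeout:
            continue
    raise Timeout("gave up after %d tries" % max_tries)

def at_most_once(send, request):
    """Single attempt. On timeout the caller does not know if it executed."""
    return send(request)

class LoseFirstReply:
    """Simulated server plus network: always executes, drops the first reply."""
    def __init__(self):
        self.executed = 0
    def __call__(self, request):
        self.executed += 1
        if self.executed == 1:
            raise Timeout("reply lost")
        return "ok"
```

With `at_least_once` the operation runs twice on this channel, which is harmless only if it is idempotent; with `at_most_once` it runs once but the client is left with a `Timeout` and no answer.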

RPC Failures • Client crashes • Orphan computations exist. • Again transactions work but are expensive. • We can have the rebooted client start another epoch and all computations of the previous epoch are killed and clients resubmit. • It is better to let old computations with owners that can be found continue. • This isn’t a great solution.

RPC Failures • An orphan may hold locks or might have done something not easily undone. • Serious programming is needed.

Implementation Issues • Protocol choice • Existing ones like UDP are designed for harder (more general) cases and so are not efficient. • Often developers of distributed systems invent their own protocol that is more efficient. • But of course they are all different. • On a LAN we would like large messages since they are more efficient and don't take so long considering the high data rate.

Implementation Issues • Acks • One per packet vs. one per message. • Called stop-and-wait and blast. • In former wait for each ack. • In blast keep sending packets until message finished. • Could also do a hybrid. • Blast but ack each packet. • Blast but request only those missing instead of general nak. • Called selective repeat.
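A sketch of blast with selective repeat, under simplifying assumptions: `deliver(seq)` is a hypothetical callback that returns whether packet `seq` got through, and the receiver's report of missing packets is modeled by recomputing the `missing` list after each burst.

```python
def blast_selective_repeat(packets, deliver, max_rounds=10):
    """Send all packets back to back, then resend only the ones reported missing.

    Returns the reassembled packet list and the number of transmissions used.
    """
    received = {}
    transmissions = 0
    missing = list(range(len(packets)))
    for _ in range(max_rounds):
        if not missing:
            break
        for seq in missing:                 # blast: no per-packet wait for an ack
            transmissions += 1
            if deliver(seq):
                received[seq] = packets[seq]
        # The receiver asks only for what it still lacks, not a general nak.
        missing = [s for s in range(len(packets)) if s not in received]
    if missing:
        raise RuntimeError("packets still missing: %r" % missing)
    return [received[s] for s in range(len(packets))], transmissions
```

With three packets and one loss this uses four transmissions; a general nak would resend the whole message (six), and stop-and-wait would pay a round trip per packet.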

Implementation Issues • Flow control • Buffer overrun problem. • Internet worm caused by buffer overrun and rewriting non-buffer space. This is not the problem here. • Can occur right at the interface chip, in which case the (later) packet is lost. • More likely with blast but can occur with stop and wait if have multiple senders.

Implementation Issues • What to do • If chip needs a delay to do back to back receives have sender delay that amount. • If we can only buffer n packets, have sender only send n then wait for ack. • The above fails when we have simultaneous sends. But hopefully that is not too common. • This tuning to the specific hardware present is one reason why general protocols don't work as well as specialized ones.
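The "send n, then wait for the ack" rule can be illustrated with a toy receiver that has a fixed number of buffer slots. The `Receiver` class and `send_in_bursts` function are assumptions for the example and model a single sender only.

```python
class Receiver:
    """Interface with a fixed number of buffer slots; overflow is dropped."""
    def __init__(self, slots):
        self.slots = slots
        self.buffer = []
        self.delivered = []
        self.dropped = 0
    def arrive(self, packet):
        if len(self.buffer) >= self.slots:
            self.dropped += 1               # buffer overrun: packet is lost
        else:
            self.buffer.append(packet)
    def drain_and_ack(self):
        """Hand buffered packets upward, freeing the slots, then ack."""
        self.delivered.extend(self.buffer)
        self.buffer.clear()

def send_in_bursts(packets, receiver, burst):
    """Send `burst` packets back to back, then wait for the ack."""
    for off in range(0, len(packets), burst):
        for packet in packets[off:off + burst]:
            receiver.arrive(packet)
        receiver.drain_and_ack()            # sender blocks here until acked
```

With a two-slot receiver, bursts of two lose nothing, while blasting six packets at once loses four of them. Two simultaneous senders each using bursts of two would overrun the same buffer, which is the failure case noted above.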

Implementation Issues • Why is RPC slow? We have to... • Call stub • get message buffer • marshall parameters • If using UDP, compute checksum • fill in headers • Copy message to kernel space (unless we have a special kernel) • Put in real destination address • Start DMA to communication device • ---------------- wire time

Implementation Issues • Why is RPC slow? We have to... • Process interrupt (or polling delay) • Check packet • Determine relevant stub • Copy to stub address space (unless we have a special kernel) • Unmarshall • Call server • On the Paragon (large Intel MPP of a few years ago), a variety of the above took 30ms of which 1ms was wire time.

Implementation Issues • Eliminating copying • Message transmission is essentially a copy so the minimum number of copies is 1. • This requires the network device to do its DMA from the user buffer (client stub) directly into the server stub. • But it is hard for the receiver to know where to put the message until it arrives and is inspected. • Sounds like a copy is needed from the receiving buffer to the server stub. • We can avoid this by adjusting memory maps.

Implementation Issues • Messages must then be full pages (as that is what is mapped). • Normally there are two copies on the receiving side. • From a hardware buffer to a kernel buffer. • From the kernel buffer to user space (server stub). • Often there are two on the sending side. • User space (client stub) to kernel buffer. • Kernel buffer to buffer on device. • Then start the device. • The sender ones can be reduced.

Implementation Issues • The device can do DMA from the kernel buffer thus eliminating the second. • Doing DMA from the user would eliminate the first, but we would need scatter gather (just gather here) since the header must be in the kernel space since the user is not allowed to set it (for security). • To eliminate the two on the receiver side is harder. • We can eliminate the first if the device writes directly into a kernel buffer. • To eliminate the second requires the remapping trick.

Implementation Issues • Timers and timeout values • Getting a good value for the timeouts is a black art. • Too small a value leads to many unneeded retransmissions. • Too large causes us to wait too long when a message is lost. • Should it be adaptive?? • If we find that we sent an extra message then raise the timeout value for this class of transmissions. • If timeout expires most of the time, lower the value for this class.
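One possible reading of the two adaptive rules, sketched as a small class. The interpretation is an assumption: "sent an extra message" is taken to mean a retransmission that turned out to be unnecessary, and "timeout expires most of the time" is taken to mean the timer usually fires because the message really was lost. The multiplicative factor and the bounds are arbitrary choices.

```python
class AdaptiveTimeout:
    """Per-class timeout value nudged up or down by observed outcomes."""
    def __init__(self, value=1.0, lowest=0.125, highest=16.0, factor=2.0):
        self.value = value
        self.lowest, self.highest, self.factor = lowest, highest, factor

    def spurious_retransmission(self):
        """We retransmitted, but the original got through: we were too impatient."""
        self.value = min(self.highest, self.value * self.factor)

    def genuine_loss(self):
        """The timer fired and the message really was lost: waiting longer is waste."""
        self.value = max(self.lowest, self.value / self.factor)
```

The bounds keep a run of bad luck from driving the value to zero or to an effectively infinite wait.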

Implementation Issues • How to keep timeout values? • If you know that almost all timers of this class are going to go off (alarms) and accuracy is important, then keep a list sorted by time to alarm. • Only have to scan head for timer (so we can do it frequently). • Additions must search for a place to add. • Deletions (cancelled alarms) are presumed rare. • If deletions are common and we can afford not so accurate an alarm, then sweep list of all processes (not so frequently since accuracy not required). • Deletions and additions are easy since list is indexed by process number.
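The two bookkeeping schemes can be sketched side by side. Both class names are invented; times are plain numbers and process numbers are small integers.

```python
import bisect

class SortedTimers:
    """List sorted by time to alarm: cheap to check, costlier to add to."""
    def __init__(self):
        self._alarms = []                        # (expiry, name), kept sorted
    def add(self, expiry, name):
        bisect.insort(self._alarms, (expiry, name))   # search for the right place
    def expire(self, now):
        fired = []
        while self._alarms and self._alarms[0][0] <= now:   # only look at the head
            fired.append(self._alarms.pop(0)[1])
        return fired

class SweepTimers:
    """Table indexed by process number: easy add and cancel, full sweep to check."""
    def __init__(self, nprocs):
        self._expiry = [None] * nprocs
    def set(self, pid, expiry):
        self._expiry[pid] = expiry
    def cancel(self, pid):
        self._expiry[pid] = None
    def sweep(self, now):
        fired = [pid for pid, e in enumerate(self._expiry)
                 if e is not None and e <= now]
        for pid in fired:
            self._expiry[pid] = None
        return fired
```

`SortedTimers.expire` can be called often because it stops at the first alarm that is not yet due; `SweepTimers.sweep` touches every entry, so it is run less frequently and alarms fire less precisely.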

Implementation Issues • Difficulties with RPC • Global variables like errno inherently have shared-variable semantics and so they don't fit in a distributed system. • One (remote) procedure sets the variable and the local procedure is supposed to see it. • But the setting is a normal store so is not seen by the communication system. • So transparency is violated.

Implementation Issues • Weak typing (as in C) makes marshalling hard/impossible. • How big is the object we should copy? • What is the conversion needed if heterogeneous system? • So transparency is violated.

How does a programmer create a program with RPC? • uuidgen generates a unique identifier for the RPC interface. • Include it in an IDL (Interface Definition Language) file and describe the interface for the RPC in the file as well. • Write the client and server code. • Client and server stubs are generated from the IDL file automatically. • Link things together and run on the desired machines.

Writing a Client and a Server • The steps in writing a client and a server in DCE RPC. 2-14