Implementing Remote Procedure Calls

Implementing Remote Procedure Calls Andrew D. Birrell and Bruce Jay Nelson Presented by: Shreya Bhargava

Distributed environment • A1, A2, A3 are distinct address spaces. [Figure: processes p1, p2, p3 and address spaces A1, A2, A3]

How do these systems communicate? • Use messages. • Issues: Messages sit at the lowest level of abstraction and require the application programmer to identify the destination process, the syntax of the message, and the source process. • Can there be a simpler alternative?

Remote Procedure Calls! • N1 can run other processes while P1 is waiting for RPC completion. • RPC idea: Use the same mechanism to pass control and data across the network as procedure calls use within a single computer. • Does this remind us of stack ripping?

What are we trying to achieve? • Make distributed computing easy. • Make RPC communication highly efficient. • Do not burden application programmers with additional coding requirements on top of the RPC package. • Provide secure communication. • RPC should be independent of where it executes.

Issues while designing RPC • How can we be sure that a call is mapped to the correct procedure on the remote machine? • How does the caller determine the machine address of the callee and specify the procedure to execute? • What happens if there is a communication failure between the caller and the callee? • Which protocol is best suited for transmitting data? • Does the mechanism work just as well if the client and the server are brought together on a single machine? • What about data integrity and security?

Design Decisions • Why not message passing? Procedure calls are the major control and data transfer mechanism in Mesa. • Why not a shared virtual address space? The authors argue that it is not cost-effective and that adding it could degrade efficiency. • How long should a caller wait for the result from the callee? Is there a time-out? There are no time-outs on procedure calls, since local procedure calls do not have one (this holds in the absence of machine or communication failures).

RPC Components • The client code • User-stub • RPC Communication Package • Server-stub • The Server

When an RPC is invoked

RPC Mechanism • The user makes a procedure call. • The user-stub does the binding and builds packets containing the procedure identifier and arguments. • The communication package transfers the packets reliably. • The server-stub unpacks the packets and invokes the server procedure as a local call. • The server-stub packs the result of the local call. • The communication package transfers the result packets. • The user-stub unpacks the result packets and passes the result to the user. • NOTE: All communication is done in blocking mode.
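The round trip above can be sketched in a few lines. This is a minimal in-process illustration, not the Cedar implementation: the names `add`, `add_stub`, `server_stub`, `transport`, and `PROCEDURES` are hypothetical, JSON stands in for the packet format, and the "network" is just a function call.

```python
import json

# Server side: the real procedure and a hypothetical server-stub.
def add(a, b):
    return a + b

PROCEDURES = {"add": add}  # dispatch table used by the server-stub

def server_stub(call_packet: bytes) -> bytes:
    """Unpack a call packet, make the local call, pack the result."""
    call = json.loads(call_packet)
    result = PROCEDURES[call["proc"]](*call["args"])
    return json.dumps({"result": result}).encode()

def transport(packet: bytes) -> bytes:
    """Stand-in for the communication package: deliver and block for the reply."""
    return server_stub(packet)

# Client side: the user-stub has the same signature as the remote procedure.
def add_stub(a, b):
    call_packet = json.dumps({"proc": "add", "args": [a, b]}).encode()
    result_packet = transport(call_packet)  # blocks until the result arrives
    return json.loads(result_packet)["result"]

print(add_stub(2, 3))  # to the caller this looks like an ordinary procedure call
```

The caller never sees packets; only the two stubs know that the call crossed an address-space boundary.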


Who builds what • User: user module, server module and an interface module. • Lupine: generates the user-stub and server-stub. • RPC runtime: built as part of the Cedar system. • Responsibilities: Lupine is responsible for generating the code for packing and unpacking arguments and results, and for dispatching to the correct procedure for an incoming call in the server-stub. The RPC runtime is responsible for packet-level communication. The programmer is responsible for specifying arguments and results that are compatible with the chosen design and for handling reported machine or communication failures.

Binding • Naming • Location

NAMING (compile time decision) • Bind an importer of an interface to an exporter of an interface. • Two parts to “Naming”. Type – Specifies which interface the caller expects the callee to implement. Instance – Specifies which particular implementer of an interface is desired.

Location (runtime decision) • Early binding: the machine address of the server is hardcoded in the application program. • Broadcast to locate a server: too much interference. • Name server: look up a server by type and instance.

LOCATING AN APPROPRIATE EXPORTER (RUNTIME DECISION) • Use the Grapevine distributed database. • The database consists of two types of entries: individual and group. • Individual entries (instance -> address): Wpi -> #33#; Alpine -> #23#; Elb -> #11#. • Group entries (type -> instance): Fileaccess -> wpi; mail server -> alpine; fileaccess -> elb.

PRIOR TO A CALL: How does a callee export an interface? • When a callee wishes to export an interface (make it available to callers), it stores information about its interface in a network-accessible database.

Cont’d • The caller can then find the callee with a database lookup, in one of two ways: by specifying a particular instance of the desired interface type and receiving location information for that instance, or by specifying only the interface type, receiving a list of instances that implement it, and iterating through them to find an available match.
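The export and lookup steps can be illustrated with two dictionaries standing in for the network-accessible database. This is a sketch under stated assumptions: `export_interface`, `import_interface`, and the `is_available` callback are hypothetical names, and the sample entries are illustrative only.

```python
individuals = {}  # instance name -> network address
groups = {}       # interface type -> list of instance names

def export_interface(type_name, instance, address):
    """Callee side: register an instance of an interface type."""
    individuals[instance] = address
    groups.setdefault(type_name, []).append(instance)

def import_interface(type_name, instance=None, is_available=lambda addr: True):
    """Caller side: find one exporter, by instance or by type alone."""
    members = groups.get(type_name, [])
    # A named instance is the only candidate; otherwise try every member.
    candidates = [instance] if instance is not None else members
    for name in candidates:
        address = individuals.get(name)
        if name in members and address is not None and is_available(address):
            return name, address
    raise LookupError(f"no available exporter for {type_name!r}")

export_interface("fileaccess", "wpi", "#33#")
export_interface("mail server", "alpine", "#23#")
export_interface("fileaccess", "elb", "#11#")

# By instance, then by type while pretending that wpi is unreachable.
print(import_interface("fileaccess", "elb"))
print(import_interface("fileaccess", is_available=lambda addr: addr != "#33#"))
```

Because the address is resolved at import time, moving an exporter only requires updating its individual entry.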

CONT’D • Each server maintains one export table, containing entries for all currently exported interfaces. This table is used to map incoming RPC request messages to their corresponding server procedures. • Each entry in the export table consists of a unique identifier for that interface and a pointer to the server-stub that should be called to invoke the interface service. • Note: A unique identifier is never reused. • Grapevine provides for late binding: binding callers to specific servers at runtime makes it possible to move a server to another machine without requiring changes to the client software.
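A small sketch of such an export table follows. The class and method names are hypothetical, and seeding the identifier counter from the clock is an assumption of this sketch, chosen so that identifiers are unlikely to repeat across restarts.

```python
import itertools
import time

class ExportTable:
    """Maps (table index, unique id) from a call packet to a server-stub."""

    def __init__(self):
        # Assumption: a clock-seeded counter keeps ids from being reused.
        self._next_uid = itertools.count(int(time.time()))
        self._entries = []  # table index -> (unique id, server-stub)

    def export(self, server_stub):
        """Add an interface; return the (index, uid) pair a caller binds to."""
        uid = next(self._next_uid)
        self._entries.append((uid, server_stub))
        return len(self._entries) - 1, uid

    def dispatch(self, index, uid, proc, args):
        """Validate the binding carried in a call packet, then invoke the stub."""
        if not 0 <= index < len(self._entries) or self._entries[index][0] != uid:
            raise LookupError("stale or unknown binding")
        return self._entries[index][1](proc, args)

table = ExportTable()
index, uid = table.export(lambda proc, args: sum(args))
print(table.dispatch(index, uid, "add", [1, 2]))
```

Checking the identifier on every call is what lets a server reject callers whose binding predates the current export.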

RPC communication protocol • The protocol is intended for small, discrete chunks of data, which can contain: • Identifiers specifying caller, callee and call. • Requested procedure and procedure arguments. • Procedure results. • Acknowledgements of received packets. • Exception information. • Why not use a general byte-stream protocol like TCP? Latency is more important than bandwidth. Connection establishment and teardown should be lightweight. Moreover, to service many clients at the same time, the server should not maintain much state information per connection.

More on RPC Protocol • No connection setup or teardown. • The result packet serves as the ack of the call packet. • The start of the next session (identified by the call id) serves as the ack of the result packet. • No buffering or flow-control strategies are implemented.

How are simple calls handled? • Call packet: call id + procedure (export id, table index) + arguments. • Return packet: call id + return values. • The capacity of the network pipe is basically one packet (stop and wait), so only one RPC is outstanding per process. • Session: call packet -> result packet.


Simple calls cont’d • Retransmission of a packet (either from caller or callee) occurs until an acknowledgement is received. • To the caller, a received packet containing the procedure results is viewed as an acknowledgement. • To the callee, a received packet containing a new procedure call is viewed as an acknowledgement of the last procedure result sent. • Each call by the caller carries a unique identifier so that subsequent calls to the same procedure may be processed, but duplicate packets (from retransmissions) for the same call will be discarded. • Any given caller (process or thread on a given machine) will have at most one outstanding remote call.
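The retransmission and duplicate-suppression rules above can be sketched with a simulated lossy exchange. Everything here is an illustrative assumption: the `Callee` class, the `call` helper, the `losses` list that scripts which packet is dropped on each attempt, and the dictionary packet layout.

```python
class Callee:
    def __init__(self, procedure):
        self.procedure = procedure
        self.last_seq = {}     # caller id -> newest call sequence number seen
        self.last_result = {}  # caller id -> result packet kept until acked
        self.executions = 0

    def receive(self, packet):
        caller, seq = packet["caller"], packet["seq"]
        newest = self.last_seq.get(caller, -1)
        if seq < newest:
            return None  # duplicate of an older call: discard
        if seq == newest:
            return self.last_result[caller]  # retransmission: resend, do not re-run
        # A new call id implicitly acknowledges the previous result.
        self.last_seq[caller] = seq
        self.executions += 1
        value = self.procedure(*packet["args"])
        self.last_result[caller] = {"caller": caller, "seq": seq, "value": value}
        return self.last_result[caller]

def call(callee, caller, seq, args, losses=(), max_tries=5):
    """Stop and wait: retransmit the call packet until its result arrives."""
    for attempt in range(max_tries):
        lost = losses[attempt] if attempt < len(losses) else None
        if lost == "call":
            continue  # call packet dropped on the way out
        result = callee.receive({"caller": caller, "seq": seq, "args": args})
        if lost == "result" or result is None:
            continue  # result packet dropped on the way back
        return result["value"]  # the result doubles as the ack of the call
    raise TimeoutError("no result after retransmissions")

callee = Callee(lambda x: x * 2)
value = call(callee, "p1", 1, [21], losses=["call", "result"])
print(value, callee.executions)
```

Even though the call packet is sent three times in this run, the procedure executes once, which is the point of tagging each call with an identifier.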

How are complicated calls handled?

Complicated calls cont’d • Packet loss: the packet is retransmitted, modified to request an explicit ack. • Calls with long arguments or results: all packets except the last one are sent with an explicit request for an ack. The ack for the last argument packet is the result packet; the ack for the last result packet is the next call packet. Flow control is stop and wait (not the best way to send bulk data; it works best with simple calls). • Long-duration calls: to the caller this looks like loss of the last argument packet, so it retransmits that packet with an explicit request for an ack. After getting the ack for the last packet, the caller keeps sending probe packets to check that the callee is still working. The interval between probes increases gradually. • Long gap between calls: to the callee this looks like loss of the last result packet, so it retransmits that packet with an explicit request for an ack.
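Two of these rules are easy to show in isolation: which packets of a long argument ask for an explicit ack, and how a probe interval might grow. The function names, the dictionary packet layout, and the specific numbers (doubling, a 30-second cap) are assumptions made for this sketch, not values from the paper.

```python
def fragment(payload: bytes, max_data: int):
    """Split a long argument into packets; all but the last request an ack."""
    chunks = [payload[i:i + max_data] for i in range(0, len(payload), max_data)] or [b""]
    last = len(chunks) - 1
    return [
        {"index": i, "please_ack": i != last, "data": chunk}
        for i, chunk in enumerate(chunks)
    ]

def probe_intervals(first=1.0, factor=2.0, cap=30.0, count=6):
    """Delays between successive probes, growing gradually up to a cap."""
    intervals, delay = [], first
    for _ in range(count):
        intervals.append(min(delay, cap))
        delay *= factor
    return intervals

packets = fragment(b"abcdefgh", max_data=3)
print([(p["data"], p["please_ack"]) for p in packets])
print(probe_intervals())
```

The last packet needs no explicit ack because the result packet (or the next call packet) acknowledges it implicitly.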

Dealing with crashes • Client crash and restart: the RPC runtime at the client assigns a new incarnation id; the client has to rebind to the service; the server uses the new client id to distinguish this instance from the previous one. • Server crash and restart: the server gets a new server id; all clients bound to the previous incarnation id are out of luck and have to rebind.
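The server-restart case can be sketched as follows. The `Server` class, its `bind` and `call` methods, and the use of `ConnectionError` to signal a stale binding are hypothetical choices for illustration.

```python
import itertools

_incarnations = itertools.count(1)  # source of ids that are never reused

class Server:
    def __init__(self):
        self.restart()

    def restart(self):
        """Simulate a crash and restart: the server gets a fresh id."""
        self.incarnation = next(_incarnations)

    def bind(self):
        """A client binding remembers which incarnation it talked to."""
        return self.incarnation

    def call(self, binding, x):
        if binding != self.incarnation:
            raise ConnectionError("binding refers to a previous incarnation; rebind")
        return x + 1

server = Server()
binding = server.bind()
print(server.call(binding, 1))

server.restart()
try:
    server.call(binding, 1)
except ConnectionError:
    binding = server.bind()  # the client has to rebind after the restart
print(server.call(binding, 1))
```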

Exception handling • Modeled on Mesa exception handling. • When an exception arises, the callee returns an exception packet instead of a result packet. • The RPC runtime on the caller raises the exception in the client process. • If the user's handler (catch procedure) returns, its return value is sent back to the callee, which resumes from its exception. If the handler aborts the call instead, the callee is notified and unwinds its call stack. • What about communication failure? The RPC runtime raises a call-failed exception.
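A simplified sketch of the exception path is shown here. It models only the case where the exception propagates to the caller; Python has no equivalent of resuming a Mesa signal, so the resumption path is omitted. `CallFailed`, `RemoteError`, and the `network_up` flag are hypothetical names for this illustration.

```python
class CallFailed(Exception):
    """Raised by the runtime when the callee cannot be reached."""

class RemoteError(Exception):
    """Raised in the caller when the callee returned an exception packet."""

def server_stub(proc, args):
    """Run the procedure; report a failure as an exception packet."""
    try:
        return {"kind": "result", "value": proc(*args)}
    except Exception as exc:
        return {"kind": "exception", "name": type(exc).__name__, "message": str(exc)}

def user_stub(proc, args, network_up=True):
    if not network_up:
        raise CallFailed("no response from callee")  # communication failure
    packet = server_stub(proc, args)
    if packet["kind"] == "exception":
        # Re-raise in the client process instead of returning a result.
        raise RemoteError(f'{packet["name"]}: {packet["message"]}')
    return packet["value"]

def divide(a, b):
    return a // b

print(user_stub(divide, [6, 3]))
try:
    user_stub(divide, [1, 0])
except RemoteError as err:
    print("remote exception:", err)
```

Keeping `CallFailed` distinct from `RemoteError` mirrors the slide's distinction between an exception raised by the remote procedure and a failure of the communication itself.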

Security • RPC package and protocol include facilities for encryption-based security. • Use Grapevine as an authentication service.

Processes & Optimizations • Processes: A server maintains a pool of idle server processes to handle incoming requests. This saves the cost of creating a new process for each request. A new process is created only when all available processes are busy. To save on the cost of context switches between processes, each packet contains the IDs of the calling and serving processes. • Optimizations: Minimize the cost of maintaining connections. Avoid the cost of establishing and terminating connections. Reduce the number of process switches involved in a call.
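The idea of reusing idle workers instead of creating one per request can be illustrated with Python's standard thread pool. This is an analogy rather than the Cedar mechanism: threads stand in for server processes, the `handle` function and request layout are hypothetical, and `ThreadPoolExecutor` caps the pool at `max_workers` instead of creating extra workers when all are busy.

```python
from concurrent.futures import ThreadPoolExecutor

def handle(request):
    """Stand-in for a server-stub servicing one incoming call."""
    return request["a"] + request["b"]

# Four workers are created on demand and then reused for all eight requests.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(handle, {"a": i, "b": 1}) for i in range(8)]
    results = [future.result() for future in futures]

print(results)
```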

Performance • Measurements were made for remote calls between two Dorado computers connected by Ethernet (2.94 Mbps). • The Ethernet was shared with other users, but the network was lightly loaded. • No encryption facilities were used. • 12,000 calls were made on each procedure. • The interval timed runs from the moment the user invokes the local procedure to the return of the procedure call.

Conclusion • For small packets, RPC overhead dominates • For large packets, transmission time dominates
