
User-Level Interprocess Communication for Shared Memory Multiprocessors


Presentation Transcript


  1. User-Level Interprocess Communication for Shared Memory Multiprocessors Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy Presented by Arthur Strutzenberg

  2. Interprocess Communication • The LRPC paper/presentation discussed the need for • Failure Isolation • Extensibility • Modularity • There is usually a balance between these three needs and performance • This is a central theme for this paper as well

  3. Interprocess Communication • Traditionally this is the responsibility of the kernel • This suffers from two problems • Architectural performance limits • Poor interaction between kernel-based communication and user-level threads • Generally designers use a pessimistic (non-cooperative) approach • This raises the following question: “How can you have your cake and eat it too?”

  4. Interprocess Communication • What if the communication layer is extracted from the kernel and made part of the user level? • This can increase performance by allowing • Messages sent between address spaces directly • Elimination of unnecessary processor reallocation • Amortization: the cost of processor reallocation (when needed) is spread over several independent calls • Exploitation of parallelism in message passing

  5. User-Level Remote Procedure Call (URPC) • Allows communication between address spaces without kernel mediation • Isolates the three components of communication from one another • Processor reallocation • Thread management • Data transfer • The kernel is ONLY responsible for allocating processors to address spaces

  6. URPC & Communication • Application OS Communication typically is • Narrow Channel (Ports) • Limited Number of Operations • Create • Send • Receive • Destroy • Most modern OS have support for RPC

  7. URPC & Communication • What does this buy URPC? • RPC is generally limited in definition about how the channels of communication operate • Also the definition generally does not specify how processor scheduling (reallocation) will interact with the data transfer

  8. URPC & Communication • URPC exploits this information by • Messages passed through logical channels are kept in memory that is shared between client and server • This memory once allocated is kept intact • Thread management is User Level (lightweight instead of “Kernel weight”) • (Haven’t we read this in another paper?)

  9. URPC & Thread Management • There is less overhead involved in switching a processor to another thread in the same address space (context switching) versus reallocating it to another thread in a different address space (Processor Reallocation) • URPC uses this along with the user level scheduler to always give preference to threads within the same address space

  10. URPC & Thread Management • Some numbers for comparison: • A context switch within the address space • 15 microseconds • A processor reallocation • 55 microseconds

  11. URPC & Processor Allocation • What happens when a client invokes a procedure on a server process and the server has no processors allocated to it? • URPC calls this “underpowered” • The paper identifies this as a load balancing problem • The solution is reallocation from client to server • A client with an idle processor can elect to reallocate the idle processor to the server • This is not without cost, as this is expensive and requires a call to the kernel

  12. Rationale for URPC • The design of the URPC package presented in this paper has three main components • Thread Management • Data Transfer • Processor Reallocation

  13. Let’s kill two birds with one stone • URPC uses an “optimistic reallocation policy”, which makes the following assumptions • The client will always have other work to do • The server will (soon) have a processor available to service messages • This leads to the “amortization of cost” • The cost of a processor reallocation is spread over several calls
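
A sketch of how that optimistic policy might look in a client runtime, under stated assumptions: the helper names below are hypothetical, and the structure (check for local work first, donate only as a last resort) is inferred from the slide rather than taken from the paper's code.

    /* Hypothetical helpers (assumed for illustration only). */
    int  reply_not_ready(int server);        /* reply still outstanding?        */
    int  local_work_available(void);         /* other runnable client threads?  */
    int  server_is_underpowered(int server); /* server has no processor at all? */
    void run_other_local_thread(void);       /* cheap user-level context switch */
    void donate_processor_to(int server);    /* expensive: traps into the kernel */

    /* Optimistic reallocation: assume the client has other work and the
     * server will soon pick up the message; donate only as a last resort. */
    void after_sending_request(int server)
    {
        while (reply_not_ready(server)) {
            if (local_work_available())
                run_other_local_thread();      /* common, cheap case   */
            else if (server_is_underpowered(server))
                donate_processor_to(server);   /* rare, expensive case */
        }
    }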

  14. Why the optimistic approach doesn’t always hold • The approach works less well when the application • Runs as a single thread • Is real time • Performs high-latency I/O • Makes high-priority invocations • URPC handles these cases by allowing the client’s address space to force a processor reallocation to the server’s address space even though the client might still have work to do

  15. The Kernel handles Processor Reallocation • URPC handles reallocation through a kernel call named “Processor.Donate” • This passes control of an idle processor down to the kernel, and then back up to a specified address in the receiving address space
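
The slide names the primitive Processor.Donate but does not give its signature, so the C binding below is a guess for illustration only (C identifiers cannot contain a dot, hence Processor_Donate). The point is the control transfer: the calling processor goes down into the kernel and comes back up executing at a known entry address inside the receiving address space.

    /* Hypothetical binding for Processor.Donate -- the real interface may
     * differ; this only illustrates the direction of the control transfer. */
    typedef int address_space_t;              /* placeholder handle type      */
    void server_message_dispatcher(void);     /* entry point in the receiver  */

    void Processor_Donate(address_space_t target, void (*entry)(void));

    /* Example use: a client with an idle processor hands it to the server. */
    void donate_idle_processor(address_space_t server)
    {
        Processor_Donate(server, server_message_dispatcher);
    }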

  16. Voluntary Return of Processors • URPC’s policy for processors donated to a server is “…Upon receipt of a processor from a client address, return the processor when all outstanding messages from the client have generated replies, or when the server determines that the client has become ‘underpowered’….”

  17. Parallels to the User Threads Paper • Even though URPC implements a policy/protocol, there is absolutely no way to enforce it. This has the potential to lead to some interesting side effects • This is extremely similar to some of the problems discussed in the User Threads paper • For example, a server thread could conceivably continue to hold a donated processor and handle requests from other clients

  18. What this leads to… • One word: STARVATION • URPC itself only reallocates processors directly, to load balance • To prevent starvation, the system also needs the notion of preemptive reallocation • Preemptive reallocation must ensure that • No higher priority thread waits while a lower priority thread runs • No processor idles when there is work for it to do (even if the work is in another address space)

  19. Controlling Channel Access • Data flows between address spaces in URPC using bidirectional shared memory queues • The queues have a test-and-set lock on either end, which the paper specifically states must be NON-SPINNING • The protocol is: if the lock is free, acquire it; otherwise move on and do something else • Remember, this protocol operates under the assumption that there is always work to do!!
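
A minimal sketch of the non-spinning test-and-set acquire described above, written with C11 atomics (the paper predates C11, so this is only an illustration of the protocol, not the original implementation): try the lock once, and if it is already held, return immediately so the caller can go find other work instead of spinning.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* One end of a bidirectional shared-memory message queue (simplified).
     * The lock field must be initialized with ATOMIC_FLAG_INIT. */
    typedef struct {
        atomic_flag lock;   /* test-and-set lock guarding this end */
        /* ... message slots would live here ... */
    } urpc_queue_end;

    /* Non-spinning acquire: take the lock if it is free, otherwise give up
     * immediately -- the caller is expected to do something else instead. */
    static bool queue_try_lock(urpc_queue_end *q)
    {
        return !atomic_flag_test_and_set_explicit(&q->lock,
                                                  memory_order_acquire);
    }

    static void queue_unlock(urpc_queue_end *q)
    {
        atomic_flag_clear_explicit(&q->lock, memory_order_release);
    }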

  20. Data Transfer Using Shared Memory • There is still the risk of what the paper refers to as the “abusability factor” with RPC, where clients and servers can • Overload each other • Deny service • Provide bogus results • Violate communication protocols • URPC passes the responsibility for handling this off to the stubs

  21. Cross-Address Space Procedure Call and Thread Management • This section of the paper identifies a correspondence between Send/Receive (messaging) and Start/Stop (threads) • Does this not remind everybody of a classic paper that we had to read?

  22. Another link to the User Threads Paper • Additionally, the paper identifies three arguments about the relationship between threads and messages • High performance thread management facilities are needed for fine-grained parallel programs • High performance can only be provided at the user level • The close interaction between communication and thread management can be exploited

  23. URPC Performance • Some comparisons:

  24. URPC Performance • URPC can be broken down into 4 components • Send • Poll • Receive • Dispatch
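
A hedged sketch of how those four components might fit together in a client stub. All of the helper names and types below are assumptions for illustration, not the paper's code: the stub sends the request into the shared queue, polls for the reply while running other ready threads, receives the reply, and dispatches the result back to the calling thread.

    /* Hypothetical types and helpers (assumed for illustration only). */
    typedef struct urpc_channel urpc_channel;
    typedef struct { int procedure; } urpc_msg;
    enum { PROC_NULL = 0 };

    void channel_send(urpc_channel *ch, const urpc_msg *m);
    int  channel_poll(urpc_channel *ch);               /* reply ready yet? */
    void channel_receive(urpc_channel *ch, urpc_msg *m);
    void run_other_ready_thread(void);
    int  dispatch_to_caller(const urpc_msg *reply);

    /* Client stub showing the four components named on the slide. */
    int urpc_null_call(urpc_channel *ch)
    {
        urpc_msg req = { .procedure = PROC_NULL };
        urpc_msg reply;

        channel_send(ch, &req);           /* Send: enqueue in shared memory   */
        while (!channel_poll(ch))         /* Poll: check for a reply...       */
            run_other_ready_thread();     /* ...doing other work meanwhile    */
        channel_receive(ch, &reply);      /* Receive: dequeue the reply       */
        return dispatch_to_caller(&reply);/* Dispatch: resume calling thread  */
    }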

  25. Call Latency and Throughput • Call latency is the time from when a thread calls into the stub until control returns from the stub • Latency and throughput are load dependent, varying with • The number of client processors (C) • The number of server processors (S) • The number of runnable threads in the client’s address space (T) • The graphs measure how long it takes to make 100,000 “Null” procedure calls into the server in a “tight loop”
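
A sketch of that kind of measurement loop, reusing the hypothetical urpc_channel and urpc_null_call from the previous sketch (these are assumptions, not the paper's benchmark harness): average latency is simply the elapsed time divided by the number of calls.

    #include <stdio.h>
    #include <time.h>

    /* Assumed from the earlier sketch. */
    typedef struct urpc_channel urpc_channel;
    int urpc_null_call(urpc_channel *ch);

    #define NCALLS 100000

    /* Time NCALLS null procedure calls in a tight loop, as the slide describes. */
    void measure_null_call_latency(urpc_channel *ch)
    {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < NCALLS; i++)
            urpc_null_call(ch);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf("average null-call latency: %.2f us\n", us / NCALLS);
    }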

  26. Call Latency and Throughput

  27. Conclusions • In certain circumstances, it makes sense to move the communication layer from the kernel to user space • Most OSes are designed for a uniprocessor system and then ported to a shared memory multiprocessor (SMMP) • URPC is one example of a system designed directly for an SMMP, and it takes direct advantage of the characteristics of that system

  28. Conclusions • As a lead-in to Professor Walpole’s discussion and Q&A, let’s conclude by trying to fill out the following table:
