
User-Level Interprocess Communication for Shared Memory Multiprocessors


Presentation Transcript


  1. User-Level Interprocess Communication for Shared Memory Multiprocessors Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy. Presented by: Tim Fleck

  2. OUTLINE • Interprocess Communication (IPC) • User-Level Remote Procedure Call (URPC) • URPC Design Rationale • Processor Reallocation • Data Transfer Using Shared Memory • Thread Management • URPC Performance • Latency • Throughput • Related Work • Conclusion

  3. Interprocess Communication • Central to the design of contemporary Operating Systems • Encourages system decomposition across address space boundaries • Fault isolation • Extensibility • Modularity • Provides for communication between different address spaces on the same machine

  4. Interprocess Communication • How usable separate address spaces are depends on the performance of the communication primitives • IPC has been the responsibility of the kernel, which raises two significant issues • Architectural performance barriers • The performance of kernel-based synchronous communication is limited by the cost of invoking the kernel and reallocating processors between address spaces • In prior work on LRPC, 70% of the overhead can be attributed to the kernel's mediation of the cross-address space call • Interaction between kernel-based communication and high-performance user-level threads • For satisfactory performance, medium- and fine-grained parallel applications need user-level thread management • The performance and system-complexity costs of partitioning strongly interdependent communication and thread management across protection boundaries are high

  5. Solution • Eliminate the kernel from the path of cross-address space communication • User-level Remote Procedure Call improves performance because: • Messages are sent between address spaces directly, without invoking the kernel • Unnecessary CPU reallocation is eliminated • When CPU reallocation is needed, its cost can be amortized over multiple independent calls • Exploiting the inherent parallelism in message sending and receiving yields further gains

  6. Messages Review • In many contemporary OSes, applications communicate via narrow channels or ports • Only a few operations are available – create, send, receive, destroy (see the sketch below) • Ports permit program-to-program communication across address space boundaries, or even machine to machine • Messages are powerful, but they represent a control and data structure alien to traditional Algol-like languages
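As a point of reference, the narrow port interface above can be rendered as roughly the following C declarations. The four operations come from the slide; the signatures themselves are illustrative assumptions, not any real OS's API.

    #include <stddef.h>

    /* Hypothetical port interface -- create, send, receive, destroy.
     * The C signatures are illustrative assumptions. */
    typedef struct Port Port;

    Port *port_create(void);
    int   port_send(Port *p, const void *msg, size_t len);  /* queue a message        */
    int   port_receive(Port *p, void *msg, size_t maxlen);  /* block for next message */
    void  port_destroy(Port *p);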

  7. Remote Procedure Call (RPC) • Almost every mature OS supports RPC, which lets messages do the work behind a procedure call interface • RPC provides synchronous, language-level transfer of control between programs in different address spaces • Communication occurs through a narrow channel whose specific operation is left undefined

  8. User-level Remote Procedure Call (URPC) • URPC exploits the lack of definition of the RPC channel in two ways • Messages are passed between address spaces through logical channels kept in memory shared between client and server • Thread management is implemented at user level and handles messages without kernel involvement on either call or reply • URPC presents synchronous, typed messages to the programmer, hiding the asynchronous, untyped channel beneath the thread management layer (see the sketch below)
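A minimal sketch of that layering, assuming hypothetical names (Channel, urpc_send, urpc_recv, and window_create are illustrative, not the paper's interface): a typed, synchronous stub sits on top of untyped, asynchronous channel operations.

    #include <stddef.h>

    /* Untyped asynchronous channel primitives (assumed names).
     * urpc_recv blocks only the calling thread, never the processor. */
    typedef struct Channel Channel;
    extern void urpc_send(Channel *ch, const void *buf, size_t len);
    extern void urpc_recv(Channel *ch, void *buf, size_t len);

    /* Client stub presenting a typed, synchronous procedure call. */
    int window_create(Channel *ch, int width, int height) {
        int args[2] = { width, height };
        int result;
        urpc_send(ch, args, sizeof args);       /* typed args become untyped bytes    */
        urpc_recv(ch, &result, sizeof result);  /* synchronous to the caller's thread */
        return result;                          /* looks like an ordinary call        */
    }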

  9. User-level Remote Procedure Call (URPC) • URPC provides safe and efficient communication between address spaces on the same machine without kernel mediation • Isolates the three components of interprocess communication: processor reallocation, thread management, and data transfer • Kernel involvement is limited to CPU reallocation • Control transfer is handled by thread management and CPU reallocation • A simple procedure call with URPC has a latency of 93 µsec, compared to LRPC's 157 µsec

  10. URPC Design Rationale • Based on the observation that there are several independent components to a cross-address space call • The main components are: • Processor Reallocation • Ensuring that there is a physical processor to handle the client's call in the server and the server's reply in the client • Data Transfer Using Shared Memory • Moving arguments between the client and server address spaces • Thread Management • Blocking the caller's thread, running a thread through the procedure in the server's address space, and resuming the caller's thread on return

  11. Processor Reallocation • The aim is to reduce how often CPU reallocations occur, using an optimistic reallocation policy • Optimistic assumptions • The client has other work to do • The server will soon have a processor available to service a message • Situations in which not to be optimistic, and to invoke the kernel for a reallocation instead • Single-threaded applications • High-latency I/O • Real-time applications • Priority invocations

  12. Processor Reallocation • The kernel handles processor reallocation to underpowered address spaces • Invoked using Processor.Donate, which identifies the receiving address space to the kernel • The receiver is given the identity of the caller by the kernel • The voluntary return of the processor is not guaranteed (see the sketch below)
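A minimal sketch of the policy on these two slides, assuming hypothetical helpers (ready_dequeue, run_thread); only the name Processor.Donate comes from the paper, and its C rendering here is an assumption.

    /* Optimistic processor reallocation, sketched.  After a call message
     * has been queued for the server, the client processor prefers local
     * work; donating the processor via the kernel is the fallback. */
    typedef struct Thread Thread;
    typedef struct AddressSpace AddressSpace;

    extern Thread *ready_dequeue(void);    /* next runnable client thread, or NULL */
    extern void    run_thread(Thread *t);  /* user-level context switch            */

    /* Kernel primitive: hand this processor to `receiver`.  The kernel
     * tells the receiver who donated; voluntary return is not guaranteed. */
    extern void Processor_Donate(AddressSpace *receiver);

    void after_send(AddressSpace *server) {
        Thread *t = ready_dequeue();
        if (t != NULL)
            run_thread(t);             /* optimistic: the client has other work, and
                                          the server should soon have a processor    */
        else
            Processor_Donate(server);  /* no local work: pay for a kernel reallocation */
    }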

  13. Sample Execution • Three applications, each in its own address space • An editor as the client • The WinMgr server • The FCMgr server • Two available processors • Two threads, T1 and T2, in the client

  14. Data Transfer Using Shared Memory • In URPC, each client-server pairing is bound to a pair-wise mapped logical channel in shared memory • Mapping occurs once, before the first call • Applications access URPC through the stubs layer • Safety of the communication is the responsibility of the stubs • Unlike traditional RPC, the kernel is NOT invoked to copy data from one address space to another (see the sketch below)
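An illustrative shape for that pair-wise mapped channel, mapped once into both address spaces before the first call; field names and layout are assumptions, and the message queue itself is sketched after the next slide.

    /* One logical channel per client-server pairing, living in memory
     * that both address spaces have mapped.  All names are assumptions. */
    typedef struct MsgQueue MsgQueue;   /* defined in the next sketch */

    typedef struct {
        MsgQueue *call_queue;    /* client -> server: outgoing call messages  */
        MsgQueue *reply_queue;   /* server -> client: incoming reply messages */
    } Channel;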

  15. Data Transfer Using Shared Memory • Data flows over a bidirectional shared memory queue with non-spinning test-and-set locks on either end
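A hedged sketch of one direction of that queue: a fixed-size ring guarded by a test-and-set lock that the caller does not spin on. If the lock or the queue is unavailable, control returns immediately so the user-level scheduler can run another thread. Sizes, names, and the single-lock simplification are assumptions; the paper places locks at either end of the bidirectional channel.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <string.h>

    #define SLOTS     16    /* ring capacity (assumed)      */
    #define MSG_BYTES 128   /* fixed message size (assumed) */

    typedef struct MsgQueue {
        atomic_flag lock;   /* test-and-set lock; initialize with ATOMIC_FLAG_INIT */
        int  head, tail, count;
        char msg[SLOTS][MSG_BYTES];
    } MsgQueue;

    /* Both operations return false rather than spinning; the caller's
     * thread scheduler simply retries later or runs something else. */
    bool try_enqueue(MsgQueue *q, const void *data, size_t len) {
        if (atomic_flag_test_and_set(&q->lock))   /* lock busy: don't spin */
            return false;
        bool ok = (q->count < SLOTS && len <= MSG_BYTES);
        if (ok) {
            memcpy(q->msg[q->tail], data, len);
            q->tail = (q->tail + 1) % SLOTS;
            q->count++;
        }
        atomic_flag_clear(&q->lock);
        return ok;
    }

    bool try_dequeue(MsgQueue *q, void *data, size_t len) {
        if (atomic_flag_test_and_set(&q->lock))
            return false;
        bool ok = (q->count > 0 && len <= MSG_BYTES);
        if (ok) {
            memcpy(data, q->msg[q->head], len);
            q->head = (q->head + 1) % SLOTS;
            q->count--;
        }
        atomic_flag_clear(&q->lock);
        return ok;
    }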

  16. Thread Management • The calling semantics of a cross-address space procedure call are synchronous with respect to the calling thread • Each communication function (send, receive) has a corresponding thread management function (stop, start) • This close interaction between threads and communication can be exploited by a user-level implementation to achieve good performance for both (see the sketch below)
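A sketch of that pairing, assuming hypothetical user-level primitives (enqueue_call, poll_replies, thread_stop, thread_start, waiter_for): a send is followed by stopping only the calling thread, and each received reply starts the thread that was waiting for it.

    typedef struct Thread  Thread;
    typedef struct Msg     Msg;
    typedef struct Channel Channel;

    extern void    enqueue_call(Channel *ch, Msg *m);  /* send side of the channel       */
    extern Msg    *poll_replies(Channel *ch);          /* non-blocking receive           */
    extern void    thread_stop(void);                  /* block the current thread only  */
    extern void    thread_start(Thread *t);            /* make a blocked thread runnable */
    extern Thread *waiter_for(Msg *reply);             /* thread waiting on this reply   */

    /* send <-> stop: the call blocks the thread, never the processor. */
    void urpc_call(Channel *ch, Msg *call) {
        enqueue_call(ch, call);
        thread_stop();        /* resumed via thread_start() when the reply arrives */
    }

    /* receive <-> start: the scheduler restarts callers as replies appear. */
    void scan_for_replies(Channel *ch) {
        Msg *r;
        while ((r = poll_replies(ch)) != NULL)
            thread_start(waiter_for(r));
    }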

  17. Thread Management • Thread overhead – points of reference • Heavyweight – the kernel makes no distinction between a thread and its address space • Middleweight – kernel managed, but decoupled from the address space to allow multiple threads • Lightweight – managed at user level via libraries that execute in the context of weightier threads • Lightweight thread usage implies two-level scheduling (sketched below) • Lightweight threads are scheduled at user level on top of heavier threads • The heavier threads are scheduled by the kernel
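A minimal sketch of that two-level arrangement, under the same hypothetical names as the previous sketch: each kernel-scheduled (middleweight) thread runs a user-level dispatcher that multiplexes lightweight threads, falling back to message polling when idle.

    typedef struct Thread  Thread;
    typedef struct Channel Channel;

    extern Thread  *ready_dequeue(void);           /* user-level ready queue         */
    extern void     switch_to(Thread *t);          /* context switch, no kernel trap */
    extern Channel *my_channels(void);             /* this address space's channels  */
    extern void     scan_for_replies(Channel *c);  /* from the previous sketch       */

    /* Level 1: the kernel schedules the thread running this loop.
     * Level 2: this loop schedules lightweight threads on top of it. */
    void dispatcher(void) {
        for (;;) {
            Thread *t = ready_dequeue();
            if (t != NULL)
                switch_to(t);                      /* run a lightweight thread         */
            else
                scan_for_replies(my_channels());   /* idle: look for incoming replies  */
        }
    }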

  18. URPC Performance • Cost of thread management actions, compared between URPC and Taos threads • Breakdown of the time taken by each component when no processor reallocation is needed

  19. URPC Performance - Latency • C = client processors, S = server processors, T = runnable client threads • Time for T threads to make 100,000 "Null" procedure calls • Latency is measured from the call into the Null stub until control returns from the stub

  20. URPC Performance - Throughput • C = client processors, S = server processors, T = runnable client threads • Time for T threads to make 100,000 "Null" procedure calls

  21. Conclusion • URPC represents the appropriate division of responsibility between user level and the system kernel in shared memory multiprocessor systems • Performance improves over kernel-mediated message methods • URPC demonstrates the advantages of designing system facilities for the capabilities of a multiprocessor machine, and of distinguishing a true multiprocessor OS from a uniprocessor OS that merely runs on a multiprocessor
