This paper delves into Remote Procedure Calls (RPC) and User-Level Remote Procedure Call (URPC), focusing on shared memory multiprocessors. It examines processor reallocation, data transfer, and thread management at the user level to boost performance and efficiency.
User-Level Interprocess Communication for Shared Memory Multiprocessors, by Bershad, B.N., Anderson, T.E., Lazowska, E.D., and Levy, H.M.
Introduction • RPC • Helps in implementing distributed applications by eliminating the need to build a separate communication mechanism. • A decomposed system provides failure isolation, extensibility, and modularity, so RPC is used even when caller and callee are on the same machine.
Introduction • RPC Costs • Stub overhead • Message buffer overhead (4 copies) • Access validation • Message transfer • Scheduling • Context switch • Dispatch
Introduction • LRPC Costs • Stub overhead • Message buffer overhead (1 copy) • Only necessary access validation • Message transfer • Only necessary scheduling • Context switch is minimized by using domain caching
Introduction • IPC • Main components (all handled in the kernel) • Processor reallocation (process context switch) • Data transfer • Thread management • Problems • Processor reallocation is expensive • Parallel applications need user-level thread management
URPC • User-Level Remote Procedure Call • Shared memory multiprocessors • Processor reallocation - minimize • Data transfer - user-level (Package called URPC) • Thread management - user-level (Package called FastThreads)
Processor Reallocation • Limit the frequency of processor reallocation • Why • A process context switch is more expensive than a thread context switch • Invoking the kernel has a cost of its own • Client makes a procedure call into the server address space • Kernel is invoked • Kernel reallocates the processor to the server address space • Server finishes the job • Kernel is invoked • Kernel reallocates the processor to the client address space • Client resumes its work
Processor Reallocation • Limit the frequency of processor reallocation • How • Optimistic reallocation policy, which assumes that: • The client has other work to do • The server has, or will soon have, a processor to do the job • Even a uniprocessor can delay processor reallocation • Client makes a procedure call into the server address space • Client does something else • Server finishes the job • Client resumes its work
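The optimistic sequence above can be illustrated with a toy simulation. This is a minimal sketch, not the FastThreads or URPC implementation: user-level threads are modeled as Python generators, `ToyChannel`, `client_thread`, `server_thread`, and `run` are hypothetical names, and the server's own processor is approximated by interleaving the server thread in the same round-robin loop. The point it shows is that a client thread that is waiting for a reply simply yields, so another client thread runs with no kernel involvement.

```python
from collections import deque


class ToyChannel:
    """Hypothetical stand-in for a shared-memory message channel."""

    def __init__(self):
        self.requests = deque()  # calls waiting for the server
        self.replies = {}        # call id -> result


def client_thread(name, channel, log):
    """User-level thread: send one call, then wait for its reply."""
    channel.requests.append((name, name.upper()))  # enqueue the call, no kernel trap
    log.append(f"{name}: call sent")
    while name not in channel.replies:
        yield  # block at user level so another thread can run
    log.append(f"{name}: got {channel.replies.pop(name)}")


def server_thread(channel, log, expected):
    """Server thread polling the channel (assumed to own a processor)."""
    served = 0
    while served < expected:
        if channel.requests:
            call_id, arg = channel.requests.popleft()
            channel.replies[call_id] = f"reply({arg})"
            log.append(f"server: handled {call_id}")
            served += 1
        yield


def run(threads):
    """Round-robin user-level scheduler: a thread switch is just next()."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)
            ready.append(thread)
        except StopIteration:
            pass  # thread finished


log = []
channel = ToyChannel()
run([
    client_thread("a", channel, log),
    client_thread("b", channel, log),
    server_thread(channel, log, expected=2),
])
print("\n".join(log))
```

In this run, client `b` sends its call while `a` is still waiting, which is the "client does something else" step of the optimistic policy.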
Processor Reallocation • Problems • Inappropriate situations • Single-threaded clients, real-time applications, and high-latency I/O applications • Solution: allow the client to force processor reallocation • Underpowered address space • No processor is available to handle the client's pending request • Solution: donation, in which an idle processor donates itself to the underpowered address space
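The donation idea can be sketched as a small policy function. This is only an illustration under stated assumptions: `AddressSpace`, `is_underpowered`, `choose_donation_target`, and `donate` are hypothetical names, "underpowered" is taken to mean "has pending messages but no processor", and picking the space with the most pending messages is an arbitrary tie-breaking choice, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class AddressSpace:
    name: str
    pending: int      # messages waiting to be processed (assumed bookkeeping)
    processors: int   # processors currently allocated to this space


def is_underpowered(space):
    """Assumption: underpowered = pending work and no processor to run it."""
    return space.pending > 0 and space.processors == 0


def choose_donation_target(spaces):
    """Return the underpowered space with the most pending messages, or None."""
    candidates = [s for s in spaces if is_underpowered(s)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.pending)


def donate(idle_owner, spaces):
    """Move one idle processor from idle_owner to an underpowered space."""
    target = choose_donation_target(spaces)
    if target is None:
        return None  # nobody needs the processor; keep it
    idle_owner.processors -= 1
    target.processors += 1
    return target.name


client = AddressSpace("client", pending=0, processors=2)
server = AddressSpace("server", pending=3, processors=0)
other = AddressSpace("other", pending=1, processors=0)
chosen = donate(client, [client, server, other])
print(chosen, client.processors, server.processors)
```

Here the idle client processor goes to `server`, the space with the largest backlog under the assumed rule.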
Processor Reallocation • Problems • Voluntary return of the processor • A processor working in the server may never return to the client because it is too busy working on the requests of other clients. • Solution: enforce processor reallocation when necessary, for example when a high-priority job is waiting while a low-priority job is running or a processor is idling
Processor Reallocation • LRPC vs. URPC • Domain caching looks for an idle processor in the server context • Optimistic reallocation assumes there will be an available processor in the server context and queues the request to be done later • URPC needs two levels of scheduling decisions, looking for an idle processor and for an underpowered address space, while LRPC does not.
Data Transfer • Use pair-wise shared memory to avoid the need for copying through the kernel. • Shared-memory transfer and kernel-mediated transfer give the same level of security, since data must pass through the stubs before it can be used
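A pair-wise channel of this kind can be sketched as two message queues, one per direction, each guarded by a lock that is tried rather than waited on. This is a minimal sketch under assumptions: `QueueEnd`, `Channel`, and `server_stub` are hypothetical names, Python's `threading.Lock` with a non-blocking acquire stands in for a test-and-set lock, a pair of in-process deques stands in for memory mapped into both address spaces, and the only procedure the stub accepts is an invented `add`. The stub is where incoming data is checked before use.

```python
import threading
from collections import deque


class QueueEnd:
    """One direction of the channel, guarded by a try-lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._messages = deque()

    def try_send(self, message):
        # Non-blocking acquire mimics test-and-set: never wait on the lock.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            self._messages.append(message)
            return True
        finally:
            self._lock.release()

    def try_receive(self):
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return self._messages.popleft() if self._messages else None
        finally:
            self._lock.release()


class Channel:
    """Pair-wise channel between one client and one server."""

    def __init__(self):
        self.to_server = QueueEnd()
        self.to_client = QueueEnd()


def server_stub(channel):
    """Take one request, validate it in the stub, and send a reply."""
    request = channel.to_server.try_receive()
    if request is None:
        return False  # nothing to do right now
    procedure, args = request
    valid = (
        procedure == "add"
        and len(args) == 2
        and all(isinstance(a, int) for a in args)
    )
    if valid:
        channel.to_client.try_send(("ok", args[0] + args[1]))
    else:
        channel.to_client.try_send(("error", "bad request"))
    return True


channel = Channel()
channel.to_server.try_send(("add", (2, 3)))
server_stub(channel)
good_reply = channel.to_client.try_receive()

channel.to_server.try_send(("delete", ()))
server_stub(channel)
bad_reply = channel.to_client.try_receive()
print(good_reply, bad_reply)
```

Because the stub rejects the malformed request before acting on it, no kernel copy is needed to protect the server in this sketch.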
Thread Management • Arguments • Fine-grained parallel applications need high-performance thread management, which can only be achieved by implementing it at user level • Communication and thread management can achieve very good performance when both are implemented at user level
Thread Management • Kernel features such as time slicing degrade application performance • Invoking thread management operations requires kernel traps • A thread management policy implemented in the kernel is unlikely to be efficient for all parallel applications
Thread Management • Threads block in order to • Synchronize their activities within the same address space • Wait for external events from a different address space • Communication implemented at kernel level results in synchronization at both user level and kernel level
Performance • Thread management is faster at user level • Component breakdown
Performance • Call latency and throughput are at their worst when S = 0
Conclusion • Move as much functionality as possible from the kernel to user level to improve performance • To achieve high performance on multiprocessors, the system needs to be designed to support this functionality