
User-Level Interprocess Communication for Shared Memory Multiprocessors



Presentation Transcript


  1. User-Level Interprocess Communication for Shared Memory Multiprocessors Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy presented by: Joe Rosinski

  2. Introduction • Trends at the time of the research • Modularity: “decomposed” systems favored over “monolithic” ones • Increased fault protection and extensibility • IPC increasingly critical in operating system design • Prevalence of high-performance user-level thread packages • Together, these trends point to performance problems

  3. Problems • System calls are being spread across kernel modules • All IPC passes through the kernel • As modularity increases, kernel traffic will increase • LRPC results suggest performance has peaked under this model (~70% of the overhead is kernel related) • Degraded performance and added complexity when user-level threads communicate across address-space boundaries

  4. Solution For Shared Memory Multiprocessors • URPC (User-level Remote Procedure Call) • Remove the kernel from cross-address space communication • User-level management of cross-address space communication • Use shared memory to send messages directly between address spaces

  5. URPC • Separates IPC into 3 components • (a) Data transfer • (b) Thread management • (c) Processor reallocation • Goals • Move (a) and (b) to user level • Limit the kernel’s role to (c)

  6. Data Transfer • Logical channels of pair-wise shared memory • Channels created & mapped once for every client/server pairing • Channels are bidirectional • Test-and-set locks (TSL) control access in each direction • Just as secure as going through the kernel
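The channel scheme above can be sketched in a few lines. This Python model (class and method names are hypothetical, not from the paper) stands in for the pair-wise mapped memory region, with a test-and-set spinlock guarding each direction so no kernel primitive is needed:

```python
class TestAndSetLock:
    """Models a hardware test-and-set spinlock (no kernel involvement)."""
    def __init__(self):
        self._held = False

    def test_and_set(self):
        old, self._held = self._held, True  # a single atomic instruction on real hardware
        return old

    def acquire(self):
        while self.test_and_set():          # spin until the lock is free
            pass

    def release(self):
        self._held = False


class Channel:
    """Bidirectional channel shared by one client/server pair:
    one locked message queue per direction."""
    def __init__(self):
        self.to_server, self.to_client = [], []
        self.server_lock, self.client_lock = TestAndSetLock(), TestAndSetLock()

    def send(self, queue, lock, msg):
        lock.acquire()
        try:
            queue.append(msg)               # enqueue into the shared region
        finally:
            lock.release()

    def receive(self, queue, lock):
        lock.acquire()
        try:
            return queue.pop(0) if queue else None
        finally:
            lock.release()
```

Because each queue is mapped only into the two address spaces of the pairing, corrupting it can harm no one but that client/server pair, which is the sense in which it is "just as secure" as kernel-mediated transfer.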

  7. Data & Security • Applications access URPC procedures through a stub layer • Stubs marshal arguments and unmarshal data into procedure parameters • Stubs copy data in/out; the application never uses shared memory directly • Arguments are passed in buffers that are allocated and pair-wise mapped during binding • Data queues are monitored by application-level thread management
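The stub layer's copy-in/copy-out discipline can be sketched as follows. This is a Python sketch, not the paper's actual stub generator: `shared_buffer` stands in for the pair-wise mapped argument buffer, and the `add` procedure is a made-up example:

```python
import pickle

# Stand-in for the argument buffer allocated and pair-wise mapped at bind time.
shared_buffer = []

def server_stub():
    """Server-side stub: unmarshals the request into procedure parameters,
    invokes the procedure, and marshals the result back into the buffer."""
    proc, args = pickle.loads(shared_buffer.pop())
    procedures = {"add": lambda x, y: x + y}   # hypothetical exported procedure
    shared_buffer.append(pickle.dumps(procedures[proc](*args)))

def client_stub_add(a, b):
    """Client-side stub for the hypothetical 'add' procedure: copies the
    arguments into the shared buffer, waits for the reply, and copies the
    result back out. The caller never touches shared memory."""
    shared_buffer.append(pickle.dumps(("add", (a, b))))  # marshal (copy in)
    server_stub()        # stand-in for the server picking up the message
    return pickle.loads(shared_buffer.pop())             # unmarshal (copy out)
```

Copying through the stubs means a misbehaving peer can at worst corrupt the message data, never the application's own pointers or state.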

  8. Thread Management • LRPC: client threads effectively cross address spaces into the server • URPC: always try to reschedule another thread within the same address space • Synchronous from the programmer’s point of view, but asynchronous at the thread-management level • Switching threads within the same address space requires far less overhead than processor reallocation
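The "reschedule another thread instead of blocking" idea can be illustrated with a toy user-level scheduler. This is a sketch under simplifying assumptions (threads modeled as Python generators that yield where a real thread would wait on a URPC reply; all names are hypothetical):

```python
from collections import deque

def run(threads):
    """Toy user-level scheduler: when a thread would block on a pending
    call, switch to another ready thread in the same address space
    rather than giving up the processor."""
    ready = deque(threads)
    trace = []
    while ready:
        thread = ready.popleft()
        try:
            trace.append(next(thread))  # run until the thread waits on a call
            ready.append(thread)        # call pending: requeue it, run another
        except StopIteration:
            pass                        # thread finished
    return trace

def worker(name):
    """A thread that makes one synchronous-looking URPC call."""
    yield f"{name}: sent request"       # would block in a kernel-based RPC
    yield f"{name}: got reply"
```

The call is synchronous from each worker's point of view, yet the processor interleaves both workers' calls, which is exactly the asynchrony hidden inside the thread-management layer.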

  9. Processor Reallocation • Switching a processor between threads of different address spaces • Requires privileged kernel mode to access protected mapping registers • Incurs significant overhead, as pointed out in the LRPC paper • Therefore, URPC strives to avoid processor reallocation • This avoidance can lead to substantial performance gains

  10. Sample Execution Timeline • [Timeline figure: under the optimistic reallocation scheduling policy, pending outgoing messages are detected → a processor is donated → to the “underpowered” FCMgr]

  11. Optimistic Scheduling Policy • Assumptions • The client has other work to do • The server will soon have a processor available to service a message • Doesn’t perform well in all situations • Uniprocessors • Real-time applications • High-latency I/O operations (which require early initialization) • Priority invocations • URPC allows forced processor reallocation to address some of these problems
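The policy's decision logic can be summarized in a short sketch (hypothetical function and labels, assuming the paper's preference ordering: cheap same-address-space switch first, kernel-mediated donation only as a fallback):

```python
def schedule(ready_threads, outgoing_pending):
    """Sketch of the optimistic reallocation policy: prefer running
    another ready thread (a cheap, user-level switch); donate the
    processor to the underpowered server via the kernel only when this
    address space has nothing else to run but still has pending calls."""
    if ready_threads:
        return ("switch", ready_threads[0])  # same-address-space context switch
    if outgoing_pending:
        return ("donate", "server")          # kernel-mediated processor reallocation
    return ("idle", None)                    # nothing to run, nothing pending
```

A forced reallocation, as the slide notes, would bypass the first branch and donate immediately, which is what uniprocessor, real-time, or priority cases need.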

  12. Performance Note: Table II results are independent of load

  13. Performance • Latency is proportional to the number of threads per processor • With T = C = S = 1, call latency is 93 microseconds • With T = 2, C = 1, S = 1, latency increases to 112 microseconds, but throughput rises 75% (the benefit of parallelism) • Effective call latency drops to 53 microseconds • C = 1, S = 0 gives the worst performance • For both latency and throughput, C = 2, S = 2 yields the best performance
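The 75% figure and the 53-microsecond "effective" latency are two views of the same measurement: with two overlapping calls, throughput is the reciprocal of the effective per-call latency. A quick check of the slide's arithmetic (figures from the slide, interpretation mine):

```python
# T = C = S = 1: one call at a time, 93 us each.
single_thread_latency = 93   # microseconds
# T = 2: calls overlap; measured throughput corresponds to one
# completed call every 53 us, even though each call takes 112 us.
effective_latency = 53       # microseconds per completed call

throughput_gain_pct = round((single_thread_latency / effective_latency - 1) * 100)
print(throughput_gain_pct)   # ~75% more calls per second than the single-thread case
```

So the second thread raises each call's latency (93 → 112 us) while lowering the average cost per completed call (93 → 53 us), which is the parallelism trade-off the slide describes.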

  14. Performance • Worst-case URPC call latency for one thread is 375 microseconds • On similar hardware, LRPC call latency is 157 microseconds • Reasons: • URPC requires two-level scheduling • URPC’s low-level (processor) scheduling still goes through the kernel, as in LRPC • A small price considering the possible gains; it is necessary to enable high-level, user-level scheduling

  15. Conclusions • Performance gains come from moving features out of the kernel, not into it • URPC represents an appropriate division of responsibility for operating system kernels on shared-memory multiprocessors • URPC showcases a design built specifically for a multiprocessor, not just a uniprocessor design that happens to run on multiprocessor hardware
