
Improving IPC by Kernel Design. Jochen Liedtke. Proceedings of the 14th ACM Symposium on Operating Systems Principles, Asheville, North Carolina, 1993. The Performance of µ-Kernel-Based Systems. H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter


Presentation Transcript


Improving IPC by Kernel Design

Jochen Liedtke

Proceedings of the 14th ACM Symposium on Operating Systems Principles

Asheville, North Carolina

1993


The Performance of µ-Kernel-Based Systems

H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter

Proceedings of the 16th ACM Symposium on Operating Systems Principles

October 1997, pp. 66-77


Jochen Liedtke (1953 – 2001)

  • 1977 – Diploma in Mathematics from the University of Bielefeld.

  • 1984 – Moved to GMD (German National Research Center). Built L3. Known for overcoming IPC performance hurdles.

  • 1996 – IBM T.J. Watson Research Center. Developed L4, a 12 KB second-generation microkernel.


The IPC Dilemma

  • IPC is a core paradigm of µ-kernel architectures

  • Most IPC implementations perform poorly

  • Really fast message passing is needed to run device drivers and other performance-critical components at user level.

  • Result: programmers circumvent IPC, co-locating device drivers in the kernel and defeating the main purpose of the microkernel architecture


What to Do?

  • Optimize IPC performance above all else!

  • Results: L3 and L4, second-generation micro-kernel-based operating systems

  • Many clever optimizations, but no single “silver bullet”


Summary of Techniques

Seventeen techniques in total


Standard System Calls (Send/Recv)

  • Client (sender): send() system call enters and exits the kernel; the client is not blocked.

  • Server (receiver): receive() system call enters and exits the kernel.

  • Server: send() system call (the reply) enters and exits the kernel.

  • Client: receive() system call (collecting the reply) enters and exits the kernel.

  • Kernel entered/exited four times per call!


New Call/Response-based System Calls

  • Special system calls for RPC-style interaction

  • Client (sender): call() system call enters the kernel; the kernel allocates the CPU to the server and suspends the client.

  • Server (receiver): having been waiting in reply_and_recv_next(), it exits the kernel and handles the message.

  • Server: reply_and_recv_next() enters the kernel, sends the reply, and waits for the next message; the kernel re-allocates the CPU to the client.

  • Client: resumes from being suspended and exits the kernel.

  • Kernel entered and exited only twice per call!
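The difference in kernel crossings between the two designs can be sketched as a toy count. This is a hypothetical Python model, not L3/L4 code: `KernelTrace`, `rpc_with_send_recv`, `rpc_with_call_reply` and `kernel_entries_per_rpc` are illustrative names, and only the system call names `send`, `receive`, `call` and `reply_and_recv_next` come from the slides. It assumes the server is already blocked in `reply_and_recv_next()` when the client calls.

```python
from typing import Callable, List


class KernelTrace:
    """Hypothetical model: records every system call, i.e. every kernel entry/exit pair."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def syscall(self, name: str) -> None:
        # Each system call enters the kernel once and exits it once.
        self.calls.append(name)


def rpc_with_send_recv(k: KernelTrace) -> None:
    """One RPC round trip built from separate send/receive primitives."""
    k.syscall("send")     # client sends the request (client is not blocked)
    k.syscall("receive")  # server collects the request
    k.syscall("send")     # server sends the reply
    k.syscall("receive")  # client collects the reply


def rpc_with_call_reply(k: KernelTrace) -> None:
    """One RPC round trip with combined primitives (server already waiting)."""
    k.syscall("call")                 # client sends and waits for the reply in one trap
    k.syscall("reply_and_recv_next")  # server replies and waits for the next request in one trap


def kernel_entries_per_rpc(rpc: Callable[[KernelTrace], None]) -> int:
    k = KernelTrace()
    rpc(k)
    return len(k.calls)


print(kernel_entries_per_rpc(rpc_with_send_recv))   # 4
print(kernel_entries_per_rpc(rpc_with_call_reply))  # 2
```

In steady state, the server's single `reply_and_recv_next()` both finishes one request and picks up the next one, which is why it counts only once per round trip.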


Complex Message Structure

Batching IPC

Combine a sequence of send operations into a single operation by supporting complex messages

  • Benefit: reduces the number of sends.
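A minimal sketch of why batching helps, assuming each send costs exactly one kernel entry no matter how many parts the message carries. `ToyKernel`, `send_separately` and `send_batched` are hypothetical names; the real L3 complex-message format (inline data, strings, mappings) is not modeled here.

```python
from typing import List


class ToyKernel:
    """Hypothetical model of a kernel that accepts multi-part ("complex") messages."""

    def __init__(self) -> None:
        self.kernel_entries = 0
        self.delivered: List[bytes] = []

    def send(self, parts: List[bytes]) -> None:
        # One send = one kernel entry, however many parts the message carries.
        self.kernel_entries += 1
        self.delivered.extend(parts)


def send_separately(k: ToyKernel, parts: List[bytes]) -> None:
    for part in parts:  # one send per part
        k.send([part])


def send_batched(k: ToyKernel, parts: List[bytes]) -> None:
    k.send(parts)  # a single complex message carries every part


parts = [b"header", b"payload", b"trailer"]
a, b = ToyKernel(), ToyKernel()
send_separately(a, parts)
send_batched(b, parts)
print(a.kernel_entries, b.kernel_entries)  # 3 1
```

Both kernels deliver the same bytes in the same order; only the number of kernel entries differs.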


Direct Transfer by Temporary Mapping

  • Naïve message transfer copies twice: from sender to kernel, then from kernel to receiver

  • Optimizing the transfer by sharing memory between sender and receiver is not secure

  • L3 supports single-copy transfers by temporarily mapping the receiver's target region into a communication window in the sender's address space.
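The two transfer paths can be contrasted in a short sketch. Python cannot remap pages, so a `memoryview` over the receiver's buffer stands in for the temporarily mapped communication window; this is a simulation under that assumption, and `two_copy_transfer` and `single_copy_transfer` are hypothetical names that return how many copies they made.

```python
def two_copy_transfer(src: bytes, dst: bytearray) -> int:
    """Naive path: sender -> kernel buffer -> receiver. Returns the number of copies made."""
    if len(src) > len(dst):
        raise ValueError("receiver buffer too small")
    kernel_buf = bytearray(src)          # copy 1: sender into a kernel buffer
    dst[: len(kernel_buf)] = kernel_buf  # copy 2: kernel buffer into the receiver
    return 2


def single_copy_transfer(src: bytes, dst: bytearray) -> int:
    """Simulated temporary-mapping path. Returns the number of copies made.

    The memoryview plays the role of the communication window: a temporary
    alias of the receiver's target region that exists only during the copy.
    """
    if len(src) > len(dst):
        raise ValueError("receiver buffer too small")
    window = memoryview(dst)[: len(src)]  # "map" the receiver's target region
    window[:] = src                       # the only copy: data lands directly in receiver memory
    window.release()                      # "unmap" the window
    return 1


d1, d2 = bytearray(8), bytearray(8)
print(two_copy_transfer(b"hello", d1), single_copy_transfer(b"hello", d2))  # 2 1
```

Both receivers end up with identical contents; the single-copy path skips the intermediate kernel buffer, which is the point of the temporary mapping.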


Scheduling

  • Conventionally, the IPC operations call and reply & receive require these scheduling actions:

    • Delete the sending thread from the ready queue.

    • Insert the sending thread into the waiting queue.

    • Delete the receiving thread from the waiting queue.

    • Insert the receiving thread into the ready queue.

  • These operations, together with four expected TLB misses, take at least 1.2 µs (23% of T).


Solution, Lazy Scheduling

  • Don’t bother updating the scheduler queues!

  • Instead, delay the movement of threads among queues until the queues are queried.

  • Why?

    • A sending thread that blocks will soon unblock again, and maybe nobody will ever notice that it blocked.

  • Lazy scheduling is achieved by setting state flags (ready / waiting) in the thread control blocks (TCBs)
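The lazy-scheduling idea can be sketched as follows, assuming the receiver runs next via direct handoff so it never needs to be enqueued during IPC. This is a simplified hypothetical model, not L3's data structures: `TCB`, `LazyScheduler`, `make_runnable`, `ipc_handoff`, `pick_next` and the `queue_ops` counter are illustrative names, and the waiting queue is represented by the state flag alone.

```python
from collections import deque
from typing import Deque, Optional


class TCB:
    """Minimal thread control block with a ready/waiting state flag (hypothetical model)."""

    def __init__(self, name: str, ready: bool = True) -> None:
        self.name = name
        self.ready = ready  # state flag: True = ready, False = waiting
        self.in_ready_queue = False


class LazyScheduler:
    def __init__(self) -> None:
        self.ready_queue: Deque[TCB] = deque()
        self.queue_ops = 0  # actual insertions/removals on the ready queue

    def make_runnable(self, t: TCB) -> None:
        t.ready = True
        if not t.in_ready_queue:
            self.ready_queue.append(t)
            t.in_ready_queue = True
            self.queue_ops += 1

    def ipc_handoff(self, sender: TCB, receiver: TCB) -> TCB:
        # Lazy: only flip the flags. The blocked sender may stay in the
        # ready queue, and the receiver runs next without being enqueued.
        sender.ready = False
        receiver.ready = True
        return receiver

    def pick_next(self) -> Optional[TCB]:
        # The queue is fixed up only when it is queried: entries whose
        # flag says "waiting" are dropped here instead of at IPC time.
        while self.ready_queue:
            t = self.ready_queue.popleft()
            t.in_ready_queue = False
            self.queue_ops += 1
            if t.ready:
                return t
        return None


sched = LazyScheduler()
client, server = TCB("client"), TCB("server", ready=False)
sched.make_runnable(client)            # 1 queue operation
for _ in range(10):                    # ten call/reply round trips
    sched.ipc_handoff(client, server)  # call: client blocks, server runs
    sched.ipc_handoff(server, client)  # reply: server blocks, client runs
print(sched.queue_ops)                 # 1
```

With the eager scheme listed above, the same twenty IPC operations would cost four queue operations each; here the client blocks and unblocks twenty times without the ready queue ever being touched.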


Pass Short Messages in Registers

  • Most messages are very short: 8 bytes (plus 8 bytes of sender ID)

    • E.g., ack/error replies from device drivers or hardware-initiated interrupt messages.

  • Transfer short messages via CPU registers.

  • Performance gain of 2.4 µs, or 48% of T.
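A sketch of the register fast path, assuming a 32-bit machine where an 8-byte payload fits in two 32-bit registers. Python has no registers, so two plain integers stand in for them, and the sender ID is not modeled; `pack_into_registers`, `unpack_from_registers`, `transfer` and `REGISTER_BYTES` are hypothetical names.

```python
import struct
from typing import Tuple

REGISTER_BYTES = 8  # assumption: payload fits in two 32-bit registers


def pack_into_registers(msg: bytes) -> Tuple[int, int, int]:
    """Fast path: split a short message into two 32-bit 'register' values plus its length."""
    if len(msg) > REGISTER_BYTES:
        raise ValueError("message too long for the register path")
    lo, hi = struct.unpack("<II", msg.ljust(REGISTER_BYTES, b"\x00"))
    return lo, hi, len(msg)


def unpack_from_registers(lo: int, hi: int, length: int) -> bytes:
    """Receiver side: rebuild the message bytes from the two register values."""
    return struct.pack("<II", lo, hi)[:length]


def transfer(msg: bytes) -> Tuple[str, bytes]:
    """Choose the register path for short messages, a memory copy otherwise."""
    if len(msg) <= REGISTER_BYTES:
        return "registers", unpack_from_registers(*pack_into_registers(msg))
    return "memory", bytes(bytearray(msg))  # stand-in for the buffer-copy path


print(transfer(b"ACK"))        # ('registers', b'ACK')
print(transfer(b"x" * 64)[0])  # memory
```

The design choice being illustrated is the size check: messages that fit skip buffer handling entirely, which is what makes the common ack/error case cheap.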


Impact on IPC Performance

  • For an 8-byte message, IPC time for L3 is 5.2 µs compared with 115 µs for Mach, a 22-fold improvement.

  • For large messages (4 KB), a 3-fold improvement is seen.
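The 22-fold figure follows directly from the two 8-byte timings quoted above:

```latex
\frac{T_{\text{Mach}}}{T_{\text{L3}}} = \frac{115\,\mu\text{s}}{5.2\,\mu\text{s}} \approx 22.1
```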


Relative Importance of Techniques

  • Quantifiable impact of techniques

    • For example, 49% means that removing that item would increase IPC time by 49%.
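Written as a formula, this is a restatement of the definition above; the symbols are introduced here for illustration, with T_all the IPC time with every technique in place and T_{-i} the IPC time with technique i removed:

```latex
\Delta_i = \frac{T_{-i} - T_{\text{all}}}{T_{\text{all}}} \times 100\,\%,
\qquad
\Delta_i = 49\,\% \;\Longleftrightarrow\; T_{-i} = 1.49\,T_{\text{all}}
```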


OS and Application-Level Performance


OS-Level Performance


Application-Level Performance


Conclusion

  • Use a synergistic approach to improve IPC performance

    • A thorough understanding of hardware/software interaction is required

    • There is no “silver bullet”

  • IPC performance can be improved by a factor of 10

  • … but even so, a micro-kernel-based OS will not be as fast as an equivalent monolithic OS

    • L4-based Linux outperforms Mach-based Linux, but not monolithic Linux

