
Improving IPC by Kernel Design



Improving IPC by Kernel Design. Jochen Liedtke. Proceedings of the 14th ACM Symposium on Operating Systems Principles, Asheville, North Carolina, 1993. The Performance of µ-Kernel-Based Systems. H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter.


Presentation Transcript



Improving IPC by Kernel Design

Jochen Liedtke

Proceedings of the 14th ACM Symposium on Operating Systems Principles

Asheville, North Carolina

1993



The Performance of µ-Kernel-Based Systems

H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter

Proceedings of the 16th Symposium on Operating Systems Principles

October 1997, pp. 66-77



Jochen Liedtke (1953 – 2001)

  • 1977 – Diploma in Mathematics from the University of Bielefeld.

  • 1984 – Moved to GMD (German National Research Center). Built L3. Known for overcoming IPC performance hurdles.

  • 1996 – IBM T.J. Watson Research Center. Developed L4, a 12 KB second-generation microkernel.



The IPC Dilemma

  • IPC is a core paradigm of µ-kernel architectures

  • Most IPC implementations perform poorly

  • Very fast message passing is needed to run device drivers and other performance-critical components at user level.

  • Result: programmers circumvent IPC, co-locating device drivers in the kernel and defeating the main purpose of the microkernel architecture



What to Do?

  • Optimize IPC performance above all else!

  • Result: L3 and L4, second-generation microkernel-based operating systems

  • Many clever optimizations, but no single “silver bullet”



Summary of Techniques

Seventeen Total


Standard System Calls (Send/Recv)

Kernel entered/exited four times per call!

  • Client (Sender): send ( ); system call, enter kernel, exit kernel

  • Server (Receiver): receive ( ); system call, enter kernel, exit kernel

  • Client is not blocked

  • Server (Receiver): send ( ); system call, enter kernel, exit kernel

  • Client (Sender): receive ( ); system call, enter kernel, exit kernel


New Call/Response-based System Calls

Special system calls for RPC-style interaction

Kernel entered and exited only twice per call!

Client (Sender)

  • call ( ); system call, enter kernel

  • Allocate CPU to server

  • Suspend

  • Re-allocate CPU to client

  • Exit kernel

Server (Receiver)

  • reply_and_recv_next ( ); resume from being suspended, exit kernel

  • Handle message

  • reply_and_recv_next ( ); enter kernel, send reply, wait for next message
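
The difference between the two system-call styles can be sketched as a toy count of kernel entries per round trip. This is only an illustration: the `Kernel` class and the two `rpc_*` functions are hypothetical names, and nothing here models the real L3 kernel beyond counting one entry/exit per system call.

```python
class Kernel:
    """Toy kernel that only counts how often user code enters it."""

    def __init__(self):
        self.entries = 0

    def syscall(self, name):
        # Every system call is one kernel entry followed by one kernel exit.
        self.entries += 1
        return name


def rpc_with_send_recv(kernel):
    """One client/server round trip using separate send and receive calls."""
    kernel.syscall("client: send request")
    kernel.syscall("server: receive request")
    kernel.syscall("server: send reply")
    kernel.syscall("client: receive reply")


def rpc_with_call_reply(kernel):
    """One round trip using the combined calls named on the slide."""
    kernel.syscall("client: call")                 # send, then wait for the reply
    kernel.syscall("server: reply_and_recv_next")  # reply, then wait for the next request


old, new = Kernel(), Kernel()
rpc_with_send_recv(old)
rpc_with_call_reply(new)
print(old.entries, new.entries)  # 4 2
```

In steady state the server's single `reply_and_recv_next` both answers the current request and waits for the next one, which is why one call per side is enough.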



Complex Message Structure

Batching IPC

Combine a sequence of send operations into a single operation by supporting complex messages

  • Benefit: reduces number of sends.
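
A minimal sketch of the batching idea, assuming a hypothetical `ComplexMessage` container and a `Channel` that merely counts send operations. The actual L3 message layout is not given in the slides and is not modelled here.

```python
from dataclasses import dataclass, field


@dataclass
class ComplexMessage:
    """Hypothetical container: several parts delivered by a single send."""
    parts: list = field(default_factory=list)


class Channel:
    """Toy IPC channel that counts how many send operations were issued."""

    def __init__(self):
        self.sends = 0
        self.delivered = []

    def send(self, message):
        self.sends += 1
        self.delivered.append(message)


parts = [b"header", b"payload", b"trailer"]

# Without complex messages: one send per part.
unbatched = Channel()
for part in parts:
    unbatched.send(part)

# With complex messages: all parts travel in one send.
batched = Channel()
batched.send(ComplexMessage(parts=list(parts)))

print(unbatched.sends, batched.sends)  # 3 1
```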



Direct Transfer by Temporary Mapping

  • Naïve message transfer: copy from sender to kernel then from kernel to receiver

  • Optimizing transfer by sharing memory between sender and receiver is not secure

  • L3 supports single-copy transfers by temporarily mapping a communication window into the sender's address space.
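
The copy counts can be illustrated with a small sketch. It assumes the communication window behaves like a temporary alias of the receiver's buffer, modelled here with a `memoryview`; the function names are hypothetical, and real address-space mapping is not represented.

```python
def two_copy_transfer(sender_buf):
    """Naive path: sender -> kernel buffer -> receiver."""
    kernel_buf = bytes(sender_buf)          # copy 1: sender to kernel
    receiver_buf = bytearray(kernel_buf)    # copy 2: kernel to receiver
    return receiver_buf, 2


def single_copy_transfer(sender_buf, receiver_buf):
    """Window path: the receiver's buffer is aliased for the duration of the transfer."""
    window = memoryview(receiver_buf)       # stands in for the temporary mapping
    window[:len(sender_buf)] = sender_buf   # the only copy: sender to receiver
    window.release()                        # stands in for removing the mapping
    return 1


message = b"hello"

naive_result, naive_copies = two_copy_transfer(message)

receiver = bytearray(8)
window_copies = single_copy_transfer(message, receiver)

print(naive_copies, window_copies)  # 2 1
```

Because the alias exists only while the transfer runs, the sender and receiver do not end up with permanently shared memory, which is the security concern raised in the second bullet.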



Scheduling

  • Conventionally, the IPC operations call and reply & receive require scheduling actions:

    • Delete the sending thread from the ready queue.

    • Insert the sending thread into the waiting queue.

    • Delete the receiving thread from the waiting queue.

    • Insert the receiving thread into the ready queue.

  • These operations, together with four expected TLB misses, take at least 1.2 µs (23% T).



Solution: Lazy Scheduling

  • Don’t bother updating the scheduler queues!

  • Instead, delay the movement of threads among queues until the queues are queried.

  • Why?

    • A sending thread that blocks will soon unblock again, and maybe nobody will ever notice that it blocked

  • Lazy scheduling is achieved by setting state flags (ready / waiting) in the Thread Control Blocks
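
A minimal sketch of lazy scheduling under stated assumptions: the `TCB` and `LazyScheduler` classes, the `queued` flag, and the `queue_updates` counter are hypothetical names introduced for illustration, and only a ready queue is modelled. Blocking and unblocking touch just the state flag; stale entries are removed when the queue is actually queried.

```python
from collections import deque

READY, WAITING = "ready", "waiting"


class TCB:
    """Thread control block holding the state flag the slide mentions."""

    def __init__(self, name):
        self.name = name
        self.state = READY
        self.queued = True  # hypothetical flag: is this TCB in the ready queue?


class LazyScheduler:
    def __init__(self, threads):
        self.ready_queue = deque(threads)
        self.queue_updates = 0  # counts real queue insertions and deletions

    def block(self, tcb):
        tcb.state = WAITING  # flag only; the ready queue is left untouched

    def unblock(self, tcb):
        tcb.state = READY
        if not tcb.queued:  # re-insert only if a query already removed it
            self.ready_queue.append(tcb)
            tcb.queued = True
            self.queue_updates += 1

    def pick_next(self):
        # The queue is being queried, so stale (waiting) entries are dropped now.
        while self.ready_queue:
            tcb = self.ready_queue[0]
            if tcb.state == READY:
                return tcb
            self.ready_queue.popleft()
            tcb.queued = False
            self.queue_updates += 1
        return None


client, server = TCB("client"), TCB("server")
sched = LazyScheduler([client, server])

sched.block(client)    # client waits for a reply
sched.unblock(client)  # reply arrives before the queue is queried
updates_after_quick_ipc = sched.queue_updates  # no queue work was needed

sched.block(client)
picked = sched.pick_next()  # query: the stale client entry is removed here
```

In the first block/unblock pair nobody queries the queue in between, so no queue operation is performed at all, which is the case the "Why?" bullet describes.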



Pass Short Messages in Registers

  • Most messages are very short: 8 bytes (plus 8 bytes of sender ID)

    • E.g., ack/error replies from device drivers or hardware-initiated interrupt messages.

  • Transfer short messages via CPU registers.

  • Performance gain of 2.4 µs, or 48% T.
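
As a rough illustration of why 8 bytes fit in registers, the sketch below packs a short payload into two 32-bit words and unpacks it again. The two-register layout, the little-endian byte order, and the function names are assumptions made for the example, not details taken from the slides.

```python
import struct

MAX_REGISTER_PAYLOAD = 8  # bytes of message data, as stated on the slide


def to_registers(payload):
    """Pack a short payload into two 32-bit register values (assumed layout)."""
    if len(payload) > MAX_REGISTER_PAYLOAD:
        raise ValueError("payload too long for the register path")
    padded = payload.ljust(MAX_REGISTER_PAYLOAD, b"\x00")
    return struct.unpack("<2I", padded)


def from_registers(registers, length):
    """Recover the original payload bytes from the two register values."""
    return struct.pack("<2I", *registers)[:length]


ack = b"ACK"
regs = to_registers(ack)
restored = from_registers(regs, len(ack))
```

A payload longer than 8 bytes raises `ValueError` here; in a real kernel such a message would take the memory-copy path instead.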



Impact on IPC Performance

  • For an eight-byte message, IPC time for L3 is 5.2 µs compared to 115 µs for Mach, a 22-fold improvement.

  • For large messages (4 KB), a 3-fold improvement is seen.



Relative Importance of Techniques

  • Quantifiable impact of techniques

    • A figure of 49% means that removing that technique would increase IPC time by 49%.
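
The percentages can be read with a short worked calculation. It assumes the baseline is the 5.2 µs eight-byte IPC time quoted on the previous slide; the slides do not say which measurement the percentages refer to, so the baseline is an assumption, and the function name is hypothetical.

```python
def ipc_time_without(baseline_us, importance_percent):
    """IPC time if a technique with the given importance were removed."""
    return baseline_us * (1 + importance_percent / 100)


baseline_us = 5.2  # assumed baseline: L3 eight-byte IPC time from the slides
slower_us = ipc_time_without(baseline_us, 49)

# Cross-check of the quoted 22-fold improvement over Mach (115 µs).
speedup_vs_mach = 115 / baseline_us

print(round(slower_us, 2), round(speedup_vs_mach, 1))  # 7.75 22.1
```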



OS and Application-Level Performance



OS-Level Performance



Application-Level Performance



Conclusion

  • Use a synergistic approach to improve IPC performance

    • A thorough understanding of hardware/software interaction is required

    • No “silver bullet”

  • IPC performance can be improved by a factor of 10

  • … but even so, a micro-kernel-based OS will not be as fast as an equivalent monolithic OS

    • L4-based Linux outperforms Mach-based Linux, but not monolithic Linux

