Improving IPC by Kernel Design • By Jochen Liedtke, German National Research Center for Computer Science • Presented by Srinivas Sundaravaradan
MACH • µ-kernel system based on message passing • Over 5000 cycles to transfer a short message • Buffers IPC messages in the kernel • L3 • Similar to MACH • Hardware interrupts delivered as messages • No ports
Design Philosophy • Focus on IPC • Any feature that increases IPC cost must be closely evaluated • When in doubt, design in favor of IPC • Design for Performance • A poorly performing technique is unacceptable • Evaluate feature cost against a concrete baseline • Aim for a concrete performance goal • Comprehensive Design • Consider the synergistic effects of all methods and techniques • Cover all levels of implementation, from design to code
Making IPC Faster • Fewer kernel invocations • Call / Reply & Receive Next • Combining messages • Faster operations • 15 other optimizations • Architectural level • Use the redesign of L3 as an opportunity to change the kernel design
Methodology • Theoretical minimum • Null message between address spaces, with the receiver already waiting to receive • 107 cycles to enter and leave the kernel • 45 cycles for unavoidable TLB misses • Plus other unavoidable costs, for a minimum of 172 cycles • Goal • 350 cycles • Achieved 250 cycles = T
Minimize System Calls • Why minimize system calls? • System-call overhead is roughly 60% of T • Traditional IPC • 4 system calls per RPC • Solution • Call (send request + receive reply) • Reply & Receive Next (send reply + receive next request)
Minimize System Calls • Client: Call — issues the request and blocks awaiting the reply, in a single system call (replacing separate Send and Receive(reply)) • Server: blocked in Receive, unblocked by the request; after handling it, Reply & Receive Next sends the reply and waits for the next request in a single system call (replacing Send(reply) and Receive(next)) • Client: unblocked by the reply
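The slide above can be sketched as a toy accounting model. This is a minimal Python sketch, not L3's actual API: `call` and `reply_and_receive_next` are illustrative stand-ins that only count kernel entries, showing how combining send+receive halves the system calls per RPC.

```python
# Toy model counting kernel entries per RPC (names are illustrative, not L3's API).

class KernelModel:
    def __init__(self):
        self.syscalls = 0  # number of kernel entries

    # -- traditional IPC: four separate kernel entries per RPC --
    def send(self): self.syscalls += 1
    def receive(self): self.syscalls += 1

    # -- combined primitives: two kernel entries per RPC --
    def call(self):                    # client: send request + receive reply
        self.syscalls += 1
    def reply_and_receive_next(self):  # server: send reply + receive next request
        self.syscalls += 1

def traditional_rpc(k):
    k.send()     # client sends request
    k.receive()  # server receives request
    k.send()     # server sends reply
    k.receive()  # client receives reply

def combined_rpc(k):
    k.call()                    # client side
    k.reply_and_receive_next()  # server side

old, new = KernelModel(), KernelModel()
traditional_rpc(old)
combined_rpc(new)
print(old.syscalls, new.syscalls)  # → 4 2
```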
A Complex Message • Direct String • Data to be transferred directly from the send buffer to the receive buffer • Indirect String • Location and size of data to be transferred by reference • Memory Object • Description of a region of memory to be mapped into the receiver's address space (shared memory)
Ways of Message Transfer • Twofold message copy • user space A -> kernel space -> user space B • LRPC mechanism • shares user-level memory between client and server • raises security concerns • does not support variable-to-variable transfer
Temporary Mapping • Twofold message copy costs 20 + 0.75n cycles • Instead, L3 copies the data only once, through a special communication window in kernel space • The kernel maps the receiver's target region into the window (by copying a page-directory entry, with kernel-only permission) for the duration of the transfer
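The single-copy idea can be illustrated with a small simulation. This is a hedged sketch, not kernel code: pages are bytearrays, and the "communication window" is modeled as a plain reference to the receiver's page, standing in for the page-directory entry L3 temporarily installs in kernel space.

```python
# Simulation comparing twofold copy with L3's communication-window transfer.
# "Pages" are bytearrays; installing the window is modeled as taking a reference
# (analogous to copying one page-directory entry: no data is moved).

copies = {"twofold": 0, "window": 0}

def twofold_transfer(src, dst):
    kernel_buf = bytes(src)   # copy 1: user space A -> kernel space
    copies["twofold"] += 1
    dst[:] = kernel_buf       # copy 2: kernel space -> user space B
    copies["twofold"] += 1

def window_transfer(src, dst):
    window = dst              # "map" receiver page into the kernel window (no copy)
    window[:] = src           # single copy: sender space -> window
    copies["window"] += 1

a, b1, b2 = bytearray(b"ping"), bytearray(4), bytearray(4)
twofold_transfer(a, b1)
window_transfer(a, b2)
print(copies)  # → {'twofold': 2, 'window': 1}
```

Both receivers end up with the same data; the window scheme simply touches it half as often, which is where the 20 + 0.75n copy cost is saved.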
Temporary Mapping… (diagram: the communication window is entered in the top-level page table; 2nd-level tables point to the frames in memory)
Lazy Scheduling • Scheduler overhead is a significant component of IPC cost • Threads doing IPC are often moved to the wait queue only to be re-inserted into the ready queue moments later • Lazy scheduling • delays queue manipulation until the scheduler actually needs the queues • avoids locking of queues • saves instruction execution and TLB misses
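The lazy-deletion idea on this slide can be sketched as follows. This is an illustrative Python model, not L3's scheduler: blocking only flips a flag, and stale ready-queue entries are removed later, when the scheduler pops them; a thread that blocks and unblocks between scheduler runs causes no queue operations at all.

```python
from collections import deque

class Thread:
    def __init__(self, name):
        self.name, self.ready = name, True

class LazyScheduler:
    def __init__(self):
        self.ready_q = deque()
        self.queue_ops = 0          # count of enqueue/dequeue operations

    def make_ready(self, t):
        t.ready = True
        if t not in self.ready_q:   # lazily-stale entry may still be queued
            self.ready_q.append(t)
            self.queue_ops += 1

    def block(self, t):
        t.ready = False             # no dequeue here: that is the laziness

    def pick_next(self):
        while self.ready_q:
            t = self.ready_q.popleft()
            self.queue_ops += 1
            if t.ready:
                return t            # stale (blocked) entries are skipped here
        return None

sched = LazyScheduler()
a, b = Thread("a"), Thread("b")
sched.make_ready(a); sched.make_ready(b)
sched.block(a)         # a blocks in IPC...
sched.make_ready(a)    # ...and unblocks before the scheduler ever ran
nxt = sched.pick_next()
print(nxt.name, sched.queue_ops)  # → a 3
```

The block/unblock pair cost zero queue manipulations: `a` was still sitting in the ready queue, so `make_ready` had nothing to do.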
Use Registers for Short Messages • Messages are usually short! • ack/error replies from drivers • hardware-interrupt messages • Intel 486 processor • 7 general-purpose registers available for sender info and data • May not work for CPUs with fewer registers
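The fast-path decision on this slide can be modeled as a simple fit check. This is a hedged sketch with invented names (`send_short`, `fits_in_registers`): a message of at most seven 32-bit words travels "in registers" with no memory buffer, anything larger falls back to the buffered/mapped transfer paths.

```python
NUM_FREE_REGS = 7   # general registers free on the i486 along the IPC path
WORD_BITS = 32

def fits_in_registers(words):
    """Short-message fast path: message must fit in the free registers."""
    return (len(words) <= NUM_FREE_REGS
            and all(0 <= w < 2**WORD_BITS for w in words))

def send_short(words):
    if fits_in_registers(words):
        return ("registers", tuple(words))  # zero memory copies
    return ("memory", tuple(words))         # fall back to buffered transfer

kind, payload = send_short([0x7, 0xBEEF])   # e.g. driver ack: sender id + code
print(kind)  # → registers
```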
Summary of Optimizations • Architectural • System Calls, Messages, Direct Transfer, Strict Process Orientation, Thread Control Blocks • Algorithmic • Thread Identifier, Virtual Queues, Timeouts/Wakeups, Lazy Scheduling, Direct Process Switch, Short messages • Interface • Unnecessary Copies, Parameter passing • Coding • Cache misses, TLB misses, Segment registers, General registers, Jumps and Checks, Process Switch
Conclusions • L3’s message passing was 22 times faster than MACH’s • The kernel redesign focused mainly on IPC • Caveats • L3 lacks MACH’s ports and message buffering • Many optimizations are specific to the 486 architecture