
CS533 Concepts of Operating Systems Class 6


garth

Presentation Transcript


  1. CS533 Concepts of Operating Systems, Class 6 • Micro-kernels • Mach vs L3 vs L4

  2. Binary Compatibility • Emulation libraries • Trampoline mechanism • Single server architecture • Multi-server architecture • IPC overhead proportional to number of servers (independent protection domains) CS533 - Concepts of Operating Systems

  3. Optimizing IPC • Liedtke argues Mach’s overhead is due to poor implementation! • Optimized IPC implementation in L3 • Architectural level • System Calls, Messages, Direct Transfer, Strict Process Orientation, Control Blocks. • Algorithmic level • Thread Identifier, Virtual Queues, Timeouts/Wakeups, Lazy Scheduling, Direct Process Switch, Short Messages. • Interface level • Unnecessary Copies, Parameter passing. • Coding level • Cache Misses, TLB Misses, Segment Registers, General Registers, Jumps and Checks, Process Switch.

  4. L3 IPC Performance vs Mach IPC

  5. L3 RPC Performance vs Previous Systems

  6. But Is That Enough? • What is the impact on overall system performance? • Haertig et al. explore the performance and extensibility of an L4-based Linux OS vs Mach-based Linux and native Linux • L4 has even more IPC optimizations than L3!

  7. L4Linux – Design & Implementation • Fully binary compliant with Linux/X86 • Restricted modifications to architecture-dependent part of Linux • No Linux-specific modifications to L4 kernel

  8. Experiment • What is the penalty of using L4Linux? Compare L4Linux to native Linux • Does the performance of the underlying micro-kernel matter? Compare L4Linux to MkLinux • Does co-location improve performance? Compare L4Linux to an in-kernel version of MkLinux

  9. Microbenchmarks • Measured system call overhead on the shortest system call, “getpid()”

  10. Microbenchmarks (cont.) • Measured specific system calls to determine basic performance.

  11. Macrobenchmarks • Measured the time to recompile the Linux server

  12. Macrobenchmarks (cont.) • Next, used a commercial test suite to simulate a system under full load.

  13. Performance Analysis • L4Linux is, on average, 8.3% slower than native Linux, and only 6.8% slower at maximum load. • MkLinux: 49% slower on average, 60% at maximum load. • Co-located MkLinux: 29% slower on average, 37% at maximum load.

  14. Conclusion? • Can hardware-based protection be made to work efficiently enough? • Did these experiments explore the cost of “fine grained” protection?

  15. Spare Slides

  16. The IPC Dilemma • IPC is very important in μ-kernel design • Increases modularity, flexibility, security and scalability. • Past implementations have been inefficient. • Message transfer takes 50-500 μs.

  17. The L3 (μ-kernel based) OS • A task consists of: • Threads • Communicate via messages that consist of strings and/or memory objects. • Dataspaces • Memory objects. • Address space • Where dataspaces are mapped.

  18. Redesign Principles • IPC performance is the master. • All design decisions require a performance discussion. • If something performs poorly, look for new techniques. • Synergetic effects have to be taken into consideration. • The design has to cover all levels from architecture down to coding. • The design has to be made on a concrete basis. • The design has to aim at a concrete performance goal.

  19. Achievable Performance • A simple scenario • Thread A sends a null message to thread B • Minimum of 172 cycles • Will aim at 350 cycles (7 μs) • Will actually achieve 250 cycles (5 μs)
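
The cycle counts and times on this slide are mutually consistent with a 50 MHz clock (350 cycles in 7 μs). The following is a minimal sketch of that conversion; the clock rate is inferred from the slide's own figures, and the names `CLOCK_MHZ` and `cycles_to_us` are illustrative only.

```python
# Assumption: 50 MHz is inferred from "350 cycles (7 us)" on the slide.
CLOCK_MHZ = 50

def cycles_to_us(cycles, clock_mhz=CLOCK_MHZ):
    """Convert a cycle count to microseconds (MHz == cycles per microsecond)."""
    return cycles / clock_mhz

minimum_us = cycles_to_us(172)   # lower bound for a null message
goal_us = cycles_to_us(350)      # design goal
achieved_us = cycles_to_us(250)  # result reported on the slide

print(f"minimum {minimum_us:.2f} us, goal {goal_us:.0f} us, achieved {achieved_us:.0f} us")
```

Under this assumption the 172-cycle minimum corresponds to roughly 3.4 μs, so the achieved figure sits between the theoretical bound and the stated goal.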

  20. Levels of the redesign • Architectural • System Calls, Messages, Direct Transfer, Strict Process Orientation, Control Blocks. • Algorithmic • Thread Identifier, Virtual Queues, Timeouts/Wakeups, Lazy Scheduling, Direct Process Switch, Short Messages. • Interface • Unnecessary Copies, Parameter passing. • Coding • Cache Misses, TLB Misses, Segment Registers, General Registers, Jumps and Checks, Process Switch.

  21. Architectural Level • System Calls • Expensive! So, require as few as possible. • Implement two calls: • Call • Reply & Receive Next • Combines sending an outgoing message with waiting for an incoming message. • Schedulers can handle replies the same as requests.
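
To illustrate why the two combined calls matter, here is a toy sketch that only counts kernel entries for one request/reply round trip. The `Kernel` class and function names are hypothetical; this models the counting argument, not L3's actual interface.

```python
class Kernel:
    """Toy kernel that only counts how often user code traps into it."""
    def __init__(self):
        self.entries = 0

    def trap(self, what):
        self.entries += 1

def rpc_with_separate_send_and_receive(kernel):
    # Each side needs one trap to send and another to wait.
    kernel.trap("client: send request")
    kernel.trap("server: receive request")
    kernel.trap("server: send reply")
    kernel.trap("client: receive reply")
    return kernel.entries

def rpc_with_combined_calls(kernel):
    # 'call' sends the request and waits for the reply in one trap;
    # 'reply & receive next' sends the reply and waits for the next
    # request in one trap (steady-state server loop).
    kernel.trap("client: call")
    kernel.trap("server: reply & receive next")
    return kernel.entries

separate = rpc_with_separate_send_and_receive(Kernel())
combined = rpc_with_combined_calls(Kernel())
print(separate, combined)
```

In this model the combined primitives halve the number of kernel entries per round trip, which is the saving the slide is pointing at.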

  22. Messages (figure: a complex message) • Complex Messages: • Direct String, Indirect Strings (optional) • Memory Objects • Used to combine sends if no reply is needed. • Can transfer values directly from sender’s variables to receiver’s variables.

  23. Direct Transfer (figure: User A, Kernel, User B) • Each address space has a fixed kernel accessible part. • Messages transferred via the kernel part • User A space -> Kernel -> User B space • Requires 2 copies. • Larger messages lead to higher costs

  24. Shared User Level memory (LRPC, SRC RPC) • Security can be penetrated. • Cannot check message’s legality. • Long messages -> address space becoming a critical resource. • Explicit opening of communication channels. • Not application friendly.

  25. Temporary Mapping (figure: User A, Kernel, User B) • L3 uses a Communication Window • Only kernel accessible, and exists per address space. • Target region is temporarily mapped there. • Then the message is copied to the communication window and ends up in the correct place in the target address space.
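
The difference between copying through a kernel buffer and copying through a temporarily mapped window can be sketched as follows. This is an analogy only: a Python `memoryview` stands in for the communication window (an alias of the receiver's memory), and all function names are hypothetical.

```python
def transfer_via_kernel_buffer(message, receiver_buffer):
    """Two-copy scheme: sender -> kernel buffer -> receiver."""
    copies = 0
    kernel_buffer = bytearray(message)                   # copy 1: user A -> kernel
    copies += 1
    receiver_buffer[:len(kernel_buffer)] = kernel_buffer  # copy 2: kernel -> user B
    copies += 1
    return copies

def transfer_via_window(message, receiver_buffer):
    """One-copy scheme: the target region is aliased, then written once."""
    window = memoryview(receiver_buffer)  # stands in for the temporary mapping
    window[:len(message)] = message       # single copy lands in user B's memory
    window.release()                      # mapping is only temporary
    return 1

buf_two_copy = bytearray(16)
buf_one_copy = bytearray(16)
copies_kernel = transfer_via_kernel_buffer(b"hello", buf_two_copy)
copies_window = transfer_via_window(b"hello", buf_one_copy)
```

Both buffers end up with the same contents; only the number of times the payload is copied differs.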

  26. Temporary Mapping • Must be fast! • 2 level page table only requires one word to be copied. • pdir A -> pdir B • TLB must be clean of entries relating to the use of the communication window by other operations. • One thread • TLB is always “window clean”. • Multiple threads • Interrupts – TLB is flushed • Thread switch – Invalidate Communication window entries.

  27. Strict Process Orientation • Kernel mode handled in same way as User mode • One kernel stack per thread • May lead to a large number of stacks • Minor problem if stacks are objects in virtual memory

  28. Thread Control Blocks (tcbs) (figure: user area; kernel area containing tcb and kernel stack) • Hold kernel, hardware, and thread-specific data. • Stored in a virtual array in shared kernel space.

  29. Tcb Benefits • Fast tcb access • Saves 3 TLB misses per IPC • Threads can be locked by unmapping the tcb • Helps make threads persistent • IPC independent from memory management

  30. Algorithmic Level • Thread IDs • L3 uses a 64 bit unique identifier (uid) containing the thread number. • Tcb address is easily obtained • ANDing the lower 32 bits with a bit mask and adding the tcb base address. • Virtual Queues • Busy queue, present queue, polling-me queue. • Unmapping the tcb includes removal from queues • Prevents page faults from parsing/adding/deleting from the queues.
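
The mask-and-add lookup described on this slide can be sketched as below. The bit layout is an assumption made for illustration: the constants `TCB_BASE`, `TCB_SIZE`, `MAX_THREADS`, and the helper `make_uid` are hypothetical and are not L3's real values.

```python
TCB_BASE = 0xE000_0000   # hypothetical base of the virtual tcb array
TCB_SIZE = 0x400         # hypothetical tcb size (a power of two)
MAX_THREADS = 1 << 16    # hypothetical number of thread slots

# Assumption: the thread number is stored in the low word already scaled
# by TCB_SIZE, so one AND plus one ADD yields the tcb address.
TCB_MASK = (MAX_THREADS - 1) * TCB_SIZE

def make_uid(thread_no, generation):
    """Build a 64-bit uid: upper word is a generation stamp, lower word the slot."""
    return (generation << 32) | (thread_no * TCB_SIZE)

def tcb_address(uid):
    low32 = uid & 0xFFFF_FFFF
    return TCB_BASE + (low32 & TCB_MASK)

uid = make_uid(thread_no=3, generation=7)
addr = tcb_address(uid)
print(hex(addr))
```

Because the lookup is pure arithmetic, no table walk is needed to get from a thread identifier to its control block.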

  31. Algorithmic Level • Timeouts and Wakeups • Operation fails if message transfer has not started t ms after invoking it. • Kept in n unordered wakeup lists. • A new thread’s tcb is linked into the list τ mod n. • Threads with wakeups far away are kept in a long time wakeup list and reinserted into the normal lists when time approaches. • Scheduler will only have to check k/n entries per clock interrupt. • Usually costs less than 4% of ipc time.
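
A minimal sketch of the "τ mod n" wakeup lists follows, assuming τ is the wakeup time in clock ticks. The class and method names are hypothetical, and the separate long-time wakeup list mentioned on the slide is omitted for brevity.

```python
class WakeupLists:
    """n unordered lists; a thread waking at tick tau lives in list tau % n."""
    def __init__(self, n):
        self.n = n
        self.lists = [[] for _ in range(n)]

    def insert(self, thread, wakeup_tick):
        # Unordered insert: O(1), no sorting needed.
        self.lists[wakeup_tick % self.n].append((wakeup_tick, thread))

    def expire(self, now):
        """Per clock tick: scan only list now % n, i.e. about k/n of k entries."""
        index = now % self.n
        due = [t for (tick, t) in self.lists[index] if tick <= now]
        self.lists[index] = [(tick, t) for (tick, t) in self.lists[index] if tick > now]
        return due

wakeups = WakeupLists(n=4)
wakeups.insert("A", 5)
wakeups.insert("B", 9)   # same list as A (9 % 4 == 5 % 4), but not yet due
wakeups.insert("C", 6)
woken_at_5 = wakeups.expire(5)
woken_at_6 = wakeups.expire(6)
woken_at_9 = wakeups.expire(9)
```

Entries that share a list but are not yet due simply stay in place, which is why keeping far-away wakeups in a separate list keeps the per-tick scan short.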

  32. Algorithmic Level • Lazy Scheduling • Only a thread state variable is changed (ready/waiting). • Deletion from queues happens when queues are parsed. • Reduces delete operations. • Reduces insert operations when a thread needs to be inserted that hasn’t been deleted yet.
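
Lazy scheduling as described above can be sketched with a ready queue that tolerates stale entries. All names here are hypothetical, and the `queue_ops` counter exists only to make the saved work visible.

```python
from collections import deque

class Thread:
    def __init__(self, name):
        self.name = name
        self.state = "waiting"
        self.queued = False  # is this thread still linked into the ready queue?

class LazyReadyQueue:
    def __init__(self):
        self.queue = deque()
        self.queue_ops = 0   # inserts and deletes actually performed

    def make_ready(self, thread):
        thread.state = "ready"
        if not thread.queued:        # still linked from before: no insert needed
            self.queue.append(thread)
            thread.queued = True
            self.queue_ops += 1

    def block(self, thread):
        thread.state = "waiting"     # lazy: only the state variable changes

    def pick_next(self):
        # Stale (waiting) entries are removed only while parsing the queue.
        while self.queue:
            head = self.queue[0]
            if head.state == "ready":
                return head
            self.queue.popleft()
            head.queued = False
            self.queue_ops += 1
        return None

a, b = Thread("a"), Thread("b")
rq = LazyReadyQueue()
rq.make_ready(a)
rq.make_ready(b)
rq.block(a)        # e.g. a sends a request and waits for the reply
rq.make_ready(a)   # the reply arrives before the queue was parsed
ops_after_ipc = rq.queue_ops
```

In this run the block/unblock pair costs no queue operations at all, which is the common case during a fast IPC round trip.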

  33. Algorithmic Level • Short messages via registers • Register transfers are fast • 50-80% of messages ≤ 8 bytes • Messages of up to 8 bytes can be transferred in registers with a decent performance gain. • May not pay off for other processors.

  34. Interface Level • Unnecessary Copies • Message objects grouped by types • Send/receive buffers structured in the same way • Use same variable for sending and receiving • Avoid unnecessary copies • Parameter Passing • Use registers whenever possible. • Far more efficient • Gives compilers better opportunities to optimize code.

  35. Coding Level • Cache Misses • Cache line fill sequence should match the usual data access sequence. • TLB Misses • Try to pack into one page: • Ipc related kernel code • Processor internal tables • Start/end of larger tables • Most heavily used entries

  36. Coding Level • Registers • Segment register loading is expensive. • One flat segment covering the complete address space. • On entry, kernel checks if registers contain the flat descriptor. • Guarantees they contain it when returning to user level. • Jumps and Checks • Basic code blocks should be arranged so that as few jumps are taken as possible. • Process switch • Save/restore of stack pointer and address space only invoked when really necessary.

  37. L4 Slides

  38. Introduction • μ-kernels have a reputation for being too slow and inflexible • Can a 2nd generation μ-kernel (L4) overcome these limitations? • Experiment: • Port Linux to run on L4 (Mach 3.0) • Compared to native Linux, MkLinux (Linux on 1st gen Mach derived μ-kernel)

  39. Introduction (cont.) • Test speed of standard OS personality on top of fast μ-kernel: Linux implemented on L4 • Test extensibility of system: • pipe-based communication implemented directly on μ-kernel • mapping-related OS extensions implemented as user tasks • user-level real-time memory management implemented • Test if L4 abstractions are independent of platform

  40. L4 Essentials • Based on threads and address spaces • Recursive construction of address spaces by user-level servers • Initial address space σ0 represents physical memory • Basic operations: granting, mapping, and unmapping. • Owner of address space can grant or map page to another address space • All address spaces maintained by user-level servers (pagers)

  41. L4Linux – Design & Implementation • Fully binary compliant with Linux/X86 • Restricted modifications to architecture-dependent part of Linux • No Linux-specific modifications to L4 kernel

  42. L4Linux – Design & Implementation • Address Spaces • Initial address space σ0 represents physical memory • Basic operations: granting, mapping, and unmapping. • L4 uses “flexpages”: logical memory ranging from one physical page up to a complete address space. • An invoker can only map and unmap pages that have been mapped into its own address space
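
The three address-space operations can be modeled with a small mapping tree. This is a simplified sketch, not L4's API: pages are plain integers rather than flexpages, each space receives a given page at most once, unmap is modeled as removing the page from every space that received it directly or indirectly while the invoker keeps it, and the class and attribute names are hypothetical.

```python
class AddressSpace:
    def __init__(self, name, pages=()):
        self.name = name
        self.pages = set(pages)
        self.children = {}   # page -> spaces that received it from this space
        self.parent = {}     # page -> space this one received it from

    def _check_held(self, page):
        # An invoker may only operate on pages present in its own space.
        if page not in self.pages:
            raise PermissionError(f"{self.name} does not hold page {page}")

    def map(self, page, dest):
        """Page becomes visible in dest and stays visible here."""
        self._check_held(page)
        dest.pages.add(page)
        dest.parent[page] = self
        self.children.setdefault(page, []).append(dest)

    def grant(self, page, dest):
        """Page moves to dest; dest takes this space's place in the tree."""
        self._check_held(page)
        self.pages.discard(page)
        dest.pages.add(page)
        dest.children[page] = self.children.pop(page, [])
        for child in dest.children[page]:
            child.parent[page] = dest
        source = self.parent.pop(page, None)
        if source is not None:
            dest.parent[page] = source
            siblings = source.children[page]
            siblings[siblings.index(self)] = dest

    def unmap(self, page):
        """Revoke the page from all spaces derived from this one."""
        self._check_held(page)
        for child in self.children.pop(page, []):
            child.unmap(page)
            child.pages.discard(page)
            child.parent.pop(page, None)

sigma0 = AddressSpace("sigma0", pages={0, 1})   # stands for physical memory
pager = AddressSpace("pager")
task = AddressSpace("task")
sigma0.map(0, pager)
pager.map(0, task)
sigma0.grant(1, pager)
```

Calling `sigma0.unmap(0)` afterwards removes page 0 from both `pager` and `task`, which is how a pager higher in the hierarchy keeps control over memory it has handed out.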

  43. L4Linux – Design & Implementation

  44. L4Linux – Design & Implementation • Address Spaces (cont.) • I/O ports are parts of address spaces. • Hardware interrupts are handled by user-level processes. The L4 kernel will send a message via IPC.

  45. L4Linux – Design & Implementation • The Linux server • L4Linux uses a single-server approach. • A single Linux server runs on top of L4, multiplexing a single thread for system calls and page faults. • The Linux server maps physical memory into its address space, and acts as the pager for any user processes it creates. • The server cannot directly access the hardware page tables, and must maintain logical page tables in its own address space.

  46. L4Linux – Design & Implementation • Interrupt Handling • All interrupt handlers are mapped to messages. • The Linux server contains threads that do nothing but wait for interrupt messages. • Interrupt threads have a higher priority than the main thread.

  47. L4Linux – Design & Implementation • User Processes • Each different user process is implemented as a different L4 task: Has its own address space and threads. • The Linux Server is the pager for these processes. Any fault by the user-level processes is sent by RPC from the L4 kernel to the Server.

  48. L4Linux – Design & Implementation • System Calls • Three system call interfaces: • A modified version of libc.so that uses L4 primitives. • A modified version of libc.a • A user-level exception handler (trampoline) calls the corresponding routine in the modified shared library. • The first two options are the fastest. The third is maintained for compatibility.
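
The three system call paths can be sketched as a toy model that records each step taken. Everything here is hypothetical (the function names, the trace strings, and the pid value); it only illustrates why the trampoline path does strictly more work than the modified-library paths.

```python
def linux_server(syscall, *args):
    """Stand-in for the Linux server; only getpid is modeled."""
    if syscall == "getpid":
        return 42  # hypothetical pid
    raise NotImplementedError(syscall)

def ipc_call(trace, syscall, *args):
    """Stand-in for an L4 IPC that carries the system call to the server."""
    trace.append("ipc to Linux server")
    return linux_server(syscall, *args)

def libc_getpid(trace):
    """Paths 1 and 2: a modified libc.so / libc.a stub issues the IPC itself."""
    trace.append("modified libc stub")
    return ipc_call(trace, "getpid")

LIBRARY = {"getpid": libc_getpid}

def trampoline(trace, syscall):
    """Path 3: an unmodified binary traps; the handler bounces into the library."""
    trace.append("trap in unmodified binary")
    trace.append("user-level exception handler")
    return LIBRARY[syscall](trace)

fast_trace, slow_trace = [], []
fast_result = libc_getpid(fast_trace)
slow_result = trampoline(slow_trace, "getpid")
```

Both paths return the same result, so unmodified binaries keep working; the trampoline path simply pays for the extra trap and handler before reaching the same library routine.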

  49. L4Linux – Design & Implementation • Signalling • Each user-level process has an additional thread for signal handling. • The main server thread sends a message to the signal-handling thread, telling the user thread to save its state and enter Linux

  50. L4Linux – Design & Implementation • Scheduling • All thread scheduling is done by the L4 kernel • The Linux server’s schedule() routine is only used for multiplexing its single thread. • After each system call, if no other system call is pending, it simply resumes the user process thread and sleeps.
