
Automatic Compaction of OS Kernel Code via On-Demand Code Loading




Presentation Transcript


  1. Automatic Compaction of OS Kernel Code via On-Demand Code Loading Haifeng He, Saumya Debray, Gregory Andrews The University of Arizona

  2. Background. Desktop systems run general-purpose operating systems; embedded devices face resource constraints, in particular a limited amount of memory. Goal: reduce the memory footprint of OS kernel code as much as possible.

  3. General OS with Embedded Apps. A Linux kernel with minimal configuration, profiled with the MiBench suite: only 32% of the kernel code is executed, so about 68% is not. Of the unexecuted code, 18%-24% can be statically proved unnecessary by prior work; the rest is unexecuted but still can't be discarded, either because it is needed (e.g., exception handling) or because it is not needed but missed by existing analysis.

  4. Our Approach. Exploit the memory hierarchy: hot kernel code lives in the limited main memory, while cold code lives in the larger secondary storage and is brought in via on-demand code loading.

  5. A Big Picture. Core code (scheduler, memory management, interrupt handling) and hot code stay in main memory as memory-resident kernel code. The remaining kernel code lives in secondary storage, grouped by code clustering. A code buffer in main memory accommodates one cluster at a time: size(cluster) ≤ size(code buffer).

  6. Memory Requirement for Kernel Code. Main memory size is predetermined. Core code and hot code: keep the total size of memory-resident code ≤ size(core code) × (1 + β), where β is specified by the user (e.g., 0%, 10%). Hot code: select the most frequently executed code; how much hot code should stay in memory? Code buffer: size specified by the user. Together these give an upper bound on memory usage for kernel code.
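The memory bound above can be made concrete with a small sketch (plain Python arithmetic; the figures are made up for illustration, not taken from the paper):

```python
def kernel_code_budget(core_kb, beta, bufsz_kb):
    """Upper bound on main-memory usage for kernel code:
    memory-resident code (core + hot) is capped at
    size(core code) * (1 + beta); the code buffer is added on top."""
    resident_cap = core_kb * (1 + beta)
    return resident_cap + bufsz_kb

# e.g. 100 KB of core code, beta = 10%, a 2 KB code buffer: about 112 KB
print(kernel_code_budget(100, 0.10, 2))
```

Setting β = 0 means no hot code stays resident beyond the core; raising β trades memory for fewer loads.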

  7. Our Approach • Reminiscent of the old idea of overlays • Purely software-based approach • Does not require an MMU or OS support for virtual memory • Main steps • Apply clustering to the whole-program control flow graph • Group “related” code together • Reduce the cost of code loading • Transform the kernel code to support overlays • Modify control flow edges

  8. Code Clustering • Objective • Minimize the number of code loads • Given: • An edge-weighted whole-program control flow graph • A list of functions marked as core code • A growth bound β for memory-resident code • A code buffer size BufSz • Apply a greedy node-coalescing algorithm until no coalescing can be carried out without violating • Size of memory-resident code ≤ size(core code) × (1 + β) • Size of each cluster ≤ BufSz
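A toy version of such a greedy coalescing pass might look like the following Python sketch (the function names and data layout are assumptions; for brevity it enforces only the per-cluster BufSz constraint, not the β bound on memory-resident code):

```python
def cluster(sizes, edges, bufsz):
    """sizes: {func: size}; edges: {(caller, callee): weight}.
    Greedily coalesce the endpoints of the heaviest control-flow
    edges while the merged cluster still fits in the code buffer."""
    clusters = {f: {f} for f in sizes}          # every function starts alone
    for (f, g), _w in sorted(edges.items(), key=lambda e: -e[1]):
        cf, cg = clusters[f], clusters[g]
        if cf is cg:
            continue                             # already in one cluster
        if sum(sizes[x] for x in cf | cg) > bufsz:
            continue                             # merge would overflow the buffer
        merged = cf | cg
        for x in merged:
            clusters[x] = merged
    uniq = {frozenset(c) for c in clusters.values()}
    return [set(c) for c in uniq]

# Hot edge a->b gets its endpoints coalesced; c stays separate
# because {a, b, c} would exceed the 2-unit buffer.
parts = cluster({"a": 1, "b": 1, "c": 2},
                {("a", "b"): 10, ("b", "c"): 1}, bufsz=2)
```

Processing edges by descending weight is what keeps frequently taken inter-cluster transitions, and hence loads, rare.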

  9. Code Transformation • Apply the code transformation on • Inter-cluster control flow edges • Control flow edges from memory-resident code to clusters (but not needed in the other direction) • All indirect control flow edges (targets known only at runtime)

  10. Code Transformation. After clustering, a call to F (at 0x220 in Cluster B, stored at 0x200) from Cluster A is rewritten as: push &F; call dyn_loader. The runtime library's dyn_loader then 1. looks up the address &F, 2. loads B into the code buffer, and 3. translates the target address &F into its relative address in the code buffer (0x520, for a buffer starting at 0x500).
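The loader's three steps can be mimicked in a few lines of Python (the addresses match the slide; the dict-based lookup tables are an assumption of this sketch, not the real implementation):

```python
BUF_BASE = 0x500                     # code buffer starts at 0x500

clusters = {"B": 0x200}              # cluster -> base address in storage
owner = {0x220: "B"}                 # target address &F -> owning cluster
buffer_holds = [None]                # which cluster the buffer contains

def dyn_loader(target):
    cl = owner[target]               # 1. address lookup for &F
    buffer_holds[0] = cl             # 2. load that cluster into the buffer
    return BUF_BASE + (target - clusters[cl])  # 3. buffer-relative address

print(hex(dyn_loader(0x220)))        # F at 0x220 in B lands at 0x520
```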

  11. Issue: Call Return in the Code Buffer. Cluster A (stored at 0x100, with push &F, call dyn_loader at 0x130, and return point 0x140) is running in the code buffer, which starts at 0x500; the call at 0x530 pushes return address 0x540. Cluster B (stored at 0x200, with F at 0x220 and ret at 0x250) is about to be loaded.

  12. Call Return in the Code Buffer. Loading B into the code buffer places F at 0x520 and B's ret at 0x550. When F executes ret, control transfers to the saved return address 0x540, which still points into the code buffer, but A has been overwritten by B!

  13. Issue: Call Return in the Code Buffer (setup, revisited). As before, Cluster A runs in the code buffer starting at 0x500; the call to dyn_loader at 0x530 pushes return address 0x540, while the corresponding return point in stored Cluster A is 0x140.

  14. Call Return in the Code Buffer: the fix. Before loading B, dyn_loader replaces the return address on the stack: 0x540 becomes &dyn_restore_A, and the actual return address 0x140 (in stored Cluster A) is recorded for dyn_restore_A.

  15. Call Return in the Code Buffer. B is loaded into the code buffer (F at 0x520, ret at 0x550). When F returns, control goes to dyn_restore_A rather than to an address in the overwritten buffer; the actual return address 0x140 is still recorded.

  16. Call Return in the Code Buffer. dyn_restore_A restores Cluster A into the code buffer and jumps to the recorded actual return address 0x140, i.e., 0x540 in the reloaded buffer. Execution continues in A as if the call had returned normally.
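Slides 13-16 can be condensed into a small Python simulation (the bookkeeping structures are assumptions of this sketch; in the real system the return address lives on the stack and dyn_restore_A is generated code):

```python
BUF_BASE = 0x500
base = {"A": 0x100, "B": 0x200}      # cluster -> stored base address
buffer_holds = [None]                # cluster currently in the code buffer
saved_ret = {}                       # cluster -> recorded actual return address

def to_buffer(cl, addr):
    """Translate a stored address into its address inside the code buffer."""
    return BUF_BASE + (addr - base[cl])

def dyn_loader(caller, actual_ret, callee, target):
    saved_ret[caller] = actual_ret   # the fix: record 0x140, not 0x540
    buffer_holds[0] = callee         # load B, overwriting A in the buffer
    return to_buffer(callee, target)  # execution resumes at F in the buffer

def dyn_restore(caller):
    buffer_holds[0] = caller         # reload A into the code buffer
    return to_buffer(caller, saved_ret[caller])  # resume at 0x140 -> 0x540

entry = dyn_loader("A", 0x140, "B", 0x220)  # the rewritten call to F
back = dyn_restore("A")                     # B's ret lands in dyn_restore_A
print(hex(entry), hex(back))                # 0x520 0x540
```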

  17. Context Switches and Interrupts. Interrupts: interrupt handlers are currently kept in main memory. Context switches: while Thread 1 executes cluster A in the code buffer, a context switch to Thread 2 may change the code buffer; A is remembered in Thread 1's task_struct, so when Thread 1 is switched back in, A is reloaded into the code buffer and Thread 1 continues executing in it.

  18. Experimental Setup • Start with a minimally configured kernel (Linux 2.4.31) • Compile the kernel with optimization for code size (gcc -Os) • Original code size: 590KB • Implemented using the binary rewriting tool PLTO • Benchmarks: MiBench, MediaBench, httpd

  19. Memory Usage Reduction for Kernel Code. Code buffer size = 2KB. The reduction decreases because the amount of memory-resident code increases.

  20. Estimated Cost of Code Loading • All experiments were run in a desktop environment • We estimated the cost of code loading as follows: • Choose Micron NAND flash memory as an example (2KB page; takes … to read a page) • Est. Cost = …

  21. Overhead of Code Loading. (Chart comparing configurations with 57%, 56%, and 55% memory reduction against the unmodified kernel.)

  22. Related Work • Code compaction of OS kernels • D. Chanet et al., LCTES '05 • H. He et al., CGO '07 • Reducing memory requirements in embedded systems • C. Park et al., EMSOFT '04 • H. Park et al., DATE '06 • B. Egger et al., CASES '06, EMSOFT '06 • Binary rewriting of OS kernels • Flower et al., FDDO-4

  23. Conclusions • Embedded devices typically have a limited amount of memory • General-purpose OS kernels contain a lot of code that is never executed in an embedded context • We reduce the memory requirement of the OS kernel with an on-demand code overlay mechanism • Memory requirements are reduced significantly with little degradation in performance

  24. Estimated Cost of Code Loading

  25. A Big Picture. Memory-resident kernel code in main memory: core code (scheduler, memory management, interrupt handling) and hot code. Cold code is grouped by code clustering; the code buffer is reused, accommodating one cluster at a time.

  26. Memory Requirement for Kernel Code. Main memory size is predetermined. Core code: needs to be in memory. Hot code: select the most frequently executed code; how much hot code should stay in memory? Keep the total size of memory-resident code ≤ size(core code) × (1 + β), where β is specified by the user (0%, 10%). Code buffer: size specified by the user (we chose 2KB). Together these give an upper bound on memory usage for kernel code.
