
Virtual Memory and I/O


Presentation Transcript


  1. Virtual Memory and I/O Mingsheng Hong

  2. I/O Systems • Major I/O hardware • Hard disks, network adaptors, … • Problems related to I/O systems • Many types of hardware – device drivers give the OS a unified I/O interface • Typically much slower than the CPU and memory – a system bottleneck • Too much CPU involvement in I/O operations

  3. Techniques to Improve I/O Performance • Buffering • e.g., downloading a file from the network (see the sketch below) • DMA • Caching • CPU cache, TLB, file cache, …
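The buffering idea is independent of IO-Lite; the minimal POSIX sketch below shows the basic pattern of staging slow network data in a large user-space buffer so the disk sees big, efficient writes. The function name and buffer size are illustrative.

```c
#include <unistd.h>

#define BUF_SIZE (64 * 1024)  /* illustrative staging-buffer size */

/* Copy a network socket to a file, staging BUF_SIZE bytes at a
 * time so the (slow) network producer and the disk consumer are
 * decoupled. */
ssize_t buffered_copy(int sock_fd, int file_fd)
{
    char buf[BUF_SIZE];
    ssize_t n, total = 0;

    while ((n = read(sock_fd, buf, sizeof buf)) > 0) {
        if (write(file_fd, buf, (size_t)n) != n)
            return -1;  /* short or failed write */
        total += n;
    }
    return n < 0 ? -1 : total;
}
```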

  4. Other Techniques to Improve I/O Performance • Virtual memory page remapping (IO-Lite) • Allows (cached) files and memory to be shared by different processes without extra data copies • Prefetching data (Software Prefetching and Caching for TLBs) • Prefetches and caches page table entries

  5. Summary of First Paper • IO-Lite: A Unified I/O Buffering and Caching System (Pai et al., Best Paper of 3rd OSDI, 1999) • A unified I/O system • Uses immutable data buffers to store all I/O data (only one physical copy) • Uses VM page remapping • Spans IPC, the file system (disk files, file cache), and the network subsystem

  6. Summary of Second Paper • Software Prefetching and Caching for Translation Lookaside Buffers (Bala et al., 1994) • A software approach to help reduce TLB misses • Works well for IPC-intensive systems • Bigger performance gains on future systems

  7. Features of IO-Lite • Eliminates redundant data copying • Saves CPU work & avoids cache pollution • Eliminates multiple buffering • Saves main memory => improves the file cache hit rate • Enables cross-subsystem optimizations • e.g., caching Internet checksums • Supports application-specific cache replacement policies

  8. Related Work before IO-Lite • I/O APIs should preserve copy semantics • Memory-mapped files • Copy-on-write • Fbufs

  9. Key Data Structures • Immutable Buffers and Buffer Aggregates (sketched below)
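The paper describes these two structures in prose; the C sketch below is one plausible rendering. All type and field names (iol_buffer, iol_slice, iol_agg) are illustrative assumptions, not IO-Lite's actual definitions.

```c
#include <stddef.h>

/* An immutable buffer: a page-aligned region of physical memory.
 * Once filled, its contents are never modified in place; it is
 * returned to its pool when the reference count drops to zero. */
typedef struct iol_buffer {
    void  *base;      /* start of the page-aligned region */
    size_t length;    /* size of the region in bytes */
    int    refcount;  /* shared by reference across processes */
} iol_buffer;

/* One <pointer, length> slice into an immutable buffer. */
typedef struct iol_slice {
    iol_buffer *buf;    /* which immutable buffer */
    size_t      offset; /* where the slice starts within it */
    size_t      length; /* how many bytes the slice covers */
} iol_slice;

/* A buffer aggregate: an ordered list of slices. Aggregates are
 * mutable and passed by value; the buffers they point into are
 * shared, immutable, and passed by reference. */
typedef struct iol_agg {
    size_t    nslices;
    iol_slice slices[]; /* slices in logical data order */
} iol_agg;
```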

  10. Discussion I • When we pass a buffer aggregate from process A to process B, how do we efficiently do the VM page remapping (modify B’s page table entries)? • Possible approach 1: find any empty entry and modify the VM address contained in the buffer aggregate • Very inefficient • Possible approach 2: reserve the range of buffer virtual addresses in the address space of every process (see the sketch below) • This limits the total size of buffers – what about dynamically allocated buffers?
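A minimal user-level sketch of approach 2, assuming each process reserves a fixed window of virtual addresses with POSIX mmap at startup; the real system does the equivalent inside the kernel's VM layer, and the window size and names here are illustrative.

```c
#include <stdio.h>
#include <sys/mman.h>

#define IOL_WINDOW_SIZE (256UL * 1024 * 1024) /* illustrative cap */

static void *iol_window; /* base of the reserved buffer region */

int iol_reserve_window(void)
{
    /* PROT_NONE reserves the address range without committing
     * memory; individual buffer pages are mapped in (and granted
     * access) only when a buffer is actually passed in. */
    iol_window = mmap(NULL, IOL_WINDOW_SIZE, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (iol_window == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
    return 0;
}
```

The fixed IOL_WINDOW_SIZE is exactly the objection in the last bullet: it caps total buffer memory, which sits awkwardly with dynamically allocated buffers.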

  11. Impact of Immutable I/O Buffers • Copy-on-write optimization • Modified values are stored in a new buffer, as opposed to “in-place modification” • Three situations, depending on whether the data object is … • Completely modified • Allocates a new buffer • Partially modified (modification localized) • Chains unmodified and modified portions of the data (see the sketch below) • Partially modified (modification not localized) • Compares the cost of writing the entire object with that of chaining, and chooses the cheaper method
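A sketch of the "modification localized" case, reusing the illustrative iol_* types from the earlier sketch: the new aggregate chains the untouched prefix and suffix (shared by reference) around a freshly written middle slice. It assumes, for brevity, that the original aggregate holds a single slice covering the whole object.

```c
#include <stdlib.h>
#include <string.h>

/* Replace bytes [off, off+len) of the object without touching the
 * original buffer: chain prefix + new data + suffix. `fresh` is a
 * newly allocated immutable buffer of at least `len` bytes. */
iol_agg *cow_modify_middle(const iol_agg *old, iol_buffer *fresh,
                           const void *newdata, size_t off, size_t len)
{
    iol_slice orig = old->slices[0]; /* whole object, by assumption */
    iol_agg *agg = malloc(sizeof *agg + 3 * sizeof(iol_slice));
    if (agg == NULL)
        return NULL;
    agg->nslices = 3;

    /* 1. unmodified prefix: shared by reference, no copy */
    agg->slices[0] = (iol_slice){ orig.buf, orig.offset, off };

    /* 2. modified bytes: written once into the new buffer */
    memcpy(fresh->base, newdata, len);
    agg->slices[1] = (iol_slice){ fresh, 0, len };

    /* 3. unmodified suffix: shared by reference, no copy */
    agg->slices[2] = (iol_slice){ orig.buf, orig.offset + off + len,
                                  orig.length - off - len };
    return agg;
}
```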

  12. Discussion II • How do we measure the two costs? • Heuristics are needed • Fragmented vs. clustered data • Chained data increases reading cost • Similar to the shadow-page technique used in System R • Should the cost of retrieving data from the buffer also be considered?

  13. What does IO-Lite do? • Reduces extra data copies in • IPC • the file system (disk files, file cache) • the network subsystem • Makes cross-subsystem optimizations possible

  14. IO-Lite and IPC • Operations on buffers & aggregates • When I/O data is transferred • Aggregates are passed by value • The associated buffers are passed by reference • When a buffer is deallocated • The buffer is returned to a memory pool • The buffer’s VM page mappings persist • When a buffer is reused (by the same process) • No further VM map changes are required • Write permission is (temporarily) granted to the associated producer process

  15. IO-Lite and the Filesystem • IO-Lite I/O APIs provided (usage sketched below) • IOL_read(int fd, IOL_Agg **aggr, size_t size) • IOL_write(int fd, IOL_Agg **aggr) • IOL_write operations are atomic – concurrency support • The I/O functions in the stdio library are reimplemented • The filesystem cache is reorganized • Buffer aggregates (pointers to data), instead of file data, are stored in the cache • Copy semantics are preserved • e.g., when a portion of a cached file is read and then overwritten
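Only the two signatures on the slide come from the paper; the return values and the serving loop below are assumptions. The sketch shows the intended zero-copy pattern: an aggregate read from the file cache is handed straight to the network subsystem.

```c
#include <stddef.h>

typedef struct IOL_Agg IOL_Agg;  /* opaque here */
int IOL_read(int fd, IOL_Agg **aggr, size_t size);  /* from the slide */
int IOL_write(int fd, IOL_Agg **aggr);              /* from the slide */

/* Serve a file over a socket with no data copies: IOL_read yields
 * an aggregate pointing at (cached) file pages, and IOL_write
 * passes those same pages to the network by reference. Assumes
 * both calls return a positive count on success. */
int serve_file(int file_fd, int sock_fd, size_t chunk)
{
    IOL_Agg *agg;

    while (IOL_read(file_fd, &agg, chunk) > 0) {
        if (IOL_write(sock_fd, &agg) < 0)  /* atomic write */
            return -1;
    }
    return 0;
}
```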

  16. Copy Semantics Illustration 1

  17. Copy Semantics Illustration 2

  18. Copy Semantics Illustration 3

  19. More on File Cache Management & VM Paging • The cache replacement policy can be customized • The default eviction order is by current reference status & time of last file access (see the sketch below) • One entry is evicted when the file cache “appears” to be too large • One entry is added on every file cache miss • When a buffer page is paged out, its data is written back to swap space, and possibly to several other disk locations (for different files)
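A small sketch of what such a customizable eviction order might look like: unreferenced cache entries go first, with ties broken by least recent file access. The struct and policy hook are illustrative, not IO-Lite's interface.

```c
#include <time.h>

struct cache_entry {
    int    referenced;   /* still reachable from some aggregate? */
    time_t last_access;  /* time of last access to the file */
};

/* Default policy: returns nonzero if `a` should be evicted
 * before `b`. An application could substitute its own ordering. */
int evict_before(const struct cache_entry *a, const struct cache_entry *b)
{
    if (a->referenced != b->referenced)
        return !a->referenced;              /* unreferenced first */
    return a->last_access < b->last_access; /* then LRU on access */
}
```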

  20. IO-Lite and the Network Subsystem • Access control and protection for processes • ACLs are associated with buffer pools • The ACL of a data object must be determined before memory is allocated for it • An early demultiplexing technique determines the ACL for each incoming packet

  21. A Cross-Subsystem Optimization • Internet checksum caching • Cache the computed checksum for each slice of a buffer aggregate • Increment a version number when the buffer is reallocated – used to check whether the data has changed (see the sketch below) • Works well for static files; also gives a big benefit to CGI programs that chain dynamic data with static data
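A sketch of the checksum cache keyed by <buffer, version>: the cached checksum is reused only if the buffer's version number shows it has not been reallocated since. The direct-mapped layout and all names are illustrative assumptions.

```c
#include <stdint.h>
#include <stddef.h>

#define CKSUM_SLOTS 1024

struct cksum_entry {
    const void *buf;     /* identifies the cached slice */
    uint64_t    version; /* buffer generation when computed */
    uint16_t    cksum;   /* cached Internet checksum */
    int         valid;
};

static struct cksum_entry cksum_cache[CKSUM_SLOTS];

/* Return the checksum of `buf`, recomputing only when this slice
 * has not been seen at this version before. */
uint16_t cached_checksum(const void *buf, size_t len, uint64_t version,
                         uint16_t (*compute)(const void *, size_t))
{
    struct cksum_entry *e =
        &cksum_cache[((uintptr_t)buf >> 4) % CKSUM_SLOTS];

    if (e->valid && e->buf == buf && e->version == version)
        return e->cksum;                 /* hit: data unchanged */

    e->buf = buf;
    e->version = version;
    e->cksum = compute(buf, len);        /* miss: recompute once */
    e->valid = 1;
    return e->cksum;
}
```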

  22. Performance – Competitors • Flash Web server – a high-performance HTTP server • Flash-Lite – a modified version of Flash using the IO-Lite API • Apache 1.3.1 – representing the most widely used Web server at the time

  23. Performance – Static Content Requests

  24. Performance – CGI Programs

  25. Performance – Real Workload • Average request size: 17 KBytes

  26. Performance – WAN Effects • Memory for buffers = # clients * Tss

  27. Performance – Other Applications

  28. Conclusion on IO-Lite • A unified framework for I/O subsystems • Impressive performance in Web applications due to copy avoidance & checksum caching

  29. Software Prefetching & Caching for TLBs • Prefetching & caching • Previously never applied to TLB misses in software • Improves overall performance by up to 3% • But has great potential on newer architectures • Clock speed: 40 MHz => 200 MHz

  30. Issues in Virtual Memory • The user address space is typically huge … • The TLB caches page table entries • Software support can help reduce TLB misses

  31. Motivations • TLB misses occur more frequently in microkernel-based OSes • RISC computers handle TLB misses in software (trap) • IPCs have a bigger impact on system performance

  32. Approach • Use a software approach to prefetch and cache TLB entries • Experiments run on a MIPS R3000-based (RISC) architecture with Mach 3.0 • Applications chosen from standard benchmarks, plus a synthetic IPC-intensive benchmark

  33. Discussion • The way the authors motivate their paper • The right approach for a particular type of system • A valid argument about the performance gain on future computer systems • The figures of experimental results mostly show the reduced number of TLB misses, rather than the overall performance improvement • A synthetic IPC-intensive application supports their approach

  34. Prefetching: What entries to prefetch? • L1U: user address spaces • L1K: kernel data structures • L2: user (L1U) page tables • Stack segments • Code segments • Data segments • L3: L1K and L2 page tables

  35. Prefetching: Details • On the first IPC call, probe the hardware TLB on the IPC path and enter the related TLB entries into the PTLB • On subsequent IPC calls, entries are prefetched from the PTLB by a hashed lookup (sketched below) • Entries are stored in unmapped, cached physical memory
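A sketch of the hashed PTLB lookup: a direct-mapped software table of page-table entries indexed by a hash of the virtual page number. In the real system this lives in unmapped, cached physical memory on the R3000; the table size, names, and 32-bit PTE format here are illustrative.

```c
#include <stdint.h>

#define PTLB_SLOTS 512
#define PAGE_SHIFT 12           /* 4 KB pages on the R3000 */

struct ptlb_entry {
    uint32_t vpn;   /* virtual page number (tag) */
    uint32_t pte;   /* cached page-table entry */
    int      valid;
};

static struct ptlb_entry ptlb[PTLB_SLOTS];

/* Called on the IPC path to stash entries for later prefetch. */
void ptlb_insert(uint32_t vaddr, uint32_t pte)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    ptlb[vpn % PTLB_SLOTS] = (struct ptlb_entry){ vpn, pte, 1 };
}

/* Hashed lookup on subsequent IPC calls; returns 0 on a miss,
 * in which case the caller falls back to the normal handler. */
uint32_t ptlb_lookup(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct ptlb_entry *e = &ptlb[vpn % PTLB_SLOTS];
    return (e->valid && e->vpn == vpn) ? e->pte : 0;
}
```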

  36. Prefetching: Performance

  37. Prefetching: Performance – Rate of TLB misses?

  38. Caching: Software Victim Cache • Use a region of unmapped, cached memory to hold entries evicted from the hardware TLB • PTE lookup sequence (sketched below): • hardware TLB • STLB • generic trap handler
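A sketch of that lookup sequence in C. The real STLB probe is a short assembly fast path in the trap handler; walk_page_tables and tlb_write stand in for the generic handler and the hardware TLB refill, and the table size is illustrative.

```c
#include <stdint.h>

#define STLB_SLOTS 1024
#define PAGE_SHIFT 12

struct stlb_entry { uint32_t vpn, pte; int valid; };
static struct stlb_entry stlb[STLB_SLOTS];

extern uint32_t walk_page_tables(uint32_t vaddr);        /* generic handler */
extern void     tlb_write(uint32_t vaddr, uint32_t pte); /* HW TLB refill */

/* Called when the hardware TLB evicts an entry (the "victim"). */
void stlb_insert(uint32_t vaddr, uint32_t pte)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    stlb[vpn % STLB_SLOTS] = (struct stlb_entry){ vpn, pte, 1 };
}

/* TLB-miss trap: probe the victim cache before the full walk. */
void tlb_miss(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct stlb_entry *e = &stlb[vpn % STLB_SLOTS];

    uint32_t pte = (e->valid && e->vpn == vpn)
                 ? e->pte                   /* fast path: STLB hit */
                 : walk_page_tables(vaddr); /* slow path: full walk */
    tlb_write(vaddr, pte);
}
```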

  39. Caching: Benefits • A faster trap path for TLB misses • Avoids the overhead of a context switch • Eliminates (reduces?) cascaded TLB misses

  40. Caching: Performance • Kernel TLB hit rates • Average STLB penalties

  41. Caching: Performance

  42. Prefetching + Caching: Performance • Worse than using the PTLB alone! (I don’t understand the authors’ comment justifying this …)

  43. Discussion • The STLB (caching) is better than the PTLB, so using it alone suffices • Is it possible to improve IPC performance using both VM page remapping and software prefetching & caching?
