This presentation describes IO-Lite, a unified buffering and caching system that lets subsystems and applications share buffered I/O data. Sharing removes redundant copies and multiple buffering, which otherwise waste memory, raise cache miss rates, and limit server throughput. IO-Lite is built on immutable buffers and mutable buffer aggregates, so data can be shared without compromising its integrity. Experimental results show significant performance improvements in the web servers and applications tested.
IO-Lite: A Unified Buffering and Caching System. By Pai, Druschel, and Zwaenepoel (1999). Presented by Justin Kliger for CS780: Advanced Techniques in Caching, Professor Zhang (Summer 2005)
Outline • Problem & Significance • Literature Review • Proposed Solution • Design, Implementation, & Operation • Experimental Design • Results • Conclusion • Further Research
The Problem • The I/O subsystem and various applications all tend to use their own private I/O buffers • Redundant data copying • Multiple buffering • Lack of cross-subsystem optimization
Problem’s Significance • Wastes memory • Reduces space available for caching • Causes higher cache miss rates • High CPU overhead • Limits server throughput
Literature Review • POSIX I/O • Problem: double-buffering • Memory-mapped files (mmap) • Problem: not generalized to network I/O
Literature Review • Transparent Copy Avoidance • Problems: VM page alignment; copy-on-write faults • Genie (emulated copy) • Lack of full transparency leads to the same problems • Copy Avoidance with Handoff Semantics • Problem: lack of concurrent sharing reduces effectiveness
Literature Review • Fast buffers (fbufs) • Designed by Druschel • Problem: does not support filesystem access or a file cache • Extensible kernels • Problem: more overhead, not OS-portable
IO-Lite Solution • Unified buffering and caching • Allow all applications and subsystems to share the same buffered I/O data • Very simple at face value, very complex to implement
Basic Design • Immutable buffers • Initially allocated data cannot be modified • Effectively read-only sharing • Advantages? • Eliminates synchronization and protection problems • Disadvantages? • I/O data cannot be modified in place
Further Design Considerations To make up for immutable buffers: • Create buffer aggregate abstraction (an ADT) • mutable • Reference to IO-Lite Window in VM • Aggregates contain ordered list of form <address, length> • Aggregates passed by value • Buffers passed by reference
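The split between immutable buffers and mutable aggregates can be sketched in a few lines of C. This is a minimal user-space model, not IO-Lite's actual data structures: the names `Buf`, `Slice`, `Agg`, `buf_new`, `agg_append`, `agg_copy`, and `agg_length` are hypothetical, and the reference count stands in for whatever bookkeeping the real system uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable buffer: contents are set once at creation. */
typedef struct {
    const char *data;   /* read-only view of the bytes */
    size_t      len;
    int         refs;   /* number of slices pointing into this buffer */
} Buf;

/* One <address, length> pair, plus the buffer it points into. */
typedef struct {
    Buf        *buf;
    const char *addr;
    size_t      len;
} Slice;

#define AGG_MAX 8

/* Mutable aggregate: an ordered list of slices. */
typedef struct {
    Slice  s[AGG_MAX];
    size_t n;
} Agg;

/* Allocate a buffer and fill it; afterwards it is only read. */
Buf *buf_new(const char *src, size_t len) {
    Buf  *b = malloc(sizeof *b);
    char *p = malloc(len ? len : 1);
    if (!b || !p) { free(b); free(p); return NULL; }
    memcpy(p, src, len);
    b->data = p;
    b->len  = len;
    b->refs = 0;
    return b;
}

/* Append the range [off, off+len) of b to the aggregate. No bytes move. */
int agg_append(Agg *a, Buf *b, size_t off, size_t len) {
    assert(a != NULL && b != NULL);
    if (a->n == AGG_MAX || off > b->len || len > b->len - off) return -1;
    a->s[a->n].buf  = b;
    a->s[a->n].addr = b->data + off;
    a->s[a->n].len  = len;
    a->n++;
    b->refs++;
    return 0;
}

/* Aggregates are passed by value: copy the slice list only.
 * Buffers are passed by reference: just bump their counts. */
Agg agg_copy(const Agg *a) {
    Agg c = *a;
    for (size_t i = 0; i < c.n; i++) c.s[i].buf->refs++;
    return c;
}

/* Total number of bytes the aggregate describes. */
size_t agg_length(const Agg *a) {
    size_t total = 0;
    for (size_t i = 0; i < a->n; i++) total += a->s[i].len;
    return total;
}
```

Copying an aggregate costs a few words per slice regardless of how much data the slices describe, which is the point of passing aggregates by value and buffers by reference.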
Further Design Considerations • Buffer sharing must be concurrent • To achieve this, use a method similar to fbufs • Expand to include the filesystem • Adapt for a general-purpose OS • Worst-case scenario (in terms of overhead): • Page remapping • (when last buffer is allocated before first is deallocated)
IO-Lite Implementation • New read & write API which supersedes the regular read & write • size_t IOL_read(int fd, IOL_Agg **aggr, size_t size); • size_t IOL_write(int fd, IOL_Agg *aggr); • IOL_Agg is buffer aggregate data type • Both operations are atomic
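The calling convention of the two prototypes above can be illustrated with a user-space mock. Everything below is an assumption made for illustration: the one-slice `IOL_Agg` layout, the helper `IOL_free`, and the function `copy_fd` are hypothetical, and the mock falls back on POSIX `read` and `write`, so it copies data where the real system would share buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Stand-in for the real aggregate type: a single contiguous slice. */
typedef struct IOL_Agg {
    char  *data;
    size_t len;
} IOL_Agg;

/* Mock with the slide's signature. The callee allocates the aggregate,
 * which is why the parameter is IOL_Agg **. Returns 0 on EOF or error. */
size_t IOL_read(int fd, IOL_Agg **aggr, size_t size) {
    assert(aggr != NULL);
    *aggr = NULL;
    IOL_Agg *a = malloc(sizeof *a);
    if (!a) return 0;
    a->data = malloc(size ? size : 1);
    if (!a->data) { free(a); return 0; }
    ssize_t n = read(fd, a->data, size);
    if (n <= 0) { free(a->data); free(a); return 0; }
    a->len = (size_t)n;
    *aggr = a;
    return a->len;
}

/* Mock with the slide's signature: write out everything in the aggregate. */
size_t IOL_write(int fd, IOL_Agg *aggr) {
    size_t done = 0;
    while (done < aggr->len) {
        ssize_t n = write(fd, aggr->data + done, aggr->len - done);
        if (n <= 0) break;
        done += (size_t)n;
    }
    return done;
}

/* Hypothetical helper: release an aggregate returned by IOL_read. */
void IOL_free(IOL_Agg *a) {
    if (a) { free(a->data); free(a); }
}

/* Copy in -> out. The caller never supplies a data buffer of its own. */
size_t copy_fd(int in, int out, size_t chunk) {
    size_t total = 0;
    IOL_Agg *a;
    while (IOL_read(in, &a, chunk) > 0) {
        total += IOL_write(out, a);
        IOL_free(a);
    }
    return total;
}
```

The contrast with POSIX `read` is in `copy_fd`: the application receives an aggregate describing where the data already lives instead of handing the kernel a private buffer to fill.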
IO-Lite Implementation • Applications: • Recommends implementation in runtime I/O Libraries to avoid modifying all programs • Filesystem: • File cache data structure: <file-id, offset, length> • Network: • Need to modify network device drivers to allow early demultiplexing (using a packet filter)
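A lookup over the file cache's <file-id, offset, length> triples might look like the following sketch. The linear scan, the `CacheEntry` layout, and the name `cache_lookup` are assumptions for illustration; the slides do not say how the cache is indexed, and in this model the `data` pointer stands in for a buffer aggregate.

```c
#include <stddef.h>

/* Hypothetical cache entry keyed by <file-id, offset, length>. */
typedef struct {
    int         file_id;
    size_t      offset;   /* where this entry starts within the file */
    size_t      length;   /* how many bytes it covers */
    const char *data;     /* stand-in for a buffer aggregate */
} CacheEntry;

/* Return the entry that fully covers [off, off+len) of the file,
 * or NULL on a cache miss. */
const CacheEntry *cache_lookup(const CacheEntry *tab, size_t n,
                               int file_id, size_t off, size_t len) {
    for (size_t i = 0; i < n; i++) {
        const CacheEntry *e = &tab[i];
        if (e->file_id != file_id || off < e->offset) continue;
        size_t skip = off - e->offset;          /* bytes into the entry */
        if (skip <= e->length && len <= e->length - skip) return e;
    }
    return NULL;
}
```

On a hit, a read can be satisfied by handing back slices of the cached data rather than copying it into an application buffer.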
IO-Lite Operation • With regard to the cache: • Cache replacement basically LRU • Allows for application customization • Cache eviction controlled by VM daemon • Do more than ½ of the replaced pages contain I/O data?
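The eviction trigger and the LRU choice can be sketched as two small functions. The structure names, the counters, and the idea of resetting them after each eviction are assumptions; only the "more than half of replaced pages held I/O data" test and the LRU ordering come from the slide.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters kept by the VM page-replacement path. */
typedef struct {
    unsigned io_pages;     /* replaced pages that held cached I/O data */
    unsigned total_pages;  /* all replaced pages */
} ReplStats;

/* Record one page chosen for replacement. */
void note_page_replaced(ReplStats *s, bool held_io_data) {
    s->total_pages++;
    if (held_io_data) s->io_pages++;
}

/* Evict a cache entry only if MORE than half of the replaced pages
 * held I/O data (exactly half does not trigger eviction). */
bool should_evict(const ReplStats *s) {
    return 2u * s->io_pages > s->total_pages;
}

/* Hypothetical cache entry with a logical timestamp of its last use. */
typedef struct {
    unsigned long last_use;
} Entry;

/* Plain LRU: index of the entry with the oldest last_use, or -1 if empty. */
int pick_victim(const Entry *e, size_t n) {
    if (n == 0) return -1;
    size_t oldest = 0;
    for (size_t i = 1; i < n; i++)
        if (e[i].last_use < e[oldest].last_use) oldest = i;
    return (int)oldest;
}
```

Tying eviction to what the VM daemon is actually replacing lets the file cache shrink when I/O pages dominate memory and hold its ground when other memory is under more pressure.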
IO-Lite Operation • Impact of immutable buffers: • Case 1: Entire object is modified • Lack of in-place modification has no ill effect • Case 2: Subset of object needs to be modified • Rather than recopy entire object, use chaining • Performance loss is small if blocks are localized • Case 3: Scattered subset needs modification • IO-Lite incorporates mmap interface for this
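Case 2 (chaining) can be sketched as building a new aggregate that reuses the untouched parts of the original and points at a new buffer only for the modified range. The types and the names `agg_splice` and `agg_flatten` are hypothetical, and buffer ownership is left out to keep the sketch short.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *addr; size_t len; } Slice;

#define AGG_MAX 16
typedef struct { Slice s[AGG_MAX]; size_t n; } Agg;

/* Append one slice; empty slices are skipped. */
static int push(Agg *a, const char *addr, size_t len) {
    if (len == 0) return 0;
    if (a->n == AGG_MAX) return -1;
    a->s[a->n].addr = addr;
    a->s[a->n].len  = len;
    a->n++;
    return 0;
}

/* Build `out` so that bytes [off, off+len) of `src` are replaced by
 * `patch`. The bytes that `src` points to are never written. */
int agg_splice(const Agg *src, size_t off, size_t len,
               const char *patch, size_t patch_len, Agg *out) {
    size_t total = 0, pos = 0;
    int patched = 0;
    for (size_t i = 0; i < src->n; i++) total += src->s[i].len;
    if (off > total || len > total - off) return -1;
    out->n = 0;
    for (size_t i = 0; i < src->n; i++) {
        const Slice *s = &src->s[i];
        size_t start = pos, end = pos + s->len;
        if (start < off) {                    /* part before the range */
            size_t keep = (end < off ? end : off) - start;
            if (push(out, s->addr, keep)) return -1;
        }
        if (!patched && end > off) {          /* chain in the new data */
            if (push(out, patch, patch_len)) return -1;
            patched = 1;
        }
        if (end > off + len) {                /* part after the range */
            size_t from = start > off + len ? start : off + len;
            if (push(out, s->addr + (from - start), end - from)) return -1;
        }
        pos = end;
    }
    if (!patched && push(out, patch, patch_len)) return -1;  /* off == total */
    return 0;
}

/* Copy the aggregate's bytes into dst (used here only to inspect results). */
size_t agg_flatten(const Agg *a, char *dst) {
    size_t n = 0;
    for (size_t i = 0; i < a->n; i++) {
        memcpy(dst + n, a->s[i].addr, a->s[i].len);
        n += a->s[i].len;
    }
    return n;
}
```

For example, splicing "there" over bytes 7 to 11 of an aggregate holding "Hello, " and "world!" yields three slices, two of which still point into the original data. The more scattered the modifications, the more slices accumulate, which is why Case 3 falls back to the mmap interface.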
Experimental Design • Compared: • Apache 1.3.1 • Widely used web server • Flash (event-driven HTTP server) • Designed by authors in previous year • Flash-Lite (Flash modified to use IO-Lite API) • New design by authors
Experimental Design • General: varied requested file size • 40 requests for same file • File size ranged from 500 bytes – 200 Kbytes • Persistent connections • Reduces overhead • CGI • Additional I/O traditionally slows servers
Experimental Design • Real workloads • Shows performance benefits by allowing more space for caching • Based on Rice’s CSCI department logs • Wide Area Network (WAN) • Test throughput with 0-256 slow clients connecting • Applications • Incorporated API into UNIX programs
Results • General test: • Flash-Lite bandwidth increase of 43% over Flash, 137% over Apache • No real difference for files smaller than 5 KBytes
Results • Persistent Connections • Flash-Lite even more effective at smaller file sizes • CGI • All servers slow, but Flash-Lite still much better • Real workload • Flash-Lite throughput 65% greater than Apache • WAN • Flash-Lite does not suffer from slow clients • Applications • Varied improvement for all programs tested
Conclusion • IO-Lite consistently improved performance in all contexts tested • Requires modification to numerous libraries and network device drivers • E.g., see Peng, Sharma, & Chiueh (2003)
Further Research • There have been 42 citations • Almost all fell between 2001-2003 • Authors have not written any follow-ups • Lack of papers that involve implementation of IO-Lite or a variation of it • Probably because of complexity and number of modifications that are necessary
Appendix: Figures 2, 3, 4, 5, and 6 (images not reproduced)