Supporting High Performance I/O with Effective Caching and Prefetching


Presentation Transcript


  1. Supporting High Performance I/O with Effective Caching and Prefetching Song Jiang The ECE Department Wayne State University

  2. Disk I/O performance is critical to data-intensive applications • Data-intensive applications demand efficient I/O support. • Database; • Multimedia; • Scientific simulations; • Other desktop applications. • Hard disk is the most cost-effective online storage device. • Workstations; • Distributed systems; • Parallel systems.

  3. Unbalanced system improvements: the disks in 2000 are more than 57 times “SLOWER”, relative to other system components, than their ancestors in 1980. (Bryant and O’Hallaron, “Computer Systems: A Programmer’s Perspective”, Prentice Hall, 2003.)

  4. Disk performance and disk arm seeks • Disk performance is limited by mechanical constraints; • Seek • Rotation • Disk seek is the Achilles’ heel of disk access performance; • Access of sequential blocks is much faster.

  5. An example: IBM Ultrastar 18ZX specification* • Sequential read: 4,700 IOs/s; • Random read: < 200 IOs/s. Our goal: to increase access sequentiality for a larger transfer rate. (* Taken from IBM “ULTRASTAR 9LZX/18ZX Hardware/Functional Specification”, Version 2.4.)
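A quick back-of-the-envelope comparison illustrates how large this gap is. The 4 KB request size below is an assumption for illustration; the specification above gives only the IO rates.

```python
# Approximate transfer rates implied by the IBM Ultrastar 18ZX figures
# above: 4,700 sequential vs. < 200 random IOs per second.
# The 4 KB request size is an assumption for illustration.

BLOCK_KB = 4

def throughput_mb_per_s(ios_per_sec, block_kb=BLOCK_KB):
    """Convert an IO rate into an approximate transfer rate (MB/s)."""
    return ios_per_sec * block_kb / 1024

seq = throughput_mb_per_s(4700)
rand = throughput_mb_per_s(200)
print(f"sequential: {seq:.1f} MB/s, random: {rand:.1f} MB/s, "
      f"ratio: {seq / rand:.1f}x")
```

Even using the generous 200 IOs/s figure for random reads, sequential access delivers over 20 times the bandwidth, which is why increasing sequentiality pays off.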

  6. Towards peak disk transfer rate

  7. Random accesses are common • Workloads on popular operating systems • UNIX: most accessed files are short in length (80% are smaller than 26 Kbytes) [Ousterhout 1991]; • Windows NT: 40% of I/O operations are to files shorter than 2KBytes [Vogels 1999]. • Scientific computing • The SIO study: in many applications the majority of requests are for a small amount of data (less than a few Kbytes) [Reed 1997]; • The CHARISMA study: large, regular data structures are distributed among processes with interleaved accesses of shared files [Kotz 1996].

  8. Improving disk performance by reducing random accesses Two approaches: • Aggregate small random accesses into large sequential accesses; • Hide random accesses. My objective: improve access sequentiality in a general-purpose OS without any aid from programs. Software levels: • Applications: some database systems; • Run-time systems: collective I/O; • Operating systems: TIP informed prefetching; • Disk scheduler: CSCAN, SSTF.

  9. Outline • Inadequacy of current buffer cache management in OS; • Managing disk layout information; • Sequence-based and history-aware prefetching; • Miss-penalty aware caching; • Performance evaluation in a Linux implementation; • Related work; • Summary.

  10. Critical role of buffer cache in OS • The buffer cache can significantly influence the request stream to disk: • Only unsatisfied requests are passed to disk; • Prefetch requests are generated to disk. • The opportunity for improving access sequentiality is not well exploited: • The buffer cache ignores disk layout information; • The buffer cache is an agent between two sides, but only the behavior of one side (the application) is considered. • The effectiveness of the I/O scheduler relies on the output of the buffer cache. [Diagram: Application I/O requests → Buffer cache (caching & prefetching) → I/O scheduler → Disk driver → Disk]

  11. Problems of caching without disk layout information • Current caching minimizes the cache miss ratio by exploiting only temporal locality. • Average access time = (1 - miss_ratio) * hit_time + miss_ratio * miss_penalty; • Sequentially accessed blocks incur a small miss penalty, randomly accessed blocks a large one; • A small number of misses can still mean a long access time. • Should minimize access time by also considering on-disk data layout: • Distinguish sequentially and randomly accessed blocks; • Give “expensive” random blocks a high caching priority to suppress future random accesses; • Disk accesses become more sequential. [Diagram: blocks A, B, C, D and X1–X4 laid out on the tracks of a hard disk drive]
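Plugging hypothetical timings into the formula above shows why a lower miss ratio does not guarantee a lower access time. The millisecond values are illustrative assumptions, not measurements.

```python
# Average access time per the slide's formula. Timings are assumed:
# 0.1 ms cache hit, 1 ms penalty for a sequential miss (nearby block),
# 10 ms penalty for a random miss (seek + rotation).

def avg_access_time(miss_ratio, hit_time_ms, miss_penalty_ms):
    return (1 - miss_ratio) * hit_time_ms + miss_ratio * miss_penalty_ms

# A low miss ratio where every miss is a random access ...
few_random_misses = avg_access_time(0.05, 0.1, 10.0)      # 0.595 ms
# ... loses to a 4x higher miss ratio of cheap sequential misses.
many_sequential_misses = avg_access_time(0.20, 0.1, 1.0)  # 0.28 ms
print(few_random_misses, many_sequential_misses)
```

This is exactly the case the slide makes for giving random blocks a higher caching priority: suppressing the expensive misses matters more than minimizing the miss count.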

  12. Problems of prefetching without disk layout information • Current prefetching is conducted only on the file abstraction, working only inside individual files. • Performance constraints of prefetching on the logical file abstraction: • Sequential accesses spanning multiple small files cannot be detected; • Sequentiality in the logical abstraction may not correspond to sequentiality on the physical disk; • Previously detected sequences of blocks are not remembered; • Metadata blocks cannot be prefetched. • Should conduct prefetching directly on the disk layout of data. [Diagram: blocks of files X, Y, Z, and R (X1, X2, Y1, Y2, Z1, Z2, A–D) interleaved on the tracks of a hard disk drive]

  13. Outline • Inadequacy of current buffer cache management in OS; • Managing disk layout information; • Sequence-based and history-aware prefetching; • Miss-penalty aware caching; • Performance evaluation in a Linux implementation; • Related work; • Summary.

  14. Using disk layout information in buffer cache management • What disk layout information to use? • Logical block number (LBN): the logical disk geometry provided by the disk firmware; • Accesses of contiguous LBNs perform close to accesses of contiguous blocks on disk; • The LBN interface is highly portable across platforms. • How to efficiently manage the disk layout information? • Currently, LBNs are used only to identify disk locations for reads/writes; • We use LBNs to track access times of disk blocks and to search for access sequences; • Block table: a data structure for efficient disk block tracking.

  15. Block table: a new data structure for tracking disk blocks • A three-level structure indexed by the base-512 digits of the LBN, e.g. LBN 5140 = 0*512² + 10*512 + 20, giving indices (0, 10, 20); • Each leaf entry records the block’s recent access timestamps (time1, time2). [Diagram: three-level block table with nodes N1–N4; the path through indices 0, 10, and 20 locates the entry for LBN 5140]
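The lookup path can be sketched as a three-level radix structure indexed by the base-512 digits of the LBN. The class and method names below are illustrative, and Python dicts stand in for the page-sized array nodes a kernel implementation would use.

```python
# Sketch of a block table: a three-level radix structure indexed by the
# base-512 digits of an LBN, storing the two most recent access
# timestamps per block (which history-aware prefetching needs).

TABLE_WIDTH = 512

class BlockTable:
    def __init__(self):
        self.root = {}

    @staticmethod
    def _digits(lbn):
        # e.g. LBN 5140 = 0*512**2 + 10*512 + 20  ->  (0, 10, 20)
        return (lbn // TABLE_WIDTH**2,
                (lbn // TABLE_WIDTH) % TABLE_WIDTH,
                lbn % TABLE_WIDTH)

    def record_access(self, lbn, timestamp):
        d1, d2, d3 = self._digits(lbn)
        leaf = self.root.setdefault(d1, {}).setdefault(d2, {})
        previous, _ = leaf.get(d3, (None, None))
        leaf[d3] = (timestamp, previous)   # keep the two latest times

    def last_access(self, lbn):
        d1, d2, d3 = self._digits(lbn)
        return self.root.get(d1, {}).get(d2, {}).get(d3, (None, None))[0]
```

Every lookup or update touches at most three nodes, so tracking access times by LBN stays cheap even for a large disk.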

  16. DiskSeen: a two-side-oriented buffer cache scheme [Diagram: the buffer cache is divided into a caching area, a prefetching area, and a destaging area; blocks transfer between the areas and the disk via on-demand reads, asynchronous prefetching, and delayed write-back]

  17. Outline • Inadequacy of current buffer cache management in OS; • Managing disk layout information; • Sequence-based and history-aware prefetching; • Miss-penalty aware caching; • Performance evaluation in a Linux implementation; • Related work; • Summary.

  18. Sequence-based prefetching • Prefetching is based on LBNs; • The algorithm is similar to the Linux 2.6.11 readahead algorithm: • When 32 contiguous blocks are accessed, create a 16-block current window and a 32-block readahead window; • The two windows are shifted forward while sequential accesses are detected. [Diagram: current window and readahead window on a timestamp/LBN plot]
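A simplified sketch of this window mechanism follows. It collapses the two-window bookkeeping into a single readahead window that opens after 32 contiguous accesses and shifts forward when the reader reaches it; the names and structure are illustrative, not the actual Linux or DiskSeen code.

```python
# Sequence-based prefetching sketch: after SEQ_THRESHOLD contiguous LBN
# accesses, open a readahead window; shift it forward once the reader
# reaches its first block. Window sizes follow the slide.

SEQ_THRESHOLD = 32
READAHEAD_WIN = 32

class SequencePrefetcher:
    def __init__(self):
        self.run_start = None  # first LBN of the current sequential run
        self.next_lbn = None   # LBN expected next if access stays sequential
        self.readahead = None  # (start, end) of the readahead window

    def access(self, lbn):
        """Record an access; return the list of LBNs to prefetch."""
        if lbn != self.next_lbn:           # sequentiality broken: restart
            self.run_start, self.next_lbn = lbn, lbn + 1
            self.readahead = None
            return []
        self.next_lbn += 1
        run_length = self.next_lbn - self.run_start
        if self.readahead is None and run_length >= SEQ_THRESHOLD:
            start = self.next_lbn          # open the first readahead window
            self.readahead = (start, start + READAHEAD_WIN)
            return list(range(*self.readahead))
        if self.readahead and lbn == self.readahead[0]:
            start = self.readahead[1]      # reader caught up: shift forward
            self.readahead = (start, start + READAHEAD_WIN)
            return list(range(*self.readahead))
        return []
```

Reading LBNs 0 through 31 triggers the first 32-block prefetch; as the reader enters each prefetched window, the next one is issued, keeping the disk a window ahead of the application.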

  19. History-aware prefetching • Prefetching matches the current on-going trail of accesses against a recorded history trail; • Prefetching parameters: • Spatial window size: 32 [ Block: 5 … (5+32) ]; • Temporal window size: 256 [ timestamp: 43515 … 43515+256 ]. [Diagram: blocks B1–B8, …, B37 with their recorded access timestamps, showing a history trail and the current on-going trail through them]

  20. History-aware prefetching (Cont’d) • The two windows are shifted forward to mask disk access times; • Setting the parameters for the next readahead window: • Increase the spatial window size if many prefetched blocks are requested; • Increase the temporal window size if only a small number of blocks are prefetchable. [Diagram: current window and readahead window, with spatial and temporal window sizes, on a timestamp/LBN plot]
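These two rules can be sketched as follows; the 75% use-rate threshold, the half-window prefetchability threshold, the doubling policy, and the caps are all illustrative assumptions rather than DiskSeen's actual parameters.

```python
# Adaptive sizing of the next readahead window, per the rules above:
# grow the spatial window when prefetched blocks are being used, grow
# the temporal window when few blocks fall inside it. All thresholds
# and caps are assumed values for illustration.

SPATIAL_MAX = 256     # blocks
TEMPORAL_MAX = 4096   # timestamp units

def next_window_sizes(spatial, temporal, prefetched, used, prefetchable):
    """Return (spatial, temporal) sizes for the next readahead window."""
    if prefetched and used / prefetched > 0.75:
        spatial = min(spatial * 2, SPATIAL_MAX)     # prefetches paying off
    if prefetchable < spatial // 2:
        temporal = min(temporal * 2, TEMPORAL_MAX)  # widen the history slice
    return spatial, temporal
```

Growing the spatial window rewards accurate prefetching with deeper pipelining, while growing the temporal window compensates when the history trail is too sparse to fill the current one.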

  21. Outline • Inadequacy of current buffer cache management in OS; • Managing disk layout information; • Sequence-based and history-aware prefetching; • Miss-penalty aware caching; • Performance evaluation in a Linux implementation; • Related work; • Summary.

  22. Miss-penalty-aware management of the destaging area • Two FIFO queues: on-demand blocks in the on-demand queue, prefetched blocks in the prefetch queue; • Memory allocation between the queues determines block replacement; • The benefit of adding a segment to a queue = the miss penalty observed in that queue’s ghost segment. [Diagram: on-demand queue and prefetch queue, each followed by a ghost segment]
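The allocation rule can be sketched as follows. The penalty values and the 'seq'/'rand' tagging of ghost hits are assumptions made for illustration; the function names are hypothetical.

```python
# Memory allocation between the two destaging-area queues: each queue's
# ghost segment records blocks it recently evicted, and the benefit of
# one more segment is taken to be the total miss penalty of blocks that
# were re-requested while in the ghost. The queue with the larger
# benefit receives the next segment. Penalty values are illustrative.

SEQ_PENALTY_MS = 1.0    # assumed cost of re-reading a sequential block
RAND_PENALTY_MS = 10.0  # assumed cost of re-reading a random block

def ghost_benefit(ghost_hits):
    """Total miss penalty seen in a ghost segment.
    ghost_hits: iterable of 'seq' / 'rand' tags, one per ghost hit."""
    return sum(RAND_PENALTY_MS if tag == 'rand' else SEQ_PENALTY_MS
               for tag in ghost_hits)

def pick_queue_to_grow(ghost_hits_by_queue):
    """Grow the queue whose ghost segment absorbed the larger penalty."""
    return max(ghost_hits_by_queue,
               key=lambda q: ghost_benefit(ghost_hits_by_queue[q]))
```

Weighting ghost hits by miss penalty rather than counting them is what makes the scheme penalty-aware: a few random-block ghost hits outweigh many cheap sequential ones.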

  23. Summary: the DiskSeen scheme • Using the block table, disk layout information is easily examined; • Prefetching works directly on the disk layout and is history-aware; • Block replacement is miss-penalty aware. [Diagram: buffer cache divided into caching, prefetching, and destaging areas]

  24. Outline • Inadequacy of current buffer cache management in OS; • Managing disk layout information; • Sequence-based and history-aware prefetching; • Miss-penalty aware caching; • Performance evaluation in a Linux implementation; • Related work; • Summary.

  25. The DiskSeen prototype in Linux 2.6.11 • Prefetches blocks directly on the block device; • Blocks without disk mappings are treated as random blocks; • The segment size for memory allocation is 128KB; • Test platform: Intel P4 3.0GHz processor, 512MB of memory, and a 160GB, 7200 RPM Western Digital hard disk; • The file system is Ext3.

  26. Benchmarks • grep: searches a collection of files for lines matching a given regular expression; • CVS: a version control system; • TPC-H: a decision-support benchmark; • BLAST: a tool that searches multiple databases to match nucleotide or protein sequences.

  27. Execution times of benchmarks [Chart: benchmark execution times without DiskSeen, with DiskSeen (1st run), and with DiskSeen (2nd run); annotated improvements of 25% and 74%]

  28. Disk head movements (CVS) [Plots: disk head position over time without DiskSeen and with DiskSeen, showing movements between the CVS repository and the working directory; prefetching activity is marked]

  29. Disk head movement (grep) [Plots: disk head position over time without DiskSeen and with DiskSeen]

  30. Disk head movement (grep) [Plots: without DiskSeen and with DiskSeen, with data blocks and inode blocks distinguished]

  31. Disk head movement (grep) [Plots: without DiskSeen and with DiskSeen, with prefetching activity marked]

  32. Outline • Inadequacy of current buffer cache management in OS; • Managing disk layout information; • Sequence-based and history-aware prefetching; • Miss-penalty aware caching; • Performance evaluation in a Linux implementation; • Related work; • Summary.

  33. Related work • Caching • Numerous replacement algorithms have been proposed: LRU, FBR [Robinson, 90], LRU-K [O’Neil, 93], 2Q [Johnson, 94], LRFU [Lee, 99], EELRU [Smaragdakis, 99], MQ [Zhou, 01], LIRS [Jiang, 02], ARC [Modha, 03], CLOCK-Pro [Jiang, 05], … • They all aim only to minimize the number of misses, rather than the disk time incurred by misses.

  34. Related work (Cont’d) • Prefetching based on complex models: • Probability graph model [Vellanki, 99]; • Information-theoretic Lempel-Ziv algorithm [Curewitz, 93]; • Time series model [Tran, 01]. • Commercial systems have rarely used such sophisticated prediction schemes.

  35. Related work (Cont’d) • Prefetching • Hint-based prefetching [Patterson 95, Cao 96, Brown 01]; • Across-file prefetching [Griffioen 94, Kroeger 96, Kroeger 01]: • Uses the file as the prefetching unit; • High overhead and inconsistent performance; • Unsuitable for a general-purpose OS; • Applicable only to application-level prefetching, such as web proxy/server file prefetching [Chen 03].

  36. Related work (Cont’d) • Improving disk placement • Cluster related data and metadata into the same cylinder group [McKusick 84, Ganger 97]; • Reorganize disk block placement [Black 91, Hsu 03]; • Dynamically create block replicas [Huang 05]. • DiskSeen is more flexible and more responsive to changing access patterns.

  37. Summary • Disk performance is constrained by random accesses; • The buffer cache plays a critical role in reducing random accesses; • The lack of disk layout information in the buffer cache limits its performance; • DiskSeen improves disk performance by considering both temporal and spatial locality; • A Linux kernel implementation shows significant performance improvements for various types of workloads.
