
Storage Systems CSE 598d, Spring 2007



Presentation Transcript


  1. Storage Systems CSE 598d, Spring 2007 Lecture 11: Disk scheduling Feb 27, 2007 (ACK: Several slides borrowed from Shiva Chaitanya)

  2. Disk Access Time: Components • CPU time to issue and process I/O • contention for controller • contention for bus • contention for memory • verifying block correctness with checksums (retransmissions) • waiting in scheduling queues • ...

  3. Disk Scheduling Seek time is a dominant component of total disk I/O time. Let the operating system or disk controller choose which request to serve next, depending on the head’s current position and the requested block’s position on disk. Disk scheduling is much harder than CPU scheduling: • a mechanical device – hard to determine (accurate) access times • a disk access cannot be preempted – it runs until it finishes • disk I/O is often the main performance bottleneck

  4. Scheduling at Multiple Locations! S/W and H/W components between an application and the disk – each a possible scheduling location: file system, device driver, SCSI bus, RAID controller (if employing RAID), some bus, disk controller. Why? • Why not do it only at the FS/DD level? • Why not do it only within the disk?

  5. Scheduling at Multiple Locations! Why? • Key ideas that disk scheduling employs: • Request re-ordering for seek/positioning minimization • Exploit temporal locality • Anticipation for sequential streams • Introduce non-work conserving behavior! • Exploit spatial locality • Coalesce consecutively placed requests • Free-block scheduling • Different optimizations are best done at different locations • Furthermore, the best location to do an optimization depends on the workload!

  6. Goals • Short response time • High overall throughput • Fairness (equal probability for all blocks to be accessed in the same time) Tradeoff: Throughput vs. Fairness Socialism vs. Capitalism?

  7. Disk Scheduling Several traditional algorithms • First-Come-First-Serve (FCFS) • Shortest Seek Time First (SSTF) • Shortest Positioning Time First (SPTF) • SCAN • C-SCAN • LOOK • C-LOOK • …

  8. First–Come–First–Serve (FCFS) (figure: schedule plotted over cylinders 1–25 against time) FCFS serves the first arriving request first: • long seeks • short average response time. Incoming requests (in order of arrival): 21, 8, 7, 2, 14, 24, 12
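The FCFS schedule can be sketched in a few lines of Python. The head's starting cylinder is not stated on the slide, so 12 is an assumed value, and `fcfs` is a hypothetical helper name:

```python
def fcfs(head, requests):
    """Serve requests strictly in arrival order; return the schedule and
    the total seek distance in cylinders."""
    schedule, total = [], 0
    for r in requests:
        total += abs(r - head)  # seek from the current head position
        head = r
        schedule.append(r)
    return schedule, total

# Arrival order from the slide; head start of 12 is an assumption.
order, seek = fcfs(12, [21, 8, 7, 2, 14, 24, 12])
```

Serving in strict arrival order incurs every back-and-forth seek, which is why FCFS saturates quickly as the load grows.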

  9. Shortest Seek Time First (SSTF) (figure: schedule plotted over cylinders 1–25 against time) SSTF serves the closest request first: • short seek times • longer maximum seek times – may lead to starvation. Incoming requests (in order of arrival): 24, 21, 8, 7, 2, 14, 12
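SSTF is a greedy pick of the nearest pending request. A minimal sketch (head start of 12 is assumed, as the slide does not state it; `sstf` is a hypothetical name):

```python
def sstf(head, requests):
    """Repeatedly serve the pending request closest to the head.
    Greedy on seek distance; distant requests can starve."""
    pending, schedule, total = list(requests), [], 0
    while pending:
        nxt = min(pending, key=lambda r: abs(r - head))
        pending.remove(nxt)
        total += abs(nxt - head)
        head = nxt
        schedule.append(nxt)
    return schedule, total

# Arrival order from the slide; head start of 12 is an assumption.
order, seek = sstf(12, [24, 21, 8, 7, 2, 14, 12])
```

Note how request 24 – the first to arrive – is served last: the starvation risk the slide warns about.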

  10. SCAN (figure: schedule plotted over cylinders 1–25 against time) SCAN moves the head from edge to edge and serves requests on the way: • bi-directional • a compromise between response time and seek time optimization. Incoming requests (in order of arrival): 14, 21, 24, 7, 12, 2, 8
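A sketch of one upward-then-downward SCAN sweep, assuming cylinders 1–25 (from the figure's axis) and a head start of 12; note that SCAN travels all the way to the edge even when no request lies there (LOOK drops that extra travel):

```python
def scan(head, requests, lo=1, hi=25):
    """One SCAN pass moving toward higher cylinders first: serve requests
    on the way up, reverse at the edge, serve the rest on the way down."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    schedule = up + down
    # Seek distance: head -> high edge, then high edge -> lowest served request.
    total = (hi - head) + ((hi - down[-1]) if down else 0)
    return schedule, total

# Arrival order from the slide; head start of 12 is an assumption.
order, seek = scan(12, [14, 21, 24, 7, 12, 2, 8])
```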

  11. C–SCAN (figure: schedule plotted over cylinders 1–25 against time) Circular SCAN moves the head from edge to edge: • serves requests in one direction only – uni-directional • improves response time (fairness). Incoming requests (in order of arrival): 7, 12, 2, 21, 14, 8, 24

  12. LOOK and C–LOOK (figure: schedule plotted over cylinders 1–25 against time) LOOK (C-LOOK) is a variation of SCAN (C-SCAN): • same schedule as SCAN • does not run to the edges – stops and reverses at the outermost and innermost requests • increased efficiency. Incoming requests (in order of arrival): 12, 14, 2, 24, 7, 21, 8
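C-LOOK serves upward to the highest pending request, then jumps back to the lowest and continues upward, without visiting either disk edge. A sketch (head start of 12 assumed; the wrap-around jump is counted here as one long seek, which is a modeling choice):

```python
def c_look(head, requests):
    """C-LOOK: serve upward to the highest pending request, then jump
    to the lowest pending request and continue upward."""
    up = sorted(r for r in requests if r >= head)
    wrapped = sorted(r for r in requests if r < head)
    schedule = up + wrapped
    total, pos = 0, head
    for r in schedule:
        total += abs(r - pos)  # includes the single wrap-around jump
        pos = r
    return schedule, total

# Arrival order from the slide; head start of 12 is an assumption.
order, seek = c_look(12, [12, 14, 2, 24, 7, 21, 8])
```

Serving in one direction only keeps response times more uniform across cylinders than plain LOOK, at the cost of the wrap-around seek.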

  13. V–SCAN(R) V-SCAN(R) combines SCAN (or LOOK) and SSTF • define an R-sized unidirectional SCAN window, i.e., C-SCAN, and use SSTF outside the window • Example: V-SCAN(0.6) • makes a C-SCAN (C-LOOK) window over 60 % of the cylinders • uses SSTF for requests outside the window • V-SCAN(0.0) is equivalent to SSTF • V-SCAN(1.0) is equivalent to SCAN • V-SCAN(0.2) has been suggested as an appropriate configuration
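One common way to realize this SSTF/SCAN blend (following the classic V(R) formulation, which matches the slide's endpoints: R = 0 gives SSTF, R = 1 gives SCAN) is to charge a direction-reversal penalty proportional to R. This is a sketch under that assumption, with hypothetical names:

```python
def v_scan_next(head, direction, pending, R, n_cyl=25):
    """Pick the next request under V(R): seek distance in the current
    sweep direction counts as-is, while reversing direction adds a
    penalty of R * n_cyl.  R = 0.0 degenerates to SSTF; R = 1.0
    behaves like SCAN (reversals are almost never worthwhile)."""
    def effective(r):
        d = abs(r - head)
        same_direction = (r - head) * direction >= 0
        return d if same_direction else d + R * n_cyl
    return min(pending, key=effective)

# Head at cylinder 12 sweeping upward: V(0) takes the nearer request
# behind the head (pure SSTF); V(1) keeps sweeping upward (SCAN-like).
```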

  14. Shortest Positioning Time First (SPTF) • Given complete knowledge of the actual mapping of data blocks onto the media, the scheduler can choose the request with the minimum positioning delay (combined seek and rotational latency) • SPTF, like SSTF, suffers from poor starvation resistance. To reduce response time variance, priority can be given to requests that have been in the pending queue for excessive periods of time

  15. Aged Shortest Positioning Time First (ASPTF) • ASPTF(W) adjusts each positioning delay (Tpos) by subtracting a weighted value corresponding to the amount of time the request has been waiting for service (Twait) Teff = Tpos – (W*Twait) • For large values of W, ASPTF behaves like FCFS
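The aging rule above is a one-liner over (positioning delay, waiting time) pairs; with W = 0 it reduces to plain SPTF, and as W grows the waiting-time term dominates, pushing the choice toward the oldest request (FCFS-like behavior). A minimal sketch with a hypothetical helper name:

```python
def asptf_next(pending, W):
    """pending: list of (t_pos, t_wait) pairs - positioning delay and
    time already spent waiting.  Pick the request minimizing
    T_eff = T_pos - W * T_wait."""
    return min(pending, key=lambda p: p[0] - W * p[1])

# A fresh nearby request vs. an older, farther one (illustrative values).
fresh, old = (5.0, 0.0), (8.0, 10.0)
```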

  16. Scheduling in Modern Disk Drives Features of current disk drives that affect traditional scheduling algorithms • Host interface • Data layout • On-board cache Ref: B.L. Worthington, G.R. Ganger, Y.N. Patt: “Scheduling Algorithms for Modern Disk Drives”, ACM SIGMETRICS 1994

  17. Host interface • Controller presents a request to the disk drive in terms of the starting logical block number and request size • Subsequent media access hidden from the host • Scheduling entities outside of the drive have little knowledge of overhead delays

  18. Data Layout • Many systems assume sequentiality of LBN-to-PBN mappings in seek reducing algorithms • Aggressive algorithms require highly accurate knowledge of the data layout which is typically hidden • Complexity of mappings increased by zoned recording, track/cylinder skew and defect management

  19. On-Board Cache Memory within disk drives has progressed from small speed-matching buffers to megabytes of cache memory. Disk logic typically prefetches data into the cache to satisfy sequential read requests. This affects scheduling in two ways: • the position of the head cannot be determined easily • requests that can be satisfied from the cache could be given higher priority

  20. Scheduling by Logical Block Number As expected, FCFS quickly saturates as workload increases; SSTF provides lower mean response time

  21. Scheduling by Logical Block Number FCFS has the lowest coefficient of variation for lighter workloads. As FCFS begins to saturate and its response time variance increases, C-LOOK emerges as the better algorithm for response time variance

  22. Scheduling with Full Knowledge As W increases, the average response time slowly grows, though variance drops

  23. Scheduling with Full Knowledge

  24. Modern Disk Scheduling • In modern drives, C-LOOK best exploits the prefetching cache for workloads with significant read sequentiality • SSTF and LOOK perform better for random workloads • Powerful disk controllers use variants of Shortest Positioning Time First (SPTF).

  25. Freeblock Scheduling • An approach to utilizing more of a disk’s potential media bandwidth • Fill rotational latency periods with useful media transfers for background applications • It has been observed that 20–50% of a never-idle disk’s bandwidth can often be provided to background applications without affecting foreground response times Ref: Christopher R. Lumb, Jiri Schindler, Greg Ganger: “Towards Higher Disk Head Utilization: Extracting Free Bandwidth From Busy Disk Drives”, OSDI 2000

  26. Disk-intensive background tasks • Disk Reorganization • File system cleaning • Backup • Prefetching • Write-back • Integrity Checking • RAID scrubbing • Virus detection • Index Reorganization • …

  27. Free Bandwidth • Time required for a disk media access: Taccess = Tseek + Trotate + Ttransfer • Freeblock scheduling uses the Trotate component of a disk access to transfer additional data • Instead of just waiting for the desired sector to arrive, this technique transfers the intermediate sectors
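The Taccess decomposition makes the opportunity easy to quantify: the rotational-latency share of an access is the window freeblock scheduling can fill. A sketch with illustrative (assumed) timings:

```python
def free_fraction(t_seek, t_rotate, t_transfer):
    """Fraction of a media access spent in rotational latency - the
    window freeblock scheduling can fill with background transfers.
    Taccess = Tseek + Trotate + Ttransfer, per the slide."""
    t_access = t_seek + t_rotate + t_transfer
    return t_rotate / t_access

# Assumed timings in ms: 4 ms seek, 3 ms rotational latency, 1 ms
# transfer.  Rotational latency is then 37.5% of the access.
f = free_fraction(4.0, 3.0, 1.0)
```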

  28. Steps in Freeblock Scheduling • Predict how much rotational latency will occur before the next foreground media transfer • Requires detailed knowledge of disk attributes, including layout algorithms and time dependent mechanical positioning overheads • Squeeze additional media transfers into that time • Get to the destination track in time for the foreground transfer

  29. Anticipatory Disk Scheduling Reorder available disk requests for • performance, by seek optimization • proportional resource allocation, etc. Any policy needs multiple outstanding requests to make good decisions! Ref: Sitaram Iyer, Peter Druschel: “Anticipatory scheduling: A disk scheduling framework to overcome deceptive idleness in synchronous I/O”, SOSP 2001

  30. With enough requests… (figure: requests issued by processes A and B, disk location against time, served with short seeks) e.g., throughput = 21 MB/s (IBM Deskstar disk)

  31. With synchronous I/O… (figure: each process’s next request arrives “too late”, forcing the scheduler into long seeks between A and B) e.g., throughput = 5 MB/s

  32. Deceptive idleness Process A is about to issue its next request, but the scheduler hastily assumes that process A has no further requests!

  33. Proportional scheduler Allocate disk service in, say, a 1:2 ratio: deceptive idleness causes a 1:1 allocation instead (figure: service alternating A, B, A, B)

  34. Anticipatory scheduling Key idea: Sometimes wait for process whose request was last serviced. Keeps disk idle for short intervals. But with informed decisions, this: • Improves throughput • Achieves desired proportions

  35. Cost-benefit analysis Balance expected benefits of waiting against cost of keeping disk idle. Tradeoffs sensitive to scheduling policy e.g., 1. seek optimizing scheduler 2. proportional scheduler

  36. Statistics For each process, measure: 1. expected median and 95th-percentile think time 2. expected positioning time (figure: think-time histogram – number of requests against think time between the last and next request, with the median and 95th percentile marked)

  37. Cost-benefit analysis for a seek-optimizing scheduler best := best available request chosen by the scheduler next := expected forthcoming request from the process whose request was last serviced Benefit = best.positioning_time — next.positioning_time Cost = next.median_thinktime Waiting_duration = (Benefit > Cost) ? next.95percentile_thinktime : 0
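The decision rule above translates directly into code. A minimal sketch (all parameter names are paraphrases of the slide's terms; times are in arbitrary units):

```python
def waiting_duration(best_pos_time, next_pos_time,
                     next_median_think, next_95pct_think):
    """Seek-optimizing anticipation rule: wait (for up to the
    95th-percentile think time) only when the positioning time saved
    by the anticipated request exceeds the expected idle cost."""
    benefit = best_pos_time - next_pos_time  # positioning time saved
    cost = next_median_think                 # expected idle time
    return next_95pct_think if benefit > cost else 0.0
```

So a large saving against a fast-thinking process justifies idling the disk briefly, while a marginal saving does not.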

  38. Proportional scheduler Costs and benefits differ by policy; e.g., for a proportional scheduler: wait for the process whose request was last serviced 1. if it has received less than its allocation, and 2. if its think time is below a threshold (e.g., 3 ms) Waiting_duration = next.95percentile_thinktime
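The proportional variant's wait test is a conjunction of the two conditions on the slide. A sketch with hypothetical names (the 3 ms threshold is the slide's example value):

```python
def should_wait(serviced, allocation, think_time_ms, threshold_ms=3.0):
    """Proportional-scheduler anticipation rule: keep the disk idle for
    the last-serviced process only while it is under its allocation
    and thinks quickly enough to be worth waiting for."""
    return serviced < allocation and think_time_ms < threshold_ms
```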

  39. Prefetch Overlaps computation with I/O. Side-effect: avoids deceptive idleness! • Application-driven • Kernel-driven

  40. Conclusion Anticipatory scheduling: • overcomes deceptive idleness • achieves significant performance improvement on real applications • achieves desired proportions • and is easy to implement!

  41. Fairness: Evaluating disk scheduling algorithms • Storage system designers prefer to keep the queue length at disks small regardless of the load • When the queuing threshold is reached at the disk, the controller or the device driver queues the requests until the disk queue is processed • A low queuing threshold minimizes request starvation at the disk level when unfair scheduling algorithms are deployed Ref: Alma Riska, Erik Riedel: “It’s not fair – evaluating efficient disk scheduling”, MASCOTS 2003

  42. Results • Queuing more requests at the disk gives the scheduling algorithms more information, which is used for better disk resource utilization • The percentage of requests starved remains small even if longer queues build up at the disk • Overall request starvation is independent of the queuing threshold at the disk

  43. Storage subsystem architecture Queues exist at various levels: outstanding requests are queued at the disk and at the device driver in a single-disk system, and at the disks and the controller(s) in a multiple-disk system

  44. Impact of queuing thresholds (figures: average loads of 64 and of 16 outstanding requests in the system)

  45. Response time distribution • The higher the load, the larger the gap between the performance of different scheduling algorithms • Fair but simple FCFS yields the longest average request response time • The best performance is obtained when increasing the queue threshold under SPTF • What about request starvation and variability in the request response time?

  46. Response time distribution (figures: tails of the response time distribution with an average load of 16 outstanding requests, for queuing thresholds of 8 and of 16)

  47. Observations • The majority of requests under FCFS exhibit long response times, while seek-reducing algorithms result in a majority of short response times • More than 90% of requests under SPTF have shorter response times than under FCFS, and only 1% exhibit up to double the FCFS response times • The amount of starvation in position-based scheduling algorithms is the same relative to FCFS for both queuing thresholds • Hence, queuing more requests improves disk performance without introducing more request starvation

  48. Scheduling at Device Driver Level • Depends on workload and filesystem layout • E.g., with SCAN, seek times to sectors in the middle of the disk are shorter • The OS could choose between algorithms based on the current queue • likely to be expensive in CPU cycles • the queue changes as new requests arrive • SSTF or SCAN are reasonable defaults • Allow algorithm selection as part of OS tuning • FreeBSD: C-SCAN • Linux 2.2: SCAN • Linux 2.6: four different versions of the elevator algorithm

  49. Discussion: Scheduling at Multiple Locations • Positioning-based optimizations are best done within the disk • Seek-based optimizations are best done at the device driver • Why do scheduling within the FS? • device- and DD-independent • aware of the buffer cache • application isolation • Disk queue length is crucial • a short queue results in degraded throughput – locally good but globally bad schedules • a long queue results in unfairness • Non-work conservation can improve fairness and throughput! • anticipatory scheduling • Achieving proportional fairness is non-trivial • solutions based on a hierarchy of queues and anticipatory scheduling can help • Request coalescing can result in a great improvement in throughput • FS and device driver are good places • improves the sequentiality of the request stream seen by the disk • Free-block scheduling can improve throughput • can be viewed as a “corrector” for the non-work-conserving nature of the disk

  50. Additional slides on free-block scheduling
