
MASSIVE ARRAYS OF IDLE DISKS FOR STORAGE ARCHIVES


Presentation Transcript


  1. MASSIVE ARRAYS OF IDLE DISKS FOR STORAGE ARCHIVES
  D. Colarelli, D. Grunwald, U. Colorado, Boulder

  2. Highlights
  • Paper proposes:
    • To replace tape libraries by large non-redundant arrays of disks
    • To cache on active drives:
      • Files that have been recently accessed
      • Update logs for other files
    • To keep other drives mostly inactive by spinning them down between accesses

  3. Introduction (I)
  • Robotic tape libraries are now the standard solution for archiving very large amounts of data
  • Disadvantages include:
    • Slow access times: average search time of 41 s for T9940 drives
    • Not much cheaper than disk drives
  • Could we replace them by massive arrays of hard drives?

  4. Introduction (II)
  • Major limitation of a hard drive solution is power consumption
    • Almost ten times that of an equivalent tape library
  • Could power down disks that are not currently accessed:
    • 50% of data are likely to be never accessed
    • 25% of data are likely to be accessed once

  5. Introduction (III)
  • Must be at least as reliable as tape libraries
  • No need to use a redundant scheme
  • Solution is a Massive Array of Idle Disks (MAID)
  • Paper investigates design issues through trace-driven simulations

  6. Design Issues
  • Two major design decisions:
    • Data migration or duplication (caching)
    • File system or block-level interface

  7. Migration or caching
  • Migration would move "hot" data to the active drives
    • Uses disk space more efficiently
    • Requires a map or directory mechanism that maps the storage across all drives
  • Caching would cache read data and act as a write log for write data
    • Keeps two copies of all cached files
    • Maps or directories are proportional to the size of the cache

  8. File system or block interface
  • File system interface:
    • Could use file system information to cache entire files
    • Would probably perform better
    • Would require system modifications
  • Block interface:
    • Would work with existing systems

  9. MAID with caching
  [Architecture diagram: a Virtualization Manager sits above a Cache Manager and a Passive Drive Manager; active drives are always on, passive drives spin up and down]

  10. Design choices (I)
  • Compared MAID-cache and MAID-no cache
  • MAID-cache (sketched after this slide):
    • Caches reads and writes on active drives
    • Caching unit is a "chunk" of 64 sectors
    • Cache policy is LRU
    • All writes are placed in the cache write log, where they wait to be committed to the non-active (passive) drives
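A minimal sketch of the bookkeeping this slide describes, assuming a simple in-memory model: an LRU cache of 64-sector chunks plus a write log for uncommitted writes. Only the 64-sector chunk size, the LRU policy, and the write log come from the slide; the class and its capacity parameter are illustrative assumptions.

    from collections import OrderedDict

    CHUNK_SECTORS = 64  # caching unit from the paper: a "chunk" of 64 sectors

    class MaidCache:
        def __init__(self, capacity_chunks):
            self.capacity = capacity_chunks
            self.chunks = OrderedDict()  # chunk_id -> data, kept in LRU order
            self.write_log = []          # writes awaiting commit to passive drives

        def chunk_id(self, sector):
            return sector // CHUNK_SECTORS

        def read(self, sector):
            cid = self.chunk_id(sector)
            if cid in self.chunks:
                self.chunks.move_to_end(cid)  # refresh LRU position
                return self.chunks[cid]
            return None  # miss: caller must fetch from a passive drive

        def insert(self, cid, data):
            self.chunks[cid] = data
            self.chunks.move_to_end(cid)
            if len(self.chunks) > self.capacity:
                self.chunks.popitem(last=False)  # evict least recently used chunk

        def write(self, sector, data):
            # All writes go to the write log until committed to passive drives;
            # the data is also cached so subsequent reads hit on active drives.
            self.write_log.append((sector, data))
            self.insert(self.chunk_id(sector), data)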

  11. Design choices (II)
  • Must always check the write log before reading data from the cache or the passive drives (see the sketch after this slide)
  • Passive drives remain on standby until:
    • A cache miss occurs, or
    • The write log becomes too long
  • They return to standby when the spin-down inactivity time limit is reached
  • Varying this time limit is the primary way to affect system performance and energy consumption
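To make the ordering on this slide concrete, here is a hedged sketch of the read path: the write log is consulted first, then the chunk cache, and only on a full miss is a passive drive spun up. MaidCache is the sketch from slide 10; drive_for, spin_up, and read_sector are hypothetical stand-ins for the drive-management layer.

    import time

    def maid_read(cache, drives, sector):
        # 1. The write log may hold data newer than anything on disk,
        #    so it must be scanned first (newest entries win).
        for logged_sector, data in reversed(cache.write_log):
            if logged_sector == sector:
                return data
        # 2. Next, try the LRU chunk cache on the always-on active drives.
        data = cache.read(sector)
        if data is not None:
            return data
        # 3. Cache miss: wake the passive drive holding this sector.
        drive = drives.drive_for(sector)      # hypothetical lookup
        drive.spin_up()                       # no-op if already spinning
        drive.last_access = time.monotonic()  # restart the inactivity timer
        return drive.read_sector(sector)      # hypothetical device read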

  12. Simulation parameters
  • Power management policy:
    • Always on
    • Fixed-delay spin-down
    • Adaptive spin-down
  • Data layout (sketched after this slide):
    • Linear: keep successive blocks on the same drive
    • Striped: spread successive blocks across drives
  • Caching/No caching
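A minimal sketch of the two layouts, assuming a flat logical block space: both functions map a logical block number to a (drive, local block) pair, and the parameter names are ours.

    def linear_placement(block, blocks_per_drive):
        # Linear: successive blocks stay on the same drive, so a
        # sequential scan wakes at most one passive drive at a time.
        return block // blocks_per_drive, block % blocks_per_drive

    def striped_placement(block, num_drives):
        # Striped: successive blocks rotate across drives, spreading
        # load but potentially spinning up many drives for one scan.
        return block % num_drives, block // num_drives

For example, with four drives, blocks 0-3 land on drives 0, 1, 2, 3 under striping, but all on drive 0 under the linear layout.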

  13. Simulation results (I)
  • Based on a supercomputer center workload
  • All MAID configurations achieve similar power consumption:
    • 15 to 16% of that of the always-on configuration
  • MAID configurations w/o cache have average response times comparable to that of the always-on configuration
    • The workload had little locality

  14. Simulation results (II)
  • Average response times of MAID configurations with a cache are much worse than that of the always-on configuration:
    • 0.680 to 0.720 s compared to 0.303 s
  • The striped configuration with a fixed spin-down delay has the lowest average response time of all MAID configurations: 0.309 s

  15. Conclusion
  • MAID can achieve average response times comparable to that of an always-on configuration with a much lower power consumption
  • IMPORTANT: In a more recent paper, the authors found that cached configurations worked much better for workloads exhibiting more locality of accesses than their supercomputer center workload
