five minute rule ten years later and other computer storage rules of thumb
Download
Skip this Video
Download Presentation
“Five minute rule ten years later and other computer storage rules of thumb”

Loading in 2 Seconds...

play fullscreen
1 / 17

Five minute rule ten years later and other computer storage rules of thumb - PowerPoint PPT Presentation


  • 199 Views
  • Uploaded on

“Five minute rule ten years later and other computer storage rules of thumb”. Authors: Jim Gray, Goetz Graefe Reviewed by: Nagapramod Mandagere Biplob Debnath. Outline. Problem Statement Motivation Importance and Relevance Main Contributions and Validation Key Ideas Illustrations

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Five minute rule ten years later and other computer storage rules of thumb ' - sheehan


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
five minute rule ten years later and other computer storage rules of thumb

“Five minute rule ten years later and other computer storage rules of thumb”

Authors: Jim Gray, Goetz Graefe

Reviewed by: Nagapramod Mandagere Biplob Debnath

outline
Outline
  • Problem Statement
  • Motivation
  • Importance and Relevance
  • Main Contributions and Validation
  • Key Ideas
  • Illustrations
  • New Metrics
  • Assumptions
  • Re-write Today
  • Questions
problem statement
Problem Statement
  • Broader Problem: Viewing developments over a long period of time to try and extract important technology trends.
  • Specific Instance: Inferring rules of thumb for buffer replacement policies in a number of settings, including RAID environments.
    • Given: Trends over time for parameters such as memory cost, disk cost, tape cost
    • Find: Rules of thumb for deciding where to store the data and when to replace data from memory buffer
    • Objectives: Simple rules, extensible rules
    • Constraints: Hierarchical Storage Model
typical database administrators dilemma
Typical Database Administrators Dilemma

The performance isn’t good. Am I doing something wrong?

Should I cache on the client?

Should I cache this data in memory?

Should store data back on disk? (local or network disk)

Should I move data to tape?

importance relevance
Importance & Relevance
  • Different rates at which parameters changes
  • seek/second & Disk capacity – 10x to 100x
  • Disk MB/K$ & DRAM MB/K$ - 1000x
importance relevance6
Importance & Relevance
  • The location of data is very important
    • Main Memory: Very Fast, Expensive, limited size
    • Disk Storage: Lot slower that main memory, inexpensive, close to unlimited size
    • Tape Storage: Slowest, dirt cheap, unlimited capacity
  • How can one decide what data resides where?
    • System Learns from data access patterns and adapts (Admins hate to give up control)
    • Administrator controls data locality by using some experience or historical performance info (rules of thumb)
main contributions validation
Main Contributions & Validation
  • The Five minute rule
    • Randomly accessed buffer pages can be replaced if unused for more than 5 minutes.
    • Sequentially accessed buffer pages can be replaced if unused for more than 1 minute.
  • Metrics for storage performance characterization
    • Cost/Access
    • Maps: Megabyte accesses per second
    • Scan: Time it takes to sequentially read or write all the data in the device
  • Validation Methodology - Examples
    • Examples
      • Random access
      • On pass sort
      • Two pass sort
    • Trends observed over a period of time
key ideas
Key Ideas
  • Tradeoff between the cost of RAM and the cost of disk accesses.
    • The tradeoff is that caching pages in the extra memory can save disk IOs.
    • The break-even point is met when the rent on the extra memory for cache ($/page/sec) exactly matches the savings in disk accesses per second ($/disk_access/sec).
illustration typical system in 1997
Illustration – Typical System in 1997
  • For a system with following characteristics
    • PagesPerMBofRAM = 128 pages/MB (8KB pages)
    • AccessesPerSecondPerDisk = 64 access/sec/disk
    • PricePerDiskDrive = 2000 $/disk (9GB + controller)
    • PricePerMBofDRAM = 15 $/MB_DRAM
  • The Inter reference interval is 266 seconds ~ 5 minutes
illustration
Illustration
  • One pass algorithms
    • reads data and never references it,
    • no need to cache the data in RAM.
    • system needs only enough buffer memory to allow data to stream from disk to main memory.
    • Typically, two or three one-track buffers (~100 KB) are adequate per disk to buffer disk operations and allow the device to stream data to the application.
illustration11
Illustration
  • Two pass algorithms
    • sequential operations that read a large dataset and then revisit parts of the data.
    • Database join, cube, rollup, and sort operators
    • Sorting uses two pass if memory size is smaller than the data set size
    • Inter reference time is typically about a minute (sequential data access)
illustration two pass sort
Illustration – Two Pass Sort
  • One pass sort needs larger amount of memory
  • Memory needed grows faster with size of input file
  • For files bigger than memory size, two pass is the only option
disk vs tape tradeoff
Disk vs Tape tradeoff
  • Tape vs Disk Trade off ?????
    • Tape - larger penalty (slower access, least cost)
    • Solution – Larger breakeven point, bigger page size
new metrics
New Metrics
  • Data flow applications which stream huge amounts of data like data mining applications, multimedia applications
  • New Metrics
    • Kaps
      • Kilo byte accesses per second
    • Maps
      • Mega byte accesses per second
    • Scan
      • Time taken to sequentially read or write all data on a device
  • These metrics combined with rent costs provide a price/performance metric
assumptions
Assumptions
  • Disk storages have same characteristics (cost/performance). It assumes that the disk storage systems is homogenous and does not consider the more recent shift towards hierarchical/heterogeneous storage systems.
  • The trade off only consider the performance aspect, the security and fault tolerance issues are assumed to be uniform throughout.
re write
Re-write
  • Re-evaluate the rules of thumb considering more recent costs and the more recent trends in storage systems like heterogeneous/hierarchical storage
    • Take into account SAN, NAS characteristics
questions
Questions???
  • Does Five minute rule hold good today???
  • No (With Reservations)
    • If one changes the Page Size to MegaByte range, five minute rule still applies.
      • Pages/MB of RAM = 16 (8 K pages)
      • Access/sec/disk = 64
      • Price/disk drive = $400
      • Price/MB of RAM = $0.1
      • Break even point ~ 1000s
  • Further Evidence - Jim (Keynote in FAST 2004) Grayhttp://www.usenix.org/events/fast05/
ad