
Motivation for Memory Hierarchy


Presentation Transcript


  1. Motivation for Memory Hierarchy
  • What we want from memory
    • Fast
    • Large
    • Cheap
  • There are different kinds of memory technologies
    • Register files, SRAM, DRAM, MRAM, disk…

                Register    Cache        Memory     Disk Memory
    size:       32 B        32 KB-4 MB   1024 MB    300 GB
    speed:      0.3 ns      1 ns         30 ns      8 × 10^6 ns
    $/MB:                   $60          $0.10      $0.001
    line size:  8 B         32 B         4 KB

    larger, slower, cheaper →

  2. Need for Speed
  • Assume the CPU runs at 3 GHz
  • Every instruction requires a 4 B instruction fetch and at least one memory access (4 B of data)
  • 3 GHz * 8 B = 24 GB/sec
  • This is the peak rate for a sequential burst of transfers (performance for random access is much slower due to latency); a sketch of this calculation follows below
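A minimal sketch of the peak-bandwidth arithmetic above, assuming the slide's 3 GHz clock and one 4 B instruction fetch plus one 4 B data access per instruction; the variable names are illustrative, not from the slides:

    #include <stdio.h>

    int main(void) {
        double clock_hz        = 3e9;  /* 3 GHz, one instruction per cycle */
        double bytes_per_instr = 4.0   /* instruction fetch                */
                               + 4.0;  /* at least one data access         */

        /* Peak demand assumes a perfect sequential burst with no latency. */
        double peak_bytes_per_sec = clock_hz * bytes_per_instr;

        printf("Peak memory bandwidth needed: %.0f GB/s\n",
               peak_bytes_per_sec / 1e9);  /* prints 24 GB/s */
        return 0;
    }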

  3. Need for Large Memory
  • Small memories are fast
    • So just write small programs?
      "640 K of memory should be enough for anybody." -- Bill Gates, 1981
  • Real programs require large memories
    • PowerPoint 2003: 25 megabytes
    • Database applications may require gigabytes of memory

  4. Levels in Memory Hierarchy
  • A hierarchy makes memory appear faster, larger, and cheaper by exploiting locality of reference
    • Temporal locality
    • Spatial locality
  • Memory
    • Latency (remember it from the pipeline?) determines random-access time
    • Bandwidth determines the rate for moving blocks of memory
  • Strategy: provide a small, fast memory that holds a subset of main memory
    • It has both low latency (smaller address space) and
    • high bandwidth (larger data width)

  5. Basic Philosophy
  • Move data into the 'smaller, faster' memory
  • Operate on it there (latency)
  • Move it back to the 'larger, cheaper' memory (bandwidth)
  • How do we keep track of what has changed?
  • What if we run out of space in the 'smaller, faster' memory?

  6. Typical Hierarchy
  • Notice that the data width changes between levels. Why?
  • Bandwidth: transfer rate between the various levels
    • CPU-Cache: 24 GB/s
    • Cache-Main: 0.5-6.4 GB/s
    • Main-Disk: 187 MB/s (Serial ATA/1500)

    [Diagram: CPU (regs) <-> cache <-> memory <-> disk, with transfer
    widths of 8 B, 32 B, and 4 KB between successive levels; the
    cache-memory boundary implements the "cache", the memory-disk
    boundary implements "virtual memory"]

  7. Bandwidth Issue
  • Fetch large blocks at a time (bandwidth)
  • Supports spatial locality, as in the loop below

        for (i = 0; i < length; i++)
            sum += array[i];

  • array has spatial locality
  • sum has temporal locality
  • A runnable version of this loop follows below
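A self-contained version of the loop above, runnable as written; the array contents and length are made up for illustration:

    #include <stdio.h>

    int main(void) {
        int array[] = {1, 2, 3, 4, 5, 6, 7, 8};  /* illustrative data */
        int length  = sizeof array / sizeof array[0];
        int sum     = 0;

        /* Consecutive elements share a cache line: spatial locality. */
        /* sum is touched on every iteration: temporal locality.      */
        for (int i = 0; i < length; i++)
            sum += array[i];

        printf("sum = %d\n", sum);  /* prints 36 */
        return 0;
    }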

  8. Figure of Merit
  • Why are we building the cache?
    • To minimize the average memory access time
    • That means maximizing the number of accesses found in the cache
  • "Hit rate"
    • Percentage of memory accesses found in the cache
  • Assumptions
    • Every instruction requires exactly 1 memory access
    • Every instruction requires 1 clock cycle to complete
    • Cache access time is the same as the clock cycle
    • Main memory access time is 20 cycles
  • CPI (cycles/instruction) = hitRate * clocksCacheHit + (1 – hitRate) * clocksCacheMiss

  9. CPI
  • Highly sensitive to hit rate
    • 90% hit rate: 0.90 * 1 + 0.10 * 20 = 2.90 CPI
    • 95% hit rate: 0.95 * 1 + 0.05 * 20 = 1.95 CPI
    • 99% hit rate: 0.99 * 1 + 0.01 * 20 = 1.19 CPI
  • Hit rate matters
    • A larger cache or a multi-level cache improves hit rate
  • A worked check of these numbers follows below
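A minimal check of the CPI formula from the previous slide, using the slide's assumptions (1-cycle cache hits, 20-cycle misses); the function name is ours, not from the slides:

    #include <stdio.h>

    /* CPI = hitRate * clocksCacheHit + (1 - hitRate) * clocksCacheMiss */
    static double cpi(double hit_rate, double hit_cycles, double miss_cycles) {
        return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
    }

    int main(void) {
        double rates[] = {0.90, 0.95, 0.99};
        for (int i = 0; i < 3; i++)
            printf("hit rate %.0f%% -> %.2f CPI\n",
                   rates[i] * 100.0, cpi(rates[i], 1.0, 20.0));
        /* Prints 2.90, 1.95, and 1.19 CPI. */
        return 0;
    }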

  10. How Is the Cache Implemented?
  • Basic concept
    • Traditional memory: given an address, provide some data
    • Associative memory: given data, provide an address
      • AKA "content-addressable memory" (CAM)
      • The "data" being matched is the memory address
      • The "address" returned is which cache line holds it

  11. Cache Implementation
  • Fully associative (read the text for set associative)
  • A software sketch of this lookup follows below

    [Diagram: an associative memory drawn as a grid; its height is the
    number of cache lines and its width is the width of a cache line]
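A software sketch of the fully associative lookup idea, assuming a tiny hypothetical 4-line cache with 32 B lines; a hardware CAM compares all tags in parallel, whereas this loop searches them sequentially. All names and sizes here are illustrative:

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_LINES 4          /* number of cache lines (illustrative) */
    #define LINE_SIZE 32         /* 32 B lines                           */

    struct line {
        bool     valid;
        uint32_t tag;            /* address with the offset bits removed */
        uint8_t  data[LINE_SIZE];
    };

    static struct line cache[NUM_LINES];

    /* Given an address (the "data" to match), return which line holds  */
    /* it (the "address" a CAM produces), or -1 on a miss.              */
    static int lookup(uint32_t addr) {
        uint32_t tag = addr / LINE_SIZE;
        for (int i = 0; i < NUM_LINES; i++)   /* in hardware: parallel  */
            if (cache[i].valid && cache[i].tag == tag)
                return i;                     /* hit                    */
        return -1;                            /* miss                   */
    }

    int main(void) {
        cache[2].valid = true;
        cache[2].tag   = 0x1234 / LINE_SIZE;
        return lookup(0x1234) == 2 ? 0 : 1;   /* hit in line 2          */
    }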

  12. The Issues
  • How is the cache organized?
    • Size
    • Line size
    • Number of lines
  • Write policy
  • Replacement strategy

  13. Cache Size
  • Need to choose the size of lines
    • Bigger lines exploit more spatial locality
    • Diminishing returns for larger and larger lines
    • Tends to be around 128 B
  • And the number of lines
    • More lines == higher hit rate, but slower memory
    • Use as many as practical
  • A sketch of how these parameters relate follows below
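A small sketch of how the parameters above relate, for a hypothetical 64 KB cache with the roughly 128 B lines mentioned on the slide; the total size is our assumption, not from the slides:

    #include <stdio.h>

    int main(void) {
        unsigned cache_bytes = 64 * 1024;  /* 64 KB total (illustrative)  */
        unsigned line_bytes  = 128;        /* ~128 B lines, per the slide */

        unsigned num_lines = cache_bytes / line_bytes;   /* 512 lines    */

        /* Offset bits: log2(line_bytes) address bits select a byte     */
        /* within a line; the remaining address bits form the tag.      */
        unsigned offset_bits = 0;
        for (unsigned b = line_bytes; b > 1; b >>= 1)
            offset_bits++;                               /* 7 bits       */

        printf("%u lines, %u offset bits per address\n",
               num_lines, offset_bits);
        return 0;
    }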

  14. Writing to the Cache
  • Need to keep the cache consistent with memory
  • Write to cache and memory simultaneously
    • "Write-through"
  • Refinement: write to the cache only and mark the line as 'dirty'
    • Will need to eventually copy it back to main memory
    • "Write-back"
  • A sketch contrasting the two policies follows below
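A minimal sketch contrasting the two write policies above, using a single cached word for clarity; none of these names come from the slides:

    #include <stdbool.h>
    #include <stdint.h>

    struct cached_word {
        uint32_t value;
        bool     dirty;
    };

    static uint32_t main_memory;          /* stand-in for one memory word */
    static struct cached_word cache_word;

    /* Write-through: update cache and memory together; no dirty state.  */
    static void write_through(uint32_t v) {
        cache_word.value = v;
        main_memory      = v;
    }

    /* Write-back: update only the cache and mark it dirty; memory is    */
    /* brought up to date later, when the line is evicted.               */
    static void write_back(uint32_t v) {
        cache_word.value = v;
        cache_word.dirty = true;
    }

    static void evict(void) {
        if (cache_word.dirty) {           /* copy back only if modified   */
            main_memory      = cache_word.value;
            cache_word.dirty = false;
        }
    }

    int main(void) {
        write_through(1);                 /* memory == 1 immediately      */
        write_back(2);                    /* memory still 1, cache dirty  */
        evict();                          /* now memory == 2              */
        return main_memory == 2 ? 0 : 1;
    }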

  15. Replacement Strategies
  • Problem: we need to make space in the cache for a new entry
  • Which line should be 'evicted'?
    • Ideal: the line with the longest time until its next access
    • Least-recently used (LRU)
      • Complicated
    • Random selection
      • Simple
      • Effect on hit rate is relatively small
  • Sketches of both strategies follow below
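A minimal sketch of the two practical strategies above, reusing the hypothetical 4-line cache from earlier; LRU is approximated here with a per-line timestamp, which is exactly the bookkeeping that makes it "complicated" in hardware:

    #include <stdint.h>
    #include <stdlib.h>

    #define NUM_LINES 4

    static uint64_t last_used[NUM_LINES]; /* updated on every access      */
    static uint64_t now;                  /* logical access counter       */

    /* Call on every cache access to record recency. */
    static void touch(int line) {
        last_used[line] = ++now;
    }

    /* LRU: evict the line that was used least recently. */
    static int evict_lru(void) {
        int victim = 0;
        for (int i = 1; i < NUM_LINES; i++)
            if (last_used[i] < last_used[victim])
                victim = i;
        return victim;
    }

    /* Random: pick any line; simple, and usually not much worse. */
    static int evict_random(void) {
        return rand() % NUM_LINES;
    }

    int main(void) {
        touch(0); touch(1); touch(2); touch(3);
        touch(0);                         /* line 1 is now the LRU        */
        return evict_lru() == 1 ? 0 : 1;
    }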

  16. Processor-DRAM Gap (Latency)

    [Figure: processor vs. DRAM performance, 1980-2000, log scale.
    CPU performance ("Moore's Law") grows ~60%/year while DRAM grows
    only ~7%/year, so the processor-memory performance gap grows
    ~50%/year. Patterson, 1998]

  17. Will Do Almost Anything to Improve Hit Rate
  • Lots of techniques
  • Most important: make the cache big
    • An improvement of 1% is very worthwhile
  • Avoid the worst case whenever possible
  • Multilevel caching
