
Memory Hierarchy


Presentation Transcript


  1. Memory Hierarchy: Faster Access, Lower Cost

  2. Principle of Locality • Programs access small portions of their address space at any instant of time. • Two types • Temporal locality • Item referenced will be referenced again soon • Spatial locality • Items near the last referenced item will be referenced soon
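
A small (illustrative) C loop shows both kinds at once: sum and i are reused on every iteration (temporal locality), while the array is walked through adjacent addresses (spatial locality).

```c
#include <stdio.h>

int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++) a[i] = i;

    int sum = 0;                    /* 'sum' and 'i' are touched every     */
    for (int i = 0; i < 1024; i++)  /* iteration: temporal locality        */
        sum += a[i];                /* a[0], a[1], a[2], ... sit in        */
                                    /* adjacent memory: spatial locality   */
    printf("%d\n", sum);
    return 0;
}
```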

  3. Memory Hierarchy • Takes advantage of the principle of locality • Memory technologies • SRAM – fast but costly • DRAM – slower but less costly • Magnetic disk – much slower but very cheap • Idea: construct a hierarchy of these memories, increasing in size with distance from the processor

  4. Cache Memory (Two Levels) • Block – smallest unit of data transferred • Hit rate – fraction of memory accesses found in the cache • Miss rate – 1 − hit rate • Hit time – time to access a level of memory, including the time to determine hit or miss • Miss penalty – time required to fetch a block from the next lower level of memory

  5. Direct Mapped Cache • How do you map a block from the larger memory space to the cache? • Simplest method: assign one cache location to each memory location • Function: • (block address) mod (# cache blocks) • If # cache blocks is 2^n, the cache index for block address A is A mod 2^n • Note this is just the lower n bits of A
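
A sketch of the index computation (the block and cache sizes here are illustrative assumptions, not from the slides): when the block count is a power of two, the mod and the low-n-bits computations agree.

```c
#include <stdio.h>
#include <stdint.h>

#define BLOCK_BYTES 16u    /* illustrative: 4 words per block            */
#define NUM_BLOCKS  256u   /* illustrative: 2^8 blocks, so n = 8         */

int main(void) {
    uint32_t addr = 0x12345678;
    uint32_t block_addr = addr / BLOCK_BYTES;           /* strip byte offset */
    uint32_t index_mod  = block_addr % NUM_BLOCKS;      /* (block) mod (#blk)*/
    uint32_t index_bits = block_addr & (NUM_BLOCKS - 1);/* low n bits: same  */
    printf("mod: %u  low-bits: %u\n", index_mod, index_bits);
    return 0;
}
```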

  6. A Direct-Mapped Cache Example

  7. Accessing a Cache • References: 10110 - m, 11010 - m, 10110 - h, 11010 - h, 10000 - m, 00011 - m, 10000 - h, 10010 - m

  8. Updated Cache • References: 10110 - m, 11010 - m, 10110 - h, 11010 - h, 10000 - m, 00011 - m, 10000 - h, 10010 - m
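
The trace can be checked with a minimal simulator, assuming (as the figures on slides 6–8 suggest) a direct-mapped cache of 8 one-word blocks, so the index is the low 3 bits and the tag the high 2 bits of each 5-bit address. It prints exactly the h/m pattern above.

```c
#include <stdio.h>

static void print_bin5(int v) {          /* print a 5-bit address in binary */
    for (int b = 4; b >= 0; b--) putchar(((v >> b) & 1) ? '1' : '0');
}

int main(void) {
    int valid[8] = {0}, tag[8] = {0};    /* 8 one-word blocks */
    int refs[] = {0x16, 0x1A, 0x16, 0x1A, 0x10, 0x03, 0x10, 0x12};
    /* = 10110, 11010, 10110, 11010, 10000, 00011, 10000, 10010 */

    for (int i = 0; i < 8; i++) {
        int idx = refs[i] & 0x7;         /* low 3 bits: cache index */
        int t   = refs[i] >> 3;          /* high 2 bits: tag        */
        int hit = valid[idx] && tag[idx] == t;
        if (!hit) { valid[idx] = 1; tag[idx] = t; }   /* fill on miss */
        print_bin5(refs[i]);
        printf(" - %c\n", hit ? 'h' : 'm');
    }
    return 0;
}
```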

  9. Selecting the Data

  10. Handling Cache Misses • Must modify control to account for misses • Consider instruction memory • Algorithm • Send PC − 4 to memory (the PC has already been incremented) • Read memory and wait for the result • Write the cache entry • Restart instruction execution
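
A simulator-style sketch of those steps under toy assumptions (word-addressed memory, one-word cache lines); the PC − 4 adjustment is elided because this toy PC is never incremented mid-fetch.

```c
#include <stdio.h>
#include <stdint.h>

#define LINES 8            /* tiny illustrative i-cache: 8 one-word lines */
static uint32_t mem[64];   /* fake instruction memory, word-addressed     */
static uint32_t data[LINES], tagv[LINES];
static int      valid[LINES];

/* Fetch the instruction at word address 'pc_w'; on a miss, perform the
 * slide's steps: send the address, read memory, fill the entry, restart. */
static uint32_t fetch(uint32_t pc_w) {
    for (;;) {
        uint32_t idx = pc_w % LINES, tag = pc_w / LINES;
        if (valid[idx] && tagv[idx] == tag)
            return data[idx];            /* hit                        */
        data[idx] = mem[pc_w];           /* read memory, wait          */
        tagv[idx] = tag; valid[idx] = 1; /* write the cache entry      */
        /* looping around = restarting the instruction fetch           */
    }
}

int main(void) {
    for (uint32_t i = 0; i < 64; i++) mem[i] = 0x1000 + i;
    printf("0x%x 0x%x\n", fetch(5), fetch(5));   /* miss, then hit */
    return 0;
}
```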

  11. Handling Writes • Want to avoid inconsistent cache and memory • Two approaches • Write-through • Write-back

  12. Write-Through • Idea: Write data into both cache and memory • Simple solution • Problematic in that the write to memory will take longer than write to cache (maybe 100 times longer) • Can use a write buffer • What problems arise from using a write buffer?

  13. Write-Back • Write only to the cache • Mark cache blocks that have been written to as “dirty” • If block is dirty it must be written to memory when it is replaced • What type of problems can arise using this strategy?
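
A minimal sketch of a write-back cache (the toy structures and sizes are assumptions, not from the slides): stores write only the cache and set the dirty bit, and the deferred memory update happens at eviction; the comment marks where write-through would differ.

```c
#include <stdio.h>
#include <stdint.h>

#define LINES 4                  /* toy direct-mapped cache, 1 word/line */
static uint32_t mem[64];         /* fake main memory, word-addressed     */
static uint32_t data[LINES], tagv[LINES];
static int      valid[LINES], dirty[LINES];

/* Write-back store to word address 'a': write only the cache and mark
 * the line dirty; memory is updated later, when the line is evicted.
 * (Write-through would instead also do 'mem[a] = v;' on every store.)  */
static void store(uint32_t a, uint32_t v) {
    uint32_t idx = a % LINES, tag = a / LINES;
    if (valid[idx] && tagv[idx] != tag) {
        if (dirty[idx])                               /* dirty line evicted: */
            mem[tagv[idx] * LINES + idx] = data[idx]; /* write it back now   */
        valid[idx] = 0;
    }
    if (!valid[idx]) {                                /* (re)fill the line   */
        valid[idx] = 1; tagv[idx] = tag;
        data[idx] = mem[a]; dirty[idx] = 0;
    }
    data[idx] = v; dirty[idx] = 1;
}

int main(void) {
    store(2, 111);                   /* line 2 dirty; memory untouched   */
    printf("mem[2] = %u\n", mem[2]); /* still 0                          */
    store(2 + LINES, 222);           /* conflicting address evicts line 2 */
    printf("mem[2] = %u\n", mem[2]); /* 111: written back on eviction    */
    return 0;
}
```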

  14. Memory Design to Support Caches • Assume: • 1 memory bus clock cycle to send the address • 15 memory bus clock cycles per DRAM access • 1 memory bus clock cycle to send one word of data • 4-word block transfer • 1 + 4×15 + 4×1 = 65 bus clock cycles • Miss penalty is high • Bytes transferred per clock cycle: (4 words × 4 bytes) / 65 ≈ 0.25

  15. Memory Designs • How do designs b & c increase the bytes-per-clock-cycle transfer rate?
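
One way to see it, assuming the standard textbook organizations — (b) a memory and bus wide enough to fetch the whole block in one access, (c) four interleaved banks that overlap their DRAM accesses but still send one word per cycle — is to redo the slide-14 arithmetic.

```c
#include <stdio.h>

/* Miss-penalty arithmetic for a 4-word block with the slide's timings:
 * 1 cycle to send the address, 15 cycles per DRAM access, 1 cycle per
 * word (or bus-width unit) transferred.                                */
int main(void) {
    int a = 1 + 4*15 + 4*1;  /* (a) one-word-wide memory and bus = 65   */
    int b = 1 + 15 + 1;      /* (b) 4-word-wide memory and bus   = 17   */
    int c = 1 + 15 + 4*1;    /* (c) 4-way interleaved banks: DRAM       */
                             /*     accesses overlap; words sent 1/cyc  */
    printf("a: %d cycles, %.2f bytes/cycle\n", a, 16.0 / a);
    printf("b: %d cycles, %.2f bytes/cycle\n", b, 16.0 / b);
    printf("c: %d cycles, %.2f bytes/cycle\n", c, 16.0 / c);
    return 0;
}
```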

  16. Bits In Cache • Block size is larger than a word – say 2^m words • Cache has 2^n blocks • Tag bits: 32 − (n + m + 2) • Size: 2^n × (2^m × 32 + (32 − n − m − 2) + 1)
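
A worked instance of the formula (the 16 KiB / 4-word-block configuration is an illustrative assumption): n = 10, m = 2, so the tag is 32 − 10 − 2 − 2 = 18 bits and the total is 2^10 × (128 + 18 + 1) = 147 Kbit.

```c
#include <stdio.h>

/* Total bits for a direct-mapped cache with 2^n blocks of 2^m words and
 * 32-bit addresses: 2^n × (2^m × 32 + (32 − n − m − 2) + 1).           */
int main(void) {
    int n = 10, m = 2;                   /* 1024 blocks, 4 words each    */
    long blocks = 1L << n, words = 1L << m;
    long tag    = 32 - n - m - 2;        /* tag bits per block           */
    long total  = blocks * (words * 32 + tag + 1);  /* +1 = valid bit    */
    printf("tag = %ld bits, total = %ld bits (%.0f Kbit)\n",
           tag, total, total / 1024.0);
    return 0;
}
```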

  17. Analysis of Block Size • Larger blocks exploit spatial locality • Therefore, the miss rate is lowered • What happens as block size continues to get larger? • Cache size is fixed • Number of cache blocks is reduced • Contention for block space in the cache increases • Miss rate goes up

  18. Measuring Cache Performance • CPU time = (CPU execution cycles + memory-stall cycles) × Clock cycle time • Read-stall cycles = Reads/Program × Read miss rate × Read miss penalty • Writes are harder to model because of write-buffer stalls

  19. Measuring Cache Performance: Simplifications • Assume a write-through scheme • Assume a well-designed system so that write-buffer stalls can be ignored • Read and write miss penalties are the same • Memory-stall clock cycles = Instructions/Program × Misses/Instruction × Miss penalty

  20. Example • Assume • Instruction cache miss rate: 2% • Data cache miss rate: 4% • CPI (cycles per instruction): 2 • Miss penalty: 100 clock cycles • SPECint2000 benchmark: 36% load & store instructions • Clock cycle time: 1 ns (1 × 10^-9 sec) • Find the CPU execution time • How much faster would a perfect cache be?

  21. Solution • Instruction miss cycles: I × 2% × 100 = 2I • Data miss cycles: I × 36% × 4% × 100 = 1.44I • Memory-stall cycles: 2I + 1.44I = 3.44I • CPI (with memory stalls): 2 + 3.44 = 5.44 • CPU execution time = 5.44I × 1 ns • A perfect cache would be 5.44/2 = 2.72 times faster
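
The same arithmetic, reproduced in a few lines of C as a check on the numbers.

```c
#include <stdio.h>

/* Stall cycles per instruction from instruction and data misses, then
 * CPI with stalls and the speedup a perfect cache would give.          */
int main(void) {
    double i_miss = 0.02 * 100;          /* 2% i-cache misses × 100 cc  */
    double d_miss = 0.36 * 0.04 * 100;   /* 36% mem ops × 4% × 100 cc   */
    double stalls = i_miss + d_miss;     /* 2 + 1.44 = 3.44 per instr.  */
    double cpi    = 2.0 + stalls;        /* base CPI 2 -> 5.44          */
    printf("CPI with stalls: %.2f, perfect-cache speedup: %.2fx\n",
           cpi, cpi / 2.0);
    return 0;
}
```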

  22. Types of Cache Mappings • Direct mapped • Each block in only one place • (block number) mod (# cache blocks) • Set Associative • Each block can be mapped to n places in cache • (block number) mod (# sets in cache) • Fully Associative • Block can map anywhere in cache
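
A minimal 2-way set-associative lookup (the sizes and the simple replacement policy are illustrative assumptions): the set is (block number) mod (# sets), and every way in the set must be searched. Blocks 4 and 8 would conflict in a 4-block direct-mapped cache, but coexist here.

```c
#include <stdio.h>
#include <stdint.h>

#define SETS 4
#define WAYS 2   /* 2-way set associative: 2 candidate slots per block */

static int      valid[SETS][WAYS];
static uint32_t tagv[SETS][WAYS];

/* Look up block number 'b'; on a miss, fill the first free way (a real
 * cache would use LRU or similar to pick a victim).                    */
static int lookup(uint32_t b) {
    uint32_t set = b % SETS, tag = b / SETS;
    for (int w = 0; w < WAYS; w++)               /* search all ways     */
        if (valid[set][w] && tagv[set][w] == tag) return 1;  /* hit     */
    for (int w = 0; w < WAYS; w++)
        if (!valid[set][w]) {                    /* fill a free way     */
            valid[set][w] = 1; tagv[set][w] = tag;
            return 0;
        }
    valid[set][0] = 1; tagv[set][0] = tag;       /* else replace way 0  */
    return 0;
}

int main(void) {
    uint32_t refs[] = {4, 8, 4, 8};  /* both map to set 0, yet coexist  */
    for (int i = 0; i < 4; i++)
        printf("block %u: %s\n", refs[i], lookup(refs[i]) ? "hit" : "miss");
    return 0;
}
```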

  23. Types of Cache Mappings (2)

  24. Set Associative Cache Mappings

  25. Locating the Block in Cache

  26. Virtual Memory: The Concept • Use main memory as a cache for magnetic disk • Motivations • Safe and efficient sharing of main memory • Remove the programmer's burden of managing a small, limited amount of memory • Invented in the 1960s

  27. Virtual Memory: Sharing Memory • Programs must be well behaved • Main concept: each program has its own address space • Virtual memory: address in program → physical address • Protection • Protects one process from another • A set of mechanisms for ensuring this

  28. Virtual Memory: Small Memories • Without virtual memory, the programmer must make a large program fit in a small memory space • The historical solution was overlays • Even with our relatively large main memories, we would still have to do this today without virtual memory!

  29. Virtual Memory: Terminology • Page – virtual memory's term for a cache block • Page fault – virtual memory's term for a cache miss • Virtual address • An address within the program's space • Translated to a physical address by a combination of hardware and software • This process is called address translation
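
A toy translation sketch, assuming 4 KiB pages and a flat array standing in for the page table (a real table would also carry protection, dirty, and reference bits).

```c
#include <stdio.h>
#include <stdint.h>

#define PAGE_BITS 12            /* illustrative 4 KiB pages              */
#define NPAGES    16            /* tiny virtual address space, 16 pages  */

static int pte[NPAGES];         /* pte[vpn] = physical page, -1 = absent */

int main(void) {
    for (int i = 0; i < NPAGES; i++) pte[i] = -1;
    pte[1] = 3;                 /* map virtual page 1 to physical page 3 */

    uint32_t va  = (1u << PAGE_BITS) + 0x234;    /* page 1, offset 0x234 */
    uint32_t vpn = va >> PAGE_BITS;              /* virtual page number  */
    uint32_t off = va & ((1u << PAGE_BITS) - 1); /* offset is unchanged  */

    if (pte[vpn] < 0) { puts("page fault"); return 1; }
    uint32_t pa = ((uint32_t)pte[vpn] << PAGE_BITS) | off;
    printf("va 0x%x -> pa 0x%x\n", va, pa);
    return 0;
}
```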

  30. Virtual Memory: Conceptual Diagram

  31. Virtual Memory: Address Translation

  32. Virtual Memory: Page Faults • Main memory is approx. 100,000 times faster than disk • A page fault is enormously costly • Key decisions: • Page size – 4 KB to 16 KB • Anything that reduces page faults is attractive • Page faults can be handled in software • Only write-back can be used

  33. Virtual Memory: Placing & Finding a Page • Each process has its own page table

  34. Virtual Memory: Swap Space

  35. Virtual Memory: Translation-Lookaside Buffer
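
The TLB is a small, usually fully associative cache of recent address translations. A minimal sketch (the size, round-robin replacement, and the walk() page-table stand-in are all assumptions for illustration):

```c
#include <stdio.h>
#include <stdint.h>

#define TLB_SIZE 4   /* tiny fully associative TLB of recent translations */

static struct { int valid; uint32_t vpn, ppn; } tlb[TLB_SIZE];
static int next_slot;                     /* naive round-robin replacement */

/* Stand-in for the page-table walk; a real one indexes the page table.  */
static uint32_t walk(uint32_t vpn) { return vpn + 100; }

/* Translate a virtual page number, consulting the TLB before the page
 * table; on a TLB miss the new translation is cached for reuse.         */
static uint32_t translate(uint32_t vpn) {
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            printf("vpn %u: TLB hit  -> ppn %u\n", vpn, tlb[i].ppn);
            return tlb[i].ppn;
        }
    uint32_t ppn = walk(vpn);             /* miss: go to the page table   */
    tlb[next_slot].valid = 1;
    tlb[next_slot].vpn   = vpn;
    tlb[next_slot].ppn   = ppn;
    next_slot = (next_slot + 1) % TLB_SIZE;
    printf("vpn %u: TLB miss -> ppn %u (page table)\n", vpn, ppn);
    return ppn;
}

int main(void) {
    translate(5); translate(5); translate(9); translate(5);
    return 0;
}
```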
