
The Memory Hierarchy II CPSC 321


Presentation Transcript


  1. The Memory Hierarchy II CPSC 321 Andreas Klappenecker

  2. Today’s Menu • Cache • Virtual Memory • Translation Lookaside Buffer

  3. Caches Why? How?

  4. Memory • Users want large and fast memories • SRAM is too expensive for main memory • DRAM is too slow for many purposes • Compromise: build a memory hierarchy

  5. Locality • If an item is referenced, then • it will be referenced again soon (temporal locality) • nearby data will be referenced soon (spatial locality) • Why does code have locality?
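Loops are the main reason code has locality. As a minimal sketch (the array and its size are illustrative, not from the slides), the loop below reads consecutive words, exhibiting spatial locality, while reusing sum and i on every iteration, exhibiting temporal locality:

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];            /* illustrative array */
    int sum = 0;
    for (int i = 0; i < N; i++) {
        sum += a[i];            /* spatial locality: a[i] and a[i+1]
                                   are neighboring words */
    }                           /* temporal locality: sum and i are
                                   touched on every iteration */
    printf("sum = %d\n", sum);
    return 0;
}
```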

  6. Direct Mapped Cache • Mapping: address modulo the number of blocks in the cache, x -> x mod B

  7. Direct Mapped Cache • Cache with 1024 = 2^10 words • The index is determined by address mod 1024 • The tag from the cache is compared against the upper portion of the address • If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss • What kind of locality are we taking advantage of?
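A sketch of this lookup in C, under the slide's parameters (1024 one-word blocks and byte addresses, so a 2-bit byte offset, 10-bit index, and 20-bit tag); the structure and names are illustrative, not a hardware description:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_BLOCKS 1024                 /* 2^10 one-word blocks */

struct line {
    bool     valid;
    uint32_t tag;                       /* upper 20 bits of the address */
    uint32_t data;                      /* one 32-bit word */
};

static struct line cache[NUM_BLOCKS];

/* Returns true on a cache hit and stores the word through *word. */
bool lookup(uint32_t addr, uint32_t *word) {
    uint32_t index = (addr >> 2) & (NUM_BLOCKS - 1); /* (addr/4) mod 1024 */
    uint32_t tag   = addr >> 12;                     /* remaining 20 bits */

    if (cache[index].valid && cache[index].tag == tag) {
        *word = cache[index].data;      /* hit: tag matches, entry valid */
        return true;
    }
    return false;                       /* miss: fetch block from memory */
}
```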

  8. Direct Mapped Cache • Taking advantage of spatial locality:

  9. Bits in a Cache How many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address? • 16 KB = 4K words = 2^12 words • Block size of 4 words => 2^10 blocks • Each block has 4 x 32 = 128 bits of data + tag + valid bit • tag + valid bit = (32 – 10 – 2 – 2) + 1 = 19 • Total cache size = 2^10 * (128 + 19) = 2^10 * 147 bits • Therefore, 147 Kbits (about 18.4 KB) are needed for the cache
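The same arithmetic reproduced as a small C program (variable names are illustrative):

```c
#include <stdio.h>

int main(void) {
    int blocks      = (16 * 1024 / 4) / 4; /* 2^12 words / 4 = 2^10 blocks */
    int index_bits  = 10;                  /* log2(1024 blocks) */
    int word_offset = 2;                   /* log2(4 words per block) */
    int byte_offset = 2;                   /* log2(4 bytes per word) */

    int data_bits = 4 * 32;                                      /* 128 */
    int tag_bits  = 32 - index_bits - word_offset - byte_offset; /* 18 */
    long total    = (long)blocks * (data_bits + tag_bits + 1);   /* +1 valid */

    printf("%ld bits = %ld Kbits\n", total, total / 1024);  /* 147 Kbits */
    return 0;
}
```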

  10. Cache Block Mapping • Direct mapped cache • a block goes in exactly one place in the cache • Fully associative cache • a block can go anywhere in the cache • it is difficult to find a block • parallel comparison to speed-up search • Set associative cache • a block can go to a (small) number of places • compromise between the two extremes above

  11. Cache Types

  12. Set Associative Caches • Each block maps to a unique set • The block can be placed into any element of that set • Position is given by (block number) modulo (# of sets in the cache) • If each set contains n elements, then the cache is called n-way set associative
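A sketch of an n-way lookup, with an illustrative set count and associativity; hardware compares all ways in parallel, where this sketch loops:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 256     /* illustrative */
#define WAYS     4       /* 4-way set associative */

struct way { bool valid; uint32_t tag; };
static struct way cache[NUM_SETS][WAYS];

bool sa_lookup(uint32_t block_number) {
    uint32_t set = block_number % NUM_SETS;  /* (block number) mod (# sets) */
    uint32_t tag = block_number / NUM_SETS;
    for (int w = 0; w < WAYS; w++)           /* hardware checks all ways
                                                in parallel */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;                     /* hit in way w */
    return false;                            /* miss: pick a victim way */
}
```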

  13. Summary: Where can a Block be Placed?

  14. Summary: How is a Block Found?

  15. Virtual Memory

  16. Virtual Memory • Processor generates virtual addresses • Memory is accessed using physical addresses • Virtual and physical memory are broken into blocks of memory, called pages • A virtual page may be • absent from main memory, residing on the disk • or mapped to a physical page

  17. Virtual Memory • Main memory can act as a cache for the secondary storage (disk) • Virtual address generated by processor (left) • Address translation (middle) • Physical addresses (right) • Advantages: • illusion of having more physical memory • program relocation • protection

  18. Pages: virtual memory blocks • Page faults: if data is not in memory, retrieve it from disk • huge miss penalty, thus pages should be fairly large (e.g., 4 KB) • reducing page faults is important (LRU is worth the price) • can handle the faults in software instead of hardware • using write-through takes too long, so we use write-back • Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 1 GB; virtual memory <= 4 GB
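With 4 KB pages, a 32-bit virtual address splits into a 20-bit virtual page number and a 12-bit page offset. A minimal sketch of that split (the address value is made up):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12                    /* 2^12 = 4 KB pages */
#define PAGE_SIZE (1u << PAGE_BITS)

int main(void) {
    uint32_t vaddr  = 0x12345678;               /* made-up virtual address */
    uint32_t vpn    = vaddr >> PAGE_BITS;       /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* offset within the page */
    printf("vpn = 0x%x, offset = 0x%x\n", vpn, offset);
    return 0;
}
```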

  19. Page Faults • Incredibly high penalty for a page fault • Reduce the number of page faults by optimizing page placement • Use fully associative placement • full search of pages is impractical • pages are located by a full table that indexes the memory, called the page table • the page table resides within the memory

  20. Page Tables The page table maps each page to either a page in main memory or to a page stored on disk
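A sketch of that mapping, assuming a flat table under the parameters from the example above (32-bit virtual addresses and 4 KB pages, so 2^20 entries); the field names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_VPAGES (1u << 20)   /* 32-bit VA / 4 KB pages = 2^20 entries */

struct pte {
    bool     valid;             /* is the page resident in main memory? */
    uint32_t ppn;               /* physical page number, if valid */
    uint32_t disk_block;        /* location on disk, if not resident */
};

static struct pte page_table[NUM_VPAGES];

/* Returns false to signal a page fault that the OS must service. */
bool translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> 12;
    uint32_t offset = vaddr & 0xFFF;
    if (!page_table[vpn].valid)
        return false;           /* page fault: bring page in from disk */
    *paddr = (page_table[vpn].ppn << 12) | offset;
    return true;
}
```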

  21. Page Tables

  22. Making Memory Access Fast • Page tables slow us down • Memory access will take at least twice as long • access page table in memory • access page • What can we do? Memory access is local => use a cache that keeps track of recently used address translations, called a translation lookaside buffer (TLB)

  23. Making Address Translation Fast A cache for address translations: translation lookaside buffer

  24. Translation Lookaside Buffer • Some typical values for a TLB • TLB size: 32-4096 entries • Block size: 1-2 page table entries (4-8 bytes each) • Hit time: 0.5-1 clock cycle • Miss penalty: 10-30 clock cycles • Miss rate: 0.01%-1%
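A sketch of how a TLB short-circuits translation, assuming a small fully associative TLB and the 4 KB pages from earlier; the entry count, names, stub page-table walk, and replacement choice are illustrative simplifications:

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 64   /* within the 32-4096 range quoted above */

struct tlb_entry { bool valid; uint32_t vpn; uint32_t ppn; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* Stand-in for the in-memory page table lookup of the earlier slides. */
static uint32_t page_table_walk(uint32_t vpn) { return vpn; /* stub */ }

uint32_t translate(uint32_t vaddr) {
    uint32_t vpn    = vaddr >> 12;
    uint32_t offset = vaddr & 0xFFF;
    for (int i = 0; i < TLB_ENTRIES; i++)        /* associative search */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].ppn << 12) | offset;  /* TLB hit: no extra
                                                    memory access */
    uint32_t ppn = page_table_walk(vpn);         /* TLB miss: consult the
                                                    page table in memory */
    tlb[0] = (struct tlb_entry){ true, vpn, ppn }; /* naive replacement;
                                                      real TLBs use LRU
                                                      or random */
    return (ppn << 12) | offset;
}
```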

  25. TLBs and Caches

  26. More Modern Systems • Very complicated memory systems:

  27. Some Issues • Processor speeds continue to increase very fast, much faster than either DRAM or disk access times • Design challenge: dealing with this growing disparity • Trends: • synchronous SRAMs (provide a burst of data) • redesign DRAM chips to provide higher bandwidth or processing • restructure code to increase locality • use prefetching (make cache visible to ISA)
