
10/18: Lecture Topics



  1. 10/18: Lecture Topics • Using spatial locality • memory blocks • Write-back vs. write-through • Types of cache misses • Cache performance • Cache tradeoffs • Cache summary

  2. Locality • Temporal locality: the principle that data being accessed now will probably be accessed again soon • Useful data tends to continue to be useful • Spatial locality: the principle that data near the data being accessed now will probably be needed soon • If data item n is useful now, then it’s likely that data item n+1 will be useful soon

  3. Memory Access Patterns • Memory accesses don’t look like this • random accesses • Memory accesses do look like this • hot variables • step through arrays

  4. Locality • Last time, we improved memory performance by taking advantage of temporal locality • When a word in memory was accessed, we loaded it into the cache • This does nothing for spatial locality

  5. Possible Spatial Locality Solution • Store one word per cache line • When memory word N is accessed, load word N, word N+1 and N+2 … into the cache • This is called prefetching • What’s a drawback? Example: What if we access the word at address 1000100?

  6. Memory Blocks • Divide memory into blocks • If any word in a block is accessed, then load an entire block into the cache • Example with a 16-word (64-byte) block size: Block 0 = 0x00000000–0x0000003F, Block 1 = 0x00000040–0x0000007F, Block 2 = 0x00000080–0x000000BF • (Figure: cache line for a 16-word block size)
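
To make the mapping concrete: with 64-byte (16-word) blocks, the block number is just the address with its six low offset bits dropped. A minimal sketch (the function name is ours, not the lecture's):

BLOCK_SIZE = 64                        # 16 words x 4 bytes, matching the example blocks above

def block_number(addr):
    return addr // BLOCK_SIZE          # e.g. 0x00000047 -> block 1 (0x40-0x7F)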

  7. Address Tags Revisited • A cache block size > 1 word requires the address to be divided differently • Instead of a byte offset into a word, we need a byte offset into the block • Assuming we had 10-bit addresses, and 4 words in a block…
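
As a rough worked breakdown for that case: 4 words per block at 4 bytes per word is a 16-byte block, so the byte offset into the block takes log2(16) = 4 bits; the remaining 10 − 4 = 6 bits split into index and tag according to how many lines the cache has (the slide's figure with the exact split is not reproduced in this transcript).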

  8. Cache Diagram • Cache with a block size of 4 words What does the cache look like after accesses to these 10 bit addresses? 1000010010, 1100110011

  9. Cache Lookup • 32-bit addresses, 64 KB direct-mapped cache, 4 words/block • Reference address: 10000110 11101010 10000101 11101010 • Lookup: index into the cache; if the valid bit is on and the tags match, it is a cache hit: select the word and return the data; otherwise it is a cache miss: access memory • (Figure: lookup flowchart)
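
A minimal sketch of this lookup path in code, using the slide's parameters (64 KB direct-mapped cache, 4-word/16-byte blocks, 32-bit byte addresses, which works out to a 4-bit offset, 12-bit index, and 16-bit tag); the data structure is illustrative, not the lecture's figure:

OFFSET_BITS, INDEX_BITS = 4, 12        # 16-byte blocks, 4096 lines

lines = [{"valid": False, "tag": None, "data": [0] * 4} for _ in range(1 << INDEX_BITS)]

def lookup(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)             # byte offset within the block
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    line = lines[index]
    if line["valid"] and line["tag"] == tag:              # valid bit on and tags match?
        return line["data"][offset >> 2]                  # hit: select the word
    return None                                           # miss: access memory instead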

  10. Cache Example • Suppose the L1 cache is 32KB, 2-way set associative and has 8 words per block, how do we partition the 32-bit address? • How many bits for the block offset? • How many bits for the index? • How many bits for the tag?
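
Working it out (assuming 4-byte words): 8 words per block is 32 bytes, so the block (byte) offset needs 5 bits; 32 KB spread across 2 ways of 32-byte blocks gives 32768 / (2 × 32) = 512 sets, so the index needs 9 bits; the tag gets the remaining 32 − 9 − 5 = 18 bits.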

  11. The Effects of Block Size • Big blocks are good • Fewer first time misses • Exploits spatial locality • Small blocks are good • Don’t evict so much other data when bringing in a new entry • More likely that all items in the block will turn out to be useful • How do you choose a block size?

  12. Reads vs. Writes • Caching is essentially making a copy of the data • When you read, the copies still match when you’re done • When you write, the results must eventually propagate to both copies • Especially at the lowest level, which is in some sense the permanent copy

  13. Write-Through Caches • Write the update to the cache and the memory immediately • Advantages: • The cache and the memory are always consistent • Evicting a cache line is cheap because no data needs to be written back • Easier to implement • Disadvantages?

  14. Write-Back Caches • Write the update to the cache only. Write to the memory only when the cache block is evicted. • Advantages: • Writes go at cache speed rather than memory speed. • Some writes never need to be written to the memory. • When a whole block is written back, can use high bandwidth transfer. • Disadvantages?

  15. Dirty bit • When evicting a block from a write-back cache, we could • always write the block back to memory • write it back only if we changed it • Caches use a “dirty bit” to mark if a line was changed • the dirty bit is 0 when the block is loaded • it is set to 1 if the block is modified • when the line is evicted, it is written back only if the dirty bit is 1
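
A small sketch of that rule for a single write-back cache line (the structure and names are illustrative, not the lecture's design):

memory = {}                                          # backing store: block tag -> data
line = {"valid": False, "dirty": False, "tag": None, "data": 0}

def access(tag, value=None):
    if not (line["valid"] and line["tag"] == tag):   # miss: evict the current block first
        if line["valid"] and line["dirty"]:
            memory[line["tag"]] = line["data"]       # write back only if it was modified
        line.update(valid=True, dirty=False, tag=tag, data=memory.get(tag, 0))
    if value is not None:                            # store: update the cached copy only
        line["data"] = value
        line["dirty"] = True                         # dirty bit set on modification
    return line["data"]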

  16. Dirty Bit Example • Use the dirty bit to determine when evicted cache lines need to be written back to memory • Access sequence: $r2 = Mem[10010000]; Mem[10010100] = 10; $r3 = Mem[11010100]; $r4 = Mem[11011000]; Mem[01010000] = 10 • Assume 8-bit addresses • Assume all memory words are initialized to 7

  17. i-Cache and d-Cache • There are usually two separate caches for instructions and data. Why? • Avoids structural hazards in pipelining • Together the two caches hold twice as much as either one, yet each still has the access time of a small cache • Allows both caches to operate in parallel, for twice the bandwidth

  18. Handling i-Cache Misses • 1. Stall the pipeline and send the address of the missed instruction to the memory • 2. Instruct memory to perform a read; wait for the access to complete • 3. Update the cache • 4. Restart the instruction, this time fetching it successfully from the cache • d-Cache misses are even easier, but still require a pipeline stall

  19. Cache Replacement • How do you decide which cache block to replace? • If the cache is direct-mapped, it’s easy • Otherwise, common strategies: • Random • Least Recently Used (LRU) • Other strategies are used at lower levels of the hierarchy. More on those later.

  20. LRU Replacement • Replace the block that hasn’t been used for the longest time. Reference stream: A B C D B D E B A C B C E D C B
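
A quick way to trace that stream is to simulate it; the sketch below assumes a fully associative cache with four blocks (the capacity is our assumption, since the slide's cache figure is not reproduced here):

from collections import OrderedDict

def simulate_lru(refs, capacity=4):
    cache, misses = OrderedDict(), 0
    for block in refs:
        if block in cache:
            cache.move_to_end(block)           # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)      # evict the least recently used block
            cache[block] = True
    return misses

print(simulate_lru("ABCDBDEBACBCEDCB"))        # reference stream from the slide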

  21. LRU Implementations • LRU is very difficult to implement for high degrees of associativity • 4-way approximation: • 1 bit to indicate least recently used pair • 1 bit per pair to indicate least recently used item in this pair • Much more complex approximations at lower levels of the hierarchy
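
The 4-way approximation above needs only three bits per set: one bit picks the least recently used pair, and one bit per pair picks the least recently used way within it. A sketch (field names are ours):

class PseudoLRU4:
    def __init__(self):
        self.lru_pair = 0            # 0 -> pair {0,1} holds the LRU way, 1 -> pair {2,3}
        self.lru_in_pair = [0, 0]    # LRU way within pair 0 and within pair 1

    def touch(self, way):            # record an access to 'way' (0..3)
        pair, within = way // 2, way % 2
        self.lru_pair = 1 - pair             # the other pair becomes the LRU side
        self.lru_in_pair[pair] = 1 - within  # the other way in this pair becomes LRU

    def victim(self):                # approximate LRU way to replace
        pair = self.lru_pair
        return 2 * pair + self.lru_in_pair[pair]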

  22. The Three C’s of Caches • Three reasons for cache misses: • Compulsory miss: item has never been in the cache • Capacity miss: item has been in the cache, but space was tight and it was forced out (occurs even with fully associative caches) • Conflict miss: item was in the cache, but the cache was not associative enough, so it was forced out (never occurs with fully associative caches)

  23. Eliminating Cache Misses • What cache parameters (cache size, block size, associativity) can you change to eliminate the following kinds of misses? • compulsory • capacity • conflict

  24. Multi-Level Caches • Use each level of the memory hierarchy as a cache over the next lowest level • Inserting level 2 between levels 1 and 3 allows: • level 1 to have a higher miss rate (so can be smaller and cheaper) • level 3 to have a larger access time (so can be slower and cheaper) • The new effective access time equation:
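
The equation itself is not reproduced in this transcript; the standard two-level form, which is presumably what the slide shows, is:

effective access time = hit_time_L1 + miss_rate_L1 × (hit_time_L2 + miss_rate_L2 × miss_penalty_L2)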

  25. Which cache system is better? • A 32 KB unified data and instruction cache • hit rate of 97% • versus a 16 KB data cache • hit rate of 92% • and a 16 KB instruction cache • hit rate of 98% • Assume • 20% of instructions are loads or stores
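
One way to compare them on misses (our working, assuming one instruction fetch plus 0.2 data accesses per instruction, i.e. about 1.2 memory accesses per instruction, and ignoring hit-time differences):

Unified: 1.2 × (1 − 0.97) = 0.036 misses per instruction
Split:   1.0 × (1 − 0.98) + 0.2 × (1 − 0.92) = 0.020 + 0.016 = 0.036 misses per instruction

With these numbers the miss counts come out about the same, so the difference turns on the split design's ability to serve an instruction fetch and a data access in parallel (see slide 17).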

  26. Cache Parameters and Tradeoffs • If you are designing a cache, what choices do you have and what are their tradeoffs?

  27. Cache Comparisons • (Table comparing typical L1 i-Cache, L1 d-Cache, and L2 unified cache parameters; not reproduced in this transcript)

  28. Summary: Classifying Caches • Where can a block be placed? • Direct mapped, Set/Fully associative • How is a block found? • Direct mapped: by index • Set associative: by index and search • Fully associative: by search • What happens on a write access? • Write-back or Write-through • Which block should be replaced? • Random • LRU (Least Recently Used)
