
Cache Organization




  1. Cache Organization

  2. Rehashing our terms • The Architectural view of memory is: • What the machine language sees • Memory is just a big array of storage • Breaking up the memory system into different pieces – cache, main memory (made up of DRAM) and Disk storage – is not architectural. • The machine language doesn’t know about it • The processor may not know about it • A new implementation may not break it up into the same pieces (or break it up at all). Caching needs to be Transparent!

  3. What’s in a Cache? • Cache memory can copy data from any part of main memory • What does it have to store? • The data • Where it came from (the address) • Terminology: • The TAG holds the memory address • The BLOCK holds the memory data

  4. Cache organization • A cache memory consists of multiple tag/block pairs • Searches can be done in parallel (within reason) • At most one tag will match • If there is a tag match, it is a cache HIT • If there is no tag match, it is a cache MISS • Our goal is to keep the data we think will be accessed in the near future in the cache

  5. Cache Terminology • If a block is found in the cache → “hit” • Otherwise → “miss” • Hit rate = (# hits) / (# requests made to the cache) • Miss rate = 1 – Hit rate • Hit time = time to access the cache to see if a block is present + time to get the block to the CPU • Miss time (aka miss penalty) = time to replace a block in cache with one from DRAM • Average cache access time = hit time + miss rate × miss penalty
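As a quick sanity check of these definitions, here is a small Python sketch. The function name and the cycle counts are illustrative only; the 1-cycle hit and 15-cycle miss penalty match the numbers used later in these slides.

```python
def average_access_time(hit_time, miss_penalty, hits, misses):
    """Average cache access time = hit time + miss rate * miss penalty."""
    miss_rate = misses / (hits + misses)
    return hit_time + miss_rate * miss_penalty

# 5 references with 2 hits and 3 misses, a 1-cycle cache,
# and a 15-cycle miss penalty:
print(average_access_time(1, 15, hits=2, misses=3))  # 1 + (3/5) * 15 = 10.0
```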

  6. Cache Misses • If a block is found in the cache → “hit” • Otherwise → “miss” • Every cache miss will get the data from memory and ALLOCATE a cache block to put the data in.

  7. A very simple memory system • Setup: a processor (registers R0–R3), a cache with 2 entries (each: a valid bit, a 4-bit tag, a 1-byte block), and a 16-byte memory holding the values 100, 110, 120, …, 250 at addresses 0–15 • Reference stream: Ld R1 ← M[1], Ld R2 ← M[5], Ld R3 ← M[1], Ld R3 ← M[7], Ld R2 ← M[7] • Both cache entries start invalid

  8. A very simple memory system • Ld R1 ← M[1] • Is it in the cache? No valid tags, so this is a cache miss • Allocate: address → tag, Mem[1] → block, mark valid

  9. A very simple memory system • After the miss, one entry holds {valid, tag 1, data 110} and R1 = 110 • Misses: 1, Hits: 0

  10. A very simple memory system • Ld R2 ← M[5] • Check tags: 5 ≠ 1, so this is a cache miss • Allocate the other entry: {valid, tag 5, data 150}

  11. A very simple memory system • The cache now holds {tag 1, data 110} (LRU) and {tag 5, data 150}; R2 = 150 • Misses: 2, Hits: 0

  12. A very simple memory system • Ld R3 ← M[1] • Check tags: 1 = 1 (HIT!) • The block data (110) is sent to R3

  13. A very simple memory system • After the hit, R3 = 110 and the {tag 5, data 150} entry becomes LRU • Misses: 2, Hits: 1

  14. A very simple memory system • Ld R3 ← M[7] • Check tags: 7 ≠ 1 and 7 ≠ 5, a miss, and both entries are already in use • Where does it go???

  15. Cache Misses • Every cache miss will get the data from memory and ALLOCATE a cache block to put the data in. • Which spot should be allocated? • What if the cache is full? • Kick someone else out? • Which one? Random? -- OK, but hard to grade test questions • Better than random? How?

  16. Picking the most likely addresses • What is the probability of accessing a given memory location? • With no information, it is just as likely as any other address • Q: Are programs random? • A: No! • They tend to use the same memory locations over and over. • We can use this to pick the most referenced locations to put into the cache

  17. Locality of Reference • A program does not access all of its data & code with equal probability • (not even close) • Principle of locality of reference: • Programs access a relatively small portion of their address space during any given window of time – applies to both instructions and data • Temporal locality: if an item was recently used, it will probably be used again soon • Spatial locality: if an item was recently referenced, nearby items will probably also be referenced soon
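Both kinds of locality show up in even the simplest loop. A hypothetical illustration (Python hides real memory addresses, so treat the list indices as addresses):

```python
data = list(range(1000))

total = 0                   # temporal locality: 'total' and 'i' are touched
for i in range(len(data)):  # on every single iteration
    total += data[i]        # spatial locality: data[i] sits next to data[i + 1]

print(total)  # 499500
```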

  18. Using locality in the cache • How does this affect our cache design? • Temporal locality says any new miss data should have priority to be placed into the cache • It is the most recent reference location • Temporal locality also says that the least recently referenced (or least recently used – LRU ) cache line should be evicted to make room for the new line. • Because the re-access probability falls over time as a cache line isn’t referenced, the LRU line is least likely to be re-referenced. • If we haven’t used it in a while, throw it away
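One common way to sketch the LRU bookkeeping described above is an ordered map: every access moves an entry to the "most recent" end, and eviction pops from the "least recent" end. A minimal sketch; the class and its names are hypothetical helpers, not from the slides.

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()          # tag -> data, oldest first

    def access(self, tag, fetch):
        if tag in self.entries:               # hit: refresh recency
            self.entries.move_to_end(tag)
            return self.entries[tag], True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # miss on a full cache: evict LRU
        self.entries[tag] = fetch(tag)        # allocate the new block
        return self.entries[tag], False

memory = {a: 100 + 10 * a for a in range(16)}  # the slides' M[a] = 100 + 10a
cache = LRUCache(2)
print(cache.access(1, memory.get))  # (110, False): the first access misses
```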

  19. A very simple memory system • Recap before Ld R3 ← M[7]: the cache holds {tag 1, data 110} and {tag 5, data 150} (LRU); R1 = 110, R2 = 150, R3 = 110 • Misses: 2, Hits: 1

  20. A very simple memory system • Ld R3 ← M[7] • Check tags: 7 ≠ 5 and 7 ≠ 1 (MISS!) • Evict the LRU entry (tag 5) and allocate {valid, tag 7, data 170} in its place

  21. A very simple memory system • The cache now holds {tag 1, data 110} (LRU) and {tag 7, data 170}; R3 = 170 • Misses: 3, Hits: 1

  22. A very simple memory system • Ld R2 ← M[7] • Check tags: 7 ≠ 1 and 7 = 7 (HIT!) • R2 = 170

  23. A very simple memory system • Final state: R1 = 110, R2 = 170, R3 = 170 • Misses: 3, Hits: 2
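The whole walk-through on slides 7–23 can be replayed in a few lines. A sketch assuming a fully associative, two-entry, LRU cache with 1-byte blocks, and the M[a] = 100 + 10a memory contents visible in the diagrams:

```python
memory = {a: 100 + 10 * a for a in range(16)}  # M[1]=110, M[5]=150, M[7]=170
trace = [1, 5, 1, 7, 7]                        # the five loads from the slides

cache = []          # list of (tag, data); front = least recently used
hits = misses = 0
for addr in trace:
    entry = next((e for e in cache if e[0] == addr), None)
    if entry is not None:                 # tag match: HIT
        hits += 1
        cache.remove(entry)
    else:                                 # no match: MISS, allocate
        misses += 1
        if len(cache) == 2:
            cache.pop(0)                  # evict the LRU entry
        entry = (addr, memory[addr])
    cache.append(entry)                   # most recently used at the back

print(f"Misses: {misses}  Hits: {hits}")  # Misses: 3  Hits: 2
```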

  24. Calculating Average Access Latency • Avg latency = cache latency + memory latency × miss rate • Avg latency = 1 cycle + 15 cycles × (3/5) = 10 cycles per reference • To improve average latency: • Improve memory access latency, or • Improve cache access latency, or • Improve cache hit rate

  25. Calculating Cost • How much does a cache cost? • Calculate storage requirements • 2 bytes of SRAM • Calculate overhead to support access (tags) • 2 4-bit tags = 1 byte of SRAM • The cost of the tags is often forgotten for caches, but this cost drives the design of real caches • What is the cost if a 32 bit address is used?

  26. How can we reduce the overhead? • Have a small address • Impractical, and caches are supposed to be non-architectural • Cache bigger units than bytes • Each block has a single tag, and blocks can be whatever size we choose.

  27. Spatial Locality • Spatial locality in a program says that if we reference a memory location (e.g., 1000), we are more likely to reference a location near it (e.g. 1001) than some random location. • Think: Arrays, or variables on the stack • Approach: When we access an address, also cache data around it

  28. Overhead? • This helps reduce the tag (address) overhead on two fronts: • We only need to store one tag for each block • This increases the ratio of data : tag • If we align our blocks, we can reduce how much of the address we need to store

  29. Tag size vs Block size • If our address space is 32-bit…
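The comparison this slide sets up can be reconstructed from arithmetic alone. Assuming byte-addressed memory, aligned blocks, and a fully associative cache (so the stored tag is the full address minus the offset bits), a 2^b-byte block needs 32 − b tag bits:

```python
ADDRESS_BITS = 32

for block_bytes in (1, 2, 4, 16, 64):
    offset_bits = block_bytes.bit_length() - 1   # log2 of a power of two
    tag_bits = ADDRESS_BITS - offset_bits
    overhead = tag_bits / (8 * block_bytes)      # tag bits per data bit
    print(f"{block_bytes:2}-byte block: {tag_bits}-bit tag, "
          f"{overhead:.0%} tag overhead")
```

With 1-byte blocks the tag costs four times the data it guards (400% overhead); at 64-byte blocks the 26-bit tag is about 5% of the block.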

  30. Block size for caches • New setup: 2 cache entries, each with a valid bit, a 3-bit tag, and a 2-byte block • The 16-byte memory is now viewed as 8 two-byte blocks (block # 0–7); the tag is the block number and the low address bit is the block offset • Reference stream: Ld R1 ← M[1], Ld R2 ← M[5], Ld R3 ← M[1], Ld R3 ← M[4], Ld R2 ← M[0]

  31. Block size for caches • Initially both entries are invalid (valid bits = 0)

  32. Block size for caches • Ld R1 ← M[1] • Address 0001 splits into tag 000 and block offset 1 • Miss: allocate {valid, tag 0, data 100, 110}, i.e. all of memory block 0; offset 1 selects 110 for R1 • Misses: 1, Hits: 0

  33. Block size for caches • State: one entry holds block 0 {100, 110}; R1 = 110 • Misses: 1, Hits: 0

  34. Block size for caches • Ld R2 ← M[5] • Address 0101 splits into tag 010 (block 2) and offset 1 • Tags don't match, a miss: allocate {valid, tag 2, data 140, 150}; offset 1 selects 150 for R2 • Misses: 2, Hits: 0

  35. Block size for caches • State: the cache holds block 0 {100, 110} (LRU) and block 2 {140, 150}; R2 = 150 • Misses: 2, Hits: 0

  36. Block size for caches • Ld R3 ← M[1] • Tag 0 matches (HIT!): offset 1 selects 110 for R3 • Misses: 2, Hits: 1

  37. Block size for caches • State: R3 = 110; the block-2 entry is now LRU • Misses: 2, Hits: 1

  38. Block size for caches • Ld R3 ← M[4] • Address 0100 splits into tag 010 (block 2) and offset 0 • Tag 2 matches (HIT!) even though address 4 was never referenced before: the whole block was brought in with M[5] • Offset 0 selects 140 for R3 • Misses: 2, Hits: 2

  39. Block size for caches • State: R3 = 140; the block-0 entry is now LRU • Misses: 2, Hits: 2

  40. Block size for caches • Ld R2 ← M[0] • Address 0000 splits into tag 000 (block 0) and offset 0 • Tag 0 matches (HIT!): offset 0 selects 100 for R2 • Final score: Misses: 2, Hits: 3
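The second walk-through (slides 30–40) can be replayed the same way. A sketch assuming a two-entry LRU cache with 2-byte aligned blocks over the same M[a] = 100 + 10a memory:

```python
memory = {a: 100 + 10 * a for a in range(16)}  # M[0]=100 .. M[15]=250
trace = [1, 5, 1, 4, 0]                        # the five loads from the slides
BLOCK = 2                                      # 2-byte blocks

cache = []          # list of (tag, block_data); front = least recently used
hits = misses = 0
for addr in trace:
    tag, offset = divmod(addr, BLOCK)          # block number / block offset
    entry = next((e for e in cache if e[0] == tag), None)
    if entry is not None:                      # tag match: HIT
        hits += 1
        cache.remove(entry)
    else:                                      # MISS: allocate, evicting LRU
        misses += 1
        if len(cache) == 2:
            cache.pop(0)
        base = tag * BLOCK                     # fetch the whole aligned block
        entry = (tag, [memory[base], memory[base + 1]])
    cache.append(entry)                        # most recently used at the back
    value = entry[1][offset]                   # offset picks the byte

print(f"Misses: {misses}  Hits: {hits}")  # Misses: 2  Hits: 3
```

Fetching whole blocks turns the references to M[4] and M[0] into hits: their neighbors M[5] and M[1] had already pulled those blocks in.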
