Presentation Transcript


  1. COEN 180 Main Memory Cache Architectures

  2. Basics • Processor cycles at 2-3 nsec, main memory access ~ 200-400 nsec. • Use faster memory as a cache. • Actually: • Instruction queue (part of processor) • L1 cache ~ 32 KB on processor chip • L2 cache ~ 1 MB • (L3 cache ~ 4 MB) • Caches on DRAM and processor chips

  3. Basics • Cache algorithms need to be implemented in hardware and be simple. • Hit rate needs to be high. • MM is byte addressable, but only whole words are moved; typically, the last two bits of the address are not even transmitted.

  4. Basics: Cache versus Main Memory. [Figure: main memory contains all the data and is slow; the cache contains some of the data (e.g., the word ABAB FFFF) and is fast. In a direct mapped cache, an item can be in only one cache line.]

  5. Basics Average Access Time = (Hit Rate)*(Access to Cache) + (Miss Rate)*(Access to MM)

  6. Basics Average Access Time = (Cache Access Time) + (Miss Rate)*(Miss Penalty) Miss Penalty: Additional time it takes to get data from slower memory.

  7. Your Turn • Assume cache access for an on-chip cache is 5 nsec. • Assume main memory access is 145 nsec. • Access time for a miss is 5 nsec + 145 nsec. • Calculate the access times for a hit rate of • 50% → 77.50 nsec • 90% → 19.50 nsec • 99% → 6.45 nsec • Conclusion: hit rates need to be high.
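
  The numbers above follow directly from the formula on slide 6. A minimal C sketch of the calculation, using the figures from this exercise (5 nsec cache, 145 nsec main memory):

      #include <stdio.h>

      int main(void) {
          double cache_ns    = 5.0;                /* on-chip cache access time */
          double mm_ns       = 145.0;              /* main memory access time   */
          double hit_rates[] = { 0.50, 0.90, 0.99 };

          for (int i = 0; i < 3; i++) {
              /* Average = cache access time + miss rate * miss penalty */
              double avg = cache_ns + (1.0 - hit_rates[i]) * mm_ns;
              printf("hit rate %.0f%%: %.2f nsec\n", 100.0 * hit_rates[i], avg);
          }
          return 0;                                /* 77.50, 19.50, 6.45 nsec   */
      }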

  8. Basics: Do not confuse memory and data. [Figure: an address selects contents in main memory.] An address can be 32 b and an MM word can be 32 b, but addresses and contents are ontologically different.

  9. Virtual Memory • Gives the impression of much more memory than there really is. • Pages are moved between main memory and disk. • Handled by the MMU (Memory Management Unit). • Distinguish between virtual addresses and physical addresses.

  10. Virtual Memory • Memory is broken up into pages. • Typical page size: 4 KB. • Pages reside on disk, and some also in main memory. • The “working set” of pages should be in main memory.

  11. Virtual Memory • Virtual addresses are 32 b long. • Or 64 b for a 64 b processor. • Physical addresses are often smaller. • They correspond to the maximum MM size. • Since most MM is byte addressable but data is moved in words (4 B, 8 B, ...), the least significant bits of the physical address are not put on the address bus.
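
  As a concrete illustration (not from the slides): with 4 KB pages, the low 12 bits of a virtual address are the byte offset within the page, and the remaining bits form the virtual page number that the MMU translates. A minimal C sketch:

      #include <stdio.h>
      #include <stdint.h>

      #define PAGE_BITS 12                         /* 4 KB pages = 12 b offset */

      int main(void) {
          uint32_t vaddr  = 0x6CEE5AF2;            /* an arbitrary 32 b virtual address */
          uint32_t vpn    = vaddr >> PAGE_BITS;    /* virtual page number       */
          uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);   /* byte in page  */

          printf("page 0x%05X, offset 0x%03X\n", vpn, offset);
          return 0;                                /* page 0x6CEE5, offset 0xAF2 */
      }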

  12. Virtual Memory • Can use caches at the virtual memory level • Using virtual memory addresses. • Or at the physical memory level. • Using physical memory addresses. • If nothing is said, assume virtual memory addresses.

  13. Virtual Memory • Main Memory acts as a cache for data on the disk. • We consider this cache organization later.

  14. L3-MM Cache: Cache Replacement • Which items should be in the cache? • Algorithm needs to be very fast and simple. • Need to implement algorithm in hardware. • Basic scheme: • If MM item is read or written, put it in the cache. • Throw out an old item. • Determining the old item is the subject of many different caching algorithms.

  15. Direct Mapped Cache • Each item in MM can be located in only one position in cache. • MM addresses typically refer to a single byte (an ASCII text character) • For historical reasons • Hard to change • Physically, only complete words are accessed.

  16. Direct Mapped Cache • Each word / block can be in only one location in the cache. • Many words / blocks compete for the same cache location. • Cache replacement algorithm: • Very simple: Throw out the previous occupant.

  17. Direct Mapped Cache: Addressing. Addressing mechanism for MM contents: address 0110 1100 1110 1110 0101 1010 1111 0010 means: go to byte 10 (= 2 dec) in the word at 0110 1100 1110 1110 0101 1010 1111 00.

  18. Direct Mapped Cache: Addressing. The address is split into • Tag (highest-order bits); • Index; • Byte-in-word address. • Typically the two least significant bits for 4 B per word.

  19. Direct Mapped Cache: Addressing • The tag serves to identify the data item in the cache. • The index is the address of the word in the cache.

  20. Direct Mapped Cache. Address 0110 1100 1110 1110 0101 1010 1111 0010 = tag 0110 1100 1110 1110 01, index 01 1010 1111 00, byte 10. [Figure: the cache line at index 01 1010 1111 00 holds tag 0110 1100 1110 1110 01 and data 1010 1010 1010 1010 1010 1010 1010 1010.] The index tells us where the contents of MM[0110 1100 1110 1110 0101 1010 1111 0010] are stored in the cache: namely at cache line (location) 01 1010 1111 00.

  21. Direct Mapped Cache. Same address and cache line as above. The tag allows us to check which data item is stored in the cache: namely the word with address 0110 1100 1110 1110 0101 1010 1111 00**.
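
  The tag/index/byte split is plain bit slicing. A minimal C sketch, assuming the geometry of this example (32 b addresses, a 12 b index, and a 2 b byte-in-word field); the address is the one from the slides, written in hex:

      #include <stdio.h>
      #include <stdint.h>

      #define BYTE_BITS  2                         /* 4 B per word     */
      #define INDEX_BITS 12                        /* 2^12 cache lines */

      int main(void) {
          /* 0110 1100 1110 1110 0101 1010 1111 0010 */
          uint32_t addr  = 0x6CEE5AF2;

          uint32_t byte  =  addr & ((1u << BYTE_BITS) - 1);
          uint32_t index = (addr >> BYTE_BITS) & ((1u << INDEX_BITS) - 1);
          uint32_t tag   =  addr >> (BYTE_BITS + INDEX_BITS);

          printf("tag 0x%05X, index 0x%03X, byte %u\n", tag, index, byte);
          return 0;   /* tag 0x1B3B9 (0110 1100 1110 1110 01),
                         index 0x6BC (01 1010 1111 00), byte 2 (10) */
      }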

  22. Direct Mapped Cache • Contents of main memory address • 0110 1100 1110 1110 0101 1010 1111 0010 • and of main memory address • 1100 1111 0000 1110 0101 1010 1111 0010 • would be stored at the same location in cache. • To know which one is stored there, keep the tag with the contents.

  23. Direct Mapped Cache. Address 0110 1100 1110 1110 0101 1010 1111 0010. Tag: identifies the item in the cache. Index: where the item is in the cache (cache line / address). [Figure: the cache line holds tag 0110 1100 1110 1110 01 : 0101 0101 0101 0101 0101 0101 0101 0101, the contents of MM[...].]

  24. Direct Mapped Cache: Your Turn • Why are the most significant bits of the address the tag and not the index? • Answer: • A whole region of main memory can be loaded into the cache. • This makes sense because of spatial locality. • Neighboring MM addresses have different indices but the same tag. • Otherwise, neighboring MM addresses would have different tags and the same index, that is, they would compete for the same cache location.

  25. Direct Mapped Cache: Example • Memory words are 2 B long. • Memory contains 128 B and is byte addressable • 128 addressable items. • 2^7 addresses. • Memory addresses are 7 b long. • Cache contains 4 words. • 2 b cache address = index. • Memory address split into • 4 b tag • 2 b index • 1 b byte-in-word address.

  26. Direct Mapped Cache: Example. Main Memory contents (one byte per MM address; two consecutive bytes form one 2 B word):
     000 0000: 00
     000 0001: FF
     000 0010: 10
     000 0011: 00
     000 0100: 11
     000 0101: 00
     000 0110: FF
     000 0111: FF
     000 1000: DA
     000 1001: AB
     ...       ...

  27. Direct Mapped Cache: Example • Assume item MM[000 0010] is in the cache. • The cache contains the complete MM line. • Split the address 000 0010 into tag, index, and byte-in-word address: • Tag is 0000 • Index is 01 • Byte in word is 0.

  28. Direct Mapped Cache: Example
     Line  TAG   Data 0  Data 1
     00    0001  DA      AB
     01    0000  10      00
     10    0000  11      00
     11    0000  FF      FF
  Access MM location 000 1001: tag 0001, index 00, byte in word 1. Go to cache line 00. Check whether the stored tag is 0001. It is: Hit, the data is in the cache. Go to Data 1: AB.

  29. Direct Mapped Cache: Example
     Line  TAG   Data 0  Data 1
     00    0001  DA      AB
     01    0000  10      00
     10    0000  11      00
     11    0000  FF      FF
  Access MM location 000 0001: tag 0000, index 00, byte in word 1. Go to cache line 00. Check whether the stored tag is 0000. It is 0001: Miss, the data is not in the cache.
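
  The two look-ups can be replayed in a few lines of C. The cache contents are the ones from the example table; the function and variable names are made up for illustration:

      #include <stdio.h>
      #include <stdint.h>

      struct line { uint8_t tag; uint8_t data[2]; };  /* 4 b tag, 2 B of data */

      /* Cache lines 00, 01, 10, 11 as in the example. */
      static struct line cache[4] = {
          { 0x1, { 0xDA, 0xAB } },
          { 0x0, { 0x10, 0x00 } },
          { 0x0, { 0x11, 0x00 } },
          { 0x0, { 0xFF, 0xFF } },
      };

      static void lookup(uint8_t addr) {              /* 7 b MM address   */
          uint8_t byte  =  addr       & 0x1;          /* 1 b byte in word */
          uint8_t index = (addr >> 1) & 0x3;          /* 2 b index        */
          uint8_t tag   =  addr >> 3;                 /* 4 b tag          */

          if (cache[index].tag == tag)
              printf("MM[0x%02X]: hit, data %02X\n", addr, cache[index].data[byte]);
          else
              printf("MM[0x%02X]: miss\n", addr);
      }

      int main(void) {
          lookup(0x09);   /* 000 1001: line 00 holds tag 0001 -> hit, AB    */
          lookup(0x01);   /* 000 0001: line 00 holds 0001, not 0000 -> miss */
          return 0;
      }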

  30. Direct Mapped Cache • Cache lines contain • Contents • Tags • Some metadata (as we will see). • Distinguish between cache capacity and cache storage needs. • Difference is cache storage overhead.

  31. Direct Mapped Cache: Example
     Line  TAG   Data 0  Data 1
     00    0001  DA      AB
     01    0000  10      00
     10    0000  11      00
     11    0000  FF      FF
  The cache contains 10 B, of which 8 B are data (the other 2 B are the four 4 b tags).

  32. Direct Mapped Cache • Vocabulary Repetition • Byte addressable: one address per byte. • Cache lines: items stored at a single cache address (index).

  33. Direct Mapped Cache: Your Turn • Main Memory • Contains 512 MB. • 8 B in a word. • Byte addressable. • What is the length of an address? • Solution • 512M = 2^9 · 2^20 = 2^29 addressable items. • Addresses are 29 bits long.

  34. Direct Mapped Cache: Your Turn • Main Memory • Contains 512 MB. • 8 B in a word. • Byte addressable. • Cache • Contains 1 MB. • A cache line consists of 1 word (of 8 B) plus tags. • How many cache lines? • How long are indices? • Solution • 1M / 8 = 128K = 2^17 cache lines. • Indices are 17 b long.

  35. Direct Mapped Cache: Your Turn • MM address is 29 bits. • Index is 17 bits. • How is a MM address split up? • Solution: • 8 B in a word ⇒ 3 bits for “Byte in Word”. • 17 bits for index. • 9 bits for tag. • Result: Tag: 9 b, Index: 17 b, Byte in Word: 3 b.

  36. Direct Mapped Cache: Your Turn • What is the cache storage overhead? • Solution • Overhead per cache line is the tag. • A cache line contains 8 B of contents. • A cache line contains a 9 b tag. • (Plus possibly other metadata, which we ignore.) • Overhead is 9 b / 8 B = 9/64 = 14.0625 %.
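
  The whole exercise (address length, number of cache lines, index, tag, and overhead) can be checked with a short C sketch; the sizes are the ones given above, the helper name log2i is made up:

      #include <stdio.h>

      /* Exact log2 of a power of two. */
      static int log2i(long n) {
          int b = 0;
          while (n > 1) { n >>= 1; b++; }
          return b;
      }

      int main(void) {
          long mm_bytes    = 512L << 20;           /* 512 MB main memory */
          long cache_bytes = 1L << 20;             /* 1 MB cache         */
          long word_bytes  = 8;                    /* 8 B per word/line  */

          int addr_bits  = log2i(mm_bytes);                    /* 29 */
          int byte_bits  = log2i(word_bytes);                  /*  3 */
          int index_bits = log2i(cache_bytes / word_bytes);    /* 17 */
          int tag_bits   = addr_bits - index_bits - byte_bits; /*  9 */

          printf("address %d b = tag %d b + index %d b + byte %d b\n",
                 addr_bits, tag_bits, index_bits, byte_bits);
          printf("overhead: %d b tag per %ld B line = %.4f%%\n",
                 tag_bits, word_bytes, 100.0 * tag_bits / (8.0 * word_bytes));
          return 0;   /* 29 = 9 + 17 + 3; overhead 14.0625% */
      }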

  37. Reads from a Cache • Input is the MM location. • Calculate the cache line from the MM location. • This is where the item might be. • Use the tag to check whether this is the correct item.

  38. Reads from a Cache. Assume the memory address is 0110 1100 1110 1110 0101 1010 1111 0010. Go to cache line 01 1010 1111 00. [Figure: that line holds 0110 1100 1110 1110 01 : 0101 0101 0101 0101 0101 0101 0101 0101.] Check whether the tags are the same. They are: the result of the look-up is 0101 0101 0101 0101 0101 0101 0101 0101. This is a HIT.

  39. Reads from a Cache. Assume the memory address is 0110 1111 1110 1110 0101 1010 1111 0010. Go to cache line 01 1010 1111 00. [Figure: that line holds 0110 1100 1110 1110 01 : 0101 0101 0101 0101 0101 0101 0101 0101.] Check whether the tags are the same. They are not: the requested word is not in the cache. A MISS.

  40. Reads from a Cache • Miss Penalty: • The added time necessary to find the word. • In this case: go to main memory and satisfy the request from there.

  41. Reads from a Cache • Miss: • Go to the cache. • Find out that it is a miss. • Go to main memory. [Figure: the processor asks the cache “Give me MM[address]”; on a miss the request goes to main memory, which answers “Here is the result, sorry it took so long.”] • Miss Penalty: the time to go to main memory.

  42. Your Turn • Why don’t we send requests to both cache and MM at the same time? • This way, cache access and MM access would overlap. • There would be less miss penalty. • Answer: Main memory would be overwhelmed with all these read requests.

  43. Cache Writes • A “write to cache” operation updates • the contents, • the tag field (if the written item replaces another item instead of updating an existing value), • and the metadata.

  44. Write Policies • Write-through: • A write is performed to both the cache and to the main memory. • Copy-back: • A write is performed only to cache. If an item is replaced by another item, then the item to be replaced is copied back to main memory.

  45. Write-Through. [Figure: the processor updates cache and main memory simultaneously.]

  46. Write-Through • Cache and MM always contain the same contents. • When an item is replaced by another one in the cache, there is no need for additional synchronization. • Write traffic goes to both cache and MM. [Figure: Processor → Cache → MM.]

  47. Cache Operations: Write-Through. READ: • Extract Tag and Index from Address. • Go to the cache line given by Index. • See whether the Tag matches the tag stored there. • If they match: Hit. Satisfy read from cache. • If they do not match: Miss. Satisfy read from main memory. Also store item in cache. (Replacement policy, as we will see.)

  48. Cache Operations: Write-Through. WRITE: • Extract Tag and Index from address. • Write datum in cache at location given by Index. • Set the tag field in the cache line to Tag. • Write datum in main memory.
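
  A minimal C sketch of both write-through operations. The geometry (one 4 B word per line, 12 b index) is illustrative, valid bits are omitted, and the small mm array merely stands in for slow main memory:

      #include <stdint.h>

      #define INDEX_BITS 12
      #define LINES      (1u << INDEX_BITS)

      struct line { uint32_t tag; uint32_t data; };
      static struct line cache[LINES];             /* valid bits omitted */

      /* Stand-in main memory (addresses assumed < 256 KB here). */
      static uint32_t mm[1u << 16];
      static uint32_t mm_read(uint32_t a)              { return mm[a >> 2]; }
      static void     mm_write(uint32_t a, uint32_t w) { mm[a >> 2] = w; }

      uint32_t cache_read(uint32_t addr) {
          uint32_t index = (addr >> 2) & (LINES - 1);
          uint32_t tag   =  addr >> (2 + INDEX_BITS);

          if (cache[index].tag == tag)             /* hit: satisfy from cache */
              return cache[index].data;
          cache[index].tag  = tag;                 /* miss: fetch from MM and */
          cache[index].data = mm_read(addr);       /* replace the occupant    */
          return cache[index].data;
      }

      void cache_write(uint32_t addr, uint32_t word) {
          uint32_t index = (addr >> 2) & (LINES - 1);

          cache[index].tag  = addr >> (2 + INDEX_BITS); /* set the tag        */
          cache[index].data = word;                     /* write the cache... */
          mm_write(addr, word);                         /* ...and main memory */
      }

  Because both copies are updated on every write, a replaced line can simply be overwritten.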

  49. Copy Back • Writes go only to cache. • MM and cache are not in the same state after a write. • The value in the cache needs to be saved if the item in the cache is replaced.

  50. Copy Back • Write item MM[0000 0000 0000 1111 1111 1111 1111 1111]. • This puts MM[0000 0000 0000 1111 1111 1111 1111 1111] into the cache. • Read item MM[1111 0000 0000 1111 1111 1111 1111 1111]. • Both items have the same index ⇒ the latter item overwrites the first. • The first item has not been written back to MM: it is dirty. • Need to write the contents of MM[0000 0000 0000 1111 1111 1111 1111 1111] to MM before putting MM[1111 0000 0000 1111 1111 1111 1111 1111] into the cache.
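
  A matching copy-back sketch, with the same illustrative geometry as the write-through example; the difference is the dirty bit and the copy-back step before a dirty line is replaced:

      #include <stdint.h>
      #include <stdbool.h>

      #define INDEX_BITS 12
      #define LINES      (1u << INDEX_BITS)

      struct line { uint32_t tag; uint32_t data; bool dirty; };
      static struct line cache[LINES];

      static uint32_t mm[1u << 16];                /* stand-in main memory */
      static void mm_write(uint32_t a, uint32_t w) { mm[a >> 2] = w; }

      /* Before a line is reused for a different tag, a dirty occupant
         must be written back to main memory. */
      static void copy_back_if_dirty(uint32_t index) {
          if (cache[index].dirty) {
              uint32_t old_addr = (cache[index].tag << (2 + INDEX_BITS))
                                | (index << 2);    /* reconstruct old address */
              mm_write(old_addr, cache[index].data);
              cache[index].dirty = false;
          }
      }

      void cache_write(uint32_t addr, uint32_t word) {
          uint32_t index = (addr >> 2) & (LINES - 1);
          uint32_t tag   =  addr >> (2 + INDEX_BITS);

          if (cache[index].tag != tag)             /* replacing another item? */
              copy_back_if_dirty(index);
          cache[index].tag   = tag;
          cache[index].data  = word;               /* write only to the cache */
          cache[index].dirty = true;               /* MM is now out of date   */
      }

  Note that the write itself never touches main memory; MM is only updated when a dirty line is evicted.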
