COEN 180

Presentation Transcript


  1. COEN 180 Main Memory Cache Architectures

  2. Basics • Processors run about 100–300 times faster than main memory can be accessed. • Use faster memory as a cache. • Actually: • Instruction queue (part of the processor) • L1 cache ~ 32 KB on the processor chip • L2 cache ~ 1 MB • (L3 cache ~ 4 MB) • Caches on DRAM and processor chips

  3. Basics • Cache algorithms need to be implemented in hardware and must be simple. • MM is byte addressable, but only words are moved; typically, the last two bits of the address are not even transmitted. • The hit rate needs to be high.

  4. Basics: Cache versus Main Memory [Figure: a small, fast cache next to a large, slow main memory; the value ABAB FFFF appears in both.] • Cache: contains some of the data; fast. • Main Memory: contains all of the data; slow. • Direct mapped cache: each item can be in only one cache line.

  5. Basics Average Access Time = (Hit Rate) × (Cache Access Time) + (Miss Rate) × (Miss Penalty), where the miss penalty is the cache lookup plus the access to MM.

  6. Your Turn • Assume cache access for an on-chip cache is 5 nsec. • Assume main memory access is 145 nsec. • The access time for a miss is 5 nsec + 145 nsec = 150 nsec. • Calculate the access times for a hit rate of • 50% → 77.50 nsec • 90% → 19.50 nsec • 99% → 6.45 nsec • Conclusion: hit rates need to be high.
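
A quick way to verify these numbers: the sketch below (Python; the 5 nsec and 145 nsec figures are the ones from the slide) evaluates the average-access-time formula for each hit rate.

```python
# Hits cost the cache access alone; misses cost the cache lookup
# plus the main memory access (5 + 145 = 150 nsec).
CACHE_NS = 5
MM_NS = 145

def avg_access_time(hit_rate):
    miss_penalty = CACHE_NS + MM_NS
    return hit_rate * CACHE_NS + (1 - hit_rate) * miss_penalty

for hr in (0.50, 0.90, 0.99):
    print(f"hit rate {hr:.0%}: {avg_access_time(hr):.2f} nsec")
# hit rate 50%: 77.50 nsec
# hit rate 90%: 19.50 nsec
# hit rate 99%: 6.45 nsec
```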

  7. Basics [Figure: main memory drawn as a table of (address, contents) pairs.] • An address can be 32b and an MM word can be 32b, but addresses and contents are ontologically different.

  8. Virtual Memory • Gives the impression of much more memory than there really is. • Pages memory pages into and out of disk. • Handled by the MMU (Memory Management Unit). • Distinguish between virtual addresses and physical addresses.

  9. Virtual Memory • Virtual addresses are 32b long (or 64b for a 64b processor). • Physical addresses are smaller; they correspond to the maximum MM size. • Since MM is byte addressable but data is moved in words (4B, 8B, ...), the least significant bits of the physical address are not part of the address bus.

  10. Virtual Memory • Can use caches at the virtual memory level • Using virtual memory addresses. • Or at the physical memory level. • Using physical memory addresses. • If nothing is said, assume virtual memory addresses.

  11. Cache Replacement • Which items should be in the cache? • Algorithm needs to be very fast and simple. • Need to implement algorithm in hardware. • Simplest scheme: • If MM item is read or written, put it in the cache. • Throw out old item.

  12. Direct Mapped Cache • Each item in MM can be located in only one position in cache. • MM addresses typically refer to a single byte (an ASCII text character) • For historical reasons • Hard to change • Physically, only complete words are accessed.

  13. Direct Mapped Cache Address 0110 1100 1110 1110 0101 1010 1111 0010 • The last two bits, 10 (= 2 dec), select the byte within the word. • The word is identified by the remaining bits: 0110 1100 1110 1110 0101 1010 1111 00

  14. Direct Mapped Cache The address is split into: • Tag (highest-order bits); • Index; • Byte-in-word address: typically the two least significant bits for 4B per word.

  15. Direct Mapped Cache • Tag serves to identify the data item in the cache. • Index is the address of the word in the cache.

  16. Direct Mapped Cache Address 0110 1100 1110 1110 0101 1010 1111 0010 (tag | index | byte-in-word). The index tells us where the contents of MM[0110 1100 1110 1110 0101 1010 1111 0010] are stored in the cache: namely at cache line (location) 01 1010 1111 00.

  17. Direct Mapped Cache • Contents of main memory address • 0110 1100 1110 1110 0101 1010 1111 0010 • and of main memory address • 1100 1111 0000 1110 0101 1010 1111 0010 • would be stored at the same location in cache. • To know which one is stored there, keep the tag with the contents.

  18. Direct Mapped Cache Address 0110 1100 1110 1110 0101 1010 1111 0010 • Tag: identifies the item in the cache. • Index: where the item is in the cache (cache line / address). • Cache line: 0110 1100 1110 1110 01 : 0101 0101 0101 0101 0101 0101 0101 0101 (tag : contents of MM[...]).
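
A minimal sketch of this split in Python, assuming the field widths of this example (18b tag, 12b index, 2b byte-in-word); only shifts and masks are needed, which is why the scheme is cheap to implement in hardware.

```python
BYTE_BITS = 2    # 4B words -> 2 byte-in-word bits
INDEX_BITS = 12  # example layout from the slides

def split_address(addr):
    byte = addr & ((1 << BYTE_BITS) - 1)
    index = (addr >> BYTE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_BITS + INDEX_BITS)
    return tag, index, byte

tag, index, byte = split_address(0b0110_1100_1110_1110_0101_1010_1111_0010)
print(f"{tag:018b} | {index:012b} | {byte:02b}")
# 011011001110111001 | 011010111100 | 10
```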

  19. Direct Mapped Cache: Your Turn • Why are the most significant bits of the address the tag and not the index? • Answer: • A whole region of main memory can be loaded into the cache. • This makes sense because of spatial locality: neighboring MM addresses have different indices but the same tag. • Otherwise, neighboring MM addresses would have different tags and the same index, that is, they would compete for the same cache location.

  20. Direct Mapped Cache: Example • Memory words are 2B long. • Memory contains 128B and is byte addressable • 128 addressable items. • 2^7 addresses. • Memory addresses are 7b long. • Cache contains 4 words. • 2b cache address = index. • A memory address splits into • 4b tag, • 2b index, • 1b byte-in-word address.

  21. Direct Mapped Cache: Example Main Memory contents (byte address : contents; words are 2B):
  000 0000: FF
  000 0001: FF
  000 0010: 00
  000 0011: 00
  000 0100: 00
  000 0101: 00
  000 0110: FF
  000 0111: FF
  000 1000: AF
  000 1001: AB
  ...

  22. Direct Mapped Cache: Example • Assume item MM[000 0010] is in the cache. • The cache contains the complete MM line. • Split the address 000 0010 into tag, index, and byte-in-word: • Tag is 0000 • Index is 01 • Byte in word is 0

  23. Direct Mapped Cache: Example • View of the cache (cache-line addresses are implicit; only the tag and data bytes are stored, so a cache line contains 2.5 B):

  Cache line | Tag  | Byte 0 | Byte 1
  00         | 0000 | FF     | FF
  01         | 0000 | 00     | 00
  10         | 0000 | 00     | 00
  11         | 1100 | AB     | CD
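
To make the example concrete, here is a toy lookup over exactly this cache state (the dict-of-tuples representation is just for illustration; a real cache is a hardware array):

```python
# index -> (tag, [byte 0, byte 1]); contents mirror the table above.
cache = {
    0b00: (0b0000, [0xFF, 0xFF]),
    0b01: (0b0000, [0x00, 0x00]),
    0b10: (0b0000, [0x00, 0x00]),
    0b11: (0b1100, [0xAB, 0xCD]),
}

def lookup(addr):
    """7-bit address: 4b tag, 2b index, 1b byte-in-word."""
    byte = addr & 0b1
    index = (addr >> 1) & 0b11
    tag = addr >> 3
    stored_tag, data = cache[index]
    if stored_tag == tag:
        return data[byte]   # hit
    return None             # miss: would go to main memory

print(lookup(0b000_0010))   # 0 (hit: tag 0000, index 01, byte 0)
```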

  24. Direct Mapped Cache • Cache lines contain • Contents • Tags • Some metadata (as we will see). • Distinguish between cache capacity and cache storage needs. • Difference is cache storage overhead.

  25. Direct Mapped Cache • Vocabulary • Byte addressable: one address per byte. • Cache lines: items stored at a single cache address (index).

  26. Direct Mapped Cache: Your Turn • Main Memory • Contains 512 MB. • 8 B in a word. • Byte addressable. • What is the length of an address? • Solution • 512M = 2^9 · 2^20 = 2^29 addressable items. • Addresses are 29 bits long.

  27. Direct Mapped Cache: Your Turn [Figure: cache drawn as a number of cache lines, each 8B + tag.] • Main Memory • Contains 512 MB. • 8 B in a word. • Byte addressable. • Cache • Contains 1 MB. • A cache line consists of 1 word (of 8B). • How many cache lines? • How long are the indices? • Solution: 1M / 8 = 128K = 2^17 cache lines. • Indices are 17b long.

  28. Direct Mapped Cache: Your Turn • The MM address is 29 bits. • The index is 17 bits. • How is an MM address split up? • Solution: • 8 B in a word ⇒ 3 bits for "Byte in Word". • 17 bits for the index. • 9 bits for the tag. • So the address is Tag: 9b | Index: 17b | Byte in Word: 3b.

  29. Direct Mapped Cache: Your Turn • What is the cache storage overhead? • Solution • The overhead per cache line is the tag. • A cache line contains 8B of contents and a 9b tag. • (Plus possibly metadata, which we ignore.) • Overhead is 9b / 8B = 9/64 = 14.0625%.
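
The arithmetic in this exercise is easy to script; a sketch with the numbers from the slides (512 MB MM, 1 MB cache, 8 B words):

```python
import math

MM_BYTES = 512 * 2**20            # 512 MB, byte addressable
CACHE_BYTES = 1 * 2**20           # 1 MB cache
WORD_BYTES = 8

addr_bits = int(math.log2(MM_BYTES))           # 29
byte_bits = int(math.log2(WORD_BYTES))         # 3
lines = CACHE_BYTES // WORD_BYTES              # 2**17 cache lines
index_bits = int(math.log2(lines))             # 17
tag_bits = addr_bits - index_bits - byte_bits  # 9

print(addr_bits, index_bits, tag_bits)         # 29 17 9
print(f"overhead: {tag_bits / (WORD_BYTES * 8):.4%}")  # 14.0625%
```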

  30. Reads from a Cache • Input is the MM location. • Calculate the cache line from the MM location. • This is where the item might be. • Use the tag to check whether it is the correct item.

  31. Reads from a Cache • Assume the memory address is 0110 1100 1110 1110 0101 1010 1111 0010. • Go to cache line 01 1010 1111 00. • Cache line: 0110 1100 1110 1110 01 : 0101 0101 0101 0101 0101 0101 0101 0101. • Check whether the tags are the same. They are: the result of the look-up is 0101 0101 0101 0101 0101 0101 0101 0101. • This is a HIT.

  32. Reads from a Cache • Assume the memory address is 0110 1111 1110 1110 0101 1010 1111 0010. • Go to cache line 01 1010 1111 00. • Cache line: 0110 1100 1110 1110 01 : 0101 0101 0101 0101 0101 0101 0101 0101. • Check whether the tags are the same. They are not: the requested word is not in the cache. • This is a MISS.
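
Both walkthroughs reduce to one tag comparison at the indexed line; a compact sketch (same assumed 18b/12b/2b layout as before):

```python
BYTE_BITS, INDEX_BITS = 2, 12

def tag_and_index(addr):
    tag = addr >> (BYTE_BITS + INDEX_BITS)
    index = (addr >> BYTE_BITS) & ((1 << INDEX_BITS) - 1)
    return tag, index

stored_tag = 0b0110_1100_1110_1110_01   # tag held at line 01 1010 1111 00

for addr in (0b0110_1100_1110_1110_0101_1010_1111_0010,   # the HIT case
             0b0110_1111_1110_1110_0101_1010_1111_0010):  # the MISS case
    tag, index = tag_and_index(addr)
    print(f"line {index:012b}:", "HIT" if tag == stored_tag else "MISS")
```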

  33. Reads from a Cache • Miss penalty: • The added time necessary to find the word. • In this case: go to main memory and satisfy the read from there.

  34. Reads from a Cache [Figure: processor asks the cache "Give me MM[address]"; the cache reports "Miss!"; main memory answers "Here is the result, sorry it took so long."] • Miss: • Go to the cache. • Find out that it is a miss. • Go to main memory. • Miss penalty: the time to go to main memory.

  35. Your Turn • Why don't we send requests to both the cache and MM at the same time? That way, cache access and MM access overlap, and there is less miss penalty. • Answer: Main memory would be overwhelmed with all these read requests.

  36. Cache Writes • A "write to cache" operation updates • the contents, • the tag field (if the written item replaces another item instead of updating an existing value), • the metadata.

  37. Write Policies • Write-through: • A write is performed to both the cache and to the main memory. • Copy-back: • A write is performed only to cache. If an item is replaced by another item, then the item to be replaced is copied back to main memory.

  38. Write-Through [Figure: on a write, the processor updates the cache and main memory simultaneously.]

  39. Write-Through • Cache and MM always contain the same contents. • When an item is replaced by another one in the cache, there is no need for additional synchronization. • Write traffic goes to both cache and MM.

  40. Cache Operations: Write-Through READ: • Extract Tag and Index from the address. • Go to the cache line given by Index. • See whether the Tag matches the tag stored there. • If they match: Hit. Satisfy the read from the cache. • If they do not match: Miss. Satisfy the read from main memory; also store the item in the cache (replacement policy, as we will see).

  41. Cache Operations: Write-Through WRITE: • Extract Tag and Index from the address. • Write the datum in the cache at the location given by Index. • Set the tag field in the cache line to Tag. • Write the datum in main memory.
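
A minimal sketch of these two operations, assuming a direct-mapped cache over a word-addressed backing store held in a Python list (class and method names are illustrative, not from the slides):

```python
class WriteThroughCache:
    """Direct-mapped, write-through: every write also goes to MM."""
    def __init__(self, num_lines, mm):
        self.lines = [None] * num_lines  # each entry: (tag, data) or None
        self.mm = mm                     # backing store, word addressed

    def _split(self, addr):
        return addr // len(self.lines), addr % len(self.lines)  # tag, index

    def read(self, addr):
        tag, index = self._split(addr)
        line = self.lines[index]
        if line is not None and line[0] == tag:
            return line[1]               # hit: satisfy from cache
        data = self.mm[addr]             # miss: go to main memory
        self.lines[index] = (tag, data)  # and install the item in the cache
        return data

    def write(self, addr, data):
        tag, index = self._split(addr)
        self.lines[index] = (tag, data)  # update the cache line and its tag
        self.mm[addr] = data             # write through to main memory

mm = [0] * 1024
c = WriteThroughCache(16, mm)
c.write(3, 0xABAB)                       # cache AND mm[3] are updated
assert mm[3] == 0xABAB and c.read(3) == 0xABAB
```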

  42. Copy Back • Writes only to cache. • MM and cache are not in the same state after a write. • Need to save values in the cache if item in cache is replaced.

  43. Copy Back • Write item MM[0000 0000 0000 1111 1111 1111 1111 1111]. • This puts item MM[0000 0000 0000 1111 1111 1111 1111 1111] into the cache. • Read item MM[1111 0000 0000 1111 1111 1111 1111 1111]. • Both items have the same index ⇒ the latter item overwrites the first. • The first item has not been updated in MM: it is dirty. • Need to write the contents of MM[0000 0000 0000 1111 1111 1111 1111 1111] back to MM before putting MM[1111 0000 0000 1111 1111 1111 1111 1111] into the cache.

  44. Copy Back • Read item MM[0000 0000 0000 1111 1111 1111 1111 1111]. • This puts item MM[0000 0000 0000 1111 1111 1111 1111 1111] into the cache. • Read item MM[1111 0000 0000 1111 1111 1111 1111 1111]. • Both items have the same index ⇒ the latter item overwrites the first. • The first item is unchanged from MM: it is clean. • There is no need to write it back to MM before putting MM[1111 0000 0000 1111 1111 1111 1111 1111] into the cache.

  45. Copy Back • Use a “dirty bit” to distinguish between clean and dirty items. • When an item is put into cache, set the dirty bit to 0. (Item is clean.) • When we write to an item in cache, set the dirty bit to 1. (Item is now dirty.) • When we replace item in cache, read the dirty bit. • If the dirty bit is 0, no synchronization is necessary. • If the dirty bit is 1, write the contents of the item into MM before replacing it.
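
The same sketch adapted to copy-back with a dirty bit (again illustrative names; the point is that writes touch only the cache, and synchronization happens at replacement time):

```python
class CopyBackCache:
    """Direct-mapped, copy-back: writes stay in the cache until replacement."""
    def __init__(self, num_lines, mm):
        self.lines = [None] * num_lines  # each entry: [tag, data, dirty]
        self.mm = mm                     # backing store, word addressed

    def _access(self, addr):
        tag, index = addr // len(self.lines), addr % len(self.lines)
        line = self.lines[index]
        if line is None or line[0] != tag:        # miss: replace the line
            if line is not None and line[2]:      # dirty? write back first
                self.mm[line[0] * len(self.lines) + index] = line[1]
            line = [tag, self.mm[addr], False]    # load a clean copy
            self.lines[index] = line
        return line

    def read(self, addr):
        return self._access(addr)[1]

    def write(self, addr, data):
        line = self._access(addr)
        line[1], line[2] = data, True    # update in cache only; mark dirty

mm = [0] * 1024
c = CopyBackCache(16, mm)
c.write(3, 0xABAB)      # MM is now stale: mm[3] is still 0
c.read(3 + 16)          # same index -> dirty line is written back first
assert mm[3] == 0xABAB
```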

  46. Copy Back vs. Write-Through • Copy Back • Less write traffic to MM. • Reads can be slower: we may need to synchronize cache and MM if the replaced item is dirty, so the miss penalty is higher. • 1b more overhead per cache line (the dirty bit). • Write-Through • Write traffic at MM can slow MM down. • Fast cache replacement ⇒ fast reads.

  47. Your Turn • Use virtual memory addresses. • Assume 32b = 4B words. • Memory is byte addressable. • What is the storage overhead for a cache with 2MB capacity?

  48. Your Turn • 2MB capacity, 4B per cache line • 2M/4 = 512K = 2^19 cache lines. • The index is 19b long. • Virtual memory • 32b addresses • 2b for "Byte in Word" • 19b index • 11b tag (11 + 19 + 2 = 32)

  49. Your Turn • Direct mapped cache • A cache line contains 32b of data. • The tag is 11b. • Copy-back • An additional dirty bit. • Cache overhead per line: • 12b / 32b = 37.5%.
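
Checking the arithmetic (the 32b addresses and 2b byte-in-word are from the exercise; the extra bit is the dirty bit):

```python
lines = (2 * 2**20) // 4               # 2**19 cache lines of 4B each
index_bits = lines.bit_length() - 1    # 19
tag_bits = 32 - index_bits - 2         # 11
overhead = (tag_bits + 1) / 32         # +1 for the dirty bit
print(index_bits, tag_bits, f"{overhead:.1%}")   # 19 11 37.5%
```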

  50. Cache Misses • Cache loading (when a process starts) • All data (incl. instructions) is in MM. • All accesses are cache misses. • Mandatory misses. • Contention / conflicts • The process needs two (or more) items that map to the same cache location. • Worst case: all accesses to these items are misses.
