
Lecture 19: Cache Replacement Policy, Line Size, Write Method, and Multi-level Caches






Presentation Transcript


  1. Lecture 19: Cache Replacement Policy, Line Size, Write Method, and Multi-level Caches
     Soon Tee Teoh, CS 147

  2. Cache Replacement Policy
     • For a direct-mapped cache, if a word is to be loaded into the cache, it goes into a fixed position and replaces whatever was there before.
     • For a set-associative or fully associative cache, a word can be loaded into more than one possible location. We need a cache replacement policy to decide which of those positions the new word goes to, that is, which existing entry to replace.

  3. Cache Replacement Policy
     3 common options (a sketch of all three follows this list):
     • Random replacement
       • Simple, but may suffer a high miss rate
     • First-in, first-out (FIFO)
       • Rationale: the oldest line is the one most likely no longer needed
       • Implementation: maintain a queue
     • Least recently used (LRU)
       • Rationale: the line that has gone unused for the longest time is the one most likely no longer needed
       • Implementation: usually costly to implement in hardware
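Below is a minimal C sketch of how a victim way could be chosen under each of the three policies within a single set. The structure, names, and 2-way parameters are illustrative assumptions, not the lecture's implementation.

```c
/* Victim selection for one 2-way cache set under the three policies.
   All names and the 2-way geometry are illustrative. */
#include <stdio.h>
#include <stdlib.h>

#define WAYS 2

typedef struct {
    int fifo_next;        /* next way to evict, round-robin (FIFO) */
    unsigned age[WAYS];   /* last-use timestamps for LRU           */
} SetState;

/* Random: pick any way. Simple, but ignores access history. */
int victim_random(void) {
    return rand() % WAYS;
}

/* FIFO: evict ways in the order they were filled (a circular queue). */
int victim_fifo(SetState *s) {
    int v = s->fifo_next;
    s->fifo_next = (s->fifo_next + 1) % WAYS;
    return v;
}

/* LRU: evict the way with the oldest (smallest) last-use timestamp. */
int victim_lru(const SetState *s) {
    int v = 0;
    for (int w = 1; w < WAYS; w++)
        if (s->age[w] < s->age[v])
            v = w;
    return v;
}

int main(void) {
    SetState s = { .fifo_next = 0, .age = { 5, 3 } }; /* way 1 used longest ago */
    printf("FIFO victim:   way %d\n", victim_fifo(&s));  /* way 0 */
    printf("LRU victim:    way %d\n", victim_lru(&s));   /* way 1 */
    printf("Random victim: way %d\n", victim_random());
    return 0;
}
```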

  4. Assumptions for the following examples
     • Assume 1 KB (that is, 1024 bytes) of main memory
     • Assume 4-byte words
     • Assume a 32-word cache
     • Assume memory is byte-addressed
     • Therefore the 1 KB memory needs a 10-bit address (2^10 = 1024)

  5. Line Size
     • Rather than fetching a single word from memory into the cache, fetch a whole block of l consecutive words, called a line.
     • This takes advantage of spatial locality.
     • The number of words in a line is a power of two. Example: 4 words in a line.

     [Figure: a 10-bit address, bit 9 down to bit 0, split into four fields from high to low: Tag | Index | Line (word within line) | Word (byte within word)]
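As a concrete illustration of these fields, the following C sketch decomposes a 10-bit address. The field widths assumed here (4-bit tag, 2-bit index, 2-bit line offset, 2-bit byte offset) are those of the 2-way example on the next slide.

```c
/* Decomposing a 10-bit byte address into tag / index / line-offset / byte-offset.
   Field widths follow the 2-way set-associative example on the next slide:
   4-byte words (2 byte-offset bits), 4-word lines (2 line-offset bits),
   32 words in 2 ways -> 4 sets (2 index bits), leaving 4 tag bits. */
#include <stdio.h>

int main(void) {
    unsigned addr = 0x2B4;                  /* 1010110100 in binary */
    unsigned byte_off =  addr       & 0x3;  /* bits 1-0: byte within word */
    unsigned word_off = (addr >> 2) & 0x3;  /* bits 3-2: word within line */
    unsigned index    = (addr >> 4) & 0x3;  /* bits 5-4: which set        */
    unsigned tag      = (addr >> 6) & 0xF;  /* bits 9-6                   */
    printf("tag=%X index=%u word=%u byte=%u\n", tag, index, word_off, byte_off);
    /* prints: tag=A index=3 word=1 byte=0 */
    return 0;
}
```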

  6. Line Size Example
     A 2-way set-associative cache with 4-word lines and capacity for 32 words. The address fields are: Tag (bits 9-6) | Index (bits 5-4) | Line (bits 3-2) | Word (bits 1-0).

     Cache contents (each index 00 through 11 holds two ways: Tag 1 / Line 1 and Tag 2 / Line 2):
       Index 11:  Tag 1 = 1111    Tag 2 = 0011

     Scenario: Suppose the CPU requests the word at memory address 1010110100.
     Step 1: The index is 11, so look at the two tags stored at index 11. Suppose Tag 1 at index 11 is 1111 and Tag 2 is 0011. Since neither tag matches the tag 1010 of the requested address, this is a miss: we load words 1010110000 through 1010111100 (the whole line) into the cache.

  7. Line Size Example (continued)
     Suppose the cache replacement policy determines that we should replace way 1 ("Set 1" on the slide). The line is then loaded into way 1 and its tag is changed:
       Index 11:  Tag 1 = 1010    Line 1 = words 1010110000, 1010110100, 1010111000, 1010111100
                  Tag 2 = 0011    Line 2 unchanged

  8. Line Size Example (continued)
     The CPU's memory request was for word 1010110100, so the second word of the line (line offset 01) is delivered to the CPU.
     If the CPU afterwards accesses any other word in this line, it will be found in the cache. (A code sketch of the whole lookup follows.)
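The following C sketch walks through the same lookup: a miss on 1010110100 that fills a way, followed by a hit on a neighboring word of the same line. The data arrays are omitted and all names are illustrative, not the lecture's design.

```c
/* A minimal model of the lookup in slides 6-8: a 2-way set-associative
   cache with 4 sets and 4-word lines (tags only; data omitted). */
#include <stdio.h>
#include <stdbool.h>

#define SETS 4
#define WAYS 2

typedef struct {
    bool     valid;
    unsigned tag;
} Way;

Way cache[SETS][WAYS];

/* Returns true on a hit; on a miss, fills 'way_to_replace' with the new tag. */
bool cache_access(unsigned addr, int way_to_replace) {
    unsigned index = (addr >> 4) & 0x3;
    unsigned tag   = (addr >> 6) & 0xF;
    for (int w = 0; w < WAYS; w++)
        if (cache[index][w].valid && cache[index][w].tag == tag)
            return true;                        /* hit */
    cache[index][way_to_replace].valid = true;  /* miss: load the whole line */
    cache[index][way_to_replace].tag   = tag;
    return false;
}

int main(void) {
    /* Initial contents at index 11 from slide 6 */
    cache[3][0] = (Way){ true, 0xF };  /* Tag 1 = 1111 */
    cache[3][1] = (Way){ true, 0x3 };  /* Tag 2 = 0011 */

    printf("%s\n", cache_access(0x2B4, 0) ? "hit" : "miss"); /* 1010110100: miss */
    printf("%s\n", cache_access(0x2B8, 0) ? "hit" : "miss"); /* 1010111000: hit  */
    return 0;
}
```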

  9. Cache Write Method
     • Suppose the CPU wants to write a word to memory.
     • If the memory unit uses the cache, it has several options.
     • 2 commonly used options:
       • Write-through
       • Write-back

  10. Write-Through
      • On a write request by the CPU, check whether the old data is in the cache.
      • If the old data is in the cache (write hit), write the new data into the cache and also into memory, replacing the old data in both.
      • If the old data is not in the cache (write miss), either:
        • Load the line into the cache and write the new data to both cache and memory (this method is called write-allocate), or
        • Just write the new data to memory without loading the line into the cache (this method is called no-write-allocate)
      • Advantage: keeps cache and memory consistent.
      • Disadvantage: must stall for a memory access on every memory write.
      • To reduce this problem, use a write buffer. When the CPU wants to write a word to memory, it puts the word into the write buffer and then continues executing the instructions following the write. Simultaneously, the write buffer drains the words to memory. (A sketch follows.)
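Here is a minimal C sketch of write-through with a write buffer, assuming the no-write-allocate variant. The buffer size, the names, and the stubbed-out cache are illustrative; a real write buffer drains to memory in hardware, in parallel with CPU execution.

```c
/* Write-through with a write buffer (no-write-allocate variant). */
#include <stdio.h>
#include <stdbool.h>

#define BUF_SIZE 4

static unsigned buf_addr[BUF_SIZE], buf_data[BUF_SIZE];
static int buf_head = 0, buf_count = 0;

/* Stub: update the cached copy if the address hits; returns true on hit. */
static bool cache_update_if_hit(unsigned addr, unsigned data) {
    (void)addr; (void)data;
    return false;
}

/* One slow-memory write, performed later by the buffer, not by the CPU. */
static void drain_one_to_memory(void) {
    if (buf_count > 0) {
        printf("memory[%03X] <- %u\n", buf_addr[buf_head], buf_data[buf_head]);
        buf_head = (buf_head + 1) % BUF_SIZE;
        buf_count--;
    }
}

static void write_through(unsigned addr, unsigned data) {
    cache_update_if_hit(addr, data);   /* keep the cache consistent on a hit */
    while (buf_count == BUF_SIZE)      /* buffer full: the CPU must stall    */
        drain_one_to_memory();
    int tail = (buf_head + buf_count) % BUF_SIZE;
    buf_addr[tail] = addr;             /* enqueue; CPU continues immediately */
    buf_data[tail] = data;
    buf_count++;
}

int main(void) {
    write_through(0x2B4, 42);
    drain_one_to_memory();             /* buffer drains in the background */
    return 0;
}
```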

  11. Write-Back
      • When the instruction requires a write to memory:
        • If there is a cache hit, write only to the cache and set the line's dirty bit. Later, when this line is chosen for replacement, write its data back to memory.
        • If there is a cache miss, either:
          • Load the line into the cache and write the new data into the cache only, marking it dirty (this method is called write-allocate), or
          • Just write the new data to memory without loading the line into the cache (this method is called no-write-allocate)
      • In the write-back method, we have a dirty bit associated with each cache entry. If the dirty bit is 1, we must write the line back to memory when the entry is replaced. If the dirty bit is 0, no write-back is needed, saving CPU stalls.
      • Disadvantage of the write-back approach: inconsistency; memory can contain stale data.
      • Note: write-allocate is usually used with write-back; no-write-allocate is usually used with write-through. (A dirty-bit sketch follows.)
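The following C sketch shows the dirty-bit mechanics for a small direct-mapped write-back cache with write-allocate. The cache geometry (8 single-word lines) and all names are assumptions made for illustration, not the lecture's design.

```c
/* Write-back with a dirty bit: writes hit only the cache; a dirty line is
   flushed to memory when it is evicted. Direct-mapped, 8 one-word lines. */
#include <stdio.h>
#include <stdbool.h>

#define LINES 8

typedef struct {
    bool     valid, dirty;
    unsigned tag;
    unsigned data;                     /* stand-in for a whole line */
} Line;

static Line cache[LINES];
static unsigned memory[1024 / 4];      /* 1 KB of 4-byte words */

static void write_word(unsigned addr, unsigned value) {
    unsigned index = (addr >> 2) & 0x7;        /* 3 index bits */
    unsigned tag   = addr >> 5;
    Line *l = &cache[index];

    if (!(l->valid && l->tag == tag)) {        /* write miss */
        if (l->valid && l->dirty)              /* old line is dirty: */
            memory[(l->tag << 3) | index] = l->data;  /* write it back first */
        l->valid = true;                       /* write-allocate: load line */
        l->tag   = tag;
        l->data  = memory[addr >> 2];
    }
    l->data  = value;    /* the write itself touches the cache only... */
    l->dirty = true;     /* ...and marks the line dirty                */
}

int main(void) {
    write_word(0x040, 7);   /* miss, allocate, dirty                         */
    write_word(0x040, 9);   /* hit: cache only, memory still stale           */
    write_word(0x140, 3);   /* same index, new tag: evicts and writes back 9 */
    printf("memory word at 0x040 = %u\n", memory[0x040 >> 2]);  /* prints 9 */
    return 0;
}
```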

  12. Cache Loading
      • In the beginning, the cache contains junk.
      • When the CPU makes a memory access, it compares the tag field of the memory address to the tag in the cache. Even if the tags match, we don't know whether the data is valid.
      • Therefore, we add a valid bit to each cache entry.
      • In the beginning, all the valid bits are set to 0.
      • Later, as data is loaded from memory into the cache, the valid bit for the cache entry is set to 1.
      • To check whether a word is in the cache, we must check both that the cache tag matches the address tag and that the valid bit is 1. (See the sketch below.)
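A small C sketch of the combined check, assuming the 4-bit tag field (address bits 9-6) from the earlier examples:

```c
/* A hit requires BOTH a set valid bit and a tag match. */
#include <stdio.h>
#include <stdbool.h>

typedef struct {
    bool     valid;   /* 0 at power-up: the entry holds junk */
    unsigned tag;
} Entry;

bool is_hit(const Entry *set, int ways, unsigned addr) {
    unsigned tag = (addr >> 6) & 0xF;   /* 4-bit tag, as in earlier slides */
    for (int w = 0; w < ways; w++)
        if (set[w].valid && set[w].tag == tag)   /* both conditions required */
            return true;
    return false;
}

int main(void) {
    /* Entry 0 has the matching tag 1010 but is still invalid, so this misses. */
    Entry set[2] = { { false, 0xA }, { true, 0x3 } };
    printf("%s\n", is_hit(set, 2, 0x2B4) ? "hit" : "miss");  /* prints: miss */
    return 0;
}
```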

  13. Instruction and Data Caches
      • We can either have a separate Instruction Cache and Data Cache, or one unified cache.
      • Advantage of separate caches: the Instruction Cache and Data Cache can be accessed simultaneously in the same cycle, as required by a pipelined datapath.
      • Advantage of a unified cache: more flexible allocation of capacity between instructions and data, so it may have a higher hit rate.

  14. Multiple-Level Caches
      • Add more levels to the memory hierarchy.
      • We can have two levels of cache:
        • The Level-1 cache (L1 cache, or internal cache) is smaller and faster, and lies on the processor next to the CPU.
        • The Level-2 cache (L2 cache, or external cache) is larger but slower, and lies outside the processor.
      • A memory access first goes to the L1 cache. If the L1 access misses, go to the L2 cache. If the L2 access misses, go to main memory. If main memory misses, go to virtual memory on the hard disk.
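One way to see the payoff of the hierarchy is the standard average memory access time (AMAT) formula, which is not on the slide; the sketch below uses made-up latencies and miss rates purely for illustration.

```c
/* Average memory access time (AMAT) for a two-level cache. All latencies
   and miss rates below are made-up illustrative numbers. */
#include <stdio.h>

int main(void) {
    double l1_hit = 1.0, l2_hit = 10.0, mem = 100.0;  /* cycles (assumed) */
    double l1_miss_rate = 0.05, l2_miss_rate = 0.20;  /* (assumed)        */

    /* Every access pays the L1 hit time; L1 misses add the L2 time, and
       L2 misses add the trip to main memory on top of that. */
    double amat = l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem);

    printf("AMAT = %.2f cycles\n", amat);  /* 1 + 0.05*(10 + 0.20*100) = 2.50 */
    return 0;
}
```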
