
15-213 Recitation 6 – 3/11/02

This recitation provides an overview of cache organization, including a cache review, cache replacement policies, and caches on multiprocessors. It also covers cache addressing, cache parameters, and examples of determining cache parameters. Additionally, it discusses replacement policies for set-associative caches and works through examples of the optimal, FIFO, LRU, and pseudo-LRU policies. The recitation concludes with an introduction to cache organization in multiprocessor systems.


Presentation Transcript


  1. 15-213 Recitation 6 – 3/11/02
  Outline
  • Cache organization review
  • Cache replacement policies
  • Cache on multiprocessors
  James Wilson, e-mail: wilson2@andrew.cmu.edu
  Office hours: Friday 1:30 – 3:00, Wean 52xx cluster
  Reminders
  • Lab 4 due 3/12, 11:59pm

  2. Cache organization (review)
  A cache is an array of S = 2^s sets. Each set contains E lines. Each line holds one block of B = 2^b bytes of data, along with t tag bits and 1 valid bit.
  [Figure: sets 0 … S–1, each containing E lines of the form | valid | tag | byte 0 … byte B–1 |]

  3. Addressing the cache (review)
  An m-bit address A is divided, from bit m–1 down to bit 0, into a <tag> (t bits), a <set index> (s bits), and a <line offset> (b bits). Address A is in the cache if its tag matches the tag of a valid line in the set selected by A's set index.
  [Figure: address fields above the selected set s, whose lines each hold | v | tag | byte 0 … byte B–1 |]

  4. Parameters of cache organization
  • B = 2^b = line size
  • E = associativity
  • S = 2^s = number of sets
  • Cache size C = B × E × S
  Other parameters:
  • s = number of set index bits
  • b = number of byte offset bits
  • t = number of tag bits
  • m = address size, with t + s + b = m

  5. Determining cache parameters
  Suppose we are told we have an 8 KB direct-mapped cache with 64-byte lines, and the word size is 32 bits. (A direct-mapped cache has an associativity of 1.) What are the values of t, s, and b?
  • B = 2^b = 64, so b = 6
  • B × E × S = C = 8192 (8 KB), and we know E = 1, so S = 2^s = C / B = 128 and s = 7
  • t = m – s – b = 32 – 7 – 6 = 19
  Address layout: tag = bits 31–13 (t = 19), set index = bits 12–6 (s = 7), offset = bits 5–0 (b = 6)

  6. One more example
  Suppose our cache is 16 KB, 4-way set associative, with 32-byte lines. (These are the parameters of the L1 cache of the P3 Xeon processors used by the fish machines.)
  • B = 2^b = 32, so b = 5
  • B × E × S = C = 16384 (16 KB), and E = 4, so S = 2^s = C / (E × B) = 128 and s = 7
  • t = m – s – b = 32 – 7 – 5 = 20
  Address layout: tag = bits 31–12 (t = 20), set index = bits 11–5 (s = 7), offset = bits 4–0 (b = 5)
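The two worked examples above follow the same recipe, which can be sketched as a small helper. (This function and its name are illustrative, not from the slides.)

```python
# Hypothetical helper: derive (t, s, b) from the cache size C, line size B,
# associativity E, and address width m, using t + s + b = m and C = B * E * S.
def cache_params(C, B, E, m):
    b = B.bit_length() - 1      # B = 2**b
    S = C // (B * E)            # number of sets
    s = S.bit_length() - 1      # S = 2**s
    t = m - s - b               # the remaining address bits are the tag
    return t, s, b

# Example 1: 8 KB direct-mapped cache, 64-byte lines, 32-bit addresses
print(cache_params(8192, 64, 1, 32))     # (19, 7, 6)
# Example 2: 16 KB 4-way set-associative cache, 32-byte lines
print(cache_params(16384, 32, 4, 32))    # (20, 7, 5)
```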

  7. Replacement policy
  • Determines which cache line is evicted on a miss
  • Matters only for set-associative caches
  • Trivial for a direct-mapped cache: each block maps to exactly one line, so there is no choice to make

  8. Example
  Assuming a 2-way set-associative cache, determine the number of misses for the following trace, where A, B, C, and D all map to the same set:
  A  B  C  A  B  C  B  A  B  D

  9. Ideal case: OPTIMAL
  • Policy 0: OPTIMAL
  • Replace the cache line whose next access is furthest in the future
  • Properties:
  • Requires knowledge of the future, so it cannot actually be implemented
  • Gives the best-case miss count, against which real policies can be compared

  10. Ideal case: OPTIMAL
  Access:   A     B     C     A     B     C     B     A     B     D
  Optimal:  A,+   A,B+  A,C+  A,C   B,C+  B,C   B,C   B,A+  B,A   D,A+
  (+ marks a miss; # of misses = 6)

  11. Policy 1: FIFO
  • Replace the oldest cache line, i.e. the one that was loaded first

  12. Policy 1: FIFO
  Access:   A     B     C     A     B     C     B     A     B     D
  Optimal:  A,+   A,B+  A,C+  A,C   B,C+  B,C   B,C   B,A+  B,A   D,A+   (# of misses: 6)
  FIFO:     A,+   A,B+  C,B+  C,A+  B,A+  B,C+  B,C   A,C+  A,B+  D,B+   (# of misses: 9)
  (+ marks a miss)

  13. Policy 2: LRU
  • Policy 2: Least Recently Used
  • Replace the least recently used cache line
  • Properties:
  • Approximates the OPTIMAL policy by using past behavior to predict future behavior
  • By temporal locality, the least recently used line is unlikely to be accessed again in the near future

  14. Policy 2: LRU
  Access:   A     B     C     A     B     C     B     A     B     D
  Optimal:  A,+   A,B+  A,C+  A,C   B,C+  B,C   B,C   B,A+  B,A   D,A+   (# of misses: 6)
  FIFO:     A,+   A,B+  C,B+  C,A+  B,A+  B,C+  B,C   A,C+  A,B+  D,B+   (# of misses: 9)
  LRU:      A,+   A,B+  C,B+  C,A+  B,A+  B,C+  B,C   B,A+  B,A   B,D+   (# of misses: 8)
  (+ marks a miss)
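The three policies on this trace can be checked with a small single-set simulator. (This sketch is not from the slides; the function name and its structure are illustrative.)

```python
# Simulate one set of associativity E under OPT, FIFO, or LRU and count misses.
def misses(trace, E, policy):
    cache, count = [], 0        # list order encodes FIFO/LRU position
    for i, ref in enumerate(trace):
        if ref in cache:
            if policy == "LRU":             # refresh recency on a hit
                cache.remove(ref)
                cache.append(ref)
            continue
        count += 1
        if len(cache) == E:                 # set is full: pick a victim
            if policy == "OPT":
                future = trace[i + 1:]
                # Evict the line whose next use is furthest away
                # (lines never used again sort last of all).
                victim = max(cache, key=lambda l: future.index(l)
                             if l in future else len(future))
            else:                           # FIFO and LRU both evict the head
                victim = cache[0]
            cache.remove(victim)
        cache.append(ref)
    return count

trace = list("ABCABCBABD")
print(misses(trace, 2, "OPT"),
      misses(trace, 2, "FIFO"),
      misses(trace, 2, "LRU"))    # 6 9 8, matching the table above
```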

  15. Reality: Pseudo-LRU
  • Reality:
  • True LRU is expensive to implement in hardware
  • Pseudo-LRU is implemented as an approximation of LRU
  • Pseudo-LRU:
  • Each cache line is equipped with a bit
  • The bit is set when the cache line is accessed and cleared periodically
  • On a miss, evict a cache line whose bit is unset
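The one-bit scheme above can be sketched as follows. (This class and its method names are hypothetical, not from the slides; real hardware implements the same idea with per-line bits and combinational victim selection.)

```python
# Minimal sketch of one-bit pseudo-LRU for a single set.
class PseudoLRUSet:
    def __init__(self, lines):
        # One "referenced" bit per line, initially clear.
        self.ref = {line: False for line in lines}

    def access(self, line):
        self.ref[line] = True           # bit is set when the line is accessed

    def clear(self):
        # Periodic clearing (done by the hardware on some interval).
        self.ref = {line: False for line in self.ref}

    def victim(self):
        # Evict a line whose bit is still unset; if every bit happens
        # to be set, fall back to an arbitrary line.
        for line, bit in self.ref.items():
            if not bit:
                return line
        return next(iter(self.ref))

s = PseudoLRUSet(["A", "B", "C", "D"])
s.access("A")
s.access("C")
print(s.victim())    # "B": the first line whose bit was never set
```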

  16. Example 1: Direct-mapped cache
  Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
  Assume a direct-mapped cache with 4 four-byte lines and 6-bit addresses (t = 2, s = 2, b = 2):
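Working this example means splitting each 6-bit address into its tag, set index, and offset fields. A sketch of that split (the helper name is illustrative, not from the slides):

```python
# Split an address into (tag, set index, offset) for t=2, s=2, b=2,
# as in the direct-mapped example.
def split(addr, t=2, s=2, b=2):
    offset = addr & ((1 << b) - 1)          # low b bits
    set_idx = (addr >> b) & ((1 << s) - 1)  # next s bits
    tag = addr >> (b + s)                   # remaining high bits
    return tag, set_idx, offset

refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
for a in refs[:4]:
    print(a, split(a))
# e.g. address 20 = 0b010100 -> tag 1, set 1, offset 0
```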

  17. Direct-mapped cache
  Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
  Direct-mapped cache, 4 four-byte lines. [Figure: final cache state]

  18. Example 2: Set-associative cache
  Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
  Four-way set associative, 4 sets, one-byte blocks (t = 4, s = 2, b = 0):

  19. Set-associative cache
  Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
  Four-way set associative, 4 sets, one-byte blocks. [Figure: final cache state]

  20. Example 3: Fully associative cache
  Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
  Fully associative, 4 four-byte blocks (t = 4, s = 0, b = 2):
  Each line: | V | Tag | Byte 0 | Byte 1 | Byte 2 | Byte 3 |

  21. Fully associative cache
  Reference string: 1 4 8 5 20 17 19 56 9 11 4 43 5 6 9 17
  Fully associative, 4 four-byte blocks (t = 4, s = 0, b = 2). Final state:
  V   Tag    Byte 0  Byte 1  Byte 2  Byte 3
  1   1010   40      41      42      43
  1   0010   8       9       10      11
  1   0100   16      17      18      19
  1   0001   4       5       6       7
  Note: used LRU eviction policy
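The final state above can be reproduced by tracking only block numbers (which equal the tags here, since s = 0) in LRU order. A sketch, with an illustrative function name not from the slides:

```python
# Fully associative cache, 4 lines, 4-byte blocks (b=2), LRU replacement.
# Returns the tags resident at the end of the reference string.
def final_tags(refs, nlines=4, b=2):
    lru = []                    # least recently used tag at the front
    for addr in refs:
        tag = addr >> b         # with s = 0, block number = tag
        if tag in lru:
            lru.remove(tag)     # hit: move to most-recent position
        elif len(lru) == nlines:
            lru.pop(0)          # miss on a full cache: evict the LRU tag
        lru.append(tag)
    return lru

refs = [1, 4, 8, 5, 20, 17, 19, 56, 9, 11, 4, 43, 5, 6, 9, 17]
print(sorted(final_tags(refs)))
# tags 1, 2, 4, 10 = 0001, 0010, 0100, 1010, matching the table above
```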

  22. Multiprocessor systems
  • Multiprocessor systems are common, but they are not as easy to build as "just adding a processor"
  • One might picture a multiprocessor system like this: [Figure: Processor 1 and Processor 2 both connected directly to one Memory]

  23. The problem…
  • The per-processor caches can become unsynchronized
  • This is a big problem for any system: memory should be viewed consistently by each processor
  [Figure: Processor 1 and Processor 2, each with its own cache, sharing one Memory]

  24. Cache coherency
  • Imagine that each processor's cache could see what the other is doing
  • Then both of them could stay up to date ("coherent")
  • How they manage to do so is a "cache coherency protocol"; the most widely used protocol is MESI
  • MESI = Modified, Exclusive, Shared, Invalid – the four states a cache line can be in:
  • Invalid – the data is invalid and must be retrieved from memory
  • Exclusive – this processor has exclusive access to the data
  • Shared – other caches also have copies of the data
  • Modified – this cache holds a modified copy of the data (other caches do not have the updated copy)
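A simplified view of how one line's MESI state reacts to local and remote reads and writes can be written as a transition table. (This is a sketch under simplifying assumptions: it ignores bus transactions, write-backs, and the shared/exclusive distinction on fill, and the names are illustrative, not from the slides.)

```python
# Simplified MESI transitions for a single cache line, seen from one cache.
TRANSITIONS = {
    ("I", "local_read"):   "E",  # assuming no other cache holds the line
    ("I", "local_write"):  "M",
    ("E", "local_write"):  "M",  # silent upgrade: we already own the line
    ("E", "remote_read"):  "S",  # another cache now has a copy
    ("M", "remote_read"):  "S",  # write the dirty data back, then share
    ("S", "local_write"):  "M",  # other copies must be invalidated first
    ("S", "remote_write"): "I",
    ("E", "remote_write"): "I",
    ("M", "remote_write"): "I",
}

def step(state, event):
    # Events not listed (e.g. a local read hit) leave the state unchanged.
    return TRANSITIONS.get((state, event), state)

state = "I"
for ev in ["local_read", "remote_read", "local_write", "remote_write"]:
    state = step(state, ev)
print(state)    # I -> E -> S -> M -> I, ending back at "I"
```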

  25. MESI Protocol
