1 / 14

CMPE 421 Advanced Computer Architecture

CMPE 421 Advanced Computer Architecture. Caching with Associativity PART2. V. Tag. Data. V. Tag. Data. 0:. 1:. 2. 3:. 4:. 5:. 6:. 7:. 8. 9:. 10:. 11:. 12:. 13:. 14:. 15:. Other Cache organizations. Fully Associative. Direct Mapped. Index. No Index.

Download Presentation

CMPE 421 Advanced Computer Architecture

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CMPE 421 Advanced Computer Architecture Caching with Associativity PART2

  2. V Tag Data V Tag Data 0: 1: 2 3: 4: 5: 6: 7: 8 9: 10: 11: 12: 13: 14: 15: Other Cache organizations Fully Associative Direct Mapped Index No Index Each address has only one possible location Address = Tag | Index | Block offset Address = Tag | Block offset

  3. Fully Associative Cache

  4. V Tag Data V Tag Data 0: 0: 1: 1: 2: 3: 2: 4: 5: 3: 6: 7: A Compromise 4-Way set associative 2-Way set associative Each address has four possible locations with the same index Each address has two possible locations with the same index One fewer index bit: 1/2 the indexes Two fewer index bits: 1/4 the indexes Address = Tag | Index | Block offset Address = Tag | Index | Block offset

  5. Used for tag compare Selects the set Selects the word in the block Increasing associativity Decreasing associativity Fully associative (only one set) Tag is all the bits except block and byte offset Direct mapped (only one way) Smaller tags Range of Set Associative Caches Tag Index Block offset Byte offset

  6. Set Associative Cache Main Memory 0000xx 0001xx 0010xx 0011xx 0100xx 0101xx 0110xx 0111xx 1000xx 1001xx 1010xx 1011xx 1100xx 1101xx 1110xx 1111xx Two low order bits define the byte in the word (32-b words) One word blocks Cache Way Set V Tag Data 0 0 1 0 1 1 Q1: How do we find it? Use next 1 low order memory address bit to determine which cache set (i.e., modulo the number of sets in the cache) Q2: Is it there? Compare all the cache tags in the set to the high order 3 memory address bits to tell if the memory block is in the cache (block address) modulo (# set in the cache)

  7. Set Associative Cache Organization FIGURE 7.17 The implementation of a four-way set-associative cache requires four comparators and a 4-to-1 multiplexor. The comparators determine which element of the selected set (if any) matches the tag. The output of the comparators is used to select the data from one of the four blocks of the indexed set, using a multiplexor with a decoded select signal. In some implementations, the Output enable signals on the data portions of the cache RAMs can be used to select the entry in the set that drives the output. The Output enable signal comes from the comparators, causing the element that matches to drive the data outputs.

  8. 01 4 00 01 0 4 00 0 01 4 00 0 01 4 Remember the Example for Direct Mapping (ping pong effect) • Consider the main memory word reference string 0 4 0 4 0 4 0 4 Start with an empty cache - all blocks initially marked as not valid miss miss miss miss 0 4 0 4 00 Mem(0) 00 Mem(0) 01 Mem(4) 00 Mem(0) 4 0 4 0 miss miss miss miss 01 Mem(4) 00 Mem(0) 01 Mem(4) 00 Mem(0) • 8 requests, 8 misses • Ping pong effect due to conflict misses - two memory locations that map into the same cache block

  9. Solution: Use set associative cache • Consider the main memory word reference string 0 4 0 4 0 4 0 4 Start with an empty cache - all blocks initially marked as not valid miss miss hit hit 0 4 0 4 000 Mem(0) 000 Mem(0) 000 Mem(0) 000 Mem(0) 010 Mem(4) 010 Mem(4) 010 Mem(4) • 8 requests, 2 misses • Solves the ping pong effect in a direct mapped cache due to conflict misses since now two memory locations that map into the same cache set can co-exist!

  10. Index Index Index V Tag Data V Tag Data V Tag Data 000: 0 0 0 00: 001: 0 0 0 0: 010: 0 0 0 01: 011: 0 0 0 100: 0 0 0 10: 101: 0 0 0 1: 110: 0 0 0 11: 111: 0 0 0 Byte offset (2 bits)Block offset (2 bits)Index (1-3 bits)Tag (3-5 bits) Set Associative Example 0100111000 0100111000 0100111000 Miss Miss Miss Miss Miss Miss 1100110100 1100110100 1100110100 Miss Hit Hit 0100111100 0100111100 0100111100 Miss Miss Miss 0110110000 0110110000 0110110000 1100111000 Miss 1100111000 Miss 1100111000 Hit - 010 011 110 110 010 1 - 01001 1 - 11001 1 - 01101 - 0100 1100 1 1 - 1100 0110 1 Direct-Mapped 2-Way Set Assoc. 4-Way Set Assoc.

  11. New Performance Numbers Miss rates for DEC 3100 (MIPS machine) Separate 64KB Instruction/Data Caches Benchmark Associativity Instruction Data miss Combined rate miss rate gcc Direct 2.0% 1.7% 1.9% gcc 2-way 1.6% 1.4% 1.5% gcc 4-way 1.6% 1.4% 1.5% spice Direct 0.3% 0.6% 0.4% spice 2-way 0.3% 0.6% 0.4% spice 4-way 0.3% 0.6% 0.4%

  12. Benefits of Set Associative Caches • The choice of direct mapped or set associative depends on the cost of a miss versus the cost of implementation Data from Hennessy & Patterson, Computer Architecture, 2003 • Largest gains are in going from direct mapped to 2-way (20%+ reduction in miss rate)

  13. 0 0 31 13 12 12 Virt.Pg.# Disk Address V Phys. Page # 0 1 2 ... ... 512K 23 13 Virtual Memory (32-bit system): 8KB page size,16MB Mem Virtual Address 13 19 219=512K 4GB / 8KB =512K entries Page offset Index 11 Physical Address

  14. Virtual Page # Valid Physical Page #/(index) Bit Disk address000000 1 1001000001 0 sector 5000...000010 1 0010000011 0 sector 4323…000100 1 1011000101 1 1010000110 0 sector 1239...000111 1 0001 Virtual memory example System with 20-bit V.A., 16KB pages, 256KB of physical memory Page offset takes 14 bits, 6 bits for V.P.N. and 4 bits for P.P.N. Page Table: Access to:000010001100 1010 1010 PPN = 0010 Physical Address:0010001100 1010 1010 Access to:00011001 0011 1100 0000 sector xxxx... 0 PPN = Page Fault tosector 1239... 1010 1 Read data from sector 1239into PPN 1010 Pick a page to “kick out” of memory (use LRU). Assume LRU is VPN 000101for this example.

More Related