
COMP 206 Computer Architecture and Implementation Unit 8a: Basics of Caches


Presentation Transcript


1. COMP 206 Computer Architecture and Implementation, Unit 8a: Basics of Caches. Siddhartha Chatterjee, Fall 2000

2. The Big Picture: Where Are We Now?
• The five classic components of a computer: processor (control and datapath), memory, input, output
• This unit: the memory system
[Figure: block diagram of the five classic components]

3. The Motivation for Caches
• Large memories (DRAM) are slow; small memories (SRAM) are fast
• Make the average access time small by servicing most accesses from a small, fast memory
• Reduce the bandwidth required of the large memory
[Figure: memory system with a cache between the processor and DRAM]

4. The Principle of Locality
• Programs access a relatively small portion of the address space at any instant of time
• Example: 90% of execution time is spent in 10% of the code
• Two different types of locality:
• Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon
• Spatial locality (locality in space): if an item is referenced, items whose addresses are close by tend to be referenced soon
[Figure: probability of reference plotted over the address space from 0 to 2^n]

5. Levels of the Memory Hierarchy (from the upper, faster levels to the lower, larger ones)
• Registers: 100s of bytes, 1-5 ns, ~$0.05/bit; staged by the programmer/compiler in words of 1-8 bytes
• Cache (L1, L2, …): KB-MB, 1-30 ns, ~$0.0005/bit; staged by the cache controller in blocks of 8-128 bytes
• Main memory: MB-GB, 100 ns-1 µs, ~$0.000001/bit; staged by the OS in pages of 4-64 KB
• Disk: 1-30 GB, 3-15 ms, 10^-4-10^-5 cents/bit; staged by the user/operator in files of MBytes
• Tape/network: "infinite" capacity, seconds-minutes, ~10^-6 cents/bit

6. Memory Hierarchy: Principles of Operation
• At any given time, data is copied between only two adjacent levels
• Upper level (cache): the level closer to the processor; smaller, faster, and built from more expensive technology
• Lower level (memory): the level further from the processor; bigger, slower, and built from less expensive technology
• Block: the minimum unit of information that can either be present or not present in the two-level hierarchy
[Figure: blocks X and Y moving between the upper level (cache) and the lower level (memory), to and from the processor]

7. Memory Hierarchy: Terminology
• Hit: the data appears in some block in the upper level (example: block X on the previous slide)
• Hit rate: the fraction of memory accesses found in the upper level
• Hit time: time to access the upper level, consisting of the RAM access time plus the time to determine hit/miss
• Miss: the data must be retrieved from a block in the lower level (block Y on the previous slide)
• Miss rate = 1 - hit rate
• Miss penalty = time to replace a block in the upper level + time to deliver the block to the processor
• Hit time << miss penalty
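These terms combine into an average memory access time. A minimal sketch follows; the hit time, miss rate, and miss penalty values are assumed examples, not numbers from the lecture.

```c
/* Average memory access time from the terms above.
   All three numbers below are assumed example values. */
#include <stdio.h>

int main(void) {
    double hit_time     = 2.0;   /* ns: RAM access + time to determine hit/miss */
    double miss_rate    = 0.05;  /* 1 - hit rate */
    double miss_penalty = 60.0;  /* ns: replace block + deliver it to the processor */

    /* Every access pays the hit time; a miss additionally pays the penalty. */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("average memory access time = %.1f ns\n", amat);  /* 2 + 0.05 * 60 = 5.0 ns */
    return 0;
}
```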

8. Cache Addressing
• A block (line) is the unit of allocation
• A sector (sub-block) is the unit of transfer and coherence
• The cache parameters j, k, m, n are integers, and generally powers of 2
[Figure: cache organized as j sets (Set 0 … Set j-1) of k blocks (Block 0 … Block k-1); each block carries replacement info and m sectors (Sector 0 … Sector m-1) of n bytes (Byte 0 … Byte n-1), with tag, valid, dirty, and shared state]
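A rough sketch (not from the lecture) of how these parameters combine into data capacity and tag/state overhead; the example geometry, the 18-bit tag width, and the three state bits per sector are assumed values for illustration only.

```c
/* Data capacity and rough overhead for a cache with j sets,
   k blocks per set, m sectors per block, n bytes per sector.
   Tag width and per-sector state bits are assumed, not from the slides. */
#include <stdio.h>

int main(void) {
    int j = 128, k = 2, m = 1, n = 32;   /* example geometry (assumed) */
    int tag_bits = 18;                   /* assumed tag width */

    long data_bits     = (long)j * k * m * n * 8;
    long overhead_bits = (long)j * k * (tag_bits + m * 3);  /* valid/dirty/shared per sector */

    printf("data: %ld bits, overhead: %ld bits (%.1f%%)\n",
           data_bits, overhead_bits, 100.0 * overhead_bits / data_bits);
    return 0;
}
```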

9. Examples of Cache Configurations
[Figure: a memory address divided into a block address (tag, set) and a block offset (sector, byte)]

10. Storage Overhead of Cache

11. Basics of Cache Operation

12. Details of Simple Blocking Cache
[Figure: operation of a simple blocking cache under write through and write back]

13. Example: 1 KB, Direct-Mapped, 32 B Blocks
• For a 1024-byte (2^10) cache with 32-byte blocks:
• The uppermost 22 = (32 - 10) address bits are the tag
• The lowest 5 address bits are the byte select (block size = 2^5)
• The next 5 address bits (bits 5-9) are the cache index
[Figure: a 32-bit address split at bits 31..10 (cache tag, e.g. 0x50), 9..5 (cache index, e.g. 0x01), and 4..0 (byte select, e.g. 0x00); the cache has 32 entries, each holding a valid bit, a cache tag stored as part of the cache "state", and 32 bytes of cache data (bytes 0-31, 32-63, …, 992-1023)]
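A sketch of how an address would be split for this particular 1 KB, 32-byte-block configuration; the address value is just an example, chosen so the fields match Example 2 below.

```c
/* Splitting a 32-bit address for a 1 KB direct-mapped cache with
   32-byte blocks: 22-bit tag, 5-bit cache index, 5-bit byte select. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t addr = 0x00014028;                /* example address (assumed) */

    uint32_t byte_select = addr & 0x1F;        /* bits 4..0  */
    uint32_t cache_index = (addr >> 5) & 0x1F; /* bits 9..5  */
    uint32_t tag         = addr >> 10;         /* bits 31..10 */

    /* Prints tag=0x50 index=0x1 byte=0x8, the same fields as Example 2. */
    printf("tag=0x%x index=0x%x byte=0x%x\n", tag, cache_index, byte_select);
    return 0;
}
```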

14. Example 1a: 1 KB Direct-Mapped Cache (Cache Miss)
[Figure: an access with cache tag 0x0002fe, cache index 0x00, byte select 0x00; entry 0 is not valid, so the tag comparison fails and the access is a cache miss]

15. Example 1b: 1 KB Direct-Mapped Cache
[Figure: after the miss, the new block of data is loaded into entry 0, its valid bit is set, and its cache tag becomes 0x0002fe]

16. Example 2: 1 KB Direct-Mapped Cache (Cache Hit)
[Figure: an access with cache tag 0x000050, cache index 0x01, byte select 0x08; entry 1 is valid and its tag matches, so the access is a cache hit]

17. Example 3a: 1 KB Direct-Mapped Cache (Cache Miss)
[Figure: an access with cache tag 0x002450, cache index 0x02, byte select 0x04; entry 2 is valid but holds tag 0x004440, so the tags do not match and the access is a cache miss]

18. Example 3b: 1 KB Direct-Mapped Cache
[Figure: after the miss, entry 2 is refilled with the new block of data and its cache tag becomes 0x002450]
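A minimal sketch of a direct-mapped lookup that follows the same steps as Examples 1a-3b: index into the array, check the valid bit, compare tags, and on a miss install the new tag. The data path and memory traffic are omitted; the structure is illustrative, not code from the course.

```c
/* Tag store for the 1 KB / 32-byte-block cache of Examples 1a-3b:
   32 entries, each with a valid bit and a 22-bit tag. */
#include <stdint.h>
#include <stdio.h>

struct line { int valid; uint32_t tag; };
static struct line cache[32];

static int cache_access(uint32_t addr) {
    uint32_t index = (addr >> 5) & 0x1F;
    uint32_t tag   = addr >> 10;

    if (cache[index].valid && cache[index].tag == tag)
        return 1;                     /* hit */

    cache[index].valid = 1;           /* miss: install the new block's tag */
    cache[index].tag   = tag;
    return 0;
}

int main(void) {
    uint32_t a1 = (0x0002feu << 10) | (0x00u << 5) | 0x00;  /* Examples 1a/1b */
    uint32_t a2 = (0x000050u << 10) | (0x01u << 5) | 0x08;  /* Example 2 */

    /* Pre-load entry 1 so the second access hits, as in the slides. */
    cache[1].valid = 1; cache[1].tag = 0x000050;

    printf("%s\n", cache_access(a1) ? "hit" : "miss");  /* miss, then filled */
    printf("%s\n", cache_access(a2) ? "hit" : "miss");  /* hit */
    return 0;
}
```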

19. A-way Set-Associative Cache
• A-way set associative: A entries for each cache index
• Effectively A direct-mapped caches operating in parallel
• Example: two-way set-associative cache
• The cache index selects a "set" from the cache
• The two tags in the set are compared in parallel
• Data is selected based on the tag comparison result
[Figure: two valid/tag/data arrays indexed by the cache index; two comparators check the address tag against each way, a mux (SEL1/SEL0) selects the matching cache block, and the OR of the compares gives the hit signal]
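A sketch of the parallel tag compare for a two-way set-associative cache; the geometry (16 sets, 32-byte blocks) and field widths are assumed example parameters, not figures from the slide.

```c
/* Two-way set-associative lookup: the index selects a set, both ways'
   tags are compared, and the matching way supplies the data. */
#include <stdint.h>
#include <stdio.h>

#define WAYS 2
#define SETS 16

struct way { int valid; uint32_t tag; /* data omitted */ };
static struct way cache[SETS][WAYS];

static int lookup(uint32_t addr, int *way_out) {
    uint32_t set = (addr >> 5) % SETS;   /* 32-byte blocks (assumed) */
    uint32_t tag = addr >> 9;            /* remaining upper bits */

    for (int w = 0; w < WAYS; w++) {     /* the hardware does these compares in parallel */
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            *way_out = w;                /* SEL signal: which way drives the data mux */
            return 1;                    /* hit = OR of the per-way compares */
        }
    }
    return 0;
}

int main(void) {
    cache[3][1].valid = 1;
    cache[3][1].tag   = 0x1234;

    uint32_t addr = (0x1234u << 9) | (3u << 5) | 0x10;
    int way = -1;
    if (lookup(addr, &way))
        printf("hit in way %d\n", way);
    else
        printf("miss\n");
    return 0;
}
```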

20. Fully Associative Cache
• Push the set-associative idea to its limit: forget about the cache index
• Compare the cache tags of all cache entries in parallel
• Example: with a block size of 32 B, we need N 27-bit comparators
[Figure: a 32-bit address split into a 27-bit cache tag (bits 31..5) and a byte select (e.g. 0x01); every entry's valid bit and cache tag are compared against the address tag]

21. Cache Shapes
• Direct-mapped (A = 1, S = 16)
• 2-way set-associative (A = 2, S = 8)
• 4-way set-associative (A = 4, S = 4)
• 8-way set-associative (A = 8, S = 2)
• Fully associative (A = 16, S = 1)

22. Need for a Block Replacement Policy
• Direct-mapped cache
• Each memory location can be mapped to only 1 cache location
• No need to make any decision :-) The current item replaces the previous item in that cache location
• N-way set-associative cache
• Each memory location has a choice of N cache locations
• Fully associative cache
• Each memory location can be placed in ANY cache location
• On a cache miss in an N-way set-associative or fully associative cache
• Bring in the new block from memory
• Throw out a cache block to make room for the new block
• We need to decide which block to throw out!

23. Cache Block Replacement Policies
• Random replacement
• Hardware randomly selects a cache item and throws it out
• Least recently used (LRU)
• Hardware keeps track of the access history and replaces the entry that has not been used for the longest time
• For a two-way set-associative cache, one bit per set suffices for LRU replacement
• Example of a simple "pseudo" LRU implementation
• Assume 64 fully associative entries
• A hardware replacement pointer points to one cache entry
• Whenever an access is made to the entry the pointer points to, move the pointer to the next entry
• Otherwise, do not move the pointer
[Figure: entries 0-63 with the replacement pointer indicating one of them]
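A sketch of the replacement-pointer scheme described on the slide, for 64 fully associative entries. The slide leaves the victim choice implicit; the assumption here is that the entry currently under the pointer is replaced on a miss.

```c
/* Simple "pseudo" LRU: a single replacement pointer over 64 entries.
   The pointer moves only when the pointed-to entry is accessed. */
#include <stdio.h>

#define ENTRIES 64
static int replacement_ptr = 0;

static void on_access(int entry) {
    if (entry == replacement_ptr)                            /* accessed the pointed-to entry */
        replacement_ptr = (replacement_ptr + 1) % ENTRIES;   /* move the pointer along */
    /* otherwise: do not move the pointer */
}

static int pick_victim(void) {
    return replacement_ptr;   /* never a very recently used entry (assumed victim rule) */
}

int main(void) {
    on_access(0);                              /* pointer advances to 1 */
    on_access(5);                              /* pointer stays at 1 */
    printf("victim = %d\n", pick_victim());    /* prints 1 */
    return 0;
}
```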

24. Cache Write Policy
• Cache reads are much easier to handle than cache writes
• An instruction cache is much easier to design than a data cache
• On a cache write, how do we keep the data in the cache and memory consistent?
• Two options (decision time again :-)
• Write back: write to the cache only. The cache block is written to memory when that block is replaced on a cache miss
• Needs a "dirty bit" for each cache block
• Greatly reduces the memory bandwidth requirement
• Control can be complex
• Write through: write to the cache and memory at the same time
• What!!! How can this be? Isn't memory too slow for this?
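A sketch contrasting the two policies on a store that hits in the cache. The helpers write_to_cache() and write_to_memory() are illustrative stand-ins for the real datapaths, not APIs from the course.

```c
/* Write through vs. write back on a store hit, plus the write-back
   eviction step that the dirty bit makes possible. */
#include <stdbool.h>
#include <stdio.h>

struct block { bool dirty; /* write back needs a dirty bit per block */ };

static void write_to_cache(struct block *b) { (void)b; puts("cache updated"); }
static void write_to_memory(void)           { puts("memory updated"); }

static void store_write_through(struct block *b) {
    write_to_cache(b);
    write_to_memory();    /* memory always stays consistent with the cache */
}

static void store_write_back(struct block *b) {
    write_to_cache(b);
    b->dirty = true;      /* defer the memory write until this block is replaced */
}

static void evict_write_back(struct block *b) {
    if (b->dirty) {       /* only dirty blocks go back to memory */
        write_to_memory();
        b->dirty = false;
    }
}

int main(void) {
    struct block b = { false };
    store_write_through(&b);
    store_write_back(&b);
    evict_write_back(&b);
    return 0;
}
```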

25. Write Buffer for Write Through
• A write buffer is needed between the cache and main memory
• The processor writes data into the cache and the write buffer
• The memory controller writes the contents of the buffer to memory
• The write buffer is just a FIFO
• Typical number of entries: 4
• Works fine if the store frequency (w.r.t. time) << 1 / DRAM write cycle
• The memory system designer's nightmare: store frequency (w.r.t. time) > 1 / DRAM write cycle, leading to write buffer saturation
[Figure: processor and cache feeding a write buffer that drains to DRAM]
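A sketch of the four-entry FIFO described above: the processor enqueues stores, the memory controller drains them, and a full buffer would force the processor to wait. The data structure is purely illustrative.

```c
/* Four-entry FIFO write buffer between the cache and DRAM. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WB_ENTRIES 4

struct write_buffer {
    uint32_t addr[WB_ENTRIES], data[WB_ENTRIES];
    int head, tail, count;
};

static bool wb_enqueue(struct write_buffer *wb, uint32_t addr, uint32_t data) {
    if (wb->count == WB_ENTRIES) return false;   /* full: processor must stall */
    wb->addr[wb->tail] = addr;
    wb->data[wb->tail] = data;
    wb->tail = (wb->tail + 1) % WB_ENTRIES;
    wb->count++;
    return true;
}

static bool wb_drain_one(struct write_buffer *wb) {  /* memory controller side */
    if (wb->count == 0) return false;
    printf("DRAM write: [0x%x] = 0x%x\n", wb->addr[wb->head], wb->data[wb->head]);
    wb->head = (wb->head + 1) % WB_ENTRIES;
    wb->count--;
    return true;
}

int main(void) {
    struct write_buffer wb = {0};
    wb_enqueue(&wb, 0x1000, 42);       /* processor side: store completes immediately */
    wb_enqueue(&wb, 0x1004, 43);
    while (wb_drain_one(&wb)) { }      /* slow DRAM writes happen in the background */
    return 0;
}
```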

26. Write Buffer Saturation
• Store frequency (w.r.t. time) > 1 / DRAM write cycle
• If this condition persists for a long period of time (CPU cycle time too quick and/or too many store instructions in a row), the store buffer will overflow no matter how big you make it, because CPU cycle time << DRAM write cycle time
• Solutions for write buffer saturation
• Use a write-back cache
• Install a second-level (L2) cache
[Figure: baseline processor/cache/write buffer/DRAM system, and a variant with an L2 cache added]

27. Write Allocate versus No-Allocate
• Assume that a 16-bit write to memory location 0x00 causes a cache miss
• Do we read in the block?
• Yes: write allocate
• No: write no-allocate
[Figure: an address with cache tag 0x00, cache index 0x00, byte select 0x00, and a cache array whose entries carry a dirty bit, valid bit, cache tag, and 32 bytes of cache data]
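A sketch of the two choices on a write miss; fetch_block() and the write helpers are illustrative stand-ins, not course code.

```c
/* The two options when a write misses in the cache. */
#include <stdbool.h>
#include <stdio.h>

static void fetch_block(void)     { puts("read the block from memory into the cache"); }
static void write_to_cache(void)  { puts("update the (now present) cache block"); }
static void write_to_memory(void) { puts("send the write to memory only"); }

static void handle_write_miss(bool write_allocate) {
    if (write_allocate) {     /* write allocate: bring the block in, then write it */
        fetch_block();
        write_to_cache();
    } else {                  /* write no-allocate: bypass the cache entirely */
        write_to_memory();
    }
}

int main(void) {
    handle_write_miss(true);   /* write allocate */
    handle_write_miss(false);  /* write no-allocate */
    return 0;
}
```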

28. Four Questions for Memory Hierarchy
• Where can a block be placed in the upper level? (Block placement)
• How is a block found if it is in the upper level? (Block identification)
• Which block should be replaced on a miss? (Block replacement)
• What happens on a write? (Write strategy)
