Memory Organization 2: Cache Memories

Presentation Transcript


  1. Memory Organization 2: Cache Memories CE 140 A1/A2 30 July 2003

  2. Required Reading • Ch 5, Hamacher

  3. Memory Hierarchy • [Diagram: pyramid from Registers at the top through Caches, Main Memory, Magnetic Disk, and Optical Storage to Tape at the bottom; speed and cost per bit increase toward the top, size increases toward the bottom]

  4. Principle of Locality of Reference • Programs tend to reuse data and instructions they have used recently • Instructions in localized areas are executed repeatedly • Roughly 90% of execution time is spent in only 10% of the code • “Make the common case fast”  favor accesses to such data • Keep recently accessed data in the fastest memory

  5. Temporal Locality • A recently executed instruction is likely to be executed again very soon

  6. Spatial Locality • Instructions in close proximity to a recently executed instruction are likely to be executed soon

  7. Memory Hierarchy • Provide a memory system with cost almost as low as the cheapest level of memory and speed almost as fast as the fastest level • All data in one level is also found in the level below

  8. Memory Hierarchy • Importance grew with advances in processor performance • 1980: most processors had no caches • 1995: processors commonly had two levels of caches • Goal: bridge the processor-memory performance gap

  9. Processor-Memory Performance Gap • [Chart: relative performance of CPU vs. memory, 1980-2000, log scale from 1 to 1000; processor performance improves far faster than memory performance, so the gap widens each year] • Source: Computer Architecture: A Quantitative Approach by Hennessy and Patterson

  10. Cache • Small, fast storage used to improve speed of access to slower, larger memory • Exploits spatial and temporal locality

  11. Cache • Temporal Locality: When an item is first needed, it is brought into the cache, where it hopefully remains until it is needed again. This also guides the choice of which item to discard when the cache is full • Spatial Locality: Instead of fetching just one item into the cache, fetch several adjacent data items as well (a block, or cache line)

  12. Memory Hierarchy Design • Block placement: Where can a block be placed in the upper level? • Block identification: How is a block found if it is in the upper level? • Block replacement: Which block should be replaced on a miss? • Write strategy: What happens on a write?

  13. Where can a block be placed in a cache? • Mapping function determines how a block is placed in the cache

  14. Mapping Functions • Three Types • Direct Mapping • Associative Mapping • Set-Associative Mapping • Examples assume a 64K-word main memory (4K blocks x 16 words) and a 2K-word cache (128 blocks x 16 words) • One block consists of 16 words

  15. Where can a block be placed in a cache? How is a block found? • [Diagram: main memory blocks 0-4095 are placed into cache blocks 0-127, each holding a tag, via the mapping function; the 16-bit address divides into a 12-bit block field and a 4-bit word field]

  16. Direct Mapping • Simplest • Block j of main memory maps onto block (j modulo 128) of the cache • Example: Block 2103 of main memory maps to block (2103 mod 128) = block 55 • Each main memory block has exactly one place in the cache • More than one memory block contends for the same cache position • Cache block = (Block Address) MOD (Number of Blocks in Cache)

  17. Direct Mapping • 16-bit address (64K words) • 16 words per block  lower 4 bits • Cache block position  middle 7 bits • 32 memory blocks (4096 / 128) map to the same cache block • The higher 5 bits identify which of those 32 blocks is present • The higher 5 bits are stored in the 5 tag bits associated with each cache location

  18. How is a block found if it is in the cache? Direct Mapping • Middle 7 bits determine which location in the cache is used • Higher-order 5 bits are matched against the tag bits in the cache to check whether the desired block is the one stored there

  19. Direct Mapping • [Diagram: main memory blocks 0-4095 map onto cache blocks 0-127, each with a tag; the 16-bit address divides into a 5-bit tag, a 7-bit block field, and a 4-bit word field]
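
A minimal sketch in C (not from the original slides) of how the 5/7/4 field split above is extracted from a 16-bit address; the widths match the example cache of 128 blocks of 16 words:

  #include <stdint.h>
  #include <stdio.h>

  /* Direct mapping: 16-bit address = 5-bit tag | 7-bit block | 4-bit word */
  static void decode_direct(uint16_t addr) {
      unsigned word  = addr & 0xF;         /* lowest 4 bits: word within block */
      unsigned block = (addr >> 4) & 0x7F; /* middle 7 bits: cache block index */
      unsigned tag   = addr >> 11;         /* top 5 bits: stored/compared tag  */
      printf("addr %04Xh -> tag %u, cache block %u, word %u\n",
             (unsigned)addr, tag, block, word);
  }

  int main(void) {
      /* Block 2103 of main memory starts at word address 2103 * 16 = 8370h;
         2103 mod 128 = 55, so it should land in cache block 55 with tag 16. */
      decode_direct(0x8370);
      return 0;
  }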

  20. Associative Mapping • A block can be mapped to any available cache location • Higher 12 bits are stored in tag bits

  21. How is a block found if it is in the cache? Associative Mapping • Tag bits (Higher-order 12 bits) of an address are compared with tag bits of each block to check if desired block is present • Higher cost than direct mapping due to need to search all 128 tags • Tags must be searched in parallel for performance reasons

  22. Associative Mapping • [Diagram: any main memory block (0-4095) may occupy any cache block (0-127); the 16-bit address divides into a 12-bit tag and a 4-bit word field]
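
A hedged sketch of the associative lookup (names such as cache_line and NUM_BLOCKS are illustrative). Real hardware compares all 128 tags simultaneously; this sequential C loop only models the comparison:

  #include <stdbool.h>
  #include <stdint.h>

  #define NUM_BLOCKS 128            /* cache size from the running example */

  struct cache_line {               /* software model of one cache block   */
      bool     valid;
      uint16_t tag;                 /* upper 12 bits of the address        */
      uint16_t data[16];            /* one 16-word block                   */
  };

  static struct cache_line cache[NUM_BLOCKS];

  /* Returns the index of the matching block, or -1 on a miss. */
  static int assoc_lookup(uint16_t addr) {
      uint16_t tag = addr >> 4;     /* everything above the 4-bit word field */
      for (int i = 0; i < NUM_BLOCKS; i++)
          if (cache[i].valid && cache[i].tag == tag)
              return i;
      return -1;                    /* miss: any block may receive the data  */
  }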

  23. Set-Associative Mapping • Cache blocks are grouped into sets • A main memory block can reside in any block of a specific set • Less contention than direct mapping • Less cost than associative mapping • Set = (Block Address) MOD (Number of Sets in Cache) • k-way set associative cache: k blocks per set

  24. How is a block found if it is in the cache? Set-Associative Mapping • Example: Cache groups two blocks per set  64 sets (6-bit set field) • 64 memory blocks (4096 / 64) map onto each set • The tag bits in each cache block store the upper 6 bits of the address to tell which of those 64 blocks is currently in the cache

  25. Set-Associative Mapping • [Diagram: cache blocks 0-127 are grouped into 64 two-block sets (Set 0 through Set 63); a main memory block (0-4095) maps to exactly one set but may occupy either block within it; the 16-bit address divides into a 6-bit tag, a 6-bit set field, and a 4-bit word field]
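
A sketch of the two-way set-associative lookup under the same assumptions (illustrative names; data fields omitted for brevity). Only the two blocks of one set need their tags compared:

  #include <stdbool.h>
  #include <stdint.h>

  #define NUM_SETS 64                       /* 128 blocks / 2 ways */
  #define WAYS      2

  struct line { bool valid; uint16_t tag; };
  static struct line cache[NUM_SETS][WAYS];

  /* 16-bit address = 6-bit tag | 6-bit set | 4-bit word, as in the diagram. */
  static int set_assoc_lookup(uint16_t addr) {
      unsigned set = (addr >> 4) & 0x3F;    /* middle 6 bits select the set  */
      unsigned tag = addr >> 10;            /* top 6 bits identify the block */
      for (int w = 0; w < WAYS; w++)
          if (cache[set][w].valid && cache[set][w].tag == tag)
              return w;                     /* hit in way w                  */
      return -1;                            /* miss: replace within the set  */
  }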

  26. Levels of Set Associativity • Direct Mapping: 1 block per set  128 sets • Fully Associative Mapping: 128 blocks per set  1 set • Set Associative Mapping is in between Direct and Fully Associative • Different mappings are just different degrees of set associativity

  27. Which block should be replaced on a cache miss? • Replacement Algorithm • Determines which block in the cache is to be replaced on a cache miss when the cache (or the target set) is full • Trivial for direct-mapped caches, since each block has only one possible position

  28. Which block should be replaced on a cache miss? • Replacement algorithms • Random Replacement • First-In First-Out (FIFO) • Optimal Algorithm • Least Recently Used (LRU) • Least Frequently Used • Most Frequently Used

  29. Example of Replacement Algorithms • Assume a fully associative cache • Reference string: a sequence of block requests • Example: 3 2 3 6 7 3 3 5 4 [only the start of the string survives in the transcript; the miss counts on the next slides were tallied over the full string]

  30. Random Replacement • Simplest algorithm • Replaces elements at random • Spreads allocation uniformly • Quite effective in some cases

  31. First-In First-Out (2-block cache): 17 cache misses

  32. First-In First-Out (3-block cache): 14 cache misses

  33. First-In First-Out (4-block cache): 15 cache misses

  34. Belady’s Anomaly • Increasing the number of blocks does not necessarily decrease the number of cache misses • For some replacement algorithms, such as FIFO, the number of cache misses may increase as the number of blocks increases
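
A small FIFO simulator (a sketch, not the slides' own trace) that counts misses for a given cache size. On the classic reference string 1 2 3 4 1 2 5 1 2 3 4 5 it reports 9 misses with 3 blocks but 10 with 4, exhibiting the anomaly:

  #include <stdio.h>

  /* Count FIFO misses for a reference string using `size` cache blocks. */
  static int fifo_misses(const int *refs, int n, int size) {
      int cache[16] = {0}, used = 0, next = 0, misses = 0;
      for (int i = 0; i < n; i++) {
          int hit = 0;
          for (int j = 0; j < used; j++)
              if (cache[j] == refs[i]) { hit = 1; break; }
          if (!hit) {
              misses++;
              if (used < size)
                  cache[used++] = refs[i];          /* fill a free slot   */
              else {
                  cache[next] = refs[i];            /* evict oldest block */
                  next = (next + 1) % size;
              }
          }
      }
      return misses;
  }

  int main(void) {
      int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
      int n = sizeof refs / sizeof refs[0];
      for (int size = 2; size <= 4; size++)
          printf("%d blocks: %d misses\n", size, fifo_misses(refs, n, size));
      return 0;
  }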

  35. Optimal Algorithm • Replace the block that will not be used for the longest period of time • Guarantees the lowest miss rate for a fixed number of blocks • Needs prior knowledge of the reference string, so it serves as a bound to compare against rather than a practical policy

  36. Least Recently Used (LRU) • Overwrite the block that has gone the longest time without being referenced • The cache controller tracks references through counters • Performs poorly when sweeping sequentially through an array larger than the cache
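
A sketch of the counter-based bookkeeping the slide describes (names are illustrative; real controllers keep a few bits per block rather than a full timestamp):

  #include <stdint.h>

  #define BLOCKS 4

  struct line { int valid; int tag; uint32_t last_use; };
  static struct line cache[BLOCKS];
  static uint32_t now = 0;                      /* advances on every access */

  /* Return the block holding `tag`, loading over the LRU victim on a miss. */
  static int lru_access(int tag) {
      int victim = 0;
      for (int i = 0; i < BLOCKS; i++) {
          if (cache[i].valid && cache[i].tag == tag) {
              cache[i].last_use = ++now;        /* hit: refresh its age     */
              return i;
          }
          if (!cache[i].valid ||                /* prefer empty blocks,     */
              cache[i].last_use < cache[victim].last_use)
              victim = i;                       /* else the oldest last_use */
      }
      cache[victim].valid = 1;                  /* miss: replace the block  */
      cache[victim].tag = tag;                  /* unused for the longest   */
      cache[victim].last_use = ++now;
      return victim;
  }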

  37. Least-Recently Used (4-block cache): 12 cache misses

  38. Least Frequently Used • Keeps a counter of the number of references made to each block • The block with the lowest count is replaced • FIFO is used as a tie-breaker • Rationale: a block that is frequently accessed is likely to be accessed again

  39. Most Frequently Used • Replace the block with the highest reference count • Rationale: the block with the highest count has likely finished its burst of use, while a block with a low count may have only just been brought in

  40. What happens on a write? • Write policies • Write-through • Write-back

  41. Write-Through • Cache location and main memory location are updated simultaneously • Simpler, but results in unnecessary write operations when a word is updated many times during its cache residency • Requires only a valid bit

  42. Valid Bit • Indicates whether the block stored in the cache is still valid • Set to 1 when the block is initially loaded into the cache • Transfers from disk to main memory use DMA and bypass the cache • When a main memory block is updated by a source that bypasses the cache and that block is also in the cache, its valid bit is set to 0

  43. Write-Back • Update only the cache location and mark it as updated using a dirty bit (modified bit) • The main memory location is updated later, when the block is replaced • Writes proceed at the speed of the cache • Also results in unnecessary writes, because the whole block is written back to memory even if only one word was updated • Requires a valid bit and a dirty bit
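
A sketch contrasting the two policies on a write hit (mem_write is a stand-in for the bus transaction, not a real API):

  #include <stdint.h>

  struct line { int valid; int dirty; uint16_t tag; uint16_t data[16]; };

  /* Stub standing in for a bus-level write of one word to main memory. */
  static void mem_write(uint16_t addr, uint16_t value) {
      (void)addr; (void)value;
  }

  /* Write-through: update cache and memory together; no dirty bit needed. */
  static void write_through(struct line *l, uint16_t addr, uint16_t value) {
      l->data[addr & 0xF] = value;
      mem_write(addr, value);         /* every store also goes to memory  */
  }

  /* Write-back: update only the cache and mark the block dirty; the whole
     block goes to memory later, when the block is replaced. */
  static void write_back(struct line *l, uint16_t addr, uint16_t value) {
      l->data[addr & 0xF] = value;
      l->dirty = 1;                   /* main memory is now stale         */
  }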

  44. Dirty Bit • Tells whether the block in the cache has been modified, i.e., holds newer data than the main memory block • Problem: a transfer from main memory to disk bypasses the cache and could read stale data • Solution: flush the cache (write back all dirty blocks) before the DMA transfer begins

  45. What happens on a write miss? • No-write allocate: the data is written directly to main memory • Write allocate: the block is first loaded from main memory into the cache, then the cache block is written to

  46. Write Buffer • Used as temporary holding location for data to be written to memory • Processor need not wait for write to finish • Data in write buffer will be written when memory is available for writing • Works for both write-through and write-back caches
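
A sketch of the write buffer just described, modeled as a small FIFO queue (sizes and names are illustrative):

  #include <stdint.h>

  #define WBUF_SLOTS 4

  struct wbuf_entry { uint16_t addr, value; };
  static struct wbuf_entry wbuf[WBUF_SLOTS];
  static int head = 0, tail = 0, count = 0;

  /* Processor side: queue the write and continue. Returns 0 if the buffer
     is full, in which case the processor must stall until an entry drains. */
  static int wbuf_put(uint16_t addr, uint16_t value) {
      if (count == WBUF_SLOTS) return 0;
      wbuf[tail] = (struct wbuf_entry){ addr, value };
      tail = (tail + 1) % WBUF_SLOTS;
      count++;
      return 1;
  }

  /* Memory side: drain the oldest entry when the memory bus is free. */
  static int wbuf_drain(struct wbuf_entry *out) {
      if (count == 0) return 0;
      *out = wbuf[head];
      head = (head + 1) % WBUF_SLOTS;
      count--;
      return 1;
  }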

  47. Example of Mapping Techniques • Consider a data cache with 8 blocks of data • Each block of data consists of only one word • These are greatly simplified parameters • Consider a 4 x 10 array of numbers, arranged in column order • 40 elements (28h) stored from 7A00h to 7A27h
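
The column-order layout fixes each element's address; a small sketch of the mapping (word-addressed, as in the slides):

  #include <stdint.h>
  #include <stdio.h>

  /* A(i,j), 4 rows x 10 columns, stored column by column from 7A00h:
     each column occupies 4 consecutive words. */
  static uint16_t addr_of(int i, int j) {
      return (uint16_t)(0x7A00 + 4 * j + i);
  }

  int main(void) {
      printf("A(0,0) at %04Xh\n", (unsigned)addr_of(0, 0));  /* 7A00h */
      printf("A(3,9) at %04Xh\n", (unsigned)addr_of(3, 9));  /* 7A27h, the last word */
      return 0;
  }

Note that row 0 occupies addresses 7A00h, 7A04h, ..., 7A24h: every fourth word, so consecutive accesses to the row fall in different one-word blocks.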

  48. Example of Mapping Techniques • [Diagram: breakdown of the 16-bit address for the simplified cache; the tag is 13 bits for the direct-mapped cache (with a 3-bit block field), 15 bits for the set-associative cache (with a 1-bit set field selecting one of two 4-block sets), and all 16 bits for the fully associative cache]

  49. Example of Mapping Techniques • Consider the following algorithm • It computes the average of the first row (row 0), then replaces each element of that row with its value divided by that average

  SUM := 0
  for j := 0 to 9 do
      SUM := SUM + A(0,j)
  end
  AVE := SUM / 10
  for i := 9 downto 0 do
      A(0,i) := A(0,i) / AVE
  end

  50. Example of Mapping Techniques • [The slide repeats the code above while stepping through the cache contents for each mapping technique; those trace tables are not reproduced in the transcript]
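
For reference, a C rendering of the loops (a sketch with sample data; C arrays are row-major, so this mirrors the access pattern but not the slides' column-order addresses):

  #include <stdio.h>

  int main(void) {
      float A[4][10];                    /* 4 x 10 array; only row 0 is used */
      for (int i = 0; i < 4; i++)        /* fill with sample data            */
          for (int j = 0; j < 10; j++)
              A[i][j] = (float)(i + j + 1);

      float sum = 0.0f;                  /* first pass: read row 0 forward   */
      for (int j = 0; j <= 9; j++)
          sum += A[0][j];
      float ave = sum / 10.0f;

      for (int i = 9; i >= 0; i--)       /* second pass: rewrite row 0       */
          A[0][i] = A[0][i] / ave;       /* backward, reusing the same data  */

      printf("A[0][0] = %f\n", A[0][0]);
      return 0;
  }

The backward second pass re-touches the most recently read elements first, which favors policies that keep recently used blocks in the cache (temporal locality).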
