Presentation Transcript


  1. Chapter 5: 22540 - Computer Arch. & Org. (2) Memory Hierarchy

  2. Memory Hierarchy • Principle of Locality • Temporal Locality (Locality in Time) • Spatial Locality (Locality in Space) • Speed & Size

  3. Memory Hierarchy [Diagram: CPU ↔ Cache ↔ Main Memory ↔ Magnetic Disks, from fastest and smallest to slowest and largest]

  4. Cache Memory • High Speed (Towards CPU) • Conceals Slow Memory • Small Size (Low Cost) [Diagram: CPU ↔ Cache (fast, serves hits) ↔ Main Memory (slow, accessed on a miss); on a miss, Access Time = Cache Time + Memory Time; a typical hit ratio is about 95%]

  5. Cache Memory • CPU – Main Memory Address • Cache Size < Main Memory Size [Diagram: the CPU issues a 32-bit address for a 4 GB main memory, but a 1 MB cache is indexed by only 20 bits!]

  6. Cache Memory [Diagram: main memory locations 00000000 through 3FFFFFFF must be mapped onto cache locations 00000 through FFFFF; an address mapping is required!]

  7. Associative Memory [Diagram: each cache location stores a full main-memory address as a key alongside its data, so addresses such as 00012000, 08000000, and 15000000 can occupy any cache location]

  8. Associative Memory [Diagram: the cache can have any number of locations; each entry holds a 32-bit key (the main-memory address) and 8 bits of data, and a lookup for address 00012000 searches the keys]

  9. Associative Memory [Diagram: the incoming address is compared against every stored 32-bit key at once. How many comparators? One per cache location, since all entries must be checked in parallel]

  10. Associative Memory [Diagram: a valid bit is added to every entry, so a key comparison counts as a hit only when the entry is valid; entries still hold a 32-bit key and 8 bits of data]

  11. Associative Memory [Diagram: with 32-bit (4-byte) data blocks, the low 2 bits of the 32-bit address select a byte within the block, so the stored key shrinks to the upper 30 bits of the address]

  12. Direct Mapping Cache [Diagram: the address splits into a 20-bit index and a 12-bit tag; the index selects one cache entry (each holding a 12-bit tag and 8 bits of data) and the stored tag is compared with the address tag for match / no match. What happens for address tag 100, index 00040? The entry at 00040 holds tag 000, so the comparison fails and the access is a miss]

  13. Direct Mapping Cache [Diagram: with 32-bit (4-byte) data blocks, the address splits into a 2-bit byte select, an 18-bit index, and a 12-bit tag; the indexed entry's stored tag is compared with the address tag for match / no match]

  14. Set Associative Cache [Diagram: a 2-way set associative cache; the 20-bit index selects a set of two entries, each holding a 12-bit tag and 8 bits of data, and both stored tags are compared with the address tag in parallel for match / no match]

  15. Cache Size Example: Number of Blocks = 4 K Block Size = 4 Words Word Size = 32 bits Address Size = 32 bits Tag Bits (Direct Mapping Cache) = Tag Bits (2-Way Set Associative) = Tag Bits (4-Way Set Associative) = Tag Bits (Associative Cache) =
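
One way to work the slide-15 blanks, as a Python sketch; it assumes a byte-addressed memory, so a 4-word block of 32-bit words contributes 4 block-offset bits.

    from math import log2

    ADDRESS_BITS = 32
    NUM_BLOCKS   = 4 * 1024              # 4 K blocks
    BLOCK_BYTES  = 4 * 4                 # 4 words x 4 bytes per word
    OFFSET_BITS  = int(log2(BLOCK_BYTES))

    def tag_bits(ways):
        """Tag bits = address bits - set-index bits - block-offset bits."""
        index_bits = int(log2(NUM_BLOCKS // ways))
        return ADDRESS_BITS - index_bits - OFFSET_BITS

    print(tag_bits(1))            # direct mapping        -> 16 tag bits
    print(tag_bits(2))            # 2-way set associative -> 17 tag bits
    print(tag_bits(4))            # 4-way set associative -> 18 tag bits
    print(tag_bits(NUM_BLOCKS))   # fully associative     -> 28 tag bits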

  16. Block Size • Increasing Block Size • Utilizes Spatial Locality • Reduces the Number of Blocks

  17. Cache Performance Example: CPU CPI = 2 clocks/instruction Loads & Stores instructions = 36% Instruction Cache = 2% miss rate Data Cache = 4% miss rate Memory Miss Penalty = 100 clocks Instructions Penalties = Data Penalties = CPI (with penalties) = Perfect Cache Speedup =
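
Working the slide-17 blanks as a Python sketch; it assumes every instruction makes one instruction fetch and that the 36% of instructions that are loads or stores each make one data access.

    base_cpi     = 2.0
    ls_fraction  = 0.36    # loads & stores
    i_miss_rate  = 0.02
    d_miss_rate  = 0.04
    miss_penalty = 100     # clocks

    instr_penalty = i_miss_rate * miss_penalty                # 2.00 clocks per instruction
    data_penalty  = ls_fraction * d_miss_rate * miss_penalty  # 1.44 clocks per instruction
    cpi_with_penalties = base_cpi + instr_penalty + data_penalty    # 5.44
    perfect_cache_speedup = cpi_with_penalties / base_cpi           # 2.72
    print(instr_penalty, data_penalty, cpi_with_penalties, perfect_cache_speedup)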

  18. Cache Performance • Average Memory Access Time (AMAT) • AMAT = Time for a Hit + Miss Rate × Miss Penalty Example: Clock Cycle = 1 ns Cache Access Time (Hit) = 1 ns Cache Miss Penalty = 20 Clocks Miss Rate = 5% AMAT =
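
A quick evaluation of the AMAT formula with the slide-18 numbers (a sketch; the 20-clock miss penalty is converted to time using the 1 ns clock):

    clock_ns        = 1.0
    hit_time_ns     = 1.0
    miss_rate       = 0.05
    miss_penalty_ns = 20 * clock_ns

    amat_ns = hit_time_ns + miss_rate * miss_penalty_ns   # 1 + 0.05 * 20 = 2.0 ns
    print(amat_ns)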

  19. Cache Misses Example: Block Address Sequence = 0, 8, 0, 6, 8 Cache Size = 4 blocks (Direct Mapping) References 0, 8, 0, 6, 8 → Miss, Miss, Miss, Miss, Miss [Diagram: blocks 0 and 8 both map to cache line 0 and keep evicting each other, while block 6 maps to line 2]

  20. Cache Misses Example: Block Address Sequence = 0, 8, 0, 6, 8 Cache Size = 2 blocks (2-Way Set Associative) References 0, 8, 0, 6, 8 → Miss, Miss, Hit, Miss, Miss [Diagram: a single set of two ways; with LRU replacement, block 6 evicts 8 and block 8 then evicts 0]

  21. Cache Misses Example: Block Address Sequence = 0, 8, 0, 6, 8 Cache Size = 4 blocks (Associative) References 0, 8, 0, 6, 8 → Miss, Miss, Hit, Miss, Hit [Diagram: all referenced blocks fit in the cache, so only the first access to each of 0, 8, and 6 misses]
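
The three miss sequences on slides 19 through 21 can be reproduced with a small simulator; the sketch below assumes LRU replacement within each set, which matches the results shown.

    from collections import OrderedDict

    def simulate(refs, num_blocks, ways):
        sets = [OrderedDict() for _ in range(num_blocks // ways)]
        outcome = []
        for block in refs:
            s = sets[block % len(sets)]
            if block in s:
                s.move_to_end(block)        # refresh LRU order on a hit
                outcome.append("Hit")
            else:
                if len(s) == ways:
                    s.popitem(last=False)   # evict the least recently used block
                s[block] = True
                outcome.append("Miss")
        return outcome

    refs = [0, 8, 0, 6, 8]
    print(simulate(refs, num_blocks=4, ways=1))  # direct mapping:    5 misses
    print(simulate(refs, num_blocks=2, ways=2))  # 2-way, 2 blocks:   Miss Miss Hit Miss Miss
    print(simulate(refs, num_blocks=4, ways=4))  # fully associative: Miss Miss Hit Miss Hit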

  22. Instruction Cache • Cache Miss • Send original PC value to memory • Perform a read operation • Wait for the cache to receive the instruction • Restart instruction execution

  23. Data Cache Writes • Write-Through • Consistent Copies • Slow [Diagram: every write travels from the CPU through the cache to main memory]

  24. Data Cache Writes • Write-Through • Consistent Copies • Slow Example: CPI without miss = 1 Memory delays = 100 clocks 10% of memory references are writes Overall CPI =
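
Working the slide-24 blank as a sketch; it reads the 10% as the fraction of instructions whose write must wait the full 100 clocks for memory (no write buffer).

    base_cpi       = 1.0
    write_fraction = 0.10
    write_stall    = 100    # clocks per write-through to memory

    overall_cpi = base_cpi + write_fraction * write_stall   # 1 + 10 = 11
    print(overall_cpi)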

  25. Data Cache Writes • Write-Through with Write Buffer • Buffer size • Fill-Rate and Mem-Rate (Possible Stall) [Diagram: writes are queued in a buffer between the cache and main memory; the CPU stalls only if the buffer fills faster than memory can drain it]

  26. Data Cache Writes • Write-Back • Fast • Complex & inconsistent copies [Diagram: writes update only the cache; main memory is updated when the modified block is replaced]

  27. Data Cache Writes • Write-Back • Fast • Complex & inconsistent copies [Diagram: on a miss, the modified block being replaced is written back to main memory before the new block is fetched]

  28. Data Cache Writes • Write-Back with Buffer • Reduces the “Miss” penalty by 50% [Diagram: the evicted modified block is placed in a buffer so the new block can be fetched immediately; the write-back to memory completes afterwards]

  29. Cache Replacement Policies • First In First Out (FIFO) • Simple • May replace a heavily used block, causing an avoidable miss • Least Recently Used (LRU) • More complex • Better hit rate (see the sketch below)
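
A small illustration of the FIFO versus LRU contrast on one 2-block set; the reference string here is made up to show LRU's advantage, and real caches approximate LRU in hardware rather than keeping exact order.

    from collections import OrderedDict, deque

    def fifo_hits(refs, ways):
        q, hits = deque(), 0
        for b in refs:
            if b in q:
                hits += 1                  # FIFO does not reorder on a hit
            else:
                if len(q) == ways:
                    q.popleft()            # evict the oldest block
                q.append(b)
        return hits

    def lru_hits(refs, ways):
        d, hits = OrderedDict(), 0
        for b in refs:
            if b in d:
                hits += 1
                d.move_to_end(b)           # a hit makes the block most recent
            else:
                if len(d) == ways:
                    d.popitem(last=False)  # evict the least recently used block
                d[b] = True
        return hits

    refs = [0, 8, 0, 0, 6, 0, 8]
    print(fifo_hits(refs, 2), lru_hits(refs, 2))   # FIFO: 2 hits, LRU: 3 hits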

  30. Multilevel Cache Example: CPU CPI = 1 @ 4 GHz → 0.25 ns Clock Primary Cache Miss Rate = 2% Memory Access Time = 100 ns → 400 Clocks CPI (Single Level Cache) = Secondary Cache Miss Rate = 0.5% Secondary Cache Access Time = 5 ns → 20 Clocks CPI (2-Level Cache) =
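
Working the slide-30 blanks as a sketch; the 0.5% is read as the fraction of all references that miss in both caches and go to main memory.

    base_cpi         = 1.0
    clock_ns         = 0.25                     # 4 GHz clock
    mem_penalty      = int(100 / clock_ns)      # 100 ns -> 400 clocks
    l2_penalty       = int(5 / clock_ns)        # 5 ns   -> 20 clocks
    l1_miss_rate     = 0.02
    miss_rate_to_mem = 0.005

    cpi_single_level = base_cpi + l1_miss_rate * mem_penalty    # 1 + 8 = 9
    cpi_two_level    = (base_cpi + l1_miss_rate * l2_penalty
                        + miss_rate_to_mem * mem_penalty)       # 1 + 0.4 + 2 = 3.4
    print(cpi_single_level, cpi_two_level)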

  31. Main Memory • Latency & Bandwidth • Address (Selection of row & column) • Data Transfer (Number of bits) Example: Send Address = 1 clock Memory Access = 15 clocks Transfer a 32-bit Word = 1 clock Cache Block = 4 words Cache Miss Memory Bandwidth =
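
A sketch of the slide-31 blank for a one-word-wide memory, where each of the four words in the block pays the full access and transfer time:

    addr_clocks     = 1
    access_clocks   = 15
    transfer_clocks = 1
    words_per_block = 4
    bytes_per_word  = 4

    miss_penalty = addr_clocks + words_per_block * (access_clocks + transfer_clocks)  # 65 clocks
    bandwidth    = words_per_block * bytes_per_word / miss_penalty   # about 0.25 bytes per clock
    print(miss_penalty, round(bandwidth, 2))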

  32. Main Memory • Simple Design • Wide Bus • Interleaved [Diagram: three organizations: a one-word-wide memory and bus; a wider memory and bus that move several words per transfer; and multiple interleaved memory banks sharing a one-word bus]
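
A rough comparison of the three organizations using the slide-31 timing; the two-word bus width and the one-bank-per-word assumption below are illustrative choices, not values given on the slide.

    ADDR, ACCESS, XFER, WORDS = 1, 15, 1, 4    # clocks and words per block

    def one_word_wide():
        return ADDR + WORDS * (ACCESS + XFER)          # 1 + 4 * 16 = 65 clocks

    def wide_bus(bus_words):
        transfers = -(-WORDS // bus_words)             # ceiling division
        return ADDR + transfers * (ACCESS + XFER)      # 2-word bus: 1 + 2 * 16 = 33 clocks

    def interleaved():
        # with at least one bank per word, the accesses overlap and only the
        # word transfers are serialized on the one-word bus
        return ADDR + ACCESS + WORDS * XFER            # 1 + 15 + 4 = 20 clocks

    print(one_word_wide(), wide_bus(2), interleaved())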

  33. DRAM Technology

  34. Virtual Memory • Allow Efficient & Safe Sharing of Memory • Memory Protection • Program Relocatability • Remove Programming Burdens of Small Memory • Much Larger Memory Space • Reuse Physical Memory

  35. Virtual Memory Segmentation • Segments • Variable Size • Two-Part Address [Diagram: a virtual address is a (segment number, offset) pair; translation maps each variable-size segment onto a region of physical memory]

  36. Virtual Memory Paging • Virtual Memory • Pages • Stored on Disk • Virtual Address • Physical Memory • Frames • Stored in RAM • Physical Address • Page Faults [Diagram: fixed-size virtual pages are mapped onto physical frames in RAM; a reference to a page that is not in RAM causes a page fault and the page is brought in from disk]

  37. Virtual Memory Paging • Address Translation [Diagram: a 32-bit virtual address = virtual page number (bits 31–12) + page offset (bits 11–0); translation replaces the virtual page number with a physical page number (bits 23–12) while the page offset is copied unchanged]
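
A bit-level sketch of the slide-37 translation; the page table here is just a dictionary from virtual page number to physical page number, and the mappings are hypothetical.

    PAGE_OFFSET_BITS = 12                                # 4 KB pages
    page_table = {0x00000: 0x001, 0x00001: 0x000}        # hypothetical mappings

    def translate(virtual_address):
        vpn    = virtual_address >> PAGE_OFFSET_BITS
        offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
        ppn    = page_table[vpn]                         # a missing entry would be a page fault
        return (ppn << PAGE_OFFSET_BITS) | offset

    print(hex(translate(0x00001ABC)))                    # vpn 0x1 -> ppn 0x0, prints 0xabc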

  38. Paging Table • Page Table • Virtual to Physical Page Number Translation • Stored in RAM • Page Table Register [Diagram: the page table register points to the start of the running process's page table in RAM]

  39. Paging Table • Page Faults • Swap Space • Reserved space for the full virtual memory space of a process • Stored on Disk • Page Table • LRU Replacement Scheme [Diagram: page table entries for virtual pages that are not in RAM point to the pages' locations in swap space on disk]

  40. Paging Table • Page Table Size Example: Virtual Address: 32 bits Page Size: 4 KB Page Table: 4 Bytes/Entry Number of Pages = Page Table Size =
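
Filling in the slide-40 blanks as a sketch, assuming one page table entry per virtual page:

    virtual_address_bits = 32
    page_size_bytes      = 4 * 1024        # 4 KB pages
    bytes_per_entry      = 4

    number_of_pages = 2 ** virtual_address_bits // page_size_bytes   # 2^20 = 1,048,576 pages
    page_table_size = number_of_pages * bytes_per_entry              # 4 MB per process
    print(number_of_pages, page_table_size // 2 ** 20, "MB")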

  41. Translation-Lookaside Buffer (TLB) • Address Translation Cache [Diagram: the TLB caches recently used virtual-to-physical page number translations from the page table]

  42. Virtual Memory Misses • TLB Miss • Page Fault • Cache Miss [Diagram: a virtual address is first looked up in the TLB; on a TLB miss the page table is consulted and the TLB is updated, and a page that is not in memory raises a page fault; the resulting physical address then accesses the cache, which can itself hit or miss]

  43. Memory Hierarchy Misses • Compulsory Miss • Capacity Miss • Conflict Miss

  44. Parallelism & Cache Coherence • Coherence • What values can be returned by a read • Consistency • When a written value will be returned by a read [Diagram: two processors each cache the same memory location holding 0; after one processor writes 1, the other processor's cached copy still holds the stale 0]

  45. Cache Coherence Enforcement • Migration (of Data to Local Caches) • Reduces access latency & the bandwidth demand on shared memory. • Replication (of Read-Shared Data) • Reduces latency & contention for read access.

  46. Cache Coherence Protocol • Snooping • Each cache monitors bus reads/writes. • Processors exchange full blocks. • Large block sizes may lead to false sharing. [Diagram: when one processor writes 1 to a shared block, an invalidate is broadcast on the bus and the other cache drops its stale copy of 0]
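
A minimal write-invalidate snooping sketch matching the slide-46 picture; it is illustrative only, with write-through memory for simplicity, while real protocols such as MESI track more states and handle write-backs.

    class Bus:
        def __init__(self):
            self.caches, self.memory = [], {}

        def invalidate(self, addr, writer):
            # snoop: every other cache drops its copy of the written block
            for c in self.caches:
                if c is not writer and addr in c.lines:
                    c.lines[addr] = (False, None)

    class Cache:
        def __init__(self, bus):
            self.lines = {}                  # block address -> (valid, value)
            bus.caches.append(self)

        def read(self, bus, addr):
            valid, value = self.lines.get(addr, (False, None))
            if not valid:
                value = bus.memory.get(addr, 0)          # fetch from memory on a miss
                self.lines[addr] = (True, value)
            return self.lines[addr][1]

        def write(self, bus, addr, value):
            bus.invalidate(addr, writer=self)            # broadcast the invalidate
            self.lines[addr] = (True, value)
            bus.memory[addr] = value                     # write-through, for simplicity

    bus = Bus()
    p0, p1 = Cache(bus), Cache(bus)
    p0.read(bus, 0x40); p1.read(bus, 0x40)   # both caches hold the block with value 0
    p0.write(bus, 0x40, 1)                   # p1's stale copy is invalidated
    print(p1.read(bus, 0x40))                # p1 misses, re-fetches, and prints 1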

  47. Cache Coherence Protocol • Directory-based protocols • Caches and memory record sharing status of blocks in a directory.

  48. Chapter 5 The End
