
Large and Fast: Exploiting Memory Hierarchy



  1. Large and Fast: Exploiting Memory Hierarchy

  2. Memory Technology
  • Static RAM (SRAM): 0.5 ns – 2.5 ns, $2000 – $5000 per GB
  • Dynamic RAM (DRAM): 50 ns – 70 ns, $20 – $75 per GB
  • Magnetic disk: 5 ms – 20 ms, $0.20 – $2 per GB
  • Ideal memory: the access time of SRAM with the capacity and cost/GB of disk

  3. Principle of Locality
  • Programs access a small proportion of their address space at any time
  • Temporal locality: items accessed recently are likely to be accessed again soon (e.g., instructions in a loop, induction variables)
  • Spatial locality: items near those accessed recently are likely to be accessed soon (e.g., sequential instruction access, array data)
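
A small sketch in Python (illustrative, not from the slides) showing both kinds of locality: the accumulator s is reused on every iteration (temporal), and the array is read in consecutive order (spatial), so each cache block fetched serves several iterations.

    # Temporal locality: s and i are touched on every iteration.
    # Spatial locality: a[0], a[1], ... are adjacent in memory.
    a = list(range(1024))
    s = 0
    for i in range(len(a)):
        s += a[i]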

  4. Taking Advantage of Locality
  • Memory hierarchy: store everything on disk
  • Copy recently accessed (and nearby) items from disk to a smaller DRAM memory (main memory)
  • Copy more recently accessed (and nearby) items from DRAM to a smaller SRAM memory (cache memory attached to the CPU)

  5. Mapping Function
  • Cache size = 64 KB = 2^16 bytes
  • Cache block = 4 B = 2^2 bytes = 1 word
  • Number of lines = 2^16 / 2^2 = 2^14
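
The same arithmetic as a quick Python sketch, using the sizes from the slide:

    cache_size = 64 * 1024        # 64 KB = 2**16 bytes
    block_size = 4                # 4 B = 2**2 bytes = 1 word
    num_lines = cache_size // block_size
    assert num_lines == 2**14     # 16K cache lines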

  6. [Figure: address divided into tag (t bits), line, and byte-offset (b bits) fields]

  7. [Figure: diagram built up in steps 1-3]

  8. Direct-mapped example
  • Address = 0001 0110 0011 0011 1001 1100
  • Tag = 0001 0110 = 16 (hex)
  • Line# = 00110011100111 = 0CE7 (hex)
  • Byte offset = 00
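
A hedged check of the same split in Python, assuming the slide's 24-bit address with an 8-bit tag, 14-bit line number, and 2-bit byte offset:

    addr = 0b000101100011001110011100   # 24-bit address from the slide
    byte = addr & 0b11                  # low 2 bits: byte within word
    line = (addr >> 2) & 0x3FFF         # next 14 bits: line number
    tag  = addr >> 16                   # top 8 bits: tag
    assert line == 0x0CE7 and tag == 0x16 and byte == 0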

  9. [Figure: diagram built up in steps 1-3]

  10. XNOR = NOT(XOR): a per-bit XNOR outputs 1 when its two inputs are equal, so ANDing the XNORs of all tag bits signals a tag match
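
A tiny sketch of that comparator idea in Python (the gate-level structure is only mimicked here): two tags match exactly when their XOR is all zeros.

    def tags_match(a: int, b: int, width: int = 8) -> bool:
        # (a ^ b) has a 0 wherever the bits agree; a full match
        # means the XOR is all zeros (equivalently, all XNORs are 1).
        return (a ^ b) & ((1 << width) - 1) == 0

    assert tags_match(0x16, 0x16)
    assert not tags_match(0x16, 0x17)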

  11. Next slide

  12. Fully associative example
  • Address = 000101100011001110011100
  • Tag = top 22 bits = 0001011000110011100111 = 058CE7 (hex); byte offset = 00
  • (For comparison, the direct-mapped Line# = 00110011100111 = 0CE7)
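
The fully associative split for the same address, sketched under the slide's assumption of a 22-bit tag and 2-bit byte offset:

    addr = 0b000101100011001110011100
    byte = addr & 0b11       # low 2 bits: byte within word
    tag  = addr >> 2         # remaining 22 bits are all tag
    assert tag == 0x058CE7 and byte == 0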

  13. Set Associative Cache
  • A hybrid between a fully associative cache and a direct-mapped cache
  • Considered a reasonable compromise between the complex hardware needed for fully associative caches (parallel searches of all slots) and the simplistic direct-mapped scheme, which can cause collisions of addresses mapping to the same slot

  14. Set Associative Cache
  • Parking lot analogy: 1000 parking spots (0-999), so 3 digits are needed to find a spot
  • Instead, use 2 digits (say, the last 2 digits of your SSN) to hash into 100 sets of 10 spots
  • Once you find your set, you can park in any of its 10 spots

  15. Finding a Set
  • There are 2^13 sets (8K sets), giving a 2-way set-associative cache
  • Use the set bits (b14 - b2) to index a set, then compare the 9-bit tag against each slot in the set in parallel
  • If found (hit), get the word and select the byte with b1 - b0
  • Otherwise (miss), decide which slot in the set to fill with the data from memory
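
A minimal 2-way set-associative lookup in Python, assuming the slide's geometry (9-bit tag, 13-bit set index, 2-bit byte offset); the names and the trivial replacement policy are illustrative only.

    NUM_SETS = 2**13
    sets = [[None, None] for _ in range(NUM_SETS)]   # each way holds a tag

    def lookup(addr: int) -> bool:
        set_idx = (addr >> 2) & (NUM_SETS - 1)   # 13 set bits (b14-b2)
        tag = addr >> 15                         # 9 tag bits
        slots = sets[set_idx]
        if tag in slots:          # compare both ways "in parallel"
            return True           # hit
        slots[0] = tag            # miss: fill a slot (real caches use LRU)
        return False

    addr = 0b000101100011001110011100
    assert not lookup(addr)   # first access misses
    assert lookup(addr)       # second access hits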

  16. [Figure: diagram built up in steps 1-3]

  17. [Figure: set-associative cache organization with Tag, Data, and Set columns]

  18. 2-way set-associative example (Set#0 and Set#1 are the two ways of each set)
  • Address fields: tag (9 bits), Set# (13 bits), byte offset (2 bits); the 14-bit Line# drops its top bit to become the 13-bit Set#
  • Address = 000101100011001110011100
  • Tag = 000101100 = 02C (hex)
  • Set# = 0110011100111 = 0CE7 (hex); for comparison, Line# = 00110011100111 = 0CE7
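
The same field split, checked in Python under the slide's 9/13/2 bit layout:

    addr = 0b000101100011001110011100
    byte    = addr & 0b11              # 2-bit byte offset
    set_idx = (addr >> 2) & 0x1FFF     # 13 set bits
    tag     = addr >> 15               # 9 tag bits
    assert tag == 0x02C and set_idx == 0x0CE7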

  19. Measuring Cache Performance
  • Components of CPU time:
  • Program execution cycles (includes cache hit time)
  • Memory stall cycles (mainly from cache misses)
  • With simplifying assumptions:
    Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                        = (Instructions / Program) × (Misses / Instruction) × Miss penalty

  20. Cache Performance Example
  • Given: I-cache miss rate = 2%, D-cache miss rate = 4%, miss penalty = 100 cycles, base CPI (ideal cache) = 2, loads & stores are 36% of instructions
  • Miss cycles per instruction:
  • I-cache: 0.02 × 100 = 2
  • D-cache: 0.36 × 0.04 × 100 = 1.44
  • Actual CPI = 2 + 2 + 1.44 = 5.44
  • The ideal CPU is 5.44/2 = 2.72 times faster
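
The example's arithmetic as a short sketch:

    base_cpi = 2.0
    miss_penalty = 100
    i_miss_rate, d_miss_rate = 0.02, 0.04
    mem_frac = 0.36                                   # loads/stores per instruction

    i_stalls = i_miss_rate * miss_penalty             # 2.0 stall cycles/instruction
    d_stalls = mem_frac * d_miss_rate * miss_penalty  # 1.44 stall cycles/instruction
    cpi = base_cpi + i_stalls + d_stalls              # 5.44
    print(cpi, cpi / base_cpi)                        # 5.44, 2.72x faster with an ideal cache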
