
Memory hierarchies



Presentation Transcript


  1. Memory hierarchies An introduction to the use of cache memories as a technique for exploiting the ‘locality-of-reference’ principle

  2. Recall system diagram [diagram: CPU and main memory attached to the system bus, along with several I/O devices]

  3. SRAM versus DRAM
  • Random Access Memory uses two distinct technologies: ‘static’ and ‘dynamic’
  • Static RAM is much faster
  • Dynamic RAM is much cheaper
  • Static RAM retains its data without refresh for as long as power is applied
  • Dynamic RAM must be refreshed continually, or its stored charge leaks away
  • Our workstations have 1GB of DRAM, but only 512KB of SRAM (ratio: 2048 to 1)

  4. Data access time
  • The time-period between the supplying of the data access signal and the output (or acceptance) of the data by the addressed subsystem
  • FETCH-OPERATION: CPU places the address of a memory-location on the system bus, and the memory-system responds by placing that memory-location’s data on the system bus

  5. Comparative access timings
  • Processor registers: 5 nanoseconds
  • SRAM memory: 15 nanoseconds
  • DRAM memory: 60 nanoseconds
  • Magnetic disk: 10 milliseconds
  • Optical disk: (even slower)

  6. System design decisions
  • How much memory?
  • How fast?
  • How expensive?
  • Tradeoffs and compromises
  • Modern systems employ a ‘hybrid’ design in which small, fast, expensive SRAM is supplemented by larger, slower, cheaper DRAM

  7. Memory hierarchy [diagram: CPU connected to cache memory by ‘word’ transfers; cache memory connected to main memory by ‘burst’ transfers]

  8. Instructions versus Data
  • Modern system designs frequently use a pair of separate cache memories, one for storing processor instructions and another for storing the program’s data
  • Why is this a good idea?
  • Let’s examine the way a cache works

  9. Fetching program-instructions
  • CPU fetches its next program-instruction by placing the value from register EIP on the system bus and issuing a ‘read’ signal
  • If that instruction has not previously been fetched, it will not be available in the ‘fast’ cache memory, so it must be fetched from the ‘slower’ main memory; however, it will be ‘saved’ in the cache in case it’s needed again very soon (e.g., in a program loop)

  10. Most programs use ‘loops’
  // EXAMPLE:
  int sum = 0, i = 0;
  do {
      sum += array[ i ];
      ++ i;
  } while ( i < 64 );

  11. Assembly language loop
          movl    $0, %eax                 # initialize accumulator
          movl    $0, %esi                 # initialize array-index
  again:
          addl    array(, %esi, 4), %eax   # add next int
          incl    %esi                     # increment array-index
          cmpl    $64, %esi                # check for exit-condition
          jae     finis                    # index is beyond bounds
          jmp     again                    # else do another addition
  finis:

  12. Benefits of cache-mechanism
  • During the first pass through the loop, all of the loop-instructions must be fetched from the (slow) DRAM memory
  • But loops are typically short – maybe only a few dozen bytes – and multiple bytes can be fetched together in a single ‘burst’
  • Thus subsequent passes through the loop can fetch all of these same instructions from the (fast) SRAM memory
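  This speedup is easy to observe with a simple timing experiment. A minimal sketch, not from the original slides (the array size is an assumption chosen to fit within a typical cache, and the same effect applies to data accesses as to instruction fetches):

  // warmcache.cpp -- time a first (cold-cache) pass over an array
  // versus a second (warm-cache) pass
  #include <chrono>
  #include <cstdio>

  enum { N = 64 * 1024 };                  // 256KB of ints (assumed to fit in cache)
  static int array[N];

  static long long pass_usec(void) {
      auto start = std::chrono::steady_clock::now();
      volatile long sum = 0;               // 'volatile' keeps the loop from being optimized away
      for (int i = 0; i < N; ++i)
          sum += array[i];
      auto stop = std::chrono::steady_clock::now();
      return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
  }

  int main() {
      printf("first  pass (cold cache): %lld usec\n", pass_usec());
      printf("second pass (warm cache): %lld usec\n", pass_usec());
  }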

  13. ‘Locality-of-Reference’
  • The ‘locality-of-reference’ principle says that, with most computer programs, when the CPU fetches an instruction to execute, it is very likely that the next instruction it fetches will be located at a nearby address
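  Locality comes in two flavors: temporal (the same address is accessed again soon, as in the loop above) and spatial (nearby addresses are accessed soon). A minimal sketch of the spatial case, with assumed sizes (the array is made far larger than the cache; 8 ints fill one 32-byte Pentium cache-line):

  // stride.cpp -- spatial locality: sequential accesses reuse a cache-line
  // that has already been fetched, while accesses 32 bytes apart touch a
  // brand-new line every time
  #include <chrono>
  #include <cstdio>

  enum { N = 16 * 1024 * 1024 };           // 64MB of ints, far larger than the cache
  static int data[N];

  static long long sweep_usec(int stride) {
      auto start = std::chrono::steady_clock::now();
      volatile long sum = 0;
      for (int i = 0; i < N; i += stride)
          sum += data[i];
      auto stop = std::chrono::steady_clock::now();
      return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
  }

  int main() {
      // stride 8 performs 1/8th the additions of stride 1, yet typically runs
      // far less than 8x faster: nearly every access is now a cache miss
      printf("stride 1: %lld usec\n", sweep_usec(1));
      printf("stride 8: %lld usec\n", sweep_usec(8));
  }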

  14. Accessing data-operands
  • Accessing a program’s data differs from accessing its instructions in that the data consists of ‘variables’ as well as ‘constants’
  • Instructions are constant, but data may be modified (i.e., a program may ‘write’ as well as ‘read’)
  • This introduces the problem known as ‘cache coherency’ – if a new value gets assigned to a variable, its cached value will be modified, but what will happen to its original memory-value?

  15. Cache ‘write’ policies
  • ‘Write-Through’ policy: a modified variable in the cache is immediately copied back to the original memory-location, to ensure that the memory and the cache are kept ‘consistent’ (synchronized)
  • ‘Write-Back’ policy: a modified variable in the cache is not immediately copied back to its original memory-location – some values nearby might also get changed soon, in which case all adjacent changes could be done in one ‘burst’
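  The two policies are easy to contrast in pseudo-hardware form. A minimal sketch for a single cache-line (hypothetical types; in a real machine this logic lives in the cache controller, not in software):

  // writepolicy.cpp -- contrast of the two 'write' policies for one line
  #include <cstdint>
  #include <cstring>

  enum { LINE_SIZE = 32 };                 // Pentium cache-line size (slide 22)

  struct CacheLine {
      uint32_t tag;
      bool     valid;
      bool     dirty;                      // used only by the write-back policy
      uint8_t  data[LINE_SIZE];
  };

  // Write-through: update the cached copy AND main memory immediately
  void write_through(CacheLine &line, uint8_t *memory, uint32_t addr, uint8_t value) {
      line.data[addr % LINE_SIZE] = value;
      memory[addr] = value;                // cache and memory stay consistent
  }

  // Write-back: update only the cached copy and mark the line 'dirty';
  // main memory is brought up-to-date later, in one burst, at eviction
  void write_back(CacheLine &line, uint32_t addr, uint8_t value) {
      line.data[addr % LINE_SIZE] = value;
      line.dirty = true;                   // memory now holds a stale value
  }

  void evict(CacheLine &line, uint8_t *memory, uint32_t line_base) {
      if (line.dirty)                      // all adjacent changes go back at once
          memcpy(&memory[line_base], line.data, LINE_SIZE);
      line.valid = line.dirty = false;
  }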

  16. Locality-of-reference for data
  // EXAMPLE (from array-processing):
  int price[64], quantity[64], revenue[64];
  for (int i = 0; i < 64; i++)
      revenue[ i ] = price[ i ] * quantity[ i ];

  17. Cache ‘hit’ or cache ‘miss’
  • When the CPU accesses an address, it first looks in the (fast) cache to see if the address’s value is perhaps already there
  • If the CPU finds that the address’s value is in the cache, this is called a ‘cache hit’
  • Otherwise, if the CPU finds that the value it wants to access is not presently in the cache, this is called a ‘cache miss’
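  What ‘looking in the cache’ means concretely: part of the address selects a line-slot, and the rest is compared against a stored tag. A minimal direct-mapped sketch (the organization is an assumption; the 512KB figure echoes slide 3):

  // lookup.cpp -- how a direct-mapped cache decides 'hit' versus 'miss'
  #include <cstdint>

  enum { LINE_SIZE = 32, NUM_LINES = 512 * 1024 / LINE_SIZE };

  struct Line { uint32_t tag; bool valid; };
  static Line cache[NUM_LINES];

  bool is_hit(uint32_t addr) {
      uint32_t block = addr / LINE_SIZE;        // which memory block holds this address
      uint32_t index = block % NUM_LINES;       // which cache slot that block maps to
      uint32_t tag   = block / NUM_LINES;       // identifies the block uniquely
      return cache[index].valid && cache[index].tag == tag;
  }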

  18. Cache-Line Replacement
  • Small-size cache-memory quickly fills up
  • So some ‘stale’ items need to be removed in order to make room for the ‘fresh’ items
  • Various policies exist for ‘replacement’
  • But ‘write-back’ of any data in the cache which is ‘inconsistent’ with data in memory can safely be deferred until it’s time for the cached data to be ‘replaced’
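  Continuing the direct-mapped sketch above: in a direct-mapped cache the replacement ‘policy’ is trivial, since each block can live in only one slot, and eviction is also the moment when any deferred write-back finally happens (again a sketch with assumed sizes):

  // replace.cpp -- cache-line replacement, with write-back deferred to eviction
  #include <cstdint>
  #include <cstring>

  enum { LINE_SIZE = 32, NUM_LINES = 512 * 1024 / LINE_SIZE };

  struct Line { uint32_t tag; bool valid; bool dirty; uint8_t data[LINE_SIZE]; };
  static Line cache[NUM_LINES];

  void handle_miss(uint32_t addr, uint8_t *memory) {
      uint32_t block = addr / LINE_SIZE;
      uint32_t index = block % NUM_LINES;
      Line &line = cache[index];
      if (line.valid && line.dirty) {           // the 'stale' occupant was modified:
          uint32_t old_block = line.tag * NUM_LINES + index;
          memcpy(&memory[old_block * LINE_SIZE], line.data, LINE_SIZE);  // write it back now
      }
      memcpy(line.data, &memory[block * LINE_SIZE], LINE_SIZE);  // burst-fill the 'fresh' block
      line.tag   = block / NUM_LINES;
      line.valid = true;
      line.dirty = false;
  }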

  19. Performance impact
  [graph: average access time y versus cache-hit frequency x, falling linearly from 75ns at x = 0 to 15ns at x = 1]
  y = 75 – 60x  (y = average access time in nanoseconds; x = proportion of accesses that are cache-hits)
  Examples: if x = .90, then y = 21ns; if x = .95, then y = 18ns
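  The line y = 75 – 60x falls out of weighting the two cases using the timings from slide 5: a hit costs the 15ns SRAM access, and (as the 75ns intercept implies) a miss costs that failed lookup plus the 60ns DRAM access. A one-function sketch:

  // avgtime.cpp -- derivation of y = 75 - 60x
  double avg_access_ns(double hit_rate) {
      const double hit_ns  = 15.0;              // SRAM access (slide 5)
      const double miss_ns = 15.0 + 60.0;       // failed SRAM lookup, then DRAM access
      return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns;   // 15x + 75(1-x) = 75 - 60x
  }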

  20. Control Register 0
  • Privileged software (e.g., a kernel module) can disable Pentium’s cache-mechanism (by using some inline assembly language):
  asm(" movl %cr0, %eax ");                // read current CR0 value
  asm(" orl $0x60000000, %eax ");          // set CD (bit 30) and NW (bit 29)
  asm(" movl %eax, %cr0 ");                // write it back: cache now disabled
  [diagram: CR0 bit layout – PG=31, CD=30, NW=29, AM=18, WP=16, NE=5, ET=4, TS=3, EM=2, MP=1, PE=0]

  21. In-class demonstration
  • Install ‘nocache.c’ module to turn off cache
  • Execute ‘cr0.cpp’ to verify cache disabled
  • Run ‘sieve.cpp’ using the ‘time’ command to measure performance w/o caching
  • Remove ‘nocache’ to re-enable the cache
  • Run ‘sieve.cpp’ again to measure speedup
  • Run ‘fsieve.cpp’ for comparing the speeds of memory-access versus disk-access

  22. In-class exercises
  • Try modifying the ‘sieve.cpp’ source-code so that each array-element is stored at an address which is in a different cache-line (cache-line size on Pentium is 32 bytes) – one possible approach is sketched below
  • Try modifying the ‘fsieve.cpp’ source-code to omit the file’s ‘O_SYNC’ flag; also try shrinking the ‘m’ parameter so that more than one element is stored in a disk-sector (size of each disk-sector is 512 bytes)
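  One possible approach to the first exercise (a sketch only; sieve.cpp itself is not shown here, so its data layout and the element count below are assumptions):

  // Pad each sieve element out to a full 32-byte Pentium cache-line, so
  // consecutive elements land in different lines and spatial locality is lost
  struct PaddedElement {
      int  value;
      char pad[32 - sizeof(int)];       // filler to reach the 32-byte line size
  };
  PaddedElement sieve[8192];            // same element count, 8x the footprint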
