CSC 405 Computer Organization

CSC 405 Computer Organization Cache Memory Performance Analysis

The Principle of Locality

For a typical program in execution, the principle of locality states that memory references tend to cluster in both position (spatial locality) and time (temporal locality). With high probability, the next reference to a word in memory will be close to the previous reference. Also, words of memory that have been used recently are more likely to be used again.

[Figure: pattern of storage references for a typical program during execution. Horizontal axis: time. Vertical axis: memory address (page #). Figure reference: IBM Systems Journal, 1971.]

Notice that in any specific time interval, and for a significant duration, the memory locations being accessed are not random and make up a relatively small fraction of the complete program. These patterns of memory reference show the working sets of the program.

Two-Level Memory

[Figure: CPU, cache, primary memory, and secondary storage, with the labels 500 MHz, 100 MHz, and 1 Mbyte/S]

The locality property can be exploited to improve system performance. Specifically, locality makes the effective use of a hierarchical memory system possible. The time required for the CPU to obtain a word from secondary storage (e.g. a hard drive) is around 1000 times longer than when the word is in primary memory (RAM). The CPU can access a word in cache memory around 5 to 10 times faster than a word in primary memory. These relative speeds, called system access times, are primarily driven by bus clock rates and memory block transfer sizes.

System access time is not the access time of the memory itself as quoted by chip manufacturers. The quoted memory access time for a DIMM or SIMM is the time required to move a word in memory into the memory buffer register on the DIMM or SIMM itself.
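To make the link between locality and cache effectiveness concrete, here is a minimal sketch that counts how often a reference is already in the cache for two address streams. It assumes a simple direct-mapped cache, which the slides do not specify; the function name `hit_ratio` and the parameters `num_lines` and `block_size` are hypothetical and chosen only for illustration.

```python
import random


def hit_ratio(addresses, num_lines=64, block_size=16):
    """Fraction of references found in a direct-mapped cache (illustrative model)."""
    lines = [None] * num_lines          # block number currently held by each cache line
    hits = 0
    for addr in addresses:
        block = addr // block_size      # memory block containing this word
        line = block % num_lines        # the one cache line this block may occupy
        if lines[line] == block:
            hits += 1                   # word is already in the cache
        else:
            lines[line] = block         # miss: load the whole block into the line
    return hits / len(addresses)


rng = random.Random(0)                  # fixed seed so the run is repeatable
sequential = list(range(10_000))        # strong spatial locality
scattered = [rng.randrange(1_000_000) for _ in range(10_000)]  # almost no locality

print("sequential:", hit_ratio(sequential))
print("scattered: ", hit_ratio(scattered))
```

In this model the sequential stream misses only once per 16-word block, while the scattered stream almost never finds its word in the cache. Real programs fall between these extremes.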

Performance Analysis of a Two-Level Memory

We will analyze the performance of a two-level memory consisting of cache memory M1 and primary memory M2. To express the average time to access a word, we must consider the speeds (CPU access times) of the two memories as well as the probability that a given reference will be found in level-one memory (M1):

$$T_S = H \cdot T_1 + (1 - H)(T_1 + T_2)$$

where
TS = average (system) access time
T1 = CPU access time of M1
T2 = CPU access time of M2
H = hit ratio (fraction of references found in M1)

We must be careful in our interpretation of these terms. For example, since a miss in cache results in a block of primary memory being copied into cache, we must use the time required to transfer this block as T2. Also, we must consider the possibility that the block being overwritten in cache must first be copied back into primary memory.
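The following is a minimal sketch of the average access time calculation. It assumes that a hit costs T1 and a miss costs T1 + T2 (the cache is checked first, then primary memory is accessed); that assumption is consistent with the worked example in these slides but is not the only possible model. The function name `average_access_time` is hypothetical.

```python
def average_access_time(t1, t2, hit_ratio):
    """Average access time of a two-level memory.

    Assumed model: a hit costs t1; a miss costs t1 + t2.
    t1 and t2 must use the same time unit (e.g. nanoseconds).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


# How the average changes with the hit ratio for T1 = 2 ns, T2 = 10 ns
for h in (0.50, 0.90, 0.99):
    print(f"H = {h:.2f}  TS = {average_access_time(2.0, 10.0, h):.2f} ns")
```

Note that T2 here should be the time to transfer a whole block, plus any write-back cost, which can be much larger than a single bus cycle.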

Let's work through an example: Compare the average system access time for a computer with a 450 MHz processor, a 100 MHz system bus, and no cache to a system with the same bus and processor speeds but with an L1 cache (i.e. a 450 MHz internal bus to the CPU) that results in a hit ratio of 0.90. Assume a 2 ns memory for cache and an 8 ns memory for RAM.

For the two-level memory system we have
T1 = 1/450 MHz ≈ 2 ns
T2 = 1/100 MHz = 10 ns
H = 0.90
so TS = 3 ns, compared with TS = 10 ns for the system with no cache. This is over 3 times faster!

In reality the improvement is not nearly so dramatic.

Homework: Give an explanation for what is wrong with this analysis. (Hint: What operations are performed when there is a miss?)
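Written out, the arithmetic behind the 3 ns figure appears to be the following. This assumes a miss costs T1 + T2 (a cache check followed by a primary memory access), which is inferred from the quoted result and not stated explicitly in the example.

```latex
\begin{align*}
T_S &= H\,T_1 + (1 - H)(T_1 + T_2) \\
    &= 0.90\,(2\ \text{ns}) + 0.10\,(2\ \text{ns} + 10\ \text{ns}) \\
    &= 1.8\ \text{ns} + 1.2\ \text{ns} = 3.0\ \text{ns} \\[4pt]
\text{speedup} &= \frac{10\ \text{ns}}{3.0\ \text{ns}} \approx 3.3
\end{align*}
```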

Case Study: The Truth about Cache

You have a Pentium-class computer with the 430TX cache controller chipset and 64 MBytes of SIMM RAM, and you are running the Windows 98 operating system. You find a bargain on memory SIMMs that match your current memory, so you "upgrade" your system to 128 MBytes of RAM. The result is a factor-of-2 slowdown in average performance. Suspecting the bargain memory as the problem, you remove your old memory and run with just the new bargain memory. You find that your old, faster performance returns. You put half your old memory back in, bringing your system up to 96 MBytes, and find that your computer now runs somewhat slower (about halfway between the previous two speeds).

Please explain this apparent paradox. (Claiming that the instructor has lost his mind, regardless of the validity of the statement, is not relevant in this case.)

Homework

A 450 MHz Pentium with 32 KBytes of L1 cache, 128 MBytes of RAM, and a 133 MHz system bus runs a program with an average working set size of 80 KBytes. While in a working set, the program has a 0.9997 probability that the next memory request will be from this working set and a 0.9 probability that the next memory request will be the next instruction/data value in memory (i.e. 10% of the time a request is from a random memory address in the working set). (Note: when the program changes working sets, it will begin making memory requests from the new working set with 0.9997 probability.)

Your task:
(1) Determine how much (if any) performance improvement could be achieved by adding a 256 KByte L2 cache (access speed = 450/2 MHz) to the processor.
(2) Determine what size memory blocks should be moved between cache and RAM.
(3) Give an outline of a memory caching strategy that makes sense.
