Cache Organization and Performance Evaluation

Vittorio Zaccaria

Presentation Transcript


  1. Cache Organization and Performance Evaluation Vittorio Zaccaria

  2. Exercise 1
     • How many total bits are required for a direct-mapped instruction cache with 64 KB of data and one-word blocks, assuming 32-bit addresses?
     Solution:
     • 1 word = 4 bytes
     • Number of blocks = 64 KB / 4 bytes = 2^14 blocks
     • Tag bits = 32 - 14 (index) - 2 (byte offset) = 16
     • Total size = [16 (tag) + 1 (valid bit) + 4 bytes x 8 (data)] x 2^14 = 49 x 2^14 = 802,816 bits (98 KB)
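The arithmetic of Exercise 1 can be checked with a short sketch. The function name `direct_mapped_total_bits` and its parameters are illustrative and do not come from the slides. The sketch assumes that all sizes are powers of two and that each block carries exactly one valid bit.

```python
def direct_mapped_total_bits(data_bytes, block_bytes, addr_bits=32):
    """Total storage bits (tag + valid + data) of a direct-mapped cache."""
    num_blocks = data_bytes // block_bytes
    index_bits = num_blocks.bit_length() - 1      # log2 of a power of two
    offset_bits = block_bytes.bit_length() - 1    # byte offset inside a block
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_block = tag_bits + 1 + 8 * block_bytes  # tag + valid bit + data
    return bits_per_block * num_blocks


# Exercise 1: 64 KB of data, one-word (4-byte) blocks, 32-bit addresses.
print(direct_mapped_total_bits(64 * 1024, 4))  # 802816
```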

  3. Exercise 2: direct-mapped cache, 64 blocks x 32 bytes
     • Assuming byte addressing and 32-bit addresses, how many bits are there in each of the Tag, Index, and Offset fields of the address?
       Index = 6 bits, Offset = 5 bits, Tag = 32 - 6 - 5 = 21 bits
     • How many total bytes of data can be stored in the cache?
       64 x 32 bytes = 2 KB
     • How many bytes of memory does the cache use (including tags, valid bits, and data)?
       (21 + 1 [valid]) x 64 / 8 + 32 x 64 = 176 + 2048 = 2224 bytes
     • How many memory blocks map to the same cache block?
       2^21
     • If the cache is loaded with random blocks, what is the probability that a given address matches the tag field?
       1 / 2^21
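The field widths of Exercise 2 follow from the cache geometry, and an address can be split into its fields with shifts and masks. The sketch below is one possible way to write this. The helper names (`cache_fields`, `split_address`, `storage_bytes`), the `ways` parameter, and the sample address `0x12345678` are illustrative assumptions, not part of the slides. Sizes are assumed to be powers of two.

```python
def cache_fields(num_blocks, block_bytes, ways=1, addr_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a byte-addressed cache."""
    num_sets = num_blocks // ways
    index_bits = num_sets.bit_length() - 1
    offset_bits = block_bytes.bit_length() - 1
    return addr_bits - index_bits - offset_bits, index_bits, offset_bits


def split_address(addr, index_bits, offset_bits):
    """Split a byte address into (tag, index, offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def storage_bytes(num_blocks, block_bytes, tag_bits):
    """Bytes used by tags, valid bits and data together."""
    return (tag_bits + 1) * num_blocks // 8 + block_bytes * num_blocks


tag_bits, index_bits, offset_bits = cache_fields(64, 32)
print(tag_bits, index_bits, offset_bits)         # 21 6 5
print(storage_bytes(64, 32, tag_bits))           # 2224
print(split_address(0x12345678, index_bits, offset_bits))  # (149130, 51, 24)
```

Passing `ways=2` (or more) to `cache_fields` gives the field widths of a set-associative cache, because the index then selects a set rather than a single block.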

  4. Exercise 3
     • Assume a cache with (4-byte words and 32-bit addresses, as in Exercise 1):
       • Cache size = 128 bytes total.
       • 2-word blocks.
       • 2-way set associative.
     • How many blocks does the cache have?
       128 / (2 words x 4 bytes) = 16 blocks, organized as 16 / 2 = 8 sets
     • How many bits is the index?
       log2(8 sets) = 3
     • How many bits is the tag?
       32 - 3 (index) - 3 (offset) = 26

  5. Cache Performance
     • CPU time = Instruction count x (CPIexecution + Memory accesses per instruction x Miss rate x Miss penalty in cycles) x Clock cycle time
     • Misses per instruction = Memory accesses per instruction x Miss rate
     • CPI = CPIexecution + Misses per instruction x Miss penalty in cycles
     • AMAT (average memory access time) = Hit time + Miss rate x Miss penalty (can be expressed in cycles or in seconds)
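The four formulas on this slide translate directly into small functions. The sketch below is a minimal rendering. The function names and the sample machine (base CPI 2.0, 1.5 memory accesses per instruction, 2% miss rate, 50-cycle penalty, 2 ns clock, one million instructions) are hypothetical values chosen only to exercise the formulas.

```python
def misses_per_instruction(mem_acc_per_instr, miss_rate):
    return mem_acc_per_instr * miss_rate


def cpi(cpi_exec, mem_acc_per_instr, miss_rate, miss_penalty_cycles):
    """CPI = CPIexecution + misses per instruction x miss penalty."""
    stalls = misses_per_instruction(mem_acc_per_instr, miss_rate) * miss_penalty_cycles
    return cpi_exec + stalls


def cpu_time(instr_count, cpi_value, clock_cycle_time):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instr_count * cpi_value * clock_cycle_time


def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as its arguments."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical machine, used only as an illustration.
c = cpi(2.0, 1.5, 0.02, 50)                      # 2.0 + 1.5 x 0.02 x 50
print(round(c, 2))                               # 3.5
print(round(cpu_time(1_000_000, c, 2e-9), 6))    # 0.007 (seconds)
print(round(amat(1, 0.02, 50), 2))               # 2.0 (cycles)
```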

  6. Why misses?
     • Compulsory: the first access to a block cannot hit, so the block must be brought into the cache.
     • Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur because blocks are discarded and later retrieved.
     • Conflict: if the block-placement strategy is set associative or direct mapped, conflict misses occur (in addition to compulsory and capacity misses) because a block can be discarded and later retrieved when too many blocks map to its set.
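The three categories can be made concrete with a small trace-driven sketch. It relies on a common convention that the slide does not spell out: a miss counts as a capacity miss when a fully associative LRU cache of the same size would also have missed, and as a conflict miss when that reference cache would have hit. The function name `classify_misses`, the direct-mapped organization, and the traces are illustrative assumptions.

```python
from collections import OrderedDict


def classify_misses(block_trace, num_blocks):
    """Classify the misses of a direct-mapped cache into the three Cs.

    block_trace: iterable of block addresses (byte address // block size).
    num_blocks:  number of blocks in the cache.
    """
    seen = set()                  # blocks referenced at least once
    fully_assoc = OrderedDict()   # fully associative LRU cache of the same size
    direct = {}                   # index -> block currently stored there
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}

    for block in block_trace:
        # Reference model: fully associative with LRU replacement.
        fa_hit = block in fully_assoc
        if fa_hit:
            fully_assoc.move_to_end(block)
        else:
            if len(fully_assoc) == num_blocks:
                fully_assoc.popitem(last=False)   # evict least recently used
            fully_assoc[block] = True

        # Cache under study: direct mapped.
        index = block % num_blocks
        dm_hit = direct.get(index) == block
        direct[index] = block

        if dm_hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        seen.add(block)
    return counts


# Blocks 0 and 4 share index 0 in a 4-block cache: they evict each other.
print(classify_misses([0, 4, 0, 4], 4))
# Five distinct blocks do not fit in four: the second access to 0 is a capacity miss.
print(classify_misses([0, 1, 2, 3, 4, 0], 4))
```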

  7. 3Cs Absolute Miss Rate (SPEC92)

  8. Exercise 4
     • Consider a VAX-11/780 with:
       • MP (miss penalty) = 6 cycles
       • CPIexec = 8.5
       • MR (miss rate) = 0.11
       • Memory accesses per instruction = 3
     • Compute the CPI of the architecture with the cache.
       CPIrealCache = CPIexec + #mem_acc/instr x MR x MP = 8.5 + 3 x 0.11 x 6 = 10.48

  9. Exercise 5
     • Compare the previous architecture in the 100% miss rate case with the same architecture in the 100% hit rate case. Compare the speedup of the ideal cache with the real one.
     • 100% miss: CPInoCache = 8.5 + 3 x 6 = 26.5
     • 100% hit: CPIidealCache = CPIexec = 8.5
     • Speedup(idealCache, realCache) = 10.48 / 8.5 = 1.23 (10.48 is CPIrealCache from Exercise 4)

  10. Exercise 6
      • Compute the CPI of an architecture with a cache, given:
        • CPIideal = 1.5
        • MP = 10 cycles
        • MR = 0.11
        • Memory accesses per instruction = 1.4
        CPIrealCache = 1.5 + 1.4 x 0.11 x 10 = 3.04

  11. Exercise 6 (cont.)
      • Compare the case of 100% hit rate with the case of 100% miss rate.
        CPInoCache = 1.5 + 1.4 x 10 = 15.5
        CPIidealCache = 1.5
      • Speedup of the ideal cache over the real cache:
        Speedup = 3.04 / 1.5 ≈ 2.03
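Exercises 4 to 6 all compare the same three scenarios: every access misses, the real miss rate, and every access hits. The following sketch tabulates them for both machines. The function name `cache_scenarios` and the dictionary keys are illustrative, and the speedups are plain CPI ratios, which assumes an unchanged instruction count and clock.

```python
def cache_scenarios(cpi_exec, mem_acc_per_instr, miss_rate, miss_penalty):
    """CPI with no cache (100% miss), the real cache, and an ideal cache (100% hit)."""
    no_cache = cpi_exec + mem_acc_per_instr * 1.0 * miss_penalty
    real = cpi_exec + mem_acc_per_instr * miss_rate * miss_penalty
    ideal = cpi_exec
    return {
        "no_cache": no_cache,
        "real": real,
        "ideal": ideal,
        "speedup_ideal_vs_real": real / ideal,
        "speedup_real_vs_no_cache": no_cache / real,
    }


vax = cache_scenarios(8.5, 3, 0.11, 6)      # Exercises 4 and 5
ex6 = cache_scenarios(1.5, 1.4, 0.11, 10)   # Exercise 6

for name, result in (("VAX-11/780", vax), ("Exercise 6", ex6)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

The ratio between the no-cache and real-cache CPI is not asked for in the exercises, but it shows how much of the possible gain the real cache already captures.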

  12. Exercise 7
      • Consider two architectures, A and B:
        • Tclk(A) = 20 ns, 8.5% faster than Tclk(B)
        • Both A and B have #mem_acc/instr = 1.3
        • MP(A) = MP(B) = 200 ns
        • MR(A) = 3.9%, MR(B) = 3.0%
      • Compute AMAT(A) and AMAT(B)
      • Compute CPI(A) and CPI(B)

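  13. Solution 7 (base CPI = 1.5; the miss penalty is rounded to whole clock cycles)
      • Tclk(B) = 20 ns x (1 + 8.5%) = 21.7 ns
      • CPI(A) = 1.5 + 1.3 x 3.9% x (200 ns / 20 ns) = 1.5 + 1.3 x 0.039 x 10 = 2.007 ≈ 2.01
      • CPI(B) = 1.5 + 1.3 x 3.0% x round(200 ns / 21.7 ns) = 1.5 + 1.3 x 0.03 x 9 = 1.851 ≈ 1.85
      • AMAT(A) = 20 ns + 200 ns x 3.9% = 27.8 ns
      • AMAT(B) = 21.7 ns + 200 ns x 3.0% = 27.7 ns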
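The solution mixes nanoseconds (for AMAT) and clock cycles (for CPI), which is easy to get wrong by hand. The sketch below recomputes the four quantities. The function names are illustrative, the base CPI of 1.5 is the value that appears in the solution's formulas, and rounding the penalty to the nearest whole cycle mirrors the `round(...)` in the solution rather than any general rule.

```python
def amat_ns(tclk_ns, miss_rate, miss_penalty_ns):
    """AMAT in ns, with a hit time of one clock cycle."""
    return tclk_ns + miss_rate * miss_penalty_ns


def cpi_with_ns_penalty(cpi_base, mem_acc_per_instr, miss_rate,
                        miss_penalty_ns, tclk_ns):
    """CPI when the miss penalty is given in ns and rounded to whole cycles."""
    penalty_cycles = round(miss_penalty_ns / tclk_ns)
    return cpi_base + mem_acc_per_instr * miss_rate * penalty_cycles


tclk_a = 20.0
tclk_b = tclk_a * 1.085          # B's clock period is 8.5% longer than A's

cpi_a = cpi_with_ns_penalty(1.5, 1.3, 0.039, 200.0, tclk_a)
cpi_b = cpi_with_ns_penalty(1.5, 1.3, 0.030, 200.0, tclk_b)
amat_a = amat_ns(tclk_a, 0.039, 200.0)
amat_b = amat_ns(tclk_b, 0.030, 200.0)

print(f"CPI(A)  = {cpi_a:.3f}, CPI(B)  = {cpi_b:.3f}")
print(f"AMAT(A) = {amat_a:.1f} ns, AMAT(B) = {amat_b:.1f} ns")
```

B ends up ahead on both metrics: its slower clock is more than offset by its lower miss rate.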

  14. Exercise 8
      • Architecture A [I$, D$]:
        • Issues 1 instruction on 85% of the cycles; the other cycles are NOPs.
      • Architecture B [I$, D$]:
        • Issues 2 instructions on 65% of the cycles and 1 instruction on 30% of the cycles; the other cycles are NOPs.
      • Assume hit time = 1 cycle, miss time = 50 cycles.
      • I$ hit rate = 100%
      • D$ hit rate = 98%
      • Load/store instructions = 33% of all instructions.

  15. Exercise 8 (cont.)
      • CPI(A) and CPI(B) with a perfect memory system?
        CPI(A) = 100 cycles / 85 instr ≈ 1.176
        CPI(B) = 100 cycles / (65 x 2 + 30) instr = 0.625
      • AMAT in cycles for the D$?
        AMAT = 1 + 0.02 x 49 = 1.98 cycles (a miss takes 50 cycles, i.e. 49 more than a hit)

  16. Exercise 8 (cont.)
      • CPI(A) and CPI(B) with the actual cache?
        CPI(A) = 1.176 + 0.33 x 0.02 x 49 ≈ 1.50
        CPI(B) = 0.625 + 0.33 x 0.02 x 49 ≈ 0.95
      • Speedup(B, A) = CPI(A) / CPI(B) ≈ 1.58
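Exercise 8 derives the ideal CPI from an issue profile and then adds the data-cache stalls. A minimal sketch of that chain follows. The names `ideal_cpi` and `actual_cpi` and the dictionary encoding of the issue profile are assumptions made for illustration, and the speedup is a CPI ratio that presumes both architectures run at the same clock.

```python
def ideal_cpi(issue_profile):
    """issue_profile maps 'instructions issued in a cycle' -> fraction of cycles."""
    instr_per_cycle = sum(n * fraction for n, fraction in issue_profile.items())
    return 1.0 / instr_per_cycle


def actual_cpi(cpi_ideal, load_store_fraction, d_miss_rate, miss_time, hit_time=1):
    """Add data-cache stalls: each miss costs (miss_time - hit_time) extra cycles."""
    return cpi_ideal + load_store_fraction * d_miss_rate * (miss_time - hit_time)


cpi_a = ideal_cpi({1: 0.85, 0: 0.15})
cpi_b = ideal_cpi({2: 0.65, 1: 0.30, 0: 0.05})
real_a = actual_cpi(cpi_a, 0.33, 0.02, 50)
real_b = actual_cpi(cpi_b, 0.33, 0.02, 50)

print(f"ideal CPI:  A = {cpi_a:.3f}, B = {cpi_b:.3f}")
print(f"actual CPI: A = {real_a:.2f}, B = {real_b:.2f}")
print(f"Speedup(B, A) = {real_a / real_b:.2f}")
```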

  17. Exercise 9
      • 300 MHz CPU, 50 MHz bus
      • D-cache has two 64-bit words per block (16 bytes)
      • Bus:
        • 2 bytes wide
        • burst transfer mode: each block read takes 4-1-1-1-1-1-1-1 bus clocks
      • Hit time = 1 cycle
      • Miss rate = 6%
      • Ideal I-cache

  18. Exercise 9 (cont.)
      • Consider only data read accesses. What is the effective AMAT in ns?
      • How would you speed it up?
        • Doubling the bus width?
        • Doubling the bus speed?
      • Compute the AMAT first and then the speedup.
        1 bus clock = 300 / 50 = 6 CPU clocks
        AMAT = 1 + 0.06 x (4 + 7) x 6 = 4.96 CPU clocks ≈ 16.5 ns

  19. Exercise 9 (cont.)
      • Doubling the bus width? The block now needs 4 transfers: first datum in 4 bus clocks, then 1-1-1.
        AMAT = 1 + 0.06 x (4 + 3) x 6 = 3.52 CPU clocks ≈ 11.7 ns

  20. Exercise 9 (cont.)
      • Doubling the bus speed? 1 bus clock = 3 CPU clocks.
        AMAT = 1 + 0.06 x (4 + 7) x 3 = 2.98 CPU clocks ≈ 9.9 ns
      • Speedup(2x freq, 2x width) = 3.52 / 2.98 ≈ 1.18
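The three bus configurations of Exercise 9 differ only in the number of transfers per block and in the CPU-to-bus clock ratio, so one function can cover them all. The sketch below is an illustration under the exercise's burst pattern (4 bus clocks for the first transfer, 1 for each following one). The name `burst_amat` and its parameters are hypothetical.

```python
def burst_amat(cpu_mhz, bus_mhz, block_bytes, bus_bytes,
               first_clocks=4, next_clocks=1, hit_time=1, miss_rate=0.06):
    """Return (AMAT in CPU clocks, AMAT in ns) for a burst-mode block read."""
    transfers = block_bytes // bus_bytes
    bus_clocks = first_clocks + (transfers - 1) * next_clocks
    cpu_clocks_per_bus_clock = cpu_mhz / bus_mhz
    amat_cycles = hit_time + miss_rate * bus_clocks * cpu_clocks_per_bus_clock
    amat_ns = amat_cycles * 1000.0 / cpu_mhz   # one CPU clock = 1000 / MHz ns
    return amat_cycles, amat_ns


block_bytes = 2 * 8   # two 64-bit words per block

base = burst_amat(300, 50, block_bytes, bus_bytes=2)
wide = burst_amat(300, 50, block_bytes, bus_bytes=4)    # doubled bus width
fast = burst_amat(300, 100, block_bytes, bus_bytes=2)   # doubled bus speed

for label, (cycles, ns) in (("baseline", base), ("2x width", wide), ("2x speed", fast)):
    print(f"{label}: {cycles:.2f} CPU clocks, {ns:.2f} ns")
print(f"Speedup(2x speed, 2x width) = {wide[0] / fast[0]:.2f}")
```

Doubling the bus speed wins here because it shortens every bus clock, including the 4-clock first transfer, whereas doubling the width only removes the cheap 1-clock follow-up transfers.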
