Using Dead Blocks as a Virtual Victim Cache

Presentation Transcript


  1. Using Dead Blocks as a Virtual Victim Cache Samira Khan, Daniel A. Jiménez, Doug Burger, Babak Falsafi

  2. The Cache Utilization Wall • Performance gap • Processors are getting faster • Memory is only getting larger, not faster • Caches are not efficient • Designed for fast lookup • They contain too many useless blocks! • We want the cache to be as efficient as possible

  3. Cache Problem: Dead Blocks • [Figure: lifetime of a block in a cache set, from fill through hits to the last hit and eviction, live then dead, MRU to LRU] • Live block: will be referenced again before eviction • Dead block: dead from the last reference until it is evicted • Cache blocks are dead on average 59% of the time

  4. Reducing Dead Blocks: Virtual Victim Cache • Put victim blocks in the dead blocks • [Figure: cache with live, dead, and victim blocks, MRU to LRU] • Dead blocks all over the cache act as a victim cache

  5. Contribution: Virtual Victim Cache • Contributions: • Skewed dead block predictor • Victim placement and lookup • Results: • Improves predictor accuracy by 4.7% • Reduces miss rate by 26% • Improves performance by 12.1%

  6. Introduction • Virtual Victim Cache • Methodology • Results • Conclusion

  7. Virtual Victim Cache Goal: use dead blocks to hold the victim blocks • Mechanisms required: • Identify which blocks are dead • Look up the victims

  8. Different Dead Block Predictors • Counting Based [ICCD05] • Predicts a block dead after a certain number of accesses • Time Based [ISCA02] • Predicts a block dead after a certain number of cycles • Trace Based [ISCA01] • Predicts the last touch based on the PC • Cache Burst Based [MICRO08] • Predicts when a block moves out of the MRU position

  9. Trace-Based Dead Block Predictor [ISCA 01] • Predicts the last touch based on the sequence of instructions that touched the block • Encoding: truncated addition of the instruction PCs, called the signature • The predictor table is indexed by the signature • Entries are 2-bit saturating counters

  10. Trace-Based Dead Block Predictor [ISCA 01] • [Figure: block lifetime from fill through hits to the last hit and eviction, alongside the PC sequence and the predictor table] • Example PC sequence: PC1: ld a (fill), PC2: st b, PC3: ld a (hit), PC4: st a (hit), PC5: ld a (hit), PC6: ld e, PC7: ld f, PC8: st a (hit, last touch) • Signature = <PC1, PC3, PC4, PC5, PC8>, the truncated sum of the PCs that touched block a; the predictor table entry for this signature is trained toward dead
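To make the mechanism concrete, here is a minimal C sketch of such a reference trace predictor. It is an illustration under assumptions, not the paper's implementation: the signature width, table size, and training policy are placeholders.

```c
/* Minimal sketch of a reference trace-based dead block predictor.
 * Assumed parameters (not from the paper): 15-bit signatures and a
 * table of 2-bit saturating counters with one entry per signature. */
#include <stdint.h>

#define SIG_BITS 15
#define SIG_MASK ((1u << SIG_BITS) - 1)
#define TBL_SIZE (1u << SIG_BITS)

static uint8_t conf[TBL_SIZE];   /* 2-bit saturating counters (0..3) */

/* Truncated addition of the PCs that have touched the block so far. */
static inline uint32_t update_signature(uint32_t sig, uint32_t pc)
{
    return (sig + pc) & SIG_MASK;
}

/* Predict dead when the counter for the current signature reaches a threshold. */
static inline int predict_dead(uint32_t sig, uint8_t threshold)
{
    return conf[sig] >= threshold;
}

/* Training: the signature that ended at the block's actual last touch is
 * reinforced as "dead"; a signature that was followed by further hits is
 * weakened. */
static inline void train(uint32_t sig, int was_last_touch)
{
    if (was_last_touch) { if (conf[sig] < 3) conf[sig]++; }
    else                { if (conf[sig] > 0) conf[sig]--; }
}
```

For the example on this slide, the signature that ends up trained as dead would be built by folding PC1, PC3, PC4, PC5, and PC8 through update_signature in order.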

  11. Skewed Trace Predictor • Reference trace predictor table: index = hash(signature); predict dead if confidence >= threshold • Skewed trace predictor tables: index1 = hash1(signature), index2 = hash2(signature); predict dead if conf1 + conf2 >= threshold

  12. Skewed Trace Predictor • Uses two different hash functions • Reduces conflicts • Improves accuracy • [Figure: sigX and sigY conflict at index1 = hash1(sigX) = hash1(sigY), but map to different entries under hash2] • A conflict in both tables is less likely
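A sketch of the skewed variant, with the same caveats: the two hash functions, table size, and threshold handling below are illustrative assumptions, chosen only to show each table being indexed by a different hash of the same signature and the two confidences being summed.

```c
/* Minimal sketch of a skewed trace predictor: two tables of 2-bit counters,
 * each indexed by a different hash of the signature; a block is predicted
 * dead when the summed confidence reaches the threshold. */
#include <stdint.h>

#define TBL_BITS 14
#define TBL_SIZE (1u << TBL_BITS)

static uint8_t conf1[TBL_SIZE], conf2[TBL_SIZE];   /* 2-bit counters */

static inline uint32_t hash1(uint32_t sig) {
    return (sig ^ (sig >> 7)) & (TBL_SIZE - 1);
}
static inline uint32_t hash2(uint32_t sig) {
    return (sig ^ (sig >> 11) ^ (sig << 3)) & (TBL_SIZE - 1);
}

static inline int skewed_predict_dead(uint32_t sig, uint8_t threshold)
{
    /* Two signatures that collide in one table are unlikely to collide in
     * the other, so summing the two counters filters out aliasing noise. */
    return (conf1[hash1(sig)] + conf2[hash2(sig)]) >= threshold;
}

static inline void sat_update(uint8_t *c, int dead)
{
    if (dead) { if (*c < 3) (*c)++; }
    else      { if (*c > 0) (*c)--; }
}

static inline void skewed_train(uint32_t sig, int was_last_touch)
{
    sat_update(&conf1[hash1(sig)], was_last_touch);
    sat_update(&conf2[hash2(sig)], was_last_touch);
}
```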

  13. Victim Placement and Lookup in VVC • Place victims in dead blocks of adjacent sets • Any victim could be placed in any set, but then every set would have to be searched on a lookup • Trade-off between the number of adjacent sets and lookup latency • We use only one adjacent set to minimize lookup latency

  14. How to determine the adjacent set? • A set whose index differs by only one bit • Far enough away not to be the same hot set • [Figure: original set and adjacent set within the cache, MRU to LRU]
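Choosing the adjacent set then amounts to flipping one bit of the set index, as in the small sketch below; the index width and the choice of bit 4 are assumptions for illustration (slide 42 mentions the 4th bit for this design).

```c
/* Adjacent set = original set index with one bit flipped.
 * SET_INDEX_BITS and ADJ_FLIP_BIT are illustrative values; e.g. a 2MB,
 * 16-way, 64B-line cache has 2048 sets and thus an 11-bit index. */
#include <stdint.h>

#define SET_INDEX_BITS 11
#define ADJ_FLIP_BIT   4

static inline uint32_t adjacent_set(uint32_t set)
{
    return (set ^ (1u << ADJ_FLIP_BIT)) & ((1u << SET_INDEX_BITS) - 1);
}
```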

  15. Victim Lookup • On a miss, search the adjacent set • If the block is found there, bring it back to its original set • [Figure: miss in the original set, hit in the adjacent set, block moved back to the original set]
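Sketched in C under assumptions (a per-line receiver bit as described in the extra slides, illustrative structure and field names), the lookup path on a miss looks roughly like this:

```c
/* Minimal sketch of the VVC lookup path: a miss in the original set probes
 * the adjacent set for a receiver copy of the block and, on a hit, moves
 * the block back to its original set. */
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

#define WAYS 16   /* illustrative associativity */

struct line { uint64_t tag; bool valid; bool receiver; };
struct set  { struct line way[WAYS]; };

static struct line *find_receiver(struct set *s, uint64_t tag)
{
    for (int w = 0; w < WAYS; w++)
        if (s->way[w].valid && s->way[w].receiver && s->way[w].tag == tag)
            return &s->way[w];
    return NULL;
}

/* Returns true if the miss was satisfied from the virtual victim cache. */
static bool vvc_lookup_on_miss(struct set *orig, struct set *adj,
                               uint64_t tag, int fill_way)
{
    struct line *hit = find_receiver(adj, tag);
    if (!hit)
        return false;   /* true miss: fetch from the next level */

    /* Refill the block into its original set and free the receiver slot. */
    orig->way[fill_way] = (struct line){ .tag = tag, .valid = true,
                                         .receiver = false };
    hit->valid = false;
    return true;
}
```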

  16. Virtual Victim Cache: Why it Works • Reduces conflict misses • Provides extra associativity to the hot set • Reduces capacity misses • Puts the LRU block in a dead block • A fully associative cache would have replaced the LRU block • Increasing the number of live blocks effectively increases capacity • Robust to false positive predictions • The VVC will find such a block in the adjacent set, avoiding the miss

  17. Introduction • Virtual Victim Cache • Methodology • Results • Conclusion

  18. Experimental Methodology Simulator: a modified version of SimpleScalar Benchmarks: SPEC CPU2000 and SPEC CPU2006

  19. Single Thread Speedup • [Speedup chart] • A fully associative cache and a 64KB victim cache are both unrealistic designs

  20. Single Thread Speedup • [Speedup chart] • The accuracy of the predictor is more important in dead block replacement

  21. Speedup for Multiple Threads • [Speedup chart] • Blocks become less predictable in the presence of multiple threads

  22. Tag Array Reads due to VVC • Tag array reads in the baseline cache are 3.9% of the total number of instructions executed, versus 4.9% for the VVC

  23. Conclusion • Skewed predictor improves accuracy by 4.7% • Virtual Victim Cache achieves • 12.1% speedup for single-threaded workloads • 4% speedup for multiple-threaded workloads • Future Work in Dead Block Prediction • Improve accuracy • Reduce overhead

  24. Thank you

  25. Extra slides

  26. Dead Blocks as a Virtual Victim Cache • Placing victim blocks into the adjacent set • Evicted blocks are placed in an invalid or predicted-dead block of the adjacent set • If no such block is present, the victim is placed in the LRU block • The receiver block is then moved to the MRU position • Adaptive insertion is also used • Cache lookup for a previously evicted block • Original set lookup: miss • Adjacent set lookup: hit • The block is refilled from the adjacent set into the original set • The receiver block in the adjacent set is marked as invalid • One bit per block keeps track of receiver blocks • Tag matching on a set's own (non-VVC) accesses ignores the receiver blocks
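The placement steps listed above can be sketched as follows. This is a hedged illustration, not the paper's code: the structure names, the LRU-stack encoding, and the associativity are assumptions, and the adaptive policy that picks between MRU and LRU insertion is reduced to a single boolean input.

```c
/* Sketch of VVC victim placement into the adjacent set:
 *   1. prefer an invalid block,
 *   2. otherwise a predicted-dead block,
 *   3. otherwise fall back to the LRU block,
 * then mark the receiver bit and insert at MRU (or leave at LRU) according
 * to the adaptive insertion decision. */
#include <stdint.h>
#include <stdbool.h>

#define WAYS 16   /* illustrative associativity */

struct vline {
    uint64_t tag;
    bool     valid;
    bool     receiver;        /* holds another set's victim                */
    bool     predicted_dead;  /* set by the dead block predictor           */
    uint8_t  lru;             /* LRU stack position: 0 = MRU, WAYS-1 = LRU */
};

struct vset { struct vline way[WAYS]; };

static void move_to_mru(struct vset *s, int w)
{
    uint8_t old = s->way[w].lru;
    for (int i = 0; i < WAYS; i++)
        if (s->way[i].lru < old)
            s->way[i].lru++;
    s->way[w].lru = 0;
}

static void vvc_place_victim(struct vset *adj, uint64_t victim_tag,
                             bool insert_at_mru)
{
    int target = -1;

    for (int w = 0; w < WAYS && target < 0; w++)   /* 1. invalid block    */
        if (!adj->way[w].valid) target = w;
    for (int w = 0; w < WAYS && target < 0; w++)   /* 2. predicted dead   */
        if (adj->way[w].predicted_dead) target = w;
    for (int w = 0; w < WAYS && target < 0; w++)   /* 3. fall back to LRU */
        if (adj->way[w].lru == WAYS - 1) target = w;
    if (target < 0)
        target = WAYS - 1;   /* defensive: well-formed LRU state never hits this */

    adj->way[target] = (struct vline){ .tag = victim_tag, .valid = true,
                                       .receiver = true, .predicted_dead = false,
                                       .lru = adj->way[target].lru };
    if (insert_at_mru)
        move_to_mru(adj, target);   /* otherwise the block keeps its stack position */
}
```

On the lookup side, the receiver bit is what lets a set's own tag matches skip the blocks it merely hosts for its neighbor.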

  27. Reduction in Cache Area

  28. Predictor Coverage and False Positive Rate • [Chart over the SPEC benchmarks: 179.art, 175.vpr, 473.astar, 181.mcf, 429.mcf, 188.ammp, 197.parser, 450.soplex, 255.vortex, 187.facerec, 456.hmmer, 300.twolf, 464.h264ref, 401.bzip2, 256.bzip2, 178.galgel, and their arithmetic mean]

  29. Trace Based Dead Block Predictor • [Animated example: a memory instruction sequence arriving at cache set s — pc m: ld a (fill), pc n: ld a (hit), pc o: st a (hit), then pc p through pc w access other blocks (b, c, d, e, f, g, h, i) until a is evicted • The signature starts as m on the fill and is updated by truncated addition on each hit (m+n, then m+n+o); on eviction the final signature m+n+o is trained as dead (counter 1) while the intermediate signatures m and m+n remain not dead (0)]

  30. MPKI

  31. IPC

  32. Speedup • [Speedup chart; largest values around 2.5–2.6]

  33. Motivation • [Cache diagram]

  34. False Positive Prediction Shared cache contention results in more false positive predictions

  35. Predictor Table Hardware Budget • With an 8KB predictor table, the VVC achieves a 5.4% speedup with the original predictor, whereas it achieves a 12.1% speedup with the skewed predictor

  36. Cache Efficiency VVC improves cache efficiency by 62% for multiple-threaded workloads and by 26% for single-threaded workloads

  37. Introduction • Background • Virtual Victim Cache • Methodology • Results • Conclusion

  38. Introduction • Background • Virtual Victim Cache • Methodology • Results • Conclusion

  39. Experimental Methodology • [Table: dead block predictor parameters] • Predictor overhead is 3.4% of the total 2MB L2 cache space

  40. Reducing Dead Blocks: Virtual Victim Cache • [Cache diagram, MRU to LRU] • Dead blocks all over the cache act as a victim cache

  41. Virtual Victim Cache • Place evicted blocks in dead blocks of adjacent sets • On a miss, search the adjacent sets for a match • If the block is found in an adjacent set, bring it back to its original set • Dead blocks across the cache act as a victim cache

  42. Virtual Victim Cache: How it Works • How to determine the adjacent set? • A set whose index differs by only one bit, in our case the 4th bit • Far enough away not to be the same hot set • How to find receiver blocks in the adjacent set? • Add one bit to mark each receiver block • Where to place the receiver block? • Use a dynamic insertion policy • Choose either the LRU or the MRU position
