
BALANCED CACHE


Presentation Transcript


  1. BALANCED CACHE Ayşe BAKIR, Zeynep ZENGİN

  2. Outline • Introduction • Motivation • The B-Cache Organization • Experimental Methodology and Results • Programmable Decoder Design • Analysis • Related Work • Conclusion Ayse Bakır, CMPE 511, Bogazici University

  3. Introduction • The increasing gap between memory latency and processor speed is a critical bottleneck to achieving a high-performance computing system. • A multilevel memory hierarchy has been developed to hide the memory latency.

  4. Introduction [Figure labels: PROCESSOR, LEVEL 1, LEVEL 2, MAIN MEMORY] Because the level one cache normally resides on the processor's critical path, fast access to it is important for improved processor performance.

  5. Introduction Two cache organization models have been developed: • Direct-Mapped Cache • Set-Associative Cache

  6. Introduction • Direct-Mapped Cache:

  7. Introduction • Set-Associative Cache:

  8. Introduction

  9. Introduction Frequent hit sets have many more cache hits than other sets. Cache misses occur more frequently in frequent miss sets. Less accessed sets receive less than 1% of the total cache references.

  10. Introduction Balanced Cache (B-Cache): a mechanism that provides the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache.

  11. Introduction • The decoder length of a traditional direct-mapped cache is increased by three bits: • Accesses to heavily used sets can be reduced to 1/8th of those in the original design. • Only 1/8th of the memory address space has a mapping to the cache sets. • A replacement policy is added. • A programmable decoder is used.

  12. Motivation - Example 8-bit addresses 0, 1, 8, 9, ... 0, 1, 8, 9

  13. Motivation - Example 8-bit address; same as in a 2-way cache; X: invalid PD entry
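
The conflict pattern in this example can be reproduced with a small simulation. This is a minimal sketch, not the presenters' setup: it assumes a cache of eight blocks, block-granular addresses, LRU replacement within a set, and a trace that repeats 0, 1, 8, 9 four times. The function name `count_misses` is hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, num_sets, ways):
    """Count misses for a cache with `num_sets` sets of `ways` blocks each.

    Each set uses LRU replacement. With ways=1 this is a direct-mapped cache.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        cache_set = sets[addr % num_sets]  # low-order bits select the set
        if addr in cache_set:
            cache_set.move_to_end(addr)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)  # evict the least recently used
            cache_set[addr] = True
    return misses


# Assumed trace: the access pattern 0, 1, 8, 9 repeated four times.
trace = [0, 1, 8, 9] * 4

# Eight blocks total in both organizations (an assumption for illustration).
direct_mapped = count_misses(trace, num_sets=8, ways=1)
two_way = count_misses(trace, num_sets=4, ways=2)

print("direct-mapped misses:", direct_mapped)
print("2-way misses:", two_way)
```

Under these assumptions, addresses 0 and 8 (and 1 and 9) keep evicting each other in the direct-mapped cache, while the 2-way cache only takes the four compulsory misses.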

  14. B-Cache Organization - Terminology • Memory address mapping factor (MF) • B-Cache associativity (BAS) • PI: index length of the PD • NPI: index length of the NPD • OI: index length of the original direct-mapped cache • $MF = 2^{PI+NPI} / 2^{OI}$, where $MF \geq 1$ • $BAS = 2^{OI} / 2^{NPI}$, where $BAS \geq 1$

  15. B-Cache Organization $MF = 2^{PI+NPI} / 2^{OI} = 2^{6+6} / 2^{9} = 8$ and $BAS = 2^{OI} / 2^{NPI} = 2^{9} / 2^{6} = 8$
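
The two definitions are easy to check numerically. The sketch below evaluates them for PI = 6, NPI = 6 and OI = 9, the values that make both quantities come out to 8. The function names are hypothetical and chosen only for this illustration.

```python
def mapping_factor(pi, npi, oi):
    """MF = 2^(PI + NPI) / 2^OI, defined only when MF >= 1."""
    if pi + npi < oi:
        raise ValueError("MF would be below 1: PI + NPI must be at least OI")
    return 1 << (pi + npi - oi)


def bcache_associativity(oi, npi):
    """BAS = 2^OI / 2^NPI, defined only when BAS >= 1."""
    if oi < npi:
        raise ValueError("BAS would be below 1: OI must be at least NPI")
    return 1 << (oi - npi)


# PI: index length of the PD, NPI: index length of the NPD,
# OI: index length of the original direct-mapped cache.
mf = mapping_factor(pi=6, npi=6, oi=9)
bas = bcache_associativity(oi=9, npi=6)
print("MF =", mf, "BAS =", bas)
```

With these values the decoder is three bits longer than the original 9-bit index (6 + 6 = 12), which is where the factor of 8 comes from.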

  16. B-Cache Organization - Replacement Policy • Random policy: simple to design and needs very little extra hardware. • Least Recently Used (LRU): may achieve a better hit rate but has more area overhead than the random policy.
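
The difference in bookkeeping between the two policies can be sketched in software. This is an illustrative model only, not the hardware design: it assumes a victim is chosen among 8 candidate lines, and the class names `RandomPolicy` and `LRUPolicy` are hypothetical.

```python
import random


class RandomPolicy:
    """Picks a victim at random; keeps no per-line history."""

    def __init__(self, num_candidates, seed=None):
        self.num_candidates = num_candidates
        self.rng = random.Random(seed)

    def touch(self, way):
        pass  # nothing to record on an access

    def victim(self):
        return self.rng.randrange(self.num_candidates)


class LRUPolicy:
    """Tracks recency order; this extra state is the source of the overhead."""

    def __init__(self, num_candidates):
        self.order = list(range(num_candidates))  # front = least recently used

    def touch(self, way):
        self.order.remove(way)
        self.order.append(way)  # move to the most recently used position

    def victim(self):
        return self.order[0]


lru = LRUPolicy(8)
for way in [0, 1, 2, 0]:
    lru.touch(way)
lru_victim = lru.victim()

random_victim = RandomPolicy(8, seed=1).victim()
print("LRU victim:", lru_victim, "random victim:", random_victim)
```

The random policy needs only a source of random numbers, while the LRU model has to update its ordering on every access.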

  17. Experimental Methodology and Results • Miss rate is used as the primary metric to measure the B-Cache's effectiveness and to determine the MF and BAS parameters. • Results are compared with a baseline level one cache (a direct-mapped 16kB cache with a line size of 32 bytes, for both instruction and data caches). • A 4-issue out-of-order processor simulator is used to collect the miss rates. 26 SPEC2K benchmarks are run using the SimpleScalar tool set.
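
For the baseline configuration, the address split follows directly from the sizes: 16kB / 32 bytes gives 512 sets, so a direct-mapped lookup uses a 5-bit offset and a 9-bit index. The sketch below shows that arithmetic. The function name `split_address` and the sample address are illustrative assumptions, and the sizes are assumed to be powers of two.

```python
def split_address(addr, cache_bytes=16 * 1024, line_bytes=32):
    """Split an address into (tag, index, offset) for a direct-mapped cache.

    Assumes cache_bytes and line_bytes are powers of two.
    """
    num_sets = cache_bytes // line_bytes       # 512 sets for the baseline
    offset_bits = line_bytes.bit_length() - 1  # log2(32) = 5
    index_bits = num_sets.bit_length() - 1     # log2(512) = 9
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


tag, index, offset = split_address(0x12345)
print("tag:", tag, "index:", index, "offset:", offset)
```

Two addresses with the same index but different tags compete for the same line in this organization, which is the source of the conflict misses being measured.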

  18. Experimental Methodology and Results [Figure legend: 16-entry victim buffer; set-associative caches; B-Caches with different MFs]

  19. Experimental Methodology and Results [Figure legend: 16-entry victim buffer; set-associative caches; B-Caches with different MFs] The miss rate reduction of the B-Cache is as good as that of a 4-way cache for the data cache. For the instruction cache, on average, the miss rate reduction is 5% better than that of a 4-way cache.

  20. Programmable Decoder Design • Latency, • Storage, • Power Costs Zeynep Zengin, CMPE511, Bogazici Univ.

  21. Timing Analysis • Critical path • Direct-mapped: tag side • B-Cache: may be on the tag side or the data side • The B-Cache modifies the local decoder

  22. Timing Analysis

  23. Storage Overhead • The B-Cache additionally uses CAM cells. • A CAM cell is 25% larger than the SRAM cell used by the data and tag memory.

  24. Power Overhead • Extra power consumption: the PD of each subarray. • Power reduction: • 3-bit data length reduction • Removal of 3-input NAND gates

  25. ANALYSIS • Overall Performance • Overall Energy • Design Tradeoffs for MF and BAS for a Fixed Length of PD • Balance Evaluation • The Effect of L1 Cache Sizes • Comparison

  26. Overall Performance

  27. Overall Energy • Static and dynamic power dissipation • Charging and discharging of the load capacitance • Memory related • Chip caches • Off-chip memory

  28. Design Tradeoffs for MF and BAS for a Fixed Length of PD The question is: which design has a higher miss rate reduction?

  29. Design Tradeoffs for MF and BAS for a Fixed Length of PD

  30. Balance Evaluation • Frequent hit sets: hits 2 times higher than the average • Frequent miss sets: misses 2 times higher than the average • Less accessed sets: accesses below half of the average
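
These three categories can be computed from per-set counters. The sketch below is an assumption-laden illustration: it takes the thresholds to be relative to the per-set average (at least twice the average hits or misses, fewer than half the average accesses), and the function name `classify_sets` and the sample counters are made up for the example.

```python
def classify_sets(hits, misses):
    """Classify cache sets by per-set hit and miss counters.

    Assumes thresholds are relative to the average over all sets.
    """
    n = len(hits)
    accesses = [h + m for h, m in zip(hits, misses)]
    avg_hits = sum(hits) / n
    avg_misses = sum(misses) / n
    avg_accesses = sum(accesses) / n
    return {
        "frequent_hit": [i for i in range(n) if hits[i] >= 2 * avg_hits],
        "frequent_miss": [i for i in range(n) if misses[i] >= 2 * avg_misses],
        "less_accessed": [i for i in range(n) if accesses[i] < avg_accesses / 2],
    }


# Made-up counters for a four-set cache, for illustration only.
sample_hits = [100, 10, 10, 0]
sample_misses = [0, 40, 5, 3]
classes = classify_sets(sample_hits, sample_misses)
print(classes)
```

A perfectly balanced cache would leave all three lists empty, since every set would sit close to the average.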

  31. The miss rate reduction increases as the MF is increased. • For the B-Cache, the design with MF = 8 and BAS = 8 is the best.

  32. Comparison • With a victim buffer: the miss rate reduction of the B-Cache is higher than that of the victim buffer. • With a highly associative cache (HAC): • The HAC is for low-power embedded systems. • The HAC is an extreme case of the B-Cache in which the decoder is fully programmable.

  33. RELATED WORK • Reducing the miss rate of direct-mapped caches • Reducing the access time of set-associative caches

  34. Reducing Miss Rate of Direct-Mapped Caches TECHNIQUES • Page allocation • Column associative cache • Adaptive group associative cache • Skewed associative cache

  35. Reducing Access Time of Set-Associative Caches • Partial address matching: predicting the hit way • Difference bit cache

  36. B-CACHE SUMMARY • The B-Cache can be applied to both high-performance and low-power embedded systems. • Balanced without any software intervention. • Feasible and easy to implement.

  37. Conclusion • The B-Cache allows the accesses to cache sets to be balanced by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design. • Programmable decoders dynamically determine which memory addresses have a mapping to the cache sets. • A 16kB level one B-Cache outperforms a traditional direct-mapped cache of the same size by 64.5% and 37.8% for the instruction and data cache, respectively. • Average IPC improvement: 5.9% • Energy reduction: 2% • Access time: same as a traditional direct-mapped cache

  38. References • C. Zhang, "Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders", ISCA 2006, IEEE. • C. Zhang, "Balanced Instruction Cache: Reducing Conflict Misses of Direct-Mapped Caches through Balanced Subarray Accesses", IEEE Computer Architecture Letters, May 2005. • Wilkinson, B. (1996), "Computer Architecture: Design and Performance", Prentice Hall Europe. • University of Maryland, http://www.cs.umd.edu/class/fall2001/cmsc411/proj01/cache/cache.html
