
Increasing Cache Efficiency by Eliminating Noise



  1. Increasing Cache Efficiency by Eliminating Noise
      Prateek Pujara & Aneesh Aggarwal
      {prateek, aneesh}@binghamton.edu | http://caps.cs.binghamton.edu
      State University of New York, Binghamton

  2. INTRODUCTION
      • Caches are essential for bridging the processor-memory performance gap.
      • Caches should therefore be utilized as efficiently as possible: fetch only the useful data.
      • Cache Utilization: the percentage of useful words out of the total words fetched into the cache (illustrated in the sketch below).
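As a concrete illustration of this metric, here is a minimal sketch that computes utilization from hypothetical counters (the counts and the 8-words-per-block geometry are assumptions for illustration):

```python
# Minimal sketch: cache utilization from hypothetical counters.
# Assumes 32-byte blocks and 4-byte words, i.e. 8 words per block.
WORDS_PER_BLOCK = 8

blocks_fetched = 1000    # blocks brought into the cache (hypothetical)
useful_words = 3400      # words actually referenced before eviction (hypothetical)

utilization = 100.0 * useful_words / (blocks_fetched * WORDS_PER_BLOCK)
print(f"Cache utilization: {utilization:.1f}%")  # -> 42.5%
```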

  3. Utilization vs Block Size
      • Larger cache blocks
        • Increase the bandwidth requirement
        • Reduce utilization
      • Smaller cache blocks
        • Reduce the bandwidth requirement
        • Increase utilization

  4. [Figure: Percent cache utilization for a 16KB, 4-way set-associative cache with 32-byte blocks]

  5. Methods to Improve Utilization
      • Rearrange data/code
      • Dynamically adapt the cache line size
      • Sub-blocking

  6. Benefits of Improved Utilization
      • Lower energy consumption, by not wasting energy on useless words.
      • Improved performance, by better utilizing the available cache space.
      • Reduced memory traffic, by not fetching useless words.

  7. Our Goal
      • Improve utilization:
        • Predict the to-be-referenced words
        • Avoid cache pollution by fetching only the predicted words

  8. Our Contributions
      • Illustrate the high predictability of cache noise
      • Propose efficient cache noise predictors
      • Show the potential benefits of cache-noise-prediction-based fetching in terms of:
        • Cache utilization
        • Cache power consumption
        • Bandwidth requirement
      • Illustrate the benefits of cache noise prediction for prefetching
      • Investigate cache noise prediction as an alternative to sub-blocking

  9. Cache Noise Prediction
      • Programs repeat their patterns of memory references.
      • Predict cache noise based on the history of words accessed in cache blocks.

  10. Cache Noise Predictors
      1) Phase Context Predictor (PCP): records the word-usage history of the most recently evicted cache block.
      2) Memory Context Predictor (MCP): assumes that data from contiguous memory locations is accessed in the same fashion.
      3) Code Context Predictor (CCP): assumes that instructions in a particular portion of the code access data in the same fashion.
      (The sketch below contrasts the three.)
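The three predictors differ only in how the lookup context is formed. A rough sketch of the three choices (bit widths are illustrative assumptions, matched to the CCP walkthrough that follows, not the paper's exact parameters):

```python
# Rough sketch of the three context choices; widths are assumptions.
CONTEXT_BITS = 6   # context width used in the CCP walkthrough below
DROP_BITS = 4      # low-order bits dropped when forming a context

def pcp_context():
    # PCP: a single global context; the usage history of the most
    # recently evicted block is reused for every prediction.
    return 0

def mcp_context(block_address):
    # MCP: blocks from contiguous memory regions share a context.
    return (block_address >> DROP_BITS) & ((1 << CONTEXT_BITS) - 1)

def ccp_context(pc):
    # CCP: instructions from the same code region share a context,
    # formed from the higher-order bits of the missing load/store's PC.
    return (pc >> DROP_BITS) & ((1 << CONTEXT_BITS) - 1)
```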

  11. Cache Noise Predictors
      • For the code context predictor:
        • Use the higher-order bits of the PC as the context
        • Store the context along with the cache block
      • Add two bit vectors to each cache block:
        • One to identify the valid words present
        • One to store the access pattern

  12-21. Code Context Predictor (CCP): a worked example
      • Say the PC of an instruction is 1001100100. Dropping its 4 low-order bits gives the code context: X (100110).
      • Each predictor entry holds a context, a valid bit, and the last word-usage history (one bit per word; 4-word blocks are shown for brevity). Initially the table is:

            Context       Valid-Bit   Last Word Usage History
            Y (101001)    1           1 0 0 1
            X (100110)    1           1 1 0 0
            Z (xxxxxx)    0           x x x x

      • A miss occurs due to PC 1001100100. Its context matches entry X, whose history predicts words 1 and 2, so only the 1st and 2nd words are brought into the cache. The evicted cache block had been brought in by a PC with context 101110 and used only its 1st word, so the invalid entry is allocated as Z (101110):

            Context       Valid-Bit   Last Word Usage History
            Y (101001)    1           1 0 0 1
            X (100110)    1           1 1 0 0
            Z (101110)    1           1 0 0 0

      • A second miss occurs due to PC 1011101100. Its context matches entry Z, so only the 1st word is brought. The block evicted this time had been brought in by a PC with context 101001 and used its 2nd and 4th words, so entry Y's history is updated:

            Context       Valid-Bit   Last Word Usage History
            Y (101001)    1           0 1 0 1
            X (100110)    1           1 1 0 0
            Z (101110)    1           1 0 0 0
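The walkthrough above can be condensed into a small software sketch of the CCP lookup/update flow (the 4-word blocks and 6-bit context are taken from the example; the table organization here is an assumption, and the real predictor is a hardware structure, described next):

```python
# Simulation sketch of the CCP flow from the worked example above.
WORDS = 4  # 4-word blocks, as in the example

def ccp_context(pc, bits=6, drop=4):
    # Code context = higher-order bits of the PC (widths from the example).
    return (pc >> drop) & ((1 << bits) - 1)

class CCP:
    def __init__(self):
        self.table = {}  # context -> last word-usage history (1 bit per word)

    def predict(self, pc):
        # On a miss, look up the missing instruction's context and fetch
        # only the predicted words; with no history, fetch the whole block.
        return self.table.get(ccp_context(pc), [1] * WORDS)

    def update(self, fetch_pc, used_bits):
        # On eviction, record which words were actually used, under the
        # context of the PC that originally brought the block in.
        self.table[ccp_context(fetch_pc)] = used_bits

ccp = CCP()
ccp.update(0b1011100000, [1, 0, 0, 0])  # evicted block (context 101110) used word 1
print(ccp.predict(0b1011101100))        # -> [1, 0, 0, 0]: fetch only the 1st word
```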

  22. Predictability of CCP
      • Predictability = correct predictions / total misses
      • [Figure: CCP predictability per benchmark; for comparison, PCP achieves 56% and MCP 67%]
      • The no-prediction rate is almost 0%.

  23. Improving the Predictability
      • Miss Initiator Based History (MIBH): keep separate word-usage histories per offset of the word that initiated the miss.
      • ORing Previous Two Histories (OPTH): bitwise-OR the past two histories.
      (A sketch of both refinements follows.)
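A hedged sketch of how the two refinements change the predictor's bookkeeping (the dictionary layout is an illustrative assumption):

```python
# Sketch of MIBH + OPTH; the data layout is an illustrative assumption.
WORDS = 4

# MIBH: a separate history per (context, miss-initiating word offset),
# since the same code context can miss on different words of a block.
table = {}  # (context, miss_word_offset) -> up to two recent histories

def update(context, miss_word_offset, used_bits):
    hist = table.setdefault((context, miss_word_offset), [])
    hist.append(used_bits)
    del hist[:-2]  # keep only the two most recent histories

def predict(context, miss_word_offset):
    hist = table.get((context, miss_word_offset))
    if not hist:
        return [1] * WORDS  # no history: fetch the whole block
    # OPTH: bitwise-OR the last two histories, trading a little
    # utilization for fewer under-predictions of needed words.
    pred = [0] * WORDS
    for h in hist:
        pred = [a | b for a, b in zip(pred, h)]
    return pred

update(0b100110, 0, [1, 1, 0, 0])
update(0b100110, 0, [1, 0, 1, 0])
print(predict(0b100110, 0))  # -> [1, 1, 1, 0]
```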

  24. Predictability of CCP
      • Using both MIBH and OPTH, the predictability of PCP and MCP rises to about 68% and 75%, respectively.

  25-34. CCP Implementation
      • [Diagram: the CCP predictor table. Each entry holds a valid bit, a context, a Miss Initiator Word Offset (MIWO), and the words-usage history; the table is accessed through read/write ports and a broadcast tag line.]
      • On a lookup, the incoming context is broadcast and compared (=) against all valid entries in parallel; a matching entry supplies the predicted words-usage history for that MIWO.
      MIWO -- Miss Initiator Word Offset

  35. Experimental Setup
      • Cache noise prediction applied to the L1 data cache
      • L1 D-cache: 16KB, 4-way set-associative, 32-byte blocks
      • Unified L2 cache: 512KB, 8-way set-associative, 64-byte blocks
      • L1 I-cache: 16KB, direct-mapped
      • ROB: 256 instructions
      • LSB: 64 entries
      • Issue queue: 96 Int / 64 FP

  36. [Figure: Prediction accuracies with 32/4, 16/8, and 16/4 CCP configurations]

  37. RESULTS

  38. [Figure: Percentage dynamic energy savings]

  39. Prefetching
      • Processors employ prefetching to improve the cache miss rate.
      • On a miss, the next cache block is also fetched, to exploit spatial locality.
      • The prefetched cache block is predicted to have the same usage pattern as the currently fetched block.

  40. Prefetching
      • Option 1 ("Update"): the prefetched cache block updates the predictor table when evicted.
      • Option 2 ("No Update"): the prefetched cache block is stored without any context information:
        • When it is accessed for the first time, the context and offset information are stored.
        • The prefetched block does not update the predictor table when evicted.
      (A sketch of both policies follows.)
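A sketch of the two policies (names and fields are illustrative assumptions, not the paper's exact structures):

```python
# Sketch of prefetched-block handling; fields are illustrative assumptions.
WORDS = 4

class Block:
    def __init__(self, prefetched):
        self.prefetched = prefetched
        self.context = None            # unknown until the first demand access
        self.miss_word_offset = None
        self.used = [0] * WORDS        # words referenced while resident

def on_first_access(block, context, word_offset):
    # A prefetched block acquires its context and miss-initiator offset
    # only when it is referenced for the first time.
    if block.context is None:
        block.context = context
        block.miss_word_offset = word_offset

def on_evict(block, predictor_table, update_prefetched):
    # "Update" policy: prefetched blocks train the predictor on eviction.
    # "No Update" policy: they do not (the next slide compares both).
    if block.prefetched and not update_prefetched:
        return
    if block.context is not None:
        predictor_table[(block.context, block.miss_word_offset)] = block.used
```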

  41. Prediction Accuracy with Prefetching
      • [Figure: prediction accuracy for the No Prefetching, No Update, and Update configurations]
      • Energy consumption is reduced by about 22%.
      • Utilization increases by about 70%.
      • Miss rate increases by only about 2%.

  42. Sub-blocking
      • Sub-blocking is used to:
        • Reduce cache noise
        • Reduce the bandwidth requirement
      • Limitation of sub-blocking: increased miss rate.
      • Can we use cache noise prediction as an alternative to sub-blocking?

  43. Cache Noise Prediction vs Sub-blocking

  44. Conclusion
      • Cache noise is highly predictable.
      • We proposed cache noise predictors; CCP achieves a 75% prediction rate, with 97% of predictions correct, using a small 16-entry table.
      • Prediction has no impact on IPC and minimal impact (0.1%) on the miss rate.
      • It is very effective with prefetching.
      • Compared to sub-blocking, cache-noise-prediction-based fetching improves the miss rate by 97% and utilization by 10%.

  45. QUESTIONS?
      Prateek Pujara & Aneesh Aggarwal
      {prateek, aneesh}@binghamton.edu | http://caps.cs.binghamton.edu
      State University of New York, Binghamton
