
Increasing the Cache Efficiency by Eliminating Noise


Presentation Transcript


  1. Increasing the Cache Efficiency by Eliminating Noise Philip A. Marshall

  2. Outline • Background • Motivation for Noise Prediction • Concepts of Noise Prediction • Implementation of Noise Prediction • Related Work • Prefetching • Data Profiling • Conclusion

  3. Background • Cache Fetch • On Cache Miss • Prefetch • Exploiting Spatial Locality • Cache words are fetched in blocks • Fetch neighboring block(s) on a cache miss • Results in fewer cache misses • Fetches words that aren’t needed

  4. Background • Cache noise • Words that are fetched into the cache but never used • Cache utilization • The fraction of words in the cache that are used • Represents how efficiently the cache is used
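As a rough sketch (not from the slides), cache utilization follows directly from per-word usage flags collected when blocks are evicted:

```python
def cache_utilization(usage_bits_per_block):
    """Fraction of fetched words that were actually used.

    usage_bits_per_block: one list of 0/1 flags per evicted block,
    one flag per word in that block (illustrative data layout).
    """
    used = sum(sum(bits) for bits in usage_bits_per_block)
    total = sum(len(bits) for bits in usage_bits_per_block)
    return used / total if total else 0.0

# Two 4-word blocks; 5 of the 8 fetched words were used
print(cache_utilization([[1, 1, 0, 1], [1, 0, 1, 0]]))  # 0.625
```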

  5. Motivation for Noise Prediction • Level 1 data cache utilization is only ~57% for the SPEC2K benchmarks [2] • Fetching unused words: • Increases bandwidth requirements between cache levels • Increases hardware and power requirements • Wastes valuable cache space [2] D. Burger et al., "Memory bandwidth limitations of future microprocessors," Proc. ISCA-23, 1996

  6. Motivation for Noise Prediction • Cache block size • Larger blocks • Exploit spatial locality better • Reduce cache tag overhead • Increase bandwidth requirements • Smaller blocks • Reduced cache noise • Any block size results in suboptimal performance

  7. Motivation for Noise Prediction • Sub-blocking • Only portions of the cache blocks are fetched • Decreases tag overhead by associating one tag with many sub-blocks • Words fetched must be in contiguous blocks of fixed size • High miss-rate and cache noise for non-contiguous access patterns
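A toy sketch of why fixed-size sub-blocks over-fetch for non-contiguous access patterns (block and sub-block sizes here are illustrative, not the paper's):

```python
def subblock_mask(needed_mask, words_per_subblock=2, block_words=8):
    """Round an arbitrary word-usage bitmask up to whole sub-blocks."""
    mask = 0
    sub = (1 << words_per_subblock) - 1
    for i in range(0, block_words, words_per_subblock):
        if needed_mask & (sub << i):      # any needed word in this sub-block?
            mask |= sub << i              # then fetch the whole sub-block
    return mask

# Needing only words 0 and 5 still forces two full 2-word sub-blocks:
# two extra (noisy) words are fetched
assert subblock_mask(0b00100001) == 0b00110011
```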

  8. Motivation for Noise Prediction • By predicting which words will actually be used, cache noise can be reduced • But: • Fetching fewer words could increase the number of cache misses

  9. Concepts of Noise Prediction • Selective fetching • For each block, fetch only the words that are predicted to be accessed • If no prediction is available, fetch the entire block • Uses a valid bit and a usage bit per word to track which words have been fetched and which have been used
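A minimal sketch of the per-word valid/usage bookkeeping described above; the class and field names are hypothetical, not from the paper:

```python
class CacheBlock:
    """Block with per-word valid and usage bits for selective fetching."""

    def __init__(self, n_words, fetch_mask):
        self.n_words = n_words
        self.valid = fetch_mask   # bit i set -> word i was fetched
        self.used = 0             # bit i set -> word i was accessed

    def access(self, word):
        bit = 1 << word
        if not (self.valid & bit):
            return False          # miss: the predictor skipped this word
        self.used |= bit          # record usage to train the predictor
        return True

# Fetch only words 0 and 2 of a 4-word block, as a prediction suggested
blk = CacheBlock(4, 0b0101)
assert blk.access(0) and blk.access(2)
assert not blk.access(1)          # word 1 was never fetched -> miss
```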

  10. Concepts of Noise Prediction • Cache Noise Predictors • Phase Context Predictor (PCP) • Based on the usage pattern of the most recently evicted block • Memory Context Predictor (MCP) • Based on the MSBs of the memory address • Code Context Predictor (CCP) • Based on the MSBs of the PC
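A toy code-context predictor indexed by the MSBs of the PC; the table organization and bit widths here are illustrative only (MCP would be identical but keyed on the memory address instead):

```python
class CodeContextPredictor:
    """CCP sketch: word-usage history keyed by the upper bits of the PC."""

    def __init__(self, context_bits=8, pc_bits=32):
        self.shift = pc_bits - context_bits   # keep only the MSBs
        self.table = {}                       # context -> usage bitmask

    def predict(self, pc):
        # None means "no prediction": fall back to fetching the whole block
        return self.table.get(pc >> self.shift)

    def update(self, pc, usage_mask):
        self.table[pc >> self.shift] = usage_mask

ccp = CodeContextPredictor()
ccp.update(0x12345678, 0b0011)
print(ccp.predict(0x12340000))  # same MSBs, so 0b0011 (= 3)
print(ccp.predict(0x99990000))  # different context -> None
```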

  11. Concepts of Noise Prediction • Prediction table size • Larger tables decrease the probability of “no predictions” • Smaller tables use less power • A prediction is considered successful if all the needed words are fetched • If extra words are fetched, still considered a success
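The success criterion above reduces to a bitmask check (a sketch, not the paper's hardware): a prediction fails only if some needed word was left unfetched.

```python
def prediction_successful(predicted_mask, needed_mask):
    """Success iff every needed word was fetched; extra fetched
    words waste bandwidth but do not count as a failure."""
    return (needed_mask & ~predicted_mask) == 0

assert prediction_successful(0b1110, 0b0110)      # superset: success
assert not prediction_successful(0b0110, 0b0111)  # word 0 missing: failure
```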

  12. Concepts of Noise Prediction • Improving Prediction • Miss Initiator Based History (MIBH) • Keep separate histories according to which word in the block caused the miss • Improves predictability if relative position of words accessed is fixed • Example: looping through a struct and accessing only one field
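A sketch of MIBH as a history table keyed on the pair (context, miss-initiating word) rather than the context alone; names and table layout are illustrative:

```python
class MIBHPredictor:
    """Separate usage history per miss-initiating word offset."""

    def __init__(self):
        self.table = {}  # (context, miss_word) -> usage bitmask

    def predict(self, context, miss_word):
        return self.table.get((context, miss_word))

    def update(self, context, miss_word, usage_mask):
        self.table[(context, miss_word)] = usage_mask

p = MIBHPredictor()
# A loop over an array of structs that always touches the field at word 1
p.update(context=0x42, miss_word=1, usage_mask=0b0010)
print(p.predict(0x42, 1))  # 2: same miss initiator, history hits
print(p.predict(0x42, 3))  # None: different miss initiator, no history
```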

  13. Concepts of Noise Prediction • Improving Prediction • OR-ing Previous Two Histories (OPTH) • Increases predictability by looking at more than the most recent access • Reduces cache utilization • OR-ing more than two accesses reduces utilization substantially
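OPTH amounts to OR-ing the two most recent usage bitmasks (a sketch): more words are predicted, so fewer needed words are missed, but some of the extra words go unused, which is why utilization drops.

```python
def opth_prediction(prev_history, prev_prev_history):
    """Predict the union of the last two usage histories."""
    return prev_history | prev_prev_history

# Two recent accesses used different words; predict both sets
assert opth_prediction(0b0011, 0b0100) == 0b0111
```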

  14. Results • Empirically, CCP provides the best results • MIBH greatly increases predictability • OPTH improves predictability only marginally while increasing cache noise • Cache utilization increased from 57% to 92%

  15. Results

  16. Results

  17. Related Work • Existing work focuses on reducing cache misses, not on improving utilization • Sub-blocked caches are used mainly to decrease tag overhead • Some existing work predicts which sub-blocks to load in a sub-blocked cache • No existing techniques predict and fetch non-contiguous words

  18. Related Work

  19. Prefetching • Prefetching improves the cache miss rate • Commonly, prefetching is implemented by also fetching the next block on a cache miss • Prefetching increases cache noise and bandwidth requirements

  20. Prefetching • Noise prediction leads to more intelligent prefetching but requires extra hardware • On average, prefetching with noise prediction leads to less energy consumption • In the worst case, energy requirements increase

  21. Prefetching

  22. Data Profiling • For some benchmarks, few predictions are made • The predictor table is too small to hold all the word-usage histories • Instead of increasing the table size, profile the data • Profiling increases the prediction rate by ~7% • Gains aren't as high as expected

  23. Data Profiling

  24. Analysis of Noise Prediction • Pros • Small increase in miss rate (0.1%) • Decreased power requirements in most cases • Decreased bandwidth requirements between cache levels • Adapts effective block size to access patterns • Dynamic technique, but profiling can be used • Scalable to different predictor sizes

  25. Analysis of Noise Prediction • Cons • Increased hardware overhead • Increases power in the worst case • Not all programs benefit • Profiling provides limited improvement

  26. Other Thoughts • How were the benchmarks chosen? • 6 of 12 integer and 8 of 14 floating-point SPEC2K benchmarks were used • Not all predictors were examined equally • The 22-bit MCP performed slightly worse than the 28-bit CCP • What about a 28-bit MCP? • How can the efficiency of the prediction table be increased?
