
Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines


Presentation Transcript


  1. Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines
Moinuddin K. Qureshi, M. Aater Suleman, Yale N. Patt
HPCA 2007

  2. Introduction
• Caches are organized at linesize granularity: this helps when spatial locality is high, but leaves unused words when spatial locality is low
• Unused words occupy space without contributing to cache hits
• Filtering out unused words allows the cache to store more cache lines

  3. Problem: Not all words are useful
• A 64B cache line is divided into 8 words of 8B each (1MB 8-way L2 cache)
• [Figure: average number of words used per line, per benchmark]
• On average, fewer than 60% of the words in a line are used (4.7 of 8)

  4. Goal: Improving cache performance
• A smaller linesize can result in fewer unused words, but it degrades cache performance: a 32B linesize increases MPKI for 14 of 16 benchmarks, and average MPKI increases by 25%
• Goal: improve cache performance by filtering out unused words
• Insight: word usage stabilizes as a line traverses the recency stack from MRU to LRU

  5. Insight
• Footprint = 8 bits per line that track which words are used
• [Figure: distribution of the maximum recency-stack position (MRU to LRU) at which a line's footprint is updated; the positions near MRU dominate (78%, 11%, 6%, 5%)]
• Most footprint updates occur early in the recency stack
• Line Distillation (LDIS): evict the unused words when a line crosses a certain recency position
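To make the footprint idea concrete, here is a minimal Python sketch of per-line word-usage tracking. The 8-bit footprint for a 64B line of 8B words comes from the slides; the class and method names are illustrative, not the paper's hardware interface.

```python
WORDS_PER_LINE = 8  # 64B line / 8B words, as on slide 3

class LineFootprint:
    """Tracks which 8B words of a 64B line have been touched (8-bit footprint)."""

    def __init__(self):
        self.bits = 0  # one bit per word

    def record_access(self, byte_offset):
        word = (byte_offset // 8) % WORDS_PER_LINE
        self.bits |= 1 << word

    def used_words(self):
        return bin(self.bits).count("1")

    def unused_words(self):
        return [w for w in range(WORDS_PER_LINE) if not (self.bits >> w) & 1]

# Example: a line where only words 0 and 7 were touched (like line A on slide 8)
fp = LineFootprint()
fp.record_access(0)    # word 0
fp.record_access(56)   # word 7
assert fp.used_words() == 2
assert fp.unused_words() == [1, 2, 3, 4, 5, 6]
```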

  6. Outline • Background • Line Distillation • Experimental Evaluation • Interaction with Compression • Related Work and Summary

  7. Framework for LDIS
• [Figure: processor with ICACHE and a sectored DCACHE above the L2 Distill Cache; lines arriving from memory are installed in the LOC, and each entry carries valid bits and a footprint]
• The Distill Cache splits the L2 into a Line Organized Cache (LOC) and a Word Organized Cache (WOC)

  8. Distill Cache (Operation)
• Example state: line A was distilled with only A0 and A7 used, so A1-A6 were evicted and A0, A7 were installed in the WOC
• Four cases:
• Cache Miss (access to line D): install line D in the LOC and update the LRU state
• LOC Hit (access to line B): same as a traditional cache
• WOC Hit (access to word A0 of line A): send A0 and A7, with their valid bits, to the L1
• Hole Miss (access to word A1 of line A): invalidate all words of A in the WOC, fetch A from memory, and install it in the LOC
• [Figure: the same accesses on a traditional 4-way cache, with line A (A0, A7 used) at the LRU position]
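The four cases can be summarized as a small lookup routine. This is a minimal software sketch, assuming `loc` and `woc` are dictionaries keyed by line address; the case names and actions follow the slide, but the data layout and helper names are hypothetical.

```python
def distill_cache_access(addr, word, loc, woc, fetch_from_memory):
    """Model one access to the distill cache (the four cases of slide 8).

    loc: line address -> {'data': [8 words], 'footprint': int}
    woc: line address -> {word index: value}  (only the distilled words)
    """
    if addr in loc:
        # LOC Hit: behaves like a traditional cache.
        line = loc[addr]
        line['footprint'] |= 1 << word          # keep the footprint up to date
        return line['data'][word]

    if addr in woc:
        if word in woc[addr]:
            # WOC Hit: the retained words (and their valid bits) go to the L1.
            return woc[addr][word]
        # Hole Miss: the requested word was distilled away.
        # Invalidate all words of the line in the WOC and refetch the full line.
        del woc[addr]
        loc[addr] = {'data': fetch_from_memory(addr), 'footprint': 1 << word}
        return loc[addr]['data'][word]

    # Cache Miss: install the full line in the LOC (LRU update omitted here).
    loc[addr] = {'data': fetch_from_memory(addr), 'footprint': 1 << word}
    return loc[addr]['data'][word]
```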

  9. Median Threshold Filtering
• A line with many used words can evict several lines from the WOC
• [Figure: line X, with all 8 words used (X0-X7), is installed in the WOC and evicts the words of 8 other lines (A0, B0, ..., H0)]
• Increase the number of lines held in the WOC by not installing lines whose used-word count exceeds a threshold K
• K = median number of words used per LOC line (computed at runtime)
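A small sketch of the filtering decision follows. The slide only states that K is the runtime median of used words per LOC line; keeping a sliding window of recent used-word counts to compute that median is an assumption of this sketch, and the names are illustrative.

```python
import statistics

class MedianThresholdFilter:
    """Decide whether an evicted LOC line's used words go into the WOC."""

    def __init__(self, history=64):
        self.history = history
        self.recent_counts = []   # used-word counts of recently evicted LOC lines

    def observe_eviction(self, used_word_count):
        self.recent_counts.append(used_word_count)
        if len(self.recent_counts) > self.history:
            self.recent_counts.pop(0)

    def should_install_in_woc(self, used_word_count):
        if not self.recent_counts:
            return True
        k = statistics.median(self.recent_counts)   # threshold K
        # Skip lines with many used words: they would evict several WOC lines.
        return used_word_count <= k

# Example: a fully used line (8/8 words) is filtered once the median settles lower.
mtf = MedianThresholdFilter()
for count in [2, 3, 2, 4, 3]:
    mtf.observe_eviction(count)
assert mtf.should_install_in_woc(8) is False
assert mtf.should_install_in_woc(2) is True
```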

  10. Outline • Background • Line Distillation • Experimental Evaluation • Interaction with Compression • Related Work and Summary

  11. Methodology
• Configuration:
• L2 cache: 1MB, 8-way, 64B linesize (the distill cache gives 6 ways to the LOC and 2 ways to the WOC)
• Out-of-order processor with 16KB 2-way L1s
• 400-cycle memory
• Benchmarks:
• 15 SPEC2K benchmarks + health from the Olden suite (a 250M-instruction slice chosen with SimPoint for SPEC2K)

  12. Results
• [Figure: % reduction in L2 MPKI for LDIS without MT and LDIS with MT, per benchmark]
• LDIS (MT) reduces MPKI by 25%

  13. Reverter Circuit (RC)
• Tournament selection: distill cache vs. traditional cache
• Dynamic set sampling with 32 sets [Qureshi+ ISCA'06] (storage overhead of the ATD: 1KB)
• [Figure: sampled sets (B, E, G) are also tracked in an auxiliary tag directory (ATD) managed with LRU; their relative miss counts update a saturating counter SCTR]
• For the remaining (follower) sets A, C, D, F, H: if SCTR > 75%, enable LDIS; if SCTR < 25%, disable LDIS
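A minimal set-dueling sketch of the reverter circuit. The sampled sets, the ATD with LRU, and the 75%/25% decision points come from the slide; treating SCTR as a saturating counter that ATD misses increment and distill-cache misses decrement, compared against fractions of its maximum, is an assumption about the exact mechanism.

```python
class ReverterCircuit:
    """Tournament selection between the distill cache and a traditional (LRU) cache."""

    def __init__(self, sampled_sets, sctr_max=1024):
        self.sampled_sets = set(sampled_sets)   # e.g. 32 sampled sets
        self.sctr = sctr_max // 2               # saturating counter, start in the middle
        self.sctr_max = sctr_max
        self.ldis_enabled = True

    def on_sampled_access(self, set_index, distill_miss, atd_lru_miss):
        """Called for accesses to sampled sets, which are also looked up in the ATD."""
        if set_index not in self.sampled_sets:
            return
        # Assumption: ATD (traditional LRU) misses push SCTR up,
        # distill-cache misses push it down.
        if atd_lru_miss:
            self.sctr = min(self.sctr + 1, self.sctr_max)
        if distill_miss:
            self.sctr = max(self.sctr - 1, 0)
        # Slide: enable LDIS if SCTR > 75%, disable if SCTR < 25%.
        if self.sctr > 0.75 * self.sctr_max:
            self.ldis_enabled = True      # distill cache is winning on sampled sets
        elif self.sctr < 0.25 * self.sctr_max:
            self.ldis_enabled = False     # traditional cache is winning; revert
```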

  14. Results with RC
• [Figure: % reduction in L2 MPKI for LDIS (MT, no RC) and LDIS (MT, RC), per benchmark]
• The RC disables LDIS when it would increase MPKI
• LDIS (MT, RC) reduces MPKI by 30%

  15. Overheads
• Storage: tags for the WOC + footprint bits (12.2% overhead)
• Latency: tag access (LOC + WOC) increases by one cycle; WOC hits incur two cycles to rearrange words
• Power: additional power for the WOC tag store

  16. IPC Results
• [Figure: % IPC improvement per benchmark]
• LDIS improves average IPC by 12%

  17. Outline • Background • Line Distillation • Experimental Evaluation • Interaction with Compression • Related Work and Summary

  18. Compression vs. LDIS
• Several proposals increase cache capacity via compression
• Compression and LDIS are fundamentally different: compression exploits redundancy in the stored data, while LDIS leverages unused words for spare capacity
• Footprint Aware Compression (FAC) combines both: FAC compresses the used words before installing them in the WOC
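A hedged sketch of the FAC idea: gather only the used words of an evicted line (per its footprint) and compress that smaller payload before installing it in the WOC. The slide does not specify a compression algorithm, so `zlib` here is only a software stand-in for whatever hardware scheme is used, and the function names are illustrative.

```python
import zlib

WORDS_PER_LINE = 8
WORD_BYTES = 8

def fac_install(line_bytes, footprint):
    """Footprint Aware Compression: compress only the used words of a line.

    line_bytes: the 64-byte line as bytes
    footprint:  8-bit mask of used words
    Returns the compressed payload to install in the WOC (illustrative only).
    """
    used = bytearray()
    for w in range(WORDS_PER_LINE):
        if (footprint >> w) & 1:
            used += line_bytes[w * WORD_BYTES:(w + 1) * WORD_BYTES]
    # Dropping unused words exploits footprint information; compressing the
    # remainder exploits redundancy in the stored data. FAC does both.
    return zlib.compress(bytes(used))

# Example: a line where only words 0 and 7 are used, both holding zeros.
line = bytes(64)
payload = fac_install(line, footprint=0b10000001)
assert len(payload) < 16   # the 2 used words (16B) compress further
```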

  19. Results for FAC
• [Figure: % reduction in L2 MPKI for Compression, LDIS, and FAC]
• Compression and LDIS interact positively
• FAC reduces MPKI by 50%

  20. Outline • Background • Line Distillation • Experimental Evaluation • Interaction with Compression • Related Work and Summary

  21. Related work
• Spatial-Temporal Cache - Gonzales+ [ICS'95]
• Spatial Locality Prediction - Johnson+ [ISCA'97]
• Variable Linesize Cache - Veidenbaum+ [ICS'99]
• Spatial Footprint Prediction - Kumar+ [ISCA'98], Pujara+ [HPCA'06]
• Spatial Pattern Prediction - Chen+ [HPCA'05]
• LDIS is particularly suited to large caches and outperforms predictor-based techniques without requiring a separate structure for tracking spatial footprints

  22. Contributions
• Line Distillation: filter unused words without a separate footprint predictor
• Distill cache: utilize the extra capacity created by LDIS
• Median Threshold Filtering and Reverter Circuit: improve the performance and robustness of LDIS (result: LDIS with MT+RC reduces MPKI by 30%)
• Footprint Aware Compression: LDIS + compression (result: FAC reduces MPKI by 50%)

  23. Questions

  24. Result comparing capacity

  25. Line Size vs. MPKI

  26. Distribution of Hit-Miss

  27. Average words usage (detailed)

  28. Result for 3 types of LDIS

  29. Replacement
• LRU in the LOC
• The WOC needs variable-sized replacement
• Only power-of-two sizes are allowed in the WOC
• Placement is constrained to an alignment boundary
• Random selection when there are multiple candidates
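A minimal sketch of the WOC allocation constraints described above: round the word count up to a power of two, require the placement to be aligned to that size within the WOC line, and pick randomly among the candidate slots. Representing the WOC line as a free/used bitmap and the function names are illustrative assumptions.

```python
import random

WOC_LINE_WORDS = 8

def round_up_pow2(n):
    p = 1
    while p < n:
        p <<= 1
    return p

def choose_woc_slot(free_mask, needed_words, rng=random):
    """Pick an aligned, power-of-two sized slot in a WOC line.

    free_mask: 8-bit mask, bit set = word slot is free
    Returns (start_word, size) or None if no free aligned slot exists
    (in which case a victim would have to be chosen, e.g. at random).
    """
    size = round_up_pow2(needed_words)             # only power-of-two sizes allowed
    candidates = []
    for start in range(0, WOC_LINE_WORDS, size):   # alignment-boundary placement
        slot_bits = ((1 << size) - 1) << start
        if free_mask & slot_bits == slot_bits:
            candidates.append(start)
    if not candidates:
        return None
    return rng.choice(candidates), size            # random pick among candidates

# Example: 3 used words round up to a 4-word slot; with words 0-3 occupied,
# the only aligned candidate is the upper half of the WOC line.
assert choose_woc_slot(free_mask=0b11110000, needed_words=3) == (4, 4)
```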

  30. Background (pictorial)

  31. Result LDIS vs. FAC (detailed)

  32. Comparison with SFP

  33. Appendix A: Other SPEC Benchmarks

  34. Appendix B: Cache Size vs. Density

  35. Summary
• Many words in cache lines remain unused
• Unused words are unlikely to be accessed in the less-recent part of the LRU stack, which motivates Line Distillation (LDIS)
• The distill cache utilizes the extra capacity created by LDIS
• LDIS reduces MPKI by 30% and improves IPC by 12%
• Footprint Aware Compression combines LDIS and compression to reduce MPKI by 50%
