Characterization and Dynamic Mitigation of Intra-Application Cache Interference

2011 International Symposium on Performance Analysis of Systems and Software (ISPASS). Characterization and Dynamic Mitigation of Intra-Application Cache Interference. Carole-Jean Wu and Margaret Martonosi, Princeton University, 4/11/2011.


Presentation Transcript


  1. 2011 International Symposium on Performance Analysis of Systems and Software (ISPASS). Characterization and Dynamic Mitigation of Intra-Application Cache Interference. Carole-Jean Wu and Margaret Martonosi, Princeton University, 4/11/2011.

  2. Today’s CMP systems. [Diagram: four SMT CPU cores (Core 0 to Core 3), each with private L1I$, L1D$, and L2$, sharing an 8MB L3 cache; also shown are the memory controller, the communication bridge, and the IO & QPI links. The operating system and applications App. 1 to App. 4 run on the cores.]

  3. Within a single application, cache interference can stem from… [Diagram: the four-core system running only App. 1. Four kinds of requests reach the shared 8MB L3 cache: application data loads/stores, hardware prefetch requests, TLB miss handling, and other OS requests.]

  4. Real-System LLC Miss Characterization: more than 50% of LLC misses are due to prefetching, TLB miss handling, other OS references, etc.

  5. Prior Work for Intra-Application Cache Interference • System-induced Cache Interference • Characterization indicates significant OS/user cache interference [Agarwal et al. TOC ’88] [Torrellas et al. ASPLOS ’92] • Reduce TLB miss handling effects [Jacob, Mudge ASPLOS ’98] [Bhargava et al. ASPLOS ’08] [Barr, Cox, and Rixner ISCA ’10] • Prefetch-induced Cache Interference • Prefetch buffer/filter [Peir et al. ICS ’02] [Hur and Lin MICRO ’06] • Replacement policies (prefetch bit per cache line) [Alameldeen and Wood ISCA ’07] [Lin et al. HPCA ’01] • Prefetching algorithms [Ebrahimi et al. MICRO ’09] [Nesbit et al. ISCA ’07] [Iacobovici et al. ICS ’04] • But all require hardware modification

  6. Contributions of This Paper • Cache interference within an application is a problem • Real-system characterization • Detailed full-system simulation • Dynamic management mechanisms • System-aware cache management • Real-system, real-time prefetch manager

  7. Talk Outline • Motivation and Prior Work • Measurement Methodology • Intra-Application Interference Characterization • Dynamic Mitigation of LLC Interference • System-Aware Cache Management • Real-System Dynamic Prefetch Manager • Conclusion

  8. Measurement Methodology • Real-system infrastructure • Intel Nehalem-based Core i7 (Bloomfield) • perfmon2 to access hardware PMCs • Full-system simulation: Simics/GEMS • Benchmarks • SPEC CPU2006 benchmark suite

  9. System-Mode Reference Breakdown: 80% of system references are due to TLB miss handling (details in the paper).

  10. Memory Reuse Characteristics Analysis for User References [Figure; series labels: User, System.] System cache lines destroy the good data locality of user lines when the two share the cache!

  11. Memory Reuse Characteristics Analysis for System References [Figure; series labels: User, System.] The majority of system cache lines are not reused. Bypass system cache lines?
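The reuse observation on slides 10 and 11 can be illustrated with a small trace analysis. This is only a sketch under simplifying assumptions, not the measurement method used in the talk: the trace format (`(address, is_system)` pairs) and the function name `unreused_fraction` are hypothetical, and a line counts as "not reused" if it is touched exactly once in the whole trace, ignoring evictions.

```python
from collections import Counter


def unreused_fraction(trace):
    """Return the fraction of distinct lines touched only once, per class.

    trace: iterable of (address, is_system) pairs (hypothetical format).
    The result is a dict with keys "user" and "system".
    """
    counts = Counter()
    for addr, is_system in trace:
        counts[(addr, is_system)] += 1

    result = {}
    for flag, name in ((False, "user"), (True, "system")):
        per_line = [n for (_, s), n in counts.items() if s == flag]
        single_use = sum(1 for n in per_line if n == 1)
        result[name] = single_use / len(per_line) if per_line else 0.0
    return result


def example_trace():
    """Toy trace: two user lines reused three times, five system lines mostly touched once."""
    trace = [(addr, False) for addr in (0xA, 0xB)] * 3
    trace += [(addr, True) for addr in (0x100, 0x101, 0x102, 0x103, 0x104)]
    trace.append((0x104, True))  # one system line is touched a second time
    return trace
```

A high "system" fraction next to a low "user" fraction is the pattern that would motivate treating system lines differently on insertion.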

  12. System-Aware Cache Management [Figure: a cache set drawn as a recency stack from LRU to MRU, with the incoming reference 0xEEEA.]

  13. System-Aware Cache Management [Figure: the recency stack with the MRU, MID, and LRU positions marked and example line addresses 0x001A, 0xDADA, 0xEEAF, 0x1234, 0xEEEA, 0xDFAE.]

  14. System-Aware Cache Management [Figure: the recency stack with references split into user and system.] SYS-LRUinsert: system lines are inserted at the LRU position.

  15. System-Aware Cache Management [Figure: the recency stack with references split into user and system.] SYS-MIDinsert: system lines are inserted at the MID position.

  16. System-Aware Cache Management [Figure: the recency stack with references split into user and system; example incoming reference 0xBEEF.] SYS-DYNAMIC, using set sampling as in DIP [Qureshi et al. ISCA ’07].
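The static insertion policies named on slides 14 and 15 can be sketched on a single cache set. This is a toy model under stated assumptions, not the authors' simulator: the class `CacheSet`, the helper `user_hits`, the 4-way set, and the synthetic access pattern are all hypothetical, hits are assumed to promote a line to MRU regardless of its origin, and the set-sampling selection used by SYS-DYNAMIC is not modeled.

```python
class CacheSet:
    """One cache set as a recency stack: index 0 is LRU, the last index is MRU."""

    def __init__(self, ways):
        self.ways = ways
        self.stack = []

    def access(self, addr, is_system, policy):
        """Access one line and return True on a hit.

        policy: "LRU" (baseline), "SYS-LRUinsert", or "SYS-MIDinsert".
        """
        if addr in self.stack:
            # Hit: promote to MRU (assumed for both user and system lines).
            self.stack.remove(addr)
            self.stack.append(addr)
            return True
        if len(self.stack) == self.ways:
            self.stack.pop(0)  # evict the LRU line
        if is_system and policy == "SYS-LRUinsert":
            pos = 0                     # system line enters at LRU
        elif is_system and policy == "SYS-MIDinsert":
            pos = len(self.stack) // 2  # system line enters mid-stack
        else:
            pos = len(self.stack)       # default: enter at MRU
        self.stack.insert(pos, addr)
        return False


def user_hits(policy, rounds=10, ways=4):
    """Count user hits when 3 reused user lines share a set with streaming system lines."""
    cache = CacheSet(ways)
    hits = 0
    next_sys = 0
    for _ in range(rounds):
        for user_line in ("u0", "u1", "u2"):
            hits += cache.access(user_line, False, policy)
        for _ in range(2):  # two never-reused system lines per round
            cache.access(f"s{next_sys}", True, policy)
            next_sys += 1
    return hits
```

In this toy pattern, plain LRU insertion lets the streaming system lines push out every user line before it is reused, while inserting system lines at LRU keeps the user working set resident; mid-stack insertion falls in between.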

  17. IPC Performance Improvement: SYS-DYNAMIC improves performance for ALL applications, by as much as 10% (3% on average).

  18. Talk Outline • Motivation and Prior Work • Measurement Methodology • Intra-Application Interference Characterization • Dynamic Mitigation of LLC Interference • System-Aware Cache Management • Real-System Dynamic Prefetch Manager • Conclusion

  19. Intra-application cache interference can also stem from hardware prefetching. [Diagram: four SMT CPU cores (Core 0 to Core 3), each with private L1I$, L1D$, and L2$, sharing an 8MB L3 cache, with the memory controller, communication bridge, and IO & QPI links. Annotations: L1 instruction and streamer prefetchers; mid-level cache (MLC) spatial and streamer prefetchers.]

  20. Intra-Application Interference Caused by Hardware Prefetching: MLC prefetcher OFF → fewer LLC misses for libquantum and sphinx3.

  21. Dynamic Prefetch Management [Timeline: the MLC prefetchers are ON for K instructions (t0 to t1), then OFF for K instructions (t1 to t2), followed by an interval labelled N; RDTSC is read at the interval boundaries.] if (t2 - t1 > t1 - t0) turn ON MLC prefetchers; else turn OFF MLC prefetchers. Use Nehalem’s Precise Event Based Sampling (PEBS) to sample the application instruction count periodically.
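The decision rule on slide 21 can be sketched as a control loop. This is a simulation under stated assumptions rather than the real-system implementation: `FakeMachine`, `manage`, and `choose_prefetch_setting` are hypothetical names, the fake timestamp counter stands in for RDTSC, the interval labelled N is assumed to be an instruction count run with the chosen setting, and the PEBS-based instruction sampling is replaced by calling `run` directly with an instruction count.

```python
class FakeMachine:
    """Stand-in for the hardware: a cycle counter plus a prefetcher switch."""

    def __init__(self, cycles_per_inst_on, cycles_per_inst_off):
        self.cpi = {True: cycles_per_inst_on, False: cycles_per_inst_off}
        self.prefetch_on = True
        self.tsc = 0

    def set_prefetchers(self, on):
        self.prefetch_on = on

    def read_tsc(self):
        return self.tsc

    def run(self, instructions):
        # Advance the fake timestamp counter by the cost of this interval.
        self.tsc += instructions * self.cpi[self.prefetch_on]


def choose_prefetch_setting(t0, t1, t2):
    """Slide rule: turn ON if the OFF interval (t1..t2) took longer than the ON interval (t0..t1)."""
    return (t2 - t1) > (t1 - t0)


def manage(machine, k, n, periods):
    """Probe ON and OFF for k instructions each, then run n instructions with the winner."""
    decisions = []
    for _ in range(periods):
        machine.set_prefetchers(True)
        t0 = machine.read_tsc()
        machine.run(k)
        t1 = machine.read_tsc()
        machine.set_prefetchers(False)
        machine.run(k)
        t2 = machine.read_tsc()
        on = choose_prefetch_setting(t0, t1, t2)
        machine.set_prefetchers(on)
        machine.run(n)
        decisions.append(on)
    return decisions
```

For example, `manage(FakeMachine(2, 1), k=1000, n=100000, periods=3)` models a workload that runs slower with prefetching enabled, so every period keeps the prefetchers off. A tie between the two probe intervals also resolves to OFF, matching the `else` branch on the slide.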

  22. Dynamic Management Mitigating Prefetch-Induced LLC Interference: dynamic modulation of the MLC prefetchers outperforms the static ON/OFF prefetch options.

  23. Summary • Dynamic System-Aware Cache Management • Full-system evaluation (OS effects) • Performance improvement by as much as 10% (3% on average) • Real-time Dynamic Prefetch Manager • Real-system implementation on Nehalem PEBS • 25% LLC miss count reduction → better performance, plus bandwidth and energy savings

  24. Characterization and Dynamic Mitigation of Intra-Application Cache Interference. *Intra-application* cache interference from modern hardware prefetching and the OS influences application performance significantly! [Diagram: the four-core system running App. 1, with application data loads/stores, hardware prefetch requests, TLB miss handling, and other OS requests all reaching the shared 8MB L3 cache.]

  25. 2011 International Symposium on Performance Analysis of Systems and Software (ISPASS). Characterization and Dynamic Mitigation of Intra-Application Cache Interference. Carole-Jean Wu and Margaret Martonosi, {carolewu, mrm}@princeton.edu
