
MadCache: A PC-aware Cache Insertion Policy


Presentation Transcript


  1. MadCache: A PC-aware Cache Insertion Policy
  Andrew Nere, Mitch Hayenga, and Mikko Lipasti
  PHARM Research Group, University of Wisconsin – Madison
  June 20, 2010

  2. Executive Summary
  • Problem: Changing hardware and workloads encourage investigation of cache replacement/insertion policy designs
  • Proposal: MadCache uses PC history to choose the cache insertion policy
    • Last-level cache granularity
    • Individual PC granularity
  • Performance improvements over LRU
    • 2.5% IPC improvement (single-threaded)
    • 4.5% weighted speedup and 6% throughput improvement (multithreaded)

  3. Motivation
  • Importance of investigating cache insertion policies
    • Direct effect on performance
    • LRU has dominated hardware designs for many years
  • Changing workloads, levels of caches
  • Shared last-level cache
    • Cache behavior now depends on multiple running applications
    • One streaming thread can ruin the cache for everyone

  4. Previous Work
  • Dynamic insertion policies
    • DIP – Qureshi et al. – ISCA '07
      • Dueling sets select the best of multiple policies
      • Bimodal Insertion Policy (BIP) offers thrash protection (sketched below)
    • TADIP – Jaleel et al. – PACT '08
      • Awareness of other threads' workloads
  • Utilizing program counter information
    • PCs exhibit a useful amount of predictable behavior
    • Dead-block prediction and prefetching – ISCA '01
    • PC-based load miss prediction – MICRO '95
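
The BIP mechanic referenced above is easy to sketch. Below is a minimal, hedged C++ illustration of bimodal insertion as described by Qureshi et al.: new lines go to the LRU position except with a small probability. The 1/32 epsilon and the RNG plumbing are illustrative assumptions, not values from this presentation.

```cpp
// Minimal sketch of the Bimodal Insertion Policy (BIP) from Qureshi et al.,
// ISCA '07, as referenced above. NOTE: the 1/32 epsilon and the RNG are
// illustrative assumptions, not values from this presentation.
#include <random>

enum class InsertPos { MRU, LRU };

// BIP inserts new lines at LRU most of the time, promoting to MRU only
// with a small probability: streaming data ages out quickly (thrash
// protection) while genuinely hot lines eventually take hold.
InsertPos bip_insert_position(std::mt19937& rng) {
    constexpr double kEpsilon = 1.0 / 32.0;        // assumed throttle value
    std::bernoulli_distribution promote(kEpsilon);
    return promote(rng) ? InsertPos::MRU : InsertPos::LRU;
}
```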

  5. MadCache Proposal
  • Problem: With changing hardware and workloads, caches are subject to suboptimal insertion policies
  • Solution: Use PC information to create a better policy
    • Adaptive default cache insertion policy
    • Track PCs to determine the policy at a finer granularity than DIP
    • Filter out streaming PCs
  • Introducing MadCache!

  6. MadCache Design
  • Tracker Sets (a set-classification sketch follows below)
    • Sample the behavior of the cache
    • Enter the PCs into the PC-Predictor Table
    • Determine the default policy of the cache
      • Uses set dueling – Qureshi et al. – ISCA '07
      • LRU vs. Bypassing Bimodal Insertion Policy (BBIP)
  • Follower Sets
    • Majority of the last-level cache
    • Typically follow the default policy
    • Can override the default cache policy (via the PC-Predictor Table)
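
As a rough illustration of the set-dueling split on this slide, the sketch below shows one plausible way to classify a set index as an LRU tracker, a BBIP tracker, or a follower. The 32-set constituency and the specific tracker positions are assumptions; the presentation does not give the exact mapping.

```cpp
// Hedged sketch: classifying a set index as an LRU tracker, a BBIP
// tracker, or a follower set. The 32-set grouping and tracker positions
// are assumptions in the spirit of set dueling, not MadCache's exact map.
#include <cstdint>

enum class SetRole { TrackerLRU, TrackerBBIP, Follower };

// Dedicate a few sets per group to each competing policy; all remaining
// sets follow whichever policy is currently winning the duel.
SetRole classify_set(uint32_t set_index) {
    uint32_t low = set_index & 0x1F;        // position within a 32-set group
    if (low == 0)  return SetRole::TrackerLRU;
    if (low == 16) return SetRole::TrackerBBIP;
    return SetRole::Follower;
}
```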

  7. Tracker and Follower Sets
  [Diagram: the last-level cache is divided into BBIP Tracker Sets, LRU Tracker Sets, and Follower Sets; each tracker line carries a reuse bit and an index into the PC-Predictor table]
  • Tracker Set overhead (a metadata sketch follows below)
    • 1 bit to indicate if the line was accessed again
    • 10/11 bits to index the PC-Predictor table
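
A minimal sketch of the per-line tracker metadata quoted above (1 reuse bit plus a 10/11-bit PC-Predictor index). The bitfield packing is an assumption for clarity.

```cpp
// Per-line metadata in a tracker set (field widths per the slide; the
// bitfield packing itself is an illustrative assumption).
#include <cstdint>

struct TrackerLineMeta {
    uint16_t reuse      : 1;   // set when the line is hit again after insertion
    uint16_t pcpred_idx : 11;  // index of the inserting PC's PC-Predictor entry
};

// On eviction from a tracker set, the reuse bit reports back to the
// PC-Predictor whether this PC's lines tend to be dead on arrival.
```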

  8. MadCache Design
  • PC-Predictor Table (counter-update sketch below)
    • Stores PCs that have accessed the Tracker Sets
    • Tracks behavior history using a counter
      • Decrement if an address is used many times in the LLC
      • Increment if a line is evicted and was never reused
    • Per-PC default policy override
      • LRU (default) plus BBIP override
      • BBIP (default) plus LRU override
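
The counter discipline on this slide can be sketched directly. In the hedged C++ below, the 6-bit width comes from the table diagram on the next slide; the override threshold at the counter midpoint is an assumption.

```cpp
// Hedged sketch of the PC-Predictor counter discipline described above:
// increment when a tracker line is evicted without reuse, decrement when
// it is reused. The 6-bit width matches the table diagram; the midpoint
// override threshold is an assumption.
#include <cstdint>

constexpr uint8_t kCounterMax        = 63;  // 6-bit saturating counter
constexpr uint8_t kOverrideThreshold = 32;  // assumed midpoint

struct PCPredictorEntry {
    uint8_t counter = 0;
};

void on_llc_reuse(PCPredictorEntry& e) {            // address reused in LLC
    if (e.counter > 0) --e.counter;
}

void on_evict_without_reuse(PCPredictorEntry& e) {  // line was dead on arrival
    if (e.counter < kCounterMax) ++e.counter;
}

// A PC whose counter crosses the threshold overrides the default policy
// (e.g., bypass via BBIP when the default is LRU, or vice versa).
bool overrides_default(const PCPredictorEntry& e) {
    return e.counter >= kOverrideThreshold;
}
```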

  9. PC-Predictor Table
  [Diagram: table entries hold a tag of policy + PC (1 + 64 bits) and a counter (6 bits), with the entry count set by a 9-bit index; a hit/miss mux selects between the override and the default policy]
  • In parallel with a cache miss, the PC + current policy index the PC-Predictor (lookup sketch below)
  • If the table hits, follow the PC's override policy
  • If the table misses, follow the global default policy
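
A hedged sketch of this lookup flow: on a miss, the PC and the current default policy probe the table; a hit returns the PC's override policy and a miss falls back to the default. The linear CAM-style search mirrors the CAM mentioned under Future Work; the midpoint threshold is again an assumption.

```cpp
// Hedged sketch of the slide's lookup flow. Entry fields follow the
// slide's tag format (policy bit + PC, plus a 6-bit counter); the linear
// CAM-style search and the midpoint threshold are assumptions.
#include <cstdint>
#include <optional>
#include <vector>

enum class Policy { LRU, BBIP };

struct Entry {
    bool     valid   = false;
    Policy   pol     = Policy::LRU;  // policy bit of the tag
    uint64_t pc      = 0;            // PC portion of the tag
    uint8_t  counter = 0;            // 6-bit behavior counter
};

class PCPredictor {
public:
    explicit PCPredictor(size_t entries) : table_(entries) {}

    // Probed in parallel with a cache miss: a hit yields the PC's
    // override policy, a miss means the caller uses the global default.
    std::optional<Policy> lookup(Policy default_pol, uint64_t pc) const {
        for (const Entry& e : table_) {
            if (e.valid && e.pol == default_pol && e.pc == pc) {
                if (e.counter < 32) return default_pol;  // assumed midpoint
                return default_pol == Policy::LRU ? Policy::BBIP : Policy::LRU;
            }
        }
        return std::nullopt;  // table miss: follow the global default
    }

private:
    std::vector<Entry> table_;
};
```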

  10. Multi-Threaded MadCache
  • Thread-aware MadCache
    • Similar structures to single-threaded MadCache
    • Tracks based on the current policy of the other threads
  • Multithreaded MadCache extensions (tag sketch below)
    • Separate tracker sets for each thread
    • Each thread still tracks LRU and BBIP
    • PC-Predictor table
      • Extended number of entries
      • Indexed by thread ID, policy, and PC
    • Set dueling PER THREAD
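
A small sketch of the multithreaded tag format from the next slide (TID + <P0,P1,P2,P3> + PC, i.e. 2 + 4 + 64 bits). The struct layout and match function are illustrative assumptions.

```cpp
// Hedged sketch of the multithreaded PC-Predictor tag from the next
// slide: TID + <P0,P1,P2,P3> + PC (2 + 4 + 64 bits). The struct layout
// is an assumption; hardware would pack these into a 70-bit tag.
#include <cstdint>

struct MTPredictorTag {
    uint8_t  tid;       // 2 bits: owning thread
    uint8_t  policies;  // 4 bits: one LRU/BBIP bit per thread, <P0..P3>
    uint64_t pc;        // 64 bits: the inserting instruction's PC
};

// Folding every thread's current policy into the tag lets one PC learn
// different behavior depending on what its co-runners are doing.
bool tag_match(const MTPredictorTag& a, const MTPredictorTag& b) {
    return a.tid == b.tid && a.policies == b.policies && a.pc == b.pc;
}
```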

  11. Multi-threaded MadCache
  [Diagram: per-thread default policies (TID-0 through TID-3) feed a PC-Predictor table whose entries hold a tag of TID + <P0,P1,P2,P3> + PC (2 + 4 + 64 bits) and a counter (6 bits), with the entry count set by a 10-bit index; the last-level cache holds TID-0 BBIP Tracker Sets, TID-0 LRU Tracker Sets, the other threads' tracker sets, and Follower Sets, with a hit/miss mux selecting between override and default]

  12. MadCache – Example Application
  • Deep Packet Inspection¹
    • Large match tables (1 MB+) are commonly used for DFA/XFA regular expression matching
    • The incoming byte stream from packets causes different table traversals (see the access-pattern sketch below)
    • The table exhibits reuse between packets
    • Packets are mostly streaming (backtracking is implementation dependent)
  ¹ Evaluating GPUs for Network Packet Signature Matching – ISPASS '09
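
To make the access pattern concrete, here is a hedged sketch of a DFA scan loop: the match table rows are revisited packet after packet (reuse), while each payload byte is touched once (streaming). The table type and sizes are illustrative, not taken from the ISPASS '09 paper.

```cpp
// Hedged sketch of the DPI access pattern on this slide. The table
// representation and sizes are illustrative assumptions.
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// One row of 256 next-state entries per DFA state.
using MatchTable = std::vector<std::array<uint16_t, 256>>;

// Each payload byte is read exactly once (streaming), but the table rows
// it selects are revisited packet after packet (reuse): PCs issuing table
// loads are worth caching; PCs issuing payload loads are not.
uint16_t scan_packet(const MatchTable& table, const uint8_t* payload, size_t len) {
    uint16_t state = 0;
    for (size_t i = 0; i < len; ++i)
        state = table[state][payload[i]];  // table load: high reuse across packets
    return state;                          // final state encodes the match result
}
```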

  13. MadCache – Example Application
  [Diagram: multiple processing elements traverse the shared Match Table as packets stream past]
  • Packets are mostly streaming
  • Frequently accessed Match Table contents are held in L1/L2
  • Less frequently accessed elements live in the LLC/memory

  14. MadCache – Example Application
  • DIP
    • Would favor the BIP policy due to packet data streaming
    • LLC becomes a mixture of Match Table and useless packet data
  • MadCache
    • Would identify the PCs associated with the Match Table as useful
    • LLC populated almost entirely by the Match Table
  [Diagram: the DIP LLC mixes packet data and table data; the MadCache LLC holds almost only table data]

  15. Experimentation
  • 15 benchmarks from SPEC CPU2006
  • 15 workload mixes for multithreaded experiments
  • 200 million cycle simulations

  16. Results – Single-threaded
  [Chart: IPC normalized to LRU]
  • 2.5% improvement across the benchmarks tested
  • Slight improvement over DIP

  17. Results – Multithreaded
  [Chart: throughput normalized to LRU]
  • 6% improvement across the mixes tested
  • DIP performs similarly to LRU

  18. Results
  [Chart: weighted speedup normalized to LRU]
  • 4.5% improvement across the benchmarks tested
  • DIP performs similarly to LRU

  19. Future Work
  • MadderCache?
  • Optimize the size of structures
    • PC-Predictor Table size
    • Replace the CAM with a hashed PC & tag
  • Detailed analysis of benchmarks with MadCache
  • Extend PC predictions
    • Currently do not take sharers into account

  20. Conclusions
  • Cache behavior is still evolving
    • Changing cache levels, sharing, workloads
  • The MadCache insertion policy uses PC information
    • PCs exhibit a useful amount of predictable behavior
  • MadCache performance
    • 2.5% IPC improvement for single-threaded workloads
    • 4.5% weighted speedup, 6% throughput improvement for 4 threads
  • Sized to the competition bit budget
    • Preliminary investigations show little impact from reducing structure sizes

  21. Questions?
