SHiP : Signature-based Hit Predictor for High Performance Caching

SHiP : Signature-based Hit Predictor for High Performance Caching. * Carole-Jean Wu, # Aamer Jaleel , #, + William Hasenplaugh, * Margaret Martonosi, # Simon Steely Jr., #, + Joel Emer * Princeton University # Intel Corporation, VSSAD #,+ MIT.

Presentation Transcript


  1. SHiP: Signature-based Hit Predictor for High Performance Caching
     *Carole-Jean Wu, #Aamer Jaleel, #,+William Hasenplaugh, *Margaret Martonosi, #Simon Steely Jr., #,+Joel Emer
     *Princeton University, #Intel Corporation, VSSAD, #,+MIT
     IEEE/ACM International Symposium on Microarchitecture (MICRO’2011)

  2. Motivation
     • Factors making caching important:
       • Increasing ratio of CPU speed to memory speed
       • Multi-core makes better shared cache management more challenging
     • LRU has been the standard LLC replacement policy
     • However, LRU has problems!

  3. Problems with LRU Replacement
     • A working set larger than the cache causes thrashing
       [Figure: access timeline in which every access misses; labels: Wsize, LLCsize]
     • References to non-temporal data (scans) discard the frequently referenced working set
       [Figure: hit/miss timeline with scans interleaved among working-set accesses; labels: LLCsize, Wsize]
     • Scans occur frequently in commercial workloads
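
The two LRU problems on this slide can be reproduced with a toy simulation. The sketch below is illustrative only: it assumes a fully associative cache of 4 lines and made-up traces, and the names `lru_hits`, `thrash_trace`, and `scan_trace` are hypothetical, not taken from the presentation.

```python
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # promote to most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits


# Thrashing: a cyclic working set of 5 lines in a 4-line cache.
thrash_trace = list(range(5)) * 10
thrash_hits = lru_hits(thrash_trace, capacity=4)

# Scan: hot lines "a" and "b" are reused, then four one-off lines stream through.
hot = ["a", "b"]
scan_trace = hot * 3 + ["s0", "s1", "s2", "s3"] + hot
scan_hits = lru_hits(scan_trace, capacity=4)

print(thrash_hits, scan_hits)
```

In this toy setup the cyclic trace never hits, and the two hot lines miss again once the scan has pushed them out of the cache.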

  4. Desired Behavior from Cache Replacement
     • Working set larger than the cache → preserve some of the working set in the cache
       [Figure: hit/miss timeline; labels: Wsize, LLCsize]
       [ DIP (ISCA’07) and DRRIP (ISCA’10) achieve this effect ]
     • Recurring scans → preserve the frequently referenced working set in the cache
       [Figure: hit timeline with interleaved scans]
       [ SRRIP (ISCA’10) achieves this effect ]

  5. Dynamic Re-Reference Interval Prediction (DRRIP)
     • SRRIP: scan-resistant insertion
     • BRRIP: thrash-resistant insertion
     [Figure: re-reference prediction chain with states 0 (immediate), 1 (intermediate), 2 (far), 3 (distant); arrows labeled insertion, re-reference, and eviction; states 0, 1, and 2 are marked "No Victim" and eviction happens at 3]
     [ Jaleel et al., ISCA’10 ]
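
The state chain on this slide can be sketched in a few lines. This is a minimal sketch of the scan-resistant (SRRIP) variant, assuming the hit-promotion flavor described by Jaleel et al.: a line is inserted as "far" (2), a hit resets it to "immediate" (0), and when no line is "distant" (3) every line is aged by one. The class name `SRRIPSet` and the trace are hypothetical.

```python
class SRRIPSet:
    """One cache set with a 2-bit re-reference prediction value (RRPV) per way."""

    IMMEDIATE, INTERMEDIATE, FAR, DISTANT = 0, 1, 2, 3

    def __init__(self, ways):
        self.tags = [None] * ways
        self.rrpv = [self.DISTANT] * ways  # empty ways are chosen as victims first

    def _find_victim(self):
        while True:
            for way, value in enumerate(self.rrpv):
                if value == self.DISTANT:
                    return way
            # No line is predicted distant: age every line and retry.
            self.rrpv = [value + 1 for value in self.rrpv]

    def access(self, tag, insert_rrpv=2):
        """Return True on a hit. Misses insert with `insert_rrpv` (default: FAR)."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = self.IMMEDIATE
            return True
        way = self._find_victim()
        self.tags[way] = tag
        self.rrpv[way] = insert_rrpv
        return False


# Hot lines "a" and "b" are re-referenced once, then a four-line scan arrives.
cache_set = SRRIPSet(ways=4)
trace = ["a", "b", "a", "b", "s0", "s1", "s2", "s3", "a", "b"]
results = [cache_set.access(tag) for tag in trace]
print(results)
```

In this trace the hot lines survive the scan because they were re-referenced beforehand, which is the condition the following slides show SRRIP depends on.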

  6. SRRIP Not Always Scan Resistant…
     • LONG scans in access pattern
       [Figure: hit/miss timeline comparing a “short” scan with a “long” scan]

  7. SRRIP Not Always Scan Resistant…
     • LONG scans in access pattern
       [Figure: hit/miss timeline comparing a “short” scan with a “long” scan]
     • Active working set MUST be RE-REFERENCED at least ONCE between scans
       [Figure: timeline of misses with back-to-back scans]

  8. SRRIP Not Always Scan Resistant…
     • LONG scans in access pattern
       [Figure: hit/miss timeline comparing a “short” scan with a “long” scan]
     • Active working set MUST be RE-REFERENCED at least ONCE between scans
       [Figure: hit/miss timeline with back-to-back scans]
     • Can We Be More Intelligent in Dealing with Scans?

  9. Closer Look at Scan Access Patterns
     [Figure: access timeline with two scans; some lines are marked "No Future References" and others "Future Reference"]
     • Assuming Perfect Knowledge of Re-Reference Pattern

  10. Improving RRIP on Cache Insertions
      [Figure: RRIP chain with states 0 (immediate), 1 (intermediate), 2 (far), 3 (distant); scan insertions are marked "Improve Insertion"; states 0, 1, and 2 are marked "No Victim" and eviction happens at 3]
      • Need to Assign DIFFERENT Re-Reference Predictions on Cache Insertion

  11. Focus of this Paper…
      • Goal: Learn the re-reference interval of a cache line
        [Figure: cache access → PREDICTOR → re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]
      • How Best to Learn the Re-Reference Interval?

  12. Learning Re-Reference Behavior
      [Figure: access timeline with two scans; annotations: "REFERENCE SAME MEMORY REGION" and "REFERENCED BY SIMILAR SET OF PCs"]
      • Can We Learn Re-References By Correlating Accesses With Some Other Information?

  14. Using Signatures to Correlate Re-Reference
      • Different types of information:
        • Memory Region
        • Memory Instruction PC
        • Instruction Sequence
      • Observation: LLC accesses by the same “signature” tend to have similar re-reference patterns
        [Figure: access timeline with two scans, each tagged with its “signature”]
      • OBSERVE, LEARN and PREDICT the Re-Reference Pattern of a Signature

  15. Observe Signature Re-Reference Behavior
      • Observe re-reference pattern in the baseline cache
        [Figure: Address and Load/Store inputs to the LLC; each line holds Cache Tag, Replacement State, and Coherence State]

  16. Observe Signature Re-Reference Behavior
      • Observe re-reference pattern in the baseline cache
      • Hardware Required (per-line metadata):
        • reuse bit: was the line re-referenced after cache insertion (1 bit)
        • signature_insert: the “signature” responsible for the cache insertion (14 bits)
        [Figure: Signature, Address, and Load/Store inputs to the LLC, with the added metadata per line]
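
The per-line metadata above could be modeled as follows. This is a sketch under assumptions: the slides say the signature is 14 bits but do not give a hash, so the XOR-folding in `pc_signature` is a hypothetical choice, and `LineMeta`, `on_insert`, and `on_hit` are illustrative names.

```python
from dataclasses import dataclass

SIG_BITS = 14
SIG_MASK = (1 << SIG_BITS) - 1


def pc_signature(pc: int) -> int:
    """Fold a program counter into 14 bits by XOR-ing 14-bit chunks (assumed hash)."""
    sig = 0
    while pc:
        sig ^= pc & SIG_MASK
        pc >>= SIG_BITS
    return sig


@dataclass
class LineMeta:
    signature_insert: int  # signature that brought the line into the cache
    reuse: bool = False    # set once the line is re-referenced


def on_insert(pc: int) -> LineMeta:
    """Record the inserting signature; the reuse bit starts cleared."""
    return LineMeta(signature_insert=pc_signature(pc))


def on_hit(meta: LineMeta) -> None:
    """Mark the line as re-referenced after insertion."""
    meta.reuse = True


meta = on_insert(0x7FFF12345678)
on_hit(meta)
print(meta)
```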

  17. Learn Signature Re-Reference Behavior
      • Learn signature re-reference behavior
      • Hardware Required:
        • Signature History Counter Table (SHCT) (16K 2-bit counters)
      • SHCT Training:
        • If evicted line reused: SHCT[signature_insert]++
        • If evicted line NOT reused: SHCT[signature_insert]--
      • counter == 0: signature NOT re-referenced
      • counter != 0: signature re-referenced
      [Figure: SHCT alongside the Last Level Cache (LLC)]
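
The training rule on this slide maps onto a table of saturating counters. The sketch below follows the slide's rule (update on eviction, based on the reuse bit). The initial counter value is not given in the slides, so `init=1` is an assumption, and the class and method names are hypothetical.

```python
SHCT_ENTRIES = 16 * 1024  # 16K entries, as stated on the slide
COUNTER_MAX = 3           # 2-bit saturating counter


class SHCT:
    """Signature History Counter Table: one 2-bit counter per signature."""

    def __init__(self, init=1):  # initial value is an assumption
        self.counters = [init] * SHCT_ENTRIES

    def train_on_eviction(self, signature_insert, reused):
        """Increment if the evicted line was reused, otherwise decrement."""
        index = signature_insert % SHCT_ENTRIES
        if reused:
            self.counters[index] = min(COUNTER_MAX, self.counters[index] + 1)
        else:
            self.counters[index] = max(0, self.counters[index] - 1)

    def predicts_reuse(self, signature):
        """A zero counter means the signature is predicted not re-referenced."""
        return self.counters[signature % SHCT_ENTRIES] != 0


shct = SHCT()
shct.train_on_eviction(signature_insert=42, reused=False)
print(shct.predicts_reuse(42))
```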

  18. Signature-based Hit Predictor (SHiP)
      • Predict re-reference interval of line using SHCT
        [Figure: signature and cache hit/miss → SHiP (SHCT) → re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]

  19. Signature-based Hit Predictor (SHiP)
      • Predict re-reference interval using SHCT on CACHE MISS
      • SHiP Re-Reference Predictions On Miss:
        • if ( SHCT[signature] == 0 ) predict DISTANT (i.e. 3)
        • else predict FAR (i.e. 2)
        [Figure: signature and cache miss → SHiP → re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]

  20. Signature-based Hit Predictor (SHiP)
      • Predict re-reference interval on CACHE HIT
      • SHiP Re-Reference Predictions On Hit:
        • Always predict IMMEDIATE (i.e. 0)
        [Figure: signature and cache hit → SHiP → re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]
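
The miss and hit rules from the last two slides fit in one small function. This is a sketch only; `ship_prediction` and its parameters are hypothetical names, and `shct_counter` stands for the value read from SHCT[signature].

```python
IMMEDIATE, INTERMEDIATE, FAR, DISTANT = 0, 1, 2, 3


def ship_prediction(is_hit, shct_counter):
    """Return the re-reference prediction for an access."""
    if is_hit:
        return IMMEDIATE  # hits are always predicted immediate
    if shct_counter == 0:
        return DISTANT    # signature has shown no reuse
    return FAR            # signature has shown some reuse


for is_hit, counter in [(True, 0), (False, 0), (False, 1), (False, 3)]:
    print(is_hit, counter, ship_prediction(is_hit, counter))
```

Note that under these rules an insertion is never predicted immediate or intermediate; only a hit moves a line to 0.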

  21. SHiP – High Level Architectural Overview
      [Figure: Signature, Address, and Access Type enter the Last Level Cache (LLC); LLC hit/miss, signature_insert, and reuse_bit feed SHCT Training; SHiP reads the SHCT and returns the Re-Reference Prediction; data and hit/miss are returned to the requester]

  22. SHiP – High Level Architectural Overview
      • Per-Line Overhead Can Be Reduced by using Set Sampling (need only 32 - 64 sets)
      [Figure: same block diagram as the previous slide]

  23. SHiP – High Level Architectural Overview
      • Per-Line Overhead Can Be Reduced by using Set Sampling (need only 32 - 64 sets)
      [Figure: same block diagram, annotated "SHCT ~6 KB" and "NO CHANGE"]
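
As a rough check on the storage figures, the arithmetic below combines numbers that appear on the slides: 16K 2-bit SHCT counters, 15 bits of per-line metadata (14-bit signature plus reuse bit), and set sampling. It assumes 64 sampled sets and 16 ways; the slides do not say whether this is the accounting behind the "~6 KB" annotation, so treat it as one plausible breakdown.

```python
SHCT_ENTRIES = 16 * 1024
COUNTER_BITS = 2
SIGNATURE_BITS = 14
REUSE_BITS = 1
WAYS = 16          # assumption: 16-way LLC, as in the evaluation slides
SAMPLED_SETS = 64  # assumption: upper end of the 32 - 64 range

# Counter table: 16K entries of 2 bits each.
shct_bytes = SHCT_ENTRIES * COUNTER_BITS // 8

# Per-line metadata, kept only for lines in the sampled sets.
per_line_bits = SIGNATURE_BITS + REUSE_BITS
sampled_bytes = SAMPLED_SETS * WAYS * per_line_bits // 8

total_bytes = shct_bytes + sampled_bytes
print(shct_bytes, sampled_bytes, total_bytes, round(total_bytes / 1024, 2))
```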

  24. Performance Comparison of Replacement Policies
      • 16-way 2MB LLC, Core i7 type hierarchy
      [Figure: performance comparison chart]
      SHiP Significantly Improves Performance Across All Workload Categories

  25. Performance Comparison of Replacement Policies: CRC Results Comparison
      • 16-way 1MB private cache, 65 single-threaded workloads
      • Averaged across PC games, multimedia, enterprise server, and SPEC CPU2006 workloads
      • 16-way 4MB shared cache, 165 4-core workloads
      [Figure: performance comparison charts with SHiP highlighted]
      SHiP Achieves 2X the Performance Improvement of Prior State-of-the-Art Policies

  26. Total Storage Overhead (16-way Set Associative Cache)
      • LRU: 4 bits / cache block
      • Pseudo-LRU: 1 bit / cache block
      • RRIP [ ISCA’10 ]: 2 bits / cache block
      • Seg-LRU [ CRC’10 ]: ~8 bits / cache block
      • SDBP [ MICRO’10 ]: ~10 bits / cache block
      • SHiP [ MICRO’11 ]: ~5 bits / cache block
      SHiP Outperforms State-of-the-Art with HW Similar to LRU

  27. Summary
      • Scan-resistance is an important problem in commercial workloads
        • State-of-the-art policies do not fully address scan-resistance
      • Signatures help improve re-reference predictions to address scans
        • Need fine-grained re-reference predictions at insertion
      • Proposed a Simple and Practical Scan-Resistant Replacement
        • SHiP significantly outperforms the winner of the CRC Championship
        • SHiP requires less storage than the CRC winner
        • HW overhead of SHiP is comparable to LRU

  28. Q&A

  31. Re-Reference Interval Prediction (RRIP)
      • Scan-resistant insertion
      [Figure: RRIP chain with states 0 (immediate), 1 (intermediate), 2 (far), 3 (distant); arrows labeled insertion, re-reference, and eviction; states 0, 1, and 2 are marked "No Victim"]
      CAN INSERTION BE MORE INTELLIGENT?

  32. Using Signatures to Correlate Re-Reference Behavior
      • Example signatures: Memory Region, Program Counter, Instruction Decode History
      [Figure: access timeline with two scans; accesses are labeled with signatures a, b, c, d and grouped into "No Future Cache Hits" and "Future Cache Hits"]

  33. LRU vs. Re-Reference Interval Prediction (RRIP)
      [Figure: two 8-way sets (physical ways 0 to 7) holding cache tags; the LRU set stores an “LRU Chain” position per way, the RRIP set stores a Re-Reference Prediction per way]
      RRIP Outperforms LRU with Storage Less Than LRU

  34. Signature-based Hit Predictor (SHiP)
      • Goal: Predict the re-reference behavior of a signature
      • Learn Re-Reference Behavior:
        [Figure: Signature, Address, and Access Type enter the LLC; data and hit/miss are returned]
