Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture

Presentation Transcript


  1. Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
     Dhruba Chandra, Fei Guo, Seongbeom Kim, Yan Solihin
     Electrical and Computer Engineering, North Carolina State University
     HPCA-2005

  2. Cache Sharing in CMP
     [Figure: Processor Core 1 and Processor Core 2, each with its own L1 $, above a shared L2 $]
     Chandra, Guo, Kim, Solihin - Contention Model

  3. Impact of Cache Space Contention
     • Application-specific (what)
     • Coschedule-specific (when)
     • Significant: up to 4X cache misses, 65% IPC reduction
     Need a model to understand cache sharing impact

  4. Related Work
     • Uniprocessor miss estimation:
         Cascaval et al., LCPC 1999
         Chatterjee et al., PLDI 2001
         Fraguela et al., PACT 1999
         Ghosh et al., TOPLAS 1999
         J. Lee et al., HPCA 2001
         Vera and Xue, HPCA 2002
         Wassermann et al., SC 1997
     • Context switch impact on time-shared processor:
         Agarwal, ACM Trans. on Computer Systems, 1989
         Suh et al., ICS 2001
     • No model for cache sharing impact:
         • Relatively new phenomenon: SMT, CMP
         • Many possible access interleaving scenarios

  5. Contributions
     • Inter-thread cache contention models
         • 2 heuristic models (refer to the paper)
         • 1 analytical model
     • Input: circular sequence profiling for each thread
     • Output: predicted number of cache misses per thread in a co-schedule
     • Validation
         • Against a detailed CMP simulator
         • 3.9% average error for the analytical model
     • Insight
         • Temporal reuse patterns determine the impact of cache sharing

  6. Outline
     • Model Assumptions
     • Definitions
     • Inductive Probability Model
     • Validation
     • Case Study
     • Conclusions

  8. Assumptions
     • One circular sequence profile per thread
         • Average profile yields high prediction accuracy
         • Phase-specific profile may improve accuracy
     • LRU replacement algorithm
         • Others are usually LRU approximations
     • Threads do not share data
         • Mostly true for serial apps
         • Parallel apps: threads likely to be impacted uniformly

  10. Definitions
      • seqX(dX,nX) = sequence of nX accesses to dX distinct addresses by a thread X to the same cache set
      • cseqX(dX,nX) (circular sequence) = a sequence in which the first and the last accesses are to the same address
      Example access stream: A B C D A E E B
      • seq(5,8): the whole stream
      • cseq(4,5): A B C D A
      • cseq(1,2): E E
      • cseq(5,7): B C D A E E B
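As a rough illustration of these definitions, the sketch below extracts every circular sequence from the access stream of one cache set and tallies it by (d, n). The function name `circular_sequences` and the Counter-based profile are illustrative assumptions, not the authors' profiling tool.

```python
from collections import Counter

def circular_sequences(trace):
    """Count circular sequences cseq(d, n) in the access stream of one cache set.

    A circular sequence starts at an access to some address and ends at the
    next access to that same address. Returns a Counter keyed by (d, n).
    """
    last_seen = {}      # address -> index of its most recent access
    counts = Counter()
    for i, addr in enumerate(trace):
        if addr in last_seen:
            window = trace[last_seen[addr]: i + 1]
            counts[(len(set(window)), len(window))] += 1
        last_seen[addr] = i
    return counts

# The example stream from the slide.
profile = circular_sequences(list("ABCDAEEB"))
# Yields cseq(4,5), cseq(1,2) and cseq(5,7), once each.
print(dict(profile))
```

The quadratic window rebuild keeps the sketch short; a real profiler would more likely maintain an LRU stack per set.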

  11. Circular Sequence Properties
      • Thread X runs alone in the system:
          • Given a circular sequence cseqX(dX,nX), the last access is a cache miss iff dX > Assoc
      • Thread X shares the cache with thread Y:
          • If a sequence of intervening accesses seqY(dY,nY) occurs during cseqX(dX,nX)’s lifetime, the last access of thread X is a miss iff dX+dY > Assoc
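Both properties reduce to one comparison against the associativity. A minimal sketch, using the hypothetical helper name `last_access_is_miss` and relying on the talk's assumptions (LRU replacement, no data shared between threads):

```python
def last_access_is_miss(d_x, assoc, d_y=0):
    """Return True if the last access of cseqX(d_x, n_x) misses.

    d_y is the number of distinct addresses thread Y touches in the same set
    during the circular sequence's lifetime; d_y = 0 means X runs alone.
    """
    return d_x + d_y > assoc

# Running alone in a 4-way set: cseqX(2, 3) ends in a hit.
alone = last_access_is_miss(d_x=2, assoc=4)
# Sharing the set with 3 distinct intervening addresses: 2 + 3 > 4, a miss.
shared = last_access_is_miss(d_x=2, assoc=4, d_y=3)
print(alone, shared)
```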

  12. Example
      • Assume a 4-way associative cache
      X’s circular sequence cseqX(2,3): A B A
      Y’s intervening access sequence: U V V W
      • No cache sharing: A is a cache hit
      • Cache sharing: is A a cache hit or miss?

  13. Example
      • Assume a 4-way associative cache
      X’s circular sequence cseqX(2,3): A B A
      Y’s intervening access sequence: U V V W
      • Interleaving A U B V V A W: seqY(2,3) intervenes in cseqX’s lifetime, so the last A is a cache hit
      • Interleaving A U B V V W A: seqY(3,4) intervenes in cseqX’s lifetime, so the last A is a cache miss
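The two interleavings can be checked by replaying them through a small LRU model of a single 4-way set. This is a sketch only; `lru_hits` is a hypothetical helper, not part of the authors' simulator.

```python
from collections import OrderedDict

def lru_hits(trace, assoc):
    """Replay `trace` through one LRU cache set; return a hit flag per access."""
    lru = OrderedDict()   # keys ordered from least to most recently used
    flags = []
    for addr in trace:
        hit = addr in lru
        if hit:
            lru.move_to_end(addr)          # refresh recency
        else:
            if len(lru) >= assoc:
                lru.popitem(last=False)    # evict the least recently used line
            lru[addr] = None
        flags.append(hit)
    return flags

# Interleaving 1: Y contributes U, V, V before X's last A (seqY(2,3)).
first = lru_hits(list("AUBVVAW"), assoc=4)
# Interleaving 2: Y contributes U, V, V, W before X's last A (seqY(3,4)).
second = lru_hits(list("AUBVVWA"), assoc=4)
print(first[5], second[6])   # hit flag of X's last access to A in each case
```

In the second interleaving W evicts A, which is the least recently used line at that point, so the final access to A misses.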

  15. Inductive Probability Model
      • For each cseqX(dX,nX) of thread X
          • Compute Pmiss(cseqX): the probability that the last access is a miss
      • Steps:
          • Compute E(nY): expected number of intervening accesses from thread Y during cseqX’s lifetime
          • For each possible dY, compute P(seq(dY, E(nY))): probability of occurrence of seq(dY, E(nY))
          • If dY + dX > Assoc, add to Pmiss(cseqX)
      • Misses = old_misses + ∑ Pmiss(cseqX) × F(cseqX)
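The steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `p_miss`, `predicted_misses` and `p_seq_y_for` are hypothetical, the distribution P(seq(dY, E(nY))) is taken as a supplied input, and circular sequences with dX > Assoc are skipped on the assumption that old_misses already counts them.

```python
def p_miss(d_x, assoc, p_seq_y):
    """Probability that the last access of cseqX(d_x, n_x) becomes a miss.

    p_seq_y maps each possible d_y to P(seq(d_y, E(n_y))).
    """
    return sum(p for d_y, p in p_seq_y.items() if d_x + d_y > assoc)

def predicted_misses(old_misses, profile, assoc, p_seq_y_for):
    """Misses = old_misses + sum of Pmiss(cseqX) * F(cseqX).

    profile maps (d_x, n_x) to the frequency F of that circular sequence.
    p_seq_y_for(d_x, n_x) returns the d_y distribution for that sequence.
    """
    extra = 0.0
    for (d_x, n_x), freq in profile.items():
        if d_x > assoc:
            continue  # assumed to be counted in old_misses already
        extra += p_miss(d_x, assoc, p_seq_y_for(d_x, n_x)) * freq
    return old_misses + extra

# Toy numbers, invented purely for illustration.
toy_profile = {(2, 3): 100, (5, 7): 10}
toy_dist = lambda d_x, n_x: {1: 0.5, 2: 0.3, 3: 0.2}
total = predicted_misses(old_misses=10, profile=toy_profile, assoc=4,
                         p_seq_y_for=toy_dist)
print(total)
```

With these toy inputs only dY = 3 pushes cseqX(2,3) past the 4-way limit, so 20% of its 100 occurrences are added to the 10 existing misses.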

  16. Computing P(seq(dY, E(nY)))
      • Basic idea:
          • P(seq(d,n)) = A * P(seq(d-1,n-1)) + B * P(seq(d,n-1))
          • Where A and B are transition probabilities
          • Detailed steps in paper
      [Figure: seq(d,n) is reached from seq(d-1,n-1) by adding 1 access to a distinct address, or from seq(d,n-1) by adding 1 access to a non-distinct address]
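A minimal sketch of the induction drawn on this slide: seq(d,n) is reached either from seq(d-1,n-1) by one access to a new address, or from seq(d,n-1) by one access to an address already seen. The slide defers the actual transition probabilities to the paper, so the caller-supplied `p_distinct(d)` (the chance that the next access is to a new address after d distinct ones) is a placeholder assumption, as is the function name `seq_probabilities`.

```python
def seq_probabilities(n, p_distinct):
    """Return {d: P(seq(d, n))} for d = 1..n by induction on sequence length.

    p_distinct(d) is a hypothetical transition probability: the chance that
    the next access goes to a new address when d distinct ones were seen.
    """
    probs = {1: 1.0}   # one access always touches exactly one address
    for length in range(2, n + 1):
        nxt = {}
        for d in range(1, length + 1):
            # Grow from seq(d-1, length-1) via an access to a distinct address.
            grow = p_distinct(d - 1) * probs.get(d - 1, 0.0) if d > 1 else 0.0
            # Stay at d from seq(d, length-1) via a non-distinct access.
            stay = (1.0 - p_distinct(d)) * probs.get(d, 0.0)
            nxt[d] = grow + stay
        probs = nxt
    return probs

# Constant 0.5 is an arbitrary choice for demonstration only.
probs = seq_probabilities(3, lambda d: 0.5)
print(probs)
```

Whatever transition probabilities are plugged in, the returned values for a given n should sum to 1, which is a convenient sanity check.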

  18. Validation
      • SESC simulator
          • Detailed CMP + memory hierarchy
      • 14 co-schedules of benchmarks (Spec2K and Olden)
      • Co-schedule terminated when an app completes

  19. Validation
      Error = (PM-AM)/AM
      • Larger errors occur when the miss increase is very large
      • Overall, the model is accurate
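The error metric is a signed relative error. In the sketch below PM and AM are read as predicted and actual (simulated) miss counts, which is an interpretation of the abbreviations rather than something the slide spells out, and the numbers are invented.

```python
def prediction_error(pm, am):
    """Signed relative error (PM - AM) / AM.

    pm: miss count predicted by the model (assumed meaning of PM).
    am: miss count measured in simulation (assumed meaning of AM).
    """
    return (pm - am) / am

# Hypothetical counts: the model over-predicts by 39 misses out of 1000.
err = prediction_error(pm=1039, am=1000)
print(f"{err:.1%}")
```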

  20. Other Observations
      • Applications grouped by vulnerability to cache sharing impact:
          • Highly vulnerable (mcf, gzip)
          • Not vulnerable (art, apsi, swim)
          • Somewhat / sometimes vulnerable (applu, equake, perlbmk, mst)
      • Prediction error:
          • Very small, except for highly vulnerable apps
          • 3.9% (average), 25% (maximum)
          • Also small for different cache associativities and sizes

  22. Case Study
      • Profile approx. by geometric progression:
          F(cseq(1,*))  F(cseq(2,*))  F(cseq(3,*))  …  F(cseq(A,*))  …
          Z             Zr            Zr^2          …  Zr^A          …
      • Z = amplitude
      • r = common ratio, 0 < r < 1
      • Larger r → larger working set
      • Impact of interfering thread on the base thread?
          • Fix the base thread
          • Interfering thread: vary
          • Miss frequency = # misses / time
          • Reuse frequency = # hits / time
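To see why a larger r means a larger working set, the sketch below builds a geometric profile and measures how much of it already exceeds the associativity when the thread runs alone. It assumes F(cseq(d,*)) = Z * r^(d-1), which matches the first three terms listed, truncates the progression at a hypothetical `max_d`, and uses an 8-way cache purely as an example.

```python
def geometric_profile(z, r, max_d):
    """Frequencies F(cseq(d, *)) = z * r**(d - 1) for d = 1..max_d."""
    return {d: z * r ** (d - 1) for d in range(1, max_d + 1)}

def solo_miss_fraction(profile, assoc):
    """Share of circular sequences with d > assoc (misses when running alone)."""
    total = sum(profile.values())
    return sum(f for d, f in profile.items() if d > assoc) / total

small_ws = solo_miss_fraction(geometric_profile(1000.0, 0.5, 200), assoc=8)
large_ws = solo_miss_fraction(geometric_profile(1000.0, 0.9, 200), assoc=8)
print(f"r=0.5: {small_ws:.4f}  r=0.9: {large_ws:.4f}")
```

For an untruncated progression the fraction is r^Assoc, so with r = 0.9 a much larger share of reuses sits near or beyond the associativity limit than with r = 0.5.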

  23. Base Thread: r = 0.5 (Small WS)
      • Base thread:
          • Not vulnerable to interfering thread’s miss frequency
          • Vulnerable to interfering thread’s reuse frequency

  24. Base Thread: r = 0.9 (Large WS)
      • Base thread:
          • Vulnerable to interfering thread’s miss and reuse frequency

  26. Conclusions
      • New inter-thread cache contention models
      • Simple to use:
          • Input: circular sequence profiling per thread
          • Output: number of misses per thread in co-schedules
      • Accurate
          • 3.9% average error
      • Useful
          • Temporal reuse patterns determine cache sharing impact
      • Future work:
          • Predict and avoid problematic co-schedules
          • Release the tool at http://www.cesr.ncsu.edu/solihin
