
PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches

Presentation Transcript


  1. PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches. Yuejian Xie, Gabriel H. Loh, Georgia Institute of Technology. Presented by: Yingying Tian. 36th ACM/IEEE International Symposium on Computer Architecture (ISCA '09).

  2. Manage Shared Caches
• Last Level Caches (LLCs) are shared by all cores in Chip Multi-Processors (CMPs).
• Multiple cores compete for the limited LLC capacity.
[Figure: Core0 and Core1, each with private L1I/L1D caches, share a Last Level Cache (LLC) holding both Core0's and Core1's data.]

  3. LRU is sharing-oblivious: as a cache management policy it leads to poor performance and poor fairness.
• Previous works tried to allocate LLC resources fairly via:
  • Capacity management: way-partitioning (UCP)
  • Dead-time management: LRU insertion (TADIP)
• PIPP: do both capacity and dead-time management better, at the same time!

  4. Outline • Background and Motivation • Previous Work • PIPP • Evaluation • Conclusion

  5. UCP (Utility-Based Cache Partitioning)
[Figure: a shared cache partitioned by ways between two cores; Core 0 gets 5 ways, Core 1 gets 3 ways.]
*Some materials are taken from the original presentation slides.
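To make the utility-based idea concrete, here is a minimal sketch of way allocation in the spirit of UCP, assuming per-core hit curves are available from UMON-style shadow-tag monitors. This is a simplified greedy allocator rather than UCP's actual Lookahead algorithm, and the function name and hit-curve numbers are illustrative, not from the paper.

```python
# Simplified greedy way-allocation in the spirit of UCP.
# hit_curves[c][k] = hits core c would get with k ways (from UMON-style
# shadow-tag monitors). The numbers below are made up for illustration.

def allocate_ways(hit_curves, total_ways):
    n_cores = len(hit_curves)
    alloc = [1] * n_cores                 # give every core at least one way
    for _ in range(total_ways - n_cores):
        # marginal utility of one more way for each core
        gains = [hit_curves[c][alloc[c] + 1] - hit_curves[c][alloc[c]]
                 for c in range(n_cores)]
        winner = gains.index(max(gains))  # core that benefits most
        alloc[winner] += 1
    return alloc

# Illustrative hit curves for an 8-way cache shared by two cores.
curves = [
    [0, 40, 70, 90, 100, 105, 108, 110, 111],  # core 0: keeps benefiting
    [0, 60, 75, 80, 82, 83, 83, 83, 83],       # core 1: saturates early
]
print(allocate_ways(curves, 8))  # -> [5, 3]
```

With these made-up curves the allocator converges on the 5/3 split from the slide, because core 1's hits saturate after about three ways while core 0 keeps benefiting.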

  6. DIP (Dynamic Insertion Policy)
[Figure: a recency stack from MRU to LRU with an incoming block about to be inserted.]

  7. DIP (Dynamic Insertion Policy)
[Figure: the inserted block occupies one cache block for a long time with no benefit!]

  8. DIP (Dynamic Insertion Policy)
[Animation frame: recency stack (MRU to LRU) with an incoming block.]

  9. DIP (Dynamic Insertion Policy)
[Animation frame: recency stack (MRU to LRU).]

  10. DIP (Dynamic Insertion Policy)
[Animation frame: recency stack (MRU to LRU).]

  11. Cache Replacement Policy
• Eviction: which block should be replaced when a cache miss occurs?
  • The LRU block
• Insertion: where should an incoming block be placed in its set?
  • MRU insertion (the default LRU replacement policy)
  • LRU insertion (for dead-on-arrival blocks)
• Promotion: when a block is re-referenced, where should it move?
  • To the MRU position
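A minimal sketch of these three knobs, modeling one set as a recency stack (index 0 = MRU) with the LRU block always chosen as the eviction victim; the class and helper names are illustrative, not from the paper or the slides.

```python
# Minimal recency-stack model of one cache set (index 0 = MRU end).
# Eviction always removes the LRU block; the insertion position and the
# promotion behavior are the policy knobs listed on slide 11.

class CacheSet:
    def __init__(self, ways):
        self.ways = ways
        self.stack = []            # block tags ordered MRU ... LRU

    def access(self, tag, insert_pos, promote):
        if tag in self.stack:                       # hit: apply promotion rule
            promote(self.stack, tag)
            return True
        if len(self.stack) == self.ways:            # miss on a full set: evict LRU
            self.stack.pop()
        self.stack.insert(min(insert_pos, len(self.stack)), tag)
        return False

# Classic LRU behavior: insert at MRU (position 0), promote hits to MRU.
def to_mru(stack, tag):
    stack.remove(tag)
    stack.insert(0, tag)

s = CacheSet(ways=8)
s.access('A', insert_pos=0, promote=to_mru)           # MRU insertion
s.access('B', insert_pos=s.ways - 1, promote=to_mru)  # LRU insertion (dead-on-arrival)
```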

  12. PIPP: Promotion/Insertion Pseudo-Partitioning
• Insertion: given a target partition Π = {π₁, π₂, …, πₙ} with Σᵢ πᵢ = w (w is the cache associativity), core i inserts its incoming block at position πᵢ, counted from the LRU end, so the insert position equals its target allocation. The targets can be computed dynamically, e.g. with UCP-style utility monitors.
• Promotion: on a hit, the block moves one step toward the MRU position with probability P, and stays in place with probability 1 − P.
[Figure: recency stack (MRU to LRU) showing a new block inserted at position 3 (= target allocation), a hit block promoted one step, and an eviction from the LRU end.]
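Building on the CacheSet sketch above, here is a hedged sketch of PIPP's two rules: each core inserts at its target position counted from the LRU end, and a hit moves the block only a single step toward MRU with some probability. The probability value and the exact position arithmetic are assumptions for illustration, not values given on the slides.

```python
import random

# Sketch of PIPP insertion/promotion on top of the CacheSet model above.
# pi[core] is that core's target allocation in ways; P_PROM is an assumed
# promotion probability, not a value stated on the slides.
P_PROM = 0.75

def pipp_promote(stack, tag):
    # On a hit, move the block one step toward MRU with probability P_PROM;
    # otherwise leave it where it is.
    idx = stack.index(tag)
    if idx > 0 and random.random() < P_PROM:
        stack[idx - 1], stack[idx] = stack[idx], stack[idx - 1]

def pipp_access(cache_set, core, tag, pi):
    # Insert pi[core] slots above the LRU end of the stack.
    insert_pos = cache_set.ways - pi[core]
    return cache_set.access(tag, insert_pos, pipp_promote)
```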

  13. PIPP Example (Core0 quota: 5 blocks, Core1 quota: 3 blocks; digits are Core0's blocks, letters are Core1's)
[Figure: Core1 requests D and misses; D is inserted at position 3 from the LRU end, Core1's quota.]

  14. PIPP Example
[Figure: Core0 requests block 6 and misses; 6 is inserted at position 5 from the LRU end, Core0's quota.]

  15. PIPP Example
[Figure: Core0 requests block 7 and misses; 7 is inserted at Core0's quota position and the LRU block is evicted.]

  16. PIPP Example
[Figure: Core1 requests D again and hits; D is promoted one step toward MRU.]

  17. PIPP Example
[Figure: Core1 requests E and misses; E is inserted at position 3 from the LRU end, Core1's quota.]

  18. PIPP Example
[Figure: Core0 requests block 2 again and hits; 2 is promoted one step toward MRU.]
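A short driver using the sketches above, in the spirit of the example on slides 13-18: with quotas of 5 and 3 ways in an 8-way set, Core0's misses land 5 slots from the LRU end, Core1's land 3 slots from it, and a repeated request hits and is promoted one step. The specific tags and request order are placeholders, not the slide's exact sequence.

```python
# Quotas: Core0 gets 5 of 8 ways, Core1 gets 3 (digits = Core0, letters = Core1).
s = CacheSet(ways=8)
pi = {0: 5, 1: 3}

requests = [(0, '1'), (0, '2'), (1, 'A'), (0, '3'), (1, 'B'),
            (0, '4'), (1, 'C'), (0, '5'),
            (1, 'D'),   # miss: D inserted 3 slots from the LRU end
            (1, 'D')]   # hit: D promoted one step toward MRU (probabilistic)
for core, tag in requests:
    hit = pipp_access(s, core, tag, pi)
    print(f"core{core} {tag}: {'hit ' if hit else 'miss'} -> {s.stack}")
```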

  19. Pseudo-Partition Benefit
[Figure: strict partition: each core has its own recency stack (MRU0/LRU0 and MRU1/LRU1), so a new block can only displace blocks from the requesting core's own partition.]

  20. Pseudo-Partition Benefit
[Figure: pseudo-partition: all blocks share a single MRU-to-LRU stack, so capacity that one core is not using can drift to the other core.]
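Continuing the driver above, a small illustration of the benefit on slides 19-20 as it plays out in this sketch (a property of the model, not a measurement from the paper): if Core1 stops re-referencing its blocks while Core0 keeps missing, Core1's blocks sink toward LRU and are evicted, so Core0 temporarily occupies more than its 5-way quota.

```python
# Core1 goes idle; Core0 keeps streaming in new blocks.
for tag in ['6', '7', '8', '9']:
    pipp_access(s, 0, tag, pi)

core0_ways = sum(1 for t in s.stack if t.isdigit())
print(f"Core0 now holds {core0_ways} of 8 ways (its quota was 5)")
```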

  21. Methodology
• SimpleScalar simulator for x86, Intel Core 2-style processor
• 32KB, 8-way, 3-cycle L1I and L1D per core
• Shared 4MB, 16-way, 11-cycle LLC
• Multi-programmed workloads from SPEC CPU benchmarks (2-core and 4-core workloads)
• 500M instructions of warmup, 250M instructions of simulation

  22. Evaluation: 2-Core Weighted Speedup
[Chart: weighted speedup for 2-core workloads, grouped into UCP-friendly and TADIP-friendly mixes.]
PIPP outperforms LRU by 19.0%, UCP by 10.6%, and TADIP by 10.1%.
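For reference, weighted speedup (the metric in slides 22-23) is commonly computed as the sum over programs of each program's IPC when sharing the cache divided by its IPC when running alone; the slides do not spell this out, so treat the definition and the IPC numbers below as an assumption for illustration.

```python
# Weighted speedup: sum over cores of IPC_shared / IPC_alone.
def weighted_speedup(ipc_shared, ipc_alone):
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))

print(weighted_speedup([0.9, 1.4], [1.2, 1.6]))  # illustrative IPC values -> 1.625
```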

  23. 4-Core Weighted Speedup
[Chart: weighted speedup for 4-core workloads, grouped into UCP-friendly and TADIP-friendly mixes.]
PIPP outperforms LRU by 21.9%, UCP by 12.1%, and TADIP by 17.5%.

  24. Occupancy Control
For most workloads, PIPP's deviation from the target allocation is within 1.0 way, similar to UCP.

  25. Conclusion
• A novel proposal for insertion and promotion policies
• A single unified mechanism provides both capacity and dead-time management
• Outperforms prior UCP and TADIP

  26. Thank you! Questions?
