
Optimizing Replication, Communication, and Capacity Allocation in CMPs




Presentation Transcript


  1. Optimizing Replication, Communication, and Capacity Allocation in CMPs Z. Chishti, M. D. Powell, and T. N. Vijaykumar Presented by: Siddhesh Mhambrey Published in Proceedings of the 32nd International Symposium on Computer Architecture, pages 357-368, June 2005.

  2. Motivation
  • Emerging trend toward CMPs
  • New challenges in cache design policies:
  • Increased capacity pressure on on-chip memory: multiple cores need large on-chip capacity
  • Increased cache latencies in large caches due to wire delays
  Need for a cache design that tackles these challenges

  3. Cache Organization
  • Goals:
  • Utilize capacity effectively: reduce capacity misses
  • Mitigate increased latencies: keep wire delays small
  • Shared cache: high capacity but increased latency
  • Private cache: low latency but limited capacity
  Neither private nor shared caches achieve both goals

  4. Latency-Capacity Tradeoff
  • SMPs and DSMs have the same goals in cache design
  • Capacity: CMPs have limited on-chip memories; SMPs have large off-chip memories
  • Latency of accesses: SMPs have slow off-chip access; CMPs have fast on-chip access
  CMPs change the latency-capacity tradeoff in two ways

  5. Novel Mechanisms
  • Controlled Replication: avoid copies for some read-only shared data
  • In-Situ Communication: use fast on-chip communication to avoid coherence misses on read-write shared data
  • Capacity Stealing: allow a core to steal another core's unused capacity
  • Hybrid cache: private tag array and shared data array
  • CMP-NuRAPID (Non-Uniform access with Replacement and Placement using Distance associativity)
  • Performance: CMP-NuRAPID improves performance by 13% over a shared cache and 8% over a private cache for three commercial multithreaded workloads
  Three novel mechanisms to exploit the changes in the latency-capacity tradeoff

  6. CMP-NuRAPID
  • Non-uniform access and distance associativity
  • Caches divided into d-groups
  • D-group preference
  4-core CMP with CMP-NuRAPID

  7. CMP-NuRAPID Organization
  Figure: CMP-NuRAPID tag and data arrays

  8. CMP-NuRAPID Organization
  • Private tag array, shared data array
  • Leverages forward and reverse pointers:
  • A single copy of a block can be shared by multiple tags
  • Data for one core can reside in different d-groups
  Extra level of indirection enables the novel mechanisms
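The private-tag/shared-data indirection can be illustrated with a small Python sketch (a hypothetical model, not the paper's hardware; `HybridCache`, `install`, and `share` are invented names): each core's private tag array holds forward pointers into one shared data array, and each data frame keeps reverse pointers back to the tags that reference it, so one on-chip copy can serve multiple tags.

```python
class DataFrame:
    """One entry in the shared data array."""
    def __init__(self, block, owners):
        self.block = block          # the cached data
        self.owners = set(owners)   # reverse pointers: (core, addr) tags

class HybridCache:
    def __init__(self, num_cores):
        self.tags = [dict() for _ in range(num_cores)]  # private tag arrays: addr -> frame id
        self.data = {}                                   # shared data array: frame id -> DataFrame
        self.next_frame = 0

    def install(self, core, addr, block):
        """Place a block in the shared data array; point this core's tag at it."""
        fid = self.next_frame
        self.next_frame += 1
        self.data[fid] = DataFrame(block, {(core, addr)})
        self.tags[core][addr] = fid
        return fid

    def share(self, core, addr):
        """On a read miss, reuse another core's on-chip copy via a forward
        pointer instead of replicating the data (one copy, two tags)."""
        for other, tag_array in enumerate(self.tags):
            if other != core and addr in tag_array:
                fid = tag_array[addr]
                self.tags[core][addr] = fid           # forward pointer
                self.data[fid].owners.add((core, addr))  # reverse pointer
                return fid
        return None                                    # block not on chip
```

For example, after core 0 installs a block, core 1's `share` call leaves both cores' tags pointing at the same data frame.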

  9. Mechanisms • Controlled Replication • In-Situ Communication • Capacity Stealing

  10. Controlled Replication
  • On a read miss: update the tag pointer to point to the already-on-chip block
  • On a subsequent read: copy the data into the reader's closest d-group to avoid slow accesses in the future
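A minimal sketch of the two-step policy above, assuming hypothetical per-core bookkeeping (`pointed` for blocks read once via a remote pointer, `replicas` for blocks copied locally); this models the behavior, not the paper's hardware:

```python
class ControlledReplication:
    def __init__(self):
        self.home = {}        # addr -> core whose d-group holds the original copy
        self.pointed = set()  # (core, addr): tag points to a remote copy (first read done)
        self.replicas = set() # (core, addr): local replica made (second read done)

    def read(self, core, addr):
        if (core, addr) in self.replicas:
            return "local hit"
        if (core, addr) in self.pointed:
            # Second read: replicate into the reader's closest d-group.
            self.replicas.add((core, addr))
            return "replicated"
        if addr in self.home:
            # First read miss: only update the tag pointer to the remote copy.
            self.pointed.add((core, addr))
            return "remote hit via pointer"
        # Not on chip: fetch from memory into this core's closest d-group.
        self.home[addr] = core
        self.replicas.add((core, addr))
        return "memory fill"
```

The point of the delay is that a block read only once by a core never costs a second data frame; only a re-read earns a replica.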

  11. Mechanisms • Controlled Replication • In-Situ Communication • Capacity Stealing

  12. In-Situ Communication
  • Enforce a single copy of a read-write shared block in L2 and keep the block in the communication (C) state
  • Replace the M-to-S transition with an M-to-C transition
  Fast communication with capacity savings
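The M-to-C idea can be sketched as a tiny per-block state function (a hedged illustration assuming a MESI-like base protocol; the event names are invented, and only the transitions relevant here are shown):

```python
def next_state(state, event):
    """Per-block L2 state transitions relevant to in-situ communication."""
    table = {
        ("M", "remote_read"): "C",  # in-situ: keep one copy instead of making S copies
        ("C", "remote_read"): "C",  # readers access the single on-chip copy in place
        ("C", "owner_write"): "C",  # writer updates the same copy: fast communication
        ("C", "eviction"):    "I",
    }
    return table.get((state, event), state)  # other events leave the state unchanged
```

In a plain MESI protocol the first `remote_read` of a Modified block would fan out Shared copies, and the next write would invalidate them (a coherence miss for the readers); keeping one C-state copy avoids both the replication and the miss.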

  13. Mechanisms • Controlled Replication • In-Situ Communication • Capacity Stealing

  14. Capacity Stealing
  • Demotion: demote less frequently used data to unused frames in d-groups closer to cores with lower capacity demands
  • Promotion: if a tag hit occurs on a block in a farther d-group, promote it
  Data for one core in different d-groups; use of unused capacity in a neighboring core
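The demotion/promotion pair can be modeled with a hypothetical sketch (d-groups as fixed-capacity dictionaries keyed by block address, access counts standing in for "less frequently used"; the class and method names are invented):

```python
class CapacityStealing:
    def __init__(self, num_cores, frames_per_dgroup):
        # One d-group per core, each mapping addr -> access count.
        self.dgroups = [dict() for _ in range(num_cores)]
        self.capacity = frames_per_dgroup

    def insert(self, core, addr):
        """Insert into the core's own d-group; if it is full, demote the
        coldest block into a neighbor's d-group with a free frame."""
        own = self.dgroups[core]
        if len(own) >= self.capacity:
            victim = min(own, key=own.get)         # least-used block
            for other, dg in enumerate(self.dgroups):
                if other != core and len(dg) < self.capacity:
                    dg[victim] = own.pop(victim)   # demotion into a stolen frame
                    break
            else:
                own.pop(victim)                    # no free frame anywhere: evict
        own[addr] = 1

    def access(self, core, addr):
        """Tag hit on a block in a farther d-group promotes it back."""
        if addr in self.dgroups[core]:
            self.dgroups[core][addr] += 1
            return "near hit"
        for dg in self.dgroups:
            if addr in dg:
                count = dg.pop(addr)
                self.insert(core, addr)            # promotion (may demote another block)
                self.dgroups[core][addr] = count + 1
                return "promoted"
        return "miss"
```

A core with heavy demand thus spills cold blocks into a lightly loaded neighbor's frames, and pulls them back to its near d-group only when they are touched again.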

  15. Methodology
  • Full-system simulation of a 4-core CMP using Simics
  • CMP-NuRAPID: 8 MB, 8-way; 4 d-groups, 1 port for each tag array and data d-group
  • Compared to:
  • Private: 2 MB, 8-way, 1 port per core
  • CMP-SNUCA: shared cache with non-uniform access, no replication

  16. Results: multi-threaded workloads and multi-programmed workloads

  17. Summary

  18. Conclusions
  • CMPs change the latency-capacity tradeoff
  • Controlled Replication, In-Situ Communication, and Capacity Stealing are novel mechanisms that exploit the change in the latency-capacity tradeoff
  • CMP-NuRAPID is a hybrid cache that incorporates these novel mechanisms
  • For commercial multi-threaded workloads: 13% better than shared, 8% better than private
  • For multi-programmed workloads: 28% better than shared, 8% better than private

  19. Thank you Questions?
