
Shared Last-Level TLBs for Chip Multiprocessors




Presentation Transcript


  1. Shared Last-Level TLBs for Chip Multiprocessors Abhishek Bhattacharjee, Daniel Lustig, Margaret Martonosi HPCA 2011 Presented by: Apostolos Kotsiolis CS 7123 – Research Seminar

  2. Translation Lookaside Buffer
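A TLB caches recent virtual-to-physical page translations so that most memory accesses avoid a costly page-table walk. As background for the slides that follow, here is a minimal sketch in Python; the 4 KiB page size, 64-entry capacity, and fully associative LRU organization are illustrative assumptions, not the paper's configuration:

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages

class TLB:
    """Tiny fully associative TLB with LRU replacement (illustrative sketch)."""
    def __init__(self, entries=64):
        self.entries = entries
        self.map = OrderedDict()   # virtual page number -> physical frame number
        self.hits = self.misses = 0

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.map:
            self.hits += 1
            self.map.move_to_end(vpn)          # hit: refresh LRU position
        else:
            self.misses += 1                   # miss: walk the page table
            if len(self.map) >= self.entries:
                self.map.popitem(last=False)   # evict least recently used entry
            self.map[vpn] = page_table[vpn]
        return self.map[vpn] * PAGE_SIZE + offset
```

Every miss pays the page-walk latency, which is why the rest of the talk is about reducing L1 TLB misses.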

  3. Contribution • SLL TLB design explored for the first time • Analyze SLL TLB benefits for parallel programs • Analyze multiprogrammed workloads consisting of sequential applications

  4. Previous and Related Work • Private multilevel TLB hierarchies: Intel i7, AMD K7/K8/K10, SPARC64-III • No sharing between cores, so resources are wasted • Inter-core cooperative prefetching targets two types of predictable misses: • Inter-Core Shared (ICS) misses, handled by Leader-Follower prefetching • Inter-Core Predictable Stride (ICPS) misses, handled by Distance-Based Cross-Core prefetching
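Leader-Follower prefetching targets ICS misses: when one core (the leader) misses and fills a translation, that entry is pushed into the other cores' prefetch buffers, since threads of a parallel program often touch the same pages soon after one another. A hedged sketch of the idea; the class and method names are mine, and the small FIFO prefetch buffers are modeled with bounded OrderedDicts:

```python
from collections import OrderedDict

class LeaderFollower:
    """Sketch of Leader-Follower inter-core TLB prefetching (names assumed)."""
    def __init__(self, num_cores, buffer_entries=16):
        self.buffer_entries = buffer_entries
        # One small prefetch buffer per core: VPN -> PFN, kept in FIFO order.
        self.buffers = [OrderedDict() for _ in range(num_cores)]

    def on_fill(self, leader, vpn, pfn):
        """Leader core filled (vpn, pfn) after a miss: push it to followers."""
        for core, buf in enumerate(self.buffers):
            if core == leader:
                continue
            if len(buf) >= self.buffer_entries:
                buf.popitem(last=False)        # drop the oldest prefetch
            buf[vpn] = pfn

    def probe(self, core, vpn):
        """On a follower's TLB miss, check its prefetch buffer first."""
        return self.buffers[core].pop(vpn, None)
```

Unlike a shared structure, this keeps per-core buffers and must decide when pushing entries is worthwhile, which is part of the complexity the SLL TLB avoids.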

  5. Shared Last-Level TLBs • Exploit inter-core sharing in parallel programs • Flexible about where entries can be placed • Benefit both parallel and sequential workloads • Higher hit rate • Boosted CPU performance

  6. Shared Last-Level TLBs
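The SLL TLB is a single set-associative structure accessed on any core's L1 TLB miss, with entries not partitioned across cores, so a translation filled by one core can later service another core's miss. A sketch of such a structure; the 128-set, 4-way geometry and LRU policy here are assumptions for illustration, not the paper's exact parameters:

```python
from collections import OrderedDict

class SharedLastLevelTLB:
    """Sketch of a set-associative last-level TLB shared by all cores."""
    def __init__(self, num_sets=128, ways=4):
        self.ways = ways
        # Each set holds VPN -> PFN entries in LRU order.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _set(self, vpn):
        return self.sets[vpn % len(self.sets)]   # simple modulo set indexing

    def lookup(self, vpn):
        s = self._set(vpn)
        if vpn in s:
            s.move_to_end(vpn)     # hit: refresh LRU position
            return s[vpn]
        return None                # miss: caller walks the page table

    def fill(self, vpn, pfn):
        s = self._set(vpn)
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the LRU way in this set
        s[vpn] = pfn
```

Because entries are not tied to the core that inserted them, a fill triggered by core 0's miss can be hit by core 1, which is exactly the inter-core sharing the design exploits.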

  7. Shared Last-Level TLBs with simple Stride Prefetching
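The augmented design adds simple stride prefetching: on an SLL TLB miss for virtual page V, translations for pages at small fixed strides from V are installed alongside the demand fill. A sketch with the SLL TLB modeled as a plain VPN-to-PFN dict for brevity; the particular stride set used here is an assumption, not necessarily the paper's exact distances:

```python
def sll_miss_with_stride_prefetch(sll, vpn, page_table, strides=(1, 2)):
    """Sketch: on an SLL TLB miss for `vpn`, install the demand translation
    and prefetch translations at fixed strides. `sll` is a VPN -> PFN dict."""
    sll[vpn] = page_table[vpn]                    # demand fill
    for stride in strides:
        neighbor = vpn + stride
        if neighbor in page_table and neighbor not in sll:
            sll[neighbor] = page_table[neighbor]  # prefetch fill
    return sll[vpn]
```

The prefetches exploit the spatial locality behind ICPS misses without any cross-core coordination, which is why this combines naturally with the shared structure.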

  8. Methodology • Parallel applications • A different sequential application on each core • Two distinct evaluation sets

  9. Methodology • Benchmarks

  10. SLL TLBs: Parallel Workload Results • SLL TLBs versus Private L2 TLBs

  11. SLL TLBs: Parallel Workload Results • SLL TLBs versus ICC Prefetching

  12. SLL TLBs: Parallel Workload Results • SLL TLBs versus ICC Prefetching

  13. SLL TLBs: Parallel Workload Results • SLL TLBs with Simple Stride Prefetching

  14. SLL TLBs: Parallel Workload Results • SLL TLBs at Higher Core Counts

  15. SLL TLBs: Parallel Workload Results • Performance Analysis

  16. SLL TLBs: Multiprogrammed Workload Results • Multiprogrammed Workloads with One Application Pinned per Core

  17. SLL TLBs: Multiprogrammed Workload Results • Performance Analysis

  18. Conclusion: Benefits • On parallel workloads: • Eliminate 7-79% of L1 TLB misses by exploiting parallel programs' inter-core sharing • Outperform conventional per-core private L2 TLBs by an average of 27% • Improve CPI by up to 0.25 • On multiprogrammed sequential workloads: • Improve over private L2 TLBs by an average of 21% • Improve CPI by up to 0.4

  19. Thank You! Questions?
