Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors
Abhishek Bhattacharjee and Margaret Martonosi

Department of Electrical Engineering

Princeton University

ASPLOS’10

TLB management
  • Hardware-managed TLB
    • No need for expensive interrupts
    • Pipeline remains largely unaffected
    • The OS cannot use an alternative page table design
  • Software-managed TLB
    • Data structure design is flexible, since the OS controls the page table walk
    • The miss handler is itself made of instructions
      • It may itself miss in the instruction cache
    • The data cache may be polluted by the page table walk
Multiprocessor TLB miss
  • CMP maintains per-core instruction and data TLBs.
  • Significant similarities exist in TLB miss patterns among multiple cores.

Predictable TLB Miss Pattern
  • Inter-core Shared (ICS) TLB Misses
    • A miss whose translation was also accessed by a previous miss on any of the other cores, with the same virtual page, physical page, context ID, and page size
    • Leader-Follower prefetching
  • Inter-core Predictable Stride (ICPS) TLB Misses
    • A miss has a stride of S if its virtual page V+S differs by S from the virtual page V of the preceding matching miss
      • Core 0 TLB miss virtual pages: 3, 4, 6, 7
      • Core 1 TLB miss virtual pages: 7, 8, 10, 11
        • On both cores, the distances between successive misses are 1, 2, 1
      • Although the cores miss on different virtual pages, both show the same distance pattern in their misses
    • Distance-based cross-core prefetching
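The distance pattern in the example above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `miss_distances` is hypothetical and does not come from the slides.

```python
def miss_distances(pages):
    """Return the differences between successive TLB-miss virtual pages."""
    return [b - a for a, b in zip(pages, pages[1:])]


# Miss streams taken from the slide's example.
core0 = [3, 4, 6, 7]
core1 = [7, 8, 10, 11]

print(miss_distances(core0))  # [1, 2, 1]
print(miss_distances(core1))  # [1, 2, 1]
```

The two cores never miss on the same sequence of pages, yet the distance sequences are identical, which is the property distance-based prefetching relies on.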
Leader-Follower Prefetching
  • If a core (the leader) takes a TLB miss on a particular virtual page entry, other cores (the followers) will typically also miss on the same virtual page eventually
  • The leader therefore pushes the virtual page entry to the followers
  • The entry is not inserted directly into a follower’s TLB, but into a small, separate Prefetch Buffer (PB)
    • A prefetch may be bad in that it ends up unused
    • A prefetch may also be harmful in that it evicts existing PB entries too early
Leader-Follower Prefetching
  • Case 1
    • D-TLB miss / PB hit on core 0
      • Remove the entry from core 0’s PB
      • Add the entry to core 0’s D-TLB
  • Case 2
    • D-TLB miss / PB miss on core 1
      • The translation is located and refilled into core 1’s D-TLB
      • The translation is also prefetched (pushed) into the PBs of the other cores
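The two cases can be sketched as a small software model. This is a minimal sketch, not the hardware design from the talk: the `Core` class, the `access` function, the LRU/FIFO replacement choices, and the default sizes are all assumptions made for illustration.

```python
from collections import OrderedDict


class Core:
    """Hypothetical per-core model: a D-TLB plus a prefetch buffer (PB)."""

    def __init__(self, tlb_size=4, pb_size=2):
        self.tlb = OrderedDict()  # virtual page -> physical page, LRU order (assumed)
        self.pb = OrderedDict()   # prefetch buffer, FIFO order (assumed)
        self.tlb_size = tlb_size
        self.pb_size = pb_size

    def fill_tlb(self, vpage, ppage):
        self.tlb[vpage] = ppage
        self.tlb.move_to_end(vpage)
        if len(self.tlb) > self.tlb_size:
            self.tlb.popitem(last=False)  # evict the least recently used entry

    def push_prefetch(self, vpage, ppage):
        if vpage in self.tlb or vpage in self.pb:
            return  # translation already present, nothing to prefetch
        self.pb[vpage] = ppage
        if len(self.pb) > self.pb_size:
            self.pb.popitem(last=False)  # evict the oldest PB entry


def access(cores, core_id, vpage, page_table):
    """Model one data access and return 'tlb_hit', 'pb_hit', or 'miss'."""
    core = cores[core_id]
    if vpage in core.tlb:
        core.tlb.move_to_end(vpage)
        return "tlb_hit"
    if vpage in core.pb:
        # Case 1: D-TLB miss / PB hit. Move the entry from the PB to the TLB.
        core.fill_tlb(vpage, core.pb.pop(vpage))
        return "pb_hit"
    # Case 2: D-TLB miss / PB miss. Refill locally, then push to all followers.
    ppage = page_table[vpage]
    core.fill_tlb(vpage, ppage)
    for other_id, other in enumerate(cores):
        if other_id != core_id:
            other.push_prefetch(vpage, ppage)
    return "miss"


cores = [Core(), Core()]
page_table = {v: v + 100 for v in range(16)}  # made-up translations
print(access(cores, 1, 5, page_table))  # miss    (core 1 leads, pushes to core 0)
print(access(cores, 0, 5, page_table))  # pb_hit  (core 0 follows)
print(access(cores, 0, 5, page_table))  # tlb_hit
```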
Leader-Follower Prefetching
  • Prefetch a translation into all the follower cores every time a TLB and PB miss occurs on the leader core
    • This approach may be over-aggressive
  • Confidence estimation
    • 2-bit saturating counters
      • Each core keeps one counter per other core, e.g. core 0 has counters for cores 1 to N-1
      • If a B-bit confidence counter is greater than or equal to 2^(B-1), prefetch to the corresponding follower
Leader-Follower Prefetching
  • Case 1
    • PB hit on core 0; the PB entry is inserted into core 0’s D-TLB
    • Identify the core that initiated the prefetch (core 1)
    • Increment core 1’s confidence counter corresponding to core 0
  • Case 2
    • D-TLB miss / PB miss on core 1
    • Check whether the confidence counter is ≥ 2^(B-1)
    • If core 1’s counter corresponding to core 0 meets this threshold, core 1 pushes the translation into core 0’s PB
  • Case 3
    • A PB entry is evicted from core N-1 without being used
    • Core N-1 sends a “bad prefetch” message to the core that initiated the entry (core 1)
    • Core 1’s counter corresponding to core N-1 is decremented
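The three cases amount to a table of saturating counters indexed by (leader, follower). The sketch below is one possible reading; the class and method names are hypothetical, and starting every counter at the threshold value is an assumption, since the slides do not state the initial value.

```python
class ConfidenceTable:
    """Hypothetical B-bit saturating counters, one per (leader, follower) pair."""

    def __init__(self, num_cores, bits=2):
        self.max_value = (1 << bits) - 1   # 3 for 2-bit counters
        self.threshold = 1 << (bits - 1)   # 2^(B-1), i.e. 2 for 2-bit counters
        # Assumption: counters start at the threshold so prefetching is enabled.
        self.counters = [[self.threshold] * num_cores for _ in range(num_cores)]

    def should_prefetch(self, leader, follower):
        # Case 2: push only if the counter is at or above 2^(B-1).
        return self.counters[leader][follower] >= self.threshold

    def on_pb_hit(self, leader, follower):
        # Case 1: the follower used a prefetch initiated by the leader.
        value = self.counters[leader][follower]
        self.counters[leader][follower] = min(value + 1, self.max_value)

    def on_bad_prefetch(self, leader, follower):
        # Case 3: the follower evicted the leader's prefetch without using it.
        value = self.counters[leader][follower]
        self.counters[leader][follower] = max(value - 1, 0)


table = ConfidenceTable(num_cores=4)
table.on_bad_prefetch(leader=1, follower=3)
print(table.should_prefetch(1, 3))  # False: counter dropped below the threshold
table.on_pb_hit(leader=1, follower=3)
print(table.should_prefetch(1, 3))  # True: counter is back at the threshold
```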
Distance-Based Cross-Core Prefetching
  • Although the cores are missing on different virtual pages, they can both have the same distance pattern in their misses
  • Record repetitive distance-pairs to find the next predicted distance and hence the next virtual pages.
    • Find the stride patterns
Distance-Based Cross-Core Prefetching
  • 1. On a PB miss, calculate the current distance (current TLB miss virtual page - last TLB miss virtual page)
  • 2. Look up the distance table (DT) using the current distance and the last distance
  • 3. The DT extracts predicted future distances from the stored distance-pairs
    • (1,2), (2,1)……
  • 4. The predicted distances are used to calculate the corresponding virtual pages, whose translations are inserted into the PB
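The four steps can be sketched as follows. This is an illustrative model under stated assumptions, not the authors' hardware: it assumes one distance table shared by all cores, that the current distance is used to look up predictions while the (last, current) pair is used to update the table, and that at most two predictions are kept per distance. All names are hypothetical.

```python
from collections import defaultdict


class DistanceTable:
    """Hypothetical shared DT: maps a distance to the distances that followed it."""

    def __init__(self, max_predictions=2):
        self.next_distances = defaultdict(list)
        self.max_predictions = max_predictions  # assumed capacity per entry

    def update(self, last_distance, current_distance):
        # Record the distance-pair (last, current), most recent first.
        followers = self.next_distances[last_distance]
        if current_distance in followers:
            followers.remove(current_distance)
        followers.insert(0, current_distance)
        del followers[self.max_predictions:]

    def predict(self, current_distance):
        return list(self.next_distances.get(current_distance, []))


class CoreState:
    """Per-core history needed to compute distances."""

    def __init__(self):
        self.last_page = None
        self.last_distance = None


def on_pb_miss(table, state, vpage):
    """Return the virtual pages to prefetch after a PB miss on vpage."""
    predicted_pages = []
    if state.last_page is not None:
        current = vpage - state.last_page                 # step 1
        predicted = table.predict(current)                # steps 2 and 3
        predicted_pages = [vpage + d for d in predicted]  # step 4
        if state.last_distance is not None:
            table.update(state.last_distance, current)
        state.last_distance = current
    state.last_page = vpage
    return predicted_pages


table = DistanceTable()
core0, core1 = CoreState(), CoreState()
for page in [3, 4, 6, 7]:          # core 0 trains the table with (1,2), (2,1)
    on_pb_miss(table, core0, page)
on_pb_miss(table, core1, 7)
print(on_pb_miss(table, core1, 8))   # [10]
print(on_pb_miss(table, core1, 10))  # [11]
```

With the miss streams from the earlier example, core 0's misses store the pairs (1,2) and (2,1), and core 1's later misses on pages 10 and 11 are both predicted one miss ahead.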
Result

16 entries in PB, Average 46%
