
Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors

Abhishek Bhattacharjee and Margaret Martonosi, Department of Electrical Engineering, Princeton University. ASPLOS’10.





Presentation Transcript


  1. Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors • Abhishek Bhattacharjee and Margaret Martonosi • Department of Electrical Engineering, Princeton University • ASPLOS’10

  2. TLB management • Hardware-managed TLB • No need for expensive interrupts • Pipeline remains largely unaffected • OS cannot employ alternative page table designs • Software-managed TLB • Data structure design is flexible since the OS controls the page table walk • The miss handler is itself made up of instructions • It may itself miss in the instruction cache • The data cache may be polluted by the page table walk

  3. Multiprocessor TLB miss • CMP maintains per-core instruction and data TLBs. • Significant similarities exist in TLB miss patterns among multiple cores.

  4. Predictable TLB Miss Patterns • Inter-Core Shared (ICS) TLB Misses • A TLB miss is ICS if its translation was accessed by a previous miss on any of the other cores with the same virtual page, physical page, context ID, and page size • Exploited by Leader-Follower prefetching • Inter-Core Predictable Stride (ICPS) TLB Misses • A TLB miss has a stride of S if its virtual page V+S differs by S from the virtual page V of the preceding matching miss • Core 0 TLB miss virtual pages: 3, 4, 6, 7 • Core 1 TLB miss virtual pages: 7, 8, 10, 11 • The distances between successive misses are 1, 2, 1 on both cores • Although the cores are missing on different virtual pages, they both have the same distance pattern in their misses • Exploited by Distance-Based Cross-Core prefetching
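The distance pattern in the example above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `miss_distances` is hypothetical and does not come from the presentation.

```python
def miss_distances(miss_pages):
    """Return the differences between consecutive TLB-miss virtual pages."""
    return [later - earlier for earlier, later in zip(miss_pages, miss_pages[1:])]


# Miss streams taken from the slide.
core0_misses = [3, 4, 6, 7]
core1_misses = [7, 8, 10, 11]

print(miss_distances(core0_misses))  # [1, 2, 1]
print(miss_distances(core1_misses))  # [1, 2, 1]
```

The two cores never miss on the same page at the same point in the sequence, yet their distance lists are identical, which is the property the distance-based prefetcher relies on.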

  5. Leader-Follower Prefetching • If a core (the leader) TLB misses on a particular virtual page entry, other cores (the followers) will also typically TLB miss on the same virtual page eventually • Push the virtual page entry to the followers • Do not insert it directly into the TLB; instead insert it into a small, separate Prefetch Buffer (PB) • A bad prefetch may be wasted because it is never used • A prefetch may also be harmful if it evicts existing PB entries too early

  6. Leader-Follower Prefetching • Case 1 • D-TLB miss / PB hit on core 0 • Remove the entry from core 0’s PB • Add the entry to its D-TLB • Case 2 • D-TLB miss / PB miss on core 1 • The translation is located and refilled into the D-TLB • It is also prefetched (pushed) into the PBs of the other cores
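The two cases above can be sketched as a small software model. This is a minimal sketch under stated assumptions, not the hardware design from the presentation: the `Core` class, the `access` function, the `page_table` dictionary, the FIFO replacement in the PB, and the unbounded TLB are all hypothetical simplifications.

```python
from collections import OrderedDict

PB_ENTRIES = 16  # assumed PB capacity for this sketch


class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.tlb = {}            # virtual page -> physical page (unbounded for brevity)
        self.pb = OrderedDict()  # prefetch buffer, FIFO replacement (an assumption)

    def push_prefetch(self, vpage, ppage):
        """Receive a translation pushed by a leader core."""
        if vpage in self.tlb or vpage in self.pb:
            return
        if len(self.pb) >= PB_ENTRIES:
            self.pb.popitem(last=False)  # evict the oldest PB entry
        self.pb[vpage] = ppage


def access(cores, core_id, vpage, page_table):
    """Look up vpage on one core and report which structure served it."""
    core = cores[core_id]
    if vpage in core.tlb:
        return "tlb_hit"
    if vpage in core.pb:
        # Case 1: D-TLB miss, PB hit. Move the entry from the PB to the TLB.
        core.tlb[vpage] = core.pb.pop(vpage)
        return "pb_hit"
    # Case 2: D-TLB miss, PB miss. Refill locally, then push to the other cores.
    ppage = page_table[vpage]
    core.tlb[vpage] = ppage
    for other in cores:
        if other is not core:
            other.push_prefetch(vpage, ppage)
    return "miss"


cores = [Core(i) for i in range(4)]
page_table = {5: 105}                      # hypothetical mapping
print(access(cores, 1, 5, page_table))     # miss: core 1 acts as the leader
print(access(cores, 0, 5, page_table))     # pb_hit: core 0 finds the pushed entry
```

In this model every miss is pushed to every other core, which corresponds to the unfiltered scheme before confidence estimation is added.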

  7. Leader-Follower Prefetching • Prefetching a translation into all the follower cores every time a TLB and PB miss occurs on the leader core may be over-aggressive • Confidence estimation • 2-bit saturating counters • Each core keeps one counter per other core (core 0 has counters for cores 1 to N-1) • If the B-bit confidence counter for a follower is greater than or equal to 2^(B-1), prefetch to that follower

  8. Leader-Follower Prefetching • Case 1 • PB hit on core 0; the PB entry is inserted into the D-TLB • Identify the initiating core (core 1) • Increment core 1’s confidence counter corresponding to core 0 • Case 2 • D-TLB / PB miss on core 1 • Check whether the confidence counter is ≥ 2^(B-1) • If core 1’s counter corresponding to core 0 is at or above this value, core 1 pushes the translation into core 0’s PB • Case 3 • A PB entry is evicted from core N-1 without being used • Send a "bad prefetch" message to the core that initiated this entry (core 1) • Core 1’s counter corresponding to core N-1 is decremented
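The counter updates in the three cases above can be sketched as follows. This is an illustrative model only: the `Confidence` class and its method names are hypothetical, and starting every counter at the threshold is an assumption, since the slides do not state an initial value.

```python
B = 2                     # counter width in bits (the slides use 2-bit counters)
THRESHOLD = 2 ** (B - 1)  # prefetch when the counter is >= 2^(B-1)
MAX_COUNT = 2 ** B - 1    # saturation value of a B-bit counter


class Confidence:
    """Per-follower saturating counters kept by one leader core."""

    def __init__(self, num_cores, leader, initial=THRESHOLD):
        # One counter for every core except the leader itself.
        # The initial value is an assumption of this sketch.
        self.counters = {f: initial for f in range(num_cores) if f != leader}

    def should_prefetch(self, follower):
        # Case 2: push to the follower only if confidence is high enough.
        return self.counters[follower] >= THRESHOLD

    def on_pb_hit(self, follower):
        # Case 1: the follower used a pushed entry, so raise confidence.
        self.counters[follower] = min(MAX_COUNT, self.counters[follower] + 1)

    def on_bad_prefetch(self, follower):
        # Case 3: the follower evicted a pushed entry unused, so lower confidence.
        self.counters[follower] = max(0, self.counters[follower] - 1)


conf = Confidence(num_cores=4, leader=1)
conf.on_bad_prefetch(3)          # core 3 reports a bad prefetch
print(conf.should_prefetch(3))   # False: counter dropped below the threshold
print(conf.should_prefetch(0))   # True: counter is still at the threshold
```

With B = 2 the threshold is 2, so a single bad prefetch from the assumed starting point is enough to stop pushes to that follower until a later PB hit restores confidence.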

  9. Distance-Based Cross-Core Prefetching • Although the cores are missing on different virtual pages, they can both have the same distance pattern in their misses • Record repetitive distance-pairs to find the next predicted distance and hence the next virtual pages. • Find the stride patterns

  10. Distance-Based Cross-Core Prefetching • 1. On a PB miss, calculate the current distance (current TLB miss virtual page - last TLB miss virtual page) • 2. Look up the distance table (DT) using the current distance and the last distance • 3. The DT extracts predicted future distances from the stored distance-pairs, e.g. (1,2), (2,1), ... • 4. The predicted distances are used to calculate the corresponding virtual pages, whose translations are inserted into the PB
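The four steps above can be sketched in software. This is a simplified model under several assumptions rather than the presented hardware: the `DistancePrefetcher` class is hypothetical, the DT is modelled as a dictionary shared by all cores that maps a distance to the distances seen to follow it, and the cap of two predictions per entry is arbitrary.

```python
class DistancePrefetcher:
    def __init__(self, num_cores, max_predictions=2):
        self.table = {}                       # shared DT: distance -> following distances
        self.last_page = [None] * num_cores   # last miss page, per core
        self.last_dist = [None] * num_cores   # last miss distance, per core
        self.max_predictions = max_predictions

    def on_pb_miss(self, core_id, vpage):
        """Return the virtual pages to prefetch after a miss on vpage."""
        predictions = []
        last_page = self.last_page[core_id]
        if last_page is not None:
            cur = vpage - last_page                                   # step 1
            # Steps 2 to 4: look up the DT and turn distances into pages.
            predictions = [vpage + d for d in self.table.get(cur, [])]
            prev = self.last_dist[core_id]
            if prev is not None:
                # Record the distance-pair (prev, cur) for future lookups.
                followers = self.table.setdefault(prev, [])
                if cur not in followers:
                    followers.append(cur)
                    del followers[:-self.max_predictions]  # keep the newest entries
            self.last_dist[core_id] = cur
        self.last_page[core_id] = vpage
        return predictions


prefetcher = DistancePrefetcher(num_cores=2)
for page in [3, 4, 6, 7]:                 # core 0 trains the shared table
    prefetcher.on_pb_miss(0, page)
print(prefetcher.on_pb_miss(1, 7))        # []: core 1 has no previous miss yet
print(prefetcher.on_pb_miss(1, 8))        # [10]: distance 1 was followed by 2
```

Because the table is shared, the distance-pairs (1,2) and (2,1) learned from core 0 let core 1 predict page 10 even though core 0 never touched that page.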

  11. Result • 16 entries in PB • Average 46%
