Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors

Abhishek Bhattacharjee and Margaret Martonosi, Department of Electrical Engineering, Princeton University. ASPLOS’10.



TLB management

  • Hardware-managed TLB

    • No need for expensive interrupts

    • Pipeline remains largely unaffected

    • The OS cannot employ an alternative design

  • Software-managed TLB

    • Data structure design is flexible since the OS controls the page table walk

    • The miss handler itself consists of instructions

      • It may therefore miss in the instruction cache.

    • Data cache may be polluted by the page table walk


Multiprocessor TLB miss

  • CMP maintains per-core instruction and data TLBs.

  • Significant similarities exist in TLB miss patterns among multiple cores.


Predictable TLB Miss Pattern

  • Inter-core Shared (ICS) TLB Misses

    • A TLB miss is ICS if the translation it needs was already accessed by a previous miss on any of the other cores, with the same virtual page, physical page, context ID, and page size

    • Leader-Follower prefetching

  • Inter-core Predictable Stride (ICPS) TLB Misses

    • A TLB miss is ICPS with a stride of S if its virtual page V+S differs by S from the virtual page V of the preceding matching miss

      • Core 0 TLB miss virtual pages: 3, 4, 6, 7

      • Core 1 TLB miss virtual pages: 7, 8, 10, 11

        • On both cores, the distances between successive misses are 1, 2, 1

      • Although the cores miss on different virtual pages, both show the same distance pattern in their misses

    • Distance-based cross-core prefetching
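
The distance pattern in the example above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `miss_distances` is hypothetical and not part of the presentation.

```python
def miss_distances(pages):
    """Return the differences between consecutive TLB-miss virtual pages."""
    return [later - earlier for earlier, later in zip(pages, pages[1:])]


# Miss sequences taken from the slide example.
core0_misses = [3, 4, 6, 7]
core1_misses = [7, 8, 10, 11]

d0 = miss_distances(core0_misses)
d1 = miss_distances(core1_misses)

print(d0)        # [1, 2, 1]
print(d1)        # [1, 2, 1]
print(d0 == d1)  # True: different pages, same distance pattern
```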


Leader-Follower Prefetching

  • If one core (the leader) misses in its TLB on a particular virtual page entry, the other cores (the followers) will typically also miss on the same virtual page eventually

  • The leader therefore pushes the virtual page entry to the followers

  • The entry is not inserted directly into a follower’s TLB, but into a small, separate Prefetch Buffer (PB)

    • A bad prefetch may be harmful in that it goes unused

    • A prefetch may also be harmful in that it evicts existing PB entries too early


Leader-Follower Prefetching

  • Case 1

    • D-TLB miss / PB hit on core 0

      • Remove the entry from core 0’s PB

      • Add the entry to core 0’s D-TLB

  • Case 2

    • D-TLB miss / PB miss on core 1

      • The translation is located and refilled into core 1’s D-TLB

      • The translation is also prefetched (pushed) into the PBs of the other cores
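
The two cases above can be sketched as a small software model. This is a simplified illustration, not the hardware design from the presentation: the names `Core` and `access` are hypothetical, the TLB is modelled as an unbounded dictionary, the PB uses FIFO replacement, and a push is skipped when the follower already holds the translation. All of these are assumptions made for the sketch.

```python
from collections import OrderedDict


class Core:
    """Toy model of one core's D-TLB plus its prefetch buffer (PB)."""

    def __init__(self, pb_size=16):
        self.tlb = {}            # virtual page -> physical page (capacity ignored)
        self.pb = OrderedDict()  # FIFO prefetch buffer (assumed policy)
        self.pb_size = pb_size

    def push_to_pb(self, vpage, ppage):
        # Assumption: do not prefetch what the follower already holds.
        if vpage in self.tlb or vpage in self.pb:
            return
        if len(self.pb) >= self.pb_size:
            self.pb.popitem(last=False)  # evict the oldest PB entry
        self.pb[vpage] = ppage


def access(cores, core_id, vpage, page_table):
    """Handle one D-TLB lookup and return what happened."""
    core = cores[core_id]
    if vpage in core.tlb:
        return "tlb_hit"
    if vpage in core.pb:
        # Case 1: D-TLB miss / PB hit. Move the entry from the PB to the TLB.
        core.tlb[vpage] = core.pb.pop(vpage)
        return "pb_hit"
    # Case 2: D-TLB miss / PB miss. Refill locally, then push to the followers.
    ppage = page_table[vpage]
    core.tlb[vpage] = ppage
    for other_id, other in enumerate(cores):
        if other_id != core_id:
            other.push_to_pb(vpage, ppage)
    return "miss"


cores = [Core() for _ in range(2)]
page_table = {5: 42}
print(access(cores, 1, 5, page_table))  # miss    (core 1 is the leader)
print(access(cores, 0, 5, page_table))  # pb_hit  (core 0 is the follower)
print(access(cores, 0, 5, page_table))  # tlb_hit
```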


Leader-Follower Prefetching

  • Prefetch a translation into all the follower cores every time a TLB and PB miss occurs on the leader core

    • This approach may be over-aggressive

  • Confidence estimation

    • 2-bit saturating counters

      • Each core keeps one counter per other core, e.g. core 0 has counters for cores 1 to N-1

      • If the B-bit confidence counter is greater than or equal to 2^(B-1), prefetch to that follower


Leader-Follower Prefetching

  • Case 1

    • PB hit on core 0; the PB entry is inserted into core 0’s D-TLB

    • Identify the core that initiated the prefetch (core 1)

    • Increment core 1’s confidence counter corresponding to core 0

  • Case 2

    • D-TLB miss / PB miss on core 1

    • Check whether the confidence counter is ≥ 2^(B-1)

    • If core 1’s counter corresponding to core 0 is at or above this value, core 1 pushes the translation into core 0’s PB

  • Case 3

    • A PB entry is evicted from core N-1 without being used

    • A bad-prefetch message is sent to the core that initiated this entry (core 1)

    • Core 1’s counter corresponding to core N-1 is decremented
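
The three cases above amount to a table of saturating counters indexed by (leader, follower). The sketch below is one possible reading, assuming B = 2 as on the earlier slide. The class name `ConfidenceTable`, its method names, and the choice to initialise every counter at the threshold are assumptions made for illustration; the presentation does not state the initial value.

```python
B = 2                      # counter width in bits (2-bit counters per the slides)
THRESHOLD = 2 ** (B - 1)   # prefetch when counter >= 2^(B-1)
MAX_COUNT = 2 ** B - 1     # saturation point of a B-bit counter


class ConfidenceTable:
    """counters[leader][follower]: the leader's confidence in that follower."""

    def __init__(self, num_cores, initial=THRESHOLD):
        # Assumption: counters start at the threshold so prefetching begins enabled.
        self.counters = [[initial] * num_cores for _ in range(num_cores)]

    def should_prefetch(self, leader, follower):
        # Case 2: the leader checks its counter before pushing a translation.
        return self.counters[leader][follower] >= THRESHOLD

    def on_pb_hit(self, leader, follower):
        # Case 1: the follower used a prefetch, so raise the leader's counter.
        current = self.counters[leader][follower]
        self.counters[leader][follower] = min(current + 1, MAX_COUNT)

    def on_unused_eviction(self, leader, follower):
        # Case 3: a prefetched entry was evicted unused (bad-prefetch message).
        current = self.counters[leader][follower]
        self.counters[leader][follower] = max(current - 1, 0)


table = ConfidenceTable(num_cores=4)
print(table.should_prefetch(1, 0))  # True
table.on_unused_eviction(1, 0)      # bad prefetch reported by core 0
print(table.should_prefetch(1, 0))  # False
table.on_pb_hit(1, 0)               # core 0 later used an entry pushed by core 1
print(table.should_prefetch(1, 0))  # True
```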


Distance-Based Cross-Core Prefetching

  • Although the cores are missing on different virtual pages, they can both have the same distance pattern in their misses

  • Record repetitive distance-pairs to find the next predicted distance and hence the next virtual pages.

    • Find the stride patterns


Distance-Based Cross-Core Prefetching

  • 1. On a PB miss, calculate the current distance (current TLB miss virtual page minus the last virtual page)

  • 2. Look up the distance table (DT) using the current distance and the last distance

  • 3. The DT returns predicted future distances from its stored distance-pairs

    • e.g. (1,2), (2,1), …

  • 4. The predicted distances are used to calculate the corresponding virtual pages, whose translations are inserted into the PB
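
The four steps above can be approximated in software as follows. This is a rough sketch under several assumptions that the slides do not spell out: the distance table is shared by all cores, it is an unbounded dictionary mapping a distance to the distances observed to follow it, and each core remembers its own last miss page and last distance. The class name `DistancePrefetcher` is hypothetical.

```python
from collections import defaultdict


class DistancePrefetcher:
    """Toy shared distance table (DT) learned from per-core miss distances."""

    def __init__(self, num_cores):
        # distance -> list of distances seen to follow it (a stored distance-pair)
        self.dt = defaultdict(list)
        self.last_page = [None] * num_cores
        self.last_dist = [None] * num_cores

    def on_pb_miss(self, core, vpage):
        """Return the virtual pages predicted for prefetching into the PB."""
        predictions = []
        if self.last_page[core] is not None:
            # Step 1: current distance = current miss page - last miss page.
            current = vpage - self.last_page[core]
            # Steps 2-3: look up distances that followed this one before.
            # Step 4: turn predicted distances into virtual pages.
            predictions = [vpage + d for d in self.dt[current]]
            # Learn the pair (last distance, current distance).
            previous = self.last_dist[core]
            if previous is not None and current not in self.dt[previous]:
                self.dt[previous].append(current)
            self.last_dist[core] = current
        self.last_page[core] = vpage
        return predictions


prefetcher = DistancePrefetcher(num_cores=2)
for page in [3, 4, 6, 7]:                 # core 0 trains the shared table
    prefetcher.on_pb_miss(0, page)

prefetcher.on_pb_miss(1, 7)               # core 1 has no distance yet
print(prefetcher.on_pb_miss(1, 8))        # [10]: distance 1 was followed by 2
```

In this run, core 1 benefits from the pairs (1,2) and (2,1) recorded by core 0 even though the two cores miss on different virtual pages.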


Result

16 entries in PB, Average 46%

