A Hybrid Adaptive Feedback Based Prefetcher

Santhosh Verma, David Koppelman and Lu Peng, Louisiana State University

Motivation • Can’t always expect high prefetch accuracy & timeliness • Potential performance gains are lost when these are low • Adaptive schemes adjust aggressiveness based on effectiveness • Adaptation and selectiveness are as important as address prediction

Our Scheme – Hybrid Adaptive Prefetcher (HAP) • Start with good address prediction – Stride / Sequential hybrid • Sequential prefetching scheme requires no warmup • Stride prefetcher is more robust • Issue prefetches selectively • Incorporate a published adaptive prefetch method • Feedback Directed Prefetching (Srinath et al., HPCA 2007) • Improve with bandwidth adaptation

Related Work – Feedback Directed Prefetching (HPCA 2007) • Prefetcher aggressiveness defined by prefetch distance and degree • Aggressiveness adjusted dynamically based on three feedback metrics • Percentage of useful prefetches • Percentage of late prefetches • Percentage of prefetches which cause demand misses (cache pollution)

Differences between FDP and our scheme • Use both L1 and L2 prefetching • Scheme is modified to support L1/L2 • Use a hybrid stride / sequential prefetching scheme • A bandwidth based feedback metric is proposed • No cache pollution metric

Stride/Sequential Prefetching Scheme – Training Stride Prefetcher • Use a PC-indexed stride prediction scheme • [Figure: Stride Prediction Table Entry] 1. Compute new stride using the indicated entry field and the current address value 2. Store computed stride 3. Increment count if the stride is unchanged; reset otherwise • Entry is trained if Count is above a threshold value
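
The three training steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `last_addr` field name, the `TRAIN_THRESHOLD` value and the unbounded dictionary standing in for the table are all assumptions, since the slide only shows the entry in a figure and says "a threshold value".

```python
from dataclasses import dataclass

TRAIN_THRESHOLD = 2  # assumed value; the slides only say "a threshold value"


@dataclass
class StrideEntry:
    last_addr: int   # assumed field: address of the previous access by this PC
    stride: int = 0
    count: int = 0

    @property
    def trained(self) -> bool:
        # "Entry is trained if Count is above a threshold value"
        return self.count > TRAIN_THRESHOLD


class StrideTable:
    """PC-indexed stride prediction table (unbounded dict for simplicity)."""

    def __init__(self):
        self.entries = {}

    def train(self, pc: int, addr: int) -> StrideEntry:
        entry = self.entries.get(pc)
        if entry is None:
            # First access by this PC: allocate an untrained entry.
            entry = StrideEntry(last_addr=addr)
            self.entries[pc] = entry
            return entry
        new_stride = addr - entry.last_addr  # step 1: compute new stride
        if new_stride == entry.stride:
            entry.count += 1                 # step 3: unchanged stride
        else:
            entry.count = 0                  # step 3: reset otherwise
        entry.stride = new_stride            # step 2: store computed stride
        entry.last_addr = addr
        return entry
```

With these assumed values, a PC that touches addresses 100, 164, 228, 292, 356 becomes trained on the fifth access, and a single access that breaks the pattern resets the count.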

Stride/Sequential Prefetching Scheme – Issuing Prefetches • Check stride table on demand miss / hit to prefetched line • Issue stride prefetches based on degree and distance • Sequential prefetches • If no valid / trained stride entry • If previous line present in cache • Issue sequential prefetches based on degree
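
A rough sketch of the issue decision described above is given below. It is illustrative only: the 64-byte `LINE_SIZE`, the function name, and the interpretation of distance as "how many strides ahead the first prefetch lands" and degree as "how many prefetches are issued per trigger" are assumptions, not details taken from the slides.

```python
from typing import List, Optional, Set

LINE_SIZE = 64  # assumed cache line size in bytes


def prefetch_candidates(addr: int,
                        trained_stride: Optional[int],
                        cached_lines: Set[int],
                        degree: int,
                        distance: int) -> List[int]:
    """Return addresses to prefetch for a trigger (demand miss or hit to a
    prefetched line) at `addr`.

    trained_stride: stride of a valid, trained table entry, or None.
    cached_lines:   set of line numbers currently present in the cache.
    """
    if trained_stride is not None:
        # Stride prefetching: `degree` prefetches starting `distance`
        # strides ahead of the triggering address (assumed interpretation).
        return [addr + trained_stride * (distance + i) for i in range(degree)]
    line = addr // LINE_SIZE
    if (line - 1) in cached_lines:
        # Sequential prefetching: the previous line is in the cache,
        # so fetch the next `degree` lines.
        return [(line + i) * LINE_SIZE for i in range(1, degree + 1)]
    return []  # neither condition holds: issue nothing
```

The fall-through to an empty list is what makes the scheme selective: without a trained stride or evidence of a sequential pattern, no prefetch is issued.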

Adjusting Aggressiveness with Feedback Metrics • Prefetch Accuracy – Percentage of prefetches used by a demand request • Prefetch Lateness – Percentage of accurate prefetches which are late • Bandwidth Contention – Percentage of clock cycles during which cache bandwidth is above a threshold • Evaluate separately for L1 and L2 • Evaluate periodically after fixed number of cycles. Adjust aggressiveness if justified.

Storage-efficient Miss Status Holding Registers (MSHRs) • Used to track all inflight / inqueue memory requests at both cache levels • [Figure: MSHR Entry] 1. Entry allocated for each outstanding L1 and / or L2 request; valid bit set 2. Two bit cache level field indicates L1 only, L2 only or combined L1 / L2 3. Two prefetch bits indicate prefetch requests 4. Concurrent L1 and L2 requests to the same line share the same MSHR entry
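
The shared-entry idea can be illustrated with a small sketch. The bit encoding of the cache level field (`0b01`, `0b10`, `0b11`), the class names and the dictionary keyed by line address are assumptions made for the example; the slides only state that the field is two bits wide and that there are two prefetch bits.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    # Assumed encoding of the two-bit cache level field.
    L1_ONLY = 0b01
    L2_ONLY = 0b10
    L1_AND_L2 = 0b11


@dataclass
class MSHREntry:
    line_addr: int
    valid: bool = False
    level: int = 0             # two-bit cache level field
    l1_prefetch: bool = False  # prefetch bit for the L1 request
    l2_prefetch: bool = False  # prefetch bit for the L2 request


class MSHRFile:
    def __init__(self):
        self.entries = {}  # line address -> MSHREntry

    def allocate(self, line_addr: int, level: Level, is_prefetch: bool) -> MSHREntry:
        entry = self.entries.get(line_addr)
        if entry is None:
            entry = MSHREntry(line_addr)
            self.entries[line_addr] = entry
        entry.valid = True
        # A concurrent request at the other level reuses the entry;
        # OR-ing the fields turns L1-only plus L2-only into combined.
        entry.level |= level
        if is_prefetch:
            if level & Level.L1_ONLY:
                entry.l1_prefetch = True
            if level & Level.L2_ONLY:
                entry.l2_prefetch = True
        return entry
```

For example, an L2 prefetch followed by an L1 demand request to the same line leaves one entry with the combined level and only the L2 prefetch bit set.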

Implementing Feedback Metrics • Prefetch Accuracy • Prefetch bit set for prefetched line brought into cache • Bit set in MSHR for inflight / inqueue prefetched lines • Increment accurate count if demand request finds a set bit • Reset bit after increment • Accuracy is the accurate count as a percentage of total prefetches issued

Implementing Feedback Metrics • Prefetch Lateness • Prefetch bit(s) set in MSHR for a prefetched inflight / inqueue line • On demand miss, late prefetch detected • If a valid MSHR entry exists for this miss • If prefetch bit for the correct cache level is set • Reset bit after incrementing late count • Lateness is the late count as a percentage of useful prefetches
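
The accuracy and lateness bookkeeping can be summarized in a short sketch. The class and method names are hypothetical, and the sketch assumes that a demand request which finds a set prefetch bit in the MSHR counts as both accurate and late, which is how the two slides read together. One such counter set would be kept per cache level.

```python
class FeedbackCounters:
    """Per-cache-level counters for prefetch accuracy and lateness."""

    def __init__(self):
        self.issued = 0    # total prefetches issued
        self.accurate = 0  # prefetches used by a demand request
        self.late = 0      # accurate prefetches that were still in flight

    def on_prefetch_issued(self):
        self.issued += 1

    def on_demand_hit_prefetched_line(self):
        # Demand request found the prefetch bit set on a cached line
        # (the caller resets the bit afterwards).
        self.accurate += 1

    def on_demand_miss_matches_inflight_prefetch(self):
        # Demand miss found a valid MSHR entry with the prefetch bit set
        # for this cache level: the prefetch was useful but late.
        self.accurate += 1
        self.late += 1

    def accuracy(self) -> float:
        return self.accurate / self.issued if self.issued else 0.0

    def lateness(self) -> float:
        return self.late / self.accurate if self.accurate else 0.0
```

Both ratios are fractions in the range 0 to 1; multiplying by 100 gives the percentages named on the slides.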

Implementing Feedback Metrics • Bandwidth Contention - 1 • Use MSHR to monitor total outstanding L1 and L2 requests in every cycle • Increment counter for every cycle that total is above threshold • The contention rate is based on percentage of total cycles • Bandwidth Contention - 2 • Prefetches not issued if outstanding requests are above threshold
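
Both uses of the bandwidth threshold can be sketched as below. The names are hypothetical, and using a single threshold for the contention counter and for the issue check is an assumption; the slides do not say whether the two thresholds are the same.

```python
class ContentionMonitor:
    """Tracks how often outstanding requests exceed a threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # assumed: one threshold for both uses
        self.cycles = 0
        self.contended_cycles = 0

    def tick(self, outstanding: int):
        """Call once per cycle with the number of outstanding L1 and L2
        requests, as read from the MSHRs."""
        self.cycles += 1
        if outstanding > self.threshold:
            self.contended_cycles += 1

    def contention_rate(self) -> float:
        """Fraction of cycles spent above the threshold."""
        return self.contended_cycles / self.cycles if self.cycles else 0.0

    def may_issue_prefetch(self, outstanding: int) -> bool:
        # Bandwidth Contention - 2: hold back prefetches while the
        # number of outstanding requests is above the threshold.
        return outstanding <= self.threshold
```

The first use feeds the periodic aggressiveness decision, while the second acts immediately on every prefetch candidate.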

Adjusting Aggressiveness • Evaluate metrics at fixed intervals • Determine if high or low based on a threshold • May adjust aggressiveness based on following criteria • [Table: Aggressiveness Policy]

Prefetcher Aggressiveness Levels • Aggressiveness adjusted in increments of one • [Table: Prefetcher Aggressiveness Levels, ranging from Very Conservative through Middle Aggressiveness to Very Aggressive]
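
The single-step adjustment can be sketched as follows. The policy and level tables did not survive in this transcript, so the decision rule below is a placeholder invented purely for illustration and is not the policy from the slides; the number of levels (`NUM_LEVELS = 5`) is likewise an assumption. Only the "move by one level and stay within range" behavior reflects the text.

```python
NUM_LEVELS = 5  # assumed; level 0 = very conservative, 4 = very aggressive


def adjust(level: int,
           accuracy_high: bool,
           lateness_high: bool,
           contention_high: bool) -> int:
    """Return the next aggressiveness level after one evaluation interval.

    Each flag is the result of comparing a metric against its threshold.
    The rule below is a PLACEHOLDER policy, not the table from the slides.
    """
    if contention_high or not accuracy_high:
        delta = -1   # back off when bandwidth is scarce or prefetches are wasted
    elif lateness_high:
        delta = +1   # accurate but late: prefetch further ahead
    else:
        delta = 0    # accurate and timely: keep the current level
    # Aggressiveness changes in increments of one, clamped to valid levels.
    return max(0, min(NUM_LEVELS - 1, level + delta))
```

Each level would map to a concrete distance and degree pair, and the adjustment would be evaluated separately for L1 and L2.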

Experimental Evaluation - Setup • Evaluate 15 SPEC CPU 2006 Benchmarks using CMPSim Simulator • Evaluate for three competition configurations • Config 1 – 2048 KB L2 Cache, unlimited bandwidth • Config 2 – 2048 KB L2 Cache, limited bandwidth • Config 3 – 512 KB L2 Cache, limited bandwidth • Limited bandwidth configs allow one L1 issue per cycle and one L2 per 10 cycles

Experimental Evaluation - Setup • Compare our scheme, Hybrid Adaptive Prefetcher (HAP), to four configurations • No prefetching • Middle Aggressive Stride • Very Aggressive Stride • Modified Feedback Directed Prefetcher • Uses both L1 / L2 prefetching • Does not use a cache pollution metric

Results - Expectations • Very aggressive stride will do better on some, worse on other benchmarks • Adaptive schemes will perform at least as well as non-adaptive • Unlimited bandwidth and large cache configurations benefit aggressive schemes

Results – Bandwidth Unlimited, 2 MB L2 Config • HAP outperforms other prefetchers for all benchmarks except lbm • Performance benefit compared to mid-aggressive stride is 11% on average and 46% versus no prefetching.

Results – Bandwidth Limited, 2 MB L2 Config • HAP is best on average. Aggressive stride performs best in three benchmarks (mcf, lbm and soplex) • Performance benefit compared to mid-aggressive stride is 9% on average and 45% versus no prefetching.

Results – Bandwidth Limited, 512 KB L2 Config • Results are similar to Config 2 • Performance benefit compared to mid-aggressive stride is 8% on average and 44% versus no prefetching.

Results (All benchmarks) – Bandwidth Limited, 2 MB L2 Config • Additional benchmarks are mostly unaffected by prefetching • Performance benefit compared to mid-aggressive stride is 6% on average and 29% versus no prefetching for all benchmarks.

Conclusions • A well-designed, adaptive prefetching scheme is very effective • Very aggressive stride works best for some benchmarks • A cache pollution metric may improve results

THANK YOU QUESTIONS?
