
DATA ADDRESS PREDICTION



  1. DATA ADDRESS PREDICTION Zohair Hyder Armando Solar-Lezama CS252 – Fall 2003

  2. Motivation
  • Large and increasing gap between CPU and memory speeds
  • Miss penalty on today's processors is over 600 cycles
  • Load latency is a bottleneck on performance
  Solution: Prefetch
  • Static: compilers may insert prefetch instructions, but are limited by the lack of run-time information
  • Dynamic: high adaptability

  3. Metrics
  • Coverage: fraction of DL1 misses that hit in the prefetch buffer (see the counter sketch below)
    • Higher coverage implies lower load latency
  • Accuracy: fraction of prefetches that are actually used by the CPU
    • Higher accuracy implies less memory bandwidth needed
  • There are tradeoffs between coverage and accuracy
  • For a given memory bandwidth, coverage is probably more important
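To make the two metrics concrete, here is a minimal C++ sketch of the counters a simulator might keep; the struct and field names are illustrative assumptions, not details from the slides.

```cpp
#include <cstdint>

// Illustrative counters for the two prefetching metrics defined above.
struct PrefetchStats {
    uint64_t dl1_misses        = 0;  // demand accesses that miss in DL1
    uint64_t pbuf_hits         = 0;  // of those misses, how many hit the prefetch buffer
    uint64_t prefetches_issued = 0;  // prefetches sent toward memory
    uint64_t prefetches_used   = 0;  // prefetched blocks later referenced by the CPU

    // Coverage: fraction of DL1 misses serviced by the prefetch buffer.
    double coverage() const {
        return dl1_misses ? double(pbuf_hits) / double(dl1_misses) : 0.0;
    }
    // Accuracy: fraction of issued prefetches the CPU actually consumed.
    double accuracy() const {
        return prefetches_issued ? double(prefetches_used) / double(prefetches_issued) : 0.0;
    }
};
```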

  4. Architecture
  • The prefetch buffer acts as a "level 1.5" cache
  • Hit time of the prefetch buffer is the same as DL1 because of its small size and identical associativity
  • Demand fetches always get priority over prefetches
  • The predictor uses DL1 miss information to determine prefetches

  5. Previous Approaches
  • Stream buffers
    • Introduced by Jouppi in 1990
    • Kessler and Palacharla augmented them in 1994 to allow filtering and prefetching for non-unit strides
  • Reference Prediction Table (RPT)
    • Introduced by Baer and Chen in 1992 to detect arbitrary strides
  • Markov predictor
    • Introduced by Joseph and Grunwald in 1999

  6. Reference Prediction Table
  • The RPT is indexed by the PC of the load instruction
  • Each entry holds the last effective address and the offset (stride) from the second-to-last effective address
  • If the current effective address produces the same offset, prefetch the next address (see the sketch below)
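A minimal C++ sketch of the RPT behavior just described. The hash map, the unbounded table, and the exact update policy are illustrative assumptions; a real RPT is a small, fixed-size hardware table.

```cpp
#include <cstdint>
#include <unordered_map>

struct RPTEntry {
    uint64_t last_addr = 0;  // last effective address seen for this load PC
    int64_t  stride    = 0;  // last_addr minus the address seen before it
    bool     valid     = false;
};

class RPT {
    std::unordered_map<uint64_t, RPTEntry> table;  // indexed by load PC
public:
    // Observe a load; returns true and sets prefetch_addr when the stride repeats.
    bool observe(uint64_t pc, uint64_t addr, uint64_t &prefetch_addr) {
        RPTEntry &e = table[pc];
        bool predict = false;
        if (e.valid) {
            int64_t new_stride = int64_t(addr) - int64_t(e.last_addr);
            if (new_stride == e.stride && new_stride != 0) {
                prefetch_addr = addr + e.stride;  // same offset twice in a row: prefetch next
                predict = true;
            }
            e.stride = new_stride;
        }
        e.last_addr = addr;
        e.valid = true;
        return predict;
    }
};
```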

  7. Markov Predictor
  • Indexed by the current address: each table entry holds 4 possible next addresses
  • All 4 are issued into the prefetch request queue
  • If the queue is full, an element with lower priority is replaced
  • LRU prioritization: more recently used addresses have higher priority (see the sketch below)
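A rough C++ sketch of such a Markov table: each miss address maps to up to four successor addresses kept in most-recently-used order. The container choices are illustrative simplifications, and the prefetch request queue itself is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

class MarkovPredictor {
    static constexpr std::size_t WAYS = 4;
    std::unordered_map<uint64_t, std::deque<uint64_t>> table;  // addr -> successors, MRU first
    uint64_t prev_addr = 0;
    bool have_prev = false;
public:
    // Record the transition prev_addr -> addr, then return the (up to 4)
    // candidate prefetch addresses stored for the current miss address.
    std::vector<uint64_t> observe(uint64_t addr) {
        if (have_prev) {
            auto &succ = table[prev_addr];
            // move addr to the MRU position, inserting it if necessary
            for (auto it = succ.begin(); it != succ.end(); ++it)
                if (*it == addr) { succ.erase(it); break; }
            succ.push_front(addr);
            if (succ.size() > WAYS) succ.pop_back();  // drop the least recently used successor
        }
        prev_addr = addr;
        have_prev = true;
        auto it = table.find(addr);
        return it == table.end() ? std::vector<uint64_t>{}
                                 : std::vector<uint64_t>(it->second.begin(), it->second.end());
    }
};
```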

  8. Strides
  • Consider the following statements in a loop, where k is larger than the block size:
      n += k; u += x[n]; v += y[n];
    The miss address stream will be: A, B, A+k, B+k, A+2k, B+2k, ... (illustrated below)
  • Stream buffers perform poorly on interleaved access streams
  • RPT works great
  • The Markov predictor is incapable of detecting ANY strides
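A self-contained illustration of that interleaved miss stream. The base addresses and stride are hypothetical; the point is that each array's stride is constant even though the two miss streams interleave.

```cpp
#include <cstdio>

int main() {
    const unsigned long A = 0x100000;  // hypothetical address of x[n] on the first miss
    const unsigned long B = 0x200000;  // hypothetical address of y[n] on the first miss
    const unsigned long k = 0x400;     // stride in bytes, larger than one cache block
    for (unsigned long i = 0; i < 3; ++i)
        printf("miss %#lx   miss %#lx\n", A + i * k, B + i * k);
    // Prints A, B, A+k, B+k, A+2k, B+2k: an RPT indexed by load PC sees two
    // constant-stride streams, while a single stream buffer sees neither.
}
```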

  9. Our Contributions
  • Difference Markov predictor
    • Uses a similar Markov implementation
    • Predicts differences rather than addresses
    • Input to the predictor is the current difference; output is the predicted difference
  • Bayes predictor
    • Uses 3 inputs: current difference, current PC, and current address
    • Output is the predicted difference

  10. Difference Markov Predictor
  • Uses difference coding
  • Indexed by the current difference = current address – last address
  • Predicts the next difference (see the sketch below)
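A minimal C++ sketch of the difference Markov idea. Storing a single predicted successor per difference is an illustrative simplification of the Markov table; the map and the warm-up handling are assumptions, not details from the slides.

```cpp
#include <cstdint>
#include <unordered_map>

class DiffMarkovPredictor {
    std::unordered_map<int64_t, int64_t> table;  // current delta -> delta that followed it last time
    uint64_t last_addr  = 0;
    int64_t  last_delta = 0;
    int      seen       = 0;  // number of addresses observed so far (saturates at 2)
public:
    // Observe a miss address; returns true and sets prefetch_addr when a prediction exists.
    bool observe(uint64_t addr, uint64_t &prefetch_addr) {
        bool predict = false;
        int64_t delta = int64_t(addr) - int64_t(last_addr);
        if (seen >= 2)
            table[last_delta] = delta;              // learn: previous delta was followed by this one
        if (seen >= 1) {
            auto it = table.find(delta);            // index by the current difference
            if (it != table.end()) {
                prefetch_addr = addr + it->second;  // prefetch using the predicted next difference
                predict = true;
            }
        }
        last_addr = addr;
        last_delta = delta;
        if (seen < 2) ++seen;
        return predict;
    }
};
```

With a constant stride k, a single table entry (k -> k) is enough to keep predicting, which matches the compactness claim on the next slide.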

  11. Difference Markov - Advantages
  • Works well with small table sizes
  • Detects strides, even in interleaved access streams
  • More compact than RPT, e.g. a stride of 1 needs only a single entry
  • Performs especially well on floating-point applications that are stride-intensive
  • The Joseph-Grunwald Markov predictor is incapable of predicting any address it has not yet seen
  • Performs only slightly worse than the Joseph-Grunwald Markov predictor on integer applications: difference correlation information can capture address correlation information too

  12. Bayes Predictor
  • Predicts based on the current PC, current address, and current difference
  • Uses the Naïve Bayes method to combine information from all 3
  • Predicts the next difference

  13. Bayes Predictor - Details
  • Idea:
    • For every possible Δn+1, calculate P(Δn+1 | Δn, PC, Addr)
    • Predict the Δn+1 with the highest probability
    • If data is missing, use the conditional probabilities given the data we do have
  • Implementation:
    • Assume independence!
      P(Δn+1 | Δn, PC, Addr) = P(Δn | Δn+1) · P(PC | Δn+1) · P(Addr | Δn+1) · P(Δn+1) / P(Δn, PC, Addr)
    • Keep a limited number of the probabilities in a table
    • Use an integer representation (a software sketch follows below)
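For concreteness, a C++ sketch of a Naïve Bayes difference predictor along these lines, estimating each conditional probability from raw event counts. The unbounded maps, floating-point scoring, and training interface are illustrative assumptions; the slides describe a bounded table and an integer representation.

```cpp
#include <cstdint>
#include <map>
#include <unordered_map>
#include <unordered_set>
#include <utility>

class BayesPredictor {
    // Joint counts of (feature value, next delta) and marginal counts of next delta.
    std::map<std::pair<int64_t,  int64_t>, uint64_t> delta_joint;  // (Δn, Δn+1)
    std::map<std::pair<uint64_t, int64_t>, uint64_t> pc_joint;     // (PC, Δn+1)
    std::map<std::pair<uint64_t, int64_t>, uint64_t> addr_joint;   // (Addr, Δn+1)
    std::unordered_map<int64_t, uint64_t> delta_count;             // N(Δn+1)
    std::unordered_set<int64_t> candidates;                        // Δn+1 values seen so far

public:
    // Record one observed outcome dnext for the feature triple (dn, pc, addr).
    void train(int64_t dn, uint64_t pc, uint64_t addr, int64_t dnext) {
        ++delta_joint[{dn, dnext}];
        ++pc_joint[{pc, dnext}];
        ++addr_joint[{addr, dnext}];
        ++delta_count[dnext];
        candidates.insert(dnext);
    }

    // Argmax over candidates c of P(Δn|c)·P(PC|c)·P(Addr|c)·P(c); the shared
    // denominator P(Δn, PC, Addr) and the total-observation count are the same
    // for every candidate, so they are dropped from the score.
    bool predict(int64_t dn, uint64_t pc, uint64_t addr, int64_t &best) const {
        double best_score = 0.0;
        bool found = false;
        for (int64_t c : candidates) {
            double n = double(delta_count.at(c));
            double score = count(delta_joint, std::make_pair(dn, c))
                         * count(pc_joint,    std::make_pair(pc, c))
                         * count(addr_joint,  std::make_pair(addr, c)) / (n * n);
            if (score > best_score) { best_score = score; best = c; found = true; }
        }
        return found;
    }

private:
    template <class M, class K>
    static double count(const M &m, const K &k) {
        auto it = m.find(k);
        return it == m.end() ? 0.0 : double(it->second);
    }
};
```

In use, each DL1 miss would first train the predictor with the previously observed (Δn, PC, Addr, Δn+1) tuple and then query it with the current features, prefetching the current address plus the predicted difference.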

  14. Bayes - Advantages
  • Works well for small table sizes
  • Performs well on both floating-point and integer applications
  • Detects most forms of regularity that we have observed in applications
  • Has good accuracy across applications

  15. Performance For SPEC2000

  16. Performance With Table Size

  17. Conclusion
  • Both of our predictors have high coverage: for most applications, higher than any other predictor
  • The Bayes predictor generally has the best accuracy across applications
  • The difference Markov predictor has fairly good accuracy too
  • The difference Markov predictor performs very well even with small tables, and requires very simple hardware
  • The Bayes predictor needs more complex hardware
