CS 7810 Lecture 8

Memory Dependence Prediction using Store Sets, G.Z. Chrysos and J.S. Emer, Proceedings of ISCA-25, 1998.

Presentation Transcript


  1. CS 7810 Lecture 8: Memory Dependence Prediction using Store Sets, G.Z. Chrysos and J.S. Emer, Proceedings of ISCA-25, 1998

  2. Lifetime of a Load

  3. LSQ Basics
     • No Speculation: an incomplete store stalls all future loads. The paper's No Speculation model is overly conservative because it also waits for store values
     • Most of these stalls are unnecessary: they are artificial dependences

  4. Aggressive Approach
     • Naive Speculation: assume that loads do not conflict with earlier stores, so all loads and stores execute out of order
     • When there is a conflict, the load behaves like a branch mispredict: all subsequent instructions are squashed and re-fetched
     • Expensive: 30-cycle penalty
     • Rename checkpoints for all instructions
     • Re-execute only the dependent instructions? More complex, but better performance
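
The squash-and-refetch recovery above can be sketched roughly as follows. This is a minimal illustration, not the paper's hardware: the function names, the use of sequence numbers to stand for program order, and the choice to squash the violating load itself along with everything younger are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each executed load is (sequence_number, address),
# and a larger sequence number means a younger instruction.
def find_violation(store_seq: int, store_addr: int,
                   executed_loads: List[Tuple[int, int]]) -> Optional[int]:
    """Return the sequence number of the oldest younger load that already
    read store_addr, or None if there is no memory-order violation."""
    victims = [seq for seq, addr in executed_loads
               if seq > store_seq and addr == store_addr]
    return min(victims) if victims else None


def squash_from(in_flight: List[int], first_bad: int) -> List[int]:
    """Keep only instructions older than the violating load (assumption:
    the load itself is squashed and re-fetched with its successors)."""
    return [seq for seq in in_flight if seq < first_bad]


loads = [(3, 0x100), (7, 0x100), (9, 0x100), (8, 0x200)]
bad = find_violation(store_seq=5, store_addr=0x100, executed_loads=loads)
survivors = squash_from([4, 5, 6, 7, 8, 9], bad) if bad is not None else None
```

Here the store with sequence number 5 finds that load 7 ran too early, so everything from 7 onward is discarded, including load 8, which touched an unrelated address. That wasted work is why the penalty is expensive.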

  5. Ideal Model
     • In the perfect model, loads only wait for conflicting stores: no artificial dependences and no memory-order violations
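
To contrast the three models discussed so far (No Speculation, Naive Speculation, and the ideal model), here is a small sketch of the "may this load issue?" decision under each. The `Store` record, the policy strings, and the idea that store addresses are known up front (which only an oracle would have) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Store:
    addr: int    # address the store will write (known here only as an oracle)
    done: bool   # True once the store has executed


def may_issue(load_addr: int, older_stores: List[Store], policy: str) -> bool:
    """Decide whether a load may issue, given the older in-flight stores."""
    if policy == "no_speculation":
        # Any incomplete older store blocks the load.
        return all(s.done for s in older_stores)
    if policy == "naive":
        # Assume no conflict; a violation is caught and repaired later.
        return True
    if policy == "ideal":
        # Wait only for incomplete stores to the same address.
        return all(s.done for s in older_stores if s.addr == load_addr)
    raise ValueError(f"unknown policy: {policy}")


stores = [Store(addr=0x100, done=False), Store(addr=0x200, done=True)]
```

With these stores, a load from `0x300` is stalled under `"no_speculation"` (an artificial dependence) but issues under `"ideal"`, while a load from `0x100` issues under `"naive"` and would later be squashed.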

  6. False Dependences and Violations

  7. Store Sets Concept
     • For every load, keep track of all stores that it has conflicted with in the past
     • A load does not issue if members of its store set have not finished (dependences are introduced at the time of dispatch)
     • The implementation is easy if
       • a load depends on only one store
       • a store is present in only one store set
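
The concept can be sketched with unbounded storage: a map from each load's PC to the set of store PCs it has conflicted with. The class and method names below are hypothetical, and a real design would use fixed-size tables instead of a dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class IdealStoreSets:
    """Unbounded store-set tracker (illustrative sketch only)."""

    def __init__(self) -> None:
        # load PC -> PCs of stores this load has conflicted with
        self.store_sets: Dict[int, Set[int]] = defaultdict(set)

    def record_violation(self, load_pc: int, store_pc: int) -> None:
        """Called after a memory-order violation between this pair."""
        self.store_sets[load_pc].add(store_pc)

    def must_wait(self, load_pc: int, pending_store_pcs: Iterable[int]) -> bool:
        """True if any older, unfinished store is in the load's store set."""
        members = self.store_sets.get(load_pc, set())
        return any(pc in members for pc in pending_store_pcs)


predictor = IdealStoreSets()
before = predictor.must_wait(0x40, [0x10])       # no history yet
predictor.record_violation(load_pc=0x40, store_pc=0x10)
after = predictor.must_wait(0x40, [0x10, 0x20])  # 0x10 is now in the set
```

The load at `0x40` issues freely until it is caught once conflicting with the store at `0x10`. From then on it waits only when that particular store is still pending.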

  8. Trivial Implementations
     • Execution time normalized to an ideal store set implementation

  9. Ideal Store Set Predictor
     • An occasional memory-order violation can introduce many false dependences; hence, use saturating counters
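
One way saturating counters could filter out occasional violations is sketched below. The slide does not give the counter width, the update rule, or the threshold, so the 2-bit width, the "upper half predicts a dependence" rule, and the method names are all assumptions.

```python
class SaturatingCounter:
    """n-bit up/down counter that clamps at 0 and at 2**n - 1."""

    def __init__(self, bits: int = 2) -> None:
        self.max = (1 << bits) - 1
        self.value = 0

    def increment(self) -> None:
        """Assumed to be called when the pair causes a violation."""
        self.value = min(self.value + 1, self.max)

    def decrement(self) -> None:
        """Assumed to be called when an enforced dependence was unnecessary."""
        self.value = max(self.value - 1, 0)

    def predicts_dependence(self) -> bool:
        # Enforce the dependence only in the upper half of the range.
        return self.value > self.max // 2


counter = SaturatingCounter()
counter.increment()                        # one isolated violation
after_one = counter.predicts_dependence()
counter.increment()                        # the conflict repeats
after_two = counter.predicts_dependence()
```

Under these assumptions a single violation is not enough to make the load wait. The dependence is enforced only once the conflict recurs, and it decays again if it stops being useful.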

  10. Implementation Overview
     • Every load/store depends on the last store in its set
     • This causes serialized stores and false dependences

  11. Store Set Implementation
     • Every load and store belongs to one color; keep track of the last writer for each color (mispredicts can pose problems)
     • Colors are merged as you discover memory-order violations
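
The "one color per instruction, one last writer per color" scheme can be sketched with two small tables. The class and method names are hypothetical, the PC-modulo indexing is an assumption, and the default sizes simply reuse the 4K-entry SSIT and 128-entry LFST figures quoted in the Design Details slide. Color assignment is taken as given here.

```python
from typing import List, Optional


class StoreSetTables:
    """Sketch of an SSIT (PC -> color) and an LFST (color -> last store)."""

    def __init__(self, ssit_entries: int = 4096, lfst_entries: int = 128) -> None:
        self.ssit: List[Optional[int]] = [None] * ssit_entries
        self.lfst: List[Optional[int]] = [None] * lfst_entries

    def _index(self, pc: int) -> int:
        return pc % len(self.ssit)  # assumed indexing; real hardware may hash

    def dispatch_store(self, pc: int, tag: int) -> Optional[int]:
        """Return the tag of the store to wait for, then become last writer."""
        color = self.ssit[self._index(pc)]
        if color is None:
            return None
        previous = self.lfst[color]
        self.lfst[color] = tag
        return previous

    def dispatch_load(self, pc: int) -> Optional[int]:
        """Return the tag of the last store of this load's color, if any."""
        color = self.ssit[self._index(pc)]
        return None if color is None else self.lfst[color]

    def store_issued(self, pc: int, tag: int) -> None:
        """Clear the LFST entry if this store is still the last writer."""
        color = self.ssit[self._index(pc)]
        if color is not None and self.lfst[color] == tag:
            self.lfst[color] = None


tables = StoreSetTables()
tables.ssit[tables._index(0x10)] = 5   # store at 0x10 has color 5
tables.ssit[tables._index(0x40)] = 5   # load at 0x40 has color 5
first = tables.dispatch_store(0x10, tag=1)
load_dep = tables.dispatch_load(0x40)
second = tables.dispatch_store(0x10, tag=2)
```

Note that the second instance of the store is made to wait for the first (`second == 1`), which is the store serialization mentioned on the previous slide. A squashed store that never calls `store_issued` would leave a stale last-writer entry, which is one plausible reading of why mispredicts are a problem.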

  12. Store Set Merging
     • Store set merging improves performance by 12%
     • Note that merging happens gradually: no need to instantly correct all entries in the table
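
Gradual merging can be illustrated by updating only the two instructions involved in each violation. The function name, the dictionary standing in for the table, and the "smaller ID wins" tie-break are assumptions for this sketch.

```python
import itertools
from typing import Dict, Iterator


def on_violation(ssit: Dict[int, int], load_pc: int, store_pc: int,
                 fresh_ids: Iterator[int]) -> None:
    """Put the violating load and store in the same store set.
    Only these two entries change; other members migrate later."""
    load_id = ssit.get(load_pc)
    store_id = ssit.get(store_pc)
    if load_id is None and store_id is None:
        ssit[load_pc] = ssit[store_pc] = next(fresh_ids)
    elif load_id is None:
        ssit[load_pc] = store_id
    elif store_id is None:
        ssit[store_pc] = load_id
    else:
        # Both already belong to a set: the smaller ID wins (assumption).
        ssit[load_pc] = ssit[store_pc] = min(load_id, store_id)


ssit: Dict[int, int] = {}
ids = itertools.count()
on_violation(ssit, 0x40, 0x10, ids)   # load 0x40, store 0x10 -> set 0
on_violation(ssit, 0x44, 0x14, ids)   # load 0x44, store 0x14 -> set 1
on_violation(ssit, 0x40, 0x14, ids)   # sets 0 and 1 begin to merge
snapshot = dict(ssit)
on_violation(ssit, 0x44, 0x14, ids)   # the straggler follows on its next violation
```

After the third violation the load at `0x44` still carries the old ID, so the table is briefly inconsistent. It is corrected only when that load is itself involved in a violation, which matches the note that entries need not be fixed all at once.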

  13. Design Details
     • Merging store sets
     • To deal with occasional dependences and conflicts
       • clear the table every million cycles
       • use saturating counters for each entry
     • The SSIT (Store Set Identifier Table) needs 4K entries and the LFST (Last Fetched Store Table) needs 128 entries

  14. Results

  15. Related Work
     • Store barrier cache: identify stores that are likely to pose conflicts
     • Keep track of all store-load conflict pairs and associatively check for dependences while dispatching instructions

  16. Next Week’s Paper
     • “Effective Hardware-Based Prefetching for High-Performance Microprocessors”, T.F. Chen and J.L. Baer, IEEE Transactions on Computers, May 1995
