
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors

Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors. Onur Mutlu, The University of Texas at Austin; Jared Stark, Microprocessor Research, Intel Labs; Chris Wilkerson, Desktop Platforms Group, Intel Corp; Yale N. Patt, The University of Texas at Austin



Presentation Transcript


  1. Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors Onur Mutlu, The University of Texas at Austin Jared Stark, Microprocessor Research, Intel Labs Chris Wilkerson, Desktop Platforms Group, Intel Corp Yale N. Patt, The University of Texas at Austin Presented by: Mark Teper

  2. Outline • The Problem • Related Work • The Idea: Runahead Execution • Details • Results • Issues

  3. Brief Overview • Instruction Window: • The set of in-order instructions that have not yet been committed • Scheduling Window: • The set of unexecuted instructions waiting to be selected for execution • What can go wrong? [Diagram: program flow feeding the instruction window, scheduling windows, and execution units]

  4. The Problem [Diagram: a long-running instruction blocks the head of the instruction window; legend: unexecuted, executing, long-running, and committed instructions]

  5. Better Filling the Instruction Window [Graph: IPC vs. instruction window size]

  6. Related Work • Caches: • Alter size and structure of caches • Attempt to reduce unnecessary memory reads • Prefetching: • Attempt to fetch data into nearby cache before needed • Hardware & software techniques • Other techniques: • Waiting instruction buffer (WIB) • Long-latency block retirements

  7. RunAhead Execution • Continue executing instructions during long stalls • Disregard the results once the stalled data becomes available [Diagram: a checkpoint is taken at the long-running instruction while execution runs ahead; legend: unexecuted, executing, long-running, and committed instructions]
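The mode transitions on this slide can be sketched as follows. This is an illustrative model only: the Processor class and its fields are invented here, not taken from the paper's simulator.

```python
# Minimal sketch of runahead mode transitions (hypothetical model, not
# the paper's implementation): enter runahead on a long-latency miss,
# restore the checkpoint and discard all runahead results on return.

class Processor:
    def __init__(self):
        self.mode = "normal"
        self.checkpoint = None
        self.arch_state = {"pc": 0, "regs": [0] * 8}

    def on_l2_miss(self):
        """A long-latency L2 miss triggers entry into runahead mode."""
        if self.mode == "normal":
            # Checkpoint architectural state so results can be discarded later.
            self.checkpoint = {"pc": self.arch_state["pc"],
                               "regs": list(self.arch_state["regs"])}
            self.mode = "runahead"

    def on_miss_return(self):
        """When the stalling access returns, restore state and resume."""
        if self.mode == "runahead":
            self.arch_state = {"pc": self.checkpoint["pc"],
                               "regs": list(self.checkpoint["regs"])}
            self.mode = "normal"  # all runahead results are discarded
```

Note that the useful side effects of runahead (warmed caches, trained predictors) survive the restore even though the architectural results do not.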

  8. Benefits • Acts as a high-accuracy prefetcher • Software prefetchers have less information • Hardware prefetchers can't analyze code as well • Trains branch predictors • Makes use of cycles that are otherwise wasted

  9. Entering RunAhead • The processor can enter runahead mode at any point • The paper uses L2 cache misses as the trigger • The architecture needs to be able to checkpoint and restore register state • Including the branch-history register and the return address stack

  10. Handling Avoided Reads • The load that triggers runahead returns immediately • Its result is marked as INV (invalid) • The processor continues fetching and executing instructions • Example: ld r1, [r2] (r1 becomes INV) → add r3, r2, r2 (r3 valid) → add r3, r1, r2 (r3 becomes INV) → move r1, 0 (r1 valid again)
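The INV-propagation example on this slide can be sketched in a few lines. The instruction encoding and register names below are invented for illustration; only the propagation rule (any INV source makes the result INV, and writing a known value clears INV) comes from the slide.

```python
# Illustrative sketch of INV-bit propagation through the register file
# during runahead. None models an INV (invalid) value.

INV = None

def execute(regs, inst):
    """Execute one instruction; any INV source makes the result INV."""
    op, dst, *srcs = inst
    if op == "ld_miss":        # a load that misses: returns INV immediately
        regs[dst] = INV
    elif op == "add":
        a, b = regs[srcs[0]], regs[srcs[1]]
        regs[dst] = INV if (a is INV or b is INV) else a + b
    elif op == "move_imm":     # writing a known value clears the INV bit
        regs[dst] = srcs[0]
    return regs
```

Running the slide's sequence through this sketch: the load marks r1 INV, the first add produces a valid r3, the second add poisons r3 through r1, and the move makes r1 valid again.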

  11. Executing Instructions in RunAhead • Instructions are fetched and executed as normal • Instructions are retired out of the instruction window in program order (pseudo-retirement) • If an instruction's source registers are INV, it can be retired without executing • No results are ever observable outside the CPU

  12. Branches during RunAhead • Divergence points: a branch whose outcome depends on an INV value is mispredicted, sending runahead execution down the wrong path

  13. Exiting RunAhead • Occurs when the stalling memory access finally returns • The checkpointed architectural state is restored • All instructions in the machine are flushed • The processor starts fetching again at the instruction that caused runahead execution • The paper presents an optimization in which fetching restarts slightly before the stalled access returns

  14. Biasing Branch Predictors • RunAhead can cause a branch predictor to be trained twice on the same branch • Several alternatives: • Always train the branch predictor • Never train the branch predictor during runahead • Keep a list of already-predicted branches • Use a separate branch predictor during runahead
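The "never train during runahead" alternative can be sketched by gating predictor updates on the processor mode. The 2-bit-counter table below is a toy predictor chosen for illustration, not the paper's design.

```python
# Sketch of gating branch-predictor training on runahead mode, so a
# branch seen both in runahead and in normal execution is trained once.

class GatedPredictor:
    def __init__(self, size=16):
        # 2-bit saturating counters, initialized weakly not-taken (1).
        self.counters = [1] * size

    def predict(self, pc):
        return self.counters[pc % len(self.counters)] >= 2

    def train(self, pc, taken, in_runahead):
        if in_runahead:
            return  # skip updates so runahead cannot bias the counter twice
        i = pc % len(self.counters)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The same structure extends to the other alternatives on the slide, e.g. consulting a list of branches already trained during runahead instead of an unconditional skip.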

  15. RunAhead Cache • RunAhead execution disregards stores • They can't produce externally observable results • However, store data is needed for store-to-load communication • Solution: the RunAhead cache • Example: Loop: … store r1, [r2] add r1, r3, r1 store r1, [r4] load r1, [r2] bne r1, r5, Loop

  16. Stores and Loads in RunAhead • Loads: • If the address is INV, the data is automatically INV • Otherwise look in the store buffer, then the RunAhead cache, then memory • If the data is in the cache, treat it as valid • If not, treat it as INV and don't stall • Stores: • Use the store buffer as usual • On pseudo-retirement: if the address is INV, ignore the store; otherwise write the data to the RunAhead cache
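The load-resolution order on this slide can be sketched as a simple lookup chain. Modeling the store buffer, RunAhead cache, and data cache as plain dicts is a simplification invented here for illustration.

```python
# Sketch of runahead load resolution: store buffer first, then the
# RunAhead cache, then the data cache; a miss yields INV instead of
# stalling the pipeline.

INV = None

def runahead_load(addr, addr_is_inv, store_buffer, runahead_cache, l1_cache):
    if addr_is_inv:
        return INV                    # INV address -> data is automatically INV
    if addr in store_buffer:
        return store_buffer[addr]     # in-flight store forwards its data
    if addr in runahead_cache:
        return runahead_cache[addr]   # data from pseudo-retired runahead stores
    if addr in l1_cache:
        return l1_cache[addr]         # cache hit: treat the value as valid
    return INV                        # miss: return INV, do not stall
```

The ordering matters: checking the store buffer and RunAhead cache before memory is what lets store-to-load communication work inside the loop example on the previous slide.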

  17. Better Run-Ahead Cache Results • Found that not passing data from stores to loads resulted in poor performance • A significant number of loads returned INV results

  18. Details: Architecture

  19. Better Results

  20. Better Results (2)

  21. Issues • Some wrong assumptions about future machines • The paper's baseline corresponds poorly to modern architectures • Few details on the architectural requirements of the technique • Increased hardware area • Increased power requirements
