CSCE430/830 Computer Architecture

CSCE430/830 Computer Architecture. Instruction-level parallelism: Advanced HW Approaches. Lecturer: Prof. Hong Jiang. Fall, 2006. ILP: Advanced HW Approaches.

Presentation Transcript


  1. CSCE430/830 Computer Architecture Instruction-level parallelism: Advanced HW Approaches Lecturer: Prof. Hong Jiang Fall, 2006

  2. ILP: Advanced HW Approaches
     • Dynamic Hardware Branch Prediction: control dependences rapidly become the limiting factor as the amount of ILP to be exploited increases, particularly when multiple instructions are issued per cycle.
     • Basic Branch Prediction and Branch-Prediction Buffers
       • A branch-prediction buffer is a small memory indexed by the lower portion of the address of the branch instruction; each entry contains a bit that says whether the branch was recently taken or not. The scheme is simple, and useful only when the branch delay is longer than the time to calculate the target address.
       • The prediction bit is inverted each time there is a wrong prediction, which causes an accuracy problem: a branch that is almost always taken is mispredicted twice, rather than once, when it is not taken. A remedy is the 2-bit predictor, a special case of the n-bit predictor (saturating counter), which performs well (accuracy: 82-99%).
       • State diagram of the 2-bit scheme: states 11 and 10 predict taken; states 01 and 00 predict not taken; the transitions between states are labeled Taken / Not taken.
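The "mispredict twice" problem can be illustrated with a small simulation. This is a minimal sketch, not taken from the slides: it assumes a plain saturating counter (increment on taken, decrement on not taken), an initial state of "strongly taken", and a hypothetical loop branch that is taken 9 times and then falls through, with the loop executed 10 times. The names `simulate`, `pattern`, `one_bit`, and `two_bit` are illustrative.

```python
def simulate(outcomes, bits):
    """Count mispredictions of one n-bit saturating counter on a branch trace."""
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)   # states at or above this value predict taken
    state = max_state             # assumption: start in the strongest "taken" state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            mispredictions += 1
        # Saturating update: move toward the observed outcome.
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return mispredictions


# Hypothetical loop branch: taken 9 times, then not taken; loop run 10 times.
pattern = ([True] * 9 + [False]) * 10

one_bit = simulate(pattern, bits=1)   # misses on loop exit and on re-entry
two_bit = simulate(pattern, bits=2)   # misses only on loop exit
print(one_bit, two_bit)
```

Under these assumptions the 1-bit scheme mispredicts on every loop exit and again on every re-entry after the first, while the 2-bit counter mispredicts only on the exits.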

  3. ILP: Advanced HW Approaches
     • Dynamic Hardware Branch Prediction:
     • Correlating Branch Predictors
       • The behavior of branch b3 is correlated with the behavior of branches b1 and b2 (b1 and b2 both not taken → b3 will be taken). A predictor that uses only the behavior of a single branch to predict the outcome of that branch can never capture this behavior.
       • Branch predictors that use the behavior of other branches to make a prediction are called correlating predictors or two-level predictors.
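The code fragment behind b1, b2, and b3 is not preserved in this transcript. The sketch below assumes a fragment of the commonly used form `if (aa == 2) aa = 0; if (bb == 2) bb = 0; if (aa != bb) {...}`, where each branch is taken when it skips the guarded statement. Under that assumption it checks that b3 is taken whenever b1 and b2 are both not taken. The function and variable names are illustrative.

```python
def branch_outcomes(aa, bb):
    """Return (b1_taken, b2_taken, b3_taken) for the assumed code fragment."""
    b1_taken = aa != 2        # b1 branches around "aa = 0" when aa != 2
    if not b1_taken:
        aa = 0
    b2_taken = bb != 2        # b2 branches around "bb = 0" when bb != 2
    if not b2_taken:
        bb = 0
    b3_taken = aa == bb       # b3 branches around the body guarded by aa != bb
    return b1_taken, b2_taken, b3_taken


# Collect any input where b1 and b2 are both not taken but b3 is not taken.
violations = [
    (aa, bb)
    for aa in range(5)
    for bb in range(5)
    if branch_outcomes(aa, bb)[:2] == (False, False)
    and not branch_outcomes(aa, bb)[2]
]
print(violations)
```

A predictor that sees only b3's own history cannot exploit this relationship, because the deciding information lives in the outcomes of b1 and b2.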

  4. ILP: Advanced HW Approaches • Dynamic Hardware Branch Prediction:

  8. ILP: Advanced HW Approaches • Dynamic Hardware Branch Prediction: • Correlating Branch Predictors • The standard predictor mispredicted all branches!

  16. ILP: Advanced HW Approaches
      • Dynamic Hardware Branch Prediction:
      • Correlating Branch Predictors
        • With the 1-bit correlating predictor, also called a (1,1) predictor, the only misprediction is on the first iteration!
        • In the general case, an (m,n) predictor uses the behavior of the last m branches to choose from 2^m branch predictors, each of which is an n-bit predictor for a single branch.
        • The number of bits in an (m,n) predictor is: 2^m × n × (number of prediction entries selected by the branch address).
        • Figure: the lower bits of the branch address (4 in the figure) select a set of 2-bit per-branch predictors, and the 2-bit global branch history (a shift register) selects the prediction within that set.
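The (m,n) organization and its bit count can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `CorrelatingPredictor`, the parameter `index_bits` (number of low-order address bits used as the index), counters initialized to 0, and the alternating test trace are all hypothetical choices, not details from the slides.

```python
class CorrelatingPredictor:
    """Sketch of an (m, n) predictor: m bits of global history, n-bit counters."""

    def __init__(self, m, n, index_bits):
        self.m, self.n, self.index_bits = m, n, index_bits
        self.history = 0                      # global history shift register
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)         # counters >= threshold predict taken
        # One row per address index; each row holds 2^m n-bit counters.
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def total_bits(self):
        return (1 << self.m) * self.n * (1 << self.index_bits)

    def _row(self, pc):
        return self.table[pc & ((1 << self.index_bits) - 1)]

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc, taken):
        row = self._row(pc)
        count = row[self.history]
        row[self.history] = min(count + 1, self.max_count) if taken else max(count - 1, 0)
        # Shift the outcome into the m-bit global history.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)


def count_mispredictions(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


alternating = [True, False] * 10              # hypothetical branch trace
correlating_misses = count_mispredictions(CorrelatingPredictor(1, 1, 4), 0x40, alternating)
two_bit_misses = count_mispredictions(CorrelatingPredictor(0, 2, 4), 0x40, alternating)
bits_2_2 = CorrelatingPredictor(2, 2, 10).total_bits()
print(correlating_misses, two_bit_misses, bits_2_2)
```

On this alternating trace the (1,1) predictor misses only once, while a (0,2) predictor (a plain 2-bit counter with no history) misses on every taken branch. A (2,2) predictor with 1K entries selected by the branch address needs 2^2 × 2 × 1024 = 8K bits.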

  17. ILP: Advanced HW Approaches • Dynamic Hardware Branch Prediction: • Performance of Correlating Branch Predictors

  18. ILP: Advanced HW Approaches
      • Dynamic Hardware Branch Prediction:
      • Tournament Predictors: Adaptively Combining Local and Global Predictors
        • Tournament predictors take the insight that adding global information to local predictors helps improve performance to the next level, by
          • using multiple predictors, usually one based on global information and one based on local information, and
          • combining them with a selector.
        • Better accuracy at medium sizes (8K to 32K bits) and more effective use of very large numbers of prediction bits: the right predictor for the right branch.
        • Existing tournament predictors use a 2-bit saturating counter per branch to choose between two different predictors. The counter is incremented whenever the “predicted” predictor is correct and the other predictor is incorrect, and it is decremented in the reverse situation.
        • State Transition Diagram: four states, two labeled "Use predictor 1" and two labeled "Use predictor 2", with transitions labeled 0/0, 0/1, 1/0, and 1/1.
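The selector's behavior can be sketched as a 2-bit saturating counter. This is an illustration under assumptions: the encoding (values 2 and 3 mean "use predictor 1", values 0 and 1 mean "use predictor 2"), the initial value, and the names `Selector`, `choose`, and `update` are hypothetical. The counter moves only when exactly one of the two predictors was correct.

```python
class Selector:
    """Sketch of a per-branch 2-bit selector between two predictors."""

    def __init__(self):
        self.counter = 3   # assumption: start at strong "use predictor 1"

    def choose(self):
        return 1 if self.counter >= 2 else 2

    def update(self, p1_correct, p2_correct):
        if p1_correct and not p2_correct:
            self.counter = min(self.counter + 1, 3)   # favor predictor 1
        elif p2_correct and not p1_correct:
            self.counter = max(self.counter - 1, 0)   # favor predictor 2
        # Both correct or both wrong: no change.


selector = Selector()
events = [(False, True), (False, True), (True, True), (True, False)]
trace = []
for p1_correct, p2_correct in events:
    selector.update(p1_correct, p2_correct)
    trace.append(selector.choose())
print(trace)
```

Because the counter has two bits, a single disagreement does not flip the choice; it takes two in a row from a strong state, which keeps the selector from oscillating on one unusual outcome.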

  19. ILP: Advanced HW Approaches • Dynamic Hardware Branch Prediction: • Performance of Tournament Predictors (figures: prediction due to local predictor; misprediction rate of 3 different predictors)

  20. Instruction-Level Parallelism
      • Dynamic Hardware Branch Prediction:
      • The Alpha 21264 Branch Predictor:
        • 4K 2-bit saturating counters, indexed by the local branch address, choose between:
          • A Global Predictor that has
            • 4K entries that are indexed by the history of the last 12 branches;
            • each entry is a standard 2-bit predictor.
          • A Local Predictor that consists of a two-level predictor:
            • At the top level is a local history table consisting of 1024 10-bit entries, with each entry corresponding to the most recent 10 branch outcomes for the entry;
            • At the bottom level is a table of 1K entries, indexed by the 10-bit entry of the top level, consisting of 3-bit saturating counters which provide the local prediction.
        • It uses a total of 29K bits for branch prediction, resulting in very high accuracy: 1 misprediction in 1000 for SPECfp95 and 11.5 in 1000 for SPECint95.
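The 29K-bit total follows directly from the table sizes listed on this slide. The short calculation below adds them up; the variable names are illustrative.

```python
K = 1024

chooser_bits = 4 * K * 2          # 4K 2-bit selector counters
global_bits = 4 * K * 2           # 4K entries, each a 2-bit predictor
local_history_bits = 1024 * 10    # 1024 entries of 10-bit local history
local_counter_bits = 1 * K * 3    # 1K 3-bit saturating counters

total_bits = chooser_bits + global_bits + local_history_bits + local_counter_bits
print(total_bits // K, "K bits")

# 12 bits of global history address exactly 4K entries,
# and a 10-bit local history addresses exactly 1K counters.
global_entries = 2 ** 12
local_entries = 2 ** 10
```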

  21. ILP: Advanced HW Approaches
      • High-Performance Instruction Delivery:
      • Branch-Target Buffers
        • A branch-target buffer is a branch-prediction cache that stores the predicted address for the next instruction after a branch:
          • It predicts the next instruction address before decoding the current instruction!
          • The target buffer is accessed during the IF stage, using the instruction address of the fetched instruction (a possible branch) to index the buffer.
        • Figure: the PC of the instruction to fetch is looked up in the buffer. Each entry holds a branch address and a predicted PC, and may also record whether the branch is predicted taken or untaken. The PC is compared (=) against the stored addresses:
          • No: the instruction is not predicted to be a branch; proceed normally.
          • Yes: the instruction is a taken branch, and the predicted PC should be used as the next PC.
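The lookup can be sketched with a dictionary standing in for the buffer. This is a simplification under assumptions: a real buffer has a fixed number of entries indexed by address bits with a tag comparison, while the sketch models only the hit/miss behavior. The class name, method names, and the 4-byte instruction size are hypothetical.

```python
class BranchTargetBuffer:
    """Sketch of a branch-target buffer: branch PC -> predicted next PC."""

    def __init__(self):
        self.entries = {}

    def insert(self, branch_pc, target_pc):
        self.entries[branch_pc] = target_pc

    def delete(self, branch_pc):
        self.entries.pop(branch_pc, None)

    def next_pc(self, pc, instruction_size=4):
        # Hit: treat the instruction as a taken branch and use the predicted PC.
        # Miss: not predicted to be a branch, so fetch the sequential instruction.
        return self.entries.get(pc, pc + instruction_size)


btb = BranchTargetBuffer()
btb.insert(0x100, 0x80)             # hypothetical backward loop branch
hit_pc = btb.next_pc(0x100)         # predicted target
miss_pc = btb.next_pc(0x104)        # sequential fetch
print(hex(hit_pc), hex(miss_pc))
```

The point of the structure is that `next_pc` needs only the fetch address, so the prediction is available in IF, before the instruction has been decoded.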

  22. ILP: Advanced HW Approaches
      • Handling branch-target buffers (flowchart):
        • IF: send PC to memory and branch-target buffer. Entry found in branch-target buffer?
        • No entry found: in ID, is the instruction a taken branch?
          • No: normal instruction execution (0 cycle penalty).
          • Yes: in EX, enter the branch instruction address and next PC into the branch-target buffer (2 cycle penalty).
        • Entry found: in ID, send out the predicted PC; in EX, taken branch?
          • Yes: branch correctly predicted; continue execution with no stalls (0 cycle penalty).
          • No: mispredicted branch; kill the fetched instruction, restart fetch at the other target, and delete the entry from the target buffer (2 cycle penalty).
      • Integrated Instruction Fetch Units: to meet the demands of multiple-issue processors, recent designs have used an integrated instruction fetch unit that integrates several functions:
        • Integrated branch prediction: the branch predictor becomes part of the instruction fetch unit and is constantly predicting branches, so as to drive the fetch pipeline.
        • Instruction prefetch: to deliver multiple instructions per clock, the instruction fetch unit will likely need to fetch ahead, autonomously managing the prefetching of instructions and integrating it with branch prediction.
        • Instruction memory access and buffering: encapsulates the complexity of fetching multiple instructions per clock, trying to hide the cost of crossing cache blocks, and provides buffering, acting as an on-demand unit that supplies instructions to the issue stage as needed and in the quantity needed.
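The four outcomes of the branch-target buffer flowchart on this slide can be sketched as one function that returns the penalty and updates the buffer. This is a minimal sketch with assumptions: the buffer is modeled as a plain dictionary, a stored target is assumed to be correct whenever the branch is taken, and the function name and example addresses are hypothetical.

```python
def btb_step(btb, pc, taken, target):
    """Apply one executed instruction to the buffer; return penalty cycles.

    btb maps a branch PC to its predicted next PC.
    """
    if pc in btb:
        if taken:
            return 0              # correctly predicted; no stalls
        del btb[pc]               # mispredicted: remove the entry
        return 2
    if taken:
        btb[pc] = target          # taken branch not in buffer: enter it
        return 2
    return 0                      # not a taken branch: normal execution


btb = {}
# Hypothetical loop branch at 0x100 targeting 0x80: taken 3 times, then exits.
penalties = [btb_step(btb, 0x100, taken, 0x80) for taken in (True, True, True, False)]
print(penalties, btb)
```

In this trace the branch pays 2 cycles on its first taken execution (entry allocated) and 2 cycles on loop exit (entry deleted), with no penalty in between.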

  23. ILP: Advanced HW Approaches
      • Taking Advantage of More ILP with Multiple Issue
        • Superscalar: issues a varying number of instructions per cycle; the instructions are either statically scheduled (using compiler techniques, thus in-order execution) or dynamically scheduled (using techniques based on Tomasulo’s algorithm, thus out-of-order execution).
        • VLIW (very long instruction word): issues a fixed number of instructions, formatted either as one large instruction or as a fixed instruction packet, with the parallelism among instructions explicitly indicated by the instruction (hence they are also known as EPIC, explicitly parallel instruction computers). VLIW and EPIC processors are inherently statically scheduled by the compiler.

  24. ILP: Advanced HW Approaches • Taking Advantage of More ILP with Multiple Issue • Multiple Instruction Issue with Dynamic Scheduling: dual issue with Tomasulo’s algorithm

  25. ILP: Advanced HW Approaches • Taking Advantage of More ILP with Multiple Issue: resource usage

  26. ILP: Advanced HW Approaches • Taking Advantage of More ILP with Multiple Issue • Multiple Instruction Issue with Dynamic Scheduling: + an adder and a CDB

  27. ILP: Advanced HW Approaches • Taking Advantage of More ILP with Multiple Issue: more resources
