
CPE 335 Computer Organization Basic MIPS Pipelining – Part III

Dr. Iyad Jafar. Adapted from Dr. Gheith Abandah slides: http://www.abandah.com/gheith/Courses/CPE335_S08/index.html


Presentation Transcript


  1. CPE 335 Computer Organization Basic MIPS Pipelining – Part III Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

  2. Branch Instructions Cause Control Hazards
     [Figure: pipeline diagram, instructions in program order: beq, lw, Inst 3, Inst 4]
     • The address of the instruction to be fetched after the beq instruction is not known until the MEM stage
     • Dependencies backward in time cause hazards

  3. Fixing Control Hazard by Stalls
     [Figure: pipeline diagram, instructions in program order: beq, three stalls, lw, Inst 3]
     • Delay fetching the next instruction by 3 cycles
     • IF (ID/EX.Branch) then stall the pipeline for the next 3 cycles
     • Is it actually a stall?

  4. Control Hazards
     Flush these instructions (set control values to 0)

  5. Reducing the Cost of Control Hazard
     [Figure: pipeline diagram, instructions in program order: beq, one stall, lw]
     • Approach I: Modify the ID Stage
       • Compute the branch address in the ID stage
       • Compare the two registers in the ID stage using additional hardware
       • This reduces the stalls to one
       • Any complications?

  6. Reducing the Cost of Control Hazard
     • Approach I, continued
       • IF (ID/EX.Branch) then flush the IF/ID register

  7. Reducing the Cost of Control Hazard
     • Approach II (Static Branch Prediction)
       • Always assume the branch is not taken and fetch the next sequential instruction
       • If the assumption is true, no additional cost is associated with the branch
       • If the assumption is false, we have to discard the fetched instruction and fetch the instruction at the branch address
       • IF (ID/EX.Branch and ID/EX.ZERO) then flush the IF/ID register
       • Unlike Approach I, flushing here is conditional!
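The cost of the approaches so far can be compared with a small calculation. This is only a sketch: the 11% branch fraction is borrowed from the instruction mix used in the later example, and the 60% taken rate is a made-up illustrative value, not a figure from the slides.

```python
def branch_stall_cpi(branch_frac, taken_frac):
    """Extra CPI contributed by branches under each handling scheme."""
    return {
        # Branch resolved in MEM: every branch stalls 3 cycles.
        "stall_until_mem": branch_frac * 3,
        # Approach I: branch resolved in ID, every branch loses 1 cycle.
        "resolve_in_id": branch_frac * 1,
        # Approach II: predict not taken, only taken branches lose 1 cycle.
        "predict_not_taken": branch_frac * taken_frac * 1,
    }

# taken_frac=0.6 is a hypothetical value chosen for illustration.
penalties = branch_stall_cpi(branch_frac=0.11, taken_frac=0.6)
for name, extra in penalties.items():
    print(f"{name}: +{extra:.3f} CPI")
```

The lower the fraction of taken branches, the more Approach II gains over Approach I, since untaken branches cost nothing.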

  8. Reducing the Cost of Control Hazard
     • Approach III (Dynamic Branch Prediction)
       • Use a history table or branch prediction buffer to store the branch prediction based on the last branch result
       • The table is addressed by the lower bits of the branch instruction address
       • If the branch is predicted untaken:
         • Fetch the next sequential instruction
         • Later, if it turns out that the branch is taken, flush the pipeline, i.e. one cycle is lost
       • If the branch is predicted taken:
         • We still have to wait one cycle for the branch address to be computed
         • Use a branch target buffer to store the branch address of this instruction
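A branch prediction buffer indexed by the low-order address bits might be sketched as follows. The class name, the table size, the initial "taken" prediction, and the choice to drop the two word-alignment bits before indexing are all assumptions made for illustration; the slides do not specify them.

```python
class BranchPredictionBuffer:
    """1-bit-per-entry prediction table indexed by low PC bits (illustrative)."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        # Assumed initial prediction for every entry: taken.
        self.table = [True] * (1 << index_bits)

    def _index(self, pc):
        # MIPS instructions are word aligned, so the two lowest bits are
        # always zero; skip them before taking the index bits (assumption).
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        # Remember the last outcome of the branch that maps to this entry.
        self.table[self._index(pc)] = taken


bpb = BranchPredictionBuffer(index_bits=4)
bpb.update(0x00400010, False)      # this branch was not taken
print(bpb.predict(0x00400010))     # same branch: predicted not taken
print(bpb.predict(0x00400050))     # different branch, same low bits: shares the entry
print(bpb.predict(0x00400014))     # different entry: still the initial prediction
```

Because only the low bits are used, two different branches can share one entry, so a prediction may reflect the history of another branch.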

  9. Reducing the Cost of Control Hazard
     • Approach III (Dynamic Branch Prediction)
       • 1-bit dynamic branch predictor
         • Use one bit to store the prediction
         • Update the prediction after each branch outcome
         • Performance shortcoming?
         • Consider branching in loops! We may mispredict twice.
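The loop problem can be seen by counting mispredictions for a loop-closing branch. This sketch assumes a loop whose branch is taken nine times and then falls through, with the whole loop executed three times; the function name and the pattern are illustrative only.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """Count mispredictions of a 1-bit predictor that remembers the last outcome."""
    prediction = initial
    misses = 0
    for taken in outcomes:
        if prediction != taken:
            misses += 1
        prediction = taken  # the single bit always tracks the last outcome
    return misses


# Loop branch: taken 9 times, then not taken on exit; loop executed 3 times.
loop_run = [True] * 9 + [False]
outcomes = loop_run * 3
print(one_bit_mispredictions(outcomes))
```

After the first execution of the loop, every later execution costs two mispredictions: one on loop exit and one on the first iteration of the next entry.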

  10. Reducing the Cost of Control Hazard
     • Approach III (Dynamic Branch Prediction)
       • 2-bit dynamic branch predictor
         • The prediction must be wrong twice before it is changed
     [Figure: state diagram of the four predictor states (Strong, Weak, Strong, Weak)]
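One common way to realize the four states is a 2-bit saturating counter. The sketch below assumes that organization (the slide's own state diagram may use different transitions), and the state names and the `step` helper are illustrative.

```python
# Counter values 0..3; values 2 and 3 predict taken (assumed encoding).
STATES = ["strong not taken", "weak not taken", "weak taken", "strong taken"]


def step(state, taken):
    """Return (prediction, next_state) for a 2-bit saturating counter."""
    prediction = state >= 2
    next_state = min(state + 1, 3) if taken else max(state - 1, 0)
    return prediction, next_state


state = 3  # start in "strong taken"
trace = []
for taken in [False, False, True, True]:
    prediction, state = step(state, taken)
    trace.append((prediction, STATES[state]))
    print(f"predicted {'T' if prediction else 'N'}, "
          f"actual {'T' if taken else 'N'}, now in '{STATES[state]}'")
```

Starting from the strong state, the predictor keeps predicting taken through the first wrong guess and only switches after the second.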

  11. Example
     • Consider a program that has a conditional branch instruction whose outcomes during execution are given below.
       • T-T-N-T-T-N-T
     • List the predictions for the following branch prediction schemes and find the prediction accuracy of each.
       • Predict always taken
       • Predict always untaken
       • 1-bit predictor, initialized to predict taken
       • 2-bit predictor, initialized to weakly predict taken

  12. Example
     • Actual branch actions: T-T-N-T-T-N-T
     • Predict as always taken
       • Predictions: T-T-T-T-T-T-T
       • Accuracy = 5/7 = 71%
     • Predict as always untaken
       • Predictions: N-N-N-N-N-N-N
       • Accuracy = 2/7 = 29%
     • 1-bit predictor initialized to predict taken
       • Predictions: T-T-T-N-T-T-N
       • Accuracy = 3/7 = 43%
     • 2-bit predictor initialized to weakly predict taken
       • Predictions: T-T-T-T-T-T-T
       • Accuracy = 5/7 = 71%
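The example can be checked mechanically. The sketch below treats both dynamic predictors as n-bit saturating counters, which is an assumption about the 2-bit scheme; the helper names are illustrative.

```python
def accuracy(predictions, actual):
    """Return (number of correct predictions, total branches)."""
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits, len(actual)


def counter_predictions(actual, bits, state):
    """n-bit saturating counter; predicts taken in the upper half of its range."""
    top = (1 << bits) - 1
    threshold = 1 << (bits - 1)
    preds = []
    for outcome in actual:
        preds.append("T" if state >= threshold else "N")
        state = min(state + 1, top) if outcome == "T" else max(state - 1, 0)
    return preds


actual = "T-T-N-T-T-N-T".split("-")
schemes = {
    "always taken": ["T"] * len(actual),
    "always untaken": ["N"] * len(actual),
    "1-bit, initially taken": counter_predictions(actual, bits=1, state=1),
    "2-bit, initially weakly taken": counter_predictions(actual, bits=2, state=2),
}

results = {}
for name, preds in schemes.items():
    hits, total = accuracy(preds, actual)
    results[name] = ("-".join(preds), hits, total)
    print(f"{name}: {'-'.join(preds)}  {hits}/{total} = {hits / total:.0%}")
```

For this outcome sequence the 2-bit predictor never leaves the taken states, because the branch is never untaken twice in a row.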

  13. Example
     • Let’s compare the performance of the single-cycle, multi-cycle, and pipelined implementations of the MIPS processor, given the operation times and instruction mix below. Assume that:
       • The branch decision is made in the MEM stage. Branches in the pipelined implementation are handled by stalling the pipeline.
       • Half of the load instructions incur a load-use hazard. Forwarding is implemented.

  14. Example
     • Clock cycle time
       • Single-cycle = 200 + 50 + 100 + 50 + 200 = 600 ps
       • Multi-cycle = 200 ps
       • Pipeline = 200 ps
     • CPI
       • Single-cycle = 1
       • Multi-cycle = 5 x 0.25 + 4 x 0.52 + 4 x 0.10 + 3 x 0.11 + 3 x 0.02 = 4.12
       • Pipeline = 0.125 x 2 + 0.125 x 1 + 0.52 x 1 + 0.10 x 1 + 0.11 x 4 + 0.02 x 2 = 1.475
     • Execution time per instruction
       • Single-cycle = 600 ps
       • Multi-cycle = 4.12 x 200 ps = 824 ps
       • Pipeline = 1.475 x 200 ps = 295 ps
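The weighted sums above can be evaluated term by term with a short script. The instruction-class names (load, alu, store, branch, jump) are inferred from the cycle counts in the expressions, since the original table is not part of this transcript; treat them as assumptions.

```python
# Instruction mix and per-class cycle counts as they appear in the CPI sums.
mix = {"load": 0.25, "alu": 0.52, "store": 0.10, "branch": 0.11, "jump": 0.02}
multi_cycles = {"load": 5, "alu": 4, "store": 4, "branch": 3, "jump": 3}

cycle_time_ps = 200          # multi-cycle and pipelined clock
single_cycle_time_ps = 600   # 200 + 50 + 100 + 50 + 200

multi_cpi = sum(mix[k] * multi_cycles[k] for k in mix)

# Pipelined: half of the loads stall one cycle (load-use), branches stall
# three cycles (4 cycles total), jumps are charged two cycles.
pipe_cpi = (
    0.5 * mix["load"] * 2
    + 0.5 * mix["load"] * 1
    + mix["alu"] * 1
    + mix["store"] * 1
    + mix["branch"] * 4
    + mix["jump"] * 2
)

print(f"single-cycle: CPI 1.000, {single_cycle_time_ps:.0f} ps per instruction")
print(f"multi-cycle:  CPI {multi_cpi:.3f}, {multi_cpi * cycle_time_ps:.0f} ps per instruction")
print(f"pipelined:    CPI {pipe_cpi:.3f}, {pipe_cpi * cycle_time_ps:.0f} ps per instruction")
```

Changing the branch term is enough to explore other branch-handling schemes with the same mix.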

  15. Exercise
     • Redo the computations in the previous example, assuming that branch prediction is used in the pipelined implementation and one-quarter of the branches are mispredicted!

  16. Summary
     • All modern processors use pipelining
     • Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
     • Potential speedup: a CPI of 1 and a fast clock cycle (CC)
     • The pipeline rate is limited by the slowest pipeline stage
     • Unbalanced pipeline stages make for inefficiencies
     • The time to “fill” the pipeline and the time to “drain” it can impact speedup for deep pipelines and short code runs
     • Must detect and resolve hazards
     • Stalling negatively affects CPI (makes CPI greater than the ideal of 1)
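The fill and drain effect can be made concrete with the usual idealized model: n instructions on a k-stage pipeline take k + n - 1 cycles when there are no hazards. The sketch assumes balanced stages and compares against an unpipelined machine that spends k cycles per instruction at the same clock; the function name is illustrative.

```python
def pipeline_speedup(n_instructions, stages=5):
    """Ideal speedup of a hazard-free pipeline over unpipelined execution."""
    unpipelined_cycles = n_instructions * stages
    pipelined_cycles = stages + n_instructions - 1  # fill time plus one per instruction
    return unpipelined_cycles / pipelined_cycles


for n in (1, 10, 100, 1000):
    print(f"{n:5d} instructions: speedup {pipeline_speedup(n):.2f}")
```

For short runs the speedup stays well below the number of stages, and it approaches the stage count only as the run gets long.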
