Chapter 6 Pipelining to Increase Effective Computer Speed

This chapter gives an overview of pipelining, a technique for increasing the effective speed of a computer, and of the components and datapaths it involves. It discusses the pipeline stages, compares single-cycle, multiple-cycle, and pipelined implementations, and describes the hazards that pipelining introduces. It also covers the data and control paths of a MIPS pipeline.

irvingy

Presentation Transcript


  1. Chapter 6: Pipelining to Increase Effective Computer Speed

  2. Overview of Components and Datapaths: Fetch → Decode → Execute → Memory Access → Write Back

  3. MIPS Instruction Times: the “stages” require varying times, but all of the same order of magnitude. Memory access usually requires the most time.

  4. Start the next instruction before the current one has completed
     [Figure: timing diagrams for lw, sw, and an R-type instruction against the clock (Clk)]
     • Single Cycle Implementation: lw occupies Cycle 1 and sw occupies Cycle 2
     • Multiple Cycle Implementation (Cycle 1 to Cycle 10): lw takes IFetch, Dec, Exec, Mem, WB; then sw takes IFetch, Dec, Exec, Mem; then the R-type instruction starts with IFetch
     • Pipeline Implementation: lw, sw, and the R-type instruction each pass through IFetch, Dec, Exec, Mem, WB, overlapped in time

  5. Datapath Modifications for MIPS Pipelining
     • What do we need to add/modify in our MIPS datapath?
     • Add state registers between each pipeline stage to isolate them
     [Figure: pipelined datapath with the stages IF (IFetch), ID (Dec), EX (Execute), MEM (MemAccess), and WB (WriteBack), separated by the state registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB and driven by the system clock]

  6. Graphical View of Pipelining
     [Figure: Inst 0 to Inst 4 in instruction order against time (clock cycles), each passing through IM, Reg, ALU, DM, Reg; the figure marks the time to “fill” the pipeline and the time to “drain” the pipeline]
     • Once the pipeline is full, one instruction is completed every cycle, so the CPI approaches 1
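The fill-and-drain behaviour on slide 6 can be checked with a little arithmetic. The sketch below assumes an ideal five-stage pipeline with no stalls; the function names are illustrative and do not come from the slides.

```python
def pipeline_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles needed to run n instructions on an ideal pipeline (no stalls)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles to pass through the pipeline
    # (the "fill" time); every later instruction completes one cycle after
    # its predecessor.
    return n_stages + (n_instructions - 1)


def cpi(n_instructions: int, n_stages: int = 5) -> float:
    """Average cycles per instruction over the whole run."""
    return pipeline_cycles(n_instructions, n_stages) / n_instructions


for n in (1, 5, 100, 10_000):
    print(n, pipeline_cycles(n), round(cpi(n), 4))
```

For long instruction streams the fill time is amortized, which is why the CPI tends toward 1 rather than reaching it exactly.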

  7. Pipelining Obstacles: Pipeline Hazards
     • structural hazards: attempt to use the same resource by two different instructions at the same time
     • data hazards: attempt to use data before it is ready
         • An instruction’s source operand(s) are produced by a prior instruction still in the pipeline
     • control hazards: attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated
         • branch instructions
     • Can always resolve hazards by waiting
         • pipeline control must detect the hazard
         • and take action to resolve hazards

  8. A Single Memory Would Be a Structural Hazard
     [Figure: lw, sw, Inst 2, Inst 3, Inst 4 in instruction order against time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg; labels mark “Reading data from memory”, “Writing into memory”, and “Reading instruction from memory” competing for the single memory]
     • Fix by using Dual Port Memory

  9. How About Register File Access?
     [Figure: add $1, ... followed by Inst 1, Inst 2, and add $2,$1, ... against time (clock cycles); labels mark the clock edge that controls register writing and the clock edge that controls loading of the pipeline state registers]
     • Fix register file access hazards: writes in the first half of the cycle, reads in the second half

  10. Register Usage Can Cause Data Hazards
      • Dependencies backward in time cause data hazards
      [Figure: add $1, ... followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, xor $4,$1,$5, each passing through IM, Reg, ALU, DM, Reg]
      • Read before write data hazard: a later instruction reads $1 before add has written it (conventionally called a read-after-write, or RAW, dependence)

  11. Fix a Register Use Data Hazard by Stalling
      [Figure: add $1, ... followed by two stall cycles, then sub $4,$1,$5 and and $6,$1,$7]
      • Can fix data hazard by waiting (stall), but this impacts CPI

  12. Fix a Register Use Data Hazard by “Forwarding Data”
      [Figure: add $1, ... followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, xor $4,$1,$5, with results forwarded between stages]
      • Fix data hazards by forwarding results as soon as they are available to where they are needed

  13. Load-use Can Cause Data Hazards
      • Dependencies backward in time cause data hazards
      [Figure: lw $1,4($2) followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, xor $4,$1,$5]
      • Load-use data hazard

  14. Forwarding for Load-use Data Hazards
      • Will still need one stall cycle even with forwarding
      [Figure: lw $1,4($2) followed by sub $4,$1,$5, and $6,$1,$7, or $8,$1,$9, xor $4,$1,$5]

  15. Summary – Data hazards
      • Data hazards occur due to attempts to use data before it is ready
      • Can be handled by inserting stalls (or inserting nops)
      • Can be handled by forwarding data to the earliest point at which it is needed, as soon as it is available
      • Load followed by use still requires one stall
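The stall counts quoted on the data-hazard slides can be reproduced with a small timing model. This is a sketch under stated assumptions, not the hazard unit from the slides: a five-stage pipeline, a register file that writes in the first half of a cycle and reads in the second, and forwarding into the EX stage. The function `count_stalls` and the tuple encoding of instructions are hypothetical.

```python
def count_stalls(program, forwarding):
    """Count stall cycles for a list of (op, dest, sources) instructions.

    Each instruction is tracked by the cycle in which it occupies EX.
    """
    ready = {}    # register -> earliest EX cycle at which a consumer may use it
    prev_ex = 0   # EX cycle of the previous instruction
    stalls = 0
    for op, dest, sources in program:
        earliest = prev_ex + 1  # EX cycle if nothing holds the instruction back
        ex = max([earliest] + [ready.get(r, 0) for r in sources])
        stalls += ex - earliest
        if dest is not None:
            if forwarding:
                # ALU results can be forwarded to the next EX cycle;
                # loaded data only exists after MEM, one cycle later.
                ready[dest] = ex + (2 if op == "lw" else 1)
            else:
                # The consumer must read the register file in ID during or
                # after the producer's WB cycle (write first half, read second).
                ready[dest] = ex + 3
        prev_ex = ex
    return stalls


# Sequence from slide 10. The add's source operands are cut off in the
# transcript, so none are modeled here.
alu_chain = [
    ("add", "$1", ()),
    ("sub", "$4", ("$1", "$5")),
    ("and", "$6", ("$1", "$7")),
    ("or",  "$8", ("$1", "$9")),
    ("xor", "$4", ("$1", "$5")),
]
# Sequence from slide 13 (load followed by use).
load_use = [("lw", "$1", ("$2",))] + alu_chain[1:]

print(count_stalls(alu_chain, forwarding=False))  # 2
print(count_stalls(alu_chain, forwarding=True))   # 0
print(count_stalls(load_use, forwarding=False))   # 2
print(count_stalls(load_use, forwarding=True))    # 1
```

Under these assumptions the model agrees with the slides: two stalls without forwarding, none with forwarding for an ALU producer, and one remaining stall for a load followed immediately by a use.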

  16. MIPS Pipeline Data and Control Paths
      [Figure: pipelined datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, the Control unit, and the control signals PCSrc, Branch, RegWrite, MemtoReg, ALUSrc, ALUOp, RegDst, MemWrite, and MemRead]

  17. Summary – Structural Hazards
      • Structural hazards occur due to attempts to use the same resource by two different instructions at the same time
      • Memory: use Dual Port memory
      • Registers: write in the first half of the cycle and read in the second half
      • Control Signals: add bits to the pipeline registers to carry control signals generated in the ID stage to subsequent stages
      • Writeback: add bits to the pipeline registers to carry the destination address for the writeback stage

  18. Example of a Control Hazard

  19. Control Hazards
      • Occur when the flow of instruction addresses is not sequential (sequential meaning PC = PC + 4); incurred by change-of-flow instructions
          • Conditional branches (beq, bne)
          • Unconditional branches (j, jal, jr)
          • Exceptions
      • Possible approaches
          • Stall (impacts CPI)
          • Move decision point as early in the pipeline as possible, thereby reducing the number of stall cycles
          • Delay decision (requires compiler support)
          • Predict and hope for the best!
      • Control hazards occur less frequently than data hazards, but there is nothing as effective against control hazards as forwarding is for data hazards

  20. Branch Instruction Control Hazards
      [Figure: beq followed by lw, Inst 3, Inst 4 in instruction order, each passing through IM, Reg, ALU, DM, Reg]

  21. Branch Instruction Control Hazard: Stall
      [Figure: beq followed by three stall cycles, then lw and Inst 3]
      • Fix branch hazard by waiting (stall), but again this affects CPI
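How much stalling on branches costs can be estimated with the usual effective-CPI formula. The sketch below assumes every branch stalls for a fixed number of cycles; the 20% branch frequency is a made-up illustrative figure, not a number from the slides, and `effective_cpi` is a hypothetical helper.

```python
def effective_cpi(base_cpi: float, branch_fraction: float, stall_cycles: int) -> float:
    """CPI when every branch adds a fixed number of stall cycles."""
    return base_cpi + branch_fraction * stall_cycles


# Assumed workload: 20% of instructions are branches (illustrative only).
BRANCH_FRACTION = 0.20

# Three stall cycles matches the figure on slide 21; the smaller values show
# the effect of resolving the branch earlier in the pipeline.
for stalls in (3, 2, 1):
    print(stalls, round(effective_cpi(1.0, BRANCH_FRACTION, stalls), 2))
```

Each stall cycle removed from the branch path lowers the CPI by the branch fraction, which is the motivation for moving the decision earlier in the pipeline.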

  22. Example of a Control Hazard

  23. Jump Control Hazards Require One Stall
      • Jumps are not decoded until ID, so one stall is needed
      [Figure: j followed by a flushed instruction slot and then the instruction at the jump target]
      • Fix jump hazard by waiting (stall), but this affects CPI

  24. Moving Branch Decisions Earlier in Pipe (Control hazard)
      • Move the branch decision hardware back to the EX stage to reduce stalls
      • Add hardware to the ID stage to compute the branch target address and evaluate the branch decision

  25. Supporting ID Stage Branches (left for you as an exercise)
      [Figure: pipelined datapath with a Compare unit and branch-target adder in the ID stage, a Hazard Unit, an IF.Flush signal, two Forward Units, and the Branch and PCSrc control signals]

  26. Delayed decision - Scheduling Branch Delay Slots
      A. From before branch: add $1,$2,$3 / if $2=0 then / [delay slot] becomes if $2=0 then / add $1,$2,$3
      B. From branch target: sub $4,$5,$6 (at the target) ... add $1,$2,$3 / if $1=0 then / [delay slot] becomes add $1,$2,$3 / if $1=0 then / sub $4,$5,$6
      C. From fall through: add $1,$2,$3 / if $1=0 then / [delay slot] / sub $4,$5,$6 becomes add $1,$2,$3 / if $1=0 then / sub $4,$5,$6
      • A is the best choice, fills delay slot and reduces IC
      • In B and C, the sub instruction may need to be copied, increasing IC
      • In B and C, must be okay to execute sub when branch fails

  27. Prediction - Static Branch Prediction
      • Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome
      • Predict not taken: always predict branches will not be taken and continue to fetch from the sequential instruction stream; only when the branch is taken does the pipeline stall
          • Restart the pipeline at the branch destination

  28. Flushing with Misprediction (Not Taken)
      [Figure: instructions in instruction order, each passing through IM, Reg, ALU, DM, Reg, with one flush marked]
      4 beq $1,$2,2
      8 sub $4,$1,$5
      16 and $6,$1,$7
      20 or $8,$1,$9
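The addresses on slide 28 follow from the standard MIPS branch-target rule: the 16-bit offset counts words relative to the instruction after the branch. A minimal sketch, with `branch_target` as an illustrative name:

```python
def branch_target(pc: int, offset_words: int) -> int:
    """Target of a MIPS conditional branch located at address pc."""
    # The offset is relative to PC + 4 and is shifted left by 2 to turn a
    # word count into a byte offset.
    return pc + 4 + (offset_words << 2)


# beq $1,$2,2 at address 4, as on the slide
print(branch_target(4, 2))  # 16
```

This is why the instruction at address 16 follows the branch in the listing when the branch is taken, while the instruction at address 8 was fetched under the not-taken prediction.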

  29. Prediction – Not taken: Branching Structures
      • Predict not taken works well for “top of the loop” branching structures
          Loop: beq $1,$2,Out
                1st loop instr
                . . .
                last loop instr
                j Loop
          Out:  fall out instr
      • But such loops have jumps at the bottom of the loop to return to the top of the loop, and incur the jump stall overhead
      • Predict not taken doesn’t work well for “bottom of the loop” branching structures
          Loop: 1st loop instr
                2nd loop instr
                . . .
                last loop instr
                bne $1,$2,Loop
                fall out instr

  30. Prediction - Static Branch Prediction
      • Resolve branch hazards by assuming a given outcome and proceeding
      • Predict taken: predict branches will always be taken
          • Predict taken always incurs one stall cycle in a loop
          • Doesn’t support forward branch likelihood (often taken by exception)
      • Prediction taken or not taken is most often best judged by the compiler
      • Dynamic branch prediction: predict branches at run-time using run-time information
          • Branch prediction bit(s) (1 bit and 2 bit are classic)
          • Branch prediction table(s) and buffer(s)

  31. Dynamic Prediction - 1-bit Prediction Accuracy
      • A 1-bit predictor will mispredict a loop branch twice per execution of the loop, under the assumptions below
      • Assume predict_bit = 0 to start (indicating branch not taken) and loop control is at the bottom of the loop code
      • First time through the loop, the predictor mispredicts the branch since the branch is taken back to the top of the loop; invert prediction bit (predict_bit = 1)
      • As long as branch is taken (looping), prediction is correct
      • Exiting the loop, the predictor again mispredicts the branch since this time the branch is not taken, falling out of the loop; invert prediction bit (predict_bit = 0)
          Loop: 1st loop instr
                2nd loop instr
                . . .
                last loop instr
                bne $1,$2,Loop
                fall out instr
      • For 10 times through the loop we have an 80% prediction accuracy for a branch that is taken 90% of the time
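The 80% figure can be reproduced by simulating the prediction bit directly. The sketch below follows the assumptions on the slide (predict_bit starts at 0, the branch sits at the bottom of a loop that runs 10 times); the function name is illustrative.

```python
def one_bit_accuracy(outcomes, predict_bit=0):
    """Fraction of branches predicted correctly by a 1-bit predictor."""
    correct = 0
    for taken in outcomes:
        if predict_bit == int(taken):
            correct += 1
        else:
            # On a misprediction the bit is inverted, which is the same as
            # remembering the most recent outcome.
            predict_bit = int(taken)
    return correct / len(outcomes)


# Ten passes through the loop: the bottom-of-loop branch is taken nine times
# and falls through once on exit.
loop_branch = [True] * 9 + [False]
print(one_bit_accuracy(loop_branch))  # 0.8
```

Because the bit ends at 0 again after the loop exits, running the same loop a second time repeats both mispredictions.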

  32. Dynamic Prediction - 2-bit Predictors
      • A 2-bit scheme can give 90% accuracy since a prediction must be wrong twice before the prediction bit is changed
          Loop: 1st loop instr
                2nd loop instr
                . . .
                last loop instr
                bne $1,$2,Loop
                fall out instr
      [Figure: state diagram with four states, two labeled “Predict Taken” and two labeled “Predict Not Taken”, connected by “Taken” and “Not taken” transitions]

  33. Dynamic Prediction - Branch Tables / Buffers
      • A Branch History Table lists the history of the branch at a given location and can give good prediction reliability
      • A Branch Target Buffer can provide early branch address calculation

  34. Dealing with Exceptions
      • Exceptions (aka interrupts) are just another form of control hazard. Exceptions arise from
          • R-type arithmetic overflow
          • Trying to execute an undefined instruction
          • An I/O device request
          • An OS service request (e.g., a page fault, TLB exception)
          • A hardware malfunction
      • The pipeline has to stop executing the offending instruction in midstream, let all prior instructions complete, flush all following instructions, set a register to show the cause of the exception, save the address of the offending instruction, and then jump to a prearranged address (the address of the exception handler code)
      • The software (OS) looks at the cause of the exception and “deals” with it

  35. Two Types of Exceptions
      • Interrupts: asynchronous to program execution
          • caused by external events
          • may be handled between instructions, so can let the instructions currently active in the pipeline complete before passing control to the OS interrupt handler
          • simply suspend and resume user program
      • Traps on Exception: synchronous to program execution
          • caused by internal events
          • condition must be remedied by the trap handler for that instruction, so must stop the offending instruction midstream in the pipeline and pass control to the OS trap handler
          • the offending instruction may be retried (or simulated by the OS) and the program may continue or it may be aborted

  36. Where in the Pipeline Exceptions Occur
      Exception               Stage(s)?   Synchronous?
      Arithmetic overflow     EX          yes
      Undefined instruction   ID          yes
      I/O service request     any         no
      Hardware malfunction    any         no
      • Be aware that multiple exceptions can occur simultaneously in a single clock cycle

  37. Additions to MIPS to Handle Exceptions
      • Cause register (records exceptions): hardware to record in Cause the exceptions and a signal to control writes to it
      • EPC (Exception Program Counter) register (records the addresses of the offending instructions): hardware to record in EPC the address of the offending instruction and a signal to control writes to it
      • Mechanism to load the PC with the address of the exception handler
      • Mechanism to flush offending instruction and the ones that follow it
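The sequence of actions described on the exception slides can be sketched as a small state update. This is a toy model, not the hardware from the slides: `PipelineState`, `take_exception`, and the cause code are hypothetical, and the handler address is assumed to be the constant 8000 0180hex that appears in the datapath figure on slide 38.

```python
from dataclasses import dataclass, field

HANDLER_ADDRESS = 0x8000_0180  # assumed from the constant shown on slide 38
CAUSE_OVERFLOW = 12            # illustrative cause code, not taken from the slides


@dataclass
class PipelineState:
    pc: int = 0
    cause: int = 0
    epc: int = 0
    # Addresses of the instructions currently in the pipeline, oldest first.
    in_flight: list = field(default_factory=list)


def take_exception(state: PipelineState, offending_address: int, cause_code: int):
    """Apply the exception steps; return (completed, flushed) addresses."""
    idx = state.in_flight.index(offending_address)
    completed = state.in_flight[:idx]   # prior instructions run to completion
    flushed = state.in_flight[idx:]     # offending instruction and all that follow
    state.in_flight = []
    state.cause = cause_code            # record why the exception happened
    state.epc = offending_address       # remember where to resume or retry
    state.pc = HANDLER_ADDRESS          # next fetch comes from the handler
    return completed, flushed


state = PipelineState(pc=0x54, in_flight=[0x40, 0x44, 0x48, 0x4C, 0x50])
done, flushed = take_exception(state, 0x48, CAUSE_OVERFLOW)
print([hex(a) for a in done], [hex(a) for a in flushed], hex(state.pc))
```

Keeping the completed and flushed sets separate mirrors the requirement that instructions before the offending one still finish, while the offending instruction and its successors leave no visible effects.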

  38. Datapath with Controls for Exceptions
      [Figure: pipelined datapath with a Hazard Unit, two Forward Units, a Compare unit in the ID stage, the IF.Flush and ID.Flush signals, and the constant 8000 0180hex as an additional input for the PC]

  39. Summary
      • All modern day processors use pipelining for performance (a CPI of 1 and a fast CC)
      • Pipeline clock rate is limited by the slowest pipeline stage, so designing a balanced pipeline is important
      • Must detect and resolve hazards
          • Structural hazards: resolved by pipeline hardware
          • Data hazards
              • Stall (impacts CPI)
              • Forward (requires hardware support, sometimes complex)
          • Control hazards: put the branch decision hardware in as early a stage in the pipeline as possible
              • Stall (impacts CPI)
              • Delay decision (usually requires compiler support)
              • Static and dynamic prediction (requires hardware support)
          • Exception hazards
              • Interrupts (pipelining adds inherent delays)
              • Traps on processing exceptions (flush partially executed instructions)

  40. More Performance
      • Two options:
          • Increase the depth of the pipeline to increase the clock rate: superpipelining (e.g., increasing the number of cycles per instruction, allowing memory to have 2 cycles, and fetching early)
          • Fetch (and execute) more than one instruction at one time (expand every pipeline stage to accommodate multiple instructions): multiple-issue
      • Launching multiple instructions per stage allows the instruction execution rate, CPI, to be less than 1
      • If the datapath has a five stage pipeline, how many instructions are active in the pipeline at any given time?

  41. Multiple-Issue Processor Styles
      • Static multiple-issue processors (VLIW)
          • Decisions on which instructions to execute simultaneously are made at compile time by the compiler
          • E.g., IA-64 ISA (EPIC - Explicit Parallel Instruction Computer), Intel Itanium and Itanium 2
          • Particularly powerful for executing “unraveled” (unrolled) loops
      • Dynamic multiple-issue processors (superscalar)
          • Decisions on which instructions to execute simultaneously are made at run time by the hardware
          • E.g., IBM Power 2, Pentium 4

  42. Multiple-Issue Datapath Responsibilities
      • Must handle, with a combination of hardware and software fixes, the fundamental limitations of
          • Storage (data) dependencies: data hazards
              • Limitation more severe in a SS/VLIW processor due to (usually) low Instruction Level Parallelism (ILP)
          • Procedural dependencies: control hazards
              • Ditto, but even more severe
              • Use dynamic branch prediction to help resolve the ILP issue
          • Resource conflicts: structural hazards
              • A SS/VLIW processor has a much larger number of potential resource conflicts
              • Functional units may have to arbitrate for result buses and register-file write ports
              • Resource conflicts can be eliminated by duplicating the resource or by pipelining the resource

  43. HW 4 (Due Nov 20) Do the following exercises in the Text: 6.4 6.22 6.39 For your lab project 2, only design the Hamming code logic circuit. You don’t need to build it.