MIPS Pipelining




  1. MIPS Pipelining Chapter 4 Sections 4.5 – 4.8 Dr. Iyad F. Jafar

  2. Outline • Introduction • Why Pipelining? • MIPS Pipelined Datapath • MIPS Pipelined Control • Pipelining Hazards • Structural Hazards • Data Hazards • Control Hazards • Exceptions and Interrupts • Fallacies and Pitfalls • Reading Assignment

  3. Introduction • Single-cycle datapath • Simple! • Hardware replication? • Cycle time? • Multi-cycle datapath • More involved • Less HW replication of major units • Better performance if the delay of major functional units is balanced! • Can we do any better? • Pipelining!

  4. Introduction • Pipelining • In multi-cycle, only one major unit is used in each cycle while the other units are idle! • Why not use them to do something else? • Basically, start the next instruction before the current one is finished! [Figure: timing diagram, Cycle 1 to Cycle 8, showing LW, SW, and an R-type instruction overlapped through the IFetch, Dec, Exec, Mem, and WB stages]

  5. Introduction • Pipelining • The time required to execute one instruction (Instruction latency) is not affected! • However, the number of instructions finished per unit time (Throughput) is increased • Thus, Pipelining improves the throughput not latency! • Most modern processors are pipelined! • Notes • As in multi-cycle, the cycle time is determined by the slowest unit! • However, similar to single-cycle, we can get one instruction done every cycle! • It is assumed that all instructions take the same number of cycles!

  6. Introduction [Figure: timing comparison of three implementations running lw, sw, and an R-type instruction • Single Cycle Implementation: one long clock cycle per instruction (Cycle 1, Cycle 2), with part of a cycle marked "Waste" • Multiple Cycle Implementation (Cycle 1 to Cycle 10): lw takes IFetch, Dec, Exec, Mem, WB; sw takes IFetch, Dec, Exec, Mem; the R-type then starts with IFetch • Pipeline Implementation: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB, overlapped in time]

  7. Why Pipelining? • For Performance! Once the pipeline is full, one instruction is completed every cycle, so CPI = 1 (similar to Single-cycle) [Figure: pipeline diagram of Inst 1 to Inst 5 in instruction order against time (clock cycles), each using IM, Reg, ALU, DM, Reg; the initial cycles are labeled "Time to fill the pipeline"]

  8. Why Pipelining? • Example 1. Comparing pipelining to single-cycle Consider a program consisting of a large number of LOAD instructions only, executed on a single-cycle CPU and on a 5-stage pipelined CPU, where the operation time of each major unit (memory, ALU, and register file) is 200 ps in both cases. 1) Determine the time required to finish executing 1,000,000 LOAD instructions and compute the speedup of pipelining. 2) Determine the time required to finish executing the first 3 LOAD instructions. 3) Repeat (1) and (2) if the delay of the register file is 100 ps instead of 200 ps. Cycle times for the two implementations: CC_SC = 200 + 200 + 200 + 200 + 200 = 1000 ps, CC_PP = 200 ps

  9. Why Pipelining? • Example 1. Comparing pipelining to single-cycle 1) Determine the time required to finish executing 1,000,000 LOAD instructions and compute the speedup of pipelining. Single-cycle: Time_SC = 1000 ps x 1,000,000 = 1,000,000,000 ps. Pipelining: Time_PP = 1000 ps + 200 ps x 999,999 = 200,000,800 ps. Speedup = 1,000,000,000 / 200,000,800 = 4.99998 (very close to the number of stages)

  10. Why Pipelining? • Example 1. Comparing pipelining to single-cycle 2) Determine the time required to finish executing the first 3 LOAD instructions and compute the speedup of pipelining. Single-cycle: Time_SC = 1000 x 3 = 3000 ps. Pipelining: Time_PP = 200 x 5 + 200 + 200 = 1400 ps. Speedup = 3000 / 1400 = 2.14 (less than the number of stages)

  11. Why Pipelining? • Example 1. Comparing pipelining to single-cycle 3) Repeat (1) and (2) if the delay of the register file is 100 ps. CC_SC = 200 + 100 + 200 + 200 + 100 = 800 ps, CC_PP = 200 ps. For 1,000,000 instructions: Time_SC = 800 x 1,000,000 = 800,000,000 ps, Time_PP = 1000 + 200 x 999,999 = 200,000,800 ps, Speedup = 800,000,000 / 200,000,800 = 3.99998 (<5). For 3 instructions: Time_SC = 800 x 3 = 2400 ps, Time_PP = 1000 + 200 x 2 = 1400 ps, Speedup = 2400 / 1400 = 1.71 (<5)

  12. Why Pipelining? • Example 1. Summary • Ideally, the pipelined version is n times faster than the single-cycle version, where n is the number of pipeline stages. • In the 5-stage MIPS, the pipelined version would ideally be 5 times faster. • When the pipeline is full, the throughput is one instruction per cycle • Many factors affect pipelining performance • Time to fill and empty the pipeline • Number of instructions to execute • Unbalanced delay of pipeline stages • Instruction mix • Pipeline hazards • Ideally, the number of cycles required to finish M instructions in an N-stage pipeline is N + M - 1
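
The arithmetic of Example 1 can be checked with a short Python sketch. It assumes an ideal pipeline with no hazards, a single-cycle clock equal to the sum of the stage delays, and a pipelined clock equal to the slowest stage. The function names `pipeline_cycles` and `compare` are illustrative only and do not come from the slides.

```python
def pipeline_cycles(stages: int, instructions: int) -> int:
    """Ideal cycle count for M instructions on an N-stage pipeline: N + M - 1."""
    return stages + instructions - 1


def compare(stage_delays_ps, instructions):
    """Return (single-cycle time, pipelined time, speedup), times in ps.

    stage_delays_ps lists the delay of each stage in order IF, ID, EX, MEM, WB.
    Assumes no hazards, so the pipeline never stalls.
    """
    single_cycle_clock = sum(stage_delays_ps)  # one long cycle covers every stage
    pipelined_clock = max(stage_delays_ps)     # the slowest stage sets the clock
    time_sc = single_cycle_clock * instructions
    time_pp = pipelined_clock * pipeline_cycles(len(stage_delays_ps), instructions)
    return time_sc, time_pp, time_sc / time_pp


# Parts (1) and (2): every major unit takes 200 ps.
print(compare([200, 200, 200, 200, 200], 1_000_000))
print(compare([200, 200, 200, 200, 200], 3))
# Part (3): the register file (ID and WB stages) takes 100 ps.
print(compare([200, 100, 200, 200, 100], 1_000_000))
print(compare([200, 100, 200, 200, 100], 3))
```

Note that in part (3) the pipelined clock stays at 200 ps even though two stages became faster, which is why the speedup drops below the number of stages.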

  13. Pipelined MIPS Datapath • What do we need to implement pipelining? • We need to consider the following: • The execution of instructions is divided into 5 stages (cycles): Instruction fetch (IF) , Instruction decode (ID), Execute (EX), Memory Access (MEM), Write Back (WB) • Instruction flow is from left to right except in two cases • In the write-back stage where the result is written into the register file in the middle of the datapath • Choosing between the incremented PC and the branch address in the MEM stage • In pipelining, all units are operating in every cycle; thus we have to duplicate hardware where needed • Since the execution is over multiple cycles, we need to add State (Pipeline) registers between stages to preserve intermediate data and control for each instruction. • These registers hold the values to be used in later stages as long as they are needed.

  14. Pipelined MIPS Datapath [Figure: five-stage datapath (IF, ID, EX, MEM, WB) with the PC, +4 adder, instruction memory, register file, 16-to-32-bit sign extend, shift left 2, branch adder, ALU, and data memory, separated by the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB pipeline registers and driven by the system clock] Any problem?

  15. Pipelined MIPS Datapath [Figure: five-stage datapath (IF, ID, EX, MEM, WB) with the PC, +4 adder, instruction memory, register file, 16-to-32-bit sign extend, shift left 2, branch adder, ALU, and data memory, separated by the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB pipeline registers and driven by the system clock] Need to preserve the destination register!

  16. Pipelined MIPS Datapath • Example 2. Execution of LW instruction (1) Instruction Fetch: Put PC and the loaded instruction in the IF/ID register

  17. Pipelined MIPS Datapath • Example 2. Execution of LW instruction (2) Instruction Decode and Read Registers: Store Reg[rs], Reg[rt], the sign-extended offset, rd, rt, and the updated PC (why?) in the ID/EX register

  18. MIPS Pipelining • Example 2. Execution of LW instruction (3) Execute Or Address Calculation: Store branch address, Reg[rt], result, and zero flag in the EX/MEM register

  19. Pipelined MIPS Datapath • Example 2. Execution of LW instruction (4) Memory Access: Store the data from memory into MEM/WB register

  20. Pipelined MIPS Datapath • Example 2. Execution of LW instruction (5) Write Back: Copy the data loaded in the MEM/WB register to register file

  21. Pipelined MIPS Datapath • Required data fields in the pipelining registers • Data fields are moved from one pipeline register to another every clock cycle until they are no longer needed

  22. Pipelined MIPS Control • All control signals can be determined during Decode stage while they are needed in later stages! • Solution! Expand the pipeline registers to store and move the control signals between stages until they are needed

  23. Pipelined MIPS Control • Define the control signals and generate them in the decode stage • For the time being, no explicit write signals are required for the pipeline registers since they are updated every cycle

  24. Pipelined MIPS Control • Control signals needed in each stage • Control signal values based on instruction type

  25. MIPS Pipeline • Example 3. Given the code segment and the register contents below, show the contents of the data and control fields in the pipeline registers if the sixth instruction has been fetched (i.e. the beginning of cycle 7)

  26. MIPS Pipeline • Example 3. Multi-cycle diagram [Figure: the six instructions in instruction order against time, each using IM, Reg, ALU, DM, Reg] Instruction order: lw $10, 20($1) • sub $11,$1,$2 • add $12,$3,$4 • lw $13, 24($1) • add $3,$2,$1 • sub $1,$5,$6

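  27. MIPS Pipeline • Example 3. Single-cycle diagram [Figure: datapath snapshot with the instructions in flight labeled, from the IF stage to the WB stage: sub $1,$5,$6 • add $3,$2,$1 • lw $13, 24($1) • add $12,$3,$4 • sub $11,$1,$2]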

  28. MIPS Pipeline • Example 3. At the beginning of cycle 7, the sixth instruction is stored in the IF/ID register while the data and control for earlier instructions have been pushed to the next pipeline registers and the register file. Thus, • IF/ID register • No control signals are stored • Store the instruction sub $1,$5,$6 and PC+4 • IF/ID.Instruction = 0x00A60822 • IF/ID.PC = 0x00000018

  29. MIPS Pipeline • Example 3. • ID/EX register • Store the information of add $3,$2,$1 and PC+4 • ID/EX.PC = 0x00000014 • ID/EX.RegRsContents = 0x00000005 • ID/EX.RegRtContents = 0x00000001 • ID/EX.RegRt = (00001)2 • ID/EX.RegRd = (00011)2 • ID/EX.SignExtend = 0x00001820 • Control Information • ID/EX.MemToReg = 0 • ID/EX.RegWrite = 1 • ID/EX.MemRead = 0 • ID/EX.MemWrite = 0 • ID/EX.Branch = 0 • ID/EX.ALUSrc = 0 • ID/EX.RegDst = 1 • ID/EX.ALUOp = (10)2

  30. MIPS Pipeline • Example 3. • EX/MEM register • Store the information of lw $13,24($1), branch address, and memory address • EX/MEM.BranchAddress = 0x00000070 • EX/MEM.ALUOut = 0x00000019 • EX/MEM.Zero = 0 • EX/MEM.RegDestination = (01101)2 • EX/MEM.RegRtContents = 0x0000000A • Control Information • EX/MEM.MemToReg = 1 (the load writes back the memory data, not the ALU result) • EX/MEM.RegWrite = 1 • EX/MEM.MemRead = 1 • EX/MEM.MemWrite = 0 • EX/MEM.Branch = 0

  31. MIPS Pipeline • Example 3. • MEM/WB register • Store the information of add $12, $3,$4, addition result, and data memory • MEM/WB.RegDestination= (01100)2 • MEM/WB.ALUOut = 0xFFFFFFFD • MEM/WB.MemoryData = XXXX • Control Information • MEM/WB.MemToReg = 0 • MEM/WB.RegWrite = 1 • For the sub $11, $1,$2 • It will be writing (1 - 5) to $11
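
Some of the hexadecimal values in Example 3 can be reproduced with the following Python sketch. It assumes the standard MIPS R-type field layout (opcode, rs, rt, rd, shamt, funct) and that the program starts at address 0, which is consistent with IF/ID.PC = 0x18 for the sixth instruction. The helper names `encode_r_type` and `low16_sign_extended` are illustrative.

```python
FUNCT_ADD = 0x20  # funct field of add
FUNCT_SUB = 0x22  # funct field of sub


def encode_r_type(rs, rt, rd, funct, shamt=0, opcode=0):
    """Pack the six R-type fields into one 32-bit instruction word."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def low16_sign_extended(instruction):
    """Sign-extend the low 16 bits, as the ID stage does for every instruction."""
    imm = instruction & 0xFFFF
    if imm & 0x8000:
        return imm | 0xFFFF0000
    return imm


# sub $1,$5,$6 sits in IF/ID: rs = 5, rt = 6, rd = 1
sub_word = encode_r_type(rs=5, rt=6, rd=1, funct=FUNCT_SUB)
print(hex(sub_word))  # 0xa60822

# add $3,$2,$1 sits in ID/EX: the "sign-extended offset" is just its rd/shamt/funct bits
add_word = encode_r_type(rs=2, rt=1, rd=3, funct=FUNCT_ADD)
print(hex(low16_sign_extended(add_word)))  # 0x1820

# lw $13,24($1) is the 4th instruction, so its PC+4 is 0x10;
# the EX stage still computes a branch address from the offset 24.
branch_address = 0x10 + (24 << 2)
print(hex(branch_address))  # 0x70
```

The sign-extend field of an R-type instruction is never used, but the hardware computes it anyway, which is why ID/EX.SignExtend holds 0x00001820 for the add.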

  32. Pipelining Hazards • In general, pipelining is effective! • The MIPS ISA makes it even easier • All instructions are of the same length (32 bits) • Can fetch the next instruction while the current one is being decoded • Few instruction formats with symmetry across them • Can read the register file in the 2nd stage • Memory access is through the Load and Store instructions only • Can use the execute stage to compute the address • Each MIPS instruction writes at most one result in the MEM or WB stage • Is it that easy? Any complications? • YES! • PIPELINING HAZARDS !

  33. Pipelining Hazards • Hazards - problems that might occur during pipeline operation • Three basic sources • Structural Hazards • In pipelining, all functional units are used in any cycle • What if two instructions use the same functional unit in the same cycle? • Data Hazards • In pipelining, execution of instructions is overlapped • What if the operand(s) of some instruction come from an earlier instruction that is still in the pipeline? • Control Hazards • In pipelining, an instruction is fetched every cycle • What if an instruction is a jump, or a branch that evaluates to true? The following instruction(s) in the pipeline might not be the correct ones! • Simple Solution? • Wait until the issue is resolved!

  34. Structural Hazards • Single Memory! Reading from memory twice in the same cycle! [Figure: lw followed by Inst 1 to Inst 4 in instruction order against time (clock cycles), each using Mem, Reg, ALU, Mem, Reg] Solution: Use two memories; Data and Instruction!

  35. Structural Hazards • Single Register File! One instruction is writing and the other is reading the register file? [Figure: add $1, followed by Inst 1, Inst 2, and add $2,$1, in instruction order against time (clock cycles), each using IM, Reg, ALU, DM, Reg; one clock edge controls register writing and the other controls loading of the pipeline state registers] Solution: Design the register file to write in the first half of the cycle and read in the second half!

  36. Data Hazards • Dependencies backward in time cause hazards • This is called Read-after-Write (RAW) data hazard • Register-use data hazard [Figure: pipeline diagram of add $1, followed by sub $4,$1,$5 • and $6,$1,$7 • or $8,$1,$9 • xor $4,$1,$5] Solution?

  37. Data Hazards • Simply, wait for the earlier instruction to finish! This is called stalling the pipeline! However, this affects the CPI? [Figure: pipeline diagram of add $1, followed by two stall cycles, then sub $4,$1,$5 and and $6,$1,$7] Do we need two stalls all the time?

  38. Data Hazards • Dependencies backward in time cause hazards • It is a Read-after-Write (RAW) data hazard • Load-use data hazard [Figure: pipeline diagram of lw $1,5($s1) followed by sub $4,$1,$5 • and $6,$1,$7 • or $8,$1,$9 • xor $4,$1,$5] Solution?

  39. Data Hazards • Again, wait for the LW instruction to finish by stalling the pipeline! However, this affects the CPI? [Figure: pipeline diagram of lw $1, followed by two stall cycles, then sub $4,$1,$5 and and $6,$1,$7]

  40. Data Hazards • Example 4. How many cycles are actually required to execute the following code? Assume the pipeline is already full. add $1, $2, $5 • add $5, $3, $1 • sub $10, $7, $8 • sub $5, $6, $7 • lw $3, 45($9) • add $3, $3, $8 • Ideally, and since the pipeline is full, each instruction requires 1 cycle. Thus, we need 6 cycles (CPI = 6/6 = 1). However, … • Register-use data hazard (add $5, $3, $1 needs $1 from the preceding add): adds 2 stall cycles • Load-use data hazard (add $3, $3, $8 needs $3 from the preceding lw): adds 2 stall cycles • Thus, 10 cycles are needed. CPI = 10/6 = 1.667 ?? Performance ?? Can we do any better?
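
The stall count in Example 4 can be reproduced with a small Python sketch. It assumes a 5-stage pipeline without forwarding in which the register file writes in the first half of a cycle and reads in the second half, so a consumer may sit in ID during the same cycle its producer is in WB. The function name and the `(dest, sources)` tuple encoding are illustrative, not taken from the slides.

```python
def cycles_without_forwarding(program):
    """Count issue slots (useful cycles plus stalls) for a full pipeline.

    program is a list of (dest, sources) tuples in program order, where dest
    is the destination register number and sources the registers read.
    """
    write_back_cycle = {}  # register -> cycle in which its latest producer is in WB
    id_cycle = 0
    first_id_cycle = None
    for dest, sources in program:
        earliest = id_cycle + 1  # in-order issue: one cycle after the previous ID
        for src in sources:
            # A source can be read no earlier than its producer's WB cycle.
            earliest = max(earliest, write_back_cycle.get(src, 0))
        id_cycle = earliest
        if first_id_cycle is None:
            first_id_cycle = id_cycle
        write_back_cycle[dest] = id_cycle + 3  # ID -> EX -> MEM -> WB
    return id_cycle - first_id_cycle + 1


example_4 = [
    (1, (2, 5)),   # add $1, $2, $5
    (5, (3, 1)),   # add $5, $3, $1   (needs $1: 2 stalls)
    (10, (7, 8)),  # sub $10, $7, $8
    (5, (6, 7)),   # sub $5, $6, $7
    (3, (9,)),     # lw  $3, 45($9)
    (3, (3, 8)),   # add $3, $3, $8   (needs $3: 2 stalls)
]
cycles = cycles_without_forwarding(example_4)
print(cycles, cycles / len(example_4))
```

Under these assumptions the sketch reports 10 cycles, matching the CPI of 10/6 on the slide.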

  41. Data Hazards • Fixing Register-use Hazard by Forwarding • Note that data produced by an instruction and needed by a later instruction is pushed through the pipeline registers until it is saved into the register file! • Why not read the data from the pipeline registers before it is stored? • This is called forwarding! • What is required? • Need to detect the hazard • Is any of the source registers of the instruction the same as the destination register of an earlier instruction that is still in the pipeline? • Need to create a path to pass the data between pipeline stages • Instead of reading the source registers of the instruction from the register file, read them from the pipeline registers

  42. Data Hazards • Fixing Register-use Hazard by Forwarding [Figure: pipeline diagram of add $1, followed by sub $4,$1,$5 • and $6,$1,$7 • or $8,$1,$9 • xor $4,$1,$5] No Stalls!

  43. Data Hazards • Forwarding Hardware implementation Note that forwarding could be from EX/MEM or from MEM/WB! Why?

  44. Data Hazards • Forwarding Hardware implementation • Inside the forwarding unit • Forwarding from EX/MEM (MEM Stage) • if (EX/MEM.RegWrite and (EX/MEM.RegRd != 0) and (EX/MEM.RegRd == ID/EX.RegRs)) then ForwardA = From EX/MEM • if (EX/MEM.RegWrite and (EX/MEM.RegRd != 0) and (EX/MEM.RegRd == ID/EX.RegRt)) then ForwardB = From EX/MEM • Why check the RegWrite signal? • Why check for register zero ($0)?

  45. Data Hazards • Forwarding Hardware implementation • Inside the forwarding unit • Forwarding from MEM/WB (WB Stage) if (MEM/WB.RegWrite and (MEM/WB.RegRd != 0) and (MEM/WB.RegRd = ID/EX.RegRs)) then ForwardA = From MEM/WB if (MEM/WB.RegWrite and (MEM/WB.RegRd != 0) and (MEM/WB.RegRd = ID/EX.RegRt)) then ForwardB = From MEM/WB
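
The two sets of conditions can be combined into one Python sketch of the forwarding unit. One assumption here goes beyond the slides: when both EX/MEM and MEM/WB match the same source register, the sketch prefers EX/MEM because it holds the more recent result. The function name and the string labels for the multiplexor selections are illustrative.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return the (ForwardA, ForwardB) selections for the two ALU operands."""

    def select(source_register):
        # EX/MEM is checked first (assumed priority): it holds the newest value.
        if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == source_register:
            return "EX/MEM"
        if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == source_register:
            return "MEM/WB"
        return "REGFILE"  # no hazard: use the value read in the ID stage

    return select(id_ex_rs), select(id_ex_rt)


# sub $4,$1,$5 in EX while add $1,... is in MEM: operand A comes from EX/MEM.
print(forwarding_unit(id_ex_rs=1, id_ex_rt=5,
                      ex_mem_reg_write=True, ex_mem_rd=1,
                      mem_wb_reg_write=False, mem_wb_rd=0))
```

The checks on RegWrite and on register number 0 keep the unit from forwarding a value that an instruction never writes (for example a store) or a value targeted at $0, which must always read as zero.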

  46. Data Hazards • Can the forwarding hardware be used with Load-use data hazard? [Figure: pipeline diagram of lw $1,4($2) followed by sub $4,$1,$5 • and $6,$1,$7 • or $8,$1,$9 • xor $4,$1,$5] We still need 1 stall for the instruction following the load!

  47. Data Hazards • How to stall the pipeline? • A stall is required when the instruction in the EX stage is a Load and the one in the ID stage depends on the loaded value • The Load instruction moves normally to EX/MEM on the next cycle • The conflicting instruction (the instruction following the load) should stay in the decode stage? How? • Don’t write the IF/ID register → need IF/IDWrite Signal • Don’t update the PC → need PCWrite Signal • The control signals of the instruction in the decode stage are stored as 0’s (WHY?) in the ID/EX → need a multiplexor for the control signals • Controlling the process requires a special unit; Hazard Detection Unit

  48. Data Hazards • Stall Implementation

  49. Data Hazards • Stall Implementation • Inside hazard detection unit if (ID/EX.MemRead and [(ID/EX.RegRt == IF/ID.RegRs) or (ID/EX.RegRt == IF/ID.RegRt)]) then PCWrite = 0 IF/IDWrite = 0 Select 0’s as control signals Any Problem? Do we need to stall in all cases? How about j and jal that come immediately after load with rs and/or rt fields being the same as the rt field of the load?
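
The hazard detection condition above can be written as a short Python sketch. The function name, the dictionary keys, and the `ZeroControl` flag (standing for "select 0's as control signals") are illustrative names, not signals defined in the slides.

```python
def hazard_detection_unit(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Decide whether to stall for a load-use hazard.

    id_ex_mem_read: True when the instruction in EX is a load.
    id_ex_rt:       destination register of that load.
    if_id_rs/rt:    source register fields of the instruction in ID.
    """
    stall = bool(id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))
    return {
        "PCWrite": 0 if stall else 1,      # hold the PC so the fetch repeats
        "IF/IDWrite": 0 if stall else 1,   # hold the instruction in decode
        "ZeroControl": 1 if stall else 0,  # inject a bubble into ID/EX
    }


# lw $1,4($2) in EX and sub $4,$1,$5 in ID: the pipeline must stall.
print(hazard_detection_unit(True, id_ex_rt=1, if_id_rs=1, if_id_rt=5))
# add in EX (MemRead = 0): no stall, forwarding covers the dependency.
print(hazard_detection_unit(False, id_ex_rt=1, if_id_rs=1, if_id_rt=5))
```

As the slide points out, this check compares raw rs and rt bit fields, so it can also stall for an instruction that does not actually read those registers, such as j or jal.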

  50. Data Hazards • Example 5. Consider the following code segment in C: A = B + E; C = B + F; (1) Generate the MIPS code assuming that variables A, B, C, E, and F are in memory and addressable with offsets 0, 4, 8, 12, and 16 from $t0 (2) Find all the data hazards and determine the number of cycles required to run the code. Assume forwarding is implemented. (3) Can you reorder the code to reduce the stalls?
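
One way to explore Example 5 is the Python sketch below. It assumes one possible translation of the C code (the choice of $t1 to $t5 is an assumption, not given in the slides), full forwarding, and a single stall whenever an instruction uses the result of the load immediately before it. The total cycle count is taken as N + M - 1 plus the stalls, starting from an empty pipeline.

```python
def load_use_stalls(program):
    """Count one stall for each instruction that reads the register
    loaded by the lw immediately before it (forwarding assumed)."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        op, dest, _ = previous
        if op == "lw" and dest in current[2]:
            stalls += 1
    return stalls


def total_cycles(program, stages=5):
    """Cycles from an empty pipeline: N + M - 1 plus load-use stalls."""
    return stages + len(program) - 1 + load_use_stalls(program)


# (op, destination, sources); a store has no destination register.
straightforward = [
    ("lw", "$t1", ("$t0",)),        # lw  $t1, 4($t0)    B
    ("lw", "$t2", ("$t0",)),        # lw  $t2, 12($t0)   E
    ("add", "$t3", ("$t1", "$t2")),  # add $t3, $t1, $t2  (load-use on $t2)
    ("sw", None, ("$t3", "$t0")),   # sw  $t3, 0($t0)    A
    ("lw", "$t4", ("$t0",)),        # lw  $t4, 16($t0)   F
    ("add", "$t5", ("$t1", "$t4")),  # add $t5, $t1, $t4  (load-use on $t4)
    ("sw", None, ("$t5", "$t0")),   # sw  $t5, 8($t0)    C
]

# Moving the third load up separates each load from its first use.
reordered = [straightforward[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(load_use_stalls(straightforward), total_cycles(straightforward))
print(load_use_stalls(reordered), total_cycles(reordered))
```

Under these assumptions the straightforward order has two load-use stalls, and hoisting the load of F removes both.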
