480 likes | 618 Views
CMPE 421 Parallel Computer Architecture. Part 2: Hardware Solution: Forwarding. Hardware Solution: Forwarding. Idea: use intermediate data , do not wait for result to be finally written to the destination register. Two steps: Detect data hazard
 
                
                E N D
CMPE 421Parallel Computer Architecture Part 2: Hardware Solution: Forwarding
Hardware Solution: Forwarding • Idea: use intermediate data, do not wait for result to be finally written to the destination register. Two steps: • Detect data hazard • Forward intermediate data to resolve hazard
Review: MIPS Pipeline Data and Control Paths PCSrc ID/EX EX/MEM Control IF/ID Add MEM/WB Branch Add 4 RegWrite Shift left 2 Read Addr 1 Instruction Memory Data Memory Register File Read Data 1 Read Addr 2 MemtoReg Read Address ALUSrc PC Read Data Address Write Addr ALU Read Data 2 Write Data Write Data ALU cntrl How many bits wide is each pipeline register? PC – 32 bits IF/ID – 64 bits ID/EX – 9 + 32x4 + 10 = 147 EX/MEM – 5 + 1 + 32x3 + 5 = 107 MEM/WB – 2 + 32x2 + 5 = 71 MemWrite MemRead Sign Extend 16 32 ALUOp RegDst
Pipelined Datapath with Control II (as before) Control signals emanate from the control portions of the pipeline registers
Data Forwarding • Plan: • allow inputs to the ALU not just from ID/EX, but also later pipeline registers, and • use multiplexors and control signals to choose appropriate inputs to ALU sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Fig 6.29 Dependencies between pipelines move forward in time
Possible Hazard Conditions can be detected by following notations during forwarding technique • Hazard conditions: 1a. EX/MEM.RegisterRd = ID/EX.RegisterRs 1b. EX/MEM.RegisterRd = ID/EX.RegisterRt 2a. MEM/WB.RegisterRd = ID/EX.RegisterRs 2b. MEM/WB.RegisterRd = ID/EX.RegisterRt • Eg., in the earlier example (Fig. 6-29), first hazard between sub $2, $1, $3 and and $12, $2, $5 is detected when the and is in EX stage and the sub is in MEM stage because • EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 (1a) • Similar to above this time dependency between “sub” & “or” can be detected as • MEM/WB.RegisterRd = ID/EX.RegisterRt = $2 (2b) • The two dependencies on “sub”-”add” are not hazard • Another form of forwarding but it occurs within the reg file • There is no hazard between sub-sw
Remarks: • -We don’t have any WB hazard • Why? Assume that REG. file supplies the correct result if the next instruction in the ID stage can read the register written by the current instruction in the WB stage • -Whether to forward also depends on: • if the later instruction is going to write a register – if not, no need to forward, even if there is register number match as in conditions above • If the destination register of the later instruction is $0 – in which case there is no need to forward value ($0 is always 0 and never overwritten)
Forwarding Hardware • To Detect hazard forwarding unit should be added by inserting Mux’es to the ALU inputs (see Fig 6.30) • For forwarding just the R-type instrucutions initially sub, add, and , …. • Forward A and Forward B control lines to select MUX inputs that will go into ALU • Forwarding unit will be in EX stage because the ALU forwarding MUX’es are found in that stage
Forwarding Hardware FIGURE 6.31 The control values for the forwarding multiplexors in Figure 6.30. The signed immediate that is another input to the ALU is described in the Elaboration at the end of this section.
Data Forwarding (Bypassing) • Take the result from the earliest point that it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it that cycle • For ALU functional unit: the inputs can come from any pipeline register rather than just from ID/EX by • adding multiplexors to the inputs of the ALU • connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX’s stage Rs and Rt ALU mux inputs • adding the proper control hardware to control the new muxes • Other functional units may need similar forwarding logic (e.g., the DM) • With forwarding can achieve a CPI of 1 even in the presence of data dependencies
ForwardingHardware FIGURE 6.30 On the top are the ALU and pipeline registers before adding forwarding. On the bottom, the multiplexors have been expanded to add the forwarding paths, and we show the forwarding unit. The new hardware is shown in color. This figure is a stylized drawing, how ever, leaving out details from the full datapath such as the sign extension hardware. Note that the ID/EX. RegisterRt field is shown twice, once to connect to the mux and once to the forwarding unit, but it is a single signal. As in the earlier discussion, this ignores forwarding of a store value to a store instruction. Also note that this mechanism works for slt instructions as well. Datapath before adding forwarding hardware Datapath after adding forwarding hardware
Elaboration on Forwarding Hardware FIGURE 6.33 The datapath modified to resolve hazards via forwarding. Compared with the datapath in Figure 6.30, the additions are the multiplexors to the inputs to the ALU. This figure is a more stylized drawing, however, leaving out details from the full datapath, such as the branch hardware and the sign extension hardware.
DM DM DM Reg Reg Reg Reg Reg Reg stall IM IM IM ALU ALU ALU stall sub $4,$1,$5 and $6,$7,$1 Review: One Way to “Fix” a Data Hazard Fix data hazard by waiting – stall – but impacts CPI add $1, I n s t r. O r d e r
DM DM DM DM DM Reg Reg Reg Reg Reg Reg Reg Reg Reg Reg IM IM IM IM IM ALU ALU ALU ALU ALU Review: Another Way to “Fix” a Data Hazard Fix data hazards by forwarding results as soon as they are available to where they are needed add $1, I n s t r. O r d e r sub $4,$1,$5 and $6,$7,$1 or $8,$1,$1 sw $4,4($1) Notice that for now we are showing the forwarded data coming out of the ALU. After looking at the problem more closely we will see that it really is supplied by the pipeline register EX/MEM and will depict it as such.
Data Forwarding Control Conditions Forwards the result from the previous instr. to either input of the ALU • EX/MEM hazard: if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10 if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 MEM/WB hazard: if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 Forwards the result from the second previous instr. to either input of the ALU
DM DM DM Reg Reg Reg Reg Reg Reg IM IM IM ALU ALU ALU Forwarding Illustration add $1, I n s t r. O r d e r sub $4,$1,$5 and $6,$7,$1 EX/MEM hazard forwarding MEM/WB hazard forwarding Notice that for now we are showing the forwarded data coming out of the ALU. After looking at the problem more closely we will see that it really is supplied by the pipeline register EX/MEM and will depict it as such.
DM DM DM Reg Reg Reg Reg Reg Reg IM IM IM ALU ALU ALU Yet Another Complication! • Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction – which should be forwarded? I n s t r. O r d e r add $1,$1,$2 add $1,$1,$3 add $1,$1,$4
DM DM DM Reg Reg Reg Reg Reg Reg IM IM IM ALU ALU ALU Yet Another Complication! • Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction – which should be forwarded? • Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded. I n s t r. O r d e r add $1,$1,$2 add $1,$1,$3 add $1,$1,$4 Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded.
Corrected Data Forwarding Control Conditions • MEM/WB hazard: if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRs) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (EX/MEM.RegisterRd != ID/EX.RegisterRt) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
EX/MEM.RegisterRd ID/EX.RegisterRt MEM/WB.RegisterRd ID/EX.RegisterRs Datapath with Forwarding Hardware PCSrc ID/EX EX/MEM Control IF/ID Add MEM/WB Branch Add 4 Shift left 2 Read Addr 1 Instruction Memory Data Memory Register File Read Data 1 Read Addr 2 Read Address PC Read Data Address Write Addr ALU Read Data 2 Write Data Write Data ALU cntrl 16 32 Sign Extend EX/MEM.RegWrite Forward Unit MEM/WB.RegWrite
Forwarding Hardware with Control Called forwarding unit, not hazard detection unit, because once data is forwarded there is no hazard! Datapath with forwarding hardware and control wires – certain details, e.g., branching hardware, are omitted to simplify the drawing Note: so far we have only handled forwarding to R-type instructions…!
Example • Consider the following code sequence in which the dependencies have been highlighted • sub $2, $1, $3 • and $4, $2, $5 • or $4, $4, $2 • add $9, $4, $2 • We’ll try to keep the example short. — We’ll skip the first two cycles, since they’re the same as before.
Example:Forwarding • Execution example: Clock cycle 3 sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Clock cycle 4
Example:Forwarding • Execution example (cont.): Clock cycle 5 sub $2, $1, $3 and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Clock cycle 6
DM DM Reg Reg Reg Reg IM IM ALU ALU Memory-to-Memory Copies • For loads immediately followed by stores (memory-to-memory copies) can avoid a stall by adding forwarding hardware from the MEM/WB register to the data memory input. • Would need to add a Forward Unit and a mux to the memory access stage I n s t r. O r d e r lw$1,4($2) sw $1,4($3) What if lw was replaced with add $1, - is forwarding still needed? From where, to where? What if $1 was used to compute the effective address (it would be a load-use data hazard and would require a stall insertion between the lw and sw)
Load-use Hazard Detection Unit • Forwarding is not the solution for all data hazard conditions, For ex. When an instruction tries to read a register following a “lw”, that writes the same register  Problem will occur • In clk 4 the correct value of reg 2 is not available at the beginning • Therefore, In addition to FU, we need hazard detection unit (HDU) is to be operated in during ID stage • HDU will insert stall between the “load” and its use
Data Hazards and Stalls • Load word can still cause a hazard: • an instruction tries to read a register following a load instruction that writes to the same register • therefore, we need a hazard detection unit to stall the pipeline after the load instruction lw $2, 20($1) and $4, $2, $5 or $8, $2, $6 add $9, $4, $2 Slt $1, $6, $7 As even a pipeline dependency goes backward in time forwarding will not solve the hazard
Stalling Resolves a Hazard lw $2, 20($1) and $4, $2, $5 or $8, $2, $6 add $9, $4, $2 Slt $1, $6, $7 • Same instruction sequence as before for which forwarding by itself could not resolve the hazard: • Hazard detection unit inserts a 1-cycle bubble in the pipeline, after which all pipeline register dependencies go forward so then the forwarding unit can handle them and there are no more hazards • AND instruction is turned into NOP all instructions beginning with AND instruction are delayed one cycle. Hazard forces the AND and OR instructions to repeat in clock cycle 4, what they did in clock cycle 3
DM DM DM DM DM Reg Reg Reg Reg Reg Reg Reg Reg Reg Reg IM IM IM IM IM IM ALU ALU ALU ALU ALU ALU and $6,$1,$7 or $8,$1,$9 xor $4,$1,$5 DM Reg Example:Forwarding with Load-use Data Hazards lw $1,4($2) I n s t r. O r d e r sub $4,$1,$5
DM DM DM DM Reg Reg Reg Reg Reg Reg Reg Reg IM IM IM IM IM IM ALU ALU ALU ALU ALU ALU and $6,$1,$7 and $6,$1,$7 or $8,$1,$9 or $8,$1,$9 xor $4,$1,$5 xor $4,$1,$5 DM Reg Forwarding with Load-use Data Hazards lw $1,4($2) I n s t r. O r d e r DM Reg Reg stall sub $4,$1,$5 sub $4,$1,$5 The one case where forwarding cannot save anything when an instruction tries to read a register following a load instruction that writes the same register.
Load-use Hazard Detection Unit • Hazard detection unit implements the following check if to stall if ( ID/EX.MemRead // if the instruction in the EX stage is a load… and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs ) // and the destination register or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) ) // matches either source register // of the instruction in the ID stage, then… stall the pipeline • Need a Hazard detection Unit in the ID stage that inserts a stall between the load and its use • The first line tests to see if the instruction now in the EX stage is a lw; the next two lines check to see if the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction) • After this one cycle stall, the forwarding logic can handle the remaining data hazards
Stall Hardware • Along with the Hazard Unit, we have to implement the stall • Prevent the instructions in the IF and ID stages from progressing down the pipeline – done by preventing the PC register and the IF/ID pipeline register from changing • Hazard detection Unit controls the writing of the PC (PC.write) and IF/ID (IF/ID.write) registers • Insert a “bubble” between the lw instruction (in the EX stage) and the load-use instruction (in the ID stage) (i.e., insert a noop in the execution stream) • Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 (noop). The Hazard Unit controls the mux that chooses between the real control values and the 0’s. • Let the lw instruction and the instructions after it in the pipeline (before it in the code) proceed normally down the pipeline
ID/EX.MemRead 0 ID/EX.RegisterRt Adding the Hazard Hardware PCSrc Hazard Unit ID/EX EX/MEM 0 IF/ID 1 Control Add MEM/WB Branch Add 4 Shift left 2 Read Addr 1 Instruction Memory Data Memory Register File Read Data 1 Read Addr 2 Read Address PC Read Data Address Write Addr ALU Read Data 2 Write Data Write Data ALU cntrl 16 32 Sign Extend Forward Unit
Mechanics of Stalling • If the check to stall verifies, then the pipeline needs to stall only 1 clock cycle after the load as after that the forwarding unit can resolve the dependency • What the hardware does to stall the pipeline 1 cycle: • does not let the IF/ID register change (disable write!) – this will cause the instruction in the ID stage to repeat, i.e., stall • therefore, the instruction, just behind, in the IF stage must be stalled as well – so hardware does not let the PC change (disable write!) – this will cause the instruction in the IF stage to repeat, i.e., stall • changes all the EX, MEM and WB control fields in the ID/EX pipeline register to 0, so effectively the instruction just behind the load becomes a nop – a bubble is said to have been inserted into the pipeline • note that we cannot turn that instruction into an nop by 0ing all the bits in the instruction itself – recall nop = 00…0 (32 bits) – because it has already been decoded and control signals generated
Pipelined Datapath with Control II (as before) Control signals emanate from the control portions of the pipeline registers
Hazard Detection Unit + Forwarding Unit Datapath with forwarding hardware, the hazard detection unit and controls wires – certain details, e.g., branching hardware are omitted to simplify the drawing
lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Stalling • Execution example:
lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Stalling • Execution example:
lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Stalling • Execution example:
lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Stalling • Execution example:
lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Stalling • Execution example:
lw $2, 20($1) and $4, $2, $5 or $4, $4, $2 add $9, $4, $2 Stalling • Execution example: