ECE369 Pipelining

ECE369Pipelining

addm (rs), rt # Memory[R[rs]] = R[rt] + Memory[R[rs]]; Assume that we can read and write the memory in the same cycle (like the register file, but this is likely not efficient to do in a real machine). All instructions use the same format (shown below), but not all instructions use all of the fields. Assume that each unused field is set to 0.

Pipelining One CPU manufacturer has proposed the 10-stage pipeline shown below. Here are the correspondences between this and the MIPS pipeline: • Instructions are fetched in the FET stage. • Register reading is performed in the REG stage. • ALU operations and memory accesses are both done in the EXE stage. • Branches are resolved in the DET stage. • WRB is the writeback stage. • Write and Read on Memory or Register File can occur in the same cycle Without forwarding, how many stall cycles are needed for the following code? Show your work to get credit. lw $t0, 0($a0) add $v1, $t0, $t0

Solution

Loop: LW R1, 0(R2) ADDI R1, R1,#1 SW R1, 0(R2) ADDI R2, R2, #4 SUBR4, R3, R2 BNEZ R4, Loop Assume that the initial value of R3 is R2+396, How many cycles does this loop take to execute? -no forwarding or bypassing hardware. -all memory and register writes occur during the first half and reads occur during the second half of the clock cycle. (a register read and a register write in the same cycle forwards through the register file). -branching is handled by flushing the pipeline and branches are resolved in Memory stage.

branches are resolved in MEM. Second iterations starts 17 clock cycles after the first instructions. Last iterations takes 18 cycles. Loop executes 99 times. => 98*17+18=1684cycles.

Loop: LW R1, 0(R2) ADDI R1, R1,#1 SW R1, 0(R2) ADDI R2, R2, #4 SUBR4, R3, R2 BNEZ R4, Loop Assume that the initial value of R3 is R2+396, How many cycles does this loop take to execute? -with forwarding and bypassing hardware. -all memory and register writes occur during the first half and reads occur during the second half of the clock cycle. (a register read and a register write in the same cycle forwards through the register file). -Assume that branch is resolved in Memory stage and handled by predicting it as not taken. {Use (m) for branch mis-prediction in the table}

branches are resolved in MEM. Second iterations starts 10 clock cycles after the first instructions. Last iterations takes 11 cycles. Loop executes 99 times. => 98*10+11=991cycles.

Loop: LW R1, 0(R2) ADDI R1, R1,#1 SW R1, 0(R2) ADDI R2, R2, #4 SUBR4, R3, R2 BNEZ R4, Loop Assume that the initial value of R3 is R2+396, How many cycles does this loop take to execute? • Assuming the MIPS pipeline with a single cycle delayed branch and normal forwarding and bypassing hardware, • Schedule the instructions in the loop including the branch delay slot. • You may reorder the instructions and modify the individual instruction operands, but do not undertake other loop transformations that change the number or opcode of the instructions in the loop. • Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

Loop: LW R1, 0(R2) ADDI R1, R1,#1 SW R1, 0(R2) ADDI R2, R2, #4 SUBR4, R3, R2 BNEZ R4, Loop =98*6+10=598 clocks

ECE369 Pipelining

ECE369 Pipelining

Presentation Transcript

PIPELINING

Pipelining

Pipelining

ECE369 Chapter 2

ECE369 Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining

Pipelining