CMPE 421 Advanced Parallel Computer Architecture

Pipeline Datapath and Control

A Pipeline Datapath

Revised: Single Cycle Datapath

Instruction Pipelining
• Each instruction is divided into five pipelined stages, so up to five instructions are in execution during any single clock cycle. For this reason we must separate the datapath into five pieces:
• Instruction fetch (IF)
• Instruction decode / register fetch (ID)
• ALU operation (EX)
• Data memory access (MEM)
• Write result into register (WB)
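The overlap of the five stages can be sketched with a small schedule generator. This is a minimal illustration only: it assumes an ideal pipeline with no stalls or hazards, and the function name `pipeline_schedule` and the sample program are hypothetical, not taken from the slides.

```python
# Stage names follow the five pieces of the datapath listed above.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(instructions):
    """Return {cycle: {stage: instruction}} for an ideal pipeline (no stalls)."""
    schedule = {}
    for i, instr in enumerate(instructions):
        for s, stage in enumerate(STAGES):
            cycle = i + s + 1  # instruction i enters IF in cycle i + 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule

# Hypothetical six-instruction program, used only to fill the pipeline.
program = ["lw", "sw", "add", "sub", "and", "or"]
schedule = pipeline_schedule(program)

for cycle in sorted(schedule):
    occupied = schedule[cycle]
    row = ", ".join(f"{st}={occupied[st]}" for st in STAGES if st in occupied)
    print(f"cycle {cycle}: {row}")
```

Once the pipeline has filled (cycle 5 in this sketch), every stage holds a different instruction, which is why each stage needs its own piece of the datapath.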

Instruction Pipelining
• Instructions and data move from left to right through these five stages as they complete execution.
• However, there are two exceptions to this left-to-right flow of instructions:
• The write-back stage, in which the result is written back into the register file
• The selection of the next value of the PC, choosing between the incremented PC and the branch address

LW instruction for pipelined Datapath
• To maintain proper time order, this stylized datapath breaks the register file into two logical parts: registers read during register fetch (ID) and registers written during write back (WB). This dual use is represented by drawing the unshaded left half of the register file using dashed lines in the ID stage, when it is not being written, and the unshaded right half in dashed lines in the WB stage, when it is not being read. As before, we assume the register file is written in the first half of the clock cycle and read during the second half.

LW instruction for pipelined Datapath
• The pipeline registers, in color, separate each pipeline stage.
• Each pipeline register passes along any information needed in the next pipe stage.
• They are labeled by the stages that they separate; for example, the first is labeled IF/ID because it separates the instruction fetch and instruction decode stages. The registers must be wide enough to store all the data corresponding to the lines that go through them. For example, the IF/ID register must be 64 bits wide, because it must hold both the 32-bit instruction fetched from memory and the incremented 32-bit PC address.
• We will expand these registers over the course of this chapter, but for now the other three pipeline registers contain 128, 97, and 64 bits, respectively.
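The widths quoted above can be checked by adding up the values that cross each register. The slides give only the totals and the IF/ID breakdown; the field lists for ID/EX, EX/MEM, and MEM/WB below are an assumption based on the usual textbook version of this datapath, before control bits and the write register number are added.

```python
# Assumed fields (in bits) latched by each pipeline register.
PIPELINE_REGISTER_FIELDS = {
    "IF/ID": {"instruction": 32, "incremented PC": 32},
    "ID/EX": {
        "incremented PC": 32,
        "read data 1": 32,
        "read data 2": 32,
        "sign-extended immediate": 32,
    },
    "EX/MEM": {
        "branch target address": 32,
        "ALU zero flag": 1,
        "ALU result": 32,
        "read data 2 (store data)": 32,
    },
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}

# Total width of each register is the sum of its field widths.
widths = {name: sum(fields.values())
          for name, fields in PIPELINE_REGISTER_FIELDS.items()}

for name, bits in widths.items():
    print(f"{name}: {bits} bits")
```

Under these assumptions the sums come out to 64, 128, 97, and 64 bits, matching the figures on the slide.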

Instruction Fetch

Instruction Decode

Execution stage

Memory stage
Data memory is read using the address in the EX/MEM pipeline register, and the data is placed in the MEM/WB pipeline register. Next, in the write-back stage, data is read from the MEM/WB pipeline register and written into the register file in the middle of the datapath.

A Bug!
• When the value read from memory is written back to the register file, the write register number at the register file's inputs comes from a different instruction, the one currently being decoded.
• To fix the bug, we need to save the part of the lw instruction that names the destination: the 5 bits that specify which register should get the value from memory.

The corrected pipeline datapath to properly handle the load instruction
The write register number now comes from the MEM/WB pipeline register along with the data. The register number is passed from the ID pipe stage until it reaches the MEM/WB pipeline register, adding five more bits to the last three pipeline registers.
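The bug and its fix can be illustrated with a toy model that tracks only the destination register number. This is a sketch under simplifying assumptions: every instruction in the hypothetical program writes one register, there are no stalls, and the function name `write_back_targets` is invented for illustration.

```python
def write_back_targets(program, carry_write_reg):
    """program: list of (name, destination register number).

    Returns (name, register actually written) for each instruction reaching WB.
    With carry_write_reg=False the register number is taken from the
    instruction currently in ID (the buggy datapath); with True it travels
    with the instruction through ID/EX, EX/MEM, and MEM/WB (the fix).
    """
    writes = []
    for cycle in range(len(program) + 4):
        wb_index = cycle - 4  # instruction in WB this cycle
        id_index = cycle - 1  # instruction in ID this cycle
        if not 0 <= wb_index < len(program):
            continue
        name, dest = program[wb_index]
        if carry_write_reg:
            target = dest
        elif id_index < len(program):
            target = program[id_index][1]  # register number of a later instruction
        else:
            target = None  # nothing valid is being decoded
        writes.append((name, target))
    return writes

# Hypothetical program: lw targets register 10.
program = [("lw", 10), ("sub", 11), ("add", 12), ("and", 13), ("or", 14)]
buggy = write_back_targets(program, carry_write_reg=False)
fixed = write_back_targets(program, carry_write_reg=True)
print("buggy:", buggy[0])
print("fixed:", fixed[0])
```

In the buggy model the loaded value lands in the destination of the instruction three positions behind the lw, because that instruction is in ID when the lw reaches WB.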

Five pipe stages of the store instruction

Store Datapath: Stage 3

Pipeline Control
• Just as control was added to the single-cycle and multi-cycle implementations, we must add it to the pipelined processor.
• Unlike the single-cycle and multi-cycle designs, no single instruction determines how all the control signals are set in a given cycle, because several instructions occupy the pipeline at once.
• Pipelining the datapath leaves the meaning of the control lines unchanged.
• Control signals are pipelined too (grouped by stage).
• The control unit is combinational again.

Review: Single Cycle Control

Implementing Pipeline Control
• Use a Main Control unit to generate signals during the RF/ID stage
• Control signals for EX (ExtOp, ALUSrc, …) are used 1 cycle later
• Control signals for Mem (MemWr, Branch) are used 2 cycles later
• Control signals for WB (MemtoReg, RegWr) are used 3 cycles later
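The 1, 2, and 3 cycle delays follow from carrying the control word through the pipeline registers and dropping each group once it has been used. The sketch below is illustrative only: the function name `consume_cycles` is hypothetical, the signal values are placeholders for a load, and `RegWr` is assumed to be the register-file write enable used in the WB stage.

```python
def consume_cycles(control_word, decode_cycle):
    """Follow one instruction's control word through the pipeline registers.

    control_word maps group name ("EX", "M", "WB") to that group's signals.
    Returns {group: cycle in which the group's signals drive the datapath}.
    """
    used = {}

    id_ex = dict(control_word)         # all three groups latched into ID/EX
    used["EX"] = decode_cycle + 1      # EX group consumed in the execute stage

    ex_mem = {g: v for g, v in id_ex.items() if g != "EX"}  # EX group dropped
    used["M"] = decode_cycle + 2       # M group consumed in the memory stage

    mem_wb = {g: v for g, v in ex_mem.items() if g != "M"}  # M group dropped
    used["WB"] = decode_cycle + 3      # WB group consumed in write back

    # Only the WB group is still being carried in MEM/WB.
    assert set(mem_wb) == {"WB"}
    return used

# Placeholder control word for a load decoded in cycle 2.
lw_control = {
    "EX": {"ExtOp": 1, "ALUSrc": 1},
    "M": {"MemWr": 0, "Branch": 0},
    "WB": {"MemtoReg": 1, "RegWr": 1},
}
timing = consume_cycles(lw_control, decode_cycle=2)
print(timing)
```

Each pipeline register therefore needs to hold only the groups that have not yet been used, so the control portion shrinks from ID/EX to MEM/WB.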

Assumptions for pipelining the control signals
• Initial design: motivated by the single-cycle datapath control, use the same control signals
• Observe:
• No separate write signal for the PC, as it is written every cycle
• No separate write signals for the pipeline registers (IF/ID, ID/EX, EX/MEM, MEM/WB), as they are written every cycle
• No separate read signal for instruction memory, as it is read every clock cycle
• No separate read signal for the register file, as it is read every clock cycle
• Need to set control signals during each pipeline stage
• Since control signals are associated with components active during a single pipeline stage, the control lines can be grouped into five groups according to pipeline stage
• This initial design will be modified by the hazard detection unit
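The grouping by stage can be written out as a small table. The slides do not list the values, so the settings below are an assumption taken from the standard textbook control table for this datapath, using the signal names that appear in the datapath figure (RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite) plus RegWrite and MemtoReg; `None` marks a don't-care.

```python
# The IF and ID groups are empty: the PC, instruction memory, and register
# file reads need no control signals under the assumptions listed above.
CONTROL = {
    "R-format": {
        "IF": {}, "ID": {},
        "EX": {"RegDst": 1, "ALUOp": "10", "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "IF": {}, "ID": {},
        "EX": {"RegDst": 0, "ALUOp": "00", "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "IF": {}, "ID": {},
        "EX": {"RegDst": None, "ALUOp": "00", "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
    "beq": {
        "IF": {}, "ID": {},
        "EX": {"RegDst": None, "ALUOp": "01", "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
}

def signals_for(instruction, stage):
    """Look up the control signals one instruction needs in one stage."""
    return CONTROL[instruction][stage]

print(signals_for("lw", "WB"))
print(signals_for("sw", "MEM"))
```

Only the EX, MEM, and WB groups carry values, so those are the groups that have to be stored in the pipeline registers.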

Implementing Pipeline Control

Putting it All Together
[Figure: pipelined datapath with control. The Control unit's outputs are carried as EX, M, and WB groups through the ID/EX, EX/MEM, and MEM/WB pipeline registers. Labeled signals include PCSrc, Branch, RegDst, ALUSrc, ALUOp, MemRead, and MemWrite.]

Comparison
[Figure: timing diagrams for Load, Store, and R-type instructions. Single Cycle Implementation: Load and Store each take one long clock cycle, with wasted time in the Store cycle. Multiple Cycle Implementation: Load, Store, and R-type run back to back across cycles 1 to 10. Pipeline Implementation: Load, Store, and R-type overlap, each passing through IF, Reg, EX, MEM, and WB.]
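A rough count makes the comparison concrete. The numbers below are assumptions, not values from the slides: each stage is taken to last one time unit, the single-cycle clock is sized for the five-stage load, and the multi-cycle step counts (5 for lw, 4 for sw and R-type) are the usual textbook ones.

```python
STAGES_PER_INSTRUCTION = 5                         # IF, Reg, EX, MEM, WB
MULTICYCLE_STEPS = {"lw": 5, "sw": 4, "rtype": 4}  # assumed steps per instruction

def single_cycle_units(program):
    """Every instruction takes one long cycle sized for the slowest (load)."""
    return len(program) * STAGES_PER_INSTRUCTION

def multi_cycle_units(program):
    """Each instruction takes only the short cycles it needs, one after another."""
    return sum(MULTICYCLE_STEPS[op] for op in program)

def pipelined_units(program):
    """Ideal pipeline: fill for five cycles, then finish one instruction per cycle."""
    return STAGES_PER_INSTRUCTION + len(program) - 1 if program else 0

program = ["lw", "sw", "rtype"]  # the sequence shown in the comparison figure
print("single cycle:", single_cycle_units(program))
print("multi cycle: ", multi_cycle_units(program))
print("pipelined:   ", pipelined_units(program))
```

Under these assumptions the three-instruction sequence takes 15, 13, and 7 time units respectively, and the pipeline's advantage grows as the program gets longer.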

Got it?
