Lec 9: Pipelining

Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University

Basic Pipelining Five stage “RISC” load-store architecture • Instruction fetch (IF) • get instruction from memory • Instruction Decode (ID) • translate opcode into control signals and read regs • Execute (EX) • perform ALU operation • Memory (MEM) • Access memory if load/store • Writeback (WB) • update register file Following slides thanks to Sally McKee

Sample Code (Simple) • Assume eight-register machine • Run the following code on a pipelined datapath add 3 1 2 ; reg 3 = reg 1 + reg 2 nand 6 4 5 ; reg 6 = ~(reg 4 & reg 5) lw 4 20 (2) ; reg 4 = Mem[reg2+20] add 5 2 5 ; reg 5 = reg 2 + reg 5 sw 7 12(3) ; Mem[reg3+12] = reg 7 Slides thanks to Sally McKee

+ A L U M U X 1 target PC+1 PC+1 0 R0 R1 regA ALU result R2 Register file regB valA M U X PC Inst mem Data mem instruction R3 ALU result mdata R4 valB R5 R6 M U X data R7 offset dest valB Bits 0-2 dest dest dest Bits 15-17 M U X Bits 21-23 op op op IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

Time Graphs Time: 1 2 3 4 5 6 7 8 9 add nand lw add sw fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback Slides thanks to Sally McKee

Pipelining Recap • Powerful technique for masking latencies • Logically, instructions execute one at a time • Physically, instructions execute in parallel • Instruction level parallelism • Decouples the processor model from the implementation • Interface vs. implementation • BUT dependencies between instructions complicate the implementation

What Can Go Wrong? • Data hazards • register reads occur in stage 2 • register writes occur in stage 5 • could read the wrong value if is about to be written • Control hazards • branch instruction may change the PC in stage 4 • what do we fetch before that?

Handling Data Hazards • Detect and Stall • If hazards exist, stall the processor until they go away • Safe, but not great for performance • Detect and Forward • If hazards exist, fix up the pipeline to get the correct value (if possible) • Most common solution for high performance

Handling Data Hazards III: • Detect • dest of early instrs == source of current • Forward: • New bypass datapaths route computed data to where it is needed • New MUX and control to pick the right data • Beware: Stalling may still be required even in the presence of forwarding

Different type of Hazards • Use register value in subsequent instruction add 3 1 2 nand 5 3 4 or 6 3 1 sub 6 3 1 Slides thanks to Sally McKee

Data Hazards: AL ops add 3 1 2 nand 5 3 4 time add fetch decode execute memory writeback nand fetch decode execute memory writeback If not careful, you read the wrong value of R3

Handling Data Hazards: Forwarding • No point forwarding to decode • Forward to EX stage • From output of ALU and MEM stages

Data Hazards: AL ops add 3 1 2 nand 5 3 4 time add fetch decode execute memory writeback nand fetch decodeexecute memory writeback

Data Hazards: AL ops add 3 1 2 and 8 0 2 nand 5 3 4 time add fetch decode execute memory writeback add fetch decode execute memory writeback nand fetch decode execute memory writeback

DM DM DM Reg Reg Reg Reg Reg Reg IM IM IM IM ALU ALU ALU ALU Forwarding Illustration add $3 I n s t r. O r d e r add DM Reg Reg nand $5 $3

Data Hazards: AL ops add 3 1 2 and 8 0 2 and 8 2 0 nand 5 3 4 time add fetch decode execute memory writeback add fetch decode execute memory writeback add fetch decode execute memory writeback nand fetch decode execute memory writeback Write in first half of cycle, read in second half of cycle

Data Hazards: lw lw 3 12(zero) and 8 0 2 nand 5 3 4 time lw fetch decode execute memory writeback add fetch decode execute memory writeback nand fetch decode execute memory writeback

Data Hazards: lw lw 3 12(zero) and 8 0 2 nand 5 3 4 time lw fetch decode execute memorywriteback add fetch decode execute memory writeback nand fetch decodeexecute memory writeback

Data Hazards: lw lw 3 12(zero) and 8 0 2 nand 5 3 4 time lw fetch decode execute memorywriteback nand fetch decodeexecute memory writeback

DM DM DM Reg Reg Reg Reg Reg Reg IM IM IM ALU ALU ALU A tricky case add 1 1 2 add 11 3 add 1 1 4 I n s t r. O r d e r add 1 1 2 add 11 3 add 1 1 4

Another tricky case add 0 4 1 add 30 4

Handling Data Hazards: Summary • Forward: • New bypass datapaths route computed data to where it is needed • Beware: Stalling may still be required even in the presence of forwarding • For lw • MIPS 2000/3000: a delay slot • MIPS 4000 onwards: stall • But really rely on compiler to treat it like delay slot

Detection • Detection: • Compare regA with previous DestRegs • Compare regB with previous DestRegs • regA and regB going to ALU • Previous destregs • Previously in ALU, Memory and Writeback

+ A L U M U X 1 target PC+1 PC+1 0 R0 R1 regA ALU result R2 Inst mem Register file regB valA M U X PC Data mem instruction R3 ALU result mdata R4 M U X valB R5 R6 M U X data R7 offset dest valB dest dest dest op op op IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

Sample Code Which data hazards do you see? add 3 1 2 nand 5 3 4 add 7 6 3 lw 6 24(3) sw 6 12(2) Slides thanks to Sally McKee

First half of cycle 3 + Hazard A L U fwd fwd fwd M U X 1 2 1 0 R0 3 14 R1 regA 7 R2 Inst mem Register file regB 14 M U X PC Data mem nand 5 3 4 10 R3 3 11 R4 M U X 77 7 R5 data 1 R6 M U X 8 R7 add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

End of cycle 3 + A L U H1 M U X 1 3 2 0 R0 14 R1 regA 7 R2 Inst mem Register file regB 10 M U X PC Data mem add 7 6 3 10 R3 3 21 11 R4 5 M U X 77 11 R5 data 1 R6 M U X 8 R7 nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

First half of cycle 4 + New Hazard A L U H1 M U X 1 3 2 0 R0 21 14 R1 regA M U X 3 7 R2 Inst mem Register file regB 10 M U X PC Data mem add 7 6 3 10 R3 3 21 11 11 R4 5 M U X 77 11 R5 data 1 R6 M U X 8 R7 nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

End of cycle 4 + A L U H2 H1 M U X 1 4 3 0 R0 14 R1 regA 21 M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem lw 6 24(3) 10 R3 -2 11 R4 7 5 3 M U X 77 10 R5 data 1 R6 M U X 8 R7 add nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ 1 21 A L U H2 H1 First half of cycle 5 M U X 1 4 3 0 R0 3 14 R1 regA 21 M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem lw 6 24(3) 10 R3 -2 11 R4 7 5 3 M U X 77 10 R5 data 1 R6 M U X 8 R7 add nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ A L U H2 H1 End of cycle 5 M U X 1 5 4 0 R0 14 R1 regA -2 M U X 7 R2 Inst mem Register file regB 21 M U X PC Data mem sw 6 12(2) 21 R3 6 22 11 R4 7 5 M U X 77 R5 data 1 R6 M U X 8 R7 24 lw add nand IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

en + en A L U H2 H1 First half of cycle 6 M U X 1 5 4 Hazard 0 R0 6 14 R1 regA -2 M U X 7 R2 Inst mem Register file regB 21 M U X PC Data mem sw 6 12(2) 21 R3 22 11 R4 6 7 5 M U X 77 R5 L 1 R6 M U X data 8 R7 24 lw add nand IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ A L U nop H2 End of cycle 6 M U X 1 5 0 R0 14 R1 regA 22 M U X 7 R2 Inst mem Register file regB M U X PC Data mem sw 6 12(2) 21 R3 31 11 R4 6 7 M U X -2 R5 data 1 R6 M U X 8 R7 lw add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ A L U H2 First half of cycle 7 M U X 1 5 Hazard 0 R0 6 14 R1 regA 22 M U X 7 R2 Inst mem Register file regB M U X PC Data mem sw 6 12(2) 21 R3 31 11 R4 6 7 M U X -2 R5 data 1 R6 M U X 8 R7 nop lw add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ A L U H3 End of cycle 7 M U X 1 5 0 R0 14 R1 regA M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem 21 R3 99 11 R4 6 6 M U X -2 7 R5 data 1 R6 M U X 22 R7 12 sw nop lw IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ 99 12 A L U H3 First half of cycle 8 M U X 1 5 0 R0 14 R1 regA M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem 21 R3 99 11 R4 6 M U X -2 7 R5 data 99 R6 M U X 8 R7 12 sw nop lw IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

+ A L U H3 End of cycle 8 M U X 1 5 0 R0 14 R1 regA M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem 21 R3 111 11 R4 6 M U X -2 7 R5 data 99 R6 M U X 8 R7 12 sw nop IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

Control Hazards beqz 3 goToZero nand 5 1 4 add 6 7 8 time beqz fetch decode execute memory writeback nand fetch decode execute memory writeback

Control Hazards for 4-stage pipeline beqz 3 goToZero nand 5 1 4 add 6 7 8 time beqz F/D execute memory writeback nand F/D execute memory writeback

Control Hazards • Stall • Inject NOPs into the pipeline when the next instruction is not known • Pros: simple, clean; Cons: slow • Delay Slots • Tell the programmer that the N instructions after a jump will always be executed, no matter what the outcome of the branch • Pros: The compiler may be able to fill the slots with useful instructions; Cons: breaks abstraction boundary • Speculative Execution • Insert instructions into the pipeline • Replace instructions with NOPs if the branch comes out opposite of what the processor expected • Pros: Clean model, fast; Cons: complex

Control Hazards for 4-stage pipeline beqz 3 goToZero nand 5 1 4 add 6 7 8 goToZero: lui 2 1 time beqz F/D execute memory writeback nand Delay slot! F/D execute memory writeback lui F/D execute memory writeback

Hazard summary • Data hazards • Forward • Stall for lw/sw • Control hazard • Delay slot

Pipelining for Other Circuits • Pipelining can be applied to any combinational logic circuit • What’s the circuit latency at 2ns. per gate? • What’s the throughput? • What is the stage breakdown? Where would you place latches? • What’s the throughput after pipelining?

Lec 9: Pipelining

Lec 9: Pipelining

Presentation Transcript

PIPELINING

Introduction to Processor Architecture

MIPS Pipelining

EECC551 Exam Review

CMPUT680 - Winter 2006

Part 9 Instruction Level Parallelism (ILP) - Concurrency

Lecture 3: Dynamic ILP

Amélioration des performances

Part 8 Instruction Level Parallelism (ILP) - Pipelining

September 17, 2004 Prof. Seok-Bum Ko Electrical Engineering University of Saskatchewan

并行计算

CSE 502: Computer Architecture

Pipelined Processor Design

Chapter 2 Pipelined Processors

CS136, Advanced Architecture

CS5365

Instruction Set Architectures

CS4100: 計算機結構 Pipelining