1 / 46

Lec 9: Pipelining

Lec 9: Pipelining. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Basic Pipelining. Five stage “RISC” load-store architecture Instruction fetch (IF) get instruction from memory Instruction Decode (ID) translate opcode into control signals and read regs Execute (EX)

mara-poole
Download Presentation

Lec 9: Pipelining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lec 9: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University

  2. Basic Pipelining Five stage “RISC” load-store architecture • Instruction fetch (IF) • get instruction from memory • Instruction Decode (ID) • translate opcode into control signals and read regs • Execute (EX) • perform ALU operation • Memory (MEM) • Access memory if load/store • Writeback (WB) • update register file Following slides thanks to Sally McKee

  3. Sample Code (Simple) • Assume eight-register machine • Run the following code on a pipelined datapath add 3 1 2 ; reg 3 = reg 1 + reg 2 nand 6 4 5 ; reg 6 = ~(reg 4 & reg 5) lw 4 20 (2) ; reg 4 = Mem[reg2+20] add 5 2 5 ; reg 5 = reg 2 + reg 5 sw 7 12(3) ; Mem[reg3+12] = reg 7 Slides thanks to Sally McKee

  4. + A L U M U X 1 target PC+1 PC+1 0 R0 R1 regA ALU result R2 Register file regB valA M U X PC Inst mem Data mem instruction R3 ALU result mdata R4 valB R5 R6 M U X data R7 offset dest valB Bits 0-2 dest dest dest Bits 15-17 M U X Bits 21-23 op op op IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  5. Time Graphs Time: 1 2 3 4 5 6 7 8 9 add nand lw add sw fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback Slides thanks to Sally McKee

  6. Pipelining Recap • Powerful technique for masking latencies • Logically, instructions execute one at a time • Physically, instructions execute in parallel • Instruction level parallelism • Decouples the processor model from the implementation • Interface vs. implementation • BUT dependencies between instructions complicate the implementation

  7. What Can Go Wrong? • Data hazards • register reads occur in stage 2 • register writes occur in stage 5 • could read the wrong value if is about to be written • Control hazards • branch instruction may change the PC in stage 4 • what do we fetch before that?

  8. Handling Data Hazards • Detect and Stall • If hazards exist, stall the processor until they go away • Safe, but not great for performance • Detect and Forward • If hazards exist, fix up the pipeline to get the correct value (if possible) • Most common solution for high performance

  9. Handling Data Hazards III: • Detect • dest of early instrs == source of current • Forward: • New bypass datapaths route computed data to where it is needed • New MUX and control to pick the right data • Beware: Stalling may still be required even in the presence of forwarding

  10. Different type of Hazards • Use register value in subsequent instruction add 3 1 2 nand 5 3 4 or 6 3 1 sub 6 3 1 Slides thanks to Sally McKee

  11. Data Hazards: AL ops add 3 1 2 nand 5 3 4 time add fetch decode execute memory writeback nand fetch decode execute memory writeback If not careful, you read the wrong value of R3

  12. Data Hazards: AL ops add 3 1 2 nand 5 3 4 time add fetch decode execute memory writeback nand fetch decode execute memory writeback If not careful, you read the wrong value of R3

  13. Data Hazards: AL ops add 3 1 2 nand 5 3 4 time add fetch decode execute memory writeback nand fetch decode execute memory writeback If not careful, you read the wrong value of R3

  14. Handling Data Hazards: Forwarding • No point forwarding to decode • Forward to EX stage • From output of ALU and MEM stages

  15. Data Hazards: AL ops add 3 1 2 nand 5 3 4 time add fetch decode execute memory writeback nand fetch decodeexecute memory writeback

  16. Data Hazards: AL ops add 3 1 2 and 8 0 2 nand 5 3 4 time add fetch decode execute memory writeback add fetch decode execute memory writeback nand fetch decode execute memory writeback

  17. DM DM DM Reg Reg Reg Reg Reg Reg IM IM IM IM ALU ALU ALU ALU Forwarding Illustration add $3 I n s t r. O r d e r add DM Reg Reg nand $5 $3

  18. Data Hazards: AL ops add 3 1 2 and 8 0 2 and 8 2 0 nand 5 3 4 time add fetch decode execute memory writeback add fetch decode execute memory writeback add fetch decode execute memory writeback nand fetch decode execute memory writeback Write in first half of cycle, read in second half of cycle

  19. Data Hazards: lw lw 3 12(zero) and 8 0 2 nand 5 3 4 time lw fetch decode execute memory writeback add fetch decode execute memory writeback nand fetch decode execute memory writeback

  20. Data Hazards: lw lw 3 12(zero) and 8 0 2 nand 5 3 4 time lw fetch decode execute memorywriteback add fetch decode execute memory writeback nand fetch decodeexecute memory writeback

  21. Data Hazards: lw lw 3 12(zero) and 8 0 2 nand 5 3 4 time lw fetch decode execute memorywriteback nand fetch decodeexecute memory writeback

  22. DM DM DM Reg Reg Reg Reg Reg Reg IM IM IM ALU ALU ALU A tricky case add 1 1 2 add 11 3 add 1 1 4 I n s t r. O r d e r add 1 1 2 add 11 3 add 1 1 4

  23. Another tricky case add 0 4 1 add 30 4

  24. Handling Data Hazards: Summary • Forward: • New bypass datapaths route computed data to where it is needed • Beware: Stalling may still be required even in the presence of forwarding • For lw • MIPS 2000/3000: a delay slot • MIPS 4000 onwards: stall • But really rely on compiler to treat it like delay slot

  25. Detection • Detection: • Compare regA with previous DestRegs • Compare regB with previous DestRegs • regA and regB going to ALU • Previous destregs • Previously in ALU, Memory and Writeback

  26. + A L U M U X 1 target PC+1 PC+1 0 R0 R1 regA ALU result R2 Inst mem Register file regB valA M U X PC Data mem instruction R3 ALU result mdata R4 M U X valB R5 R6 M U X data R7 offset dest valB dest dest dest op op op IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  27. Sample Code Which data hazards do you see? add 3 1 2 nand 5 3 4 add 7 6 3 lw 6 24(3) sw 6 12(2) Slides thanks to Sally McKee

  28. Sample Code Which data hazards do you see? add 3 1 2 nand 5 3 4 add 7 6 3 lw 6 24(3) sw 6 12(2) Slides thanks to Sally McKee

  29. First half of cycle 3 + Hazard A L U fwd fwd fwd M U X 1 2 1 0 R0 3 14 R1 regA 7 R2 Inst mem Register file regB 14 M U X PC Data mem nand 5 3 4 10 R3 3 11 R4 M U X 77 7 R5 data 1 R6 M U X 8 R7 add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  30. End of cycle 3 + A L U H1 M U X 1 3 2 0 R0 14 R1 regA 7 R2 Inst mem Register file regB 10 M U X PC Data mem add 7 6 3 10 R3 3 21 11 R4 5 M U X 77 11 R5 data 1 R6 M U X 8 R7 nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  31. First half of cycle 4 + New Hazard A L U H1 M U X 1 3 2 0 R0 21 14 R1 regA M U X 3 7 R2 Inst mem Register file regB 10 M U X PC Data mem add 7 6 3 10 R3 3 21 11 11 R4 5 M U X 77 11 R5 data 1 R6 M U X 8 R7 nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  32. End of cycle 4 + A L U H2 H1 M U X 1 4 3 0 R0 14 R1 regA 21 M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem lw 6 24(3) 10 R3 -2 11 R4 7 5 3 M U X 77 10 R5 data 1 R6 M U X 8 R7 add nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  33. + 1 21 A L U H2 H1 First half of cycle 5 M U X 1 4 3 0 R0 3 14 R1 regA 21 M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem lw 6 24(3) 10 R3 -2 11 R4 7 5 3 M U X 77 10 R5 data 1 R6 M U X 8 R7 add nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  34. + A L U H2 H1 End of cycle 5 M U X 1 5 4 0 R0 14 R1 regA -2 M U X 7 R2 Inst mem Register file regB 21 M U X PC Data mem sw 6 12(2) 21 R3 6 22 11 R4 7 5 M U X 77 R5 data 1 R6 M U X 8 R7 24 lw add nand IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  35. en + en A L U H2 H1 First half of cycle 6 M U X 1 5 4 Hazard 0 R0 6 14 R1 regA -2 M U X 7 R2 Inst mem Register file regB 21 M U X PC Data mem sw 6 12(2) 21 R3 22 11 R4 6 7 5 M U X 77 R5 L 1 R6 M U X data 8 R7 24 lw add nand IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  36. + A L U nop H2 End of cycle 6 M U X 1 5 0 R0 14 R1 regA 22 M U X 7 R2 Inst mem Register file regB M U X PC Data mem sw 6 12(2) 21 R3 31 11 R4 6 7 M U X -2 R5 data 1 R6 M U X 8 R7 lw add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  37. + A L U H2 First half of cycle 7 M U X 1 5 Hazard 0 R0 6 14 R1 regA 22 M U X 7 R2 Inst mem Register file regB M U X PC Data mem sw 6 12(2) 21 R3 31 11 R4 6 7 M U X -2 R5 data 1 R6 M U X 8 R7 nop lw add IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  38. + A L U H3 End of cycle 7 M U X 1 5 0 R0 14 R1 regA M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem 21 R3 99 11 R4 6 6 M U X -2 7 R5 data 1 R6 M U X 22 R7 12 sw nop lw IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  39. + 99 12 A L U H3 First half of cycle 8 M U X 1 5 0 R0 14 R1 regA M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem 21 R3 99 11 R4 6 M U X -2 7 R5 data 99 R6 M U X 8 R7 12 sw nop lw IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  40. + A L U H3 End of cycle 8 M U X 1 5 0 R0 14 R1 regA M U X 7 R2 Inst mem Register file regB 1 M U X PC Data mem 21 R3 111 11 R4 6 M U X -2 7 R5 data 99 R6 M U X 8 R7 12 sw nop IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  41. Control Hazards beqz 3 goToZero nand 5 1 4 add 6 7 8 time beqz fetch decode execute memory writeback nand fetch decode execute memory writeback

  42. Control Hazards for 4-stage pipeline beqz 3 goToZero nand 5 1 4 add 6 7 8 time beqz F/D execute memory writeback nand F/D execute memory writeback

  43. Control Hazards • Stall • Inject NOPs into the pipeline when the next instruction is not known • Pros: simple, clean; Cons: slow • Delay Slots • Tell the programmer that the N instructions after a jump will always be executed, no matter what the outcome of the branch • Pros: The compiler may be able to fill the slots with useful instructions; Cons: breaks abstraction boundary • Speculative Execution • Insert instructions into the pipeline • Replace instructions with NOPs if the branch comes out opposite of what the processor expected • Pros: Clean model, fast; Cons: complex

  44. Control Hazards for 4-stage pipeline beqz 3 goToZero nand 5 1 4 add 6 7 8 goToZero: lui 2 1 time beqz F/D execute memory writeback nand Delay slot! F/D execute memory writeback lui F/D execute memory writeback

  45. Hazard summary • Data hazards • Forward • Stall for lw/sw • Control hazard • Delay slot

  46. Pipelining for Other Circuits • Pipelining can be applied to any combinational logic circuit • What’s the circuit latency at 2ns. per gate? • What’s the throughput? • What is the stage breakdown? Where would you place latches? • What’s the throughput after pipelining?

More Related