1 / 27

Lec 8: Pipelining

Lec 8: Pipelining. Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University. Basic Pipelining. Five stage “RISC” load-store architecture Instruction fetch (IF) get instruction from memory Instruction Decode (ID) translate opcode into control signals and read regs Execute (EX)

elton
Download Presentation

Lec 8: Pipelining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lec 8: Pipelining Kavita Bala CS 3410, Fall 2008 Computer Science Cornell University

  2. Basic Pipelining Five stage “RISC” load-store architecture • Instruction fetch (IF) • get instruction from memory • Instruction Decode (ID) • translate opcode into control signals and read regs • Execute (EX) • perform ALU operation • Memory (MEM) • Access memory if load/store • Writeback (WB) • update register file Following slides thanks to Sally McKee

  3. Pipelined Implementation • Break the execution of the instruction into cycles (five, in this case) • Design a separate stage for the execution performed during each cycle • Build pipeline registers (latches) to communicate between the stages Slides thanks to Sally McKee

  4. Stage1: Fetch and Decode • Design a datapath that can fetch an instruction from memory every cycle • Use PC to index memory to read instruction • Increment the PC (assume no branches for now) • Write everything needed to complete execution to the pipeline register (IF/ID) • The next stage will read this pipeline register • Note that pipeline register must be edge triggered Slides thanks to Sally McKee

  5. M U X 1 PC + 1 incr Instruction bits IF / ID Pipeline register Rest of pipelined datapath Instruction Memory/ Cache PC en en Slides thanks to Sally McKee

  6. Stage2: Decode • Reads the IF/ID pipeline register, decodes instruction, and reads register file (specified by regA and regB of instruction bits) • Decode can be easy, just pass on the opcode and let later stages figure out their own control signals for the instruction • Write everything needed to complete execution to the pipeline register (ID/EX) • Pass on the offset field and destination register specifiers (or simply pass on the whole instruction!) • Pass on PC+1 even though decode didn’t use it

  7. PC + 1 PC + 1 Contents Of regA Destreg Contents Of regB Data Instruction bits Control Signals ID / EX Pipeline register IF / ID Pipeline register regA Register File regB Rest of pipelined datapath Stage 1: Inst Fetch datapath en Slides thanks to Sally McKee

  8. Stage3: Execute • Design a datapath that performs the proper ALU operation for the instruction specified and values present in the ID/EX pipeline register • The inputs are the contents of regA and either the contents of regB or the offset field in the instruction • Also, calculate PC+1+offset, in case this is a branch • Write everything needed to complete execution to the pipeline register (EX/Mem) • ALU result, contents of regB and PC+1+offset • Instruction bits for opcode and destReg specifiers

  9. Alu Result Contents Of regA Rest of pipelined datapath Contents Of regB M U X contents of regB A L U Control Signals ID / EX Pipeline register EX/Mem Pipeline register PC + 1 + offset Magic PC + 1 Stage 2: Decode datapath Control Signals Slides thanks to Sally McKee

  10. Stage4: Memory Operation • Design a datapath that performs the proper memory operation for the instruction specified and values present in the EX/Mem pipeline register • ALU result contains address for ld and st instructions • Opcode bits control memory R/W and enable signals • Write everything needed to complete execution to the pipeline register (Mem/WB) • ALU result and MemData • Instruction bits for opcode and destReg specifiers

  11. This goes back to the MUX before the PC in stage 1 MUX control for PC input Alu Result Alu Result Rest of pipelined datapath Control Signals Mem/WB Pipeline register EX/Mem Pipeline register PC+1 +offset Data Memory Stage 3: Execute datapath contents of regB Memory Read Data en R/W Control Signals Slides thanks to Sally McKee

  12. Stage5: Write Back • Design a datapath that completes the execution of this instruction, writing to the register file if required • Write MemData to destReg for ld instruction • Write ALU result to destReg for arithmetic/logic instructions • Opcode bits also control register write enable signal Slides thanks to Sally McKee

  13. This goes back to data input of register file M U X bits 0-2 This goes back to the destination register specifier M U X bits 15-17 register write enable Alu Result Memory Read Data Stage 4: Memory datapath Control Signals Mem/WB Pipeline register Slides thanks to Sally McKee

  14. Sample Code (Simple) • Assume eight-register machine • Run the following code on a pipelined datapath add 3 1 2 ; reg 3 = reg 1 + reg 2 nand 6 4 5 ; reg 6 = ~(reg 4 & reg 5) lw 4 20 (2) ; reg 4 = Mem[reg2+20] add 5 2 5 ; reg 5 = reg 2 + reg 5 sw 7 12(3) ; Mem[reg3+12] = reg 7 Slides thanks to Sally McKee

  15. + A L U M U X 1 target PC+1 PC+1 0 R0 R1 regA ALU result R2 Register file regB valA M U X PC Inst mem Data mem instruction R3 ALU result mdata R4 valB R5 R6 M U X data R7 offset dest valB Bits 0-2 dest dest dest Bits 15-17 M U X Bits 21-23 op op op IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  16. + A L U M U X 1 0 0 0 0 R0 0 36 R1 0 9 R2 Register file 0 M U X PC Inst mem Data mem nop 12 R3 0 0 18 R4 7 0 R5 41 R6 M U X data 22 R7 0 dest 0 Initial State Bits 0-2 0 0 0 Bits 15-17 M U X Bits 21-23 nop nop nop IF/ ID ID/ EX EX/ Mem Mem/ WB Slides thanks to Sally McKee

  17. + A L U add 3 1 2 M U X 1 0 1 0 0 R0 0 36 R1 0 9 R2 Register file 0 M U X PC Inst mem Data mem add 3 1 2 12 R3 0 0 18 R4 7 0 R5 41 R6 M U X data 22 R7 0 dest 0 Fetch: add 3 1 2 Bits 0-2 0 0 0 Bits 15-17 M U X Bits 21-23 nop nop nop IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 1 Slides thanks to Sally McKee

  18. + A L U nand 6 4 5 add 3 1 2 M U X 1 0 2 1 0 R0 0 36 R1 1 0 9 R2 Register file 2 36 M U X PC Inst mem Data mem nand 6 4 5 12 R3 0 0 18 R4 7 9 R5 41 R6 M U X data 22 R7 3 dest 0 Fetch: nand 6 4 5 Bits 0-2 3 0 0 Bits 15-17 M U X Bits 21-23 add nop nop IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 2 Slides thanks to Sally McKee

  19. + A L U lw 4 20(2) nand 6 4 5 add 3 1 2 M U X 1 4 3 2 0 R0 0 36 R1 4 0 36 9 R2 Register file 5 18 M U X PC Inst mem Data mem lw 4 20(2) 12 R3 45 0 18 R4 9 7 7 R5 41 R6 M U X data 22 R7 6 dest 9 Fetch: lw 4 20(2) Bits 0-2 3 6 3 0 Bits 15-17 M U X Bits 21-23 nand add nop IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 3 Slides thanks to Sally McKee

  20. + A L U add 5 2 5 lw 4 20(2) nand 6 4 5 add 3 1 2 M U X 1 8 4 3 0 R0 0 36 R1 2 45 18 9 R2 Register file 4 9 M U X PC Inst mem Data mem add 5 2 5 12 R3 -3 0 18 R4 45 7 7 18 R5 41 R6 M U X data 22 R7 20 dest 7 Fetch: add 5 2 5 Bits 0-2 3 6 4 6 3 Bits 15-17 M U X Bits 21-23 lw nand add IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 4 Slides thanks to Sally McKee

  21. + A L U sw 7 12(3) add 5 2 5 lw 4 20 (2) nand 4 5 6 add M U X 1 23 5 4 0 R0 0 45 36 R1 2 -3 9 9 R2 Register file 5 9 M U X PC Inst mem Data mem sw 7 12(3) 45 R3 29 0 18 R4 -3 7 7 R5 41 R6 M U X data 22 R7 20 5 dest 18 Fetch: sw 7 12(3) Bits 0-2 6 3 4 5 4 6 Bits 15-17 M U X Bits 21-23 add lw nand IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 5 Slides thanks to Sally McKee

  22. + A L U sw 7 12(3) add 5 2 5 lw 4 20(2) nand M U X 1 9 5 0 R0 0 -3 36 R1 3 29 9 9 R2 Register file 7 45 M U X PC Inst mem Data mem 45 R3 16 99 18 R4 29 7 7 22 R5 -3 R6 M U X data 22 R7 12 dest 7 No more instructions Bits 0-2 4 6 5 7 5 4 Bits 15-17 M U X Bits 21-23 sw add lw IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 6 Slides thanks to Sally McKee

  23. + A L U sw 7 12(3) add 5 2 5 lw M U X 1 15 0 R0 0 36 R1 16 45 9 R2 Register file M U X PC Inst mem Data mem 45 R3 99 57 0 99 R4 16 7 R5 -3 R6 M U X data 22 R7 12 dest 22 No more instructions Bits 0-2 5 4 7 7 5 Bits 15-17 M U X Bits 21-23 sw add IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 7 Slides thanks to Sally McKee

  24. + A L U sw 7 12(3) add M U X 1 0 R0 16 36 R1 57 9 R2 Register file M U X PC Inst mem Data mem 45 R3 0 99 22 R4 57 16 R5 -3 R6 M U X data 22 R7 dest 22 No more instructions Bits 0-2 5 7 Bits 15-17 M U X Bits 21-23 sw IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 8 Slides thanks to Sally McKee

  25. + A L U sw M U X 1 0 R0 36 R1 9 R2 Register file M U X PC Inst mem Data mem 45 R3 99 R4 16 R5 -3 R6 M U X data 22 R7 dest No more instructions Bits 0-2 Bits 15-17 M U X Bits 21-23 IF/ ID ID/ EX EX/ Mem Mem/ WB Time: 9 Slides thanks to Sally McKee

  26. Time Graphs Time: 1 2 3 4 5 6 7 8 9 add nand lw add sw fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback fetch decode execute memory writeback Slides thanks to Sally McKee

  27. Pipelining Recap • Powerful technique for masking latencies • Logically, instructions execute one at a time • Physically, instructions execute in parallel • Instruction level parallelism • Decouples the processor model from the implementation • Interface vs. implementation • BUT dependencies between instructions complicate the implementation

More Related