1 / 39

CEG3420 Computer Design Introduction to Pipelining

CEG3420 Computer Design Introduction to Pipelining. A. B. C. D. Recap: Sequential Laundry. 2 AM. 12. 6 PM. 1. 8. 7. 11. 10. 9. Sequential laundry takes 8 hours for 4 loads If they learned pipelining, how long would laundry take?. 30. 30. 30. 30. 30. 30. 30. 30. 30. 30.

ataret
Download Presentation

CEG3420 Computer Design Introduction to Pipelining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CEG3420 Computer Design Introduction to Pipelining

  2. A B C D Recap: Sequential Laundry 2 AM 12 6 PM 1 8 7 11 10 9 • Sequential laundry takes 8 hours for 4 loads • If they learned pipelining, how long would laundry take? 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 T a s k O r d e r Time

  3. 30 30 30 30 30 30 30 A B C D Recap: Pipelining Lessons (its intuitive!) 6 PM 7 8 9 • Pipelining doesn’t help latency of single task, it helps throughput of entire workload • Multiple tasks operating simultaneously using different resources • Potential speedup = Number pipe stages • Pipeline rate limited by slowest pipeline stage • Unbalanced lengths of pipe stages reduces speedup • Time to “fill” pipeline and time to “drain” it reduces speedup • Stall for Dependences Time T a s k O r d e r

  4. Recap: Ideal Pipelining Assume instructions are completely independent! Maximum Speedup  Number of stages Speedup Time for unpipelined operation Time for longest stage IF DCD EX MEM WB IF DCD EX MEM WB IF DCD EX MEM WB IF DCD EX MEM WB IF DCD EX MEM WB Example: 40ns data path, 5 stages, Longest stage is 10 ns, Speedup 4

  5. Recap: Graphically Representing Pipelines • Can help with answering questions like: • how many cycles does it take to execute this code? • what is the ALU doing during cycle 4? • use this representation to help understand datapaths

  6. Recap: Can pipelining get us into trouble? • Yes:Pipeline Hazards • structural hazards: attempt to use the same resource two different ways at the same time • e.g., multiple memory accesses, multiple register writes • solutions: multiple memories, stretch pipeline • control hazards: attempt to make a decision before condition is evaulated • e.g., any conditional branch • solutions: prediction, delayed branch • data hazards: attempt to use item before it is ready • e.g., add r1,r2,r3; sub r4, r1 ,r5; lw r6, 0(r7); or r8, r6 ,r9 • solutions: forwarding/bypassing, stall/bubble

  7. Recap: Pipelined Datapath with Data Stationary Control IAU npc Just like Time-State! I mem Regs lw $2,20($5) PC Operand Register Selects im n op rw B A <= PC + 4 + immed ALU Op alu S MEM Op D mem m Result Reg Select and Enable Regs

  8. Recap • Pipelining is a fundamental concept • multiple steps using distinct resources • Utilize capabilities of the Datapath by pipelined instruction processing • start next instruction while working on the current one • limited by length of longest stage (plus fill/flush) • detect and resolve hazards • What makes it easy • all instructions are the same length • just a few instruction formats • memory operands appear only in loads and stores • Hazards make it hard • We’ll build a simple pipeline and look at these issues

  9. Processor Input Control Memory Datapath Output The Big Picture: Where are We Now? • The Five Classic Components of a Computer • Today’s Topics: • Recap last lecture • Pipelined Control/ Do it yourself Pipelined Control • Administrivia • Hazards/Forwarding • Exceptions • Review MIPS R3000 pipeline • Advanced Pipelining?

  10. A M S B D Recap: Control Diagram IR <- Mem[PC]; PC < PC+4; A <- R[rs]; B<– R[rt] S <– A + B; S <– A or ZX; S <– A + SX; S <– A + SX; If Cond PC < PC+SX; M <– S M <– Mem[S] Mem[S] <- B M <– S R[rd] <– S; R[rt] <– S; R[rd] <– M; Equal Reg. File Reg File Exec PC IR Next PC Inst. Mem Mem Access Data Mem

  11. But recall use of “Data Stationary Control” • The Main Control generates the control signals during Reg/Dec • Control signals for Exec (ExtOp, ALUSrc, ...) are used 1 cycle later • Control signals for Mem (MemWr Branch) are used 2 cycles later • Control signals for Wr (MemtoReg MemWr) are used 3 cycles later Reg/Dec Exec Mem Wr ExtOp ExtOp ALUSrc ALUSrc ALUOp ALUOp Main Control RegDst RegDst Ex/Mem Register IF/ID Register Mem/Wr Register ID/Ex Register MemWr MemWr MemWr Branch Branch Branch MemtoReg MemtoReg MemtoReg MemtoReg RegWr RegWr RegWr RegWr

  12. A M S B D PC Datapath + Data Stationary Control IR v v v fun rw rw rw wb wb wb Inst. Mem Decode me me WB Ctrl rt Mem Ctrl rs ex op im rs rt Reg. File Reg File Exec Mem Access Data Mem Next PC

  13. Let’s Try it Out 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 these addresses are octal

  14. n n n n A M S B = IF D Next PC 10 PC Start: Fetch 10 Inst. Mem Decode WB Ctrl Mem Ctrl IR im rs rt Reg. File Reg File Exec Mem Access Data Mem 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

  15. n n n A M S B = ID D IF Next PC 14 PC Fetch 14, Decode 10 lw r1, r2(35) Inst. Mem Decode WB Ctrl Mem Ctrl IR im 2 rt Reg. File Reg File Exec Mem Access Data Mem 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

  16. n n M S B = D ID IF Next PC 20 PC Fetch 20, Decode 14, Exec 10 addI r2, r2, 3 Inst. Mem Decode WB Ctrl lw r1 Mem Ctrl IR 35 2 rt Reg. File Reg File r2 Exec Mem Access Data Mem EX 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

  17. n M B = D EX ID Next PC 24 IF PC Fetch 24, Decode 20, Exec 14, Mem 10 addI r2, r2, 3 sub r3, r4, r5 Inst. Mem Decode WB Ctrl lw r1 Mem Ctrl IR 3 4 5 Reg. File Reg File r2 r2+35 Exec Mem Access Data Mem M 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15

  18. Administrative Issues • Schedule Ahead • Course Feedback • Like on-line lecture notes!! pace of class!! • Like Computers in the news!! • Prerequisite Quiz? 39 great, 2 so-so, 1 bad idea • Online Submission? • Spread TA office hours? • Slow lectures last 20 minutes? • Computers in the news: • Alpha/Intel patent scabble to be settled this week? midterm 8 9 10 11 12 13 14 15 16 M T W T F M T W T F M T W T F M T W T F M T W T F M T W T F M T W T F M T W T F M T W T F pipeline (5) cache(6) xtra & writeup final report proj present last lecture

  19. r5 = WB D M EX Next PC ID 30 IF PC Fetch 30, Dcd 24, Ex 20, Mem 14, WB 10 beq r6, r7 100 Inst. Mem Decode WB Ctrl addI r2 lw r1 sub r3 Mem Ctrl IR 6 7 Reg. File Reg File r4 M[r2+35] r2+3 Exec Mem Access Data Mem 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 Note Delayed Branch: always execute ori after beq

  20. r7 = D EX Next PC 100 ID PC IF Fetch 100, Dcd 30, Ex 24, Mem 20, WB 14 ori r8, r9 17 Inst. Mem Decode WB Ctrl addI r2 sub r3 Mem Ctrl beq IR 9 xx 100 r1=M[r2+35] Reg. File Reg File r6 r2+3 r4-r5 Exec Mem Access Data Mem 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 WB M

  21. = D WB Next PC M ___ EX PC ID Fetch 104, Dcd 100, Ex 30, Mem 24, WB 20 ? Inst. Mem Decode WB Ctrl Mem Ctrl IR Reg. File Reg File Exec Mem Access Data Mem 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 Fill it in yourself!

  22. = D Next PC WB ___ PC EX Fetch 110, Dcd 104, Ex 100, Mem 30, WB 24 ? ? Inst. Mem Decode WB Ctrl Mem Ctrl IR ? Reg. File Reg File ? Exec Mem Access Data Mem 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 M Fill it in yourself!

  23. = D Next PC ___ WB PC Fetch 114, Dcd 110, Ex 104, Mem 100, WB 30 ? ? ? Inst. Mem Decode WB Ctrl Mem Ctrl IR ? Reg. File Reg File ? ? Exec Mem Access Data Mem 10 lw r1, r2(35) 14 addI r2, r2, 3 20 sub r3, r4, r5 24 beq r6, r7, 100 30 ori r8, r9, 17 34 add r10, r11, r12 100 and r13, r14, 15 Fill it in yourself! M

  24. Pipeline Hazards Again I-Fet ch DCD MemOpFetch OpFetch Exec Store IFetch DCD ° ° ° Structural Hazard I-Fet ch DCD OpFetch Jump Control Hazard IFetch DCD ° ° ° IF DCD EX Mem WB RAW (read after write) Data Hazard IF DCD EX Mem WB WAW Data Hazard (write after write) IF DCD EX Mem WB IF DCD OF Ex Mem IF DCD OF Ex RS WAR Data Hazard (write after read)

  25. RAW Data Hazard IF DCD EX Mem WB IF DCD EX Mem WB WAW Data Hazard IF DCD EX Mem WB IF DCD OF Ex Mem IF DCD OF Ex RS RAW Data Hazard Data Hazards • Avoid some “by design” • eliminate WAR by always fetching operands early (DCD) in pipe • eleminate WAW by doing all WBs in order (last stage, static) • Detect and resolve remaining ones • stall or forward (if possible)

  26. Hazard Detection • Suppose instruction i is about to be issued and a predecessor instruction j is in the instruction pipeline. • A RAW hazard exists on register if Rregs( i ) Wregs( j ) • Keep a record of pending writes (for inst's in the pipe) and compare with operand regs of current instruction. • When instruction issues, reserve its result register. • When on operation completes, remove its write reservation. • A WAW hazard exists on register if Wregs( i ) Wregs( j ) • A WAR hazard exists on register if Wregs( i ) Rregs( j )

  27. Record of Pending Writes IAU • Current operand registers • Pending writes • hazard <= ((rs == rwex) & regWex) OR ((rs == rwmem) & regWme) OR ((rs == rwwb) & regWwb) OR ((rt == rwex) & regWex) OR ((rt == rwmem) & regWme) OR ((rt == rwwb) & regWwb) npc I mem Regs op rw rs rt PC im n op rw B A alu n op rw S D mem m n op rw Regs

  28. Resolve RAW by forwarding IAU • Detect nearest valid write op operand register and forward into op latches, bypassing remainder of the pipe • Increase muxes to add paths from pipeline registers • Data Forwarding = Data Bypassing npc I mem Regs op rw rs rt PC Forward mux im n op rw B A alu n op rw S D mem m n op rw Regs

  29. What about memory operations? ° If instructions are initiated in order and operations always occur in the same stage, there can be no hazards between memory operations! ° What does delaying WB on arithmetic operations cost? – cycles ? – hardware ? ° What about data dependence on loads? R1 <- R4 + R5 R2 <- Mem[ R2 + I ] R3 <- R2 + R1 => op Rd Ra Rb op Rd Ra Rb A B Rd R "Delayed Loads" T Rd to reg file

  30. Compiler Avoiding Load Stalls:

  31. What about Interrupts, Traps, Faults? • External Interrupts: • Allow pipeline to drain, • Load PC with interupt address • Faults (within instruction, restartable) • Force trap instruction into IF • disable writes till trap hits WB • must save multiple PCs or PC + state Refer to MIPS solution

  32. Exception Handling IAU npc detect bad instruction address I mem Regs lw $2,20($5) PC detect bad instruction im n op rw B A detect overflow alu S detect bad data address D mem m Allow exception to take effect Regs

  33. Exception Problem • Exceptions/Interrupts: 5 instructions executing in 5 stage pipeline • How to stop the pipeline? • Restart? • Who caused the interrupt? Stage Problem interrupts occurring IF Page fault on instruction fetch; misaligned memory access; memory-protection violation ID Undefined or illegal opcode EX Arithmetic exception MEM Page fault on data fetch; misaligned memory access; memory-protection violation; memory error • Load with data page fault, Add with instruction page fault? • Solution 1: interrupt vector/instruction 2: interrupt ASAP, restart everything incomplete

  34. Resolution: Freeze above & Bubble Below IAU npc I mem freeze Regs op rw rs rt PC bubble im n op rw B A alu n op rw S D mem m n op rw Regs

  35. FYI: MIPS R3000 clocking discipline phi1 • 2-phase non-overlapping clocks • Pipeline stage is two (level sensitive) latches phi2 phi1 phi2 phi1 Edge-triggered

  36. Resource Usage TLB TLB I-cache RF WB ALUALU D-Cache MIPS R3000 Instruction Pipeline Decode Reg. Read Inst Fetch ALU / E.A Memory Write Reg TLB I-Cache RF Operation WB E.A. TLB D-Cache Write in phase 1, read in phase 2 => eliminates bypass from WB

  37. Time (clock cycles) IF ID/RF EX MEM WB add r1,r2,r3 Reg Reg Im ALU Dm I n s t r. O r d e r sub r4,r1,r3 Im Dm Reg Reg ALU Im Dm Reg Reg and r6,r1,r7 ALU Im Dm Reg Reg or r8,r1,r9 ALU Im Dm Reg Reg xor r10,r1,r11 ALU Recall: Data Hazard on r1 With MIPS R3000 pipeline, no need to forward from WB stage

  38. MIPS R3000 Multicycle Operations op Rd Ra Rb Ex: Multiply, Divide, Cache Miss Stall all stages above multicycle operation in the pipeline Drain (bubble) stages below it Use control word of local stage state to step through multicycle operation mul Rd Ra Rb A B Rd R Rd T to reg file

  39. Summary • Pipelines pass control information down the pipe just as data moves down pipe • Forwarding/Stalls handled by local control • Exceptions stop the pipeline • MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load) • More performance from deeper pipelines, parallelism

More Related