1 / 47

Pipelining Chapter 6

Pipelining Chapter 6. Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering Auburn University http://www.eng.auburn.edu/~vagrawal vagrawal@eng.auburn.edu. Automobile Team Assembly. 1 hour. 1 hour. 1 hour. 1 hour.

Download Presentation

Pipelining Chapter 6

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. PipeliningChapter 6 Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering Auburn University http://www.eng.auburn.edu/~vagrawal vagrawal@eng.auburn.edu ELEC 5200-001/6200-001 Lecture 12

  2. Automobile Team Assembly 1 hour 1 hour 1 hour 1 hour 1 car assembled every four hours 6 cars per day 180 cars per month 2,040 cars per year ELEC 5200-001/6200-001 Lecture 12

  3. Automobile Assembly Line Task 2 1 hour Task 3 1 hour Task 4 1 hour Task 1 1 hour Mecahnical Electrical Painting Testing First car assembled in 4 hours (pipeline latency) thereafter 1 car per hour 21 cars on first day, thereafter 24 cars per day 717 cars per month 8,637 cars per year ELEC 5200-001/6200-001 Lecture 12

  4. Throughput: Team Assembly Red car started Red car completed Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing Blue car started Blue car completed Time Time of assembling one car = n hours where n is the number of nearly equal subtasks, each requiring 1 unit of time Throughput = 1/n cars per unit time ELEC 5200-001/6200-001 Lecture 12

  5. Throughput: Assembly Line Car 1 Car 2 Car 3 Car 4 . . Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing Car 1 complete Car 2 complete time Time to complete first car = n time units (latency) Cars completed in time T = T – n + 1 Throughput = 1- (n - 1)/ T car per unit time Throughput(assembly line) 1 – (n - 1)/ Tn(n – 1) ----------------------------------------- = ----------------- = n - ----------- → n Throughput(team assembly) 1/nT as T→∞ ELEC 5200-001/6200-001 Lecture 12

  6. Some Features of Assembly Line Electrical parts delivered (JIT) Task 2 1 hour Task 3 1 hour Task 4 1 hour Task 1 1 hour Mecahnical Electrical Painting Testing 3 cars in the assembly line are suspects, to be removed (flush pipeline) Defect found Stall assembly line to fix the cause of defect ELEC 5200-001/6200-001 Lecture 12

  7. Pipelining in a Computer • Divide datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources. • Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task. • Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task. • Break each instruction down into a fixed number of tasks so that instructions can be executed in a staggered fashion. ELEC 5200-001/6200-001 Lecture 12

  8. Single-Cycle Datapath “No operation on data” inserted to equalize instruction lengths. ELEC 5200-001/6200-001 Lecture 12

  9. Execution Time: Single-Cycle 0 2 4 6 8 10 12 14 16 . . Time (ns) lw $1, 100($0) lw $2, 200($0) lw $3, 300($0) IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB Clock cycle = 8 ns Total time for executing three lw instructions = 24 ns ELEC 5200-001/6200-001 Lecture 12

  10. Execution Time: Pipeline 0 2 4 6 8 10 12 14 16 . . Time (ns) lw $1, 100($0) lw $2, 200($0) lw $3, 300($0) IF ID EX MEM RW IF ID EX MEM RW IF ID EX MEM RW Clock cycle = 2 ns, four times slower than single-clock Total time for executing three lw instructions = 14 ns Single-cycle time 24 Performance ratio = ------------------------- = --- = 1.7 Pipeline time 14 ELEC 5200-001/6200-001 Lecture 12

  11. Pipeline Performance Clock cycle = 2 ns 1,003 lw instructions: Total time for executing 1,003 lw instructions = 2,014 ns Single-cycle time 8,024 Performance ratio = ------------------------- = ------- = 3.98 Pipeline time 2,014 10,003 lw instructions: Performance ratio = 80,024/20,014 = 3.998 → Clock cycle ratio (4) Pipeline performance approaches to clock-cycle ratio for long programs. ELEC 5200-001/6200-001 Lecture 12

  12. Single-Cycle Datapath WB: write back ID: Instr. decode, reg. file read EX: Execute, address calc. MEM: mem. access IF: Instr. fetch 4 Add 1 mux 0 ALU Branch opcode MemtoReg CONTROL 26-31 RegWrite ALUSrc 21-25 zero MemWrite MemRead ALU Instr. mem. PC Reg. File Data mem. 1 mux 0 16-20 0 mux 1 1 mux 0 11-15 RegDst ALUOp ALU Cont. Sign ext. Shift left 2 0-15 0-5 ELEC 5200-001/6200-001 Lecture 12

  13. Pipelining of RISC Instructions(From Lecture 4) Fetch Instruction Examine Opcode Fetch Operands Perform Operation Store Result IF ID EX MEM WB Instruction Instruction Execute Memory Write Fetch Decode and Operation Back Fetch operands to Reg file Although an instruction takes five clock cycles, one instruction is completed every cycle. ELEC 5200-001/6200-001 Lecture 12

  14. Pipeline Registers IF/ID ID/EX EX/MEM 4 1 mux 0 Add ALU Branch opcode MemtoReg CONTROL 26-31 MEM/WB RegWrite ALUSrc 21-25 zero MemWrite MemRead ALU Instr. mem. PC Reg. File Data mem. 1 mux 0 16-20 0 mux 1 1 mux 0 11-15 This requires a CONTROL that is different from single-cycle RegDst ALUOp ALU Cont. Sign ext. Shift left 2 0-15 0-5 ELEC 5200-001/6200-001 Lecture 12

  15. Pipeline Register Functions • Four pipeline registers are added: ELEC 5200-001/6200-001 Lecture 12

  16. Pipelined Datapath IF/ID ID/EX EX/MEM MEM/WB 4 1 mux 0 Add ALU opcode Shift left 2 26-31 zero 21-25 Instr mem ALU 16-20 Data mem. Reg. File PC 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw 0-15 ELEC 5200-001/6200-001 Lecture 12

  17. DM IM ID, REG. FILE READ ALU REG. FILE WRITE MEM/WB IF/ID ID/EX EX/MEM Five-Cycle Pipeline CC1 CC2 CC3 CC4 CC5 ELEC 5200-001/6200-001 Lecture 12

  18. Add Instruction • add $t0, $s1, $s2 Machine instruction word 000000 10001 10010 01000 00000 100000 opcode $s1 $s2 $t0 function CC1 CC2 CC3 CC4 CC5 DM IM ID, REG. FILE READ ALU REG. FILE WRITE MEM/WB IF/ID ID/EX EX/MEM IF ID EX MEM WB read $s1 add write $t0 read $s2 $s1+$s2 ELEC 5200-001/6200-001 Lecture 12

  19. Pipelined Datapath for add IF/ID ID/EX EX/MEM MEM/WB 4 1 mux 0 Add ALU opcode Shift left 2 26-31 s1 zero $s1 21-25 Instr mem ALU addr Data mem data 16-20 Reg. File PC $s2 s2 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw t0 0-15 ELEC 5200-001/6200-001 Lecture 12

  20. DM IM ID, REG. FILE READ ALU REG. FILE WRITE MEM/WB IF/ID ID/EX EX/MEM Load Instruction • lw $t0, 1200 ($t1) 100011 01001 01000 0000 0100 1000 0000 opcode $t1 $t0 1200 CC1 CC2 CC3 CC4 CC5 IF ID EX MEM WB read $t1 add read write $t0 sign ext $t1+1200 M[addr] 1200 ELEC 5200-001/6200-001 Lecture 12

  21. Pipelined Datapath for lw IF/ID ID/EX EX/MEM MEM/WB 4 1 mux 0 Add ALU opcode Shift left 2 26-31 t1 zero $t1 21-25 Instr mem ALU addr Data mem data 16-20 Reg. File PC 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw t0 0-15 1200 ELEC 5200-001/6200-001 Lecture 12

  22. DM IM ID, REG. FILE READ ALU REG. FILE WRITE MEM/WB IF/ID ID/EX EX/MEM Store Instruction • sw $t0, 1200 ($t1) 101011 01001 01000 0000 0100 1000 0000 opcode $t1 $t0 1200 CC1 CC2 CC3 CC4 CC5 IF ID EX MEM WB read $t1 add write sign ext $t1+1200 M[addr] 1200 (addr) ← $t0 ELEC 5200-001/6200-001 Lecture 12

  23. Pipelined Datapath for sw IF/ID ID/EX EX/MEM MEM/WB 4 1 mux 0 Add ALU opcode Shift left 2 26-31 t1 zero $t1 21-25 Instr mem ALU addr Data mem data 16-20 Reg. File PC $t0 t0 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw 0-15 1200 ELEC 5200-001/6200-001 Lecture 12

  24. Executing a Program Consider a five instruction segment: lw $10, 20($1) sub $11, $2, $3 add $12, $3, $4 lw $13, 24($1) add $14, $5, $6 ELEC 5200-001/6200-001 Lecture 12

  25. DM DM DM DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ID, REG. FILE READ ALU ALU REG. FILE WRITE REG. FILE WRITE IM IM ID, REG. FILE READ ID, REG. FILE READ ALU ALU IM IM REG. FILE WRITE REG. FILE WRITE MEM/WB MEM/WB MEM/WB MEM/WB MEM/WB IF/ID EX/MEM ID/EX EX/MEM EX/MEM ID/EX ID/EX EX/MEM EX/MEM IF/ID IF/ID IF/ID IF/ID ID/EX ID/EX Program Execution time CC1 CC2 CC3 CC4 CC5 lw $10, 20($1) sub $11, $2, $3 Program instructions add $12, $3, $4 lw $13, 24($1) add $14, $5, $6 ELEC 5200-001/6200-001 Lecture 12

  26. CC5 MEM: sub $11, $2, $3 WB: lw $10, 20($1) IF: add $14, $5, $6 ID: lw $13, 24($1) EX: add $12, $3, $4 IF/ID ID/EX EX/MEM MEM/WB 4 1 mux 0 Add ALU opcode Shift left 2 26-31 zero 21-25 Instr mem ALU 16-20 Data mem. Reg. File PC 1 mux 0 0 mux 1 Sign ext. 11-15 for R-type 16-20 for I-type lw 0-15 ELEC 5200-001/6200-001 Lecture 12

  27. Advantages of Pipeline • After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency (5 cycles): • Pipeline latency is defined as the number of stages in the pipeline. • The clock cycle is about four times shorter than that of single-cycle datapath and about the same as that of multicycle datapath. • For multicycle datapath, CPI = 3. …. • So, pipelined execution is faster, but . . . ELEC 5200-001/6200-001 Lecture 12

  28. Science is always wrong. It never solves a problem Without creating ten more. George Bernard Shaw ELEC 5200-001/6200-001 Lecture 12

  29. Pipeline Hazards • Definition: Hazard in a pipeline is a situation in which the next instruction cannot complete execution one clock cycle after completion of the presentinstruction. • Three types of hazards: • Structural hazard • Data hazard • Control hazard ELEC 5200-001/6200-001 Lecture 12

  30. Structural Hazard • Two instructions cannot execute due to a resource conflict. • Example: Consider a computer with a common data and instruction memory. The third cycle of a lw instruction requires memory access (memory read) and at the same time the first cycle of the third instruction requires instruction fetch (memory read). This will cause a memory resource conflict. ELEC 5200-001/6200-001 Lecture 12

  31. DM DM DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ID, REG. FILE READ ALU ALU REG. FILE WRITE REG. FILE WRITE IM IM ID, REG. FILE READ ALU IM REG. FILE WRITE MEM/WB MEM/WB MEM/WB MEM/WB IF/ID EX/MEM ID/EX EX/MEM EX/MEM ID/EX ID/EX EX/MEM IF/ID IF/ID IF/ID ID/EX Example of Structural Hazard time CC1 CC2 CC3 CC4 CC5 lw $10, 20($1) sub $11, $2, $3 add $12, $3, $4 Program instructions lw $13, 24($1) ELEC 5200-001/6200-001 Lecture 12

  32. Possible Remedies for Structural Hazards • Provide duplicate hardware resources in datapath. • Control unit or compiler can insert delays (no-op cycles) between instructions. This is known as pipeline stall or bubble. ELEC 5200-001/6200-001 Lecture 12

  33. DM DM DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ID, REG. FILE READ ALU ALU REG. FILE WRITE REG. FILE WRITE IM IM ID, REG. FILE READ ALU IM REG. FILE WRITE MEM/WB MEM/WB MEM/WB MEM/WB IF/ID EX/MEM ID/EX EX/MEM EX/MEM ID/EX ID/EX EX/MEM IF/ID IF/ID IF/ID ID/EX Stall (Bubble) for Structural Hazard time CC1 CC2 CC3 CC4 CC5 lw $10, 20($1) sub $11, $2, $3 add $12, $3, $4 Program instructions Stall (bubble) lw $13, 24($1) ELEC 5200-001/6200-001 Lecture 12

  34. Data Hazard • Data hazard means that an instruction cannot be completed because the needed data, to be generated by another instruction in the pipeline, is not available. • Example: consider to instructions: • add $s0, $t0, $t1 • sub $t2, $s0, $t3 # needs $s0 ELEC 5200-001/6200-001 Lecture 12

  35. DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ALU REG. FILE WRITE IM MEM/WB MEM/WB IF/ID EX/MEM ID/EX EX/MEM ID/EX IF/ID Example of Data Hazard time CC1 CC2 CC3 CC4 CC5 Write s0 in CC5 add $s0, $t0, $t1 sub $t2, $s0, $t3 Read s0 and t3 in CC3 Program instructions ELEC 5200-001/6200-001 Lecture 12

  36. Forwarding or Bypassing • Output of a resource used by an instruction is forwarded to the input of a resource being used by another instruction. • Forwarding can eliminate some, but not all, data hazards. ELEC 5200-001/6200-001 Lecture 12

  37. DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ALU REG. FILE WRITE IM MEM/WB IF/ID ID/EX EX/MEM MEM/WB EX/MEM ID/EX IF/ID Forwarding for Data Hazard time CC1 CC2 CC3 CC4 CC5 Write s0 in CC5 add $s0, $t0, $t1 Forwarding sub $t2, $s0, $t3 Read s0 and t3 in CC3 Program instructions ELEC 5200-001/6200-001 Lecture 12

  38. DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ALU REG. FILE WRITE IM MEM/WB MEM/WB IF/ID EX/MEM ID/EX EX/MEM ID/EX IF/ID Forwarding Alone May Not Work time CC1 CC2 CC3 CC4 CC5 Write s0 in CC5 lw $s0, 20($s1) sub $t2, $s0, $t3 Read s0 and t3 in CC3 Program instructions data needed by sub data available from memory ELEC 5200-001/6200-001 Lecture 12

  39. DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ALU REG. FILE WRITE IM MEM/WB MEM/WB IF/ID EX/MEM ID/EX EX/MEM ID/EX IF/ID Use Bubble and Forwarding time CC1 CC2 CC3 CC4 CC5 Write s0 in CC5 lw $s0, 20($s1) Forwarding stall (bubble) sub $t2, $s0, $t3 Program instructions ELEC 5200-001/6200-001 Lecture 12

  40. Avoiding Stall by Code Reorder C code: A = B + E; C = B + F; MIPS code: lw $t1, 0($t0) lw $t2, 4($t0) add $t3, $t1, $t2 hazard: stall for $t2 sw $t3, 12($t0) lw $t4, 8($t0) add $t5, $t1, $t4 hazard: stall for $t4 sw $t5, 16,($t0) ELEC 5200-001/6200-001 Lecture 12

  41. Reordered Code C code: A = B + E; C = B + F; MIPS code: lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2no hazard sw $t3, 12($t0) add $t5, $t1, $t4no hazard sw $t5, 16,($t0) ELEC 5200-001/6200-001 Lecture 12

  42. Control Hazard • Next instruction is not known! • Example: Instruction being executed is branch-type, which will determine the next instruction: • add $4, $5, $6 • beq $1, $2, 40 • next instruction • . . . • 40 or $7, $8, $9 ELEC 5200-001/6200-001 Lecture 12

  43. DM DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ALU REG. FILE WRITE IM ID, REG. FILE READ ALU IM REG. FILE WRITE MEM/WB MEM/WB MEM/WB IF/ID EX/MEM ID/EX EX/MEM ID/EX EX/MEM IF/ID IF/ID ID/EX Stall on Branch time CC1 CC2 CC3 CC4 CC5 add $4, $5, $6 beq $1, $2, 40 Program instructions Stall (bubble) next instruction or or $7, $8, $9 ELEC 5200-001/6200-001 Lecture 12

  44. Ways to Handle Branch • Stall or bubble • Branch prediction: • Heuristics • Next instruction • Prediction based on statistics (dynamic) • Hardware decision (dynamic) • Prediction error: pipeline flush • Delayed branch ELEC 5200-001/6200-001 Lecture 12

  45. Stall on branch I add $4, $5, $6 beq $1, $2, 40 next instruction . . . I+40 or $7, $8, $9 Delayed branch I+1 beq $1, $2, 40 add $4, $5, $6 next instruction . . . I+40 or $7, $8, $9 Delayed Branch Example ELEC 5200-001/6200-001 Lecture 12

  46. DM DM DM ID, REG. FILE READ IM ALU REG. FILE WRITE ID, REG. FILE READ ALU REG. FILE WRITE IM ID, REG. FILE READ ALU IM REG. FILE WRITE MEM/WB MEM/WB EX/MEM MEM/WB IF/ID ID/EX EX/MEM ID/EX EX/MEM IF/ID IF/ID ID/EX Delayed Branch time CC1 CC2 CC3 CC4 CC5 beq $1, $2, 40 Program instructions add $4, $5, $6 next instruction or or $7, $8, $9 ELEC 5200-001/6200-001 Lecture 12

  47. Summary: Hazards • Structural hazards • Cause: resource conflict • Remedies: (i) hardware resources, (ii) stall (bubble) • Data hazards • Cause: data unavailablity • Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering • Control hazards • Cause: out-of-sequence execution (branch or jump) • Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush ELEC 5200-001/6200-001 Lecture 12

More Related