
Pipelining: Basics & Hazards

05. Pipelining: Basics & Hazards. Kai Bu, kaibu@zju.edu.cn, http://list.zju.edu.cn/kaibu/comparch2017

tmarquez

Presentation Transcript


  1. 05 Pipelining: Basics & Hazards. Kai Bu, kaibu@zju.edu.cn, http://list.zju.edu.cn/kaibu/comparch2017

  2. Pipelining? Basics & Hazards

  3. Pipelining? You already knew!

  4. Cafeteria: kinda miss zjg?

  5. Cafeteria: Did you wait until all others finish? kinda miss zjg?

  6. Cafeteria: Order

  7. Cafeteria: Pay

  8. Cafeteria: Enjoy

  9. Cafeteria: Enjoy, while some others are ordering or paying

  10. Cafeteria: Observations?

  11. Cafeteria: Observations? Besides eating… ordering or paying

  12. Cafeteria: Observations? Customers co-use the dependent function areas, which speeds up the dining process of all.

  13. Cafeteria: Observations? Speeds up the dining process of all. Individual perspective? (order, pay, enjoy)

  14. Cafeteria: Observations? Speeds up the dining process of all. Individual perspective? Fastest if there is only one customer to serve (order, pay, enjoy).

  15. Cafeteria: Observations? Speeds up the dining process of all. Individual perspective? Fastest if there is only one customer to serve (order, pay, enjoy)… but otherwise a potentially very, very long queue.

  16. Cafeteria: Observations? Individual perspective? Fastest if there is only one customer to serve (order, pay, enjoy)… but otherwise a potentially very, very long queue.

  17. Cafeteria: Observations • Average: faster • Individual: slower service time, but much less time in queue • Individual: faster overall (queue + service)

  18. (classic) laundry example

  19. Laundry Example: Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold. Washer: 30 mins; dryer: 40 mins; folder: 20 mins.

  20. Sequential Laundry: 6 hours. What would you do? [Timeline figure: tasks A, B, C, D run one after another in task order, each taking 30 + 40 + 20 mins.]

  21. Sequential Laundry: 6 hours. What would you do? [Timeline figure: tasks A, B, C, D run one after another in task order, each taking 30 + 40 + 20 mins.]

  22. Pipelined Laundry: 3.5 hours. [Timeline figure: tasks A, B, C, D overlap in task order; time slots of 30, 40, 40, 40, 40, 20 mins.]

  23. Pipelined Laundry: 3.5 hours. Observations • A task has a series of stages. [Timeline figure as in slide 22.]

  24. Pipelined Laundry: 3.5 hours. Observations • A task has a series of stages; • Stage dependency: e.g., wash before dry. [Timeline figure as in slide 22.]

  25. Pipelined Laundry: 3.5 hours. Observations • A task has a series of stages; • Stage dependency: e.g., wash before dry; • Multiple tasks with overlapping stages. [Timeline figure as in slide 22.]

  26. Pipelined Laundry: 3.5 hours. Observations • A task has a series of stages; • Stage dependency: e.g., wash before dry; • Multiple tasks with overlapping stages; • Simultaneously use different resources to speed up. [Timeline figure as in slide 22.]

  27. Pipelined Laundry: 3.5 hours. Observations • A task has a series of stages; • Stage dependency: e.g., wash before dry; • Multiple tasks with overlapping stages; • Simultaneously use different resources to speed up; • Slowest stage determines the finish time. [Timeline figure as in slide 22.]

  28. Pipelined Laundry: 3.5 hours. Observations • No speedup for an individual task; e.g., A still takes 30 + 40 + 20 = 90 mins. [Timeline figure as in slide 22.]

  29. Pipelined Laundry: 3.5 hours. Observations • No speedup for an individual task; e.g., A still takes 30 + 40 + 20 = 90 mins; • But speedup for average task execution time; e.g., 3.5 x 60 / 4 = 52.5 mins < 30 + 40 + 20 = 90 mins. [Timeline figure as in slide 22.]
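The 6-hour and 3.5-hour totals above can be reproduced with a short simulation. This is only a sketch: the function name `pipelined_finish_times` is made up for illustration, and it assumes each machine handles one load at a time and that a load may wait between stages until the next machine is free.

```python
def pipelined_finish_times(stage_minutes, num_tasks):
    """Return the finish time (in minutes) of each task in a simple pipeline.

    Assumption: one task per stage at a time; a task may wait between
    stages until the next machine becomes free.
    """
    stage_free_at = [0] * len(stage_minutes)  # when each machine is next free
    finish_times = []
    for _ in range(num_tasks):
        t = 0  # every load is ready at time 0
        for s, minutes in enumerate(stage_minutes):
            start = max(t, stage_free_at[s])  # wait for own previous stage and for the machine
            t = start + minutes
            stage_free_at[s] = t
        finish_times.append(t)
    return finish_times


stages = [30, 40, 20]  # wash, dry, fold (minutes), from the laundry example
tasks = 4              # Ann, Brian, Cathy, Dave

sequential_total = tasks * sum(stages)
pipelined = pipelined_finish_times(stages, tasks)

print(sequential_total)       # 360 minutes = 6 hours
print(pipelined)              # [90, 130, 170, 210]; last load done at 3.5 hours
print(pipelined[-1] / tasks)  # 52.5 minutes per task on average
```

Task A still finishes at 90 minutes, while every later load is paced by the 40-minute dryer, which is the "slowest stage" observation in numbers.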

  30. Pipeline Elsewhere: Assembly Line (cola, auto)

  31. What exactly is pipelining in computer arch?

  32. Pipelining • An implementation technique whereby multiple instructions are overlapped in execution; e.g., B washes while A dries. • Essence: start executing one instruction before completing the previous one. • Significance: makes fast CPUs. [Figure: tasks A and B overlapping.]

  33. (Ideal) Balanced Pipeline • Equal-length pipe stages; e.g., wash, dry, fold = 40 mins each; unpipelined laundry time = 40 x 3 mins; 3 pipe stages: wash, dry, fold. [Pipeline diagram, 40 mins per slot: T1: A; T2: B, A; T3: C, B, A; T4: D, C, B.]

  34. Balanced Pipeline • Equal-length pipe stages; e.g., wash, dry, fold = 40 mins each; unpipelined laundry time = 40 x 3 mins; 3 pipe stages: wash, dry, fold. [Pipeline diagram, 40 mins per slot: T1: A; T2: B, A; T3: C, B, A; T4: D, C, B.]

  35. Balanced Pipeline • Equal-length pipe stages; e.g., wash, dry, fold = 40 mins each; unpipelined laundry time = 40 x 3 mins; 3 pipe stages: wash, dry, fold. [Pipeline diagram, 40 mins per slot: T1: A; T2: B, A; T3: C, B, A; T4: D, C, B.]

  36. Balanced Pipeline: one task/instruction per 40 mins • Equal-length pipe stages; e.g., wash, dry, fold = 40 mins each; unpipelined laundry time = 40 x 3 mins; 3 pipe stages: wash, dry, fold. • Performance: Time per instruction (pipelined) = (Time per instruction on unpipelined machine) / (Number of pipe stages); Speedup from pipelining = Number of pipe stages. [Pipeline diagram, 40 mins per slot: T1: A; T2: B, A; T3: C, B, A; T4: D, C, B.]

  37. Pipelining Terminology • Latency: the time for an instruction to complete. • Throughput of a CPU: the number of instructions completed per second. • Clock cycle: the duration of one lockstep; everything in the CPU moves in lockstep. • Processor cycle: the time required between moving an instruction one step down the pipeline; = the time required to complete a pipe stage; = the maximum over all stages of the stage completion time; = one or two clock cycles, but rarely more. • CPI: clock cycles per instruction.
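The ideal speedup "= number of pipe stages" is a steady-state figure; for a finite number of tasks the pipeline first has to fill. A minimal sketch under the balanced-pipeline assumption (equal stage times, no stalls); the helper name `balanced_pipeline_times` is hypothetical:

```python
def balanced_pipeline_times(num_tasks, num_stages, stage_time):
    """Return (unpipelined_total, pipelined_total) for equal-length stages.

    Assumption: a perfectly balanced pipeline with no stalls.
    """
    unpipelined = num_tasks * num_stages * stage_time
    # The first task needs num_stages slots; each later task finishes one slot after the previous.
    pipelined = (num_stages + num_tasks - 1) * stage_time
    return unpipelined, pipelined


for n in (4, 1000):
    unpip, pip = balanced_pipeline_times(n, num_stages=3, stage_time=40)
    print(n, unpip, pip, round(unpip / pip, 3))
# 4 480 240 2.0
# 1000 120000 40080 2.994
```

With only four loads the speedup is 2, and it approaches the stage count of 3 as the number of tasks grows.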
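To make the terms concrete, here is a small calculation. The per-stage times are invented for illustration (they do not come from the slides), and the throughput figure assumes a full pipeline with no stalls.

```python
# Hypothetical stage times in nanoseconds (illustrative values only).
stage_times_ns = {"IF": 10, "ID": 8, "EX": 10, "MEM": 10, "WB": 7}

# Processor cycle: the slowest stage sets the pace for every stage.
processor_cycle_ns = max(stage_times_ns.values())

# Latency of one instruction when every stage takes one processor cycle.
pipelined_latency_ns = processor_cycle_ns * len(stage_times_ns)

# Latency if the same work ran back to back without pipelining.
unpipelined_latency_ns = sum(stage_times_ns.values())

# Steady-state throughput: one instruction completes per processor cycle.
throughput_per_second = 1e9 / processor_cycle_ns

print(processor_cycle_ns)      # 10
print(pipelined_latency_ns)    # 50
print(unpipelined_latency_ns)  # 45
print(throughput_per_second)   # 100000000.0
```

In this example the latency of a single instruction is slightly worse when pipelined (50 ns versus 45 ns), while throughput improves, which mirrors the cafeteria and laundry observations.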

  38. How does pipelining work?

  39. Example: RISC Architecture

  40. RISC: Reduced Instruction Set Computer. Properties: • All operations on data apply to data in registers and typically change the entire register (32 or 64 bits per register); • Only load and store operations affect memory; load: move data from memory to a register; store: move data from a register to memory; • Only a few instruction formats, all of fixed length.

  41. RISC: Reduced Instruction Set Computer. 32 registers; 3 classes of instructions: • ALU (Arithmetic Logic Unit) instructions • Load (LD) and store (SD) instructions • Branches and jumps

  42. ALU Instructions • ALU (Arithmetic Logic Unit) instructions operate on two registers or on a register plus a sign-extended immediate, and store the result into a third register; e.g., add (DADD), subtract (DSUB), and the logical operations AND and OR.

  43. Load and Store Instructions • Load (LD) and store (SD) instructions; operands: base register + offset; the sum (called the effective address) is used as a memory address. Load: uses a second register operand as the destination for the data loaded from memory. Store: uses a second register operand as the source of the data stored into memory.

  44. Branches and Jumps • Conditional transfers of control. • Branch: specifies the branch condition with a set of condition bits, or with comparisons between two registers or between a register and zero; decides the branch destination by adding a sign-extended offset to the current PC (program counter).
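The effective-address rule for loads and stores can be sketched as follows. The 16-bit offset width, the dictionary used as memory, and the helper names are assumptions made for this illustration, not details given in the slides.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def effective_address(regs, base_reg, offset16):
    """Effective address = base register + sign-extended offset (16 bits assumed)."""
    return regs[base_reg] + sign_extend(offset16, 16)


def store(regs, memory, src_reg, base_reg, offset16):
    """Store: the second register operand is the source of the data."""
    memory[effective_address(regs, base_reg, offset16)] = regs[src_reg]


def load(regs, memory, dest_reg, base_reg, offset16):
    """Load: the second register operand is the destination of the data."""
    regs[dest_reg] = memory[effective_address(regs, base_reg, offset16)]


regs = [0] * 32   # 32 registers, as on the RISC slide
memory = {}       # toy memory: address -> value
regs[1] = 1000    # base address
regs[2] = 42      # value to store

store(regs, memory, src_reg=2, base_reg=1, offset16=0xFFF8)  # offset -8 -> address 992
load(regs, memory, dest_reg=3, base_reg=1, offset16=0xFFF8)

print(memory)   # {992: 42}
print(regs[3])  # 42
```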
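A branch destination computed as "PC + sign-extended offset" might look like the sketch below. It follows the slide's wording literally; real instruction sets often scale the offset or add it to the already incremented PC, so treat the 16-bit width, the branch-if-equal condition, and the function names as illustrative assumptions.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, offset16):
    """Branch destination: current PC plus the sign-extended offset."""
    return pc + sign_extend_16(offset16)


def next_pc_branch_if_equal(pc, reg_a, reg_b, offset16):
    """Compare two register values; take the branch only if they are equal."""
    if reg_a == reg_b:
        return branch_target(pc, offset16)
    return pc + 4  # fall through to the next 4-byte instruction


print(next_pc_branch_if_equal(100, 5, 5, 0xFFF0))  # 84  (taken, offset -16)
print(next_pc_branch_if_equal(100, 5, 6, 0xFFF0))  # 104 (not taken)
```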

  45. Finally, RISC’s 5-Stage Pipeline

  46. RISC’s 5-Stage Pipeline: at most 5 clock cycles per instruction (IF, ID, EX, MEM, WB)

  47. Stage 1: IF (cycle 1 of IF, ID, EX, MEM, WB) • Instruction Fetch cycle: send the PC to memory; fetch the current instruction from memory; PC = PC + 4 (each instruction is 4 bytes).

  48. Stage 2: ID (cycle 2 of IF, ID, EX, MEM, WB) • Instruction Decode/register fetch cycle: decode the instruction; read the registers corresponding to the register source specifiers.

  49. Stage 3: EX (cycle 3 of IF, ID, EX, MEM, WB) • Execution/effective address cycle: the ALU operates on the operands from ID, performing one of 3 functions depending on the instruction type. Function 1, memory reference: the ALU adds the base register and the offset to form the effective address.

  50. Stage 3: EX (cycle 3 of IF, ID, EX, MEM, WB) • Execution/effective address cycle: the ALU operates on the operands from ID, performing one of 3 functions depending on the instruction type. Function 2, register-register ALU instruction: the ALU performs the operation specified by the opcode on the values read from the register file.
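The IF step can be sketched in a few lines. The instruction memory is modelled as a dictionary keyed by byte address, and the two instruction words are placeholder values; both are assumptions for illustration.

```python
def instruction_fetch(pc, instr_memory):
    """Fetch the instruction at pc and return it with the updated PC."""
    instruction = instr_memory[pc]  # send the PC to memory, fetch the instruction
    return instruction, pc + 4      # each instruction is 4 bytes


# Toy instruction memory: byte address -> 32-bit instruction word (placeholder values).
instr_memory = {0: 0x00221820, 4: 0x8C240008}

pc = 0
first, pc = instruction_fetch(pc, instr_memory)
second, pc = instruction_fetch(pc, instr_memory)

print(hex(first), hex(second), pc)  # 0x221820 0x8c240008 8
```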
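Decoding and register fetch could look like the sketch below. The slides do not give a bit layout, so the field positions here borrow the classic MIPS-style 32-bit format (6-bit opcode, three 5-bit register specifiers, 16-bit immediate) purely as an assumption; the function names are hypothetical too.

```python
def decode(instruction):
    """Split a 32-bit word into fields (MIPS-style layout, assumed for illustration)."""
    return {
        "opcode": (instruction >> 26) & 0x3F,
        "rs": (instruction >> 21) & 0x1F,  # first source register specifier
        "rt": (instruction >> 16) & 0x1F,  # second source register specifier
        "rd": (instruction >> 11) & 0x1F,  # destination register specifier
        "imm": instruction & 0xFFFF,       # 16-bit immediate/offset field
    }


def instruction_decode(instruction, regs):
    """Decode the instruction and read the registers named by its source specifiers."""
    fields = decode(instruction)
    return fields, regs[fields["rs"]], regs[fields["rt"]]


regs = [0] * 32
regs[1], regs[2] = 7, 5

fields, a, b = instruction_decode(0x00221820, regs)
print(fields["opcode"], fields["rs"], fields["rt"], fields["rd"])  # 0 1 2 3
print(a, b)                                                        # 7 5
```

Because the register specifiers sit at fixed positions in this layout, the registers can be read while the opcode is still being decoded.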
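The two EX functions listed so far can be sketched as a single dispatch. The `kind` labels, the operation names, and the `execute` signature are hypothetical; only the memory-reference and register-register cases described above are covered.

```python
import operator

# Register-register operations named on the ALU slide (add, subtract, AND, OR).
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def execute(kind, a, b=0, imm=0, op=None):
    """EX stage sketch: choose the ALU function by instruction type."""
    if kind == "mem":
        # Memory reference: base register value + (already sign-extended) offset.
        return a + imm
    if kind == "alu_rr":
        # Register-register ALU: apply the operation selected by the opcode.
        return ALU_OPS[op](a, b)
    raise ValueError(f"unsupported instruction kind: {kind}")


print(execute("mem", a=1000, imm=-8))          # 992
print(execute("alu_rr", a=7, b=5, op="add"))   # 12
print(execute("alu_rr", a=7, b=5, op="and"))   # 5
```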
