1 / 95

CS2100 Computer Organisation http://www.comp.nus.edu.sg/~cs2100/

CS2100 Computer Organisation http://www.comp.nus.edu.sg/~cs2100/. Pipelining (AY2013/2014) Semester 2. Road Map: Part II. Performance. Assembly Language. Processor: Datapath. Pipelining MIPS Pipeline Pipeline datapath & control Pipeline hazards Branch prediction. Processor:

kassia
Download Presentation

CS2100 Computer Organisation http://www.comp.nus.edu.sg/~cs2100/

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS2100 Computer Organisationhttp://www.comp.nus.edu.sg/~cs2100/ Pipelining (AY2013/2014) Semester 2

  2. Road Map: Part II Performance Assembly Language Processor: Datapath • Pipelining • MIPS Pipeline • Pipeline datapath & control • Pipeline hazards • Branch prediction Processor: Control Pipelining Cache Pipelining

  3. Pipelining: Overview • MIPS Pipeline Stages • Pipelined Datapath and Control • Pipeline Hazards • Structural • Data • Control • Forwarding • Branch prediction Pipelining

  4. Inspiration: Assembly Line Pipelining

  5. A B C D The Laundary Example • Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, fold and stash • Washertakes 30 minutes • Dryer takes 30 minutes • “Folder” takes 30 minutes • “Stasher” takes 30 minutesto put clothes into drawers Pipelining

  6. 2 AM 12 6 PM 1 8 7 11 10 9 T a s k O r d e r 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 Time A B C D Sequential Laundry • Sequential laundry takes 8 hours for 4 loads • Steady state: 1 load every 2 hours • If they learned pipelining, how long would laundry take? Pipelining

  7. 2 AM 12 6 PM 1 8 7 11 10 9 Time T a s k O r d e r 30 30 30 30 30 30 30 A B C D Pipelined Laundry • Pipelined laundry takes 3.5 hours for 4 loads! • Steady state: 1 load every 30 min • Potential speedup = 2 hr/30 min = 4 (# stages) • Time to fill pipeline takes 2 hours  speedup  Pipelining

  8. 2 AM 12 6 PM 1 8 7 11 10 9 T a s k O r d e r Time 30 30 30 30 30 30 30 30 30 30 30 A B C D What IF: Slow Dryer • Pipelined laundry now takes 5.5 hours! • Steady state: One load every 1 hr (dryer speed) • Pipeline rate is limited by the slowest stage Pipelining

  9. 2 AM 12 6 PM 1 8 7 11 10 9 T a s k O r d e r Time 30 30 30 30 30 30 30 30 A Dependency B Stall C D What IF: Dependency • Brian is using the laundry for the first time; he wants to see the outcome of one wash + dry cycle first before putting in his clothes • Pipelined laundry now takes 4 hours Pipelining

  10. Pipelining Lessons Pipelining • Pipelining doesn’t help latencyof single task: • It helps the throughputof entire workload • Multiple tasks operating simultaneously using different resources • Possible delays: • Pipeline rate limited by slowest pipeline stage • Stall for dependencies ( more on this later )

  11. Datapath and Control MIPS Pipeline Pipelining

  12. MIPS Pipeline Stages (1/2) • FiveExecution Stages • IF: Instruction Fetch • ID: Instruction Decode and Register Read • EX: Execute an operation or calculate an address • MEM: Access an operand in data memory • WB: Write back the result into a register • Idea: • Each execution stage takes 1 clock cycle • General flow of data is from one stage to the next • Exceptions: • Update of PC and write back of register file – more about this later… Pipelining

  13. MIPS Pipeline Stages (2/2) Pipelining

  14. Time IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB Program Flow (Instructions) IF ID EX MEM WB Pipelined Execution: Illustration Several instructions are in the pipeline simultaneously! Pipelining

  15. MIPS Pipeline: Datapath(1/3) • Single-cycle implementation: • Update all state elements (PC, register file, data memory) at the end of a clock cycle • Pipelined implementation: • One cycle per pipeline stage • Data required for each stage needs to be stored separately (why?) Pipelining

  16. MIPS Pipeline: Datapath(2/3) • Data used by subsequent instructions: • Store in programmer-visible state elements: PC, register file and memory • Data used by same instruction in later pipeline stages: • Additional registers in datapath called pipeline registers • IF/ID: register between IF and ID • ID/EX: register between ID and EX • EX/MEM: register between EX and MEM • MEM/WB: register between MEM and WB • Why no register at the end of WB stage? Pipelining

  17. MIPS Pipeline: Datapath(3/3) Coloured rectangles are pipeline registers Pipelining

  18. Pipeline Datapath: IF Stage • At the end of a cycle, IF/ID receives (stores): • Instruction read from DataMemory[ PC ] • PC + 4 • PC + 4 • Also connected to one of the MUX's inputs (another coming later) Pipelining

  19. Pipeline Datapath: IDStage Pipelining

  20. Pipeline Datapath: EX Stage Pipelining

  21. Pipeline Datapath: MEMStage Pipelining

  22. Pipeline Datapath: WB Stage Pipelining

  23. Corrected Datapath (1/2) • Observe the “Write register” number • Supplied by the IF/ID pipeline register • It is NOT the correct write register for the instruction now in WB stage! • Solution: • Pass “Write register” number from ID/EX through EX/MEM to MEM/WB pipeline register for use in WB stage • i.e. let the "Write register" number follows the instruction through the pipeline until it is needed in WB stage Pipelining

  24. Corrected Datapath (2/2) Pipelining

  25. Pipeline Control: Main Idea • Same control signals as single-cycle datapath • Difference: Each control signal belongs to a particular pipeline stage Pipelining

  26. Pipeline Control: Try it! Try associating each control signal with the correct pipeline stage. Pipelining

  27. Pipeline Control: Grouping Pipelining Group control signals according to pipeline stage

  28. Pipeline Control: Implementation Signal used in EX stage and onwards Signal generated in ID stage Signal propagated along until utilized Pipelining

  29. MIPS Pipeline: Datapath and Control Pipelining

  30. Exercise: Pipeline Control Signals • Indicate the control signals utilized in each cycle for the following sequence of instructions: add, sub, lw, sw Pipelining 

  31. pipeline performance Pipelining

  32. Single Cycle Processor: Performance Pipelining • Cycle time: • = • = Time for operation in stage k • N = Number of stages • Total Execution Time for I instructions: • = =

  33. Single Cycle Processor: Example • Cycle Time • Choose the longest total time = 8ns • To execute 100 instructions: • 100 x 8ns = 800ns Pipelining

  34. Multi-Cycle Processor: Performance Pipelining • Cycle time: • = • Total Execution Time for I instructions: • = = • Average CPI is needed because each instruction take different number of cycles to finish

  35. Multi-Cycle Processor: Example • Cycle Time: • Choose the longest stage time = 2ns • To execute 100 instructions, with a given average CPI of 4.6 • 100 x 4.6 x 2ns = 920ns Pipelining

  36. Pipeline Processor: Performance Pipelining • Cycle Time: • = • = Overhead for pipelining, e.g. pipeline register • Cycles needed for I instructions: • Total Time needed for I instructions : • = =

  37. Pipeline Processor: Example • Cycle Time • assume pipeline register delay of 0.5ns • longest stage time + overhead = 2 + 0.5 = 2.5ns • To execute 100 instructions: • (100 + 5 – 1) x 2.5ns = 260ns Pipelining

  38. Pipeline Processor: Ideal Speedup Pipelining • Assumptions for ideal case: • Every stage takes the same amount of time:  = • No pipeline overhead  • Number of instructions I, is much larger than number of stages, N • Note: The above also shows how pipeline processor loses performance

  39. Pipeline Processor: Ideal Speedup Conclusion: Pipeline processor can gain N times speedup, where N is the number of pipeline stages Pipelining = =

  40. Cycle 1 Cycle 2 Clk Load Store Waste Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk IF ID EX MEM WB IF ID EX MEM IF Load Store R-type IF ID EX MEM WB Load Store IF ID EX MEM WB R-type IF ID EX MEM WB Summary for different implementations Single-Cycle Multi-Cycle Pipeline Pipelining

  41. Review Question • Given this code: add $t0, $s0, $s1 sub $t1, $s0, $s1 sll $t2, $s0, 2 srl $t3, $s1, 2 How many cycles will it take to execute the code on a single-cycle datapath? How long will it take to execute the code on a single-cycle datapath, assuming a 100 MHz clock? How many cycles will it take to execute the code on a 5-stage MIPS pipeline? How long will it take to execute the code on a 5-stage MIPS pipeline? Assume pipeline is completely balanced and the single-cycle datapath could support 100MHz clock frequency. Pipelining 

  42. Grinding to a halt….. pipeline hazards Pipelining

  43. Pipeline Hazards • Speedup from pipeline implementation: • Based on the assumption that a new instructions can be "pumped" into pipeline every cycle • However, there are pipeline hazards • Problems that prevent next instruction from immediately following previous instruction • Structural hazards: • Simultaneous use of a hardware resource • Data hazards: • Data dependencies between instructions • Control hazards: • Change in program flow Instruction Dependencies Pipelining

  44. Graphical Notation for Pipeline Pipelining Horizontal = the stages of an instruction Vertical = the instructions in different pipeline stages

  45. Mem ALU Mem Mem Reg Reg ALU Mem Mem Reg Reg ALU ALU Structure Hazard: Example Time (clock cycles) Load and Inst3 access memory in the same cycle! Reg Mem Reg Load Inst 1 Instruction Order Inst 2 Mem Mem Reg Reg Inst 3 Pipelining If there is only a single memory module:

  46. Mem ALU Mem Mem Reg Reg ALU Mem Mem Reg Reg ALU Mem Mem Reg Reg ALU Bubble Bubble Bubble Bubble Solution 1: Stall the Pipeline Time (clock cycles) Delay (Stall) Inst 3 for 1 cycle Reg Mem Load Reg Inst 1 Inst 2 Instruction Order Stall Bubble Inst 3 Pipelining

  47. Mem ALU Mem Mem Reg Reg ALU Mem Mem Reg Reg ALU ALU Solution 2: Separate Memory Time (clock cycles) Load uses Data Memory Reg Mem Reg Load Inst 1 Instruction Order Inst 2 Mem Mem Reg Reg Inst 3 Inst3 uses Instr. Memory Pipelining • Split memory into: • Data and Instruction memory

  48. ALU Mem Mem ALU Mem Reg Reg Reg ALU Mem Reg Mem ALU Quiz: Is there another conflict? Time (clock cycles) Inst 0 Mem Reg Reg Inst 1 Instruction Order Inst 2 Inst 3 Mem Reg Mem Reg Pipelining

  49. Instruction Dependencies Pipelining • Instructions can have relationship that prevent pipeline execution: • Although a partial overlap maybe possible in some cases • When different instructions accesses (read/write) the same register • Register contention is the cause of dependency • Known as data dependency • When the execution of an instruction depends on another instruction • Control flow is the cause of dependency • Known as control dependency • Failure to handle dependencies can affect program correctness!

  50. Data Dependency: RAW i1:add $1, $2, $3 #writes to $1 i2: sub $4, $1, $5 #reads from $1 • Effect of incorrect execution: • If i2 reads register $1 before i1 can write back the result, i2 will get a stale result (old result) Pipelining • "Read-After-Write" Definition: • Occurs when a later instruction reads from the destination register written by an earlier instruction • Also known as true data dependency

More Related