
Introduction to Pipelined Datapath

This presentation introduces the concept of pipelined datapath in computer architecture and assembly language. It covers the advantages and disadvantages of single cycle and multicycle implementations, as well as the basics of pipelining. The MIPS processor is used as an example to illustrate the pipelined datapath.

brookew


  1. 14:332:331 Computer Architecture and Assembly Language, Spring 2005, Week 11: Introduction to Pipelined Datapath [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

  2. Heads Up • Reminders • Pipelined datapath and control • HW#6 will be handed out soon

  3. Review: Multicycle Data and Control Path [Figure: multicycle datapath driven by a control FSM that decodes Instr[31-26]. Control signals: PCWriteCond, PCWrite, PCSource, IorD, ALUOp, MemRead, MemWrite, ALUSrcA, ALUSrcB, MemtoReg, RegWrite, IRWrite, RegDst. Datapath elements: PC, Memory (instruction or data), IR, MDR, Register File, A and B registers, Sign Extend, Shift left 2, ALU control (Instr[5-0]), ALU with zero output, ALUout.]

  4. Review: RTL Summary

  5. Review: Multicycle Datapath FSM • Unless otherwise assigned: PCWrite, IRWrite, MemWrite, RegWrite = 0; others = X • State 0, Instr Fetch (Start): IorD=0; MemRead; IRWrite; ALUSrcA=0; ALUSrcB=01; PCSource, ALUOp=00; PCWrite • State 1, Decode: ALUSrcA=0; ALUSrcB=11; ALUOp=00; PCWriteCond=0 • State 2, Execute (Op = lw or sw): ALUSrcA=1; ALUSrcB=10; ALUOp=00; PCWriteCond=0 • State 3, Memory Access (Op = lw): MemRead; IorD=1; PCWriteCond=0 • State 4, Write Back: RegDst=0; RegWrite; MemtoReg=1; PCWriteCond=0 • State 5, Memory Access (Op = sw): MemWrite; IorD=1; PCWriteCond=0 • State 6, Execute (Op = R-type): ALUSrcA=1; ALUSrcB=00; ALUOp=10; PCWriteCond=0 • State 7, Memory Access step (R-type): RegDst=1; RegWrite; MemtoReg=0; PCWriteCond=0 • State 8, Execute (Op = beq): ALUSrcA=1; ALUSrcB=00; ALUOp=01; PCSource=01; PCWriteCond • State 9, Execute (Op = j): PCSource=10; PCWrite

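  6. Review: FSM Implementation [Figure: combinational control logic takes the opcode bits Op5 to Op0 (Inst[31-26]) and the current state as inputs, and produces the next state plus the outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSourceB, ALUSourceA, RegWrite, RegDst. A state register clocked by the system clock holds the current state.]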

  7. Single Cycle Disadvantages & Advantages • Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction • Is wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared within a clock cycle • But it is simple and easy to understand [Figure: single cycle implementation timing. lw fills Cycle 1; sw finishes early in Cycle 2 and the rest of that cycle is wasted.]

  8. Multicycle Advantages & Disadvantages • Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step • balance the amount of work to be done in each step • restrict each step to use only one major functional unit • Multicycle implementations allow • functional units to be used more than once per instruction, as long as they are used on different clock cycles • faster clock rates • different instructions to take a different number of clock cycles • But it requires additional internal state registers, muxes, and more complicated (FSM) control

  9. The Five Stages of the Load Instruction • IFetch: Instruction Fetch and Update PC • Dec: Register Fetch and Instruction Decode • Exec: Execute R-type; calculate memory address • Mem: Read/write the data from/to the Data Memory • WB: Write the data back to the register file [Figure: lw occupies IFetch, Dec, Exec, Mem, WB in Cycles 1 to 5.]

  10. Single Cycle vs. Multiple Cycle Timing • The multicycle clock period is longer than 1/5th of the single cycle clock period because of stage flip-flop overhead [Figure: Single Cycle Implementation: lw fills Cycle 1; sw finishes early in Cycle 2 and the rest is wasted. Multiple Cycle Implementation: lw takes Cycles 1 to 5 (IFetch, Dec, Exec, Mem, WB), sw takes Cycles 6 to 9 (IFetch, Dec, Exec, Mem), and the R-type instruction starts its IFetch in Cycle 10.]

  11. Pipelined MIPS Processor • Start the next instruction while still working on the current one • improves throughput: the total amount of work done in a given time • instruction latency (execution time, delay time, response time), the time from the start of an instruction to its completion, is not reduced [Figure: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB across Cycles 1 to 8, with each instruction starting one cycle after the previous one.]
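
The overlap on this slide can be sketched in a few lines of Python. This is only an illustration of an ideal five-stage pipeline with no hazards; the function names `pipeline_schedule` and `format_schedule` are made up for this sketch and are not from the slides.

```python
# Sketch: which stage each instruction occupies in each clock cycle of an
# ideal five-stage pipeline (assumes no hazards, one instruction issued per cycle).
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def pipeline_schedule(instructions):
    """Return {instruction: {cycle: stage}} with cycles numbered from 1.

    Instruction names are assumed to be unique (they are used as dict keys).
    """
    schedule = {}
    for issue_slot, name in enumerate(instructions):
        # The instruction in position issue_slot enters IFetch in cycle issue_slot + 1.
        schedule[name] = {
            issue_slot + 1 + offset: stage
            for offset, stage in enumerate(STAGES)
        }
    return schedule


def format_schedule(schedule):
    """Render the schedule as a text table, one row per instruction."""
    last_cycle = max(max(cycles) for cycles in schedule.values())
    cycle_range = range(1, last_cycle + 1)
    rows = ["instr   " + "".join(f"{'C' + str(c):<8}" for c in cycle_range)]
    for name, cycles in schedule.items():
        cells = "".join(f"{cycles.get(c, ''):<8}" for c in cycle_range)
        rows.append(f"{name:<8}" + cells)
    return "\n".join(rows)


if __name__ == "__main__":
    print(format_schedule(pipeline_schedule(["lw", "sw", "R-type"])))
```

Each row is shifted one cycle to the right of the row above it, so the three instructions finish in seven cycles even though each one still takes five cycles from start to finish.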

  12. Single Cycle, Multiple Cycle, vs. Pipeline [Figure: three timelines. Single Cycle Implementation: Load fills Cycle 1; Store finishes early in Cycle 2 and the rest is wasted. Multiple Cycle Implementation: lw takes Cycles 1 to 5 (IFetch, Dec, Exec, Mem, WB), sw takes Cycles 6 to 9 (IFetch, Dec, Exec, Mem), and R-type starts its IFetch in Cycle 10. Pipeline Implementation: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB, overlapped and offset by one cycle; one slot is labeled "wasted cycle".]
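
A rough way to compare the three implementations numerically is sketched below. The stage latencies in `STAGE_PS` are assumed values chosen for illustration, not figures from the slides, and the function names are hypothetical. The cycle counts per instruction (lw 5, sw 4, R-type 4) follow the multicycle timeline above.

```python
# Sketch: total time for a short program under the three implementations.
# ASSUMPTION: the per-stage latencies below (in picoseconds) are invented
# for illustration only.
STAGE_PS = {"IFetch": 200, "Dec": 100, "Exec": 200, "Mem": 200, "WB": 100}

# Steps each instruction class actually needs.
STAGES_USED = {
    "lw": ["IFetch", "Dec", "Exec", "Mem", "WB"],
    "sw": ["IFetch", "Dec", "Exec", "Mem"],
    "R-type": ["IFetch", "Dec", "Exec", "WB"],
}


def single_cycle_ps(program):
    # The clock must fit the slowest instruction (here lw), so every
    # instruction pays that full period.
    clock = max(sum(STAGE_PS[s] for s in stages) for stages in STAGES_USED.values())
    return len(program) * clock


def multicycle_ps(program, register_overhead_ps=0):
    # The clock fits the slowest step; each instruction takes as many
    # cycles as it has steps, and instructions do not overlap.
    clock = max(STAGE_PS.values()) + register_overhead_ps
    return sum(len(STAGES_USED[op]) for op in program) * clock


def pipelined_ps(program, register_overhead_ps=0):
    # Same clock as multicycle, but instructions overlap: the first one
    # fills the pipeline, then one instruction completes per cycle.
    clock = max(STAGE_PS.values()) + register_overhead_ps
    return (len(STAGE_PS) + len(program) - 1) * clock


if __name__ == "__main__":
    program = ["lw", "sw", "R-type"]
    for name, fn in [("single cycle", single_cycle_ps),
                     ("multicycle", multicycle_ps),
                     ("pipelined", pipelined_ps)]:
        print(f"{name}: {fn(program)} ps")
```

With these assumed latencies the multicycle version is not faster than the single cycle one for this three-instruction program; the gain comes from overlapping instructions, which only the pipelined version does.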

  13. Pipelining the MIPS ISA • What makes it easy • all instructions are the same length (32 bits) • few instruction formats (three) with symmetry across formats • memory operations can occur only in loads and stores • operands must be aligned in memory so a single data transfer requires only one memory access • What makes it hard • structural hazards: what if we had only one memory? • control hazards: what about branches? • data hazards: what if an instruction’s input operands depend on the output of a previous instruction?

  14. MIPS Pipeline Datapath Modifications • What do we need to add/modify in our MIPS datapath? • State registers between pipeline stages to isolate them [Figure: datapath divided into the IFetch, Dec, Exec, Mem, and WB stages by the state registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB, all driven by the system clock. Elements: PC, two adders (one with the constant 4 as an input), Instruction Memory, Register File, Sign Extend (16 to 32 bits), Shift left 2, ALU, Data Memory, and multiplexers.]

  15. MIPS Pipeline Control Path Modifications • All control signals are determined during Decode • and held in the state registers between pipeline stages [Figure: the pipelined datapath of the previous slide (stages IFetch, Dec, Exec, Mem, WB; state registers IFetch/Dec, Dec/Exec, Exec/Mem, Mem/WB) with a Control unit added.]

  16. Graphically Representing MIPS Pipeline • Can help with answering questions like: • How many cycles does it take to execute this code? • What is the ALU doing during cycle 4? • Is there a hazard, why does it occur, and how can it be fixed? [Figure: one instruction drawn as the sequence IM, Reg, ALU, DM, Reg.]

  17. Why Pipeline? For Throughput! • Once the pipeline is full, one instruction is completed every cycle [Figure: Inst 0 through Inst 4 in instruction order against time (clock cycles), each drawn as IM, Reg, ALU, DM, Reg and offset by one cycle. The first cycles are marked as the time to fill the pipeline.]
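
The fill time on this slide can be put into numbers with a small sketch. It assumes an ideal pipeline with no stalls and compares against running the same instructions back to back, each taking one cycle per stage at the same clock; the function names are hypothetical.

```python
# Sketch: cycle counts for an ideal k-stage pipeline (assumes no stalls).


def pipelined_cycles(n_instructions, n_stages=5):
    """Cycles to finish n instructions: fill the pipeline once, then one per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def speedup_over_unpipelined(n_instructions, n_stages=5):
    """Ratio of back-to-back execution (n * k cycles) to pipelined execution."""
    return (n_instructions * n_stages) / pipelined_cycles(n_instructions, n_stages)


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, pipelined_cycles(n), round(speedup_over_unpipelined(n), 3))
```

For the five instructions drawn on the slide this gives 9 cycles instead of 25. As the instruction count grows, the fill time matters less and the ratio approaches the number of stages.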

  18. Can pipelining get us into trouble? • Yes: Pipeline Hazards • structural hazards: attempt to use the same resource by two different instructions at the same time • data hazards: attempt to use an item before it is ready • instruction depends on the result of a prior instruction still in the pipeline • control hazards: attempt to make a decision before the condition is evaluated • branch instructions • Can always resolve hazards by waiting • pipeline control must detect the hazard • take action (or delay action) to resolve hazards

  19. A Unified Memory Would Be a Structural Hazard [Figure: lw followed by Inst 1 through Inst 4 in instruction order against time (clock cycles), each drawn as Mem, Reg, ALU, Mem, Reg and offset by one cycle. Annotations mark one access as "Reading data from memory" and another as "Reading instruction from memory".]
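
The conflict on this slide can be found mechanically. The sketch below assumes the five-stage pipeline from earlier slides with a single shared memory, one instruction issued per cycle, and no stalls; `memory_conflicts` is a hypothetical helper name.

```python
# Sketch: find cycles in which a single (unified) memory would be needed
# both for an instruction fetch and for a load/store data access.
# ASSUMPTIONS: five-stage pipeline, Mem is the fourth stage, one
# instruction is fetched per cycle, and nothing stalls.
MEM_STAGE_OFFSET = 3  # IFetch is offset 0, so Mem is offset 3


def memory_conflicts(instructions):
    """Return a list of (cycle, fetching_index, data_access_index) tuples."""
    conflicts = []
    for index, op in enumerate(instructions):
        if op not in ("lw", "sw"):
            continue  # only loads and stores touch data memory
        # Instruction `index` is fetched in cycle index + 1.
        mem_cycle = index + 1 + MEM_STAGE_OFFSET
        # The instruction being fetched in that same cycle:
        fetching_index = mem_cycle - 1
        if fetching_index < len(instructions):
            conflicts.append((mem_cycle, fetching_index, index))
    return conflicts


if __name__ == "__main__":
    program = ["lw", "inst1", "inst2", "inst3", "inst4"]
    for cycle, fetcher, accessor in memory_conflicts(program):
        print(f"cycle {cycle}: fetch of {program[fetcher]} "
              f"collides with data access of {program[accessor]}")
```

For the sequence on the slide, the data read of lw and the fetch of Inst 3 both land in cycle 4, which is why the pipelined datapath uses separate instruction and data memories.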

  20. How About Register File Access? • Can fix the register file access hazard by doing writes in the first half of the cycle and reads in the second half [Figure: add, Inst 1, Inst 2, add, Inst 4 in instruction order against time (clock cycles), each drawn as IM, Reg, ALU, DM, Reg and offset by one cycle.]

  21. Branch Instructions Cause Control Hazards • Dependencies backward in time cause hazards [Figure: add, beq, lw, Inst 3, Inst 4 in instruction order, each drawn as IM, Reg, ALU, DM, Reg and offset by one cycle.]

  22. One Way to “Fix” a Control Hazard • Can fix the branch hazard by waiting (stall), but this affects throughput [Figure: add, beq, two stall slots, then lw and Inst 3 in instruction order, each instruction drawn as IM, Reg, ALU, DM, Reg.]
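
How much stalling on branches costs can be estimated with a short calculation. The two stall cycles match the two stall slots on this slide; the branch fraction of 20% is an assumed value for illustration only, and the function names are hypothetical.

```python
# Sketch: effect of branch stalls on cycles per instruction (CPI).
# ASSUMPTIONS: ideal CPI of 1 when there are no stalls; every branch
# stalls the pipeline for a fixed number of cycles.


def effective_cpi(branch_fraction, stall_cycles_per_branch, base_cpi=1.0):
    """Average cycles per instruction once branch stalls are included."""
    return base_cpi + branch_fraction * stall_cycles_per_branch


def relative_throughput(branch_fraction, stall_cycles_per_branch):
    """Throughput relative to a pipeline that never stalls (1.0 = no loss)."""
    return 1.0 / effective_cpi(branch_fraction, stall_cycles_per_branch)


if __name__ == "__main__":
    # Assumed workload: 20% branches, two stall cycles per branch.
    print(effective_cpi(0.20, 2))
    print(relative_throughput(0.20, 2))
```

Under these assumptions the pipeline averages about 1.4 cycles per instruction, so it delivers roughly 71% of its ideal throughput.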

  23. Register Usage Can Cause Data Hazards • Dependencies backward in time cause hazards [Figure: the following instructions in instruction order, each drawn as IM, Reg, ALU, DM, Reg and offset by one cycle: add r1,r2,r3 • sub r4,r1,r5 • and r6,r1,r7 • or r8,r1,r9 • xor r4,r1,r5]

  24. One Way to “Fix” a Data Hazard • Can fix the data hazard by waiting (stall), but this affects throughput [Figure: add r1,r2,r3, two stall slots, then sub r4,r1,r5 and and r6,r1,r7 in instruction order, each instruction drawn as IM, Reg, ALU, DM, Reg.]
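
The stall count on this slide can be reproduced with a small model. It assumes no forwarding, the five-stage pipeline from earlier slides, and a register file that is written in the first half of a cycle and read in the second half, so a dependent instruction may be in Dec during the same cycle its producer is in WB. The tuple encoding and the name `count_stalls` are hypothetical.

```python
# Sketch: count stall cycles needed to resolve read-after-write hazards
# by waiting. ASSUMPTIONS: no forwarding; WB is four cycles after IFetch;
# a register written in WB can be read by Dec in that same cycle.
WB_OFFSET = 4  # IFetch is offset 0, WB is offset 4


def count_stalls(program):
    """program: list of (dest_register_or_None, [source_registers])."""
    wb_cycle = {}      # register -> cycle in which its latest producer is in WB
    fetch_cycle = 0
    total_stalls = 0
    for dest, sources in program:
        fetch_cycle += 1                      # earliest slot after the previous instruction
        decode_cycle = fetch_cycle + 1
        needed = max((wb_cycle.get(r, 0) for r in sources), default=0)
        stalls = max(0, needed - decode_cycle)
        total_stalls += stalls
        # Model the stall as delaying this instruction (and all later ones).
        fetch_cycle += stalls
        if dest is not None:
            wb_cycle[dest] = fetch_cycle + WB_OFFSET
    return total_stalls


if __name__ == "__main__":
    program = [
        ("r1", ["r2", "r3"]),   # add r1,r2,r3
        ("r4", ["r1", "r5"]),   # sub r4,r1,r5
        ("r6", ["r1", "r7"]),   # and r6,r1,r7
    ]
    print(count_stalls(program))
```

For the add, sub, and sequence this yields two stall cycles, matching the two stall slots in the figure; once sub has waited, the later and instruction reads r1 without further delay.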

  25. Loads Can Cause Data Hazards • Dependencies backward in time cause hazards [Figure: the following instructions in instruction order, each drawn as IM, Reg, ALU, DM, Reg and offset by one cycle: lw r1,100(r2) • sub r4,r1,r5 • and r6,r1,r7 • or r8,r1,r9 • xor r4,r1,r5]

  26. Stores Can Cause Data Hazards • Dependencies backward in time cause hazards [Figure: the following instructions in instruction order, each drawn as IM, Reg, ALU, DM, Reg and offset by one cycle: add r1,r2,r3 • sw r1,100(r5) • and r6,r1,r7 • or r8,r1,r9 • xor r4,r1,r5]

  27. Other Pipeline Structures Are Possible • What about a (slow) multiply operation? • let it take two cycles • What if the data memory access is twice as slow as the instruction memory? • make the clock twice as slow or … • let data memory access take two cycles (and keep the same clock rate) [Figure: two alternative pipelines, one with an added MUL stage and one with the data memory access split into DM1 and DM2.]

  28. Sample Pipeline Alternatives • ARM7: stages IM, Reg, EX; labeled activities: PC update, IM access, decode, reg access, ALU op, DM access, shift/rotate, commit result (write back) • StrongARM-1: stages IM, Reg, ALU, DM, Reg • XScale: stages IM1, IM2, Reg, SHFT, ALU, DM1, DM2, Reg; labeled activities: PC update, BTB access, start IM access, IM access, decode, reg 1 access, shift/rotate, reg 2 access, ALU op, start DM access, exception, DM write, reg write

  29. Summary • All modern-day processors use pipelining • Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload • Multiple tasks operate simultaneously using different resources • Potential speedup = number of pipe stages • Pipeline rate is limited by the slowest pipeline stage • Unbalanced lengths of pipe stages reduce speedup • Time to “fill” the pipeline and time to “drain” it reduce speedup • Must detect and resolve hazards • Stalling negatively affects throughput
