Seoul National University Pipelined Implementation : Part II
Overview
Make the pipelined processor work!
• Data Hazards
  • An instruction having register R as source follows shortly after another instruction having register R as destination
  • A common condition, don't want to slow down pipeline
• Control Hazards
  • Mispredicted conditional branch
    • Our design predicts all branches as being taken
    • Naïve pipeline executes two extra instructions
  • Getting return address for ret instruction
    • Naïve pipeline executes three extra instructions
• Making Sure It Really Works
  • What if multiple special cases happen simultaneously?

Pipeline Stages
• Fetch
  • Select current PC
  • Read instruction
  • Compute incremented PC
• Decode
  • Read program registers
• Execute
  • Operate ALU
• Memory
  • Read or write data memory
• Write Back
  • Update register file

PIPE- Hardware
• Pipeline registers hold intermediate values from previous stages for the instruction
• Forward (Upward) Paths
  • Values passed from one stage to next
  • Cannot jump past stages
    • e.g., valC passes through decode

Data Dependencies: No Nop

                           1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: addl %edx,%eax            F  D  E  M  W
0x00e: halt                         F  D  E  M  W

Cycle 4
• M: M_valE = 10, M_dstE = %edx
• E: e_valE ← 0 + 3 = 3, E_dstE = %eax
• D: valA ← R[%edx] = 0, valB ← R[%eax] = 0 (Error)

Data Dependencies: 2 Nop's

                           1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: nop                       F  D  E  M  W
0x00d: nop                          F  D  E  M  W
0x00e: addl %edx,%eax                  F  D  E  M  W
0x010: halt                               F  D  E  M  W

Cycle 6
• W: R[%eax] ← 3
• D: valA ← R[%edx] = 10, valB ← R[%eax] = 0 (Error)

Stalling for Data Dependencies

                           1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: nop                       F  D  E  M  W
0x00d: nop                          F  D  E  M  W
bubble                                       E  M  W
0x00e: addl %edx,%eax                  F  D  D  E  M  W
0x010: halt                               F  F  D  E  M  W

• If an instruction follows too closely after another that writes its source register, slow it down
• Hold instruction in decode
• Dynamically inject nop's (i.e., bubbles) into execute stage

Stall Condition
• Source Registers
  • srcA and srcB of the instruction in decode stage
• Destination Registers
  • dstE and dstM fields
  • Instructions in execute, memory, and write-back stages
• Special Case
  • Don't stall for register ID 15 (0xF)
    • Indicates absence of register operand
  • Don't stall for failed conditional move

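The stall condition above can be sketched in a few lines of Python. This is only an illustration, not the processor's actual control logic: the function name `must_stall` and the list-based interface are assumptions made for the example. It also assumes that a failed conditional move is represented by a destination of 0xF, so that it never matches a source.

```python
RNONE = 0xF  # register ID 15: no register operand

def must_stall(src_a, src_b, pending_dests):
    """Return True if the instruction in decode must stall.

    src_a, src_b:   srcA and srcB of the instruction in decode.
    pending_dests:  dstE/dstM IDs of the instructions currently in the
                    execute, memory, and write-back stages (hypothetical
                    interface for this sketch).
    """
    # Ignore register ID 0xF on both sides: it names no register.
    sources = {r for r in (src_a, src_b) if r != RNONE}
    dests = {r for r in pending_dests if r != RNONE}
    return bool(sources & dests)

EAX, EDX = 0, 2  # Y86 register IDs

# addl %edx,%eax in decode while irmovl $3,%eax is still in flight
print(must_stall(EDX, EAX, [EAX, RNONE]))    # True
# no pending writes to %edx or %eax
print(must_stall(EDX, EAX, [RNONE, RNONE]))  # False
```
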
Detecting Stall Condition

                           1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: nop                       F  D  E  M  W
0x00d: nop                          F  D  E  M  W
bubble                                       E  M  W
0x00e: addl %edx,%eax                  F  D  D  E  M  W
0x010: halt                               F  F  D  E  M  W

Cycle 6
• W: W_dstE = %eax, W_valE = 3
• D: srcA = %edx, srcB = %eax

Stalling X3

                           1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
bubble                                 E  M  W
bubble                                    E  M  W
bubble                                       E  M  W
0x00c: addl %edx,%eax            F  D  D  D  D  E  M  W
0x00e: halt                         F  F  F  F  D  E  M  W

• Cycle 4: E_dstE = %eax; D: srcA = %edx, srcB = %eax
• Cycle 5: M_dstE = %eax; D: srcA = %edx, srcB = %eax
• Cycle 6: W_dstE = %eax; D: srcA = %edx, srcB = %eax

What Happens When Stalling?

0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: addl %edx,%eax
0x00e: halt

Stage        Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
Write Back            0x000    0x006    bubble   bubble
Memory       0x000    0x006    bubble   bubble   bubble
Execute      0x006    bubble   bubble   bubble   0x00c
Decode       0x00c    0x00c    0x00c    0x00c    0x00e
Fetch        0x00e    0x00e    0x00e    0x00e

• Stalled instruction held back in decode stage
• Following instruction stays in fetch stage
• Bubbles injected into execute stage
  • Like dynamically generated nop's
  • Move through later stages

Implementing Stalling
• Pipeline Control
  • Combinational logic detects stall condition
  • Sets mode signals for how pipeline registers should be updated

[Figure: pipe control logic block attached to the F, D, E, M, and W pipeline registers. Signals shown: D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Cnd, M_icode, m_stat, W_stat, set_cc, F_stall, D_stall, D_bubble, E_bubble, M_bubble, W_stall]

Pipeline Register Modes
(Register state x, input y; the output shown is the value after the rising clock edge.)

Mode     stall  bubble  Output before clock  Output after clock
Normal   0      0       x                    y
Stall    1      0       x                    x
Bubble   0      1       x                    nop

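The three register modes can be modelled with a small Python class. This is a behavioural sketch only; the class name `PipeReg` and its `clock` method are made up for the example. Setting both stall and bubble is treated as an error, matching the later slide on control combination B.

```python
NOP = "nop"  # stands in for the bubble (nop) state of a pipeline register

class PipeReg:
    """Behavioural model of one pipeline register (hypothetical helper)."""

    def __init__(self, state=NOP):
        self.state = state

    def clock(self, new_input, stall=False, bubble=False):
        """Apply one rising clock edge and return the new output."""
        if stall and bubble:
            raise ValueError("pipeline error: stall and bubble both set")
        if stall:
            pass                    # Stall: keep the old state x
        elif bubble:
            self.state = NOP        # Bubble: replace the state with a nop
        else:
            self.state = new_input  # Normal: load the input y
        return self.state

print(PipeReg("x").clock("y"))               # y
print(PipeReg("x").clock("y", stall=True))   # x
print(PipeReg("x").clock("y", bubble=True))  # nop
```
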
Data Forwarding
• Naïve Pipeline
  • Register isn't written until completion of write-back stage
  • Source operands read from register file in decode stage
• Observation
  • Value to be written to register generated much earlier (in execute or memory stage)
• Trick
  • Pass value directly from execute or memory stage of the generating instruction to decode stage
  • Needs to be available at the end of decode stage

Data Forwarding Example

                           1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: nop                       F  D  E  M  W
0x00d: nop                          F  D  E  M  W
0x00e: addl %edx,%eax                  F  D  E  M  W
0x010: halt                               F  D  E  M  W

Cycle 6
• W: R[%eax] ← 3, W_dstE = %eax, W_valE = 3
• D: srcA = %edx, srcB = %eax; valA ← R[%edx] = 10, valB ← W_valE = 3

• irmovl in write-back stage
• Destination value in W pipeline register
• Forward as valB for decode stage

Bypass Paths
• Decode Stage
  • Forwarding logic selects valA and valB
  • Normally from register file
  • Forwarding: get valA or valB from later pipeline stages
• Forwarding Sources
  • Execute: valE
  • Memory: valE, valM
  • Write back: valE, valM

Data Forwarding Example #2

                           1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx     F  D  E  M  W
0x006: irmovl $3,%eax         F  D  E  M  W
0x00c: addl %edx,%eax            F  D  E  M  W
0x00e: halt                         F  D  E  M  W

Cycle 4
• M: M_dstE = %edx, M_valE = 10
• E: E_dstE = %eax, e_valE ← 0 + 3 = 3
• D: srcA = %edx, srcB = %eax; valA ← M_valE = 10, valB ← e_valE = 3

• Register %edx
  • Value generated by ALU during previous cycle
  • Forward from memory as valA
• Register %eax
  • Value generated by ALU during current cycle
  • Forward from execute as valB

Forwarding Priority

                           1  2  3  4  5  6  7  8  9  10
0x000: irmovl $1, %eax     F  D  E  M  W
0x006: irmovl $2, %eax        F  D  E  M  W
0x00c: irmovl $3, %eax           F  D  E  M  W
0x012: rrmovl %eax, %edx            F  D  E  M  W
0x014: halt                            F  D  E  M  W

Cycle 5
• W: R[%eax] ← 1
• M: R[%eax] ← 2
• E: R[%eax] ← 3
• D: valA ← R[%eax] = ?

• Multiple Forwarding Choices
  • Which one should have priority?
  • Should be same as sequential execution semantics
  • Use matching value from earliest pipeline stage

Implementing Forwarding
• Add additional feedback paths from E, M, and W pipeline registers into decode stage
• Create logic blocks to select from multiple sources for valA and valB in decode stage

Implementing Forwarding

## What should be the A value?
int new_E_valA = [
    # Use incremented PC
    D_icode in { ICALL, IJXX } : D_valP;
    # Forward valE from execute
    d_srcA == e_dstE : e_valE;
    # Forward valM from memory
    d_srcA == M_dstM : m_valM;
    # Forward valE from memory
    d_srcA == M_dstE : M_valE;
    # Forward valM from write back
    d_srcA == W_dstM : W_valM;
    # Forward valE from write back
    d_srcA == W_dstE : W_valE;
    # Use value read from register file
    1 : d_rvalA;
];

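As a rough illustration of how the priority order in this HCL behaves, the following Python sketch walks the forwarding sources from the earliest pipeline stage to the latest and returns the first match. The function name `select_val_a` and the list-of-pairs interface are assumptions for the example, and the ICALL/IJXX case (which selects D_valP) is left out for brevity.

```python
RNONE = 0xF       # no register
EAX, EDX = 0, 2   # Y86 register IDs

def select_val_a(d_src_a, d_rval_a, sources):
    """Pick valA for the decode stage.

    sources is a list of (destination register, value) pairs in priority
    order: e_valE, m_valM, M_valE, W_valM, W_valE. The first matching
    destination wins; otherwise the register-file value is used.
    """
    for dst, value in sources:
        if d_src_a == dst:
            return value
    return d_rval_a

# Forwarding Priority example, cycle 5: rrmovl %eax,%edx is in decode while
# irmovl $3 is in execute, irmovl $2 in memory, and irmovl $1 in write-back.
sources = [
    (EAX, 3),    # e_dstE, e_valE
    (RNONE, 0),  # M_dstM, m_valM
    (EAX, 2),    # M_dstE, M_valE
    (RNONE, 0),  # W_dstM, W_valM
    (EAX, 1),    # W_dstE, W_valE
]
print(select_val_a(EAX, 0, sources))   # 3
print(select_val_a(EDX, 10, sources))  # 10
```

The execute-stage value wins because it belongs to the most recent earlier instruction, which is what sequential execution would have left in %eax.
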
Limitation of Forwarding

                                        1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax  # Load %eax             F  D  E  M  W
0x01e: addl %ebx,%eax  # Use %eax                      F  D  E  M  W
0x020: halt                                               F  D  E  M  W

Cycle 7
• M: M_dstE = %ebx, M_valE = 10
• D: valA ← M_valE = 10, valB ← R[%eax] = 0 (Error)
Cycle 8
• M: M_dstM = %eax, m_valM ← M[128] = 3

• Load-use dependency
  • Value needed by the end of decode stage in cycle 7
  • Value read from memory in memory stage of cycle 8

Avoiding Load/Use Hazard

                                        1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                 F  D  E  M  W
0x006: irmovl $3,%ecx                      F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                   F  D  E  M  W
0x012: irmovl $10,%ebx                           F  D  E  M  W
0x018: mrmovl 0(%edx),%eax  # Load %eax             F  D  E  M  W
bubble                                                       E  M  W
0x01e: addl %ebx,%eax  # Use %eax                      F  D  D  E  M  W
0x020: halt                                               F  F  D  E  M  W

Cycle 8
• W: W_dstE = %ebx, W_valE = 10
• M: M_dstM = %eax, m_valM ← M[128] = 3
• D: valA ← W_valE = 10, valB ← m_valM = 3

• Stall the instruction that uses the loaded value for one cycle
• Then, pick up loaded value by forwarding from memory stage

Detecting Load/Use Hazard
• Condition: E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }

Control for Load/Use Hazard
• Stall instructions in fetch and decode stages
• Inject bubble into execute stage

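A load/use hazard check can be sketched in Python as follows. The condition mirrors the one used in the pipeline control HCL; the string instruction codes and the function names are placeholders chosen for this example, not values taken from the processor.

```python
# Placeholder instruction codes (assumed for this sketch)
IMRMOVL, IPOPL, IOPL = "mrmovl", "popl", "opl"
EAX, EBX = 0, 3  # Y86 register IDs

def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """True when the instruction in execute loads a register from memory
    that the instruction in decode needs as a source."""
    return E_icode in {IMRMOVL, IPOPL} and E_dstM in {d_srcA, d_srcB}

def load_use_actions(hazard):
    """Pipeline register modes for the next clock edge."""
    if hazard:
        return {"F": "stall", "D": "stall", "E": "bubble"}
    return {"F": "normal", "D": "normal", "E": "normal"}

# Cycle 7 of the example: mrmovl 0(%edx),%eax in execute, addl %ebx,%eax in decode
hazard = load_use_hazard(IMRMOVL, EAX, EBX, EAX)
print(hazard)                    # True
print(load_use_actions(hazard))  # {'F': 'stall', 'D': 'stall', 'E': 'bubble'}
```
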
Branch Misprediction Example

0x000:    xorl %eax,%eax
0x002:    jne t              # Not taken
0x007:    irmovl $1, %eax    # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $3, %edx    # Target (Should not execute)
0x017:    irmovl $4, %ecx    # Should not execute
0x01d:    irmovl $5, %edx    # Should not execute

Handling Misprediction

                                        1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                   F  D  E  M  W
0x002: jne target  # Not taken             F  D  E  M  W
0x011: t: irmovl $2,%edx  # Target            F  D  E* M* W*
0x017: irmovl $3,%ebx  # Target+1                F  D* E* M* W*
0x007: irmovl $1,%eax  # Fall through               F  D  E  M  W
0x00d: nop                                             F  D  E  M  W
(* = stage occupied by a bubble instead of the instruction)

• Predict branch as taken
  • Two instructions are fetched from the target
• Cancel when mispredicted
  • Detect branch not-taken in execute stage
  • During the next cycle, replace instructions in execute and decode by bubbles
  • No side effects have occurred yet

Detecting Mispredicted Branch
• Condition: E_icode == IJXX && !e_Cnd

Control for Misprediction
• Inject bubbles into decode and execute stages on the cycle after the branch is found not taken

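The cancellation step can be illustrated with a short Python sketch. It models only what the D and E pipeline registers load on the next clock edge; the function name `next_d_and_e` and the string labels are invented for the example.

```python
IJXX = "jXX"       # placeholder instruction code for conditional jumps
BUBBLE = "bubble"

def next_d_and_e(fetched, in_decode, E_icode, e_Cnd):
    """Return (new D contents, new E contents) for the next cycle.

    fetched:   instruction currently being fetched (would enter D)
    in_decode: instruction currently in decode (would enter E)
    """
    mispredicted = E_icode == IJXX and not e_Cnd
    if mispredicted:
        # Both wrongly fetched target instructions are replaced by bubbles
        # before they can change any programmer-visible state.
        return BUBBLE, BUBBLE
    return fetched, in_decode

# Cycle 4 of the example: jne (not taken) in execute, target in decode,
# target+1 in fetch.
print(next_d_and_e("irmovl $3,%ebx", "irmovl $2,%edx", IJXX, e_Cnd=False))
# ('bubble', 'bubble')
```
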
Return Example

0x000:    irmovl Stack,%esp  # Initialize stack pointer
0x006:    call p             # Procedure call
0x00b:    irmovl $5,%esi     # Return point
0x011:    halt
0x020: .pos 0x20
0x020: p: irmovl $-1,%edi    # procedure
0x026:    ret
0x027:    irmovl $1,%eax     # Should not be executed
0x02d:    irmovl $2,%ecx     # Should not be executed
0x033:    irmovl $3,%edx     # Should not be executed
0x039:    irmovl $4,%ebx     # Should not be executed
0x100: .pos 0x100
0x100: Stack:                # Stack: Stack pointer

• Previously executed three additional instructions

Correct Return Example

0x026: ret                        F  D  E  M  W
bubble                               F  D  E  M  W
bubble                                  F  D  E  M  W
bubble                                     F  D  E  M  W
0x00b: irmovl $5,%esi  # Return               F  D  E  M  W

• W (ret in write-back): valM = 0x0b
• F (fetching the return point): valC ← 5, rB ← %esi

• As ret passes through pipeline, stall at fetch stage
  • While in decode, execute, and memory stage
• Inject bubble into decode stage
• Release stall when ret reaches write-back stage

Detecting Return
• Condition: IRET in { D_icode, E_icode, M_icode }

Control for Return
• Stall fetch stage and inject bubble into decode stage while ret is in decode, execute, or memory

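To see where the three wasted cycles come from, the following Python sketch follows a ret instruction through the decode, execute, memory, and write-back stages and counts how many bubbles are injected. The names are hypothetical and the instruction codes are placeholder strings.

```python
IRET, INOP = "ret", "nop"  # placeholder instruction codes

def ret_control(D_icode, E_icode, M_icode):
    """Control signals while a ret is in decode, execute, or memory."""
    processing_ret = IRET in {D_icode, E_icode, M_icode}
    return {"F_stall": processing_ret, "D_bubble": processing_ret}

bubbles = 0
for position in ["D", "E", "M", "W"]:
    # Only one stage holds the ret; the stages behind it hold bubbles (nops).
    icodes = {stage: (IRET if stage == position else INOP) for stage in "DEM"}
    signals = ret_control(icodes["D"], icodes["E"], icodes["M"])
    bubbles += signals["D_bubble"]

print(bubbles)  # 3
```

Once ret reaches write-back, none of D_icode, E_icode, or M_icode is IRET, so the stall is released and the return point can be fetched.
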
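(Initial) Summary
• Detection
  • Load/use hazard: E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
  • Mispredicted branch: E_icode == IJXX && !e_Cnd
  • Processing ret: IRET in { D_icode, E_icode, M_icode }
• Action (on next cycle)
  • Load/use hazard: stall F and D, bubble E
  • Mispredicted branch: bubble D and E
  • Processing ret: stall F, bubble D
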
Implementing Pipeline Control
• Combinational logic generates pipeline control signals

[Figure: pipe control logic block attached to the F, D, E, M, and W pipeline registers. Signals shown: D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Cnd, M_icode, m_stat, W_stat, set_cc, F_stall, D_stall, D_bubble, E_bubble, M_bubble, W_stall]

Initial Version of Pipeline Control

bool F_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

bool D_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Cnd) ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

bool E_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Cnd) ||
    # Load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

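For experimenting with these equations, the initial control logic can be transcribed into Python roughly as follows. This is a direct but unofficial translation: the function name `pipe_control`, the dictionary return value, and the string instruction codes are assumptions for the sketch.

```python
# Placeholder instruction codes (assumed for this sketch)
IMRMOVL, IPOPL, IJXX, IRET, IOPL, INOP = (
    "mrmovl", "popl", "jXX", "ret", "opl", "nop")
EAX, EBX = 0, 3  # Y86 register IDs

def pipe_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Cnd):
    """Initial (uncorrected) pipeline control signals."""
    load_use = E_icode in {IMRMOVL, IPOPL} and E_dstM in {d_srcA, d_srcB}
    mispredict = E_icode == IJXX and not e_Cnd
    ret = IRET in {D_icode, E_icode, M_icode}
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        "D_bubble": mispredict or ret,
        "E_bubble": mispredict or load_use,
    }

# Load/use hazard: mrmovl ...,%eax in execute, addl %ebx,%eax in decode
signals = pipe_control(D_icode=IOPL, E_icode=IMRMOVL, M_icode=INOP,
                       E_dstM=EAX, d_srcA=EBX, d_srcB=EAX, e_Cnd=False)
print(signals)
```

For the load/use case shown, F_stall, D_stall, and E_bubble come out true and D_bubble false, matching the stall-F, stall-D, bubble-E action described earlier.
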
Control Combinations
• Special cases that can arise during the same clock cycle
• Combination A
  • Not-taken branch
  • ret instruction at branch target
• Combination B
  • Instruction that reads from memory to %esp
  • Followed by ret instruction

Control Combination A

[Figure: mispredicted JXX in execute stage with ret in decode stage]

• Should handle as mispredicted branch
• Stalls F pipeline register
• But PC selection logic will be using M_valA anyhow

Control Combination B

[Figure: load in execute stage with ret (the use) in decode stage]

• Would attempt to bubble and stall pipeline register D
• Signaled by processor as pipeline error

Handling Control Combination B

[Figure: load in execute stage with ret (the use) in decode stage]

• Load/use hazard should get priority
• ret instruction should be held in decode stage for additional cycle

Corrected Pipeline Control Logic

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Cnd) ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode }
    # but not for a load/use hazard
    && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });

• Load/use hazard should get priority
• ret instruction should be held in decode stage for additional cycle

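The effect of the correction can be checked with a small Python comparison of the two D_bubble equations under combination B. The function name and the `corrected` flag are invented for this sketch; the example load `mrmovl 0(%ebx),%esp` is a hypothetical instruction chosen to trigger the case, and ret is taken to read %esp as both of its source registers.

```python
IMRMOVL, IPOPL, IJXX, IRET, INOP = "mrmovl", "popl", "jXX", "ret", "nop"
ESP = 4  # Y86 register ID of %esp

def d_register_control(D_icode, E_icode, M_icode, E_dstM,
                       d_srcA, d_srcB, e_Cnd, corrected):
    """Return (D_stall, D_bubble) using the initial or corrected logic."""
    load_use = E_icode in {IMRMOVL, IPOPL} and E_dstM in {d_srcA, d_srcB}
    mispredict = E_icode == IJXX and not e_Cnd
    ret = IRET in {D_icode, E_icode, M_icode}
    d_stall = load_use
    if corrected:
        d_bubble = mispredict or (ret and not load_use)
    else:
        d_bubble = mispredict or ret
    return d_stall, d_bubble

# Combination B: a load into %esp (e.g. mrmovl 0(%ebx),%esp) in execute,
# followed by ret in decode.
combo_b = dict(D_icode=IRET, E_icode=IMRMOVL, M_icode=INOP, E_dstM=ESP,
               d_srcA=ESP, d_srcB=ESP, e_Cnd=False)

print(d_register_control(**combo_b, corrected=False))  # (True, True)
print(d_register_control(**combo_b, corrected=True))   # (True, False)
```

With the initial logic, register D is asked to stall and bubble at once, which is the pipeline error; with the corrected logic only the stall remains, so ret is held in decode for one extra cycle.
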
Pipeline Summary
• Data Hazards
  • Most handled by forwarding
    • No performance penalty
  • Load/use hazard requires one cycle stall
• Control Hazards
  • Cancel instructions when detect mispredicted branch
    • Two clock cycles wasted
  • Stall fetch stage while ret passes through pipeline
    • Three clock cycles wasted
• Control Combinations
  • Must analyze carefully
  • First version had subtle bug
    • Only arises with unusual instruction combination

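The penalties in this summary can be combined into a rough cycles-per-instruction estimate. The sketch below assumes an ideal rate of one instruction per cycle when no hazard occurs; the event frequencies passed in are purely hypothetical numbers for illustration, not measurements.

```python
def cpi(load_use_freq, mispredict_freq, ret_freq):
    """Estimated cycles per instruction.

    Each argument is the fraction of executed instructions that trigger
    the event: load/use costs 1 cycle, a mispredicted branch 2, ret 3.
    """
    return 1.0 + 1 * load_use_freq + 2 * mispredict_freq + 3 * ret_freq

# Hypothetical mix: 5% load/use stalls, 8% mispredicted branches, 2% returns
print(round(cpi(0.05, 0.08, 0.02), 2))  # 1.27
```
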