
Seoul National University


Presentation Transcript


  1. Pipelined Implementation: Part II

  2. Overview
     Make the pipelined processor work!
     • Data Hazards
       ◦ An instruction having register R as a source follows shortly after another instruction having register R as its destination
       ◦ A common condition; we don't want to slow down the pipeline
     • Control Hazards
       ◦ Mispredicted conditional branch: our design predicts all branches as being taken, so a naïve pipeline executes two extra instructions
       ◦ Getting the return address for a ret instruction: a naïve pipeline executes three extra instructions
     • Making Sure It Really Works
       ◦ What if multiple special cases happen simultaneously?

  3. Pipeline Stages
     • Fetch: select current PC, read instruction, compute incremented PC
     • Decode: read program registers
     • Execute: operate ALU
     • Memory: read or write data memory
     • Write Back: update register file

  4. PIPE- Hardware
     • Pipeline registers hold intermediate values from previous stages for the instruction
     • Forward (Upward) Paths
       ◦ Values passed from one stage to the next
       ◦ Cannot jump past stages (e.g., valC passes through decode)

  5. Data Dependencies: No Nop
     Pipeline diagram (each instruction fetched one cycle after the previous one):
       0x000: irmovl $10,%edx    F D E M W    (cycles 1-5)
       0x006: irmovl $3,%eax     F D E M W    (cycles 2-6)
       0x00c: addl %edx,%eax     F D E M W    (cycles 3-7)
       0x00e: halt               F D E M W    (cycles 4-8)
     Cycle 4:
       M: M_dstE = %edx, M_valE = 10
       E: E_dstE = %eax, e_valE ← 0 + 3 = 3
       D: valA ← R[%edx] = 0, valB ← R[%eax] = 0   Error: neither register has been written yet

  6. Data Dependencies: 2 Nop's
     Pipeline diagram:
       0x000: irmovl $10,%edx    F D E M W    (cycles 1-5)
       0x006: irmovl $3,%eax     F D E M W    (cycles 2-6)
       0x00c: nop                F D E M W    (cycles 3-7)
       0x00d: nop                F D E M W    (cycles 4-8)
       0x00e: addl %edx,%eax     F D E M W    (cycles 5-9)
       0x010: halt               F D E M W    (cycles 6-10)
     Cycle 6:
       W: R[%eax] ← 3
       D: valA ← R[%edx] = 10, valB ← R[%eax] = 0   Error: %eax is not updated until the end of this cycle
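The stale reads on slides 5 and 6 follow from simple cycle arithmetic. The sketch below assumes the timing used in these diagrams: the instruction at position i (counting from 0) is fetched in cycle i + 1, advances one stage per cycle with no stalls, and a register written back in cycle c is first visible to decode in cycle c + 1. The function names are hypothetical.

```python
# Cycle arithmetic for the naive pipeline (no forwarding, no stalling).
# Assumption: the instruction at position i (0-based) is fetched in cycle i + 1
# and moves one stage per cycle; a register written back in cycle c can first
# be read by decode in cycle c + 1.
STAGES = "FDEMW"


def stage_cycle(position: int, stage: str) -> int:
    """Cycle in which the instruction at `position` occupies `stage`."""
    return position + 1 + STAGES.index(stage)


def reads_stale_value(producer: int, consumer: int) -> bool:
    """True if the consumer decodes no later than the producer's write-back cycle."""
    return stage_cycle(consumer, "D") <= stage_cycle(producer, "W")


def nops_needed() -> int:
    """Fewest instructions that must separate a producer from its consumer."""
    gap = 0
    while reads_stale_value(0, gap + 1):
        gap += 1
    return gap


if __name__ == "__main__":
    # Slide 6: irmovl $3,%eax is at position 1, addl %edx,%eax at position 4.
    print(stage_cycle(1, "W"), stage_cycle(4, "D"))  # 6 6
    print(reads_stale_value(1, 4))                   # True: 2 nops are not enough
    print(nops_needed())                             # 3
```

Under these assumptions three intervening instructions are needed, which is consistent with the three bubbles on slide 10 ("Stalling X3").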

  7. Stalling for Data Dependencies
     • If an instruction follows too closely after another that writes its source register, slow it down
       ◦ Hold the instruction in decode
       ◦ Dynamically inject nop's (i.e., bubbles) into the execute stage
     Pipeline diagram:
       0x000: irmovl $10,%edx    F D E M W        (cycles 1-5)
       0x006: irmovl $3,%eax     F D E M W        (cycles 2-6)
       0x00c: nop                F D E M W        (cycles 3-7)
       0x00d: nop                F D E M W        (cycles 4-8)
       bubble                    E M W            (cycles 7-9)
       0x00e: addl %edx,%eax     F D D E M W      (cycles 5-10; in decode in cycles 6 and 7)
       0x010: halt               F F D E M W      (cycles 6-11; in fetch in cycles 6 and 7)

  8. Stall Condition
     • Source Registers
       ◦ srcA and srcB of the instruction in the decode stage
     • Destination Registers
       ◦ dstE and dstM fields
       ◦ Instructions in the execute, memory, and write-back stages
     • Special Cases
       ◦ Don't stall for register ID 15 (0xF), which indicates the absence of a register operand
       ◦ Don't stall for a failed conditional move
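The stall condition above can be written as a small predicate. This is a sketch, not the processor's actual HCL: the register numbering follows the usual Y86 (IA32-style) encoding, which is an assumption here, and a failed conditional move is assumed to have already had its dstE replaced by register ID 15 so that it never matches.

```python
# Register IDs, assuming the usual Y86 (IA32-style) encoding.
EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI = range(8)
RNONE = 0xF  # register ID 15: no register operand


def needs_stall(sources, destinations) -> bool:
    """Stall condition for the instruction in decode.

    sources: (srcA, srcB) of the instruction in decode.
    destinations: dstE and dstM of the instructions in execute, memory,
    and write-back. A failed conditional move is assumed to have had its
    dstE replaced by RNONE already, so it never causes a stall.
    """
    pending = {d for d in destinations if d != RNONE}
    return any(src != RNONE and src in pending for src in sources)


if __name__ == "__main__":
    # Slide 9, cycle 6: addl %edx,%eax in decode; two nops in execute and
    # memory; irmovl $3,%eax in write-back.
    dsts = [RNONE, RNONE,  # E: nop
            RNONE, RNONE,  # M: nop
            EAX, RNONE]    # W: dstE = %eax, no dstM
    print(needs_stall((EDX, EAX), dsts))  # True
```

Filtering RNONE out of both sides implements the "don't stall for register ID 15" rule: two instructions that both lack an operand never appear to depend on each other.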

  9. Detecting Stall Condition
     Same program and pipeline diagram as slide 7 (addl %edx,%eax held in decode in cycle 6, one bubble injected).
     Cycle 6:
       W: W_dstE = %eax, W_valE = 3
       D: srcA = %edx, srcB = %eax
     srcB matches W_dstE, so addl must stall in decode.

  10. Stalling X3
     Pipeline diagram:
       0x000: irmovl $10,%edx    F D E M W          (cycles 1-5)
       0x006: irmovl $3,%eax     F D E M W          (cycles 2-6)
       bubble                    E M W              (cycles 5-7)
       bubble                    E M W              (cycles 6-8)
       bubble                    E M W              (cycles 7-9)
       0x00c: addl %edx,%eax     F D D D D E M W    (cycles 3-10; in decode in cycles 4-7)
       0x00e: halt               F F F F D E M W    (cycles 4-11; in fetch in cycles 4-7)
     In cycles 4, 5, and 6, decode sees srcA = %edx, srcB = %eax while %eax is still pending:
       Cycle 4: E_dstE = %eax
       Cycle 5: M_dstE = %eax
       Cycle 6: W_dstE = %eax

  11. What Happens When Stalling?
     Stage contents for the program of slide 10, cycles 4-8:
       Cycle 4: F = halt, D = addl %edx,%eax, E = irmovl $3,%eax, M = irmovl $10,%edx
       Cycle 5: F = halt, D = addl %edx,%eax, E = bubble, M = irmovl $3,%eax, W = irmovl $10,%edx
       Cycle 6: F = halt, D = addl %edx,%eax, E = bubble, M = bubble, W = irmovl $3,%eax
       Cycle 7: F = halt, D = addl %edx,%eax, E = bubble, M = bubble, W = bubble
       Cycle 8: D = halt, E = addl %edx,%eax, M = bubble, W = bubble
     • The stalled instruction is held back in the decode stage
     • The following instruction stays in the fetch stage
     • Bubbles are injected into the execute stage
       ◦ Like dynamically generated nop's
       ◦ They move through the later stages

  12. Implementing Stalling
     • Pipeline Control
       ◦ Combinational logic detects the stall condition
       ◦ Sets mode signals for how pipeline registers should be updated
     [Figure: pipe control logic connected to the F, D, E, M, and W pipeline registers. Signals shown: D_icode, d_srcA, d_srcB, E_icode, E_dstM, e_Cnd, m_stat, W_stat, set_cc, F_stall, D_stall, D_bubble, E_bubble, M_bubble, W_stall.]

  13. Pipeline Register Modes
     • Normal (stall = 0, bubble = 0): at the rising clock edge, the output changes from x to the input y
     • Stall (stall = 1, bubble = 0): at the rising clock edge, the output keeps its old value x
     • Bubble (stall = 0, bubble = 1): at the rising clock edge, the output becomes a nop
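The three modes can be modeled as one register with a clock method. The class and its attribute names are hypothetical; raising an error when both signals are asserted is a modeling choice that mirrors the "pipeline error" mentioned on slide 38.

```python
class PipelineRegister:
    """Hypothetical model of one pipeline register and its three modes."""

    def __init__(self, nop="nop"):
        self.nop = nop
        self.input = nop   # value presented by the previous stage
        self.output = nop  # value the next stage sees during this cycle

    def clock(self, stall=False, bubble=False):
        """Rising clock edge: update the output according to the mode signals."""
        if stall and bubble:
            # Both modes at once has no meaningful result.
            raise RuntimeError("stall and bubble asserted together")
        if bubble:
            self.output = self.nop    # Bubble: load a nop
        elif not stall:
            self.output = self.input  # Normal: load the input
        # Stall: keep the old output


if __name__ == "__main__":
    reg = PipelineRegister()
    reg.input = "x"
    reg.clock()
    reg.input = "y"
    reg.clock(stall=True)
    print(reg.output)  # x
    reg.clock()
    print(reg.output)  # y
    reg.clock(bubble=True)
    print(reg.output)  # nop
```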

  14. Data Forwarding
     • Naïve Pipeline
       ◦ A register isn't written until the completion of the write-back stage
       ◦ Source operands are read from the register file in the decode stage
     • Observation
       ◦ The value to be written to the register is generated much earlier (in the execute or memory stage)
     • Trick
       ◦ Pass the value directly from the execute or memory stage of the generating instruction to the decode stage
       ◦ It needs to be available by the end of the decode stage

  15. Data Forwarding Example
     Same program and pipeline diagram as slide 6 (two nops between irmovl $3,%eax and addl %edx,%eax).
     Cycle 6:
       W: R[%eax] ← 3, W_dstE = %eax, W_valE = 3
       D: srcA = %edx, valA ← R[%edx] = 10; srcB = %eax, valB ← W_valE = 3
     • irmovl is in the write-back stage
     • The destination value is in the W pipeline register
     • Forward it as valB for the decode stage

  16. Bypass Paths
     • Decode Stage
       ◦ Forwarding logic selects valA and valB
       ◦ Normally from the register file
       ◦ Forwarding: get valA or valB from later pipeline stages
     • Forwarding Sources
       ◦ Execute: valE
       ◦ Memory: valE, valM
       ◦ Write back: valE, valM

  17. Data Forwarding Example #2
     Same program and pipeline diagram as slide 5 (no nops).
     Cycle 4:
       M: M_dstE = %edx, M_valE = 10
       E: E_dstE = %eax, e_valE ← 0 + 3 = 3
       D: srcA = %edx, valA ← M_valE = 10; srcB = %eax, valB ← e_valE = 3
     • Register %edx
       ◦ Value generated by the ALU during the previous cycle
       ◦ Forward from memory as valA
     • Register %eax
       ◦ Value generated by the ALU during the current cycle
       ◦ Forward from execute as valB

  18. Forwarding Priority
     Pipeline diagram:
       0x000: irmovl $1, %eax     F D E M W    (cycles 1-5)
       0x006: irmovl $2, %eax     F D E M W    (cycles 2-6)
       0x00c: irmovl $3, %eax     F D E M W    (cycles 3-7)
       0x012: rrmovl %eax, %edx   F D E M W    (cycles 4-8)
       0x014: halt                F D E M W    (cycles 5-9)
     Cycle 5:
       W: R[%eax] ← 1
       M: R[%eax] ← 2
       E: R[%eax] ← 3
       D: valA ← R[%eax] = ?
     • Multiple Forwarding Choices
       ◦ Which one should have priority?
       ◦ It should match sequential execution semantics
       ◦ Use the matching value from the earliest pipeline stage (here, execute, giving valA = 3)

  19. Implementing Forwarding
     • Add additional feedback paths from the E, M, and W pipeline registers into the decode stage
     • Create logic blocks to select from multiple sources for valA and valB in the decode stage

  20. Implementing Forwarding
        ## What should be the A value?
        int new_E_valA = [
            # Use incremented PC
            D_icode in { ICALL, IJXX } : D_valP;
            # Forward valE from execute
            d_srcA == e_dstE : e_valE;
            # Forward valM from memory
            d_srcA == M_dstM : m_valM;
            # Forward valE from memory
            d_srcA == M_dstE : M_valE;
            # Forward valM from write back
            d_srcA == W_dstM : W_valM;
            # Forward valE from write back
            d_srcA == W_dstE : W_valE;
            # Use value read from register file
            1 : d_rvalA;
        ];
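The new_E_valA case expression can be mirrored in Python to check the priority rule from slide 18. In this sketch the instruction-code constants and register IDs are hypothetical stand-ins, and the signal values are passed in a dictionary keyed by the HCL names; as in HCL, the first case whose condition holds wins.

```python
# Hypothetical stand-ins for the HCL constants; only their identity matters here.
ICALL, IJXX, IRRMOVL, IOPL = "ICALL", "IJXX", "IRRMOVL", "IOPL"
RNONE = 0xF
EAX, EDX = 0, 2  # assumed Y86 register IDs for %eax and %edx

# Forwarding sources in the HCL's priority order (earliest stage first).
FORWARD_SOURCES = [
    ("e_dstE", "e_valE"),  # valE from execute
    ("M_dstM", "m_valM"),  # valM from memory
    ("M_dstE", "M_valE"),  # valE from memory
    ("W_dstM", "W_valM"),  # valM from write back
    ("W_dstE", "W_valE"),  # valE from write back
]


def new_E_valA(sig):
    """Python mirror of the new_E_valA case expression.

    sig maps HCL signal names to their values in the current cycle.
    """
    if sig["D_icode"] in (ICALL, IJXX):
        return sig["D_valP"]  # use incremented PC
    for dst, val in FORWARD_SOURCES:
        if sig["d_srcA"] == sig[dst]:
            return sig[val]
    return sig["d_rvalA"]  # value read from register file


def idle_signals(**overrides):
    """All destinations set to RNONE, then apply per-example values."""
    sig = dict(D_icode=IRRMOVL, D_valP=0, d_srcA=RNONE, d_rvalA=0,
               e_dstE=RNONE, e_valE=0, M_dstM=RNONE, m_valM=0,
               M_dstE=RNONE, M_valE=0, W_dstM=RNONE, W_valM=0,
               W_dstE=RNONE, W_valE=0)
    sig.update(overrides)
    return sig


if __name__ == "__main__":
    # Slide 18, cycle 5: rrmovl %eax,%edx in decode; %eax pending in E, M, and W.
    prio = idle_signals(d_srcA=EAX, e_dstE=EAX, e_valE=3,
                        M_dstE=EAX, M_valE=2, W_dstE=EAX, W_valE=1)
    print(new_E_valA(prio))  # 3: execute has priority
    # Slide 17, cycle 4: addl %edx,%eax in decode; %edx forwarded from memory.
    ex2 = idle_signals(D_icode=IOPL, d_srcA=EDX, M_dstE=EDX, M_valE=10,
                       e_dstE=EAX, e_valE=3)
    print(new_E_valA(ex2))  # 10
```

Keeping the sources in one ordered list makes the priority explicit: swapping two entries would change which in-flight write "wins", which is exactly what slide 18 warns against.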

  21. Limitation of Forwarding
     Pipeline diagram:
       0x000: irmovl $128,%edx        F D E M W    (cycles 1-5)
       0x006: irmovl $3,%ecx          F D E M W    (cycles 2-6)
       0x00c: rmmovl %ecx, 0(%edx)    F D E M W    (cycles 3-7)
       0x012: irmovl $10,%ebx         F D E M W    (cycles 4-8)
       0x018: mrmovl 0(%edx),%eax     F D E M W    (cycles 5-9)     # Load %eax
       0x01e: addl %ebx,%eax          F D E M W    (cycles 6-10)    # Use %eax
       0x020: halt                    F D E M W    (cycles 7-11)
     Cycle 7:
       M: M_dstE = %ebx, M_valE = 10
       D: valA ← M_valE = 10, valB ← R[%eax] = 0   Error
     Cycle 8:
       M: M_dstM = %eax, m_valM ← M[128] = 3
     • Load-use dependency
       ◦ Value needed by the end of the decode stage in cycle 7
       ◦ Value read from memory in the memory stage of cycle 8

  22. Avoiding Load/Use Hazard
     Pipeline diagram:
       0x000: irmovl $128,%edx        F D E M W      (cycles 1-5)
       0x006: irmovl $3,%ecx          F D E M W      (cycles 2-6)
       0x00c: rmmovl %ecx, 0(%edx)    F D E M W      (cycles 3-7)
       0x012: irmovl $10,%ebx         F D E M W      (cycles 4-8)
       0x018: mrmovl 0(%edx),%eax     F D E M W      (cycles 5-9)     # Load %eax
       bubble                         E M W          (cycles 8-10)
       0x01e: addl %ebx,%eax          F D D E M W    (cycles 6-11)    # Use %eax
       0x020: halt                    F F D E M W    (cycles 7-12)
     Cycle 8:
       W: W_dstE = %ebx, W_valE = 10
       M: M_dstM = %eax, m_valM ← M[128] = 3
       D: valA ← W_valE = 10, valB ← m_valM = 3
     • Stall the instruction that uses the loaded value for one cycle
     • Then pick up the loaded value by forwarding from the memory stage

  23. Detecting Load/Use Hazard
     • Condition: load/use hazard
     • Trigger: E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }

  24. Control for Load/Use Hazard
     Same program and pipeline diagram as slide 22 (addl %ebx,%eax held in decode for one cycle, one bubble injected).
     • Stall instructions in the fetch and decode stages
     • Inject a bubble into the execute stage

  25. Branch Misprediction Example
        0x000:    xorl %eax,%eax
        0x002:    jne t                # Not taken
        0x007:    irmovl $1, %eax      # Fall through
        0x00d:    nop
        0x00e:    nop
        0x00f:    nop
        0x010:    halt
        0x011: t: irmovl $3, %edx      # Target (Should not execute)
        0x017:    irmovl $4, %ecx      # Should not execute
        0x01d:    irmovl $5, %edx      # Should not execute

  26. Handling Misprediction
     Pipeline diagram:
       0x000: xorl %eax,%eax                      F D E M W    (cycles 1-5)
       0x002: jne target         # Not taken      F D E M W    (cycles 2-6)
       0x011: t: irmovl $2,%edx  # Target         F D          (cycles 3-4; replaced by a bubble in E M W, cycles 5-7)
       0x017: irmovl $3,%ebx     # Target+1       F            (cycle 4; replaced by a bubble in D E M W, cycles 5-8)
       0x007: irmovl $1,%eax     # Fall through   F D E M W    (cycles 5-9)
       0x00d: nop                                 F D E M W    (cycles 6-10)
     • Predict branch as taken
       ◦ Two instructions are fetched from the target
     • Cancel when mispredicted
       ◦ Detect branch not-taken in the execute stage
       ◦ During the next cycle, replace the instructions in execute and decode by bubbles
       ◦ No side effects have occurred yet
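The cancellation can be traced with a toy model of the D, E, M, and W pipeline registers. This is a sketch under simplifying assumptions: instructions are plain strings, stalls are not modeled, the branch is known to be not taken (xorl sets the zero flag), and once the misprediction is detected the fetch stream simply switches to the fall-through path. The function names are hypothetical.

```python
BUBBLE = "bubble"


def clock(regs, fetched, d_bubble=False, e_bubble=False):
    """One clock edge for pipeline registers D, E, M, W (stalls not modeled)."""
    return {
        "W": regs["M"],
        "M": regs["E"],
        "E": BUBBLE if e_bubble else regs["D"],
        "D": BUBBLE if d_bubble else fetched,
    }


def simulate_mispredict(cycles=6):
    """Trace the slide 26 program; returns {cycle: (D, E, M, W, fetched)}."""
    predicted = iter(["xorl", "jne", "t: irmovl $2,%edx", "irmovl $3,%ebx"])
    fall_through = iter(["irmovl $1,%eax", "nop"])
    regs = dict.fromkeys("DEMW")
    redirected = False
    trace = {}
    for cycle in range(1, cycles + 1):
        # jne is not taken here, so reaching execute means a misprediction.
        mispredict = regs["E"] == "jne"
        # After the misprediction, fetch switches to the fall-through path.
        fetched = next(fall_through if redirected else predicted, None)
        trace[cycle] = (regs["D"], regs["E"], regs["M"], regs["W"], fetched)
        # On a misprediction, bubble both the decode and execute registers.
        regs = clock(regs, fetched, d_bubble=mispredict, e_bubble=mispredict)
        redirected = redirected or mispredict
    return trace


if __name__ == "__main__":
    for cycle, state in simulate_mispredict().items():
        print(cycle, state)
```

In cycle 4 jne is in execute with both target instructions behind it; in cycle 5 decode and execute hold bubbles and the fall-through instruction is fetched, matching the two wasted cycles in the diagram.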

  27. Detecting Mispredicted Branch
     • Condition: mispredicted branch
     • Trigger: E_icode == IJXX && !e_Cnd

  28. Control for Misprediction
     Same program and pipeline diagram as slide 26: the two instructions fetched from the target become bubbles in cycle 5, and fetching resumes at the fall-through instruction 0x007.

  29. Return Example
        0x000:    irmovl Stack,%esp   # Initialize stack pointer
        0x006:    call p              # Procedure call
        0x00b:    irmovl $5,%esi      # Return point
        0x011:    halt
        0x020: .pos 0x20
        0x020: p: irmovl $-1,%edi     # procedure
        0x026:    ret
        0x027:    irmovl $1,%eax      # Should not be executed
        0x02d:    irmovl $2,%ecx      # Should not be executed
        0x033:    irmovl $3,%edx      # Should not be executed
        0x039:    irmovl $4,%ebx      # Should not be executed
        0x100: .pos 0x100
        0x100: Stack:                 # Stack: Stack pointer
     • Previously executed three additional instructions

  30. Correct Return Example
     Pipeline diagram (each row starts one cycle after the previous one):
       0x026: ret                          F D E M W
       bubble                              F D E M W
       bubble                              F D E M W
       bubble                              F D E M W
       0x00b: irmovl $5,%esi   # Return    F D E M W
     When ret reaches write-back:
       W: valM = 0x0b
       F: valC ← 5, rB ← %esi
     • As ret passes through the pipeline, stall at the fetch stage
       ◦ While ret is in the decode, execute, and memory stages
     • Inject a bubble into the decode stage
     • Release the stall when ret reaches the write-back stage
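A toy trace shows where the three bubbles come from. The sketch assumes instructions are plain strings and does not model the PC, so F_stall appears only indirectly: while ret is in decode, execute, or memory, whatever fetch produces is discarded by the decode bubble, and once ret is in write-back the return point is fetched. The function names are hypothetical.

```python
BUBBLE = "bubble"
RET = "ret"


def clock(regs, fetched, d_bubble=False):
    """One clock edge for pipeline registers D, E, M, W."""
    return {
        "W": regs["M"],
        "M": regs["E"],
        "E": regs["D"],
        "D": BUBBLE if d_bubble else fetched,
    }


def simulate_ret(cycles=6):
    """Trace 0x026: ret through the pipeline; return (trace, bubbles injected)."""
    regs = dict.fromkeys("DEMW")
    trace, bubbles = {}, 0
    for cycle in range(1, cycles + 1):
        ret_in_flight = RET in (regs["D"], regs["E"], regs["M"])
        if cycle == 1:
            fetched = RET
        elif regs["W"] == RET:
            # Return address (W_valM = 0x0b) is now known: fetch the return point.
            fetched = "irmovl $5,%esi"
        else:
            # Fetch is stalled (or, after the return point, simply not modeled).
            fetched = None
        trace[cycle] = (regs["D"], regs["E"], regs["M"], regs["W"], fetched)
        if ret_in_flight:
            bubbles += 1
        regs = clock(regs, fetched, d_bubble=ret_in_flight)
    return trace, bubbles


if __name__ == "__main__":
    trace, bubbles = simulate_ret()
    for cycle, state in trace.items():
        print(cycle, state)
    print("bubbles:", bubbles)  # bubbles: 3
```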

  31. Detecting Return
     • Condition: processing ret
     • Trigger: IRET in { D_icode, E_icode, M_icode }

  32. Control for Return
     Same pipeline diagram as slide 30: three bubbles pass through the pipeline between 0x026: ret and 0x00b: irmovl $5,%esi.

  33. (Initial) Summary
     • Detection
     • Action (on next cycle)

  34. Implementing Pipeline Control
     • Combinational logic generates pipeline control signals
     [Figure: pipe control logic block diagram, as on slide 12.]

  35. Initial Version of Pipeline Control
        bool F_stall =
            # Conditions for a load/use hazard
            E_icode in { IMRMOVL, IPOPL } &&
            E_dstM in { d_srcA, d_srcB } ||
            # Stalling at fetch while ret passes through pipeline
            IRET in { D_icode, E_icode, M_icode };

        bool D_stall =
            # Conditions for a load/use hazard
            E_icode in { IMRMOVL, IPOPL } &&
            E_dstM in { d_srcA, d_srcB };

        bool D_bubble =
            # Mispredicted branch
            (E_icode == IJXX && !e_Cnd) ||
            # Stalling at fetch while ret passes through pipeline
            IRET in { D_icode, E_icode, M_icode };

        bool E_bubble =
            # Mispredicted branch
            (E_icode == IJXX && !e_Cnd) ||
            # Load/use hazard
            E_icode in { IMRMOVL, IPOPL } &&
            E_dstM in { d_srcA, d_srcB };
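The initial control logic translates directly into Python, which makes the Combination B problem easy to reproduce. The instruction-code strings are hypothetical stand-ins; register ID 4 for %esp and the assumption that ret reads %esp as both srcA and srcB are labeled assumptions, not taken from these slides.

```python
# Hypothetical stand-ins for the HCL instruction codes.
IMRMOVL, IPOPL, IJXX, IRET, IOPL = "IMRMOVL", "IPOPL", "IJXX", "IRET", "IOPL"
ESP, RNONE = 4, 0xF  # assumed Y86 register IDs


def initial_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Cnd):
    """Python mirror of the initial F_stall, D_stall, D_bubble, E_bubble logic."""
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    mispredict = E_icode == IJXX and not e_Cnd
    ret = IRET in (D_icode, E_icode, M_icode)
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        "D_bubble": mispredict or ret,
        "E_bubble": mispredict or load_use,
    }


if __name__ == "__main__":
    # Combination B: mrmovl ...,%esp in execute, ret (assumed srcA = srcB = %esp) in decode.
    sig = initial_control(D_icode=IRET, E_icode=IMRMOVL, M_icode=IOPL,
                          E_dstM=ESP, d_srcA=ESP, d_srcB=ESP, e_Cnd=True)
    print(sig["D_stall"], sig["D_bubble"])  # True True: conflicting modes for D
```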

  36. Control Combinations
     • Special cases that can arise during the same clock cycle
     • Combination A
       ◦ Not-taken branch
       ◦ ret instruction at branch target
     • Combination B
       ◦ Instruction that reads from memory to %esp
       ◦ Followed by ret instruction

  37. Control Combination A
     [Figure: the mispredict pattern (JXX in E) and the ret pattern (ret in D, E, or M) combine into Combination A: mispredicted JXX in E with ret in D.]
     • Should handle as a mispredicted branch
     • The ret condition stalls the F pipeline register
     • But the PC selection logic will be using M_valA (the fall-through address) anyhow

  38. Control Combination B
     [Figure: the load/use pattern (load in E, use in D) and the ret pattern (ret in D, E, or M) combine into Combination B: load in E with ret in D.]
     • Would attempt to bubble and stall pipeline register D at the same time
     • Signaled by the processor as a pipeline error

  39. Handling Control Combination B
     [Figure: Combination B, load in E with ret in D, as on slide 38.]
     • The load/use hazard should get priority
     • The ret instruction should be held in the decode stage for an additional cycle

  40. Corrected Pipeline Control Logic
        bool D_bubble =
            # Mispredicted branch
            (E_icode == IJXX && !e_Cnd) ||
            # Stalling at fetch while ret passes through pipeline
            IRET in { D_icode, E_icode, M_icode }
            # but not for a load/use hazard
            && !(E_icode in { IMRMOVL, IPOPL }
                 && E_dstM in { d_srcA, d_srcB });
     • The load/use hazard should get priority
     • The ret instruction should be held in the decode stage for an additional cycle
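With the corrected D_bubble, Combination B asks pipeline register D only to stall, so ret waits one extra cycle in decode. This sketch copies the other three signals unchanged from slide 35; the instruction-code strings and register IDs are hypothetical stand-ins, and ret is again assumed to read %esp as both sources.

```python
# Hypothetical stand-ins for the HCL instruction codes.
IMRMOVL, IPOPL, IJXX, IRET, IOPL = "IMRMOVL", "IPOPL", "IJXX", "IRET", "IOPL"
ESP, RNONE = 4, 0xF  # assumed Y86 register IDs


def corrected_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Cnd):
    """Pipeline control with the corrected D_bubble from slide 40."""
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    mispredict = E_icode == IJXX and not e_Cnd
    ret = IRET in (D_icode, E_icode, M_icode)
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        # ret no longer bubbles D while a load/use hazard is present
        "D_bubble": mispredict or (ret and not load_use),
        "E_bubble": mispredict or load_use,
    }


if __name__ == "__main__":
    b = corrected_control(D_icode=IRET, E_icode=IMRMOVL, M_icode=IOPL,
                          E_dstM=ESP, d_srcA=ESP, d_srcB=ESP, e_Cnd=True)
    print(b["D_stall"], b["D_bubble"], b["E_bubble"])  # True False True
```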

  41. Pipeline Summary
     • Data Hazards
       ◦ Most handled by forwarding, with no performance penalty
       ◦ Load/use hazard requires a one-cycle stall
     • Control Hazards
       ◦ Cancel instructions when a mispredicted branch is detected: two clock cycles wasted
       ◦ Stall the fetch stage while ret passes through the pipeline: three clock cycles wasted
     • Control Combinations
       ◦ Must analyze carefully
       ◦ First version had a subtle bug
       ◦ Only arises with an unusual instruction combination
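The per-event penalties above (1, 2, and 3 cycles) can be combined into an average cycles-per-instruction estimate: CPI = 1 + (bubbles injected per instruction). The sketch below is illustrative only; the event rates in the usage example are hypothetical values chosen for the demonstration, not measurements.

```python
# Penalties in clock cycles, taken from the summary above.
LOAD_USE_PENALTY = 1
MISPREDICT_PENALTY = 2
RET_PENALTY = 3


def cpi(load_use_rate, mispredict_rate, ret_rate):
    """Estimated CPI = 1 + average bubbles per instruction.

    Each rate is the fraction of all executed instructions that triggers
    the corresponding event.
    """
    return (1.0
            + load_use_rate * LOAD_USE_PENALTY
            + mispredict_rate * MISPREDICT_PENALTY
            + ret_rate * RET_PENALTY)


if __name__ == "__main__":
    # Hypothetical mix: 5% of instructions cause a load/use stall,
    # 8% are mispredicted branches, 2% are returns.
    print(round(cpi(0.05, 0.08, 0.02), 2))  # 1.27
```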
