Pipelining II Systems I Topics • Pipelining hardware: registers and feedback paths • Difficulties with pipelines: hazards • Method of mitigating hazards
Adding Pipeline Registers [Figure: datapath with stages Fetch, Decode, Execute, Memory, and Write back, shown with pipeline registers F, D, E, M, and W inserted between the stages]
SEQ+ Hardware • Still sequential implementation • Reorder PC stage to put at beginning PC Stage • Task is to select PC for current instruction • Based on results computed by previous instruction Processor State • PC is no longer stored in register • But, can determine PC based on other stored information
PIPE- Hardware • Pipeline registers hold intermediate values from instruction execution Forward (Upward) Paths • Values passed from one stage to next • Cannot jump past stages • e.g., valC passes through decode
Feedback Paths Predicted PC • Guess value of next PC Branch information • Jump taken/not-taken • Fall-through or target address Return point • Read from memory Register updates • To register file write ports
Pipeline Demonstration • File: demo-basic.ys
                          1  2  3  4  5  6  7  8  9
irmovl $1,%eax  #I1       F  D  E  M  W
irmovl $2,%ecx  #I2          F  D  E  M  W
irmovl $3,%edx  #I3             F  D  E  M  W
irmovl $4,%ebx  #I4                F  D  E  M  W
halt            #I5                   F  D  E  M  W
Cycle 5: W = I1, M = I2, E = I3, D = I4, F = I5
Data Dependencies: 3 Nop’s
# demo-h3.ys              1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
0x00e: nop                            F  D  E  M  W
0x00f: addl %edx,%eax                    F  D  E  M  W
0x011: halt                                 F  D  E  M  W
Cycle 6 • W: R[%eax] ← 3
Cycle 7 • D: valA ← R[%edx] = 10 • valB ← R[%eax] = 3
Data Dependencies: 2 Nop’s
# demo-h2.ys              1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
0x00e: addl %edx,%eax                 F  D  E  M  W
0x010: halt                              F  D  E  M  W
Cycle 6 • W: R[%eax] ← 3 • D: valA ← R[%edx] = 10 • valB ← R[%eax] = 0 (Error)
• Can’t transport value produced by first instruction back in time
Data Dependencies: 1 Nop
# demo-h1.ys              1  2  3  4  5  6  7  8  9
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: addl %edx,%eax              F  D  E  M  W
0x00f: halt                           F  D  E  M  W
Cycle 5 • W: R[%edx] ← 10 • M: M_valE = 3, M_dstE = %eax • D: valA ← R[%edx] = 0 (Error) • valB ← R[%eax] = 0 (Error)
• Now a problem with both operands
Data Dependencies: No Nop
# demo-h0.ys              1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: addl %edx,%eax           F  D  E  M  W
0x00e: halt                        F  D  E  M  W
Cycle 4 • M: M_valE = 10, M_dstE = %edx • E: e_valE ← 0 + 3 = 3, E_dstE = %eax • D: valA ← R[%edx] = 0 (Error) • valB ← R[%eax] = 0 (Error)
• Wow - we really missed the boat here…
Predicting the PC • Start fetch of new instruction after current one has completed fetch stage • Not enough time to reliably determine next instruction • Guess which instruction will follow • Recover if prediction was incorrect
Our Prediction Strategy Instructions that Don’t Transfer Control • Predict next PC to be valP • Always reliable Call and Unconditional Jumps • Predict next PC to be valC (destination) • Always reliable Conditional Jumps • Predict next PC to be valC (destination) • Only correct if branch is taken • Typically right 60% of time Return Instruction • Don’t try to predict
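The prediction strategy above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `predict_pc` and the string instruction codes are hypothetical, and `None` is used here to mean "no prediction is made" for ret.

```python
# Hypothetical instruction-code names, chosen only for this sketch.
JXX, CALL, RET = "jXX", "call", "ret"


def predict_pc(icode, valC, valP):
    """Return the predicted next PC, or None when no prediction is attempted."""
    if icode == RET:
        # Return instruction: don't try to predict.
        return None
    if icode in (JXX, CALL):
        # Calls and jumps (conditional or not): predict the destination valC.
        return valC
    # Everything else does not transfer control: predict valP.
    return valP
```

For a conditional jump this guess is right only when the branch is taken, which is why a recovery mechanism is still needed.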
Recovering from PC Misprediction • Mispredicted Jump • Will see branch flag once instruction reaches memory stage • Can get fall-through PC from valA • Return Instruction • Will get return PC when ret reaches write-back stage • In both cases • Need to throw away instructions fetched between prediction and resolution
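One way to picture the recovery is as a PC selection step at the start of fetch. The sketch below assumes the signal names that appear in the datapath figure (M_Bch, M_valA, W_valM, predPC); the function name `select_pc` and the string instruction codes are hypothetical, and discarding the wrongly fetched instructions is not modeled.

```python
JXX, RET = "jXX", "ret"  # hypothetical instruction-code names


def select_pc(pred_pc, M_icode, M_Bch, M_valA, W_icode, W_valM):
    """Pick the PC to fetch from, overriding the prediction when it was wrong."""
    if M_icode == JXX and not M_Bch:
        # Jump in memory stage turned out not taken: valA carries the
        # fall-through address.
        return M_valA
    if W_icode == RET:
        # ret in write-back stage: the return address was read from memory.
        return W_valM
    # Otherwise trust the predicted PC.
    return pred_pc
```

Because the jump is resolved in the memory stage and ret in the write-back stage, two and three instructions respectively have already entered the pipeline by the time the correct PC is selected.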
Branch Misprediction Example • Should only execute first 7 instructions
demo-j.ys
0x000:    xorl %eax,%eax
0x002:    jne t              # Not taken
0x007:    irmovl $1, %eax    # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $3, %edx    # Target (Should not execute)
0x017:    irmovl $4, %ecx    # Should not execute
0x01d:    irmovl $5, %edx    # Should not execute
Branch Misprediction Trace • Incorrectly execute two instructions at branch target
Return Example
demo-ret.ys
0x000:    irmovl Stack,%esp  # Initialize stack pointer
0x006:    nop                # Avoid hazard on %esp
0x007:    nop
0x008:    nop
0x009:    call p             # Procedure call
0x00e:    irmovl $5,%esi     # Return point
0x014:    halt
0x020: .pos 0x20
0x020: p: <op>               # procedure
0x021:    <op>
0x022:    <op>
0x023:    ret
0x024:    irmovl $1,%eax     # Should not be executed
0x02a:    irmovl $2,%ecx     # Should not be executed
0x030:    irmovl $3,%edx     # Should not be executed
0x036:    irmovl $4,%ebx     # Should not be executed
0x100: .pos 0x100
0x100: Stack:                # Stack: Stack pointer
Incorrect Return Example • Incorrectly execute 3 instructions following ret
Pipeline Summary Concept • Break instruction execution into 5 stages • Run instructions through in pipelined mode Limitations • Can’t handle dependencies between instructions when instructions follow too closely • Data dependencies • One instruction writes register, later one reads it • Control dependency • Instruction sets PC in way that pipeline did not predict correctly • Mispredicted branch and return
The problem is hazards Make the pipelined processor work! Data Hazards • Instruction having register R as source follows shortly after instruction having register R as destination • Common condition, don’t want to slow down pipeline Control Hazards • Mispredict conditional branch • Our design predicts all branches as being taken • Naïve pipeline executes two extra instructions • Getting return address for ret instruction • Naïve pipeline executes three extra instructions Making Sure It Really Works • What if multiple special cases happen simultaneously?
How do we fix the Pipeline? Pad the program with NOPs • Yuck! Stall the pipeline • Data hazards • Wait for producing instruction to complete • Then proceed with consuming instruction • Control hazards • Wait until new PC has been determined • Then begin fetching Forward data within the pipeline • Grab the result from somewhere in the pipe • After it has been computed • But before it has been written back
Stalling for Data Dependencies • If instruction follows too closely after one that writes register, slow it down • Hold instruction in decode • Dynamically inject nop into execute stage
# demo-h2.ys              1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
       bubble                               E  M  W
0x00e: addl %edx,%eax                 F  D  D  E  M  W
0x010: halt                              F  F  D  E  M  W
Stall Condition Source Registers • srcA and srcB of current instruction in decode stage Destination Registers • dstE and dstM fields • Instructions in execute, memory, and write-back stages Special Case • Don’t stall for register ID 8 • Indicates absence of register operand [Figure: pipeline datapath with pipeline registers F, D, E, M, and W and signals d_srcA and d_srcB]
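The stall condition can be written as a short predicate. This is a minimal sketch, not the actual control logic: the function name `must_stall` and the `pending_dsts` parameter are hypothetical, and the register numbers in the usage note assume the standard Y86 encoding (%eax = 0, %edx = 2).

```python
RNONE = 8  # register ID 8: indicates absence of a register operand


def must_stall(d_srcA, d_srcB, pending_dsts):
    """Decide whether the instruction in decode must be held back.

    pending_dsts holds the dstE and dstM fields of the instructions
    currently in the execute, memory, and write-back stages.
    """
    # Ignore operands and destinations that name no register.
    sources = {r for r in (d_srcA, d_srcB) if r != RNONE}
    dests = {r for r in pending_dsts if r != RNONE}
    # Stall when any source is still waiting to be written.
    return bool(sources & dests)
```

For example, with `d_srcA = 2`, `d_srcB = 0`, and a write-back stage whose dstE is 0, the predicate is true, so the instruction in decode is held for another cycle.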
Detecting Stall Condition
# demo-h2.ys              1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
0x00c: nop                      F  D  E  M  W
0x00d: nop                         F  D  E  M  W
       bubble                               E  M  W
0x00e: addl %edx,%eax                 F  D  D  E  M  W
0x010: halt                              F  F  D  E  M  W
Cycle 6 • W: W_dstE = %eax, W_valE = 3 • D: srcA = %edx, srcB = %eax
Stalling X3
# demo-h0.ys              1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx    F  D  E  M  W
0x006: irmovl $3,%eax        F  D  E  M  W
       bubble                         E  M  W
       bubble                            E  M  W
       bubble                               E  M  W
0x00c: addl %edx,%eax           F  D  D  D  D  E  M  W
0x00e: halt                        F  F  F  F  D  E  M  W
Cycle 4 • E: E_dstE = %eax • D: srcA = %edx, srcB = %eax
Cycle 5 • M: M_dstE = %eax • D: srcA = %edx, srcB = %eax
Cycle 6 • W: W_dstE = %eax • D: srcA = %edx, srcB = %eax
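The number of bubbles in these traces follows from simple cycle arithmetic. The sketch below assumes a 5-stage pipeline with no forwarding, a register write that becomes visible in the cycle after write-back, and no earlier stalls delaying either instruction; the function name `stall_cycles` is hypothetical.

```python
def stall_cycles(producer_fetch, consumer_fetch):
    """Bubbles needed before the consumer can decode with the correct value.

    Both arguments are the cycle numbers in which each instruction is fetched.
    """
    writeback = producer_fetch + 4      # F, D, E, M, W
    first_ok_decode = writeback + 1     # value readable the cycle after W
    natural_decode = consumer_fetch + 1  # decode follows fetch
    return max(0, first_ok_decode - natural_decode)
```

In demo-h0, irmovl $3,%eax is fetched in cycle 2 and addl in cycle 3, which gives three bubbles; in demo-h2 the two nops push the addl fetch to cycle 5, leaving one bubble.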
What Happens When Stalling? • Stalling instruction held back in decode stage • Following instruction stays in fetch stage • Bubbles injected into execute stage • Like dynamically generated nop’s • Move through later stages
# demo-h0.ys
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: addl %edx,%eax
0x00e: halt
             Cycle 4      Cycle 5      Cycle 6      Cycle 7      Cycle 8
Write Back                0x000        0x006        bubble       bubble
Memory       0x000        0x006        bubble       bubble       bubble
Execute      0x006        bubble       bubble       bubble       0x00c
Decode       0x00c        0x00c        0x00c        0x00c        0x00e
Fetch        0x00e        0x00e        0x00e        0x00e
Implementing Stalling Pipeline Control • Combinational logic detects stall condition • Sets mode signals for how pipeline registers should update
Pipeline Register Modes
Normal (stall = 0, bubble = 0) • Input = y, Output = x • After rising clock: Output = y
Stall (stall = 1, bubble = 0) • Input = y, Output = x • After rising clock: Output = x
Bubble (stall = 0, bubble = 1) • Input = y, Output = x • After rising clock: Output = nop
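The three register modes can be modeled with a small class. This is a sketch only: the class name `PipeRegister` and the method `clock` are hypothetical, and treating stall = 1 with bubble = 1 as an error is an assumption, since the slide shows only the other three combinations.

```python
NOP = "nop"  # stand-in for the state that represents a nop


class PipeRegister:
    """Pipeline register with normal, stall, and bubble update modes."""

    def __init__(self, state=NOP):
        self.output = state

    def clock(self, value_in, stall=False, bubble=False):
        """Apply one rising clock edge and return the new output."""
        if stall and bubble:
            # Assumption: this combination is not covered by the slide.
            raise ValueError("stall and bubble must not both be set")
        if stall:
            pass                    # Stall: keep the current state
        elif bubble:
            self.output = NOP       # Bubble: replace the state with a nop
        else:
            self.output = value_in  # Normal: load the input
        return self.output
```

Holding an instruction in decode then corresponds to stalling the D register while bubbling the E register in the same cycle.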
Summary Today • Data hazards (read after write) • Control hazards (branch, return) • Mitigating hazards through stalling Next Time • Hazard mitigation through pipeline forwarding • Hardware support for forwarding • Forwarding to mitigate control (branch) hazards