Computer Architecture: A Constructive Approach
Branch Direction Prediction – Six Stage Pipeline

Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
http://csg.csail.mit.edu/6.S078

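NA pred with decode feedback
[Figure: six-stage pipeline (Fetch F, Decode D, RegRead R, Execute X, Memory M, Write-back W) connected by the pipeline FIFOs fr, dr, rr, xr, mr; feedback paths xf (from Execute) and df (from Decode) return to Fetch, which contains the Next Address Prediction unit.]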

Decode-detected mispredicts
Can we do better than PC+4?
• Non-branch
  • When nextPC != PC+4 => use PC+4
• Unconditional branch, target known at decode
  • When nextPC != known target => use known target
• Conditional branch
  • When nextPC is neither PC+4 nor the decoded target => use PC+4
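The three decode-time checks above can be modeled in a few lines of Python. This is a behavioral sketch, not the lecture's hardware: the names BrClass and decode_redirect are hypothetical, and an OTHER class is assumed for instructions (such as indirect jumps) whose next address decode cannot check.

```python
from enum import Enum, auto


class BrClass(Enum):
    NON_BRANCH = auto()
    UNCOND_KNOWN = auto()  # unconditional, target computable at decode
    COND_BRANCH = auto()
    OTHER = auto()         # assumed: e.g. indirect jump, decode cannot check it


def decode_redirect(br_class, pc, pred_next, known_target=None):
    """Return the PC decode should redirect fetch to, or None to keep
    fetch's next-address prediction pred_next."""
    pc_plus4 = pc + 4
    if br_class is BrClass.NON_BRANCH:
        fixed = pc_plus4
    elif br_class is BrClass.UNCOND_KNOWN:
        fixed = known_target
    elif br_class is BrClass.COND_BRANCH:
        # Without a direction predictor either outcome is plausible, so only
        # an address that is neither PC+4 nor the target gets corrected.
        fixed = pred_next if pred_next in (pc_plus4, known_target) else pc_plus4
    else:
        fixed = pred_next
    return None if fixed == pred_next else fixed
```

For a conditional branch, decode leaves both plausible predictions alone; it only replaces an impossible address with PC+4.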

Dynamic Branch Prediction
• Branch direction prediction:
  • Learn and predict the direction a branch will go
• Standard prediction principles:
  • Temporal correlation
    • The way a branch resolves may be a good predictor of the way it will resolve at its next execution
  • Spatial correlation
    • Several branches may resolve in a highly correlated manner (a preferred path of execution)

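One-bit predictor
[Figure: k bits of the fetch PC (above its two low-order 00 bits) index a 2^k-entry branch history table (BHT) with 1 bit per entry, which supplies Taken/¬Taken in Fetch. In parallel the I-Cache returns the instruction; in Decode its opcode tells whether it is a branch, and its offset is added to the PC to form the target PC.]
Predict that the branch will go the same direction it went last time.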

One-bit predictor

// Interface
interface DirectionPred;
  method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
  method Action train(DirInfo dirInfo, Bool taken);
endinterface

// Feedback information
typedef 64 BPRows;
typedef Bit#(TLog#(BPRows)) DirLineIndex;
typedef DirLineIndex DirInfo;

One-bit predictor (continued)
• Array of prediction bits
• Return the prediction saved in the array
• Update the array with the last actual behavior
When should we train?

module mkDirectionPredictor(DirectionPred);
  RegFile#(DirLineIndex, Bool) dirArray <- mkRegFileFull();

  method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
    DirLineIndex index = truncate(addr >> 2);
    return tuple2(dirArray.sub(index), index);
  endmethod

  method Action train(DirInfo dirInfo, Bool taken);
    DirLineIndex index = dirInfo;
    dirArray.upd(index, taken);
  endmethod
endmodule
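A Python behavioral model of the one-bit predictor may help make the indexing concrete. It is a sketch under stated assumptions: the class name OneBitPredictor is hypothetical, the table is assumed to start out predicting not taken (the slide does not specify the RegFile's initial contents), and the index is the address with its two low-order bits dropped, as in predict above.

```python
class OneBitPredictor:
    """Behavioral model of a one-bit direction predictor (hypothetical name)."""

    def __init__(self, rows=64):             # rows plays the role of BPRows
        assert rows & (rows - 1) == 0, "rows must be a power of two"
        self.rows = rows
        self.bits = [False] * rows            # assumed reset: predict not taken

    def predict(self, addr):
        index = (addr >> 2) & (self.rows - 1)  # drop byte offset, keep low bits
        return self.bits[index], index         # (prediction, training info)

    def train(self, index, taken):
        self.bits[index] = taken               # remember the last outcome
```

Because only log2(64) = 6 index bits are used, branches 256 bytes apart (for example at 0x40 and 0x140) share one entry and can disturb each other's predictions.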

Two-bit Predictor (Smith, 1981)
How well does the one-bit predictor do on short-trip-count loops?
• Assume 2 direction prediction bits per instruction
• Implement using a saturating counter
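To answer the question for one concrete case, the Python sketch below (all helper names are hypothetical) runs a loop-closing branch with a trip count of 4, visited 10 times, through a one-bit predictor and a two-bit saturating counter. The one-bit predictor mispredicts twice per visit (at the loop exit, and again on the first iteration of the next visit), while the two-bit counter, which sits at 2 right after an exit, only mispredicts the exit.

```python
def count_mispredicts(outcomes, predict, update, state):
    """Run one branch's outcome sequence through a predictor; count misses."""
    misses = 0
    for taken in outcomes:
        if predict(state) != taken:
            misses += 1
        state = update(state, taken)
    return misses


# One-bit predictor: the state is simply the last outcome.
def one_bit_predict(last):
    return last

def one_bit_update(last, taken):
    return taken

# Two-bit predictor: the state is a saturating counter in 0..3.
def two_bit_predict(counter):
    return counter >= 2              # high bit set

def two_bit_update(counter, taken):
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


TRIP_COUNT = 4
VISITS = 10
# Loop-closing branch: taken on every iteration except the last.
outcomes = ([True] * (TRIP_COUNT - 1) + [False]) * VISITS

# Start each predictor in the state it holds just after a loop exit.
print(count_mispredicts(outcomes, one_bit_predict, one_bit_update, False))  # 20
print(count_mispredicts(outcomes, two_bit_predict, two_bit_update, 2))      # 10
```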

Saturating Counter
How do we determine the prediction from the counter?

typedef Bit#(2) Counter;

function Counter updateCounter(Bool dir, Counter counter);
  return dir ? saturatingInc(counter) : saturatingDec(counter);
endfunction

function Counter saturatingInc(Counter counter);
  let plusOne = counter + 1;
  // plusOne wraps to 0 only when counter is already at its maximum
  return (plusOne == 0) ? counter : plusOne;
endfunction

function Counter saturatingDec(Counter counter);
  return (counter == 0) ? 0 : counter - 1;
endfunction
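The same counter logic as a Python sketch, assuming a 2-bit counter width; the names mirror the BSV functions but are otherwise hypothetical. It also answers the slide's question with the convention the two-bit predictor uses: predict taken when the counter's high bit is set (values 2 and 3).

```python
COUNTER_BITS = 2
COUNTER_MAX = (1 << COUNTER_BITS) - 1     # 3 for a 2-bit counter


def saturating_inc(counter):
    plus_one = (counter + 1) & COUNTER_MAX   # wraps to 0 on overflow, like Bit#(2)
    return counter if plus_one == 0 else plus_one


def saturating_dec(counter):
    return 0 if counter == 0 else counter - 1


def update_counter(taken, counter):
    return saturating_inc(counter) if taken else saturating_dec(counter)


def predict_taken(counter):
    # The prediction is the counter's most significant bit.
    return ((counter >> (COUNTER_BITS - 1)) & 1) == 1
```

The two middle states give hysteresis: a strongly taken counter (3) needs two not-taken outcomes before the prediction flips.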

Two-bit predictor
[Figure: k bits of the fetch PC (above its two low-order 00 bits) index a 2^k-entry BHT whose entries are 2-bit saturating counters; the selected counter supplies Taken/¬Taken.]

Two-bit predictor
• Feedback state for training

typedef 64 BPRows;
typedef Bit#(TLog#(BPRows)) DirLineIndex;

// DirInfo data
typedef struct {
  DirLineIndex index;
  Counter counter;
} DirInfo deriving (Bits, Eq);

module mkDirectionPredictor(DirectionPred);
  // Direction predictor state
  RegFile#(DirLineIndex, Counter) cntArray <- mkRegFileFull();

Two-bit predictor (continued)
• Training information is the index and the counter
• Prediction is the high bit of the counter
• Train by updating the counter

  method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
    DirInfo info = ?;
    info.index = truncate(addr >> 2);
    info.counter = cntArray.sub(info.index);
    Bit#(1) hi = truncate(info.counter >> 1);  // high bit of the counter
    Bool taken = (hi == 1);
    return tuple2(taken, info);
  endmethod

  method Action train(DirInfo info, Bool taken);
    cntArray.upd(info.index, updateCounter(taken, info.counter));
  endmethod
endmodule
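A Python behavioral model of the two-bit predictor, including the DirInfo feedback record. It is a sketch: DirInfo and TwoBitPredictor are hypothetical Python names, counters are assumed to reset to 0, and the update is written as a clamp to 0..3 rather than with the BSV helper functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirInfo:
    index: int     # BHT row used for the prediction
    counter: int   # counter value read at prediction time


def _update(taken, counter):
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


class TwoBitPredictor:
    def __init__(self, rows=64):
        assert rows & (rows - 1) == 0, "rows must be a power of two"
        self.rows = rows
        self.cnt = [0] * rows                   # assumed reset value

    def predict(self, addr):
        index = (addr >> 2) & (self.rows - 1)
        info = DirInfo(index, self.cnt[index])
        return info.counter >= 2, info          # high bit of a 2-bit counter

    def train(self, info, taken):
        # As on the slide, the new value is computed from the counter carried
        # in the feedback, not from the array's current contents.
        self.cnt[info.index] = _update(taken, info.counter)
```

One consequence of training from the carried counter value: if the same branch is predicted twice before either prediction trains, both records hold the same old counter, so the second train overwrites the first and one update is lost.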

Exploiting Spatial Correlation (Yeh and Patt, 1992)

if (x[i] < 7) then y += 1;
if (x[i] < 5) then c -= 4;

If the first condition is false, the second condition is also false (x[i] >= 7 implies x[i] >= 5). This also works well for short-trip-count loops. Implemented with a history register, ‘hist’, that records the direction of the last N branches executed by the processor.

Ghist predictor

typedef 64 BPRows;
typedef Bit#(TLog#(BPRows)) DirLineIndex;
typedef Bit#(2) Counter;

// DirInfo data
typedef struct {
  DirLineIndex hist;
  Counter counter;
} DirInfo deriving (Bits, Eq);

module mkDirectionPredictor(DirectionPred);
  // Direction predictor state
  Reg#(DirLineIndex) hist <- mkReg(0);  // last N = log2(BPRows) = 6 directions
  RegFile#(DirLineIndex, Counter) cntArray <- mkRegFileFull();

Global history predictor
• Calculate the feedback information
• Shift the new prediction into the history register
How good are predictions while waiting for training?

  method ActionValue#(Tuple2#(Bool, DirInfo)) predict(Addr addr);
    DirInfo info = ?;
    info.hist = hist;
    info.counter = cntArray.sub(hist);
    Bit#(1) pred = truncate(info.counter >> 1);        // high bit of the counter
    hist <= truncate((hist << 1) | zeroExtend(pred));  // speculative history update
    return tuple2((pred == 1), info);
  endmethod
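A Python behavioral model of predict and train for the global-history predictor. It is a sketch: GhistPredictor is a hypothetical name, the feedback info is a plain (hist, counter) tuple, and counters are assumed to reset to 0. It shows the key point of the slide: the history is updated with the *predicted* direction at prediction time, long before train() sees the real outcome.

```python
class GhistPredictor:
    """2-bit counters indexed only by the last hist_bits branch directions."""

    def __init__(self, hist_bits=6):            # 6 bits <-> BPRows = 64
        self.mask = (1 << hist_bits) - 1
        self.hist = 0
        self.cnt = [0] * (1 << hist_bits)       # assumed reset value

    def predict(self, addr):
        # addr is unused, as in the BSV predict method above.
        info = (self.hist, self.cnt[self.hist])  # (hist, counter) feedback
        pred = info[1] >> 1                      # high bit of the 2-bit counter
        # Speculative update: shift in the predicted direction now.
        self.hist = ((self.hist << 1) | pred) & self.mask
        return pred == 1, info

    def train(self, info, taken):
        hist, counter = info
        self.cnt[hist] = min(counter + 1, 3) if taken else max(counter - 1, 0)
```

Until the older branches train, every later prediction is indexed by a history built partly from guesses; if one of those guesses was wrong, the indices that follow are wrong too, which is what the repair method on the next slide addresses.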

Global history predictor
• Restore the history to the state it would be in after the desired prediction
What is the state of ‘hist’ after redirects from decode and execute?

  method Action train(DirInfo info, Bool taken);
    cntArray.upd(info.hist, updateCounter(taken, info.counter));
  endmethod

  // repair must also be declared in the DirectionPred interface
  method Action repair(DirInfo info, Bool taken);
    hist <= truncate((info.hist << 1) | zeroExtend(pack(taken)));
  endmethod
endmodule
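A small Python walk-through of repair, assuming a 6-bit history; shift_in and the variable names are hypothetical. Three branches are predicted back to back, each saving the history it saw, then the oldest one turns out to be mispredicted: repair rebuilds hist from that branch's saved history plus its actual direction, discarding the bits shifted in by the wrong-path branches behind it.

```python
HIST_BITS = 6                       # log2(BPRows) for BPRows = 64
MASK = (1 << HIST_BITS) - 1


def shift_in(hist, taken):
    """Shift one branch direction into a HIST_BITS-wide global history."""
    return ((hist << 1) | int(taken)) & MASK


# Fetch predicts three branches in a row. Each prediction saves the history
# it saw (info.hist) and then shifts its predicted direction into hist.
hist = 0b000101
saved = []
for predicted_taken in (True, False, True):
    saved.append(hist)
    hist = shift_in(hist, predicted_taken)

# Execute later finds that the FIRST of these branches was actually not taken.
# The two younger branches were fetched down the wrong path, so repair rebuilds
# hist from the first branch's saved history and its actual direction.
hist = shift_in(saved[0], False)
print(bin(hist))                    # 0b1010
```

A decode redirect uses the same repair, but with the predicted direction (decode sends taken: decData.takenPred), since decode trusts the direction prediction.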

NA pred with decode feedback
[Figure: the same six-stage pipeline (F, D, R, X, M, W with FIFOs fr, dr, rr, xr, mr and feedback paths xf and df to Fetch), now with a Direction Prediction unit alongside the Next Address Prediction unit in Fetch.]

Direction prediction recipe
• Execute
  • Send redirects on mispredicts (unchanged)
  • Send direction prediction training
• Decode
  • Check if the next address matches the direction prediction
  • Send a redirect if different
• Fetch
  • Generate prediction
  • Learn from feedback
  • Accept redirects from later stages

Add direction feedback
• Feedback needs the information required to train the direction predictor

typedef struct {
  Bool correct;
  NaInfo naPredInfo;
  Addr nextAddr;
  DirInfo dirPredInfo;
  Bool taken;
} Feedback deriving (Bits, Eq);

FIFOF#(Tuple3#(Epoch, Epoch, Feedback)) decFeedback <- mkFIFOF;
FIFOF#(Tuple2#(Epoch, Feedback)) execFeedback <- mkFIFOF;

Execute (branch analysis)
• Recall: nextAddrPred may have been set in decode
• Always send feedback

// after executing instruction...
let nextEeEpoch = eeEpoch;
let cond = execData.execInst.cond;
let nextPc = cond ? execData.execInst.addr : execData.pc + 4;
if (nextPc != execData.nextAddrPred)
  nextEeEpoch = nextEeEpoch + 1;   // mispredict: start a new exec epoch
eeEpoch <= nextEeEpoch;
execFeedback.enq(tuple2(nextEeEpoch,
  Feedback{correct: (nextPc == execData.nextAddrPred),
           taken: cond,
           dirPredInfo: execData.dirPredInfo,
           naPredInfo: execData.naPredInfo,
           nextAddr: nextPc}));
// enqueue instruction to next stage
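The execute-stage bookkeeping above can be sketched in Python. This is a behavioral model under stated assumptions: execute_feedback is a hypothetical name, the predictor info fields are omitted, non-branches are assumed to arrive with cond False, and epochs are unbounded integers even though the hardware Epoch type has a fixed width.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    correct: bool     # was the next-address prediction right?
    taken: bool       # resolved branch direction
    next_addr: int    # actual address of the next instruction


def execute_feedback(pc, cond, target, next_addr_pred, ee_epoch):
    """Resolve one instruction in execute.

    Returns (epoch tag, Feedback); the tag is also the new value of
    execute's ee_epoch register."""
    next_pc = target if cond else pc + 4
    correct = next_pc == next_addr_pred
    next_ee_epoch = ee_epoch if correct else ee_epoch + 1
    # Feedback is sent even when correct, so the predictors can train.
    return next_ee_epoch, Feedback(correct=correct, taken=cond, next_addr=next_pc)
```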

Decode with mispredict detect
• Determine whether the incoming instruction is on the good path: it is if it carries
  • a new exec epoch (execute has redirected fetch since decode last saw it), or
  • the same dec epoch as decode's current one

rule doDecode;
  let decData = newDecData(fr.first);
  let correctPath = (decData.execEpoch != deEpoch) ||
                    (decData.decEpoch == ddEpoch);
  let instResp = decData.fInst.instResp;
  let pcPlus4 = decData.pc + 4;
  if (correctPath) begin
    decData.decInst = decode(instResp, pcPlus4);
    let target = knownTargetAddr(decData.decInst);
    let brClass = getBrClass(decData.decInst);
    let predTarget = decData.nextAddrPred;
    let predDir = decData.takenPred;

Decode with mispredict detect
• Calculate the target as best decode can
• Wrong next addr?
  • New dec epoch
  • Tell exec the address of the next instruction!
  • Send feedback
• Enqueue to next stage on correct path

    let decodedTarget = case (brClass)
                          NonBranch:   pcPlus4;
                          UncondKnown: target;
                          CondBranch:  (predDir ? target : pcPlus4);
                          default:     decData.nextAddrPred;
                        endcase;
    if (decodedTarget != predTarget) begin
      decData.decEpoch = decData.decEpoch + 1;
      decData.nextAddrPred = decodedTarget;
      decFeedback.enq(tuple3(decData.execEpoch, decData.decEpoch,
        Feedback{correct: False,
                 naPredInfo: decData.naPredInfo,
                 nextAddr: decodedTarget,
                 dirPredInfo: decData.dirPredInfo,
                 taken: decData.takenPred}));
    end
    dr.enq(decData);
  end // of correct path
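With a direction predictor, decode's rule for conditional branches changes: the decoded target now follows the predicted direction, so a next-address prediction of PC+4 is overridden when the branch is predicted taken, and vice versa. The Python sketch below models this check; BrClass and decode_check are hypothetical names, and epochs are modeled as unbounded integers although the hardware Epoch type has a fixed width.

```python
from enum import Enum, auto


class BrClass(Enum):
    NON_BRANCH = auto()
    UNCOND_KNOWN = auto()
    COND_BRANCH = auto()
    OTHER = auto()


def decode_check(br_class, pc, known_target, pred_next, pred_taken, dec_epoch):
    """Return (dec_epoch, next_addr_pred, redirect_pc) after decode's check.
    redirect_pc is None when fetch's next-address prediction is kept."""
    pc_plus4 = pc + 4
    if br_class is BrClass.NON_BRANCH:
        decoded = pc_plus4
    elif br_class is BrClass.UNCOND_KNOWN:
        decoded = known_target
    elif br_class is BrClass.COND_BRANCH:
        # Follow the direction predictor, not the next-address predictor.
        decoded = known_target if pred_taken else pc_plus4
    else:
        decoded = pred_next
    if decoded != pred_next:
        # Mispredict: bump the dec epoch and tell later stages the new address.
        return dec_epoch + 1, decoded, decoded
    return dec_epoch, pred_next, None
```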

Decode with mispredict detect
• Preserve the current epoch if the instruction is on the incorrect path
• decData.*Epoch have been set properly, so we always save them

  else begin // incorrect path
    decData.decEpoch = ddEpoch;
    decData.execEpoch = deEpoch;
  end
  ddEpoch <= decData.decEpoch;
  deEpoch <= decData.execEpoch;
  fr.deq;
endrule
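Decode's epoch bookkeeping across the three slides can be condensed into one Python function. It is a behavioral sketch: decode_epochs is a hypothetical name, redirect stands for "decode's target check failed" (only meaningful on the correct path), and epochs are unbounded integers rather than fixed-width registers.

```python
def decode_epochs(inst_exec_epoch, inst_dec_epoch, de_epoch, dd_epoch, redirect):
    """Model decode's epoch bookkeeping for one instruction.

    inst_*_epoch: epochs tagged on the instruction by fetch.
    de_epoch, dd_epoch: decode's saved exec and dec epochs.
    Returns (on_correct_path, new de_epoch, new dd_epoch)."""
    correct_path = inst_exec_epoch != de_epoch or inst_dec_epoch == dd_epoch
    if correct_path:
        exec_epoch = inst_exec_epoch
        dec_epoch = inst_dec_epoch + 1 if redirect else inst_dec_epoch
    else:
        # Wrong-path instruction: drop it and keep decode's current epochs.
        exec_epoch, dec_epoch = de_epoch, dd_epoch
    return correct_path, exec_epoch, dec_epoch
```

Note the case where execute has redirected fetch: the instruction's new exec epoch makes it correct-path even if its dec epoch is stale, and decode then adopts the instruction's epochs.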

Handling redirect from execute
• Train and repair on redirect
• Just train on correct prediction

if (execFeedback.notEmpty) begin
  match {.execEpoch, .fb} = execFeedback.first;
  execFeedback.deq;
  if (!fb.correct) begin
    // mispredict: repair speculative state, train, and redirect fetch
    dirPred.repair(fb.dirPredInfo, fb.taken);
    dirPred.train(fb.dirPredInfo, fb.taken);
    naPred.repair(fb.naPredInfo, fb.nextAddr);
    naPred.train(fb.naPredInfo, fb.nextAddr);
    feEpoch <= execEpoch;
    fetchPc <= fb.nextAddr;
  end
  else begin
    // correct prediction: train, then fetch normally
    dirPred.train(fb.dirPredInfo, fb.taken);
    naPred.train(fb.naPredInfo, fb.nextAddr);
    enqInst;
  end
end

Handling redirect from decode
• Just repair; never train on feedback from decode

else if (decFeedback.notEmpty) begin
  decFeedback.deq;
  match {.execEpoch, .decEpoch, .fb} = decFeedback.first;
  if (execEpoch == feEpoch) begin // exec epoch unchanged
    if (!fb.correct) begin
      fdEpoch <= decEpoch;
      dirPred.repair(fb.dirPredInfo, fb.taken);
      naPred.repair(fb.naPredInfo, fb.nextAddr);
      fetchPc <= fb.nextAddr;
    end
    else // dec feedback on correct prediction
      enqInst;
  end
  else // dec feedback, but fetch is in a new exec epoch
    enqInst;
end
else // no feedback
  enqInst;
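The fetch-side feedback handling from the last two slides reduces to a small decision function. The Python sketch below is a behavioral model with hypothetical names (ExecFb, DecFb, FetchStep, handle_feedback); it captures the priorities: execute feedback wins, decode feedback is ignored once fetch has moved to a newer exec epoch, and decode feedback repairs but never trains.

```python
from typing import NamedTuple, Optional


class ExecFb(NamedTuple):
    exec_epoch: int
    correct: bool
    next_addr: int


class DecFb(NamedTuple):
    exec_epoch: int
    dec_epoch: int
    correct: bool
    next_addr: int


class FetchStep(NamedTuple):
    redirect_pc: Optional[int]  # None: predict and fetch normally this cycle
    repair: bool                # call repair() on the predictors
    train: bool                 # call train() on the predictors


def handle_feedback(exec_fb: Optional[ExecFb], dec_fb: Optional[DecFb],
                    fe_epoch: int) -> FetchStep:
    if exec_fb is not None:                      # execute feedback has priority
        if not exec_fb.correct:
            return FetchStep(exec_fb.next_addr, repair=True, train=True)
        return FetchStep(None, repair=False, train=True)
    if dec_fb is not None:
        # Only act on decode feedback from fetch's current exec epoch.
        if dec_fb.exec_epoch == fe_epoch and not dec_fb.correct:
            return FetchStep(dec_fb.next_addr, repair=True, train=False)
    return FetchStep(None, repair=False, train=False)
```

Decode only trusts the direction prediction, so letting it train would reinforce guesses rather than outcomes; training is left to execute, which sees the resolved branch.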
