Computer Architecture: A Constructive Approach Branch Direction Prediction – Pipeline Integration

Joel Emer, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology

Presentation Transcript


  1. Computer Architecture: A Constructive Approach. Branch Direction Prediction – Pipeline Integration. Joel Emer, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology. http://csg.csail.mit.edu/6.S078

  2. NA pred with decode feedback
     [Pipeline diagram: stages Fetch (F), Decode (D), RegRead (R), Execute (X), Memory (M), Write-back (W), connected by FIFOs fr, dr, rr, xr, mr; feedback paths xf and df return to Fetch. Predictor blocks: NextAddressPrediction, DirectionPrediction.]

  3. Direction prediction recipe
     • Execute
       • Send redirects on mispredicts (unchanged)
       • Send direction prediction training
     • Decode
       • Check if next address matches direction pred
       • Send redirect if different (update naPred)
     • Fetch
       • Generate prediction
       • Learn from feedback
       • Accept redirects from later stages

  4. Epoch management recipe
     • Execute
       • On exec epoch mismatch: poison instruction
       • Otherwise, on mispredict: change exec epoch and redirect
     • Decode
       • On new exec epoch: update local exec/decode epochs
       • Otherwise, on decode epoch mismatch: drop instruction
       • If not dropped, on next addr mispredict: change decode epoch and redirect
     • Fetch
       • On exec redirect: update local exec epoch
       • On decode redirect: if for current exec epoch, then update local decode epoch
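The Execute part of this recipe can be modelled in a few lines of software. The following is a minimal Python sketch, not the Bluespec from the slides: the function name `execute_step` and its return convention are hypothetical, and epochs are modelled as unbounded integers (a hardware epoch would be a small wrapping counter).

```python
def execute_step(ee_epoch, inst_exec_epoch, predicted_next, actual_next):
    """Model the Execute-stage epoch rule (hypothetical helper).

    Returns (new_ee_epoch, poisoned, redirect_target), where
    redirect_target is None when no redirect is sent.
    """
    if inst_exec_epoch != ee_epoch:
        # Instruction was fetched before the last redirect: poison it.
        return ee_epoch, True, None
    if predicted_next != actual_next:
        # Mispredict: start a new exec epoch and redirect fetch.
        return ee_epoch + 1, False, actual_next
    # Correct prediction on the current epoch: nothing to do.
    return ee_epoch, False, None
```

Instructions that were already in flight when the epoch changed still carry the old epoch, so they take the first branch and are poisoned rather than triggering further redirects.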

  5. Add direction feedback
     • Feedback needs information for training direction predictor
     • decFeedback carries the execute epoch and the decode epoch; execFeedback carries the execute epoch

         typedef struct {
           Bool    correct;
           NaInfo  naPredInfo;
           Addr    nextAddr;
           DirInfo dirPredInfo;
           Bool    taken;
         } Feedback deriving (Bits, Eq);

         // (execute epoch, decode epoch, feedback)
         FIFOF#(Tuple3#(Epoch,Epoch,Feedback)) decFeedback <- mkFIFOF;
         // (execute epoch, feedback)
         FIFOF#(Tuple2#(Epoch,Feedback)) execFeedback <- mkFIFOF;

  6. Execute (branch analysis)
     • Note: the next-address prediction may have been reset in decode
     • Always send feedback

         // after executing instruction...
         let nextEeEpoch = eeEpoch;
         let cond = execData.execInst.cond;
         let nextPc = cond ? execData.execInst.addr : execData.pc+4;
         // nextAddrPred may have been reset in decode
         let correctPred = (nextPc == execData.nextAddrPred);
         if (!correctPred) nextEeEpoch = nextEeEpoch + 1;
         eeEpoch <= nextEeEpoch;
         // always send feedback
         execFeedback.enq(tuple2(nextEeEpoch,
           Feedback{correct: correctPred,
                    taken: cond,
                    dirPredInfo: execData.dirPredInfo,
                    naPredInfo: execData.naPredInfo,
                    nextAddr: nextPc}));
         // enqueue instruction to next stage

  7. Decode with mispredict detect
     • Determine if epoch of incoming instruction is on good path: new exec epoch, or same dec epoch

         rule doDecode;
           let decData = newDecData(fr.first);
           let correctPath = (decData.execEpoch != deEpoch)   // new exec epoch
                          || (decData.decEpoch == ddEpoch);   // same dec epoch
           let instResp = decData.fInst.instResp;
           let pcPlus4 = decData.pc+4;
           if (correctPath) begin
             decData.decInst = decode(instResp, pcPlus4);
             let target = knownTargetAddr(decData.decInst);
             let brClass = getBrClass(decData.decInst);
             let predTarget = decData.nextAddrPred;
             let predDir = decData.dirPred;

  8. Decode with mispredict detect

             // calculate target as best as decode can
             let decodedTarget = case (brClass)
                 NonBranch:   pcPlus4;
                 UncondKnown: target;
                 CondBranch:  (predDir ? target : pcPlus4);
                 default:     decData.nextAddrPred;
               endcase;
             if (decodedTarget != predTarget) begin          // wrong next addr?
               decData.decEpoch = decData.decEpoch + 1;      // new dec epoch
               decData.nextAddrPred = decodedTarget;         // tell exec addr of next instruction!
               decFeedback.enq(                              // send feedback
                 tuple3(decData.execEpoch, decData.decEpoch,
                        Feedback{correct: False,
                                 naPredInfo: decData.naPredInfo,
                                 nextAddr: decodedTarget,
                                 dirPredInfo: decData.dirPredInfo,
                                 taken: decData.takenPred}));
             end
             dr.enq(decData);                                // enqueue to next stage on correct path
           end // of correct path
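The target selection that decode performs can be checked in isolation with a small software model. This is a Python sketch of the same case analysis, assuming 4-byte instructions as in the slides; the `BrClass` enum, the `OTHER` member, and the function name `decoded_target` are hypothetical stand-ins for the Bluespec types.

```python
from enum import Enum


class BrClass(Enum):
    NON_BRANCH = 0
    UNCOND_KNOWN = 1   # unconditional branch whose target decode can compute
    COND_BRANCH = 2
    OTHER = 3          # e.g. a target decode cannot know; stands for the default arm


def decoded_target(br_class, pc, target, pred_dir, next_addr_pred):
    """Best next address decode can compute for one instruction."""
    pc_plus4 = pc + 4
    if br_class is BrClass.NON_BRANCH:
        return pc_plus4
    if br_class is BrClass.UNCOND_KNOWN:
        return target
    if br_class is BrClass.COND_BRANCH:
        # Follow the direction predictor: taken goes to target, else fall through.
        return target if pred_dir else pc_plus4
    # Decode knows nothing better, so keep the fetch-stage prediction.
    return next_addr_pred


def needs_redirect(br_class, pc, target, pred_dir, next_addr_pred):
    """True when decode disagrees with the next-address prediction."""
    return decoded_target(br_class, pc, target, pred_dir, next_addr_pred) != next_addr_pred
```

For example, a non-branch at `0x100` that the next-address predictor sent to `0x200` yields a redirect to `0x104`, while the default arm never redirects because it returns the existing prediction.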

  9. Decode with mispredict detect
     • Preserve current epoch if instruction on incorrect path
     • decData.*Epoch have been set properly, so we always save them

           else begin // incorrect path
             decData.decEpoch = ddEpoch;
             decData.execEpoch = deEpoch;
           end
           ddEpoch <= decData.decEpoch;
           deEpoch <= decData.execEpoch;
           fr.deq;
         endrule

  10. Integration into Fetch

         rule doFetch();
           function Action enqInst();
             action
               let d <- mem.side(MemReq{op: Ld, addr: fetchPc, data: ?});
               match {.nAddrPred, .naPredInfo} <- naPred.predict(fetchPc);
               match {.dirPred, .dirPredInfo} <- dirPred.predict(fetchPc);
               FBundle fInst = FBundle{instResp: d};
               FData fData = FData{pc: fetchPc, fInst: fInst, inum: iNum, execEpoch: feEpoch,
                                   naPredInfo: naPredInfo, nextAddrPred: nAddrPred,
                                   dirPredInfo: dirPredInfo, dirPred: dirPred};
               iNum <= iNum + 1;
               fetchPc <= nAddrPred;
               fr.enq(fData);
             endaction
           endfunction

  11. Handling redirect from execute
     • Train and repair on redirect
     • Just train on correct prediction

           if (execFeedback.notEmpty) begin
             match {.execEpoch, .fb} = execFeedback.first;
             execFeedback.deq;
             if (!fb.correct) begin
               // redirect: train and repair
               dirPred.repair(fb.dirPredInfo, fb.taken);
               dirPred.train(fb.dirPredInfo, fb.taken);
               naPred.repair(fb.naPredInfo, fb.nextAddr);
               naPred.train(fb.naPredInfo, fb.nextAddr);
               feEpoch <= execEpoch;
               fetchPc <= fb.nextAddr;
             end else begin
               // correct prediction: just train
               dirPred.train(fb.dirPredInfo, fb.taken);
               naPred.train(fb.naPredInfo, fb.nextAddr);
               enqInst;
             end
           end

  12. Handling redirect from decode
     • Just repair; never train on feedback from decode

           else if (decFeedback.notEmpty) begin
             decFeedback.deq;
             match {.execEpoch, .decEpoch, .fb} = decFeedback.first;
             if (execEpoch == feEpoch) begin // epoch unchanged
               if (!fb.correct) begin
                 fdEpoch <= decEpoch;
                 dirPred.repair(fb.dirPredInfo, fb.taken);
                 naPred.repair(fb.naPredInfo, fb.nextAddr);
                 fetchPc <= fb.nextAddr;
               end
               else // dec feedback on correct prediction
                 enqInst;
             end
             else // dec feedback, but fetch is in new exec epoch
               enqInst;
           end
           else // no feedback
             enqInst;

  13. Immediate update issues
     • Note: In the lab code we communicate the branch type of each instruction, so that training and repair can decide, based on instruction type, whether to perform updates.
     • If the direction predictor does not update immediately on predictions, things are easy. But if the predictor does update immediately, we will predict and update the predictor on non-branches.
     • Possible solutions:
       • Move direction prediction to decode, so we know not to update on non-branches. But this makes timing more critical.
       • Simply use the direction predictor even on non-branch instructions.
     • Note: for superscalar issue designs this is a less significant problem.

  14. Predictor Primitive
     [Diagram labels: table P with Width and Depth; inputs Index (I) and Update (U); output Prediction]
     • Indexed table holding values
     • Operations
       • Predict
       • Update
     • Algebraic notation: Prediction = P[Width, Depth](Index; Update)
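The primitive P[Width, Depth](Index; Update) can be read as a small table with two operations. The following Python sketch is one possible software reading of that notation; the class name `PredictorTable`, the modulo indexing, and the choice to pass the update as a function of the old value P are assumptions made for illustration.

```python
class PredictorTable:
    """Software model of P[width, depth]: depth entries of width bits each."""

    def __init__(self, width, depth, init=0):
        self.width = width
        self.depth = depth
        self.mask = (1 << width) - 1
        self.table = [init & self.mask] * depth

    def predict(self, index):
        # Read the entry selected by the index (wrapped to the table depth).
        return self.table[index % self.depth]

    def update(self, index, update_fn):
        # update_fn receives the old value P and returns the new value,
        # which is truncated to the entry width.
        i = index % self.depth
        self.table[i] = update_fn(self.table[i]) & self.mask


table = PredictorTable(width=2, depth=4)
table.update(1, lambda p: p + 1)   # corresponds to an update expression "P + 1"
```

Truncating to the entry width makes an overflowing update wrap around; predictors that need saturation would put the clamp inside `update_fn`.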

  15. One-bit Predictor
     • Simple temporal prediction
     [Diagram labels: 1-bit table P; PC drives Index (I); Taken drives Update (U); output Prediction]
     A21064(PC; T) = P[ 1, 2K ](PC; T)
     • What happens on loop branches? At best, mispredicts twice for every use of loop.
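The loop-branch claim is easy to confirm by simulation. Below is a minimal Python sketch of a single one-bit entry (predict whatever the branch did last time); the function name and the 4-iteration loop are illustrative assumptions.

```python
def one_bit_mispredicts(outcomes, initial=False):
    """Count mispredictions of a one-bit predictor on a list of outcomes."""
    state = initial          # last observed direction
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken        # remember only the most recent outcome
    return misses


# Back-edge of a 4-iteration loop: taken three times, then falls through.
loop = [True, True, True, False]
three_uses = one_bit_mispredicts(loop * 3)   # 6, i.e. two mispredicts per use
```

Each use of the loop costs one miss on exit and another on re-entry, because the exit overwrote the single bit with "not taken".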

  16. Two-bit Predictor
     [Diagram labels: 2-bit table P; PC drives Index (I); Taken drives a +/- adder feeding Update (U); output Prediction]
     Counter[W,D](I; T) = P[W, D](I; if T then P+1 else P-1)
     A21164(PC; T) = MSB(Counter[2, 2K](PC; T))
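A two-bit counter can be simulated in the same way. The Python sketch below models one counter entry and predicts with its most significant bit; saturation at 0 and 3 is an assumption (it is the usual behaviour for such counters, though the formula above does not state it), as is the initial value.

```python
def two_bit_mispredicts(outcomes, initial=2):
    """Count mispredictions of one saturating two-bit counter."""
    counter = initial
    misses = 0
    for taken in outcomes:
        prediction = counter >= 2          # MSB of the two-bit counter
        if prediction != taken:
            misses += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses


loop = [True, True, True, False]   # back-edge of a 4-iteration loop
three_uses = two_bit_mispredicts(loop * 3)   # 3, i.e. one mispredict per use
```

The loop exit only moves the counter from 3 to 2, so the prediction on re-entry is still "taken" and only the exit itself is mispredicted.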

  17. History Register
     [Diagram labels: table P; PC drives Index (I); Taken is concatenated with the old value to form Update (U); output History]
     History(PC, T) = P(PC; P || T)
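The update P || T is a shift register: the newest outcome is appended and the oldest bit falls off. A minimal Python sketch follows, assuming the newest outcome occupies the least significant bit and that the PC is wrapped to the table depth; the class name `HistoryTable` is hypothetical.

```python
class HistoryTable:
    """Per-index branch history registers, each `width` bits wide."""

    def __init__(self, width, depth):
        self.width = width
        self.entries = [0] * depth

    def read(self, pc):
        return self.entries[pc % len(self.entries)]

    def update(self, pc, taken):
        i = pc % len(self.entries)
        # Shift in the new outcome and truncate to the register width.
        shifted = (self.entries[i] << 1) | int(taken)
        self.entries[i] = shifted & ((1 << self.width) - 1)


hist = HistoryTable(width=3, depth=8)
for outcome in [True, False, True, True]:
    hist.update(5, outcome)
# hist.read(5) is now 0b011: the three most recent outcomes F, T, T
```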

  18. Global History
     [Diagram labels: 0, Global History, Concat, +/-, Taken, Prediction]
     GHist(;T) = MSB(Counter(History(0, T); T))
     Ind-Ghist(PC;T) = MSB(Counter(PC || Hist(GHist(;T);T)))
     • Can we take advantage of a pattern at a particular PC?

  19. Local History
     [Diagram labels: PC, Local History, Concat, +/-, Taken, Prediction]
     LHist(PC, T) = MSB(Counter(History(PC; T); T))
     • Can we take advantage of the global pattern at a particular PC?
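A local-history predictor chains the two earlier pieces: the PC selects a history register, and that history selects a counter. The Python sketch below is a behavioural model under stated assumptions (table sizes, saturating two-bit counters initialised to weakly not-taken, modulo indexing); the class and helper names are hypothetical.

```python
class LocalHistoryPredictor:
    """LHist: per-PC history register indexing a table of 2-bit counters."""

    def __init__(self, hist_bits=4, pc_entries=16):
        self.hist_bits = hist_bits
        self.histories = [0] * pc_entries
        self.counters = [1] * (1 << hist_bits)   # weakly not taken

    def predict(self, pc):
        history = self.histories[pc % len(self.histories)]
        return self.counters[history] >= 2       # MSB of the counter

    def update(self, pc, taken):
        i = pc % len(self.histories)
        history = self.histories[i]
        counter = self.counters[history]
        self.counters[history] = min(3, counter + 1) if taken else max(0, counter - 1)
        self.histories[i] = ((history << 1) | int(taken)) & ((1 << self.hist_bits) - 1)


def run(predictor, pc, outcomes):
    """Return a list of booleans, True where the prediction was wrong."""
    misses = []
    for taken in outcomes:
        misses.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return misses


# A branch that repeats taken, taken, not-taken is learned after a short warm-up.
misses = run(LocalHistoryPredictor(), pc=0x40, outcomes=[True, True, False] * 50)
```

With four bits of history, each point in the repeating pattern produces a distinct history value, so every counter sees only one outcome and the steady state has no mispredictions.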

  20. Two-level Predictor
     [Diagram labels: PC, Global History, 0, Concat, Concat, +/-, Taken, Prediction]
     2Level(PC, T) = MSB(Counter(History(0; T)||PC; T))

  21. Two-Level Branch Predictor
     • Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct)
     • 2-bit global branch history shift register
     • Shift in Taken/¬Taken results of each branch
     [Diagram labels: Fetch PC, 0 0, k, Taken/¬Taken?]
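Selecting one of four sets of BHT bits with a 2-bit history amounts to concatenating the history with PC bits to form the table index. The Python sketch below shows only that index computation; the function name, the number of PC bits, and dropping the two low PC bits (word-aligned instructions) are assumptions for illustration.

```python
def two_level_index(ghist, pc, hist_bits=2, pc_bits=4):
    """Concatenate global history (high bits) with PC bits (low bits)."""
    history = ghist & ((1 << hist_bits) - 1)
    pc_part = (pc >> 2) & ((1 << pc_bits) - 1)   # skip byte offset within a word
    return (history << pc_bits) | pc_part


# History 0b10 selects the third of four sets; PC 0x14 selects entry 5 within it.
index = two_level_index(0b10, 0x14)   # 0b10_0101 == 37
```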

  22. Gshare Predictor
     [Diagram labels: PC, Global History, 0, xor, Concat, +/-, Taken, Prediction]
     Gshare(PC, T) = MSB(Counter(History(0; T) xor PC; T))
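Gshare replaces the concatenation with an exclusive-or, so the full table is shared by all history and PC combinations. The following Python sketch is a behavioural model, assuming a 10-bit index, saturating two-bit counters initialised to weakly not-taken, and word-aligned PCs; the class and helper names are hypothetical.

```python
class Gshare:
    """Global history xor PC indexes a table of saturating 2-bit counters."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.ghist = 0
        self.counters = [1] * (1 << index_bits)   # weakly not taken

    def _index(self, pc):
        return ((pc >> 2) ^ self.ghist) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # MSB of the counter

    def update(self, pc, taken):
        i = self._index(pc)
        counter = self.counters[i]
        self.counters[i] = min(3, counter + 1) if taken else max(0, counter - 1)
        # Shift the outcome into the global history after using it as an index.
        self.ghist = ((self.ghist << 1) | int(taken)) & self.mask


def run(predictor, pc, outcomes):
    """Return a list of booleans, True where the prediction was wrong."""
    misses = []
    for taken in outcomes:
        misses.append(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return misses


# A strictly alternating branch is predicted perfectly once the history fills.
misses = run(Gshare(), pc=0x400, outcomes=[True, False] * 100)
```

The index must be computed from the history as it was at prediction time, which is why `update` reads the index before shifting in the new outcome.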

  23. Choosing Predictors
     [Diagram labels: LHist, GHist, Chooser, Prediction]
     Chooser = MSB(P(PC; P + (A==T) - (B==T)))
     or
     Chooser = MSB(P(GHist(PC; T); P + (A==T) - (B==T)))
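The chooser update P + (A==T) - (B==T) only moves when exactly one of the two component predictors was right. Here is a minimal Python sketch of one chooser entry; saturation at 0 and 3 and the convention that a set MSB selects predictor A are assumptions, since the formula does not fix either.

```python
def chooser_update(counter, a_pred, b_pred, taken, max_val=3):
    """Apply P + (A==T) - (B==T), saturating in [0, max_val]."""
    delta = int(a_pred == taken) - int(b_pred == taken)
    return max(0, min(max_val, counter + delta))


def choose(counter, a_pred, b_pred):
    """Assumed convention: MSB set selects A, otherwise B."""
    return a_pred if counter >= 2 else b_pred


counter = 1
counter = chooser_update(counter, a_pred=True, b_pred=False, taken=True)   # A right, B wrong
final = choose(counter, a_pred=True, b_pred=False)                         # now follows A
```

When both predictors agree with the outcome, or both miss, the two terms cancel and the chooser entry is left unchanged.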

  24. Tournament Branch Predictor (Alpha 21264)
     [Diagram labels: Local history table (1,024x10b), Local prediction (1,024x3b), Global Prediction (4,096x2b), Choice Prediction (4,096x2b), PC, Global History (12b), Prediction]
     • Choice predictor learns whether best to use local or global branch history in predicting next branch
     • Global history is speculatively updated but restored on mispredict
     • Claim 90-100% success on range of applications
