
Branch Predictor Design for AE64000




Presentation Transcript


  1. Branch Predictor Design for AE64000
     Lynn Choi, Department of Electronics and Computer Engineering, Korea University (lchoi@korea.ac.kr)
     Session: 5D, Paper: 8

  2. Motivation
     • Demand for high-performance embedded processors
       - High-end embedded applications
       - Many uses of embedded processors
     • Addition of a branch predictor
       - To achieve higher performance
       - The most cost-effective method

  3. AE64000 Characteristics
     • IFU to minimize the performance loss caused by LERIs
     • Two additional pipeline stages (IFU1 + IFU2) to eliminate LERIs
     • 3 line buffers that store 12 instructions
     • PrePC in the IFU and PC in the pipeline core
     • Branch misprediction penalty: 3 cycles

  4. Branch Predictor Design for AE64000
     • Issues in branch predictor design for AE64000
       - The AE64000 has two additional stages (IFU1, IFU2) in front of the 5-stage pipeline core. At which pipeline stage should prediction be performed? → The IFU1 stage.
       - Because of the line buffers in the IFU, predicted target addresses must be buffered as well so that branch predictions can be verified. → Buffers for predicted branch target addresses (PTAB) are needed.
       - Since 4 instructions are fetched at a time, multiple branches can be fetched at once. → Only the first taken branch is predicted; for this purpose, the TAC holds the precise target address.
     • Branch misprediction penalty
       - Can be reduced from 3 to 2 cycles by adding a MUX in the IFU so that PPC is updated in the same cycle as PC
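
The "only the first taken branch is predicted" rule can be sketched as a scan over one fetch group. This is a minimal illustration, not the AE64000 logic itself: the slot tuple layout and the function name `select_prediction` are assumptions made for the example; only the fetch width of 4 comes from the slide.

```python
from typing import Optional, Sequence, Tuple

FETCH_WIDTH = 4  # from the slide: 4 instructions are fetched at a time

# Hypothetical slot layout: (is_branch, predicted_taken, predicted_target)
Slot = Tuple[bool, bool, Optional[int]]


def select_prediction(slots: Sequence[Slot]) -> Optional[Tuple[int, int]]:
    """Return (slot_index, target) of the first predicted-taken branch.

    Later branches in the same fetch group are ignored, because fetch is
    redirected as soon as one taken branch is found.
    """
    for index, (is_branch, taken, target) in enumerate(slots[:FETCH_WIDTH]):
        if is_branch and taken and target is not None:
            return index, target
    return None  # no taken branch: fetch continues sequentially
```

For a group holding a not-taken branch in slot 1 and taken branches in slots 2 and 3, the sketch selects slot 2 and drops the prediction for slot 3.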

  5. Branch Predictor for AE64000
     • BPT separate from the TAC
     • PTAB to store predicted target addresses for instructions in the line buffers
     • Branch prediction verification in the ID stage

  6. Predicted Target Address Buffer
     • Predicted Target Address Buffer (PTAB)
       - For branch instructions in the line buffer
       - When a branch instruction is sent to the pipeline core, the corresponding predicted target address is sent with it
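
One way to picture the PTAB is as a small FIFO that travels alongside the line buffers. The sketch below is an assumption-laden model rather than the actual hardware: the class and method names, the capacity of 12 (borrowed from the 12 buffered instructions), and the fall-through handling in `verify_in_id` are all hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class PredictedTargetBuffer:
    """FIFO of (branch_pc, predicted_target) pairs, kept in fetch order."""

    def __init__(self, capacity: int = 12) -> None:  # capacity is an assumption
        self.capacity = capacity
        self.entries: Deque[Tuple[int, int]] = deque()

    def record(self, branch_pc: int, predicted_target: int) -> bool:
        """Store a prediction when the branch enters a line buffer."""
        if len(self.entries) >= self.capacity:
            return False  # buffer full: this prediction is not recorded
        self.entries.append((branch_pc, predicted_target))
        return True

    def issue(self, branch_pc: int) -> Optional[int]:
        """Pop the oldest entry if it belongs to the branch being issued."""
        if self.entries and self.entries[0][0] == branch_pc:
            return self.entries.popleft()[1]
        return None  # branch was not predicted taken

    def flush(self) -> None:
        """Discard all entries, e.g. after a misprediction."""
        self.entries.clear()


def verify_in_id(predicted_target: Optional[int], fallthrough_pc: int,
                 actual_next_pc: int) -> bool:
    """Compare the predicted next PC with the resolved next PC."""
    predicted_next = fallthrough_pc if predicted_target is None else predicted_target
    return predicted_next == actual_next_pc
```

The FIFO order matters here: because instructions leave the line buffers in fetch order, the oldest PTAB entry always belongs to the next predicted branch to be issued.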

  7. Simulation Environment
     • Developed a cycle-accurate AE64000 simulator
       - Simulated 1 billion instructions
       - 30 minutes on a P4 1.6 GHz with 512 MB RAM
       - Indirect branches are not predicted in the simulation
       - Input: AE64000 compiler binary, memory and predictor configuration parameters
       - Output: IPC, BPT/TAC hit ratios, etc.
     • Benchmarks
       - SPECint95 (compress, go)
       - Dhrystone
       - Whetstone
     • Predictors tested
       - Last-time predictor
       - Bimodal predictor
       - G-share predictor
     [Figure: Simulator Block Diagram]
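
For readers unfamiliar with the three predictors tested, here is a minimal sketch of their textbook forms. The table size of 64 entries, the 6 history bits, the initial counter values, and the PC indexing are assumptions for illustration; the slides do not give the configurations that were simulated.

```python
from typing import Iterable, Tuple


class LastTimePredictor:
    """One bit per entry: predict whatever the branch did last time."""

    def __init__(self, entries: int = 64) -> None:  # entries must be a power of two
        self.table = [False] * entries
        self.mask = entries - 1

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask]

    def update(self, pc: int, taken: bool) -> None:
        self.table[pc & self.mask] = taken


class BimodalPredictor:
    """Two-bit saturating counters (0..3) indexed by PC; >= 2 means taken."""

    def __init__(self, entries: int = 64) -> None:
        self.table = [1] * entries  # start at weakly not-taken (assumption)
        self.mask = entries - 1

    def _index(self, pc: int) -> int:
        return pc & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)


class GsharePredictor(BimodalPredictor):
    """Same counters, indexed by PC XOR global branch history."""

    def __init__(self, entries: int = 64, history_bits: int = 6) -> None:
        super().__init__(entries)
        self.history = 0
        self.history_mask = (1 << history_bits) - 1

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) & self.mask

    def update(self, pc: int, taken: bool) -> None:
        super().update(pc, taken)  # counter update uses the pre-update history
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def accuracy(predictor, trace: Iterable[Tuple[int, bool]]) -> float:
    """Fraction of correct predictions over a (pc, taken) trace."""
    trace = list(trace)
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)
```

The hardware cost grows in the same order: one bit per entry for last-time, two bits per entry for bimodal, and two bits per entry plus a history register for g-share.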

  8. Simulation Results • Without branch predictor (IPC)

  9. Simulation Results • Last-time branch predictor

  10. Simulation Results (cont’d) • Bimodal Branch Predictor

  11. Simulation Results (cont’d) • G-share Branch Predictor

  12. Conclusion
     • Simulation result analysis
       - Consider both performance and area
       - The additional performance gain of the g-share and bimodal predictors is negligible relative to their size and complexity
     • Final design
       - Last-time predictor with a 4-way set-associative, 8-entry TAC with LRU replacement
       - IPC is improved by 10% by reducing the branch misprediction penalty from 3 to 2 cycles
       - Additional 15% IPC improvement from the branch predictor
       - About 11,500 gates (about 2.64% of the area) in the Verilog HDL model
     • Thus, the performance of the AE64000 can be improved by 25% at an area cost of less than 3%
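
The TAC organization chosen in the final design (8 entries, 4-way set-associative, LRU) can be modeled in a few lines. This is a behavioral sketch only: the class name, the set index function (PC modulo the number of sets), and the use of the full PC as the tag are assumptions, not details taken from the slides.

```python
from collections import OrderedDict
from typing import List, Optional


class TargetAddressCache:
    """Set-associative cache of branch_pc -> target with LRU replacement."""

    def __init__(self, entries: int = 8, ways: int = 4) -> None:
        self.ways = ways
        self.num_sets = entries // ways  # 8 entries / 4 ways = 2 sets
        # Each set is ordered from least to most recently used.
        self.sets: List["OrderedDict[int, int]"] = [
            OrderedDict() for _ in range(self.num_sets)
        ]

    def _set_for(self, pc: int) -> "OrderedDict[int, int]":
        return self.sets[pc % self.num_sets]  # index function is an assumption

    def lookup(self, pc: int) -> Optional[int]:
        """Return the cached target on a hit and mark the entry most recent."""
        entry_set = self._set_for(pc)
        if pc in entry_set:
            entry_set.move_to_end(pc)
            return entry_set[pc]
        return None

    def insert(self, pc: int, target: int) -> None:
        """Add or refresh an entry, evicting the LRU entry of a full set."""
        entry_set = self._set_for(pc)
        if pc in entry_set:
            entry_set.move_to_end(pc)
        elif len(entry_set) >= self.ways:
            entry_set.popitem(last=False)  # drop the least recently used
        entry_set[pc] = target
```

With only two sets, a fifth branch mapping to the same set evicts whichever of the four resident branches was touched least recently.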
