Presentation Transcript


  1. Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlated Branches from a Large Global History Renju Thomas, Manoj Franklin, Chris Wilkerson, and Jared Stark Presenter: Xiaoxiao Wang

  2. Agenda • Motivation and Related Work • Identifying Affector Branches at Run-time • Building Predictors Using Affector Information • Experimental Results • Conclusion

  3. Motivation and Related Work • Processor pipelines have been growing deeper, so the branch misprediction penalty is becoming very high [18]. • The accuracy of small predictors improves greatly with increased size, but large predictors gain little from further size increases [1][5][12][13][16][19]. • Larger predictors increase prediction delay [2][8][16]. • Future transistor budgets permit a larger area for branch predictors [4][16].

  4. How to Improve the Prediction Rate? • Not all branches in the long history may be correlated with the branch under prediction [11][20][21], so the history should be used more selectively. • Two primary reasons why branches are correlated [6]: 1) a preceding branch's outcome affects the computation that determines the outcome of the succeeding branch (affector); 2) the computations affecting their outcomes are (fully or partially) based on the same (or related) information (forerunner). • Identify correlated branches from a large global history.

  5. Identifying Affector Branches at Run-time • Track the runtime dataflow to determine the affector branches for the last update of each architectural register. • Example (control-flow graph with basic blocks BB0–BB8; BB0: R1=R2, BB2: R1=R2+4, BB3: R2=R1+R2, BB7: R3=R4+4, BB8 ends with B8: if R2==R3): • B8 is the branch to be predicted. • The latest 5 branches are {B0 B2 B3 B5 B7} with outcomes (T N T T N). • Affector blocks for B8: {BB2 BB3 BB7} => affector branches for B8: {B0, B2, B5}. • The Affector Branch Bitmap for B8 is 11010.

  6. Affector Register File (ARF) Structure • Keep a separate record of affector information for each architectural register: one bitmap entry in the ARF per register. • [Figure: the ARF, with one affector bitmap per register, indexed 0 through 31.]

  7. Affector Branch Bitmap (ABB) Generation Algorithm • Principle 1: When the processor encounters a conditional branch, all entries in the ARF are shifted left by one bit and a 0 fills the vacated LSB. • Principle 2: When the processor encounters a register-writing instruction, the ARF entries corresponding to its source registers are read, OR'ed together, and written to the ARF entry corresponding to the destination register, with the LSB set to 1. • Principle 3: When the processor encounters a conditional branch instruction, the ARF entries corresponding to its source registers are read and OR'ed together, generating the branch's ABB.
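A minimal Python sketch of Principles 1-3, assuming 32 architectural registers and an illustrative 16-bit affector history per ARF entry (the names, widths, and the read-before-shift ordering at a branch are assumptions for illustration, not details from the paper's hardware design):

```python
NUM_REGS = 32
HISTORY_BITS = 16                     # assumed per-entry affector-history length
MASK = (1 << HISTORY_BITS) - 1

arf = [0] * NUM_REGS                  # Affector Register File: one bitmap per register

def on_register_write(dest_reg, src_regs):
    """Principle 2: propagate affector information along the dataflow."""
    entry = 0
    for r in src_regs:                # OR the ARF entries of the source registers
        entry |= arf[r]
    # The most recent branch (LSB position) steered execution into this block,
    # so it is recorded as an affector of the destination register.
    arf[dest_reg] = (entry | 1) & MASK

def on_conditional_branch(src_regs):
    """Principles 3 and 1: produce this branch's ABB, then age the ARF."""
    abb = 0
    for r in src_regs:                # Principle 3: OR the source-register entries
        abb |= arf[r]
    for r in range(NUM_REGS):         # Principle 1: shift every entry left, fill 0
        arf[r] = (arf[r] << 1) & MASK
    return abb
```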

  8. Affector Branch Bitmap (ABB) Generation Algorithm: Worked Example • [Figure: the example CFG from slide 5 (I0: R1=R2 in BB0, I2: R1=R2+4 in BB2, I3: R2=R1+R2 in BB3, I7: R3=R4+4 in BB7, B8: if R2==R3 in BB8), annotated with ARF snapshots after I2 (Principle 2), after B2 (Principle 1), after I3 (Principle 2), and after B7; Principle 3 then ORs the ARF entries of R2 and R3 to produce the ABB 11010 for B8.]
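A hypothetical driver that replays the slide-5/8 example with the sketch above. The source registers of branches B0-B7 are not given on the slides, so empty lists are passed for them; only B8's sources (R2, R3) matter for the result.

```python
on_register_write(1, [2])            # I0: R1 = R2        (BB0)
on_conditional_branch([])            # B0
on_register_write(1, [2])            # I2: R1 = R2 + 4    (BB2)
on_conditional_branch([])            # B2
on_register_write(2, [1, 2])         # I3: R2 = R1 + R2   (BB3)
on_conditional_branch([])            # B3
on_conditional_branch([])            # B5
on_register_write(3, [4])            # I7: R3 = R4 + 4    (BB7)
on_conditional_branch([])            # B7
abb = on_conditional_branch([2, 3])  # B8: if R2 == R3
print(format(abb, "05b"))            # prints 11010, matching slide 5
```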

  9. Misprediction Recovery • Principle 4: When a branch misprediction is detected, the speculative updates made to the ARF after the mispredicted branch should be shifted out. • [Figure: an ARF entry shifted right by 4 bits to discard the bits inserted for the 4 wrong-path branches beyond the mispredicted branch.]
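Continuing the sketch above, a hedged illustration of Principle 4; the count of wrong-path branches is assumed to be tracked elsewhere, and wrong-path register writes are not rolled back in this simple model.

```python
def on_misprediction(wrong_path_branches):
    """Principle 4 (sketch): discard ARF bits recorded for wrong-path branches.

    Each ARF entry is shifted right by the number of branches fetched after
    the mispredicted branch, undoing their Principle-1 left shifts.
    """
    for r in range(NUM_REGS):
        arf[r] >>= wrong_path_branches
```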

  10. Building Predictors Using Affector Information: Zeroing Scheme • Turning off non-affector bits and hashing. • All non-affector bits in the long global history are masked to zero by ANDing the branch's ABB with the long global history. • The result is hashed down to the required number of bits using a fold-and-XOR hash. • The identified affectors are retained in their respective positions. • [Figure: Global History ANDed with the Affector Bitmap (mask), then fold-XOR, yielding the predictor lookup index.]
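A minimal sketch of the Zeroing-scheme index computation; the global history and ABB are treated as Python integers, and the fold-and-XOR chunking shown here is an assumed but typical formulation.

```python
def fold_xor(value, index_bits):
    """Fold a wide bit-vector down to index_bits by XOR-ing successive chunks."""
    index = 0
    while value:
        index ^= value & ((1 << index_bits) - 1)
        value >>= index_bits
    return index

def zeroing_index(global_history, abb, index_bits):
    """Zeroing scheme: ANDing with the ABB keeps affectors in place, then fold-XOR."""
    masked = global_history & abb     # non-affector bits become 0
    return fold_xor(masked, index_bits)
```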

  11. Building Predictors Using Affector Information: Packing Scheme • Turning off non-affector bits, packing, and hashing. • The non-affector bits are removed altogether and the affector bits are packed together. • The result is hashed down to the required number of bits using a fold-and-XOR hash. • The identified affectors are not retained in their respective positions. • [Figure: Global History ANDed with the Affector Bitmap (mask), the surviving bits packed contiguously, then fold-XOR, yielding the predictor lookup index.]
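A companion sketch of the Packing scheme, reusing fold_xor from the Zeroing sketch; the bit-scanning loop is illustrative rather than a hardware description.

```python
def packing_index(global_history, abb, index_bits):
    """Packing scheme: keep only affector bits, packed contiguously, then fold-XOR."""
    packed, count, pos = 0, 0, 0
    while abb >> pos:
        if (abb >> pos) & 1:                        # this history position is an affector
            packed |= ((global_history >> pos) & 1) << count
            count += 1
        pos += 1
    return fold_xor(packed, index_bits)
```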

  12. Proposed Predictor Organization • [Figure: a four-stage prediction pipeline. An instruction line predictor supplies a one-cycle prediction; the global history feeds the primary global predictor, which produces the primary prediction (perceptron or YAGS); the ARF is read and hashed to index the corrector predictor (a tagged, rare-event predictor); a tag compare selects the corrector prediction on a hit, otherwise the primary prediction is used.]
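A highly simplified, hypothetical sketch of the final selection step only: the four-stage pipelining, the line predictor, and the corrector's allocation/update policy are not modeled; CorrectorEntry, the tag derivation, and the dictionary-based table are assumptions, and zeroing_index comes from the earlier sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class CorrectorEntry:
    tag: int
    prediction: bool

def predict_branch(pc: int, global_history: int, abb: int,
                   primary_predict: Callable[[int, int], bool],
                   corrector: Dict[int, CorrectorEntry],
                   index_bits: int, tag_bits: int = 8) -> bool:
    """The primary predictor always predicts; a tagged corrector entry,
    looked up with the affector-based index, overrides it on a tag hit."""
    primary = primary_predict(pc, global_history)        # perceptron or YAGS
    index = zeroing_index(global_history, abb, index_bits)
    tag = (pc >> 2) & ((1 << tag_bits) - 1)              # assumed tag derivation
    entry = corrector.get(index)
    if entry is not None and entry.tag == tag:
        return entry.prediction                          # corrector prediction wins
    return primary
```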

  13. Experiment Setup • SimpleScalar v3.0 using Alpha ISA • 12 benchmarks from SPEC95 and SPEC2000 integer benchmark suites.

  14. Experimental Evaluation Figure 1. Misprediction Results for Zeroing and Packing techniques for our Corrector Predictor along with (i) Perceptron Primary Predictor; (ii) YAGS Primary Predictor

  15. Experimental Evaluation Figure 2. (i) Performance of a Modeled Superscalar for Various Branch Corrector Predictor Schemes. (ii) Per-benchmark Misprediction Rates for the Corresponding Corrector Predictors.

  16. Conclusion • The hard-to-predict branches of a primary global predictor are predicted by a very accurate corrector predictor at one or two cycles of additional latency. • A technique is proposed that lets this corrector predictor use a long global history by identifying the correlated branches in that history with run-time dataflow information. • Two prediction schemes, Zeroing and Packing, are proposed. • Adding an 8KB affector-history-based corrector predictor to a 16KB perceptron primary predictor decreases the average misprediction rate over the 12 benchmarks from 6.3% to 5.7%.
