CSE 420/598 Computer Architecture Lec 9 – Chapter 2 - Branch Prediction


Sandeep K. S. Gupta

School of Computing and Informatics

Arizona State University

Based on Slides by David Patterson, Al Davis, and Luddy Harrison

Agenda
• Dynamic Branch Prediction
• 1-Bit Predictor
• 2-Bit Predictor
• Correlating Predictor
• Tournament Predictor
• Programming Assignment 1: Case Study 2 on pg 149 – Modeling a Branch Predictor in C or Java.

CSE420/598

Dynamic Branch Prediction
• Why does prediction work?
• Underlying algorithm has regularities
• Data that is being operated on has regularities
• Instruction sequences have redundancies that are artifacts of the way humans and compilers think about problems
• Is dynamic branch prediction better than static branch prediction?
• It seems to be
• A small number of important branches in programs have dynamic behavior

Control Hazard (Recap)
• In the 5-stage in-order processor: assume always taken or assume always not taken; if the branch goes the other way, squash mis-fetched instructions
• Modern out-of-order processors: dynamic branch prediction
• Branch predictor: a cache of recent branch outcomes

Pipeline without Branch Predictor

[Figure: the PC fetches the branch in IF (br); after the Compare, the next PC is either Br-target or PC + 4]

In the 5-stage pipeline, a branch completes in two cycles.

If the branch went the wrong way, one incorrect instruction is fetched.

One stall cycle per incorrect branch.

Pipeline with Branch Predictor

[Figure: the same fetch path (PC, IF (br), Compare, Br-target) with a Branch Predictor supplying the predicted next PC]

In the 5-stage pipeline, a branch completes in two cycles.

If the branch was mispredicted, one incorrect instruction is fetched.

One stall cycle per mispredicted branch.

Branch Mispredict Penalty
• Performance = ƒ(accuracy, cost of misprediction)
• Assume: no data or structural hazards; only control hazards; every 5th instruction is a branch; branch predictor accuracy is 90%
• Slowdown = 1 / (1 + stalls per instruction)
• Stalls per instruction = % branches × % mispredictions × penalty = 20% × 10% × 1 = 0.02
• Slowdown = 1/1.02; if penalty = 20, stalls per instruction = 20% × 10% × 20 = 0.4, so slowdown = 1/1.4
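The stall arithmetic above can be checked with a small Python sketch. It assumes, as the slide does, a base CPI of 1 with branch mispredictions as the only source of stalls; the function names are hypothetical.

```python
def stalls_per_instruction(branch_fraction, mispredict_rate, penalty_cycles):
    """Average stall cycles per instruction caused by branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


def slowdown(branch_fraction, mispredict_rate, penalty_cycles):
    """Performance relative to an ideal pipeline (base CPI of 1, no other stalls)."""
    return 1.0 / (1.0 + stalls_per_instruction(branch_fraction, mispredict_rate, penalty_cycles))


# Every 5th instruction is a branch (20%); 90% accuracy means 10% are mispredicted.
for penalty in (1, 20):
    stalls = stalls_per_instruction(0.20, 0.10, penalty)
    print(f"penalty={penalty}: stalls/instr={stalls:.2f}, slowdown=1/{1 + stalls:.2f}")
```

The same 10% misprediction rate costs 2% of performance with a 1-cycle penalty but about 29% with a 20-cycle penalty, which is why deep pipelines need much more accurate predictors.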

Dynamic Branch Prediction – 1 Bit Prediction
• Branch History Table (BHT): the lower bits of the PC index a table of 1-bit values
• Each entry says whether or not the branch was taken last time
• For each branch, keep track of what happened last time and use that outcome as the prediction

1-bit BHT a.k.a Branch Prediction Buffer (BPB)

Predict: If the BPB entry is 0, fetch PC+1. If the BPB entry is 1, fetch L (the branch target).

Update: If the branch is taken, BPB := 1. If the branch is not taken, BPB := 0.

Twice Mispredicted Loop Branches

    L:  ADD R4, R5, R6
        MUL R7, R8, R9
        SUB R11, R11, #1
        BNE L
        SUB R10, R10, #1
        BNE M

Problem with 1-bit BHT
• What are the prediction accuracies for branches 1 and 2?

    while (1) {
        for (i = 0; i < 10; i++) {   // branch-1
        }
        for (j = 0; j < 20; j++) {   // branch-2
        }
    }

• Problem: in a loop, a 1-bit BHT will cause two mispredictions each time the loop runs (avg is 9 iterations before exit):
• At the end of the loop, when it exits instead of looping as before
• The first time through the loop on the next pass through the code, when it predicts exit instead of looping
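A minimal Python sketch of the 1-bit BPB (predict with the stored bit, then overwrite it with the actual outcome) answers the accuracy question above. It assumes each loop branch is bottom-tested (taken on every iteration except the last), that branch-1 and branch-2 have their own BPB entries, and that the outer `while` loop runs a hypothetical 100 times.

```python
def simulate_one_bit(trip_count, passes, initial_bit=0):
    """Replay a bottom-tested loop branch through one 1-bit BPB entry.

    Each execution of the loop takes the branch (trip_count - 1) times and
    falls through once. Returns (mispredictions, executions).
    """
    bit = initial_bit  # 1 = predict taken (fetch L), 0 = predict not taken (fetch PC+1)
    mispredictions = executions = 0
    for _ in range(passes):
        for taken in [True] * (trip_count - 1) + [False]:
            if (bit == 1) != taken:
                mispredictions += 1
            bit = 1 if taken else 0  # update: remember only the last outcome
            executions += 1
    return mispredictions, executions


for name, trips in (("branch-1", 10), ("branch-2", 20)):
    miss, total = simulate_one_bit(trips, passes=100)
    print(f"{name}: {miss}/{total} mispredicted, accuracy {1 - miss / total:.0%}")
```

Under these assumptions branch-1 is predicted correctly 80% of the time and branch-2 90% of the time: two misses per loop execution regardless of the trip count, so shorter loops suffer more.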

2-Bit Prediction
• For each branch, maintain a 2-bit saturating counter:
• if the branch is taken: counter = min(3,counter+1)
• if the branch is not taken: counter = max(0,counter-1)
• If (counter >= 2), predict taken, else predict not taken
• Advantage: a few atypical branches will not influence the prediction (a better measure of “the common case”)
• Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor)
• Can be easily extended to N bits (in most processors, N = 2)
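A minimal Python sketch of the counter rules above, written for N bits so that N = 1 reproduces the 1-bit BHT. It assumes the counter starts at 0 (strongly not taken) and replays a bottom-tested 10-iteration loop like branch-1 on the previous slide; the names are hypothetical.

```python
class SaturatingCounter:
    """N-bit saturating counter; predicts taken in the upper half of its range."""

    def __init__(self, bits=2, state=0):
        self.max_state = (1 << bits) - 1  # 3 for a 2-bit counter
        self.threshold = 1 << (bits - 1)  # 2 for a 2-bit counter
        self.state = state

    def predict(self):
        return self.state >= self.threshold

    def update(self, taken):
        if taken:
            self.state = min(self.max_state, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def mispredictions_per_pass(trip_count, passes, bits=2):
    """Mispredictions in each execution of a bottom-tested loop branch."""
    counter = SaturatingCounter(bits)
    result = []
    for _ in range(passes):
        misses = 0
        for taken in [True] * (trip_count - 1) + [False]:
            if counter.predict() != taken:
                misses += 1
            counter.update(taken)
        result.append(misses)
    return result


print(mispredictions_per_pass(10, passes=4))  # [3, 1, 1, 1]
```

After the first pass, the 2-bit counter mispredicts only the loop exit, once per loop execution; calling it with `bits=1` gives two mispredictions per pass, as on the previous slide.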

Dynamic Branch Prediction
• Solution: a 2-bit scheme that changes its prediction only after two mispredictions in a row
• Adds hysteresis to the decision-making process

[Figure: 2-bit predictor state diagram with two "Predict Taken" states (green: go, taken) and two "Predict Not Taken" states (red: stop, not taken), linked by T (taken) and NT (not taken) transitions]

Bimodal Predictor

[Figure: 14 bits of the branch PC index a table of 16K entries of 2-bit saturating counters]
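The bimodal table in the figure can be sketched in Python as follows. Which 14 PC bits are used is an assumption here: the sketch drops the two low-order bits (assuming 4-byte instructions) and takes the next 14; the initial counter value of 1 is also an assumption. It also shows aliasing: any two branches whose addresses differ by a multiple of 2^16 bytes share one counter.

```python
class BimodalPredictor:
    """2**index_bits two-bit saturating counters indexed by branch-PC bits."""

    def __init__(self, index_bits=14):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # assumption: start weakly not taken

    def index(self, pc):
        # Assumption: 4-byte instructions, so skip the two always-zero low bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

    def storage_bits(self):
        return 2 * len(self.counters)


bp = BimodalPredictor()
print(len(bp.counters), bp.storage_bits())  # 16384 32768 (16K entries, 4 KB)
print(bp.index(0x400100) == bp.index(0x400100 + (1 << 16)))  # True: these branches alias
```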

BHT Accuracy
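• Mispredict because either:
• Wrong guess for that branch
• Got the branch history of the wrong branch when indexing the table (a different branch maps to the same entry)
• 4096-entry table:

[Figure: misprediction rates of the 4096-entry table for Integer and Floating Point programs]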

Correlating Predictors
• Basic branch prediction: maintain a 2-bit saturating counter for each entry (or use 10 branch PC bits to index into one of 1024 counters) – captures the recent “common case” for each branch
• If a branch recently went 01111, expect 0; if it recently went 11101, expect 1; can we have a separate counter for each case?
• If the previous branches went 01, expect 0; if the previous branches went 11, expect 1; can we have a separate counter for each case?
• Hence, build correlating predictors

Local/Global Predictors
• Instead of maintaining a counter for each branch to capture the common case, maintain a counter for each branch and surrounding pattern
• If the surrounding pattern belongs to the branch being predicted, the predictor is referred to as a local predictor
• If the surrounding pattern includes neighboring branches, the predictor is referred to as a global predictor

Global Predictor

[Figure: a single register that keeps track of recent history for all branches (e.g., 00110101) supplies 8 bits, which together with 6 bits of the branch PC index a table of 16K entries of 2-bit saturating counters]

Also referred to as a two-level predictor
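A minimal Python sketch of the global predictor with the figure's sizes (8 history bits, 6 PC bits, 16K counters). Placing the history in the upper index bits, dropping two low-order PC bits, and the initial counter value are assumptions, and `PC_A`/`PC_B` are hypothetical addresses. In the demo, branch A alternates and branch B repeats A's outcome; a single 2-bit counter per branch cannot follow that alternation, but the shared history can.

```python
class GlobalPredictor:
    """Two-level global predictor: global history bits concatenated with
    branch-PC bits index a table of 2-bit saturating counters."""

    def __init__(self, history_bits=8, pc_bits=6):
        self.history_bits = history_bits
        self.pc_bits = pc_bits
        self.history = 0  # one shift register shared by all branches
        self.counters = [1] * (1 << (history_bits + pc_bits))  # 2**14 = 16K

    def index(self, pc):
        pc_part = (pc >> 2) & ((1 << self.pc_bits) - 1)  # assumes 4-byte instructions
        return (self.history << self.pc_bits) | pc_part

    def predict(self, pc):
        return self.counters[self.index(pc)] >= 2

    def update(self, pc, taken):
        i = self.index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift this outcome into the global history; the oldest bit falls off.
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.history_bits) - 1)


gp = GlobalPredictor()
PC_A, PC_B = 0x1000, 0x1004  # hypothetical addresses of two nearby branches
misses = 0
for i in range(200):
    a_taken = i % 2 == 0  # branch A alternates T, N, T, N, ...
    for pc, taken in ((PC_A, a_taken), (PC_B, a_taken)):  # B repeats A's outcome
        if i >= 100 and gp.predict(pc) != taken:  # count only after warm-up
            misses += 1
        gp.update(pc, taken)
print(misses)  # 0
```

After warm-up the predictor makes no mispredictions, because each combination of recent global history and branch PC maps to its own counter.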

Local Predictor

Also a two-level predictor that only uses local histories at the first level

[Figure: 6 bits of the branch PC index a table of 64 entries of 14-bit histories for a single branch (e.g., 10110111011001); that 14-bit history indexes into the next level, a table of 16K entries of 2-bit saturating counters]
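A minimal Python sketch of this two-level local organization, with the figure's sizes as defaults (6 PC bits, 14-bit histories). Dropping two low-order PC bits for 4-byte instructions, the zero initial histories, and the initial counter value are assumptions; `LOOP_PC` is a hypothetical address. The demo replays a bottom-tested 10-iteration loop branch like branch-1 earlier in this lecture.

```python
class LocalPredictor:
    """Two-level local predictor: PC bits select a per-branch history,
    and that history indexes a table of 2-bit saturating counters."""

    def __init__(self, pc_bits=6, history_bits=14):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)     # 64 local histories
        self.counters = [1] * (1 << history_bits)  # 16K 2-bit counters

    def _slot(self, pc):
        return (pc >> 2) & self.pc_mask  # assumes 4-byte instructions

    def predict(self, pc):
        return self.counters[self.histories[self._slot(pc)]] >= 2

    def update(self, pc, taken):
        slot = self._slot(pc)
        h = self.histories[slot]
        c = self.counters[h]
        self.counters[h] = min(3, c + 1) if taken else max(0, c - 1)
        # Append this branch's outcome to its own history only.
        self.histories[slot] = ((h << 1) | int(taken)) & self.hist_mask


lp = LocalPredictor()
LOOP_PC = 0x2000  # hypothetical loop-closing branch
late_misses = 0
for pass_no in range(20):
    for taken in [True] * 9 + [False]:  # 10-iteration, bottom-tested loop
        if pass_no >= 10 and lp.predict(LOOP_PC) != taken:
            late_misses += 1
        lp.update(LOOP_PC, taken)
print(late_misses)  # 0
```

Once trained, the 14-bit local history always contains the previous loop exit, so the predictor can anticipate the exit itself, which neither the 1-bit nor the 2-bit counter can do.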

Correlated Branch Prediction
• Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
• In general, an (m,n) predictor records the last m branches to select between 2^m history tables, each with n-bit counters
• Thus, the old 2-bit BHT is a (0,2) predictor
• Global Branch History: m-bit shift register keeping the T/NT status of the last m branches
• Each entry in the table has 2^m n-bit predictors

Correlating Branches
• (2,2) predictor
• Behavior of recent branches selects between four predictions of the next branch, updating just that prediction

[Figure: (2,2) predictor in which the 2-bit global branch history selects among predictors with 2 bits per branch predictor to produce the Prediction]
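A minimal Python sketch of a generic (m, n) predictor: the branch address picks a row, and the m-bit global history picks one of that row's 2^m n-bit counters. The row-selection function (PC modulo the number of rows, after dropping two low-order bits) and the zero initial state are assumptions. `storage_bits` applies the size formula 2^m × n × entries, which makes the (2,2) scheme comparable with a plain 2-bit BHT.

```python
class CorrelatingPredictor:
    """(m, n) predictor: the branch address selects a row of 2**m n-bit
    counters, and the last m global branch outcomes select one counter."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.table = [[0] * (1 << m) for _ in range(entries)]  # assumption: start at 0
        self.ghr = 0  # m-bit global branch history (T = 1, NT = 0)

    def _row(self, pc):
        return self.table[(pc >> 2) % self.entries]  # assumption: 4-byte instructions

    def predict(self, pc):
        return self._row(pc)[self.ghr] >= self.threshold

    def update(self, pc, taken):
        row = self._row(pc)
        c = row[self.ghr]
        row[self.ghr] = min(self.max_count, c + 1) if taken else max(0, c - 1)
        # Update only the selected counter, then shift the outcome into the history.
        self.ghr = ((self.ghr << 1) | int(taken)) & ((1 << self.m) - 1)

    def storage_bits(self):
        """Counter storage: 2**m x n x number of entries."""
        return (1 << self.m) * self.n * self.entries


print(CorrelatingPredictor(0, 2, 4096).storage_bits())  # 8192: 4K-entry 2-bit BHT
print(CorrelatingPredictor(2, 2, 1024).storage_bits())  # 8192: 1K-entry (2,2) predictor
```

A 4,096-entry (0,2) BHT and a 1,024-entry (2,2) predictor both come to 8K bits of counters, so they are predictors of the same size.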

Accuracy of Different Schemes

[Figure: frequency of mispredictions (0% to 20%) on nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott, and li for a 4,096-entry 2-bit BHT (2 bits per entry), an unlimited-entry 2-bit BHT, and a 1,024-entry (2,2) BHT]

Tournament Predictors

• A local predictor might work well for some branches or programs, while a global predictor might work well for others
• Provide one of each and maintain another predictor to identify which predictor is best for each branch

[Figure: the Branch PC indexes the Tournament Predictor, a table of 2-bit saturating counters, which controls a MUX that selects between the Local Predictor and the Global Predictor]

Global Predictor – Example

What is the total capacity of this branch predictor?

[Figure: a single register that keeps track of recent history for all branches (e.g., 00110101) supplies 10 bits, which together with 4 bits of the branch PC index a table of 2-bit saturating counters]

Also referred to as a two-level predictor

Local Predictor – Example

What is the total capacity of this branch predictor?

[Figure: 8 bits of the branch PC index a table of 8-bit histories for a single branch (e.g., 10110111); that history indexes a table of 2-bit saturating counters]

Example

• Consider the following tournament branch predictor: Fourteen bits of the PC are used to index into a table of 3-bit saturating counters that predict whether we should use a local or global prediction. The global predictor concatenates 8 bits of branch PC and 6 bits of global history to index into 2-bit saturating counters. The local predictor uses 8 bits of branch PC to select an 8-bit local history that then indexes into a table of 2-bit saturating counters. What is the capacity of each structure in this branch predictor?
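Each structure's capacity reduces to 2^(index bits) × (bits per entry). A Python sketch of that arithmetic, under the assumptions that the 6-bit global history register is counted as its own structure and that no tag or valid bits are stored; `table_bits` is a hypothetical helper:

```python
def table_bits(index_bits, entry_bits):
    """Capacity of a table with 2**index_bits entries of entry_bits bits each."""
    return (1 << index_bits) * entry_bits


chooser = table_bits(14, 3)             # 14 PC bits -> 16K three-bit counters
global_counters = table_bits(8 + 6, 2)  # 8 PC bits + 6 history bits -> 16K two-bit counters
global_history = 6                      # the global history register itself
local_histories = table_bits(8, 8)      # 8 PC bits -> 256 eight-bit local histories
local_counters = table_bits(8, 2)       # 8-bit local history -> 256 two-bit counters

for name, bits in [("chooser", chooser), ("global counters", global_counters),
                   ("global history register", global_history),
                   ("local history table", local_histories),
                   ("local counters", local_counters)]:
    print(f"{name}: {bits} bits")
print("total:", chooser + global_counters + global_history + local_histories + local_counters)
```

The chooser (48K bits) and the global counter table (32K bits) dominate; the local predictor needs only 2,560 bits because it is indexed by just 8 bits at each level.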

Tournament Predictors
• Multilevel branch predictor
• Use an n-bit saturating counter to choose between predictors
• The usual choice is between global and local predictors
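A minimal Python sketch of the selection mechanism: a table of n-bit saturating counters, indexed by branch PC, that moves toward whichever component was right when exactly one of them was right. To keep it short, the two components are trivial stand-ins (a hypothetical `Static` class that always predicts one direction) rather than real local and global predictors; any object with `predict(pc)` and `update(pc, taken)` methods could be plugged in. The update rule, table size, and initial chooser state are assumptions.

```python
class Tournament:
    """Chooses between predictor a and predictor b with per-branch n-bit
    saturating counters; a counter >= threshold means 'trust b'."""

    def __init__(self, pred_a, pred_b, index_bits=12, counter_bits=2):
        self.a, self.b = pred_a, pred_b
        self.mask = (1 << index_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        self.chooser = [self.threshold] * (1 << index_bits)  # start weakly trusting b

    def _slot(self, pc):
        return (pc >> 2) & self.mask  # assumes 4-byte instructions

    def predict(self, pc):
        trust_b = self.chooser[self._slot(pc)] >= self.threshold
        return self.b.predict(pc) if trust_b else self.a.predict(pc)

    def update(self, pc, taken):
        a_right = self.a.predict(pc) == taken
        b_right = self.b.predict(pc) == taken
        s = self._slot(pc)
        # Move the chooser only when exactly one component was right.
        if b_right and not a_right:
            self.chooser[s] = min(self.max_count, self.chooser[s] + 1)
        elif a_right and not b_right:
            self.chooser[s] = max(0, self.chooser[s] - 1)
        # Both components always train on the actual outcome.
        self.a.update(pc, taken)
        self.b.update(pc, taken)


class Static:
    """Hypothetical stand-in component that always predicts one direction."""

    def __init__(self, taken):
        self.taken = taken

    def predict(self, pc):
        return self.taken

    def update(self, pc, taken):
        pass  # nothing to learn


t = Tournament(Static(True), Static(False))
X, Y = 0x3000, 0x3004  # hypothetical: X is always taken, Y is never taken
misses = 0
for _ in range(10):
    for pc, taken in ((X, True), (Y, False)):
        if t.predict(pc) != taken:
            misses += 1
        t.update(pc, taken)
print(misses)  # 1
```

Only the first prediction for branch X misses: X's chooser entry then shifts toward the always-taken component, while Y's entry keeps favoring the always-not-taken one.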

Tournament Predictors

Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between:

• Global predictor
• 4K entries indexed by the history of the last 12 branches (2^12 = 4K)
• Each entry is a standard 2-bit predictor
• Local predictor
• Local history table: 1024 10-bit entries, each recording the last 10 outcomes of a branch, indexed by branch address
• The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters
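Adding up the structures on this slide gives the predictor's storage budget. A Python sketch of the arithmetic, assuming no tag bits and leaving out the 12-bit global history register itself:

```python
# (entries, bits per entry) for each structure described on the slide.
structures = {
    "chooser": (4 * 1024, 2),            # 4K 2-bit counters
    "global predictor": (4 * 1024, 2),   # 4K 2-bit counters
    "local history table": (1024, 10),   # 1K 10-bit histories
    "local predictor": (1024, 3),        # 1K 3-bit counters
}

# 12 bits of global history address exactly 4K entries.
assert 2 ** 12 == structures["global predictor"][0]
# 10 bits of local history address exactly 1K counters.
assert 2 ** 10 == structures["local predictor"][0]

total = 0
for name, (entries, bits) in structures.items():
    total += entries * bits
    print(f"{name}: {entries * bits} bits")
print(f"total: {total} bits (about {total // 1024}K bits)")
```

That is 29,696 bits, about 29K bits in total.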

Comparing Predictors (Fig. 2.8)
• Advantage of tournament predictor is ability to select the right predictor for a particular branch
• Particularly crucial for integer benchmarks.
• A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks

6% misprediction rate per branch on SPECint (19% of INT instructions are branches)

2% misprediction rate per branch on SPECfp (5% of FP instructions are branches)

[Figure: misprediction rates on SPECint2000 and SPECfp2000]
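Multiplying the misprediction rate per branch by the fraction of instructions that are branches gives mispredictions per instruction, the same product that the Branch Mispredict Penalty slide multiplies by the penalty. A short Python sketch of that arithmetic (the function name is hypothetical):

```python
def mispredictions_per_1000_instructions(mispredict_rate, branch_fraction):
    """1000 x (mispredictions per branch) x (branches per instruction)."""
    return 1000 * mispredict_rate * branch_fraction


int_mpki = mispredictions_per_1000_instructions(0.06, 0.19)  # SPECint
fp_mpki = mispredictions_per_1000_instructions(0.02, 0.05)   # SPECfp
print(f"SPECint: {int_mpki:.1f}, SPECfp: {fp_mpki:.1f} mispredictions per 1000 instructions")
```

Roughly 11 mispredictions per 1000 SPECint instructions versus about 1 for SPECfp, so for the same penalty integer code loses roughly eleven times as many cycles to mispredictions.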

Branch Target Prediction

• In addition to predicting the branch direction, we must also predict the branch target address
• Branch PC indexes into a predictor table; indirect branches might be problematic
• Most common indirect branch: return from a procedure, which can be easily handled with a stack of return addresses
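A minimal Python sketch of a return address stack: push on every call, pop to predict every return. The depth of 16 and the return address being the call PC + 4 (4-byte instructions, no branch delay slot) are assumptions; on overflow the oldest entry is discarded.

```python
from collections import deque


class ReturnAddressStack:
    """Fixed-depth stack of predicted return addresses."""

    def __init__(self, depth=16):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self.stack = deque(maxlen=depth)

    def on_call(self, call_pc, instr_bytes=4):
        # Assumption: the return address is the instruction after the call
        # (4-byte instructions, no branch delay slot).
        self.stack.append(call_pc + instr_bytes)

    def predict_return(self):
        # Predicted target of a return; None when the stack is empty (no prediction).
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack()
ras.on_call(0x1000)  # main calls f
ras.on_call(0x2000)  # f calls g
print(hex(ras.predict_return()))  # 0x2004: g returns into f
print(hex(ras.predict_return()))  # 0x1004: f returns into main
```

Because calls and returns nest, the most recent unreturned call is always the right answer, which a PC-indexed table cannot capture when one procedure returns to many different callers.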

Summary
• When comparing branch predictors, ensure that they are of the same “size” (total storage)
• Correlating predictors predict branch direction based on the behavior of neighboring branches
• Tournament predictors select between global and local predictors
• Integer benchmarks benefit greatly from global and correlating predictors
• Next class: BTB, Dynamic Scheduling of Instructions.