CSE 420/598 Computer Architecture Lec 9 – Chapter 2 - Branch Prediction

Sandeep K. S. Gupta

School of Computing and Informatics

Arizona State University

Based on Slides by David Patterson, Al Davis, and Luddy Harrison

Agenda
  • Dynamic Branch Prediction
  • 1-Bit Predictor
  • 2-Bit Predictor
  • Correlating Predictor
  • Tournament Predictor
  • Programming Assignment 1: Case Study 2 on pg 149 – Modeling a Branch Predictor in C or Java.

CSE420/598

Dynamic Branch Prediction
  • Why does prediction work?
    • Underlying algorithm has regularities
    • Data that is being operated on has regularities
    • Instruction sequences have redundancies that are artifacts of the way humans and compilers think about problems
  • Is dynamic branch prediction better than static branch prediction?
    • It seems to be
    • A small number of important branches in programs have dynamic behavior

Control Hazard (Recap)
  • In the 5-stage in-order processor: assume always taken or assume always not taken; if the branch goes the other way, squash mis-fetched instructions
  • Modern out-of-order processors: dynamic branch prediction
  • Branch predictor: a cache of recent branch outcomes


Pipeline without Branch Predictor

[Figure: pipeline diagram with PC, IF (br), Reg Read, Compare, Br-target, and PC + 4]

  • In the 5-stage pipeline, a branch completes in two cycles
  • If the branch went the wrong way, one incorrect instruction is fetched
  • One stall cycle per incorrect branch


Pipeline with Branch Predictor

[Figure: pipeline diagram with PC, IF (br), Reg Read, Compare, Br-target, and a Branch Predictor]

  • In the 5-stage pipeline, a branch completes in two cycles
  • If the branch went the wrong way, one incorrect instruction is fetched
  • One stall cycle per incorrect branch

Branch Mispredict Penalty
  • Performance = ƒ(accuracy, cost of misprediction)
  • Assume: no data or structural hazards; only control hazards; every 5th instruction is a branch; branch predictor accuracy is 90%
  • Slowdown = 1 / (1 + stalls per instruction)
  • Stalls per instruction = % branches × % mispredictions × penalty = 20% × 10% × 1 = 0.02
  • Slowdown = 1/1.02; if penalty = 20 cycles, stalls per instruction = 20% × 10% × 20 = 0.4, so slowdown = 1/1.4
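The slowdown arithmetic above can be checked with a small C sketch. The function names are hypothetical, and it assumes, as the slide does, an ideal CPI of 1 with control hazards as the only source of stalls; arguments are fractions (0.2 for 20%), not percentages.

```c
/* Stall cycles per instruction when only control hazards occur.
   branch_frac and mispredict_frac are fractions; penalty is in cycles. */
double stalls_per_instruction(double branch_frac, double mispredict_frac,
                              double penalty_cycles)
{
    return branch_frac * mispredict_frac * penalty_cycles;
}

/* Slowdown relative to an ideal pipeline with CPI = 1 (assumption). */
double slowdown(double branch_frac, double mispredict_frac, double penalty_cycles)
{
    return 1.0 / (1.0 + stalls_per_instruction(branch_frac, mispredict_frac,
                                               penalty_cycles));
}
```

With the slide's numbers, `stalls_per_instruction(0.2, 0.1, 1.0)` is 0.02 and `slowdown(0.2, 0.1, 20.0)` is 1/1.4, roughly 0.71: the same predictor accuracy costs far more in a deep pipeline.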

Dynamic Branch Prediction – 1 Bit Prediction
  • Branch History Table (BHT): the lower bits of the branch PC index a table of 1-bit values
    • Each value says whether or not the branch was taken last time
    • No address check: branches whose PCs share those lower bits share an entry
  • For each branch, keep track of what happened last time and use that outcome as the prediction

1-bit BHT a.k.a. Branch Prediction Buffer (BPB)

Predict:
  • If BPB entry is 0, fetch PC+1
  • If BPB entry is 1, fetch L

Update:
  • If branch is taken, BPB := 1
  • If branch is not taken, BPB := 0
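A minimal C sketch of the 1-bit BPB rules above, in the spirit of the C/Java programming assignment. The 1024-entry size, the names (`bht1`, `bht1_predict`, `bht1_update`), and treating `pc` as an instruction address (as the slide's PC+1 suggests) are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BHT1_INDEX_BITS 10                       /* assumption: 1024 entries */
#define BHT1_ENTRIES    (1u << BHT1_INDEX_BITS)

typedef struct {
    uint8_t bit[BHT1_ENTRIES];   /* 1 = taken last time, 0 = not taken */
} bht1;

void bht1_init(bht1 *t) { memset(t->bit, 0, sizeof t->bit); }

/* Low-order PC bits pick the entry. There is no tag, so two branches
   whose PCs share these bits silently share (alias) one entry. */
unsigned bht1_index(uint32_t pc) { return pc & (BHT1_ENTRIES - 1); }

/* Predict: entry 0 -> fetch PC+1 (not taken), entry 1 -> fetch L (taken). */
bool bht1_predict(const bht1 *t, uint32_t pc) { return t->bit[bht1_index(pc)] != 0; }

/* Update: remember only the most recent outcome. */
void bht1_update(bht1 *t, uint32_t pc, bool taken)
{
    t->bit[bht1_index(pc)] = taken ? 1 : 0;
}
```

Because there is no address check, a branch at `pc + BHT1_ENTRIES` reads the same entry as one at `pc`; that aliasing is one of the two misprediction causes listed on the BHT Accuracy slide.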

Twice Mispredicted Loop Branches

M:  ADD R1, R2, R3
L:  ADD R4, R5, R6
    MUL R7, R8, R9
    SUB R11, R11, #1
    BNE L
    SUB R10, R10, #1
    BNE M

Problem with 1-bit BHT
  • What are the prediction accuracies for branches 1 and 2?

while (1) {
    for (i = 0; i < 10; i++) {   /* branch-1 */
    }
    for (j = 0; j < 20; j++) {   /* branch-2 */
    }
}

  • Problem: in a loop, a 1-bit BHT causes two mispredictions per loop execution (the average loop runs 9 iterations before exit):
    • At the end of the loop, when it exits instead of looping as before
    • On the first iteration the next time through the code, when it predicts exit instead of looping
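One way to answer the question above is to simulate it. This sketch assumes each `for` loop compiles to a single loop-closing branch that is taken on every iteration except the last, that branch-1 and branch-2 do not alias in the BHT, and that the entry starts as not taken; the function name is hypothetical.

```c
#include <stdbool.h>

/* Simulate a 1-bit predictor on one loop-closing branch that is taken
   (trip_count - 1) times and then not taken, with the whole loop run
   `executions` times. Returns the number of mispredictions. */
int loop_mispredicts_1bit(int trip_count, int executions)
{
    bool last = false;          /* assumption: entry starts as "not taken" */
    int miss = 0;
    for (int e = 0; e < executions; e++) {
        for (int i = 0; i < trip_count; i++) {
            bool taken = (i < trip_count - 1);   /* exit on the last iteration */
            if (last != taken)
                miss++;                          /* prediction = last outcome */
            last = taken;
        }
    }
    return miss;
}
```

Every loop execution costs exactly two mispredictions (the exit and the first iteration of the next execution), so under these assumptions branch-1 is predicted correctly 8 of 10 times (80%) and branch-2 18 of 20 times (90%).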

2-Bit Prediction
  • For each branch, maintain a 2-bit saturating counter:
    • if the branch is taken: counter = min(3,counter+1)
    • if the branch is not taken: counter = max(0,counter-1)
  • If (counter >= 2), predict taken, else predict not taken
  • Advantage: a few atypical branch outcomes will not change the prediction (a better measure of “the common case”)
  • Especially useful when multiple branches share the same counter (some bits of the branch PC are used to index into the branch predictor)
  • Can be easily extended to N-bit counters (in most processors, N=2)
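The counter rules above translate directly into C. This sketch (hypothetical names; the counter is assumed to start at 0) also reruns the loop experiment from the 1-bit slide so the two schemes can be compared under the same assumptions: one loop-closing branch per loop, no aliasing.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t ctr2;   /* holds 0..3; 0,1 = predict not taken, 2,3 = predict taken */

bool ctr2_predict(ctr2 c) { return c >= 2; }

ctr2 ctr2_update(ctr2 c, bool taken)
{
    if (taken)
        return c < 3 ? c + 1 : 3;   /* counter = min(3, counter + 1) */
    return c > 0 ? c - 1 : 0;       /* counter = max(0, counter - 1) */
}

/* Mispredictions of one 2-bit counter on a loop-closing branch that is
   taken (trip_count - 1) times and then not taken, repeated `executions` times. */
int loop_mispredicts_2bit(int trip_count, int executions)
{
    ctr2 c = 0;          /* assumption: start at "strongly not taken" */
    int miss = 0;
    for (int e = 0; e < executions; e++) {
        for (int i = 0; i < trip_count; i++) {
            bool taken = (i < trip_count - 1);
            if (ctr2_predict(c) != taken)
                miss++;
            c = ctr2_update(c, taken);
        }
    }
    return miss;
}
```

After warm-up, the single not-taken exit only moves the counter from 3 to 2, so the next execution's first iteration is still predicted taken: one misprediction per loop execution, i.e. 90% for branch-1 and 95% for branch-2 instead of 80% and 90%.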


Dynamic Branch Prediction

[Figure: 2-bit predictor state diagram with two "Predict Taken" states and two "Predict Not Taken" states; transitions are labeled T (taken) and NT (not taken)]

  • Solution: a 2-bit scheme that changes its prediction only after two consecutive mispredictions
  • Red: stop, not taken
  • Green: go, taken
  • Adds hysteresis to the decision-making process


Bimodal Predictor

[Figure: 14 bits of the branch PC index a table of 16K entries of 2-bit saturating counters]
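A sketch of the bimodal predictor in the figure: 14 PC bits select one of 16K 2-bit counters. The names and the "weakly not taken" starting value are assumptions; storing each 2-bit counter in a byte is a simulation convenience, while the hardware budget is 2 bits per entry.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BIMODAL_INDEX_BITS 14
#define BIMODAL_ENTRIES    (1u << BIMODAL_INDEX_BITS)   /* 16K counters */

typedef struct {
    uint8_t ctr[BIMODAL_ENTRIES];   /* each byte holds a 2-bit value, 0..3 */
} bimodal;

/* Assumption: every counter starts at 1 ("weakly not taken"). */
void bimodal_init(bimodal *p) { memset(p->ctr, 1, sizeof p->ctr); }

unsigned bimodal_index(uint32_t pc) { return pc & (BIMODAL_ENTRIES - 1); }

bool bimodal_predict(const bimodal *p, uint32_t pc)
{
    return p->ctr[bimodal_index(pc)] >= 2;
}

void bimodal_update(bimodal *p, uint32_t pc, bool taken)
{
    uint8_t *c = &p->ctr[bimodal_index(pc)];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}

/* Hardware storage: entries x 2 bits per counter. */
unsigned long bimodal_storage_bits(void) { return (unsigned long)BIMODAL_ENTRIES * 2; }
```

The storage works out to 16K × 2 = 32K bits (4 KB).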

BHT Accuracy
  • Mispredict because either:
    • Wrong guess for that branch
    • Got the branch history of the wrong branch when indexing the table
  • 4096-entry table:

[Figure: misprediction rates of a 4096-entry BHT on integer and floating-point benchmarks]

Correlating Predictors
  • Basic branch prediction: maintain a 2-bit saturating counter for each entry (or use 10 branch PC bits to index into one of 1024 counters) – captures the recent “common case” for each branch
  • Can we take advantage of additional information?
    • If a branch recently went 01111, expect 0; if it recently went 11101, expect 1; can we have a separate counter for each case?
    • If the previous branches went 01, expect 0; if the previous branches went 11, expect 1; can we have a separate counter for each case?
  • Hence, build correlating predictors

Local/Global Predictors
  • Instead of maintaining a counter for each branch to capture the common case,
    • Maintain a counter for each branch and surrounding pattern
    • If the surrounding pattern belongs to the branch being predicted, the predictor is referred to as a local predictor
    • If the surrounding pattern includes neighboring branches, the predictor is referred to as a global predictor


Global Predictor

[Figure: a single register keeps track of recent history for all branches (e.g. 00110101); its 8 bits and 6 bits of the branch PC together index a table of 16K entries of 2-bit saturating counters]

Also referred to as a two-level predictor
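A sketch of the global predictor in the figure, assuming the 8 history bits and 6 PC bits are concatenated (history in the high bits) to form the 14-bit table index. The bit order, names, and initial counter value are illustrative choices, not taken from the slide.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GHIST_BITS     8                                   /* global history length */
#define GPC_BITS       6                                   /* branch PC bits used */
#define GTABLE_ENTRIES (1u << (GHIST_BITS + GPC_BITS))     /* 16K counters */

typedef struct {
    uint32_t ghr;                    /* last GHIST_BITS outcomes of all branches, 1 = taken */
    uint8_t  ctr[GTABLE_ENTRIES];    /* 2-bit saturating counters */
} global_pred;

void global_init(global_pred *g) { g->ghr = 0; memset(g->ctr, 1, sizeof g->ctr); }

/* Index = history (high bits) concatenated with low PC bits (low bits). */
unsigned global_index(const global_pred *g, uint32_t pc)
{
    unsigned hist = g->ghr & ((1u << GHIST_BITS) - 1);
    unsigned pcb  = pc & ((1u << GPC_BITS) - 1);
    return (hist << GPC_BITS) | pcb;
}

bool global_predict(const global_pred *g, uint32_t pc)
{
    return g->ctr[global_index(g, pc)] >= 2;
}

void global_update(global_pred *g, uint32_t pc, bool taken)
{
    uint8_t *c = &g->ctr[global_index(g, pc)];   /* same history used to predict */
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
    /* Shift the newest outcome into the shared history register. */
    g->ghr = ((g->ghr << 1) | (taken ? 1u : 0u)) & ((1u << GHIST_BITS) - 1);
}
```

The update must use the same index as the prediction, so the counter is trained before the history register is shifted.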


Local Predictor

Also a two-level predictor that only uses local histories at the first level

[Figure: 6 bits of the branch PC index a local history table of 64 entries, each a 14-bit history for a single branch (e.g. 10110111011001); the selected 14-bit history indexes into the next level, a table of 16K entries of 2-bit saturating counters]

Correlated Branch Prediction
  • Idea: record the m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
  • In general, an (m,n) predictor records the last m branches to select between 2^m history tables, each with n-bit counters
    • Thus, the old 2-bit BHT is a (0,2) predictor
  • Global Branch History: an m-bit shift register keeping the T/NT status of the last m branches
  • Each entry in the table has 2^m n-bit predictors
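The (m,n) definition implies a simple storage count: each branch-selected entry holds 2^m counters of n bits. A one-function C sketch (hypothetical name):

```c
/* Total state bits of an (m,n) correlating predictor: each of `entries`
   branch-selected rows holds 2^m counters of n bits each. */
unsigned long mn_predictor_bits(unsigned m, unsigned n, unsigned long entries)
{
    return (1UL << m) * n * entries;
}
```

By this count, a 4096-entry (0,2) BHT and a 1024-entry (2,2) predictor both use 8K bits, which is what makes them a fair pair to compare.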

Correlating Branches
  • (2,2) predictor
    • Behavior of recent branches selects between four predictions of the next branch, updating just that prediction

[Figure: 4 bits of the branch address and a 2-bit global branch history select one of the 2-bit-per-branch predictors, which supplies the prediction]
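A sketch of the (2,2) predictor in the figure: 4 address bits pick a row, and the 2-bit global history picks one of that row's four 2-bit counters. To show why correlation helps, it is run on one branch whose outcomes alternate T, NT, T, NT. The names, the branch address 0x8, and the initial counter value 1 are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define P22_ADDR_BITS 4                   /* 4 address bits -> 16 rows */
#define P22_HIST_BITS 2                   /* m = 2 bits of global history */

typedef struct {
    unsigned ghr;                                           /* last 2 outcomes, 1 = taken */
    uint8_t  ctr[1u << P22_ADDR_BITS][1u << P22_HIST_BITS]; /* 16 rows x 4 two-bit counters */
} pred22;

void pred22_init(pred22 *p) { p->ghr = 0; memset(p->ctr, 1, sizeof p->ctr); }

/* The branch address picks a row; the global history picks the counter. */
uint8_t *pred22_counter(pred22 *p, uint32_t pc)
{
    return &p->ctr[pc & ((1u << P22_ADDR_BITS) - 1)][p->ghr];
}

bool pred22_predict(pred22 *p, uint32_t pc) { return *pred22_counter(p, pc) >= 2; }

void pred22_update(pred22 *p, uint32_t pc, bool taken)
{
    uint8_t *c = pred22_counter(p, pc);   /* only the selected counter changes */
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
    p->ghr = ((p->ghr << 1) | (taken ? 1u : 0u)) & ((1u << P22_HIST_BITS) - 1);
}

/* Mispredictions on a single branch whose outcomes alternate T, NT, T, NT, ... */
int pred22_alternating_misses(int n)
{
    pred22 p;
    pred22_init(&p);
    int miss = 0;
    for (int i = 0; i < n; i++) {
        bool taken = (i % 2 == 0);
        if (pred22_predict(&p, 0x8) != taken)
            miss++;
        pred22_update(&p, 0x8, taken);
    }
    return miss;
}
```

Here the sketch mispredicts only twice while warming up, then never again. For comparison, a single 2-bit counter starting at 1 oscillates between 1 and 2 on this pattern and mispredicts every outcome.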

Accuracy of Different Schemes

[Figure: frequency of mispredictions (0% to 20%) on nasa7, matrix300, tomcatv, doducd, spice, fpppp, gcc, espresso, eqntott, and li, comparing a 4,096-entry BHT (2 bits per entry), an unlimited-entry BHT (2 bits per entry), and a 1,024-entry (2,2) BHT]


Tournament Predictors

  • A local predictor might work well for some branches or programs, while a global predictor might work well for others
  • Provide one of each and maintain another predictor to identify which predictor is best for each branch

[Figure: the local and global predictors feed a MUX; the tournament predictor, a table of 2-bit saturating counters indexed by the branch PC, selects which prediction is used]
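A sketch of the tournament selector (the MUX and its table of 2-bit counters). It assumes a common training rule: the chooser changes only when the two component predictions disagree, and then moves toward whichever was correct. The 4K size, names, and initial value are assumptions, and the component predictors train separately (not shown).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CHOOSER_INDEX_BITS 12                         /* assumption: 4K entries */
#define CHOOSER_ENTRIES    (1u << CHOOSER_INDEX_BITS)

typedef struct {
    uint8_t ctr[CHOOSER_ENTRIES];   /* 2-bit counters: 2,3 -> use global; 0,1 -> use local */
} chooser;

/* Assumption: start every entry at 2 ("weakly prefer global"). */
void chooser_init(chooser *ch) { memset(ch->ctr, 2, sizeof ch->ctr); }

/* The MUX: pick one of the two component predictions. */
bool tournament_predict(const chooser *ch, uint32_t pc, bool local_says, bool global_says)
{
    return ch->ctr[pc & (CHOOSER_ENTRIES - 1)] >= 2 ? global_says : local_says;
}

/* Train the chooser toward whichever component predicted correctly. */
void chooser_update(chooser *ch, uint32_t pc, bool local_says, bool global_says, bool taken)
{
    uint8_t *c = &ch->ctr[pc & (CHOOSER_ENTRIES - 1)];
    if (local_says == global_says)
        return;                  /* both right or both wrong: nothing to learn */
    if (global_says == taken) {
        if (*c < 3) (*c)++;      /* global was right */
    } else {
        if (*c > 0) (*c)--;      /* local was right */
    }
}
```

Ignoring agreements keeps the chooser from drifting on branches where the choice does not matter.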


Global Predictor – Example

What is the total capacity of this branch predictor?

[Figure: a single register that keeps track of recent history for all branches (e.g. 00110101) and the branch PC supply fields labeled 10 bits and 4 bits, which together index a table of 2-bit saturating counters]

Also referred to as a two-level predictor
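One way to work the question, assuming the 10-bit and 4-bit fields are concatenated into a single table index as on the earlier Global Predictor slide. The count covers the counter table only; the global history register adds a few more bits.

```latex
\begin{aligned}
\text{index bits} &= 10 + 4 = 14\\
\text{counters} &= 2^{14} = 16384\\
\text{capacity} &= 16384 \times 2~\text{bits} = 32768~\text{bits} = 4~\text{KB}
\end{aligned}
```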


Local Predictor – Example

What is the total capacity of this branch predictor?

[Figure: 8 bits of the branch PC index a local history table of 8-bit histories, each for a single branch (e.g. 10110111); the selected history indexes a table of 2-bit saturating counters]
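A worked count, assuming (as on the earlier Local Predictor slide) that the 8-bit history alone indexes the second-level table, so that table has 2^8 counters:

```latex
\begin{aligned}
\text{local history table} &= 2^{8} \times 8~\text{bits} = 2048~\text{bits}\\
\text{counter table} &= 2^{8} \times 2~\text{bits} = 512~\text{bits}\\
\text{total} &= 2048 + 512 = 2560~\text{bits} = 320~\text{bytes}
\end{aligned}
```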


Example

  • Consider the following tournament branch predictor: fourteen bits of the PC are used to index into a table of 3-bit saturating counters that predict whether we should use a local or global prediction. The global predictor concatenates 8 bits of branch PC and 6 bits of global history to index into 2-bit saturating counters. The local predictor uses 8 bits of branch PC to select an 8-bit local history that then indexes into a table of 2-bit saturating counters. What is the capacity of each structure in this branch predictor?
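A sketch of the capacity calculation for this example. It assumes the local predictor's 8-bit history alone indexes its counter table (2^8 counters) and counts the 6-bit global history register as its own structure; the function names are hypothetical.

```c
/* Bits in a table indexed by `index_bits` bits, each entry `entry_bits` wide. */
unsigned long table_bits(unsigned index_bits, unsigned entry_bits)
{
    return (1UL << index_bits) * entry_bits;
}

/* Chooser: 14 PC bits -> 16K x 3-bit counters. */
unsigned long chooser_bits(void) { return table_bits(14, 3); }

/* Global: 8 PC bits + 6 history bits -> 16K x 2-bit counters. */
unsigned long global_table_bits(void) { return table_bits(8 + 6, 2); }

/* Local: 256 x 8-bit histories plus 256 x 2-bit counters. */
unsigned long local_bits(void) { return table_bits(8, 8) + table_bits(8, 2); }

/* The global history register itself. */
unsigned long global_history_bits(void) { return 6; }

unsigned long total_bits(void)
{
    return chooser_bits() + global_table_bits() + local_bits() + global_history_bits();
}
```

Under these assumptions: chooser 49,152 bits, global table 32,768 bits, local structures 2,048 + 512 = 2,560 bits, history register 6 bits, for 84,486 bits in total (about 10.3 KB).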

Tournament Predictors
  • Multilevel branch predictor
  • Use an n-bit saturating counter to choose between predictors
  • The usual choice is between global and local predictors

Tournament Predictors

Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between:

  • Global predictor
    • 4K entries indexed by the history of the last 12 branches (2^12 = 4K)
    • Each entry is a standard 2-bit predictor
  • Local predictor
    • Local history table: 1024 10-bit entries, each recording the last 10 outcomes of a branch, indexed by branch address
    • The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters
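Applying the same capacity counting to this configuration, and leaving out the 12-bit global history register, gives:

```latex
\begin{aligned}
\text{chooser} &= 4\text{K} \times 2 = 8\text{K bits}\\
\text{global predictor} &= 2^{12} \times 2 = 8\text{K bits}\\
\text{local history table} &= 1\text{K} \times 10 = 10\text{K bits}\\
\text{local counters} &= 1\text{K} \times 3 = 3\text{K bits}\\
\text{total} &= 8\text{K} + 8\text{K} + 10\text{K} + 3\text{K} = 29\text{K bits}
\end{aligned}
```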

Comparing Predictors (Fig. 2.8)
  • The advantage of a tournament predictor is its ability to select the right predictor for a particular branch
    • Particularly crucial for integer benchmarks
    • A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks

Pentium 4 Misprediction Rate (per 1000 instructions, not per branch)

  • SPECint: 6% misprediction rate per branch (19% of integer instructions are branches)
  • SPECfp: 2% misprediction rate per branch (5% of FP instructions are branches)

[Figure: Pentium 4 mispredictions per 1000 instructions on the SPECint2000 and SPECfp2000 benchmarks]
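The per-branch rates above can be converted to the per-1000-instruction metric used in the chart. This sketch (hypothetical name) treats the quoted percentages as suite-wide averages, which is a simplification.

```c
/* Mispredictions per 1000 instructions =
   branch fraction x per-branch misprediction rate x 1000. */
double mispredicts_per_kilo_instr(double branch_frac, double mispredict_rate)
{
    return branch_frac * mispredict_rate * 1000.0;
}
```

With the slide's averages this gives 0.19 × 0.06 × 1000 ≈ 11.4 for SPECint and 0.05 × 0.02 × 1000 = 1.0 for SPECfp. The two metrics differ because integer code has both more branches and harder-to-predict ones.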


Branch Target Prediction

  • In addition to predicting the branch direction, we must also predict the branch target address
  • Branch PC indexes into a predictor table; indirect branches might be problematic
  • Most common indirect branch: return from a procedure – can be easily handled with a stack of return addresses
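A sketch of the return-address stack mentioned above: calls push the address of the instruction after the call, and returns pop it as the predicted target. The 16-entry depth, the circular-overwrite policy, and all names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAS_DEPTH 16   /* assumption: 16 entries */

typedef struct {
    uint32_t addr[RAS_DEPTH];
    unsigned count;    /* pushes minus pops; slot = count % RAS_DEPTH (circular) */
} ras;

void ras_init(ras *r) { r->count = 0; }

/* On a call: push the return address. When more than RAS_DEPTH calls are
   outstanding, the oldest slots are overwritten, so those returns will
   later be mispredicted. */
void ras_push(ras *r, uint32_t return_addr)
{
    r->addr[r->count % RAS_DEPTH] = return_addr;
    r->count++;
}

/* On a return: pop the predicted target. Returns false when the stack is
   empty; some other mechanism (not modeled here) must supply a target. */
bool ras_pop(ras *r, uint32_t *target)
{
    if (r->count == 0)
        return false;
    r->count--;
    *target = r->addr[r->count % RAS_DEPTH];
    return true;
}
```

Because calls and returns nest, a stack predicts return targets almost perfectly until recursion depth exceeds `RAS_DEPTH`.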

Summary
  • When comparing branch predictors, ensure that they are the same “size” (the same number of state bits)
  • Correlating predictors predict branch direction based on the behavior of neighboring branches
  • Tournament predictors select between global and local predictors
  • Integer benchmarks benefit greatly from global and correlating predictors
  • Next class: BTB, dynamic scheduling of instructions
