Cmpe 421 advanced computer architecture





CMPE 421 Advanced Computer Architecture

Control Hazard and Prediction

PART4


Control Hazards

  • When the flow of instruction addresses is not sequential (i.e., the next PC is not PC + 4); incurred by change-of-flow instructions

    • Conditional branches (beq, bne)

    • Unconditional branches (j, jal, jr)

    • Exceptions

  • Possible approaches

    • Stall (impacts CPI)

    • Move decision point as early in the pipeline as possible, thereby reducing the number of stall cycles

    • Delay decision (requires compiler support)

    • Predict and hope for the best!

  • Control hazards occur less frequently than data hazards, but there is nothing as effective against control hazards as forwarding is for data hazards


Data Hazards for Branches

[Figure: five-stage pipeline diagram (IF, ID, EX, MEM, WB) for four instructions]

  • If a comparison register is the destination of the 2nd or 3rd preceding ALU instruction

add $1, $2, $3
add $4, $5, $6
...
beq $1, $4, target

  • Can resolve using forwarding


Data Hazards for Branches

  • If a comparison register is the destination of the immediately preceding ALU instruction or the 2nd preceding load instruction

    • Need 1 stall cycle

lw $1, addr
add $4, $5, $6
beq $1, $4, target

[Figure: five-stage pipeline diagram; beq goes IF, ID (stalled), ID, EX, MEM, WB]


Data Hazards for Branches

  • If a comparison register is the destination of the immediately preceding load instruction

    • Need 2 stall cycles

lw $1, addr
beq $1, $0, target

[Figure: five-stage pipeline diagram; beq goes IF, ID (stalled), ID (stalled), ID, EX, MEM, WB]
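The three cases above (no stall, 1 stall, 2 stalls) follow one rule when the branch compares its registers in ID. A minimal sketch of that rule, assuming forwarding into the ID comparator is available; the function name `branch_stall_cycles` and its parameters are hypothetical and do not come from the slides:

```python
def branch_stall_cycles(producer: str, distance: int) -> int:
    """Stall cycles for a branch resolved in ID.

    producer: "alu" or "load", the kind of instruction that writes a
              register the branch compares (hypothetical labels).
    distance: 1 if the producer immediately precedes the branch,
              2 if it is the 2nd preceding instruction, and so on.
    """
    if distance < 1:
        raise ValueError("distance must be >= 1")
    # How many stages after ID the value is produced:
    # an ALU result is ready at the end of EX (1 stage after ID),
    # load data is ready at the end of MEM (2 stages after ID).
    stages_after_id = {"alu": 1, "load": 2}[producer]
    # Each instruction of separation hides one cycle of that latency.
    return max(0, stages_after_id - distance + 1)


if __name__ == "__main__":
    for kind, dist in [("alu", 3), ("alu", 2), ("alu", 1), ("load", 2), ("load", 1)]:
        print(kind, dist, branch_stall_cycles(kind, dist))
```

The five printed cases correspond to the slides: ALU producers 2 or 3 instructions back need only forwarding, an adjacent ALU producer or a load 2 back needs 1 stall, and an adjacent load needs 2.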


Jumps Incur One Stall

  • Jumps not decoded until ID, so one flush is needed

  • Fortunately, jumps are very infrequent – only ~3% of the instruction mix

Fix jump hazard by waiting – stall – but affects CPI

[Figure: pipeline diagram in instruction order: j, one flushed instruction slot, then the instruction at the jump target]


Supporting ID Stage Jumps

[Figure: pipelined datapath with jump support in the ID stage. Labels: Jump, PCSrc, Branch, PC+4[31-28], Shift left 2, Add, Control, Sign Extend (16 to 32), PC, Instruction Memory, Register File, ALU, ALU cntrl, Data Memory, Forward Unit, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers]


Datapath Branch and Jump Hardware

[Figure: pipelined datapath showing the branch and jump hardware. Labels: Jump, PCSrc, Branch, PC+4[31-28], Shift left 2, Add, Control, Sign Extend (16 to 32), PC, Instruction Memory, Register File, ALU, ALU cntrl, Data Memory, Forward Unit, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers]


Branch Hazard

  • The delay in determining the next instruction to fetch is called a “control” or “branch” hazard

  • It arises from the need to make a decision based on the results of one instruction while others are executing

  • The pipeline cannot know what the next instruction should be (it has only just received the branch instruction from memory)

  • Whether the branch is taken is not known until the MEM stage

  • These types of hazards occur less frequently than data hazards

    • 15-20% of all instructions are branches


Solution 1: Always stall

  • Assume extra H/W is added to test registers, calculate the branch target address, and update the PC during the second stage (see fig 6.7 and page 13)

    • The lw instruction, which executes if the branch fails (fig 6.8 A), is stalled for one extra 200 ps clock cycle before starting.

    • Disadvantage: in longer pipelines, stalling on every branch slows down the total execution time.



Branch Hazard - Stall on Branch

FIGURE 6.7 Pipeline showing stalling on every conditional branch as solution to control hazards. This example assumes the conditional branch is taken, and the instruction at the destination of the branch is the OR instruction. There is a one-stage pipeline stall, or bubble, after the branch. In reality, the process of creating a stall is slightly more complicated, as we will see in Section 6.6. The effect on performance, however, is the same as would occur if a bubble were inserted. Page 380


Solution 2: Prediction (requires added H/W to flush instructions when the prediction is wrong)

  • Assume branch is not taken (Fig 6.8)

    • If so, pipeline proceeds at full speed

    • Fetch the next instruction in program order

    • When the branch is resolved:

      • If the branch is not taken, keep going, no problem

      • If the branch is taken, we need to flush the 3 instructions fetched after it (when the branch is resolved in the MEM stage)

    • We use nops to discard the instructions in the IF, ID, and EX stages.

    • In the case of branches, we need to flush the instructions from the pipeline, so that they don't have any effect


Branch Hazard - Prediction

FIGURE 6.8 Predicting that branches are not taken as a solution to control hazards. (A) The top drawing shows the pipeline when the branch is not taken. (B) The bottom drawing shows the pipeline when the branch is taken. As we noted in Figure 6.7, the insertion of a bubble in this fashion simplifies what actually happens, at least during the first clock cycle immediately following the branch. Section 6.6 will reveal the details.


Case of Dependence

  • We do not know which instruction actually follows the branch until the MEM stage of the branch instruction

  • The pipeline, however, will have already fetched 3 instructions (and, or, add) from the not-taken path. (See Figure 6.37)

  • Stalls reduce performance

    • But are required to get correct results

  • Compiler can arrange code to avoid hazards and stalls

    • Requires knowledge of the pipeline structure


The trouble with branches

[Figure annotation: flush these instructions (set control values to 0)]

FIGURE 6.37 The impact of the pipeline on the branch instruction. The numbers to the left of the instruction (40, 44, . . . ) are the addresses of the instructions. Since the branch instruction decides whether to branch in the MEM stage—clock cycle 4 for the beq instruction above—the three sequential instructions that follow the branch will be fetched and begin execution. Without intervention, those three following instructions will begin execution before beq branches to lw at location 72. (Figure 6.7 assumed extra hardware to reduce the control hazard to one clock cycle; this figure uses the nonoptimized datapath.)


The trouble with branches

Fix branch hazard by waiting – stall – but affects CPI

[Figure: pipeline diagram in instruction order: beq, three flushed instruction slots, then the instruction at the beq target]


Reducing Branch Delay

  • Move hardware to determine outcome to ID stage

    • Target address adder

    • Register comparator

  • Example: branch taken

    36: sub $10, $4, $8
    40: beq $1, $3, 7
    44: and $12, $2, $5
    48: or $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
    ...
    72: lw $4, 50($7)
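The target address adder in the example above takes the beq at address 40 with offset field 7 to the lw at address 72. A minimal sketch of that arithmetic, assuming the standard MIPS rule that the 16-bit offset is sign-extended, shifted left 2, and added to PC + 4; the helper names are hypothetical:

```python
def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, imm16: int) -> int:
    """Branch target = (PC + 4) + (sign-extended offset << 2), kept to 32 bits."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


if __name__ == "__main__":
    # beq $1, $3, 7 at address 40 from the example above
    print(branch_target(40, 7))  # prints 72
```

Because the offset is counted in words from PC + 4, an offset of 7 skips the seven instructions at addresses 44 through 68.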


Example: Branch Taken



Solutions for branch hazards

  • Say, we move branch logic earlier in the pipeline:

    • ID stage is the earliest time possible

    • Need circuitry to calculate branch target address

      • Easy, since we have PC and offset from the IF stage

    • Need circuitry to evaluate branch condition:

      • Harder!

      • Branch condition may be dependent on earlier instructions!

      • Moving the condition checking earlier introduces more data hazards between the branch and earlier instructions on which the branch depends! (see page 3, 4 and 5)


Solutions for branch hazards

  • Say, we move branch logic earlier in the pipeline:

    • Need to take care of data hazards before the branch

    • Forward from the EX/MEM and MEM/WB pipeline registers if the branch depends on a prior instruction (page 4)

    • A data hazard can still occur if the immediately preceding instruction writes a register that the branch uses for its comparison.

    • At the decode stage we need to decide whether we should bypass the ALU and use the dedicated branch condition logic (see page 21)



Moving Branch Decisions Earlier in Pipe

  • Move the branch decision hardware back to the EX stage

    • Reduces the number of stall (flush) cycles to two

    • Adds an and gate and a 2x1 mux to the EX timing path

  • Add hardware to compute the branch target address and evaluate the branch decision to the ID stage

    • Reduces the number of stall (flush) cycles to one (like with jumps)

      • But now need to add forwarding hardware in ID stage

    • Computing branch target address can be done in parallel with RegFile read (done for all instructions – only used when needed)

    • Comparing the registers can’t be done until after RegFile read, so comparing and updating the PC adds a mux, a comparator, and an and gate to the ID timing path

  • For deeper pipelines, branch decision points can be even later in the pipeline, requiring more stalls


ID Branch Forwarding Issues

  • MEM/WB “forwarding” is taken care of by the normal RegFile write before read operation (during same clock cycle)

WB:  add3 $1,
MEM: add2 $3,
EX:  add1 $4,
ID:  beq $1,$2,Loop
IF:  next_seq_instr

  • Need to forward from the EX/MEM pipeline stage to the ID comparison hardware for cases like

WB:  add3 $3,
MEM: add2 $1,
EX:  add1 $4,
ID:  beq $1,$2,Loop
IF:  next_seq_instr

if (IDcontrol.Branch
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = IF/ID.RegisterRs))
        ForwardC = 1

if (IDcontrol.Branch
    and (EX/MEM.RegisterRd != 0)
    and (EX/MEM.RegisterRd = IF/ID.RegisterRt))
        ForwardD = 1

Forwards the result from the second previous instr. to either input of the compare
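The two forwarding conditions above can be written as an executable check. This is a sketch that mirrors the slide's conditions only; the function name and argument names are hypothetical, and a complete design would probably also confirm that the EX/MEM instruction actually writes a register, which the slide does not show:

```python
def branch_forward_signals(id_branch: bool, ex_mem_rd: int,
                           if_id_rs: int, if_id_rt: int) -> tuple[int, int]:
    """Return (ForwardC, ForwardD) for the ID-stage comparator.

    id_branch : True when the instruction in ID is a branch
    ex_mem_rd : destination register number held in EX/MEM
    if_id_rs, if_id_rt : source register numbers of the instruction in ID
    """
    # Register $0 is never forwarded: it always reads as zero.
    candidate = id_branch and ex_mem_rd != 0
    forward_c = int(candidate and ex_mem_rd == if_id_rs)
    forward_d = int(candidate and ex_mem_rd == if_id_rt)
    return forward_c, forward_d


if __name__ == "__main__":
    # Second diagram above: add2 writes $1 (now in MEM), beq $1,$2,Loop is in ID.
    print(branch_forward_signals(True, ex_mem_rd=1, if_id_rs=1, if_id_rt=2))  # (1, 0)
```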


Supporting ID Stage Branches

[Figure: pipelined datapath with branch support in the ID stage. Labels: Branch, PCSrc, IF.Flush, Hazard Unit, Compare, Shift left 2, Add, Control, Sign Extend (16 to 32), PC, Instruction Memory, RegFile, ALU, ALU cntrl, Data Memory, two Forward Units, and the IF/ID, ID/EX, EX/MEM, MEM/WB pipeline registers]



Summary: Static Branch Prediction

  • Resolve branch hazards by assuming a given outcome and proceeding without waiting to see the actual branch outcome

  • Predict not taken – always predict that branches will not be taken and continue to fetch from the sequential instruction stream; the pipeline stalls only when a branch is taken

    • If taken, flush instructions after the branch (earlier in the pipeline)

      • in IF, ID, and EX stages if branch logic in MEM – three stalls

      • In IF and ID stages if branch logic in EX – two stalls

      • in IF stage if branch logic in ID – one stall

    • ensure that those flushed instructions haven’t changed the machine state – automatic in the MIPS pipeline since machine state changing operations are at the tail end of the pipeline (MemWrite (in MEM) or RegWrite (in WB))

    • restart the pipeline at the branch destination
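The cost of predict-not-taken can be estimated from the flush counts listed above. A rough sketch, assuming an ideal base CPI of 1 and no other stalls; the function name is hypothetical, and `taken_fraction` is an assumed input because the slides give the branch frequency (15-20%) but not how often branches are taken:

```python
# Flushed instructions per taken branch, by the stage that resolves the branch
# (the three cases listed in the summary above).
FLUSHES_BY_RESOLVE_STAGE = {"ID": 1, "EX": 2, "MEM": 3}


def cpi_predict_not_taken(branch_fraction: float, taken_fraction: float,
                          resolve_stage: str, base_cpi: float = 1.0) -> float:
    """Estimated CPI when only taken branches pay the flush penalty."""
    penalty = FLUSHES_BY_RESOLVE_STAGE[resolve_stage]
    return base_cpi + branch_fraction * taken_fraction * penalty


if __name__ == "__main__":
    # 20% branches (upper end of the slide's range), half of them taken (assumed).
    for stage in ("MEM", "EX", "ID"):
        print(stage, round(cpi_predict_not_taken(0.20, 0.5, stage), 2))
```

Under these assumed numbers, moving the branch decision from MEM to ID cuts the added CPI from about 0.3 to about 0.1.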


Flushing

  • To flush the IF stage instruction, assert IF.Flush to zero the instruction field of the IF/ID pipeline register (transforming it into a noop)

4 beq $1,$2,2
8 sub $4,$1,$5
16 and $6,$1,$7
20 or $8,$1,$9

[Figure: pipeline diagram in instruction order for the four instructions above, with one flushed instruction slot]
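The effect of IF.Flush can be modeled as a choice made when the IF/ID register latches. A minimal sketch, assuming the register holds the instruction word and PC + 4; the class and function names are hypothetical, and the hex encoding of sub $4,$1,$5 is worked out here for illustration:

```python
from dataclasses import dataclass

NOP = 0x00000000  # the all-zero MIPS word decodes as sll $0,$0,0, which does nothing


@dataclass
class IFID:
    """Contents of the IF/ID pipeline register (simplified)."""
    instruction: int = NOP
    pc_plus4: int = 0


def latch_if_id(fetched_instruction: int, pc: int, if_flush: bool) -> IFID:
    """Value written into IF/ID at the clock edge.

    When if_flush is asserted, the instruction field is zeroed, so the
    instruction that was being fetched continues down the pipe as a noop.
    """
    instruction = NOP if if_flush else fetched_instruction
    return IFID(instruction=instruction, pc_plus4=pc + 4)


if __name__ == "__main__":
    SUB_4_1_5 = 0x00252022  # sub $4,$1,$5 at address 8
    print(hex(latch_if_id(SUB_4_1_5, 8, if_flush=False).instruction))
    print(hex(latch_if_id(SUB_4_1_5, 8, if_flush=True).instruction))
```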


Dynamic Branch Prediction (Topic for Term Project)

  • In deeper and superscalar pipelines, branch penalty is more significant

  • Use dynamic prediction

    • Branch prediction buffer (aka branch history table)

    • Indexed by recent branch instruction addresses

    • Stores outcome (taken/not taken)

    • To execute a branch

      • Check table, expect the same outcome

      • Start fetching from fall-through or target

      • If wrong, flush pipeline and flip prediction


Dynamic Branch Prediction

  • A branch prediction buffer (aka branch history table (BHT)) in the IF stage, addressed by the lower bits of the PC, contains a bit passed to the ID stage through the IF/ID pipeline register that tells whether the branch was taken the last time it was executed

    • Prediction bit may predict incorrectly (may be a wrong prediction for this branch this iteration or may be from a different branch with the same low order PC bits) but this doesn’t affect correctness, just performance

      • Branch decision occurs in the ID stage after determining that the fetched instruction is a branch and checking the prediction bit

    • If the prediction is wrong, flush the incorrect instruction(s) in pipeline, restart the pipeline with the right instruction, and invert the prediction bit

      • With a 4096-bit BHT, the misprediction rate varies from 1% (nasa7, tomcatv) to 18% (eqntott)
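A 1-bit BHT of the kind described above fits in a few lines. This is a sketch under stated assumptions: the class name is hypothetical, the table is indexed by the word address (PC >> 2) masked to the table size, and every entry starts at 0 (predict not taken):

```python
class OneBitBHT:
    """Branch history table with one prediction bit per entry."""

    def __init__(self, entries: int = 4096):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.bits = [0] * entries  # 0 = predict not taken, 1 = predict taken

    def _index(self, pc: int) -> int:
        # Drop the two byte-offset bits, then keep only the low-order bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return bool(self.bits[self._index(pc)])

    def update(self, pc: int, taken: bool) -> None:
        # Storing the actual outcome is the same as inverting the bit
        # whenever the prediction was wrong.
        self.bits[self._index(pc)] = int(taken)


if __name__ == "__main__":
    bht = OneBitBHT(entries=4)
    bht.update(0x40, taken=True)
    print(bht.predict(0x40))  # True
    print(bht.predict(0x50))  # True: 0x50 shares the entry used by 0x40
    print(bht.predict(0x44))  # False
```

The tiny 4-entry table in the usage lines is chosen only to make the aliasing case from the slide (two branches with the same low-order PC bits) easy to see.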


Branch Target Buffer

[Figure: IF stage with a BTB next to the Instruction Memory; labels: PC, Read Address]

  • The BHT predicts when a branch is taken, but does not tell where it's taken to!

    • A branch target buffer (BTB) in the IF stage can cache the branch target address, but we also need to fetch the next sequential instruction. The prediction bit in IF/ID selects which “next” instruction will be loaded into IF/ID at the next clock edge

      • Would need a two read port instruction memory

  • Or the BTB can cache the branch taken instruction while the instruction memory is fetching the next sequential instruction

  • If the prediction is correct, stalls can be avoided no matter which direction the branch goes
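The first BTB variant above (caching target addresses) amounts to a lookup that picks the next fetch address. A minimal sketch, assuming an unbounded table keyed by the full branch PC; a real BTB would be a fixed-size, tagged structure, and the class and method names here are hypothetical:

```python
class BranchTargetBuffer:
    """Caches branch PC -> branch target address."""

    def __init__(self):
        self.targets: dict[int, int] = {}

    def record(self, branch_pc: int, target: int) -> None:
        """Remember where a resolved taken branch went."""
        self.targets[branch_pc] = target

    def next_fetch_pc(self, pc: int, predict_taken: bool) -> int:
        """Choose the address to fetch after the instruction at pc."""
        target = self.targets.get(pc)
        if predict_taken and target is not None:
            return target   # predicted taken and the target is cached
        return pc + 4       # otherwise fall through sequentially


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(btb.next_fetch_pc(40, predict_taken=True))   # 44: target not cached yet
    btb.record(40, 72)
    print(btb.next_fetch_pc(40, predict_taken=True))   # 72
    print(btb.next_fetch_pc(40, predict_taken=False))  # 44
```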


1-bit Prediction Accuracy

  • Assume predict_bit = 0 to start (indicating branch not taken) and loop control is at the bottom of the loop code

  • First time through the loop, the predictor mispredicts the branch since the branch is taken back to the top of the loop; invert prediction bit (predict_bit = 1)

  • As long as branch is taken (looping), prediction is correct

  • Exiting the loop, the predictor again mispredicts the branch since this time the branch is not taken falling out of the loop; invert prediction bit (predict_bit = 0)

  • For a loop like this, a 1-bit predictor is therefore incorrect twice: on the first iteration and again when the branch is finally not taken

Loop: 1st loop instr
      2nd loop instr
      .
      .
      .
      last loop instr
      bne $1,$2,Loop
      fall out instr

  • For 10 times through the loop we have an 80% prediction accuracy for a branch that is taken 90% of the time
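The 80% figure can be checked by replaying the loop branch through a 1-bit predictor. A small sketch; the function name is hypothetical, and the outcome list encodes 10 passes through the loop (taken 9 times, then not taken once):

```python
def one_bit_accuracy(outcomes: list[bool], predict_bit: int = 0) -> float:
    """Fraction of branch outcomes a 1-bit predictor gets right."""
    correct = 0
    for taken in outcomes:
        if bool(predict_bit) == taken:
            correct += 1
        # The bit always ends up holding the most recent outcome.
        predict_bit = int(taken)
    return correct / len(outcomes)


if __name__ == "__main__":
    loop_branch = [True] * 9 + [False]    # bne taken 9 times, then falls out
    print(one_bit_accuracy(loop_branch))  # 0.8
```

The two misses are the first iteration (bit starts at 0) and the loop exit, leaving 8 correct predictions out of 10.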


1-Bit Predictor: Shortcoming

  • Inner loop branches mispredicted twice!

outer: …
       …
inner: …
       beq …, …, inner
       …
       beq …, …, outer

  • Mispredict as taken on last iteration of inner loop

  • Then mispredict as not taken on first iteration of inner loop next time around
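The double misprediction repeats on every pass of the outer loop, which a short simulation makes visible. A sketch with hypothetical names, assuming the inner branch is taken on every inner iteration except the last and that the prediction bit starts at 0:

```python
def inner_branch_mispredictions(inner_iterations: int, outer_iterations: int,
                                predict_bit: int = 0) -> int:
    """Count 1-bit mispredictions of the inner-loop branch in a nested loop."""
    mispredictions = 0
    for _ in range(outer_iterations):
        for i in range(inner_iterations):
            taken = i < inner_iterations - 1  # falls out on the last iteration
            if bool(predict_bit) != taken:
                mispredictions += 1
            predict_bit = int(taken)          # remember the latest outcome
    return mispredictions


if __name__ == "__main__":
    # 5 outer passes of a 10-iteration inner loop
    print(inner_branch_mispredictions(10, 5))  # 10, i.e. 2 per outer pass
```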


2-bit Predictors

  • A 2-bit scheme can give 90% accuracy since a prediction must be wrong twice before the prediction bit is changed

Loop: 1st loop instr
      2nd loop instr
      .
      .
      .
      last loop instr
      bne $1,$2,Loop
      fall out instr

  • Annotations on the loop: right 9 times, wrong on loop fall out, right on 1st iteration

  • BHT also stores the initial FSM state

[Figure: 2-bit predictor state machine with states 11 and 10 (Predict Taken) and 00 and 01 (Predict Not Taken); transitions are labeled Taken and Not taken]
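The 90% figure can be reproduced with a 2-bit predictor. The sketch below uses a saturating counter, a common 2-bit scheme; this is an assumption, since the exact transitions of the slide's state machine are not recoverable from the transcript. States 2 and 3 stand in for the predict-taken states (10, 11) and states 0 and 1 for the predict-not-taken states; the function name is hypothetical:

```python
def two_bit_accuracy(outcomes: list[bool], state: int = 3) -> tuple[float, int]:
    """Run a 2-bit saturating counter over branch outcomes.

    Returns (accuracy, final_state). States 2 and 3 predict taken,
    states 0 and 1 predict not taken.
    """
    correct = 0
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / len(outcomes), state


if __name__ == "__main__":
    loop_branch = [True] * 9 + [False]  # taken 9 times, then falls out
    accuracy, state = two_bit_accuracy(loop_branch)
    print(accuracy, state)              # 0.9 2
    # Next time the loop runs, the first iteration is still predicted taken.
    print(two_bit_accuracy(loop_branch, state)[0])  # 0.9
```

The single miss at loop exit only weakens the prediction, so re-entering the loop does not cost a second misprediction as it does with one bit.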

