
MIPS Pipelining

Chapter 4

Sections 4.5 – 4.8

Dr. Iyad F. Jafar


Outline

  • Introduction

  • Why Pipelining?

  • MIPS Pipelined Datapath

  • MIPS Pipelined Control

  • Pipelining Hazards

  • Structural Hazards

  • Data Hazards

  • Control Hazards

  • Exceptions and Interrupts

  • Fallacies and Pitfalls

  • Reading Assignment


Introduction

  • Single-cycle datapath

    • Simple!

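    • Hardware replication? Each unit can be used only once per instruction, so memories and adders must be duplicated

    • Cycle time? It is set by the slowest instruction, so faster instructions waste part of every cycle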

  • Multi-cycle datapath

    • More involved

    • Less HW replication of major units

    • Better performance if the delay of major functional units is balanced!

  • Can we do any better?

    • Pipelining!


Introduction

  • Pipelining

    • In the multi-cycle design, only one major unit is used in each cycle while the other units are idle!

    • Why not use them to do something else?

    • Basically, start the next instruction before the current one is finished!

[Figure: LW, SW, and R-type instructions overlapped over cycles 1 to 8, each passing through IFetch, Dec, Exec, Mem, and WB]


Introduction

  • Pipelining

    • The time required to execute one instruction (instruction latency) is not reduced; it can even grow when the stage delays are unbalanced!

    • However, the number of instructions finished per unit time (throughput) is increased

    • Thus, pipelining improves throughput, not latency!

    • Most modern processors are pipelined!

    • Notes

      • As in multi-cycle, the cycle time is determined by the slowest stage!

      • However, similar to single-cycle, we can ideally get one instruction done every cycle!

      • It is assumed that all instructions take the same number of cycles (stages)!


Introduction

[Figure: lw, sw, and R-type instructions under three implementations. Single Cycle Implementation: one long clock cycle per instruction (cycles 1 and 2), with wasted time in the sw cycle. Multiple Cycle Implementation: shorter cycles 1 to 10, one step (IFetch, Dec, Exec, Mem, WB) per cycle. Pipeline Implementation: the same three instructions overlapped, each starting one cycle after the previous one]


Why Pipelining?

  • For Performance!

[Figure: Inst 1 to Inst 5 in program order versus time (clock cycles), each passing through IM, Reg, ALU, DM, Reg; the first cycles are the time to fill the pipeline]

Once the pipeline is full, one instruction is completed every cycle, so CPI = 1 (similar to single-cycle)


Why Pipelining?

  • Example 1. Comparing pipelining to single-cycle

    Consider a program consisting only of a large number of LOAD instructions. It is executed on a single-cycle CPU and on a 5-stage pipelined CPU; in both cases the operation time of each major unit (memory, ALU, and register file) is 200 ps.

    1) Determine the time required to finish executing 1,000,000 LOAD instructions and compute the speedup of pipelining.

    2) Determine the time required to finish executing the first 3 LOAD instructions.

    3) Repeat (1) and (2) if the delay of the register file is 100 ps instead of 200 ps.

    Cycle times for the two implementations (a LOAD uses instruction memory, register read, ALU, data memory, and register write)

    CC_SC = 200 + 200 + 200 + 200 + 200 = 1000 ps

    CC_PP = 200 ps (the slowest stage)


Why Pipelining?

  • Example 1. Comparing pipelining to single-cycle

    1) Determine the time required to finish executing 1,000,000 LOAD instructions and compute the speedup of pipelining.

Single-cycle

Time_SC = 1000 ps x 1,000,000 = 1,000,000,000 ps

Pipelining (the first LOAD needs 5 cycles = 1000 ps; each of the remaining 999,999 finishes one 200 ps cycle later)

Time_PP = 1000 ps + 200 ps x 999,999 = 200,000,800 ps

Speedup = 1,000,000,000 / 200,000,800 = 4.99998

(very close to the number of stages)


Why Pipelining?

  • Example 1. Comparing pipelining to single-cycle

    2) Determine the time required to finish executing the first 3 LOAD instructions and compute the speedup of pipelining.

Single-cycle

Time_SC = 1000 ps x 3 = 3000 ps

Pipelining

Time_PP = 200 x 5 + 200 + 200 = 1400 ps

Speedup = 3000 / 1400 = 2.14

(less than the number of stages, because the time to fill the pipeline is not amortized)


Why Pipelining?

  • Example 1. Comparing pipelining to single-cycle

    3) Repeat (1) and (2) if the delay of the register file is 100 ps.

    CC_SC = 200 + 100 + 200 + 200 + 100 = 800 ps

    CC_PP = 200 ps (still set by the slowest stage)

For 1,000,000 instructions

Time_SC = 800 x 1,000,000 = 800,000,000 ps

Time_PP = 1000 + 200 x 999,999 = 200,000,800 ps

Speedup = 800,000,000 / 200,000,800 = 3.99998 (< 5)

For 3 instructions

Time_SC = 800 x 3 = 2400 ps

Time_PP = 1000 + 200 x 2 = 1400 ps

Speedup = 2400 / 1400 = 1.71 (< 5)


Why Pipelining?

  • Example 1. Summary

  • Ideally, the pipelined processor is n times faster than the single-cycle one, where n is the number of pipeline stages.

    • In the 5-stage MIPS, the pipelined version would be 5 times faster.

    • When the pipeline is full, the throughput is one instruction per cycle

  • Many factors affect pipelining performance

    • Time to fill and empty the pipeline

    • Number of instructions to execute

    • Unbalanced delay of pipeline stages

    • Instruction mix

    • Pipeline hazards

  • Ideally, the number of cycles required to finish M instructions in an N-stage pipeline is N + M - 1
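The N + M - 1 formula and the Example 1 numbers can be reproduced with a minimal Python sketch. It assumes, as the example does, that the single-cycle clock is the sum of the five stage delays and the pipelined clock is the slowest stage, with no pipeline-register overhead; the function names are illustrative only.

```python
def pipeline_cycles(n_stages: int, n_instructions: int) -> int:
    """Ideal cycle count for M instructions in an N-stage pipeline: N + M - 1."""
    return n_stages + n_instructions - 1


def compare(stage_delays_ps, n_instructions):
    """Return (single-cycle time, pipelined time, speedup), times in ps.

    Assumes the single-cycle clock is the sum of all stage delays and the
    pipelined clock is the slowest stage (no pipeline-register overhead).
    """
    single_cycle_clock = sum(stage_delays_ps)
    pipeline_clock = max(stage_delays_ps)
    time_sc = single_cycle_clock * n_instructions
    time_pp = pipeline_clock * pipeline_cycles(len(stage_delays_ps), n_instructions)
    return time_sc, time_pp, time_sc / time_pp


# Example 1 stage delays for a LOAD: IF, ID (reg read), EX, MEM, WB (reg write)
balanced = [200, 200, 200, 200, 200]
fast_regfile = [200, 100, 200, 200, 100]

for delays in (balanced, fast_regfile):
    for m in (1_000_000, 3):
        t_sc, t_pp, s = compare(delays, m)
        print(f"{m:>9} loads: SC={t_sc} ps, PP={t_pp} ps, speedup={s:.5f}")
```

Only the clock of the pipelined version depends on the slowest stage, which is why shrinking the register file delay helps the single-cycle design but not the pipeline.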


Pipelined MIPS Datapath

  • What do we need to implement pipelining?

  • We need to consider the following:

    • The execution of instructions is divided into 5 stages (cycles): Instruction fetch (IF) , Instruction decode (ID), Execute (EX), Memory Access (MEM), Write Back (WB)

    • Instruction flow is from left to right except in two cases

      • In the write-back stage where the result is written into the register file in the middle of the datapath

      • Choosing between the incremented PC and the branch address in the MEM stage

    • In pipelining, all units are operating in every cycle; thus we have to duplicate hardware where needed

    • Since the execution is over multiple cycles, we need to add State (Pipeline) registers between stages to preserve intermediate data and control for each instruction.

      • These registers hold the values to be used in later stages as long as they are needed.


Pipelined MIPS Datapath

[Figure: pipelined datapath with stages IF, ID, EX, MEM, WB separated by the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB pipeline registers, all driven by the system clock; components include the PC, the +4 adder, Instruction Memory, Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data), Sign Extend (16 to 32 bits), Shift left 2, the branch adder, the ALU, and Data Memory]

Any problem?


Pipelined MIPS Datapath

[Figure: the pipelined datapath from the previous slide (IF, ID, EX, MEM, WB with the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB registers)]

Need to preserve the destination register! By the time a load reaches WB, the IF/ID register holds a later instruction, so the write register number must be carried along through the pipeline registers.


Pipelined MIPS Datapath

  • Example 2. Execution of LW instruction

    (1) Instruction Fetch: Put PC and the loaded instruction in the IF/ID register


Pipelined MIPS Datapath

  • Example 2. Execution of LW instruction

    (2) Instruction Decode and Read Registers: Store Reg[rs], Reg[rt], the sign-extended offset, rd, rt, and the updated PC (why?) in the ID/EX register


MIPS Pipelining

  • Example 2. Execution of LW instruction

    (3) Execute or Address Calculation: Store the branch address, Reg[rt], the ALU result, and the zero flag in the EX/MEM register


Pipelined MIPS Datapath

  • Example 2. Execution of LW instruction

    (4) Memory Access: Store the data from memory into MEM/WB register


Pipelined MIPS Datapath

  • Example 2. Execution of LW instruction

    (5) Write Back: Copy the data loaded in the MEM/WB register to register file


Pipelined MIPS Datapath

  • Required data fields in the pipeline registers

  • Data fields are moved from one pipeline register to another every clock cycle until they are no longer needed


Pipelined MIPS Control

  • All control signals can be determined during the decode stage, but most of them are needed only in later stages!

  • Solution! Expand the pipeline registers to store and move the control signals between stages until they are needed


Pipelined MIPS Control

  • Define the control signals and generate them in the decode stage

  • For the time being, no explicit write signals are required for the pipeline registers since they are updated every cycle


Pipelined MIPS Control

  • Control signals needed in each stage

  • Control signal values based on instruction type
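The two tables referred to above did not survive extraction. As a sketch, the dictionary below assumes the usual textbook control values for R-type, lw, sw, and beq (they agree with the values used in Example 3) and shows which signals each pipeline register still has to carry. `None` stands for a don't-care (X); the helper name `fields_for` is hypothetical.

```python
# Stage that consumes each control signal generated in ID.
STAGE_OF = {
    "RegDst": "EX", "ALUOp": "EX", "ALUSrc": "EX",
    "Branch": "MEM", "MemRead": "MEM", "MemWrite": "MEM",
    "RegWrite": "WB", "MemToReg": "WB",
}

# Assumed standard (textbook) control values; None = don't care.
CONTROL = {
    "R-type": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0, "Branch": 0,
               "MemRead": 0, "MemWrite": 0, "RegWrite": 1, "MemToReg": 0},
    "lw":     {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1, "Branch": 0,
               "MemRead": 1, "MemWrite": 0, "RegWrite": 1, "MemToReg": 1},
    "sw":     {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1, "Branch": 0,
               "MemRead": 0, "MemWrite": 1, "RegWrite": 0, "MemToReg": None},
    "beq":    {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0, "Branch": 1,
               "MemRead": 0, "MemWrite": 0, "RegWrite": 0, "MemToReg": None},
}

# Stages whose signals are still needed after each pipeline register.
STILL_NEEDED = {"ID/EX": {"EX", "MEM", "WB"},
                "EX/MEM": {"MEM", "WB"},
                "MEM/WB": {"WB"}}


def fields_for(instr_type: str, pipeline_reg: str) -> dict:
    """Control fields an instruction of this type carries in the given register."""
    needed = STILL_NEEDED[pipeline_reg]
    return {sig: val for sig, val in CONTROL[instr_type].items()
            if STAGE_OF[sig] in needed}


print(fields_for("lw", "EX/MEM"))
# {'Branch': 0, 'MemRead': 1, 'MemWrite': 0, 'RegWrite': 1, 'MemToReg': 1}
```

The pipeline registers shrink from left to right because each stage consumes its own signals and passes on only what later stages still need.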


MIPS Pipeline

  • Example 3. Given the code segment and the register contents below, show the contents of the data and control fields in the pipeline registers once the sixth instruction has been fetched (i.e., at the beginning of cycle 7)


MIPS Pipeline

  • Example 3. Multiple-clock-cycle diagram

[Figure: the six instructions below in program order versus time, each passing through IM, Reg, ALU, DM, Reg]

    lw $10, 20($1)

    sub $11,$1,$2

    add $12,$3,$4

    lw $13, 24($1)

    add $3,$2,$1

    sub $1,$5,$6


MIPS Pipeline

  • Example 3. Single-clock-cycle diagram

[Figure: the pipeline during cycle 6, with sub $1,$5,$6 in IF, add $3,$2,$1 in ID, lw $13, 24($1) in EX, add $12,$3,$4 in MEM, and sub $11,$1,$2 in WB]


MIPS Pipeline

  • Example 3.

    At the beginning of cycle 7, the sixth instruction is stored in the IF/ID register, while the data and control for earlier instructions have been pushed into the next pipeline registers and the register file. Thus,

  • IF/ID register

    • No control signals are stored

    • Store the instruction sub $1,$5,$6 and PC+4

      • IF/ID.Instruction = 0x00A60822

      • IF/ID.PC = 0x00000018


MIPS Pipeline

  • Example 3.

  • ID/EX register

    • Store the information of add $3,$2,$1 and PC+4

      • ID/EX.PC = 0x00000014

      • ID/EX.RegRsContents = 0x00000005

      • ID/EX.RegRtContents = 0x00000001

      • ID/EX.RegRt = (00001)2

      • ID/EX.RegRd = (00011)2

      • ID/EX.SignExtend = 0x00001820

    • Control Information

      • ID/EX.MemToReg = 0

      • ID/EX.RegWrite = 1

      • ID/EX.MemRead = 0

      • ID/EX.MemWrite = 0

      • ID/EX.Branch = 0

      • ID/EX.ALUSrc = 0

      • ID/EX.RegDst = 1

      • ID/EX.ALUOp = (10)2


MIPS Pipeline

  • Example 3.

  • EX/MEM register

    • Store the information of lw $13,24($1), branch address, and memory address

      • EX/MEM.BranchAddress = 0x00000070

      • EX/MEM.ALUOut = 0x00000019

      • EX/MEM.Zero = 0

      • EX/MEM.RegDestination= (01101)2

      • EX/MEM.RegRtContents = 0x0000000A

    • Control Information

      • EX/MEM.MemToReg = 1

      • EX/MEM.RegWrite = 1

      • EX/MEM.MemRead = 1

      • EX/MEM.MemWrite = 0

      • EX/MEM.Branch = 0


MIPS Pipeline

  • Example 3.

  • MEM/WB register

    • Store the information of add $12,$3,$4: the addition result and the (unused) data memory output

      • MEM/WB.RegDestination= (01100)2

      • MEM/WB.ALUOut = 0xFFFFFFFD

      • MEM/WB.MemoryData = XXXX

    • Control Information

      • MEM/WB.MemToReg = 0

      • MEM/WB.RegWrite = 1

  • For sub $11,$1,$2 (already through WB)

    • It has written 1 - 5 = -4 (0xFFFFFFFC) into $11
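The Example 3 values that depend only on the code (not on the register contents) can be checked with a short Python sketch. It assumes instruction i sits at address 4(i-1) and is fetched in cycle i, as in the example; the helper names are hypothetical, and only the R-type field layout and the branch-target formula come from the MIPS ISA.

```python
def encode_rtype(rs: int, rt: int, rd: int, funct: int, shamt: int = 0) -> int:
    """R-type layout: op(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def sign_extend16(value: int) -> int:
    """Sign-extend the low 16 bits of value to a 32-bit pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def branch_target(pc: int, imm16: int) -> int:
    """(PC + 4) + (sign-extended offset << 2), truncated to 32 bits."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


program = ["lw $10, 20($1)", "sub $11,$1,$2", "add $12,$3,$4",
           "lw $13, 24($1)", "add $3,$2,$1", "sub $1,$5,$6"]


def occupancy(cycle: int) -> dict:
    """Instruction held in each pipeline register at the start of `cycle`.

    Instruction i (1-based) is fetched in cycle i, so at the start of cycle c
    IF/ID holds instruction c-1, ID/EX holds c-2, and so on.
    """
    out = {}
    for k, reg in enumerate(["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"], start=1):
        i = cycle - k
        out[reg] = program[i - 1] if 1 <= i <= len(program) else None
    return out


SUB, ADD = 0x22, 0x20  # funct codes
print(occupancy(7))
print(hex(encode_rtype(rs=5, rt=6, rd=1, funct=SUB)))   # IF/ID.Instruction
print(hex(sign_extend16(encode_rtype(2, 1, 3, ADD))))    # ID/EX.SignExtend
print(hex(branch_target(0x0C, 24)))                      # EX/MEM.BranchAddress
```

For an R-type instruction the "sign-extended offset" is just the rd, shamt, and funct bits reinterpreted, which is why ID/EX.SignExtend holds 0x00001820 even though add never uses it.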


Pipelining Hazards

  • In general, pipelining is effective!

  • The MIPS ISA makes it even easier

    • All instructions are the same length (32 bits)

      • The next instruction can be fetched while the current one is being decoded

    • Few instruction formats, with the source register fields in the same place in each format

      • The register file can be read in the 2nd stage

    • Memory is accessed only by load and store instructions

      • The execute stage can be used to compute the address

    • Each MIPS instruction writes at most one result, in the MEM or WB stage

  • Is it that easy? Any complications?

    • YES!

    • PIPELINING HAZARDS !


Pipelining Hazards

  • Hazards: problems that might occur during pipeline operation

  • Three basic sources

    • Structural Hazards

      • In pipelining, all functional units are in use in every cycle

      • What if two instructions use the same functional unit in the same cycle?

    • Data Hazards

      • In pipelining, execution of instructions is overlapped

      • What if the operand(s) of some instruction come from an earlier instruction that is still in the pipeline?

    • Control Hazards

      • In pipelining, an instruction is fetched every cycle

      • What if an instruction is a jump or a branch that evaluates to true? The instruction(s) fetched after it might not be the correct ones!

  • Simple Solution?

    • Wait until the issue is resolved!


Structural Hazards

  • Single Memory!

[Figure: the instructions below in program order versus time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg]

    lw

    Inst 1

    Inst 2

    Inst 3

    Inst 4

Reading from memory twice in the same cycle!

Solution: Use two memories, one for data and one for instructions!


Structural Hazards

  • Single Register File!

[Figure: the instructions below in program order versus time (clock cycles), each passing through IM, Reg, ALU, DM, Reg; annotations mark the clock edge that controls loading of the pipeline state registers and the clock edge that controls register writing]

    add $1,

    Inst 1

    Inst 2

    add $2,$1,

One instruction is writing and another is reading the register file in the same cycle?

Solution: Design the register file to write in the first half of the cycle and read in the second half!


Data Hazards

[Figure: the instructions below in program order versus time, each passing through IM, Reg, ALU, DM, Reg]

    add $1,

    sub $4,$1,$5

    and $6,$1,$7

    or $8,$1,$9

    xor $4,$1,$5

  • Dependencies backward in time cause hazards

  • This is called a Read-after-Write (RAW) data hazard

  • Register-use data hazard

Solution?


Data Hazards

  • Simply wait for the earlier instruction to finish! This is called stalling the pipeline. However, how does this affect the CPI?

[Figure: pipeline diagram of the sequence below, each instruction passing through IM, Reg, ALU, DM, Reg]

    add $1,

    stall

    stall

    sub $4,$1,$5

    and $6,$1,$7

Do we need two stalls all the time?


Data Hazards

[Figure: the instructions below in program order versus time, each passing through IM, Reg, ALU, DM, Reg]

    lw $1,5($s1)

    sub $4,$1,$5

    and $6,$1,$7

    or $8,$1,$9

    xor $4,$1,$5

  • Dependencies backward in time cause hazards

  • It is a Read-after-Write (RAW) data hazard

  • Load-use data hazard

Solution?


Data Hazards

  • Again, wait for the LW instruction to finish by stalling the pipeline! However, how does this affect the CPI?

[Figure: pipeline diagram of the sequence below, each instruction passing through IM, Reg, ALU, DM, Reg]

    lw $1,

    stall

    stall

    sub $4,$1,$5

    and $6,$1,$7


Data Hazards

  • Example 4. How many cycles are actually required to execute the following code? Assume the pipeline is already full and there is no forwarding.

    add $1, $2, $5

    add $5, $3, $1

    sub $10, $7, $8

    sub $5, $6, $7

    lw $3, 45($9)

    add $3, $3, $8

Ideally, since the pipeline is full, each instruction requires 1 cycle. Thus, we need 6 cycles (CPI = 6/6 = 1). However, …

Register-use data hazard (add $5, $3, $1 needs $1 from the preceding add): adds 2 stall cycles

Load-use data hazard (add $3, $3, $8 needs $3 from the lw): adds 2 stall cycles

Thus, 10 cycles are needed.

CPI = 10/6 = 1.667 ??

Performance ??

Can we do any better?
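A minimal Python sketch of the Example 4 count, assuming no forwarding and a register file that writes in the first half of the cycle and reads in the second half, so a dependent instruction must be fetched at least 3 cycles after its producer. Registers are plain numbers and the function name is illustrative.

```python
def issue_cycles_no_forwarding(program):
    """Fetch cycle of each instruction (0-based), stalling on RAW hazards.

    program: list of (destination register or None, [source registers]).
    A producer fetched in cycle p writes back in cycle p + 4; a consumer
    fetched in cycle c reads registers in cycle c + 1, so c + 1 >= p + 4.
    """
    last_writer = {}  # register -> fetch cycle of its most recent writer
    issued = []
    for dest, sources in program:
        earliest = issued[-1] + 1 if issued else 0
        for src in sources:
            if src in last_writer:
                earliest = max(earliest, last_writer[src] + 3)
        issued.append(earliest)
        if dest is not None and dest != 0:  # writes to $0 are discarded
            last_writer[dest] = earliest
    return issued


example4 = [
    (1, [2, 5]),    # add $1, $2, $5
    (5, [3, 1]),    # add $5, $3, $1
    (10, [7, 8]),   # sub $10, $7, $8
    (5, [6, 7]),    # sub $5, $6, $7
    (3, [9]),       # lw  $3, 45($9)
    (3, [3, 8]),    # add $3, $3, $8
]
issued = issue_cycles_no_forwarding(example4)
cycles_when_full = issued[-1] - issued[0] + 1  # pipeline assumed already full
print(issued, cycles_when_full, cycles_when_full / len(example4))
```

Without forwarding, a load and an ALU instruction look identical to this model: both deliver their result only in WB.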


Data Hazards

  • Fixing Register-use Hazard by Forwarding

    • Note that data produced by an instruction and needed by a later instruction is pushed through the pipeline registers until it is saved into the register file!

    • Why not read the data from the pipeline registers before it is stored?

    • This is called forwarding!

    • What is required?

      • Need to detect the hazard

        • Is any of the source registers for the instruction the same as the destination register for an earlier instruction that is still in the pipeline?

      • Need to create a path to pass the data between pipeline stages

        • Instead of reading the source registers of the instruction from the register file, read them from the pipeline registers


Data Hazards

  • Fixing Register-use Hazard by Forwarding

[Figure: the instructions below in program order versus time, each passing through IM, Reg, ALU, DM, Reg]

    add $1,

    sub $4,$1,$5

    and $6,$1,$7

    or $8,$1,$9

    xor $4,$1,$5

No Stalls!


Data Hazards

  • Forwarding Hardware implementation

    Note that forwarding could be from EX/MEM or from MEM/WB! Why?


Data Hazards

  • Forwarding Hardware implementation

    • Inside the forwarding unit

    • Forwarding from EX/MEM (MEM Stage)

      if (EX/MEM.RegWrite

      and (EX/MEM.RegRd != 0)

      and (EX/MEM.RegRd = ID/EX.RegRs))

      then ForwardA = From EX/MEM

      if (EX/MEM.RegWrite

      and (EX/MEM.RegRd != 0)

      and (EX/MEM.RegRd = ID/EX.RegRt))

      then ForwardB = From EX/MEM

  • Why check the RegWrite signal?

  • Why check for the zero register ($0)?


Data Hazards

  • Forwarding Hardware implementation

    • Inside the forwarding unit

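    • Forwarding from MEM/WB (WB Stage), only when EX/MEM is not already forwarding the same register (EX/MEM holds the more recent value)

      if (MEM/WB.RegWrite

      and (MEM/WB.RegRd != 0)

      and not (EX/MEM.RegWrite and (EX/MEM.RegRd != 0) and (EX/MEM.RegRd = ID/EX.RegRs))

      and (MEM/WB.RegRd = ID/EX.RegRs))

      then ForwardA = From MEM/WB

      if (MEM/WB.RegWrite

      and (MEM/WB.RegRd != 0)

      and not (EX/MEM.RegWrite and (EX/MEM.RegRd != 0) and (EX/MEM.RegRd = ID/EX.RegRt))

      and (MEM/WB.RegRd = ID/EX.RegRt))

      then ForwardB = From MEM/WB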
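The EX/MEM and MEM/WB conditions can be combined into one function per ALU input. This Python sketch is illustrative: the names are hypothetical, it returns the name of the pipeline register to forward from instead of mux select bits, and it checks EX/MEM first so the most recent result wins when both registers target the same destination.

```python
from dataclasses import dataclass


@dataclass
class WriteInfo:
    """Fields of EX/MEM or MEM/WB that the forwarding unit inspects."""
    reg_write: bool  # will this instruction write the register file?
    reg_rd: int      # destination register number


def forward_source(src: int, ex_mem: WriteInfo, mem_wb: WriteInfo) -> str:
    """Select where an ALU operand (ID/EX.RegRs or ID/EX.RegRt) comes from."""
    # EX/MEM holds the most recent result, so it is checked first.
    if ex_mem.reg_write and ex_mem.reg_rd != 0 and ex_mem.reg_rd == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.reg_rd != 0 and mem_wb.reg_rd == src:
        return "MEM/WB"
    return "ID/EX"  # no hazard: use the value read from the register file


# An add that writes $1 is in MEM/WB, sub $4,$1,$5 is in EX/MEM,
# and and $6,$1,$7 is in EX.
ex_mem = WriteInfo(reg_write=True, reg_rd=4)
mem_wb = WriteInfo(reg_write=True, reg_rd=1)
forward_a = forward_source(1, ex_mem, mem_wb)  # rs = $1
forward_b = forward_source(7, ex_mem, mem_wb)  # rt = $7
print(forward_a, forward_b)
```

Checking `reg_rd != 0` keeps a discarded write to $0 from being forwarded, and checking `reg_write` keeps instructions such as sw and beq, whose rd field is meaningless, from triggering forwarding.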


Data Hazards

  • Can the forwarding hardware be used with the load-use data hazard?

[Figure: the instructions below in program order versus time, each passing through IM, Reg, ALU, DM, Reg]

    lw $1,4($2)

    sub $4,$1,$5

    and $6,$1,$7

    or $8,$1,$9

    xor $4,$1,$5

Only partially: the loaded value is available only at the end of the MEM stage, so we still need 1 stall for the instruction immediately following the load.


Data Hazards

  • How to stall the pipeline?

    • Stall is required when the instruction in the EX stage is Load and the one in the ID stage depends on the loaded value

    • The Load instruction moves normally to EX/MEM on the next cycle

    • The conflicting instruction (the instruction following the load) should stay in the decode stage? How?

      • Don’t write the IF/ID register: need an IF/IDWrite signal

      • Don’t update the PC: need a PCWrite signal

      • The control signals of the instruction in the decode stage are stored as 0’s (WHY?) in ID/EX: need a multiplexor for the control signals

      • Controlling the process requires a special unit; Hazard Detection Unit


Data Hazards

  • Stall Implementation


Data Hazards

  • Stall Implementation

    • Inside hazard detection unit

      if (ID/EX.MemRead

      and [(ID/EX.RegRt == IF/ID.RegRs) or

      (ID/EX.RegRt == IF/ID.RegRt)])

      then

      PCWrite = 0

      IF/IDWrite = 0

      Select 0’s as control signals

Any Problem?

Do we need to stall in all cases?

How about j and jal that come immediately after load with rs and/or rt fields being the same as the rt field of the load?
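The hazard detection condition above, written as a small Python function. The names are hypothetical; like the pseudocode, it compares rt of the load against both the rs and rt fields of the next instruction, so it also stalls when those fields are not real sources (the j/jal question above).

```python
from dataclasses import dataclass


@dataclass
class StallControl:
    pc_write: int       # 0 = keep the PC unchanged
    if_id_write: int    # 0 = keep the IF/ID register unchanged
    zero_controls: int  # 1 = select 0's as the ID/EX control signals (bubble)


def hazard_detection(id_ex_mem_read: bool, id_ex_rt: int,
                     if_id_rs: int, if_id_rt: int) -> StallControl:
    """Load-use check: stall if the load in EX writes a register read in ID."""
    if id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt):
        return StallControl(pc_write=0, if_id_write=0, zero_controls=1)
    return StallControl(pc_write=1, if_id_write=1, zero_controls=0)


# lw $1,4($2) is in EX (rt = 1); sub $4,$1,$5 is in ID (rs = 1, rt = 5).
print(hazard_detection(True, 1, 1, 5))
# One cycle later the bubble is in EX (MemRead = 0), so sub proceeds and
# gets $1 through the MEM/WB forwarding path.
print(hazard_detection(False, 1, 1, 5))
```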


Data Hazards

  • Example 5. Consider the following code segment in C

    A = B + E

    C = B + F

    (1) Generate the MIPS code assuming that variables A, B, C, E, and F are in memory and addressable with offsets 0, 4, 8, 12, and 16 from $t0

    (2) Find all the data hazards and determine the number of cycles required to run the code. Assume forwarding is implemented.

    (3) Can you reorder the code to reduce the stalls?


Data Hazards

Ideally, each instruction requires 1 cycle after the pipeline is full. Thus, we need (5+7-1) cycles.

CPI = 11/7 = 1.57

  • Example 5.

    lw $t1, 4($t0) # loads B

    lw $t2, 12($t0) # loads E

    add $t3, $t1, $t2 # A = B + E

    sw $t3, 0($t0) # stores A

    lw $t4, 16($t0) # loads F

    add $t5, $t1, $t4 # C = B + F

    sw $t5, 8($t0) # stores C

Load-use data hazard (add $t3 needs $t2 from the preceding lw): adds 1 stall cycle

Load-use data hazard (add $t5 needs $t4 from the preceding lw): adds 1 stall cycle

Thus, 13 cycles are needed.

CPI = 13/7 = 1.86 ??

Performance ??


Data Hazards

  • Example 5. Reducing stalls by instruction reordering

    lw $t1, 4($t0) # loads B

    lw $t2, 12($t0) # loads E

    lw $t4, 16($t0) # loads F

    add $t3, $t1, $t2 # A = B + E

    sw $t3, 0($t0) # stores A

    add $t5, $t1, $t4 # C = B + F

    sw $t5, 8($t0) # stores C

Moving the lw $t4 instruction up fills the first stall and eliminates the second one!

Thus, 11 cycles are needed.

CPI = 11/7 = 1.57
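A sketch of the Example 5 count with forwarding, assuming an ALU result can feed the very next instruction and a loaded value needs one stall (the consumer is fetched at least 2 cycles after the load). Totals include the cycles to fill the pipeline, as in the example; the function name and tuple layout are illustrative.

```python
def total_cycles_with_forwarding(program, n_stages=5):
    """Total cycles (including pipeline fill) with full forwarding.

    program: list of (opcode, destination or None, [source registers]).
    An ALU result can be used by the very next instruction; a loaded value
    only by an instruction fetched at least 2 cycles after the load.
    """
    producer = {}  # register -> (fetch cycle, produced_by_load)
    fetch = []     # fetch cycle of each instruction, 0-based
    for op, dest, sources in program:
        earliest = fetch[-1] + 1 if fetch else 0
        for src in sources:
            if src in producer:
                cycle, by_load = producer[src]
                earliest = max(earliest, cycle + (2 if by_load else 1))
        fetch.append(earliest)
        if dest is not None:
            producer[dest] = (earliest, op == "lw")
    # The last instruction leaves WB n_stages - 1 cycles after its fetch;
    # +1 converts the 0-based cycle index into a count.
    return fetch[-1] + n_stages


original = [
    ("lw", "$t1", ["$t0"]),          # lw  $t1, 4($t0)
    ("lw", "$t2", ["$t0"]),          # lw  $t2, 12($t0)
    ("add", "$t3", ["$t1", "$t2"]),  # add $t3, $t1, $t2
    ("sw", None, ["$t3", "$t0"]),    # sw  $t3, 0($t0)
    ("lw", "$t4", ["$t0"]),          # lw  $t4, 16($t0)
    ("add", "$t5", ["$t1", "$t4"]),  # add $t5, $t1, $t4
    ("sw", None, ["$t5", "$t0"]),    # sw  $t5, 8($t0)
]
# Reordered version: the third lw moves up right after the second one.
reordered = original[:2] + [original[4]] + original[2:4] + original[5:]

for name, prog in (("original", original), ("reordered", reordered)):
    cycles = total_cycles_with_forwarding(prog)
    print(f"{name}: {cycles} cycles, CPI = {cycles / len(prog):.2f}")
```

Reordering works here because the moved load has no dependence on the instructions it jumps over, which is exactly what a compiler must check before doing the same.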


Data Hazards

  • Example 6. Assume that the pipelined MIPS processor without forwarding is used to run a program with the following instruction mix: 20% loads, 20% stores, and 60% ALU instructions. Compute the average CPI given that

    • 10% of the ALU instructions result in load-use hazards.

    • 15% of the ALU instructions result in register-use (read-after-write) hazards.

  • Solution

    • Ideally, every instruction has a CPI of 1

    • With no forwarding

      • Load-use hazards add two cycles (CPI = 3)

      • Register-use hazards add two cycles (CPI = 3)

    • The remaining 75% of the ALU instructions have no hazard, so

    • Average CPI = 0.2 x 1 + 0.2 x 1 + 0.75 x 0.60 x 1 +

      0.1 x 0.60 x 3 + 0.15 x 0.60 x 3 = 1.30
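The Example 6 weighted average can be reproduced in a few lines of Python, using exactly the fractions and per-class CPIs stated above.

```python
mix = {"load": 0.20, "store": 0.20, "alu": 0.60}

# Fraction of ALU instructions -> CPI (no forwarding: each hazard adds 2 stalls)
alu_breakdown = {
    "no hazard": (0.75, 1),
    "load-use": (0.10, 3),
    "register-use": (0.15, 3),
}

cpi = mix["load"] * 1 + mix["store"] * 1
cpi += sum(mix["alu"] * frac * c for frac, c in alu_breakdown.values())
print(round(cpi, 2))  # 1.3
```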


Control Hazards

  • For the pipelined datapath designed so far, the branch address and decision are known by the end of the MEM stage

  • Instructions following the branch instruction in the pipeline are not correct if the branch evaluates to true!

  • If the branch is true, then these instructions should be removed from the pipeline and execution should continue from the branch address

  • Otherwise, no action is required!

  • This is a dependency backward in time: a control hazard


Control Hazards

[Figure: a branch followed by Inst1, Inst2, and Inst3 in the pipeline]

Solution!

Once it is known that the instruction is a branch, stall the pipeline for 3 cycles? Is it actually a stall?


Control Hazards

[Figure: beq followed by three stall cycles and then the next instructions, each passing through IM, Reg, ALU, DM, Reg]

Are these actual stalls? Why not start executing the following instructions normally and, if the branch is taken, flush them?!

Fetching from instruction memory is either from PC+4 or from the branch address, depending on the branch result


Control Hazards

  • Reducing the Cost of Branch Hazard

    • Note that three cycles are lost if the branch evaluates to true in order to remove the three instructions following the branch instruction!

    • This could affect the performance significantly!

    • Can we reduce this cost?

      • Move the branch address computation to the decode stage

      • Add additional hardware to compare the two registers in the ID stage!

      • Whenever there is a branch instruction in the ID/EX register (ID/EX.branch =1), flush the instruction in the IF/ID register.

      • The branch penalty in this case will be 1 cycle instead of 3 cycles!


Control Hazards

  • Reducing the Cost of Branch Hazard


Control Hazards

  • Reducing the Cost of Branch Hazard

  • Modifying the Hazard Detection Unit

    IF (ID/EX.Branch) then Flush IF/ID register

  • Note that we lose one cycle whenever a branch instruction is encountered!

  • Can we do any better?

[Figure: beq followed by one stall cycle and then lw, each passing through IM, Reg, ALU, DM, Reg]


Control Hazards

  • Reducing the Cost of Branch Hazard

    • Approach I – Static Branch Prediction

      • Always predict the branch as Not Taken and start fetching the instruction following the branch

      • If the branch evaluates to Not Taken, then the prediction is correct and no further actions are required!

      • If the branch evaluates to Taken, then the prediction is not correct! Remove the fetched instruction and start fetching from the branch address

      • In this approach, we only lose one cycle if the prediction is not correct

      • Inside the hazard detection unit

        IF (ID/EX.Branch) and (ID/EX.ZERO) Then Flush IF/ID register


Control Hazards

  • Reducing the Cost of Branch Hazard

    • Approach II – Dynamic Branch Prediction

      • Prediction could be Taken or Not Taken

      • If the branch is predicted as Not Taken

        • Fetch the next instruction

        • If prediction is false, flush the instruction. One cycle is lost!

      • If branch is predicted as Taken

        • Fetch the instruction from the branch address

        • If prediction is false, flush and fetch from PC+4

    • How to store branch prediction?

      • Use Branch History Table or Branch Prediction Buffer

      • The table is indexed by the lower bits of the branch instruction address

    • If branch is predicted as taken, we need to wait for the branch address to be computed?

      • Use a Branch Target Buffer
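A minimal sketch of a branch prediction buffer indexed by the low-order bits of the branch address. It assumes 1-bit entries, drops the two always-zero bits of word-aligned MIPS addresses, and stores no tags, so two branches whose low bits match share an entry (aliasing). The class name and table size are hypothetical.

```python
class BranchHistoryTable:
    """1-bit prediction buffer indexed by low-order bits of the branch PC."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.bits = [0] * (1 << index_bits)  # 0 = predict not taken

    def _index(self, pc: int) -> int:
        # Drop PC[1:0] (always 0 for word-aligned instructions), keep low bits.
        return (pc >> 2) & self.mask

    def predict_taken(self, pc: int) -> bool:
        return self.bits[self._index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        self.bits[self._index(pc)] = 1 if taken else 0


bht = BranchHistoryTable(index_bits=4)
bht.update(0x0040_0010, taken=True)
print(bht.predict_taken(0x0040_0010))  # True
print(bht.predict_taken(0x0040_0050))  # also True: same low bits (aliasing)
print(bht.predict_taken(0x0040_0014))  # False: a different entry
```

Aliasing only costs accuracy, never correctness, because a wrong prediction is caught and flushed like any other misprediction.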


Control Hazards

  • Approach II – Dynamic Branch Prediction

    • 1-bit Branch Predictor

      • Basically we have two states (Taken and Not Taken)

      • One bit is used to store the prediction

      • Prediction state is changed when prediction is wrong

      • Performance Issues

        • Consider branching in loops? EXAMPLE?


Control Hazards

  • Approach II – Dynamic Branch Prediction

    • 2-bit Branch Predictor

      • Basically we have four states

      • two bits are used to store the prediction

      • The prediction changes only after two consecutive mispredictions


Control Hazards

  • Example 7. Consider a program that has a conditional branch instruction whose actual outcomes during execution are given below.

    T-T-N-T-T-N-T

    List predictions for the following branch prediction schemes and find the prediction accuracy.

    • Predict always taken

    • Predict always not taken

    • 1-bit predictor, initialized to predict taken

    • 2-bit predictor, initialized to weakly predict taken


Control Hazards

  • Example 7.

    • Actual branch actions : T-T-N-T-T-N-T

    • Predict as always taken

      • Predictions : T-T-T-T-T-T-T

      • Accuracy = 5/7 = 71%

    • Predict as always not taken

      • Predictions : N-N-N-N-N-N-N

      • Accuracy = 2/7 = 29%

    • 1-bit predictor initialized to predict taken

      • Predictions: T-T-T-N-T-T-N

      • Accuracy = 3/7 = 43%

    • 2-bit predictor initialized to weakly predict taken

      • Predictions: T-T-T-T-T-T-T

      • Accuracy = 5/7 = 71%
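The Example 7 predictions can be checked with a short Python simulation. The 2-bit predictor is modeled as a saturating counter (an assumption); for this outcome sequence it never sees a not-taken outcome while in the weakly-taken state, so state diagrams that differ only in how they leave the weak states give the same predictions.

```python
def one_bit(outcomes: str, start: str = "T") -> str:
    """1-bit predictor: always predict the last outcome seen."""
    state, preds = start, []
    for actual in outcomes:
        preds.append(state)
        state = actual  # a 1-bit predictor simply remembers the last outcome
    return "".join(preds)


def two_bit(outcomes: str, start: int = 2) -> str:
    """2-bit saturating counter: 3 = strong T, 2 = weak T, 1 = weak N, 0 = strong N."""
    counter, preds = start, []
    for actual in outcomes:
        preds.append("T" if counter >= 2 else "N")
        counter = min(counter + 1, 3) if actual == "T" else max(counter - 1, 0)
    return "".join(preds)


def accuracy(preds: str, outcomes: str) -> float:
    return sum(p == a for p, a in zip(preds, outcomes)) / len(outcomes)


actual = "TTNTTNT"
for name, preds in [("always T", "T" * 7), ("always N", "N" * 7),
                    ("1-bit", one_bit(actual)), ("2-bit", two_bit(actual))]:
    print(f"{name:8} {'-'.join(preds)}  {accuracy(preds, actual):.0%}")
```

The 1-bit predictor does worse than always-taken here because every isolated N costs it two mispredictions: one on the N itself and one on the T that follows.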


Pipelining Performance

  • Example 8. Let’s compare the performance of single-cycle, multi-cycle, and pipelined implementations of the MIPS processor given the operation times and instruction mix below.

    For the pipelined implementation, assume that:

    1) Branch decision is done in the MEM cycle. Branch handling in the pipeline implementation is done by stalling the pipeline.

    2) Half of the load instructions incur load-use hazard.

    3) Forwarding is implemented.

    4) The jump instruction is completed in the ID stage


Pipelining Performance

  • Example 8.

  • Clock cycle time

    • Single-cycle = 200 + 50 + 100 + 50 + 200 = 600 ps

    • Multi-cycle = 200 ps

    • Pipeline = 200 ps

  • CPI

    • Single-cycle = 1

    • Multi-cycle = 5 x 0.25 + 4 x 0.52 + 4 x 0.10 + 3 x 0.11 + 3 x 0.02

      = 4.12

    • Pipeline = 0.125 x 2 + 0.125 x 1 + 0.52 x 1 + 0.1 x 1 + 0.11 x 4 + 0.02 x 2

      = 1.475

  • Execution Time per instruction

    • Single-cycle = 600 ps

    • Multi-cycle = 4.12 x 200 ps = 824 ps

    • Pipeline = 1.475 x 200 = 295 ps
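A sketch of the Example 8 arithmetic. The instruction mix (25% loads, 52% R-type, 10% stores, 11% branches, 2% jumps) and the per-class CPIs are read off the calculations above, since the original table did not survive; treat them as assumptions.

```python
mix = {"load": 0.25, "R-type": 0.52, "store": 0.10, "branch": 0.11, "jump": 0.02}
multi_cycle_cpi = {"load": 5, "R-type": 4, "store": 4, "branch": 3, "jump": 3}

# Pipeline: half the loads stall 1 cycle (load-use), branches resolved in MEM
# stall 3 cycles, jumps completed in ID stall 1 cycle.
pipeline_cpi = {"load": 0.5 * 2 + 0.5 * 1, "R-type": 1, "store": 1,
                "branch": 4, "jump": 2}


def average_cpi(per_class):
    """Weighted average CPI over the instruction mix."""
    return sum(mix[k] * per_class[k] for k in mix)


clock_ps = {"single-cycle": 600, "multi-cycle": 200, "pipeline": 200}
cpi = {"single-cycle": 1.0,
       "multi-cycle": average_cpi(multi_cycle_cpi),
       "pipeline": average_cpi(pipeline_cpi)}

for impl in clock_ps:
    time_ps = cpi[impl] * clock_ps[impl]
    print(f"{impl:12} CPI={cpi[impl]:.3f}  time/instr={time_ps:.0f} ps")
```

To explore Example 9, change `pipeline_cpi["branch"]` to reflect whatever misprediction rate and penalty you assume.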


Pipelining Performance

  • Example 9. Redo Example 8 assuming that branch prediction is employed and 1/4 of the branch instructions are mispredicted.


Exceptions & Interrupts

  • Exceptions and interrupts are unexpected events that require a change in the flow of control

  • The two terms are sometimes used interchangeably; the usage depends on the ISA

    • Intel x86 uses the term interrupt only

    • In MIPS

      • Exceptions: any internal unexpected change in the flow (undefined opcode, overflow, system calls)

      • Interrupts: the event is external (I/O controller request)

  • Dealing with them

    • Is a challenging part of processor design

    • Affects performance


Exceptions & Interrupts

  • In MIPS, when an exception occurs, the following sequence of steps is taken

    • The address of the offending instruction is saved in a special register called the Exception Program Counter (EPC).

    • The cause of the exception is saved in a special register called the Cause register.

    • Control is transferred to the operating system by loading a special address (0x8000 0180) into the PC. The code starting at this address

      • determines what actions the operating system takes in response to the exception, based on the value in the Cause register. The operating system may terminate the program or resume execution using the value in the EPC


Overflow Exception

  • Modifications to the Datapath


Fallacies

  • Fallacy 1. Pipelining is easy!

    • Not true! Hazards complicate the operation

  • Fallacy 2. Pipelining is independent of technology!

    • Why didn’t we have pipelined processors before?

    • Advanced technology allowed more transistors and thus more operations!


Reading Assignment

  • Read the following from the textbook

    • Section 4.9 – Exceptions

    • Section 4.10 – Parallelism and Advanced Instruction Level Parallelism

