CMPE 421 Parallel Computer Architecture

Part 1: Pipeline Hazards

PowerPoint Slideshow about 'CMPE 421 Parallel Computer Architecture' - zhen


Pipelining MIPS
  • Let us examine why the pipeline cannot run at full speed
    • There are some cases where the next instruction cannot begin executing immediately
    • These limits to pipelining are known as hazards
  • What makes it hard?
    • structural hazards: different instructions, at different stages, in the pipeline want to use the same hardware resource (resource conflict)
    • control hazards:
      • the next instruction to put into the pipeline depends on the outcome of a previous branch instruction that is already in the pipeline
      • the control decision determines the execution path, such as when the instruction changes the PC
    • data hazards: an instruction in the pipeline requires data to be computed by a previous instruction still in the pipeline
  • Before actually building the pipelined datapath and control we first briefly examine these potential hazards individually…
[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0). Horizontal axis: time (2 to 14 ns); vertical axis: program execution order (in instructions). Each instruction passes through instruction fetch, Reg, ALU, data access, Reg, and a new instruction starts every 2 ns.]

Structural Hazards
  • Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline in the same clock cycle
  • E.g., suppose the four-lw pipeline in the figure had a single memory with one read port instead of separate instruction and data memories
    • then there is a structural hazard between the first and fourth lw instructions
  • MIPS was designed to be pipelined: structural hazards are easy to avoid!
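
The single-memory case can be checked with a small cycle table. This is only a sketch: it assumes one instruction enters IF per cycle with no stalls, a five-stage IF/ID/EX/MEM/WB pipeline, and a hypothetical helper name `memory_conflicts` that does not come from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def memory_conflicts(instructions):
    """Return (cycle, fetching_index, accessing_index) tuples.

    Assumption: instruction k (0-based) is in IF during cycle k + 1 and
    advances one stage per cycle. A conflict is a cycle in which a load or
    store is in MEM while another instruction is in IF, so both need the
    single shared memory.
    """
    conflicts = []
    for i, op in enumerate(instructions):
        if op not in ("lw", "sw"):
            continue  # only loads and stores use memory in MEM
        mem_cycle = i + 1 + STAGES.index("MEM")
        for j in range(len(instructions)):
            fetch_cycle = j + 1
            if fetch_cycle == mem_cycle:
                conflicts.append((mem_cycle, j, i))
    return conflicts

# The four lw instructions from the figure.
print(memory_conflicts(["lw", "lw", "lw", "lw"]))  # [(4, 3, 0)]
```

The single conflict is in cycle 4, between the fetch of the fourth lw (index 3) and the data access of the first lw (index 0), which matches the bullet above.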

Structural Hazard
  • Ex 1: Suppose we have one memory unit instead of separate instruction and data memories
  • When a load word or store word instruction reaches the MEM stage, it accesses memory in the same cycle in which a later instruction is being fetched; with a single memory, a conflict occurs

Structural Hazard
  • Consider a load followed immediately by an R-type instruction
    • The register file has only a single write port

[Figure: clock cycles 1 to 9 for the sequence R-type, R-type, Load, R-type, R-type. Each R-type instruction uses IF, RF/ID, EX, WB; the load uses IF, RF/ID, EX, MEM, WB. The diagram carries a "bubble" label.]

Structural Hazard
  • Solutions
    • Delay instruction until functional unit is ready
      • Hardware inserts a pipeline stall or a bubble that delays execution of all instructions that follow (previous instructions continue)
      • Increases CPI from the ideal value of 1
    • Build more sophisticated functional units so that all combinations of instructions can be accommodated
      • Example: Allow two simultaneous writes to the register file
Structural Hazard Solution
  • Write-back stall solution: delay the R-type register write by one cycle

[Figure: clock cycles 1 to 9 for the sequence R-type, R-type, Load, R-type, R-type. Every instruction, including the R-type instructions, now passes through IF, RF/ID, EX, MEM, WB.]
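
A minimal sketch of the write-port conflict and of the delayed-write fix, assuming one instruction is issued per cycle, R-type instructions normally finish in four stages, and loads finish in five. The function names are hypothetical and are used only for illustration.

```python
def writeback_cycles(kinds, r_type_passes_through_mem):
    """Return the WB cycle of each instruction.

    kinds: list of "R" or "load". Instruction k (0-based) is in IF during
    cycle k + 1. A load always takes 5 stages. An R-type takes 4 stages
    unless its register write is delayed by one cycle (it then also
    occupies the MEM slot and takes 5).
    """
    cycles = []
    for index, kind in enumerate(kinds):
        start = index + 1
        length = 5 if (kind == "load" or r_type_passes_through_mem) else 4
        cycles.append(start + length - 1)
    return cycles

def write_port_conflicts(cycles):
    """Return (cycle, earlier_index, later_index) for writes in the same cycle."""
    seen = {}
    conflicts = []
    for index, cycle in enumerate(cycles):
        if cycle in seen:
            conflicts.append((cycle, seen[cycle], index))
        seen[cycle] = index
    return conflicts

sequence = ["R", "R", "load", "R", "R"]
print(write_port_conflicts(writeback_cycles(sequence, False)))  # [(7, 2, 3)]
print(write_port_conflicts(writeback_cycles(sequence, True)))   # []
```

With four-stage R-type instructions, the load and the R-type behind it both reach WB in cycle 7. Once every instruction takes five stages, each cycle has at most one register write.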

Control Hazards
  • Control hazard: need to make a decision based on the result of a previous instruction still executing in pipeline
  • Solution 1: Stall the pipeline

[Figure: pipeline stall. Horizontal axis: time (2 to 16 ns); vertical axis: program execution order (in instructions). Instructions: add $4, $5, $6; beq $1, $2, 40; lw $3, 300($0). Each instruction passes through instruction fetch, Reg, ALU, data access, Reg. Interval labels: 2 ns, 2 ns, 4 ns; the delay is labeled "Pipeline stall".]

Note that the branch outcome is computed in the ID stage with added hardware (later…)

Control Hazards
  • Solution 2: Predict branch outcome
    • e.g., predict branch-not-taken

[Figure: two panels, "Prediction success" and "Prediction failure: undo (= flush) lw".]

Control Hazards
  • Solution 3: Delayed branch: always execute the sequentially next instruction, with the branch taking effect after a one-instruction delay. It is the compiler’s job to find an instruction that is independent of the branch outcome to put in the slot
    • MIPS does this, but it is an option in SPIM (Simulator -> Settings)

[Figure: delayed branch. Horizontal axis: time (2 to 14 ns); vertical axis: program execution order (in instructions). Instructions: beq $1, $2, 40; add $4, $5, $6 (delayed branch slot); lw $3, 300($0), each passing through instruction fetch, Reg, ALU, data access, Reg, with 2 ns interval labels. The beq is followed by an add that is independent of the branch outcome.]

Review: Pipelining Multiple Instructions
  • The Instructions in Figures 6-19, 6-20 and 6-21 were independent
    • None of them used the results calculated by any of the others (register numbers are different)
Data Hazards
  • Problem with starting next instruction before first is finished
    • dependencies that “go backward in time” are data hazards
Solution to Data Hazards
  • Data hazard: instruction needs data from the result of a previous instruction still executing in pipeline
  • Occur when pipeline changes the order of read/write access to operands so that the order differs from the order seen by sequentially executing instructions
  • Solution 1: Forward data if possible…
  • Solution 2: Or change the relative timing of instructions (insert stalls)
  • Data hazards are caused by several different types of dependencies

[Figure: instruction pipeline diagram for add $s0, $t0, $t1 followed by sub $t2, $s0, $t3, each passing through IF, ID, EX, MEM, WB. Horizontal axis: time (2 to 10); vertical axis: program execution order (in instructions). Shading indicates use: left half = write, right half = read. Without forwarding (blue line), data has to go back in time; with forwarding (red line), data is available in time.]

Data Hazards
  • SOLUTION 1
  • Don’t wait for the instruction to complete before trying to resolve the data hazard
  • As soon as the ALU creates the sum for the add, we can supply it as an input for the sub
  • Adding extra hardware to retrieve the missing item early from the internal resources is called forwarding or bypassing
  • Remark: a forwarding path from the output of the memory access stage of the first instruction to the input of the execution stage of the following instruction is invalid (it would go backward in time)

Data Dependency Types
  • Three classifications of data dependencies for instruction j following instruction i
    • Read after Write (RAW)
      • Instr. j tries to read an operand before instr. i writes it
    • Write after Write (WAW)
      • Instr. j tries to write an operand before instr. i writes its value
      • Since register writes only occur in WB, the pipeline we have been discussing does not have this type of dependency
    • Write after Read (WAR)
      • Instr. j tries to write a destination before it is read by instr. i
      • This also does not occur in the pipeline we have been discussing, since all reads happen early in the ID/RF stage and all writes happen late in the WB stage
  • WAW and WAR hazards occur in later, more complicated pipelines
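
The three classes can be told apart by comparing destination and source registers. The sketch below assumes each instruction is represented as a (destination, sources) pair; that representation and the name `classify_dependencies` are illustrative choices, not part of the slides.

```python
def classify_dependencies(instr_i, instr_j):
    """Classify the register dependencies of instr_j on the earlier instr_i.

    Each instruction is a (destination, sources) pair, for example
    ("$s0", ("$t0", "$t1")) for add $s0, $t0, $t1. The destination is None
    for instructions that write no register.
    """
    dest_i, sources_i = instr_i
    dest_j, sources_j = instr_j
    found = []
    if dest_i is not None and dest_i in sources_j:
        found.append("RAW")  # j reads what i writes
    if dest_i is not None and dest_i == dest_j:
        found.append("WAW")  # j writes what i writes
    if dest_j is not None and dest_j in sources_i:
        found.append("WAR")  # j writes what i reads
    return found

add = ("$s0", ("$t0", "$t1"))   # add $s0, $t0, $t1
sub = ("$t2", ("$s0", "$t3"))   # sub $t2, $s0, $t3
print(classify_dependencies(add, sub))  # ['RAW']
```

Only the RAW case becomes a hazard in the five-stage pipeline discussed here. The WAW and WAR cases are still reported because they matter in the more complicated pipelines mentioned above.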
Data Hazards
  • Forwarding may not be enough (Hybrid solution is required)
    • e.g., if an R-type instruction following a load uses the result of the load – called load-use data hazard

    • Without a stall it is impossible to provide input to the sub instruction in time
    • With a one-stage stall (solution 2), forwarding (solution 1) can get the data to the sub instruction in time

[Figure: load-use data hazard. lw $s0, 20($t1) followed by sub $t2, $s0, $t3, each passing through IF, ID, EX, MEM, WB. Horizontal axis: time (2 to 14); vertical axis: program execution order (in instructions).]

Reordering Code to Avoid Pipeline Stall (Alternative Software Solution)
  • Example (data hazard: the first sw uses $t2 immediately after it is loaded):

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)

  • Reordered code (the two sw instructions are interchanged):

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
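
The effect of the interchange can be counted with a simple model. As a simplifying assumption, any instruction that reads a register loaded by the lw immediately before it costs one stall cycle. The tuple representation and the name `load_use_stalls` are hypothetical.

```python
def load_use_stalls(program):
    """Count stalls under the assumption that an instruction which reads the
    destination of the lw directly before it needs one stall cycle.

    Each instruction is (opcode, destination, sources); destination is None
    for sw, whose sources are the stored register and the base register.
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, cur_sources = current
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls

original = [
    ("lw", "$t0", ("$t1",)),        # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),        # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),   # sw $t2, 0($t1)
    ("sw", None, ("$t0", "$t1")),   # sw $t0, 4($t1)
]
# Interchange the two stores, as in the reordered code.
reordered = [original[0], original[1], original[3], original[2]]

print(load_use_stalls(original), load_use_stalls(reordered))  # 1 0
```

Both orders leave memory in the same final state, because the two stores write different addresses and neither register changes between them.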

Revisiting Hazards
  • So far our datapath and control have ignored hazards
  • We shall revisit data hazards and control hazards and enhance our datapath and control to handle them in hardware…
Data Hazards and Forwarding
  • Problem with starting an instruction before the previous ones are finished:
    • data dependencies that go backward in time are called data hazards
  • $2 = 10 before sub; $2 = -20 after sub

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Software Solution
  • Have the compiler guarantee that there are never any data hazards!
    • by rearranging instructions to insert independent instructions between instructions that would otherwise have a data hazard between them,
    • or, if such rearrangement is not possible, by inserting nops
  • Such compiler solutions may not always be possible, and nops slow the machine down

sub $2, $1, $3
nop
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

or

sub $2, $1, $3
lw $10, 40($3)
slt $5, $6, $7
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

MIPS: nop = “no operation” = 00…0 (32 bits) = sll $0, $0, 0
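
The nop-insertion variant can be sketched as a small pass over the instruction list. It assumes a pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three slots after its producer. The tuple format and the name `insert_nops` are illustrative only.

```python
NOP = ("nop", None, ())

def insert_nops(program, min_distance=3):
    """Insert nops so each consumer is at least min_distance slots after
    the instruction that produces one of its source registers.

    Each instruction is (text, destination, sources). The default of 3 is
    an assumption about a five-stage pipeline without forwarding.
    """
    scheduled = []
    for text, dest, sources in program:
        needed = 0
        # Only the last (min_distance - 1) scheduled slots can be too close.
        for back in range(1, min_distance):
            if back > len(scheduled):
                break
            prev_dest = scheduled[-back][1]
            if prev_dest not in (None, "$0") and prev_dest in sources:
                needed = max(needed, min_distance - back)
        scheduled.extend([NOP] * needed)
        scheduled.append((text, dest, sources))
    return scheduled

program = [
    ("sub $2, $1, $3", "$2", ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2", "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2", "$2")),
    ("sw $15, 100($2)", None, ("$15", "$2")),
]
for text, _, _ in insert_nops(program):
    print(text)
```

On this input the pass places two nops directly after the sub and leaves the rest of the sequence unchanged, which agrees with the first listing above.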

How About Register File Access?
  • Fix the register file access hazard by doing register writes in the first half of the cycle and register reads in the second half

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg per instruction) over time (clock cycles), in instruction order: "add $1,"; Inst 1; Inst 2; "add $2,$1,". One clock edge controls register writing; the other controls loading of the pipeline state registers.]

Register Usage Can Cause Data Hazards
  • Dependencies backward in time cause hazards
  • Read-before-write data hazard

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg per instruction) for the sequence below.]

add $1,
sub $4,$1,$5
and $6,$1,$7
or $8,$1,$9
xor $4,$1,$5

Loads Can Cause Data Hazards
  • Dependencies backward in time cause hazards
  • Load-use data hazard

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg per instruction) for the sequence below.]

lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
or $8,$1,$9
xor $4,$1,$5

One Way to “Fix” a Data Hazard
  • Can fix a data hazard by waiting (stall), but this impacts CPI

[Figure: pipeline diagram in instruction order: "add $1,"; stall; stall; sub $4,$1,$5; and $6,$1,$7.]

Another Way to “Fix” a Data Hazard
  • Fix data hazards by forwarding results as soon as they are available to where they are needed
  • Forwarding paths are valid only if the destination stage is later in time than the source stage
  • Forwarding is harder if there are multiple results to forward per instruction or if an instruction needs to write a result early in the pipeline

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg per instruction) for the sequence below.]

add $1,
sub $4,$1,$5
and $6,$1,$7
or $8,$1,$9
xor $4,$1,$5

Forwarding with Load-use Data Hazards
  • Will still need one stall cycle even with forwarding

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg per instruction) for the sequence below.]

lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
or $8,$1,$9
xor $4,$1,$5
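
The stall counts on the last few slides can be summarized in one function. This is a sketch under stated assumptions: a five-stage pipeline, register writes in the first half of a cycle and reads in the second half, ALU results forwardable after EX, and load data forwardable after MEM. The name `stalls_needed` is hypothetical.

```python
def stalls_needed(producer_op, distance, forwarding):
    """Stall cycles for a consumer placed `distance` slots after its producer.

    Assumed minimum distances:
      no forwarding       -> 3 (consumer's ID shares a cycle with producer's WB)
      forwarding, ALU op  -> 1 (result is ready after EX)
      forwarding, lw      -> 2 (data is ready only after MEM)
    """
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    if not forwarding:
        required = 3
    elif producer_op == "lw":
        required = 2
    else:
        required = 1
    return max(0, required - distance)

print(stalls_needed("add", 1, forwarding=False))  # 2
print(stalls_needed("add", 1, forwarding=True))   # 0
print(stalls_needed("lw", 1, forwarding=True))    # 1
```

The three printed cases correspond to the two-stall fix, the forwarding fix, and the load-use case that still needs one stall cycle.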

Branch Instructions Cause Control Hazards
  • Dependencies backward in time cause hazards

[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg per instruction) in instruction order: beq; lw; Inst 3; Inst 4.]

One Way to “Fix” a Control Hazard
  • Another “solution” is to put in enough extra hardware so that we can test registers, calculate the branch address, and update the PC during the second stage of the pipeline. That would reduce the number of stalls to only one.
  • A third approach is to use prediction to handle branches, e.g., always predict that branches will be untaken. When the prediction is right, the pipeline proceeds at full speed. When it is wrong, the pipeline has to stall (and make sure that nothing that shouldn’t complete does complete and change the machine state).
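
The CPI impact of these options can be compared with a back-of-the-envelope model. The branch fraction (20%) and taken fraction (40%) below are hypothetical values chosen only for illustration, and the model assumes an ideal CPI of 1 with no other hazards.

```python
import math

def cpi_with_branches(branch_fraction, penalty_cycles, penalized_fraction=1.0):
    """Effective CPI = 1 + (fraction of branches) x (fraction of those
    branches that pay the penalty) x (penalty in stall cycles)."""
    return 1.0 + branch_fraction * penalized_fraction * penalty_cycles

BRANCH_FRACTION = 0.20  # hypothetical: 20% of instructions are branches
TAKEN_FRACTION = 0.40   # hypothetical: 40% of branches are taken

# Always stall, three stall cycles per branch.
always_stall = cpi_with_branches(BRANCH_FRACTION, 3)
# Extra hardware resolves the branch in the second stage: one stall.
early_resolve = cpi_with_branches(BRANCH_FRACTION, 1)
# Predict untaken with early resolution: only taken branches pay one cycle.
predict_untaken = cpi_with_branches(BRANCH_FRACTION, 1, TAKEN_FRACTION)

print(round(always_stall, 2), round(early_resolve, 2), round(predict_untaken, 2))
```

Under these assumed numbers, each step reduces the CPI further. The actual benefit depends on the real branch frequency and on how often the prediction is right.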
One Way to “Fix” a Control Hazard
  • Fix a branch hazard by waiting (stall), but this affects CPI

[Figure: pipeline diagram in instruction order: beq; stall; stall; stall; lw; Inst 3.]