EECS 470

Pipeline Hazards

Lecture 4

Coverage: Appendix A

Basic Pipelining
  • Data hazards
    • What are they?
    • How do you detect them?
    • How do you deal with them?
  • Micro-architectural changes
    • Pipeline depth
    • Pipeline width
  • Forwarding ISA
[Figure: pipelined datapath with five stages (Fetch, Decode, Execute, Memory, WB) separated by the pipeline registers IF/ID, ID/EX, EX/Mem and Mem/WB. Shown: PC and PC+1 adder, instruction memory, register file (R0-R7), branch target adder and eq? comparator, ALU, data memory, and muxes. Instruction bits 0-2 and 16-18 feed the dest mux; bits 22-24 carry the op, which is passed down the pipeline together with dest.]
[Figure: the same five-stage pipelined datapath (IF/ID, ID/EX, EX/Mem, Mem/WB), drawn without the instruction-bit labels.]
[Figure: the same five-stage pipelined datapath with forwarding (fwd) paths added.]
Pipeline function for ADD
  • Fetch: read instruction from memory
  • Decode: read source operands from the register file
  • Execute: calculate sum
  • Memory: pass the result to the next stage
  • Writeback: write sum into register file
Data Hazards

add 1 2 3
nand 3 4 5

time (cycle)   1       2        3         4         5           6
add            fetch   decode   execute   memory    writeback
nand                   fetch    decode    execute   memory      writeback

If you are not careful, nand will read the wrong value of R3.
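
To make the wrong value concrete, here is a minimal sketch in Python. It assumes the register contents that appear in the datapath figures of this deck (R1 = 14, R2 = 7, R3 = 10, R4 = 11) and models nand as a bitwise NAND on Python integers; the variable names are illustrative only.

```python
def nand(a: int, b: int) -> int:
    """Bitwise NAND with two's-complement semantics."""
    return ~(a & b)


# Register values as they appear in the deck's figures (assumed here).
regs = {1: 14, 2: 7, 3: 10, 4: 11}

stale_r3 = regs[3]               # what nand reads if it decodes before add writes back
correct_r3 = regs[1] + regs[2]   # what add 1 2 3 will eventually write into R3

wrong_result = nand(stale_r3, regs[4])
right_result = nand(correct_r3, regs[4])
print(wrong_result, right_result)
```

The two results differ, so the hazard changes what the program computes and is not only a timing issue.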

Three approaches to handling data hazards
  • Avoidance
    • Make sure there are no hazards in the code
  • Detect and Stall
    • If hazards exist, stall the processor until they go away.
  • Detect and Forward
    • If hazards exist, fix up the pipeline to get the correct value (if possible)
Handling data hazards: avoid all hazards
  • Assume the programmer (or the compiler) knows about the processor implementation.
    • Make sure no hazards exist.
      • Put noops between any dependent instructions.

add 1 2 3      (write R3 in cycle 5)
noop
noop
nand 3 4 5     (read R3 in cycle 6)
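
The padding step could be sketched as a small compiler-style pass. This is only an illustration: the tuple encoding (op, regA, regB, dest), the function name insert_noops and the default spacing of three slots (two noops between adjacent dependent instructions, as in the example above) are assumptions, and only add/nand-style instructions are modeled.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding for this sketch: (op, regA, regB, dest).
# None marks an unused field.
Instr = Tuple[str, Optional[int], Optional[int], Optional[int]]
NOOP: Instr = ("noop", None, None, None)


def insert_noops(program: List[Instr], min_distance: int = 3) -> List[Instr]:
    """Pad the program so a consumer sits at least min_distance slots after its producer."""
    out: List[Instr] = []
    for instr in program:
        _, reg_a, reg_b, _ = instr
        sources = {r for r in (reg_a, reg_b) if r is not None}
        # Only the last (min_distance - 1) emitted instructions can be too close.
        window = out[-(min_distance - 1):] if min_distance > 1 else []
        needed = 0
        for back, prev in enumerate(reversed(window), start=1):
            if prev[3] is not None and prev[3] in sources:
                needed = max(needed, min_distance - back)
        out.extend([NOOP] * needed)
        out.append(instr)
    return out


padded = insert_noops([("add", 1, 2, 3), ("nand", 3, 4, 5)])
print([op for op, *_ in padded])
```

Because min_distance is baked into the generated code, a deeper pipeline would need the program to be padded again, which is the legacy-code problem.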

Problems with this solution
  • Old programs (legacy code) may not run correctly on new implementations
    • Longer pipelines need more noops
  • Programs get larger as noops are included
    • Especially a problem for machines that try to execute more than one instruction every cycle
    • Intel EPIC: Often 25% - 40% of instructions are noops
  • Program execution is slower
    • CPI is one, but some of the instructions are noops, so less useful work is done per cycle
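
As a rough illustration of the last point, the cost per useful instruction can be computed from the noop fraction. The function name is hypothetical, and the sketch assumes every instruction, noop or not, takes exactly one cycle.

```python
def cpi_per_useful_instruction(noop_fraction: float, raw_cpi: float = 1.0) -> float:
    """Cycles per non-noop instruction when a given fraction of instructions are noops."""
    if not 0.0 <= noop_fraction < 1.0:
        raise ValueError("noop_fraction must be in [0, 1)")
    return raw_cpi / (1.0 - noop_fraction)


for fraction in (0.25, 0.40):
    print(fraction, round(cpi_per_useful_instruction(fraction), 3))
```

With 25% to 40% noops, the machine spends roughly 1.33 to 1.67 cycles per useful instruction even though the nominal CPI is one.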
Handling data hazards: detect and stall
  • Detection:
    • Compare regA with previous DestRegs
      • 3-bit operand fields
    • Compare regB with previous DestRegs
      • 3-bit operand fields
  • Stall:
    • Keep current instructions in fetch and decode
    • Pass a noop to execute
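
The detection rule can be sketched as a handful of equality checks. This is a sketch under assumptions: the function name is made up, None stands for a pipeline slot with no destination (for example a noop), and both regA and regB are treated as sources for every instruction.

```python
from typing import Iterable, Optional


def hazard_detected(reg_a: int, reg_b: int, previous_dests: Iterable[Optional[int]]) -> bool:
    """Return True if either source register matches a destination still in flight."""
    for dest in previous_dests:
        if dest is None:          # noop or instruction without a destination
            continue
        if dest == reg_a or dest == reg_b:
            return True
    return False


# nand 3 4 5 in decode while add 1 2 3 (dest 3) is one stage ahead.
print(hazard_detected(3, 4, [3, None]))
# Same nand once the add is no longer in the compared stages.
print(hazard_detected(3, 4, [None, None]))
```

When the function returns True, the stall action keeps fetch and decode unchanged and sends a noop into execute.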
[Figure: datapath at the end of cycle 1. IF/ID holds add 1 2 3. Register file contents shown: R1 = 14, R2 = 7, R3 = 10.]
[Figure: datapath at the end of cycle 2. IF/ID holds nand 3 4 5; ID/EX holds add with valA = 14, valB = 7 and dest = 3.]
[Figure: datapath in the first half of cycle 3, with a hazard detection block added. nand 3 4 5 is in decode (regA = 3) while add, with dest = 3, is in ID/EX.]
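[Figure: hazard detection logic between IF/ID and ID/EX. Comparators check regA and regB of the decoding instruction against the destination registers of earlier instructions; regA = 3 matches dest = 3, so the output is "Hazard detected".]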
[Figure: comparator detail. regA = 011 (3) is compared bit by bit with the destination field 011 (3); the fields match, so the "Hazard detected" signal is 1.]
Handling data hazards: detect and stall the pipeline until ready
  • Detection:
    • Compare regA with previous DestReg
      • 3 bit operand fields
    • Compare regB with previous DestReg
      • 3 bit operand fields
  • Stall:
    • Keep current instructions in fetch and decode
    • Pass a noop to execute

[Figure: datapath in the first half of cycle 3. The Hazard signal drives two enable (en) inputs, holding nand 3 4 5 in decode while add is in ID/EX.]
[Figure: datapath at the end of cycle 3. nand 3 4 5 is still in IF/ID, a noop has been placed in ID/EX, and add is in EX/Mem with ALU result 21.]
[Figure: datapath in the first half of cycle 4. Hazard is raised again: nand 3 4 5 is in decode, a noop is in ID/EX, and add (result 21, dest 3) is in EX/Mem.]
[Figure: datapath at the end of cycle 4. nand 3 4 5 is still in IF/ID; ID/EX and EX/Mem hold noops; add is in Mem/WB with result 21.]
[Figure: datapath in the first half of cycle 5. No hazard: add (result 21) is in Mem/WB, so nand 3 4 5 can finish decode.]
[Figure: datapath at the end of cycle 5. R3 now holds 21. IF/ID holds add 3 7 7; ID/EX holds nand with valA = 21, valB = 11 and dest = 5; EX/Mem and Mem/WB hold noops.]
No more hazard: stalling

add 1 2 3
nand 3 4 5

time (cycle)   1       2        3         4         5           6
add            fetch   decode   execute   memory    writeback
nand                   fetch    decode    decode    decode      execute
                                (hazard)  (hazard)

We are careful to get the right value of R3.
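
The stalled timeline can be reproduced with a small scheduling sketch. It assumes the behavior seen in the walkthrough: an instruction may leave decode in the same cycle in which its producer is in writeback, and later instructions wait in fetch while decode is occupied. The tuple encoding (op, regA, regB, dest) and the function name are illustrative.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, Optional[int], Optional[int], Optional[int]]  # (op, regA, regB, dest)


def stall_timeline(program: List[Instr]) -> List[List[str]]:
    """Per instruction, list the stage occupied in each cycle (index 0 is cycle 1)."""
    writeback_cycle: Dict[int, int] = {}   # register -> cycle its producer is in writeback
    rows: List[List[str]] = []
    prev_decode_start, prev_decode_end = 1, 1
    for _, reg_a, reg_b, dest in program:
        fetch_start = prev_decode_start        # fetched when the previous one enters decode
        decode_start = prev_decode_end + 1     # decodes once the previous one leaves decode
        ready = max((writeback_cycle.get(r, 0) for r in (reg_a, reg_b) if r is not None),
                    default=0)
        decode_end = max(decode_start, ready)  # stall in decode until sources are written
        row = [""] * (fetch_start - 1)
        row += ["fetch"] * (decode_start - fetch_start)
        row += ["decode"] * (decode_end - decode_start + 1)
        row += ["execute", "memory", "writeback"]
        rows.append(row)
        if dest is not None:
            writeback_cycle[dest] = decode_end + 3
        prev_decode_start, prev_decode_end = decode_start, decode_end
    return rows


for row in stall_timeline([("add", 1, 2, 3), ("nand", 3, 4, 5)]):
    print(row)
```

For the two-instruction example, nand occupies decode for three cycles, so the dependent pair costs two extra cycles compared with independent instructions.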

Problems with detect and stall
  • CPI increases every time a hazard is detected!
  • Is that necessary? Not always!
    • Re-route the result of the add to the nand
      • nand no longer needs to read R3 from reg file
      • It can get the data later (when it is ready)
      • This lets us complete the decode this cycle
        • But we need more control logic to remember that the operand was not read from the register file in decode and must be picked up from elsewhere in the pipeline in a later cycle.
Handling data hazards: detect and forward
  • Detection: same as detect and stall
    • Except that all 4 hazards are treated differently
      • i.e., you can’t logical-OR the 4 hazard signals
  • Forward:
    • New datapaths to route computed data to where it is needed
    • New Mux and control to pick the right data
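
A forwarding mux can be sketched as a priority selection over results still in flight. This is a minimal sketch, not the deck's control logic: the function name, the (dest, value) tuples and the convention that the list is ordered youngest producer first are all assumptions.

```python
from typing import Optional, Sequence, Tuple

InFlight = Optional[Tuple[int, int]]  # (dest register, value), or None for a noop


def forward_operand(src_reg: int, regfile_value: int, in_flight: Sequence[InFlight]) -> int:
    """Pick the operand from the youngest matching in-flight result, else the register file."""
    for entry in in_flight:  # ordered youngest producer first
        if entry is not None and entry[0] == src_reg:
            return entry[1]
    return regfile_value


# nand needs R3: the register file still holds 10, but add's result 21 is in flight.
print(forward_operand(3, 10, [(3, 21)]))
# No producer in flight: fall back to the register file value.
print(forward_operand(4, 11, [(3, 21), None]))
```

Each match selects a different mux input, which is one way to see why the individual hazard signals cannot simply be OR-ed together as in detect and stall.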
[Figure: forwarding datapath in the first half of cycle 3. Hazard is detected: nand 3 4 5 is in decode and add (dest 3) is in ID/EX. Register file contents shown: R1 = 14, R2 = 7, R3 = 10, R4 = 11, R5 = 77, R6 = 1, R7 = 8.]
[Figure: forwarding datapath at the end of cycle 3. IF/ID holds add 6 3 7; ID/EX holds nand, tagged H1, with the stale valA = 10 and valB = 11; EX/Mem holds add with ALU result 21.]
[Figure: forwarding datapath in the first half of cycle 4. The forwarding mux supplies 21 from EX/Mem in place of the stale R3 value for nand. A new hazard is detected for add 6 3 7 in decode.]
[Figure: forwarding datapath at the end of cycle 4. IF/ID holds lw 3 6 10; ID/EX holds add (the H1 and H2 forwarding controls are shown); EX/Mem holds nand with ALU result -2; Mem/WB holds add with result 21.]
[Figure: forwarding datapath in the first half of cycle 5. No hazard is flagged for lw 3 6 10 in decode. The ALU inputs for add 6 3 7 are 1 (R6) and the forwarded 21.]
[Figure: forwarding datapath at the end of cycle 5. R3 now holds 21. IF/ID holds sw 6 2 12; ID/EX holds lw (offset 10, dest 6); EX/Mem holds add with ALU result 22; Mem/WB holds nand with result -2.]
[Figure: forwarding datapath in the first half of cycle 6. Hazard: sw 6 2 12 in decode needs R6, which lw in ID/EX has not loaded yet, so the enable (en) signals stall fetch and decode.]
[Figure: forwarding datapath at the end of cycle 6. R5 now holds -2. sw 6 2 12 is still in IF/ID; ID/EX holds a noop; EX/Mem holds lw with address 31; Mem/WB holds add with result 22.]
[Figure: forwarding datapath in the first half of cycle 7. Hazard is flagged for sw 6 2 12 in decode (H2 is shown); the noop is in ID/EX, lw is in EX/Mem and add is in Mem/WB.]
[Figure: forwarding datapath at the end of cycle 7. R7 now holds 22. ID/EX holds sw (offset 12, tagged H3); EX/Mem holds the noop; Mem/WB holds lw with loaded data 99.]
[Figure: forwarding datapath in the first half of cycle 8. The loaded value 99 is forwarded to the ALU, which adds the offset 12 for sw.]
[Figure: forwarding datapath at the end of cycle 8. R6 now holds 99. EX/Mem holds sw with address 111 and store data 7; Mem/WB holds the noop.]
FP pipeline support

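[Figure: FP pipeline. After fetch and decode, instructions go to one of several execution units: a unit labeled "I add", a 7-stage FP multiply unit (M1-M7), a 4-stage FP adder (A1-A4), or a non-pipelined divide unit. The paths rejoin at Mem and WB.]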

Adding pipeline stages
  • Pipeline frontend
    • Fetch, Decode
  • Pipeline middle
    • Execute
  • Pipeline backend
    • Memory, Writeback
Adding stages to fetch, decode
  • Delays hazard detection
  • No change in forwarding paths
  • No performance penalty with respect to data hazards
Adding stages to execute
  • Check for structural hazards
    • ALU not pipelined
    • Multiple ALU ops completing at same time
  • Data hazards may cause delays
    • If multicycle op hasn't computed data before the dependent instruction is ready to execute
  • Performance penalty for each stall
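
The data hazard delay for a multicycle operation can be estimated with one line of arithmetic. This sketch assumes full forwarding from the end of the producer's last execute stage to the start of the consumer's first execute stage, and ignores structural hazards; the function name and parameters are illustrative.

```python
def stall_cycles(execute_latency: int, distance: int = 1) -> int:
    """Stalls for a consumer issued `distance` instructions after its producer."""
    if execute_latency < 1 or distance < 1:
        raise ValueError("execute_latency and distance must be at least 1")
    return max(0, execute_latency - distance)


print(stall_cycles(1))      # single-cycle ALU op, adjacent consumer
print(stall_cycles(4))      # 4-stage FP adder (A1-A4), adjacent consumer
print(stall_cycles(7))      # 7-stage FP multiply (M1-M7), adjacent consumer
print(stall_cycles(7, 3))   # same multiply, consumer three instructions later
```

Under these assumptions a single-cycle execute stage never stalls an adjacent consumer, while each extra execute stage adds one potential stall cycle unless independent instructions fill the gap.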
Adding stages to memory, writeback
  • Instructions ready to execute may need to wait longer for multi-cycle memory stage
  • Adds more pipeline registers
    • Thus more source registers to forward
      • More complex hazard detection
      • Wider muxes
      • More control bits to manage muxes
Wider pipelines

[Figure: two parallel pipelines, each with fetch, decode, execute, mem and WB stages.]

  • More complex hazard detection
    • 2X pipeline registers to forward from
    • 2X more instructions to check
    • 2X more destinations (muxes)
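
A back-of-the-envelope count shows how the two 2X factors multiply. The sketch assumes two source operands per instruction and two producer stages to compare against per pipeline (matching the four comparisons in the single-issue detection logic); it does not count dependence checks between instructions issued in the same cycle. The function name and parameters are hypothetical.

```python
def comparator_count(width: int, producer_stages: int = 2, sources_per_instr: int = 2) -> int:
    """Register-number comparators needed for cross-stage hazard detection."""
    consumers = width * sources_per_instr   # source fields being decoded each cycle
    producers = width * producer_stages     # in-flight destination fields to check
    return consumers * producers


for width in (1, 2, 4):
    print(width, comparator_count(width))
```

Doubling the width doubles both the fields to check and the fields to check against, so the comparator count grows with the square of the width.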

Making forwarding explicit
  • add r1  r2, EX/Mem ALU result
    • Include direct mux controls into the ISA
    • Hazard detection is now a compiler task
    • New micro-architecture leads to new ISA
    • Can reduce some resources
      • No longer need to build a heavily ported reg file
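
One way to picture explicit forwarding is an operand field that names its source directly, so the hardware does no register comparison at all. The sketch below is hypothetical: the Source enum, its members and read_operand are invented for illustration, and the example reads "add r1 r2, EX/Mem ALU result" as r1 = r2 + (EX/Mem ALU result), which is an assumption about the slide's notation.

```python
from enum import Enum
from typing import Dict, List


class Source(Enum):
    REG_FILE = "register file"
    EX_MEM_ALU_RESULT = "EX/Mem ALU result"
    MEM_WB_RESULT = "Mem/WB result"   # hypothetical extra source


def read_operand(source: Source, reg: int, reg_file: List[int],
                 latches: Dict[Source, int]) -> int:
    """The instruction itself says where the operand lives; no hazard comparison needed."""
    if source is Source.REG_FILE:
        return reg_file[reg]
    return latches[source]


reg_file = [0, 14, 7, 10, 11, 77, 1, 8]
latches = {Source.EX_MEM_ALU_RESULT: 21, Source.MEM_WB_RESULT: -2}

# add r1 <- r2, EX/Mem ALU result (register number is ignored for latch sources)
reg_file[1] = (read_operand(Source.REG_FILE, 2, reg_file, latches)
               + read_operand(Source.EX_MEM_ALU_RESULT, 0, reg_file, latches))
print(reg_file[1])
```

The compiler must know exactly which latch holds the value in that cycle, which is why a new micro-architecture leads to a new ISA.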

Ref: TTAs: Missing the ILP complexity wall