Recap (Pipelining) - PowerPoint PPT Presentation

Presentation Transcript


Recap (Pipelining)


What is Pipelining?

  • A way of speeding up execution of tasks

  • Key idea: overlap execution of multiple tasks


Automobile Manufacturing

1. Build frame. 60 min.

2. Add engine. 50 min.

3. Build body. 80 min.

4. Paint. 40 min.

5. Finish. 45 min.

Total: 275 min.

Latency: Time from start to finish for one car.

275 minutes per car.

(smaller is better)

Throughput: Number of finished cars per time unit.

1 car/275 min = 0.218 cars/hour

(larger is better)

Issues: How can we make the process better by adding?


An Assembly line

Figure: assembly-line timing diagram for cars 1 to 4 against time. Each stage gets an 80 min slot; the stage work times are 60, 50, 80, 40 and 45 min.

Last two stages only receive one car/80 min to work on.

Latency: 400 min/car

Throughput: 4 cars/640 min (1 car/160 min)

Will approach 1 car/80 min as time goes on

First two stages can't produce faster than one car/80 min or a backlog will occur at third stage.
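The latency and throughput figures for the car example can be reproduced with a short sketch. It assumes, as the slide does, that the line starts empty and that every stage advances in lockstep at the pace of the slowest stage; the function names are illustrative only.

```python
# Stage work times from the automobile example, in minutes.
STAGE_MINUTES = [60, 50, 80, 40, 45]


def unpipelined_latency(stages):
    """One car is built start to finish before the next one begins."""
    return sum(stages)


def line_slot(stages):
    """Assumption: every stage advances at the pace of the slowest stage."""
    return max(stages)


def line_latency(stages):
    """Time for one car to pass through all stages of the line."""
    return len(stages) * line_slot(stages)


def line_finish_time(stages, n_cars):
    """Minutes until the n-th car leaves a line that started empty."""
    return (len(stages) + n_cars - 1) * line_slot(stages)


if __name__ == "__main__":
    print("unpipelined latency:", unpipelined_latency(STAGE_MINUTES), "min")
    print("assembly-line latency:", line_latency(STAGE_MINUTES), "min")
    for n in (4, 40, 400):
        minutes = line_finish_time(STAGE_MINUTES, n)
        # Minutes per car falls towards the 80 min slot as n grows.
        print(n, "cars:", minutes, "min,", round(minutes / n, 1), "min/car")
```

With 4 cars the sketch gives 640 min in total, i.e. 160 min per car, and the per-car figure moves towards 80 min as more cars are built.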


Pipelining a Digital System

  • Key idea: break big computation up into pieces

  • Separate each piece with a pipeline register

Figure: a 1ns computation split into five 200ps pieces, each followed by a pipeline register.


Pipelining a Digital System

Non-pipelined (one 1ns block): 1 operation finishes every 1ns

Pipelined (five 200ps stages): 1 operation finishes every 200ps

  • Why do this? Because it's faster for repeated computations
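Why the gain only shows up for repeated computations can be seen in a small sketch. It uses the slide's idealized numbers (one 1ns block versus five 200ps stages) and ignores pipeline-register overhead; both function names are made up for this illustration.

```python
def time_for_ops(n_ops, n_stages, stage_ps):
    """Picoseconds until n_ops operations have left a pipeline that started empty.

    The first result appears after n_stages steps; each later result
    follows one step behind the previous one.
    """
    return (n_stages + n_ops - 1) * stage_ps


def speedup(n_ops):
    """Non-pipelined time divided by pipelined time for n_ops operations."""
    non_pipelined = time_for_ops(n_ops, n_stages=1, stage_ps=1000)
    pipelined = time_for_ops(n_ops, n_stages=5, stage_ps=200)
    return non_pipelined / pipelined


if __name__ == "__main__":
    for n in (1, 5, 50, 1000):
        print(n, "operations: speedup", round(speedup(n), 2))
```

A single operation sees no speedup at all, while a long run of operations approaches the ideal factor of 5.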


Comments about pipelining

  • Pipelining increases throughput, but does not reduce latency

    • Answer available every 200ps, BUT

    • A single computation still takes 1ns

  • Limitations:

    • Computations must be divisible into stages of equal sizes

    • Pipeline registers add overhead


Another Example

Unpipelined System

Figure: 30ns of combinational logic followed by a 3ns register (REG), driven by the clock.

  • One operation must complete before next can begin

  • Operations spaced 33ns apart

Delay = 33ns

Throughput = 30MHz

Timing diagram: Op1, Op2 and Op3 run one after another along the time axis.


3 Stage Pipelining

Figure: three stages, each with 10ns of combinational logic followed by a 3ns register (REG), driven by a common clock.

  • Space operations 13ns apart

  • 3 operations occur simultaneously

Delay = 39ns

Throughput = 77MHz

Timing diagram: Op1 to Op4 overlap along the time axis.
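The delay and throughput numbers of the unpipelined and 3-stage systems follow from the same arithmetic, sketched below. The sketch assumes the clock period is the slowest stage's logic delay plus the register delay; `pipeline_timing` is an illustrative name, not something from the slides.

```python
def pipeline_timing(logic_ns, reg_ns):
    """Return (clock period in ns, total delay in ns, throughput in MHz).

    logic_ns lists the combinational-logic delay of each stage;
    reg_ns is the delay of the register that ends each stage.
    """
    period = max(logic_ns) + reg_ns      # the slowest stage sets the clock
    delay = period * len(logic_ns)       # one operation passes every stage
    throughput_mhz = 1000.0 / period     # one result per clock period
    return period, delay, throughput_mhz


if __name__ == "__main__":
    for name, stages in (("unpipelined", [30]), ("3-stage", [10, 10, 10])):
        period, delay, mhz = pipeline_timing(stages, reg_ns=3)
        print(f"{name}: period {period}ns, delay {delay}ns, {mhz:.1f}MHz")
```

The unpipelined system comes out at 33ns and about 30.3MHz, the 3-stage version at 39ns and about 76.9MHz, which the slides round to 30MHz and 77MHz.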


Limitation: Nonuniform Pipelining

Figure: three stages with 5ns, 15ns and 10ns of combinational logic, each followed by a 3ns register (REG), driven by a common clock.

  • Throughput limited by slowest stage

    • Delay determined by clock period * number of stages

  • Must attempt to balance stages

Delay = 18 * 3 = 54ns

Throughput = 55MHz
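How much an unbalanced split costs can be made concrete by computing the idle time in each stage. This is a rough sketch under the same timing assumption as the slide (clock period = slowest logic delay + 3ns register delay); `stage_slack` and `summarize` are hypothetical helper names.

```python
REG_NS = 3  # register delay per stage, taken from the slide


def stage_slack(logic_ns, reg_ns=REG_NS):
    """Return (clock period, idle time per stage) for the given logic delays."""
    period = max(logic_ns) + reg_ns
    slack = [period - (d + reg_ns) for d in logic_ns]
    return period, slack


def summarize(logic_ns):
    """Return (period ns, delay ns, throughput MHz rounded to 0.1, slack per stage)."""
    period, slack = stage_slack(logic_ns)
    delay = period * len(logic_ns)
    throughput_mhz = 1000 / period
    return period, delay, round(throughput_mhz, 1), slack


if __name__ == "__main__":
    print(summarize([5, 15, 10]))   # nonuniform split from the slide
    print(summarize([10, 10, 10]))  # same 30ns of logic, balanced
```

In the nonuniform case the 5ns and 10ns stages sit idle for 10ns and 5ns of every 18ns cycle, which is the time a balanced split would recover.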


Limitation: Deep Pipelines

Figure: six stages, each with 5ns of combinational logic followed by a 3ns register (REG), driven by a common clock.

  • Diminishing returns as more pipeline stages are added

  • Register delays become limiting factor

    • Increased latency

    • Small throughput gains

    • More hazards

Delay = 48ns, Throughput = 125MHz
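The diminishing returns can be illustrated by sweeping the pipeline depth. The sketch below assumes the 30ns of combinational logic from the earlier example can be cut into k perfectly equal pieces, each followed by a 3ns register; real logic rarely divides that evenly, so treat the numbers as a best case.

```python
TOTAL_LOGIC_NS = 30.0  # combinational logic of the unpipelined system
REG_NS = 3.0           # delay added by each pipeline register


def deep_pipeline(k):
    """Return (clock period ns, total delay ns, throughput MHz) for k equal stages."""
    period = TOTAL_LOGIC_NS / k + REG_NS
    delay = period * k
    throughput_mhz = 1000.0 / period
    return period, delay, throughput_mhz


if __name__ == "__main__":
    for k in (1, 3, 6, 30):
        period, delay, mhz = deep_pipeline(k)
        print(f"{k:2d} stages: period {period:.1f}ns, delay {delay:.1f}ns, {mhz:.1f}MHz")
    # However deep the pipeline, the period never drops below REG_NS.
    print(f"upper bound: {1000.0 / REG_NS:.1f}MHz")
```

Six stages give an 8ns clock and a 48ns delay. Going from 3 to 6 stages doubles the stage count but raises throughput by well under a factor of two, while the delay keeps growing.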


MIPS Pipelining


MIPS 5-stage pipeline

  • The MIPS processor needs 5 stages to execute instructions

  • Pipelining stages:

    • IF - Instruction Fetch

    • ID - Instruction Decode

    • EX - Execute / Address Calculation

    • MEM - Memory Access (read / write)

    • WB - Write Back (results into register file)

  • Not all instructions need all the stages (e.g., add instruction does not need the MEM stage)


Basic MIPS Pipelined Processor

Pipeline Registers: IF/ID, ID/EX, EX/MEM, MEM/WB


Pipelined Example - Executing Multiple Instructions

Consider the following instruction sequence:

lw $r0, 10($r1)

sw $r3, 20($r4)

add $r5, $r6, $r7

sub $r8, $r9, $r10


Executing Multiple Instructions: Clock Cycle 1

IF: LW


Executing Multiple Instructions: Clock Cycle 2

IF: SW, ID: LW


Executing Multiple Instructions: Clock Cycle 3

IF: ADD, ID: SW, EX: LW


Executing Multiple Instructions: Clock Cycle 4

IF: SUB, ID: ADD, EX: SW, MEM: LW


Executing Multiple Instructions: Clock Cycle 5

ID: SUB, EX: ADD, MEM: SW, WB: LW


Executing Multiple Instructions: Clock Cycle 6

EX: SUB, MEM: ADD, WB: SW


Executing Multiple Instructions: Clock Cycle 7

MEM: SUB, WB: ADD


Executing Multiple Instructions: Clock Cycle 8

WB: SUB


Alternative View - Multicycle Diagram
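A multicycle diagram for the four-instruction sequence can be generated with a few lines of code. This is only a sketch: it assumes no hazards or stalls, so instruction i enters IF in clock cycle i + 1 and moves one stage per cycle, and it draws every instruction in all five stages. The names `stage_in_cycle` and `multicycle_diagram` are invented for the example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = ["lw", "sw", "add", "sub"]


def stage_in_cycle(instr_index, cycle):
    """Stage held by instruction instr_index (0-based) in a 1-based clock cycle.

    Returns None when the instruction is not in the pipeline in that cycle.
    Assumption: no stalls, one new instruction fetched per cycle.
    """
    stage = cycle - 1 - instr_index
    return STAGES[stage] if 0 <= stage < len(STAGES) else None


def multicycle_diagram(program):
    """Return (total cycles, [(instruction, one cell per cycle), ...])."""
    total_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(program):
        cells = [stage_in_cycle(i, c) or "" for c in range(1, total_cycles + 1)]
        rows.append((name, cells))
    return total_cycles, rows


if __name__ == "__main__":
    total, rows = multicycle_diagram(PROGRAM)
    # Header row: one column per clock cycle.
    print("      " + "".join(f"{'CC' + str(c):<5}" for c in range(1, total + 1)))
    for name, cells in rows:
        print(f"{name:<6}" + "".join(f"{cell:<5}" for cell in cells))
```

Each row is one instruction and each column one clock cycle, so reading down a column gives the same picture as the per-cycle slides: four instructions need 4 + 5 - 1 = 8 cycles.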


Processor Pipelining

  • There are two ways that pipelining can help:

    • Reduce the clock cycle time, and keep the same CPI

    • Reduce the CPI, and keep the same clock cycle time

      CPU time = Instruction count * CPI * Clock cycle time


Reduce the clock cycle time, and keep the same CPI

CPI = 1

Clock = X Hz


Reduce the clock cycle time, and keep the same CPI

Figure: MIPS datapath (PC, instruction memory, register file, ALU, data memory, sign extension, multiplexers) with pipeline registers between the stages.

CPI = 1

Clock = X*5 Hz


Reduce the CPI, and keep the same cycle time

CPI = 5

Clock = X*5 Hz


Reduce the CPI, and keep the same cycle time

Figure: MIPS datapath (PC, instruction memory, register file, ALU, data memory, sign extension, multiplexers) with pipeline registers between the stages.

CPI = 1

Clock = X*5 Hz
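Both routes to the pipelined design can be checked against the CPU time formula. In the sketch below the labels "single-cycle" and "multicycle" for the two starting points are an interpretation of the slides, and the instruction count and base clock X are made-up values chosen only to give concrete numbers.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time = instruction count * CPI * clock cycle time (1 / clock rate)."""
    return instruction_count * cpi / clock_hz


if __name__ == "__main__":
    N = 1_000_000      # hypothetical instruction count
    X = 100_000_000    # hypothetical base clock rate in Hz

    designs = {
        "single-cycle (CPI = 1, X Hz)": cpu_time(N, 1, X),
        "multicycle (CPI = 5, 5X Hz)": cpu_time(N, 5, 5 * X),
        "pipelined (CPI = 1, 5X Hz)": cpu_time(N, 1, 5 * X),
    }
    pipelined = designs["pipelined (CPI = 1, 5X Hz)"]
    for name, seconds in designs.items():
        print(f"{name}: {seconds * 1e3:.1f} ms, {seconds / pipelined:.1f}x pipelined time")
```

Under these idealized figures both starting points take five times as long as the pipelined design, whichever term of the formula the pipeline is seen as improving.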


Pipeline performance

  • Ideally we get a speedup (by reducing clock cycle or reducing the CPI) equal to the number of stages.

  • In practice, we do not achieve that – but we get close:

    • Pipelining has additional overhead (e.g., pipeline registers)

    • Pipeline hazards


Pipeline Hazards

  • Hazards are situations in pipelining which prevent the next instruction in the instruction stream from executing during the designated clock cycle.

  • Hazards reduce the speedup gained from pipelining below the ideal (e.g., they push the CPI above 1) and are classified into three classes:

    • Structural hazards

    • Data hazards

    • Control hazards
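The effect of hazards on speedup is often estimated from the average number of stall cycles per instruction. The sketch below uses the common textbook approximation speedup = pipeline depth / (1 + stall cycles per instruction), which assumes an ideal CPI of 1, balanced stages and no pipeline-register overhead; the stall rates are hypothetical values, not measurements.

```python
def effective_cpi(stall_cycles_per_instr, ideal_cpi=1.0):
    """Average CPI once hazard stalls are added to the ideal CPI."""
    return ideal_cpi + stall_cycles_per_instr


def pipeline_speedup(depth, stall_cycles_per_instr):
    """Approximate speedup over an unpipelined design.

    Assumes ideal CPI = 1, balanced stages and no register overhead.
    """
    return depth / effective_cpi(stall_cycles_per_instr)


if __name__ == "__main__":
    for stalls in (0.0, 0.25, 0.5, 1.0):  # hypothetical stall rates
        print(f"{stalls:.2f} stalls/instr: CPI {effective_cpi(stalls):.2f}, "
              f"speedup {pipeline_speedup(5, stalls):.2f}")
```

With no stalls a 5-stage pipeline reaches the ideal speedup of 5; an average of one stall cycle per instruction halves it.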

