Lecture 9: Dynamic Scheduling of Pipeline

CS510 Computer Architectures


Static vs Dynamic Scheduling

  • Static Scheduling by compiler

    • Code motion for LD delay slots and branch delay slots

    • Code motion for avoiding data dependency

    • In-order instruction issue:

      • If an instruction is stalled, no later instructions can proceed.

      • Multiple copies of a functional unit may sit idle, which is inefficient

  • Dynamic Scheduling by Hardware

    • Allows out-of-order execution and out-of-order completion

    • Even when an instruction is stalled, later instructions that have no data dependence on the stalled instruction can proceed

    • Makes efficient use of multiple functional units

HW Schemes: Instruction Parallelism

  • Why scheduling in HW at run time?

    • Works when dependencies are unknown at compile time

    • Simpler compiler

    • Code for one machine runs well on another

  • Key idea: Allow instructions behind stall to proceed

    DIVD F0,F2,F4

    ADDD F10,F0,F8

    SUBD F8,F8,F14

    In DLX, SUBD cannot execute, even if a separate adder is available, because DLX maintains in-order execution.

    • Enables out-of-order execution => out-of-order completion

    • DLX ID stage: checks for both structural hazards and data dependencies

HW Schemes: Instruction Parallelism

  • Out-of-order execution divides the ID stage in two:

    1. Issue - Decode instructions, check for structural hazards

    2. Read operands - Wait until no data hazards, then read operands

  • Scoreboards (Control Data Corp. CDC 6600) allow an instruction to execute whenever 1 & 2 hold, without waiting for prior instructions

    • Centralized implementation of hazard detection and resolution

    • Every instruction goes through the scoreboard

    • Scoreboard determines when an instruction can read its operands and begin execution

    • Scoreboard monitors every change in the hardware to determine when each instruction can execute

Scoreboard Implications

  • Out-of-order completion => WAR, WAW hazards?

    WAR:  ADDD R1,R2,R3        WAW:  ADDD R1,R2,R3
          LD   R2,X                  LD   R1,X

  • Solutions for WAR

    • Queue both the operation and copies of its operands

    • Read registers only during the Read Operands stage

  • For WAW: stall until the other instruction completes

  • Need to have multiple instructions in the execution phase => multiple execution units or pipelined execution units (superpipeline)

  • Scoreboard keeps track of dependencies and the state of operations

  • Scoreboard replaces ID, EX, WB with 4 stages
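The WAR and WAW cases above can be told apart mechanically by comparing register names. The following is a minimal sketch, not part of the original slides; the `Instr` tuple and the `hazards` function are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


def hazards(first: Instr, second: Instr) -> set:
    """Hazards that arise if the later `second` is reordered around `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")  # second reads the register that first writes
    if second.dest in first.srcs:
        found.add("WAR")  # second overwrites a source that first still reads
    if second.dest == first.dest:
        found.add("WAW")  # both write the same register
    return found


addd = Instr("ADDD", "R1", ("R2", "R3"))
print(hazards(addd, Instr("LD", "R2", ("X",))))  # {'WAR'}
print(hazards(addd, Instr("LD", "R1", ("X",))))  # {'WAW'}
```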

4 Stages of Scoreboard Control: 1st Stage (ID1) - Issue

  • Decode instructions and check for structural hazards

  • If the functional unit for the instruction is free (no structural hazard) and no other active instruction has the same destination register (no WAW hazard)

    • Scoreboard issues the instruction to the functional unit

    • Updates its internal data structure

  • If a structural hazard or WAW hazard exists

    • Stall instruction issue

    • No further instructions issue until the hazards are cleared

    • IF/ID1 buffer allows further instruction fetch (IF)
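As a rough illustration of the issue test described above, the sketch below checks the two conditions separately. The dictionaries `busy` and `result` and the function name `can_issue` are assumptions made for this example, not structures named in the slides.

```python
def can_issue(fu: str, dest: str, busy: dict, result: dict) -> bool:
    """Issue only if the unit is free and no active instruction targets `dest`."""
    structural_hazard = busy.get(fu, False)
    waw_hazard = result.get(dest) is not None
    return not structural_hazard and not waw_hazard


busy = {"Integer": True, "Add": False}   # the Integer unit is executing a load
result = {"F6": "Integer"}               # F6 will be written by the Integer unit

print(can_issue("Integer", "F2", busy, result))  # False: structural hazard
print(can_issue("Add", "F6", busy, result))      # False: WAW hazard on F6
print(can_issue("Add", "F8", busy, result))      # True
```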

4 Stages of Scoreboard Control: 2nd Stage (ID2) - Read Operands

  • Wait until no data hazard, then read operands

  • To prevent RAW hazards, a source operand is available for reading only if

    • no earlier-issued active instruction is going to write it, that is,

    • the register containing the operand is not being written by any currently active functional unit

  • When the source operands are available

    • Scoreboard tells the functional unit to read them and begin execution

    • Scoreboard resolves RAW hazards dynamically

  • => out-of-order execution

4 Stages of Scoreboard Control: 3rd Stage (EX) - Execution

  • Operates on operands

    • Functional unit begins execution upon receiving operands

    • When the result is ready, the functional unit notifies the scoreboard that execution is complete

4 Stages of Scoreboard Control: 4th Stage (WB) - Write Result

  • Finish execution

  • When the scoreboard knows the functional unit has completed execution

    • Scoreboard checks for WAR hazards: if there is none, it writes the result; if there is a WAR hazard, it stalls the instruction

    • Example:

      DIVD F0,F2,F4

      ADDD F10,F0,F8

      SUBD F8,F8,F14

    • The CDC 6600 scoreboard would stall SUBD until ADDD reads its operands
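The SUBD/ADDD example above can be expressed as a small predicate over the other active units. This is a sketch only: each unit is modelled as a plain dictionary with the fields Fj, Fk, Rj, Rk, and the name `can_write` is hypothetical.

```python
def can_write(dest: str, other_units: list) -> bool:
    """WAR test: no other unit may still need to read the old value of `dest`."""
    return all(
        (u["Fj"] != dest or not u["Rj"]) and (u["Fk"] != dest or not u["Rk"])
        for u in other_units
    )


# ADDD F10,F0,F8 is waiting for F0 (from DIVD); F8 is ready but not yet read.
addd = {"Fj": "F0", "Fk": "F8", "Rj": False, "Rk": True}
print(can_write("F8", [addd]))   # False: SUBD must not overwrite F8 yet

addd["Rj"] = addd["Rk"] = False  # ADDD has now read its operands
print(can_write("F8", [addd]))   # True: SUBD may write F8
```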

3 Parts of the Scoreboard

1. Instruction status - Indicates which of the 4 steps (Issue, Read Operands, Execution Complete, Write Result) the instruction is in

2. Functional unit status - Indicates the state of the functional unit (FU). 9 fields for each functional unit:

  • Busy: Indicates whether the unit is busy or not

  • Op: Operation to perform in the unit (e.g., + or -)

  • Fi: Destination register number

  • Fj, Fk: Source register numbers

  • Qj, Qk: Functional units producing source registers Fj, Fk

  • Rj, Rk: Flags indicating when Fj, Fk are ready

3. Register result status - Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register

Scoreboard Pipeline Control

For each instruction status: the condition to wait for, then the bookkeeping performed. FU is the unit used by the instruction, D its destination register, S1 and S2 its source registers; f ranges over all functional units.

  • Issue

    • Wait until: not Busy(FU) and not Result(D) (the second term is the WAW check for the same destination register)

    • Bookkeeping: Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← D; Fj(FU) ← S1; Fk(FU) ← S2; Qj ← Result(S1); Qk ← Result(S2); Rj ← not Qj; Rk ← not Qk; Result(D) ← FU

  • Read operands

    • Wait until: Rj and Rk

    • Bookkeeping: Rj ← No; Rk ← No; Qj ← 0; Qk ← 0

  • Execution complete

    • Wait until: functional unit done

  • Write result

    • Wait until (WAR check): ∀f ((Fj(f) ≠ Fi(FU) or Rj(f) = No) and (Fk(f) ≠ Fi(FU) or Rk(f) = No))

    • Bookkeeping: ∀f (if Qj(f) = FU then Rj(f) ← Yes); ∀f (if Qk(f) = FU then Rk(f) ← Yes); Result(Fi(FU)) ← 0; Busy(FU) ← No
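The bookkeeping column of the pipeline-control table can be sketched as three state-update methods. This is an illustrative model under stated assumptions: the class names `FU` and `Scoreboard` are hypothetical, `None` stands for the table's 0 or blank, the wait conditions are not checked here, and resetting the whole row on write is a simplification (the table only sets Busy to No).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FU:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # units producing Fj, Fk (None = no producer)
    qk: Optional[str] = None
    rj: bool = False           # operand ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, unit_names):
        self.fu = {name: FU() for name in unit_names}
        self.result = {}       # register -> unit that will write it

    def issue(self, name, op, d, s1, s2):
        qj, qk = self.result.get(s1), self.result.get(s2)
        self.fu[name] = FU(True, op, d, s1, s2, qj, qk, qj is None, qk is None)
        self.result[d] = name

    def read_operands(self, name):
        u = self.fu[name]
        u.rj = u.rk = False
        u.qj = u.qk = None

    def write_result(self, name):
        for other in self.fu.values():
            if other.qj == name:
                other.rj = True
            if other.qk == name:
                other.rk = True
        self.result.pop(self.fu[name].fi, None)
        self.fu[name] = FU()   # Busy <- No (whole row reset for simplicity)


sb = Scoreboard(["Integer", "Mult1", "Add", "Divide"])
sb.issue("Integer", "Load", "F2", None, "R3")
sb.issue("Mult1", "Mult", "F0", "F2", "F4")
print(sb.fu["Mult1"])          # Qj = Integer, Rj = False, Rk = True
sb.read_operands("Integer")
sb.write_result("Integer")
print(sb.fu["Mult1"].rj)       # True: F2 is now ready for Mult1
```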

Scoreboard Example

Instruction Status (no instruction has issued yet)

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2
  LD    F2, 45+R3
  MULT  F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Functional Unit Status

  • Fields per unit: Busy, Op, Fi (dest), Fj (S1), Fk (S2), Qj (FU for j), Qk (FU for k), Rj (Fj ready?), Rk (Fk ready?)

  • Units: Integer, Mult1, Mult2, Add, Divide; none is busy

Register Result Status (Clock = 0): no pending writes to F0, F2, ..., F30

Scoreboard Example: Cycle 1

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1
  LD    F2, 45+R3
  MULT  F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Functional Unit Status (busy units only)

  • Integer: Op = Load, Fi = F6, Fk = R2, Rk = Y

Register Result Status (Clock = 1): F6 = Integer

Scoreboard Example: Cycle 2

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2
  LD    F2, 45+R3
  MULT  F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Functional Unit Status (busy units only)

  • Integer: Op = Load, Fi = F6, Fk = R2 (operands read)

Register Result Status (Clock = 2): F6 = Integer

Scoreboard Example: Cycle 3

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3
  LD    F2, 45+R3
  MULT  F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Functional Unit Status (busy units only)

  • Integer: Op = Load, Fi = F6, Fk = R2 (execution complete)

Register Result Status (Clock = 3): F6 = Integer

Scoreboard Example: Cycle 4

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3
  MULT  F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Functional Unit Status: the load writes its result, so the Integer unit becomes free (Busy = N)

Register Result Status (Clock = 4): no pending writes

Scoreboard Example: Cycle 5

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5
  MULT  F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Functional Unit Status (busy units only)

  • Integer: Op = Load, Fi = F2, Fk = R3, Rk = Y

Register Result Status (Clock = 5): F2 = Integer

Scoreboard Example: Cycle 6

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6
  MULT  F0, F2, F4     6
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Functional Unit Status (busy units only)

  • Integer: Op = Load, Fi = F2, Fk = R3 (operands read)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = N, Rk = Y

Register Result Status (Clock = 6): F0 = Mult1, F2 = Integer

Scoreboard Example: Cycle 7

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7
  MULT  F0, F2, F4     6
  SUBD  F8, F6, F2     7
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Functional Unit Status (busy units only)

  • Integer: Op = Load, Fi = F2, Fk = R3 (execution complete)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = N, Rk = Y

  • Add: Op = Sub, Fi = F8, Fj = F6, Fk = F2, Qk = Integer, Rj = Y, Rk = N

Register Result Status (Clock = 7): F0 = Mult1, F2 = Integer, F8 = Add

Scoreboard Example: Cycle 8a (DIVD issues)

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7
  MULT  F0, F2, F4     6
  SUBD  F8, F6, F2     7
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2

Functional Unit Status (busy units only)

  • Integer: Op = Load, Fi = F2, Fk = R3 (execution complete)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = N, Rk = Y

  • Add: Op = Sub, Fi = F8, Fj = F6, Fk = F2, Qk = Integer, Rj = Y, Rk = N

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 8): F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 8b (LD F2 writes its result)

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6
  SUBD  F8, F6, F2     7
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2

Functional Unit Status (busy units only; the Integer unit is now free)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4, Rj = Y, Rk = Y

  • Add: Op = Sub, Fi = F8, Fj = F6, Fk = F2, Rj = Y, Rk = Y

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 8): F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 9

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2

Functional Unit Status (busy units only; Time = remaining execution cycles)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 10)

  • Add: Op = Sub, Fi = F8, Fj = F6, Fk = F2 (operands read, Time = 2)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 9): F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 11

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9               11
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2

Functional Unit Status (busy units only; Time = remaining execution cycles)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 8)

  • Add: Op = Sub, Fi = F8, Fj = F6, Fk = F2 (execution complete, Time = 0)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 11): F0 = Mult1, F8 = Add, F10 = Divide

Scoreboard Example: Cycle 12

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2

Functional Unit Status (busy units only; SUBD writes its result, so the Add unit becomes free)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 7)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 12): F0 = Mult1, F10 = Divide

Scoreboard Example: Cycle 13

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2     13

Functional Unit Status (busy units only)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 6)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = Y, Rk = Y

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 13): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 14

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2     13      14

Functional Unit Status (busy units only)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 5)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2 (operands read, Time = 2)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 14): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 15

Instruction Status (no new entries in this cycle)

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2     13      14

Functional Unit Status (busy units only)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 4)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2 (operands read, Time = 1)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 15): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 16

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2     13      14              16

Functional Unit Status (busy units only)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 3)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2 (execution complete, Time = 0)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 16): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 17

Instruction Status (no new entries: ADDD has finished executing but cannot write F6, because DIVD has not yet read F6, a WAR hazard)

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2     13      14              16

Functional Unit Status (busy units only)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 2)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2 (execution complete, waiting to write)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 17): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 18

Instruction Status (no new entries in this cycle)

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2     13      14              16

Functional Unit Status (busy units only)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (operands read, Time = 1)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2 (execution complete, waiting to write)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 18): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 19

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9               19
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2     13      14              16

Functional Unit Status (busy units only)

  • Mult1: Op = Mult, Fi = F0, Fj = F2, Fk = F4 (execution complete, Time = 0)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2 (execution complete, waiting to write)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = N, Rk = Y

Register Result Status (Clock = 19): F0 = Mult1, F6 = Add, F10 = Divide

Scoreboard Example: Cycle 20

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9               19                   20
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8
  ADDD  F6, F8, F2     13      14              16

Functional Unit Status (busy units only; MULT writes F0, so Mult1 becomes free)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2 (execution complete, waiting to write)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6, Rj = Y, Rk = Y

Register Result Status (Clock = 20): F6 = Add, F10 = Divide

Scoreboard Example: Cycle 21

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9               19                   20
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8       21
  ADDD  F6, F8, F2     13      14              16

Functional Unit Status (busy units only)

  • Add: Op = Add, Fi = F6, Fj = F8, Fk = F2 (execution complete, waiting to write)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6 (operands read)

Register Result Status (Clock = 21): F6 = Add, F10 = Divide

Scoreboard Example: Cycle 22

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9               19                   20
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8       21
  ADDD  F6, F8, F2     13      14              16                   22

Functional Unit Status (busy units only; DIVD has read F6, so ADDD can write F6 and the Add unit becomes free)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6 (operands read, Time = 40)

Register Result Status (Clock = 22): F10 = Divide

Scoreboard Example: Cycle 61

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9               19                   20
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8       21              61
  ADDD  F6, F8, F2     13      14              16                   22

Functional Unit Status (busy units only)

  • Divide: Op = Div, Fi = F10, Fj = F0, Fk = F6 (execution complete, Time = 0)

Register Result Status (Clock = 61): F10 = Divide

Scoreboard Example: Cycle 62

Instruction Status

  Instruction          Issue   Read Operands   Execution Complete   Write Result
  LD    F6, 34+R2      1       2               3                    4
  LD    F2, 45+R3      5       6               7                    8
  MULT  F0, F2, F4     6       9               19                   20
  SUBD  F8, F6, F2     7       9               11                   12
  DIVD  F10, F0, F6    8       21              61                   62
  ADDD  F6, F8, F2     13      14              16                   22

Functional Unit Status: DIVD writes its result; no unit is busy

Register Result Status (Clock = 62): no pending writes
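The cycle numbers in the walk-through can be reproduced with a very small timing model. The sketch below is an illustration under stated assumptions rather than the CDC 6600 logic: the latencies (load 1, add/subtract 2, multiply 10, divide 40 cycles) are inferred from the example's timing, each instruction advances at most one stage per cycle, an event becomes visible to other instructions only in the following cycle, and only Mult1 of the two multipliers is modelled. All names (`LATENCY`, `UNIT`, `PROGRAM`, `simulate`) are hypothetical.

```python
LATENCY = {"LD": 1, "MULT": 10, "SUBD": 2, "ADDD": 2, "DIVD": 40}  # inferred
UNIT = {"LD": "Integer", "MULT": "Mult1", "SUBD": "Add", "ADDD": "Add",
        "DIVD": "Divide"}

# (op, destination, FP source registers); integer address registers are ignored
PROGRAM = [
    ("LD", "F6", ()),
    ("LD", "F2", ()),
    ("MULT", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
]


def simulate(program):
    n = len(program)
    issue, read, done, write = ([None] * n for _ in range(4))

    def before(cycle, t):
        """True if the event happened in an earlier cycle than t."""
        return cycle is not None and cycle < t

    t = 0
    while any(w is None for w in write):
        t += 1
        for i, (op, dest, srcs) in enumerate(program):
            earlier = range(i)
            if issue[i] is None:
                in_order = i == 0 or before(issue[i - 1], t)
                fu_free = all(UNIT[program[j][0]] != UNIT[op]
                              or before(write[j], t) for j in earlier)
                no_waw = all(program[j][1] != dest
                             or before(write[j], t) for j in earlier)
                if in_order and fu_free and no_waw:
                    issue[i] = t
            elif read[i] is None:
                # RAW: every earlier producer of a source must have written
                if all(program[j][1] not in srcs
                       or before(write[j], t) for j in earlier):
                    read[i] = t
            elif done[i] is None:
                if t == read[i] + LATENCY[op]:
                    done[i] = t
            elif write[i] is None:
                # WAR: every earlier reader of dest must have read its operands
                if all(dest not in program[j][2]
                       or before(read[j], t) for j in earlier):
                    write[i] = t
    return list(zip(issue, read, done, write))


for instr, row in zip(PROGRAM, simulate(PROGRAM)):
    print(instr[0], instr[1], row)
```

Under these assumptions the rows agree with the Cycle 62 instruction status table, including the ADDD write being held until cycle 22, one cycle after DIVD reads F6.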

Scoreboard Summary

  • Speedup of 1.7 for compiled FORTRAN programs and 2.5 for hand-coded assembly language programs, BUT slow memory (no cache) limits the benefit

  • Limitations of the 6600 scoreboard:

    • No forwarding hardware

    • Limited to instructions in a basic block (small window)

    • Small number of functional units (structural hazards)

    • Wait for WAR hazards

    • Prevent WAW hazards

Case Study: Tomasulo Algorithm

Limitations of Scoreboard

  • No forwarding

  • Limited to instructions in basic block (small window)

  • Number of functional units (structural hazards)

  • Wait for WAR hazards

  • Prevent WAW hazards

Another Dynamic Algorithm: Tomasulo Algorithm

  • For IBM 360/91, about 3 years after CDC 6600

  • Goal: High performance without special compilers

  • Differences between IBM 360 & CDC 6600 ISA

    • IBM has only 2 register specifiers per instruction vs. 3 in CDC 6600

    • IBM has 4 FP registers vs. 8 in CDC 6600

  • Differences between Tomasulo Algorithm & Scoreboard

    • Control & buffers are distributed with the functional units in “reservation stations” vs. centralized in the scoreboard

    • Registers in instructions are replaced by pointers to reservation station buffers

    • HW renaming of registers avoids WAR, WAW hazards

    • Common Data Bus (CDB) broadcasts results to all FUs

    • Loads and stores are treated as FUs as well

Register Renaming

Name dependence (arrows in the original slide) and data dependence (blue and green):

    Loop: LD    F0, 0(R1)
          ADDD  F4, F0, F2
          SD    0(R1), F4
          LD    F0, -8(R1)
          ADDD  F4, F0, F2
          SD    -8(R1), F4
          LD    F0, -16(R1)
          ADDD  F4, F0, F2
          SD    -16(R1), F4
          LD    F0, -24(R1)
          ADDD  F4, F0, F2
          SD    -24(R1), F4
          SUBI  R1, R1, #32
          BNEZ  R1, Loop

Only data dependence remains with register renaming:

    Loop: LD    F0, 0(R1)
          ADDD  F4, F0, F2
          SD    0(R1), F4
          LD    F6, -8(R1)
          ADDD  F8, F6, F2
          SD    -8(R1), F8
          LD    F10, -16(R1)
          ADDD  F12, F10, F2
          SD    -16(R1), F12
          LD    F14, -24(R1)
          ADDD  F16, F14, F2
          SD    -24(R1), F16
          SUBI  R1, R1, #32
          BNEZ  R1, Loop
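The renaming step can be sketched as a single pass that gives every register write a fresh name and redirects later reads to the most recent name. This is an illustration, not the hardware mechanism: the `rename` function, the tuple encoding of instructions, and the `T0, T1, ...` names are all hypothetical, and only the first two iterations of the unrolled loop are shown.

```python
from itertools import count


def rename(program):
    """Give each register write a fresh name; reads use the latest name."""
    fresh = (f"T{i}" for i in count())
    current = {}   # architectural register -> latest renamed register
    renamed = []
    for op, dest, srcs in program:
        # Sources are looked up before the destination is redefined.
        new_srcs = tuple(current.get(s, s) for s in srcs)
        if dest is not None:
            current[dest] = next(fresh)
            dest = current[dest]
        renamed.append((op, dest, new_srcs))
    return renamed


loop_body = [
    ("LD", "F0", ("R1",)),
    ("ADDD", "F4", ("F0", "F2")),
    ("SD", None, ("R1", "F4")),
    ("LD", "F0", ("R1",)),          # reuses F0: name dependence only
    ("ADDD", "F4", ("F0", "F2")),
    ("SD", None, ("R1", "F4")),
]

for instr in rename(loop_body):
    print(instr)
```

After renaming, no destination name is reused, so the WAR and WAW name dependences between iterations disappear while each ADDD still reads the result of its own LD.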

Tomasulo Organization

(Block diagram; only the labels are recoverable.)

  • FP operations queue (issue), fed from the instruction unit

  • FP registers

  • Load buffers 1-6 (values to be loaded in registers), fed from memory

  • Store buffers 1-3 (addresses), to memory

  • FP Add reservation stations 1-3, feeding the FP adder

  • FP Multiply reservation stations 1-2, feeding the FP multiplier

  • Operand bus and operation bus

  • Common Data Bus (CDB)

Reservation Station Components

  • Op: Operation to perform in the unit (e.g., + or -)

  • Qj, Qk: Reservation stations producing source operands Vj, Vk. 0 indicates that Vj, Vk are ready, which eliminates the Rj, Rk fields of the scoreboard

  • Vj, Vk: Values of the source operands

  • Busy: Indicates that the reservation station and its FU are busy

  • Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.

Three Stages of Tomasulo Algorithm

1. Issue: Get instruction from FP Op Queue

  • FP op: If a reservation station is free, issue the instruction and send the operation and any operands that are already in registers (this renames the registers).

  • LD/ST: If a buffer is available, issue the instruction.

  • If no reservation station or buffer is available, there is a structural hazard: stall

  • Register renaming happens in this stage

2. Execution: Operate on operands (EX)

  • When an operand is ready, put it in the reservation station.

  • If it is not ready, watch the CDB for it.

  • When both operands are available, execute

  • RAW hazards are checked in this stage

3. Write Result: Finish execution (WB)

  • When the result is available, write it on the Common Data Bus, and from there to all waiting units: registers and reservation stations

  • Mark the reservation station available.
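The three stages can be sketched with reservation stations that hold either an operand value (Vj, Vk) or the name of the station that will produce it (Qj, Qk). This is a simplified model for illustration: the class names `RS` and `Tomasulo`, the register values, and the result values passed to `broadcast` are made up, and load/store buffers and execution timing are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RS:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None   # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None     # producing stations (None = value is ready)
    qk: Optional[str] = None


class Tomasulo:
    def __init__(self, station_names, regs):
        self.rs = {n: RS() for n in station_names}
        self.regs = dict(regs)   # register -> value
        self.qi = {}             # register -> station that will write it

    def issue(self, name, op, dest, s1, s2):
        def operand(reg):
            producer = self.qi.get(reg)
            return (None, producer) if producer else (self.regs[reg], None)

        (vj, qj), (vk, qk) = operand(s1), operand(s2)
        self.rs[name] = RS(True, op, vj, vk, qj, qk)
        self.qi[dest] = name     # renaming: dest now means "result of name"

    def ready(self, name):
        s = self.rs[name]
        return s.busy and s.qj is None and s.qk is None

    def broadcast(self, name, value):
        """Write result: put `value` on the CDB for all waiting units."""
        for s in self.rs.values():
            if s.qj == name:
                s.vj, s.qj = value, None
            if s.qk == name:
                s.vk, s.qk = value, None
        for reg, producer in list(self.qi.items()):
            if producer == name:
                self.regs[reg] = value
                del self.qi[reg]
        self.rs[name] = RS()     # mark the reservation station available


m = Tomasulo(["Add1", "Mult1", "Mult2"],
             {"F0": 0.0, "F2": 2.0, "F4": 4.0, "F6": 6.0, "F8": 8.0})
m.issue("Mult1", "MULTD", "F0", "F2", "F4")
m.issue("Mult2", "DIVD", "F10", "F0", "F6")   # waits for Mult1, captures F6 = 6.0
m.issue("Add1", "ADDD", "F6", "F8", "F2")     # will overwrite F6
m.broadcast("Add1", 10.0)                     # no WAR stall: DIVD kept its copy
print(m.regs["F6"], m.rs["Mult2"].vk)         # 10.0 6.0
m.broadcast("Mult1", 8.0)
print(m.ready("Mult2"), m.rs["Mult2"].vj)     # True 8.0
```

Because DIVD copies the value of F6 into its reservation station at issue, the later ADDD can write F6 immediately, which is the WAR case the scoreboard had to stall on.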

Tomasulo Example: Cycle 0

Instruction Status

  Instruction          Issue   Execution Complete   Write Result
  LD    F6, 34+R2
  LD    F2, 45+R3
  MULTD F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Load Buffers (Busy, Address): LD1, LD2, LD3 are not busy

Reservation Stations

  • Fields per station: Time, Name, Busy, Op, Vj (S1), Vk (S2), Qj (RS for j), Qk (RS for k)

  • Stations: Add1, Add2, Add3, Mult1, Mult2; none is busy

Register Result Status (Clock = 0, R2 = 80, R3 = 90): no pending writes to F0, F2, ..., F30

Tomasulo Example: Cycle 1

Instruction Status

  Instruction          Issue   Execution Complete   Write Result
  LD    F6, 34+R2      1
  LD    F2, 45+R3
  MULTD F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Load Buffers (busy only)

  • LD1: Address = 34+80

Reservation Stations: none is busy

Register Result Status (Clock = 1, R2 = 80, R3 = 90): F6 = LD1

Tomasulo Example: Cycle 2

Instruction Status

  Instruction          Issue   Execution Complete   Write Result
  LD    F6, 34+R2      1
  LD    F2, 45+R3      2
  MULTD F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Load Buffers (busy only)

  • LD1: Address = 34+80

  • LD2: Address = 45+90

Reservation Stations: none is busy

Register Result Status (Clock = 2, R2 = 80, R3 = 90): F2 = LD2, F6 = LD1

Tomasulo Example: Cycle 3

Instruction Status

  Instruction          Issue   Execution Complete   Write Result
  LD    F6, 34+R2      1       3
  LD    F2, 45+R3      2
  MULTD F0, F2, F4     3
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Load Buffers (busy only)

  • LD1: Address = 34+80

  • LD2: Address = 45+90

Reservation Stations (busy only)

  • Mult1: Op = MULTD, Vk = R(F4), Qj = LD2

Register Result Status (Clock = 3, R2 = 80, R3 = 90): F0 = Mult1, F2 = LD2, F6 = LD1

Tomasulo Example: Cycle 4a (SUBD issues)

Instruction Status

  Instruction          Issue   Execution Complete   Write Result
  LD    F6, 34+R2      1       3
  LD    F2, 45+R3      2       4
  MULTD F0, F2, F4     3
  SUBD  F8, F6, F2     4
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Load Buffers (busy only)

  • LD1: Address = 34+80

  • LD2: Address = 45+90

Reservation Stations (busy only)

  • Add1: Op = SUBD, Qj = LD1, Qk = LD2

  • Mult1: Op = MULTD, Vk = R(F4), Qj = LD2

Register Result Status (Clock = 4, R2 = 80, R3 = 90): F0 = Mult1, F2 = LD2, F6 = LD1, F8 = Add1

Tomasulo Example: Cycle 4b (LD F6 writes its result)

Instruction Status

  Instruction          Issue   Execution Complete   Write Result
  LD    F6, 34+R2      1       3                    4
  LD    F2, 45+R3      2       4
  MULTD F0, F2, F4     3
  SUBD  F8, F6, F2     4
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2

Load Buffers (busy only; LD1 is now free)

  • LD2: Address = 45+90

Reservation Stations (busy only)

  • Add1: Op = SUBD, Vj = M(114), Qk = LD2

  • Mult1: Op = MULTD, Vk = R(F4), Qj = LD2

Register Result Status (Clock = 4, R2 = 80, R3 = 90): F0 = Mult1, F2 = LD2, F6 = M(114), F8 = Add1

Tomasulo Example: Cycle 5

Instruction Status

  Instruction          Issue   Execution Complete   Write Result
  LD    F6, 34+R2      1       3                    4
  LD    F2, 45+R3      2       4                    5
  MULTD F0, F2, F4     3
  SUBD  F8, F6, F2     4
  DIVD  F10, F0, F6    5
  ADDD  F6, F8, F2

Load Buffers: none is busy (LD2 has written M(135))

Reservation Stations (busy only; Time = remaining execution cycles)

  • Add1: Op = SUBD, Vj = M(114), Vk = M(135), Time = 2

  • Mult1: Op = MULTD, Vj = M(135), Vk = R(F4), Time = 10

  • Mult2: Op = DIVD, Vk = M(114), Qj = Mult1

Register Result Status (Clock = 5, R2 = 80, R3 = 90): F0 = Mult1, F2 = M(135), F6 = M(114), F8 = Add1, F10 = Mult2

CS510 Computer Architectures


Cycle 6

Instruction status

Exec

Write

Busy

Address

Instruction j k

Issue

complete

Result

Yes 34+80

No

LD F6 34+ R2

1

3

4

LD1

No

LD

F6

34+

R2

Yes 45+90

LD F2 45+ R3

2

4

5

No

LD

F2

45+

R3

LD2

No

MULTD F0 F2 F4

3

MULTD

F0

F2

F4

LD3

No

SUBD F8 F6 F2

4

SUBD

F8

F6

F2

DIVD F10 F0 F6

5

DIVD

F10

F0

F6

ADDD F6 F8 F2

ADDD

F6

F8

F2

S1

S2

RS for j

Reservation Stations

RS for k

Busy

Op

Vj

Vk

Qj

Qk

Time

Name

Yes SUBD LD1 LD2

M(114)

M(135)

2

0

No

0

Add1

No

0

Add2

No

Add3

Yes MULTD R(F4) LD2

M(135)

10

0

No

0

Mult1

Yes DIVD M(114) Mult1

0

No

0

Mult2

Register result status

F0

F2

F4

F6

F8

F10

F12

...

F30

LD1

Add1

Mult2

Clock

R2

R3

Mult1

LD2

Qi

3

1

2

5

4

0

80

90

M(135)

M(114)

Cycle 6

6

1

Yes ADDD M(135) Add1

9

Add2

6

CS510 Computer Architectures


Cycle 7

Instruction status

Exec

Write

Busy

Address

Instruction j k

Issue

complete

Result

No

Yes 34+80

LD F6 34+ R2

1

3

4

LD1

No

LD

F6

34+

R2

Yes 45+90

LD F2 45+ R3

2

4

5

No

LD

F2

45+

R3

LD2

No

MULTD F0 F2 F4

3

MULTD

F0

F2

F4

LD3

No

SUBD F8 F6 F2

4

SUBD

F8

F6

F2

DIVD F10 F0 F6

5

DIVD

F10

F0

F6

6

ADDD F6 F8 F2

ADDD

F6

F8

F2

S1

S2

RS for j

Reservation Stations

RS for k

Busy

Op

Vj

Vk

Qj

Qk

Time

Name

Yes SUBD LD1 LD2

M(114)

M(135)

1

0

2

No

0

Add1

Yes ADDD M(135) Add1

No

0

Add2

No

Add3

Yes MULTD R(F4) LD2

M(135)

9

0

10

No

0

Mult1

Yes DIVD M(114) Mult1

0

No

0

Mult2

Register result status

F0

F2

F4

F6

F8

F10

F12

...

F30

Add2

LD1

Add1

Mult2

Clock

R2

R3

Mult1

LD2

Qi

6

5

4

1

2

3

0

80

90

M(135)

M(114)

Cycle 7

7

0

8

7

CS510 Computer Architectures


Cycle 8

Instruction status

Exec

Write

Busy

Address

Instruction j k

Issue

complete

Result

Yes 34+80

No

LD F6 34+ R2

1

3

4

LD1

No

LD

F6

34+

R2

Yes 45+90

LD F2 45+ R3

2

4

5

No

LD

F2

45+

R3

LD2

No

MULTD F0 F2 F4

3

MULTD

F0

F2

F4

LD3

No

SUBD F8 F6 F2

4

7

SUBD

F8

F6

F2

DIVD F10 F0 F6

5

DIVD

F10

F0

F6

6

ADDD F6 F8 F2

ADDD

F6

F8

F2

S1

S2

RS for j

Reservation Stations

RS for k

Busy

Op

Vj

Vk

Qj

Qk

Time

Name

Yes SUBD LD1 LD2

M(114)

M(135)

0

2

1

0

No

0

Add1

Yes ADDD M(135) Add1

M()-M()

No

0

Add2

No

Add3

Yes MULTD R(F4) LD2

M(135)

8

9

10

0

No

0

Mult1

Yes DIVD M(114) Mult1

0

No

0

Mult2

Register result status

F0

F2

F4

F6

F8

F10

F12

...

F30

Add2

LD1

Add1

Mult2

Clock

R2

R3

Mult1

LD2

Qi

3

2

7

5

6

4

1

0

80

90

M(135)

M(114)

M()-M()

Cycle 8

8

No

7

8

CS510 Computer Architectures




Dynamic Loop Unrolling by Tomasulo

  • Dynamic renaming of registers eliminates WAW and WAR hazards

  • Predicting the branch as TAKEN allows instructions from multiple loop iterations to proceed in parallel

  • Because the hardware unrolls the loop and renames registers dynamically, the loop does not need the many registers that compiler loop unrolling requires
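The renaming idea in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rename` and the tags `T1`, `T2`, ... are hypothetical (in Tomasulo's scheme the tags are reservation station and load buffer names), and SUBI and BNEZ are left out for brevity.

```python
from itertools import count


def rename(instructions):
    """Give every destination a fresh tag and point sources at the latest producer.

    Hypothetical helper for illustration; tags T1, T2, ... stand in for
    reservation station names.
    """
    latest = {}          # architectural register -> tag of its most recent producer
    fresh = count(1)
    renamed = []
    for op, dest, srcs in instructions:
        # Sources are looked up before the destination is renamed.
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        tag = None
        if dest is not None:
            tag = f"T{next(fresh)}"
            latest[dest] = tag
        renamed.append((op, tag, new_srcs))
    return renamed


# One loop body: (opcode, destination register, source registers).
loop_body = [
    ("LD", "F0", ("R1",)),
    ("MULTD", "F4", ("F0", "F2")),
    ("SD", None, ("F4", "R1")),
]

# Two iterations back to back, as if the branch were predicted TAKEN.
two_iterations = rename(loop_body + loop_body)
for instr in two_iterations:
    print(instr)
```

After renaming, the two writes to F0 (and to F4) target different tags, so the second iteration no longer has a WAW or WAR conflict with the first; only the true dependences LD -> MULTD -> SD inside each iteration remain.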

CS510 Computer Architectures


Tomasulo Loop Example

Loop: LD    F0, 0(R1)
      MULTD F4, F0, F2
      SD    0(R1), F4
      SUBI  R1, R1, #8
      BNEZ  R1, Loop

  • This example shows dynamic loop unrolling by following the first 2 iterations through to completion

    • Multiply takes 4 clocks

    • The Load in the 1st iteration has a cache miss, which takes 8 cycles
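For readers less used to the assembly, the loop's effect can be written as a short Python sketch. The starting value R1 = 80 is taken from the cycle tables of this example; the memory contents, the value of F2, and the name `run_loop` are made up for illustration.

```python
def run_loop(memory, r1, f2):
    """Mirror the five-instruction loop: M[R1] = M[R1] * F2, stepping R1 down by 8."""
    while True:
        f0 = memory[r1]      # LD    F0, 0(R1)
        f4 = f0 * f2         # MULTD F4, F0, F2
        memory[r1] = f4      # SD    0(R1), F4
        r1 -= 8              # SUBI  R1, R1, #8
        if r1 == 0:          # BNEZ  R1, Loop falls through once R1 reaches 0
            break
    return memory


# Assumed data: ten doubles at addresses 8, 16, ..., 80 and F2 = 2.0.
memory = {addr: float(addr) for addr in range(8, 88, 8)}
result = run_loop(dict(memory), r1=80, f2=2.0)
print(result[80], result[72], result[8])
```

Each iteration touches a different address and depends on the previous one only through R1, which is why the hardware can overlap iterations once F0 and F4 are renamed.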

CS510 Computer Architectures


Loop Example: Cycle 0

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1
MULTD F4     F0   F2   1
SD    F4     0    R1   1
LD    F0     0    R1   2
MULTD F4     F0   F2   2
SD    F4     0    R1   2

Load and store buffers
Name    Busy  Address  Qi
Load1   No
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 0, R1 = 80)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi

CS510 Computer Architectures


Cycle 1

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1
MULTD F4     F0   F2   1
SD    F4     0    R1   1
LD    F0     0    R1   2
MULTD F4     F0   F2   2
SD    F4     0    R1   2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  No
0     Mult2  No

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 1, R1 = 80)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load1

CS510 Computer Architectures


Cycle 2

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1
MULTD F4     F0   F2   1     2
SD    F4     0    R1   1
LD    F0     0    R1   2
MULTD F4     F0   F2   2
SD    F4     0    R1   2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  No
Store2  No
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD               R(F2)    Load1
0     Mult2  No

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 2, R1 = 80)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load1      Mult1

LD cannot progress due to cache miss
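The Mult1 entry at cycle 2 (MULTD with R(F2) as a value and Load1 as a pending tag) follows from the issue rule: a ready source is copied into a V field, a pending source is recorded as the producer's tag in a Q field. A minimal Python sketch of that rule, where the `Station` class and `issue` function are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """One reservation station row (hypothetical model of the table columns)."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[str] = None   # value of operand j, if already available
    vk: Optional[str] = None   # value of operand k, if already available
    qj: Optional[str] = None   # tag of the unit that will produce operand j
    qk: Optional[str] = None   # tag of the unit that will produce operand k


def issue(station, op, dest, src_j, src_k, reg_status):
    """Issue an instruction into a free station and claim the destination register."""
    assert not station.busy, "structural hazard: station must be free"
    station.busy, station.op = True, op
    for src, v_field, q_field in ((src_j, "vj", "qj"), (src_k, "vk", "qk")):
        producer = reg_status.get(src)
        if producer is None:
            setattr(station, v_field, f"R({src})")   # operand ready: capture value
        else:
            setattr(station, q_field, producer)      # operand pending: record tag
    reg_status[dest] = station.name                  # later readers wait on this tag
    return station


# State after cycle 1: F0 is still being produced by Load1.
reg_status = {"F0": "Load1"}
mult1 = issue(Station("Mult1"), "MULTD", "F4", "F0", "F2", reg_status)
print(mult1)
print(reg_status)
```

The sketch ends with F4 mapped to Mult1 in the register status, which is the tag a later SD of F4 would wait on.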

CS510 Computer Architectures


Cycle 3

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1
MULTD F4     F0   F2   1     2
SD    F4     0    R1   1     3
LD    F0     0    R1   2
MULTD F4     F0   F2   2
SD    F4     0    R1   2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD               R(F2)    Load1
0     Mult2  No

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 3, R1 = 80)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load1      Mult1

Multiply cannot progress due to F0

CS510 Computer Architectures


Cycle 4

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1
MULTD F4     F0   F2   1     2
SD    F4     0    R1   1     3
LD    F0     0    R1   2
MULTD F4     F0   F2   2
SD    F4     0    R1   2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD               R(F2)    Load1
0     Mult2  No

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 4, R1 = 72)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load1      Mult1

Execute SUBI to update R1 for the 2nd iteration
SD cannot progress due to F4

CS510 Computer Architectures


Cycle 5

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1
MULTD F4     F0   F2   1     2
SD    F4     0    R1   1     3
LD    F0     0    R1   2
MULTD F4     F0   F2   2
SD    F4     0    R1   2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   No
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD               R(F2)    Load1
0     Mult2  No

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 5, R1 = 72)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load1      Mult1

Execute BNEZ to get to the 2nd iteration

CS510 Computer Architectures


Cycle 6

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1
MULTD F4     F0   F2   1     2
SD    F4     0    R1   1     3
LD    F0     0    R1   2     6
MULTD F4     F0   F2   2
SD    F4     0    R1   2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD               R(F2)    Load1
0     Mult2  No

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 6, R1 = 72)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load2      Mult1

Predict TAKEN, issue LD (F0 is now renamed from Load1 to Load2)

CS510 Computer Architectures


Cycle 7

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1
MULTD F4     F0   F2   1     2
SD    F4     0    R1   1     3
LD    F0     0    R1   2     6
MULTD F4     F0   F2   2     7
SD    F4     0    R1   2

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  No
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD               R(F2)    Load1
0     Mult2  Yes   MULTD               R(F2)    Load2

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 7, R1 = 72)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load2      Mult2

CS510 Computer Architectures


Cycle 8

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1
MULTD F4     F0   F2   1     2
SD    F4     0    R1   1     3
LD    F0     0    R1   2     6
MULTD F4     F0   F2   2     7
SD    F4     0    R1   2     8

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD               R(F2)    Load1
0     Mult2  Yes   MULTD               R(F2)    Load2

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 8, R1 = 72)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load2      Mult2

CS510 Computer Architectures


Cycle 9

Instruction status
Instruction  j    k    Iter  Issue  Exec Comp  Write Result
LD    F0     0    R1   1     1      9
MULTD F4     F0   F2   1     2
SD    F4     0    R1   1     3
LD    F0     0    R1   2     6
MULTD F4     F0   F2   2     7
SD    F4     0    R1   2     8

Load and store buffers
Name    Busy  Address  Qi
Load1   Yes   80
Load2   Yes   72
Load3   No
Store1  Yes   80       Mult1
Store2  Yes   72       Mult2
Store3  No

Reservation stations
Time  Name   Busy  Operation  Vj (S1)  Vk (S2)  Qj (RS for j)  Qk (RS for k)
0     Add1   No
0     Add2   No
0     Add3   No
0     Mult1  Yes   MULTD               R(F2)    Load1
0     Mult2  Yes   MULTD               R(F2)    Load2

Loop code: LD F0,0(R1); MULTD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop

Register result status (Clock 9, R1 = 64)
     F0     F2  F4     F6  F8  F10  ...  F30
Qi   Load2      Mult2

Cache miss penalty over
Execute SUBI for 3rd iteration

CS510 Computer Architectures


Cycle 10

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

1

9

Yes 72

2

Qi

3

6

10

Mult1

Yes 80

Cache available

for access

7

Mult2

Yes 72

8

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

0

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

Yes MULTD R(F2) Load1

M(80)

Yes MULTD R(F2) Load2

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

2

1

080

5

4

3

7

8

6

9

64

72

Load1

Load2

Mult2

Mult1

Cycle 10

10

Start x 4

10

Execute BNEZ to get to the 3rd iteration
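At cycle 10 the first load writes its result, and M(80) shows up as an operand value of Mult1. A minimal Python sketch of that result broadcast follows; the dictionary layout and the function name `broadcast` are assumptions made for illustration, not the slides' notation.

```python
def broadcast(tag, value, stations, reg_status):
    """Deliver (tag, value) to every station waiting on tag, then free the registers it owns."""
    for st in stations:
        if st["Qj"] == tag:
            st["Vj"], st["Qj"] = value, None
        if st["Qk"] == tag:
            st["Vk"], st["Qk"] = value, None
    # Only registers still mapped to this tag receive the value.
    for reg, producer in list(reg_status.items()):
        if producer == tag:
            del reg_status[reg]


# State just before the write: both multiplies wait on their loads,
# and F0/F4 have already been renamed to the 2nd iteration's producers.
stations = [
    {"Name": "Mult1", "Op": "MULTD", "Vj": None, "Vk": "R(F2)", "Qj": "Load1", "Qk": None},
    {"Name": "Mult2", "Op": "MULTD", "Vj": None, "Vk": "R(F2)", "Qj": "Load2", "Qk": None},
]
reg_status = {"F0": "Load2", "F4": "Mult2"}

broadcast("Load1", "M(80)", stations, reg_status)
print(stations[0])
print(reg_status)
```

Mult1 picks up M(80) and can start executing, while the register status is untouched: F0 already belongs to Load2, so the first iteration's load value reaches its consumer without ever being written to F0.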

CS510 Computer Architectures


Cycle 11

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

1

9

10

Yes 72

2

Qi

3

6

10

Mult1

Yes 80

7

Mult2

Yes 72

8

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

0

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

4

YesMULTD R(F2) Load1

M(80)

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

10

3

8

080

1

4

2

9

6

7

5

72

64

Load1

Load2

Mult2

Mult1

Cycle 11

N

11

3

Start x 4

11

CS510 Computer Architectures


Cycle 12

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

2

Qi

Yes 64

3

6

10

11

Mult1

Yes 80

7

Mult2

Yes 72

8

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

3

4

Yes MULTD R(F2) Load1

M(80)

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

10

11

8

6

5

080

7

2

3

9

1

4

72

64

Load1

Load2

Load2

Mult2

Mult1

Cycle 12

N

N

2

3

12

Load3

CS510 Computer Architectures


Cycle 13

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

2

Qi

Yes 64

3

6

10

11

Mult1

Yes 80

7

Mult2

Yes 72

8

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

4

3

2

Yes MULTD R(F2) Load1

M(80)

3

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

11

10

12

4

5

2

1

9

080

8

3

6

7

64

72

Load1

Load2

Load2

Load3

Mult1

Mult2

Cycle 13

N

N

1

2

13

CS510 Computer Architectures


Cycle 14

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

N

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

2

Qi

Yes 64

3

6

10

11

Mult1

Yes 80

7

Mult2

Yes 72

8

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

1

4

3

2

Yes MULTD R(F2) Load1

M(80)

2

3

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

11

12

13

10

1

4

080

5

3

7

2

8

6

9

64

72

Load1

Load2

Load3

Load2

Mult1

Mult2

Cycle 14

N

N

14

12

0

1

14

CS510 Computer Architectures


Cycle 15

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

N

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

2

14

Qi

Yes 64

3

6

10

11

Mult1

Yes 80

7

Mult2

Yes 72

8

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

3

2

1

0

4

N

Yes MULTD R(F2) Load1

M(80)

1

2

3

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

14

13

10

12

11

6

9

5

4

8

7

1

080

2

3

72

64

Load1

Load2

Load3

Load2

Mult2

Mult1

Cycle 15

13 14

N

N

15

12

M[80]*R(F2)

15

0

15

CS510 Computer Architectures


Cycle 16

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

2

14

15

Qi

Yes 64

3

6

10

11

Mult1

Yes 80

M[80]*R(F2)

7

15

Mult2

Yes 72

8

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

1

3

4

0

2

N

Yes MULTD R(F2) Load1

M(80)

3

2

1

0

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

14

11

15

13

12

10

6

2

3

1

5

080

9

4

8

7

72

64

Load1

Load2

Load2

Load3

Mult2

Mult1

Cycle 16

13 14

N

N

12

15

16

16

M[72]*R(F2)

N

16

CS510 Computer Architectures


Cycle 17

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

16

2

14

15

Qi

Yes 64

3

6

10

11

Mult1

Yes 80

M[80]*R(F2)

7

15

16

Mult2

M[72]*R(F2)

Yes 72

8

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

0

1

4

2

3

N

Yes MULTD R(F2) Load1

M(80)

2

3

1

0

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

16

15

10

12

14

11

13

9

4

5

6

1

080

8

3

7

2

64

72

Load1

Load3

Load2

Load2

Mult1

Mult2

Cycle 17

N

17

CS510 Computer Architectures


Cycle 18

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

16

2

14

15

Qi

Yes 64

3

3/17

6

10

11

Mult1

Yes 80

M[80]*R(F2)

7

15

16

M[72]*R(F2)

Mult2

Yes 72

8

Yes 64

Mult1

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

0

1

4

3

2

N

Yes MULTD R(F2) Load1

M(80)

2

1

0

3

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

13

10

16

12

14

17

11

15

7

6

3

5

2

080

1

4

9

8

56

64

72

Load1

Load2

Load2

Load3

Mult1

Mult2

Mult1

Execute SUBI for 4th iteration

Cycle 18

18

N

18

CS510 Computer Architectures


Cycle 19

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

16

2

14

15

Qi

Yes 64

3

3/17

18

N

6

10

11

Mult1

Yes 80

M[80]*R(F2)

7

15

16

Mult2

M[72]*R(F2)

Yes 72

8

Yes 64

Mult1

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

1

3

2

0

4

N

Yes MULTD R(F2) Load1

M(80)

2

3

0

1

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

11

12

10

18

15

16

14

13

17

4

1

8

7

9

080

3

2

6

5

64

72

56

Load1

Load2

Load3

Load2

Mult2

Mult1

Mult1

Cycle 19

19

N

19

Execute BNEZ to get to the 4th iteration

CS510 Computer Architectures


Cycle 20

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

16

2

14

15

Qi

Yes 64

3/17

3

18

19

6

10

11

Mult1

Yes 80

M[80]*R(F2)

7

15

16

Mult2

M[72]*R(F2)

Yes 72

8

Yes 64

Mult1

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

2

0

4

3

1

Yes MULTD R(F2) Load1

N

M(80)

3

0

1

2

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

19

10

15

14

16

12

11

18

17

13

6

8

9

7

5

2

1

080

3

4

72

56

64

Load1

Load2

Load3

Load2

Mult2

Mult1

Mult1

Cycle 20

N

20

N

20

CS510 Computer Architectures


Cycle 21

Instruction StatusExec Write

Instruction j k IterIssue Comp Result

Busy Address

N

Load1

Load2

Load3

Store1

Store2

Store3

N

N

N

N

N

N

LDF0 0R1 1

MULTDF4 F0F2 1

SDF4 0R1 1

LDF0 0R1 2

MULTDF4 F0F2 2

SDF4 0R1 2

Yes 80

11

1

9

10

Yes 72

16

2

14

15

Qi

Yes 64

3/17

3

18

19

N

6

10

11

Mult1

Yes 80

M[80]*R(F2)

N

7

15

16

Mult2

M[72]*R(F2)

Yes 72

8

20

Yes 64

Mult1

Reservation StationS1 S2 RS for j RS for k

TimeName Busy Operation Vj Vk Qj Qk

Add1

Add2

Add3

Mult1

Mult2

0

0

0

0

4

N

NNNN

LDF0 0 R1

MULTDF4 F0 F2

SDF4 0 R1

SUBIR1 R1 #8

BNEZR1 Loop

0

3

2

4

1

Yes MULTD R(F2) Load1

N

M(80)

3

1

2

0

Yes MULTD R(F2) Load2

M(72)

Register Result Status

ClockR1 F0 F2 F4 F6 F8 F10 . . . F30

Qi

13

14

10

11

12

15

19

18

17

16

20

9

080

1

2

8

5

3

4

6

7

64

56

72

Load1

Load3

Load2

Load2

Mult2

Mult1

Mult1

Cycle 21

21

N

21

CS510 Computer Architectures


Tomasulo Summary

  • Prevents the register file from becoming a bottleneck

  • Avoids the WAR and WAW hazards of the Scoreboard

  • Allows loop unrolling in HW

  • Not limited to basic blocks (provided branch prediction is available)

  • Lasting Contributions

    • Dynamic scheduling

    • Register renaming

    • Load/store disambiguation

  • Next: More branch prediction

CS510 Computer Architectures

