
CSC 4250 Computer Architectures

October 13, 2006

Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation

CPI Equation

Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
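
As a quick illustration of the equation, the Python sketch below adds average stall cycles per instruction for each hazard class to an ideal CPI of 1.0. The stall values are made up for the arithmetic only; they are not measurements of any machine.

```python
def pipeline_cpi(ideal_cpi, structural, data_hazard, control):
    """Pipeline CPI = ideal CPI + average stall cycles per instruction
    contributed by structural, data, and control hazards."""
    return ideal_cpi + structural + data_hazard + control

# Hypothetical stall contributions, chosen only to illustrate the sum.
cpi = pipeline_cpi(ideal_cpi=1.0, structural=0.05, data_hazard=0.20, control=0.15)
print(f"Pipeline CPI = {cpi:.2f}")  # Pipeline CPI = 1.40
```

Every stall term is added on top of the ideal CPI, so reducing any one hazard class lowers the total directly.
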

Three Types of Dependences
  • Data dependences (also called true data dependences)
  • Name dependences
  • Control dependences
Data Dependences
  • An instruction j is data dependent on instruction i if either of the following holds:
    • Instruction i produces a result that may be used by instruction j, or
    • Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
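
The second rule makes data dependence transitive. A minimal Python sketch, assuming instructions are numbered in program order and the direct (first-rule) dependences are already known, computes the full relation as a transitive closure. The indices 0, 1, 2 are an illustrative encoding of the L.D, ADD.D, and S.D in the loop example that follows.

```python
from itertools import product

def data_dependences(direct):
    """Return every (i, j) pair such that instruction j is data dependent on i.

    `direct` holds pairs (i, j) where i produces a result that j may use
    (the first rule). The second rule, dependence through an intermediate
    instruction k, is the transitive closure of that relation.
    """
    deps = set(direct)
    changed = True
    while changed:
        changed = False
        for (i, k), (k2, j) in product(list(deps), repeat=2):
            if k == k2 and (i, j) not in deps:
                deps.add((i, j))
                changed = True
    return deps

# 0 = L.D, 1 = ADD.D, 2 = S.D: L.D feeds ADD.D via F0, ADD.D feeds S.D via F4.
print(sorted(data_dependences({(0, 1), (1, 2)})))  # [(0, 1), (0, 2), (1, 2)]
```

The extra pair (0, 2) records that S.D depends on L.D through ADD.D, which is exactly the chain described by the second rule.
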
Example of Data Dependences
  • Example:

Loop: L.D    F0,0(R1)    ;F0 = array element
      ADD.D  F4,F0,F2    ;add scalar in F2
      S.D    F4,0(R1)    ;store result
      DADDUI R1,R1,#-8   ;decrement pointer 8 bytes
      BNE    R1,R2,Loop  ;branch R1 != R2

  • The data dependences involve both FP data in F0 and F4, and integer data in R1
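
One way to find these dependences mechanically is to track the most recent writer of each register. The Python sketch below does this for a single pass through the loop body, using a hypothetical (text, destination, sources) encoding of each instruction. It deliberately ignores loop-carried dependences (such as DADDUI feeding the next iteration's L.D) and dependences through memory.

```python
# Each instruction: (text, destination register or None, source registers).
LOOP = [
    ("L.D    F0,0(R1)",   "F0", ["R1"]),
    ("ADD.D  F4,F0,F2",   "F4", ["F0", "F2"]),
    ("S.D    F4,0(R1)",   None, ["F4", "R1"]),  # writes memory, not a register
    ("DADDUI R1,R1,#-8",  "R1", ["R1"]),
    ("BNE    R1,R2,Loop", None, ["R1", "R2"]),
]

def true_dependences(code):
    """Return (producer, consumer, register) for each register read that is
    satisfied by an earlier write in the same straight-line block."""
    last_writer = {}
    deps = []
    for j, (_, dest, srcs) in enumerate(code):
        for r in srcs:                 # sources are read before dest is written
            if r in last_writer:
                deps.append((last_writer[r], j, r))
        if dest is not None:
            last_writer[dest] = j
    return deps

for i, j, r in true_dependences(LOOP):
    print(f"{LOOP[j][0]} depends on {LOOP[i][0]} via {r}")
```

It finds L.D to ADD.D via F0, ADD.D to S.D via F4, and DADDUI to BNE via R1: the FP and integer dependences listed above.
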
Name Dependences
  • A name dependence occurs when two instructions use the same register or memory location, called a name, but there is no flow of data between the instructions associated with that name
Example of Name Dependences
  • Code:

Loop: L.D    F0,0(R1)    ;F0 = array element
      ADD.D  F4,F0,F2    ;add scalar in F2
      MUL.D  F0,F4,F6    ;multiply by scalar in F6
      SUB.D  F4,F0,F8    ;subtract scalar in F8
      S.D    F4,0(R1)    ;store result
      DADDUI R1,R1,#-8   ;decrement ptr 8 bytes
      BNE    R1,R2,Loop  ;branch R1 != R2

  • There are name dependences in F0 between ADD and MUL, in F4 between MUL and SUB, in F0 between Load and MUL, and in F4 between ADD and SUB.
Two Types of Name Dependences
  • Instruction i precedes instruction j in program order:
    • An antidependence between instruction i and instruction j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved to ensure that i reads the correct value.
    • An output dependence occurs when instruction i and instruction j write the same register or memory location. The ordering between the instructions must be preserved to ensure that the value finally written corresponds to instruction j.
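
Both definitions compare the names each instruction reads and writes. A minimal Python sketch, assuming each instruction is summarized by hypothetical read and write sets, classifies the dependences from i to a later j. It looks only at the pair and ignores any instruction in between that might also write the same name; the true dependence is included only for contrast.

```python
def classify_pair(i_reads, i_writes, j_reads, j_writes):
    """Dependences from instruction i to a later instruction j, by name.

    Arguments are sets of register (or memory location) names.
    """
    return {
        "anti": i_reads & j_writes,     # j writes a name that i reads
        "output": i_writes & j_writes,  # i and j write the same name
        "true": i_writes & j_reads,     # a value flows from i to j (not a name dependence)
    }

# i = ADD.D F4,F0,F2 and j = MUL.D F0,F4,F6 from the example above.
result = classify_pair({"F0", "F2"}, {"F4"}, {"F4", "F6"}, {"F0"})
print(result)  # anti: {'F0'}, output: empty, true: {'F4'}
```

The pair ADD.D, MUL.D has an antidependence on F0 (the name dependence listed earlier) and, separately, a true dependence on F4.
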
Example of Name Dependences
  • Code:

Loop: L.D    F0,0(R1)    ;F0 = array element
      ADD.D  F4,F0,F2    ;add scalar in F2
      MUL.D  F0,F4,F6    ;multiply by scalar in F6
      SUB.D  F4,F0,F8    ;subtract scalar in F8
      S.D    F4,0(R1)    ;store result
      DADDUI R1,R1,#-8   ;decrement ptr 8 bytes
      BNE    R1,R2,Loop  ;branch R1 != R2

  • Which are the antidependences?
  • Which are the output dependences?
  • Which are the true data dependences?
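
The three questions can be answered mechanically. The Python sketch below, assuming each instruction is summarized by hypothetical write and read sets, reports a dependence between i and a later j only when no instruction in between writes the same name. Only register names are considered (the memory location 0(R1) is ignored), and loop-carried dependences across iterations are not examined. Besides the FP name dependences listed earlier, it also reports the antidependences of L.D and S.D on DADDUI through R1.

```python
# Each instruction: (opcode, registers written, registers read).
LOOP = [
    ("L.D",    {"F0"}, {"R1"}),
    ("ADD.D",  {"F4"}, {"F0", "F2"}),
    ("MUL.D",  {"F0"}, {"F4", "F6"}),
    ("SUB.D",  {"F4"}, {"F0", "F8"}),
    ("S.D",    set(),  {"F4", "R1"}),
    ("DADDUI", {"R1"}, {"R1"}),
    ("BNE",    set(),  {"R1", "R2"}),
]

def classify(code):
    """Return (true, anti, output) lists of (i, j, register) with i before j."""
    true_deps, anti_deps, output_deps = [], [], []
    for i, (_, wi, ri) in enumerate(code):
        for j in range(i + 1, len(code)):
            _, wj, rj = code[j]
            # Names overwritten strictly between i and j break the dependence.
            between = set().union(*(code[k][1] for k in range(i + 1, j)))
            true_deps += [(i, j, r) for r in sorted((wi & rj) - between)]
            anti_deps += [(i, j, r) for r in sorted((ri & wj) - between)]
            output_deps += [(i, j, r) for r in sorted((wi & wj) - between)]
    return true_deps, anti_deps, output_deps

true_deps, anti_deps, output_deps = classify(LOOP)
names = [op for op, _, _ in LOOP]
for kind, deps in (("true", true_deps), ("anti", anti_deps), ("output", output_deps)):
    for i, j, r in deps:
        print(f"{kind:6} {names[i]} -> {names[j]} on {r}")
```
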
Register Renaming
  • Since a name dependence is not a true dependence, instructions involved in a name dependence can execute simultaneously or be reordered if the name (register or memory location) used in the instructions is changed so that the instructions do not conflict. This renaming is easily done for register operands, where it is called register renaming.
  • IBM 360 computer family: only four double-precision FP registers (F0, F2, F4, F6)!

Pipeline Data Hazards
  • A hazard is created whenever there is a dependence between instructions, and they are close enough that the overlap caused by pipelining would change the order of access to the operand involved in the dependence
  • We must preserve program order wherever it affects the outcome of the program
Three Types of Data Hazards
  • Instruction i precedes instruction j in program order:
    • RAW (read after write): j tries to read a source before i writes it, so j may incorrectly get the old value; this is the most common hazard and corresponds to a true data dependence.
    • WAW (write after write): j tries to write an operand before it is written by i, so the operand may incorrectly end up with the value written by i; this hazard corresponds to an output dependence.
    • WAR (write after read): j tries to write a destination before it is read by i, so i may incorrectly get the new value; this hazard arises from an antidependence.
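
In each case the overlap lets j's access to the operand happen before i's. The Python sketch below models that crudely by running j before i on a toy register file (the register values and operations are made up for illustration) and compares the result with program order.

```python
from operator import add

INIT = {"F0": 0, "F2": 1, "F4": 2, "F6": 0, "F8": 10}

def run(program, regs):
    """Execute (dest, fn, srcs) steps in the given order on a copy of regs."""
    regs = dict(regs)
    for dest, fn, srcs in program:
        regs[dest] = fn(*(regs[s] for s in srcs))
    return regs

# Each case: (register to inspect, instruction i, instruction j); i is first in program order.
CASES = {
    "RAW": ("F6", ("F0", add, ["F2", "F4"]), ("F6", lambda x: 2 * x, ["F0"])),
    "WAR": ("F6", ("F6", lambda x: x + 1, ["F8"]), ("F8", lambda: 100, [])),
    "WAW": ("F4", ("F4", lambda: 1, []), ("F4", lambda: 2, [])),
}

for name, (reg, i, j) in CASES.items():
    correct = run([i, j], INIT)[reg]  # program order
    hazard = run([j, i], INIT)[reg]   # j's access happens before i's
    print(f"{name}: {reg} = {correct} in program order, {hazard} if j goes first")
```

RAW gives j the old F0, WAR gives i the new F8, and WAW leaves i's value in F4, matching the three descriptions above.
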
Example of Register Renaming (1)
  • Code:

DIV.D F0,F2,F4
ADD.D F6,F0,F8
S.D   F6,0(R1)
SUB.D F8,F10,F14
MUL.D F6,F10,F8

  • There is an antidependence between ADD.D and SUB.D, and an output dependence between ADD.D and MUL.D, leading to two possible hazards: a WAR hazard on the use of F8 by ADD.D and a WAW hazard since the ADD.D may finish later than the MUL.D.
  • There are also three true data dependences: between DIV.D and ADD.D, between SUB.D and MUL.D, and between ADD.D and S.D.
Example of Register Renaming (2)
  • Using two temporary registers S and T, the code can be rewritten without any name dependences:

DIV.D F0,F2,F4
ADD.D S,F0,F8
S.D   S,0(R1)
SUB.D T,F10,F14
MUL.D F6,F10,T

  • F6 in ADD.D is now S, eliminating the output dependence between ADD.D and MUL.D
  • F8 in SUB.D is now T, eliminating the antidependence between ADD.D and SUB.D
  • All subsequent uses of F8 must be replaced by T
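
Renaming can also be done mechanically. The Python sketch below is a variant of the hand renaming with S and T: it gives every destination a fresh name (the hypothetical P0, P1, ...) and rewrites later sources to match, which is closer to how renaming hardware treats destinations. A small checker then confirms that the original code has name dependences and the renamed code has none. The checker also flags an antidependence on F6 between S.D and MUL.D, which renaming F6 to S in ADD.D and S.D removes as well.

```python
from itertools import count

# Each instruction: (opcode, destination or None, [sources]).
CODE = [
    ("DIV.D", "F0", ["F2", "F4"]),
    ("ADD.D", "F6", ["F0", "F8"]),
    ("S.D",   None, ["F6", "R1"]),  # store: no register destination
    ("SUB.D", "F8", ["F10", "F14"]),
    ("MUL.D", "F6", ["F10", "F8"]),
]

def rename(code):
    """Give every destination a fresh name and rewrite later sources to match."""
    fresh = (f"P{n}" for n in count())
    current = {}  # architectural name -> name holding its latest value
    out = []
    for op, dest, srcs in code:
        new_srcs = [current.get(s, s) for s in srcs]  # read before this write
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)
            current[dest] = new_dest
        out.append((op, new_dest, new_srcs))
    return out

def name_dependences(code):
    """List ("output" | "anti", index, name) for writes to names used earlier."""
    seen_reads, seen_writes, found = set(), set(), []
    for k, (_, dest, srcs) in enumerate(code):
        if dest in seen_writes:
            found.append(("output", k, dest))
        if dest in seen_reads:
            found.append(("anti", k, dest))
        seen_reads.update(srcs)
        if dest is not None:
            seen_writes.add(dest)
    return found

print(name_dependences(CODE))          # antidependences on F8 and F6, output dependence on F6
print(name_dependences(rename(CODE)))  # []
for instr in rename(CODE):
    print(instr)
```

After renaming, the final value of architectural F6 lives in P3; a real implementation would have to record that mapping (here it is the `current` dictionary inside `rename`) so later code reads the right location.
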
Control Dependences
  • A control dependence determines the ordering of an instruction with respect to a branch instruction so that the instruction is executed in correct program order and only when it should be. There are two constraints imposed by control dependences:
    • An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch.
    • An instruction that is not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch.
Violating Control Dependences
  • Control dependence is not a critical property that must be preserved. We may be willing to execute instructions that should not have been executed, thereby violating the control dependences, if we can do so without affecting the correctness of the program. The two properties critical to program correctness, and normally preserved by maintaining both data and control dependences, are exception behavior and data flow.
Speculation
  • Consider the code:

          DADDU R1,R2,R3
          BEQZ  R12,skipnext
          DSUBU R4,R5,R6
          DADDU R5,R4,R9
skipnext: OR    R7,R8,R9

  • Suppose we know that the register destination R4 of DSUBU will be unused after the instruction labeled skipnext. If so, then changing the value of R4 just before the branch will not affect data flow, since R4 will be dead (rather than live) in the code region after skipnext. Thus, if R4 were dead and DSUBU could not generate an exception, we could move DSUBU before the branch. This type of scheduling is called speculation, since the compiler is betting on the branch outcome; in this case, the bet is that the branch will not be taken.
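
The two conditions in this argument (R4 is dead on the path taken when the branch jumps to skipnext, and DSUBU cannot raise an exception) can be checked as in the Python sketch below. It assumes, as the argument does, that nothing after skipnext reads R4, so a path that ends without reading R4 is treated as leaving R4 dead. The tuple encoding and the second, R4-reading path are hypothetical. DSUBU is marked as non-trapping because MIPS unsigned arithmetic does not raise overflow exceptions.

```python
# (text, destination, sources, may_raise_exception)
DSUBU = ("DSUBU R4,R5,R6", "R4", ["R5", "R6"], False)  # unsigned op: no overflow trap

# Code reached when BEQZ is taken (from skipnext onward).
TAKEN_PATH = [("OR R7,R8,R9", "R7", ["R8", "R9"], False)]

def is_live(reg, path):
    """True if reg may be read on path before it is overwritten."""
    for _, dest, srcs, _ in path:
        if reg in srcs:       # read first: the old value is needed
            return True
        if dest == reg:       # overwritten first: the old value is dead
            return False
    return False              # assumption: reg is dead once the path ends

def can_hoist_above_branch(instr, taken_path):
    """Speculating instr above the branch preserves data flow and exception
    behavior if it cannot trap and its result is dead on the taken path."""
    _, dest, _, may_trap = instr
    return not may_trap and not is_live(dest, taken_path)

print(can_hoist_above_branch(DSUBU, TAKEN_PATH))  # True

# Hypothetical variant in which the taken path reads R4: hoisting is unsafe.
BAD_PATH = [("DADDU R10,R4,R1", "R10", ["R4", "R1"], False)] + TAKEN_PATH
print(can_hoist_above_branch(DSUBU, BAD_PATH))  # False
```

On the fall-through path DSUBU executes anyway, so only the taken path needs the liveness check.
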
Dynamic Scheduling using Tomasulo’s Algorithm
  • CDC 6600 (1964): scoreboarding; 16 separate FUs.
  • IBM 360/91 (1967): 4 double-precision FP registers; one FP adder and one FP multiplier.

  • Tomasulo invented a scheme to reduce structural hazards (using reservation stations), RAW hazards (using tags), and WAW and WAR hazards (using register renaming).
Register Renaming in Tomasulo’s Scheme
  • Register renaming is provided by the reservation stations, which buffer the operands of instructions waiting to issue, and by the issue logic.
  • The basic idea is that a reservation station fetches and buffers an operand as soon as it is available, eliminating the need to get the operand from a register.
  • In addition, pending instructions designate the reservation station that will provide their input.
  • Finally, when successive writes to a register overlap in execution, only the last one is actually used to update the register.
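
These mechanisms can be sketched in Python. This is a deliberately simplified model, assuming single-operation stations with the illustrative tags Add1, Add2, and Mult1, no load/store buffers, no timing, and results written in whatever order the caller chooses. It shows two effects: ADD.D keeps the F8 value it captured at issue even though SUB.D later writes F8 (no WAR hazard), and only MUL.D, the last writer of F6, updates the register (no WAW hazard).

```python
regs = {"F2": 2.0, "F4": 4.0, "F6": 0.0, "F8": 8.0, "F10": 10.0}
qi = {}        # register status: register -> tag of the station that will write it
stations = {}  # tag -> {"op", "vj", "vk", "qj", "qk"}

OPS = {
    "ADD.D": lambda a, b: a + b,
    "SUB.D": lambda a, b: a - b,
    "MUL.D": lambda a, b: a * b,
}

def issue(tag, op, dest, src1, src2):
    """Place an instruction in reservation station `tag`, renaming its operands."""
    st = {"op": op}
    for v, q, src in (("vj", "qj", src1), ("vk", "qk", src2)):
        if src in qi:           # value still in flight: remember which station produces it
            st[v], st[q] = None, qi[src]
        else:                   # value ready: buffer it in the station now
            st[v], st[q] = regs[src], None
    stations[tag] = st
    qi[dest] = tag              # later readers of dest now wait for this station

def execute(tag):
    st = stations[tag]
    assert st["qj"] is None and st["qk"] is None, "operands not ready"
    return OPS[st["op"]](st["vj"], st["vk"])

def write_result(tag, value):
    """Deliver (tag, value) to waiting stations and to the register file."""
    for st in stations.values():
        if st["qj"] == tag:
            st["vj"], st["qj"] = value, None
        if st["qk"] == tag:
            st["vk"], st["qk"] = value, None
    for reg, t in list(qi.items()):
        if t == tag:            # only the most recent writer updates the register
            regs[reg] = value
            del qi[reg]
    del stations[tag]

issue("Add1", "ADD.D", "F6", "F8", "F2")    # buffers F8 = 8.0 at issue
issue("Add2", "SUB.D", "F8", "F10", "F4")   # WAR on F8 with Add1
issue("Mult1", "MUL.D", "F6", "F10", "F8")  # WAW on F6 with Add1; waits for Add2
write_result("Add2", execute("Add2"))       # F8 = 6.0; Mult1 receives it
write_result("Add1", execute("Add1"))       # 8.0 + 2.0; F6 untouched (Mult1 owns it)
write_result("Mult1", execute("Mult1"))     # F6 = 60.0
print(regs["F6"], regs["F8"])               # 60.0 6.0
```

The final register values match executing ADD.D, SUB.D, MUL.D in program order, even though Add1 finished after Add2 overwrote F8.
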
Reservation Stations Eliminate Hazards
  • As instructions are issued, the register specifiers for pending operands are renamed to the names of reservation stations, which provides register renaming.
  • Since there can be more reservation stations than registers, the technique can eliminate hazards arising from name dependences that cannot be eliminated by a compiler.
  • We will see how renaming occurs and how it eliminates WAR and WAW hazards.
Two Properties of Hardware
  • Hazard detection and execution control are distributed: The information held in the reservation stations at each FU determines when an instruction can begin execution at that unit.
  • Results are passed directly to FUs from the reservation stations where they are buffered, rather than going through the registers. This bypassing is done with a common data bus (CDB) that allows all units waiting for an operand to be loaded simultaneously.
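
Both properties can be illustrated with a small Python sketch: each station decides for itself when its operands are ready, and a single CDB broadcast delivers a result to every station waiting on that tag. The tags Load1, Add1, Mult1, and Mult2 are hypothetical, the register file (which also listens to the CDB) is omitted, and the loop in `cdb_broadcast` stands in for what hardware does in one step.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Station:
    tag: str
    qj: Optional[str] = None  # tag of the station producing the first operand
    qk: Optional[str] = None  # tag of the station producing the second operand
    vj: Optional[float] = None
    vk: Optional[float] = None

    def ready(self) -> bool:
        # Hazard detection is local: no operand is still waiting on a tag.
        return self.qj is None and self.qk is None

    def snoop(self, tag: str, value: float) -> None:
        # Capture a broadcast result if this station is waiting for that tag.
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None

def cdb_broadcast(stations: List[Station], tag: str, value: float) -> None:
    """One CDB transfer: every waiting station sees (tag, value) at once."""
    for st in stations:
        st.snoop(tag, value)

add1 = Station("Add1", qj="Load1", vk=2.0)
mult1 = Station("Mult1", qj="Load1", qk="Load1")
mult2 = Station("Mult2", qj="Add1", vk=3.0)

cdb_broadcast([add1, mult1, mult2], "Load1", 5.0)
print([st.tag for st in (add1, mult1, mult2) if st.ready()])  # ['Add1', 'Mult1']
```

One broadcast of Load1's result readies two stations, including Mult1, which needed the same value for both operands; Mult2 keeps waiting for Add1.
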