
Data Hazards

in Pipelined Processors

Arvind

Computer Science & Artificial Intelligence Lab.

Massachusetts Institute of Technology

http://csg.csail.mit.edu/6.375


A different 2-Stage pipeline: 2-Stage-DH pipeline

[Figure: two-stage pipeline. Stage 1: Fetch, Decode, RegisterFetch. Stage 2: Execute, Memory, WriteBack. Blocks: PC, pred, Inst Memory, Decode, Register File, Execute, Data Memory. Fifos: d2e, redirect. Epoch registers: fEpoch, eEpoch.]

Use the same epoch solution for control hazards as before


Data Hazards

[Figure: fetch & decode stage (pc, rf) and execute stage (dMem), connected by d2e]

    time      t0   t1   t2   t3   t4   t5   t6   t7  . . . .
    FD stage  FD1  FD2  FD3  FD4  FD5
    EX stage       EX1  EX2  EX3  EX4  EX5

    I1  Add(R1,R2,R3)
    I2  Add(R4,R1,R2)

I2 must be stalled until I1 updates the register file

    time      t0   t1   t2   t3   t4   t5   t6   t7  . . . .
    FD stage  FD1  FD2  FD2  FD3  FD4  FD5
    EX stage       EX1       EX2  EX3  EX4  EX5


Dealing with data hazards

  • Keep track of instructions in the pipeline and determine if the register values to be fetched are stale, i.e., will be modified by some older instruction still in the pipeline. This condition is referred to as a read-after-write (RAW) hazard

  • Stall Fetch from dispatching the instruction for as long as the RAW hazard persists

  • The RAW hazard will disappear as the pipeline drains

Scoreboard: A data structure to keep track of the instructions in the pipeline beyond the Fetch stage
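The stall behaviour described above can be sketched as a small behavioral model. This is Python, not Bluespec, and it is only an illustration: the function name `schedule` and the tuple encoding `(dst, src1, src2)` are assumptions made for this sketch. It assumes a normal register file, so a write made in EX becomes visible to FD only in the following cycle.

```python
def schedule(program):
    """Trace FD and EX occupancy of a two-stage pipeline, one entry per cycle.

    program: list of (dst, src1, src2) register names; None means "no register".
    FD stalls while the instruction currently in EX will write one of its sources.
    """
    fd_trace, ex_trace = [], []
    in_ex = None      # index of the instruction occupying EX this cycle
    next_fd = 0       # index of the next instruction FD wants to dispatch
    while next_fd < len(program) or in_ex is not None:
        ex_trace.append(in_ex)
        dispatched = None
        if next_fd < len(program):
            fd_trace.append(next_fd)
            _, s1, s2 = program[next_fd]
            pending = program[in_ex][0] if in_ex is not None else None
            raw = pending is not None and pending in (s1, s2)
            if not raw:               # no RAW hazard: dispatch to EX
                dispatched = next_fd
                next_fd += 1
        else:
            fd_trace.append(None)     # nothing left to fetch
        in_ex = dispatched
    return fd_trace, ex_trace


# I1: Add(R1,R2,R3)   I2: Add(R4,R1,R2)
fd, ex = schedule([("R1", "R2", "R3"), ("R4", "R1", "R2")])
print(fd)  # [0, 1, 1, None]   I2 occupies FD for two cycles
print(ex)  # [None, 0, None, 1]
```

Indices are zero-based, so instruction 1 in the trace is I2. The repeated 1 in the FD trace corresponds to the repeated FD2 in the timing diagram.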


Data Hazard

  • A data hazard depends on a match between a source register of the fetched instruction and the destination register of an instruction already in the pipeline

  • The matching of source and destination must take into account whether the source and destination registers are valid

    function Bool isFound(Maybe#(RIndx) dst, Maybe#(RIndx) src);
      return isValid(dst) && isValid(src) &&
             (validValue(dst) == validValue(src));
    endfunction
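To see why the validity checks matter, here is a rough Python analogue of isFound, with `None` standing in for an Invalid `Maybe` value. The name `is_found` and the use of integers for register indices are assumptions of this sketch.

```python
from typing import Optional


def is_found(dst: Optional[int], src: Optional[int]) -> bool:
    """True only when both registers are valid and name the same register."""
    return dst is not None and src is not None and dst == src


print(is_found(3, 3))        # True: the source matches a pending destination
print(is_found(3, 4))        # False: different registers
print(is_found(None, None))  # False, although None == None is True in Python
```

Without the validity checks, an instruction that reads no register would appear to depend on every instruction that writes no register.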


Scoreboard: Keeping track of instructions in execution

  • Scoreboard: a data structure to keep track of the destination registers of the instructions beyond the fetch stage

    • method insert: inserts the destination (if any) of an instruction in the scoreboard when the instruction is decoded

    • method search1(src): searches the scoreboard for a data hazard

    • method search2(src): same as search1

    • method remove: deletes the oldest entry when an instruction commits
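The interface above can be modelled in a few lines of Python. This is a behavioral sketch only: the class below is hypothetical, ignores concurrency within a cycle, and uses `None` for "no destination". It stores an entry for every dispatched instruction, including those without a destination, so that one remove per completed instruction keeps the scoreboard in step with the pipeline.

```python
from collections import deque
from typing import Optional


class Scoreboard:
    """FIFO of destination registers of instructions past the fetch stage."""

    def __init__(self, size: int):
        self.size = size
        self.entries = deque()

    def insert(self, dst: Optional[int]) -> None:
        if len(self.entries) >= self.size:
            raise RuntimeError("scoreboard full")
        self.entries.append(dst)          # None is stored for "no destination"

    def search1(self, src: Optional[int]) -> bool:
        # Hazard only if src is valid and matches a valid pending destination
        return src is not None and any(d is not None and d == src
                                       for d in self.entries)

    search2 = search1                     # second read port, same behaviour

    def remove(self) -> None:
        self.entries.popleft()            # oldest entry leaves first


sb = Scoreboard(1)
sb.insert(1)              # an instruction writing R1 enters Execute
print(sb.search1(1))      # True: reading R1 now would be a RAW hazard
sb.remove()               # the instruction completes
print(sb.search1(1))      # False
```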


2-Stage-DH pipeline: Scoreboard and Stall logic

[Figure: the 2-Stage-DH pipeline with a scoreboard added. Blocks: PC, pred, Inst Memory, Decode, Register File, Execute, Data Memory, scoreboard. Fifos: d2e, redirect. Epoch registers: fEpoch, eEpoch.]


2-Stage-DH pipeline corrected

    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      Fifo#(Decode2Execute) d2e <- mkFifo;
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      Fifo#(Addr) execRedirect <- mkFifo;
      Scoreboard#(1) sb <- mkScoreboard;
      // contains only one slot because Execute
      // can contain at most one instruction

      rule doFetch …
      rule doExecute …


2-Stage-DH pipeline doFetch rule second attempt

    rule doFetch;
      let inst = iMem.req(pc);
      if(execRedirect.notEmpty) begin
        fEpoch <= !fEpoch; pc <= execRedirect.first;
        execRedirect.deq;
      end
      else begin
        let ppc = nextAddrPredictor(pc); pc <= ppc;
        let dInst = decode(inst);
        let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2);
        if(!stall) begin
          let rVal1 = rf.rd1(validRegValue(dInst.src1));
          let rVal2 = rf.rd2(validRegValue(dInst.src2));
          d2e.enq(Decode2Execute{pc: pc, ppc: ppc,
                                 dInst: dInst, epoch: fEpoch,
                                 rVal1: rVal1, rVal2: rVal2});
          sb.insert(dInst.rDst);
        end
      end
    endrule

What should happen to pc when Fetch stalls?

pc should change only when the instruction is enqueued in d2e


2-Stage-DH pipeline doFetch rule corrected

    rule doFetch;
      let inst = iMem.req(pc);
      if(execRedirect.notEmpty) begin
        fEpoch <= !fEpoch; pc <= execRedirect.first;
        execRedirect.deq;
      end
      else begin
        let ppc = nextAddrPredictor(pc);
        let dInst = decode(inst);
        let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2);
        if(!stall) begin
          let rVal1 = rf.rd1(validRegValue(dInst.src1));
          let rVal2 = rf.rd2(validRegValue(dInst.src2));
          d2e.enq(Decode2Execute{pc: pc, ppc: ppc,
                                 dInst: dInst, epoch: fEpoch,
                                 rVal1: rVal1, rVal2: rVal2});
          sb.insert(dInst.rDst);
          pc <= ppc;  // pc advances only when the instruction is enqueued
        end
      end
    endrule


2-Stage-DH pipeline doExecute rule corrected

    rule doExecute;
      let x = d2e.first;
      let dInst = x.dInst; let pc = x.pc;
      let ppc = x.ppc; let epoch = x.epoch;
      let rVal1 = x.rVal1; let rVal2 = x.rVal2;
      if(epoch == eEpoch) begin
        let eInst = exec(dInst, rVal1, rVal2, pc, ppc);
        if(eInst.iType == Ld) eInst.data <-
          dMem.req(MemReq{op:Ld, addr:eInst.addr, data:?});
        else if (eInst.iType == St) let d <-
          dMem.req(MemReq{op:St, addr:eInst.addr, data:eInst.data});
        if (isValid(eInst.dst))
          rf.wr(validRegValue(eInst.dst), eInst.data);
        if(eInst.mispredict) begin
          execRedirect.enq(eInst.addr); eEpoch <= !eEpoch;
        end
      end
      d2e.deq; sb.remove;
    endrule


Scoreboard method calls: concurrency and correctness issues

    rule doFetch; ...
      let dInst = decode(inst);
      let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2);
      if(!stall) begin
        ... sb.insert(dInst.rDst); pc <= ppc;
      end
    end
    endrule

  • The scoreboard must permit concurrent execution of search1, search2 and insert for this rule to execute. Should the result of search be affected by

    • a concurrent insert? No: search {<, CF} insert

    • a possibly concurrent remove? If search is affected by remove then the effect of the removed instruction must have taken place. Otherwise make (search CF remove)


Assumptions about method calls

[Figure: doFetch and doExecute connected by the d2e and redirect Fifos. Register File methods: rd1, rd2, wr. Scoreboard methods: search, insert, remove. Edges are labelled "<".]

  • Within a rule

    • d2e.first < d2e.deq

    • redirect.first < redirect.deq

    • sb.search < sb.insert

  • High-level correctness

    • What must be true if sb.search > sb.remove ?

    • The effect of the removed instruction must have taken place, i.e., any updates to the register file must be visible to the subsequent register reads


Concurrency analysis: normal Register File (rf.rd < rf.wr)

[Figure: doFetch and doExecute with the Register File (rd1, rd2, wr), the redirect and d2e Fifos, and the Scoreboard (search, insert, remove). The Scoreboard edges are labelled "<" and "{<, CF}".]

  • Assume both redirect and d2e are CF FIFOs, for concurrent execution

    doFetch < doExecute

    {sb.search, sb.insert} < sb.remove


Concurrency analysis: Bypass Register File (rf.wr < rf.rd)

[Figure: doFetch and doExecute with the Register File (rd1, rd2, wr), the redirect and d2e Fifos, and the Scoreboard (search, insert, remove). The Scoreboard edges are labelled "<" and "{<, CF}".]

  • Assume both redirect and d2e are CF FIFOs, for concurrent execution

    doExecute < doFetch

    sb.remove < {sb.search, sb.insert}


Performance analysis

[Figure: doFetch and doExecute with the Register File (rd1, rd2, wr), the redirect and d2e Fifos, and the Scoreboard (search, insert, remove). Annotations: rf.wr < rf.rd, doExecute < doFetch.]

  • {d2e.first, d2e.deq} {<, CF} d2e.enq

    • Will a pipelined d2e Fifo improve performance? No

  • redirect.enq {<, CF} {redirect.first, redirect.deq}

    • Will a bypassed redirect Fifo improve performance? Yes, by redirecting the PC a cycle earlier

  • sb.remove {<, CF} {sb.search, sb.insert}

    • Why is a CF Scoreboard bad for performance? It negates the advantage of the bypass register file by unnecessarily stalling on an instruction whose register updates are visible to fetch
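The scoreboard point can be illustrated with a rough cycle-count model in Python. It is a sketch under stated assumptions, not the course's code: `drain_cycles` and its flag `search_sees_remove` are hypothetical names. The flag set to True stands for a scoreboard with remove < search combined with a bypass register file; False stands for a scoreboard whose search does not observe the same-cycle remove.

```python
def drain_cycles(program, search_sees_remove):
    """Cycles needed to push program through a two-stage pipeline.

    program: list of (dst, src1, src2); None means "no register".
    """
    pending = None      # destination held in the one-entry scoreboard
    ex_busy = False     # is there an instruction in Execute this cycle?
    next_fd = 0
    cycles = 0
    while next_fd < len(program) or ex_busy:
        cycles += 1
        # Execute retires its instruction this cycle. Fetch sees the
        # scoreboard entry only if search is not ordered after remove.
        visible = None if search_sees_remove else pending
        pending_next, ex_busy_next = None, False
        if next_fd < len(program):
            dst, s1, s2 = program[next_fd]
            stall = visible is not None and visible in (s1, s2)
            if not stall:
                pending_next, ex_busy_next = dst, True
                next_fd += 1
        pending, ex_busy = pending_next, ex_busy_next
    return cycles


prog = [("R1", "R2", "R3"), ("R4", "R1", "R2")]   # I2 reads I1's result
print(drain_cycles(prog, search_sees_remove=False))  # 4: one stall cycle
print(drain_cycles(prog, search_sees_remove=True))   # 3: no stall
```

In this model the extra cycle in the first case is spent stalling on a value that a bypass register file would already have made visible to fetch.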


2-Stage-DH pipeline with proper specification of Fifos, rf, scoreboard

    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkBypassRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      Fifo#(Decode2Execute) d2e <- mkPipelineFifo;
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      Fifo#(Addr) execRedirect <- mkBypassFifo;
      Scoreboard#(1) sb <- mkPipelineScoreboard;
      // contains only one slot because Execute
      // can contain at most one instruction

      rule doFetch …
      rule doExecute …

Can a destination register name appear more than once in the scoreboard?


Need to check for WAW hazards

  • If multiple instructions in the scoreboard can update the register which the current instruction wants to read, then the current instruction has to read the update from the youngest of those instructions

  • This issue can be avoided if only one instruction in the pipeline is allowed to update a particular register

    • This means we have to stall if an instruction writes to the same destination as a previous instruction (WAW hazard)

    • This may negate some advantage of bypassing

    • A more advanced solution to avoid WAW hazards is register renaming
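The combined stall condition can be sketched in Python as follows. The helper name `must_stall` and the list-of-destinations encoding are assumptions of this sketch; `None` stands for "no register".

```python
def must_stall(scoreboard, src1, src2, dst):
    """Stall on a RAW hazard (sources) or a WAW hazard (destination).

    scoreboard: destinations of in-flight instructions, oldest first.
    """
    pending = {d for d in scoreboard if d is not None}

    def found(reg):
        return reg is not None and reg in pending

    raw = found(src1) or found(src2)
    waw = found(dst)
    return raw or waw


print(must_stall([1], src1=2, src2=3, dst=1))  # True: WAW on R1
print(must_stall([1], src1=1, src2=3, dst=4))  # True: RAW on R1
print(must_stall([1], src1=2, src2=3, dst=4))  # False: no conflict
```

Because an instruction is dispatched only when its destination is not already pending, a register name can appear at most once in the scoreboard under this policy.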


Fetch rule with bypassing

    rule doFetch;
      let inst = iMem.req(pc);
      if(execRedirect.notEmpty) begin
        fEpoch <= !fEpoch; pc <= execRedirect.first;
        execRedirect.deq;
      end
      else begin
        let ppc = nextAddrPredictor(pc);
        let dInst = decode(inst);
        let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2)
                    || sb.search3(dInst.dst);  // search3 checks for WAW
        if(!stall) begin
          let rVal1 = rf.rd1(validRegValue(dInst.src1));
          let rVal2 = rf.rd2(validRegValue(dInst.src2));
          d2e.enq(Decode2Execute{pc: pc, ppc: ppc,
                                 dInst: dInst, epoch: fEpoch,
                                 rVal1: rVal1, rVal2: rVal2});
          sb.insert(dInst.rDst); pc <= ppc;
        end
      end
    endrule


Multi-stage pipeline with Data Hazards


3-Stage-DH pipeline

[Figure: three-stage pipeline. Blocks: PC, pred, nextPC, Inst Memory, Decode, Register File, Execute, Data Memory, scoreboard. Fifos: d2e, e2c. Epoch registers: fEpoch, eEpoch.]

Exec2Commit{Maybe#(RIndx) dst, Data data};


3-Stage-DH pipeline

    module mkProc(Proc);
      Reg#(Addr) pc <- mkRegU;
      RFile rf <- mkBypassRFile;
      IMemory iMem <- mkIMemory;
      DMemory dMem <- mkDMemory;
      Fifo#(1, Decode2Execute) d2e <- mkPipelineFifo;
      Fifo#(1, Exec2Commit) e2c <- mkPipelineFifo;
      Scoreboard#(2) sb <- mkPipelineScoreboard;
      // contains two instructions
      Reg#(Bool) fEpoch <- mkReg(False);
      Reg#(Bool) eEpoch <- mkReg(False);
      Fifo#(Addr) execRedirect <- mkBypassFifo;


3-Stage-DH pipeline doFetch rule

    rule doFetch;
      let inst = iMem.req(pc);
      if(execRedirect.notEmpty) begin
        fEpoch <= !fEpoch; pc <= execRedirect.first;
        execRedirect.deq;
      end
      else begin
        let ppc = nextAddrPredictor(pc);
        let dInst = decode(inst);
        let stall = sb.search1(dInst.src1) || sb.search2(dInst.src2)
                    || sb.search3(dInst.dst);
        if(!stall) begin
          let rVal1 = rf.rd1(validRegValue(dInst.src1));
          let rVal2 = rf.rd2(validRegValue(dInst.src2));
          d2e.enq(Decode2Execute{pc: pc, ppc: ppc,
                                 dInst: dInst, epoch: fEpoch,
                                 rVal1: rVal1, rVal2: rVal2});
          sb.insert(dInst.rDst); pc <= ppc;
        end
      end
    endrule

Unchanged from 2-stage DH


3-Stage-DH pipeline doExecute rule

    rule doExecute;
      let x = d2e.first;
      let dInst = x.dInst; let pc = x.pc;
      let ppc = x.ppc; let epoch = x.epoch;
      let rVal1 = x.rVal1; let rVal2 = x.rVal2;
      if(epoch == eEpoch) begin
        let eInst = exec(dInst, rVal1, rVal2, pc, ppc);
        if(eInst.iType == Ld) eInst.data <-
          dMem.req(MemReq{op:Ld, addr:eInst.addr, data:?});
        else if (eInst.iType == St) let d <-
          dMem.req(MemReq{op:St, addr:eInst.addr, data:eInst.data});
        if(eInst.mispredict) begin
          execRedirect.enq(eInst.addr); eEpoch <= !eEpoch;
        end
        // the register write and sb.remove now happen in doCommit
        e2c.enq(Exec2Commit{dst:eInst.dst, data:eInst.data});
      end
      else e2c.enq(Exec2Commit{dst:Invalid, data:?});
      d2e.deq;
    endrule


3-Stage-DH pipeline doCommit rule

    rule doCommit;
      let dst = e2c.first.dst;
      let data = e2c.first.data;
      if(isValid(dst))
        rf.wr(validValue(dst), data);
      e2c.deq;
      sb.remove;
    endrule
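How the stall length grows with pipeline depth can be estimated with a small Python model. It is a sketch, not the course's code: `fetch_cycles` and its `depth` parameter are hypothetical, and it assumes a bypass register file, a scoreboard whose search sees the same-cycle remove, stalls on both RAW and WAW hazards, and no branch mispredictions.

```python
from collections import deque


def fetch_cycles(program, depth):
    """Cycle in which Fetch dispatches each instruction.

    program: list of (dst, src1, src2); None means "no register".
    depth: number of stages after Fetch (1 for 2-stage, 2 for 3-stage).
    """
    sb = deque()        # (dst, cycle in which the entry is removed)
    dispatched = []
    cycle = 0
    for dst, s1, s2 in program:
        while True:
            # remove happens before search within a cycle
            while sb and sb[0][1] <= cycle:
                sb.popleft()
            busy = {d for d, _ in sb if d is not None}
            if any(r is not None and r in busy for r in (s1, s2, dst)):
                cycle += 1              # stall: retry in the next cycle
            else:
                break
        dispatched.append(cycle)
        sb.append((dst, cycle + depth))  # removed when it leaves the last stage
        cycle += 1
    return dispatched


prog = [("R1", "R2", "R3"), ("R4", "R1", "R2")]
print(fetch_cycles(prog, depth=1))  # [0, 1]: no stall in the 2-stage pipeline
print(fetch_cycles(prog, depth=2))  # [0, 2]: one stall cycle in the 3-stage pipeline
```

With one dispatch per cycle the model never holds more than `depth` entries, which matches the Scoreboard#(2) sizing for the three-stage pipeline.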


6-Stage-DH pipeline

[Figure: six-stage pipeline with stages Fetch, Decode, RegRead, Execute, Mem, WB. Blocks: PC, pred, nextPC, Inst Memory, Decode, Register File, Execute, Data Memory, scoreboard. Fifos: d2e, e2c. Epoch registers: fEpoch, eEpoch.]


Take away

  • Systematic application of pipelining requires a deep understanding of hazards, especially in the presence of concurrent operations

  • Performance issues are subtle

    • For instance, the value of having a bypass network depends on how frequently it is exercised by programs


Module implementations

The scoreboard can be implemented using searchable Fifos in a straightforward manner


Normal Register File

    module mkRFile(RFile);
      Vector#(32, Reg#(Data)) rfile <- replicateM(mkReg(0));

      method Action wr(RIndx rindx, Data data);
        if(rindx != 0) rfile[rindx] <= data;
      endmethod
      method Data rd1(RIndx rindx) = rfile[rindx];
      method Data rd2(RIndx rindx) = rfile[rindx];
    endmodule

{rd1, rd2} < wr


Bypass Register File using EHR

    module mkBypassRFile(RFile);
      Vector#(32, Ehr#(2, Data)) rfile <- replicateM(mkEhr(0));

      method Action wr(RIndx rindx, Data data);
        if(rindx != 0) (rfile[rindx])[0] <= data;  // write on port 0
      endmethod
      // reads on port 1 see a same-cycle write made on port 0
      method Data rd1(RIndx rindx) = (rfile[rindx])[1];
      method Data rd2(RIndx rindx) = (rfile[rindx])[1];
    endmodule

wr < {rd1, rd2}
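The port ordering that makes this a bypass register file can be illustrated with a minimal Python model of an EHR. The class and its `end_cycle` method are assumptions made for this sketch: a read on port i observes same-cycle writes made on lower-numbered ports, and the write on the highest-numbered port determines the stored value at the end of the cycle.

```python
class Ehr:
    """Behavioral model of an n-port EHR for a single clock cycle at a time."""

    def __init__(self, n, init):
        self.n = n
        self.value = init
        self.writes = [None] * n      # per-port write issued this cycle

    def read(self, port):
        v = self.value
        for w in self.writes[:port]:  # only lower-numbered ports are visible
            if w is not None:
                v = w[0]
        return v

    def write(self, port, v):
        self.writes[port] = (v,)      # tuple wrapper distinguishes "wrote None"

    def end_cycle(self):
        for w in self.writes:         # the highest-numbered write wins
            if w is not None:
                self.value = w[0]
        self.writes = [None] * self.n


r = Ehr(2, 0)
r.write(0, 7)        # wr uses port 0
print(r.read(0))     # 0: port 0 does not see its own write
print(r.read(1))     # 7: rd1 and rd2 use port 1, so the write is bypassed
r.end_cycle()
print(r.read(0))     # 7: the value is stored for later cycles
```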


Bypass Register File with external bypassing

    module mkBypassRFile(BypassRFile);
      RFile rf <- mkRFile;
      Fifo#(1, Tuple2#(RIndx, Data)) bypass <- mkBypassSFifo;

      rule move;
        match {.idx, .data} = bypass.first;  // unpack the buffered write
        rf.wr(idx, data); bypass.deq;
      endrule

      method Action wr(RIndx rindx, Data data);
        if(rindx != 0) bypass.enq(tuple2(rindx, data));
      endmethod
      method Data rd1(RIndx rindx) =
        (!bypass.search1(rindx)) ? rf.rd1(rindx)
                                 : bypass.read1(rindx);
      method Data rd2(RIndx rindx) =
        (!bypass.search2(rindx)) ? rf.rd2(rindx)
                                 : bypass.read2(rindx);
    endmodule

{rf.rd1, rf.rd2} < rf.wr

wr < {rd1, rd2}
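The external bypassing idea can be sketched in Python as a register array plus a one-entry buffer that reads consult first. The class below is a hypothetical behavioral model: its `bypass` flag switches between the normal ordering ({rd1, rd2} < wr) and the bypassed ordering (wr < {rd1, rd2}), and `end_cycle` plays the role of the move rule.

```python
class RegFile:
    """32-entry register file; register 0 always reads as zero."""

    def __init__(self, bypass):
        self.bypass = bypass
        self.regs = [0] * 32
        self.pending = None            # (index, data) written this cycle

    def wr(self, idx, data):
        if idx != 0:                   # writes to register 0 are dropped
            self.pending = (idx, data)

    def rd(self, idx):
        if self.bypass and self.pending is not None and self.pending[0] == idx:
            return self.pending[1]     # forward the same-cycle write
        return self.regs[idx]

    def end_cycle(self):
        if self.pending is not None:   # move the buffered write into the array
            idx, data = self.pending
            self.regs[idx] = data
            self.pending = None


normal, bypassed = RegFile(bypass=False), RegFile(bypass=True)
for rf in (normal, bypassed):
    rf.wr(1, 42)
print(normal.rd(1))    # 0: the write is not visible until the next cycle
print(bypassed.rd(1))  # 42: the write is visible in the same cycle
```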


Scoreboard implementation using searchable Fifos

    function Bool isFound(Maybe#(RIndx) dst, Maybe#(RIndx) src);
      return isValid(dst) && isValid(src) &&
             (validValue(dst) == validValue(src));
    endfunction

    module mkCFScoreboard(Scoreboard#(size));
      SFifo#(size, Maybe#(RIndx), Maybe#(RIndx))
        f <- mkCFSFifo(isFound);

      method insert = f.enq;
      method remove = f.deq;
      method search1 = f.search1;
      method search2 = f.search2;
    endmodule