1 / 32

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology. Topics. Simultaneous enq & deq in a FIFO The RWire solution Dead cycle elimination in the IP circular pipeline code

sirius
Download Presentation

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology http://csg.csail.mit.edu/6.375/

  2. Topics • Simultaneous enq & deq in a FIFO • The RWire solution • Dead cycle elimination in the IP circular pipeline code • Two-stage processor pipeline • Value forwarding to reduce bubbles http://csg.csail.mit.edu/6.375/

  3. Implicit guards (conditions) • Rule rule <name> (<guard>); <action>; endrule where <action> ::= r <= <exp> | m.g(<exp>) |if (<exp>) <actions> endif m.gB(<exp>) when m.gG make implicit guards explicit http://csg.csail.mit.edu/6.375/

  4. Guards vs If’s • A guard on one action of a parallel group of actions affects every action within the group (a1 when p1); (a2 when p2) ==> (a1; a2) when (p1 && p2) • A condition of a Conditional action only affects the actions within the scope of the conditional action (if (p1) a1); a2 p1 has no effect on a2 ... • Mixing ifs and whens (if (p) (a1 when q)) ; a2  ((if (p) a1); a2) when (p&q | !p) http://csg.csail.mit.edu/6.375/

  5. Example: making guards explicit rule recirculate (True); if (p) fifo.enq(8); r <= 7; endrule rule recirculate ((p && fifo.engG) || !p); if (p) fifo.enqB(8); r <= 7; endrule http://csg.csail.mit.edu/6.375/

  6. A problem ... (from the last lecture) rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq(); endrule The fifo needs to be able to do enq and deq simultaneously for this rule to make sense http://csg.csail.mit.edu/6.375/

  7. n enab enq rdy not full enab rdy deq not empty FIFO module One Element FIFO enq and deqcannot even be enabled together much less fire concurrently! module mkFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); method Action enq(t x) if (!full); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethod endmodule The functionality we want is as if deq “happens” before enq; if deq does not happen then enq behaves normally http://csg.csail.mit.edu/6.375/

  8. RWire to rescue interface RWire#(type t); method Action wset(t x); method Maybe#(t) wget(); endinterface Like a register in that you can read and write it but unlike a register - read happens after write - data disappears in the next cycle http://csg.csail.mit.edu/6.375/

  9. enab enq rdy not full enab rdy deq not empty FIFO module One Element “Loopy” FIFO module mkLFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); method Action enq(t x) if (!full || isValid (deqEN.wget())); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethod endmodule !full or http://csg.csail.mit.edu/6.375/

  10. Problem solved! LFIFO fifo <- mkLFIFO; // use a loopy fifo rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq(); endrule What if fifo is empty? http://csg.csail.mit.edu/6.375/

  11. cbuf inQ enter? done? RAM fifo The Dead Cycle Problem rule enter (True); Token tok <- cbuf.getToken(); IP ip = inQ.first(); ram.req(ext(ip[31:16])); fifo.enq(tuple2(ip[15:0], tok)); inQ.deq(); endrule rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq(); endrule Can a new request enter the system simultaneously with an old one leaving? http://csg.csail.mit.edu/6.375/

  12. Scheduling conflicting rules • When two rules conflict on a shared resource, they cannot both execute in the same clock • The compiler produces logic that ensures that, when both rules are applicable, only one will fire • Which one? source annotations (* descending_urgency = “recirculateh, enter” *) http://csg.csail.mit.edu/6.375/

  13. inQ enter? done? RAM fifo A slightly simpler example rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule rule recirculate (True); TableEntry p = ram.peek(); ram.deq(); IP rip = fifo.first(); if (isLeaf(p)) outQ.enq(p); else begin fifo.enq(rip << 8); ram.req(p + rip[15:8]); end fifo.deq(); endrule In general these two rules conflict but when isLeaf(p) is true there is no apparent conflict! http://csg.csail.mit.edu/6.375/

  14. Rule Spliting rule fooT (p); r1 <= 5; endrule rule fooF (!p); r2 <= 7; endrule rule foo (True); if (p) r1 <= 5; else r2 <= 7; endrule  rule fooT and fooF can be scheduled independently with some other rule http://csg.csail.mit.edu/6.375/

  15. Spliting the recirculate rule rule recirculate (!isLeaf(ram.peek())); IP rip = fifo.first(); fifo.enq(rip << 8); ram.req(ram.peek() + rip[15:8]); fifo.deq(); ram.deq(); endrule rule exit (isLeaf(ram.peek())); outQ.enq(ram.peek()); fifo.deq(); ram.deq(); endrule rule enter (True); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(ip[15:0]); inQ.deq(); endrule Now rules enter and exit can be scheduled simultaneously http://csg.csail.mit.edu/6.375/

  16. makes life difficult Sometimes rule splitting is not possible rule recirculate (True); TableEntry p <- ram.resp(); match {.rip, .tok} = fifo.first(); if (isLeaf(p)) cbuf.put(tok, p); else begin fifo.enq(tuple2(rip << 8, tok)); ram.req(p+signExtend(rip[15:8])); end fifo.deq(); endrule You will have to resort to interface changes and/or the use of RWires http://csg.csail.mit.edu/6.375/

  17. cbuf inQ enter? done? RAM fifo Packaging a module:Turning a rule into a method rule enter (True); Token t <- cbuf.getToken(); IP ip = inQ.first(); ram.req(ip[31:16]); fifo.enq(tuple2(ip[15:0], t)); inQ.deq(); endrule method Action enter (IP ip); Token t <- cbuf.getToken(); ram.req(ip[31:16]); fifo.enq(tuple2(ip[15:0], t)); endmethod http://csg.csail.mit.edu/6.375/

  18. 5-minute break to stretch you legs Processor with a two-stage pipeline http://csg.csail.mit.edu/6.375/

  19. execute decode write- back memory Processor Pipelines and FIFOs rf pc fetch dMem iMem CPU http://csg.csail.mit.edu/6.375/

  20. m n SFIFO (glue between stages) interface SFIFO#(type t, type tr); method Action enq(t); // enqueue an item method Action deq(); // remove oldest entry method t first(); // inspect oldest item method Action clear(); // make FIFO empty method Bool find(tr); // search FIFO endinterface enab enq rdy not full n = # of bits needed to represent the values of type “t“ m = # of bits needed to represent the values of type “tr" enab SFIFO module rdy deq not empty n first rdy not empty enab more on searchable FIFOs later clear bool find http://csg.csail.mit.edu/6.375/

  21. rf pc CPU fetch & decode execute bu Two-Stage Pipeline module mkCPU#(Mem iMem, Mem dMem)(Empty); Reg#(Iaddress) pc <- mkReg(0); RegFile#(RName, Bit#(32)) rf <- mkRegFileFull(); SFIFO#(InstTemplate, RName) bu <- mkSFifo(findf); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; InstTemplate it = bu.first(); rule fetch_decode ... endmodule http://csg.csail.mit.edu/6.375/

  22. Instructions & Templates typedef union tagged { struct {RName dst; RName src1; RName src2} Add; struct {RName cond; RName addr} Bz; struct {RName dst; RName addr} Load; struct {RName value; RName addr} Store; } Instr deriving(Bits, Eq); typedef union tagged { struct {RName dst; Value op1; Value op2} EAdd; struct {Value cond; Iaddress tAddr} EBz; struct {RName dst; Daddress addr} ELoad; struct {Value data; Daddress addr} EStore; } InstTemplate deriving(Eq, Bits); typedef Bit#(32) Iaddress; typedef Bit#(32) Daddress; typedef Bit#(32) Value; http://csg.csail.mit.edu/6.375/

  23. rf pc CPU fetch & decode execute bu Rules for Add rule decodeAdd(instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); pc <= predIa; endrule implicit check: implicit check: bu notfull rule executeAdd(it matches EAdd{dst:.rd,op1:.va,op2:.vb}) rf.upd(rd, va + vb); bu.deq(); endrule bu notempty http://csg.csail.mit.edu/6.375/

  24. rf pc CPU fetch & decode execute bu rule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb}) bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]}); pc <= predIa; endrule Fetch & Decode Rule: Reexamined Wrong! Because instructions in bu may be modifying ra or rb stall ! http://csg.csail.mit.edu/6.375/

  25. rf pc CPU fetch & decode execute bu Fetch & Decode Rule: corrected rule decodeAdd (instr matches Add{dst:.rd,src1:.ra,src2:.rb} bu.enq (EAdd{dst:rd, op1:rf[ra], op2:rf[rb]}); pc <= predIa; endrule &&& !bu.find(ra) &&& !bu.find(rb)) http://csg.csail.mit.edu/6.375/

  26. rf pc CPU fetch & decode execute bu Rules for Branch rule-atomicity ensures that pc update, and discard of pre- fetched instrs in bu, are done consistently rule decodeBz(instr matches Bz{cond:.rc,addr:.addr}) &&& !bu.find(rc) &&& !bu.find(addr)); bu.enq (EBz{cond:rf[rc],addr:rf[addr]}); pc <= predIa; endrule rule bzTaken(it matches EBz{cond:.vc,addr:.va}) &&& (vc==0)); pc <= va; bu.clear(); endrule rule bzNotTaken (it matches EBz{cond:.vc,addr:.va}) &&& (vc != 0)); bu.deq; endrule http://csg.csail.mit.edu/6.375/

  27. The Stall Signal Bool stall = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (bu.find(ra) || bu.find(rb));tagged Bz {cond:.rc,addr:.addr}: return (bu.find(rc) || bu.find(addr));tagged Load {dst:.rd,addr:.addr}: return (bu.find(addr));tagged Store {value:.v,addr:.addr}: return (bu.find(v)) || bu.find(addr)); endcase; Need to extend the fifo interface with the “find” method where “find” searches the fifo using the findf function http://csg.csail.mit.edu/6.375/

  28. Parameterization: The Stall Function function Bool stallfunc (Instr instr, SFIFO#(InstTemplate, RName) bu); case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (bu.find(ra) || bu.find(rb));tagged Bz {cond:.rc,addr:.addr}: return (bu.find(rc) || bu.find(addr));tagged Load {dst:.rd,addr:.addr}: return (bu.find(addr));tagged Store {value:.v,addr:.addr}: return (bu.find(v)) || bu.find(addr)); endcase endfunction We need to include the following call in the mkCPU module Bool stall = stallfunc(instr, bu); no extra gates! http://csg.csail.mit.edu/6.375/

  29. The findf function function Bool findf (RName r, InstrTemplate it); case (it) matches tagged EAdd{dst:.rd,op1:.ra,op2:.rb}: return (r == rd); tagged EBz {cond:.c,addr:.a}: return (False);tagged ELoad{dst:.rd,addr:.a}: return (r == rd); tagged EStore{value:.v,addr:.a}: return (False); endcase endfunction SFIFO#(InstrTemplate, RName) bu <- mkSFifo(findf); mkSFifo can be parameterized by the search function! no extra gates! http://csg.csail.mit.edu/6.375/

  30. Fetch & Decode Rule rule fetch_and_decode(!stall);case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: bu.enq(EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}); tagged Bz {cond:.rc,addr:.addr}: bu.enq(EBz{cond:rf[rc],addr:rf[addr]}); tagged Load {dst:.rd,addr:.addr}: bu.enq(ELoad{dst:rd,addr:rf[addr]});tagged Store{value:.v,addr:.addr}: bu.enq(EStore{value:rf[v],addr:rf[addr]}); endcase pc<= predIa; endrule http://csg.csail.mit.edu/6.375/

  31. Fetch & Decode Ruleanother style InstrTemplate newIt = case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]}; tagged Bz {cond:.rc,addr:.addr}: return EBz{cond:rf[rc],addr:rf[addr]}; tagged Load {dst:.rd,addr:.addr}: return ELoad{dst:rd,addr:rf[addr]};tagged Store{value:.v,addr:.addr}: return EStore{value:rf[v],addr:rf[addr]}; endcase; rule fetch_and_decode (!stall); bu.enq(newIt); pc <= predIa; endrule Conceptually cleaner; hides unnecessary details http://csg.csail.mit.edu/6.375/

  32. Execute Rule rule execute (True); case (it) matchestagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin pc <= av; bu.clear(); end else bu.deq();tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); end endcase endrule Next time simultaneous execution of fetch and execute rules http://csg.csail.mit.edu/6.375/

More Related