Implementing for Correct Concurrency
Nirav Dave
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
http://csg.csail.mit.edu/6.375

Dealing with Conflicts

• When do conflicts arise?
• How do we analyze them?
• How do we fix them?
• How do we make sure we’re okay?

SFIFO

    interface SFIFO#(type t, type tr, type v);
        method Action    enq(t x);    // enqueue an item
        method Action    deq();       // remove oldest entry
        method t         first();     // inspect oldest item
        method Action    clear();     // make FIFO empty
        method Maybe#(v) find(tr r);  // search FIFO
    endinterface

n = # of bits needed to represent the values of type "t"
m = # of bits needed to represent the values of type "tr"
v = # of bits needed to represent the values of type "v"

[Figure: SFIFO module with ports enq (enab; rdy = not full), deq (enab; rdy = not empty), first (rdy = not empty), clear (enab), and find (bool); bus widths labeled n, m, V.]
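The one-element implementations later in the deck take a search function of the shape findf(tr r, t x). As a minimal sketch of what such a function might look like for bypassing, the block below matches a source register against the destination of a queued entry. The names RName, Value, WBEntry and findDst are hypothetical and do not appear in the slides.

```bsv
// Sketch only: RName, Value, WBEntry and findDst are assumed names.
typedef Bit#(5)  RName;   // assumed register-name type (plays the role of "tr")
typedef Bit#(32) Value;   // assumed data type (plays the role of "v")

// Assumed shape of an entry held in a bypass FIFO (plays the role of "t").
typedef struct {
   RName dst;   // destination register of the in-flight instruction
   Value val;   // value that will eventually be written to dst
} WBEntry deriving (Bits, Eq);

// Search function: returns the queued value when the entry writes
// register r, and Invalid otherwise.
function Maybe#(Value) findDst(RName r, WBEntry e);
   if (e.dst == r)
      return tagged Valid (e.val);
   else
      return tagged Invalid;
endfunction
```

Under these assumptions, a queue would be declared as SFIFO#(WBEntry, RName, Value) and findDst passed as the module parameter.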

Processor Example

[Figure: CPU pipeline with pc, fetch, decode, execute, memory, write-back stages, plus rf, iMem and dMem.]

5-stage processor.
1-element FIFOs in between stages.
Let’s add bypassing.

Decode Rule

Decode is also correct anytime it’s allowed to execute.

    rule decode (!newStallFunc(instr, d2eQ, e2mQ, m2wQ));
        let fetInst = f2dQ.first();
        f2dQ.deq();
        match {.ra, .rb} = getRARB(fetInst);
        // Search through each place in the design that may hold a newer value.
        // fromMaybe takes the default first, then the Maybe value.
        let va0 = rf[ra];
        let va1 = fromMaybe(va0, m2wQ.find(ra));
        let va2 = fromMaybe(va1, e2mQ.find(ra));
        let vb0 = rf[rb];
        let vb1 = fromMaybe(vb0, m2wQ.find(rb));
        let vb2 = fromMaybe(vb1, e2mQ.find(rb));
        let newInst = case (fetInst) match
                          Add: return (DAdd .va2 .vb2);
                          …
                      endcase;
        d2eQ.enq(newInst);
    endrule

When do we want it to execute?

Some insight into concurrent rule firing

[Figure: rules Ri, Rj, Rk drawn as a sequence of rule steps (Rules) and grouped into clocks (HW).]

• There are more intermediate states in the rule semantics (a state after each rule step)
• In the HW, states change only at clock edges

Parallel execution reorders reads and writes

[Figure: reads and writes interleaved per rule step (Rules) versus all reads followed by all writes within each clock (HW).]

• In the rule semantics, each rule sees (reads) the effects (writes) of previous rules
• In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks

Correctness

[Figure: rules Ri, Rj, Rk drawn as a sequence of rule steps (Rules) and grouped into clocks (HW).]

• Rules are allowed to fire in parallel only if the net state change is equivalent to sequential rule execution
• Consequence: the HW can never reach a state unexpected in the rule semantics
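To make the correctness condition concrete, here is a minimal sketch with two pairs of rules. Module and rule names are hypothetical. In the first module, firing both rules in one clock matches the sequential order copy < incr. In the second, no sequential order produces a swap, so the compiler is expected to treat the rules as conflicting and not fire them in the same clock.

```bsv
// Sketch only: module and rule names are assumed for illustration.

// Case 1: both rules can fire in the same clock.
module mkParallelOK(Empty);
   Reg#(Bit#(8)) x <- mkReg(0);
   Reg#(Bit#(8)) y <- mkReg(0);

   // Reads x, writes x.
   rule incr;
      x <= x + 1;
   endrule

   // Reads x, writes y. In hardware this sees the value of x from the
   // previous clock, which is what the sequential order copy < incr gives.
   rule copy;
      y <= x;
   endrule
endmodule

// Case 2: each rule reads the register the other one writes.
module mkSwapConflict(Empty);
   Reg#(Bit#(8)) x <- mkReg(1);
   Reg#(Bit#(8)) y <- mkReg(2);

   // xFromY < yFromX would make yFromX read the new x;
   // yFromX < xFromY would make xFromY read the new y.
   // Neither order equals "both read old values", so parallel
   // firing would not be equivalent to any sequential execution.
   rule xFromY;
      x <= y;
   endrule

   rule yFromX;
      y <= x;
   endrule
endmodule
```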

Upshot

• Given the concurrency of methods/rules in a system, we can determine viable schedules
• Some variation due to applicability
• BUT we know what schedule we want (mostly)
• We should be able to back-propagate results to submodules

Determining Concurrency Properties

Processor: Concurrencies

[Figure: CPU pipeline with pc, fetch, decode, execute, memory, write-back stages, plus rf, iMem and dMem.]

In-order: F < D < E < M < W
Pipelined: W < M < E < D < F

Concurrency requirements for Full Pipelining – Reg File

[Figure: CPU pipeline with pc, fetch, decode, execute, memory, write-back stages, plus rf, iMem and dMem.]

• In-Order RF:
    • (D calls sub) < (W calls upd)
• Pipelined RF:
    • (W calls upd) < (D calls sub)

Concurrency requirements for Full Pipelining – FIFOs

[Figure: CPU pipeline with pc, fetch, decode, execute, memory, write-back stages, plus rf, iMem and dMem.]

In-Order FIFOs:
1. m2wQ, e2mQ: find < enq < first < deq
2. d2eQ: find < enq < first < deq, clear

Pipeline FIFOs:
3. m2wQ, e2mQ: first < deq < enq < find
4. d2eQ: first < deq < find < enq

Constructing Appropriately Concurrent Submodules

From Analysis to Design

• We need to create modules which behave as needed
• Construct modules using “unsafe” primitives to have “safe” behaviors
• Three major concepts:
    • Use primitives which remove “false” concurrency orderings (e.g. ConfigRegs vs. Regs)
    • Add RWires for forwarding values intra-cycle
    • Reason carefully to ensure that execution appears “atomic”

ConfigReg and RWire

• mkReg requires that read < write
• mkConfigReg is a Reg without this restriction
    • Allows us to read stale values (dangerous)
• RWire is a “wire”
    • wset :: a -> Action writes a value
    • wget :: Maybe#(a) returns the written value if a write happened in this cycle
    • wset happens before wget each cycle
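The two primitives are usually used together. The following is a minimal sketch (module and rule names are hypothetical) in which a producer writes a register and an RWire, and a consumer scheduled after it reads both. With mkReg the consumer's read would have to precede the producer's write, which contradicts wset < wget. With mkConfigReg that ordering is dropped, so the consumer sees the stale register value and must take the fresh one from the wire.

```bsv
import ConfigReg::*;

// Sketch only: mkForwardDemo, producer and consumer are assumed names.
module mkForwardDemo(Empty);
   Reg#(Bit#(8))   r <- mkConfigReg(0);
   RWire#(Bit#(8)) w <- mkRWire();

   // Writes the register and forwards the same value on the wire.
   rule producer;
      r <= r + 1;
      w.wset(r + 1);
   endrule

   // Scheduled after producer because wset happens before wget.
   // Reading r here returns last clock's value (stale with respect to
   // producer); the wire carries the value written in this clock.
   rule consumer;
      let fresh = fromMaybe(r, w.wget());
      $display("stale r = %0d, forwarded = %0d", r, fresh);
      if (r == 3) $finish;
   endrule
endmodule
```

The stale read is exactly the hazard the slide calls dangerous: the module only appears atomic because the consumer prefers the forwarded value.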

Let’s implement some modules

Processor Redux

[Figure: CPU pipeline with pc, fetch, decode, execute, memory, write-back stages, plus rf, iMem and dMem.]

In-order: F < D < E < M < W
Pipelined: W < M < E < D < F

Concurrency: RegFile

• The standard library RegFile is implemented with concurrency (sub < upd)
• This handles the in-order case
• We need to build a RegFile for the pipelined case

BypassRegFile

    module mkBypassRegFile#(a l, a h)(RegFile#(a,d))
        provisos (Bits#(a,asz), Bits#(d,dsz), Eq#(a));

        // l and h are the low and high index bounds of the register file
        RegFile#(a,d)        rfInt    <- mkRegFileWCF(l, h);
        RWire#(Tuple2#(a,d)) curWrite <- mkRWire();

        method Action upd(a x, d v);
            rfInt.upd(x, v);
            curWrite.wset(tuple2(x, v));
        endmethod

        // upd < sub: a write made in this cycle is forwarded to the read
        method d sub(a x);
            case (curWrite.wget()) matches
                tagged Valid {.wa, .wd} &&& (wa == x): return wd;
                default: return rfInt.sub(x);
            endcase
        endmethod
    endmodule


One Element SFIFO (Naïve)

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v));
        Reg#(t)    data <- mkRegU();
        Reg#(Bool) full <- mkReg(False);

        method Action enq(t x) if (!full);
            full <= True;
            data <= x;
        endmethod

        method Action deq() if (full);
            full <= False;
        endmethod

        method t first() if (full);
            return data;
        endmethod

        method Maybe#(v) find(tr r);
            return full ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule

Concurrency: find < first < (enq C deq)

One Element SFIFO (In-Order d2eQ #1)
find < first < enq < deq

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v));
        Reg#(t)    data <- mkConfigRegU();
        Reg#(Bool) full <- mkConfigReg(False);
        RWire#(t)  enqv <- mkRWire();

        method Action enq(t x) if (!full);
            full <= True;
            data <= x;
            enqv.wset(x);
        endmethod

        // deq may follow an enq made in the same cycle
        method Action deq() if (full || isValid(enqv.wget()));
            full <= False;
        endmethod

        method t first() if (full);
            return data;
        endmethod

        method Maybe#(v) find(tr r);
            return full ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule

One Element SFIFO (In-Order e2mQ, m2wQ #2)
find < enq < first < deq

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v));
        // ConfigReg: first reads data after enq writes it in the same cycle
        Reg#(t)    data <- mkConfigRegU();
        Reg#(Bool) full <- mkConfigReg(False);
        RWire#(t)  enqv <- mkRWire();

        method Action enq(t x) if (!full);
            full <= True;
            data <= x;
            enqv.wset(x);
        endmethod

        method Action deq() if (full || isValid(enqv.wget()));
            full <= False;
        endmethod

        // first sees a value enqueued in the same cycle
        method t first() if (full || isValid(enqv.wget()));
            return fromMaybe(data, enqv.wget());
        endmethod

        method Maybe#(v) find(tr r);
            return full ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule

One Element Searchable SFIFO (Pipelined #3)
first < deq < enq < find

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v));
        Reg#(t)      data <- mkConfigRegU();
        Reg#(Bool)   full <- mkConfigReg(False);
        RWire#(void) deqw <- mkRWire();
        RWire#(t)    enqw <- mkRWire();   // carries the enqueued value to find

        // enq may follow a deq made in the same cycle
        method Action enq(t x) if (!full || isValid(deqw.wget()));
            full <= True;
            data <= x;
            enqw.wset(x);
        endmethod

        method Action deq() if (full);
            full <= False;
            deqw.wset(?);
        endmethod

        method t first() if (full);
            return data;
        endmethod

        // find sees the old entry unless it was dequeued this cycle,
        // and otherwise the entry enqueued this cycle, if any
        method Maybe#(v) find(tr r);
            if (full && !isValid(deqw.wget()))
                return findf(r, data);
            else if (enqw.wget() matches tagged Valid .x)
                return findf(r, x);
            else
                return tagged Invalid;
        endmethod
    endmodule

One Element Searchable SFIFO (Pipelined #4)
first < deq < find < enq

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v));
        Reg#(t)      data <- mkConfigRegU();
        Reg#(Bool)   full <- mkConfigReg(False);
        RWire#(void) deqw <- mkRWire();

        // enq may follow a deq made in the same cycle
        method Action enq(t x) if (!full || isValid(deqw.wget()));
            full <= True;
            data <= x;
        endmethod

        method Action deq() if (full);
            full <= False;
            deqw.wset(?);
        endmethod

        method t first() if (full);
            return data;
        endmethod

        // find ignores an entry that was dequeued this cycle
        method Maybe#(v) find(tr r);
            return (full && !isValid(deqw.wget())) ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule

One Element Searchable SFIFO (Pipelined #4)
first < deq < find < enq

    module mkSFIFO1#(function Maybe#(v) findf(tr r, t x)) (SFIFO#(t,tr,v));
        Reg#(t)      data  <- mkRegU();
        Reg#(Bool)   full  <- mkConfigReg(False);
        RWire#(void) deqEN <- mkRWire();

        // True when deq is called in this cycle
        Bool deqp = isValid(deqEN.wget());

        method Action enq(t x) if (!full || deqp);
            full <= True;
            data <= x;
        endmethod

        method Action deq() if (full);
            full <= False;
            deqEN.wset(?);
        endmethod

        method t first() if (full);
            return data;
        endmethod

        method Maybe#(v) find(tr r);
            return (full && !deqp) ? findf(r, data) : tagged Invalid;
        endmethod
    endmodule

Up-Down Counter

Counter Module Interface

    interface Counter;
        method Action up();
        method Action down();
        method int    _read();   // int, to match the implementations that follow
    endinterface

Concurrency: up and down should be independent

Naïve Counter Example

    module mkCounter(Counter);
        Reg#(int) r <- mkReg(0);

        method int _read();
            return r;
        endmethod

        // up and down both write r, so they cannot both take effect in one cycle
        method Action up();
            r <= r + 1;
        endmethod

        method Action down();
            r <= r - 1;
        endmethod
    endmodule

Counter Example

    module mkCounter(Counter);
        Reg#(int)    r     <- mkConfigReg(0);
        RWire#(void) upW   <- mkRWire();
        RWire#(void) downW <- mkRWire();

        // A single rule combines both requests, so up and down are independent
        rule updateR (True);
            r <= r + (isValid(upW.wget())   ? 1 : 0)
                   - (isValid(downW.wget()) ? 1 : 0);
        endrule

        method int _read();
            return r;
        endmethod

        method Action up();
            upW.wset(?);
        endmethod

        method Action down();
            downW.wset(?);
        endmethod
    endmodule

What if we want to call up and then _read?
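One possible way to approach that question, offered as a sketch rather than as the answer intended by the slides, is to bypass the pending up and down requests into _read. The module name mkBypassCounter and the helper nextVal are hypothetical. Because _read now depends on wget, it is ordered after both up and down in the same cycle.

```bsv
import ConfigReg::*;

// Repeated here so the sketch is self-contained.
interface Counter;
   method Action up();
   method Action down();
   method int    _read();
endinterface

// Sketch only: mkBypassCounter and nextVal are assumed names.
module mkBypassCounter(Counter);
   Reg#(int)    r     <- mkConfigReg(0);
   RWire#(void) upW   <- mkRWire();
   RWire#(void) downW <- mkRWire();

   // Value of the counter once this cycle's up/down calls are applied.
   int nextVal = r + (isValid(upW.wget())   ? 1 : 0)
                   - (isValid(downW.wget()) ? 1 : 0);

   rule updateR;
      r <= nextVal;
   endrule

   method Action up();
      upW.wset(?);
   endmethod

   method Action down();
      downW.wset(?);
   endmethod

   // Reads the bypassed value, giving up < _read and down < _read.
   method int _read();
      return nextVal;
   endmethod
endmodule
```

The trade-off is that _read can no longer be scheduled before up or down, so a caller that needs the value from before the update would need the original version.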

Completion Buffer

Completion buffer: Interface

[Figure: cbuf with ports getToken, put (result & token), and getResult.]

    interface CBuffer#(type t);
        method ActionValue#(Token) getToken();
        method Action              put(Token tok, t d);
        method ActionValue#(t)     getResult();
    endinterface

    typedef Bit#(TLog#(n)) TokenN#(numeric type n);
    typedef TokenN#(16) Token;

IP-Lookup module with the completion buffer

[Figure: enter feeds getToken on cbuf and the RAM/fifo loop; a "done?" test either recirculates (no) or exits to cbuf (yes); getResult reads from cbuf.]

    module mkIPLookup(IPLookup);
        rule recirculate… ;
        rule exit …;

        method Action enter (IP ip);
            Token tok <- cbuf.getToken();
            ram.req(ip[31:16]);
            fifo.enq(tuple2(tok, ip[15:0]));
        endmethod

        method ActionValue#(Msg) getResult();
            let result <- cbuf.getResult();
            return result;
        endmethod
    endmodule

For enter and getResult to execute simultaneously, cbuf.getToken and cbuf.getResult must execute simultaneously.

IP Lookup rules with completion buffer

    rule recirculate (!isLeaf(ram.peek()));
        match {.tok, .rip} = fifo.first();
        fifo.enq(tuple2(tok, (rip << 8)));
        ram.req(ram.peek() + rip[15:8]);
        fifo.deq();
        ram.deq();
    endrule

    rule exit (isLeaf(ram.peek()));
        // put takes the token as well as the result
        let tok = tpl_1(fifo.first());
        cbuf.put(tok, ram.peek());
        fifo.deq();
        ram.deq();
    endrule

For rule exit and method enter to execute simultaneously, cbuf.put and cbuf.getToken must execute simultaneously.

Hence, for no dead cycles, cbuf.getToken, cbuf.put and cbuf.getResult must be able to execute simultaneously.

Naïve Completion Buffer

    module mkCBuffer(CBuffer#(t));
        // 16 entries, matching typedef TokenN#(16) Token
        Vector#(16, Reg#(Bool)) valids <- replicateM(mkReg(False));
        RegFile#(Token, t)      data   <- mkRegFileFull();
        Reg#(Token) rdP <- mkReg(0);
        Reg#(Token) wrP <- mkReg(0);
        Reg#(Token) cnt <- mkReg(0);

        method ActionValue#(Token) getToken() if (cnt < Max);
            cnt <= cnt + 1;
            rdP <= nextPointer(rdP);
            valids[rdP] <= False;
            return rdP;
        endmethod

        method Action put(Token tok, t d);
            valids[tok] <= True;
            data.upd(tok, d);
        endmethod

        method ActionValue#(t) getResult() if (valids[wrP]);
            cnt <= cnt - 1;
            wrP <= nextPointer(wrP);
            return data.sub(wrP);
        endmethod
    endmodule

Completion buffer: Interface Requirements

[Figure: cbuf with ports getToken, put (result & token), and getResult.]

Rules and methods concurrency requirement to avoid dead cycles:
    exit < getResult < enter

Hence, the cbuf methods’ concurrency:
    cbuf.getResult < cbuf.put < cbuf.getToken

Completion Buffer
getResult < put < getToken

    module mkCBuffer(CBuffer#(t));
        // 16 entries, matching typedef TokenN#(16) Token
        Vector#(16, Reg#(Bool)) valids <- replicateM(mkReg(False));
        RegFile#(Token, t)      data   <- mkRegFileFull();
        Reg#(Token) rdP <- mkConfigReg(0);
        Reg#(Token) wrP <- mkConfigReg(0);
        Counter     cnt <- mkCounter();

        method ActionValue#(Token) getToken() if (cnt < Max);
            cnt.up();
            rdP <= rdP + 1;
            valids[rdP] <= False;
            return rdP;
        endmethod

        method Action put(Token tok, t d);
            valids[tok] <= True;
            data.upd(tok, d);
        endmethod

        method ActionValue#(t) getResult() if (valids[wrP]);
            cnt.down();
            wrP <= wrP + 1;
            return data.sub(wrP);
        endmethod
    endmodule

Is valids okay? Is the ordering correct?
