
Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab.


Presentation Transcript


  1. Realistic Memories and Caches
     Arvind, Computer Science & Artificial Intelligence Lab., Massachusetts Institute of Technology
     http://csg.csail.mit.edu/SNU

  2. Three-Stage SMIPS
     [Figure: three-stage pipeline datapath. Labels: PC, +4, Epoch, Inst Memory, fr, Decode, Register File, stall?, Execute, Data Memory, wbr.]
     The use of magic memories makes this design unrealistic.

  3. A Simple Memory Model
     [Figure: MAGIC RAM block. Labels: Clock, WriteEnable, Address, WriteData, ReadData.]
     • Reads and writes always complete in one cycle
     • A read can be done at any time (i.e., it is combinational)
     • If enabled, a write is performed at the rising clock edge (the write address and data must be stable at the clock edge)
     In a real DRAM, the data becomes available several cycles after the address is supplied.
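The magic-memory behavior described on this slide can be sketched as a small behavioral model. This is an illustration only, written in Python rather than the Bluespec used in the slides; the class name `MagicRAM` and the explicit `clock_edge` call are assumptions introduced here to stand in for the hardware clock.

```python
class MagicRAM:
    """Behavioral model: combinational read, write committed at the clock edge."""

    def __init__(self, size):
        self.mem = [0] * size
        self.pending = None  # (addr, data) waiting for the next rising edge

    def read(self, addr):
        # Combinational read: returns the currently stored value immediately.
        return self.mem[addr]

    def write(self, addr, data, write_enable=True):
        # The write only takes effect at the next rising clock edge.
        if write_enable:
            self.pending = (addr, data)

    def clock_edge(self):
        # Rising edge: commit the pending write, if any.
        if self.pending is not None:
            addr, data = self.pending
            self.mem[addr] = data
            self.pending = None


ram = MagicRAM(16)
ram.write(3, 42)
before = ram.read(3)   # write not yet committed
ram.clock_edge()
after = ram.read(3)    # write visible after the edge
```

A realistic DRAM model would instead return the data several `clock_edge` calls after the address is presented, which is what the rest of the deck has to accommodate.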

  4. Memory Hierarchy
     [Figure: CPU and RegFile, link A, Small, Fast Memory (SRAM), link B, Big, Slow Memory (DRAM). Caption on the figure: holds frequently used data.]
     • size: RegFile << SRAM << DRAM (why?)
     • latency: RegFile << SRAM << DRAM (why?)
     • bandwidth: on-chip >> off-chip (why?)
     On a data access:
     • hit (data ∈ fast memory) ⇒ low-latency access
     • miss (data ∉ fast memory) ⇒ long-latency access (DRAM)
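To make the hit/miss distinction concrete, the expected access latency can be computed as a weighted average of the two cases. The cycle counts and hit rates below are hypothetical numbers chosen for illustration; they do not come from the slides.

```python
def average_latency(hit_rate, hit_cycles, miss_cycles):
    """Expected cycles per access given a hit rate and per-case latencies."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


# Hypothetical numbers: 1-cycle SRAM hit, 100-cycle DRAM miss.
lat_90 = average_latency(0.90, 1, 100)
lat_99 = average_latency(0.99, 1, 100)
```

With these assumed numbers, raising the hit rate from 90% to 99% cuts the average from roughly 10.9 cycles to roughly 2 cycles, which is why a small, fast memory holding frequently used data pays off.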

  5. Plan
     • What do simple caches look like?
     • Incorporating caches in the processor pipeline

  6. Data Cache - Interface
     [Figure: the cache sits between the Processor and DRAM. Labels: req, resp, hitQ, missReq, cache, memReq, mReqQ, memResp, mRespQ, and four guard labels.]
     interface DCache;
       method Action req(MemReq r);
       method ActionValue#(MemResp) resp;
       method ActionValue#(MemReq) memReq;
       method Action memResp(MemResp r);
     endinterface

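  7. Direct-Mapped Cache
     [Figure: the req address is split into Tag (t bits), Index (k bits), and Offset (b bits); Tag and Index form the block number, and Offset is the block offset. The cache has 2^k lines, each with V, Tag, and Data Block fields. A t-bit comparison (=) produces HIT, and the output is a data word or byte.]
     What is a bad reference pattern? Strided at the size of the cache.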
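The address split and the bad reference pattern can be illustrated with a few lines of Python. The widths `K` and `B` are hypothetical values picked for the example; the slide leaves t, k, and b symbolic.

```python
def split_address(addr, k, b):
    """Split addr into (tag, index, offset) with k index bits and b offset bits."""
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << k) - 1)
    tag = addr >> (b + k)
    return tag, index, offset


K, B = 4, 2                         # hypothetical: 16 lines, 4-byte blocks
CACHE_BYTES = (1 << K) * (1 << B)   # total capacity in bytes

# Addresses strided by the cache size share an index but differ in tag,
# so each access evicts the line brought in by the previous one.
strided = [split_address(0x10 + i * CACHE_BYTES, K, B) for i in range(3)]
```

Every tuple in `strided` has the same index and a different tag, so in a direct-mapped cache these accesses keep missing even though only three blocks are in use.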

  8. Data Cache – code structure
     module mkDCache(DCache);
       // state declarations
       Vector#(Rows, Reg#(Bool)) vArray <- replicateM(mkReg(False));
       …
       rule doMiss … endrule
       method Action req(MemReq r) … endmethod
       method ActionValue#(MemResp) resp … endmethod
       method ActionValue#(MemReq) memReq … endmethod
       method Action memResp(MemResp r) … endmethod
     endmodule

  9. Data Cache state declarations
     Vector#(Rows, Reg#(Bool)) vArray    <- replicateM(mkReg(False));
     Vector#(Rows, Reg#(Tag))  tagArray  <- replicateM(mkRegU);
     Vector#(Rows, Reg#(Data)) dataArray <- replicateM(mkRegU);
     FIFOF#(MemReq)   mReqQ   <- mkUGFIFOF1;
     FIFOF#(MemResp)  mRespQ  <- mkUGFIFOF1;
     PipeReg#(MemReq) hitQ    <- mkPipeReg;
     Reg#(MemReq)     missReq <- mkRegU;
     Reg#(Bit#(2))    status  <- mkReg(0);

  10. Data Cache processor-side methods
      method Action req(MemReq req) if (status==0);
        Index idx = truncate(req.addr>>2);    // drop the byte offset, keep the index bits
        Tag tag = truncateLSB(req.addr);      // keep the most significant bits as the tag
        Bool valid = vArray[idx];
        Bool tagMatch = tagArray[idx]==tag;
        if(valid && tagMatch && hitQ.notFull) hitQ.enq(req);
        else begin missReq <= req; status <= 1; end   // start miss processing
      endmethod

      method ActionValue#(MemResp) resp if(hitQ.notEmpty && status==0);
        hitQ.deq;
        let r = hitQ.first;
        Index idx = truncate(r.addr>>2);
        if(r.op==St) dataArray[idx] <= r.data;
        return dataArray[idx];
      endmethod
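The hit test performed by `req` can be mimicked in Python to see which addresses hit, miss, or conflict. The bit widths below are assumptions made for the sketch (the slides leave `Rows`, `Index`, and `Tag` abstract), and `fill` is a hypothetical helper standing in for the refill done by the miss rule.

```python
ADDR_BITS, INDEX_BITS = 32, 4           # assumed widths, not from the slides
TAG_BITS = ADDR_BITS - INDEX_BITS - 2   # 2 offset bits for word-aligned addresses
ROWS = 1 << INDEX_BITS

v_array = [False] * ROWS
tag_array = [0] * ROWS
data_array = [0] * ROWS


def index_of(addr):
    # Mirrors truncate(addr >> 2): drop the byte offset, keep the low INDEX_BITS bits.
    return (addr >> 2) & (ROWS - 1)


def tag_of(addr):
    # Mirrors truncateLSB(addr): keep the most significant TAG_BITS bits.
    return addr >> (ADDR_BITS - TAG_BITS)


def is_hit(addr):
    idx = index_of(addr)
    return v_array[idx] and tag_array[idx] == tag_of(addr)


def fill(addr, data):
    # Hypothetical helper: install a line, as the miss rule does on refill.
    idx = index_of(addr)
    v_array[idx], tag_array[idx], data_array[idx] = True, tag_of(addr), data


cold = is_hit(0x1000)                   # nothing cached yet
fill(0x1000, 0xABCD)
warm = is_hit(0x1000)                   # same line, same tag
conflict = is_hit(0x1000 + ROWS * 4)    # same index, different tag
```

The three results correspond to a cold miss, a hit, and a conflict miss; in the Bluespec code the two miss cases both take the `else` branch and set `status` to 1.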

  11. Data Cache memory-side methods
      method ActionValue#(MemReq) memReq if (mReqQ.notEmpty);
        mReqQ.deq;
        return mReqQ.first;
      endmethod

      method Action memResp(MemResp res) if (mRespQ.notFull);
        mRespQ.enq(res);
      endmethod

  12. Data Cache: rule to process a cache miss
      rule doMiss (status!=0);
        Index idx = truncate(missReq.addr>>2);
        if(status==1 && mReqQ.notFull) begin
          if(vArray[idx])
            mReqQ.enq(MemReq{op:St, addr:{tagArray[idx],idx,2'b00}, data:dataArray[idx]});
          status <= 2;
        end
        if(status==2 && mReqQ.notFull && (!vArray[idx] || mRespQ.notEmpty)) begin
          if(vArray[idx]) mRespQ.deq;
          mReqQ.enq(MemReq{op:Ld, addr:missReq.addr, data:?});
          status <= 3;
        end

  13. Data Cache: rule to process a cache miss
      rule doMiss (status!=0);
        …
        if(status==3 && mRespQ.notEmpty && hitQ.notFull) begin
          let data = mRespQ.first;
          mRespQ.deq;
          Tag tag = truncateLSB(missReq.addr);
          vArray[idx] <= True;
          tagArray[idx] <= tag;
          dataArray[idx] <= data;
          hitQ.enq(missReq);
          status <= 0;
        end
      endrule
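The miss rule on the last two slides is a small state machine driven by `status`. A Python sketch of one rule firing is given below as a reading aid; the function name `do_miss_step` and the action strings are made up for illustration, and queue contents are abstracted into boolean flags.

```python
def do_miss_step(status, line_valid, m_req_q_not_full, m_resp_q_not_empty, hit_q_not_full):
    """Return (next_status, actions) for one firing of the miss rule.

    All conditions are evaluated against the status at the start of the
    cycle, so at most one of the three steps happens per firing.
    """
    actions = []
    if status == 1 and m_req_q_not_full:
        if line_valid:
            actions.append("enq writeback St")      # evict the old line
        return 2, actions
    if status == 2 and m_req_q_not_full and (not line_valid or m_resp_q_not_empty):
        if line_valid:
            actions.append("deq writeback response")
        actions.append("enq refill Ld")             # request the missing line
        return 3, actions
    if status == 3 and m_resp_q_not_empty and hit_q_not_full:
        actions += ["deq refill response", "update line", "enq hitQ"]
        return 0, actions
    return status, actions                          # nothing can proceed this cycle


# Walk a miss on an invalid line with all queues ready.
status, trace = 1, []
while status != 0:
    status, acts = do_miss_step(status, line_valid=False, m_req_q_not_full=True,
                                m_resp_q_not_empty=True, hit_q_not_full=True)
    trace.append((status, acts))
```

The trace visits status 2, 3, and 0 in three firings; with a valid line the first step additionally writes the old contents back and the second waits for the write response.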

  14. Five-Stage SMIPS
      [Figure: five-stage pipeline datapath. Labels: PC, +4, Epoch, Inst Memory, fr, Decode, dr, Register File, stall?, Execute, er, Data Memory, wbr.]
      In this organization, memory can take any amount of time.

  15. Five-Stage SMIPS state elements
      module mkProc(Proc);
        Reg#(Addr) pc    <- mkRegU;
        Reg#(Bool) epoch <- mkRegU;
        RFile      rf    <- mkRFile;
        Memory     mem   <- mkTwoPortedMemory;
        let iMem = mem.iport;
        let dMem = mem.dport;
        PipeReg#(FBundle)  fr  <- mkPipeReg;
        PipeReg#(DBundle)  dr  <- mkPipeReg;
        PipeReg#(EBundle)  er  <- mkPipeReg;
        PipeReg#(WBBundle) wbr <- mkPipeReg;
        rule doProc;
        …

  16. Five-Stage SMIPS instruction fetch
      rule doProc;
        Bool iAcc = False;
        if(fr.notFull && iMem.notFull) begin
          iMem.req(MemReq{op:Ld, addr:pc, data:?});
          iAcc = True;
          fr.enq(FBundle{pc:pc, epoch:epoch});
        end
      Enqueue the instruction fetch request. If the request cannot be enqueued, we must remember not to change the pc to pc+4 (iAcc).

  17. Five-Stage SMIPS decode
      if(fr.notEmpty && dr.notFull && iMem.notEmpty) begin
        let dInst = decode(iMem.resp);
        dr.enq(DBundle{pc:fr.first.pc, epoch:fr.first.epoch, dInst:dInst});
        fr.deq;
        iMem.deq;
      end
      Decode the fetched instruction.

  18. Five-Stage SMIPS code structure for killing wrongly fetched instructions
      Addr redirPC = ?;
      Bool redirPCvalid = False;
      if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin
        if(dr.first.epoch==epoch) begin
          Bool eStall = …
          Bool wbStall = …
          if(!eStall && !wbStall) begin
            let eInst = exec(dInst, …);
            if(memType(eInst.iType)) dMem.req(…);
            if(eInst.brTaken) begin redirPC… end
            er.enq(EBundle{…});
            dr.deq;
          end
        end
        else dr.deq;
      end
      Slide annotation: successful kill.

  19. Five-Stage SMIPS stall signal
      Addr redirPC = ?;
      Bool redirPCvalid = False;
      if(dr.notEmpty && er.notFull && (!memType(dr.first.dInst.iType) || dMem.notFull)) begin
        if(dr.first.epoch==epoch) begin
          let dInst = dr.first.dInst;
          Bool eStall = er.notEmpty && er.first.rDstValid &&
                        ((dInst.rSrc1Valid && dInst.rSrc1==er.first.rDst) ||
                         (dInst.rSrc2Valid && dInst.rSrc2==er.first.rDst));
          Bool wbStall = wbr.notEmpty && wbr.first.rDstValid &&
                         ((dInst.rSrc1Valid && dInst.rSrc1==wbr.first.rDst) ||
                          (dInst.rSrc2Valid && dInst.rSrc2==wbr.first.rDst));
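Both stall signals apply the same test to a different pipeline register: stall if a valid destination of an in-flight instruction matches a valid source of the instruction being executed. A Python sketch of that test follows; the dataclasses and the name `must_stall` are illustrative stand-ins for the bundle fields in the slides, and `None` is used here to represent an empty pipeline register.

```python
from dataclasses import dataclass


@dataclass
class DecodedInst:
    r_src1_valid: bool
    r_src1: int
    r_src2_valid: bool
    r_src2: int


@dataclass
class InFlight:
    r_dst_valid: bool
    r_dst: int


def must_stall(d_inst, in_flight):
    """True if an in-flight instruction (None = empty stage) writes a source of d_inst."""
    if in_flight is None or not in_flight.r_dst_valid:
        return False
    return ((d_inst.r_src1_valid and d_inst.r_src1 == in_flight.r_dst) or
            (d_inst.r_src2_valid and d_inst.r_src2 == in_flight.r_dst))


add = DecodedInst(True, 3, True, 4)             # reads r3 and r4
e_stall = must_stall(add, InFlight(True, 4))    # er holds a write to r4
wb_stall = must_stall(add, InFlight(True, 7))   # wbr holds a write to r7
no_dst = must_stall(add, InFlight(False, 3))    # destination not valid
empty = must_stall(add, None)                   # stage is empty
```

Only the first case stalls; the instruction proceeds to execute when neither the `er` check nor the `wbr` check reports a hazard.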

  20. Five-Stage SMIPS Execute if not stall
          if(!eStall && !wbStall) begin
            Data rVal1 = rf.rd1(dInst.rSrc1);
            Data rVal2 = rf.rd2(dInst.rSrc2);
            let eInst = exec(dInst, rVal1, rVal2, dr.first.pc);
            if(memType(eInst.iType))
              dMem.req(MemReq{op:eInst.iType==Ld ? Ld : St, addr:eInst.addr, data:eInst.data});
            if(eInst.brTaken) begin
              redirPC = eInst.addr;
              redirPCvalid = True;
            end
            er.enq(EBundle{iType:eInst.iType, rDst:eInst.rDst, data:eInst.data});
            dr.deq;
          end
        end
        else dr.deq;
      end
      Slide annotation: successful execution.

  21. Five-Stage SMIPS execute and memory responses to WB
      if(er.notEmpty && wbr.notFull && (!memType(er.first.iType) || dMem.notEmpty)) begin
        wbr.enq(WBBundle{iType:er.first.iType, rDst:er.first.rDst,
                         data:er.first.iType==Ld ? dMem.resp : er.first.data});
        er.deq;
        if(dMem.notEmpty) dMem.deq;
      end

  22. Five-Stage SMIPS writeback
        if(wbr.notEmpty) begin
          if(regWriteType(wbr.first.iType)) rf.wr(wbr.first.rDst, wbr.first.data);
          wbr.deq;
        end
        pc <= redirPCvalid ? redirPC : iAcc ? pc + 4 : pc;
        epoch <= redirPCvalid ? !epoch : epoch;
      endrule
      endmodule
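The final `pc` and `epoch` updates encode a priority: a redirect from execute wins, otherwise the pc advances only if the fetch request was accepted. The Python function below restates those two conditional expressions; its name and argument names are chosen for this sketch.

```python
def next_pc_and_epoch(pc, epoch, redir_valid, redir_pc, i_acc):
    """Mirror of the pc/epoch update: redirect wins, then pc+4 if fetch was accepted."""
    if redir_valid:
        return redir_pc, not epoch   # jump and flip the epoch to kill wrong-path work
    if i_acc:
        return pc + 4, epoch         # fetch request was enqueued, move on
    return pc, epoch                 # fetch could not be enqueued, retry the same pc


normal = next_pc_and_epoch(0x100, False, False, 0, True)
held = next_pc_and_epoch(0x100, False, False, 0, False)
redirect = next_pc_and_epoch(0x100, False, True, 0x200, True)
```

Flipping the epoch on a redirect is what lets the execute stage recognize and drop instructions that were fetched before the redirect took effect.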

  23. Summary
      • There is a lot of room for making errors ⇒ verification and testing are essential
      • Memory systems, and dealing with load latencies in particular, are a major aspect of computer design
      • The 5-stage design presented here is different from H&P and is much more realistic
      Next: Branch prediction
