
ECE 232 Hardware Organization and Design Lecture 14 Multi-cycle Processor Design

Maciej Ciesielski
www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html


Presentation Transcript


  1. ECE 232 Hardware Organization and Design, Lecture 14: Multi-cycle Processor Design
     Maciej Ciesielski
     www.ecs.umass.edu/ece/labs/vlsicad/ece232/spr2002/index_232.html

  2. Outline
     • Review
       • Single-cycle processor design
       • VHDL models of datapath
       • Why single-cycle is not good enough
     • Design of a multi-cycle processor
       • Multi-cycle Datapath
       • Multi-cycle Control
     • Performance analysis

  3. Recap: Processor Design is a Process
     • Bottom-up
       • assemble components in target technology to establish critical timing
     • Top-down
       • specify component behavior from high-level requirements
     • Iterative refinement
       • establish partial solution, expand and improve
     [Figure: design hierarchy from Instruction Set Architecture to processor (datapath, control), then components (Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer), down to Cells and Gates]

  4. Recap: A Single Cycle Datapath
     • Datapath with control signals (underlined)
     [Figure: single-cycle datapath. The Instruction Fetch Unit (nPC_sel, Clk) supplies Instruction<31:0> with fields Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Blocks: 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW), Extender, ALU (Zero output), Data Memory (WrEn, Adr, Data In), muxes. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg]

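  5. Recap: The “Truth Table” for the Main Control
     [Figure: Main Control takes op (6 bits) and produces RegDst, ALUSrc, ... and ALUop (3 bits); ALU Control (Local) takes func (6 bits) and ALUop and produces ALUctr (3 bits)]

     Signal            R-type     ori       lw        sw        beq        jump
     op                00 0000    00 1101   10 0011   10 1011   00 0100    00 0010
     RegDst            1          0         0         x         x          x
     ALUSrc            0          1         1         1         0          x
     MemtoReg          0          0         1         x         x          x
     RegWrite          1          1         1         0         0          0
     MemWrite          0          0         0         1         0          0
     Branch            0          0         0         0         1          0
     Jump              0          0         0         0         0          1
     ExtOp             x          0         1         1         x          x
     ALUop (Symbolic)  “R-type”   Or        Add       Add       Subtract   xxx
     ALUop <2>         1          0         0         0         0          x
     ALUop <1>         0          1         0         0         0          x
     ALUop <0>         0          0         0         0         1          x

     Note: the transcript lists the last two ALUop entries in the opposite order (xxx for beq, Subtract for jump). They are placed here so that beq, which compares two registers, uses Subtract, and jump, which does not use the ALU, is a don't-care.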
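The truth table can be read as the contents of a small control store indexed by opcode. A minimal sketch of that reading follows, in Python rather than the lecture's VHDL so it can be run directly. The names `CONTROL_TABLE`, `SIGNALS`, and `main_control` are illustrative, not from the slides; `None` stands for a don't-care; and the sketch assumes that beq uses Subtract while jump's ALUop is a don't-care.

```python
# Main control as a lookup table (a ROM-like control store).
# Values are transcribed from the truth table; X (None) marks a don't-care.
X = None

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemWrite", "Branch", "Jump", "ExtOp", "ALUop")

CONTROL_TABLE = {
    0b000000: ("R-type", (1, 0, 0, 1, 0, 0, 0, X, "R-type")),
    0b001101: ("ori",    (0, 1, 0, 1, 0, 0, 0, 0, "Or")),
    0b100011: ("lw",     (0, 1, 1, 1, 0, 0, 0, 1, "Add")),
    0b101011: ("sw",     (X, 1, X, 0, 1, 0, 0, 1, "Add")),
    # Assumption: beq subtracts to compare; jump does not use the ALU.
    0b000100: ("beq",    (X, 0, X, 0, 0, 1, 0, X, "Subtract")),
    0b000010: ("jump",   (X, X, X, 0, 0, 0, 1, X, X)),
}


def main_control(instruction):
    """Return (mnemonic, control signals) for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F  # op field is bits 31..26
    name, values = CONTROL_TABLE[opcode]
    return name, dict(zip(SIGNALS, values))
```

For example, `main_control(0x8D280004)` (an lw encoding) selects the lw column, so `MemtoReg` and `RegWrite` are both 1.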

  6. Recap: PLA Implementation of the Main Control
     [Figure: PLA. Inputs op<5> .. op<0> feed one AND term per instruction (R-type, ori, lw, sw, beq, jump); the OR outputs are RegWrite, ALUSrc, RegDst, MemtoReg, MemWrite, Branch, Jump, ExtOp, ALUop<2>, ALUop<1>, ALUop<0>]
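A PLA realizes the control as a sum of products: the AND plane decodes each opcode into one product term, and the OR plane combines those terms into each control signal. The Python sketch below is one way to model that, assuming the opcodes from the main-control truth table and resolving every don't-care to 0. The function name `pla_main_control` is hypothetical, and the ALUop bits are left out for brevity.

```python
def pla_main_control(opcode):
    """Sum-of-products model of the main control for a 6-bit opcode."""
    op = [(opcode >> i) & 1 for i in range(6)]  # op[0] .. op[5]
    n = [1 - bit for bit in op]                 # complemented inputs

    # AND plane: one product term per instruction.
    rtype = n[5] & n[4] & n[3] & n[2] & n[1] & n[0]       # 00 0000
    ori = n[5] & n[4] & op[3] & op[2] & n[1] & op[0]      # 00 1101
    lw = op[5] & n[4] & n[3] & n[2] & op[1] & op[0]       # 10 0011
    sw = op[5] & n[4] & op[3] & n[2] & op[1] & op[0]      # 10 1011
    beq = n[5] & n[4] & n[3] & op[2] & n[1] & n[0]        # 00 0100
    jump = n[5] & n[4] & n[3] & n[2] & op[1] & n[0]       # 00 0010

    # OR plane: each signal is the OR of the terms where the table has a 1.
    return {
        "RegWrite": rtype | ori | lw,
        "ALUSrc": ori | lw | sw,
        "RegDst": rtype,
        "MemtoReg": lw,
        "MemWrite": sw,
        "Branch": beq,
        "Jump": jump,
        "ExtOp": lw | sw,
    }
```

An opcode that matches none of the six product terms drives every output to 0 in this model, which is a convenient property of the sum-of-products form.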

  7. Recap: Systematic Generation of Control
     • In our single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”
       • in general, the controller is a Finite State Machine
       • a microinstruction can also control sequencing (see later)
     [Figure: the Instruction OPcode goes through Decode to the Control Logic / Store (PLA, ROM), which emits a microinstruction driving the Control Points of the Datapath; Conditions feed back from the Datapath]

  8. The Big Picture: Where are We Now?
     • The Five Classic Components of a Computer: Processor (Control, Datapath), Memory, Input, Output
     • Today’s topic: designing the multiple clock cycle datapath

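  9. Behavioral models of Datapath Components
     [Figure: 16-bit adder with inputs A, B (16 bits each) and Cin, outputs DOUT (16 bits) and Cout]
     Attention: Altera VHDL simulation software does not support delay

     library ieee;
     use ieee.std_logic_1164.all;
     use ieee.numeric_std.all;

     entity adder16 is
       generic (ccOut_delay    : TIME := 12 ns;
                adderOut_delay : TIME := 12 ns);
       port (A, B : in  std_logic_vector(15 downto 0);
             DOUT : out std_logic_vector(15 downto 0);
             CIN  : in  bit;
             COUT : out bit);
     end adder16;

     architecture behavior of adder16 is
     begin
       adder16_process: process(A, B, CIN)
         -- 17 bits: 16 sum bits plus the carry out
         variable tmp       : unsigned(16 downto 0);
         variable adder_out : std_logic_vector(15 downto 0);
         variable carry     : bit;
       begin
         -- numeric_std addition stands in for the slide's addum helper,
         -- whose definition is not shown in the transcript
         tmp := resize(unsigned(A), 17) + resize(unsigned(B), 17);
         if CIN = '1' then
           tmp := tmp + 1;
         end if;
         adder_out := std_logic_vector(tmp(15 downto 0));
         carry     := to_bit(tmp(16));
         COUT <= carry after ccOut_delay;
         DOUT <= adder_out after adderOut_delay;
       end process;
     end behavior;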
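As a quick cross-check of what the behavioral adder is meant to compute, here is a small Python reference model (Python rather than VHDL so it can be run without a simulator). It models only the function, not the `after` delays, and the name `adder16` is reused purely for illustration.

```python
def adder16(a, b, cin=0):
    """Return (sum, carry_out) of a 16-bit add with carry in."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + (cin & 1)
    dout = total & 0xFFFF      # low 16 bits are the sum
    cout = (total >> 16) & 1   # bit 16 is the carry out
    return dout, cout
```

For instance, adding `0xFFFF` and `0x0001` wraps the sum to 0 and sets the carry out, which is the case a 17-bit intermediate result is needed to capture.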

  10. Behavioral Specification of Control Logic
      entity maincontrol is
        port (opcode     : in  std_logic_vector(5 downto 0);
              equal_cond : in  bit;
              extop      : out bit;
              ALUsrc     : out bit;
              ALUop      : out std_logic_vector(1 downto 0);
              MEMwr      : out bit;
              MemtoReg   : out bit;
              RegWr      : out bit;
              RegDst     : out bit;
              nPC        : out bit);
      end maincontrol;
      • Decode / Control-store address modeled by Case statement
      • Each arm drives control signals for that operation
        • just like the microinstruction
        • either can be symbolic

  11. Abstract View of our Single Cycle Processor
      • Looks like an FSM with PC as state
      [Figure: Main Control (input op) and ALU control (input fun) drive nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr; Equal is a condition input to control. Datapath blocks: Next PC, PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem), Result Store (Reg. Wrt)]

  12. What’s wrong with our CPI=1 processor?
      • Long cycle time
      • All instructions take as much time as the slowest
      • Real memory is not so nice as our idealized memory
        • cannot always get the job done in one (short) cycle
      [Figure: units on the path of each instruction class, intervening muxes omitted.
       Arithmetic & Logical: PC, Inst Memory, Reg File, ALU, mux, setup
       Load (Critical Path): PC, Inst Memory, Reg File, ALU, Data Mem, mux, setup
       Store: PC, Inst Memory, Reg File, ALU, Data Mem
       Branch: PC, Inst Memory, Reg File, cmp, mux]
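The point that every instruction takes as long as the slowest one can be made concrete with a small calculation. The Python sketch below sums unit delays along each instruction's path and takes the maximum as the single-cycle clock period. The delay values in `DELAY_NS` are made-up illustrative numbers, not figures from the lecture, and mux and setup times are ignored.

```python
# Hypothetical unit delays in nanoseconds (illustrative only).
DELAY_NS = {"inst_mem": 2, "reg_read": 1, "alu": 2,
            "cmp": 2, "data_mem": 2, "reg_write": 1}

# Units each instruction class passes through, following the slide's figure.
PATHS = {
    "arithmetic/logical": ["inst_mem", "reg_read", "alu", "reg_write"],
    "load": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["inst_mem", "reg_read", "alu", "data_mem"],
    "branch": ["inst_mem", "reg_read", "cmp"],
}


def path_delay(units):
    """Total delay of one instruction's path through the datapath."""
    return sum(DELAY_NS[u] for u in units)


def single_cycle_time():
    """The clock must cover the slowest instruction."""
    return max(path_delay(units) for units in PATHS.values())


def critical_instruction():
    """Name of the instruction class that sets the cycle time."""
    return max(PATHS, key=lambda name: path_delay(PATHS[name]))
```

With these assumed delays the load path is the longest, so a branch that needs far less time still occupies a full load-length cycle.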

  13. Memory Access Time
      • Physics => fast memories are small (large memories are slow)
        • question: register file vs. memory
      • => Use a hierarchy of memories
      [Figure: Storage Array with address decoder, selected word line, storage cell, bit line, sense amps. Hierarchy: Processor, Cache (1 cycle), proc. bus, L2 Cache (2-3 cycles), mem. bus, memory (20 - 50 cycles)]

  14. Reducing Cycle Time
      • Cut combinational dependency graph and insert register / latch
      • Do same work in two fast cycles, rather than one slow one
      [Figure: storage element, Acyclic Combinational Logic, storage element => storage element, Acyclic Combinational Logic (A), storage element, Acyclic Combinational Logic (B), storage element]
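The effect of cutting the combinational logic can be sketched numerically. In the Python model below, the clock period is the slowest stage plus a fixed register overhead (setup plus clock-to-output). The stage delays and the overhead are hypothetical values chosen only to illustrate the trade-off, and the function names are not from the lecture.

```python
def cycle_time(stage_delays_ns, register_overhead_ns):
    """Clock period: slowest combinational stage plus register overhead."""
    return max(stage_delays_ns) + register_overhead_ns


def latency(stage_delays_ns, register_overhead_ns):
    """Time for one piece of work to pass through every stage."""
    return len(stage_delays_ns) * cycle_time(stage_delays_ns,
                                             register_overhead_ns)


# One 10 ns block versus the same logic cut into A (6 ns) and B (4 ns),
# each with an assumed 1 ns register overhead.
uncut = cycle_time([10], 1)        # one slow cycle
cut = cycle_time([6, 4], 1)        # two fast cycles
cut_latency = latency([6, 4], 1)   # total time through both halves
```

The cut shortens the clock period, but the total time through both halves grows because the cut is uneven and the register overhead is paid twice. That is one reason for partitioning the logic into chunks of roughly equal delay.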

  15. Basic Limits on Cycle Time
      • Next address logic
        • PC <= branch ? PC + offset : PC + 4
      • Instruction Fetch
        • InstructionReg <= Mem[PC]
      • Register Access
        • A <= R[rs]
      • ALU operation
        • R <= A + B
      [Figure: Control drives nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr. Datapath blocks: Next PC, PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), Result Store]
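Each register transfer listed on this slide is a candidate for one short cycle. A minimal Python sketch of those transfers, acting on a dictionary that stands in for the machine state, is shown below. The state layout, the function names, and the `B <= R[rt]` transfer are assumptions added for illustration; the slide itself lists only `A <= R[rs]`.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits


def make_state(program):
    """Hypothetical machine state: PC, IR, A, B, R, registers, memory."""
    return {"PC": 0, "IR": 0, "A": 0, "B": 0, "R": 0,
            "regs": [0] * 32, "mem": dict(program)}


def instruction_fetch(state):
    # InstructionReg <= Mem[PC]
    state["IR"] = state["mem"][state["PC"]]


def next_address(state, branch, offset):
    # PC <= branch ? PC + offset : PC + 4
    state["PC"] = (state["PC"] + (offset if branch else 4)) & MASK32


def register_access(state):
    # A <= R[rs]; B <= R[rt] is assumed so the ALU has a second operand
    rs = (state["IR"] >> 21) & 0x1F
    rt = (state["IR"] >> 16) & 0x1F
    state["A"] = state["regs"][rs]
    state["B"] = state["regs"][rt]


def alu_add(state):
    # R <= A + B
    state["R"] = (state["A"] + state["B"]) & MASK32
```

As a usage example, loading the word `0x012A4020` (an R-type add with rs = 9 and rt = 10) at address 0, setting registers 9 and 10, and calling the four functions in order leaves the sum in `R` and advances `PC` by 4.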

  16. Partitioning the CPI=1 Datapath
      • Add registers between smallest steps
      [Figure: the same datapath blocks (Next PC, PC, Instruction Fetch, Operand Fetch from Reg. File, Exec, Mem Access / Data Mem, Result Store) with control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr]

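  17. Example Multicycle Datapath
      • Critical Path ?
      [Figure: multicycle datapath with registers PC, IR, A, B, R, M placed between the steps Instruction Fetch, Operand Fetch (Reg File), Execute; comp. mem address (Ext, ALU), Memory access (Mem Access, Data Mem), and Result Store (Reg. File), plus Next PC. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg; Equal is a condition output]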

  18. Summary
      • Disadvantages of the Single Cycle Processor
        • Long cycle time
        • Cycle time is too long for all instructions except the Load
      • Multiple Cycle Processor:
        • Divide the instructions into smaller steps
        • Execute each step (instead of the entire instruction) in one cycle
      • Partition datapath into equal size chunks to minimize cycle time
        • ~10 levels of logic between latches
      • Follow same 5-step method for designing “real” processor
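The summary's trade-off can be put into numbers: a multi-cycle design runs a shorter clock but spends several cycles per instruction. The Python sketch below computes an average CPI from a step count per instruction class and an instruction mix. The step counts, the mix, and the cycle times are all hypothetical values for illustration, not results from the lecture.

```python
# Hypothetical cycles per instruction class in a multi-cycle design.
STEPS = {"arithmetic/logical": 4, "load": 5, "store": 4, "branch": 3}

# Hypothetical instruction mix in percent (must sum to 100).
MIX_PERCENT = {"arithmetic/logical": 50, "load": 20, "store": 10, "branch": 20}


def average_cpi(steps, mix_percent):
    """Mix-weighted average number of cycles per instruction."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("instruction mix must sum to 100 percent")
    return sum(steps[name] * mix_percent[name] for name in steps) / 100


def time_per_instruction_ns(cpi, cycle_ns):
    """Average execution time of one instruction."""
    return cpi * cycle_ns


multi = time_per_instruction_ns(average_cpi(STEPS, MIX_PERCENT), 2)  # 2 ns steps
single = time_per_instruction_ns(1, 8)                               # one 8 ns cycle
```

With these made-up numbers the two designs come out even, which suggests that any gain depends on the instruction mix and on how evenly the steps are balanced.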
