
CPU Design Steps


Presentation Transcript


  1. CPU Design Steps
     1. Analyze instruction set operations using independent RTN => datapath requirements.
     2. Select set of datapath components & establish clock methodology.
     3. Assemble datapath meeting the requirements.
     4. Analyze implementation of each instruction to determine the setting of control points that effects the register transfer.
     5. Assemble the control logic.

  2. CPU Design & Implementation Process
     • Bottom-up Design: assemble components in target technology to establish critical timing.
     • Top-down Design: specify component behavior from high-level requirements.
     • Iterative refinement: establish a partial solution, expand and improve.
     [Figure: design hierarchy. Instruction Set Architecture => processor (datapath, control) => Reg. File, Mux, ALU, Reg, Mem, Decoder, Sequencer => Cells, Gates]

  3. Single Cycle MIPS Datapath: CPI = 1, Long Clock Cycle
     [Figure: single-cycle MIPS datapath. Components: PC, Inst Memory, 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW), Extender, ALU, Data Memory, plus the adders and muxes of the next-PC logic. Instruction<31:0> fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Control signals: RegDst, RegWr, nPC_sel, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. Status signal: Equal.]

  4. Drawback of Single Cycle Processor
     • Long cycle time.
     • All instructions must take as much time as the slowest:
       • Cycle time for load is longer than needed for all other instructions.
     • Real memory is not as well-behaved as idealized memory:
       • Cannot always complete data access in one (short) cycle.

  5. Abstract View of Single Cycle CPU
     [Figure: Main Control (input: op) and ALU control (input: fun) generate the control points ALUSrc, ExtOp, MemRd, MemWr, RegWr, RegDst, nPC_sel, ALUctr; Equal is a status signal. Datapath blocks: Next PC, PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access, Data Mem, Result Store, Reg. Wrt.]

  6. Single Cycle Instruction Timing
     • Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
     • Load (Critical Path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
     • Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
     • Branch: PC, Inst Memory, Reg File, cmp, mux
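
The timing comparison can be made concrete with a small sketch: sum the delay of every unit an instruction class passes through, then take the longest sum as the single-cycle clock period. The delay values below (in ns) are hypothetical and chosen only for illustration; the slides give no numbers, and the unit order within each path is an assumption based on the timing diagram.

```python
# Hypothetical unit delays in ns (illustrative assumptions, not from the slides).
DELAY_NS = {
    "pc": 1, "inst_mem": 10, "reg_file": 5, "mux": 1,
    "alu": 8, "cmp": 3, "data_mem": 10, "setup": 1,
}

# Units on each instruction class's path, following the timing diagram.
PATHS = {
    "arith_logic": ["pc", "inst_mem", "reg_file", "mux", "alu", "mux", "setup"],
    "load": ["pc", "inst_mem", "reg_file", "mux", "alu", "data_mem", "mux", "setup"],
    "store": ["pc", "inst_mem", "reg_file", "mux", "alu", "data_mem"],
    "branch": ["pc", "inst_mem", "reg_file", "cmp", "mux"],
}


def path_delay(units):
    """Total delay of one instruction path: the sum of its unit delays."""
    return sum(DELAY_NS[u] for u in units)


delays = {name: path_delay(units) for name, units in PATHS.items()}
critical = max(delays, key=delays.get)   # slowest instruction class
cycle_time_ns = delays[critical]         # every instruction pays this period

for name, d in delays.items():
    print(f"{name:12s} {d:3d} ns (idle {cycle_time_ns - d} ns per cycle)")
print("critical path:", critical, cycle_time_ns, "ns")
```

With these assumed delays the load path sets the clock period, and every other instruction class leaves part of the cycle unused.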

  7. Reducing Cycle Time: Multi-Cycle Design
     • Cut combinational dependency graph by inserting registers / latches.
     • The same work is done in two or more fast cycles, rather than one slow cycle.
     [Figure: storage element, Acyclic Combinational Logic, storage element => storage element, Acyclic Combinational Logic (A), storage element, Acyclic Combinational Logic (B), storage element]

  8. Clock Cycle Time & Critical Path
     • Critical path: the slowest path between any two storage devices.
     • Cycle time is a function of the critical path; it must be greater than:
       Clock-to-Q + Longest Path through the Combinational Logic + Setup
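
A minimal sketch of this bound, and of why cutting the logic with an extra register shortens the cycle. The function names and all delay numbers are hypothetical; only the formula (Clock-to-Q + longest combinational path + setup) comes from the slide.

```python
def min_cycle_time(clk_to_q, longest_logic_path, setup):
    """Lower bound on the clock period between two storage elements."""
    return clk_to_q + longest_logic_path + setup


def min_cycle_time_split(clk_to_q, stage_paths, setup):
    """Bound after cutting the logic into stages separated by registers.

    Each stage is its own register-to-register path, so the slowest
    stage determines the clock period.
    """
    return max(min_cycle_time(clk_to_q, p, setup) for p in stage_paths)


# Hypothetical delays in ns: one 30 ns block versus the same logic cut into A (12) and B (18).
single = min_cycle_time(clk_to_q=1, longest_logic_path=30, setup=1)
split = min_cycle_time_split(clk_to_q=1, stage_paths=[12, 18], setup=1)
print(single, split)
```

The split design needs two cycles for the same work, but each cycle is shorter, and instructions that skip a stage no longer pay for it.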

  9. Instruction Processing Cycles
     • Instruction Fetch: obtain instruction from program storage.
     • Next Instruction: update program counter to address of next instruction.
     • Instruction Decode: determine instruction type; obtain operands from registers.
     • Execute: compute result value or status.
     • Result Store: store result in register/memory if needed (usually called Write Back).
     Instruction Fetch and Next Instruction are common steps for all instructions.

  10. Partitioning The Single Cycle Datapath
      Add registers between smallest steps.
      [Figure: single-cycle datapath split into Next PC, PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), Result Store. Control signals: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr.]

  11. Example Multi-cycle Datapath
      Registers added:
      • IR: Instruction register
      • A, B: Two registers to hold operands read from register file.
      • R: or ALUOut, holds the output of the ALU
      • M: or Memory data register (MDR) to hold data read from data memory
      [Figure: Next PC, PC, Instruction Fetch, IR, Operand Fetch (Reg File), A, B, Ext, ALU, R, Mem Access (Data Mem), M, Result Store. Control signals: nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr. Status signal: Equal.]

  12. Operations In Each Cycle
      Stages: Instruction Fetch (IF), Instruction Decode (ID), Execution (EX), Memory (MEM), Write Back (WB).
      • Logic Immediate
        IF:  IR ← Mem[PC]
        ID:  A ← R[rs]
        EX:  R ← A OR ZeroExt[imm16]
        WB:  R[rt] ← R; PC ← PC + 4
      • Store
        IF:  IR ← Mem[PC]
        ID:  A ← R[rs]; B ← R[rt]
        EX:  R ← A + SignExt(imm16)
        MEM: Mem[R] ← B; PC ← PC + 4
      • Load
        IF:  IR ← Mem[PC]
        ID:  A ← R[rs]
        EX:  R ← A + SignExt(imm16)
        MEM: M ← Mem[R]
        WB:  R[rt] ← M; PC ← PC + 4
      • R-Type
        IF:  IR ← Mem[PC]
        ID:  A ← R[rs]; B ← R[rt]
        EX:  R ← A + B
        WB:  R[rd] ← R; PC ← PC + 4
      • Branch
        IF:  IR ← Mem[PC]
        ID:  A ← R[rs]; B ← R[rt]
        EX:  If Equal = 1, PC ← PC + 4 + (SignExt(imm16) x 4); else PC ← PC + 4
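
As an illustration of how these register transfers chain together, the sketch below steps a Load through its five cycles, one transfer per cycle. The machine model (dictionaries for the register file and memories, a tuple as the decoded instruction) and the function names are assumptions made for the example, not part of the slides.

```python
def sign_ext16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_lw(regs, imem, dmem, pc):
    """Run one LW through the five cycles and return the next PC.

    regs, imem and dmem are plain dicts (hypothetical model); an
    instruction is stored as the tuple (name, rs, rt, imm16).
    """
    ir = imem[pc]                  # cycle 1 (IF):  IR <- Mem[PC]
    _, rs, rt, imm16 = ir
    a = regs[rs]                   # cycle 2 (ID):  A <- R[rs]
    r = a + sign_ext16(imm16)      # cycle 3 (EX):  R <- A + SignExt(imm16)
    m = dmem[r]                    # cycle 4 (MEM): M <- Mem[R]
    regs[rt] = m                   # cycle 5 (WB):  R[rt] <- M
    return pc + 4                  #                PC <- PC + 4


regs = {1: 100, 2: 0}
imem = {0: ("lw", 1, 2, 0xFFFC)}   # lw $2, -4($1)
dmem = {96: 42}
next_pc = execute_lw(regs, imem, dmem, pc=0)
print(regs[2], next_pc)
```

Each local variable (ir, a, r, m) plays the role of one of the added registers: its value is produced in one cycle and consumed in the next.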

  13. Finite State Machine (FSM) Control Model
      • State specifies control points for Register Transfer.
      • Transfer occurs upon exiting state (same falling edge).
      [Figure: inputs (conditions) feed the Next State Logic, which updates the Control State; the Output Logic turns the state into outputs (control points). Each state X is labeled with its Register Transfer and Control Points; the next state depends on input.]

  14. Control Specification For Multi-cycle CPU: Finite State Machine (FSM)
      “instruction fetch”: IR ← MEM[PC]
      “decode / operand fetch”: A ← R[rs]; B ← R[rt]
      After decode, the path depends on the instruction; every path ends by returning to instruction fetch:
      • R-type: Execute: R ← A fun B. Write-back: R[rd] ← R; PC ← PC + 4.
      • ORi: Execute: R ← A or ZX. Write-back: R[rt] ← R; PC ← PC + 4.
      • LW: Execute: R ← A + SX. Memory: M ← MEM[R]. Write-back: R[rt] ← M; PC ← PC + 4.
      • SW: Execute: R ← A + SX. Memory: MEM[R] ← B; PC ← PC + 4.
      • BEQ & Equal: Execute: PC ← PC + SX || 00.
      • BEQ & ~Equal: Execute: PC ← PC + 4.

  15. Traditional FSM Controller
      [Figure: a truth (transition) table takes the current State (4 bits), op (6 bits) and cond (Equal) as inputs and produces the next State and the control points (11), which go to the datapath.]

  16. Traditional FSM Controller
      datapath + state diagram => control
      • Translate RTN statements into control points.
      • Assign states.
      • Implement the controller.

  17. Mapping RTNs To Control Points: Examples & State Assignments
      • 0000 “instruction fetch”: IR ← MEM[PC] (control points: imem_rd, IRen)
      • 0001 “decode / operand fetch”: A ← R[rs]; B ← R[rt] (control points: Aen, Ben)
      Execute:
      • 0100 R-type: R ← A fun B
      • 0110 ORi: R ← A or ZX
      • 1000 LW: R ← A + SX
      • 1011 SW: R ← A + SX
      • 0010 BEQ & Equal: PC ← PC + SX || 00 (to instruction fetch state 0000)
      • 0011 BEQ & ~Equal: PC ← PC + 4 (to instruction fetch state 0000)
      Memory:
      • 1001 LW: M ← MEM[S]
      • 1100 SW: MEM[S] ← B; PC ← PC + 4 (to instruction fetch state 0000)
      Write-back (all return to instruction fetch state 0000):
      • 0101 R-type: R[rd] ← R; PC ← PC + 4
      • 0111 ORi: R[rt] ← R; PC ← PC + 4
      • 1010 LW: R[rt] ← M; PC ← PC + 4
      Other example control points shown in the figure: ALUfun, Sen (Execute); RegDst, RegWr, PCen (Write-back).

  18. Detailed Control Specification
      Column groups: State | Op field | Eq | Next | IR | PC | Ops | Exec | Mem | Write-Back
      Sub-columns:   en sel | A B | Ex Sr ALU S | R W M | M-R Wr Dst
      Each row lists State, Op field, Eq and Next, then its non-blank control entries in column order.

      State  Op field  Eq  Next   Control entries
      0000   ??????    ?   0001   1
      0001   BEQ       0   0011   1 1
      0001   BEQ       1   0010   1 1
      0001   R-type    x   0100   1 1
      0001   orI       x   0110   1 1
      0001   LW        x   1000   1 1
      0001   SW        x   1011   1 1
      0010   xxxxxx    x   0000   1 1            (BEQ)
      0011   xxxxxx    x   0000   1 0            (BEQ)
      0100   xxxxxx    x   0101   0 1 fun 1      (R)
      0101   xxxxxx    x   0000   1 0 0 1 1      (R)
      0110   xxxxxx    x   0111   0 0 or 1       (ORI)
      0111   xxxxxx    x   0000   1 0 0 1 0      (ORI)
      1000   xxxxxx    x   1001   1 0 add 1      (LW)
      1001   xxxxxx    x   1010   1 0 0          (LW)
      1010   xxxxxx    x   0000   1 0 1 1 0      (LW)
      1011   xxxxxx    x   1100   1 0 add 1      (SW)
      1100   xxxxxx    x   0000   1 0 0 1        (SW)
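
The next-state half of this table can be sketched as a row-matching lookup, in the spirit of the truth-table controller. Only the State, Op field, Eq and Next columns are modeled; the control outputs are omitted. The names ROWS and next_state are hypothetical, and None stands in for the table's don't-care entries ("x" and "?").

```python
# (state, op, eq, next_state); None is a don't-care, like "x"/"?" in the table.
ROWS = [
    ("0000", None, None, "0001"),
    ("0001", "BEQ", 0, "0011"),
    ("0001", "BEQ", 1, "0010"),
    ("0001", "R-type", None, "0100"),
    ("0001", "ORi", None, "0110"),
    ("0001", "LW", None, "1000"),
    ("0001", "SW", None, "1011"),
    ("0010", None, None, "0000"),
    ("0011", None, None, "0000"),
    ("0100", None, None, "0101"),
    ("0101", None, None, "0000"),
    ("0110", None, None, "0111"),
    ("0111", None, None, "0000"),
    ("1000", None, None, "1001"),
    ("1001", None, None, "1010"),
    ("1010", None, None, "0000"),
    ("1011", None, None, "1100"),
    ("1100", None, None, "0000"),
]


def next_state(state, op=None, eq=0):
    """Return the Next entry of the first row matching (state, op, eq)."""
    for row_state, row_op, row_eq, nxt in ROWS:
        if row_state == state and row_op in (None, op) and row_eq in (None, eq):
            return nxt
    raise ValueError(f"no table row for state={state}, op={op}, eq={eq}")


print(next_state("0001", "BEQ", eq=1))   # taken branch
print(next_state("1001"))                # LW memory state
```

Only the decode state (0001) looks at the opcode and Equal; every other state has a single successor, which is why most rows are don't-cares.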

  19. Alternative Multiple Cycle Datapath (In Textbook)
      • Minimizes Hardware: 1 memory, 1 adder
      [Figure: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), Extend, << 2, ALU, ALU Control, ALU Out, Target, and the muxes that share them. Control signals: PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg. Status signal: Zero.]

  20. Alternative Multiple Cycle Datapath (In Textbook)
      • Shared instruction/data memory unit
      • A single ALU shared among instructions
      • Shared units require additional or widened multiplexors
      • Temporary registers to hold data between clock cycles of the instruction:
        • Additional registers: Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut

  21. Operations In Each Cycle
      Common to all instructions:
        Instruction Fetch:  IR ← Mem[PC]; PC ← PC + 4
        Instruction Decode: A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x 4)
      Remaining cycles by instruction:
      • Logic Immediate
        Execution:  ALUout ← A OR ZeroExt[imm16]
        Write Back: R[rt] ← ALUout
      • Store
        Execution:  ALUout ← A + SignExt(imm16)
        Memory:     Mem[ALUout] ← B
      • Load
        Execution:  ALUout ← A + SignExt(imm16)
        Memory:     M ← Mem[ALUout]
        Write Back: R[rt] ← M
      • R-Type
        Execution:  ALUout ← A + B
        Write Back: R[rd] ← ALUout
      • Branch
        Execution:  If Equal = 1, PC ← ALUout
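
One consequence of sharing a single ALU is that the branch target is computed early, during decode, from the already incremented PC. The sketch below walks a BEQ through its three cycles under that scheme; the register-file dict and the function names are assumptions made for the example.

```python
def sign_ext16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_beq(regs, pc, rs, rt, imm16):
    """Return the next PC for BEQ rs, rt, imm16 on the shared-ALU datapath."""
    # Cycle 1 (fetch): IR <- Mem[PC]; PC <- PC + 4 (the ALU does the increment).
    pc = pc + 4
    # Cycle 2 (decode): A <- R[rs]; B <- R[rt];
    # the otherwise idle ALU computes ALUout <- PC + SignExt(imm16) x 4.
    a, b = regs[rs], regs[rt]
    alu_out = pc + sign_ext16(imm16) * 4
    # Cycle 3 (execute): the ALU compares A and B; if equal, PC <- ALUout.
    if a == b:
        pc = alu_out
    return pc


regs = {1: 5, 2: 5, 3: 7}
print(execute_beq(regs, 100, 1, 2, 3))   # registers equal: branch taken
print(execute_beq(regs, 100, 1, 3, 3))   # registers differ: fall through
```

Because the target is already waiting in ALUout, the branch needs only one cycle after decode.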

  22. High-Level View of Finite State Machine Control
      • First steps are independent of the instruction class.
      • Then a series of sequences that depend on the instruction opcode.
      • Then the control returns to fetch a new instruction.
      • Each box in the diagram represents one or several states.

  23. Instruction Fetch and Decode FSM States

  24. Load/Store Instructions FSM States

  25. R-Type Instructions FSM States

  26. Branch Instruction Single State; Jump Instruction Single State

  27. Finite State Machine (FSM) Specification
      • 0000 “instruction fetch”: IR ← MEM[PC]; PC ← PC + 4
      • 0001 “decode”: A ← R[rs]; B ← R[rt]; ALUout ← PC + SX
      Execute:
      • 0100 R-type: ALUout ← A fun B
      • 0110 ORi: ALUout ← A op ZX
      • 1000 LW: ALUout ← A + SX
      • 1011 SW: ALUout ← A + SX
      • 0010 BEQ: If A = B then PC ← ALUout (to instruction fetch)
      Memory:
      • 1001 LW: M ← MEM[ALUout]
      • 1100 SW: MEM[ALUout] ← B (to instruction fetch)
      Write-back (all return to instruction fetch):
      • 0101 R-type: R[rd] ← ALUout
      • 0111 ORi: R[rt] ← ALUout
      • 1010 LW: R[rt] ← M
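
Since each state takes one clock cycle, walking this state machine from fetch until it returns to fetch gives the cycle count of an instruction. The sketch below does that walk; the dictionary layout and the name cycles_for are hypothetical, while the state codes are the ones assigned above.

```python
# Successor of each state that has exactly one next state.
NEXT = {
    "0000": "0001",                                   # fetch -> decode
    "0100": "0101", "0101": "0000",                   # R-type: execute, write-back
    "0110": "0111", "0111": "0000",                   # ORi: execute, write-back
    "1000": "1001", "1001": "1010", "1010": "0000",   # LW: execute, memory, write-back
    "1011": "1100", "1100": "0000",                   # SW: execute, memory
    "0010": "0000",                                   # BEQ: single execute state
}

# The decode state (0001) dispatches on the opcode class.
DISPATCH = {"R-type": "0100", "ORi": "0110", "LW": "1000", "SW": "1011", "BEQ": "0010"}


def cycles_for(op):
    """Count the states visited from fetch until control returns to fetch."""
    state, cycles = "0000", 0
    while True:
        cycles += 1                                   # one clock cycle per state
        state = DISPATCH[op] if state == "0001" else NEXT[state]
        if state == "0000":
            return cycles


for op in DISPATCH:
    print(op, cycles_for(op))
```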

  28. MIPS Multi-cycle Datapath Performance Evaluation
      • What is the average CPI?
        • State diagram gives CPI for each instruction type
        • Workload below gives frequency of each type

      Type          CPIi for type   Frequency   CPIi x freqIi
      Arith/Logic   4               40%         1.6
      Load          5               30%         1.5
      Store         4               10%         0.4
      Branch        3               20%         0.6
      Average CPI:                              4.1

      Better than CPI = 5 if all instructions took the same number of clock cycles (5).
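
The average is a frequency-weighted sum of the per-type CPI values. A minimal sketch using the numbers from the table above (the variable names are hypothetical):

```python
# (CPI, frequency in percent) per instruction type, taken from the workload table.
WORKLOAD = {
    "Arith/Logic": (4, 40),
    "Load": (5, 30),
    "Store": (4, 10),
    "Branch": (3, 20),
}

# The frequencies must cover the whole workload.
assert sum(pct for _, pct in WORKLOAD.values()) == 100

# Average CPI = sum over types of CPI_i x frequency_i.
average_cpi = sum(cpi * pct for cpi, pct in WORKLOAD.values()) / 100

for name, (cpi, pct) in WORKLOAD.items():
    print(f"{name:12s} CPI {cpi}  {pct:3d}%  contributes {cpi * pct / 100:.1f}")
print(f"average CPI = {average_cpi:.1f}")
```

Changing the instruction mix changes the result: a workload with more loads moves the average toward 5, one with more branches moves it toward 3.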
