CPE 232 Computer Organization Basic MIPS Architecture – Part II

CPE 232 Computer Organization Basic MIPS Architecture – Part II. Dr. Iyad Jafar Adapted from Dr. Gheith Abandah slides http://www.abandah.com/gheith/Courses/CPE335_S08/index.html. Multicycle Datapath Approach. Let an instruction take more than 1 clock cycle to complete

Presentation Transcript


  1. CPE 232 Computer Organization: Basic MIPS Architecture – Part II
     Dr. Iyad Jafar
     Adapted from Dr. Gheith Abandah's slides: http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

  2. Multicycle Datapath Approach
     • Let an instruction take more than one clock cycle to complete.
     • Break up instructions into steps, where each step takes one cycle, while trying to
       • balance the amount of work done in each step
       • restrict each cycle to use only one major functional unit, unless units are used in parallel
     • Not every instruction takes the same number of clock cycles.
     • In addition to faster clock rates, the multicycle approach allows a functional unit to be used more than once per instruction, as long as it is used on different clock cycles. As a result:
       • Only one memory is needed, but only one memory access is possible per cycle.
       • Only one ALU/adder is needed, but only one ALU operation is possible per cycle.

  3. Multicycle Datapath Approach, cont'd
     [Figure: multicycle datapath showing the PC, Memory (instructions or data), IR, MDR, Register File, the A and B registers, the ALU, and ALUOut]
     • At the end of a cycle, values that the current instruction needs in a later cycle are stored in internal registers (A, B, IR, MDR, and ALUOut). These registers are invisible to the programmer.
       • IR: Instruction Register
       • MDR: Memory Data Register
       • A, B: register file read data registers
       • ALUOut: ALU output register
     • All of these registers except IR hold data only between a pair of adjacent clock cycles, so they do not need a write control signal.
     • Data used by subsequent instructions is stored in programmer-visible state (i.e., the register file, the PC, or memory).

  4. Multicycle Datapath Approach, cont'd
     • As in the single-cycle design, shared functional units need multiplexers at their inputs.
     • There is only one adder (the ALU). It is used to update the PC, perform ALU operations, compare operands for beq, compute memory addresses, and compute branch target addresses.

  5. Multicycle Datapath Approach- Control Signals

  6. The Multicycle Datapath with Control Signals
     [Figure: the multicycle datapath (PC, Memory, IR, MDR, Register File, A, B, ALU, ALUOut, sign extend, shift left 2, ALU control) with a control unit driven by Instr[31-26]. Control signals shown: PCWriteCond, PCWrite, PCSource, IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp. The jump address is formed from PC[31-28] and Instr[25-0] shifted left by 2. The ALU control also takes Instr[5-0].]

  7. Multicycle Machine: 1-bit Control Signals

  8. Multicycle Machine: 2-bit Control Signals

  9. Breaking Instruction Execution into Clock Cycles
     [Figure: the five steps, one per clock cycle: IFetch, Dec, Exec, Mem, WB]
     1. IFetch: instruction fetch and PC update (same for all instructions)
     • Operations
       1.1 Instruction fetch: IR <= Memory[PC]
       1.2 Update PC: PC <= PC + 4
     • Control signal values
       • IorD = 0, MemRead = 1, IRWrite = 1
       • ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCWrite = 1
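The fetch step can be sketched in Python. This is only an illustration under stated assumptions: the dictionary-based `memory`, the `state` dictionary, and the function name `ifetch` are hypothetical and do not come from the slides.

```python
def ifetch(state, memory):
    """Step 1: IR <= Memory[PC] and PC <= PC + 4, both in the same cycle."""
    new_state = dict(state)
    # IorD = 0 selects the PC as the memory address; MemRead = 1, IRWrite = 1.
    new_state["IR"] = memory[state["PC"]]
    # ALUSrcA = 0 (PC), ALUSrcB = 01 (constant 4), ALUOp = 00 (add), PCWrite = 1.
    new_state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF
    return new_state


# Hypothetical one-instruction program: lw $t0, 4($t1) at address 0x00400000.
memory = {0x00400000: 0x8D280004}
state = ifetch({"PC": 0x00400000, "IR": 0}, memory)
print(hex(state["IR"]), hex(state["PC"]))  # 0x8d280004 0x400004
```

Both assignments read the old PC, which mirrors the hardware: the memory read and the ALU addition happen in parallel during the same clock cycle.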

  10. Breaking Instruction Execution into Clock Cycles
      2. Dec: instruction decode and register fetch (same for all instructions)
      The instruction is not yet known, so only harmless operations are performed.
      • Operations
        2.1 Read the two source registers rs and rt and place them in registers A and B, respectively:
            A <= Reg[IR[25:21]]
            B <= Reg[IR[20:16]]
        2.2 Compute the branch target address:
            ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
      • Control signal values
        • ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
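The decode step is mostly bit-field extraction, which a short Python sketch can make concrete. The helper names `sign_extend16` and `decode`, and the list-based register file, are assumptions made for illustration.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(ir, pc, regs):
    """Step 2: read rs and rt, and speculatively compute the branch target."""
    a = regs[(ir >> 21) & 0x1F]  # A <= Reg[IR[25:21]]
    b = regs[(ir >> 16) & 0x1F]  # B <= Reg[IR[20:16]]
    # ALUSrcA = 0 (PC), ALUSrcB = 11 (sign-extended offset << 2), ALUOp = 00 (add).
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    return a, b, alu_out


regs = [0] * 32
regs[8], regs[9] = 5, 7
# beq $t0, $t1, -2 is encoded as 0x1109FFFE; the PC was already incremented in step 1.
a, b, alu_out = decode(0x1109FFFE, 0x00400008, regs)
print(a, b, hex(alu_out))  # 5 7 0x400000
```

The branch target is computed even when the instruction turns out not to be a branch. That is harmless because ALUOut is simply overwritten in the next step.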

  11. Breaking Instruction Execution into Clock Cycles
      3. Execution, memory address computation, or branch completion
      The operation in this cycle depends on the instruction type.
      • Operations
        • If memory reference, compute the address:
            ALUOut <= A + sign-extend(IR[15:0])
            ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
        • If arithmetic-logic instruction, perform the operation:
            ALUOut <= A op B
            ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
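As a rough sketch of how ALUOp selects the operation in this step, the Python below models a small subset of the ALU. The function name `execute` and the `FUNCT_OPS` table are hypothetical. Only four R-type funct codes are included, so this is not a complete ALU control.

```python
# Standard MIPS funct codes for a few R-type instructions (subset only).
FUNCT_OPS = {
    0x20: lambda a, b: a + b,  # add
    0x22: lambda a, b: a - b,  # sub
    0x24: lambda a, b: a & b,  # and
    0x25: lambda a, b: a | b,  # or
}


def execute(alu_op, a, operand, funct=None):
    """Step 3: ALUOut <= result, with the operation chosen by ALUOp."""
    if alu_op == 0b00:    # add: memory address computation
        result = a + operand
    elif alu_op == 0b01:  # subtract: beq comparison
        result = a - operand
    elif alu_op == 0b10:  # R-type: operation taken from the funct field
        result = FUNCT_OPS[funct](a, operand)
    else:
        raise ValueError("unsupported ALUOp")
    return result & 0xFFFFFFFF  # keep the result to 32 bits


address = execute(0b00, 0x10010000, 4)        # lw/sw: A + sign-extended offset
difference = execute(0b10, 5, 7, funct=0x22)  # sub: A - B
print(hex(address), hex(difference))  # 0x10010004 0xfffffffe
```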

  12. Breaking Instruction Execution into Clock Cycles
      3. Execution, memory address computation, or branch completion (continued)
      The operation in this cycle depends on the instruction type.
      • Operations
        • If branch instruction:
            if (A == B) PC <= ALUOut
            ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond = 1, PCSource = 01
        • If jump instruction:
            PC <= {PC[31:28], IR[25:0], 2'b00}
            PCSource = 10, PCWrite = 1
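Branch and jump completion can be illustrated with a small Python sketch. The function name `complete_branch_or_jump` is an assumption. The opcode constants are the standard MIPS encodings for beq and j.

```python
BEQ, J = 0x04, 0x02  # standard MIPS opcodes


def complete_branch_or_jump(ir, pc, a, b, alu_out):
    """Step 3 for beq and j: return the next PC value."""
    opcode = (ir >> 26) & 0x3F
    if opcode == BEQ:
        # The ALU subtracts (ALUOp = 01); Zero together with PCWriteCond = 1
        # loads the branch target held in ALUOut (PCSource = 01).
        zero = ((a - b) & 0xFFFFFFFF) == 0
        return alu_out if zero else pc
    if opcode == J:
        # PC <= {PC[31:28], IR[25:0], 2'b00} (PCSource = 10, PCWrite = 1).
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
    return pc  # other instructions leave the PC unchanged in this step


taken = complete_branch_or_jump(0x1109FFFE, 0x00400008, 7, 7, 0x00400000)
not_taken = complete_branch_or_jump(0x1109FFFE, 0x00400008, 5, 7, 0x00400000)
jumped = complete_branch_or_jump(0x08100004, 0x00400008, 0, 0, 0)
print(hex(taken), hex(not_taken), hex(jumped))  # 0x400000 0x400008 0x400010
```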

  13. Breaking Instruction Execution into Clock Cycles
      4. Memory access or R-type completion
      The operation in this cycle depends on the instruction type.
      • Operations
        • If load instruction, read the value from memory into MDR:
            MDR <= Memory[ALUOut]
            MemRead = 1, IorD = 1
        • If store instruction, store rt into memory:
            Memory[ALUOut] <= B
            MemWrite = 1, IorD = 1
        • If arithmetic-logic instruction, write the ALU result into rd:
            Reg[IR[15:11]] <= ALUOut
            MemtoReg = 0, RegDst = 1, RegWrite = 1

  14. Breaking Instruction Execution into Clock Cycles
      5. Memory read completion (needed for the load instruction only)
      • Operations
        5.1 Write the loaded value held in MDR into rt:
            Reg[IR[20:16]] <= MDR
            RegWrite = 1, MemtoReg = 1, RegDst = 0
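Steps 4 and 5 can be sketched together in Python. The `kind` string is a hypothetical stand-in for the decoded opcode, and the function names and the dictionary-based memory are likewise assumptions made for illustration.

```python
def memory_access(kind, ir, alu_out, b, regs, memory):
    """Step 4. Returns the new MDR value for a load, otherwise None."""
    if kind == "lw":
        return memory[alu_out]             # MDR <= Memory[ALUOut]  (IorD = 1, MemRead = 1)
    if kind == "sw":
        memory[alu_out] = b                # Memory[ALUOut] <= B    (IorD = 1, MemWrite = 1)
    elif kind == "rtype":
        regs[(ir >> 11) & 0x1F] = alu_out  # Reg[IR[15:11]] <= ALUOut (RegDst = 1, MemtoReg = 0)
    return None


def load_writeback(ir, mdr, regs):
    """Step 5 (loads only): Reg[IR[20:16]] <= MDR (RegDst = 0, MemtoReg = 1)."""
    regs[(ir >> 16) & 0x1F] = mdr


regs = [0] * 32
memory = {0x10010004: 42}
# lw $t0, 4($t1) is 0x8D280004; assume step 3 already produced ALUOut = 0x10010004.
mdr = memory_access("lw", 0x8D280004, 0x10010004, 0, regs, memory)
load_writeback(0x8D280004, mdr, regs)
# sw $t0, 8($t1) is 0xAD280008; B holds the value of $t0.
memory_access("sw", 0xAD280008, 0x10010008, regs[8], regs, memory)
print(regs[8], memory[0x10010008])  # 42 42
```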

  15. Breaking Instruction Execution into Clock Cycles • In this implementation, not all instructions take 5 cycles

  16. Multicycle Performance
      • Compute the average CPI of the multicycle implementation for a SPECINT2000 program with the following instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU. Assume the CPI for each instruction class given in the previous table.
      • CPI = Σ (CPI_i x IC_i) / IC = 0.25 x 5 + 0.10 x 4 + 0.11 x 3 + 0.02 x 3 + 0.52 x 4 = 4.12
      • Compare this to CPI = 1 for the single-cycle implementation.
      • Assume CC_M = (1/5) CC_S, where CC_M and CC_S are the multicycle and single-cycle clock cycle times.
      • Then Performance_M / Performance_S = (IC x 1 x CC_S) / (IC x 4.12 x (1/5) CC_S) = 1.21
      • Multicycle is also cost-effective in terms of hardware.
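The arithmetic on this slide can be checked with a few lines of Python. The variable names are illustrative. The instruction mix, per-class cycle counts, and the 1/5 clock ratio are the ones stated above.

```python
# Instruction mix (fraction of executed instructions) and cycles per class.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

# Weighted average CPI of the multicycle implementation.
cpi_multi = sum(mix[k] * cycles[k] for k in mix)

# Single cycle: CPI = 1 with clock cycle CC_S. Multicycle: CC_M = CC_S / 5.
cc_single = 1.0
cc_multi = cc_single / 5
speedup = (1 * cc_single) / (cpi_multi * cc_multi)

print(round(cpi_multi, 2), round(speedup, 2))  # 4.12 1.21
```

The instruction count IC cancels in the ratio, so it does not appear in the code.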

  17. Multicycle Control Unit
      [Figure: control unit block diagram. Combinational control logic takes the instruction opcode and the State Register as inputs and produces the datapath control points and the next state.]
      • Multicycle datapath control signals are not determined solely by the bits of the instruction.
        • e.g., the opcode bits tell what operation the ALU should perform, but not which instruction cycle comes next
      • Since the instruction is broken into multiple cycles, we need to know what was done in the previous cycle(s) to determine the current action.
      • Control must therefore use a finite state machine (FSM), which has
        • a set of states (the current state is stored in the State Register)
        • a next-state function (determined by the current state and the input)
        • an output function (determined by the current state and the input)

  18. The States of the Control Unit • 10 states are required in the FSM control • The sequence of states is determined by five steps of execution and the instruction
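A next-state function for a 10-state controller might look like the Python sketch below. The state diagram did not survive in this transcript, so the state numbering (0 = fetch, 1 = decode, 2 = address computation, 3 = memory read, 4 = load write-back, 5 = memory write, 6 = R-type execute, 7 = R-type completion, 8 = branch completion, 9 = jump completion) is an assumption that follows a common textbook convention. The opcode constants are the standard MIPS encodings.

```python
RTYPE, J, BEQ, LW, SW = 0x00, 0x02, 0x04, 0x23, 0x2B  # standard MIPS opcodes


def next_state(state, opcode):
    """Next-state function of the assumed 10-state FSM."""
    if state == 0:                       # fetch -> decode
        return 1
    if state == 1:                       # decode: branch on the opcode
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:                       # address computed: load or store path
        return {LW: 3, SW: 5}[opcode]
    if state in (3, 6):                  # memory read -> write-back, R execute -> completion
        return state + 1
    return 0                             # states 4, 5, 7, 8, 9 end the instruction


def cycle_count(opcode):
    """Number of clock cycles (states visited) for one instruction."""
    state, count = 0, 0
    while True:
        count += 1
        state = next_state(state, opcode)
        if state == 0:
            return count


print([cycle_count(op) for op in (LW, SW, RTYPE, BEQ, J)])  # [5, 4, 4, 3, 3]
```

The resulting cycle counts match the per-class values used in the CPI calculation on the performance slide.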

  19. The Control Unit
      • Logic gates
        • inputs: present state + opcode (10 bits)
        • outputs: control signals + next state (20 bits)
        • truth table size = 2^10 rows x 20 columns
      • ROM
        • Can be used to implement the truth table above (2^10 x 20 bits = 20 Kbit)
        • Each location stores the control signal values and the next state
        • Each location is addressed by the opcode and the present state value
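The bit counts above can be reproduced with a short Python calculation. The signal widths in `CONTROL_WIDTHS` are inferred from the control signal values used in the earlier slides (two-bit values for ALUSrcB, ALUOp, and PCSource, one-bit values elsewhere), so treat the table as an assumption rather than the deck's exact listing.

```python
# Assumed control signal widths in bits.
CONTROL_WIDTHS = {
    "PCWriteCond": 1, "PCWrite": 1, "IorD": 1, "MemRead": 1, "MemWrite": 1,
    "IRWrite": 1, "MemtoReg": 1, "RegWrite": 1, "RegDst": 1, "ALUSrcA": 1,
    "ALUSrcB": 2, "ALUOp": 2, "PCSource": 2,
}

num_states = 10
opcode_bits = 6                                 # MIPS opcode field, Instr[31-26]
state_bits = (num_states - 1).bit_length()      # bits needed to encode 10 states

address_bits = state_bits + opcode_bits         # ROM address: present state + opcode
word_bits = sum(CONTROL_WIDTHS.values()) + state_bits  # control signals + next state
rom_bits = 2 ** address_bits * word_bits

print(address_bits, word_bits, rom_bits // 1024)  # 10 20 20
```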

  20. Micro-programmed Control Unit
      • The ROM implementation is error-prone and expensive, especially as the design becomes more complex: its size grows as the number of instructions (states) increases.
      • Use microprogramming instead
        • The next state value may not be sequential
        • Generate the next state outside the storage element
        • Each state is a microinstruction, and the signals are specified symbolically
        • Use labels for sequencing

  21. Microprogram • The microassembler converts the microcode into actual signal values • The sequencing field is used along with the opcode to determine the next state
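To illustrate how a sequencing field and the opcode together select the next microinstruction, here is a minimal Python sketch. The microprogram table itself is missing from this transcript, so the labels (`Fetch`, `Mem1`, `LW2`, and so on), the sequencing values, and the two dispatch tables are assumptions modeled on a common textbook layout. The control signal fields are omitted entirely.

```python
# Each microinstruction: (label or None, sequencing field).
MICROPROGRAM = [
    ("Fetch",    "seq"),        # 0: instruction fetch
    (None,       "dispatch1"),  # 1: decode, then dispatch on the opcode
    ("Mem1",     "dispatch2"),  # 2: memory address computation
    ("LW2",      "seq"),        # 3: memory read
    (None,       "fetch"),      # 4: load write-back
    ("SW2",      "fetch"),      # 5: memory write
    ("Rformat1", "seq"),        # 6: R-type execute
    (None,       "fetch"),      # 7: R-type completion
    ("BEQ1",     "fetch"),      # 8: branch completion
    ("JUMP1",    "fetch"),      # 9: jump completion
]
LABELS = {label: i for i, (label, _) in enumerate(MICROPROGRAM) if label}
DISPATCH1 = {"rtype": "Rformat1", "lw": "Mem1", "sw": "Mem1", "beq": "BEQ1", "j": "JUMP1"}
DISPATCH2 = {"lw": "LW2", "sw": "SW2"}


def next_upc(upc, instr):
    """Sequencer: choose the next microinstruction address."""
    seq = MICROPROGRAM[upc][1]
    if seq == "seq":
        return upc + 1
    if seq == "fetch":
        return LABELS["Fetch"]
    table = DISPATCH1 if seq == "dispatch1" else DISPATCH2
    return LABELS[table[instr]]


def trace(instr):
    """Microinstruction addresses visited while executing one instruction."""
    upc, visited = LABELS["Fetch"], []
    while True:
        visited.append(upc)
        upc = next_upc(upc, instr)
        if upc == LABELS["Fetch"]:
            return visited


print(trace("lw"), trace("sw"))  # [0, 1, 2, 3, 4] [0, 1, 2, 5]
```

The lookup from label to address in `LABELS` plays the role of the microassembler resolving symbolic labels into actual values.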

  22. Sequencer

  23. Multicycle Advantages & Disadvantages
      [Figure: multicycle timing over clock cycles 1 to 10. lw takes IFetch, Dec, Exec, Mem, WB (cycles 1 to 5); sw takes IFetch, Dec, Exec, Mem (cycles 6 to 9); the R-type instruction begins with IFetch in cycle 10.]
      • Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step
      • Allows functional units to be used more than once per instruction, as long as they are used on different clock cycles
      but
      • Requires additional internal state registers, more muxes, and more complicated (FSM) control

  24. Single Cycle vs. Multiple Cycle Timing
      [Figure: timing comparison. Single-cycle implementation: lw occupies cycle 1 and sw occupies cycle 2, with part of the sw cycle wasted. Multiple-cycle implementation: lw takes IFetch, Dec, Exec, Mem, WB (cycles 1 to 5), sw takes IFetch, Dec, Exec, Mem (cycles 6 to 9), and the R-type instruction begins with IFetch in cycle 10.]
      • The multicycle clock period is longer than 1/5 of the single-cycle clock period because of state register overhead.
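The timing comparison for the lw, sw, R-type sequence in the figure can be worked out with a small Python sketch. It assumes the ideal case where the multicycle clock period is exactly 1/5 of the single-cycle period. The slide notes that the real multicycle clock is somewhat slower than that, so the multicycle figure here is a best case. The names are illustrative.

```python
from fractions import Fraction

# Steps (clock cycles) per instruction in the multicycle implementation.
STEPS = {"lw": 5, "sw": 4, "rtype": 4}

single_period = Fraction(1)        # single-cycle clock period, in arbitrary units
multi_period = single_period / 5   # ideal multicycle period (assumption)

program = ["lw", "sw", "rtype"]
single_time = len(program) * single_period
multi_time = sum(STEPS[instr] for instr in program) * multi_period

print(single_time, multi_time)  # 3 13/5
```

Under this assumption the multicycle machine finishes the three instructions in 13/5 of a single-cycle period instead of 3, because sw and the R-type instruction skip the steps they do not need.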
