Chapter Five Part 4: Implementing Multicycle Control

Implementing the Control
• The values of the control signals depend on:
  • which instruction is being executed
  • which step is being performed
• Use the information we've accumulated to specify a finite state machine (FSM), either:
  • graphically, or
  • with microprogramming
• The implementation can then be derived from the specification

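Defining control
• A finite state machine consists of:
  • a set of states
  • a next-state function that maps the current state and the inputs to a new state
  • for each state, the set of outputs that are asserted in that state
• Assume that any signal that is not explicitly asserted is deasserted
• Multiplexor controls are the exception: a mux selects one of its inputs whether its control is 0 or 1, so the FSM must always specify the setting of every mux control whose value matters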

Defining control
• The finite state control corresponds to the 5 execution steps
• Each state in the FSM takes 1 clock cycle
• Since the first two steps are common to all instructions, the first two states of the FSM are identical for every instruction
• After executing the last step for an instruction class, the FSM returns to the initial state to begin fetching the next instruction
• A high-level view of the FSM control appears on the next slide

Defining control

Defining control
• Figure 5.37 shows the first two states of the FSM, which are the same for all instructions
• The first state is state 0
• The signals asserted in each state are shown within the circle representing that state
• Arcs between states are labeled with the conditions that select a specific next state
• After state 1, the next state depends on the instruction type:
  • 4 arcs exit state 1, one for each of the 4 instruction types
• This branching based on the instruction type is called decoding
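As a minimal sketch (not the course's own implementation), the next-state function of the complete 10-state FSM can be written as a Python function. It assumes the standard textbook state numbering (0 fetch, 1 decode, 2 memory-address computation, 3 and 5 memory access for lw and sw, 4 lw write-back, 6 and 7 R-format execute and completion, 8 branch completion, 9 jump completion) and the standard MIPS opcodes; the names `next_state`, `trace`, and the opcode constants are hypothetical.

```python
# Hypothetical sketch of the multicycle FSM's next-state function, assuming the
# standard 10-state numbering and standard MIPS opcode values.
LW, SW, R_TYPE, BEQ, J = 0b100011, 0b101011, 0b000000, 0b000100, 0b000010

def next_state(state: int, op: int) -> int:
    """Map (current state, opcode) to the next state."""
    if state == 0:             # instruction fetch, common to all instructions
        return 1
    if state == 1:             # decode: 4 arcs, one per instruction type
        if op in (LW, SW):
            return 2           # memory-address computation
        if op == R_TYPE:
            return 6           # R-format execution
        if op == BEQ:
            return 8           # branch completion
        if op == J:
            return 9           # jump completion
        raise ValueError(f"unsupported opcode {op:06b}")
    if state == 2:             # second opcode-dependent choice: lw vs sw
        return 3 if op == LW else 5
    if state == 3:
        return 4               # lw: memory read -> register write-back
    if state == 6:
        return 7               # R-format: execute -> register write-back
    if state in (4, 5, 7, 8, 9):
        return 0               # last step of a class: fetch the next instruction
    raise ValueError(f"undefined state {state}")

def trace(op: int) -> list[int]:
    """States visited by one instruction, starting from fetch (state 0)."""
    states, s = [0], next_state(0, op)
    while s != 0:
        states.append(s)
        s = next_state(s, op)
    return states

print(trace(LW))   # [0, 1, 2, 3, 4]
print(trace(BEQ))  # [0, 1, 8]
```

The length of each trace equals the number of clock cycles that instruction class needs, since each state takes one cycle.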

Defining control

Graphical Specification of FSM
• How many state bits will we need?

Simple Questions
• How many cycles will it take to execute this code?

        lw   $t2, 0($t3)
        lw   $t3, 4($t3)
        beq  $t2, $t3, Label   # assume not taken
        add  $t5, $t2, $t3
        sw   $t5, 8($t3)
Label:  ...

• What is going on during the 8th cycle of execution?
• In what cycle does the actual addition of $t2 and $t3 take place?
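One way to check answers to these questions is to lay out a cycle-by-cycle timeline. This sketch assumes the per-class cycle counts of the multicycle implementation (loads 5, stores 4, R-format 4, branches 3) and uses hypothetical step names; a not-taken beq still takes its full 3 cycles.

```python
# Sketch: cycle-by-cycle timeline for the code above. Step names are
# illustrative labels; the number of steps per class matches the cycle counts.
STEPS = {
    "lw":  ["fetch", "decode", "address", "memory read", "write-back"],  # 5
    "sw":  ["fetch", "decode", "address", "memory write"],               # 4
    "add": ["fetch", "decode", "execute", "write-back"],                 # 4
    "beq": ["fetch", "decode", "branch completion"],                     # 3
}

program = ["lw", "lw", "beq", "add", "sw"]  # beq assumed not taken

def schedule(prog):
    """Return one (instruction index, opcode, step name) tuple per clock cycle."""
    timeline = []
    for i, op in enumerate(prog):
        for step in STEPS[op]:
            timeline.append((i, op, step))
    return timeline

t = schedule(program)
print(len(t))      # 21
print(t[8 - 1])    # (1, 'lw', 'address')  -- cycle 8, counting from 1
add_exec = next(c for c, e in enumerate(t, 1) if e[1] == "add" and e[2] == "execute")
print(add_exec)    # 16
```

Under these assumptions the sequence takes 21 cycles; in cycle 8 the ALU computes the address $t3 + 4 for the second lw, and the add instruction's ALU addition happens in cycle 16.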

Multi Cycle: performance
• The number of clock cycles for each instruction class is:
  • Loads: 5
  • Stores: 4
  • R-format instructions: 4
  • Branches: 3
  • Jumps: 3
• Assume the instruction mix is:
  • 22% loads
  • 11% stores
  • 49% R-format
  • 16% branches
  • 2% jumps

Multi Cycle: performance
• Average cycles per instruction (CPI):

$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}} = \frac{\sum_i \text{Instruction count}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_i \frac{\text{Instruction count}_i}{\text{Instruction count}} \times \text{CPI}_i$$

• The ratio Instruction count_i / Instruction count is simply the frequency of instruction class i.
• Thus the answer is just the sum of the frequencies times their corresponding CPIs:

$$\text{CPI} = 5 \times 22\% + 4 \times 11\% + 4 \times 49\% + 3 \times 16\% + 3 \times 2\% = 4.04$$

• The worst-case CPI (if all instructions took 5 clock cycles) is 5.
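The same weighted sum as a short Python check, using the instruction mix and per-class cycle counts given above (the dictionary layout is just one convenient choice):

```python
# Weighted-average CPI: sum over classes of (frequency x cycles per instruction).
mix = {                 # class: (frequency, cycles per instruction)
    "load":     (0.22, 5),
    "store":    (0.11, 4),
    "R-format": (0.49, 4),
    "branch":   (0.16, 3),
    "jump":     (0.02, 3),
}
# Sanity check: the frequencies cover 100% of instructions.
assert abs(sum(f for f, _ in mix.values()) - 1.0) < 1e-9

cpi = sum(freq * cycles for freq, cycles in mix.values())
print(f"average CPI = {cpi:.2f}")    # average CPI = 4.04

worst = max(cycles for _, cycles in mix.values())
print(f"worst-case CPI = {worst}")   # worst-case CPI = 5
```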

Finite State Machine for Control
• FSM implementation:
  • a state register holds the current state
  • a combinational logic block determines the datapath control signals and the next state

Finite State Machine for Control

Finite State Machine for Control
• Expanded view of the FSM implementation: see next slide
• With 10 states, we need 4 bits to encode the state (S3, S2, S1, S0)
• The current state number is stored in a state register
• Example: state 0110 (state 6) means ~S3 S2 S1 ~S0
• The control unit has outputs that specify the next state: NS3, NS2, NS1, NS0
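A small sketch of the state-encoding arithmetic: the number of state bits is the ceiling of log2 of the number of states, and each state number corresponds to one product term over S3..S0. The helper `minterm` is hypothetical, written only for illustration.

```python
import math

NUM_STATES = 10
bits = math.ceil(math.log2(NUM_STATES))  # 4 state bits: S3 S2 S1 S0

def minterm(state: int, nbits: int = bits) -> str:
    """Product term that is true only in `state`, e.g. 6 -> '~S3 S2 S1 ~S0'."""
    lits = []
    for i in reversed(range(nbits)):          # most significant bit first
        lits.append(f"S{i}" if (state >> i) & 1 else f"~S{i}")
    return " ".join(lits)

print(bits)         # 4
print(minterm(6))   # ~S3 S2 S1 ~S0
```

Three bits would cover only 8 states, so 10 states force a fourth bit; codes 10 through 15 are unused and can be treated as don't-cares.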

Finite State Machine for Control

Combinational Logic
• Two parts:
  • determining the control signals, which depends only on the state bits
  • determining the next state, which depends on the current state and the opcode
• The control function can be expressed as a logic equation for each output
• Two ways to implement it:
  • a complete truth table
  • a two-level logic structure that allows a sparse encoding of the truth table

Combinational Logic
• A complete truth-table implementation is on the next slide
• Split the control function into two parts:
  • next-state outputs, which depend on all inputs
  • control signal outputs, which depend only on the current-state bits
• Logic equations: see the table on the next slide
  • Column 2 contains the states in which each control signal is active; this information comes from the FSM
  • Column 3 is used to help determine the next state
  • When a next state is active, the bits NS[3-0] are set to its binary value
  • Each NS bit is active in multiple next states, so the equation for that bit is the OR of the terms for those states
  • Each term must also be ANDed with the appropriate opcode condition when the transition depends on the opcode

Combinational Logic

Combinational Logic

Creating truth tables: next state • From the preceding tables, we can create truth tables for each next state bit. • The tables need only list the states in which the bit is active.

Creating truth tables: next state
Truth table for the NS0 output, which is active when the next state is 1, 3, 5, 7, or 9. This occurs when the current state is 0, 1, 2, or 6.

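Deriving equations: low-order next-state bit NS0
• NS0 is active in NextState1, NextState3, NextState5, NextState7, and NextState9. The entries for these states in Figure C.8 supply the conditions under which these next-state values are active:

NextState1 = State0 = ~S3~S2~S1~S0
NextState3 = State2 AND (Op[5-0] = 'lw') = ~S3~S2S1~S0 · Op5~Op4~Op3~Op2Op1Op0
NextState5 = State2 AND (Op[5-0] = 'sw') = ~S3~S2S1~S0 · Op5~Op4Op3~Op2Op1Op0
NextState7 = State6 = ~S3S2S1~S0
NextState9 = State1 AND (Op[5-0] = 'jmp') = ~S3~S2~S1S0 · ~Op5~Op4~Op3~Op2Op1~Op0

NS0 = NextState1 + NextState3 + NextState5 + NextState7 + NextState9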

Creating truth tables: next state
Truth table for the NS2 output, which is active when the next state is 4, 5, 6, or 7. This occurs when the current state is 1, 2, 3, or 6.
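A sketch of how these truth tables fall out of the FSM: list every transition, then collect the current states whose next state has a given NS bit set, and check the NS0 sum-of-products against the transition table. The state numbering and opcode constants are assumptions (the standard textbook FSM and MIPS encodings); `TRANSITIONS`, `states_driving`, and `ns0` are hypothetical names.

```python
# Sketch: derive next-state bits from the FSM transitions and check NS0.
LW, SW, R_TYPE, BEQ, J = 0b100011, 0b101011, 0b000000, 0b000100, 0b000010
OPS = (LW, SW, R_TYPE, BEQ, J)

# (current state, opcode) -> next state; None means the opcode is a don't-care.
TRANSITIONS = {
    (0, None): 1,
    (1, LW): 2, (1, SW): 2, (1, R_TYPE): 6, (1, BEQ): 8, (1, J): 9,
    (2, LW): 3, (2, SW): 5,
    (3, None): 4, (6, None): 7,
    (4, None): 0, (5, None): 0, (7, None): 0, (8, None): 0, (9, None): 0,
}

def next_state(state, op):
    """Look up an opcode-specific arc first, then a don't-care arc."""
    return TRANSITIONS.get((state, op), TRANSITIONS.get((state, None)))

def states_driving(bit):
    """Current states from which next-state bit `bit` can become 1."""
    return sorted({s for (s, _), ns in TRANSITIONS.items() if (ns >> bit) & 1})

def ns0(state, op):
    """NS0 = NextState1 + NextState3 + NextState5 + NextState7 + NextState9."""
    return (state == 0                    # NextState1 = State0
            or (state == 2 and op == LW)  # NextState3
            or (state == 2 and op == SW)  # NextState5
            or state == 6                 # NextState7 = State6
            or (state == 1 and op == J))  # NextState9

print(states_driving(0))  # [0, 1, 2, 6]
print(states_driving(2))  # [1, 2, 3, 6]

# The equation agrees with the table for every defined (state, opcode) pair.
assert all(ns0(s, op) == bool(next_state(s, op) & 1)
           for s in range(10) for op in OPS if next_state(s, op) is not None)
```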

Creating truth tables: control signals
• Same process as for the next-state bits
• However, we do not need to consider the opcode
• First derive a truth table for each control signal
• The truth table need only list the states in which the control signal is asserted
• Each row of a signal's truth table stands for 64 entries (all combinations of the 6 opcode bits, which are all don't-cares)

Combinational Logic: states in which each control signal is active
[Table: one row per control signal (PCWrite, IorD, MemRead, etc., ALUSrcB1, ALUSrcB0), each listing the states in which that signal is active]

Deriving equations: control signals
• PCWrite = ~S3~S2~S1~S0 + S3~S2~S1S0 (active in states 0 and 9)
• Etc.

ROM Implementation
• ROM = "Read Only Memory"
  • values of memory locations are fixed ahead of time
• A ROM can be used to implement a truth table
  • if the address is m bits, we can address 2^m entries in the ROM
  • our outputs are the bits of data that the address points to
  • the ROM is 2^m entries "high" and n bits "wide"
• Example with m = 3 and n = 4:

  Address (m = 3)   Data (n = 4)
  000               0011
  001               1100
  010               1100
  011               1000
  100               0000
  101               0001
  110               0110
  111               0111
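Viewed in software, a ROM is just an array indexed by the address: an m-bit address selects one n-bit word. A minimal sketch using the m = 3, n = 4 contents from this slide (the names `ROM` and `read` are illustrative):

```python
# A ROM as a lookup table: 2**M words of N bits each.
M, N = 3, 4
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]
assert len(ROM) == 2 ** M and all(word < 2 ** N for word in ROM)

def read(address: int) -> str:
    """Return the N output bits stored at `address` as a bit string."""
    return format(ROM[address], f"0{N}b")

print(read(0b110))  # 0110
```

Every address has a stored word whether or not the function cares about it, which is why a ROM grows as 2^m regardless of how sparse the truth table is.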

ROM Implementation
• How many inputs are there? 6 bits for the opcode + 4 bits for the state = 10 address lines (i.e., 2^10 = 1024 different addresses)
• How many outputs are there? 16 datapath-control outputs + 4 next-state bits = 20 outputs
• The ROM is 2^10 × 20 = 20K bits (and a rather unusual size)
• Rather wasteful, since for many entries the outputs are the same; i.e., the opcode is often ignored

ROM implementation
• Break up the table into two parts:
  • the 4 state bits determine the 16 control outputs: 2^4 × 16 bits of ROM
  • all 10 input bits determine the 4 next-state bits: 2^10 × 4 bits of ROM
  • Total: 256 + 4096 = 4352 bits, about 4.3K bits of ROM

ROM vs PLA
• A PLA is much smaller:
  • it can share product terms
  • it only needs entries that produce an active output
  • it can take don't-cares into account
• Size is (#inputs × #product-terms) + (#outputs × #product-terms)
  • For this example: (10 × 17) + (20 × 17) = 510 PLA cells
• PLA cells are usually about the size of a ROM cell (slightly bigger)
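The size comparison from the ROM and PLA slides, recomputed in one place. The 17 product terms are taken from the PLA example; everything else follows from 4 state bits, 6 opcode bits, and 16 datapath-control outputs.

```python
# Control-unit sizes: ROM in bits, PLA in cells.
STATE_BITS, OPCODE_BITS = 4, 6
CONTROL_OUTPUTS, PRODUCT_TERMS = 16, 17

inputs = STATE_BITS + OPCODE_BITS        # 10 address lines
outputs = CONTROL_OUTPUTS + STATE_BITS   # 20 outputs (16 control + 4 next-state)

single_rom = 2 ** inputs * outputs                      # one ROM: 2^10 x 20
split_rom = (2 ** STATE_BITS * CONTROL_OUTPUTS          # 2^4 x 16 = 256
             + 2 ** inputs * STATE_BITS)                # 2^10 x 4 = 4096
pla_cells = inputs * PRODUCT_TERMS + outputs * PRODUCT_TERMS  # 170 + 340

print(single_rom, split_rom, pla_cells)  # 20480 4352 510
```

Splitting the ROM removes most of the waste because the 16 control outputs no longer get duplicated across all 64 opcode values; the PLA goes further by storing only the 17 product terms that actually matter.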

PLA Implementation
• The top part is the AND plane
  • each dot marks an input that is ANDed into that column's product term
• The bottom part is the OR plane
  • each dot marks a product term that is ORed into that row's output
• Example:
  • First vertical line, AND plane:
  • First horizontal line, OR plane (represents PCWrite): ~S3~S2~S1~S0 + S3~S2~S1S0
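A toy model of the two planes may make the structure concrete. This sketch models only the two product terms that feed PCWrite; it is illustrative, not the full 17-term PLA, and the names `AND_PLANE`, `OR_PLANE`, and `evaluate` are hypothetical.

```python
# Toy PLA: the AND plane forms product terms over the state bits; the OR plane
# ORs selected product terms into each output. Only PCWrite is modeled here.
AND_PLANE = {
    # term name: required value of each connected input (a dot in the AND plane)
    "state0": {"S3": 0, "S2": 0, "S1": 0, "S0": 0},  # ~S3~S2~S1~S0
    "state9": {"S3": 1, "S2": 0, "S1": 0, "S0": 1},  # S3~S2~S1S0
}
OR_PLANE = {
    # output name: product terms connected to it (dots in the OR plane)
    "PCWrite": ["state0", "state9"],
}

def evaluate(state: int) -> dict:
    """Evaluate every PLA output for a 4-bit state number."""
    bits = {f"S{i}": (state >> i) & 1 for i in range(4)}
    terms = {name: all(bits[b] == v for b, v in lits.items())
             for name, lits in AND_PLANE.items()}
    return {out: any(terms[t] for t in ts) for out, ts in OR_PLANE.items()}

active = [s for s in range(10) if evaluate(s)["PCWrite"]]
print(active)  # [0, 9]
```

Sharing a product term between outputs amounts to listing the same term name under several outputs in `OR_PLANE`; the term is built once in the AND plane.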

Another Implementation Style: using a sequencer
• If there are many states and many of them are sequential, it is more efficient to use a counter to supply the sequential next state
  • this removes the need to encode sequential next states explicitly in the control unit
  • an adder (incrementer) computes the next state instead
  • see next slide
• The incremented state is always the state that follows in numerical order

Another Implementation Style: using a sequencer • Complex instructions: the "next state" is often current state + 1

Another Implementation Style: using a sequencer
• Sometimes the sequencer must "branch"
  • example: after state 1 there are 4 possible next states
• Each control word must therefore include address-control lines that determine how the next state is chosen
• Implementing the control-output-signal portion looks exactly like the previous truth table

Another Implementation Style: using a sequencer
• Implementing the next-state function:
  • the control unit logic must only specify how to choose the next state when it is not the sequentially following state
• Method 1: the control unit explicitly encodes the next-state function
  • the CU need only set the next-state lines when the designated next state is not the state that the counter indicates
  • if the next-state function is mostly empty, the resulting CU will have much empty or redundant space

Another Implementation Style: using a sequencer
• Method 2: use separate external logic to specify the next state when the counter does not specify it
  • most often used
  • the nonsequential next state comes from an external table
  • the CU specifies when this occurs and how to find the next state

Another Implementation Style: using a sequencer
• Method 2 (continued): two kinds of "branching"
  1. Dispatch: jump to one of a number of states based on the opcode portion of the IR
  2. Branch to state 0: initiates the execution of the next MIPS instruction

Another Implementation Style: using a sequencer
• Method 2 (continued): two kinds of "branching"
  1. Dispatch: jump to one of a number of states based on the opcode portion of the IR
    • implemented with a set of special ROMs included as part of the address-selection logic
    • an additional set of control outputs, AddrCtl, indicates when a dispatch should be done
    • the FSM has 2 states in which we branch based on a portion of the opcode (see FSM on next slide)
    • thus we need 2 small dispatch tables
    • alternatively, use a single dispatch table, with the control bits that select the table acting as address bits that choose which portion of the table supplies the next state

Graphical Specification of FSM
• How many state bits will we need?

Another Implementation Style: using a sequencer
Method 2 (cont.), Dispatch (cont.)
• 4 ways to choose the next state:
  • 3 types of branches
  • incrementing the current-state number
• Encode in 2 bits:

  AddrCtl value   Action
  0               Set state to 0
  1               Dispatch with ROM 1
  2               Dispatch with ROM 2
  3               Use the incremented state
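A sketch of the sequencer's address-select logic: each state carries an AddrCtl value, and the next state comes from state 0, one of the two dispatch ROMs, or the incrementer. The dispatch contents and per-state AddrCtl values are assumptions based on the standard 10-state FSM (dispatch in states 1 and 2) and standard MIPS opcodes; all names are hypothetical.

```python
# Sketch of sequencer next-state selection driven by a 2-bit AddrCtl field.
LW, SW, R_TYPE, BEQ, J = 0b100011, 0b101011, 0b000000, 0b000100, 0b000010

DISPATCH_ROM_1 = {R_TYPE: 6, J: 9, BEQ: 8, LW: 2, SW: 2}  # consulted in state 1
DISPATCH_ROM_2 = {LW: 3, SW: 5}                           # consulted in state 2

# AddrCtl per state: 0 = go to state 0, 1 = ROM 1, 2 = ROM 2, 3 = increment.
ADDR_CTL = {0: 3, 1: 1, 2: 2, 3: 3, 4: 0, 5: 0, 6: 3, 7: 0, 8: 0, 9: 0}

def next_state(state: int, op: int) -> int:
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0                   # begin fetching the next instruction
    if ctl == 1:
        return DISPATCH_ROM_1[op]  # decode dispatch
    if ctl == 2:
        return DISPATCH_ROM_2[op]  # lw/sw dispatch
    return state + 1               # ctl == 3: the incrementer supplies the state

def trace(op: int) -> list[int]:
    """States visited by one instruction, starting from state 0."""
    states, s = [0], next_state(0, op)
    while s != 0:
        states.append(s)
        s = next_state(s, op)
    return states

print(trace(LW))      # [0, 1, 2, 3, 4]
print(trace(R_TYPE))  # [0, 1, 6, 7]
```

Only the two dispatch tables look at the opcode; every other transition is "increment" or "go to 0", so the control word itself needs just the 2 AddrCtl bits to steer sequencing.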

Details

Using a sequencer
• Entire control ROM: see Figure C.19
• 10 control words, each 18 bits wide: 180 bits in total
• 2 dispatch tables, each 4 bits wide with 64 entries: 512 additional bits in total
• Total: 692 bits
• By comparison, the implementation with 2 ROMs that encode the next-state function needs 4.3K bits
• The dispatch tables could be encoded more efficiently with two small PLAs
• The control ROM could also be replaced with a PLA
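The storage arithmetic for the sequencer, worked out explicitly. The 18-bit word width is read here as 16 datapath-control bits plus the 2 AddrCtl bits, which matches the counts on the earlier slides.

```python
# Sequencer storage: one small control ROM plus two opcode dispatch tables.
CONTROL_WORDS, WORD_BITS = 10, 18       # 16 datapath bits + 2 AddrCtl bits
DISPATCH_TABLES, DISPATCH_ENTRIES, DISPATCH_BITS = 2, 64, 4

control_rom = CONTROL_WORDS * WORD_BITS                        # 180 bits
dispatch = DISPATCH_TABLES * DISPATCH_ENTRIES * DISPATCH_BITS  # 512 bits
print(control_rom, dispatch, control_rom + dispatch)  # 180 512 692

# Two-ROM implementation with the next-state function encoded in ROM.
two_rom = 2 ** 4 * 16 + 2 ** 10 * 4
print(two_rom)  # 4352
```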

Using a sequencer: ROM
• Column 2: datapath control bits (same as derived earlier)
• Column 3: address-control bits

Optimizing the Control Implementation
• Use logic minimization (techniques such as K-maps)
• Use state assignment: choose state numbers so that the resulting logic equations contain more redundancy
• Example of state assignment:
  • In the FSM, the signal RegWrite is active only in states 4 and 7
  • If those states were encoded as 8 and 9, the equation for RegWrite could be rewritten as a test on bit S3 alone (S3 is 1 only in states 8 and 9)
  • The two truth-table entries in part (o) of Figure C.9 could then be combined into a single entry, eliminating one term in the CU
• State assignment can also be applied in an implementation with an explicit state counter (sequencer), but the choices are more restricted because sequential states must keep consecutive numbers
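A quick check of the RegWrite example: with the original numbering RegWrite needs two 4-literal product terms, while after renumbering the write-back states as 8 and 9 the single literal S3 suffices. This relies on the assumption that only codes 0 through 9 are ever used, so codes 10 through 15 are don't-cares; the function names are hypothetical.

```python
# RegWrite before and after renumbering its states (only states 0-9 exist).
def state_bits(state):
    return {f"S{i}": (state >> i) & 1 for i in range(4)}

def regwrite_original(state):
    b = state_bits(state)
    t4 = not b["S3"] and b["S2"] and not b["S1"] and not b["S0"]  # 0100
    t7 = not b["S3"] and b["S2"] and b["S1"] and b["S0"]          # 0111
    return bool(t4 or t7)

def regwrite_renumbered(state):
    return bool(state_bits(state)["S3"])  # among 0-9, only 8 and 9 have S3 = 1

print([s for s in range(10) if regwrite_original(s)])    # [4, 7]
print([s for s in range(10) if regwrite_renumbered(s)])  # [8, 9]
```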
