Chapter Five Part 2: Control

  • Selecting the operations to perform (ALU, read/write, etc.)
  • Controlling the flow of data (multiplexor inputs)
  • Information comes from the 32 bits of the instruction
  • Example: add $8, $17, $18. Instruction format:

        000000 10001 10010 01000 00000 100000
        op     rs    rt    rd    shamt funct

  • ALU's operation based on instruction type and function code
Overview of MIPS
  • three instruction formats

R-type: op rs rt rd shamt funct

I-type: op rs rt 16-bit address

J-type: op 26-bit address

Machine Language
  • R-Instruction Format:

        000000 10001 10010 01000 00000 100000
        op     rs    rt    rd    shamt funct

op: op code

rs: the first register source operand

rt: the second register source operand

rd: the register destination operand; gets result

shamt: shift amount (see chap 5)

funct: function; selects the variant of the operation in the op field
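
The field layout above can be checked with a few shifts and masks. The following is a minimal sketch, not part of the original slides; the function name `decode_r_type` and the constant `ADD_WORD` are made up for illustration. It decodes the slide's example, add $8, $17, $18.

```python
def decode_r_type(word):
    """Split a 32-bit instruction word into the six R-format fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31-26
        "rs":    (word >> 21) & 0x1F,  # bits 25-21
        "rt":    (word >> 16) & 0x1F,  # bits 20-16
        "rd":    (word >> 11) & 0x1F,  # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }


# add $8, $17, $18 from the slide, written field by field
# (adjacent string literals are concatenated by Python).
ADD_WORD = int("000000" "10001" "10010" "01000" "00000" "100000", 2)
fields = decode_r_type(ADD_WORD)
print(fields)
```

The decoded values should be rs = 17, rt = 18, rd = 8, with op = 0 and funct = 32 (binary 100000), matching the operand order in the assembly form.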


Branches and Jumps
  • Instructions:

bne $t4,$t5,Label    Next instruction is at Label if $t4 ≠ $t5

beq $t4,$t5,Label    Next instruction is at Label if $t4 = $t5

j Label              Next instruction is at Label

  • Formats:

I-type (bne, beq): op rs rt 16-bit address

J-type (j): op 26-bit address

  • Must describe ALU control hardware to compute 3-bit ALU control input
  • Later will describe main control unit (CU)
  • Common to use several levels of control. Reduces size of CU. May increase speed of CU.
  • ALU performs differently depending on instruction class:
    • Load/store: use ALU to compute memory address (addition)
    • R-type: performs one of 5 actions depending on value of the 6-bit function field.
    • Branch: ALU subtracts

ALU control unit inputs:

    • function field of instruction
    • 2-bit control field called ALUop. Sent by the main control unit.
    • ALUop determined by instruction type:
      • 00 = lw, sw: means ADD
      • 01 = beq: means SUB
      • 10 = arithmetic (R-type): means use the function code
  • ALU control unit outputs:
    • 3-bit signal to the ALU

Design question: Why do we have a funct field in the R-type instruction? Why not just more op-codes?

  • Already used up the available op-codes
  • Since the R-type instruction doesn’t need all 32 bits for the opcode and register fields, have extra space that we can use for the funct field

  • ALU control input (3-bit control line to ALU):
    • 000 AND
    • 001 OR
    • 010 add
    • 110 subtract
    • 111 set-on-less-than

  • Why is the code for subtract 110 and not 011?

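[Figure: the instruction from memory supplies bits 31-26 (opcode) to the main control unit, which sends the 2-bit ALUOp to the ALU control; the ALU control also takes bits 5-0 (funct) and outputs the 3-bit ALU control code.]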

  • Below: how to set ALU control inputs based on the 2-bit ALUOp control and the 6-bit function code.
ALU Control hardware implementation
  • Many ways to implement the mapping from 2-bit ALUOp field and 6-bit funct field to the 3 ALU operation control bits.
  • Only need a small number of the 64 possible values of the funct field
  • funct field only used when ALUOp bits equal 10.
  • So can design small logic that recognizes a subset of possible values and sets the ALU control bits.
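
The two-level mapping described above can be written as a small lookup. This is a sketch under stated assumptions: the slides do not list the funct values, so the five codes below are the standard MIPS encodings for add, sub, and, or, and slt, and the names `FUNCT_TO_ALU` and `alu_control` are hypothetical.

```python
# funct -> 3-bit ALU control. The funct values are the standard MIPS
# encodings (an assumption; the slides only say 5 of the 64 are needed).
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set-on-less-than
}


def alu_control(alu_op, funct):
    """Return the 3-bit ALU control code for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:   # lw, sw: always add (address calculation)
        return 0b010
    if alu_op == 0b01:   # beq: always subtract
        return 0b110
    if alu_op == 0b10:   # R-type: the funct field decides
        return FUNCT_TO_ALU[funct]
    raise ValueError("ALUOp 11 is not used")


print(format(alu_control(0b10, 0b101010), "03b"))
```

A dictionary is used here only for readability; the slides go on to implement the same mapping as gates using don't-care terms.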
ALU Control implementation
  • Technique:
    • Look for unique bits that will identify a particular output function.
  • Example:
    • Branch eq has ALUOp = 01. No other code has a 1 in the least significant bit.
    • Thus the 1 in the lsb uniquely identifies the branch and thus uniquely identifies ALU code 110.
    • The bit in the msb of ALUOp is a don’t care condition in this case.
    • See next slide
ALU Control implementation
  • Example 2:
    • A 1 in the msb of ALUOp uniquely identifies the arithmetic function.
    • Thus the lsb of ALUOp is a don’t care
    • Also, funct bits 5 and 4 are always 10, thus don’t help recognize the ALU control bits and can be ignored.
ALU Control implementation
  • Describe it using a truth table (can turn into gates).
  • Notes:
    • Inputs are the ALUOp and funct code field
    • Only show entries for which the ALU control must have a specific value. E.g., ALUOp never uses 11, so the table contains 1X and X1 rather than 10 and 01
    • When funct field used, the first two bits (F5, F4) are always 10, so they are don’t care terms
ALU Control implementation
  • Implementing the TT. Call the output bits Op2, Op1, Op0.
  • Truth table for Op2 = 1 (leftmost bit of Operation field in previous table)
  • Only show entries for which output is 1.
  • Truth table above shows input combinations for which the ALU control is 010, 001, 110, 111.
  • Combinations 011, 100, and 101 are not used.
  • Don’t care about funct code if the ALUOp field is not 10.
ALU Control implementation

Truth table for Op1 = 1 (middle bit of Operation field in bottom table)

ALU Control implementation
  • Truth table for Op0 = 1 (rightmost bit of Operation field in bottom table)
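
The per-bit truth tables did not survive in this transcript. As a hedged illustration of where the don't-care technique leads, the sketch below uses one common minimized form of the three output equations (an assumption, not taken from the slides) and checks it against the ALUOp/funct combinations discussed earlier. The funct values are the standard MIPS encodings, and the names `alu_control_gates` and `EXPECTED` are hypothetical.

```python
def alu_control_gates(alu_op, funct):
    """Gate-level ALU control: only ALUOp1, ALUOp0 and F3..F0 are examined."""
    a1, a0 = (alu_op >> 1) & 1, alu_op & 1
    f3, f2, f1, f0 = [(funct >> i) & 1 for i in (3, 2, 1, 0)]
    op2 = a0 | (a1 & f1)        # 1 for beq, subtract, slt
    op1 = (a1 ^ 1) | (f2 ^ 1)   # 0 only for AND and OR
    op0 = a1 & (f3 | f0)        # 1 for OR and slt
    return (op2 << 2) | (op1 << 1) | op0


# (ALUOp, funct, expected 3-bit ALU control)
EXPECTED = [
    (0b00, 0b000000, 0b010),  # lw/sw -> add
    (0b01, 0b000000, 0b110),  # beq -> subtract
    (0b10, 0b100000, 0b010),  # add
    (0b10, 0b100010, 0b110),  # subtract
    (0b10, 0b100100, 0b000),  # AND
    (0b10, 0b100101, 0b001),  # OR
    (0b10, 0b101010, 0b111),  # set-on-less-than
]

mismatches = [(a, f) for a, f, want in EXPECTED if alu_control_gates(a, f) != want]
print(mismatches)
```

Note that F5 and F4 never appear in the equations, and the funct bits have no effect when ALUOp is 00 or 01, which is exactly what the don't-care entries express.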
Full Control Implementation
  • Have now designed the ALU control
  • Must design the full control logic
  • 3 instruction formats:


R-type:      0         rs     rt     rd     shamt  funct
             31-26     25-21  20-16  15-11  10-6   5-0

Load/store:  35 or 43  rs     rt     address
             31-26     25-21  20-16  15-0

Branch:      4         rs     rt     address
             31-26     25-21  20-16  15-0

  • Opcode is always in bits 31-26. Refer to this as Op[5-0]
  • The two registers to read are always rs and rt, at positions 25-21 and 20-16. True for R-type, branch, and store (rt holds the value to store).
  • Base register for load and store is always in bit positions 25-21 (rs)
  • The 16-bit offset for branch equal, load, store always in bit positions 15-0
  • Destination register is in one of two places:
    • Load: positions 20-16
    • R-type: positions 15-11 (rd)
    • Need a mux to select this
  • Can now add labels, muxes, control lines to datapath. See next slide.


Added to the datapath: instruction labels, an extra mux (for the Write register number input of the register file), the ALU control block, write signals for state elements, a read signal for data memory, and control signals for the multiplexors.

  • The control unit can set all but one of the control signals based on the opcode
  • PCSrc is set if instruction is branch on equal and Zero output of ALU is 1.
  • Control units and signals added to datapath on next slide
  • Table on next slide defines how the control signals should be set for each opcode.
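
The control table itself is missing from this transcript. The sketch below shows one way such a table could be written down, assuming the signal names commonly used for this datapath (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch); only ALUOp and PCSrc are actually named in the slides, so treat the rest as assumptions. `None` marks a don't-care, and `CONTROL` and `pc_src` are hypothetical names.

```python
# Opcode -> control signal settings. None marks a don't-care.
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,          # R-type
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,          # lw
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,    # sw
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    4:  dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,    # beq
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def pc_src(opcode, zero):
    """PCSrc is the one signal not set from the opcode alone:
    it is 1 only for a branch on equal whose ALU Zero output is 1."""
    return int(CONTROL[opcode]["Branch"] == 1 and zero == 1)


print(pc_src(4, 1), pc_src(4, 0), pc_src(0, 1))
```

RegDst selects between the two possible destination fields (bits 20-16 for a load, bits 15-11 for R-type), which is why it is a don't-care for instructions that write no register.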

Datapath with control lines and control units

Control: use of the datapath
  • R-type instructions. Four steps (all are done in one clock cycle)
    • Signals stabilize in the circuit in roughly the order of these 4 steps
    • Example: add $t1, $t2, $t3
    • Step 1: Fetch and increment PC.
Control: use of the datapath
  • Step 2: Two registers are read ($t2 and $t3).
  • Main CU computes setting of control lines.
Control: use of the datapath
  • Step 3: ALU operates on data using the function code bits.
Control: use of the datapath
  • Step 4: Result from ALU written into register file (register $t1).
Control: use of the datapath
  • Execution of a load word: lw $t1, offset($t2)
    • Step 1: instruction fetched and PC incremented
    • Step 2: A register ($t2) value is read
    • Step 3: The ALU computes the sum of the register value and the sign-extended lower 16 bits of the instruction (the offset)
    • Step 4: the sum from ALU used as the address for data memory
    • Step 5: The data from memory written into the register file. Register destination given by bits 20-16 of the instruction ($t1)
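
Steps 2 through 5 of the load can be traced in a few lines. This is a minimal sketch, assuming a register file and data memory modeled as Python dictionaries and a word-aligned address; the helper names `sign_extend16` and `load_word` are made up for illustration.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def load_word(regs, memory, rs, rt, imm16):
    """Model lw rt, imm16(rs) on dictionary-based registers and memory."""
    base = regs[rs]                                        # step 2: read base register
    address = (base + sign_extend16(imm16)) & 0xFFFFFFFF   # step 3: ALU adds
    data = memory[address]                                 # step 4: address data memory
    regs[rt] = data                                        # step 5: write register file
    return address


regs = {9: 0, 10: 0x1000}   # $t1 is register 9, $t2 is register 10
memory = {0x0FFC: 1234}
address = load_word(regs, memory, rs=10, rt=9, imm16=0xFFFC)  # offset -4
print(hex(address), regs[9])
```

The offset 0xFFFC is chosen to show why sign extension matters: without it the address would be 0x1000 + 0xFFFC instead of 0x1000 - 4.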
Control: use of the datapath
  • The load word instruction ( lw $t1, offset($t2) )
Control: use of the datapath
  • Branch equal instruction beq $t1, $t2, offset
    • Step 1: instruction fetched, PC incremented
    • Step 2: two registers read ($t1, $t2)
    • Step 3: ALU subtracts. PC + 4 added to the sign-extended lower 16 bits of the instruction (offset) shifted left by two. Result is the branch target
    • Zero result used by CU to decide which adder result to store into the PC
    • Diagram next slide.
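
The branch steps above amount to two additions and a selection. The following sketch is illustrative only; the function names `branch_target` and `next_pc_beq` are hypothetical, and register values are passed in directly rather than read from a register file.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, imm16):
    """PC + 4 plus the sign-extended offset shifted left by two."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc_beq(pc, imm16, rs_value, rt_value):
    """The ALU subtracts; its Zero output selects which adder result feeds the PC."""
    zero = int(((rs_value - rt_value) & 0xFFFFFFFF) == 0)
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF


print(hex(next_pc_beq(0x00400000, 3, 7, 7)))   # taken
print(hex(next_pc_beq(0x00400000, 3, 7, 8)))   # not taken
```

The shift by two converts a word offset into a byte offset, so a 16-bit field reaches four times as far as it would if it counted bytes.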
Control: use of the datapath
  • Branch equal instruction beq $t1, $t2, offset
Finalizing Control
  • Instruction formats and resulting control signals
  • Encoding of the instruction formats:
Finalizing Control
  • Implementing jumps
    • Similar to branch but computes target differently and is not conditional
    • Low-order 2 bits always 00
    • Next lower 26 bits come from the 26-bit immediate field
    • Upper 4 bits of the address come from the PC of the jump instruction + 4

2 address

31-26 25-0
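
The three pieces of the jump address can be assembled with a mask, a shift, and an OR. This is a small sketch of that concatenation; the function name `jump_target` is hypothetical.

```python
def jump_target(pc, imm26):
    """Concatenate the upper 4 bits of PC + 4, the 26-bit field, and 00."""
    upper = (pc + 4) & 0xF0000000           # bits 31-28 of PC + 4
    return upper | ((imm26 & 0x03FFFFFF) << 2)  # shift supplies the low 00 bits


print(hex(jump_target(0x00400008, 0x0100000)))
print(hex(jump_target(0x90000000, 1)))
```

Because the upper 4 bits come from the PC, a jump can only reach addresses inside the same 256 MB region as the instruction that follows it.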

Finalizing Control
  • Implementing jumps
    • To implement, store into PC the concatenation of
      • Upper 4 bits of current PC + 4 (bits 31-28)
      • The 26-bit immediate field of the jump instruction
      • The bits 00
    • Need an additional multiplexor to select among:
      • Incremented PC
      • Branch target PC
      • Or the jump target PC
  • Need additional control signal called jump. Asserted when opcode is 2.
Exercise: R + M -> R
  • Add an instruction that adds a register value with a memory value.
    • How must the datapath change to accommodate this instruction?
    • What new control signals are necessary?
  • Problem: what is the address size? Must sign extend.

Our Simple Control Structure
  • All of the logic is combinational
  • We wait for everything to settle down, and the right thing to be done
    • ALU might not produce “right answer” right away
    • we use write signals along with clock to determine when to write
  • Cycle time determined by length of the longest path

We are ignoring some details like setup and hold times

Single Cycle: why it is not used
  • Bottom line: it’s inefficient.
  • Every clock cycle must have same length for every instruction.
    • Clock cycle determined by the longest possible path in machine
    • In this instruction set: load instruction
    • Load uses 5 functional units in series:
      • Instruction memory
      • Register file
      • ALU
      • Data memory
      • Register file
  • Every instruction completes in a single cycle, so cycles per instruction (CPI) is 1.
Single Cycle: performance
  • Assume the operation time for the major function units in an implementation is:
    • Memory units: 200 ps
    • ALU/adders: 100 ps
    • Register file (read or write): 50 ps
    • Muxes, CU, PC accesses, sign extension, wires: no delay
  • Instruction mix:
    • 25% loads
    • 10% stores
    • 45% R-format
    • 15% branches
    • 5% jumps
Single Cycle: performance
  • Compare:
    • An implementation in which every instruction operates in 1 clock cycle of a fixed length
    • An implementation where every instruction takes 1 clock cycle using a variable-length clock. Clock is only as long as it needs to be.
    • Latter approach is not really practical.
  • Comparing execution times:

CPU execution time = Instruction count X CPI X Clock cycle time

Single Cycle: performance
  • Need clock cycle time for the two implementations. Instruction count and CPI are the same.
  • Critical path for the different instructions:
Single Cycle: performance
  • Using these critical paths, can compute the required length for each instruction class:
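
The critical-path table is missing from this transcript. The sketch below recomputes the per-class lengths from the unit delays given earlier. The load path is the one listed in the slides; the paths for the other classes are assumptions based on which functional units each instruction class needs, and the names `DELAY_PS` and `PATHS` are hypothetical.

```python
# Unit delays in picoseconds, from the slide's assumptions.
DELAY_PS = {"imem": 200, "dmem": 200, "alu": 100, "reg_read": 50, "reg_write": 50}

# Functional units used in series by each instruction class.
# Only the load path is spelled out in the slides; the rest are assumed.
PATHS = {
    "R-format": ["imem", "reg_read", "alu", "reg_write"],
    "load":     ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store":    ["imem", "reg_read", "alu", "dmem"],
    "branch":   ["imem", "reg_read", "alu"],
    "jump":     ["imem"],
}

class_ps = {name: sum(DELAY_PS[unit] for unit in units) for name, units in PATHS.items()}
for name, total in class_ps.items():
    print(f"{name:9s} {total} ps")
```

These totals (400, 600, 550, 350, and 200 ps) are the same figures that appear in the average-cycle calculation later in the deck.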
Single Cycle: performance
  • Time of single clock cycle implementation:
    • Clock cycle of the single-cycle implementation is determined by the longest instruction.
    • The longest instruction (lw) takes 600 ps, and CPI = 1, so:

CPU execution time = Instruction count X 1 X 600 ps

Single Cycle: performance
  • Average time per instruction with variable clock:

CPU clock cycle = 600 x 25% + 550 x 10% + 400 x 45% + 350 x 15% + 200 x 5%

= 447.5 ps

Average CPU execution time = Instruction count X 1 X 447.5 ps

Single Cycle: performance
  • Variable clock implementation has shorter average clock cycle, so is faster.
  • Performance ratio:

CPU performance (variable clock) / CPU performance (single clock)

= CPU execution time (single clock) / CPU execution time (variable clock)

= (IC x CPU clock cycle (single clock)) / (IC x CPU clock cycle (variable clock))

= CPU clock cycle (single clock) / CPU clock cycle (variable clock)

= 600 / 447.5 = 1.34
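
The comparison can be reproduced numerically. This sketch takes the per-class cycle lengths and the instruction mix from the slides; the variable names are made up for illustration, and the mix is kept in whole percent so the arithmetic stays exact.

```python
# Required cycle length per instruction class, in picoseconds.
CLASS_PS = {"load": 600, "store": 550, "R-format": 400, "branch": 350, "jump": 200}

# Instruction mix in percent (sums to 100).
MIX_PERCENT = {"load": 25, "store": 10, "R-format": 45, "branch": 15, "jump": 5}

# Fixed clock: every instruction pays for the slowest class.
single_clock_ps = max(CLASS_PS.values())

# Variable clock: each instruction pays only for its own class.
variable_clock_ps = sum(CLASS_PS[k] * MIX_PERCENT[k] for k in CLASS_PS) / 100

# Instruction count and CPI cancel, so the ratio of cycle times is the speedup.
ratio = single_clock_ps / variable_clock_ps
print(single_clock_ps, variable_clock_ps, round(ratio, 2))
```

Changing the mix shows how sensitive the result is: the more loads a program executes, the closer the variable clock gets to the fixed 600 ps cycle.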

Where we are headed
  • Single Cycle Problems:
    • what if we had a more complicated instruction like floating point?
    • wasteful of area
    • implementing a variable-speed clock for each instruction class is extremely difficult. Overhead larger than any advantage gained.
  • One Solution:
    • use a “smaller” cycle time
    • have different instructions take different numbers of cycles
    • a “multicycle” datapath
Where we are headed
  • Differences between single-cycle data path and multi-cycle datapath:
    • A single memory unit is used for both instructions and data
    • There is a single ALU, rather than an ALU and two adders
    • One or more registers are added after every major functional unit to hold the output of that unit until the value is used in a subsequent clock cycle.