Outline • 5.1 Introduction • 5.2 Logic Design Conventions • 5.3 Building a Datapath • 5.4 A Simple Implementation Scheme • 5.5 A Multicycle Implementation • 5.6 Exceptions • 5.9 Real Stuff: The Organization of Recent Pentium Implementations • 5.10 Fallacies and Pitfalls • 5.11 Concluding Remarks • 5.12 Historical Perspective and Further Reading
The Processor: Datapath & Control • We're ready to look at an implementation of the MIPS • Simplified to contain only: • memory-reference instructions: lw, sw • arithmetic-logical instructions: add, sub, and, or, slt • control flow instructions: beq, j • Generic implementation: • use the program counter (PC) to supply the instruction address • get the instruction from memory • read registers • use the instruction to decide exactly what to do • All instructions except j use the ALU after reading the registers. Why? What does the ALU do for a memory-reference instruction? For an arithmetic instruction? For a control flow instruction?
More Implementation Details • Abstract / Simplified View: • Two types of functional units: • elements that operate on data values (combinational) • elements that contain state (sequential)
FIGURE 3.14 MIPS architecture revealed thus far (table: MIPS assembly language).
Figure 5.2 The basic implementation of the MIPS subset including the necessary multiplexers and control lines.
Keywords • Clocking methodology: The approach used to determine when data is valid and stable relative to the clock. • Edge-triggered clocking: A clocking scheme in which all state changes occur on a clock edge. • Control signal: A signal used for multiplexor selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit.
State Elements • Unclocked vs. clocked • Clocks are used in synchronous logic • When should an element that contains state be updated? • Cycle time: the period of the clock
An unclocked state element • The set-reset latch • output depends on present inputs and also on past inputs
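To make "depends on present inputs and also on past inputs" concrete, here is a minimal sketch of a set-reset latch built from two cross-coupled NOR gates. The function names are hypothetical, and the settling loop is a simplification of real gate timing: it just re-evaluates the two gates until the outputs stop changing. S = R = 1 is the disallowed input combination and is not handled.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return 1 - (a | b)


def sr_latch(s: int, r: int, q: int, q_bar: int) -> tuple[int, int]:
    """Settle a cross-coupled NOR latch starting from its previous outputs.

    The result depends on the present inputs (s, r) AND on the previous
    state (q, q_bar), which is what makes the latch a state element.
    """
    for _ in range(4):  # a few gate delays are enough for the loop to settle
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break  # outputs are stable
        q, q_bar = new_q, new_q_bar
    return q, q_bar


state = (0, 1)                  # start in the reset state: Q = 0
state = sr_latch(1, 0, *state)  # set   -> (1, 0)
state = sr_latch(0, 0, *state)  # hold  -> (1, 0): the latch remembers
state = sr_latch(0, 1, *state)  # reset -> (0, 1)
```

With S = R = 0 the same inputs give different outputs depending on what happened before, which is exactly why there is no clock yet: the latch updates as soon as its inputs change.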
Latches and Flip-flops • Output is equal to the stored value inside the element (you don't need to ask for permission to look at the value) • Change of state (value) is based on the clock • Latches: state changes whenever the inputs change and the clock is asserted • Flip-flops: state changes only on a clock edge (edge-triggered methodology) • "Asserted" means "logically true", which could mean electrically low • A clocking methodology defines when signals can be read and written: we wouldn't want to read a signal at the same time it was being written
D-latch • Two inputs: • the data value to be stored (D) • the clock signal (C), indicating when to read and store D • Two outputs: • the value of the internal state (Q) and its complement
D flip-flop • Output changes only on the clock edge
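The difference between the D-latch and the D flip-flop can be sketched by feeding both the same sequence of (C, D) samples. This is a behavioral model, not a gate-level one; the function names are hypothetical, and the flip-flop is assumed to trigger on the rising edge (a falling-edge flip-flop works the same way with the edge test reversed).

```python
def d_latch(trace: list[tuple[int, int]], q: int = 0) -> list[int]:
    """Level-sensitive D latch: Q follows D whenever the clock C is 1."""
    out = []
    for c, d in trace:
        if c == 1:
            q = d  # transparent: the input passes straight through
        out.append(q)
    return out


def d_flip_flop(trace: list[tuple[int, int]], q: int = 0) -> list[int]:
    """Edge-triggered D flip-flop (assumed rising edge): Q changes only at C 0->1."""
    out = []
    prev_c = 0
    for c, d in trace:
        if prev_c == 0 and c == 1:
            q = d  # sample D only at the clock edge
        prev_c = c
        out.append(q)
    return out


# (C, D) samples over time; D drops from 1 to 0 while C is still high.
trace = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
print("latch:    ", d_latch(trace))      # [0, 1, 0, 0, 0]
print("flip-flop:", d_flip_flop(trace))  # [0, 1, 1, 1, 1]
```

The latch lets the late change of D through while the clock is asserted; the flip-flop ignores it, because its output can only change at the edge.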
Our Implementation • An edge-triggered methodology • Typical execution: • read contents of some state elements • send values through some combinational logic • write results to one or more state elements
Figure 5.4 An edge-triggered methodology allows a state element to be read and written in the same clock cycle without creating a race that could lead to indeterminate data values.
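The read, compute, write pattern of the edge-triggered methodology can be sketched in software: combinational logic reads only the current state, and every write is committed together at the clock edge. The names `clock_edge` and `level_sensitive_swap` are hypothetical, and the second function is only a rough analogy for the race that the edge-triggered approach avoids.

```python
from typing import Callable

State = dict[str, int]


def clock_edge(state: State, logic: Callable[[State], State]) -> State:
    """One edge-triggered step: read all state, compute, then write all state.

    `logic` sees a copy of the current values only; its results are
    committed together at the edge, so an element can be read and
    written in the same cycle without a race.
    """
    next_values = logic(dict(state))  # read + combinational logic
    return {**state, **next_values}   # write at the clock edge


def level_sensitive_swap(s: State) -> State:
    """Rough analogy for the race: writes become visible immediately."""
    s = dict(s)
    s["a"] = s["b"]  # the new value of a is visible at once...
    s["b"] = s["a"]  # ...so b reads the NEW a and the old a is lost
    return s


# The PC is read and written in the same cycle: PC <- PC + 4.
state = {"pc": 0}
for _ in range(3):
    state = clock_edge(state, lambda s: {"pc": s["pc"] + 4})

# Swapping two registers in one cycle works when writes wait for the edge.
regs = clock_edge({"a": 1, "b": 2}, lambda s: {"a": s["b"], "b": s["a"]})
```

After the loop `state["pc"]` is 12 and `regs` is `{"a": 2, "b": 1}`, whereas `level_sensitive_swap({"a": 1, "b": 2})` ends with both values equal to 2.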
Keywords • Datapath element: A functional unit used to operate on or hold data within a processor. In the MIPS implementation the datapath elements include the instruction and data memories, the register file, the arithmetic logic unit (ALU), and adders. • Program counter (PC): The register containing the address of the instruction in the program being executed. • Register file: A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed. • Sign-extend: To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
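Sign extension as defined above can be sketched with bit masks. This sketch assumes the MIPS case of a 16-bit field widened to 32 bits and returns the 32-bit pattern as an unsigned Python integer; the function name is hypothetical.

```python
def sign_extend(value: int, from_bits: int = 16, to_bits: int = 32) -> int:
    """Replicate the sign bit of a from_bits-wide field into the upper bits."""
    value &= (1 << from_bits) - 1          # keep only the original field
    if value & (1 << (from_bits - 1)):     # sign bit set: the value is negative
        upper = ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
        value |= upper                     # fill the high-order bits with 1s
    return value                           # high-order bits stay 0 otherwise


print(hex(sign_extend(0x0064)))  # 0x64        (+100 stays +100)
print(hex(sign_extend(0xFFFC)))  # 0xfffffffc  (-4 stays -4)
```

A positive field keeps zeros in the upper 16 bits, while a negative one gets ones, so the two's complement value is unchanged.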
Keywords • Branch target address: The address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the MIPS architecture the branch target is given by the sum of the offset field of the instruction and the address of the instruction following the branch. • Branch taken: A branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional branches are taken branches. • Branch not taken: A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch. • Delayed branch: A type of branch where the instruction immediately following the branch is always executed, independent of whether the branch condition is true or false.
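A sketch of the branch target computation for beq. In MIPS the 16-bit offset counts words, so it is sign-extended and shifted left 2 bits before being added to PC + 4. The function names are hypothetical, and the sketch ignores the delay slot of a delayed branch; it only shows which address the PC takes.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc: int, offset16: int) -> int:
    """Target address of a beq at address `pc` with 16-bit offset field `offset16`."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 1 << 16                  # sign-extend to a negative Python int
    return (pc + 4 + (offset << 2)) & MASK32  # word offset -> byte offset


def next_pc(pc: int, offset16: int, taken: bool) -> int:
    """Taken: PC becomes the branch target. Not taken: the next sequential instruction."""
    return branch_target(pc, offset16) if taken else (pc + 4) & MASK32


print(hex(next_pc(0x1000, 3, taken=True)))       # 0x1010
print(hex(next_pc(0x1000, 3, taken=False)))      # 0x1004
print(hex(next_pc(0x1000, 0xFFFF, taken=True)))  # 0x1000 (offset -1: branch to itself)
```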
Register File • Built using D flip-flops • Do you understand it? What is the “Mux” in the figure for?
Abstraction • Make sure you understand the abstractions! • Sometimes it is easy to think you do, when you don’t
Register File • Note: we still use the real clock to determine when to write
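A behavioral sketch of the register file: two read ports and one write port. Each read port acts like a 32-to-1 multiplexor selecting one register (modeled here as list indexing), and the write happens only on a clock edge when the write-enable signal is set. The class, method, and parameter names are illustrative, not taken from the figure. The `$zero` rule reflects the MIPS convention that register 0 always reads as 0.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32  # the state: one 32-bit word per register

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Reads are combinational: each port behaves like a multiplexor
        # whose select input is the register number.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, reg_write: int, write_reg: int, write_data: int) -> None:
        # The real clock decides WHEN a write can happen; RegWrite decides WHETHER.
        # MIPS $zero always reads as 0, so writes to register 0 are ignored.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(reg_write=1, write_reg=8, write_data=42)
rf.clock_edge(reg_write=0, write_reg=8, write_data=7)  # write enable off: no change
print(rf.read(8, 0))  # (42, 0)
```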
Simple Implementation • Include the functional units we need for each instruction
Figure 5.10 The datapath for the memory instructions and the R-type instructions.
Building the Datapath • Use multiplexors to stitch them together
Keywords • Don’t-care term: An element of a logic function in which the output does not depend on the values of all the inputs. Don’t-care terms may be specified in different ways. • Opcode: The field that denotes the operation and format of an instruction. • Single-cycle implementation: Also called single clock cycle implementation. An implementation in which an instruction is executed in one clock cycle.
Control • Selecting the operations to perform (ALU, read/write, etc.) • Controlling the flow of data (multiplexor inputs) • Information comes from the 32 bits of the instruction • Example: add $8, $17, $18 • Instruction format: op = 000000, rs = 10001, rt = 10010, rd = 01000, shamt = 00000, funct = 100000 • ALU's operation based on instruction type and function code
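The field layout in the add example can be checked by packing the fields into a 32-bit word with shifts. A minimal sketch; `encode_rtype` is a hypothetical helper and assumes every field value already fits in its width.

```python
def encode_rtype(rs: int, rt: int, rd: int, shamt: int, funct: int, op: int = 0) -> int:
    """Pack R-type fields into one 32-bit word: op | rs | rt | rd | shamt | funct."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $8, $17, $18: rd = 8 (destination), rs = 17, rt = 18, funct = 100000
word = encode_rtype(rs=17, rt=18, rd=8, shamt=0, funct=0b100000)
print(f"{word:032b}")   # 00000010001100100100000000100000
print(f"{word:#010x}")  # 0x02324020
```

The binary string is the six fields from the slide written back to back, which is a quick way to confirm that the destination register goes in rd even though it is written first in the assembly.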
Control • e.g., what should the ALU do with this instruction? • Example: lw $1, 100($2) • Fields: op = 35, rs = 2, rt = 1, 16-bit offset = 100 • ALU control input: 0000 = AND, 0001 = OR, 0010 = add, 0110 = subtract, 0111 = set-on-less-than, 1100 = NOR • Why is the code for subtract 0110 and not 0011?
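One way to see the pattern behind these codes is to split each 4-bit value into the ALU's individual control lines. This sketch assumes the bit order Ainvert, Bnegate, Operation (2 bits), the order used by the ALU built in Appendix B; `split_control` is a hypothetical helper.

```python
ALU_CONTROL = {
    "AND": 0b0000, "OR": 0b0001, "add": 0b0010,
    "subtract": 0b0110, "set-on-less-than": 0b0111, "NOR": 0b1100,
}


def split_control(code: int) -> tuple[int, int, int]:
    """Split a 4-bit ALU control code into (Ainvert, Bnegate, Operation)."""
    return (code >> 3) & 1, (code >> 2) & 1, code & 0b11


for name, code in ALU_CONTROL.items():
    ainvert, bnegate, operation = split_control(code)
    print(f"{name:17} {code:04b}  Ainvert={ainvert} Bnegate={bnegate} Operation={operation:02b}")
```

Under that assumption, subtract is the add operation (Operation = 10) with b negated (Bnegate = 1), which gives 0110 rather than a new code such as 0011; NOR is AND applied to ā and b̄.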
Figure 5.12 How the ALU control bits are set depends on the ALUOp control bits and the different function codes for the R-type instruction.
Control • Must describe hardware to compute the 4-bit ALU control input • given the instruction type, encoded as ALUOp (computed from the instruction type): 00 = lw, sw; 01 = beq; 10 = arithmetic • and the function code for arithmetic • Describe it using a truth table (can turn into gates)
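The truth table described above can be sketched as a lookup: ALUOp alone decides for lw/sw and beq (the funct field is a don't-care there), and only for arithmetic does the funct field matter. The funct values are the standard MIPS codes for add, sub, and, or, and slt; the function name is hypothetical.

```python
def alu_control(alu_op: int, funct: int) -> int:
    """4-bit ALU control input from the 2-bit ALUOp and the 6-bit funct field."""
    if alu_op == 0b00:  # lw, sw: add to form the memory address (funct ignored)
        return 0b0010
    if alu_op == 0b01:  # beq: subtract to compare the registers (funct ignored)
        return 0b0110
    if alu_op != 0b10:
        raise ValueError(f"unused ALUOp value {alu_op:02b}")
    by_funct = {        # arithmetic: the funct field selects the operation
        0b100000: 0b0010,  # add
        0b100010: 0b0110,  # sub
        0b100100: 0b0000,  # and
        0b100101: 0b0001,  # or
        0b101010: 0b0111,  # slt
    }
    return by_funct[funct]


print(f"{alu_control(0b10, 0b101010):04b}")  # 0111 (slt)
```

Written as a dictionary the don't-cares are implicit; turning the same table into gates would exploit them to shrink the logic.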
Figure B.5.9 A 1-bit ALU that performs AND, OR, and addition on a and b or ā and b̄.
FIGURE B.5.10 (Top) A 1-bit ALU that performs AND, OR, and addition on a and b or b̄.
FIGURE B.5.11 A 32-bit ALU constructed from 31 copies of the 1-bit ALU in the top of Figure B.5.10 and one 1-bit ALU in the bottom of that figure.
FIGURE B.5.12 The final 32-bit ALU. This adds a Zero detector to Figure B.5.11.
FIGURE B.5.13 The values of the three ALU control lines Bnegate and Operation and the corresponding ALU operations.
FIGURE B.5.14 The symbol commonly used to represent an ALU, as shown in Figure B.5.12.
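The figures above build a 32-bit ALU out of 1-bit slices; the sketch below mirrors that construction in software. Assumptions: control bits ordered Ainvert, Bnegate, Operation; Operation selects AND (00), OR (01), sum (10), or Less (11); the CarryIn of bit 0 is tied to Bnegate; and the Set output of the most significant slice feeds Less of bit 0. Function names are hypothetical. Like the simple hardware design, this slt ignores overflow.

```python
def alu_1bit(a, b, carry_in, less, ainvert, bnegate, operation):
    """One ALU slice. Returns (result, carry_out, adder_sum)."""
    a ^= ainvert                      # optionally invert a (for NOR)
    b ^= bnegate                      # optionally invert b (for sub, slt, NOR)
    adder_sum = a ^ b ^ carry_in      # full-adder sum
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = [a & b, a | b, adder_sum, less][operation]  # 4-input multiplexor
    return result, carry_out, adder_sum  # adder_sum of bit 31 is "Set"


def alu_32(a: int, b: int, control: int) -> tuple[int, int]:
    """Ripple-carry 32-bit ALU from 1-bit slices. Returns (result, zero)."""
    ainvert, bnegate, operation = (control >> 3) & 1, (control >> 2) & 1, control & 0b11

    # Pass 1: ripple the carries to find Set, the MSB's adder output.
    carry = bnegate  # CarryIn of bit 0 = Bnegate supplies the +1 of two's complement
    for i in range(32):
        _, carry, set_bit = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, 0,
                                     ainvert, bnegate, operation)

    # Pass 2: compute each result bit; only bit 0 gets Less = Set.
    carry, result = bnegate, 0
    for i in range(32):
        less = set_bit if i == 0 else 0
        r, carry, _ = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, less,
                               ainvert, bnegate, operation)
        result |= r << i

    zero = int(result == 0)  # Zero detector: NOR of all 32 result bits
    return result, zero


print(alu_32(5, 3, 0b0010))  # (8, 0)  add
print(alu_32(5, 5, 0b0110))  # (0, 1)  subtract; Zero = 1 is what beq tests
print(alu_32(3, 5, 0b0111))  # (1, 0)  slt: 3 < 5
```

The two passes are only a software convenience: in hardware, Set depends on the carry chain but not on bit 0's result, so there is no loop.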
Figure 5.14 The three instruction classes (R-type, load and store, and branch) use two different instruction formats.
a. R-type instruction
Field:          op     rs     rt     rd     shamt  funct
Bit positions:  31:26  25:21  20:16  15:11  10:6   5:0
b. Load or store instruction
Field:          op     rs     rt     16-bit offset
Bit positions:  31:26  25:21  20:16  15:0
c. Branch instruction
Field:          op     rs     rt     16-bit offset
Bit positions:  31:26  25:21  20:16  15:0
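Decoding is the reverse of encoding: shift and mask to pull out each field at the bit positions in Figure 5.14. A minimal sketch; `decode_fields` is hypothetical, and it assumes any nonzero opcode is one of this subset's I-format instructions (lw, sw, beq), which all share the same layout.

```python
def decode_fields(word: int) -> dict[str, int]:
    """Slice a 32-bit MIPS instruction into its fields."""
    op = (word >> 26) & 0x3F  # bits 31:26
    rs = (word >> 21) & 0x1F  # bits 25:21
    rt = (word >> 16) & 0x1F  # bits 20:16
    if op == 0:               # R-type
        return {"op": op, "rs": rs, "rt": rt,
                "rd": (word >> 11) & 0x1F,    # bits 15:11
                "shamt": (word >> 6) & 0x1F,  # bits 10:6
                "funct": word & 0x3F}         # bits 5:0
    # Load, store, or branch: one 16-bit offset in bits 15:0
    return {"op": op, "rs": rs, "rt": rt, "offset": word & 0xFFFF}


print(decode_fields(0x8C410064))  # lw $1, 100($2)
print(decode_fields(0x02324020))  # add $8, $17, $18
```

Note that rs and rt sit in the same bit positions in every format, so the hardware can start reading both registers before it knows which format it has.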
Figure 5.15 The datapath of Figure 5.12 with all necessary multiplexors and all control lines identified
Control • Simple combinational logic (truth tables)
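The main control unit is also just a truth table from the opcode to the control lines. The signal names and values below follow the usual single-cycle MIPS control from this textbook as reproduced from memory, so treat them as a sketch to check against the datapath figure. `None` marks a don't-care (X). The j instruction is left out because it needs an additional Jump control signal.

```python
from typing import Optional

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite", "MemRead",
           "MemWrite", "Branch", "ALUOp1", "ALUOp0")

# opcode -> one value per signal in SIGNALS; None = don't care (X)
TRUTH_TABLE = {
    0:  (1, 0, 0, 1, 0, 0, 0, 1, 0),        # R-format
    35: (0, 1, 1, 1, 1, 0, 0, 0, 0),        # lw
    43: (None, 1, None, 0, 0, 1, 0, 0, 0),  # sw: writes no register, so RegDst/MemtoReg are X
    4:  (None, 0, None, 0, 0, 0, 1, 0, 1),  # beq: same reason
}


def main_control(opcode: int) -> dict[str, Optional[int]]:
    """Look up the control lines for one opcode."""
    return dict(zip(SIGNALS, TRUTH_TABLE[opcode]))


print(main_control(35))  # lw
```

The ALUOp1/ALUOp0 columns are the 2-bit ALUOp fed to the ALU control, which is why control is often drawn as two levels: main control, then ALU control.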
Our Simple Control Structure • All of the logic is combinational • We wait for everything to settle down and for the right thing to be done • The ALU might not produce the “right answer” right away • We use write signals along with the clock to determine when to write • Cycle time is determined by the length of the longest path • We are ignoring some details, like setup and hold times
Single Cycle Implementation • Calculate cycle time assuming negligible delays except: • memory (200ps), ALU and adders (100ps), register file access (50ps)
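A worked version of this calculation, using only the delays on the slide. Assumption: the adders (PC + 4 and the branch target) run in parallel with the memory and ALU and stay off the critical path, so each instruction class's time is the sum of the units it uses in sequence.

```python
MEM, ALU, REG = 200, 100, 50  # ps: memory, ALU, register file access

# Units each instruction class passes through, in order.
PATHS = {
    "R-type": [MEM, REG, ALU, REG],       # fetch, read regs, ALU, write reg
    "lw":     [MEM, REG, ALU, MEM, REG],  # fetch, read, address, data memory, write
    "sw":     [MEM, REG, ALU, MEM],       # no register write at the end
    "beq":    [MEM, REG, ALU],            # compare in the ALU
    "j":      [MEM],                      # fetch only
}

delays = {name: sum(units) for name, units in PATHS.items()}
cycle_time = max(delays.values())  # every instruction gets the longest path

for name, ps in delays.items():
    print(f"{name:6} {ps} ps")
print(f"single-cycle clock period = {cycle_time} ps (set by lw)")
```

Load is the longest path at 600 ps, so even a j, which needs only 200 ps, takes a full 600 ps cycle in the single-cycle design.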