Building a Datapath for CPU Performance Optimization

Chapter 4 The Processor

Introduction §4.1 Introduction EET 4250: Microcomputer Architecture • CPU performance factors • Instruction count • Determined by ISA and compiler • CPI and Cycle time • Determined by CPU hardware • We will examine two MIPS implementations • A simplified version • A more realistic pipelined version • Simple subset, shows most aspects • Memory reference: lw, sw • Arithmetic/logical: add, sub, and, or, slt • Control transfer: beq, j

Instruction Execution EET 4250: Microcomputer Architecture • PC  instruction memory, fetch instruction • Register numbers register file, read registers • Depending on instruction class • Use ALU to calculate • Arithmetic result • Memory address for load/store • Branch target address • Access data memory for load/store • PC  target address or PC + 4

CPU Overview EET 4250: Microcomputer Architecture

Multiplexers • Can’t just join wires together • Use multiplexers EET 4250: Microcomputer Architecture

Control EET 4250: Microcomputer Architecture

Logic Design Basics §4.2 Logic Design Conventions EET 4250: Microcomputer Architecture • Information encoded in binary • Low voltage = 0, High voltage = 1 • One wire per bit • Multi-bit data encoded on multi-wire buses • Combinational element • Operate on data • Output is a function of input • State (sequential) elements • Store information

A Y B A A Mux I0 Y + Y Y I1 ALU B B S F Combinational Elements • Adder • Y = A + B • Arithmetic/Logic Unit • Y = F(A, B) • Multiplexer • Y = S ? I1 : I0 EET 4250: Microcomputer Architecture • AND-gate • Y = A & B

D Q Clk Clk D Q Sequential Elements EET 4250: Microcomputer Architecture • Register: stores data in a circuit • Uses a clock signal to determine when to update the stored value • Edge-triggered: update when Clk changes from 0 to 1

Clk D Q Write Write D Clk Q Sequential Elements EET 4250: Microcomputer Architecture • Register with write control • Only updates on clock edge when write control input is 1 • Used when stored value is required later

Clocking Methodology EET 4250: Microcomputer Architecture • Combinational logic transforms data during clock cycles • Between clock edges • Input from state elements, output to state element • Longest delay determines clock period

Building a Datapath • Datapath • Elements that process data and addressesin the CPU • Registers, ALUs, mux’s, memories, … • We will build a MIPS datapath incrementally • Refining the overview design §4.3 Building a Datapath EET 4250: Microcomputer Architecture

Creating a Single Datapath from the Parts • Assemble the datapath segments and add control lines and multiplexors as needed • Single cycle design – fetch, decode and execute each instructions in one clock cycle • no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders) • multiplexors needed at the input of shared elements with control lines to do the selection • write signals to control writing to the Register File and Data Memory • Cycle time is determined by length of the longest path

Instruction Fetch Increment by 4 for next instruction 32-bit register EET 4250: Microcomputer Architecture

R-Format Instructions • Read two register operands • Perform arithmetic/logical operation • Write register result EET 4250: Microcomputer Architecture

Load/Store Instructions • Read register operands • Calculate address using 16-bit offset • Use ALU, but sign-extend offset • Load: Read memory and update register • Store: Write register value to memory EET 4250: Microcomputer Architecture

Branch Instructions • Read register operands • Compare operands • Use ALU, subtract and check Zero output • Calculate target address • Sign-extend displacement • Shift left 2 places (word displacement) • Add to PC + 4 • Already calculated by instruction fetch EET 4250: Microcomputer Architecture

Branch Instructions Justre-routes wires Sign-bit wire replicated EET 4250: Microcomputer Architecture

Branch Addressing – Absolute 1st Instruction 2nd Instruction EET 4250: Microcomputer Architecture • Simplified branch ADDR field • If immediate field is 3 bits instead of 16 • beq $t0, $t1, ADDR • How do we maximize how big ADDR can be? • Case 1: ADDR is an absolute address • Limit to 23 = 8 addresses in program! • Only 2 instructions (at address 0 & 4)

Branch Addressing – Byte Offset 1st Instr. Any of the 232 addresses 2nd Instr. EET 4250: Microcomputer Architecture • Simplified branch ADDR field • If immediate field is 3 bits instead of 16 • Case 2: ADDR is a byte-offset from PC • PC is 32 bits = 232 addresses • Branch can go PC ± 22 bytes = PC ± 4 bytes = PC ± 1 instructions!!

Branch Addressing – Word Offset EET 4250: Microcomputer Architecture • Simplified branch ADDR field • If immediate field is 3 bits instead of 16 • Case 3: ADDR is a word-offset from PC • PC is 32 bits = 232 addresses • Branch can go PC ± 22 words = PC ± 16 bytes = PC ± 4 instructions!!

Branch Instructions Justre-routes wires Sign-bit wire replicated EET 4250: Microcomputer Architecture

Composing the Elements • First-cut data path does an instruction in one clock cycle • Each datapath element can only do one function at a time • Hence, we need separate instruction and data memories • Use multiplexers where alternate data sources are used for different instructions EET 4250: Microcomputer Architecture

R-Type/Load/Store Datapath EET 4250: Microcomputer Architecture

Full Datapath EET 4250: Microcomputer Architecture

0 4 35 or 43 rs rs rs rt rt rt rd address address shamt funct 31:26 31:26 31:26 25:21 25:21 25:21 20:16 20:16 20:16 15:11 10:6 15:0 15:0 5:0 The Main Control Unit • Control signals derived from instruction R-type Load/Store Branch opcode always read read, except for load write for R-type and load sign-extend and add EET 4250: Microcomputer Architecture

ALU Control • ALU used for • Load/Store: F = add • Branch: F = subtract • R-type: F depends on funct field §4.4 A Simple Implementation Scheme EET 4250: Microcomputer Architecture

ALU Control • Assume 2-bit ALUOp derived from opcode • Combinational logic derives ALU control EET 4250: Microcomputer Architecture

Datapath With Control EET 4250: Microcomputer Architecture

Add 4 Fetch PC = PC+4 Instruction Memory Exec Decode Read Address PC Instruction Fetching Instructions • Fetching instructions involves • reading the instruction from the Instruction Memory • updating the PC value to be the address of the next (sequential) instruction • PC is updated every clock cycle, so it does not need an explicit write control signal just a clock signal • Reading from the Instruction Memory is a combinational activity, so it doesn’t need an explicit read control signal

Fetch PC = PC+4 Exec Decode Read Addr 1 Read Data 1 Register File Read Addr 2 Write Addr Read Data 2 Write Data Decoding Instructions • Decoding instructions involves • sending the fetched instruction’s opcode and function field bits to the control unit Control Unit Instruction • reading two values from the Register File • Register File addresses are contained in the instruction

31 25 20 15 10 5 0 R-type: op rs rt rd shamt funct RegWrite ALU control Fetch PC = PC+4 Read Addr 1 Read Data 1 Register File Read Addr 2 overflow Instruction zero Exec Decode ALU Write Addr Read Data 2 Write Data Executing R Format Operations • R format operations (add, sub, slt, and, or) • perform operation (op and funct) on values in rs and rt • store the result back into the Register File (into location rd) • Note that Register File is not written every cycle (e.g. sw), so we need an explicit write control signal for the Register File

R-type Instruction Data/Control Flow 0 Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] 0 Read Data 1 ALU Write Addr Read Data 2 0 1 Write Data 0 Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]

RegWrite ALU control MemWrite overflow zero Read Addr 1 Read Data 1 Address Register File Read Addr 2 Instruction Data Memory Read Data ALU Write Addr Read Data 2 Write Data Write Data MemRead Sign Extend 16 32 Executing Load and Store Operations • Load and store operations involves • compute memory address by adding the base register (read from the Register File during decode) to the 16-bit signed-extended offset field in the instruction • store value (read from the Register File during decode) written to the Data Memory • load value, read from the Data Memory, written to the Register File

Load Word Instruction Data/Control Flow 0 Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] 0 Read Data 1 ALU Write Addr Read Data 2 0 1 Write Data 0 Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]

Add RegWrite ALUSrc ALU control MemWrite MemtoReg 4 ovf zero Read Addr 1 Instruction Memory Read Data 1 Address Register File Read Addr 2 Data Memory Read Address PC Instruction Read Data ALU Write Addr Read Data 2 Write Data Write Data MemRead Sign Extend 16 32 Fetch, R, and Memory Access Portions

Branch-on-Equal Instruction EET 4250: Microcomputer Architecture

Executing Branch Operations • Branch operations involves • compare the operands read from the Register File during decode for equality (zero ALU output) • compute the branch target address by adding the updated PC to the 16-bit signed-extended offset field in the instr Branch target address Add Add 4 Shift left 2 ALU control PC zero (to branch control logic) Read Addr 1 Read Data 1 Register File Read Addr 2 Instruction ALU Write Addr Read Data 2 Write Data Sign Extend 16 32

Branch Instruction Data/Control Flow 0 Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] 0 Read Data 1 ALU Write Addr Read Data 2 0 1 Write Data 0 Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]

Single Cycle Datapath with Control Unit 0 Add Add 1 4 Shift left 2 PCSrc ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] 0 Read Data 1 ALU Write Addr Read Data 2 0 1 Write Data 0 Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]

2 address 31:26 25:0 Implementing Jumps • Jump uses word address • Update PC with concatenation of • Top 4 bits of old PC • 26-bit jump address • 00 • Need an extra control signal decoded from opcode Jump EET 4250: Microcomputer Architecture

Adding the Jump Operation Instr[25-0] 1 Shift left 2 28 32 26 0 PC+4[31-28] 0 Add Add 1 4 Shift left 2 PCSrc Jump ALUOp Branch MemRead Instr[31-26] Control Unit MemtoReg MemWrite ALUSrc RegWrite RegDst ovf Instr[25-21] Read Addr 1 Instruction Memory Read Data 1 Address Register File Instr[20-16] zero Read Addr 2 Data Memory Read Address PC Instr[31-0] 0 Read Data 1 ALU Write Addr Read Data 2 0 1 Write Data 0 Instr[15 -11] Write Data 1 Instr[15-0] Sign Extend ALU control 16 32 Instr[5-0]

Executing Jump Operations • Jump operation involves • replace the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits Add 4 4 Jump address Instruction Memory Shift left 2 28 Read Address PC Instruction 26

Datapath With Jumps Added EET 4250: Microcomputer Architecture

Performance Issues • Longest delay determines clock period • Critical path: load instruction • Instruction memory  register file  ALU  data memory  register file • Not feasible to vary period for different instructions • Violates design principle • Making the common case fast • We will improve performance by pipelining EET 4250: Microcomputer Architecture

Instruction Critical Paths • What is the clock cycle time assuming negligible delays for muxes, control unit, sign extend, PC access, shift left 2, wires, setup and hold times except: • Instruction and Data Memory (200 ps) • ALU and adders (200 ps) • Register File access (reads or writes) (100 ps)

Pipelining Analogy • Pipelined laundry: overlapping execution • Parallelism improves performance §4.5 An Overview of Pipelining EET 4250: Microcomputer Architecture

MIPS Pipeline • Five stages, one step per stage • IF: Instruction fetch from memory • ID: Instruction decode & register read • EX: Execute operation or calculate address • MEM: Access memory operand • WB: Write result back to register EET 4250: Microcomputer Architecture

Pipeline Performance • Assume time for stages is • 100ps for register read or write • 200ps for other stages • Compare pipelined datapath with single-cycle datapath EET 4250: Microcomputer Architecture

Pipeline Performance Single-cycle (Tc= 800ps) Pipelined (Tc= 200ps) EET 4250: Microcomputer Architecture

Building a Datapath for CPU Performance Optimization

Building a Datapath for CPU Performance Optimization

Presentation Transcript

Chapter 4

Chapter 4

Chapter 4

Chapter 4

Chapter 4

Chapter 4

Chapter 4

Chapter 4-4

Chapter 4

Chapter 4

Chapter 4 - 4

Chapter 4

CHAPTER 4

Chapter 4

Chapter 4

CHAPTER 4

Chapter 4

Chapter 4

CHAPTER 4

Chapter 4

Chapter 4

Chapter 4