
# ECE243 PowerPoint PPT Presentation


## ECE243

CPU

### IMPLEMENTING A SIMPLE CPU

• How are machine instructions implemented?

• What components are there?

• How are they connected and controlled?

### MINI ISA:

• every instruction is 1-byte wide

• data and address values are also 1-byte wide

• 8 addr bits => 256 byte locations

• 4 registers:

• k0..k3

• PC (resets to \$80)

• Condition codes:

• Z (zero), N (negative)

• these are used by branches

### Some Definitions:

• IMM3: a 3-bit signed immediate, 2 parts:

• 1 sign bit: sign(IMM3)

• 2 bit value: value(IMM3)

• IMM4: a 4-bit signed immediate

• IMM5: a 5-bit unsigned immediate

• R1, R2: register variables

• represent one of k0..k3

• SE8(X):

• means sign-extend value X to 8 bits

• NOTE: ALL INSTS DO THIS LAST:

• PC = PC + 1

### Mini ISA Instructions

load R1 (R2):

R1 = mem[R2]

PC = PC + 1

store R1 (R2):

mem[R2] = R1

PC = PC + 1

add R1 R2

R1 = R1 + R2

IF (R1 == 0) Z = 1 ELSE Z = 0

IF (R1 < 0) N = 1 ELSE N = 0

PC = PC + 1

sub R1 R2

R1 = R1 - R2

IF (R1 == 0) Z = 1 ELSE Z = 0

IF (R1 < 0) N = 1 ELSE N = 0

PC = PC + 1
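The flag rules above can be checked with a minimal Python sketch. The function names `flags`, `add` and `sub` are illustrative, not from the course, and the sketch assumes 8-bit two's-complement wraparound, so "R1 < 0" is read as "bit 7 of the result is 1".

```python
MASK8 = 0xFF  # data values in the mini ISA are 1 byte wide


def flags(result: int) -> tuple[int, int]:
    """Return (Z, N) for an 8-bit result."""
    result &= MASK8
    z = 1 if result == 0 else 0
    n = (result >> 7) & 1  # bit 7 is the sign bit of an 8-bit value
    return z, n


def add(r1: int, r2: int) -> tuple[int, int, int]:
    """R1 = R1 + R2 (mod 256); returns (new R1, Z, N)."""
    result = (r1 + r2) & MASK8
    return (result, *flags(result))


def sub(r1: int, r2: int) -> tuple[int, int, int]:
    """R1 = R1 - R2 (mod 256); returns (new R1, Z, N)."""
    result = (r1 - r2) & MASK8
    return (result, *flags(result))


print(add(0x7F, 0x01))  # (128, 0, 1): 127 + 1 wraps to the negative value 0x80
print(sub(5, 5))        # (0, 1, 0)
```

The same `flags` helper applies unchanged to nand, ori and shift, since they set Z and N by the same rule.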

### Mini ISA Instructions

nand R1 R2

R1 = R1 bitwise-NAND R2

IF (R1 == 0) Z = 1 ELSE Z = 0

IF (R1 < 0) N = 1 ELSE N = 0

PC = PC + 1

ori IMM5

k1 = k1 bitwise-OR IMM5

IF (k1 == 0) Z = 1 ELSE Z = 0

IF (k1 < 0) N = 1 ELSE N = 0

PC = PC + 1

shift R1 IMM3

IF (sign(IMM3)) R1 = R1 << value(IMM3)

ELSE R1 = R1 >> value(IMM3)

IF (R1 == 0) Z = 1 ELSE Z = 0

IF (R1 < 0) N = 1 ELSE N = 0

PC = PC + 1
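A small Python sketch of the three definitions above. It assumes IMM3 is packed as sign(IMM3) in bit 2 and value(IMM3) in bits 1..0, and that a right shift fills with zeros; the slides do not say whether the right shift is logical or arithmetic, so that choice is an assumption.

```python
MASK8 = 0xFF


def shift(r1: int, imm3: int) -> int:
    """shift R1 IMM3: sign(IMM3) = 1 shifts left, 0 shifts right."""
    sign = (imm3 >> 2) & 1   # sign(IMM3), assumed to be bit 2
    value = imm3 & 0b11      # value(IMM3), bits 1..0
    if sign:
        return (r1 << value) & MASK8
    return (r1 & MASK8) >> value  # assumed logical shift (zeros shifted in)


def nand(r1: int, r2: int) -> int:
    """R1 bitwise-NAND R2, kept to 8 bits."""
    return ~(r1 & r2) & MASK8


def ori(k1: int, imm5: int) -> int:
    """k1 bitwise-OR IMM5; IMM5 is a 5-bit unsigned value."""
    return (k1 | (imm5 & 0b11111)) & MASK8


print(shift(0b00000011, 0b110))  # 12: shift left by 2
print(shift(0x80, 0b011))        # 16: shift right by 3
print(nand(0xFF, 0x0F))          # 240
print(ori(0x40, 0x1F))           # 95
```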

### Mini ISA Instructions

bz IMM4

IF (Z == 1) PC = PC + SE8(IMM4)

PC = PC + 1

bnz IMM4

IF (Z == 0) PC = PC + SE8(IMM4)

PC = PC + 1

bpz IMM4

IF (N == 0) PC = PC + SE8(IMM4)

PC = PC + 1
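The three branch definitions differ only in which flag they test. A Python sketch of the resulting next-PC computation (the function names are hypothetical) shows that the offset is applied before the final PC = PC + 1, so `bz` with IMM4 = 1111 (-1) branches back to itself.

```python
def se8(imm4: int) -> int:
    """Sign-extend a 4-bit value to 8 bits."""
    imm4 &= 0xF
    return (imm4 | 0xF0) if (imm4 & 0x8) else imm4


def next_pc(pc: int, op: str, imm4: int, z: int, n: int) -> int:
    """PC after executing bz/bnz/bpz IMM4 at address pc."""
    taken = {"bz": z == 1, "bnz": z == 0, "bpz": n == 0}[op]
    if taken:
        pc = (pc + se8(imm4)) & 0xFF   # PC = PC + SE8(IMM4)
    return (pc + 1) & 0xFF             # every inst does PC = PC + 1 last


print(hex(next_pc(0x80, "bz", 0b1111, z=1, n=0)))   # 0x80: branch to itself
print(hex(next_pc(0x80, "bz", 0b0011, z=0, n=0)))   # 0x81: not taken
print(hex(next_pc(0x80, "bpz", 0b0011, z=0, n=0)))  # 0x84: taken, 0x80 + 3 + 1
```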

### ENCODINGS: Inst(opcode)

• Ori: IMM5 in bits 7..3, opcode 111 in bits 2..0

• Shift: IMM3 in bits 7..5, R1 in bits 4..3, opcode 011 in bits 2..0

• BZ(0101), BNZ(1001), BPZ(1101): IMM4 in bits 7..4, opcode in bits 3..0
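A decoder for these formats fits in a few lines. This Python sketch assumes the bit layouts inferred from the encoding figures (IMM5 above a 3-bit 111 opcode, IMM3 and R1 above a 3-bit 011 opcode, IMM4 above a 4-bit branch opcode); `decode` is a hypothetical helper, and the load/store/add/sub/nand layout is left out because it is not shown here.

```python
def decode(inst: int) -> dict:
    """Split an 8-bit mini-ISA instruction into fields (assumed layouts):
      ori        : IMM5[7:3] | 111[2:0]
      shift      : IMM3[7:5] | R1[4:3] | 011[2:0]
      bz/bnz/bpz : IMM4[7:4] | opcode[3:0] = 0101 / 1001 / 1101
    """
    inst &= 0xFF
    low3 = inst & 0b111
    if low3 == 0b111:
        return {"op": "ori", "imm5": inst >> 3}
    if low3 == 0b011:
        return {"op": "shift", "imm3": inst >> 5, "r1": (inst >> 3) & 0b11}
    branches = {0b0101: "bz", 0b1001: "bnz", 0b1101: "bpz"}
    opcode = inst & 0b1111
    if opcode in branches:
        return {"op": branches[opcode], "imm4": inst >> 4}
    return {"op": "other", "opcode": opcode}  # other insts: layout not covered here


print(decode(0b11001_011 >> 0))  # shift with IMM3 = 110, R1 = k1
```

Note that the 3-bit opcodes 111 and 011 never collide with the low three bits of the branch opcodes (101, 001, 101), which is what lets one byte carry either format.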

### DESIGNING A CPU

• Two main components:

• datapath and control

• datapath:

• registers, functional units, muxes, wires

• must be able to perform all steps of every inst

• control:

• a finite state machine (FSM)

• commands the datapath

• performs: fetch, decode, read, execute, write, get next inst

## ECE243

CPU: basic components

### REGISTERS

[Figure: register REG with an 8-bit input in, an 8-bit output out, a write enable REGWrite?, and a clock input]

• REGISTERS

• we assume falling-edge-triggered

• in is stored if REGWrite=1 on falling clock edge

• we won’t normally draw the clock input

### MUXES

[Figure: 2-to-1 mux with 8-bit inputs 0 and 1, an 8-bit output out, and a select input]

• ‘select’ signal chooses which input to route to output

### REGISTER FILE

[Figure: register file (k0,k1,k2,k3) with 2-bit index inputs R1, R2 and Rwrite, 8-bit outputs Out1 and Out2, an 8-bit input in, REGWrite?, and clock]

• Out1 is the value of reg indexed by R1

• Out2 is the value of reg indexed by R2

• if REGWrite is 1 when clock goes low

• then the value on ‘in’ is written to reg indexed by Rwrite
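A behavioral Python sketch of this register file: reads are combinational (available at any time), while a write happens only when the falling clock edge arrives with REGWrite set. The class and method names are hypothetical.

```python
class RegisterFile:
    """Four 8-bit registers k0..k3 with two read ports and one write port."""

    def __init__(self) -> None:
        self.k = [0, 0, 0, 0]

    def read(self, r1: int, r2: int) -> tuple[int, int]:
        """Return (Out1, Out2) for the 2-bit indices R1 and R2."""
        return self.k[r1 & 0b11], self.k[r2 & 0b11]

    def falling_edge(self, reg_write: int, rwrite: int, value: int) -> None:
        """Model the clock going low: store 'in' into reg Rwrite if REGWrite is 1."""
        if reg_write:
            self.k[rwrite & 0b11] = value & 0xFF


rf = RegisterFile()
rf.falling_edge(reg_write=1, rwrite=2, value=0x1AB)  # only the low 8 bits are kept
rf.falling_edge(reg_write=0, rwrite=3, value=0x55)   # ignored: REGWrite is 0
print(rf.read(r1=2, r2=3))  # (171, 0)
```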

### ALU (arithmetic logic unit)

[Figure: ALU with 8-bit inputs In0 and In1, an 8-bit output out, a 3-bit ALUop input, and flag outputs Z and N]

• ALUop:

• sub = 001

• or = 010

• nand = 011

• shift = 100

• Z = nor(out7,out6,out5…out0)

• N = out bit 7 (the sign bit, so N = 1 means the result is negative)
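A Python sketch of this ALU. The list above gives codes for sub, or, nand and shift only; the code 000 for add is an assumption here, as is feeding IMM3 to In1 for shifts (sign in bit 2, amount in bits 1..0).

```python
MASK8 = 0xFF
ALU_ADD = 0b000    # assumption: not listed on the slide
ALU_SUB = 0b001
ALU_OR = 0b010
ALU_NAND = 0b011
ALU_SHIFT = 0b100


def alu(in0: int, in1: int, aluop: int) -> tuple[int, int, int]:
    """Return (out, Z, N)."""
    if aluop == ALU_ADD:
        out = in0 + in1
    elif aluop == ALU_SUB:
        out = in0 - in1
    elif aluop == ALU_OR:
        out = in0 | in1
    elif aluop == ALU_NAND:
        out = ~(in0 & in1)
    elif aluop == ALU_SHIFT:
        amount = in1 & 0b11
        out = (in0 << amount) if (in1 & 0b100) else ((in0 & MASK8) >> amount)
    else:
        raise ValueError(f"unused ALUop {aluop:03b}")
    out &= MASK8
    z = int(out == 0)     # Z = nor(out7..out0)
    n = (out >> 7) & 1    # N = out bit 7
    return out, z, n


print(alu(1, 2, ALU_SUB))  # (255, 0, 1)
```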

### MEMORY

• our CPU has two memories for simplicity:

• instruction memory and data memory

• known as a “Harvard architecture”

### INSTRUCTION MEM

[Figure: INST MEM with an 8-bit address input and an 8-bit output Iout]

• Iout is set to the value indexed by the address

### DATA MEMORY

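[Figure: DATA MEM with an 8-bit address input, an 8-bit data input Din, an 8-bit data output Dout, MEMWrite?, and clock]

• can read (Dout) or write (Din), but only one in a given clock cycle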

• on falling clock edge:

• if MEMWrite==1: value on Din is stored at addr

### SE8(x): SIGN-EXTEND TO 8 BITS

[Figure: SE8 wiring: I3..I0 drive O3..O0, and I3 (the sign bit) also drives O7..O4]

• assuming 4-bit input

• Recall: want:

• SE8(0100) -> 00000100

• SE8(1100) -> 11111100

• In bits i3,i2,i1,i0; out bits o7…o0
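The same wiring expressed as a Python function (a sketch; `se8` and its `width` parameter are illustrative). Copying the top input bit into every higher output bit is exactly what the figure's I3 to O7..O4 connections do for a 4-bit input.

```python
def se8(x: int, width: int = 4) -> int:
    """Sign-extend a `width`-bit value to 8 bits."""
    x &= (1 << width) - 1                 # keep only the input bits
    sign = (x >> (width - 1)) & 1         # top input bit, e.g. i3 for width 4
    if sign:
        x |= 0xFF & ~((1 << width) - 1)   # drive o7..o(width) with ones
    return x


print(f"{se8(0b0100):08b}")  # 00000100
print(f"{se8(0b1100):08b}")  # 11111100
```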

### ZE8(x): ZERO EXTEND TO 8 bits

[Figure: ZE8 wiring: I4..I0 drive O4..O0, and O7..O5 are tied to 0]

• assuming 5-bit input

• Recall: want

• ZE8(00100) -> 00000100

• ZE8(11100) -> 00011100

• In bits i4,i3,i2,i1,i0; out bits o7…o0
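As a Python sketch (`ze8` is an illustrative name), zero extension just keeps the input bits and leaves the upper output bits at 0, regardless of the input's top bit.

```python
def ze8(x: int, width: int = 5) -> int:
    """Zero-extend a `width`-bit value to 8 bits: o7..o(width) are 0."""
    return x & ((1 << width) - 1)


print(f"{ze8(0b00100):08b}")  # 00000100
print(f"{ze8(0b11100):08b}")  # 00011100 (a sign extension would give 11111100)
```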

## ECE243

CPU: Single Cycle Implementation

### SINGLE CYCLE DATAPATH

[Figure: timeline in which Inst1 and Inst2 each take one clock cycle (1 cyc)]

• each instruction executes entirely

• in one cycle of the cpu clock

• registers are triggered by the falling edge

• new values begin propagating through datapath

• some values may be temporarily incorrect

• the clock period is long enough to ensure:

• that all values are correct before the next falling edge

### FETCH

[Figure: fetch datapath: the 8-bit PC addresses INST MEM, which outputs inst; PCwrite? enables updating the PC]

• needed by every instruction

• i.e., every instruction must be fetched

### BRANCHES: BZ IMM4

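[Figure: datapath for BZ: PC and INST MEM, an adder computing PC + 1, a second adder adding SE8(IMM4), and a PCsel mux choosing the next PC, which is written when PCwrite? is set. Encoding: IMM4 in bits 7..4, opcode in bits 3..0]

• if the branch is taken, it does: PC = PC + SE8(IMM4) + 1

### ADD: ADD R1 R2

[Figure: datapath for add, in which the R1 and R2 fields of inst (opcode 0100) select the register-file operands]

• does: r1 = r1 + r2

• same datapath for sub and nand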

### SHIFT: SHIFT R1 IMM3

[Figure: single-cycle datapath (PC, INST MEM, REG FILE, ALU with ALUop and Z and N flags, branch adders with SE8(IMM4) and PCsel) extended for shift, with IMM3 as a second ALU input. Encoding: IMM3 in bits 7..5, R1 in bits 4..3, opcode 011 in bits 2..0]

### ORI: ORI IMM5

[Figure: datapath extended for ori: an ALU2 mux selects the ALU's second input, adding ZE8(IMM5) alongside ZE8(IMM3). Encoding: IMM5 in bits 7..3, opcode 111 in bits 2..0]

• does: k1 <- k1 bitwise-or IMM5

### Store: Store R1 (R2)

[Figure: datapath with an R1sel mux in front of the register file's R1 input and a four-input ALU2 mux (00, 01, 10, 11) whose inputs include ZE8(IMM3) and ZE8(IMM5). Encoding fields: R1, R2, opcode]

• does: mem[r2] = r1

### Load: Load R1 (R2)

[Figure: the datapath with a Data MEM added (data input Din, write enable MEMwrite)]

• does: r1 = mem[r2]

[Figure: the complete single-cycle datapath, which adds an RFin mux selecting the register-file write data (ALU result or Data MEM output)]

### DESIGNING THE CONTROL UNIT

[Figure: a CTRL block whose inputs are the opcode and the Z and N flags, and whose outputs include PCsel]

• CONTROL SIGNALS TO GENERATE:

• PCsel, PCwrite, REGwrite, MEMread, MEMwrite, R1sel, ALUop, ALU2, RFin
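Of these signals, PCsel is the only one that depends on the flags as well as the opcode. A Python sketch of that piece of the control logic, using the branch opcodes from the ENCODINGS slide; which mux input is numbered 0 and which 1 is an assumption here.

```python
BZ, BNZ, BPZ = 0b0101, 0b1001, 0b1101  # branch opcodes (inst bits 3..0)


def pcsel(opcode: int, z: int, n: int) -> int:
    """1 = take PC + SE8(IMM4) + 1 (branch taken); 0 = take PC + 1.
    The 0/1 assignment of the mux inputs is assumed, not from the slides."""
    opcode &= 0b1111
    taken = ((opcode == BZ and z == 1)
             or (opcode == BNZ and z == 0)
             or (opcode == BPZ and n == 0))
    return int(taken)


print(pcsel(BZ, z=1, n=0), pcsel(BZ, z=0, n=0))  # 1 0
```

Non-branch instructions always get PCsel = 0, since (under the ENCODINGS layouts) their low four bits never equal 0101, 1001 or 1101.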

[Figures: the complete single-cycle datapath repeated for each instruction (e.g., store R1 (R2), sub R1 R2, nand R1 R2, ori IMM5, shift R1 IMM3, bz IMM4, bnz IMM4, bpz IMM4), marking the control-signal values that instruction needs]

## ECE243

CPU: Multicycle Implementation

### Key Difference #3: Temp Regs

• what benefit do temp regs / multicycle give?

• single cycle: critical path is long => large clock period

• multicycle: smaller critical paths => shorter clock period

• let's examine these one at a time

### IR: Instruction Register

holds inst encoding

### MDR: Memory Data Register

holds the value returned from Memory

### R1 and R2

hold values from the register file

### ALUout

holds the result calculated by the ALU

### All Insts Cycle1: Fetch and Increment PC

IR ← mem[PC]; PC ← PC + 1;

increment PC

fetch next inst into the IR

### All Insts Cycle2: Decoding Inst & Reading Reg File

R1 ← Kx; R2 ← Ky

Note: not all insts need R1 and R2

### Add, Sub, Nand Cycle3: Calculate

ALUout ← R1 op R2

Cycle4: Kx ← ALUout

### Shift Cycle3: Calculate

ALUout ← R1 op IMM3

Cycle4: Kx ← ALUout

ORI Cycle3: R1 ← k1

### ORI Cycle4: Calculate

ALUout ← R1 op IMM5

ORI Cycle5: k1 ← ALUout

Load Cycle3: MDR ← mem[R2]

Load Cycle4: Kx ← MDR

Store Cycle3: mem[R2] ← R1

Branch Cycle3: if taken, PC ← PC + SE8(IMM4)

### Summary

Example: total time to execute one of each instruction:

Single cycle: 1*4 + 1*4 + 1*1 = 9 cycles; 9 cycles / 1 MHz = 9 us

Multicycle: 3*4 + 4*4 + 1*5 = 33 cycles; 33 cycles / 4 MHz = 8.25 us
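The arithmetic can be reproduced in a few lines of Python. Each pair below is one product term from the summary, read as (cycles per instruction, number of instructions); since a clock of f MHz runs f cycles per microsecond, time in microseconds is cycles / f.

```python
def exec_time_us(terms, clock_mhz):
    """terms: list of (cycles per inst, number of insts). Returns (cycles, us)."""
    cycles = sum(cpi * count for cpi, count in terms)
    return cycles, cycles / clock_mhz


single_cycles, single_us = exec_time_us([(1, 4), (1, 4), (1, 1)], clock_mhz=1)
multi_cycles, multi_us = exec_time_us([(3, 4), (4, 4), (5, 1)], clock_mhz=4)
print(single_cycles, single_us)  # 9 9.0
print(multi_cycles, multi_us)    # 33 8.25
```

The multicycle CPU needs almost four times as many cycles, but each cycle is four times shorter, so it still finishes slightly sooner.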

### Control: An FSM

• need a state transition diagram

• how many states are there?

• how many bits to represent state?

[Figure: control FSM: Ctrl logic takes IR bits 3..0, Z, N and Current_state (from a 4-bit State Register) and produces Next_state plus control signals such as PCwrite, PCsel and ALUop]
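One hypothetical answer to "how many states?" gives each distinct step in the multicycle listings its own state and shares the Kx ← ALUout write-back between add/sub/nand and shift. The Python sketch below (all state names invented here) ends up with 12 states, which fits the 4-bit State Register; the course's own FSM may group the steps differently.

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()        # IR <- mem[PC]; PC <- PC + 1
    DECODE = auto()       # R1 <- Kx; R2 <- Ky
    CALC_RR = auto()      # ALUout <- R1 op R2 (add, sub, nand)
    CALC_SHIFT = auto()   # ALUout <- R1 op IMM3
    WRITE_ALU = auto()    # Kx <- ALUout (shared by add/sub/nand/shift)
    ORI_READ = auto()     # R1 <- k1
    ORI_CALC = auto()     # ALUout <- R1 op IMM5
    ORI_WRITE = auto()    # k1 <- ALUout
    LOAD_MEM = auto()     # MDR <- mem[R2]
    LOAD_WRITE = auto()   # Kx <- MDR
    STORE_MEM = auto()    # mem[R2] <- R1
    BRANCH = auto()       # if taken: PC <- PC + SE8(IMM4)


# State entered after DECODE, chosen by the instruction (hypothetical grouping).
AFTER_DECODE = {
    "add": State.CALC_RR, "sub": State.CALC_RR, "nand": State.CALC_RR,
    "shift": State.CALC_SHIFT, "ori": State.ORI_READ,
    "load": State.LOAD_MEM, "store": State.STORE_MEM,
    "bz": State.BRANCH, "bnz": State.BRANCH, "bpz": State.BRANCH,
}

# Transitions that do not depend on the instruction.
FIXED_NEXT = {
    State.FETCH: State.DECODE,
    State.CALC_RR: State.WRITE_ALU,
    State.CALC_SHIFT: State.WRITE_ALU,
    State.ORI_READ: State.ORI_CALC,
    State.ORI_CALC: State.ORI_WRITE,
    State.LOAD_MEM: State.LOAD_WRITE,
}


def next_state(state: State, inst: str) -> State:
    if state is State.DECODE:
        return AFTER_DECODE[inst]
    return FIXED_NEXT.get(state, State.FETCH)  # an inst's last cycle returns to FETCH


def cycles_for(inst: str) -> int:
    """Count the cycles from FETCH until the FSM is back at FETCH."""
    state, cycles = State.FETCH, 0
    while True:
        cycles += 1
        state = next_state(state, inst)
        if state is State.FETCH:
            return cycles


STATE_BITS = (len(State) - 1).bit_length()  # 12 states -> 4 bits
print(len(State), STATE_BITS, cycles_for("ori"))  # 12 4 5
```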

## ECE243

### EXAMPLE QUESTION: ADDING A NEW INSTRUCTION

Does: RF[r1] = MEM[RF[r2]]

RF[r2] = RF[r2] + 1

r2 is permanently changed to be r2+1

### Implementing: RF[r1] = MEM[RF[r2]]; RF[r2] = RF[r2] + 1

IR = mem[PC], PC = PC + 1

R1 = RF[r1], R2 = RF[r2]

MDR = mem[R2]

RF[r1] = MDR

### Modifying the Datapath

RF[r2] = RF[r2] + 1
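A behavioral Python sketch of the new instruction, useful for checking a modified datapath against. `load_postinc` is a hypothetical name, and the sketch assumes 8-bit wraparound on the increment. When r1 and r2 name the same register the result depends on which write happens last; this sketch writes the incremented address last.

```python
def load_postinc(rf: list[int], mem: list[int], r1: int, r2: int) -> None:
    """RF[r1] = MEM[RF[r2]]; RF[r2] = RF[r2] + 1."""
    addr = rf[r2]               # R2 = RF[r2]
    mdr = mem[addr]             # MDR = mem[R2]
    rf[r1] = mdr                # RF[r1] = MDR
    rf[r2] = (addr + 1) & 0xFF  # RF[r2] = R2 + 1 (the new datapath step)


rf = [0, 0, 0x10, 0]
mem = [0] * 256
mem[0x10] = 42
load_postinc(rf, mem, r1=1, r2=2)
print(rf)  # [0, 42, 17, 0]
```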

## ECE243

CPU: Pipelining

[Figure: a fast-food counter with five stations (take order, select bun, ingredients, wrap and bag, cash and change) staffed by a cook]

### With One Cook

[Figure: one cook takes customer1 through all five stations]

• one customer is serviced at a time

### Like the single-cycle CPU

[Figure: the single-cycle datapath]

• one instruction flows through at a time

[Figure: the counter with two cooks]

### Pipelining

• Like an assembly line

• Doesn’t change the interface or result

• improves performance

[Figures: the datapath cut into stages, and the counter with three cooks, each working some of the stations; this gives a three-stage CPU pipeline: Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)]

### Fast Food Hazards

[Figure: three cooks serving customer3, customer2 and customer1 at successive stations]

What if c1 and c2 are friends, c2 has no money, and c2 needs to know how much change c1 will get before ordering (to ensure c2 can afford his order)?

[Figure: the counter again, with customer2 and customer1]

### CPU Hazards

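[Figure: the three-stage pipeline (Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)) in which an instruction needs a register value that the instruction ahead of it has not yet written]

• this is called a data hazard

• it must be detected and handled to ensure correct execution

• there are two solutions to data hazards: stalling with bubbles, and forwarding

[Figure: the same three-stage pipeline with a bubble between the two instructions]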

### How to insert bubbles

• option1: hardware stalls the pipeline

• need extra logic to do so

• happens ‘automatically’ for any code

• option2: compiler inserts “no-ops”

• a no-op is an instruction that does nothing

• the compiler must insert them correctly, or the program produces wrong results!

• example: inserting a bubble with a no-op:

noop
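A Python sketch of option 2. It assumes the register file is written at the end of Execute and read in Decode, so an instruction that reads a register written by the instruction immediately ahead of it needs exactly one bubble. The (name, destination, sources) tuple format and `insert_noops` are hypothetical.

```python
NOOP = ("noop", None, ())


def insert_noops(program):
    """program: list of (name, dest reg or None, tuple of source regs)."""
    out = []
    for inst in program:
        _, _, srcs = inst
        if out:
            _, prev_dest, _ = out[-1]
            if prev_dest is not None and prev_dest in srcs:
                out.append(NOOP)  # bubble: wait for prev_dest to be written
        out.append(inst)
    return out


prog = [("add k0 k1", "k0", ("k0", "k1")),
        ("sub k2 k0", "k2", ("k2", "k0")),   # reads k0 right after it is written
        ("nand k3 k1", "k3", ("k3", "k1"))]
for name, _, _ in insert_noops(prog):
    print(name)
```

This prints add k0 k1, noop, sub k2 k0, nand k3 k1: only the dependent pair is separated, and independent code runs without bubbles.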

### Solution2: Forwarding Lines

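[Figure: the three-stage pipeline (Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)) with forwarding lines]

• forwarding lines pass values directly between stages, so an instruction need not wait for a result to reach the register file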

### Control Hazards

• cpu predicts each branch is not taken

• Better: predict taken

• why? loops are common, and loop branches are usually taken

• More advanced: remember what each branch did last time

• “branch predictor”:

• a table that remembers what each branch did the last time

• uses this to make a prediction next time
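A minimal 1-bit branch predictor in Python: a table indexed by the low bits of the branch's address, each entry remembering the branch's last outcome. The table size of 16 and the "taken" initial value (matching the predict-taken policy) are choices made for this sketch.

```python
class BranchPredictor:
    """1-bit predictor: each entry holds the last outcome (True = taken)."""

    def __init__(self, entries: int = 16) -> None:
        self.table = [True] * entries  # start by predicting taken

    def predict(self, pc: int) -> bool:
        return self.table[pc % len(self.table)]

    def update(self, pc: int, taken: bool) -> None:
        self.table[pc % len(self.table)] = taken


# A loop branch at address 0x84 is taken 3 times, then falls through once.
bp = BranchPredictor()
hits = 0
for outcome in [True, True, True, False]:
    hits += bp.predict(0x84) == outcome
    bp.update(0x84, outcome)
print(hits)  # 3
```

Different branches whose addresses share the same low bits map to the same entry, which is a trade-off of keeping the table small.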

### Some Real CPU Pipelines

• Pentium IV's pipeline: TC nxt IP, TC fetch, Drv, Alloc, Rename, Que, Sch, Sch, Sch, Disp, Disp, RF, RF, Ex, Flgs, BrCk, Drv

[Figure: 21264 Pipeline (Alpha), from Microprocessor Report 10/28/96]

## ECE243

CPU: Alternate Architectures

[Figure: single internal-bus CPU: PC, IR (with Imm3,4,5), MAR, MDR, registers k0..k3, Y, the ALU (ALUop, Select) and Z attach to one internal bus; MEM connects through MAR and MDR (Din, Dout, MEMWrite); CONTROL sends signals to all components]

### SOME CONTROL SIGNALS

• PCout:

• write PC value to bus

• PCin:

• read bus value into PC

• MDRinBus:

• read value from bus into MDR

• MDRinMem:

• write value from Dout of MEM into MDR

• MDRoutBus:

• write value from MDR onto bus


### CHARACTERIZATION OF ISAs

• Attribute #1:

• number of explicit operands

• Attribute #2:

• are registers general purpose?

• Attribute #3:

• Can an operand be a memory location?

• Attribute #4:

• RISC vs CISC

• Attribute #5:

• Relation between instructions and data

### att1: num of explicit operands

• focus on calculation instructions (add,sub…)

• running example: A = B + C (C-code)

• assume A, B, C are memory locations

• 0 operands:

• e.g., stack-based (like the first calculator CPUs)

• push and pop operations; calculation instructions implicitly use the top of the stack
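The running example A = B + C on a 0-operand machine, as a Python sketch: `add` names no operands and works on the top two stack entries, while push and pop move values between memory and the stack. The mnemonics and tuple format are hypothetical.

```python
def run_stack_machine(program, mem):
    """Execute (op, *args) tuples on a stack machine; mem maps names to values."""
    stack = []
    for op, *arg in program:
        if op == "push":
            stack.append(mem[arg[0]])
        elif op == "pop":
            mem[arg[0]] = stack.pop()
        elif op == "add":                    # no explicit operands
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(op)
    return mem


mem = {"A": 0, "B": 2, "C": 3}
run_stack_machine([("push", "B"), ("push", "C"), ("add",), ("pop", "A")], mem)
print(mem["A"])  # 5
```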

### att1: num of explicit operands

• 1 operand:

• e.g., accumulator-based

• the accumulator is a register inside the CPU

• instructions use the accumulator as an implicit source and as the destination
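The same A = B + C on a 1-operand machine, sketched in Python: each instruction names one memory location, and the accumulator supplies the other operand and receives the result. The load/add/store mnemonics are hypothetical.

```python
def run_accumulator(program, mem):
    """Execute (op, address) pairs on an accumulator machine."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc = acc + mem[addr]   # accumulator is source and destination
        elif op == "store":
            mem[addr] = acc
        else:
            raise ValueError(op)
    return mem


mem = {"A": 0, "B": 2, "C": 3}
run_accumulator([("load", "B"), ("add", "C"), ("store", "A")], mem)
print(mem["A"])  # 5
```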

### att1: num of explicit operands

• 2 operands

• e.g., 68k, IA-32

### att1: num of explicit operands

• 3-operand

• e.g., MIPS, SPARC, PowerPC

• How many operands does NIOS use?

### Att2: are regs general purpose?

• if yes:

• you can use any register for any purpose

• special registers are by convention only

• if no:

• some registers have hardwired purposes

• ex: in 68k, A7 is hardwired to be stack pointer

• used implicitly for jsr, rts, link instructions

• Are NIOS registers general purpose?

### Att3: operand = mem location?

• with respect to calculation insts (add, sub)

• if yes:

• one operand can be in memory, the other in a register

• maybe: can also write the result to memory

• if no:

• only load/store insts can get/put memory values to/from regs

• Can a NIOS operand be a mem location?

### Att4: RISC vs CISC

• Are there instructions with many steps?

• a vague and debatable question

• CISC: complex instruction set computer

• Many, complex instructions

• can be hard to pipeline!

• ex: 68k, x86, PowerPC?

• RISC: reduced instruction set computer

• Fewer, simple instructions

• easy to pipeline

• ex: MIPS, Alpha, PowerPC?

• Which is NIOS?

• Quandary: x86 is a CISC

• but the Pentium IV has a 20-stage pipeline!

• How’d they do it?

### Att5: Relation bet. insts & data

• SISD: single instruction, single data

• everything we have seen so far

• an inst only writes one reg/memory location

• SIMD: single instruction, multiple data

• one instruction tells CPU to operate on an array of regs or memory locations

• ex: multimedia extensions: MMX, SSE (Intel); 3DNow! (AMD); AltiVec (PowerPC)

• ex: IBM/Sony/Toshiba Cell processor (vector processor)

• MIMD: multiple instruction, multiple data

• ex: Cluster of workstations, SMP servers, multicores, hyperthreading

• Which is NIOS?