
ECE243

CPU


IMPLEMENTING A SIMPLE CPU

  • How are machine instructions implemented?

  • What components are there?

  • How are they connected and controlled?


MINI ISA:

  • every instruction is 1-byte wide

    • data and address values are also 1-byte wide

  • address space

    • byte addressable (every byte has an address)

    • 8 addr bits => 256 byte locations

  • 4 registers:

    • k0..k3

  • PC (resets to $80)

  • Condition codes:

    • Z (zero), N (negative)

    • these are used by branches


Some Definitions:

  • IMM3: a 3-bit signed immediate, 2 parts:

    • 1 sign bit: sign(IMM3)

    • 2 bit value: value(IMM3)

  • IMM4: a 4-bit signed immediate

  • IMM5: a 5-bit unsigned immediate

  • R1, R2: register variables

    • represent one of k0..k3

  • SE8(X):

    • means sign-extend value X to 8 bits

  • NOTE: ALL INSTS DO THIS LAST:

    • PC = PC + 1


Mini ISA Instructions

load R1 (R2):

  R1 = mem[R2]

  PC = PC + 1

store R1 (R2):

  mem[R2] = R1

  PC = PC + 1

add R1 R2

  R1 = R1 + R2

  IF (R1 == 0) Z = 1 ELSE Z = 0

  IF (R1 < 0) N = 1 ELSE N = 0

  PC = PC + 1

sub R1 R2

  R1 = R1 - R2

  IF (R1 == 0) Z = 1 ELSE Z = 0

  IF (R1 < 0) N = 1 ELSE N = 0

  PC = PC + 1
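The add and sub semantics above can be sketched in Python. This is a minimal model, assuming registers are stored as unsigned 8-bit values so that "R1 < 0" means bit 7 is set; the names `flags`, `add`, `sub`, and `regs` are hypothetical and not part of the slides.

```python
MASK8 = 0xFF  # data values are 1 byte wide


def flags(result):
    """Return (Z, N) for an 8-bit result."""
    z = 1 if result == 0 else 0
    n = (result >> 7) & 1  # bit 7 is the sign bit
    return z, n


def add(regs, r1, r2):
    """R1 = R1 + R2, wrapped to 8 bits; returns the new (Z, N)."""
    regs[r1] = (regs[r1] + regs[r2]) & MASK8
    return flags(regs[r1])


def sub(regs, r1, r2):
    """R1 = R1 - R2, wrapped to 8 bits; returns the new (Z, N)."""
    regs[r1] = (regs[r1] - regs[r2]) & MASK8
    return flags(regs[r1])


regs = [5, 3, 0, 0]        # k0..k3
z, n = sub(regs, 1, 0)     # k1 = 3 - 5
print(hex(regs[1]), z, n)  # 0xfe 0 1
```

The PC update is left out here because every instruction ends with the same PC = PC + 1 step.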


Mini ISA Instructions

nand R1 R2

  R1 = R1 bitwise-NAND R2

  IF (R1 == 0) Z = 1 ELSE Z = 0

  IF (R1 < 0) N = 1 ELSE N = 0

  PC = PC + 1

ori IMM5

  K1 = K1 bitwise-OR IMM5

  IF (K1 == 0) Z = 1 ELSE Z = 0

  IF (K1 < 0) N = 1 ELSE N = 0

  PC = PC + 1

shift R1 IMM3

  IF (sign(IMM3)) R1 = R1 << value(IMM3)

  ELSE R1 = R1 >> value(IMM3)

  IF (R1 == 0) Z = 1 ELSE Z = 0

  IF (R1 < 0) N = 1 ELSE N = 0

  PC = PC + 1
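A rough Python sketch of shift and ori follows. The slides do not say which bit of IMM3 is the sign bit or whether the right shift is logical or arithmetic; this sketch assumes the top bit is sign(IMM3) and the right shift is logical. The function names are hypothetical.

```python
def flags(result):
    """Return (Z, N) for an 8-bit result."""
    return (1 if result == 0 else 0), (result >> 7) & 1


def shift(regs, r1, imm3):
    """shift R1 IMM3: left if sign(IMM3) is set, otherwise right."""
    sign = (imm3 >> 2) & 1   # assumption: top bit of IMM3 is sign(IMM3)
    value = imm3 & 0b11      # 2-bit shift amount, value(IMM3)
    if sign:
        regs[r1] = (regs[r1] << value) & 0xFF
    else:
        regs[r1] = regs[r1] >> value  # assumption: logical right shift
    return flags(regs[r1])


def ori(regs, imm5):
    """ori IMM5: always operates on k1 (index 1)."""
    regs[1] = regs[1] | (imm5 & 0x1F)  # IMM5 is unsigned, so no sign extension
    return flags(regs[1])


regs = [3, 0, 0, 0]     # k0..k3
shift(regs, 0, 0b110)   # sign = 1, value = 2: k0 = 3 << 2
print(regs[0])          # 12
```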


Mini ISA Instructions

bz IMM4

  IF (Z == 1) PC = PC + SE8(IMM4)

  PC = PC + 1

bnz IMM4

  IF (Z == 0) PC = PC + SE8(IMM4)

  PC = PC + 1

bpz IMM4

  IF (N == 0) PC = PC + SE8(IMM4)

  PC = PC + 1
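The branch rules above can be sketched as a next-PC function. This is an illustrative model only: `se8` and `next_pc` are hypothetical names, and the PC is assumed to wrap around at 8 bits.

```python
def se8(imm4):
    """Sign-extend a 4-bit value to 8 bits."""
    imm4 &= 0xF
    return imm4 | 0xF0 if imm4 & 0x8 else imm4


def next_pc(pc, op, imm4, z, n):
    """Apply the optional branch offset, then the PC = PC + 1 step."""
    taken = (
        (op == "bz" and z == 1)
        or (op == "bnz" and z == 0)
        or (op == "bpz" and n == 0)
    )
    if taken:
        pc = (pc + se8(imm4)) & 0xFF  # assumption: PC wraps at 8 bits
    return (pc + 1) & 0xFF


# bz with offset -2 (0b1110) at PC = 0x80, branch taken
print(hex(next_pc(0x80, "bz", 0b1110, z=1, n=0)))  # 0x7f
```

Because PC = PC + 1 always runs last, a taken branch lands at PC + SE8(IMM4) + 1.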


ENCODINGS: Inst(opcode)

  • Load(0000), store(0010), add(0100), sub(0110), nand(1000):

  • Ori:

  • Shift:

  • BZ(0101), BNZ(1001), BPZ(1101):

[Figures: each format above was followed by a field layout over bits 7 6 5 4 3 2 1 0; the field boundaries did not survive extraction]


DESIGNING A CPU

  • Two main components:

    • datapath and control

  • datapath:

    • registers, functional units, muxes, wires

    • must be able to perform all steps of every inst

  • control:

    • a finite state machine (FSM)

    • commands the datapath

    • performs: fetch, decode, read, execute, write, get next inst



ECE243

CPU: basic components


REGISTERS

[Figure: register REG with 8-bit in, 8-bit out, and REGWrite and clock inputs]

  • REGISTERS

    • can always read

    • we assume falling-edge-triggered

    • in is stored if REGWrite=1 on falling clock edge

    • we won’t normally draw the clock input


MUXES

[Figure: mux with 8-bit inputs 0 and 1, an 8-bit out, and a select input]

  • ‘select’ signal chooses which input to route to output


REGISTER FILE

[Figure: Reg FILE (k0,k1,k2,k3) with 2-bit index inputs R1, R2, and Rwrite; 8-bit input in; 8-bit outputs Out1 and Out2; REGWrite and clock inputs]

  • Out1 is the value of reg indexed by R1

  • Out2 is the value of reg indexed by R2

  • if REGWrite is 1 when clock goes low

    • then the value on ‘in’ is written to reg indexed by Rwrite
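A small Python model of the register file described above, offered as a sketch: reads are always available, and a write happens only when `falling_edge` is called with REGWrite set. The class and method names are hypothetical.

```python
class RegFile:
    """Four 8-bit registers k0..k3 with two read ports and one write port."""

    def __init__(self):
        self.k = [0, 0, 0, 0]

    def read(self, r1, r2):
        """Out1 and Out2: the values of the registers indexed by R1 and R2."""
        return self.k[r1], self.k[r2]

    def falling_edge(self, regwrite, rwrite, value):
        """On the falling clock edge, store 'in' at Rwrite if REGWrite is 1."""
        if regwrite:
            self.k[rwrite] = value & 0xFF


rf = RegFile()
rf.falling_edge(True, 2, 0x2A)   # write k2
rf.falling_edge(False, 3, 0x55)  # REGWrite = 0, so k3 is unchanged
print(rf.read(2, 3))             # (42, 0)
```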


ALU (arithmetic logic unit)

[Figure: ALU with 8-bit inputs In0 and In1, an 8-bit out, flag outputs Z and N, and a 3-bit ALUop input]

  • ALUop:

    • add = 000

    • sub = 001

    • or = 010

    • nand = 011

    • shift = 100

  • Z = nor(out7,out6,out5…out0)

  • N = out bit 7 (the sign bit; 1 implies negative)
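The ALU can be sketched as one Python function using the ALUop codes listed above. Two points are assumptions rather than facts from the slides: for shift, In1 is taken to carry IMM3 with its top bit as the direction, and the right shift is logical.

```python
ADD, SUB, OR, NAND, SHIFT = 0b000, 0b001, 0b010, 0b011, 0b100


def alu(in0, in1, aluop):
    """Return (out, Z, N) for 8-bit inputs in0 and in1."""
    if aluop == ADD:
        out = in0 + in1
    elif aluop == SUB:
        out = in0 - in1
    elif aluop == OR:
        out = in0 | in1
    elif aluop == NAND:
        out = ~(in0 & in1)
    elif aluop == SHIFT:
        amount = in1 & 0b11  # assumption: in1 carries IMM3
        out = in0 << amount if in1 & 0b100 else in0 >> amount
    else:
        raise ValueError("unknown ALUop")
    out &= 0xFF
    z = 1 if out == 0 else 0  # Z = nor of all 8 output bits
    n = (out >> 7) & 1        # N = out bit 7
    return out, z, n


print(alu(0xFF, 0xFF, NAND))  # (0, 1, 0)
print(alu(1, 2, SUB))         # (255, 0, 1)
```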


MEMORY

  • our CPU has two memories for simplicity:

    • instruction memory and data memory

    • known as a “Harvard architecture”


INSTRUCTION MEM

[Figure: INST MEM with an 8-bit addr input and an 8-bit Iout output]

  • is read only

  • Iout is set to the value indexed by the address


DATA MEMORY

[Figure: DATA MEM with 8-bit addr and Din inputs, an 8-bit Dout output, and MEMRead, MEMWrite, and clock inputs]

  • can read or write

    • but only one in a given clock cycle

  • on falling clock edge:

    • if MEMWrite==1: value on Din is stored at addr

    • if MEMRead==1: value at addr is output on Dout


SE8(x): SIGN-EXTEND TO 8 BITS

[Figure: wiring from input bits I3..I0 to output bits O7..O0]

  • assuming 4-bit input

  • Recall: want:

    • SE8(0100) -> 00000100

    • SE8(1100) -> 11111100

  • In bits i3,i2,i1,i0; out bits o7…o0
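As a sketch of the wiring, sign extension copies the low bits straight through and replicates the top input bit into every upper output bit. The function name `se8_bits` is hypothetical.

```python
def se8_bits(i3, i2, i1, i0):
    """Return (o7, o6, o5, o4, o3, o2, o1, o0) for a 4-bit input."""
    # o3..o0 pass through; o7..o4 are all copies of the sign bit i3
    return (i3, i3, i3, i3, i3, i2, i1, i0)


print(se8_bits(0, 1, 0, 0))  # (0, 0, 0, 0, 0, 1, 0, 0)
print(se8_bits(1, 1, 0, 0))  # (1, 1, 1, 1, 1, 1, 0, 0)
```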


ZE8(x): ZERO EXTEND TO 8 bits

[Figure: wiring from input bits I4..I0 and a constant 0 to output bits O7..O0]

  • assuming 5-bit input

  • Recall: want

    • ZE8(00100) -> 00000100

    • ZE8(11100) -> 00011100

  • In bits i4,i3,i2,i1,i0; out bits o7…o0
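Zero extension can be sketched the same way: the five input bits pass through and the three upper output bits are tied to 0. The function name `ze8_bits` is hypothetical.

```python
def ze8_bits(i4, i3, i2, i1, i0):
    """Return (o7, o6, o5, o4, o3, o2, o1, o0) for a 5-bit input."""
    # o4..o0 pass through; o7..o5 are constant 0
    return (0, 0, 0, i4, i3, i2, i1, i0)


print(ze8_bits(0, 0, 1, 0, 0))  # (0, 0, 0, 0, 0, 1, 0, 0)
print(ze8_bits(1, 1, 1, 0, 0))  # (0, 0, 0, 1, 1, 1, 0, 0)
```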



ECE243

CPU: Single Cycle Implementation


SINGLE CYCLE DATAPATH

[Figure: timeline in which Inst1 and Inst2 each take 1 cycle]

  • each instruction executes entirely

    • in one cycle of the cpu clock

  • registers are triggered by the falling edge

    • new values begin propagating through datapath

    • some values may be temporarily incorrect

  • the clock period is large enough to ensure:

    • that all values are correct before the next falling edge


FETCH

  • needed by every instruction

    • i.e., every instruction must be fetched

[Figure: PC (with PCwrite) drives the 8-bit addr input of INST MEM, which outputs the 8-bit inst]


PC = PC + 1

[Figure: fetch datapath with PC, PCwrite, and INST MEM (addr, inst)]


BRANCHES: BZ IMM4

  • (if branch is taken, does: PC = PC + IMM4 + 1)

[Figure: instruction format over bits 7..0 with IMM4 and opcode fields; datapath with PC, PCwrite, INST MEM (addr, inst), and an adder with a constant 1 input]


ADD: add R1 R2

  • does r1 = r1 + r2

  • same datapath for sub and nand

[Figure: instruction format over bits i7..i0 with fields R1, R2, and opcode 0100; datapath with PC, PCwrite, a PCsel mux, INST MEM (addr, inst), SE8 on IMM4, and two adders]


SHIFT: SHIFT R1 IMM3

[Figure: instruction format over bits i7..i0 with fields IMM3, R1, and opcode 011; datapath as before plus REG FILE (R1, R2, Rw, in, Out1, Out2, REGwrite) and the ALU (ALUop, N, Z)]


ORI: ORI IMM5

  • does: k1 <- k1 bitwise-or IMM5

[Figure: instruction format over bits i7..i0 with fields IMM5 and opcode 111; datapath as before plus ZE8 on IMM3 and the ALU2 select signal]


Store: Store R1 (R2)

  • does: mem[r2] = r1

[Figure: instruction format over bits i7..i0 with fields R1, R2, and opcode; datapath as before plus the R1sel select signal and ZE8 on IMM5]


Load: Load R1 (R2)

  • does: r1 = mem[r2]

[Figure: instruction format over bits i7..i0 with fields R1, R2, and opcode; datapath as before plus Data MEM (addr, Din, MEMread, MEMwrite)]


Final Datapath!

[Figure: complete single-cycle datapath with PC, INST MEM, REG FILE, ALU, Data MEM, SE8 and ZE8 extenders, two adders, and the signals PCsel, PCwrite, R1sel, REGwrite, RFin, ALU2, ALUop, MEMread, MEMwrite, N, Z]


DESIGNING THE CONTROL UNIT

[Figure: CTRL block with inputs opcode, Z, and N, and outputs such as PCsel]

  • CONTROL SIGNALS TO GENERATE:

    • PCsel, PCwrite, REGwrite, MEMread, MEMwrite, R1sel, ALUop, ALU2, RFin


Control Signals

[Figures: ten slides, each repeating the final datapath for one instruction: load R1 (R2), store R1 (R2), add R1 R2, sub R1 R2, nand R1 R2, ori IMM5, shift R1 IMM3, bz IMM4, bnz IMM4, bpz IMM4]







ECE243

CPU: Multicycle Implementation





Key Difference #3: Temp Regs

what is the benefit of temp regs / multicycle?

without temp regs: critical path is long => large clock period

with temp regs: smaller critical paths => shorter clock period

let’s examine the temp regs one at a time


IR: Instruction Register

holds inst encoding


MDR: Memory Data Register

holds the value returned from Memory


R1 and R2

hold values from the register file


ALUout

holds the result calculated by the ALU



All Insts Cycle1: Fetch and Increment PC

IR ← mem[PC]; PC ← PC + 1;

increment PC

fetch next inst into the IR


All Insts Cycle2: Decoding Inst & Reading Reg File

R1 ← Kx; R2 ← Ky

Note: not all insts need R1 and R2


Add, Sub, Nand Cycle4: Write to Reg File

Kx ← ALUout


Shift Cycle3: Calculate

ALUout ← R1 op IMM3


Shift Cycle4: Write to Reg File

Kx ← ALUout


ORI Cycle3: Read K1 from Reg File

R1 ← k1


ORI Cycle4: Calculate

ALUout ← R1 op IMM5


ORI Cycle5: Write to Reg File

k1 ← ALUout


Load Cycle3: addr to Mem, value into MDR

MDR ← mem[R2]


Load Cycle4: write value into reg file

Kx ← MDR


Store Cycle3: addr to Mem, value to Mem

mem[R2] ← R1


Branches Cycle3

PC ← PC + SE8(IMM4) (only if the branch is taken)


Summary

Example: total time to execute one of each instruction:

Single cycle: 1*4 + 1*4+1*1 = 9 cycles; 9 cycles / 1MHz = 9us

Multicycle: 3*4 + 4*4 + 1*5 = 33 cycles; 33 cycles / 4MHz = 8.25us
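The arithmetic in this example can be checked with a few lines of Python. The grouping of instructions into (count, cycles each) pairs is read off the slide's own sums and is an assumption about how the instructions were counted; `total_time_us` is a hypothetical helper.

```python
def total_time_us(groups, clock_mhz):
    """groups: list of (instruction count, cycles per instruction)."""
    cycles = sum(count * cpi for count, cpi in groups)
    return cycles, cycles / clock_mhz  # cycles / MHz gives microseconds


single_cycle = [(4, 1), (4, 1), (1, 1)]  # every instruction takes 1 cycle
multicycle = [(4, 3), (4, 4), (1, 5)]    # 3-, 4-, and 5-cycle instructions

print(total_time_us(single_cycle, 1))  # (9, 9.0)
print(total_time_us(multicycle, 4))    # (33, 8.25)
```

The multicycle design needs more cycles, but the shorter clock period still gives a slightly lower total time in this example.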



Control: An FSM

  • need a state transition diagram

  • how many states are there?

  • how many bits to represent state?
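One way to approach these questions is to count one state per distinct cycle and take the ceiling of log2. The count below is only a sketch under several assumptions: the two common cycles are shared by all instructions, add/sub/nand share their states, the three branches share one state, and add/sub/nand have a calculate cycle 3 like shift does. The names are hypothetical.

```python
def state_bits(num_states):
    """Smallest number of bits that can encode num_states distinct states."""
    return max(1, (num_states - 1).bit_length())


# Cycles after the shared fetch and decode cycles (assumed grouping).
cycles_after_decode = {
    "add/sub/nand": 2,
    "shift": 2,
    "ori": 3,
    "load": 2,
    "store": 1,
    "branch": 1,
}

num_states = 2 + sum(cycles_after_decode.values())
print(num_states, state_bits(num_states))  # 13 4
```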



Multicycle Control Hardware

[Figure: Ctrl logic with inputs Z, N, IR:3..0 (from the IR), and Current_state; outputs Next_state and control signals such as Pcwrite, Pcsel, and ALUop; a 4-bit State Register holds the state]



ECE243

CPU: Adding a New Instruction


EXAMPLE QUESTION: ADDING A NEW INSTRUCTION

  • Implement a post-increment load:

  • load r1, (r2)+

    Does: RF[r1] = MEM[RF[r2]]

    RF[r2] = RF[r2] + 1

    r2 is permanently changed to be r2+1


Implementing: RF[r1] = MEM[RF[r2]]; RF[r2] = RF[r2] + 1

Recall: load r1, (r2)

IR= mem[PC] , PC = PC + 1

R1 = RF[r1], R2 = RF[r2]

MDR = mem[R2]

RF[r1] = MDR


Modifying the Datapath

RF[r2] = RF[r2] + 1
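One possible cycle-by-cycle plan for the post-increment load is sketched below in Python. It is not the course's answer: it assumes the ALU computes R2 + 1 during the memory-read cycle and that one extra cycle writes the result back, and it does not address the case r1 == r2. The function name is hypothetical.

```python
def load_post_increment(rf, mem, r1, r2):
    """Simulate load r1, (r2)+ after the fetch cycle, one step per cycle."""
    R1, R2 = rf[r1], rf[r2]    # cycle 2: read the register file
    mdr = mem[R2]              # cycle 3: MDR = mem[R2]
    aluout = (R2 + 1) & 0xFF   # assumed: ALUout = R2 + 1 in the same cycle
    rf[r1] = mdr               # cycle 4: RF[r1] = MDR
    rf[r2] = aluout            # assumed extra cycle 5: RF[r2] = ALUout


rf = [0, 0x10, 0, 0]           # k0..k3; k1 holds the address
mem = [0] * 256
mem[0x10] = 0x2A
load_post_increment(rf, mem, 0, 1)
print(rf[0], hex(rf[1]))       # 42 0x11
```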



ECE243

CPU: Pipelining


A Fast-Food Sandwich Shop

[Figure: a cook and five steps: take order, select bun, add ingredients, wrap and bag, cash and change]


With One Cook

[Figure: one cook takes customer1 through all five steps: take order, select bun, add ingredients, wrap and bag, cash and change]

  • one customer is serviced at a time


Like the single-cycle CPU

[Figure: the single-cycle datapath executing add k1, k2]

  • one instruction flows through at a time


With Two Cooks?

[Figure: two cooks and the same five steps: take order, select bun, add ingredients, wrap and bag, cash and change]


Pipelining

  • Like an assembly line

  • Doesn’t change the interface or result

    • improves performance


Pipelining a CPU (rough idea)

[Figure: the single-cycle datapath]


Pipelining Details:

[Figure: the single-cycle datapath]


With Three Cooks?

[Figure: three cooks and the same five steps: take order, select bun, add ingredients, wrap and bag, cash and change]


Pipelining a CPU (rough idea)

[Figure: the single-cycle datapath]


Visualizing Pipelining

[Figure, shown on two slides: pipeline stages Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)]


Fast Food Hazards

[Figure: three cooks and the five steps, with customer3, customer2, and customer1 in the line]

What if: c1 and c2 are friends, c2 has no money, and c2 needs to know how much change c1 will get before ordering (to ensure c2 can afford his order)?

[Figure: the same line with only customer2 and customer1]


CPU Hazards

[Figure: pipeline stages Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)]

  • called a data hazard

  • must be observed to ensure correct execution

  • there are two solutions to data hazards


Solution1: Stalling

[Figure: pipeline stages Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)]


How to insert bubbles

  • option1: hardware stalls the pipeline

    • need extra logic to do so

    • happens ‘automatically’ for any code

  • option2: compiler inserts “no-ops”

    • a no-op is an instruction that does nothing

    • ex: add r0,r0,r0 (NIOS)

    • the compiler must get it right, or the results will be wrong!

  • example: inserting a bubble with a no-op:

    add k1, k2

    noop

    add k3, k1
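The compiler option can be sketched as a small pass over a program. This is a simplified illustration: it assumes one bubble is enough, that hazards only occur between adjacent instructions (as in the example above), and that a two-operand instruction reads both of its operands. The tuple format and the name `insert_noops` are hypothetical.

```python
def insert_noops(program):
    """program: list of (op, dest, src) tuples, e.g. ("add", "k1", "k2")."""
    out = []
    prev_dest = None
    for op, dest, src in program:
        # "add k3, k1" reads both k3 and k1, so check both operands
        if prev_dest is not None and prev_dest in (dest, src):
            out.append(("noop", None, None))
        out.append((op, dest, src))
        prev_dest = dest
    return out


program = [("add", "k1", "k2"), ("add", "k3", "k1")]
for instr in insert_noops(program):
    print(instr)
# ('add', 'k1', 'k2')
# ('noop', None, None)
# ('add', 'k3', 'k1')
```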


Solution2: Forwarding Lines

[Figure: pipeline stages Fetch (inst mem), Decode (reg file), Execute (ALU and data mem)]

  • add “forwarding” logic

    • to pass values directly between stages


Control Hazards

  • cpu predicts each branch is not taken

  • Better: predict taken

    • why? loops are common, and their branches are usually taken

  • More advanced: remember what each branch did last time

  • “branch predictor”:

    • a table that remembers what each branch did the last time

    • uses this to make a prediction next time
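A last-outcome predictor of the kind described above can be sketched with a dictionary keyed by branch address. The class name, the default prediction for unseen branches, and the loop example are all assumptions made for illustration.

```python
class LastOutcomePredictor:
    """Remembers what each branch did the last time it executed."""

    def __init__(self, default_taken=False):
        self.table = {}  # branch PC -> last outcome (True = taken)
        self.default_taken = default_taken

    def predict(self, pc):
        return self.table.get(pc, self.default_taken)

    def update(self, pc, taken):
        self.table[pc] = taken


def count_mispredictions(predictor, pc, outcomes):
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


# A loop branch taken 9 times, then not taken once on exit.
outcomes = [True] * 9 + [False]
print(count_mispredictions(LastOutcomePredictor(), 0x90, outcomes))  # 2
```

In this run the predictor is wrong only on the first iteration and on the loop exit.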


Some Real CPU Pipelines

21264 Pipeline (Alpha) [Figure; source: Microprocessor Report 10/28/96]

Pentium IV’s Pipeline: TC nxt IP, TC fetch, Drv, Alloc, Rename, Que, Sch, Sch, Sch, Disp, Disp, RF, RF, Ex, Flgs, BrCk, Drv



ECE243

CPU: Alternate Architectures


ANOTHER MULTICYCLE CPU

[Figure: single-bus datapath. An internal bus connects PC, IR (Imm3,4,5), MAR, MDR, Y, the ALU (ALUop, Select, Z), and Regs k0..k3; MEM has addr, Din, Dout, MEMRead, and MEMWrite; CONTROL sends control signals to all components]


SOME CONTROL SIGNALS

  • PCout:

    • write PC value to bus

  • PCin:

    • read bus value into PC

  • MDRinBus:

    • read value from bus into MDR

  • MDRinMem:

    • write value from Dout of MEM into MDR

  • MDRoutBus:

    • write value from MDR onto bus


Ex: Ctrl: Add k1, k2 # k1 = k1 + k2

[Figure, shown on two slides: the single-bus datapath with internal bus, CONTROL, PC, IR, MAR, MDR, MEM, Y, ALU, and Regs k0..k3]


CHARACTERIZATION OF ISAs

  • attribute #1:

    • number of explicit operands

  • Attribute #2:

    • are registers general purpose?

  • Attribute #3:

    • Can an operand be a memory location?

  • Attribute #4:

    • RISC vs CISC

  • Attribute #5:

    • Relation between instructions and data


att1: num of explicit operands

  • focus on calculation instructions (add,sub…)

  • running example: A = B + C (C-code)

    • assume A, B, C are memory locations

  • 0 operands:

    • eg., stack based (like first calculator CPUs)

    • push and pop operations, refer to top of stack


  • 1 operand:

    • eg., accumulator based;

    • accumulator is a reg inside cpu

    • instructions use accum as destination.

  • 2 operands

    • eg: 68k, ia32

  • 3-operand

    • eg: MIPS, SPARC, PowerPC

  • How many operands is NIOS?
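For the 0-operand case, the running example A = B + C can be sketched with a tiny stack machine in Python. The instruction names (push, add, pop) and the tuple format are illustrative assumptions, not an actual ISA from the slides.

```python
def run_stack_program(program, memory):
    """Execute push/add/pop instructions; add names no operands at all."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(memory[instr[1]])   # push a memory value
        elif op == "pop":
            memory[instr[1]] = stack.pop()   # pop into a memory location
        elif op == "add":
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs)          # operands are implicit
        else:
            raise ValueError("unknown op: " + op)
    return memory


memory = {"A": 0, "B": 2, "C": 3}
program = [("push", "B"), ("push", "C"), ("add",), ("pop", "A")]
print(run_stack_program(program, memory)["A"])  # 5
```

Only push and pop name a memory location; the calculation instruction itself has zero explicit operands.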


Att2: are regs general purpose?

  • if yes:

    • you can use any register for any purpose

    • special registers are by convention only

  • if no:

    • some registers have hardwired purposes

    • ex: in 68k, A7 is hardwired to be stack pointer

    • used implicitly for jsr, rts, link instructions

  • Are NIOS registers general purpose?


Att3: operand = mem location?

  • with respect to calculation insts (add, sub)

  • if yes:

    • one operand can be in memory, the other in a register

    • maybe: can also write the result to memory

  • if no:

    • called a load/store architecture

    • only load/store insts can get/put memory values to/from regs

  • Can a NIOS operand be a mem location?


Att4: RISC vs CISC

  • Are there instructions with many steps?

    • a vague and debatable question

  • CISC: complex instruction set computer

    • Many, complex instructions

    • can be hard to pipeline!

    • ex: 68k, x86, PowerPC?

  • RISC: reduced instruction set computer

    • Fewer, simple instructions

    • easy to pipeline

    • ex: MIPS, Alpha, PowerPC?

  • Which is NIOS?

  • Quandary: x86 is a CISC

    • but the Pentium IV has a 20-stage pipeline!

    • How’d they do it?


Att5: Relation bet. insts & data

  • SISD: single instruction, single data

    • everything we have seen so far

    • an inst only writes one reg/memory location

  • SIMD: single instruction, multiple data

    • one instruction tells CPU to operate on an array of regs or memory locations

    • ex: multimedia extensions: MMX, SSE (Intel); 3DNow! (AMD); AltiVec (PowerPC)

    • ex: IBM/Sony/Toshiba Cell processor (vector processor)

  • MIMD: multiple instruction, multiple data

    • ex: Cluster of workstations, SMP servers, multicores, hyperthreading

  • Which is NIOS?

