slide1
Download
Skip this Video
Download Presentation
Instructor: Erol Sahin

Loading in 2 Seconds...

play fullscreen
1 / 87

Instructor: Erol Sahin - PowerPoint PPT Presentation


  • 79 Views
  • Uploaded on

Y86 Instruction Set Architecture – SEQ processor CENG331: Introduction to Computer Systems 8 th Lecture. Instructor: Erol Sahin. Acknowledgement: Most of the slides are adapted from the ones prepared by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ. Application Program. Compiler.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Instructor: Erol Sahin' - mira


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Y86 Instruction Set Architecture – SEQ processorCENG331: Introduction to Computer Systems8thLecture

Instructor:

ErolSahin

  • Acknowledgement: Most of the slides are adapted from the ones prepared
  • by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.
instruction set architecture

Application

Program

Compiler

OS

ISA

CPU

Design

Circuit

Design

Chip

Layout

Instruction Set Architecture
  • Assembly Language View
    • Processor state
      • Registers, memory, …
    • Instructions
      • addl, movl, leal, …
      • How instructions are encoded as bytes
  • Layer of Abstraction
    • Above: how to program machine
      • Processor executes instructions in a sequence
    • Below: what needs to be built
      • Use variety of tricks to make it run fast
      • E.g., execute multiple instructions simultaneously
y86 processor state

OF

ZF

SF

Y86 Processor State

Program registers

Condition codes

Memory

  • Program Registers
    • Same 8 as with IA32. Each 32 bits
  • Condition Codes
    • Single-bit flags set by arithmetic or logical instructions
      • OF: Overflow ZF: Zero SF:Negative
  • Program Counter
    • Indicates address of instruction
  • Memory
    • Byte-addressable storage array
    • Words stored in little-endian byte order

%eax

%esi

%ecx

%edi

PC

%edx

%esp

%ebx

%ebp

y86 instructions
Y86 Instructions
  • Format
    • 1--6 bytes of information read from memory
      • Can determine instruction length from first byte
      • Not as many instruction types, and simpler encoding than with IA32
    • Each accesses and modifies some part(s) of the program state
encoding registers

%eax

0

%esi

6

%ecx

1

%edi

7

%edx

2

%esp

4

%ebx

3

%ebp

5

Encoding Registers
  • Each register has 4-bit ID
    • Same encoding as in IA32
  • Register ID 8 indicates “no register”
    • Will use this in our hardware design in multiple places
instruction example

Generic Form

Encoded Representation

%eax

0

%esi

6

%ecx

1

%edi

7

addl rA, rB

%edx

2

%esp

4

%ebx

3

%ebp

5

6

0

rA

rB

Instruction Example

Addition Instruction

  • Add value in register rA to that in register rB
    • Store result in register rB
    • Note that Y86 only allows addition to be applied to register data
  • Set condition codes based on result
  • e.g., addl %eax,%esiEncoding:60 06
  • Two-byte encoding
    • First indicates instruction type
    • Second gives source and destination registers
arithmetic and logical operations

Instruction Code

Function Code

addl rA, rB

xorl rA, rB

andl rA, rB

subl rA, rB

6

6

6

6

2

3

1

0

rA

rA

rA

rA

rB

rB

rB

rB

Arithmetic and Logical Operations
  • Refer to generically as “OPl”
  • Encodings differ only by “function code”
    • Low-order 4 bytes in first instruction word
  • Set condition codes as side effect

Add

Subtract (rA from rB)

And

Exclusive-Or

move operations

rA

8

rA

rB

rB

rB

rrmovl rA, rB

4

3

2

5

0

0

0

0

rA

rB

irmovl V, rB

mrmovl D(rB), rA

rmmovl rA, D(rB)

V

D

D

Move Operations
  • Like the IA32 movlinstruction
  • Simpler format for memory addresses
  • Give different names to keep them distinct

Register --> Register

Immediate --> Register

Register --> Memory

Memory --> Register

move instruction examples

%eax

0

%esi

6

%ecx

1

%edi

7

%edx

2

%esp

4

%ebx

3

%ebp

5

Move Instruction Examples

IA32

Y86

Encoding

movl $0xabcd, %edx

irmovl $0xabcd, %edx

30 82 cd ab 00 00

movl %esp, %ebx

rrmovl %esp, %ebx

20 43

movl -12(%ebp),%ecx

mrmovl -12(%ebp),%ecx

50 15 f4 ff ff ff

movl %esi,0x41c(%esp)

rmmovl %esi,0x41c(%esp)

40 64 1c 04 00 00

movl $0xabcd, (%eax)

movl %eax, 12(%eax,%edx)

movl (%ebp,%eax,4),%ecx

jump instructions

Jump When Equal

Jump When Not Equal

Jump Unconditionally

Jump When Greater

Jump When Greater or Equal

Jump When Less or Equal

Jump When Less

jge Dest

jl Dest

je Dest

jmp Dest

jg Dest

jne Dest

jle Dest

Dest

Dest

Dest

Dest

Dest

Dest

Dest

7

7

7

7

7

7

7

0

4

3

2

1

5

6

Jump Instructions
  • Refer to generically as “jXX”
  • Encodings differ only by “function code”
  • Based on values of condition codes
  • Same as IA32 counterparts
  • Encode full destination address
    • Unlike PC-relative addressing seen in IA32
y86 program stack
Y86 Program Stack

Stack “Bottom”

  • Region of memory holding program data
  • Used in Y86 (and IA32) for supporting procedure calls
  • Stack top indicated by %esp
    • Address of top stack element
  • Stack grows toward lower addresses
    • Top element is at highest address in the stack
    • When pushing, must first decrement stack pointer
    • When popping, increment stack pointer

Increasing

Addresses

%esp

Stack “Top”

stack operations

pushl rA

popl rA

a

rA

b

rA

0

8

0

8

Stack Operations
  • Decrement %espby 4
  • Store word from rA to memory at %esp
  • Like IA32
  • Read word from memory at %esp
  • Save in rA
  • Increment %espby 4
  • Like IA32
subroutine call and return

call Dest

Dest

ret

8

9

0

0

Subroutine Call and Return
  • Push address of next instruction onto stack
  • Start executing instructions at Dest
  • Like IA32
  • Pop value from stack
  • Use as address for next instruction
  • Like IA32
miscellaneous instructions

nop

halt

0

1

0

0

Miscellaneous Instructions
  • Don’t do anything
  • Stop executing instructions
  • IA32 has comparable instruction, but can’t execute it in user mode
  • We will use it to stop the simulator
y86 code generation example
Y86 Code Generation Example

First Try

  • Write typical array code
  • Compile with gcc -O2 -S
  • Problem
    • Hard to do array indexing on Y86
      • Since don’t have scaled addressing modes

/* Find number of elements in

null-terminated list */

int len1(int a[])

{

int len;

for (len = 0; a[len]; len++)

;

return len;

}

L18:

incl %eax

cmpl $0,(%edx,%eax,4)

jne L18

y86 code generation example 2
Y86 Code Generation Example #2

Second Try

  • Write with pointer code
  • Compile with gcc -O2 -S
  • Result
    • Don’t need to do indexed addressing

/* Find number of elements in

null-terminated list */

int len2(int a[])

{

int len = 0;

while (*a++)

len++;

return len;

}

L24:

movl (%edx),%eax

incl %ecx

L26:

addl $4,%edx

testl %eax,%eax

jne L24

y86 code generation example 3
Y86 Code Generation Example #3

IA32 Code

  • Setup

Y86 Code

  • Setup

len2:

pushl %ebp # Save %ebp

xorl %ecx,%ecx # len = 0

rrmovl %esp,%ebp # Set frame

mrmovl 8(%ebp),%edx # Get a

mrmovl (%edx),%eax # Get *a

jmp L26 # Goto entry

len2:

pushl %ebp

xorl %ecx,%ecx

movl %esp,%ebp

movl 8(%ebp),%edx

movl (%edx),%eax

jmp L26

y86 code generation example 4
Y86 Code Generation Example #4

IA32 Code

  • Loop + Finish

Y86 Code

  • Loop + Finish

L24:

movl (%edx),%eax

incl %ecx

L26:

addl $4,%edx

testl %eax,%eax

jne L24

movl %ebp,%esp

movl %ecx,%eax

popl %ebp

ret

L24:

mrmovl (%edx),%eax # Get *a

irmovl $1,%esi

addl %esi,%ecx # len++

L26: # Entry:

irmovl $4,%esi

addl %esi,%edx # a++

andl %eax,%eax # *a == 0?

jne L24 # No--Loop

rrmovl %ebp,%esp # Pop

rrmovl %ecx,%eax # Rtn len

popl %ebp

ret

y86 program structure
Y86 Program Structure

irmovl Stack,%esp # Set up stack

rrmovl %esp,%ebp # Set up frame

irmovl List,%edx

pushl %edx # Push argument

call len2 # Call Function

halt # Halt

.align 4

List: # List of elements

.long 5043

.long 6125

.long 7395

.long 0

# Function

len2:

. . .

# Allocate space for stack

.pos 0x100

Stack:

  • Program starts at address 0
  • Must set up stack
    • Make sure don’t overwrite code!
  • Must initialize data
  • Can use symbolic names
assembling y86 program
Assembling Y86 Program
  • Generates “object code” file eg.yo
    • Actually looks like disassembler output
  • unix> yas eg.ys
  • 0x000: 308400010000 | irmovl Stack,%esp # Set up stack
  • 0x006: 2045 | rrmovl %esp,%ebp # Set up frame
  • 0x008: 308218000000 | irmovl List,%edx
  • 0x00e: a028 | pushl %edx # Push argument
  • 0x010: 8028000000 | call len2 # Call Function
  • 0x015: 10 | halt # Halt
  • 0x018: | .align 4
  • 0x018: | List: # List of elements
  • 0x018: b3130000 | .long 5043
  • 0x01c: ed170000 | .long 6125
  • 0x020: e31c0000 | .long 7395
  • 0x024: 00000000 | .long 0
simulating y86 program
Simulating Y86 Program
  • Instruction set simulator
    • Computes effect of each instruction on processor state
    • Prints changes in state from original
  • unix> yis eg.yo
  • Stopped in 41 steps at PC = 0x16. Exception \'HLT\', CC Z=1 S=0 O=0
  • Changes to registers:
  • %eax: 0x00000000 0x00000003
  • %ecx: 0x00000000 0x00000003
  • %edx: 0x00000000 0x00000028
  • %esp: 0x00000000 0x000000fc
  • %ebp: 0x00000000 0x00000100
  • %esi: 0x00000000 0x00000004
  • Changes to memory:
  • 0x00f4: 0x00000000 0x00000100
  • 0x00f8: 0x00000000 0x00000015
  • 0x00fc: 0x00000000 0x00000018
cisc versus risc ceng331 introduction to computer systems 8 th lecture

CISC versus RISC CENG331: Introduction to Computer Systems8thLecture

Instructor:

ErolSahin

  • Acknowledgement: Most of the slides are adapted from the ones prepared
  • by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.
cisc instruction sets
CISC Instruction Sets
    • Complex Instruction Set Computer
    • Dominant style through mid-80’s
  • Stack-oriented instruction set
    • Use stack to pass arguments, save program counter (IA32 but not int x86-64)
    • Explicit push and pop instructions
  • Arithmetic instructions can access memory
    • addl %eax, 12(%ebx,%ecx,4)
      • requires memory read and write
      • Complex address calculation
  • Condition codes
    • Set as side effect of arithmetic and logical instructions
  • Philosophy
    • Add instructions to perform “typical” programming tasks
risc instruction sets
RISC Instruction Sets
  • Reduced Instruction Set Computer
  • Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)

Fewer, simpler instructions

  • Might take more to get given task done
  • Can execute them with small and fast hardware

Register-oriented instruction set

  • Many more (typically 32) registers
  • Use for arguments, return pointer, temporaries

Only load and store instructions can access memory

  • Similar to Y86 mrmovland rmmovl

No Condition codes

  • Test instructions return 0/1 in register
mips instruction examples

Op

Ra

Rb

Rd

00000

Fn

R-R

Op

Op

Op

Ra

Ra

Ra

Rb

Rb

Rb

Offset

Immediate

Offset

R-I

MIPS Instruction Examples

addu $3,$2,$1 # Register add: $3 = $2+$1

addu $3,$2, 3145 # Immediate add: $3 = $2+3145

sll $3,$2,2 # Shift left: $3 = $2 << 2

Branch

beq $3,$2,dest # Branch when $3 = $2

Load/Store

lw $3,16($2) # Load Word: $3 = M[$2+16]

sw $3,16($2) # Store Word: M[$2+16] = $3

cisc vs risc
CISC vs. RISC
  • Original Debate
    • Strong opinions!
    • CISC proponents---easy for compiler, fewer code bytes
    • RISC proponents---better for optimizing compilers, can make run fast with simple chip design
  • Current Status
    • For desktop processors, choice of ISA not a technical issue
      • With enough hardware, can make anything run fast
      • Code compatibility more important
    • For embedded processors, RISC makes sense
      • Smaller, cheaper, less power
summary
Summary
  • Y86 Instruction Set Architecture
    • Similar state and instructions as IA32
    • Simpler encodings
    • Somewhere between CISC and RISC
  • How Important is ISA Design?
    • Less now than before
      • With enough hardware, can make almost anything go fast
    • AMD/Intel moved away from IA32
      • Does not allow enough parallel execution
      • x86-64
        • 64-bit word sizes (overcome address space limitations)
        • Radically different style of instruction set with explicit parallelism
        • Requires sophisticated compilers
logic design and hcl ceng331 introduction to computer systems 8 th lecture

Logic Design and HCLCENG331: Introduction to Computer Systems8thLecture

Instructor:

ErolSahin

  • Acknowledgement: Most of the slides are adapted from the ones prepared
  • by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.
computing with logic gates

a && b

Computing with Logic Gates
  • Outputs are Boolean functions of inputs
  • Respond continuously to changes in inputs
    • With some, small delay

Falling Delay

Rising Delay

b

Voltage

a

Time

combinational circuits

Acyclic Network

Primary

Inputs

Primary

Outputs

Combinational Circuits

Acyclic Network of Logic Gates

  • Continously responds to changes on primary inputs
  • Primary outputs become (after some delay) Boolean functions of primary inputs
bit equality

Bit equal

a

eq

b

Bit Equality
    • Generate 1 if a and b are equal
  • Hardware Control Language (HCL)
    • Very simple hardware description language
      • Boolean operations have syntax similar to C logical operations
    • We’ll use it to describe control logic for processors

HCL Expression

bool eq = (a&&b)||(!a&&!b)

word equality

b31

Bit equal

eq31

a31

b30

Bit equal

eq30

a30

Eq

B

=

Eq

b1

Bit equal

eq1

A

a1

b0

Bit equal

eq0

a0

Word Equality

Word-Level Representation

  • 32-bit word size
  • HCL representation
    • Equality operation
    • Generates Boolean value

HCL Representation

bool Eq = (A == B)

bit level multiplexor
Bit-Level Multiplexor
  • Control signal s
  • Data signals a and b
  • Output a when s=1, b when s=0

s

Bit MUX

HCL Expression

bool out = (s&&a)||(!s&&b)

b

out

a

word multiplexor

s

b31

out31

a31

s

b30

out30

MUX

B

Out

a30

A

b0

out0

a0

Word Multiplexor

Word-Level Representation

  • Select input word A or B depending on control signal s
  • HCL representation
    • Case expression
    • Series of test : value pairs
    • Output value for first successful test

HCL Representation

int Out = [

s : A;

1 : B;

];

hcl word level examples

MIN3

C

Min3

B

A

s1

s0

MUX4

D0

D1

Out4

D2

D3

HCL Word-Level Examples

Minimum of 3 Words

  • Find minimum of three input words
  • HCL case expression
  • Final case guarantees match

int Min3 = [

A < B && A < C : A;

B < A && B < C : B;

1 : C;

];

4-Way Multiplexor

  • Select one of 4 inputs based on two control bits
  • HCL case expression
  • Simplify tests by assuming sequential matching

int Out4 = [

!s1&&!s0: D0;

!s1 : D1;

!s0 : D2;

1 : D3;

];

arithmetic logic unit

2

1

0

3

Y

Y

Y

Y

A

A

A

A

X ^ Y

X & Y

X + Y

X - Y

X

X

X

X

B

B

B

B

OF

OF

OF

OF

ZF

ZF

ZF

ZF

CF

CF

CF

CF

A

L

U

A

L

U

A

L

U

A

L

U

Arithmetic Logic Unit
  • Combinational logic
    • Continuously responding to inputs
  • Control signal selects function computed
    • Corresponding to 4 arithmetic/logical operations in Y86
  • Also computes values for condition codes
registers

i7

D

o7

Q+

C

i6

D

o6

Q+

C

i5

D

o5

Q+

C

i4

D

o4

Q+

I

O

C

i3

D

o3

Q+

C

i2

D

o2

Q+

C

i1

Clock

D

o1

Q+

C

i0

D

o0

Q+

C

Clock

Registers

Structure

  • Stores word of data
    • Different from program registers seen in assembly code
  • Collection of edge-triggered latches
  • Loads input on rising edge of clock
register operation

Rising

clock

State = y

y

Output = y

Register Operation
  • Stores data bits
  • For most of time acts as barrier between input and output
  • As clock rises, loads input

State = x

x

Input = y

Output = x

state machine example

Comb. Logic

0

MUX

0

Out

In

1

Load

Clock

Clock

Load

A

L

U

x0

x1

x2

x3

x4

x5

In

x0

x0+x1

x0+x1+x2

x3

x3+x4

x3+x4+x5

Out

State Machine Example
  • Accumulator circuit
  • Load or accumulate on each cycle
random access memory

valA

Register

file

srcA

A

valW

Read ports

W

dstW

Write port

valB

srcB

B

Clock

Random-Access Memory
  • Stores multiple words of memory
    • Address input specifies which word to read or write
  • Register file
    • Holds values of program registers
    • %eax, %esp, etc.
    • Register identifier serves as address
      • ID 8 implies no read or write performed
  • Multiple Ports
    • Can read and/or write multiple words in one cycle
      • Each has separate address and data input/output
register file timing

Register

file

Register

file

y

2

valW

valW

W

W

dstW

dstW

x

valA

Register

file

srcA

A

2

Clock

Clock

valB

srcB

B

x

2

Rising

clock

y

2

Register File Timing

Reading

  • Like combinational logic
  • Output data generated based on input address
    • After some delay

Writing

  • Like register
  • Update only as clock rises

x

2

hardware control language
Hardware Control Language
    • Very simple hardware description language
    • Can only express limited aspects of hardware operation
      • Parts we want to explore and modify
  • Data Types
    • bool: Boolean
      • a, b, c, …
    • int: words
      • A, B, C, …
      • Does not specify word size---bytes, 32-bit words, …
  • Statements
    • bool a = bool-expr;
    • int A = int-expr;
hcl operations
HCL Operations
    • Classify by type of value returned
  • Boolean Expressions
    • Logic Operations
      • a && b, a || b, !a
    • Word Comparisons
      • A == B, A != B, A < B, A <= B, A >= B, A > B
    • Set Membership
      • A in { B, C, D }
        • Same as A == B || A == C || A == D
  • Word Expressions
    • Case expressions
      • [ a : A; b : B; c : C ]
      • Evaluate test expressions a, b, c, … in sequence
      • Return word expression A, B, C, … for first successful test
summary1
Summary

Computation

  • Performed by combinational logic
  • Computes Boolean functions
  • Continuously reacts to input changes

Storage

  • Registers
    • Hold single words
    • Loaded as clock rises
  • Random-access memories
    • Hold multiple words
    • Possible multiple read or write ports
    • Read word when address input changes
    • Write word as clock rises
sequential processor implementation ceng331 introduction to computer systems 8 th lecture

SEQuential processor implementationCENG331: Introduction to Computer Systems8thLecture

Instructor:

ErolSahin

  • Acknowledgement: Most of the slides are adapted from the ones prepared
  • by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.
y86 instruction set

Byte

0

1

2

3

4

5

nop

0

0

addl

6

0

halt

1

0

subl

6

1

rrmovl rA, rB

2

0

rA

rB

andl

6

2

irmovl V, rB

3

0

8

rB

V

xorl

6

3

rmmovl rA, D(rB)

4

0

rA

rB

D

jmp

7

0

mrmovl D(rB), rA

5

0

rA

rB

D

jle

7

1

OPl rA, rB

6

fn

rA

rB

jl

7

2

jXX Dest

7

fn

Dest

je

7

3

call Dest

8

0

Dest

jne

7

4

ret

9

0

jge

7

5

pushl rA

A

0

rA

8

jg

7

6

popl rA

B

0

rA

8

Y86 Instruction Set
seq hardware structure

newPC

SEQ Hardware Structure

PC

valE

,

valM

Write back

valM

State

  • Program counter register (PC)
  • Condition code register (CC)
  • Register File
  • Memories
    • Access same memory space
    • Data: for reading/writing program data
    • Instruction: for reading instructions

Instruction Flow

  • Read instruction at address specified by PC
  • Process through stages
  • Update program counter

Data

Data

Memory

memory

memory

Addr

, Data

valE

CC

CC

ALU

ALU

Execute

Bch

aluA

,

aluB

valA

,

valB

srcA

,

srcB

Decode

A

A

B

B

dstA

,

dstB

M

M

Register

Register

Register

Register

file

file

file

file

E

E

icode

,

ifun

valP

rA

,

rB

valC

Instruction

PC

Instruction

PC

memory

increment

Fetch

memory

increment

PC

seq stages

newPC

SEQ Stages

PC

valE

,

valM

Write back

valM

Fetch

  • Read instruction from instruction memory

Decode

  • Read program registers

Execute

  • Compute value or address

Memory

  • Read or write data

Write Back

  • Write program registers

PC

  • Update program counter

Data

Data

Memory

memory

memory

Addr

, Data

valE

CC

CC

ALU

ALU

Execute

Bch

aluA

,

aluB

valA

,

valB

srcA

,

srcB

Decode

A

A

B

B

dstA

,

dstB

M

M

Register

Register

Register

Register

file

file

file

file

E

E

icode

,

ifun

valP

rA

,

rB

valC

Instruction

PC

Instruction

PC

memory

increment

Fetch

memory

increment

PC

instruction decoding

Optional

Optional

D

icode

5

0

rA

rB

ifun

rA

rB

valC

Instruction Decoding

Instruction Format

  • Instruction byte icode:ifun
  • Optional register byte rA:rB
  • Optional constant word valC
executing arith logical operation

OPl rA, rB

6

fn

rA

rB

Executing Arith./Logical Operation

Fetch

  • Read 2 bytes

Decode

  • Read operand registers

Execute

  • Perform operation
  • Set condition codes

Memory

  • Do nothing

Write back

  • Update register

PC Update

  • Increment PC by 2
stage computation arith log ops

Fetch

icode:ifun  M1[PC]

Read instruction byte

rA:rB  M1[PC+1]

Read register byte

valP  PC+2

Compute next PC

Decode

valA  R[rA]

Read operand A

valB  R[rB]

Read operand B

Execute

valE  valB OP valA

Perform ALU operation

Set CC

Set condition code register

Memory

Write

back

R[rB]  valE

Write back result

PC update

PC  valP

Update PC

Stage Computation: Arith/Log. Ops

OPl rA, rB

  • Formulate instruction execution as sequence of simple steps
  • Use same general form for all instructions
executing rmmovl

rA

rB

4

0

rmmovl rA, D(rB)

D

Executing rmmovl

Fetch

  • Read 6 bytes

Decode

  • Read operand registers

Execute

  • Compute effective address

Memory

  • Write to memory

Write back

  • Do nothing

PC Update

  • Increment PC by 6
stage computation rmmovl

Fetch

icode:ifun  M1[PC]

Read instruction byte

rA:rB  M1[PC+1]

Read register byte

valC  M4[PC+2]

Read displacement D

valP  PC+6

Compute next PC

Decode

valA  R[rA]

Read operand A

valB  R[rB]

Read operand B

Execute

valE  valB + valC

Compute effective address

Memory

M4[valE]  valA

Write value to memory

Write

back

PC update

PC  valP

Update PC

Stage Computation: rmmovl

rmmovl rA, D(rB)

  • Use ALU for address computation
executing popl

popl rA

b

rA

0

8

Executing popl

Fetch

  • Read 2 bytes

Decode

  • Read stack pointer

Execute

  • Increment stack pointer by 4

Memory

  • Read from old stack pointer

Write back

  • Update stack pointer
  • Write result to register

PC Update

  • Increment PC by 2
stage computation popl

Fetch

icode:ifun  M1[PC]

Read instruction byte

rA:rB  M1[PC+1]

Read register byte

valP  PC+2

Compute next PC

Decode

valA  R[%esp]

Read stack pointer

valB  R [%esp]

Read stack pointer

Execute

valE  valB + 4

Increment stack pointer

Memory

valM  M4[valA]

Read from stack

Write

back

R[%esp]  valE

Update stack pointer

R[rA]  valM

Write back result

PC update

PC  valP

Update PC

Stage Computation: popl

popl rA

  • Use ALU to increment stack pointer
  • Must update two registers
    • Popped value
    • New stack pointer
executing jumps

jXX Dest

Dest

Not taken

fall thru:

target:

Taken

7

XX

XX

fn

XX

XX

Executing Jumps

Fetch

  • Read 5 bytes
  • Increment PC by 5

Decode

  • Do nothing

Execute

  • Determine whether to take branch based on jump condition and condition codes

Memory

  • Do nothing

Write back

  • Do nothing

PC Update

  • Set PC to Dest if branch taken or to incremented PC if not branch
stage computation jumps

Fetch

icode:ifun  M1[PC]

Read instruction byte

valC  M4[PC+1]

Read destination address

valP  PC+5

Fall through address

Decode

Execute

Bch  Cond(CC,ifun)

Take branch?

Memory

Write

back

PC update

PC  Bch ? valC : valP

Update PC

Stage Computation: Jumps

jXX Dest

  • Compute both addresses
  • Choose based on setting of condition codes and branch condition
executing call

call Dest

Dest

return:

target:

8

XX

XX

0

XX

XX

Executing call

Fetch

  • Read 5 bytes
  • Increment PC by 5

Decode

  • Read stack pointer

Execute

  • Decrement stack pointer by 4

Memory

  • Write incremented PC to new value of stack pointer

Write back

  • Update stack pointer

PC Update

  • Set PC to Dest
stage computation call

Fetch

icode:ifun  M1[PC]

Read instruction byte

valC  M4[PC+1]

Read destination address

valP  PC+5

Compute return point

Decode

valB  R[%esp]

Read stack pointer

Execute

valE  valB + –4

Decrement stack pointer

Memory

M4[valE]  valP

Write return value on stack

Write

back

R[%esp]  valE

Update stack pointer

PC update

PC  valC

Set PC to destination

Stage Computation: call

call Dest

  • Use ALU to decrement stack pointer
  • Store incremented PC
executing ret

9

XX

0

XX

Executing ret

Fetch

  • Read 1 byte

Decode

  • Read stack pointer

Execute

  • Increment stack pointer by 4

Memory

  • Read return address from old stack pointer

Write back

  • Update stack pointer

PC Update

  • Set PC to return address

ret

return:

stage computation ret

Fetch

icode:ifun  M1[PC]

Read instruction byte

Decode

valA  R[%esp]

Read operand stack pointer

valB  R[%esp]

Read operand stack pointer

Execute

valE  valB + 4

Increment stack pointer

Memory

valM  M4[valA]

Read return address

Write

back

R[%esp]  valE

Update stack pointer

PC update

PC  valM

Set PC to return address

Stage Computation: ret

ret

  • Use ALU to increment stack pointer
  • Read return address from memory
computation steps
Computation Steps

OPl rA, rB

  • All instructions follow same general pattern
  • Differ in what gets computed on each step

Fetch

icode,ifun

icode:ifun  M1[PC]

Read instruction byte

rA,rB

rA:rB  M1[PC+1]

Read register byte

valC

[Read constant word]

valP

valP  PC+2

Compute next PC

Decode

valA, srcA

valA  R[rA]

Read operand A

valB, srcB

valB  R[rB]

Read operand B

Execute

valE

valE  valB OP valA

Perform ALU operation

Cond code

Set CC

Set condition code register

Memory

valM

[Memory read/write]

Write

back

dstE

R[rB]  valE

Write back ALU result

dstM

[Write back memory result]

PC update

PC

PC  valP

Update PC

irmovl computation steps
Irmovl: Computation Steps

irmovl V, rB

  • All instructions follow same general pattern
  • Differ in what gets computed on each step

Fetch

icode,ifun

icode:ifun  M1[PC]

Read instruction byte

rA,rB

rA:rB  M1[PC+1]

Read register byte

valC

valC M4[PC+2]

[Read constant word]

valP

valP  PC+6

Compute next PC

Decode

valA, srcA

valB, srcB

Execute

valE

valE  0 + valC

Perform ALU operation

Cond code

Memory

valM

Write

back

dstE

R[rB]  valE

Write back ALU result

dstM

PC update

PC

PC  valP

Update PC

computation steps1
Computation Steps

call Dest

  • All instructions follow same general pattern
  • Differ in what gets computed on each step

Fetch

icode,ifun

icode:ifun  M1[PC]

Read instruction byte

rA,rB

[Read register byte]

valC

valC  M4[PC+1]

Read constant word

valP

valP  PC+5

Compute next PC

Decode

valA, srcA

[Read operand A]

valB, srcB

valB  R[%esp]

Read operand B

Execute

valE

valE  valB + –4

Perform ALU operation

Cond code

[Set condition code reg.]

Memory

valM

M4[valE]  valP

[Memory read/write]

Write

back

dstE

R[%esp]  valE

[Write back ALU result]

dstM

Write back memory result

PC update

PC

PC  valC

Update PC

computed values
Computed Values
  • Fetch

icode Instruction code

ifun Instruction function

rA Instr. Register A

rB Instr. Register B

valC Instruction constant

valP Incremented PC

  • Decode

srcA Register ID A

srcB Register ID B

dstE Destination Register E

dstM Destination Register M

valA Register value A

valB Register value B

Execute

  • valE ALU result
  • Bch Branch flag

Memory

  • valM Value from memory
seq hardware
SEQ Hardware
  • Key
    • Blue boxes: predesigned hardware blocks
      • E.g., memories, ALU
    • Gray boxes: control logic
      • Describe in HCL
    • White ovals: labels for signals
    • Thick lines: 32-bit word values
    • Thin lines: 4-8 bit values
    • Dotted lines: 1-bit values
fetch logic
Fetch Logic

Predefined Blocks

  • PC: Register containing PC
  • Instruction memory: Read 6 bytes (PC to PC+5)
  • Split: Divide instruction byte into icode and ifun
  • Align: Get fields for rA, rB, and valC
fetch logic1
Fetch Logic

Control Logic

  • Instr. Valid: Is this instruction valid?
  • Need regids: Does this instruction have a register bytes?
  • Need valC: Does this instruction have a constant word?
fetch control logic
Fetch Control Logic

bool need_regids =

icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,

IIRMOVL, IRMMOVL, IMRMOVL };

bool instr_valid = icode in

{ INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,

IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };

decode logic
Decode Logic

Register File

  • Read ports A, B
  • Write ports E, M
  • Addresses are register IDs or 8 (no access)

Control Logic

  • srcA, srcB: read port addresses
  • dstA, dstB: write port addresses
a source

OPl rA, rB

Decode

valA  R[rA]

Read operand A

rmmovl rA, D(rB)

Decode

valA  R[rA]

Read operand A

popl rA

Decode

valA  R[%esp]

Read stack pointer

jXX Dest

Decode

No operand

call Dest

Decode

No operand

ret

Decode

valA  R[%esp]

Read stack pointer

A Source

int srcA = [

icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : rA;

icode in { IPOPL, IRET } : RESP;

1 : RNONE; # Don\'t need register

];

e destination

OPl rA, rB

Write-back

R[rB]  valE

Write back result

rmmovl rA, D(rB)

Write-back

None

popl rA

Write-back

R[%esp]  valE

Update stack pointer

jXX Dest

Write-back

None

call Dest

Write-back

R[%esp]  valE

Update stack pointer

ret

Write-back

R[%esp]  valE

Update stack pointer

E Destination

int dstE = [

icode in { IRRMOVL, IIRMOVL, IOPL} : rB;

icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;

1 : RNONE; # Don\'t need register

];

execute logic
Execute Logic

Units

  • ALU
    • Implements 4 required functions
    • Generates condition code values
  • CC
    • Register with 3 condition code bits
  • bcond
    • Computes branch flag

Control Logic

  • Set CC: Should condition code register be loaded?
  • ALU A: Input A to ALU
  • ALU B: Input B to ALU
  • ALU fun: What function should ALU compute?
alu a input

OPl rA, rB

Execute

valE  valB OP valA

Perform ALU operation

rmmovl rA, D(rB)

Execute

valE  valB + valC

Compute effective address

popl rA

Execute

valE  valB + 4

Increment stack pointer

jXX Dest

Execute

No operation

call Dest

Execute

valE  valB + –4

Decrement stack pointer

ret

Execute

valE  valB + 4

Increment stack pointer

ALU A Input

int aluA = [

icode in { IRRMOVL, IOPL } : valA;

icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;

icode in { ICALL, IPUSHL } : -4;

icode in { IRET, IPOPL } : 4;

# Other instructions don\'t need ALU

];

alu operation

OPl rA, rB

Execute

valE  valB OP valA

Perform ALU operation

rmmovl rA, D(rB)

Execute

valE  valB + valC

Compute effective address

popl rA

Execute

valE  valB + 4

Increment stack pointer

jXX Dest

Execute

No operation

call Dest

Execute

valE  valB + –4

Decrement stack pointer

ret

Execute

valE  valB + 4

Increment stack pointer

ALU Operation

int alufun = [

icode == IOPL : ifun;

1 : ALUADD;

];

memory logic
Memory Logic

Memory

  • Reads or writes memory word

Control Logic

  • Mem. read: should word be read?
  • Mem. write: should word be written?
  • Mem. addr.: Select address
  • Mem. data.: Select data
memory address

OPl rA, rB

Memory

No operation

rmmovl rA, D(rB)

popl rA

jXX Dest

Memory

Memory

Memory

Memory

M4[valE]  valP

valM  M4[valA]

M4[valE]  valA

valM  M4[valA]

Write return value on stack

Write value to memory

Read return address

Read from stack

Memory

No operation

call Dest

ret

Memory Address

int mem_addr = [

icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE;

icode in { IPOPL, IRET } : valA;

# Other instructions don\'t need address

];

memory read

OPl rA, rB

Memory

No operation

rmmovl rA, D(rB)

popl rA

jXX Dest

Memory

Memory

Memory

Memory

M4[valE]  valP

valM  M4[valA]

M4[valE]  valA

valM  M4[valA]

Write return value on stack

Write value to memory

Read return address

Read from stack

Memory

No operation

call Dest

ret

Memory Read

bool mem_read = icode in { IMRMOVL, IPOPL, IRET };

pc update logic
PC Update Logic

New PC

  • Select next value of PC
pc update

OPl rA, rB

rmmovl rA, D(rB)

popl rA

jXX Dest

call Dest

PC update

PC update

PC update

PC update

PC update

PC update

PC  valC

PC  valP

PC  Bch ? valC : valP

PC  valM

PC  valP

PC  valP

Set PC to return address

Update PC

Update PC

Set PC to destination

Update PC

Update PC

ret

PCUpdate

int new_pc = [

icode == ICALL : valC;

icode == IJXX && Bch : valC;

icode == IRET : valM;

1 : valP;

];

seq operation
SEQ Operation

State

  • PC register
  • Cond. Code register
  • Data memory
  • Register file

All updated as clock rises

Combinational Logic

  • ALU
  • Control logic
  • Memory reads
    • Instruction memory
    • Register file
    • Data memory
seq operation 2
SEQ Operation #2
  • state set according to second irmovl instruction
  • combinational logic starting to react to state changes
seq operation 3
SEQ Operation #3
  • state set according to second irmovl instruction
  • combinational logic generates results for addl instruction
seq operation 4
SEQ Operation #4
  • state set according to addl instruction
  • combinational logic starting to react to state changes
seq operation 5
SEQ Operation #5
  • state set according to addl instruction
  • combinational logic generates results for je instruction
seq summary
SEQ Summary

Implementation

  • Express every instruction as series of simple steps
  • Follow same general flow for each instruction type
  • Assemble registers, memories, predesigned combinational blocks
  • Connect with control logic

Limitations

  • Too slow to be practical
  • In one cycle, must propagate through instruction memory, register file, ALU, and data memory
  • Would need to run clock very slowly
  • Hardware units only active for fraction of clock cycle
ad