Y86 Instruction
Download
1 / 87

Instructor: Erol Sahin - PowerPoint PPT Presentation


  • 78 Views
  • Uploaded on

Y86 Instruction Set Architecture – SEQ processor CENG331: Introduction to Computer Systems 8 th Lecture. Instructor: Erol Sahin. Acknowledgement: Most of the slides are adapted from the ones prepared by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ. Application Program. Compiler.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Instructor: Erol Sahin' - mira


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Y86 Instruction Set Architecture – SEQ processorCENG331: Introduction to Computer Systems8thLecture

Instructor:

ErolSahin

  • Acknowledgement: Most of the slides are adapted from the ones prepared

  • by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.


Instruction set architecture

Application

Program

Compiler

OS

ISA

CPU

Design

Circuit

Design

Chip

Layout

Instruction Set Architecture

  • Assembly Language View

    • Processor state

      • Registers, memory, …

    • Instructions

      • addl, movl, leal, …

      • How instructions are encoded as bytes

  • Layer of Abstraction

    • Above: how to program machine

      • Processor executes instructions in a sequence

    • Below: what needs to be built

      • Use variety of tricks to make it run fast

      • E.g., execute multiple instructions simultaneously


Y86 processor state

OF

ZF

SF

Y86 Processor State

Program registers

Condition codes

Memory

  • Program Registers

    • Same 8 as with IA32. Each 32 bits

  • Condition Codes

    • Single-bit flags set by arithmetic or logical instructions

      • OF: Overflow ZF: Zero SF:Negative

  • Program Counter

    • Indicates address of instruction

  • Memory

    • Byte-addressable storage array

    • Words stored in little-endian byte order

%eax

%esi

%ecx

%edi

PC

%edx

%esp

%ebx

%ebp


Y86 instructions
Y86 Instructions

  • Format

    • 1--6 bytes of information read from memory

      • Can determine instruction length from first byte

      • Not as many instruction types, and simpler encoding than with IA32

    • Each accesses and modifies some part(s) of the program state


Encoding registers

%eax

0

%esi

6

%ecx

1

%edi

7

%edx

2

%esp

4

%ebx

3

%ebp

5

Encoding Registers

  • Each register has 4-bit ID

    • Same encoding as in IA32

  • Register ID 8 indicates “no register”

    • Will use this in our hardware design in multiple places


Instruction example

Generic Form

Encoded Representation

%eax

0

%esi

6

%ecx

1

%edi

7

addl rA, rB

%edx

2

%esp

4

%ebx

3

%ebp

5

6

0

rA

rB

Instruction Example

Addition Instruction

  • Add value in register rA to that in register rB

    • Store result in register rB

    • Note that Y86 only allows addition to be applied to register data

  • Set condition codes based on result

  • e.g., addl %eax,%esiEncoding:60 06

  • Two-byte encoding

    • First indicates instruction type

    • Second gives source and destination registers


Arithmetic and logical operations

Instruction Code

Function Code

addl rA, rB

xorl rA, rB

andl rA, rB

subl rA, rB

6

6

6

6

2

3

1

0

rA

rA

rA

rA

rB

rB

rB

rB

Arithmetic and Logical Operations

  • Refer to generically as “OPl”

  • Encodings differ only by “function code”

    • Low-order 4 bytes in first instruction word

  • Set condition codes as side effect

Add

Subtract (rA from rB)

And

Exclusive-Or


Move operations

rA

8

rA

rB

rB

rB

rrmovl rA, rB

4

3

2

5

0

0

0

0

rA

rB

irmovl V, rB

mrmovl D(rB), rA

rmmovl rA, D(rB)

V

D

D

Move Operations

  • Like the IA32 movlinstruction

  • Simpler format for memory addresses

  • Give different names to keep them distinct

Register --> Register

Immediate --> Register

Register --> Memory

Memory --> Register


Move instruction examples

%eax

0

%esi

6

%ecx

1

%edi

7

%edx

2

%esp

4

%ebx

3

%ebp

5

Move Instruction Examples

IA32

Y86

Encoding

movl $0xabcd, %edx

irmovl $0xabcd, %edx

30 82 cd ab 00 00

movl %esp, %ebx

rrmovl %esp, %ebx

20 43

movl -12(%ebp),%ecx

mrmovl -12(%ebp),%ecx

50 15 f4 ff ff ff

movl %esi,0x41c(%esp)

rmmovl %esi,0x41c(%esp)

40 64 1c 04 00 00

movl $0xabcd, (%eax)

movl %eax, 12(%eax,%edx)

movl (%ebp,%eax,4),%ecx


Jump instructions

Jump When Equal

Jump When Not Equal

Jump Unconditionally

Jump When Greater

Jump When Greater or Equal

Jump When Less or Equal

Jump When Less

jge Dest

jl Dest

je Dest

jmp Dest

jg Dest

jne Dest

jle Dest

Dest

Dest

Dest

Dest

Dest

Dest

Dest

7

7

7

7

7

7

7

0

4

3

2

1

5

6

Jump Instructions

  • Refer to generically as “jXX”

  • Encodings differ only by “function code”

  • Based on values of condition codes

  • Same as IA32 counterparts

  • Encode full destination address

    • Unlike PC-relative addressing seen in IA32


Y86 program stack
Y86 Program Stack

Stack “Bottom”

  • Region of memory holding program data

  • Used in Y86 (and IA32) for supporting procedure calls

  • Stack top indicated by %esp

    • Address of top stack element

  • Stack grows toward lower addresses

    • Top element is at highest address in the stack

    • When pushing, must first decrement stack pointer

    • When popping, increment stack pointer

Increasing

Addresses

%esp

Stack “Top”


Stack operations

pushl rA

popl rA

a

rA

b

rA

0

8

0

8

Stack Operations

  • Decrement %espby 4

  • Store word from rA to memory at %esp

  • Like IA32

  • Read word from memory at %esp

  • Save in rA

  • Increment %espby 4

  • Like IA32


Subroutine call and return

call Dest

Dest

ret

8

9

0

0

Subroutine Call and Return

  • Push address of next instruction onto stack

  • Start executing instructions at Dest

  • Like IA32

  • Pop value from stack

  • Use as address for next instruction

  • Like IA32


Miscellaneous instructions

nop

halt

0

1

0

0

Miscellaneous Instructions

  • Don’t do anything

  • Stop executing instructions

  • IA32 has comparable instruction, but can’t execute it in user mode

  • We will use it to stop the simulator


Y86 code generation example
Y86 Code Generation Example

First Try

  • Write typical array code

  • Compile with gcc -O2 -S

  • Problem

    • Hard to do array indexing on Y86

      • Since don’t have scaled addressing modes

/* Find number of elements in

null-terminated list */

int len1(int a[])

{

int len;

for (len = 0; a[len]; len++)

;

return len;

}

L18:

incl %eax

cmpl $0,(%edx,%eax,4)

jne L18


Y86 code generation example 2
Y86 Code Generation Example #2

Second Try

  • Write with pointer code

  • Compile with gcc -O2 -S

  • Result

    • Don’t need to do indexed addressing

/* Find number of elements in

null-terminated list */

int len2(int a[])

{

int len = 0;

while (*a++)

len++;

return len;

}

L24:

movl (%edx),%eax

incl %ecx

L26:

addl $4,%edx

testl %eax,%eax

jne L24


Y86 code generation example 3
Y86 Code Generation Example #3

IA32 Code

  • Setup

Y86 Code

  • Setup

len2:

pushl %ebp # Save %ebp

xorl %ecx,%ecx # len = 0

rrmovl %esp,%ebp # Set frame

mrmovl 8(%ebp),%edx # Get a

mrmovl (%edx),%eax # Get *a

jmp L26 # Goto entry

len2:

pushl %ebp

xorl %ecx,%ecx

movl %esp,%ebp

movl 8(%ebp),%edx

movl (%edx),%eax

jmp L26


Y86 code generation example 4
Y86 Code Generation Example #4

IA32 Code

  • Loop + Finish

Y86 Code

  • Loop + Finish

L24:

movl (%edx),%eax

incl %ecx

L26:

addl $4,%edx

testl %eax,%eax

jne L24

movl %ebp,%esp

movl %ecx,%eax

popl %ebp

ret

L24:

mrmovl (%edx),%eax # Get *a

irmovl $1,%esi

addl %esi,%ecx # len++

L26: # Entry:

irmovl $4,%esi

addl %esi,%edx # a++

andl %eax,%eax # *a == 0?

jne L24 # No--Loop

rrmovl %ebp,%esp # Pop

rrmovl %ecx,%eax # Rtn len

popl %ebp

ret


Y86 program structure
Y86 Program Structure

irmovl Stack,%esp # Set up stack

rrmovl %esp,%ebp # Set up frame

irmovl List,%edx

pushl %edx # Push argument

call len2 # Call Function

halt # Halt

.align 4

List: # List of elements

.long 5043

.long 6125

.long 7395

.long 0

# Function

len2:

. . .

# Allocate space for stack

.pos 0x100

Stack:

  • Program starts at address 0

  • Must set up stack

    • Make sure don’t overwrite code!

  • Must initialize data

  • Can use symbolic names


Assembling y86 program
Assembling Y86 Program

  • Generates “object code” file eg.yo

    • Actually looks like disassembler output

  • unix> yas eg.ys

  • 0x000: 308400010000 | irmovl Stack,%esp # Set up stack

  • 0x006: 2045 | rrmovl %esp,%ebp # Set up frame

  • 0x008: 308218000000 | irmovl List,%edx

  • 0x00e: a028 | pushl %edx # Push argument

  • 0x010: 8028000000 | call len2 # Call Function

  • 0x015: 10 | halt # Halt

  • 0x018: | .align 4

  • 0x018: | List: # List of elements

  • 0x018: b3130000 | .long 5043

  • 0x01c: ed170000 | .long 6125

  • 0x020: e31c0000 | .long 7395

  • 0x024: 00000000 | .long 0


Simulating y86 program
Simulating Y86 Program

  • Instruction set simulator

    • Computes effect of each instruction on processor state

    • Prints changes in state from original

  • unix> yis eg.yo

  • Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0

  • Changes to registers:

  • %eax: 0x00000000 0x00000003

  • %ecx: 0x00000000 0x00000003

  • %edx: 0x00000000 0x00000028

  • %esp: 0x00000000 0x000000fc

  • %ebp: 0x00000000 0x00000100

  • %esi: 0x00000000 0x00000004

  • Changes to memory:

  • 0x00f4: 0x00000000 0x00000100

  • 0x00f8: 0x00000000 0x00000015

  • 0x00fc: 0x00000000 0x00000018


Cisc versus risc ceng331 introduction to computer systems 8 th lecture

CISC versus RISC CENG331: Introduction to Computer Systems8thLecture

Instructor:

ErolSahin

  • Acknowledgement: Most of the slides are adapted from the ones prepared

  • by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.


Cisc instruction sets
CISC Instruction Sets

  • Complex Instruction Set Computer

  • Dominant style through mid-80’s

  • Stack-oriented instruction set

    • Use stack to pass arguments, save program counter (IA32 but not int x86-64)

    • Explicit push and pop instructions

  • Arithmetic instructions can access memory

    • addl %eax, 12(%ebx,%ecx,4)

      • requires memory read and write

      • Complex address calculation

  • Condition codes

    • Set as side effect of arithmetic and logical instructions

  • Philosophy

    • Add instructions to perform “typical” programming tasks


  • Risc instruction sets
    RISC Instruction Sets

    • Reduced Instruction Set Computer

    • Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)

      Fewer, simpler instructions

    • Might take more to get given task done

    • Can execute them with small and fast hardware

      Register-oriented instruction set

    • Many more (typically 32) registers

    • Use for arguments, return pointer, temporaries

      Only load and store instructions can access memory

    • Similar to Y86 mrmovland rmmovl

      No Condition codes

    • Test instructions return 0/1 in register



    Mips instruction examples

    Op

    Ra

    Rb

    Rd

    00000

    Fn

    R-R

    Op

    Op

    Op

    Ra

    Ra

    Ra

    Rb

    Rb

    Rb

    Offset

    Immediate

    Offset

    R-I

    MIPS Instruction Examples

    addu $3,$2,$1 # Register add: $3 = $2+$1

    addu $3,$2, 3145 # Immediate add: $3 = $2+3145

    sll $3,$2,2 # Shift left: $3 = $2 << 2

    Branch

    beq $3,$2,dest # Branch when $3 = $2

    Load/Store

    lw $3,16($2) # Load Word: $3 = M[$2+16]

    sw $3,16($2) # Store Word: M[$2+16] = $3


    Cisc vs risc
    CISC vs. RISC

    • Original Debate

      • Strong opinions!

      • CISC proponents---easy for compiler, fewer code bytes

      • RISC proponents---better for optimizing compilers, can make run fast with simple chip design

    • Current Status

      • For desktop processors, choice of ISA not a technical issue

        • With enough hardware, can make anything run fast

        • Code compatibility more important

      • For embedded processors, RISC makes sense

        • Smaller, cheaper, less power


    Summary
    Summary

    • Y86 Instruction Set Architecture

      • Similar state and instructions as IA32

      • Simpler encodings

      • Somewhere between CISC and RISC

    • How Important is ISA Design?

      • Less now than before

        • With enough hardware, can make almost anything go fast

      • AMD/Intel moved away from IA32

        • Does not allow enough parallel execution

        • x86-64

          • 64-bit word sizes (overcome address space limitations)

          • Radically different style of instruction set with explicit parallelism

          • Requires sophisticated compilers


    Logic design and hcl ceng331 introduction to computer systems 8 th lecture

    Logic Design and HCLCENG331: Introduction to Computer Systems8thLecture

    Instructor:

    ErolSahin

    • Acknowledgement: Most of the slides are adapted from the ones prepared

    • by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.


    Computing with logic gates

    a && b

    Computing with Logic Gates

    • Outputs are Boolean functions of inputs

    • Respond continuously to changes in inputs

      • With some, small delay

    Falling Delay

    Rising Delay

    b

    Voltage

    a

    Time


    Combinational circuits

    Acyclic Network

    Primary

    Inputs

    Primary

    Outputs

    Combinational Circuits

    Acyclic Network of Logic Gates

    • Continously responds to changes on primary inputs

    • Primary outputs become (after some delay) Boolean functions of primary inputs


    Bit equality

    Bit equal

    a

    eq

    b

    Bit Equality

    • Generate 1 if a and b are equal

  • Hardware Control Language (HCL)

    • Very simple hardware description language

      • Boolean operations have syntax similar to C logical operations

    • We’ll use it to describe control logic for processors

  • HCL Expression

    bool eq = (a&&b)||(!a&&!b)


    Word equality

    b31

    Bit equal

    eq31

    a31

    b30

    Bit equal

    eq30

    a30

    Eq

    B

    =

    Eq

    b1

    Bit equal

    eq1

    A

    a1

    b0

    Bit equal

    eq0

    a0

    Word Equality

    Word-Level Representation

    • 32-bit word size

    • HCL representation

      • Equality operation

      • Generates Boolean value

    HCL Representation

    bool Eq = (A == B)


    Bit level multiplexor
    Bit-Level Multiplexor

    • Control signal s

    • Data signals a and b

    • Output a when s=1, b when s=0

    s

    Bit MUX

    HCL Expression

    bool out = (s&&a)||(!s&&b)

    b

    out

    a


    Word multiplexor

    s

    b31

    out31

    a31

    s

    b30

    out30

    MUX

    B

    Out

    a30

    A

    b0

    out0

    a0

    Word Multiplexor

    Word-Level Representation

    • Select input word A or B depending on control signal s

    • HCL representation

      • Case expression

      • Series of test : value pairs

      • Output value for first successful test

    HCL Representation

    int Out = [

    s : A;

    1 : B;

    ];


    Hcl word level examples

    MIN3

    C

    Min3

    B

    A

    s1

    s0

    MUX4

    D0

    D1

    Out4

    D2

    D3

    HCL Word-Level Examples

    Minimum of 3 Words

    • Find minimum of three input words

    • HCL case expression

    • Final case guarantees match

    int Min3 = [

    A < B && A < C : A;

    B < A && B < C : B;

    1 : C;

    ];

    4-Way Multiplexor

    • Select one of 4 inputs based on two control bits

    • HCL case expression

    • Simplify tests by assuming sequential matching

    int Out4 = [

    !s1&&!s0: D0;

    !s1 : D1;

    !s0 : D2;

    1 : D3;

    ];


    Arithmetic logic unit

    2

    1

    0

    3

    Y

    Y

    Y

    Y

    A

    A

    A

    A

    X ^ Y

    X & Y

    X + Y

    X - Y

    X

    X

    X

    X

    B

    B

    B

    B

    OF

    OF

    OF

    OF

    ZF

    ZF

    ZF

    ZF

    CF

    CF

    CF

    CF

    A

    L

    U

    A

    L

    U

    A

    L

    U

    A

    L

    U

    Arithmetic Logic Unit

    • Combinational logic

      • Continuously responding to inputs

    • Control signal selects function computed

      • Corresponding to 4 arithmetic/logical operations in Y86

    • Also computes values for condition codes


    Registers

    i7

    D

    o7

    Q+

    C

    i6

    D

    o6

    Q+

    C

    i5

    D

    o5

    Q+

    C

    i4

    D

    o4

    Q+

    I

    O

    C

    i3

    D

    o3

    Q+

    C

    i2

    D

    o2

    Q+

    C

    i1

    Clock

    D

    o1

    Q+

    C

    i0

    D

    o0

    Q+

    C

    Clock

    Registers

    Structure

    • Stores word of data

      • Different from program registers seen in assembly code

    • Collection of edge-triggered latches

    • Loads input on rising edge of clock


    Register operation

    Rising

    clock

    State = y

    y

    Output = y

    Register Operation

    • Stores data bits

    • For most of time acts as barrier between input and output

    • As clock rises, loads input

    State = x

    x

    Input = y

    Output = x


    State machine example

    Comb. Logic

    0

    MUX

    0

    Out

    In

    1

    Load

    Clock

    Clock

    Load

    A

    L

    U

    x0

    x1

    x2

    x3

    x4

    x5

    In

    x0

    x0+x1

    x0+x1+x2

    x3

    x3+x4

    x3+x4+x5

    Out

    State Machine Example

    • Accumulator circuit

    • Load or accumulate on each cycle


    Random access memory

    valA

    Register

    file

    srcA

    A

    valW

    Read ports

    W

    dstW

    Write port

    valB

    srcB

    B

    Clock

    Random-Access Memory

    • Stores multiple words of memory

      • Address input specifies which word to read or write

    • Register file

      • Holds values of program registers

      • %eax, %esp, etc.

      • Register identifier serves as address

        • ID 8 implies no read or write performed

    • Multiple Ports

      • Can read and/or write multiple words in one cycle

        • Each has separate address and data input/output


    Register file timing

    Register

    file

    Register

    file

    y

    2

    valW

    valW

    W

    W

    dstW

    dstW

    x

    valA

    Register

    file

    srcA

    A

    2

    Clock

    Clock

    valB

    srcB

    B

    x

    2

    Rising

    clock

    y

    2

    Register File Timing

    Reading

    • Like combinational logic

    • Output data generated based on input address

      • After some delay

        Writing

    • Like register

    • Update only as clock rises

    x

    2


    Hardware control language
    Hardware Control Language

    • Very simple hardware description language

    • Can only express limited aspects of hardware operation

      • Parts we want to explore and modify

  • Data Types

    • bool: Boolean

      • a, b, c, …

    • int: words

      • A, B, C, …

      • Does not specify word size---bytes, 32-bit words, …

  • Statements

    • bool a = bool-expr;

    • int A = int-expr;


  • Hcl operations
    HCL Operations

    • Classify by type of value returned

  • Boolean Expressions

    • Logic Operations

      • a && b, a || b, !a

    • Word Comparisons

      • A == B, A != B, A < B, A <= B, A >= B, A > B

    • Set Membership

      • A in { B, C, D }

        • Same as A == B || A == C || A == D

  • Word Expressions

    • Case expressions

      • [ a : A; b : B; c : C ]

      • Evaluate test expressions a, b, c, … in sequence

      • Return word expression A, B, C, … for first successful test


  • Summary1
    Summary

    Computation

    • Performed by combinational logic

    • Computes Boolean functions

    • Continuously reacts to input changes

      Storage

    • Registers

      • Hold single words

      • Loaded as clock rises

    • Random-access memories

      • Hold multiple words

      • Possible multiple read or write ports

      • Read word when address input changes

      • Write word as clock rises


    Sequential processor implementation ceng331 introduction to computer systems 8 th lecture

    SEQuential processor implementationCENG331: Introduction to Computer Systems8thLecture

    Instructor:

    ErolSahin

    • Acknowledgement: Most of the slides are adapted from the ones prepared

    • by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.


    Y86 instruction set

    Byte

    0

    1

    2

    3

    4

    5

    nop

    0

    0

    addl

    6

    0

    halt

    1

    0

    subl

    6

    1

    rrmovl rA, rB

    2

    0

    rA

    rB

    andl

    6

    2

    irmovl V, rB

    3

    0

    8

    rB

    V

    xorl

    6

    3

    rmmovl rA, D(rB)

    4

    0

    rA

    rB

    D

    jmp

    7

    0

    mrmovl D(rB), rA

    5

    0

    rA

    rB

    D

    jle

    7

    1

    OPl rA, rB

    6

    fn

    rA

    rB

    jl

    7

    2

    jXX Dest

    7

    fn

    Dest

    je

    7

    3

    call Dest

    8

    0

    Dest

    jne

    7

    4

    ret

    9

    0

    jge

    7

    5

    pushl rA

    A

    0

    rA

    8

    jg

    7

    6

    popl rA

    B

    0

    rA

    8

    Y86 Instruction Set


    Seq hardware structure

    newPC

    SEQ Hardware Structure

    PC

    valE

    ,

    valM

    Write back

    valM

    State

    • Program counter register (PC)

    • Condition code register (CC)

    • Register File

    • Memories

      • Access same memory space

      • Data: for reading/writing program data

      • Instruction: for reading instructions

        Instruction Flow

    • Read instruction at address specified by PC

    • Process through stages

    • Update program counter

    Data

    Data

    Memory

    memory

    memory

    Addr

    , Data

    valE

    CC

    CC

    ALU

    ALU

    Execute

    Bch

    aluA

    ,

    aluB

    valA

    ,

    valB

    srcA

    ,

    srcB

    Decode

    A

    A

    B

    B

    dstA

    ,

    dstB

    M

    M

    Register

    Register

    Register

    Register

    file

    file

    file

    file

    E

    E

    icode

    ,

    ifun

    valP

    rA

    ,

    rB

    valC

    Instruction

    PC

    Instruction

    PC

    memory

    increment

    Fetch

    memory

    increment

    PC


    Seq stages

    newPC

    SEQ Stages

    PC

    valE

    ,

    valM

    Write back

    valM

    Fetch

    • Read instruction from instruction memory

      Decode

    • Read program registers

      Execute

    • Compute value or address

      Memory

    • Read or write data

      Write Back

    • Write program registers

      PC

    • Update program counter

    Data

    Data

    Memory

    memory

    memory

    Addr

    , Data

    valE

    CC

    CC

    ALU

    ALU

    Execute

    Bch

    aluA

    ,

    aluB

    valA

    ,

    valB

    srcA

    ,

    srcB

    Decode

    A

    A

    B

    B

    dstA

    ,

    dstB

    M

    M

    Register

    Register

    Register

    Register

    file

    file

    file

    file

    E

    E

    icode

    ,

    ifun

    valP

    rA

    ,

    rB

    valC

    Instruction

    PC

    Instruction

    PC

    memory

    increment

    Fetch

    memory

    increment

    PC


    Instruction decoding

    Optional

    Optional

    D

    icode

    5

    0

    rA

    rB

    ifun

    rA

    rB

    valC

    Instruction Decoding

    Instruction Format

    • Instruction byte icode:ifun

    • Optional register byte rA:rB

    • Optional constant word valC


    Executing arith logical operation

    OPl rA, rB

    6

    fn

    rA

    rB

    Executing Arith./Logical Operation

    Fetch

    • Read 2 bytes

      Decode

    • Read operand registers

      Execute

    • Perform operation

    • Set condition codes

    Memory

    • Do nothing

      Write back

    • Update register

      PC Update

    • Increment PC by 2


    Stage computation arith log ops

    Fetch

    icode:ifun  M1[PC]

    Read instruction byte

    rA:rB  M1[PC+1]

    Read register byte

    valP  PC+2

    Compute next PC

    Decode

    valA  R[rA]

    Read operand A

    valB  R[rB]

    Read operand B

    Execute

    valE  valB OP valA

    Perform ALU operation

    Set CC

    Set condition code register

    Memory

    Write

    back

    R[rB]  valE

    Write back result

    PC update

    PC  valP

    Update PC

    Stage Computation: Arith/Log. Ops

    OPl rA, rB

    • Formulate instruction execution as sequence of simple steps

    • Use same general form for all instructions


    Executing rmmovl

    rA

    rB

    4

    0

    rmmovl rA, D(rB)

    D

    Executing rmmovl

    Fetch

    • Read 6 bytes

      Decode

    • Read operand registers

      Execute

    • Compute effective address

    Memory

    • Write to memory

      Write back

    • Do nothing

      PC Update

    • Increment PC by 6


    Stage computation rmmovl

    Fetch

    icode:ifun  M1[PC]

    Read instruction byte

    rA:rB  M1[PC+1]

    Read register byte

    valC  M4[PC+2]

    Read displacement D

    valP  PC+6

    Compute next PC

    Decode

    valA  R[rA]

    Read operand A

    valB  R[rB]

    Read operand B

    Execute

    valE  valB + valC

    Compute effective address

    Memory

    M4[valE]  valA

    Write value to memory

    Write

    back

    PC update

    PC  valP

    Update PC

    Stage Computation: rmmovl

    rmmovl rA, D(rB)

    • Use ALU for address computation


    Executing popl

    popl rA

    b

    rA

    0

    8

    Executing popl

    Fetch

    • Read 2 bytes

      Decode

    • Read stack pointer

      Execute

    • Increment stack pointer by 4

    Memory

    • Read from old stack pointer

      Write back

    • Update stack pointer

    • Write result to register

      PC Update

    • Increment PC by 2


    Stage computation popl

    Fetch

    icode:ifun  M1[PC]

    Read instruction byte

    rA:rB  M1[PC+1]

    Read register byte

    valP  PC+2

    Compute next PC

    Decode

    valA  R[%esp]

    Read stack pointer

    valB  R [%esp]

    Read stack pointer

    Execute

    valE  valB + 4

    Increment stack pointer

    Memory

    valM  M4[valA]

    Read from stack

    Write

    back

    R[%esp]  valE

    Update stack pointer

    R[rA]  valM

    Write back result

    PC update

    PC  valP

    Update PC

    Stage Computation: popl

    popl rA

    • Use ALU to increment stack pointer

    • Must update two registers

      • Popped value

      • New stack pointer


    Executing jumps

    jXX Dest

    Dest

    Not taken

    fall thru:

    target:

    Taken

    7

    XX

    XX

    fn

    XX

    XX

    Executing Jumps

    Fetch

    • Read 5 bytes

    • Increment PC by 5

      Decode

    • Do nothing

      Execute

    • Determine whether to take branch based on jump condition and condition codes

    Memory

    • Do nothing

      Write back

    • Do nothing

      PC Update

    • Set PC to Dest if branch taken or to incremented PC if not branch


    Stage computation jumps

    Fetch

    icode:ifun  M1[PC]

    Read instruction byte

    valC  M4[PC+1]

    Read destination address

    valP  PC+5

    Fall through address

    Decode

    Execute

    Bch  Cond(CC,ifun)

    Take branch?

    Memory

    Write

    back

    PC update

    PC  Bch ? valC : valP

    Update PC

    Stage Computation: Jumps

    jXX Dest

    • Compute both addresses

    • Choose based on setting of condition codes and branch condition


    Executing call

    call Dest

    Dest

    return:

    target:

    8

    XX

    XX

    0

    XX

    XX

    Executing call

    Fetch

    • Read 5 bytes

    • Increment PC by 5

      Decode

    • Read stack pointer

      Execute

    • Decrement stack pointer by 4

    Memory

    • Write incremented PC to new value of stack pointer

      Write back

    • Update stack pointer

      PC Update

    • Set PC to Dest


    Stage computation call

    Fetch

    icode:ifun  M1[PC]

    Read instruction byte

    valC  M4[PC+1]

    Read destination address

    valP  PC+5

    Compute return point

    Decode

    valB  R[%esp]

    Read stack pointer

    Execute

    valE  valB + –4

    Decrement stack pointer

    Memory

    M4[valE]  valP

    Write return value on stack

    Write

    back

    R[%esp]  valE

    Update stack pointer

    PC update

    PC  valC

    Set PC to destination

    Stage Computation: call

    call Dest

    • Use ALU to decrement stack pointer

    • Store incremented PC


    Executing ret

    9

    XX

    0

    XX

    Executing ret

    Fetch

    • Read 1 byte

      Decode

    • Read stack pointer

      Execute

    • Increment stack pointer by 4

    Memory

    • Read return address from old stack pointer

      Write back

    • Update stack pointer

      PC Update

    • Set PC to return address

    ret

    return:


    Stage computation ret

    Fetch

    icode:ifun  M1[PC]

    Read instruction byte

    Decode

    valA  R[%esp]

    Read operand stack pointer

    valB  R[%esp]

    Read operand stack pointer

    Execute

    valE  valB + 4

    Increment stack pointer

    Memory

    valM  M4[valA]

    Read return address

    Write

    back

    R[%esp]  valE

    Update stack pointer

    PC update

    PC  valM

    Set PC to return address

    Stage Computation: ret

    ret

    • Use ALU to increment stack pointer

    • Read return address from memory


    Computation steps
    Computation Steps

    OPl rA, rB

    • All instructions follow same general pattern

    • Differ in what gets computed on each step

    Fetch

    icode,ifun

    icode:ifun  M1[PC]

    Read instruction byte

    rA,rB

    rA:rB  M1[PC+1]

    Read register byte

    valC

    [Read constant word]

    valP

    valP  PC+2

    Compute next PC

    Decode

    valA, srcA

    valA  R[rA]

    Read operand A

    valB, srcB

    valB  R[rB]

    Read operand B

    Execute

    valE

    valE  valB OP valA

    Perform ALU operation

    Cond code

    Set CC

    Set condition code register

    Memory

    valM

    [Memory read/write]

    Write

    back

    dstE

    R[rB]  valE

    Write back ALU result

    dstM

    [Write back memory result]

    PC update

    PC

    PC  valP

    Update PC


    Irmovl computation steps
    Irmovl: Computation Steps

    irmovl V, rB

    • All instructions follow same general pattern

    • Differ in what gets computed on each step

    Fetch

    icode,ifun

    icode:ifun  M1[PC]

    Read instruction byte

    rA,rB

    rA:rB  M1[PC+1]

    Read register byte

    valC

    valC M4[PC+2]

    [Read constant word]

    valP

    valP  PC+6

    Compute next PC

    Decode

    valA, srcA

    valB, srcB

    Execute

    valE

    valE  0 + valC

    Perform ALU operation

    Cond code

    Memory

    valM

    Write

    back

    dstE

    R[rB]  valE

    Write back ALU result

    dstM

    PC update

    PC

    PC  valP

    Update PC


    Computation steps1
    Computation Steps

    call Dest

    • All instructions follow same general pattern

    • Differ in what gets computed on each step

    Fetch

    icode,ifun

    icode:ifun  M1[PC]

    Read instruction byte

    rA,rB

    [Read register byte]

    valC

    valC  M4[PC+1]

    Read constant word

    valP

    valP  PC+5

    Compute next PC

    Decode

    valA, srcA

    [Read operand A]

    valB, srcB

    valB  R[%esp]

    Read operand B

    Execute

    valE

    valE  valB + –4

    Perform ALU operation

    Cond code

    [Set condition code reg.]

    Memory

    valM

    M4[valE]  valP

    [Memory read/write]

    Write

    back

    dstE

    R[%esp]  valE

    [Write back ALU result]

    dstM

    Write back memory result

    PC update

    PC

    PC  valC

    Update PC


    Computed values
    Computed Values

    • Fetch

      icode Instruction code

      ifun Instruction function

      rA Instr. Register A

      rB Instr. Register B

      valC Instruction constant

      valP Incremented PC

    • Decode

      srcA Register ID A

      srcB Register ID B

      dstE Destination Register E

      dstM Destination Register M

      valA Register value A

      valB Register value B

    Execute

    • valE ALU result

    • Bch Branch flag

      Memory

    • valM Value from memory


    Seq hardware
    SEQ Hardware

    • Key

      • Blue boxes: predesigned hardware blocks

        • E.g., memories, ALU

      • Gray boxes: control logic

        • Describe in HCL

      • White ovals: labels for signals

      • Thick lines: 32-bit word values

      • Thin lines: 4-8 bit values

      • Dotted lines: 1-bit values


    Fetch logic
    Fetch Logic

    Predefined Blocks

    • PC: Register containing PC

    • Instruction memory: Read 6 bytes (PC to PC+5)

    • Split: Divide instruction byte into icode and ifun

    • Align: Get fields for rA, rB, and valC


    Fetch logic1
    Fetch Logic

    Control Logic

    • Instr. Valid: Is this instruction valid?

    • Need regids: Does this instruction have a register bytes?

    • Need valC: Does this instruction have a constant word?


    Fetch control logic
    Fetch Control Logic

    bool need_regids =

    icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,

    IIRMOVL, IRMMOVL, IMRMOVL };

    bool instr_valid = icode in

    { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,

    IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };


    Decode logic
    Decode Logic

    Register File

    • Read ports A, B

    • Write ports E, M

    • Addresses are register IDs or 8 (no access)

    Control Logic

    • srcA, srcB: read port addresses

    • dstA, dstB: write port addresses


    A source

    OPl rA, rB

    Decode

    valA  R[rA]

    Read operand A

    rmmovl rA, D(rB)

    Decode

    valA  R[rA]

    Read operand A

    popl rA

    Decode

    valA  R[%esp]

    Read stack pointer

    jXX Dest

    Decode

    No operand

    call Dest

    Decode

    No operand

    ret

    Decode

    valA  R[%esp]

    Read stack pointer

    A Source

    int srcA = [

    icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : rA;

    icode in { IPOPL, IRET } : RESP;

    1 : RNONE; # Don't need register

    ];


    E destination

    OPl rA, rB

    Write-back

    R[rB]  valE

    Write back result

    rmmovl rA, D(rB)

    Write-back

    None

    popl rA

    Write-back

    R[%esp]  valE

    Update stack pointer

    jXX Dest

    Write-back

    None

    call Dest

    Write-back

    R[%esp]  valE

    Update stack pointer

    ret

    Write-back

    R[%esp]  valE

    Update stack pointer

    E Destination

    int dstE = [

    icode in { IRRMOVL, IIRMOVL, IOPL} : rB;

    icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;

    1 : RNONE; # Don't need register

    ];


    Execute logic
    Execute Logic

    Units

    • ALU

      • Implements 4 required functions

      • Generates condition code values

    • CC

      • Register with 3 condition code bits

    • bcond

      • Computes branch flag

        Control Logic

    • Set CC: Should condition code register be loaded?

    • ALU A: Input A to ALU

    • ALU B: Input B to ALU

    • ALU fun: What function should ALU compute?


    Alu a input

    OPl rA, rB

    Execute

    valE  valB OP valA

    Perform ALU operation

    rmmovl rA, D(rB)

    Execute

    valE  valB + valC

    Compute effective address

    popl rA

    Execute

    valE  valB + 4

    Increment stack pointer

    jXX Dest

    Execute

    No operation

    call Dest

    Execute

    valE  valB + –4

    Decrement stack pointer

    ret

    Execute

    valE  valB + 4

    Increment stack pointer

    ALU A Input

    int aluA = [

    icode in { IRRMOVL, IOPL } : valA;

    icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;

    icode in { ICALL, IPUSHL } : -4;

    icode in { IRET, IPOPL } : 4;

    # Other instructions don't need ALU

    ];


    Alu operation

    OPl rA, rB

    Execute

    valE  valB OP valA

    Perform ALU operation

    rmmovl rA, D(rB)

    Execute

    valE  valB + valC

    Compute effective address

    popl rA

    Execute

    valE  valB + 4

    Increment stack pointer

    jXX Dest

    Execute

    No operation

    call Dest

    Execute

    valE  valB + –4

    Decrement stack pointer

    ret

    Execute

    valE  valB + 4

    Increment stack pointer

    ALU Operation

    int alufun = [

    icode == IOPL : ifun;

    1 : ALUADD;

    ];


    Memory logic
    Memory Logic

    Memory

    • Reads or writes memory word

      Control Logic

    • Mem. read: should word be read?

    • Mem. write: should word be written?

    • Mem. addr.: Select address

    • Mem. data.: Select data


    Memory address

    OPl rA, rB

    Memory

    No operation

    rmmovl rA, D(rB)

    popl rA

    jXX Dest

    Memory

    Memory

    Memory

    Memory

    M4[valE]  valP

    valM  M4[valA]

    M4[valE]  valA

    valM  M4[valA]

    Write return value on stack

    Write value to memory

    Read return address

    Read from stack

    Memory

    No operation

    call Dest

    ret

    Memory Address

    int mem_addr = [

    icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE;

    icode in { IPOPL, IRET } : valA;

    # Other instructions don't need address

    ];


    Memory read

    OPl rA, rB

    Memory

    No operation

    rmmovl rA, D(rB)

    popl rA

    jXX Dest

    Memory

    Memory

    Memory

    Memory

    M4[valE]  valP

    valM  M4[valA]

    M4[valE]  valA

    valM  M4[valA]

    Write return value on stack

    Write value to memory

    Read return address

    Read from stack

    Memory

    No operation

    call Dest

    ret

    Memory Read

    bool mem_read = icode in { IMRMOVL, IPOPL, IRET };


    Pc update logic
    PC Update Logic

    New PC

    • Select next value of PC


    Pc update

    OPl rA, rB

    rmmovl rA, D(rB)

    popl rA

    jXX Dest

    call Dest

    PC update

    PC update

    PC update

    PC update

    PC update

    PC update

    PC  valC

    PC  valP

    PC  Bch ? valC : valP

    PC  valM

    PC  valP

    PC  valP

    Set PC to return address

    Update PC

    Update PC

    Set PC to destination

    Update PC

    Update PC

    ret

    PCUpdate

    int new_pc = [

    icode == ICALL : valC;

    icode == IJXX && Bch : valC;

    icode == IRET : valM;

    1 : valP;

    ];


    Seq operation
    SEQ Operation

    State

    • PC register

    • Cond. Code register

    • Data memory

    • Register file

      All updated as clock rises

      Combinational Logic

    • ALU

    • Control logic

    • Memory reads

      • Instruction memory

      • Register file

      • Data memory


    Seq operation 2
    SEQ Operation #2

    • state set according to second irmovl instruction

    • combinational logic starting to react to state changes


    Seq operation 3
    SEQ Operation #3

    • state set according to second irmovl instruction

    • combinational logic generates results for addl instruction


    Seq operation 4
    SEQ Operation #4

    • state set according to addl instruction

    • combinational logic starting to react to state changes


    Seq operation 5
    SEQ Operation #5

    • state set according to addl instruction

    • combinational logic generates results for je instruction


    Seq summary
    SEQ Summary

    Implementation

    • Express every instruction as series of simple steps

    • Follow same general flow for each instruction type

    • Assemble registers, memories, predesigned combinational blocks

    • Connect with control logic

      Limitations

    • Too slow to be practical

    • In one cycle, must propagate through instruction memory, register file, ALU, and data memory

    • Would need to run clock very slowly

    • Hardware units only active for fraction of clock cycle


    ad