Computer architecture

Princess Sumaya University for Technology

Computer Architecture

Dr. Esam Al_Qaralleh


Instruction Set Architecture (ISA)


Outline

  • Introduction

  • Classifying instruction set architectures

  • Instruction set measurements

    • Memory addressing

    • Addressing modes for signal processing

    • Type and size of operands

    • Operations in the instruction set

    • Operations for media and signal processing

    • Instructions for control flow

    • Encoding an instruction set

  • MIPS architecture


Instruction Set Principles and Examples


Basic Issues in Instruction Set Design

  • What operations, and how many?

    • Load, store, increment, and branch are sufficient for any computation, but not practical (programs become too long).

  • How (many) operands are specified?

    • Most operations are dyadic (e.g., A ← B + C); some are monadic (e.g., A ← ~B).

  • How to encode them into instruction format?

    • Instructions should be multiples of Bytes.

  • Typical Instruction Set

    • 32-bit word

    • Basic operand addresses are 32-bit long.

    • Basic operands (like integer) are 32-bit long.

    • In general, an instruction can refer to 3 operands (A ← B + C).

  • Challenge: Encode operations in a small number of bits.


Brief Introduction to ISA

[Figure: a 32-bit instruction word with fields opcode (6 bits), rs (5 bits), rt (5 bits), and immediate (16 bits)]

  • Instruction Set Architecture: a set of instructions

    • Each instruction is directly executed by the CPU’s hardware

  • How is it represented?

    • By a binary format since the hardware understands only bits

  • Options - fixed or variable length formats

    • Fixed - each instruction encoded in same size field (typically 1 word)

    • Variable – half-word, whole-word, multiple word instructions are possible


What Must be Specified?

  • Instruction Format (encoding)

    • How is it decoded?

  • Location of operands and result

    • Where other than memory?

    • How many explicit operands?

    • How are memory operands located?

  • Data type and Size

  • Operations

    • What are supported?


Example of Program Execution

  • Commands (opcodes)

    • 1: Load AC from memory

    • 2: Store AC to memory

    • 5: Add to AC from memory

  • Task: add the contents of memory location 940 to the contents of location 941 and store the result at 941

[Figure: fetch and execute steps for each of the three instructions]
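The fetch and execute steps for this example can be sketched as a tiny accumulator-machine simulator. The slide gives only the three opcodes and the two data addresses; the 16-bit word split into a 4-bit opcode and a 12-bit address, the hexadecimal reading of 940 and 941, the program location 0x300, and the initial data values 3 and 2 are all assumptions made for illustration.

```python
def run(memory, pc, steps):
    """Run `steps` fetch/execute cycles and return the final (AC, PC)."""
    ac = 0
    for _ in range(steps):
        instruction = memory[pc]        # fetch
        pc += 1
        opcode = instruction >> 12      # top 4 bits (assumed layout)
        address = instruction & 0xFFF   # low 12 bits (assumed layout)
        if opcode == 0x1:               # 1: load AC from memory
            ac = memory[address]
        elif opcode == 0x2:             # 2: store AC to memory
            memory[address] = ac
        elif opcode == 0x5:             # 5: add to AC from memory
            ac = (ac + memory[address]) & 0xFFFF
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return ac, pc


# Hypothetical memory image: three instructions followed by two data words.
memory = {
    0x300: 0x1940,  # load AC from 940
    0x301: 0x5941,  # add contents of 941 to AC
    0x302: 0x2941,  # store AC to 941
    0x940: 3,
    0x941: 2,
}
ac, pc = run(memory, 0x300, 3)
```

After the three cycles, location 941 holds the sum of the two original values.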


Classifying Instruction Set Architecture


Instruction Set Design

The instruction set influences everything


Instruction Characteristics

  • Usually a simple operation

    • Which operation is identified by the op-code field

  • But operations require operands - 0, 1, or 2

    • To identify where they are, they must be addressed

      • Address is to some piece of storage

      • Typical storage possibilities are main memory, registers, or a stack

  • Two options: explicit or implicit addressing

    • Implicit - the op-code implies the address of the operands

      • ADD on a stack machine - pops the top 2 elements of the stack, then pushes the result

      • HP calculators work this way

    • Explicit - the address is specified in some field of the instruction

      • Note the potential for 3 addresses - 2 operands + the destination


Classifying Instruction Set Architectures

Based on CPU internal storage options AND number of operands

These choices critically affect instruction count, CPI, and cycle time
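These three quantities multiply to give CPU time, which is why an ISA choice that lowers one can still lose if it raises another. A minimal sketch follows; the two machines and all of their numbers are hypothetical, chosen only to show the trade-off.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count x cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical comparison: fewer, slower instructions vs. more, faster ones.
memory_memory = cpu_time(1_000_000, 4.0, 1e-9)
load_store = cpu_time(1_500_000, 1.5, 1e-9)
```

In this made-up case the load-store machine executes 50% more instructions yet finishes sooner because its CPI is much lower.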


Operand Locations for Four ISA Classes


C = A + B

Code sequence for C = A + B in each of the four classes:

  • Stack

    • Push A

    • Push B

    • Add (pops the top two values of the stack, A and B, and pushes the result onto the stack)

    • Pop C

  • Accumulator (AC)

    • Load A

    • Add B (adds B to AC, which holds A, and stores the result in AC)

    • Store C

  • Register (register-memory)

    • Load R1, A

    • Add R3, R1, B

    • Store R3, C

  • Register (load-store)

    • Load R1, A

    • Load R2, B

    • Add R3, R1, R2

    • Store R3, C
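To make the difference between implicit and explicit operands concrete, the stack and load-store sequences for C = A + B can be run on two toy interpreters. This is only a sketch: the tuple-based instruction representation and the function names are invented here, and memory is modeled as a dictionary keyed by variable name.

```python
def run_stack(program, mem):
    """Stack machine: Add names no operands, they are implicit on the stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return mem


def run_load_store(program, mem):
    """Load-store machine: Add names three registers, only load/store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":
            mem[args[1]] = regs[args[0]]
    return mem


stack_program = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
load_store_program = [
    ("load", "R1", "A"),
    ("load", "R2", "B"),
    ("add", "R3", "R1", "R2"),
    ("store", "R3", "C"),
]
stack_result = run_stack(stack_program, {"A": 2, "B": 3})
load_store_result = run_load_store(load_store_program, {"A": 2, "B": 3})
```

Both programs leave the same value in C; they differ in how many instructions name an address and how many bits each would need.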


Modern Choice – Load-store Register (GPR) Architecture

  • Reasons for choosing GPR (general-purpose registers) architecture

    • Registers (stacks and accumulators…) are faster than memory

    • Registers are easier and more effective for a compiler to use

      • (A+B) – (C*D) – (E*F)

        • May be evaluated in any order (for pipelining concerns or …)

          • But a stack machine must evaluate it left to right

    • Registers can be used to hold variables

      • Reduce memory traffic

      • Speed up programs

      • Improve code density (fewer bits are used to name a register)

  • Compiler writers prefer that all registers be equivalent and unreserved

    • Number of GPRs: at least 16


Characteristics Divide GPR Architectures

  • # of operands

    • Three-operand: 1 result and 2 source operands

    • Two-operand – 1 both source/result and 1 source

  • How many operands are memory addresses

    • 0 – 3 (two sources + 1 result)

  • Resulting classes: load-store, register-memory, and memory-memory


Pros and Cons of the Three Most Common GPR Computers

Notation: (m, n) means m memory operands out of n total operands.

Register-Register: (0,3)

+ Simple, fixed length instruction encoding.

+ Simple code-generation model.

+ Similar number of clocks to execute.

- Higher instruction count.

Memory-memory: (3,3)

+ Most compact.

- Different Instruction size.

- Memory access bottleneck.

Register-Memory: (1,2)

+ Data access without loading first.

+ Easy to encode and yield good density.

- One operand is destroyed.

- Limited number of registers.


Memory Addressing


Memory Addressing Basics

All architectures must address memory

  • What is accessed - byte, word, multiple words?

    • Today’s machines are byte addressable

    • Main memory is organized in 32 - 64 byte lines

    • Big-Endian or Little-Endian addressing

  • Hence there is a natural alignment problem

    • An access of size s bytes at byte address A is aligned if A mod s = 0

    • Misaligned access takes multiple aligned memory references

  • Memory addressing mode influences instruction counts (IC) and clock cycles per instruction (CPI)
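The alignment rule and the cost of a misaligned access can be checked with a few lines of code. This is a sketch under one assumption not stated on the slide: memory is accessed in aligned chunks of `width` bytes, so a misaligned object that straddles a chunk boundary needs more than one reference.

```python
def is_aligned(address, size):
    """An access of `size` bytes at `address` is aligned if address mod size == 0."""
    return address % size == 0


def references_needed(address, size, width):
    """Number of aligned `width`-byte memory references that cover the access."""
    first_chunk = address // width
    last_chunk = (address + size - 1) // width
    return last_chunk - first_chunk + 1


aligned_word = references_needed(8, 4, width=4)     # 4-byte word at address 8
misaligned_word = references_needed(6, 4, width=4)  # 4-byte word at address 6
```

The word at address 6 spans bytes 6 to 9, so it touches two aligned 4-byte chunks.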


Byte Ordering

  • Idea

    • Bytes in a 32-bit long word are numbered 0 to 3

    • Which is most (least) significant?

    • Can cause problems when exchanging binary data between machines

  • Big Endian: byte 0 is the most significant, byte 3 the least

    • IBM 360/370, Motorola 68K, SPARC.

  • Little Endian: byte 0 is the least significant, byte 3 the most

    • Intel x86, VAX

  • Alpha

    • Chip can be configured to operate either way

    • DEC workstations are little endian

    • Cray T3E Alphas are big endian


Byte Ordering Example

union {
    unsigned char  c[8];
    unsigned short s[4];
    unsigned int   i[2];
    unsigned long  l[1];
} dw;

[Figure: the four arrays overlay the same 8 bytes: c[0] to c[7], s[0] to s[3], i[0] to i[1], and l[0]]


Byte Ordering on Alpha

Little Endian

[Figure: bytes f0 to f7 stored in c[0] to c[7]; in each of s[0..3], i[0..1], and l[0] the lowest-addressed byte is the LSB and the highest-addressed byte is the MSB]

Output on Alpha:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]

Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]

Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]

Long 0 == [0xf7f6f5f4f3f2f1f0]


Byte Ordering on x86

Little Endian

[Figure: bytes f0 to f7 stored in c[0] to c[7]; in each of s[0..3], i[0..1], and l[0] the lowest-addressed byte is the LSB and the highest-addressed byte is the MSB]

Output on Pentium:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]

Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]

Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]

Long 0 == [0xf3f2f1f0] (long is 32 bits on this machine, so l[0] covers only c[0] to c[3])


Byte Ordering on Sun

Big Endian

[Figure: bytes f0 to f7 stored in c[0] to c[7]; in each of s[0..3], i[0..1], and l[0] the lowest-addressed byte is the MSB and the highest-addressed byte is the LSB]

Output on Sun:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]

Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]

Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]

Long 0 == [0xf0f1f2f3]
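The three outputs above can be reproduced on any host by reinterpreting the same eight bytes under an explicit byte order. The sketch below uses Python's standard `struct` module rather than the C union from the slides; the function name `view` is invented here, and the 64-bit value corresponds to the Alpha's 8-byte long only (on the Pentium and Sun runs shown, long is 4 bytes).

```python
import struct

# The same eight bytes the slides store in c[0] .. c[7].
data = bytes(range(0xF0, 0xF8))


def view(raw, order):
    """Reinterpret 8 bytes as shorts, ints, and one 64-bit value.

    `order` is "<" for little endian or ">" for big endian.
    """
    return {
        "shorts": [hex(v) for v in struct.unpack(order + "4H", raw)],
        "ints": [hex(v) for v in struct.unpack(order + "2I", raw)],
        "long64": hex(struct.unpack(order + "Q", raw)[0]),
    }


little = view(data, "<")  # matches the Alpha and x86 outputs
big = view(data, ">")     # matches the Sun output
```

The byte array itself is identical in both cases; only the interpretation of multi-byte values changes.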


Addressing Modes

  • Immediate: Add R4, #3

    • Regs[R4] ← Regs[R4] + 3

  • Register: Add R4, R3

    • Regs[R4] ← Regs[R4] + Regs[R3]

  • Register Indirect: Add R4, (R1)

    • Regs[R4] ← Regs[R4] + Mem[Regs[R1]]


Addressing Modes (Cont.)

  • Direct: Add R4, (1001)

    • Regs[R4] ← Regs[R4] + Mem[1001]

  • Memory Indirect: Add R4, @(R3)

    • Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]]


Addressing Modes (Cont.)

  • Displacement: Add R4, 100(R1)

    • Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]

  • Scaled: Add R1, 100(R2)[R3]

    • Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] * d], where d is the operand size in bytes
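The addressing modes listed on these slides differ only in how the source operand is located. A minimal sketch of that lookup follows; the function signature, the mode names as strings, and the register and memory contents are all made up for illustration.

```python
def fetch_operand(mode, regs, mem, reg=None, const=0, index=None, d=1):
    """Return the source operand selected by an addressing mode."""
    if mode == "immediate":          # Add R4, #3
        return const
    if mode == "register":           # Add R4, R3
        return regs[reg]
    if mode == "register_indirect":  # Add R4, (R1)
        return mem[regs[reg]]
    if mode == "direct":             # Add R4, (1001)
        return mem[const]
    if mode == "memory_indirect":    # Add R4, @(R3)
        return mem[mem[regs[reg]]]
    if mode == "displacement":       # Add R4, 100(R1)
        return mem[const + regs[reg]]
    if mode == "scaled":             # Add R1, 100(R2)[R3]
        return mem[const + regs[reg] + regs[index] * d]
    raise ValueError(f"unknown mode {mode}")


# Hypothetical machine state.
regs = {"R1": 1000, "R2": 2000, "R3": 3}
mem = {1000: 7, 7: 42, 1001: 9, 1100: 11, 2112: 13}

displacement_value = fetch_operand("displacement", regs, mem, reg="R1", const=100)
scaled_value = fetch_operand("scaled", regs, mem, reg="R2", const=100, index="R3", d=4)
```

Each extra level of indirection or address arithmetic adds work to the access, which is one reason the usage measurements on the following slides matter.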


Typical Address Modes (I)


Typical Address Modes (II)


Use of Memory Addressing Mode (Figure 2.7)

Measured on a VAX, which supports all of these modes

Register mode, which accounts for about 50% of all operand references, is not counted


Displacement Address Size

  • Average of 5 programs from SPECint92 and SPECfp92.

    • 1% of addresses > 16 bits.

[Figure: chart with series Integer Average and FP Average]


Immediate Addressing Mode

  • 10 programs from SPECint92 and SPECfp92

  • 50% to 60% fit within 8 bits

  • 75% to 80% fit within 16 bits

[Figure: chart with series gcc, spice, and TeX]


Short Summary – Memory Addressing

  • Need to support at least three addressing modes

    • Displacement, immediate, and register deferred (+ REGISTER)

    • They represent 75% to 99% of the addressing modes in benchmarks

  • The displacement field should be at least 12 to 16 bits (covers 75% to 99% of displacements)

  • The immediate field should be at least 8 to 16 bits (covers 50% to 80% of immediates)
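Coverage figures like these come from asking, for each constant a program uses, whether it fits in a field of a given width. The sketch below shows that measurement for a signed field; the sample values are hypothetical and are not taken from the SPEC measurements.

```python
def fits_signed(value, bits):
    """True if `value` is representable in a `bits`-wide two's complement field."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def coverage(values, bits):
    """Fraction of `values` that fit in a signed field of `bits` bits."""
    return sum(fits_signed(v, bits) for v in values) / len(values)


# Hypothetical immediates gathered from some program.
sample = [3, -100, 200, 40000]
coverage_8 = coverage(sample, 8)
coverage_16 = coverage(sample, 16)
```

Widening the field raises coverage but costs instruction bits, which is the trade-off the summary is quantifying.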


Operand Type & Size

Typical types (assuming a 32-bit word):

  • Character: byte, ASCII or EBCDIC (IBM), 4 per word

  • Short integer: 2 bytes, 2’s complement

  • Integer: one word, 2’s complement

  • Float: one word, usually IEEE 754 these days

  • Double precision float: 2 words, IEEE 754

  • BCD or packed decimal: 4-bit values packed 8 per word


Data Access Patterns


Short Summary – Type and Size of Operand

  • The future: as we move to 64-bit machines

  • Larger offsets, immediates, etc. are likely

  • Usage of 64 and 128 bit values will increase

  • DSPs need wider accumulating registers than the size in memory to aid accuracy in fixed-point arithmetic


ALU Operations


What Operations are Needed

  • Arithmetic + Logical

    • Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT

    • Logical operation: AND, OR, XOR, NOT

  • Data Transfer - copy, load, store

  • Control - branch, jump, call, return, trap

  • System - OS and memory management

    • We’ll ignore these for now - but remember they are needed

  • Floating Point

    • Same as arithmetic, but usually takes bigger operands

  • Decimal

  • String - move, compare, search

  • Graphics – pixel and vertex, compression/decompression operations


Top 10 Instructions for 80x86

  • load: 22%

  • conditional branch: 20%

  • compare: 16%

  • store: 12%

  • add: 8%

  • and: 6%

  • sub: 5%

  • move register-register: 4%

  • call: 1%

  • return: 1%

The most widely executed instructions are the simple operations of an instruction set

The top 10 instructions for 80x86 account for 96% of instructions executed

Make them fast, as they are the common case


Control Instructions are a Big Deal

  • Jumps - unconditional transfer

  • Conditional Branches

    • How is condition code set? – by flag or part of the instruction

    • How is target specified? How far away is it?

  • Calls

    • How is target specified? How far away is it?

    • Where is return address kept?

    • How are the arguments passed? Callee vs. Caller save!

  • Returns

    • Where is the return address? How far away is it?

    • How are the results passed?


Breakdown of Control Flows

  • Call/Returns

    • Integer: 19%, FP: 8%

  • Jump

    • Integer: 6%, FP: 10%

  • Conditional Branch

    • Integer: 75%, FP: 82%


Branch Address Specification

  • Known at compile time for unconditional and conditional branches - hence specified in the instruction

    • As a register containing the target address

    • As a PC-relative offset

  • Consider word length addresses, registers, and instructions

    • Full address desired? Then pick the register option.

      • BUT setting up the register and computing the effective address take longer.

    • If a smaller offset suffices, PC-relative addressing works

      • PC-relative code is also position independent, which simplifies the linker's job
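Position independence follows directly from how a PC-relative target is computed. The sketch below assumes a 16-bit signed offset counted in 4-byte instructions relative to the instruction after the branch; those conventions are illustrative choices, since the slide does not fix them.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(pc, offset_field, bits=16, instr_bytes=4):
    """Target = address of next instruction + signed offset in instructions."""
    return pc + instr_bytes + sign_extend(offset_field, bits) * instr_bytes


# The same encoded offset works wherever the code is loaded.
forward_low = branch_target(0x1000, 3)
forward_high = branch_target(0x5000, 3)
backward = branch_target(0x1000, 0xFFFF)  # offset field encodes -1
```

Because only the distance is encoded, relocating the code changes the branch address and the target by the same amount and the instruction bits stay unchanged.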


Returns and Indirect Jumps

  • Branch target is not known at compile time

  • Need a way to specify the target dynamically

    • Use a register

    • Permit any addressing mode

    • Regs[R4] ← Regs[R4] + Mem[Regs[R1]]

  • Also useful for

    • case or switch

    • Dynamically shared libraries

    • Higher-order functions or function pointers


Branch Stats - 90% are PC Relative

  • Call/Return

    • TeX = 16%, Spice = 13%, GCC = 10%

  • Jump

    • TeX = 18%, Spice = 12%, GCC = 12%

  • Conditional

    • TeX = 66%, Spice = 75%, GCC = 78%


Branch Distances


Condition Testing Options

PSW: Program Status Word


What Kinds of Compares Do Branches Use?

A large fraction of comparisons are with zero


Direction, Frequency, and Real Change

Key points:

  • 75% are forward branches

  • Most backward branches are loops, taken about 90% of the time

  • Branch statistics are both compiler and application dependent

  • Any loop optimization may have a large effect


Short Summary – Operations in the Instruction Set

  • Branch addressing should be able to jump about 100+ instructions either above or below the branch

    • Implies a PC-relative branch displacement of at least 8 bits

  • Register-indirect and PC-relative addressing for jump instructions, to support returns as well as many other features of current systems (dynamic allocations)
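The step from "about 100 instructions in either direction" to "at least 8 bits" is a small calculation. A sketch, assuming the displacement is a two's complement count of instructions:

```python
def signed_bits_needed(distance):
    """Smallest two's complement width that can encode both +distance and -distance."""
    bits = 1
    # The largest positive value in `bits` bits is 2**(bits - 1) - 1.
    while distance > (1 << (bits - 1)) - 1:
        bits += 1
    return bits


bits_for_100 = signed_bits_needed(100)
```

Seven bits reach only 63 instructions forward, so 8 bits (reaching 127 forward and 128 backward) is the minimum for a distance of 100.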


Encoding an Instruction Set


Encoding the ISA

  • Encode instructions into a binary representation for execution by CPU

  • Can pick anything but:

    • Affects the size of code - so it should be tight

    • Affects the CPU design - in particular the instruction decode

  • So it may have a big influence on the CPI or cycle-time

  • Must balance several competing forces

    • Desire for lots of addressing modes and registers

    • Desire to make average program size compact

    • Desire to have instructions encoded into lengths that will be easy to handle in a pipelined implementation (multiple of bytes)


3 Popular Encoding Choices

  • Variable (compact code but difficult to decode)

    • Primary opcode is fixed in size, but opcode modifiers may exist

    • Opcode specifies number of arguments - each used as address fields

    • Best when there are many addressing modes and operations

    • Use as few bits as possible, but individual instructions can vary widely in length

    • e.g., VAX: integer ADD versions vary between 3 and 19 bytes

  • Fixed (easy to decode, but lengthy code)

    • Every instruction looks the same - some field may be interpreted differently

    • Combine the operation and the addressing mode into the opcode

    • e. g. all modern RISC machines

  • Hybrid

    • Set of fixed formats

    • e. g. IBM 360 and Intel 80x86

Trade-off between size of program vs. ease of decoding


3 Popular Encoding Choices (Cont.)


An Example of Variable Encoding -- VAX

  • addl3 r1, 737(r2), (r3): a 32-bit integer add instruction with 3 operands → needs 6 bytes to represent it

    • Opcode for addl3: 1 byte

    • A VAX address specifier is 1 byte (4-bits: addressing mode, 4-bits: register)

      • r1: 1 byte (register addressing mode + r1)

      • 737(r2)

        • 1 byte for address specifier (displacement addressing + r2)

        • 2 bytes for displacement 737

      • (r3): 1 byte for address specifier (register indirect + r3)

  • Length of VAX instructions: 1 to 53 bytes
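The 6-byte total for addl3 is just the sum of the pieces listed above. The sketch below adds them up; the function name and the list-of-displacement-sizes representation are invented for illustration and do not model the full VAX encoding rules.

```python
def vax_instruction_length(opcode_bytes, displacement_bytes_per_operand):
    """Opcode bytes plus, per operand, 1 specifier byte and its displacement bytes."""
    return opcode_bytes + sum(1 + extra for extra in displacement_bytes_per_operand)


# addl3 r1, 737(r2), (r3):
#   r1      -> specifier only
#   737(r2) -> specifier + 2-byte displacement
#   (r3)    -> specifier only
addl3_length = vax_instruction_length(1, [0, 2, 0])
```

Operands with larger displacements or immediates add more bytes, which is how instruction lengths spread across such a wide range.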


Short Summary – Encoding the Instruction Set

  • Choice between variable and fixed instruction encoding

    • Code size matters more than performance → variable encoding

    • Performance matters more than code size → fixed encoding


Role of Compilers


  • Critical goals in ISA from the compiler viewpoint

    • What features will lead to high-quality code

    • What makes it easy to write efficient compilers for an architecture


Compiler and ISA

  • ISA decisions are no longer made to ease assembly language (AL) programming

  • Because of high-level languages (HLLs), the ISA is a compiler target today

  • The performance of a computer is significantly affected by the compiler

  • Understanding compiler technology today is critical to designing and efficiently implementing an instruction set

  • Architecture choice affects the code quality and the complexity of building a compiler for it


Goal of the Compiler

  • Primary goal is correctness

  • Second goal is speed of the object code

  • Others:

    • Speed of the compilation

    • Ease of providing debug support

    • Inter-operability among languages

    • Flexibility of the implementation: languages may not change much, but they do evolve (e.g., Fortran 66 → HPF)

Make the frequent cases fast and the rare case correct


Optimization Observations

  • Hard to reduce branches

  • Biggest reduction is often memory references

  • Some ALU operation reduction happens but it is usually a few %

  • Implication:

    • Branch, Call, and Return become a larger relative % of the instruction mix

    • Control instructions among the hardest to speed up


How can Architects Help Compiler Writers

  • Provide Regularity

    • Address modes, operations, and data types should be orthogonal (independent) of each other

      • Simplifies code generation, especially for multi-pass compilers

      • Counterexample: restricting which registers can be used for certain classes of instructions

  • Provide primitives - not solutions

    • Special features that match an HLL construct are often unusable

    • What works in one language may be detrimental to others


How can Architects Help Compiler Writers (Cont.)

  • Simplify trade-offs among alternatives

    • How to write good code? What is good code?

      • Metric: IC or code size used to be enough, but no longer is, because of caches and pipelining

    • Anything that makes code sequence performance obvious is a definite win!

      • How many times a variable should be referenced before it is cheaper to load it into a register

  • Provide instructions that bind the quantities known at compile time as constants

    • Don’t hide compile time constants

      • Instructions that work on something the compiler thinks could be a run-time determined value handcuff the optimizer


Short Summary -- Compilers

  • ISA has at least 16 GPR (not counting FP registers) to simplify allocation of registers using graph coloring

  • Orthogonality suggests all supported addressing modes apply to all instructions that transfer data

  • Simplicity – understand that less is more in ISA design

    • Provide primitives instead of solutions

    • Simplify trade-offs between alternatives

    • Don’t bind constants at runtime

  • Counterexample – Lack of compiler support for multimedia instructions


The MIPS Architecture


Expectations for New ISA

  • Use general-purpose registers, with a load-store architecture

  • Support displacement (offset size 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect addressing

  • Support 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers

  • Support the following simple instructions: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, return

  • Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size

  • Provide at least 16 general-purpose registers (GPRs) plus separate floating-point registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist instruction set


MIPS

  • Simple load-store ISA

  • Enable efficient pipeline implementation

  • Fixed instruction set encoding

  • Efficiency as a compiler target

  • MIPS64 variant is discussed here


Registers for MIPS

  • 32 64-bit integer GPRs: R0, R1, ..., R31; R0 is always 0

  • 32 FPR’s - used for single or double precision

    • For single precision: F0, F1, ... , F31 (32-bit)

    • For double precision: F0, F2, ... , F30 (64-bit)

  • Extra status registers - moves via GPR’s

  • Instructions for moving between an FPR and a GPR


Data Types for MIPS

  • 8-bit byte, 16-bit half words, 32-bit word, and 64-bit double words for integer data

  • 32-bit single precision and 64-bit double precision for FP

  • MIPS64 operations work on 64-bit integer and 32- or 64-bit floating point

    • Bytes, half words, and words are loaded into the GPRs with zeros or the sign bit replicated to fill the 64 bits of the GPRs

  • All references between memory and either GPRs or FPRs are through loads or stores
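The two ways of filling a 64-bit GPR from a narrower load, zeros or replicated sign bit, can be sketched as follows. The function names are invented here, and register contents are modeled as non-negative Python integers below 2**64.

```python
MASK64 = (1 << 64) - 1


def zero_extend(value, bits):
    """Keep the low `bits` bits; the upper bits of the 64-bit register are zero."""
    return value & ((1 << bits) - 1)


def sign_extend_to_64(value, bits):
    """Replicate bit `bits - 1` of `value` into the upper bits of a 64-bit register."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits   # negative in two's complement
    return value & MASK64


byte_unsigned = zero_extend(0x80, 8)
byte_signed = sign_extend_to_64(0x80, 8)
```

The same byte 0x80 ends up as 128 or as -128 (all upper bits set) depending on which kind of load is used.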


Addressing Modes for MIPS

  • Data addressing: immediate and displacement (16 bits)

    • Displacement: Add R4, 100(R1) (Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]])

    • Register-indirect: placing 0 in the displacement field

      • Add R4, (R1) (Regs[R4] ← Regs[R4] + Mem[Regs[R1]])

    • Absolute addressing (16 bits): using R0 as the base register

      • Add R4, (1001) (Regs[R4] ← Regs[R4] + Mem[1001])

  • Byte addressable with 64-bit address

    • Mode selection for Big Endian or Little Endian


MIPS Instruction Format

  • Encode addressing mode into the opcode

  • All instructions are 32 bits with 6-bit primary opcode


MIPS Instruction Format (Cont.)

I-Type Instruction

Fields: opcode (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)

  • Loads and stores: LW R1, 30(R2), S.S F0, 40(R4)

  • ALU ops on immediates: DADDIU R1, R2, #3

    • rt ← rs op immediate

  • Conditional branches: BEQZ R3, offset

    • rs is the register checked

    • rt unused

    • immediate specifies the offset

  • Jump register, jump and link register: JR R3

    • rs is target register

    • rt and immediate are unused but = 011
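Packing the four I-type fields into one 32-bit word is a matter of shifting and masking. The sketch below encodes LW R1, 30(R2); the helper names are invented, and the opcode value 0x23 for LW is the standard MIPS load-word opcode as far as I know, but it is not given on the slide, so treat it as illustrative.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack opcode(6) | rs(5) | rt(5) | immediate(16) into a 32-bit word."""
    return (
        (opcode & 0x3F) << 26
        | (rs & 0x1F) << 21
        | (rt & 0x1F) << 16
        | (immediate & 0xFFFF)
    )


def decode_i_type(word):
    """Split a 32-bit word back into (opcode, rs, rt, immediate)."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF


# LW R1, 30(R2): rs is the base register R2, rt is the destination R1.
lw_word = encode_i_type(0x23, rs=2, rt=1, immediate=30)
```

Because every field sits at a fixed bit position, the decoder can extract the register numbers before it has finished interpreting the opcode.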


MIPS Instruction Format (Cont.)

R-Type Instruction

Fields: opcode (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)

  • Register-register ALU operations: rd ← rs funct rt, e.g. DADDU R1, R2, R3

    • Function encodes the data path operations: Add, Sub...

  • Read/write special registers

  • Moves

J-Type Instruction: Jump, Jump and Link, Trap and return from exception

Fields: opcode (6 bits) | offset added to PC (26 bits)
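The R-type and J-type layouts pack into 32 bits in the same way as the I-type. In the sketch below the helper names are invented, and the specific numbers (opcode 0 with funct 0x2D for DADDU, opcode 2 for a jump) are my recollection of the MIPS encoding rather than values from the slides, so treat them as assumptions.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (
        (opcode & 0x3F) << 26
        | (rs & 0x1F) << 21
        | (rt & 0x1F) << 16
        | (rd & 0x1F) << 11
        | (shamt & 0x1F) << 6
        | (funct & 0x3F)
    )


def encode_j_type(opcode, offset):
    """Pack opcode(6) | offset(26)."""
    return (opcode & 0x3F) << 26 | (offset & 0x3FFFFFF)


# DADDU R1, R2, R3: rd = R1, rs = R2, rt = R3 (funct value assumed).
daddu_word = encode_r_type(rs=2, rt=3, rd=1, shamt=0, funct=0x2D)
jump_word = encode_j_type(2, 0x100)  # opcode value assumed
```

All three formats share the 6-bit opcode position, and the R and I formats share the rs and rt positions, which keeps decoding simple.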


MIPS Instruction Mix

SPECint2000


MIPS Instruction Mix (Cont.)

SPECfp2000