
# Computer Architecture - PowerPoint PPT Presentation

Princess Sumaya University for Technology. Computer Architecture. Dr. Esam Al_Qaralleh. Instruction Set Architecture (ISA). Outline. Introduction Classifying instruction set architectures Instruction set measurements Memory addressing Addressing modes for signal processing


## PowerPoint Slideshow about 'Computer Architecture' - joshua-hebert


### Computer Architecture

Dr. Esam Al_Qaralleh

• Introduction

• Classifying instruction set architectures

• Instruction set measurements

• Addressing modes for signal processing

• Type and size of operands

• Operations in the instruction set

• Operations for media and signal processing

• Instructions for control flow

• Encoding an instruction set

• MIPS architecture

### Instruction Set Principles and Examples

• What operations, and how many?

• Load/store/increment/branch are sufficient to do any computation, but not practical (programs become too long).

• How (many) operands are specified?

• Most operations are dyadic (e.g., A ← B + C); some are monadic (e.g., A ← B).

• How to encode them into an instruction format?

• Instructions should be multiples of bytes.

• Typical Instruction Set

• 32-bit word

• Basic operand addresses are 32 bits long.

• Basic operands (like integers) are 32 bits long.

• In general, an instruction can refer to 3 operands (A ← B + C).

• Challenge: Encode operations in a small number of bits.

• Example format: opcode | rs | rt | immediate

Brief Introduction to ISA

• Instruction Set Architecture: a set of instructions

• Each instruction is directly executed by the CPU’s hardware

• How is it represented?

• By a binary format since the hardware understands only bits

• Options - fixed or variable length formats

• Fixed - each instruction encoded in same size field (typically 1 word)

• Variable – half-word, whole-word, multiple word instructions are possible

• Instruction Format (encoding)

• How is it decoded?

• Location of operands and result

• Where other than memory?

• How many explicit operands?

• How are memory operands located?

• Data type and Size

• Operations

• What are supported?

• Commands (opcodes)

• 1: Load AC from memory

• 2: Store AC to memory

• 5: Add to AC from memory

• Example: add the contents of memory location 940 to the contents of memory location 941 and store the result at 941

[Figure: the fetch and execution steps of each instruction in the example]

### Classifying Instruction Set Architecture

The instruction set influences everything

• Usually a simple operation

• Which operation is identified by the op-code field

• But operations require operands - 0, 1, or 2

• To identify where they are, they must be addressed

• Address is to some piece of storage

• Typical storage possibilities are main memory, registers, or a stack

• Two options: explicit or implicit addressing

• Implicit - the op-code implies the address of the operands

• ADD on a stack machine - pops the top 2 elements of the stack, then pushes the result

• HP calculators work this way

• Explicit - the address is specified in some field of the instruction

• Note the potential for 3 addresses - 2 operands + the destination

Based on CPU internal storage options AND # of operands

These choices critically affect - #instructions, CPI, and cycle time

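Code sequence for C = A + B

• Stack: Push A; Push B; Add (pop the top two values of the stack, A and B, and push the result onto the stack); Pop C

• Accumulator (AC): Load A; Add B (add B to AC, which holds A, and store the result into AC); Store C

• Register (register-memory): Load R1, A; Add R3, R1, B; Store R3, C

• Register (load-store): Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C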

• Reasons for choosing GPR (general-purpose registers) architecture

• Registers (stacks and accumulators…) are faster than memory

• Registers are easier and more effective for a compiler to use

• (A+B) – (C*D) – (E*F)

• May be evaluated in any order (for pipelining concerns or …)

• But on a stack machine, it must be evaluated left to right

• Registers can be used to hold variables

• Reduce memory traffic

• Speed up programs

• Improve code density (fewer bits are used to name a register)

• Compiler writers prefer that all registers be equivalent and unreserved

• The number of GPR: at least 16

• # of operands

• Three-operand: 1 result and 2 source operands

• Two-operand: one operand is both a source and the result, plus one source operand

• How many operands are memory addresses

• 0 – 3 (two sources + 1 result)

Each class below is labeled (number of memory addresses, maximum number of operands):

Register-Register: (0,3)

+ Simple, fixed length instruction encoding.

+ Simple code-generation model.

+ Similar number of clocks to execute.

- Higher instruction count.

Memory-memory: (3,3)

+ Most compact.

- Different Instruction size.

- Memory access bottleneck.

Register-Memory: (1,2)

+ Easy to encode and yield good density.

- One operand is destroyed.

- Limited number of registers.

• What is accessed - byte, word, multiple words?

• Today’s machines are byte addressable

• Main memory is organized in 32 to 64 byte lines

• Hence there is a natural alignment problem

• An access to an object of size s bytes at byte address A is aligned if A mod s = 0

• Misaligned access takes multiple aligned memory references

• Memory addressing mode influences instruction counts (IC) and clock cycles per instruction (CPI)

• Idea

• Bytes in long word numbered 0 to 3

• Which is most (least) significant?

• Can cause problems when exchanging binary data between machines

• Big Endian: Byte 0 is most, 3 is least

• IBM 360/370, Motorola 68K, SPARC.

• Little Endian: Byte 0 is least, 3 is most

• Intel x86, VAX

• Alpha

• Chip can be configured to operate either way

• DEC workstations are little endian

• Cray T3E Alphas are big endian

Byte Ordering Example

    union {
        unsigned char  c[8];
        unsigned short s[4];
        unsigned int   i[2];
        unsigned long  l[1];
    } dw;

[Figure: the members of dw overlay the same storage, starting at the same address: c[0] to c[7], s[0] to s[3], i[0] to i[1], and l[0]]

Little Endian (Alpha)

• c[0] to c[7] hold f0 to f7, in address order

• In each s[k], each i[k], and l[0], the byte at the lowest address is the LSB and the byte at the highest address is the MSB

Output on Alpha:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]

Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]

Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]

Long 0 == [0xf7f6f5f4f3f2f1f0]

Little Endian (Pentium)

• c[0] to c[7] hold f0 to f7, in address order

• In each s[k], each i[k], and l[0], the byte at the lowest address is the LSB and the byte at the highest address is the MSB

Output on Pentium:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]

Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]

Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]

Long 0 == [0xf3f2f1f0]

Big Endian (Sun)

• c[0] to c[7] hold f0 to f7, in address order

• In each s[k], each i[k], and l[0], the byte at the lowest address is the MSB and the byte at the highest address is the LSB

Output on Sun:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]

Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]

Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]

Long 0 == [0xf0f1f2f3]

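Addressing Modes

• Immediate: Regs[R4] ← Regs[R4] + 3 (the operand, 3, is in the instruction)

• Register: Regs[R4] ← Regs[R4] + Regs[R3] (the operand is in register R3)

• Register Indirect: Regs[R4] ← Regs[R4] + Mem[Regs[R1]] (R1 holds the memory address of the operand)

• Direct: Regs[R4] ← Regs[R4] + Mem[1001] (the instruction holds the memory address, 1001)

• Memory Indirect: Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]] (R3 points to a memory word that holds the address of the operand)

• Displacement: Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]] (address = constant 100 plus the contents of R1)

• Scaled: Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3]*d] (the contents of R3 are scaled by d)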

Use of Memory Addressing Mode (Figure 2.7)

Based on a VAX, which supported every addressing mode

Not counting Register mode (50% of all)

Displacement size

• Average of 5 programs from SPECint92 and SPECfp92.

• 1% of addresses > 16 bits.

[Figure legend: Integer Average, FP Average]

Immediate size

• 10 programs from SPECint92 and SPECfp92

• 50% to 60% fit within 8 bits

• 75% to 80% fit within 16 bits

[Figure legend: gcc, spice, TeX]

• Need to support at least three addressing modes

• Displacement, immediate, and register deferred (+ REGISTER)

• They represent 75% to 99% of the addressing modes in benchmarks

• The address size for displacement mode should be at least 12 to 16 bits (captures 75% to 99% of displacements)

• The immediate field should be at least 8 to 16 bits (captures 50% to 80% of immediates)

Typical types: assume word= 32 bits

• Character - byte - ASCII or EBCDIC (IBM) - 4 per word

• Short integer - 2- bytes, 2’s complement

• Integer - one word - 2’s complement

• Float - one word - usually IEEE 754 these days

• Double precision float - 2 words - IEEE 754

• BCD or packed decimal - 4- bit values packed 8 per word

• The future - as we go to 64 bit machines

• Larger offsets, immediates, etc. are likely

• Usage of 64 and 128 bit values will increase

• DSPs need wider accumulating registers than the size in memory to aid accuracy in fixed-point arithmetic

### ALU Operations

• Arithmetic + Logical

• Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT

• Logical operation: AND, OR, XOR, NOT

• Data Transfer - copy, load, store

• Control - branch, jump, call, return, trap

• System - OS and memory management

• We’ll ignore these for now - but remember they are needed

• Floating Point

• Same as arithmetic but usually take bigger operands

• Decimal

• String - move, compare, search

• Graphics – pixel and vertex, compression/decompression operations

Top 10 Instructions for 80x86

• conditional branch: 20%

• compare: 16%

• store: 12%

• and: 6%

• sub: 5%

• move register-register: 4%

• call: 1%

• return: 1%

The most widely executed instructions are the simple operations of an instruction set

The top-10 instructions for 80x86 account for 96% of instructions executed

Make them fast, as they are the common case

• Jumps - unconditional transfer

• Conditional Branches

• How is condition code set? – by flag or part of the instruction

• How is target specified? How far away is it?

• Calls

• How is target specified? How far away is it?

• Where is return address kept?

• How are the arguments passed? Callee vs. Caller save!

• Returns

• Where is the return address? How far away is it?

• How are the results passed?

• Call/Returns

• Integer: 19% FP: 8%

• Jump

• Integer: 6% FP: 10%

• Conditional Branch

• Integer: 75% FP: 82%

• Branch target is known at compile time for unconditional and conditional branches, hence it is specified in the instruction

• As a register containing the target address

• As a PC-relative offset

• Consider word length addresses, registers, and instructions

• Full address desired? Then pick the register option.

• BUT - setup and effective address will take longer.

• If you can deal with smaller offset then PC relative works

• PC-relative code is also position independent, which reduces the linker's work

• Branch target is not known at compile time

• Need a way to specify the target dynamically

• Use a register

• Regs[R4] ← Regs[R4] + Mem[Regs[R1]]

• Also useful for

• case or switch

• Dynamically shared libraries

• Higher-order functions or function pointers

• Call/Return

• TeX = 16%, Spice = 13%, GCC = 10%

• Jump

• TeX = 18%, Spice = 12%, GCC = 12%

• Conditional

• TeX = 66%, Spice = 75%, GCC = 78%

PSW: Program Status Word

A large fraction of comparisons are with zero

Key points: 75% are forward branches

• Most backward branches are loops, taken about 90% of the time

• Branch statistics are both compiler and application dependent

• Any loop optimizations may have large effect

• Imply a PC-relative branch displacement of at least 8 bits

• Provide register-indirect and PC-relative addressing for jump instructions, to support returns as well as many other features of current systems (dynamic allocations)

### Encoding an Instruction Set

• Encode instructions into a binary representation for execution by CPU

• Can pick anything but:

• Affects the size of code - so it should be tight

• Affects the CPU design - in particular the instruction decode

• So it may have a big influence on the CPI or cycle-time

• Must balance several competing forces

• Desire for lots of addressing modes and registers

• Desire to make average program size compact

• Desire to have instructions encoded into lengths that will be easy to handle in a pipelined implementation (multiple of bytes)

3 Popular Encoding Choices

• Variable (compact code but difficult to decode)

• Primary opcode is fixed in size, but opcode modifiers may exist

• Opcode specifies number of arguments - each used as address fields

• Best when there are many addressing modes and operations

• Use as few bits as possible, but individual instructions can vary widely in length

• e.g., VAX: integer ADD versions vary between 3 and 19 bytes

• Fixed (easy to decode, but lengthy code)

• Every instruction looks the same - some field may be interpreted differently

• Combine the operation and the addressing mode into the opcode

• e.g., all modern RISC machines

• Hybrid

• Set of fixed formats

• e.g., IBM 360 and Intel 80x86

Trade-off between size of program vs. ease of decoding

3 Popular Encoding Choices (Cont.)

• addl3 r1, 737(r2), (r3): 32-bit integer add instruction with 3 operands → needs 6 bytes to represent it

• Opcode for addl3: 1 byte

• A VAX address specifier is 1 byte (4-bits: addressing mode, 4-bits: register)

• r1: 1 byte (register addressing mode + r1)

• 737(r2): 3 bytes

• 1 byte for the address specifier (displacement + r2)

• 2 bytes for displacement 737

• (r3): 1 byte for address specifier (register indirect + r3)

• Length of VAX instructions: 1 to 53 bytes

• Choice between variable and fixed instruction encoding

• Code size matters more than performance → variable encoding

• Performance matters more than code size → fixed encoding

### Role of Compilers

• ISA decisions are no longer made to ease assembly language (AL) programming

• Because of high-level languages (HLLs), the ISA is a compiler target today

• Performance of a computer will be significantly affected by compiler

• Understanding compiler technology today is critical to designing and efficiently implementing an instruction set

• Architecture choice affects the code quality and the complexity of building a compiler for it

• Primary goal is correctness

• Second goal is speed of the object code

• Others:

• Speed of the compilation

• Ease of providing debug support

• Inter-operability among languages

• Flexibility of the implementation - languages may not change much but they do evolve - e. g. Fortran 66 ===> HPF

Make the frequent cases fast and the rare case correct

• Hard to reduce branches

• Biggest reduction is often memory references

• Some ALU operation reduction happens but it is usually a few %

• Implication:

• Branch, Call, and Return become a larger relative % of the instruction mix

• Control instructions among the hardest to speed up

• Provide Regularity

• Address modes, operations, and data types should be orthogonal (independent) of each other

• Simplify code generation especially multi-pass

• Counterexample: restricting which registers can be used for certain classes of instructions

• Provide primitives - not solutions

• Special features that match an HLL construct are often unusable

• What works in one language may be detrimental to others

• How to write good code? What is good code?

• Metric: IC or code size (no longer sufficient, given caches and pipelining…)

• Anything that makes code sequence performance obvious is a definite win!

• How many times a variable should be referenced before it is cheaper to load it into a register

• Provide instructions that bind the quantities known at compile time as constants

• Don’t hide compile time constants

• Instructions that work off something the compiler thinks could be a run-time determined value handcuff the optimizer

• ISA has at least 16 GPR (not counting FP registers) to simplify allocation of registers using graph coloring

• Orthogonality suggests all supported addressing modes apply to all instructions that transfer data

• Simplicity – understand that less is more in ISA design

• Provide primitives instead of solutions

• Don’t bind constants at runtime

• Counterexample – Lack of compiler support for multimedia instructions

### The MIPS Architecture

• Use general-purpose registers, with a load-store architecture

• Support displacement (offset size 12 to 16 bits), immediate (size 8 to 16 bits), and register indirect

• Support 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers

• Support the following simple instructions: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, return

• Use fixed instruction encoding if interested in performance and use variable instruction encoding if interested in code size

• Provide at least 16 general-purpose registers (GPRs) + separate floating-point registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist instruction set

• Enable efficient pipeline implementation

• Fixed instruction set encoding

• Efficiency as a compiler target

• MIPS64 variant is discussed here

• 32 64-bit integer GPR’s - R0, R1, ... R31, R0= 0 always

• 32 FPR’s - used for single or double precision

• For single precision: F0, F1, ... , F31 (32-bit)

• For double precision: F0, F2, ... , F30 (64-bit)

• Extra status registers - moves via GPR’s

• Instructions for moving between an FPR and a GPR

• 8-bit byte, 16-bit half words, 32-bit word, and 64-bit double words for integer data

• 32-bit single precision and 64-bit double precision for FP

• MIPS64 operations work on 64-bit integer and 32- or 64-bit floating point

• Bytes, half words, and words are loaded into the GPRs with zeros or the sign bit replicated to fill the 64 bits of the GPRs

• All references between memory and either GPRs or FPRs are through load or stores

• Data addressing : immediate and displacement (16 bits)

• Displacement: Add R4, 100(R1) (Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]])

• Register-indirect: placing 0 in displacement field

• Absolute addressing (16 bits): using R0 as the base register

• Mode selection for Big Endian or Little Endian

• Encode addressing mode into the opcode

• All instructions are 32 bits with 6-bit primary opcode

MIPS Instruction Format (Cont.)

I-Type Instruction

opcode (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)

• Loads and Stores LW R1, 30(R2), S.S F0, 40(R4)

• ALU ops on immediates DADDIU R1, R2, #3

• rt <-- rs op immediate

• Conditional branches BEQZ R3, offset

• rs is the register checked

• rt unused

• immediate specifies the offset

• Jump register, jump and link register: JR R3

• rs is target register

• rt and immediate are unused but = 011

MIPS Instruction Format (Cont.)

R-Type Instruction

opcode (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)

• Register-register ALU operations: rd ← rs funct rt, e.g., DADDU R1, R2, R3

• Function encodes the data path operations: Add, Sub...

• Moves

J-Type Instruction: Jump, Jump and Link, Trap and return from exception

opcode (6 bits) | remaining 26 bits