Computer Architecture

Princess Sumaya University for Technology Computer Architecture Dr. Esam Al_Qaralleh

Instruction Set Architecture (ISA)

Outline • Introduction • Classifying instruction set architectures • Instruction set measurements • Memory addressing • Addressing modes for signal processing • Type and size of operands • Operations in the instruction set • Operations for media and signal processing • Instructions for control flow • Encoding an instruction set • MIPS architecture

Instruction Set Principles and Examples

Basic Issues in Instruction Set Design • Which operations, and how many? • Load, store, increment, and branch are sufficient for any computation, but not practical (programs become too long). • How are operands specified, and how many? • Most operations are dyadic (e.g., A ← B + C); some are monadic (e.g., A ← ~B). • How are they encoded into an instruction format? • Instruction lengths should be multiples of a byte. • Typical instruction set • 32-bit word • Basic operand addresses are 32 bits long. • Basic operands (such as integers) are 32 bits long. • In general, an instruction can refer to 3 operands (A ← B + C). • Challenge: encode operations in a small number of bits.

Brief Introduction to ISA • Instruction Set Architecture: a set of instructions • Each instruction is directly executed by the CPU's hardware • How is it represented? • By a binary format, since the hardware understands only bits • Example 32-bit format (field widths in bits): opcode (6) | rs (5) | rt (5) | immediate (16) • Options: fixed- or variable-length formats • Fixed: each instruction is encoded in the same size field (typically 1 word) • Variable: half-word, whole-word, and multiple-word instructions are possible
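The 6/5/5/16-bit field layout on this slide (opcode, rs, rt, immediate) can be sketched in C as a pack-and-unpack pair. This is only an illustration: the function names are hypothetical, and the field order (opcode in the most significant bits) is assumed to follow the usual MIPS immediate format.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the four fields into one 32-bit word:
   opcode (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits). */
uint32_t encode_itype(uint32_t opcode, uint32_t rs, uint32_t rt, uint16_t imm)
{
    assert(opcode < 64 && rs < 32 && rt < 32);   /* each field must fit its width */
    return (opcode << 26) | (rs << 21) | (rt << 16) | imm;
}

/* Field extractors: shift the field down, then mask it to its width. */
uint32_t field_opcode(uint32_t word) { return word >> 26; }
uint32_t field_rs(uint32_t word)     { return (word >> 21) & 0x1F; }
uint32_t field_rt(uint32_t word)     { return (word >> 16) & 0x1F; }
uint16_t field_imm(uint32_t word)    { return (uint16_t)(word & 0xFFFF); }
```

With a fixed format like this, decoding needs only shifts and masks at known bit positions, which is the main attraction of fixed-length encodings.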

What Must be Specified? • Instruction Format (encoding) • How is it decoded? • Location of operands and result • Where other than memory? • How many explicit operands? • How are memory operands located? • Data type and Size • Operations • What are supported?

Example of Program Execution • Opcodes • 1: Load AC from memory • 2: Store AC to memory • 5: Add to AC from memory • Task: add the contents of memory location 940 to the contents of memory location 941 and store the result at 941 • Each instruction passes through a fetch step and an execute step
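The fetch and execute steps of this example can be sketched as a tiny accumulator-machine simulator. The opcodes 1, 2, and 5 and the locations 940 and 941 come from the slide; everything else is an assumption made for illustration: a 16-bit instruction with a 4-bit opcode and a 12-bit address, hexadecimal addresses, the program placed at 0x300, and the initial data values 3 and 2.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 4096   /* 12-bit addresses (assumed) */

enum { OP_LOAD = 1, OP_STORE = 2, OP_ADD = 5 };   /* opcodes from the slide */

/* Run `count` instructions starting at `pc`; returns the final accumulator. */
uint16_t run(uint16_t mem[MEM_WORDS], uint16_t pc, int count)
{
    uint16_t ac = 0;
    for (int i = 0; i < count; i++) {
        uint16_t ir = mem[pc++];        /* fetch the instruction, advance PC */
        uint16_t opcode = ir >> 12;     /* decode: top 4 bits */
        uint16_t addr = ir & 0x0FFF;    /* decode: low 12 bits */
        switch (opcode) {               /* execute */
        case OP_LOAD:  ac = mem[addr]; break;
        case OP_STORE: mem[addr] = ac; break;
        case OP_ADD:   ac = (uint16_t)(ac + mem[addr]); break;
        default:       assert(!"unknown opcode"); break;
        }
    }
    return ac;
}

/* Load the three-instruction program and its data, run it,
   and return the value left in location 941. */
uint16_t demo(void)
{
    static uint16_t mem[MEM_WORDS];
    mem[0x300] = 0x1940;   /* Load AC from 940 */
    mem[0x301] = 0x5941;   /* Add 941 to AC */
    mem[0x302] = 0x2941;   /* Store AC to 941 */
    mem[0x940] = 3;        /* assumed data */
    mem[0x941] = 2;        /* assumed data */
    run(mem, 0x300, 3);
    return mem[0x941];
}
```

With the assumed data, location 941 ends up holding 5 after three fetch/execute cycles.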

Classifying Instruction Set Architecture

Instruction Set Design The instruction set influences everything

Instruction Characteristics • Usually a simple operation • Which operation is identified by the op-code field • But operations require operands: 0, 1, or 2 • To identify where they are, they must be addressed • An address refers to some piece of storage • Typical storage possibilities are main memory, registers, or a stack • Two options: explicit or implicit addressing • Implicit: the op-code implies the address of the operands • ADD on a stack machine pops the top 2 elements of the stack, then pushes the result • HP calculators work this way • Explicit: the address is specified in some field of the instruction • Note the potential for 3 addresses: 2 operands + the destination
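Implicit addressing on a stack machine can be illustrated with a small sketch. The ADD operation below takes no operand fields at all; the stack type and function names are hypothetical and exist only for this example.

```c
#include <assert.h>

#define STACK_MAX 16

typedef struct {
    int data[STACK_MAX];
    int top;               /* number of values currently on the stack */
} Stack;

void stack_push(Stack *s, int v)
{
    assert(s->top < STACK_MAX);
    s->data[s->top++] = v;
}

int stack_pop(Stack *s)
{
    assert(s->top > 0);
    return s->data[--s->top];
}

/* ADD names no operands: the op-code itself implies
   "pop the top two entries, push their sum". */
void stack_add(Stack *s)
{
    int b = stack_pop(s);
    int a = stack_pop(s);
    stack_push(s, a + b);
}

/* C = A + B written as: Push A; Push B; Add; Pop C. */
int add_on_stack(int a, int b)
{
    Stack s = { .top = 0 };
    stack_push(&s, a);
    stack_push(&s, b);
    stack_add(&s);
    return stack_pop(&s);
}
```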

Classifying Instruction Set Architectures • Based on CPU internal storage options and number of operands • These choices critically affect instruction count, CPI, and cycle time

Operand Locations for Four ISA Classes

Code Sequences for C = A + B
• Stack: Push A; Push B; Add; Pop C
  (Add pops the top two values of the stack, A and B, and pushes the result onto the stack)
• Accumulator (AC): Load A; Add B; Store C
  (Add adds B to AC, which holds A, and stores the result in AC)
• Register (register-memory): Load R1, A; Add R3, R1, B; Store R3, C
• Register (load-store): Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C

Modern Choice – Load-store Register (GPR) Architecture • Reasons for choosing a GPR (general-purpose register) architecture • Registers (like stacks and accumulators) are faster than memory • Registers are easier and more effective for a compiler to use • (A+B) - (C*D) - (E*F) • May be evaluated in any order (for pipelining or other concerns) • But a stack machine must evaluate it left to right • Registers can be used to hold variables • Reduces memory traffic • Speeds up programs • Improves code density (fewer bits are needed to name a register than a memory location) • Compiler writers prefer that all registers be equivalent and unreserved • Number of GPRs: at least 16

Characteristics That Divide GPR Architectures • Number of operands • Three-operand: 1 result and 2 source operands • Two-operand: 1 operand that is both source and result, and 1 source • How many operands are memory addresses • 0 to 3 (two sources + 1 result) • Resulting classes: load-store, register-memory, memory-memory

Pros and Cons of the Three Most Common GPR Computers
(m, n) means m memory operands out of n total operands
• Register-register (0, 3)
  + Simple, fixed-length instruction encoding
  + Simple code-generation model
  + Instructions take similar numbers of clocks to execute
  - Higher instruction count
• Register-memory (1, 2)
  + Data can be accessed without loading it first
  + Easy to encode, and yields good code density
  - One source operand is destroyed
  - Limited number of registers
• Memory-memory (3, 3)
  + Most compact
  - Instruction sizes differ
  - Memory access is a bottleneck

Memory Addressing

Memory Addressing Basics • All architectures must address memory • What is accessed: a byte, a word, multiple words? • Today's machines are byte addressable • Main memory is organized in 32- to 64-byte lines • Big-Endian or Little-Endian addressing • Hence there is a natural alignment problem • An access of size s bytes at byte address A is aligned if A mod s = 0 • A misaligned access takes multiple aligned memory references • The memory addressing mode influences instruction count (IC) and clock cycles per instruction (CPI)
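The alignment rule (A mod s = 0) and the cost of a misaligned access can be sketched as follows. The function names are hypothetical, and `width` (the size of one aligned memory reference) is an assumed parameter, not something the slide specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An access of `size` bytes at byte address `addr` is aligned
   if addr mod size == 0. */
bool is_aligned(uint64_t addr, uint64_t size)
{
    assert(size > 0);
    return addr % size == 0;
}

/* Number of aligned `width`-byte memory references needed to cover
   an access of `size` bytes starting at `addr`. */
unsigned aligned_refs(uint64_t addr, uint64_t size, uint64_t width)
{
    assert(size > 0 && width > 0);
    uint64_t first = addr / width;               /* first aligned block touched */
    uint64_t last  = (addr + size - 1) / width;  /* last aligned block touched */
    return (unsigned)(last - first + 1);
}
```

For example, a 4-byte access at address 6 covers bytes 6 to 9, so with 4-byte references it touches two aligned words instead of one.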

Byte Ordering • Idea • Bytes in a long word are numbered 0 to 3 • Which is most (least) significant? • Can cause problems when exchanging binary data between machines • Big Endian: byte 0 is most significant, byte 3 is least • IBM 360/370, Motorola 68K, SPARC • Little Endian: byte 0 is least significant, byte 3 is most • Intel x86, VAX • Alpha • Chip can be configured to operate either way • DEC workstations are little endian • Cray T3E Alphas are big endian

Byte Ordering Example
union {
    unsigned char c[8];
    unsigned short s[4];
    unsigned int i[2];
    unsigned long l[1];
} dw;
Layout: c[0] to c[7], s[0] to s[3], i[0] to i[1], and l[0] all overlay the same storage.

Byte Ordering on Alpha (Little Endian)
• Bytes c[0] to c[7] hold f0 f1 f2 f3 f4 f5 f6 f7
• In each of s[0..3], i[0..1], and l[0], the lowest-addressed byte is the LSB and the highest-addressed byte is the MSB
Print Output on Alpha:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]

Byte Ordering on x86 (Little Endian)
• Bytes c[0] to c[7] hold f0 f1 f2 f3 f4 f5 f6 f7
• In each of s[0..3], i[0..1], and l[0], the lowest-addressed byte is the LSB and the highest-addressed byte is the MSB
• Here long is 32 bits, so l[0] covers only the first four bytes
Print Output on Pentium:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0]

Byte Ordering on Sun (Big Endian)
• Bytes c[0] to c[7] hold f0 f1 f2 f3 f4 f5 f6 f7
• In each of s[0..3], i[0..1], and l[0], the lowest-addressed byte is the MSB and the highest-addressed byte is the LSB
Print Output on Sun:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]
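The three print outputs can be reproduced on any host with a sketch like the one below. It is a variant of the slides' union, not the original test program: it uses fixed-width types (an assumption, so that sizes do not change between platforms) and hypothetical function names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Fixed-width variant of the slides' union. */
union dw {
    uint8_t  c[8];
    uint16_t s[4];
    uint32_t i[2];
};

/* True if the least significant byte of a multi-byte value
   is stored at the lowest address. */
bool host_is_little_endian(void)
{
    uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* read the lowest-addressed byte */
    return first == 1;
}

/* Fill c[0..7] with 0xf0..0xf7 and return i[0] as the host interprets it. */
uint32_t first_int(void)
{
    union dw d;
    for (int k = 0; k < 8; k++)
        d.c[k] = (uint8_t)(0xf0 + k);
    return d.i[0];
}

/* Same fill, but return s[0]. */
uint16_t first_short(void)
{
    union dw d;
    for (int k = 0; k < 8; k++)
        d.c[k] = (uint8_t)(0xf0 + k);
    return d.s[0];
}
```

On a little-endian host `first_int()` matches the Alpha and x86 output for Ints (0xf3f2f1f0); on a big-endian host it matches the Sun output (0xf0f1f2f3).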

Addressing Modes
• Immediate: Add R4, #3
  Regs[R4] ← Regs[R4] + 3 (the operand, 3, is in the instruction)
• Register: Add R4, R3
  Regs[R4] ← Regs[R4] + Regs[R3] (the operand is in register R3)
• Register Indirect: Add R4, (R1)
  Regs[R4] ← Regs[R4] + Mem[Regs[R1]] (R1 holds the memory address of the operand)

Addressing Modes (Cont.)
• Direct: Add R4, (1001)
  Regs[R4] ← Regs[R4] + Mem[1001] (the operand's memory address is in the instruction)
• Memory Indirect: Add R4, @(R3)
  Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]] (R3 points to a memory word that holds the operand's address)

Addressing Modes (Cont.)
• Displacement: Add R4, 100(R1)
  Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
• Scaled: Add R1, 100(R2)[R3]
  Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] * d] (d is the size of a data element in bytes)
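The addressing modes on these slides differ only in how the source operand is located. A minimal sketch follows, with one hypothetical function per mode. The `Machine` type, the sizes, and the word-indexed memory are assumptions made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 8
#define MEMSZ 2048

/* Toy machine state. Memory is indexed by word to keep the sketch short
   (an assumption; real machines use byte addresses). */
typedef struct {
    int32_t regs[NREGS];
    int32_t mem[MEMSZ];
} Machine;

/* Bounds-checked memory read: Mem[addr] in the slides' notation. */
int32_t mem_at(const Machine *m, int32_t addr)
{
    assert(addr >= 0 && addr < MEMSZ);
    return m->mem[addr];
}

/* Each function returns the source operand selected by one mode. */
int32_t op_immediate(int32_t imm)                 { return imm; }
int32_t op_register(const Machine *m, int r)      { return m->regs[r]; }
int32_t op_reg_indirect(const Machine *m, int r)  { return mem_at(m, m->regs[r]); }
int32_t op_direct(const Machine *m, int32_t addr) { return mem_at(m, addr); }
int32_t op_mem_indirect(const Machine *m, int r)  { return mem_at(m, mem_at(m, m->regs[r])); }

int32_t op_displacement(const Machine *m, int32_t disp, int r)
{
    return mem_at(m, disp + m->regs[r]);
}

/* d is the size of one data element. */
int32_t op_scaled(const Machine *m, int32_t disp, int rb, int ri, int32_t d)
{
    return mem_at(m, disp + m->regs[rb] + m->regs[ri] * d);
}
```

Each extra level of `mem_at` in a function corresponds to one more memory reference the hardware must make before the operand is available.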

Typical Address Modes (I)

Typical Address Modes (II)

Use of Memory Addressing Modes (Figure 2.7) • Based on a VAX, which supported every addressing mode • Register mode (about 50% of all operand references) is not counted

Displacement Address Size • Average of 5 programs from SPECint92 and SPECfp92 • 1% of addresses need more than 16 bits • Figure series: integer average, FP average

Immediate Addressing Mode • 10 Programs from SPECInt92 and SPECfp92

Immediate Addressing Mode • 50% to 60% fit within 8 bits • 75% to 80% fit within 16 bits • Programs measured: gcc, spice, TeX

Short Summary – Memory Addressing • Need to support at least three addressing modes • Displacement, immediate, and register deferred (plus register) • They represent 75% to 99% of the addressing modes in benchmarks • The address field for displacement mode should be at least 12 to 16 bits (covers 75% to 99% of displacements) • The immediate field should be at least 8 to 16 bits (covers 50% to 80% of immediates)
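The field-size recommendations come down to one question: does a given displacement or immediate fit in a field of n bits? A minimal sketch, assuming the field holds a two's-complement signed value; `fits_signed` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `value` is representable as a two's-complement field
   that is `bits` bits wide. */
bool fits_signed(int64_t value, int bits)
{
    assert(bits >= 1 && bits <= 63);
    int64_t lo = -((int64_t)1 << (bits - 1));     /* most negative value */
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;  /* most positive value */
    return value >= lo && value <= hi;
}
```

A compiler back end could use a check like this to decide whether a constant can be encoded directly in the instruction or must first be built in a register.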

Operand Type & Size • Typical types (assume word = 32 bits) • Character: 1 byte, ASCII or EBCDIC (IBM), 4 per word • Short integer: 2 bytes, 2's complement • Integer: one word, 2's complement • Float: one word, usually IEEE 754 these days • Double-precision float: 2 words, IEEE 754 • BCD or packed decimal: 4-bit values packed 8 per word
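Packed decimal stores one decimal digit per 4-bit nibble, so a 32-bit word holds 8 digits. A minimal sketch of the conversion in both directions follows. The function names are hypothetical, and sign handling, which real packed-decimal formats include, is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a decimal number of up to 8 digits into a 32-bit word,
   one digit per 4-bit nibble, least significant digit in the low nibble. */
uint32_t to_packed_bcd(uint32_t n)
{
    assert(n <= 99999999u);
    uint32_t bcd = 0;
    for (int shift = 0; n != 0; shift += 4) {
        bcd |= (n % 10) << shift;   /* place the next decimal digit */
        n /= 10;
    }
    return bcd;
}

/* Inverse conversion: read the 8 nibbles from most to least significant. */
uint32_t from_packed_bcd(uint32_t bcd)
{
    uint32_t n = 0;
    for (int shift = 28; shift >= 0; shift -= 4) {
        uint32_t digit = (bcd >> shift) & 0xF;
        assert(digit <= 9);         /* nibbles above 9 are not valid digits */
        n = n * 10 + digit;
    }
    return n;
}
```

Written in hexadecimal, the packed word reads the same as the decimal number: 1234 becomes 0x00001234.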

Data Access Patterns

Short Summary – Type and Size of Operand • The future, as we go to 64-bit machines • Larger offsets, immediates, etc. are likely • Usage of 64- and 128-bit values will increase • DSPs need accumulating registers wider than the operand size in memory to aid accuracy in fixed-point arithmetic

ALU Operations

What Operations are Needed • Arithmetic + Logical • Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT • Logical operation: AND, OR, XOR, NOT • Data Transfer - copy, load, store • Control - branch, jump, call, return, trap • System - OS and memory management • We’ll ignore these for now - but remember they are needed • Floating Point • Same as arithmetic but usually take bigger operands • Decimal • String - move, compare, search • Graphics – pixel and vertex, compression/decompression operations

Top 10 Instructions for 80x86
• load: 22%
• conditional branch: 20%
• compare: 16%
• store: 12%
• add: 8%
• and: 6%
• sub: 5%
• move register-register: 4%
• call: 1%
• return: 1%
The most widely executed instructions are the simple operations of an instruction set. The top 10 instructions for the 80x86 account for 96% of instructions executed. Make them fast, as they are the common case.

Control Instructions are a Big Deal • Jumps - unconditional transfer • Conditional Branches • How is condition code set? – by flag or part of the instruction • How is target specified? How far away is it? • Calls • How is target specified? How far away is it? • Where is return address kept? • How are the arguments passed? Callee vs. Caller save! • Returns • Where is the return address? How far away is it? • How are the results passed?

Breakdown of Control Flows • Call/Returns • Integer: 19% FP: 8% • Jump • Integer: 6% FP: 10% • Conditional Branch • Integer: 75% FP: 82%

Branch Address Specification • The target is known at compile time for unconditional and conditional branches, hence it is specified in the instruction • As a register containing the target address • As a PC-relative offset • Consider word-length addresses, registers, and instructions • Full address desired? Then pick the register option • But setup and effective-address calculation will take longer • If a smaller offset is enough, PC-relative works • PC-relative is also position independent, which simplifies the linker's job
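PC-relative target calculation can be sketched as follows. The details are assumptions chosen to resemble a MIPS-style machine rather than anything stated on the slide: 4-byte instructions, a signed 16-bit offset counted in instructions, and an offset taken relative to the instruction after the branch. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Target = address of the next instruction + signed word offset * 4. */
uint32_t branch_target(uint32_t pc, int16_t offset_words)
{
    assert(pc % 4 == 0);   /* instructions are word aligned */
    return pc + 4 + (uint32_t)((int32_t)offset_words * 4);
}

/* Inverse: the offset an assembler would encode for a branch
   located at `pc` that jumps to `target`. */
int32_t branch_offset(uint32_t pc, uint32_t target)
{
    assert(pc % 4 == 0 && target % 4 == 0);
    return (int32_t)(target - (pc + 4)) / 4;
}
```

Because only the distance is encoded, the same offset stays valid when the whole code block is loaded at a different address, which is the position independence the slide mentions.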

Returns and Indirect Jumps • Branch target is not known at compile time • Need a way to specify the target dynamically • Use a register • Permit any addressing mode • Regs[R4] ← Regs[R4] + Mem[Regs[R1]] • Also useful for • case or switch statements • Dynamically shared libraries • Higher-order functions or function pointers
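A jump table is one common way a case/switch statement or a function pointer turns into an indirect jump, with the target selected at run time. A minimal sketch follows; the handler functions and table are hypothetical examples.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

int op_negate(int x) { return -x; }
int op_double(int x) { return 2 * x; }
int op_square(int x) { return x * x; }

/* Table of code addresses. The entry to call is chosen at run time,
   so the target cannot be encoded in the instruction itself. */
static const handler_fn jump_table[] = { op_negate, op_double, op_square };

int dispatch(size_t selector, int x)
{
    assert(selector < sizeof jump_table / sizeof jump_table[0]);
    /* Typically compiled to: load the table entry into a register,
       then call through that register. */
    return jump_table[selector](x);
}
```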

Branch Stats - 90% are PC Relative • Call/Return • TeX = 16%, Spice = 13%, GCC = 10% • Jump • TeX = 18%, Spice = 12%, GCC = 12% • Conditional • TeX = 66%, Spice = 75%, GCC = 78%

Branch Distances

Condition Testing Options • PSW: Program Status Word

What Kinds of Compares Do Branches Use? • A large fraction of comparisons are with zero
