Computer Architecture Principles Dr. Mike Frank

Computer Architecture Principles, Dr. Mike Frank, CDA 5155, Summer 2003
Module #9: Basics of Instruction Set Architectures

Moving on to Chapter 2...
• Topic: Principles of instruction set design…
• An instruction set architecture (ISA) is a specification of a standardized, programmer-visible interface to the hardware, consisting of:
  • A set of instructions (really, instruction types)
    • With associated argument fields, assembly syntax, and machine encoding
  • A set of named storage locations
    • Registers, memory, … Programmer-accessible caches?
  • A set of addressing modes (ways to name locations)
  • Often an I/O interface (usually memory-mapped)

Classifying Architectures

Broad Classes of Processor Architectures
Can be distinguished by radically differing programming models:
• Instruction-based programming model (focus of this course):
  • Subdivision of hardware into memory, register file, ALUs, etc.
  • Traditional assembly-language operations, similar to high-level RTL statements
  • Load, add, move, etc., operating on programmer-visible registers
• Microinstruction-based programming model:
  • Similar microarchitecture to that used in instruction-based architectures, but…
  • Microinstructions directly specify the control signals sent to internal machine registers, muxes, ALUs, etc.; equivalent to low-level RTL statements
• Dataflow / systolic array programming models:
  • The “program” specifies a custom pattern of connectivity among pre-existing arrays of hardware functional units operating in parallel
  • Examples: MIT J-machine, RAW project
• Circuit-based programming models:
  • The “program” specifies the interconnection and internal logic of functional units
  • Examples: FPGAs (Field-Programmable Gate Arrays) (Xilinx, Altera)
• Others? (E.g., a mesh of processing elements, each with all of the above features)

The Best of All Possible Worlds?
[Figure: a chip multiprocessor built as a homogeneous 2-D (or 3-D) mesh of heterogeneous processing elements. Each processing element contains a systolic dataflow array; a superscalar, highly dynamic, heavily pipelined RISC CPU core; special-purpose units (graphics, signals, media); a reconfigurable FPGA core; and a local memory hierarchy (1st-level cache, 2nd-level cache, 3rd-level cache or local DRAM); plus a communications/power/cooling grid.]

Chapter 2 contents
• Introduction
• Classifying ISAs
• Addressing modes
  • …for signal processing
• Type & size of operands
  • Operands for media & signal processing
• Operations in the IS
  • Ops for media & SP
• Control flow instrs.
• Encoding an IS
• Role of compilers
• The MIPS architecture
• Trimedia TM32 CPU
• Fallacies & pitfalls
• Closing material
Also, see Appendices C-F (online at mkp.com)

2.2. Classifying Architectures
• One important classification scheme is by where operands are held (the type of internal storage the ISA provides), which in turn shapes the addressing modes supported.
• Stack architecture: Operands are implicitly on the top of a stack. (Early machines, Intel x87 floating point.)
• Accumulator architecture: One operand is implicitly the accumulator (a special register). (Early machines.)
• General-purpose register architecture: Operands may be any of a large number (typically tens to hundreds) of registers.
  • Register-memory architectures: One operand may be in memory.
  • Load-store architectures: All operands are registers, except in special load and store instructions.

Illustrating Architecture Types
[Figure: assembly code for C := A + B on each architecture type]
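Since the figure itself is lost, here is a minimal Python sketch that runs C := A + B on toy models of the four architecture types from the previous slide. The mnemonics (PUSH, POP, LOAD, ADD, STORE), the register names, and the dict standing in for memory are illustrative assumptions, not any real machine’s syntax; the point is where each operand lives and how many instructions each style needs.

```python
# Toy models of four ISA styles, each computing C := A + B.
# Memory is a dict from variable name to value (an illustrative assumption).

def run_stack(mem):
    """Stack machine: operands are implicitly on top of a stack."""
    program = [("PUSH", "A"), ("PUSH", "B"), ("ADD", None), ("POP", "C")]
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(mem[arg])
        elif op == "ADD":          # pops two operands, pushes their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":
            mem[arg] = stack.pop()
    return len(program)


def run_accumulator(mem):
    """Accumulator machine: one operand is implicitly the accumulator."""
    program = [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")]
    acc = 0
    for op, arg in program:
        if op == "LOAD":
            acc = mem[arg]
        elif op == "ADD":          # acc <- acc + Mem[arg]
            acc += mem[arg]
        elif op == "STORE":
            mem[arg] = acc
    return len(program)


def run_register_memory(mem):
    """Register-memory machine: an ALU op may take one operand from memory."""
    regs = {}
    regs["R1"] = mem["A"]               # LOAD  R1, A
    regs["R3"] = regs["R1"] + mem["B"]  # ADD   R3, R1, B   (B read from memory)
    mem["C"] = regs["R3"]               # STORE R3, C
    return 3


def run_load_store(mem):
    """Load-store machine: ALU ops see registers only."""
    regs = {}
    regs["R1"] = mem["A"]                 # LOAD  R1, A
    regs["R2"] = mem["B"]                 # LOAD  R2, B
    regs["R3"] = regs["R1"] + regs["R2"]  # ADD   R3, R1, R2
    mem["C"] = regs["R3"]                 # STORE R3, C
    return 4


for run in (run_stack, run_accumulator, run_register_memory, run_load_store):
    mem = {"A": 2, "B": 3, "C": 0}
    count = run(mem)
    print(f"{run.__name__}: C = {mem['C']} using {count} instructions")
```

Instruction count is not the whole story: the load-store version is the longest, but every instruction is simple and uniform.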

Number of Operands
• A further classification is by the maximum number of operands, and how many of them can be in memory, e.g.:
• 2-operand (e.g., a += b)
  • src/dest(reg), src(reg)
  • src/dest(reg), src(mem): IBM 360, x86, 68k
  • src/dest(mem), src(mem): VAX
• 3-operand (e.g., a = b + c)
  • dest(reg), src1(reg), src2(reg): MIPS, PPC, SPARC, etc.
  • dest(reg), src1(reg), src2(mem)
  • dest(mem), src1(mem), src2(mem): VAX

Memory Addressing Modes & Conventions

2.3. Memory Addressing
• A memory address n names the location of the (n+1)th “item” in memory (addresses start at 0).
• If each item is a byte (octet, 8-bit chunk), then the ISA’s memory system is byte-addressed. (Standard.)
• Numbering by larger chunks (e.g., 32 bits) is also possible; such memories are called word-addressed.
• Objects consisting of several consecutive items may be accessible as a unit:
  • Bytes, half-words (2 bytes), words (4 bytes), double words (8 bytes).
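A small Python sketch of the relationship between byte and word addresses, assuming 4-byte words (the word size and function names are illustrative choices, not from the slides). It also lists which byte addresses each multi-byte object covers.

```python
# Byte addressing vs. word addressing, assuming 4-byte words.
WORD_BYTES = 4
SIZES = {"byte": 1, "half-word": 2, "word": 4, "double word": 8}


def byte_to_word_address(byte_addr):
    """Split a byte address into (word address, byte offset within that word)."""
    return divmod(byte_addr, WORD_BYTES)


def word_to_byte_address(word_addr):
    """Byte address of the first byte of a word."""
    return word_addr * WORD_BYTES


def bytes_covered(addr, kind):
    """Byte addresses occupied by an object of the given kind starting at addr."""
    return list(range(addr, addr + SIZES[kind]))


print(byte_to_word_address(13))          # (3, 1): byte 1 of word 3
print(word_to_byte_address(3))           # 12
print(bytes_covered(8, "double word"))   # bytes 8 through 15
```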

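Endians & Alignment
[Figure: byte addresses 0–7, drawn with the address increasing to the left.]
• Word-aligned word at byte address 4.
• Halfword-aligned word at byte address 2.
• Byte-aligned (non-aligned) word at byte address 1.
• Little-endian byte order (least-significant byte “first”): for the word at address 4, the LSB (byte 0) is at address 4 and the MSB (byte 3) is at address 7.
• Big-endian byte order (most-significant byte “first”): for the word at address 4, the MSB (byte 3) is at address 4 and the LSB (byte 0) is at address 7.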
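The two byte orders and the alignment rule can be sketched in a few lines of Python, using a `bytearray` as a stand-in for byte-addressed memory (an assumption for illustration). `int.to_bytes` and `int.from_bytes` do the byte ordering; an object of size s is treated as aligned when its address is a multiple of s.

```python
# Byte order and alignment in a byte-addressed memory (a bytearray here).

def store_word(mem, addr, value, little_endian=True):
    """Store a 32-bit value at addr using the chosen byte order."""
    order = "little" if little_endian else "big"
    mem[addr:addr + 4] = value.to_bytes(4, order)


def load_word(mem, addr, little_endian=True):
    """Load a 32-bit value from addr using the chosen byte order."""
    order = "little" if little_endian else "big"
    return int.from_bytes(mem[addr:addr + 4], order)


def is_aligned(addr, size):
    """An object of `size` bytes is aligned if addr is a multiple of size."""
    return addr % size == 0


mem = bytearray(8)
store_word(mem, 4, 0x0A0B0C0D, little_endian=True)
print(mem[4:8].hex())   # 0d0c0b0a: LSB at the lowest address (4)
store_word(mem, 4, 0x0A0B0C0D, little_endian=False)
print(mem[4:8].hex())   # 0a0b0c0d: MSB at the lowest address (4)
print([is_aligned(a, 4) for a in (4, 2, 1)])  # [True, False, False]
print(is_aligned(2, 2))                       # True: halfword-aligned
```

Reading memory with the wrong byte order returns the byte-swapped value, which is why data exchanged between machines needs an agreed-upon order.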

Addressing Modes
• In the example assembly syntax (middle column), ( ) indicates a memory access. (A typical syntax.)
• In the RTL syntax (right column), [ ] denotes accessing a member of an array: Register or Memory.

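Addressing Modes Visualization
[Figure: for each mode, the instruction field(s) used and the register-file or memory location they select. Immediate: an imm field holds the operand. Register: a reg field names the register holding the operand. Direct: an addr field gives the memory address. Indirect: a reg field names a register holding the memory (“base”) address. Displacement: an imm offset is added (+) to the base address held in the named register.]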

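Addressing Modes Visualization (cont.)
[Figure: Indexed: the contents of reg1 (base address) and reg2 (offset) are added (+) to form the memory address. Memory Indirect: a reg field names a register holding the address of a memory location that itself holds the operand’s address. Scaled, written (r1)[r2]: the base address in reg1 is added to the index in reg2 multiplied (×) by the row size; example row size = 8 locations.]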
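The modes in the two figures can be sketched as operand-fetch functions over a toy register file and memory. This is a hedged illustration: the sizes, the word-indexed memory (each list slot is one location), and the function names are assumptions; each comment gives the corresponding RTL using [ ] for array access.

```python
# Toy register file and word-indexed memory (sizes are arbitrary assumptions).
regs = [0] * 8
mem = [0] * 64


def immediate(imm):
    return imm                              # operand is in the instruction


def register(r):
    return regs[r]                          # Regs[r]


def direct(addr):
    return mem[addr]                        # Mem[addr]


def register_indirect(r):
    return mem[regs[r]]                     # Mem[Regs[r]]


def displacement(r, offset):
    return mem[regs[r] + offset]            # Mem[Regs[r] + offset]


def indexed(r1, r2):
    return mem[regs[r1] + regs[r2]]         # Mem[Regs[r1] + Regs[r2]]


def memory_indirect(r):
    return mem[mem[regs[r]]]                # Mem[Mem[Regs[r]]]


def scaled(r1, r2, row_size, offset=0):
    # Mem[offset + Regs[r1] + Regs[r2] * row_size]
    return mem[offset + regs[r1] + regs[r2] * row_size]


regs[1], regs[2], regs[3] = 16, 3, 10       # base, index, pointer
mem[20], mem[19], mem[40] = 7, 5, 99
mem[10], mem[30] = 30, 55
print(displacement(1, 4))     # Mem[16 + 4] = 7
print(indexed(1, 2))          # Mem[16 + 3] = 5
print(memory_indirect(3))     # Mem[Mem[10]] = Mem[30] = 55
print(scaled(1, 2, 8))        # Mem[16 + 3*8] = Mem[40] = 99
```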

Addressing Mode Usage (non-register modes only, on a VAX)

Offset Distribution (Alpha, optimized, SPEC CPU2000)

Popularity of Immediates (Alpha, optimized, SPEC CPU2000)

Distribution of Immediates

Instruction Categories

2.7. Types of Instructions

Instruction Distribution

Control-Flow Instructions

2.9. Control Flow Instructions
• Four basic types:
  • (Conditional) branches
  • (Unconditional) jumps
  • Procedure calls
  • Procedure returns
• Control flow addressing modes:
  • Often PC-relative (PC + displacement), which makes code relocatable.
  • Also useful: register-indirect jumps (a register holds the target address). Uses:
    • Procedure returns
    • Case / switch statements
    • Virtual functions / methods (abstract class method calls)
    • Higher-order functions / function pointers
    • Dynamically shared libraries
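One use of register-indirect jumps listed above is the switch statement. The sketch below shows, under stated assumptions, how a compiler might use a jump table: bounds-check the selector, fetch the target from the table into a “register”, then jump through it. Python has no raw jumps, so function objects stand in for code addresses and a call stands in for the indirect jump; all names are hypothetical.

```python
# A list of function objects stands in for a table of code addresses.

def case_add(a, b):
    return a + b


def case_sub(a, b):
    return a - b


def case_mul(a, b):
    return a * b


def default_case(a, b):
    return None


JUMP_TABLE = [case_add, case_sub, case_mul]


def switch(selector, a, b):
    """Switch via jump table: bounds check, fetch target, jump indirectly."""
    if not 0 <= selector < len(JUMP_TABLE):
        return default_case(a, b)        # out of range: branch to default
    target = JUMP_TABLE[selector]        # load target address into a "register"
    return target(a, b)                  # register-indirect jump (here, a call)


print(switch(0, 6, 3), switch(2, 6, 3), switch(7, 6, 3))  # 9 18 None
```

Because the target is only known at run time, a PC-relative branch could not express this; the same holds for virtual method calls and returns.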

Conditional Branch Options
• Condition code (CC) register
  • E.g.: x86, ARM, PPC, SPARC, …
  • ALU ops set condition code flags in the CCR
  • The branch just checks the flag
• Condition register
  • E.g.: Alpha, MIPS
  • A comparison instruction puts its result in a GPR
  • The branch instruction checks that register
• Compare & branch
  • E.g.: PA-RISC, VAX
  • Compare and branch in one instruction
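The three options can be contrasted with a sketch of “branch if a < b”, where each function returns whether the branch is taken. The flag dictionary and function names are illustrative assumptions. One simplification to note: real CC machines test N xor V for a signed less-than, because a fixed-width subtract can overflow; Python integers never overflow, so only N is modeled.

```python
# Three ways to implement "if a < b goto target" (returns True if taken).

def branch_condition_codes(a, b):
    """CC register: a compare (a - b) sets flags; the branch only tests flags."""
    diff = a - b
    flags = {"Z": diff == 0, "N": diff < 0}  # simplified: no V/C flags modeled
    return flags["N"]                        # "branch if negative" after the compare


def branch_condition_register(a, b):
    """Condition register (MIPS-style SLT + BNE): the result goes into a GPR."""
    r1 = 1 if a < b else 0                   # SLT R1, a, b
    return r1 != 0                           # BNE R1, R0, target


def branch_compare_and_branch(a, b):
    """Compare & branch: a single instruction both compares and branches."""
    return a < b


for a, b in [(1, 2), (2, 2), (3, -1)]:
    print(a, b, branch_condition_codes(a, b),
          branch_condition_register(a, b), branch_compare_and_branch(a, b))
```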

Special Control-Flow Instrs.
• In DSPs:
  • Repeat instruction
    • Repeats the subsequent code block n times
    • Avoids some loop overhead

Procedure Calling Conventions
• Two major calling conventions:
  • Caller saves:
    • Before the call, the calling procedure saves the registers it will need after the call
  • Callee saves:
    • Inside the call, the called procedure saves the registers it will overwrite
    • Can be more efficient when there are many small procedures, since each saves only the few registers it actually uses
• Many architectures use a combination of schemes:
  • E.g., MIPS: some registers are caller-saved, some callee-saved
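A deliberately simplified cost model makes the trade-off visible. It assumes each saved register costs one store plus one load, and that each side knows only its own register usage; the register names are hypothetical. Under that model, caller-saves pays for everything the caller keeps live, callee-saves pays for everything the callee writes, and only the overlap truly needs saving.

```python
# Simplified cost model: one store (save) plus one load (restore) per register.

def caller_saves_cost(live_after_call):
    """Caller saves every register it still needs, not knowing what the callee uses."""
    return 2 * len(live_after_call)


def callee_saves_cost(written_by_callee):
    """Callee saves every register it overwrites, not knowing what the caller needs."""
    return 2 * len(written_by_callee)


def necessary_cost(live_after_call, written_by_callee):
    """Only registers that are both live and overwritten truly need saving."""
    return 2 * len(live_after_call & written_by_callee)


caller_live = {"s0", "s1", "s2", "s3"}   # values the caller uses after the call
leaf_writes = {"t0"}                      # a small procedure touching one register

print(caller_saves_cost(caller_live))            # 8 memory ops
print(callee_saves_cost(leaf_writes))            # 2 memory ops
print(necessary_cost(caller_live, leaf_writes))  # 0: no overlap
```

A mixed convention lets the compiler place short-lived values in caller-saved registers and long-lived values in callee-saved ones, moving closer to the necessary cost.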

Control Flow Instr. Distrib.

Branch Distances

Comparison Types

Data Access Sizes

Outline of Today’s Lecture
• Additions for signal & media processing:
  • Addressing modes
  • Operands
  • Instruction types
• Instruction set encodings
• Role of compilers
• Examples: MIPS, Trimedia
• Fallacies & Pitfalls

Chapter 2 contents: Last Lecture / This Lecture
• Last lecture: Introduction; Classifying ISAs; Addressing modes; Type & size of operands; Operations in the IS; Control flow instrs.
• This lecture: Addressing modes for signal processing; Operands for media & signal processing; Ops for media & SP; Encoding an IS; Role of compilers; The MIPS architecture; Trimedia TM32 CPU; Fallacies & pitfalls; Closing material
• Also, see Appendices C-F (online at mkp.com)

DSP & Multimedia Instruction-Set Extensions

Special DSP/media Addr. Modes
• Modulo or circular addressing:
  • For handling circular buffers that hold unbounded, continuous streams of data
  • Automatically increments the pointer, resetting it to the start of the buffer when it passes the end
• Bit-reverse addressing:
  • Facilitates the Fast Fourier Transform (FFT)
  • The n low-order bits of an address are reversed before making the access
• These special modes are rarely used, even in DSP code
  • Mainly just in hand-coded assembly library routines
• Strided, gather/scatter addressing:
  • Used in SIMD vector machines
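Both DSP modes can be sketched in software, which is what a processor without them must do on every access. The class and function names are illustrative assumptions. The circular buffer wraps its pointer with a modulo; the bit-reverse function reverses only the n low-order bits and leaves the rest of the address alone, producing the reordering an in-place radix-2 FFT needs.

```python
class CircularBuffer:
    """Modulo addressing: the pointer auto-increments and wraps at the end."""

    def __init__(self, size):
        self.data = [0] * size
        self.ptr = 0

    def push(self, value):
        self.data[self.ptr] = value
        self.ptr = (self.ptr + 1) % len(self.data)   # wrap to start of buffer


def bit_reverse_address(addr, n):
    """Reverse the n low-order bits of addr, leaving higher bits unchanged."""
    low = addr & ((1 << n) - 1)
    rev = 0
    for _ in range(n):
        rev = (rev << 1) | (low & 1)
        low >>= 1
    return (addr >> n << n) | rev


buf = CircularBuffer(4)
for sample in range(1, 7):
    buf.push(sample)
print(buf.data, buf.ptr)                              # [5, 6, 3, 4] 2
print([bit_reverse_address(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```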

Special DSP & Media Operands
• Media processing (e.g., 2-D & 3-D graphics):
  • Vertex (x, y, z, w coordinates, each a 32-bit float)
    • w is a visibility or color value
  • Pixel (R, G, B, A channels, each an 8-bit integer)
    • Red, Green, Blue; A is transparency
• Signal processing:
  • Fixed point (fractions between −1 and +1)
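A short sketch of two of these operand types. The RGBA byte layout (R in the high byte) is an assumed convention; real formats vary. For fixed point, it uses Q15, a common 16-bit format with one sign bit and 15 fraction bits, so values span [−1, 1); +1 itself is not representable, and (−1) × (−1) has to be clamped.

```python
def pack_rgba(r, g, b, a):
    """Pack four 8-bit channels into a 32-bit word (R in the high byte: an assumed layout)."""
    for c in (r, g, b, a):
        if not 0 <= c <= 255:
            raise ValueError("channel out of 8-bit range")
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel):
    return ((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF)


Q = 15                                   # Q15: sign bit + 15 fraction bits in 16 bits
Q15_MIN, Q15_MAX = -(1 << 15), (1 << 15) - 1


def to_q15(x):
    """Float in [-1, 1) to Q15, clamped to the representable range."""
    return max(Q15_MIN, min(Q15_MAX, round(x * (1 << Q))))


def from_q15(q):
    return q / (1 << Q)


def q15_mul(a, b):
    """Q15 x Q15 gives 30 fraction bits; shift right by 15 and clamp."""
    return max(Q15_MIN, min(Q15_MAX, (a * b) >> Q))


print(hex(pack_rgba(0x12, 0x34, 0x56, 0x78)))       # 0x12345678
print(from_q15(q15_mul(to_q15(0.5), to_q15(0.5))))  # 0.25
print(q15_mul(Q15_MIN, Q15_MIN))                    # 32767: (-1)*(-1) saturates
```

The right shift truncates toward negative infinity; many DSPs offer a rounding variant instead.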

Special DSP & Media Operations
• Partitioned add, etc.
  • Use the same hardware for multiple small ops as for a single large op
  • E.g., use the hardware that makes up one 64-bit ALU to do four 16-bit adds simultaneously
  • Or, 2 single-precision FP ops with 1 instruction
  • Examples: Intel MMX, PowerPC AltiVec
• SIMD (single-instruction, multiple-data) / vector ops
  • Same idea, more general; used on supercomputers
• Saturating add, etc.
  • Clamp at MAXINT (or MININT) instead of wrapping around or raising an overflow exception
• Multiply-accumulate (MAC)
  • Used in dot products for vector & matrix multiplications
• Others:
  • Max, min, pack, unpack, merge, permute, shuffle, abs
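The partitioned-add idea can be emulated in software with the “SIMD within a register” trick: clear the top bit of every 16-bit lane so no carry can cross a lane boundary, do one ordinary 64-bit add, then patch each lane’s top bit with an XOR. This is a hedged software analogy of what MMX-style hardware does by breaking the carry chain, not a model of any specific chip. Saturating add and MAC are shown alongside.

```python
MASK64 = (1 << 64) - 1
LANE_HIGH_BITS = 0x8000_8000_8000_8000   # top bit of each 16-bit lane


def partitioned_add16x4(x, y):
    """Four 16-bit adds with one 64-bit add: carries are blocked at lane boundaries."""
    low_sum = (x & ~LANE_HIGH_BITS) + (y & ~LANE_HIGH_BITS)  # no carry can leave a lane
    return (low_sum ^ ((x ^ y) & LANE_HIGH_BITS)) & MASK64   # fix up each lane's top bit


def saturating_add16(a, b):
    """Signed 16-bit add that clamps instead of wrapping."""
    return max(-(1 << 15), min((1 << 15) - 1, a + b))


def dot_product(xs, ys):
    """Dot product as a chain of multiply-accumulate (MAC) steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y          # one MAC: acc <- acc + x*y
    return acc


print(f"{partitioned_add16x4(0x0001_FFFF_7FFF_0002, 0x0001_0001_0001_0003):016x}")
# 0002000080000005: lane 2 (0xFFFF + 1) wraps to 0 without carrying into lane 3
print(saturating_add16(30000, 10000))     # 32767
print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
```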

2.10. Instruction Set Encodings
• Competing forces in IS encoding design:
  • Want as many registers & modes as possible
  • But large register & mode fields → larger programs
  • Want a simple pipelined execution path
• Some solutions:
  • Variable-length encoding (VAX, x86)
  • Fixed-length encoding (most RISCs)
  • Hybrid (e.g., MIPS16, Thumb)
  • Dynamic decompression (IBM CodePack)

Instruction Set Encodings

Compiler Technology and ISA Design

2.11. Compiler Passes

Compiler Optimizations

Compiler Optimizations cont.

Effect of Optimization

Compilers Need Architectures that…
• Provide regularity
  • Orthogonality (independence) of:
    • Registers used
    • Addressing modes
    • Operations used
• Provide primitives, not solutions
  • Don’t directly support specific kernels or languages
• Simplify trade-offs among alternatives
  • Make it easy to tell which code sequence is fastest at compile time
• Don’t interpret values known at compile time
  • Allow compile-time constants to be provided in immediates

ISA Example:MIPS

Design Principles used in MIPS
• 2.2. Use GPRs, load-store architecture
• 2.3. Best addressing modes: displacement (12-16 bits), immediate (8-16 bits), register indirect
• 2.5. Data sizes/types: 8- to 64-bit integers, 64-bit IEEE 754 standard doubles
• 2.7. Support load, store, add, subtract, move, shift
• 2.9. Compares: =, ≠, <; branch (PC-relative, at least 8 bits), jump, call, return
• 2.10. Fixed encoding for performance, variable for code size
• 2.11. GPRs, orthogonality, simplicity

MIPS64 Registers
• 32-bit instructions
• 32 64-bit GPRs, R0-R31
  • Really only 31: R0 always reads as the constant 0
• 32 64-bit FPRs, F0-F31
  • Can also hold 32-bit floats (with the other half unused)
  • “SIMD” extensions operate on 2 floats in 1 FPR
• A few special registers
  • Floating-point status register
• Load/store 8-, 16-, 32-, 64-bit integers
  • Signed loads sign-extend to fill the 64-bit GPR; unsigned variants (LBU, LHU, LWU) zero-extend
• Also load/store 32-bit floats and 64-bit doubles
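What “sign-extend to fill the 64-bit GPR” means can be sketched by computing the 64-bit register pattern each kind of load would produce. The function names are illustrative; they mirror the behavior of signed loads (LB/LH/LW) versus unsigned ones (LBU/LHU/LWU), not any real API.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def load_signed(raw, bits):
    """LB/LH/LW-style load: sign-extend into a 64-bit register pattern."""
    return sign_extend(raw, bits) & MASK64


def load_unsigned(raw, bits):
    """LBU/LHU/LWU-style load: zero-extend into a 64-bit register pattern."""
    return raw & ((1 << bits) - 1)


print(f"{load_signed(0xF0, 8):016x}")          # fffffffffffffff0
print(f"{load_unsigned(0xF0, 8):016x}")        # 00000000000000f0
print(f"{load_signed(0x7FFF_FFFF, 32):016x}")  # 000000007fffffff
```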

MIPS Addressing Modes
• Register (arithmetic/logical ops only)
• Immediate (arithmetic/logical ops only) & displacement (loads/stores only)
  • 16-bit immediate / offset field
  • Register indirect: use 0 as the displacement offset
  • Direct (absolute): use R0 as the displacement base
• Byte-addressed memory, 64-bit addresses
• Software-settable big-endian/little-endian flag
• Alignment required
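A sketch of how one displacement mode yields all three memory modes on the slide, using a toy 32-entry register file (an assumption for illustration). The 16-bit offset is sign-extended, R0 always reads as zero, and a misaligned access raises an error to mirror “alignment required”; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
regs = [0] * 32                 # R0..R31


def read_reg(r):
    """R0 always reads as zero (it is hardwired on MIPS)."""
    return 0 if r == 0 else regs[r]


def sign_extend16(imm):
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, offset16, size=8):
    """Displacement mode: EA = Regs[base] + sign-extended 16-bit offset, alignment checked."""
    ea = (read_reg(base) + sign_extend16(offset16)) & MASK64
    if ea % size != 0:
        raise ValueError(f"unaligned {size}-byte access at {ea:#x}")
    return ea


regs[4] = 0x1000
print(hex(effective_address(4, 8)))        # displacement:      0x1008
print(hex(effective_address(4, 0)))        # register indirect: 0x1000
print(hex(effective_address(0, 0x0100)))   # direct (R0 base):  0x100
print(hex(effective_address(4, 0xFFF8)))   # offset -8:         0xff8
```

With R0 as the base, direct addressing only reaches the addresses a signed 16-bit offset can name, so larger absolute addresses must first be built in a register.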

MIPS Instruction Layouts
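The layout figure did not survive extraction. As a hedged stand-in, this sketch packs and unpacks the three standard MIPS formats (R, I, J) using the field widths from the MIPS specification. Opcode 0x23 (LW) and funct 0x20 (32-bit ADD) are real MIPS values; MIPS64’s doubleword instructions (e.g., LD, DADD) use the same formats with their own codes, not shown here. The helper names are illustrative.

```python
def _field(value, bits):
    """Check that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value


def encode_r(opcode, rs, rt, rd, shamt, funct):
    """R-type: opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (_field(opcode, 6) << 26 | _field(rs, 5) << 21 | _field(rt, 5) << 16
            | _field(rd, 5) << 11 | _field(shamt, 5) << 6 | _field(funct, 6))


def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16); imm may be negative."""
    return (_field(opcode, 6) << 26 | _field(rs, 5) << 21 | _field(rt, 5) << 16
            | (imm & 0xFFFF))


def encode_j(opcode, target):
    """J-type: opcode(6) | target(26)."""
    return _field(opcode, 6) << 26 | _field(target, 26)


def decode(word):
    """Pull out every field; which ones matter depends on the format."""
    return {
        "opcode": word >> 26, "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F, "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F, "imm": word & 0xFFFF, "target": word & 0x3FF_FFFF,
    }


lw = encode_i(0x23, 29, 31, 20)      # LW R31, 20(R29)
add = encode_r(0, 1, 2, 3, 0, 0x20)  # ADD R3, R1, R2
print(f"{lw:08x} {add:08x}")         # 8fbf0014 00221820
```

Because the opcode and register fields sit in the same bit positions in every format, the hardware can start reading registers before it knows which format it is decoding, one payoff of fixed-length encoding.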
