Computer Architecture Principles Dr. Mike Frank

Computer Architecture Principles Dr. Mike Frank. CDA 5155 Summer 2003 Module #9 Basics of Instruction Set Architectures. Moving on to Chapter 2. Topic: Principles of instruction set design…

Presentation Transcript


  1. Computer Architecture Principles. Dr. Mike Frank. CDA 5155, Summer 2003. Module #9: Basics of Instruction Set Architectures

  2. Moving on to Chapter 2...
     • Topic: Principles of instruction set design
     • An instruction set architecture (ISA) is a specification of a standardized, programmer-visible interface to hardware, comprising:
        • A set of instructions (really, instruction types)
           • With associated argument fields, assembly syntax, and machine encoding.
        • A set of named storage locations
           • Registers, memory, … Programmer-accessible caches?
        • A set of addressing modes (ways to name locations)
        • Often an I/O interface (usually memory-mapped)

  3. Classifying Architectures

  4. Broad Classes of Processor Architectures
     Can be distinguished by radically differing programming models:
     • Instruction-based programming model:
        • Subdivision of hardware into memory, register file, ALUs, etc.
        • Traditional assembly-language operations; like high-level RTL statements
        • Load, add, move, etc. operating on programmer-visible registers
     • Microinstruction-based programming model:
        • Similar microarchitecture to that used in instruction-based archs., but…
        • Microinstructions directly specify control signals to be sent to internal machine registers, muxes, ALUs, etc.; equivalent to low-level RTL statements
     • Dataflow / systolic array programming models:
        • “Program” specifies custom pattern of connectivity among pre-existing arrays of hardware functional units operating in parallel
        • Examples: MIT J-machine, RAW project
     • Circuit-based programming models:
        • “Program” specifies interconnection and internal logic of functional units.
        • Examples: FPGAs (Field-Programmable Gate Arrays) (Xilinx, Altera)
     • Others? (E.g., mesh of processing elements, each with all features)
     Focus of this course

  5. The Best of All Possible Worlds? [Figure: a chip multiprocessor built as a homogeneous 2-D (or 3-D) mesh of processing elements. Each heterogeneous processing element combines a systolic dataflow array; a superscalar, highly dynamic, heavily pipelined RISC CPU core; special-purpose units (graphics, signals, media); a reconfigurable FPGA core; a local memory hierarchy (1st-level cache, 2nd-level cache, 3rd-level cache or local DRAM); and a communications/power/cooling grid.]

  6. Chapter 2 contents
     • Introduction
     • Classifying ISAs
     • Addressing modes
     • …for signal processing
     • Type & size of operands
     • Operands for media & signal processing
     • Operations in the IS
     • Ops for media & SP
     • Control flow instrs.
     • Encoding an IS
     • Role of compilers
     • The MIPS architecture
     • Trimedia TM32 CPU
     • Fallacies & pitfalls
     • Closing material
     Also, see Appendices C-F (online at mkp.com)

  7. 2.2. Classifying Architectures
     • One important classification scheme is by where an instruction's operands are held (the type of internal operand storage):
        • Stack architecture: Operands implicitly on top of a stack. (Early machines, Intel floating-point.)
        • Accumulator architecture: One operand is implicitly an accumulator (a special register). (Early machines.)
        • General-purpose register architecture: Operands may be any of a large number (typically 10s-100s) of registers.
           • Register-memory architectures: One operand may be in memory.
           • Load-store architectures: All operands are registers, except in special load and store instructions.

  8. Illustrating Architecture Types Assembly for C:=A+B:
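A minimal sketch of how C := A + B might run on three of the architecture types (stack, accumulator, load-store), written as tiny Python interpreters. The mnemonics, the tuple encoding of instructions, the register names R1-R3, and the sample values of A and B are all illustrative assumptions rather than the slide's own notation.

```python
# Illustrative only: mnemonics, encodings, and register names are assumptions.

def run_stack(program, mem):
    """Stack machine: operands are implicitly on top of the stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return mem


def run_accumulator(program, mem):
    """Accumulator machine: one operand is implicitly the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
    return mem


def run_load_store(program, mem):
    """Load-store machine: only load/store touch memory; add uses registers."""
    regs = {}
    for op, *args in program:
        if op == "load":        # load Rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "add":       # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":     # store Rs, addr
            mem[args[1]] = regs[args[0]]
    return mem


STACK_PROGRAM = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ACCUMULATOR_PROGRAM = [("load", "A"), ("add", "B"), ("store", "C")]
LOAD_STORE_PROGRAM = [
    ("load", "R1", "A"),
    ("load", "R2", "B"),
    ("add", "R3", "R1", "R2"),
    ("store", "R3", "C"),
]


def fresh_memory():
    return {"A": 3, "B": 4, "C": 0}


results = {
    "stack": run_stack(STACK_PROGRAM, fresh_memory())["C"],
    "accumulator": run_accumulator(ACCUMULATOR_PROGRAM, fresh_memory())["C"],
    "load_store": run_load_store(LOAD_STORE_PROGRAM, fresh_memory())["C"],
}
print(results)
```

The instruction counts differ (4, 3, and 4 here), but so do the number of explicit operands per instruction and the number of memory references, which is the trade-off the classification is meant to expose.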

  9. Number of Operands
     • A further classification is by the max. number of operands, and the number that can be in memory, e.g.:
        • 2-operand (e.g. a += b)
           • src/dest(reg), src(reg)
           • src/dest(reg), src(mem): IBM 360, x86, 68k
           • src/dest(mem), src(mem): VAX
        • 3-operand (e.g. a = b+c)
           • dest(reg), src1(reg), src2(reg): MIPS, PPC, SPARC, etc.
           • dest(reg), src1(reg), src2(mem)
           • dest(mem), src1(mem), src2(mem): VAX

  10. Memory Addressing Modes & Conventions

  11. 2.3. Memory Addressing
     • A memory address n names the location of the (n+1)th “item” in memory.
        • If each item is a byte (octet, 8-bit chunk), then the ISA’s memory system is byte-addressed. (Standard)
        • Numbering with larger chunks (e.g., 32 bits) is also possible; such memories are called word-addressed.
     • Objects consisting of several consecutive items might be accessible as a unit:
        • Bytes, half-words (2 bytes), words (4 bytes), double words (8 bytes).

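  12. Endians & Alignment [Figure: a row of bytes at addresses 7 down to 0 (byte address increasing toward the left), showing three placements of a 4-byte word: word-aligned at byte address 4; halfword-aligned at byte address 2; byte-aligned (non-aligned) at byte address 1. For the word at address 4, little-endian byte order puts the least-significant byte “first” (byte 0, the LSB, at address 4 and byte 3, the MSB, at address 7); big-endian byte order puts the most-significant byte “first” (byte 3, the MSB, at address 4 and byte 0, the LSB, at address 7).]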
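A small sketch of byte order and alignment on a byte-addressed memory, modelled here as a Python `bytearray`. The helper names (`is_aligned`, `store_word`, `load_word`) and the sample value are assumptions made for illustration.

```python
def is_aligned(address, size):
    """True if an object of `size` bytes at `address` is naturally aligned."""
    return address % size == 0


def store_word(memory, address, value, byteorder):
    """Store a 4-byte word into byte-addressed memory in the given byte order."""
    memory[address:address + 4] = value.to_bytes(4, byteorder)


def load_word(memory, address, byteorder):
    """Load a 4-byte word from byte-addressed memory in the given byte order."""
    return int.from_bytes(memory[address:address + 4], byteorder)


memory_le = bytearray(8)
memory_be = bytearray(8)
store_word(memory_le, 4, 0x0A0B0C0D, "little")  # LSB 0x0D lands at address 4
store_word(memory_be, 4, 0x0A0B0C0D, "big")     # MSB 0x0A lands at address 4

print([hex(b) for b in memory_le[4:8]])
print([hex(b) for b in memory_be[4:8]])
print(is_aligned(4, 4), is_aligned(2, 4), is_aligned(1, 2))
```

Reading a word back with the wrong byte order returns the bytes reversed, which is the usual symptom of an endianness mismatch between two machines.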

  13. Addressing Modes • In example assembly syntax in middle column, ( ) indicates memory access. (A typical syntax.) • In RTL syntax on right, [ ] denotes accessing a member of an array, Register or Memory.

  14. Addressing Modes Visualization [Figure: a table with columns Mode Name, Instr. Field(s), Reg. File, and Memory. Modes shown with their instruction fields: Immediate (imm); Register (reg); Direct (addr); Indirect (reg); Displacement (reg holding a “base” address, plus an imm offset).]

  15. Addr. Mode Vis. Cont. [Figure: same columns. Modes shown: Indexed (reg1 holding a “base” address, plus reg2 as offset); Memory Indirect (reg); Scaled (reg1 as base address, plus reg2 as index × row size, written (r1)[r2]; example row size = 8 locations).]
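The modes named on these two slides can be sketched as one operand-fetch function. This is only an illustration: the mode names as strings, the parameter names, and the register and memory contents are assumptions, and the scaled mode follows the common textbook form Mem[imm + Regs[base] + Regs[index] × scale].

```python
def fetch_operand(mode, regs, mem, reg=None, reg2=None, imm=0, scale=1):
    """Return the operand value selected by an addressing mode (sketch)."""
    if mode == "immediate":
        return imm                                   # value is in the instruction
    if mode == "register":
        return regs[reg]                             # value is in a register
    if mode == "direct":
        return mem[imm]                              # instruction holds the address
    if mode == "register_indirect":
        return mem[regs[reg]]                        # register holds the address
    if mode == "displacement":
        return mem[regs[reg] + imm]                  # base register + offset
    if mode == "indexed":
        return mem[regs[reg] + regs[reg2]]           # base register + index register
    if mode == "memory_indirect":
        return mem[mem[regs[reg]]]                   # memory holds the address
    if mode == "scaled":
        return mem[imm + regs[reg] + regs[reg2] * scale]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state for the examples below.
regs = {"R1": 100, "R2": 3}
mem = {50: 11, 100: 22, 108: 33, 103: 44, 22: 55, 124: 66}

examples = {
    "immediate": fetch_operand("immediate", regs, mem, imm=7),
    "register": fetch_operand("register", regs, mem, reg="R1"),
    "direct": fetch_operand("direct", regs, mem, imm=50),
    "register_indirect": fetch_operand("register_indirect", regs, mem, reg="R1"),
    "displacement": fetch_operand("displacement", regs, mem, reg="R1", imm=8),
    "indexed": fetch_operand("indexed", regs, mem, reg="R1", reg2="R2"),
    "memory_indirect": fetch_operand("memory_indirect", regs, mem, reg="R1"),
    "scaled": fetch_operand("scaled", regs, mem, reg="R1", reg2="R2", scale=8),
}
print(examples)
```

With a row size of 8, the scaled access reads address 100 + 3 × 8 = 124, which matches the slide's "base address plus index times row size" picture.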

  16. Addressing Mode Usage (Out of non-register modes) (on a VAX)

  17. Offset Distribution (Alpha, optimized, SPEC CPU2000)

  18. Popularity of Immediates (Alpha, optimized, SPEC CPU2000)

  19. Distribution of Immediates

  20. Instruction Categories

  21. 2.7. Types of Instructions

  22. Instruction Distribution

  23. Control-Flow Instructions

  24. 2.9. Control Flow Instructions
     • Four basic types:
        • (Conditional) branches
        • (Unconditional) jumps
        • Procedure calls
        • Procedure returns
     • Control flow addressing modes:
        • Often PC-relative (PC + displacement). Relocatable.
        • Also useful: register indirect jumps (register holds the target address). Uses:
           • Procedure returns
           • Case / switch statements
           • Virtual functions / methods (abstract class method calls)
           • Higher-order functions / function pointers
           • Dynamically shared libraries
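A short sketch of the two control-flow addressing modes. It assumes the displacement is measured from the branch instruction's own address (many real ISAs measure from the next instruction instead), and the function names, addresses, and jump table are hypothetical.

```python
def branch_target(pc, displacement):
    """PC-relative target: the branch's address plus a signed displacement."""
    return pc + displacement


def switch_target(jump_table, selector, default_target):
    """Case/switch via a register-indirect jump: look up the target address
    in a table, place it in a register, and jump to that register."""
    if 0 <= selector < len(jump_table):
        return jump_table[selector]
    return default_target


# Relocatability: the same displacement reaches the same place within the
# code no matter where the code is loaded.
BRANCH_OFFSET = 0x20   # position of the branch within the code
DISPLACEMENT = 0x10    # encoded in the branch instruction
relative_targets = [
    branch_target(base + BRANCH_OFFSET, DISPLACEMENT) - base
    for base in (0x1000, 0x8000)
]

CASE_TABLE = [0x100, 0x140, 0x180]   # hypothetical addresses of case bodies
print(relative_targets)
print(hex(switch_target(CASE_TABLE, 1, 0x1C0)))
```

Because the table lookup yields an address that is only known at run time, the jump cannot be encoded as a fixed displacement, which is why the register-indirect form is needed for switches, returns, and virtual calls.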

  25. Conditional Branch Options
     • Condition Code (CC) Register
        • E.g.: X86, ARM, PPC, SPARC, …
        • ALU ops set condition code flags in the CCR
        • Branch just checks the flag
     • Condition register
        • E.g.: Alpha, MIPS
        • Comparison instruction puts result in a GPR
        • Branch instruction checks the register
     • Compare & Branch
        • E.g.: PA-RISC, VAX
        • Compare & branch in 1 instruction.

  26. Special Control-Flow Instrs. • In DSPs: • Repeat instruction • Repeat subsequent code block n times • Avoids some loop overhead

  27. Procedure Calling Conventions
     • Two major calling conventions:
        • Caller saves:
           • Before the call, the procedure caller saves registers that will be needed later
        • Callee saves:
           • Inside the call, the called procedure saves registers that it will overwrite
           • Can be more efficient if there are many small procedures
     • Many archs. use a combination of schemes:
        • E.g., MIPS: Some registers caller-saves, some callee-saves

  28. Control Flow Instr. Distrib.

  29. Branch Distances

  30. Comparison Types

  31. Data Access Sizes

  32. Outline of Today’s Lecture • Additions for signal & media processing: • Addressing modes • Operands • Instruction types • Instruction set encodings • Role of compilers • Examples: MIPS, Trimedia • Fallacies & Pitfalls

  33. Chapter 2 contents (same list as slide 6), annotated “Last Lecture / This Lecture”

  34. DSP & Multimedia Instruction-Set Extensions

  35. Special DSP/media Addr. Modes
     • Modulo or circular addressing:
        • For dealing with circular buffers for handling infinite, continuous streams of data
        • Automatically increment pointer, reset to start of buffer if at end
     • Bit reverse addressing:
        • Facilitates Fast Fourier Transform (FFT) operation
        • The n low-order bits of an address are reversed before making the access
     • Special modes rarely used even in DSP code
        • Mainly just in hand-coded assembly library routines
     • Strided, gather/scatter addressing:
        • Used in SIMD vector machines
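A minimal sketch of the two DSP address calculations in software. Real DSPs do this in the address-generation hardware; the function names, buffer base address, and buffer length below are assumptions for illustration.

```python
def circular_next(pointer, start, length, step=1):
    """Modulo addressing: advance the pointer, wrapping to the buffer start."""
    return start + (pointer - start + step) % length


def bit_reverse(index, n_bits):
    """Reverse the order of the n_bits low-order bits of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_address(address, n_bits):
    """Reverse only the n low-order bits of an address; keep the upper bits."""
    mask = (1 << n_bits) - 1
    return (address & ~mask) | bit_reverse(address & mask, n_bits)


BUFFER_START, BUFFER_LENGTH = 0x100, 8   # hypothetical circular buffer

pointer = BUFFER_START + 6
visited = []
for _ in range(4):
    visited.append(pointer)
    pointer = circular_next(pointer, BUFFER_START, BUFFER_LENGTH)

fft_order = [bit_reverse(i, 3) for i in range(8)]
print([hex(a) for a in visited])
print(fft_order)
```

The bit-reversed sequence for an 8-point transform is the order in which a radix-2 FFT consumes or produces its data, so the hardware mode saves an explicit reordering pass.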

  36. Special DSP & Media Operands
     • Media processing (e.g., 2-D & 3-D graphics):
        • Vertex (x,y,z,w coordinates, each a 32-bit float)
           • w is a visibility or color value
        • Pixel (R,G,B,A channels, each an 8-bit integer)
           • Red, Green, Blue; A is transparency
     • Signal processing
        • Fixed point (fractions between −1 and +1)
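A sketch of the two operand types in plain integers. It assumes a 16-bit Q15 fixed-point format (1 sign bit, 15 fraction bits) and an RGBA byte order with R in the most-significant byte; the slide specifies neither, so both are illustrative choices.

```python
Q15_FRACTION_BITS = 15
Q15_MAX, Q15_MIN = 32767, -32768


def to_q15(x):
    """Convert a fraction in [-1, +1) to Q15, clamping at the range limits."""
    return max(Q15_MIN, min(Q15_MAX, round(x * (1 << Q15_FRACTION_BITS))))


def from_q15(q):
    """Convert a Q15 integer back to a Python float."""
    return q / (1 << Q15_FRACTION_BITS)


def q15_multiply(a, b):
    """Multiply two Q15 values; the product is shifted back to Q15 scale."""
    return (a * b) >> Q15_FRACTION_BITS


def pack_rgba(r, g, b, a):
    """Pack four 8-bit channels into one 32-bit pixel (R in the top byte)."""
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel):
    """Split a 32-bit pixel back into its four 8-bit channels."""
    return tuple((pixel >> shift) & 0xFF for shift in (24, 16, 8, 0))


half = to_q15(0.5)
quarter = q15_multiply(half, half)
print(half, quarter, from_q15(quarter))
print(hex(pack_rgba(0x11, 0x22, 0x33, 0x44)))
```

Note that +1.0 itself is not representable in Q15: the largest value is 32767/32768, so `to_q15(1.0)` clamps.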

  37. Special DSP & Media Operations
     • Partitioned add, etc.
        • Use the same hardware for multiple small ops as for a single large op
        • E.g., use the same hardware that makes up one 64-bit ALU to do four 16-bit adds simultaneously
        • Or, 2 single-precision FP ops with 1 instruction
        • Examples: Intel MMX, PowerPC AltiVec
     • SIMD (single-instruction, multiple-data) / vector ops
        • Same idea, more general; used on supercomputers
     • Saturating add, etc.
        • Max out at MAXINT instead of throwing an overflow exception
     • Multiply-accumulate (MAC)
        • Used in dot products for vector & matrix multiplications
     • Others:
        • Max, min, pack, unpack, merge, permute, shuffle, abs
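The first three operation types can be sketched in software as follows. This only models the arithmetic, not the hardware; the lane width of 16 bits follows the slide's example, while the function names and the signed 16-bit saturation range are assumptions.

```python
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
INT16_MAX, INT16_MIN = 32767, -32768


def partitioned_add16(x, y):
    """Four independent 16-bit adds inside 64-bit words.
    Carries are not allowed to cross lane boundaries."""
    result = 0
    for lane in range(4):
        shift = LANE_BITS * lane
        a = (x >> shift) & LANE_MASK
        b = (y >> shift) & LANE_MASK
        result |= ((a + b) & LANE_MASK) << shift
    return result


def saturating_add16(a, b):
    """Signed 16-bit add that clamps at the limits instead of overflowing."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def dot_product(xs, ys):
    """Dot product as a chain of multiply-accumulate (MAC) steps."""
    accumulator = 0
    for x, y in zip(xs, ys):
        accumulator += x * y      # one MAC per element pair
    return accumulator


packed = partitioned_add16(0x0001_FFFF_0003_0004, 0x0001_0001_0001_0001)
print(hex(packed))
print(saturating_add16(30000, 10000), saturating_add16(-30000, -10000))
print(dot_product([1, 2, 3], [4, 5, 6]))
```

In the partitioned add, the lane holding 0xFFFF wraps to 0 without disturbing its neighbour, which is exactly what breaking the carry chain of a 64-bit adder at lane boundaries achieves.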

  38. 2.10. Instruction Set Encodings
     • Competing forces in IS encoding design:
        • Want as many registers & modes as possible
        • Large register & mode fields → larger programs
        • Want simplicity of pipelined execution path
     • Some solutions:
        • Variable-length encoding (VAX, x86)
        • Fixed-length encoding (most RISC)
        • Hybrid (e.g., MIPS16, Thumb)
        • Dynamic decompression (IBM CodePack)

  39. Instruction Set Encodings

  40. Compiler Technology and ISA Design

  41. 2.11. Compiler Passes

  42. Compiler Optimizations

  43. Compiler Optimizations cont.

  44. Effect of Optimization

  45. Compilers Need Architectures that…
     • Provide regularity
        • Orthogonality (independence) of:
           • Registers used
           • Addressing modes
           • Operations used
     • Provide primitives, not solutions
        • Don’t directly support specific kernels or languages
     • Simplify trade-offs among alternatives
        • Make it easy to tell the fastest code sequence at compile time
     • Don’t interpret values known at compile time
        • Allow compile-time constants to be provided in immediates

  46. ISA Example: MIPS

  47. Design Principles used in MIPS
     • 2.2. Use GPRs, load-store architecture
     • 2.3. Best addr. modes: Displacement (12-16 bits), immediate (8-16 bits), register indirect.
     • 2.5. Data sizes/types: 8-64 bit integers, 64-bit IEEE 754 standard doubles
     • 2.7. Support load, store, add, subtract, move, shift.
     • 2.9. Compares: =, ≠, <, branch (relative 8+-bit), jump, call, return
     • 2.10. Fixed encoding for performance, variable for code size
     • 2.11. GPRs, orthogonality, simplicity

  48. MIPS64 Registers
     • 32-bit instructions
     • 32 64-bit GPRs, R0-R31.
        • Really, only 31: R0 is just a constant 0.
     • 32 64-bit FPRs, F0-F31
        • Can hold 32-bit floats also (with other ½ unused).
        • “SIMD” extensions operate on 2 floats in 1 FPR
     • A few special registers
        • Floating-point status register
     • Load/store 8-, 16-, 32-, 64-bit integers
        • Narrower values are sign- or zero-extended to fill the 64-bit GPR
     • Also 32-bit floats and 64-bit doubles

  49. MIPS Addressing Modes
     • Register (arith./logical ops only)
     • Immediate (arith./logical only) & Displacement (loads/stores only)
        • 16-bit immediate / offset field
        • Register indirect: use 0 as displacement offset
        • Direct (absolute): use R0 as displacement base
     • Byte-addressed memory, 64-bit address
     • Software-settable big-endian/little-endian flag
     • Alignment required
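A sketch of how the single displacement mode covers register indirect and direct addressing as special cases. It assumes the 16-bit offset field is sign-extended before being added to the base register and that R0 always reads as 0; the helper names and register contents are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend16(field):
    """Interpret a 16-bit field as a signed two's-complement value."""
    field &= 0xFFFF
    return field - 0x10000 if field & 0x8000 else field


def read_register(regs, number):
    """R0 is hard-wired to zero; other registers read normally."""
    return 0 if number == 0 else regs[number]


def effective_address(regs, base, offset_field, size=8):
    """Displacement addressing: base register + sign-extended 16-bit offset.
    Raises ValueError if the access is not aligned to its size."""
    address = (read_register(regs, base) + sign_extend16(offset_field)) & MASK64
    if address % size != 0:
        raise ValueError(f"unaligned {size}-byte access at {address:#x}")
    return address


regs = [0] * 32
regs[2] = 0x1000

displacement = effective_address(regs, base=2, offset_field=8)       # 8(R2)
register_indirect = effective_address(regs, base=2, offset_field=0)  # 0(R2)
direct = effective_address(regs, base=0, offset_field=0x200)         # 0x200(R0)
negative = effective_address(regs, base=2, offset_field=0xFFF8)      # -8(R2)
print(hex(displacement), hex(register_indirect), hex(direct), hex(negative))
```

One consequence of this design is that direct addressing through R0 can only name addresses that fit in the sign-extended 16-bit field; anything further away needs a base register loaded first.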

  50. MIPS Instruction Layouts
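As an illustration of a fixed 32-bit layout, here is a sketch of packing and unpacking the standard MIPS I-type format (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate). The function names are hypothetical, and the example uses the classic `lw` opcode 0x23 purely as a familiar test value.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack I-type fields into a 32-bit word: opcode | rs | rt | immediate."""
    if not (0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("field out of range")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def decode_i_type(word):
    """Unpack a 32-bit word into its I-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


LW_OPCODE = 0x23  # load word

# lw R1, 8(R2): base register in rs, destination in rt, offset in immediate
word = encode_i_type(LW_OPCODE, rs=2, rt=1, immediate=8)
fields = decode_i_type(word)
print(hex(word), fields)
```

Because every field sits at a fixed bit position, decoding is a handful of shifts and masks, which is the pipelining-friendly simplicity that fixed-length encodings trade code size for.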
