
EECS 361 Computer Architecture Lecture 3 – Instruction Set Architecture

EECS 361 Computer Architecture, Lecture 3 – Instruction Set Architecture. Prof. Alok N. Choudhary, choudhar@ece.northwestern.edu. Today’s Lecture: Quick Review of Last Week; Classification of Instruction Set Architectures; Instruction Set Architecture Design Decisions; Operands; Announcements.





Presentation Transcript


  1. EECS 361 Computer Architecture, Lecture 3 – Instruction Set Architecture. Prof. Alok N. Choudhary, choudhar@ece.northwestern.edu

  2. Today’s Lecture • Quick Review of Last Week • Classification of Instruction Set Architectures • Instruction Set Architecture Design Decisions • Operands • Announcements • Operations • Memory Addressing • Instruction Formats • Instruction Sequencing • Language and Compiler Driven Decisions

  3. Summary of Lecture 2

  4. Two Notions of “Performance”
     Plane        DC to Paris   Speed      Passengers   Throughput (pmph)
     Boeing 747   6.5 hours     610 mph    470          286,700
     Concorde     3 hours       1350 mph   132          178,200
     • Which has higher performance? • Execution time (response time, latency, …): time to do a task • Throughput (bandwidth, …): tasks per unit of time • Response time and throughput often are in opposition

  5. Definitions • Performance is typically in units-per-second • bigger is better • If we are primarily concerned with response time • performance = 1 / execution_time • "X is n times faster than Y" means ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = n
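As a rough illustration of the two notions of performance, the sketch below applies the ratio of execution times and the throughput figure to the Boeing 747 and Concorde numbers from slide 4. The function name `times_faster` and the variable names are illustrative choices, not taken from the lecture.

```python
def times_faster(time_y: float, time_x: float) -> float:
    """Return n such that X is n times faster than Y (ratio of execution times)."""
    return time_y / time_x


# Figures from slide 4 (DC to Paris).
boeing_hours, boeing_mph, boeing_passengers = 6.5, 610, 470
concorde_hours, concorde_mph, concorde_passengers = 3.0, 1350, 132

# Response time: how many times faster is the Concorde than the Boeing 747?
latency_ratio = times_faster(boeing_hours, concorde_hours)

# Throughput in passenger-miles per hour (pmph) = passengers * speed.
boeing_pmph = boeing_passengers * boeing_mph
concorde_pmph = concorde_passengers * concorde_mph
throughput_ratio = boeing_pmph / concorde_pmph

print(f"Concorde is {latency_ratio:.2f}x faster in response time")
print(f"Boeing 747 has {throughput_ratio:.2f}x the throughput")
```

The two ratios point in opposite directions, which is the sense in which response time and throughput are often in opposition.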

  6. Organizational Trade-offs • Levels shown in the figure: Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors, Wires, Pins • Figure labels: Instruction Mix, CPI, Cycle Time • CPI is a useful design measure relating the Instruction Set Architecture with the implementation of that architecture, and the program measured

  7. Principal Design Metrics: CPI and Cycle Time

  8. Amdahl's “Law”: Make the Common Case Fast • Speedup due to enhancement E: Speedup(E) = ExTime(without E) / ExTime(with E) = Performance(with E) / Performance(without E) • Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected; then: • ExTime(with E) = ((1-F) + F/S) × ExTime(without E) • Speedup(with E) = ExTime(without E) / (((1-F) + F/S) × ExTime(without E)) = 1 / ((1-F) + F/S) • Performance improvement is limited by how much the improved feature is used → Invest resources where time is spent.
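A minimal sketch of the Amdahl's Law formula from this slide, Speedup = 1 / ((1-F) + F/S). The function name `amdahl_speedup` and the sample values of F and S are hypothetical, chosen only to show how strongly the result depends on F.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of the task is accelerated by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical cases: a 2x enhancement used 10% of the time,
# and a 10x enhancement used 90% of the time.
rarely_used = amdahl_speedup(0.10, 2.0)
often_used = amdahl_speedup(0.90, 10.0)
print(f"F=0.10, S=2:  overall speedup {rarely_used:.3f}")
print(f"F=0.90, S=10: overall speedup {often_used:.3f}")
```

Even with S = 10, the overall speedup stays well below 10 because the unenhanced 10% of the task still takes its original time.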

  9. Classification of Instruction Set Architectures

  10. Instruction Set Design • The instruction set is the interface between software and hardware • Multiple implementations: 8086 → Pentium 4 • ISAs evolve: MIPS-I, MIPS-II, MIPS-III, MIPS-IV, MIPS, MDMX, MIPS-32, MIPS-64

  11. Typical Processor Execution Cycle • Instruction Fetch: obtain instruction from program storage • Instruction Decode: determine required actions and instruction size • Operand Fetch: locate and obtain operand data • Execute: compute result value or status • Result Store: deposit results in register or storage for later use • Next Instruction: determine successor instruction
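The execution cycle on this slide can be sketched as a loop over a toy accumulator machine. The instruction set used here (`LOAD`, `ADD`, `STORE`, `JUMP`, `HALT`) and the tuple encoding are assumptions made for illustration; they are not a real ISA and do not come from the lecture.

```python
def run(program, memory):
    """Run a toy accumulator machine and return the final memory dict."""
    pc = 0    # program counter
    acc = 0   # accumulator
    while True:
        instruction = program[pc]          # instruction fetch
        opcode, operand = instruction      # instruction decode
        next_pc = pc + 1                   # default successor: next in sequence
        if opcode == "HALT":
            return memory
        if opcode == "LOAD":
            acc = memory[operand]          # operand fetch, result kept in acc
        elif opcode == "ADD":
            acc = acc + memory[operand]    # operand fetch and execute
        elif opcode == "STORE":
            memory[operand] = acc          # result store
        elif opcode == "JUMP":
            next_pc = operand              # successor is not sequential
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
        pc = next_pc                       # next instruction


program = [("LOAD", "A"), ("ADD", "B"), ("STORE", "C"), ("HALT", None)]
result = run(program, {"A": 2, "B": 3, "C": 0})
print(result["C"])  # 5
```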

  12. Instruction and Data Memory: Unified or Separate • Programmer's View: computer program (instructions) such as ADD, SUBTRACT, AND, OR, COMPARE, ... • Computer's View: bit patterns such as 01010, 01110, 10011, 10001, 11010, ... held in Memory, alongside CPU and I/O • Princeton (Von Neumann) Architecture: data and instructions mixed in same unified memory; program as data; storage utilization; single memory interface • Harvard Architecture: data and instructions in separate memories; has advantages in certain high performance implementations; can optimize each memory

  13. Basic Addressing Classes • Declining cost of registers

  14. Stack Architectures

  15. Accumulator Architectures

  16. Register-Set Architectures

  17. Register-to-Register: Load-Store Architectures

  18. Register-to-Memory Architectures

  19. Memory-to-Memory Architectures

  20. Instruction Set Architecture Design Decisions

  21. Basic Issues in Instruction Set Design
     • What data types are supported, and at what sizes?
     • What operations (and how many) should be provided?
        • LD/ST/INC/BRN are sufficient to encode any computation, or just Sub and Branch!
        • But not useful, because programs would be too long!
     • How (and how many) operands are specified?
        • Most operations are dyadic (e.g., A <- B + C)
        • Some are monadic (e.g., A <- ~B)
     • Location of operands and result
        • Where other than memory?
        • How many explicit operands?
        • How are memory operands located?
        • Which can or cannot be in memory?
        • How are they addressed?
     • How to encode these into consistent instruction formats
        • Instructions should be multiples of basic data/address widths
     • Encoding
        • Typical instruction set: 32-bit word; basic operand addresses are 32 bits long; basic operands, like integers, are 32 bits long; in the general case, an instruction could reference 3 operands (A := B + C)
        • Typical challenge: encode operations in a small number of bits
     • Driven by static measurement and dynamic tracing of selected benchmarks and workloads.

  22. Operands

  23. Comparing Number of Instructions • Code sequence for (C = A + B) for four classes of instruction sets:
     Stack     Accumulator   Register (register-memory)   Register (load-store)
     Push A    Load A        Load R1,A                    Load R1,A
     Push B    Add B         Add R1,B                     Load R2,B
     Add       Store C       Store C,R1                   Add R3,R1,R2
     Pop C                                                Store C,R3
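To make the comparison concrete, the sketch below interprets the stack and load-store sequences for C = A + B from this slide. The two interpreter functions and the tuple encoding of instructions are hypothetical helpers written for this illustration, and the values of A and B are arbitrary.

```python
def run_stack(program, memory):
    """Interpret Push/Add/Pop on an operand stack; operands are implicit."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(memory[args[0]])
        elif op == "Add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "Pop":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op {op!r}")
    return memory


def run_load_store(program, memory):
    """Interpret Load/Add/Store; only Load and Store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "Load":       # Load Rd, addr
            regs[args[0]] = memory[args[1]]
        elif op == "Add":      # Add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "Store":    # Store addr, Rs
            memory[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown load-store op {op!r}")
    return memory


stack_program = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
load_store_program = [
    ("Load", "R1", "A"),
    ("Load", "R2", "B"),
    ("Add", "R3", "R1", "R2"),
    ("Store", "C", "R3"),
]

print(run_stack(stack_program, {"A": 2, "B": 3})["C"])            # 5
print(run_load_store(load_store_program, {"A": 2, "B": 3})["C"])  # 5
```

Both sequences take four instructions here, but the stack version names no registers, while the load-store version names every operand explicitly.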

  24. Examples of Register Usage

  25. General Purpose Registers Dominate • 1975-2002: all machines use general purpose registers • Advantages of registers • Registers are faster than memory • Compiler technology has evolved to efficiently generate code for register files • E.g., (A*B) – (C*D) – (E*F) can do multiplies in any order, vs. stack • Registers can hold variables • Memory traffic is reduced, so the program is sped up (since registers are faster than memory) • Code density improves (since a register is named with fewer bits than a memory location) • Registers imply operand locality

  26. Operand Size Usage • Support for these data sizes and types: 8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers

  27. Announcements • Next lecture • MIPS Instruction Set

  28. Operations

  29. Typical Operations (little change since 1960)
     • Data Movement: Load (from memory), Store (to memory), memory-to-memory move, register-to-register move, input (from I/O device), output (to I/O device), push, pop (to/from stack)
     • Arithmetic: integer (binary + decimal) or FP; Add, Subtract, Multiply, Divide
     • Shift: shift left/right, rotate left/right
     • Logical: not, and, or, set, clear
     • Control (Jump/Branch): unconditional, conditional
     • Subroutine Linkage: call, return
     • Interrupt: trap, return
     • Synchronization: test & set (atomic r-m-w)
     • String: search, translate
     • Graphics (MMX): parallel subword ops (4 16bit add)

  30. Top 10 80x86 Instructions

  31. Memory Addressing

  32. Memory Addressing • Since 1980, almost every machine uses addresses to the level of 8 bits (byte) • Two questions for design of ISA: • Since one could read a 32-bit word as four loads of bytes from sequential byte addresses or as one load word from a single byte address, how do byte addresses map onto words? • Can a word be placed on any byte boundary?

  33. Mapping Word Data into a Byte Addressable Memory: Endianness • Big Endian: address of most significant byte = word address (xx00 = Big End of word) • IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA • Little Endian: address of least significant byte = word address (xx00 = Little End of word) • Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
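A short sketch of the two byte orders, using Python's built-in `int.to_bytes`. The 32-bit value `0x12345678` is an arbitrary example; index 0 of each byte string plays the role of the word address (xx00).

```python
word = 0x12345678  # arbitrary 32-bit example value

# Byte at index 0 corresponds to the lowest address of the word.
big = word.to_bytes(4, byteorder="big")
little = word.to_bytes(4, byteorder="little")

print(big.hex())     # 12345678: most significant byte at the word address
print(little.hex())  # 78563412: least significant byte at the word address

# Reading the bytes back with the matching byte order recovers the word.
assert int.from_bytes(big, byteorder="big") == word
assert int.from_bytes(little, byteorder="little") == word
```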

  34. Mapping Word Data into a Byte Addressable Memory: Alignment • Alignment: require that objects fall on an address that is a multiple of their size. • Figure: aligned and not-aligned words laid out across byte offsets 0 1 2 3
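The alignment rule on this slide reduces to a modulo check. The helper names `is_aligned` and `align_up` below are illustrative, and the addresses are arbitrary examples.

```python
def is_aligned(address: int, size: int) -> bool:
    """True when the address is a multiple of the object's size in bytes."""
    return address % size == 0


def align_up(address: int, size: int) -> int:
    """Smallest aligned address that is greater than or equal to the given one."""
    return (address + size - 1) // size * size


print(is_aligned(8, 4))  # True: a 4-byte word at address 8 is aligned
print(is_aligned(6, 4))  # False: a 4-byte word at address 6 straddles two words
print(is_aligned(6, 2))  # True: a 2-byte halfword at address 6 is aligned
print(align_up(6, 4))    # 8
```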

  35. Addressing Modes

  36. Common Memory Addressing Modes • Measured on the VAX-11 • Register operations account for 51% of all references • ~75% - displacement and immediate • ~85% - displacement, immediate and register indirect

  37. Displacement Address Size • Average of 5 SPECint92 and 5 SPECfp92 programs • ~1% of addresses > 16-bits • 12 ~ 16 bits of displacement cover most usage (+ and -)

  38. Frequency of Immediates (Instruction Literals) • ~25% of all loads and ALU operations use immediates • 15~20% of all instructions use immediates

  39. Size of Immediates • 50% to 60% fit within 8 bits • 75% to 80% fit within 16 bits

  40. Addressing Summary • Data Addressing modes that are important: • Displacement, Immediate, Register Indirect • Displacement size should be 12 to 16 bits • Immediate size should be 8 to 16 bits
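To see what the recommended field sizes mean in practice, the sketch below checks whether a signed displacement or immediate fits in a field of a given width. It assumes two's-complement encoding, and the helper names `fits_signed` and `min_signed_bits` are hypothetical.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True when value is representable as a two's-complement number of this width."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


def min_signed_bits(value: int) -> int:
    """Smallest two's-complement width that can hold value."""
    bits = 1
    while not fits_signed(value, bits):
        bits += 1
    return bits


print(fits_signed(127, 8))     # True
print(fits_signed(128, 8))     # False: needs 9 bits when signed
print(fits_signed(-32768, 16)) # True
print(min_signed_bits(100))    # 8
```

A 16-bit field therefore covers every value from -32768 to 32767, which is why it captures most of the displacements and immediates measured on the preceding slides.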

  41. Instruction Formats

  42. Instruction Format • Specify • Operation / Data Type • Operands • Stack and Accumulator architectures have implied operand addressing • With many memory operands per instruction and/or many addressing modes: • Need one address specifier per operand • With a load-store machine with 1 address per instruction and one or two addressing modes: • Can encode addressing mode in the opcode

  43. Encoding • Figure: variable, fixed, and hybrid instruction layouts • If code size is most important, use variable length instructions • If performance is most important, use fixed length instructions • Recent embedded machines (ARM, MIPS) added an optional mode to execute a subset of 16-bit wide instructions (Thumb, MIPS16); per procedure, decide performance or density • Some architectures are actually exploring on-the-fly decompression for more density.

  44. Operation Summary • Support these simple instructions, since they will dominate the number of instructions executed: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return.

  45. Example: MIPS Instruction Formats and Addressing Modes • All instructions 32 bits wide
     Register (direct):  op | rs | rt | rd     → register
     Immediate:          op | rs | rt | immed
     Base+index:         op | rs | rt | immed  → register + immed → Memory
     PC-relative:        op | rs | rt | immed  → PC + immed → Memory
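A sketch of how the fields on this slide pack into a 32-bit word. The slide text does not give field widths; the widths used here (6-bit op, 5-bit register fields, 16-bit immediate, plus the 5-bit shamt and 6-bit funct fields of the register format) are the standard MIPS32 layout and are an assumption relative to the slide. The function names are hypothetical, and register numbers 8, 9, 10 stand for `$t0`, `$t1`, `$t2`.

```python
def _check_field(name: str, value: int, width: int) -> None:
    """Reject values that do not fit in an unsigned field of the given width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")


def encode_r_type(rs: int, rt: int, rd: int, shamt: int, funct: int, op: int = 0) -> int:
    """Pack op | rs | rt | rd | shamt | funct (6/5/5/5/5/6 bits)."""
    for name, value, width in (("op", op, 6), ("rs", rs, 5), ("rt", rt, 5),
                               ("rd", rd, 5), ("shamt", shamt, 5), ("funct", funct, 6)):
        _check_field(name, value, width)
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i_type(op: int, rs: int, rt: int, immed: int) -> int:
    """Pack op | rs | rt | immed (6/5/5/16 bits); immed is a signed 16-bit value."""
    for name, value, width in (("op", op, 6), ("rs", rs, 5), ("rt", rt, 5)):
        _check_field(name, value, width)
    if not -(1 << 15) <= immed < (1 << 15):
        raise ValueError(f"immed={immed} does not fit in 16 signed bits")
    return (op << 26) | (rs << 21) | (rt << 16) | (immed & 0xFFFF)


# add $t0, $t1, $t2: rd=8, rs=9, rt=10, funct=0x20
add_word = encode_r_type(rs=9, rt=10, rd=8, shamt=0, funct=0x20)
# addi $t0, $t1, -1: op=8, rs=9, rt=8
addi_word = encode_i_type(op=8, rs=9, rt=8, immed=-1)

print(f"{add_word:#010x}")   # 0x012a4020
print(f"{addi_word:#010x}")  # 0x2128ffff
```

Because every format keeps op, rs, and rt in the same bit positions, a decoder can extract those fields before it knows which format it is looking at.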

  46. Instruction Set Design Metrics • Static Metrics • How many bytes does the program occupy in memory? • Dynamic Metrics • How many instructions are executed? • How many bytes does the processor fetch to execute the program? • How many clocks are required per instruction? • How "lean" a clock is practical? • Figure labels: Instruction Count, CPI, Cycle Time
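The three quantities named on this slide combine in the usual CPU time relation, execution time = instruction count × CPI × cycle time. The sketch below assumes that relation; the instruction mix, cycle counts, and clock period are made-up numbers used only to show the arithmetic.

```python
def average_cpi(instruction_mix):
    """Weighted CPI from (fraction_of_instructions, cycles_per_instruction) pairs."""
    total_fraction = sum(fraction for fraction, _ in instruction_mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in instruction_mix)


def execution_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """Execution time in seconds = instruction count * CPI * cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical mix: 50% ALU ops at 1 cycle, 30% loads/stores at 2, 20% branches at 3.
mix = [(0.5, 1), (0.3, 2), (0.2, 3)]
cpi = average_cpi(mix)
seconds = execution_time(1_000_000, cpi, 1e-9)  # one million instructions, 1 ns clock

print(f"CPI = {cpi:.2f}")
print(f"execution time = {seconds * 1e3:.2f} ms")
```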

  47. Instruction Sequencing

  48. Instruction Sequencing • The next instruction to be executed is typically implied • Instructions execute sequentially • Instruction sequencing increments a Program Counter • Sequencing flow is disrupted conditionally and unconditionally • The ability of computers to test results and conditionally execute instructions is one of the reasons computers have become so useful • Figure: a sequential flow (Instruction 1, Instruction 2, Instruction 3) beside a flow with a branch (Instruction 1, Instruction 2, Conditional Branch, Instruction 4) • Branch instructions are ~20% of all instructions executed

  49. Dynamic Frequency

  50. Condition Testing • Condition Codes: processor status bits are set as a side effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions. Ex: add r1, r2, r3; bz label • Condition Register. Ex: cmp r1, r2, r3; bgt r1, label • Compare and Branch. Ex: bgt r1, r2, label
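The three styles differ only in where the comparison outcome lives between the test and the branch. The sketch below models each one as a function that returns whether the branch is taken. The function names are hypothetical, and the operand roles are assumptions about the slide's examples: in `cmp r1, r2, r3` the result is taken to land in `r1`, and in `bgt r1, r2, label` the branch is taken when `r1 > r2`.

```python
def branch_with_condition_codes(r2: int, r3: int) -> bool:
    """add r1, r2, r3 sets a zero flag as a side effect; bz branches on that flag."""
    r1 = r2 + r3
    zero_flag = (r1 == 0)   # status bit written implicitly by the add
    return zero_flag        # bz label: taken when the flag is set


def branch_with_condition_register(r2: int, r3: int) -> bool:
    """cmp writes the outcome into a general register; the branch tests that register."""
    r1 = 1 if r2 > r3 else 0   # cmp r1, r2, r3: outcome stored in r1
    return r1 != 0             # bgt r1, label: taken when r1 records "greater"


def compare_and_branch(r1: int, r2: int) -> bool:
    """A single instruction compares both operands and decides the branch."""
    return r1 > r2             # bgt r1, r2, label


print(branch_with_condition_codes(3, -3))    # True: the sum is zero
print(branch_with_condition_register(5, 2))  # True: 5 > 2
print(compare_and_branch(1, 4))              # False: 1 is not greater than 4
```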
