What is an ISA?

juanwells

Presentation Transcript


  1. What is an ISA? • Hardware-software interface • Instruction Set Architecture (ISA) defines: • STATE OF THE PROGRAM (processor registers, memory) • WHAT INSTRUCTIONS DO: Semantics of instructions, how they update state • HOW INSTRUCTIONS ARE REPRESENTED: Syntax (bit encodings) • …selected so that the implications of the above for hardware design and compiler design are optimal • Example: if the register specifier field moves around between different instructions, the hardware needs multiple lines and a mux in front of the register file.

  2. Why is the ISA important? • Fixed h/w-s/w interface for a generation of processors • IBM realized early the value of a fixed ISA • But: “stuck” with bad decisions for a long time • Recent developments mitigate ISA problems (e.g., x86 micro-ops, Transmeta, virtual machines) • ISA decisions affect: (Revisit RISC vs. CISC…) • Memory cost of the machine • Short vs. long bit encodings • High vs. low semantic meaning per instruction • Hardware design • Simple, uniform-complexity ops => efficient pipeline • Don’t build hardware for instructions that never get used • Compiler and programming language issues • How much can the compiler exploit the ISA to optimize performance? • How well does the ISA support high-level language constructs? • Hand-coded vs. compiler-generated code: semantics that are easy for a programmer to use vs. easy for a compiler to generate code for

  3. ISA Design Decisions & Outline: • Style of operand specification: stack, accumulator, registers, etc. • Operand access limitations • Addressing modes for operands • Semantics: • Mix of operations supported • Control transfers • Encoding tradeoffs • Compiler influence • Example: MIPS

  4. Styles of ISAs

  5. Styles of ISAs • All implement: • C=A+B
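
The diagram for this slide did not survive the transcript. As a hedged reconstruction, the sequences below are the ones textbooks commonly use to compare the four styles on C=A+B; the mnemonics, the tuple format, and the toy interpreters are assumptions made for illustration, not any real ISA's syntax.

```python
# Toy interpreters for four ISA styles, each computing C = A + B.
# Mnemonics and operand formats are illustrative assumptions.

def run_stack(program, mem):
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(mem[args[0]])
        elif op == "ADD":  # zero explicit operands: pops two values, pushes the sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":
            mem[args[0]] = stack.pop()


def run_accumulator(program, mem):
    acc = 0
    for op, *args in program:
        if op == "LOAD":
            acc = mem[args[0]]
        elif op == "ADD":  # one explicit operand; the accumulator is implicit
            acc += mem[args[0]]
        elif op == "STORE":
            mem[args[0]] = acc


def run_registers(program, mem):
    """Runs both register-memory and load-store programs."""
    regs = {}

    def value(name):  # names starting with "R" are registers, others are memory
        return regs[name] if name.startswith("R") else mem[name]

    for op, *args in program:
        if op == "LOAD":
            regs[args[0]] = mem[args[1]]
        elif op == "ADD":
            regs[args[0]] = value(args[1]) + value(args[2])
        elif op == "STORE":
            mem[args[1]] = regs[args[0]]


STYLES = {
    "stack": (run_stack, [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]),
    "accumulator": (run_accumulator, [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")]),
    "register-memory": (run_registers,
                        [("LOAD", "R1", "A"), ("ADD", "R3", "R1", "B"), ("STORE", "R3", "C")]),
    "load-store": (run_registers,
                   [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                    ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")]),
}


def evaluate(style, a=2, b=3):
    """Run one style's program; return (C, instruction count)."""
    runner, program = STYLES[style]
    mem = {"A": a, "B": b, "C": 0}
    runner(program, mem)
    return mem["C"], len(program)


for style in STYLES:
    print(style, evaluate(style))
```

The instruction counts differ (4, 3, 3, 4), but counts alone say little about code size: the stack and accumulator instructions name fewer operands, so each one can be encoded in fewer bits.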

  6. Why stacks, accumulators • Stacks: • Very compact format • All calculation operations take zero operands • Example use: Java bytecode (low network b/w) • Theoretically shortest code for implementing arithmetic expressions • All HP calculator fanatics know this • Accumulator: • Also a very compact format • Less dependence on memory than stack-based • For both: • Compact implies memory efficient • Good if memory is expensive
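
To make "all calculation operations take zero operands" concrete, here is a minimal sketch that compiles an expression tree into stack code with a post-order walk and then evaluates it the way an RPN (HP-style) calculator would. The tuple-based tree format and the mnemonics are assumptions for illustration.

```python
# Sketch: expression tree -> zero-operand stack code -> evaluation.
# Tree format ("OP", left, right) and mnemonics are illustrative assumptions.
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def gen_stack_code(expr):
    """Leaves become PUSH; operators become zero-operand instructions."""
    if isinstance(expr, str):
        return [("PUSH", expr)]
    op, left, right = expr
    return gen_stack_code(left) + gen_stack_code(right) + [(op,)]


def eval_stack(code, env):
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(env[instr[1]])
        else:  # operands come implicitly from the top of the stack
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[instr[0]](a, b))
    return stack.pop()


expr = ("SUB", ("MUL", "A", "B"), ("MUL", "C", "D"))   # (A*B) - (C*D)
code = gen_stack_code(expr)
print(code)
print(eval_stack(code, {"A": 2, "B": 3, "C": 4, "D": 1}))
```

Only PUSH names an operand; MUL and SUB carry nothing but an opcode, which is where the compact encoding comes from.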

  7. Why registers? • Faster than memory • Latency: raw access time (once address is known) • Cache access: 2-3 cycles (typical) • Register access: 1 cycle • Register file typically smaller than data cache • Register file doesn’t need tag check logic • Bandwidth: more practical to multiport a register file • ILP requires large number of operand ports • ILP requirements • High-performance scheduling (ILP) requires detecting data dependent/independent operations early in pipeline • Register “addresses” are known at instruction decode time • Memory addresses are known quite late due to address computation

  8. Why Registers? (cont.) • Less memory traffic if values are in registers • Program runs faster if variables are inside registers (compiler does “register allocation”) • Bus can be used for other things (e.g., I/O) • More flexible for compiler/hardware scheduling • (A*B) - (C*D) - (E*F) • A*B in R1, -C*D in R2, -E*F in R3: can easily rearrange ADD instructions • A to F on the stack: less flexible • Need to add swaps/rotates or completely rewrite code

  9. How many registers? • Depends on: • Compiler ability • Program characteristics • Lots-o-registers enable two important optimizations: • Register allocation (more variables can be in registers) • Limiting reuse of registers improves parallelism • Reuse example: Load R2, A; Load R3, B; Load R4, C; Load R5, D; then either • Add R1, R2, R3; Add R2, R5, R4 (reuse of R2: the conflict artificially serializes the two instructions) • or Add R1, R2, R3; Add R6, R4, R5 (no reuse: R6 is available) • Without reuse, the two Adds are “parallelizable” if there are two adders • Instruction-level parallelism (ILP) • ILP ~ 1 / (average CPI) ~ number of registers
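
The reuse example can be checked mechanically. This sketch classifies the register-name dependences between two instructions; the `Add(rd, rs, rt)` tuple is an assumption for illustration. With reuse, the only link between the two Adds is a write-after-read (anti-dependence) on R2: no data flows between them, which is why the slide calls the serialization artificial. Picking a free register (R6) removes it.

```python
# Sketch: classify the register-name dependences between two instructions.
# The Add tuple (rd, rs, rt) is an illustrative assumption.
from typing import NamedTuple


class Add(NamedTuple):
    rd: str  # destination
    rs: str  # source 1
    rt: str  # source 2


def dependences(first, second):
    """Dependences that force `second` to stay ordered after `first`."""
    found = set()
    if first.rd in (second.rs, second.rt):
        found.add("RAW")  # true dependence: second reads first's result
    if second.rd in (first.rs, first.rt):
        found.add("WAR")  # anti-dependence: second overwrites a source of first
    if second.rd == first.rd:
        found.add("WAW")  # output dependence: both write the same register
    return found


reuse = (Add("R1", "R2", "R3"), Add("R2", "R5", "R4"))
no_reuse = (Add("R1", "R2", "R3"), Add("R6", "R4", "R5"))
print(dependences(*reuse))     # {'WAR'}
print(dependences(*no_reuse))  # set()
```

Because register names are fixed in the instruction bits, this check can run at decode time, which is the point made on the "Why registers?" slide.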

  10. Operand access limitations • Notation (m, n): m memory operands out of n total operands per ALU instruction • Load/store (0,3) • (+) Fixed-length instructions possible: easy fetch/decode • (+) Simpler h/w: efficient pipeline & potentially lower cycle time (CT) • (-) Higher instruction count (IC) • (-) Fixed-length instructions are wasteful • Register/memory (1,2) • (+) No need for extra loads • (+) “A few lengths” better uses bits • (-) Destroys source operand (e.g., Add R1,R2) • (-) May impact CPI • Memory/memory • (+) Most compact (good code density) • (-) High memory traffic (memory bottleneck)

  11. Alignment • Byte alignment • Any access is accommodated • Word alignment • Only accesses that are aligned at natural word boundaries are accommodated, due to DRAM/SRAM organization • Reduces number of reads/writes to memory • Eliminates hardware for alignment (typically expensive) • Often handle misalignment via software: • Compiler detects & generates appropriate instructions • …or O/S detects and runs “fixit” routine
  Figure (unaligned access, word size = 4 bytes): memory bytes 0 to 7 are read as two words, read #1 = bytes 0 1 2 3 and read #2 = bytes 4 5 6 7. Asking for words beginning at 0 or 4 is OK; asking for other words requires two reads (e.g., the word starting at 2 needs bytes 2 3 from read #1 and bytes 4 5 from read #2, reordered into 2 3 4 5).
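
The two-read case in the alignment figure can be sketched as a software "fixit" routine. The memory here is a hypothetical one that only accepts aligned 4-byte reads; the function names are illustrative, not from any real OS.

```python
# Sketch: a memory that only supports aligned 4-byte reads, and a software
# "fixit" that builds an unaligned word from two aligned reads.
WORD = 4  # word size in bytes, as on the slide


def read_aligned(memory, addr):
    """The only access the (hypothetical) memory supports."""
    if addr % WORD:
        raise ValueError(f"address {addr} is not word-aligned")
    return memory[addr:addr + WORD]


def read_word(memory, addr):
    """Return (4 bytes starting at addr, number of aligned reads used)."""
    offset = addr % WORD
    if offset == 0:
        return read_aligned(memory, addr), 1
    base = addr - offset
    first = read_aligned(memory, base)          # read #1
    second = read_aligned(memory, base + WORD)  # read #2
    # Keep the tail of read #1 and the head of read #2, in address order.
    return first[offset:] + second[:offset], 2


MEM = bytes(range(8))  # byte i holds value i, so values equal addresses
print(read_word(MEM, 4))  # aligned: one read
print(read_word(MEM, 2))  # unaligned: two reads, bytes 2 3 4 5
```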

  12. Endian-ness • Where is the most-significant byte (MSB) in a word? • Little-endian (e.g., x86) • “little”-endian comes from interpreting byte address 0 as the “least”-significant byte • Big-endian (e.g., IBM PowerPC) • “big”-endian comes from interpreting byte address 0 as the “most”-significant byte
  Figure: two words with byte addresses 0 to 7, laid out from MSB to LSB, one for each byte order.
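
A short sketch of the two byte orders, using Python's built-in `int.to_bytes` and `struct` (the value 0x0A0B0C0D is an arbitrary example chosen so that each byte is easy to spot):

```python
# Sketch: the same 32-bit value stored under each byte order.
import struct

value = 0x0A0B0C0D  # MSB is 0x0A, LSB is 0x0D
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

for addr in range(4):
    print(f"byte address {addr}: little-endian 0x{little[addr]:02X}, "
          f"big-endian 0x{big[addr]:02X}")

# struct uses "<" for little-endian and ">" for big-endian layouts.
print(struct.pack("<I", value) == little, struct.pack(">I", value) == big)  # True True

# Reading bytes with the wrong byte order reverses them.
misread = int.from_bytes(little, "big")
print(hex(misread))  # 0xd0c0b0a
```

Byte address 0 holds 0x0D (the least-significant byte) in the little-endian layout and 0x0A (the most-significant byte) in the big-endian one, matching the naming rule on the slide.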

  13. Common addressing modes • Register • Add R4, R3 • R4 = R4 + R3 • Used when value is in a register • Immediate • Add R4, #3 • R4 = R4 + 3 • Useful for small constants, which occur frequently • Displacement • Add R4, 100(R1) • R4 = R4 + Mem[100+R1] • Accesses the frame (arguments, local variables) • Accesses the global data segment • Accesses fields of a data struct
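
The three basic modes can be written as one-line functions on a toy machine. This is a sketch: `regs` (register name to value) and `mem` (address to value) are assumed dictionaries, and the register names follow the slide.

```python
# Sketch of the three basic modes on a toy machine (regs/mem dicts are assumptions).

def add_register(regs, rd, rs):                    # Add R4, R3      : R4 = R4 + R3
    regs[rd] += regs[rs]


def add_immediate(regs, rd, imm):                  # Add R4, #3      : R4 = R4 + 3
    regs[rd] += imm


def add_displacement(regs, mem, rd, disp, base):   # Add R4, 100(R1) : R4 = R4 + Mem[100+R1]
    regs[rd] += mem[disp + regs[base]]


# R1 points at a record (e.g., a stack frame or struct); the displacement selects a field.
regs = {"R1": 200, "R3": 7, "R4": 10}
mem = {300: 42}                                # the field at offset 100 from address 200
add_register(regs, "R4", "R3")                 # R4 = 10 + 7 = 17
add_immediate(regs, "R4", 3)                   # R4 = 17 + 3 = 20
add_displacement(regs, mem, "R4", 100, "R1")   # R4 = 20 + Mem[300] = 62
print(regs["R4"])
```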

  14. Addressing modes (cont.) • Register deferred/Register indirect • Add R3, (R1) • R3 = R3 + Mem[R1] • Access using a computed address • Indexed • Add R3, (R1 + R2) • R3 = R3 + Mem[R1 + R2] • Array accesses • R1 = base, R2 = index • Direct/Absolute • Add R1, (1001) • R1 = R1 + Mem[1001] • Accessing global (“static”) data
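
Continuing the same toy-machine sketch (again assuming `regs` and `mem` dictionaries, with register names taken from the slide), the indirect, indexed, and absolute modes differ only in how the effective address is formed:

```python
# Sketch: register indirect, indexed, and absolute modes on a toy machine.

def add_reg_indirect(regs, mem, rd, rs):       # Add R3, (R1)      : R3 = R3 + Mem[R1]
    regs[rd] += mem[regs[rs]]


def add_indexed(regs, mem, rd, base, index):   # Add R3, (R1 + R2) : R3 = R3 + Mem[R1 + R2]
    regs[rd] += mem[regs[base] + regs[index]]


def add_absolute(regs, mem, rd, addr):         # Add R1, (1001)    : R1 = R1 + Mem[1001]
    regs[rd] += mem[addr]


# A byte array starting at address 1000 (values are arbitrary).
mem = {1000: 5, 1001: 6, 1002: 7}
regs = {"R1": 1000, "R2": 2, "R3": 0}
add_reg_indirect(regs, mem, "R3", "R1")    # R3 = 0 + Mem[1000] = 5
add_indexed(regs, mem, "R3", "R1", "R2")   # R3 = 5 + Mem[1000 + 2] = 12
add_absolute(regs, mem, "R1", 1001)        # R1 = 1000 + Mem[1001] = 1006
print(regs)
```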

  15. Addressing modes (cont.) • Memory indirect/Memory deferred • Add R1, @(R3) • R1 = R1 + Mem[Mem[R3]] • Pointer dereferencing: x = *p; (if p is not register-allocated) • Autoincrement/Postincrement • Add R1, (R2)+ • R1 = R1 + Mem[R2]; R2 = R2 + d (d is size of operation) • Looping through arrays, stack pop • Autodecrement/Predecrement • Add R1, -(R2) • R2 = R2 - d; R1 = R1 + Mem[R2] (d is size of operation) • Same uses as autoincrement, stack push • Scaled • Add R1, 100(R2)[R3] • R1 = R1 + Mem[100+R2+R3*d] (d is size of operation) • Array accesses for non-byte-sized elements
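
The remaining modes fit the same toy-machine sketch. The data layout (three 4-byte integers at 400, a pointer stored at 500) is an assumption chosen so that each mode's effective address is easy to follow; `d` is the operand size, as on the slide.

```python
# Sketch: memory indirect, autoincrement, autodecrement, and scaled modes.
# The regs/mem dicts and the data layout are illustrative assumptions.

def add_mem_indirect(regs, mem, rd, rs):       # Add R1, @(R3) : R1 += Mem[Mem[R3]]
    regs[rd] += mem[mem[regs[rs]]]


def add_postinc(regs, mem, rd, rs, d):         # Add R1, (R2)+ : R1 += Mem[R2]; R2 += d
    regs[rd] += mem[regs[rs]]
    regs[rs] += d


def add_predec(regs, mem, rd, rs, d):          # Add R1, -(R2) : R2 -= d; R1 += Mem[R2]
    regs[rs] -= d
    regs[rd] += mem[regs[rs]]


def add_scaled(regs, mem, rd, disp, base, index, d):    # Add R1, 100(R2)[R3]
    regs[rd] += mem[disp + regs[base] + regs[index] * d]  # R1 += Mem[100 + R2 + R3*d]


# Three 4-byte ints at 400, 404, 408; a pointer p stored at 500 holds 404.
mem = {400: 10, 404: 20, 408: 30, 500: 404}

walk = {"R1": 0, "R2": 400}
for _ in range(3):                        # looping through an array
    add_postinc(walk, mem, "R1", "R2", 4)
add_predec(walk, mem, "R1", "R2", 4)      # step back one element: adds Mem[408] again

scaled = {"R1": 0, "R2": 300, "R3": 2}
add_scaled(scaled, mem, "R1", 100, "R2", "R3", 4)   # Mem[100 + 300 + 2*4] = Mem[408]

deref = {"R1": 0, "R3": 500}
add_mem_indirect(deref, mem, "R1", "R3")            # x = *p with p in memory: Mem[Mem[500]]

print(walk, scaled["R1"], deref["R1"])
```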

  16. Wisdom about modes • Need: • Register, Displacement, Immediate and optionally Indexed (indexed simplifies array accesses) • Displacement size 12-16 bits (empirical) • Immediate: 8 to 16 bits (empirical) • Can synthesize the rest from simpler instructions • Example: MIPS architecture: • Register, displacement, Immediate modes only • both immediate and displacement: 16 bits • Choice depends on workload! • For example, floating-point codes might require larger immediates, or 64-bit word-size machines might also require larger immediates (for *p++ kind of operations)
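
As one example of "synthesize the rest from simpler instructions": with only 16-bit immediates, MIPS builds a 32-bit constant with LUI (load upper immediate) followed by ORI. The helper functions below are hypothetical, the register names follow the slides' R-style, and the sketch assumes 32-bit registers (on MIPS64, LUI also sign-extends its result into the upper 32 bits).

```python
# Sketch: building a 32-bit constant from 16-bit immediates, MIPS-style.
# Helper names are hypothetical; assumes 32-bit registers.

def split32(value):
    """Split an unsigned 32-bit constant into the LUI and ORI immediates."""
    if not 0 <= value < (1 << 32):
        raise ValueError("expects an unsigned 32-bit value")
    return value >> 16, value & 0xFFFF


def synthesize(value, reg="R1"):
    """Instructions that leave value in reg (R0 is hardwired to 0)."""
    upper, lower = split32(value)
    if upper == 0:  # ORI zero-extends its immediate, so one instruction suffices
        return [f"ORI {reg},R0,0x{lower:04X}"]
    return [f"LUI {reg},0x{upper:04X}", f"ORI {reg},{reg},0x{lower:04X}"]


def fits_imm16(value):
    """Would value fit a signed 16-bit immediate/displacement field?"""
    return -(1 << 15) <= value < (1 << 15)


print(synthesize(0x12345678))                 # ['LUI R1,0x1234', 'ORI R1,R1,0x5678']
print(synthesize(100))                        # ['ORI R1,R0,0x0064']
print(fits_imm16(-32768), fits_imm16(40000))  # True False
```

The trade-off is the one the slide describes: large constants are rare enough that spending a second instruction on them costs less than widening every immediate field.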

  17. Control transfer semantics • Types of branches • Conditional • Unconditional • Normal • Call • Return • PC Relative (Branch) vs. Absolute (Jump) • Branch allows relocatable (“position independent”) code • Jump allows branching further than PC relative
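
Why a PC-relative branch gives position-independent code, in a small sketch. It assumes the MIPS convention for conditional branches (target = PC + 4 + sign-extended 16-bit offset × 4); other ISAs scale and bias the offset differently.

```python
# Sketch of PC-relative branch targets, assuming the MIPS convention:
# target = (PC + 4) + sign_extended(offset16) * 4.

def sign_extend16(field):
    field &= 0xFFFF
    return field - (1 << 16) if field & 0x8000 else field


def branch_target(pc, offset16):
    return pc + 4 + (sign_extend16(offset16) << 2)


# A backward branch: offset field 0xFFFD encodes -3 instructions.
print(hex(branch_target(0x1000, 0xFFFD)))   # 0xff8

# Relocating the code moves the target by the same amount, so the encoded
# offset never changes: this is what makes PC-relative code position independent.
print(hex(branch_target(0x9000, 0xFFFD)))   # 0x8ff8

# Reach is limited by the 16-bit field: at most 2**15 - 1 instructions forward.
print(hex(branch_target(0, 0x7FFF)))        # 0x20000
```

The last line shows the flip side noted on the slide: targets farther away than the offset field can express need a jump instead.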

  18. Parts of a control transfer • WHERE • Determine target address • WHETHER • Determine if transfer should occur or not • WHEN • Determine when in time the transfer should occur • Each of the three decisions can be decoupled

  19. Types of control transfer (cont). • All three together: Compare and branch instruction • Br (R1 = R2), destination • (+) A single instruction • (-) Heavy hardware requirement, inflexible scheduling • WHETHER separate from WHERE/WHEN: • Condition code register (CMP R1,R2 … BEQ dest) • (+) Sometimes test happens “for free” • (-) Hard for compiler to figure out which instructions depend on CC register • Condition register (SUB R1,R2 … BEQ R1, dest) • (+) Simple to implement, dependencies between instructions are obvious to compiler • (-) Uses a register (“register pressure”)
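
The compiler-visibility difference between the two decoupled styles can be sketched with two toy interpreters. The two-operand SUB follows the slide (SUB R1,R2 means R1 = R1 - R2); the interpreters, the initial register values, and the assumption that SUB also sets the zero flag (as x86's SUB sets ZF) are illustrative.

```python
# Sketch: condition-code register vs. condition register.
# Interpreters and initial values are illustrative assumptions.

INITIAL = {"R1": 7, "R2": 7, "R3": 5, "R4": 2}


def run_condition_codes(program):
    """CMP and SUB both write a hidden zero flag Z; BEQ reads it."""
    regs, z = dict(INITIAL), False
    for op, *args in program:
        if op == "CMP":
            z = regs[args[0]] == regs[args[1]]
        elif op == "SUB":
            regs[args[0]] -= regs[args[1]]
            z = regs[args[0]] == 0   # the "free" test: SUB already sets Z
        elif op == "BEQ":
            return z                 # True means the branch is taken


def run_condition_register(program):
    """The condition lives in a named register; BEQ R, dest tests R == 0."""
    regs = dict(INITIAL)
    for op, *args in program:
        if op == "SUB":
            regs[args[0]] -= regs[args[1]]
        elif op == "BEQ":
            return regs[args[0]] == 0


cc = [("CMP", "R1", "R2"), ("BEQ",)]
cc_moved = [("CMP", "R1", "R2"), ("SUB", "R3", "R4"), ("BEQ",)]  # unrelated SUB scheduled into the gap
cr = [("SUB", "R1", "R2"), ("BEQ", "R1")]
cr_moved = [("SUB", "R1", "R2"), ("SUB", "R3", "R4"), ("BEQ", "R1")]

print(run_condition_codes(cc), run_condition_codes(cc_moved))        # True False
print(run_condition_register(cr), run_condition_register(cr_moved))  # True True
```

Moving an unrelated SUB between CMP and BEQ silently changes the branch outcome in the condition-code version, because the dependence on Z never appears in any operand field. In the condition-register version the dependence is spelled out (BEQ names R1), at the cost of tying up R1.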

  20. Prepare-to-branch • Decouple all three of WHERE / WHETHER / WHEN • WHERE: PBR BTR1 = destination • BTR1 = “Branch target register #1” • WHETHER: CMP PR2 = (R1 = R2) • PR2 = “Predicate register #2” • WHEN: BR BTR1 if PR2 • (+) Schedule each instruction so it happens during “free time” when hardware is idle • (-) Three instructions: higher IC • From the HP Labs PlayDoh architecture

  21. Instruction Encoding tradeoffs • Variable width • Common instructions are short (1-2 bytes), less common or more complex instructions are long (>2 bytes) • (+) Very versatile, uses memory efficiently • (-) Each instruction must be decoded before the start of the next one (and thus the number of instructions in a fetched block) is known • Fixed width • Typically 1 instruction per 32-bit word (Alpha, a 64-bit machine, packs 2 instructions per 64-bit word) • (+) Every instruction word is an instruction; easier to fetch/decode • (-) Uses memory inefficiently
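
The decode cost of variable width shows up when finding where instructions start in a fetched block. The variable-width encoding below is hypothetical (the top two bits of an instruction's first byte give its length minus one); real variable-width ISAs such as x86 are far more involved, but the serial dependence is the same.

```python
# Sketch: finding instruction boundaries in a fetched block of bytes.
# The variable-width encoding is hypothetical: the top two bits of the first
# byte give the instruction's length minus one (so 1 to 4 bytes).

def variable_boundaries(code):
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += (code[pc] >> 6) + 1  # must decode this instruction to find the next one
    return starts


def fixed_boundaries(code, width=4):
    # Every start is known up front, so all instructions can be decoded in parallel.
    return list(range(0, len(code), width))


block = bytes([0x00, 0x40, 0xAA, 0xC0, 1, 2, 3, 0x80, 9, 9])
print(variable_boundaries(block))   # [0, 1, 3, 7]: lengths 1, 2, 4, 3
print(fixed_boundaries(bytes(12)))  # [0, 4, 8]
```

The variable-width block packs four instructions into 10 bytes, while the fixed-width one needs 12 bytes for three: the memory-efficiency versus decode-simplicity trade-off on the slide.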

  22. Addressing mode encoding • Each operand has a “mode” field • Also called “address specifiers” • VAX, 68000 • (+) Very versatile • (-) Encourages variable-width instructions (hard decode) • Opcode specifies addressing mode • Most RISCs • (+) Encourages fixed-width instructions (easy decode) • (+) “Natural” for a load/store ISA • (-) Limits what every instruction can do • But only matters for loads and stores

  23. Compiler impact • High-level opt: • Use a “virtual source level” representation • Loop interchange, etc. • Low-level opt: • Clean up parser refuse • Each “optimization pass” runs as a filter • Enhance parallelism • Code generation: • Allocate registers • Schedule code for high performance • More later on this
  Figure (compiler structure): Parse -> high-level intermediate language (high-level optimize) -> translate -> low-level intermediate language -> low-level optimize -> low-level intermediate language -> code generation (allocate, schedule) -> assembly code

  24. Example: MIPS • A load/store, fixed-encoding architecture with a “condition register” • Opcode is in the same place for every instruction • Instruction formats (field widths in bits):
  I-type instruction: Opcode (6) | rs1 (5) | rd (5) | Immediate (16) • Load, store, all immediate operations, conditional branches (rd unused) • Jump through register, call through register (“jump and link register”)
  R-type instruction: Opcode (6) | rs1 (5) | rs2 (5) | rd (5) | Shamt (5) | Func (6) • Register-register ALU operations • “Func” is an opcode extension
  J-type instruction: Opcode (6) | Offset added to PC (26) • Jump, call (“jump and link”), trap and return from exception
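
A sketch of packing and unpacking the three 32-bit formats, using the field names and widths from this slide (most significant field first). The example uses the real MIPS encoding of `add $1, $2, $3`: R-type, opcode 0, function code 0x20.

```python
# Sketch: encode/decode the I-, R-, and J-type formats with the slide's field widths.

R_TYPE = (("opcode", 6), ("rs1", 5), ("rs2", 5), ("rd", 5), ("shamt", 5), ("func", 6))
I_TYPE = (("opcode", 6), ("rs1", 5), ("rd", 5), ("immediate", 16))
J_TYPE = (("opcode", 6), ("offset", 26))


def encode(fmt, **fields):
    word = 0
    for name, width in fmt:
        value = fields.get(name, 0)  # unspecified fields default to 0
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(fmt, word):
    fields, shift = {}, 32
    for name, width in fmt:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


word = encode(R_TYPE, opcode=0, rs1=2, rs2=3, rd=1, func=0x20)
print(hex(word))  # 0x430820

# The opcode is always the top 6 bits, so it can be read before the format is known.
print(decode(I_TYPE, word)["opcode"] == decode(R_TYPE, word)["opcode"])  # True
```

A negative immediate or offset must be masked to its field width first (e.g., `imm & 0xFFFF`), since `encode` only accepts unsigned field values.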

  25. ISA of MIPS 64 • Registers: 32 64-bit integer registers R0 ... R31 (R0 is permanently 0) plus the PC; 32 64-bit floating-point registers F0 ... F31 (single-precision ops use one half of Fi) • Load/store architecture • Transfer sizes: B (byte), H (halfword), W (word), D (double word) • No unaligned accesses allowed • Only 3 addressing modes: register, immediate, displacement

  26. MIPS example code
        DADDI R1,R0,10     ; Put 10 into R1 (R0 = 0)
        LD    R2,A         ; Put A in R2
  Loop: L.D   F0,0(R2)     ; Load double FP value into F0
        ADD.D F4,F0,F2     ; F4 = F0 + F2
        S.D   F4,0(R2)     ; Store result back to memory
        DADDI R1,R1,-1     ; Decrement loop counter R1
        DADDI R2,R2,8      ; Increment loop pointer (8 bytes per double)
        BNE   R1,R0,Loop   ; Repeat until R1 reaches 0
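
A Python model of what the example loop computes, as a sketch. Assumptions: A holds the address of an array of at least 10 doubles, the array sits at the arbitrary address 0x1000, and branch delay slots are ignored.

```python
# Sketch: Python model of the MIPS example loop (assumptions in the lead-in).

def run_loop(values, f2, base=0x1000):
    if len(values) < 10:
        raise ValueError("the loop touches 10 doubles")
    mem = {base + 8 * i: v for i, v in enumerate(values)}  # 8 bytes per double
    r1 = 10          # DADDI: R1 = 10
    r2 = base        # LD: R2 = A (assumed to be the array address)
    while True:
        f0 = mem[r2]     # L.D: F0 = Mem[R2]
        f4 = f0 + f2     # ADD.D: F4 = F0 + F2
        mem[r2] = f4     # S.D: Mem[R2] = F4
        r1 -= 1          # DADDI: R1 = R1 - 1
        r2 += 8          # DADDI: R2 = R2 + 8
        if r1 == 0:      # BNE: repeat while R1 != 0
            break
    return [mem[base + 8 * i] for i in range(len(values))], r2 - base


result, bytes_advanced = run_loop([float(i) for i in range(10)], 0.5)
print(result)          # each element increased by 0.5
print(bytes_advanced)  # 80 = 10 iterations * 8 bytes
```

Each pass performs one load, one FP add, one store, and three bookkeeping instructions (two DADDIs and the branch); that overhead is what compiler loop optimizations later try to reduce.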
