
Computer Architecture



  1. Princess Sumaya University for Technology Computer Architecture Dr. Esam Al_Qaralleh

  2. Review

  3. The Von Neumann Machine, 1945
     • The Von Neumann model consists of five major components:
       • input unit
       • output unit
       • ALU
       • memory unit
       • control unit
     • Sequential execution

  4. Von Neumann Model
     • A refinement of the Von Neumann model, the system bus model has a CPU (ALU and control), memory, and an input/output unit.
     • Communication among components is handled by a shared pathway called the system bus, which is made up of the data bus, the address bus, and the control bus. There is also a power bus, and some architectures may also have a separate I/O bus.

  5. Performance
     • Both hardware and software affect performance:
       • The algorithm determines the number of source-level statements
       • The language/compiler/architecture determine the machine instructions
       • The processor/memory determine how fast instructions are executed

  6. Computer Architecture
     • Instruction Set Architecture (ISA) refers to the actual programmer-visible machine interface: instruction set, registers, memory organization, and exception handling. Two main approaches: RISC and CISC architectures.

  7. Applications Change over Time
     • Data sets & memory requirements → larger
       • Cache & memory architecture become more critical
     • Standalone → networked
       • I/O integration & system software become more critical
     • Single task → multiple tasks
       • Parallel architectures become critical
     • Limited I/O requirements → rich I/O requirements
       • 60s: tapes & punch cards
       • 70s: character-oriented displays
       • 80s: video displays, audio, hard disks
       • 90s: 3D graphics, networking, high-quality audio
       • 00s: real-time video, immersion, …

  8. Application Properties to Exploit in Computer Design
     • Locality in memory/I/O references
       • Programs work on a subset of instructions/data at any point in time
       • Both spatial and temporal locality
     • Parallelism
       • Data-level (DLP): same operation on every element of a data sequence
       • Instruction-level (ILP): independent instructions within a sequential program
       • Thread-level (TLP): parallel tasks within one program
       • Multi-programming: independent programs
     • Pipelining
     • Predictability
       • Control-flow direction, memory references, data values

  9. Levels of Machines
     • There are a number of levels in a computer, from the user level down to the transistor level.

  10. How Do the Pieces Fit Together?
     • [Layer diagram, top to bottom] Application → Operating System → Compiler / Firmware → Instruction Set Architecture → Instruction Set Processor / Memory system / I/O system → Datapath & Control → Digital Design → Circuit Design

  11. Instruction Set Architecture (ISA)
     • Complex Instruction Set (CISC)
       • Single instructions for complex tasks (string search, block move, FFT, etc.)
       • Usually variable-length instructions
       • Registers have specialized functions
     • Reduced Instruction Set (RISC)
       • Instructions for simple operations only
       • Usually fixed-length instructions
       • Large, orthogonal register sets

  12. RISC Architecture
     • RISC designers focused on two critical performance techniques in computer design:
       • the exploitation of instruction-level parallelism, first through pipelining and later through multiple instruction issue;
       • the use of caches, first in simple forms and later using sophisticated organizations and optimizations.

  13. RISC ISA Characteristics
     • All operations on data apply to data in registers and typically change the entire register
     • The only operations that affect memory are loads and stores, which move data from memory to a register or from a register to memory, respectively
     • A small number of memory addressing modes
     • Few instruction formats, with all instructions typically one size
     • A large number of registers
     • These simple properties lead to dramatic simplifications in the implementation of advanced pipelining techniques, which is why RISC instruction sets were designed this way

  14. Performance & Cost

  15. Computer Designers and Chip Costs
     • The computer designer affects die size, and hence cost, both by what functions are included on or excluded from the die and by the number of I/O pins.

  16. Measuring and Reporting Performance

  17. Performance
     • Execution time (response time, latency) – the time to do the task; response time is the time between the start and the completion of a task
     • Performance (throughput, bandwidth) – tasks per day, hour, week, second, ns, …
     • Thus, to maximize performance, we need to minimize execution time
     • If X is n times faster than Y, then
       Performance(X) / Performance(Y) = Execution time(Y) / Execution time(X) = n
     • Throughput – the total amount of work done in a given time
       • Important to data center managers
       • Decreasing response time almost always improves throughput

  18. Calculating CPU Performance
     • We want to distinguish elapsed time from the time spent on our task
     • CPU execution time (CPU time) – the time the CPU spends working on a task
       • Does not include time waiting for I/O or running other programs
     • We can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program (the governing equation is restated below)
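
For reference, the equation the following slides build on (it reappears in the example on slide 21):

    CPU time = Instruction count × CPI × Clock cycle time = (IC × CPI) / Clock rate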

  19. Calculating CPU Performance (Cont.)
     • We count instructions executed = IC (instruction count)
       • Looking at the object code is just a start; what we care about is the dynamic count – e.g., don't forget loops, recursion, branches, etc.
     • CPI (clocks per instruction) is a figure of merit

  20. Calculating CPU Performance (Cont.)
     • Three focus factors – cycle time, CPI, IC
       • Sadly, they are interdependent, and making one better often makes another worse (but with small or predictable impacts)
     • Cycle time depends on hardware technology and organization
     • CPI depends on organization (pipelining, caching, …) and the ISA
     • IC depends on the ISA and compiler technology
     • CPIs are often easier to deal with on a per-instruction basis

  21. Example of Computing CPU Time
     • If a computer has a clock rate of 50 MHz, how long does it take to execute a program with 1,000 instructions, if the CPI for the program is 3.5?
       Using CPU time = Instruction count × CPI / Clock rate:
       CPU time = 1000 × 3.5 / (50 × 10^6) = 70 µs
     • If a computer's clock rate increases from 200 MHz to 250 MHz and the other factors remain the same, how many times faster will the computer be?
       CPU time(old) / CPU time(new) = Clock rate(new) / Clock rate(old) = 250 MHz / 200 MHz = 1.25
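
A minimal Python check of both calculations above (the variable names are mine, not from the slides):

    ic = 1_000            # instruction count
    cpi = 3.5             # average clocks per instruction
    clock_rate = 50e6     # 50 MHz
    cpu_time = ic * cpi / clock_rate
    print(cpu_time)       # 7e-05 s, i.e. 70 microseconds

    # Speedup from raising the clock rate, all else equal
    print(250e6 / 200e6)  # 1.25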

  22. Evaluating ISAs
     • Design-time metrics:
       • Can it be implemented? In how long, and at what cost?
       • Can it be programmed? Ease of compilation?
     • Static metrics:
       • How many bytes does the program occupy in memory?
     • Dynamic metrics:
       • How many instructions are executed? How many bytes does the processor fetch to execute the program?
       • How many clocks are required per instruction?
     • Best metric: time to execute the program! It depends on the instruction set, the processor organization, and compilation techniques (IC, CPI, cycle time).

  23. Quantitative Principles of Computer Design

  24. Amdahl's Law
     • Defines the speedup gained from a particular feature
     • Depends on two factors:
       • The fraction of the original computation time that can take advantage of the enhancement – i.e., the commonality of the feature
       • The level of improvement gained by the feature
     • Amdahl's Law is a quantification of the principle of diminishing returns

  25. Amdahl's Law (Cont.)
     • Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
       ExTime(new) = ExTime(old) × ((1 − F) + F/S)
       Speedup(overall) = ExTime(old) / ExTime(new) = 1 / ((1 − F) + F/S)
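
The law is short enough to state as a one-line Python function; a minimal sketch (the function name and parameters are my own):

    def amdahl(f: float, s: float) -> float:
        """Overall speedup when a fraction f of the original execution
        time is accelerated by a factor s (the rest is unchanged)."""
        return 1.0 / ((1.0 - f) + f / s)

    print(amdahl(0.1, 2))    # 1.0526... (the floating-point example on slide 28)
    print(amdahl(0.1, 100))  # 1.1098...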

  26. Simple Example
     • An important application spends its time as follows:
       • FPSQRT: 20%
       • FP instructions: 50%
       • Other: 30%
     • Designers say it costs the same to speed up:
       • FPSQRT by 40x
       • FP by 2x
       • Other by 8x
     • Which one should you invest in? Plug in the numbers and compare – but what's your guess?
     • Note: Amdahl's Law says nothing about cost

  27. And the Winner Is…?
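
Working the numbers from slide 26 with Amdahl's Law (the original slide withholds the answer): speeding up FPSQRT by 40x gives 1 / (0.8 + 0.2/40) ≈ 1.24; speeding up FP by 2x gives 1 / (0.5 + 0.5/2) ≈ 1.33; speeding up "other" by 8x gives 1 / (0.7 + 0.3/8) ≈ 1.36. The modest 8x improvement applied to the largest fraction wins, which is exactly the diminishing-returns point the slide is making.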

  28. Example of Amdahl's Law
     • Floating-point instructions are improved to run twice as fast, but only 10% of the time was spent on these instructions originally. How much faster is the new machine?
       Speedup = ExTime(old) / ExTime(new) = 1 / ((1 − Fraction(enhanced)) + Fraction(enhanced) / Speedup(enhanced))
       Speedup = 1 / ((1 − 0.1) + 0.1/2) = 1.053
     • The new machine is 1.053 times as fast, or 5.3% faster.
     • How much faster would the new machine be if floating-point instructions became 100 times faster?
       Speedup = 1 / ((1 − 0.1) + 0.1/100) = 1.109

  29. Estimating Performance Improvements
     • Assume a processor currently requires 10 seconds to execute a program, and processor performance improves by 50 percent per year.
     • By what factor does processor performance improve in 5 years?
       (1 + 0.5)^5 = 7.59
     • How long will it take the processor to execute the program after 5 years?
       ExTime(new) = 10 / 7.59 = 1.32 seconds

  30. Performance Example
     • Computers M1 and M2 are two implementations of the same instruction set.
     • M1 has a clock rate of 50 MHz and M2 has a clock rate of 75 MHz.
     • M1 has a CPI of 2.8 and M2 has a CPI of 3.2 for a given program.
     • How many times faster is M2 than M1 for this program?
       ExTime(M1) / ExTime(M2) = (IC × CPI(M1) / Clock rate(M1)) / (IC × CPI(M2) / Clock rate(M2)) = (2.8/50) / (3.2/75) ≈ 1.31
     • What would the clock rate of M1 have to be for the two machines to have the same execution time?
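
The second question is left open on the slide; working it out: for equal execution times we need CPI(M1) / Clock rate(M1) = CPI(M2) / Clock rate(M2), i.e. 2.8 / f = 3.2 / 75 MHz, so f = 2.8 × 75 / 3.2 = 65.625 MHz.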

  31. Simple Example
     • Suppose we have made the following measurements:
       • Frequency of FP operations (other than FPSQR): 25%
       • Average CPI of FP operations: 4.0
       • Average CPI of other instructions: 1.33
       • Frequency of FPSQR: 2%
       • CPI of FPSQR: 20
     • Two design alternatives:
       • Reduce the CPI of FPSQR to 2
       • Reduce the average CPI of all FP operations to 2

  32. And the Winner Is…
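
Working the numbers from slide 31 (assuming "all FP operations" in the second alternative includes FPSQR): CPI(original) = 0.25 × 4.0 + 0.02 × 20 + 0.73 × 1.33 ≈ 2.37. Alternative 1 (FPSQR CPI 20 → 2): CPI ≈ 2.37 − 0.02 × 18 = 2.01, a speedup of 2.37 / 2.01 ≈ 1.18. Alternative 2 (all FP CPI → 2): CPI ≈ 0.27 × 2 + 0.73 × 1.33 ≈ 1.51, a speedup of 2.37 / 1.51 ≈ 1.57. Improving all FP operations wins.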

  33. Instruction Set Architecture (ISA)

  34. Outline
     • Introduction
     • Classifying instruction set architectures
     • Instruction set measurements
     • Memory addressing
     • Addressing modes for signal processing
     • Type and size of operands
     • Operations in the instruction set
     • Operations for media and signal processing
     • Instructions for control flow
     • Encoding an instruction set
     • MIPS architecture

  35. Instruction Set Principles and Examples

  36. Basic Issues in Instruction Set Design
     • What operations, and how many?
       • Load/store/increment/branch are sufficient to do any computation, but not useful (programs become too long!)
     • How (many) operands are specified?
       • Most operations are dyadic (e.g., A ← B + C); some are monadic (e.g., A ← B)
     • How are they encoded into the instruction format?
       • Instructions should be multiples of bytes
     • Typical instruction set:
       • 32-bit word
       • Basic operand addresses are 32 bits long
       • Basic operands (like integers) are 32 bits long
       • In general, an instruction can refer to 3 operands (A ← B + C)
     • Challenge: encode operations in a small number of bits

  37. Brief Introduction to ISA
     • Instruction Set Architecture: a set of instructions
       • Each instruction is directly executed by the CPU's hardware
     • How is it represented?
       • By a binary format, since the hardware understands only bits
     • Options – fixed or variable length formats
       • Fixed – each instruction encoded in a same-size field (typically 1 word)
       • Variable – half-word, whole-word, and multiple-word instructions are possible
     • Example field layout (MIPS I-type): opcode (6 bits) | rs (5) | rt (5) | immediate (16)
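
As a concrete illustration of a fixed 32-bit encoding, here is a small Python sketch that packs the I-type fields shown above (field widths from the slide; the helper name is mine):

    def encode_itype(opcode: int, rs: int, rt: int, imm: int) -> int:
        """Pack MIPS I-type fields into one 32-bit word:
        opcode (6 bits) | rs (5) | rt (5) | immediate (16)."""
        assert 0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32
        return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

    # addi $t0, $zero, 5 -- addi has opcode 8; $zero is register 0, $t0 is register 8
    print(hex(encode_itype(8, 0, 8, 5)))  # 0x20080005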

  38. What Must Be Specified?
     • Instruction format (encoding)
       • How is it decoded?
     • Location of operands and result
       • Where, other than memory?
       • How many explicit operands?
       • How are memory operands located?
     • Data type and size
     • Operations
       • Which are supported?

  39. Classifying Instruction Set Architecture

  40. Instruction Set Design
     • The instruction set influences everything

  41. Instruction Characteristics
     • Usually a simple operation
       • The operation is identified by the op-code field
     • But operations require operands – 0, 1, or 2
       • To identify where they are, they must be addressed
       • The address refers to some piece of storage; typical possibilities are main memory, registers, or a stack
     • Two options: explicit or implicit addressing
       • Implicit – the op-code implies the address of the operands
         • ADD on a stack machine pops the top 2 elements of the stack, then pushes the result
         • HP calculators work this way
       • Explicit – the address is specified in some field of the instruction
         • Note the potential for 3 addresses – 2 operands + the destination

  42. Operand Locations for Four ISA Classes

  43. Code Sequences for C = A + B
     • Stack:
       Push A
       Push B
       Add        ; pop the top two values of the stack (A, B), push the result
       Pop C
     • Accumulator (AC):
       Load A
       Add B      ; add AC (holding A) and B, store the result in AC
       Store C
     • Register (register-memory):
       Load R1, A
       Add R3, R1, B
       Store R3, C
     • Register (load-store):
       Load R1, A
       Load R2, B
       Add R3, R1, R2
       Store R3, C

  44. Modern Choice – Load-Store Register (GPR) Architecture
     • Reasons for choosing a GPR (general-purpose registers) architecture:
       • Registers (vs. stacks and accumulators, …) are faster than memory
       • Registers are easier and more effective for a compiler to use
         • (A + B) – (C * D) – (E * F) may be evaluated in any order (for pipelining concerns, etc.), but on a stack machine it must be evaluated left to right
       • Registers can be used to hold variables
         • Reduce memory traffic
         • Speed up programs
         • Improve code density (fewer bits are used to name a register)
     • Compiler writers prefer that all registers be equivalent and unreserved
     • The number of GPRs: at least 16

  45. Memory Addressing

  46. Memory Addressing Basics
     • All architectures must address memory
     • What is accessed – byte, word, multiple words?
       • Today's machines are byte addressable
       • Main memory is organized in 32- to 64-byte lines
       • Big-endian or little-endian addressing
     • Hence there is a natural alignment problem
       • An access of size s bytes at byte address A is aligned if A mod s = 0
       • A misaligned access takes multiple aligned memory references (see the sketch below)
     • The memory addressing mode influences instruction count (IC) and clocks per instruction (CPI)
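
A minimal Python sketch of the alignment rule (A mod s = 0):

    def is_aligned(addr: int, size: int) -> bool:
        """An access of `size` bytes at byte address `addr` is aligned
        when the address is a multiple of the access size."""
        return addr % size == 0

    print(is_aligned(0x1004, 4))  # True:  word access on a 4-byte boundary
    print(is_aligned(0x1006, 4))  # False: misaligned; needs two aligned references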

  47. Big-Endian and Little-Endian Assignments
     • Big-endian: lower byte addresses are used for the most significant bytes of the word
     • Little-endian: the opposite ordering – lower byte addresses are used for the less significant bytes of the word
     • [Figure: byte and word addressing] For word address 0, big-endian assigns byte addresses 0, 1, 2, 3 from most to least significant byte; little-endian assigns 3, 2, 1, 0. For word address 4: big-endian 4, 5, 6, 7; little-endian 7, 6, 5, 4; and so on up to word address 2^k − 4.
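
Python's int.to_bytes makes the two orderings easy to see (an illustration, not from the slides):

    value = 0x0A0B0C0D
    print(value.to_bytes(4, "big").hex())     # 0a0b0c0d – MSB at the lowest address
    print(value.to_bytes(4, "little").hex())  # 0d0c0b0a – LSB at the lowest address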

  48. Addressing Modes
     • Immediate:          Add R4, #3      Regs[R4] ← Regs[R4] + 3
     • Register:           Add R4, R3      Regs[R4] ← Regs[R4] + Regs[R3]
     • Register indirect:  Add R4, (R1)    Regs[R4] ← Regs[R4] + Mem[Regs[R1]]

  49. Addressing Modes (Cont.)
     • Direct:             Add R4, (1001)  Regs[R4] ← Regs[R4] + Mem[1001]
     • Memory indirect:    Add R4, @(R3)   Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]]

  50. Addressing Modes (Cont.)
     • Displacement:  Add R4, 100(R1)      Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
     • Scaled:        Add R1, 100(R2)[R3]  Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] × d]
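
To tie the modes together, a toy Python model of the effective-address calculations above (the register/memory contents and the scale factor d are hypothetical values chosen for the example):

    regs = {"R1": 0x100, "R2": 0x10, "R3": 2}
    d = 4                                     # element size for the scaled mode
    mem = {0x100: 7, 100 + 0x10 + 2 * d: 11}  # just enough memory to index

    ea_indirect = regs["R1"]                          # (R1)        -> 0x100
    ea_displacement = 100 + regs["R1"]                # 100(R1)     -> 356 (not in our toy memory)
    ea_scaled = 100 + regs["R2"] + regs["R3"] * d     # 100(R2)[R3] -> 124
    print(mem[ea_indirect], mem[ea_scaled])           # 7 11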
