Computer Architecture

Computer Architecture EEL 4713/5764, Spring 2006 Dr. Michael Frank Module #5 – Computer Performance

Part I Background and Motivation Computer Architecture, Background and Motivation

I Background and Motivation • Provide motivation, paint the big picture, introduce tools: • Review components used in building digital circuits • Present an overview of computer technology • Understand the meaning of computer performance • (or why a 2 GHz processor isn’t 2× as fast as a 1 GHz model)

4 Computer Performance • Performance is key in design decisions; also cost and power • It has been a driving force for innovation • Isn’t quite the same as speed (higher clock rate)

Course Instructional Objective #1 • As the syllabus says: • At the completion of this course, students should be able to… • CIO #1. (Metrics) Calculate and interpret different performance and cost metrics of computer systems. • This CIO should also support the following Program Outcome: • Students graduating from the BSEE and BSCpE degree programs will have: • PO (a). (Apply) An ability to apply knowledge of mathematics, science and engineering; • PO (e). (Solve) An ability to identify, formulate, and solve engineering problems; • PO (o). (Topics) EE: A knowledge of electrical engineering applications selected from the …digital systems… areas; CpE: A knowledge of computer science and computer engineering topics including …computer architecture. • Under “assessment instruments,” the syllabus says: • 1. Metrics. Students will solve exam problems in which they must analyze descriptions of hypothetical processors to determine their performance, cost-performance, and power-performance.

Module Instructional Objectives I break down the CIO as follows: CIO #1. Metrics (aeo). Calculate and interpret different performance and cost metrics of computer systems. 1.1. Know & apply (a) the definitions of clock frequency, MIPS, execution time, performance, throughput, cost-performance, and power-performance. 1.2. Explain why a given metric is or is not appropriate to use in a given situation. 1.3. Identify (e.i) the specific figure(s) of merit that are most appropriate for choosing between alternative computer designs in a given scenario. 1.4. Formulate (e.ii) appropriate symbolic equations for calculating a desired figure of merit from the provided information about an architectural scenario. 1.5. Solve (e.iii) problems involving the determination of which of several computer designs would be preferable in a given scenario. 1.6. Apply Amdahl’s Law (and generalizations thereof) in characterizing the relationship between an improvement to a particular component of a system and the overall improvement of the whole system. 1.7. Apply (a) the CPU Performance Equation that relates performance and execution time to instruction count, CPI, and clock frequency.

Topic #1 Overview of Some Important Metrics for Computer Systems: Performance, Cost, and Power Consumption

Important Performance Metrics • Some metrics that are often used, but that do not always accurately reflect true performance: • CPU clock frequency = number of CPU clock cycles per unit time • MIPS rating = How many Millions of Instructions Per Second • Benchmark ratings (e.g., SPECmarks) – more on this later • Metrics that are “true” measures of performance: • Total execution time of a work unit (on real applications) • Wall-clock time from beginning to end of the execution process • Performance = 1/(execution time) • For a single work unit • Throughput = (# work units)/(execution time) • A generalized kind of performance

Cost and Cost-Related Metrics • In the real world, the performance of a system is not the only thing that is important… • For example, its cost may also matter a lot! • E.g., the IBM Blue Gene/L has really high performance, but you’re not likely to buy it as your next computer… • We almost always have budgetary constraints. • The usual goal: Maximize the cost-performance (i.e., cost-efficiency) of the systems that you buy. • Cost-performance = (performance) / (cost). • In other words, you want to get the best value for your dollar. • This strategy roughly maximizes total throughput within a fixed budget whenever you can have many systems gathered together working in parallel.

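Throughput and Cost-Performance • When there is a fixed budget, the maximum throughput of a parallel system is (roughly) proportional to the cost-performance of the individual serial units. • The budget buys about (budget) / (unit cost) units, so total throughput ≈ (budget) × (unit throughput) / (unit cost).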

The Vanishing Computer Cost

Cost/Performance Figure 4.1 Performance improvement as a function of cost.

Importance of Power Consumption • In the real world, a computer’s performance and manufacturing cost are not the only important concerns… • Operating costs, usability, and other factors may also be important! • Today, power consumption is an increasingly important factor that impacts all of the following: • Manufacturing cost, operating cost, performance, and usability! • In general, higher power consumption means… • More manufacturing cost • for more aggressive power-delivery & cooling systems • power supplies, heat sinks, fans, etc. • Higher operating cost • More electricity consumed, frequent changing/recharging of batteries, inconvenience to user • Lower performance • Higher performance would exceed limits of cooling system • Poor usability / poorer overall quality of product: • Annoyingly noisy cooling fans or data center A/C units, laptops that burn up your lap • So in many design scenarios, we may wish to maximize performance within a fixed power budget, or minimize power consumption to reach a desired performance.

Throughput and Power-Performance • When there is a fixed power budget, the maximum throughput of a parallel system is (roughly) proportional to the power-performance of the individual serial units. • This is exactly analogous to the earlier cost-performance analysis.

Performance Maximization within Cost and Power Constraints (C = cost, P = power, T = throughput) • Suppose we have both a cost budget and a power budget, • and we want to maximize system throughput. • With a given unit design, we must maximize the number of parallel units. • Then we have the following constraints on n_units: • n_units × C_unit ≤ C_max • So, n_units ≤ ⌊C_max / C_unit⌋ • n_units × P_unit ≤ P_max • and n_units ≤ ⌊P_max / P_unit⌋ • The largest value of n_units within these constraints is: • n_units = min(⌊C_max / C_unit⌋, ⌊P_max / P_unit⌋) = ⌊min(C_max / C_unit, P_max / P_unit)⌋ • and so the maximum feasible throughput is: • T_tot = T_unit × n_units = T_unit × ⌊min(C_max / C_unit, P_max / P_unit)⌋
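The bound above can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names and the sample budget figures are hypothetical, and it assumes that identical units scale perfectly in parallel.

```python
def max_units(c_max: int, c_unit: int, p_max: int, p_unit: int) -> int:
    """Largest unit count that fits both the cost and the power budget."""
    # Integer floor division gives the limit imposed by each constraint.
    return min(c_max // c_unit, p_max // p_unit)


def max_throughput(t_unit: float, c_max: int, c_unit: int,
                   p_max: int, p_unit: int) -> float:
    """Total throughput, assuming units scale perfectly in parallel."""
    return t_unit * max_units(c_max, c_unit, p_max, p_unit)


# Hypothetical numbers: $10,000 and 500 W budgets, units at $300 and 20 W each.
n = max_units(10_000, 300, 500, 20)               # min(33, 25): power-limited
total = max_throughput(4.0, 10_000, 300, 500, 20)
print(n, total)
```

In this made-up case the power budget, not the cost budget, is the binding constraint.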

Power-Performance and Energy Efficiency • Power-performance means performance (i.e., throughput) per unit of power consumption: • power-performance = (throughput)/(power). • Of course, since • throughput = (work units)/(time) and • power = (energy consumed)/(time), • The times cancel, and so power-performance is equal to: • (work units)/(energy consumed) • In other words, system power-performance is the same thing as the energy efficiency of the underlying computing process. • To maximize power-performance, minimize the amount of energy that is consumed per unit of work that is performed.
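As a small illustration of why the times cancel, the sketch below computes power-performance both ways. The function name and the sample run (work units, seconds, joules) are hypothetical.

```python
import math


def power_performance(work_units: float, seconds: float, joules: float) -> float:
    """Throughput divided by power; the elapsed time cancels out."""
    throughput = work_units / seconds      # work units per second
    power = joules / seconds               # watts
    return throughput / power


# Hypothetical run: 6,000 work units in 60 s, consuming 3,000 J.
pp = power_performance(6_000, 60, 3_000)
energy_efficiency = 6_000 / 3_000          # work units per joule
print(pp, energy_efficiency, math.isclose(pp, energy_efficiency))
```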

System Optimization Example • Suppose you have a budget of $1M to set up a new corporate data center that should have a total power consumption of no more than 100 kW while serving web transactions in a simple database application. If your goal is to maximize total performance (in transactions/second) while staying within your budget and meeting the power constraint, which of the following types of machines would be preferable as a basis for the design? • Sun servers, each $15,000, burning 100 W, processing 100 transactions/second • Playstation 2s, each $100 from flea market, 30 W, processing 30 transactions/second • Solution: • Sun servers: the budget allows ⌊$1M / $15,000⌋ = 66 machines (6.6 kW), for 6,600 transactions/second. • PS2s: the power limit allows ⌊100 kW / 30 W⌋ = 3,333 machines ($333,300), for 99,990 transactions/second. • With the figures listed, a PS2-based design could attain about 15× higher throughput and use only 1/3 of the budget while still meeting the power constraints!
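The comparison can be recomputed from the prices, power draws, and transaction rates listed in the example. This is a minimal sketch under the assumption that machines scale perfectly in parallel and that nothing else (networking, housing, cooling) consumes budget or power; the function name `plan` is hypothetical.

```python
def plan(unit_cost: int, unit_watts: int, unit_tps: int,
         budget: int = 1_000_000, power_cap_watts: int = 100_000):
    """Return (units, total cost, total watts, total transactions/second)."""
    units = min(budget // unit_cost, power_cap_watts // unit_watts)
    return units, units * unit_cost, units * unit_watts, units * unit_tps


sun = plan(15_000, 100, 100)   # Sun servers: limited by the cost budget
ps2 = plan(100, 30, 30)        # PlayStation 2s: limited by the power budget
print("Sun:", sun)
print("PS2:", ps2)
print("Throughput ratio PS2/Sun:", round(ps2[3] / sun[3], 2))
```

Both machine types deliver one transaction per joule here, so the difference comes almost entirely from cost-performance.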

Topic #2 Measuring Computer Performance

4.2 Defining Computer Performance Figure 4.2 Pipeline analogy shows that imbalance between processing power and I/O capabilities leads to a performance bottleneck.

Concepts of Performance and Speedup • Performance = 1 / Execution time is simplified to Performance = 1 / CPU execution time • (Performance of M1) / (Performance of M2) = Speedup of M1 over M2 = (Execution time on M2) / (Execution time on M1) • Terminology: M1 is x times as fast as M2 (e.g., 1.5 times as fast); M1 is 100(x – 1)% faster than M2 (e.g., 50% faster) • CPU performance equation: CPU time = (Clock cycles executed) × (Time per cycle) = Instructions × (Cycles per instruction) × (Time per cycle) = Instructions × CPI / (Clock frequency) • Instruction count, CPI, and clock rate are not completely independent, so improving one by a given factor may not lead to overall execution time improvement by the same factor.
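A minimal sketch of the CPU performance equation follows. The function names and the sample program (instruction count, CPI values, clock rates) are hypothetical; the rise in CPI at the higher clock rate is an assumption made only to show why doubling the clock need not halve the running time.

```python
def cpu_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time in seconds = instructions x CPI / clock frequency."""
    return instructions * cpi / clock_hz


def speedup(time_old: float, time_new: float) -> float:
    """How many times as fast the new configuration is."""
    return time_old / time_new


# Hypothetical program of 1e9 instructions.
base = cpu_time(1e9, 1.0, 1e9)     # 1 GHz clock, CPI 1.0
fast = cpu_time(1e9, 1.5, 2e9)     # 2 GHz clock, CPI assumed to rise to 1.5
print(base, fast, round(speedup(base, fast), 2))
```

Under these assumed numbers the 2 GHz machine is about 1.33 times as fast, not 2 times.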

Faster Clock ≠ Shorter Running Time Figure 4.3 Faster steps do not necessarily mean shorter travel time.

s = min(p, 1/f) 1 f+(1–f)/p 4.3 Performance Enhancement: Amdahl’s Law f = fraction unaffected p = speedup of the rest Figure 4.4 Amdahl’s law: speedup achieved if a fraction f of a task is unaffected and the remaining (1–f) part runs p times as fast. Computer Architecture, Background and Motivation

Amdahl’s Law Used in Design Example 4.1 • A processor spends 30% of its time on flp addition, 25% on flp mult, and 10% on flp division. Evaluate the following enhancements, each costing the same to implement: • Redesign of the flp adder to make it twice as fast. • Redesign of the flp multiplier to make it three times as fast. • Redesign of the flp divider to make it 10 times as fast. • Solution • Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18 • Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20 • Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10 • What if both the adder and the multiplier are redesigned?
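The three speedups, and the closing question about combining two redesigns, can be explored with a short sketch. The function `amdahl_speedup` is a hypothetical helper that generalizes the formula to several non-overlapping fractions of the original running time; it is one way to set up the calculation, not necessarily the one intended in the course.

```python
def amdahl_speedup(enhancements):
    """Overall speedup for (fraction_of_time, local_speedup) pairs.

    Fractions refer to the original running time and must not overlap.
    """
    unaffected = 1.0 - sum(fraction for fraction, _ in enhancements)
    new_time = unaffected + sum(fraction / p for fraction, p in enhancements)
    return 1.0 / new_time


adder = amdahl_speedup([(0.30, 2)])
multiplier = amdahl_speedup([(0.25, 3)])
divider = amdahl_speedup([(0.10, 10)])
both = amdahl_speedup([(0.30, 2), (0.25, 3)])   # adder and multiplier together
print(round(adder, 2), round(multiplier, 2), round(divider, 2), round(both, 2))
```

With both the adder and the multiplier redesigned, the new running time is 0.45 + 0.15 + 0.083 of the original, a speedup of about 1.46.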

4.4 Performance Measurement vs. Modeling Figure 4.5 Running times of six programs on three machines.

Performance Benchmarks Example 4.3 • You are an engineer at Outtel, a start-up aspiring to compete with Intel via its new processor design that outperforms the latest Intel processor by a factor of 2.5 on floating-point instructions. This level of performance was achieved by design compromises that led to a 20% increase in the execution time of all other instructions. You are in charge of choosing benchmarks that would showcase Outtel’s performance edge. • What is the minimum required fraction f of time spent on floating-point instructions in a program on the Intel processor to show a speedup of 2 or better for Outtel? • Solution • We use a generalized form of Amdahl’s formula in which a fraction f is speeded up by a given factor (2.5) and the rest is slowed down by another factor (1.2): 1 / [1.2(1 – f) + f / 2.5] ≥ 2 ⇒ f ≥ 0.875
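The inequality in the solution can be rearranged and checked with a few lines of Python. This is a sketch with hypothetical function names; the rearrangement assumes the slowdown factor exceeds 1/fast, so that the denominator is positive.

```python
def speedup(f: float, fast: float, slow: float) -> float:
    """Generalized Amdahl: fraction f sped up by `fast`, the rest slowed by `slow`."""
    return 1.0 / (slow * (1.0 - f) + f / fast)


def min_fraction(target: float, fast: float, slow: float) -> float:
    """Smallest f with speedup(f) >= target.

    From slow*(1 - f) + f/fast <= 1/target it follows that
    f >= (slow - 1/target) / (slow - 1/fast), valid when slow > 1/fast.
    """
    return (slow - 1.0 / target) / (slow - 1.0 / fast)


f_min = min_fraction(target=2.0, fast=2.5, slow=1.2)
print(round(f_min, 3), round(speedup(f_min, 2.5, 1.2), 3))
```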

Performance Estimation • Average CPI = Σ over all instruction classes i of (Class-i fraction) × (Class-i CPI) • Machine cycle time = 1 / Clock rate • CPU execution time = Instructions × (Average CPI) / (Clock rate) • Table 4.3 Usage frequency, in percentage, for various instruction classes in four representative applications.
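The weighted-average CPI can be sketched as follows. The instruction mix below is hypothetical and is not taken from Table 4.3; the class names, fractions, CPI values, instruction count, and clock rate are all illustrative assumptions.

```python
def average_cpi(mix):
    """Weighted CPI for {class: (fraction, cpi)}; fractions should sum to 1."""
    return sum(fraction * cpi for fraction, cpi in mix.values())


def cpu_time(instructions: float, avg_cpi: float, clock_hz: float) -> float:
    """CPU execution time in seconds."""
    return instructions * avg_cpi / clock_hz


# Hypothetical instruction mix, not taken from Table 4.3.
mix = {"alu": (0.50, 1), "load_store": (0.30, 2), "branch": (0.20, 3)}
cpi = average_cpi(mix)
print(round(cpi, 2), round(cpu_time(2e9, cpi, 1e9), 2))
```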

MIPS Rating Can Be Misleading Example 4.5 • Two compilers produce machine code for a program on a machine with two classes of instructions. Here are the number of instructions:
Class  CPI  Compiler 1  Compiler 2
A      1    600M        400M
B      2    400M        400M
• a. What are run times of the two programs with a 1 GHz clock? • b. Which compiler produces faster code and by what factor? • c. Which compiler’s output runs at a higher MIPS rate? • Solution • a. Running time 1 = (600M × 1 + 400M × 2) / 10^9 = 1.4 s; running time 2 = (400M × 1 + 400M × 2) / 10^9 = 1.2 s • b. Compiler 2’s output runs 1.4 / 1.2 = 1.17 times as fast • c. Compiler 1: CPI = 1.4, MIPS rating = 1000 / 1.4 = 714; Compiler 2: CPI = 1.5, MIPS rating = 1000 / 1.5 = 667. Compiler 1’s output has the higher MIPS rating even though it runs slower.
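The numbers in this example can be reproduced with a short sketch that uses the instruction counts and CPI values given above. The constant and function names are hypothetical.

```python
CLOCK_HZ = 1e9                      # 1 GHz clock from the example
CPI = {"A": 1, "B": 2}              # cycles per instruction for each class
COUNTS = {                          # instruction counts per class
    "compiler 1": {"A": 600e6, "B": 400e6},
    "compiler 2": {"A": 400e6, "B": 400e6},
}


def summarize(counts):
    """Return (run time in s, average CPI, MIPS rating) for one program."""
    instructions = sum(counts.values())
    cycles = sum(counts[c] * CPI[c] for c in counts)
    run_time = cycles / CLOCK_HZ
    mips = instructions / run_time / 1e6
    return run_time, cycles / instructions, mips


for name, counts in COUNTS.items():
    t, cpi, mips = summarize(counts)
    print(name, round(t, 2), round(cpi, 2), round(mips))
```

The program with the shorter running time has the lower MIPS rating, because it executes fewer instructions overall.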

4.5 Reporting Computer Performance Table 4.4 Measured or estimated execution times for three programs. Analogy: If a car is driven to a city 100 km away at 100 km/hr and returns at 50 km/hr, the average speed is not (100 + 50) / 2 but is obtained from the fact that it travels 200 km in 3 hours.

Comparing the Overall Performance Table 4.4 Measured or estimated execution times for three programs. • Speedup of X over Y on the three programs: 10, 0.1, 0.1 • Arithmetic mean of speedups: 6.7 (Y over X), 3.4 (X over Y) • Geometric mean of speedups: 2.15 (Y over X), 0.46 (X over Y) • Geometric mean does not yield a measure of overall speedup, but provides an indicator that at least moves in the right direction
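The means quoted on this slide can be recomputed from the three per-program speedups. This is a minimal sketch with hypothetical helper names; it treats the "Y over X" figures as the reciprocals of the "X over Y" speedups, which is an inference from the numbers shown.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


x_over_y = [10, 0.1, 0.1]                 # per-program speedups of X over Y
y_over_x = [1 / s for s in x_over_y]      # the same data seen from Y's side

print(round(arithmetic_mean(x_over_y), 1), round(arithmetic_mean(y_over_x), 1))
print(round(geometric_mean(x_over_y), 2), round(geometric_mean(y_over_x), 2))
# The geometric means are reciprocals of each other; the arithmetic means are not.
print(math.isclose(geometric_mean(x_over_y) * geometric_mean(y_over_x), 1.0))
```

Both arithmetic means exceed 1, which would suggest that each machine is faster than the other; the geometric means at least agree on the direction.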

4.6 The Quest for Higher Performance • State of available computing power ca. the early 2000s: • Gigaflops on the desktop • Teraflops in the supercomputer center • Petaflops on the drawing board • Note on terminology (see Table 3.1) • Prefixes for large units: Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15 • For memory: K = 2^10 = 1024, M = 2^20, G = 2^30, T = 2^40, P = 2^50 • Prefixes for small units: micro = 10^-6, nano = 10^-9, pico = 10^-12, femto = 10^-15

Supercomputers Figure 4.7 Exponential growth of supercomputer performance.

The Most Powerful Computers Figure 4.8 Milestones in the DOE’s Accelerated Strategic Computing Initiative (ASCI) program with extrapolation up to the PFLOPS level.

Performance is Important, But It Isn’t Everything Figure 25.1 Trend in energy consumption per MIPS of computational power in general-purpose processors and DSPs.

Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank Competency Area 2: Performance Metrics Lecture 1

Performance Metrics • Why is it necessary for us to study performance? • Performance is usually the key to the effectiveness of a system (hardware + software). • Performance is critical to customers (purchasers), thus, we as designers and architects must also make it a priority. • Performance must be assessed and understood in order for a system to communicate efficiently with peripheral devices.

Topic: Computer Performance Sub-Topic: Airplane Analogy

Performance Metrics • How can we determine performance? Consider this example from the transportation industry:
Aircraft        Passenger Capacity  Fuel Capacity  Cruising Range  Speed  Throughput  Cost
Boeing 747-400  421                 216,847        10,734          920    387,320     0.048
Boeing 767-300  270                 91,380         10,548          853    230,310     0.032
Airbus 340-300  284                 139,681        12,493          869    246,796     0.039
Airbus 340-300  120                 23,859         4,442           837    100,440     0.045
BAE-146-200     77                  11,750         2,406           708    54,516      0.063
Concorde        132                 119,501        6,230           2,180  287,760     0.145
Dash-8          50                  3,202          1,389           531    26,550      0.046
Car             5                   60             700             100    500         0.017

Performance Example • Fuel Capacity in liters • Range in kilometers • Speed in kilometers/hour • Throughput is defined as (# of passengers) x (cruising speed) • Cost is given as (fuel capacity) / (passengers x range) Which mode of transportation has the “best” performance?
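The two derived columns can be recomputed from the definitions above. The sketch below uses four rows copied from the table in this example (a subset chosen for brevity); the function and variable names are hypothetical.

```python
# (name, passengers, fuel capacity in liters, range in km, speed in km/h)
VEHICLES = [
    ("Boeing 747-400", 421, 216_847, 10_734, 920),
    ("Boeing 767-300", 270, 91_380, 10_548, 853),
    ("Concorde", 132, 119_501, 6_230, 2_180),
    ("Car", 5, 60, 700, 100),
]


def throughput(passengers: int, speed: int) -> int:
    """Passenger-kilometers per hour."""
    return passengers * speed


def cost(fuel: int, passengers: int, range_km: int) -> float:
    """Liters of fuel per passenger-kilometer."""
    return fuel / (passengers * range_km)


fastest = max(VEHICLES, key=lambda v: v[4])
highest_throughput = max(VEHICLES, key=lambda v: throughput(v[1], v[4]))
cheapest = min(VEHICLES, key=lambda v: cost(v[2], v[1], v[3]))
print(fastest[0], highest_throughput[0], cheapest[0])
```

Each definition of "best" picks a different vehicle, which is the point of the analogy.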

Performance Example • It depends on how we define performance. • Consider raw speed: • Getting from one place to another quickly • Best: Concorde (2,180 km/h); worst: car (100 km/h)

Performance Example • What if we’re interested in the rate at which people are carried (throughput)? • Best: Boeing 747-400 (387,320 passenger-km/h); worst: car (500 passenger-km/h)

Performance Example • Oftentimes we relate performance and cost. Thus we can consider the amount of fuel used per passenger-kilometer: • Best plane: Boeing 767-300 (0.032 liters per passenger-km); best overall: car (0.017 liters per passenger-km)

Topic: Computer Performance Sub-Topic: Basic Concepts: Performance, Throughput, and Execution Time

Performance Metrics • Similar measures of performance are used for computers. • Number of computations done per unit of time • Cost of computations • Possibly several aspects of cost can be considered including initial purchase price, operating cost, cost of training users of system, etc. • Common performance measures are • RESPONSE TIME – the amount of time it takes a program to complete (a.k.a. execution time) • THROUGHPUT – the total amount of work done in a given amount of time

Performance Metrics Example: Given the following actions: 1. Replacing the processor with a faster version 2. Adding additional processors to perform separate tasks in a multiprocessor system • Do they (a) increase throughput, (b) decrease response time, or (c) both?

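Defining Performance • Our focus will be primarily on execution time. • To maximize performance implies a minimization in execution time: • Performance = 1 / (Execution time) • For two machines X and Y: if Performance(Y) > Performance(X), then Execution time(Y) < Execution time(X) • We say that machine Y is faster than machine X.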

Performance Metrics Notes: (1) If X is n times faster than Y, then Performance(X) / Performance(Y) = Execution time(Y) / Execution time(X) = n • To avoid confusion, we’ll use the following terminology: • We say “improve performance”; we mean increase performance • We say “improve execution time”; we mean decrease execution time

Performance Example If machine A runs a program in 10 seconds and machine B runs the same program in 15 seconds, how much faster is A than B?
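One way to work this out, taking performance as the reciprocal of execution time, is sketched below. The function names are hypothetical.

```python
def times_as_fast(time_slow: float, time_fast: float) -> float:
    """Performance ratio, equal to the inverse ratio of execution times."""
    return time_slow / time_fast


def percent_faster(time_slow: float, time_fast: float) -> float:
    """Same comparison expressed as a percentage improvement."""
    return 100.0 * (times_as_fast(time_slow, time_fast) - 1.0)


n = times_as_fast(15, 10)       # machine B takes 15 s, machine A takes 10 s
print(n, percent_faster(15, 10))
```

So A is 1.5 times as fast as B, or equivalently 50% faster.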

Topic: Computer Performance Sub-Topic: Measuring Performance

Measuring Performance • Quite simply, TIME is the measure of computer performance! • The most straightforward definition of time is wall-clock time (also called elapsed time or response time): the total time to complete a task, including system overhead activities such as input/output tasks, disk and memory accesses, etc.
