Lecture 2: Evaluating Computer Architectures

Last Time
• Technology, Architecture, Applications
• Technology/Applications are a moving target
Today
• Evaluation as key component in iterative design process
• Two components of evaluation
  • Performance
  • Cost

Metrics of Evaluation
• Level of design determines the performance metric
• Examples
  • Applications perspective
    • Time to run task (Response Time)
    • Tasks run per second (Throughput)
  • Systems perspective
    • Millions of instructions per second (MIPS)
    • Millions of FP operations per second (MFLOPS)
  • Bus/network bandwidth: megabytes per second
  • Function Units: cycles per instruction (CPI)
  • Fundamental elements (transistors, wires, pins): clock rate

Benchmarks
• Microbenchmarks
  • measure one performance dimension
    • cache bandwidth
    • main memory bandwidth
    • procedure call overhead
    • FP performance
  • weighted combination of microbenchmark performance can be a good predictor of application performance
  • give insight into the cause of performance bottlenecks
• Macrobenchmarks
  • application execution time
  • system throughput
  • measure overall performance, but only on one application

Some Warnings about Benchmarks
• Benchmarks measure the whole system
  • application
  • compiler
  • operating system
  • architecture
  • implementation
• Popular benchmarks typically reflect yesterday’s programs
  • what about the programs people are running today?
  • need to design for tomorrow’s problems
• Benchmark timings are sensitive to
  • alignment in cache
  • location of data on disk
  • values of data
• Danger of inbreeding or positive feedback
  • if you make an operation fast (slow) it will be used more (less) often
  • therefore you make it faster (slower)
  • and so on, and so on…
  • the optimized NOP

Know what you are measuring!
• Compare apples to apples
• Example
  • Wall clock execution time:
    • User CPU time
    • System CPU time
    • Idle time (multitasking, I/O)
      csh> time latex lecture2.tex
      csh> 0.68u 0.05s 0:01.60 45.6%
  • Output fields, left to right: user CPU time, system CPU time, elapsed time, % CPU time
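
The fields of the `time` output are tied together by simple arithmetic: the CPU percentage is user plus system time divided by elapsed time. A minimal Python sketch, assuming the hypothetical helper names `cpu_utilization` and `measure` (neither comes from the lecture), recomputes the figure from the slide and shows one possible way to collect the same three numbers for a Python callable:

```python
import os
import time


def cpu_utilization(user_s, system_s, elapsed_s):
    """Percent of wall-clock time spent on the CPU (user + system)."""
    return 100.0 * (user_s + system_s) / elapsed_s


def measure(fn):
    """Run fn() once; return (user, system, elapsed) seconds for this process."""
    cpu_before, wall_before = os.times(), time.perf_counter()
    fn()
    cpu_after, wall_after = os.times(), time.perf_counter()
    return (
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
        wall_after - wall_before,
    )


if __name__ == "__main__":
    # Numbers from the csh output on the slide: 0.68u 0.05s 0:01.60
    print(f"slide example: {cpu_utilization(0.68, 0.05, 1.60):.1f}% CPU")

    # Hypothetical CPU-bound workload, used only to exercise measure().
    user, system, elapsed = measure(lambda: sum(i * i for i in range(10**6)))
    print(f"user={user:.2f}s system={system:.2f}s elapsed={elapsed:.2f}s")
```

The gap between elapsed time and user plus system time is the idle time (multitasking, I/O) that the slide lists.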

Performance Comparison Terminology
• “X is n% faster than Y” means:
• Example: Y takes 15 seconds to complete task, X takes 10 seconds
  • What % faster is X?
• Same definitions apply to other metrics, such as throughput
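
The defining formula is not preserved in this text. A minimal sketch, assuming the common textbook convention that "X is n% faster than Y" means time(Y) / time(X) = 1 + n/100, works the 15-second versus 10-second example. The function name `percent_faster` is hypothetical.

```python
def percent_faster(time_y, time_x):
    """Return n such that X is n% faster than Y, given execution times.

    Assumed definition (not shown on the slide): time_y / time_x = 1 + n / 100.
    """
    return 100.0 * (time_y / time_x - 1.0)


# Slide example: Y takes 15 seconds, X takes 10 seconds.
n = percent_faster(15.0, 10.0)
print(f"X is {n:.0f}% faster than Y")
```

Under this assumption the answer is 50%, not 33%: the comparison is a ratio of execution times, not the fraction of time saved. For a rate metric such as throughput the ratio is inverted, throughput(X) / throughput(Y).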

Performance Comparison: Example
• Performance metrics
  • Time to run the task (travel time for each passenger)
  • Throughput (person miles per hour = PMPH)
• Comparisons
  • Speed of Concorde > 747
  • Throughput of 747 > Concorde
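
The two comparisons can be checked with a small sketch. The speeds and passenger counts below are illustrative assumptions, not values taken from the slide (whose table is not preserved here). Throughput is taken to be speed times passengers, matching the unit person miles per hour.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    speed_mph: float   # assumed cruising speed
    passengers: int    # assumed capacity

    def throughput_pmph(self):
        """Person miles per hour = passengers * speed."""
        return self.passengers * self.speed_mph


# Illustrative figures only; they are assumptions, not data from the lecture.
concorde = Plane("Concorde", speed_mph=1350, passengers=132)
boeing_747 = Plane("747", speed_mph=610, passengers=470)

for plane in (concorde, boeing_747):
    print(f"{plane.name}: {plane.speed_mph} mph, {plane.throughput_pmph():,.0f} PMPH")
```

With these assumed numbers the Concorde wins on per-passenger travel time while the 747 wins on throughput, so "which is faster" depends on the metric.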

Improving Performance: Fundamentals
• Suppose we have a machine with two instructions
  • Instruction A executes in 100 cycles
  • Instruction B executes in 2 cycles
• We want better performance…
  • Which instruction do we improve?
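
The answer depends on how often each instruction runs, not only on its latency. A minimal sketch, using the cycle counts from the slide and two hypothetical instruction mixes (the counts and the helper names are assumptions):

```python
CYCLES_A = 100  # from the slide
CYCLES_B = 2    # from the slide


def total_cycles(count_a, count_b):
    """Total cycles for a program executing count_a A's and count_b B's."""
    return count_a * CYCLES_A + count_b * CYCLES_B


def dominant_instruction(count_a, count_b):
    """Instruction type that accounts for more of the total cycles."""
    return "A" if count_a * CYCLES_A >= count_b * CYCLES_B else "B"


# Hypothetical mixes: (number of A instructions, number of B instructions).
for count_a, count_b in [(50, 50), (1, 1000)]:
    print(
        f"A x{count_a}, B x{count_b}: {total_cycles(count_a, count_b)} cycles, "
        f"improve {dominant_instruction(count_a, count_b)} first"
    )
```

In the second mix the slow instruction A is so rare that the cheap instruction B dominates execution time.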

Amdahl’s Law
• Performance improvements depend on:
  • how good the enhancement is
  • how often it is used
• Speedup due to enhancement E (fraction p sped up by factor S):
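
The formula itself is not preserved in this text. The standard statement of Amdahl's Law, written with the slide's symbols p and S (the names T_old and T_new are introduced here for the execution time without and with the enhancement), is:

```latex
\[
T_{\text{new}} = T_{\text{old}} \left[ (1 - p) + \frac{p}{S} \right],
\qquad
\text{Speedup}(E) = \frac{T_{\text{old}}}{T_{\text{new}}}
                  = \frac{1}{(1 - p) + \dfrac{p}{S}}
\]
\[
\lim_{S \to \infty} \text{Speedup}(E) = \frac{1}{1 - p}
\]
```

The limit shows that the untouched fraction 1 - p caps the overall speedup no matter how large S becomes.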

Amdahl’s Law: Example
• FP instructions improved by 2x
• But… only 10% of instructions are FP
• Speedup bounded by
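
The numeric result is missing from this text. A sketch of the arithmetic, assuming that "10% of instructions" also means 10% of execution time (which only holds if FP and non-FP instructions take the same time on average):

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of execution time is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)


fp_fraction = 0.10  # from the slide, read here as a fraction of execution time
fp_gain = 2.0       # from the slide

overall = amdahl_speedup(fp_fraction, fp_gain)  # 1 / 0.95
ceiling = 1.0 / (1.0 - fp_fraction)             # limit as the FP gain grows without bound

print(f"overall speedup with 2x FP: {overall:.3f}")
print(f"upper bound for any FP gain: {ceiling:.3f}")
```

Under this assumption a 2x FP unit buys about a 5% overall improvement, and even an infinitely fast FP unit buys about 11%.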

Amdahl’s Law: Example 2
• Speed up vectorizable code by 100x
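
The slide's numbers for this example are not preserved, so the sketch below sweeps a few hypothetical vectorizable fractions and also inverts Amdahl's Law to ask what fraction is needed for a target speedup. Only the 100x factor comes from the slide; the fractions, the 10x target, and the function names are assumptions.

```python
def overall_speedup(vector_fraction, vector_gain=100.0):
    """Amdahl's Law with the vectorizable fraction sped up by vector_gain."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_gain)


def required_fraction(target_speedup, vector_gain=100.0):
    """Fraction that must be vectorizable to reach target_speedup overall.

    Obtained by solving 1/target = 1 - p * (1 - 1/gain) for p.
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / vector_gain)


for fraction in (0.50, 0.90, 0.99):
    print(f"{fraction:.0%} vectorizable: {overall_speedup(fraction):.2f}x overall")

print(f"10x overall needs {required_fraction(10.0):.1%} vectorizable code")
```

Even with a 100x vector unit, a program that is half scalar ends up just under 2x faster.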

Amdahl’s Law assumes serial execution
• Serial execution: T = T1 + T2 + T3
• Overlapped execution (T2 and T3 run in parallel): T = T1 + Max(T2, T3)
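
A short sketch of why the serial assumption matters: speeding up T3 by a factor n helps the serial total, but has no effect on the overlapped total when T2 is already the longer of the two. The phase times and the factor n are hypothetical.

```python
def serial_time(t1, t2, t3):
    """Phases run back to back: T = T1 + T2 + T3."""
    return t1 + t2 + t3


def overlapped_time(t1, t2, t3):
    """T2 and T3 overlap: T = T1 + max(T2, T3)."""
    return t1 + max(t2, t3)


# Hypothetical phase times (arbitrary units) and enhancement factor for T3.
t1, t2, t3 = 2.0, 5.0, 4.0
n = 4.0

serial_gain = serial_time(t1, t2, t3) / serial_time(t1, t2, t3 / n)
overlap_gain = overlapped_time(t1, t2, t3) / overlapped_time(t1, t2, t3 / n)

print(f"serial speedup:     {serial_gain:.3f}")
print(f"overlapped speedup: {overlap_gain:.3f}")
```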

Amdahl’s Law: Summary message
• Make the Common Case fast
• Examples:
  • All instructions require instruction fetch, only a fraction require data
    • optimize instruction access first
  • Data locality (spatial, temporal), small memories faster
    • storage hierarchy: most frequent accesses to small, local memory

CPU Performance Equation
• 3 components to execution time:
• Factors affecting CPU execution time:
• Consider all three elements when optimizing
• Workloads change!
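
The equation is not preserved in this text. The standard form, with its three components written out (the symbol names IC and T_cycle are introduced here), is:

```latex
\[
\text{CPU time}
  = \frac{\text{seconds}}{\text{program}}
  = \frac{\text{instructions}}{\text{program}}
    \times \frac{\text{cycles}}{\text{instruction}}
    \times \frac{\text{seconds}}{\text{cycle}}
  = IC \times CPI \times T_{\text{cycle}}
\]
```

Since the clock rate is 1 / T_cycle, the same relation reads CPU time = IC × CPI / clock rate. The three terms are not independent: a change that lowers one of them can raise another, which is why all three must be considered together.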

Cycles Per Instruction (CPI)
• Depends on the instruction
• Average cycles per instruction
• Example:
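
The slide's worked example is missing. A minimal sketch of average CPI as a frequency-weighted sum over instruction classes follows; the instruction mix, the per-class cycle counts, and the name `average_cpi` are all hypothetical.

```python
def average_cpi(mix):
    """Average CPI = sum over classes of (frequency * cycles).

    mix maps a class name to (fraction of executed instructions, cycles).
    """
    total_fraction = sum(freq for freq, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix.values())


# Hypothetical instruction mix, not taken from the lecture.
mix = {
    "ALU":    (0.50, 1),
    "load":   (0.20, 5),
    "store":  (0.10, 3),
    "branch": (0.20, 2),
}

cpi = average_cpi(mix)
print(f"average CPI = {cpi:.2f}")
for name, (freq, cycles) in mix.items():
    print(f"  {name:<6} contributes {freq * cycles / cpi:.0%} of cycles")
```

In this made-up mix loads are only 20% of instructions but account for the largest share of cycles, which ties back to making the common case fast: "common" is measured in time, not in instruction count.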

Comparing and Summarizing Performance
• Fair way to summarize performance?
• Capture in a single number?
• Example: Which of the following machines is best?

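Means
• Arithmetic mean: $\frac{1}{n}\sum_{i=1}^{n} T_i$
  • Can be weighted: $\sum_{i} a_i T_i$
  • Represents total execution time
• Harmonic mean: $n \big/ \sum_{i=1}^{n} \frac{1}{R_i}$, with rates $R_i = 1/T_i$
• Geometric mean: $\left(\prod_{i=1}^{n} T_i\right)^{1/n}$
  • Consistent independent of reference
  • Best for combining results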
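
The sketch below compares the three means on made-up data and checks the "consistent independent of reference" property. The machines A, B, C, their execution times, and the helper names are hypothetical; the mean functions come from Python's standard `statistics` module (Python 3.8 or later for `geometric_mean`).

```python
from statistics import geometric_mean, harmonic_mean, mean

# Hypothetical execution times in seconds for two programs on three machines.
times = {"A": [1.0, 1000.0], "B": [10.0, 100.0], "C": [20.0, 20.0]}


def normalized(machine, reference):
    """Execution times of machine divided by those of the reference machine."""
    return [t / r for t, r in zip(times[machine], times[reference])]


def am_ratio(x, y, reference):
    """Arithmetic mean of normalized times of x relative to that of y."""
    return mean(normalized(x, reference)) / mean(normalized(y, reference))


def gm_ratio(x, y, reference):
    """Geometric mean of normalized times of x relative to that of y."""
    return geometric_mean(normalized(x, reference)) / geometric_mean(
        normalized(y, reference)
    )


for name, t in times.items():
    rates = [1.0 / x for x in t]  # R_i = 1 / T_i
    print(
        f"{name}: arithmetic={mean(t):.1f}s  "
        f"harmonic(rates)={harmonic_mean(rates):.5f}/s  "
        f"geometric={geometric_mean(t):.2f}s"
    )

for ref in times:
    print(
        f"reference {ref}: B vs C arithmetic ratio={am_ratio('B', 'C', ref):.3f}, "
        f"geometric ratio={gm_ratio('B', 'C', ref):.3f}"
    )
```

With this data the arithmetic mean of normalized times ranks B ahead of C under one reference machine and behind it under another, while the geometric-mean ratio is the same for every reference. The harmonic mean of the rates is the reciprocal of the arithmetic mean of the times, so both track total execution time.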

The Bottom Line: Cost
• Many contributing factors
  • Wafer cost
  • Chip size
  • Wafer yield (good die/wafer)
  • Testing cost
  • Packaging cost
  • *Design/Engineering Costs*
  • Marketing costs
• Chip cost is primarily a function of chip area
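
A rough sketch of how wafer cost, chip size, and yield combine into a per-die cost. Every number and name below is a hypothetical assumption, and the model is deliberately crude: it ignores dies lost at the wafer edge and treats yield as independent of die area, whereas in practice yield tends to fall as area grows, which makes cost rise faster than linearly with area.

```python
import math


def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Crude estimate: wafer area / die area, ignoring edge losses."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    return math.floor(wafer_area / die_area_mm2)


def cost_per_good_die(wafer_cost, wafer_diameter_mm, die_area_mm2, good_fraction):
    """Wafer cost spread over the dies that work."""
    good_dies = dies_per_wafer(wafer_diameter_mm, die_area_mm2) * good_fraction
    return wafer_cost / good_dies


# Hypothetical process: 200 mm wafer costing 3000 units, half of the dies good.
for area in (100.0, 200.0):
    cost = cost_per_good_die(3000.0, 200.0, area, good_fraction=0.5)
    print(f"die area {area:.0f} mm^2: cost per good die = {cost:.2f}")
```

Testing, packaging, design, and marketing costs from the list above are added on top of this die cost.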

Real World Examples
• From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993.

CPU cost is only a part of the picture

Summary
• Evaluation of Systems
  • Performance
    • Amdahl’s Law, CPI
  • Cost
• Next Time
  • Benchmarking
    • SPEC suite + others
  • Pitfalls of performance evaluation
