1 / 21

Lecture 2: Evaluating Computer Architectures

Lecture 2: Evaluating Computer Architectures. Last Time Technology  Architecture  Applications Technology/Applications are a moving target Today Evaluation as key component in iterative design process Two components of evaluation Performance Cost. Metrics of Evaluation.

sirvat
Download Presentation

Lecture 2: Evaluating Computer Architectures

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 2: Evaluating Computer Architectures Last Time Technology  Architecture  Applications Technology/Applications are a moving target Today Evaluation as key component in iterative design process Two components of evaluation Performance Cost

  2. Metrics of Evaluation • Level of design  performance metric • Examples • Applications perspective • Time to run task (Response Time) • Tasks run per second (Throughput) • Systems perspective • Millions of instructions per second (MIPS) • Millions of FP operations per second (MFLOPS) • Bus/network bandwidth: megabytes per second • Function Units: cycles per instruction (CPI) • Fundamental elements (transistors, wires, pins): clock rate

  3. Microbenchmarks one performance dimension cache bandwidth main memory bandwidth procedure call overhead FP performance weighted combination of microbenchmark performance can be good predictor of application performance insight into the cause of performance bottlenecks Macrobenchmarks application execution time system throughput measures overall performance but only on one application Benchmarks

  4. Benchmarks measure the whole system application compiler operating system architecture implementation Popular benchmarks typically reflect yesterday’s programs what about the programs people are running today? need to design for tomorrow’s problems Benchmark timings are sensitive alignment in cache location of data on disk values of data Danger of inbreeding or positive feedback if you make an operation fast (slow) it will be used more (less) often therefore you make it faster (slower) and so on, and so on… the optimized NOP Some Warnings about Benchmarks

  5. Know what you are measuring! • Compare apples to apples • Example • Wall clock execution time: • User CPU time • System CPU time • Idle time (multitasking, I/O) csh> time latex lecture2.tex csh> 0.68u 0.05s 0:01.60 45.6% user elapsed system % CPU time

  6. Performance Comparison Terminology • “X is n% faster than Y” means: • Example: Y takes 15 seconds to complete task, X takes 10 seconds • What % faster is X? • Same definitions apply to other metrics, such as throughput

  7. Performance Comparison: Example • Performance metrics • Time to run the task (travel time for each passenger) • Throughput (person miles per hour = PMPH) • Comparisons • Speed of Concorde > 747 • Throughput of 747 > Concorde

  8. Improving Performance: Fundamentals • Suppose we have a machine with two instructions • Instruction A executes in 100 cycles • Instruction B executes in 2 cycles • We want better performance…. • Which instruction do we improve?

  9. Amdahl’s Law • Performance improvements depend on: • how good is enhancement • how often is it used • Speedup due to enhancement E (fraction p sped up by factor S):

  10. Amdahl’s Law: Example • FP instructions improved by 2x • But….only 10% of instructions are FP • Speedup bounded by

  11. Amdahl’s Law: Example 2 • Speed up vectorizable code by 100x

  12. Amdahl’s Law assumes serial execution T2 T3 T1 T = T1 + T2 + T3 T3n T1 T2 T2 T1 T = T1 + Max(T2, T3) T3

  13. Amdahl’s Law: Summary message • Make the Common Case fast • Examples: • All instructions require instruction fetch, only fraction require data • optimize instruction access first • Data locality (spatial, temporal), small memories faster • storage hierarchy: most frequent accesses to small, local memory

  14. CPU Performance Equation • 3 components to execution time: • Factors affecting CPU execution time: • Consider all three elements when optimizing • Workloads change!

  15. Cycles Per Instruction (CPI) • Depends on the instruction • Average cycles per instruction • Example:

  16. Comparing and Summarizing Performance • Fair way to summarize performance? • Capture in a single number? • Example: Which of the following machines is best?

  17. Means Can be weighted: aiTi Arithmetic mean Represents total execution time Harmonic mean Ri = 1/Ti Consistent independent of reference Best for combining results Geometric mean

  18. The Bottom Line: Cost • Many contributing factors • Wafer cost • Chip size • Wafer yield (good die/wafer) • Testing cost • Packaging cost • *Design/Engineering Costs* • Marketing costs • Chip cost is primarily a function of chip area

  19. Real World Examples From “Estimating IC Manufacturing Costs,” by Linley Gwennap, Microprocessor Report, August 2, 1993.

  20. CPU cost is only a part of the picture

  21. Summary • Evaluation of Systems • Performance • Amdahl’s Law, CPI • Cost • Next Time • Benchmarking • SPEC suite + others • Pitfalls of performance evaluation

More Related