Recap

Presentation Transcript


  1. Recap • Technology trends • Cost/performance

  2. Measuring and Reporting Performance • What does it mean to say “computer X is faster than computer Y”? • E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s. • Which is true: • A is 50% faster than B? • A is 33% faster than B?

  3. Performance • Performance is the reciprocal of execution time: Performance = 1 / Execution time • H&P’s definition: “X is n times faster than Y” means n = Execution time of Y / Execution time of X = Performance of X / Performance of Y

  4. Example • E.g. Machine A executes a program in 10s; Machine B executes the same program in 15s. • Which is true: • A is 50% faster than B? • A is 33% faster than B? • Answer: 1) A is 50% faster than B
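
Applying the H&P definition above to the numbers in the example:

\[
n = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15\ \text{s}}{10\ \text{s}} = 1.5
\]

So A is 1.5 times, i.e. 50%, faster than B. The tempting 33% figure comes from (15 − 10)/15 ≈ 0.33, which says that A takes 33% less time than B, not that it is 33% faster.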

  5. Performance • Response time? • Throughput?

  6. Measuring Performance • Focus on execution time of real programs • Measuring execution time? • Wall clock time (elapsed time) • CPU time (excludes I/O and other processes) • User CPU time • System CPU time
     iota:~$ time gcc -g tmpcnv.s -o tmpcnv
     real 0m3.352s
     user 0m0.367s
     sys  0m0.468s
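
For comparison with the shell's time output above, here is a minimal C sketch (not part of the original slides; it assumes a Linux/POSIX system) that measures the same three quantities around a piece of work:

#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Convert a struct timeval (seconds + microseconds) to seconds. */
static double timeval_seconds(struct timeval tv) {
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void) {
    struct timespec start, end;
    struct rusage ru;

    clock_gettime(CLOCK_MONOTONIC, &start);   /* wall-clock ("real") start */

    /* Work to be measured: a dummy loop standing in for the real program. */
    volatile double x = 0.0;
    for (long i = 0; i < 50 * 1000 * 1000; i++)
        x += i * 0.5;

    clock_gettime(CLOCK_MONOTONIC, &end);
    getrusage(RUSAGE_SELF, &ru);              /* user and system CPU time */

    double real = (end.tv_sec - start.tv_sec) +
                  (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("real %.3fs  user %.3fs  sys %.3fs\n",
           real, timeval_seconds(ru.ru_utime), timeval_seconds(ru.ru_stime));
    return 0;
}

On a lightly loaded machine, user + sys comes close to real; when the process blocks on I/O or competes with other processes, real keeps growing while the CPU times do not, which is the point of separating the two measures.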

  7. Choosing Programs to Measure Performance • Real Programs • Compilers, text-processing, CAD tools, etc. • Modified applications • Scripted or modified for portability • Kernels • Attempt to extract key sections from real programs (Livermore loops, Linpack) • Toy Benchmarks • Short examples (e.g. Sieve of Eratosthenes) • Synthetic Benchmarks • Whetstone, Dhrystone

  8. Benchmarking • H&P: car magazines are more scientific about reporting performance than many CS journals!

  9. Benchmark Suites • Collections of benchmarks • E.g. SPEC CPU2000 (INT and FP) • 25 real FORTRAN/C/C++ programs, modified for portability • Specific graphics benchmarks

  10. Server Benchmarks • SPEC also has server benchmarks • File server • Web server • TPC: Transaction Processing Performance Council • Various transaction processing benchmarks

  11. Embedded Benchmarks • Much less well developed • Tend to use Dhrystone! • EEMBC • Recent development • 34 benchmarks (mainly kernels) in five application areas

  12. Summarising Performance Measurements • Complex area • Weighted arithmetic mean • Geometric mean • Normalised results • …
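
To make the two summary statistics above concrete, a short C sketch (illustrative only; the benchmark times, reference times and weights are invented, not taken from the slides):

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Hypothetical execution times (seconds) of three benchmarks on one machine,
       plus reference times on a baseline machine; illustrative values only. */
    double time[]     = { 2.0, 10.0, 40.0 };
    double ref_time[] = { 4.0,  8.0, 80.0 };
    double weight[]   = { 0.5,  0.3,  0.2 };   /* relative importance in the workload */
    int n = 3;

    /* Weighted arithmetic mean of the raw execution times. */
    double wam = 0.0;
    for (int i = 0; i < n; i++)
        wam += weight[i] * time[i];

    /* Geometric mean of normalised results (reference time / measured time). */
    double gm = 1.0;
    for (int i = 0; i < n; i++)
        gm *= ref_time[i] / time[i];
    gm = pow(gm, 1.0 / n);

    printf("weighted arithmetic mean = %.2f s\n", wam);
    printf("geometric mean of ratios = %.2f\n", gm);
    return 0;
}

The geometric mean is the usual choice for normalised results because the ratio of geometric means equals the geometric mean of the ratios, so the ranking does not depend on which machine is used as the reference; the weighted arithmetic mean only makes sense when the weights reflect a real workload mix.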

  13. 1.6 Quantitative Principles • Make the common case fast! • E.g. addition: focus on “normal” addition, not overflow situations • Amdahl’s Law • Quantifies improvements gained by focussing on one aspect of a design

  14. Amdahl’s Law • Overall speedup = old execution time / new execution time • = 1 / ( (1 − Fraction enhanced) + Fraction enhanced / Speedup of enhancement )

  15. Example • We are considering an enhancement that is 10 times faster than the original, but is only used 40% of the time.
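
Working the example through Amdahl’s Law as written above:

\[
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.4) + \frac{0.4}{10}} = \frac{1}{0.64} \approx 1.56
\]

Even though the enhanced part runs 10 times faster, the program as a whole speeds up by only about 56%, because the 60% of execution time that is not enhanced limits the gain.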

  16. CPU Performance • CPU time related to clock speed: • Period (e.g. 1ns) • Rate (e.g. 1GHz) • Also interested in Cycles Per Instruction (CPI)

  17. Three Equal Factors • Clock rate (technology) • CPI (architecture) • Instruction count (architecture and compiler)
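
These three factors are tied together by the standard CPU performance equation (implied by the two slides above but not written out in the transcript):

\[
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
               = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}
\]

Improving any one factor by a given amount improves CPU time by the same amount, provided the other two stay fixed; in practice the three interact, which is why technology, architecture and compiler all matter.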

  18. Measuring IC & CPI • Many modern processors include hardware counters for instructions and clock cycles • Simulations can give even more detail • Time consuming, but can be very accurate
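
One way to read such hardware counters on Linux is the perf_event_open interface. The sketch below is not from the slides; it assumes a Linux system where unprivileged counter access is permitted, and it counts retired instructions and core cycles around a region of code to derive CPI:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* perf_event_open has no glibc wrapper, so call it via syscall(). */
static int open_counter(uint64_t config) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;            /* which hardware event to count */
    attr.disabled = 1;               /* start stopped; enable explicitly below */
    attr.exclude_kernel = 1;         /* count user-mode events only */
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0 /* this process */,
                        -1 /* any CPU */, -1 /* no group */, 0);
}

int main(void) {
    int fd_insn = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    int fd_cyc  = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    if (fd_insn < 0 || fd_cyc < 0) {
        /* Access may be restricted by /proc/sys/kernel/perf_event_paranoid. */
        perror("perf_event_open");
        return 1;
    }

    ioctl(fd_insn, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd_cyc,  PERF_EVENT_IOC_RESET, 0);
    ioctl(fd_insn, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(fd_cyc,  PERF_EVENT_IOC_ENABLE, 0);

    /* Region of interest: a dummy workload standing in for the real program. */
    volatile long sum = 0;
    for (long i = 0; i < 10 * 1000 * 1000; i++)
        sum += i;

    ioctl(fd_insn, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(fd_cyc,  PERF_EVENT_IOC_DISABLE, 0);

    uint64_t instructions = 0, cycles = 0;
    read(fd_insn, &instructions, sizeof(instructions));
    read(fd_cyc,  &cycles, sizeof(cycles));

    printf("instructions = %llu\ncycles       = %llu\nCPI          = %.2f\n",
           (unsigned long long)instructions, (unsigned long long)cycles,
           instructions ? (double)cycles / (double)instructions : 0.0);

    close(fd_insn);
    close(fd_cyc);
    return 0;
}

From the command line, perf stat ./program reports the same instruction and cycle counts (and a derived instructions-per-cycle figure) without modifying the program.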

  19. Another Principle: Locality • Locality of Reference • “90/10 Rule” • Also applies to data • Two aspects: • Temporal locality • Spatial locality
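
A small C sketch, not from the slides, of how the two kinds of locality show up in practice: both loops compute the same sum, but the row-major traversal walks memory sequentially while the column-major one strides across it:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 4096

/* Sum the matrix row by row: consecutive accesses fall in the same cache line
   (spatial locality), and recently used lines are reused (temporal locality). */
static long sum_row_major(int (*a)[N]) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Same sum, column by column: each access jumps N*sizeof(int) bytes,
   so for a large matrix nearly every reference misses the cache. */
static long sum_col_major(int (*a)[N]) {
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void) {
    int (*a)[N] = malloc(sizeof(int[N][N]));
    if (!a) return 1;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1;

    double t0 = seconds();
    long s1 = sum_row_major(a);
    double t1 = seconds();
    long s2 = sum_col_major(a);
    double t2 = seconds();

    printf("row-major:    sum=%ld  %.3f s\n", s1, t1 - t0);
    printf("column-major: sum=%ld  %.3f s\n", s2, t2 - t1);
    free(a);
    return 0;
}

On a typical cached machine the row-major version is several times faster; the 90/10 rule is the temporal side of the same story, since a small fraction of the code and data accounts for most references.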

  20. Taking Advantage of Parallelism • Key principle for improving performance • Examples: • System level: parallel processing, disk arrays, etc. • Processor level: pipelining • Digital design: caches, ALU adders, etc.

  21. 1.7 Putting It All Together: Performance & Price/Performance • Measure performance and performance/cost for three categories • Desktop (SPEC INT and FP) • TP Servers (TPC-C) • Embedded Processors (EEMBC)

  22. Desktop • Integer: • Performance/cost tracks performance • FP: • Not as closely related • Pentium 4 much better than Pentium III • AMD Athlon very good value for money

  23. Servers • Twelve systems • Six top performers • Six best price-performance • Multiprocessors: 3 to 280 P3s • Cost: $131,000 to $15 million

  24. Embedded Processors • Difficult to assess • Benchmarks very new • Designs very application-specific • Power a major constraint • Cost difficult to quantify (are support chips required?)

  25. Embedded Processors • Range: • 500MHz AMD K6 ($78) and IBM PowerPC ($94) used for network switches, etc. • 167MHz NEC VR 5432 ($25) popular in colour laser printers • 180MHz NEC VR 4122 ($33) popular in PDAs (low power)

  26. 1.8 Another View: Power Consumption and Efficiency • Embedded processors from previous example: power ranged from 700mW to 9600mW • Fig. 1.27: Performance/watt • NEC VR 4122 huge leader

  27. 1.9 Fallacies and Pitfalls • Fallacy: Relative performance of two similar processors can be judged by clock rate or by a single benchmark • Factors such as pipeline structure and memory system have major impact • E.g. Pentium III vs. Pentium 4 (Fig. 1.28)

  28. 1.7GHz P4 vs. 1.0GHz P3

  29. Fallacies and Pitfalls • Fallacy: Benchmarks remain valid indefinitely • Optimisations change • Perhaps deliberately! • Even real programs are affected by changes in technology • E.g. gcc: increasing percentage is “system time” • SPEC has adapted considerably

  30. Fallacies and Pitfalls • Pitfall: Comparing hand-coded assembly and compiled high-level language performance • E.g. embedded processor benchmarks • Hand-coded is 5 – 87 times faster!
