Systems Architecture II
Topics: Performance Evaluation and Benchmarking*
Notes courtesy of Jeremy R. Johnson
*This lecture was derived from material in the text (Chap. 2).
Introduction
• Objective: to quantify performance and relate it to design parameters, and to understand the role of benchmarking.
• Execution Time (sec) = Inst/Program × Cycles/Inst (CPI) × Sec/Cycle
• Topics
  • Performance definition
  • Performance parameters and equation
  • Benchmarking
  • Fallacies and pitfalls: Amdahl's law
Performance Definition
• Response time
• Throughput
• Cost
• Performance_X = 1 / Execution Time_X
• "X is n times faster than Y" means Performance_X / Performance_Y = n
• Example:

  Airplane            Passengers   Range (mi)   Speed (mph)   Throughput (passengers × mph)
  Boeing 777                 375         4630           610   228,750
  Boeing 747                 470         4150           610   286,700
  BAC/Sud Concorde           132         4000          1350   178,200
  Douglas DC-8-50            146         8720           544    79,424
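The airplane table can be checked with a few lines of Python. This is only a sketch: the figures are copied from the table, while the names `AIRPLANES`, `throughput`, and `times_faster` are made up for illustration. It shows that "fastest" depends on the metric, since the Concorde has the highest speed while the 747 has the highest throughput.

```python
# (passengers, cruising speed in mph), copied from the table above.
AIRPLANES = {
    "Boeing 777": (375, 610),
    "Boeing 747": (470, 610),
    "BAC/Sud Concorde": (132, 1350),
    "Douglas DC-8-50": (146, 544),
}


def throughput(passengers, speed_mph):
    """Throughput as defined in the table: passengers x mph."""
    return passengers * speed_mph


def times_faster(exec_time_x, exec_time_y):
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so n = Performance_X / Performance_Y.
    """
    return (1.0 / exec_time_x) / (1.0 / exec_time_y)


# Best plane by raw speed (response time for one passenger).
fastest = max(AIRPLANES, key=lambda name: AIRPLANES[name][1])

# Best plane by throughput (passengers moved per unit time).
highest_throughput = max(AIRPLANES, key=lambda name: throughput(*AIRPLANES[name]))

print(fastest, highest_throughput)
```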
Measuring Performance
• Execution time
  • Wall-clock (elapsed) time
  • CPU time (system vs. user)
  • Limited accuracy
• Instruction count (simulator/hardware counters)
• Cycle count (simulator/hardware counters)
• Memory performance (simulator/hardware counters)
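One way to see the difference between wall-clock time and CPU time is to measure both around the same call. This is a minimal sketch using Python's standard `time` and `os` modules; the helper `measure` and the workload `busy_sum` are made up for illustration, and the numbers vary from run to run, which is the "limited accuracy" noted above.

```python
import os
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall-clock seconds, CPU seconds)."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = func(*args)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return result, wall_elapsed, cpu_elapsed


def busy_sum(n):
    """A small CPU-bound workload (hypothetical stand-in for a real program)."""
    return sum(i * i for i in range(n))


result, wall, cpu = measure(busy_sum, 200_000)

# os.times() reports user and system CPU time separately.
process_times = os.times()
print(f"wall={wall:.6f}s cpu={cpu:.6f}s "
      f"user={process_times.user:.6f}s system={process_times.system:.6f}s")
```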
Performance Parameters and Equation
• Instruction count - depends on program, compiler, optimization flags, and instruction set architecture
• Cycles per instruction (CPI) - depends on the implementation of the architecture (datapath, pipelining, parallelism, etc.)
• Clock rate - depends on implementation design and technology
• Execution Time (sec) = Inst/Program × Cycles/Inst (CPI) × Sec/Cycle
Performance Equation Example
• Suppose we have two implementations of the same instruction set architecture
• Machine A has a clock cycle time of 1 ns and CPI = 2.0
• Machine B has a clock cycle time of 2 ns and CPI = 1.2
• Which machine is faster for this program? (I = instruction count, the same on both machines)
  CPU_A = I × 2.0 × 1 ns = 2 × I ns
  CPU_B = I × 1.2 × 2 ns = 2.4 × I ns
  Perf_A / Perf_B = CPU_B / CPU_A = 2.4 / 2 = 1.2
• A is 1.2 times faster than B
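The same comparison can be written as a short Python sketch of the performance equation. The function name `cpu_time_ns` and the instruction count of one million are assumptions chosen for illustration; the instruction count cancels in the ratio, so any positive value gives the same answer.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instructions x cycles/instruction x time/cycle."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical instruction count; both machines run the same program
# on the same instruction set, so the count is identical.
instruction_count = 1_000_000

time_a = cpu_time_ns(instruction_count, cpi=2.0, cycle_time_ns=1.0)
time_b = cpu_time_ns(instruction_count, cpi=1.2, cycle_time_ns=2.0)

# Performance is the reciprocal of time, so Perf_A / Perf_B = time_b / time_a.
speedup_a_over_b = time_b / time_a
print(f"A is {speedup_a_over_b:.1f} times faster than B")
```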
Comparing Code Segments
• Compare the efficiency of two code sequences (each term is instruction count × cycles per instruction for one instruction class; the three classes take 1, 2, and 3 cycles)
  CPU Clock Cycles_1 = 2 × 1 + 1 × 2 + 2 × 3 = 10 cycles
  CPU Clock Cycles_2 = 4 × 1 + 1 × 2 + 1 × 3 = 9 cycles
  CPI_1 = 10 cycles / 5 instructions = 2 cycles/inst
  CPI_2 = 9 cycles / 6 instructions = 1.5 cycles/inst
• The second sequence is faster (9 cycles vs. 10) even though it executes more instructions!
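The cycle counts above can be reproduced with a small Python sketch. The per-class cycle costs (1, 2, 3) and the instruction counts are read off the arithmetic on the slide; the names `CYCLES_PER_CLASS`, `total_cycles`, and `cpi` are made up for illustration.

```python
# Cycles per instruction for the three instruction classes,
# read off the products in the slide's arithmetic.
CYCLES_PER_CLASS = (1, 2, 3)


def total_cycles(counts):
    """Sum of (instruction count x cycles) over the instruction classes."""
    return sum(n * c for n, c in zip(counts, CYCLES_PER_CLASS))


def cpi(counts):
    """Average cycles per instruction for a code sequence."""
    return total_cycles(counts) / sum(counts)


sequence_1 = (2, 1, 2)  # instruction counts per class, 5 instructions total
sequence_2 = (4, 1, 1)  # instruction counts per class, 6 instructions total

for name, seq in (("sequence 1", sequence_1), ("sequence 2", sequence_2)):
    print(name, total_cycles(seq), "cycles, CPI =", cpi(seq))
```

Comparing `total_cycles` rather than instruction count or CPI alone is what decides which sequence runs faster at a fixed clock rate.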
Benchmarking
• Use sample programs that approximate actual usage
• Beware of
  • small (artificial, kernel) benchmarks
  • synthetic benchmarks (Whetstone, Dhrystone)
  • peak performance reports
  • use of parameters other than execution time (e.g. program size, MIPS)
• Make sure results are reproducible
• SPEC (Standard Performance Evaluation Corporation, originally the System Performance Evaluation Cooperative)
  • Collection of real-world integer and floating-point programs
  • http://www.specbench.org
  • CPU95 (SPECint95, SPECfp95) - the suite originated in 1989
  • CPU2000; also graphics, web, and other benchmarks
SPEC95 [figure omitted]
SPEC95
• Doubling the clock rate does not double performance
SPEC89
• Compiler "enhancements" and performance
Summarizing Results
• Example
  • Perf_B / Perf_A = 1001 / 110 = 9.1, so B is 9.1 times faster than A on total execution time
• Total execution time
• Arithmetic mean (weighted)
• Geometric mean (for ratios) - used by SPEC
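The difference between these summaries can be seen in a short Python sketch. The per-program times below are hypothetical, chosen only so that the totals match the 1001 and 110 in the example; the slide itself does not give the per-program breakdown. The helper `geometric_mean` is written out by hand for illustration.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logs."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-program execution times (assumption, not from the slide);
# the totals are 1001 for machine A and 110 for machine B.
times_a = [1.0, 1000.0]
times_b = [10.0, 100.0]

# Summary 1: total execution time.
total_ratio = sum(times_a) / sum(times_b)  # Perf_B / Perf_A

# Summary 2: geometric mean of per-program time ratios, normalized to A.
ratios_b_vs_a = [b / a for a, b in zip(times_a, times_b)]
gm_b_vs_a = geometric_mean(ratios_b_vs_a)

print(f"total time: B is {total_ratio:.1f} times faster than A")
print(f"geometric mean of B's normalized times: {gm_b_vs_a:.2f}")
```

With these assumed times the geometric mean of the ratios is 1, which rates the two machines as equal, while total execution time favors B by 9.1. The two summaries answer different questions, so it matters which one is reported.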
SPECint95
• Geometric mean of ratios compared to SPARC 10 Model 40

  Company            System                             CPUs   SPECint95   SPECint_base95
  Dell Computer Co   Dell Precision WorkStation 410        1        13.4             13.4
  Dell Computer Co   Dell Precision WorkStation 410        1        15.3             15.3
  Dell Computer Co   Precision WorkStation 410 (450MH      1        17.6             17.6
  Dell Computer Co   Precision WorkStation 410 (650 M      1        31.5             31.2
  Dell Computer Co   Precision WorkStation 410 (700 M      1        33.7             33.4
  Dell Computer Co   Precision WorkStation 420 (600 M      1        30.0             29.7
  Dell Computer Co   Precision WorkStation 420 (733 M      1        35.8             35.3
  Dell Computer Co   Precision WorkStation 610 (450MH      1        18.9             18.9
  Dell Computer Co   Precision Workstation 410 (450MH      1        18.6             18.6
  Dell Computer Co   Precision Workstation 410 (500MH      1        20.4             20.4
  Dell Computer Co   Precision Workstation 410 (550MH      1        22.6             22.6
  Dell Computer Co   Precision Workstation 410 (600MH      1        24.6             24.6
  Dell Computer Co   Precision Workstation 610             1        16.4             16.4
  Dell Computer Co   Precision Workstation 610             1        16.5             16.5
  Dell Computer Co   Precision Workstation 610 (450MH      1        19.0             19.0
  Dell Computer Co   Precision Workstation 610 (500MH      1        22.1             22.1
  Dell Computer Co   Precision Workstation 610 (500MH      1        21.6             21.6
  Dell Computer Co   Precision Workstation 610 (550MH      1        24.4             24.4
Amdahl's Law
• Execution Time After Improvement = Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
• Example: "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?" How about making it 5 times faster?
• Principle: make the common case fast
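The example can be worked through with a short Python sketch of the formula above. The function names `time_after_improvement` and `required_improvement` are made up for illustration; the second simply solves the formula for the amount of improvement and returns `None` when no finite improvement can reach the target.

```python
def time_after_improvement(unaffected, affected, improvement):
    """Amdahl's law: unaffected time + affected time / amount of improvement."""
    return unaffected + affected / improvement


def required_improvement(total, affected, target_speedup):
    """Improvement factor needed on the affected part, or None if unreachable."""
    unaffected = total - affected
    target_time = total / target_speedup
    # Time left for the affected part once the unaffected part is paid for.
    budget = target_time - unaffected
    if budget <= 0:
        return None  # even an infinitely fast multiply cannot reach the target
    return affected / budget


# 100 s total, 80 s of it spent in multiply.
print(required_improvement(100, 80, 4))  # multiply must be 16 times faster
print(required_improvement(100, 80, 5))  # None: the other 20 s already use the whole budget
```

Running 4 times faster means finishing in 25 s, which leaves 5 s for multiply, so multiply must be 80 / 5 = 16 times faster. Running 5 times faster means finishing in 20 s, but the unaffected 20 s already consume all of it, so no multiply speedup is enough.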
Remember
• Performance is specific to a particular program or set of programs
• Total execution time is a consistent summary of performance
• For a given architecture, performance increases come from:
  • increases in clock rate (without adverse effects on CPI)
  • improvements in processor organization that lower CPI
  • compiler enhancements that lower CPI and/or instruction count
• Pitfall: expecting an improvement in one aspect of a machine's performance to improve total performance by a proportional amount