
Computer System Performance Evaluation: Introduction


Presentation Transcript


  1. Computer System Performance Evaluation: Introduction Eileen Kraemer August 25, 2004

  2. Evaluation Metrics • What are the measures of interest? • Time to complete task • Per workload type (RT / TP / IC / batch) • Ability to deal with failures • Catastrophic / benign • Effective use of system resources

  3. Performance Measures • Responsiveness • Usage level • Missionability • Dependability • Productivity

  4. Classification of Computer Systems • General purpose • High availability • Real-time control • Mission-oriented • Long-life

  5. Techniques in Performance Evaluation • Measurement • Simulation Modeling • Analytic Modeling • Hybrid Modeling

  6. Applications of Performance Evaluation • System Design • System Selection • System Upgrade • System Tuning • System Analysis

  7. Workload Characterization • Inputs to evaluation: • Under admin control: • Scheduling discipline, device connections, resource allocation policies …. • Environmental inputs: • Inter-event times, service demands, failures • = workload • Drives the real system (measurement) • Input to simulation • Basis of distribution for analytic modeling

  8. Workload characterization • How much detail? How to represent? • Analytical modeling: • statistical properties • Simulation: • Event trace, either recorded or generated according to some statistical properties

  9. Benchmarking • Benchmarks are sets of well-known programs • Vendors run these programs and report results (some problems with this process)

  10. Metrics used (in absence of benchmarks).. • Processing rate: • MIPS (million instructions per second) • MFLOPS (million f.p. ops per second) • Not particularly useful • different instructions can take different amounts of time • Instructions and complexity of instructions differ from machine to machine, as will the # of instructions required to execute a particular program

  11. Benchmarks: • Provide opportunity to compare running times of programs written in a HLL • Characterize an application domain • Consist of a set of “typical” programs • Some application benchmarks (real programs), others are synthetic benchmarks

  12. Synthetic benchmarks • Programs designed to mimic real programs by matching their statistical properties • Fraction of statements of each type (=, if, for) • Fraction of variables of each type (int v real v char) (local v global) • Fraction of expressions with certain number and type of operators, operands
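  As a rough illustration of the idea, here is a minimal Python sketch (the statement types and their fractions are assumed for illustration, not taken from the slides) that draws statement types so their observed fractions approximate a target mix:

    # Hypothetical sketch: generate a synthetic statement mix whose
    # statement-type fractions match a target statistical profile.
    import random

    target_mix = {"assign": 0.5, "if": 0.2, "for": 0.2, "call": 0.1}  # assumed fractions

    def generate_statement_types(n, mix):
        """Draw n statement types so their observed fractions approximate the mix."""
        types = list(mix.keys())
        weights = list(mix.values())
        return [random.choices(types, weights)[0] for _ in range(n)]

    stmts = generate_statement_types(1000, target_mix)
    observed = {t: stmts.count(t) / len(stmts) for t in target_mix}
    print(observed)  # should be close to target_mix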

  13. Synthetic Benchmarks • Pro: • Can model a domain of application programs in a single program

  14. Synthetic Benchmarks • Con: • If expressions for conditionals are chosen randomly, then code sections may be unreachable and eliminated by a “smart” compiler • Locality-of-reference seen in normal programs may be violated => resource allocation algorithms that rely on locality-of-reference affected • May be small enough to fit in cache => unusually good performance, not representative of domain the benchmark is designed to represent

  15. Well-known benchmarks for measuring CPU performance • Whetstone – “old” • Dhrystone – improved on Whetstone • Linpack • Newer: • Spice, gcc, li, nasa7, Livermore • See: http://www.netlib.org/benchmark/ • Java benchmarks: • See http://www-2.cs.cmu.edu/~jch/java/resources.html

  16. Whetstone (1972) • Synthetic • Models Fortran, heavy on f.p. ops • Outdated, arbitrary instruction mixes • Not useful with optimizing or parallelizing compilers • Results in mega-whetstones/sec

  17. Dhrystone (1984) • Synthetic, C (originally Ada) • Models programs with mostly integer arithmetic and string manipulation • Only 100 HLL statements – fits in cache • Calls only strcpy(), strcmp() – if compiler inlines these, then not representative of real programs • Results stated in “Dhrystones / second”

  18. Linpack • Solves a dense 100 x 100 linear system of equations using the Linpack library package • The element-wise update A(i) = B(i) + C*D(i) accounts for roughly 80% of the running time • Still too small to really exercise the hardware
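  For concreteness, the kernel the slide refers to looks like the following minimal Python sketch (plain lists rather than the actual Fortran Linpack routines):

    # Sketch of the DAXPY-style update that dominates Linpack's runtime:
    # a(i) = b(i) + c * d(i) for every element (roughly 80% of the time).
    def daxpy_like(b, d, c):
        """Element-wise a = b + c*d over plain Python lists."""
        return [bi + c * di for bi, di in zip(b, d)]

    a = daxpy_like([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 0.5)
    print(a)  # [3.0, 4.5, 6.0]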

  19. “Newer” • Spice • Mostly Fortran, int and fp arith, analog circuit simulation • gcc • Gnu C compiler • Li • Lisp interpreter, written in C • Nasa7 • Fortran, 7 kernels using double-precision arithmetic

  20. How to compare machines? [Diagram: five machines, A–E, to be compared]

  21. How to compare machines? [Diagram: machines A–E compared against the VAX 11/780, the typical 1 MIPS reference machine]

  22. To calculate a MIPS rating • Choose a benchmark • MIPS rating of X = time on VAX 11/780 / time on X • So, if the benchmark takes 100 sec on the VAX and 4 sec on X, then X is a 25 MIPS machine
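  A minimal sketch of that calculation, using the slide's numbers:

    # Relative MIPS: the VAX 11/780 is treated as a 1 MIPS reference,
    # so MIPS(X) = time on VAX / time on X.
    def relative_mips(vax_seconds, x_seconds):
        return vax_seconds / x_seconds

    print(relative_mips(100.0, 4.0))  # 25.0 -> X is a "25 MIPS" machine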

  23. Cautions in calculating MIPS • Benchmarks for all machines should be compiled by similar compilers with similar settings • Need to control and explicitly state the configuration (cache size, buffer sizes, etc.)

  24. Features of interest for evaluation: • Integer arithmetic • Floating point arithmetic • Cache management • Paging • I/O • Could test one at a time … or, using synthetic program, exercise all at once

  25. Synthetic programs .. • Evaluate multiple features simultaneously, parameterized for characteristics of workload • Pro: • Beyond CPU performance, can also measure system throughput, investigate alternative strategies • Con: • Complex, OS-dependent • Difficult to choose params that accurately reflect real workload • Generates lots of raw data

  26. “Script” approach • Have real users work on the machine of interest, recording all actions of users in a real computing environment • Pro: • Can compare system under control and test conditions (disk 1 v. disk 2), (buf size 1 v. buf size 2), etc. under real workload conditions • Con: • Too many dependencies, may not work on other installations – even if same machine • System needs to be up and running already • Bulky

  27. SPEC = System Performance Evaluation Cooperative (Corporation) • Mission: to establish, maintain, and endorse a standardized set of relevant benchmarks for performance evaluation of modern computer systems • SPECCPU – both int and fp version • Also for JVMs, web, graphics, other special purpose benchmarks • See: http://www.specbench.org

  28. Methodology: • 10 benchmarks: • Integer: gcc, espresso, li, eqntott • Floating point: spice, doduc, nasa7, matrix, fpppp, tomcatv

  29. Metrics: • SPECint: • Geometric mean of t(gcc), t(espresso), t(li), t(eqntott) • SPECfp: • Geometric mean of t(spice), t(doduc), t(nasa7), t(matrix), t(fpppp), t(tomcatv) • SPECmark: • Geometric mean of SPECint, SPECfp

  30. Metrics, cont’d • SPECthruput: measure of CPU performance under moderate CPU contention • Multiprocessor with n processors: two copies of the SPEC benchmark run concurrently on each CPU, elapsed time noted • SPECthruput = time on VAX 11/780 / time on machine X

  31. Geometric mean ??? • Arithmetic mean(x1, x2, …, xn) = (x1 + x2 + … + xn) / n • AM(10, 50, 90) = (10 + 50 + 90) / 3 = 50 • Geometric mean(x1, x2, …, xn) = (x1 · x2 · … · xn)^(1/n) • GM(10, 50, 90) = (10 · 50 · 90)^(1/3) ≈ 35.6 • Harmonic mean(x1, x2, …, xn) = n / (1/x1 + 1/x2 + … + 1/xn) • HM(10, 50, 90) = 3 / (1/10 + 1/50 + 1/90) ≈ 22.88
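  A small Python sketch of the three means, reproducing the slide's numbers:

    # The three means from the slide, checked against its example values.
    from math import prod

    def arithmetic_mean(xs):
        return sum(xs) / len(xs)

    def geometric_mean(xs):
        return prod(xs) ** (1.0 / len(xs))

    def harmonic_mean(xs):
        return len(xs) / sum(1.0 / x for x in xs)

    data = [10, 50, 90]
    print(arithmetic_mean(data))  # 50.0
    print(geometric_mean(data))   # ~35.57
    print(harmonic_mean(data))    # ~22.88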

  32. Why geometric mean? Why not AM? • The arithmetic mean doesn’t preserve running-time ratios (nor does the harmonic mean) – the geometric mean does • Example: see the sketch below
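  A hypothetical example (the running times below are assumed for illustration, not taken from the slide) of why the arithmetic mean of ratios is misleading while the geometric mean stays consistent:

    # Normalize two programs' running times to machine A and to machine B.
    # The arithmetic mean of the ratios can make each machine look slower
    # than the other; the geometric mean gives reciprocal, consistent results.
    from math import prod

    def gm(xs):
        return prod(xs) ** (1.0 / len(xs))

    def am(xs):
        return sum(xs) / len(xs)

    times_a = [2.0, 40.0]   # program 1 and 2 on machine A (seconds)
    times_b = [4.0, 10.0]   # the same programs on machine B

    ratios_b_over_a = [b / a for a, b in zip(times_a, times_b)]  # [2.0, 0.25]
    ratios_a_over_b = [a / b for a, b in zip(times_a, times_b)]  # [0.5, 4.0]

    print(am(ratios_b_over_a), am(ratios_a_over_b))  # 1.125 and 2.25: each looks "slower"
    print(gm(ratios_b_over_a), gm(ratios_a_over_b))  # ~0.707 and ~1.414: reciprocal, consistent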

  33. Highly Parallel Architectures • For parallel machines/programs, performance depends on: • Inherent parallelism of application • Ability of machine to exploit parallelism • Less than full parallelism may result in performance << peak rate

  34. Amdahl’s Law • f = fraction of a program that is parallelizable • 1 – f = fraction of a program that is purely sequential • S(n) = effective speed with n processors • S(n) = S(1) / ((1 – f) + f/n) • As n → infinity, S(n) → S(1) / (1 – f)

  35. Example • S(n) = S(1) / ((1 – f) + f/n) • As n → infinity, S(n) → S(1) / (1 – f) • Let f = 0.5: with infinite n, max S(∞) = 2 • Let f = 0.8: with infinite n, max S(∞) = 5 • MIPS/MFLOPS ratings are not particularly useful for a parallel machine
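  A small sketch of Amdahl's Law as written above, reproducing the quoted limits:

    # Amdahl's Law: S(n) = S(1) / ((1 - f) + f / n), approaching S(1)/(1 - f) as n grows.
    def amdahl_speedup(f, n, s1=1.0):
        """Speedup with n processors when a fraction f of the work is parallelizable."""
        return s1 / ((1.0 - f) + f / n)

    print(amdahl_speedup(0.5, 10**9))  # -> ~2, matching the f = 0.5 limit
    print(amdahl_speedup(0.8, 10**9))  # -> ~5, matching the f = 0.8 limit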

  36. Are synthetic benchmarks useful for evaluating parallel machines? • Will depend on : inherent parallelism • Data parallelism • Code parallelism

  37. Data parallelism • multiple data items operated on in parallel by same op • SIMD machines • Works well with vectors, matrices, lists, sets • Metrics: • avg #data items operated on per op • (depends on problem size) • (#data items operated on / # data items) per op • Depends on type of problem

  38. Code parallelism • How finely can the problem be divided into parallel sub-units? • Metric: average parallelism = sum over n (from 1 to infinity) of n · f(n), where f(n) = fraction of code that can be split into at most n parallel activities • … not that easy to estimate • … not all that informative when you do • … dependencies may exist between parallel tasks, or between parallel and non-parallel sections of code
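  A minimal sketch of the average-parallelism metric; the profile f(n) used here is hypothetical, not from the slides:

    # Average parallelism: sum of n * f(n), where f(n) is the fraction of the
    # code that can be split into at most n parallel activities.
    def average_parallelism(f):
        """f maps n -> fraction of code splittable into at most n activities."""
        return sum(n * frac for n, frac in f.items())

    # Hypothetical profile: 40% sequential, 40% splits into at most 2
    # activities, 20% into at most 8.
    print(average_parallelism({1: 0.4, 2: 0.4, 8: 0.2}))  # 2.8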

  39. Evaluating performance of parallel machines is more difficult than doing so for sequential machines • Problem: • A well-designed parallel algorithm depends on the number of processors, interconnection pattern (bus, crossbar, mesh), interaction mechanism (shared memory, message passing), and vector register size • Solution: • Pick the optimal algorithm for each machine • Problem: that’s hard to do! … and may also depend on the actual number of processors, etc.

  40. Other complications • Language limitations, dependencies • Compiler dependencies • OS characteristics: • Timing (communication v. computation) • Process management (light v. heavy)

  41. More complications • A small benchmark may reside entirely in cache (Dhrystone) • A large memory may eliminate paging for medium-sized programs, hiding the effects of a poor paging scheme • Benchmark may not have enough I/O • Benchmark may contain dead code or easily optimizable code

  42. Metrics • Speedup: S(p) = running time of the best possible sequential algorithm / running time of the parallel implementation using p processors • Efficiency = S(p) / p
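  A small sketch of both metrics; the running times below are hypothetical:

    # Speedup and efficiency for a parallel implementation on p processors.
    def speedup(best_sequential_time, parallel_time):
        """S(p) = best sequential running time / parallel running time."""
        return best_sequential_time / parallel_time

    def efficiency(best_sequential_time, parallel_time, p):
        """Efficiency = S(p) / p."""
        return speedup(best_sequential_time, parallel_time) / p

    print(speedup(120.0, 20.0))        # 6.0 with these assumed times
    print(efficiency(120.0, 20.0, 8))  # 0.75 on 8 processors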
