Lecture 2: Performance Measurement

Performance Evaluation
• The primary duty of software developers is to create functionally correct programs.
• Performance evaluation is the part of software development that produces well-performing programs.

Performance Analysis Cycle
• Have an optimization phase, just like the testing and debugging phases.
• Cycle: Code development -> functionally complete and correct program -> Measure -> Analyze -> Modify / Tune (repeated as needed) -> complete, correct and well-performing program -> Usage

Goals of Performance Analysis
The goal of performance analysis is to provide quantitative information about the performance of a computer system.
• Compare alternatives
  • When purchasing a new computer system, to provide quantitative information
• Determine the impact of a feature
  • In designing a new system or upgrading, to provide a before-and-after comparison
• System tuning
  • To find the parameters that produce the best overall performance
• Identify relative performance
  • To quantify the performance relative to previous generations
• Performance debugging
  • To identify the performance problems and correct them
• Set expectations
  • To determine the expected capabilities of the next generation

Performance Evaluation
Performance evaluation steps:
• Measurement / Prediction
  • What to measure? How to measure?
  • Modeling for prediction
    • Simulation
    • Analytical modeling
• Analysis & Reporting
  • Performance metrics

Performance Measurement: Interval Timers
• Hardware timers
• Software timers

Performance Measurement: Hardware Timers
• The counter value is read from a memory location.
• Hardware path: clock (period Tc) -> n-bit counter -> processor memory bus
• Time is calculated from two counter readings x1 and x2 as
  $\text{Time} = (x_2 - x_1) \times T_c$

Performance Measurement: Software Timers
• Interrupt-based: when an interrupt occurs, the interrupt-service routine increments the timer value, which is read by a program.
• Hardware path: clock (period Tc) -> prescaling counter (period T'c) -> processor interrupt input
• Time is calculated from two timer readings x1 and x2 as
  $\text{Time} = (x_2 - x_1) \times T'_c$

Performance Measurement: Timer Rollover
• Occurs when an n-bit counter undergoes a transition from its maximum value $2^n - 1$ to zero.
• There is a trade-off between rollover time and accuracy: for a fixed n, a shorter tick gives finer resolution but an earlier rollover.

Timers
Solution to rollover:
• Use a 64-bit integer (rollover takes over half a million years).
• The timer returns two values:
  • one represents seconds
  • one represents microseconds since the last second
• With 32 bits, the rollover time is over 100 years.
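The rollover arithmetic can be sketched in C. This is a minimal illustration, not code from the lecture: the names `counter_delta` and `rollover_seconds` are hypothetical, and the sketch assumes a free-running 32-bit counter that wraps at most once between the two reads.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two reads of a free-running 32-bit counter.
   Unsigned subtraction is performed modulo 2^32, so the result stays
   correct across a single rollover (x2 numerically smaller than x1). */
uint32_t counter_delta(uint32_t x1, uint32_t x2) {
    return x2 - x1;
}

/* Time in seconds until an n-bit counter with tick period tc rolls over,
   i.e. tc * 2^n_bits. */
double rollover_seconds(unsigned n_bits, double tc) {
    assert(n_bits > 0 && n_bits <= 64 && tc > 0.0);
    double t = tc;
    for (unsigned i = 0; i < n_bits; i++)
        t *= 2.0;
    return t;
}
```

Assuming a 1 microsecond tick, `rollover_seconds(32, 1e-6)` is roughly 4295 seconds (about 72 minutes), while `rollover_seconds(64, 1e-6)` is roughly 1.8e13 seconds, which is consistent with the half-a-million-year figure for a 64-bit value.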

Performance Measurement: Interval Timers
    T0 <- read current time
    event being timed ();
    T1 <- read current time
Time for the event is T1 - T0.
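The T0 / T1 pattern might look as follows in C on a POSIX system. This is a sketch under assumptions: it uses `clock_gettime` with `CLOCK_MONOTONIC` as the timer, and `read_time`, `sample_event` and `time_event` are hypothetical names introduced for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Read the monotonic clock and return it in seconds. */
static double read_time(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical event to be timed: a short busy loop. */
void sample_event(void) {
    volatile unsigned long i;
    for (i = 0; i < 100000UL; i++) { }
}

/* Returns T1 - T0 for one execution of event(). */
double time_event(void (*event)(void)) {
    double t0 = read_time();
    event();
    double t1 = read_time();
    return t1 - t0;
}
```

A caller would write `double seconds = time_event(sample_event);`. A monotonic clock is chosen here so that adjustments to the wall-clock time cannot make the difference negative.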

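Performance Measurement: Timer Overhead
Sequence of a timed measurement:
• T1: read_time is initiated until the current time is read
• T2: the current time is read until the event begins
• T3: the event begins until the event ends and read_time is initiated again
• T4: read_time is initiated until the current time is read
Measured time: $T_m = T_2 + T_3 + T_4$
Desired measurement: $T_e = T_3 = T_m - (T_2 + T_4) = T_m - (T_1 + T_2)$, since $T_1 = T_4$
Timer overhead: $T_{ovhd} = T_1 + T_2$
$T_e$ should be 100 to 1000 times greater than $T_{ovhd}$.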
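One way to estimate the timer overhead is to time an empty event, so that the two reads are back to back. The sketch below assumes a POSIX `clock_gettime` timer; `read_time`, `timer_overhead` and `corrected_time` are hypothetical names, and clamping the corrected time at zero is a design choice of this sketch, not something stated in the lecture.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Read the monotonic clock and return it in seconds. */
static double read_time(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average overhead of one pair of timer reads, estimated by timing
   an empty event reps times. */
double timer_overhead(int reps) {
    assert(reps > 0);
    double total = 0.0;
    for (int i = 0; i < reps; i++) {
        double t0 = read_time();
        double t1 = read_time();   /* nothing between the two reads */
        total += t1 - t0;
    }
    return total / reps;
}

/* Te = Tm - Tovhd, clamped at zero in case noise makes it negative. */
double corrected_time(double tm, double tovhd) {
    double te = tm - tovhd;
    return te > 0.0 ? te : 0.0;
}
```

If the corrected time is not at least about 100 times the estimated overhead, the measurement is dominated by the timer itself and the event should be repeated or lengthened.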

Performance Measurement: Timer Resolution
• Resolution is the smallest change that can be detected by an interval timer.
• A reading of n ticks only bounds the event time: $n T'_c < T_e < (n+1) T'_c$
• If $T_c$ is large relative to the event being measured, it may be impossible to measure the duration of the event.

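Performance Measurement: Measuring Short Intervals
When $T_e < T_c$, a single measurement returns either 1 tick (a clock edge falls inside the event) or 0 ticks (no clock edge falls inside the event).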

Performance Measurement: Measuring Short Intervals
Solution: repeat the measurement n times.
• Counting ticks: average execution time $T'_e = (m \times T_c) / n$, where m is the number of 1s measured.
• Timing the whole loop: average execution time $T'_e = (T_t / n) - h$, where $T_t$ is the total execution time of the n repetitions and h is the repetition overhead.
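The two averaging formulas can be written directly as C functions. This is only a sketch of the arithmetic; the function names are hypothetical, and the inputs (tick counts, total time, repetition overhead) are assumed to have been measured elsewhere.

```c
#include <assert.h>

/* T'e = (m * Tc) / n: m of the n one-shot measurements returned 1 tick. */
double estimate_from_ticks(unsigned long m, unsigned long n, double tc) {
    assert(n > 0 && m <= n);
    return (double)m * tc / (double)n;
}

/* T'e = (Tt / n) - h: Tt is the total time of n repetitions and
   h is the per-repetition loop overhead. */
double estimate_from_total(double tt, unsigned long n, double h) {
    assert(n > 0);
    return tt / (double)n - h;
}
```

For instance, if 25 out of 100 measurements with a 0.5 second tick return 1, the first estimate is 0.125 seconds per event.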

Performance Measurement: Time
• Elapsed time / wall-clock time / response time
  • Latency to complete a task, including disk access, memory access, I/O, operating system overhead, and everything else (includes time consumed by other programs in a time-sharing system)
• CPU time
  • The time the CPU is computing, not including I/O time or waiting time
  • User time / user CPU time: CPU time spent in the program
  • System time / system CPU time: CPU time spent in the operating system performing tasks requested by the program
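On POSIX systems, user and system CPU time can be read separately with `getrusage`. The wrapper below is a small sketch; the name `cpu_times` is hypothetical, and the code assumes the standard `ru_utime` and `ru_stime` fields of `struct rusage`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Stores the user and system CPU time (in seconds) consumed so far by
   the calling process. Returns 0 on success and -1 on failure. */
int cpu_times(double *user, double *sys) {
    struct rusage ru;
    assert(user != NULL && sys != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec * 1e-6;
    *sys  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec * 1e-6;
    return 0;
}
```

Calling `cpu_times` before and after a code section and subtracting gives the CPU time of that section, which excludes time spent waiting for I/O or for other processes.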

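Performance Measurement: UNIX time command
Example output: 90.7u 12.9s 2:39 65%
• 90.7u: user time (seconds)
• 12.9s: system time (seconds)
• 2:39: elapsed time
• 65%: CPU time as a percentage of elapsed time, here (90.7 + 12.9) / 159 s
Drawbacks:
• Resolution is in milliseconds
• Different sections of the code cannot be timed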

Timers
A timer is a function, subroutine or program that can be used to return the amount of time spent in a section of code.
Timer that returns the time elapsed since its argument:
    zero = 0.0;
    t0 = timer(&zero);
    … < code segment > …
    t1 = timer(&t0);
    time = t1;
Timer that returns the current time:
    t0 = timer();
    … < code segment > …
    t1 = timer();
    time = t1 - t0;

Timers
Read Wadleigh and Crawford, pp. 130-136, for time, clock, gettimeofday, etc.

Timers: Measuring Timer Resolution
    main() {
        . . .
        zero = 0.0;
        t0 = timer(&zero);
        t1 = 0.0;
        j = 0;
        /* grow the work done in foo() until the timer reports a nonzero time */
        while (t1 == 0.0) {
            j++;
            zero = 0.0;
            t0 = timer(&zero);
            foo(j);
            t1 = timer(&t0);
        }
        printf("It took %d iterations for a nonzero time\n", j);
        if (j == 1)
            printf("timer resolution <= %13.7f seconds\n", t1);
        else
            printf("timer resolution is %13.7f seconds\n", t1);
    }

    foo(n) {
        . . .
        i = 0;
        for (j = 0; j < n; j++)
            i++;
        return (i);
    }

Timers: Measuring Timer Resolution
• Using clock():
    It took 682 iterations for a nonzero time
    timer resolution is 0.0200000 seconds
• Using times():
    It took 720 iterations for a nonzero time
    timer resolution is 0.0200000 seconds
• Using getrusage():
    It took 7374 iterations for a nonzero time
    timer resolution is 0.0002700 seconds

Timers: Spin Loops
• For codes that take less time to run than the resolution of the timer.
• The first call to a function may require an inordinate amount of time, so the minimum of all times may be desired.
    main() {
        . . .
        zero = 0.0;
        t2 = 100000.0;
        for (j = 0; j < n; j++) {
            t0 = timer(&zero);
            foo(j);
            t1 = timer(&t0);
            t2 = min(t2, t1);   /* keep the fastest run */
        }
        t2 = t2 / n;
        printf("Minimum time is %13.7f seconds\n", t2);
    }

    foo(n) {
        . . .
        < code segment >
    }

Profilers
• A profiler automatically inserts timing calls into applications.
• It is used to identify the portions of the program that consume the largest fraction of the total execution time.
• It may also be used to find system-level bottlenecks in a multitasking system.
• Profilers may alter the timing of a program's execution.

Profilers: Data collection techniques
• Sampling-based
  • These profilers use a predefined clock; at every multiple of the clock tick the program is interrupted and its state information is recorded.
  • They give a statistical profile of the program behavior.
  • They may miss some important events.
• Event-based
  • Events are defined (e.g. entry into a subroutine) and data about these events are collected.
  • The collected information shows the exact execution frequencies.
  • They have a substantial run-time overhead and memory requirement.
Information kept:
• Trace-based: the profiler keeps all the information it collects.
• Reductionist: only statistical information is collected.

Performance Evaluation
Performance evaluation steps:
• Measurement / Prediction
  • What to measure? How to measure?
  • Modeling for prediction
    • Simulation
    • Analytical modeling
    • Queuing theory
• Analysis & Reporting
  • Performance metrics

Predicting Performance
• Performance of simple kernels can be predicted to a high degree of accuracy.
• Theoretical performance and peak performance must be close.
• It is preferred that the measured performance is over 80% of the theoretical peak performance.

Performance Metrics
Measurable characteristics of a computer system:
• Count of an event
• Duration of a time interval
• Size of a parameter
• Rate: operations executed per second

Performance Metrics: Clock Speed
• Clock speed / frequency (f): the rate of clock pulses (e.g. 1 GHz)
• Cycle time (Tc): the time between two clock pulses, $T_c = 1/f$

Performance Metrics: Instruction Execution Rate
• Cycles per instruction (CPI): an average that depends on the design of the micro-architecture (hardwired/microprogrammed, pipelined)
• Number of instructions: the number of instructions executed at runtime; depends on
  • the instruction set architecture (ISA)
  • the compiler
• $CPI_i$: number of cycles required for instruction type i
• $I_i$: number of executed instructions of type i
$CPI = \frac{\sum_i CPI_i \times I_i}{\sum_i I_i}$
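The average CPI can be computed as a weighted average over instruction types. The sketch below assumes the usual definition, total cycles divided by total executed instructions, built from the per-type values CPIi and Ii defined above; the function name `average_cpi` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Average CPI = sum(CPI_i * I_i) / sum(I_i) over n_types instruction types.
   cpi[i] is the cycle count of type i, count[i] how often it executed. */
double average_cpi(const double *cpi, const unsigned long *count,
                   size_t n_types) {
    assert(n_types > 0);
    double cycles = 0.0;
    double instructions = 0.0;
    for (size_t i = 0; i < n_types; i++) {
        cycles += cpi[i] * (double)count[i];
        instructions += (double)count[i];
    }
    assert(instructions > 0.0);
    return cycles / instructions;
}
```

With three instruction types costing 1, 2 and 4 cycles and executed 2, 1 and 1 times, the program takes 8 cycles for 4 instructions, so the average CPI is 2.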

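Performance Metrics: CPU Performance
CPU time of a program:
$T = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{time}}{\text{cycle}}$
Since cycles per instruction is the CPI and the time per cycle is $1/f$:
$T = \text{instruction count} \times CPI \times \frac{1}{f}$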
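The CPU time equation translates into a one-line C function. This is a sketch of the arithmetic only; the name `cpu_time` and the choice of `double` for the instruction count are assumptions made for illustration.

```c
#include <assert.h>

/* T = instruction count * CPI * (1 / f), with f in Hz and T in seconds. */
double cpu_time(double instruction_count, double cpi, double clock_hz) {
    assert(clock_hz > 0.0);
    return instruction_count * cpi / clock_hz;
}
```

For example, a program that executes 1e9 instructions with an average CPI of 2 on a 1 GHz clock needs about 2 seconds of CPU time.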

Performance Metrics: CPU Performance
Drawbacks:
• In modern computers, no program runs without some operating system running on the hardware.
• Comparing performance between machines with different operating systems will be unfair.

Performance Metrics: Execution time
• Elapsed time / wall-clock time / response time
  • Latency to complete a task, including disk access, memory access, I/O, operating system overhead, and everything else (includes time consumed by other programs in a time-sharing system)
• CPU time
  • The time the CPU is computing, not including I/O time or waiting time
  • User time / user CPU time: CPU time spent in the program
  • System time / system CPU time: CPU time spent in the operating system performing tasks requested by the program

Performance Metrics: Performance Comparison
Relative performance:
$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$
$\text{Performance ratio} = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$

Performance Metrics: Relative Performance
• If the workload consists of more than one program, the total execution time may be used.
• If several machines are compared, one of them must be selected as the reference.

Performance Metrics: Throughput
• Total amount of work done in a given time
• Measured in tasks per time unit
• Can be used for
  • Operating system performance
  • Pipeline performance
  • Multiprocessor performance

Performance Metrics: MIPS (million instructions per second)
• Includes both integer and floating-point performance
• The number of instructions in a program varies between different computers
• The number of instructions varies between different programs on the same computer
$MIPS = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{CPI \times 10^6}$
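Both forms of the MIPS formula can be checked against each other with a short C sketch. The function names are hypothetical; the inputs are assumed to be an instruction count with an execution time in seconds, or a clock rate in Hz with an average CPI.

```c
#include <assert.h>

/* MIPS = instruction count / (execution time * 10^6). */
double mips_from_count(double instruction_count, double exec_seconds) {
    assert(exec_seconds > 0.0);
    return instruction_count / (exec_seconds * 1e6);
}

/* MIPS = clock rate / (CPI * 10^6). */
double mips_from_clock(double clock_hz, double cpi) {
    assert(cpi > 0.0);
    return clock_hz / (cpi * 1e6);
}
```

The two forms agree because the execution time is the instruction count times the CPI divided by the clock rate; for a 1 GHz clock and a CPI of 2, both give 500 MIPS.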

Performance Metrics: MFLOPS (million floating-point operations per second)
• Gives the performance of floating-point operations only
• Different mixes of integer and floating-point operations may have different execution times:
  • Integer and floating-point units work independently
  • Instruction and data caches provide instructions and data concurrently

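Performance Metrics: Utilization and Speciality Ratio
$\text{Utilization} = \frac{\text{Busy time}}{\text{Total time}}$
$\text{Speciality ratio} = \frac{\text{Maximum performance}}{\text{Minimum performance}}$
• A speciality ratio close to 1 indicates a general-purpose system.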

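Performance Metrics: Asymptotic and Half Performance
• r: asymptotic performance
• $n_{1/2}$: half performance (the length n at which half of r is achieved)
$T = r^{-1}(n + n_{1/2})$, with $r = 1/t$ and $n_{1/2} = t_0/t$
Figure: T plotted against n is a straight line with slope $r^{-1}$; it passes through $T = t_0$ at $n = 0$ and $T = 2t_0$ at $n = n_{1/2}$, and crosses $T = 0$ at $n = -n_{1/2}$.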
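The relation between r, n1/2 and the execution time can be illustrated in C. The sketch assumes that t0 is a fixed startup time and t is the time per element, so that T = t0 + n t, which equals (n + n1/2) / r with r = 1/t and n1/2 = t0/t; the function names are hypothetical.

```c
#include <assert.h>

/* Asymptotic performance r = 1 / t (elements per second). */
double asymptotic_rate(double t) {
    assert(t > 0.0);
    return 1.0 / t;
}

/* Half-performance length n_1/2 = t0 / t. */
double half_length(double t0, double t) {
    assert(t > 0.0);
    return t0 / t;
}

/* Execution time for n elements: T = t0 + n * t = (n + n_1/2) / r. */
double op_time(double n, double t0, double t) {
    return t0 + n * t;
}
```

With t0 = 2 and t = 0.5, the half-performance length is 4. Processing 4 elements then takes 2 t0 = 4 time units, so the achieved rate is 1 element per time unit, exactly half of r = 2.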

Performance Metrics: Speedup
• Expresses how much faster system 2 is than system 1
• Calculated directly from the execution times
$\text{Performance}_X = \frac{1}{\text{Execution time}_X} = \frac{1}{T_X}$
$\text{Speedup}_{2,1} = \frac{\text{Performance}_2}{\text{Performance}_1} = \frac{T_1}{T_2}$

Performance Metrics: Relative Change
• Expresses the performance of system 2 relative to system 1
$\text{Performance}_X = \frac{1}{\text{Execution time}_X} = \frac{1}{T_X}$
$\text{Relative change}_{2,1} = \frac{\text{Performance}_2 - \text{Performance}_1}{\text{Performance}_1} = \frac{T_1 - T_2}{T_2} = \text{Speedup}_{2,1} - 1$
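Speedup and relative change follow directly from two execution times. A minimal C sketch, with hypothetical function names, where t1 and t2 are the execution times of system 1 and system 2:

```c
#include <assert.h>

/* Speedup of system 2 over system 1: T1 / T2. */
double speedup(double t1, double t2) {
    assert(t2 > 0.0);
    return t1 / t2;
}

/* Relative change of system 2 with respect to system 1:
   (T1 - T2) / T2, which equals speedup - 1. */
double relative_change(double t1, double t2) {
    assert(t2 > 0.0);
    return (t1 - t2) / t2;
}
```

If system 1 needs 10 seconds and system 2 needs 5 seconds, the speedup is 2 and the relative change is 1, i.e. system 2 is 100% faster. A negative relative change means system 2 is slower.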

Performance Metrics: Statistical Analysis
• Used to compare performance
• The workload consists of many programs
• Depends on the nature of the data as well as the distribution of the test results

Performance Metrics: Indices of Central Tendency
Used to summarize multiple measurements:
• Mean
• Median
• Mode

Performance Metrics: Mean (average)
• Gives equal weight to all measurements
$\text{Arithmetic mean} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Performance Metrics: Median
• Order all n measurements; the middle value is the median.
• If n is even, the median is the mean of the middle 2 values.
• Using the median instead of the mean reduces the skewing effect of outliers.
Example: Median = 17

Performance Metrics: Mode
• The mode is the value that occurs most frequently.
• If all values occur once, there is no mode.
• If several values occur equally often with the highest frequency, there are several modes.
Example: Mode = 20

Mean, Median, Mode
• Mean
  • Incorporates information from all of the measured values
  • Sensitive to outliers
• Median and Mode
  • Less sensitive to outliers
  • Do not effectively use all the information
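The three indices can be computed in C as sketched below. The function names are hypothetical, and the sketch makes two simplifying assumptions: measurements are compared for exact equality when counting the mode, and when several values tie for the highest frequency only the smallest one is reported.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Returns a sorted heap copy of x; the caller frees it. */
static double *sorted_copy(const double *x, size_t n) {
    double *s = malloc(n * sizeof *s);
    assert(s != NULL);
    memcpy(s, x, n * sizeof *s);
    qsort(s, n, sizeof *s, cmp_double);
    return s;
}

double mean(const double *x, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

double median(const double *x, size_t n) {
    assert(n > 0);
    double *s = sorted_copy(x, n);
    double m = (n % 2) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    free(s);
    return m;
}

/* Stores the most frequent value in *mode_out and returns how often it
   occurs. A return value of 1 means every value occurs once (no mode). */
size_t mode(const double *x, size_t n, double *mode_out) {
    assert(n > 0 && mode_out != NULL);
    double *s = sorted_copy(x, n);
    size_t best = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        run = (i > 0 && s[i] == s[i - 1]) ? run + 1 : 1;
        if (run > best) {
            best = run;
            *mode_out = s[i];
        }
    }
    free(s);
    return best;
}
```

For the made-up data set 10, 20, 15, 20, 17, 16, 20, the median is 17 and the mode is 20, while the mean (about 16.9) is pulled down by the value 10.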

Performance Metrics: Arithmetic mean (average)
• May be misleading if the data are skewed or scattered
$\text{Arithmetic mean} = \frac{1}{n} \sum_{i=1}^{n} x_i$
