
Performance Measurement



Presentation Transcript


  1. Performance Measurement • Assignment? • Timing
     #include <sys/time.h>
     double When() {
         struct timeval tp;
         gettimeofday(&tp, NULL);
         return (double)tp.tv_sec + (double)tp.tv_usec * 1e-6;
     }

  2. A Quantitative Basis for Design • Parallel programming is an optimization problem. • Must take into account several factors: • execution time • scalability • efficiency

  3. A Quantitative Basis for Design • Parallel programming is an optimization problem. • Must take into account several factors: • Also must take into account the costs: • memory requirements • implementation costs • maintenance costs etc.

  4. A Quantitative Basis for Design • Parallel programming is an optimization problem. • Must take into account several factors: • Also must take into account the costs: • Mathematical performance models are used to assess these costs and predict performance.

  5. Defining Performance • How do you define parallel performance? • What do you define it in terms of? • Consider • Distributed databases • Image processing pipeline • Nuclear weapons testbed

  6. Amdahl's Law • Every algorithm has a sequential component. • Sequential component limits speedup • Sequential component = s • Maximum speedup = 1/s

  7. Amdahl's Law • [Plot: speedup as a function of the sequential fraction s]

  8. What's wrong? • Works fine for a given algorithm. • But what if we change the algorithm? • We may change algorithms to increase parallelism and thus eventually increase performance. • May introduce inefficiency

  9. Metrics for Performance • Efficiency • Speedup • Scalability • Others …………..

  10. Efficiency • The fraction of time a processor spends doing useful work • E = T1 / (p · Tp) • What about when p·Tp < T1? • Does cache make a processor work at 110%?

  11. Speedup • What is speed? • S = SpeedP / Speed1 • What algorithm for Speed1? • What is the work performed? How much work?

  12. Two kinds of Speedup • Relative • Uses parallel algorithm on 1 processor • Most common • Absolute • Uses best known serial algorithm • Eliminates overheads in calculation.

  13. Speedup • Algorithm A • Serial execution time is 10 sec. • Parallel execution time is 2 sec. • Algorithm B • Serial execution time is 2 sec. • Parallel execution time is 1 sec. • What if I told you A = B?

  14. Logic The art of thinking and reasoning in strict accordance with the limitations and incapacities of the human misunderstanding. The basis of logic is the syllogism, consisting of a major and minor premise and a conclusion.

  15. Example • Major Premise: Sixty men can do a piece of work sixty times as quickly as one man. • Minor Premise: One man can dig a post-hole in sixty seconds. • Conclusion: Sixty men can dig a post-hole in one second.

  16. Performance Analysis Statements • There is always a trade-off between time and solution quality. • We should compare the quality of the answer for a given execution time. • For any performance reporting, find and clearly state the quality measure.

  17. Speedup • Conventional speedup is defined as the reduction in execution time. • Consider running a problem on a slow parallel computer and on a faster one. • Same serial component • Speedup will be lower on the faster computer.

  18. Speedup and Amdahl's Law • Conventional speedup penalizes faster absolute speed. • Assumption that task size is constant as the computing power increases results in an exaggeration of task overhead. • Scaling the problem size reduces these distortion effects.

  19. Solution • Gustafson introduces scaled speedup. • Scale the problem size as you increase the number of processors. • Calculated in two ways • Experimentally • Analytical models

  20. Traditional Speedup • Speedup = T1(N) / TP(N) • T1 is the time taken on a single processor • TP is the time taken on P processors

  21. Scaled Speedup • Speedup = T1(PN) / TP(PN) • T1 is the time taken on a single processor • TP is the time taken on P processors

  22. Scaled Speedup vs Traditional

  23. Traditional Speedup • [Plot: measured vs. ideal speedup against number of processors]

  24. Scaled Speedup • [Plot: ideal speedup against number of processors for small, medium, and large problems]

  25. Performance Measurement • There is not a perfect way to measure and report performance. • Wall clock time seems to be the best. • But how much work do you do? • Best Bet: • Develop a model that fits experimental results.

  26. A Parallel Programming Model • Goal: Define an equation that predicts execution time as a function of • Problem size • Number of processors • Number of tasks • Etc. • T = f(N, P, ...)

  27. A Parallel Programming Model • Execution time can be broken up into • Computing • Communicating • Idling • T = (1/P) * sum over i = 0 .. P-1 of (T_comp^i + T_comm^i + T_idle^i)

  28. Computation Time • Normally depends on problem size • Also depends on machine characteristics • Processor speed • Memory system • Etc. • Often, experimentally obtained

  29. Communication Time • The amount of time spent sending & receiving messages • Most often is calculated as • Cost of sending a single message * #messages • Single message cost • T = startuptime + time_to_send_one_word * #words

  30. Idle Time • Difficult to determine • This is often the time waiting for a message to be sent to you. • Can be avoided by overlapping communication and computation.

  31. Finite Difference Example • Finite difference code • 512 x 512 x 5 elements (n x n x z) • Nine-point stencil • Row-wise decomposition • Each processor gets (n/p) x n x z elements • 16 IBM RS6000 workstations • Connected via Ethernet

  32. Finite Difference Model • Execution Time (per iteration) • ExTime = (Tcomp + Tcomm)/P • Communication Time (per iteration) • Tcomm = 2 (lat + 2*n*z*bw) • Computation Time • Estimate using some sample code

  33. Estimated Performance

  34. Finite Difference Example

  35. What was wrong? • Ethernet • Shared bus • Change the computation of Tcomm • Reduce the effective bandwidth • Scale the message volume by the number of processors sending concurrently. • Tcomm = 2 (lat + 2*n*z*bw * P/2)

  36. Finite Difference Example

  37. Using analytical models • Examine the control flow of the algorithm • Find a general algebraic form for the complexity (execution time). • Fit the curve with experimental data. • If the fit is poor, find the missing terms and repeat. • Calculate the scaled speedup using formula.

  38. Example • Serial time = 2 + 12N seconds • Parallel time = 4 + 12N/P + 5P seconds • Let N/P = 128 • Scaled speedup for 4 processors: • SP = (2 + 12(4·128)) / (4 + 12(4·128)/4 + 5(4)) = 6146 / 1560 ≈ 3.94

  39. Performance Evaluation • Identify the data • Design the experiments to obtain the data • Report data

  40. Performance Evaluation • Identify the data • Execution time • Be sure to examine a range of data points • Design the experiments to obtain the data • Report data

  41. Performance Evaluation • Identify the data • Design the experiments to obtain the data • Make sure the experiment measures what you intend to measure. • Remember: Execution time is max time taken. • Repeat your experiments many times • Validate data by designing a model • Report data

  42. Performance Evaluation • Identify the data • Design the experiments to obtain the data • Report data • Report all information that affects execution • Results should be separate from Conclusions • Present the data in an easily understandable format.
