1 / 46

September 19, 2007 Karem Sakallah ksakalla@qatar.cmu qatar.cmu/~msakr/15447-f07/

CS-447– Computer Architecture M,W 10-11:20am Lecture 7 Performance (Cont’d). September 19, 2007 Karem Sakallah ksakalla@qatar.cmu.edu www.qatar.cmu.edu/~msakr/15447-f07/. Today. Lecture & Discussion Next Lecture: Review. Done by now. Read the chapters & slides.

vanhoose
Download Presentation

September 19, 2007 Karem Sakallah ksakalla@qatar.cmu qatar.cmu/~msakr/15447-f07/

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS-447– Computer Architecture M,W 10-11:20amLecture 7Performance (Cont’d) September 19, 2007 Karem Sakallah ksakalla@qatar.cmu.edu www.qatar.cmu.edu/~msakr/15447-f07/

  2. Today • Lecture & Discussion • Next Lecture: Review Done by now • Read the chapters & slides. • Practice the performance examples in the Patterson book.

  3. Assessing & Understanding Performance This chapter discusses how to measure, report, and summarize performance of a computer.

  4. Motivation It is often helpful to have some yardstick by which to compare systems • During development to evaluate different algorithms or optimizations • During purchasing to compare between product offerings • …

  5. Performance • Measure, Report, and Summarize • Make intelligent choices • See through the marketing hype • Key to understanding underlying organizational motivation

  6. Performance Why is some hardware better than others for different programs?What factors of system performance are hardware related?(e.g., Do we need a new machine, or a new operating system?)How does the machine's instruction set affect performance?

  7. Which of these airplanes has the best performance? Airplane Passengers Range (mi) Speed (mph) Boeing 737-100 101 630 598 Boeing 747 470 4150 610 BAC/Sud Concorde 132 4000 1350 Douglas DC-8-50 146 8720 544 • How much faster is the Concorde compared to the 747? • How much bigger is the 747 than the Douglas DC-8?

  8. Computer Performance • Response Time (latency) — How long does it take for my job to run? — How long does it take to execute a job? — How long must I wait for the database query? • Throughput — How many jobs can the machine run at once? — What is the average execution rate? — How much work is getting done?

  9. Execution Time • Elapsed Time • counts everything (disk and memory accesses, I/O , etc.) • a useful number, but often not good for comparison purposes

  10. Execution Time • CPU time • doesn't count I/O or time spent running other programs • can be broken up into system time, and user time • Our focus: user CPU time • time spent executing the lines of code that are "in" our program

  11. Definition of Performance • For some program running on machine X, PerformanceX = 1 / Execution timeX "X is n times faster than Y" PerformanceX / PerformanceY = n

  12. Definition of Performance Problem: • machine A runs a program in 20 seconds • machine B runs the same program in 25 seconds

  13. Comparing and Summarizing Performance How to compare the performance? Total Execution Time : A Consistent Summary Measure

  14. time Clock Cycles • Instead of reporting execution time in seconds, we often use cycles • Clock “ticks” indicate when to start activities (one abstraction):

  15. Clock cycles • cycle time = time between ticks = seconds per cycle • clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)A 4 Ghz clock has a 250ps cycle time

  16. CPU execution time for a program = (CPU clock cycles for a program) x (clock cycle time) Seconds Cycles Seconds = ´ Program Program Cycle cycles cycle = / Program sec onds = cycle / sec onds clock rate CPU Execution Time

  17. How to Improve Performance So, to improve performance (everything else being equal) you can either increase or decrease?________ the # of required cycles for a program, or________ the clock cycle time or, said another way, ________ the clock rate.

  18. How to Improve Performance So, to improve performance (everything else being equal) you can either increase or decrease?_decrease_ the # of required cycles for a program, or_decrease_ the clock cycle time or, said another way, _increase_ the clock rate.

  19. 1st instruction 2nd instruction 3rd instruction ... 4th 5th 6th time How many cycles are required for a program? Could assume that # of cycles equals # of instruction This assumption is incorrect, different instructions take different amounts of time on different machines.

  20. Different numbers of cycles for different instructions • Multiplication takes more time than addition • Floating point operations take longer than integer ones • Accessing memory takes more time than accessing registers • Important point: changing the cycle time often changes the number of cycles required for various instructions time

  21. Now that we understand cycles

  22. CPI CPU clock cycles = Instructions for a program x Average clock cycles per Instruction (CPI) CPU time = Instruction count x CPI x clock cycle time

  23. Performance • Performance is determined by execution time • Do any of the other variables equal performance? • # of cycles to execute program? • # of instructions in program? • # of cycles per second? • average # of cycles per instruction? • average # of instructions per second? • Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.

  24. CPU Clock Cycles CPIi : the average number of cycles per instructions for that instruction class Ci : the count of the number of instructions of class i executed. n : the number of instruction classes.

  25. Example • Instruction Classes: • Add • Multiply • Average Clock Cycles per Instruction: • Add 1cc • Mul 3cc • Program A executed: • 10 Add instructions • 5 Multiply instructions

  26. Quiz An application using a desktop client and a remote server is limited by network performance. What happens to response time and throughput when: • An extra network channel is added • Networking software is upgraded to reduce communications delay • More memory is added to the desktop computer

  27. Formula Summary • T: Execution Time (seconds) • C: Total Number of Cycles • f: Clock Frequency (cycles/second) • I: (Dynamic) Instruction Count • Ij: Count for Instructions of type j • Cj: Cycles per Instruction of type j T = C / f C = I1 x C1 + … + Ik x Ck I = I1 + I2 + … + Ik CPI = C / I T = (I x CPI) / f

  28. Performance Calculation Example: fact(4) • fact: • pushl %ebp # Setup • movl %esp,%ebp # Setup • movl $1,%eax # eax = 1 • movl 8(%ebp),%edx # edx = x • L11: • imull %edx,%eax # result *= x • decl %edx # x— • cmpl $1,%edx # Compare x:1 • jg L11 # if > repeat • movl %ebp,%esp # Finish • popl %ebp # Finish • ret # Finish f = 1GHz Calculate: T, C, I, & CPI when fact is executed with input x = 4

  29. Performance Calculation Example: fact(4) • fact: • pushl %ebp • movl %esp,%ebp • movl $1,%eax • movl 8(%ebp),%edx • L11: • imull %edx,%eax • decl %edx • cmpl $1,%ed • jg L11 • movl %ebp,%esp • popl %ebp • ret

  30. Benchmarks • Performance best determined by running a real application • Use programs typical of expected workload • Or, typical of expected class of applicationsex: compilers/editors, scientific applications, graphics • Small benchmarks • nice for architects and designers • easy to standardize • can be abused

  31. Benchmarks (2) • SPEC (Standard Performance Evaluation Corporation) • companies have agreed on a set of real programs and inputs • valuable indicator of performance (and compiler technology) • can still be abused

  32. Standard Performance Evaluation Corporation • SPEC is supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems. • The SPEC benchmark sets include CPU performance, graphics, High-performance computing, Object-oriented computing, Java applications, Client-server models, Mail systems, File systems, and Web servers.

  33. SPEC ‘89 • Compiler “enhancements” and performance

  34. SPEC CPU Benchmarks CINT2000 : the SPEC ratio for the integer benchmark sets CFP2000 : the SPEC ratio for the floating-point benchmark sets.

  35. SPEC 2000 Does doubling the clock rate double the performance? Can a machine with a slower clock rate have better performance?

  36. SPEC 2000 Does doubling the clock rate double the performance? Can a machine with a slower clock rate have better performance?

  37. Amdahl's Law Execution Time After Improvement = Execution Time Unaffected +( Execution Time Affected / Amount of Improvement )

  38. Example • Application execution time = 20sec • 12 seconds are spent performing add operations • If we improve the add operation to run twice as fast, how much faster will the application run?

  39. Amdahl’s Law • Example:"Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"

  40. Amdahl's Law Execution time after improvement

  41. MIPS (million instructions per second) Example • Which code sequence will execute faster according to MIPS? • According to execution time?

  42. 10 10 9 = = Execution time1 2 . 5 seconds ´ 4 10 9 ´ 15 10 9 = = Execution time2 3 . 75 seconds ´ 4 10 9 Execution time & MIPS CPU clock cycles1 = (5 x 1+1 x 2+1 x 3) x 109 = 10 x 109 CPU clock cycles2 = (10 x 1+1 x 2+1 x 3) x 109 = 15 x 109 ´

  43. Execution time & MIPS (2)

  44. Performance Evaluation • Performance depends on • Hardware architecture • Software environment • Meaning of performance depends on viewpoint • User: time • System Manager: throughput

  45. Performance Evaluation • Kinds of Performance • Graphics • Network • Transactional • Multi-user system • I/O • Scientific/Engineering codes

  46. Example on the MIPS R10K Prof run at: Tue Apr 28 15:50:26 1998 Command line: prof suboptim.ideal.m28293 109148754: Total number of cycles 0.55974s: Total execution time 77660914: Total number of instructions executed 1.405: Ratio of cycles / instruction 195: Clock rate in MHz R10000: Target processor modelled cycles(%) cum % secs instrns calls procedure 61901843(56.71) 56.71 0.32 45113360 1 pdot 47212563(43.26) 99.97 0.24 32523280 1 init 31767( 0.03) 100.00 0.00 21523 1 vsum 1069( 0.00) 100.00 0.00 887 3 fflush : : : : : :

More Related