1 / 41

CS1104: Computer Organisation comp.nus.sg/~cs1104

CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104. School of Computing National University of Singapore. PII Lecture 2: Performance. Performance Definition Factors Affecting Performance Measurement Parameters for Performance Co-relation Among Performance Parameters

daxia
Download Presentation

CS1104: Computer Organisation comp.nus.sg/~cs1104

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104 School of Computing National University of Singapore

  2. PII Lecture 2: Performance • Performance Definition • Factors Affecting Performance • Measurement Parameters for Performance • Co-relation Among Performance Parameters • Chapter 2 of Patterson’s book: 2.1, 2.2, 2.3 first part of 2.7 and 2.9. Performance

  3. Performance Definition • Purchasing perspective: • Given a collection of machines, which has the • best performance? • least cost? • best performance/cost? • Design perspective: • Faced with design options, which has the • best performance improvement? • least cost? • best performance/cost? Performance

  4. Performance Definition (2) • Both require • basis for comparison • metric for evaluation • Our goal is to understand performance of machine’s architectural design. Performance

  5. Two Notions of Performance • Which has higher performance? • Time to do ONE task (execution time) • Execution time, response time, latency • Tasks per day, hour, week, … (performance) • Throughput, bandwidth. • Response time and throughput might be in opposition. Performance

  6. Two Notions of Performance (2) • Time of AirBus vs. Boeing 747? • AirBus is 1350 mph / 610 mph = 2.2 times faster = 6.5 hours / 3 hours • Throughput of AirBus vs. Boeing 747? • AirBus is 178,200 pmph / 286,700 pmph = 0.62 “times faster” • Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster” • Boeing is 1.6 times (“60%”) faster in terms of throughput. • AirBus is 2.2 times (“120%”) faster in terms of flying time. Performance

  7. Definitions • Performance is in units of things-per-second • Bigger is better • If we are primarily concerned with response time • “X is n times faster than Y” means the speedup n is: Performance

  8. Computer Performance • Response Time (latency) • Time between start and end of an event (execution time). • How long does it take for my job to run? • How long does it take to execute a job? • How long must I wait for the database query? • Throughput • Total amount of work (or number of jobs) done in a given time. • How many jobs can the machine run at once? • What is the average execution rate? • How much work is getting done? Performance

  9. Computer Performance (2) • If we upgrade a machine with a new processor, what do we increase? Answer: Response Time. • If we add a new machine to the lab what do we increase? Answer: Throughput. Performance

  10. Execution Time • Elapsed Time • counts everything (disk and memory accesses, I/O, etc.) • a useful number, but often not good for comparison purposes. • CPU time • doesn't count I/O or time spent running other programs. • can be broken up into system time, and user time. Performance

  11. Execution Time (2) • Our focus: user CPU time • time spent executing the lines of code that are "in" our program. Performance

  12. Clock Cycle time Execution Time (3) • Instead of reporting execution time in seconds, we often use clock cycles (basic time unit in machine). • Cycle time (or cycle period) = time between two consecutive rising edges, measured in seconds. • Clock rate (or clock frequency) = cycles per second (1 Hz = 1 cycle/second). • Example: A 200 MHz clock has cycle time of 1/(200x106) = 5 x 10-9 seconds = 5 nanoseconds. ... Performance

  13. Execution Time (4) • Therefore, to improve performance (everything else being equal), you can do the following: Reduce the number of cycles for a program, or  Reduce the clock cycle time, or said in another way,  Increase the clock rate. Performance

  14. 1st instruction 2nd instruction 3rd instruction ... 4th 5th 6th Clock Cycles in a Program Could we assume that number of cycles = number of instructions? Or number of cycles is proportional to number of instructions? Performance

  15. Clock Cycles in a Program (2) • No, the assumption is incorrect. • The tasks of different instructions take different amount of time to finish. • For example, a Multiple instruction may take more cycles than an Add instruction, floating-point operations take longer than integer operations. • Accessing memory takes more time than accessing registers. Performance

  16. Example 1: Clock Rate • Our favorite program runs in 10 seconds on computer A, which has a 400 MHz clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target at? • ANSWER: For A: Time = 10 sec. = #cycles * 1/400MHz For B: Time = 6 sec. = 1.2 * #cycles * 1/clock_rate => clock_rate = 10 * 400 * 1.2 / 6 MHz = 800MHz Performance

  17. Cycles per Instruction (CPI) • A given program will require • some number of instructions (machine instructions) • some number of cycles • some number of seconds X CPI X cycle time Performance

  18. CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle Cycles per Instruction (CPI) (2) • Average cycles per instruction • CPI = (CPU Time * Clock Rate) / Instruction Count • = Clock Cycles / Instruction Count Invest resources where time is spent! where Ik = instruction frequency Performance

  19. Aspects of CPU Performance • Performance is determined by execution time. • Does any of these variables equal performance? • number of cycles to execute program? • number of instructions in program? • number of cycles per second? • average number of cycles per instruction? • average number of instructions per second? • Answer: No to all. • Common pitfall: thinking that one of the variables is indicative of performance when it really isn’t. Performance

  20. CPU time = Seconds = Instructions x Cycles x Seconds Program Program Instruction Cycle Aspects of CPU Performance (2) • CPU performance depends on: • Clock cycle time  Hardware technology and organization • CPI  Organization and ISA • Instruction Count  ISA and compiler • Be careful of the following concepts: • Machine  ISA and hardware organization • Machine  cycle time • ISA + hardware organization  number of cycles for any instruction (this is not average CPI) • ISA + compiler + program  number of instructions executed • Therefore, ISA + Compiler + Program + Hardware organization + Cycle time  Total CPU time. Performance

  21. Example 2: CPI • Suppose we have two implementations of the same instruction set architecture (ISA). For some program,Machine A has a clock cycle time of 10 ns and a CPI of 2.0.Machine B has a clock cycle time of 20 ns and a CPI of 1.2.What machine is faster for this program, and by how much? • ANSWER:Machine A: time = #_Inst * 2.0 (CPI) * 10 ns (cycle time) Machine B: time = #_Inst * 1.2 (CPI) * 20 ns (cycle time) Performance(A) / Performance(B) = 1.2 * 20 / (2.0 * 10) = 1.2 Performance

  22. Example 3: Number of Instructions • A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively). First code seq. has 5 instructions: 2 of A, 1 of B, and 2 of CSecond seq. has 6 instructions: 4 of A, 1 of B, and 1 of C.Which sequence will be faster? How much?What is the CPI for each sequence? • ANSWER:CPI(S1) = (2 * 1 + 1 * 2 + 2 * 3) / (2 + 1 + 2) = 2CPI(S2) = (4 * 1 + 1 * 2 + 1 * 3) / (4 + 1 + 1) = 1.5Time(S1) = (2 * 1 + 1 * 2 + 2 * 3) * cycle_time = 10 c_tTime(S2) = (4 * 1 + 1 * 2 + 1 * 3) * cycle_time = 9 c_tTime(S1) / Time(S2) = 10/9 = 1.1 and S2 is faster. Performance

  23. Sample Question You are given two machine designs IA1 and IA2 for performance benchmarking. Both IA1 and IA2 have the same ISA, but different hardware implementations and compilers. Assuming that the clock cycle times for IA1 & IA2 are the same, performance study gives the following measurements for the two designs: For design IA1: Instruction Class CPI No. of Instructions Executed A 1 3,000,000,000,000,000,000 B 2 2,000,000,000,000,000,000 C 3 2,000,000,000,000,000,000 D 4 1,000,000,000,000,000,000 For design IA2: A 2 2,700,000,000,000,000,000 B 3 1,800,000,000,000,000,000 C 3 1,800,000,000,000,000,000 D 2 900,000,000,000,000,000 Performance

  24. Sample Question (2) • What is the CPI for each machine (IA1 and IA2)? Let Y = 1,000,000,000,000,000,000. CPI(IA1) = (1*3Y + 2*2Y + 3*2Y + 4*1Y) / (3Y + 2Y + 2Y + 1Y) = 17Y / 8Y = 2.125 CPI(IA2) = [(2*3Y+3*2Y+3*2Y+2*1Y)*0.9] / [(3Y+2Y+2Y+1Y)*0.9] = 20Y / 8Y = 2.5 • Which machine is faster? By how much? Let C = clock cycle. Time(IA1) = 2.125 * (8Y * C) Time(IA2) = 2.5 * 0.9 * (8Y * C) = 2.25 * (8Y * C) IA1 is faster than IA2 by 2.25 / 2.125 = 1.0588 Performance

  25. Sample Question (3) • To further improve the performance of the machines, a new compiler technique is introduced. The compiler can simply eliminate all class D instructions from the benchmark program without any side effects. That is, there is no change to the number of class A, B, and C instructions executed in the two machines. With this new technique, which machine is faster? By how much? Let Y = 1,000,000,000,000,000,000 and C = clock cycle. CPI(IA1) = (1 * 3Y + 2 * 2Y + 3 * 2Y) / (3Y + 2Y + 2Y) = 13Y / 7Y = 1.857 CPI(IA2) = [(2*3Y + 3*2Y + 3*2Y) * 0.9] / [(3Y + 2Y + 2Y) * 0.9] = 18Y / 7Y = 2.571 Time(IA1) = 1.857 * (7Y * C) Time(IA2) = 2.571 * 0.9 * (7Y * C) = 2.314 * (7Y * C) IA1 is faster than IA2 by 2.314 / 1.857 = 1.246 Performance

  26. Sample Question (4) • Alternatively, to further improve the performance of the machines, a new hardware technique is introduced. The hardware can simply execute all class D instructions in zero times without any side effects. (There is still execution for class D instructions). With this new technique, which machine is faster? By how much? Let Y = 1,000,000,000,000,000,000 and C = clock cycle. CPI(IA1) = (1 * 3Y + 2 * 2Y + 3 * 2Y + 0 * Y) / (3Y + 2Y + 2Y + Y) = 13Y / 8Y = 1.625 CPI(IA2) = [(2*3Y+3*2Y+3*2Y +0*Y) * 0.9] / [(3Y+2Y+2Y+Y) * 0.9] = 18Y / 8Y = 2.25 Time(IA1) = 1.625 * (8Y * C) Time(IA2) = 2.25 * 0.9 * (8Y * C) = 2.025 * (8Y * C) IA1 is faster than IA2 by 2.025 / 1.625 = 1.2461 Performance

  27. Key Concepts • Performance is specific to a particular program. • Total execution time is a consistent summary of performance • For a given architecture, performance increase comes from: • increase in clock rate (without adverse CPI affects) • improvement in processor organisation that lowers CPI • compiler enhancement that lowers CPI and/or instruction count • Pitfall: expecting improvement in one aspect of a machine’s performance to affect the total performance. Performance

  28. Key Concepts (2) • Basic formula for execution time. • Performance factors: CPI, total number of instructions executed, cycle time. • Factors affecting their values • Calculation of individual values • Note: Change of one factor might change more than one component (eg: reduction of instruction types executed) • Different performance matrix: what it can tell and what it might mislead people. Performance

  29. Sample Question (1) • Which of the following statements will affect the program execution time? • Compiler technology • Instruction set architecture • Integration circuit chip technology • Application software • (i) and (iii) • (ii) and (iii) • (i), (ii), and (iii) • All of the above • None of the above [Answer] Performance

  30. Sample Question (2) • Execution time can always be improved by: • Combining simple instructions into complex instructions in the given ISA • Improving the clock speed of the processor • Using a new compiler that produces a shorter sequence of program code • (ii) only • (i) and (ii) • (ii) and (iii) • All of the above • None of the above [Answer] Performance

  31. Sample Question (3) • The overall (average) CPI of a given program is: • decreased by decreasing the total number of instructions executed in a program, assuming all other hardware and software parameters remain the same. • the sum of the CPIs of the individual types of instructions executed in the program. • the average of the CPIs of the individual classes of instructions executed in the program. That is, if instruction class A, B, C, D, with corresponding CPI 1, 2, 3, 4, appear in a program, the overall CPI will be (1+2+3+4)/4. • decreased by doubling the clock speed of the processor, assuming all other hardware and software parameters remain the same. • None of the above. [Answer] Performance

  32. Sample Question (4) • Assuming all other hardware and software parameters remain the same, the total number of instructions executed in a program can be changed by: • changing the ISA of the processor. • reducing the number of cycles needed to execute each instruction type in a program. • increasing the number of cycles needed to execute each instruction type in a program. • doubling the individual CPI of each instruction class executed in a program. • None of the above. [Answer] Performance

  33. Sample Question (5) • Execution time can always be improved by: • changing the instruction set architecture of the processor from a complex one to a simple one. • changing the instruction set architecture of the processor from a simple one to a complex one. • changing the compiler so that a lower overall CPI value (and different program code sequence) is obtained. • (i) and (iii) • (i) only • (ii) only • (iii) only • None of the above [Answer] Performance

  34. Sample Question (6) • Which of the following affects the overall CPI of a program on a given ISA? • The CPIs of individual classes of instructions defined in the ISA. • The number of instructions executed in the program. • The clock speed of the processor. • (i) only • (ii) only • (iii) only • (i) and (ii) only • (i), (ii) and (iii) [Answer] Performance

  35. Sample Question (7) • A program takes 5 seconds to execute on machine MM1 and 10 seconds to execute on machine MM2. Which of the following conclusions must be true? • The number of instructions executed on MM1 is definitely fewer than that on MM2. • The overall CPI on MM1 is definitely smaller than that on MM2. • The clock speed of MM1 is definitely higher than that of MM2. • All (a), (b), (c). • None of the above. [Answer] Performance

  36. Sample Question (8) • To improve performance of a program, state what (increase/decrease/do nothing) you would do to the following, assuming that everything else being equal: • The number of required cycles. • The number of instructions executed. • The clock cycle time. • The clock rate. [Decrease] [Decrease] [Decrease] [Increase] Performance

  37. Sample Question (9) • Given a machine at 500 MHz, the measurement of a program execution is as follows: Instr. Class CPI Frequency A 1 10% B 2 20% C 3 30% D 4 40% What happens to (i) overall CPI, (ii) execution time, if: • Instruction distribution is: A: 40%, B: 30%, C:20%, D:10%. • The CPIs of all instruction classes are doubled. Answer increase/decrease/unchanged. a) CPI decreases, execution time decreases. b) CPI increases, execution time increases. Performance

  38. Sample Question (10) • State whether these statements are true or false: • Instruction set architecture is mainly about the digital and circuit design of microprocessor. • My 400MHz PC is always faster than the SoC’s SUN450 system, which runs at 336 MHz. • CPU execution time of a program is defined as the elapsed time between the program submission to the generation of the result. a) False. b) False. c) False. Performance

  39. Sample Question (11) • Given a machine at 500 MHz, the measurement of a program execution is as follows: Instr. Class CPI Frequency A 1 10% B 2 20% C 3 30% D 4 40% What happens to (i) overall CPI, (ii) execution time, if: • Equal instruction distribution among 4 classes (25% each) • Clock rate is increased to 800 MHz. Answer increase/decrease/unchanged. a) CPI decreases, execution time decreases. b) CPI unchanged, execution time decreases. Performance

  40. Sample Question (12) • Given a machine at 500 MHz, the measurement of a program execution is as follows: Instr. Class CPI Frequency A 1 10% B 2 20% C 3 30% D 4 40% What happens to (i) overall CPI, (ii) instruction count, (iii) clock speed, if: • A different compiler is used to produce the executable code of a program. • A new hardware implement of the given ISA is used. Answer changed/unchanged. a) CPI changed, IC changed, CS unchanged. b) CPI changed, IC unchanged, CS changed. Performance

  41. End of file Performance

More Related