
Potential for Parallel Computation, part 2


Presentation Transcript


  1. Potential for Parallel Computation, part 2 Module 3

  2. Behavior of Algorithms • Small Problems • Can calculate actual performance measures • Size (operations); time units • Often can generalize, but need proof • For large problems, characterize the asymptotic behavior • i.e. Big Oh!! • An upper bound

  3. Big Oh Definition Given functions f(n) and g(n): f(n) is of order g(n), that is, f(n) = O(g(n)), iff there exist constants C and N such that for all n > N, |f(n)| < C·|g(n)| • That is, f(n) grows no faster than g(n) • g(n) is an upper bound
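A quick numerical check of the definition (a sketch in Python; the function f(n) = 3n² + 5n and the witness constants C = 4, N = 5 are my own illustrative choices, not from the slides):

```python
# Big-Oh witness check: f(n) = 3n^2 + 5n is O(n^2) with C = 4, N = 5,
# because 5n < n^2 whenever n > 5.
def f(n):
    return 3 * n**2 + 5 * n

def g(n):
    return n**2

C, N = 4, 5
# The definition requires |f(n)| < C * |g(n)| for every n > N.
assert all(abs(f(n)) < C * abs(g(n)) for n in range(N + 1, 10_000))
print("f(n) = O(n^2) holds with C = 4, N = 5 (checked for 5 < n < 10000)")
```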

  4. Omega Ω -- lower bound As in Big Oh, except f(n) = Ω(g(n)) means |f(n)| > C·|g(n)| for n > N • g(n) is a lower bound

  5. Exact Bound If both f(n) = O(g(n)) and f(n) = Ω(g(n)), then f(n) = Θ(g(n)) and we say g(n) is an exact bound for f(n) (aka a tight bound)

  6. Speedup and Efficiency of Algorithms For any given computation (algorithm): Let TP be the time to perform the computation with P processors (arithmetic units, or PEs). We assume that any P independent operations can be done simultaneously. Note: the depth of an algorithm is T∞, the minimum execution time. The speedup with P processors is SP = T1 / TP and the efficiency is EP = SP / P
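A minimal sketch of these two ratios in Python, assuming T1 and TP have already been measured or counted for the same algorithm (the numbers below are illustrative only):

```python
# Speedup and efficiency of an algorithm, given T1 (time with one processor)
# and Tp (time with P processors, any P independent operations done at once).
def speedup(t1: float, tp: float) -> float:
    return t1 / tp

def efficiency(t1: float, tp: float, p: int) -> float:
    return speedup(t1, tp) / p

# Example: T1 = 7 time units, T4 = 3 time units with P = 4 processors.
t1, tp, p = 7.0, 3.0, 4
print(f"S_{p} = {speedup(t1, tp):.2f}, E_{p} = {efficiency(t1, tp, p):.2f}")
# -> S_4 = 2.33, E_4 = 0.58
```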

  7. These numbers, SP and EP, refer to an algorithm and not to a machine. Similar numbers can be defined for specific hardware. The time T1 can be chosen in different ways: To evaluate how good an algorithm is, it should be the time for the “BEST” sequential algorithm.

  8. The Minimum Number of Processors Giving the Maximum Speedup: Let P be the minimum number of processors such that TP = T∞, i.e. P = min { p | Tp = T∞ } Then TP, SP, EP are the best known time, speedup, and efficiency, respectively.
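A small sketch of this definition in Python, assuming we already have a table of times TP indexed by processor count (the table here is made up for illustration):

```python
# P = min { p | T_p = T_inf }: the smallest processor count whose time
# already equals the depth (minimum execution time) of the algorithm.
times = {1: 7, 2: 4, 3: 3, 4: 3, 5: 3}     # illustrative T_p values, in time units
t_inf = min(times.values())                 # the depth T_inf
p_min = min(p for p, t in times.items() if t == t_inf)
print(f"T_inf = {t_inf}, minimum P achieving it = {p_min}")   # -> T_inf = 3, P = 3
```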

  9. E<8> = A + B(CDE + F + G) + H

  10. By also using the distributive law, we can get an even smaller depth, but the number of operations will increase.
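A small sketch of that tradeoff on a made-up expression (not the E<8> example above): rewriting a·(b·c·d + e) as a·b·c·d + a·e lowers the depth from 4 to 3 but raises the operation count from 4 to 5, assuming any independent operations can run in the same time step.

```python
# Depth vs. operation count for a*(b*c*d + e) before and after distribution.
# Each inner list holds operations that can run simultaneously in one time step.
original = [                                  # a*(b*c*d + e): 4 ops, depth 4
    ["t1 = b*c"],
    ["t2 = t1*d"],
    ["t3 = t2 + e"],
    ["result = a*t3"],
]
distributed = [                               # a*b*c*d + a*e: 5 ops, depth 3
    ["t1 = a*b", "t2 = c*d", "t3 = a*e"],     # three independent multiplies
    ["t4 = t1*t2"],
    ["result = t4 + t3"],
]
for name, schedule in [("original", original), ("distributed", distributed)]:
    ops = sum(len(step) for step in schedule)
    print(f"{name}: depth = {len(schedule)}, operations = {ops}")
# original: depth = 4, operations = 4
# distributed: depth = 3, operations = 5
```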

  11. Performance • Consider small problems • e.g. Evaluation of Arithmetic Expression • Not much improvement possible • Even auto-optimization probably takes longer than sequential processing • Gains: Problems that are “compute bound” i.e. processor bound; computation intensive

  12. Factors Influencing Performance

  13. Impacting Performance Hardware (user has no influence) • Digital logic • Clock speed • Circuit interconnection Architecture • Sequential vs. degree of parallelism • ALU, CU, Memory, Cache • Synchronization among processors

  14. Impacting Performance Operating System • Shared resources • Process control, synchronization, data movement • I/O Programming Language • Operations available & the implementation • Compiler / optimizations

  15. Impacting Performance Program • Organization & style; structure • Data structures • Study of compiler design is helpful Algorithm • Often a tradeoff of memory vs. speed • Use “reasonable” algorithms; they often have only minimal impact

  16. Amdahl’s Law Let T(P) be the execution time with hardware parallelism P. Let S be the time to do the sequential part of the work, and let Q be the time to do the parallel part of the work sequentially; i.e., S and Q are the sequential and parallel amounts of work, measured by time on one processor. The total time is T(P) = S + Q/P

  17. Amdahl’s Law Expressing this in terms of the fraction of serial work, f = S / (S + Q), Amdahl’s law states that Speedup SP = T(1) / T(P) = 1 / (f + (1 − f)/P) Efficiency EP = SP / P = 1 / (f·P + (1 − f))
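A minimal sketch of these two formulas (the function names are mine, not from the slides):

```python
# Amdahl's law: speedup and efficiency from the serial fraction f and
# the hardware parallelism P.
def amdahl_speedup(f: float, p: int) -> float:
    return 1.0 / (f + (1.0 - f) / p)

def amdahl_efficiency(f: float, p: int) -> float:
    return amdahl_speedup(f, p) / p

for p in (10, 100, 1000):
    print(f"f = 0.05, P = {p:4d}: speedup = {amdahl_speedup(0.05, p):6.2f}, "
          f"efficiency = {amdahl_efficiency(0.05, p):.3f}")
# f = 0.05, P =   10: speedup =   6.90, efficiency = 0.690
# f = 0.05, P =  100: speedup =  16.81, efficiency = 0.168
# f = 0.05, P = 1000: speedup =  19.63, efficiency = 0.020
```

Note how, with even 5% serial work, the speedup saturates near 1/f = 20 no matter how large P gets.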

  18. There are several consequences of this simple performance model. In order to achieve at least 50% efficiency on a program with hardware parallelism P, f can be no larger than 1/(P − 1). This becomes harder and harder to achieve as P becomes large. Amdahl used this result to argue that sequential processing was best, but it has several useful interpretations in different parallel environments:
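As a worked instance (my numbers, not from the slides): with P = 100 processors, EP ≥ 0.5 requires f ≤ 1/(100 − 1) = 1/99 ≈ 0.0101, i.e. only about 1% of the work may be sequential; plugging f = 1/99 into the sketch above gives f·P + (1 − f) = 100/99 + 98/99 = 2, so EP is exactly 0.5.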

  19. • A very small amount of unparallelized code can have a very large effect on efficiency if the parallelism is large • A fast vector processor must also have a fast scalar processor in order to achieve a sizeable fraction of its peak performance • Effort in parallelizing a small fraction of code that is currently executed sequentially may pay off in large performance gains • Hardware that allows even a small fraction of new things to be done in parallel may be considerably more efficient.

  20. Although Amdahl’s law is a simple performance model, it need not be taken simplistically. The behavior of the sequential fraction f, for example, can be quite important. System sizes, especially the number P of processors, are often increased for the purpose of running larger problems. Increasing the problem size often does not increase the absolute amount of sequential work significantly. In this case, f is a decreasing function of problem size, and if problem size is increased with P, the somewhat pessimistic implications of the equations above look much more favorable; see Problem 2.16 for a specific example. The behavior of performance as both problem and system size increase is called scalability.
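A small sketch of this scaling effect, assuming (my assumption, for illustration) that the sequential time S stays fixed while the parallel work grows linearly with problem size n, so that f = S / (S + Q(n)) shrinks as n grows:

```python
# Scalability sketch: if problem size grows with P while the sequential part
# stays roughly constant, the serial fraction f shrinks and efficiency recovers.
S = 1.0                                  # fixed sequential work, in one-processor time

def parallel_work(n: int) -> float:
    return 1.0 * n                       # parallel work grows linearly with size n

def efficiency(n: int, p: int) -> float:
    f = S / (S + parallel_work(n))       # serial fraction for this problem size
    return 1.0 / (f * p + (1.0 - f))     # Amdahl efficiency

for p in (10, 100, 1000):
    print(f"P = {p:4d}: fixed n = 100 -> E = {efficiency(100, p):.2f}, "
          f"scaled n = 10*P -> E = {efficiency(10 * p, p):.2f}")
# P =   10: fixed n = 100 -> E = 0.92, scaled n = 10*P -> E = 0.92
# P =  100: fixed n = 100 -> E = 0.50, scaled n = 10*P -> E = 0.91
# P = 1000: fixed n = 100 -> E = 0.09, scaled n = 10*P -> E = 0.91
```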
