2. 2 COMP 206: Computer Architecture and Implementation Montek Singh
Thu, Jan 22, 2009
Lecture 3: Quantitative Principles
3. 3 Quantitative Principles of Computer Design This is an introduction to design and analysis:
Take Advantage of Parallelism
Principle of Locality
Focus on the Common Case
Amdahl’s Law
The Processor Performance Equation
4. 4 1) Taking Advantage of Parallelism (examples) Increase throughput of a server computer via multiple processors or multiple disks
Detailed HW design
Carry-lookahead adders use parallelism to speed up computing sums, from linear to logarithmic time in the number of bits per operand
Multiple memory banks searched in parallel in set-associative caches
Pipelining (next slides)
5. 5 Pipelining Overlap instruction execution…
… to reduce the total time to complete an instruction sequence.
Not every instruction depends on its immediate predecessor
⇒ executing instructions completely or partially in parallel is possible
Classic 5-stage pipeline: 1) Instruction Fetch (Ifetch), 2) Register Read (Reg), 3) Execute (ALU), 4) Data Memory Access (Dmem), 5) Register Write (Reg)
6. 6 Pipelined Instruction Execution
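The timing diagram on this slide was an image and did not survive extraction. As a sketch of the idea it illustrates: under the idealized assumption of one cycle per stage and no stalls, a k-stage pipeline completes n instructions in k + (n - 1) cycles instead of the n·k an unpipelined machine needs:

```latex
% Idealized pipeline timing (assumes one cycle per stage, no hazards)
T_{\text{unpipelined}} = n k
\qquad
T_{\text{pipelined}} = k + (n - 1)
\qquad
\text{Speedup} = \frac{n k}{k + n - 1} \to k \text{ as } n \to \infty
```

For the classic 5-stage pipeline and n = 1000 instructions, that is 5000/1004 ≈ 4.98, close to the ideal 5X.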
7. 7 Limits to pipelining Hazards prevent the next instruction from executing during its designated clock cycle
Structural hazards: attempt to use the same hardware to do two different things at once
Data hazards: instruction depends on the result of a prior instruction still in the pipeline (e.g., an add that writes r1 followed immediately by a sub that reads r1)
Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
8. 8 Increasing Clock Rate Pipelining is also used to increase the clock rate
Clock rate is determined by gate delays
9. 9 2) The Principle of Locality
Programs access a relatively small portion of the address space, and tend to reuse recently accessed data.
Two Different Types of Locality:
Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
For the last 30 years, hardware has relied on locality for memory performance.
The principle of locality states that programs access a relatively small portion of the address space at any instant of time.
This is kind of like in real life, we all have a lot of friends. But at any given time most of us can only keep in touch with a small group of them.
There are two different types of locality: Temporal and Spatial. Temporal locality is the locality in time which says if an item is referenced, it will tend to be referenced again soon.
This is like saying if you just talk to one of your friends, it is likely that you will talk to him or her again soon.
This makes sense. For example, if you just have lunch with a friend, you may say, let’s go to the ball game this Sunday. So you will talk to him again soon.
Spatial locality is the locality in space. It says if an item is referenced, items whose addresses are close by tend to be referenced soon.
Once again, using our analogy. We can usually divide our friends into groups. Like friends from high school, friends from work, friends from home.
Let’s say you just talk to one of your friends from high school and she may say something like: “So did you hear so and so just won the lottery.”
You probably will say NO, I better give him a call and find out more.
So this is an example of spatial locality. You just talked to a friend from your high school days. As a result, you end up talking to another high school friend. Or at least in this case, you hope he still remembers you as his friend.
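To make the two kinds of locality concrete, here is a minimal Python sketch (a hypothetical illustration, not from the slides): summing a matrix row by row touches consecutive elements (spatial locality) while reusing one accumulator on every iteration (temporal locality); the column-by-column version strides a whole row between accesses and typically fares worse on a real memory hierarchy.

```python
# Hypothetical illustration of locality (not from the slides).
# CPython lists hold pointers, so the effect is weaker than in C,
# but the access patterns are the point.

N = 1024
a = [[1.0] * N for _ in range(N)]  # N x N matrix; each row is contiguous

def sum_row_major(m):
    total = 0.0                    # one accumulator reused every iteration: temporal locality
    for row in m:
        for x in row:              # consecutive elements of a row: spatial locality
            total += x
    return total

def sum_col_major(m):
    total = 0.0
    for j in range(N):
        for i in range(N):         # stride of one whole row per access: poor spatial locality
            total += m[i][j]
    return total

print(sum_row_major(a), sum_col_major(a))  # same sum, very different access patterns
```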
10. 10 Levels of the Memory Hierarchy
11. 11 3) Focus on the Common Case In making a design trade-off, favor the frequent case over the infrequent case
e.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it first
e.g., if a database server has 50 disks per processor, storage dependability dominates system dependability, so optimize it first
Frequent case is often simpler and can be done faster than the infrequent case
e.g., overflow is rare when adding two numbers, so improve performance by optimizing the more common case of no overflow
May slow down overflow, but overall performance improved by optimizing for the normal case
What is the frequent case, and how much is performance improved by making that case faster? ⇒ Amdahl’s Law
12. 12 4) Amdahl’s Law (History, 1967) Historical context
Amdahl was demonstrating “the continued validity of the single processor approach and of the weaknesses of the multiple processor approach”
The paper contains no mathematical formulation, just arguments and simulation
“The nature of this overhead appears to be sequential so that it is unlikely to be amenable to parallel processing techniques.”
“A fairly obvious conclusion which can be drawn at this point is that the effort expended on achieving high parallel performance rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude.”
Nevertheless, the law is widely applicable in all kinds of situations
13. 13 Speedup The book shows two forms of the speedup equation
We will use the second, because it yields “speedup” factors like 2X
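The equations on this slide were images. Reconstructed from the standard definitions, the two forms are the performance ratio and the execution-time ratio:

```latex
\text{Speedup} = \frac{\text{Performance}_{\text{new}}}{\text{Performance}_{\text{old}}}
= \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}
```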
14. 14 4) Amdahl’s Law
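The formula on this slide was also an image. The standard statement: if an enhancement applies to a fraction f of the original execution time and speeds that fraction up by a factor s, then

```latex
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - f) + \dfrac{f}{s}}
```

Even as s grows without bound, the overall speedup is capped at 1/(1 - f): the fraction you cannot enhance limits the whole.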
15. 15 Amdahl’s Law example New CPU is 10X faster
The server is I/O bound, so 60% of its time is spent waiting for I/O
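Working this through with the formula above: the faster CPU helps only during the 40% of time not spent waiting on I/O, so f = 0.4 and s = 10:

```latex
\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56
```

A 10X faster CPU buys only about 1.56X overall.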
16. 16 Amdahl’s Law for Multiple Tasks
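The equation on this slide was an image; the usual generalization reads: if the original execution time splits into fractions f_1, ..., f_n that sum to 1, and fraction f_i runs with speedup s_i (s_i = 1 for unimproved parts), then

```latex
\text{Speedup}_{\text{overall}} = \frac{1}{\displaystyle\sum_{i=1}^{n} \frac{f_i}{s_i}},
\qquad \sum_{i=1}^{n} f_i = 1
```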
17. 17 Example
18. 18 Another Example Note that there are three categories of operations here: floating-point square root operations, other floating-point operations, and non-FP operations. When we change one of these categories, we group the other two categories as the “other case”.
19. 19 Solution using Amdahl’s Law
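The numbers on slides 18 and 19 were images. Assuming the classic textbook version of this example (FP square root accounts for 20% of execution time and all FP operations for 50%; the choice is to speed up FPSQR by 10X or all FP operations by 1.6X), Amdahl’s Law gives:

```latex
\text{Speedup}_{\text{FPSQR}} = \frac{1}{0.8 + \dfrac{0.2}{10}} = \frac{1}{0.82} \approx 1.22
\qquad
\text{Speedup}_{\text{FP}} = \frac{1}{0.5 + \dfrac{0.5}{1.6}} = \frac{1}{0.8125} \approx 1.23
```

The smaller per-operation improvement wins because it applies to the more common case.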
20. 20 Implications of Amdahl’s Law The improvement provided by a feature is limited by how often the feature is used
As stated, Amdahl’s Law is valid only if the system always works at exactly one of the rates
If CPU and I/O operations overlap, Amdahl’s Law as given here is not applicable
Bottleneck is the most promising target for improvements
“Make the common case fast”
Infrequent events, even if each occurrence takes a long time, account for little of the total time and so make little difference to performance
Typical use: change only one parameter of the system, and compute the effect of this change
The same program, with the same input data, should run on the machine in both the before and after cases
21. 21 5) Processor Performance
22. 22 CPI – Clocks per Instruction
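The definition on this slide was an image; the standard one is:

```latex
\text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}}
```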
23. 23 Details of CPI
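Likewise, the per-class decomposition (IC_i is the count of instructions of class i, CPI_i the cycles per instruction for that class):

```latex
\text{CPI} = \sum_{i=1}^{n} \left( \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i \right)
```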
24. 24 Processor Performance Eqn
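The equation itself, reconstructed from its standard form:

```latex
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
= \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}
```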
25. 25 Processor Performance Eqn How can we improve performance?
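The three levers are instruction count (ISA and compiler), CPI (organization and ISA), and clock cycle time (technology and organization). A minimal Python sketch of the equation (hypothetical helper and numbers, not from the slides):

```python
# Minimal sketch of the processor performance equation
# (hypothetical helper and numbers, not from the slides):
# CPU time = instruction count x CPI / clock rate.

def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds."""
    return instruction_count * cpi / clock_rate_hz

base = cpu_time(1e9, 2.0, 1e9)          # 1B instructions, CPI 2, 1 GHz -> 2.0 s
faster_clock = cpu_time(1e9, 2.0, 2e9)  # doubling the clock rate halves time -> 1.0 s
better_cpi = cpu_time(1e9, 1.0, 1e9)    # halving CPI does the same -> 1.0 s
print(base, faster_clock, better_cpi)
```

In practice the three factors are interdependent: a change that lowers CPI may lengthen the clock cycle, so improvements must be evaluated through the whole equation.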
26. 26 Example 1
27. 27 Example 1 (Solution)
28. 28 Example 2
29. 29 Example 2 (Solution)
30. 30 Performance of (Blocking) Caches
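The formula on this slide was an image; for blocking caches the standard form folds memory stall cycles into the CPI term:

```latex
\text{CPU time} = \text{IC} \times \left( \text{CPI}_{\text{execution}}
+ \frac{\text{Memory accesses}}{\text{Instruction}} \times \text{Miss rate} \times \text{Miss penalty} \right)
\times \text{Clock cycle time}
```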
31. 31 Example
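The example’s numbers were an image. With purely hypothetical values for illustration (CPI_execution = 1.0, 1.5 memory accesses per instruction, 2% miss rate, 200-cycle miss penalty):

```latex
\text{CPI} = 1.0 + 1.5 \times 0.02 \times 200 = 7.0
```

Memory stalls would dominate: six of every seven cycles.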
32. 32 Fallacies and Pitfalls Fallacies - commonly held misconceptions
When discussing a fallacy, we try to give a counterexample.
Pitfalls - easily made mistakes
Often generalizations of principles that are true in a limited context
We show Fallacies and Pitfalls to help you avoid these errors
33. 33 Fallacies and Pitfalls (1/3) Fallacy: Benchmarks remain valid indefinitely
Once a benchmark becomes popular, tremendous pressure to improve performance by targeted optimizations or by aggressive interpretation of the rules for running the benchmark: “benchmarksmanship.”
Of the 70 benchmarks in the 5 SPEC releases, 70% were dropped from the next release because they were no longer useful
Pitfall: A single point of failure
Rule of thumb for fault-tolerant systems: make sure that every component is redundant, so that no single component failure can bring down the whole system (e.g., power supply)
34. 34 Fallacies and Pitfalls (2/3) Fallacy: Rated MTTF of disks is 1,200,000 hours, or ≈140 years, so disks practically never fail
Disk lifetime is ~5 years ⇒ replace a disk every 5 years; on average, a disk would go through 28 replacement cycles before failing (140 years is a long time!)
Is that meaningful?
Better unit: % that fail in 5 years
Next slide
35. 35 Fallacies and Pitfalls (3/3) So 3.7% will fail over 5 years
But this is under pristine conditions
little vibration, narrow temperature range, no power failures
Real world: 3% to 6% of SCSI drives fail per year
3400 - 6800 FIT or 150,000 - 300,000 hour MTTF [Gray & van Ingen 05]
3% to 7% of ATA drives fail per year
3400 - 8000 FIT or 125,000 - 300,000 hour MTTF [Gray & van Ingen 05]
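A check of the arithmetic behind these two slides: the fraction failing in 5 years at the rated MTTF, and the MTTF/FIT equivalents of a 3% annual failure rate (FIT = failures per 10^9 device-hours):

```latex
\frac{5 \times 8760\,\text{h}}{1{,}200{,}000\,\text{h}} \approx 3.7\%
\qquad
\text{MTTF} \approx \frac{8760\,\text{h}}{0.03} \approx 292{,}000\,\text{h}
\qquad
\text{FIT} = \frac{10^9}{\text{MTTF}} \approx 3400
```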
36. 36 Next Time Instruction Set Architecture
Appendix B
37. 37 References G. M. Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS Conference Proceedings, pp. 483–485, April 1967
http://www-inst.eecs.berkeley.edu/~n252/paper/Amdahl.pdf