## Lec 3 (Sept 2): complete Chapter 1, exercises from Chapter 1, quiz #1, Chapter 2 start

By **mareo**


- complete Chapter 1
- exercises from Chapter 1
- quiz # 1
- Chapter 2 start

Performance Summary

- Performance depends on
  - Algorithm: affects IC, possibly CPI
  - Programming language: affects IC, CPI
  - Compiler: affects IC, CPI
  - Instruction set architecture: affects IC, CPI, Tc

The BIG Picture
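These factors combine in the CPU time relation used in the exercises below: execution time = instruction count × CPI × cycle time, with cycle time = 1 / clock rate. A minimal Python sketch; the example numbers are hypothetical and only illustrate the arithmetic.

```python
def cpu_time(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = IC x CPI x Tc, where Tc = 1 / clock rate."""
    cycle_time_s = 1.0 / clock_rate_hz
    return instruction_count * cpi * cycle_time_s

# Hypothetical program: 10^9 instructions with CPI 2.0 on a 2 GHz clock
t = cpu_time(1e9, 2.0, 2e9)
print(f"{t:.3f} s")
```

Changing the algorithm, language or compiler moves IC and CPI; changing the ISA or implementation can also move Tc.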

For a color display using 8 bits for each primary color (R, G, B) per pixel and with a resolution of 1280 x 800 pixels, what should be the size (in bytes) of the frame buffer to store a frame?

Each frame requires $1280 \times 800 \times 3 = 3{,}072{,}000$ bytes $\approx 3$ MB

If a computer has 3 GB of memory to store such frames, how many frames can be stored?

$3 \times 10^9 / (3 \times 10^6) \approx 1000$ frames (976 whole frames when using the exact frame size of 3,072,000 bytes)
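The frame-buffer arithmetic can be checked directly. A small sketch, assuming 1 GB = $10^9$ bytes as in the estimate above:

```python
WIDTH, HEIGHT = 1280, 800
BYTES_PER_PIXEL = 3           # 8 bits each for R, G and B
MEMORY_BYTES = 3 * 10**9      # assumes 1 GB = 10^9 bytes

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
frames = MEMORY_BYTES // frame_bytes   # only whole frames fit

print(frame_bytes)  # 3072000
print(frames)       # 976
```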

Consider three processors P1, P2 and P3 that implement the same instruction set, with the clock rates and CPIs given below:

| Processor | Clock rate | CPI |
|---|---|---|
| P1 | 2 GHz | 1.5 |
| P2 | 1.5 GHz | 1.0 |
| P3 | 3 GHz | 2.5 |

1.3.1. Which processor has the highest performance?

Suppose the program has N instructions.

Time to execute on P1 $= 1.5N / (2 \times 10^9) = 0.75N \times 10^{-9}$ s

Time to execute on P2 $= N / (1.5 \times 10^9) \approx 0.67N \times 10^{-9}$ s

Time to execute on P3 $= 2.5N / (3 \times 10^9) \approx 0.83N \times 10^{-9}$ s

P2 has the best performance (since it takes the least time to execute).
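Since all three processors run the same N instructions, comparing time per instruction (CPI / clock rate) is enough. A minimal sketch using the table values:

```python
# Clock rate (Hz) and CPI for each processor, from the table above
processors = {
    "P1": (2.0e9, 1.5),
    "P2": (1.5e9, 1.0),
    "P3": (3.0e9, 2.5),
}

# Time per instruction = CPI / clock rate; total time is N times this
time_per_instr = {name: cpi / rate for name, (rate, cpi) in processors.items()}

for name, t in time_per_instr.items():
    print(f"{name}: {t * 1e9:.3f} ns per instruction")

fastest = min(time_per_instr, key=time_per_instr.get)
print("Highest performance:", fastest)  # Highest performance: P2
```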

1.3.2. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.

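For P1, with $N_1$ instructions: $1.5 N_1 / (2 \times 10^9) = 0.75 N_1 \times 10^{-9}$ s $= 10$ s

So $N_1 \approx 1.33 \times 10^{10}$ instructions, and the number of clock cycles is $10 \text{ s} \times 2 \times 10^9 \text{ Hz} = 2 \times 10^{10}$ (equivalently, $N_1 \times 1.5$).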
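The same two steps (cycles = time × clock rate, then instructions = cycles / CPI) apply to every processor in the table. A short sketch:

```python
EXEC_TIME_S = 10.0
# Clock rate (Hz) and CPI, from the table above
processors = {"P1": (2.0e9, 1.5), "P2": (1.5e9, 1.0), "P3": (3.0e9, 2.5)}

results = {}
for name, (clock_rate_hz, cpi) in processors.items():
    cycles = EXEC_TIME_S * clock_rate_hz   # cycles = time x clock rate
    instructions = cycles / cpi            # instruction count = cycles / CPI
    results[name] = (cycles, instructions)
    print(f"{name}: cycles = {cycles:.3g}, instructions = {instructions:.3g}")
```

For P1 this gives $2 \times 10^{10}$ cycles and about $1.33 \times 10^{10}$ instructions, matching the hand calculation; P2 gives $1.5 \times 10^{10}$ of each, and P3 gives $3 \times 10^{10}$ cycles and $1.2 \times 10^{10}$ instructions.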

Given below are the instruction counts of a program, by instruction class:

| arith | store | load | branch | total |
|---|---|---|---|---|
| 500 | 50 | 100 | 50 | 700 |

Assuming these instruction classes take 1, 5, 5 and 2 cycles respectively, what is the execution time on a 2 GHz processor?

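Solution: execution time = cycle time × CPI × number of instructions

Cycle time $= 1 / (2 \times 10^9)$ s $= 0.5 \times 10^{-9}$ s

CPI $= (500 \times 1 + 50 \times 5 + 100 \times 5 + 50 \times 2) / 700 = 1350 / 700 \approx 1.93$

So the total time $= 0.5 \times 10^{-9} \times 1350 = 675 \times 10^{-9}$ s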
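The weighted-CPI calculation generalizes to any instruction mix. A minimal sketch with the counts and cycle costs from this exercise:

```python
CLOCK_RATE_HZ = 2e9
mix = {  # instruction class: (instruction count, cycles per instruction)
    "arith": (500, 1),
    "store": (50, 5),
    "load": (100, 5),
    "branch": (50, 2),
}

total_instructions = sum(count for count, _ in mix.values())
total_cycles = sum(count * cycles for count, cycles in mix.values())
average_cpi = total_cycles / total_instructions
exec_time_s = total_cycles / CLOCK_RATE_HZ   # cycles x cycle time

print(total_instructions, total_cycles)       # 700 1350
print(f"average CPI = {average_cpi:.2f}")     # average CPI = 1.93
print(f"time = {exec_time_s * 1e9:.0f} ns")   # time = 675 ns
```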

Compilers have a profound impact on the performance of an application on a given processor. This problem explores the impact compilers have on execution time.

| Program | Compiler A: instructions | Compiler A: exec. time | Compiler B: instructions | Compiler B: exec. time |
|---|---|---|---|---|
| (a) | $1.0 \times 10^9$ | 1 s | $1.2 \times 10^9$ | 1.4 s |
| (b) | $1.4 \times 10^9$ | 0.8 s | $1.2 \times 10^9$ | 0.7 s |

Find the average CPI for each program given that the processor has a cycle time of 1 ns.

Exec. time = CPI × cycle time × number of instructions

(a) Compiler A: CPI $= 1 / (10^{-9} \times 10^{9}) = 1$
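Solving the same equation for CPI (CPI = exec. time / (cycle time × instruction count)) gives all four entries. A short sketch using the table values:

```python
CYCLE_TIME_S = 1e-9

programs = {  # (program, compiler): (instruction count, execution time in s)
    ("a", "A"): (1.0e9, 1.0),
    ("a", "B"): (1.2e9, 1.4),
    ("b", "A"): (1.4e9, 0.8),
    ("b", "B"): (1.2e9, 0.7),
}

# CPI = execution time / (cycle time x instruction count)
cpi = {key: time / (CYCLE_TIME_S * ic) for key, (ic, time) in programs.items()}
for (prog, comp), value in cpi.items():
    print(f"({prog}) compiler {comp}: CPI = {value:.2f}")
```

Program (a) with compiler A gives CPI 1, matching the solution above; the other three come out at about 1.17, 0.57 and 0.58.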

Reducing Power

- Suppose a new CPU has
  - 85% of the capacitive load of the old CPU
  - 15% voltage and 15% frequency reduction
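Using the power relation that appears in Exercise 1.7 below (power = capacitive load × voltage² × clock rate), the new-to-old power ratio is $0.85 \times 0.85^2 \times 0.85 = 0.85^4$. A minimal sketch, assuming all three reductions apply at once:

```python
# Ratios of new to old values, from the slide
capacitance_ratio = 0.85        # 85% of the old capacitive load
voltage_ratio = 1.0 - 0.15      # 15% voltage reduction
frequency_ratio = 1.0 - 0.15    # 15% frequency reduction

# Assumes power = capacitive load x voltage^2 x clock rate
power_ratio = capacitance_ratio * voltage_ratio ** 2 * frequency_ratio
print(f"P_new / P_old = {power_ratio:.2f}")  # P_new / P_old = 0.52
```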

- The power wall
  - We can’t reduce voltage further
  - We can’t remove more heat
- How else can we improve performance?

Exercise 1.7

1.7.4. Given the following information about each processor, calculate its capacitive load:

Processor 80286: clock rate = 12.5 MHz, power = 3.3 W, voltage = 5 V

Solution: Use the equation

power = capacitive load × voltage² × clock rate

Capacitive load $= 3.3 / (5^2 \times 12.5 \times 10^6) = 0.01056 \times 10^{-6}$ F $\approx 10.6$ nF
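Rearranging the same equation for capacitive load and plugging in the 80286 figures gives a quick numeric check:

```python
power_w = 3.3
voltage_v = 5.0
clock_rate_hz = 12.5e6

# power = capacitive load x voltage^2 x clock rate, solved for capacitive load
capacitive_load_f = power_w / (voltage_v ** 2 * clock_rate_hz)
print(f"{capacitive_load_f:.4e} F")  # 1.0560e-08 F
```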

Uniprocessor Performance

§1.6 The Sea Change: The Switch to Multiprocessors

Constrained by power, instruction-level parallelism, memory latency

General-purpose uniprocessor cores have reached the limits of historical performance scaling, because of:

- Power consumption
- Wire delays
- DRAM access latency
- Diminishing returns from more instruction-level parallelism

Slide from Prof. Saman Amarasinghe

Multiprocessors

- Multicore microprocessors
  - More than one processor per chip
- Requires explicitly parallel programming
  - Compare with instruction-level parallelism
    - Hardware executes multiple instructions at once
    - Hidden from the programmer
  - Hard to do
    - Programming for performance
    - Load balancing
    - Optimizing communication and synchronization

Integrated Circuit Cost

- Nonlinear relation to area and defect rate
  - Wafer cost and area are fixed
  - Defect rate determined by manufacturing process
  - Die area determined by architecture and circuit design
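To see why cost is nonlinear in die area, here is a sketch assuming a commonly used simplified model: dies per wafer ≈ wafer area / die area, and yield = 1 / (1 + defects per area × die area / 2)². The function names and all input numbers are hypothetical.

```python
import math

def dies_per_wafer(wafer_area_cm2, die_area_cm2):
    # Simplified: ignores partial dies lost at the wafer edge
    return wafer_area_cm2 / die_area_cm2

def die_yield(defects_per_cm2, die_area_cm2):
    # Assumed yield model: 1 / (1 + defects per area x die area / 2)^2
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2

def cost_per_die(wafer_cost, wafer_area_cm2, die_area_cm2, defects_per_cm2):
    good_dies = dies_per_wafer(wafer_area_cm2, die_area_cm2) * die_yield(defects_per_cm2, die_area_cm2)
    return wafer_cost / good_dies

# Hypothetical inputs: 300 mm wafer costing 5000, 0.5 defects per cm^2
WAFER_AREA_CM2 = math.pi * 15.0 ** 2
small = cost_per_die(5000.0, WAFER_AREA_CM2, 1.0, 0.5)
large = cost_per_die(5000.0, WAFER_AREA_CM2, 2.0, 0.5)
print(f"1 cm^2 die: {small:.2f}, 2 cm^2 die: {large:.2f}, ratio {large / small:.2f}")
```

Under these assumptions, doubling the die area raises the cost per die by a factor of 2.88 rather than 2, because fewer dies fit on the wafer and a smaller fraction of them work.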

SPEC CPU Benchmark

- Programs used to measure performance
  - Supposedly typical of actual workload
- Standard Performance Evaluation Corp (SPEC)
  - Develops benchmarks for CPU, I/O, Web, …
- SPEC CPU2006
  - Elapsed time to execute a selection of programs
    - Negligible I/O, so focuses on CPU performance
  - Normalize relative to reference machine
  - Summarize as geometric mean of performance ratios
    - CINT2006 (integer) and CFP2006 (floating-point)
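The normalize-then-summarize step can be sketched as follows: each benchmark's ratio is reference time / measured time, and the ratios are combined with a geometric mean. The benchmark times here are hypothetical.

```python
import math

def spec_ratio(reference_time_s, measured_time_s):
    """Performance ratio: reference machine time / measured time (higher is better)."""
    return reference_time_s / measured_time_s

def geometric_mean(values):
    """n-th root of the product, computed via logs to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical times (seconds) for three benchmarks
reference = [100.0, 200.0, 400.0]
measured = [50.0, 50.0, 100.0]
ratios = [spec_ratio(r, m) for r, m in zip(reference, measured)]
summary = geometric_mean(ratios)
print(ratios, f"geometric mean = {summary:.3f}")
```

One reason to prefer the geometric mean here: the ratio of two machines' geometric means does not depend on which reference machine was chosen.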

CINT2006 for Opteron X4 2356 (results table not reproduced; the slide highlights benchmarks with high cache miss rates)

Amdahl’s Law

$s = \dfrac{1}{f + (1 - f)/p} \le \min(p, 1/f)$, where $f$ is the fraction unaffected and $p$ is the speedup of the rest

Amdahl’s law: speedup achieved if a fraction f of a task is unaffected and the remaining 1 – f part runs p times as fast.
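The formula and its upper bound translate directly into code. A minimal sketch; the example values of f and p are hypothetical:

```python
def amdahl_speedup(f: float, p: float) -> float:
    """Speedup when a fraction f of the time is unaffected and the rest runs p times as fast."""
    return 1.0 / (f + (1.0 - f) / p)

def amdahl_bound(f: float, p: float) -> float:
    """Upper bound min(p, 1/f); with f == 0 the whole task speeds up by p."""
    return p if f == 0 else min(p, 1.0 / f)

# Hypothetical example: 10% of the task is unaffected, the rest runs 10x faster
s = amdahl_speedup(0.1, 10)
print(f"speedup = {s:.2f}, bound = {amdahl_bound(0.1, 10):.2f}")
```

Even with $p = 10$, the unaffected 10% holds the overall speedup to about 5.3.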

Amdahl’s Law in design

Example

- A processor spends 30% of its time on floating-point (flp) addition, 25% on flp multiplication, and 10% on flp division. Evaluate the following enhancements, each costing the same to implement:
  - Redesign the flp adder to make it twice as fast.
  - Redesign the flp multiplier to make it three times as fast.
  - Redesign the flp divider to make it 10 times as fast.

- Solution
- Adder redesign speedup = 1 / [0.7 + 0.3 / 2] = 1.18
- Multiplier redesign speedup = 1 / [0.75 + 0.25 / 3] = 1.20
- Divider redesign speedup = 1 / [0.9 + 0.1 / 10] = 1.10
- What if both the adder and the multiplier are redesigned?
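The three single enhancements, plus the combined adder-and-multiplier case, can be evaluated numerically. For the combined case, this sketch assumes the two improvements are independent, so 45% of the time is unaffected, 30% runs 2× faster and 25% runs 3× faster:

```python
def amdahl_speedup(f, p):
    # f = fraction unaffected, p = speedup of the rest
    return 1.0 / (f + (1.0 - f) / p)

adder = amdahl_speedup(0.70, 2)
multiplier = amdahl_speedup(0.75, 3)
divider = amdahl_speedup(0.90, 10)

# Adder and multiplier together (assumed independent)
both = 1.0 / (0.45 + 0.30 / 2 + 0.25 / 3)

print(f"adder {adder:.2f}, multiplier {multiplier:.2f}, divider {divider:.2f}, both {both:.2f}")
# adder 1.18, multiplier 1.20, divider 1.10, both 1.46
```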

Amdahl’s Law – limit to improvement

- Pitfall: improving one aspect of a computer and expecting a proportional improvement in overall performance

§1.8 Fallacies and Pitfalls

- Example: multiply accounts for 80 s of a 100 s run
  - How much must multiply performance improve to get 5× overall?
  - 5× overall means a 20 s run, but the 20 s not spent on multiply is unaffected: $80/n + 20 = 20$ has no finite solution $n$

- Can’t be done!

- Corollary: make the common case fast
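Sweeping the multiply improvement factor shows the overall speedup approaching 5× without ever reaching it. A short sketch using the 80 s / 100 s split from the example:

```python
TOTAL_S = 100.0
MULTIPLY_S = 80.0
OTHER_S = TOTAL_S - MULTIPLY_S  # 20 s not affected by a faster multiplier

def overall_speedup(multiply_improvement):
    # Improved time = affected time / improvement + unaffected time
    return TOTAL_S / (MULTIPLY_S / multiply_improvement + OTHER_S)

for n in (2, 10, 100, 1_000_000):
    print(f"multiply {n}x faster -> overall {overall_speedup(n):.4f}x")
```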

Pitfall: MIPS as a Performance Metric

- MIPS: Millions of Instructions Per Second
  - Doesn’t account for
    - Differences in ISAs between computers
    - Differences in complexity between instructions
- CPI varies between programs on a given CPU
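The compiler exercise earlier in this lecture already contains a case where MIPS ranks the wrong option first. A sketch using program (b), with MIPS = instruction count / (execution time × $10^6$):

```python
def mips(instruction_count, exec_time_s):
    # Millions of instructions per second
    return instruction_count / (exec_time_s * 1e6)

# Program (b) from the compiler exercise
a_ic, a_time = 1.4e9, 0.8   # compiler A
b_ic, b_time = 1.2e9, 0.7   # compiler B

print(f"compiler A: {mips(a_ic, a_time):.0f} MIPS, {a_time} s")
print(f"compiler B: {mips(b_ic, b_time):.0f} MIPS, {b_time} s")
```

Compiler A scores higher on MIPS (1750 vs. about 1714), yet compiler B finishes sooner, because B executes fewer instructions.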

Concluding Remarks

- Cost/performance is improving
  - Due to underlying technology development
- Hierarchical layers of abstraction
  - In both hardware and software
- Instruction set architecture
  - The hardware/software interface
- Execution time: the best performance measure
- Power is a limiting factor
  - Use parallelism to improve performance

§1.9 Concluding Remarks
