coe 308 n.
Download
Skip this Video
Download Presentation
COE 308

Loading in 2 Seconds...

play fullscreen
1 / 37

COE 308 - PowerPoint PPT Presentation


  • 138 Views
  • Uploaded on

COE 308. Term - 051 Dr Abdelhafid Bouhraoua Performance. Need for Performance. Goal: To Have Some Predictability Over Computer Usage. Need for Performance. Goal: To Have Some Predictability Over Computer Usage. Consequence: To Be Able To Adequately Choose The Right Computer

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'COE 308' - hedya


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
coe 308

COE 308

Term - 051

Dr Abdelhafid Bouhraoua

Performance

need for performance
Need for Performance

Goal:

To Have Some Predictability Over

Computer Usage

need for performance1
Need for Performance

Goal:

To Have Some Predictability Over

Computer Usage

Consequence:

To Be Able To Adequately

Choose The Right Computer

For A Given Application

examples where performance is needed
Examples where Performance is needed
  • High Accessibility
  • Data-Base Server
  • Web Server
  • Banking System
  • High Speed
  • Astronomy
  • Genetic Research
  • Weather Prediction
  • Low Cost
  • POS Terminal
  • Portable Device
  • Cell Phone
  • Embedded Apps
  • (Appliances, Toys, …)
defining performance
Defining Performance
  • Speed ?
  • Accessibility ?
  • Cost ?
defining performance1
Defining Performance
  • Speed ?
  • Accessibility ?
  • Cost ?

Only Speed Is Considered in This Context

what speed
What Speed ?

Which Plane Has Higher Performance ?

what speed1
What Speed ?

Which Plane Has Higher Performance ?

  • Time to do the task (Execution Time)

– execution time, response time, latency

  • Tasks per day, hour, week, sec, ns. .. (Performance)

– throughput, bandwidth

Response time and throughput often are in opposition

definitions
Definitions
  • Performance is in units of things-per-second
    • bigger is better
  • If we are primarily concerned with response time:

Performance(x) = 1/Execution_time(x)

" X is n times faster than Y" means:

Performance(X)

n = -----------------------------------------

Performance(Y)

throughput and response time
Throughput and Response Time
  • Time of Concorde vs. Boeing 747?
    • Concord is 1350 mph / 610 mph = 2.2 times faster

= 6.5 hours / 3 hours

  • Throughput of Concorde vs. Boeing 747 ?
    • Concord is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
    • Boeing is 286,700 pmph / 178,200 pmph = 1.6 “times faster”
  • Boeing is 1.6 times (“60%”)faster in terms of throughput
  • Concord is 2.2 times (“120%”) faster in terms of flying time

We will focus primarily on execution time for a single job

relative performance
Relative Performance

Computer A is n Times Faster

Than Computer B if:

relative performance1
Relative Performance

Computer A is n Times Faster

Than Computer B if:

Performance A

----------------------------------------- = n

Performance B

relative performance2
Relative Performance

Computer A is n Times Faster

Than Computer B if:

Performance A

----------------------------------------- = n

Performance B

Or

Execution Time B

------------------------------------------ = n

Execution Time A

metrics and their relation
Metrics and their Relation

Most Basic Metrics: Clock Cycles, Clock Cycle Time, CPU Time, # of Instructions per program

CPU Time = CPU Clk Cycles/Program * Clk Cycle Time

CPU Clk Cycles/Program

CPU Time = -----------------------------------------------------------------------------------

Clock Rate (Frequency)

CPU Cycles/Program = Instr./Program x Average Cycles/Inst.

cpi cycles per instruction

CPI =

CPI (Cycles Per Instruction)

Average Cycles Per Instruction

CPI = (CPU Time /Clock Cycle Time) / Instruction Count

= Clock Cycles / Instruction Count

n: number of instructions in the Instruction Set

CPIi: number of clock cycles Instruction i takes to execute

Ii: Count of instructions of type i in the program

CPU time = Clock Cycle Time *

CPI = Clock Cycles / Instruction Count

Divide CPU time by Clock Cycle Time and Instruction Count to get the CPI

Fi: Frequency of Instructions

Fi = Ii /Instruction Count

cpi cycles per instruction1

CPI =

CPI (Cycles Per Instruction)

Average Cycles Per Instruction

CPI = (CPU Time /Clock Cycle Time) / Instruction Count

= Clock Cycles / Instruction Count

n: number of instructions in the Instruction Set

CPIi: number of clock cycles Instruction i takes to execute

Ii: Count of instructions of type i in the program

CPU time = Clock Cycle Time *

CPI = Clock Cycles / Instruction Count

Divide CPU time by Clock Cycle Time and Instruction Count to get the CPI

Fi: Frequency of Instructions

Fi = Ii /Instruction Count

Invest Resource Where Time Is Spent

metrics and their relation revisited
Metrics and their Relation- Revisited -

Seconds

CPU TIME = -------------------------

Program

Instructions Cycles Seconds

CPU TIME = ----------------------------------- X -------------------------------- X -------------------------

Program Instruction Cycle

Implementation/

Compiler Optimization

Dependant

CPI - Variable

Clock Cycle – Fixed

example
Example
  • Example (RISC processor)
  • Typical Mix
  • Base Machine (Reg / Reg)
          • Op Freq CPI(i) CPI(i) x Freq
          • ALU 50% 1 .5
          • Load 20% 5 1.0
          • Store 10% 3 .3
          • Branch 20% 2 .4
  • How much faster would the machine be if a better data cache
  • reduced the average load time to 2 cycles?
  • How does this compare with using branch prediction to shave a
  • cycle off the branch time?
  • What if two ALU instructions could be executed at once?
answering 1
Answering 1.
  • Computing the CPI Before Improvement:
          • Op Freq CPI(i) CPI(I) x Freq
          • ALU 50% 1 .5
          • Load 20% 5 1.0
          • Store 10% 3 .3
          • Branch 20% 2 .4
          • -----------
  • CPI1 = .5x1 + .2x5 + .1%x3 +.2x2 = 2.2
  • Computing the CPI After Improvement:
          • Op Freq CPI(i) CPI(i) x FreQ
          • ALU 50% 1 .5
          • Load 20% 2 .4
          • Store 10% 3 .3
          • Branch 20% 2 .4
          • -----------
  • CPI2 = .5x1 + .2x2 + .1%x3 +.2x2 = 1.6
answering 1 cont
Answering 1. (cont.)

How much faster would the machine be if a better data cache

reduced the average load time to 2 cycles?

Answer:

It is n times faster with:

answering 1 cont1
Answering 1. (cont.)

How much faster would the machine be if a better data cache

reduced the average load time to 2 cycles?

Answer:

It is n times faster with:

n = CPU Time Before Imp. / CPU Time After Imp.

= Clock Cycle Time * CPI1 * Inst. Count /

Clock Cycle Time * CPI2 * Inst. Count

= CPI1 / CPI2 = 2.2 / 1.6 = 1.375

answering 1 cont2
Answering 1. (cont.)
  • How much faster would the machine be if a better data cache
  • reduced the average load time to 2 cycles?
  • Answer:
  • It is n times faster with:
  • n = CPU Time Before Imp. / CPU Time After Imp.
  • = Clock Cycle Time * CPI1 * Inst. Count /
  • Clock Cycle Time * CPI2 * Inst. Count
  • = CPI1 / CPI2 = 2.2 / 1.6 = 1.375
  • We Say:
  • CPU is 1.375 times faster, or
  • CPU is 37.50% faster
answering 2
Answering 2.

How does this compare with using branch prediction to shave a

cycle off the branch time?

Answer:

“Shaving” a cycle off the branch time means CPI of branch

is reduced by one cycle

  • Computing the CPI After Improvement:
          • Op Freq CPI(I) CPI(i) x Freq
          • ALU 50% 1 .5
          • Load 20% 5 1.0
          • Store 10% 3 .3
          • Branch 20% 1 .2
          • -----------
  • CPI2 = .5x1 + .2x5 + .1%x3 +.2x1= 2.0

Reducing the Load time produces better performances than

reducing the branch time

answering 3
Answering 3.

What if two ALU instructions could be executed at once?

Answer:

Two instructions executed at once means:

For one instruction, it takes virtually half the time to execute

on machine B. So,

CPI(i)B = CPI(i)A/2

  • Computing the CPI of Machine B
          • Op Freq CPI(i) CPI(I) x Freq
          • ALU 50% .5 .25
          • Load 20% 5 1.0
          • Store 10% 3 .3
          • Branch 20% 2 .4
          • -----------
  • CPI1 = .5x1 + .2x5 + .1%x3 +.2x2 = 1.95
time evaluation
Time % Evaluation

How to determine which class of instructions takes the

highest time ?

  • Evaluate Time Percentages of Instructions
  • Cannot be Directly Measured (Program has Mixed Instructions)
  • Need to be Computed Using CPI and Frequency
time evaluation1
Time % Evaluation
  • Given:
  • Ic: Instruction Count
  • Ii: Instruction Count for Instruction Class i
  • Fi: Frequency of Instructions of Class i
  • Tc: Clock Cycle Time
  • CPIi: Clock Cycles/Instruction for Class i
  • CPI: Average Clock Cycles / Instruction for the whole program
  • Pi: Percentage of time for instruction of Class i

CPUtime = CPI x Ic x Tc

CPUtimei= CPIi x Ii x Tc

Ii = Ic x Fi

CPUtimei= CPIi x Ic x Fi x Tc

Pi = CPUtimei / CPUtime

Pi = CPIi x Ic x Fi x Tc / (CPI x Ic x Tc)

CPIi x Fi

CPI

Pi =

amdahl s law
Amdahl’s Law

Speed-up due to Enhancement E

amdahl s law1
Amdahl’s Law

Speed-up due to Enhancement E

Execution Time w/o E Performance w/ E

Speedup = --------------------------------- = -----------------------------

Execution Time w/ E Performance w/o E

amdahl s law2
Amdahl’s Law

Speed-up due to Enhancement E

Execution Time w/o E Performance w/ E

Speedup = --------------------------------- = -----------------------------

Execution Time w/ E Performance w/o E

Suppose that Enhancement E accelerate a portion F Only

by a factor S

TFE

TA

TFA

TE

amdahl s law3
Amdahl’s Law

New Enhancement touched only a fraction F of the whole execution time TA and reduced this fraction by a factor S while keeping the remainder part of TA unchanged

TE = TA – TFA + TFE TA – TFA is unchanged

TFA = TA * F F is a fraction of TA

TFE = TFA/S = TA * F/S Time is reduced by a factor S

TE = TA – TA*F + TA * F/S

Means:

amdahl s law4
Amdahl’s Law

New Enhancement touched only a fraction F of the whole execution time TA and reduced this fraction by a factor S while keeping the remainder part of TA unchanged

TE = TA – TFA + TFE TA – TFA is unchanged

TFA = TA * F F is a fraction of TA

TFE = TFA/S = TA * F/S Time is reduced by a factor S

TE = TA – TA*F + TA * F/S

Means:

1

------------------

(1-F + (F/S))

Speedup =

TE = TA * (1 – F + (F/S))

benchmarks
Benchmarks
  • Few users run same program over and over
  • Need Programs specially developed to compare performance
  • Best Reference: Real Application
  • Real Application NOT common to all users

Benchmarks are Programs developed for the sole purpose of Performance Evaluation

spec95
SPEC95
  • Eighteen application benchmarks (with inputs) reflecting a technical computing workload
  • Eight integer
    • go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
  • Ten floating-point intensive
    • tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fppp, wave5
  • Must run with standard compiler flags
    • eliminate special undocumented incantations that may not even generate working code for real programs
fallacies and pitfalls
Fallacies and Pitfalls
  • Amdahl’s law sets limits only and is NOT unlimited
    • Improvement of one aspect cannot improve the overall performance by a factor proportional to the size of the improvement
  • Hardware-independent metrics DO NOT predict performance
    • Code size, Impl. of software systems
  • Using MIPS (Millions of Inst. Per Second) as a performance metric
    • Instructions have different CPI
    • MIPS metric vary from one program to the other on the SAME CPU.