
# Summarizing Performance Data: Confidence Intervals



### Summarizing Performance Data: Confidence Intervals

Important

Easy to Difficult

Warning: some mathematical content


• Summarized data

• Confidence Intervals

• Independence Assumption

• Prediction Intervals

• Which Summarization to Use?


• How do you quantify:

• Central value

• Dispersion (Variability)

(Figures: distributions of measured execution times for the old and new code.)

• Median, Quantiles

• Median

• Quartiles

• P-quantiles

• Mean and standard deviation

• Mean

• Standard deviation

• What is the interpretation of the standard deviation?

• A: if the data is normally distributed, a new data sample lies, with 95% probability, in the interval m ± 1.96 s
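
These summary statistics can be sketched with the Python standard library (the data values below are made up for illustration):

```python
import statistics

# Hypothetical measurements (the outlier at 9.0 is deliberate).
data = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.8, 3.0, 3.5, 9.0]

median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles (q2 equals the median)
m = statistics.mean(data)
s = statistics.stdev(data)                    # sample standard deviation

# Normal interpretation of s: ~95% of new samples fall in m +/- 1.96 s.
low, high = m - 1.96 * s, m + 1.96 * s
```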


• Scale free

• Summarizes variability

• Second order

• For a data set with n samples: CoV = s / m (standard deviation divided by sample mean)

• Exponential distribution: CoV =1


• Alternative to CoV

• For a data set with n samples: JFI = (x_1 + … + x_n)² / (n (x_1² + … + x_n²))

• Scale free, index of unfairness


• Quantifies fairness of x

• Ranges from 1/n (maximum unfairness) to 1 (all x_i equal)

• Fairness and variability are two sides of the same coin
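
The two indices and the identity JFI = 1 / (1 + CoV²), which makes them equivalent, can be checked numerically. A sketch with our own function names (CoV here uses the 1/n "population" variance, which is what makes the identity exact):

```python
def cov(x):
    """Coefficient of variation: (1/n)-variance std dev divided by the mean."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x) / n
    return var ** 0.5 / m

def jfi(x):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2)."""
    n = len(x)
    return sum(x) ** 2 / (n * sum(v * v for v in x))

# Hypothetical throughput allocation: one user gets much more than the others.
x = [10.0, 10.0, 10.0, 40.0]
```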


• Old code, new code: is JFI larger? Gap?

• Gini's index is also used; def.: 2 × area between the diagonal and the Lorenz curve

• More or less equivalent to the Lorenz curve gap

(Figure: Lorenz curve, with the diagonal of perfect equality (fairness) and the Lorenz curve gap.)


• There are (too) many synthetic indices to choose from

• Traditional measures in engineering are standard deviation, mean and CoV

• In Computer Science, JFI and mean

• JFI is equivalent to CoV

• In economics, the gap and Gini's index (a variant of the Lorenz curve gap)

• Statisticians like medians and quantiles (robust to statistical assumptions)

• We will come back to the issue after discussing confidence intervals


• Do not confuse with prediction interval

• Quantifies the uncertainty about an estimate


• Mean reduction = 0 is outside the confidence intervals for mean and for median

• Confidence interval for median


• This is simple if we can assume that the data comes from an iid model (Independent, Identically Distributed)


• The confidence interval for the median is the simplest of all

• Robust: always true provided iid assumption holds


• Examples: n = 31 and n = 32 (the order-statistic indices are read from the Binomial(n, 1/2) distribution)
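
A sketch of how such order-statistic indices can be computed from the Binomial(n, 1/2) distribution, using only the standard library (the function name is ours):

```python
from math import comb

def median_ci_indices(n, level=0.95):
    """1-based order-statistic indices (j, k) such that [x_(j), x_(k)]
    is a confidence interval for the median at least at the given level.
    Uses B ~ Binomial(n, 1/2): coverage of [x_(j), x_(n+1-j)] is
    1 - 2 * P(B <= j - 1), so pick the largest valid j."""
    alpha = 1.0 - level
    cdf = 0.0
    for i in range(n + 1):
        cdf += comb(n, i) / 2.0 ** n      # P(B <= i)
        if cdf > alpha / 2.0:
            if i == 0:
                raise ValueError("sample too small for this confidence level")
            return i, n + 1 - i           # j = i has P(B <= j-1) <= alpha/2
    raise ValueError("invalid level")
```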


• This is another method, the most commonly used one

• But requires some assumptions to hold, may be misleading if they do not hold

• There is no exact theorem as for median and quantiles, but there are asymptotic results and a heuristic.


• If central limit theorem holds(in practice: n is large and distribution is not “wild”)


• n = 100; 95% confidence level. CI for the mean: m ± 1.96 s / √n

• The amplitude of the CI decreases like 1/√n; compare to the prediction interval
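
A minimal sketch of this asymptotic CI (function name ours, standard library only):

```python
import math
import statistics

def mean_ci(data, eta=1.96):
    """Asymptotic confidence interval for the mean: m +/- eta * s / sqrt(n).
    eta = 1.96 corresponds to a 95% confidence level."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)     # sample standard deviation
    half = eta * s / math.sqrt(n)
    return m - half, m + half
```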


• Assume data comes from an iid + normal distribution. Useful for very small data samples (n < 30)


• n = 100; 95% confidence level. CI for the mean: m ± η s / √n, with η the Student t quantile; a CI for the standard deviation follows similarly from the normal-case theory

• Same as before, except that the Student quantile η, which depends on n (1.98 for n = 100), replaces 1.96

• In practice both (normal case and large-n asymptotic) are the same if n > 30

• But the large-n asymptotic does not require the normal assumption


• A heuristic that is robust (requires only iid assumption)

• But be careful with heavy tails (see next): it then tends to underestimate the CI

• Simple to implement with a computer

• Idea: use the empirical distribution in place of the theoretical (unknown) distribution

• For example, with confidence level = 95%:

• the data set is S = {x_1, …, x_n}

• For r = 1 to r = 999:

• (replay the experiment) draw n bootstrap replicates with replacement from S

• compute their sample mean T_r

• Sort the T_r; the bootstrap percentile estimate is (T_(25), T_(975))
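
The steps above can be sketched as follows (function name ours; seeded so that the sketch is reproducible):

```python
import random
import statistics

def bootstrap_mean_ci(data, r0=999, level=0.95, seed=0):
    """Bootstrap percentile confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # r0 replayed experiments: resample with replacement, take the mean.
    t = sorted(statistics.mean(rng.choices(data, k=n)) for _ in range(r0))
    m = int((r0 + 1) * (1.0 - level) / 2.0)   # 25 when r0 = 999, level = 0.95
    return t[m - 1], t[r0 - m]                # (T_(25), T_(975)), 1-based
```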


Example: Compiler Options

• Methods 2.3.1 and 2.3.2 give the same result (n > 30)

• Method 2.3.3 (bootstrap) gives the same result

• => the asymptotic assumption is valid

• Use bootstrap if data is iid


Q1.1

• If data is guaranteed to come from an iid normal distribution, can you think of a better way to compute a confidence interval for Jain's fairness index?


• Problem statement: we want to estimate the probability of failure; we observe n outcomes, with no failure; what is the confidence interval?

• Is this a confidence interval for the mean? (explain why)

• The general theory does not give good results when the mean is very small
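
One standard way to handle the zero-failure case (not necessarily the method the slides use later) is the exact upper bound obtained by solving (1 − p)^n = 1 − level for p; a sketch, with our own function name:

```python
def zero_failure_upper_bound(n, level=0.95):
    """Upper confidence bound on a failure probability p when 0 failures
    are observed in n iid runs: largest p with (1 - p)**n >= 1 - level."""
    return 1.0 - (1.0 - level) ** (1.0 / n)
```

For 95% this is close to the well-known "rule of three" approximation 3/n.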


• Confidence interval for median (or other quantiles) is easy to get from the Binomial distribution

• Requires iid

• No other assumption

• Confidence interval for the mean

• Requires iid

• And

• either the data sample is normal and n is small,

• or the data sample is not wild and n is large enough

• The bootstrap is more robust and more general, but is more than a simple formula to apply

• So we need to verify the assumptions


• Confidence Intervals require that we can assume that the data comes from an iid model (Independent, Identically Distributed)

• How do I know if this is true ?

• Controlled experiments: draw factors randomly with replacement

• Simulation: independent replications (with random seeds)

• Else: we do not know – in some cases we will have methods for time series


• Pretend data is iid: CI for mean is [69; 69.8]

• Is this biased ?

(Figure: the data trace and its autocorrelation function (ACF).)

• If data is positively correlated

• Neighbouring values look similar

• Frequent in measurements

• CI is underestimated: there is less information in the data than one thinks


• CI for mean or median summarize

• Central value + uncertainty about it

• Prediction interval summarizes variability of data


• Assume data comes from an iid model

• Simplest and most robust result (not well known, though): [xmin, xmax] is a prediction interval at level 1 − 2/(n+1)


• For n=39, [xmin, xmax] is a prediction interval at level 95%

• For n <39 there is no prediction interval at level 95% with this method

• But there is one at level 90% for n > 18

• For n = 10 we have a prediction interval [xmin, xmax] at level 81%
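
The level 1 − 2/(n + 1) behind these numbers can be checked directly (function name ours):

```python
def prediction_level(n):
    """Coverage level of [x_min, x_max] as a prediction interval
    for a new sample, assuming the n observed samples are iid."""
    return 1.0 - 2.0 / (n + 1.0)
```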


Prediction Interval for Small n and Normal Data Sample


• Many results are simple if the data is normal, or close to it (i.e. not wild). An important question to ask is: can I change the scale of my data to make it look more normal?

• Ex: take the log of the data instead of the data

• A generic transformation used in statistics is the Box-Cox transformation: b_s(x) = (x^s − 1)/s for s ≠ 0, and b_0(x) = ln x

• It is continuous in s; s = 0: log; s = −1: 1/x; s = 1: identity
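
A sketch of the transformation (function name ours):

```python
import math

def box_cox(x, s):
    """Box-Cox transform: (x**s - 1)/s for s != 0, log(x) for s = 0.
    Continuous in s; s = 1 is the identity up to a shift, s = -1 is
    1/x up to an affine change."""
    if s == 0:
        return math.log(x)
    return (x ** s - 1.0) / s
```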


(Figure: prediction intervals computed from mean and standard deviation, from the order statistic, and from mean and standard deviation on rescaled data.)

• Two issues

• Robustness to outliers

• Compactness


• Normal qq-plot

• X-axis: standard normal quantiles

• Y-axis: ordered statistic of the sample: x_(1) ≤ x_(2) ≤ … ≤ x_(n)

• If data comes from a normal distribution, qqplot is close to a straight line (except for end points)

• Visual inspection is often enough

• If not possible or doubtful, we will use tests later
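
A sketch of the qq-plot coordinates using `statistics.NormalDist` (the plotting positions (i − 0.5)/n are one common convention; the function name is ours):

```python
from statistics import NormalDist

def qq_points(data):
    """(theoretical quantile, observed value) pairs for a normal qq-plot.
    If the data is roughly normal, the points lie near a straight line."""
    n = len(data)
    ys = sorted(data)                                  # ordered statistic
    xs = [NormalDist().inv_cdf((i - 0.5) / n)          # standard normal quantiles
          for i in range(1, n + 1)]
    return list(zip(xs, ys))
```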


• The interpretation of the standard deviation as a measure of variability is meaningful if the data is normal (or close to normal). Else it is misleading; the data is best re-scaled first.


• Issues

• Robustness to outliers

• Distribution assumptions


(Figure: CI based on the standard deviation, CI based on the bootstrap, and CI for the median, compared with the true mean and true median.)

(Figure: intervals based on mean + std dev, on the order statistic, and on mean + std dev with re-scaling; CI for the median; geometric mean; shown with the outlier present and with the outlier removed.)

• Confidence Intervals obtained by Bootstrap

• How ?

• JFI is very dependent on one outlier

• As expected, since JFI is essentially CoV, i.e. standard deviation

• Gap is sensitive, but less

• It does not use squaring; why?


• If the normal assumption (or, for CIs, the asymptotic regime) holds, the mean and standard deviation are more compact

• two values give both: CIs at all levels, prediction intervals

• Derived indices: CoV, JFI

• In contrast, CIs for median does not give information on variability

• Prediction interval based on order statistic is robust (and, IMHO, best)


• Use methods that you understand

• Mean and standard deviation make sense when data sets are not wild

• Close to normal, or not heavy tailed and large data sample

• Use quantiles and order statistics if you have the choice

• Rescale


Q2.2

Q2.3

We have tested a system for errors and found 0 in 37 runs. Give a confidence interval for the probability of error. Same question if we find 1 error in 37 runs.

• We are given the results of performance measurements for two protocols, in the form of two series x_1, …, x_m and y_1, …, y_n, together with the means and sample standard deviations of the two series. How can we decide which protocol is better?
