Summarizing performance data confidence intervals
Summarizing Performance Data: Confidence Intervals


Easy to Difficult

Warning: some mathematical content



Contents

  • Summarized data

  • Confidence Intervals

  • Independence Assumption

  • Prediction Intervals

  • Which Summarization to Use?


1. Summarizing Performance Data

  • How do you quantify:

    • Central value

    • Dispersion (Variability)




Summarized Measures

  • Median, Quantiles

    • Median

    • Quartiles

    • P-quantiles

  • Mean and standard deviation

    • Mean

    • Standard deviation

    • What is the interpretation of the standard deviation?

    • A: if the data is normally distributed, a new data sample lies in the interval m ± 1.96 s with 95% probability
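As an illustration, these summaries can be computed with Python's standard library (the data values and variable names below are made up, not from the slides):

```python
import statistics

# Hypothetical sample of response times (ms) -- illustrative values only
data = [11.0, 12.0, 12.5, 13.0, 13.5, 14.0, 15.0, 16.0]

median = statistics.median(data)
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (q2 is the median)
m = statistics.mean(data)
s = statistics.stdev(data)                     # sample standard deviation

# Interpretation of s: if the data is normally distributed, a new
# sample lies in [m - 1.96 s, m + 1.96 s] with 95% probability.
interval = (m - 1.96 * s, m + 1.96 * s)
```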


Coefficient of Variation

  • Scale free

  • Summarizes variability

  • Second order (based on squared deviations)

  • For a data set with n samples: CoV = s / x̄ (sample standard deviation divided by sample mean)

  • Exponential distribution: CoV = 1
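A minimal sketch of the CoV (the helper name is hypothetical), together with a numerical check of the CoV = 1 property of the exponential distribution:

```python
import random
import statistics

def coeff_of_variation(xs):
    """CoV = standard deviation / mean; scale free: CoV(k*x) == CoV(x)."""
    return statistics.pstdev(xs) / statistics.mean(xs)

# Numerical check: samples from an exponential distribution have CoV close to 1
random.seed(1)
sample = [random.expovariate(1.0) for _ in range(200_000)]
cov = coeff_of_variation(sample)
```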


Lorenz Curve Gap

  • Alternative to CoV

  • For a data set with n samples: gap = Σi |xi − x̄| / (2 n x̄)

  • Scale free, index of unfairness
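A sketch, assuming the usual definition of the gap as the maximum vertical distance between the Lorenz curve and the diagonal, which works out to the mean absolute deviation divided by twice the mean:

```python
def lorenz_curve_gap(xs):
    """Lorenz curve gap: max vertical distance between the Lorenz curve
    and the diagonal = sum(|x_i - mean|) / (2 * n * mean). Scale free."""
    n = len(xs)
    m = sum(xs) / n
    return sum(abs(x - m) for x in xs) / (2 * n * m)
```

The gap is 0 when all values are equal; when one user holds everything it equals (n − 1)/n, approaching 1 as n grows.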


Jain’s Fairness Index

  • Quantifies the fairness of x: JFI(x) = (Σi xi)² / (n Σi xi²)

  • Ranges between 1/n and 1

    • 1: all xi equal

    • 1/n: maximum unfairness

  • Fairness and variability are two sides of the same coin
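The index can be sketched as (helper name hypothetical), with checks of its two extremes:

```python
def jains_fairness_index(xs):
    """JFI(x) = (sum x_i)^2 / (n * sum x_i^2), with 1/n <= JFI <= 1.
    Related to the (population) coefficient of variation by
    JFI = 1 / (1 + CoV^2), which is the 'two sides of the same coin'."""
    n = len(xs)
    return sum(xs) ** 2 / (n * sum(x * x for x in xs))
```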


Lorenz Curve

(Figure: Lorenz curve; the diagonal represents perfect equality (fairness), and the Lorenz curve gap is the distance between the curve and the diagonal.)

  • Old code vs. new code: is the JFI larger? Is the gap?

  • Gini’s index is also used; definition: 2 × the area between the diagonal and the Lorenz curve

    • More or less equivalent to Lorenz curve gap



Which Summarization Should One Use?

  • There are (too) many synthetic indices to choose from

    • Traditional measures in engineering are the mean, standard deviation and CoV

    • In computer science: JFI and the mean

      • JFI is equivalent to CoV

    • In economics: the gap and Gini’s index (a variant of the Lorenz curve gap)

    • Statisticians like medians and quantiles (robust to statistical assumptions)

  • We will come back to the issue after discussing confidence intervals


2. Confidence Interval

  • Do not confuse with prediction interval

  • Quantifies the uncertainty about an estimate


Confidence Intervals for Mean of Difference

  • Mean reduction = 0 is outside the confidence intervals for mean and for median

  • Confidence interval for median


Computing Confidence Intervals

  • This is simple if we can assume that the data comes from an iid model (Independent and Identically Distributed)


CI for Median

  • The simplest of all

  • Robust: always valid, provided the iid assumption holds
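A sketch of how such a CI can be derived from the Binomial(n, 1/2) distribution, searching for the smallest symmetric pair of order statistics that reaches the requested level (iid assumption only; the helper name is hypothetical):

```python
import math

def median_ci(data, level=0.95):
    """Distribution-free CI for the median: with x_(1) <= ... <= x_(n),
    P(x_(j) <= median <= x_(n+1-j)) = sum_{i=j}^{n-j} C(n, i) / 2^n."""
    n = len(data)
    xs = sorted(data)
    probs = [math.comb(n, i) / 2 ** n for i in range(n + 1)]  # Binomial(n, 1/2)
    # widest-to-narrowest is not needed: start from the middle and widen
    for j in range(n // 2, 0, -1):
        coverage = sum(probs[j : n - j + 1])   # P(j <= B <= n-j), B ~ Bin(n, 1/2)
        if coverage >= level:
            return xs[j - 1], xs[n - j]        # x_(j), x_(n+1-j), 1-based
    return xs[0], xs[-1]
```

For example, for n = 10 this returns (x_(2), x_(9)), a CI at level ≈ 97.9%.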


CI for Mean and Standard Deviation

  • This is another, and the most commonly used, method…

  • But requires some assumptions to hold, may be misleading if they do not hold

  • There is no exact theorem as for median and quantiles, but there are asymptotic results and a heuristic.


CI for Mean, Asymptotic Case

  • If the central limit theorem holds (in practice: n is large and the distribution is not “wild”)



  • n = 100; 95% confidence level. CI for mean: m ± 1.96 s / √n

  • The amplitude of the CI decreases like 1/√n; compare to the prediction interval, whose width does not shrink with n
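The formula can be sketched as (illustrative helper with made-up data, not from the slides):

```python
import math
import statistics

def mean_ci_asymptotic(data, z=1.96):
    """Asymptotic CI for the mean (CLT): m +/- z * s / sqrt(n).
    z = 1.96 corresponds to a 95% confidence level."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)
    half = z * s / math.sqrt(n)
    return m - half, m + half

# Example: n = 100 made-up measurements with mean 3
lo, hi = mean_ci_asymptotic([1.0, 2.0, 3.0, 4.0, 5.0] * 20)
```

Note the 1/√n: quadrupling n only halves the CI width, whereas a prediction interval does not shrink with n.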


Normal Case

  • Assume the data comes from an iid + normal distribution. Useful for very small data samples (n < 30)



  • n = 100; 95% confidence level. CI for mean: m ± η s / √n, where η is the quantile of Student’s t distribution with n − 1 degrees of freedom. A CI for the standard deviation can be obtained similarly from the χ² distribution.

  • Same as before, except that η replaces 1.96, and the formula is valid for all n (η ≈ 1.98 for n = 100)

  • In practice both (normal case and large n asymptotic) are the same if n > 30

  • But large n asymptotic does not require normal assumption


Bootstrap Percentile Method

  • A heuristic that is robust (requires only the iid assumption)

    • But be careful with heavy tails, see next

  • But tends to underestimate the CI

  • Simple to implement with a computer

  • Idea: use the empirical distribution in place of the theoretical (unknown) distribution

  • For example, with confidence level = 95%:

    • the data set is S = {x1, …, xn}

    • For r = 1 to 999:

      • (replay the experiment) draw n bootstrap replicates with replacement from S

      • compute their sample mean Tr

    • Bootstrap percentile estimate is (T(25), T(975)), where T(1) ≤ … ≤ T(999) are the sorted Tr
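The steps above can be sketched as (r = 999, level 95%; the sample data is made up):

```python
import random
import statistics

def bootstrap_percentile_ci(data, r=999, level=0.95):
    """Bootstrap percentile CI for the mean (iid assumption only).
    Replays the experiment r times by resampling with replacement,
    then takes the empirical percentiles of the r sample means."""
    n = len(data)
    t = sorted(statistics.mean(random.choices(data, k=n)) for _ in range(r))
    k = int((r + 1) * (1 - level) / 2)    # k = 25 for r = 999, level = 0.95
    return t[k - 1], t[r - k]             # T(25), T(975), 1-based order

random.seed(42)
sample = [float(v) for v in range(1, 11)] * 10   # 100 made-up values, mean 5.5
ci = bootstrap_percentile_ci(sample)
```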


Example: Compiler Options

Does the data look normal?


Methods 2.3.1 and 2.3.2 give the same result (n > 30)

Method 2.3.3 (bootstrap) gives the same result

=> the asymptotic assumption is valid



Confidence Interval for Fairness Index

  • Use bootstrap if data is iid




  • If the data is guaranteed to come from an iid normal distribution, can you think of a better way to compute a confidence interval for Jain’s fairness index?


Confidence Interval for Success Probability

  • Problem statement: we want to estimate the probability of failure; we observe n outcomes with no failures; what is the confidence interval?

  • This is a confidence interval for a mean (explain why)

  • The general theory does not give good results when the mean is very small
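One standard way to handle the zero-failure case is the exact bound sometimes called the “rule of three”; the derivation below is standard but not spelled out on the slide:

```python
def zero_failure_upper_bound(n, level=0.95):
    """Upper confidence bound for a failure probability p after observing
    0 failures in n runs: solve (1 - p)^n = 1 - level for p.
    For level = 95% this is approximately 3/n (the 'rule of three')."""
    return 1 - (1 - level) ** (1.0 / n)
```

For the 37-run example, p ≤ 1 − 0.05^(1/37) ≈ 0.078, close to 3/37 ≈ 0.081.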


Take-Home Message

  • Confidence interval for median (or other quantiles) is easy to get from the Binomial distribution

    • Requires iid

    • No other assumption

  • Confidence interval for the mean

    • Requires iid

    • And

      • Either the data sample is normal and n is small

      • Or the data sample is not wild and n is large enough

  • The bootstrap is more robust and more general but is more than a simple formula to apply

  • So we need to verify the assumptions


3. The Independence Assumption

  • Confidence intervals require that we can assume that the data comes from an iid model (Independent and Identically Distributed)

  • How do I know if this is true?

    • Controlled experiments: draw factors randomly with replacement

    • Simulation: independent replications (with random seeds)

    • Else: we do not know – in some cases we will have methods for time series



  • Pretend the data is iid: the CI for the mean is [69; 69.8]

  • Is this biased?



What happens if the data is not iid?

  • If data is positively correlated

    • Neighbouring values look similar

    • Frequent in measurements

    • CI is underestimated: there is less information in the data than one thinks


4. Prediction Interval

  • CIs for the mean or median summarize

    • Central value + uncertainty about it

  • A prediction interval summarizes the variability of the data


Prediction Interval Based on Order Statistic

  • Assume data comes from an iid model

  • The simplest and most robust result (not well known, though): [xmin, xmax] is a prediction interval at level (n − 1)/(n + 1)


Prediction Interval for Small n

  • For n = 39, [xmin, xmax] is a prediction interval at level 95%

  • For n < 39 there is no prediction interval at level 95% with this method

    • But there is one at level 90% for n > 18

    • For n = 10 we have a prediction interval [xmin, xmax] at level 81%
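These levels all follow from the coverage formula P(xmin ≤ xnew ≤ xmax) = (n − 1)/(n + 1) for iid samples (the new sample falls below xmin or above xmax with probability 1/(n + 1) each); a one-line check:

```python
def minmax_prediction_level(n):
    """Coverage of [x_min, x_max] as a prediction interval for one new
    sample, given n iid samples: (n - 1) / (n + 1)."""
    return (n - 1) / (n + 1)
```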


Prediction Interval for Small n and Normal Data Sample


Re-scaling

  • Many results are simple if the data is normal, or close to it (i.e. not wild). An important question to ask is: can I change the scale of my data to make it look more normal?

    • Ex: log of the data instead of the data

  • A generic transformation used in statistics is the Box-Cox transformation: bs(x) = (x^s − 1) / s for s ≠ 0, and b0(x) = log x

    • Continuous in s; s = 0: log; s = −1: 1/x; s = 1: identity
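A sketch of the transformation:

```python
import math

def box_cox(x, s):
    """Box-Cox transform: b_s(x) = (x**s - 1) / s for s != 0, log(x) for
    s = 0. Continuous in s; s = 1 is a shifted identity, s = -1 an
    affine version of 1/x."""
    if s == 0:
        return math.log(x)
    return (x ** s - 1) / s
```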


Prediction Intervals for File Transfer Times

(Figure: three prediction intervals compared: based on mean and standard deviation; based on order statistics; based on mean and standard deviation on the rescaled data.)


Which Summarization Should I Use?

  • Two issues

    • Robustness to outliers

    • Compactness


The QQ-plot Is a Common Tool for Verifying Assumptions

  • Normal QQ-plot

    • X-axis: standard normal quantiles

    • Y-axis: order statistic of the sample: x(1) ≤ … ≤ x(n)

  • If the data comes from a normal distribution, the QQ-plot is close to a straight line (except at the end points)

    • Visual inspection is often enough

    • If not possible or doubtful, we will use tests later
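A sketch of the points such a plot draws, using the (i − 0.5)/n plotting positions (one common convention among several):

```python
import statistics

def normal_qq_points(data):
    """Points of a normal QQ-plot: x = standard normal quantiles at
    (i - 0.5)/n, y = the ordered sample. The plot is roughly a straight
    line iff the data is close to normally distributed."""
    n = len(data)
    nd = statistics.NormalDist()   # standard normal: mean 0, sdev 1
    xq = [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    yq = sorted(data)
    return list(zip(xq, yq))

pts = normal_qq_points([3.0, 1.0, 2.0])
```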


Take-Home Message

  • The interpretation of as measure of variability is meaningful if the data is normal (or close to normal). Else, it is misleading. The data should be best re-scaled.


5. Which Summarization to Use?

  • Issues

    • Robustness to outliers

    • Distribution assumptions


A Distribution with Infinite Variance

(Figure: confidence intervals computed on samples from a distribution with infinite variance: the CI based on the standard deviation and the CI based on the bootstrap, shown against the true mean, and the CI for the median, shown against the true median.)


Robustness of Confidence/Prediction Intervals

(Figure: intervals based on mean + standard deviation, on order statistics, on mean + standard deviation with re-scaling, the CI for the median, and the geometric mean, compared with the outlier present and with the outlier removed.)


Fairness Indices

  • Confidence intervals obtained by bootstrap

    • How?

  • The JFI is very dependent on one outlier

    • As expected, since the JFI is essentially the CoV, i.e. the standard deviation

  • The gap is sensitive too, but less so

    • It does not use squaring; why?



  • If the normal assumption (or, for CIs, the asymptotic regime) holds, the mean m and standard deviation s are more compact summaries

    • Two values give both CIs at all levels and prediction intervals

    • Derived indices: CoV, JFI

  • In contrast, a CI for the median does not give information on variability

  • Prediction interval based on order statistic is robust (and, IMHO, best)


Take-Home Message

  • Use methods that you understand

  • Mean and standard deviation make sense when data sets are not wild

    • Close to normal, or not heavy tailed and large data sample

  • Use quantiles and order statistics if you have the choice

  • Rescale


More Questions



  • We have tested a system for errors and found 0 in 37 runs. Give a confidence interval for the probability of error. Same question if we find 1 error in 37 runs.

  • We are given the results of performance measurements for two protocols, in the form of two series x1, …, xm and y1, …, yn, together with the means and sample standard deviations of the two series. How can we decide which protocol is better?