
Statistical Methods in Computer Science

Hypothesis Testing II: Single-Factor Experiments (Ido Dagan)


Presentation Transcript


  1. Hypothesis Testing II: Single-Factor Experiments Ido Dagan Statistical Methods in Computer Science

  2. Single-Factor Experiments A generalization of treatment experiments Determine the effect of the independent variable's values (nominal) Effect: on the dependent variable
  treatment1: Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
  treatment2: Ind2 & Ex1 & Ex2 & .... & Exn ==> Dep2
  control: Ex1 & Ex2 & .... & Exn ==> Dep3
  (Dep values are values of the dependent variable; Ind values are values of the independent variable) Compare performance of algorithm A to B to C .... Control condition: optional (e.g., to establish a baseline)


  4. Single-Factor Experiments: Definitions The independent variable is called the factor Its values (being tested) are called levels Our goal: Determine whether there is an effect of levels Null hypothesis: There is no effect Alternative hypothesis: At least one level causes an effect Tool: One-way ANOVA A simple special case of general Analysis of Variance

  5. The case for single-factor ANOVA (one-way ANOVA) We have k samples (k levels of the factor) Each with its own sample mean and sample std. deviation for the dependent variable value We want to determine whether (at least) one is different
  treatment1: Ind1 & Ex1 & Ex2 & .... & Exn ==> Dep1
  …
  treatmentk: Indk & Ex1 & Ex2 & .... & Exn ==> Depk
  control: Ex1 & Ex2 & .... & Exn ==> Depk+1
  (Dep values are values of the dependent variable; Ind values are values of the independent variable = levels of the factor) Cannot use the tests we learned: why?


  7. The case for single-factor ANOVA (one-way ANOVA) • We have k samples (k levels of the factor) • Each with its own sample mean and sample std. deviation • We want to determine whether (at least) one is different • H0: M1 = M2 = ... = Mk • H1: there exist i, j such that Mi ≠ Mj Why not use a t-test to compare every pair Mi, Mj?

  8. Multiple paired comparisons Let ac be the probability of an error in a single comparison (i.e., the probability of incorrectly rejecting that comparison's null hypothesis) 1 - ac: probability of making no error in a single comparison (1 - ac)^m: probability of no error in m comparisons (the experiment) ae = 1 - (1 - ac)^m: probability of at least one error in the experiment Under the assumption of independent comparisons, ae quickly becomes large as m increases

  9. Example Suppose we want to contrast 15 levels of the factor 15 groups, k = 15 Total number of pairwise comparisons (m): 15 × (15 - 1) / 2 = 105 Suppose ac = 0.05 Then ae = 1 - (1 - ac)^m = 1 - (1 - 0.05)^105 = 0.9954 We are very likely to make a type I error!

  10. Possible solutions? Reduce ac until the overall ae level is 0.05 (or as needed) Risk: the per-comparison alpha may become so small that significance is practically unobtainable Or: ignore the experiment null hypothesis and focus on the comparisons Carry out the m comparisons Expected # of errors in m comparisons: m × ac e.g., m = 105, ac = 0.05, expected # of errors = 5.25. But which ones?
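The first option amounts to inverting the family-wise formula: pick the per-comparison ac that keeps ae at the target level. A sketch under the slide's independence assumption (the function name is our own):

```python
# Per-comparison alpha that keeps the family-wise error rate at target_ae,
# assuming independent comparisons: invert ae = 1 - (1 - ac)^m.
def per_comparison_alpha(target_ae: float, m: int) -> float:
    return 1.0 - (1.0 - target_ae) ** (1.0 / m)

m = 105
a_c = per_comparison_alpha(0.05, m)
print(a_c)   # ~0.000488: each comparison needs a very strict threshold
```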

  11. One-way ANOVA A method for testing the experiment null hypothesis H0: all levels' population means are equal to each other Key idea: Estimate a variance B under the assumption that H0 is true Estimate a "real" variance W (regardless of H0) Use an F-test to test the hypothesis that B = W Assumes the variance of all groups is the same

  12. Some preliminaries Let xi,j be the jth element in sample i Let Mi be the sample mean of sample i Let Vi be the sample variance of sample i For example: x1,2 is the 2nd element of sample 1, and x3,4 is the 4th element of sample 3

  13. Some preliminaries Let xi,j be the jth element in sample i Let Mi be the sample mean of sample i Let Vi be the sample variance of sample i Let M be the grand sample mean (all elements, all samples) Let V be the grand sample variance

  14. The variance contributing to a value Every element xi,j can be re-written as: xi,j = M + ei,j where ei,j is some error component We can focus on the error component: ei,j = xi,j - M which we will rewrite as: ei,j = (xi,j - Mi) + (Mi - M)


  16. Within-group and between-group The re-written form of the error component has two parts: ei,j = (xi,j - Mi) + (Mi - M) Within-group component: variance w.r.t. the group mean Between-group component: variance w.r.t. the grand mean For example, in the table: x1,1 = 14.9, M1 = 14.86, M = 10.8 e1,1 = (14.9 - 14.86) + (14.86 - 10.8) = 0.04 + 4.06 = 4.1 Note the within-group and between-group components: most of the error (variance) is due to the between-group part! Can we use this in a more general fashion?
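The decomposition can be verified directly with the numbers from the example above:

```python
# Error decomposition e_ij = (x_ij - M_i) + (M_i - M), using the slide's numbers.
x_11 = 14.9    # the element
M_1 = 14.86    # its group (sample) mean
M = 10.8       # the grand mean

within = x_11 - M_1    # deviation from the group mean
between = M_1 - M      # deviation of the group mean from the grand mean
e_11 = x_11 - M        # total error component

print(round(within, 2), round(between, 2), round(e_11, 2))   # 0.04 4.06 4.1
```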

  17. No within-group variance No variance within group, in any element

  18. No between-group variance No variance between groups, in any group

  19. Comparing within-group and between-group components The error component of a single element is: ei,j = (xi,j - M) = (xi,j - Mi) + (Mi - M) Let us relate this to the sample and grand sums-of-squares. It can be shown that:
  Sumi Sumj (xi,j - M)^2 = Sumi Sumj (xi,j - Mi)^2 + Sumi Ni (Mi - M)^2
  (where Ni is the size of sample i) Let us rewrite this as:
  SStotal = SSwithin + SSbetween
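The sum-of-squares identity can be checked numerically on arbitrary data (a sketch with made-up numbers):

```python
# Verify SS_total = SS_within + SS_between on small made-up samples.
samples = [
    [14.9, 15.1, 14.6],   # group 1
    [9.8, 10.2, 10.1],    # group 2
    [7.5, 7.9, 7.2],      # group 3
]
all_x = [x for s in samples for x in s]
M = sum(all_x) / len(all_x)                      # grand mean
means = [sum(s) / len(s) for s in samples]       # group means

ss_total = sum((x - M) ** 2 for x in all_x)
ss_within = sum((x - Mi) ** 2 for s, Mi in zip(samples, means) for x in s)
ss_between = sum(len(s) * (Mi - M) ** 2 for s, Mi in zip(samples, means))

print(round(ss_total, 4), round(ss_within + ss_between, 4))  # equal
```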

  20. From Sums of Squares (SS) to variances We know:
  SStotal = SSwithin + SSbetween
  ... and convert to Mean Squares (as variance estimates) by dividing each SS by its degrees of freedom:
  MSwithin = SSwithin / dfwithin, where dfwithin = N - k
  MSbetween = SSbetween / dfbetween, where dfbetween = k - 1
  (k = # of levels (samples), N = total # of elements across all samples)

  24. Determining the final alpha level • MSwithin is an estimate of the (inherent) population variance • It does not depend on the null hypothesis (M1 = M2 = ... = Mk) • Intuition: it's an "average" of the variances in the individual groups • MSbetween estimates the population variance + the treatment effect • It does depend on the null hypothesis • Intuition: it's similar to an estimate of the variance of the sample means, where each component is multiplied by Ni • Recall: N · (variance of the sample mean) = population variance • If the null hypothesis is true, the two values estimate the same inherent variance, and should be equal up to sampling variation • So now we have two variance estimates for testing • Use the F-test • F = MSbetween / MSwithin • Compare to the F-distribution with (dfbetween, dfwithin) degrees of freedom • Determine the alpha level (significance)
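Putting the pieces together, a one-way ANOVA statistic can be computed "by hand" from the sums of squares (the data below are made up for illustration):

```python
# One-way ANOVA by hand: SS -> MS -> F, for k groups of made-up data.
samples = [
    [14.9, 15.1, 14.6, 14.8],
    [10.2, 9.8, 10.5, 10.1],
    [7.4, 7.9, 7.2, 7.6],
]
k = len(samples)                       # number of levels
N = sum(len(s) for s in samples)       # total number of elements
all_x = [x for s in samples for x in s]
M = sum(all_x) / N                     # grand mean
means = [sum(s) / len(s) for s in samples]

ss_between = sum(len(s) * (Mi - M) ** 2 for s, Mi in zip(samples, means))
ss_within = sum((x - Mi) ** 2 for s, Mi in zip(samples, means) for x in s)

ms_between = ss_between / (k - 1)      # df_between = k - 1
ms_within = ss_within / (N - k)        # df_within  = N - k
F = ms_between / ms_within
print(round(F, 2))   # a large F makes equal population means implausible
```

With SciPy available, `scipy.stats.f_oneway(*samples)` returns the same F statistic together with its p-value.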

  25. Example

  26. Example

  27. Example Check the F distribution with (2, 12) degrees of freedom: significant!

  28. Reading the results from statistics software You can use statistics software to run a one-way ANOVA It will give out something like this:
  Source    df    SS      MS     F       p
  between    2    173.3   86.7   32.97   p<0.001
  within    14     31.5    2.6
  total     16    204.9
  You should have no problem reading this now.
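The p column is the upper tail of the F distribution at the observed statistic; with SciPy (assumed available here) it is a single call:

```python
# p-value for F = 32.97 with (2, 14) degrees of freedom, as in the table above.
from scipy.stats import f

p = f.sf(32.97, 2, 14)   # survival function = P(F >= 32.97)
print(p < 0.001)         # consistent with the "p<0.001" entry
```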

  29. Analogy to linear regression • Analogous to linear regression, where: • the variance of the observations is composed of: • the variance of the predictions • plus the variance of the deviations from the corresponding predictions • that is, explained variance (according to the prediction) vs. unexplained variance (due to deviations from the prediction)

  30. Summary • Treatment and single-factor experiments • Independent variable: categorical • Dependent variable: "numerical" (ratio/interval) • Multiple comparisons: a problem for experiment hypotheses • Run one-way ANOVA instead • Assumes: • populations are normal • have equal variances • independent random samples (with replacement) • Moderate deviation from normality, particularly with large samples, is still fine • Somewhat different variances are fine for roughly equal sample sizes • If significant, run additional tests for details: • Tukey's procedure (T method) • Fisher's LSD • Scheffé's method • ...
