
# Classical Hypothesis Testing Theory - PowerPoint PPT Presentation

Presentation Transcript

### Classical Hypothesis Testing Theory

Alexander Senf

• 5 steps of classical hypothesis testing (Ch. 3)

• Declare null hypothesis H0 and alternate hypothesis H1

• Fix a threshold α for Type I error (1% or 5%)

• Type I error (α): reject H0 when it is true

• Type II error (β): accept H0 when it is false

• Determine a test statistic

• a quantity calculated from the data

• Determine what observed values of the test statistic should lead to rejection of H0

• Significance point K (determined by α)

• Test to see if observed data is more extreme than significance point K

• If it is, reject H0

• Otherwise, accept H0

• Simple Fixed-Sample-Size Tests

• Composite Fixed-Sample-Size Tests

• The -2 log λ Approximation

• The Analysis of Variance (ANOVA)

• Multivariate Methods

• ANOVA: the Repeated Measures Case

• Bootstrap Methods: the Two-sample t-test

• Sequential Analysis

• In the simplest case, everything is specified

• The probability distributions under H0 and H1

• Including all parameters

• α (and K)

• But: β is left unspecified

• It is desirable to have a procedure that minimizes β given a fixed α

• This would maximize the power of the test

• 1-β, the probability of rejecting H0 when H1 is true

Most Powerful Procedure

• Neyman-Pearson Lemma

• States that the likelihood-ratio (LR) test is the most powerful test for a given α

• The LR is defined as:

• where

• f0, f1 are completely specified density functions for H0,H1

• X1, X2, … Xn are iid random variables

• H0 is rejected when LR ≥ K

• With a constant K chosen such that:

P(LR ≥ K when H0 is true) = α
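The LR formula itself did not survive the slide transcription; from the definitions above (f0, f1 fully specified, X1, …, Xn iid), it takes the standard form:

```latex
\Lambda(x_1,\dots,x_n) \;=\; \frac{L(H_1)}{L(H_0)} \;=\; \frac{\prod_{i=1}^{n} f_1(x_i)}{\prod_{i=1}^{n} f_0(x_i)},
\qquad \text{reject } H_0 \text{ when } \Lambda \ge K .
```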

• Let’s look at an example using the Neyman-Pearson Lemma!

• Then we will prove it.

• Basketball players seem to be taller than average

• Use this observation to formulate our hypothesis H1:

• “Tallness is a factor in the recruitment of KU basketball players”

• The null hypothesis, H0, could be:

• “No, the players on KU’s team are just of average height compared to the population in the U.S.”

• “Average height of the team and the population in general is the same”

• Setup:

• Average height of males in the US: 5’9 ½“

• Average height of KU players in 2008: 6’4 ½”

• Assumption: both populations are normally distributed, centered on their respective averages (μ0 = 69.5 in, μ1 = 76.5 in), with σ = 2

• Sample size: 3

• Choose α: 5%

• The two populations: [plot of the two normal densities f0 (centered at μ0 = 69.5 in) and f1 (centered at μ1 = 76.5 in), probability density p against height (inches)]

• Our test statistic is the Likelihood Ratio, LR

• Now we need to determine a significance point K at which we can reject H0, given α = 5%

• P(Λ(x) ≥ K | H0 is true) = 0.05, determine K

• So we just need to solve for K’ and calculate K:

• How to solve this? Well, we only need one set of values to calculate K, so let’s pick two and solve for the third:

• We get one result: K3’=71.0803

• Then we can just plug it in to Λ and calculate K:

• With the significance point K = 1.663×10⁻⁷ we can now test our hypothesis based on observations:

• E.g.: Sasha = 83 in, Darrell = 81 in, Sherron = 71 in

• 1.446×10¹² > 1.663×10⁻⁷

• Therefore we reject H0 in favor of H1: tallness appears to be a factor in the recruitment of KU basketball players
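As a check on the example's arithmetic, here is a minimal Python sketch. The significance point below uses the fact that the LR is monotone in the sample mean, so its form differs from the slides' K′ computation, though the conclusion is the same:

```python
from math import sqrt
from statistics import NormalDist

# Setup from the slides
mu0, mu1, sigma, n, alpha = 69.5, 76.5, 2.0, 3, 0.05

def likelihood_ratio(xs):
    """Lambda = prod f1(x)/f0(x) for fully specified normal densities."""
    f0, f1 = NormalDist(mu0, sigma), NormalDist(mu1, sigma)
    lr = 1.0
    for x in xs:
        lr *= f1.pdf(x) / f0.pdf(x)
    return lr

# The LR is monotone increasing in the sample mean, so "LR >= K" is
# equivalent to "mean(xs) >= c", with c chosen so P(mean >= c | H0) = alpha:
c = mu0 + NormalDist().inv_cdf(1 - alpha) * sigma / sqrt(n)

heights = [83, 81, 71]            # Sasha, Darrell, Sherron
xbar = sum(heights) / n
print(likelihood_ratio(heights))  # ~1.45e12, matching the slides
print(c, xbar, xbar >= c)         # sample mean exceeds c, so H0 is rejected
```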

• Let A define the region in the joint range of X1, X2, …, Xn such that LR ≥ K. A is the critical region.

• If A is the only critical region of size α we are done

• Let’s assume another critical region of size α, defined by B

• H0 is rejected if the observed vector (x1, x2, …, xn) is in A or in B.

• Let A and B overlap in region C

• Power of the test: rejecting H0 when H1 is true

• The Power of this test using A is:

• Define: Δ = ∫_A L(H1) − ∫_B L(H1)

• The power of the test using A minus using B

• Where A\C is the set of points in A but not in C

• And B\C contains points in B but not in C

• So, in A\C we have:

• While in B\C we have:

Why?

• Thus

• Which implies that the power of the test using A is greater than or equal to the power using B.
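The inequalities the slides appeal to can be collected into one display. On A\C we have L(H1) ≥ K·L(H0) (these points lie in the critical region A), while on B\C we have L(H1) < K·L(H0) (these points lie outside A); since ∫_{A\C} L(H0) = α − ∫_C L(H0) = ∫_{B\C} L(H0),

```latex
\Delta \;=\; \int_{A\setminus C} L(H_1) - \int_{B\setminus C} L(H_1)
\;\ge\; K\!\int_{A\setminus C} L(H_0) - K\!\int_{B\setminus C} L(H_0) \;=\; 0 .
```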

• In most cases, random variables are not identically distributed, at least not in H1

• This affects the likelihood function, L

• For example, H1 in the two-sample t-test is:

• Where μ1 and μ2 are different

• Further, the hypotheses being tested do not specify all parameters

• They are composite

• This chapter only outlines aspects of composite test theory relevant to the material in this book.

• The set of values the parameters of interest can take

• Null hypothesis: parameters in some region ω

• Alternate hypothesis: parameters in Ω

• ω is usually a subspace of Ω

• Nested hypothesis case

• Null hypothesis nested within alternate hypothesis

• This book focuses on this case

• “if the alternate hypothesis can explain the data significantly better we can reject the null hypothesis”

λ Ratio

• Optimality theory for composite tests suggests this as desirable test statistic:

• Lmax(ω): maximum likelihood when parameters are confined to the region ω

• Lmax(Ω): maximum likelihood when parameters are confined to the region Ω, defined by H1

• H0 is rejected when λ is sufficiently small (→ Type I error)
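The statistic referred to above, reconstructed from the definitions of Lmax(ω) and Lmax(Ω):

```latex
\lambda \;=\; \frac{L_{\max}(\omega)}{L_{\max}(\Omega)}, \qquad 0 \le \lambda \le 1,
```

where λ ≤ 1 because ω is nested inside Ω, so the constrained maximum can never exceed the unconstrained one.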

Example: t-tests

• The next slides calculate the λ-ratio for the two sample t-test (with the likelihood)

• t-tests later generalize to ANOVA and T² tests

Equal Variance Two-Sided t-test

• Setup

• Random variables X11, …, X1m in group 1 are Normally and Independently Distributed, NID(μ1, σ²)

• Random variables X21, …, X2n in group 2 are NID(μ2, σ²)

• X1i and X2j are independent for all i and j

• Null hypothesis H0: μ1= μ2 (= μ, unspecified)

• Alternate hypothesis H1: both unspecified

Equal Variance Two-Sided t-test

• Setup (continued)

• σ² is unknown and unspecified in H0 and H1

• Is assumed to be the same in both distributions

• Region ω is: {μ1 = μ2 = μ, −∞ < μ < +∞, 0 < σ² < +∞}

• Region Ω is: {−∞ < μ1, μ2 < +∞, 0 < σ² < +∞}

Equal Variance Two-Sided t-test

• Derivation

• H0: writing μ for the common mean when μ1 = μ2, the maximum of the likelihood over ω is at the pooled mean μ̂ = (m·x̄₁ + n·x̄₂)/(m + n)

• And the (common) variance estimate is σ̂² = [Σᵢ(x₁ᵢ − μ̂)² + Σⱼ(x₂ⱼ − μ̂)²]/(m + n)

Equal Variance Two-Sided t-test

• Inserting both into the likelihood function, L

Equal Variance Two-Sided t-test

• Do the same thing for region Ω

• Which produces this likelihood Function, L

Equal Variance Two-Sided t-test

• The test statistic λ is then

It’s the same function, just

With different variances

Equal Variance Two-Sided t-test

• We can then use the algebraic identity

• To show that

• Where t is (from Ch. 3)

Equal Variance Two-Sided t-test

• t is the observed value of T

• S is defined in Ch. 3 as
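The displays are missing from the transcript; in the book's standard notation, S and the t statistic are

```latex
S^2 = \frac{\sum_{i=1}^{m}(x_{1i}-\bar{x}_1)^2 + \sum_{j=1}^{n}(x_{2j}-\bar{x}_2)^2}{m+n-2},
\qquad
t = \frac{\bar{x}_1-\bar{x}_2}{S\sqrt{\tfrac{1}{m}+\tfrac{1}{n}}},
```

and the λ-ratio reduces to a monotone decreasing function of t²:

```latex
\lambda = \left(1 + \frac{t^2}{m+n-2}\right)^{-(m+n)/2}.
```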

• We can plot λ as a function of t (e.g. for m + n = 10): [plot of λ against t, a symmetric curve with its maximum at t = 0, decreasing as |t| grows]

Equal Variance Two-Sided t-test

• So, by the monotonicity argument, we can use t2 or |t| instead of λ as test statistic

• Small values of λ correspond to large values of |t|

• Sufficiently large |t| lead to rejection of H0

• The H0 distribution of t is known

• t-distribution with m+n-2 degrees of freedom

• Significance points are widely available

• Once α has been chosen, values of |t| sufficiently large to reject H0 can be determined

Equal Variance Two-Sided t-test

http://www.socr.ucla.edu/Applets.dir/T-table.html

Equal Variance One-Sided t-test

• Similar to Two-Sided t-test case

• Different region Ω for H1:

• Means μ1 and μ2 are not simply different; rather, one is at least as large as the other: μ1 ≥ μ2

• If x̄₁ ≥ x̄₂, then the maximum likelihood estimates are the same as for the two-sided case

Equal Variance One-Sided t-test

• If x̄₁ < x̄₂, then the unconstrained maximum of the likelihood lies outside of Ω

• The unique unconstrained maximum is at μ̂₁ = x̄₁, μ̂₂ = x̄₂, implying that the maximum over Ω occurs at a boundary point of Ω

• At this point estimates of μ1 and μ2 are equal

• At this point the likelihood ratio is 1 and H0 is not rejected

• Result: H0 is rejected in favor of H1 (μ1 ≥ μ2) only for sufficiently large positive values of t

• This scenario fits with our original example:

• H1 is that the average height of KU basketball players is bigger than for the general population

• One-sided test

• We could assume that we don’t know the averages for H0 and H1

• We actually don’t know σ (I just guessed 2 in the original example)

• Updated example:

• Observation in group 1 (KU): X1 = {83, 81, 71}

• Observation in group 2: X2 = {65, 72, 70}

• Pick significance point for t from a table: tα = 2.132

• t-distribution, m+n-2 = 4 degrees of freedom, α = 0.05

• Calculate t with our observations

• t > tα, so we can reject H0!
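The updated example can be checked with a short sketch of the pooled two-sample t statistic (group contents as given on the slides):

```python
from math import sqrt

def pooled_t(g1, g2):
    """Equal-variance two-sample t statistic."""
    m, n = len(g1), len(g2)
    x1, x2 = sum(g1) / m, sum(g2) / n
    ss = sum((x - x1) ** 2 for x in g1) + sum((x - x2) ** 2 for x in g2)
    s = sqrt(ss / (m + n - 2))                 # pooled standard deviation S
    return (x1 - x2) / (s * sqrt(1 / m + 1 / n))

t = pooled_t([83, 81, 71], [65, 72, 70])
print(t)   # ~2.19 > 2.132, so H0 is rejected at alpha = 0.05 (one-sided, df = 4)
```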

• Problems that might arise in other cases

• The λ-ratio might not reduce to a function of a well-known test statistic, such as t

• There might not be a unique H0 distribution of λ

• Fortunately, the t statistic is a pivotal quantity

• Independent of the parameters not prescribed by H0

• e.g. μ, σ

• For many testing procedures this property does not hold

• Identical to Equal Variance Two-Sided t-test

• Except: variances in group 1 and group 2 are no longer assumed to be identical

• Group 1: NID(μ1, σ₁²)

• Group 2: NID(μ2, σ₂²)

• With σ₁² and σ₂² unknown and not assumed identical

• Region ω = {μ1 = μ2, 0 < σ₁², σ₂² < +∞}

• Ω places no constraints on the values of μ1, μ2, σ₁², and σ₂²

• The likelihood function of (X11, X12, …, X1m, X21, X22, …, X2n) then becomes

• Under H0 (μ1 = μ2 = μ), this becomes:

• Maximum likelihood estimates μ̂, σ̂₁², and σ̂₂² satisfy simultaneous equations

• Eliminating the variances reduces these to a cubic equation in μ̂

• Neither the λ ratio, nor any monotonic function has a known probability distribution when H0 is true!

• This does not lead to any useful testing statistic

• The t statistic may be used as a reasonably close approximation

• However, its H0 distribution is still unknown, as it depends on the unknown ratio σ₁²/σ₂²

• In practice, a heuristic is often used (see Ch. 3.5)
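One widely used heuristic is Welch's approximate t with Satterthwaite degrees of freedom. The slides do not name the Ch. 3.5 heuristic, so treating it as Welch's procedure is an assumption; the sketch below shows the idea:

```python
from math import sqrt

def welch_t(g1, g2):
    """Welch's t statistic and Satterthwaite approximate degrees of freedom."""
    m, n = len(g1), len(g2)
    x1, x2 = sum(g1) / m, sum(g2) / n
    v1 = sum((x - x1) ** 2 for x in g1) / (m - 1)   # sample variance, group 1
    v2 = sum((x - x2) ** 2 for x in g2) / (n - 1)   # sample variance, group 2
    se2 = v1 / m + v2 / n                           # squared standard error
    t = (x1 - x2) / sqrt(se2)
    # Satterthwaite's approximation for the degrees of freedom
    df = se2 ** 2 / ((v1 / m) ** 2 / (m - 1) + (v2 / n) ** 2 / (n - 1))
    return t, df

t, df = welch_t([83, 81, 71], [65, 72, 70])
print(t, df)   # t is compared against a t-distribution with ~df degrees of freedom
```

The resulting t is referred to a t-distribution with the (generally non-integer) df, which sidesteps the unknown variance ratio at the cost of being approximate.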

The -2 log λ Approximation

The -2 log λ Approximation

• Used when the λ-ratio procedure does not lead to a test statistic whose H0 distribution is known

• Example: Unequal Variance Two-Sided t-test

• Various approximations can be used

• But only if certain regularity assumptions and restrictions hold true

The -2 log λ Approximation

• Best known approximation:

• If H0 is true, -2 log λ has an asymptotic chi-square distribution,

• with degrees of freedom equal to the difference between the numbers of parameters left unspecified by H1 and by H0, respectively

• λ is the likelihood ratio

• “asymptotic” = “as the sample size → ∞”

• Provides an asymptotically valid testing procedure
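Formally, writing p₁ and p₀ for the numbers of free parameters under H1 and H0:

```latex
-2\log\lambda \;\xrightarrow{\;d\;}\; \chi^2_{\,p_1 - p_0}
\qquad \text{as the sample size} \to \infty \text{, when } H_0 \text{ is true.}
```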

The -2 log λ Approximation

• Restrictions:

• Parameters must be real numbers that can take on values in some interval

• The maximum likelihood estimator is found at a turning point of the function

• i.e. a “real” maximum, not at a boundary point

• H0 is nested in H1 (as in all previous slides)

• These restrictions are important in the proof

• I skip the proof…

The -2 log λ Approximation

• Our original basketball example, revised again:

• Let’s drop our last assumption, that the variance in the population at large is the same as in the group of KU basketball players.

• All we have left now are our observations and the hypothesis that μ1 > μ2

• Where μ1 is the average height of Basketball players

• Observation in group 1 (KU): X1 = {83, 81, 71}

• Observation in group 2: X2 = {65, 72, 70}

• Using the Unequal Variance One-Sided t-Test

• We get:

• Probably the most frequently used hypothesis testing procedure in statistics

• This section

• Derives the sums of squares

• Gives an outline of the ANOVA procedure

• Introduces one-way ANOVA as a generalization of the two-sample t-test

• Two-way and multi-way ANOVA

• Further generalizations of ANOVA

• New variables (from Ch. 3)

• The two-sample t-test tests for equality of the means of two groups.

• We could express the observations as:

• Where the Eij are assumed to be NID(0, σ²)

• H0 is μ1 = μ2

• This can also be written as:

• μ could be seen as overall mean

• αj as deviation from μ in group j

• This model is overparameterized

• Uses more parameters than necessary

• Necessitates the requirement Σⱼ nⱼαⱼ = 0

• (always assumed imposed)

• We are deriving a test procedure similar to the two-sample two-sided t-test

• Using |t| as test statistic

• Absolute value of the T statistic

• This is equivalent to using t2

• Because it’s a monotonic function of |t|

• The square of the t statistic (from Ch. 3)

• …can, after algebraic manipulations, be written as F

• where

• B: between (among) group sum of squares

• W: within group sum of squares

• B + W: total sum of squares

• Can be shown to be:

• Total number of degrees of freedom: m + n – 1

• Between groups: 1

• Within groups: m + n - 2

• This gives us the F statistic
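Writing out B, W, and F for the two-group case (standard forms; the slides' own displays were images), consistent with the degrees of freedom above:

```latex
B = \frac{mn}{m+n}\left(\bar{x}_1-\bar{x}_2\right)^2, \qquad
W = \sum_{i=1}^{m}\left(x_{1i}-\bar{x}_1\right)^2+\sum_{j=1}^{n}\left(x_{2j}-\bar{x}_2\right)^2, \qquad
F = \frac{B/1}{W/(m+n-2)} = t^2 .
```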

• Our goal is to test the significance of the difference between the means of two groups

• B measures the difference

• The difference must be measured relative to the variance within the groups

• W measures that

• The larger F is, the more significant the difference

• Subdivide observed total sum of squares into several components

• In our case, B and W

• Pick appropriate significance point for a chosen Type I error α from an F table

• Compare the observed components to test our hypothesis

F-Statistic

• Significance points depend on degrees of freedom in B and W

• In our case, 1 and (m + n – 2)

http://www.ento.vt.edu/~sharov/PopEcol/tables/f005.html

• The two-group case readily generalizes to any number of groups.

• ANOVAs can be classified in various ways, e.g.

• fixed effects models

• mixed effects models

• random effects model

• Difference is discussed later

• For now we consider fixed effect models

• Parameter αi is fixed, but unknown, in group i

• Terminology

• Although ANOVA contains the word ‘variance’

• What we actually test for is equality of the means across groups

• The different mean assumptions affect the variance, though

• ANOVAs are special cases of regression models from Ch. 8

• One-Way fixed-effect ANOVA

• Setup and derivation

• Like two-sample t-test for g number of groups

• Observations (ni observations, i=1,2,…,g)

• Using overparameterized model for X

• Eij assumed NID(0, σ²), Σᵢnᵢαᵢ = 0, αi fixed in group i

• Null Hypothesis H0 is: α1 = α2 = … = αg = 0

• Total sum of squares is

• This is subdivided into B and W

• with

• Total degrees of freedom: N – 1

• Subdivided into dfB = g – 1 and dfW = N - g

• This gives us our test statistic F

• We can now look in the F-table for these degrees of freedom to pick significance points for B and W

• And calculate B and W from the observed data

• And accept or reject H0
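The sums of squares and the F statistic, reconstructed in the notation above (x̄ᵢ the mean of group i, x̄ the overall mean, N = Σnᵢ):

```latex
B = \sum_{i=1}^{g} n_i\left(\bar{x}_{i}-\bar{x}\right)^2, \qquad
W = \sum_{i=1}^{g}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_{i}\right)^2, \qquad
F = \frac{B/(g-1)}{W/(N-g)} .
```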

• Looking at it as a One-Way ANOVA analysis

• Observation in group 1 (KU): X1 = {83, 81, 71}

• Observation in group 2: X2 = {65, 72, 70}

• Total Sum of Squares:

• B (between groups sum of squares)

• W (within groups sum of squares)

• Degrees of freedom

• Total: N-1 = 5

• dfB = g – 1 = 2 - 1 = 1

• dfW = N – g = 6 – 2 = 4

• Table lookup for df 1 and 4 and α= 0.05:

• Critical value: F = 7.71

• Calculate F from our data:

• So… 4.806 < 7.71

• With ANOVA we actually accept H0!

• The discrepancy with the earlier one-sided t-test arises because this F test is two-sided: its α = 0.05 critical value 7.71 equals 2.776², the square of the two-sided t significance point, not 2.132²
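The same numbers can be checked with a short one-way ANOVA sketch (note that F ≈ t² from the earlier t-test, as expected with two groups):

```python
def one_way_anova(groups):
    """Between/within sums of squares and F for a one-way fixed-effects ANOVA."""
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N          # overall mean
    means = [sum(g) / len(g) for g in groups]        # group means
    B = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    W = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = len(groups) - 1, N - len(groups)
    F = (B / df_b) / (W / df_w)
    return B, W, F

B, W, F = one_way_anova([[83, 81, 71], [65, 72, 70]])
print(round(F, 2))   # ~4.81 < 7.71, so H0 is accepted here
```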

• Screenshots (omitted): common statistics software offers most of these tests built-in

• Two-Way Fixed Effects ANOVA

• Overview only (in the scope of this book)

• More complicated setup; example:

• Expression levels of one gene in lung cancer patients

• a different risk classes

• E.g.: ultrahigh, very high, intermediate, low

• b different age groups

• n individuals for each risk/age combination

• Expression levels (our observations): Xijk

• i is the risk class (i = 1, 2, …, a)

• j indicates the age group (j = 1, 2, …, b)

• k corresponds to the individual in each group (k = 1, …, n)

• Each group is a possible risk/age combination

• The number of individuals in each group is the same, n

• This is a “balanced” design

• Theory for unbalanced designs is more complicated and not covered in this book

• The Xijk can be arranged in a table:

[Two-way table: rows indexed by risk category i, columns by age group j; each risk/age cell holds its n individuals]

• The model adopted for each Xijk is

• Where the Eijk are NID(0, σ²)

• The mean of Xijk is μ + αi + βj + δij

• αi is a fixed parameter, additive for risk class i

• βj is a fixed parameter, additive for age group j

• δij is a fixed risk/age interaction parameter

• δij should be included if a possible risk/age interaction exists

• These constraints are imposed

• Σiαi = Σjβj = 0

• Σiδij = 0 for all j

• Σjδij = 0 for all i
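Putting the model and its constraints together (reconstructed; the slides' display was an image):

```latex
X_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + E_{ijk},
\qquad E_{ijk} \sim \mathrm{NID}(0,\sigma^2),
```

```latex
\sum_i \alpha_i = \sum_j \beta_j = 0, \qquad
\sum_i \delta_{ij} = 0 \;\;\forall j, \qquad
\sum_j \delta_{ij} = 0 \;\;\forall i .
```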

• The total sum of squares is then subdivided into four groups:

• Risk class sum of squares

• Age group sum of squares

• Interaction sum of squares

• Within cells (“residual” or “error”) sum of squares

• Associated with each sum of squares

• Corresponding degrees of freedom

• Hence also a corresponding mean square

• Sum of squares divided by degrees of freedom

• The mean squares are then compared using F ratios to test for significance of various effects

• First – test for a significant risk/age interaction

• F-ratio used is ratio of interaction mean square and within-cells mean square

• If such an interaction is found, it may not be reasonable to test for overall risk or age differences

• Example, μ in two risk classes, two age groups:

• No evidence of interaction

• Example of interaction

[Plots: mean μ against age group for each risk class; parallel lines indicate no interaction, crossing or diverging lines indicate interaction]

• One-way and two-way fixed effects ANOVAs can be extended to multi-way ANOVAs

• Gets complicated

• Example: three-way ANOVA model:

• The 2^m factorial design

• A particular form of the one-way ANOVA

• Interactions between main effects

• m “factors” taken at two “levels”

• E.g. (1) Gender, (2) Tissue (lung, kidney), and (3) status (affected, not affected)

• 2^m possible combinations of levels/groups

• Can test for main effects and interactions

• Need replicated experiments

• n replications for each of the 2^m experiments

• Example, m = 3, denoted by A, B, C

• 8 groups, {abc, ab, ac, bc, a, b, c, 1}

• Write totals of n observations Tabc, Tab, …, T1

• The total between sum of squares can be subdivided into seven individual sums of squares

• Three main effects (A, B, C)

• Three pairwise interactions (AB, AC, BC)

• One triple-wise interaction (ABC)

• Example: Sum of squares for A, and for BC, respectively
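A sketch of those two sums of squares, using the usual contrast convention for a 2³ design with n replicates (a group letter appears in a total's subscript when that factor is at its high level):

```latex
SS_A = \frac{\left(T_{abc}+T_{ab}+T_{ac}+T_{a}-T_{bc}-T_{b}-T_{c}-T_{1}\right)^2}{8n},
```

```latex
SS_{BC} = \frac{\left(T_{abc}-T_{ab}-T_{ac}+T_{a}+T_{bc}-T_{b}-T_{c}+T_{1}\right)^2}{8n},
```

where a total enters SS_BC with a plus sign when B and C are at the same level.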

• If m ≥ 5 the number of groups becomes large

• Then the total number of observations, n·2^m, is large

• It is possible to reduce the number of observations by a process …

• Confounding

• Interaction ABC probably very small and not interesting

• So, prefer a model without ABC, reduce data

• There are ANOVA designs for that

• Fractional Replication

• Related to confounding

• Sometimes two effects cannot be distinguished from each other; they are then called aliases

• E.g. A and BC

• This reduces the number of experiments and the amount of data needed

• So far: fixed effect models

• E.g. Risk class, age group fixed in previous example

• Multiple experiments would use same categories

• But: what if we took experimental data on several random days?

• The days in themselves have no meaning, but a “between days” sum of squares must be extracted

• What if the days turn out to be important?

• If we fail to test for it, the significance of our procedure is diminished.

• Days are a random category, unlike risk and age!

• Mixed Effect Models

• If some categories are fixed and some are random

• Symbols used:

• Greek letters for fixed effects

• Uppercase Roman letters for random effects

• Example: two-way mixed effect model with

• a risk classes and d days, and n values collected each day, the appropriate model is written:
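A plausible reconstruction of that display, following the convention above (Greek for the fixed risk-class effect, Roman for the random day effect):

```latex
X_{ijk} = \mu + \alpha_i + D_j + (\alpha D)_{ij} + E_{ijk},
\qquad D_j \sim \mathrm{NID}(0,\sigma_D^2),\quad E_{ijk} \sim \mathrm{NID}(0,\sigma^2),
```

with i = 1, …, a, j = 1, …, d, and k = 1, …, n.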

• Random effects models have no fixed categories

• The details on the ANOVA analysis depend on which effects are random and which are fixed

• In a microarray context (more in Ch. 13)

• There tend to be several fixed and several random effects, which complicates the analysis

• Many interactions are simply assumed to be zero

ANOVA: the Repeated Measures Case

Bootstrap Methods: the Two-sample t-test

All skipped …

• Sequential Probability Ratio

• Sample size not known in advance

• Depends on outcomes of successive observations

• Some of this theory is in BLAST

• Basic Local Alignment Search Tool

• The book focuses on discrete random variables

• Consider:

• Random variable Y with distribution P(y;ξ)

• Tests usually relate to the value of parameter ξ

• H0: ξ is ξ0

• H1: ξ is ξ1

• We can choose a value for the Type I error α

• And a value for the Type II error β

• Sampling then continues while

• A and B are chosen to correspond to an α and β

• Sampling continues until the ratio is less than A (accept H0) or greater than B (reject H0)

• Because these are discrete variables, boundary overshoot usually occurs

• We don’t expect to exactly get values α and β

• Desired values for α and β approximately achieved by using

• It is also convenient to take logarithms, which gives us:

• Using

• We can write
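The displays missing above are Wald's: sampling continues while

```latex
A \;<\; \prod_{i=1}^{n} \frac{P(y_i;\xi_1)}{P(y_i;\xi_0)} \;<\; B,
\qquad A \approx \frac{\beta}{1-\alpha}, \quad B \approx \frac{1-\beta}{\alpha},
```

and taking logarithms,

```latex
\log A \;<\; \sum_{i=1}^{n} S(y_i) \;<\; \log B,
\qquad S(y) = \log\frac{P(y;\xi_1)}{P(y;\xi_0)} .
```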

• Example: sequence matching

• H0: p0 = 0.25 (probability of a match is 0.25)

• H1: p1 = 0.35 (probability of a match is 0.35)

• Type I error α and Type II error β chosen 0.01

• Yi: 1 if there is a match at position i, otherwise 0

• Sampling continues while

• with

• S can be seen as the support offered by Yi for H1

• The inequality can be re-written as

• This is actually a random walk with step sizes 0.7016 for a match and -0.2984 for a mismatch
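A runnable sketch of the sequential test for the matching example, using the raw log-likelihood steps log(0.35/0.25) ≈ 0.336 per match and log(0.65/0.75) ≈ −0.143 per mismatch (the slides' 0.7016/−0.2984 step sizes are these values rescaled so that the two steps differ by exactly 1):

```python
from math import log

def sprt(ys, p0=0.25, p1=0.35, alpha=0.01, beta=0.01):
    """Wald's SPRT for a Bernoulli parameter; ys are 0/1 match indicators.
    Returns ("H0" | "H1" | "undecided", number of observations used)."""
    log_a = log(beta / (1 - alpha))          # lower boundary: accept H0
    log_b = log((1 - beta) / alpha)          # upper boundary: reject H0
    s_match, s_miss = log(p1 / p0), log((1 - p1) / (1 - p0))
    total = 0.0
    for i, y in enumerate(ys, start=1):
        total += s_match if y else s_miss
        if total <= log_a:
            return "H0", i
        if total >= log_b:
            return "H1", i
    return "undecided", len(ys)

print(sprt([1] * 20))   # all matches: H1 is accepted quickly
print(sprt([0] * 40))   # all mismatches: H0 is accepted
```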

• Power Function for a Sequential Test

• Suppose the true value of the parameter of interest is ξ

• We wish to know the probability that H1 is accepted, given ξ

• This probability is the power Ρ(ξ) of the test

• Where θ* is the unique non-zero solution to θ in

• R is the range of values of Y

• Equivalently, θ* is the unique non-zero solution to θ in

• Where S is defined as before
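The defining equation for θ*, reconstructed from the text (R the range of Y, S the log-likelihood-ratio step defined earlier):

```latex
\sum_{y \in R} P(y;\xi)\, e^{\theta^* S(y)} \;=\; 1,
\qquad \text{i.e.} \quad \mathrm{E}_\xi\!\left[e^{\theta^* S(Y)}\right] = 1,
```

and, ignoring boundary overshoot, the power is then

```latex
P(\xi) \;\approx\; \frac{1 - A^{\theta^*}}{B^{\theta^*} - A^{\theta^*}} .
```

As a sanity check, θ* = 1 at ξ = ξ0 gives P ≈ α, and θ* = −1 at ξ = ξ1 gives P ≈ 1 − β.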

• This is very similar to Ch. 7 – Random Walks

• The parameter θ* is the same as in Ch. 7

• And it will be the same in Ch 10 – BLAST

• < skipping the random walk part >

• Mean Sample Size

• The (random) number of observations until one or the other hypothesis is accepted

• Find approximation by ignoring boundary overshoot

• Essentially identical method used to find the mean number of steps until the random walk stops

• Two expressions are calculated for ΣiS1,0(Yi)

• One involves the mean sample size

• By equating both expressions, solve for mean sample size

• So, the mean sample size is:
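Reconstructed from equating the two expressions (Wald's identity, ignoring boundary overshoot):

```latex
\mathrm{E}_\xi[N] \;\approx\; \frac{P(\xi)\,\log B + \bigl(1-P(\xi)\bigr)\log A}{\mathrm{E}_\xi\!\left[S(Y)\right]} .
```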

• Both numerator and denominator depend on Ρ(ξ), and so also on θ*

• A generalization applies if Y has a distribution Q(y) different from those specified by H0 and H1 – relevant to BLAST

• Example

• Same sequence matching example as before

• H0: p0 = 0.25 (probability of a match is 0.25)

• H1: p1 = 0.35 (probability of a match is 0.35)

• Type I error α and Type II error β chosen 0.01

• Mean sample size equation is:

• Mean sample size when H0 is true: 194

• Mean sample size when H1 is true: 182

• Boundary Overshoot

• So far we assumed no boundary overshoot

• In practice, it will almost always occur, though

• Exact Type I and Type II errors different from α and β

• Random walk theory can be used to assess how significant the effects of boundary overshoot are

• It can be shown that the sum of Type I and Type II errors is always less than α + β (also individually)

• BLAST deals with this in a novel way -> see Ch. 10