Classical Hypothesis Testing Theory

### Classical Hypothesis Testing Theory

Alexander Senf

Review

- 5 steps of classical hypothesis testing (Ch. 3)
- Declare null hypothesis H0 and alternate hypothesis H1
- Fix a threshold α for Type I error (1% or 5%)
- Type I error (α): reject H0 when it is true
- Type II error (β): accept H0 when it is false

- Determine a test statistic
- a quantity calculated from the data

Review

- Determine what observed values of the test statistic should lead to rejection of H0
- Significance point K (determined by α)

- Test to see if observed data is more extreme than significance point K
- If it is, reject H0
- Otherwise, accept H0

Overview of Ch. 9

- Simple Fixed-Sample-Size Tests
- Composite Fixed-Sample-Size Tests
- The -2 log λ Approximation
- The Analysis of Variance (ANOVA)
- Multivariate Methods
- ANOVA: the Repeated Measures Case
- Bootstrap Methods: the Two-sample t-test
- Sequential Analysis

The Issue

- In the simplest case, everything is specified
- Probability distributions under H0 and H1
- Including all parameters

- α (and K)
- But: β is left unspecified

- It is desirable to have a procedure that minimizes β given a fixed α
- This would maximize the power of the test
- 1-β, the probability of rejecting H0 when H1 is true

Most Powerful Procedure

- Neyman-Pearson Lemma
- States that the likelihood-ratio (LR) test is the most powerful test for a given α
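- The LR is defined as:
$$\mathrm{LR} = \frac{f_1(X_1)\, f_1(X_2) \cdots f_1(X_n)}{f_0(X_1)\, f_0(X_2) \cdots f_0(X_n)}$$
- where
- f0, f1 are completely specified density functions for H0, H1
- X1, X2, …, Xn are iid random variables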

Neyman-Pearson Lemma

- H0 is rejected when LR ≥ K
- With a constant K chosen such that:
P(LR ≥ K when H0 is true) = α

- Let’s look at an example using the Neyman-Pearson Lemma!
- Then we will prove it.

Example

- Basketball players seem to be taller than average
- Use this observation to formulate our hypothesis H1:
- “Tallness is a factor in the recruitment of KU basketball players”

- The null hypothesis, H0, could be:
- “No, the players on KU’s team are just of average height compared to the population in the U.S.”
- “Average height of the team and the population in general is the same”

Example

- Setup:
- Average height of males in the US: 5’9½” (69.5 in)
- Average height of KU players in 2008: 6’4½” (76.5 in)
- Assumption: both populations are normally distributed, centered on their respective averages (μ0 = 69.5 in, μ1 = 76.5 in), with common standard deviation σ = 2 in
- Sample size: 3

- Choose α: 5%

Example

- Our test statistic is the Likelihood Ratio, LR
- Now we need to determine a significance point K at which we can reject H0, given α = 5%
- P(Λ(x) ≥ K | H0 is true) = 0.05, determine K

Example

- So we just need to solve for K’ and calculate K:
- How to solve this? Well, we only need one set of values to calculate K, so let’s pick two and solve for the third:
- We get one result: K3’=71.0803

Example

- Then we can just plug it in to Λ and calculate K:

Example

- With the significance point $K = 1.663 \times 10^{-7}$ we can now test our hypothesis based on observations:
- E.g.: Sasha = 83 in, Darrell = 81 in, Sherron = 71 in
- $\mathrm{LR} = 1.446 \times 10^{12} > 1.663 \times 10^{-7} = K$
- Therefore, we reject H0 at the 5% level in favor of H1: the data support the hypothesis that tallness is a factor in the recruitment of KU basketball players.
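
The likelihood ratio for the three observed heights can be checked numerically. The following is a minimal sketch, assuming the normal model of the setup (μ0 = 69.5, μ1 = 76.5, σ = 2) and taking the significance point K as quoted on the slide; the function names are illustrative only.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_ratio(xs, mu0, mu1, sigma):
    """LR = prod f1(x_i) / prod f0(x_i) for iid normal observations."""
    lr = 1.0
    for x in xs:
        lr *= normal_pdf(x, mu1, sigma) / normal_pdf(x, mu0, sigma)
    return lr

heights = [83.0, 81.0, 71.0]   # Sasha, Darrell, Sherron (inches)
lr = likelihood_ratio(heights, mu0=69.5, mu1=76.5, sigma=2.0)
K = 1.663e-7                   # significance point quoted on the slide
reject_h0 = lr >= K
print(f"LR = {lr:.3e}, reject H0: {reject_h0}")
```

For these data the exponent of the ratio works out to exactly 28, so LR = e^28, which matches the value of about 1.446 × 10^12 given above.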

Neyman-Pearson Proof

- Let A be the region in the joint range of X1, X2, …, Xn in which LR ≥ K. A is the critical region.
- If A is the only critical region of size α we are done
- Let’s assume there is another critical region of size α, defined by B

Proof

- H0 is rejected if the observed vector (x1, x2, …, xn) is in A or in B.
- Let A and B overlap in region C
- Power of the test: the probability of rejecting H0 when H1 is true
- The power of this test using A is:
$$\int_A L(H_1)$$

Proof

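- Define: $\Delta = \int_A L(H_1) - \int_B L(H_1)$
- The power of the test using A minus the power using B
- The integrals over the overlap C cancel, so
$$\Delta = \int_{A \setminus C} L(H_1) - \int_{B \setminus C} L(H_1)$$
- Where A\C is the set of points in A but not in C
- And B\C contains points in B but not in C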

Proof

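- In A\C we have L(H1) ≥ K·L(H0), while in B\C (which lies outside A) we have L(H1) < K·L(H0)
- Thus
$$\Delta \ge K \left[ \int_{A \setminus C} L(H_0) - \int_{B \setminus C} L(H_0) \right] = K \left[ \int_A L(H_0) - \int_B L(H_0) \right] = K(\alpha - \alpha) = 0$$
- Which implies that the power of the test using A is greater than or equal to the power using B.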

Not Identically Distributed

- In most cases, random variables are not identically distributed, at least not in H1
- This affects the likelihood function, L
- For example, H1 in the two-sample t-test is:
- Where μ1 and μ2 are different

Composite

- Further, the hypotheses being tested do not specify all parameters
- They are composite
- This chapter only outlines aspects of composite test theory relevant to the material in this book.

Parameter Spaces

- The set of values the parameters of interest can take
- Null hypothesis: parameters in some region ω
- Alternate hypothesis: parameters in Ω
- ω is usually a subspace of Ω
- Nested hypothesis case
- Null hypothesis nested within alternate hypothesis
- This book focuses on this case

- “if the alternate hypothesis can explain the data significantly better we can reject the null hypothesis”


λ Ratio

- Optimality theory for composite tests suggests this as a desirable test statistic:
$$\lambda = \frac{L_{\max}(\omega)}{L_{\max}(\Omega)}$$
- Lmax(ω): maximum likelihood when parameters are confined to the region ω, defined by H0
- Lmax(Ω): maximum likelihood when parameters are confined to the region Ω, defined by H1
- H0 is rejected when λ is sufficiently small (how small is determined by the Type I error α)

Example: t-tests

- The next slides calculate the λ-ratio for the two sample t-test (with the likelihood)
- t-tests later generalize to ANOVA and T² tests

Equal Variance Two-Sided t-test

- Setup
- Random variables X11,…,X1m in group 1 are Normally and Independently Distributed, NID(μ1, σ²)
- Random variables X21,…,X2n in group 2 are NID(μ2, σ²)
- X1i and X2j are independent for all i and j
- Null hypothesis H0: μ1 = μ2 (= μ, unspecified)
- Alternate hypothesis H1: μ1 and μ2 both unspecified

Equal Variance Two-Sided t-test

- Setup (continued)
- σ² is unknown and unspecified in H0 and H1
- Is assumed to be the same in both distributions

- Region ω is: {μ1 = μ2, 0 < σ² < +∞}
- Region Ω is: {−∞ < μ1 < +∞, −∞ < μ2 < +∞, 0 < σ² < +∞}

Equal Variance Two-Sided t-test

- Derivation
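- H0: writing μ for the common mean when μ1 = μ2, the maximum of the likelihood over ω is at
$$\hat{\mu} = \bar{X} = \frac{X_{11} + \cdots + X_{1m} + X_{21} + \cdots + X_{2n}}{m+n}$$
- And the maximum likelihood estimate of the (common) variance σ² is
$$\hat{\sigma}_\omega^2 = \frac{\sum_{i=1}^{m} (X_{1i} - \bar{X})^2 + \sum_{j=1}^{n} (X_{2j} - \bar{X})^2}{m+n}$$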

Equal Variance Two-Sided t-test

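- Inserting both into the likelihood function, L, with $\hat{\sigma}_\omega^2$ the maximum likelihood estimate of σ² over ω:
$$L_{\max}(\omega) = \frac{e^{-(m+n)/2}}{(2\pi \hat{\sigma}_\omega^2)^{(m+n)/2}}$$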

Equal Variance Two-Sided t-test

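- Do the same thing for region Ω: the maximum is at μ̂1 = X̄1, μ̂2 = X̄2, with
$$\hat{\sigma}_\Omega^2 = \frac{\sum_{i=1}^{m} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n} (X_{2j} - \bar{X}_2)^2}{m+n}$$
- Which produces this likelihood function, L
$$L_{\max}(\Omega) = \frac{e^{-(m+n)/2}}{(2\pi \hat{\sigma}_\Omega^2)^{(m+n)/2}}$$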

Equal Variance Two-Sided t-test

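- The test statistic λ is then
$$\lambda = \frac{L_{\max}(\omega)}{L_{\max}(\Omega)} = \left( \frac{\hat{\sigma}_\Omega^2}{\hat{\sigma}_\omega^2} \right)^{(m+n)/2}$$
- where $\hat{\sigma}_\omega^2$ and $\hat{\sigma}_\Omega^2$ are the maximum likelihood estimates of σ² over ω and Ω
- It’s the same function, just with different variances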

Equal Variance Two-Sided t-test

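- We can then use the algebraic identity
$$\sum_{i=1}^{m} (X_{1i} - \bar{X})^2 + \sum_{j=1}^{n} (X_{2j} - \bar{X})^2 = \sum_{i=1}^{m} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n} (X_{2j} - \bar{X}_2)^2 + \frac{mn}{m+n} (\bar{X}_1 - \bar{X}_2)^2$$
- To show that
$$\lambda = \left( 1 + \frac{t^2}{m+n-2} \right)^{-(m+n)/2}$$
- Where t is (from Ch. 3)
$$t = \frac{(\bar{x}_1 - \bar{x}_2) \sqrt{mn}}{s \sqrt{m+n}}$$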

Equal Variance Two-Sided t-test

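- t is the observed value of T
- S is defined in Ch. 3 as
$$S^2 = \frac{\sum_{i=1}^{m} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n} (X_{2j} - \bar{X}_2)^2}{m+n-2}$$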

[Figure: λ plotted as a function of t (e.g. m+n=10)]

Equal Variance Two-Sided t-test

- So, by the monotonicity argument, we can use t² or |t| instead of λ as test statistic
- Small values of λ correspond to large values of |t|
- Sufficiently large |t| lead to rejection of H0
- The H0 distribution of t is known
- t-distribution with m+n-2 degrees of freedom

- Significance points are widely available
- Once α has been chosen, values of |t| sufficiently large to reject H0 can be determined
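
The relation between λ and t can be verified numerically. The sketch below assumes the standard results for this model: λ equals the ratio of the two maximum likelihood variance estimates raised to the power (m+n)/2, and also equals (1 + t²/(m+n−2)) raised to the power −(m+n)/2. The sample values and function names are hypothetical.

```python
import math

def sums_of_squares(x1, x2):
    """Return (SS about the pooled mean, SS about the separate group means)."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    grand = (sum(x1) + sum(x2)) / (m + n)
    ss_null = sum((x - grand) ** 2 for x in x1 + x2)
    ss_alt = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    return ss_null, ss_alt

def lambda_direct(x1, x2):
    """lambda as the ratio of ML variance estimates, to the power (m+n)/2."""
    ss_null, ss_alt = sums_of_squares(x1, x2)
    return (ss_alt / ss_null) ** ((len(x1) + len(x2)) / 2)

def t_statistic(x1, x2):
    """Equal-variance two-sample t statistic."""
    m, n = len(x1), len(x2)
    _, ss_alt = sums_of_squares(x1, x2)
    s = math.sqrt(ss_alt / (m + n - 2))
    diff = sum(x1) / m - sum(x2) / n
    return diff * math.sqrt(m * n) / (s * math.sqrt(m + n))

def lambda_from_t(t, m, n):
    """lambda written as a monotonically decreasing function of |t|."""
    return (1.0 + t * t / (m + n - 2)) ** (-(m + n) / 2)

x1 = [5.1, 4.9, 6.2, 5.8]   # hypothetical sample, group 1
x2 = [4.2, 4.8, 5.0]        # hypothetical sample, group 2
t = t_statistic(x1, x2)
lam_a = lambda_direct(x1, x2)
lam_b = lambda_from_t(t, len(x1), len(x2))
print(f"t = {t:.4f}, lambda (direct) = {lam_a:.6f}, lambda (from t) = {lam_b:.6f}")
```

Both routes give the same λ, and `lambda_from_t` shrinks as |t| grows, which is the monotonicity the argument relies on.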

Equal Variance Two-Sided t-test

http://www.socr.ucla.edu/Applets.dir/T-table.html

Equal Variance One-Sided t-test

- Similar to Two-Sided t-test case
- Different region Ω for H1:
- Means μ1 and μ2 are not simply different, but one is at least as large as the other: μ1 ≥ μ2
- If x̄1 ≥ x̄2, then the maximum likelihood estimates are the same as for the two-sided case

Equal Variance One-Sided t-test

- If x̄1 < x̄2, then the unconstrained maximum of the likelihood is outside of Ω
- The unique unconstrained maximum is at (μ̂1, μ̂2) = (x̄1, x̄2), implying that the maximum within Ω occurs at a boundary point of Ω
- At this point estimates of μ1 and μ2 are equal
- At this point the likelihood ratio is 1 and H0 is not rejected
- Result: H0 is rejected in favor of H1 (μ1 ≥ μ2) only for sufficiently large positive values of t

Example - Revised

- This scenario fits with our original example:
- H1 is that the average height of KU basketball players is bigger than for the general population
- One-sided test
- We could assume that we don’t know the averages for H0 and H1
- We actually don’t know σ (I just guessed 2 in the original example)

Example - Revised

- Updated example:
- Observation in group 1 (KU): X1 = {83, 81, 71}
- Observation in group 2: X2 = {65, 72, 70}
- Pick significance point for t from a table: tα = 2.132
- t-distribution, m+n-2 = 4 degrees of freedom, α = 0.05

- Calculate t with our observations: t ≈ 2.193
- t > tα, so we can reject H0!
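
The t value for the two groups can be reproduced in a few lines. This is a sketch of the equal-variance two-sample t statistic using the pooled estimate S from Ch. 3; the function name is illustrative and the significance point is the table value quoted above.

```python
import math

def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    within = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    df = m + n - 2
    s = math.sqrt(within / df)                       # pooled standard deviation
    t = (mean1 - mean2) * math.sqrt(m * n) / (s * math.sqrt(m + n))
    return t, df

ku = [83.0, 81.0, 71.0]      # group 1 (KU)
other = [65.0, 72.0, 70.0]   # group 2
t, df = pooled_t(ku, other)
t_crit = 2.132               # one-sided 5% point for 4 degrees of freedom
print(f"t = {t:.3f} on {df} df, reject H0: {t > t_crit}")
```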

Comments

- Problems that might arise in other cases
- The λ-ratio might not reduce to a function of a well-known test statistic, such as t
- There might not be a unique H0 distribution of λ
- Fortunately, the t statistic is a pivotal quantity
- Its H0 distribution is independent of the parameters not prescribed by H0
- e.g. μ, σ

- For many testing procedures this property does not hold

Unequal Variance Two-Sided t-test

- Identical to Equal Variance Two-Sided t-test
- Except: variances in group 1 and group 2 are no longer assumed to be identical
- Group 1: NID(μ1, σ1²)
- Group 2: NID(μ2, σ2²)
- With σ1² and σ2² unknown and not assumed identical
- Region ω = {μ1 = μ2, 0 < σ1², σ2² < +∞}
- Ω makes no constraints on values μ1, μ2, σ1², and σ2²

Unequal Variance Two-Sided t-test

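- The likelihood function of (X11, X12, …, X1m, X21, X22, …, X2n) then becomes
$$L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi \sigma_1^2}} \exp\!\left( -\frac{(x_{1i} - \mu_1)^2}{2\sigma_1^2} \right) \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi \sigma_2^2}} \exp\!\left( -\frac{(x_{2j} - \mu_2)^2}{2\sigma_2^2} \right)$$
- Under H0 (μ1 = μ2 = μ), this becomes:
$$L = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi \sigma_1^2}} \exp\!\left( -\frac{(x_{1i} - \mu)^2}{2\sigma_1^2} \right) \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi \sigma_2^2}} \exp\!\left( -\frac{(x_{2j} - \mu)^2}{2\sigma_2^2} \right)$$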

Unequal Variance Two-Sided t-test

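- Maximum likelihood estimates μ̂, σ̂1², and σ̂2² satisfy the simultaneous equations:
$$\hat{\mu} = \frac{m \bar{x}_1 / \hat{\sigma}_1^2 + n \bar{x}_2 / \hat{\sigma}_2^2}{m / \hat{\sigma}_1^2 + n / \hat{\sigma}_2^2}, \qquad \hat{\sigma}_1^2 = \frac{1}{m} \sum_{i=1}^{m} (x_{1i} - \hat{\mu})^2, \qquad \hat{\sigma}_2^2 = \frac{1}{n} \sum_{j=1}^{n} (x_{2j} - \hat{\mu})^2$$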

Unequal Variance Two-Sided t-test

- This leads to a cubic equation in μ̂
- Neither the λ ratio nor any monotonic function of it has a known probability distribution when H0 is true!
- So the λ-ratio procedure does not lead to any useful test statistic
- The t statistic may still be used as a reasonable approximation
- However, its H0 distribution is still unknown, as it depends on the unknown ratio σ1²/σ2²
- In practice, a heuristic is often used (see Ch. 3.5)
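
The heuristic of Ch. 3.5 is not reproduced in this transcript. One widely used heuristic for unequal variances is Welch's statistic with the Welch–Satterthwaite approximate degrees of freedom; the sketch below assumes that choice purely for illustration, applied to the two height samples used elsewhere in these slides.

```python
import math
import statistics

def welch_t(x1, x2):
    """Welch's unequal-variance t statistic and approximate degrees of freedom.

    Assumption: this is one common heuristic, not necessarily the one in Ch. 3.5.
    """
    m, n = len(x1), len(x2)
    v1 = statistics.variance(x1) / m     # s1^2 / m (sample variance)
    v2 = statistics.variance(x2) / n     # s2^2 / n
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (m - 1) + v2 ** 2 / (n - 1))
    return t, df

ku = [83.0, 81.0, 71.0]
other = [65.0, 72.0, 70.0]
t_welch, df_welch = welch_t(ku, other)
print(f"Welch t = {t_welch:.3f}, approximate df = {df_welch:.2f}")
```

With equal group sizes the statistic itself coincides with the pooled t; only the degrees of freedom shrink, which makes the test more conservative.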

The -2 log λ Approximation

The -2 log λ Approximation

- Used when the λ-ratio procedure does not lead to a test statistic whose H0 distribution is known
- Example: Unequal Variance Two-Sided t-test

- Various approximations can be used
- But only if certain regularity assumptions and restrictions hold true

The -2 log λ Approximation

- Best known approximation:
- If H0 is true, -2 log λ has an asymptotic chi-square distribution,
- with degrees of freedom equal to the difference between the numbers of parameters left unspecified by H1 and by H0.
- λ is the likelihood ratio
- “asymptotic” = “as the sample size → ∞”

- Provides an asymptotically valid testing procedure
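
As an illustration of the approximation, the sketch below computes −2 log λ for the equal-variance two-sample model, where H0 leaves two parameters unspecified (μ, σ²) and H1 leaves three (μ1, μ2, σ²), giving 1 degree of freedom. It uses the two height samples from these slides. With only six observations the asymptotic result is a rough guide at best, so this only demonstrates the mechanics; the function names are illustrative.

```python
import math

def neg2_log_lambda(x1, x2):
    """-2 log(lambda) for H0: mu1 = mu2 in the equal-variance normal model."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    grand = (sum(x1) + sum(x2)) / (m + n)
    ss_null = sum((x - grand) ** 2 for x in x1 + x2)
    ss_alt = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    # lambda = (ss_alt / ss_null)^((m+n)/2), hence:
    return (m + n) * math.log(ss_null / ss_alt)

def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

ku = [83.0, 81.0, 71.0]
other = [65.0, 72.0, 70.0]
stat = neg2_log_lambda(ku, other)
p_value = chi2_sf_1df(stat)
print(f"-2 log lambda = {stat:.3f}, approximate p-value = {p_value:.4f}")
```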

The -2 log λ Approximation

- Restrictions:
- Parameters must be real numbers that can take on values in some interval
- The maximum likelihood estimator is found at a turning point of the function
- i.e. a “real” maximum, not at a boundary point

- H0 is nested in H1 (as in all previous slides)

- These restrictions are important in the proof
- I skip the proof…

The -2 log λ Approximation

- Instead:
- Our original basketball example, revised again:
- Let’s drop our last assumption, that the variance in the population at large is the same as in the group of KU basketball players.
- All we have left now are our observations and the hypothesis that μ1 > μ2
- Where μ1 is the average height of Basketball players

- Observation in group 1 (KU): X1 = {83, 81, 71}
- Observation in group 2: X2 = {65, 72, 70}


Example – Revised Again

- Using the Unequal Variance One-Sided t-Test
- We get:

The Analysis of Variance (ANOVA)

- Probably the most frequently used hypothesis testing procedure in statistics
- This section
- Derives the sum of squares
- Gives an outline of the ANOVA procedure
- Introduces one-way ANOVA as a generalization of the two-sample t-test
- Two-way and multi-way ANOVA
- Further generalizations of ANOVA

Sum of Squares

- New variables (from Ch. 3)
- The two-sample t-test tests for equality of the means of two groups.
- We could express the observations as:
$$X_{ij} = \mu_i + E_{ij}, \qquad i = 1, 2$$
- Where the Eij are assumed to be NID(0, σ²)
- H0 is μ1 = μ2

Sum of Squares

- This can also be written as:
$$X_{ij} = \mu + \alpha_i + E_{ij}$$
- μ could be seen as overall mean
- αi as deviation from μ in group i

- This model is overparameterized
- Uses more parameters than necessary
- Necessitates the requirement mα1 + nα2 = 0
- (always assumed imposed)

Sum of Squares

- We are deriving a test procedure similar to the two-sample two-sided t-test
- Using |t| as test statistic
- Absolute value of the T statistic

- This is equivalent to using t²
- Because it’s a monotonic function of |t|

- The square of the t statistic (from Ch. 3)

Sum of Squares

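- …can, after algebraic manipulations, be written as F
$$F = \frac{B}{W / (m+n-2)}$$
- where
$$B = \frac{mn}{m+n} (\bar{X}_1 - \bar{X}_2)^2, \qquad W = \sum_{i=1}^{m} (X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n} (X_{2j} - \bar{X}_2)^2$$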

Sum of Squares

- B: between (among) group sum of squares
- W: within group sum of squares
- B + W: total sum of squares
- Can be shown to be:
$$B + W = \sum_{i=1}^{m} (X_{1i} - \bar{X})^2 + \sum_{j=1}^{n} (X_{2j} - \bar{X})^2$$

- Total number of degrees of freedom: m + n – 1
- Between groups: 1
- Within groups: m + n - 2

Sum of Squares

- This gives us the F statistic
- Our goal is to test the significance of the difference between the means of two groups
- B measures the difference

- The difference must be measured relative to the variance within the groups
- W measures that

- The larger F is, the more significant the difference
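
The equivalence between F and t² in the two-group case can be checked directly. This sketch assumes the standard two-group forms B = mn/(m+n)·(X̄1 − X̄2)² and W = the within-group sum of squares; the data are hypothetical.

```python
import math

def between_within(x1, x2):
    """Between-group (B) and within-group (W) sums of squares for two groups."""
    m, n = len(x1), len(x2)
    mean1, mean2 = sum(x1) / m, sum(x2) / n
    b = m * n / (m + n) * (mean1 - mean2) ** 2
    w = sum((x - mean1) ** 2 for x in x1) + sum((x - mean2) ** 2 for x in x2)
    return b, w

x1 = [12.0, 15.0, 11.0, 14.0]   # hypothetical group 1
x2 = [9.0, 10.0, 13.0]          # hypothetical group 2
m, n = len(x1), len(x2)
b, w = between_within(x1, x2)

f = b / (w / (m + n - 2))       # F with 1 and m+n-2 degrees of freedom

s = math.sqrt(w / (m + n - 2))  # pooled standard deviation
t = (sum(x1) / m - sum(x2) / n) * math.sqrt(m * n) / (s * math.sqrt(m + n))

grand = (sum(x1) + sum(x2)) / (m + n)
total = sum((x - grand) ** 2 for x in x1 + x2)
print(f"F = {f:.4f}, t^2 = {t * t:.4f}, B + W = {b + w:.4f}, total SS = {total:.4f}")
```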

The ANOVA Procedure

- Subdivide observed total sum of squares into several components
- In our case, B and W

- Pick appropriate significance point for a chosen Type I error α from an F table
- Compare the observed components to test our hypothesis

F-Statistic

- Significance points depend on degrees of freedom in B and W
- In our case, 1 and (m + n – 2)

http://www.ento.vt.edu/~sharov/PopEcol/tables/f005.html

Comments

- The two-group case readily generalizes to any number of groups.
- ANOVAs can be classified in various ways, e.g.
- fixed effects models
- mixed effects models
- random effects model
- Difference is discussed later
- For now we consider fixed effect models
- Parameter αi is fixed, but unknown, in group i

Comments

- Terminology
- Although ANOVA contains the word ‘variance’
- What we actually test for is equality of means between the groups
- The different mean assumptions affect the variance, though

- ANOVAs are special cases of regression models from Ch. 8

One-Way ANOVA

- One-Way fixed-effect ANOVA
- Setup and derivation
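- Like two-sample t-test for g number of groups
- Observations Xij (ni observations in group i, i = 1, 2, …, g; N = Σni observations in total)
- Using overparameterized model for X:
$$X_{ij} = \mu + \alpha_i + E_{ij}, \qquad j = 1, \ldots, n_i, \quad i = 1, \ldots, g$$
- Eij assumed NID(0, σ²), Σniαi = 0, αi fixed in group i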

One-Way ANOVA

- Null Hypothesis H0 is: α1 = α2 = … = αg = 0
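- Total sum of squares is
$$\sum_{i=1}^{g} \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2$$
- This is subdivided into B and W
$$B = \sum_{i=1}^{g} n_i (\bar{X}_i - \bar{X})^2, \qquad W = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2$$
- with X̄i the mean of group i and X̄ the overall mean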

One-Way ANOVA

- Total degrees of freedom: N – 1
- Subdivided into dfB = g – 1 and dfW = N - g

- This gives us our test statistic F
$$F = \frac{B / (g-1)}{W / (N-g)}$$
- We can now look in the F-table for these degrees of freedom to pick the significance point for F
- And calculate B and W from the observed data
- And accept or reject H0

Example

- Revisiting the Basketball example
- Looking at it as a One-Way ANOVA analysis
- Observation in group 1 (KU): X1 = {83, 81, 71}
- Observation in group 2: X2 = {65, 72, 70}

- Total Sum of Squares: 239.333
- B (between groups sum of squares): 130.667

Example

- W (within groups sum of squares): 108.667
- Degrees of freedom
- Total: N-1 = 5
- dfB = g – 1 = 2 - 1 = 1
- dfW = N – g = 6 – 2 = 4

Example

- Table lookup for df 1 and 4 and α= 0.05:
- Critical value: F = 7.71
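- Calculate F from our data: F = B / (W / 4) ≈ 4.81
- So… 4.81 < 7.71
- With ANOVA we actually accept H0!
- The F test is two-sided (F = t², and 7.71 ≈ 2.776²), whereas the earlier t-test was one-sided; the large variance in group 1 also keeps F small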
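
The same calculation can be written for any number of groups. The following sketch assumes the usual one-way fixed-effects decomposition into B and W; the function name is illustrative, and the critical value is the table entry quoted above.

```python
def one_way_anova(groups):
    """Return (B, W, df_between, df_within, F) for a one-way fixed-effects ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    b = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    w = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    df_b = len(groups) - 1
    df_w = n_total - len(groups)
    f = (b / df_b) / (w / df_w)
    return b, w, df_b, df_w, f

ku = [83.0, 81.0, 71.0]
other = [65.0, 72.0, 70.0]
b, w, df_b, df_w, f = one_way_anova([ku, other])
f_crit = 7.71   # 5% significance point for (1, 4) degrees of freedom
print(f"B = {b:.3f}, W = {w:.3f}, F({df_b},{df_w}) = {f:.3f}, reject H0: {f > f_crit}")
```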

Same Example – with Excel

- Screenshots:

Excel

- Offers most of these tests, built-in

Two-Way ANOVA

- Two-Way Fixed Effects ANOVA
- Overview only (in the scope of this book)
- More complicated setup; example:
- Expression levels of one gene in lung cancer patients
- a different risk classes
- E.g.: ultrahigh, very high, intermediate, low

- b different age groups
- n individuals for each risk/age combination

Two-Way ANOVA

- Expression levels (our observations): Xijk
- i is the risk class (i = 1, 2, …, a)
- j indicates the age group (j = 1, 2, …, b)
- k corresponds to the individual in each group (k = 1, …, n)
- Each group is a possible risk/age combination

- The number of individuals in each group is the same, n
- This is a “balanced” design
- Theory for unbalanced designs is more complicated and not covered in this book

Two-Way ANOVA

- The Xijk can be arranged in a table:

[Two-way table: indexed by risk category i and age group j; each cell holds the n individuals in that risk/age group]

Two-Way ANOVA

- The model adopted for each Xijk is
$$X_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + E_{ijk}$$
- Where Eijk are NID(0, σ²)
- The mean of Xijk is μ + αi + βj + δij
- αi is a fixed parameter, additive for risk class i
- βj is a fixed parameter, additive for age group j
- δij is a fixed risk/age interaction parameter
- Should be added if a possible group/group interaction exists

Two-Way ANOVA

- These constraints are imposed
- Σiαi = Σjβj = 0
- Σiδij = 0 for all j
- Σjδij = 0 for all i

- The total sum of squares is then subdivided into four groups:
- Risk class sum of squares
- Age group sum of squares
- Interaction sum of squares
- Within cells (“residual” or “error”) sum of squares

Two-Way ANOVA

- Associated with each sum of squares
- Corresponding degrees of freedom
- Hence also a corresponding mean square
- Sum of squares divided by degrees of freedom

- The mean squares are then compared using F ratios to test for significance of various effects
- First – test for a significant risk/age interaction
- F-ratio used is ratio of interaction mean square and within-cells mean square
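
For a balanced design the four sums of squares and the interaction F-ratio can be computed directly from the cell, row and column means. This is a minimal sketch under that assumption; the data layout `data[i][j]`, the function name, and the numbers are all hypothetical.

```python
def two_way_anova(data):
    """Balanced two-way fixed-effects ANOVA.

    data[i][j] is the list of n observations for risk class i and age group j.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(data[i][j]) / n for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]                        # risk class means
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]   # age group means
    grand = sum(row) / a

    ss_risk = b * n * sum((r - grand) ** 2 for r in row)
    ss_age = a * n * sum((c - grand) ** 2 for c in col)
    ss_inter = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for i in range(a) for j in range(b))
    ss_within = sum((x - cell[i][j]) ** 2
                    for i in range(a) for j in range(b) for x in data[i][j])

    df_inter = (a - 1) * (b - 1)
    df_within = a * b * (n - 1)
    f_inter = (ss_inter / df_inter) / (ss_within / df_within)
    return {"risk": ss_risk, "age": ss_age, "interaction": ss_inter,
            "within": ss_within, "F_interaction": f_inter}

# Hypothetical expression levels: 2 risk classes x 2 age groups, n = 2 per cell
data = [[[10.0, 12.0], [14.0, 16.0]],
        [[11.0, 13.0], [21.0, 23.0]]]
result = two_way_anova(data)
print(result)
```

The four sums of squares add up to the total sum of squares about the grand mean, which is a convenient sanity check on any implementation.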

Two-Way ANOVA

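- If such an interaction is found, it may not be reasonable to test for significant risk or age differences
- Example, μ in two risk classes, two age groups:
- No evidence of interaction
- Example of interaction

[Figure: two panels with axes labelled Risk and Age, one showing no evidence of interaction and one showing an interaction]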

Multi-Way ANOVA

- One-way and two-way fixed effects ANOVAs can be extended to multi-way ANOVAs
- Gets complicated
- Example: three-way ANOVA model:

Further generalizations of ANOVA

- The 2^m factorial design
- A particular form of the one-way ANOVA
- Interactions between main effects

- m “factors” taken at two “levels”
- E.g. (1) Gender, (2) Tissue (lung, kidney), and (3) status (affected, not affected)

- 2^m possible combinations of levels/groups
- Can test for main effects and interactions
- Need replicated experiments
- n replications for each of the 2^m experiments

Further generalizations of ANOVA

- Example, m = 3, denoted by A, B, C
- 8 groups, {abc, ab, ac, bc, a, b, c, 1}
- Write totals of n observations Tabc, Tab, …, T1
- The total between sum of squares can be subdivided into seven individual sums of squares
- Three main effects (A, B, C)
- Three pairwise interactions (AB, AC, BC)
- One triple-wise interaction (ABC)
- Example: Sum of squares for A, and for BC, respectively
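
In a 2^m design each effect's sum of squares is commonly computed from a signed contrast of the group totals: a total enters with a plus sign if the product of its ±1 factor levels for that effect is positive, and the squared contrast is divided by n·2^m. The sketch below assumes this standard contrast formula; the totals and function names are hypothetical.

```python
def effect_contrast(effect, totals):
    """Signed sum of group totals for an effect such as 'A' or 'BC'.

    A label like 'ab' means factors A and B are at their high level; '1' means none are.
    """
    contrast = 0.0
    for label, total in totals.items():
        sign = 1
        for factor in effect.lower():
            sign *= 1 if factor in label else -1
        contrast += sign * total
    return contrast

def effect_sum_of_squares(effect, totals, n, m):
    """Sum of squares for one effect: contrast^2 / (n * 2^m)."""
    return effect_contrast(effect, totals) ** 2 / (n * 2 ** m)

# Hypothetical totals of n = 2 observations for each of the 8 groups (m = 3)
totals = {"1": 10.0, "a": 14.0, "b": 11.0, "c": 12.0,
          "ab": 15.0, "ac": 16.0, "bc": 13.0, "abc": 19.0}
ss_a = effect_sum_of_squares("A", totals, n=2, m=3)
ss_bc = effect_sum_of_squares("BC", totals, n=2, m=3)
print(f"SS(A) = {ss_a}, SS(BC) = {ss_bc}")
```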

Further generalizations of ANOVA

- Confounding
- If m ≥ 5 the number of groups becomes large
- Then the total number of observations, n·2^m, is large
- It is possible to reduce the number of observations by a process …

- Interaction ABC probably very small and not interesting
- So, prefer a model without ABC, reduce data
- There are ANOVA designs for that

Further generalizations of ANOVA

- Fractional Replication
- Related to confounding
- Sometimes two groups cannot be distinguished from each other, then they are aliases
- E.g. A and BC

- This reduces the need for experiments and data
- Ch. 13 talks more about this in the context of microarrays

Random/Mixed Effect Models

- So far: fixed effect models
- E.g. Risk class, age group fixed in previous example
- Multiple experiments would use same categories
- But: what if we took experimental data on several random days?
- The days themselves have no meaning, but a “between days” sum of squares must be extracted
- What if the days turn out to be important?
- If we fail to test for it, the significance of our procedure is diminished.
- Days are a random category, unlike risk and age!

Random/Mixed Effect Models

- Mixed Effect Models
- If some categories are fixed and some are random
- Symbols used:
- Greek letters for fixed effects
- Uppercase Roman letters for random effects
- Example: two-way mixed effect model with
- Risk class a and days d and n values collected each day, the appropriate model is written:

Random/Mixed Effect Models

- Random effect models have no fixed categories
- The details on the ANOVA analysis depend on which effects are random and which are fixed
- In a microarray context (more in Ch. 13)
- There tend to be several fixed and several random effects, which complicates the analysis
- Many interactions simply assumed zero

Multivariate Methods

ANOVA: the Repeated Measures Case

Bootstrap Methods: the Two-sample t-test

All skipped …

Sequential Analysis

- Sequential Probability Ratio
- Sample size not known in advance
- Depends on outcomes of successive observations
- Some of this theory is in BLAST
- Basic Local Alignment Search Tool

- The book focuses on discrete random variables

Sequential Analysis

- Consider:
- Random variable Y with distribution P(y;ξ)
- Tests usually relate to the value of parameter ξ
- H0: ξ is ξ0
- H1: ξ is ξ1
- We can choose a value for the Type I error α
- And a value for the Type II error β
- Sampling then continues while
$$A < \prod_{i} \frac{P(y_i; \xi_1)}{P(y_i; \xi_0)} < B$$

Sequential Analysis

- A and B are chosen to correspond to an α and β
- Sampling continues until the ratio is less than A (accept H0) or greater than B (reject H0)
- Because these are discrete variables, boundary overshoot usually occurs
- We don’t expect to exactly get values α and β

- Desired values for α and β approximately achieved by using
$$A = \frac{\beta}{1-\alpha}, \qquad B = \frac{1-\beta}{\alpha}$$

Sequential Analysis

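- It is also convenient to take logarithms, which gives us:
$$\log \frac{\beta}{1-\alpha} < \sum_i \log \frac{P(y_i; \xi_1)}{P(y_i; \xi_0)} < \log \frac{1-\beta}{\alpha}$$
- Using
$$S_{1,0}(y) = \log \frac{P(y; \xi_1)}{P(y; \xi_0)}$$
- We can write
$$\log \frac{\beta}{1-\alpha} < \sum_i S_{1,0}(y_i) < \log \frac{1-\beta}{\alpha}$$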

Sequential Analysis

- Example: sequence matching
- H0: p0 = 0.25 (probability of a match is 0.25)
- H1: p1 = 0.35 (probability of a match is 0.35)
- Type I error α and Type II error β chosen 0.01
- Yi: 1 if there is a match at position i, otherwise 0
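- Sampling continues while
$$\log \frac{1}{99} < \sum_i S_{1,0}(Y_i) < \log 99$$
- with
$$S_{1,0}(Y_i) = \begin{cases} \log \frac{7}{5} & \text{if } Y_i = 1 \\ \log \frac{13}{15} & \text{if } Y_i = 0 \end{cases}$$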

Sequential Analysis

- S can be seen as the support offered by Yi for H1
- The inequality can be re-written as
$$-9.582 < \sum_i (Y_i - 0.2984) < 9.582$$
- This is actually a random walk with step sizes 0.7016 for a match and -0.2984 for a mismatch
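
The sequence-matching test can be sketched as a short loop. The code below assumes the boundaries log(β/(1−α)) and log((1−β)/α) and the log-ratio steps log(p1/p0) for a match and log((1−p1)/(1−p0)) for a mismatch; function names and the input sequences are hypothetical. Dividing both steps by their difference recovers the step sizes 0.7016 and −0.2984.

```python
import math

def sprt_bounds(alpha, beta):
    """Lower and upper stopping boundaries on the log scale."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

def step(y, p0, p1):
    """Support S(y) offered by one observation for H1 over H0."""
    return math.log(p1 / p0) if y == 1 else math.log((1.0 - p1) / (1.0 - p0))

def sprt(ys, p0, p1, alpha, beta):
    """Run the sequential test; return (decision, number of observations used)."""
    lower, upper = sprt_bounds(alpha, beta)
    total = 0.0
    for i, y in enumerate(ys, start=1):
        total += step(y, p0, p1)
        if total <= lower:
            return "accept H0", i
        if total >= upper:
            return "reject H0", i
    return "continue sampling", len(ys)

p0, p1, alpha, beta = 0.25, 0.35, 0.01, 0.01
scale = step(1, p0, p1) - step(0, p0, p1)       # log(21/13)
match_step = step(1, p0, p1) / scale            # rescaled step for a match
mismatch_step = step(0, p0, p1) / scale         # rescaled step for a mismatch
print(round(match_step, 4), round(mismatch_step, 4))

print(sprt([1] * 20, p0, p1, alpha, beta))      # a run of matches
print(sprt([0] * 40, p0, p1, alpha, beta))      # a run of mismatches
```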

Sequential Analysis

- Power Function for a Sequential Test
- Suppose the true value of the parameter of interest is ξ
- We wish to know the probability that H1 is accepted, given ξ
- This probability is the power Ρ(ξ) of the test

Sequential Analysis

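- Ignoring boundary overshoot, the power is approximately
$$P(\xi) \approx \frac{1 - \left( \frac{\beta}{1-\alpha} \right)^{\theta^*}}{\left( \frac{1-\beta}{\alpha} \right)^{\theta^*} - \left( \frac{\beta}{1-\alpha} \right)^{\theta^*}}$$
- Where θ* is the unique non-zero solution to θ in
$$\sum_{y \in R} P(y; \xi) \left( \frac{P(y; \xi_1)}{P(y; \xi_0)} \right)^{\theta} = 1$$
- R is the range of values of Y
- Equivalently, θ* is the unique non-zero solution to θ in
$$\sum_{y \in R} P(y; \xi)\, e^{\theta S_{1,0}(y)} = 1$$
- Where S is defined as before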
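
For the sequence-matching example θ* can be found numerically. The sketch below assumes the Wald-type approximation P(ξ) ≈ (1 − A^θ*) / (B^θ* − A^θ*) with A = β/(1−α) and B = (1−β)/α, and solves the defining equation for a Bernoulli match indicator by bisection. The function names are illustrative, and the bracketing is a simplification that fails if the true match probability is very close to the point where the mean step is zero.

```python
import math

def theta_star(p, p0, p1):
    """Non-zero root of p*(p1/p0)^theta + (1-p)*((1-p1)/(1-p0))^theta = 1."""
    r_match = p1 / p0
    r_miss = (1.0 - p1) / (1.0 - p0)

    def g(theta):
        return p * r_match ** theta + (1.0 - p) * r_miss ** theta - 1.0

    drift = p * math.log(r_match) + (1.0 - p) * math.log(r_miss)
    if drift == 0.0:
        raise ValueError("theta* is not defined when the mean step is zero")
    sign = 1.0 if drift < 0.0 else -1.0      # root is positive when the drift is negative
    near, far = sign * 1e-6, sign * 1.0      # g(near) < 0 just beside the trivial root 0
    while g(far) <= 0.0:                     # push the far end out until g changes sign
        far *= 2.0
    for _ in range(200):                     # plain bisection
        mid = 0.5 * (near + far)
        if g(mid) < 0.0:
            near = mid
        else:
            far = mid
    return 0.5 * (near + far)

def power(p, p0, p1, alpha, beta):
    """Approximate probability of accepting H1 when the true match probability is p."""
    th = theta_star(p, p0, p1)
    a = beta / (1.0 - alpha)
    b = (1.0 - beta) / alpha
    return (1.0 - a ** th) / (b ** th - a ** th)

print(power(0.25, 0.25, 0.35, 0.01, 0.01))   # close to alpha when H0 is true
print(power(0.35, 0.25, 0.35, 0.01, 0.01))   # close to 1 - beta when H1 is true
```

At the two hypothesized values the root is θ* = 1 and θ* = −1, so the approximation returns α and 1 − β respectively, as it should.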

Sequential Analysis

- This is very similar to Ch. 7 – Random Walks
- The parameter θ* is the same as in Ch. 7
- And it will be the same in Ch 10 – BLAST
- < skipping the random walk part >

Sequential Analysis

- Mean Sample Size
- The (random) number of observations until one or the other hypothesis is accepted
- Find approximation by ignoring boundary overshoot
- Essentially identical method used to find the mean number of steps until the random walk stops

Sequential Analysis

- Two expressions are calculated for ΣiS1,0(Yi)
- One involves the mean sample size
- By equating both expressions, solve for mean sample size

Sequential Analysis

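- So, the mean sample size is:
$$E(N) \approx \frac{P(\xi) \log \frac{1-\beta}{\alpha} + (1 - P(\xi)) \log \frac{\beta}{1-\alpha}}{\sum_{y \in R} P(y; \xi)\, S_{1,0}(y)}$$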
- Both numerator and denominator depend on Ρ(ξ), and so also on θ*
- A generalization applies if the true distribution Q(y) of Y differs from those specified by H0 and H1, which is relevant to BLAST

Sequential Analysis

- Example
- Same sequence matching example as before
- H0: p0 = 0.25 (probability of a match is 0.25)
- H1: p1 = 0.35 (probability of a match is 0.35)
- Type I error α and Type II error β chosen 0.01

- Mean sample size equation is (p is the true match probability):
$$E(N) \approx \frac{P(\xi) \log 99 - (1 - P(\xi)) \log 99}{p \log \frac{7}{5} + (1-p) \log \frac{13}{15}}$$
- Mean sample size when H0 is true: 194
- Mean sample size when H1 is true: 182
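
The two mean sample sizes can be reproduced with a few lines. This sketch assumes the approximation that ignores boundary overshoot, with the power taken as α when H0 is true and 1 − β when H1 is true; the function name is illustrative.

```python
import math

def mean_sample_size(p, p0, p1, alpha, beta, power):
    """Approximate mean number of observations, ignoring boundary overshoot.

    p is the true match probability; power is the probability of accepting H1 at p.
    """
    log_a = math.log(beta / (1.0 - alpha))       # lower boundary
    log_b = math.log((1.0 - beta) / alpha)       # upper boundary
    mean_step = p * math.log(p1 / p0) + (1.0 - p) * math.log((1.0 - p1) / (1.0 - p0))
    return (power * log_b + (1.0 - power) * log_a) / mean_step

n_h0 = mean_sample_size(0.25, 0.25, 0.35, 0.01, 0.01, power=0.01)
n_h1 = mean_sample_size(0.35, 0.25, 0.35, 0.01, 0.01, power=0.99)
print(round(n_h0), round(n_h1))
```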

Sequential Analysis

- Boundary Overshoot
- So far we assumed no boundary overshoot
- In practice, there will almost always be, though
- Exact Type I and Type II errors different from α and β

- Random walk theory can be used to assess how significant the effects of boundary overshoot are
- It can be shown that the sum of Type I and Type II errors is always less than α + β (also individually)
- BLAST deals with this in a novel way -> see Ch. 10
