AOV Assumption Checking and Transformations (§8.4, 8.5)

• How do we check the Normality assumption in AOV?
• How do we check the Homogeneity of variances assumption in AOV? (§7.4)
• What do we do if these assumptions are not met?
Model Assumptions
• Homoscedasticity (common group variances).
• Normality of responses (or of residuals).
• Independence of responses (or of residuals). (Hopefully achieved through randomization…)
• Effect additivity. (Only an issue in multi-way AOV; later).
Checking the Equal Variance Assumption

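$H_0$: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_t^2$

$H_A$: some of the variances are different from each other

Little work but little power

Hartley’s Test: A logical extension of the F test for $t = 2$. Requires equal replication, $n$, among groups. Requires normality.

T.S. $F_{\max} = s^2_{\max} / s^2_{\min}$, the ratio of the largest to the smallest group sample variance.

R.R. Reject $H_0$ if $F_{\max} > F_{\alpha,t,n-1}$, tabulated in Table 12.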
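A minimal sketch of the Hartley computation, assuming the statistic is the ratio of the largest to the smallest group sample variance. The function name `hartley_fmax` and the data are hypothetical, chosen only so the arithmetic is easy to follow; the critical value still has to be read from the F-max table.

```python
from statistics import variance

def hartley_fmax(groups):
    """Return Hartley's F_max: largest group variance over smallest group variance."""
    sizes = {len(g) for g in groups}
    if len(sizes) != 1:
        raise ValueError("Hartley's test requires equal replication among groups")
    variances = [variance(g) for g in groups]  # sample variances (divisor n - 1)
    return max(variances) / min(variances)

# Hypothetical data: t = 3 groups with n = 4 observations each
groups = [
    [10, 12, 14, 16],
    [11, 12, 13, 14],
    [8, 12, 16, 20],
]
f_max = hartley_fmax(groups)
print(f"F_max = {f_max:.2f}")  # compare with the tabulated critical value
```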

Bartlett’s Test

More work but better power

Bartlett’s Test: Allows unequal replication, but requires normality.

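T.S. $C = (n_T - t)\ln s_p^2 - \sum_{i=1}^{t}(n_i - 1)\ln s_i^2$, where $s_p^2 = \sum_{i=1}^{t}(n_i - 1)s_i^2 / (n_T - t)$ is the pooled variance.

If $C > \chi^2_{(t-1),\alpha}$ then apply the correction term

$CF = 1 + \dfrac{1}{3(t-1)}\left[\sum_{i=1}^{t}\dfrac{1}{n_i - 1} - \dfrac{1}{n_T - t}\right]$

R.R. Reject $H_0$ if $C/CF > \chi^2_{(t-1),\alpha}$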
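A rough sketch of Bartlett's computation, assuming the usual form of the statistic, $C = (n_T - t)\ln s_p^2 - \sum (n_i - 1)\ln s_i^2$, with correction factor $CF = 1 + \frac{1}{3(t-1)}\left[\sum \frac{1}{n_i - 1} - \frac{1}{n_T - t}\right]$. The function name and the data are hypothetical; the chi-square critical value 5.991 is the tabulated 0.05 upper point for 2 degrees of freedom.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Return (C, CF): Bartlett's uncorrected statistic and its correction factor."""
    t = len(groups)
    sizes = [len(g) for g in groups]
    df_error = sum(sizes) - t                      # n_T - t
    variances = [variance(g) for g in groups]      # s_i^2
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / df_error
    c = df_error * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    cf = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / df_error) / (3 * (t - 1))
    return c, cf

# Hypothetical data: t = 3 groups (unequal replication would also be allowed)
groups = [
    [10, 12, 14, 16],
    [11, 12, 13, 14],
    [8, 12, 16, 20],
]
c, cf = bartlett_statistic(groups)
chi2_crit = 5.991  # chi-square(2 df), alpha = 0.05, from a table
print(f"C = {c:.3f}, CF = {cf:.4f}, C/CF = {c / cf:.3f}, reject H0: {c / cf > chi2_crit}")
```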

Levene’s Test

More work but powerful result.

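Let $z_{ij} = |y_{ij} - \tilde{y}_i|$, where $\tilde{y}_i$ = sample median of the $i$-th group.

T.S. $L = \dfrac{\sum_{i=1}^{t} n_i(\bar{z}_{i\cdot} - \bar{z}_{\cdot\cdot})^2 / (t-1)}{\sum_{i=1}^{t}\sum_{j=1}^{n_i}(z_{ij} - \bar{z}_{i\cdot})^2 / (n_T - t)}$

df1 = $t - 1$

df2 = $n_T - t$

R.R. Reject $H_0$ if $L \ge F_{\alpha,\,df_1,\,df_2}$

Essentially an AOV on the $z_{ij}$.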
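The "AOV on the $z_{ij}$" idea can be sketched as follows, assuming the median-centered version of Levene's test: $z_{ij}$ is the absolute deviation of each observation from its group median, and the statistic is the one-way AOV F ratio computed on the $z_{ij}$. The function name and the data are hypothetical; the critical value 3.89 is the tabulated $F_{0.05,2,12}$.

```python
from statistics import mean, median

def levene_median(groups):
    """Return (L, df1, df2) for Levene's test with z_ij = |y_ij - group median|."""
    t = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's median
    z = [[abs(y - median(g)) for y in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]                 # group means of z
    z_bar = sum(sum(zi) for zi in z) / n_total       # grand mean of z
    ms_between = sum(
        len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i)
    ) / (t - 1)
    ms_within = sum(
        (v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi
    ) / (n_total - t)
    return ms_between / ms_within, t - 1, n_total - t

# Hypothetical data: t = 3 groups, 5 observations each
groups = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [1, 3, 5, 7, 9],
]
L, df1, df2 = levene_median(groups)
f_crit = 3.89  # F(0.05; 2, 12) from a table
print(f"L = {L:.3f} on ({df1}, {df2}) df, reject H0: {L >= f_crit}")
```

Centering on the group mean instead of the median would give Levene's original version of the test.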

Minitab

Stat > ANOVA > Test for Equal Variances

Test for Equal Variances

Response: Resist

Factors: Sand

ConfLvl: 95.0000

Bonferroni confidence intervals for standard deviations

| Lower | Sigma | Upper | N | Factor Levels |
|---|---|---|---|---|
| 1.70502 | 3.28634 | 14.4467 | 5 | 15 |
| 1.89209 | 3.64692 | 16.0318 | 5 | 20 |
| 1.07585 | 2.07364 | 9.1157 | 5 | 25 |
| 1.07585 | 2.07364 | 9.1157 | 5 | 30 |
| 1.48567 | 2.86356 | 12.5882 | 5 | 35 |

Bartlett's Test (normal distribution)

Test Statistic: 1.890, P-Value: 0.756

Levene's Test (any continuous distribution)

Test Statistic: 0.463, P-Value: 0.762

Minitab Help

Use Bartlett’s test when the data come from normal distributions; Bartlett’s test is not robust to departures from normality. Use Levene’s test when the data come from continuous, but not necessarily normal, distributions.

The computational method for Levene’s Test is a modification of Levene’s procedure [10] developed by [2]. This method considers the distances of the observations from their sample median rather than their sample mean. Using the sample median rather than the sample mean makes the test more robust for smaller samples.

Do not reject $H_0$ since the p-value > 0.05 (the traditional $\alpha$).

SAS Program

    proc glm data=stress;
      class sand;
      model resistance = sand / solution;
      means sand / hovtest=bartlett;
      means sand / hovtest=levene(type=abs);
      means sand / hovtest=levene(type=square);
      means sand / hovtest=bf; /* Brown and Forsythe mod of Levene */
      title1 'Compression resistance in concrete beams as';
      title2 ' a function of percent sand in the mix';
    run;

HOVTEST only works when there is a single factor on the right-hand side of the model.

SAS

hovtest=bartlett;

Bartlett's Test for Homogeneity of resistance Variance

| Source | DF | Chi-Square | Pr > ChiSq |
|---|---|---|---|
| sand | 4 | 1.8901 | 0.7560 |

hovtest=levene(type=abs);

Levene's Test for Homogeneity of resistance Variance

ANOVA of Absolute Deviations from Group Means

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| sand | 4 | 8.8320 | 2.2080 | 0.95 | 0.4573 |
| Error | 20 | 46.6080 | 2.3304 | | |

hovtest=levene(type=square);

Levene's Test for Homogeneity of resistance Variance

ANOVA of Squared Deviations from Group Means

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| sand | 4 | 202.2 | 50.5504 | 0.85 | 0.5076 |
| Error | 20 | 1182.8 | 59.1400 | | |

hovtest=bf;

Brown and Forsythe's Test for Homogeneity of resistance Variance

ANOVA of Absolute Deviations from Group Medians

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| sand | 4 | 7.4400 | 1.8600 | 0.46 | 0.7623 |
| Error | 20 | 80.4000 | 4.0200 | | |

SPSS

Test of Homogeneity of Variances

RESIST

| Levene Statistic | df1 | df2 | Sig. |
|---|---|---|---|
| .947 | 4 | 20 | .457 |

Since the p-value (0.457) is greater than our (typical) $\alpha = 0.05$ Type I error risk level, we do not reject the null hypothesis.

This is Levene’s original test, in which the $z_{ij}$ are centered on group means and not medians.

R

Tests of Homogeneity of Variances

bartlett.test(): Bartlett’s Test.

fligner.test(): Fligner-Killeen Test (nonparametric).

Checking for Normality

Reminder: Normality of the RESIDUALS is assumed. The original data are assumed normal also, but each group may have a different mean if HA is true. Practice is to first fit the model, THEN output the residuals, then test for normality of the residuals. This APPROACH is always correct.

TOOLS

• Histogram and/or boxplot of all residuals ($e_{ij}$).
• Normal probability (Q-Q) plot.
• Formal test for normality.
Histogram of Residuals

    proc glm data=stress;
      class sand;
      model resistance = sand / solution;
      output out=resid r=r_resis p=p_resis;
      title1 'Compression resistance in concrete beams as';
      title2 ' a function of percent sand in the mix';
    run;

    proc capability data=resid;
      histogram r_resis / normal;
      ppplot r_resis / normal square;
    run;

Probability Plots (QQ-Plots)

A scatter plot of the percentiles of the residuals against the percentiles of a standard normal distribution. The basic idea is that if the residuals came from a normal distribution, values for these percentiles should lie on a straight line.

• Compute and sort the residuals $e_{(1)}, e_{(2)}, \ldots, e_{(n)}$.
• Associate with each residual a standard normal percentile: $z_{(i)} = \Phi^{-1}((i - 0.5)/n)$.
• Plot $z_{(i)}$ versus $e_{(i)}$. Compare to a straight line (we don’t care so much about which line).
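The three steps above can be sketched in a few lines of Python using only the standard library. The function name `normal_scores` and the residual values are hypothetical; `statistics.NormalDist().inv_cdf` plays the role of the inverse standard normal CDF.

```python
from statistics import NormalDist

def normal_scores(residuals):
    """Pair each sorted residual e_(i) with the normal percentile z_(i)."""
    n = len(residuals)
    e_sorted = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, e_sorted))

# Hypothetical residuals from a fitted model
residuals = [0.8, -1.3, 0.1, 2.0, -0.4]
pairs = normal_scores(residuals)
for z_i, e_i in pairs:
    print(f"z = {z_i:7.4f}   e = {e_i:5.2f}")
```

Plotting the pairs and judging how close they fall to a straight line gives the Q-Q plot.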
Software

Excel: Percentile $p_i = (i - 0.5)/n$; Normal percentile =NORMSINV($p_i$)

MTB: Graph -> Probability Plot

R (with residuals in “y”): qqnorm(y); qqline(y)

Probability Plot

[Figure: normal probability plots of the residuals from Minitab and from SAS (note axes changed in the SAS plot).]

These look normal!

Formal Tests of Normality

Many, many tests (a favorite pastime of statisticians is developing new tests for normality).

• Kolmogorov-Smirnov test; Anderson-Darling test (both based on the empirical CDF).
• Shapiro-Wilk’s test; Ryan-Joiner test (both are correlation based tests applicable for n < 50).
• D’Agostino’s test ($n \ge 50$).

All quite conservative – they fail to reject the null hypothesis of normality more often than they should.

Shapiro-Wilk’s W test

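$e_{(1)}, e_{(2)}, \ldots, e_{(n)}$ represent data ranked from smallest to largest.

$H_0$: The population has a normal distribution.

$H_A$: The population does not have a normal distribution.

T.S. $W = \dfrac{1}{d}\left[\sum_{i=1}^{k} a_i\left(e_{(n-i+1)} - e_{(i)}\right)\right]^2$, where $d = \sum_{i=1}^{n}(e_{(i)} - \bar{e})^2$, with $k = n/2$ if $n$ is even and $k = (n-1)/2$ if $n$ is odd.

Coefficients $a_i$ come from a table.

R.R. Reject $H_0$ if $W < W_{0.05}$.

Critical values of $W_\alpha$ come from a table.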

D’Agostino’s Test

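$e_{(1)}, e_{(2)}, \ldots, e_{(n)}$ represent data ranked from smallest to largest.

$H_0$: The population has a normal distribution.

$H_A$: The population does not have a normal distribution.

T.S. $D = \dfrac{\sum_{j=1}^{n}\left[j - \tfrac{1}{2}(n+1)\right]e_{(j)}}{n^2 s}$ with $s = \sqrt{\tfrac{1}{n}\sum_{j=1}^{n}(e_{(j)} - \bar{e})^2}$, and $Y = \dfrac{D - 0.28209479}{0.02998598/\sqrt{n}}$

R.R. (two sided test) Reject $H_0$ if $Y < Y_{0.025}$ or $Y > Y_{0.975}$.

$Y_{0.025}$ and $Y_{0.975}$ come from a table of percentiles of the Y statistic.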
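A hedged sketch of the D’Agostino computation, assuming the standard form of the statistic: $D = \sum_j [j - (n+1)/2]\,e_{(j)} / (n^2 s)$ with $s$ computed using divisor $n$, and $Y = (D - 0.28209479)/(0.02998598/\sqrt{n})$. The function name and the data are hypothetical, and the sample is deliberately tiny so the arithmetic can be checked by hand; in practice the test is meant for $n \ge 50$ and the percentiles of Y still come from a table.

```python
import math

def dagostino_d(residuals):
    """Return (D, Y) for D'Agostino's test; Y is the standardized form of D."""
    e = sorted(residuals)
    n = len(e)
    e_bar = sum(e) / n
    # Standard deviation with divisor n (not n - 1)
    s = math.sqrt(sum((x - e_bar) ** 2 for x in e) / n)
    # Weighted sum of the ordered values, weights j - (n + 1) / 2
    t_stat = sum((j - (n + 1) / 2) * x for j, x in enumerate(e, start=1))
    d = t_stat / (n ** 2 * s)
    y = (d - 0.28209479) / (0.02998598 / math.sqrt(n))
    return d, y

# Hypothetical, evenly spaced values (illustration only)
d, y = dagostino_d([3, 1, 5, 2, 4])
print(f"D = {d:.6f}, Y = {y:.4f}")
```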

Transformations to Achieve Homoscedasticity
• What can we do if the homoscedasticity (equal variances) assumption is rejected?
• Declare that the AOV model is not an adequate model for the data. Look for alternative models. (Later.)
• Try to “cheat” by forcing the data to be homoscedastic through a transformation of the response variable Y. (Variance Stabilizing Transformations.)
Square Root Transformation

Response is positive and continuous.

$Z = \sqrt{Y}$

This transformation works when we notice the variance changes as a linear function of the mean: $\sigma^2 = k\mu$, $k > 0$.

• Useful for count data (Poisson distributed).
• For small values of Y, use Y + 0.5 in place of Y.

Typical use: Counts of items when counts are between 0 and 10.

Logarithmic Transformation

Response is positive and continuous.

$Z = \ln(Y)$

This transformation tends to work when the variance is a linear function of the square of the mean: $\sigma^2 = k\mu^2$, $k > 0$.

• Replace Y by Y+1 if zero occurs.
• Useful if effects are multiplicative (later).
• Useful if there is considerable heterogeneity in the data.

Typical use: Growth over time. Concentrations. Counts of items when counts are greater than 10.

Arcsine Square Root Transformation

Response is a proportion.

$Z = \arcsin\sqrt{Y}$

With proportions, the variance is a linear function of the mean times (1-mean), $\sigma^2 = k\mu(1-\mu)$, where the sample mean is the expected proportion.

• Y is a proportion (decimal between 0 and 1).
• Zero counts should be replaced by 1/4, and N by N-1/4, before converting to percentages.

Typical use: Proportion of seeds germinating. Proportion responding.

Reciprocal Transformation

Response is positive and continuous.

$Z = 1/Y$

This transformation works when the variance is a linear function of the fourth power of the mean: $\sigma^2 = k\mu^4$, $k > 0$.

• Use Y+1 if zero occurs.
• Useful if the reciprocal of the original scale has meaning.

Typical use: Survival time.

Power Family of Transformations (1)

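Suppose we apply the power transformation: $Z = Y^p$

Suppose the true situation is that the standard deviation is proportional to the k-th power of the mean: $\sigma_Y \propto \mu^k$ (equivalently, the variance is proportional to $\mu^{2k}$).

In the transformed variable we will have, by a first-order Taylor expansion, $\sigma_Z \approx |p|\,\mu^{p-1}\sigma_Y \propto \mu^{p+k-1}$

If p is taken as 1-k, then the variance of Z will not depend on the mean, i.e. the variance will be constant. This is a variance stabilizing transformation. For example, variance linear in the mean gives k = 1/2 and p = 1/2 (square root); variance proportional to the squared mean gives k = 1 and p = 0 (logarithm).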

Power Family of Transformations (2)

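With replicated data, k can sometimes be found empirically by fitting: $\log s_i = a + k\log\bar{y}_i$, where $s_i$ and $\bar{y}_i$ are the sample standard deviation and mean of the $i$-th group.

Estimate: $\hat{p} = 1 - \hat{k}$

k can be estimated by least squares (regression – Next Unit).

If $\hat{p}$ is zero use the logarithmic transformation; otherwise the suggested transformation is $Z = Y^{\hat{p}}$.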
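A minimal sketch of the empirical estimate, assuming the working model is $\log s_i = a + k\log\bar{y}_i$ (group standard deviation against group mean on the log scale) and that the suggested power is $p = 1 - k$. The function name and the data are hypothetical; the data are constructed so that the standard deviation is exactly proportional to the mean.

```python
import math
from statistics import mean, stdev

def estimate_power(groups):
    """Least-squares slope k of log(s_i) on log(ybar_i), and the power p = 1 - k."""
    x = [math.log(mean(g)) for g in groups]   # log group means
    y = [math.log(stdev(g)) for g in groups]  # log group standard deviations
    x_bar, y_bar = mean(x), mean(y)
    k_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return k_hat, 1 - k_hat

# Hypothetical replicated data: spread doubles whenever the mean doubles
groups = [
    [9, 10, 11],
    [18, 20, 22],
    [36, 40, 44],
]
k_hat, p_hat = estimate_power(groups)
print(f"k = {k_hat:.3f}, p = {p_hat:.3f}")  # p near 0 points to the log transformation
```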

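Box/Cox family: $Z = \dfrac{Y^{\lambda} - 1}{\lambda\,\dot{Y}^{\lambda-1}}$ for $\lambda \ne 0$ and $Z = \dot{Y}\ln Y$ for $\lambda = 0$, where $\dot{Y}$ is the geometric mean of the original data.

The exponent, $\lambda$, is unknown. Hence the model can be viewed as having an additional parameter which must be estimated (choose the value of $\lambda$ that minimizes the residual sum of squares).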
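One way to picture "choose the exponent that minimizes the residual sum of squares" is a simple grid search. The sketch below assumes the scaled Box/Cox form $Z = (Y^{\lambda} - 1)/(\lambda\,\dot{Y}^{\lambda-1})$, with $Z = \dot{Y}\ln Y$ at $\lambda = 0$ and $\dot{Y}$ the geometric mean, applied to a one-way layout. The function names, the grid, and the data are all hypothetical.

```python
import math

def boxcox(y, lam, gm):
    """Scaled Box/Cox transform of a positive value y; gm is the geometric mean."""
    if abs(lam) < 1e-12:
        return gm * math.log(y)
    return (y ** lam - 1) / (lam * gm ** (lam - 1))

def residual_ss(groups, lam):
    """Within-group (residual) sum of squares of the transformed one-way data."""
    all_y = [y for g in groups for y in g]
    gm = math.exp(sum(math.log(y) for y in all_y) / len(all_y))
    sse = 0.0
    for g in groups:
        z = [boxcox(y, lam, gm) for y in g]
        z_bar = sum(z) / len(z)
        sse += sum((v - z_bar) ** 2 for v in z)
    return sse

# Hypothetical positive responses in t = 3 groups
groups = [
    [9, 10, 11],
    [18, 20, 22],
    [36, 40, 44],
]
grid = [i / 10 for i in range(-20, 21)]  # candidate exponents from -2 to 2
best_lambda = min(grid, key=lambda lam: residual_ss(groups, lam))
print(f"lambda with smallest residual SS on the grid: {best_lambda}")
```

The scaling by the geometric mean is what makes residual sums of squares comparable across different exponents.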

Handling Heterogeneity

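[Flowchart: Regression? If no (ANOVA), fit the effect model and test for homoscedasticity. If yes, fit the linear model and plot the residuals. If the test accepts, or the residual plot is OK, the analysis proceeds. If the test rejects, or the plot is not OK, transform the response using the Box/Cox family or the power family and work with the transformed data.]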

Transformations to Achieve Normality

[Flowchart: Regression? If yes, fit the linear model. If no (ANOVA), estimate the group means. Check the residuals with a probability plot and formal tests. Residuals normal? If yes, OK. If no, transform the response or use a different model.]