Lecture 7 Outline

Lecture 7 Outline • Levene’s test for equality of variances (4.5.3) • Interpretation of p-values (2.5.1) • Robustness and resistance of t-tools (3.1-3.4)

Bumpus’ Data Revisited • Bumpus concluded that sparrows were subjected to stabilizing selection – birds that were markedly different from the average were more likely to have died. • Bumpus (1898): “The process of selective elimination is most severe with extremely variable individuals, no matter in what direction the variations may occur. It is quite as dangerous to be conspicuously above a certain standard of organic excellence as it is to be conspicuously below the standard. It is the type that nature favors.” • Bumpus’ hypothesis is that the variance of physical characteristics in the survivor group should be smaller than the variance in the perished group

Testing Equal Variances • Two independent samples from populations with variances $\sigma_1^2$ and $\sigma_2^2$ • $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$ • Levene’s Test – Section 4.5.3 • In JMP, use Fit Y by X; under the red triangle next to Oneway Analysis of humerus By group, click Unequal Variances and use Levene’s test. • p-value = .4548: no evidence that the variances differ, and thus no evidence for Bumpus’ hypothesis.
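A minimal sketch of the two-group Levene statistic, using only the Python standard library. It assumes the classic mean-centered version: take each observation's absolute deviation from its own group mean, then compare the two groups' average deviations with a pooled two-sample t-test; Levene's $W$ (an F statistic) is the square of that $t$. Median-centered variants also exist, so this may not match every software report digit for digit. The standard library has no t or F distribution, so the sketch stops at the statistic; the p-value would come from the upper tail of $F(1, df)$. The function name and data are hypothetical, not Bumpus' measurements.

```python
from math import sqrt
from statistics import mean, variance


def levene_t(y1, y2):
    """Two-group Levene statistic (mean-centered version, assumed here).

    Returns (t, df): the pooled two-sample t-statistic computed on absolute
    deviations from each group's own mean. With two groups, Levene's W
    (F with 1 and df degrees of freedom) equals t ** 2.
    A positive t means group 2 is more spread out than group 1.
    """
    m1, m2 = mean(y1), mean(y2)
    z1 = [abs(y - m1) for y in y1]  # absolute deviations, group 1
    z2 = [abs(y - m2) for y in y2]  # absolute deviations, group 2
    n1, n2 = len(z1), len(z2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(z1) + (n2 - 1) * variance(z2)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(z2) - mean(z1)) / se, df


# Hypothetical toy numbers: same center, group 2 more spread out.
t, df = levene_t([1, 2, 3], [0, 2, 4])
print(t ** 2, df)
```

For Bumpus' one-sided question (survivors less variable), the sign of $t$ shows which group has the larger average deviation.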

t-tests for randomized experiments • Section 2.4 • The t-test (with its associated Student t distribution under H0) was developed in Ch. 2 for making inferences to populations using the random sampling probability model. • In Ch. 1, we studied making causal inferences in the additive treatment effect model using the probability model of a randomized experiment. • The two-sample t-statistic is a reasonable test statistic for testing H0: the additive treatment effect is 0; it is basically equivalent to the difference in averages $\bar{Y}_2 - \bar{Y}_1$ used in Ch. 1.
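For reference, the pooled two-sample t-statistic in the notation assumed here: $\bar{Y}_i$, $s_i$, $n_i$ are the average, S.D., and size of group $i$, and $\delta$ is the hypothesized additive treatment effect (0 under H0). When the combined responses are held fixed, as they are under H0 in a randomized experiment, the within-group sum of squares equals the fixed total sum of squares minus a term that grows with $|\bar{Y}_2 - \bar{Y}_1|$. So $|t|$ is an increasing function of $|\bar{Y}_2 - \bar{Y}_1|$, and the two statistics rank the possible re-randomizations the same way.

```latex
t = \frac{(\bar{Y}_2 - \bar{Y}_1) - \delta}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
```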

t-test for randomized experiments cont. • When the two group sizes are large, the t-test provides an approximately correct p-value for a randomized experiment; i.e., under the null hypothesis of an additive treatment effect of 0, the distribution of the t-statistic is approximately a t distribution with $n_1 + n_2 - 2$ degrees of freedom. • See Display 2.11 • Bottom line: the t-test in JMP can be used to make approximately correct inferences (p-values and CIs) for randomized experiments, but inferences should be phrased in terms of additive treatment effects rather than differences in population means.
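Because the t distribution only approximates the randomization distribution, it can be useful to compute the exact randomization p-value when the groups are small. A minimal sketch, assuming H0: additive treatment effect = 0, so each unit's response would be the same whichever group it was assigned to; the code enumerates every split of the pooled responses into groups of the observed sizes. The function names and toy numbers are hypothetical, and each group needs at least 2 observations with some spread so the pooled S.D. is nonzero.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def pooled_t(y1, y2):
    """Two-sample t-statistic (group 2 minus group 1) with pooled S.D."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y2) - mean(y1)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def diff_in_averages(y1, y2):
    """The Ch. 1 test statistic: difference in group averages."""
    return mean(y2) - mean(y1)


def randomization_p_value(y1, y2, stat=pooled_t):
    """Exact two-sided randomization p-value for H0: additive effect = 0."""
    pooled = list(y1) + list(y2)
    n, n1 = len(pooled), len(y1)
    observed = abs(stat(y1, y2))
    extreme = total = 0
    # Each combination of indices is one possible assignment to group 1.
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance so exact ties with the observed value are counted.
        if abs(stat(g1, g2)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


print(randomization_p_value([1, 2, 3], [4, 5, 6]))  # 0.1 (2 of 20 splits)
print(randomization_p_value([1, 2, 3], [4, 5, 6], stat=diff_in_averages))  # 0.1
```

Full enumeration grows combinatorially; for realistic group sizes one would instead sample a few thousand random reassignments and report the fraction at least as extreme.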

Notes about tests, p-values • Interpretation of p-value: • Formally: the probability that random sampling (or random assignment) would produce a test statistic at least as extreme as the observed one if H0 is true. • Informally, the degree of credibility in H0. • Conclusions from p-values • (a) Small p-values mean either (i) H0 is wrong or (ii) we obtained an unusual sample. • (b) Large p-values mean either (i) H0 is correct or (ii) the study isn’t large enough to conclude otherwise (i.e., the data are consistent with H0 being true but do not prove it).

Conceptual Question 2.8 • Suppose the following statement is made in a statistical summary: “A comparison of breathing capacities in individuals in households with low nitrogen dioxide levels and individuals in households with high nitrogen dioxide levels indicated that there is no difference in the means (two-sided p-value =.24).” What is wrong with this statement?

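Interpretation of p-values • So which p-values are small, and which are large? • For reference, the chance of all heads in n tosses of a fair coin: • 3 heads in 3 tosses: .125 • 4 heads in 4 tosses: .063 • 5 heads in 5 tosses: .031 • 6 heads in 6 tosses: .016 • 7 heads in 7 tosses: .008 • 8 heads in 8 tosses: .004 • See Display 2.12 for a subjective guide.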
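The coin-toss reference values are $(1/2)^n$ for n = 3 through 8; a quick check prints them before rounding.

```python
def all_heads_probability(n):
    """Chance that n tosses of a fair coin all land heads: (1/2) ** n."""
    return 0.5 ** n


for n in range(3, 9):
    print(f"{n} heads in {n} tosses: {all_heads_probability(n)}")
```

The exact values are 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, and 0.00390625, which round to the figures listed.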

Closer Look at Assumptions • Chapter 3 • The t-tests and CIs are based on the assumptions that • (i) the population distributions are normal • (ii) the population distributions have the same S.D. • (iii) the sample observations are independent • These ideal assumptions, particularly (i) and (ii), are never met exactly.

Usefulness of t-tools • The t-tests and CIs are still quite usable if we • understand their robustness and resistance • consider transformations, e.g., log(Y) • have a strategy for outliers • are prepared to label inferences as “approximate” • additionally, we have alternative tools (Ch. 4)

Case study 3.1.2: Effect of Agent Orange • Many Vietnam veterans are concerned that their health may have been affected by exposure to Agent Orange, a herbicide sprayed in South Vietnam between 1962 and 1970. • A particularly worrisome component of Agent Orange is a dioxin called TCDD, which in high doses is known to be associated with certain cancers. • Nonrandom samples of 646 Vietnam vets and 97 non-Vietnam vets who entered the Army between 1965 and 1971 and served only in the U.S. or Germany; dioxin levels of both samples were measured in 1987. • Question of interest: Are current (1987) dioxin levels higher in the population of Vietnam vets?

Robustness of two-sample t-tools • A statistical procedure is robust to departures from a particular assumption if it is valid even when the assumption is not met exactly. • Valid means that the uncertainty measures (the confidence levels and p-values) are nearly equal to the stated rates; e.g., a procedure for obtaining a 95% confidence interval is valid if it is roughly 95% successful in capturing the parameter. • Statisticians know something about robustness from advanced theory and computer simulation.

How important is normality? • If the sample sizes are large, the t-tests will be valid no matter how nonnormal the populations are. • If the two populations have the same S.D. and approximately the same shape, and if $n_1 \approx n_2$, the validity of the t-tools is affected moderately by long-tailedness and very little by skewness. • See Display 3.4 • See Section 3.2 for how the t-tools are affected by departures from normality and equal S.D. in other situations.
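The claim about skewness can be checked the way the robustness slide says statisticians often do: by computer simulation. A minimal sketch, assuming two equal-size samples (n = 30 each) from the same exponential population, so H0 is true and the populations share S.D. and shape but are strongly right-skewed. The critical value 2.0017 (two-sided 5%, 58 df) is hard-coded as an assumption because the standard library has no t quantile function; `scipy.stats.t.ppf(0.975, 58)` or a t table gives it. If the t-test is robust in this setting, the rejection rate should land near the nominal 0.05.

```python
import random
from math import sqrt
from statistics import mean, variance


def pooled_t(y1, y2):
    """Two-sample t-statistic (group 2 minus group 1) with pooled S.D."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y2) - mean(y1)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def rejection_rate(n=30, reps=2000, crit=2.0017, seed=1):
    """Estimated Type I error of the nominal 5% two-sided t-test when both
    groups come from the same right-skewed (exponential) population.
    crit is an assumed t critical value and must match df = 2n - 2."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y1 = [rng.expovariate(1.0) for _ in range(n)]
        y2 = [rng.expovariate(1.0) for _ in range(n)]
        if abs(pooled_t(y1, y2)) > crit:
            rejections += 1
    return rejections / reps


print(rejection_rate())  # expected to be near 0.05
```

Changing one sample size, or drawing the two groups from populations with different shapes, is an easy way to explore the situations Section 3.2 describes.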

Departures from Independence • Independence: knowledge of one observation can’t help to predict another. • Common violations of the independence assumption: • Cluster effects (Y’s from the same cluster, e.g., litters, are similar) • Serial effects (Y’s close together in time or space are similar) • Effect of lack of independence on the validity of t-tools: the t-ratio no longer has a t-distribution, and the t-tools may give misleading results. • If cluster effects occur in pairs, use the matched pairs t-test. • If we suspect other types of non-independence, use the tools of Ch. 9-15.
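A simulation makes the cluster effect concrete. A minimal sketch under assumed settings: each group has 5 clusters (think litters) of 6 observations, every observation in a cluster shares a normal cluster effect with S.D. `cluster_sd`, plus independent N(0, 1) noise, and both groups come from the same model, so H0 is true. The t-test wrongly treats all 30 observations per group as independent. The critical value 2.0017 for 58 df is hard-coded as an assumption, and all names and settings are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def pooled_t(y1, y2):
    """Two-sample t-statistic (group 2 minus group 1) with pooled S.D."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y2) - mean(y1)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def clustered_sample(rng, clusters, per_cluster, cluster_sd):
    """Observations in the same cluster share a random cluster effect,
    so they are positively correlated when cluster_sd > 0."""
    data = []
    for _ in range(clusters):
        effect = rng.gauss(0.0, cluster_sd)
        data.extend(effect + rng.gauss(0.0, 1.0) for _ in range(per_cluster))
    return data


def rejection_rate(cluster_sd, clusters=5, per_cluster=6, reps=2000,
                   crit=2.0017, seed=2):
    """Type I error of a nominal 5% t-test that ignores the clustering.
    crit is an assumed t critical value for df = 2 * 30 - 2 = 58."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y1 = clustered_sample(rng, clusters, per_cluster, cluster_sd)
        y2 = clustered_sample(rng, clusters, per_cluster, cluster_sd)
        if abs(pooled_t(y1, y2)) > crit:
            hits += 1
    return hits / reps


print(rejection_rate(cluster_sd=0.0))  # independent data: near 0.05
print(rejection_rate(cluster_sd=1.0))  # clustered data: well above 0.05
```

With clustering, the naive standard error is too small, so the t-test rejects a true H0 far more often than 5% of the time.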

Recognizing Matched Pairs Studies • Is there some natural relationship between the first observation in group 1 and the first observation in group 2 that makes it more appropriate to compare those two than to compare the first observation in group 1 with the second observation in group 2? • Before and after designs • Example: A researcher for OSHA wants to see whether cutbacks in enforcement of safety regulations coincided with an increase in work-related accidents. For 20 industrial plants, she has the number of accidents in 1980 and in 1995.
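In a before-and-after design like the OSHA example, each plant is its own pair, so the analysis works with the per-plant differences. A minimal sketch of the matched pairs t-statistic (a one-sample t on the differences); the function name and the 3-plant numbers are hypothetical, since the OSHA data are not given here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Matched pairs t-statistic: a one-sample t on the within-pair
    differences (after - before). Returns (t, df) with df = pairs - 1."""
    if len(before) != len(after):
        raise ValueError("each unit needs both a before and an after value")
    d = [a - b for b, a in zip(before, after)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


# Hypothetical accident counts for 3 plants, not real OSHA data.
t, df = paired_t(before=[1, 2, 3], after=[2, 4, 4])
print(t, df)
```

Pairing removes plant-to-plant variation from the comparison, which a two-sample t-test on the 1980 and 1995 columns would wrongly treat as independent noise.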

Outliers and resistance • Outliers are observations relatively far from their estimated means. • Outliers may arise either • (a) because the population distribution is long-tailed, or • (b) because they don’t belong to the population of interest (they come from a contaminating population). • A statistical procedure is resistant if one or a few outliers cannot have an undue influence on the result.

Resistance • Illustration for understanding resistance: the sample mean is not resistant; the sample median is. • Sample: 9, 3, 5, 8, 100 • Mean with outlier: 25; without: 6.25 • Median with outlier: 8; without: 6.5 • The t-tools are not resistant to outliers because they are based on sample means.
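The arithmetic in the resistance illustration can be reproduced with Python's standard `statistics` module, using the sample from the slide with 100 as the outlier.

```python
from statistics import mean, median

sample = [9, 3, 5, 8, 100]
without_outlier = [y for y in sample if y != 100]

print(mean(sample), mean(without_outlier))      # 25 6.25
print(median(sample), median(without_outlier))  # 8 6.5
```

Dropping one value moves the mean from 25 to 6.25 but moves the median only from 8 to 6.5.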

Practical two-sample strategy • Think about independence; use tools from later in the course (or matched pairs) if there’s a potential problem. • Use graphical displays to assess normality, spread, and outliers. • If there are outliers, investigate them and see whether they (i) change the conclusions and (ii) warrant removal. Follow the outlier examination strategy in Display 3.6.

Excluding Observations from Analysis in JMP for Investigating Outliers • Click on the row you want to exclude. • Click on the Rows menu and then click Exclude/Unexclude. A red circle with a line through it will appear next to the excluded observation. • Multiple observations can be excluded. • To include an excluded observation back in the analysis, click on the excluded row, then click on the Rows menu and click Exclude/Unexclude. The red circle next to the observation should disappear.
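Outside JMP, the same with/without comparison can be made without deleting any data. A minimal sketch, assuming the pooled two-sample t-statistic as the summary and 0-based row positions for the flagged observations; the function name and toy numbers are hypothetical. It reports the t-statistic with all rows and with the flagged rows left out, which addresses question (i) of the strategy: do the outliers change the conclusions?

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(y1, y2):
    """Two-sample t-statistic (group 2 minus group 1) with pooled S.D."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y2) - mean(y1)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def compare_with_exclusion(y1, y2, exclude1=(), exclude2=()):
    """Return the t-statistic on all rows and with the listed row positions
    left out. The input lists are not modified, in the spirit of JMP's
    Exclude/Unexclude rather than deleting rows."""
    drop1, drop2 = set(exclude1), set(exclude2)
    kept1 = [y for i, y in enumerate(y1) if i not in drop1]
    kept2 = [y for i, y in enumerate(y2) if i not in drop2]
    return pooled_t(y1, y2), pooled_t(kept1, kept2)


# Hypothetical toy numbers: the last value in group 2 is the suspect one.
all_rows, without = compare_with_exclusion([1, 2, 3], [2, 3, 4, 30], exclude2=[3])
print(round(all_rows, 3), round(without, 3))
```

Here the extreme value raises group 2's average but inflates the pooled S.D. even more, so including it actually shrinks the t-statistic; that kind of shift is what the with/without comparison is meant to reveal.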

Conceptual Question #6 • (a) What course of action would you propose for the statistical analysis if it were learned that Vietnam veteran #646 (the largest observation in Display 3.6) worked for several years after Vietnam handling herbicides with dioxin? • (b) What would you propose if this were learned instead for Vietnam veteran #645 (the second largest observation)?
