- 116 Views
- Uploaded on
- Presentation posted in: General

STAT 3130

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

STAT 3130

Guest Speaker:

Ashok Krishnamurthy, Ph.D.

Department of Mathematical and Statistical Sciences

24 January 2011

Correspondence: [email protected]

- A brief overview of STAT 3120
- One-way Analysis of Variance (ANOVA)
- ANOVA example
- Implementing ANOVA in R statistical programming language

- Catalog description
- A SAS/SPSS based course aimed at providing students with a foundation in statistical methods, including review of descriptive statistics, confidence intervals, hypothesis testing, t-tests, basic Regression and Chi-Square tests.

- Statistical inference is the process of drawing conclusions from data that are subject to random variation.
- The conclusion of a statistical inference is a statistical proposition.
- Estimating the mean and variance of a distribution.
- Confidence interval estimation the mean and variance of a distribution.
- Hypothesis tests on the mean and variance of a distribution.

- There is at least one parameter whose value is to be approximated on the basis of a sample.
- The approximation is done using an appropriate statistic.
- This statistic is called a point estimator for .

- In the estimation problem there is no preconceived notion concerning the actual value of the parameter .
- In contrast, when testing a hypothesis on , there is a preconceived notion concerning its value.
- There are two theories,
- The hypothesis proposed by the experimenter, denoted H1
- The negation of H1, denoted H0

(or)

Independent samples of sizes n1and n2.

(or)

(or)

How Many

Dependent

Variables?

What Type

Of Outcome?

How Many

Predictors?

If Categorical Predictor,

Same Participants or Different in each category?

Does Data Meet

Parametric Assumptions?

What type

Of predictors?

If Categorical Predictor,

How many Categories?

ANALYSIS TOOL

Yes

Independent T-test

Different

No

Mann-Whitney Test

Two

Yes

Paired T-test

Same

No

Wilcoxon Rank Sum

Categorical

Yes

One Way ANOVA

Different

No

Kruskall Wallis Test

Three +

One

Yes

Repeated Measures ANOVA

Same

No

Friedman’s ANOVA

Yes

Pearson Correlation or Regression

Continuous

Yes

No

Spearman Correlation or Kendall’s Tau

Continuous

Yes

Ind. Factorial ANOVA or Regression

Different

Categorical

Same

No

Factorial Repeated Measures ANOVA

Yes

Factorial Mixed ANOVA

Both

One

Continuous

Yes

Multiple Regression

Two +

Both

Yes

Multiple Regression/ANCOVA

Categorical

Different

Pearson Chi-Square or Likelihood Ratio

One

Continuous

Logistic Regression

Categorical

Categorical

Different

Loglinear Analysis

Two +

Continuous

Logistic Regression/Discriminant

Both

Different

One

Categorical

Yes

MANOVA

Two +

Continuous

Categorical

Yes

Factorial MANOVA

Two +

Both

Yes

MANCOVA

- It is often necessary to compare many populations for a quantitative variable.
- That is, we may want to compare the mean outcome over several populations to determine whether they have the same mean outcome and if not, where differences exist.
- The standard method of analysis for these types of problems is the one-way Analysis of Variance, often abbreviated ANOVA

- You might be tempted to use t-tests to make such comparisons. Why would this be difficult?
# groups# pair-wise test

3 3

4 6

5 10

6 15

7 21

and so on….

- The method of ANOVA allow for comparison of the mean over more than two independent groups.
- In particular, it tests the following hypotheses for comparing over k groups:

- Populations have normal distributions
- Population standard deviations are equal
- Observations are independent, both within and between samples

- A rejection of the null hypothesis tells us that there is at least one group with a differing mean (though there could be more than one group that is different).
- If we do not reject the null hypothesis, then we can only conclude that there is no significant difference among the groups.

- Total variation in a measured response is partitioned into components that can be attributed to recognizable sources of variation.
- For example, suppose we wish to investigate the sulfur content of 5 coal reams in a certain geographical region. Then we would test,

* See page 682 for a general format of a One-Way ANOVA Table

A biologist is doing research on elk in their natural Colorado habitat. Three regions are under study, each region having about the same amount of forage and natural cover.

To determine if there is a difference in elk life spans between the three regions, a sample of 6, 5, and 6 mature elks from each region are tranquilized and have a tooth removed.

A laboratory examination of the teeth reveals the ages of the elk. Results for each sample are given in the below table.

Are there differences in age (elk life spans) over the different regions?

If so, where are such differences occurring?

- Free software for statistical computing and graphics: http://www.r-project.org/
- Developed at Bell Laboratories
- Considered a baby version of S/S+
- S+ sells for about $2000/year subscription

> elk <- read.csv("elk.csv", sep=",", header=T)

> boxplot(elk$Age~ elk$Region, ylab = "Age", xlab = "Region", main = "Boxplot for Elk data")

> Elk.ANOVA <- aov(elk$Age ~ elk$Region)

> summary(Elk.ANOVA)

Source Df Sum Sq Mean Sq F value Pr(>F)___

elk$Region 2 48.000 24.000 5.0909 0.0218 *

Residuals 14 66.000 4.714

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

- Bonferronicorrection
- Tukey’s HSD test
- Fisher’s LSD
- Newman-Keul test
- Scheffe method

- When we consider the effect of a factor, it can be either fixed or random. If we are interested in the particular levels of a factor, then it is fixed, e.g., gender, socio-economic class, fertilizer, drug. If we are not interested in the particular levels, but rather have selected the levels to make inference about the factor, then the factor is random.
- For example, what if there was an effect of hospital on a person’s recovery? A random sample of hospitals would allow us to study this relationship. Here we are interested in whether there is a relationship rather than describing an effect for each individual hospital. These types of models are called random effects models.