
STAT 3130

Guest Speaker:

Ashok Krishnamurthy, Ph.D.

Department of Mathematical and Statistical Sciences

24 January 2011

Correspondence: [email protected]



Outline

  • A brief overview of STAT 3120

  • One-way Analysis of Variance (ANOVA)

  • ANOVA example

  • Implementing ANOVA in the R statistical programming language



A quick review of STAT 3120

  • Catalog description

    • A SAS/SPSS-based course aimed at providing students with a foundation in statistical methods, including a review of descriptive statistics, confidence intervals, hypothesis testing, t-tests, basic regression, and chi-square tests.



Statistical Inference

  • Statistical inference is the process of drawing conclusions from data that are subject to random variation.

  • The conclusion of a statistical inference is a statistical proposition.

    • Estimating the mean and variance of a distribution.

    • Confidence interval estimation of the mean and variance of a distribution.

    • Hypothesis tests on the mean and variance of a distribution.



Theory of point estimation

  • There is at least one parameter θ whose value is to be approximated on the basis of a sample.

  • The approximation is done using an appropriate statistic.

  • This statistic is called a point estimator for θ.
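For example, for a random sample $X_1, \ldots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$, the usual point estimators are
$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\sigma}^2 = S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$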



CI for population mean, μ
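For a random sample of size $n$ from a normal population, the standard $100(1-\alpha)\%$ confidence intervals for $\mu$ are
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad (\sigma \text{ known}) \qquad \text{or} \qquad \bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \quad (\sigma \text{ unknown}).$$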



CI for population variance
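For a sample of size $n$ from a normal population, the standard $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$\left( \frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}} \right).$$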



CI for difference between two normal population means
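For independent samples of sizes $n_1$ and $n_2$, the standard $100(1-\alpha)\%$ intervals for $\mu_1 - \mu_2$ are
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \quad (\text{variances known})$$
or, with unknown but equal variances,
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}.$$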




Hypothesis Testing

  • In the estimation problem there is no preconceived notion concerning the actual value of the parameter θ.

  • In contrast, when testing a hypothesis on θ, there is a preconceived notion concerning its value.

  • There are two competing hypotheses:

    • The hypothesis proposed by the experimenter, denoted H1

    • The negation of H1, denoted H0



Tests concerning the mean of one normal population
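For $H_0\!: \mu = \mu_0$, the usual test statistics are
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \quad (\sigma \text{ known}) \qquad \text{or} \qquad T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{n-1} \quad (\sigma \text{ unknown}).$$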




Tests concerning the difference between two normal population means

Independent samples of sizes n1 and n2.
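For $H_0\!: \mu_1 - \mu_2 = \delta_0$ (often $\delta_0 = 0$), the usual test statistics are
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \quad (\text{variances known}) \qquad \text{or} \qquad T = \frac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{s_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2} \quad (\text{variances unknown, assumed equal}),$$
with $s_p^2$ the pooled sample variance.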




Tests concerning variance of one normal population
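For $H_0\!: \sigma^2 = \sigma_0^2$, the test statistic is
$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2},$$
which has a $\chi^2_{n-1}$ distribution under $H_0$.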



Tests concerning ratio of variances of two normal populations
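For $H_0\!: \sigma_1^2 = \sigma_2^2$, the test statistic is
$$F = \frac{S_1^2}{S_2^2},$$
which has an $F_{n_1-1,\,n_2-1}$ distribution under $H_0$.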


Choosing an analysis tool

The appropriate analysis depends on: how many dependent variables there are, what type of outcome is measured, how many predictors there are and of what type, whether the same or different participants appear in each category of a categorical predictor, and whether the data meet parametric assumptions.

  • One continuous dependent variable
    • One categorical predictor, two categories
      • Different participants: Independent t-test (parametric) or Mann-Whitney test (non-parametric)
      • Same participants: Paired t-test (parametric) or Wilcoxon signed-rank test (non-parametric)
    • One categorical predictor, three or more categories
      • Different participants: One-way ANOVA (parametric) or Kruskal-Wallis test (non-parametric)
      • Same participants: Repeated-measures ANOVA (parametric) or Friedman's ANOVA (non-parametric)
    • One continuous predictor: Pearson correlation or regression (parametric), or Spearman correlation or Kendall's tau (non-parametric)
    • Two or more categorical predictors
      • Different participants: Independent factorial ANOVA or regression
      • Same participants: Factorial repeated-measures ANOVA
      • Both: Factorial mixed ANOVA
    • Two or more continuous predictors: Multiple regression
    • Two or more predictors of both types: Multiple regression / ANCOVA
  • One categorical dependent variable
    • One categorical predictor, different participants: Pearson chi-square or likelihood ratio
    • One continuous predictor: Logistic regression
    • Two or more categorical predictors, different participants: Loglinear analysis
    • Two or more continuous or mixed predictors: Logistic regression or discriminant analysis
  • Two or more continuous dependent variables, different participants
    • One categorical predictor: MANOVA
    • Two or more categorical predictors: Factorial MANOVA
    • Two or more predictors of both types: MANCOVA



Comparing several means

  • It is often necessary to compare several populations on a quantitative variable.

  • That is, we may want to compare the mean outcome over several populations to determine whether they have the same mean outcome and if not, where differences exist.

  • The standard method of analysis for these types of problems is the one-way Analysis of Variance, often abbreviated ANOVA.



Can we just use several pairwise t-tests?

  • You might be tempted to use t-tests to make such comparisons. Why would this be difficult?

    Number of groups (k)    Number of pairwise tests
           3                          3
           4                          6
           5                         10
           6                         15
           7                         21
    and so on…
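For $k$ groups there are $\binom{k}{2} = \frac{k(k-1)}{2}$ pairwise comparisons, and running each at level $\alpha$ inflates the familywise Type I error rate to roughly $1 - (1-\alpha)^m$ for $m$ tests; with $k = 7$ groups, for example, $m = 21$ tests at $\alpha = 0.05$ give a familywise error rate of about $1 - 0.95^{21} \approx 0.66$. ANOVA avoids this by comparing all group means with a single test.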



One-way ANOVA Contd.

  • The method of ANOVA allows for comparison of the mean over more than two independent groups.

  • In particular, it tests the following hypotheses for comparing k groups:
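With $\mu_i$ denoting the mean of group $i$,
$$H_0\!: \mu_1 = \mu_2 = \cdots = \mu_k \qquad \text{versus} \qquad H_a\!: \text{not all } \mu_i \text{ are equal}.$$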



Assumptions of ANOVA

  • Populations have normal distributions

  • Population standard deviations are equal

  • Observations are independent, both within and between samples



One-way ANOVA Contd.

  • A rejection of the null hypothesis tells us that there is at least one group with a differing mean (though there could be more than one group that is different).

  • If we do not reject the null hypothesis, then we can conclude only that the data provide no evidence of a significant difference among the group means.



One-way ANOVA procedure

  • Total variation in a measured response is partitioned into components that can be attributed to recognizable sources of variation.

  • For example, suppose we wish to investigate the sulfur content of 5 coal seams in a certain geographical region. Then we would test:
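With $\mu_i$ denoting the mean sulfur content of seam $i$,
$$H_0\!: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 \qquad \text{versus} \qquad H_a\!: \text{not all seam means are equal}.$$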



ANOVA Table
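The general layout of a one-way ANOVA table, for k groups and N total observations, is:

  Source       df      Sum of Squares   Mean Square           F
  Treatments   k - 1   SSTr             MSTr = SSTr/(k - 1)   F = MSTr/MSE
  Error        N - k   SSE              MSE  = SSE/(N - k)
  Total        N - 1   SSTotal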



Computational Shortcuts
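With $T_{i\cdot}$ the total for group $i$, $T_{\cdot\cdot}$ the grand total, $n_i$ the sample size in group $i$, and $N = \sum_i n_i$, the usual computational forms are
$$SS_{\text{Total}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \frac{T_{\cdot\cdot}^2}{N}, \qquad SS_{\text{Treatment}} = \sum_{i=1}^{k} \frac{T_{i\cdot}^2}{n_i} - \frac{T_{\cdot\cdot}^2}{N}, \qquad SS_{\text{Error}} = SS_{\text{Total}} - SS_{\text{Treatment}}.$$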



ANOVA Table

* See page 682 for a general format of a One-Way ANOVA Table



ANOVA Example

A biologist is doing research on elk in their natural Colorado habitat. Three regions are under study, each region having about the same amount of forage and natural cover.

To determine whether elk life spans differ among the three regions, samples of 6, 5, and 6 mature elk are taken from the respective regions; each animal is tranquilized and has a tooth removed.

A laboratory examination of the teeth reveals the ages of the elk in each sample.



ANOVA Example Contd.

Are there differences in age (elk life spans) over the different regions?

If so, where are such differences occurring?



ANOVA Example Contd.



R Programming Language

  • Free software for statistical computing and graphics: http://www.r-project.org/

  • An open-source implementation of the S language, which was developed at Bell Laboratories

  • Considered a baby version of S/S+

  • S-PLUS, the commercial implementation, sells for about a $2000/year subscription



R code to run an ANOVA (elk data)

> # Read the elk data (Age and Region columns) from a CSV file
> elk <- read.csv("elk.csv", header = TRUE)

> # Side-by-side boxplots of age by region
> boxplot(elk$Age ~ elk$Region, ylab = "Age", xlab = "Region", main = "Boxplot for Elk data")

> # Fit the one-way ANOVA and display the ANOVA table
> Elk.ANOVA <- aov(elk$Age ~ elk$Region)
> summary(Elk.ANOVA)



R output for ANOVA (elk data)

Source       Df  Sum Sq  Mean Sq  F value   Pr(>F)
elk$Region    2  48.000   24.000   5.0909   0.0218 *
Residuals    14  66.000    4.714
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
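Since the p-value (0.0218) is below 0.05, we reject $H_0$ and conclude that mean elk age differs for at least one pair of regions.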



Next class: Post-hoc tests

  • Bonferroni correction

  • Tukey’s HSD test

  • Fisher’s LSD

  • Newman-Keuls test

  • Scheffé method
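Two of these are available directly in base R. A minimal sketch, assuming the Elk.ANOVA fit and elk data frame from the earlier slides:

> # Tukey's HSD intervals for all pairwise region differences
> TukeyHSD(Elk.ANOVA)

> # Pairwise t-tests with a Bonferroni correction
> pairwise.t.test(elk$Age, elk$Region, p.adjust.method = "bonferroni")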



Fixed versus random effects

  • When we consider the effect of a factor, it can be either fixed or random. If we are interested in the particular levels of a factor (e.g., gender, socio-economic class, fertilizer, drug), then the factor is fixed. If the levels used are instead a sample from a larger population of possible levels, and we want to make inferences about the factor in general rather than about those particular levels, then the factor is random.

  • For example, suppose there might be an effect of hospital on a person’s recovery. A random sample of hospitals would allow us to study this relationship. Here we are interested in whether such a relationship exists rather than in describing an effect for each individual hospital. Models of this type are called random effects models; a rough R illustration is sketched below.
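A rough illustration, using a hypothetical data frame recovery with columns days and hospital, and the lme4 package (one common choice for random-effects models, not necessarily the tool used in this course):

> # Fixed effect: compare the specific hospitals observed in the data
> summary(aov(days ~ hospital, data = recovery))

> # Random effect: hospitals treated as a random sample of all hospitals
> library(lme4)   # install.packages("lme4") if needed
> summary(lmer(days ~ 1 + (1 | hospital), data = recovery))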

