
# Chi-Square and Analysis of Variance (ANOVA)


Lecture 9

## The Chi-Square Distribution and Test for Independence

Hypothesis testing between two or more categorical variables

### Chi-square Test of Independence

• Tests the association between two nominal (categorical) variables.

• Null hypothesis: the two variables are independent.

• It's really just a comparison between the frequencies observed in the cells of a crosstabulation table and the frequencies expected if the null hypothesis were true.
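
The expected frequencies mentioned above can be computed from the table margins: each expected count is its row total times its column total, divided by the overall N. The sketch below is a minimal illustration; the function name `expected_frequencies` and the counts are hypothetical and do not come from the lecture.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2x2 crosstab (made-up counts, for illustration only)
observed = [[30, 20],
            [10, 40]]
expected = expected_frequencies(observed)

# Print each observed row next to its expected row
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, exp_row)
```

The larger the gaps between the observed and expected rows, the stronger the evidence against independence.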

### Degrees of freedom

• Chi-square degrees of freedom

• df = (r - 1)(c - 1)

• Where r = number of rows, c = number of columns

• Thus, in any 2x2 contingency table, the degrees of freedom = 1.

• As the degrees of freedom increase, the distribution shifts to the right and the critical values of chi-square become larger.
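
As a quick check on the df rule, the sketch below computes the degrees of freedom for a few table shapes and looks up the matching critical value. The helper name `chi_square_df` is hypothetical; the critical values are the standard upper-tail table values at alpha = 0.05, and only df 1 through 4 are included here.

```python
# Standard upper-tail chi-square critical values at alpha = 0.05
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


for shape in [(2, 2), (2, 3), (3, 3)]:
    df = chi_square_df(*shape)
    print(shape, "df =", df, "critical value =", CRITICAL_05[df])
```

The critical value grows with df, which matches the rightward shift of the distribution.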

### Chi-Square Distribution

• The chi-square distribution with k degrees of freedom results when k independent variables with standard normal distributions are squared and summed.
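
This definition can be illustrated with a small simulation: draw standard normal values, square them, and sum them. The sketch below uses only Python's standard library; the function name, the seed, and the number of draws are arbitrary choices for illustration.

```python
import random


def simulate_chi_square(df, n_draws, seed=0):
    """Each draw is the sum of `df` squared standard normal values."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


draws = simulate_chi_square(df=3, n_draws=20000)
mean = sum(draws) / len(draws)

# The mean of a chi-square distribution equals its df,
# so this should print a value close to 3.
print(round(mean, 2))
```

Because every term is a square, no draw can be negative, which is why the distribution has only a right tail.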

### Requirements for Chi-Square test

• Must be a random sample from population

• Data must be in raw frequencies

• Observations must be independent (each case is counted in exactly one cell)

• Categories for each variable must be mutually exclusive and exhaustive

### Using the Chi-Square Test

• Often used with contingency tables (i.e., crosstabulations)

• E.g., gender x race

• Basically, the chi-square test of independence tests whether the columns are contingent on the rows in the table.

• In this case, the null hypothesis is that there is no relationship between row and column frequencies.
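
Putting the pieces together, the test statistic sums (observed - expected)² / expected over all cells. The sketch below is one way to compute it for a hypothetical table with made-up counts. It assumes no continuity correction, and the p-value helper is valid only for df = 1, where the upper-tail probability reduces to `math.erfc(sqrt(stat / 2))`.

```python
import math


def chi_square_statistic(observed):
    """Return (statistic, df) for a contingency table of raw frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail p-value, valid for df = 1 only."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical 2x2 crosstab (made-up counts, for illustration only)
observed = [[30, 20],
            [10, 40]]
stat, df = chi_square_statistic(observed)
p = p_value_df1(stat)
print(round(stat, 3), df, p < 0.05)
```

Here the statistic exceeds 3.841, the critical value for df = 1 at alpha = 0.05, so the null hypothesis of independence would be rejected for these made-up counts.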

### Practical Example:

• Expected frequencies versus observed frequencies

• General Social Survey Example

## ANOVA and the F-distribution

Hypothesis testing between a 3+ category variable and a metric variable

### Analysis of Variance

• In its simplest form, it is used to compare means for three or more categories.

• Example:

• Life Happiness scale and Marital Status (married, never married, divorced)

• Relies on the F-distribution

• Just like the t-distribution and chi-square distribution, there is a different sampling distribution for each possible value of df (for F, each combination of numerator and denominator df).

### What is ANOVA?

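• If we have a categorical variable with 3 categories and a metric/scale variable, we could just run 3 pairwise t-tests.

• The problem is that the 3 tests would not be independent of each other (they reuse the same groups, so each test repeats information already used by the others), and running several tests raises the overall chance of a Type I error.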

• A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error)

### The F-ratio

• F = MS_bg / MS_wg

• MS = mean square

• bg = between groups

• wg = within groups

• Numerator is the “effect” and denominator is the “error”

• Numerator df = # of categories - 1 (k - 1); denominator df = # of cases - # of categories (N - k)
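
The arithmetic behind the F-ratio can be sketched as follows, taking the two sums of squares as given. Each mean square is a sum of squares divided by its degrees of freedom. The function name `f_ratio` and the input numbers are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def f_ratio(ss_between, ss_within, n_groups, n_total):
    """Return (F, df_between, df_within) from the two sums of squares."""
    df_between = n_groups - 1        # k - 1
    df_within = n_total - n_groups   # N - k
    ms_between = ss_between / df_between  # "effect" mean square
    ms_within = ss_within / df_within     # "error" mean square
    return ms_between / ms_within, df_between, df_within


# Made-up sums of squares for 3 groups and 9 cases in total
f, df_bg, df_wg = f_ratio(ss_between=42.0, ss_within=6.0, n_groups=3, n_total=9)
print(f, df_bg, df_wg)  # prints: 21.0 2 6
```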

### Between-Group Sum of Squares (Numerator)

• Between-group sum of squares = total variability - residual variability

• Total variability is quantified as the sum of the squares of the differences between each value and the grand mean.

• Also called the total sum-of-squares

• Variability within groups is quantified as the sum of squares of the differences between each value and its group mean

• Also called residual sum-of-squares
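
The decomposition above can be checked on a small data set. The sketch below computes the total and within-group (residual) sums of squares and obtains the between-group sum of squares by subtraction. The scores are made up for illustration and only borrow the marital status labels from the earlier example; they are not General Social Survey data.

```python
def sums_of_squares(groups):
    """Return (ss_total, ss_within, ss_between) for a list of groups."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)

    # Total: squared distance of every value from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in values)

    # Within (residual): squared distance of every value from its group mean
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_within += sum((x - group_mean) ** 2 for x in group)

    return ss_total, ss_within, ss_total - ss_within


# Hypothetical happiness scores by marital status (made-up values)
scores = {
    "married": [6, 7, 8],
    "never married": [2, 3, 4],
    "divorced": [1, 2, 3],
}
ss_total, ss_within, ss_between = sums_of_squares(list(scores.values()))
print(ss_total, ss_within, ss_between)  # prints: 48.0 6.0 42.0
```

The same between-group value results from summing n × (group mean - grand mean)² over the groups: 3 × (9 + 1 + 4) = 42.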

### Null Hypothesis in ANOVA

• If there is no difference between the means, then the between-group mean square should be about equal to the within-group mean square, so the F-ratio should be close to 1.

### F-distribution

• F-test is always a one-tailed test.

• Why?

### Logic of the ANOVA

• Conceptual Intro to ANOVA

### Bringing it all together: Choosing the appropriate bivariate statistic

• Remember from earlier lectures: bivariate statistics do not test causal relationships; they only show that there is a relationship.

• Even if you plan to use more sophisticated causal tests, you should always run simple bivariate statistics on your key variables to understand their relationships.

### Choosing the Appropriate Statistical Test

• General rules for choosing a bivariate test:

• Two categorical variables

• Chi-Square (crosstabulations)

• Two metric variables

• Correlation

• One categorical variable with 3+ categories, one metric variable

• ANOVA

• One binary categorical variable, one metric variable

• t-test
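
The rules above amount to a small lookup, sketched below. The function name, the `"categorical"` / `"metric"` labels, and the `n_categories` parameter are hypothetical conventions chosen for this illustration.

```python
def choose_bivariate_test(kind_a, kind_b, n_categories=None):
    """Pick a bivariate test from the two variable types.

    kind_a, kind_b: "categorical" or "metric".
    n_categories: number of categories of the categorical variable,
    needed only when one variable is categorical and the other metric.
    """
    kinds = sorted([kind_a, kind_b])
    if kinds == ["categorical", "categorical"]:
        return "chi-square"
    if kinds == ["metric", "metric"]:
        return "correlation"
    if kinds == ["categorical", "metric"]:
        if n_categories is None or n_categories < 2:
            raise ValueError("n_categories must be 2 or more")
        return "t-test" if n_categories == 2 else "ANOVA"
    raise ValueError("kinds must be 'categorical' or 'metric'")


print(choose_bivariate_test("categorical", "categorical"))
print(choose_bivariate_test("metric", "categorical", n_categories=3))
```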

### Assignment #2

• Online (course website)

• Due next Monday in class (April 10th)