Chi-Square and Analysis of Variance (ANOVA)

Lecture 9

The Chi-Square Distribution and Test for Independence

Hypothesis testing between two or more categorical variables

- Tests the association between two nominal (categorical) variables.
- Null hypothesis: the two variables are independent.

- It is really just a comparison between the expected frequencies and the observed frequencies in the cells of a crosstabulation table.
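
As a rough illustration of that comparison, the sketch below computes expected frequencies and the chi-square statistic for a 2x2 table. The counts are hypothetical (invented for illustration, not taken from the lecture), and the expected count for each cell is assumed to follow the usual rule: row total times column total, divided by the overall total.

```python
# Hypothetical 2x2 crosstab of observed counts (rows x columns); not real data.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print("expected:", expected)
print("chi-square:", chi_square)
```

The larger the gap between observed and expected counts, the larger the statistic and the stronger the evidence against independence.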

- Chi-square degrees of freedom
- df = (r-1) (c-1)
- Where r = # of rows, c = # of columns
- Thus, in any 2x2 contingency table, the degrees of freedom = 1.
- As the degrees of freedom increase, the distribution shifts to the right and the critical values of chi-square become larger.
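
The degrees-of-freedom rule can be sketched in a few lines. The helper name `chi_square_df` is made up for this example, and the critical values are the standard table values at the 0.05 level for df 1 through 4, hard-coded here rather than computed.

```python
def chi_square_df(rows, cols):
    """Degrees of freedom for an r x c contingency table: (r-1)(c-1)."""
    return (rows - 1) * (cols - 1)

# Standard chi-square table values at alpha = 0.05 (hard-coded, df 1 to 4)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

for rows, cols in [(2, 2), (2, 3), (3, 3)]:
    df = chi_square_df(rows, cols)
    print(f"{rows}x{cols} table: df = {df}, critical value = {CRITICAL_05[df]}")
```

Larger tables have more degrees of freedom, so a larger chi-square value is needed to reject the null hypothesis.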

- The chi-square distribution results when independent variables with standard normal distributions are squared and summed.
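
A small simulation can make this construction concrete. The sketch below is only an illustration: the function name `simulate_chi_square`, the seed, and the number of draws are all arbitrary choices. It squares and sums independent standard normal draws, then checks that the sample mean lands near the degrees of freedom, which is the mean of a chi-square distribution.

```python
import random

def simulate_chi_square(df, draws, seed=0):
    """Draw values by summing df squared standard normal variates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(draws)]

for df in (1, 3, 5):
    sample = simulate_chi_square(df, draws=20000)
    sample_mean = sum(sample) / len(sample)
    # The sample mean should be close to df, so the distribution
    # shifts to the right as df increases.
    print(f"df = {df}: sample mean = {sample_mean:.2f}")
```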

- Must be a random sample from population
- Data must be in raw frequencies
- Observations must be independent of one another
- Categories for each variable must be mutually exclusive and exhaustive

- Often used with contingency tables (i.e., crosstabulations)
- E.g., gender x race

- Basically, the chi-square test of independence tests whether the columns are contingent on the rows in the table.
- In this case, the null hypothesis is that there is no relationship between row and column frequencies.

- Expected frequencies versus observed frequencies
- General Social Survey Example

ANOVA and the F-distribution

Hypothesis testing between a 3+ category variable and a metric variable

- In its simplest form, it is used to compare means for three or more categories.
- Example:
- Life Happiness scale and Marital Status (married, never married, divorced)

- Relies on the F-distribution
- Just like the t-distribution and chi-square distribution, there is a different sampling distribution for each possible value of df (for F, each pair of numerator and denominator df).

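- If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 t-tests (one per pair of categories).
- The problem is that the 3 tests would not be independent of each other, because each group's data are reused in two of the comparisons, and running several tests raises the chance of at least one false positive.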
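
One common way to gauge the cost of running several t-tests is the familywise error rate. The sketch below is a rough illustration only: the formula assumes the tests are independent, which pairwise t-tests on shared groups are not, so the result is an approximation rather than an exact rate. The 0.05 level is an assumed choice.

```python
from math import comb

alpha = 0.05             # per-test significance level (assumed)
k = 3                    # number of categories being compared
n_tests = comb(k, 2)     # number of pairwise t-tests: k(k-1)/2

# Approximate chance of at least one false positive across all tests,
# treating the tests as if they were independent.
familywise_error = 1 - (1 - alpha) ** n_tests

print("pairwise tests:", n_tests)
print("approximate familywise error rate:", round(familywise_error, 4))
```

With three groups the approximate rate is already well above the nominal 0.05, which is one motivation for a single overall test.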

- A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error)

- F = MS_bg / MS_wg
- MS = mean square
- bg = between groups
- wg = within groups

- Numerator is the “effect” and denominator is the “error”
- Between-groups df = # of categories – 1 (k-1); within-groups df = total sample size – # of categories (N-k)
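
A minimal sketch of the F ratio, computed as the between-groups mean square divided by the within-groups mean square. The happiness scores below are hypothetical values invented for illustration, and the within-groups degrees of freedom are taken as N - k, the standard one-way ANOVA definition.

```python
# Hypothetical happiness scores by marital status; not real survey data.
groups = {
    "married": [7, 8, 9],
    "never married": [5, 6, 7],
    "divorced": [3, 4, 5],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)
k = len(groups)            # number of categories
n_total = len(all_scores)  # total sample size N

ss_bg = 0.0  # between-groups sum of squares ("effect")
ss_wg = 0.0  # within-groups sum of squares ("error")
for scores in groups.values():
    group_mean = sum(scores) / len(scores)
    ss_bg += len(scores) * (group_mean - grand_mean) ** 2
    ss_wg += sum((x - group_mean) ** 2 for x in scores)

ms_bg = ss_bg / (k - 1)        # mean square between, df = k - 1
ms_wg = ss_wg / (n_total - k)  # mean square within, df = N - k
f_ratio = ms_bg / ms_wg

print("MS_bg:", ms_bg, "MS_wg:", ms_wg, "F:", f_ratio)
```

An F ratio far above 1 suggests the group means differ by more than within-group noise alone would produce.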

- Between-group variability = total variability – residual variability
- Total variability is quantified as the sum of the squares of the differences between each value and the grand mean.
- Also called the total sum-of-squares

- Variability within groups is quantified as the sum of squares of the differences between each value and its group mean
- Also called residual sum-of-squares
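
The two sums of squares described above can be computed directly, and their difference gives the between-group sum of squares. This is a sketch on made-up numbers; the variable names are illustrative.

```python
# Hypothetical scores for three groups; not real data.
groups = [
    [2, 4, 6],
    [5, 7, 9],
    [8, 10, 12],
]

values = [x for g in groups for x in g]
grand_mean = sum(values) / len(values)

# Total sum of squares: each value versus the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in values)

# Residual (within-group) sum of squares: each value versus its group mean
ss_within = sum(
    (x - sum(g) / len(g)) ** 2
    for g in groups
    for x in g
)

# Between-group sum of squares is what remains
ss_between = ss_total - ss_within

print("total:", ss_total, "within:", ss_within, "between:", ss_between)
```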

- If there is no difference between the population means, then the between-group mean square should be approximately equal to the within-group mean square (an F ratio close to 1).

- F-test is always a one-tailed test.
- Why?

- Conceptual Intro to ANOVA

- Remember from earlier lectures: bivariate statistics do not test causal relationships; they only show whether there is a relationship.
- Even if you plan to use more sophisticated causal tests, you should always run simple bivariate statistics on your key variables to understand their relationships.

- General rules for choosing a bivariate test:
- Two categorical variables
- Chi-Square (crosstabulations)

- Two metric variables
- Correlation

- One 3+ categorical variable, one metric variable
- ANOVA

- One binary categorical variable, one metric variable
- T-test
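
These rules of thumb can be written as a small lookup. The function name `choose_bivariate_test` and the type labels ("binary", "categorical" for 3+ categories, "metric") are hypothetical conventions for this sketch. A binary variable is treated as categorical when it is paired with another categorical variable.

```python
def choose_bivariate_test(type_a, type_b):
    """Suggest a bivariate test from two variable types.

    Each type is 'binary', 'categorical' (3+ categories), or 'metric'.
    """
    valid = {"binary", "categorical", "metric"}
    if type_a not in valid or type_b not in valid:
        raise ValueError(f"unknown variable type: {type_a!r}, {type_b!r}")

    kinds = sorted([type_a, type_b])  # order of the arguments does not matter
    if "metric" not in kinds:
        return "chi-square"   # two categorical variables
    if kinds == ["metric", "metric"]:
        return "correlation"  # two metric variables
    if kinds == ["categorical", "metric"]:
        return "ANOVA"        # 3+ categories and a metric variable
    return "t-test"           # binary variable and a metric variable

print(choose_bivariate_test("categorical", "metric"))
print(choose_bivariate_test("metric", "binary"))
```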

- Online (course website)
- Due next Monday in class (April 10th)