## PowerPoint Slideshow about 'Chi-Square and Analysis of Variance (ANOVA)' - fitzgerald-walsh

### The Chi-Square Distribution and Test for Independence

### ANOVA and the F-distribution

Hypothesis testing between two or more categorical variables

Chi-square Test of Independence

- Tests the association between two nominal (categorical) variables.
- Null hypothesis: the two variables are independent.
- It's really just a comparison between the expected frequencies (under independence) and the observed frequencies in the cells of a crosstabulation table.
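The comparison of expected and observed frequencies can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own procedure: the function name `chi_square_independence` and the 2x2 counts are hypothetical, invented only to show the arithmetic.

```python
def chi_square_independence(observed):
    """Return (chi2, expected) for a crosstab given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count under independence: (row total * column total) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Chi-square statistic: sum over all cells of (observed - expected)^2 / expected
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi2, expected


# Hypothetical counts, invented for illustration only
table = [[30, 20],
         [20, 30]]
chi2, expected = chi_square_independence(table)
print(chi2)      # 4.0
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```

The larger the gap between observed and expected counts, the larger the statistic and the stronger the evidence against independence.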

Degrees of freedom

- Chi-square degrees of freedom
- df = (r-1) (c-1)
- Where r = # of rows, c = # of columns
- Thus, in any 2x2 contingency table, the degrees of freedom = 1.
- As the degrees of freedom increase, the distribution shifts to the right and the critical values of chi-square become larger.
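As a small sketch of the df rule, the helper below (a hypothetical name, `chi_square_df`) computes the degrees of freedom from the table dimensions. The critical values at the 0.05 level are the usual ones from a standard chi-square table and are included only to show how they grow with df.

```python
def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)


# Critical values at alpha = 0.05, copied from a standard chi-square table
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

for rows, cols in [(2, 2), (2, 3), (3, 3), (3, 4)]:
    df = chi_square_df(rows, cols)
    print(f"{rows}x{cols} table: df = {df}, critical value = {CRITICAL_05[df]}")
```

A computed chi-square statistic larger than the critical value for the table's df leads to rejecting the null hypothesis of independence at that level.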

Chi-Square Distribution

- The chi-square distribution with k degrees of freedom results when k independent variables with standard normal distributions are squared and summed.
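This construction can be checked with a quick simulation. The sketch below assumes k = 3 summed variables and a fixed random seed, both arbitrary choices. It draws standard normal values, squares and sums them, and compares the sample mean with k, the theoretical mean of a chi-square distribution with k degrees of freedom.

```python
import random


def chi_square_draw(k, rng):
    """One draw: the sum of k squared independent standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)   # fixed seed so the run is repeatable
k = 3                    # number of squared normals being summed
draws = [chi_square_draw(k, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
print(f"sample mean = {sample_mean:.3f} (theoretical mean = {k})")
```

Every draw is non-negative, which is why the distribution starts at zero and is skewed to the right.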

Requirements for Chi-Square test

- Must be a random sample from population
- Data must be in raw frequencies
- Observations must be independent (each case is counted in only one cell)
- Categories for each variable must be mutually exclusive and exhaustive

Using the Chi-Square Test

- Often used with contingency tables (i.e., crosstabulations)
- E.g., gender x race
- Basically, the chi-square test of independence tests whether the columns are contingent on the rows in the table.
- In this case, the null hypothesis is that there is no relationship between row and column frequencies.

Practical Example:

- Expected frequencies versus observed frequencies
- General Social Survey Example

Hypothesis testing between a 3+ category variable and a metric variable

Analysis of Variance

- In its simplest form, it is used to compare means for three or more categories.
- Example:
- Life Happiness scale and Marital Status (married, never married, divorced)
- Relies on the F-distribution
- Just like the t-distribution and chi-square distribution, there is a different sampling distribution for each possible value of df (the F-distribution has two df values: one for the numerator and one for the denominator).

What is ANOVA?

- If we have a categorical variable with 3 categories and a metric/scale variable, we could just run 3 pairwise t-tests.
- The problem is that the 3 tests would not be independent of each other (i.e., the same groups are reused across the comparisons).
- A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error)
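A rough sketch of why running several t-tests is a problem: the chance of at least one false positive grows with the number of comparisons. The calculation below assumes a 0.05 significance level per test and, for simplicity, treats the tests as independent. As noted above, they are not, so the result is a guide rather than an exact figure.

```python
from math import comb


def n_pairwise_tests(k):
    """Number of pairwise comparisons among k groups."""
    return comb(k, 2)


def approx_familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive, treating the tests as independent."""
    return 1 - (1 - alpha) ** n_tests


for k in (3, 4, 5):
    m = n_pairwise_tests(k)
    print(f"{k} groups: {m} t-tests, approximate error rate = {approx_familywise_error(m):.3f}")
```

With 3 groups the approximate rate is already about 0.143 instead of 0.05. A single ANOVA F-test avoids this by testing all group means at once.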

The F-ratio

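$$F = \frac{MS_{bg}}{MS_{wg}}$$

- MS = mean square
- bg = between groups
- wg = within groups

- Numerator is the “effect” and denominator is the “error”
- Numerator df = # of categories - 1 (k - 1); denominator df = N - k, where N is the total sample size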

Between-Group Sum of Squares (Numerator)

- Between-group sum of squares = total variability - residual (within-group) variability
- Total variability is quantified as the sum of the squares of the differences between each value and the grand mean.
- Also called the total sum-of-squares
- Variability within groups is quantified as the sum of squares of the differences between each value and its group mean
- Also called residual sum-of-squares
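The sums of squares described above can be computed directly. The sketch below assumes three small groups of hypothetical happiness scores, invented for illustration and loosely following the marital-status example. The function name `one_way_anova` is likewise illustrative rather than taken from any library.

```python
def one_way_anova(groups):
    """Compute the sums of squares, mean squares, and F-ratio for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Total sum of squares: each value versus the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    # Within-group (residual) sum of squares: each value versus its own group mean
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_within += sum((x - group_mean) ** 2 for x in group)

    # Between-group sum of squares: total minus residual
    ss_between = ss_total - ss_within

    ms_between = ss_between / (k - 1)       # numerator df = k - 1
    ms_within = ss_within / (n_total - k)   # denominator df = N - k
    return {
        "ss_total": ss_total,
        "ss_within": ss_within,
        "ss_between": ss_between,
        "F": ms_between / ms_within,
    }


# Hypothetical happiness scores, invented for illustration only
married = [7, 8, 9]
never_married = [5, 6, 7]
divorced = [3, 4, 5]

result = one_way_anova([married, never_married, divorced])
print(result)  # ss_total = 30, ss_within = 6, ss_between = 24, F = 12
```

Here the group means (8, 6, 4) differ far more than the scores vary inside each group, so the F-ratio is well above 1.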

Null Hypothesis in ANOVA

- If there is no difference between the population means, then the between-group mean square should be approximately equal to the within-group mean square, so the F-ratio should be close to 1.

F-distribution

- F-test is always a one-tailed test.
- Why?

Logic of the ANOVA

- Conceptual Intro to ANOVA

Reminder About Causality

- Remember from earlier lectures: bivariate statistics do not test causal relationships; they only show that two variables are associated.
- Even if you plan to use more sophisticated causal tests, you should always run simple bivariate statistics on your key variables to understand their relationships.

Choosing the Appropriate Statistical Test

- General rules for choosing a bivariate test:
- Two categorical variables
- Chi-Square (crosstabulations)
- Two metric variables
- Correlation
- One categorical variable with 3+ categories, one metric variable
- ANOVA
- One binary categorical variable, one metric variable
- t-test
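The rules above can be written as a small lookup function. This is only a sketch of the decision rule as listed. The function name `choose_bivariate_test` and the string labels `"categorical"` and `"metric"` are hypothetical conventions chosen for this example.

```python
def choose_bivariate_test(type_a, type_b, n_categories=None):
    """Pick a bivariate test from the two variable types.

    type_a, type_b: "categorical" or "metric".
    n_categories: number of categories of the categorical variable,
                  needed only when one variable is categorical and one is metric.
    """
    types = sorted([type_a, type_b])
    if types == ["categorical", "categorical"]:
        return "chi-square"
    if types == ["metric", "metric"]:
        return "correlation"
    if types == ["categorical", "metric"]:
        if n_categories is None or n_categories < 2:
            raise ValueError("n_categories must be 2 or more")
        return "t-test" if n_categories == 2 else "ANOVA"
    raise ValueError(f"unknown variable types: {type_a!r}, {type_b!r}")


print(choose_bivariate_test("categorical", "categorical"))           # chi-square
print(choose_bivariate_test("metric", "categorical", n_categories=3))  # ANOVA
```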

Assignment #2

- Online (course website)
- Due next Monday in class (April 10th)
