
Chi-Square and Analysis of Variance (ANOVA)

Chi-Square and Analysis of Variance (ANOVA). Lecture 9. The Chi-Square Distribution and Test for Independence. Hypothesis testing between two or more categorical variables. Chi-square Test of Independence. Tests the association between two nominal (categorical) variables.

PowerPoint Slideshow about ' Chi-Square and Analysis of Variance (ANOVA)' - fitzgerald-walsh

Presentation Transcript

The Chi-Square Distribution and Test for Independence

Hypothesis testing between two or more categorical variables

Chi-square Test of Independence

  • Tests the association between two nominal (categorical) variables.

    • Null Hyp: The 2 variables are independent.

  • It's really just a comparison between the expected frequencies and the observed frequencies in the cells of a crosstabulation table.
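The comparison of expected and observed frequencies can be sketched in a few lines. This is a minimal illustration, not the lecture's own worked example: the function names `expected_counts` and `chi_square_statistic` are hypothetical, and the 2x2 counts are made up. Expected counts under independence are taken as (row total × column total) / N, and the statistic sums (observed - expected)² / expected over all cells.

```python
def expected_counts(observed):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2x2 crosstab (made-up counts, for illustration only).
observed = [[30, 20],
            [20, 30]]

print(expected_counts(observed))       # every expected count is 25.0
print(chi_square_statistic(observed))  # 4.0
```

The larger the statistic, the further the observed table sits from what independence would predict.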

Degrees of freedom

  • Chi-square degrees of freedom

    • df = (r-1) (c-1)

      • Where r = # of rows, c = # of columns

      • Thus, in any 2x2 contingency table, the degrees of freedom = 1.

      • As the degrees of freedom increase, the distribution shifts to the right and the critical values of chi-square become larger.
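To see how the critical value grows with the degrees of freedom, the sketch below computes df from the table dimensions and then finds the 5% critical value numerically. It assumes only the standard library: the chi-square CDF is evaluated with a series for the regularized lower incomplete gamma function, and the critical value is located by bisection. The helper names (`degrees_of_freedom`, `chi2_cdf`, `chi2_critical`) are illustrative, not from the lecture.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, via a power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a + 1.0))


def chi2_critical(df, alpha=0.05):
    """Value that a chi-square variable exceeds with probability alpha."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1 - alpha:  # grow the bracket until it contains the answer
        hi *= 2
    for _ in range(100):                 # bisection
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for rows, cols in [(2, 2), (2, 3), (3, 3)]:
    df = degrees_of_freedom(rows, cols)
    print(rows, "x", cols, "table: df =", df, "critical value =", round(chi2_critical(df), 3))
```

For a 2x2 table (df = 1) this reproduces the familiar 5% critical value of about 3.84, and the values rise as df increases.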

Chi-Square Distribution

  • The chi-square distribution results when independent variables with standard normal distributions are squared and summed.
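This definition can be checked with a small simulation: draw df independent standard normal values, square them, and sum. The sketch below is only an illustration (the function name `simulate_chi_square`, the seed, and the number of draws are arbitrary choices); the average of the simulated values should land near df, which is the mean of a chi-square distribution.

```python
import random


def simulate_chi_square(df, n_draws, seed=0):
    """Each draw is the sum of df squared standard normal values."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


draws = simulate_chi_square(df=3, n_draws=20000)
mean = sum(draws) / len(draws)
print(round(mean, 2))  # should be close to df = 3
```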

Requirements for Chi-Square test

  • Must be a random sample from population

  • Data must be in raw frequencies

  • Observations must be independent of one another (each case is counted in only one cell)

  • Categories for each I.V. must be mutually exclusive and exhaustive

Using the Chi-Square Test

  • Often used with contingency tables (i.e., crosstabulations)

    • E.g., gender x race

  • Basically, the chi-square test of independence tests whether the columns are contingent on the rows in the table.

    • In this case, the null hypothesis is that there is no relationship between row and column frequencies.
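A contingency table is just a count of how many cases fall into each row/column combination. The sketch below builds one from raw case-level records using only the standard library. The `crosstab` helper and the six records are hypothetical, made up purely for illustration.

```python
from collections import Counter


def crosstab(records):
    """Count (row category, column category) pairs into a table of raw frequencies."""
    records = list(records)
    counts = Counter(records)
    row_labels = sorted({row for row, _ in records})
    col_labels = sorted({col for _, col in records})
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Made-up cases: (marital status, self-reported mood).
records = [
    ("married", "happy"), ("married", "happy"), ("married", "unhappy"),
    ("single", "happy"), ("single", "unhappy"), ("single", "unhappy"),
]

row_labels, col_labels, table = crosstab(records)
print(row_labels)  # ['married', 'single']
print(col_labels)  # ['happy', 'unhappy']
print(table)       # [[2, 1], [1, 2]]
```

The resulting table holds raw frequencies, which is the form the chi-square test requires.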

Practical Example:

  • Expected frequencies versus observed frequencies

  • General Social Survey Example

ANOVA and the F-distribution

Hypothesis testing between a 3+ category variable and a metric variable

Analysis of Variance

  • In its simplest form, it is used to compare means for three or more categories.

    • Example:

      • Life Happiness scale and Marital Status (married, never married, divorced)

  • Relies on the F-distribution

    • Just like the t-distribution and chi-square distribution, the F-distribution is a family of sampling distributions; here there is a different one for each combination of numerator and denominator df.

What is ANOVA?

  • If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 t-tests.

    • The problem is that the 3 tests would not be independent of each other (they reuse the same groups), and running several tests inflates the overall chance of a Type I error.

  • A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error)

The F-ratio

  • F = MS_bg / MS_wg

  • MS = mean square

  • bg = between groups

  • wg = within groups

  • Numerator is the “effect” and denominator is the “error”

  • Numerator (between-groups) df = # of categories – 1 (k-1); denominator (within-groups) df = # of cases – # of categories (N-k)
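The F-ratio can be computed by hand as the between-groups mean square divided by the within-groups mean square. The sketch below assumes the usual one-way ANOVA definitions (each sum of squares divided by its own df, with k-1 between groups and N-k within groups); the function name `f_ratio` and the three small groups of scores are made up for illustration.

```python
def f_ratio(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between groups: how far each group mean is from the grand mean.
    ss_bg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within groups: how far each value is from its own group mean.
    ss_wg = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_bg, df_wg = k - 1, n - k
    ms_bg, ms_wg = ss_bg / df_bg, ss_wg / df_wg
    return ms_bg / ms_wg, df_bg, df_wg


# Hypothetical scores for three categories (made-up data).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_bg, df_wg = f_ratio(groups)
print(round(f_value, 3), df_bg, df_wg)  # 13.0 2 6
```

Here MS_bg = 26 / 2 = 13 and MS_wg = 6 / 6 = 1, so the "effect" is thirteen times the size of the "error".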

Between-Group Sum of Squares (Numerator)

  • Between-group sum of squares = total variability – residual variability

  • Total variability is quantified as the sum of the squares of the differences between each value and the grand mean.

    • Also called the total sum-of-squares

  • Variability within groups is quantified as the sum of squares of the differences between each value and its group mean

    • Also called residual sum-of-squares
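The decomposition above can be verified numerically. In this sketch the between-group sum of squares is obtained by subtraction (total minus residual) and then checked against a direct calculation from the group means. The function name `sums_of_squares` and the data are hypothetical.

```python
def sums_of_squares(groups):
    """Return (total, within, between) sums of squares."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)

    # Total: each value versus the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    # Within (residual): each value versus its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Between: whatever variability is left over.
    ss_between = ss_total - ss_within
    return ss_total, ss_within, ss_between


# Made-up scores for three groups.
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
ss_total, ss_within, ss_between = sums_of_squares(groups)
print(ss_total, ss_within, ss_between)  # 30.0 6.0 24.0

# Direct calculation of the between-group sum of squares for comparison.
grand_mean = sum(sum(g) for g in groups) / sum(len(g) for g in groups)
direct = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
print(direct)  # 24.0
```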

Null Hypothesis in ANOVA

  • If there is no difference between the population means, then the between-group mean square should be about equal to the within-group mean square, so the F-ratio should be close to 1.
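A quick simulation illustrates what the F-ratio looks like when the null hypothesis is true. The sketch below draws every group from the same normal population, so any differences between group means are pure sampling error; the group sizes, number of simulations, seed, and function names are arbitrary assumptions chosen for illustration.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F: between-groups mean square over within-groups mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    ss_bg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_wg = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_bg / (k - 1)) / (ss_wg / (n - k))


def mean_f_under_null(n_sims=2000, k=3, n_per_group=20, seed=1):
    """Average F when all k groups share the same population mean."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)] for _ in range(k)]
        total += f_statistic(groups)
    return total / n_sims


average_f = mean_f_under_null()
print(round(average_f, 2))  # should be close to 1
```

Values of F well above 1 are what count as evidence against the null hypothesis.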

F-Distribution

  • F-test is always a one-tailed test.

    • Why?

Logic of the ANOVA

  • Conceptual Intro to ANOVA

Bringing it all together: Choosing the appropriate bivariate statistic

Reminder About Causality

  • Remember from earlier lectures: bivariate statistics do not test causal relationships; they only show whether two variables are related.

  • Even if you plan to use more sophisticated causal tests, you should always run simple bivariate statistics on your key variables to understand their relationships.

Choosing the Appropriate Statistical Test

  • General rules for choosing a bivariate test:

    • Two categorical variables

      • Chi-Square (crosstabulations)

    • Two metric variables

      • Correlation

    • One categorical variable with 3+ categories, one metric variable

      • ANOVA

    • One binary categorical variable, one metric variable

      • t-test
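The rules above amount to a small decision table, sketched here as a function. The representation is an assumption made for illustration: each variable is described as a `(kind, n_categories)` tuple, where `kind` is `"categorical"` or `"metric"` and `n_categories` is `None` for metric variables. The function name `choose_bivariate_test` is hypothetical.

```python
def choose_bivariate_test(first, second):
    """Pick a bivariate test from two (kind, n_categories) descriptions."""
    kinds = sorted([first[0], second[0]])
    if any(kind not in ("categorical", "metric") for kind in kinds):
        raise ValueError("kind must be 'categorical' or 'metric'")

    if kinds == ["categorical", "categorical"]:
        return "chi-square"
    if kinds == ["metric", "metric"]:
        return "correlation"

    # One categorical and one metric variable: the number of categories decides.
    n_categories = first[1] if first[0] == "categorical" else second[1]
    if n_categories is None or n_categories < 2:
        raise ValueError("a categorical variable needs at least 2 categories")
    return "t-test" if n_categories == 2 else "ANOVA"


print(choose_bivariate_test(("categorical", 2), ("categorical", 4)))  # chi-square
print(choose_bivariate_test(("metric", None), ("metric", None)))      # correlation
print(choose_bivariate_test(("categorical", 3), ("metric", None)))    # ANOVA
print(choose_bivariate_test(("metric", None), ("categorical", 2)))    # t-test
```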

Assignment #2

  • Online (course website)

  • Due next Monday in class (April 10th)