
Chi-Square and Analysis of Variance (ANOVA)

Lecture 9


The Chi-Square Distribution and Test for Independence

Hypothesis testing between two or more categorical variables


Chi-square Test of Independence

  • Tests the association between two nominal (categorical) variables.

    • Null hypothesis: the two variables are independent.

  • It is essentially a comparison between the expected frequencies and the observed frequencies in the cells of a crosstabulation table.
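The comparison of expected and observed frequencies can be sketched in a few lines. This is a minimal illustration, not the lecture's own procedure: the function names and the counts (a hypothetical gender x yes/no table) are assumptions. Each expected count is row total x column total / N, and the statistic sums (observed - expected)^2 / expected over all cells.

```python
def expected_frequencies(observed):
    """Expected count for each cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows = gender (male, female), columns = answer (yes, no).
observed = [[30, 20],
            [20, 30]]

print(expected_frequencies(observed))  # [[25.0, 25.0], [25.0, 25.0]]
print(chi_square_statistic(observed))  # 4.0
```

The larger the gap between observed and expected counts, the larger the statistic; a table whose counts match the expected counts exactly gives 0.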


Example Crosstab: gender x binary question


Degrees of freedom

  • Chi-square degrees of freedom

    • df = (r-1) (c-1)

      • Where r = # of rows, c = # of columns

      • Thus, in any 2x2 contingency table, the degrees of freedom = 1.

      • As the degrees of freedom increase, the distribution shifts to the right and the critical values of chi-square become larger.
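The degrees-of-freedom rule and the claim that critical values grow with df can be checked numerically. The sketch below is an illustration under stated assumptions: it uses only the standard library, computes the chi-square CDF from the series for the regularized lower incomplete gamma function, and finds the 0.05 upper-tail critical value by bisection. The helper names and the `upper` search bound (adequate for small df only) are hypothetical choices, not from the lecture.

```python
import math


def degrees_of_freedom(rows, cols):
    """df = (r - 1)(c - 1) for an r x c contingency table."""
    return (rows - 1) * (cols - 1)


def chi2_cdf(x, df, terms=500):
    """P(X <= x) for a chi-square variable with `df` degrees of freedom, via the
    series for the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    for n in range(1, terms):
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def critical_value(df, alpha=0.05, upper=100.0):
    """Upper-tail critical value x with P(X > x) = alpha, found by bisection.
    Assumes the answer lies below `upper`, which holds for small df."""
    lo, hi = 0.0, upper
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid  # tail area still above alpha: move right
        else:
            hi = mid
    return (lo + hi) / 2


print(degrees_of_freedom(2, 2))  # 1, as in any 2x2 table
for df in (1, 2, 3, 4):
    print(df, round(critical_value(df), 3))  # 3.841, 5.991, 7.815, 9.488
```

In practice a statistics library or a printed table supplies these values; the point here is only that the critical value moves right as df increases.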


Chi-Square Distribution

  • The chi-square distribution with k degrees of freedom results when k independent standard normal random variables are squared and summed.
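The definition can be illustrated by simulation: square and sum k standard normal draws many times and compare the sample moments with the known theoretical values (mean k, variance 2k). This is a sketch using Python's `random` module; the function name, seed, and sample size are arbitrary assumptions.

```python
import random
import statistics


def simulate_chi_square(df, n_samples=20000, seed=1):
    """Draw chi-square values by squaring and summing `df` independent
    standard normal variables."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n_samples)]


draws = simulate_chi_square(df=4)
# Theory: a chi-square variable with k df has mean k and variance 2k.
print(round(statistics.mean(draws), 2), round(statistics.variance(draws), 2))
```

The simulated mean and variance should land close to 4 and 8; increasing `df` shifts the whole distribution to the right.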


Requirements for Chi-Square test

  • Must be a random sample from population

  • Data must be in raw frequencies

  • Observations must be independent (each case is counted in only one cell)

  • Categories for each variable must be mutually exclusive and exhaustive


Using the Chi-Square Test

  • Often used with contingency tables (i.e., crosstabulations)

    • E.g., gender x race

  • Basically, the chi-square test of independence tests whether the columns are contingent on the rows in the table.

    • In this case, the null hypothesis is that there is no relationship between row and column frequencies.


Practical Example:

  • Expected frequencies versus observed frequencies

  • General Social Survey Example
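For a 2x2 table the observed-versus-expected comparison collapses to a well-known shortcut formula, and with 1 df the p-value can be computed from the normal distribution, because a 1-df chi-square is the square of a standard normal variable. The sketch below assumes this shortcut (without Yates' continuity correction); the counts are hypothetical and are not the General Social Survey figures from the slide.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Shortcut chi-square for the 2x2 table [[a, b], [c, d]], no continuity
    correction: N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail p-value for 1 df: P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical observed counts for a 2x2 crosstab.
chi2 = chi_square_2x2(40, 10, 25, 25)
p = p_value_df1(chi2)
print(round(chi2, 3), round(p, 4), "reject H0" if p < 0.05 else "fail to reject H0")
```

The shortcut gives the same value as summing (observed - expected)^2 / expected over the four cells; it just avoids computing the expected table explicitly.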


ANOVA and the F-distribution

Hypothesis testing between a 3+ category variable and a metric variable


Analysis of Variance

  • In its simplest form, it is used to compare means for three or more categories.

    • Example:

      • Life Happiness scale and Marital Status (married, never married, divorced)

  • Relies on the F-distribution

    • Just like the t-distribution and chi-square distribution, the F-distribution is a family of sampling distributions: there is a different one for each combination of numerator and denominator degrees of freedom.
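One way to see why the F-distribution depends on two df values is to build F draws directly: an F variable is the ratio of two independent chi-square variables, each divided by its own df. The simulation below is a sketch with hypothetical names, seed, and sample size; it compares the simulated mean with the theoretical mean df2 / (df2 - 2), which holds for df2 > 2.

```python
import random
import statistics


def simulate_f(df1, df2, n_samples=10000, seed=2):
    """F = (X1 / df1) / (X2 / df2), where X1 and X2 are independent chi-square
    variables built from squared standard normals."""
    rng = random.Random(seed)

    def chi_square(k):
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

    return [(chi_square(df1) / df1) / (chi_square(df2) / df2) for _ in range(n_samples)]


for df1, df2 in [(2, 10), (4, 30)]:
    draws = simulate_f(df1, df2)
    # Theoretical mean of F is df2 / (df2 - 2) when df2 > 2.
    print(df1, df2, round(statistics.mean(draws), 3), round(df2 / (df2 - 2), 3))
```

Changing either df value changes the shape of the simulated distribution, which is why F tables are indexed by both.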


What is ANOVA?

  • If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 t-tests.

    • The problem is that the 3 tests would not be independent of each other (i.e., once two of the comparisons are known, they already carry information about the third).

  • A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error)


The F-ratio

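  • F = MS_bg / MS_wg

  • MS = mean square (a sum of squares divided by its degrees of freedom)

  • bg = between groups

  • wg = within groups

  • Numerator is the “effect” and denominator is the “error”

  • df for the numerator = # of categories – 1 (k-1); df for the denominator = # of cases – # of categories (N-k)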
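The F-ratio can be computed by hand for a small example. The sketch below is a plain one-way ANOVA under the definitions above: each sum of squares is divided by its df to get a mean square, and F is MS_bg / MS_wg. The function name and the happiness scores for the three marital-status groups are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group SS: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS: squared distance of each score from its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between  # the "effect"
    ms_within = ss_within / df_within     # the "error"
    return ms_between / ms_within, df_between, df_within


# Hypothetical happiness scores: married, never married, divorced.
f, df_bg, df_wg = one_way_anova([[7, 8, 6, 9], [5, 6, 5, 6], [4, 5, 3, 4]])
print(round(f, 3), df_bg, df_wg)  # 13.875 2 9
```

The resulting F would then be compared with the critical value of the F-distribution with (2, 9) degrees of freedom.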


Between-Group Sum of Squares (Numerator)

  • Between-group SS = total variability – residual (within-group) variability

  • Total variability is quantified as the sum of the squares of the differences between each value and the grand mean.

    • Also called the total sum-of-squares

  • Variability within groups is quantified as the sum of squares of the differences between each value and its group mean

    • Also called residual sum-of-squares
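The subtraction described above can be checked on a toy data set: compute the total SS around the grand mean, the residual SS around each group mean, and take the difference. The sketch also computes the between-group SS directly (group size times squared distance of each group mean from the grand mean) to confirm that both routes agree. The data and function name are hypothetical.

```python
def sums_of_squares(groups):
    """Return (total SS, within-group SS, between-group SS) for lists of scores."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)

    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ss_between = ss_total - ss_within  # total variability minus residual variability
    return ss_total, ss_within, ss_between


groups = [[2, 4, 6], [5, 7, 9], [8, 10, 12]]
print(sums_of_squares(groups))  # (78.0, 24.0, 54.0)

# Direct check: the same between-group SS from the group means.
grand = sum(sum(g) for g in groups) / sum(len(g) for g in groups)
print(sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups))  # 54.0
```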


Null Hypothesis in ANOVA

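  • The null hypothesis is that all group means are equal. If it is true, the between-group mean square and the within-group mean square both estimate the same error variance, so they should be roughly equal and F should be close to 1.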


F-distribution

  • F-test is always a one-tailed test.

    • Why?


Logic of the ANOVA

  • Conceptual Intro to ANOVA


Bringing it all together: Choosing the appropriate bivariate statistic


Reminder About Causality

  • Remember from earlier lectures: bivariate statistics do not test causal relationships; they only show that a relationship exists.

  • Even if you plan to use more sophisticated causal tests, you should always run simple bivariate statistics on your key variables to understand their relationships.


Choosing the Appropriate Statistical Test

  • General rules for choosing a bivariate test:

    • Two categorical variables

      • Chi-Square (crosstabulations)

    • Two metric variables

      • Correlation

    • One 3+ categorical variable, one metric variable

      • ANOVA

    • One binary categorical variable, one metric variable

      • T-test
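The rules above amount to a small lookup on the two variable types. The function below is a hypothetical helper that encodes exactly those four rules; the type labels ("binary", "categorical" for 3+ categories, "metric") are assumptions chosen for illustration.

```python
def choose_bivariate_test(var_a, var_b):
    """Pick a bivariate test from two variable types.
    Each type is 'binary', 'categorical' (3+ categories), or 'metric'."""
    kinds = {var_a, var_b}
    if kinds <= {"binary", "categorical"}:
        return "Chi-square (crosstabulation)"  # two categorical variables
    if kinds == {"metric"}:
        return "Correlation"                   # two metric variables
    if kinds == {"categorical", "metric"}:
        return "ANOVA"                         # one 3+ categorical, one metric
    if kinds == {"binary", "metric"}:
        return "t-test"                        # one binary, one metric
    raise ValueError(f"unknown variable types: {var_a!r}, {var_b!r}")


print(choose_bivariate_test("categorical", "metric"))  # ANOVA
```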


Assignment #2

  • Online (course website)

  • Due next Monday in class (April 10th)

