Chi-square test


FPP 28


More types of inference for nominal variables

  • Nominal data is categorical with more than two categories

  • Compare observed frequencies of nominal variable to hypothesized probabilities

    • One categorical variable with more than two categories

    • Chi-squared goodness of fit test

  • Test if two nominal variables are independent

    • Two categorical variables with at least one having more than two categories

    • Chi-squared test of independence


Goodness of fit test

  • Do people admit themselves to hospitals more frequently close to their birthday?

  • Data from a random sample of 200 people admitted to hospitals


Goodness of fit test

  • Assume there is no birthday effect, that is, admission dates are random with respect to birthdays. Then, counting days in a 365-day year,

    Pr (within 7) = 15/365 = .0411
    Pr (8 - 30) = 46/365 = .1260
    Pr (31 - 90) = 120/365 = .3288
    Pr (91+) = 184/365 = .5041

  • So, in a sample of 200 people, we’d expect about 8.2 to be in “within 7”, 25.2 to be in “8 - 30”, 65.8 to be in “31 - 90”, and 100.8 to be in “91+”
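
The hypothesized probabilities and expected counts can be checked with a short calculation. This is a minimal sketch, assuming a 365-day year and that each category counts days on both sides of the birthday (with the birthday itself in “within 7”); the variable names are illustrative only.

```python
# Days in each "distance from birthday" category.
# Assumptions: 365-day year; days before and after the birthday both count.
days = {
    "within 7": 2 * 7 + 1,      # 7 days before, 7 days after, plus the birthday
    "8 - 30": 2 * (30 - 7),     # days 8..30 on either side
    "31 - 90": 2 * (90 - 30),   # days 31..90 on either side
}
days["91+"] = 365 - sum(days.values())  # everything that is left

n = 200  # sample size from the slides

# Hypothesized probability and expected count for each category under H0.
probs = {k: d / 365 for k, d in days.items()}
expected = {k: n * p for k, p in probs.items()}

for k in days:
    print(f"{k:>8}: Pr = {probs[k]:.4f}, expected count = {expected[k]:.1f}")
```

The expected counts need not be whole numbers; they are long-run averages, not counts we would actually observe.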


Goodness of fit test

  • If admissions are random, we expect the sample frequencies and hypothesized probabilities to be similar

  • But, as always, the sample frequencies are affected by chance error

  • So, we need to see whether the sample frequencies could plausibly have resulted from chance error if the hypothesized probabilities are true.

  • Let’s build a hypothesis test


Goodness of fit test

  • Hypothesis

    • Claim (alternative hyp.): admission probabilities change according to days since birthday.

    • Opposite of claim (null hyp.): probabilities are in accordance with random admissions.

    • H0 : Pr (within 7) = .0411, Pr (8 - 30) = .1260, Pr (31 - 90) = .3288, Pr (91+) = .5041

    • HA : at least one probability differs from its value in H0.


Goodness of fit test: Test statistic

  • Chi-squared test statistic
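
For reference, the standard Pearson chi-squared statistic compares observed and expected counts category by category. The symbols below are notation assumed here: O_i is the observed count in category i, E_i is the expected count under H0, and k is the number of categories.

```latex
X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Large values of X^2 mean the observed counts are far from what H0 predicts; values near zero mean they agree closely.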



Goodness of fit test: Calculate p-value

  • Under the null hypothesis, X² has approximately a chi-squared distribution with degrees of freedom equal to the number of categories minus 1.

  • In this case, df = 4 – 1 = 3.


Goodness of fit test: Calculate p-value

  • To get a p-value, calculate the area under the chi-squared curve to the right of 1.397

  • Using JMP, this area is 0.703. If the null hypothesis is true, there is about a 70% chance of observing a value of X² at least as extreme as 1.397

    • Using the table, the p-value is between 0.70 and 0.90
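
The right-tail area can also be computed without JMP or a table. The following is a rough sketch using only the Python standard library; `chi2_sf` is a hypothetical helper name, and it handles integer degrees of freedom through the recurrence Q(a + 1, t) = Q(a, t) + t^a e^(-t) / Γ(a + 1) for the upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Area to the right of x under a chi-squared curve with integer df."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(t))   # df = 1 starting point
    else:
        a, q = 1.0, math.exp(-t)              # df = 2 starting point
    # Step up one unit of a (two degrees of freedom) at a time.
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q

x2 = 1.397          # test statistic from the slides
df = 4 - 1          # number of categories minus 1
p_value = chi2_sf(x2, df)
print(f"p-value = {p_value:.3f}")
```

The result should come out close to 0.70; small differences from the JMP figure are to be expected if JMP works from an unrounded test statistic.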


Chi-squared table


JMP output admissions


Goodness of fit test: Judging p-value

  • A p-value of .70 is large, indicating that the difference between the observed and expected counts could well occur by random chance when the null hypothesis is true. Therefore, we cannot reject the null hypothesis. There is not enough evidence to conclude that admission rates change according to days from birthday.


Independence test

  • Is birth order related to delinquency?

  • Nye (1958) randomly sampled 1154 high school girls and asked if they had been “delinquent”.


Sample of conditional frequencies

  • % Delinquent for each birth order status

    • Based on conditional frequencies, it appears that youngest are more delinquent

    • Could these sample frequencies have plausibly occurred by chance if there is no relationship between birth order and delinquency?


Test of independence

  • Hypotheses

    • Want to show that there is some relationship between birth order and delinquency.

    • Opposite is that there is no relationship.

    • H0 : birth order and delinquency are independent.

    • HA : birth order and delinquency are dependent.


Implications of independence

  • Expected counts

    • Under independence,

      • Pr(oldest and delinquent) = Pr(oldest)*Pr(delinquent)

    • Estimate Pr(oldest) as marginal frequency of oldest

    • Estimate Pr(delinquent) as marginal frequency of delinquent

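    • Hence, estimate Pr(oldest and delinquent) as the product of these two marginal frequencies

    • The expected number of oldest and delinquent, under independence, equals the sample size (1154) times this estimated probability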

    • This is repeated for all the other cells in table
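
In general form, the steps above reduce to one formula per cell. The symbols are notation assumed here: R_i is the total for row i, C_j is the total for column j, n is the sample size, and E_ij is the expected count in cell (i, j) under independence.

```latex
\widehat{\Pr}(\text{row } i \text{ and column } j) = \frac{R_i}{n} \cdot \frac{C_j}{n},
\qquad
E_{ij} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \, C_j}{n}
```

So each expected count is simply (row total × column total) / sample size.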


Test of independence

  • Expected counts

  • Next we compare the observed counts with the expected to get a test statistic


Chi-square test

  • Use the X² statistic as the test statistic: X² = sum over all cells of (observed - expected)² / expected


Test of independence:

  • Calculate the p-value

  • X² has a chi-squared distribution with degrees of freedom: df = (number of rows - 1) * (number of columns - 1)

  • In the delinquency problem, df = (4 - 1) * (2 - 1) = 3.

  • The area under the chi-squared curve to the right of 42.245 is less than .0001. If the null hypothesis is true, there is only a very small chance of getting an X² at least as extreme as 42.245.


JMP output for chi-squared test

  • This is a small p-value. It is unlikely we’d observe data like this if the null hypothesis is true. There does appear to be an association between delinquency and birth order.


Chi-squared test details

  • Requires simple random samples.

  • Works best when expected frequencies in each cell are at least 5.

  • Should not have zero counts

  • How one specifies categories can affect results.


Chi-squared test items

  • What do I do when expected counts are less than 5?

  • Try to get more data. Barring that, you can collapse categories.

  • Example: Is baldness related to heart disease? (see JMP for data set)

    Baldness   Disease   Number of people
    None       Yes       251
    None       No        331
    Little     Yes       165
    Little     No        221
    Some       Yes       195
    Some       No        185
    Much       Yes       50
    Much       No        34
    Extreme    Yes       2
    Extreme    No        1

    Combine “extreme” and “much” categories:

    Baldness          Disease   Number of people
    Much or extreme   Yes       52
    Much or extreme   No        35

    This changes the question slightly, since we have a new category.


Chi-squared test

  • Results for the collapsed data in the baldness example

    • Based on the p-value, baldness and heart disease are not independent.

    • We see that increasing baldness is associated with increased incidence of heart disease.
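
The test on the collapsed table can be reproduced by hand. This is a minimal sketch using only the Python standard library, with the counts taken from the baldness example; `chi2_sf` is a hypothetical helper (not a JMP or library function) that computes the right-tail area for integer degrees of freedom.

```python
import math

# Collapsed baldness table: (heart disease yes, heart disease no) per row.
observed = {
    "None": (251, 331),
    "Little": (165, 221),
    "Some": (195, 185),
    "Much or extreme": (52, 35),
}

def chi2_sf(x, df):
    """Area to the right of x under a chi-squared curve with integer df."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(t))
    else:
        a, q = 1.0, math.exp(-t)
    while a < df / 2.0:
        q += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return q

row_totals = {k: sum(v) for k, v in observed.items()}
col_totals = [sum(v[j] for v in observed.values()) for j in range(2)]
n = sum(row_totals.values())

x2 = 0.0
min_expected = float("inf")
for k, counts in observed.items():
    for j, obs in enumerate(counts):
        # Expected count under independence: row total * column total / n
        exp = row_totals[k] * col_totals[j] / n
        min_expected = min(min_expected, exp)
        x2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(col_totals) - 1)
p_value = chi2_sf(x2, df)
print(f"X2 = {x2:.2f}, df = {df}, p-value = {p_value:.4f}")
print(f"smallest expected count = {min_expected:.1f}")
```

After collapsing, every expected count is above 5, so the rule of thumb from the test details is satisfied, and the p-value should come out well below .05.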

