What if we are interested in seeing if my “crazy” dice are considered “fair”? What can I do?
Chi-square test • Used to test the counts of categorical data • Three types • Goodness of fit (univariate) • Independence (bivariate) • Homogeneity (univariate with two samples)
c2 distribution • Different df have different curves • Skewed right • Cannot take on negative values • As df increases, curve shifts toward right & becomes more like a normal curve • Each curve has a mode at df-2 and a mean at df
c2 assumptions • SRS – reasonably random sample • Have countsof categorical data & we expect each category to happen at least once • Sample size – to insure that the sample size is large enough we should expect at least five in each category. ***Be sure to list expected counts!! Combine these together: All expected counts are at least 5.
Hypotheses – written in words H0: proportions are equal Ha: at least one proportion is not the same Be sure to write in context!
c2 Goodness of fit test Based on df – df = number of categories - 1 • Uses univariate data (one sample, one variable) • Want to see how well the observedcounts “fit” what we expect the counts to be • Use c2cdf function on the calculator to find p-values
Does your zodiac sign determine how successful you will be? Fortune magazine collected the zodiac signs of 256 heads of the largest 400 companies. Is there sufficient evidence to claim that successful people are more likely to be born under some signs than others? Aries 23 Libra 18 Leo 20 Taurus 20 Scorpio 21 Virgo 19 Gemini 18 Sagittarius 19 Aquarius 24 Cancer 23 Capricorn 22 Pisces 29 How many would you expect in each sign if there were no difference between them? How many degrees of freedom? I would expect CEOs to be equally born under all signs. So 256/12 = 21.333333 Since there are 12 signs – df = 12 – 1 = 11
Assumptions: • Have a random sample of CEO’s • All expected counts are greater than 5. (I expect 21.33 CEO’s to be born in each sign.) • H0: The proportions of CEO’s born under each sign are the same. • Ha: At least one of the proportion of CEO’s born under each sign is different.
5. Find the sum of the components (that’s the chi-square statistic) Σ = 5.094
P-value = c2cdf(5.094, 10^99, 11) = .9265 a = .05 Since p-value > a, I fail to reject H0. There is not sufficient evidence to suggest that the CEOs are born under some signs more than under others.
Suppose we roll a six-sided die 60 times and obtain the outcomes recording in the table. Do you think this die is fair? • Assumptions: • SRS • All expected counts are > 5 • Hypotheses: • Ho: The proportion of rolls is the same for each number. • Ha: The proportion of rolls is not the same for each number.
P-value = c2cdf(c2, 10^99, df) Be sure to write Ha in context (words)! “Since the p-value < (>) a, I reject (fail to reject) the H0. There is (is not) sufficient evidence to suggest that Ha.”
Offspring of certain fruit flies may have yellow or ebony bodies and normal wings or short wings. Genetic theory predicts that these traits will appear in the ratio 9:3:3:1 (yellow & normal, yellow & short, ebony & normal, ebony & short) A researcher checks 100 such flies and finds the distribution of traits to be 59, 20, 11, and 10, respectively. What are the expected counts? df? Are the results consistent with the theoretical distribution predicted by the genetic model? (see next page) Since there are 4 categories, df = 4 – 1 = 3 Expected counts: Y & N = 56.25 Y & S = 18.75 E & N = 18.75 E & S = 6.25 We expect 9/16 of the 100 flies to have yellow and normal wings. (Y & N)
Assumptions: • Have a random sample of fruit flies • All expected counts are greater than 5. • Expected counts: • Y & N = 56.25, Y & S = 18.75, E & N = 18.75, E & S = 6.25 • H0: The proportions of fruit flies are the same as the theoretical model. • Ha: At least one of the proportions of fruit flies is not the same as the theoretical model. • P-value = c2cdf(5.671, 10^99, 3) = .129 a = .05 • Since p-value > a, I fail to reject H0. There is not sufficient evidence to suggest that the distribution of fruit flies is not the same as the theoretical model.
A company says its premium mixture of nuts contains 10% Brazil nuts, 20% cashews, 20% almonds, 10% hazelnuts and 40% peanuts. You buy a large can and separate the nuts. Upon weighing them, you find there are 112 g Brazil nuts, 183 g of cashews, 207 g of almonds, 71 g or hazelnuts, and 446 g of peanuts. You wonder whether your mix is significantly different from what the company advertises? Why is the chi-square goodness-of-fit test NOT appropriate here? What might you do instead of weighing the nuts in order to use chi-square? Because we do NOT have counts of the type of nuts. We could count the number of each type of nut and then perform a c2 test.
Example: Does the color of a car influence the chance that it will be stolen? Of 830 cars reported stolen, 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. It is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors.
Let π1, π2, . . . Π5 denote true proportions of stolen cars that fall into the 5 color categories Ho: π1 = .15, π2 = .15, π3 = .35, π4 = .30, π5 = .05 Ha: Ho is not true. α = .01 Test statistic: Assumptions: The sample was a random sample of stolen cars. All expected counts are greater than 5, so the sample size is large enough to use the chi-square test.
Calculations: = 1.93 + 4.82 + 1.45 + 1.45 + 56.68 = 66.33 P-value: All expected counts exceed 5, so the P-value can be based on a chi-square distribution with 4 df. The computed value is larger than 18.46, so P-value < .001. Because P-value < α, Ho is rejected. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportion for all cars.