Chapter 12. The Analysis of Categorical Data and Goodness-of-Fit Tests
12.1 Chi-Square Tests for Univariate Categorical Data
Examples of Univariate Categorical Data:
Each student in a sample of 100 is classified as full-time or part-time (two categories).
The manager of a tax preparation company might be interested in determining whether the four possible responses occur equally often, that is, whether the long-run proportion of responses in each of the four categories is ¼.
The customer relations manager for the department store might be interested in determining whether the four possible dispositions for a return request occur equally often, that is, whether the long-run proportion of returns in each of the four categories is ¼.
k = number of categories of a categorical variable,
π1 = true proportion for Category 1
π2 = true proportion for Category 2
· · ·
πk = true proportion for Category k
(Note: π1 + π2 + · · · + πk = 1.)
H0: π1 = hypothesized proportion for Category 1.
π2 = hypothesized proportion for Category 2.
· · ·
πk = hypothesized proportion for Category k.
Ha: H0 is not true. At least one of the true category proportions differs from the corresponding hypothesized value.
For example, with k = 8 categories:
H0: π1 = 0.0343, π2 = 0.2175, π3 = 0.0343, π4 = 0.2132, π5 = 0.0343, π6 = 0.2146, π7 = 0.0343, π8 = 0.2175
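The note π1 + π2 + · · · + πk = 1 also constrains any set of hypothesized values. A minimal Python check, applied to the eight hypothesized proportions in the H0 above, might look like this; the helper name `check_hypothesized_proportions` and its floating-point tolerance are illustrative assumptions, not part of the source.

```python
import math


def check_hypothesized_proportions(props, tol=1e-9):
    """Return True if every hypothesized proportion lies in (0, 1) and they sum to 1.

    `tol` only absorbs floating-point rounding; its value is an illustrative choice.
    """
    in_range = all(0 < p < 1 for p in props)
    sums_to_one = math.isclose(sum(props), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# Hypothesized proportions for the k = 8 categories in H0.
h0_props = [0.0343, 0.2175, 0.0343, 0.2132, 0.0343, 0.2146, 0.0343, 0.2175]
print(check_hypothesized_proportions(h0_props))  # True
```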
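The goodness-of-fit statistic is
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed cell count} - \text{expected cell count})^2}{\text{expected cell count}},$$
where, for a sample of size n, the expected cell count for each cell is n × (hypothesized proportion for the corresponding category).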
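The statistic can be computed directly from the observed counts and the hypothesized proportions. The following is a minimal Python sketch, assuming the observed counts are supplied as a list in the same category order as the proportions; the function names `expected_counts` and `chi_square_statistic` are hypothetical, and the data are the stolen-car counts from the exercise at the end of this section.

```python
def expected_counts(n, props):
    """Expected cell counts under H0: n times each hypothesized proportion."""
    return [n * p for p in props]


def chi_square_statistic(observed, props):
    """X^2 = sum over all cells of (observed - expected)^2 / expected."""
    if len(observed) != len(props):
        raise ValueError("need one hypothesized proportion per category")
    n = sum(observed)  # sample size
    expected = expected_counts(n, props)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Stolen-car colors: white, blue, red, black, other.
observed = [140, 100, 270, 230, 90]
props = [0.15, 0.15, 0.35, 0.30, 0.05]
print(round(chi_square_statistic(observed, props), 2))  # 66.33
```

Each term measures how far one cell's observed count falls from what H0 predicts, scaled by the expected count, so a large X² signals a poor fit to the hypothesized proportions.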
From Appendix Table 8, P-value = 0.085
H0: π1 = hypothesized proportion for Category 1
· · ·
πk = hypothesized proportion for Category k
Ha: H0 is not true.
For a test based on the X² statistic, the associated P-value is the area under the chi-square curve with df = k − 1 to the right of the computed X² value (Appendix Table 8).
Reject H0 if P-value < significance level α.
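As a sketch of what the table lookup approximates, the upper-tail area can be computed for any integer df using the closed forms of the regularized upper incomplete gamma function Q(df/2, X²/2), which need only Python's standard `math` module. The function names below are hypothetical; in practice a statistics library routine such as `scipy.stats.chi2.sf` would normally be used instead.

```python
import math


def chi_square_upper_tail(x2, df):
    """Area under the chi-square curve with `df` degrees of freedom to the right of x2.

    Equals Q(df/2, x2/2), the regularized upper incomplete gamma function,
    which has elementary closed forms when df is a positive integer.
    """
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    y = x2 / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, y) = exp(-y) * sum_{j=0}^{m-1} y**j / j!
        m = df // 2
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(m))
    # Odd df = 2m + 1:
    # Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=0}^{m-1} y**(j + 1/2) / Gamma(j + 3/2)
    m = (df - 1) // 2
    tail = sum(y ** (j + 0.5) / math.gamma(j + 1.5) for j in range(m))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def goodness_of_fit_decision(x2, k, alpha):
    """Return (P-value, reject_H0) for a goodness-of-fit test with k categories."""
    p_value = chi_square_upper_tail(x2, k - 1)  # df = k - 1
    return p_value, p_value < alpha


# 9.488 is the familiar 5% critical value for df = 4, so the area is about 0.05.
print(round(chi_square_upper_tail(9.488, 4), 3))  # 0.05
```

The decision helper uses the same "reject if P-value < α" rule stated above.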
If hybrid sales in the five states are proportional to the states' 2004 populations, then the expected counts of hybrid sales (n = 406) are:
Expected count for California = 406(.495) = 200.970
Expected count for Virginia = 406(.103) = 41.818
Expected count for Washington = 406(.085) = 34.510
Expected count for Florida = 406(.240) = 97.440
Expected count for Maryland = 406(.077) = 31.262
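Each expected count is n times the state's population proportion, and because the proportions sum to 1, the expected counts must add up to n = 406, which is a quick way to catch arithmetic slips. A short Python sketch of this arithmetic (the dictionary layout is an illustrative choice):

```python
n = 406  # total hybrid sales in the five states
population_share = {  # 2004 population proportions used above
    "California": 0.495,
    "Virginia": 0.103,
    "Washington": 0.085,
    "Florida": 0.240,
    "Maryland": 0.077,
}

# Expected count under H0 = n * (hypothesized proportion).
expected = {state: n * p for state, p in population_share.items()}
for state, e in expected.items():
    print(f"{state}: {e:.3f}")

# Sanity check: the expected counts should sum to n.
print(f"Total: {sum(expected.values()):.3f}")
```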
On the next slides, we use Excel to solve this problem.
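Click ƒx to open the “Insert Function” dialog box. Select “Statistical” in the “select a category” box, choose “CHITEST” in the “Select a function” list, and click “OK”. (In Excel 2010 and later, CHITEST is kept for compatibility; the equivalent function is CHISQ.TEST.)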
Once you enter the Actual_range (observed counts) and the Expected_range (expected counts), the P-value appears as “Formula result = 3.70981E-12”, that is, 3.70981 × 10⁻¹² ≈ 0.
Does the color of a car influence the chance that it will be stolen? The AP reported that for a random sample of 830 stolen vehicles, 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. Use the X² goodness-of-fit test and a significance level of α = .01 to test the hypothesis that the color proportions for stolen cars are identical to the color proportions for all cars. It is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors.
Answer: P-value < .001. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportions for all cars.
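The computation behind this answer can be sketched in Python as follows. It assumes the category order white, blue, red, black, other, and it uses the closed-form chi-square upper tail that holds for even df (here df = k − 1 = 4) in place of Appendix Table 8.

```python
import math

observed = [140, 100, 270, 230, 90]      # white, blue, red, black, other
props = [0.15, 0.15, 0.35, 0.30, 0.05]   # color proportions for all cars
alpha = 0.01

n = sum(observed)                        # 830 stolen vehicles
expected = [n * p for p in props]        # expected counts: 124.5, 124.5, 290.5, 249, 41.5
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # k - 1 = 4

# For even df, the upper-tail area is exp(-x2/2) * sum_{j=0}^{df/2 - 1} (x2/2)**j / j!
y = x2 / 2
p_value = math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))

print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

Nearly all of X² comes from the "other colors" cell, where 90 thefts were observed against about 41.5 expected; the resulting P-value is far below both .001 and α = .01.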