Chapter 12

PowerPoint Slideshow about 'Chapter 12' - MartaAdara


The Analysis of Categorical Data and Goodness-of-Fit Tests


12.1 Chi-Square Tests for Univariate Categorical Data

Examples of Univariate Categorical Data:

  • Each student in a sample of 100 is classified as full-time or part-time (two categories).

  • Each airline passenger in a sample of 50 is classified by type of ticket: coach, business class, or first class (three categories).

  • Each voter in a sample of 100 is asked which of the five city council members he or she favors for mayor (five categories).


One-way Frequency Table for Univariate Categorical Data

  • Fees keep American taxpayers from using credit cards to make tax payments. A random sample of 100 taxpayers is asked whether they will use a credit card to pay taxes next year. The outcome of the survey is as follows:

The manager of a tax preparation company might be interested in determining whether the four possible responses occur equally often, that is, the long-run proportion of responses in each of the four categories is ¼.


One-way Frequency Table for Univariate Categorical Data

  • Each item returned to a department store is classified according to how the return was resolved: cash refund, credit to charge account, merchandise exchange, or return refused (four categories). The observations from a sample of 100 returns are summarized in a one-way frequency table consisting of k = 4 cells:

The customer relations manager for the department store might be interested in determining whether the four possible dispositions for a return request occur equally often, that is, whether the long-run proportion of returns in each of the four categories is ¼.
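A one-way frequency table of this kind can be built by counting category labels. The following is a minimal sketch: the raw observations are hypothetical (the slide's own table is not reproduced in this transcript), and the function name `one_way_table` is illustrative.

```python
from collections import Counter

CATEGORIES = [
    "cash refund",
    "credit to charge account",
    "merchandise exchange",
    "return refused",
]

# Hypothetical raw data: one disposition label per returned item (n = 100).
# These counts are made up for illustration; they are NOT the slide's data.
observations = (
    ["cash refund"] * 30
    + ["credit to charge account"] * 20
    + ["merchandise exchange"] * 28
    + ["return refused"] * 22
)


def one_way_table(data, categories):
    """Return the observed count for each category, in the given order."""
    counts = Counter(data)
    return {category: counts.get(category, 0) for category in categories}


table = one_way_table(observations, CATEGORIES)
n = sum(table.values())
for category, count in table.items():
    print(f"{category:26s} {count:4d}  (sample proportion {count / n:.2f})")
```

The table has k = 4 cells, one per category, and the cell counts add up to the sample size.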


Notation

k = number of categories of a categorical variable,

π1 = true proportion for Category 1

π2 = true proportion for Category 2

· · ·

πk = true proportion for Category k

(Note: π1 + π2 + · · · + πk = 1.)

  • The hypotheses to be tested have the form

    H0: π1 = hypothesized proportion for Category 1

    π2 = hypothesized proportion for Category 2

    · · ·

    πk = hypothesized proportion for Category k

    Ha: H0 is not true. At least one of the true category proportions differs from the corresponding hypothesized value.


Example 12.1 Births and the Lunar Cycle

  • A common legend is that more babies than expected are born during certain phases of the lunar cycle. The following data are from a sample of randomly selected births during 24 lunar cycles.


Example: Births and the Lunar Cycle

  • If there is no relationship between number of births and the lunar cycle, then the number of births in each lunar cycle category should be proportional to the number of days included in that category.

  • There are a total of 699 days in the 24 lunar cycles considered, and 24 of those days are in the new moon category. If there is no relationship between number of births and lunar cycle, the proportion of births in the new moon category should be 24/699 = 0.0343.

  • Similarly, we can find the proportion of births for each of the other lunar cycle categories.
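The proportional-to-days reasoning can be sketched in Python. Only the new moon count (24 days) and the total (699 days) are given in the text; the other day counts below are assumptions back-calculated from the rounded proportions stated in the hypotheses for this example, and the category labels are placeholders.

```python
# Days in each lunar-cycle category over the 24 cycles considered.
# 24 (new moon) and the total of 699 come from the slides; the remaining
# counts are ASSUMED values consistent with the rounded proportions in H0.
days_per_category = {
    "category 1 (new moon)": 24,
    "category 2": 152,
    "category 3": 24,
    "category 4": 149,
    "category 5": 24,
    "category 6": 150,
    "category 7": 24,
    "category 8": 152,
}

total_days = sum(days_per_category.values())

# Under "no relationship", each category's share of births equals its share of days.
hypothesized = {
    category: days / total_days for category, days in days_per_category.items()
}

for category, proportion in hypothesized.items():
    print(f"{category:22s} {proportion:.4f}")
```

Because the proportions are shares of one total, they sum to 1, as the notation requires.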


Example: Births and the Lunar Cycle

  • If there is no relationship between number of births and the lunar cycle, then

    H0: π1 = 0.0343, π2 = 0.2175, π3 = 0.0343, π4 = 0.2132

    π5 = 0.0343, π6 = 0.2146, π7 = 0.0343, π8 = 0.2175

  • Ha: H0 is not true.

  • If H0 is true, the expected count for Category 1 (new moon) is n(0.0343), where n is the total number of births in the sample.

  • And the expected count for Category 2 is n(0.2175).


Example: Births and the Lunar Cycle

  • Expected counts for other categories are computed similarly.
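Each expected count is the sample size times the hypothesized proportion for that category. A short sketch follows; the sample size `n = 1000` is a hypothetical placeholder, since the number of births in the sample is not given in this transcript.

```python
# Hypothesized proportions from H0 for the eight lunar-cycle categories.
hypothesized = [0.0343, 0.2175, 0.0343, 0.2132, 0.0343, 0.2146, 0.0343, 0.2175]

# HYPOTHETICAL sample size, used only to illustrate the arithmetic.
n = 1000


def expected_counts(sample_size, proportions):
    """Expected cell counts under H0: sample size times each hypothesized proportion."""
    return [sample_size * p for p in proportions]


expected = expected_counts(n, hypothesized)
for k, count in enumerate(expected, start=1):
    print(f"Category {k}: expected count = {count:.2f}")
```

Expected counts need not be whole numbers, and they always add up to the sample size.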


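The Goodness-of-Fit Statistic X²

  • First we compute the quantity

    (observed cell count − expected cell count)² / expected cell count

    for each cell, where, for a sample of size n,

    expected cell count = n × (hypothesized value of the corresponding category proportion)

  • The X² statistic is the sum of these quantities for all k cells:

    X² = Σ (observed cell count − expected cell count)² / expected cell count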
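The statistic can be computed directly from its definition. This is a minimal sketch: the observed counts are hypothetical, and the hypothesized proportions of ¼ each follow the equally-often scenario described earlier in the chapter.

```python
def chi_square_statistic(observed, hypothesized):
    """Sum of (observed - expected)^2 / expected over all k cells."""
    n = sum(observed)
    expected = [n * p for p in hypothesized]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL observed counts for k = 4 categories (n = 100).
observed = [30, 20, 28, 22]
hypothesized = [0.25, 0.25, 0.25, 0.25]

x2 = chi_square_statistic(observed, hypothesized)
print(f"X^2 = {x2:.2f}")  # each expected count is 25, so X^2 = (25 + 25 + 9 + 9) / 25 = 2.72
```

The statistic is 0 only when every observed count equals its expected count, and it grows as the observed counts move away from what H0 predicts.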


Chi-Square Distribution

  • The goodness-of-fit statistic, X², is a quantitative measure of the extent to which the observed counts differ from those expected when H0 is true.

  • Therefore, large values of X² suggest rejection of H0.

  • For a test procedure based on the X² statistic, the associated P-value is the area under the appropriate chi-square curve to the right of the computed X² value (Appendix Table 8).

  • Reject H0 if P-value < significance level α.

  • Example: find the P-value if X² = 4.93 and df = 2.

From Appendix Table 8, P-value = 0.085.
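The table lookup can be reproduced numerically. The sketch below uses only the Python standard library and computes the upper-tail area through the regularized upper incomplete gamma function; the function name `chi_square_upper_tail` is illustrative, not part of any library.

```python
import math


def chi_square_upper_tail(x2, df):
    """Area to the right of x2 under the chi-square curve with df degrees of freedom.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    with h = x2 / 2, starting from Q(1, h) = exp(-h) for even df
    or Q(1/2, h) = erfc(sqrt(h)) for odd df.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x2 <= 0:
        return 1.0
    h = x2 / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


p_value = chi_square_upper_tail(4.93, 2)
print(f"P-value = {p_value:.3f}")  # prints P-value = 0.085
```

For df = 2 the area reduces to exp(−X²/2), which is why the value matches the table entry to three decimals.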


Goodness-of-Fit Test Procedure

Hypotheses:

H0: π1 = hypothesized proportion for Category 1

∙ ∙ ∙

πk = hypothesized proportion for Category k

Ha: H0 is not true.

Test statistic:

X² = Σ (observed cell count − expected cell count)² / expected cell count, summed over all k cells

P-value:

For a test procedure based on the X² statistic, the associated P-value is the area under the appropriate chi-square curve to the right of the computed X² value (Appendix Table 8).

Reject H0 if P-value < significance level α.


Goodness-of-Fit Test Procedure

  • When H0 is true and all expected counts are at least 5, X² has approximately a chi-square distribution with df = k − 1.

  • The P-value associated with the computed test statistic value is the area to the right of X² under the df = k − 1 chi-square curve.

  • Upper-tail areas for chi-square distributions are found in Appendix Table 8.

    Assumptions:

  • Observed cell counts are based on a random sample.

  • The sample size is large. It is large enough for the chi-square test to be appropriate as long as every expected cell count is at least 5.
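The whole procedure, including the check that every expected count is at least 5, can be sketched as one function. This is an illustration under stated assumptions, not a reference implementation: the function names are made up, the sample data are hypothetical, and the random-sample assumption cannot be checked in code.

```python
import math


def chi_square_upper_tail(x2, df):
    """Upper-tail area of the chi-square distribution (standard library only)."""
    if x2 <= 0:
        return 1.0
    h = x2 / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def goodness_of_fit_test(observed, hypothesized, alpha=0.05):
    """Chi-square goodness-of-fit test for one categorical variable."""
    if len(observed) != len(hypothesized):
        raise ValueError("need one hypothesized proportion per category")
    if abs(sum(hypothesized) - 1.0) > 1e-6:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in hypothesized]
    if min(expected) < 5:
        raise ValueError("every expected cell count should be at least 5")
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = k - 1
    p_value = chi_square_upper_tail(x2, df)
    return {"x2": x2, "df": df, "p_value": p_value, "reject_h0": p_value < alpha}


# HYPOTHETICAL data: four categories hypothesized to occur equally often.
result = goodness_of_fit_test([30, 20, 28, 22], [0.25] * 4, alpha=0.05)
print(result)
```

Raising an error when an expected count falls below 5 is one design choice; a gentler variant could return a warning flag instead and let the caller decide.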


Example: Births and the Lunar Cycle Revisited

  • Test the hypothesis that number of births is unrelated to lunar cycle using the data in Example 12.1. Choose α = 0.05.

  • df = 8 − 1 = 7. The computed value of X² is less than 12.01 (the smallest entry in the df = 7 column), so P-value > .10.

  • Fail to reject H0, because P-value > α.

  • There is not enough evidence to conclude that number of births and lunar cycle are related.


Example: Hybrid Car Purchases

  • The table on the right lists sales of hybrid cars in the top five states in 2004. Use the X² goodness-of-fit test and a significance level of α = .01 to test the hypothesis that hybrid sales for these five states are proportional to the 2004 populations of these states (see table below).


Solution to Example: Hybrid Car Purchase

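If the hybrid sales for the five states are proportional to their 2004 populations, then each hypothesized proportion is that state's share of the combined population, and with n = 406 hybrid sales in total, the expected counts for hybrid sales in these states are: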

Expected count for California = 406(.495) = 200.970

Expected count for Virginia = 406(.103) = 41.818

Expected count for Washington = 406(.085) = 34.510

Expected count for Florida = 406(.240) = 97.440

Expected count for Maryland = 406(.077) = 31.262
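The arithmetic for these expected counts is easy to verify. The sketch below uses the total of 406 sales and the population proportions given in this solution; only the variable names are introduced here.

```python
n = 406  # total hybrid sales across the five states, as used in the solution

# Population proportions for the five states, as given in the solution.
hypothesized = {
    "California": 0.495,
    "Virginia": 0.103,
    "Washington": 0.085,
    "Florida": 0.240,
    "Maryland": 0.077,
}

expected = {state: n * p for state, p in hypothesized.items()}

for state, count in expected.items():
    print(f"Expected count for {state:10s} = {count:.3f}")

# Sanity checks: the expected counts add up to n, and all exceed 5.
print(f"Total = {sum(expected.values()):.3f}")
print(f"Smallest expected count = {min(expected.values()):.3f}")
```

Checking that the expected counts sum to the sample size is a quick way to catch an arithmetic slip in any single cell.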


Solution to Example: Hybrid Car Purchase

  • Let π1, π2, π3, π4, π5 denote the true proportions of hybrid car sales for California, Virginia, Washington, Florida, and Maryland, respectively.

  • Assumptions: The sample was a random sample. All expected counts are greater than 5, so it is appropriate to use the chi-square test.

  • H0: π1 = 0.495, π2 = 0.103, π3 = 0.085, π4 = 0.240, π5 = 0.077

  • Ha: H0 is not true.

  • Test statistic: X² = Σ (observed count − expected count)² / expected count = 59.49

  • P-value: All expected counts exceed 5, so the P-value can be based on the chi-square distribution with df = 5 − 1 = 4. P-value < 0.001 ≈ 0 (the test statistic 59.49 exceeds 18.47, the approximate value with right-tail area 0.001 for df = 4, and any larger value has right-tail area < 0.001).

  • Conclusion: Reject H0 since P-value < α. There is evidence that hybrid car sales are not proportional to population size for at least one of the five states.

On the next slides we use Excel to solve this problem.



Click ƒx, and an “Insert Function” dialog box appears. Select “Statistical” in “select a category” box. In the “Select a function” list choose “CHITEST”. Then click “OK”.



As soon as you input the Actual_range (observed frequencies) and Expected_range, you can see the P-value in “Formula result = 3.70981E-12” (3.70981 × 10⁻¹² ≈ 0).
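The Excel result can be cross-checked without a spreadsheet. This is a minimal sketch, assuming the rounded test statistic 59.49 and df = 4 from the solution; because the statistic is rounded, the result agrees with Excel's value only to about two significant digits. The function name is illustrative.

```python
import math


def upper_tail_df4(x2):
    """Upper-tail area of the chi-square distribution with df = 4.

    For df = 4 the area has the closed form exp(-x2/2) * (1 + x2/2).
    """
    h = x2 / 2.0
    return math.exp(-h) * (1.0 + h)


x2 = 59.49  # rounded test statistic from the solution
p_value = upper_tail_df4(x2)
print(f"P-value = {p_value:.3e}")
print("Reject H0 at alpha = .01:", p_value < 0.01)
```

The closed form applies only to df = 4; other degrees of freedom need a general chi-square tail function.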


Exercise: Color of Stolen Cars

Does the color of a car influence the chance that it will be stolen? The AP reported that for a random sample of 830 stolen vehicles: 140 were white, 100 were blue, 270 were red, 230 were black, and 90 were other colors. Use the X² goodness-of-fit test and a significance level of α = .01 to test the hypothesis that proportions stolen are identical to population color proportions. It is known that 15% of all cars are white, 15% are blue, 35% are red, 30% are black, and 5% are other colors.

Answer: P-value < .001. There is convincing evidence that at least one of the color proportions for stolen cars differs from the corresponding proportions for all cars.
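The stated answer can be checked with a short script that uses the counts and percentages from the exercise. It is a sketch rather than a model solution; the helper name `upper_tail_even_df` is illustrative and relies on the closed form of the chi-square tail for even degrees of freedom.

```python
import math


def upper_tail_even_df(x2, df):
    """Upper-tail chi-square area for EVEN df: exp(-h) * sum_{j < df/2} h**j / j!, h = x2/2."""
    if df < 2 or df % 2 != 0:
        raise ValueError("this closed form only applies to even df >= 2")
    h = x2 / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= h / j
        total += term
    return math.exp(-h) * total


colors = ["white", "blue", "red", "black", "other"]
observed = [140, 100, 270, 230, 90]            # stolen vehicles, n = 830
hypothesized = [0.15, 0.15, 0.35, 0.30, 0.05]  # color proportions for all cars

n = sum(observed)
expected = [n * p for p in hypothesized]
assert min(expected) >= 5  # large-sample condition for the chi-square test

x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(colors) - 1
p_value = upper_tail_even_df(x2, df)

print(f"X^2 = {x2:.2f}, df = {df}, P-value = {p_value:.2e}")
print("Reject H0 at alpha = .01:", p_value < 0.01)
```

Most of the statistic comes from the "other" category, where 90 thefts were observed against an expected count of 41.5.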

