
A PowerPoint Presentation Package to Accompany
Applied Statistics in Business & Economics, 4th edition
David P. Doane and Lori E. Seward
Prepared by Lloyd R. Jaisingh

Chapter 15: Chi-Square Tests
Chapter Contents
15.1 Chi-Square Test for Independence
15.2 Chi-Square Tests for Goodness-of-Fit
15.3 Uniform Goodness-of-Fit Test
15.4 Poisson Goodness-of-Fit Test
15.5 Normal Chi-Square Goodness-of-Fit Test
15.6 ECDF Tests (Optional)

Chapter Learning Objectives
LO15-1: Recognize a contingency table.
LO15-2: Find degrees of freedom and use the chi-square table of critical values.
LO15-3: Perform a chi-square test for independence on a contingency table.
LO15-4: Perform a goodness-of-fit (GOF) test for a uniform distribution.
LO15-5: Explain the GOF test for a Poisson distribution.
LO15-6: Use computer software to perform a chi-square GOF test for normality.
LO15-7: State advantages of ECDF tests as compared to chi-square GOF tests.

15.1 Chi-Square Test for Independence (LO15-1)
LO15-1: Recognize a contingency table.
Contingency Tables
• A contingency table is a cross-tabulation of n paired observations into categories.
• Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.
• For example: [Figure: example contingency table]
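
A cross-tabulation of paired observations can be sketched in a few lines of Python. The category labels and the eight observations below are hypothetical and serve only to illustrate how cell counts arise; `cross_tabulate` is an illustrative helper name, not something from the text.

```python
from collections import Counter

# Hypothetical paired observations: (row category, column category).
pairs = [
    ("Male", "Yes"), ("Male", "No"), ("Female", "Yes"), ("Female", "Yes"),
    ("Male", "No"), ("Female", "No"), ("Male", "Yes"), ("Female", "Yes"),
]

def cross_tabulate(pairs):
    """Return row labels, column labels, and a table of cell counts."""
    counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    # table[j][k] is the number of observations in row j and column k.
    table = [[counts[(row, col)] for col in col_labels] for row in row_labels]
    return row_labels, col_labels, table

row_labels, col_labels, table = cross_tabulate(pairs)
print(col_labels)
for label, counts_in_row in zip(row_labels, table):
    print(label, counts_in_row)
```

The cell counts always sum to n, the number of paired observations.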

15.1 Chi-Square Test for Independence (LO15-2, LO15-3)
LO15-3: Perform a chi-square test for independence on a contingency table.
LO15-2: Find degrees of freedom and use the chi-square table of critical values.
Chi-Square Test
• In a test of independence for an r × c contingency table, the hypotheses are
  H0: Variable A is independent of variable B
  H1: Variable A is not independent of variable B
• Use the chi-square test for independence to test these hypotheses.
• This nonparametric test is based on frequencies.
• The n data pairs are classified into c columns and r rows, and then the observed frequency $f_{jk}$ is compared with the expected frequency $e_{jk}$.
• The critical value comes from the chi-square probability distribution with $\nu$ degrees of freedom (see Appendix E for table values):
  d.f. = degrees of freedom = $\nu = (r - 1)(c - 1)$
  where r = number of rows in the table and c = number of columns in the table.

15.1 Chi-Square Test for Independence (LO15-3)
Expected Frequencies
• Assuming that H0 is true, the expected frequency of row j and column k is
  $e_{jk} = R_j C_k / n$
  where $R_j$ = total for row j (j = 1, 2, …, r), $C_k$ = total for column k (k = 1, 2, …, c), and n = sample size.
Steps in Testing the Hypotheses
• Step 1: State the Hypotheses.
  • H0: Variable A is independent of variable B
  • H1: Variable A is not independent of variable B
• Step 2: Specify the Decision Rule.
  • Calculate d.f. = $(r - 1)(c - 1)$.
  • For a given $\alpha$, look up the right-tail critical value $\chi^2_R$ from Appendix E or by using Excel.
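
The expected-frequency formula and the degrees-of-freedom calculation can be sketched as follows. The 2 × 2 counts are hypothetical, and `expected_frequencies` is an illustrative name.

```python
def expected_frequencies(observed):
    """Expected counts e_jk = R_j * C_k / n under independence."""
    row_totals = [sum(row) for row in observed]        # R_j
    col_totals = [sum(col) for col in zip(*observed)]  # C_k
    n = sum(row_totals)
    return [[r_j * c_k / n for c_k in col_totals] for r_j in row_totals]

observed = [[30, 10], [20, 40]]  # hypothetical 2 x 2 counts
expected = expected_frequencies(observed)

# d.f. = (r - 1)(c - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, df)
```

Each row of expected counts sums to the same row total as the observed counts, which is a quick sanity check on the arithmetic.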

15.1 Chi-Square Test for Independence (LO15-3)
Steps in Testing the Hypotheses
• Step 3: Calculate the Expected Frequencies.
  • $e_{jk} = R_j C_k / n$
• Step 4: Calculate the Test Statistic.
  • The chi-square test statistic is
    $\chi^2_{calc} = \sum_{j=1}^{r} \sum_{k=1}^{c} \dfrac{(f_{jk} - e_{jk})^2}{e_{jk}}$
• Step 5: Make the Decision.
  • Reject H0 if the test statistic > $\chi^2_R$ or if the p-value ≤ $\alpha$.
Small Expected Frequencies
• The chi-square test is unreliable if the expected frequencies are too small.
• Rules of thumb:
  • Cochran's Rule requires that $e_{jk} > 5$ for all cells.
  • A less strict rule allows up to 20% of the cells to have $e_{jk} < 5$.
  • Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell.
• If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
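
A minimal sketch of the test statistic, the decision, and the small-expected-frequency checks is shown below. The counts are hypothetical, the function name is illustrative, and the critical value 3.841 is the standard table entry for α = .05 with 1 degree of freedom.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r_j * c_k / n for c_k in col_totals] for r_j in row_totals]
    # Sum (f_jk - e_jk)^2 / e_jk over every cell.
    chi2 = sum(
        (f - e) ** 2 / e
        for f_row, e_row in zip(observed, expected)
        for f, e in zip(f_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

observed = [[30, 10], [20, 40]]  # hypothetical counts
chi2, df, expected = chi_square_independence(observed)

# Rules of thumb for small expected frequencies.
cells = [e for row in expected for e in row]
share_below_5 = sum(e < 5 for e in cells) / len(cells)
any_below_1 = any(e < 1 for e in cells)

critical_value = 3.841  # chi-square table, alpha = .05, d.f. = 1
reject_h0 = chi2 > critical_value
print(round(chi2, 3), df, reject_h0)
```

If `share_below_5` exceeds 0.20 or `any_below_1` is true, the rules of thumb above suggest combining rows or columns before testing.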

15.1 Chi-Square Test for Independence (LO15-3)
Test of Two Proportions
• For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions, if the samples are large enough to ensure normality.
• The hypotheses are:
  H0: $\pi_1 = \pi_2$
  H1: $\pi_1 \neq \pi_2$
Cross-Tabulating Raw Data
• Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories.
Why Do a Chi-Square Test on Numerical Data?
• The researcher may believe there is a relationship between X and Y, but does not want to use regression.
• There are outliers or anomalies that prevent us from assuming that the data came from a normal population.
• The researcher has numerical data for one variable but not the other.
Figure 14.6
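
The equivalence for a 2 × 2 table can be checked numerically: the square of the pooled two-proportion z statistic equals the chi-square statistic computed without a continuity correction. The sketch below assumes the pooled-variance form of the z test; the counts and function names are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: pi1 = pi2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(table):
    """Chi-square statistic (no continuity correction) for a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for j in range(2):
        for k in range(2):
            e = row_totals[j] * col_totals[k] / n
            total += (table[j][k] - e) ** 2 / e
    return total

# Hypothetical data: rows are the two samples, columns are success/failure.
table = [[30, 10], [20, 40]]
z = two_proportion_z(30, 40, 20, 60)
chi2 = chi_square_2x2(table)
print(round(z ** 2, 4), round(chi2, 4))
```

Because the chi-square statistic is the square of z, it cannot distinguish direction, which is why the equivalence holds only for the two-tailed z test.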

15.2 Chi-Square Tests for Goodness-of-Fit
Purpose of the Test
• The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population.
• The chi-square test will be used because it is versatile and easy to understand.
Multinomial GOF Test
• A multinomial distribution is defined by any k probabilities $\pi_1, \pi_2, \dots, \pi_k$ that sum to unity. For example,
  H0: $\pi_1 = .13$, $\pi_2 = .13$, $\pi_3 = .24$, $\pi_4 = .20$, $\pi_5 = .16$, $\pi_6 = .14$
  H1: At least one of the $\pi_j$ differs from the hypothesized value.
• If no parameters are estimated (m = 0) and there are c = 6 classes, the degrees of freedom will be d.f. = c - m - 1 = 6 - 0 - 1 = 5.

15.2 Chi-Square Tests for Goodness-of-Fit
Hypotheses for GOF
• The hypotheses are:
  H0: The population follows a _____ distribution
  H1: The population does not follow a _____ distribution
• The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).
Test Statistic and Degrees of Freedom for GOF
• The test statistic is
  $\chi^2_{calc} = \sum_{j=1}^{c} \dfrac{(f_j - e_j)^2}{e_j}$
  where $f_j$ = the observed frequency of observations in class j and $e_j$ = the expected frequency in class j if H0 were true.
• The test statistic follows the chi-square distribution with degrees of freedom d.f. = c - m - 1, where c is the number of classes used in the test and m is the number of parameters estimated.
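
As a rough illustration of the GOF statistic, the sketch below uses the six hypothesized probabilities from the multinomial example in this section. The observed counts are made up for illustration, `gof_chi_square` is a hypothetical helper, and 11.070 is the standard table value for α = .05 with 5 degrees of freedom.

```python
def gof_chi_square(observed, probabilities, m=0):
    """Chi-square GOF statistic with d.f. = c - m - 1."""
    n = sum(observed)
    expected = [n * p for p in probabilities]   # e_j = n * pi_j under H0
    chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - m - 1
    return chi2, df, expected

# Hypothesized probabilities from the multinomial example; counts are invented.
probabilities = [0.13, 0.13, 0.24, 0.20, 0.16, 0.14]
observed = [15, 12, 22, 21, 14, 16]
chi2, df, expected = gof_chi_square(observed, probabilities)

critical_value = 11.070  # chi-square table, alpha = .05, d.f. = 5
reject_h0 = chi2 > critical_value
print(round(chi2, 3), df, reject_h0)
```

Passing `m=1` or `m=2` covers the Poisson and normal cases, where one or two parameters are estimated from the sample.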

15.3 Uniform Goodness-of-Fit Test (LO15-4)
LO15-4: Perform a goodness-of-fit (GOF) test for a uniform distribution.
Uniform Distribution
• The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence.
• The chi-square test for a uniform distribution compares all c groups simultaneously.
• The hypotheses are:
  H0: $\pi_1 = \pi_2 = \dots = \pi_c = 1/c$
  H1: Not all $\pi_j$ are equal
• The test can be performed on data that are already tabulated into groups.
• Calculate the expected frequency $e_j$ for each cell.
• The degrees of freedom are d.f. = c - 1 since there are no parameters for the uniform distribution.
• Obtain the critical value $\chi^2_\alpha$ from Appendix E for the desired level of significance $\alpha$.
• The p-value can be obtained from Excel.
• Reject H0 if p-value ≤ $\alpha$.
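
For tabulated data, the uniform GOF test reduces to a few lines. The sketch below is one way to obtain the p-value without Excel, using a closed-form series for the chi-square right-tail area at integer degrees of freedom. The helper `chi2_sf` and the four group counts are illustrative assumptions, not part of the text.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(chi-square > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd d.f.: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

observed = [18, 27, 30, 25]       # hypothetical counts in c = 4 groups
c, n = len(observed), sum(observed)
expected = n / c                  # e_j = n / c under H0
chi2 = sum((f - expected) ** 2 / expected for f in observed)
df = c - 1
p_value = chi2_sf(chi2, df)
alpha = 0.05
print(round(chi2, 2), df, round(p_value, 3), p_value <= alpha)
```

As a check on the helper, `chi2_sf(3.841, 1)` and `chi2_sf(7.815, 3)` both come out very close to .05, matching the usual table entries.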

15.3 Uniform Goodness-of-Fit Test (LO15-4)
Uniform GOF Test: Raw Data
• First form c bins of equal width and create a frequency distribution.
• Calculate the observed frequency $f_j$ for each bin.
• Define $e_j = n/c$.
• Perform the chi-square calculations.
• The degrees of freedom are d.f. = c - 1 since there are no parameters for the uniform distribution.
• Obtain the critical value from Appendix E for a given significance level $\alpha$ and make the decision.
• Maximize the test's power by defining bin width as $(x_{max} - x_{min})/c$. (As a result, the expected frequencies will be as large as possible.)
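
The raw-data steps can be sketched as below, assuming the c equal-width bins span the observed range from the smallest to the largest data value. The 20 data values are hypothetical and `bin_counts` is an illustrative name.

```python
def bin_counts(data, c):
    """Count observations in c equal-width bins spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / c
    counts = [0] * c
    for x in data:
        # The largest value would land in bin c, so clamp it into the last bin.
        j = min(int((x - lo) / width), c - 1)
        counts[j] += 1
    return counts

# Hypothetical raw data.
data = [0, 3, 8, 12, 17, 25, 33, 37, 41, 45,
        48, 52, 55, 60, 71, 79, 83, 90, 95, 99]
c = 4
observed = bin_counts(data, c)
expected = len(data) / c   # e_j = n / c
chi2 = sum((f - expected) ** 2 / expected for f in observed)
df = c - 1
print(observed, round(chi2, 2), df)
```

With only 20 observations the expected frequency per bin is 5, so a real application would want a larger sample or fewer bins.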

15.3 Uniform Goodness-of-Fit Test (LO15-4)
Uniform GOF Test: Raw Data
• Calculate the mean and standard deviation of the hypothesized uniform distribution. [Formulas not recovered from the slide.]
• If the data are not skewed and the sample size is large (n > 30), then the sample mean is approximately normally distributed.
• So, test the hypothesized uniform mean using
  $z_{calc} = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
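
A sketch of the large-sample z test for the uniform mean follows. It assumes a continuous uniform distribution on [a, b], for which the mean is (a + b)/2 and the standard deviation is (b - a)/√12; a discrete uniform model would use a slightly different standard deviation. The sample mean, sample size, and limits are hypothetical.

```python
import math
from statistics import NormalDist

def uniform_mean_z_test(xbar, n, a, b):
    """Two-tailed z test of the sample mean against a uniform(a, b) mean."""
    mu = (a + b) / 2
    sigma = (b - a) / math.sqrt(12)   # assumption: continuous uniform
    z = (xbar - mu) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical summary statistics.
z, p_value = uniform_mean_z_test(xbar=55.0, n=36, a=0.0, b=100.0)
print(round(z, 3), round(p_value, 3))
```

This test looks only at the mean, so it complements the chi-square GOF test rather than replacing it.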

15.4 Poisson Goodness-of-Fit Test (LO15-5)
LO15-5: Explain the GOF test for a Poisson distribution.
Poisson Data-Generating Situations
• In a Poisson distribution model, X represents the number of events per unit of time or space.
• X is a discrete nonnegative integer (X = 0, 1, 2, …).
• Event arrivals must be independent of each other.
• Sometimes called a model of rare events because X typically has a small mean.
Poisson Goodness-of-Fit Test
• The mean $\lambda$ is the only parameter.
• If $\lambda$ is unknown, it must be estimated from the sample.
• Use the estimated $\lambda$ to find the Poisson probability P(X) for each value of X.
• Compute the expected frequencies.
• Perform the chi-square calculations.
• Make the decision.
• You may need to combine classes until expected frequencies become large enough for the test (at least until $e_j > 2$).

15.4 Poisson Goodness-of-Fit Test (LO15-5)
Poisson GOF Test: Tabulated Data
• Calculate the sample mean as
  $\hat{\lambda} = \dfrac{\sum_{j=1}^{c} x_j f_j}{n}$
• Using this estimated mean, calculate the Poisson probabilities either by using the Poisson formula $P(x) = \lambda^x e^{-\lambda} / x!$ or Excel.
• For c classes with m = 1 parameter estimated, the degrees of freedom are d.f. = c - m - 1.
• Obtain the critical value for a given $\alpha$ from Appendix E.
• Make the decision.
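
The tabulated-data steps might look like this in Python. The frequencies are hypothetical, and the sketch assumes the classes are X = 0, 1, 2, … in order, with the last class treated as open-ended ("that value or more") so the probabilities sum to 1. The critical value 7.815 is the standard table entry for α = .05 with 3 degrees of freedom.

```python
import math

def poisson_gof(values, freqs):
    """Poisson GOF for classes 0, 1, 2, ...; the last class is open-ended."""
    n = sum(freqs)
    lam = sum(x * f for x, f in zip(values, freqs)) / n   # estimated mean
    probs = [lam ** x * math.exp(-lam) / math.factorial(x) for x in values[:-1]]
    probs.append(1 - sum(probs))                          # tail probability
    expected = [n * p for p in probs]
    chi2 = sum((f - e) ** 2 / e for f, e in zip(freqs, expected))
    df = len(freqs) - 1 - 1                               # c - m - 1, m = 1
    return lam, expected, chi2, df

values = [0, 1, 2, 3, 4]        # X values; 4 stands for "4 or more"
freqs = [30, 36, 22, 9, 3]      # hypothetical observed frequencies
lam, expected, chi2, df = poisson_gof(values, freqs)

smallest_expected = min(expected)   # compare with the e_j > 2 guideline
critical_value = 7.815              # chi-square table, alpha = .05, d.f. = 3
print(round(lam, 2), round(chi2, 3), df, chi2 > critical_value)
```

If `smallest_expected` fell below the guideline, adjacent classes would be combined and c (and therefore d.f.) reduced accordingly.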

15.5 Normal Chi-Square Goodness-of-Fit Test (LO15-6)
LO15-6: Use computer software to perform a chi-square GOF test for normality.
Normal Data-Generating Situations
• Two parameters, the mean $\mu$ and the standard deviation $\sigma$, fully describe the normal distribution.
• Unless $\mu$ and $\sigma$ are known a priori, they must be estimated from a sample.
• Using these statistics, the chi-square goodness-of-fit test can be used.
Method 1: Standardizing the Data
• Transform the sample observations $x_1, x_2, \dots, x_n$ into standardized values:
  $z_i = \dfrac{x_i - \bar{x}}{s}$
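
Standardizing a sample takes only the sample mean and sample standard deviation. The sketch below assumes the usual transformation (subtract the sample mean, divide by the sample standard deviation); the data are hypothetical.

```python
from statistics import mean, stdev

def standardize(data):
    """Return z-values (x - xbar) / s using the sample mean and std. dev."""
    xbar, s = mean(data), stdev(data)
    return [(x - xbar) / s for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
z_values = standardize(data)
print([round(z, 3) for z in z_values])
```

By construction the standardized values have mean 0 and sample standard deviation 1, so they can be compared directly with standard normal bin limits.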

15.5 Normal Chi-Square Goodness-of-Fit Test (LO15-6)
Method 2: Equal Bin Widths
• To obtain equal-width bins, divide the exact data range into c groups of equal width.
• Step 1: Count the sample observations in each bin to get observed frequencies $f_j$.
• Step 2: Convert the bin limits into standardized z-values by using the formula $z_j = (x_j - \bar{x})/s$.
• Step 3: Find the normal area within each bin assuming a normal distribution.
• Step 4: Find expected frequencies $e_j$ by multiplying each normal area by the sample size n.
• Classes may need to be collapsed from the ends inward to enlarge expected frequencies.
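
The four steps of the equal-bin-width method can be sketched with the standard library's `statistics.NormalDist`. Two assumptions are made here: the first and last bins are treated as open-ended so that the normal areas sum to 1, and an observation equal to an interior bin limit is counted in the higher bin. The data and function name are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev

def normal_gof_equal_width(data, c):
    """Observed and expected frequencies for c equal-width bins."""
    xbar, s, n = mean(data), stdev(data), len(data)
    lo, hi = min(data), max(data)
    width = (hi - lo) / c
    limits = [lo + j * width for j in range(1, c)]       # c - 1 interior limits
    # Step 1: observed frequencies.
    observed = [0] * c
    for x in data:
        observed[bisect_right(limits, x)] += 1
    # Step 2: standardize the bin limits.
    z_limits = [(x - xbar) / s for x in limits]
    # Step 3: normal area in each bin (end bins open-ended).
    cdf = [0.0] + [NormalDist().cdf(z) for z in z_limits] + [1.0]
    areas = [cdf[j + 1] - cdf[j] for j in range(c)]
    # Step 4: expected frequencies.
    expected = [n * a for a in areas]
    return limits, observed, expected

data = [42, 47, 49, 51, 52, 54, 55, 55, 57, 58, 60, 61, 63, 66, 70]
limits, observed, expected = normal_gof_equal_width(data, c=4)
print(limits, observed, [round(e, 2) for e in expected])
```

With a sample this small the end bins have low expected frequencies, which illustrates why classes may need to be collapsed from the ends inward.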

15.5 Normal Chi-Square Goodness-of-Fit Test (LO15-6)
Method 3: Equal Expected Frequencies
• Define histogram bins in such a way that an equal number of observations would be expected within each bin under the null hypothesis.
• Define bin limits so that $e_j = n/c$.
• A normal area of 1/c in each of the c bins is desired.
• The first and last classes must be open-ended for a normal distribution, so to define c bins, we need c - 1 cut-points.
• The upper limit of bin j can be found directly by using Excel.
• Alternatively, find $z_j$ for bin j using Excel and then calculate the upper limit for bin j as $\bar{x} + z_j s$.
• Once the bins are defined, count the observations $f_j$ within each bin and compare them with the expected frequencies $e_j = n/c$.
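
The cut-points for equal expected frequencies can be computed with the inverse normal CDF. The sketch below uses Python's `statistics.NormalDist.inv_cdf` in place of Excel and shows both routes: via standardized $z_j$ values and directly from a normal distribution with the sample mean and standard deviation. The values 50 and 10 are hypothetical sample statistics.

```python
from statistics import NormalDist

def equal_expected_cutpoints(xbar, s, c):
    """Return the c - 1 z-values and cut-points giving normal area 1/c per bin."""
    z = [NormalDist().inv_cdf(j / c) for j in range(1, c)]
    cut_points = [xbar + z_j * s for z_j in z]
    return z, cut_points

xbar, s, c = 50.0, 10.0, 4        # hypothetical sample mean, std. dev., bins
z, cut_points = equal_expected_cutpoints(xbar, s, c)

# Direct route: inverse CDF of a normal with the sample mean and std. dev.
direct = [NormalDist(xbar, s).inv_cdf(j / c) for j in range(1, c)]
print([round(v, 3) for v in z], [round(v, 2) for v in cut_points])
```

With c = 4 the cut-points are the quartiles of the fitted normal distribution, so each bin has expected frequency n/4.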

15.6 ECDF Tests (LO15-7)
LO15-7: State advantages of ECDF tests as compared to chi-square GOF tests.
• There are many alternatives to the chi-square test based on the Empirical Cumulative Distribution Function (ECDF).
• The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequency of the n data values.
• The K-S test is not recommended for grouped data.
• The K-S test assumes that no parameters are estimated.
• If parameters are estimated, use a Lilliefors test.
• Both of these tests are done by computer.
• The Anderson-Darling (A-D) test is widely used for non-normality because of its power.
• The A-D test is based on a probability plot.
• When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line.
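
The K-S statistic itself is simple to compute from ungrouped data. The sketch below uses the usual computational form, which checks the gap between the ECDF and the hypothesized CDF just before and just after each sorted data value. It assumes a fully specified hypothesized distribution (no parameters estimated) and does not compute critical values or p-values. The sample and the N(50, 10) hypothesis are hypothetical.

```python
from statistics import NormalDist

def ks_statistic(data, cdf):
    """Largest absolute gap between the ECDF and a hypothesized CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical sample tested against a fully specified N(50, 10).
data = [38.2, 44.5, 47.1, 49.0, 50.3, 52.8, 55.6, 61.9]
d_stat = ks_statistic(data, NormalDist(50, 10).cdf)
print(round(d_stat, 4))
```

Because every data value contributes individually, no binning decision is needed, which is one advantage of ECDF tests over chi-square GOF tests.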
