
# Prepared by Lloyd R. Jaisingh - PowerPoint PPT Presentation

A PowerPoint presentation package to accompany *Applied Statistics in Business & Economics*, 4th edition, by David P. Doane and Lori E. Seward. Prepared by Lloyd R. Jaisingh.




## Chapter 15: Chi-Square Tests

15.1 Chi-Square Test for Independence

15.2 Chi-Square Tests for Goodness-of-Fit

15.3 Uniform Goodness-of-Fit Test

15.4 Poisson Goodness-of-Fit Test

15.5 Normal Chi-Square Goodness-of-Fit Test

15.6 ECDF Tests (Optional)

LO15-1: Recognize a contingency table.

LO15-2: Find degrees of freedom and use the chi-square table of critical values.

LO15-3: Perform a chi-square test for independence on a contingency table.

LO15-4: Perform a goodness-of-fit (GOF) test for a uniform distribution.

LO15-5: Explain the GOF test for a Poisson distribution.

LO15-6: Use computer software to perform a chi-square GOF test for normality.

LO15-7: State advantages of ECDF tests as compared to chi-square GOF tests.


### 15.1 Chi-Square Test for Independence

Contingency Tables

LO15-1: Recognize a contingency table.

• A contingency table is a cross-tabulation of n paired observations into categories.

• Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.

• For example:

Chi-Square Test

LO15-3: Perform a chi-square test for independence on a contingency table.

• In a test of independence for an r × c contingency table, the hypotheses are:

• H0: Variable A is independent of variable B

• H1: Variable A is not independent of variable B

• Use the chi-square test for independence to test these hypotheses.

• This nonparametric test is based on frequencies.

• The n data pairs are classified into c columns and r rows, and then the observed frequency fjk in each cell is compared with the expected frequency ejk.

• The critical value comes from the chi-square probability distribution with (r – 1)(c – 1) degrees of freedom (see Appendix E for table values).

d.f. = degrees of freedom = (r – 1)(c – 1), where r = number of rows in the table and c = number of columns in the table.

LO15-2: Find degrees of freedom and use the chi-square table of critical values.


Expected Frequencies

• Assuming that H0 is true, the expected frequency of row j and column k is:

$$e_{jk} = \frac{R_j C_k}{n}$$

where Rj = total for row j (j = 1, 2, …, r), Ck = total for column k (k = 1, 2, …, c), and n = sample size.
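The expected-frequency formula can be sketched in Python as follows. This is a minimal illustration, assuming the observed table is given as a list of rows of counts; the function name `expected_frequencies` and the 2 × 3 counts are made up for the example.

```python
def expected_frequencies(observed):
    """Return e_jk = R_j * C_k / n for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]          # R_j
    col_totals = [sum(col) for col in zip(*observed)]    # C_k
    n = sum(row_totals)                                  # sample size
    return [[rj * ck / n for ck in col_totals] for rj in row_totals]

# Hypothetical 2 x 3 table of counts (illustration only)
observed = [[10, 20, 30],
            [20, 20, 20]]
expected = expected_frequencies(observed)
```

Here the row totals are 60 and 60, the column totals 30, 40, and 50, and n = 120, so each row of expected frequencies is 15, 20, 25.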

Steps in Testing the Hypotheses

• Step 1: State the Hypotheses.

• H0: Variable A is independent of variable B

• H1: Variable A is not independent of variable B

• Step 2: Specify the Decision Rule.

• Calculate d.f. = (r – 1)(c – 1)

• For a given α, look up the right-tail critical value χ²R from Appendix E or by using Excel.


• Step 4: Calculate the Test Statistic.

• The chi-square test statistic is

$$\chi^2_{calc} = \sum_{j=1}^{r} \sum_{k=1}^{c} \frac{(f_{jk} - e_{jk})^2}{e_{jk}}$$

• Step 5: Make the Decision.

• Reject H0 if the test statistic > χ²R or if the p-value ≤ α.
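A short Python sketch of the whole test: compute each ejk, sum the chi-square terms, find d.f. = (r – 1)(c – 1), and compare with a right-tail critical value. The function name and the counts are hypothetical; 5.991 is the standard table value for d.f. = 2 at α = .05, supplied by hand rather than computed.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, d.f.) for an r x c contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for j, row in enumerate(observed):
        for k, f_jk in enumerate(row):
            e_jk = row_totals[j] * col_totals[k] / n   # expected count under H0
            stat += (f_jk - e_jk) ** 2 / e_jk
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
    return stat, df

observed = [[10, 20, 30],
            [20, 20, 20]]                # made-up counts
stat, df = chi_square_independence(observed)
critical = 5.991                         # chi-square right tail, d.f. = 2, alpha = .05
reject_h0 = stat > critical
```

For these counts the statistic is 16/3 ≈ 5.33 with d.f. = 2, which falls short of 5.991, so H0 would not be rejected at α = .05.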

Small Expected Frequencies

• The chi-square test is unreliable if the expected frequencies are too small.

• Rules of thumb:

• Cochran’s Rule requires that ejk > 5 for all cells.

• A more lenient rule allows up to 20% of the cells to have ejk < 5.

• Most agree that a chi-square test is infeasible if ejk < 1 in any cell.

• If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
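The rules of thumb above can be checked mechanically before running the test. In this sketch the thresholds (> 5, 20%, < 1) come from the slide, while the function name `expected_frequency_checks` and the sample table are illustrative assumptions.

```python
def expected_frequency_checks(expected):
    """Apply the small-expected-frequency rules of thumb to a table of e_jk."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return {
        "cochran_ok": all(e > 5 for e in cells),     # Cochran's Rule: every e_jk > 5
        "twenty_percent_ok": share_below_5 <= 0.20,  # at most 20% of cells below 5
        "infeasible": any(e < 1 for e in cells),     # any e_jk < 1: test is infeasible
    }

# Made-up expected frequencies: two of six cells are below 5, one is below 1
checks = expected_frequency_checks([[12.0, 4.5, 8.0],
                                    [9.0, 6.5, 0.8]])
```

A table that fails like this one would be a candidate for combining adjacent rows or columns.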


Test of Two Proportions

• Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories.

• For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions, if the samples are large enough to ensure normality.

• The hypotheses are H0: π1 = π2 versus H1: π1 ≠ π2, where π1 and π2 are the two population proportions.
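The equivalence can be checked numerically: for a 2 × 2 table of successes and failures, the chi-square statistic (without a continuity correction) equals the square of the pooled two-proportion z statistic. The function names and the sample counts below are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: pi1 = pi2."""
    p1, p2 = x1 / n1, x2 / n2
    pbar = (x1 + x2) / (n1 + n2)                     # pooled proportion
    se = math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2 x 2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = n1 + n2
    stat = 0.0
    for j in range(2):
        for k in range(2):
            e = rows[j] * cols[k] / n                # expected count under H0
            stat += (observed[j][k] - e) ** 2 / e
    return stat

z = two_proportion_z(30, 100, 45, 120)       # made-up samples
chi2 = chi_square_2x2(30, 100, 45, 120)      # equals z ** 2
```

Because χ² = z², the chi-square test's right-tail p-value matches the two-tailed z test's p-value.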

Cross-Tabulating Raw Data

Why Do a Chi-Square Test on Numerical Data?

• The researcher may believe there’s a relationship between X and Y, but doesn’t want to use regression.

• There are outliers or anomalies that prevent us from assuming that the data came from a normal population.

• The researcher has numerical data for one variable but not the other.

Figure 14.6

### 15.2 Chi-Square Tests for Goodness-of-Fit

Purpose of the Test

• The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population.

• The chi-square test will be used because it is versatile and easy to understand.

Multinomial GOF Test

• A multinomial distribution is defined by any k probabilities p1, p2, …, pk that sum to unity. For example,

H0: p1 = .13, p2 = .13, p3 = .24, p4 = .20, p5 = .16, p6 = .14

H1: At least one of the pj differs from the hypothesized value.

• Since no parameters are estimated (m = 0) and there are c = 6 classes, the degrees of freedom are d.f. = c – m – 1 = 6 – 0 – 1 = 5.
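Under a multinomial H0 the expected frequency of class j is n·pj. The sketch below uses the hypothesized probabilities from the example and an assumed sample size of n = 200 (not given in the text); `multinomial_expected` is an illustrative name.

```python
def multinomial_expected(probs, n, m=0):
    """Expected counts n * p_j and d.f. = c - m - 1 for a multinomial GOF test."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    expected = [n * p for p in probs]
    df = len(probs) - m - 1
    return expected, df

probs = [.13, .13, .24, .20, .16, .14]               # H0 from the example
expected, df = multinomial_expected(probs, n=200)    # n = 200 is hypothetical
```

With n = 200 the expected counts are 26, 26, 48, 40, 32, and 28, and d.f. = 5.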


Hypotheses for GOF

• The hypotheses are:

H0: The population follows a _____ distribution

H1: The population does not follow a _____ distribution

• The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

Test Statistic and Degrees of Freedom for GOF

$$\chi^2_{calc} = \sum_{j=1}^{c} \frac{(f_j - e_j)^2}{e_j}$$

where fj = the observed frequency of observations in class j and ej = the expected frequency in class j if H0 were true.

• The test statistic follows the chi-square distribution with degrees of freedom d.f. = c – m – 1, where c is the number of classes used in the test and m is the number of parameters estimated.
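A general GOF helper following the formula and the d.f. rule above; the function name `gof_chi_square` and the counts are illustrative only.

```python
def gof_chi_square(observed, expected, m=0):
    """Return (chi-square statistic, d.f.) with d.f. = c - m - 1."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - m - 1
    return stat, df

# Made-up counts for c = 4 classes, with one parameter estimated (m = 1)
stat, df = gof_chi_square([18, 30, 32, 20], [20, 30, 30, 20], m=1)
```

The statistic here is 4/20 + 4/30 = 1/3 with d.f. = 4 – 1 – 1 = 2.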

### 15.3 Uniform Goodness-of-Fit Test

Uniform Distribution

LO15-4: Perform a goodness-of-fit (GOF) test for a uniform distribution.

• The uniform goodness-of-fit test is a special case of the multinomial in which every value has the same chance of occurrence.

• The chi-square test for a uniform distribution compares all c groups simultaneously.

• The hypotheses are:

H0: p1 = p2 = … = pc = 1/c

H1: Not all pj are equal

• The test can be performed on data that are already tabulated into groups.

• Calculate the expected frequency ej = n/c for each cell.

• The degrees of freedom are d.f. = c – 1, since no parameters are estimated for the uniform distribution.

• Obtain the critical value χ²α from Appendix E for the desired level of significance α.

• The p-value can be obtained from Excel.

• Reject H0 if p-value ≤ α.
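Where Excel is not at hand, the right-tail p-value can be computed with the Python standard library alone. This is a sketch under stated assumptions: `chi2_sf` is our own helper (not from the text) that evaluates the upper regularized incomplete gamma function Q(df/2, x/2) using the classic series / continued-fraction split; in practice `scipy.stats.chi2.sf` or Excel's `CHISQ.DIST.RT` would normally be used. The six category counts are made up.

```python
import math

def chi2_sf(x, df, tol=1e-14, max_iter=500):
    """Right-tail probability P(chi-square with df d.f. > x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_pref = a * math.log(z) - z - math.lgamma(a)   # log of z^a e^-z / Gamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); the p-value is 1 - P.
        term = total = 1.0 / a
        for k in range(1, max_iter):
            term *= z / (a + k)
            total += term
            if abs(term) < tol * abs(total):
                break
        return max(0.0, 1.0 - total * math.exp(log_pref))
    # Continued fraction (modified Lentz method) for the upper regularized gamma Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < tol:
            break
    return math.exp(log_pref) * h

# Uniform GOF on made-up counts for c = 6 categories (n = 60, so e_j = 10)
observed = [8, 12, 10, 14, 6, 10]
e = sum(observed) / len(observed)
stat = sum((f - e) ** 2 / e for f in observed)
p_value = chi2_sf(stat, df=len(observed) - 1)
```

Here the statistic is 4.0 with d.f. = 5, giving a p-value of about .55, so H0 would not be rejected at any usual α. As a sanity check, `chi2_sf(5.991, 2)` returns about .05, matching the familiar table value.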


Uniform GOF Test: Raw Data

• First form c bins of equal width and create a frequency distribution.

• Calculate the observed frequency fj for each bin.

• Define ej = n/c.

• Perform the chi-square calculations.

• The degrees of freedom are d.f. = c – 1 since there are no parameters for the uniform distribution.

• Obtain the critical value from Appendix E for a given significance level α and make the decision.

• Maximize the test’s power by defining the bin width as (xmax – xmin)/c. (As a result, the expected frequencies will be as large as possible.)
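The raw-data steps can be sketched as follows, assuming c equal-width bins that span the observed data range, with the maximum value placed in the last bin. The function name `uniform_gof_raw` and the data are hypothetical.

```python
def uniform_gof_raw(data, c):
    """Bin raw data into c equal-width bins; return (f, e, stat, d.f.)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / c                         # equal bin width over the data range
    f = [0] * c
    for x in data:
        j = min(int((x - lo) / width), c - 1)     # the maximum value goes in the last bin
        f[j] += 1
    e = len(data) / c                             # e_j = n / c
    stat = sum((fj - e) ** 2 / e for fj in f)
    return f, e, stat, c - 1                      # d.f. = c - 1

data = list(range(1, 21)) + [2, 3, 4, 19]         # made-up sample, n = 24
f, e, stat, df = uniform_gof_raw(data, c=4)
```

The observed counts come out as 8, 5, 5, 6 against ej = 6, giving a statistic of 1.0 with d.f. = 3.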


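• Calculate the mean and standard deviation of the uniform distribution on [a, b] as:

$$\mu = \frac{a + b}{2}, \qquad \sigma = \sqrt{\frac{(b - a)^2}{12}}$$

(for a discrete uniform on the integers a, …, b, use σ = √{[(b – a + 1)² – 1]/12}).

• If the data are not skewed and the sample size is large (n > 30), then the sample mean is approximately normally distributed.

• So, test the hypothesized uniform mean using

$$z_{calc} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$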
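A sketch of the z test for the uniform mean, assuming the limits a and b are known in advance; the `discrete` flag switches to the discrete-uniform standard deviation. The function name and data are illustrative only.

```python
import math
from statistics import mean

def uniform_mean_z(data, a, b, discrete=False):
    """z statistic comparing the sample mean with the uniform mean (a + b) / 2."""
    mu = (a + b) / 2
    if discrete:
        sigma = math.sqrt(((b - a + 1) ** 2 - 1) / 12)   # discrete uniform on a..b
    else:
        sigma = (b - a) / math.sqrt(12)                  # continuous uniform on [a, b]
    n = len(data)
    return (mean(data) - mu) / (sigma / math.sqrt(n))

z = uniform_mean_z([6.5, 7.5] * 24, a=0, b=12)   # made-up data, n = 48
```

With a = 0 and b = 12, μ = 6 and σ/√n = 0.5, so a sample mean of 7 gives z = 2.0, which would be compared with the usual normal critical values.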

### 15.4 Poisson Goodness-of-Fit Test

Poisson Data-Generating Situations

LO15-5: Explain the GOF test for a Poisson distribution.

• In a Poisson distribution model, X represents the number of events per unit of time or space.

• X is a discrete nonnegative integer (X = 0, 1, 2, …).

• Event arrivals must be independent of each other.

• It is sometimes called a model of rare events because X typically has a small mean.

Poisson Goodness-of-Fit Test

• The mean λ is the only parameter.

• If λ is unknown, it must be estimated from the sample.

• Use the estimated λ to find the Poisson probability P(X) for each value of X.

• Compute the expected frequencies.

• Perform the chi-square calculations.

• Make the decision.

• You may need to combine classes until expected frequencies become large enough for the test (at least until ej > 2).
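Combining classes can be automated. This sketch assumes classes are merged working in from the upper tail (where Poisson expected frequencies are usually smallest) until every merged class has ej > 2, the threshold on the slide; any small leftover at the low end is folded into its neighbor. The function name and counts are hypothetical.

```python
def combine_small_classes(observed, expected, threshold=2.0):
    """Merge adjacent classes, working in from the upper tail, until every e_j > threshold."""
    f_out, e_out = [], []
    f_acc, e_acc = 0, 0.0
    for f, e in zip(reversed(observed), reversed(expected)):
        f_acc += f
        e_acc += e
        if e_acc > threshold:          # this merged class is large enough
            f_out.append(f_acc)
            e_out.append(e_acc)
            f_acc, e_acc = 0, 0.0
    if f_acc or e_acc:                 # small leftover classes at the low end
        if f_out:
            f_out[-1] += f_acc         # fold them into the adjacent merged class
            e_out[-1] += e_acc
        else:
            f_out.append(f_acc)
            e_out.append(e_acc)
    return f_out[::-1], e_out[::-1]

# Made-up counts: the last class (e = 1.5) is merged with its neighbor
f, e = combine_small_classes([12, 15, 9, 3, 1], [11.0, 14.0, 9.5, 4.0, 1.5])
```

After merging, c shrinks from 5 to 4, and the degrees of freedom must be based on the reduced number of classes.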


Poisson GOF Test: Tabulated Data

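• Calculate the sample mean as:

$$\hat{\lambda} = \frac{1}{n}\sum_{j=1}^{c} x_j f_j$$

where xj is the value of X for class j and fj is its observed frequency.

• Using this estimated mean, calculate the Poisson probabilities either by using Excel or by using the Poisson formula

$$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

• For c classes with m = 1 parameter estimated, the degrees of freedom are d.f. = c – m – 1 = c – 2.

• Obtain the critical value for a given α from Appendix E.

• Make the decision.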
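A sketch of the tabulated-data procedure, assuming the classes are X = 0, 1, …, k with the last class treated as "k or more" (so its probability is 1 minus the others). Using k as that class's value in the sample mean is an approximation. The function name and frequencies are hypothetical.

```python
import math

def poisson_gof_tabulated(values, freqs):
    """Estimate lambda, then return (lambda, expected, chi-square, d.f.)."""
    if list(values) != list(range(len(values))):
        raise ValueError("values must be 0, 1, ..., k")
    n = sum(freqs)
    lam = sum(x * f for x, f in zip(values, freqs)) / n              # sample mean
    probs = [lam ** x * math.exp(-lam) / math.factorial(x) for x in values[:-1]]
    probs.append(1.0 - sum(probs))                                   # P(X >= k)
    expected = [n * p for p in probs]
    stat = sum((f - e) ** 2 / e for f, e in zip(freqs, expected))
    df = len(values) - 1 - 1                                         # c - m - 1, m = 1
    return lam, expected, stat, df

lam, expected, stat, df = poisson_gof_tabulated([0, 1, 2, 3], [30, 40, 20, 10])
```

For these made-up frequencies, λ̂ = 1.1, the statistic is about 0.64 with d.f. = 2, well below the .05 critical value of 5.991.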

### 15.5 Normal Chi-Square Goodness-of-Fit Test

Normal Data-Generating Situations

LO15-6: Use computer software to perform a chi-square GOF test for normality.

• Two parameters, the mean μ and the standard deviation σ, fully describe the normal distribution.

• Unless μ and σ are known a priori, they must be estimated from a sample (by x̄ and s).

• Using these statistics, the chi-square goodness-of-fit test can be used.

Method 1: Standardizing the Data

• Transform the sample observations x1, x2, …, xn into standardized values:

$$z_i = \frac{x_i - \bar{x}}{s}$$
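Standardizing with the sample mean and sample standard deviation is a one-liner with Python's `statistics` module; the function name and data are illustrative.

```python
from statistics import mean, stdev

def standardize(data):
    """Return z_i = (x_i - xbar) / s using the sample mean and sample s.d."""
    xbar, s = mean(data), stdev(data)
    return [(x - xbar) / s for x in data]

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # made-up data
```

The standardized values have mean 0 and sample standard deviation 1, so they can be binned against standard normal cut-points.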


Method 2: Equal Bin Widths

• To obtain equal-width bins, divide the exact data range into c groups of equal width.

• Step 1: Count the sample observations in each bin to get observed frequencies fj.

• Step 2: Convert the bin limits into standardized z-values by using the formula

$$z = \frac{x - \bar{x}}{s}$$

where x is a bin limit.

• Step 3: Find the normal area within each bin assuming a normal distribution.

• Step 4: Find expected frequencies ej by multiplying each normal area by the sample size n.

• Classes may need to be collapsed from the ends inward to enlarge expected frequencies.
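The four steps of Method 2 can be sketched in Python with `statistics.NormalDist`. Two assumptions are ours: the c bins split the exact data range into equal widths, and the first and last bins are treated as open-ended so that the normal areas sum to 1. The function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_gof_equal_width(data, c):
    """Chi-square normal GOF with c equal-width bins spanning the data range."""
    n, xbar, s = len(data), mean(data), stdev(data)
    lo, hi = min(data), max(data)
    width = (hi - lo) / c
    # Step 1: observed frequencies f_j (the maximum value goes in the last bin)
    f = [0] * c
    for x in data:
        f[min(int((x - lo) / width), c - 1)] += 1
    # Step 2: standardize the c - 1 interior bin limits
    limits = [lo + j * width for j in range(1, c)]
    z = [(x - xbar) / s for x in limits]
    # Step 3: normal area in each bin; the end bins are treated as open-ended
    # (an assumption) so that the areas sum to 1
    cdf = [0.0] + [NormalDist().cdf(zj) for zj in z] + [1.0]
    areas = [cdf[j + 1] - cdf[j] for j in range(c)]
    # Step 4: expected frequencies e_j = n * area
    e = [n * a for a in areas]
    stat = sum((fj - ej) ** 2 / ej for fj, ej in zip(f, e))
    df = c - 2 - 1   # d.f. = c - m - 1 with m = 2 (mean and s.d. estimated)
    return f, e, stat, df

sample = [12.1, 14.8, 15.5, 16.0, 17.2, 18.4, 18.9, 19.5,
          20.1, 20.7, 21.3, 22.8, 23.4, 25.0, 27.6]   # made-up data
f, e, stat, df = normal_gof_equal_width(sample, c=4)
```

With a sample this small, several ej would fall below the rules of thumb, so in practice the end classes would be collapsed inward before drawing a conclusion.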


Method 3: Equal Expected Frequencies

• Define histogram bins in such a way that an equal number of observations would be expected within each bin under the null hypothesis.

• Define bin limits so that ej = n/c.

• A normal area of 1/c in each of the c bins is desired.

• The first and last classes must be open-ended for a normal distribution, so to define c bins, we need c – 1 cut-points.

• The upper limit of bin j can be found directly by using Excel.

• Alternatively, find zj for bin j (the standard normal value with a left-tail area of j/c) using Excel, and then calculate the upper limit for bin j as

$$\text{Upper limit of bin } j = \bar{x} + z_j\, s$$

• Once the bins are defined, count the observations fj within each bin and compare them with the expected frequencies ej = n/c.
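Method 3 can be sketched with `NormalDist.inv_cdf`, which returns the value with a given left-tail area under a normal fitted with x̄ and s. The function name and data are hypothetical; an observation exactly at a cut-point is counted in the lower bin.

```python
from statistics import NormalDist, mean, stdev

def normal_equal_expected_bins(data, c):
    """Cut-points giving a fitted-normal area of 1/c per bin, plus observed counts."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    cuts = [fitted.inv_cdf(j / c) for j in range(1, c)]   # c - 1 cut-points
    f = [0] * c
    for x in data:
        f[sum(x > cut for cut in cuts)] += 1              # bin index = cut-points below x
    e = n / c                                             # e_j = n / c in every bin
    return cuts, f, e

sample = [12.1, 14.8, 15.5, 16.0, 17.2, 18.4, 18.9, 19.5,
          20.1, 20.7, 21.3, 22.8, 23.4, 25.0, 27.6]      # made-up data
cuts, f, e = normal_equal_expected_bins(sample, c=5)
```

Each cut-point equals x̄ + zj·s, where zj has left-tail area j/c, so this matches the "alternatively" route above.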

### 15.6 ECDF Tests

LO15-7: State advantages of ECDF tests as compared to chi-square GOF tests.

• There are many alternatives to the chi-square test based on the Empirical Cumulative Distribution Function (ECDF).

• The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequencies of the n data values.

• The K-S test is not recommended for grouped data.

• The K-S test assumes that no parameters are estimated.

• If parameters are estimated, use a Lilliefors test.

• Both of these tests are done by computer.
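The K-S statistic D can be computed directly from ungrouped data. Because the K-S test assumes no parameters are estimated, this sketch takes a fully specified hypothesized distribution (here a standard normal, chosen for illustration); it computes only D, not Lilliefors critical values. The function name and data are hypothetical.

```python
from statistics import NormalDist

def ks_statistic(data, dist):
    """Largest absolute gap between the ECDF of the data and the hypothesized CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        F = dist.cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides of the step.
        d = max(d, i / n - F, F - (i - 1) / n)
    return d

D = ks_statistic([-1.0, 0.0, 1.0], NormalDist(0, 1))   # made-up data
```

For these three points D ≈ 0.175; the value would then be compared with a K-S critical value for n = 3.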

• The Anderson-Darling (A-D) test is widely used to test for non-normality because of its power.

• The A-D test is based on a probability plot.

• When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line.
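The straight-line idea behind a normal probability plot can be quantified by correlating the sorted data with standard normal quantiles. This is a sketch of the probability-plot idea only, not the A-D statistic itself; the plotting positions (i – 0.5)/n are a common choice that we assume here, and the function names and data are hypothetical.

```python
import math
from statistics import NormalDist

def normal_probability_plot(data):
    """Pair sorted data with standard normal quantiles at positions (i - 0.5)/n."""
    xs = sorted(data)
    n = len(xs)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return q, xs

def straightness(q, xs):
    """Pearson correlation of the plot points; values near 1 mean a nearly straight line."""
    n = len(xs)
    mq, mx = sum(q) / n, sum(xs) / n
    sqx = sum((a - mq) * (b - mx) for a, b in zip(q, xs))
    sqq = sum((a - mq) ** 2 for a in q)
    sxx = sum((b - mx) ** 2 for b in xs)
    return sqx / math.sqrt(sqq * sxx)

skewed = [1, 1, 1, 1, 2, 2, 3, 5, 9, 20]     # made-up right-skewed data
r_skewed = straightness(*normal_probability_plot(skewed))
```

For the skewed sample the correlation is only about 0.8, reflecting a visibly curved plot; data lying exactly on a normal quantile line would give r = 1.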