Goodness of Fit Tests

Goodness of Fit Tests • The goal of χ2 goodness of fit tests is to test whether the data comes from a certain distribution. • There are various situations to which these tests apply. • The first situation we will explore is when we observe count data in k different categories. • The aim is to test the null hypothesis that the probabilities of the k categories are p1, p2,…,pk. • We distinguish between two cases. STA261 week 12

Chi-Squared Test - Case 1 • The null hypothesis completely specifies the probabilities of each of the k categories. • For each category we calculate the expected count Ei = npi. • The test statistic and its distribution are…
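The slide's formula did not survive extraction. For reference, the standard Pearson statistic for this case is usually written as below. The symbol O_i for the observed count in category i is an assumption, since the source only names E_i, and the χ2 distribution is a large-sample approximation under the null hypothesis.

```latex
X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n p_i,
\qquad X^2 \;\overset{\text{approx.}}{\sim}\; \chi^2_{k-1} \quad \text{under } H_0 .
```

Large values of X^2 indicate a poor fit, so the null hypothesis is rejected when X^2 exceeds the upper critical value of the χ2 distribution with k - 1 degrees of freedom.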

Example • The statistics department at U of T offers introductory courses for students from other disciplines. The department believes that 40% of the students are math majors, 30% are computer science, 20% biology and 10% chemistry. A random sample of 120 students revealed 52, 38, 21, and 9 from the four majors above. Does this data support the department's claim?
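A minimal Python sketch of the Case 1 calculation for this example, using only the standard library. The 5% significance level is an assumption (the slide does not state one), and the p-value uses the closed-form survival function of the χ2 distribution that holds specifically for 3 degrees of freedom.

```python
import math

# Observed counts for math, computer science, biology, chemistry.
observed = [52, 38, 21, 9]
# Probabilities claimed by the department (the null hypothesis).
null_probs = [0.40, 0.30, 0.20, 0.10]

n = sum(observed)                          # 120 students
expected = [n * p for p in null_probs]     # E_i = n * p_i

# Pearson statistic: sum of (O_i - E_i)^2 / E_i over the categories.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                     # k - 1 = 3

# Assumed 5% level: upper 5% point of chi-square with 3 df.
critical_value = 7.815
reject = chi2 > critical_value

# Survival function of chi-square with exactly 3 df (closed form).
p_value = math.erfc(math.sqrt(chi2 / 2)) + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.4f}, df = {df}, p-value = {p_value:.3f}, reject H0: {reject}")
```

The statistic is about 1.57, well below 7.815, so at the assumed 5% level the sample gives no evidence against the department's claim.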

Chi-Squared Test - Case 2 • The null hypothesis does not fully specify the probabilities. • In this case the probabilities of the different categories may be functions of other parameters. • First use the sample data to estimate r unknown parameters. • Then use the estimated parameters to estimate the k probabilities. • For each category, calculate the estimated expected count. • The test statistic is…
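The slide's formula is missing from the extracted text. The usual form of the statistic in this case is sketched below. The symbols O_i (observed count), θ (the vector of r unknown parameters) and the hats are notation assumed here, and the stated degrees of freedom rely on the parameters being estimated by maximum likelihood from the category counts.

```latex
\hat{E}_i = n\, p_i(\hat{\theta}),
\qquad
X^2 = \sum_{i=1}^{k} \frac{(O_i - \hat{E}_i)^2}{\hat{E}_i}
\;\overset{\text{approx.}}{\sim}\; \chi^2_{k-1-r} \quad \text{under } H_0 .
```

Compared with Case 1, one degree of freedom is lost for each of the r estimated parameters.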

Example • A farmer believes that the number of eggs a chicken lays per day has a Poisson(λ) distribution. He observed the following data…
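The farmer's data did not survive extraction, so the sketch below only illustrates the Case 2 steps for a Poisson fit. The function name `poisson_gof_statistic`, the `max_category` cutoff and the sample values are all hypothetical and are not the data from the slide. It estimates λ by the sample mean of the raw observations, which is a common approximation to the grouped-data estimate.

```python
import math
from collections import Counter

def poisson_gof_statistic(samples, max_category):
    """Chi-square statistic for a Poisson fit (hypothetical helper).

    Categories are 0, 1, ..., max_category - 1 and a pooled
    'max_category or more' cell.
    """
    n = len(samples)
    lam = sum(samples) / n  # estimate of lambda (r = 1 parameter)

    # Observed counts, with large values pooled into the last cell.
    tally = Counter(min(x, max_category) for x in samples)
    observed = [tally.get(j, 0) for j in range(max_category + 1)]

    # Estimated Poisson probabilities; the last cell takes the tail mass.
    probs = [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(max_category)]
    probs.append(1.0 - sum(probs))
    expected = [n * p for p in probs]

    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - 1  # k - 1 - r with r = 1
    return lam, observed, expected, stat, df

# Made-up illustrative sample, NOT the farmer's data.
samples = [0] * 10 + [1] * 20 + [2] * 15 + [3] * 5
lam, observed, expected, stat, df = poisson_gof_statistic(samples, max_category=3)
print(f"lambda-hat = {lam:.2f}, statistic = {stat:.3f}, df = {df}")
```

The statistic would then be compared with a critical value of the χ2 distribution with the returned degrees of freedom.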

Remark • In many cases we will observe data that are not categorized and we would want to test if the data comes from a certain distribution. • If the distribution we are testing is discrete, the values of the variable will be the actual categories. • However, if the variable takes infinitely many possible values, the grouping should be done so that the expected frequency in each category is at least 5. • If the distribution we are testing is continuous, we need to group the measurements of the random variable of interest into k intervals. Very often the choice of cells is done arbitrarily. • χ2 tests have low power when they are applied to continuous data, in which case we can use other tests.
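One possible way to apply the "expected frequency at least 5" rule is to merge sparse cells in the upper tail, as in the sketch below. The helper `pool_right_tail` and the counts are hypothetical. The sketch only pools from the right, which suits a distribution such as the Poisson whose small expected counts sit in the upper tail.

```python
def pool_right_tail(observed, expected, min_expected=5.0):
    """Merge the last cells until the final expected count reaches min_expected."""
    obs = list(observed)
    exp = list(expected)
    while len(exp) > 1 and exp[-1] < min_expected:
        # Fold the last cell into its left neighbour.
        last_o = obs.pop()
        last_e = exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp

# Made-up illustrative counts.
observed_counts = [22, 13, 9, 3, 2, 1]
expected_counts = [20.0, 15.0, 8.0, 4.0, 2.0, 1.0]
obs, exp = pool_right_tail(observed_counts, expected_counts)
print(obs, exp)
```

After pooling, the number of categories k used for the degrees of freedom is the number of cells that remain.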

Example

Kolmogorov-Smirnov Goodness-of-Fit Test • The K-S test is also called the Kolmogorov-Smirnov D test. • The K-S goodness-of-fit test tests whether the distribution of the observed data differs significantly from a hypothesized distribution. • For continuous data, it is generally a more powerful alternative to chi-square goodness-of-fit tests. • The test statistic in the K-S test is based on the largest absolute difference between the cumulative observed proportion and the cumulative proportion expected on the basis of the hypothesized distribution.
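The statistic described in the last bullet can be computed as in the sketch below, which evaluates the largest gap between the empirical CDF and a fully specified hypothesized CDF. The function name `ks_statistic`, the Uniform(0, 1) hypothesis and the three data points are illustrative assumptions. The sketch returns only the statistic D, not a p-value.

```python
def ks_statistic(samples, cdf):
    """Largest absolute difference between the empirical CDF and `cdf`."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(x):
    """CDF of Uniform(0, 1), the hypothesized distribution in this illustration."""
    return min(max(x, 0.0), 1.0)

# Made-up illustrative data.
data = [0.7, 0.1, 0.4]
d = ks_statistic(data, uniform_cdf)
print(f"D = {d:.3f}")
```

Unlike the χ2 test, no grouping into cells is needed, because the comparison uses every observation directly.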

Contingency Tables • The goal is to test if two categorical variables are independent. • The row variable has r categories while the column variable has c categories. • The data is the count of observations in each cell of the r×c table… • The null hypothesis states that the row variable and the column variable are independent. The alternative states that the variables are dependent. • To conduct the test, we calculate the expected count for each cell… • The test statistic and its distribution are…
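The slide's formulas are missing from the extracted text. In the standard version of this test, the expected count in cell (i, j) is (row i total × column j total) / n, and the Pearson statistic is approximately χ2 with (r - 1)(c - 1) degrees of freedom under independence. A minimal sketch follows, where the function name `independence_statistic` and the 2×3 table are hypothetical.

```python
def independence_statistic(table):
    """Expected counts, Pearson statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Expected count under independence: row total * column total / n.
    expected = [[r_tot * c_tot / n for c_tot in col_totals] for r_tot in row_totals]

    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df

# Made-up illustrative 2 x 3 table of counts.
table = [
    [20, 30, 50],
    [30, 30, 40],
]
expected, stat, df = independence_statistic(table)
print(f"statistic = {stat:.3f}, df = {df}")
```

Independence is rejected when the statistic exceeds the chosen critical value of the χ2 distribution with the returned degrees of freedom.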

Example
