- The goal of χ² goodness-of-fit tests is to test whether the data come from a
certain distribution.

- There are various situations to which these tests apply.
- The first situation we will explore is when we observe count data in
k different categories.

- The aim is to test the null hypothesis that the probabilities of the k
categories are $p_1, p_2, \ldots, p_k$.

- We distinguish between two cases.

STA261 week 12

Chi-Squared Test - Case 1

- The null hypothesis completely specifies the probabilities of each of
the k categories.

- For each category, we calculate the expected count $E_i = n p_i$.
- The test statistic and its distribution are…
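The slide's formula is elided. For reference, the standard Pearson form of the statistic is shown below, where $O_i$ is the observed count in category $i$. Under $H_0$ it is approximately chi-squared with $k - 1$ degrees of freedom when $n$ is large (a common rule of thumb is that every $E_i$ is at least 5), and $H_0$ is rejected for large values.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n p_i,
\qquad \chi^2 \;\overset{\text{approx.}}{\sim}\; \chi^2_{k-1} \text{ under } H_0
```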


Example

- The statistics department at U of T offers introductory courses for
students from other disciplines. The department believes that 40% of
the students are math majors, 30% are computer science majors, 20% are
biology majors and 10% are chemistry majors. A random sample of 120 students
revealed 52, 38, 21, and 9 students from these four majors, respectively.
Do these data support the department's claim?
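A minimal sketch of the Case 1 computation for this example, using only the Python standard library. The function name `pearson_chi2` is hypothetical, and the critical value 7.815 (the 95th percentile of $\chi^2_3$) is taken from a standard chi-squared table rather than computed. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` returns the same statistic together with a p-value.

```python
# Chi-squared goodness-of-fit test, Case 1: H0 fully specifies p_1, ..., p_k.

def pearson_chi2(observed, probs):
    """Return (statistic, expected counts) for H0: category probabilities = probs."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

observed = [52, 38, 21, 9]        # math, computer science, biology, chemistry
probs = [0.40, 0.30, 0.20, 0.10]  # department's claim under H0

stat, expected = pearson_chi2(observed, probs)
df = len(observed) - 1            # k - 1 = 3 degrees of freedom
critical = 7.815                  # 95th percentile of chi^2 with 3 df (table value)

print("expected:", expected)
print(f"chi2 = {stat:.3f}, df = {df}")
print("reject H0" if stat > critical else "do not reject H0")
```

Here the expected counts are 48, 36, 24 and 12, and the statistic (about 1.569) is well below the 5% critical value, so at that level the data do not contradict the department's claim.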


Chi-Squared Test - Case 2

- The null hypothesis does not fully specify the probabilities.
- In this case the probabilities of the different categories may be functions of other parameters.
- First use the sample data to estimate r unknown parameters.
- Then use the estimated parameters to estimate the k probabilities.
- For each category, calculate the estimated expected count.
- The test statistic is…
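The formula is elided on the slide. In the usual textbook form, the statistic has the same shape as in Case 1 but uses the estimated expected counts $\hat{E}_i = n \hat{p}_i$, and one degree of freedom is lost for each of the $r$ estimated parameters. Strictly, the $\chi^2_{k-1-r}$ approximation holds when the parameters are estimated from the grouped counts (for example, by maximum likelihood on the cell counts).

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - \hat{E}_i)^2}{\hat{E}_i}, \qquad \hat{E}_i = n \hat{p}_i,
\qquad \chi^2 \;\overset{\text{approx.}}{\sim}\; \chi^2_{k-1-r} \text{ under } H_0
```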


Example

- A farmer believes that the number of eggs a chicken lays per
day has a Poisson(λ) distribution. He observed the following data….
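The farmer's data are elided, so the sketch below is a hypothetical function rather than a worked answer. It assumes `counts[j]` is the number of days on which exactly j eggs were laid, for j up to the largest observed value. It follows the Case 2 steps: estimate the single parameter λ by its maximum likelihood estimate (the sample mean, so r = 1), turn it into estimated cell probabilities, compute estimated expected counts, and pool the upper tail until its expected count is at least 5.

```python
import math

def poisson_gof(counts, min_expected=5.0):
    """Chi-squared GOF test of a Poisson(lambda) model (Case 2 with r = 1).

    Hypothetical input layout: counts[j] = number of days with exactly j eggs,
    j = 0, ..., m, where m is the largest observed value.
    Returns (statistic, degrees of freedom, lambda_hat).
    """
    n = sum(counts)
    # Step 1: estimate the unknown parameter; the MLE of lambda is the sample mean.
    lam = sum(j * c for j, c in enumerate(counts)) / n
    # Step 2: estimated probabilities for 0, ..., m-1, plus the tail cell "m or more".
    m = len(counts) - 1
    probs = [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(m)]
    probs.append(1.0 - sum(probs))  # P(X >= m); no observations exceed m
    # Step 3: estimated expected counts.
    expected = [n * p for p in probs]
    obs = list(counts)
    # Pool the last cell into its neighbour while its expected count is too small.
    # (Only the upper tail is pooled in this sketch.)
    while len(expected) > 2 and expected[-1] < min_expected:
        last_e = expected.pop()
        last_o = obs.pop()
        expected[-1] += last_e
        obs[-1] += last_o
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    df = len(obs) - 1 - 1  # k - 1 - r, with r = 1 estimated parameter
    return stat, df, lam
```

The returned statistic would then be compared with a $\chi^2_{k-1-r}$ critical value, where k is the number of cells left after pooling.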


Remark

- In many cases we observe data that are not categorized and want
to test whether they come from a certain distribution.

- If the distribution we are testing is discrete, the values of the variable
are the categories.

- However, if the variable can take infinitely many values, the values
must be grouped (for example, by pooling the upper tail into one category)
so that the expected count in each category is at least 5.

- If the distribution we are testing is continuous, we need to group the
measurements of the random variable of interest into k intervals.
Very often the choice of intervals (cells) is arbitrary.

- χ² tests have low power when they are applied to continuous data; in that case, other tests can be used.
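A minimal sketch of grouping continuous data into k intervals and computing the expected counts from a hypothesized CDF. The helper names `normal_cdf` and `binned_chi2` are hypothetical; the interval edges are supplied by the caller, reflecting the fact that the choice of cells is often arbitrary. The sketch prints a warning when any expected count falls below 5.

```python
import bisect
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binned_chi2(data, edges, cdf):
    """Chi-squared GOF statistic for continuous data grouped into len(edges) + 1 cells.

    Cells are (-inf, e_1], (e_1, e_2], ..., (e_{k-1}, inf); `edges` must be sorted.
    Returns (statistic, observed counts, expected counts).
    """
    n = len(data)
    k = len(edges) + 1
    observed = [0] * k
    for x in data:
        # bisect_left sends x == e_j to the cell ending at e_j (right-closed cells).
        observed[bisect.bisect_left(edges, x)] += 1
    # Cumulative probabilities at the cell boundaries, with F(-inf) = 0 and F(inf) = 1.
    cum = [0.0] + [cdf(e) for e in edges] + [1.0]
    expected = [n * (cum[j + 1] - cum[j]) for j in range(k)]
    if min(expected) < 5:
        print("warning: some expected counts are below 5; consider merging cells")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, observed, expected
```

For example, `binned_chi2(sample, [-1.0, 0.0, 1.0], normal_cdf)` would test a fully specified N(0, 1) model with four cells, to be compared with $\chi^2_{3}$; if parameters were estimated first, the degrees of freedom would drop by r as in Case 2.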


Example


Kolmogorov-Smirnov Goodness-of-Fit Test

- The K-S test is also called the Kolmogorov-Smirnov D test.
- The K-S goodness-of-fit test tests whether the distribution of the data
differs significantly from a hypothesized distribution.

- For continuous data, it is usually a more powerful alternative to chi-square goodness-of-fit tests.
- The test statistic in the K-S test is based on the largest absolute
difference between the cumulative observed proportion and the
cumulative proportion expected on the basis of the hypothesized
distribution.
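A minimal sketch of the K-S statistic $D = \sup_x |F_n(x) - F_0(x)|$, where $F_n$ is the empirical CDF (cumulative observed proportion) and $F_0$ is the hypothesized continuous CDF. The function name `ks_statistic` is hypothetical. Because $F_n$ jumps from $(i-1)/n$ to $i/n$ at the i-th ordered observation, it is enough to check both sides of each jump. If SciPy is available, `scipy.stats.kstest(data, cdf)` computes the statistic and a p-value.

```python
def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D for a sample against a continuous hypothesized CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        # Largest gap just after the jump (i/n - F0) and just before it (F0 - (i-1)/n).
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d
```

For large n, a commonly quoted approximate 5% critical value is $1.36/\sqrt{n}$; for small samples, exact tables (or a library routine) should be used instead.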


Contingency Tables

- The goal is to test if two categorical variables are independent.
- The row variable has r categories while the column variable has c categories.
- The data are the counts of observations in the r × c table…
- The null hypothesis states that the row variable and the column
variable are independent. The alternative states that the variables are
dependent.

- To conduct the test, we calculate the expected count for each cell…
- The test statistic and its distribution are….
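The slide's formulas are elided. A minimal sketch of the standard version, assuming the usual expected count $E_{ij} = (\text{row } i \text{ total})(\text{column } j \text{ total})/n$ and the Pearson statistic $\sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$, which is approximately $\chi^2_{(r-1)(c-1)}$ under independence. The function name `independence_chi2` is hypothetical. SciPy's `scipy.stats.chi2_contingency` performs the same test, but by default it applies Yates' continuity correction when there is 1 degree of freedom, so its statistic can differ for 2 × 2 tables.

```python
def independence_chi2(table):
    """Chi-squared test of independence for an r x c table of observed counts.

    Returns (statistic, degrees of freedom).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_tot[i] * col_tot[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return stat, df
```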


Example


