# Goodness of Fit Tests

Goodness of Fit Tests. The goal of χ2 goodness of fit tests is to test whether the data come from a certain distribution. There are various situations to which these tests apply. The first situation we will explore is when we observe count data in k different categories.

## PowerPoint Slideshow about 'Goodness of Fit Tests' - clarke-mcbride

• The goal of χ2 goodness of fit tests is to test whether the data come from a certain distribution.

• There are various situations to which these tests apply.

• The first situation we will explore is when we observe count data in k different categories.

• The aim is to test the null hypothesis that the probabilities of the k categories are p1, p2, …, pk.

• We distinguish between two cases.

STA261 week 12

Chi-Squared Test - Case 1

• The null hypothesis completely specifies the probabilities of each of the k categories.

• For each category we calculate the expected count Ei = npi.

• The test statistic and its distribution are…
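The formula on the original slide did not survive extraction. For reference, the standard Pearson statistic for this case, which is presumably what the slide showed, can be written as follows. The symbol O_i (the observed count in category i) and the name X^2 are notation assumed here, not taken from the slide.

```latex
X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n p_i,
\qquad X^2 \overset{\text{approx.}}{\sim} \chi^2_{k-1} \quad \text{under } H_0 .
```

Large values of X^2 indicate a poor fit, so the null hypothesis is rejected when X^2 exceeds the upper critical value of the χ2 distribution with k - 1 degrees of freedom.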

• The statistics department at U of T offers introductory courses for students from other disciplines. The department believes that 40% of the students are math majors, 30% are computer science, 20% biology and 10% chemistry. A random sample of 120 students revealed 52, 38, 21, and 9 from the four majors above. Do these data support the department's claim?
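The Case 1 calculation for this example can be sketched in a few lines of Python. This is a minimal illustration using only the numbers stated above. The 5% significance level is an assumption (the slide does not give one), and the critical value 7.815 is the tabulated upper 5% point of the χ2 distribution with 3 degrees of freedom.

```python
# Observed counts and hypothesized proportions from the example above.
observed = [52, 38, 21, 9]              # math, computer science, biology, chemistry
null_probs = [0.40, 0.30, 0.20, 0.10]   # the department's claimed proportions

n = sum(observed)                        # 120 students
expected = [n * p for p in null_probs]   # E_i = n * p_i

# Pearson statistic: sum over categories of (O_i - E_i)^2 / E_i
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # k - 1 degrees of freedom

# Assumption: test at the 5% level; 7.815 is the tabulated critical value for 3 df.
CRITICAL_5PCT_DF3 = 7.815
reject_null = chi_sq > CRITICAL_5PCT_DF3

print(f"expected counts = {expected}")
print(f"chi_sq = {chi_sq:.3f}, df = {df}, reject H0 at 5%: {reject_null}")
```

The expected counts are 48, 36, 24 and 12, and the statistic comes out at roughly 1.57 on 3 degrees of freedom, well below 7.815. At the 5% level the sample therefore gives no evidence against the department's claim.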

Chi-Squared Test - Case 2

• The null hypothesis does not fully specify the probabilities.

• In this case the probabilities of the different categories may be functions of other parameters.

• First use the sample data to estimate r unknown parameters.

• Then use the estimated parameters to estimate the k probabilities.

• For each category, calculate the estimated expected count.

• The test statistic is…
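The slide's formula is missing from the transcript. The standard form of the statistic when r parameters are estimated from the data is given below for reference. The hats mark estimated quantities, and O_i (the observed count in category i) is notation assumed here. The degrees-of-freedom result holds under the usual regularity conditions, with the parameters estimated by maximum likelihood.

```latex
X^2 = \sum_{i=1}^{k} \frac{(O_i - \hat{E}_i)^2}{\hat{E}_i}, \qquad \hat{E}_i = n\,\hat{p}_i,
\qquad X^2 \overset{\text{approx.}}{\sim} \chi^2_{k-1-r} \quad \text{under } H_0 .
```

Compared with Case 1, one degree of freedom is lost for each estimated parameter.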

• A farmer believes that the number of eggs a chicken lays per day has a Poisson(λ) distribution. He observed the following data…
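The farmer's data table is not part of this transcript, so the sketch below only shows how the Case 2 steps would apply to a Poisson model. The counts in it are hypothetical and invented purely for illustration. The function name `poisson_gof` and the choice of an open-ended last cell are also assumptions.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_gof(counts):
    """Chi-squared goodness-of-fit statistic for a Poisson model.

    counts[x] is the number of days with x eggs. The last cell is treated as
    "len(counts) - 1 or more" so that the cell probabilities sum to 1.
    Returns (lambda_hat, expected_counts, statistic, degrees_of_freedom).
    """
    n = sum(counts)
    # Step 1: estimate the r = 1 unknown parameter by the sample mean
    # (approximation: days in the open-ended last cell count as its lower bound).
    lam_hat = sum(x * c for x, c in enumerate(counts)) / n
    # Step 2: estimated cell probabilities; the last cell takes the remaining tail.
    probs = [poisson_pmf(x, lam_hat) for x in range(len(counts) - 1)]
    probs.append(1.0 - sum(probs))
    # Step 3: estimated expected counts.
    expected = [n * p for p in probs]
    # Step 4: Pearson statistic with k - 1 - r degrees of freedom (r = 1 here).
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    df = len(counts) - 1 - 1
    return lam_hat, expected, stat, df

# HYPOTHETICAL counts, invented for illustration only (not the farmer's data):
# number of days on which 0, 1, 2, and 3 or more eggs were observed.
hypothetical_counts = [20, 35, 30, 15]
lam_hat, expected, stat, df = poisson_gof(hypothetical_counts)
print(f"lambda_hat = {lam_hat:.2f}, df = {df}, statistic = {stat:.3f}")
```

The statistic would then be compared with a critical value of the χ2 distribution with `df` degrees of freedom, exactly as in Case 1.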

• In many cases we will observe data that are not categorized and we would want to test whether the data come from a certain distribution.

• If the distribution we are testing is discrete, the values of the variable will be the actual categories.

• However, if the variable can take infinitely many values, the grouping should be done so that the expected frequency in each category is at least 5.

• If the distribution we are testing is continuous, we need to group the measurements of the random variable of interest into k intervals. Very often the choice of cells is made arbitrarily.

• χ2 tests have low power when they are applied to continuous data, in which case we can use other tests.
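One simple way to enforce the "expected frequency at least 5" rule is to pool adjacent cells until each pooled cell is large enough. The helper below is a sketch of that idea, not a prescribed method. The name `merge_small_cells`, the left-to-right pooling order, and the example counts are all assumptions made for illustration.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Pool adjacent cells, left to right, until every pooled cell has an
    expected count of at least min_expected. A leftover tail that is still
    too small is folded into the last pooled cell."""
    merged_obs, merged_exp = [], []
    acc_obs, acc_exp, pending = 0, 0.0, False
    for o, e in zip(observed, expected):
        acc_obs += o
        acc_exp += e
        pending = True
        if acc_exp >= min_expected:
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
            acc_obs, acc_exp, pending = 0, 0.0, False
    if pending:
        if merged_exp:
            # Fold the undersized tail into the previous pooled cell.
            merged_obs[-1] += acc_obs
            merged_exp[-1] += acc_exp
        else:
            # Everything together is still below the threshold: keep one cell.
            merged_obs.append(acc_obs)
            merged_exp.append(acc_exp)
    return merged_obs, merged_exp

# HYPOTHETICAL observed and expected counts, invented for illustration only.
hypothetical_observed = [18, 30, 27, 14, 7, 3, 1]
hypothetical_expected = [16.5, 29.7, 26.8, 16.1, 7.2, 2.6, 1.1]
merged_obs, merged_exp = merge_small_cells(hypothetical_observed, hypothetical_expected)
print(merged_obs)
print([round(e, 1) for e in merged_exp])
```

In this illustration the last three cells are pooled into one, leaving five cells. The degrees of freedom of the test are then based on the number of cells after pooling, not on the original number.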

• The K-S test is also called the Kolmogorov-Smirnov D test.

• The K-S goodness-of-fit test tests whether the distribution of the observed data differs significantly from a hypothesized distribution.

• For continuous data, it is a more powerful alternative to chi-square goodness-of-fit tests.

• The test statistic in the K-S test is based on the largest absolute difference between the cumulative observed proportion and the cumulative proportion expected on the basis of the hypothesized distribution.
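The statistic described above, D = sup |F_n(x) - F_0(x)|, can be computed directly from the sorted sample. The following is a minimal sketch under stated assumptions: the function names are made up for this example, the hypothesized distribution is taken to be a fully specified standard normal, and the sample values are hypothetical.

```python
import math

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F_0(x)|,
    where F_n is the empirical CDF and cdf is the hypothesized CDF F_0."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f0 = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, (i + 1) / n - f0, f0 - i / n)
    return d

def standard_normal_cdf(x):
    """CDF of N(0, 1), written with the error function from the math module."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# HYPOTHETICAL sample, invented for illustration only.
hypothetical_sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
d = ks_statistic(hypothetical_sample, standard_normal_cdf)
print(f"D = {d:.3f}")
```

The observed D would then be compared with a tabulated critical value for the sample size, and the hypothesized distribution is rejected when D is too large.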

• The goal is to test if two categorical variables are independent.

• The row variable has r categories while the column variable has c categories.

• The data are the counts of observations in the r×c table…

• The null hypothesis states that the row variable and the column variable are independent. The alternative states that the variables are dependent.

• To conduct the test, we calculate the expected count for each cell…

• The test statistic and its distribution are…
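The formulas on these slides are missing from the transcript. The sketch below uses the standard Pearson form of the test: the expected count for a cell is (row total × column total) / n, and the statistic is referred to a χ2 distribution with (r - 1)(c - 1) degrees of freedom. The function name and the 2×2 table are hypothetical and serve only to illustrate the calculation.

```python
def chi_squared_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts.

    table is a list of r rows, each a list of c counts.
    Returns (expected_counts, statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence, E_ij = (row total i) * (column total j) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (O_ij - E_ij)^2 / E_ij over all cells.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df

# HYPOTHETICAL 2 x 2 table, invented for illustration only.
hypothetical_table = [[30, 20],
                      [20, 30]]
expected, stat, df = chi_squared_independence(hypothetical_table)
print(f"expected = {expected}")
print(f"statistic = {stat:.3f}, df = {df}")
```

For this made-up table every expected count is 25 and the statistic is 4.0 on 1 degree of freedom. That exceeds the tabulated 5% critical value of 3.841, so independence would be rejected at that level.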
