Hypothesis Testing

Hypothesis Testing: An Inference Procedure

We will study procedures for both the unknown population mean of a quantitative variable and the unknown population proportion of a qualitative variable. This section covers the proportion.

An Analogy

Here is a story about hypothesis tests. It is not really statistics, but it is an idea to consider. Say I have two decks of cards. One deck is a regular deck: spades, hearts, diamonds and clubs. The other deck is special: it has 4 sets of hearts. Now, I take out one of the decks, but you do not know which one. In the language of statistics, the null hypothesis will be that I took out the regular deck. You will stay with the null hypothesis unless an event occurs that has a really low probability. If a really low probability event occurs, you will reject the null hypothesis and go with the alternative hypothesis. So, I take out a deck and deal you five cards: a royal flush in hearts! You would reject the null hypothesis of a regular deck and go with the alternative that the deck I pulled out is the special one, because a royal flush in hearts has a very low probability when dealt from a regular deck.
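To put a number on "really low probability," here is a small Python sketch. It assumes the special deck means 52 cards made of four copies of each heart, so the 10, J, Q, K and A of hearts can each come from any of the four copies; that reading of the story is my assumption.

```python
from math import comb

# Number of distinct 5-card hands from a 52-card deck.
total_hands = comb(52, 5)  # 2,598,960

# Regular deck: exactly one hand is the 10, J, Q, K, A of hearts.
p_regular = 1 / total_hands

# Special deck (assumed: four copies of every heart). Each of the five
# ranks can come from any of its 4 copies, giving 4**5 royal-flush hands.
p_special = 4**5 / total_hands

print(f"Regular deck: {p_regular:.2e}")
print(f"Special deck: {p_special:.2e}")
print(f"Special deck makes the hand {p_special / p_regular:.0f} times as likely")
```

From a regular deck the hand has about a 1 in 2.6 million chance. From the special deck it is still unlikely, but 1,024 times as likely, which is why seeing it pushes you toward the alternative.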

So, in my deck of cards example I have introduced two ideas you may not have heard of before: the null hypothesis, which we typically abbreviate Ho:, and the alternative or research hypothesis, H1:. The tradition in statistics is to make the null hypothesis the focal point of our test. When we do the test we will either a) reject the null hypothesis and go with the alternative, or b) not reject the null hypothesis (which is a way of saying we stay with the null hypothesis). Later we will spend some time on the mechanics of the test, but first there are a few more ideas to consider. In the table that follows, the columns represent the truth in the world: we consider one column at a time and, while looking at a column, treat it as the real truth. The rows represent how we decide. This is a little theoretical, so let's get to the table!

                               Ho: is true      Ho: is false
Reject Ho: (and go with alt)   Type I error     Good job!
Do not reject Ho:              Good job!        Type II error

So, let's look at the first column. Here we say Ho: is true. In the first row, if we reject Ho: when it is true, that is bad because we have rejected the truth, and we say a Type I error has been made. In the second row we do not reject Ho: when it is true. That is good because we have not rejected the truth. Now, let's look at the second column. Here we say Ho: is false. In the first row, if we reject Ho: when it is false, that is good because we have rejected something that is not the truth. In the second row we do not reject Ho: when it is false. That is bad because we have failed to reject something that is NOT the truth. This is called making a Type II error.

Type I Error

A Type I error is a situation where you reject the null hypothesis, Ho, when it is true and should not be rejected. The probability of making a Type I error is called alpha (α) and is often referred to as the level of significance. Rejecting a true null hypothesis has a consequence, and we pick the value of alpha depending on how serious that consequence is. Traditional values of alpha are .01, .05 and .1.
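To see what alpha means in practice, here is a small simulation, a sketch under assumed numbers rather than part of the course materials. We pretend Ho is exactly true with p = .8, draw many samples of size n = 100, run the right-tailed z test at alpha = .05 that is developed later in this section, and count how often we wrongly reject Ho.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_one_error_rate(p0=0.8, n=100, alpha=0.05, trials=10_000, seed=1):
    """Estimate how often a right-tailed z test rejects Ho when Ho is true.

    Assumed setup for illustration: the true proportion equals p0 exactly.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = .05
    se = sqrt(p0 * (1 - p0) / n)              # standard error under Ho
    rejections = 0
    for _ in range(trials):
        # Draw one sample of size n from a population where Ho is true.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / se
        if z > z_crit:  # P hat landed in the rejection region: a Type I error
            rejections += 1
    return rejections / trials

print(type_one_error_rate())
```

The printed rate should be close to .05. It will not match .05 exactly, partly because of random variation and partly because with n = 100 P hat moves in steps of .01, so the rejection region cannot have an area of exactly .05.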

Type II Error

A Type II error is a situation where you do not reject the null hypothesis, Ho, when it is false and should be rejected. The probability of making a Type II error is called beta (β). In an introductory statistics class such as ours, we typically focus on the Type I error.

Background

There are times we would like to know the value of an unknown population proportion. But it is often too expensive and too time consuming to investigate the whole population, so a sample is taken. The method of confidence intervals is based on the idea that, in theory, a point estimate varies from sample to sample; so, from the one sample we do take, we build in that variability and are then a certain percent confident that our interval contains the unknown value. Hypothesis testing relies on some of the same ideas used in confidence intervals, but here there is at least a starting point for the unknown value. The starting point can come from past work or from a belief one has about a process. Note that we never know the truth for sure because we do not look at the whole population. We live in a probabilistic world!

As an example, let's say a daily newspaper is concerned about readers continuing to buy the paper. One particular area of concern is the coverage of local, state and national/international sports. The people who run the paper might think about how satisfied readers are: readers might be satisfied, not interested, or not satisfied with the sports coverage. Let's say that a population proportion satisfied of more than .8 is critical for keeping sales of the paper at an acceptable level. If the proportion is not that high, they will change the section. The null and alternative hypotheses might be stated as

Ho: p ≤ .8 (which would mean change the section)
H1: p > .8 (which would mean no change is needed)

Now, if the null hypothesis is really the truth and it is rejected (a Type I error), the business will think the proportion satisfied with the sports section is more than .8, and they probably will not change the section. But the population proportion satisfied is really .8 or less, and the business should change the sports section. They will likely start losing customers, because they take no action on the section when they should. If the null hypothesis is true and they do not reject it, they will make changes to the section to keep people buying the paper. They will take the needed action.

If the null hypothesis is not true and the business rejects it, they will go with the alternative hypothesis and can keep the sports section the way it is. If the null hypothesis is not true and the business does not reject it, a Type II error has occurred. In this case readers are already satisfied and the section does not need to change, but the business will change it anyway. This wastes effort and could lead to lower sales. So making either a Type I or a Type II error could cause problems for the business. Again, here we will pay attention to the Type I error. Let's recall that if we go out into the population of interest, collect a sample of size n, and calculate P hat, the distribution of all possible P hat values is approximately normal (when n is large enough), with mean equal to the population proportion p and standard error = sqrt[p(1 – p)/n]. In our newspaper example the value that was critical to the paper was .8, so in this context we work with a population proportion of .8.
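The standard error formula can be checked with a few lines of Python; this is a sketch, not something from the course materials. With p = .8 it shows how the spread of P hat shrinks as the sample size grows. The sample sizes are illustrative, and n = 100 matches the newspaper sample used later.

```python
from math import sqrt

def standard_error(p, n):
    """Standard error of P hat: sqrt(p(1 - p)/n)."""
    return sqrt(p * (1 - p) / n)

# p = .8 is the newspaper's critical value; the sample sizes are illustrative.
for n in (25, 100, 400):
    print(n, round(standard_error(0.8, n), 3))
```

Quadrupling n halves the standard error, because n sits under a square root.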

[Figure: sampling distribution of P hat centered at .8, with a critical P hat to the right of .8; the rejection region beyond the critical P hat has area alpha.]

Let's remember here that we do not know whether the population proportion is .8, but .8 is an important value for the business, so we use it. If .8 were the population proportion, taking a sample and calculating P hat could still give many different values. Any P hat from a sample that is less than .8 would lead us not to reject the null. But if the population proportion truly is .8, could a sample proportion be more than .8? Yes! So, as we move to the right of .8, at some point we put a dividing line: if P hat is to the right of the dividing line, we reject Ho and go with the alternative.

Notice that when P hat falls to the right of the dividing line and we reject Ho, we could be wrong, and then we would make a Type I error. So we control the probability of making a Type I error by making alpha low. Let's say that alpha = .05 is our definition of low. Then the Z associated with an upper-tail area of .05 is 1.645, and this Z corresponds to the critical P hat you see in the picture. In this context our hypothesis test follows a normal distribution, and we calculate a Z statistic called the standardized test statistic, or z test stat:

z test stat = (P hat – p under Ho)/standard error

where the standard error of P hat = sqrt[(p under Ho)(1 – p under Ho)/sample size]. Here, p under Ho is the hypothesized value of the proportion.
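The pieces of the right-tailed test can be written as small Python functions, a minimal sketch using the standard library's NormalDist in place of the z table. The function names are my own, and the sample size n = 100 in the usage lines is an assumption borrowed from the newspaper sample.

```python
from math import sqrt
from statistics import NormalDist

def z_test_stat(p_hat, p0, n):
    """Standardized test statistic: (P hat - p under Ho) / standard error under Ho."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

def upper_critical_z(alpha):
    """The z with an upper-tail area of alpha under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)

def critical_p_hat(p0, n, alpha):
    """The P hat at the dividing line of a right-tailed rejection region."""
    return p0 + upper_critical_z(alpha) * sqrt(p0 * (1 - p0) / n)

print(round(upper_critical_z(0.05), 3))           # 1.645
print(round(critical_p_hat(0.8, 100, 0.05), 4))   # dividing line for P hat
```

With n = 100 the dividing line sits at a P hat of about .8658, so any sample proportion above that falls in the rejection region.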

Say that in a sample of size 100, 87 people say they are satisfied with the sports page. Then P hat = 87/100 = .87, and the z test stat = (.87 - .8)/sqrt[(.8)(.2)/100] = .07/.04 = 1.75. Since 1.75 > 1.645, we reject the null! Could a sample proportion of .87 happen when the population proportion is .8? Yes, but a P hat of .87 or more lies in the low-probability rejection region, and when p = .8 a sample lands in that region only 5% of the time. So, in this case, a test run this way makes a Type I error only 5% of the time. You may have noticed that the alternative hypothesis has a > sign. In this context we say we have a one-tailed test.
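Here is the newspaper calculation in Python, a sketch that reproduces the hand computation above. The only change is that NormalDist supplies the critical z instead of the z table.

```python
from math import sqrt
from statistics import NormalDist

p0, n = 0.8, 100   # p under Ho and sample size from the example
p_hat = 87 / n     # 87 of 100 readers satisfied
alpha = 0.05

se = sqrt(p0 * (1 - p0) / n)            # .04
z = (p_hat - p0) / se                   # z test stat
z_crit = NormalDist().inv_cdf(1 - alpha)

print(f"z test stat = {z:.2f}, critical z = {z_crit:.3f}")
print("Reject Ho" if z > z_crit else "Do not reject Ho")
```

It prints a z test stat of 1.75 against a critical z of 1.645, so the decision is to reject Ho, matching the hand calculation.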

When the alternative hypothesis has an inequality, we have a one-tailed test, and we concentrate the whole probability of a Type I error in one tail. When H1: had a > sign, the tail was on the right side of the distribution. When H1: has a < sign, the tail will be on the left side of the distribution.

Two-Tailed Test

Sometimes the alternative hypothesis will have a not-equal sign, ≠. This means we need to look at both sides of the distribution for dividing lines to reject the null hypothesis. In the next example we will make alpha = .1. I will also show a picture of the Z distribution, which we have on pages 309 and 310 of the book.

[Figure: Z distribution for a two-tailed test with alpha/2 = .05 in each tail; the lower and upper critical z values mark off a reject region in each tail.]

Let's do a problem. Say a magazine claims that 25% of its readers are college students.

Ho: p = .25 (here .25 is p under Ho)
H1: p ≠ .25

With a level of significance of .1 and a two-tailed test, each tail will have .05. From the z table, the critical z's are -1.645 and 1.645.

Suppose a sample of 200 of the magazine's readers was asked whether they are college students, and the sample proportion that said yes was .21. The z statistic from the sample is (.21 - .25)/sqrt[(.25)(.75)/200] = -1.31. To reject the null we need a z stat less than -1.645 or greater than 1.645. Since -1.31 falls between the critical values, we cannot reject the null.
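The magazine problem under the critical value approach can be sketched in Python as follows. The numbers are the ones in the example; using NormalDist for the critical z's instead of the z table is my substitution.

```python
from math import sqrt
from statistics import NormalDist

p0, n, p_hat, alpha = 0.25, 200, 0.21, 0.10

se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
z_upper = NormalDist().inv_cdf(1 - alpha / 2)  # alpha/2 = .05 in the upper tail
z_lower = -z_upper                             # and .05 in the lower tail

reject = z < z_lower or z > z_upper
print(f"z = {z:.2f}, critical z's = {z_lower:.3f} and {z_upper:.3f}")
print("Reject Ho" if reject else "Do not reject Ho")
```

The z stat of about -1.31 sits between -1.645 and 1.645, so the sketch reports "Do not reject Ho".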

[Figure: Critical value approach, two-tailed. An area of alpha/2 lies in each tail beyond the lower and upper critical values (the reject regions); the middle is the do not reject region.]

Critical Value Approach

When the alternative hypothesis has a not-equal sign, we have what is called a two-tailed test, because we are concerned if the truth is off in either direction. In this case we divide alpha in half and place one half in each tail, so the areas of the two rejection regions add up to alpha. For example, if alpha = .05, we would have .025 in each tail of the distribution. As we said earlier, if the alternative has an inequality, we have a one-tailed test and put all of alpha in that one tail. There is another approach to hypothesis testing.
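The rule for where alpha goes can be sketched as one Python function. The ">", "<" and "!=" labels for the alternative hypothesis are a convention I made up for this sketch, not something from the book.

```python
from statistics import NormalDist

def critical_values(alpha, alternative):
    """Return the rejection-region boundaries on the z scale.

    alternative: ">" (right tail), "<" (left tail) or "!=" (two tails).
    """
    std = NormalDist()
    if alternative == ">":
        return (std.inv_cdf(1 - alpha),)   # all of alpha in the right tail
    if alternative == "<":
        return (std.inv_cdf(alpha),)       # all of alpha in the left tail
    if alternative == "!=":
        z = std.inv_cdf(1 - alpha / 2)     # alpha/2 in each tail
        return (-z, z)
    raise ValueError("alternative must be '>', '<' or '!='")

for alt in (">", "<", "!="):
    print(alt, [round(z, 3) for z in critical_values(0.05, alt)])
```

With alpha = .05, the one-tailed tests use 1.645 or -1.645, while the two-tailed test moves out to -1.96 and 1.96 because each tail holds only .025.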

p-Value Approach

The critical value approach had you set up rejection regions first and work with the sample at the end. In the p-value approach you work with the sample almost as soon as you can. Recall the magazine example: a sample of 200 readers was asked whether they are college students, and the sample proportion that said yes was .21. The z statistic from the sample is (.21 - .25)/sqrt[(.25)(.75)/200] = -1.31. From the z table, the area to the left of -1.31 is .0951. On that side alpha/2 = .05, and .0951 > .05. Whenever the tail area from the sample is greater than alpha/2, twice that area is greater than alpha.

The p-value for a sample proportion is the probability, assuming the null hypothesis is true, of getting a sample result at least as extreme as the one observed; it is the area in the tail beyond the z test stat. If we have a two-tailed test, we double the one-tail area to get the p-value. Then, if the p-value > alpha we do not reject the null, but if the p-value < alpha we reject the null, because we know the z stat is more extreme than the critical values. If the p-value is low, then Ho must go. What counts as a “low” p-value is set in each problem by the level of significance, alpha.
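Here is a minimal Python sketch of the p-value rule. NormalDist stands in for the z table, and the ">", "<" and "!=" labels are my own convention. As a check, it also computes a p-value for the newspaper example, which the notes only tested with critical values.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value for a z test stat, computed assuming Ho is true.

    alternative: ">" (right tail), "<" (left tail) or "!=" (two tails).
    """
    std = NormalDist()
    if alternative == ">":
        return 1 - std.cdf(z)          # area to the right of z
    if alternative == "<":
        return std.cdf(z)              # area to the left of z
    if alternative == "!=":
        return 2 * std.cdf(-abs(z))    # double the tail area beyond z
    raise ValueError("alternative must be '>', '<' or '!='")

def decide(p, alpha):
    """If the p-value is low (below alpha), then Ho must go."""
    return "Reject Ho" if p < alpha else "Do not reject Ho"

# Newspaper example: right-tailed, z = 1.75, alpha = .05
p1 = p_value(1.75, ">")
print(round(p1, 4), decide(p1, 0.05))

# Magazine example: two-tailed, z = -1.31, alpha = .1
p2 = p_value(-1.31, "!=")
print(round(p2, 4), decide(p2, 0.10))
```

The newspaper p-value comes out near .0401, below .05, which agrees with the earlier decision to reject. The magazine p-value is about .1902, above .1, so Ho stays.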

Back to the magazine: a sample of 200 readers was asked whether they are college students, and the sample proportion that said yes was .21. The z statistic from the sample is (.21 - .25)/sqrt[(.25)(.75)/200] = -1.31. Since we have a two-tailed test, the p-value is 2(.0951) = .1902, where .0951 is the tail area in the z table for z = -1.31. Since .1902 > .1, we do not reject Ho.
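Computing the magazine p-value straight from the sample data, without rounding z to two decimals for the table, gives a slightly different number. This Python sketch shows both versions. The rounding difference is the only point here; the decision does not change.

```python
from math import sqrt
from statistics import NormalDist

p0, n, p_hat, alpha = 0.25, 200, 0.21, 0.10
std = NormalDist()

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # about -1.306 before rounding
p_exact = 2 * std.cdf(-abs(z))               # two-tailed p-value, full precision
p_table = 2 * std.cdf(-1.31)                 # same, with z rounded as in the table

print(f"z = {z:.3f}")
print(f"p-value (unrounded z) = {p_exact:.4f}")
print(f"p-value (z = -1.31)   = {p_table:.4f}")
print("Reject Ho" if p_exact < alpha else "Do not reject Ho")
```

The unrounded p-value is about .191 rather than .1902. Either way it is well above .1, so we do not reject Ho.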

[Figure: p-value approach, two-tailed. The critical value picture re-labeled: alpha/2 in each tail beyond the lower and upper critical values, with P hat marked inside the do not reject region.]

You may have noticed that the previous figure reproduces the critical value approach picture, re-labeled. Notice how P hat is in the do not reject Ho region. The area to the left of P hat has to be bigger than alpha/2, because P hat lies between the critical values, and those were chosen so that the area beyond each one is alpha/2. I think we should compare the area to the left of P hat with alpha/2. BUT, that is not what folks do. They double the area to the left of P hat, call it a p-value, and compare it to alpha! The two comparisons always lead to the same decision, since doubling both sides does not change which one is bigger.

One last point, not to confuse you but to try to clear things up. With the critical value approach in a two-tailed test, you set up rejection regions by splitting alpha in half, because being away from the center in either direction casts doubt on Ho:. If P hat is more extreme than the critical values, you reject Ho. With the p-value approach, you find P hat, calculate the area that is more extreme (farther from the center), and then double that area to compare with alpha. Remember, alpha controls the probability of a Type I error.
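As a check that the approaches line up for a two-tailed test, here is a Python sketch that makes the decision three ways: with critical values, by comparing the one-tail area with alpha/2, and by comparing the doubled area (the p-value) with alpha. The function names and the list of z values are illustrative choices of mine.

```python
from statistics import NormalDist

std = NormalDist()

def critical_value_decision(z, alpha):
    """Reject if z is beyond either critical value."""
    return abs(z) > std.inv_cdf(1 - alpha / 2)

def half_alpha_decision(z, alpha):
    """Reject if the tail area beyond z is less than alpha/2."""
    return std.cdf(-abs(z)) < alpha / 2

def p_value_decision(z, alpha):
    """Reject if the doubled tail area (the p-value) is less than alpha."""
    return 2 * std.cdf(-abs(z)) < alpha

alpha = 0.10
for z in (-2.5, -1.7, -1.31, 0.0, 1.2, 1.66, 3.0):
    decisions = {critical_value_decision(z, alpha),
                 half_alpha_decision(z, alpha),
                 p_value_decision(z, alpha)}
    assert len(decisions) == 1  # all three approaches agree
    print(z, "Reject Ho" if decisions.pop() else "Do not reject Ho")
```

All three agree for every z, including the magazine's -1.31, which is not rejected at alpha = .1.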
