Hypothesis Testing

Hypothesis Testing An Inference Procedure We will study procedures for both the unknown population mean on a quantitative variable and the unknown population proportion on a qualitative variable.

Background There are times we would like to know about the unknown mean in a population. But it is often too expensive and time consuming to investigate the whole population, so a sample is taken. The method of confidence intervals is based on the idea that a point estimate would, in theory, vary from sample to sample. So from the one sample we do take we build in that variability and are then a certain percent confident that our interval contains the unknown value. Hypothesis testing relies on some of the same ideas used in confidence intervals, but here there is at least a starting point for the unknown value. The starting point can come from past work or from a belief one has about a process.

Example Consider an example about a company that puts cereal in boxes. The label on each box says there are 368 grams of cereal in the box. Does each box have exactly 368 grams? Probably not, because maybe a few extra flakes fall in one box and a few fewer in another. But in the grand scheme of things the process is filling the boxes to 368 grams on average. (Now if one box had 268 grams and another had 468 grams for an average of 368 we would have a problem, but not of the kind we are talking about here.)

Null hypothesis From the cereal example we would say the null hypothesis is that the mean amount of cereal put in the boxes is 368 grams, and in shorthand notation we would write Ho: μ = 368. The Ho stands for null hypothesis. Here this basically means that if the company believes it is putting 368 grams in each box, then we will not object to that assertion at face value. The mu, μ, signals that we are making a hypothesis about the population of all boxes. Of course we will only take a sample, but our hypothesis is about the population mean.

Alternative Hypothesis In hypothesis testing there will always be a mutually exclusive alternative hypothesis to the null. In the cereal example the alternative hypothesis may be that the cereal boxes are not being filled to an average of 368 grams, and we would write this as H1: μ ≠ 368. The general process of hypothesis testing starts with the null and alternative hypotheses. Then a sample is collected and analyzed. Based on the analysis, one either fails to reject the null hypothesis and so continues to believe in it, or rejects the null and concludes the alternative is the one to go with. Note in the cereal example, if the null is rejected the firm had better find out why the machine is not filling the cereal boxes properly and get that situation fixed.

Analogy Story about hypothesis tests. Not really stats, but an idea to consider. Say I have two decks of cards. One deck is a regular deck – spades, hearts, diamonds and clubs. The other deck is special – 4 sets of hearts. Now, I take out one of the decks, but you do not know which one. In the language of statistics the null hypothesis will be that I took out the regular deck. You will stay with the null hypothesis unless an event occurs that has a really low probability under it. If such a low probability event occurs you will reject the null hypothesis and go with the alternative hypothesis. So, I take out a deck and deal you five cards – a royal flush in hearts! You would reject the null hypothesis of a regular deck and go with the alternative that the deck I pulled out is the special one, because a royal flush in hearts has a low probability in a regular deck.
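To put rough numbers on the card story, here is a small sketch. It assumes the special deck is 52 cards made of 4 identical copies of each of the 13 hearts, and that "royal flush in hearts" means the 10, J, Q, K and A of hearts. The variable names are made up for illustration.

```python
from math import comb

# Number of distinct 5-card hands that can be dealt from any 52-card deck.
total_hands = comb(52, 5)

# Regular deck: exactly one hand is the 10, J, Q, K, A of hearts.
p_regular = 1 / total_hands

# Special deck (assumption: 4 copies of each of the 13 hearts).
# Each of the 5 needed ranks can come from any of its 4 copies.
p_special = 4**5 / total_hands

print(f"regular deck: {p_regular:.8f}")
print(f"special deck: {p_special:.6f}")
print(f"times more likely under the special deck: {p_special / p_regular:.0f}")
```

The hand is rare under both decks, but under these assumptions it is about a thousand times more likely with the special deck, which is why the evidence points away from the null.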

Sampling Distribution You may recall that when we have a quantitative variable and the population standard deviation of the variable is known, the distribution of the sample mean 1) is normal, 2) has the same mean as the variable in the population, and 3) has standard error = standard deviation in the population divided by the square root of the sample size. When the population standard deviation is not known we rely on the sample standard deviation, and the standardized sample mean follows a t distribution. In what follows we assume the population standard deviation is known, but the ideas we bring up are also relevant later.
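As a quick illustration of point 3, the sketch below computes the standard error for a few sample sizes. The population standard deviation of 15 grams and the sample size of 25 are the values used in the cereal example later in these slides. The other sample sizes and the function name are just for illustration.

```python
from math import sqrt

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean when the population sd is known."""
    return sigma / sqrt(n)

# sigma = 15 grams; n = 25 matches the cereal sample, 100 and 400 are extra.
for n in (25, 100, 400):
    print(n, standard_error(15, n))
```

Each time the sample size is multiplied by four the standard error is cut in half, so sample means from bigger samples bunch more tightly around the population mean.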

Regions of Sampling Distribution [Figure: sampling distribution of X bar centered at μ, with two arrows pushing vertical lines outward from the center] Imagine that this slide has animation. Think about the arrows as both starting out in the center and as the arrows move out they push the vertical lines with them. Using the cereal example, the center of the distribution is thought to be at 368. As we move in either direction from the center we have sample means that are possible when the population mean really is 368. But at some point as we move out we start to wonder whether our one sample mean really came from a distribution with mean equal to 368.

Regions of Sampling Distribution In the process of hypothesis testing the area of the sampling distribution is divided up into regions. The nonrejection region is the area in the middle of the distribution. These values are relatively close to the center. So if we get a sample value in this area we do not have enough evidence to reject the null hypothesis. The “tail” areas that I have on the previous screen are considered rejection regions. While sample mean values could occur in these regions when in fact the true mean is 368, the probability is low, and thus this raises suspicion about the null hypothesized value and leads us to reject the null. (Could I deal you a royal flush in hearts from a regular deck? Yes, but the chance is small, and it is much better under the alternative hypothesis.)

Critical Values The values of x bar that occur where the arrows are pushed out are called critical values of x bar. Note that the critical values are not determined from the sample. The null hypothesized value is also NOT determined from the sample. Remember the null hypothesis value is determined from past work or knowledge of some process. The critical values are picked based on some additional ideas I want to explore next.

Type I Error A Type I error is a situation where you reject the null hypothesis, Ho, when it is true and should not be rejected. The probability of making a type I error is called alpha and is often referred to as the level of significance. In the cereal example if we reject the null hypothesis we will have to shut down production and investigate the production process to see why it is not putting in the “correct” amount of cereal. There is a consequence to rejecting a true null hypothesis. Depending on the nature of the consequence we pick the value of alpha. Traditional values of alpha are .01, .05 and .1. The choice of alpha will be part of determining the critical x bar values.
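One way to see what alpha means is to simulate it. The sketch below is only an illustration under stated assumptions: the amount of cereal in a box is taken to be normally distributed with true mean 368 (so the null is true) and population standard deviation 15, samples have 25 boxes, and the test is two tailed with alpha = .05. The function name, number of trials and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu0=368.0, sigma=15.0, n=25, alpha=0.05,
                        trials=20_000, seed=1):
    """Estimate how often a two-tailed Z test rejects a null that is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical Z
    se = sigma / sqrt(n)                          # standard error of x bar
    rejections = 0
    for _ in range(trials):
        # Assumption: box weights are normal with mean mu0, so Ho is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z_stat = (fmean(sample) - mu0) / se
        if abs(z_stat) > z_crit:
            rejections += 1                       # a Type I error
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen alpha, which is the sense in which alpha is the probability of rejecting a true null.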

Type II Error A type II error is a situation where the null hypothesis is not rejected when it should be because the null is false. The probability of making a type II error is called beta, β. A type II error also has consequences. In the cereal example if we do not reject the null when we should we could either be giving more cereal than we say we are (and thus not charging for it – we certainly have costs in making it), or giving less than we say we are and thus cheating customers. In an introductory statistics class such as ours we typically focus on the type I error.

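Critical Value approach [Figure: sampling distribution of X bar centered at μ = μo. The do not reject region is in the middle, between the lower critical value and the upper critical value. A reject region with area Alpha/2 lies in each tail.]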

Critical value approach The null and alternative hypotheses can be stated in a generic way as Ho: μ = μo and H1: μ ≠ μo, where μo is a specific number. In our cereal example we would have Ho: μ = 368 and H1: μ ≠ 368. When the alternative uses a not equal sign we have what is called a two tailed test, because if we are off in either direction we are concerned. In this case we divide the alpha value in half so that the areas of the two rejection regions add up to alpha. If alpha = .05 we would have .025 in each tail of the distribution.
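The split of alpha into two tails can be turned into cutoffs on the standard normal (Z) scale. This is a minimal sketch using Python's standard library in place of a printed Z table. The function name is made up for illustration.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha: float) -> tuple[float, float]:
    """Lower and upper critical Z values with alpha/2 in each tail."""
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    return -upper, upper

lower, upper = two_tailed_critical_z(0.05)
print(f"{lower:.2f}, {upper:.2f}")  # -1.96, 1.96
```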

Critical Value Approach Our context here is that we know the population standard deviation, so we use the Z table (the standard normal table). While my graph a few slides back is of X bar, we translate to Z values. With alpha = .05 and thus .025 in either tail, the lower critical Z = -1.96 and the upper critical Z = 1.96. We would reject the null if from our sample the Zstat is less than -1.96 or greater than 1.96. Now, let’s say we take a sample of 25 observations and we get a mean of 372.5 grams, and we know the population standard deviation is 15. The Zstat = (372.5 – 368)/(15/sqrt(25)) = 4.5/3 = 1.50. Since 1.50 falls between -1.96 and 1.96, we cannot reject the null. The data are consistent with the filling process being ok!
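The calculation on this slide can be sketched in a few lines. The function name and return layout are illustrative choices, and the sketch also converts the critical Z values back to critical x bar values in grams, which the slide does not work out.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test for a mean with known population sd."""
    se = sigma / sqrt(n)                          # standard error of x bar
    z_stat = (x_bar - mu0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper critical Z
    reject = abs(z_stat) > z_crit
    # The same cutoffs expressed on the x bar scale.
    x_crit = (mu0 - z_crit * se, mu0 + z_crit * se)
    return z_stat, reject, x_crit

z_stat, reject, x_crit = z_test_two_tailed(372.5, 368, 15, 25)
print(f"Zstat = {z_stat:.2f}, reject Ho: {reject}")
print(f"critical x bar values: {x_crit[0]:.2f} and {x_crit[1]:.2f}")
```

The sample mean of 372.5 sits between the two critical x bar values, which is the same conclusion as the Zstat of 1.50 sitting between -1.96 and 1.96.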

p – value approach The critical value approach had you set up rejection regions and in the end work with a sample. In the p – value approach you will work with the sample almost as soon as you can. Remember we had a sample mean of 372.5 and the Zstat for this is 1.50. A Z of 1.50 has area .9332 to the left and .0668 to the right. The area to the right is the upper tail associated with the actual sample mean. In the critical value approach we had .025 in the upper tail. So, the .0668 suggests our sample mean is in the do not reject region. With a two tail test we look at the Zstat from the sample and the negative of the Zstat, here -1.50. The two tail areas add up to .1336, which is larger than alpha = .05.

p – value approach The p – value for a sample mean is the probability, given the null hypothesis is true, of getting a sample mean at least as far out in the tail as the one we observed. If we have a two tail test we just double the one tail value to get the p – value. Then if p – value > alpha we do not reject the null, but if the p – value < alpha we reject the null because we know the Zstat is more extreme than the critical values. If the p – value is low, then Ho must go. Note in our work what counts as a “low” p – value will be defined from problem to problem. That cutoff is the level of significance, or alpha.
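Here is a small sketch of the p – value rule applied to the cereal sample, using Python's standard library in place of the Z table. The function name is hypothetical.

```python
from statistics import NormalDist

def two_tailed_p_value(z_stat: float) -> float:
    """Double the tail area beyond |Zstat| under the standard normal."""
    upper_tail = 1 - NormalDist().cdf(abs(z_stat))
    return 2 * upper_tail

alpha = 0.05
p = two_tailed_p_value(1.50)  # Zstat from the cereal sample
print(f"p-value = {p:.4f}")   # p-value = 0.1336
print("reject Ho" if p < alpha else "do not reject Ho")
```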

With a .01 level of significance we have .005 as the area in each tail. We would reject the null if 1) the Zstat is less than -2.575, or 2) the Zstat is greater than 2.575. [Figure: standard normal curve with Area = .005 in each tail, below -2.575 and above 2.575]
