Hypothesis Testing

Hypothesis Testing • A criminal trial is an example of hypothesis testing. • In a trial a jury must decide between two hypotheses. • The null hypothesis is • H0: The defendant is innocent • The alternative hypothesis or research hypothesis is • HA: The defendant is guilty • The jury does not know which hypothesis is true. They must make a decision on the basis of evidence presented.

Hypothesis Testing • Convicting the defendant is called rejecting the null hypothesis in favor of the alternative hypothesis. That is, the jury is saying that there is enough evidence to conclude that the defendant is guilty (i.e., there is enough evidence to conclude that the assumption of innocence is suspect). • If the jury acquits, it is stating that there is not enough evidence to reject the null hypothesis in support of the alternative hypothesis. This does not prove that the defendant is innocent, only that there is not enough evidence to support the alternative hypothesis. That is why we never say that we accept the null hypothesis, although most people in industry will say “We accept the null hypothesis”.

Errors in Hypothesis Testing • There are two possible errors. • A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person. We would want the probability of this type of error to be very small [maybe 0.001: beyond a reasonable doubt] for a criminal trial where a conviction results in the death penalty, whereas for a civil trial, where a judgment might result in someone having to “pay for damages to a wrecked auto”, we would be willing for the probability to be larger [0.49: preponderance of the evidence]. • P(Type I error) = α [usually 0.05 or 0.01]

Errors in Hypothesis Testing • A Type II error occurs when we don’t reject a false null hypothesis [“accept” the null hypothesis]. That occurs when a guilty defendant is acquitted. • In practice, this type of error is by far the most serious mistake we normally make. For example, suppose we test the hypothesis that the amount of medication in a heart pill is equal to a value which will cure your heart problem, and we “accept the null hypothesis that the amount is ok”. Later on we find out that the average amount is WAY too large and people die from “too much medication” [I wish we had rejected the hypothesis and thrown the pills in the trash can]. By then it’s too late, because we shipped the pills to the public.

Errors in Hypothesis Testing • The probability of a Type I error is denoted as α (Greek letter alpha). The probability of a Type II error is β (Greek letter beta). • The two probabilities are inversely related. Decreasing one increases the other, for a fixed sample size. • In other words, you can’t have α and β both real small for any old sample size. You may have to take a much larger sample size, or in the court example, you need much more evidence.
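
The trade-off between α and β can be sketched numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `type_ii_error` and the numbers (null mean 350, true mean 380, σ = 75) are assumptions chosen for illustration, not part of the slide.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """Beta for a two-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)                          # standard deviation of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical z
    lcv, ucv = mu0 - z_crit * se, mu0 + z_crit * se
    sampling = NormalDist(mu_true, se)            # distribution of x-bar under the true mean
    # Type II error: x-bar lands between the critical values even though H0 is false
    return sampling.cdf(ucv) - sampling.cdf(lcv)

# Assumed illustrative values: mu0 = 350, true mean = 380, sigma = 75
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(350, 380, 75, 25, alpha)
    print(f"n=25,  alpha={alpha:.2f}, beta={beta:.3f}")

# Same alpha, larger sample: beta shrinks
print(f"n=100, alpha=0.05, beta={type_ii_error(350, 380, 75, 100, 0.05):.3f}")
```

With n held at 25, shrinking α pushes β up; only the larger sample brings both down together.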

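Only 4 things can happen when you test a hypothesis! • H0 is true and we fail to reject it: correct decision • H0 is true and we reject it: Type I error [probability α] • H0 is false and we fail to reject it: Type II error [probability β] • H0 is false and we reject it: correct decision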

Conclusions to Hypothesis Testing • 1. There are two hypotheses, the null and the alternative hypotheses. • 2. The procedure begins with the assumption that the null hypothesis is true. • 3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true, or the null is not likely to be true. • 4. There are two possible decisions: • Conclude that there is enough evidence to support the alternative hypothesis. Reject the null. • Conclude that there is not enough evidence to support the alternative hypothesis. Fail to reject the null.

Hypothesis Testing • The two hypotheses are called the null hypothesis and the alternative (or research) hypothesis. The usual notation is: • H0: the ‘null’ hypothesis • HA: the ‘alternative’ or ‘research’ hypothesis • The null hypothesis (H0) will always state that the parameter equals the value specified in the alternative hypothesis HA (some people will use H1 for the alternative hypothesis).

Example: Hypothesis Test • Recall the mean time in the recovery room example earlier. Rather than estimate the mean time, our hospital administrator wants to know whether the mean is different from 350 minutes (which is the standard Blue Cross/Blue Shield uses for insurance). In other words, BC/BS claims that the mean time is 350 minutes and we want to check this claim out to see if it appears reasonable. We can rephrase this request into a test of the hypothesis: • H0: μx = 350 • Thus, our research hypothesis becomes: • HA: μx ≠ 350 • Recall that the standard deviation [σ] was assumed to be 75, the sample size [n] was 25, and the sample mean [x̄] was calculated to be 370.16. If the sample mean is close to 350, it would appear that BC/BS may be correct in assuming the average time for recovery is in fact 350.

Example: Hypothesis Test • The testing procedure begins with the assumption that the null hypothesis is true. • Thus, until we have further statistical evidence, we will assume: • H0: μx = 350 (assumed to be TRUE) • The next step will be to determine the sampling distribution of the sample mean assuming the true mean is 350. • x̄ is normal with mean μx̄ = 350 • and standard deviation σx̄ = σ/SQRT(n) = 75/SQRT(25) = 15

Is the Sample Mean in the Guts of the Sampling Distribution??

Three ways to determine this: First way • 1. Unstandardized test statistic: Is x̄ in the guts of the sampling distribution? Depends on what you define as the “guts” of the sampling distribution. • If we define the guts as the center 95% of the distribution [this means α = 0.05], then the critical values that define the guts will be 1.96 standard deviations of X-Bar on either side of the mean of the sampling distribution [350], or • UCV = 350 + 1.96*15 = 350 + 29.4 = 379.4 • LCV = 350 - 1.96*15 = 350 - 29.4 = 320.6
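
The critical-value arithmetic above can be reproduced in a few lines. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and the numbers are the ones given in the recovery-room example.

```python
from math import sqrt
from statistics import NormalDist

# Values from the recovery-room example
mu0, sigma, n, alpha = 350, 75, 25, 0.05
x_bar = 370.16

se = sigma / sqrt(n)                          # 75 / 5 = 15
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for the center 95%
lcv = mu0 - z_crit * se                       # lower critical value
ucv = mu0 + z_crit * se                       # upper critical value

# The sample mean is "in the guts" if it falls between the critical values
in_guts = lcv < x_bar < ucv
print(f"LCV = {lcv:.1f}, UCV = {ucv:.1f}")
print("fail to reject H0" if in_guts else "reject H0")
```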

1. Unstandardized Test Statistic Approach

1. Unstandardized Test Statistic Approach – Managerial Summary • Reason: Since the sample mean is between the LCV(320.6) and the UCV(379.4) • Conclusion: We fail to reject the null hypothesis Ho: μx = 350 • Level of Significance: At a 5% level of significance. • In other words, we have no statistical evidence to dispute BC/BS’s claim that patients should stay in recovery an average of 350 minutes.

Three ways to determine this: Second way • 2. Standardized test statistic: Since we defined the “guts” of the sampling distribution to be the center 95% [α = 0.05], • If the Z-Score for the sample mean is greater than 1.96, we know that x̄ will be in the reject region on the right side or • If the Z-Score for the sample mean is less than -1.96, we know that x̄ will be in the reject region on the left side. • Z = (x̄ - μo)/σx̄ = (370.16 - 350)/15 = 1.344 • Is this Z-Score in the guts of the sampling distribution???
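
The same check on the standardized scale, as a minimal sketch with the standard-library `statistics.NormalDist` (variable names are illustrative; the numbers come from the example):

```python
from math import sqrt
from statistics import NormalDist

# Values from the recovery-room example
mu0, sigma, n, alpha = 350, 75, 25, 0.05
x_bar = 370.16

z = (x_bar - mu0) / (sigma / sqrt(n))         # standardized test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value, about 1.96

# Reject only if Z falls outside (-z_crit, +z_crit)
reject = abs(z) > z_crit
print(f"Z = {z:.3f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```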

2. Standardized Test Statistic Approach

2. Standardized Test Statistic Approach – Managerial Summary • Reason: Since the sample value for Z is between the LCV(-1.96) and the UCV(+1.96) • Conclusion: We fail to reject the null hypothesis Ho: μx = 350 • Level of Significance: At a 5% level of significance. • In other words, we have no statistical evidence to dispute BC/BS’s claim that patients should stay in recovery an average of 350 minutes.

Three ways to determine this: Third way • 3. The p-value approach (which is generally used with computer and statistical software): Increase the “Rejection Region” until it “captures” the sample mean. • For this example, since x̄ is to the right of the mean, calculate • P(x̄ > 370.16) = P(Z > 1.344) ≈ 0.0901 (Z-table value for Z = 1.34) • Since this is a two tailed test, you must double this area for the p-value. • p-value = 2*(0.0901) = 0.1802 • Since we defined the guts as the center 95% [α = 0.05], the reject region was 5%. Since our sample mean, x̄, is in the 18.02% region, it cannot be in our 5% rejection region [α = 0.05]. • Therefore, since the p-value > 0.05, we fail to reject the hypothesis that the mean is 350 at a 5% level of significance.
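
A minimal sketch of the p-value calculation, using the standard-library `statistics.NormalDist` (variable names are illustrative). Software evaluates the normal curve at Z = 1.344 exactly, so it gives a p-value near 0.179 rather than the 0.1802 obtained from a Z table with Z rounded to 1.34; the decision is the same either way.

```python
from math import sqrt
from statistics import NormalDist

# Values from the recovery-room example
mu0, sigma, n, alpha = 350, 75, 25, 0.05
x_bar = 370.16

z = (x_bar - mu0) / (sigma / sqrt(n))
upper_tail = 1 - NormalDist().cdf(abs(z))  # area beyond the sample mean in one tail
p_value = 2 * upper_tail                   # two-tailed test: double the tail area

print(f"p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```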

Three ways to determine this: Third way

Statistical Conclusions: • Unstandardized Test Statistic: • Since LCV (320.6) < x̄ (370.16) < UCV (379.4), we fail to reject the null hypothesis at a 5% level of significance. • Standardized Test Statistic: • Since -Zα/2 (-1.96) < Z (1.344) < Zα/2 (1.96), we fail to reject the null hypothesis at a 5% level of significance. • P-value: • Since p-value (0.1802) > 0.05 [α], we fail to reject the null hypothesis at a 5% level of significance.

NOW, what happens when σ is unknown and we have to use “s”? • Now assume the standard deviation [σ] was unknown and the sample standard deviation [s] was calculated to be 80.8, the sample size [n] was 25, and the sample mean [x̄] was calculated to be 370.16. • Use the t-statistic [assuming population is normal] to calculate the new critical values: • UCV = 350 + tα/2*(80.8/5) = ? • LCV = 350 - tα/2*(80.8/5) = ? • d.f. = n-1 = 24 • α = 0.05 • What is your managerial conclusion???
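
One way to work this exercise is sketched below. Python's standard library has no t distribution, so the critical value 2.064 (the usual t-table entry for α/2 = 0.025 with 24 degrees of freedom) is entered by hand; this is an assumption of the sketch, and a statistics package could supply the value instead.

```python
from math import sqrt

# Values from the slide
mu0, s, n = 350, 80.8, 25
x_bar = 370.16

# Assumption: t-table value for alpha/2 = 0.025, d.f. = 24, typed in by hand
t_crit = 2.064

se = s / sqrt(n)                  # 80.8 / 5 = 16.16
lcv = mu0 - t_crit * se           # lower critical value
ucv = mu0 + t_crit * se           # upper critical value
t_stat = (x_bar - mu0) / se       # standardized version of the same test

print(f"LCV = {lcv:.2f}, UCV = {ucv:.2f}, t = {t_stat:.3f}")
print("fail to reject H0" if lcv < x_bar < ucv else "reject H0")
```

The t critical value is larger than 1.96, so the “guts” are wider than in the known-σ case.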

Hypothesis Testing Advice • In most cases you will not know σ and will need to use the sample standard deviation in all your formulas. • The minute you do this, everywhere you find a “Z score” you will replace it with a “t score” with n-1 degrees of freedom [single mean problems]. • You must be able to assume the population is approximately normal to do this, especially for small sample sizes.

Hypothesis Testing Advice • If sample sizes are “large”, the central limit theorem will take care of you, and “in the old days” we went ahead and used the Z score whenever the sample sizes were greater than 30. • Bottom Line: If the population is not normal, and you have to estimate σ from the data [use “s”], and the sample size is small [usually < 30], YOU CANNOT WORK THIS PROBLEM WITH WHAT YOU LEARN IN THIS CLASS.

Three Forms of Single Mean Hypothesis Test [α is not divided in the one-tail tests]

One Tail Hypothesis Test Example • A health care facility determines that a new billing system will be cost-effective only if the mean monthly bill is more than $170. • A random sample of 400 monthly bills is drawn, for which the sample mean is $178. The bills are approximately normally distributed with a standard deviation of $65 [σ = 65]. • Can we conclude that the new system will be cost-effective?

One Tail Hypothesis Test Example • The system will be cost effective if the mean bill for all customers is greater than $170 [some refer to this as the “research” hypothesis]. • Our null and alternative hypotheses would be: • H0: μ ≤ 170 (some texts will leave the ≤ off this null: H0: μ = 170) • Ha: μ > 170 (this is what we want to conclude if we buy the new system) • Fail to Reject Ho: don’t buy system • Reject Ho: buy system

One Tail Hypothesis Test Example • H0: μ ≤ 170 • Ha: μ > 170 • We know: • n = 400, • x̄ = 178, and • σ = 65 • σx̄ = 65/SQRT(400) = 3.25 • α = 0.05
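
A minimal sketch of how the one-tail test could be carried out with the numbers on this slide, using the standard-library `statistics.NormalDist` (variable names are illustrative). Note that the whole α sits in the right tail, so the critical Z is about 1.645 rather than 1.96.

```python
from math import sqrt
from statistics import NormalDist

# Values from the billing-system example
mu0, sigma, n, alpha = 170, 65, 400, 0.05
x_bar = 178

se = sigma / sqrt(n)                      # 65 / 20 = 3.25
z = (x_bar - mu0) / se                    # standardized test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)  # alpha is NOT split in a one-tail test
p_value = 1 - NormalDist().cdf(z)         # right-tail area only

reject = z > z_crit
print(f"Z = {z:.3f}, critical Z = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```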

One Tail Hypothesis Test Example • Managerial Conclusion???

Homework: • 7.2.3, 7.2.11, 7.2.13, 7.2.15

Questions to think about! • Is there any significant difference in the mean grade made on the first test between male and female students? • Ho: μF = μM or Ho: μF - μM = 0 • “Difference in Means Hypothesis Test”

Questions to think about! • Is there any significant difference in the mean grade on the first test and the second test? • Would we test • Ho: μ1 = μ2 OR • Ho: μd = 0 • Called a “Paired Difference in Means” test. • The second test, Ho: μd = 0, takes the variability between the students out of the analysis, resulting in less variability, which means you will have more statistical “POWER” to detect a difference for this sample size. [In effect this is a single mean hypothesis test]
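
A small sketch of why pairing helps. The scores below are hypothetical, invented only for illustration; the point is that the paired test is a single mean test on the differences, and that removing student-to-student variability makes the test statistic much larger than an unpaired comparison of the same data.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores for six students (invented for illustration only)
test1 = [72, 85, 90, 64, 78, 88]
test2 = [75, 89, 91, 70, 80, 93]

# Paired approach: a single mean test of Ho: mu_d = 0 on the differences
diffs = [b - a for a, b in zip(test1, test2)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
t_paired = d_bar / (s_d / sqrt(n))

# Unpaired approach: treats the two tests as independent samples
se_unpaired = sqrt(stdev(test1) ** 2 / n + stdev(test2) ** 2 / n)
t_unpaired = (mean(test2) - mean(test1)) / se_unpaired

print(f"mean difference = {d_bar}, paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

The mean difference is the same in both approaches; only the standard error changes.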

Questions to think about! • Is there a significant difference in the mean grades depending on which row the student sits in? • Ho: μRow 1 = μRow 2 = μRow 3 = μRow 4 = μRow 5 = μRow 6 • Called a “One Way Analysis of Variance” • Conclusion???

One Way Analysis of Variance

One Way Analysis of Variance • You could claim that the means associated with any vertical column are equal BUT for sure Rows 5 and 4 have different means than rows 1, 2, and 6

Homework • Describe an experiment at your work where you might use a “Difference in Means Hypothesis Test” • Describe an experiment at your work where you might use a “Paired Difference in Means Hypothesis Test” • Describe an experiment at your work where you might use a “One Way Analysis of Variance Test”
