## Hypothesis Testing


**Overview**
• This is the other part of inferential statistics: hypothesis testing
• Hypothesis testing and estimation are two different approaches to two similar problems
• Estimation is the process of using sample data to estimate the value of a population parameter
• Hypothesis testing is the process of using sample data to test a claim about the value of a population parameter

**What is Hypothesis Testing?**
• The setting of our problem is that we want to test whether a particular claim is believable or not
• Hypothesis testing involves two steps
  • Step 1 – to state what we think is true
  • Step 2 – to quantify how confident we are in our claim

**An example of what we want to quantify**
• A car manufacturer claims that a certain model of car achieves 29 miles per gallon
• We test some number of cars
• We calculate the sample mean … it is 27
• Is 27 miles per gallon consistent with the manufacturer’s claim? How confident are we that the manufacturer has significantly overstated the miles per gallon achievable?

**An example of what we want to quantify**
• How confident are we that the gas economy is definitely less than 29 miles per gallon?
• We would like to make either the statement “We’re pretty sure that the mileage is less than 29 mpg” or the statement “It’s believable that the mileage is equal to 29 mpg”

**Definition**
• A hypothesis test for an unknown parameter is a test of a specific claim
• Compare this to a confidence interval, which gives an interval of numbers, not a “believe it” or “don’t believe it” answer
• The level of significance α sets how strong the evidence must be before we reject the claim: it is the probability of rejecting the claim when the claim is actually true

**Null Hypothesis**
• How do we state our claim?
• Our claim
  • Is the statement to be tested
  • Is called the null hypothesis
  • Is written as H0 (and is read as “H-naught”)

**A Useful Analogy**
• In the judicial system, the defendant “is innocent until proven guilty”
• Thus the defendant is presumed to be innocent
• The null hypothesis is that the defendant is innocent
• H0: the defendant is innocent

**Alternative Hypothesis**
• How do we state our counter-claim?
• Our counter-claim
  • Is the opposite of the statement to be tested
  • Is called the alternative hypothesis
  • Is written as H1 (and is read as “H-one”)

**If the defendant is not innocent, then**
• The defendant is guilty
• The alternative hypothesis is that the defendant is guilty
• H1: the defendant is guilty
• The summary of the set-up
  • H0: the defendant is innocent
  • H1: the defendant is guilty

**There are different types of null hypothesis / alternative hypothesis pairs, depending on the claim and the counter-claim**
• One type of H0 / H1 pair, called a two-tailed test, tests whether the parameter is equal to, versus not equal to, some value
  • H0: parameter = some value
  • H1: parameter ≠ some value

**An example of a two-tailed test**
• A bolt manufacturer claims that the diameter of the bolts averages 10 mm
  • H0: Diameter = 10
  • H1: Diameter ≠ 10
• An alternative hypothesis of “≠ 10” is appropriate since
  • A sample diameter that is too high is a problem
  • A sample diameter that is too low is also a problem
• Thus this is a two-tailed test

**Another type of pair, called a left-tailed test, tests whether the parameter is equal to, versus less than, some value**
• H0: parameter = some value
• H1: parameter < some value

**An example of a left-tailed test**
• A car manufacturer claims that the mpg of a certain model car is at least 29.0
  • H0: MPG = 29.0
  • H1: MPG < 29.0
• An alternative hypothesis of “< 29” is appropriate since
  • An mpg that is too low is a problem
  • An mpg that is too high is not a problem
• Thus this is a left-tailed test

**A third type of pair, called a right-tailed test, tests whether the parameter is equal to, versus greater than, some value**
• H0: parameter = some value
• H1: parameter > some value

**An example of a right-tailed test**
• A bolt manufacturer claims that the defective rate of their product is at most 1 part in 1,000
  • H0: Defect Rate = 0.001
  • H1: Defect Rate > 0.001
• An alternative hypothesis of “> 0.001” is appropriate since
  • A defect rate that is too low is not a problem
  • A defect rate that is too high is a problem
• Thus this is a right-tailed test

**A comparison of the three types of tests**
• The null hypothesis
  • We believe that this is true
• The alternative hypothesis

**A manufacturer claims that there are at least two scoops of cranberries in each box of cereal**
• What would be a problem?
• The parameter to be tested is the number of scoops of cranberries in each box of cereal
• If the sample mean is too low, that is a problem
• If the sample mean is too high, that is not a problem
• This is a left-tailed test
• The “bad case” is when there are too few

**A manufacturer claims that there are exactly 500 mg of a medication in each tablet**
• What would be a problem?
• The parameter to be tested is the amount of a medication in each tablet
• If the sample mean is too low, that is a problem
• If the sample mean is too high, that is a problem too
• This is a two-tailed test
• A “bad case” is when there is too little
• A “bad case” is also when there is too much

**A manufacturer claims that there are at most 8 grams of fat per serving**
• What would be a problem?
• The parameter to be tested is the number of grams of fat in each serving
• If the sample mean is too low, that is not a problem
• If the sample mean is too high, that is a problem
• This is a right-tailed test
• The “bad case” is when there are too many

**There are two possible results for a hypothesis test**
• If we believe that the null hypothesis could be true, this is called not rejecting the null hypothesis
  • Note that this is only “we believe … could be”
• If we are pretty sure that the null hypothesis is not true, so that the alternative hypothesis is true, this is called rejecting the null hypothesis
  • Note that this is “we are pretty sure that … is”

**In comparing our conclusion (do not reject or reject the null hypothesis) with reality, we could either be right or we could be wrong**
• When we reject (and state that the null hypothesis is false) but the null hypothesis is actually true
• When we do not reject (and state that the null hypothesis could be true) but the null hypothesis is actually false
• These would be undesirable errors

**A summary of the errors is**
• We see that there are four possibilities … in two of which we are correct and in two of which we are incorrect

**When we reject (and state that the null hypothesis is false) but the null hypothesis is actually true … this is called a Type I error**
• When we do not reject (and state that the null hypothesis could be true) but the null hypothesis is actually false … this is called a Type II error
• In general, Type I errors are considered the more serious of the two

**We can make use of our analogy for Type I and Type II errors in comparing it to a criminal trial**
• In the judicial system, the defendant “is innocent until proven guilty”
• Thus the defendant is presumed to be innocent
• The null hypothesis is that the defendant is innocent
• H0: the defendant is innocent

**If the defendant is not innocent, then**
• The defendant is guilty
• The alternative hypothesis is that the defendant is guilty
• H1: the defendant is guilty
• The summary of the set-up
  • H0: the defendant is innocent
  • H1: the defendant is guilty

**Our possible conclusions**
• Reject the null hypothesis
  • Go with the alternative hypothesis
  • H1: the defendant is guilty
  • We vote “guilty”
• Do not reject the null hypothesis
  • Go with the null hypothesis
  • H0: the defendant is innocent
  • We vote “not guilty” (which is not the same as voting innocent!)

**A Type I error**
• Reject the null hypothesis
• The null hypothesis was actually true
• We voted “guilty” for an innocent defendant
• A Type II error
  • Do not reject the null hypothesis
  • The alternative hypothesis was actually true
  • We voted “not guilty” for a guilty defendant

**Which error do we try to control?**
• Type I error (sending an innocent person to jail)
  • The evidence was judged to be “beyond a reasonable doubt”
  • We must be pretty sure
  • Very bad! We want to minimize this type of error
• Type II error (letting a guilty person go)
  • The evidence wasn’t “beyond a reasonable doubt”
  • We weren’t sure enough
  • If this happens … well … it’s not as bad as a Type I error (according to the legal system)

**“Innocent” versus “Not Guilty”**
• This is an important concept
• Innocent is not the same as not guilty
  • Innocent – the person did not commit the crime
  • Not guilty – there is not enough evidence to convict … the reality is unclear
• To not reject the null hypothesis doesn’t mean that the null hypothesis is true, just that there isn’t enough evidence to reject it

**Summary so far…**
• A hypothesis test tests whether a claim is believable or not, compared to the alternative
• We test the null hypothesis H0 versus the alternative hypothesis H1
• If there is sufficient evidence to conclude that H0 is false, we reject the null hypothesis
• If there is insufficient evidence to conclude that H0 is false, we do not reject the null hypothesis

**We have the outline of a hypothesis test, just not the detailed implementation**
• What is the exact procedure to get to a do not reject / reject conclusion?
• How do we calculate Type I and Type II errors?

**Our aim is to conduct a hypothesis test about a population parameter. Like:**
• A car manufacturer claims that a certain model of car achieves 29 miles per gallon
• We test some number of cars
• We calculate the sample mean … it is 27
• Is 27 miles per gallon consistent with the manufacturer’s claim? How confident are we that the manufacturer has significantly overstated the miles per gallon achievable?

**STEP 1**
• We have a null hypothesis, that the actual mean is equal to a value μ0
• We have an alternative hypothesis
• STEP 2
  • A criterion that quantifies “unlikely”
  • That the actual mean is unlikely to be equal to μ0
  • A criterion that determines what would be a do not reject and what would be a reject

**STEP 3**
• We run an experiment
• We collect the data
• We calculate the sample mean
• MID-STEP: Our Assumptions
  • That the sample is a simple random sample
  • That the sample mean has a normal distribution

**We compare the sample mean x̄ to the hypothesized population mean μ0**
• For two-tailed tests, with α = 0.05
• [Figure: normal curve with both tails shaded; the shaded regions are the rejection region, bounded by the critical values ±1.96]

**The least likely 5% is the lowest 2.5% and highest 2.5% (below −1.96 and above +1.96 standard deviations)**
• −1.96 and +1.96 are the critical values
• The region outside these values is the rejection region

**For left-tailed tests**
• The least likely 5% is the lowest 5% (below −1.645 standard deviations) … −1.645 is the critical value
• The region less than this is the rejection region

**For right-tailed tests**
• The least likely 5% is the highest 5% (above 1.645 standard deviations) … +1.645 is the critical value
• The region greater than this is the rejection region

**The difference is**
• We standardize: z = (x̄ − μ0) / (σ / √n), the difference between the sample mean and μ0 divided by the standard error
• This is called the test statistic
• If the test statistic is in the rejection region – we reject

**An example of a two-tailed test**
• A bolt manufacturer claims that the diameter of the bolts averages 10.0 mm
  • H0: Diameter = 10.0
  • H1: Diameter ≠ 10.0
• We take a sample of size 40
• (Somehow) We know that the standard deviation of the population is 0.3 mm
• The sample mean is 10.12 mm
• We’ll use a level of significance α = 0.05

**Do we reject the null hypothesis?**
• 10.12 is 0.12 higher than 10.0
• The standard error is (0.3 / √40) = 0.047
• The test statistic is 0.12 / 0.047 = 2.53
• The critical normal value, for α/2 = 0.025, is 1.96
• 2.53 is more than 1.96
• Our conclusion
  • We reject the null hypothesis
  • We have sufficient evidence that the population mean diameter is not 10.0

**An example of a left-tailed test**
• A car manufacturer claims that the mpg of a certain model car is at least 29.0
  • H0: MPG = 29.0
  • H1: MPG < 29.0
• We take a sample of size 40
• (Somehow) We know that the standard deviation of the population is 0.5
• The sample mean mpg is 28.89
• We’ll use a level of significance α = 0.05

**Do we reject the null hypothesis?**
• 28.89 is 0.11 lower than 29.0
• The standard error is (0.5 / √40) = 0.079
• The test statistic is −0.11 / 0.079 = −1.39
• −1.39 is greater than −1.645, the left-tailed critical value for α = 0.05
• Our conclusion
  • We do not reject the null hypothesis
  • We have insufficient evidence that the population mean mpg is less than 29.0

**An example of a right-tailed test**
• A bolt manufacturer claims that the defective rate of their product is at most 1.70 per 1,000
  • H0: Defect Rate = 1.70
  • H1: Defect Rate > 1.70
• We take a sample of size 40
• (Somehow) We know that the standard deviation of the population is 0.06
• The sample defect rate is 1.78
• We’ll use a level of significance α = 0.05

**Do we reject the null hypothesis?**
• 1.78 is 0.08 higher than 1.70
• The standard error is (0.06 / √40) = 0.009
• The test statistic is 8.43 (computed with the unrounded standard error, 0.08 / 0.00949)
• 8.43 is more than 1.645, the right-tailed critical value for α = 0.05
• Our conclusion
  • We reject the null hypothesis
  • We have sufficient evidence that the population mean defect rate is more than 1.70

**Two-tailed test**
• The critical values are $z_{\alpha/2}$ and $-z_{\alpha/2}$
• The rejection region is {less than $-z_{\alpha/2}$} and {greater than $z_{\alpha/2}$}
• Left-tailed test
  • The critical value is $-z_{\alpha}$
  • The rejection region is {less than $-z_{\alpha}$}
• Right-tailed test
  • The critical value is $z_{\alpha}$
  • The rejection region is {greater than $z_{\alpha}$}

**The difference is**
• We standardize
• This is called the test statistic
• If the test statistic is in the rejection region – we reject
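
The procedure in these slides (standardize the sample mean, then compare the test statistic with the critical value for the chosen tail) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original presentation: the function name `z_test` and its parameters are hypothetical, and it assumes, as the worked examples do, that the population standard deviation is known and that the sample mean is normally distributed.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test with known population standard deviation.

    Returns (test_statistic, critical_value, reject_h0).
    `tail` is "two", "left", or "right" (hypothetical parameter names).
    """
    if tail not in ("two", "left", "right"):
        raise ValueError("tail must be 'two', 'left', or 'right'")

    # Standardize: distance from the hypothesized mean in standard errors.
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error

    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # Reject in both tails: below -z_{alpha/2} or above +z_{alpha/2}.
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "left":
        # Reject only in the lower tail: below -z_alpha.
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        # Reject only in the upper tail: above +z_alpha.
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    return z, critical, reject


# The three worked examples from the slides.
print(z_test(10.12, 10.0, 0.3, 40, tail="two"))    # bolt diameter
print(z_test(28.89, 29.0, 0.5, 40, tail="left"))   # car mpg
print(z_test(1.78, 1.70, 0.06, 40, tail="right"))  # defect rate
```

Rounded to two decimals, the three test statistics come out as 2.53, −1.39 and 8.43, matching the slides, with the same reject / do not reject / reject conclusions. The critical values are computed from the standard normal distribution rather than hard-coded, so the same sketch works for other levels of significance.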