Chapter 11

Chapter 11 Asking and Answering Questions About The Difference Between Two Population Proportions Created by Kathy Fritz

Let’s review notation: Subscripts are added to distinguish between the two populations. You can either use 1 and 2, or you can use descriptive subscripts like W for women and M for men. In this chapter, only the case where the two samples are independent samples is considered.

Estimating the Difference Between Two Population Proportions
• Properties of the Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
• Large-Sample Confidence Interval
• Interpreting Confidence Intervals

Some people seem to think that duct tape can fix anything . . . even remove warts! Investigators at Madigan Army Medical Center tested using duct tape to remove warts versus the more traditional freezing treatment. Suppose that the duct tape treatment will successfully remove 50% of warts and that the traditional freezing treatment will successfully remove 60% of warts. Let’s investigate the sampling distribution of $\hat{p}_{\text{freeze}} - \hat{p}_{\text{tape}}$.

pfreeze = the true proportion of warts that are successfully removed by freezing
ptape = the true proportion of warts that are successfully removed by using duct tape

Suppose we repeatedly treated 100 warts using the traditional freezing treatment and calculated the proportion of warts that are successfully removed. We would have the sampling distribution of $\hat{p}_{\text{freeze}}$, which is centered at pfreeze = 0.6.

Suppose we repeatedly treated 100 warts using the duct tape method and calculated the proportion of warts that are successfully removed. We would have the sampling distribution of $\hat{p}_{\text{tape}}$, which is centered at ptape = 0.5.

Randomly take one of the sample proportions for the freezing treatment and one of the sample proportions for the duct tape treatment and find the difference. Doing this repeatedly, we will create the sampling distribution of $\hat{p}_{\text{freeze}} - \hat{p}_{\text{tape}}$, which is centered at 0.6 – 0.5 = 0.1.
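The repeated-sampling idea in the wart example can be imitated with a short simulation. The sketch below is only an illustration: the function name, the number of repetitions, and the random seed are assumptions made here, while the success rates (0.6 and 0.5) and the sample size of 100 come from the example.

```python
import random
import statistics


def simulate_differences(p_freeze=0.6, p_tape=0.5, n=100, reps=5000, seed=1):
    """Simulate many values of (sample proportion for freezing) minus
    (sample proportion for duct tape), each based on n treated warts."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        # Each wart is removed with the assumed true success probability.
        freeze_hat = sum(rng.random() < p_freeze for _ in range(n)) / n
        tape_hat = sum(rng.random() < p_tape for _ in range(n)) / n
        diffs.append(freeze_hat - tape_hat)
    return diffs


diffs = simulate_differences()
# The simulated differences should cluster near 0.6 - 0.5 = 0.1.
print(round(statistics.mean(diffs), 3))
print(round(statistics.stdev(diffs), 3))
```

A histogram of `diffs` would give a rough picture of the sampling distribution described above.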

General Properties of the Sampling Distribution of $\hat{p}_1 - \hat{p}_2$

If $\hat{p}_1 - \hat{p}_2$ is the difference in sample proportions for independently selected random samples, then the following rules hold:

1. $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$
This rule says that the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is centered at the actual value of the difference in population proportions. This means that the sample differences tend to cluster around the value of the actual population difference.

2. $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
This rule specifies the standard error of $\hat{p}_1 - \hat{p}_2$. The value of the standard error describes how much the $\hat{p}_1 - \hat{p}_2$ values tend to vary from the actual population difference.

3. If both n1 and n2 are large, then the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal. The sample sizes can be considered large if n1p1 ≥ 10, n1(1 – p1) ≥ 10, n2p2 ≥ 10, and n2(1 – p2) ≥ 10.
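As a rough illustration of these three properties, the sketch below computes the center and standard error of the sampling distribution of the difference in sample proportions and checks the large-sample condition. The function name is hypothetical, and the inputs reuse the wart example (population proportions 0.6 and 0.5, samples of 100 each).

```python
import math


def sampling_distribution_summary(p1, p2, n1, n2):
    """Return (mean, standard error, large-sample flag) for the
    difference in sample proportions from independent samples."""
    mean = p1 - p2
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # All four expected counts must be at least 10.
    large = all(
        count >= 10
        for count in (n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2))
    )
    return mean, std_error, large


mean, std_error, large = sampling_distribution_summary(0.6, 0.5, 100, 100)
print(round(mean, 3), round(std_error, 3), large)
```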

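A Large-Sample Confidence Interval for a Difference in Population Proportions

Appropriate when the following conditions are met:
• The samples are independent random samples from the populations of interest (or the samples are selected in a way that makes it reasonable to regard each sample as representative of the corresponding population).
• The sample sizes are large. This condition is met when $n_1\hat{p}_1 \ge 10$, $n_1(1 - \hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1 - \hat{p}_2) \ge 10$.

When these conditions are met, a confidence interval for the difference in population proportions is

$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value}) \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$

The desired confidence level determines which z critical value is used.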
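A large-sample interval of this kind uses a z critical value that depends on the chosen confidence level. As a small sketch (the function name is an assumption; `statistics.NormalDist` is part of the Python standard library), the critical value can be found from the standard normal distribution rather than read from a table:

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """z value that captures the central `confidence_level` area
    under the standard normal curve."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

For 90% confidence this gives approximately 1.645, and for 95% approximately 1.96.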

Confidence Intervals Continued . . .

Interpretation of Confidence Interval: You can be confident that the actual value of the difference in population proportions is included in the computed interval. In a given problem, this statement should be in context.

Interpretation of Confidence Level: The confidence level specifies the long-run percentage of the time that this method will be successful in capturing the actual difference in population proportions.

As part of a study, people in a sample of 258 cell phone users ages 20 to 39 were asked if they use their cell phones to stay connected while they are in bed. The same question was also asked of each person in a sample of 129 cell phone users ages 40 to 49. The study found that 168 of the 258 people in the sample of 20- to 39-year-olds and 61 of the 129 people in the sample of 40- to 49-year-olds said that they sleep with their phones. How much greater is the proportion who use a cell phone to stay connected in bed for cell phone users ages 20 to 39 than for those ages 40 to 49?

Step 1 (Estimate): You want to estimate p1 – p2, where p1 is the proportion of cell phone users ages 20 to 39 who sleep with their phones and p2 is the proportion of cell phone users ages 40 to 49 who sleep with their phones.

Cell Phones in Beds Continued . . .

Step 2 (Method): Because the answers to the four key questions are 1) estimation, 2) sample data, 3) one categorical variable, and 4) two samples, a large-sample confidence interval for a difference in population proportions will be considered. A confidence level of 90% will be used.

Step 3 (Check):
• The sample sizes are large enough because $n_1\hat{p}_1 = 168$, $n_1(1 - \hat{p}_1) = 90$, $n_2\hat{p}_2 = 61$, and $n_2(1 - \hat{p}_2) = 68$ are all at least 10.
• No information was provided regarding how the samples were selected. We must assume that the samples were selected in a reasonable way.

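Cell Phones in Beds Continued . . .

Step 4 (Calculation): With $\hat{p}_1 = 168/258 \approx 0.65$, $\hat{p}_2 = 61/129 \approx 0.47$, and a z critical value of 1.645 for 90% confidence,

$(0.65 - 0.47) \pm 1.645\sqrt{\frac{(0.65)(0.35)}{258} + \frac{(0.47)(0.53)}{129}} = 0.18 \pm 1.645(0.053) = 0.18 \pm 0.087 \approx (0.09, 0.27)$

Step 5 (Communicate Results): Assuming that the samples were selected in a reasonable way, you can be 90% confident that the actual difference in the proportion of cell phone users who sleep with their cell phone for 20- to 39-year-olds and for 40- to 49-year-olds is between 0.09 and 0.27. The method used to construct this interval estimate is successful in capturing the actual value of the difference in population proportions about 90% of the time.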
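The check and calculation steps for the cell phone example can be sketched in a few lines. This is a minimal illustration rather than a definitive implementation: the function name is hypothetical, and the only inputs are the counts reported in the study (168 of 258 and 61 of 129) and the 90% confidence level.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence_level=0.90):
    """Large-sample confidence interval for p1 - p2 from success counts."""
    # Large-sample check: successes and failures in each sample >= 10.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("sample sizes are not large enough")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    std_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    diff = p1_hat - p2_hat
    return diff - z * std_error, diff + z * std_error


low, high = two_proportion_ci(168, 258, 61, 129)
print(round(low, 2), round(high, 2))  # 0.09 0.27
```

Working from the unrounded sample proportions gives the same interval, to two decimal places, as the hand calculation.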

Interpreting Confidence Intervals for a Difference

• Both endpoints of the confidence interval for p1 – p2 are positive: If p1 – p2 is positive, it means you think p1 is greater than p2 and the interval gives an estimate of how much greater. Example: (0.24, 0.36). You think that p1 is greater than p2 by somewhere between 0.24 and 0.36.

• Both endpoints of the confidence interval for p1 – p2 are negative: If p1 – p2 is negative, it means you think p1 is less than p2 and the interval gives an estimate of how much less. Example: (-0.14, -0.06). You think that p1 is less than p2 by somewhere between 0.06 and 0.14.

• 0 is included in the confidence interval: If the confidence interval includes 0, a plausible value for p1 – p2 is zero. Example: (-0.14, 0.09). Because 0 is included in the confidence interval, it is possible that the two population proportions could be equal.
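The three cases for interpreting an interval for p1 – p2 can be written as a simple decision rule. The sketch below is illustrative only; the function name and the wording of the returned messages are assumptions, and the intervals passed in are the examples used for each case.

```python
def interpret_difference_interval(low, high):
    """Describe what a confidence interval (low, high) for p1 - p2 suggests."""
    if low > high:
        raise ValueError("low must not exceed high")
    if low > 0:
        # Both endpoints positive: p1 appears greater than p2.
        return f"p1 is greater than p2 by between {low} and {high}"
    if high < 0:
        # Both endpoints negative: p1 appears less than p2.
        return f"p1 is less than p2 by between {abs(high)} and {abs(low)}"
    # The interval includes 0.
    return "0 is plausible, so the two proportions could be equal"


for interval in [(0.24, 0.36), (-0.14, -0.06), (-0.14, 0.09)]:
    print(interval, "->", interpret_difference_interval(*interval))
```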

Testing Hypotheses About the Difference Between Two Population Proportions

A Large-Sample Test for a Difference in Two Population Proportions

Appropriate when the following conditions are met:
• The samples are independent random samples from the populations of interest (or the samples are selected in a way that makes it reasonable to regard each sample as representative of the corresponding population).
• The sample sizes are large. This condition is met when $n_1\hat{p}_1 \ge 10$, $n_1(1 - \hat{p}_1) \ge 10$, $n_2\hat{p}_2 \ge 10$, and $n_2(1 - \hat{p}_2) \ge 10$.

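A Large-Sample Test for a Difference in Two Population Proportions Continued . . .

When these conditions are met, the following test statistic can be used to test the null hypothesis H0: p1 – p2 = 0:

$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \frac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}}$

where $\hat{p}_c = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ is the combined estimate of the common proportion if H0 is true.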

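A Large-Sample Test for a Difference in Two Population Proportions Continued . . .

Null Hypothesis: H0: p1 – p2 = 0

The P-value depends on the form of the alternative hypothesis:
• Ha: p1 – p2 > 0: Area under the z curve to the right of the calculated value of the test statistic
• Ha: p1 – p2 < 0: Area under the z curve to the left of the calculated value of the test statistic
• Ha: p1 – p2 ≠ 0: 2·(area to right of z) if z is positive or 2·(area to left of z) if z is negative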
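The P-value rules for the different alternative hypotheses translate directly into areas under the standard normal curve. The following is a sketch only; the function name and the labels "greater", "less", and "two-sided" are assumptions chosen for this illustration.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """P-value for a calculated z statistic.

    alternative: "greater" for Ha: p1 - p2 > 0,
                 "less" for Ha: p1 - p2 < 0,
                 "two-sided" for Ha: p1 - p2 != 0.
    """
    cdf = NormalDist().cdf  # area to the left under the z curve
    if alternative == "greater":
        return 1 - cdf(z)
    if alternative == "less":
        return cdf(z)
    if alternative == "two-sided":
        # Twice the tail area beyond |z| covers both signs of z.
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


print(round(p_value(1.50, "two-sided"), 4))
```

For z = 1.50 the two-sided value agrees with the table-based figure 2(0.0668) = 0.1336.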

Another Way to Write Hypothesis Statements:
• H0: p1 – p2 = 0 can be written as H0: p1 = p2
• Ha: p1 – p2 > 0 can be written as Ha: p1 > p2
• Ha: p1 – p2 < 0 can be written as Ha: p1 < p2
• Ha: p1 – p2 ≠ 0 can be written as Ha: p1 ≠ p2

A survey was conducted by Gallup to investigate public opinion on issues related to rising gas prices. Each person in a representative sample of low-income adult Americans (annual income less than $30,000) and each person in an independently selected representative sample of high-income adult Americans (annual income greater than $75,000) was asked whether he or she would consider buying an electric car if gas prices continued to rise. In the low-income sample, 65% said that they would not buy an electric car no matter how high gas prices were to rise. In the high-income sample, 59% responded this way. Suppose sample sizes were both 300. Is there convincing evidence that the proportion who would never consider buying an electric car is different for low-income adult Americans than for high-income adult Americans?

Gas Prices Continued . . .

Step 1 (Hypotheses): H0: p1 – p2 = 0 versus Ha: p1 – p2 ≠ 0, where
p1 = proportion of low-income adult Americans who would never consider buying an electric car
p2 = proportion of high-income adult Americans who would never consider buying an electric car

Step 2 (Method): Because the answers to the four key questions are 1) hypothesis testing, 2) sample data, 3) one categorical variable, and 4) two samples, consider a large-sample hypothesis test for a difference in population proportions. In this situation, because neither type of error is much more serious than the other, you might choose a value of 0.05 for α.

Gas Prices Continued . . . H0: p1 – p2 = 0 versus Ha: p1 – p2 ≠ 0

Step 3 (Check):
• The sample sizes are large enough because $n_1\hat{p}_1 = 300(0.65) = 195$, $n_1(1 - \hat{p}_1) = 300(0.35) = 105$, $n_2\hat{p}_2 = 300(0.59) = 177$, and $n_2(1 - \hat{p}_2) = 300(0.41) = 123$ are all at least 10.
• From the study description, you know that the samples were independently selected. You also know that Gallup believed the samples were selected in a way that would result in representative samples of adult Americans in the two income groups.

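Gas Prices Continued . . . H0: p1 – p2 = 0 versus Ha: p1 – p2 ≠ 0

Step 4 (Calculations):

Combined estimate: $\hat{p}_c = \frac{300(0.65) + 300(0.59)}{300 + 300} = 0.62$

Test statistic: $z = \frac{0.65 - 0.59}{\sqrt{\frac{(0.62)(0.38)}{300} + \frac{(0.62)(0.38)}{300}}} = \frac{0.06}{0.040} = 1.50$

P-value = 2 · P(z > 1.50) = 0.1336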
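The calculation step for the gas prices example can be reproduced as a sketch. The function name is hypothetical; the inputs are the sample proportions (0.65 and 0.59) and sample sizes (300 each) from the example. Because the code does not round the standard error to 0.040, it gives z of roughly 1.51 and a P-value of roughly 0.13 rather than exactly 1.50 and 0.1336; the decision at the 0.05 level is the same.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1_hat, n1, p2_hat, n2):
    """Return (z statistic, combined proportion) for testing H0: p1 - p2 = 0."""
    # Combined estimate of the common proportion, assuming H0 is true.
    p_c = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
    std_error = math.sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return (p1_hat - p2_hat) / std_error, p_c


z, p_c = two_proportion_z(0.65, 300, 0.59, 300)
# Two-sided alternative: double the tail area beyond |z|.
p_val = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(p_c, 2), round(z, 2), round(p_val, 2))
print("reject H0" if p_val < 0.05 else "fail to reject H0")
```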

Gas Prices Continued . . . H0: p1 – p2 = 0 versus Ha: p1 – p2 ≠ 0 Step 5 (Communicate Results): Decision: 0.1336 > 0.05, Fail to reject H0 Conclusion: Based on the sample data, you are not convinced that there is a difference in the proportions who would never consider buying an electric car for low-income and high-income adult Americans.

Avoid These Common Mistakes

• Remember that the results of a hypothesis test can never show strong support for the null hypothesis. In two-sample situations, this means that you shouldn’t be convinced that there is no difference between two population proportions based on the outcome of a hypothesis test.

• If you have complete information (a census) for both populations, there is no need to carry out a hypothesis test or to construct a confidence interval; in fact, it would be inappropriate to do so.

• Don’t confuse statistical significance with practical significance. In the two-sample setting, it is possible to be convinced that two population proportions are not equal even in situations where the actual difference between them is small enough that it is of no practical use. After rejecting a null hypothesis of no difference, it is useful to look at a confidence interval estimate of the difference to get a sense of practical significance.

• Correctly interpreting confidence intervals in the two-sample case is more difficult than in the one-sample case, so take particular care when providing two-sample confidence interval interpretations. Because the two-sample confidence interval estimates a difference (p1 – p2), the most important thing to note is whether or not the interval includes 0.
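The distinction between statistical and practical significance can be seen with very large samples. In the sketch below the data are entirely hypothetical (51,000 successes out of 100,000 versus 50,000 out of 100,000), and the function name is an assumption. The test rejects the null hypothesis of no difference, yet the 95% confidence interval shows that the difference is only about one percentage point.

```python
import math
from statistics import NormalDist


def z_test_and_interval(x1, n1, x2, n2, confidence_level=0.95):
    """Two-sided large-sample test of H0: p1 - p2 = 0, plus a
    confidence interval for p1 - p2. Returns (p_value, low, high)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    # The test uses the combined estimate of the common proportion.
    p_c = (x1 + x2) / (n1 + n2)
    se_test = math.sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    p_val = 2 * (1 - NormalDist().cdf(abs(diff / se_test)))
    # The interval uses the separate sample proportions.
    se_ci = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return p_val, diff - z_crit * se_ci, diff + z_crit * se_ci


# Hypothetical data chosen only to illustrate the point.
p_val, low, high = z_test_and_interval(51_000, 100_000, 50_000, 100_000)
print(p_val < 0.05)                    # statistically significant
print(round(low, 4), round(high, 4))   # but the difference is tiny
```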
