1. Ch 19: Confidence Intervals for Proportions
INFERENTIAL STATISTICS
-point vs. interval estimates
-Standard Error
-finding confidence intervals
-Confidence levels
-Examples
-Some relevant points
2. Summary
Recall that the population proportion is the fraction (or percent) of the population with a certain characteristic, denoted p
Sample proportions are denoted p-hat
Our goal is to use a p-hat gathered using an appropriate sampling method to estimate p
3. Point Estimate vs. Interval Estimate
We can predict the population proportion using a point estimate (a single value) or an interval estimate (a range of values we think contains p)
For a point estimate, just use p-hat
In other words, the single number that best predicts p is p-hat
However, it is unlikely that our population proportion is exactly equal to p-hat, so we usually use an interval estimate
To figure out our interval estimate, we will find what is called the standard error
4. Standard Error?!
Recall that in the last chapter, we studied the possible values of p-hat and found that, for all samples of size n:
The mean of the p-hats would equal p
The standard deviation of the p-hats was equal to sqrt(pq/n), where q = 1 - p
For large n, the p-hats would be approximately normally distributed, so we could assume that about 95% of p-hats were within 2 s.d. of p, and virtually all (about 99.7%) were within 3 s.d. of p
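These facts can be checked with a quick simulation. The sketch below is an illustration, not part of the original slides: it assumes a hypothetical population with p = .52, draws 10,000 random samples of size n = 104 (independent draws), and compares the mean and standard deviation of the resulting p-hats with p and sqrt(pq/n). The function name `simulate_p_hats` is made up for this example.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from a population in which
    a fraction p has the characteristic; return the p-hat of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_hats = []
    for _ in range(num_samples):
        # Each draw is a "success" with probability p (independent draws).
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

p, n = 0.52, 104  # hypothetical population proportion and sample size
p_hats = simulate_p_hats(p, n, num_samples=10_000)
theory_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)

print(f"mean of p-hats: {statistics.mean(p_hats):.4f}  (p = {p})")
print(f"sd of p-hats:   {statistics.stdev(p_hats):.4f}  (sqrt(pq/n) = {theory_sd:.4f})")

# Fraction of sample p-hats that landed within 2 s.d. of p
within_2 = sum(abs(ph - p) <= 2 * theory_sd for ph in p_hats) / len(p_hats)
print(f"fraction within 2 s.d. of p: {within_2:.3f}")
```

With these settings the mean should land very close to .52, the standard deviation close to .049, and roughly 95% of the p-hats within 2 s.d. of p; the exact digits depend on the random seed.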
5. OK, but what about Standard Error?
We want to reverse this process and assert, for example, that for a given p-hat, the population proportion p will be within 2 s.d. of p-hat. But the problem is that we don't know p (it's what we're looking for!) and therefore can't compute the standard deviation
Instead of the standard deviation, we use what is called the standard error, which is equal to sqrt((p-hat * q-hat) / n), where q-hat = 1 - p-hat
In other words, it’s the same formula we used for std dev in the last chapter, except we use p-hat in place of p
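As a small sketch of the formula (the function name `standard_error` is our own, not from the slides), the standard error is just the old standard deviation formula with p-hat and q-hat plugged in:

```python
import math

def standard_error(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p-hat * q-hat / n)."""
    if not 0 <= p_hat <= 1:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    q_hat = 1 - p_hat  # proportion WITHOUT the characteristic
    return math.sqrt(p_hat * q_hat / n)
```

For the sea fan data on the next slide, `standard_error(54 / 104, 104)` comes out to about .049.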
6. Example
We want to estimate the proportion of sea fans infected with a disease. We collect some sample data and find that 54 out of 104 sampled sea fans are infected
So p-hat = 54/104 = .519
Our S.E. (Standard Error) =
sqrt (.519 * .481 / 104) = .049
Therefore, assuming a normal model for the distribution of p-hat is appropriate, we can say that:
About 68% of sample p-hats will be within 1 SE (4.9 percentage points) of p
About 95% of sample p-hats will be within 2 SE (9.8 percentage points) of p
And, more importantly, it follows that:
We are 68% confident that p lies within 4.9 percentage points of OUR p-hat of 51.9%
We are 95% confident that p lies within 9.8 percentage points of OUR p-hat
So we are 95% confident that p is between 42.1% and 61.7%
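Here is a sketch of the same arithmetic, using the slide's rule of thumb of 1 SE for about 68% and 2 SE for about 95% (the variable names are our own):

```python
import math

infected, n = 54, 104                     # sea fan sample from the example
p_hat = infected / n                      # about .519
se = math.sqrt(p_hat * (1 - p_hat) / n)   # about .049

# Rule of thumb from the slide: 1 SE for about 68%, 2 SE for about 95%.
for num_se, level in [(1, "68%"), (2, "95%")]:
    lower = p_hat - num_se * se
    upper = p_hat + num_se * se
    print(f"{level}: {lower:.3f} < p < {upper:.3f}")
```

This prints `68%: 0.470 < p < 0.568` and `95%: 0.421 < p < 0.617`, matching the 42.1% to 61.7% interval above.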
7. Additional implications of the interval from 42.1% to 61.7%
This seems like an awfully broad interval; to get a more precise fix on p we would need a larger sample
About 5% of all samples are unrepresentative enough that p is NOT within 2 S.E. of their p-hat, and we can't tell whether OUR sample is one of them. (p itself is fixed; the 95% describes how often this method captures p, not a chance that p moves around)
Using more formal notation, we write .421 < p < .617 and we refer to this as a 95% CONFIDENCE INTERVAL
We call 95% our CONFIDENCE LEVEL
8. Finding intervals for other confidence levels
We generally use z-scores to establish confidence intervals
We call Z* (z-star) the critical z-score for a given confidence level
The most common levels of confidence are 90%, 95%, and 99%
For 90%, Z* = 1.645. Why? Because in a standard normal distribution, 90% of the data lies between z = -1.645 and z = 1.645
For 95%, Z* is approximately 2, since about 95% of a normal distribution lies within 2 standard deviations of the mean. More precisely, Z* = 1.96
For 99%, Z* = 2.575, because 99% of the data in a normal distribution lies between z = -2.575 and z = 2.575
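Instead of a z-table, Z* can be computed from the inverse of the standard normal CDF. A minimal sketch using Python's standard-library `statistics.NormalDist` (the helper name `z_star` is our own): for confidence level c, half of the leftover 1 - c sits in each tail, so Z* is the point with (1 + c) / 2 of the area to its left.

```python
from statistics import NormalDist

def z_star(confidence: float) -> float:
    """Critical value Z* such that `confidence` of a standard normal
    distribution lies between -Z* and +Z*."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    tail = (1 - confidence) / 2             # area left over in EACH tail
    return NormalDist().inv_cdf(1 - tail)   # mean 0, s.d. 1 by default

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: Z* = {z_star(c):.3f}")
```

This prints 1.645, 1.960, and 2.576. The 99% value rounds to 2.576; the 2.575 above is the common table value, and the difference is negligible in practice.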
9. General formula
To find a confidence interval for p, choose a confidence level and then compute:
p-hat ± (Z* times SE)
We call the product (Z* times SE) the MARGIN OF ERROR
Often the margin of error is included when parameter estimates are provided in polls, etc.
For example, when a poll mentions a margin of error of 3%, this implies that when the interval was calculated, the product (Z* times SE) = .03
We’ll look at some examples on the next few slides
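The general formula can be sketched as one small function (the name `confidence_interval` and its return layout are our own choices, not from the slides):

```python
import math

def confidence_interval(p_hat: float, n: int, z_star: float):
    """Return (lower, upper, margin_of_error) for p-hat ± Z* * SE."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    margin = z_star * se                      # the MARGIN OF ERROR
    return p_hat - margin, p_hat + margin, margin
```

For example, `confidence_interval(54 / 104, 104, 1.96)` gives the sea fan interval with the precise Z* = 1.96 instead of 2, which narrows it slightly to about .423 to .615.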
10. Example
A survey of 1000 voters finds that 56% plan to vote for Matt Shanahan. Find a 95% confidence interval for p
p-hat = .56, q-hat = .44, n = 1000
Z* = 1.96
SE = sqrt (.56 * .44/1000) = .016
So (p-hat ± (Z* times SE)) becomes
.56 ± (1.96 * .016) = .56 ± .03136
So we are 95% confident that p lies between .529 and .591… MATT WINS!*
* caveat: about 5% of samples produce an interval that misses p, and we can't tell whether our sample is one of them
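The slide rounds the SE to .016 before multiplying. This sketch checks that the rounding doesn't change the interval at three decimal places:

```python
import math

p_hat, n, z_star = 0.56, 1000, 1.96

se_exact = math.sqrt(p_hat * (1 - p_hat) / n)  # about 0.0157
se_rounded = round(se_exact, 3)                # the slide's .016

intervals = {}
for label, se in [("exact SE", se_exact), ("rounded SE", se_rounded)]:
    margin = z_star * se
    intervals[label] = (p_hat - margin, p_hat + margin)
    print(f"{label}: {p_hat - margin:.3f} < p < {p_hat + margin:.3f}")
```

Both lines print `0.529 < p < 0.591`, and the whole interval sits above .50.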
11. What about c = .90 or c = .99? (c = confidence level)
Again, SE = .016 and p-hat = .56
So for 90%, the interval is .56 ± 1.645 * .016
We are 90% confident that p is between 53.4% and 58.6%
For 99%, the interval is .56 ± 2.575 * .016
We are 99% confident that p is between .519 and .601
Notice anything? What happens as c goes up?
As the confidence level increases, the interval widens… a TRADE-OFF
We gain more confidence that our interval contains p, but sometimes the interval is so wide that it isn't very useful!
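A short sketch makes the trade-off visible by computing all three intervals from the slide's values (SE = .016 and the table Z* values) and comparing their widths:

```python
p_hat, se = 0.56, 0.016   # values from the voter example

# Z* values as used on the slides (2.575 is the table value for 99%).
critical_values = {0.90: 1.645, 0.95: 1.96, 0.99: 2.575}

widths = []
for c, z in critical_values.items():
    margin = z * se
    widths.append(2 * margin)
    print(f"{c:.0%}: {p_hat - margin:.3f} < p < {p_hat + margin:.3f}"
          f"  (width {2 * margin:.3f})")
```

The widths grow from about .053 (90%) to .063 (95%) to .082 (99%): every gain in confidence is paid for with a wider interval.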
12. CONDITIONS
Plausible independence – there is no reason to suspect that the data values somehow affect each other
Randomization condition – the data must be sampled at random
10% condition – the sample must be less than 10% of the population if drawn without replacement
The normal model we use for inference about sample proportions is justified by the Central Limit Theorem, so the sample must be large enough. A population of yes/no values is never itself normal, so for proportions "large enough" is checked with the success/failure condition below rather than with a fixed n such as 25 or 30
Finally (PHEW), the SUCCESS/FAILURE condition: we must have at least 10 successes and at least 10 failures (that is, np-hat ≥ 10 and nq-hat ≥ 10; since we don't know p, we check the observed counts)
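The two conditions that can be checked from counts alone are easy to automate. This is a hypothetical helper (`check_conditions` is our own name); randomization and independence depend on how the data were collected, so it can't check those.

```python
from typing import Optional

def check_conditions(successes: int, n: int,
                     population_size: Optional[int] = None) -> dict:
    """Check the success/failure condition and, if a population size is
    given, the 10% condition for a one-proportion confidence interval.
    Randomization and independence can't be checked from counts alone."""
    failures = n - successes
    results = {
        # At least 10 observed successes AND at least 10 observed failures
        # (n * p-hat >= 10 and n * q-hat >= 10).
        "success/failure": successes >= 10 and failures >= 10,
    }
    if population_size is not None:
        # Sample must be less than 10% of the population.
        results["10% condition"] = n < 0.10 * population_size
    return results
```

For the sea fans, `check_conditions(54, 104)` passes (54 infected, 50 not). A sample with only 5 infected fans out of 104 would fail, and a made-up population of 5000 voters would be too small for a sample of 1000.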
13. HOMEWORK
Pg. 378, 1 – 7 odds, 11, 15