Ch 19: Confidence Intervals for Proportions INFERENTIAL STATISTICS

penn

Presentation Transcript


    1. Ch 19: Confidence Intervals for Proportions. INFERENTIAL STATISTICS: point vs. interval; Standard Error; finding confidence intervals; confidence levels; examples; some relevant points.

    2. Summary: Recall that the population proportion is the percent of the population with a certain characteristic, denoted p. Sample proportions are denoted p-hat. Our goal is to use a p-hat, gathered using an appropriate sampling method, to estimate p.

    3. Point Estimate vs. Interval Estimate: We can predict the population proportion using a point estimate (a single value) or an interval estimate (a range of values we think contains p). For a point estimate, just use p-hat. In other words, the single number that best predicts p is p-hat. However, it is unlikely that our population proportion is exactly equal to p-hat, so usually we use an interval estimate. To figure out our interval estimate, we will find what is called the standard error.

    4. Standard Error??!???!?!? Recall that in the last chapter we looked at the possible values of p-hat and determined that, across all samples of size n: the mean of the p-hats would equal p; the standard deviation of the p-hats would equal sqrt(pq/n), where q = 1 - p; and for large n the p-hats would be approximately normally distributed, so that we could assume that most p-hats (about 95%) were within 2 s.d. of p, and virtually all p-hats would be within 3 s.d. of p.
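
The three facts recalled on this slide can be checked with a small simulation. This is only a sketch: the population proportion `p = 0.5`, the sample size `n = 100`, the number of samples, and the function name `simulate_sample_proportions` are all illustrative assumptions, not values from the slides.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

p, n = 0.5, 100  # hypothetical population proportion and sample size
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

theoretical_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n)
mean_p_hat = statistics.mean(p_hats)
sd_p_hat = statistics.pstdev(p_hats)
within_2sd = sum(abs(ph - p) <= 2 * theoretical_sd for ph in p_hats) / len(p_hats)

print("mean of p-hats:     ", round(mean_p_hat, 3))
print("s.d. of p-hats:     ", round(sd_p_hat, 3))
print("sqrt(pq/n):         ", round(theoretical_sd, 3))
print("share within 2 s.d.:", round(within_2sd, 3))
```

The mean of the simulated p-hats should land close to p, their spread close to sqrt(pq/n), and the large majority of them within 2 s.d. of p.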

    5. OK, but what about Standard Error? We want to reverse this process and assert, for example, that for a given p-hat, the population proportion p will be within 2 s.d. of p-hat. But the problem is we don’t know p (it’s what we’re looking for!) and therefore can’t find the standard deviation. Instead of the standard deviation, we use what is called the standard error, which is equal to sqrt((p-hat * q-hat) / n), where q-hat = 1 - p-hat. In other words, it’s the same formula we used for std dev in the last chapter, except we use p-hat in place of p.

    6. Example: We want to estimate the proportion of sea fans infected with a disease. We collect some sample data and find that 54 out of 104 sampled sea fans are infected. So p-hat = 54/104 = .519, and our S.E. (Standard Error) = sqrt(.519 * .481 / 104) = .049. Therefore, assuming a normal model for the distribution of p-hat is appropriate, we can say that 68% of sample p-hats will be within 4.9 percentage points of p, and 95% of sample p-hats will be within 9.8 percentage points of p. And more importantly… it follows that there is a 68% chance that p lies within 4.9 percentage points of OUR p-hat of 51.9%, and there is a 95% chance that p lies within 9.8 percentage points of OUR p-hat. So there is a 95% chance that p is between 42.1% and 61.7%.
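
The sea fan arithmetic can be reproduced in a few lines. This is a minimal sketch using only the counts given on the slide; the variable names are illustrative.

```python
import math

infected, n = 54, 104            # sample counts from the slide
p_hat = infected / n             # sample proportion
q_hat = 1 - p_hat
se = math.sqrt(p_hat * q_hat / n)  # standard error: sqrt(p-hat * q-hat / n)

# Intervals of 1 SE (about 68%) and 2 SE (about 95%) around p-hat
low_68, high_68 = p_hat - se, p_hat + se
low_95, high_95 = p_hat - 2 * se, p_hat + 2 * se

print(f"p-hat = {p_hat:.3f}, SE = {se:.3f}")
print(f"68% interval: {low_68:.3f} to {high_68:.3f}")
print(f"95% interval: {low_95:.3f} to {high_95:.3f}")
```

Rounded to three decimals, the 95% interval matches the slide's 42.1% to 61.7%.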

    7. Additional implications of “So there is a 95% chance that p is between 42.1% and 61.7%”: This seems like an awfully broad interval; to get a more precise fix on p we would need a larger sample. There is a 5% chance that our sample is very unrepresentative of the population and therefore p is NOT within 2 S.E. of OUR p-hat. Using more formal notation, we write “there is a 95% chance that .421 < p < .617” and we refer to this as a CONFIDENCE INTERVAL. We call 95% our CONFIDENCE LEVEL.
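
To see why a larger sample gives a more precise fix on p, the sketch below holds p-hat at the sea fan value and recomputes the 2 S.E. half-width for larger samples. The larger sample sizes and the function name `half_width_2se` are hypothetical, chosen only for illustration.

```python
import math

def half_width_2se(p_hat, n):
    """Half-width of the 2-standard-error interval around p-hat."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 54 / 104  # sea fan sample proportion from the previous slide

# Hypothetical sample sizes: each is four times the one before it
for n in (104, 416, 1664):
    print(f"n = {n:5d}: p-hat ± {half_width_2se(p_hat, n):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the width of the interval.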

    8. Finding intervals for other confidence levels: We generally use z-scores to establish confidence intervals. We call Z* (z-star) the critical z-score for a given confidence level. The most common levels of confidence are 90%, 95%, and 99%. For 90%, Z* = 1.645. Why? Because when we convert the regular scores to z-scores, 90% of the data lies between z = -1.645 and z = 1.645. For 95%, Z* is approximately 2, since 95% of scores lie within 2 SE of the mean; more precisely, Z* = 1.96. For 99%, Z* = 2.575, because 99% of the data in a normal distribution lies between z = -2.575 and z = 2.575.
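
The critical values above can be recovered from the standard normal model instead of a table. This sketch uses Python's standard-library `statistics.NormalDist`; the helper name `critical_z` is an illustrative assumption.

```python
from statistics import NormalDist

def critical_z(confidence_level):
    """Return Z* such that the central `confidence_level` share of a
    standard normal distribution lies between -Z* and +Z*."""
    tail = (1 - confidence_level) / 2        # area left in each tail
    return NormalDist().inv_cdf(1 - tail)    # z-score cutting off the upper tail

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: Z* = {critical_z(c):.3f}")
```

For 99% the computed value is about 2.576; printed tables often list it as 2.575, which is the value the slides use.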

    9. General formula: To find a confidence interval for p, choose a confidence level and then find the values of p-hat ± (Z* times SE). We call the product (Z* times SE) the MARGIN OF ERROR. Often the margin of error is included when parameter estimates are provided in polls, etc. For example, when a poll mentions a margin of error of 3%, this implies that when the interval was calculated, the product (Z* times SE) = .03. We’ll look at some examples on the next few slides.
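
The general formula can be wrapped in one small function. This is a sketch under stated assumptions: the function name `proportion_confidence_interval` is hypothetical, Z* is computed from the normal model and not read from a table, and the poll values (p-hat = .50, n = 1067) are made up to illustrate a margin of error of about 3%.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence_level=0.95):
    """Return (low, high, margin) for p-hat ± Z* times SE."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error
    margin = z_star * se                     # margin of error
    return p_hat - margin, p_hat + margin, margin

# Hypothetical poll: 50% support among 1067 respondents
low, high, margin = proportion_confidence_interval(0.50, 1067)
print(f"margin of error = {margin:.3f}")
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Returning the margin alongside the endpoints makes it easy to report a result in the "estimate plus or minus margin" form that polls use.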

    10. Example: A survey of 1000 voters finds that 56% plan to vote for Matt Shanahan. Find a 95% confidence interval for p. Here p-hat = .56, q-hat = .44, n = 1000, and Z* = 1.96. SE = sqrt(.56 * .44 / 1000) = .016. So (p-hat ± (Z* times SE)) becomes .56 ± (1.96 * .016) = .56 ± .03136. So there is a 95% probability that p lies between .529 and .591… and since the whole interval is above .50, SHANAHAN WINS!* (*Caveat: there is a 5% chance our sample is unrepresentative and p does not lie in the interval.)

    11. What about c = .9 or c = .99 (c = confidence level)? Again, SE = .016 and p-hat = .56. So for 90%, the interval is .56 ± 1.645 * .016: there is a 90% chance that p is between .534 and .586. For 99%, the interval is .56 ± 2.575 * .016: there is a 99% chance that p is between .519 and .601. Notice anything? What happens as c goes up? As the confidence level increases, the interval widens… a TRADE-OFF. We gain more confidence that our interval contains p, but sometimes the interval is so wide that it isn’t very useful!
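
The three voter-survey intervals can be computed side by side to show the trade-off. This sketch follows the slides' own arithmetic: it uses the table values of Z* quoted earlier and rounds SE to .016 before multiplying, so the endpoints agree with the slides; carrying the unrounded SE would shift some third decimals slightly.

```python
import math

p_hat, n = 0.56, 1000
# Rounded to three decimals (.016) to mirror the slides' arithmetic
se = round(math.sqrt(p_hat * (1 - p_hat) / n), 3)

# Critical z-scores as quoted on the slides
z_stars = {0.90: 1.645, 0.95: 1.96, 0.99: 2.575}

intervals = {}
for level, z_star in z_stars.items():
    margin = z_star * se
    intervals[level] = (round(p_hat - margin, 3), round(p_hat + margin, 3))
    low, high = intervals[level]
    print(f"{level:.0%}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

The printed widths grow as the confidence level rises, which is the trade-off the slide describes.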

    12. CONDITIONS: Plausible independence: there is no reason to suspect that the data values somehow affect each other. Randomization condition: the data must be sampled at random. 10% condition: samples must be less than 10% of the population, if drawn without replacement. The model we use for inference (despite our data being sample proportions) is based on the Central Limit Theorem; therefore EITHER the population must be known to be normal OR n must be sufficiently large (say, 25 or 30). Finally (PHEW), the SUCCESS/FAILURE condition: we must expect at least 10 successes and 10 failures (that is, np ≥ 10 and nq ≥ 10).
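
Two of these conditions are numeric and can be checked mechanically; independence and randomization depend on how the data were collected and cannot be verified from the counts alone. The sketch below is an illustration only: the function name `check_conditions` is hypothetical, it uses the observed successes and failures as a stand-in for np and nq (since p is unknown), and the population size of 10,000 sea fans is a made-up figure.

```python
def check_conditions(successes, n, population_size=None):
    """Check the numeric conditions for a proportion confidence interval.

    Uses observed counts in place of np and nq (an assumption, since p
    itself is unknown). The 10% condition is checked only when a
    population size is supplied.
    """
    failures = n - successes
    results = {"success_failure": successes >= 10 and failures >= 10}
    if population_size is not None:
        results["ten_percent"] = n < 0.10 * population_size
    return results

# Sea fan sample from the earlier example, with a hypothetical population size
print(check_conditions(54, 104, population_size=10_000))

# A sample with too few successes fails the success/failure condition
print(check_conditions(5, 104))
```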

    13. HOMEWORK Pg. 378, 1 – 7 odds, 11, 15
