
Intro to Confidence Intervals


sutton

Presentation Transcript


  1. Intro to Confidence Intervals

    AP Statistics
  2. Introduction: Our goal in many statistical settings is to use a sample statistic to estimate a population parameter. Earlier we learned that if we randomly select the sample, we should be able to generalize our results to the population of interest. Last chapter, we learned that different samples yield different results for our estimate. Statistical inference uses the language of probability to express the strength of our conclusions by taking chance variation due to random selection or random assignment into account. In this chapter, we’ll learn one method of statistical inference, confidence intervals, so we may estimate the value of a parameter from a sample statistic. As we do so, we’ll learn not only how to construct a confidence interval, but also how to report probabilities that describe what would happen if we used the inference method many times.
  3. Example: We have a population of infinite size. The population distribution is Normal and its standard deviation is σ = 20. An SRS of n = 16 observations is taken from the population, and the sample mean is x̄ = 240.79. We want to estimate the population mean µ. It is most likely not exactly 240.79 but will be somewhere around 240.79, so we will set up an interval.
  4. Example: How would the sample mean vary if we took many SRSs of size 16 from this population? The sampling distribution of x̄ would be Normal because the population distribution is Normal. The mean of x̄ equals µ. The 10% condition is met (infinite population), so the standard deviation of x̄ is σ/√n = 20/√16 = 5. The sampling distribution of x̄ is therefore N(µ, 5).
  5. Example How would the sample mean vary if we took many SRSs of size 16 from this population?
  6. Example: The sampling distribution of x̄ is N(µ, 5). From the 68-95-99.7 rule, x̄ is within 2 standard deviations (2 × 5 = 10) of the population mean µ in about 95% of all samples of size n = 16. x̄ is an unbiased estimator of µ. So the interval from x̄ − 10 to x̄ + 10 “captures” the population mean µ in about 95% of all samples of size 16. We estimate that µ lies somewhere in the interval 240.79 − 10 = 230.79 to 240.79 + 10 = 250.79.
  7. Confidence Intervals: Can be written in several ways. From our example: 240.79 ± 10; 230.79 to 250.79; 230.79 < µ < 250.79; (230.79, 250.79).
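The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the slides' 68-95-99.7 shortcut (2 standard deviations for about 95%); the function name `two_sd_interval` is made up for this sketch and does not come from the presentation.

```python
import math

def two_sd_interval(x_bar, sigma, n):
    """Approximate 95% interval using the 68-95-99.7 rule (2 standard deviations).

    Hypothetical helper for illustration; assumes a Normal sampling
    distribution and a known population standard deviation sigma.
    """
    sd_x_bar = sigma / math.sqrt(n)   # standard deviation of the sample mean
    margin = 2 * sd_x_bar             # margin of error under the 2-SD shortcut
    return x_bar - margin, x_bar + margin

# Values from the example: sigma = 20, n = 16, x-bar = 240.79
lower, upper = two_sd_interval(240.79, 20, 16)
print(f"{lower:.2f} to {upper:.2f}")
```

The same interval can then be reported in any of the forms listed on the slide.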
  8. Inference and Estimation Statistical Inference provides methods for drawing conclusions about a population from sample data. Formal inference will allow us to express the strength of our conclusions. We will explore confidence intervals for estimating the value of a population parameter, then tests of significance which will assess evidence for a claim about a population. You must assume that data is produced from a properly randomized design.
  9. General Form for Any Confidence Interval: A confidence interval for a parameter has two parts. The first is an interval calculated from the data, which has the form estimate ± margin of error. The sample mean x̄ is an unbiased estimator of the population mean µ. The sample proportion p̂ is an unbiased estimator of the population proportion p. A point estimator is a statistic that provides an estimate of a population parameter. The value of that statistic from a sample is called a point estimate. Ideally, a point estimate is our “best guess” at the value of an unknown parameter.
  10. Point estimator and point estimate We learned last chapter that an ideal point estimator will have no bias and low variability. Since variability is almost always present when calculating statistics from different samples, we must extend our thinking about estimating parameters to include an acknowledgement that repeated sampling could yield different results.
  11. General Form for Any Confidence Interval: A confidence interval for a parameter has two parts. The first is an interval calculated from the data, which has the form estimate ± margin of error. The margin of error shows how accurate we believe our guess is, based on the variability of the estimate.
  12. Interpreting Confidence Levels and Confidence Intervals: The second part of a confidence interval is a confidence level C, which gives the overall success rate of the method for calculating the confidence interval. That is, in C% of all possible samples, the method would yield an interval that captures the true parameter value. Interpretation for confidence level: “The method that was used to construct the interval will capture the true (parameter in context) about C% of the time in repeated sampling.”
  13. Interpreting Confidence Levels and Confidence Intervals Confidence interval: To interpret a C% confidence interval for an unknown parameter, say, “We are C% confident that the interval from ____ to _____ captures the actual value of the [population parameter in context].”
  14. Caution!!! Never say that this (single) interval contains the true parameter 95% of the time!!!! It either does (100%) or it doesn’t (0%). The parameter remains stationary; the CI moves from sample to sample.
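A small simulation can make the "repeated sampling" interpretation concrete. The sketch below is an illustration, not part of the presentation: it assumes a hypothetical true mean µ = 240 (the slides never state µ), reuses σ = 20 and n = 16 from the earlier example, and counts how often the x̄ ± 2 standard deviations interval captures µ.

```python
import math
import random

def coverage_rate(mu, sigma, n, reps, seed=1):
    """Fraction of simulated samples whose 2-SD interval captures mu.

    mu is a hypothetical 'true' mean chosen for the simulation.
    """
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    margin = 2 * sigma / math.sqrt(n)      # same margin of error for every sample
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Each individual interval either captures mu or it does not.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

rate = coverage_rate(mu=240.0, sigma=20.0, n=16, reps=10_000)
print(f"Captured mu in {rate:.1%} of samples")
```

The printed rate should land close to 95%. The parameter never changes during the run; only the intervals move.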
  15. Check your understanding How much does the fat content of Brand X hot dogs vary? To find out, researchers measured fat content (in grams) of a random sample of 10 Brand X hot dogs. A 95% confidence interval for the population standard deviation 𝜎 is 2.84 to 7.55. Interpret the confidence interval. We are 95% confident that the interval from 2.84 to 7.55 captures the true standard deviation of fat content of Brand X hot dogs.
  16. Check your understanding How much does the fat content of Brand X hot dogs vary? To find out, researchers measured fat content (in grams) of a random sample of 10 Brand X hot dogs. A 95% confidence interval for the population standard deviation 𝜎 is 2.84 to 7.55. Interpret the confidence level. The method that was used to construct the interval will capture the true standard deviation of fat content of Brand X hot dogs about 95% of the time in repeated sampling.
  17. Check your understanding How much does the fat content of Brand X hot dogs vary? To find out, researchers measured fat content (in grams) of a random sample of 10 Brand X hot dogs. A 95% confidence interval for the population standard deviation 𝜎 is 2.84 to 7.55. True or False: The interval from 2.84 to 7.55 has a 95% chance of containing the actual population standard deviation σ. False. The probability is either 1 or 0.
  18. Example: For an IQ test, scores follow N(100, 15). Take an SRS (unbiased) of 10 students at KHS and measure IQ; we find that x̄ = 120. The 10% rule is satisfied if the KHS population is greater than 10(10) = 100. The population is Normal, so the sampling distribution of x̄ will be Normal with standard deviation σ/√n = 15/√10 ≈ 4.7434. Recall that 95% corresponds to 2 standard deviations by the empirical rule. So: 120 ± 2(4.7434) = 120 ± 9.4868, where 120 is the sample estimate and 9.4868 is the margin of error.
  19. Example Continued: The interval formed is (110.5132, 129.4868). This is how we interpret this interval: “We are 95% confident that the true mean IQ score at KHS lies between 110.5132 and 129.4868.” This is how to interpret a 95% confidence level: “The method that was used to construct the interval will capture the true mean IQ score at KHS about 95% of the time in repeated sampling.” *Note that the center of the interval is x̄ = 120.
  20. Activity www.whfreeman.com/tps4e
  21. Activity Summary: Greater confidence leads to a wider interval. We want high confidence and a small margin of error. High confidence: the method almost always gives a correct answer. Small margin of error: we have pinned down our parameter quite precisely.
  22. Activity Summary: The margin of error of a confidence interval gets smaller as the confidence level C decreases and as the sample size n increases.
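Both effects can be checked numerically. The sketch below is an illustration under assumptions: it uses the z-based margin of error z*·σ/√n that the later slides develop, with σ = 20 borrowed from the first example, and the helper name `margin_of_error` is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(conf_level, sigma, n):
    """z-based margin of error: z* times sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)  # critical value for level C
    return z_star * sigma / math.sqrt(n)

# Margin of error for several confidence levels and sample sizes.
for conf_level in (0.90, 0.95, 0.99):
    for n in (16, 64, 256):
        me = margin_of_error(conf_level, 20, n)
        print(f"C = {conf_level:.2f}, n = {n:3d}: margin of error = {me:.2f}")
```

Reading down the output, lowering C or raising n shrinks the margin of error; quadrupling n cuts it in half because n sits under a square root.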
  23. Calculating a Confidence Interval: We started with estimate ± margin of error. Our general formula is: statistic ± (critical value)(standard deviation of statistic). The statistic is the point estimate. The critical value depends on the confidence level. The standard deviation of the statistic depends on whether we are estimating a proportion or a mean.
  24. Conditions for Constructing a Confidence Interval: There are 3 important conditions to check. Random: The data come from a well-designed random sample or randomized experiment.
  25. Conditions for Constructing a Confidence Interval: Normal: The sampling distribution of the statistic is approximately Normal.
  26. Conditions for Constructing a Confidence Interval: Normal, in detail. Proportions: np ≥ 10 and n(1 – p) ≥ 10. Means: Normal population, or CLT (n ≥ 30), or verify Normality from the sample data.
  27. Conditions for Constructing a Confidence Interval: Independent: Individual observations are independent. When sampling without replacement, the sample size n should be no more than 10% of the population size to use our formula for the standard deviation of the statistic.
  28. Confidence Intervals for Proportions

    AP Statistics Chapter 12.1
  29. Introduction: The statistic that estimates the parameter p is the sample proportion p̂. Remember, we don’t know p, so we use p̂ instead.
  30. Confidence Interval for Proportions: 3 conditions to meet. Random: SRS. Normal: the count of successes and the count of failures are both at least 10. Independent: the population has to be at least 10 times the sample size.
  31. Check your understanding: Are the conditions met for calculating a confidence interval for the population proportion p? An AP Statistics class at a large high school conducts a survey. They ask the first 100 students to arrive at school one morning whether or not they slept at least 8 hours the night before. Only 17 students say “Yes.” Random: NO, not an SRS. Normal: YES, 17 successes and 83 failures. Independent: YES, population > 100(10).
  32. Check your understanding: Are the conditions met for calculating a confidence interval for the population proportion p? A quality control inspector takes a random sample of 25 bags of potato chips from the thousands of bags filled in an hour. Of the bags selected, 3 had too much salt. Random: YES, an SRS. Normal: NO, the count of 3 bags with too much salt is less than 10. Independent: YES, population > 25(10).
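The two condition checks above can be sketched as a small function. This is an illustrative helper, not something from the presentation: the name `check_proportion_conditions` is made up, the Random condition has to be judged by the reader and passed in, and the population sizes (2000 students, 5000 bags) are assumed values standing in for "large high school" and "thousands of bags".

```python
def check_proportion_conditions(successes, n, population_size, random_sample):
    """Return the Random / Normal / Independent verdicts for a proportion interval."""
    failures = n - successes
    return {
        "Random": random_sample,                            # judged from the study design
        "Normal": successes >= 10 and failures >= 10,       # at least 10 of each outcome
        "Independent": population_size >= 10 * n,           # 10% condition
    }

# Sleep survey: first 100 arrivals (not random), 17 said yes; assumed 2000 students.
survey = check_proportion_conditions(17, 100, 2000, random_sample=False)

# Potato chips: random sample of 25 bags, 3 too salty; assumed 5000 bags per hour.
chips = check_proportion_conditions(3, 25, 5000, random_sample=True)

print("Survey:", survey)
print("Chips: ", chips)
```

An interval should be constructed only when all three entries are true, which is the case for neither scenario here.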
  33. Constructing a Confidence Interval: statistic ± (critical value)(standard deviation of statistic). Statistic: in this case, p̂. Critical value: depends on C; we use z*. Standard deviation of statistic: in this case, √(p(1 − p)/n).
  34. Finding Critical Values Definition: A level C confidence interval for a parameter is an interval computed from sample data by a method having a probability of C of producing an interval containing the true parameter. Previously, we used C values estimated by the empirical rule. Now, we will employ a method that will free us to pick any confidence level.
  35. Example For a 90% confidence interval: 90% of the area is between -1.645 and +1.645. z* = 1.645
  36. Example Continued: Find z* for C = 95%: z* = 1.96. You can use this process to find any z*. For convenience, common ones are listed in Table B.
  37. Alternate Way to Calculate z*: Use your calculator and input -invnorm((1 – C)/2, 0, 1). Example, z* for a 97.5% CI: -invnorm((1 - .9750)/2, 0, 1) = -invnorm(.0125, 0, 1) = 2.2414. Note that invnorm(.0125, 0, 1) = -2.2414, so use the positive value. Equivalently, invnorm(.9875, 0, 1) = 2.2414.
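The same lookup can be done off the calculator. The sketch below uses Python's standard-library `statistics.NormalDist().inv_cdf`, which plays the role of invnorm for the standard Normal curve; the wrapper name `z_star` is chosen here for illustration.

```python
from statistics import NormalDist

def z_star(conf_level):
    """Critical value z* with area conf_level between -z* and +z*."""
    # Upper-tail version of the slide's formula: invnorm((1 + C)/2, 0, 1)
    return NormalDist().inv_cdf((1 + conf_level) / 2)

for conf_level in (0.90, 0.95, 0.975, 0.99):
    print(f"C = {conf_level:.3f}: z* = {z_star(conf_level):.4f}")
```

Using (1 + C)/2 gives the positive critical value directly, so no sign flip is needed.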
  38. Constructing a Confidence Interval: statistic ± (critical value)(standard deviation of statistic). Standard deviation of statistic: in this case, because p is unknown, we use √(p̂(1 − p̂)/n). When the standard deviation of a statistic is estimated from data, the result is called the standard error (SE) of the statistic.
  39. One-Sample z Interval for a Population Proportion: Choose an SRS of size n from a large population that contains an unknown proportion p of successes. An approximate level C confidence interval for p is p̂ ± z*·√(p̂(1 − p̂)/n), where z* is the critical value for the standard Normal curve with area C between -z* and z*. Use this interval only when the numbers of successes and failures in the sample are both at least 10 and the population is at least 10 times as large as the sample.
  40. Example Alcohol abuse has been described by college presidents as the number one problem on campus, and it is an important cause of death in young adults. A survey of 10,904 randomly selected U.S. college students collected information on drinking behavior and alcohol related problems. 2486 students were classified as frequent binge drinkers. Construct and interpret a 99% confidence interval for the proportion of college students who are classified as binge drinkers.
  41. Steps to follow: When constructing a confidence interval, be sure to follow these steps. (1) Identify the parameter and confidence level. (2) Identify the appropriate inference method and check conditions. (3) If the conditions are met, perform calculations. (4) Interpret the interval in the context of the problem.
  42. Example Continued: Identify the parameter and confidence level. We want to estimate the actual proportion p of all college students who are classified as binge drinkers at a 99% confidence level.
  43. Example Continued: Identify the appropriate inference method and check conditions. We should use a one-sample z interval for p if the conditions are met. Random: the problem states that a random sample was used. Normal: 2486 successes and 8418 failures are both at least 10. Independent: there are at least 10(10904) = 109040 U.S. college students. Conditions met.
  44. Example Continued: If the conditions are met, perform calculations. p̂ = 2486/10904 ≈ .228 and z* = 2.576 for 99%, so .228 ± 2.576·√(.228(.772)/10904) = .228 ± .010 = (.218, .238).
  45. Example Continued: Interpret the interval in the context of the problem. We are 99% confident that the interval from .218 to .238 contains the true proportion of U.S. college students who would be classified as binge drinkers.
  46. Example Continued: On the calculator, choose Menu, 6: Statistics, 6: Confidence Intervals, 5: 1-Prop z Interval. Enter Successes x: 2486, n: 10904, C Level: .99. Use “CLower” 0.217641 and “CUpper” 0.238339 to write your interval: (.218, .238).
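The calculator result can be reproduced with a short sketch of the one-sample z interval for a proportion. The function name `one_prop_z_interval` is made up for this illustration, and the code assumes the Random, Normal, and Independent conditions have already been checked.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(successes, n, conf_level):
    """One-sample z interval for a proportion: p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf((1 + conf_level) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

# Binge drinking example: 2486 of 10,904 students, 99% confidence.
lower, upper = one_prop_z_interval(2486, 10904, 0.99)
print(f"({lower:.3f}, {upper:.3f})")
```

The endpoints should agree with the calculator's CLower and CUpper values to about six decimal places.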
  47. Desired Sample Size

    AP Statistics
  48. Formula: The margin of error when using proportions is ME = z*·√(p̂(1 − p̂)/n). We won’t know p̂ ahead of time, because we haven’t done the research yet!
  49. Continued: Since we don’t know p̂ when choosing our sample size, we need to guess. Our book uses p*. Here are your options: Use a guess p* based on a pilot study or on past experience. Or use p* = .5 as the guess. The margin of error is largest when p* = .5, so this choice gives the largest sample size of all the options (erring on the side of caution).
  50. Final Formula: To determine the sample size n that will yield a level C confidence interval for a population proportion p with a maximum margin of error ME, solve the following inequality for n: z*·√(p*(1 − p*)/n) ≤ ME, where p* is a guessed value for the sample proportion.
  51. Final Formula: Using algebra, the result is n ≥ (z*/ME)²·p*(1 − p*), where p* is a guessed value for the sample proportion. Use p* = .5 when no better guess is available.
  52. Example: A company has received complaints about its customer service. The managers intend to hire a consultant to carry out a survey of customers. Before contacting the consultant, the company president wants some idea of the sample size that she will be required to pay for. One critical question is the degree of satisfaction with the company’s customer service, measured on a 5-point scale. The president wants to estimate the proportion p of customers who are satisfied. She decides that she wants the estimate to be within 3% at a 95% confidence level. How large a sample is needed?
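  53. Example Continued: ME = .03 (3%), z* = 1.96, and p̂ is unknown, so use p* = .5. Solve 1.96·√(.5(.5)/n) ≤ .03. Divide both sides by 1.96: √(.25/n) ≤ .03/1.96. Square both sides: .25/n ≤ (.03/1.96)² ≈ .000234.
  54. Example Continued: Multiply both sides by n: .25 ≤ (.03/1.96)²·n. Divide both sides by (.03/1.96)² ≈ .000234: n ≥ 1067.11. We must round up, since 1067 is not enough. We will need 1068 respondents to ensure the margin of error is no more than 3%.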
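The algebra above reduces to a single formula, n ≥ (z*/ME)²·p*(1 − p*), followed by rounding up. The sketch below is an illustration; the function name `sample_size_for_proportion` is hypothetical, and z* = 1.96 is taken from the slides rather than computed.

```python
import math

def sample_size_for_proportion(margin, z_star, p_guess=0.5):
    """Smallest n with z* * sqrt(p*(1 - p*)/n) <= margin.

    p_guess = 0.5 is the conservative default because it maximizes p*(1 - p*).
    """
    n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n_exact)   # always round up so the margin is not exceeded

# Customer satisfaction example: within 3% at 95% confidence.
n_needed = sample_size_for_proportion(margin=0.03, z_star=1.96)
print(n_needed)
```

Carrying full precision through the calculation and rounding only at the end avoids the small drift that comes from rounding (.03/1.96)² to .000234 first.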
  55. Confidence Interval for the Mean of a Population – σ is Known

  56. One-Sample z Interval for a Population Mean when σ is known: Choose an SRS of size n from a large population having unknown mean µ and known standard deviation σ. As long as the Normal and Independent conditions are met, a level C confidence interval for µ is x̄ ± z*·σ/√n. The critical value z* is found from the standard Normal distribution.
  57. Cautions/Assumptions The point estimate, x-bar, must be calculated from an SRS for the results to generalize. Outliers can have a big effect on confidence intervals. Check for these graphically. When the population is non-normal, the Central Limit Theorem is important (n ≥ 30). You must know σ from the population.
  58. Sample size: To determine the sample size n that will yield a level C confidence interval for a population mean with a specified margin of error ME: Get a reasonable value for the population standard deviation σ from an earlier or pilot study. Find the critical value z* from a standard Normal curve for confidence level C. Set the expression for the margin of error to be less than or equal to ME and solve for n: z*·σ/√n ≤ ME, or n ≥ (z*·σ/ME)².
  59. Example #1: A drug-maker analyzes a specimen from each batch of a product. The results of repeated measurements follow a Normal distribution quite closely. The standard deviation of this distribution is known to be σ = .0068 grams per liter. How many samples of the product must be tested to give a margin of error of ± .005 grams per liter with 95% confidence?
  60. Example #1 Continued: σ = .0068, z* = 1.960 (95% confidence level), and the desired margin of error is ME = ± .005. Solve 1.960(.0068)/√n ≤ .005, or n ≥ (1.960(.0068)/.005)² ≈ 7.11. Sample size must be 8.
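The calculation for a mean follows the same pattern as the one for a proportion. The sketch below is an illustration only; the function name `sample_size_for_mean` is hypothetical and z* = 1.960 is the table value used on the slide.

```python
import math

def sample_size_for_mean(margin, z_star, sigma):
    """Smallest n with z* * sigma / sqrt(n) <= margin, for known sigma."""
    n_exact = (z_star * sigma / margin) ** 2
    return math.ceil(n_exact)   # round up: 7 specimens would leave the margin too wide

# Drug-maker example: sigma = .0068 g/L, margin of error .005 g/L, 95% confidence.
n_needed = sample_size_for_mean(margin=0.005, z_star=1.960, sigma=0.0068)
print(n_needed)
```
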
  61. Inference for the Mean of a Population – σ is Not Known

  62. Chapter 11 – Inference for Means: Realistically, we won’t know σ. We will use the sample standard deviation s_x as an estimate for σ. Thus, we will use a t-distribution instead of a z-distribution for calculating confidence intervals.
  63. T-Distributions Facts There are many t-distributions depending on sample size (“degrees of freedom”). Degrees of freedom are calculated: df = n-1. The higher the degrees of freedom, the closer a t-curve gets to becoming a z-curve. A t-curve with infinite degrees of freedom is a z-curve.
  64. T-Distribution Facts
  65. T-Distribution Facts: The density curves for t-procedures are similar in shape to the standard Normal curve. The spread of the t-curves is a bit greater than that of the standard Normal curve. This is because there is error (extra variance) in using the sample standard deviation in place of the population σ. Since we don’t know σ, our standard error will be s_x/√n.
  66. Confidence Intervals without z*: Draw a random sample of size n from a large population with unknown µ and σ. Formula: the z-interval x̄ ± z*·σ/√n becomes the one-sample t-interval x̄ ± t*·s_x/√n, where t* comes from the t-distribution with df = n − 1.
  67. Robustness: A test is called robust if violations of assumptions do not greatly affect outcomes of inference. t-procedures are robust against non-Normality of the population when there are no outliers. t-procedures are strongly affected by outliers. Checks for them must be made if dealing with raw data.
  68. Guidelines for Normality: If n < 15, normality checks must be closely followed. This may be accomplished graphically, or normality may be given. If n ≥ 15, the only real concerns are outliers and strong skewness. If n is 30 or more, t-procedures may be used even for clearly skewed distributions.
  69. Assumptions Summarized: Samples must be chosen randomly. Outliers must be checked if using raw data. Normality of the sampling distribution is verified in the usual way: given; graphically (histogram or NPP); approximated by sample size; or assumed (if unable to verify).
  70. Reading a t chart: What is the t* for a 70% confidence interval that has 5 degrees of freedom? t* = 1.156. What is the t* for a 90% confidence interval from a sample of 15 observations? With df = 14, t* = 1.761. What critical value t* would you use for a 99% confidence interval from a sample of 20? With df = 19, t* = 2.861.
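The table lookups above can be cross-checked numerically. The sketch below is one possible approach using only the Python standard library, not the method the slides use: it integrates the t density with Simpson's rule and then bisects for the critical value. The names `t_pdf`, `t_cdf`, and `t_star` are made up for this illustration, and the search assumes t* lies below 100.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus the lower half (0.5)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(conf_level, df):
    """Critical value t* with area conf_level between -t* and +t* (bisection search)."""
    target = (1 + conf_level) / 2
    low, high = 0.0, 100.0          # assumed search range for t*
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2

print(round(t_star(0.70, 5), 3))    # df = 5
print(round(t_star(0.90, 14), 3))   # n = 15, so df = 14
print(round(t_star(0.99, 19), 3))   # n = 20, so df = 19
```

Because the degrees of freedom are passed in directly, this approach is not limited to the rows printed in a table, which is the same advantage the calculator has.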
  71. Example: A large bank decided to study the call response times in its customer service department. Response times to a random sample of 241 calls to the bank’s customer service center in a given month were recorded and found to have x̄ = 18.353 seconds and s_x ≈ 11.76 seconds. The distribution of calls is skewed to the right, but there are no outliers.
  72. Example: Identify the parameter and confidence level. The parameter is the actual mean call response time 𝜇, at the 95% confidence level.
  73. Example: Identify the appropriate inference method and check conditions. Use a one-sample t interval for 𝜇 if the conditions are met. Random: the problem states a random sample of n = 241. Normal: the distribution is skewed to the right but there are no outliers, so we can rely on the robustness of t procedures because the sample size is large. Independent: there must be at least 10(241) = 2410 calls to the customer service center in a given month.
  74. Example: If the conditions are met, perform calculations: x̄ ± t*·s_x/√n = 18.353 ± 1.503 = (16.850, 19.856).
  75. Using the calculator: Choose Menu, 6: Statistics, 6: Confidence Intervals, 2: t interval, Stats. Enter x̄, s_x, n = 241, and C = .95. The result is (16.861, 19.845), which is more accurate because technology can use df = 240.
  76. Example: Interpret the interval in the context of the problem. From the table: We are 95% confident that the interval from 16.850 to 19.856 seconds contains the actual mean call response time. From technology: We are 95% confident that the interval from 16.861 to 19.845 seconds contains the actual mean call response time.
  77. Example A manufacturer of high resolution video terminals must control the tension on the mesh of fine wires that lies behind the surface of the viewing screen. The tension is measured by an electrical device with output readings in millivolts (mV). Some variation is inherent in the production process. Here are the tension readings from a random sample of 20 screens from a single day’s production: 269.5 297.0 269.6 283.3 304.8 280.4 233.5 257.4 317.5 327.4 264.7 307.7 310.0 343.3 328.1 342.6 338.8 340.1 374.6 336.1 Construct and interpret a 90% confidence interval for the mean tension 𝜇 of all the screens produced on this day.
  78. Example: Identify the parameter and confidence level. The parameter is the true mean tension 𝜇 of all the video terminals produced this day, at a 90% confidence level.
  79. Example: Identify the appropriate inference method and check conditions. Use a one-sample t interval for 𝜇 if the conditions are met. Random: the problem states a random sample of n = 20. Normal: the sample size is small (n = 20), so we must examine the sample data.
  80. Example: Identify the appropriate inference method and check conditions. These graphs give us no reason to doubt the Normality of the population. Independent: we must assume that at least 10(20) = 200 video terminals were produced this day.
  81. Example: If the conditions are met, perform calculations. With x̄ = 306.32, s_x = 36.21, and t* = 1.729 (df = 19): x̄ ± t*·s_x/√n = 306.32 ± 1.729(36.21/√20) = 306.32 ± 14.00 = (292.32, 320.32).
  82. Using the calculator: Choose Menu, 6: Statistics, 6: Confidence Intervals, 2: t interval, Stats. Enter x̄, s_x, n = 20, and C = .90. The result is (292.32, 320.32). OR choose Menu, 6, 6, 2, Data, then enter the list name and C Level .90.
  83. Example: Interpret the interval in the context of the problem. We are 90% confident that the interval from 292.32 to 320.32 mV captures the true mean tension in the entire batch of video terminals produced that day.
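Since the raw tension readings are given, the whole one-sample t interval can be reproduced from the data. The sketch below is an illustration under one stated assumption: the critical value t* = 1.729 is the standard table value for 90% confidence with df = 19, entered by hand rather than computed. The variable names are chosen for this sketch.

```python
import math
from statistics import mean, stdev

# Tension readings (mV) from the random sample of 20 screens.
tension_mv = [
    269.5, 297.0, 269.6, 283.3, 304.8, 280.4, 233.5, 257.4, 317.5, 327.4,
    264.7, 307.7, 310.0, 343.3, 328.1, 342.6, 338.8, 340.1, 374.6, 336.1,
]

T_STAR_90_DF19 = 1.729   # assumed table value: 90% confidence, df = n - 1 = 19

n = len(tension_mv)
x_bar = mean(tension_mv)                 # point estimate of the mean tension
s_x = stdev(tension_mv)                  # sample standard deviation (n - 1 divisor)
standard_error = s_x / math.sqrt(n)
margin = T_STAR_90_DF19 * standard_error

lower, upper = x_bar - margin, x_bar + margin
print(f"x-bar = {x_bar:.2f}, s_x = {s_x:.2f}")
print(f"90% t interval: ({lower:.2f}, {upper:.2f})")
```

This mirrors the calculator's Data option: the summary statistics are computed from the list instead of being typed in.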