
Section 4 Confidence Intervals

Section 4 Confidence Intervals. William Christensen, Ph.D. Using Confidence Intervals to Estimate Population Parameters.


Presentation Transcript


  1. Section 4 Confidence Intervals William Christensen, Ph.D.

  2. Using Confidence Intervals to Estimate Population Parameters • Do you understand what a “population parameter” is? We use the word “parameter” as a general way to describe any characteristic of a population, such as its mean (average), proportion, standard deviation, or variance. • In the “real world” we usually do not know the true population parameters (such as the mean) because it is too expensive and time-consuming to collect data on every member of a population. Therefore, we most often use sample data to estimate population parameters such as the mean and standard deviation.

  3. Using Confidence Intervals to Estimate Population Parameters • The most common method for using sample data to estimate a population parameter is to create Confidence Intervals • Basically, a confidence interval allows us to say something like this: “We are 90% confident that the true population mean is between 24.5 and 27.8.” • Here you can see the two main parts of a confidence interval: • A level of confidence, such as 90% or 95% or 99%. By the way, there is no 100% confidence level in statistics • A range of values. Using the Confidence Interval methods you are about to learn, we will establish a range of values that we think the true population parameter falls between

  4. Using Confidence Intervals to Estimate Population Parameters • Remember: the whole reason for calculating confidence intervals is that we usually only have sample data which is only a small subset of the population we are interested in. Since there are probably some differences between our sample data (which we have) and the true population data (which we don’t have), we need to be able to estimate what the true population parameters are. • Confidence Intervals allow us to use our sample data to estimate the true population parameters, such as mean and standard deviation.

  5. Using Confidence Intervals to Estimate Population Parameters • In this Section you will learn to create confidence intervals to estimate the following population parameters: • Confidence intervals for a Population Mean • When our sample size is large (more than 30) • You will also learn how to calculate the sample size that would be necessary to estimate a population mean with a given level of accuracy • Confidence intervals for a Population Mean • When our sample size is small (less than or equal to 30) • Confidence intervals for a Population Proportion • A proportion is kind of like a mean, but expressed as a probability (between 0 and 1) or percentage • Confidence intervals for a Population Variance and/or Standard Deviation

  6. Point Estimate of Population Parameter • Without a confidence interval, the best estimate (Point Estimate) of a population parameter is simply whatever we calculate from the sample data. For example, if we have a sample of women’s weights with a mean of 143 lbs., then this is the best “Point Estimate” we have of the true population mean. • Confidence Intervals improve on a point estimate by adding a range of values and a level of confidence

  7. Two Parts of a Confidence Interval • Let’s re-visit the two main parts of a confidence interval: • A level of confidence, such as 90% or 95% or 99%. • The 3 most common confidence levels are 90%, 95%, and 99% • Associated with any confidence level is a value called alpha (α) • α is simply the difference between 1 and the confidence level (α = 1 – confidence level) • For a confidence level of 90%, α = 0.10 • For a confidence level of 95%, α = 0.05 • For a confidence level of 99%, α = 0.01 • You can view a confidence level as the chance we are right about our confidence interval, and α (alpha) as the chance that we are wrong

  8. Two Parts of a Confidence Interval • A range of values. A Confidence Interval is defined as a range (or an interval) of values used to estimate the true value of the population parameter. • The correct form for expressing the range or interval of values is: • Lower # < population parameter < Upper # • Note: the population parameter must always be expressed by using the appropriate symbol: • µ (mu) for population mean • σ (sigma) for population standard deviation • σ² (sigma squared) for population variance • p for population proportion • Example: 24.3 < µ < 27.8

  9. Confidence Intervals for Population Means in large samples (n > 30)

  10. Estimating Population Means: Calculating the Lower & Upper Limits • Lower # < µ < Upper # • We calculate the lower and upper limits of a confidence interval for a population mean by taking the sample mean (x-bar) minus and plus the margin of error (E): (x-bar – E) < µ < (x-bar + E), where E = zα/2 * (σ / sqrt(n)) • Before proceeding to use this formula, let’s learn a little more about zα/2, which is called the CRITICAL VALUE

  11. The Critical Value zα/2 • The critical value zα/2 is the z-score that separates an area of α/2 in each (left and right) tail of the standard normal distribution. • [Figure: standard normal curve centered at z = 0, with an area of α/2 in each tail beyond -zα/2 and +zα/2]

  12. The Critical Value zα/2 • The critical values +/- zα/2 set apart the area or probability for our confidence interval. In this case, we are looking for a 95% confidence interval, so α = 0.05 (5%) and α/2 = 0.025 (2.5%) • [Figure: standard normal curve with 0.95 of the area between -zα/2 and +zα/2, and 0.025 in each tail]

  13. The Critical Value zα/2 • Note that we have +/- zα/2. These are the same number, only one is positive and the other negative. Therefore, if we find one, we know the other by just changing the sign. • [Figure: standard normal curve with 0.95 of the area between -zα/2 and +zα/2, and 0.025 in each tail]

  14. The Critical Value zα/2 • There are two ways to determine zα/2 • The first and easiest way is to use the Excel function we already learned, NORMSINV(probability) • Example: to find the zα/2 critical value for a 90% confidence interval: • If the confidence level is 90% then we know alpha = 0.10 (the difference between the confidence level and 1) • Therefore, α/2 = 0.10 / 2 = 0.05 • Using Excel =NORMSINV(0.05) we get an answer of negative 1.64485 (this is the left-side critical value). The right-side critical value is simply +1.64485 (just change the – to +), or you could calculate the right-side critical value by =NORMSINV(0.95) = 1.64485. • The second method is the old-style or traditional method which involves looking up zα/2 in a “normal distribution” table. Since tables are not always available, I suggest you stick with the Excel method
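For readers working outside Excel, the same lookup can be sketched in Python. This is only an illustration, assuming that the standard library's `statistics.NormalDist().inv_cdf` plays the role of NORMSINV; the helper name `z_critical` is made up for this example and is not part of the course material.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Return the positive critical value z_(alpha/2) for a confidence level such as 0.90."""
    alpha = 1.0 - confidence
    # inv_cdf(alpha / 2) would give the left-side (negative) critical value,
    # like =NORMSINV(alpha/2); inv_cdf(1 - alpha / 2) is its positive mirror image.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(f"{level:.0%} confidence: z = {z_critical(level):.5f}")
```

Only the positive value is returned because that is the one used in the margin-of-error formula; the left-side value is the same number with the sign flipped.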

  15. Estimating Population Means: Calculating the Lower & Upper Limits • Lower # < µ < Upper #, that is, (x-bar – E) < µ < (x-bar + E), where E = zα/2 * (σ / sqrt(n)) • Now that we understand zα/2 and how to use Excel to find its value, we should be able to construct a confidence interval for a population mean. • Important note: In this formula, we only need the positive value of zα/2, NOT the negative value. We take the negative value into account later when we subtract E from the sample mean to calculate the Lower # for the confidence interval

  16. Estimating Population Means: Calculating the Lower & Upper Limits • Confidence Interval for a population mean: (sample mean – E) < µ < (sample mean + E), where E = zα/2 * (σ / sqrt(n)) • EXAMPLE: Given a sample of 50 women in which we find an average or mean weight of 143 lbs., with a standard deviation of 29 lbs., construct a 95% confidence interval for the population mean • To solve this problem we must first calculate E (margin of error). The formula for margin of error is: E = zα/2 * (σ / sqrt(n)) • Our sample data already provided us the info that s = 29 lbs. and n = 50 women, so the only thing missing is to find zα/2. With a confidence level of 95%, we know α = 0.05, so α/2 = 0.025. Using the Excel function NORMSINV, we calculate zα/2 as follows: =NORMSINV(0.025) = -1.96 (we ONLY use the positive value, 1.96, in the formula for calculating E)

  17. Estimating Population Means: Calculating the Lower & Upper Limits • EXAMPLE: Given a sample of 50 women in which we find an average or mean weight of 143 lbs., with a standard deviation of 29 lbs., construct a 95% confidence interval for the population mean • Now that we have all the pieces, we can solve for E and then construct the confidence interval. E = zα/2 * (σ / sqrt(n)) = 1.96 * (29 / sqrt(50)) = 1.96 * 4.1012 = 8.04 lbs. • Finally, knowing E we can construct our confidence interval as follows: (sample mean – E) < µ < (sample mean + E) • (143 – 8.04) < µ < (143 + 8.04) • 134.96 < µ < 151.04 • We did it. This is the correct form for a confidence interval. We can read this as follows: we are 95% confident that the true population mean of women’s weights is between 134.96 lbs. and 151.04 lbs.
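The worked example above can be checked with a short Python sketch. It assumes, as the example does, that the sample standard deviation s = 29 stands in for σ; the function name `mean_ci_z` is hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_z(x_bar: float, s: float, n: int, confidence: float):
    """Large-sample (n > 30) confidence interval for a population mean.

    Returns (lower, upper, E), where E = z_(alpha/2) * s / sqrt(n).
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # positive critical value
    e = z * s / sqrt(n)                          # margin of error
    return x_bar - e, x_bar + e, e

if __name__ == "__main__":
    # Sample of 50 women, mean 143 lbs., standard deviation 29 lbs., 95% confidence
    lower, upper, e = mean_ci_z(143, 29, 50, 0.95)
    print(f"E = {e:.2f} lbs.")
    print(f"{lower:.2f} < mu < {upper:.2f}")
```

The sketch uses the unrounded critical value rather than 1.96, so intermediate digits may differ slightly from a hand calculation, but the rounded limits agree with the slide.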

  18. Estimating Population Means: Calculating the Lower & Upper Limits • PRACTICE, PRACTICE, PRACTICE: You must know how to do and interpret all kinds of confidence interval problems. For confidence intervals that estimate population means from large samples (sample size greater than 30), here are some sample problems. Practice constructing 90%, 95%, and 99% confidence intervals for the population means. • A sample of 54 bears in Yellowstone National Park has a mean weight of 182.9 lbs., with a standard deviation of 121.8 lbs. • A study of hospital costs among 40 automobile accident victims who were wearing seat belts showed an average hospital cost of $9,000 with a standard deviation of $5,600. • Use other data from the data sets provided for this course to create and solve your own problems

  19. Calculating E When σ Is Unknown • You may have noticed in the sample problems we worked that the formula for E includes σ, which is actually the population standard deviation, NOT the sample standard deviation. • However, with a large sample it is OK to use the sample standard deviation (s) in place of σ when the population standard deviation is not known (and it usually is not known)

  20. SUMMARY - Procedure for Constructing a Confidence Interval for µ (Based on a Large Sample: n > 30) • Find the critical value zα/2 using Excel NORMSINV • Calculate E, the margin of error: E = zα/2 * (σ / sqrt(n)) • If σ (population standard deviation) is unknown, use s (sample standard deviation) • Find the lower and upper limits of the confidence interval (x-bar – E and x-bar + E). Use the correct form: Lower # < µ < Upper # • Round the final answer (do not round intermediate calculations) to one more decimal place than is used in the original sample data

  21. Confidence Intervals for Population Means in small samples (n ≤ 30)

  22. Estimating Population Means for small samples • Constructing confidence intervals for population means when we have small samples (n ≤ 30) is very similar to what we just learned about large samples (n > 30) • We still calculate the confidence interval as: (sample mean – E) < µ < (sample mean + E) • The main difference is that we now have a slightly different formula for E (margin of error) • For small samples (defined as less than or equal to 30): E = tα/2 * (s / sqrt(n)), where tα/2 has n – 1 degrees of freedom (degrees of freedom is discussed a little later) • Use σ (population std. dev.) if available. Otherwise, use s (sample std. dev.)

  23. The Critical Value tα/2 • tα/2 is similar to a zα/2, but rather than coming from the standard normal distribution, it comes from a distribution called the “student t distribution”. • It should make sense to you that when we have a smaller sample from which to estimate the population mean, our estimate cannot be as accurate as when we have a large sample. • The tα/2 value adjusts our E (margin of error) to account for our smaller sample size

  24. The Critical Value tα/2 • There are two ways to determine tα/2 • The first and easiest way is to use a new Excel function =TINV(probability,deg_freedom) which we will discuss in more detail in just a minute. • The second method is the old-style or traditional method which involves looking up tα/2 in a “student t distribution” table. Since tables are not always available, I suggest you stick with the Excel method

  25. Using TINV to find the Critical Value tα/2 • To use Excel’s TINV function to find the critical value tα/2 we must input 2 things • The area or probability, which is represented by α (alpha) – (NOT α / 2). This means we simply put in the value of alpha • Degrees of freedom. This is how we adjust for our small sample. Degrees of Freedom is the sample size (n) minus 1. df = n -1. • Example: for a small sample of 25, df = 25 – 1 = 24 • Example: for a sample of 11, degrees of freedom (df) = 10

  26. Using TINV to find the Critical Value tα/2 • Here is an example where we have a 95% confidence interval (α = 0.05) and a sample size of 20. • Probability is represented by α (NOT α / 2) when using TINV, because TINV assumes a two-tail test • df = n – 1 = 20 – 1 = 19 • The Excel formula is therefore =TINV(0.05,19), which returns approximately 2.093

  27. Estimating Population Means for small samples • Confidence Interval for a population mean: (sample mean – E) < µ < (sample mean + E), where E = tα/2 * (s / sqrt(n)) • EXAMPLE: Given a sample of 15 women (a small sample) in which we find an average or mean weight of 143 lbs., with a standard deviation of 29 lbs., construct a 95% confidence interval for the population mean • To solve this problem we must first calculate E (margin of error). Our formula for margin of error is: E = tα/2 * (s / sqrt(n)) • Our sample data already provided us the info that s = 29 lbs. and n = 15 women, so the only thing missing is to find tα/2. With a confidence level of 95%, we know α = 0.05. UNLIKE when calculating zα/2, we DO NOT need to divide α by 2 when using TINV, because the TINV function automatically assumes there are two tails, that is, that alpha is split evenly between the left and right sides. Thus, using the Excel function TINV we calculate tα/2 as follows: =TINV(0.05,14) = 2.1448 (Note that df = n – 1 = 15 – 1 = 14. Also note that TINV always returns a positive value, which we can input directly into our margin of error formula)

  28. Estimating Population Means for small samples • EXAMPLE: Given a sample of 15 women in which we find an average or mean weight of 143 lbs., with a standard deviation of 29 lbs., construct a 95% confidence interval for the population mean • Now that we have all the pieces, we can solve for E and then construct the confidence interval. E = tα/2 * (s / sqrt(n)) = 2.1448 * (29 / sqrt(15)) = 2.1448 * 7.4877 = 16.06 lbs. • Finally, knowing E we can construct our confidence interval as follows: (sample mean – E) < µ < (sample mean + E) • (143 – 16.06) < µ < (143 + 16.06) • 126.94 < µ < 159.06 • We did it. This is the correct form for a confidence interval. We can read this as follows: given our sample of 15 women, we are 95% confident that the true population mean of women’s weights is between 126.94 lbs. and 159.06 lbs. • You might notice that this interval is wider (less precise) than the one we found when our sample size was 50. That is indeed how it works: smaller samples yield less-precise estimates of population means
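The small-sample arithmetic can be sketched in Python as well. Python's standard library has no inverse Student t function, so this sketch simply takes the critical value as an input and reuses the 2.1448 that the slides obtained from =TINV(0.05,14); the function name `mean_ci_t` is hypothetical.

```python
from math import sqrt

def mean_ci_t(x_bar: float, s: float, n: int, t_crit: float):
    """Small-sample (n <= 30) confidence interval for a population mean.

    t_crit is the positive critical value t_(alpha/2) with n - 1 degrees of
    freedom, looked up elsewhere (for example with Excel's TINV).
    Returns (lower, upper, E), where E = t_crit * s / sqrt(n).
    """
    e = t_crit * s / sqrt(n)  # margin of error
    return x_bar - e, x_bar + e, e

if __name__ == "__main__":
    T_CRIT_95_DF14 = 2.1448  # value taken from the slide: =TINV(0.05,14)
    lower, upper, e = mean_ci_t(143, 29, 15, T_CRIT_95_DF14)
    print(f"E = {e:.2f} lbs.")
    print(f"{lower:.2f} < mu < {upper:.2f}")
```

Passing the critical value in keeps the sketch free of third-party packages; a statistics library with an inverse t function could supply it instead.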

  29. Estimating Population Means for small samples • PRACTICE, PRACTICE, PRACTICE: You must know how to do and interpret all kinds of confidence interval problems. For confidence intervals that estimate population means from small samples (n ≤ 30), here are some sample problems. Practice constructing 90%, 95%, and 99% confidence intervals for the population means. • A sample of 24 bears in Yellowstone National Park has a mean weight of 182.9 lbs., with a standard deviation of 121.8 lbs. • A study of hospital costs among 20 automobile accident victims who were wearing seat belts showed an average hospital cost of $9,000 with a standard deviation of $5,600. • Use other data from the data sets provided for this course to create and solve your own problems

  30. Determining the Sample Size required for a given margin of error (E)

  31. Determining Sample Size given E • Sometimes we want to determine in advance how much error (i.e., E, the margin of error) we are willing to have in our estimate of the population mean • In fact, we can obtain whatever margin of error we would like, IF we are willing and able to adjust our sample size • The relationship between sample size and margin of error is illustrated on the next slide

  32. Sample Size for Estimating Mean µ • Using our basic formula for calculating E (margin of error), we can also find n, given E: E = zα/2 * (σ / sqrt(n)) • Solving for n by algebra: n = (zα/2 * σ / E)^2 • Where: zα/2 is based on the desired level of confidence, and E = desired margin of error • Use σ if available; otherwise use s (sample std. dev.)

  33. Round-Off Rule for Sample Size n • When finding the sample size n, if the calculated n does not result in a whole number, always increase the value of n to the next larger whole number. • Example: a calculated n = 116.009 becomes n = 117 (round up)

  34. Determining Sample Size given E • Example: If we want to estimate the true population mean IQ for statistics students, how many statistics students would we need to test (i.e., what sample size is needed) so that our estimate is within 2 IQ points of the true population mean with a confidence level of 95%? From previous studies, we believe a conservative estimate of the population standard deviation (σ) is 15. • α = 0.05 • zα/2 = 1.96 (the absolute value of NORMSINV(0.025) = -1.96) • E (desired margin of error) = 2 • σ = 15 • n = (zα/2 * σ / E)^2 = ((1.96 * 15) / 2)^2 = 216.09, which rounds up to 217 students • We would need to randomly select 217 statistics students and obtain their IQ scores. We would then be 95% confident that our sample mean would be within 2 IQ points of the true mean IQ score for the entire population of statistics students.
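The sample-size formula and the round-up rule can be sketched together in Python. As before, this assumes `statistics.NormalDist().inv_cdf` in place of NORMSINV, and the function name `sample_size_for_mean` is made up for this illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma: float, margin: float, confidence: float) -> int:
    """Smallest n for which E = z_(alpha/2) * sigma / sqrt(n) is at most `margin`."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # positive critical value
    n_exact = (z * sigma / margin) ** 2
    return ceil(n_exact)  # round-off rule: always go UP to the next whole number

if __name__ == "__main__":
    # IQ example: sigma = 15, desired margin of error = 2 points, 95% confidence
    print(sample_size_for_mean(15, 2, 0.95))
```

Using `ceil` rather than ordinary rounding is what implements the round-off rule: any fractional result, however small, pushes n to the next whole number.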

  35. Determining Sample Size given E • In determining sample size (given a desired margin of error) we have assumed that some value or estimate of the population standard deviation (σ) is available. However, many times we have no estimate of σ. In such cases, we have three alternatives: • 1. Use the range rule of thumb to estimate a standard deviation as follows: est. standard deviation ≈ range / 4 • 2. Conduct a pilot study by starting the sampling process. Based on the first collection of at least 31 randomly selected sample values, calculate the sample standard deviation s and use it in place of σ. That value can be refined as more sample data are obtained. • 3. Estimate the value of σ by using the results of some other study that was done earlier

  36. Confidence Intervals for Population Proportion

  37. Estimating Population Proportion • Often we are interested in being able to estimate a population “proportion” • Proportion is kind of like an average, but is expressed as a probability (p) or percentage • For example, we might want to estimate what proportion (%) of households in the U.S. are watching the Olympics on television

  38. Estimating Population Proportion • We use the following notation to express the confidence interval or estimate for a population proportion: Lower # < p < Upper # • “p” represents the true population proportion • We use the symbol p-hat to represent the sample proportion. Another symbol, q-hat, is defined as 1 – p-hat

  39. Confidence Interval for Population Proportion • (p-hat – E) < p < (p-hat + E), where E = zα/2 * sqrt((p-hat * q-hat) / n) • zα/2 is again found using the Excel function NORMSINV(probability), where “probability” is α / 2 and the absolute value of the result is used in the above formula • Round the confidence interval limits to three significant digits.

  40. Confidence Interval for Population Proportion: (p-hat – E) < p < (p-hat + E), where E = zα/2 * sqrt((p-hat * q-hat) / n) • Example: The CBS television show 60 Minutes has a share of 20, meaning that among the TV sets in use, 20% are typically tuned to 60 Minutes (based on Nielsen Media Research data). Assume a sample size of 4,000 (typical for Nielsen surveys). Construct a 97% confidence interval estimate of the population proportion (the proportion of all TV sets in the U.S. tuned to 60 Minutes). • α = 0.03 (97% confidence level) and α/2 = 0.03/2 = 0.015 • Thus, zα/2 is found using Excel =NORMSINV(0.015) = -2.17 (we take the absolute value, which is 2.17) • p-hat is given as 20% or 0.20. q-hat is simply 1 – p-hat = 1 – 0.20 = 0.80 (remember: p-hat + q-hat = 1 always) • Thus, E = 2.17 * sqrt((0.20 * 0.80) / 4000) = 2.17 * 0.0063245 = 0.0137241 • Note: often we are dealing with very small numbers. Do not round any intermediate calculations; wait until we have the confidence interval limits, then round to 3 significant digits

  41. Confidence Interval for Population Proportion: (p-hat – E) < p < (p-hat + E), where E = zα/2 * sqrt((p-hat * q-hat) / n) • Example: The CBS television show 60 Minutes has a share of 20, meaning that among the TV sets in use, 20% are typically tuned to 60 Minutes (based on Nielsen Media Research data). Assume a sample size of 4,000 (typical for Nielsen surveys). Construct a 97% confidence interval estimate of the population proportion (the proportion of all TV sets in the U.S. tuned to 60 Minutes). • Finally, our confidence interval is: • (p-hat – E) < p < (p-hat + E) • (0.20 – 0.0137241) < p < (0.20 + 0.0137241) • 0.186 < p < 0.214 (rounded to 3 significant digits) • With this large sample size of 4,000, the margin of error is quite small, and we can be 97% confident that the population proportion of all TV sets in the U.S. tuned to 60 Minutes is between 0.186 (18.6%) and 0.214 (21.4%).
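The 60 Minutes example can be reproduced with a short Python sketch. It assumes `statistics.NormalDist().inv_cdf` as a stand-in for NORMSINV; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat: float, n: int, confidence: float):
    """Confidence interval for a population proportion.

    Returns (lower, upper, E), where E = z_(alpha/2) * sqrt(p_hat * q_hat / n).
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # positive critical value
    q_hat = 1.0 - p_hat                          # p_hat + q_hat = 1 always
    e = z * sqrt(p_hat * q_hat / n)              # margin of error, left unrounded
    return p_hat - e, p_hat + e, e

if __name__ == "__main__":
    # Share of 20% in a sample of 4,000 TV sets, 97% confidence
    lower, upper, e = proportion_ci(0.20, 4000, 0.97)
    # Round only the final limits, not the intermediate values
    print(f"{lower:.3f} < p < {upper:.3f}")
```

Because the critical value is not rounded to 2.17 here, E differs from the slide's 0.0137241 in the later decimal places, but the limits rounded to three significant digits are the same.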

  42. Determining Sample Size when estimating p, given desired E • E = zα/2 * sqrt((p-hat * q-hat) / n) • Solving for n by algebra: n = ((zα/2)^2 * p-hat * q-hat) / E^2 • Note: If p-hat (the sample proportion) is unknown, use p-hat = 0.50

  43. Determining Sample Size when estimating p, given desired E • Example: We want to determine, with a margin of error of four percentage points (4%), the current percentage of U.S. households using e-mail. Assuming that we want 90% confidence in our results, how many households must we survey? A recent study indicates 28.9% of U.S. households used e-mail. • For zα/2, use the absolute value of the Excel function NORMSINV(probability), with probability = α / 2 = 0.10/2 = 0.05, which gives 1.645 • n = ((zα/2)^2 * p-hat * q-hat) / E^2 = ((1.645)^2 * (0.289) * (0.711)) / (0.04)^2 = 347.5195, which rounds up to 348 households • To be 90% confident that our sample percentage is within four percentage points of the true percentage for all households, we should randomly select and survey 348 households.

  44. Determining Sample Size when estimating p, given desired E • Example: We want to determine, with a margin of error of four percentage points (4%), the current percentage of U.S. households using e-mail. Assuming that we want 95% confidence in our results, how many households must we survey? We have no idea what percent of U.S. households may be using e-mail (i.e., p-hat is unknown, therefore we should use p-hat = 0.50) • For zα/2, use the absolute value of the Excel function NORMSINV(probability), with probability = α / 2 = 0.05/2 = 0.025, which gives 1.96 • n = ((zα/2)^2 * p-hat * q-hat) / E^2 = ((1.96)^2 * (0.5) * (0.5)) / (0.04)^2 = 600.25, which rounds up to 601 households • To be 95% confident that our sample percentage is within four percentage points of the true percentage for all households, we should randomly select and survey 601 households. • Note: There is an important relationship between margin of error and sample size. Because n depends on 1 / E^2, cutting the margin of error in half requires about four times the sample size. In other words, a little less error requires a much bigger sample. You should remember this.
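Both sample-size examples for proportions, and the relationship between margin of error and sample size, can be sketched in Python. This is an illustration only: it assumes `statistics.NormalDist().inv_cdf` in place of NORMSINV, and the function name `sample_size_for_proportion` is made up for this example.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin: float, confidence: float, p_hat: float = 0.5) -> int:
    """Sample size needed to estimate a proportion within `margin`.

    p_hat defaults to 0.50, the value to use when no prior estimate exists.
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # positive critical value
    q_hat = 1.0 - p_hat
    return ceil(z ** 2 * p_hat * q_hat / margin ** 2)  # always round UP

if __name__ == "__main__":
    print(sample_size_for_proportion(0.04, 0.90, p_hat=0.289))  # prior estimate of 28.9%
    print(sample_size_for_proportion(0.04, 0.95))               # no prior estimate
    print(sample_size_for_proportion(0.02, 0.95))               # half the margin of error
```

Comparing the last two calls shows the relationship noted above: halving the margin of error from 0.04 to 0.02 raises the required sample size to about four times as many households.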

  45. Confidence Intervals for Population Variance and Standard Deviation

  46. Estimating Population Variance and Standard Deviation • The last population parameters that we will learn how to estimate are Variance and Standard Deviation • Variance is simply Standard Deviation squared • Thus, Standard Deviation is simply the square root of Variance • The bad news is that we need a new type of distribution and critical value in order to estimate population variance and standard deviation • This new distribution is called the Chi-squared (χ²) distribution (see next slide for a density curve graph of the Chi-squared distribution)

  47. Properties of the Distribution of the Chi-Square Statistic • [Figure: General Chi-Square Distribution, showing chi-square density curves for df = 10 and df = 20; horizontal axis χ² from 0 to 45; the curves are not symmetric and all values are nonnegative] • The chi-square distribution is not symmetric, unlike the normal and Student t distributions. As the number of degrees of freedom increases, the distribution becomes more symmetric • The values of chi-squared can be zero or positive, but they cannot ever be negative

  48. Confidence Interval for the Population Variance σ² and Std. Deviation σ • Variance: ((n – 1) * s^2) / χ²R < σ² < ((n – 1) * s^2) / χ²L • Standard deviation: sqrt(((n – 1) * s^2) / χ²R) < σ < sqrt(((n – 1) * s^2) / χ²L) • χ²R is the right-tail critical value (CV) and χ²L is the left-tail critical value (CV) • Note: our Chi-square distribution has one critical value for the left tail and a completely separate critical value for the right tail

  49. Chi-Square Critical Values • Important: The area or probability associated with a chi-square value is always the area (all the area) to the right of that chi-square value • For α = 0.05, there is α/2 = 0.025 in each tail. With 0.025 in the left tail, there is 0.975 to the right of the left critical value. With 0.025 in the right tail, there is exactly 0.025 to the right of the right critical value. • These (0.975 and 0.025) are the two probabilities for which we must find chi-square values • [Figure: chi-square density curve for df = 9 with 0.025 in each tail; χ²L = 2.700 and χ²R = 19.023]

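  50. Chi-Square Critical Values • 0.975 and 0.025 are the two probabilities for which we must find chi-square values. We can use the Excel function CHIINV to find these chi-square values: for df = 9, =CHIINV(0.975,9) gives χ²L = 2.700 and =CHIINV(0.025,9) gives χ²R = 19.023 • [Figure: chi-square density curve for df = 9 with 0.025 in each tail; χ²L = 2.700 and χ²R = 19.023]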
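To show how the two chi-square critical values are used, here is a minimal Python sketch. The critical values 2.700 and 19.023 (df = 9) come from the slide; the sample standard deviation s = 15 and sample size n = 10 are hypothetical numbers chosen only so that df = n – 1 = 9, and the function name `variance_ci` is likewise made up for this illustration.

```python
from math import sqrt

def variance_ci(s: float, n: int, chi2_left: float, chi2_right: float):
    """Confidence interval for a population variance.

    (n - 1) * s^2 / chi2_right  <  sigma^2  <  (n - 1) * s^2 / chi2_left
    The critical values are looked up elsewhere (for example with Excel's CHIINV).
    """
    numerator = (n - 1) * s ** 2
    # Dividing by the LARGER (right-tail) value gives the LOWER limit.
    return numerator / chi2_right, numerator / chi2_left

if __name__ == "__main__":
    CHI2_LEFT, CHI2_RIGHT = 2.700, 19.023  # from the slide, df = 9, 95% confidence
    S_HYPOTHETICAL = 15.0                  # hypothetical sample standard deviation
    N_HYPOTHETICAL = 10                    # hypothetical sample size (df = 9)

    var_lo, var_hi = variance_ci(S_HYPOTHETICAL, N_HYPOTHETICAL, CHI2_LEFT, CHI2_RIGHT)
    print(f"{var_lo:.2f} < sigma^2 < {var_hi:.2f}")
    # Take square roots of the variance limits to get the interval for sigma
    print(f"{sqrt(var_lo):.2f} < sigma < {sqrt(var_hi):.2f}")
```

Because the chi-square distribution is not symmetric, the resulting interval is not centered on the sample value, unlike the intervals for means and proportions.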
