Statistical Methods 1

Statistical Methods 1 Lecture Notes Chapter 7: Inferences Based on a Single Sample: Estimation with Confidence Intervals

Introductory Example • A survey by the Roper Organization found that 45% of the people who were offended by a television program would change the channel, while 15% would turn off their television sets. The survey further stated that the margin of error is 3 percentage points and that 4000 adults were interviewed. • How do these estimates compare with the true population percentages? • Is the sample of 4000 large enough to represent the population of all adults who watch television in the United States?

Objectives • Find the confidence interval for the mean when σ is known or n ≥ 30. • Determine the minimum sample size for finding a confidence interval for the mean. • Find the confidence interval for the mean when σ is unknown and n < 30. • Find the confidence interval for a proportion. • Determine the minimum sample size for finding a confidence interval for a proportion.

7.1 Identifying and Estimating the Target Parameter • Target Parameter Key words associated with parameters:

Confidence Interval • Recall that a statistic such as the sample mean $\bar{x}$ is a point estimator of the population mean μ.

7.2 Confidence Interval for a Population Mean: Normal (z) Statistic • The goal is to determine how to estimate the population mean μ and assess the estimate’s reliability. • Example: Find z such that $P(-z \le Z \le z) = .95$. Solution: z = 1.96.

Example (cont.) • According to the Central Limit Theorem, the sampling distribution of $\bar{x}$ will be approximately normal, and 95% of all $\bar{x}$ values from samples of size n lie within $1.96\,\sigma_{\bar{x}}$ of the mean μ, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. • There is a probability of .95 that the interval $\bar{x} \pm 1.96\,\sigma_{\bar{x}}$ will contain μ. • The interval $\bar{x} \pm 1.96\,\sigma_{\bar{x}}$ is called a large-sample 95% confidence interval for the population mean μ. • Large sample means the sample must be large enough so that the Central Limit Theorem can be applied (rule of thumb: n ≥ 30).

Example (cont.) • Calculating $\bar{x} \pm 1.96\,\sigma_{\bar{x}}$ requires knowing σ, since $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. We usually don’t know σ, but with a large sample s is a good estimator of σ. • Calling the interval the 95% confidence interval for the population mean μ means that if a large number of samples were taken and this interval calculated each time, about 95% of the intervals would contain μ. • The probability, .95, that measures the confidence we can place in the interval estimate is called the confidence coefficient. The percentage, 95%, is called the confidence level for the interval estimate. • The margin of error is $1.96\,\sigma/\sqrt{n}$ (the maximum error of estimate for a 95% confidence interval).

Definition • Confidence coefficient – probability that a randomly selected confidence interval encloses the population parameter • Confidence level – Confidence coefficient expressed as a percentage

The confidence coefficient • The confidence coefficient is equal to 1 − α, where α is called the significance level. α is the total area assigned to the tails of the sampling distribution and is split equally between the two tails, α/2 in each.
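As a rough illustration of how the confidence coefficient determines the critical value, the sketch below converts 1 − α into $z_{\alpha/2}$ using Python's standard-library `statistics.NormalDist`. The helper name `z_critical` is hypothetical and is not part of the lecture notes.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Return z_{alpha/2} for a two-sided interval (hypothetical helper)."""
    alpha = 1.0 - confidence            # total area in both tails
    # alpha/2 sits in the upper tail, so look up the 1 - alpha/2 quantile
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"confidence {level:.2f}: z = {z_critical(level):.3f}")
```

The three values agree, to three decimals, with the familiar table entries 1.645, 1.960 and 2.576.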

Example: 90% confidence interval • John says he is 90% confident that the population mean is contained within the interval I when the values of the population are normally distributed.

Example: A publishing company has just published a new textbook. Before the company decides the price at which to sell this textbook, it wants to know the average price of all such textbooks in the market. The research department at the company took a sample of 36 comparable textbooks and collected information on their prices. This information produces a mean of $70.50 for this sample. It is known that the standard deviation of the prices of all such textbooks is $4.50. A) What is the point estimate of the mean price of all such college textbooks? B) What is the margin of error of this estimate? C) Construct the 90% confidence interval for the mean price of all such college textbooks.
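One possible way to check parts A to C numerically is sketched below. It assumes the large-sample z interval applies because σ is known and n = 36; the function name `z_interval` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Large-sample z interval for a mean; returns (low, high, margin)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)                        # margin of error
    return xbar - margin, xbar + margin, margin

# Values taken from the textbook-price example
xbar, sigma, n = 70.50, 4.50, 36
low, high, margin = z_interval(xbar, sigma, n, 0.90)

print(f"point estimate: {xbar:.2f}")
print(f"margin of error: {margin:.2f}")
print(f"90% CI: ({low:.2f}, {high:.2f})")
```

With these inputs the margin of error is about $1.23, so the interval runs from roughly $69.27 to $71.73.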

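Large-Sample 100(1 − α)% Confidence Interval for μ • $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, where s may be used in place of σ when σ is unknown. Conditions Required for a Valid Large-Sample Confidence Interval for μ 1. A random sample is selected from the target population. 2. The sample size n is large, n ≥ 30. Exercise: p. 306 #11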

Example: p. 306 #11. A random sample of 100 observations from a normally distributed population possesses a mean equal to 83.2 and standard deviation equal to 6.4. a. Find a 95% confidence interval for μ. b. What do you mean when you say that a confidence coefficient is .95? c. Find a 99% confidence interval for μ. d. What happens to the width of a confidence interval as the value of the confidence coefficient is increased while the sample size is held fixed? e. Would your confidence intervals of parts a and c be valid if the distribution of the original population were not normal? Explain.
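Parts a, c and d can be explored with a short sketch like the following, which assumes s = 6.4 can stand in for σ because n = 100 is large. The variable and function names are illustrative, not from the exercise.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 100, 83.2, 6.4      # values from the exercise
std_err = s / sqrt(n)            # s used in place of sigma (large n)

def interval(confidence):
    """Return the (low, high) large-sample interval at the given level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return xbar - z * std_err, xbar + z * std_err

ci95 = interval(0.95)
ci99 = interval(0.99)
width95 = ci95[1] - ci95[0]
width99 = ci99[1] - ci99[0]

print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f}), width {width95:.2f}")
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f}), width {width99:.2f}")
```

Comparing the two widths shows the pattern part d asks about: with n held fixed, a larger confidence coefficient gives a wider interval.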

Interpretation

7.3 Confidence Interval for a Population Mean: Student's t-Statistic • Many times, inferences must be made from small samples, but the Central Limit Theorem does not guarantee that the sampling distribution of $\bar{x}$ will be normal for small sample sizes. The sampling distribution of $\bar{x}$ will be normal if the population is normal. • Also, the population standard deviation σ may not be known, and the sample standard deviation s may not provide a good approximation for σ. Instead of using the statistic $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$, which requires a good approximation of σ, the statistic $t = (\bar{x} - \mu)/(s/\sqrt{n})$ is used.

t-statistic vs z-statistic • The t-statistic is very much like the z-statistic. It is mound shaped, symmetric, and has mean 0. The t-statistic is different in that it has two random quantities ($\bar{x}$ and s) while the z-statistic only has one ($\bar{x}$). • The variability of t depends on the sample size n. Variability is expressed as (n − 1) degrees of freedom (df). As df gets smaller, variability increases.

t-statistic vs z-statistic

Table for t-distribution: Table IV • The table for the t-distribution contains t-values for various combinations of degrees of freedom and tα. tα is the point where the upper tail of the t-distribution contains an area of α. • The last row, where df = ∞, contains the standard normal z-values.

Small-Sample 100(1 − α)% Confidence Interval for μ • $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$, where $t_{\alpha/2}$ is based on (n − 1) degrees of freedom. Conditions Required for a Valid Small-Sample Confidence Interval for μ 1. A random sample is selected from the target population. 2. The population has a relative frequency distribution that is approximately normal.

Example: • The data below represent a sample of the number of home fires started by candles for the past several years. Find the 99% confidence interval for the mean number of home fires started by candles each year. Exercises: p. 317 #33ab, 39

Example: p. 317 #33 The following random sample was selected from a normal distribution: 4, 6, 3, 5, 9, 3. a. Construct a 90% confidence interval for the population mean μ. b. Construct a 95% confidence interval for the population mean μ. c. Construct a 99% confidence interval for the population mean μ. d. Assume that the sample mean $\bar{x}$ and sample standard deviation s remain exactly the same as those you just calculated, but that they are based on a sample of n = 25 observations rather than n = 6 observations. Repeat parts a-c. What is the effect of increasing the sample size on the width of the confidence intervals?
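A minimal sketch of parts a to d follows. The Python standard library has no t-distribution quantile function, so the critical values are hard-coded from a standard t table (df = 5 and df = 24); that lookup table and the helper `t_interval` are assumptions added for illustration.

```python
from math import sqrt
from statistics import mean, stdev

data = [4, 6, 3, 5, 9, 3]
xbar, s = mean(data), stdev(data)   # stdev uses the n - 1 divisor

# Assumption: t_{alpha/2} values copied from a standard t table,
# keyed by degrees of freedom, then by confidence coefficient.
T_CRIT = {
    5:  {0.90: 2.015, 0.95: 2.571, 0.99: 4.032},
    24: {0.90: 1.711, 0.95: 2.064, 0.99: 2.797},
}

def t_interval(xbar, s, n, confidence):
    """Small-sample t interval for a mean; returns (low, high)."""
    t = T_CRIT[n - 1][confidence]
    margin = t * s / sqrt(n)
    return xbar - margin, xbar + margin

for n in (6, 25):                   # n = 25 is the hypothetical case in part d
    for level in (0.90, 0.95, 0.99):
        low, high = t_interval(xbar, s, n, level)
        print(f"n={n:2d}, {level:.0%}: ({low:.3f}, {high:.3f})")
```

For n = 6 the sample gives $\bar{x}$ = 5 and s ≈ 2.28. Rerunning with n = 25 shrinks every interval, both because $s/\sqrt{n}$ is smaller and because the t-values with 24 df are smaller than those with 5 df.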

Example: p. 318 #39 • Radioactive lichen. Refer to the Lichen Radionuclide Baseline Research project at the University of Alaska, presented in Exercise 2.36 (p. 47). Recall that the researchers collected 9 lichen specimens and measured the amount (in microcuries per milliliter) of the radioactive element cesium-137 for each. (The natural logarithms of the data values are saved in the LICHEN file.) A MINITAB printout with summary statistics for the actual data is shown below.

a. Give a point estimate for the mean amount of cesium in lichen specimens collected in Alaska. • b. Give the t-value used in a small-sample 95% confidence interval for the true mean amount of cesium in Alaskan lichen specimens. • c. Use the result you obtained in part b and the values of $\bar{x}$ and s shown on the MINITAB printout to form a 95% confidence interval for the true mean amount of cesium in Alaskan lichen specimens. • d. Check the interval you found in part c with the 95% confidence interval shown on the MINITAB printout. • e. Give a practical interpretation for the interval you obtained in part c.

7.4 Large-Sample Confidence Interval for a Population Proportion • Confidence intervals around a proportion are confidence intervals around the probability of success in a binomial experiment. • The sample statistic of interest is $\hat{p}$, where $\hat{p} = x/n$ is the number of successes x divided by the sample size n.

Sampling Distribution of $\hat{p}$

Large-Sample Confidence Interval for p. Conditions Required for a Valid Large-Sample Confidence Interval for p • A random sample is selected from the target population. • The sample size n is large. (A sample size is considered large if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$, where $\hat{q} = 1 - \hat{p}$; $n\hat{p}$ and $n\hat{q}$ are the number of successes and the number of failures in the sample.)

Example • A sample of 500 nursing applications included 60 from men. Find the 90% confidence interval for the true proportion of men who applied to the nursing program. Exercises: p. 325 #51, 55
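The nursing example can be worked through with a sketch along these lines. The function `proportion_interval` is hypothetical, and the default `min_count=15` encodes an assumed rule of thumb for "large sample" (at least 15 successes and 15 failures); adjust it if your text uses a different cutoff.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence, min_count=15):
    """Large-sample interval for a proportion; returns (low, high)."""
    failures = n - successes
    # Assumed large-sample check: enough successes and enough failures
    if successes < min_count or failures < min_count:
        raise ValueError("sample too small for the large-sample method")
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# 60 applications from men out of 500, 90% confidence
low, high = proportion_interval(60, 500, 0.90)
print(f"90% CI for p: ({low:.3f}, {high:.3f})")
```

Here $\hat{p}$ = 60/500 = .12, and the interval comes out to roughly (.096, .144).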

Example: p. 325 #51 • A random sample of size n = 196 yielded $\hat{p}$ = .64. • a. Is the sample size large enough to use the methods of this section to construct a confidence interval for p? Explain. • b. Construct a 95% confidence interval for p. • c. Interpret the 95% confidence interval. • d. Explain what is meant by the phrase “95% confidence interval.”

Example: p. 325#55 • Is Starbucks coffee overpriced? The Minneapolis Star Tribune (August 12, 2008) reported that 73% of Americans say that Starbucks coffee is overpriced. The source of this information was a national telephone survey of 1,000 American adults conducted by Rasmussen Reports. • Identify the population of interest in this study. • Identify the sample for the study. • Identify the parameter of interest in the study. • Find and interpret a 95% confidence interval for the parameter of interest.

Summary of Confidence Intervals A. Large-Sample Confidence Interval for a Population Mean • For random samples of size n ≥ 30, the confidence interval is expressed as $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. B. Small-Sample Confidence Interval for a Population Mean • The small-sample confidence interval is $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$, where $t_{\alpha/2}$ is based on (n − 1) degrees of freedom. C. Large-Sample Confidence Interval for a Population Proportion • The large-sample confidence interval for p is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, where $\hat{p} = x/n$ and $\hat{q} = 1 - \hat{p}$. • A sample size is considered large if both $n\hat{p} \ge 15$ and $n\hat{q} \ge 15$.

7.5 Determining the Sample Size A. Estimating a Population Mean • The width of a confidence interval depends on the sample size: as the sample size increases, the width of the interval decreases for any given confidence coefficient. • When we want to estimate μ to within a given number of units with a (1 − α) level of confidence, we can calculate the sample size needed by solving the equation $z_{\alpha/2}\,\sigma/\sqrt{n} = SE$ for n, where SE is half the interval width. This gives $n = (z_{\alpha/2})^2\sigma^2/(SE)^2$.

Sampling Error • The reliability associated with a confidence interval for the population mean μ is expressed using the sampling error SE within which we want to estimate μ with 100(1 − α)% confidence.

Example: p. 332 #71 • If you wish to estimate a population mean to within .2 with a 95% confidence interval and you know from previous sampling that σ² is approximately equal to 5.4, how many observations would you have to include in your sample?
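A small sketch of the sample-size calculation for this exercise is shown below. It assumes the usual formula $n = (z_{\alpha/2})^2\sigma^2/(SE)^2$ rounded up to the next whole observation; the function name `sample_size_for_mean` is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(variance, sampling_error, confidence):
    """Smallest n so that z_{alpha/2} * sigma / sqrt(n) <= sampling_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: a fractional observation cannot be sampled
    return ceil(z ** 2 * variance / sampling_error ** 2)

# sigma^2 = 5.4, estimate to within SE = .2, 95% confidence
n_required = sample_size_for_mean(5.4, 0.2, 0.95)
print(n_required)  # 519
```

The unrounded value is about 518.6, so 519 observations are needed.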

Sampling Error (SE) of a Confidence Interval for a Population Proportion p • The sampling error (SE) is half the width of the confidence interval. #71 p.

B. Estimating a Population Proportion #78, 81 p.

Example: p. 332#81 • Scanning errors at Wal-Mart. Refer to the National Institute for Standards and Technology (NIST) study of the accuracy of checkout scanners at Wal-Mart stores in California, presented in Exercise 3.52 (p. 132). NIST sets standards so that no more than 2 of every 100 items scanned through an electronic checkout scanner can have an inaccurate price. Recall that in a sample of 60 Wal-Mart stores, 52 violated the NIST scanner accuracy standard (Tampa Tribune, Nov. 22, 2005). Suppose you want to estimate the true proportion of Wal-Mart stores in California that violate the NIST standard.

a. Explain why the large-sample methodology of Section 7.4 is inappropriate for this study. • b. Determine the number of Wal-Mart stores that must be sampled in order to estimate the true proportion to within .05 with 90% confidence, using the large-sample method.
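Both parts can be checked with a sketch like the one below. It assumes the sample proportion 52/60 is used as the planning value for p in $n = (z_{\alpha/2})^2 pq/(SE)^2$, and that "large" means at least 15 successes and 15 failures; both the helper name and that cutoff are assumptions.

```python
from math import ceil
from statistics import NormalDist

violations, stores = 52, 60
failures = stores - violations      # stores that met the standard

# Part a: the assumed large-sample rule needs >= 15 of each outcome
large_enough = violations >= 15 and failures >= 15
print(f"failures = {failures}, large-sample method valid: {large_enough}")

def sample_size_for_proportion(p, sampling_error, confidence):
    """Smallest n so that z_{alpha/2} * sqrt(p*q/n) <= sampling_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / sampling_error ** 2)

# Part b: estimate p to within .05 with 90% confidence
n_required = sample_size_for_proportion(violations / stores, 0.05, 0.90)
print(n_required)  # 126
```

Only 8 of the 60 sampled stores met the standard, which is why the original sample is too small for the large-sample interval; the calculation suggests sampling about 126 stores.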
