Estimation of Means and Proportions

Concepts • Estimator: a rule that tells us how to estimate a value for a population parameter using sample data • Estimate: a specific value of an estimator for particular sample data

Concepts • A point estimator is a rule that tells us how to calculate a particular number from sample data to estimate a population parameter • An interval estimator is a rule that tells us how to calculate two numbers based on sample data, forming a confidence interval within which the parameter is expected to lie

Properties of a Good Estimator • Unbiasedness: mean of the sampling distribution of the estimator equals the true value of the parameter • Efficiency: The most efficient estimator among a group of unbiased estimators is the one with the smallest variance
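Both properties can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: a hypothetical normal population (mean 50, standard deviation 10), samples of size 25, and 2000 repetitions, none of which come from the slides. It compares the sample mean and the sample median as estimators of the population mean.

```python
import random
from statistics import mean, median, pvariance

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0  # hypothetical normal population (assumed values)
N, REPS = 25, 2000      # sample size and number of repeated samples

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(mean(sample))
    medians.append(median(sample))

# Unbiasedness: the average of each estimator should be close to MU.
avg_of_means = mean(means)
avg_of_medians = mean(medians)

# Efficiency: the estimator with the smaller variance is more efficient.
var_of_means = pvariance(means)
var_of_medians = pvariance(medians)

print(avg_of_means, avg_of_medians)
print(var_of_means, var_of_medians)
```

For a normal population both estimators center on the true mean, but the sample mean typically shows the smaller variance, which is what makes it the more efficient of the two.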

Estimation of a Population Mean • The CLT suggests that the sample mean x̄ may be a good estimator for the population mean μ. The CLT says that: • The sampling distribution of the sample mean will be approximately normally distributed regardless of the distribution of the sampled population if n is large • The sample mean is an unbiased estimator • The standard error of the sample mean is SE = σ/√n

Estimation of a Population Mean • A point estimator of the population mean is: x̄ • An interval estimator of the population mean is a (1 − α)100% confidence interval, x̄ ± z_{α/2} σ/√n, meaning that in repeated sampling the interval contains the true population parameter (1 − α)100% of the time, where z_{α/2} is the z value corresponding to an area α/2 in the upper tail of a standard normal distribution

Estimation of a Population Mean • Usually σ (the population standard deviation) is unknown. • If n is large enough (n ≥ 30) then we can approximate it with the sample standard deviation s.
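As a rough sketch of the large-sample interval x̄ ± z_{α/2} s/√n, the following uses only the Python standard library. The function name `large_sample_mean_ci` and the data are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def large_sample_mean_ci(sample, confidence=0.95):
    """Return (lower, upper) for x_bar ± z_{alpha/2} * s / sqrt(n).

    Intended for large samples (n >= 30), where s stands in for sigma.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divisor n - 1)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper-tail area alpha/2
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Made-up data: 36 hypothetical measurements
data = [48 + (i % 9) for i in range(36)]
low, high = large_sample_mean_ci(data)
print(low, high)
```

Passing `confidence=0.99` widens the interval, because z_{α/2} grows as α shrinks.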

One Sided Confidence Intervals • In some cases we are interested only in whether the population parameter falls above or below a certain value, so a single bound is enough • Lower One Sided Confidence Limit (LCL): • LCL = (point estimate) − z_α × (standard error of the estimator) • Upper One Sided Confidence Limit (UCL): • UCL = (point estimate) + z_α × (standard error of the estimator)
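A minimal sketch of the one-sided limits, assuming the usual form (point estimate) ± z_α × (standard error). Note that the whole tail area α sits on one side, so z_α is used rather than z_{α/2}. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def one_sided_limits(point_estimate, std_error, confidence=0.95):
    """Return (LCL, UCL); each is a separate one-sided limit."""
    alpha = 1 - confidence
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # area alpha in the upper tail
    lcl = point_estimate - z_alpha * std_error
    ucl = point_estimate + z_alpha * std_error
    return lcl, ucl

# Hypothetical point estimate and standard error
lcl, ucl = one_sided_limits(point_estimate=52.0, std_error=0.5)
print(lcl, ucl)
```

Each limit is read on its own: with 95% confidence the parameter is above LCL, or, separately, with 95% confidence it is below UCL.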

Small Sample Estimation of a Population Mean • If n is large, we can use the sample standard deviation s as a reliable estimator of the population standard deviation σ • If n is large, the sampling distribution of the sample mean is approximately normal no matter what distribution the population has • As the sample size n decreases, the sample standard deviation s becomes a less reliable estimator of the population standard deviation (because we are using less information from the underlying distribution to compute s) • How do we deal with this issue?

t Distribution • Assume (1) the underlying population is normally distributed and (2) the sample is small and σ is unknown • Using the sample standard deviation s to replace σ, the statistic t = (x̄ − μ)/(s/√n) follows the t distribution

Properties of the t Distribution • mound-shaped • perfectly symmetric about t=0 • more variable than z (the standard normal distribution) • affected by the sample size n (as n increases s becomes a better approximation for σ) • n-1 is the degrees of freedom (d.f.) associated with the t statistic

More on the t Distribution • Remember the t-distribution is based on the assumption that the sampled population possesses a normal probability distribution. • This is a very restrictive assumption. • Fortunately, it can be shown that for non-normal but mound-shaped distributions, the distribution of the t statistic is nearly the same shape as the theoretical t-distribution for a normal distribution. • Therefore the t distribution is still useful for small sample estimation of a population mean even if the underlying distribution of x is not known to be normal

How to use the t-distribution table • The t-distribution table is in the book (Appendix II, Table 4, p. 611). • t_α is the value of t such that an area α lies to its right. • To use the table: • Determine the degrees of freedom • Determine the appropriate value of α • Look up the value for t_α
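Once the table value is in hand, the small-sample interval x̄ ± t_{α/2} s/√n follows directly. The sketch below takes the critical value as an argument, since the Python standard library has no t quantile function. The data are made up, and 2.262 is the tabulated t_{0.025} for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def small_sample_mean_ci(sample, t_crit):
    """Return (lower, upper, df) for x_bar ± t_crit * s / sqrt(n).

    t_crit is t_{alpha/2} read from a t table with df = n - 1.
    """
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divisor n - 1)
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin, n - 1

# Made-up sample of n = 10 observations
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.2, 10.4]

# 95% confidence -> alpha/2 = 0.025; with 9 d.f. the table gives t = 2.262
low, high, df = small_sample_mean_ci(sample, t_crit=2.262)
print(df, low, high)
```

Because 2.262 is larger than the corresponding z value of about 1.96, the t interval is wider than a z interval built from the same data.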

Table: t Distribution

The Difference Between Two Means • Suppose independent samples of n1 and n2 observations have been selected from populations with means μ1, μ2 and variances σ1², σ2² • The sampling distribution of the difference in sample means (x̄1 − x̄2) will have the following properties

The Difference Between Two Means • The mean and standard deviation of (x̄1 − x̄2) are μ1 − μ2 and √(σ1²/n1 + σ2²/n2) • If the sampled populations are normally distributed, the sampling distribution of (x̄1 − x̄2) is exactly normally distributed regardless of n1 and n2 • If the sampled populations are not normally distributed, the sampling distribution of (x̄1 − x̄2) is approximately normally distributed when n1 and n2 are large

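Point Estimation of the Difference Between Two Means • Point Estimator: (x̄1 − x̄2) • A (1 − α)100% confidence interval for (μ1 − μ2) is (x̄1 − x̄2) ± z_{α/2} √(σ1²/n1 + σ2²/n2), with s1² and s2² used in place of σ1² and σ2² when the population variances are unknown and both samples are large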
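A sketch of the large-sample interval for (μ1 − μ2), working from summary statistics and assuming the sample standard deviations stand in for the unknown population values. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(x_bar1, s1, n1, x_bar2, s2, n2, confidence=0.95):
    """Return (lower, upper) for (x_bar1 - x_bar2) ± z * sqrt(s1²/n1 + s2²/n2)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x_bar1 - x_bar2
    return diff - z * std_error, diff + z * std_error

# Hypothetical summary statistics for two independent large samples
low, high = diff_means_ci(x_bar1=75.0, s1=8.0, n1=64,
                          x_bar2=72.0, s2=6.0, n2=36)
print(low, high)
```

If the resulting interval excludes 0, the data suggest the two population means differ at the chosen confidence level.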

Difference Between Two Means (small sample) • If n1 and n2 are small, then the statistic t = [(x̄1 − x̄2) − (μ1 − μ2)] / [s √(1/n1 + 1/n2)] follows the t distribution with n1 + n2 − 2 degrees of freedom if the following assumptions are satisfied: 1. Both samples are drawn from populations with a normal distribution 2. Both populations have equal variances

Difference Between Two Means (small sample) • In practice, the t statistic is still appropriate even if the underlying distributions are not exactly normally distributed. • To compute s, we can pool the information from both samples: s² = [Σ(x1i − x̄1)² + Σ(x2i − x̄2)²] / (n1 + n2 − 2) or s² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

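Difference Between Two Means (small sample) • Point Estimate: (x̄1 − x̄2) • Interval Estimate: a (1 − α)100% confidence interval for (μ1 − μ2) is (x̄1 − x̄2) ± t_{α/2} s √(1/n1 + 1/n2), where s is computed using the pooled estimate described earlier and t_{α/2} is based on n1 + n2 − 2 degrees of freedom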
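The pooled calculation can be sketched as follows, assuming equal population variances and a critical value read from the t table. The summary statistics are invented, and 2.101 is the tabulated t_{0.025} for 18 degrees of freedom.

```python
from math import sqrt

def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate s² = [(n1-1)s1² + (n2-1)s2²] / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

def pooled_t_ci(x_bar1, s1, n1, x_bar2, s2, n2, t_crit):
    """Return (lower, upper, df) for (x_bar1 - x_bar2) ± t_crit * s * sqrt(1/n1 + 1/n2)."""
    s = sqrt(pooled_variance(s1, n1, s2, n2))
    std_error = s * sqrt(1 / n1 + 1 / n2)
    diff = x_bar1 - x_bar2
    df = n1 + n2 - 2
    return diff - t_crit * std_error, diff + t_crit * std_error, df

# Hypothetical small samples: n1 = n2 = 10, so df = 18 and t_{0.025} = 2.101
s_squared = pooled_variance(s1=3.0, n1=10, s2=1.0, n2=10)
low, high, df = pooled_t_ci(15.0, 3.0, 10, 12.0, 1.0, 10, t_crit=2.101)
print(s_squared, df, low, high)
```

The pooled variance is a weighted average of s1² and s2², with weights given by each sample's degrees of freedom.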

Sampling Distribution of Sample Proportions • Recall from Chapter 6: • If a random sample of n objects is selected from the population and if x of these possess a characteristic of interest, the sample proportion is p̂ = x/n • The sampling distribution of p̂ will have mean p and standard deviation √(pq/n), where q = 1 − p

Estimators for p • Assuming n is sufficiently large and the interval p̂ ± 2√(p̂q̂/n) lies in the interval from 0 to 1: • Point Estimator for p: p̂ = x/n • Interval Estimator for p: a (1 − α)100% confidence interval for p is p̂ ± z_{α/2} √(p̂q̂/n), where q̂ = 1 − p̂
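A short sketch of the large-sample interval p̂ ± z_{α/2} √(p̂q̂/n). The counts are hypothetical and the function name is an illustration only.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Return (p_hat, lower, upper) for p_hat ± z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sqrt(p_hat * q_hat / n)
    return p_hat, p_hat - z * std_error, p_hat + z * std_error

# Hypothetical survey: 60 of 100 sampled objects have the characteristic
p_hat, low, high = proportion_ci(x=60, n=100)
print(p_hat, low, high)
```

The normal approximation behind this interval is only reasonable for large n with p̂ not too close to 0 or 1.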

Estimating the Difference Between Two Binomial Proportions • Point estimate: (p̂1 − p̂2) • A (1 − α)100% confidence interval for the difference (p1 − p2) is (p̂1 − p̂2) ± z_{α/2} √(p̂1q̂1/n1 + p̂2q̂2/n2)
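The same pattern extends to two independent binomial samples. This is a sketch with invented counts, assuming both samples are large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for (p1_hat - p2_hat) ± z * sqrt(p1q1/n1 + p2q2/n2)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * std_error, diff + z * std_error

# Hypothetical counts: 60 of 100 in sample 1, 45 of 100 in sample 2
low, high = diff_proportions_ci(x1=60, n1=100, x2=45, n2=100)
print(low, high)
```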

Choosing Sample Size • How many measurements should be included in the sample? • Increasing n increases the precision of the estimate, but increasing n is costly • Answer depends on: • What level of confidence do you want to have (i.e., the value of (1 − α)100%)? • What is the maximum difference (B) you want to permit between the estimate of the population parameter and the true population parameter?

Choosing Sample Size • Once you have chosen B and α, you can solve the following equation for sample size n: z_{α/2} × (standard error of the estimator) = B • If the resulting value of n is less than 30 and an estimate
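Solving z_{α/2} × (standard error) = B for n gives n = (z_{α/2} σ / B)² for a mean and n = z_{α/2}² pq / B² for a proportion. The sketch below assumes a planning value for σ or p is available. The function names and inputs are hypothetical, and the result is rounded up to the next whole observation.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / bound) ** 2)

def sample_size_for_proportion(p, bound, confidence=0.95):
    """Smallest n with z_{alpha/2} * sqrt(p * (1 - p) / n) <= bound."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / bound**2)

# Assumed planning values: sigma = 10 with B = 2; p = 0.5 with B = 0.05
n_mean = sample_size_for_mean(sigma=10.0, bound=2.0)
n_prop = sample_size_for_proportion(p=0.5, bound=0.05)
print(n_mean, n_prop)
```

Using p = 0.5 gives the most conservative (largest) sample size for a proportion, because pq is largest there.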
