Estimating Parameters


Parameter Estimation We use statistics to estimate parameters, e.g., the effectiveness of pilot training or of psychotherapy. We want to know how good our estimates are. The most common ways to judge a statistic as an estimator are its bias and its standard error. We will define both, but first:

Sampling Distribution • A sampling distribution is the distribution of a statistic over many samples. • To get a sampling distribution: • 1. Take a sample of size N (a given number like 5, 10, or 1000) from a population. • 2. Compute the statistic (e.g., the mean) and record it. • 3. Repeat steps 1 and 2 many times (in theory, infinitely many times). • 4. Plot the resulting sampling distribution, the distribution of the statistic over repeated samples.
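The four steps above can be sketched as a short simulation. This is a minimal sketch, assuming NumPy is available; the population of 100,000 normally distributed heights (mean 68 in, SD 6 in) is invented for illustration, and "infinitely many" repetitions are approximated by 10,000. Samples are drawn with replacement, so the population behaves as if it were infinite.

```python
import numpy as np

def sampling_distribution_of_mean(population, n, reps=10_000, seed=0):
    """Steps 1-3: draw `reps` samples of size `n` and record each sample mean."""
    rng = np.random.default_rng(seed)
    # Each row is one sample of size n (drawn with replacement).
    samples = rng.choice(population, size=(reps, n), replace=True)
    return samples.mean(axis=1)

# Hypothetical population of heights in inches (illustration only).
rng = np.random.default_rng(42)
population = rng.normal(loc=68, scale=6, size=100_000)

means = sampling_distribution_of_mean(population, n=25)
# The mean of the sampling distribution sits close to the population mean...
print(round(population.mean(), 2), round(means.mean(), 2))
# ...while its SD is much smaller than the population SD.
print(round(population.std(), 2), round(means.std(), 2))
```

For step 4, `means` can be passed to any histogram function, for example matplotlib's `plt.hist`.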

Sampling Distribution • Class exercise • Measure the heights of some people and graph them. Find the mean. • Take subsamples of different sizes N and compute the mean height of each. Graph the results. • What happens as N gets larger?

Bias • If the mean of the sampling distribution equals the parameter, the statistic is said to be unbiased. • If the mean of the sampling distribution does not equal the parameter, the statistic is biased. • The sample mean is an unbiased estimator: the average value of $\bar{X}$ over repeated samples is $\mu$, that is, $E(\bar{X}) = \mu$.

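Bias • The sample variance and standard deviation computed with N in the denominator are biased estimators of their population values: on average they come out too small. Fortunately, the variance can be made unbiased with a simple correction: use N-1 instead of N in the denominator, $\hat{\sigma}^2 = \frac{\sum (X - \bar{X})^2}{N-1}$. The SD computed from this corrected variance is still slightly biased, but the bias is small and shrinks as N grows. Most statistical packages (e.g., SPSS) use N-1 by default. (The hat means a sample estimate of a parameter.)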
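The effect of the N versus N-1 denominator can be checked by simulation. This is a minimal sketch, assuming NumPy; the population SD of 6 (variance 36) and the sample size of 5 are arbitrary choices for illustration. In NumPy, `ddof=0` divides by N and `ddof=1` divides by N-1.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 6.0      # hypothetical population SD, so the population variance is 36
n = 5            # small samples make the bias easy to see
reps = 200_000

samples = rng.normal(loc=68, scale=sigma, size=(reps, n))
var_n = samples.var(axis=1, ddof=0)    # divide by N
var_n1 = samples.var(axis=1, ddof=1)   # divide by N - 1
sd_n1 = samples.std(axis=1, ddof=1)    # square root of the corrected variance

print(var_n.mean())    # about 36 * (n - 1) / n = 28.8: biased low
print(var_n1.mean())   # about 36: unbiased
print(sd_n1.mean())    # a bit below 6: the SD stays slightly biased
```

Note that NumPy's `var` and `std` divide by N unless `ddof=1` is passed, so it is worth checking which denominator a given package uses.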

Standard Error • We would like the statistic to be unbiased so that, on average, the statistic equals the parameter. • We would also like the individual estimates to be close to the parameter; the closer the better. • The standard deviation of the sampling distribution is called the standard error of the statistic. For an unbiased statistic, it tells roughly how far the statistic typically falls from the parameter, so a smaller standard error means a more precise estimate.

Standard Error Notice that the mean of the sampling distribution is close to the mean of the population. The standard deviation of the sampling distribution is much smaller than the SD of the population.

Standard Error • There is a simple relationship between the standard error of the mean, the population SD, and the sample size: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$ • where $\sigma_{\bar{X}}$ is the standard error, that is, the SD of the sampling distribution of the mean, $\sigma$ is the SD of the population, and N is the sample size.
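The formula can be compared against a simulated sampling distribution. This is a minimal sketch, assuming NumPy and a hypothetical normal population with SD 6; for each N it prints the formula value and the SD of 20,000 simulated sample means.

```python
import numpy as np

sigma = 6.0                          # hypothetical population SD
rng = np.random.default_rng(7)
results = {}

for n in (4, 25, 100):
    formula_se = sigma / np.sqrt(n)  # sigma / sqrt(N)
    # 20,000 simulated samples of size n; the SD of their means estimates the SE.
    means = rng.normal(68, sigma, size=(20_000, n)).mean(axis=1)
    results[n] = (formula_se, means.std())
    print(n, round(formula_se, 3), round(means.std(), 3))
```

The two columns should agree closely for every N, and both shrink as N grows.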

Standard Error • This means that the standard error gets large when the population SD is large and when our sample size is small. • We can make our estimates as precise as we want (small standard error) by increasing the size of the sample, that is, by using more participants in our research. Because N appears under a square root, halving the standard error takes four times as many participants.
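Solving $\sigma / \sqrt{N} \le SE$ for N gives $N \ge (\sigma / SE)^2$. This is a minimal sketch, assuming the population SD is known; the helper name `required_n` is hypothetical.

```python
import math

def required_n(sigma, target_se):
    """Smallest N with sigma / sqrt(N) <= target_se (sigma assumed known)."""
    return math.ceil((sigma / target_se) ** 2)

print(required_n(6, 1.2))   # 25
print(required_n(6, 0.6))   # 100: half the SE needs four times the N
print(required_n(6, 0.3))   # 400
```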

Review • What do these terms mean? • Sampling distribution • Bias • Standard error

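Central Limit Theorem • As N increases, the sampling distribution of the mean approaches a normal distribution, whatever the shape of the population (as long as the population has a finite variance). • Notice: • Location of means. • Size of sampling variances. • Shape of distributions.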
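The three things to notice (location, spread, shape) can be seen in a simulation that starts from a strongly skewed population. This is a minimal sketch, assuming NumPy; the exponential population (mean 1, SD 1, skewness 2) is an arbitrary choice for illustration, and skewness is computed as the mean cubed z-score.

```python
import numpy as np

def skewness(x):
    """Mean cubed z-score: 0 for a symmetric distribution."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(3)
results = {}
for n in (1, 5, 30, 100):
    # 20,000 sample means, each from a sample of size n.
    means = rng.exponential(scale=1.0, size=(20_000, n)).mean(axis=1)
    results[n] = (means.mean(), means.std(), skewness(means))
    print(n, *(round(v, 3) for v in results[n]))
```

The location stays near 1, the spread shrinks like $1/\sqrt{N}$, and the skewness drops toward 0 as the distribution of means becomes more nearly normal.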

Descriptive vs. Inferential Statistics • The mean and standard deviation can be used in 2 ways. One way is to describe the distribution of data (our sample mean is $\bar{X}$). • The other way is to infer something about a population (e.g., is the population mean 25 or more?). • Because the sampling distribution of the mean is approximately normal (by the central limit theorem), we can use the normal distribution to show how close the parameter is likely to be to the sample mean and to make decisions about treatments.

Statistical Tables • There are several well-studied statistical tables that are used for conducting statistical tests. • One of these is z, the unit normal. This table shows areas or percentages that correspond to various distances from the mean when measured in SD units. We use z for large sample tests.
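The areas in a z table can also be computed directly. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8 or later); the distances 1, 1.96, and 2.58 SDs are common textbook choices.

```python
from statistics import NormalDist

z = NormalDist()                      # unit normal: mean 0, SD 1
areas = {}
for d in (1.0, 1.96, 2.58):
    areas[d] = z.cdf(d) - z.cdf(-d)   # area within d SDs of the mean
    print(d, round(areas[d], 4))

z_crit = z.inv_cdf(0.975)             # leaves 2.5% in the upper tail
print(round(z_crit, 2))
```

About 68% of the area lies within 1 SD of the mean, 95% within 1.96 SDs, and 99% within 2.58 SDs.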

Statistical Tables • Another commonly used table is t. The t distribution has the same bell shape as z, but it is more spread out, and increasingly so as the sample size gets smaller. • t takes into account the extra error that comes from estimating the population SD from a small sample. The values of z and t are nearly identical if N > 100.

Example values from t and z
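Example two-tailed 95% critical values can be generated with SciPy. This is a minimal sketch, assuming SciPy is installed; the sample sizes are arbitrary, and t uses N-1 degrees of freedom.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)              # two-tailed 95% critical value for z
t_table = {}
for n in (5, 10, 30, 100, 1000):
    t_table[n] = stats.t.ppf(0.975, df=n - 1)   # t with N - 1 degrees of freedom
    print(n, round(t_table[n], 3), round(z_crit, 3))
```

The t values start well above 1.96 for small N (about 2.78 at N = 5) and close in on the z value as N grows.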

Confidence Intervals • Estimate a parameter. • Margin of error: use a confidence interval to bracket the estimate (e.g., opinion polls: 40% ± 2%). • Because we know the sampling distribution of the mean, we can construct a confidence interval around the estimate of the parameter.

Estimated Mean with Confidence Interval (1) • We have raw data and want to estimate a population mean. • Because $\bar{X}$ is an unbiased estimate of $\mu$, it is our best bet for the value of the parameter. • The standard error of the mean shows roughly how far the statistic typically is from the parameter. • Because the sampling distribution of the mean is approximately normal, we can use the normal distribution to construct an interval that will probably contain the population mean.

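Estimated Mean With Confidence Interval (2) • Suppose we want a 95% confidence interval: if we repeated the study many times, about 95 of every 100 intervals built this way would contain the population mean. • 95%CI = $\bar{X} \pm t_{.975,\,N-1}\,\hat{\sigma}_{\bar{X}}$, where $\hat{\sigma}_{\bar{X}} = \hat{\sigma}/\sqrt{N}$ • That is, the sample mean plus or minus the value of t that cuts off the middle 95 percent of the t distribution (with N-1 degrees of freedom), times the estimated standard error of the mean. This t value is called a critical value.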

Estimated Mean With Confidence Interval (3) Find the value of t

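Example Confidence Interval • We want to estimate the mean height of students at USF. We sampled N = 100 students and found a mean of 68 in and an SD of 6 in. • Our best guess for the population mean is 68 inches, give or take some. • 95%CI = $\bar{X} \pm t_{.975,\,N-1}\,\hat{\sigma}/\sqrt{N}$ • 95%CI = 68 ± (1.98)[6/sqrt(100)], where 1.98 is the critical t value with 99 degrees of freedom • 68 ± 1.98(.6) = 68 ± 1.19 • The interval is 66.81 to 69.19. Intervals constructed this way contain the population mean 95% of the time.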
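The arithmetic in the USF example can be reproduced from the summary statistics alone. This is a minimal sketch, assuming SciPy for the critical t value; the function name `mean_ci` is hypothetical.

```python
import math
from scipy import stats

def mean_ci(mean, sd, n, level=0.95):
    """CI for a population mean: mean +/- t_crit * sd / sqrt(n)."""
    se = sd / math.sqrt(n)                           # estimated standard error
    t_crit = stats.t.ppf((1 + level) / 2, df=n - 1)  # two-tailed critical value
    margin = t_crit * se
    return mean - margin, mean + margin

low, high = mean_ci(mean=68, sd=6, n=100)
print(round(low, 2), round(high, 2))   # 66.81 69.19
```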

Example Confidence Interval Note. Sample mean is close to population mean. Confidence interval is computed about the sample mean. The confidence interval contains the population mean! Yay!
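The "95% of the time" claim refers to repeated sampling, and it can be checked by building many intervals and counting how often they contain the true mean. This is a minimal sketch, assuming NumPy and SciPy; the population (mean 68, SD 6) and the 10,000 repetitions are hypothetical choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
mu, sigma, n, reps = 68.0, 6.0, 100, 10_000   # hypothetical population and study size

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)   # estimated SE for each sample
t_crit = stats.t.ppf(0.975, df=n - 1)

# True where an interval brackets the population mean.
covered = (means - t_crit * ses <= mu) & (mu <= means + t_crit * ses)
print(covered.mean())   # close to 0.95
```

Any single interval either contains the population mean or does not; the 95% describes the long-run proportion of intervals that do.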

Review • What is the central limit theorem? • What is the difference between descriptive and inferential statistics? • What is a confidence interval?
