Estimation

Estimators & Estimates

  • Estimators are the random variables used to estimate population parameters, while the specific values of these variables are the estimates.

  • Example: the estimator of μ is often the sample mean,

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

but if the observed values of X are 1, 2, 3, and 6, the estimate is 3.

So the estimator is a formula; the estimate is a number.
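The distinction can be sketched in Python: the function plays the role of the estimator, and the number it returns for one particular sample is the estimate. The name `sample_mean` is illustrative and does not come from the slides.

```python
def sample_mean(xs):
    """Estimator: a rule (formula) that can be applied to any sample."""
    return sum(xs) / len(xs)

# Estimate: the number the rule produces for one observed sample.
observed = [1, 2, 3, 6]
estimate = sample_mean(observed)
print(estimate)  # 3.0
```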



Properties of a Good Estimator

  • Unbiasedness

  • Efficiency

  • Sufficiency

  • Consistency


Unbiasedness

  • An estimator $\hat{\theta}$ (“theta hat”) is unbiased if its expected value equals the value of the parameter θ (theta) being estimated. That is,

$$E(\hat{\theta}) = \theta$$

In other words, on average the estimator is right on target.


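Examples

  • The sample mean is an unbiased estimator of the population mean: $E(\bar{X}) = \mu$.

  • The sample variance is an unbiased estimator of the population variance: $E(s^2) = \sigma^2$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$.

If we divided by n instead of by n-1, we would not have an unbiased estimator of σ². That is why s² is defined the way it is.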
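A small exact check of this claim, under assumptions that are not in the slides: a hypothetical population {1, 2, 3} with equally likely values, and samples of size 2 drawn with replacement. Averaging each version of the variance formula over every possible sample gives its expected value exactly.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population, each value equally likely
n = 2                    # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# Every equally likely sample of size n drawn with replacement.
samples = list(product(population, repeat=n))
mean_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
mean_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2, mean_div_n_minus_1, mean_div_n)  # 2/3 2/3 1/3
```

Dividing by n-1 reproduces σ² on average, while dividing by n falls short.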


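Bias

  • The bias of an estimator is the difference between its expected value and the parameter: $\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$.

  • The bias of an unbiased estimator is zero.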


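Mean Squared Error (MSE)

$$\text{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right] = \text{Var}(\hat{\theta}) + \left[\text{Bias}(\hat{\theta})\right]^2$$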



Efficiency

  • The most efficient estimator is the one with the smallest MSE.


For unbiased estimators (where the bias is zero), MSE = σ², the variance of the estimator, because MSE = variance + bias².

So if you are comparing unbiased estimators, the most efficient one is the one with the smallest variance.

If you have two estimators, one of which has a small bias & a small variance and the other has no bias but a large variance, the more efficient one may be the one that is just slightly off on average, but that is more frequently in the right vicinity.
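A minimal sketch of that comparison, using the decomposition MSE = variance + bias². The bias and variance figures below are made up purely for illustration.

```python
def mse(bias, variance):
    """Mean squared error as variance plus squared bias."""
    return variance + bias ** 2

# Hypothetical estimators, chosen only to illustrate the trade-off.
slightly_biased = mse(bias=0.5, variance=1.0)   # 1.25
unbiased_noisy = mse(bias=0.0, variance=4.0)    # 4.0

print(slightly_biased < unbiased_noisy)  # True
```

Here the slightly biased estimator has the smaller MSE, so by this criterion it is the more efficient of the two.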


Example: sample mean & median

  • As we have found, the sample mean is an unbiased estimator of μ.

  • It turns out that, for a symmetric population such as the normal, the sample median is also an unbiased estimator of μ.

  • We know the variance of the sample mean is σ²/n.

  • For a normal population and large n, the variance of the sample median is approximately (π/2)(σ²/n).

  • Since π is about 3.14, π/2 > 1.

  • So the variance of the sample median is greater than σ²/n, the variance of the sample mean.

  • Since both estimators are unbiased, the one with the smaller variance (the sample mean) is the more efficient one.

  • In fact, for a normal population, the sample mean has the smallest variance among all unbiased estimators of μ.
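The comparison can be checked by simulation. This is a rough sketch, assuming a standard normal population and arbitrary settings (seed, sample size 101, 2000 repetitions) that are not part of the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
MU, SIGMA, N, REPS = 0.0, 1.0, 101, 2000  # illustrative settings

means, medians = [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)
var_median = statistics.pvariance(medians)
ratio = var_median / var_mean  # theory suggests a value near pi/2 for large n
print(var_mean, var_median, ratio)
```

The ratio of the two variances should land in the neighborhood of π/2, with some simulation noise.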



Sufficiency

  • An estimator is said to be sufficient if it uses all the information about the population parameter that the sample can provide.



Examples

  • Example 1: The sample median is not a sufficient estimator because it uses only the ranking of the observations, and not their numerical values [with the exception of the middle one(s)].

  • Example 2: The sample mean, however, uses all the information, and therefore is a sufficient estimator.



Consistency

  • An estimator is said to be consistent if it yields estimates that converge in probability to the population parameter being estimated as n approaches infinity.

  • In other words, as the sample size increases, the estimator spends more and more of its time closer and closer to the parameter value.

  • One way that an estimator can be consistent is for its bias and its variance to approach zero as the sample size approaches infinity.


Example of a consistent estimator

[Figure: distribution of the estimator when n = 5, n = 50, and n = 500, plotted against the parameter value μ.]

As the sample size increases, the bias & the variance are both shrinking.


Example: Sample Mean

  • We know that the mean of $\bar{X}$ is μ.

  • So its bias not only goes to zero as n approaches infinity, its bias is always zero.

  • The variance of the sample mean is σ²/n.

  • As n approaches infinity, that variance approaches zero.

  • So, since both the bias and the variance go to zero, as n approaches infinity, the sample mean is a consistent estimator.


A great estimator: the sample mean

We have found that the sample mean is a great estimator of the population mean μ.

It is unbiased,

efficient,

sufficient,

& consistent.



Point Estimators versus Interval Estimators

  • Up until now we have considered point estimators that provide us with a single value as an estimate of a desired parameter.

  • It is unlikely, however, that our estimate will precisely equal our parameter.

  • We, therefore, may prefer to report something like this: We are 95% certain that the parameter is between “a” and “b.”

  • This statement is a confidence interval.


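Building a Confidence Interval

[Figure: standard normal curve with the area 0.4750 shaded between Z = 0 and Z = 1.96; -1.96 is marked on the left.]

  • We know that Pr(0 < Z < 1.96) = 0.4750

  • Then Pr(-1.96 < Z < 1.96) = 0.95

We also know that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is distributed as a standard normal (Z).

So there is a 95% probability that

$$-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96$$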


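Continuing from: with 95% probability,

$$-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96$$

Multiplying through by $\sigma/\sqrt{n}$,

$$-1.96\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\frac{\sigma}{\sqrt{n}}$$

Subtracting off $\bar{X}$,

$$-\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$

Multiplying by -1 and flipping the inequalities appropriately,

$$\bar{X} + 1.96\frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96\frac{\sigma}{\sqrt{n}}$$

Flipping the entire expression,

$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$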


So we have a 95% Confidence Interval for the Population Mean μ

$$\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$


Example: Suppose a sample of 25 students at a university has a sample mean IQ of 127. If the population standard deviation is 5.4, calculate the 95% confidence interval for the population mean.

$$127 - 1.96\frac{5.4}{\sqrt{25}} < \mu < 127 + 1.96\frac{5.4}{\sqrt{25}}$$

$$124.88 < \mu < 129.12$$

We are 95% certain that the population mean is between 124.88 & 129.12 .
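The same calculation as a short Python sketch. The function name `z_interval` is an illustrative choice; the critical value comes from the standard library's `statistics.NormalDist` rather than from a printed Z table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=127, sigma=5.4, n=25)
print(round(low, 2), round(high, 2))  # 124.88 129.12
```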


When we say we are 95% certain that the population mean μ is between 124.88 & 129.12, it means this:

  • The population mean μ is a fixed number, but we don’t know what it is.

  • Our confidence intervals, however, vary with the random sample that we take.

  • Sometimes we get a more typical sample, sometimes a less typical one.

  • If we took 100 random samples and from them calculated 100 confidence intervals, we would expect about 95 of the intervals to contain the population mean that we are trying to estimate.
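This repeated-sampling interpretation can be illustrated with a simulation. The sketch below assumes a normal population whose true mean is set to 127 (in practice unknown), reuses σ = 5.4 and n = 25 from the IQ example, and uses an arbitrary seed and 1000 trials.

```python
import random
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable
MU, SIGMA, N, Z = 127.0, 5.4, 25, 1.96  # MU is the "unknown" truth in this toy setup
TRIALS = 1000

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    margin = Z * SIGMA / sqrt(N)
    if xbar - margin < MU < xbar + margin:
        hits += 1  # this interval captured the true mean

coverage = hits / TRIALS
print(coverage)  # typically close to 0.95
```

The mean stays fixed across trials; it is the interval that moves from sample to sample.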


What if we want a confidence level other than 95%?

In our formula, the 1.96 came from the fact that Z will be between -1.96 and 1.96 95% of the time.

To get a different confidence level, all we need to do is find the Z values such that we are between them the desired percent of the time.

Using that Z value, we have the general formula for the confidence interval for the population mean μ:

$$\bar{X} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\frac{\sigma}{\sqrt{n}}$$


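Notice: In our confidence interval formula, we used “less than” symbols:

$$\bar{X} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\frac{\sigma}{\sqrt{n}}$$

Your textbook uses “less than or equal to” symbols:

$$\bar{X} - z\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z\frac{\sigma}{\sqrt{n}}$$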

Either of these is acceptable. Recall that the formula is built upon the concept of the normal probability distribution. The probability that a continuous variable is exactly equal to any particular number is zero. So it doesn’t matter whether you include the endpoints of the interval or not.


Determining Z values for confidence intervals

[Figure: standard normal curve with central area 0.9800 between -k and k (0.4900 on each side of 0); k = 2.33.]

  • Suppose we want a 98% confidence interval.

  • We need to find 2 values, call them –k and k, such that Z is between them 98% of the time.

  • Then Z will be between 0 and k with probability half of 0.98, which is 0.49 .

  • Look in the body of the Z table for the value closest to 0.49, which is 0.4901 .

  • The number on the border of the table corresponding to 0.4901 is 2.33.

  • So that is your value of k, and the number you use for Z in your confidence interval.



Sometimes 2 numbers in the Z table are equally close to the value you want.

  • For example, if you want a 90% confidence interval, you look for half of 0.90 in the body of the Z table, that is, 0.45.

  • You find 0.4495 and 0.4505. Both are off by 0.0005.

  • The number on the border of the table corresponding to 0.4495 is 1.64.

  • The number corresponding to 0.4505 is 1.65.

  • Usually in these cases, we use the average of 1.64 and 1.65, which is 1.645.

  • Similarly for the 99% confidence interval, we usually use 2.575. (Draw your graph & work through the logic of this number.)
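As a cross-check on the table lookups, the critical values can be computed directly. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `z_value` is illustrative. The table-based figures 2.33 and 2.575 are roundings of what the code returns.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value: Z lies within +/- z with this probability."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(z_value(level), 3))
```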



Which interval is wider: One with a higher confidence level (such as 99%) or one with a lower confidence level (such as 90%)?

  • Let’s think it through using an unrealistic but slightly entertaining example.



You have the misfortune of being stranded on an island, with a cannibal & a bunch of bears.

  • It gets worse…

  • You get captured by the cannibal.

  • The cannibal, who knows the island well, decides to give you a chance to avoid being dinner.

  • He says if you can correctly estimate the number of bears, he’ll let you go.

  • To give you a fighting chance, he’ll let you give him an interval estimate.



You think that there are probably about a hundred bears on the island.

  • Would you be more confident of not being dinner if you gave the cannibal a narrow interval like 90 to 110 bears, or a wider one like 75 to 125 bears?

  • You would definitely be more confident with the wider interval.

  • Thus, when the confidence level needs to be very high (such as 99%), the interval needs to be wide.


Let’s redo the IQ example with a different confidence level.

We had a sample of 25 students with a sample mean IQ of 127. The population standard deviation was 5.4 . Calculate the 99% confidence interval for the population mean.

Our general formula is:

$$\bar{X} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\frac{\sigma}{\sqrt{n}}$$

We said that the Z value for 99% confidence is 2.575.

Putting in our values,

$$127 - 2.575\frac{5.4}{\sqrt{25}} < \mu < 127 + 2.575\frac{5.4}{\sqrt{25}}$$

or 124.22 < μ < 129.78


We had for the 95% confidence interval:

124.88 < μ < 129.12

We just got for the 99% confidence interval:

124.22 < μ < 129.78

The 99% confidence interval starts a little lower & ends a little higher than the 95% interval. So the 99% interval is wider than the 95% interval, as we said it should be.
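A quick numerical comparison of the two intervals, using the table values 1.96 and 2.575 from the slides. The helper name `interval` is an illustrative choice.

```python
from math import sqrt

def interval(xbar, sigma, n, z):
    """Interval for the mean with known sigma and a given z value."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

ci95 = interval(127, 5.4, 25, 1.96)    # table z for 95%
ci99 = interval(127, 5.4, 25, 2.575)   # table z for 99%

print([round(v, 2) for v in ci95])  # [124.88, 129.12]
print([round(v, 2) for v in ci99])  # [124.22, 129.78]
print(ci99[1] - ci99[0] > ci95[1] - ci95[0])  # True: higher confidence, wider interval
```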


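What do we do if we want to compute a confidence interval for μ, but we don’t know the population standard deviation σ?

  • We use the next best thing, the sample standard deviation s.

  • But with s, instead of a Z distribution, we have a t (with n-1 degrees of freedom). So,

$$\bar{X} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z\frac{\sigma}{\sqrt{n}}$$

becomes

$$\bar{X} - t_{n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1}\frac{s}{\sqrt{n}}$$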


Example: From a large class of normally distributed grades, sample 4 grades: 64, 66, 89, & 77. Calculate the 95% confidence interval for the class mean grade μ.

$$\bar{X} - t_{n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1}\frac{s}{\sqrt{n}}$$

is the appropriate formula.

So we need to determine the sample mean, sample standard deviation, and the t-value.


4 grades: 64, 66, 89, & 77; 95% confidence interval for μ

Adding our X values, we get 64 + 66 + 89 + 77 = 296.

Dividing by 4, we find our sample mean is 74.

Keep in mind that the sample standard deviation is

$$s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$$

So, next we subtract our sample mean 74 from each of our X values (giving -10, -8, 15, and 3),

square the differences (100, 64, 225, and 9) and add them up (398).

Then we divide by n-1 (which is 3) to get the sample variance s² = 132.67,

and take the square root to get the sample standard deviation s = 11.5.


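So we have $\bar{X} = 74$ and s = 11.5

[Figure: t distribution with 3 degrees of freedom, with area 0.95 in the middle, 0.025 in each tail, and the critical value 3.182 marked.]

Since n = 4, dof = n-1 = 3

Since we want 95% confidence, we want 0.95 as the middle area of our graph, and 0.025 in each of the 2 tails.

We find the 3.182 in our t table.

Our formula is

$$\bar{X} - t_{n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1}\frac{s}{\sqrt{n}}$$

Putting in our numbers we have

$$74 - 3.182\frac{11.5}{\sqrt{4}} < \mu < 74 + 3.182\frac{11.5}{\sqrt{4}}$$

So our 95% confidence interval is 56 < μ < 92.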

The interval is very wide, because we only have 4 observations. If we had more information, we’d be able to get a narrower interval.
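The whole worked example fits in a few lines of Python. This is a sketch that takes the t value 3.182 from the slides as a constant, since the standard library has no t distribution; `statistics.stdev` divides by n-1, matching the definition of s above.

```python
from math import sqrt
from statistics import mean, stdev

grades = [64, 66, 89, 77]
T_3DOF_95 = 3.182  # t-table value quoted in the slides (3 dof, 95%)

n = len(grades)
xbar = mean(grades)   # 74
s = stdev(grades)     # divides by n-1; about 11.52
margin = T_3DOF_95 * s / sqrt(n)
low, high = xbar - margin, xbar + margin

print(round(low, 1), round(high, 1))  # 55.7 92.3
```

Rounded to whole grade points, this is the 56 to 92 interval from the example.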


From our previous confidence intervals, we can see that we have a basic format, which can be used when the point estimator is roughly normal:

point estimate − (z or t) × (std. dev. or estimate of the std. dev. of our pt. estimate) < desired parameter < point estimate + (z or t) × (std. dev. or estimate of the std. dev. of our pt. estimate)


Calculating confidence intervals for the binomial proportion parameter p

  • When the number of events of interest (X) and the number of events not of interest (n-X) are each at least five, the binomial distribution can be approximated by the normal and we can develop a confidence interval for the binomial proportion parameter p.

  • That is, we can develop a confidence interval for p if X ≥ 5 and n-X ≥ 5.


We need a point estimate for p, & the standard deviation of our point estimate.

  • For the point estimate we will use the binomial proportion variable X/n, or $\hat{p}$.

  • Its standard deviation was $\sqrt{p(1-p)/n}$.

  • Since we don’t know p, we will use our sample proportion $\hat{p}$ in the standard deviation formula.


Using our format, we have our confidence interval for the binomial proportion p:

$$\hat{p} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$


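Example: Consider a random sample of 144 families; 48 have 2 or more cars. Compute the 95% confidence interval for the population proportion of families with 2 or more cars.

[Figure: standard normal curve with central area 0.95 (0.4750 on each side of 0) and the critical value 1.96 marked.]

n = 144 and $\hat{p} = 48/144 = 1/3$.

Our z value is 1.96 .

So our 95% confidence interval for p is:

$$\frac{1}{3} - 1.96\sqrt{\frac{(1/3)(2/3)}{144}} < p < \frac{1}{3} + 1.96\sqrt{\frac{(1/3)(2/3)}{144}}$$

0.256 < p < 0.410 .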
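A short sketch of the proportion interval, including the X ≥ 5 and n-X ≥ 5 check stated earlier. The function name `proportion_interval` is an assumption for illustration.

```python
from math import sqrt

def proportion_interval(x, n, z):
    """Normal-approximation interval for a binomial proportion."""
    if x < 5 or n - x < 5:
        raise ValueError("normal approximation needs X >= 5 and n - X >= 5")
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_interval(x=48, n=144, z=1.96)
print(round(low, 3), round(high, 3))  # 0.256 0.41
```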


Suppose we want a confidence interval not for a mean but for the difference in two means (μ1 – μ2).

  • For example, we may be interested in

  • the difference in the mean income for two counties, or

  • the difference in the mean exam scores for two classes.


We will use the same basic format, but it will be a bit more complicated.

Our “desired parameter” is μ1 – μ2 .

Our point estimate is $\bar{X}_1 - \bar{X}_2$.

Initially, we will assume that we have the population standard deviations, so we will use a z.

We need the standard deviation of the point estimate, $\sigma_{\bar{X}_1 - \bar{X}_2}$.

To get that we will first determine the variance of $\bar{X}_1 - \bar{X}_2$.


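Recall: V(aX + bY) = a²V(X) + b²V(Y) + 2ab[C(X,Y)]

  • Letting a = 1, b = -1, and taking the two sample means as X and Y,

$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) - 2C(\bar{X}_1, \bar{X}_2)$$

If our samples are independent, the covariance term is zero, and the expression becomes

$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2)$$

or

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2}$$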


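We now have

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2}$$

Recall that $\sigma^2_{\bar{X}} = \sigma^2/n$.

Applying subscripts for our samples,

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

& the standard deviation of $\bar{X}_1 - \bar{X}_2$ is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$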


Apply our basic format

$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$


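Example: From 2 large classes, with normally distributed grades, sample 4 grades (64, 66, 89, & 77) & 3 grades (56, 71, & 53). If the population variances for the 2 classes are both 96, compute the 90% confidence interval for the difference in means of the class grades.

We will use the formula we just developed:

$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$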


We need the 2 sample means & the z value.

[Figure: standard normal curve with central area 0.90 (0.4500 on each side of 0) and the critical value 1.645 marked.]

  • Adding the observations & dividing by the number of observations, our sample means are

  • (64 + 66 + 89 + 77) / 4 = 74 and

  • (56 + 71 + 53) / 3 = 60

  • The z value for 90% confidence, as we found before, is 1.645 .


Assembling our formula:

$$(74 - 60) - 1.645\sqrt{\frac{96}{4} + \frac{96}{3}} < \mu_1 - \mu_2 < (74 - 60) + 1.645\sqrt{\frac{96}{4} + \frac{96}{3}}$$

$$14 - 12.31 < \mu_1 - \mu_2 < 14 + 12.31$$

Interpreting the results

We are 90% certain that the difference in class mean grades is between 1.69 and 26.31 .

Notice that this interval does not include zero.

If μ1 – μ2 = 0, then μ1 = μ2 .

So, at the 90% confidence level, we can rule out the possibility that the class mean grades are equal.
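The arithmetic for this example can be sketched as follows, using the given population variance of 96 for both classes and the table value 1.645. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean

class_1 = [64, 66, 89, 77]
class_2 = [56, 71, 53]
VAR_1 = VAR_2 = 96   # population variances given in the example
Z_90 = 1.645         # table z value for 90% confidence

diff = mean(class_1) - mean(class_2)   # 74 - 60 = 14
margin = Z_90 * sqrt(VAR_1 / len(class_1) + VAR_2 / len(class_2))
low, high = diff - margin, diff + margin

print(round(low, 2), round(high, 2))  # 1.69 26.31
```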


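What do we do if we want to compare means, but we don’t know the population variances?

As before, we use the sample variances & the t distribution.

Our formula was

$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Now the formula is

$$(\bar{X}_1 - \bar{X}_2) - t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

For the t, the number of degrees of freedom is determined by a very messy formula.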


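The degrees of freedom for the t for the confidence interval for the difference between means with unknown variances:

$$\text{dof} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$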



Let’s do the same example as before, but without knowing the population variances.

  • From 2 large classes, with normally distributed grades, sample 4 grades (64, 66, 89, & 77) & 3 grades (56, 71, & 53). Compute the 90% confidence interval for the difference in means of the class grades.

  • This time we need to calculate the sample variances.


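We calculate the sample means as before: $\bar{X}_1 = 74$ and $\bar{X}_2 = 60$.

Then subtract the sample mean from each observation, square that difference,

and add up. The sums of squared differences are 398 for the first class and 186 for the second.

Dividing by n-1, we have our sample variances: $s_1^2 = 398/3 = 132.67$ and $s_2^2 = 186/2 = 93$.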


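What are the dof & t value?

$$\text{dof} = \frac{\left(\dfrac{132.67}{4} + \dfrac{93}{3}\right)^2}{\dfrac{(132.67/4)^2}{3} + \dfrac{(93/3)^2}{2}} = 4.86$$

So the degrees of freedom is the integer part of 4.86 or 4.

For 90% confidence & 4 dof, the t value is 2.1318 .

[Figure: t distribution with 4 degrees of freedom, with central area 0.90, 0.05 in each tail, and the critical value 2.1318 marked.]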


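Assemble our formula

$$(74 - 60) - 2.1318\sqrt{\frac{132.67}{4} + \frac{93}{3}} < \mu_1 - \mu_2 < (74 - 60) + 2.1318\sqrt{\frac{132.67}{4} + \frac{93}{3}}$$

$$-3.08 < \mu_1 - \mu_2 < 31.08$$

Notice here that zero is contained in our 90% confidence interval.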

So we can’t rule out the possibility that the class mean grades are equal.
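A sketch of the unknown-variance calculation, including the degrees-of-freedom formula. The t value 2.1318 is taken from the slides as a constant because the Python standard library has no t distribution; variable names are illustrative.

```python
from math import sqrt, floor
from statistics import mean, variance

class_1 = [64, 66, 89, 77]
class_2 = [56, 71, 53]
T_4DOF_90 = 2.1318  # t-table value quoted in the slides (4 dof, 90%)

n1, n2 = len(class_1), len(class_2)
v1 = variance(class_1) / n1   # s1^2 / n1, with s1^2 computed using n-1
v2 = variance(class_2) / n2   # s2^2 / n2

# Degrees of freedom for the unequal-variance case.
dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

diff = mean(class_1) - mean(class_2)
margin = T_4DOF_90 * sqrt(v1 + v2)
low, high = diff - margin, diff + margin

print(round(dof, 2), floor(dof))       # 4.86 4
print(round(low, 2), round(high, 2))   # -3.08 31.08
```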



Sometimes we believe the variances of 2 populations are equal, even though we don’t know the actual values.

  • We have another confidence interval for this situation.


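In our earlier formula above, we can drop the distinguishing subscripts on our variances.

$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}$$

Factoring out the variance, we have

$$(\bar{X}_1 - \bar{X}_2) - z\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + z\sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

Next we replace the variance by a pooled sample variance $s_p^2$, based on information from both samples, and the z by a t.

$$(\bar{X}_1 - \bar{X}_2) - t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The dof for the t value is n1 + n2 – 2 .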


The pooled sample variance

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

When the 2 samples are the same size, this estimator gives an estimate that is halfway between the two sample variances.

When the samples are not the same size, the estimate will be closer to the sample variance from the larger sample.


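So our confidence interval for the difference in the population means, when we don’t know the population variances but we believe that they are equal, is:

$$(\bar{X}_1 - \bar{X}_2) - t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < \mu_1 - \mu_2 < (\bar{X}_1 - \bar{X}_2) + t\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

and the number of degrees of freedom is n1 + n2 – 2 .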


We had:

n1 = 4, $\bar{X}_1 = 74$, $s_1^2 = 132.67$, and n2 = 3, $\bar{X}_2 = 60$, $s_2^2 = 93$.

Let’s do the same example as before, assuming that the unknown population variances are believed equal.


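We have:

$$s_p^2 = \frac{(4 - 1)(132.67) + (3 - 1)(93)}{4 + 3 - 2} = 116.8$$

We want 90% confidence.

dof = n1 + n2 – 2 = 4 + 3 – 2 = 5

So our t value is 2.015 .

[Figure: t distribution with 5 degrees of freedom, with central area 0.90, 0.05 in each tail, and the critical value 2.015 marked.]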


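We have:

$$(74 - 60) - 2.015\sqrt{116.8\left(\frac{1}{4} + \frac{1}{3}\right)} < \mu_1 - \mu_2 < (74 - 60) + 2.015\sqrt{116.8\left(\frac{1}{4} + \frac{1}{3}\right)}$$

$$-2.63 < \mu_1 - \mu_2 < 30.63$$

We are 90% certain that the difference in the population means is between -2.63 & 30.63.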

Again, since zero is in this interval, we can’t rule out the possibility that the class mean grades are equal.
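The pooled-variance version of the example, sketched in Python with the table t value 2.015 as a constant. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

class_1 = [64, 66, 89, 77]
class_2 = [56, 71, 53]
T_5DOF_90 = 2.015  # t-table value quoted in the slides (5 dof, 90%)

n1, n2 = len(class_1), len(class_2)
# Weighted average of the two sample variances, weights n1-1 and n2-1.
pooled = ((n1 - 1) * variance(class_1) + (n2 - 1) * variance(class_2)) / (n1 + n2 - 2)

diff = mean(class_1) - mean(class_2)
margin = T_5DOF_90 * sqrt(pooled * (1 / n1 + 1 / n2))
low, high = diff - margin, diff + margin

print(round(pooled, 1))               # 116.8
print(round(low, 2), round(high, 2))  # -2.63 30.63
```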


We can also develop a confidence interval for the difference in population proportions p1 – p2

The point estimate is the difference in the sample proportions, $\hat{p}_1 - \hat{p}_2$.


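Next we need the standard deviation of our point estimate.

Similarly to the case of the difference in population means,

$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \sigma^2_{\hat{p}_1} + \sigma^2_{\hat{p}_2}$$

Recalling that our previous estimate of $\sigma^2_{\hat{p}}$ was $\hat{p}(1-\hat{p})/n$,

we have

$$\hat{\sigma}^2_{\hat{p}_1 - \hat{p}_2} = \frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}$$

The estimated standard deviation of our point estimate becomes

$$\hat{\sigma}_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$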


Using our basic format, we find the confidence interval for the difference in population proportions.

$$(\hat{p}_1 - \hat{p}_2) - z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$


Example: Samples from 2 states show proportions of Democrats 1/3 & 1/5 with sample sizes 100 & 225. Calculate the 99% confidence interval for the difference in population proportions.

[Figure: standard normal curve with central area 0.99 (0.4950 on each side of 0) and the critical value 2.575 marked.]

The z value for a 99% confidence interval is 2.575 .


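Applying the formula

$$(\hat{p}_1 - \hat{p}_2) - z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

yields

$$\left(\frac{1}{3} - \frac{1}{5}\right) - 2.575\sqrt{\frac{(1/3)(2/3)}{100} + \frac{(1/5)(4/5)}{225}} < p_1 - p_2 < \left(\frac{1}{3} - \frac{1}{5}\right) + 2.575\sqrt{\frac{(1/3)(2/3)}{100} + \frac{(1/5)(4/5)}{225}}$$

or

$$0.1333 - 0.1395 < p_1 - p_2 < 0.1333 + 0.1395$$

So the 99% confidence interval for the difference in population proportions is

$$-0.006 < p_1 - p_2 < 0.273$$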


Given our confidence interval:

$$-0.006 < p_1 - p_2 < 0.273$$

Can we conclude that the two population proportions are not equal?

No. Since zero is in the interval, p1 may equal p2 .
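A brief check of this example in Python. The function name `diff_proportion_interval` is an illustrative choice; the z value 2.575 is the table value used in the slides.

```python
from math import sqrt

def diff_proportion_interval(p1, n1, p2, n2, z):
    """Interval for p1 - p2 from two independent sample proportions."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

low, high = diff_proportion_interval(1 / 3, 100, 1 / 5, 225, z=2.575)
print(round(low, 3), round(high, 3))  # -0.006 0.273
print(low < 0 < high)                 # True: zero lies inside the interval
```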


How do you decide the appropriate sample size for a project?

2 Decisions:

  • Desired confidence level

  • Maximum difference D between the estimate of the population parameter & the true value of the population parameter (that is, the maximum error you’re willing to accept)

For example, if you’re estimating the population mean μ using the sample mean $\bar{X}$, D is the largest value of $|\bar{X} - \mu|$ you’re willing to accept.


Suppose you have chosen 95% as your desired confidence level.

You know that there is a z value (call it z0) such that - z0 < Z < z0 95% of the time.

You also know that

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

so, with that confidence, the error $|\bar{X} - \mu|$ is at most

$$D = z_0\frac{\sigma}{\sqrt{n}}$$


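First, square both sides of the equation $D = z_0\sigma/\sqrt{n}$:

$$D^2 = \frac{z_0^2\sigma^2}{n}$$

Multiply through by n:

$$nD^2 = z_0^2\sigma^2$$

Divide through by D²:

$$n = \frac{z_0^2\sigma^2}{D^2}$$

Dropping the subscript on z for convenience, we have the formula:

$$n = \frac{z^2\sigma^2}{D^2}$$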


So we have a formula for determining the appropriate sample size n when we want to estimate the population mean:

$$n = \frac{z^2\sigma^2}{D^2}$$


Example: Suppose you’re trying to estimate the mean monthly rent of 2-bedroom apartments in towns of 100,000 people or less. The population standard deviation is 20. You want to be 95% sure that your estimate is within $3 of the true mean. How large a sample should you take?

$$n = \frac{z^2\sigma^2}{D^2} = \frac{(1.96)^2(20)^2}{3^2} = 170.7$$

You need to sample 171 observations.

It’s not 170, because sample sizes smaller than 170.7 provide you with less information & therefore less than the desired level of confidence.
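The round-up rule can be made explicit in code. A minimal sketch, with `sample_size_for_mean` as an illustrative name; `ceil` always rounds up so the achieved confidence is not below the target.

```python
from math import ceil

def sample_size_for_mean(z, sigma, max_error):
    """Smallest whole n with z * sigma / sqrt(n) <= max_error."""
    return ceil((z * sigma / max_error) ** 2)

n = sample_size_for_mean(z=1.96, sigma=20, max_error=3)
print(n)  # 171
```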


Our formula for n has the population standard deviation σ in it. What do we do if we don’t know σ?

  • In the past, we used the sample standard deviation s. Why can’t we do that here?

  • s came from the sample. We haven’t taken the sample yet. We’re still trying to figure out how many observations our sample should have.

  • If previous researchers have done related work, you may be able to use their estimate for the standard deviation.

  • Alternatively, you can do a small preliminary sample, & based on that information, estimate the standard deviation.


Determining the appropriate sample size n for estimating the population proportion p.

Again we will use - z0 < Z < z0 with the desired confidence level as our starting point. With the sample proportion as our point estimate, the maximum error is

$$D = z_0\sqrt{\frac{p(1-p)}{n}}$$


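First, square both sides of the equation $D = z_0\sqrt{p(1-p)/n}$:

$$D^2 = \frac{z_0^2\,p(1-p)}{n}$$

Multiply through by n:

$$nD^2 = z_0^2\,p(1-p)$$

Divide through by D²:

$$n = \frac{z_0^2\,p(1-p)}{D^2}$$

Dropping the subscript on z for convenience, we have the formula:

$$n = \frac{z^2\,p(1-p)}{D^2}$$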



There’s one big problem with this formula. What is it?

We want to collect a sample in order to estimate p, but we have the unknown p in our equation for determining the sample size!

We can’t use the sample proportion as we did before, because we haven’t taken the sample yet.

As it happens, we can resolve this problem fairly easily.


The largest possible value for p(1 − p) occurs when p is ½, and that largest value is ¼.

Play with some values for p & 1 − p, and convince yourself that this is true. For example,

(1/3)(2/3) = 2/9 < 1/4

(3/10)(7/10) = 21/100 < 1/4

(1/100)(99/100) = 99/10,000 < 1/4


If we know the largest possible value for p(1 − p), we can determine the largest sample size we should need from

$$n = \frac{z^2\,p(1-p)}{D^2}$$

Plugging in the maximum value of ¼ for p(1 − p), we have

$$n = \frac{z^2\,(1/4)}{D^2}$$

So our formula for n is:

$$n = \frac{z^2}{4D^2}$$


Sometimes you have a rough idea of what p is, but you’re trying to get a more precise value.

You can use your rough idea to determine the sample size.

If p1 is your rough idea, then the sample size formula becomes

$$n = \frac{z^2\,p_1(1-p_1)}{D^2}$$


So we have 2 formulae for determining the appropriate sample size for estimating the population proportion.

If you have no idea at all what p is, you use:

$$n = \frac{z^2}{4D^2}$$

If you have a rough idea of p1 for the value of p, you use:

$$n = \frac{z^2\,p_1(1-p_1)}{D^2}$$


Example: We are estimating the proportion of families with 2 or more cars. We want to be 95% certain that the estimate is within 3% (0.03) of the correct percentage. What is the necessary sample size?

We’re clueless on the proportion p, so we use the formula

$$n = \frac{z^2}{4D^2}$$

[Figure: standard normal curve with central area 0.95 (0.4750 on each side of 0) and the critical value 1.96 marked.]

The z value for 95% confidence is 1.96 .

Filling in our values, we get

$$n = \frac{(1.96)^2}{4(0.03)^2} = 1067.1$$

So the needed sample size is 1068.
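Both sample-size formulae for a proportion can be covered by one small function. This is a sketch with illustrative names; the default `p_guess=0.5` corresponds to the no-idea case, where p(1 − p) takes its maximum of ¼, and the 1/3 in the second call is a hypothetical rough guess, not a value from the example.

```python
from math import ceil

def sample_size_for_proportion(z, max_error, p_guess=0.5):
    """Sample size for estimating a proportion to within max_error.

    p_guess=0.5 is the worst case, where p(1-p) reaches 1/4.
    """
    return ceil(z ** 2 * p_guess * (1 - p_guess) / max_error ** 2)

n_no_idea = sample_size_for_proportion(z=1.96, max_error=0.03)
n_rough = sample_size_for_proportion(z=1.96, max_error=0.03, p_guess=1 / 3)
print(n_no_idea)  # 1068
print(n_rough)    # 949
```

A rough idea of p never increases the required sample size, since ¼ is the largest value p(1 − p) can take.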

