The Guessing Game
The entire business of Statistics is dedicated to trying to guess the value of some population parameter. In what follows the population parameter (the target parameter) will be either a mean $\mu$ or a proportion $p$.
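All that we have at our disposal are n values taken at random from the population
$x_1, x_2, \ldots, x_n$
and their average and their variance:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$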
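If n is large enough (there is no magic threshold, but n ≥ 30 will make us feel better) we have the wonderful
Central Limit Theorem
that tells us that the distribution of the sample mean $\bar{x}$ (the average) is approximately normal, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$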
and the one number we know, $\bar{x}$, is somewhere on the horizontal axis. Actually
(in both figures the blue dot is $\bar{x}$)
The next slides show four of them!
Hopefully you’ll catch on. Replace the blue dot (which represents $\bar{x}$) with the red dot (which represents the standardization of $\bar{x}$).
What went on in each of the previous four slides?
Let’s see. We picked a percentage of area.
From the chosen percentage we got two z-scores
(via the standard normal tables).
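In fact, if you give me any positive area less than 1 (call it $1 - \alpha$), I can find the corresponding z-scores by looking for the area value in the standard normal table.
The two z-scores you get are written as
$-z_{\alpha/2}$ and $z_{\alpha/2}$
(each tail outside them has area $\alpha/2$), and the number $1 - \alpha$ is called the
confidence coefficient if in decimal form
confidence level if in percent form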
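The table lookup described above can also be done in software. The following is a minimal sketch using Python's standard library; the function name `z_critical` is a hypothetical helper introduced here for illustration, not something from the slides.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z such that the area between -z and z under the
    standard normal curve equals `confidence` (that is, 1 - alpha)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - confidence
    # Each tail has area alpha/2, so the area to the left of z is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

# Illustrative confidence coefficients (chosen here, not taken from the slides).
for level in (0.90, 0.95, 0.98, 0.99):
    print(f"confidence {level:.2f}: z = {z_critical(level):.3f}")
```

Printed tables usually round these values to two decimals, which is why a table gives 1.96 for a confidence coefficient of 0.95.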
Why are we using the word “confidence”?
Confidence in what?
Of course, we hope it is confidence in our prediction! In fact we want the confidence level to be just the probability that our prediction is correct.
Trouble is ….
We haven’t predicted anything !!
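We have just established that
$P(-z_{\alpha/2} \le \text{red dot} \le z_{\alpha/2}) = 1 - \alpha$
Recall that the red dot stands for the standardized value
$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
so the statement inside the probability reads
$-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}$
that a little 7th-grade algebra transforms into
$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
This is translated into English as:
$\mu$ is inside the interval
$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
Or, in slightly different (and more pompous sounding) words
“We are $100(1 - \alpha)\%$ confident that $\mu$ is inside the interval $\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$”
We call this interval the $100(1 - \alpha)\%$ confidence interval for $\mu$.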
THAT’S OUR PREDICTION !
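One way to see what "confident" means here is to repeat the sampling many times and count how often the interval actually contains $\mu$. The sketch below does this by simulation; the population values (`mu=10`, `sigma=2`), the sample size, the number of trials, and the function name `coverage` are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=10.0, sigma=2.0, n=30, confidence=0.95, trials=2000, seed=1):
    """Fraction of simulated samples whose interval
    xbar +/- z * sigma / sqrt(n) contains the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and compute its average.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

print(f"fraction of intervals containing mu: {coverage():.3f}")
```

The printed fraction should land close to the chosen confidence coefficient, which is the sense in which the confidence level is the probability that the prediction is correct.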
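What about $\sigma$? If we know it (sometimes we do) we use it.
If not, we approximate $\sigma$ with the sample standard deviation and use the interval
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
where $s$ is the (computed) sample standard deviation.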
The numbers shown have been obtained as time T (in seconds) elapsed from the time the cage door is opened to the time of exit from the cage for 40 lab mice in separate cages (20 of the mice have been given a tranquilizer, the other 20 a placebo, but this is for another problem later).
Construct confidence intervals, at several confidence levels, for the mean of T
3.5 2.2 1.4 3.6 3.5 2.6 2.7 2.1 1.9 4.1
2.7 2.8 2.3 1.9 1.3 3.3 2.8 2.6 2.1 3.8
4.3 4.4 2.8 2.0 3.3 4.1 1.4 3.1 2.8 3.0
4.1 4.2 3.8 3.9 4.1 3.4 3.1 1.3 4.5 3.2
The sample mean and standard deviation are:
$\bar{x} = 3.0$ and $s \approx 0.92$ (computed from the 40 values above)
For each confidence level we compute the $z_{\alpha/2}$’s (from my “stats” program or from the table).
Using the formula
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$
we get the intervals
Note that the higher the confidence the wider the interval. Is this reasonable?
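The computation for the mice data can be sketched as follows. The three confidence levels used (90%, 95%, 99%) are assumptions chosen for illustration, and `mean_ci` is a hypothetical helper name; the data are the 40 values listed above.

```python
from statistics import NormalDist, mean, stdev

# Exit times T (seconds) for the 40 lab mice, copied from the data above.
times = [
    3.5, 2.2, 1.4, 3.6, 3.5, 2.6, 2.7, 2.1, 1.9, 4.1,
    2.7, 2.8, 2.3, 1.9, 1.3, 3.3, 2.8, 2.6, 2.1, 3.8,
    4.3, 4.4, 2.8, 2.0, 3.3, 4.1, 1.4, 3.1, 2.8, 3.0,
    4.1, 4.2, 3.8, 3.9, 4.1, 3.4, 3.1, 1.3, 4.5, 3.2,
]

n = len(times)
xbar = mean(times)   # sample mean
s = stdev(times)     # sample standard deviation (divides by n - 1)

def mean_ci(confidence):
    """Large-sample interval xbar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / n ** 0.5
    return xbar - half_width, xbar + half_width

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.4f}")
for level in (0.90, 0.95, 0.99):  # assumed levels, for illustration only
    low, high = mean_ci(level)
    print(f"{level:.0%} interval: ({low:.3f}, {high:.3f})")
```

Running it shows the intervals getting wider as the level goes up: to be more sure of catching $\mu$, we have to cast a wider net.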
Quite often one needs to estimate what proportion $p$ of a population prefers option A over option B.
One takes a “large enough” random sample of the population, counts how many prefer A, divides by the size n of the sample and gets a number,
denoted by $\hat{p}$ (a statistic!).
$E(\hat{p}) = p$
If we knew the standard deviation $\sigma_{\hat{p}}$ of $\hat{p}$
we could construct confidence intervals for $p$ as we did for the parameter $\mu$.
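(If n is big enough the Central Limit Theorem still holds.)
We can show that $\sigma_{\hat{p}}^2 = pq/n$ (remember that $q = 1 - p$), but this is circular (we don’t know $p$!)
However, if n is large enough, we can use $\hat{p}$
for $p$ (and $\hat{q} = 1 - \hat{p}$ for $q$) and proceed as with $\mu$,
and get the interval
$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$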
What proportion p of Notre Dame students know a language other than English?
In a random sample of 1,500 Notre Dame students, 855 stated they knew some language other than English.
Develop a 98% confidence interval for p based on this sample.
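Here $1 - \alpha = 0.98$, so $\alpha/2 = 0.01$. Therefore $z_{\alpha/2} = 2.33$ (why?)
Now $\hat{p} = 855/1500 = 0.57$ and therefore we approximate $\sigma_{\hat{p}}$ with $\sqrt{(0.57)(0.43)/1500} \approx 0.0128$ (why?)
We get the 98% confidence interval as
(0.57 - 2.33 × 0.0128, 0.57 + 2.33 × 0.0128) ≈ (0.540, 0.600)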
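The Notre Dame computation can be checked with a short sketch. The function name `proportion_ci` is a hypothetical helper introduced here; it uses the large-sample interval for a proportion, with the z-score computed by software rather than read from a table (so it uses about 2.326 where the table gives 2.33).

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_err = sqrt(p_hat * q_hat / n)  # estimated standard deviation of p_hat
    return p_hat - z * std_err, p_hat + z * std_err

# 855 of 1,500 sampled students know a language other than English.
low, high = proportion_ci(855, 1500, 0.98)
print(f"98% interval for p: ({low:.3f}, {high:.3f})")
```

To three decimals this agrees with the hand computation; the small difference further out comes from rounding the z-score and the standard deviation.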