10-2 Estimating a Population Mean (σ Unknown)
When we substitute the standard error of xbar, s/√n, for its standard deviation, σ/√n, the resulting statistic, t = (xbar − μ)/(s/√n), does not have a Normal distribution.
We call it the t distribution.
There is a different t distribution for each sample size n.
We specify a t distribution by giving its degrees of freedom, which equal n − 1 for the one-sample t statistic.
We will write the t distribution with k degrees of freedom as t(k) for short.
We also will refer to the standard Normal distribution as the z-distribution.
Y2= tpdf(x,2) (DISTR menu)
Window X[-3,3] Y[-0.1,0.4]
Compare the shape, center, and spread of the t-distribution with the z-distribution.
As the degrees of freedom k increase, the t(k) density curve approaches the N(0,1) curve ever more closely. As the sample size increases, s estimates σ ever more closely.
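The comparison above can also be checked numerically. The following is a minimal sketch, not the calculator procedure from the slides: it assumes the standard density formula for t(k), and the helper names `t_pdf` and `max_gap` are made up for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, k: float) -> float:
    """Density of the t distribution with k degrees of freedom at x."""
    # Normalizing constant computed on the log scale to avoid overflow for large k.
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)
    return math.exp(log_c - (k + 1) / 2 * math.log1p(x * x / k))

z = NormalDist()  # standard Normal distribution, N(0, 1)

def max_gap(k: float, lo: float = -3.0, hi: float = 3.0, steps: int = 600) -> float:
    """Largest vertical gap between the t(k) and N(0, 1) densities on [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(t_pdf(x, k) - z.pdf(x)) for x in xs)

# The peak of t(k) rises toward the Normal peak and the gap shrinks as k grows.
for k in (2, 9, 30, 100):
    print(f"k={k:>3}  t(k) peak={t_pdf(0, k):.4f}  max gap from N(0,1)={max_gap(k):.4f}")
```

Both curves are symmetric about 0, but for small k the t(k) curve has a lower peak and more area in the tails, which is why t* is larger than z* at the same confidence level.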
Suppose you want to construct a 95% confidence interval for the mean μ of a population based on an SRS of size n = 12. What critical value t* should you use?
If you have a TI-84+, you can use invT((1+C)/2, df) to find t*.
Note: When the actual df does not appear in Table C, use the greatest df available that is less than your desired df.
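For readers without a calculator or Table C, here is a rough sketch of how a critical value could be computed from the t density alone. It is not how invT works internally; the approach (Simpson's rule for the area, then bisection) and the names `t_cdf` and `t_star` are assumptions chosen for illustration.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x: float, df: float, panels: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0 (panels must be even)."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / panels
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_star(C: float, df: float) -> float:
    """Critical value with area C between -t* and t*; same target as invT((1+C)/2, df)."""
    target = (1 + C) / 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):  # bisection: halve the bracket until it is negligibly small
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 12
print(round(t_star(0.95, n - 1), 3))  # compare with the df = 11 row of Table C
```

For n = 12 the degrees of freedom are 11, so the call is `t_star(0.95, 11)`, matching invT(0.975, 11) on the calculator.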
Recall the inference tool-box
Environmentalists, government officials, and vehicle manufacturers are all interested in studying the auto exhaust emissions produced by motor vehicles. The table gives the nitrogen oxide (NOX) levels for a random sample of light-duty engines of the same type.
Construct a 95% confidence interval for the mean amount of NOX emitted by light-duty engines of this type.
Step 1: Parameter
Step 2: Conditions
Remember the three C's! Conclusion, Connection, Context
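The calculation step of a one-sample t interval, xbar ± t* · s/√n, can be sketched as follows. The measurements below are made up for illustration and are NOT the NOX data from the table; the function name `one_sample_t_interval` is likewise hypothetical. The value t* = 2.201 is the 95% critical value for df = 11.

```python
import math
from statistics import mean, stdev

def one_sample_t_interval(data, t_star):
    """Return (lower, upper) for xbar +/- t* * s / sqrt(n)."""
    n = len(data)
    xbar = mean(data)
    se = stdev(data) / math.sqrt(n)  # standard error of xbar; stdev divides by n - 1
    margin = t_star * se
    return xbar - margin, xbar + margin

# Hypothetical sample of n = 12 measurements (not the data from the slides).
sample = [1.2, 0.9, 1.5, 1.1, 1.3, 0.8, 1.4, 1.0, 1.6, 1.2, 0.7, 1.3]

low, high = one_sample_t_interval(sample, t_star=2.201)  # df = 12 - 1 = 11
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

The code covers only the arithmetic. Stating the parameter, checking the conditions, and interpreting the interval in context still have to be done by hand.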
Matched pairs is a form of block design in which just two treatments are compared.
Subjects are matched in pairs and each treatment is given to one subject in each pair.
Alternatively, each subject receives both treatments in a randomized order.
When you have two sets of data, ask yourself if there is something that links the values in pairs and, therefore, prevents them from being independent. If so, apply a one-sample procedure to the differences within each pair.
Inference procedures for two samples assume that the samples are selected independently of each other. This assumption does not hold when the same subjects are measured twice.
Too many numbers? What do I do?
Reduce each pair to a single difference. The parameter is then the mean difference for the entire population of pairs.
Example 10.10 pg 651
Construct and interpret a 90% confidence interval for the mean change in depression score.
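A matched pairs interval is a one-sample t interval computed on the within-pair differences. Here is a minimal sketch under that assumption. The scores are invented for illustration and are NOT the data from Example 10.10; `paired_t_interval` is a hypothetical helper name. With 11 pairs, df = 10 and the 90% critical value is t* = 1.812.

```python
import math
from statistics import mean, stdev

def paired_t_interval(first, second, t_star):
    """t interval for the mean of the differences (second - first)."""
    if len(first) != len(second):
        raise ValueError("paired data must have the same length")
    diffs = [b - a for a, b in zip(first, second)]  # one difference per pair
    n = len(diffs)
    d_bar = mean(diffs)
    margin = t_star * stdev(diffs) / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Hypothetical scores for 11 subjects, each measured twice.
before = [15, 12, 18, 9, 14, 20, 11, 16, 13, 17, 10]
after = [10, 11, 12, 8, 9, 15, 10, 12, 13, 11, 7]

low, high = paired_t_interval(before, after, t_star=1.812)  # df = 11 - 1 = 10
print(f"90% CI for the mean change: ({low:.2f}, {high:.2f})")
```

Treating `before` and `after` as two independent samples would ignore the link between the two measurements on each subject, which is exactly the assumption violation described above.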
Random Selection of individuals for a statistical study allows us to generalize the results of that study to a larger population.
Random Assignment of treatments to subjects in an experiment lets us investigate whether there is evidence of a treatment effect (cause and effect). That is, it lets us compare the results of different treatments.
The t-procedures are not robust against outliers, because xbar and s are not resistant to outliers.
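To see why a lack of resistance matters, compare the summary statistics of a small data set with and without one extreme value. The numbers here are hypothetical, chosen only to make the effect visible; the median is included as a resistant measure for contrast.

```python
from statistics import mean, median, stdev

# Hypothetical measurements; the last value in with_outlier is the only change.
clean = [1.2, 1.3, 1.1, 1.4, 1.3, 1.2, 1.3, 1.4, 1.2, 1.3]
with_outlier = clean + [4.9]

def summarize(data):
    """Return the statistics used by t procedures, plus the median for contrast."""
    return {"mean": mean(data), "s": stdev(data), "median": median(data)}

for label, data in (("clean", clean), ("with outlier", with_outlier)):
    stats = summarize(data)
    print(f"{label:>12}: mean={stats['mean']:.3f}  s={stats['s']:.3f}  median={stats['median']:.3f}")
```

A single extreme value moves xbar and inflates s many times over, while the median does not move. Since the t interval is built from xbar and s, both its center and its width are affected.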
Has the Normality assumption been met for a one sample t interval for mu?
Without the outlier, the interval (1.165, 1.421) is much narrower and centered differently. Can we really be 95% confident in either interval?
No, since the outlier suggests that the population may not be Normal.
The t procedures are not robust against outliers, but they are quite robust against non-Normality of the population when there are no outliers, especially when the distribution is roughly symmetric.
ALWAYS make a plot to check for skewness and outliers BEFORE using t procedures for small samples.
Larger samples improve the accuracy of critical values from the t distribution when the population is not Normal.
For most purposes, you can safely use the one-sample t procedures on samples that are not very small, unless an outlier or strong skewness is present.
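The effect of sample size on a skewed population can be explored by simulation. The sketch below is an illustration only: it assumes an exponential population (strongly right-skewed) with a known mean of 1, and the function name `coverage`, the seed, and the number of repetitions are arbitrary choices. The critical values 2.262 (df = 9) and 2.023 (df = 39) are 95% t* values.

```python
import math
import random
from statistics import mean, stdev

def coverage(n, t_star, reps=4000, true_mean=1.0, seed=1):
    """Fraction of simulated t intervals that capture the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        # Draw an SRS of size n from a right-skewed (exponential) population.
        sample = [rng.expovariate(1 / true_mean) for _ in range(n)]
        margin = t_star * stdev(sample) / math.sqrt(n)
        if abs(mean(sample) - true_mean) <= margin:
            hits += 1
    return hits / reps

print("n = 10:", coverage(10, 2.262))  # df = 9
print("n = 40:", coverage(40, 2.023))  # df = 39
```

If the Normality condition held exactly, both fractions would be close to 0.95. With this skewed population the capture rate typically falls short of 95%, and the shortfall is smaller for the larger sample.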
Why can’t I use a z procedure if n is large? Because σ is unknown!
Given the percent of each state's residents who are at least 65 years of age, can or should we use t procedures to estimate the mean of these percents?
Hint: This is a population not a sample. Do we want to estimate a parameter when we have a census?
Given the time of the first lightning strike each day in a mountain region of Colorado, can or should we use t procedures to draw conclusions about the mean time of a day's first lightning strike with complete confidence?
Hint: n =70 and the distribution is what shape?
Given the distribution of word lengths in Shakespeare's plays, can or should we use t procedures?
Hint: n is unknown.