Inference for a Population Mean

1 / 25

# Inference for a Population Mean - PowerPoint PPT Presentation

Statistics 111 - Lecture 13. Inference for a Population Mean. Confidence Intervals and Tests with unknown variance and Two-sample Tests . Administrative Notes. Homework 4 due Wednesday Homework 5 assigned tomorrow The final is ridiculously close (next Thursday). Outline. Review:

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## Inference for a Population Mean

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Statistics 111 - Lecture 13

### Inference for a Population Mean

Confidence Intervals and Tests with unknown variance and Two-sample Tests

Stat 111 - Lecture 13 - One Mean

• Homework 4 due Wednesday
• Homework 5 assigned tomorrow
• The final is ridiculously close (next Thursday)

Stat 111 - Lecture 13 - One Mean

Outline
• Review:
• Confidence Intervals and Hypothesis Tests which assume known variance
• Population variance unknown:
• t-distribution
• Confidence intervals and Tests using the t-distribution
• Small sample situation
• Two-Sample datasets: comparing two means
• Testing the difference between two samples when variances are known
• Moore, McCabe and Craig: 7.1-7.2

Stat 111 - Lecture 13 - One Mean

• Distribution of values taken by sample mean in all possible samples of size n from the same population
• Standard deviation of sampling distribution:
• Central Limit Theorem:
• Sample mean has a Normal distribution
• These results all assume that the sample size is large and that the population variance is known

Stat 111 - Lecture 13 - One Mean

Chapter 6: Confidence Intervals
• We used sampling distribution results to create two different tools for inference
• Confidence Intervals: Use sample mean as the center of an interval of likely values for pop. mean 
• Width of interval is a multiple Z* of standard deviation of sample mean
• Z* calculated from N(0,1) table for specific confidence level (eg. 95% confidence means Z*=1.96)
• We assume large sample size to use N(0,1) distribution, and we assume that  is known (usually just use sample SD s)

Stat 111 - Lecture 13 - One Mean

Chapter 6: Hypothesis Testing
• Compare sample mean to a hypothesized population mean 0
• Test statistic is also a multiple of standard deviation of the sample mean
• p-value calculated from N(0,1) table and compared to -level in order to reject or accept null hypothesis
• Eg. p-value < 0.05 means we reject null hypothesis
• We again assume large sample size to use N(0,1) distribution, and we assume that  is known

Stat 111 - Lecture 13 - One Mean

Unknown Population Variance
• What if we don’t want to assume that population SD  is known?
• If  is unknown, we can’t use our formula for the standard deviation of the sample mean:
• Instead, we use the standard error of the sample mean:
• Standard error involves sample SD s as estimate of 

Stat 111 - Lecture 13 - One Mean

t distribution
• If we have small sample size n and we need to use the standard error formula because the population SD  is unknown, then:

The sample mean does not have a

normal distribution!

• Instead, the sample mean has a

T distribution with n - 1 degrees of freedom

• What the heck does that mean?!?

Stat 111 - Lecture 13 - One Mean

t distribution

Normal distribution

• t distribution looks like a normal distribution, but has “thicker” tails. The tail thickness is controlled by the degrees of freedom
• The smaller the degrees of freedom, the thicker the tails of the t distribution
• If the degrees of freedom is large (if we have a large sample size), then the t distribution is pretty much identical to the normal distribution

t with df = 5

t with df = 1

Stat 111 - Lecture 13 - One Mean

Known vs. Unknown Variance
• Before: Known population SD 
• Sample mean is centered at  and has standard deviation:
• Sample mean has Normal distribution
• Now: Unknown population SD 
• Sample mean is centered at  and has standard error:
• Sample mean has t distribution with n-1 degrees of freedom

Stat 111 - Lecture 13 - One Mean

New Confidence Intervals
• If the population SD is unknown, we need a new formula for our confidence interval
• Standard error used instead of standard deviation
• t distribution used instead of normal distribution
• If we have a sample of size n from a population with unknown , then our 100·C % confidence interval for the unknown population mean  is:
• The critical value is calculated using a table for the t distribution (back of textbook)

Stat 111 - Lecture 13 - One Mean

Tables for the t distribution
• If we want a 100·C% confidence interval, we need to find the value so that we have a probability of C between -t* and t* in a t distribution with n-1 degrees of freedom
• Example: 95% confidence interval when n = 14 means that we need a tail probability of 0.025, so t*=2.16

= 0.95

df = 13

= 0.025

-t*

t*

Stat 111 - Lecture 13 - One Mean

Example: NYC blackout baby boom
• Births/day from August 1966:
• Before: we assumed that  was known, and used the normal distribution for a 95% confidence interval:
• Now: let  be unknown, and used the t distribution with n-1 = 13 degrees of freedom to calculate our a 95% confidence interval:
• Interval is now wider because we are now less certain about our population SD 

Stat 111 - Lecture 13 - One Mean

Another Example: Calcium in the Diet
• Daily calcium intake from 18 people below poverty line (RDA is 850 mg/day)
• Before: used known  = 188 from previous study, used normal distribution for 95% confidence interval:
• Now: let  be unknown, and use the t distribution with n-1 = 17 degrees of freedom to calculate our a 95% confidence interval:

Again, Wider interval because

we have an unknown 

Stat 111 - Lecture 13 - One Mean

New Hypothesis Tests
• If the population SD is unknown, we need to modify our test statistics and p-value calculations as well
• Standard error used in test statistic instead of standard deviation
• t distribution used to calculate the p-value instead of standard normal distribution

Stat 111 - Lecture 13 - One Mean

Example: Calcium in Diet
• Daily calcium intake from 18 people below poverty line
• Test our data against the null hypothesis that 0 = 850 mg (recommended daily allowance)
• Before: we assumed known  = 188 and calculated test statistic T= -2.32
• Now:  is actually unknown, and we use test statistic with standard error instead of standard deviation:

Stat 111 - Lecture 13 - One Mean

Example: Calcium in Diet

Normal

distribution

prob = 0.01

• Before: used normal distribution to get p-value = 0.02
• Now: is actually unknown, and we use t distribution with n-1 = 17 degrees of freedom to get p-value ≈ 0.04
• With unknown , we have a p-value that is closer to the usual threshold of  = 0.05 than before

T= -2.32

T= 2.32

t17

distribution

prob ≈ 0.02

T= -2.26

T= 2.26

Stat 111 - Lecture 13 - One Mean

Review
• Known population SD 
• Use standard deviation of sample mean:
• Use standard normal distribution
• Unknown population SD 
• Use standard deviation of sample mean:
• Use t distribution with n-1 d.f.

Stat 111 - Lecture 13 - One Mean

Small Samples
• We have used the standard error and t distribution to correct our assumption of known population SD
• However, even t distribution intervals/tests not as accurate if data is skewed or has influential outliers
• Rough guidelines from your textbook:
• Large samples (n> 40): t distribution can be used even for strongly skewed data or with outliers
• Intermediate samples (n > 15): t distribution can be used except for strongly skewed data or presence of outliers
• Small samples (n < 15): t distribution can only be used if data does not have skewness or outliers
• What can we do for small samples of skewed data?

Stat 111 - Lecture 13 - Means

Techniques for Small Samples
• One option: use log transformation on data
• Taking logarithm of data can often make it look more normal
• Another option: non-parametric tests like the sign test
• Not required for this course, but mentioned in text book if you’re interested

Stat 111 - Lecture 13 - Means

Comparing Two Samples
• Up to now, we have looked at inference for one sample of continuous data
• Our next focus in this course is comparing the data from two different samples
• For now, we will assume that these two different samples are independent of each other and come from two distinct populations

Population 1:1 , 1

Population 2:2 , 2

Sample 1: , s1

Sample 2: , s2

Stat 111 - Lecture 13 - Means

Blackout Baby Boom Revisited
• Nine months (Monday, August 8th) after Nov 1965 blackout, NY Times claimed an increased birth rate
• Already looked at single two-week sample: found no significant difference from usual rate (430 births/day)
• What if we instead look at difference between weekends and weekdays?

Weekdays

Weekends

Stat 111 - Lecture 13 - Means

Two-Sample Z test
• We want to test the null hypothesis that the two populations have different means
• H0: 1 = 2 or equivalently, 1 - 2 = 0
• Two-sided alternative hypothesis: 1 - 2  0
• If we assume our population SDs 1 and 2 are known, we can calculate a two-sample Z statistic:
• We can then calculate a p-value from this Z statistic using the standard normal distribution
• Next class, we will look at tests that do not assume known 1 and 2

Stat 111 - Lecture 13 - Means

Two-Sample Z test for Blackout Data
• To use Z test, we need to assume that our pop. SDs are known: 1 = s1 = 21.7 and 2 = s2 = 24.5
• We can then calculate a two-sided p-value for Z=7.5 using the standard normal distribution
• From normal table, P(Z > 7.5) is less than 0.0002, so our p-value = 2  P(Z > 7.5) is less than 0.0004
• We reject the null hypothesis at -level of 0.05 and conclude there is a significant difference between birth rates on weekends and weekdays
• Next class: get rid of assumption of known 1 and 2

Stat 111 - Lecture 13 - Means

Next Class – Lecture 14
• More on Comparing Means between Two Samples
• Moore, McCabe and Craig: 7.1-7.2

Stat 111 - Lecture 13 - Means