- 261 Views
- Uploaded on

Download Presentation
## Inference for a Population Mean

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

### Inference for a Population Mean

Confidence Intervals and Tests with unknown variance and Two-sample Tests

Stat 111 - Lecture 13 - One Mean

Administrative Notes

- Homework 4 due Wednesday
- Homework 5 assigned tomorrow
- The final is ridiculously close (next Thursday)

Stat 111 - Lecture 13 - One Mean

Outline

- Review:
- Confidence Intervals and Hypothesis Tests which assume known variance
- Population variance unknown:
- t-distribution
- Confidence intervals and Tests using the t-distribution
- Small sample situation
- Two-Sample datasets: comparing two means
- Testing the difference between two samples when variances are known
- Moore, McCabe and Craig: 7.1-7.2

Stat 111 - Lecture 13 - One Mean

Chapter 5: Sampling Distribution of

- Distribution of values taken by sample mean in all possible samples of size n from the same population
- Standard deviation of sampling distribution:
- Central Limit Theorem:
- Sample mean has a Normal distribution
- These results all assume that the sample size is large and that the population variance is known

Stat 111 - Lecture 13 - One Mean

Chapter 6: Confidence Intervals

- We used sampling distribution results to create two different tools for inference
- Confidence Intervals: Use sample mean as the center of an interval of likely values for pop. mean
- Width of interval is a multiple Z* of standard deviation of sample mean
- Z* calculated from N(0,1) table for specific confidence level (eg. 95% confidence means Z*=1.96)
- We assume large sample size to use N(0,1) distribution, and we assume that is known (usually just use sample SD s)

Stat 111 - Lecture 13 - One Mean

Chapter 6: Hypothesis Testing

- Compare sample mean to a hypothesized population mean 0
- Test statistic is also a multiple of standard deviation of the sample mean
- p-value calculated from N(0,1) table and compared to -level in order to reject or accept null hypothesis
- Eg. p-value < 0.05 means we reject null hypothesis
- We again assume large sample size to use N(0,1) distribution, and we assume that is known

Stat 111 - Lecture 13 - One Mean

Unknown Population Variance

- What if we don’t want to assume that population SD is known?
- If is unknown, we can’t use our formula for the standard deviation of the sample mean:
- Instead, we use the standard error of the sample mean:
- Standard error involves sample SD s as estimate of

Stat 111 - Lecture 13 - One Mean

t distribution

- If we have small sample size n and we need to use the standard error formula because the population SD is unknown, then:

The sample mean does not have a

normal distribution!

- Instead, the sample mean has a

T distribution with n - 1 degrees of freedom

- What the heck does that mean?!?

Stat 111 - Lecture 13 - One Mean

t distribution

Normal distribution

- t distribution looks like a normal distribution, but has “thicker” tails. The tail thickness is controlled by the degrees of freedom
- The smaller the degrees of freedom, the thicker the tails of the t distribution
- If the degrees of freedom is large (if we have a large sample size), then the t distribution is pretty much identical to the normal distribution

t with df = 5

t with df = 1

Stat 111 - Lecture 13 - One Mean

Known vs. Unknown Variance

- Before: Known population SD
- Sample mean is centered at and has standard deviation:
- Sample mean has Normal distribution
- Now: Unknown population SD
- Sample mean is centered at and has standard error:
- Sample mean has t distribution with n-1 degrees of freedom

Stat 111 - Lecture 13 - One Mean

New Confidence Intervals

- If the population SD is unknown, we need a new formula for our confidence interval
- Standard error used instead of standard deviation
- t distribution used instead of normal distribution
- If we have a sample of size n from a population with unknown , then our 100·C % confidence interval for the unknown population mean is:
- The critical value is calculated using a table for the t distribution (back of textbook)

Stat 111 - Lecture 13 - One Mean

Tables for the t distribution

- If we want a 100·C% confidence interval, we need to find the value so that we have a probability of C between -t* and t* in a t distribution with n-1 degrees of freedom
- Example: 95% confidence interval when n = 14 means that we need a tail probability of 0.025, so t*=2.16

= 0.95

df = 13

= 0.025

-t*

t*

Stat 111 - Lecture 13 - One Mean

Example: NYC blackout baby boom

- Births/day from August 1966:
- Before: we assumed that was known, and used the normal distribution for a 95% confidence interval:
- Now: let be unknown, and used the t distribution with n-1 = 13 degrees of freedom to calculate our a 95% confidence interval:
- Interval is now wider because we are now less certain about our population SD

Stat 111 - Lecture 13 - One Mean

Another Example: Calcium in the Diet

- Daily calcium intake from 18 people below poverty line (RDA is 850 mg/day)
- Before: used known = 188 from previous study, used normal distribution for 95% confidence interval:
- Now: let be unknown, and use the t distribution with n-1 = 17 degrees of freedom to calculate our a 95% confidence interval:

Again, Wider interval because

we have an unknown

Stat 111 - Lecture 13 - One Mean

New Hypothesis Tests

- If the population SD is unknown, we need to modify our test statistics and p-value calculations as well
- Standard error used in test statistic instead of standard deviation
- t distribution used to calculate the p-value instead of standard normal distribution

Stat 111 - Lecture 13 - One Mean

Example: Calcium in Diet

- Daily calcium intake from 18 people below poverty line
- Test our data against the null hypothesis that 0 = 850 mg (recommended daily allowance)
- Before: we assumed known = 188 and calculated test statistic T= -2.32
- Now: is actually unknown, and we use test statistic with standard error instead of standard deviation:

Stat 111 - Lecture 13 - One Mean

Example: Calcium in Diet

Normal

distribution

prob = 0.01

- Before: used normal distribution to get p-value = 0.02
- Now: is actually unknown, and we use t distribution with n-1 = 17 degrees of freedom to get p-value ≈ 0.04
- With unknown , we have a p-value that is closer to the usual threshold of = 0.05 than before

T= -2.32

T= 2.32

t17

distribution

prob ≈ 0.02

T= -2.26

T= 2.26

Stat 111 - Lecture 13 - One Mean

Review

- Known population SD
- Use standard deviation of sample mean:
- Use standard normal distribution
- Unknown population SD
- Use standard deviation of sample mean:
- Use t distribution with n-1 d.f.

Stat 111 - Lecture 13 - One Mean

Small Samples

- We have used the standard error and t distribution to correct our assumption of known population SD
- However, even t distribution intervals/tests not as accurate if data is skewed or has influential outliers
- Rough guidelines from your textbook:
- Large samples (n> 40): t distribution can be used even for strongly skewed data or with outliers
- Intermediate samples (n > 15): t distribution can be used except for strongly skewed data or presence of outliers
- Small samples (n < 15): t distribution can only be used if data does not have skewness or outliers
- What can we do for small samples of skewed data?

Stat 111 - Lecture 13 - Means

Techniques for Small Samples

- One option: use log transformation on data
- Taking logarithm of data can often make it look more normal
- Another option: non-parametric tests like the sign test
- Not required for this course, but mentioned in text book if you’re interested

Stat 111 - Lecture 13 - Means

Comparing Two Samples

- Up to now, we have looked at inference for one sample of continuous data
- Our next focus in this course is comparing the data from two different samples
- For now, we will assume that these two different samples are independent of each other and come from two distinct populations

Population 1:1 , 1

Population 2:2 , 2

Sample 1: , s1

Sample 2: , s2

Stat 111 - Lecture 13 - Means

Blackout Baby Boom Revisited

- Nine months (Monday, August 8th) after Nov 1965 blackout, NY Times claimed an increased birth rate
- Already looked at single two-week sample: found no significant difference from usual rate (430 births/day)
- What if we instead look at difference between weekends and weekdays?

Weekdays

Weekends

Stat 111 - Lecture 13 - Means

Two-Sample Z test

- We want to test the null hypothesis that the two populations have different means
- H0: 1 = 2 or equivalently, 1 - 2 = 0
- Two-sided alternative hypothesis: 1 - 2 0
- If we assume our population SDs 1 and 2 are known, we can calculate a two-sample Z statistic:
- We can then calculate a p-value from this Z statistic using the standard normal distribution
- Next class, we will look at tests that do not assume known 1 and 2

Stat 111 - Lecture 13 - Means

Two-Sample Z test for Blackout Data

- To use Z test, we need to assume that our pop. SDs are known: 1 = s1 = 21.7 and 2 = s2 = 24.5
- We can then calculate a two-sided p-value for Z=7.5 using the standard normal distribution
- From normal table, P(Z > 7.5) is less than 0.0002, so our p-value = 2 P(Z > 7.5) is less than 0.0004
- We reject the null hypothesis at -level of 0.05 and conclude there is a significant difference between birth rates on weekends and weekdays
- Next class: get rid of assumption of known 1 and 2

Stat 111 - Lecture 13 - Means

Next Class – Lecture 14

- More on Comparing Means between Two Samples
- Moore, McCabe and Craig: 7.1-7.2

Stat 111 - Lecture 13 - Means

Download Presentation

Connecting to Server..