
Statistics 111 – Lecture 14

Comparing Means from Two Samples

and

One-Sample Inference for Proportions

Stat 111 - Lecture 14 - Two Means

Administrative Notes
  • Homework 5 is posted on website
    • Due Wednesday, July 1st

Outline
  • Two Sample Z-test (known variance)
  • Two Sample t-test (unknown variance)
  • Matched Pair Test and Examples
  • Tests and Intervals for Proportions (Chapter 8)

Comparing Two Samples
  • Up to now, we have looked at inference for one sample of continuous data
  • Our next focus in this course is comparing the data from two different samples
  • For now, we will assume that these two different samples are independent of each other and come from two distinct populations

Population 1:1 , 1

Population 2: 2 , 2

Sample 1: , s1

Sample 2: , s2

Blackout Baby Boom Revisited
  • Nine months after the Nov 1965 blackout (Monday, August 8th), the NY Times claimed an increased birth rate
  • Already looked at single two-week sample: found no significant difference from usual rate (430 births/day)
  • What if we instead look at difference between weekends and weekdays?

[Tables: Weekdays, Weekends]

Two-Sample Z test
  • We want to test the null hypothesis that the two populations have equal means
    • H0: μ1 = μ2 or equivalently, μ1 - μ2 = 0
    • Two-sided alternative hypothesis: μ1 - μ2 ≠ 0
  • If we assume our population SDs σ1 and σ2 are known, we can calculate a two-sample Z statistic:
    • Z = (x̄1 - x̄2) / sqrt(σ1²/n1 + σ2²/n2)
  • We can then calculate a p-value from this Z statistic using the standard normal distribution
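The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `two_sample_z` and the input numbers are hypothetical, and the two-sided p-value uses the identity P(|Z| > z) = erfc(z / sqrt(2)) for a standard normal.

```python
import math

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """Return (z, two-sided p-value) for H0: mu1 - mu2 = 0 with known SDs."""
    # Standard deviation of the difference between the two sample means
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (xbar1 - xbar2) / se
    # Two-sided tail area under the standard normal curve
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Hypothetical inputs, chosen only to make the arithmetic easy to follow
z, p = two_sample_z(xbar1=52.0, xbar2=50.0, sigma1=4.0, sigma2=3.0, n1=16, n2=9)
print(round(z, 3), round(p, 4))
```

With these made-up inputs the standard error is sqrt(1 + 1), so the statistic is small and the p-value is well above 0.05.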

Two-Sample Z test for Blackout Data
  • To use the Z test, we need to assume that our population SDs are known: σ1 = s1 = 21.7 and σ2 = s2 = 24.5
  • From the normal table, P(Z > 7.5) is less than 0.0002, so our p-value = 2 × P(Z > 7.5) is less than 0.0004
  • Conclusion: there is a significant difference between birth rates on weekends and weekdays
  • We don’t usually know the population SDs, so we need a method for unknown σ1 and σ2

Two-Sample t test
  • We still want to test the null hypothesis that the two populations have equal means (H0: μ1 - μ2 = 0)
  • If σ1 and σ2 are unknown, then we need to use the sample SDs s1 and s2 instead, which gives us the two-sample T statistic:
    • T = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2)
  • The p-value is calculated using the t distribution, but what degrees of freedom do we use?
    • df can be complicated and often is calculated by software
    • Simpler and more conservative: set degrees of freedom equal to the smaller of (n1-1) or (n2-1)
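To make the degrees-of-freedom choice concrete, here is a small sketch comparing the conservative rule with the Welch-Satterthwaite approximation. The Welch-Satterthwaite formula is an assumption on my part about what "calculated by software" refers to; the slides do not name a formula. The inputs are the blackout figures quoted elsewhere in this lecture (s1 = 21.7, s2 = 24.5, n1 - 1 = 22, n2 - 1 = 7), and the function name is hypothetical.

```python
import math

def two_sample_t_setup(s1, n1, s2, n2):
    """Return (standard error, conservative df, Welch-Satterthwaite df)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    # Conservative rule from the slides: smaller of n1 - 1 and n2 - 1
    df_conservative = min(n1 - 1, n2 - 1)
    # Welch-Satterthwaite approximation (assumed here as the "software" df)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df_conservative, df_welch

se, df_cons, df_welch = two_sample_t_setup(s1=21.7, n1=23, s2=24.5, n2=8)
print(round(se, 2), df_cons, round(df_welch, 1))
```

The approximate df always falls between the conservative value and n1 + n2 - 2, which is why the simpler rule gives slightly wider intervals and larger p-values.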

Two-Sample t test for Blackout Data
  • To use t test, we need to use our sample standard deviations s1 = 21.7 and s2 = 24.5
  • We need to look up the tail probabilities using the t distribution
  • Degrees of freedom is the smaller of n1-1 = 22 and n2-1 = 7, so df = 7

Two-Sample t test for Blackout Data
  • From the t-table with df = 7, we see that P(T > 7.5) < 0.0005
  • If our alternative hypothesis is two-sided, then we know that our p-value < 2 × 0.0005 = 0.001
  • We reject the null hypothesis at an α-level of 0.05 and conclude there is a significant difference between birth rates on weekends and weekdays
  • Same result as Z-test, but we are a little more conservative

Two-Sample Confidence Intervals
  • In addition to two sample t-tests, we can also use the t distribution to construct confidence intervals for the mean difference
  • When 1 and 2 are unknown, we can form the following 100·C% confidence interval for the mean difference 1 - 2 :
  • The critical value tk* is calculated from a t distribution with degrees of freedom k
    • k is equal to the smaller of (n1-1) and (n2-1)

Confidence Interval for Blackout Data
  • We can calculate a 95% confidence interval for the mean difference between birth rates on weekdays and weekends:
  • Our critical value tk* = 2.365 is calculated from a t distribution with 7 degrees of freedom, so our 95% confidence interval is:
  • Since zero is not contained in this interval, we know the difference is statistically significant!
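As a rough sketch of the interval arithmetic, the margin of error can be computed from the values quoted in this lecture (tk* = 2.365, s1 = 21.7, s2 = 24.5, n1 = 23, n2 = 8). The sample means are not reproduced in this transcript, so the difference passed to the helper below is a purely hypothetical placeholder, as is the function name.

```python
import math

def two_sample_t_interval(diff, s1, n1, s2, n2, t_star):
    """Return (low, high) for the interval diff ± t_star * SE."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_star * se
    return diff - margin, diff + margin

# Margin of error using only values quoted in the slides
margin = 2.365 * math.sqrt(21.7**2 / 23 + 24.5**2 / 8)
print(round(margin, 1))

# HYPOTHETICAL observed difference in means, for illustration only
low, high = two_sample_t_interval(30.0, 21.7, 23, 24.5, 8, 2.365)
print(low > 0)  # the interval excludes zero when |diff| exceeds the margin
```

The interval excludes zero exactly when the observed difference is larger in absolute value than the margin, which is the same condition as |T| exceeding tk*.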

Matched Pairs
  • Sometimes the two samples that are being compared are matched pairs (not independent)
  • Example: Sentences for crack versus powder cocaine
  • We could test for the mean difference between X1 = crack sentences and X2 = powder sentences
  • However, we realize that these data are paired: each row of sentences has a matching quantity of cocaine
  • Our t-test for two independent samples ignores this relationship

Matched Pairs Test
  • First, calculate the difference d = X1 - X2 for each pair
  • Then, calculate the mean and SD of the differences d

Matched Pairs Test

  • Instead of a two-sample test for the difference between X1 and X2, we do a one-sample test on the difference d
  • Null hypothesis: mean difference between the two samples is equal to zero

H0 : d= 0 versus Ha : d 0

  • Usual test statistic when population SD is unknown:
  • p-value calculated from t-distribution with df = 8
    • P(T > 5.24) < 0.0005 so p-value < 0.001
  • Difference between crack and powder sentences is statistically significant at -level of 0.05
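The matched pairs procedure reduces to a one-sample t statistic on the differences, which can be sketched as follows. The sentencing data are not included in this transcript, so the two lists below are made-up numbers used only to show the mechanics; the function name is also hypothetical.

```python
import math
import statistics

def matched_pairs_t(x1, x2):
    """Return (t statistic, degrees of freedom) for H0: mean difference = 0."""
    if len(x1) != len(x2):
        raise ValueError("matched pairs need equal-length samples")
    # Step 1: difference within each pair
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    # Step 2: mean and sample SD of the differences
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)
    # Step 3: one-sample t statistic on the differences
    return d_bar / (s_d / math.sqrt(n)), n - 1

# HYPOTHETICAL paired measurements, not the lecture's data
x1 = [12.0, 15.0, 11.0, 18.0, 14.0]
x2 = [10.0, 12.0, 11.0, 15.0, 12.0]
t_stat, df = matched_pairs_t(x1, x2)
print(round(t_stat, 2), df)
```

The degrees of freedom are the number of pairs minus one, which is why nine pairs give df = 8 in the lecture's example.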

Matched Pairs Confidence Interval

  • We can also construct a confidence interval for the mean difference μd of matched pairs
    • We can just use the confidence intervals we learned for the one-sample, unknown σ case
  • Example: 95% confidence interval for mean difference between crack and powder sentences:

Summary of Two-Sample Tests
  • Two independent samples with known σ1 and σ2
    • We use two-sample Z-test with p-values calculated using the standard normal distribution
  • Two independent samples with unknown σ1 and σ2
    • We use two-sample t-test with p-values calculated using the t distribution with degrees of freedom equal to the smaller of n1-1 and n2-1
    • Also can make confidence intervals using t distribution
  • Two samples that are matched pairs
    • We first calculate the differences for each pair, and then use our usual one-sample t-test on these differences

One-Sample Inference for Proportions

Revisiting Count Data
  • Chapters 6 and 7 covered inference for the population mean of continuous data
  • We now return to count data:
  • Example: Opinion Polls
    • Xi = 1 if you support Obama, Xi = 0 if not
  • We call p the population proportion for Xi = 1
    • What is the proportion of people who support the war?
    • What is the proportion of Red Sox fans at Penn?

Stat 111 - Lecture 14- One-Sample Proportions

Inference for population proportion p

  • We will use the sample proportion p̂ as our best estimate of the unknown population proportion p

p̂ = Y / n, where Y = sample count

  • Tool 1: use our sample statistic as the center of an entire confidence interval of likely values for our population parameter

Confidence Interval : Estimate ± Margin of Error

  • Tool 2: Use the data to perform a specific hypothesis test
    • Formulate your null and alternative hypotheses
    • Calculate the test statistic
    • Find the p-value for the test statistic

Distribution of Sample Proportion

  • In Chapter 5, we learned that the sample proportion technically has a binomial distribution
  • However, we also learned that if the sample size is large, the sample proportion approximately follows a Normal distribution with mean p and standard deviation sqrt(p(1-p)/n)
  • We will essentially use this approximation throughout chapter 8, so we can make probability calculations using the standard normal table

Confidence Interval for a Proportion

  • We could use our sample proportion as the center of a confidence interval of likely values for the population parameter p: p̂ ± Z* × sqrt(p(1-p)/n)
  • The width of the interval is a multiple of the standard deviation of the sample proportion
  • The multiple Z* is calculated from a normal distribution and depends on the confidence level

Confidence Interval for a Proportion

  • One Problem: this margin of error involves the population proportion p, which we don’t actually know!
  • Solution: substitute in the sample proportion p̂ for the population proportion p, which gives us the interval: p̂ ± Z* × sqrt(p̂(1-p̂)/n)

Example: Red Sox fans at Penn

  • What proportion of Penn students are Red Sox fans?
    • Use Stat 111 class survey as sample
  • Y = 25 out of n = 192 students are Red Sox fans so p̂ = 25/192 = 0.13
  • 95% confidence interval for the population proportion: 0.13 ± 1.96 × sqrt(0.13 × 0.87 / 192) = (0.08, 0.18)
  • Proportion of Red Sox fans at Penn is probably between 8% and 18%
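The Red Sox interval can be reproduced with a short sketch. The counts (Y = 25, n = 192) and the 95% level come from the slide; the function name `proportion_ci` and the default multiplier Z* = 1.96 are my own choices for illustration.

```python
import math

def proportion_ci(y, n, z_star=1.96):
    """Return (low, high) of the large-sample interval p_hat ± z* SE."""
    p_hat = y / n
    # Standard error with the sample proportion plugged in for p
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

low, high = proportion_ci(25, 192)
print(f"{low:.3f} to {high:.3f}")
```

Rounded to whole percentages, the endpoints agree with the slide's "between 8% and 18%".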

Hypothesis Test for a Proportion

  • Suppose that we are now interested in using our count data to test a hypothesized population proportion p0
  • Example: an older study says that the proportion of Red Sox fans at Penn is 0.10.
    • Does our sample show a significantly different proportion?
  • First Step: Null and alternative hypotheses
    • H0: p = 0.10 vs. Ha: p ≠ 0.10
  • Second Step: Test Statistic
    • Z = (p̂ - p) / sqrt(p(1-p)/n)

Hypothesis Test for a Proportion

  • Problem: test statistic involves population proportion p
  • For confidence intervals, we plugged in the sample proportion p̂, but for test statistics, we plug in the hypothesized proportion p0: Z = (p̂ - p0) / sqrt(p0(1-p0)/n)
  • Example: test statistic for Red Sox example: Z = (0.13 - 0.10) / sqrt(0.10 × 0.90 / 192) = 1.39

Hypothesis Test for a Proportion

  • Third step: need to calculate a p-value for our test statistic using the standard normal distribution
  • Red Sox Example: Test statistic Z = 1.39
    • What is the probability of getting a test statistic as extreme or more extreme than Z = 1.39? i.e., P(Z > 1.39) = 0.082
  • Two-sided alternative, so p-value = 2 × P(Z > 1.39) = 0.16
    • We don’t reject H0 at the α = 0.05 level, and conclude that the Red Sox proportion is not significantly different from p0 = 0.10

[Figure: upper-tail probability 0.082 beyond Z = 1.39]
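The three steps of the proportion test can be sketched as below, using the Red Sox counts from the slides (Y = 25, n = 192, p0 = 0.10). The function name is hypothetical, and the standard normal tail is computed with `math.erfc` rather than a table, so the results match the slide's values only up to rounding.

```python
import math

def proportion_z_test(y, n, p0):
    """Return (z, two-sided p-value) for H0: p = p0."""
    p_hat = y / n
    # Standard error uses the hypothesized p0, not the sample proportion
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    # Two-sided tail area under the standard normal curve
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = proportion_z_test(25, 192, 0.10)
print(round(z, 2), round(p_value, 2))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

Because the p-value is well above 0.05, the sketch reaches the same decision as the slide.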

Another Example

  • Mass ESP experiment in 1977 Sunday Mirror (UK)
    • Psychic hired to send readers a mental message about a particular color (out of 5 choices). Readers then mailed back the color that they “received” from psychic
    • Newspaper declared the experiment a success because, out of 2355 responses, they received 521 correct ones (p̂ = 521/2355 = 0.22)
  • Is the proportion of correct answers statistically different from what we would expect by chance (p0 = 0.2)?
  • H0: p = 0.2 vs. Ha: p ≠ 0.2

Mass ESP Example

  • Calculate a p-value using the standard normal distribution
  • Two-sided alternative, so p-value = 2P(Z>2.43) = 0.015
    • We reject H0 at a =0.05 level, and conclude that the survey proportion is significantly different from p0=0.20
  • We could also calculate a 95% confidence interval for p:

prob = 0.0075

Z = 2.43

Interval doesn’t contain 0.20
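For the ESP example, the sketch below recomputes the test statistic and the 95% interval from the counts on the slide (521 correct out of 2355, p0 = 0.2). One observation, offered tentatively: the slide's Z = 2.43 appears consistent with rounding the sample proportion to 0.22 before computing Z, while the unrounded proportion gives a somewhat larger value. The conclusion is the same either way. Variable names are hypothetical.

```python
import math

y, n, p0 = 521, 2355, 0.20
p_hat = y / n

# Test statistic: standard error uses the hypothesized p0
se0 = math.sqrt(p0 * (1 - p0) / n)
z_exact = (p_hat - p0) / se0
z_rounded = (round(p_hat, 2) - p0) / se0  # p_hat rounded to 0.22 first

# 95% confidence interval: standard error uses the sample proportion
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se_hat, p_hat + 1.96 * se_hat)

print(round(z_rounded, 2), round(z_exact, 2))
print(ci[0] > p0)  # True when the whole interval lies above 0.20
```

With a sample this large, a difference of about two percentage points is enough to be statistically significant, even though it is small in practical terms.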

Margin of Error

  • A confidence interval for a proportion p is centered at the sample proportion and has margin of error m = Z* × sqrt(p̂(1-p̂)/n)
  • Before the study begins, we can calculate the sample size needed for a desired margin of error
  • Problem: we don’t know the sample proportion before the study begins!
  • Solution: use p̂ = 0.5, which gives us the maximum m
  • So, if we want a margin of error no larger than m, we need n ≥ (Z* / (2m))²

Margin of Error Examples

  • Red Sox Example: how many students should I poll in order to have a margin of error less than 5% in a 95% confidence interval?
  • We would need a sample size of 385 students
  • ESP example: how many responses must newspaper receive to have a margin of error less than 1% in a 95% confidence interval?
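The sample-size calculation can be sketched as follows, assuming the worst-case proportion of 0.5 described on the previous slide and Z* = 1.96 for 95% confidence. The function name is hypothetical.

```python
import math

def sample_size_for_margin(m, z_star=1.96):
    """Smallest n with worst-case margin z* * sqrt(0.25 / n) at most m."""
    # With p_hat = 0.5 the margin is z* / (2 sqrt(n)); solve for n and round up
    return math.ceil((z_star / (2 * m)) ** 2)

n_red_sox = sample_size_for_margin(0.05)
print(n_red_sox)
```

Calling the same function with `m=0.01` addresses the ESP question. Because the required n grows with 1/m², cutting the margin of error by a factor of five requires roughly twenty-five times as many responses.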

Next Class - Lecture 15
  • Two-Sample Inference for Proportions
  • Moore, McCabe and Craig: Section 8.2
