STAT131 W11 L1a: Comparing two populations

Presentation Transcript

  1. STAT131W11L1a: Comparing two populations by Anne Porter alp@uow.edu.au

  2. Comparing populations • Comparing two proportions: n large, use Z • Comparing two means with n large: σ known, use Z; σ unknown (estimated by s), use t • Comparing two means with n small but normally distributed data: σ known, use Z; σ unknown (estimated by s), use t

  3. Comparing two means: measurements are from two independent random samples. The choice between the Normal (Z) and t distribution depends on the assumptions: • σ1 and σ2 known or unknown, AND • small n AND both populations normally distributed, OR • large n

  4. σ1 and σ2 known: confidence intervals for the difference in means; calculating Z for the hypothesis test
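The formulas on this slide are not reproduced in the transcript. Assuming the standard forms, the interval is (x̄1 - x̄2) ± z_{α/2} √(σ1²/n1 + σ2²/n2) and the test statistic is Z = [(x̄1 - x̄2) - (μ1 - μ2)] / √(σ1²/n1 + σ2²/n2). A minimal sketch follows; the function name `z_two_means` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, conf=0.95, diff0=0.0):
    """CI for mu1 - mu2 and the Z statistic when sigma1 and sigma2 are known (hypothetical helper)."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)         # SD of xbar1 - xbar2
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for a 95% interval
    diff = xbar1 - xbar2
    ci = (diff - z_crit * se, diff + z_crit * se)
    z = (diff - diff0) / se                            # H0: mu1 - mu2 = diff0
    return ci, z
```

Reject H0 at the 5% level when |Z| > 1.96, or equivalently when diff0 lies outside the 95% interval.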

  5. t-test: σ1 and σ2 unknown (the more general case) • (σ1 and σ2 estimated by s1 and s2) • AND • small n AND both populations normally distributed • OR • large n. Confidence interval
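The t-based interval on this slide is not reproduced in the transcript. Assuming it follows the usual form, it has the same shape as the Z interval, with s in place of σ and a t critical value on ν degrees of freedom in place of z. Here SE is the pooled or unpooled standard error of x̄1 - x̄2 and ν is the matching degrees of freedom:

```latex
(\bar{x}_1 - \bar{x}_2) \pm t_{\nu,\,\alpha/2}\,\mathrm{SE},
\qquad
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\mathrm{SE}}
```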

  6. Estimates of the standard deviation (standard error) of the difference between two sample means: two sample variances considered to be equal (pooled); two sample variances considered unequal (unpooled)
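As a sketch, assuming the usual textbook formulas: the pooled estimate first combines the two sample variances, s_p² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2), and then SE = s_p √(1/n1 + 1/n2); the unpooled estimate is SE = √(s1²/n1 + s2²/n2). The function names are hypothetical.

```python
from math import sqrt


def se_pooled(s1, s2, n1, n2):
    """SE of xbar1 - xbar2 when the two population variances are assumed equal."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    return sqrt(sp2 * (1 / n1 + 1 / n2))


def se_unpooled(s1, s2, n1, n2):
    """SE of xbar1 - xbar2 when the variances are not assumed equal."""
    return sqrt(s1**2 / n1 + s2**2 / n2)
```

When n1 = n2 the two estimates are identical; they differ only when the sample sizes differ, which is when the choice matters most.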

  7. Degrees of freedom df = ν are given by • Two sample variances considered to be equal: df = ν = n1 + n2 - 2 • Two sample variances considered unequal: df can be approximated by ν = minimum(n1 - 1, n2 - 1), rather than by the more complex Welch-Satterthwaite formula
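The formula the slide sets aside is not reproduced in the transcript; assuming it is the Welch-Satterthwaite approximation (the one statistical software uses for the unequal-variances t-test), it is shown here for reference. It always lies between the conservative minimum and the pooled df, so using min(n1 - 1, n2 - 1) gives a slightly larger critical value and a more cautious test:

```latex
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}},
\qquad
\min(n_1 - 1,\, n_2 - 1) \le \nu \le n_1 + n_2 - 2
```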

  8. Comparing two population means • Test at the 5% level of significance to see if there is a difference in mean blood pressure between men and women • Find the 95% CI for the difference between the two means

  9. SPSS output: making sense of the data. Females have a higher mean blood pressure reading than males and possibly a lower spread. Are these differences small enough to have plausibly occurred by chance if there is no real difference?

  10. Checking assumptions for the t-test • Independence: usually ensured by design • e.g. males versus females • Not independent: • students with time 1 and time 2 test scores • twins, with one of each pair randomly assigned to each group

  11. Normality: BP stem-and-leaf plot for SEX = m
      Frequency    Stem &  Leaf
           3.00       10 .  899
           1.00       11 .  4
           3.00       12 .  045
           8.00       13 .  23444569
           2.00       14 .  09
           2.00       15 .  27
           2.00  Extremes    (>=164)
      Stem width: 10.00
      Each leaf: 1 case(s)
      It is difficult to judge with small data sets, but these data could be interpreted as reasonably normally distributed.

  12. Normality: BP stem-and-leaf plot for SEX = f
      Frequency    Stem &  Leaf
           2.00       11 .  69
           4.00       12 .  0356
           9.00       13 .  444556999
           4.00       14 .  0266
           3.00       15 .  179
           2.00       16 .  33
           1.00  Extremes    (>=166)
      Stem width: 10.00
      Each leaf: 1 case(s)
      It is difficult to judge with small data sets, but these data could be interpreted as reasonably normally distributed.

  13. Normal Q-Q plot process • Determine the normal scores of the sample (the Z scores expected, from smallest to largest, in a sample of this size) • Plot each sample value against its normal score. The points should fall close to a straight line if the data are normally distributed.
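The process above can be sketched in a few lines. Normal scores need a choice of plotting positions; this sketch assumes Blom's approximation, (i - 3/8)/(n + 1/4), which is one common convention (other choices give very similar scores, and SPSS offers several). The function names are hypothetical.

```python
from statistics import NormalDist


def normal_scores(n):
    """Expected z-scores of the ordered values in a sample of size n (Blom's approximation)."""
    z = NormalDist()  # standard normal distribution
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def qq_points(sample):
    """Pair each ordered sample value with its normal score, ready for plotting."""
    ordered = sorted(sample)
    return list(zip(ordered, normal_scores(len(ordered))))
```

Plotting these pairs gives the Q-Q plot; the scores are symmetric about 0, so the middle value of an odd-sized sample sits at a normal score of 0.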

  14. Normal Q-Q plot • How straight does the line need to be? Examine this by simulating samples of the same size from a normal population with the same parameters.

  15. Equal variances – the boxplot shows the range and IQR, and the spreads look similar. Why? The ranges are approximately equal and the IQRs are close (neither is twice the other).
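The slide's rule of thumb (neither IQR twice the other) can be checked directly. A minimal sketch with hypothetical function names; note that `statistics.quantiles` uses its "exclusive" method by default, which may not match SPSS's quartile definition exactly, so values can differ slightly for small samples.

```python
from statistics import quantiles


def spread_summary(sample):
    """Return (range, IQR) of a sample; quartiles from statistics.quantiles."""
    q1, _, q3 = quantiles(sample, n=4)
    return max(sample) - min(sample), q3 - q1


def spreads_look_similar(a, b, max_ratio=2.0):
    """Rule of thumb: the larger IQR is less than twice the smaller one."""
    iqr_a, iqr_b = spread_summary(a)[1], spread_summary(b)[1]
    return max(iqr_a, iqr_b) < max_ratio * min(iqr_a, iqr_b)
```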

  16. Test for equality of variances (Levene's): p-value > .05, therefore there is no evidence to suggest unequal variances. SPSS Output V11
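SPSS reports the test for equality of variances directly. As a sketch of what the statistic measures, assuming the classic mean-centred form of Levene's test: it is a one-way ANOVA F computed on the absolute deviations of each value from its own group mean. The p-value then comes from an F(k - 1, N - k) distribution, which the Python standard library does not provide; `scipy.stats.levene(a, b, center='mean')` returns both the statistic and the p-value. The name `levene_w` is hypothetical.

```python
from statistics import fmean


def levene_w(*groups):
    """Levene's statistic (mean-centred): a one-way ANOVA F on |y - group mean|."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each value from its own group's mean
    devs = [[abs(y - fmean(g)) for y in g] for g in groups]
    dev_means = [fmean(d) for d in devs]
    grand = fmean([z for d in devs for z in d])
    between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, dev_means))
    within = sum((z - m) ** 2 for d, m in zip(devs, dev_means) for z in d)
    return (n_total - k) / (k - 1) * between / within
```

Two groups with the same shape but different centres give W = 0; the more the typical deviations differ between groups, the larger W becomes.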

  17. Formally: Step 1: Specify the hypotheses. Step 2: Specify α. Step 3: Decide on the statistic and the region of rejection. Check normality of both samples (OK). Check whether the variances are equal or unequal with Levene's test (you can also look at the boxplot). The test for equality of variances is not significant, as p-value = 0.586. Hence use t with equal variances and df = n1 + n2 - 2, and reject H0 if |t| > t_{df, α/2} (see t tables). Step 4: Calculate t. Step 5: Conclude.

  18. Variances assumed equal: |t| = 1.176 < t_{44, 0.025} ≈ 2.02 (df = 21 + 25 - 2 = 44); equivalently, the two-tailed probability of getting a t as extreme as this is 0.246, i.e. > .05. Hence there is no reason to reject H0. SPSS Output V11

  19. Verify the value of t SPSS Output V11
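To verify t by hand from raw data, here is a minimal sketch of the pooled (equal-variance) two-sample t statistic. The function name `pooled_t` is hypothetical, and the samples are assumed to be plain Python lists. The sign of t depends on which group is listed first, which is why the slides quote both 1.176 and -1.176.

```python
from math import sqrt
from statistics import fmean, stdev


def pooled_t(x1, x2, diff0=0.0):
    """Two-sample t statistic with equal variances assumed; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = stdev(x1), stdev(x2)  # sample SDs (divisor n - 1)
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (fmean(x1) - fmean(x2) - diff0) / se
    return t, n1 + n2 - 2
```

The result should match the "Equal variances assumed" row of the SPSS output.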

  20. Conclude • As the p-value = .246, i.e. p is higher than .05, we have no reason to reject H0 • OR • As |-1.176| < t_{df, α/2} (see t tables) • We have no evidence to suggest that there is any difference in mean blood pressure between males and females

  21. Testing using the 95% confidence interval: the difference in means is hypothesised to be 0, and as 0 lies within the confidence interval, there is no reason to reject H0.
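The confidence-interval version of the test can be sketched as follows, assuming equal variances. The t critical value is passed in from tables (the standard library has no t distribution), and both function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def pooled_ci(x1, x2, t_crit):
    """CI for mu1 - mu2 assuming equal variances; t_crit = t_{n1+n2-2, alpha/2} from tables."""
    n1, n2 = len(x1), len(x2)
    sp2 = ((n1 - 1) * stdev(x1) ** 2 + (n2 - 1) * stdev(x2) ** 2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = fmean(x1) - fmean(x2)
    return diff - t_crit * se, diff + t_crit * se


def reject_h0_by_ci(ci, diff0=0.0):
    """Reject H0: mu1 - mu2 = diff0 when diff0 lies outside the interval."""
    low, high = ci
    return not (low <= diff0 <= high)
```

A 95% interval that contains 0 and a two-tailed test at α = .05 always reach the same decision when they use the same standard error and df.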

  22. Comparing two population means • Test at the 5% level of significance to see if there is a difference in mean weight loss between two groups. Subjects were randomly assigned to one of two different diets. • Find the 95% CI for the difference between the two means

  23. Look at the data: what does it suggest?

  24. Look at the data: what does it suggest?

  25. Look at the data: what does it suggest? • The variable is approximately normal if the points fall close to a straight line. This is difficult to assess, so compare the plot with Q-Q plots of simulated normal samples.

  26. Look at the data: what does it suggest?
      WEIGHT stem-and-leaf plot for GROUP = 2.00
      Frequency    Stem &  Leaf
           2.00        2 .  79
           3.00        3 .  005
           1.00        4 .  1
           4.00        5 .  1238
           6.00        6 .  001178
           3.00        7 .  779
           3.00        8 .  457
           2.00        9 .  02
           1.00       10 .  4
      Stem width: 1.00
      Each leaf: 1 case(s)
      Looks reasonably normal, but it is hard to tell with small samples.

  27. Look at the data: what does it suggest?
      WEIGHT stem-and-leaf plot for GROUP = 1.00
      Frequency    Stem &  Leaf
           1.00       -0 .  2
           6.00        0 .  011223
          12.00        0 .  566778888899
           2.00        1 .  23
      Stem width: 10.00
      Each leaf: 1 case(s)
      Looks reasonably normal, but it is hard to tell with small samples.

  28. The probability of getting an F statistic as large as this if there is no difference in variances is 0.006, i.e. there is evidence that the variances differ, so use the "equal variances not assumed" (unpooled) t-test in the SPSS V10 output • As the probability of getting a t as extreme as -0.11 is high (p-value = .910), i.e. greater than 0.05, we retain H0: there appears to be no difference in mean weight loss.
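For the unequal-variances case, a minimal sketch of the unpooled (Welch) t statistic with Welch-Satterthwaite degrees of freedom follows. This is the kind of statistic software reports in its "equal variances not assumed" row, usually with a fractional df; the function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def welch_t(x1, x2, diff0=0.0):
    """Unpooled two-sample t with Welch-Satterthwaite df; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    a, b = variance(x1) / n1, variance(x2) / n2  # squared SE contributions of each sample
    t = (fmean(x1) - fmean(x2) - diff0) / sqrt(a + b)
    df = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, df
```

With equal sample sizes and equal sample variances this reproduces the pooled t and df = n1 + n2 - 2; otherwise the df falls between min(n1 - 1, n2 - 1) and n1 + n2 - 2.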

  29. SPSS V11 output (different layout): Analyse, Compare Means, Independent-Samples T Test, etc.

  30. Checking assumptions for the t-test • What if we do not have the independent groups that a two-sample t-test requires? • Examples: • students with time 1 and time 2 test scores • twins, with one of each pair randomly assigned to each group • The data are paired or matched • The analysis must take into account the correlation between the two measures

  31. Paired t-test: using d to represent the difference x1 - x2 • H0: μd = 0 • Ha: μd ≠ 0 • Select α • Select the test and state the decision rule for rejection • Calculate the test statistic as for a one-sample t on the differences • Draw conclusions
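Following these steps, the paired t is simply a one-sample t on the differences: t = (d̄ - μd0) / (s_d / √n) with n - 1 degrees of freedom. A minimal sketch, with a hypothetical function name and the two measurements assumed to be lists in matching order:

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(x1, x2, mu_d0=0.0):
    """Paired t: a one-sample t on d = x1 - x2; returns (t, df)."""
    if len(x1) != len(x2):
        raise ValueError("paired data need the same number of observations")
    d = [a - b for a, b in zip(x1, x2)]  # one difference per subject
    n = len(d)
    t = (fmean(d) - mu_d0) / (stdev(d) / sqrt(n))
    return t, n - 1
```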

  32. Data (Utts, p. 446): time for task performance by 10 pilots under alcohol and no-alcohol conditions. Diff is the time difference, no alcohol - alcohol.

  33. Examine plots of both variables: there is one outlier in the alcohol condition. The median time is lower for the alcohol condition (300) than for the no-alcohol condition. The spread (IQR) is greater for the no-alcohol condition. There is some asymmetry in both conditions, and n is small.

  34. Menu options • Analyse, Compare Means, Paired-Samples T Test (two variables must be selected to move into the paired analysis) • OR • Analyse, Compare Means, One-Sample T Test, analysing the variable diff

  35. Check assumptions – normality (n is small)
      DIFF stem-and-leaf plot
      Frequency    Stem &  Leaf
           2.00       -0 .  00
           6.00        0 .  000113
           2.00        0 .  55
      Stem width: 1000
      Each leaf: 1 case(s)

  36. Output – two-tailed, Ha: μd ≠ 0. t = 2.286 > t_{9, 0.025} = 2.262, so the probability of getting a t this extreme is just under .05 (about 0.048). As this is less than .05, there is evidence that the mean times are not the same for the alcohol and no-alcohol conditions.

  37. Output – one-tailed, Ha: μd > 0. The probability of a t this large is half the two-tailed value (about 0.048/2 ≈ 0.024). As this is smaller than .05, we reject H0: there is evidence that the mean time is longer under the no-alcohol condition. (Note: in Utts, the Minitab p-value is the one-tailed value; double it for a two-tailed test.)
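The relationship between one- and two-tailed p-values can be checked numerically. The Python standard library has no t distribution, so this sketch integrates the t density with Simpson's rule (in practice `scipy.stats.t.sf` does this directly). All function names are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def p_values(t, df):
    """Return (two-tailed p for Ha: mu_d != 0, one-tailed p for Ha: mu_d > 0)."""
    tail = upper_tail(abs(t), df)
    one_tailed = tail if t >= 0 else 1 - tail
    return 2 * tail, one_tailed
```

For the pilot data, `p_values(2.286, 9)` gives a two-tailed p just under .05 and a one-tailed p of half that, consistent with t = 2.286 lying just beyond t_{9, 0.025} = 2.262.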

  38. Comparing two proportions: from the central limit theorem, for large n (> 30) we can use the Normal (Z) distribution. The samples must be independent, and in both samples the numbers of successes and failures should be at least 5, preferably 10. Confidence interval for the difference in proportions; test statistic
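The interval and test statistic on this slide are not reproduced in the transcript. Assuming the standard large-sample interval, (p̂1 - p̂2) ± z_{α/2} √(p̂1(1 - p̂1)/n1 + p̂2(1 - p̂2)/n2), a minimal sketch follows, including the slide's "at least 10 successes and failures" check. Function names are hypothetical, and x1, x2 are counts of "successes".

```python
from math import sqrt
from statistics import NormalDist


def counts_ok(x, n, minimum=10):
    """Check that a sample has at least `minimum` successes and failures."""
    return x >= minimum and n - x >= minimum


def diff_prop_ci(x1, n1, x2, n2, conf=0.95):
    """CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z * se, diff + z * se
```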

  39. Example • A survey of people in NSW revealed the following voting intentions.

  40. Example • (1) Find the 95% confidence interval for the difference in the proportions of males and females voting ALP

  41. CI for the difference: assumptions (in each sample, the numbers of successes and failures should be more than 10)

  42. CI for difference in two proportions

  43. CI for difference in two proportions: as 0 lies between the lower and upper limits of the CI, there is no evidence of a real difference between the proportions of male and female ALP voters.

  44. Example – Hypothesis test • H0: • Ha: • α = • Decision: • Calculate (from before)

  45. Example – Hypothesis test • H0: pm = pf • Ha: pm ≠ pf • α = .05 • Decision: if |Z| > 1.96, reject H0 • Calculate Z (using the values from before)
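The decision rule above can be sketched as follows. Whether the slide's Z used the unpooled standard error from the CI or the pooled estimate p̂ = (x1 + x2)/(n1 + n2) is not shown in the transcript, so this hypothetical helper supports both; many textbooks use the pooled form for the test because H0 assumes p1 = p2.

```python
from math import sqrt


def two_prop_z(x1, n1, x2, n2, pooled=False):
    """Z for H0: p1 = p2; pooled=False reuses the CI's unpooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p = (x1 + x2) / (n1 + n2)  # combined proportion under H0
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def decide(z, z_crit=1.96):
    """Two-tailed decision at the level matching z_crit (1.96 for alpha = .05)."""
    return "reject H0" if abs(z) > z_crit else "do not reject H0"
```

With reasonably large samples the pooled and unpooled versions usually give similar Z values.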

  46. Example – Hypothesis test: conclusion. As 1.45 < 1.96, there is no reason to reject H0; there is no evidence that the proportion of male ALP voters differs from the proportion of female ALP voters.

  47. Formal Steps • Specify hypotheses • Specify Alpha • Select test and rejection region • Compute statistic from data • Draw conclusions

  48. Comparing three means • Not by multiple t-tests • Use analysis of variance techniques; see Chapter 16, pp. 561-576

  49. Comparing three medians • Not by multiple t-tests • Use non-parametric procedures; see Utts Section 16.3