
# STAT131 W11 L1a: Comparing two populations


##### Presentation Transcript

1. STAT131W11L1a: Comparing two populations by Anne Porter alp@uow.edu.au

2. Comparing populations
   - Comparing two proportions
     - n large: Z
   - Comparing two means
     - n large
       - σ known: Z
       - σ unknown, estimated by s: t
     - n small but normally distributed data
       - σ known: Z
       - σ unknown, estimated by s: t

3. Comparing two means. Measurements are from two independent random samples. Whether to use the Normal (Z) or the t distribution depends on the assumptions: • σ1 and σ2 known or unknown, AND • small n with both populations normally distributed, OR • large n

4. σ1 and σ2 known: confidence intervals for the difference in means; calculating Z for the hypothesis test
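
The transcript does not reproduce this slide's formulas. A minimal sketch, assuming the standard textbook forms: the confidence interval is (x̄1 − x̄2) ± z(α/2)·√(σ1²/n1 + σ2²/n2), and the test statistic is Z = ((x̄1 − x̄2) − (μ1 − μ2)) / √(σ1²/n1 + σ2²/n2). The function name `z_two_means` and the example numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, conf=0.95, delta0=0.0):
    """Z interval and Z test for mu1 - mu2 when sigma1 and sigma2 are known.

    Hypothetical helper for illustration; uses the standard textbook formulas.
    """
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)          # SD of xbar1 - xbar2
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    diff = xbar1 - xbar2
    ci = (diff - z_crit * se, diff + z_crit * se)
    z = (diff - delta0) / se                            # H0: mu1 - mu2 = delta0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return ci, z, p_two_sided

# Hypothetical summary statistics, not taken from the slides.
ci, z, p = z_two_means(10, 8, 3, 4, 36, 64)
print(ci, z, p)
```

With these hypothetical inputs the standard error is √0.5 ≈ 0.707 and Z ≈ 2.83.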

5. t-test: σ1 and σ2 unknown (the more general case) • (σ1 and σ2 estimated by s1 and s2) • AND • small n AND both populations normally distributed • OR • large n. Confidence interval

6. Estimates of the standard deviation (standard error) of the difference between two sample means: • two sample variances considered to be equal (pooled) • two sample variances considered unequal (unpooled)
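
The pooled and unpooled estimates are not written out in the transcript. Assuming the usual definitions, the pooled estimate is s_p·√(1/n1 + 1/n2) with s_p² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2), and the unpooled estimate is √(s1²/n1 + s2²/n2). The helper names below are hypothetical.

```python
from math import sqrt

def se_pooled(s1, s2, n1, n2):
    """SE of xbar1 - xbar2 assuming equal population variances (pooled)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    return sqrt(sp2 * (1 / n1 + 1 / n2))

def se_unpooled(s1, s2, n1, n2):
    """SE of xbar1 - xbar2 without assuming equal variances (unpooled)."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical sample SDs and sizes for illustration.
print(se_pooled(3, 5, 12, 20), se_unpooled(3, 5, 12, 20))
```

When n1 = n2 the two estimates are identical; they differ only when the sample sizes differ.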

7. Degrees of freedom (df = ν) are given by: • two sample variances considered to be equal: df = ν = n1 + n2 - 2 • two sample variances considered unequal: df can be approximated by ν = min(n1 - 1, n2 - 1), rather than by the Welch-Satterthwaite formula
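
A sketch comparing the three degrees-of-freedom choices. The slide does not show the formula that the minimum replaces; the code assumes it is the Welch-Satterthwaite approximation, which statistical packages commonly use for the unequal-variances t-test. Function names and inputs are hypothetical.

```python
def df_pooled(n1, n2):
    """df when the two variances are assumed equal."""
    return n1 + n2 - 2

def df_conservative(n1, n2):
    """Simple approximation for unequal variances, as on the slide."""
    return min(n1 - 1, n2 - 1)

def df_welch(s1, s2, n1, n2):
    """Welch-Satterthwaite approximation (assumed; not shown on the slide)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical inputs.
print(df_conservative(12, 20), df_welch(3, 5, 12, 20), df_pooled(12, 20))
```

The Welch value always lies between min(n1 − 1, n2 − 1) and n1 + n2 − 2, so using the minimum gives fewer degrees of freedom, a larger critical t and a more conservative test.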

8. Comparing two population means • Test at the 5% level of significance to see if there is a difference in the mean level of blood pressure for men and women • Find the 95% CI for the difference in two means

9. SPSS output: making sense of the data. Females have a higher mean blood pressure reading than males and possibly a lower spread. Are these differences small enough to be considered to have occurred by chance if there is no real difference?

10. Checking assumptions for the t-test • Independence: usually ensured by design • e.g. males versus females • Not independent: • students with time 1 and time 2 test scores • twins, with one randomly assigned to each group

11. Normality: BP stem-and-leaf plot for SEX = m

    | Frequency | Stem | Leaf |
    |---|---|---|
    | 3.00 | 10 | 899 |
    | 1.00 | 11 | 4 |
    | 3.00 | 12 | 045 |
    | 8.00 | 13 | 23444569 |
    | 2.00 | 14 | 09 |
    | 2.00 | 15 | 27 |
    | 2.00 | Extremes | (>=164) |

    Stem width: 10.00. Each leaf: 1 case.

    It is difficult to see with small data sets, but these could be interpreted as reasonably normally distributed.

12. Normality: BP stem-and-leaf plot for SEX = f

    | Frequency | Stem | Leaf |
    |---|---|---|
    | 2.00 | 11 | 69 |
    | 4.00 | 12 | 0356 |
    | 9.00 | 13 | 444556999 |
    | 4.00 | 14 | 0266 |
    | 3.00 | 15 | 179 |
    | 2.00 | 16 | 33 |
    | 1.00 | Extremes | (>=166) |

    Stem width: 10.00. Each leaf: 1 case.

    It is difficult to see with small data sets, but these could be interpreted as reasonably normally distributed.

13. Normal Q-Q plot process • Determine the normal scores of the sample (the Z scores expected, from smallest to largest, in a sample of this size) • Plot each sample value against its normal score. If the data are normally distributed, the points should fall approximately on a straight line.
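
A minimal sketch of the normal-scores step. The transcript does not say how the expected Z scores are computed; this assumes Blom's approximation, z_i = Φ⁻¹((i − 3/8)/(n + 1/4)) for ranks i = 1 to n, which is one common choice (other software may use slightly different plotting positions). Function names and data are hypothetical.

```python
from statistics import NormalDist

def normal_scores(n):
    """Approximate expected Z scores for ranks 1..n (Blom's formula, an assumption)."""
    std_normal = NormalDist()  # mean 0, SD 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def qq_points(sample):
    """Pair each sorted sample value with its normal score: (normal score, value)."""
    values = sorted(sample)
    return list(zip(normal_scores(len(values)), values))

# Hypothetical data; plot each value (second element) against its normal score.
for score, value in qq_points([5.1, 3.9, 6.2, 4.8, 5.5]):
    print(f"{score:6.3f}  {value}")
```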

14. Normal Q-Q plot • How straight does the line need to be? Examine this by simulating normal samples of the same size and with the same population parameters.

15. Equal variances: this plot shows the range and IQR. The spread looks similar. Why? The ranges are approximately equal and the IQRs are close (neither is twice the other).
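
A sketch of the slide's rule of thumb, assuming it can be read as: the ranges are roughly equal and neither IQR is twice the other. Quartiles here use Python's default method, which may differ slightly from SPSS's; the data and function names are hypothetical.

```python
from statistics import quantiles

def spread_summary(data):
    """Range and IQR of a sample.

    Quartiles use Python's default 'exclusive' method; SPSS may compute
    quartiles slightly differently.
    """
    q1, _, q3 = quantiles(data, n=4)
    return max(data) - min(data), q3 - q1

def compare_spread(a, b):
    """Larger/smaller ratios of the ranges and of the IQRs of two samples."""
    (range_a, iqr_a), (range_b, iqr_b) = spread_summary(a), spread_summary(b)
    range_ratio = max(range_a, range_b) / min(range_a, range_b)
    iqr_ratio = max(iqr_a, iqr_b) / min(iqr_a, iqr_b)
    return range_ratio, iqr_ratio

# Hypothetical samples: the second is exactly twice as spread out as the first.
range_ratio, iqr_ratio = compare_spread([1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6, 8, 10, 12, 14, 16])
print(range_ratio, iqr_ratio, "similar" if iqr_ratio < 2 else "check further")
```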

16. P-value > .05, therefore there is no evidence to suggest unequal variances. SPSS Output V11

17. Formally: Step 1: Specify hypotheses. Step 2: Specify α. Step 3: Decide on the statistic and region of rejection. Check normality of both samples: OK. Check whether the variances are equal or unequal: Levene's test (you can also look at the boxplot). The test for equality of variances is not significant, as p-value = 0.586. Hence use t with equal variances and df = n1 + n2 - 2; reject H0 if |t| exceeds the critical value from t tables. Step 4: Calculate t. Step 5: Conclude.
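
Levene's test is named on the slide but not defined. A minimal sketch, assuming the original mean-centred form: take the absolute deviation of each observation from its own group mean, then compute the one-way ANOVA F statistic on those deviations. Software may use variants (for example, median-centred deviations), and the p-value needs the F(k − 1, N − k) distribution, which is not computed here. The function name and data are hypothetical.

```python
from statistics import mean

def levene_w(*groups):
    """Mean-centred Levene statistic; compare with the F(k - 1, N - k) distribution."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    devs = []
    for g in groups:
        centre = mean(g)
        devs.append([abs(x - centre) for x in g])
    group_means = [mean(d) for d in devs]
    grand_mean = sum(sum(d) for d in devs) / n_total
    # One-way ANOVA F statistic computed on the absolute deviations.
    between = sum(len(d) * (m - grand_mean) ** 2 for d, m in zip(devs, group_means))
    within = sum((v - m) ** 2 for d, m in zip(devs, group_means) for v in d)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical groups: identical spreads give W = 0; the second pair differs in spread.
print(levene_w([1, 2, 3, 4], [11, 12, 13, 14]))
print(levene_w([1, 2, 3, 4], [0, 5, 10, 15]))
```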

18. Variances assumed equal: |t| = 1.176 < t(df, α/2), as the probability of getting a t as extreme as this is 0.246, i.e. > .05. Hence there is no reason to reject H0. SPSS Output V11

19. Verify the value of t SPSS Output V11
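
To verify t by hand, the pooled statistic is t = (x̄1 − x̄2) / (s_p·√(1/n1 + 1/n2)) with df = n1 + n2 − 2. The group means and SDs from the SPSS output are not in the transcript, so this sketch uses hypothetical numbers; `pooled_t` is a hypothetical helper name.

```python
from math import sqrt

def pooled_t(xbar1, xbar2, s1, s2, n1, n2):
    """Equal-variances two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled SD
    t = (xbar1 - xbar2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, df

# Hypothetical summary statistics (the SPSS values are not in the transcript).
print(pooled_t(10, 8, 2, 2, 10, 10))   # group 1 listed first: t > 0
print(pooled_t(8, 10, 2, 2, 10, 10))   # groups swapped: same |t|, opposite sign
```

The sign of t depends only on which group is listed first, which is why the conclusion compares |t| rather than t with the critical value.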

20. Conclude • As p-value = .246, i.e. p is higher than .05, we have no reason to reject H0 • OR • As |-1.176| is smaller than the critical value t(df, 0.025) from t tables, we have no reason to reject H0. We have no evidence to suggest that there is any difference in mean blood pressure between males and females.

21. Testing using a 95% confidence interval: as the difference in means is hypothesised to be 0, and 0 lies within the confidence interval, there is no reason to reject H0.
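
A sketch of the confidence-interval approach, assuming equal variances: the interval is (x̄1 − x̄2) ± t(df, α/2)·s_p·√(1/n1 + 1/n2), and H0: μ1 − μ2 = 0 is retained when 0 lies inside it. The critical value is passed in from t tables (2.101 is t(18, 0.025)); the inputs and function names are hypothetical.

```python
from math import sqrt

def pooled_t_interval(xbar1, xbar2, s1, s2, n1, n2, t_crit):
    """CI for mu1 - mu2 assuming equal variances; t_crit = t(n1 + n2 - 2, alpha/2) from tables."""
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    half_width = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width

def zero_in_interval(ci):
    """True if the hypothesised difference 0 lies inside the interval (retain H0)."""
    low, high = ci
    return low <= 0 <= high

# Hypothetical inputs with n1 = n2 = 10, so df = 18 and t(18, 0.025) = 2.101.
ci = pooled_t_interval(10, 9, 2, 2, 10, 10, t_crit=2.101)
print(ci, "retain H0" if zero_in_interval(ci) else "reject H0")
```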

22. Comparing two population means • Test at the 5% level of significance to see if there is a difference in the mean weight loss for two samples. Subjects were randomly assigned to one of two different diets. • Find the 95% CI for the difference in two means

23. Look at the data: what does it suggest?

24. Look at the data: what does it suggest?

25. Look at the data: what does it suggest? • The variable is approximately normal if the points fall close to a straight line. This is difficult to assess, so why not look at some simulated samples?

26. Look at the data: what does it suggest? WEIGHT stem-and-leaf plot for GROUP = 2.00

    | Frequency | Stem | Leaf |
    |---|---|---|
    | 2.00 | 2 | 79 |
    | 3.00 | 3 | 005 |
    | 1.00 | 4 | 1 |
    | 4.00 | 5 | 1238 |
    | 6.00 | 6 | 001178 |
    | 3.00 | 7 | 779 |
    | 3.00 | 8 | 457 |
    | 2.00 | 9 | 02 |
    | 1.00 | 10 | 4 |

    Stem width: 1.00. Each leaf: 1 case.

    Looks reasonably normal; it is hard to tell with small samples.

27. Look at the data: what does it suggest? WEIGHT stem-and-leaf plot for GROUP = 1.00

    | Frequency | Stem | Leaf |
    |---|---|---|
    | 1.00 | -0 | 2 |
    | 6.00 | 0 | 011223 |
    | 12.00 | 0 | 566778888899 |
    | 2.00 | 1 | 23 |

    Stem width: 10.00. Each leaf: 1 case.

    Looks reasonably normal; it is hard to tell with small samples.

28. The probability of getting an F statistic as large as this if there is no difference in variances is 0.006, i.e. there is evidence that the variances differ, so use the equal-variances-not-assumed results (SPSS V10 output) • As the probability of getting a t as extreme as -.11 is high (p-value = .910), i.e. greater than 0.05, we retain H0: there appears to be no difference in mean weight loss.
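
When the variances differ, the unequal-variances (Welch) statistic uses the unpooled standard error. A sketch, assuming the Welch-Satterthwaite degrees of freedom used by common statistical software; the weight-loss summary statistics are not in the transcript, so the inputs and the name `welch_t` are hypothetical.

```python
from math import sqrt

def welch_t(xbar1, xbar2, s1, s2, n1, n2):
    """Unequal-variances t statistic with Welch-Satterthwaite df (an assumption)."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors of each mean
    t = (xbar1 - xbar2) / sqrt(v1 + v2)      # unpooled standard error
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics, not the weight-loss data.
print(welch_t(10, 8, 2, 2, 10, 10))
print(welch_t(10, 8, 1, 6, 21, 25))
```

With equal SDs and equal sample sizes, as in the first call, the Welch t and df reduce to the pooled values (here df = 18).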

29. SPSS V11 output (different layout): Analyse, Compare Means, Independent-Samples T Test, etc.

30. Checking assumptions for the t-test • What if we do not have independent groups, as the two-sample t-test requires? • Examples • students with time 1 and time 2 test scores • twins, with one randomly assigned to each group • Data are paired or matched • The analysis takes into account the correlation between the two measures

31. Paired t-test: using d to represent the difference x1 - x2 • H0: μd = 0 • Ha: μd ≠ 0 • Select α • Select the test and state the decision rule for rejection • Calculate the test statistic as for a one-sample t on the differences • Draw conclusions
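
The paired test is a one-sample t on the differences: t = d̄ / (s_d/√n) with n − 1 degrees of freedom. A minimal sketch; the name `paired_t` and the data are hypothetical (the pilot times from Utts are not reproduced in the transcript).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x1, x2):
    """Paired t: a one-sample t test on the differences d = x1 - x2."""
    if len(x1) != len(x2):
        raise ValueError("paired data need the same number of observations")
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))  # H0: mu_d = 0
    return t, n - 1

# Hypothetical paired measurements (e.g. time 1 and time 2 for the same subjects).
print(paired_t([5, 6, 7, 8], [4, 4, 4, 4]))
```

For these hypothetical data the differences are 1, 2, 3, 4, giving t = √15 ≈ 3.87 on 3 df.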

32. Data: Utts, p. 446. Time for task performance by 10 pilots under alcohol and no-alcohol conditions. Diff is the time difference: no alcohol - alcohol.

33. Examine plots of both variables. One outlier in the alcohol condition. The median time is lower for the alcohol condition (300) than for the no-alcohol condition. The spread (IQR) is greater for the no-alcohol condition. Some asymmetry in both conditions. N is small.

34. Menu options • Analyse, Compare Means, Paired-Samples T Test • Must select two variables to move into the paired analysis OR • Analyse, Compare Means, One-Sample T Test, analysing the variable diff

35. Check assumptions: normality, as n is small. DIFF stem-and-leaf plot

    | Frequency | Stem | Leaf |
    |---|---|---|
    | 2.00 | -0 | 00 |
    | 6.00 | 0 | 000113 |
    | 2.00 | 0 | 55 |

    Stem width: 1000. Each leaf: 1 case.

36. Output, two-tailed (Ha: μd ≠ 0): t = 2.286 > t(9, 0.025). The probability of getting a t this extreme is 0.025. As this is low (< .05), there is evidence that the mean times are not the same for the alcohol and no-alcohol conditions.

37. Output, one-tailed (Ha: μd > 0): as the probability of a t this large is 0.025/2 = 0.0125 (one-tailed), which is smaller than .05, we reject H0; there is evidence that the mean time is greater under the no-alcohol condition than under the alcohol condition. (Note: in Utts the Minitab p-value is the one-tailed value, which is doubled if the test is two-tailed.)
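
The conversion on this slide can be sketched as a small helper, assuming a symmetric test statistic such as t: halve the two-tailed p-value when t falls on the side predicted by Ha; otherwise the one-tailed p-value is 1 minus that half. The function name is hypothetical.

```python
def one_tailed_p(p_two_sided, t, alternative):
    """Convert a two-sided p-value from a symmetric t (or Z) test to a one-sided one.

    alternative: "greater" for Ha: mu_d > 0, or "less" for Ha: mu_d < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    in_predicted_direction = t > 0 if alternative == "greater" else t < 0
    if in_predicted_direction:
        return p_two_sided / 2        # t lies on the side Ha predicts
    return 1 - p_two_sided / 2        # t lies on the opposite side

print(one_tailed_p(0.025, 2.286, "greater"))  # slide's reported values
```

Applied to the slide's reported values, `one_tailed_p(0.025, 2.286, "greater")` gives 0.0125.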

38. Comparing two proportions: by the central limit theorem, for large n (> 30) we can use the Normal (Z) distribution. Samples must be independent, and in both samples the counts of successes and failures should be at least 5, preferably 10. Confidence interval for the difference in proportions. Test statistic.
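
The formulas for this slide are not reproduced in the transcript. A sketch, assuming the common textbook forms: the interval uses the unpooled standard error √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2), while the test of H0: p1 = p2 uses the pooled proportion p̂ = (x1 + x2)/(n1 + n2). The survey counts are not in the transcript, so the inputs and function names are hypothetical; `counts_ok` encodes the "preferably 10" guideline applied to successes and failures.

```python
from math import sqrt
from statistics import NormalDist

def two_proportions(x1, n1, x2, n2, conf=0.95):
    """CI for p1 - p2 (unpooled SE) and Z for H0: p1 = p2 (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (p1 - p2 - z_crit * se_ci, p1 - p2 + z_crit * se_ci)
    p_pool = (x1 + x2) / (n1 + n2)                      # common p under H0
    z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return ci, z

def counts_ok(x, n, minimum=10):
    """Rule of thumb: at least `minimum` successes and failures in a sample."""
    return x >= minimum and n - x >= minimum

# Hypothetical counts, not the NSW survey: 60 of 100 versus 40 of 100.
ci, z = two_proportions(60, 100, 40, 100)
print(ci, z, abs(z) > 1.96, counts_ok(60, 100) and counts_ok(40, 100))
```

With these hypothetical counts Z ≈ 2.83, which exceeds 1.96.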

39. Example • A survey of people in NSW revealed the following voting intentions.

40. Example • (1) Find the 95% confidence interval for the difference in the proportions of males and females voting ALP

41. CI for the difference: assumptions (the counts should be more than 10)

42. CI for difference in two proportions

43. CI for difference in two proportions: as 0 lies between the lower and upper limits of the CI, we have no evidence of a real difference between the proportions of male and female ALP voters.

44. Example: hypothesis test • H0: • Ha: • α = • Decision: • Calculate: from before

45. Example: hypothesis test • H0: pm = pf • Ha: pm ≠ pf • α = .05 • Decision: if |Z| > 1.96, reject H0 • Calculate: from before

46. Example: hypothesis test conclusion. As 1.45 < 1.96, there is no reason to reject H0 that the proportion of male ALP voters is equal to the proportion of female ALP voters.

47. Formal Steps • Specify hypotheses • Specify Alpha • Select test and rejection region • Compute statistic from data • Draw conclusions

48. Comparing three means • Not by multiple t-tests • Analysis of variance techniques: see Chapter 16, pp. 561-576

49. Comparing three medians • Not by multiple t-tests • Non-parametric procedures: see Utts, Section 16.3