Exploring Dependent Samples t-Test Analysis

Go to the beginning of Handout #4:

Dependent Samples t-Test

To compare means (1) for two repeated measures of a quantitative variable at two different times, (2) for two measurements of a quantitative variable from the same subject under two different conditions, or (3) for two commensurate measurements of quantitative variables from the same subject, one sample of paired observations is often used (as opposed to two independent samples of observations). When there is a choice, collecting data from one sample of paired observations is preferable to collecting data from two independent samples, since pairing tends to reduce random variation and so decreases the influence on the dependent variable of variables other than the independent variable(s) of primary interest. We can let μd represent the mean difference when one of the two paired measurements is subtracted from the other, where the subscript d defines the order of subtraction. In this situation, one can think of the quantitative variable which is measured twice as the dependent variable and the dichotomous variable which describes the conditions under which the two measurements are made as the independent variable. It is also possible in this situation to think of the two quantitative measurements as two dependent variables with no independent variable.

A dependent samples t-test, also known as a one-sample paired t-test, can be used to decide if there is a significant difference in means, that is, whether the mean difference μd is significantly different from 0 (zero). This test is considered to be parametric, since the focus is on estimating the mean difference. The H0 states that the mean difference μd is 0 (zero). The H1 can be a one-sided or two-sided statement that the hypothesized value of 0 (zero) for μd is not correct. The one-sample t test statistic is available based on the assumption that the two measures whose difference is the dependent variable are each a quantitative-continuous variable having a normal distribution. A confidence interval for estimating the mean difference is simply the one-sample t confidence interval.

The Wilcoxon signed ranks test can be used to decide if there is a significant tendency for the difference between measures to be in one direction or the other. This test is considered to be nonparametric, since the focus is on rankings instead of actual values. The H0 states that there is no tendency for the difference between measures to be in one direction or the other. The H1 can be a one-sided or two-sided statement that there is a significant tendency for the difference between measures to be in one direction or the other.

The Wilcoxon test statistic is based on ranking the absolute values of all the differences, and then comparing the sum of the ranks corresponding to the positive differences with the sum of the ranks corresponding to the negative differences; this test assumes only that the two measures are either both quantitative or both qualitative-ordinal. For a very small sample size, special tables are available to decide whether H0 should be rejected or not and to calculate the p-value. For a “sufficiently large” sample size, a z statistic can be calculated from the signed ranks so that a standard normal distribution can be used to decide whether H0 should be rejected or not and to calculate the p-value.

Go to Exercise #5 on Class Handout #4:

5. Two identical footballs, one air-filled and one helium-filled, were used outdoors on a windless day at the Ohio State University's athletic complex to compare the distances traveled by the ball with helium and with air. A novice punter, who was not informed which football contained the helium, was the kicker. He kicked each football 39 times and changed footballs after each kick so that his leg would play no favorites if he tired or improved with practice. The experimenter recorded the distance in yards traveled by each ball. The following data are reported by Lafferty, M. B. (1993), "OSU scientists get a kick out of sports controversy," The Columbus Dispatch (November 21, 1993), B7:

Trial    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Air     25 23 18 16 35 15 26 24 24 28 25 19 27 25 34 26
Helium  25 16 25 14 23 29 25 26 22 26 12 28 28 31 22 29

Trial   17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Air     20 22 33 29 31 27 22 29 28 29 22 31 25 20 27 26
Helium  23 26 35 24 31 34 39 32 14 28 30 27 33 11 26 32

Trial   33 34 35 36 37 38 39
Air     28 32 28 25 31 28 28
Helium  30 29 30 29 29 30 26

A 0.05 significance level is chosen to see if there is any evidence that the mean distance traveled by a football is larger with helium than with air.

(a) State whether a paired t test or a two sample t test should be used and why.

Since the data consist of one sample of paired measurements, a paired t test should be used.

5.-continued

(b) Use SPSS to do the calculations necessary for the hypothesis test and to create an appropriate graphical display. Then, complete the four steps of the hypothesis test by completing the table titled Hypothesis Test About Mean Football Distance.

The data is stored in the SPSS data file football. When using the Analyze > Compare Means > Paired-Samples T Test options in SPSS, two variables must be selected for the Paired Variables section. In case we reject H0 and want to estimate the mean difference with a confidence interval, set the confidence level in SPSS to be 95%, since we have α = 0.05.

Hypothesis Test About Mean Football Distance

Step 1   H0: μH–A = 0     H1: μH–A > 0     α = 0.05 (one sided)
         Note that the order of subtraction on the SPSS output is the opposite of the order chosen in H0.

Step 2   n = 39     x̄H–A = 0.462     sH–A = 6.867     t38 = 0.420
         These statistics can all be obtained from the SPSS output.

Step 3   t distribution with df = 38;  t38; 0.05 = 1.684
         p-value: 0.10 < p (from the Student’s t distribution table), or p = 0.677/2 = 0.3385 (from the SPSS output)
         Do not reject H0.

Step 4   Since t38 = 0.420 and t38; 0.05 = 1.684, we do not have sufficient evidence to reject H0 at the 0.05 level. We conclude that the mean yards traveled by a football is not larger with helium than with air (0.10 < p, or p = 0.3385). That is, there is no statistically significant difference in mean yards with helium (mean = 26.38) and with air (mean = 25.92) in n = 39 pairs of kicks.
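The Step 2 statistics can also be checked outside SPSS. The following is a minimal sketch in Python, using only the standard library and the data from the Exercise 5 table; the function name paired_t and the variable names are chosen here for illustration and do not come from the handout. The differences are taken as helium minus air, matching the order of subtraction in H0.

```python
import math
import statistics

# Distances in yards for the 39 trials, copied from the Exercise 5 data table.
air = [25, 23, 18, 16, 35, 15, 26, 24, 24, 28, 25, 19, 27, 25, 34, 26,
       20, 22, 33, 29, 31, 27, 22, 29, 28, 29, 22, 31, 25, 20, 27, 26,
       28, 32, 28, 25, 31, 28, 28]
helium = [25, 16, 25, 14, 23, 29, 25, 26, 22, 26, 12, 28, 28, 31, 22, 29,
          23, 26, 35, 24, 31, 34, 39, 32, 14, 28, 30, 27, 33, 11, 26, 32,
          30, 29, 30, 29, 29, 30, 26]


def paired_t(first, second):
    """Return (n, mean difference, sd of differences, t) for first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))      # one-sample t on the differences
    return n, mean_d, sd_d, t


# Helium minus air, the order of subtraction used in H0.
n, mean_d, sd_d, t_stat = paired_t(helium, air)
print(f"n = {n}, mean difference = {mean_d:.3f}, sd = {sd_d:.3f}, t = {t_stat:.3f}")
# prints: n = 39, mean difference = 0.462, sd = 6.867, t = 0.420
```

Swapping the arguments (air minus helium) changes only the sign of the mean difference and of t, which is why the SPSS output shows the opposite sign.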

5.-continued

(c) Considering the results of the hypothesis test, decide which of the Type I or Type II errors is possible, and describe this error.

Since H0 is not rejected, the Type II error is possible, which is concluding that μH–A = 0 when actually μH–A > 0.

(d) Decide whether H0 would have been rejected or would not have been rejected with each of the following significance levels: (i) α = 0.01, (ii) α = 0.10.

H0 would not have been rejected with α = 0.01 nor with α = 0.10.

(e) Considering the results of the hypothesis test, explain why a confidence interval for the mean difference is not of interest.

Since H0 is not rejected, we have no reason to doubt the hypothesized value for the mean difference; in fact the 95% confidence interval will most likely contain the hypothesized value for the mean difference. Note that the limits of the confidence interval (displayed on the SPSS output) contain the hypothesized value for the mean difference, zero (0).

Two box plots, one for the distances with helium and one for the distances with air, would be an appropriate graphical display for one sample of quantitative paired measurements; one box plot of the differences would also be appropriate. Since we notice that there are several outliers among the differences, we might decide to perform the paired t test with these outliers removed from the data. (See part (f).) Each outlier is labeled with its case number (i.e., its line number in the SPSS data file).

5.-continued

(f) If the paired t test is performed after removing the outliers among the differences in this data, the resulting p-value is 0.071. Would this change the conclusion in the hypothesis test (i) with the significance level α = 0.05 actually selected? (ii) if a significance level α = 0.01 had been selected? (iii) if a significance level α = 0.10 had been selected?

(i) H0 is not rejected with α = 0.05 both when the outliers are removed from the data and when the outliers are included in the data.

(ii) H0 is not rejected with α = 0.01 both when the outliers are removed from the data and when the outliers are included in the data.

(iii) H0 is rejected with α = 0.10 when the outliers are removed from the data, but H0 is not rejected when the outliers are included in the data.

6. Obtain the SPSS output for the example on pages 9 to 11 of the textbook by first selecting options Analyze > Compare Means > Paired-Samples T Test, choosing Anxiety_Pre to be Variable1 for the Paired Variables, and choosing Anxiety_Post to be Variable2 for the Paired Variables; then, select options Analyze > Nonparametric Tests > 2 Related Samples, choose Anxiety_Pre to be Variable1 for the Paired Variables, and choose Anxiety_Post to be Variable2 for the Paired Variables. Compare the syntax file commands generated by the output with those shown in the textbook.

Look at the Analysis: SPSS output section on pages 10 to 11 of the textbook. Results concerning the dependent samples t-test stated in the text continue to use, as in the past, conventions/formats that are popular in the social sciences. Notice that the information displayed is as in the past except there is more detailed information about the p-value. Finally, notice how the results for the Wilcoxon test are stated as a confirmation of the results of the t test.
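The textbook's anxiety data are not reproduced in this handout, so the sketch below illustrates the signed-rank computation on the football distances from Exercise 5 instead. It is a minimal illustration in standard-library Python, not the SPSS procedure: it assumes the common convention of dropping zero differences, assigns average ranks to tied absolute differences, and uses the large-sample z statistic without a correction for ties, so its value may differ slightly from SPSS output. The function names are hypothetical.

```python
import math

# Distances in yards for the 39 trials, copied from the Exercise 5 data table.
air = [25, 23, 18, 16, 35, 15, 26, 24, 24, 28, 25, 19, 27, 25, 34, 26,
       20, 22, 33, 29, 31, 27, 22, 29, 28, 29, 22, 31, 25, 20, 27, 26,
       28, 32, 28, 25, 31, 28, 28]
helium = [25, 16, 25, 14, 23, 29, 25, 26, 22, 26, 12, 28, 28, 31, 22, 29,
          23, 26, 35, 24, 31, 34, 39, 32, 14, 28, 30, 27, 33, 11, 26, 32,
          30, 29, 30, 29, 29, 30, 26]


def signed_rank_sums(first, second):
    """Return (n used, sum of positive ranks, sum of negative ranks)."""
    # Assumption: zero differences are dropped before ranking.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of tied absolute differences starting at position i.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2      # mean of ranks i+1, ..., j
        for position in range(i, j):
            ranks[position] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return len(ordered), w_plus, w_minus


def normal_approx_z(n, w_plus):
    """Large-sample z for the positive rank sum (no tie correction)."""
    expected = n * (n + 1) / 4
    std_dev = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - expected) / std_dev


n_used, w_plus, w_minus = signed_rank_sums(helium, air)
z = normal_approx_z(n_used, w_plus)
print(f"n used = {n_used}, W+ = {w_plus}, W- = {w_minus}, z = {z:.3f}")
```

A quick sanity check on any such computation is that the two rank sums add up to n(n + 1)/2, where n is the number of nonzero differences.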

Go to the beginning of Class Handout #4:

One-Way Analysis of Variance (ANOVA)

Determining whether a relationship exists between a qualitative-nominal variable and a quantitative variable is essentially equivalent to determining whether there is a difference in the distribution of the quantitative variable for each of the categories of the qualitative-nominal variable. (Note that the independent samples t-test can be used when the qualitative-nominal variable is a dichotomous variable.) When looking for a difference in the distribution of a quantitative variable for k categories of a qualitative-nominal variable (and for k = 2 the qualitative-nominal variable is dichotomous), it is common (but not necessary) to focus on the mean of the distribution. We can let μ1, μ2, … , μk represent the respective means of the quantitative variable for the k categories of the qualitative-nominal variable.

In this situation, one can think of the quantitative variable as the dependent variable and the qualitative-nominal variable as the independent variable, that is, we can think of predicting the (mean of the) quantitative variable from one qualitative-nominal variable. The technique called one-way analysis of variance (ANOVA) can be used; when predicting from two or three qualitative-nominal variables, a two-way ANOVA or three-way ANOVA respectively could be used.

A one-way ANOVA f-test can be used to decide if there is at least one significant difference in the means. This test is considered to be parametric, since the focus is on estimating the difference in means. The H0 states that there is no difference among the means μ1, μ2, … , μk. The H1 states that there is at least one difference among the means μ1, μ2, … , μk. A one-way ANOVA f test statistic is available based on the assumptions that for each of the k categories of the qualitative-nominal variable, the dependent variable is a quantitative-continuous variable having a normal distribution with the same variance for the k categories. Alternative f statistics (such as the Brown-Forsythe and Welch statistics) are available when the assumption about equal variances is not met.

The one-way ANOVA f statistic is a ratio defined by a measure of variation among sample means divided by a measure of variation within samples. The measure of variation among sample means and the measure of variation within samples are each calculated by dividing a sum of squares by a corresponding degrees of freedom.

[Figure: box plots for three samples where the variation within samples is large relative to the variation between samples.]

[Figure: box plots for three samples where the variation between samples is large relative to the variation within samples.]

The between groups sum of squares is computed by first squaring the difference between each sample mean and the overall mean from combining all samples together, then multiplying each squared difference by the corresponding sample size, and finally summing all these results. The between groups mean square is obtained by dividing the between groups sum of squares by its degrees of freedom, which is one less than the number of groups.

The within groups sum of squares is computed by first squaring the difference between each observation and the corresponding sample mean, and then summing all these results. The within groups mean square is obtained by dividing the within groups sum of squares by its degrees of freedom, which is the total sample size minus the number of groups.

The one-way ANOVA f statistic is the between groups mean square divided by the within groups mean square, and the calculations to obtain this f statistic, denoted by fk–1, n–k, are often organized into what is called an ANOVA table:

Source            df       SS    MS    f    p-value
Between Groups    k – 1
Within Groups     n – k
Total             n – 1

Note that k is the number of groups (samples), and n is the total sample size (i.e., the sum of the k sample sizes). The one-way ANOVA table has a column for the different sources of variation, a column for degrees of freedom (df), a column for sums of squares (SS), and a column for mean squares (MS); also included are the f statistic and corresponding p-value. The first row (Between Groups) is concerned with the variation among sample means, the second row (Within Groups) is concerned with the variation within samples, and the third and last row (Total) is concerned with the total variation in the data. The sum of the between groups sum of squares and the within groups sum of squares is equal to the total sum of squares, and the sum of the between groups degrees of freedom and the within groups degrees of freedom is equal to the total degrees of freedom (which is one less than the sum of the sample sizes). When displaying an ANOVA table, the label “Between Groups” for the first row is usually replaced by a label which reflects the groups defined by the qualitative variable, and the label “Within Groups” for the second row is sometimes replaced by the label “Error” or “Residual”.
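The verbal definitions of the sums of squares and the f statistic can be written compactly as follows. The notation is introduced here for illustration and is not taken from the handout: n_i is the size of sample i, \bar{x}_i is its sample mean, \bar{x} is the overall mean, and x_{ij} is observation j in sample i.

```latex
\begin{align*}
SS_{\text{between}} &= \sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2, & df_{\text{between}} &= k - 1,\\
SS_{\text{within}} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2, & df_{\text{within}} &= n - k,\\
SS_{\text{total}} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2
  = SS_{\text{between}} + SS_{\text{within}}, & df_{\text{total}} &= n - 1,\\
f_{k-1,\,n-k} &= \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(n-k)}.
\end{align*}
```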

Since we conclude that the mean is the same for each category (or population) when the null hypothesis is not rejected in a one-way ANOVA, there is generally no further statistical analysis of interest. Since we conclude that the mean is different for at least one category (or population) when the null hypothesis is rejected in a one-way ANOVA, there is generally a need for further statistical analysis to decide which differences in pairs of means are significant. Go to Exercise #7 on Class Handout #4:

7. The mean lifetime of light bulbs is being studied for three brands named Brite, Softlite, and Nodark. A 0.05 significance level is chosen for a hypothesis test to see if there is any evidence that mean lifetime is not the same for the brands Brite, Softlite, and Nodark. Light bulbs are randomly selected from each brand, and the lifetimes in hours are recorded as follows:

Brite      1094  1121  1151
Softlite   1066  1097  1117  1112  1078
Nodark     1158  1139  1147  1112

(a) Since it is believed that the standard deviation of lifetime is not significantly different for the three brands, it is decided that a one-way ANOVA will be used. Calculate the sample means, overall mean, between groups sum of squares, and within groups sum of squares; then complete the ANOVA table.

Source         df    SS    MS    f    p-value
Bulb Brands
Error
Total

Source         df    SS      MS         f       p-value
Bulb Brands     2    4644    2322       4.46    0.01 < p < 0.05
Error           9    4682    520.222
Total          11    9326

Sample Means
Brite:     (1094 + 1121 + 1151) / 3 = 1122
Softlite:  (1066 + 1097 + 1117 + 1112 + 1078) / 5 = 1094
Nodark:    (1158 + 1139 + 1147 + 1112) / 4 = 1139

Overall Mean
(1094 + 1121 + … + 1147 + 1112) / 12 = 1116

Between Groups Sum of Squares
(3)(1122 – 1116)² + (5)(1094 – 1116)² + (4)(1139 – 1116)² = 4644

Within Groups Sum of Squares
(1094 – 1122)² + (1121 – 1122)² + … + (1147 – 1139)² + (1112 – 1139)² = 4682

Total Sum of Squares
(1094 – 1116)² + (1121 – 1116)² + … + (1147 – 1116)² + (1112 – 1116)² = 9326 = 4644 + 4682

Since f2, 9 = 4.46 and f2, 9; 0.05 = 4.26, we have sufficient evidence to reject H0 at the 0.05 level. We conclude that mean lifetime is not the same for the brands Brite, Softlite, and Nodark (0.01 < p < 0.05).

Next class, we shall continue this exercise, using SPSS to do the calculations.
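The hand calculations for Exercise 7 can be checked with a short script. This is a minimal sketch in standard-library Python that follows the definitions of the between groups and within groups sums of squares given earlier; the function name one_way_anova and the dictionary keys are chosen here for illustration. It does not compute a p-value, which would still come from an f table or from SPSS.

```python
import statistics

# Lifetimes in hours for each brand, copied from Exercise 7.
lifetimes = {
    "Brite": [1094, 1121, 1151],
    "Softlite": [1066, 1097, 1117, 1112, 1078],
    "Nodark": [1158, 1139, 1147, 1112],
}


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table as a dictionary."""
    all_values = [x for values in groups.values() for x in values]
    n = len(all_values)                 # total sample size
    k = len(groups)                     # number of groups
    overall_mean = statistics.mean(all_values)
    # Between groups: squared distance of each sample mean from the
    # overall mean, weighted by the sample size.
    ss_between = sum(len(values) * (statistics.mean(values) - overall_mean) ** 2
                     for values in groups.values())
    # Within groups: squared distance of each observation from its own
    # sample mean.
    ss_within = sum((x - statistics.mean(values)) ** 2
                    for values in groups.values() for x in values)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "df_between": k - 1, "df_within": n - k, "df_total": n - 1,
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


table = one_way_anova(lifetimes)
print(f"Bulb Brands  df={table['df_between']}  SS={table['ss_between']:.0f}  "
      f"MS={table['ms_between']:.3f}  f={table['f']:.2f}")
print(f"Error        df={table['df_within']}  SS={table['ss_within']:.0f}  "
      f"MS={table['ms_within']:.3f}")
print(f"Total        df={table['df_total']}  SS={table['ss_total']:.0f}")
```

The printed sums of squares, mean squares, and f statistic should agree with the completed ANOVA table (4644, 4682, 9326, 2322, 520.222, and 4.46).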
