Comparing Two Samples: Part II


Tests for Two Paired Samples
• When each observation in sample A is in some way correlated with an observation in sample B, the data are said to occur in pairs.
• Example: test Ho: left foreleg and left hindleg lengths of deer are equal.
• Comparisons between:
  • 2 analytical (or measurement) methods applied to the same sample
  • 2 operators or experimenters performing the same task
• Other applications: testing whether point discharges in a town significantly increase the mean phosphate concentration below the town (over a period of time).

Paired-Sample t Test
• Does not have the equality-of-variances assumption of the two-sample t test.
• Assumes only that the differences, $d_i$, come from a normally distributed population of differences.
• If the data from the two samples are correlated pairwise, the paired-sample t test will be more powerful than the two-sample t test for independent samples.

Paired-sample t test: an example
• Ho: left foreleg and left hindleg lengths of deer are equal, i.e. Ho: $\mu_d = 0$ and HA: $\mu_d \neq 0$
• $n = 10$, $\bar{d} = 3.3$, $df = n - 1 = 9$
• $s_d^2 = 9.3444$, $s_d = \sqrt{9.3444} = 3.06$, $s_{\bar{d}} = s_d/\sqrt{n} = 3.06/\sqrt{10} = 0.97$
• $t = \bar{d}/s_{\bar{d}} = 3.3/0.97 = 3.402$
• $t_{0.05(2),9} = 2.262$, $0.005 < p < 0.01$
• Thus, reject Ho.
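The arithmetic in this example can be checked with a minimal sketch that works from the summary statistics only. The function name `paired_t_from_summary` is an illustrative choice, not something from the slides, and the critical value is simply copied from the example rather than computed. With the unrounded standard error the statistic comes out at about 3.41; the slide's 3.402 uses the standard error rounded to 0.97.

```python
import math

def paired_t_from_summary(n, mean_diff, var_diff, mu0=0.0):
    """Return (t, df, se) for a paired-sample t test from summary statistics."""
    if n < 2:
        raise ValueError("need at least two pairs")
    se = math.sqrt(var_diff / n)       # standard error of the mean difference
    t = (mean_diff - mu0) / se         # test statistic for Ho: mu_d = mu0
    return t, n - 1, se

# Summary statistics quoted in the deer example.
t, df, se = paired_t_from_summary(n=10, mean_diff=3.3, var_diff=9.3444)

T_CRIT = 2.262  # t_0.05(2),9 as quoted in the example (read from a t table)
print(f"t = {t:.3f}, df = {df}, se = {se:.3f}, reject Ho: {abs(t) > T_CRIT}")
```

Because the two-tailed test compares |t| with the critical value, the sign of the mean difference does not affect the decision.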

For one-tailed hypotheses with paired samples, you can test either Ho: $\mu_d \geq \mu_0$ and HA: $\mu_d < \mu_0$, or Ho: $\mu_d \leq \mu_0$ and HA: $\mu_d > \mu_0$.
• e.g. To test whether a new fertilizer results in an increase of more than 250 kg/ha in crop yield over the old fertilizer:
• Ho: $\mu_d \leq 250$ and HA: $\mu_d > 250$
• $t = (\bar{d} - \mu_0)/s_{\bar{d}} = (295.6 - 250)/26.9 = 1.695$
• $t_{0.05(1),8} = 1.860$, $0.05 < p < 0.10$
• Thus, do not reject Ho.
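The one-tailed decision rule can be sketched as follows. The function name and the `alternative` argument are hypothetical, chosen for illustration; the critical value is the tabulated $t_{0.05(1),8}$ quoted in the fertilizer example, not something the code computes.

```python
def one_tailed_paired_t(mean_diff, mu0, se, t_crit, alternative="greater"):
    """Return (t, reject) for a one-tailed paired-sample t test.

    t_crit is the positive one-tailed critical value from a t table.
    """
    t = (mean_diff - mu0) / se
    if alternative == "greater":       # HA: mu_d > mu0
        reject = t >= t_crit
    elif alternative == "less":        # HA: mu_d < mu0
        reject = t <= -t_crit
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return t, reject

# Fertilizer example: Ho: mu_d <= 250, HA: mu_d > 250.
t, reject = one_tailed_paired_t(295.6, 250.0, 26.9, t_crit=1.860)
print(f"t = {t:.3f}, reject Ho: {reject}")
```

Here the direction of the alternative decides which tail is examined, so a large statistic in the wrong direction never leads to rejection.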

A new method is being developed for blood urea nitrogen (BUN) and you wish to compare the new method with a standard method. You have 6 blood samples, each of which is analyzed for BUN by both methods.
• To test whether the new method gives results comparable to those determined by the standard method:
• Ho: $\mu_d = 0$ and HA: $\mu_d \neq 0$
• $t = \bar{d}/s_{\bar{d}} = 0.28/0.17 = 1.665$
• $t_{0.05(2),5} = 2.571$, $p > 0.05$
• Thus, do not reject Ho.

A researcher wishes to compare the concentrations of dissolved phosphate (PO₄³⁻) in two stretches of the same river using the molybdenum blue method.
• At midday, a sample is taken upstream above the town, and the researcher then drives below the town to take a second sample downstream a few minutes later.
• The same procedure is followed for the next 9 days to obtain a total of 10 upstream and 10 downstream samples.
• Most PO₄³⁻ in polluted river systems is known to originate from discrete outfalls.
• This study was undertaken to see whether point discharges in the town significantly increase the mean PO₄³⁻ concentration below the town.
(Photo source: EPD)

During the sampling period, there was heavy rainfall on days 4 and 5. Since this would have affected both the upstream and the downstream samples, the samples were not independent. In this case, a paired test must be used.
• Ho: $\mu_{above} \geq \mu_{below}$ and HA: $\mu_{above} < \mu_{below}$
• $t = \bar{d}/s_{\bar{d}} = -0.089/0.022 = -3.994$, so $|t| = 3.994$
• $t_{0.05(1),9} = 1.833$, $0.001 < p < 0.0025$
• Thus, reject Ho in favour of HA.

A non-parametric paired test: Wilcoxon’s matched-pairs signed-ranks test
• An alternative method for paired measurements is based upon a ranking procedure.
• Wilcoxon’s test is used when the assumptions of the paired t test fail, i.e. when the differences are not normally distributed.
• A test statistic T is obtained by ranking the absolute differences and summing the ranks of the positive and of the negative differences separately; the smaller sum is compared with the critical value of T.
• The null hypothesis is rejected if the observed T value is less than or equal to the critical T value.


Wilcoxon’s test
• Test Ho: left foreleg and left hindleg lengths of deer are equal, i.e. Ho: $\mu_d = 0$ and HA: $\mu_d \neq 0$
• $T_+ = 4.5 + 4.5 + 7 + 7 + 9.5 + 7 + 9.5 + 2 = 51$
• $T_- = 3 + 1 = 4$
• $T_{0.05(2),10} = 8$ from Table B.12 (Zar’s book)
• Since $T_- < T_{0.05(2),10}$, Ho is rejected; $0.01 < p < 0.02$.
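The ranking procedure behind $T_+$ and $T_-$ can be sketched as below. The slide's data table did not survive, so the list `diffs` is an assumed set of differences, chosen only because it reproduces the ranks listed above and the summary statistics of the paired t example (n = 10, mean 3.3, variance 9.3444); treat it as illustrative rather than as the original data. The function name `signed_rank_sums` is also hypothetical.

```python
def signed_rank_sums(diffs):
    """Return (T_plus, T_minus) for Wilcoxon's matched-pairs signed-ranks test."""
    nonzero = [d for d in diffs if d != 0]    # zero differences are dropped
    ordered = sorted(nonzero, key=abs)        # rank by absolute difference
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over a run of tied absolute values.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        midrank = ((i + 1) + (j + 1)) / 2     # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    t_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return t_plus, t_minus

# Assumed differences (see the note above); not taken from the slides.
diffs = [4, 4, -3, 5, -1, 5, 6, 5, 6, 2]
t_plus, t_minus = signed_rank_sums(diffs)

T_CRIT = 8  # T_0.05(2),10 as quoted from Table B.12
print(t_plus, t_minus, "reject Ho:", min(t_plus, t_minus) <= T_CRIT)
```

A useful check is that $T_+ + T_- = n(n+1)/2$, which is 55 for n = 10.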

Wilcoxon’s test
• A long-term experiment is conducted to determine if total soil nitrogen (N) is depleted in grassland which is given one cut per year but is left unfertilized. The study will show whether atmospheric N sources are sufficient to replenish the N lost by cutting, denitrification and leaching.
• Twelve 1 m² grassland plots were maintained for a period of 10 years. The 12 measurements may be considered paired as they are taken from the same 12 sites over the 10-year period.
• Test Ho: $\mu_d = 0$ and HA: $\mu_d \neq 0$

Wilcoxon’s test
• Test Ho: $\mu_d = 0$ and HA: $\mu_d \neq 0$, $n = 12$
• $T_+ = 9 + 12 + 5 + 10 + 2 + 11 + 6.5 + 4 + 8 = 67.5$
• $T_- = 1 + 3 + 6.5 = 10.5$
• $T_{0.05(2),12} = 13$
• Since $T_- < T_{0.05(2),12}$, Ho is rejected; $0.02 < p < 0.05$.
• Accept HA: there is a significant decline in nitrogen content after the 10-year period.

Power and sample size for Student’s t test
• We can estimate the minimum sample size needed to achieve desired test characteristics:
• $n \geq \dfrac{2 s_p^2}{\delta^2}\,(t_{\alpha,\nu} + t_{\beta(1),\nu})^2$
• where $\delta$ is the smallest population difference we wish to detect: $\delta = \mu_1 - \mu_2$
• The required sample size depends on $\delta$, the population variance ($\sigma^2$), $\alpha$, and the power ($1 - \beta$).
• If we want to detect a very small $\delta$, we need a larger sample.
• If the variability within samples is great, a large n is required. The results of a pilot study or a previous study of this type would provide such information.

Two-sample t test: blood-clotting example (shown previously)
• The data are human blood-clotting times (in minutes) of individuals given one of two different drugs. It is hypothesized that Ho: $\mu_a = \mu_b$ while HA: $\mu_a \neq \mu_b$ (given that the data are normally distributed).

            Sample 1    Sample 2
  n         6           7
  mean      8.75        9.74
  s²        0.339       0.669

• $s_p^2 = (SS_1 + SS_2)/(\nu_1 + \nu_2) = 0.519$
• $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2/n_1 + s_p^2/n_2} = 0.401$
• $t = (\bar{X}_1 - \bar{X}_2)/s_{\bar{X}_1 - \bar{X}_2} = -2.470$
• $df = 6 + 7 - 2 = 11$, $t_{0.05(2),11} = 2.201 < 2.470$; $0.02 < p < 0.05$
• Thus, reject Ho in favour of HA.
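The pooled-variance calculation above can be reproduced from the tabulated summary statistics with a short sketch. The function name `pooled_t` is an illustrative choice; each sum of squares is recovered as $(n_i - 1)s_i^2$, and the critical value is the tabulated one quoted in the example.

```python
import math

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Return (t, df, sp2, se) for a two-sample t test with pooled variance."""
    df = (n1 - 1) + (n2 - 1)
    # Pooled variance: (SS1 + SS2) / (nu1 + nu2), with SS_i = (n_i - 1) * var_i.
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(sp2 / n1 + sp2 / n2)   # standard error of the mean difference
    t = (mean1 - mean2) / se
    return t, df, sp2, se

# Summary statistics from the blood-clotting example.
t, df, sp2, se = pooled_t(6, 8.75, 0.339, 7, 9.74, 0.669)

T_CRIT = 2.201  # t_0.05(2),11 as quoted in the example
print(f"sp2 = {sp2:.3f}, se = {se:.3f}, t = {t:.3f}, df = {df}, "
      f"reject Ho: {abs(t) > T_CRIT}")
```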

Estimation of required sample size for a two-sample t test
• $n \geq \dfrac{2 s_p^2}{\delta^2}\,(t_{\alpha,\nu} + t_{\beta(1),\nu})^2$
• We want to test for a significant difference between the mean blood-clotting times of persons using two different drugs. We wish to test at $\alpha = 0.05$, with a 90% chance of detecting a true difference between population means as small as 0.5 min (i.e. desired power = 0.9).
• The within-population variability is $s_p^2 = [(n_a - 1)s_a^2 + (n_b - 1)s_b^2]/\nu$
• Based on the blood-clotting example, $s_p^2 = 0.52$ min²
• Let us guess that $n = 100$, so $\nu = 2(n - 1) = 198$, $t_{0.05(2),198} = 1.972$, $\beta = 1 - 0.90 = 0.10$, $t_{0.10(1),198} = 1.286$
• $n \geq \dfrac{2(0.52)}{(0.5)^2}\,(1.972 + 1.286)^2 = 44.2$
• Let us now use $n = 45$, so $\nu = 88$, $t_{0.05(2),88} = 1.987$, $t_{0.10(1),88} = 1.291$
• $n \geq \dfrac{2(0.52)}{(0.5)^2}\,(1.987 + 1.291)^2 = 44.7$
• Therefore we conclude that each of the two samples should contain at least 45 data.
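The two iterations of this estimate can be sketched as follows. The function `required_n` is a hypothetical helper that evaluates one step of the formula; the t values are passed in by hand because they come from a t table at the degrees of freedom implied by the current guess of n, exactly as in the worked example.

```python
import math

def required_n(sp2, delta, t_alpha, t_beta):
    """One step of the sample-size formula n >= (2*sp2/delta^2) * (t_alpha + t_beta)^2."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    return (2.0 * sp2 / delta ** 2) * (t_alpha + t_beta) ** 2

SP2 = 0.52    # pooled variance (min^2) from the blood-clotting example
DELTA = 0.5   # smallest difference to detect (min)

# First guess n = 100, so nu = 198: table values quoted in the example.
first = required_n(SP2, DELTA, t_alpha=1.972, t_beta=1.286)
# Second pass with n = 45, so nu = 88.
second = required_n(SP2, DELTA, t_alpha=1.987, t_beta=1.291)

print(f"first pass: {first:.1f}, second pass: {second:.1f}, "
      f"use n = {math.ceil(second)} per sample")
```

The iteration stops once rounding the result up reproduces the n that was used to look up the t values.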

Estimation of required sample size for a two-sample t test: unequal sample sizes
• If $n_1$ were constrained to be 30, the following equation can be used to estimate $n_2$:
• $n_2 = \dfrac{n\,n_1}{2n_1 - n}$, where n is the estimated minimal sample size per group for equal sample sizes (here $n = 44.7$)
• $n_2 = (44.7)(30)/[2(30) - 44.7] = 87.6$, so $n_2 = 88$
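A small sketch of the unequal-sample-size adjustment follows. The function name `second_sample_size` is illustrative. The guard reflects the fact that the formula only gives a positive, finite answer when $2n_1 > n$; a smaller fixed $n_1$ cannot be compensated for by any $n_2$.

```python
import math

def second_sample_size(n, n1):
    """Return n2 = n*n1 / (2*n1 - n) for a fixed n1 and equal-size estimate n."""
    if 2 * n1 <= n:
        raise ValueError("n1 is too small: 2*n1 must exceed n")
    return n * n1 / (2 * n1 - n)

n2 = second_sample_size(n=44.7, n1=30)
print(f"n2 = {n2:.1f}, rounded up to {math.ceil(n2)}")
```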

Estimation of minimum detectable difference
• Starting from $n \geq \dfrac{2 s_p^2}{\delta^2}\,(t_{\alpha,\nu} + t_{\beta(1),\nu})^2$
• The above equation can be rearranged to ask how small a population difference ($\delta$) is detectable with a given sample size:
• $\delta \geq \sqrt{\dfrac{2 s_p^2}{n}}\,(t_{\alpha,\nu} + t_{\beta(1),\nu})$
• Using the previous example, given that $n = 20$, $\nu = 2(20 - 1) = 38$:
• $\delta = \sqrt{2(0.5193)/20}\,(2.024 + 1.304) = 0.76$ min
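The rearranged formula can be evaluated directly, as in this minimal sketch. The function name is hypothetical, and the two t values are the table values quoted in the example for $\nu = 38$.

```python
import math

def min_detectable_difference(sp2, n, t_alpha, t_beta):
    """Return delta = sqrt(2*sp2/n) * (t_alpha + t_beta)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return math.sqrt(2.0 * sp2 / n) * (t_alpha + t_beta)

# n = 20 per sample, nu = 38; t_0.05(2),38 = 2.024 and t_0.10(1),38 = 1.304.
delta = min_detectable_difference(sp2=0.5193, n=20, t_alpha=2.024, t_beta=1.304)
print(f"minimum detectable difference = {delta:.2f} min")
```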

Power of the test
• Rearranging $n \geq \dfrac{2 s_p^2}{\delta^2}\,(t_{\alpha,\nu} + t_{\beta(1),\nu})^2$ results in:
• $t_{\beta(1),\nu} \leq \dfrac{\delta}{\sqrt{2 s_p^2/n}} - t_{\alpha,\nu}$
• Then we can consult Table B3 to determine $\beta(1)$, where $1 - \beta(1)$ is the power. But the table only provides a range for the power (e.g. 0.75 to 0.90).
• Using the previous example: given that $n_1 = n_2 = 15$ and $\alpha(2) = 0.05$, estimate the probability of detecting a true difference of 1 min between the two population means.
• $n = 15$, $\nu = 2(15 - 1) = 28$, and $t_{0.05(2),28} = 2.048$
• $t_{\beta(1),28} \leq \dfrac{1}{\sqrt{2(0.5193)/15}} - 2.048 = 1.752$
• Consulting Table B3 for one-tailed probabilities and $\nu = 28$: $0.025 < \beta < 0.05$, and therefore $0.95 <$ power $< 0.975$.
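Instead of bracketing $\beta$ from a table, the one-tailed probability can be approximated numerically. The sketch below swaps the table lookup for a Simpson's-rule integration of the t density using only the standard library; the helper names `t_pdf`, `t_cdf` and `approx_power` are illustrative and not from the slides. It evaluates the same approximation as the formula above, so it refines the bracket rather than giving an exact power calculation.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=2000):
    """P(T <= x), by Simpson's rule on [0, |x|] (steps must be even)."""
    b = abs(x)
    h = b / steps
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = h / 3 * total
    return 0.5 + area if x >= 0 else 0.5 - area

def approx_power(sp2, n, delta, t_alpha):
    """Return (t_beta, power) using t_beta = delta/sqrt(2*sp2/n) - t_alpha."""
    nu = 2 * (n - 1)
    t_beta = delta / math.sqrt(2.0 * sp2 / n) - t_alpha
    return t_beta, t_cdf(t_beta, nu)   # power = 1 - beta = P(T <= t_beta)

# n1 = n2 = 15, delta = 1 min, t_0.05(2),28 = 2.048 (table value from the example).
t_beta, power = approx_power(sp2=0.5193, n=15, delta=1.0, t_alpha=2.048)
print(f"t_beta = {t_beta:.3f}, approximate power = {power:.3f}")
```

The result should fall inside the tabulated range of 0.95 to 0.975 obtained above.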

Important Notes
• Where the measurements obtained in one sample can be shown to depend on measurements in the other sample, paired tests should be used.
• A paired t test can be used for dependent interval/ratio scale data when the sample differences are approximately normally distributed.
• The non-parametric Wilcoxon matched-pairs signed-ranks test can be used for paired interval/ratio data sets.
• With prior information on the within-population variability, we can estimate the minimum sample size and the minimum detectable difference, as well as the power of the test.
