Nonparametric Inference


Why Nonparametric Tests? • We have been primarily discussing parametric tests, i.e. tests whose validity rests on certain assumptions. For example, t-tests and ANOVA both assume a particular shape for the distribution (normality) and similar variability across groups (homogeneity of variance). • When these assumptions hold we can use standard sampling distributions (e.g. t-distribution, F-distribution) to find p-values.

Why Nonparametric Tests? • When these assumptions are violated it is necessary to turn to tests that do not have such stringent assumptions, i.e. nonparametric or "distribution-free" tests. • Specifically, there are three cases which necessitate the use of nonparametric tests: 1) The data for the response are not at least interval scale, i.e. measurements. For example, the response might be ordinal. 2) The distribution of the data for the response is not normal. Recall that an approximately normal distribution is assumed for parametric tests. 3) There are severely unequal variances between groups, i.e. there is an obvious violation of the homogeneity of variance assumption required for parametric tests. In the last two cases, we have interval level data, but it violates our parametric assumptions. Therefore, we no longer treat the data as interval, but as ordinal. In a sense, we demote it because it fails to meet specific assumptions.

Table of Parametric & Nonparametric Tests

Independent Samples • For two populations we use… Mann-Whitney/Wilcoxon Rank Sum Test • For three or more populations we use… Kruskal-Wallis Test (at the end)

Mann-Whitney/Wilcoxon Rank Sum Test • Alternative to two-sample t-Test • Use when… - populations being sampled are not normally distributed. - sample sizes are small so assessing normality is not possible (ni < 20). - response is ordinal.

Mann-Whitney/Wilcoxon Rank Sum Test General Hypotheses Ho: distributions of pop. A and pop. B are the same, i.e. A = B. HA: distributions of pop. A and pop. B are NOT the same, i.e. A ≠ B. HA: distribution of pop. A is shifted to the right of pop. B, i.e. A > B. HA: distribution of pop. A is shifted to the left of pop. B, i.e. A < B.

Mann-Whitney/Wilcoxon Rank Sum Test Ho: A = B vs. HA: A > B Q: Is there evidence that the values in population A are generally larger than those in population B?

Mann-Whitney/Wilcoxon Rank Sum Test (Test Procedure) • Rank all N = nA + nB observations in the combined sample from both populations in ascending order, assigning the average rank to tied observations. • Sum the ranks of the observations from populations A and B separately and denote the sums wA and wB. • For HA: A < B reject Ho if wA is “small” or wB is “big”. • For HA: A > B reject Ho if wA is “big” or wB is “small”. • Use tables to determine how “big” or “small” the rank sums must be in order to reject Ho, or use software to conduct the test.
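The ranking and summing steps of the procedure can be sketched in a few lines of Python. This is only an illustration: the function name `rank_sums` and the two small samples are hypothetical and are not taken from any example in these notes.

```python
def rank_sums(sample_a, sample_b):
    """Return (w_a, w_b), the rank sums of each group in the combined sample.

    Tied observations receive the average of the ranks they occupy.
    """
    # Tag each observation with its group, then sort the combined sample.
    combined = [(x, "A") for x in sample_a] + [(x, "B") for x in sample_b]
    combined.sort(key=lambda pair: pair[0])

    sums = {"A": 0.0, "B": 0.0}
    i = 0
    while i < len(combined):
        # Find the block of observations tied with the one at position i.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        # Positions i..j-1 (0-based) occupy ranks i+1..j; use their average.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            sums[combined[k][1]] += avg_rank
        i = j
    return sums["A"], sums["B"]


# Hypothetical data, for illustration only.
a = [1.2, 3.4, 3.4, 5.0]
b = [2.0, 3.4, 6.1, 7.3, 8.0]
w_a, w_b = rank_sums(a, b)
print(w_a, w_b)  # 15.0 30.0
```

A quick sanity check is that the two rank sums always add up to N(N + 1)/2, here 9(10)/2 = 45.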

Mann-Whitney/Wilcoxon Rank Sum Test (Critical Value Table) This table contains the value the smaller rank sum must be less than in order to reject Ho in a one-tailed test at two significance levels (α = .05 & .01). Tables exist for two-tailed tests as well. n is the sample size of the group with the smaller rank sum.

Example: Huntington’s Disease and Fasting Glucose Levels Davidson et al. studied the responses to oral glucose in patients with Huntington’s disease and in a group of control subjects. The five-hour responses are shown below. Is there evidence to suggest the five-hour glucose (mg present) is greater for patients with Huntington’s disease? Ho: Control = Huntington’s i.e. C = H HA: Control < Huntington’s i.e. C < H

Example: Observations & Ranks Ranks in the combined sample: 10.5 9 15 3 13 1.5 17 1.5 16 5.5 5.5 19 7 21 8 20 18 10.5 4 13 13 Rank sums: wA = 78 (control), wB = 153 (Huntington’s)

Example: Critical Value Table Here, nC = 10 (control) and nH = 11 (Huntington’s). We will reject Ho: C = H in favor of HA: C < H if the rank sum for the control group is less than 86 at the α = .05 level and less than 77 at the α = .01 level.

Example: Decision/Conclusion Using the Wilcoxon Rank Sum Test we have evidence to suggest that the five-hour glucose level for individuals with Huntington’s disease is greater than that for healthy controls (p < .05). Note: p < .05 because the observed rank sum for the control group (78) is less than 86, which is the critical value for α = .05.

Rank Sum Test in JMP The p-values reported are based upon large-sample approximations, which generally should not be used when sample sizes are small. Here the conclusion reached is the same, but in general we should use tables if they are available.

Dependent Samples • Sign Test • Wilcoxon Signed-Rank Test

Sign Test • The sign test can be used in place of the paired t-test when we have evidence that the paired differences are NOT normally distributed. • It can be used when the response is ordinal. • Best used when the response is difficult to quantify and only improvement can be measured, i.e. subject got better, got worse, or no change. • Magnitude of the paired difference is lost when using this test.

Sign Test • The sign test looks at the number of (+) and (-) differences amongst the nonzero paired differences. • A preponderance of +’s or –’s can indicate that some type of change has occurred. • If the null hypothesis of no change is true we expect +’s and –’s to be equally likely to occur, i.e. P(+) = P(-) = .50 and the number of each observed follows a binomial distribution.

Example: Sign Test • A study evaluated hepatic arterial infusion of floxuridine and cisplatin for the treatment of liver metastases of colorectal cancer. • Performance scores for 29 patients were recorded before and after infusion. Is there evidence that patients had a better performance score after infusion?

Example: Sign Test

Example: Sign Test • Ho: No change in performance score following infusion, or more specifically median change in performance score is 0. • HA: Performance scores improve following infusion, or more specifically median change in performance score > 0. Intuitively we will reject Ho if there is a “large” number of +’s.

Example: Sign Test Signs of the nonzero paired differences: - + + - + - + + + + + - + - - + + • 17 nonzero differences: 11 +’s and 6 –’s.

Example: Sign Test • If Ho is true, X = the number of +’s has a binomial distribution with n = 17 and p = P(+) = .50. • Therefore the p-value is simply P(X ≥ 11 | n = 17, p = .50) = .166 > α. • We fail to reject Ho; there is insufficient evidence to conclude the performance score improves following infusion (p = .166).
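The binomial tail probability used for the sign test can be checked directly. The sketch below uses only the Python standard library; the helper name `sign_test_upper_p` is an illustrative choice, not part of any package.

```python
from math import comb


def sign_test_upper_p(n_plus, n_nonzero):
    """One-sided sign test p-value: P(X >= n_plus) for X ~ Binomial(n_nonzero, 0.5)."""
    # Count the sign patterns with at least n_plus pluses; every pattern
    # has probability (1/2) ** n_nonzero under Ho.
    tail = sum(comb(n_nonzero, k) for k in range(n_plus, n_nonzero + 1))
    return tail / 2 ** n_nonzero


# 11 plus signs among 17 nonzero differences, as in the example.
p_value = sign_test_upper_p(11, 17)
print(round(p_value, 3))  # 0.166
```

The tail includes X = 11 itself, which is why the result reproduces the .166 quoted in the example.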

Wilcoxon Signed-Rank Test • The problem with the sign test is that the magnitude or size of the paired differences is lost. • The Wilcoxon Signed-Rank Test uses ranks of the paired differences to retain some sense of their size. • Use when the distribution of the paired differences is NOT normal or when the sample size is small. • Can be used with an ordinal response.

Wilcoxon Signed Rank Test(Test Procedure) • Exclude any differences which are zero. • Put the rest of differences in ascending order ignoring their signs. • Assign them ranks. • If any differences are equal, average their ranks.

Example: Wilcoxon Signed Rank Test Resting Energy Expenditure (REE) for Patients with Cystic Fibrosis • A researcher believes that patients with cystic fibrosis (CF) expend more energy at rest than those without CF. To obtain a fair comparison she matches 13 patients with CF to 13 patients without CF on the basis of age, sex, height, and weight.

Example: Wilcoxon Signed Rank Test Signed ranks of the 13 paired differences: 6 3 -2 1 13 -5 9 11 4 12 7 8 10

Example: Wilcoxon Signed Rank Test We then calculate the sum of the positive ranks (T+) and the sum of the negative ranks (T-). Here we have T+ = 6 + 3 + 1 + 13 + 9 + 11 + 4 + 12 + 7 + 8 + 10 = 84 and T- = 2 + 5 = 7.

Wilcoxon Signed Rank Test (Test Statistic) • Intuitively we will reject Ho, which states that there is no difference between the populations, if one of these rank sums is “large” and the other is “small”. • The Wilcoxon Signed Rank Test uses the smaller rank sum, T = min( T+ , T- ), as the test statistic.
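The signed-rank procedure (drop zeros, rank the absolute differences, average tied ranks, then sum the ranks by sign) can be sketched as follows. The function name and the paired differences are hypothetical and chosen only to show a zero and a tie being handled.

```python
def signed_rank_sums(differences):
    """Return (T+, T-, T) for a list of paired differences."""
    # Step 1: exclude zero differences.
    nonzero = [d for d in differences if d != 0]
    # Step 2: order the remaining differences by absolute value.
    ordered = sorted(nonzero, key=abs)

    t_plus = 0.0
    t_minus = 0.0
    i = 0
    while i < len(ordered):
        # Steps 3-4: a block of equal absolute values shares the average rank.
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2
        for d in ordered[i:j]:
            if d > 0:
                t_plus += avg_rank
            else:
                t_minus += avg_rank
        i = j
    return t_plus, t_minus, min(t_plus, t_minus)


# Hypothetical paired differences, for illustration only.
diffs = [5, -2, 0, 3, -3, 8, 1]
t_plus, t_minus, t = signed_rank_sums(diffs)
print(t_plus, t_minus, t)  # 15.5 5.5 5.5
```

With n nonzero differences, T+ and T- always add up to n(n + 1)/2, here 6(7)/2 = 21, which is a handy check on hand calculations.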

Example: Wilcoxon Signed Rank Test For the cystic fibrosis example we have the following hypotheses: Ho: there is no difference in the resting energy expenditure of individuals with CF and healthy controls who are the same gender, age, height, and weight (MEDIAN PAIRED DIFFERENCE = 0). HA: the resting energy expenditure of individuals with CF is greater than that of healthy individuals who are the same gender, age, height, and weight (MEDIAN PAIRED DIFFERENCE > 0).

Example: Wilcoxon Signed Rank Test HA: the resting energy expenditure of individuals with CF is greater than that of healthy individuals who are the same gender, age, height, and weight. • The alternative is clearly supported if T+ is “large” or T- is “small”. • The test statistic T = min( T+ , T- ) = 7 • Is T = 7 considered small, i.e. what is the corresponding p-value? • To answer this question we need a Wilcoxon Signed Rank Test table or statistical software.

Example: Wilcoxon Signed Rank Test This table gives the value that our observed T = min( T+ , T- ) must be less than in order to reject Ho for both two- and one-tailed tests. Here we have n = 13 & T = 7. We can see that our test statistic is less than 21 (α = .05) and 12 (α = .01), so we reject Ho and estimate that our p-value < .01.
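When no table is at hand, the exact one-sided p-value can be obtained by brute force. Under Ho each rank is equally likely to carry a + or a − sign, so all 2^n sign patterns are equally likely. The sketch below assumes ranks 1, ..., n with no ties (as in the CF example); the function name is illustrative.

```python
from itertools import product


def signed_rank_exact_p(t_observed, n):
    """P(T- <= t_observed) under Ho, for ranks 1..n with no ties."""
    count = 0
    # Each pattern marks which ranks are attached to negative differences.
    for is_negative in product((0, 1), repeat=n):
        t_minus = sum(rank for rank, neg in zip(range(1, n + 1), is_negative) if neg)
        if t_minus <= t_observed:
            count += 1
    return count / 2 ** n


# CF example: n = 13 nonzero differences and T- = 7.
p = signed_rank_exact_p(7, 13)
print(round(p, 4))  # 0.0023
```

Only 19 of the 8192 sign patterns give a negative rank sum of 7 or less, which is consistent with the table-based conclusion that p < .01.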

Example: Wilcoxon Signed Rank Test • We conclude that individuals with cystic fibrosis (CF) have a larger resting energy expenditure than healthy individuals who are the same gender, age, height, and weight (p < .01).

Analysis in JMP Select Test Mean from the Difference pull-down menu, enter 0 for the null value, and check the Wilcoxon option. The test statistic is reported as (T+ - T-)/2 = (84 – 7)/2 = 38.50, but we only need the p-value = .0023.

Analysis in SPSS Click on CF first and then Healthy to specify that the paired difference will be defined as CF – Healthy & specify which tests to conduct. Note: the Difference column is not actually used in the SPSS analysis.

Independent Samples • If we have three or more populations to compare we use… Kruskal – Wallis Test

Kruskal-Wallis Test • One-way ANOVA for a completely randomized design is based on the assumption of normality and equality of variance. • The nonparametric alternative not relying on these assumptions is called the Kruskal-Wallis Test. • Like the Mann-Whitney/Wilcoxon Rank Sum Test we use the sum of the ranks assigned to each group when considering the combined sample as the basis for our test statistic.

Kruskal-Wallis Test Basic Idea: 1) Looking at all observations together, rank them. 2) Let R1, R2, …,Rk be the sum of the ranks of each group 3) If some Ri’s are much larger than others, it indicates the response values in different groups come from different populations.

Kruskal-Wallis Test • The test statistic is $H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$, where N = total sample size = n1 + n2 + ... + nk. • Under the null hypothesis, H has an approximate chi-square distribution with df = k - 1, i.e. $H \sim \chi^2_{k-1}$. • The approximation is OK when each group contains at least 5 observations.

Chi-squared Distribution and p-value [Figure: chi-square density curve; Area = p-value]

Example: Kruskal-Wallis Test A clinical trial evaluating the fever-reducing effects of aspirin, ibuprofen, and acetaminophen was conducted. Study subjects were adults seen in an ER with diagnoses of flu and body temperatures between 100°F and 100.9°F. Subjects were randomly assigned to treatment. Changes in body temperature were recorded 2 hrs. after administration of treatments.

Example: Kruskal-Wallis Test Resulting Data: Temperature Decrease (deg. F) 5 4 8 6 9 14 11 12 3 15 10 2 13 7 1 N = 15 R1 = 44 R2 = 50 R3 = 26 n1 = 4 n2 = 5 n3 = 6

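Example: Kruskal-Wallis Test N = 15 R1 = 44 R2 = 50 R3 = 26 n1 = 4 n2 = 5 n3 = 6 $H = \frac{12}{15(16)}\left(\frac{44^2}{4} + \frac{50^2}{5} + \frac{26^2}{6}\right) - 3(16) = 6.833$ with df = 3 - 1 = 2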
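The calculation for this example can be reproduced with a short sketch. It assumes the usual Kruskal-Wallis statistic without a tie correction, H = 12/(N(N+1)) · Σ Ri²/ni − 3(N+1), and it uses the fact that the chi-square upper-tail area with df = 2 has the closed form exp(−H/2); for other df a chi-square table or statistical software is needed. The function name is illustrative.

```python
from math import exp


def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from group rank sums and group sizes (no tie correction)."""
    n_total = sum(sizes)
    weighted = sum(r ** 2 / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Rank sums and group sizes from the fever-reduction example.
h = kruskal_wallis_h([44, 50, 26], [4, 5, 6])

# With k = 3 groups, df = 2 and the chi-square upper tail is exp(-h / 2).
p_value = exp(-h / 2)
print(round(h, 3), round(p_value, 3))  # 6.833 0.033
```

Both values agree with the H = 6.833 and p = .033 shown in the software output for this example.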

Chi-squared Distribution and p-value [Figure: chi-square density curve; Area = .033]

Kruskal-Wallis in JMP (Demo) Analyze > Fit Y by X RESULTS R1 = 44 n1 = 4 R2 = 50 n2 = 5 R3 = 26 n3 = 6 H = 6.833 df = 2 p = .033

Decision/Conclusion • Using the Kruskal-Wallis test we have evidence to suggest that the temperature changes after taking the different drugs are not the same (p = .033). • Now we might like to know which drugs significantly differ from one another.

Multiple Comparisons for Kruskal – Wallis Test • If we decide at least two populations differ in terms of what is typical of their values, we can use multiple comparisons to determine which populations differ. • To do this we calculate an approximate p-value for each pair-wise comparison and then compare that p-value to a Bonferroni corrected significance level (α).

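Multiple Comparisons for Kruskal – Wallis Test To determine if group i significantly differs from group j we compute $z_{ij} = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$, where $\bar{R}_i = R_i / n_i$ is the average rank in group i, and then compute p-value = $P(Z > |z_{ij}|)$ and compare it to α/(2m), where m is the number of possible pair-wise comparisons, $m = \binom{k}{2} = \frac{k(k-1)}{2}$.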

Multiple Comparisons for Kruskal – Wallis Test • Comparing Aspirin to Acetaminophen N = 15 Aspirin: R1 = 44, n1 = 4 Acetaminophen: R3 = 26, n3 = 6 Computing the Bonferroni corrected significance level we have .05/(2·3) = .00833
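A sketch of this pair-wise comparison is given below. The formula for the comparison statistic is an assumption here: the code uses the common rank-based form z = (R̄i − R̄j) / sqrt( N(N+1)/12 · (1/ni + 1/nj) ), with R̄i = Ri/ni, and a one-sided standard normal p-value P(Z > |z|). The function names are illustrative.

```python
from math import erfc, sqrt


def pairwise_z(r_i, n_i, r_j, n_j, n_total):
    """Assumed statistic: difference in average ranks over its standard error."""
    se = sqrt(n_total * (n_total + 1) / 12 * (1 / n_i + 1 / n_j))
    return (r_i / n_i - r_j / n_j) / se


def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))


k = 3                          # number of treatment groups
m = k * (k - 1) // 2           # number of pair-wise comparisons
alpha = 0.05
cutoff = alpha / (2 * m)       # Bonferroni corrected level, .05/(2*3)

# Aspirin (R1 = 44, n1 = 4) versus acetaminophen (R3 = 26, n3 = 6), N = 15.
z = pairwise_z(44, 4, 26, 6, 15)
p = upper_tail(abs(z))
significant = p < cutoff
print(round(z, 2), round(cutoff, 5), significant)
```

Under that assumed formula the sketch gives z of about 2.31 and a one-sided p-value of roughly .010, which would not fall below the corrected level of .00833.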
