
# Nonparametric Inference - PowerPoint PPT Presentation



## PowerPoint Slideshow about ' Nonparametric Inference' - sylvie


### Nonparametric Inference

• We have been primarily discussing parametric tests, i.e., tests that are valid only when certain assumptions hold. For example, t-tests and ANOVA both make assumptions about the shape of the distribution (normality) and about the groups having similar spread (homogeneity of variance).

• When these assumptions hold we can use standard sampling distributions (e.g. t-distribution, F-distribution) to find p-values.

• When these assumptions are violated it is necessary to turn to tests that do not have such stringent assumptions: nonparametric or "distribution-free" tests.

• Specifically, there are three cases which necessitate the use of non-parametric tests:

1) The data for the response is not at least interval scale (i.e. not numerical measurements). For example, the response might be ordinal.

2) The distribution of the data for the response is not normal. Recall that a relatively normal distribution is assumed for parametric tests.

3) There are severely unequal variances between groups, i.e. there is an obvious violation of the homogeneity of variance assumption required for parametric tests.

In the last two cases, we have interval level data, but it violates our parametric assumptions. Therefore, we no longer treat this data as interval, but as ordinal. In a sense, we demote it because it fails to meet specific assumptions.

• For two populations we use…

Mann-Whitney/Wilcoxon Rank Sum Test

• For three or more populations we use…

Kruskal-Wallis Test (at the end)

• Alternative to two-sample t-Test

• Use when…

- populations being sampled are not normally distributed.

- sample sizes are small so assessing normality is not possible (n_i < 20).

- response is ordinal

General Hypotheses

Ho: distributions of pop. A and pop. B are the same, i.e. A = B

HA: distributions of pop. A and pop. B are NOT the same, i.e. A ≠ B

HA: distribution of pop. A is shifted to the right of pop. B, i.e. A > B.

HA: distribution of pop. A is shifted to the left of pop. B, i.e. A < B

Ho: A = B vs. HA: A > B

Q: Is there evidence that the values in population A are generally larger than those in population B?

Mann-Whitney/Wilcoxon Rank Sum Test (Test Procedure)

• Rank all N = nA + nB observations in the combined sample from both populations in ascending order. Assign the average rank to tied observations.

• Sum the ranks of the observations from populations A and B separately and denote the sums wA and wB.

• For HA: A < B reject Ho if wA is “small” or wB is “big”. For HA: A > B reject Ho if wA is “big” or wB is “small”.

• Use tables to determine how “big” or “small” the rank sums must be in order to reject Ho or use software to conduct the test.
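The ranking and summing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `average_ranks` and `rank_sums` and the two small samples are made up for the sketch and do not come from the slides.

```python
def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sums(sample_a, sample_b):
    """Rank the combined sample and return the rank sums (w_a, w_b)."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w_a = sum(ranks[:len(sample_a)])
    w_b = sum(ranks[len(sample_a):])
    return w_a, w_b


# Hypothetical data (not from the slides); the three 4.0 values are tied.
group_a = [3.1, 4.0, 4.0, 5.2]
group_b = [4.0, 6.3, 7.1]
w_a, w_b = rank_sums(group_a, group_b)
print(w_a, w_b)
```

A quick sanity check is that the two rank sums always add up to N(N + 1)/2, here 7(8)/2 = 28.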

Mann-Whitney/Wilcoxon Rank Sum Test (Critical Value Table)

This table contains the value the smaller rank sum must be less than in order to reject Ho in a one-tailed test at two significance levels (α = .05 and .01).

Tables exist for the two-tailed tests as well.

n is the sample size of the group with the smaller rank sum.

Example: Huntington’s Disease and Fasting Glucose Levels

Davidson et al. studied the responses to oral glucose in patients with Huntington’s disease and in a group of control subjects. The five-hour responses are shown below. Is there evidence to suggest the five-hour glucose (mg present) is greater for patients with Huntington’s disease?

Ho: Control = Huntington’s i.e. C = H

HA: Control < Huntington’s i.e. C < H

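Example: Observations & Ranks

[Table: glucose observations and their ranks in the combined sample. Only the ranks survive in this transcript: 10.5, 9, 15, 3, 13, 1.5, 17, 1.5, 16, 5.5, 5.5, 19, 7, 21, 8, 20, 18, 10.5, 4, 13, 13]

wA = 78 (control)

wB = 153 (Huntington’s)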

Example: Critical Value Table

Here nC = 10 (control) and nH = 11 (Huntington’s). We will reject Ho: C = H in favor of HA: C < H if the rank sum for the control group is less than 86 at the α = .05 level and less than 77 at the α = .01 level.

Using the Wilcoxon Rank Sum Test we have evidence to suggest that the five-hour glucose level for individuals with Huntington’s disease is greater than that for healthy controls (p < .05).

Note: p < .05 because the observed rank sum for the control group (78) is less than 86, which is the critical value for α = .05. It is not less than 77, the critical value for α = .01.

The p-values reported are based upon large-sample approximations, which generally should not be used when sample sizes are small. Here the conclusion reached is the same, but in general we should use tables if they are available.
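To make the large-sample approximation concrete, here is a rough sketch of the usual normal approximation to the rank sum, applied to the numbers in this example (nC = 10, nH = 11, control rank sum 78). It is an assumption of this sketch that no tie correction and no continuity correction are applied, so the value need not match any particular software output.

```python
from math import sqrt
from statistics import NormalDist

n_c, n_h = 10, 11        # sample sizes from the example
w_c = 78                 # observed rank sum for the control group
n_total = n_c + n_h

# Under Ho the control rank sum has this mean and standard deviation
# (formula without a tie correction, which is an assumption of this sketch).
mean_w = n_c * (n_total + 1) / 2
sd_w = sqrt(n_c * n_h * (n_total + 1) / 12)

z = (w_c - mean_w) / sd_w
p_lower = NormalDist().cdf(z)   # one-tailed p-value for HA: C < H

print(round(z, 3), round(p_lower, 4))
```

The approximate p-value falls between .01 and .05, which agrees with the conclusion reached from the critical value table.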

• Sign Test

• Wilcoxon Signed-Rank Test

• The sign test can be used in place of the paired t-test when we have evidence that the paired differences are NOT normally distributed.

• It can be used when the response is ordinal.

• Best used when the response is difficult to quantify and only improvement can be measured, i.e. subject got better, got worse, or no change.

• Magnitude of the paired difference is lost when using this test.

• The sign test looks at the number of (+) and (-) differences amongst the nonzero paired differences.

• A preponderance of +’s or –’s can indicate that some type of change has occurred.

• If the null hypothesis of no change is true we expect +’s and –’s to be equally likely to occur, i.e. P(+) = P(-) = .50 and the number of each observed follows a binomial distribution.

• A study evaluated hepatic arterial infusion of floxuridine and cisplatin for the treatment of liver metastases of colorectal cancer.

• Performance scores for 29 patients were recorded before and after infusion. Is there evidence that patients had a better performance score after infusion?

• Ho: No change in performance score following infusion, or more specifically median change in performance score is 0.

• HA: Performance scores improve following infusion, or more specifically median change in performance score > 0.

Intuitively we will reject Ho if there is a “large” number of +’s.

Example: Sign Test

Signs of the 17 nonzero paired differences: - + + - + - + + + + + - + - - + + (11 +’s and 6 -’s)

• If Ho is true, X = the number of +’s has a binomial dist. with n = 17 and p = P(+) = .50.

• Therefore the p-value is simply P(X ≥ 11 | n = 17, p = .50) = .166 > α.

• We fail to reject Ho. There is insufficient evidence to conclude the performance score improves following infusion (p = .166).
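The binomial tail probability in this example can be checked directly. The sketch below uses only the Python standard library; the helper name `sign_test_upper_p` is made up for illustration, and the sign string is copied from the example above.

```python
from math import comb


def sign_test_upper_p(n_plus, n_nonzero):
    """P(X >= n_plus) when X ~ Binomial(n_nonzero, 0.5)."""
    favorable = sum(comb(n_nonzero, k) for k in range(n_plus, n_nonzero + 1))
    return favorable / 2 ** n_nonzero


# Signs of the 17 nonzero paired differences from the example.
signs = "-++-+-+++++-+--++"
n_plus = signs.count("+")
n_nonzero = len(signs)

p_value = sign_test_upper_p(n_plus, n_nonzero)
print(n_plus, n_nonzero, round(p_value, 3))  # 11 17 0.166
```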

• The problem with the sign test is that the magnitude or size of the paired differences is lost.

• The Wilcoxon Signed-Rank Test uses ranks of the paired differences to retain some sense of their size.

• Use when the distribution of the paired differences is NOT normal or when the sample size is small.

• Can be used with an ordinal response.

Wilcoxon Signed Rank Test (Test Procedure)

• Exclude any differences which are zero.

• Put the rest of differences in ascending order ignoring their signs.

• Assign them ranks.

• If any differences are equal, average their ranks.
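The four steps above can be sketched as follows. The function name `signed_rank_sums` and the list of differences are hypothetical and chosen only to show a zero difference being dropped and a tie being averaged.

```python
def signed_rank_sums(differences):
    """Return (T+, T-): rank sums of the positive and negative differences."""
    nonzero = [d for d in differences if d != 0]   # step 1: drop zeros
    ordered = sorted(nonzero, key=abs)             # step 2: order by |difference|
    t_plus = t_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        # Steps 3-4: find the block of tied absolute values and average its ranks.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j + 1) / 2
        for d in ordered[i:j + 1]:
            if d > 0:
                t_plus += avg_rank
            else:
                t_minus += avg_rank
        i = j + 1
    return t_plus, t_minus


# Hypothetical paired differences (not from the slides).
diffs = [5, -2, 0, 3, -3, 8]
t_plus, t_minus = signed_rank_sums(diffs)
print(t_plus, t_minus)
```

With n nonzero differences the two sums always add up to n(n + 1)/2, which is a handy check on hand calculations.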

Resting Energy Expenditure (REE) for Patients with Cystic Fibrosis

• A researcher believes that patients with cystic fibrosis (CF) expend greater energy during resting than those without CF. To obtain a fair comparison she matches 13 patients with CF to 13 patients without CF on the basis of age, sex, height, and weight.

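[Table: paired REE differences (CF – Healthy) and their signed ranks. Only the signed ranks survive in this transcript: 6, 3, -2, 1, 13, -5, 9, 11, 4, 12, 7, 8, 10]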

We then calculate the sum of the positive ranks ( T+ ) and the sum of the negative ranks (T- ).

Here we have

T+ = 6 + 3 + 1 + 13 + 9 + 11 + 4 + 12 + 7 + 8 + 10 = 84 and

T- = 2 + 5 = 7

Wilcoxon Signed Rank Test (Test Statistic)

• Intuitively we will reject Ho, which states that there is no difference between the populations, if either one of these rank sums is “large” and the other is “small”.

• The Wilcoxon Signed Rank Test uses the smaller rank sum, T = min( T+ ,T- ) , as the test statistic.

For the cystic fibrosis example we have the following hypotheses:

Ho: there is no difference in the resting energy expenditure of individuals with CF and healthy controls who are the same gender, age, height, and weight (median paired difference = 0).

HA: the resting energy expenditure of individuals with CF is greater than that of healthy individuals who are the same gender, age, height, and weight (median paired difference > 0).

• The alternative is clearly supported if T+ is “large” or T- is “small”.

• The test statistic T = min( T+ , T- ) = 7

• Is T = 7 considered small, i.e. what is the corresponding p-value?

• To answer this question we need a Wilcoxon Signed Rank Test table or statistical software.

This table gives the value of T = min( T+ , T- ) that our observed value must be less than in order to reject Ho for both two- and one-tailed tests.

Here we have n = 13 & T = 7. We can see that our test statistic is less than 21 (α = .05) and 12 (α = .01), so we will reject Ho and we also estimate that our p-value < .01.

• We conclude that individuals with cystic fibrosis (CF) have a greater resting energy expenditure than healthy individuals who are the same gender, age, height, and weight (p < .01).
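When no table is at hand, the exact one-tailed p-value for T = 7 with n = 13 can be found by counting. Under Ho each of the 2^13 sign assignments is equally likely, so we count how many give a negative rank sum of 7 or less. This is a sketch that assumes untied ranks 1, ..., n, which is the case in this example; the function name is hypothetical.

```python
def signed_rank_lower_p(t_obs, n):
    """Exact P(T- <= t_obs) under Ho for untied ranks 1..n."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of {1..n} whose ranks sum to s,
    # i.e. the number of sign assignments with negative rank sum s.
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Go downward so each rank is used at most once per subset.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts[:t_obs + 1]) / 2 ** n


p_exact = signed_rank_lower_p(7, 13)
print(round(p_exact, 4))  # 0.0023
```

Only 19 of the 8192 sign assignments give a negative rank sum of 7 or less, so the exact p-value is consistent with the table-based conclusion p < .01.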

Analysis in JMP

Select Test Mean from the Difference pull-down menu, enter 0 for the null value, and check the Wilcoxon option.

The test statistic is reported as (T+ - T-)/2 = (84 – 7)/2 = 38.50, but we only need the p-value = .0023.

Click on CF first and then Healthy to specify that the paired difference will be defined as CF – Healthy & specify which tests to conduct. Note: the Difference column is not actually used in the SPSS analysis.

• If we have three or more populations to compare we use…

Kruskal – Wallis Test

• One-way ANOVA for a completely randomized design is based on the assumption of normality and equality of variance.

• The nonparametric alternative not relying on these assumptions is called the Kruskal-Wallis Test.

• Like the Mann-Whitney/Wilcoxon Rank Sum Test we use the sum of the ranks assigned to each group when considering the combined sample as the basis for our test statistic.

Basic Idea:

1) Looking at all observations together, rank them.

2) Let R1, R2, …,Rk be the sum of the ranks of each group

3) If some Ri’s are much larger than others, it indicates the response values in different groups come from different populations.

• The test statistic is

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

where N = total sample size = n1 + n2 + ... + nk and ni is the number of observations in group i.

• Under the null hypothesis, H has an approximate chi-square distribution with df = k - 1, i.e. $H \sim \chi^2_{k-1}$.

• The approximation is OK when each group contains at least 5 observations.

[Figure: chi-square density curve; the shaded area beyond the observed H is the p-value]

A clinical trial evaluating the fever reducing effects of aspirin, ibuprofen, and acetaminophen was conducted. Study subjects were adults seen in an ER with diagnoses of flu with body temperatures between 100°F and 100.9°F. Subjects were randomly assigned to treatment. Changes in body temperature were recorded 2 hrs. after administration of treatments.

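Resulting Data: Temperature Decrease (deg. F)

[Table: temperature decreases by drug and their ranks in the combined sample. Only the ranks survive in this transcript: 5, 4, 8, 6, 9, 14, 11, 12, 3, 15, 10, 2, 13, 7, 1]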

N = 15, R1 = 44, R2 = 50, R3 = 26, n1 = 4, n2 = 5, n3 = 6

[Figure: chi-square density curve with the tail area beyond the observed test statistic shaded; Area = .033]

Kruskal-Wallis in JMP (Demo)

Analyze > Fit Y by X

RESULTS

R1 = 44 n1 = 4

R2 = 50 n2 = 5

R3 = 26 n3 = 6

H = 6.833 df = 2

p = .033
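The reported H and p-value can be reproduced from the rank sums alone. The sketch below assumes the usual form of the Kruskal-Wallis statistic without a tie correction, H = 12/(N(N+1)) Σ Ri²/ni − 3(N+1), and uses the fact that for df = 2 the chi-square upper tail area is exactly exp(−H/2); for other degrees of freedom a chi-square routine from a statistics library would be needed.

```python
from math import exp


def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from group rank sums and group sizes (no tie correction)."""
    n_total = sum(sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Rank sums and sample sizes from the fever example.
h = kruskal_wallis_h([44, 50, 26], [4, 5, 6])

# Upper tail of the chi-square distribution; this closed form holds only for df = 2.
p_value = exp(-h / 2)

print(round(h, 3), round(p_value, 3))  # 6.833 0.033
```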

• Using the Kruskal-Wallis test we have evidence to suggest that the temperature changes after taking the different drugs are not the same (p = .033).

• Now we might like to know which drugs significantly differ from one another.

Multiple Comparisons for Kruskal – Wallis Test

• If we decide at least two populations differ in terms of what is typical of their values we can use multiple comparisons to determine which populations differ.

• To do this we calculate an approximate p-value for each pair-wise comparison and then compare that p-value to a Bonferroni corrected significance level (α).

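To determine if group i significantly differs from group j we compute

$$z_{ij} = \frac{\left| \dfrac{R_i}{n_i} - \dfrac{R_j}{n_j} \right|}{\sqrt{\dfrac{N(N+1)}{12}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}$$

and then compute p-value = P(Z > z_ij) and compare it to α/(2m), where m is the number of possible pair-wise comparisons, m = k(k - 1)/2.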

• Comparing Aspirin to Acetaminophen

N = 15; Aspirin: R1 = 44, n1 = 4; Acetaminophen: R3 = 26, n3 = 6

Computing the Bonferroni corrected significance level we have .05/(2 × 3) = .00833

This comparison is not significant, and because it involves the largest difference in mean ranks, none of the others will be either. How can this be when the Kruskal-Wallis test itself was significant?

The problem is that the Bonferroni correction is too conservative and the approximate normality of the multiple comparison is valid only when sample sizes are “large”, and the sample sizes here are quite small.

Thus the comparison shown is fine for a demonstration of the procedure but the results cannot be trusted.
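For demonstration only, the pairwise comparisons for the fever data can be sketched as below. Two things are assumptions of this sketch rather than statements from the slides: the z statistic is the common large-sample form, the difference in mean ranks divided by sqrt(N(N+1)/12 · (1/ni + 1/nj)), and ibuprofen is taken to be group 2 (R2 = 50, n2 = 5) because aspirin and acetaminophen are identified as groups 1 and 3.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# Rank sums and sample sizes from the example; the ibuprofen label is inferred.
rank_sums = {"aspirin": 44, "ibuprofen": 50, "acetaminophen": 26}
sizes = {"aspirin": 4, "ibuprofen": 5, "acetaminophen": 6}

n_total = sum(sizes.values())
alpha = 0.05
m = len(sizes) * (len(sizes) - 1) // 2   # number of pair-wise comparisons
cutoff = alpha / (2 * m)                 # Bonferroni corrected level, .05/(2*3)

results = {}
for a, b in combinations(rank_sums, 2):
    diff = abs(rank_sums[a] / sizes[a] - rank_sums[b] / sizes[b])
    se = sqrt(n_total * (n_total + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = diff / se
    p = 1 - NormalDist().cdf(z)          # upper tail area beyond z
    results[(a, b)] = (z, p)
    print(f"{a} vs {b}: z = {z:.3f}, p = {p:.4f}, significant = {p < cutoff}")
```

Under these assumptions the aspirin versus acetaminophen comparison gives the largest z, and its p-value still exceeds .00833, matching the point made above that no pair reaches significance.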