# STT430/530: Nonparametric Statistics

## STT430/530: Nonparametric Statistics

Chapter 7: Basic Tests for Three or More Samples

Dr. Cuixian Chen

### Ch7 Basic tests for three or more samples

Example 7.1

Given k independent samples from normal distributions all with the same (but usually unknown) variance and means μ1, μ2, . . . , μk.

The basic overall significance test is that of

H0: μ1 = μ2 = . . . = μk

vs.

H1: not all μi are equal.

A link for illustration of ANOVA

### Ch7 Basic tests for three or more samples

Assumptions of ANOVA:

• Independence of cases – this is an assumption of the model that simplifies the statistical analysis.

• Normality – the distributions of the residuals are normal.

• Equality (or "homogeneity") of variances, called homoscedasticity

Test statistic of ANOVA:
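Written out (the slide shows it only as an image), the one-way ANOVA F statistic has the standard form, where x̄ᵢ is the i-th group mean, x̄ the grand mean, nᵢ the group sizes and N = n₁ + … + n_k:

```latex
F = \frac{\text{MSB}}{\text{MSW}}
  = \frac{\sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2/(k-1)}
         {\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2/(N-k)}
  \;\sim\; F_{k-1,\,N-k} \text{ under } H_0
```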

Drawback of ANOVA: the heavy assumptions!

How do we drop these assumptions and keep the same idea?

What have we done in the past?

### Ch7 BTTMS The Kruskal–Wallis test for three or more samples

For the asymptotic p-value:

H0: μ1 = μ2 = . . . = μk vs. H1: not all μi are equal.

### Ch7 BTTMS-- Kruskal–Wallis test

Similar to the WMW test, first find the sum of the overall ranks for each group:

H0: μ1 = μ2 = . . . = μk vs. H1: not all μi are equal.
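Written out (the formula appears only as an image in the original slides), the Kruskal–Wallis statistic without ties is, with sᵢ the rank sum of group i in the combined ranking, nᵢ the group sizes and N = n₁ + … + n_k:

```latex
T = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{s_i^2}{n_i} \;-\; 3(N+1)
```

This matches the detailed R calculation for Example 7.1 below: `sk*12/(N*(N+1))-3*(N+1)`.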

### Ch7 BTTMS The Kruskal–Wallis test for three or more samples

• Description in R:

• Density, distribution function, quantile function and random generation for the chi-squared (chi^2) distribution with df degrees of freedom and optional non-centrality parameter ncp.

• Usage

dchisq(x, df, ncp = 0, log = FALSE)

pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)

qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE)

rchisq(n, df, ncp = 0)

P-value = 1-pchisq(6.745, 2) ≈ 0.0343.

x <- c(139, 145, 171)

y <- c(151, 163, 188, 197)

z <- c(199, 250, 360)

boxplot(x,y,z, col=rainbow(3));

kruskal.test(list(x, y, z));

### Ch7 BTTMS The Kruskal–Wallis test for three or more samples

## Example 7.1 ##

## More details about calculating Kruskal-Wallis Test

x <- c(139, 145, 171)

y <- c(151, 163, 188, 197)

z <- c(199, 250, 360)

nx=length(x);

ny=length(y);

nz=length(z);

N=nx+ny+nz;

xr<-rank(c(x,y,z))[1:nx]

yr<-rank(c(x,y,z))[(nx+1):(nx+ny)]

zr<-rank(c(x,y,z))[(nx+ny+1):N]

sk<-(sum(xr))^2/nx+(sum(yr))^2/ny+(sum(zr))^2/nz

T<- sk*12/(N*(N+1))-3*(N+1)

1-pchisq(T,2)

### Ch7 BTTMS-- Kruskal–Wallis test

For N moderate or large, T approximately follows a chi-squared distribution with k-1 degrees of freedom.

The rationale for using T:

• If all the samples come from the same population, we expect a mixture of small, medium and high ranks in each sample;

• under the alternative hypothesis, high (or low) ranks may dominate in one or more samples, which leads to larger values of T.

Toy example 1: when the ranks of the three groups are G1: 1, 2; G2: 3, 4; G3: 5, 6, verify that T = 4.571.

Toy example 2: when the ranks of the three groups are G1: 1, 6; G2: 3, 4; G3: 2, 5, verify that T = 0.
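The two toy examples can be checked by plugging the ranks directly into the statistic; a minimal sketch (the helper `kw_from_ranks` is ours, not from the slides):

```r
# Compute the Kruskal-Wallis statistic directly from the group ranks
kw_from_ranks <- function(rank_groups) {
  N <- length(unlist(rank_groups))                                  # total sample size
  sk <- sum(sapply(rank_groups, function(r) sum(r)^2 / length(r)))  # sum of s_i^2 / n_i
  12 * sk / (N * (N + 1)) - 3 * (N + 1)
}
kw_from_ranks(list(c(1, 2), c(3, 4), c(5, 6)))  # well-separated ranks: T = 32/7 ≈ 4.571
kw_from_ranks(list(c(1, 6), c(3, 4), c(2, 5)))  # equal rank sums: T = 0
```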

### Ch7 BTTMS-- Kruskal–Wallis test


H0: μ1 = μ2 = . . . = μk vs. H1: not all μi are equal.

### Ch7 BTTMS in R

## More Example #1 in R

x <- c(2.9, 3.0, 2.5, 2.6, 3.2) # normal subjects

y <- c(3.8, 2.7, 4.0, 2.4) # with obstructive airway disease

z <- c(2.8, 3.4, 3.7, 2.2, 2.0) # with asbestosis

boxplot(x,y,z, col=rainbow(3));

kruskal.test(list(x, y, z));

#### detailed calculation of the p-value ####

xr<-rank(c(x,y,z))[1:5]

yr<-rank(c(x,y,z))[6:9]

zr<-rank(c(x,y,z))[10:14]

sk<-(sum(xr))^2/5+(sum(yr))^2/4+(sum(zr))^2/5

1-pchisq(sk*12/(14*15)-45, 2)  # T = 12*sk/(N*(N+1)) - 3*(N+1), with N = 14

## More Example #2 in R

xa<-c(39, 45, 71)

xb<-c(51, 63, 88, 97)

xc<-c(99, 150, 260)

boxplot(xa,xb,xc, col=rainbow(3));

kruskal.test(list(xa, xb, xc));

### Ch7 BTTMS The Kruskal–Wallis test with ties

Example 7.2

Use Mid ranks!

H0: μ1 = μ2 = . . . = μk vs. H1: not all μi are equal.

### Ch7 BTTMS The Kruskal–Wallis test with ties

x<-c(13, 27, 26, 22, 26)

y<-c(43, 35, 47, 32, 31, 37)

z<-c(33, 37, 33, 26, 44, 33, 54)

boxplot(x,y,z, col=rainbow(3));

kruskal.test(list(x, y, z));

## More details about calculations ##

x<-c(13, 27, 26, 22, 26)

y<-c(43, 35, 47, 32, 31, 37)

z<-c(33, 37, 33, 26, 44, 33, 54)

boxplot(x,y,z, col=rainbow(3));

kruskal.test(list(x, y, z));

## Details of calculations ##

nx=length(x);

ny=length(y);

nz=length(z);

N=nx+ny+nz;

xr<-rank(c(x,y,z))[1:nx]

yr<-rank(c(x,y,z))[(nx+1):(nx+ny)]

zr<-rank(c(x,y,z))[(nx+ny+1):N]

cc=N*(N+1)^2/4;

sk<-(sum(xr))^2/nx+(sum(yr))^2/ny+(sum(zr))^2/nz

sr=sum(c(xr,yr,zr)^2);

T=(N-1)*(sk-cc)/(sr-cc)

1-pchisq(T,2)

## More Example 7.2

N=18;

Sr=2104.5

Sk=1882.73;

T=9.146

### Ch7 BTTMS The Kruskal–Wallis test with ties

Use Mid ranks!

H0: μ1 = μ2 = . . . = μk vs. H1: not all μi are equal.

### Ch7 BTTMS The Jonckheere—Terpstra test

If there are differences, is there a monotone pattern among the group averages?

We may want to test hypotheses about means or medians, θi, of the form

H0: all θi are equal

vs.

H1: θ1 ≤ θ2 ≤ θ3 ≤ . . . ≤ θk or H1: θ1 ≥ θ2 ≥ θ3 ≥ . . . ≥ θk (with at least one strict inequality).

Example 7.3

### Ch7 BTTMS The Jonckheere—Terpstra test

Example 7.3

For testing H0: all θi are equal vs. H1: θ1 ≤ θ2≤ θ3 ≤ . . . ≤ θk

Calculate U and the Uij for each pair of the ith and jth samples with i < j, where Uij is the number of pairs (one value from sample i, one from sample j) in which the sample-j value exceeds the sample-i value, and U is the sum of all the Uij.

For example, U12 is the sum, over sample 1 values, of the number of sample 2 values that exceed each sample 1 value.
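A minimal sketch of the U counts in R, using the Example 7.1 data (the helper `u_count` is ours; it assumes no ties, where a count of 1/2 per tied pair would otherwise be used):

```r
# U_ij = number of (sample-i value, sample-j value) pairs, i < j,
# in which the sample-j value exceeds the sample-i value
u_count <- function(si, sj) sum(outer(si, sj, "<"))
x <- c(139, 145, 171)
y <- c(151, 163, 188, 197)
z <- c(199, 250, 360)
U <- u_count(x, y) + u_count(x, z) + u_count(y, z)
U  # a large U supports an increasing trend across the ordered samples
```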

### Ch7 BTTMS The Jonckheere—Terpstra test

Why U?

When there is no pattern, what would we expect U to be?

Example 7.3

### Ch7 BTTMS The Jonckheere—Terpstra test

Extra Example 1 for Jonckheere—Terpstra test

Use the Jonckheere—Terpstra test to assess the evidence for a tendency for house prices to increase across Villages A, B and C.

### Review: Ch6 --- Median Test, also called the Fisher Exact Test

For two samples:

Overall Median M=1.1 ml/min

In R: to find P*=Pr(X=1|X+Y=14)=choose(7,1)*choose(21,13)/choose(28,14).

It looks like a hypergeometric probability…
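Indeed, the same probability comes straight from R's hypergeometric density; a quick check (variable names here are ours):

```r
# P* = Pr(X = 1 | X + Y = 14): 7 values of one type, 21 of the other, 14 drawn
p_choose <- choose(7, 1) * choose(21, 13) / choose(28, 14)
p_hyper  <- dhyper(1, m = 7, n = 21, k = 14)  # hypergeometric density
all.equal(p_choose, p_hyper)  # TRUE
```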

### Ch7 BTTMS The median test for several samples

H0: μ1 = μ2 = . . . = μk

vs.

H1: not all μi are equal.

For several samples:

Let M=overall median for all observations.

Note: ai+bi=ni

Assume sample values equal to M have already been dropped.

Now we have a k×2 contingency table, with k rows and 2 columns.
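The chi-squared approximation used later in Example 7.5 can be written out; assuming aᵢ counts above and bᵢ counts below M in group i, with nᵢ = aᵢ + bᵢ and expected count nᵢ/2 in each cell:

```latex
T = \sum_{i=1}^{k} \frac{(a_i - n_i/2)^2}{n_i/2}
  + \sum_{i=1}^{k} \frac{(b_i - n_i/2)^2}{n_i/2}
  \;\sim\; \chi^2_{k-1} \text{ approximately under } H_0
```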

### Ch7 BTTMS The median test for several samples

Example 7.4

Overall Median M=19.5 minutes

H0: μ1 = μ2 = . . . = μk

vs.

H1: not all μi are equal.

## Example 7.4 ##

P*=choose(4,4)*choose(7,2)*choose(5,2)*choose(4,3)*choose(2,2)*choose(6,1)/choose(28,14) = 0.0001256338

### Ch7 BTTMS The median test for several samples

Unlike the median test for a 2×2 table, a k×2 table requires a computer program or a suitable approximation to obtain an exact p-value.

Be smart!!!!!

Draw a table with expected values, and observed values above/below M.

Or say,

### Ch7 BTTMS The median test for several samples

Or say,

Example 7.5: write out test statistic T.

## Example 7.5 ##

x=c(4, 2, 2, 3, 2, 1)  # counts on one side of the overall median M, for each of the 6 groups

y=c(0, 5, 3, 1, 0, 5)  # counts on the other side of M

E=(x+y)/2  # expected count per cell under H0: half of each group size

T=sum(c((x-E)^2/E, (y-E)^2/E))  # chi-squared statistic over all 12 cells

1-pchisq(T, 5)  # df = k-1 = 5 for k = 6 groups

### Ch7 BTTMS The median test for several samples

Extra Example 1 for Median test:

Median test

Be smart!!!!!

Draw a table with expected values, and observed values above/below M.

Or say,

### Ch7 BTTMS The median test for several samples

Extra Example 2 for Median test:

Be smart!!!!!

Draw a table with expected values, and observed values above/below M.

Or say,

### Ch7.3: nonparametric random block experiments analysis

The type of problem that we have considered so far is called one-way analysis of variance, in which a single factor is associated with the outcome.

Next, we are considering: The Randomized Block Design (RBD)

• Divide the group of experimental units into n homogeneous groups of size t.

• These homogeneous groups are called blocks.

• The treatments are then randomly assigned to the experimental units in each block - one treatment to a unit in each block.

(Diagram: survival outcomes for a design blocked by gender: Men and Women.)

### Ch7.3: nonparametric random block experiments analysis

Extra Example for random block design (RBD):

The following experiment is interested in comparing the effects of four different chemicals (A, B, C and D) in producing water resistance (y) in textiles.

• A strip of material, randomly selected from each bolt, is cut into four pieces (samples). The pieces are randomly assigned to receive one of the four chemical treatments.

• This process is replicated three times, producing a Randomized Block Design (RBD).

• Moisture resistance (y) was measured for each of the samples. (Low readings indicate low moisture penetration.)

The data is given in the diagram and table on the next slide.

### Ch7 BTTMS nonparametric random block experiments analysis

n=3 blocks, and

t=4 treatments.

Diagram: Blocks (Bolt Samples)

| Chemical | Block 1 | Block 2 | Block 3 |
| --- | --- | --- | --- |
| Treat A | 10.1 | 12.2 | 11.9 |
| Treat B | 11.4 | 12.9 | 12.7 |
| Treat C | 9.9 | 12.3 | 11.4 |
| Treat D | 12.1 | 13.4 | 12.9 |

### Ch7 BTTMS nonparametric random block experiments analysis

Comparison of randomized block design and one way ANOVA

### Ch7 BTTMS RBD, Friedman Test without ties

b=3 blocks, and

t=4 treatments.

• Suppose that in a study, each block contains t subjects, one per treatment.

• Let xij denote the measurement of the subject receiving the i-th treatment in the j-th block, where i = 1, …, t and j = 1, …, b.

• Let rij denote the rank of the i-th treatment within the j-th block.

H0: μ1 = μ2 = . . . = μt

vs.

H1: not all μi are equal.

Q: for what values of T should we reject H0?

### Ch7 BTTMS RBD, Friedman Test without ties

Q: why rank within each block?

H0: μ1 = μ2 = . . . = μt

vs.

H1: not all μi are equal.

chem.a<-c(10.1, 12.2, 11.9);

chem.b<-c(11.4,12.9,12.7);

chem.c<-c(9.9,12.3,11.4);

chem.d<-c(12.1,13.4,12.9);

chem<-cbind(chem.a,chem.b,chem.c,chem.d);

friedman.test(chem)
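The within-block ranking that friedman.test performs internally can be reproduced by hand for the chem data; a sketch, re-deriving T with the same formula used later for Example 7.6 (variable names below are ours):

```r
chem.a <- c(10.1, 12.2, 11.9)
chem.b <- c(11.4, 12.9, 12.7)
chem.c <- c(9.9, 12.3, 11.4)
chem.d <- c(12.1, 13.4, 12.9)
chem <- cbind(chem.a, chem.b, chem.c, chem.d)  # rows = blocks, columns = treatments
r <- t(apply(chem, 1, rank))  # rank the treatments within each block (row)
z <- colSums(r)               # rank sum for each treatment
b <- nrow(chem)               # number of blocks
nt <- ncol(chem)              # number of treatments (t in the slides)
T <- 12 * sum(z^2) / (b * nt * (nt + 1)) - 3 * b * (nt + 1)
1 - pchisq(T, nt - 1)  # matches the friedman.test p-value
```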

### Ch7 BTTMS RBD, Friedman Test without ties

Example 7.6

• Let rij denote the rank of the i-th treatment within the j-th block.

b=? blocks, and

t=? treatments.

### Ch7 BTTMS RBD, Friedman Test without ties

## R codes for Example 7.6

r=matrix(c(1,3,2,2,3,1,1,3,2,1,3,2,1,3,2,2,3,1,2,3,1),ncol=3,byrow=TRUE)

z=colSums(r)

b=7

t=3

T=12*sum(z^2)/(b*t*(t+1))-3*b*(t+1)

1-pchisq(T,(t-1))

### Ch 7 BTTMS RBD, Friedman Test with ties (use mid-rank)

Example 7.7

• Let rij denote the rank of the i-th treatment within the j-th block.

### Ch7 BTTMS RBD, Friedman Test with ties

b=? blocks, and

t=? treatments.

KW: 7.6, 7.13

JT: 7.6, 7.15

Median: 7.6

Friedman: 7.12, 7.7 (ties)

### Ch7 BTTMS RBD, Friedman Test with ties

b=? blocks, and

t=? treatments.

Control<-c(60, 62, 61, 60)

Gibberellic<-c(65, 65, 68, 65)

Kinetin<-c(63, 61, 61, 60)

Indole<-c(64, 67, 63, 61)

Maelic<-c(61, 62, 62, 65)

flower<-cbind(Control, Gibberellic, Kinetin, Indole, Maelic);  # rows = blocks, columns = treatments

friedman.test(flower)


### Review STT215: Chap3.1 Design Of Experiments(Outline of a randomized designs)

Completely randomized experimental designs: Individuals are randomly assigned to groups, then the groups are randomly assigned to treatments.

### Review STT215: Example 3.13, page 179

What are the effects of repeated exposure to an advertising message (digital camera)? The answer may depend on the length of the ad and on how often it is repeated. Outline the design of this experiment with the following information.

• Two Factors: length of the commercial (30 seconds and 90 seconds – 2 levels) and repeat times (1, 3, or 5 times – 3 levels)

• Response variables: their recall of the ad, their attitude toward the camera, and their intention to purchase it. (see page 187 for the diagram.)

HWQ: 3.18, 3.30(b),3.32

### Review STT215: 3.1 Design Of Experiments (Block designs)

In a block, or stratified, design, subjects are divided into groups, or blocks, prior to the experiment to test hypotheses about differences between the groups. The blocking, or stratification, here is by gender (blocking factor).

This example gives Randomized Block Design (RBD)

EX3.19

Ex: 3.17 (p182), 3.18

HWQ: 3.47(a,b), 3.126.

The most closely matched pair studies use identical twins.

### Review STT215: 3.1 Design Of Experiments (Matched pairs designs)

Matched pairs: Choose pairs of subjects that are closely matched—e.g., same sex, height, weight, age, and race. Within each pair, randomly assign who will receive which treatment.

It is also possible to just use a single person, and give the two treatments to this person over time in random order. In this case, the “matched pair” is just the same person at different points in time.

HWQ 3.120

### Basic One Way ANOVA Concepts

Within- vs. Between-Group Variation

Suppose 12 recent college graduates are assigned to three groups: 4 subjects to an exercise group (I), 4 subjects to a drug treatment group (II), and 4 subjects to a control group (III). Ages, pulse rates, diastolic blood pressures, and triglyceride measurements are taken after 8 weeks in the study, with the following results:


### Basic One Way ANOVA Concepts

Within- vs. Between-Group Variation

For the triglyceride results, there might not be a real difference that can be attributed to the groups, but you need an analytic method to determine this.

The ANOVA methods are used for exactly this purpose, i.e., to analyze the variability among groups relative to the variability within groups to determine if differences among groups are meaningful or significant.

An ANOVA is conducted using F-tests that are constructed from the ratio of between-group to within-group variance estimates.

Under the hypothesis of no group effect, the variation among groups is just another measure of patient-to-patient variability, so their ratio should be about 1.

Assumptions for these F-tests usually entail independent samples from normally distributed populations with equal variances.
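The between- to within-group variance ratio can be sketched in R; the data below are made up for illustration (they are not the triglyceride values from the slides):

```r
# Hypothetical responses for three groups (I, II, III) of 4 subjects each
resp <- c(5.2, 4.8, 5.5, 5.0,   # group I
          6.1, 6.4, 5.9, 6.2,   # group II
          5.1, 5.3, 4.9, 5.4)   # group III
grp <- factor(rep(c("I", "II", "III"), each = 4))
fit <- aov(resp ~ grp)
summary(fit)  # F = between-group mean square / within-group mean square
```

A ratio near 1 is consistent with no group effect; here group II sits well above the other two, so the F value is large.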

### Basic One Way ANOVA Concepts

Within- vs. Between-Group Variation

The one-way ANOVA has one main effect or grouping factor with two or more levels. In analyzing clinical trials, the main effect will often be a treatment effect. The levels of the factor Treatment might be ‘low dose’, ‘middle dose’, ‘high dose’, and ‘placebo’.

The two-way ANOVA has two main effects, usually a grouping or treatment factor and a blocking factor (such as Gender, Study Center, Diagnosis Group, etc.).

The two-way ANOVA is one of the most commonly used analyses for multi-center clinical studies, usually with Treatment (or Dose group) and Study Center as the two main effects.

In most types of ANOVA used in clinical trials, the primary question the researcher wants to answer is whether there are any differences among the group population means based on the sample data.

The null hypothesis to be tested is ‘there is no Group effect’ or, equivalently, ‘the mean responses are the same for all groups’. The alternative hypothesis is that ‘the Group effect is important’ or, equivalently, ‘the Group means differ for at least one pair of groups’.