Nonparametric Statistical Methods

Presented by

Guo Cheng, Ning Liu , Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang

Definition

Nonparametric (rank-based) methods are used when we know little or nothing about the population distribution from which the data are sampled.

Used for small sample sizes.

Used when the data are measured on an ordinal scale and only their ranks are meaningful.

Outline

- 1. Sign Test
- 2. Wilcoxon Signed Rank Test
- 3. Inferences for Two Independent Samples
- 4. Inferences for Several Independent Samples
- 5. Friedman Test
- 6. Spearman’s Rank Correlation
- 7. Kendall’s Rank Correlation Coefficient

Parameter of interest: Median

The median is used as the parameter because, for skewed distributions, it is a better measure of central tendency than the mean.

Hypothesis test

H0: µ = µ0 vs Ha: µ > µ0, where µ0 is a specified value and µ is the unknown median

Testing Procedure

- Step 1: Given a random sample x1, x2, …, xn from a population with unknown median µ, count the number of xi’s that exceed µ0.
- Denote this count by s+.
- s- = n - s+ is then the number of xi’s that fall below µ0 (assuming no xi equals µ0).
- Step 2: Reject H0 if s+ is large or, equivalently, if s- is small.

How to reject H0?

- To determine how large s+ must be in order to reject H0, we need to find out the distribution of the corresponding random variable S+.
- Xi: random variable corresponding to the observed values xi
- S-: random variable corresponding to s-
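If the population is continuous, each Xi exceeds µ0 with probability 1/2 under H0, so S+ follows a Binomial(n, 1/2) distribution and the p-value is a binomial tail probability. The Python sketch below illustrates this; it is not the SAS procedure used in the slides. The values n = 10 and s+ = 8 are assumptions chosen to be consistent with the sign-test line of the SAS output in this section (M = 3, p = 0.1094), taking M to be (s+ - s-)/2.

```python
from math import comb

def upper_tail(s: int, n: int) -> float:
    """P(S >= s) for S ~ Binomial(n, 1/2), the null distribution of S+."""
    return sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n

# Assumed values (not stated on the slides): 10 observations, 8 above mu0 = 200.
n, s_plus = 10, 8
s_minus = n - s_plus

# One-sided test of H0: mu = mu0 against Ha: mu > mu0.
one_sided = upper_tail(s_plus, n)

# Two-sided p-value: double the tail beyond the larger of s+ and s-.
two_sided = min(1.0, 2 * upper_tail(max(s_plus, s_minus), n))

print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

Under these assumptions the two-sided value is 112/1024, which rounds to the 0.1094 that SAS reports for the sign test.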

SAS code

DATA thermostat;

INPUT temp;

datalines;

202.2

203.4

…

;

PROC UNIVARIATE DATA=thermostat LOCCOUNT MU0=200;

VAR temp;

RUN;

SAS Output

Basic Statistical Measures

| Location | Value | Variability | Value |
|---|---|---|---|
| Mean | 201.7700 | Std Deviation | 2.41019 |
| Median | 201.7500 | Variance | 5.80900 |
| Mode | . | Range | 8.30000 |
| | | Interquartile Range | 2.90000 |

Tests for Location: Mu0=200

| Test | Statistic | Value | p Value | Value |
|---|---|---|---|---|
| Student's t | t | 2.322323 | Pr > \|t\| | 0.0453 |
| Sign | M | 3 | Pr >= \|M\| | 0.1094 |
| Signed Rank | S | 19.5 | Pr >= \|S\| | 0.048 |

Inventor

Frank Wilcoxon (2 September 1892, County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician, known for the development of several statistical tests.

What is it used for?

- Two related samples
- Matched samples
- Repeated measurements on a single sample

SAS codes

DATA thermo;

INPUT temp;

datalines;

202.2

203.4

…

;

PROC UNIVARIATE DATA=thermo LOCCOUNT MU0=200;

TITLE "Wilcoxon signed rank test for the thermostat data";

VAR temp;

RUN;

SAS outputs (selected results)

Basic Statistical Measures

| Location | Value | Variability | Value |
|---|---|---|---|
| Mean | 201.7700 | Std Deviation | 2.41019 |
| Median | 201.7500 | Variance | 5.80900 |
| Mode | . | Range | 8.30000 |
| | | Interquartile Range | 2.90000 |

Tests for Location: Mu0=200

| Test | Statistic | Value | p Value | Value |
|---|---|---|---|---|
| Student's t | t | 2.322323 | Pr > \|t\| | 0.0453 |
| Sign | M | 3 | Pr >= \|M\| | 0.1094 |
| Signed Rank | S | 19.5 | Pr >= \|S\| | 0.048 |

Example

- To test whether the grades of two classes taught by the same teacher are the same, we randomly pick 7 students from Class A and 9 from Class B. Their scores are as follows:
- A: 8.50 9.48 8.65 8.16 8.83 7.76 8.63
- B: 8.27 8.20 8.25 8.14 9.00 8.10 7.20 8.32 7.70

SAS code

Data exam;

Input group $ score @@;

Datalines;

A 8.50 A 9.48 A 8.65 A 8.16 A 8.83 A 7.76 A 8.63

B 8.27 B 8.20 B 8.25 B 8.14 B 9.00 B 8.10 B 7.20

B 8.32 B 7.70

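;

Run;

/* Assumed completion: the PROC step was not captured in the transcript. PROC NPAR1WAY with the WILCOXON option requests the Wilcoxon rank sum test. */

Proc npar1way data=exam wilcoxon;

Class group;

Var score;

Run;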
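As a cross-check on the two-class example, the Wilcoxon rank sums can be computed by hand: pool the 16 scores, rank them, and add up the ranks in each class. The Python sketch below does this and also computes a large-sample z statistic. It is an illustration only, not the SAS procedure; the helper name `average_ranks` is hypothetical, and the z value is computed without a continuity correction, so software output may differ slightly.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Scores from the example (Class A: 7 students, Class B: 9 students).
class_a = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
class_b = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]

ranks = average_ranks(class_a + class_b)
n1, n2 = len(class_a), len(class_b)
w_a = sum(ranks[:n1])   # rank sum for Class A
w_b = sum(ranks[n1:])   # rank sum for Class B

# Large-sample approximation under H0 (no ties in these data).
mean_w = n1 * (n1 + n2 + 1) / 2
sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (w_a - mean_w) / sd_w

print(w_a, w_b, round(z, 4))
```

For these data the rank sums are 75 for Class A and 61 for Class B, which together add up to 16 × 17 / 2 = 136 as they should.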

Introduction

- We know that if our data are normally distributed and the population standard deviations are equal, we can test for a difference among several populations by using the one-way ANOVA F test.

When to use Kruskal-Wallis test?

- But what happens when our data are not normal?
- In that case we use the nonparametric Kruskal-Wallis test, which compares more than two populations provided the data come from continuous distributions.
- The idea of the Kruskal-Wallis (kw) rank test is to rank all the data from every group together and then apply one-way ANOVA to the ranks rather than to the original data.

Kruskal-Wallis Test (kw Test)

- A non-parametric method for testing whether samples originate from the same distribution.
- Used for comparing more than two samples that are independent.

Kruskal-Wallis Test: History

- William Henry Kruskal
- October 10th, 1919 – April 21st, 2005
- Obtained bachelor's and master's degrees in mathematics at Harvard University and received his Ph.D. from Columbia University in 1955.
- Wilson Allen Wallis
- November 5th, 1912 – October 12th, 1998
- Undergraduate work at the University of Minnesota and graduate work at the University of Chicago in 1933.

Kruskal-Wallis Test: Steps

1. Create Hypothesis:

Null Hypothesis (Ho): The populations from which the samples are drawn are identical

Alternative Hypothesis (Ha): At least one population is different

2. Rank all the data. The lowest number gets the lowest rank and so on. Tied data gets the average of the ranks they would have obtained if they weren’t tied.

3. Add up the ranks within each sample. Label these sums L1, L2, L3, and L4 (one per sample; four samples here).

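4. Find Test Statistic:

$$kw = \frac{12}{n(n+1)} \sum_{i} \frac{L_i^2}{n_i} - 3(n+1)$$

n = total number of observations in all samples

ni = number of observations in sample i

Li = total rank of each sample

kw = test statistic

5. Reject Ho if kw is greater than the chi-square table value with k - 1 degrees of freedom, where k is the number of samples.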

Kruskal-Wallis Test: Example

- An experiment was done to compare four different ways of teaching a concept to a class of students. In this experiment, 28 tenth grade classes were randomly assigned to the four methods (7 classes per method). A 45 question test was given to each class. The average test scores of the classes are given in the following table. Apply the Kruskal-Wallis test to the test scores data set.

SAS Input

- data test;
- input methodname $ scores;
- cards;
- case 14.59
- case 23.44
- case 25.43
- case 18.15
- case 20.82
- case 14.06
- case 14.26
- formula 20.27
- formula 26.84
- formula 14.71
- formula 22.34
- formula 19.49
- formula 24.92
- formula 20.20
- equation 27.82
- equation 24.92
- equation 28.68
- equation 23.32
- equation 32.85
- equation 33.90
- equation 23.42
- unitary 33.16
- unitary 26.93
- unitary 30.43
- unitary 36.43
- unitary 37.04
- unitary 29.76
- unitary 33.88
- ;
- proc npar1way data=test wilcoxon;
- class methodname;
- var scores;
- run;

SAS Output

Wilcoxon Scores (Rank Sums) for Variable scores, Classified by Variable methodname

| methodname | N | Sum of Scores | Expected Under H0 | Std Dev Under H0 | Mean Score |
|---|---|---|---|---|---|
| case | 7 | 49.00 | 101.50 | 18.845498 | 7.000000 |
| formula | 7 | 66.50 | 101.50 | 18.845498 | 9.500000 |
| equation | 7 | 125.50 | 101.50 | 18.845498 | 17.928571 |
| unitary | 7 | 165.00 | 101.50 | 18.845498 | 23.571429 |

Average scores were used for ties.

Kruskal-Wallis Test

| Statistic | Value |
|---|---|
| Chi-Square | 18.1390 |
| DF | 3 |
| Pr > Chi-Square | 0.0004 |
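The rank sums and the chi-square value in this output can be reproduced by following the ranking steps by hand. The Python sketch below does so with the seven scores per method from the SAS input. It is an illustration, not the SAS implementation: the helper name `average_ranks` is hypothetical, and the tie correction (dividing by 1 - Σ(t³ - t)/(n³ - n), with t the size of each tie group) is not described on the slides and is an assumption about how SAS reaches its reported value.

```python
from collections import Counter

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

scores = {
    "case":     [14.59, 23.44, 25.43, 18.15, 20.82, 14.06, 14.26],
    "formula":  [20.27, 26.84, 14.71, 22.34, 19.49, 24.92, 20.20],
    "equation": [27.82, 24.92, 28.68, 23.32, 32.85, 33.90, 23.42],
    "unitary":  [33.16, 26.93, 30.43, 36.43, 37.04, 29.76, 33.88],
}

# Steps 2 and 3: rank the pooled data, then sum the ranks within each sample.
pooled = [x for group in scores.values() for x in group]
ranks = average_ranks(pooled)
n = len(pooled)
rank_sums, start = {}, 0
for name, group in scores.items():
    rank_sums[name] = sum(ranks[start:start + len(group)])
    start += len(group)

# Step 4: kw = 12 / (n(n+1)) * sum(L_i^2 / n_i) - 3(n+1)
kw = 12 / (n * (n + 1)) * sum(
    rank_sums[g] ** 2 / len(scores[g]) for g in scores
) - 3 * (n + 1)

# Assumed tie correction (one pair of tied scores at 24.92).
tie_sizes = Counter(pooled).values()
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
kw_corrected = kw / correction

print(rank_sums, round(kw, 4), round(kw_corrected, 4))
```

The uncorrected statistic is about 18.134; with the assumed tie correction it becomes about 18.139, in line with the 18.1390 in the output. Either value is well above 7.815, the 5% chi-square critical value with 3 degrees of freedom.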

Introduction

- A distribution-free rank-based test for comparing the treatments is known as the Friedman test, named after the Nobel Laureate economist Milton Friedman who proposed it.
- The Friedman test is a version of repeated-measures ANOVA that can be performed on ordinal (ranked) data.

Define Null and Alternative Hypothesis

- H0: There is no difference among the 8 treatments
- Ha: There is a difference among the 8 treatments

Introduction

- From Pearson to Spearman
- Spearman’s Rank Correlation Coefficient
- Large-Sample Approximation
- Hypothesis Test
- Examples

From Pearson to Spearman

- Pearson’s
- Measures only the degree of linear association
- Based on the assumption of bivariate normality of the two variables
- Spearman’s
- Takes into account only the ranks
- Measures the degree of monotone association
- Inferences on the rank correlation coefficients are distribution-free
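The contrast between linear and monotone association can be made concrete by computing Spearman's coefficient as Pearson's correlation applied to the ranks. The Python sketch below uses made-up data (y = x³, not from the presentation), and the function names are hypothetical. The simple ranking helper assumes there are no tied values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks_no_ties(values):
    """1-based ranks; assumes all values are distinct."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks_no_ties(x), ranks_no_ties(y))

# Hypothetical data: a perfectly monotone but nonlinear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(round(pearson(x, y), 4), round(spearman(x, y), 4))
```

Because y increases whenever x does, the ranks agree exactly and Spearman's coefficient is 1, while Pearson's coefficient is below 1 because the relationship is not linear.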

From Pearson to Spearman

Charles Edward Spearman (10 Sept. 1863 – 17 Sept. 1945)

- As a psychologist

① General factor of intelligence

② The nature and causes of variations in human

- As a statistician

① Rank correlation

② Two-way analysis

③ Correlation coefficient

Example

Table 5.1 Wine Consumption and Heart Disease Deaths

Example

Table 5.2 Ranks of Wine Consumption and Heart Disease Deaths

Kendall’s Tau

- It is a coefficient used to measure the association between two sets of ranked data (paired rankings).
- Named after British statistician Maurice Kendall who developed it in 1938.
- Ranges from -1.0 to 1.0
- Tau-a (with no ties) and Tau-b (with ties)

Example 1 Kendall’s tau-a

- Raw data for 11 students in 2 exams:

Steps for calculating τ

1. Sort the x data in ascending order and pair the y ranks with x

2. Count c and d for each y

3. Sum the counts to get C and D

4. Use the formula to calculate τ
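The four steps can be sketched in Python as follows. The sketch rests on assumptions not spelled out in the transcript: c is taken to be the number of later y values that are larger (concordant pairs), d the number of later y values that are smaller (discordant pairs), and the formula is the usual tau-a, (C - D) / (n(n - 1)/2), which applies when there are no ties. The data are made up for illustration; the 11-student exam data are not reproduced in the transcript.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a for paired data without ties, following the four steps."""
    # Step 1: sort the pairs by x and keep the y values in that order.
    ys = [b for _, b in sorted(zip(x, y))]
    n = len(ys)
    total_c = total_d = 0
    for i in range(n):
        # Step 2: for each y, count later values that are larger (c) or smaller (d).
        c = sum(1 for later in ys[i + 1:] if later > ys[i])
        d = sum(1 for later in ys[i + 1:] if later < ys[i])
        # Step 3: accumulate the totals C and D.
        total_c += c
        total_d += d
    # Step 4: tau-a = (C - D) / (number of pairs).
    tau = (total_c - total_d) / (n * (n - 1) / 2)
    return tau, total_c, total_d

# Hypothetical ranks for five subjects on two measurements.
x_ranks = [1, 2, 3, 4, 5]
y_ranks = [2, 1, 4, 3, 5]

tau, C, D = kendall_tau_a(x_ranks, y_ranks)
print(tau, C, D)
```

With these ranks there are C = 8 concordant and D = 2 discordant pairs out of 10, so τ = 0.6.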

Example 2 Kendall’s tau-b

Wine Consumption and heart disease deaths data

SAS Code

Data exams;

Input exam1 exam2;

Datalines;

85 85

98 95

…

;

Run;

Proc corr data=exams kendall;

Var exam1 exam2;

Run;

Summary

- Nonparametric tests are very useful when we know little or nothing about the underlying distributions.
- In particular, when the distribution is not normal, the t-test may not be appropriate and nonparametric methods are needed.
- The median is a better measure of central tendency for skewed, non-normal populations.
- The data can be ordinal, and the sample size is usually small.

In summary, this presentation briefly introduced some of the most common nonparametric methods, including:

- Sign test
- Wilcoxon rank sum test and signed rank test
- Kruskal-Wallis Test
- Friedman Test
- Spearman’s Rank Correlation
- Kendall’s Rank Correlation Coefficient

Thank You !
