Nonparametric Statistical Methods

Presented by Guo Cheng, Ning Liu, Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang

Introduction

Definition: Nonparametric (rank-based) methods are used when we are unwilling or unable to assume a specific form, such as normality, for the population distribution from which the data are sampled. They are also useful for small samples, where such an assumption cannot be checked reliably, and for data measured on an ordinal scale, where only the ranks are meaningful.

Outline
• 1. Sign Test
• 2. Wilcoxon Signed Rank Test
• 3. Inferences for Two Independent Samples
• 4. Inferences for Several Independent Samples
• 5. Friedman Test
• 6. Spearman’s Rank Correlation
• 7. Kendall’s Rank Correlation Coefficient

1. Sign Test

Parameter of interest: the median. The median is used as the parameter because, for skewed distributions, it is a better measure of the center of the data than the mean.

Hypothesis test: H0: µ = µ0 vs. Ha: µ > µ0, where µ0 is a specified value and µ is the unknown median.

Testing Procedure
• Step 1: Given a random sample x1, x2, …, xn from a population with unknown median µ, count the number of xi that exceed µ0.
• Denote this count by s+.
• s- = n - s+ is the number of xi below µ0 (assuming no xi equals µ0).
• Step 2: Reject H0 if s+ is large, or equivalently if s- is small.
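Step 1 can be sketched in Python as below. This is a minimal illustration, not part of the original slides: the function name `sign_counts` is hypothetical, the data are made up, and observations equal to µ0 are dropped (a common convention, assumed here, that reduces n) rather than counted in s-.

```python
def sign_counts(sample, mu0):
    """Return (s_plus, s_minus, n) for the sign test of H0: median = mu0.

    Assumption: observations equal to mu0 carry no sign information and are
    dropped, so n counts only the observations that differ from mu0.
    """
    s_plus = sum(1 for x in sample if x > mu0)   # observations exceeding mu0
    s_minus = sum(1 for x in sample if x < mu0)  # observations below mu0
    return s_plus, s_minus, s_plus + s_minus


# Hypothetical data (not the thermostat sample): 200.0 ties with mu0 and is dropped.
print(sign_counts([201.0, 199.5, 203.2, 200.0, 204.1], 200))  # (3, 1, 4)
```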

How to reject H0?
• To determine how large s+ must be in order to reject H0, we need the distribution of the corresponding random variable S+.
• Xi: random variable corresponding to the observed value xi
• S-: random variable corresponding to s-

Distribution of S+ and S-
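Under H0, and assuming a continuous population, each Xi exceeds µ0 with probability 1/2 independently of the others, so S+ counts successes in n fair Bernoulli trials:

```latex
S_+ = \sum_{i=1}^{n} \mathbf{1}\{X_i > \mu_0\} \sim \mathrm{Bin}\!\left(n, \tfrac{1}{2}\right),
\qquad
P(S_+ = s) = \binom{n}{s}\left(\tfrac{1}{2}\right)^{n}, \quad s = 0, 1, \dots, n.
```

Because Bin(n, 1/2) is symmetric, S- = n - S+ has the same distribution.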

Calculating P-value
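For Ha: µ > µ0, the exact P-value is the upper binomial tail P(S+ ≥ s+), the sum of C(n, i)(1/2)^n over i = s+, …, n. A minimal Python sketch (the name `sign_test_pvalue` is hypothetical):

```python
from math import comb


def sign_test_pvalue(s_plus, n):
    """Exact upper-tailed P-value P(S+ >= s_plus) for H0: median = mu0
    vs Ha: median > mu0, using S+ ~ Bin(n, 1/2) under H0."""
    return sum(comb(n, i) for i in range(s_plus, n + 1)) / 2 ** n


print(sign_test_pvalue(8, 10))  # 56/1024 = 0.0546875
```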

Rejection criteria
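Equivalently, at level α one rejects H0 when s+ ≥ c, where c is the smallest integer with P(S+ ≥ c) ≤ α. Because the binomial distribution is discrete, the attained level is usually below α. A sketch with hypothetical helper names:

```python
from math import comb


def upper_tail(c, n):
    """P(S+ >= c) when S+ ~ Bin(n, 1/2), as under H0."""
    return sum(comb(n, i) for i in range(c, n + 1)) / 2 ** n


def sign_test_critical_value(n, alpha):
    """Smallest c with P(S+ >= c) <= alpha; reject H0 when s+ >= c.

    Returns n + 1 if no such c exists, i.e. the test can never reject at this alpha.
    """
    for c in range(n + 1):
        if upper_tail(c, n) <= alpha:
            return c
    return n + 1


c = sign_test_critical_value(10, 0.05)
print(c, upper_tail(c, 10))  # 9 0.0107421875
```

With n = 10 and α = 0.05, s+ = 8 is not enough (P(S+ ≥ 8) = 56/1024 ≈ 0.055), so the rejection region is s+ ≥ 9 with attained level 11/1024 ≈ 0.011.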

Large sample z-test
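For large n, S+ is approximately N(n/2, n/4) under H0. The sketch below standardizes s+ with a continuity correction of 1/2 (one common choice, assumed here; some texts omit it) and rejects H0 in favor of µ > µ0 when z ≥ z_α. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_z(s_plus, n, alpha=0.05):
    """Large-sample sign test of H0: median = mu0 vs Ha: median > mu0.

    Uses S+ approx N(n/2, n/4) under H0 with a continuity correction of 1/2.
    Returns (z, approximate P-value, whether H0 is rejected at level alpha).
    """
    z = (s_plus - n / 2 - 0.5) / sqrt(n / 4)
    p_value = 1 - NormalDist().cdf(z)
    reject = z >= NormalDist().inv_cdf(1 - alpha)  # compare with z_alpha
    return z, p_value, reject


print(sign_test_z(60, 100))  # z = 1.9, P about 0.029, reject at 0.05
```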

Confidence Interval
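A distribution-free confidence interval for the median can be built from order statistics as [x(b+1), x(n-b)], where b is the largest integer with P(S ≤ b) ≤ α/2 for S ~ Bin(n, 1/2); its coverage 1 - 2P(S ≤ b) is at least 1 - α for a continuous population. This is the usual order-statistic construction, assumed here, and `median_ci` is a hypothetical name.

```python
from math import comb


def median_ci(sample, alpha=0.05):
    """Order-statistic confidence interval for the median (0 < alpha < 1).

    Returns (lower, upper, coverage) where lower = x_(b+1), upper = x_(n-b),
    b is the largest integer with P(S <= b) <= alpha/2 for S ~ Bin(n, 1/2),
    and coverage = 1 - 2 * P(S <= b) (exact for a continuous population).
    """
    xs = sorted(sample)
    n = len(xs)
    b, tail = -1, 0.0  # tail holds P(S <= b); P(S <= -1) = 0
    while b + 1 <= n:
        next_tail = tail + comb(n, b + 1) / 2 ** n
        if next_tail > alpha / 2:
            break
        b, tail = b + 1, next_tail
    if b < 0:
        raise ValueError("sample too small for the requested confidence level")
    # x_(b+1) and x_(n-b) are 1-based order statistics; convert to 0-based indices
    return xs[b], xs[n - b - 1], 1 - 2 * tail


# Hypothetical sample 1..10: b = 1, so the interval is [x_(2), x_(9)].
print(median_ci(range(1, 11)))  # (2, 9, 0.978515625)
```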

Example

SAS code
DATA thermostat;
  INPUT temp @@;
  DATALINES;
202.2 203.4 …
;
PROC UNIVARIATE DATA=thermostat LOCCOUNT MU0=200;
  VAR temp;
RUN;

SAS Output
Basic Statistical Measures
    Location                  Variability
Mean     201.7700      Std Deviation          2.41019
Median   201.7500      Variance               5.80900
Mode     .             Range                  8.30000
                       Interquartile Range    2.90000

Tests for Location: Mu0=200
Test           -Statistic-      -----p Value------
Student's t    t   2.322323     Pr > |t|     0.0453
Sign           M   3            Pr >= |M|    0.1094
Signed Rank    S   19.5         Pr >= |S|    0.048
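PROC UNIVARIATE reports the sign statistic as M = (n+ - n-)/2, with two-sided P-value (1/2)^(nt - 1) times the sum of C(nt, j) for j = 0, …, min(n+, n-), where nt = n+ + n- excludes values equal to Mu0. The counts themselves are not printed above; n+ = 8 and n- = 2 is our inference (M = 3 requires n+ - n- = 6, and the t statistic with the reported mean and standard deviation implies n = 10), which the sketch below checks against the reported Pr >= |M| = 0.1094. The function name is hypothetical.

```python
from math import comb


def sas_sign_statistic(n_plus, n_minus):
    """Return (M, two-sided P-value) for the sign test in the form reported
    by PROC UNIVARIATE's Tests for Location table."""
    n_t = n_plus + n_minus
    m = (n_plus - n_minus) / 2
    tail = sum(comb(n_t, j) for j in range(min(n_plus, n_minus) + 1))
    p = tail / 2 ** (n_t - 1)
    return m, min(p, 1.0)  # cap at 1, which only matters when n_plus == n_minus


print(sas_sign_statistic(8, 2))  # (3.0, 0.109375), i.e. M = 3 and p = 0.1094
```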

2. Wilcoxon Signed Rank Test

Inventor: Frank Wilcoxon (2 September 1892, County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician, known for developing several statistical tests.

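What is it used for?
• Two related samples
• Matched samples
• Repeated measurements on a single sample
• A single sample, to test a hypothesized center of a symmetric distribution (as in the thermostat example)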

Hypothesis

Testing procedure
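The testing procedure can be sketched as follows, under stated assumptions: zero differences are dropped, tied absolute differences share the average of their ranks (midranks), and W+ is the sum of the ranks of the positive differences. The names `midranks` and `signed_rank_statistic` are hypothetical. Ties are detected by exact float equality, so values such as 202.2 - 200 may need rounding first.

```python
def midranks(values):
    """Ranks 1..n of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        i = j + 1
    return ranks


def signed_rank_statistic(sample, mu0):
    """Return (W+, n) for H0: center = mu0, dropping zero differences."""
    d = [x - mu0 for x in sample if x != mu0]
    ranks = midranks([abs(di) for di in d])
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    return w_plus, len(d)


# Hypothetical integer data: |d| = 3, 1, 4, 4, 2, so the two 4s share rank 4.5.
print(signed_rank_statistic([3, -1, 4, -4, 2], 0))  # (9.5, 5)
```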

Example

SAS code
DATA thermo;
  INPUT temp @@;
  DATALINES;
202.2 203.4 …
;
PROC UNIVARIATE DATA=thermo LOCCOUNT MU0=200;
  TITLE "Wilcoxon signed rank test for the thermostat data";
  VAR temp;
RUN;

SAS output (selected results)
Basic Statistical Measures
    Location                  Variability
Mean     201.7700      Std Deviation          2.41019
Median   201.7500      Variance               5.80900
Mode     .             Range                  8.30000
                       Interquartile Range    2.90000

Tests for Location: Mu0=200
Test           -Statistic-      -----p Value------
Student's t    t   2.322323     Pr > |t|     0.0453
Sign           M   3            Pr >= |M|    0.1094
Signed Rank    S   19.5         Pr >= |S|    0.048
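As we understand the PROC UNIVARIATE documentation, the Signed Rank row reports the centered statistic S = W+ - nt(nt + 1)/4, where nt is the number of nonzero differences. Taking nt = 10 (an inference: it is consistent with the t statistic and with the Sign row's M = 3 and P-value 0.1094), the reported S = 19.5 corresponds to

```latex
W_+ = S + \frac{n_t(n_t+1)}{4} = 19.5 + \frac{10 \cdot 11}{4} = 19.5 + 27.5 = 47,
\qquad W_- = \frac{10 \cdot 11}{2} - 47 = 8.
```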

Large sample approximation

Derive E(W+) & Var(W+) under H0
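A sketch of the standard derivation, writing W+ for the signed rank statistic: under H0 (a continuous distribution symmetric about µ0), the difference with rank i among the |di| is positive with probability 1/2, independently across i. With Zi the indicator of that event, and Var(Zi) = 1/4:

```latex
\begin{aligned}
W_+ &= \sum_{i=1}^{n} i\,Z_i, \qquad Z_i \overset{\text{iid}}{\sim} \mathrm{Bernoulli}\!\left(\tfrac{1}{2}\right),\\
E(W_+) &= \sum_{i=1}^{n} \frac{i}{2} = \frac{n(n+1)}{4},\\
\operatorname{Var}(W_+) &= \sum_{i=1}^{n} \frac{i^2}{4} = \frac{n(n+1)(2n+1)}{24}.
\end{aligned}
```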

Rejection region:
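For large n and Ha: center > µ0, H0 is rejected when z ≥ z_α, with z = (W+ - n(n+1)/4 - 1/2) / sqrt(n(n+1)(2n+1)/24); the 1/2 is a continuity correction, assumed here. A minimal sketch with a hypothetical function name:

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_z(w_plus, n, alpha=0.05):
    """Large-sample Wilcoxon signed rank test of H0: center = mu0 vs Ha: center > mu0.

    Standardizes W+ with E(W+) = n(n+1)/4 and Var(W+) = n(n+1)(2n+1)/24,
    using a continuity correction of 1/2. Returns (z, reject at level alpha).
    """
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (w_plus - mean - 0.5) / sqrt(var)
    return z, z >= NormalDist().inv_cdf(1 - alpha)


print(signed_rank_z(47, 10))  # z about 1.94, so H0 is rejected at 0.05
```

With W+ = 47 and n = 10 (S = 19.5 plus n(n+1)/4 = 27.5 in the SAS output above), z = 19/sqrt(96.25) ≈ 1.94 > 1.645. Since n = 10 is small for a normal approximation, the exact P-value is preferable in that case.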

3. Inferences for Two Independent Samples

Hypothesis

Definition

Wilcoxon rank sum test
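The rank sum statistic W is the sum of the ranks of the first sample within the pooled sample, with tied values given midranks. A self-contained sketch (hypothetical names and data):

```python
def midranks(values):
    """Ranks 1..n of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Wilcoxon rank sum W: total rank of sample x within the pooled sample x + y."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[: len(x)])


# Hypothetical data; the two 2.2 values tie and share rank 3.5.
print(rank_sum([1.2, 3.4, 2.2], [0.5, 2.2, 4.0]))  # 10.5
```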

Mann-Whitney U test
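The Mann-Whitney U statistic counts the pairs (xi, yj) with xi > yj, conventionally scoring a tied pair as 1/2. A sketch with a hypothetical function name and made-up data:

```python
def mann_whitney_u(x, y):
    """U = number of pairs (x_i, y_j) with x_i > y_j, counting ties as 1/2."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical data: one tied pair (2.2, 2.2) contributes 0.5.
print(mann_whitney_u([1.2, 3.4, 2.2], [0.5, 2.2, 4.0]))  # 4.5
```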

Relationship between the two tests
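The two statistics are linked by an exact identity, so the two tests are equivalent. Sorting the first sample as x(1) < … < x(n1) (assuming no ties), the pooled rank of x(j) is j plus the number of y-values below it; summing over j gives

```latex
W = \sum_{j=1}^{n_1} \left( j + \#\{\, k : y_k < x_{(j)} \,\} \right)
  = \frac{n_1(n_1+1)}{2} + U,
\qquad\text{so}\qquad
U = W - \frac{n_1(n_1+1)}{2}.
```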

Advantages

For large samples
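For large samples, under H0 and without ties, E(W) = n1(N + 1)/2 and Var(W) = n1n2(N + 1)/12 with N = n1 + n2, and z = (W - E(W))/sqrt(Var(W)) is approximately standard normal. The sketch omits a continuity correction (an assumption; a correction of 1/2 is often applied as well), and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_z(w, n1, n2):
    """Normal approximation for the Wilcoxon rank sum W of the first sample (no ties).

    Returns (z, two-sided P-value); no continuity correction is applied.
    """
    n = n1 + n2
    mean = n1 * (n + 1) / 2
    var = n1 * n2 * (n + 1) / 12
    z = (w - mean) / sqrt(var)
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


print(rank_sum_z(105, 10, 10))  # (0.0, 1.0): W equals its null mean
```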

Treatment of ties
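With ties, tied observations receive midranks and the null variance of W is reduced. A commonly used tie-corrected form, assumed here, is Var(W) = (n1n2/12)[(N + 1) - Σ(t³ - t)/(N(N - 1))], where t runs over the sizes of the tie groups in the pooled sample. A sketch with a hypothetical function name:

```python
from collections import Counter


def rank_sum_variance(pooled, n1, n2):
    """Null variance of W with the tie correction; equals n1*n2*(N+1)/12 without ties."""
    n = n1 + n2
    # each group of t equal values contributes t**3 - t (zero for untied values)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    return n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))


# Hypothetical pooled data with one tie (two values of 2.2): about 5.1, versus 5.25 without ties.
print(rank_sum_variance([1.2, 3.4, 2.2, 0.5, 2.2, 4.0], 3, 3))
```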

Example
• To test whether the score distributions of two classes taught by the same teacher are the same, we randomly pick 7 students from Class A and 9 from Class B. Their scores are:
• A: 8.50 9.48 8.65 8.16 8.83 7.76 8.63
• B: 8.27 8.20 8.25 8.14 9.00 8.10 7.20 8.32 7.70

Example
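The rank computations for this example can be checked with the short sketch below (no score is repeated, so each rank is simply a position in the sorted pooled list). It uses the large-sample approximation without a continuity correction, so its z and two-sided P-value are approximations that need not match the exact test requested in the SAS code.

```python
from math import sqrt
from statistics import NormalDist

# Scores from the example: Class A (n1 = 7) and Class B (n2 = 9).
A = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
B = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]

pooled = sorted(A + B)
# No score is repeated, so a value's rank is its 1-based position in the sorted pool.
w_a = sum(pooled.index(a) + 1 for a in A)   # Wilcoxon rank sum for Class A
n1, n2 = len(A), len(B)
u_a = w_a - n1 * (n1 + 1) / 2               # Mann-Whitney U for Class A

n = n1 + n2
mean = n1 * (n + 1) / 2                     # E(W) under H0
z = (w_a - mean) / sqrt(n1 * n2 * (n + 1) / 12)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print("W_A =", w_a, "U_A =", u_a)  # W_A = 75 U_A = 47.0
print("z =", round(z, 3), "approx. two-sided P =", round(p_two_sided, 3))
```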

SAS code
Data exam;
  Input group $ score @@;
  Datalines;
A 8.50 A 9.48 A 8.65 A 8.16 A 8.83 A 7.76 A 8.63
B 8.27 B 8.20 B 8.25 B 8.14 B 9.00 B 8.10 B 7.20 B 8.32 B 7.70
;

SAS code
Proc npar1way data=exam wilcoxon;
  Var score;
  Class group;
  Exact wilcoxon;
Run;

Output

4. Inferences for Several Independent Samples

Introduction
• If the data are normally distributed and the population standard deviations are equal, we can test for a difference among the means of several populations using the one-way ANOVA F test.
