Nonparametric Statistical Methods

Presented by

Guo Cheng, Ning Liu , Faiza Khan, Zhenyu Zhang, Du Huang, Christopher Porcaro, Hongtao Zhao, Wei Huang

Definition

Nonparametric methods, in particular rank-based methods, are used when nothing is known about the population distribution from which the data are sampled.

Used for small sample sizes.

Used when the data are measured on an ordinal scale and only their ranks are meaningful.

Outline
  • 1. Sign Test
  • 2. Wilcoxon Signed Rank Test
  • 3. Inferences for Two Independent Samples
  • 4. Inferences for Several Independent Samples
  • 5. Friedman Test
  • 6. Spearman’s Rank Correlation
  • 7. Kendall’s Rank Correlation Coefficient
Parameter of interest: Median

The median is used as the parameter of interest because, for skewed distributions, it is a better measure of the center of the data than the mean.

Hypothesis test

H0: µ = µ0 vs. Ha: µ > µ0, where µ0 is a specified value and µ is the unknown median.

Testing Procedure
  • Step 1: Given a random sample x1, x2, …, xn from a population with unknown median µ, count the number of xi’s that exceed µ0.
    • Denote this count by s+.
    • s- = n - s+ is then the number of xi’s that fall below µ0 (assuming no xi equals µ0).
  • Step 2: Reject H0 if s+ is large or, equivalently, if s- is small.
How to reject H0?
  • To determine how large s+ must be in order to reject H0, we need the distribution of the corresponding random variable S+.
  • Xi: random variable corresponding to the observed value xi
  • S+, S-: random variables corresponding to s+ and s-
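The distribution needed here is binomial: if H0 is true and the population is continuous, each Xi exceeds µ0 with probability 1/2, so S+ ~ Binomial(n, 1/2). A minimal Python sketch of the resulting exact p-value follows (the slides themselves use SAS). The function name is hypothetical, and the counts n = 10 and s+ = 8 are an assumption: the raw data are not fully shown, but these counts are consistent with the sign statistic M = (s+ - s-)/2 = 3 and the p-value 0.1094 in the SAS output later in the slides.

```python
from math import comb


def sign_test_p_value(s_plus, n, two_sided=False):
    """Exact sign-test p-value, using S+ ~ Binomial(n, 1/2) under H0."""
    # Upper tail P(S+ >= s_plus): evidence for Ha: median > mu0.
    upper = sum(comb(n, k) for k in range(s_plus, n + 1)) / 2 ** n
    if not two_sided:
        return upper
    # Lower tail P(S+ <= s_plus), needed for the two-sided version.
    lower = sum(comb(n, k) for k in range(0, s_plus + 1)) / 2 ** n
    return min(1.0, 2 * min(upper, lower))


# Assumed counts: 8 of 10 observations above mu0 (so s- = 2).
one_sided = sign_test_p_value(8, 10)
two_sided = sign_test_p_value(8, 10, two_sided=True)
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
# prints: one-sided p = 0.0547, two-sided p = 0.1094
```

The two-sided value matches the "Sign" row that SAS reports, since PROC UNIVARIATE tests against a two-sided alternative.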
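SAS code

/* Thermostat readings; test H0: median = 200 */
DATA thermostat;
  INPUT temp;
  DATALINES;
202.2
203.4
;
/* LOCCOUNT adds counts of observations above and below MU0 */
PROC UNIVARIATE DATA=thermostat LOCCOUNT MU0=200;
  VAR temp;
RUN;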

SAS Output

Basic Statistical Measures

    Location                    Variability
Mean     201.7700     Std Deviation            2.41019
Median   201.7500     Variance                 5.80900
Mode      .           Range                    8.30000
                      Interquartile Range      2.90000

Tests for Location: Mu0=200

Test           -Statistic-    -----p Value------
Student's t    t  2.322323    Pr > |t|    0.0453
Sign           M         3    Pr >= |M|   0.1094
Signed Rank    S      19.5    Pr >= |S|   0.048

Inventor

Frank Wilcoxon (2 September 1892, County Cork, Ireland – 18 November 1965, Tallahassee, Florida, USA) was a chemist and statistician, known for the development of several statistical tests.

What is it used for?
  • Two related samples
  • Matched samples
  • Repeated measurements on a single sample
SAS code

DATA thermo;
  INPUT temp;
  DATALINES;
202.2
203.4
;
PROC UNIVARIATE DATA=thermo LOCCOUNT MU0=200;
  TITLE "Wilcoxon signed rank test for the thermostat";
  VAR temp;
RUN;

SAS outputs (selected results)

Basic Statistical Measures

    Location                    Variability
Mean     201.7700     Std Deviation            2.41019
Median   201.7500     Variance                 5.80900
Mode      .           Range                    8.30000
                      Interquartile Range      2.90000

Tests for Location: Mu0=200

Test           -Statistic-    -----p Value------
Student's t    t  2.322323    Pr > |t|    0.0453
Sign           M         3    Pr >= |M|   0.1094
Signed Rank    S      19.5    Pr >= |S|   0.048

Example
  • To test whether the grades of two classes taught by the same teacher are the same, we randomly pick 7 students from Class A and 9 from Class B. Their scores are as follows:
  • A: 8.50 9.48 8.65 8.16 8.83 7.76 8.63
  • B: 8.27 8.20 8.25 8.14 9.00 8.10 7.20 8.32 7.70
SAS code

Data exam;
  Input group $ score @@;
  Datalines;
A 8.50 A 9.48 A 8.65 A 8.16 A 8.83 A 7.76 A 8.63
B 8.27 B 8.20 B 8.25 B 8.14 B 9.00 B 8.10 B 7.20
B 8.32 B 7.70
;

Proc npar1way data=exam wilcoxon;
  Var score;
  Class group;
  Exact wilcoxon;
Run;
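As a cross-check on what the Wilcoxon rank sum test computes for this example, here is a minimal Python sketch (an illustration, not the presenters' method). It pools the 16 scores, ranks them, and sums the ranks of Class A; the exact upper-tail probability is then obtained by brute force, on the assumption that under H0 every choice of 7 of the 16 ranks is equally likely for Class A. The helper name `average_ranks` is hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


class_a = [8.50, 9.48, 8.65, 8.16, 8.83, 7.76, 8.63]
class_b = [8.27, 8.20, 8.25, 8.14, 9.00, 8.10, 7.20, 8.32, 7.70]

ranks = average_ranks(class_a + class_b)
w_a = sum(ranks[:len(class_a)])                      # rank sum of Class A
expected_w_a = len(class_a) * (len(ranks) + 1) / 2   # mean of W_A under H0

# Exact upper-tail p-value: count the 7-rank subsets whose sum is at least
# as large as the observed rank sum.
rank_sums = [sum(c) for c in combinations(ranks, len(class_a))]
p_upper = sum(s >= w_a for s in rank_sums) / len(rank_sums)

print(f"W_A = {w_a}, expected under H0 = {expected_w_a}, P(W >= W_A) = {p_upper:.4f}")
```

Class A's rank sum is 75 against an expected 59.5 under H0; the enumeration covers all 11,440 possible assignments, which is only practical for small samples like this one.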

Introduction
  • If the data are normally distributed and the population standard deviations are equal, we can test for a difference among several populations using the one-way ANOVA F test.
When to use Kruskal-Wallis test?
  • But what happens when our data are not normal?
    • In that case we use the nonparametric Kruskal-Wallis test to compare more than two populations, as long as the data come from a continuous distribution.
    • The idea of the Kruskal-Wallis rank test is to rank the data from all groups together and then apply one-way ANOVA to the ranks rather than to the original data.
Kruskal-Wallis Test (kw Test)
  • A non-parametric method for testing whether samples originate from the same distribution.
  • Used for comparing more than two samples that are independent.
Kruskal-Wallis Test: History
  • William Henry Kruskal
    • October 10th, 1919 – April 21st, 2005
    • Obtained Bachelor’s and Master’s degrees in Mathematics at Harvard University and received his Ph.D. from Columbia University in 1955.
  • Wilson Allen Wallis
    • November 5th, 1912 – October 12th, 1998
    • Undergraduate work at the University of Minnesota and graduate work at the University of Chicago in 1933.
Kruskal-Wallis Test: Steps

1. Create Hypotheses:

Null Hypothesis (H0): The samples come from identical populations

Alternative Hypothesis (Ha): At least one sample comes from a different population

2. Rank all the data together. The lowest value gets the lowest rank, and so on. Tied values get the average of the ranks they would have obtained if they weren’t tied.

3. Add up the ranks within each sample. Label these sums L1, L2, L3, and L4.

4. Find the Test Statistic:

\[ kw = \frac{12}{n(n+1)} \sum_{i} \frac{L_i^2}{n_i} - 3(n+1) \]

n = total number of observations in all samples

ni = number of observations in sample i

Li = total rank of each sample

kw = test statistic

5. Reject H0 if kw is greater than the chi-square table value (degrees of freedom = number of samples - 1).

Kruskal-Wallis Test: Example
  • An experiment was done to compare four different ways of teaching a concept to a class of students. In this experiment, 28 tenth-grade classes were randomly assigned to the four methods (7 classes per method). A 45-question test was given to each class. The average test scores of the classes are given in the following table and in the SAS input. Apply the Kruskal-Wallis test to the test scores data set.

[Tables: Given Data; Ranks of Data Values]

SAS Input

data test;
  input methodname $ scores;
  cards;
case 14.59
case 23.44
case 25.43
case 18.15
case 20.82
case 14.06
case 14.26
formula 20.27
formula 26.84
formula 14.71
formula 22.34
formula 19.49
formula 24.92
formula 20.20
equation 27.82
equation 24.92
equation 28.68
equation 23.32
equation 32.85
equation 33.90
equation 23.42
unitary 33.16
unitary 26.93
unitary 30.43
unitary 36.43
unitary 37.04
unitary 29.76
unitary 33.88
;
proc npar1way data=test wilcoxon;
  class methodname;
  var scores;
run;
SAS Output

Wilcoxon Scores (Rank Sums) for Variable scores
Classified by Variable methodname

                    Sum of    Expected      Std Dev         Mean
methodname    N     Scores    Under H0     Under H0        Score
case          7      49.00      101.50    18.845498     7.000000
formula       7      66.50      101.50    18.845498     9.500000
equation      7     125.50      101.50    18.845498    17.928571
unitary       7     165.00      101.50    18.845498    23.571429

Average scores were used for ties.

Kruskal-Wallis Test

Chi-Square         18.1390
DF                       3
Pr > Chi-Square     0.0004
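The Kruskal-Wallis steps can be checked against this output with a short Python sketch (an independent illustration, not the presenters' code). It uses the statistic kw = 12/(n(n+1)) · Σ Li²/ni − 3(n+1), where ni is the size of sample i. Several things here are assumptions: the equation group is taken to contain a single 24.92, since the output reports N = 7 per method; the tie correction (dividing by 1 − Σ(t³ − t)/(n³ − n) over tie groups of size t) is a standard adjustment not shown on the slides; and the closed-form chi-square tail used for the p-value is valid only for 3 degrees of freedom.

```python
from math import erfc, exp, pi, sqrt

scores = {
    "case": [14.59, 23.44, 25.43, 18.15, 20.82, 14.06, 14.26],
    "formula": [20.27, 26.84, 14.71, 22.34, 19.49, 24.92, 20.20],
    "equation": [27.82, 24.92, 28.68, 23.32, 32.85, 33.90, 23.42],
    "unitary": [33.16, 26.93, 30.43, 36.43, 37.04, 29.76, 33.88],
}

# Step 2: rank all observations together, averaging the ranks of ties.
pooled = sorted(x for group in scores.values() for x in group)
n = len(pooled)


def average_rank(x):
    first = pooled.index(x) + 1   # lowest rank occupied by the value x
    count = pooled.count(x)       # number of observations tied at x
    return first + (count - 1) / 2


# Step 3: rank sum L_i for each sample.
rank_sums = {g: sum(average_rank(x) for x in xs) for g, xs in scores.items()}

# Step 4: test statistic, first without and then with the tie correction.
kw = 12 / (n * (n + 1)) * sum(
    rank_sums[g] ** 2 / len(scores[g]) for g in scores
) - 3 * (n + 1)
tie_sizes = [pooled.count(x) for x in set(pooled)]
correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
kw_corrected = kw / correction

# Step 5: upper-tail chi-square probability (closed form for df = 3 only).
p_value = erfc(sqrt(kw_corrected / 2)) + sqrt(2 * kw_corrected / pi) * exp(-kw_corrected / 2)

print(rank_sums)
print(f"kw = {kw:.4f}, tie-corrected = {kw_corrected:.4f}, p = {p_value:.4f}")
```

The rank sums come out as 49, 66.5, 125.5 and 165, and the tie-corrected statistic rounds to 18.1390, in line with the SAS output; the uncorrected value is slightly smaller because of the one tied pair.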

Introduction
  • The Friedman test is a distribution-free, rank-based test for comparing treatments, named after the Nobel laureate economist Milton Friedman, who proposed it.
  • The Friedman test is a version of repeated-measures ANOVA that can be performed on ordinal (ranked) data.
Example

Now we have 8 treatments arranged in 3 blocks, with significance level α = 0.025.

Define Null and Alternative Hypothesis
  • H0: There is no difference among the 8 treatments
  • Ha: There is a difference among the 8 treatments
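The data table for this example is not part of the transcript, so the following Python sketch uses made-up responses purely to show the mechanics. It assumes the usual form of the Friedman statistic: rank the k treatments within each of the b blocks, let Rj be the rank sum of treatment j, and compute 12/(bk(k+1)) · Σ Rj² − 3b(k+1), which is compared with a chi-square value on k − 1 degrees of freedom. The function name and the data are hypothetical, and no tie correction is applied.

```python
def friedman_statistic(blocks):
    """Friedman statistic for a list of blocks, each a list of k responses.

    Assumes no tied values within a block (no tie correction is applied).
    """
    b = len(blocks)
    k = len(blocks[0])
    rank_sums = [0] * k
    for block in blocks:
        # Rank the k treatments within this block: smallest response gets rank 1.
        order = sorted(range(k), key=lambda j: block[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12 / (b * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * b * (k + 1)


# Hypothetical responses: 3 blocks (rows) by 8 treatments (columns).
hypothetical_blocks = [
    [5.1, 6.3, 4.8, 7.9, 6.0, 8.4, 5.5, 7.2],
    [4.9, 6.8, 5.2, 8.1, 6.1, 8.0, 5.0, 7.5],
    [5.3, 6.1, 4.7, 7.7, 6.4, 8.6, 5.6, 7.0],
]
fr = friedman_statistic(hypothetical_blocks)
print(f"Friedman statistic = {fr:.4f}")
```

With these invented numbers the statistic is about 19.89. At α = 0.025 with 7 degrees of freedom the chi-square table value is 16.013, so H0 would be rejected for this illustrative data set; the result for the presenters' actual data cannot be recovered from the transcript.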
Introduction
  • From Pearson to Spearman
  • Spearman’s Rank Correlation Coefficient
  • Large-Sample Approximation
  • Hypothesis Test
  • Examples
From Pearson to Spearman
  • Pearson’s
    • Measures only the degree of linear association
    • Based on the assumption of bivariate normality of the two variables
  • Spearman’s
    • Takes into account only the ranks
    • Measures the degree of monotone association
    • Inferences on the rank correlation coefficient are distribution-free
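The contrast between linear and monotone association can be seen in a small Python sketch. It computes Spearman's coefficient as the Pearson correlation of the ranks, which is one common way to define it when there are no ties. The data (y = x³) and all function names are hypothetical and chosen only because the relationship is perfectly monotone but not linear.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y increases with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Pearson  r   = {pearson(x, y):.4f}")
print(f"Spearman r_s = {spearman(x, y):.4f}")
```

Pearson's r falls short of 1 here (about 0.94) because the points do not lie on a line, while Spearman's coefficient equals 1 because the ranks of x and y agree exactly.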
From Pearson to Spearman

Charles Edward Spearman (10 Sept. 1863 – 17 Sept. 1945)

  • As a psychologist
    ① General factor of intelligence
    ② The nature and causes of variations in human
  • As a statistician
    ① Rank correlation
    ② Two-way analysis
    ③ Correlation coefficient

Example

Table 5.1 Wine Consumption and Heart Disease Deaths

Example

Table 5.2 Ranks of Wine Consumption and Heart Disease Deaths

Kendall’s Tau
  • It is a coefficient used to measure the association between two sets of ranked data.
  • Named after the British statistician Maurice Kendall, who developed it in 1938.
  • Ranges from -1.0 to 1.0
  • Tau-a (with no ties) and Tau-b (with ties)
Example 1 Kendall’s tau-a
  • Raw data for 11 students in 2 exams:
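Steps for calculating τ̂

1. Sort the data by x in ascending order and pair each y rank with its x.

2. For each y, count c (the number of later y’s that are larger) and d (the number of later y’s that are smaller).

3. Sum the counts to get C and D.

4. Use the formula to calculate τ̂.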
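The four steps can be sketched in Python as follows. The formula itself is not included in the transcript, so the sketch assumes the standard tau-a definition, (C − D) divided by the number of pairs n(n − 1)/2, where C and D are the totals of concordant and discordant pairs. The scores are hypothetical (the table for the 11 students is not in the transcript), and the function assumes no ties.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a via the sort-and-count steps; assumes no ties in x or y."""
    n = len(x)
    # Step 1: sort the pairs by x so that only the order of y matters.
    y_sorted = [b for _, b in sorted(zip(x, y))]
    # Step 2: for each y, count later y's that are larger (c) or smaller (d).
    c_counts = [sum(later > y_sorted[i] for later in y_sorted[i + 1:]) for i in range(n)]
    d_counts = [sum(later < y_sorted[i] for later in y_sorted[i + 1:]) for i in range(n)]
    # Step 3: total concordant (C) and discordant (D) pairs.
    big_c, big_d = sum(c_counts), sum(d_counts)
    # Step 4: tau-a = (C - D) / (number of pairs).
    return (big_c - big_d) / (n * (n - 1) / 2)


# Hypothetical scores for five students on two exams (not the slides' data).
exam1 = [85, 98, 70, 62, 91]
exam2 = [85, 95, 75, 60, 80]
tau = kendall_tau_a(exam1, exam2)
print(f"tau-a = {tau}")  # prints: tau-a = 0.8
```

Here C = 9 and D = 1 out of 10 pairs, giving 0.8. When ties are present, tau-b adjusts the denominator, which is what the KENDALL option of PROC CORR reports.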

Example 2 Kendall’s tau-b

Wine Consumption and heart disease deaths data

SAS Code

Data exams;
  Input exam1 exam2;
  Datalines;
85 85
98 95
;
Run;

Proc corr data=exams kendall;
  Var exam1 exam2;
Run;

Summary
  • Nonparametric tests are very useful when we know nothing about the underlying distributions.
  • In particular, when the distribution is not normal and the t-test cannot be relied on, nonparametric methods are the alternative.
  • The median is a better measure of central tendency for skewed, non-normal populations.
  • The data can be ordinal, and the sample size is usually small.
Summary

In summary, this presentation has briefly introduced some of the most common nonparametric methods, including:

  • Sign test
  • Wilcoxon rank sum test and signed rank test
  • Kruskal-Wallis Test
  • Friedman Test
  • Spearman’s Rank Correlation
  • Kendall’s Rank Correlation Coefficient
The End.

Thank You!