## Nonparametric Statistical Techniques


Chapter 17

**17.1 Introduction**

- The statistical techniques introduced in this chapter deal with ordinal data and with interval data that are not normally distributed.
- We test to determine whether the population locations differ.
- Because these tests of location do not refer to any population parameter, they are called nonparametric.

When comparing two populations the hypotheses generally are:

- $H_0$: The population locations are the same.
- $H_1$: (i) The locations differ, or (ii) population 1 is located to the right (left) of population 2, that is, the random variable $X_1$ is generally larger (smaller) than $X_2$.

**17.2 Wilcoxon Rank Sum Test**

The problem characteristics of this test are:

- The problem objective is to compare two populations.
- The data are either ordinal or interval (but not normal).
- The samples are independent.

**Wilcoxon Rank Sum Test – Example 17.1**

Based on the two samples shown below, can we infer at the 5% significance level that the location of population 1 is to the left of the location of population 2?

- Sample 1: 22, 23, 20
- Sample 2: 18, 27, 26

The hypotheses are:

- $H_0$: The two population locations are the same.
- $H_1$: The location of population 1 is to the left of the location of population 2.

**Graphical Demonstration: Why use the sum of ranks to test locations?**

Two hypothetical populations and their corresponding samples are presented: the GREEN population and the PURPLE population. Let us rank the observations of the two samples together. If the locations of the two populations are about the same (the null hypothesis is true), we would expect the ranks to be evenly spread between the samples, so the two rank sums will be close to each other.

[Figure: the GREEN and PURPLE populations and their samples, ranked together from 1 to 12; the two sums of ranks are 41 and 37.]
Allow the GREEN population to shift to the left of the PURPLE population. The green sample is expected to shift to the left too, and as a result several observations exchange positions in the combined ranking. What happens to the sums of ranks?

[Figure: as the GREEN population shifts left, the pair of rank sums changes from 41 and 37 to 40 and 38, and then to 33 and 45.]

The "green" sum decreases and the "purple" sum increases, while the two sums always total 1 + 2 + ... + 12 = 78. Changing the relative location of the two populations affects the rank sums of the two samples.

**Wilcoxon Rank Sum Test – Example 17.1 (continued)**

Test statistic:

1. Rank all six observations together (1 for the smallest).
2. Calculate the sum of ranks of each sample.
3. Let the test statistic T be the rank sum of sample 1 (this choice is arbitrary).

| Sample 1 | Rank | Sample 2 | Rank |
|---|---|---|---|
| 22 | 3 | 18 | 1 |
| 23 | 4 | 27 | 6 |
| 20 | 2 | 26 | 5 |
| Sum of ranks | 9 | Sum of ranks | 12 |

Thus T = 9.

**Wilcoxon Rank Sum Test – Rationale**

If T is sufficiently small, then most of the smaller observations come from population 1, and we reject the null hypothesis. How small is sufficiently small? To answer that, we need the distribution of T.

**The distribution of T under $H_0$ for two samples of size 3**

T is the rank sum of a sample of size 3, drawn from the ranks 1 to 6.

| T | Rank sets of sample 1 giving this T | P(T) |
|---|---|---|
| 6 | {1,2,3} | .05 |
| 7 | {1,2,4} | .05 |
| 8 | {1,2,5}, {1,3,4} | .10 |
| 9 | {1,2,6}, {1,3,5}, {2,3,4} | .15 |
| 10 | {1,3,6}, {1,4,5}, {2,3,5} | .15 |
| 11 | {1,4,6}, {2,3,6}, {2,4,5} | .15 |
| 12 | {1,5,6}, {2,4,6}, {3,4,5} | .15 |
| 13 | {2,5,6}, {3,4,6} | .10 |
| 14 | {3,5,6} | .05 |
| 15 | {4,5,6} | .05 |
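The table can be checked by enumerating all $\binom{6}{3} = 20$ ways of assigning three of the ranks 1 to 6 to sample 1 and counting how often each T occurs. The sketch below is illustrative rather than the textbook's procedure; the helper names `average_ranks`, `rank_sum`, and `null_distribution` are hypothetical. It also computes T for Example 17.1 (tied values, if any, receive the average rank) and the left-tail probability of the observed T.

```python
from fractions import Fraction
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last sorted position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j + 1) / 2  # mean of the ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum(sample1, sample2):
    """T = sum of the ranks of sample 1 when both samples are ranked together."""
    ranks = average_ranks(list(sample1) + list(sample2))
    return sum(ranks[: len(sample1)])


def null_distribution(n1, n2):
    """Exact P(T = t) when every set of n1 ranks out of 1..n1+n2 is equally likely."""
    rank_sets = list(combinations(range(1, n1 + n2 + 1), n1))
    counts = {}
    for ranks in rank_sets:
        counts[sum(ranks)] = counts.get(sum(ranks), 0) + 1
    return {t: Fraction(c, len(rank_sets)) for t, c in sorted(counts.items())}


# Example 17.1 data
sample1 = [22, 23, 20]
sample2 = [18, 27, 26]
T = rank_sum(sample1, sample2)
dist = null_distribution(3, 3)
p_left = sum(p for t, p in dist.items() if t <= T)  # P(T <= observed T)

print(T)        # 9.0
print(dist[6])  # 1/20
print(p_left)   # 7/20
```

The left-tail probability P(T ≤ 9) = 7/20 = .35 is far above .05, which agrees with the decision not to reject $H_0$. Full enumeration grows as $\binom{n_1+n_2}{n_1}$, so it is practical only for small samples.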
If $H_0$ is true (the two populations have the same location), each of the 20 possible sets of ranks for sample 1 is equally likely, with probability 1/20. The probability of each value of T is therefore 1/20 times the number of rank sets that produce it; for example, P(T = 9) = 3/20 = .15.

The significance level is 5%, and under $H_0$, P(T ≤ 6) = 1/20 = .05. Thus, the critical value of T is 6.

**Wilcoxon Rank Sum Test – Example 17.1 (conclusion)**

$H_0$ is rejected if T ≤ 6. Since T = 9, there is insufficient evidence at the 5% significance level to conclude that population 1 is located to the left of population 2.

**Critical values of the Wilcoxon Rank Sum Test**

For two samples of sizes $n_1$ and $n_2$, the table gives critical values $T_L$ and $T_U$ such that $P(T \le T_L) = P(T \ge T_U) \approx \alpha$. One table is for α = .025 (one-tail test) or α = .05 (two-tail test). For example, when $n_1 = n_2 = 4$, $T_L = 11$ and $T_U = 25$: a one-tail test at α = .025 uses one of these values, and a two-tail test at α = .05 uses both, placing .025 in each tail. A similar table exists for α = .05 (one-tail test) and α = .10 (two-tail test).

**Wilcoxon rank sum test for large samples**

When both sample sizes are greater than 10, the test statistic T is approximately normally distributed with

$$E(T) = \frac{n_1(n_1+n_2+1)}{2}, \qquad \sigma_T = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}},$$

so the standardized test statistic is

$$Z = \frac{T - E(T)}{\sigma_T}.$$

**Wilcoxon rank sum test for large samples – Example 17.2 (ordinal data)**

A pharmaceutical company is planning to introduce a new painkiller. To determine the drug's effectiveness, 30 people were randomly selected:

- 15 were given the new drug (sample 1);
- 15 were given aspirin (sample 2).

Each participant was asked to indicate which one of five statements best represented the effectiveness of the drug they took.

[Table: summary of the experiment results.]

**Solution.** The objective is to compare two populations of ordinal data.
The two samples are independent, so the Wilcoxon rank sum test is the appropriate technique to apply.

Note: the possible scores are 1, 2, 3, 4, and 5; a high score indicates high effectiveness.

The hypotheses are:

- $H_0$: The locations of populations 1 and 2 are the same.
- $H_1$: The location of population 1 is to the right of the location of population 2.

**Solving by hand.** To reject the null hypothesis, we need to show that z is large enough. First we rank the observations; then we run a z-test with rejection region $z > z_\alpha$.

**Ranking the raw data.** [Table: the effectiveness scores reported by the participants who received the new painkiller and by those who received aspirin, with their ranks.] Three observations have an effectiveness score of 1. Their original ranks are 1, 2, and 3, so the tie is broken by giving each of them the average rank of 2. The sums of ranks are $T_1 = 276.5$ and $T_2 = 188.5$.

To standardize the test statistic $T = T_1$ we need

$$E(T) = \frac{n_1(n_1+n_2+1)}{2} = \frac{(15)(31)}{2} = 232.5, \qquad \sigma_T = \sqrt{\frac{(15)(15)(31)}{12}} = \sqrt{581.25},$$

so

$$z = \frac{276.5 - 232.5}{\sqrt{581.25}} = 1.83.$$

For a 5% significance level, $z_{.05} = 1.645$. Since z = 1.83 > 1.645, there is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At the 5% significance level, the new drug is perceived as more effective than aspirin.

**Excel solution** (Xm17-02)

**Wilcoxon rank sum test for non-normal interval data – Example (Retaining workers)**

The human resource manager of a large company wanted to compare how long business and non-business graduates worked for the company before quitting.

- Two samples of 25 business graduates and 20 non-business graduates were randomly selected.
- The data representing their time with the company were recorded.

Can the personnel manager conclude at the 5% significance level that a difference in duration of employment exists between business and non-business graduates?

**Solution.**

- The problem objective is to compare two populations of interval data.
- The samples are independent.
- The non-normality of the two populations is apparent from the sample histograms.

[Figure: sample histograms of employment duration for business graduates and non-business graduates.]

The Wilcoxon rank sum test is therefore the correct procedure to run:

- $H_0$: The two population locations are the same.
- $H_1$: The location of population 1 (business graduates) is different from the location of population 2 (non-business graduates).

**Solving by hand.** The rejection region is $|z| > z_{.025} = 1.96$. After the ranking process is completed, $T = T_{\text{business graduates}} = 463$, and

$$E(T) = \frac{n_1(n_1+n_2+1)}{2} = \frac{(25)(46)}{2} = 575, \qquad \sigma_T = \sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}} = \sqrt{\frac{(25)(20)(46)}{12}} = 43.8,$$

so $z = (463 - 575)/43.8 = -2.56$. Since |z| > 1.96, we reject the null hypothesis.

**Excel solution** (Workers.xls). There is strong evidence to infer that the duration of employment is different for business and non-business graduates.

**Required conditions for nonparametric tests**

A rejection of the null hypothesis when performing a nonparametric test can occur because the populations differ in:

- location,
- spread (variance), or
- shape (distribution).

Since we are interested in location, we require that the two distributions be identical except for location.

**17.3 Sign Test and Wilcoxon Signed Rank Sum Test**

Two techniques for matched pairs experiments are introduced. They apply when:

- the objective is to compare two populations,
- the data are either ordinal or interval (but not normal), and
- the samples are matched pairs.

**The Sign Test**

This test is employed when:

- the problem objective is to compare two populations,
- the data are ordinal, and
- the experimental design is matched pairs.

The hypotheses are:

- $H_0$: The two population locations are the same.
- $H_1$: The two population locations differ, or population 1 is located to the right (left) of population 2.

**The Sign Test – Statistic and Sampling Distribution**

A matched pairs experiment calls for a test of the matched pair differences:

- Record the sign of each matched pair difference.
- The number of positive (or negative) differences is the test statistic.

**The Sign Test – Rationale**

The number of positive (or negative) differences is a binomial random variable with:

- n = the number of non-zero differences,
- p = the probability that a difference is positive (negative).

If the two populations have the same location ($H_0$ is true), we expect the number of positive differences to equal the number of negative differences. Thus, under $H_0$, p = .5, and the hypotheses "the two population locations are the same" versus "the two population locations are different" become:

- $H_0$: p = .5
- $H_1$: p ≠ .5

The binomial variable can be approximated by a normal variable if np and n(1 − p) are both greater than 5. The z-statistic then becomes

$$Z = \frac{x - np}{\sqrt{np(1-p)}} = \frac{x - .5n}{.5\sqrt{n}},$$

where x is the number of positive differences.

**The Sign Test – Example 17.3** (Xm17-03)

In an experiment to determine which car is perceived to have the more comfortable ride, 25 people took two rides:

- one ride in a European model, and
- one ride in a North American car.
- Each person rated both cars on a scale of 1 (ride is very uncomfortable) to 5 (ride is very comfortable).

Do these data allow us to conclude at the 5% significance level that the European car is perceived to be more comfortable?

**Solution.** We compare two populations, the data are ordinal, and the experiment is matched pairs, so the sign test applies. The hypotheses are:

- $H_0$: The two population locations are the same.
- $H_1$: The European car population is located to the right of the North American car population.

**The test.** There were 18 positive differences, 5 negative differences, and 2 zero differences. Zero differences are discarded, so x = 18 and n = 23 (not 25). Then

$$z = \frac{x - .5n}{.5\sqrt{n}} = \frac{18 - .5(23)}{.5\sqrt{23}} = 2.71.$$

The rejection region is $z > z_\alpha$; for α = .05 it is z > 1.645. The p-value is P(Z > 2.71) = .0034.

**Excel solution** (Xm17-03). Using the computer: Tools > Data Analysis Plus > Sign Test.

**Conclusion.** Since the p-value is less than α, we reject the null hypothesis. At the 5% significance level there is sufficient evidence to infer that the European car is perceived as more comfortable than the North American car.

**Checking the required conditions.** The sample histograms (Xm17-03) indicate that the populations are similar in shape and spread.

**Wilcoxon Signed Rank Sum Test**

This test is used when:

- the problem objective is to compare two populations,
- the data are interval but not normal, and
- the samples are matched pairs.

The test statistic and sampling distribution:

- Compute the matched pair differences, discard any zero differences, and rank the absolute values of the remaining differences (tied values receive the average rank).
- $T^+$ is the rank sum of the positive differences and $T^-$ is the rank sum of the negative differences; the test statistic is $T = T^+$.
- When n ≤ 30, reject $H_0$ if $T > T_U$ or $T < T_L$, where $T_L$ and $T_U$ are tabulated values that depend on n.
- When n > 30, T is approximately normally distributed with $E(T) = n(n+1)/4$ and $\sigma_T = \sqrt{n(n+1)(2n+1)/24}$; use a z-test.

**Wilcoxon Signed Rank Sum Test – Example 17.4**

Does a "flextime" work schedule help reduce workers' travel time to work?
A random sample of 32 workers was selected, and the workers recorded their travel times before and after the program was implemented. The hypotheses are:

- $H_0$: The two population locations are the same.
- $H_1$: The two population locations are different.

The rejection region is $|z| > z_{\alpha/2}$.

The data were sorted by the absolute values of the differences, and ties were broken by assigning the average rank to the tied observations; for example, the observations tied for ranks 1 through 8 each receive the average rank (1 + 8)/2 = 4.5.

[Table: the matched pair differences sorted by absolute value, with their ranks.]

T is the rank sum of the positive differences, $T = T^+ = 367.5$, with

$$E(T) = \frac{n(n+1)}{4} = \frac{32(33)}{4} = 264, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = 53.48.$$

The test statistic is

$$z = \frac{T - E(T)}{\sigma_T} = \frac{367.5 - 264}{53.48} = 1.94.$$

**Excel solution** (Xm17-04)

**Conclusion.** The rejection region for α = .05 is $|z| > z_{.025} = 1.96$. Since |1.94| < 1.96, there is insufficient evidence at the 5% significance level to infer that the flextime program was effective.
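The large-sample z statistics used in Examples 17.2 to 17.4 and in the retaining-workers example can be reproduced from the summary values quoted in the text. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the inputs are the reported rank sums and counts, not the raw data files.

```python
import math


def rank_sum_z(T, n1, n2):
    """Wilcoxon rank sum test: standardize T, the rank sum of sample 1."""
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (T - mean) / sd


def sign_test_z(x, n):
    """Sign test: x positive differences among n non-zero differences, p = .5 under H0."""
    return (x - 0.5 * n) / (0.5 * math.sqrt(n))


def signed_rank_z(T, n):
    """Wilcoxon signed rank sum test: standardize T = T+ for n non-zero differences."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (T - mean) / sd


def upper_tail(z):
    """P(Z > z) for a standard normal Z, using P(Z > z) = erfc(z / sqrt(2)) / 2."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(round(rank_sum_z(276.5, 15, 15), 2))        # Example 17.2: 1.83
print(round(rank_sum_z(463, 25, 20), 2))          # Retaining workers: -2.56
print(round(sign_test_z(18, 23), 2))              # Example 17.3: 2.71
print(round(upper_tail(sign_test_z(18, 23)), 4))  # Example 17.3 p-value: 0.0034
print(round(signed_rank_z(367.5, 32), 2))         # Example 17.4: 1.94
```

Computing $\sigma_T$ without intermediate rounding matters in Example 17.2: dividing by a rounded 24.11 would give 1.82 instead of 1.83.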