
# Chapter 21 - PowerPoint PPT Presentation



## PowerPoint Slideshow about 'Chapter 21 ' - rhona

Presentation Transcript

### Chapter 21

Nonparametric Statistics

• This chapter deals with statistical techniques that deal with ordinal data.

• Recall: when the data are ordinal, the mean is not an appropriate measure of central location. Instead, we will test characteristics of populations without referring to specific parameters, hence the term nonparametric.

• Rather than testing to determine whether the population means differ, we will test to determine whether the population locations differ…

• These two populations have the same location…

• The location of pop’n 1 is to the left of the location of pop’n 2…

• The location of pop’n 1 is to the right of the location of pop’n 2…

• When the problem objective is to compare two populations the null hypothesis will state:

• H0: The two population locations are the same.

• The alternative hypothesis can take on any one of the following three forms:

• (1) H1: The location of population 1 is different from the location of population 2

• (2) H1: The location of population 1 is to the right of the location of population 2

• (3) H1: The location of population 1 is to the left of the location of population 2

• (1) H1: The location of population 1 is different from the location of population 2

• Used when we want to know whether there is sufficient evidence to infer that there is a difference between the two populations.

• (2) H1: The location of population 1 is to the right of the location of population 2

• Used when we want to know whether we can conclude that the random variable in population 1 is larger in general than the random variable in population 2,

and, not surprisingly…

• (3) H1: The location of population 1 is to the left of the location of population 2

• Used when we want to know whether we can conclude that the random variable in population 1 is smaller in general than the random variable in population 2.

NOTE: all of our hypotheses are phrased in terms of “1 then 2”.

This is for consistency. Rather than state:

H1: The location of population 2 is to the left of the location of population 1, we would want to phrase this as:

H1: The location of population 1 is to the right of the location of population 2

• We’ll use the Wilcoxon Rank Sum Test for problems where:

• — We’re asked to compare two populations,

• — The data are ordinal or interval (where the normality requirement is unsatisfied), and

• — The samples are independent.

• From these samples:

• Sample 1: 22, 23, 20

• Sample 2: 18, 27, 26

• Can we conclude (at the 5% significance level, of course) that the location of population 1 is to the left of (i.e. “smaller” than) the location of population 2?

• That is, we want to test:

• H0: The two population locations are the same.

• H1: The location of population 1 is to the left of the location of population 2.

• We can test this, we just need a test statistic…

• Step #1… rank the observations from smallest to largest, assign a rank number, and add up the “rank sum”…

*in the case of “ties” we average the ranks of the tied observations.

We arbitrarily select T1 as the test statistic and label it “T”
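As a minimal sketch of Step #1, the Python below pools the two samples from this example, ranks them from smallest to largest (averaging tied ranks), and adds up each sample's rank sum. The helper name `average_ranks` is our own and does not come from the slides.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # average of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


sample1 = [22, 23, 20]
sample2 = [18, 27, 26]

ranks = average_ranks(sample1 + sample2)
T1 = sum(ranks[:len(sample1)])   # rank sum of sample 1
T2 = sum(ranks[len(sample1):])   # rank sum of sample 2
print(T1, T2)  # 9.0 12.0
```

The two rank sums always add up to n(n+1)/2 (here 21), which is a quick check on the ranking.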

• A small value of T indicates most of the smaller observations are in sample 1 which was drawn from population 1 — but how small is “small”? Is 9 “small” enough?

• We have our test statistic, T=9. We need to compare it to some critical value of “T” to know if we’re in the rejection region for H0 (or not).

• So, what then, does the sampling distribution of “ranks” look like?

• We can build up the sampling distribution of the test statistic in much the same way we built histograms for the outcomes of rolls of 2 and 3 dice…

• (1) Enumerate all possible combinations of ranks

• (2) Calculate rank sums for the combinations

• (3) The probability of any rank sum is the number of occurrences divided by the total number of combinations…

• (1) Enumerate, (2) calculate, and (3) find probabilities… There are 20 possible combinations in total, and only 1 of them produces a rank sum as small as 6:

P(T ≤ 6) = 1/20 = .05

Thus our critical value of T is 6.

Since T = 9 > TCritical = 6, we cannot reject H0…
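The enumeration can be sketched in a few lines of Python. This is an illustration that assumes two samples of size 3 with no ties, as in the example; the variable names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

n1, n2 = 3, 3
all_ranks = range(1, n1 + n2 + 1)

# (1) Enumerate every way sample 1 could occupy n1 of the ranks 1..6.
combos = list(combinations(all_ranks, n1))
total = len(combos)

# (2) Calculate the rank sum T for each combination and count occurrences.
counts = Counter(sum(c) for c in combos)

# (3) Probability of each rank sum = occurrences / total combinations.
cumulative = Fraction(0)
for t in sorted(counts):
    cumulative += Fraction(counts[t], total)
    print(f"T={t:2d}  combinations={counts[t]}  P(T<={t})={float(cumulative):.2f}")

p_le_6 = Fraction(sum(c for t, c in counts.items() if t <= 6), total)
p_le_9 = Fraction(sum(c for t, c in counts.items() if t <= 9), total)
```

Under H0 every combination is equally likely, so `p_le_6` is 1/20 while `p_le_9` (the probability of a rank sum at least as small as the observed T = 9) is 7/20, far above 5%.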

INTERPRET

• We cannot reject the null hypothesis, that is, there is not enough evidence to conclude that the location of population 1 is located to the left of population 2 (at 5% significance).

• For sample sizes smaller than 10 observations (in each sample), refer to the Critical Values in Table 8 (Appendix B)

• For sample sizes larger than 10, the test statistic is approximately normally distributed with:

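• Mean: $E(T) = \dfrac{n_1(n_1 + n_2 + 1)}{2}$

• Standard deviation: $\sigma_T = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$

• Hence: $z = \dfrac{T - E(T)}{\sigma_T}$

where $n_i$ = size of sample i, i = 1, 2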

• A drug company is trialing a new painkiller. 30 people were selected at random, half were given the new drug, half given aspirin, and all were told to rate the effectiveness on a five point scale (hence ordinal data):

• 5 = The drug was extremely effective.

• 4 = The drug was quite effective.

• 3 = The drug was somewhat effective.

• 2 = The drug was slightly effective.

• 1 = The drug was not at all effective.

IDENTIFY

• The data were recorded. Can we conclude (at 5% significance) that the new painkiller is perceived to be more effective?

• It’s important to note here that “5” is a “good” score, so if the drug is effective, we’d likely see its location “greater than” the location of aspirin users, hence:

• H1: The location of population 1 is to the right of the location of population 2, and so:

• H0: The two population locations are the same.

IDENTIFY

• The data looks like:

These three ones would occupy ranks 1, 2, & 3 — we average them (2) and each is assigned that rank…

These five twos would occupy ranks 4,5,6,7, & 8 — again, average them to (4+5+6+7+8)/5 = 6

and so on and so forth…
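The tie-averaging rule can be checked with a short Python sketch. Only the eight lowest ratings described above (three ones and five twos) are used, since the full data set is not reproduced here; the helper `average_ranks` is our own illustrative function.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2  # average of ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


# The three ones and five twos from the pooled ratings (remaining data omitted).
lowest_ratings = [1, 1, 1, 2, 2, 2, 2, 2]
ranks = average_ranks(lowest_ratings)
print(ranks)  # [2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0, 6.0]
```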

COMPUTE

• (though not shown here) The rank sum for the new painkiller is T1=276.5, and the rank sum for aspirin: T2=188.5

• Set T= T1=276.5, and begin calculating…

COMPUTE

• The p-value of the test is:

• p-value = P(Z > 1.83) = .5 - .4664 = .0336

• (or Z=1.83 > ZCritical=1.645), hence:

• “There is sufficient evidence to infer that the new painkiller is perceived to be more effective than aspirin”
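The calculation behind these numbers can be sketched in Python, assuming the usual large-sample approximation for the rank sum, with mean n1(n1+n2+1)/2 and standard deviation sqrt(n1·n2·(n1+n2+1)/12). The function names are our own, and the normal tail probability is computed with `math.erfc` rather than read from a table.

```python
import math


def rank_sum_z(T, n1, n2):
    """Standardize a Wilcoxon rank sum T using the normal approximation."""
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (T - mean) / sd, mean, sd


def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# New painkiller: T = T1 = 276.5, with 15 people in each sample.
z, mean, sd = rank_sum_z(276.5, 15, 15)
p_value = upper_tail(z)
print(f"E(T)={mean}, sd={sd:.2f}, z={z:.3f}, p-value={p_value:.4f}")
```

The unrounded z is about 1.825, which gives a p-value near .0340; rounding z to 1.83 before the table lookup gives the .0336 quoted above. Either way the p-value is below .05.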

COMPUTE

• We can use the Wilcoxon Rank Sum Test in the Data Analysis Plus set of tools to come to the same conclusion…


• The Wilcoxon rank sum test actually tests to determine whether the population distributions are identical. This means that it tests not only for identical locations, but for identical spreads (variances) and shapes (distributions) as well.

• The rejection of the null hypothesis may be due instead to a difference in distribution shapes and/or spreads.

• To avoid this problem, we will require that the two probability distributions be identical except with respect to location.

• Factors that identify the Wilcoxon Rank Sum…

• We will now look at two nonparametric techniques (Sign Test and Wilcoxon Signed Rank Sum Test) that test hypotheses in problems with the following characteristics:

• — We want to compare two populations,

• — The data are either ordinal or interval (nonnormal),

• — and the samples are matched pairs.

• As before, we’ll compute matched pair differences and work from there…

• We can use the Sign Test when we’re dealing with two populations of ordinal data in a matched pairs experiment.

• For each matched pair, take the differences and count up the number of positive differences and negative differences.

• If population locations are the same (say), we’d expect the number of positives and negatives to net out to zero. If we have more positives than negatives (or vice versa) what can we learn? Again, how many is enough to make a difference?

• We can think of the sign test in terms of a binomial experiment: getting a positive sign is like flipping heads on a coin. We use this notion along with previously developed statistics to come up with our standardized test statistic (assuming the null hypothesis is true):

• $z = \dfrac{x - .5n}{.5\sqrt{n}}$, where x is the number of positive differences and n is the number of nonzero differences (this normal approximation requires n ≥ 10)

• Our null hypothesis:

• H0: the two population locations are the same

• is equivalent to:

• H0: p = .5 (i.e. equal proportions of +’s & –’s)

• Since our null hypothesis is:

• H0: the two population locations are the same (i.e. p = .5)

• Our research hypothesis must be:

• H1: the two population locations are different

• which is the same as:

• H1: p ≠ .5

• 25 people were asked to ride in a European car (and rate the ride) then ride in a North American car (and again, rate the ride). The ratings were ordinal, from 1 – very uncomfortable to 5 – very comfortable, and it’s a matched pairs experiment since the same rider tried both cars. [Xm21-03.xls]

• Can we conclude (at 5% significance) that the European car is perceived to be more comfortable than the North American car?

COMPUTE

• The data was analyzed…

We had 5 negative responses.

We had 25 pairs of data initially, two pairs gave identical ratings (i.e. delta = zero) so these data points are dropped, hence n=23

We had 18 positive responses, thus x=18

INTERPRET

• The p-value is P(Z > 2.71) = .0034, hence we reject H0 in favor of H1, and conclude:

• H1: the location of population 1 (European car) is to the right of the location of population 2 (North American car)

• Or, in the context of this problem…

• “There is relatively strong evidence to indicate that people perceive the European car to provide a more comfortable ride than the North American car.”
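The sign test arithmetic for this example can be sketched in Python. The standardized statistic is z = (x − .5n)/(.5√n) with x = 18 positive differences out of n = 23 nonzero differences. The exact binomial tail at the end is our own addition for comparison and is not part of the slides.

```python
import math


def sign_test_z(x, n):
    """Standardized sign test statistic under H0: p = .5."""
    return (x - 0.5 * n) / (0.5 * math.sqrt(n))


def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


x, n = 18, 23          # positive differences, nonzero differences
z = sign_test_z(x, n)
p_value = upper_tail(z)
print(f"z={z:.2f}, p-value={p_value:.4f}")

# Exact one-tail probability P(X >= 18) for X ~ Binomial(23, .5), for comparison.
exact_p = sum(math.comb(n, k) for k in range(x, n + 1)) / 2 ** n
print(f"exact binomial p-value={exact_p:.4f}")
```

Both the normal approximation and the exact binomial tail are well below .05, so the conclusion does not depend on which one is used.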

COMPUTE

• Again, we can leverage Excel to reduce the amount of work that we have to do to perform the Sign Test*


*Data Analysis Plus

• The sign test requires that:

• The populations are similar in shape and spread, and

• The sample size is at least 10 (here, n = 23).

• We’ll use the Wilcoxon Signed Rank Sum Test when we want to compare two populations of interval (but not normally distributed) data in a matched pairs type experiment.

• (1) Compute paired differences, discard zeros.

• (2) Rank absolute values of differences smallest (1) to largest (n), averaging ranks of tied observations.

• (3) Sum the ranks of positive differences (T+) and of negative differences (T–).

• (4) Use T=T+ as our test statistic…

• Now we have a test statistic, but what to compare it against?

• For small sample sizes, i.e. n ≤ 30, critical values of T can be read from Table 9 in Appendix B.

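• For large sample sizes, i.e. n > 30, T is approximately normally distributed, so we have:

• Mean: $E(T) = \dfrac{n(n+1)}{4}$

• Standard deviation: $\sigma_T = \sqrt{\dfrac{n(n+1)(2n+1)}{24}}$

• Hence: $z = \dfrac{T - E(T)}{\sigma_T}$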
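A minimal Python sketch of the signed rank sum procedure follows. The paired differences are hypothetical values made up purely for illustration, and the function names are our own. The large-sample standardization assumes the usual mean n(n+1)/4 and standard deviation sqrt(n(n+1)(2n+1)/24).

```python
import math


def average_ranks(values):
    """Rank values from smallest (1) to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(differences):
    """Return (T+, T-, n) after discarding zero differences."""
    nonzero = [d for d in differences if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return t_plus, t_minus, len(nonzero)


def signed_rank_z(T, n):
    """Large-sample (n > 30) standardization of T = T+."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (T - mean) / sd


# Hypothetical paired differences (NOT data from the slides).
diffs = [3, -1, 0, 4, -2, 5, 2]
t_plus, t_minus, n = signed_rank_sums(diffs)
print(t_plus, t_minus, n)  # 17.5 3.5 6
```

With only 6 nonzero differences this toy sample would be compared against the table of critical values; `signed_rank_z` is meant for samples with n > 30.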

IDENTIFY

• Do travel times to the office vary between an 8:00 am start and a “flextime” start? 32 workers recorded their travel times

• We want to research this hypothesis:

• H1: the two population locations are different

• Thus we require:

• H0: the two population locations are the same.

IDENTIFY

• The data are interval (i.e. times) and were produced by a matched pairs experiment (same drivers, same day of the week – Wednesday). Why aren’t we using a t-test for $\mu_D$?

• A histogram of the paired differences reveals a non-normal distribution, hence we must use a non-parametric technique.

COMPUTE

[Spreadsheet: the original data, sorted ascending by |difference|, with the ranks of the positive and negative differences and their rank sums]

COMPUTE

• We compute our test statistic as follows…

• Our rejection region is…

INTERPRET

• The Wilcoxon Signed Rank Sum Test tool in Data Analysis Plus yields the same result: there is not enough evidence to infer that flextime commute times differ from 8:00 am start commute times.


• Factors that Identify the Sign Test…

• Factors that Identify the Wilcoxon Signed Rank Sum Test…

• So far we’ve been comparing locations of two populations; now we’ll look at comparing two or more populations.

• The Kruskal-Wallis test is applied to problems where we want to compare two or more populations of ordinal or interval (but nonnormal) data from independent samples.

• Our hypotheses will be:

• H0: The locations of all k populations are the same.

• H1: At least two population locations differ.

• In order to calculate the Kruskal-Wallis test statistic, we need to:

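• (1) Rank all the observations from smallest (1) to largest (n), and average the ranks in the case of ties.

• (2) Calculate rank sums for each sample: T1, T2, …, Tk

• (3) Lastly, calculate the test statistic (denoted H):

• $H = \left[\dfrac{12}{n(n+1)} \sum_{j=1}^{k} \dfrac{T_j^2}{n_j}\right] - 3(n+1)$, where $n_j$ is the size of sample j and $n = n_1 + n_2 + \dots + n_k$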

• For sample sizes greater than or equal to 5, the test statistic H is approximately Chi-squared distributed with k–1 degrees of freedom.

• Our rejection region is: $H > \chi^2_{\alpha,\,k-1}$

• And our p-value is: $P(\chi^2 > H)$
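The Kruskal-Wallis steps can be sketched in Python as below. The ratings are hypothetical values invented for illustration, the function names are our own, and the statistic is the uncorrected H = [12/(n(n+1)) Σ Tj²/nj] − 3(n+1), with no adjustment for ties. The closed-form p-value used at the end holds only for 2 degrees of freedom (k = 3 samples).

```python
import math


def average_ranks(values):
    """Rank values from smallest (1) to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(samples):
    """Return (H, rank sums) for a list of independent samples."""
    pooled = [v for s in samples for v in s]   # "stack" the data
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for s in samples:                          # parse ranks back out by sample
        rank_sums.append(sum(ranks[start:start + len(s)]))
        start += len(s)
    h = 12 / (n * (n + 1)) * sum(
        t * t / len(s) for t, s in zip(rank_sums, samples)
    ) - 3 * (n + 1)
    return h, rank_sums


# Hypothetical ratings on a 1-4 scale for three groups (NOT data from the slides).
group1 = [4, 3, 4, 2, 3]
group2 = [3, 2, 3, 1, 2]
group3 = [2, 1, 2, 1, 3]

h, rank_sums = kruskal_wallis_h([group1, group2, group3])
p_value = math.exp(-h / 2)   # exact chi-squared upper tail, valid for 2 d.f. only
print(rank_sums, round(h, 2), round(p_value, 4))
```

For these made-up ratings H is about 4.74, below the 5% chi-squared critical value for 2 degrees of freedom, so H0 would not be rejected.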

IDENTIFY

• Can we compare customer ratings (4=good … 1=poor) for “speed of service” across three shifts in a fast food restaurant? Our hypotheses will be:

• H0: The locations of all 3 populations are the same.

• (that is, there is no difference in service between shifts), and

• H1: At least two population locations differ.

• Customer ratings for service were recorded…

COMPUTE

• One way to solve the problem is to take the original data,

• “stack” it, and then

• sort by customer response

• & rank bottom to top…


COMPUTE

• Once it’s in “stacked” format, put in straight rankings from 1 to 30, average the rankings for the same response, then parse them out by shift to come up with rank sum totals…

COMPUTE

• Our critical value of Chi-squared (5% significance and k–1=2 degrees of freedom) is 5.99147, hence there is not enough evidence to reject H0.

COMPUTE

• From Data Analysis Plus, a similar finding…

• “There is not enough evidence to infer that a difference in speed of service exists between the three shifts, i.e. all three of the shifts are equally rated, and any action to improve service should be applied to all three shifts”


• Factors that Identify the Kruskal-Wallis Test…

• The Friedman Test is a technique used to compare two or more populations of ordinal or interval (nonnormal) data that are generated from a matched pairs experiment.

• The hypotheses are the same as before:

• H0: The locations of all k populations are the same.

• H1: At least two population locations differ.

• Since this is a matched pairs experiment, we first rank each observation within each of the b blocks from smallest to largest (i.e. from 1 to k), averaging any ties. We then compute the rank sums: T1, T2, …, Tk. Then we calculate our test statistic:

• $F_r = \left[\dfrac{12}{b\,k(k+1)} \sum_{j=1}^{k} T_j^2\right] - 3b(k+1)$

• This test statistic is approximately Chi-squared distributed with k–1 degrees of freedom (provided either k or b ≥ 5). Our rejection region and p-value are:

• $F_r > \chi^2_{\alpha,\,k-1}$ and p-value = $P(\chi^2 > F_r)$
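A minimal Python sketch of the Friedman calculation follows. The scores are hypothetical (k = 4 treatments, b = 5 blocks) and the function names are our own. The statistic is Fr = [12/(b·k(k+1)) Σ Tj²] − 3b(k+1), with ranks assigned within each block.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """blocks: list of b rows, each holding k scores (one per treatment)."""
    b = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        row_ranks = average_ranks(row)   # rank within the block, 1..k
        # Checksum: ranks in every block must add up to 1 + 2 + ... + k.
        assert sum(row_ranks) == k * (k + 1) / 2
        for j, r in enumerate(row_ranks):
            rank_sums[j] += r
    f_r = 12 / (b * k * (k + 1)) * sum(t * t for t in rank_sums) - 3 * b * (k + 1)
    return f_r, rank_sums


# Hypothetical scores: 5 blocks (rows) by 4 treatments (columns). NOT slide data.
scores = [
    [2, 1, 2, 2],
    [4, 2, 3, 2],
    [2, 2, 2, 3],
    [3, 1, 3, 2],
    [3, 2, 3, 3],
]
f_r, rank_sums = friedman_statistic(scores)
print(rank_sums, round(f_r, 2))
```

For these made-up scores Fr is about 6.0, which would be compared against the chi-squared critical value with k − 1 = 3 degrees of freedom.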

IDENTIFY

• Four managers evaluate and score job applicants on a scale from 1 (good) to 5 (not so good). There have been complaints that the process isn’t fair. Is it the case that all managers score the candidates equally or not? That is:

• H0: The locations of all 4 populations are the same.

• (i.e. all managers score like candidates alike)

• H1: At least two population locations differ.

• (i.e. there is some disagreement between managers on scores)

COMPUTE

• The data looks like this:

• Applicant #1 for example, received a top score from manager 2 and next-to-top scores from the other three.

• Applicant #7 received a top score from manager 2 as well, but the other three scored this candidate very low…

There are k=4 populations (managers) and b=8 blocks (applicants) in this set-up.

COMPUTE

• “rank each observation within block from smallest to largest (i.e. from 1 to k), averaging any ties”… For example, consider the case of candidate #2:

checksum = 1 + 2 + 3 + … + k

COMPUTE

• Compute the rank sums: T1, T2, …, Tk and our test statistic…

INTERPRET

• The value of our Friedman test statistic is 10.61 compared to a critical value of Chi-squared (at 5% significance and 3 d.f.) which is: 7.81473

• Thus, there is sufficient evidence to reject H0 in favor of H1

It appears that the managers’ evaluations of applicants do indeed differ

• Factors that Identify the Friedman Test…

• Previously we looked at the t-test of the coefficient of correlation ($\rho$). In many situations, one or both variables may be ordinal; or if both variables are interval, the normality requirement may not be satisfied.

• In such cases, we measure and test to determine whether a relationship exists by employing a nonparametric technique, the Spearman rank correlation coefficient.

• We are interested whether a relationship exists between the two variables, hence the hypotheses to be tested are:

• H0: $\rho_s$ = 0 (no linear pattern, hence no correlation)

• H1: $\rho_s$ ≠ 0 (correlation; we can also do one-tail tests)

• Since $\rho_s$ is a population parameter, our sample statistic is rs,

• and is calculated as:

• $r_s = \dfrac{s_{ab}}{s_a s_b}$

• (where a and b are the ranks of x and y respectively)

• [$\rho_s$ is referred to as the Spearman correlation coefficient]

• For values of n between 5 and 30, critical values of rs are available in Table 10 of Appendix B.

• When n is greater than 30, rs is approximately normally distributed with

• — a mean of zero, and

• — a standard deviation of $\sigma_{r_s} = \dfrac{1}{\sqrt{n-1}}$

• Hence our standardized test statistic is: $z = r_s\sqrt{n-1}$
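As a sketch of how rs is computed, the Python below ranks each variable separately (averaging ties), then divides the sample covariance of the ranks by the product of their sample standard deviations. The data are hypothetical and the function names are our own. The large-sample statistic assumes the usual standard deviation 1/√(n−1), i.e. z = rs·√(n−1).

```python
import math
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from smallest (1) to largest, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """r_s = s_ab / (s_a * s_b), where a and b are the ranks of x and y."""
    a, b = average_ranks(x), average_ranks(y)
    a_bar, b_bar = mean(a), mean(b)
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)
    return s_ab / (stdev(a) * stdev(b))


def spearman_z(r_s, n):
    """Large-sample (n > 30) standardized test statistic."""
    return r_s * math.sqrt(n - 1)


# Hypothetical scores and ratings (NOT data from the slides).
x = [59, 47, 58, 66, 77]
y = [3, 2, 4, 3, 5]
r_s = spearman_rs(x, y)
print(round(r_s, 4))
```

With a sample this small, rs would be compared against tabulated critical values rather than standardized with `spearman_z`.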

• Is there a relationship between aptitude test scores before being hired and performance ratings 3 months into the job?

• Aptitude test scores range: [0…100] (i.e. interval data)

• Performance ratings scale: 1 – below average, …, 5 – above average (i.e. ordinal data)

• The problem is we’re trying to correlate interval & ordinal data. We’ll treat the aptitude scores as ordinal, and apply the Spearman rank correlation coefficient…

IDENTIFY

• We specify our hypotheses as:

• H0: $\rho_s$ = 0

• H1: $\rho_s$ ≠ 0

• At a 5% significance level and n=20 observations, the rejection region (from Table 10) is:

• rs < –.450 -or- rs > .450

COMPUTE

• As before, we rank each of the variables separately and average any ties…

• Now we use the ranks columns, compute their standard deviations (sa, sb) and covariance (sab)…

COMPUTE

• Thus…

• Compare this to our critical value of rs=.450, and…

INTERPRET

• “There is not enough evidence to believe that the aptitude test scores and performance ratings are related.”


• Factors that Identify the Spearman Rank Correlation Coefficient Test…