Lecture 9
• Today:
  • Log transformation: interpretation for population inference (3.5)
  • Rank sum test (4.2)
  • Wilcoxon signed-rank test (4.4.2)
• Thursday:
  • Welch’s t-test (4.3.2)
  • Practical vs. statistical significance (4.5.1)
  • Presentation of statistical findings (4.5.2)
  • Begin review if time
• Next Tuesday (2/17): Review
• Next Thursday (2/19): Midterm
When to use log transformation
What indicates that a log transformation might give the distributions the same spread and a symmetric shape?
• The distributions are skewed.
• Spread is greater in the distribution with the larger center.
• The data values differ by orders of magnitude; as a rough guide, the ratio of the largest to the smallest value is >10 (or perhaps >4).
• A multiplicative statement is desirable.
Example
• Study of cellular immunity in infectious mononucleosis. Two groups of healthy controls were considered: one group of 16 Epstein-Barr virus seropositive donors and another group of 10 Epstein-Barr virus seronegative donors.
• The file cellimmunity.JMP contains stimulation indices with the P3HR-1 virus as antigen.
• The interest is in testing whether there is any difference in stimulation indices between seropositive and seronegative donors.
Log Transformation for Population Inference
• Consider comparing the means of two populations. If the populations appear skewed, with the population that has the larger center also having the larger spread, using the t-tools to analyze the log-transformed data might be more appropriate.
• Using the t-tools on the log-transformed data is appropriate (i.e., produces approximately valid results) if log(Y1) and log(Y2) are approximately normally distributed.
Inference for Population Medians
• If the distributions of Z1=log(Y1) and Z2=log(Y2) appear approximately normal with equal SD, then we can make inferences about the ratio of the population medians of Y1 and Y2 as follows:
• To test whether the population medians are the same, test the null hypothesis that the means of Z1 and Z2 are the same.
• An estimate of the ratio of the population 2 median to the population 1 median is exp(Z̄2 − Z̄1), where Z̄1 and Z̄2 are the sample averages of the logged data.
• To form a confidence interval for the ratio of population medians, first form a confidence interval (L, U) for the difference between the mean of Z2 and the mean of Z1. A confidence interval for the ratio of the population 2 median to the population 1 median is then (exp(L), exp(U)).
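The reason the back-transformed quantities describe medians rather than means can be sketched as a short derivation. It assumes natural logs and that each Z_j = log(Y_j) has a symmetric distribution (as it does under normality), so that its mean equals its median; the Mean/Median operator notation is introduced here only for illustration.

```latex
\begin{align*}
\operatorname{Mean}(Z_j) &= \operatorname{Median}(Z_j)
  = \log\bigl(\operatorname{Median}(Y_j)\bigr), \qquad j = 1, 2
  && \text{(symmetry; log is increasing)} \\
\operatorname{Mean}(Z_2) - \operatorname{Mean}(Z_1)
  &= \log\frac{\operatorname{Median}(Y_2)}{\operatorname{Median}(Y_1)} \\
\exp\bigl(\operatorname{Mean}(Z_2) - \operatorname{Mean}(Z_1)\bigr)
  &= \frac{\operatorname{Median}(Y_2)}{\operatorname{Median}(Y_1)}
\end{align*}
```

Because exp is increasing, an interval (L, U) that covers the difference in means of the logged data translates directly into the interval (exp(L), exp(U)) for the ratio of medians.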
Other transformations
• Square root transformation: applies to data that are counts and to measurements of area.
• Reciprocal transformation: applies to data that are waiting times (e.g., time to failure of lightbulbs); the reciprocal of a time measurement can often be interpreted directly as a rate or a speed.
• Goal of transformation: establish a scale on which the two groups have roughly the same spread.
• Inferences from the log transformation are directly interpretable when converted back to the original scale of measurement. Other transformations are not so easily interpretable; e.g., the square of the difference between the means of sqrt(Y1) and sqrt(Y2) has no simple meaning on the original scale.
Nonparametric Methods
• Nonparametric (distribution-free) methods do not assume that the population distributions follow any particular form.
• (Wilcoxon) rank-sum test: Chapter 4.2.
• Let F and G denote the population distributions of group 1 and group 2. The rank-sum test tests H0: F = G vs. Ha: F ≠ G by comparing the ranks of the two groups.
• Advantages: distribution free, resistant to outliers.
• Drawbacks: a confidence interval is difficult to obtain, and the method is difficult to extend to more complicated settings.
Rank Sum Test
• List all observations from both samples in increasing order.
• Identify which sample each observation came from.
• Create a new column labeled “order,” as a straight sequence of numbers from 1 to n1 + n2, where n1 and n2 are the two sample sizes.
• Search for ties in the combined data set. The ranks for tied observations are taken to be the average of the orders for those cases; every other observation's rank is its order.
• The test statistic T is the sum of all the ranks in the first group. We reject H0 for values of T that are far away from the mean of T under H0.
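The steps above can be sketched in a few lines of Python. This is only an illustration of the ranking procedure, not what JMP does internally; the function names `average_ranks` and `rank_sum_statistic` are made up for this sketch.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their orders."""
    # Indices of the observations in increasing order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend [pos, end] over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = ((pos + 1) + (end + 1)) / 2  # average of the orders pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def rank_sum_statistic(group1, group2):
    """T = sum of the ranks of group 1 within the combined sample."""
    ranks = average_ranks(list(group1) + list(group2))
    return sum(ranks[:len(group1)])


print(rank_sum_statistic([1, 3], [4, 6]))     # 3.0 (ranks 1 and 2)
print(rank_sum_statistic([2, 5, 5], [5, 7]))  # 7.0 (the three 5s each get rank 3)
```

The second call uses invented data purely to show how tied observations receive an averaged rank.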
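Exact computation of p-value
• Under H0: F=G, the ranks are randomly distributed among the two groups.
• Exact p-value: enumerate all possible assignments of the ranks to the two groups, compute T for each, and take the proportion of assignments for which T is at least as far from its mean as the observed T.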
Example
• Two subjects in each group. Group I: 1, 3. Group II: 4, 6.
• The Group I observations have ranks 1 and 2, so T = 1 + 2 = 3.
• There are 4! = 24 possible orderings of the four ranks, which yield C(4,2) = 6 distinct choices of the two ranks in Group I. Under H0, these groupings are equally likely, and P(T=3)=1/6, P(T=4)=1/6, P(T=5)=1/3, P(T=6)=1/6, P(T=7)=1/6.
• Two-sided p-value = probability under the null hypothesis that T would be at least as far from its mean (5) as the observed T (3) = P(T=3) + P(T=7) = 2/6.
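The enumeration in this example can be reproduced with a short Python sketch. The function name `exact_rank_sum_pvalue` is hypothetical, and brute-force enumeration like this is only practical for small samples.

```python
from fractions import Fraction
from itertools import combinations


def exact_rank_sum_pvalue(ranks, n1, t_obs):
    """Two-sided exact p-value: P(|T - mean(T)| >= |t_obs - mean(T)|) under H0."""
    ranks = [Fraction(r) for r in ranks]  # exact arithmetic, also for half-integer tie ranks
    # Under H0 every choice of n1 ranks for group 1 is equally likely.
    sums = [sum(choice) for choice in combinations(ranks, n1)]
    mean_t = sum(sums) / len(sums)
    observed_gap = abs(Fraction(t_obs) - mean_t)
    extreme = sum(1 for s in sums if abs(s - mean_t) >= observed_gap)
    return Fraction(extreme, len(sums))


# Slide example: combined ranks 1..4, two observations in Group I, observed T = 3.
print(exact_rank_sum_pvalue([1, 2, 3, 4], n1=2, t_obs=3))  # 1/3, i.e. 2/6
```

Only the six distinct groupings are enumerated here; counting all 24 orderings would give the same probabilities because each grouping appears equally often.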
Normal approximation to p-value
• Let z = (T − mean(T)) / SD(T), where mean(T) and SD(T) refer to the mean and SD of T under H0: F=G. Under H0, z has approximately a standard normal distribution when the sample sizes are sufficiently large.
• Approximate p-value: probability that a standard normal r.v. would be at least as far from zero as the observed test statistic z; this is Prob>|Z| in JMP.
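A minimal Python sketch of this approximation follows. It assumes the standard permutation results mean(T) = n1 × (average of all ranks) and SD(T) = (sample SD of all ranks) × sqrt(n1·n2/(n1+n2)), which are not stated on the slide. It applies no continuity correction, so software output may differ slightly. The function name and the ranks in the usage line are illustrative only.

```python
import math
from statistics import NormalDist, mean, stdev


def rank_sum_normal_pvalue(ranks, n1):
    """Return (z, two-sided p-value); `ranks` lists the combined ranks, group 1 first."""
    n = len(ranks)
    n2 = n - n1
    t = sum(ranks[:n1])                                # rank-sum statistic T
    mean_t = n1 * mean(ranks)                          # mean of T under H0
    sd_t = stdev(ranks) * math.sqrt(n1 * n2 / n)       # SD of T under H0 (handles ties)
    z = (t - mean_t) / sd_t
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # Prob > |z| for a standard normal
    return z, p_value


# Illustrative ranks (not from the lecture data): group 1 holds the first five.
z, p = rank_sum_normal_pvalue([1, 2, 3, 5, 7, 4, 6, 8, 9, 10], n1=5)
print(round(z, 3), round(p, 3))
```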
Rank Sum Test in JMP
• Analyze, Fit Y by X.
• Click the red triangle next to Oneway Analysis and click Nonparametric, Wilcoxon Test.
• The p-value is listed under 2 Sample Test, Normal Approximation. The p-value is Prob>|Z|.
Cognitive Load in Teaching
• Case Study 4.1.2
• A randomized experiment was done to compare (i) a conventional approach to teaching coordinate geometry, in which the presentation is split into diagram, text and algebra, with (ii) a modified approach in which algebraic manipulations and explanations are presented as part of the graphical display. Students’ performance on a test was compared after they were taught by the two methods.
• Both distributions are highly skewed. In addition, there were five students who did not come to any solution in the five minutes allotted, so their solution times are censored (all that is known about them is that they exceed 300 seconds).
Wilcoxon Signed Rank Test
• Chapter 4.4.2
• Wilcoxon signed-rank test: distribution-free test of H0: the differences are centered at zero, for a matched pairs experiment (where it is assumed that the distribution of differences is symmetric).
• Signed-rank statistic:
  • Compute the difference in each of the n pairs.
  • Drop zeros from the list.
  • Order the absolute differences from smallest to largest and assign them ranks 1,…,n, with n now the number of nonzero differences (average rank for ties).
  • The signed-rank statistic S is the sum of the ranks from pairs for which the difference is positive.
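The signed-rank steps can be sketched as follows. The helper names are hypothetical and the differences in the usage line are invented purely for illustration.

```python
def average_ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    ordered = sorted(values)
    n = len(ordered)
    # (first 1-based position + last 1-based position) / 2 for each value
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2 for v in values]


def signed_rank_statistic(differences):
    """S = sum of the ranks of |difference| over pairs with a positive difference."""
    nonzero = [d for d in differences if d != 0]         # drop zeros
    ranks = average_ranks([abs(d) for d in nonzero])     # rank the absolute differences
    return sum(r for d, r in zip(nonzero, ranks) if d > 0)


# Illustrative differences: |d| = 3, 1, 4, 2, 5 get ranks 3, 1, 4, 2, 5.
print(signed_rank_statistic([3, -1, 0, 4, -2, 5]))  # 12.0 = 3 + 4 + 5
```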
Computing p-value
• Under H0, the assignment of the observations in each pair to treatment or control is random, so each difference is equally likely to be positive or negative. The exact p-value can therefore be determined.
• JMP uses a normal approximation to calculate the p-value (reliable if the number of pairs is about 20 or more).
• Wilcoxon Signed Rank test in JMP: Analyze, Distribution, Test Mean, click the box for Wilcoxon Signed Rank test.
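For a small number of pairs, the exact p-value can be obtained by enumerating every pattern of signs. The sketch below assumes the ranks of the nonzero absolute differences have already been computed; the function name `exact_signed_rank_pvalue` is made up for this illustration, and the 2^n enumeration is only feasible for small n.

```python
from fractions import Fraction
from itertools import product


def exact_signed_rank_pvalue(abs_ranks, s_obs):
    """Two-sided exact p-value for the signed-rank statistic S under H0."""
    ranks = [Fraction(r) for r in abs_ranks]
    mean_s = sum(ranks) / 2                    # each rank is counted with probability 1/2
    observed_gap = abs(Fraction(s_obs) - mean_s)
    extreme = 0
    # Under H0 each of the 2^n sign patterns is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(s - mean_s) >= observed_gap:
            extreme += 1
    return Fraction(extreme, 2 ** len(ranks))


# Five nonzero differences with ranks 1..5 and observed S = 12 (illustrative values).
print(exact_signed_rank_pvalue([1, 2, 3, 4, 5], s_obs=12))  # 5/16
```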
Duality between CI and Hypothesis Tests
• Confidence interval: range of plausible values for a parameter.
• Connection to hypothesis testing: a parameter value is plausible if it cannot be rejected when it is considered as the null hypothesis.
• CI based on hypothesis tests: the set of values that are not rejected by two-sided hypothesis tests at the 0.05 significance level is a 95% CI.
CI using Signed Rank Test
• See Section 4.2.4.
• To test H0: δ = δ0, where δ is the center of the distribution of differences, using the Wilcoxon signed-rank test, we subtract δ0 from each difference, rank the absolute values of the results, and use the sum of ranks from pairs for which the difference is greater than δ0. In JMP, set “Specify Hypothesized Mean” equal to δ0.
• By trial and error, we can find approximately the smallest and largest δ0 for which H0: δ = δ0 is not rejected at the .05 significance level (i.e., has p-value > .05). These are the endpoints of a 95% confidence interval.
• The conditions for validity of this CI are the same as those for the Wilcoxon signed-rank test: random sampling and a symmetric distribution of differences.
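The trial-and-error search can be sketched as a grid search in Python. This is an illustration under stated assumptions, not JMP's procedure: it uses an exact p-value from enumerating sign patterns (practical only for a small number of pairs) rather than a normal approximation, it writes δ0 as `delta0`, and the function names, data, and candidate grid are all hypothetical.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks; tied values get the average of the positions they occupy."""
    ordered = sorted(values)
    n = len(ordered)
    return [(ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2 for v in values]


def signed_rank_pvalue(differences, delta0):
    """Exact two-sided signed-rank p-value for H0: center of differences = delta0."""
    shifted = [d - delta0 for d in differences if d != delta0]  # shift, drop zeros
    if not shifted:
        return 1.0
    ranks = average_ranks([abs(d) for d in shifted])
    s_obs = sum(r for d, r in zip(shifted, ranks) if d > 0)
    mean_s = sum(ranks) / 2
    observed_gap = abs(s_obs - mean_s)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):            # all 2^n sign patterns
        s = sum(r for r, positive in zip(ranks, signs) if positive)
        if abs(s - mean_s) >= observed_gap:
            extreme += 1
    return extreme / 2 ** len(ranks)


def signed_rank_ci(differences, candidates, alpha=0.05):
    """Smallest and largest candidate delta0 that is not rejected at level alpha."""
    kept = [c for c in candidates if signed_rank_pvalue(differences, c) > alpha]
    return (min(kept), max(kept)) if kept else None


diffs = [2, 4, 5, 7, 8, 10, 11, 13]        # invented paired differences
grid = [k * 0.5 for k in range(0, 31)]     # candidate delta0 values 0, 0.5, ..., 15
print(signed_rank_ci(diffs, grid))
```

The endpoints are only as precise as the grid spacing, which mirrors the approximate nature of the trial-and-error search described above.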