Medical Biometry I

Medical Biometry I (Biostatistics 511) Week 10 Discussion Section Phillip Keung Biostat 511

Outline
• Review:
  • 2-sample tests
  • Statistical power
  • Sample size calculations
  • Hypothesis testing
• Non-parametric hypothesis tests

The Story So Far

Two-sample tests
• Want to compare the means of 2 populations (e.g. effectiveness of seatbelts in reducing morbidity and mortality among pediatric victims of motor vehicle accidents)

Review of Seatbelt Example

CI and p-values
• While we do have Stata code to do all of the calculations, it is still important to know how to read the Stata output
• We can use the prtesti command to calculate everything
• But prtesti gives you a lot of extra information that you don't need

CI and p-values
• 95% CI for the difference in population proportions: (-0.057, 0.015)
• P-value = 0.31
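To see where numbers like these come from, here is a minimal sketch of a two-sample comparison of proportions using the normal approximation. It uses an unpooled standard error for the CI and a pooled standard error for the z-test of H0: p1 = p2. We believe this matches what prtesti reports, but treat that as an assumption. The function name two_prop_ztest and the counts in the usage line are hypothetical; the seatbelt data are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ztest(x1, n1, x2, n2, level=0.95):
    """Hypothetical helper: compare two proportions (normal approximation).

    Returns the difference p1 - p2, a CI using the unpooled SE,
    and the z statistic and two-sided p-value using the pooled SE under H0.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Unpooled standard error for the confidence interval
    se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se_ci, diff + z_crit * se_ci)
    # Pooled standard error, since H0 says both groups share one proportion
    p_pool = (x1 + x2) / (n1 + n2)
    se_test = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_test
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, ci, z, p_value


# Hypothetical counts: 60/100 events in group 1, 40/100 in group 2
print(two_prop_ztest(60, 100, 40, 100))
```

A CI that contains 0, as in the seatbelt example, goes together with a two-sided p-value above 0.05 at the matching level. The pairing is only approximate here, because the CI and the test use different standard errors.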

Aside on Hypothesis Testing
• We are not testing whether the sample means/proportions are different
• We already know that the sample means are different
• We are testing whether the true population means/proportions are different

Aside on Hypothesis Testing
• Do not write: $H_0: \bar{x}_1 = \bar{x}_2$ vs $H_a: \bar{x}_1 \neq \bar{x}_2$
• Write instead: $H_0: \mu_1 = \mu_2$ vs $H_a: \mu_1 \neq \mu_2$
• The point of statistical inference is to generalize from the particular sample to the population at large

Aside on Hypothesis Testing
• What is the logic behind hypothesis testing?
• In 2-sample t-tests, for instance:
  • There may be some difference between the 2 populations of interest
  • But we can never observe the true population means, only the sample means from the samples for each population
  • Based on the statistical properties of the sample means, we can try to make a decision about whether or not to reject our null hypothesis

Significance and Power
• We talk about the significance (alpha) level and power of a test because we want to quantify the probabilities of drawing correct and incorrect conclusions based on that test
• Ideally, we would like to get as many true positives as possible while controlling the rate of false positives

Significance and Power
• Power = $P(\text{reject } H_0 \mid H_a \text{ true})$
• Power depends on $\mu_0$, $\mu_a$, $\sigma^2$, and $n$ (as well as on the significance level $\alpha$)
• Sample size calculations help ensure that the study has enough power to detect a specified departure from the null hypothesis
• Power and sample size calculations require a model for the data under both the null and the alternative

Significance and Power
• Perform sample size and power calculations in Stata for:
  • One- or two-sample problems
  • Tests of proportions or means
  • One- or two-sided hypothesis tests
  • Varying some of the factors that are related to statistical power:
    • The variation in the characteristic: $\sigma$
    • The alpha-level: $\alpha$
    • The size of the effect: $\Delta = |\mu_0 - \mu_1|$

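Significance and Power
• The sampsi command
• The command can be written with the following options:
    sampsi #1 #2 [, alpha(#) power(#) n1(#) n2(#) ratio(#) sd1(#) sd2(#) onesample onesided]
• where #1 and #2 are the two postulated means or proportions (for a one-sample test, #1 is the null value and #2 is the alternative value), alpha(#) is the significance level, and power(#) is the desired power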

Significance and Power
• sampsi computes sample size or power for 4 types of tests:
  • Two-sample comparisons of means
  • One-sample comparisons of a mean to a hypothesized value
  • Two-sample comparisons of proportions
  • One-sample comparisons of a proportion to a hypothesized value

Significance and Power
• Additional notes:
  • n1(#) specifies the size of the first (or only) sample and n2(#) specifies the size of the second sample. If sample sizes are specified, sampsi computes power; if not, it computes the required sample size.
  • ratio(#), used in two-sample tests, lets one compute the sample size when the two groups are designed to have unequal sizes. This would be used, for example, when cases and controls are not sampled equally (e.g., 2 controls for every case: ratio(2)).
  • sd1(#) and sd2(#) are the standard deviations for comparisons of means. If they are not specified, a comparison of proportions is made. In two-sample cases, if only sd1(#) is specified, sd2(#) is assumed to equal sd1(#).
  • onesample indicates a one-sample test. The default is a two-sample test.
  • onesided indicates a one-sided test. The default is a two-sided test.
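The arithmetic behind a two-sample sample size calculation for means can be sketched with the standard normal-approximation formula. Let r = n2/n1 (our reading of ratio(#)) and let $a$ be $\alpha/2$ for a two-sided test or $\alpha$ for a one-sided test. Then
$n_1 = (z_{1-a} + z_{1-\beta})^2 (\sigma_1^2 + \sigma_2^2 / r) / \Delta^2$.

We assume this is the formula sampsi uses for means, with defaults alpha = 0.05 and power = 0.90, but compare against Stata's output before relying on it. The function n_two_means and its arguments are hypothetical, chosen to echo the sampsi options.

```python
from math import ceil
from statistics import NormalDist


def n_two_means(mu1, mu2, sd1, sd2=None, alpha=0.05, power=0.90,
                ratio=1.0, onesided=False):
    """Hypothetical helper: per-group sample sizes for comparing two means.

    Normal-approximation formula with ratio = n2 / n1:
        n1 = (z_{1-a} + z_{1-b})^2 * (sd1^2 + sd2^2 / ratio) / (mu1 - mu2)^2
    where a = alpha / 2 (two-sided) or alpha (one-sided), and b = 1 - power.
    """
    if sd2 is None:
        sd2 = sd1  # as in the notes above: sd2 defaults to sd1
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if onesided else z(1 - alpha / 2)
    z_beta = z(power)
    delta = abs(mu1 - mu2)
    n1 = ceil((z_alpha + z_beta) ** 2 * (sd1 ** 2 + sd2 ** 2 / ratio) / delta ** 2)
    n2 = ceil(ratio * n1)  # keep the requested allocation ratio
    return n1, n2


# Detect a difference of 0.5 SD with 90% power at two-sided alpha = 0.05
print(n_two_means(0, 0.5, sd1=1))
# Same problem with 2 controls per case
print(n_two_means(0, 0.5, sd1=1, ratio=2))
```

Rounding up at each step is a conservative choice. Stata may round n2 slightly differently, so small discrepancies are expected.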

Examples

Examples
• Exercises

Significance and Power
• Why do we always specify a significance level first?
• Because your tolerance for false positives determines how difficult it is to reject the null
• From our z-tables, when alpha is 5%, we know that the rejection region for a two-sided test on the z-scale is anything outside (-1.96, 1.96)
• But when alpha is 1%, the rejection region becomes anything outside (-2.576, 2.576)
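The critical values quoted above can be computed directly rather than read from a z-table. This is a small sketch using Python's standard-library NormalDist; the helper name z_critical is hypothetical.

```python
from statistics import NormalDist


def z_critical(alpha, onesided=False):
    """Hypothetical helper: critical value c for a z-test.

    Two-sided: reject H0 when |Z| > c.  One-sided (upper tail): reject when Z > c.
    """
    tail = alpha if onesided else alpha / 2
    return NormalDist().inv_cdf(1 - tail)


for a in (0.10, 0.05, 0.01):
    print(f"alpha = {a:.2f}: reject if |Z| > {z_critical(a):.3f}")
```

Smaller alpha pushes the critical value outward, which is exactly why the null becomes harder to reject.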

Significance and Power
• It is always possible to get a test with 100% power
• Just reject everything!
• Of course, this also leads to a 100% false positive rate
• There is always a tradeoff between detecting true positives and suppressing false positives

Power
• Power is a measure of the test's ability to detect a departure from the null hypothesis
• More precisely, it is the probability that the test rejects the null when the alternative is true

Power
• Sensible tests are more likely to reject the null (when the alternative is true) as:
  • The difference between the alternative and the null grows larger
  • The sample size increases
  • The variance decreases
  • Your tolerance for false positives increases (i.e. as alpha increases)
• Mostly common sense, but let's look at a picture that illustrates the first point
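All four trends can be checked numerically with the closed-form power of the two-sided one-sample z-test of $H_0: \mu = \mu_0$ when $\sigma$ is known:
$\text{Power}(\mu) = 1 - \Phi(c - \delta) + \Phi(-c - \delta)$, where $c = z_{1-\alpha/2}$ and $\delta = (\mu - \mu_0)\sqrt{n}/\sigma$.

This is a sketch: the name ztest_power is hypothetical, and the defaults ($\mu_0 = 0$, $\sigma = 1$, $n = 30$) are our assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ztest_power(mu, mu0=0.0, sigma=1.0, n=30, alpha=0.05):
    """Hypothetical helper: power of the two-sided one-sample z-test of
    H0: mean = mu0 when the true mean is mu and sigma is known."""
    Phi = NormalDist().cdf
    c = NormalDist().inv_cdf(1 - alpha / 2)
    delta = (mu - mu0) * sqrt(n) / sigma  # standardized distance from the null
    return (1 - Phi(c - delta)) + Phi(-c - delta)


print("baseline:      ", round(ztest_power(0.3), 3))
print("larger effect: ", round(ztest_power(0.5), 3))
print("larger n:      ", round(ztest_power(0.3, n=60), 3))
print("smaller sigma: ", round(ztest_power(0.3, sigma=0.5), 3))
print("larger alpha:  ", round(ztest_power(0.3, alpha=0.10), 3))
```

Each change raises power above the baseline. Setting mu equal to mu0 returns alpha itself, and alpha = 1 ("reject everything") returns a power of 1.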

Power
• Suppose that we get 30 observations from a population with known variance $\sigma^2$ and unknown mean $\mu$
• We want to test $H_0: \mu = 0$ vs $H_a: \mu \neq 0$
• We can use a z-test with the usual alpha level of 5%
• How does the power of the z-test change as the true mean gets farther away from 0?

Power: z-test

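Some details, optional

$$
\begin{aligned}
\text{Power}(\mu) &= P(\text{reject } H_0 \mid \mu \text{ is true}) \\
&= P\left(\left|\frac{\bar{X}}{\sigma/\sqrt{n}}\right| > 1.96\right) \\
&= P\left(|\bar{X}| > 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(\bar{X} > 1.96\,\frac{\sigma}{\sqrt{n}}\right) + P\left(\bar{X} < -1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(Z > 1.96 - \frac{\mu\sqrt{n}}{\sigma}\right) + P\left(Z < -1.96 - \frac{\mu\sqrt{n}}{\sigma}\right), \qquad Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1) \\
&= \left[1 - \Phi\left(1.96 - \frac{\mu\sqrt{n}}{\sigma}\right)\right] + \Phi\left(-1.96 - \frac{\mu\sqrt{n}}{\sigma}\right)
\end{aligned}
$$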
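One way to convince yourself that the power formula for the z-test of $H_0: \mu = 0$ is right is to simulate the test many times and count how often it rejects. This sketch assumes normal data with known $\sigma$ (we take $\sigma = 1$) and n = 30, matching the setup of this example. The function name simulated_power and the seed are hypothetical choices. The simulated rejection rate should be close to the closed-form power.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_power(mu, sigma=1.0, n=30, alpha=0.05, reps=20000, seed=511):
    """Hypothetical helper: Monte Carlo estimate of the power of the
    two-sided z-test of H0: mean = 0 when the true mean is mu."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        z = xbar / (sigma / sqrt(n))  # z statistic under H0: mean = 0
        if abs(z) > c:
            rejections += 1
    return rejections / reps


for mu in (0.0, 0.3, 0.5):
    print(mu, simulated_power(mu))
```

With mu = 0 the rejection rate estimates the type I error rate (about 0.05). It then climbs toward 1 as mu moves away from 0.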

Non-parametric Tests
• The z-statistics and t-statistics that we use have critical values based on the normal or t distributions
• These critical values are exact only when the data are normally distributed
• Statisticians devised other kinds of tests whose validity does not depend on the distribution of the underlying data

Caution
• At some point, you may hear someone argue that the z or t test is not appropriate because the data are not normally distributed, and that a non-parametric test should be preferred
• This is not quite true
• The z and t statistics are roughly normally distributed, even if the data are not, so long as n is reasonably large (CLT)
• The test that you choose to use in inference should be based on what you want to compare (i.e. means, medians, something else?) rather than purely statistical concerns
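The CLT argument can be checked by simulation. In this sketch the data are drawn from a skewed Exponential(1) distribution, so $H_0$: mean = 1 is true. We then record how often a large-sample test rejects, using the sample SD and the normal critical value. The function name type1_error_rate, the choice of distribution, and the seed are our assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type1_error_rate(n, reps=5000, alpha=0.05, seed=511):
    """Hypothetical helper: rejection rate of a two-sided large-sample test of
    H0: mean = 1 when the data are Exponential(1), so H0 is actually true."""
    rng = random.Random(seed)
    c = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        m = fmean(x)
        sd = sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))  # sample SD
        z = (m - 1.0) / (sd / sqrt(n))
        if abs(z) > c:
            rejections += 1
    return rejections / reps


print("n = 5:  ", type1_error_rate(5))
print("n = 200:", type1_error_rate(200))
```

With n = 5 the test rejects too often. With n = 200 the rejection rate settles near the nominal 5% even though the data are far from normal.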

Nonparametric Tests
• Some non-parametric tests for comparing two samples:
  • Sign test (paired data)
  • Wilcoxon signed rank test (paired data)
  • Wilcoxon rank sum test (two independent samples)

Data

Sign Test
• Based on the total number of differences greater than the null median (0)
• It is a test of the median difference $M$: $H_0: M = 0$ vs $H_a: M \neq 0$ (or $H_a: M > 0$, or $H_a: M < 0$)
• If the median difference really is 0, then half of the differences should be above 0, and the other half should be below 0

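Sign Test
• Therefore, under the null, the total number of differences above 0 should follow a Binomial(n, 0.5) distribution, where n is the number of nonzero differences
• We can use the bitesti command in Stata, which takes the number of trials, the observed number of successes, and the hypothesized probability (here 0.5)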
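The binomial reasoning translates directly into an exact p-value. In this sketch, zero differences are dropped and k positive differences out of n are compared against Binomial(n, 0.5). The two-sided p-value is twice the smaller tail, which is a common convention when the binomial is symmetric (p = 0.5). bitesti may define the two-sided p-value differently when p ≠ 0.5. The function name sign_test is hypothetical.

```python
from math import comb


def sign_test(diffs):
    """Hypothetical helper: exact two-sided sign test of H0: median difference = 0.

    Zero differences are dropped; under H0 the number of positive differences
    among the remaining n is Binomial(n, 0.5).
    """
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    pmf = [comb(n, i) / 2 ** n for i in range(n + 1)]
    lower = sum(pmf[: k + 1])  # P(X <= k)
    upper = sum(pmf[k:])       # P(X >= k)
    p_value = min(1.0, 2 * min(lower, upper))
    return n, k, p_value


# Hypothetical paired differences: 9 of 10 are positive
print(sign_test([1, 2, 3, 4, 5, 6, 7, 8, 9, -1]))
```

Only the signs of the differences enter the calculation. This is exactly the information loss described next.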

Sign Test
• The sign test discards a lot of useful information; in fact, it ignores all of the information about the magnitude of the differences
• Naturally, we would like to improve on that with another test

Wilcoxon Signed Rank Test
• Also a test of the median difference
• One way to incorporate some knowledge of the size of the differences is by considering the rank of their magnitudes
• Test statistic based on the sum of the ranks (of the absolute differences) for the differences that are greater than 0
• Implemented in Stata as signrank

Wilcoxon Signed Rank Test
• In lecture, details about the mean and variance of the sum of ranks were presented without proof
• Somewhat complicated to demonstrate
• Corrections for ties and 0 differences are required, and generally left to Stata
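Here is a minimal sketch of how the signed rank statistic is built:
1. Drop zero differences.
2. Rank the absolute differences, giving tied values their average rank.
3. Sum the ranks of the positive differences to get $T^+$.
4. Compare $T^+$ with its no-ties null mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$ using a normal approximation.

This sketch skips the variance corrections for ties and zeros that Stata's signrank applies, so its numbers can differ slightly from Stata's. The names average_ranks and signed_rank_test are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Hypothetical helper: ranks 1..n, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_test(diffs):
    """Hypothetical helper: Wilcoxon signed rank test of H0: median difference = 0
    (normal approximation, zeros dropped, no tie correction to the variance)."""
    d = [x for x in diffs if x != 0]
    n = len(d)
    ranks = average_ranks([abs(x) for x in d])
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24
    z = (t_plus - mean) / sqrt(var)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return t_plus, mean, var, z, p_value


# Hypothetical paired differences
print(signed_rank_test([1, -2, 3, 4, 5]))
```

Unlike the sign test, a large negative difference now costs more than a small one, because its rank is missing from $T^+$.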

Wilcoxon Rank Sum Test
• Based on the rank sum of one sample after ranking the pooled data
• Tests something rather unusual:
  • Tests whether P(a random person drawn from population 1 has a value exceeding the value for a random person drawn from population 2) = 0.5
• Implemented in Stata as ranksum
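The link between the pooled rank sum and that probability can be made concrete. Let U count the pairs $(x_i, y_j)$ with $x_i > y_j$, where ties count 1/2. Then the pooled rank sum of sample 1 equals $U + n_1(n_1+1)/2$, and $U/(n_1 n_2)$ estimates $P(X > Y) + P(X = Y)/2$. This is a sketch using a normal approximation with no tie correction, so it will not match ranksum exactly when there are ties. The name rank_sum_test is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_test(x, y):
    """Hypothetical helper: Wilcoxon rank sum (Mann-Whitney) test,
    normal approximation, no tie correction.

    Returns the pooled rank sum of x, the estimate of P(X > Y) (ties count 1/2),
    the z statistic, and the two-sided p-value for H0: P(X > Y) = 0.5.
    """
    n1, n2 = len(x), len(y)
    # U counts pairs where the x value exceeds the y value
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    w = u + n1 * (n1 + 1) / 2  # rank sum of sample x within the pooled data
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    z = (u - mean_u) / sqrt(var_u)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, u / (n1 * n2), z, p_value


# Hypothetical independent samples
print(rank_sum_test([3, 4, 5], [1, 2]))
```

Counting pairs directly avoids ranking code and shows why the null hypothesis is stated in terms of P(X > Y). It costs $n_1 n_2$ comparisons, which is fine for classroom-sized data.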

