Power and Sample Size

(Images taken from van Belle)

When we conduct a test of any hypothesis regardless of the test used we make one of two possible decisions:

Reject the null (Ho) in favor of the alternative (Ha)

OR

Fail to reject the null hypothesis (Ho)

Notice we do NOT say “Accept the null (Ho)”.

Because we are making two possible decisions in the presence of uncertainty we can make two possible errors:

- We choose α = P(Type I Error). Typically α = .05 or .01 or .10
- However we do NOT directly choose β = P(Type II Error)

We can decrease β by “doing” any of the following:

- Increasing the sample size(s)
- Increasing α
- Choosing the “most powerful” test procedure that is appropriate for our analysis.
- Decreasing variation (no direct control)

Power = P(Reject Ho|Ha true) = 1 – β

Thus the same approaches for decreasing β also increase the power.

Increasing the sample size is the most direct way to decrease β, and the same is true for increasing the POWER.

Power also increases the further the true parameter value(s) is away from the hypothesized value under the null.

To determine the sample size necessary to achieve a specified power we need to specify the following:

- Desired minimal power (1 – β)
- The size of the difference from the null we wish to detect with this power. (Clinical)
- A “guesstimate” for the variation of the population(s) involved.
- For independent samples – equal or unequal sample sizes?

For males between the ages of 20-24 the mean cholesterol level is 180 mg/dl with a standard deviation σ = 46 mg/dl. Suppose we are interested in determining if the mean cholesterol level for a special diet group of this population is greater than 180 mg/dl.

- μ = mean cholesterol of special diet group

Ho: μ = 180 mg/dl

Ha: μ > 180 mg/dl

Suppose that the alternative is true and specifically the mean of this special diet group is 200 mg/dl. If we use a sample size of n = 16 what would be the power of our test?

That is, what is the chance that we would obtain a sample mean that would lead to the rejection of the null hypothesis?

We will reject the null hypothesis if the observed sample mean is greater than 198.92 mg/dl. How was this value determined?

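[Figure: sampling distribution of the sample mean under Ho, with the upper-tail area α = .05 lying to the right of the cutoff 198.92 mg/dl]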
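One way to answer this, sketched under the assumption of an upper-tailed z-test with σ = 46 treated as known and n = 16: the cutoff is the null mean plus $Z_{.05} = 1.645$ standard errors of the sample mean.

$$ \bar{x}_{\text{cutoff}} = \mu_0 + Z_{.05}\,\frac{\sigma}{\sqrt{n}} = 180 + 1.645 \cdot \frac{46}{\sqrt{16}} = 180 + 18.92 = 198.92 \text{ mg/dl} $$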

To find the power when μ = 200 mg/dl, we need to find the probability that we would obtain a sample mean larger than 198.92 mg/dl.

[Figure: sampling distribution of the sample mean when μ = 200 mg/dl; the area to the left of 198.92 mg/dl is β = 0.463. Note: NOT DRAWN TO SCALE!!]

Thus the Power = (1 – β) = 1 - .463 = .537 or a 53.7% chance!
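The power calculation above can be checked numerically. The following is a minimal sketch, assuming an upper-tailed z-test with σ treated as known; the function name `power_upper_tailed_z` is illustrative and not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test of Ho: mu = mu0 when the true mean is mu1."""
    se = sigma / sqrt(n)                                  # standard error of the sample mean
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # reject Ho above this sample mean
    beta = NormalDist(mu1, se).cdf(cutoff)                # P(fail to reject | mu = mu1)
    return 1 - beta

# Cholesterol example: Ho mean 180, true mean 200, sigma 46, n = 16
power = power_upper_tailed_z(mu0=180, mu1=200, sigma=46, n=16)
print(f"power = {power:.3f}")
```

The result is roughly .54, which agrees with the .537 above up to rounding of the cutoff to 198.92.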

Given that,

Power = P(Reject Ho|μ = 200 mg/dl) = .537

Why is it incorrect to say “Accept Ho”? Because there is only a 53.7% chance that we would reject when we should. In other words, even when the alternative is actually true we still have a 46.3% chance to NOT reject the null hypothesis!

So failure to reject does not necessarily imply that the null hypothesis (Ho) is true!

As stated earlier we can increase the power by raising α, decreasing σ, increasing the difference between the mean under the null and the mean making the alternative true, or increasing sample size.

We could plot this relationship between power and sample size (n) and use this plot to determine the sample size needed to achieve a power of say, 80% or 90%.

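[Figure: plot of power versus sample size (n) for the cholesterol example, marking the sample sizes that give Power = 80% and Power = 90%]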
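The relationship between power and sample size can also be tabulated rather than plotted. This is a rough sketch under the same assumptions as the cholesterol example (upper-tailed z-test, σ = 46 treated as known, α = .05, true mean 200 mg/dl); the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of an upper-tailed z-test of Ho: mu = mu0 when the true mean is mu1."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist(mu1, se).cdf(cutoff)

def smallest_n(target_power, mu0=180, mu1=200, sigma=46, alpha=0.05):
    """Smallest n whose power reaches target_power (power increases with n)."""
    n = 2
    while power_upper_tailed_z(mu0, mu1, sigma, n, alpha) < target_power:
        n += 1
    return n

# Print a few points of the power curve, then the n needed for 80% and 90% power
for n in (10, 20, 30, 40, 50):
    print(n, round(power_upper_tailed_z(180, 200, 46, n), 3))

n80 = smallest_n(0.80)
n90 = smallest_n(0.90)
print("n for 80% power:", n80)
print("n for 90% power:", n90)
```

Under these assumptions the search returns n = 33 for 80% power and n = 46 for 90% power; software that uses the t distribution will give slightly larger values.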

- There are general formulas for determining sample sizes necessary to achieve a desired power for many different test procedures. These formulas require specification of:
  - significance level of the test (α)
  - information about the variation in the population (e.g. σ)
  - size of the difference to detect (e.g. 200 – 180 = 20 mg/dl)
  - desired power (by specifying β)
  - quantiles from a standard normal distribution (Zα & Zβ)
- I will not derive the formulae for the different testing situations, but will show how they are used. Also realize JMP has many of these built in and allows for doing various calculations pertaining to sample size, power, and difference to detect. You can also find very nice web-apps to do this!

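The sample size necessary to achieve a specified power (1 – β) given the required inputs is, for one-tailed testing,

$$ n = \frac{\sigma^2 (Z_\alpha + Z_\beta)^2}{(\mu_0 - \mu_1)^2} $$

and for two-tailed testing,

$$ n = \frac{\sigma^2 (Z_{\alpha/2} + Z_\beta)^2}{(\mu_0 - \mu_1)^2} $$

Here $\mu_0$ is the mean under the null, $\mu_1$ is the mean under the alternative we wish to detect, and $Z_p$ is the standard normal value with area $p$ to its right (e.g. $Z_{.05} = 1.645$).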

The relationships between formula inputs

- |μ0 – μ1| dec./inc. → n inc./dec.
- β dec./inc. → n inc./dec.
- σ inc./dec. → n inc./dec.
- α dec./inc. → n inc./dec.
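These relationships can be checked with a small sketch of the normal-approximation sample size formula for a single mean. The function name `n_one_mean` and the particular comparison values are illustrative assumptions; only the baseline inputs (difference 20, σ = 46, α = .05, power .90) come from the cholesterol example.

```python
from math import ceil
from statistics import NormalDist

def n_one_mean(delta, sigma, alpha=0.05, power=0.90, two_sided=False):
    """Normal-approximation sample size for a test of a single mean."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)  # upper critical value
    z_beta = z(power)                                          # power = 1 - beta
    return ceil(sigma**2 * (z_alpha + z_beta)**2 / delta**2)   # round up

base = n_one_mean(delta=20, sigma=46)                  # baseline one-sided design
smaller_delta = n_one_mean(delta=10, sigma=46)         # smaller difference to detect
smaller_beta = n_one_mean(delta=20, sigma=46, power=0.95)
larger_sigma = n_one_mean(delta=20, sigma=60)
smaller_alpha = n_one_mean(delta=20, sigma=46, alpha=0.01)

print(base, smaller_delta, smaller_beta, larger_sigma, smaller_alpha)
```

Each change (smaller difference, smaller β, larger σ, smaller α) gives a larger n than the baseline, matching the list above.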

One problem with all sample size formulae is they require information about variation that is unknown a priori, i.e. at the start of the study. So we need to “guesstimate” this information.

For this hypothetical cholesterol study we need to have a “guesstimate” for the population standard deviation (σ).

Estimation Strategies:

- Use results from similar research in the literature, at least with the same response. (Best approach!)
- Use a small pilot study to obtain a sample standard deviation (s), which estimates σ.
- Estimated Range/(4 or 6) – guess what the observed range would likely be and divide by 4 or 6. (Sketchy)
- Pick a “worst” case value for σ; if this value is too large then n will be too large as a result, but power goals will be achieved. (Upper bound)

For males between the ages of 20-24 the mean cholesterol level is 180 mg/dl with a standard deviation σ = 46 mg/dl. Suppose we are interested in determining if the mean cholesterol level for a special diet group of this population is greater than 180 mg/dl.

- μ = mean cholesterol of special diet group

Ho: μ = 180 mg/dl

Ha: μ > 180 mg/dl

Suppose we want to have power of 90% with an α = .05 level test of these hypotheses and we consider 20 mg/dl above 180 mg/dl to be clinically important (i.e. 200 mg/dl or more). What sample size do we need?

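For the one-sided test, assuming σ = 46 mg/dl, we have

- $Z_\alpha = Z_{.05} = 1.645$
- and $Z_\beta = Z_{.10} = 1.282$
- Thus

$$ n = \frac{\sigma^2 (Z_\alpha + Z_\beta)^2}{(\mu_0 - \mu_1)^2} = \frac{46^2 (1.645 + 1.282)^2}{(180 - 200)^2} = 45.3 $$

Round up to n = 46.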

In JMP you can use the Sample Size and Power calculator located in the DOE (Design of Experiments) pull-down menu.

You can see there are several testing situations where you can do Power and Sample Size calculations when designing an experiment.

We will cover those highlighted in this lecture, as well as others.

In this example we are working with a single population mean, thus we would select the One Sample Mean option.

Here the input values are:

- α = .05
- Standard deviation = 46
- Difference to detect = 20
- Power = .90

After inputting the values, leaving the sample size field blank, and clicking Continue, we find a sample size of n = 58.

n = 58 will achieve our power goals (two-sided)

To obtain a one-sided α = .05 test we actually have to input α = .10, because JMP does two-sided calculations. This gives a sample size n = 47.

- Testing a single population proportion (p)
- Comparing two population means with independent samples
- Comparing two population proportions with independent samples
- Comparing two population means with dependent samples.
- Comparing two population proportions with dependent samples.

For one-tailed testing

The sample size necessary to achieve a specified power (1 – β) given the required inputs is:

and for two-tailed testing,

Consider the planning of a survey to find out how smoking behavior changed while students were in college. A comprehensive survey four years ago found that 30% of freshmen smoked. The investigator wants to know how many seniors should be sampled. He wants to perform a two-tailed test at the α = .05 level.

p = proportion of seniors who smoke

Ho: p = .30 (smoking rate same as freshman rate)

Ha: p ≠ .30 (smoking rate is different from freshman rate)

Suppose further that the researcher wants to have a 90% chance of detecting a 5% change in the smoking rate.

What sample size should the researcher use?

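Here the variance term p(1 – p) is maximized when p = .50, so of the two alternative values 5% away from .30, the one closer to .50 has the larger variance.

If we had used the alternative value with the smaller variance rather than the one with the larger variance, we would have obtained a smaller sample size. To be safe, use the value which gives the larger variance!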

The inputs for the sample size calculation are α = .05, the null proportion .30, the proportion under the alternative, and Power = .90.

JMP returns the sample size n = 903
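For comparison, here is a hedged sketch of one common normal-approximation formula for a single proportion, which uses the null variance for the critical value and the alternative variance for the power term. This is an assumption: it is not necessarily the formula used on the slides or by JMP, so it should not be expected to reproduce n = 903 exactly. The function name `n_one_proportion` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_one_proportion(p0, p1, alpha=0.05, power=0.90, two_sided=True):
    """Approximate n for testing Ho: p = p0 when the true proportion is p1."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    numerator = (z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)

# Smoking survey: null rate .30, a 5-point change in either direction
n_up = n_one_proportion(p0=0.30, p1=0.35)    # alternative closer to .50
n_down = n_one_proportion(p0=0.30, p1=0.25)  # alternative farther from .50
print(n_up, n_down)
```

The alternative closer to .50 has the larger variance and so requires the larger sample, which is why it is the safer planning value.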

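For testing,

Ho: $\mu_1 = \mu_2$ vs. Ha: $\mu_1 > \mu_2$ (or $\mu_1 < \mu_2$)

Assuming equal sample sizes (i.e. $n_1 = n_2 = n$) & equal population variances (i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$), the sample size for each group needed to detect a difference $\Delta = |\mu_1 - \mu_2|$ is

$$ n = \frac{2\sigma^2 (Z_\alpha + Z_\beta)^2}{\Delta^2} $$

For two-sided tests use $Z_{\alpha/2}$ instead of $Z_\alpha$.

With unequal sample sizes, one sample size is taken to be a fixed multiple k of the other. For example, if k = 2 then we are assuming we want the sample size from population 1 to be twice as large as the sample size from population 2.

For two-sided tests use $Z_{\alpha/2}$ instead of $Z_\alpha$.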

Again we need to have a “guesstimate” for the population standard deviation (σ) common to both populations.

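For testing,

Ho: $\mu_1 = \mu_2$ vs. Ha: $\mu_1 > \mu_2$ (or $\mu_1 < \mu_2$)

Assuming equal sample sizes (i.e. $n_1 = n_2 = n$) & unequal population variances (i.e. $\sigma_1^2 \ne \sigma_2^2$), the sample size for each group needed to detect a difference $\Delta = |\mu_1 - \mu_2|$ is

$$ n = \frac{(\sigma_1^2 + \sigma_2^2)(Z_\alpha + Z_\beta)^2}{\Delta^2} $$

For two-sided tests use $Z_{\alpha/2}$ instead of $Z_\alpha$.

With unequal sample sizes, one sample size is taken to be a fixed multiple k of the other. For example, if k = 2 then we are assuming we want the sample size from population 1 to be twice as large as the sample size from population 2.

For two-sided tests use $Z_{\alpha/2}$ instead of $Z_\alpha$.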
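The equal and unequal allocation cases can be sketched together. The code below assumes the convention n1 = k·n2 (so k = 2 makes sample 1 twice as large as sample 2) and derives n2 from Var(x̄1 – x̄2) = σ1²/n1 + σ2²/n2; that convention, the function name `n_two_means`, and the standard deviation of 17.5 are all assumptions made for illustration, not values taken from the lecture.

```python
from math import ceil
from statistics import NormalDist

def n_two_means(delta, sigma1, sigma2, k=1.0, alpha=0.05, power=0.80, two_sided=True):
    """Return (n1, n2) with n1 = k * n2 for comparing two independent means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    # Var(xbar1 - xbar2) = sigma1^2/(k*n2) + sigma2^2/n2, solved for n2
    n2 = ceil((sigma1**2 / k + sigma2**2) * (z_alpha + z_beta)**2 / delta**2)
    n1 = ceil(k * n2)
    return n1, n2

# Hypothetical inputs: common SD of 17.5, difference to detect of 5
equal = n_two_means(delta=5, sigma1=17.5, sigma2=17.5)          # k = 1
unequal = n_two_means(delta=5, sigma1=17.5, sigma2=17.5, k=2)   # n1 = 2 * n2
print(equal, unequal)
```

With equal variances and k = 1 this reduces to the equal-sample-size formula. Note that the 2:1 allocation needs more subjects in total than the 1:1 allocation for the same power.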

Among women 35-39 years of age, suppose we wish to compare the mean systolic blood pressure of nonpregnant, premenopausal oral contraceptive (OC) users to nonpregnant, premenopausal non-OC users.

Let $\mu_1$ and $\mu_2$ denote the mean systolic blood pressures of the two groups; then,

Ho: $\mu_1 = \mu_2$

Ha: $\mu_1 \ne \mu_2$ (two-sided alternative)

Suppose researchers would like to have a power of 80% to detect a difference of 5 mmHg between these two population means at the α = .05 level. Also, as they anticipate it will be much easier to find non-OC users in this age group, they have decided to sample twice as many non-OC users as OC users in conducting this study.

What sample sizes ($n_1$ and $n_2$) should they use?

A small pilot study is conducted where a sample of 21 nonpregnant, premenopausal non-OC users and a sample of 8 nonpregnant, premenopausal, OC-users between the ages of 35-39 is taken. The following results from this pilot study were obtained:

Assuming unequal variances and that we want twice as many non-OC users as OC users, we plug the pilot study information into the unequal-sample-size formulae to determine the sample sizes for the non-OC users and for the OC users.

Assuming equal variances and equal sample sizes, we can compute a pooled estimate of σ from the pilot study.

Hence we would use equal sample sizes of 193 per group.

The inputs for the sample size calculation in JMP, from the problem set-up on the previous slides:

- α = .05
- Standard deviation = pooled estimate of σ
- Difference to detect = 5
- Power = .80

JMP returns a COMBINED sample size of 2n = 388, in other words 194 subjects per group for a two-sided test. For a one-sided test we would change the α input, as in the single-mean example.

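For testing,

Ho: $p_1 = p_2$ vs. Ha: $p_1 \ne p_2$ (or a one-sided alternative)

Assuming values for $p_1$ & $p_2$, giving a difference to detect $\Delta = |p_1 - p_2|$, and wishing to conduct a size $\alpha$ test with power $1 - \beta$ using equal sample sizes, the sample size for each group is given by:

$$ n = \frac{\left[ Z_\alpha \sqrt{2\bar{p}\bar{q}} + Z_\beta \sqrt{p_1 q_1 + p_2 q_2} \right]^2}{\Delta^2} $$

with $\bar{p} = (p_1 + p_2)/2$, $\bar{q} = 1 - \bar{p}$, $q_i = 1 - p_i$, and using $Z_{\alpha/2}$ for a two-sided alternative.

For unequal sample sizes, where one sample size is a fixed multiple k of the other, the corresponding formula uses a weighted average of $p_1$ and $p_2$ for $\bar{p}$.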

The breast cancer rate in women between the ages 45 – 49 is 150 cases per 100,000 amongst currently disease free individuals. We wish to conduct a study to determine if ingesting large doses of vitamin E in capsule form will prevent breast cancer. The study will be set up with a control group (placebo) and a treatment group taking a vitamin E supplement. Researchers expect a 20% reduction in risk for those taking the vitamin E supplement.

How large should the equal sample sizes be if a two-sided test with a significance level of α = .05 is used and a power of 80% is desired?

$p_1$ = .0015 = the breast cancer rate for control group

$p_2$ = .0012 = the breast cancer rate for vitamin E group

Plugging in these quantities gives about 235,000 women per group!!

The inputs for this example into the sample size calculator for Two Proportions are:

- α = .05
- Proportions = .0015 and .0012
- Power = .80

JMP returns equal sample sizes for each group = 235,100.
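The size of this number is easy to verify with a short sketch of the equal-sample-size, normal-approximation formula for two proportions (pooled variance for the critical value, unpooled for the power term). The function name `n_two_proportions` is illustrative, and JMP's own algorithm may differ slightly.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-group n for comparing two independent proportions."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    p_bar = (p1 + p2) / 2                      # average proportion under Ho
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Vitamin E example: control rate 150 per 100,000, 20% reduction under treatment
p_control = 150 / 100_000
p_vitamin_e = 0.8 * p_control
n_per_group = n_two_proportions(p_control, p_vitamin_e)
print(n_per_group)
```

The result is on the order of 235,000 per group, in line with the hand calculation and the JMP output. The required n is so large because the event is rare and the difference to detect (.0003) is tiny.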

For testing,

Ho: $\mu_d = 0$ vs. Ha: $\mu_d > 0$ (or $\mu_d < 0$)

Assuming the variance for the paired differences or the change in the response is $\sigma_d^2$, the number of pairs needed to detect a mean difference $\Delta$ is

$$ n = \frac{\sigma_d^2 (Z_\alpha + Z_\beta)^2}{\Delta^2} $$

For two-sided tests use $Z_{\alpha/2}$ instead of $Z_\alpha$.

Again we need to have a “guesstimate” for the population standard deviation of the paired differences ($\sigma_d$).

Estimation Strategies:

- Use results from similar research in the literature, at least with the same response. (Best approach!)
- Use a small pilot study to obtain a sample standard deviation of the paired differences, which estimates $\sigma_d$.
- Estimated Range/(4 or 6) – guess what the observed range would likely be and divide by 4 or 6. (Sketchy)
- Pick a “worst” case value for $\sigma_d$; if this value is too large then n will be too large as a result, but power goals will be achieved. (Upper bound)

More on “guesstimating”

Another way to estimate $\sigma_d$ is to consider the variation of the two populations being sampled using dependent samples. It is probably safe to assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, and as we are using dependent samples it is reasonable to assume the responses being measured are correlated. If we let this correlation be denoted $\rho$ then one can show

$$ \sigma_d^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2 = 2\sigma^2(1 - \rho) $$

if we assume $\sigma_1 = \sigma_2 = \sigma$.

Thus in order to estimate $\sigma_d$ we need information about the variation of the response being measured ($\sigma$) and the correlation ($\rho$). Again a small pilot study or literature review might yield reasonable values for these unknown quantities.
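A minimal sketch of the correlation-based route follows. The helper names and the input values (σ = 20, ρ = 0.8, difference of 5, power .95) are hypothetical and chosen only to show the mechanics; they are not pilot study estimates.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sd_of_differences(sigma, rho):
    """SD of paired differences when both responses have SD sigma and correlation rho."""
    return sqrt(2 * sigma**2 * (1 - rho))

def n_paired(delta, sigma_d, alpha=0.05, power=0.95, two_sided=True):
    """Normal-approximation number of pairs needed to detect a mean difference delta."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(sigma_d**2 * (z_alpha + z_beta)**2 / delta**2)

sigma_d = sd_of_differences(sigma=20, rho=0.8)          # hypothetical inputs
pairs = n_paired(delta=5, sigma_d=sigma_d)
pairs_weaker_corr = n_paired(delta=5, sigma_d=sd_of_differences(20, 0.5))
print(round(sigma_d, 2), pairs, pairs_weaker_corr)
```

The stronger the correlation between the paired responses, the smaller $\sigma_d$ and the fewer pairs are needed, which is the main attraction of dependent samples.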

In the review lecture we examined the results from a study that looked at the change in systolic and diastolic blood pressure of patients 30 minutes after taking the drug Captopril. If we treat this small study (n = 15 patients) as a pilot study, what sample size should we use in a larger study if we wish to detect a change of 5 mmHg in systolic blood pressure with a power of .95 or 95%?

As this is clearly a pre-test vs. post-test situation, we have dependent samples.

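[Figure: pilot study results for the n = 15 patients, annotated as showing roughly equal variation before and after treatment. The distribution of the paired differences (sysdiff) could also be used to obtain the standard deviation of the paired differences.]

The required sample size can then be computed in two ways: using the standard deviation of the paired differences from the pilot study, or using the correlation-based method.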

McNemar’s Test is used to compare two proportions when the samples from the two populations have been drawn dependently.

This will involve either:

- looking at a binary or dichotomous outcome before and after “treatment” on the same subjects
- or by matching individuals in one population to those in another population on a one-to-one basis according to some criterion used to establish similarity.

Recall that the basic set-up for McNemar’s Test is:

                        Population 2
                        Yes     No
Population 1    Yes      a       b
                No       c       d

We wish to compare the proportion of subjects with a certain trait in each population, namely Ho: $p_1 = p_2$ vs. any usual alternative (Ha: $p_1 \ne p_2$, $p_1 > p_2$, or $p_1 < p_2$).

As $p_1 = (a + b)/n$ and $p_2 = (a + c)/n$, if the null is true we expect the discordant pairs to be equally distributed, i.e. b = c. Put another way, the probability of a discordancy of either type, Yes/No or No/Yes, is ½.

If we wish to show $p_1 > p_2$ we expect b > c, or in other words we expect the discordancy where the subject in population 1 has the trait but the subject in population 2 does not to be more common, i.e. its probability among discordant pairs is greater than ½.

Conversely, if we wish to show $p_1 < p_2$ we expect b < c, or in other words we expect the discordancy where the subject in population 2 has the trait but the subject in population 1 does not to be more common, i.e. its probability among discordant pairs is greater than ½.

For a two-tailed test we are testing whether the probability that a discordant pair is of the Yes/No type differs from ½.

To determine the sample size needed we need to specify this probability for a specific alternative, i.e. something other than ½. We will call this $p_A$.

In addition, for the test to have sufficient power, we need to have a fairly large overall sample (n) so that the number of discordant pairs is large enough! This means we need to specify the probability that an observed pair will be discordant; we will call this $p_D$.

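The sample size calculation requires these inputs.

For testing,

Ho: $p_1 = p_2$ (or, equivalently, that a discordant pair is equally likely to be of either type)

Ha: $p_1 \ne p_2$ (or, equivalently, that one type of discordant pair is more likely than the other)

the calculation gives the number of pairs n required to achieve power 1 – β with significance level α, which would mean n subjects if it is a pre-test/post-test situation, and 2n subjects if it is a matched-pairs situation.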

Suppose we wish to compare two treatments in terms of presence of infection one-week after treatment. We will match patients based on age and past history of infection. We expect that in 85% of patients the response to the treatments will be the same, that is both will have no infection present one-week later or both will still have an infection.

Furthermore, for matched pairs in which there is a difference in the response to treatment, it is estimated that in two-thirds of the pairs the treatment 1 subject will still have an infection and the treatment 2 subject will not. Thus in one-third of the pairs the treatment 2 subject will still have an infection and the treatment 1 subject will not.

What sample size is needed to have a 90% chance of detecting a difference this large when using a two-sided test?

The inputs needed are:

- probability that a matched pair is discordant = 1 – .85 = .15
- probability that a discordant pair is one where the treatment 1 subject is still infected and the treatment 2 subject is not = 2/3
- Power = .90
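As a rough sketch of how such inputs can be turned into a sample size, the code below applies a standard normal-approximation argument: test whether the discordancy probability equals ½ using only the discordant pairs, then divide by the probability that a pair is discordant. The names `p_a` and `p_d`, the function name, and the choice of α = .05 (the example does not state a level) are assumptions made for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_pairs_mcnemar(p_a, p_d, alpha=0.05, power=0.90, two_sided=True):
    """Approximate number of pairs for McNemar's test.

    p_a: probability that a discordant pair is of the first type (Ho: p_a = 1/2)
    p_d: probability that a pair is discordant
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    numerator = (z_alpha + 2 * z_beta * sqrt(p_a * (1 - p_a))) ** 2
    return ceil(numerator / (4 * (p_a - 0.5) ** 2 * p_d))

# Infection example: 15% of pairs discordant, two-thirds of those of the first type
n_pairs = n_pairs_mcnemar(p_a=2 / 3, p_d=0.15)
n_subjects_matched = 2 * n_pairs   # matched pairs: two subjects per pair
print(n_pairs, n_subjects_matched)
```

Under these assumptions roughly 600 matched pairs (about 1,200 subjects) are needed, because only about 15% of the pairs are expected to be discordant and therefore informative.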

- Some software packages do power calculations post-hoc, that is, after conducting a statistical test, treating the observed difference between the sample results and hypothesized values as the difference to detect. We will call this post-hoc power (PHP).
- Variation information is of course available because we can calculate it from our observed data.
- The power can be calculated, but the question is what does it show?

- Generally if we failed to reject the null, the PHP will be less than .50 or 50%. If we reject the null it will be greater than .50 or 50%.
- One can show the PHP is a function of the p-value so what new information is gained?
- We could in theory decide which sample size we would need to use in the future to find our observed difference significant, but truth be told, with enough data ANY null hypothesis will be rejected!

- The best approach is to decide what is the clinically meaningful difference that you want to have a good chance to detect.
- Estimate variation and unknown quantities to the best of your abilities.
- Compute the sample size needed to achieve the desired power and proceed.

- We have reviewed sample size calculations for achieving a specified power in one-sample and two-sample inference for means and proportions.
- Covered both independent and dependent situations for two-sample inference.
- We saw that many of the calculations require information about unknown parameters that pertain to variation of the population(s).

- Literature review and pilot studies are the preferred method for obtaining information about unknown variation parameters.
- Picking large values when in doubt will produce sample sizes that are probably larger than necessary.
- When independent sample size calculations produce sample sizes that are “too” large, consider using dependent samples if possible.

- Post-hoc power (PHP) should be used cautiously and viewed for what it is. It will not help you prove the null is TRUE.
- There are power formulae for other testing & modeling situations, e.g. one-way ANOVA, but I think we have looked at enough.
- You should have some understanding of the issues and what is required to use the formulae for these other situations.

- On D2L I have posted a Tegrity Recording that demonstrates the Sample Size and Power calculator in JMP.
- It shows how to use it to obtain sample sizes like those we have discussed in this lecture.
- It also shows how to use it to look at the relationship between Power and Sample Size.
- I will also demonstrate how to do post-hoc Power calculations, which as I stated at the end of this lecture really don’t serve a valuable purpose.