1. Sample Size and Power Calculations Marcia A. Ciol
04/09/08
2. What resources do I need? How long will it take to conduct the study?
I need 50 participants in my study
About 5 individuals per year will be enrolled
Therefore, it will take 10 years to finish the study
How much money do I need?
I will follow a cohort of 500 individuals
A lab test that costs US$100 will be conducted for each person
Therefore, I will need US$50,000 just for lab tests
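The two resource estimates above are simple arithmetic, sketched below. The function names are hypothetical and chosen only for illustration.

```python
import math

def years_to_enroll(target_n, enrolled_per_year):
    """Years of enrollment needed to reach the target sample size (rounded up)."""
    return math.ceil(target_n / enrolled_per_year)

def lab_cost(cohort_size, cost_per_test):
    """Total cost of running one lab test per cohort member."""
    return cohort_size * cost_per_test

print(years_to_enroll(50, 5))  # 50 participants at 5 per year
print(lab_cost(500, 100))      # 500 people at US$100 per test
```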
3. Am I going to reach my objective? I have 2 years to finish my thesis, of which one year is for data collection
I think I can get data on 50 people in that year
Is 50 a sufficient number of people to test my hypothesis with the significance level I want?
4. Why calculate sample size and power? To show that under certain conditions, the hypothesis test has a good chance of showing a desired difference (if it exists)
To show to the funding agency that the study has a reasonable chance to obtain a conclusive result
To show that the necessary resources (human, monetary, time) will be minimized and well utilized
5. What do I need to know to calculate sample size? Most Important: sample size calculation is an educated guess
It is appropriate for studies involving hypothesis testing
There is no magic involved; only statistical and mathematical logic and some algebra
Researchers need to know something about what they are measuring and how it varies in the population of interest
6. Factors related to the sample size Population factor (cannot be controlled by researcher)
Characteristics of the study design
Quantities related to the research question (defined by the researcher)
Many intertwined factors enter the calculation of the sample size or power of a study. Some factors depend on the design of the study, others on the investigator's choices, and others on the data themselves.
The first consideration is the type of response variable. The study design and the response variable determine the type of statistical method used in the data analysis. For example, if the data are continuous and the means of two groups are being compared, a t-test may be the appropriate method of analysis. The t-test then defines the formula for the sample size or power calculation.
The second set of factors depends on the investigator's choices. He/she needs to define the acceptable levels of significance and power for the study. In practice, the sample size may be dictated more by availability and/or resources than by what is necessary to achieve a certain power.
The third factor is the variation of the data in the population of interest. Intuitively, the higher the variation of the data (this may include measures of variance and correlation among observations), the larger the sample size will have to be to give us enough confidence that we have a good estimate of the mean, for example.
The last five items in the slide above are related to each other through the formula defined by the specific statistical test used in the study. If one knows four of those values, the fifth is determined. Therefore, some values will have to come from previous studies and/or knowledge, or will have to be assumed.
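To illustrate how fixing four quantities determines the fifth, here is a minimal sketch that solves for power given the sample size per group, the difference to detect, the standard deviation, and the significance level. It assumes a two-sided comparison of two means with equal group sizes and a common standard deviation, and it uses a normal approximation rather than the exact t distribution. The function name `approx_power` and the input values are illustrative assumptions, not part of the original slides.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, delta, sd, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Normal approximation; assumes equal group sizes and a common SD.
    The negligible probability of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # critical value for the chosen alpha
    se = sd * sqrt(2 / n_per_group)     # standard error of the difference in means
    return z.cdf(abs(delta) / se - z_crit)

# Four quantities fixed (n, difference, SD, alpha); the fifth (power) follows.
for n in (10, 16, 25):
    print(n, round(approx_power(n, delta=15, sd=15), 3))
```

Holding power fixed instead and solving for the sample size is the same relationship rearranged.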
7. Where do we get this knowledge? Previous published studies
Pilot studies
If information is lacking, there is no good way to calculate the sample size!
8. Population factor Variance of the measure (outcome) within the population
11. Study Design
12. Quantities related to the research question (defined by the researcher)
13. Quantities related to the research question (defined by the researcher)
14. Example: test of difference of means in two populations Researcher fixes probabilities of type I and II errors
Prob (type I error) = Prob (reject H0 when H0 is true) = α
Smaller error → greater precision → need more information → need larger sample size
Prob (type II error) = Prob (do not reject H0 when H0 is false) = β
Power = 1 − β
More power → smaller error → need larger sample size
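The direction of these trade-offs can be sketched numerically. The example below assumes a two-sided comparison of two means with equal group sizes, a normal approximation, and a hypothetical standardized difference of 0.5 (difference divided by standard deviation). The function name and all numeric inputs are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(std_diff, alpha, power):
    """Approximate per-group sample size for a two-sided, two-sample comparison.

    std_diff is the difference to detect divided by the common SD.
    Normal approximation; an exact t-based calculation gives slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / std_diff ** 2)

# Smaller alpha or larger power both push the required sample size up.
for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.01, 0.80), (0.01, 0.90)]:
    print(f"alpha={alpha}, power={power}: n per group = {n_per_group(0.5, alpha, power)}")
```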
15. Example: test of difference of means in two populations The equation for sample size is derived from the equation for the statistical test
In a t-test the equation for the test is
$$
t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
$$
The derived equation for sample size is
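As a sketch of what such a derivation yields (not necessarily the exact form shown on the slide), a commonly used normal-approximation formula for the number of participants per group is given below. It assumes a two-sided test, equal group sizes, and a common standard deviation $\sigma$; the symbols $\Delta$, $z_{1-\alpha/2}$, and $z_{1-\beta}$ are introduced here for illustration.

$$
n_{\text{per group}} \approx \frac{2\,\sigma^2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\Delta^2},
\qquad \Delta = \mu_1 - \mu_2
$$

Here $z_p$ is the $p$-th quantile of the standard normal distribution. The formula shows the relationships described earlier: $n$ grows with the variance, grows as $\alpha$ or $\beta$ shrink, and shrinks as the difference to be detected grows.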
16. Using PASS: t-test example Question: does exercise help to decrease body weight?
Study design: participants will be randomized into two groups (exercise and control)
Outcome: change in weight
Want to detect: a change of at least 15 pounds
Known: from past studies, the standard deviation varies between 10 and 15 pounds.
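The slides run this example in PASS. As a rough cross-check, the sketch below applies a normal-approximation formula to the same inputs (difference of 15 pounds, standard deviation of 10 or 15 pounds). The significance level of 0.05 and power of 0.80 are assumptions, since the values used in the original example are not given here, and the function name is hypothetical. An exact t-based calculation, as specialized software would perform, gives somewhat larger numbers for samples this small.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Assumes equal group sizes and a common SD; alpha and power defaults
    are assumed values, not taken from the slides.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (sd / delta) ** 2 * (z_alpha + z_beta) ** 2)

# The range of plausible SDs from past studies gives a range of sample sizes.
for sd in (10, 15):
    print(f"SD = {sd} lb: about {n_per_group(delta=15, sd=sd)} participants per group")
```

Running the calculation at both ends of the plausible range for the standard deviation shows how sensitive the answer is to that input.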
28. Other Types of Hypothesis Tests Different methods of data analysis require different input for sample size calculations
29. Cox Regression (Survival analysis)
30. Logistic Regression
31. Repeated measures
32. Simple designs may not require complex calculations Read chapter 2 of Statistical Rules of Thumb, by Gerald van Belle (2002, John Wiley and Sons)
Using specialized software is useful if many calculations will be performed
33. Important to remember Pilot studies do not need sample size calculation!!!
There is no point in doing power analysis after the study is done
Sample size is an educated guess, and it works only if:
The study samples come from the same or similar populations to the pilot study populations
The population of interest is not changing over time
The difference or association being studied exists
34. How about Effect Size? Most common definition
$$
E = \frac{\mu_1 - \mu_2}{s_{\text{pooled}}}
$$
If we change the value of E, how do we know what we changed in the formula?
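The question above can be made concrete with a small sketch: very different combinations of mean difference and pooled standard deviation produce the same E, so a value of E alone does not say which quantity changed. The numbers below are hypothetical and chosen only for illustration.

```python
def effect_size(mean_diff, pooled_sd):
    """Standardized effect size: difference in means divided by the pooled SD."""
    return mean_diff / pooled_sd

# (mean difference, pooled SD) pairs on the original measurement scale
scenarios = [(15.0, 15.0), (10.0, 10.0), (2.0, 2.0)]

for diff, sd in scenarios:
    print(f"difference = {diff}, SD = {sd}, E = {effect_size(diff, sd)}")
```

All three scenarios share the same E, although a difference of 2 and a difference of 15 may have very different clinical meaning.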
35. Some situations I have encountered Question: How many more people do I need to enroll in the study (already in progress) to show statistical significance?
Answer: It depends
If the two populations have the same mean, increasing the sample size will not help!
Since when is the objective of a study to find a statistically significant result??
36. Some situations I have encountered Researcher is interested in outcome A, which differs very little for two treatments
Sample size needed is around 3000!!
Researcher changes the outcome to B, for which the required sample size is smaller
B does not answer the researcher's question, and he needs to accept that his new treatment is not really different (clinically speaking) from the existing treatment
37. Some situations I have encountered Researcher is interested in comparing two groups regarding prediction of outcome A by using a regression analysis (using several variables)
He uses the only available formula from his statistical book (for a t-test)
Wrong! He should find software that can calculate the sample size appropriately
38. Summary Define research question well
Consider study design, type of response variable, and type of data analysis
Decide on the type of difference or change you want to detect (make sure it answers your research question)
Choose α and β
Use the appropriate equation for the sample size calculation