How to Learn Everything You Ever Wanted to Know About Biostatistics

Daniel W. Byrne

Director of Biostatistics and Study Design

General Clinical Research Center

Vanderbilt University Medical Center

The presenter has no financial interests in the products mentioned in this talk.

- To provide a 1-hour overview of the important practical information that a clinical investigator needs to know about biostatistics to be successful.

I. You Will Need the Right Tools

- I recommend SPSS for Windows.
- Bring an 1180 for $80 to Karen Montefiori in 143 Hill Student Center (3-1630).
- She will lend you the SPSS CD for the day and you can install this software easily.

SPSS is the 2nd most popular package.

It is much easier to use than SAS and Stata.

- Instat by GraphPad – graphpad.com
- for summary data analysis - $100

- True Epistat by Epistat Services – true-epistat.com - $395
- for random number table, etc.

- CIA (Confidence Interval Analysis) – bmj.com
- for confidence intervals - $35.95 with book
- “Statistics with Confidence” D. Altman

- If you can afford to spend $400, buy nQuery Advisor – statistical solutions - www.statsol.com
- If you can afford to spend $0, download PS from the Vanderbilt web site –
- http://www.mc.vanderbilt.edu/prevmed/ps/index.htm

- Both packages are on the CRC’s statistical workstation in room A-3101. VUMC investigators are welcome to use this workstation.

II. You Will Need a Plan

- State the problem
- Formulate the null hypothesis
- Design the study
- Collect the data
- Interpret the data
- Draw conclusions

- Among patients hospitalized for a hip fracture who develop pneumonia during their stay in the hospital, the mortality rate is 2.3 times higher at non-trauma centers compared with trauma centers
- (48.7% vs. 21.1%, P=0.043.)

- It is not clear if, or how, those who will develop pneumonia could be identified on admission.

- Among patients hospitalized for treatment of a hip fracture, there are no factors known upon admission that are statistically different between those who develop pneumonia during their stay and those who do not.

- For the same reason that we assume that a person is innocent until proven guilty.
- The burden of responsibility is on the prosecutor to present enough evidence to convince the members of a jury that the charges are true and to change their minds.
- Outcome after treatment with Drug A will not be significantly different from placebo.

- Data on 933 patients with a hip fracture from a New York trauma registry will be analyzed.
- The 58 patients with pneumonia will be compared with the 875 without pneumonia.

The Most Common Type of Flaw

- A control group is asked,
- “Two weeks ago from today, did you eat X for breakfast?”

- Two weeks after their MI, patients are asked
- “Did you eat X for breakfast on the day of your heart attack?”

- You can prove any food causes an MI using this method (X=bacon, X=Flintstone vitamins, etc.)

- “Study design and bias are much more important than complex statistical methods.”
- Devote more time to improving the study design, and minimizing and measuring bias.
- Become an expert at study design issues and biases in your area of research.

- Power
- Beta
- Alpha
- Sample size
- Ratio of treated to control group
- Measure of outcome

- See Table 9-1 in the handout
- “Sample Size Requirements for Each of Two Groups”.
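The inputs listed above (alpha, power, and the expected outcome in each group) can be turned into a sample size with a standard normal-approximation formula. The sketch below is an illustration only, not the method behind Table 9-1 or the PS and nQuery packages. It assumes a two-sided test, a 1:1 ratio of treated to control, and a yes/no outcome. The 10% and 5% rates are hypothetical inputs.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group (1:1 ratio) to detect p1 vs p2 with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 when power = 0.80
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Hypothetical planning values: 10% event rate in one group, 5% in the other.
n_80 = sample_size_two_proportions(0.10, 0.05, power=0.80)
n_90 = sample_size_two_proportions(0.10, 0.05, power=0.90)
print("n per group at 80% power:", n_80)
print("n per group at 90% power:", n_90)
```

Raising the power or shrinking the difference you want to detect increases the required sample size, which is why these inputs should be settled before data collection starts.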

- See the handouts for:
- ITEC Trauma Systems Study

III. You Will Need Data Management Skills

- For small projects enter data into Microsoft Excel or directly into SPSS.
- For large projects, create a database with Microsoft Access.
- Keep variable names in the first row, with <=8 characters and no internal spaces.
- Enter as little text as possible and use codes for categories, such as 1=male, 2=female.

IV. You Will Need to Learn Descriptive Statistics

- Descriptive statistics summarize your group.
- average age 78.5, 89.3% white.

- Inferential statistics use the theory of probability to make inferences about larger populations from your sample.
- White patients were significantly older than black and Hispanic patients, P<0.001.

- Check the lowest and highest value for each variable.
- For example, age 1-777.

- Look at histograms to detect typos.
- Cross-check variables to detect impossible combinations.
- For example, pregnant males, survivors discharged to the morgue, patients in the ICU for 25 days with no complications.

- The age of 777 should be checked and changed to the correct age.
- Suspicious values, such as an age of 106 should be checked. In this case it is correct.
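The range checks and cross-checks described above are easy to automate. The following is a minimal sketch in Python rather than SPSS. The records, field names, and age cutoffs are hypothetical, and sex is coded 1=male, 2=female as suggested earlier in the talk.

```python
# Hypothetical records; sex is coded 1=male, 2=female.
records = [
    {"id": 1, "age": 78, "sex": 2, "pregnant": 0},
    {"id": 2, "age": 777, "sex": 1, "pregnant": 0},  # typo in age
    {"id": 3, "age": 106, "sex": 2, "pregnant": 0},  # unusual but possible
    {"id": 4, "age": 34, "sex": 1, "pregnant": 1},   # impossible combination
]


def check_record(rec, max_age=120, review_age=100):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    if not 0 <= rec["age"] <= max_age:
        problems.append("age out of range")
    elif rec["age"] >= review_age:
        problems.append("age unusual, verify against chart")
    if rec["sex"] == 1 and rec["pregnant"] == 1:
        problems.append("pregnant male")
    return problems


# Step 1: look at the lowest and highest value of the variable.
ages = [rec["age"] for rec in records]
print("age range:", min(ages), "-", max(ages))

# Step 2: list every record that needs to be checked against the source.
flagged = {}
for rec in records:
    problems = check_record(rec)
    if problems:
        flagged[rec["id"]] = problems
        print(rec["id"], problems)
```

Flagged values are only candidates for correction. As with the age of 106, a suspicious value may turn out to be right.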

V. You Will Need to Learn Inferential Statistics

- A P value is an estimate of the probability that results such as yours could have occurred by chance alone if there truly was no difference or association.
- P < 0.05 = less than a 5% chance, 1 in 20.
- P < 0.01 = less than a 1% chance, 1 in 100.
- Alpha is the threshold. If P is below this threshold, you consider the result statistically significant.

- Based on the total number of observations and the size of the test statistic, one can determine the P value.

- Test statistic & sample size (degrees of freedom) convert to a probability or P Value.
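As a small illustration of how a test statistic converts to a P value, the sketch below handles two simple cases with the Python standard library: a z statistic (two-sided) and a chi-square statistic with 1 degree of freedom. The function names are made up for this example. Statistics from other tests, such as t or F, need their own distributions and are not covered here.

```python
from math import erfc, sqrt
from statistics import NormalDist


def p_from_z(z):
    """Two-sided P value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def p_from_chi2_1df(chi2):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


p_z = p_from_z(1.96)
p_chi2 = p_from_chi2_1df(3.841)
print(f"z = 1.96 gives P = {p_z:.3f}")
print(f"chi-square = 3.841 (1 df) gives P = {p_chi2:.3f}")
```

Both calls land at the conventional 0.05 threshold, because a chi-square statistic with 1 degree of freedom is the square of a z statistic (1.96 squared is about 3.84).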

- There are hundreds of statistical tests.
- A clinical researcher does not need to know them all.
- Learn how to perform the most common tests on SPSS.
- Learn how to use the statistical flowchart to determine which test to use.

VI. You Will Need to Understand the Statistical Terminology Required to Select the Proper Inferential Test

- Univariate analysis usually refers to one predictor variable and one outcome variable
- Is gender a predictor of pneumonia?

- Multivariate analysis usually refers to more than one predictor variable or more than one outcome variable being evaluated simultaneously.
- After adjusting for age, is gender a predictor of pneumonia?

- Some tests are designed to assess whether there are statistically significant differences between groups.
- Is there a statistically significant difference between the age of patients with and without pneumonia?

- Some tests are designed to assess whether there are statistically significant associations between variables.
- Is the age of the patient associated with the number of days in the hospital?

- Some statistical tests are designed to assess groups that are unmatched or independent.
- Is the admission systolic blood pressure different between men and women?

- Some statistical tests are designed to assess groups that are matched or data that are paired.
- Is the systolic blood pressure different between admission and discharge?

- Categorical vs. continuous variables
- If you take the average of a continuous variable, it has meaning.
- Average age, blood pressure, days in the hospital.

- If you take the average of a categorical variable, it has no meaning.
- Average gender, race, smoker.

- Nominal - categorical
- gender, race, hypertensive

- Ordinal - categories that can be ranked
- none, light, moderate, heavy smoker

- Interval - continuous
- blood pressure, age, days in the hospital

- Nominal
- Did this horse come in first place?
- 0=no, 1=yes

- Ordinal
- In what position did this horse finish?
- 1=first, 2=second, 3=third, etc.

- Interval (scale)
- How long did it take for this horse to finish?
- 60 seconds, etc.

- Parametric statistical tests can be used to assess variables whose histogram has a “normal,” symmetrical, bell-shaped distribution.
- Nonparametric statistical tests can be used to assess variables that are skewed or nonnormal.
- Look at a histogram to decide.

VII. You Will Need to Know Which Statistical Test to Use

- See the handout, Figure 16-1, pages 78-79.

- 1. Chi-square
- 2. Logistic regression
- 3. Student's t-test
- 4. Fisher's exact test
- 5. Cox proportional-hazards
- 6. Kaplan-Meier method
- 7. Wilcoxon rank-sum test
- 8. Log-rank test
- 9. Linear regression analysis
- 10. Mantel-Haenszel method

- 11. One-way analysis of variance (ANOVA)
- 12. Mann-Whitney U test
- 13. Kruskal-Wallis test
- 14. Repeated-measures analysis of variance
- 15. Paired t-test
- 16. Chi-square test for trend
- 17. Wilcoxon signed-rank test
- 18. Analysis of variance (two-way)
- 19. Spearman rank-order correlation
- 20. Analysis of covariance (ANCOVA)

Chi-square

- The most commonly used statistical test.
- Used to test if two or more percentages are different.
- For example, suppose that in a study of 933 patients with a hip fracture, 10% of the men (22/219) develop pneumonia compared with 5% of the women (36/714).
- What is the probability that this could happen by chance alone?
- Univariate, difference, unmatched, nominal, >=2 groups, n>=20.
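The pneumonia-by-gender example can be worked by hand. The sketch below computes the Pearson chi-square for a 2 by 2 table from the counts quoted above (22 of 219 men, 36 of 714 women). It assumes no continuity correction, so SPSS output that applies one will differ slightly. The function name is made up for this example.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 degree of freedom
    return chi2, p


# Rows: men, women. Columns: pneumonia, no pneumonia.
chi2, p = chi_square_2x2(22, 219 - 22, 36, 714 - 36)
print(f"chi-square = {chi2:.2f}, P = {p:.4f}")
```

With these counts the statistic is a little over 7 and the P value is below 0.01, so a difference this large would be unlikely to arise by chance alone.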

Fisher's exact test

- This test can be used for 2 by 2 tables when the number of cases is too small to satisfy the assumptions of the chi-square.
- Total number of cases is <20 or
- The expected number of cases in any cell is <1 or
- More than 25% of the cells have expected frequencies <5.
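For small tables the exact probability can be computed directly from the hypergeometric distribution. This is a minimal sketch of a two-sided exact test for a 2 by 2 table. It sums the probabilities of all tables that are no more likely than the observed one, which is one common convention for the two-sided P value. The counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided exact P value for the 2 by 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Probability of k cases in the top-left cell with all margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small tolerance so tables exactly as likely as the observed one are included.
    return sum(
        prob(k) for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Hypothetical table with only 8 cases, too few for a chi-square test.
p_small = fisher_exact_two_sided(3, 1, 1, 3)
print(f"P = {p_small:.4f}")
```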

Chi-square test for trend

- Used to assess a nominal variable and an ordinal variable.
- Does the pneumonia rate increase with the total number of comorbidities?
- Univariate, association, nominal.
- Analyze, Descriptive Statistics, Crosstabs.

Mantel-Haenszel method

- Used to assess a factor across a number of 2 by 2 tables.
- Is the mortality rate associated with pneumonia different between trauma centers and nontrauma centers?
- Analyze, Descriptive Statistics, Crosstabs.

Student's t-test

- Used to compare the average (mean) in one group with the average in another group.
- Is the average age of patients significantly different between those who developed pneumonia and those who did not?
- Univariate, Difference, Unmatched, Interval, Normal, 2 groups.
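To show what the test computes, here is a sketch of the pooled-variance t statistic for two independent groups. The ages are hypothetical, not the registry data. The Python standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom. The P value would come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t(x, y):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical ages for illustration only.
ages_pneumonia = [84, 79, 88, 91, 76, 85]
ages_no_pneumonia = [72, 80, 69, 77, 81, 74]

t_stat, df = student_t(ages_pneumonia, ages_no_pneumonia)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

For 10 degrees of freedom the two-sided 0.05 critical value is 2.228, so a t statistic larger than that in absolute value would be called statistically significant.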

Mann-Whitney U test

- Same as the Wilcoxon rank-sum test
- Used in place of the Student’s t-test when the data are skewed.
- A nonparametric test that uses the rank of the value rather than the actual value.
- Univariate, Difference, Unmatched, Interval, Nonnormal, 2 groups.
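The idea of using ranks instead of actual values can be sketched as follows. The example computes the U statistic and a two-sided P value from the normal approximation. It assumes no tie correction, and with samples this small the approximation is rough, so a statistics package that reports an exact P value is preferable. The lengths of stay are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def mann_whitney_u(x, y):
    """U statistic and two-sided P value (normal approximation, no tie correction)."""
    nx, ny = len(x), len(y)
    combined = ranks(list(x) + list(y))
    u_x = sum(combined[:nx]) - nx * (nx + 1) / 2
    u = min(u_x, nx * ny - u_x)
    mean_u = nx * ny / 2
    sd_u = sqrt(nx * ny * (nx + ny + 1) / 12)
    z = (u - mean_u) / sd_u  # never positive, because u is the smaller U
    return u, min(1.0, 2 * NormalDist().cdf(z))


# Hypothetical, skewed lengths of stay (days) in two groups.
stay_a = [4, 5, 6, 7, 30]
stay_b = [8, 9, 10, 12, 45]

u_stat, p_value = mann_whitney_u(stay_a, stay_b)
print(f"U = {u_stat}, approximate P = {p_value:.3f}")
```

Note that the outliers of 30 and 45 days count only as high ranks, which is why the test is less affected by skewed data than a comparison of means.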

Paired t-test

- Used to compare the average for measurements made twice within the same person - before vs. after.
- Used to compare a treatment group and a matched control group.
- For example, Did the systolic blood pressure change significantly from the scene of the injury to admission?
- Univariate, Difference, Matched, Interval, Normal, 2 groups.

Wilcoxon signed-rank test

- Used to compare two skewed continuous variables that are paired or matched.
- Nonparametric equivalent of the paired t-test.
- For example, “Was the Glasgow Coma Scale score different between the scene and admission?”
- Univariate, Difference, Matched, Interval, Nonnormal, 2 groups.

Analysis of Variance (ANOVA)

One-way ANOVA is used to compare 3 or more means from independent groups.

“Is the age different among White, Black, and Hispanic patients?”

Two-way ANOVA is used to compare 2 or more means by 2 or more factors.

“Is the age different between males and females, with and without pneumonia?”

Kruskal-Wallis test

- Used to compare continuous variables that are not normally distributed between more than 2 groups.
- Nonparametric equivalent to the one-way ANOVA.
- Is the length of stay different by ethnicity?
- Analyze, nonparametric tests, K independent samples.

Repeated-measures analysis of variance

- Used to assess the change in 2 or more continuous measurements made on the same person. Can also compare groups and adjust for covariates.
- Do changes in the vital signs within the first 24 hours of a hip fracture predict which patients will develop pneumonia?
- Analyze, General Linear Model, Repeated Measures.

Pearson correlation

- Used to assess the linear association between two continuous variables.
- r=1.0 perfect correlation
- r=0.0 no correlation
- r=-1.0 perfect inverse correlation

- Univariate, Association, Interval
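The three reference values of r listed above can be reproduced with a few lines of code. This sketch computes the Pearson correlation coefficient from its definition. The small data sets are made up so that r comes out at exactly 1, -1, and 0.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r_perfect = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])   # points on a rising line
r_inverse = pearson_r([1, 2, 3, 4], [40, 30, 20, 10])   # points on a falling line
r_none = pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 1, 2])    # no linear association
print(r_perfect, r_inverse, r_none)
```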

Spearman rank-order correlation

- Used to assess the relationship between two ordinal variables or two skewed continuous variables.
- Nonparametric equivalent of the Pearson correlation.
- Univariate, Association, Ordinal (or skewed).
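One way to see why the rank-based coefficient suits skewed data: it is the ordinary correlation coefficient applied to the ranks of the values. The sketch below follows that definition. The data are hypothetical, with one extreme value that would distort a correlation computed on the raw numbers.

```python
from math import sqrt


def ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman rank-order correlation: the correlation coefficient of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y always rises with x, but not along a straight line.
comorbidities = [1, 2, 3, 4, 5]
days_in_hospital = [1, 4, 9, 16, 100]

rho = spearman_rho(comorbidities, days_in_hospital)
print(f"rho = {rho:.2f}")
```

Because the ordering of the two variables agrees perfectly, rho is 1 even though the relationship is far from linear.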

Summary of Inferential Tests

Student’s t-test

Chi-square

One-way ANOVA

Mann-Whitney U test

Kruskal-Wallis H test

Paired t-test

McNemar’s test

Repeated-measures

Wilcoxon signed-rank

Friedman ANOVA

Student’s t-test

One-way ANOVA

Paired t-test

Pearson correlation

Correlated F ratio (repeated-measures ANOVA)

Mann-Whitney U test

Kruskal-Wallis test

Wilcoxon signed-rank

Spearman’s r

Friedman ANOVA

- Always check your results with a nonparametric test.
- If you test your null hypothesis with a Student’s t-test, also check it with a Mann-Whitney U test.
- It will only take an extra 25 seconds.

VIII. You Will Need to Understand Regression Techniques

Linear regression analysis

- Used to assess how one or more predictor variables can be used to predict a continuous outcome variable.
- “Do age, number of comorbidities, or admission vital signs predict the length of stay in the hospital after a hip fracture?”
- Multivariate, Association, Interval/Ordinal dependent variable.

Logistic regression

- Used to assess the predictive value of one or more variables on an outcome that is a yes/no question.
- “Do age, gender, and comorbidities predict which hip fracture patients will develop pneumonia?”
- Multivariate, Difference, Nominal dependent variable, not time-dependent, 2 groups.

- We reject the null hypothesis.
- Patients who are at high risk of developing pneumonia during their hospitalization for a hip fracture can be identified by:
- total number of pre-existing conditions
- cirrhosis
- COPD
- male gender

- Z=-4.899 + (number of comorbidities x 0.469) + (cirrhosis x 2.275) + (COPD x 0.714) + (age x 0.021) + (gender[female=1, male=0] x –0.715)
- e=2.718
- Example: an 80-year-old male with cirrhosis and one other comorbidity (but not COPD) had a 49.9% chance of developing pneumonia.
- Z=-4.899 + (2 x 0.469) + (1 x 2.275) + (0 x 0.714) + (80 x 0.021) + (0 x –0.715) = -0.006, so P = 1/(1 + e^-Z) = 0.499
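The arithmetic for the example patient can be checked in a few lines. The coefficients below are copied from the equation above. The conversion from Z to a probability assumes the standard logistic transform P = 1/(1 + e^-Z), which the slide implies by listing e but does not spell out. The function and variable names are made up for this sketch.

```python
from math import exp

# Coefficients taken from the regression equation in the talk.
COEFFICIENTS = {
    "intercept": -4.899,
    "comorbidities": 0.469,
    "cirrhosis": 2.275,
    "copd": 0.714,
    "age": 0.021,
    "female": -0.715,  # gender coded female=1, male=0
}


def pneumonia_risk(comorbidities, cirrhosis, copd, age, female):
    """Return (Z, probability), assuming P = 1 / (1 + e^-Z)."""
    z = (
        COEFFICIENTS["intercept"]
        + comorbidities * COEFFICIENTS["comorbidities"]
        + cirrhosis * COEFFICIENTS["cirrhosis"]
        + copd * COEFFICIENTS["copd"]
        + age * COEFFICIENTS["age"]
        + female * COEFFICIENTS["female"]
    )
    return z, 1 / (1 + exp(-z))


# 80-year-old male with cirrhosis and one other comorbidity, no COPD.
z_example, p_example = pneumonia_risk(comorbidities=2, cirrhosis=1, copd=0, age=80, female=0)
print(f"Z = {z_example:.3f}, probability = {p_example:.3f}")
```

Under that assumption Z is about -0.006 and the predicted probability is just under 0.50. For comparison, e raised to this Z is about 0.994.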

- Kaplan-Meier method
- Used to plot cumulative survival

- Log-rank test
- Used to compare survival curves

- Cox proportional-hazards
- Used to adjust for covariates in survival analysis
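As an illustration of how cumulative survival is built up, the sketch below implements the product-limit calculation behind a Kaplan-Meier curve. The follow-up times and the event coding (1=died, 0=censored) are hypothetical. It covers only the curve itself, not the log-rank test or Cox regression.

```python
def kaplan_meier(times, events):
    """Return [(time, cumulative survival)] at each time where a death occurred.

    events uses 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        leaving = sum(1 for tt, _ in data if tt == t)             # deaths + censored at t
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve


# Hypothetical follow-up in days; patients 3 and 5 were censored.
follow_up = [2, 3, 3, 5, 8, 9]
died = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(follow_up, died)
for day, surv in km_curve:
    print(f"day {day}: cumulative survival {surv:.3f}")
```

Censored patients leave the group at risk without lowering the curve, which is what distinguishes this method from a simple proportion of survivors.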

Odds and Ends You Will Need

- A 95% confidence interval is an estimate that you make from your sample as to where the true population value lies.
- If your study were to be repeated 100 times, you would expect the 95% CIs to contain the true value for the population in about 95 of these 100 studies.
- the value might be a mean, percentage or RR

- Confidence intervals should be included in publications for the major findings of the study.
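A confidence interval for a percentage can be sketched with the simple normal-approximation (Wald) formula. The example uses the 58 pneumonia cases among 933 hip fracture patients mentioned earlier in the talk. The Wald method is an assumption of this sketch. Packages such as CIA offer other methods that behave better for small samples or extreme percentages.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(events, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = events / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    half_width = z * sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


rate, low, high = proportion_ci(58, 933)
print(f"pneumonia rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```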

- Prevalence
- How many of you now have the flu?

- Incidence
- How many of you have had the flu in the past year?

- Random is not the same as haphazard, unplanned, incidental.
- Allocating patients to the treatment group on even days and to the control group on odd days is systematic – not random.
- Random refers to the idea that each element in a set has an equal probability of occurrence.
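By contrast with the even-day and odd-day scheme, a random allocation list can be generated in advance by computer. This is a minimal sketch, not a full trial randomization system. It assumes two arms of equal size and shuffles a balanced list of assignments, so every ordering of the list is equally likely. The patient IDs and the seed are hypothetical.

```python
import random


def randomize(patient_ids, seed=None):
    """Assign patients to two arms by shuffling a balanced list of assignments."""
    rng = random.Random(seed)  # a recorded seed makes the list reproducible
    arms = ["treatment", "control"] * (len(patient_ids) // 2)
    if len(patient_ids) % 2:
        arms.append(rng.choice(["treatment", "control"]))
    rng.shuffle(arms)
    return dict(zip(patient_ids, arms))


allocation = randomize(list(range(1, 21)), seed=2024)
for patient_id, arm in allocation.items():
    print(patient_id, arm)
```

In practice the list should be prepared and concealed by someone who is not enrolling patients, so that the next assignment cannot be predicted.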

- See the handout, Table 3-2, pages 18-19.
- “Checklist to Be Used by Authors When Preparing or by Readers When Analyzing a Report of a Randomized Controlled Trial”.

IX. You Will Need to Continue Learning About Statistics

- Kuzma – Statistics in the Health Sciences
- Norusis – Data Analysis with SPSS
- Altman – Statistics with Confidence
- Friedman – Fundamentals of Clinical Trials
- Pagano – Principles of Biostatistics
- Encyclopedia of Biostatistics
- SPSS manuals

Future Workshops

- Oct 11 - How to use wireless hand-helds for clinical research (Paul St Jacques, MD, Anesthesiology)
- Oct 18 - How to conduct ANOVA statistical tests - Part 1/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)
- Oct 25 - How to conduct ANOVA statistical tests - Part 2/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)
- Nov 1 - How to conduct ANOVA statistical tests - Part 3/3 (Ayumi Shintani, PhD, MPH, Center for Health Services Research)
- Nov 8 - How to write a data and safety-monitoring plan (Harvey Murff, MD)

X. One Final Skill You Will Need to Master

- “No – this is comparing apples and oranges!”