Review of Methods from Prerequisite Course Assuming exposure to all of the content fromSTAT 601 – Statistical Methods for Healthcare Research
Presentation Outline • Review of variable types • Review will cover both descriptive and inferential methods • Methods for numeric (or possibly ordinal) response variables • Methods for categorical (or possibly ordinal) response variables *Before viewing this presentation download and print the supplements!
Brief Review of Data Types There are three main data types with further subclasses within some of them. • Continuous – measurements or counts Important subclasses – discrete, continuous, ratio scale, & interval scale (Wiki these scales) • Ordinal – ordered categoriesMay be coded numerically and could be treated as such. • Nominal – unordered categories May also be coded numerically, BUT cannot be treated as such.
Brief Review of Data Types In JMP (and SPSS) these are the three classifications. In JMP (which we’ll use)… • Continuous variables are denoted: • Ordinal variables are denoted: • Nominal variables are denoted:
ICU Study – used in most examples • This study consists of 200 subjects who were admitted to an adult intensive care (ICU). A major goal of this study was to predict the probability of survival to hospital discharge of these patients. (Lemeshow, Teres, Avrunin & Pastides, 1988) • Several measurements were taken at the time of admission and the ultimate survival of the patients was recorded.
ICU Study – used in most examples The variable descriptions and coding are found in this table. Comments:Notice that most of the information has been coded numerically, although only Age, Systolic BP, and Heart Rate are continuous. Some of the dichotomous variables have been created using continuous measurements (e.g. PO2, PH, PCO, etc.) The Level of Consciousness variable (LOC) could be treated as ordinal as the levels indicate increasing states of unresponsiveness.
Methods for a Numeric Response • Print this flowchart for reference (see website) • One population inference • Two population inference • More than two population inference • Covers both parametric and nonparametric methods.
One or Two or More Populations? • Is the study comparative in nature or are we making an inference about a single population? • Most studies are certainly comparative (i.e. multivariable) in nature! • However, we will review methods for a single numeric variable first.
Methods for a Single Numeric Variable Descriptive Methods • Visual Descriptions • Histogram • Boxplots • Stem Leaf Plots (archaic) • Cumulative Distribution Plots (CDF) • Normal Quantile Plots • Numeric Descriptions • Measures of central tendency • Measures of variation • Measures of relative standing • Measures of distributional shape
Plots for a Single Numeric Variable CDF Plot - shows P(X < x) vs. x e.g. P(X < 100) = .60 or 60% chance a patient’s heart rate is less or equal to 100 bpm at admission to ICU. • Visual Summaries of Heart Rate @ Admission (ICU Study) • Histogram • Boxplots (outlier and quantile) • Normal quantile plot • CDF plot
Summary Statistics for a Numeric Variable • Measure of Central Tendency • Mean, Median, Mode (3 M’s) • - mode is not unique! • Trimmed Mean (5%) – mean with the 5% of the obs. trimmed off the tails. • Geometric Mean - mean in the log-scale transformed back to original scale. Good measure for skewed right data!
Summary Statistics for a Numeric Variable • Measure of Relative Standing • Quantiles/Percentiles – values such that k% of the observations are less and (100-k)% are greater. • Quartiles – specific percentiles • Q1 – first quartile (25th percentile) • Q2 – second quartile (median) • Q3 – third quartile (75th percentile) • Measures of Shape • Skewness– measures degree of skewness of the distribution. If the distribution is symmetric (e.g. normal) then Skewness is 0. If Skewness > 0 then distribution is skewed to the right, if Skewness < 0 then distribution is skewed to the left. • Kurtosis – measures degree of kurtosis. If the distribution is approx. normal the kurtosis is zero. If it is positive the distribution has heavier tails than a normal distribution (outliers on each end) and if it negative the distribution has thinner tails than a normal distribution and more observations near the mean. (Wiki kurtosis for pictures)
Parametric Inference for the Population Mean (m) Assuming either the outcome comes from a normally distributed population or if the sample size is sufficiently “large”. Test Statistic Sample size required for margin of error (E) with Confidence Interval for m 95% confidence
Example: Heart Rate of ICU patients Output from JMP The upper-tail test p-value = .00000238 or (p < .0001), thus we have strong evidence to suggest that patients admitted to the adult ICU have a mean heart rate that would be considered high (i.e. m > 90 bpm). Furthermore we estimate that the mean resting heart rate of adults admitted to the ICU is between 95.18 bpm and 102.67 bpm with 95% confidence.
Nonparametric Inference for a Single Numeric Variable If the outcome/response does NOT come from a normally distributed population or if the sample size is NOT sufficiently “large”. To test the general hypothesis that in the population of patients admitted to the adult ICU have elevated/high resting heart rates we could use the Wilcoxon Signed-Rank Test as an alternative to the t-Test. • Form differences and drop any that are 0. • Compute the signed rank statistics . • Compare the smaller of these to the critical values from a Wilcoxon Signed-Rank Test table. • Better yet, use statistical software!
Nonparametric Inference for a Single Numeric Variable The upper-tail p-value from Wilcoxon Signed-Rank Test is (p < .0001) thus we conclude that the median heart rate of the population of patients admitted to the adult ICU is considered high (above 90 bpm). The Wilcoxon Signed-Rank Test is used to make inferences about the population median rather than the mean.
Comparing a Continuous Response Between Two Populations • When comparing a numeric response between two populations we must first consider the sampling scheme or experiment that generated the data, namely were the two samples drawn independently or dependently? • For dependent samples, there is a one-to-one correspondence between an individual in one population to an individual in the other.e.g. Pre-test vs. Post-test situations
More on Dependent Samples • Pre-test vs. Post-test, e.g. Before treatment vs. After treatment (i.e. subjects = blocks) • Comparing different treatments using the same subjects, e.g. pain relievers used on the same subjects (again subjects = blocks) • Matched subjects in the two populations according to some criteria, e.g. matched patients on basis of age, race, gender, socioeconomic status, weight, height, existing health conditions, etc. (Note: Need to be careful here!)
Example 1: Captopril & Systolic Blood Pressure • Research Question:Is there evidence that patients will experience a mean decrease in systolic blood pressure of more than 10 mmHg? • Experiment:Measure the blood pressure of 15 patients before and after taking Captopril. Our interest is on the measured changes in blood pressure and whether or not we believe that those changes have a mean greater than 10 mmHg.
Example 1: Captopril & Systolic Blood Pressure Once the paired differences have been formed we simply treat them as a single numeric response and make inferences accordingly. Summary Statistics
Parametric Inference for the Mean Paired Difference (md) Assuming either the paired differences come from a normallydistributed population or if the sample size (i.e. # of pairs) is sufficiently “large”. Test Statistic t-distribution df = n - 1 Confidence Interval for md the hypothesized difference under the null hypothesis. Typically this will be 0! Note: These formulae are the same as those for single population mean (m)!
Example 1: Captopril & Systolic Blood Pressure • Research Question:Is there evidence that patients will experience a mean decrease in systolic blood pressure of more than 10 mmHg? • HYPOTHESES , mean decrease in systolic blood pressure 30 minutes following taking Captopril is not greater than 10 mmHg. , mean decrease in systolic blood pressure 30 minutes following taking Captopril is greater than 10 mmHg.
Example 1: Captopril & Systolic Blood Pressure We have evidence to suggest that the mean decrease in systolic blood pressure 30 minutes after taking Captopril is more than 10 mmHg (p = .0009). Furthermore, we estimate the mean decrease is between 13.93 mmHg and 23.93 mmHg with 95% confidence.
Nonparametric Inference for Paired Differences Use if the paired differences do NOT come from a normally distributed population or if the sample size (# of pairs) is NOT sufficiently “large”. To test the general hypothesis that the change in systolic blood pressure is more than 10 mmHg we could use the Wilcoxon Signed-Rank Test as an alternative to the paired t-Test. • Form paired differences and subtract 10, dropping any that are 0. If simply testing for a difference we would not subtract 10. • Compute the signed rank statistics . • Compare the smaller of these to the critical values from a Wilcoxon Signed-Rank Test table. • Better yet, use statistical software!
Nonparametric Inference for Paired Differences We have evidence to suggest the median change in systolic blood pressure 30 minutes following taking Captopril is more than 10 mmHg (p = .0010).
Nonparametric Inference for Paired Differences • Another nonparametric option is to use the Sign Test. • For the Sign Test we simply looks at the number of positive and negative paired differences and computes the p-value using a binomial distribution with n = # of pairs and p = .50. • This should only be used if the response is difficult to measure or is ordinal !
Independent Samples Comparison of Two Population Means • Forindependent samples we are either: - drawing samples from two existing populations (i.e. observational study), e.g. males & females, smokers & non-smokers. - randomly allocating subjects into two populations (i.e. experiment), e.g. treatment vs. placebo, therapy A vs. therapy B, etc.
Independent Samples Comparison of Two Population Means • Analysis of these two situations is the same, although the conclusions reached may differ (i.e. association vs. causation). • This an example of a bivariate analysis,Y = response (continuous, possibly ordinal)X = population identifier (nominal) • If the response is normally distributed or if both sample sizes are “large” we can use a parametric approach.
Example: Heart Rate and Type of Admission Type of admission (TYP) 1 = ER 0 = non-ER The heart rate at admission appears higher for those admitted through the ER, about 10 bpm higher on average. This apparent difference could be due to chance variation however! Heart rate is approximately normally distributed for both samples. Variation in the heart rates appear to be similar.
Example: Heart Rate and Type of Admission Type of admission (TYP) 1 = ER 0 = non-ER The separation between the CDF plots suggest a potential difference in the heart rate distributions for patients admitted to the adult ICU through the ER and those that were not. In particular, it looks like the heart rate of patients admitted through the ER have tendency to have higher heart rates.
Independent Samples Comparison of Two Population Means For testing equality of means Ho: m1 = m2or (m1 – m2) = 0 The possible alternatives are: Ha: m1 > m2or (m1 – m2) > 0 (upper-tailed) Ha: m1 < m2or (m1 – m2) < 0 (lower-tailed) Ha: 1 2or (m1 – m2) 0 (two-tailed) Note: If we wanted to establish that one mean was say e.g. at least 10 units larger than the other we could replace 0 in these statements by 10. In general to establish a difference of at least Dunits then we replace 0 by D.
Independent Samples Comparison of Two Population Means Test statistic ~ t-distribution (df) The standard error of the difference in the sample means and the degrees of freedom (df) are calculated two different ways depending on whether or not we assume the population variances are equal. Rule O’ Thumb:Assume variances are equal only if neither sample variance is more than twice that of the other sample variance.
Independent Samples Comparison of Two Population Means –Pooled t-Test where Test statistic Pooled estimate of the common variance to both populations, it is essentially a weighted average of the two sample variances. It is called pooled because both samples are combined (or pooled) to estimate the variance common to both populations. ~ t-distribution (df) Confidence Interval for ( ) The degrees of freedom for the associated test statistic is Assuming common variance
Independent Samples Comparison of Two Population Means –Welch’s t-Test where Test statistic ~ t-distribution (df) Always round down! Confidence Interval for ( ) Assuming , i.e. unequal variances The degrees of freedom for the associated test statistic is
Independent Samples Comparison of Two Population Means – Formally Testing Equality of the Population Variances Assumption • We can formally test the equality of the population variances rather than use the Rule O’ Thumb. • In some situations it may also be of interest to compare the population variances in addition to the population means. • HYPOTHESES (or we could use a one-tailed alternative) Test Statistic (for comparing two population variances) ~ F-distribution with respectively. Large F statistic value small p-value (Reject Ho) • There are several other tests for equality of variance.
Example: Heart Rate and Type of Admission Type of admission (TYP) 1 = ER 0 = non-ER The F-test for comparing population variances do not provide evidence of a significant difference in heart rate variation between the two groups of patients (p = .3992). None of the other tests (O’Brien, Brown-Forsythe, Levene, Bartlett) have significant p-values either. Given these results we could conduct a pooled t-Test to compare the mean heart rates.
Example: Heart Rate and Type of Admission Type of admission (TYP) 1 = ER 0 = non-ER The two-tailed p-value = .0131, thus we conclude there is a statistically significant difference in the population mean heart rates between these two populations of patients admitted to the adult ICU. Furthermore, we estimate that the mean heart rate for patients admitted to the adult ICU through the emergency room anywhere from 2.26 bpm to 19 bpm larger than the mean for those who were not admitted to the ICU through the emergency room. Note: order of subtraction 1-0, i.e. , i.e. ER mean – non-ER mean. The results from the confidence interval lend themselves to a brief discussion of the concept of practical significance and/or effect size (ES). While a difference in the means of 19 bpm seems physiologically meaningful, the same could not be said for the lower confidence limit which is roughly 2 bpm. We will examine the concepts of practical significance and effect size in more detail later in the course. The output from the non-pooled option (t-Test) is presented in exactly the same format.
Nonparametric Testing for Two Independent Samples • If the population distributions do not appear to be normally distributed or if the sample sizes are “small”, we may choose to use a nonparametric test to compare the size of the values from the two populations. • There a few options available but by far the most frequently used nonparametric test for comparing a numeric response across two populations is the Wilcoxon Rank Sum Test (also known as the Mann-Whitney Test). • The test utilizes the sum of the ranks assigned to observations from the two populations when the two samples are combined. Essentially the larger the difference in the rank sums when taking the sample sizes into account, the more evidence we have against equality of the two distributions in terms of the size of the values.
Nonparametric Testing for Two Independent Samples HYPOTHESES , i.e. the distribution of the two populations is essentially the same, particularly in terms of the size of the values. , i.e. the distributions of the two population is different, specifically we believe one distribution is shifted to the right or left of the other. Note: One-tailed alternatives are fine also, meaning we can specify which population has larger values than the other in the alternative. Here the alternative hypothesis states population A is shifted to the right of population B, i.e. population A has larger values than population B.
Example: Heart Rate and Type of Admission Type of admission (TYP) 1 = ER 0 = non-ER The Wilcoxon Rank Sum Test p-value = .0137, thus we conclude the two populations of patients differ in terms of their heart rate at admission to the adult ICU. In particular, we conclude those that were admitted to the adult ICU via the ER had higher heart rates in general than those not admitted through the ER.
Comparing a Continuous Response Between Three or More Populations • As with two populations comparisons, there are independent and dependent sampling schemes when comparing several populations. • Assuming normality and equality of population variances across populations both situations use a form of Analysis of Variance (ANOVA) to compare the means of the populations.
Comparing a Continuous Response Between Three or More Populations • We will cover ANOVA in more detail later in the course and review both one-way ANOVA and randomized block designs as part of that discussion. • For now we will look at an quick example of each.
Example: Age and Race (Descriptive Summaries) Race of Patient 1 = White 2 = Black 3 = Other Although this may not be of interest in this study, here we compare the ages of patients in this study across race classified as white, black, or other. White patients in the sample were the oldest with a mean age of 59, while the other two race groups have a mean age of around 47. The age distributions do appear to be left-skewed or kurtotic (i.e. non-normal) and the standard deviations differ enough that equality of variances may be suspect.
Example: Age & Race (Comparing Variances) Race of Patient 1 = White 2 = Black 3 = Other All four tests for equality of variance do provide statistically significant evidence of unequal population variances (p > .05). If these tests did suggest a problem with the equality of population variance assumption we could use Welch’s ANOVA (like the non-pooled t-Test) to determine if the mean ages differed across race.
Example: Age and Race (One-way ANOVA) Race of Patient 1 = White 2 = Black 3 = Other From the one-way ANOVA F-test we conclude that at least two population means differ (p = .0222). With only three populations controlling for the experiment-wise error rate using Tukey’s HSD is not vital, as there are only three possible pairwise comparisons (white vs. black, white vs. other, and black vs. other).
Example: Age & Race (Multiple Comparisons) Race of Patient 1 = White 2 = Black 3 = Other Using Tukey’s HSD we see that none of the pairwise comparisons suggest a difference between the population means (all p > .05). Two-sample t-Tests (pooled) not controlling for experiment-wise error rate (EER) Without controlling for EER we see that the mean ages of white and black patients differ significantly (p = .0283). However, the estimated difference in means covers a wide range 1.26 years to 22.24 years. On the low end of the confidence interval this difference is certainly inconsequential.
Example: Age and Race (Nonparametric Test and Multiple Comparisons) Race of Patient 1 = White 2 = Black 3 = Other The nonparametric alternative to the one-way ANOVA F-test is the Kruskal-Wallis test. We conclude the populations differ in terms of the age distributions (p = .0110). The nonparametric alternative to Tukey’s HSD is the Steel-Dwass Method which suggests that the age distributions between white and black patients significantly differ (p = .0268). Again the CI for the difference in typical ages is wide, from 1 year to 25 years, with the low end representing a very small difference.
Methods for a Numeric Response • We have just reviewed the following:: • One population inference • Two population inference • More than two population inference • Covered both parametric and nonparametric methods. • We will cover block designs and their analysis when we cover ANOVA in more detail later in the course.
Methods for a Categorical Response • For a dichotomous categorical response we covered many of the methods in the flow chart to the left in the prerequisite course. • A dichotomous response has two levels which we can generically classify as “success” or “failure” or “yes” or “no”. • We will cover more advanced methods for the analysis of categorical data later in the course. We will briefly review some these methods from the prerequisite course using the ICU study data and data from other studies.