INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 19, 2010. Acknowledgements: Marie Diener-West, Rick Thompson, ICTR Leadership / Team. Outline. Regression: Studying association between (health) outcomes and (health) determinants
INTRODUCTION TO CLINICAL RESEARCH
Introduction to Linear Regression
Karen Bandeen-Roche, Ph.D.
July 19, 2010
JHU Intro to Clinical Research
Regression: Studying association between (health) outcomes and (health) determinants
Correlation
Linear regression: Characterizing relationships
Linear regression: Prediction
Future topics: multiple linear regression, assumptions, complex relationships
July 2009
Study: 32 heart-lung transplant recipients aged 11-59 years
Some specific names for “correlation” in one’s data:
r
Sample correlation coefficient
Pearson correlation coefficient
Product-moment correlation coefficient
Correlation Analysis
Characterizes the extent of linear relationship between two variables, and the direction
How closely does a scatterplot of the two variables produce a non-flat straight line?
Exactly: r = 1 or r = -1
Not at all (e.g. flat relationship): r = 0
-1 ≤ r ≤ 1
Does one variable tend to increase as the other increases (r>0), or decrease as the other increases (r<0)
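The properties above can be checked with a short sketch of the sample correlation coefficient. The function name pearson_r and the toy numbers are illustrative assumptions, not the lecture's data or software.

```python
import math

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data (not the lung capacity data)
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [3, 5, 7, 9, 11])     # points exactly on an increasing line
r_down = pearson_r(x, [10, 8, 6, 4, 2])   # points exactly on a decreasing line
r_flat = pearson_r(x, [2, 1, 3, 1, 2])    # no linear trend in y as x increases

print(r_up, r_down, r_flat)
```

Under these assumptions the first two values sit at the ends of the allowed range and the third sits at zero, up to floating-point rounding. A perfectly constant y would make the denominator zero, so r is undefined in that case.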
Correlation: Lung Capacity Example
r = 0.865
Heuristic: If I draw a straight line through the middle of a scatterplot of y versus x, r divides the standard deviation of the heights of points on the line by the sd of the heights of the original points
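The heuristic can be illustrated numerically. The sketch below fits a least-squares line to hypothetical height and TLC values (invented for illustration, not the study data) and compares the ratio of standard deviations with r. The ratio recovers the size of r; the sign comes from the direction of the slope.

```python
import math
import statistics

# Hypothetical toy data: heights in cm and TLC in liters
x = [150, 155, 160, 165, 170, 175, 180]
y = [4.1, 5.6, 5.0, 6.8, 6.1, 8.3, 7.9]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)

# Least-squares line through the middle of the scatterplot
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * a for a in x]  # heights of points on the line

# sd of heights on the line divided by sd of the original heights
ratio = statistics.stdev(fitted) / statistics.stdev(y)

print(r, ratio)
```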
The value of r is independent of the units used to measure the variables
The value of r can be substantially influenced by a small fraction of outliers
The value of r considered “large” varies across scientific disciplines
Physics : r=0.9
Biology : r=0.5
Sociology : r=0.2
r is a “guess” at a population analog
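The first two properties in this list can be seen in a small sketch. The data and the added outlier are hypothetical, chosen only to make the effect visible. Rescaling both variables leaves r unchanged, while one discordant point out of eight lowers it sharply.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: heights in cm and TLC in liters
height_cm = [150, 155, 160, 165, 170, 175, 180]
tlc_l = [4.1, 5.6, 5.0, 6.8, 6.1, 8.3, 7.9]

# Same measurements in different units: inches and milliliters
height_in = [h / 2.54 for h in height_cm]
tlc_ml = [1000 * v for v in tlc_l]

r_original = pearson_r(height_cm, tlc_l)
r_rescaled = pearson_r(height_in, tlc_ml)

# One hypothetical outlier: a tall person with a very low TLC
r_outlier = pearson_r(height_cm + [185], tlc_l + [2.0])

print(r_original, r_rescaled, r_outlier)
```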
Aims to predict the value of a health outcome, Y, based on the value of an explanatory variable, X.
What is the relationship between average Y and X?
The analysis “models” this as a line
We care about “slope”—size, direction
Slope=0 corresponds to “no association”
How precisely can we predict a given person’s Y with his/her X?
Health outcome, Y
Dependent variable
Response variable
Explanatory variable (predictor), X
Independent variable
Covariate
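Linear regression – Relationship
[Figure: scatterplot of Y versus X with a straight line giving the mean of Y at each xi. β0 = intercept (mean Y at x = 0); β1 = Δy / Δx = slope; εi = deviation of yi from the line.]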
Linear regression – Sample inference
Linear regression – Lung capacity data
In Stata – “regress” command:
Syntax: “regress yvar xvar”
b0: TLC of -17.1 liters among persons of height = 0
If height is centered at 165 cm: TLC of 6.3 liters among persons of height = 165 cm
b1: On average, TLC increases by 0.142 liters per cm increase in height.
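A short arithmetic sketch shows how the two reported coefficients fit together. It takes the intercept as -17.1 liters, the sign that is consistent with the centered value of 6.3 liters at 165 cm. The function name mean_tlc is an illustrative assumption, and the inputs are the rounded values from the slides.

```python
# Reported (rounded) estimates from the lung capacity regression
b0 = -17.1   # intercept: estimated mean TLC (liters) at height = 0 cm
b1 = 0.142   # slope: change in mean TLC (liters) per cm of height

def mean_tlc(height_cm):
    """Estimated mean TLC on the fitted line at a given height (illustrative helper)."""
    return b0 + b1 * height_cm

# Centering height at 165 cm moves the intercept to the line's value at 165 cm
b0_centered = mean_tlc(165)

# The slope is unchanged by centering: a 10 cm difference in height
# corresponds to 10 * b1 liters difference in mean TLC
diff_10cm = mean_tlc(175) - mean_tlc(165)

print(round(b0_centered, 2), round(diff_10cm, 2))
```

The intercept at height = 0 lies far outside the observed heights, which is why the centered version is easier to interpret.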
Inference: p-value tests the null hypothesis that the coefficient = 0.
p-value for the slope is very small
We reject the null hypothesis of 0 slope (no linear relationship).
The data support a tendency for TLC to increase with height.
Inference: Confidence interval for coefficients; these both exclude 0.
We are 95% confident that the interval (0.111, 0.172) includes
the true slope. Data are consistent with an average increase in TLC
per cm of height ranging between 0.111 and 0.172 liters. The
data support a tendency for TLC to increase with height.
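As a rough consistency check, the reported interval can be worked backwards to an approximate standard error and t statistic for the slope. This sketch assumes the interval is the usual estimate plus or minus a t critical value times the standard error, with n - 2 = 30 degrees of freedom (n = 32) and a critical value of about 2.042. Because the inputs are rounded, the results are approximate and are not values taken from the original output.

```python
# Reported (rounded) slope estimate and 95% confidence interval
b1 = 0.142
ci_low, ci_high = 0.111, 0.172

# Assumption: 0.975 quantile of the t distribution with 30 degrees of freedom
t_crit = 2.042

half_width = (ci_high - ci_low) / 2
se_b1 = half_width / t_crit          # approximate standard error of the slope
t_stat = b1 / se_b1                  # approximate t statistic for H0: slope = 0

# A 95% interval that excludes 0 corresponds to rejecting H0 at the 5% level
excludes_zero = ci_low > 0 or ci_high < 0

print(round(se_b1, 4), round(t_stat, 1), excludes_zero)
```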
Linear regression – Prediction
[Figure: scatterplot of Y versus X with the fitted line used for prediction, ŷ = b0 + b1·x. b0 = intercept; b1 = Δy / Δx = slope.]
Inference: Confidence interval for coefficients; these both exclude 0.
R-squared = 0.748: 74.8% of variation in TLC is characterized
by the regression of TLC on height. This corresponds to a correlation
of sqrt(0.748) = 0.865 between predictions and actual TLCs. This is
a precise prediction.
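The link between R-squared and the correlation can be sketched in two steps: the square root of the reported value, and a check on hypothetical toy data (not the study data) that 1 - SSE/SST from a simple linear regression equals the squared correlation.

```python
import math

# Step 1: the reported R-squared and its square root
r_squared_reported = 0.748
r_from_r_squared = math.sqrt(r_squared_reported)

# Step 2: hypothetical toy data, heights in cm and TLC in liters
x = [150, 155, 160, 165, 170, 175, 180]
y = [4.1, 5.6, 5.0, 6.8, 6.1, 8.3, 7.9]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
sst = sum((b - mean_y) ** 2 for b in y)   # total variation in y

slope = sxy / sxx
intercept = mean_y - slope * mean_x
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained

r_toy = sxy / math.sqrt(sxx * sst)
r_squared_toy = 1 - sse / sst             # share of variation explained by the line

print(round(r_from_r_squared, 3), r_squared_toy, r_toy ** 2)
```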
To study how mean TLC varies with height…
Could dichotomize height at median and compare TLC between two height groups using a
two-sample t-test
[Could also multicategorize patients into four height groups (height quartiles), then do ANOVA to test for differences in TLC]
Another way to think of SLR
t-test generalization
Lung capacity example – two height groups
. ttest tlc, by(height_above_med) unequal

Two-sample t test with unequal variances

    Group |  Obs      Mean   Std. Err.  Std. Dev.   [95% Conf. Interval]
----------+--------------------------------------------------------------
<= Median |   16  5.024375   .3286601   1.314641    4.323853   5.724898
 > Median |   16  8.150625   .2974915   1.189966    7.516537   8.784713
----------+--------------------------------------------------------------
 combined |   32    6.5875   .3554756   2.010874    5.862503   7.312497
----------+--------------------------------------------------------------
     diff |       -3.12625   .4433043              -4.031973  -2.220527

Satterthwaite's degrees of freedom: 29.707

Ho: mean(Median o) - mean(Above Me) = diff = 0

  Ha: diff < 0          Ha: diff != 0         Ha: diff > 0
  t = -7.0522           t = -7.0522           t = -7.0522
  P < t = 0.0000        P > |t| = 0.0000      P > t = 1.0000
Lung capacity example – two height groups
Could approximately replicate this analysis with SLR of TLC on X, where X = 1 if height > median and X = 0 otherwise
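A minimal sketch of why this works: with a 0/1 predictor, the least-squares intercept is the mean of the X = 0 group and the slope is the difference between the two group means. The group values below are hypothetical; only the two reported group means come from the output above.

```python
import statistics

# Hypothetical TLC values (liters) for two height groups
low = [4.2, 5.1, 5.6, 4.9]    # height <= median, coded X = 0
high = [7.8, 8.4, 7.1, 9.0]   # height > median, coded X = 1

y = low + high
x = [0] * len(low) + [1] * len(high)

# Least-squares fit of y on the 0/1 indicator
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

group_diff = statistics.mean(high) - statistics.mean(low)

# Difference between the two group means reported in the t-test output
reported_diff = 8.150625 - 5.024375

print(slope, group_diff, intercept, reported_diff)
```

The match is exact for the estimates. The standard error from SLR corresponds to the pooled-variance t-test, so the unequal-variance test shown in the output is only approximately replicated.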
More advanced topics: Regression with more than one predictor
More advanced topics: Assumptions
More advanced topics: Linear Regression Assumptions
[Figure: Assumptions well met]
[Figure: Non-normal responses per X]
[Figure: Non-constant variability of responses per X]
[Figure: Lung capacity example]
More advanced topics: Types of relationships that can be studied
Regression: Studying association between (health) outcomes and (health) determinants
Correlation
Linear regression: Characterizing relationships
Linear regression: Prediction
Future topics: assumptions, model checking, complex relationships