Biostatistics in Practice Session 5: Associations and confounding. Youngju Pak, Ph.D. Biostatistician http://research.LABioMed.org/Biostat . Revisiting the Food Additives Study. From Table 3. Unadjusted. What does “adjusted” mean? How is it done?. Adjusted.
Youngju Pak, Ph.D.
Biostatistician
http://research.LABioMed.org/Biostat
From Table 3
Unadjusted
What does “adjusted” mean?
How is it done?
Adjusted
Earlier: Compare means for a single measure among groups.
Use t-test, ANOVA.
Session 5: Relate two or more measures.
Use correlation or regression.
ΔY/ΔX
Qu et al. (2005), JCEM 90:1563-1569.
Try to isolate the effects of different characteristics on an outcome.
Previous slide:
Gender
GH Peak
BMI
Standard English word "correlate": to establish a mutual or reciprocal relation between.
In statistics, it has a more precise meaning.
Correlation: measure of the strength of LINEAR association
Positive correlation: the two variables move in the same direction. As one variable increases, the other also tends to increase LINEARLY, and vice versa.
Example: Weight vs Height
Negative correlation: the two variables move in opposite directions. As one variable increases, the other tends to decrease LINEARLY, and vice versa (inverse relationship).
Example: Physical activity level vs. abdominal height (visceral fat)
r can be any value from -1 to +1
r = -1 indicates a perfect negative LINEAR relationship between the two variables
r = +1 indicates a perfect positive LINEAR relationship between the two variables
r = 0 indicates that there is no LINEAR relationship between the two variables
r expresses how well the data fit a straight line. Here, Pearson's r = 0.673.

Pearson's $r = \dfrac{\sum (X - X_{\text{mean}})(Y - Y_{\text{mean}})}{\sqrt{\sum (X - X_{\text{mean}})^2 \, \sum (Y - Y_{\text{mean}})^2}}$
Statistical software gives r.
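As a minimal sketch of what the software computes, the function below evaluates Pearson's r directly from the sums of deviations about the means. The function name `pearson_r` and the height and weight values are made up for illustration; they are not from the study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the two sample means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of cross-products of deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: root of the product of the summed squared deviations
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg), invented for this example
heights = [150, 160, 165, 172, 180, 188]
weights = [52, 58, 63, 70, 77, 90]
r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

A perfectly increasing straight-line relationship gives r = +1 and a perfectly decreasing one gives r = -1, regardless of the units of X and Y.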
[Figure: scatterplots A and B]
Graph B contains only the graph A points in the ellipse.
Correlation is reduced in graph B.
Thus: correlations for the same quantities X and Y may be quite different in different study populations.
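The effect of restricting the study population can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are simulated rather than taken from the graphs, and the restriction is a narrow band of X values instead of the ellipse used on the slide.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r from deviations about the two sample means."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable

# "Graph A": simulated population where Y rises linearly with X, plus noise
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [x + random.gauss(0, 0.5) for x in xs]

# "Graph B": keep only subjects whose X lies in a narrow central band
subset = [(x, y) for x, y in zip(xs, ys) if abs(x) < 0.5]
sub_x = [p[0] for p in subset]
sub_y = [p[1] for p in subset]

r_full = pearson_r(xs, ys)
r_restricted = pearson_r(sub_x, sub_y)
print(f"full population: r = {r_full:.2f}")
print(f"restricted population: r = {r_restricted:.2f}")
```

The underlying X-Y relationship (slope and noise) is identical in both sets; only the spread of X differs, and that alone lowers r.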
The fitted regression line minimizes $\sum e_i^2$, where $e_i$ is the vertical distance from point i to the line.
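A minimal sketch of the least-squares idea, using the standard closed-form solution for one predictor. The helper names and the five data points are hypothetical. The last lines check numerically that moving the line away from the fitted one only increases the sum of squared residuals.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sum_sq_resid(xs, ys, intercept, slope):
    """Sum of squared vertical distances e_i from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for this example
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
best = sum_sq_resid(xs, ys, b0, b1)

# Nudge the intercept and slope: every alternative line fits worse
worse = [
    sum_sq_resid(xs, ys, b0 + d0, b1 + d1)
    for d0 in (-0.5, 0.0, 0.5)
    for d1 in (-0.2, 0.0, 0.2)
    if (d0, d1) != (0.0, 0.0)
]
print(f"intercept = {b0:.2f}, slope = {b1:.2f}, sum of squares = {best:.3f}")
print("all nudged lines are worse:", all(w > best for w in worse))
```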
[Figure: regression line with the range for individuals and the range for the mean]
Statistical software gives all this info.
H0: true slope = 0 vs. Ha: true slope ≠ 0, with the rule:
Claim association (slope ≠ 0) if
|tc| = |slope / SE(slope)| > t ≈ 2.
With this rule, there is a 5% chance of claiming an X-Y association that really does not exist.
Note similarity to t-test for means:
tc = mean / SE(mean)
Formula for SE(slope) is in statistics books.
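For reference, a sketch of the usual textbook calculation for simple linear regression: SE(slope) is the residual standard deviation divided by the square root of the sum of squared X deviations. The function name and the data are hypothetical, and the cutoff of 2 is the rough value from the rule above; the exact 5% cutoff depends on the degrees of freedom, n - 2.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (slope, SE(slope), t) for the regression of ys on xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard deviation, with n - 2 degrees of freedom
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    return slope, se_slope, slope / se_slope

# Hypothetical data with a clear upward trend
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7]

slope, se, tc = slope_t_statistic(xs, ys)
claim_association = abs(tc) > 2  # rough cutoff from the rule above
print(f"slope = {slope:.3f}, SE = {se:.3f}, t = {tc:.1f}, claim = {claim_association}")
```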
The regression equation is: Ymean= 81.6 + 2.16 X
Predictor   Coeff    StdErr   T      P
Constant    81.64    11.47    7.12   <0.0001
X           2.1557   0.1122   19.21  <0.0001
S = 21.72   R-Sq = 79.0%
Predicted Values:
X: 100
Fit: 297.21
SE(Fit): 2.17
95% CI: 292.89 to 301.52
95% PI: 253.89 to 340.52
19.21 = 2.16/0.112 should be between ~ -2 and 2 if "true" slope = 0.
Refers to Intercept
Predicted y = 81.6 + 2.16(100)
Range of Ys with 95% assurance for:
Mean of all subjects with x=100.
Individual with x=100.
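The numbers in this output can be re-derived by hand, which makes the difference between the two ranges concrete. The sketch below uses only values printed in the output, with one assumption: the t multiplier of 1.985 is not shown in the output (the sample size is not given) and was chosen here because it is consistent with the reported intervals. The prediction-interval formula, which adds the individual scatter S² to the uncertainty of the fitted mean, is the standard one.

```python
import math

# Values copied from the regression output
intercept, slope = 81.64, 2.1557
se_slope = 0.1122
s = 21.72        # residual standard deviation S
se_fit = 2.17    # SE(Fit) at x = 100

# ASSUMPTION: t multiplier for 95% limits; not given in the output
t_mult = 1.985

x = 100
fit = intercept + slope * x
t_slope = slope / se_slope

# 95% CI: range for the MEAN of all subjects with x = 100
ci = (fit - t_mult * se_fit, fit + t_mult * se_fit)

# 95% PI: range for ONE individual with x = 100 (adds individual scatter)
se_pred = math.sqrt(s ** 2 + se_fit ** 2)
pi = (fit - t_mult * se_pred, fit + t_mult * se_pred)

print(f"fit = {fit:.2f}, t for slope = {t_slope:.2f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"95% PI: {pi[0]:.2f} to {pi[1]:.2f}")
```

The prediction interval is about ten times wider than the confidence interval here because S is about ten times SE(Fit): the mean is pinned down well, while individuals vary widely around it.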
We now generalize to prediction from multiple characteristics.
The next slide gives a geometric view of prediction from two factors simultaneously.
Suppose multiple predictors are continuous.
Geometrically, this is fitting a slanted plane to a cloud of points:
www.StatisticalPractice.com
LHCY is the Y (homocysteine) to be predicted from the two X’s: LCLC (folate) and LB12 (B12).
LHCY = b0 + b1LCLC + b2LB12 is the equation of the plane
LHCYmean= b0 + b1LCLC + b2LB12
Outcome
Predictors
LB12 may have both an independent and an indirect (via LCLC) association with LHCY
[Diagram: LCLC → LHCY (b1 ?), LB12 → LHCY (b2 ?), with a correlation between LCLC and LB12]
LHCY = b0 + b1LCLC + b2LB12
Outcome
Predictors
Mean LHCY increases by b2 for a 1-unit increase in LB12
… if other factors (LCLC) remain constant, or
… adjusting for other factors in the model (LCLC)
May be physiologically impossible to maintain one predictor constant while changing the other by 1 unit.
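To make "adjusting" concrete, here is a small sketch that fits the outcome on LB12 alone (unadjusted) and then on LCLC and LB12 together (adjusted). Everything in it is hypothetical: the data values, the coefficients 3.0, -0.5, and -0.2 used to generate LHCY, and the helper names. The data are noise-free so that the adjusted fit recovers the generating coefficients.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def least_squares(columns, y):
    """Fit y = b0 + b1*col1 + b2*col2 + ...; returns [b0, b1, b2, ...]."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical predictors; LB12 tends to be high when LCLC is high
lclc = [1, 2, 3, 4, 5, 6]
lb12 = [2, 1, 4, 3, 6, 5]
# Hypothetical outcome generated with b0 = 3.0, b1 = -0.5, b2 = -0.2
lhcy = [3.0 - 0.5 * c - 0.2 * b for c, b in zip(lclc, lb12)]

unadjusted = least_squares([lb12], lhcy)       # [b0, slope for LB12]
adjusted = least_squares([lclc, lb12], lhcy)   # [b0, b1, b2]

print(f"unadjusted LB12 slope: {unadjusted[1]:.3f}")
print(f"adjusted LB12 slope (b2): {adjusted[2]:.3f}")
```

The unadjusted slope is steeper than b2 because it also absorbs the part of the LCLC association that travels with LB12; including LCLC in the model separates the two.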
Determine the relative and combined explanatory power of age, gender, BMI, ethnicity, and sport type on the markers.
* Adjusted for age, gender, and BMI.
Output:
            Coefficient   Std Error   t       Pr > |t|
Intercept   1.16448       0.28804     4.04    <.0001
AGE         -0.00092      0.00125     -0.74   0.4602
BMI         0.01205       0.00295     4.08    <.0001
BLC         0.05055       0.02215     2.28    0.0239
PRSSY       0.00041       0.00044     0.95    0.3436
DIAST       0.00255       0.00103     2.47    0.0147
GLUM        0.00046       0.00018     2.50    0.0135
SKINF       0.00147       0.00183     0.81    0.4221
LCHOL       0.31109       0.10936     2.84    0.0051
The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The equation is:
Log(HDL) mean = 1.16 - 0.00092(Age) + … + 0.311(LCHOL)
www.StatisticalPractice.com
So far, our predictors were all measured over a continuum, like age or concentration.
This is simply called multiple regression.
When some predictors are grouping factors like gender or ethnicity, regression has other special names:
ANOVA
Analysis of Covariance
(Trt. – control)_Female – (Trt. – control)_Male
(Trt. – control)_Female and (Trt. – control)_Male