Biostatistics in Practice Session 5: Associations and confounding

Biostatistics in Practice Session 5: Associations and confounding. Youngju Pak, Ph.D. Biostatistician . Revisiting the Food Additives Study. From Table 3. Unadjusted. What does “adjusted” mean? How is it done?. Adjusted.

Youngju Pak, Ph.D.


Revisiting the Food Additives Study

From Table 3


What does “adjusted” mean?

How is it done?


Goal One of Session 5

Earlier: Compare means for a single measure among groups.

Use t-test, ANOVA.

Session 5: Relate two or more measures.

Use correlation or regression.



Qu et al. (2005), JCEM 90:1563-1569.

Goal Two of Session 5

Try to isolate the effects of different characteristics on an outcome.

Previous slide:


GH Peak



Standard English word “correlate”:

a: to establish a mutual or reciprocal relation between; b: to show correlation or a causal relationship between

In statistics, it has a more precise meaning


Correlation in Statistics

Correlation: measure of the strength of LINEAR association

Positive correlation: the two variables move in the same direction. As one variable increases, the other also tends to increase LINEARLY, and vice versa.

Example: Weight vs. Height

Negative correlation: the two variables move in opposite directions. As one variable increases, the other tends to decrease LINEARLY, and vice versa (inverse relationship).

Example: Physical activity level vs. abdominal height (visceral fat)


Pearson r correlation coefficient

r can be any value from -1 to +1

r = -1 indicates a perfect negative LINEAR relationship between the two variables

r = 1 indicates a perfect positive LINEAR relationship between the two variables

r = 0 indicates that there is no LINEAR relationship between the two variables


Anemic women: Anemia.sav n=20

r expresses how well the data fit a straight line. Here, Pearson’s r = 0.673.

Logic for Value of Correlation





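Pearson’s r is the sum of cross-products of the deviations from the two means, scaled so that it lies between -1 and +1:

$$ r = \frac{\sum (X - X_{mean})(Y - Y_{mean})}{\sqrt{\sum (X - X_{mean})^2 \, \sum (Y - Y_{mean})^2}} $$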

Statistical software gives r.
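As a rough illustration of what the software computes, here is a minimal sketch of Pearson’s r written out by hand. The function name `pearson_r` and the three small data sets are made up for this example; they are not from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: sum of cross-products divided by the root of the product of sums of squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # exactly linear, increasing
y_down = [10, 8, 6, 4, 2]   # exactly linear, decreasing
y_u = [4, 1, 0, 1, 4]       # exact U shape: a strong relation, but not a linear one

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
r_u = pearson_r(x, y_u)
print(r_up, r_down, r_u)
```

The U-shaped case shows why r is described as a measure of LINEAR association only: Y is completely determined by X there, yet the positive and negative cross-products cancel.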

Correlation Depends on Ranges of X & Y



Graph B contains only the graph A points in the ellipse.

Correlation is reduced in graph B.

Thus: correlations for the same quantities X and Y may be quite different in different study populations.
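The effect of a restricted range can be sketched with simulated data. Everything below is an assumption made for illustration: the data are synthetic (Y = X + noise), and "graph B" is imitated by simply keeping points whose X falls in a narrow band rather than inside an ellipse.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)

# Synthetic "graph A": the same X-Y relation holds for every point.
xs = [random.gauss(0, 1) for _ in range(2000)]
ys = [x + random.gauss(0, 1) for x in xs]
r_full = pearson_r(xs, ys)

# Synthetic "graph B": only the points whose X lies in a narrow range.
subset = [(x, y) for x, y in zip(xs, ys) if abs(x) < 0.5]
r_restricted = pearson_r([x for x, _ in subset], [y for _, y in subset])

print(f"full range: r = {r_full:.2f}; restricted range: r = {r_restricted:.2f}")
```

The underlying relation between X and Y is identical in both sets; only the spread of X differs, which is enough to lower r.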

Simple Linear Regression (SLR)
  • X and Y now assume unique roles:
  • Y is an outcome, response, output, dependent variable.
  • X is an input, predictor, explanatory, independent variable.
  • Regression analysis is used to:
    • Measure more than X-Y association, as with correlation.
    • Fit a straight line through the scatter plot, for:
    • Prediction of Ymean from X.
    • Estimation of Δ in Ymean for a unit change in X = rate of change of Ymean per unit change in X (slope = regression coefficient, which measures the “effect” of X on Y).
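The two uses listed above, prediction and estimation of the slope, can be sketched with a hand-written least-squares fit. The helper name `fit_line` and the five data points are invented for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for Ymean = intercept + slope * X."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx                      # change in Ymean per unit change in X
    intercept = y_mean - slope * x_mean    # the line passes through (x_mean, y_mean)
    return intercept, slope

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(x, y)
predicted_at_6 = intercept + slope * 6     # prediction of Ymean at X = 6
print(intercept, slope, predicted_at_6)
```

In practice statistical software does this fit; the sketch only makes explicit that the slope is the ratio of the X-Y cross-products to the spread of X.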
SLR Example




[Figure: SLR example plot; labels: “Range for individuals” and “Range for mean”.]

Statistical software gives all this info.

Hypothesis testing for the true slope=0

H0: true slope = 0 vs. Ha: true slope ≠0, with the rule:

Claim association (slope ≠ 0) if tc = |slope/SE(slope)| > t ≈ 2.

With this rule, there is about a 5% chance of claiming an X-Y association when none really exists.

Note similarity to t-test for means:

tc=|mean/ SE(mean)|

Formula for SE(slope) is in statistics books.
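For readers who want to see the textbook formula in action, here is a minimal sketch. It assumes the usual simple-regression result SE(slope) = s / sqrt(Σ(X - Xmean)²), where s is the residual standard deviation with n - 2 degrees of freedom. The function name `slope_test` and the data are made up.

```python
import math

def slope_test(xs, ys):
    """Return (slope, SE(slope), tc) for a straight-line fit of ys on xs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard deviation, using n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    return slope, se_slope, abs(slope / se_slope)

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, se_slope, tc = slope_test(x, y)
claim_association = tc > 2   # the rule of thumb from the slide
print(slope, se_slope, tc, claim_association)
```

The cutoff of about 2 assumes a moderately large sample; with only a handful of points, as in this toy data, the exact critical value is larger.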

Example Software Output

The regression equation is: Ymean= 81.6 + 2.16 X

Predictor   Coeff     StdErr    T       P
Constant    81.64     11.47     7.12    <0.0001
X           2.1557    0.1122    19.21   <0.0001

S = 21.72   R-Sq = 79.0%

Predicted Values:

X: 100

Fit: 297.21

SE(Fit): 2.17

95% CI: 292.89 - 301.52

95% PI: 253.89 - 340.52

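Notes on the output:

T for X: 19.21 = 2.16/0.112; it should be between about -2 and 2 if the “true” slope = 0.

The “Constant” row refers to the intercept.

Fit: predicted y = 81.6 + 2.16(100).

95% CI: range of Ys with 95% assurance for the mean of all subjects with x=100.

95% PI: range of Ys with 95% assurance for an individual with x=100.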
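The numbers in this output can be cross-checked with a few lines of arithmetic. The sketch below uses only values printed in the output, plus two assumptions: the rule-of-thumb multiplier t ≈ 2 (the software used an exact t value, so the intervals agree only approximately), and the standard result that the prediction interval for an individual uses sqrt(S² + SE(Fit)²).

```python
import math

# Values copied from the software output above.
coef_const, coef_x = 81.64, 2.1557
se_x = 0.1122
s = 21.72        # residual standard deviation (S)
se_fit = 2.17    # SE of the fitted mean at X = 100

t_x = coef_x / se_x                            # T column for X
fit = coef_const + coef_x * 100                # Fit at X = 100
se_individual = math.sqrt(s**2 + se_fit**2)    # SE for predicting one individual

t_crit = 2.0  # assumption: rule of thumb, not the exact t quantile
ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)
pi = (fit - t_crit * se_individual, fit + t_crit * se_individual)

print(f"T = {t_x:.2f}, Fit = {fit:.2f}")
print(f"approx. 95% CI: {ci[0]:.2f} - {ci[1]:.2f}")
print(f"approx. 95% PI: {pi[0]:.2f} - {pi[1]:.2f}")
```

The interval for an individual is much wider than the interval for the mean because it includes the person-to-person scatter S, not just the uncertainty in the fitted line.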

Multiple Regression

We now generalize to prediction from multiple characteristics.

The next slide gives a geometric view of prediction from two factors simultaneously.

Multiple Linear Regression: Geometric View

Suppose multiple predictors are continuous.

Geometrically, this is fitting a slanted plane to a cloud of points:

LHCY is the Y (homocysteine) to be predicted from the two X’s: LCLC (folate) and LB12 (B12).

LHCY = b0 + b1LCLC + b2LB12 is the equation of the plane

Multiple Regression: Software

Output: Values of b0, b1, and b2 for

LHCYmean= b0 + b1LCLC + b2LB12

How Are Coefficients Interpreted?

LHCYmean= b0 + b1LCLC + b2LB12



LB12 may have both an independent and an indirect (via LCLC) association with LHCY


b1 ?



b2 ?


Coefficients: Meaning of their Values

LHCY = b0 + b1LCLC + b2LB12



Mean LHCY increases by b2 for a 1-unit increase in LB12

… if other factors (LCLC) remain constant, or

… adjusting for other factors in the model (LCLC)

It may be physiologically impossible to hold one predictor constant while changing the other by 1 unit.
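One way to see what “adjusting for” does is the following sketch. It relies on a standard property of least squares: the coefficient of a predictor in a two-predictor regression equals the slope obtained after removing the straight-line effect of the other predictor from both that predictor and the outcome. The data are synthetic and the names `x1`, `x2`, `slope`, and `residuals` are invented here; `x2` is built to be related to the outcome only through `x1`.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """What is left of ys after removing its straight-line relation with xs."""
    b = slope(xs, ys)
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    return [y - (y_mean + b * (x - x_mean)) for x, y in zip(xs, ys)]

random.seed(1)
n = 5000
# Synthetic data (an assumption for illustration):
# x1 drives both x2 and y; x2 has no direct effect on y.
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + random.gauss(0, 1) for a in x1]
y = [1 + 2 * a + random.gauss(0, 1) for a in x1]

b2_unadjusted = slope(x2, y)                                # y on x2 alone
b2_adjusted = slope(residuals(x1, x2), residuals(x1, y))    # x2 adjusted for x1

print(f"unadjusted b2 = {b2_unadjusted:.2f}, adjusted b2 = {b2_adjusted:.2f}")
```

Here the unadjusted slope is clearly positive while the adjusted one is near zero: all of the apparent association of `x2` with `y` runs through `x1`, which is the "indirect" path described in the slides.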

Figure 2.

Determine the relative and combined explanatory power of age, gender, BMI, ethnicity, and sport type on the markers.


* for age, gender, and BMI.

Another Example: HDL Cholesterol



Variable    Coefficient   Std. Error   t       Pr > |t|
Intercept    1.16448      0.28804       4.04   <.0001
AGE         -0.00092      0.00125      -0.74   0.4602
BMI         -0.01205      0.00295      -4.08   <.0001
BLC          0.05055      0.02215       2.28   0.0239
PRSSY       -0.00041      0.00044      -0.95   0.3436
DIAST        0.00255      0.00103       2.47   0.0147
GLUM        -0.00046      0.00018      -2.50   0.0135
SKINF        0.00147      0.00183       0.81   0.4221
LCHOL        0.31109      0.10936       2.84   0.0051

The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The equation is:

Log(HDL) mean = 1.16 - 0.00092(Age) +…+ 0.311(LCHOL)





HDL Example: Coefficients
  • Interpretation of coefficients on previous slide:
  • Need to use entire equation for making predictions.
  • Each coefficient measures the difference in mean LHDL between 2 subjects if the factor differs by 1 unit between them and all other factors are the same. E.g., expected LHDL is 0.012 lower in a subject whose BMI is 1 unit greater than another subject's but who is the same on all other factors.
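The interpretation above can be checked directly from the fitted equation. In this sketch the coefficients are copied from the HDL table; the two subjects and all of their values are hypothetical, chosen only so that they differ by 1 unit of BMI and nothing else.

```python
# Coefficients copied from the HDL table (outcome: log of HDL).
coef = {
    "Intercept": 1.16448, "AGE": -0.00092, "BMI": -0.01205, "BLC": 0.05055,
    "PRSSY": -0.00041, "DIAST": 0.00255, "GLUM": -0.00046, "SKINF": 0.00147,
    "LCHOL": 0.31109,
}

def predict_lhdl(subject):
    """Mean log(HDL) from the entire equation: intercept plus coefficient * value for each factor."""
    return coef["Intercept"] + sum(coef[name] * value for name, value in subject.items())

# Hypothetical subjects (values invented for illustration only).
subject_a = {"AGE": 50, "BMI": 25, "BLC": 1.0, "PRSSY": 120,
             "DIAST": 80, "GLUM": 90, "SKINF": 20, "LCHOL": 2.3}
subject_b = dict(subject_a, BMI=26)   # identical except BMI is 1 unit greater

difference = predict_lhdl(subject_b) - predict_lhdl(subject_a)
print(f"difference in predicted LHDL: {difference:.5f}")
```

Whatever values are chosen for the other factors, the difference equals the BMI coefficient, because those terms cancel when the two predictions are subtracted.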


HDL Example: Coefficients
  • Interpretation of coefficients two slides back:
  • P-values measure the strength of evidence for an association of a factor with Log(HDL), if other factors do not change.
  • This is sometimes expressed as “after accounting for other factors” or “adjusting for other factors”, and is called independent association.
  • SKINF probably is associated with Log(HDL) on its own. Its p=0.42 says there is no evidence that it adds information for predicting Log(HDL) after accounting for other factors such as BMI.
Special Cases of Multiple Regression

So far, our predictors were all measured over a continuum, like age or concentration.

This is simply called multiple regression.

When some predictors are grouping factors like gender or ethnicity, regression has other special names:


Analysis of Variance and Analysis of Covariance

Analysis of Variance
  • All predictors are grouping factors.
  • One-way ANOVA: Only 1 predictor, which may have only 2 “levels”, such as gender, or more levels, such as ethnicity.
  • Two-way ANOVA: Two grouping predictors, such as decade of age and genotype.
Two-way ANOVA
  • Interaction in 2-way ANOVA: Measures whether the effect of one factor depends on the other factor. It is a difference of a difference in outcome, e.g.,

(Trt. - control)_Female - (Trt. - control)_Male

  • The effect of treatment, adjusted for gender, is a weighted average of the group differences over the two gender groups, i.e., of:

(Trt. - control)_Female and (Trt. - control)_Male
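A small numeric sketch of these two quantities follows. The cell means and group sizes are hypothetical, and weighting by group size is just one simple choice of weights made for this example; software derives its weights from the fitted model.

```python
# Hypothetical mean outcomes by gender and treatment arm (invented numbers).
means = {
    ("Female", "trt"): 12.0, ("Female", "control"): 8.0,
    ("Male", "trt"): 11.0,   ("Male", "control"): 10.0,
}
# Hypothetical group sizes, used here as the weights (an assumption).
n = {"Female": 30, "Male": 20}

diff_female = means[("Female", "trt")] - means[("Female", "control")]
diff_male = means[("Male", "trt")] - means[("Male", "control")]

# Interaction: a difference of differences.
interaction = diff_female - diff_male

# Treatment effect adjusted for gender: weighted average of the two differences.
adjusted_effect = (n["Female"] * diff_female + n["Male"] * diff_male) / (n["Female"] + n["Male"])

print(diff_female, diff_male, interaction, adjusted_effect)
```

When the interaction is far from zero, as in these invented numbers, a single adjusted effect hides the fact that treatment works differently in the two groups.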

Analysis of Covariance
  • At least one primary predictor is a grouping factor, such as treatment group, and at least one predictor is continuous, such as age, called a “covariate”.
  • Interest is often on comparing the groups.
  • The covariate is often a nuisance.
  • Confounder: A covariate that both co-varies with the outcome and is distributed differently in the groups.
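Confounding by a covariate can be sketched with simulated data. The setup below is entirely hypothetical: the treated group is made older on average, the outcome depends on age only, and there is no true treatment effect. The adjusted comparison uses the classical analysis-of-covariance form, which subtracts the pooled within-group slope times the difference in mean age.

```python
import random

random.seed(2)

def simulate(n, age_mean):
    """Synthetic group: outcome depends on age only (no treatment effect)."""
    ages = [random.gauss(age_mean, 5) for _ in range(n)]
    ys = [0.5 * a + random.gauss(0, 2) for a in ages]
    return ages, ys

def mean(values):
    return sum(values) / len(values)

def cross_sums(xs, ys):
    """Within-group sums of cross-products and of squares."""
    xm, ym = mean(xs), mean(ys)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    sxx = sum((x - xm) ** 2 for x in xs)
    return sxy, sxx

age_c, y_c = simulate(1000, 50)   # control group, mean age about 50
age_t, y_t = simulate(1000, 60)   # treated group, mean age about 60

unadjusted = mean(y_t) - mean(y_c)

# Pooled within-group slope of outcome on age.
sxy_c, sxx_c = cross_sums(age_c, y_c)
sxy_t, sxx_t = cross_sums(age_t, y_t)
b_within = (sxy_c + sxy_t) / (sxx_c + sxx_t)

# Group difference adjusted for age.
adjusted = unadjusted - b_within * (mean(age_t) - mean(age_c))

print(f"unadjusted difference = {unadjusted:.2f}, adjusted = {adjusted:.2f}")
```

Age meets both conditions in the definition above: it co-varies with the outcome and is distributed differently in the two groups, so the unadjusted comparison shows a sizable "treatment" difference that largely disappears after adjustment.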