Download Presentation
## Multiple Regression

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Multiple Regression - Overview**Multiple Regression in statistics is the science and art of creating an equation that relates an outcome Y, to one or more predictors, X1, X2, X3, .. Xk. Linear Regression Y = a + b1X1 + b2 X2 + b3 X3 + ... + bk Xk + e = Ŷ + e where "e" is the residual error between the observed Y and the prediction (Ŷ). In linear regression, bi is the average change in Y for a single unit change in Xi. “Performance” stats: R2, SDe**Logistic Regression**In multiple logistic regression, Y is 0 or 1 with mean P, the “risk”. The logit of P, not P, is assumed a linear function of the Xs Logit(P) = ln(P/(1-P)) = a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk (“Logit” = log of the odds since P/(1-P) is the odds) odds=exp(a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk) risk = P = odds/(odds+1) “Performance” stats: Sensitivity, Specificity, Accuracy, Concordance (C), mean deviance Poisson Regression When Y is a positive integer (0,1,2,3…), we model the log of Y so Y can never be negative. This is the multiple Poisson regression model. ln(mean Y) = a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk mean Y = exp(a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk) mean Y cannot be negative. “Performance” stats: R2, mean deviance**Logistic regression**Predictors of in hospital infection Characteristic Odds Ratio (95% CI) pvalue Incr APACHE score 1.15 (1.11-1.18) <.001 Transfusion (y/n) 4.15 (2.46-6.99) <.001 Increasing age (yr) 1.03 (1.02-1.05) <.001 Malignancy 2.60 (1.62-4.17) <.001 Max Temperature 0.70 (0.58-0.85) <.001 Adm to treat>7 d 1.66 (1.05-2.61) 0.03 Female (y/n) 1.32 (0.90-1.94) 0.16 *APACHE = Acute Physiology & Chronic Health Evaluation Score**Multiple Proportional Hazards Regr (Cox model)**For time dependent outcomes (ie time to death), we model the hazard rate,h , the event rate per unit time (for death, it is the mortality rate). Since h > 0, we model the log of the hazard as a linear function of the Xs so h can never be zero (similar to Poisson regression) . ln(h) = a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk so h = exp(a + b1 X1 + b2 X2 + b3 X3 + ... + bk Xk) > 0 If h0=exp(a) is the ‘baseline’ hazard, (that is, a=log(h0)) the hazard ratio is HR = h/h0 = exp(b1 X1 + b2 X2 + b3 X3 + ... + bk Xk) no intercept ‘a’. If S0(t) is the ‘baseline’ survival curve corresponding to the baseline hazard, then the survival curve for a given combination of X1, X2, … Xk is given by S(t) = S0(t)HR where HR is computed with the equation above. exp(bi) is the hazard rate ratio for a one unit change in Xi. “Performance” stats: Harrell’s Concordance (C) (0.5 < C < 1.0)**Cox HRs for donor age(Busuttil 2005)**Harrell C= 0.70**Multiple Linear Regression Example**Consider predictors of Y=Bilirubin (mg/dl) in liver transplant candidates. Two predictors are X1=Prothombin time (PT) in seconds X2=ALT (alanine aminotransferase in U/L). A multiple regression equation (on the log scale) is Ŷ = (predicted) log Bilirubin = -3.96 + 3.47 log PT + 0.21 log ALT**Regression output - equation**(Equation) Parameter estimates term estimate SE t ratio p value Intercept -3.96 0.257 -15.4 < 0.001 log PT 3.47 0.214 16.2 < 0.001 log ALT 0.21 0.055 3.8 0.0002 equation: Log Bili=-3.96 + 3.47 log PT + 0.21 log ALT**Regression-analysis of variance table**Source df sum squares mean square F Model 2 37.76 18.88 147.2 Error 363 46.56 0.1283=SDe2 -- Total 365 84.32 F = 18.88/0.1283 ->Screening F test that none of the predictors are related to Y. R2 = 37.76/84.32 = 0.448 = 44.8% R2 = model sum squares/total sum squares**Residual error plotResidual Log Bilirubin by Predicted**When the model is valid, this plot should look like a circular cloud if the errors have constant variance. The example above is a “good” result.**Example of a “good” residual error histogramerrors have**a Gaussian distribution about zero Moments - errors Mean 0.000 Std Dev (SDe) 0.357 Std Err Mean 0.0187 n 366 Quantiles - errors 100.0% maximum 1.711 97.5% 0.834 90.0% 0.454 75.0% quartile 0.177 50.0% median -0.025 25.0% quartile -0.213 10.0% -0.355 2.5% -0.630 0.0% minimum -1.191**Normal Quantile Plot(Should be approximately a straight line**if the residual error data are Gaussian) residual error (e)**Interpreting multiple regression coefficients (cont.)**The multiple regression coefficients will not in general be the same as the individual regression coefficient for each variable one at a time, even though the same Y is being modeled. Simple regression Multiple (simultaneous) variable (one Y, one X) (b1X1+b2X2) Log PT 3.560 3.470 Log ALT 0.310 0.211 Log Bilirubin = - 3.70 + 3.56 log PT, R2 = 0.425 Log Bilirubin = -.105 + 0.310 log ALT, R2 = 0.049 Log Bilirubin = -3.96 + 3.47 log PT + 0.211 log ALT , R2 = 0.448 Log PT coefficients don’t match**Orthogonally**In general, regression coefficients from simple and multiple regression are not the same ↔ controlling for covariates does not give the same answer as ignoring covariates. Only when all the X variables have correlation zero with each other will the simple and multiple regression coefficients be the same orthogonally**Log PT (X1) vs log ALT (X2)since correlation is low,**unadjusted and adjusted regression results are similar r12 = 0.111, R2 = 0.0123**Interaction Effects & subgroups**The model Y = 0 + 1X1 + 2X2 + implies that change in Y due to X1 (=1) is the same (constant) for all values of X2. An ADDITIVE model. In the model Y = 0 + 1X1 + 2X2 + 3 X1X2 + the 3 term is an interaction term. Change in Y for a unit change in X1 is (1+3X2) and is therefore not constant. Positive 3 is often termed a “synergism” Negative 3 is often termed an “antagonism” Additive only if β3=0 How to implement? Make new variable W = X1X2.**Interaction example**Response: Y= log HOMA IR (MESA study) R2=0.280, Root Mean Square Error=SDe=0.623 Mean Response= 0.395, n= 6782 Parameter Estimates Term Estimate Std Error t Ratio p value Intercept -1.388 0.049 -28.16 <.0001 Gender -0.669 0.085 -7.83 <.0001 BMI 0.061 0.00168 36.45 <.0001 gender*BMI 0.028 0.00299 9.38 <.0001 Predicted log HOMA IR = -1.39 – 0.669 gender + 0.061 BMI + 0.028 gender * BMI (gender is coded 0 for female and 1 for male)**gender x BMI interaction- non additivity**In Females Log HOMA IR = -1.39 +0.061 BMI In Males Log HOMA IR = -2.06 + 0.089 BMI**Gender x BMI interactionrelation is different in males vs**females**Hierarchically well formulated (WWF) regression models**HWF Rule – To correctly evaluate the X1*X2 interaction, must also have X1 and X2 in the model. In general, one must include the lower order terms in order to correctly evaluate the higher order terms.**HWF example- NON HWF**Model:chol = a0 + a1 smoke x age = SMOKEAGE 0, 1 (dummy) coding: smoke=0 or 1, smokeage = smoke x age Variable DF Estimate std error t p value INTERCEPT 1 156.863 3.993 39.286 0.0001 SMOKEAGE 1 0.361 0.182 1.988 0.0567 ---------------------------------------------------------------------------- -1, 1 (effect) coding: smoke=-1 or 1, smokeage =smoke x age Variable DF Estimate std error t p value INTERCEPT 1 162.278 3.103 52.320 0.0001 SMOKEAGE 1 0.055 0.100 0.548 0.5881 p value changes just because coding changes!**HWF example (cont)- HWF**HWF:Model: chol = b0 + b1 smoke + b2 age + b3 smoke x age For HWF models, significance is the same regardless of coding 0, 1 (dummy) coding: smoke=0 or 1, smokeage = smoke x age Variable DF Estimate std error t p value INTERCEPT 1 100.221 1.110 90.304 0.0001 SMOKE 1 3.812 1.570 2.429 0.0224 AGE 1 2.010 0.036 56.297 0.0001 SMOKEAGE 1 -0.009 0.050 -0.178 0.8599 -1, 1 (effect) coding:smoke=-1 or 1, smokeage=smoke x age Variable DF Estimate std error t p value INTERCEPT 1 102.127 0.785 130.138 0.0001 SMOKE 1 1.906 0.785 2.429 0.0224 AGE 1 2.005 0.025 79.437 0.0001 SMOKEAGE 1 -0.004 0.025 -0.178 0.8599**Regression assumptions**Regression can simultaneously evaluate all factors and thus reduce confounding. But must check two critical assumptions. • If X is continuous/interval, check if relation of X & Y is linear on some scale. (otherwise polychotomize X) 2. Check if effects of X1, X2, X3 … are additive by adding interaction terms. (Ex: X4 = X1 x X2). Not additive if interactions are significant.**3 Also, in linear regression, prefer residual errors (e)**to have a Gaussian distribution with a constant variance that is independent of Y. But additivity and linearity are more important since lack of additivity and linearity lead to bias and is more misleading.**Nonlinear Regression**Log(Bilirubin)= -3.96 + 3.47 log(PT) + 0.211 log(ALT) is a nonlinear model in terms of PT and ALT but is a linear model in terms of log PT, log ALT and the regression coefficients b0=-3.96, b1=3.47 and b2=0.211. Consider model of the form: Ŷ= Drug concentration = b1 10 b2 X This is nonlinear in b2 but can be made linear with a transformation: log10(concentration)=log10(b1) + b2 x What about: Drug concentration = b0+ b1 10 b2 X This model is nonlinear in b2 and cannot be transformed. Nonlinear regression software is needed to estimate b0, b1, & b2.**Nonlinear ExampleCompartmental drug models**Model of how drug (or any chemical) is metabolized by an organism. Y1=concentration in serum, Y2=concentration in organ, x=time d (Y1)/dx = -b1 Y1 d (Y2)/dx = b1Y1 - b2Y2 b1 > b2 > 0 solutions: Y1 = constant e -b1 x Y2 = (b1/(b1-b2)) [e -b2x – e -b1x] < - model Y2 takes on a maximum value when x = ln(b1/b2)/[b1-b2] Y2 is zero when x=0 or x is very large The constants b1 and b2 are rates. They are in units of 1/x (i.e 1/time). Y1 serum Y2 organ**Nonlinear equationY = (b1/(b1-b2)) [e -b2x – e -b1x]**Y= [0.0967/(0.0967-0.0506)]*[exp(-0.0506*t)-exp(-0.0967*t)] at peak, t = 14, Ŷ= 0.49**Residual diagnostics & “model criticism”**Assumptions of linear regression: 1. Linear relation between Y and each X except for random “noise” (but can transform X). 2. Effect of each X is additive (but can make interaction terms) 3. Errors (e) have constant variance and come from a Gaussian distribution 4. All observations from the same population 5. All observations independent (usually ok) A plot of Ŷ versus e, called a residual error (diagnostic) plot, can help verify if these assumptions are met.**Regression diagnostics**Problem-outliers Solution – Find on residual plot & eliminate Problem-Curvilinear (non linear) Solution- try nonlinear trans formation (x2, 1/x, log(x), ex) Problem-Errors not Gaussian Solution-Robust regression (future class) Problem-Non constant SDe or Var(e) Solution-Weights (future class)**Ex: Meditation & change in pct body fat**Overweight persons chose a meditation program or a “sham” (lectures) as part of a weight loss effort. They were NOT randomized. Change in percent body fat by treatment group (mediation or sham) over three months Unadjusted Means Level n Mean pct body SEM Mean dietary fat (gm) fat change (before study start) 1-meditate 439 -7.51% 0.47% 32.7 g 2-sham 704 1.34% 0.35% 67.1 g Unadjusted Mean difference (sham - meditation) = 8.85% SE of the difference = SEd = √0.47222 + 0.35322 = 0.59% t = mean diff/SEd = 8.85% / 0.59% = 15.1, p < 0.0001 Overall unweighted dietary fat = 49.9 g**Result via “regression”**Y= change in body fat, X = 1 if sham, 0 if meditation Y = a + b X + error = a + b sham + error term estimate SE t p value a -7.51 0.46 -16.4 < 0.001 b 8.85 0.59 15.2 < 0.001 Ŷ = -7.51 + 8.85 sham**Regr-control for dietary fat**pct body fat= Y = a + b1 X1 + b2 X2 + error X1= 1 if sham, 0 if meditation X2 = dietary fat in g term estimate SE t p value a -14.5 0.54 -26.6 < 0.001 b11.51 0.56 2.7 0.007 b2 0.21 0.011 18.9 <0.001 body fat chg=-14.5 +1.51 sham +0.21 diet fat**Adjusted means**Plug into equation for X1=1=sham or X1=0. Hold X2 = diet fat= 49.9 g – overall mean X2 Med: -14.5 + 1.51(0)+0.213(49.9)=-3.84% Sham:-14.5 + 1.51(1)+0.213(49.9)=-2.33% Adj mean difference =2.33-(-3.84) =1.51% (Unadjusted was 8.85%)**General procedure**Adjusted means for X1,controlling for (confounders) X2, X3, X4 … • Estimate model regression coefficients • Plug in different values for X1, and values for all other Xs held constant at their overall means. Gives adjusted means and their SEs. Assumes ????