
Linear Models II Wednesday, May 30, 10:15-12:00



Presentation Transcript


  1. Linear Models II Wednesday, May 30, 10:15-12:00 Deborah Rosenberg, PhD Research Associate Professor Division of Epidemiology and Biostatistics University of IL School of Public Health Training Course in MCH Epidemiology

  2. Linear Models: Using the Regression Equation Assessing Effect Modification and Confounding Using "dummy" variables Generating and testing custom contrasts Strategies for model-building

  3. Confounding and Effect Modification • If another factor is related to both the risk factor and outcome of interest, and if the association between the risk factor and outcome is different after considering that third factor, then confounding is present. • If the association between a risk factor and outcome not only differs after considering a third factor, but differs depending on the value of that third factor, then effect modification is present.

  4. Confounding and Effect Modification • The concepts of confounding and effect modification apply to any modeling approach as well as to data at any level of measurement. • For example, if we conduct 'normal' regression with a continuous, normally distributed outcome variable, we are interested in whether differences between means on a variable vary across levels of another variable or whether the crude difference between means is different than the adjusted difference between means.

  5. Confounding and Effect Modification • Comparing Results—Contingency Tables and Linear Modeling: • 2 Dichotomous Independent Variables (coded 1 and 0) • and a Dichotomous Outcome • tables VarB*VarA*outcome • / relrisk riskdiff cmh; • A model with 2 main effects only (no product term) • model outcome = VarA VarB • / link = __ dist = __ ; • A model with 2 main effects and a product term • model outcome = VarA VarB VarA*VarB • / link = __ dist = __ ;

  6. Confounding and Effect Modification • Comparing Results--Contingency Tables and Linear Modeling: • 2 Dichotomous Independent Variables (coded 1 and 0) • and a Dichotomous Outcome

  7. Confounding and Effect Modification • tables VarB*VarA*outcome / relrisk riskdiff cmh; • Given how these tables are specified, we will obtain: • An OR and RR along with a statistical test for the association between VarA and the outcome, adjusted for VarB • Stratum-specific ORs and RRs along with statistical tests for the associations between VarA and the outcome at each level of VarB • A statistical test for homogeneity of the stratum-specific measures of association (effect modification/interaction)

  8. Confounding and Effect Modification • tables VarB*VarA*outcome / relrisk riskdiff cmh; • Given how these tables are specified, we will not obtain: • An OR and RR, or a statistical test for the association between VarB and the outcome, adjusted for VarA • Stratum-specific ORs and RRs, or the statistical tests for the associations between VarB and the outcome at each level of VarA

  9. Confounding and Effect Modification • A model with 2 main effects only (no product term) • model outcome = VarA VarB • / link = __ dist = __ ; • Given how the model is specified, we will obtain: • ORs or RRs along with statistical tests for both the association between VarA and the outcome, adjusted for VarB, and the association between VarB and the outcome, adjusted for VarA • A global statistical test for the simultaneous effect of VarA and VarB

  10. Confounding and Effect Modification • A model with 2 main effects only (no product term) • model outcome = VarA VarB • / link = __ dist = __ ; • Given how the model is specified, we will not obtain: • Any stratum-specific estimates, either for VarA stratified by VarB or vice versa • A statistical test for homogeneity of stratum-specific measures of association (effect modification/interaction)

  11. Example: Using Logistic Regression • A model with 2 main effects only (no product term) • model outcome = smoking late_no_pnc; • Mutually Adjusted ORs • To assess confounding, compare each to its own crude OR

  12. Example: Using Logistic Regression • Computing the OR for the association between smoking and low birthweight adjusting for late/no prenatal care: • Holding late_no_pnc constant at 0 • Holding late_no_pnc constant at 1 • The result is the same regardless of whether we use '1' or '0' for the value of the prenatal care variable—this is the meaning of 'adjusting for' or 'controlling for' confounding.
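The cancellation described on this slide can be checked with a quick numeric sketch. The coefficients below are hypothetical illustrative values, not the fitted values from this example; whichever value the prenatal care variable is held at, its contribution cancels in the ratio, leaving exp(beta for smoking):

```python
import math

# Hypothetical logistic regression coefficients (illustrative only)
b0 = -2.5          # intercept
b_smoking = 0.56   # main effect of smoking
b_pnc = 0.30       # main effect of late/no prenatal care

def odds(smoking, pnc):
    """Predicted odds of LBW from the main-effects-only model."""
    return math.exp(b0 + b_smoking * smoking + b_pnc * pnc)

# OR for smoking, holding late_no_pnc constant at 0 and at 1
or_pnc0 = odds(1, 0) / odds(0, 0)
or_pnc1 = odds(1, 1) / odds(0, 1)

# Both equal exp(b_smoking): the covariate's contribution cancels
print(or_pnc0, or_pnc1, math.exp(b_smoking))
```

This is exactly why a main-effects-only model yields one adjusted OR rather than stratum-specific ones.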

  13. Example: Using Log-binomial Regression • A model with 2 main effects only (no product term) • model outcome = smoking late_no_pnc; • In logistic regression, the predicted values—odds—are not that meaningful, but from log-binomial (or Poisson) regression, estimated prevalences / risks can be reported in addition to adjusted measures of association.

  14. Example: Using Log-binomial Regression • Among women who got late or no prenatal care, the low birthweight rate was 12% for smokers compared to 7% for nonsmokers. • Among women who got early prenatal care, the low birthweight rate was 9.5% for smokers compared to 5.4% for nonsmokers.

  15. Example: Using Log-binomial Regression • Only a model without considering effect modification yields adjusted relative prevalences • Estimated adjusted relative prevalence for smoking controlling for late_no_pnc • The summary relative prevalence was: Cohort Mantel-Haenszel 1.749 (1.635, 1.873) • Calculations yield the same result regardless of the category of late_no_pnc • s.e. for smoking from modeling = 0.0347

  16. Confounding and Effect Modification • A model with 2 main effects and a product term • model outcome = VarA VarB VarA*VarB • / link = __ dist = __; • Given how the model is specified, we will obtain: • Stratum-specific ORs / RRs along with statistical tests for the associations between VarA and the outcome in the stratum where VarB = 0, and between VarB and the outcome in the stratum where VarA = 0 • A statistical test for homogeneity of stratum-specific measures of association (multiplicative effect modification / interaction)

  17. Confounding and Effect Modification • A model with 2 main effects and a product term • model outcome = VarA VarB VarA*VarB • / link = __ dist = __; • Given how the model is specified, we will not obtain: • Any adjusted ORs/RRs, either for VarA adjusted for VarB or vice versa. Including a product term in a model by definition assumes that effect modification is present and that we only want to report stratum-specific, rather than adjusted, measures.

  18. Modeling Effect Modification • So why a product term in the model? • Effect modification/interaction is commonly assessed under a multiplicative model. The null hypothesis can be stated in terms of joint and separate effects as follows: • Ho: OR(joint) = OR(VarA alone) x OR(VarB alone) • This approach explicitly acknowledges why this is multiplicative effect modification.

  19. Modeling Effect Modification • So why a product term in the model? • Alternatively, the null hypothesis of no multiplicative interaction can be stated in the more familiar terms of homogeneity of stratum-specific estimates: • Ho: OR(VarA | VarB = 1) = OR(VarA | VarB = 0)

  20. Modeling Effect Modification • The null hypothesis of no multiplicative interaction • Joint and separate effects: Ho: OR(joint) = OR(VarA alone) x OR(VarB alone) • Stratum-specific effects: Ho: OR(VarA | VarB = 1) = OR(VarA | VarB = 0)

  21. Modeling Effect Modification • The product term is literally the multiplication of the values of two independent variables. • Assuming two dichotomous variables, values of the modeled variables are: • VarA = 1, VarB = 1: VarA*VarB = 1 • VarA = 1, VarB = 0: VarA*VarB = 0 • VarA = 0, VarB = 1: VarA*VarB = 0 • VarA = 0, VarB = 0: VarA*VarB = 0

  22. Modeling Effect Modification • Assuming one dichotomous and one continuous variable, values of the modeled variables might be:

  23. Modeling Effect Modification • Assuming two ordinal variables, values of the modeled variables might be:
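Across all three cases above, the product term is just elementwise multiplication of the two variables' values; a small sketch (the example values are hypothetical):

```python
# Two dichotomous variables: the product is 1 only when both are 1
pairs_dichotomous = [(1, 1), (1, 0), (0, 1), (0, 0)]
prod_dichotomous = [a * b for a, b in pairs_dichotomous]
print(prod_dichotomous)        # [1, 0, 0, 0]

# One dichotomous and one continuous variable (e.g., exposure x age):
# the product equals the continuous value for the exposed, 0 otherwise
pairs_mixed = [(1, 23.5), (0, 23.5), (1, 41.0), (0, 41.0)]
prod_mixed = [a * b for a, b in pairs_mixed]
print(prod_mixed)              # [23.5, 0.0, 41.0, 0.0]

# Two ordinal variables (e.g., 0/1/2 codings): the product of the codes
pairs_ordinal = [(2, 1), (1, 2), (0, 2), (2, 0)]
prod_ordinal = [a * b for a, b in pairs_ordinal]
print(prod_ordinal)            # [2, 2, 0, 0]
```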

  24. Example: • Case-Control (OR) Mantel-Haenszel 1.8355 (1.7028-1.9784) • Cohort (RP) Mantel-Haenszel 1.7499 (1.6349-1.8731) • Breslow-Day Test p = 0.2524

  25. Example: Modeling Effect Modification • /* product term in model */ • proc logistic order=formatted; • model lbw = smoking late_no_pnc smoking*late_no_pnc; • run; • How do we compute the adjusted estimates? • How do we compute the stratum-specific estimates? • What is the interpretation of the 2 main effects?

  26. Example: Modeling Effect Modification • Only stratum-specific measures of association are possible for variables included in a product term in a model • Computation for 1st stratum: the association between smoking and LBW among women with late/no PNC. • Computation for 2nd stratum: the association between smoking and LBW among women with early PNC.

  27. Modeling Effect Modification • Computation of joint effect: both smoking and late/no PNC • Computation of separate effects: • Smoking only: smoking & early PNC • Late/no PNC only: nonsmoking & late/no PNC • The common reference group for all three odds ratios is nonsmoking and early PNC

  28. Modeling Effect Modification • Whether considering stratum-specific or joint and • separate effects in modeling, the test for interaction is the test of the product term in the model.

  29. Modeling Effect Modification • Outcome = vA + vB + vA*vB • Either way, the null hypothesis of no interaction is • Ho: β(vA*vB) = 0.

  30. Modeling Effect Modification While the p-value for the product term is the test for multiplicative interaction, what is the meaning of the beta coefficient itself? For our example: • model lbw = smoking late_no_pnc smoking*late_no_pnc; • The additional effect of smoking on LBW in the presence of late or no PNC compared to the separate effect of smoking alone • and • The additional effect of late or no PNC in the presence of smoking compared to the separate effect of late or no PNC alone • e^0.0987 = 1.1037 = 1.9701 / 1.7858 (OR1 / OR2)
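The arithmetic on this slide can be verified directly: exponentiating the product-term coefficient reproduces the ratio of the two stratum-specific odds ratios, up to rounding. A sketch using the slide's numbers:

```python
import math

# Numbers from the slide
b_interaction = 0.0987   # beta for smoking*late_no_pnc
or_late_pnc = 1.9701     # OR for smoking among late/no PNC (OR1)
or_early_pnc = 1.7858    # OR for smoking among early PNC (OR2)

# exp(beta for the product term) equals OR1 / OR2
print(round(math.exp(b_interaction), 2))       # 1.1
print(round(or_late_pnc / or_early_pnc, 2))    # 1.1
```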

  31. Modeling Effect Modification • Re-specify a model using joint and separate effects • Instead of setting up stratified tables or using a product term in a model, the data are arranged to look directly at the joint effect and each of the separate effects • Construct a composite variable with categories corresponding to combinations of values from the original variables: • if VarA = 1 and VarB = 1 then joint_sep = 3; • else if VarA = 1 and VarB = 0 then joint_sep = 2; • else if VarA = 0 and VarB = 1 then joint_sep = 1; • else if VarA = 0 and VarB = 0 then joint_sep = 0;
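The SAS data-step logic above maps the four (VarA, VarB) combinations onto one 4-level variable. A plain-Python sketch of the same mapping (the explicit missing-value handling is an addition for illustration):

```python
def joint_sep(var_a, var_b):
    """Composite 4-level category from two 0/1 variables."""
    if var_a is None or var_b is None:
        return None                      # leave missing as missing
    return {(1, 1): 3, (1, 0): 2, (0, 1): 1, (0, 0): 0}[(var_a, var_b)]

combos = [(1, 1), (1, 0), (0, 1), (0, 0)]
print([joint_sep(a, b) for a, b in combos])   # [3, 2, 1, 0]
```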

  32. Example: Modeling Effect Modification

  Table 1 of VarA by outcome, controlling for VarB = yes:
  VarA     outcome yes   outcome no   Total    Row % yes
  yes      365           2135         2500     14.60
  no       175           2325         2500     7.00
  Total    540           4460         5000     10.80
  RR = 2.08

  Table 2 of VarA by outcome, controlling for VarB = no:
  VarA     outcome yes   outcome no   Total    Row % yes
  yes      135           1865         2000     6.75
  no       525           7475         8000     6.56
  Total    660           9340         10000    6.60
  RR = 1.03

  Breslow-Day Test p < 0.0001

  33. Example: Using Logistic Regression • proc logistic data=interaction order=formatted; • model outcome = VarA VarB VarA*VarB; • run; • Standard Wald • Parameter DF Estimate Error Chi-Square Pr > ChiSq • Intercept 1 -2.6559 0.0452 3460.2770 <.0001 • VarA 1 0.0302 0.0999 0.0912 0.7626 • VarB 1 0.0692 0.0905 0.5857 0.4441 • VarA*VarB 1 0.7903 0.1390 32.3012 <.0001 • VarB = Yes | VarB = No • What are the adjusted estimates?
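From the coefficients printed above, the joint and separate effects (and the stratum-specific ORs) follow by exponentiating sums of betas; a sketch of the arithmetic:

```python
import math

# Coefficients from the fitted model above
b_a, b_b, b_ab = 0.0302, 0.0692, 0.7903

# Separate effects: each variable alone, the other held at 0
or_a_only = math.exp(b_a)                # VarA=1, VarB=0 vs VarA=0, VarB=0
or_b_only = math.exp(b_b)                # VarA=0, VarB=1 vs VarA=0, VarB=0

# Joint effect: both variables 1 vs both 0
or_joint = math.exp(b_a + b_b + b_ab)

# Stratum-specific OR for VarA within the VarB=1 stratum
or_a_in_b1 = math.exp(b_a + b_ab)

print(round(or_a_only, 2))    # 1.03
print(round(or_b_only, 2))    # 1.07
print(round(or_joint, 2))     # 2.43
print(round(or_a_in_b1, 2))   # 2.27
```

Note that or_joint is well above or_a_only * or_b_only, consistent with the highly significant product term.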

  34. Example: Using Logistic Regression • Computation of joint effect: both VarA = 1 and VarB = 1 • Computation of separate effects: • VarA only: VarA = 1 and VarB = 0 • VarB only: VarA = 0 and VarB = 1 • The common reference group for all three odds ratios is VarA = 0 and VarB = 0

  35. Example: Using Logistic Regression • proc logistic order=formatted; • class joint_sep(ref='VarA n,VarB n') / param=ref; • model outcome = joint_sep; • run; • With no product term in the model, the joint and separate effects are obtained with simple exponentiation of each beta coefficient.

  36. Example: Using Logistic Regression • Using the re-specified model to get back to stratum-specific estimates • Computation for 1st stratum, where VarB = 'yes' • Computation for 2nd stratum, where VarB = 'no'

  37. Confounding and Effect Modification • Summary • In multivariable modeling (>1 variable in a model): • In a model without a product term or joint and separate effects, it is not possible to compute stratum-specific estimates • Main effects are mutually adjusted measures of association • To assess confounding, an adjusted estimate from a model has to be compared to a crude estimate

  38. Confounding and Effect Modification • Summary • In a model with a product term, it is not possible to compute adjusted estimates • In a model with a product term, main effects are measures of association within one stratum--the separate effects • The p-value for the product term is the test for multiplicative interaction • Joint and separate effects can be modeled directly, but then none of the beta coefficients provide a statistical test for interaction

  39. Modeling with Dummy Variables

  40. Modeling with Dummy Variables What if you want to model a nominal independent variable—a variable with more than 2 categories? For example, what if you want to model three categories of race/ethnicity—African American, Hispanic, and White? If the outcome variable were continuous, this situation would be Analysis of Variance (ANOVA), testing a hypothesis about the difference between three means: Ho: μ1 = μ2 = μ3

  41. Modeling with Dummy Variables In logistic regression, the hypothesis could be written: Ho: ln(Odds1) = ln(Odds2) = ln(Odds3) • In log-binomial regression, the hypothesis could be written: • Ho: ln(prop1) = ln(prop2) = ln(prop3) In any case, we need to “trick” the modeling procedure into handling the nominal variable appropriately.

  42. Modeling with Dummy Variables Race/ethnicity as a single variable coded, for example 1=African American, 2=Hispanic, and 3=White, will be treated as ordinal in a regression model. proc genmod order=formatted; model outcome = ethnicity / link=log dist=bin; run; The incorrect interpretation of the resulting beta coefficient for ‘ethnicity’ would be, “for every unit change in ‘ethnicity’, there is a ____ change in the log(prevalence/risk) of the outcome”.

  43. Modeling with Dummy Variables So, what’s the trick? Dummy variables, or indicator variables, are a set of dichotomous variables which together capture the nominal construct of interest. For a nominal variable with k categories, a set of k-1 dummy variables will capture the entire construct. If variables for all k categories are created, there will be redundancy in the model.

  44. Modeling with Dummy Variables Each dichotomous variable considered separately is indeed assumed to be 'ordinal' by the modeling procedure. For example, we know that 'sex' can be appropriately modeled even though it is a nominal variable. The beta coefficient for sex is interpreted as the difference between means (OLS) or the difference between log proportions (log-binomial) for males and females.

  45. Modeling with Dummy Variables Example: Dummy variables for race/ethnicity: We create 2 dummies for our 3-category race/ethnicity variable:

                     Original Coding   Dummy 1 (af_am)   Dummy 2 (hisp)
  African American   1                 1                 0
  Hispanic           2                 0                 1
  White              3                 0                 0

  Here, whites are being considered as the reference group.

  46. Modeling with Dummy Variables Explicit coding in SAS: If we're not sure which level we want as the reference group, we can code 'k' dummies and then decide which k-1 we will model: if race = 1 then af_am = 1; else if race ^= . then af_am = 0; if race = 2 then hisp = 1; else if race ^= . then hisp = 0; if race = 3 then white = 1; else if race ^= . then white = 0;
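The same explicit k-dummy coding can be sketched outside SAS; here a plain-Python version (race codes 1/2/3 as above, with None standing in for SAS's missing value):

```python
def race_dummies(race):
    """Return (af_am, hisp, white) indicators; missing stays missing."""
    if race is None:
        return (None, None, None)
    return (int(race == 1), int(race == 2), int(race == 3))

print([race_dummies(r) for r in (1, 2, 3, None)])
# [(1, 0, 0), (0, 1, 0), (0, 0, 1), (None, None, None)]
```

As on the slide, all three indicators are created, and the analyst then chooses which k-1 of them enter the model, which fixes the reference group.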

  47. Modeling with Dummy Variables Now, race/ethnicity can be modeled as follows: proc logistic order=formatted; model outcome = af_am hisp; run; The beta coefficient for af_am is the difference in log odds between African Americans and whites; the beta coefficient for hisp is the difference in log odds between Hispanics and whites. Exponentiating the betas, we get the odds ratios for African Americans v. whites & for Hispanics v. whites.

  48. Modeling with Dummy Variables SAS will automatically code dummy variables within a proc step, but then the analyst needs to control the coding and the reference group. For the race/ethnicity example, rather than using the dummies af_am and hisp which were coded in the data step, the following model could be specified using the CLASS statement in SAS: proc logistic order=formatted; class race(ref=‘3’) / param = ref; model outcome = race; run;

  49. Modeling with Dummy Variables So far, whichever way we created dummy variables, there was no direct comparison of African Americans to Hispanics. One way to do this is to rerun the model: proc logistic order=formatted; model outcome = af_am white; run; or proc logistic order=formatted; class race(ref='2') / param = ref; model outcome = race; run; (hold the thought for other ways to get a desired comparison).

  50. Modeling with Dummy Variables Suppose you have an apparently ordinal variable such as income: • Should you include it in a model in this ordinal form? proc logistic order=formatted; model outcome = income; run; • Should you use dummy variables? • How many dummy variables will you create? • What is the reference group? Why?
