Supervisors: Prof. K. Mohammad, Dr. M. H. Forouzanfar Advisor: Prof. M. Mahmoodi

Effect of physical activity on functional performance and knee pain in patients with knee osteoarthritis using Marginal Structural Models Supervisors: Prof. K. Mohammad, Dr. M. H. Forouzanfar Advisor: Prof. M. Mahmoodi Author: Dr. M. A. Mansournia January, 2012

Acknowledgments • I am very thankful to • Dr. Goodarz Danaei of Harvard for many useful discussions and detailed comments on several drafts of my thesis paper • Prof. Miguel Hernán of Harvard for his expert advice on data analysis and for his thoughtful comments on a draft of my thesis paper • Dr. Jay Kaufman, the editor of EPIDEMIOLOGY, and two anonymous reviewers for their insightful comments and constructive suggestions on a draft of my thesis paper

Miguel Hernán, Professor of Epidemiology, Harvard School of Public Health • Goodarz Danaei, Assistant Professor of Global Health & Population, Harvard School of Public Health

Introduction • Knee osteoarthritis (OA) is a leading cause of pain and disability in the elderly • Several systematic reviews of randomized trials have demonstrated that exercise reduces pain and disability in patients with knee OA • Prospective observational studies are needed to evaluate the long-term effects of lifestyle physical activity in the knee OA population

Fixed and time-dependent exposures • We define an exposure to be fixed if every subject’s baseline exposure level determines the subject’s exposure level at all later times • Exposures can be fixed because • they only occur at the start of follow-up (e.g., a bomb explosion, a one-dose vaccine, a traffic accident) • they do not change over time (e.g., genotype) • they evolve over time in a deterministic way (e.g., time since baseline exposure) • Any exposure that is not fixed is said to be time-dependent (e.g., medical treatment, diet, cigarette smoking, occupational exposure)

Time-dependent confounders • We define a time-dependent covariate to be a time-dependent confounder if a post-baseline value of the covariate is an independent predictor of (i.e., a risk factor for) both subsequent exposure and the outcome within strata jointly determined by baseline covariates and prior exposure • Time-dependent confounding occurs only with time-dependent exposures • Nearly all exposures of epidemiologic interest are time-dependent

Arthritis Rheum. 2011;63(1):127-36 • Dunlop et al. evaluated the effect of physical activity on subsequent 1-year functional performance in adults with knee OA using two-year follow-up data from the Osteoarthritis Initiative (OAI) study • Using GEE to estimate a linear model, they adjusted for potential confounders measured only at baseline as well as for the concurrent values of time-dependent confounders, and reported a dose-response relationship between physical activity and better performance

Standard methods for the analysis of longitudinal data may yield biased estimates, whether or not one adjusts for time-dependent confounders, when exposure affects a time-dependent confounder or when exposure is affected by prior outcome and affects future outcome • Both of these conditions are possible in the OAI data, e.g., physical activity may affect body mass index, a potential confounder, and may also affect, as well as be affected by, functional performance
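The first condition can be illustrated with a small simulation. This is a minimal sketch with made-up data-generating values, not the OAI data: U is an unmeasured risk factor, A0 a randomized baseline exposure, L1 a time-dependent confounder affected by both A0 and U, A1 the next exposure (which depends on L1), and Y an outcome that depends only on U, so neither exposure has any effect. The helpers `expit` and `ols` are illustrative names only.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def ols(y, X):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 200_000
U = rng.normal(size=n)                                   # unmeasured
A0 = rng.binomial(1, 0.5, size=n)                        # randomized baseline exposure
L1 = rng.binomial(1, expit(-1.0 + 1.5 * A0 + 2.0 * U))   # confounder affected by A0
A1 = rng.binomial(1, expit(-0.5 + 1.5 * L1))             # exposure at visit 1
Y = U + rng.normal(size=n)                               # true effects of A0, A1 are zero

ones = np.ones(n)
# Adjusting for L1 removes confounding of A1, but L1 is a common effect of
# A0 and U, so conditioning on it induces an A0-U association
adjusted = ols(Y, np.column_stack([ones, A0, A1, L1]))
# Not adjusting for L1 leaves A1 confounded through A1 <- L1 <- U -> Y
unadjusted = ols(Y, np.column_stack([ones, A0, A1]))

print(f"adjusted for L1:     A0 {adjusted[1]: .3f}, A1 {adjusted[2]: .3f}")
print(f"not adjusted for L1: A0 {unadjusted[1]: .3f}, A1 {unadjusted[2]: .3f}")
```

Although both true coefficients are zero, the adjusted model gives a clearly negative coefficient for A0 and the unadjusted model a clearly positive coefficient for A1, so neither standard analysis recovers the null.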

Notations • A(t) and Y(t) denote physical activity and outcome (i.e., functional performance or knee pain) at visit t (i.e., 0, 1, 2, and 3 years) • L(t) denotes a vector of measured time-dependent potential confounders at visit t • C(t) denotes censoring during the period (visit t-1, visit t] • We use overbars to indicate the history of time-dependent variables through visit t, e.g., $\bar{A}(t) = \{A(0), A(1), \ldots, A(t)\}$

[Figure: Causal diagram for the OAI study under the time ordering {L(t), Y(t)}, A(t), Y(t+1); nodes U0–U3, L0–L3, C1–C3, A0–A3, Y0–Y3]

Bias of standard methods [Figure: causal diagram with nodes U0–U3, L0–L3, C1–C3, A0–A3, Y0–Y3]

A revolutionary thinker

Counterfactual theories • Neyman (1923) • Effects of fixed exposures in randomized experiments • Rubin (1974) • Effects of fixed exposures in randomized and observational studies • Robins (1986) • Effects of time-dependent exposures in randomized and observational studies

g-methods of Robins • The “g” in g-methods stands for “generalized”. Unlike conventional stratification-based methods, g-methods can generally be used to estimate the effects of time-dependent treatments • g-methods include: • the g-computation algorithm formula (Robins, 1986) • g-estimation of structural nested models (Robins, 1989) • inverse probability weighting of marginal structural models (Robins, 1998)

Marginal Structural Models, 1997 Proceedings of the American Statistical Association

Marginal Structural Models (MSMs) • Models for the marginal distribution of counterfactual outcomes, possibly conditional on baseline covariates, e.g., a linear model for $E[Y_{\bar{a}}(t+1) \mid V]$ • $Y_{\bar{a}}(t+1)$ is the potential or counterfactual random variable representing a subject's outcome at time t+1 had, possibly contrary to fact, the subject received the exposure history $\bar{a}(t)$ rather than his/her observed exposure history • β5 is a column vector of parameters • a1(t), a2(t), and a3(t) are indicators for moderate, high, and very high levels of physical activity at visit t • V is a subset of the baseline covariates (in our analysis, V includes L(0) and Y(0))

A recent discussion about MSMs with Sander Greenland (SG), Miguel Hernán (MH), Jay Kaufman (JK), and James Robins (JR)

From MM to SG • I have a basic question about the term "marginal structural models (MSMs)". Why are MSMs called marginal? Because they model the marginal distribution of counterfactual variables, not the joint distribution of them (i.e., causal types), or because they do not include any covariates in the model (leading to marginal interpretations of the model coefficients)?

From SG to MM, MH, and JR • My impression is that it is the first reason you give, but just to verify the actual intent I am forwarding it to the primary source authors. As you might note in their articles, an MSM for potential outcomes can include many covariates and thus not be very marginal in the ordinary regression sense; the covariate levels might even happen to identify each individual uniquely

From MH to SG, MM, and JR • MSMs model the marginal distribution of counterfactual outcomes. When MSMs are conditional on baseline variables, we still refer to them as marginal. See Section 12.4 and Fine Point 12.3 of our book and let us know if this issue is unclear

From MM to MH, SG, and JR • I read the relevant sections in your book. To avoid confusion, it would be a good idea to emphasize that the two types of marginality implicit in MSMs are distinct issues: 1) MSMs model the marginal distribution of counterfactual outcomes. Thus the model stated for E(Y_a | V) in Fine Point 12.3 is marginal with respect to the other potential outcomes Y_a* for a* ≠ a, irrespective of the covariates included in V. The parameters of the joint density of counterfactual outcomes (i.e., of subject-specific or causal types) cannot generally be identified, even from randomized trials 2) MSMs do not need to include covariates to adjust for confounding, because they are, by definition, models for causal effects, not for observed associations. This, in turn, leads to causal effects that are marginal with respect to the covariates not included in the model

From SG to MM, MH, and JR • It seems to me #2 as stated below needs a subtle caution and more generally some clarification of this issue is needed: 1) The weight stabilization outlined in the pair of Epidemiology 2000 articles includes the "modifiers" V in the numerator as well as denominator probability. Just to see my concern clearly, you could suppose that V is independent of all other covariates but not independent of A given those covariates. Then it seems in fitting the marginal-structural outcome model (MSM) with V-stabilized weights, confounding by V is no longer controlled by the weighting, as the weights no longer capture covariation of V and A; instead, adjustment for confounding by V is accomplished by the conditioning on V in the MSM, just as in conventional regression adjustment. That fact might lead one to desire more detail in specifying V than if the only concern were modification. 2) I'll also repeat that the meaning of "marginal" in MSM is brought home by an example in which V is unique for each individual (e.g., age in continuous time) and thus there is no marginalization in the population-aggregation sense. In terms commonly seen elsewhere it would be a fully conditional or subject-specific model - the marginalization is strictly with respect to the nonidentified joint potential-outcomes distribution. Let me know if I have missed something here

From MM to SG, MH, and JR • I have just noticed that our discussion is the topic of a recent commentary in EPIDEMIOLOGY (please see attached). Interestingly, the author, Dr. Jay Kaufman, stated that "… I note that effect measures adjusted simply via inverse probability of treatment weights have a marginal interpretation with respect to the covariates, and in this sense the word ‘marginal’ has nothing to do with the first word in ‘marginal structural models.’"

From SG to MM, MH, JR, and JK • Thanks to Mohammad for pointing out Kaufman's 2010 "Marginalia" commentary. I'd read this when it appeared but the present discussion has made me notice a problem with it - I think its description is not quite accurate so I am copying this e-mail to him as well. • It states at the start: "stabilization of the weights in a marginal structural model can change this interpretation from marginal to conditional—a potentially important consequence that appears to have not yet been widely discussed in published work." • I believe it is not the stabilization with V (=Z in Kaufman) in the numerator of the weight that changes the interpretation. It is inclusion of V in the outcome MSM. Without V in the numerator, V is adjusted both by the weighting and the outcome model, so for V the entire procedure is in this sense doubly robust.

Putting V in the weight numerator doesn't change the interpretation of the treatment MSM coefficient, which is already conditional on V if V is in the outcome MSM. But, as Kaufman notes, it removes adjustment for V through the weighting so that adjustment now depends solely on the outcome model. Any precision improvement from including V in the weight numerator model is thus bought at the cost of increased bias risk: bias due to mismodeling of V in the outcome MSM is no longer removed by the weighting. This seems just another example of the usual bias-variance trade-off. • The general point needing emphasis is that weighting is not conditioning. I think of it as reconfiguration of the sampling distribution as part of a strategy to remove biases while minimizing the variance inflation that may ensue from that bias removal. In the methods we are discussing, we remove unwanted dependencies (via the denominator) and then restore certain marginals (via the numerator) to recapture some of the precision thus lost, leading to more critical dependence on adjustment via the outcome model. • That said, I think the overall caution raised in the Kaufman commentary about MSM interpretation is warranted. • Again, I would appreciate verification or correction of my observations.
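SG's point about V-stabilized weights can be checked with a toy point-exposure simulation. This is a minimal sketch under assumptions made only for illustration: V and L are binary confounders, the true effect of A on Y is 1, and all probabilities are estimated nonparametrically by cell means. The helper names `cell_prob` and `wls` are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def cell_prob(a, *groups):
    """Pr(A = observed a | groups), estimated by cell means (binary groups only)."""
    key = np.zeros(len(a), dtype=int)
    for g in groups:
        key = 2 * key + g
    p1 = np.empty(len(a))
    for k in np.unique(key):
        cell = key == k
        p1[cell] = a[cell].mean()
    return np.where(a == 1, p1, 1.0 - p1)


def wls(y, X, w):
    """Weighted least squares by rescaling rows with sqrt(w)."""
    r = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * r[:, None], y * r, rcond=None)
    return coef


rng = np.random.default_rng(1)
n = 200_000
V = rng.binomial(1, 0.5, size=n)      # baseline covariate that may enter the MSM
L = rng.binomial(1, 0.5, size=n)      # confounder handled only by the weights
A = rng.binomial(1, expit(-1.0 + V + L))
Y = 1.0 * A + 2.0 * V + 1.5 * L + rng.normal(size=n)   # true effect of A is 1

den = cell_prob(A, V, L)
w_plain = cell_prob(A) / den          # numerator Pr(A)
w_v = cell_prob(A, V) / den           # V-stabilized numerator Pr(A | V)

ones = np.ones(n)
X_a = np.column_stack([ones, A])      # MSM without V
X_av = np.column_stack([ones, A, V])  # MSM with V
est = {
    ("Pr(A)", "without V"): wls(Y, X_a, w_plain)[1],
    ("Pr(A)", "with V"): wls(Y, X_av, w_plain)[1],
    ("Pr(A|V)", "without V"): wls(Y, X_a, w_v)[1],
    ("Pr(A|V)", "with V"): wls(Y, X_av, w_v)[1],
}
for (numerator, msm), value in est.items():
    print(f"numerator {numerator:8s} MSM {msm:9s}: {value:.3f}")
```

In this sketch only the combination of V-stabilized weights with an MSM that omits V is biased: once V enters the numerator, the weights no longer remove confounding by V, and that adjustment must come from conditioning on V in the outcome model.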

From JK to SG, MM, MH, and JR • I am forced to concede this point to Sander. I did write this a bit incompletely, and I wish that I had stated this point more clearly. What happens IN PRACTICE is that when authors stabilize for a baseline variable V, they then include that variable into the outcome model, and therefore the adjustment for this variable changes from marginal to conditional. Because this is the common practice, I took it for granted that putting V in the numerator implied also putting it into the outcome model, and I should have made this point more explicitly. I agree with Sander that simply stabilizing by V by putting it into the numerator of the weights without also adding it to the outcome model will not change the interpretation to a conditional one, but it will also fail to adjust for any potential confounding by V. My colleague Charlie Poole has always told me that real peer-review begins only AFTER the article is published. This e-mail exchange would be an example of his dictum.

I am in Chile at the moment, and in a remarkable coincidence, I was discussing this very point this morning with a very sharp young Chilean statistician named José Zubizarreta. He is currently a doctoral student with Paul Rosenbaum at Penn, but lives in NYC, and has just discovered that he commutes on the same train with Marshall Joffe (who also lives in NYC). So they have taken to having long conversations on the train, and hit upon this very topic as a problem that needs some further development. José said that they had an idea for a new way to estimate the weights that would handle the bias-variance tradeoff without having to rely on stabilization or trimming the weights. I don’t know much more detail than that at present, but mention this to indicate that the current practice around stabilization strikes some people as suffering from an arbitrariness that requires further development. • Far from a correction, I thank Sander for his careful attention to this text, and I appreciate very much his clarification, which I endorse. • I would only add that if we stick to difference or ratio contrasts of risks, this marginal vs conditional distinction is largely obviated, since marginal and conditional RD and RR yield identical numerical values.

From SG to JK, MM, MH, and JR • This is a slight misreading of what I meant. I assumed V is already in the outcome model and thus controlled through that, otherwise it would not be V (a modifier in the original 2000 formulation). Thus with V we are already in the conditional world. The only question then is whether V is not in the weight numerator (hence we have some double robustness with removable inefficiency) or in the weight numerator (hence only single robustness with improved efficiency). • Interesting. However, I think there have already been a few lines of work on this problem: 1) All the machine-learning stuff about weight estimation, starting with Ridgeway and McCaffrey at RAND 2004 (see their summary in Stat Sci 2007) and going on up through Lee et al. last year (attached), all of which pretty much obviates ad hoc weight trimming.

2) My impression is that van der Laan and his group claim to have solved the entire bias-vs-variance issue in all causal-inference problems via "targeted maximum likelihood" (TML) with a "superlearner" (again a machine-learning based approach). They certainly have written a lot on the topic. I can claim no expertise at all on the issue, but note that at least they seem to frame the problem more clearly than most have so far: Identify your structural model and the parameter of interest within it; state your inference-optimality criteria; then construct an optimal procedure given your criteria, structural model, and any further constraints (model) on the data-generating process. In other words, apply classical frequentist decision theory to these problems. I am sure Jamie could improve dramatically on my impression and cite his own solutions. • That holds if both the conditional and marginal measures are derived using the same model and that model imposes conditional homogeneity. But off the null at least one of RD and RR will be heterogeneous. Under a model allowing heterogeneity, you could say the properly weighted averages of the conditional effects will equal the marginal effects. But for the RR the weights for that average are not the IPTW or standardization weights.
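SG's last remark can be made concrete with a worked example using hypothetical stratum-specific risks (not taken from the discussion), assuming no confounding by V. The marginal RD equals the standardization-weighted average of the conditional RDs, whereas the marginal RR equals an average of the conditional RRs weighted by Pr(V = v) times the unexposed risk in stratum v, not by the standardization weights Pr(V = v).

```python
# Hypothetical risks in two equally sized strata of V; exposure is assumed
# independent of V, so there is no confounding and only heterogeneity matters.
p_v = [0.5, 0.5]   # Pr(V = v)
r0 = [0.1, 0.4]    # risk if unexposed in stratum v
r1 = [0.2, 0.6]    # risk if exposed in stratum v

rd_v = [b - a for a, b in zip(r0, r1)]   # conditional RDs: about 0.1 and 0.2
rr_v = [b / a for a, b in zip(r0, r1)]   # conditional RRs: about 2.0 and 1.5

R0 = sum(p * a for p, a in zip(p_v, r0))   # marginal risk if unexposed: 0.25
R1 = sum(p * b for p, b in zip(p_v, r1))   # marginal risk if exposed: 0.40
rd_marginal = R1 - R0                      # 0.15
rr_marginal = R1 / R0                      # 1.6

# Standardization weights Pr(V = v) reproduce the marginal RD ...
rd_standardized = sum(p * d for p, d in zip(p_v, rd_v))   # 0.15
# ... but not the marginal RR:
rr_standardized = sum(p * r for p, r in zip(p_v, rr_v))   # 1.75
# The marginal RR is the average of conditional RRs weighted by Pr(V = v) * r0[v]
w = [p * a for p, a in zip(p_v, r0)]
rr_reweighted = sum(wi * r for wi, r in zip(w, rr_v)) / sum(w)   # 1.6
```

The reweighting works because Pr(V = v) · r0[v] · RR_v is just Pr(V = v) · r1[v], so the weighted sum reduces to R1 / R0.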

From MM to SG, JK, MH, and JR • "Without V in the numerator, V is adjusted both by the weighting and the outcome model, so for V the entire procedure is in this sense doubly robust." • The attached paper uses such weights in the context of a point-treatment study; see equations (1) and (2) on page 273 as well as the weight equation on page 274. Note that here (i) X denotes a vector of measured pretreatment covariates, (ii) V is part of X and denotes a vector of covariates useful in predicting the probability of outcome, and (iii) the variables in X but not in V are the nuisance variables.

From SG to MM, JK, MH, and JR • Thanks for reminding us of this Joffe et al. 2004 article. However, I'm unclear on how what you point out about it illuminates the quote from me or what we were discussing. They mention stabilization only briefly in the discussion, and while that is fine they explore no details that I can see. Perhaps you can explain more? • I don't think it true that model-based standardization (MBS) is more difficult to implement than IPTW MSM fitting; their claim otherwise (p. 276) is just an artifact of comparing consistent variance formulas for MBS to the conservative approximations used in IPTW. Especially, if one uses a simulation method (e.g., bootstrap) to get standard errors they are both of the same coding complexity; in fact MBS will tend to be more stable computationally (since it uses no estimator inverses). But IPTW has received an order of magnitude more attention so that procs are available for it.

Perhaps more controversially, and analogously, I think the oft-asserted sparse-data superiority of IPTW over MBS is an artifact reflecting that the sparsity problem has gotten more attention in the IPTW world. But ultimately the two worlds aren't in competition since they can be combined along the lines seen in doubly robust estimation (by using sparse-data machine learners for all regression fits). In particular, I wish the Joffe et al. paper had discussed unconditional logistic regression with nonparametric nuisance-parameter estimation as an alternative or supplement to conditional logistic regression and MSMs, and how it could be combined with IPTW to fit MSMs. • Finally, as a very minor point in honor of Phil Dawid, I think strong ignorability (eq. 3 p. 273) is stronger than needed for MSM estimation as we have been discussing; since we are only estimating marginal potential outcome distributions, all that is needed is the slightly weaker marginal independence condition of weak ignorability: $\Pr(A=a \mid X=x, Y_j=y_j) = \Pr(A=a \mid X=x)$ for all $a$, $j$, where $Y_j$ is the potential outcome at $A=j$ (component $j$ of the vector $Y$)
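For reference, the two conditions can be set side by side in the notation of the message (X covariates, A treatment, Y_j the potential outcome at A = j, and Y the vector of all Y_j). The joint form below is one common way of stating the independence part of strong ignorability; that it matches eq. 3 of the Joffe et al. paper is an assumption, and the usual positivity requirement is omitted.

```latex
\text{strong (joint) ignorability:}\quad
\Pr(A=a \mid X=x,\, Y=y) = \Pr(A=a \mid X=x) \quad \text{for all } a,\ y

\text{weak (marginal) ignorability:}\quad
\Pr(A=a \mid X=x,\, Y_j=y_j) = \Pr(A=a \mid X=x) \quad \text{for all } a,\ j,\ y_j
```

The joint condition implies the marginal one for every j, but not conversely, which is why only the weaker condition is needed when just the marginal potential-outcome distributions are estimated.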

From MM to SG, JK, MH, and JR • It seems to me that V is adjusted both by the weighting and the outcome MSM in the paper. Here the inclusion of V in the MSM changes the interpretation of βa from marginal to conditional, even though the weights are not V-stabilized.

From JR to SG, MM, MH, and JK • I will try to write something today • Although your back-and-forth has gotten some points correct, there remain fundamental misconceptions regarding MSMs in your dialogue, which I will try to clear up. Accurate statements are available in my 1999 paper "Marginal structural models versus structural nested models as tools for causal inference", but it is quite technical. Also look at my Synthese 1999 paper, which is accessible but not the complete story. See also the Bang and Robins 2005 paper about DR for MSMs, and the 2007 Kang and Schafer article on DR, especially my discussion. • There is no correct answer regarding extreme weights. They often reflect that the parameter is barely identifiable. The best answer may be to change the question, or to do some culling of covariates associated with treatment but not the outcome, based on some type of pretest, although CIs after the pretest are no longer valid. • My discussion in the 2006 epi paper by Kurth, Walker, et al. may be useful • If you are interested, an article to epi on these issues may be useful, since there is obviously much confusion; Sander, since we get each other, maybe we could draft one after you read my responses, if you wish.

From SG to JR, MM, MH, and JK • Looking forward to your explication... just to inspire details: • There have to be better answers than others - and weight trimming I've seen implemented in the MSM lit so far looks desperate, at least when compared to using a modeling procedure that limits global influence (in particular by curtailing downward leveraging of small probability estimates) and thus is more capable of limiting inessential weight instability. Is there some kind of theory and algorithm for trimming that achieves the same goal? • Unless you set CIs via a method (simulation or CV) that accounts for that. The problems here may impair naive bootstrapping, but not all (smoother) methods. If the culling is based on the covariate-treatment distribution only, it's unclear to me how that affects the treatment-outcome CI (as opposed to the clear variance underestimation from covariate-outcome-based selection). Anyway, I thought if we were really serious, we'd want to transform the covariate vector into an estimate of an adjustment-sufficient reduction with minimal relation to treatment.

Only saw allusions there to problems beyond modification (and thus impacts of different weightings/target distributions). The issue of modification has to be addressed at some point, but it seems it might best be kept separate at first from the present confounding-control issue, e.g., by starting with estimation of the marginal RR in a situation where the conditional RRs are constant and thus equal the target parameter (but we don't know that). • That's an understatement if articles can't get one degree of separation away from the originator before misstatements arise! • Definitely, a paper tying this all up would be good, so I await your clarifications.
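To make concrete the kind of ad hoc trimming criticized in this exchange, here is a minimal sketch of one common variant: truncating weights at chosen lower and upper percentiles. The function name `truncate_weights` and the 1st/99th-percentile defaults are illustrative assumptions, and the sketch is not offered as a recommended procedure.

```python
import numpy as np


def truncate_weights(w, lower_pct=1.0, upper_pct=99.0):
    """Reset weights below/above the given percentiles to those percentiles.

    Truncation reduces variance at the price of reintroducing some of the
    confounding bias that the weights were meant to remove.
    """
    w = np.asarray(w, dtype=float)
    lo, hi = np.percentile(w, [lower_pct, upper_pct])
    return np.clip(w, lo, hi)


weights = np.array([0.2, 0.8, 1.0, 1.1, 0.9, 1.3, 25.0])
print(truncate_weights(weights, 5, 95))
```

The arbitrariness SG and JK object to is visible in the two percentile arguments: nothing in the procedure itself says where the cut-offs should be.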

Inverse-Probability-of-Exposure-and-Censoring Weighting • Under the three identifiability assumptions of consistency, no unmeasured confounders for exposure and censoring histories, and positivity, one can use inverse-probability-of-exposure-and-censoring weighting to consistently estimate the parameters of MSMs • The V-stabilized inverse-probability-of-exposure-and-censoring weight at visit t, SWi(t), is a product over visits 0 through t of ratios whose numerators condition only on prior exposure history and V, and whose denominators also condition on the time-dependent covariate and outcome histories • The denominator of SWi(t) is informally the conditional probability that the ith subject remains uncensored through visit t+1 and receives his/her observed exposure history, hence the name "inverse-probability-of-exposure-and-censoring weight"

The inclusion of the numerator, which does not depend on time-dependent covariates, results in more efficient estimation • Then, if we fit to the observed uncensored data, by weighted least squares with weights SWi(t), a linear regression model for Y(t+1) with the exposure indicators at visit t and V as regressors, the weighted least-squares estimates of the exposure coefficients γ1, γ2, and γ3 will be consistent for the causal parameters of interest β1, β2, and β3 of our MSM
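The weighting and fitting steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the analysis code of the thesis: the per-visit probabilities are assumed to have been estimated already (for example by pooled logistic regression), activity level is assumed coded 0 (reference) to 3, and the regression model is assumed to contain only an intercept, the three indicators, and V. The names `stabilized_weights` and `fit_weighted_model` are hypothetical.

```python
import numpy as np


def stabilized_weights(num_a, den_a, num_c, den_c):
    """Cumulative V-stabilized inverse-probability-of-exposure-and-censoring weights.

    Each argument is an (n_subjects, n_visits) array; column k holds, at the
    subject's observed values (assumed conditioning sets):
      num_a: Pr(A(k) | prior exposure history, V)
      den_a: Pr(A(k) | prior exposure history, covariate and outcome histories through k, V)
      num_c: Pr(uncensored in (k, k+1] | uncensored at k, exposure history through k, V)
      den_c: as num_c, also conditioning on covariate and outcome histories through k
    Column t of the result is SW(t), the product of the ratios for visits 0..t.
    """
    num = np.asarray(num_a, float) * np.asarray(num_c, float)
    den = np.asarray(den_a, float) * np.asarray(den_c, float)
    return np.cumprod(num / den, axis=1)


def fit_weighted_model(y_next, level, V, sw):
    """Weighted least-squares fit of Y(t+1) on a1(t), a2(t), a3(t) and V.

    One row per uncensored person-visit; level is activity at visit t coded
    0 (reference), 1 (moderate), 2 (high), 3 (very high). Returns (g1, g2, g3).
    """
    level = np.asarray(level)
    a = np.column_stack([(level == k).astype(float) for k in (1, 2, 3)])
    X = np.column_stack([np.ones(len(level)), a, np.asarray(V, float)])
    r = np.sqrt(np.asarray(sw, float))
    coef, *_ = np.linalg.lstsq(X * r[:, None], np.asarray(y_next, float) * r, rcond=None)
    return coef[1:4]
```

Because the weights are themselves estimated, the model-based WLS standard errors are not appropriate here; a robust (sandwich) variance, which is conservative in this setting, or a bootstrap is commonly used instead.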

Why does the weighting approach work? • Weighting with V-stabilized weights creates a pseudo-population in which (i) $\bar{L}(t)$ and $\bar{Y}(t)$ do not predict exposure at t and censoring during (visit t, visit t+1] given past exposure history and baseline covariates V, and (ii) the conditional distribution of $Y_{\bar{a}}(t+1)$ given V is identical to that in the actual study population, i.e., if our MSM holds in the original population, the same will be true of the pseudo-population • Hence we would like to do ordinary regression in the pseudo-population. This is exactly what our IPW estimator does, because the weights SW(t) create, as required, SW(t) copies of each subject.
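Both points can be checked on a toy two-visit example. This is a minimal sketch with made-up parameters, not the OAI data: exposure has no effect, L1 is affected by A0 and shares the unmeasured cause U with Y, and the weights are SW = Pr(A1 | A0) / Pr(A1 | A0, L1), estimated by cell means; A0 is randomized, so it needs no weight. The helper names are hypothetical.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)
n = 200_000
U = rng.normal(size=n)                                   # unmeasured
A0 = rng.binomial(1, 0.5, size=n)                        # randomized
L1 = rng.binomial(1, expit(-1.0 + 1.5 * A0 + 2.0 * U))   # affected by A0 and U
A1 = rng.binomial(1, expit(-0.5 + 1.5 * L1))             # affected by L1
Y = U + rng.normal(size=n)                               # no exposure effect


def prob_observed(a, *groups):
    """Cell-mean estimate of Pr(A = observed a | groups), binary groups only."""
    key = np.zeros(len(a), dtype=int)
    for g in groups:
        key = 2 * key + g
    p1 = np.empty(len(a))
    for k in np.unique(key):
        cell = key == k
        p1[cell] = a[cell].mean()
    return np.where(a == 1, p1, 1.0 - p1)


sw = prob_observed(A1, A0) / prob_observed(A1, A0, L1)

# (i) In the pseudo-population, L1 no longer predicts A1 within levels of A0
for a0 in (0, 1):
    share = [np.average(A1[(A0 == a0) & (L1 == l)], weights=sw[(A0 == a0) & (L1 == l)])
             for l in (0, 1)]
    print(f"A0={a0}: weighted Pr(A1=1) is {share[0]:.3f} if L1=0 and {share[1]:.3f} if L1=1")

# Ordinary regression in the pseudo-population = weighted regression in the data
X = np.column_stack([np.ones(n), A0, A1])
r = np.sqrt(sw)
coef, *_ = np.linalg.lstsq(X * r[:, None], Y * r, rcond=None)
print(f"IPW estimates: A0 {coef[1]: .3f}, A1 {coef[2]: .3f}")   # both near zero
```

Unlike the standard regressions on the same kind of data, the weighted fit recovers the null for both exposures without conditioning on L1.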

[Figure: Causal diagram for the OAI study under the time ordering {L(t), Y(t)}, A(t), Y(t+1); nodes U0–U3, L0–L3, C1–C3, A0–A3, Y0–Y3]

[Figure: Causal diagram for the pseudo-population; nodes U0–U3, L0–L3, C1–C3, A0–A3, Y0–Y3]

Introducing MSMs for epidemiologists

First application, EPIDEMIOLOGY, 2000

Second application, JASA, 2001

Third application, Stat. Med., 2002

Applications of MSMs in EPIDEMIOLOGY and AJE, 2000-2009 • Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11:561–70. • Cook NR, Cole SR, Hennekens CH. Use of a marginal structural model to determine the effect of aspirin on cardiovascular mortality in the Physicians’ Health Study. Am J Epidemiol. 2002;155:1045–53. • Cole SR, Hernán MA, Robins JM, Anastos K, Chmiel J, Detels R, et al. Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models. Am J Epidemiol. 2003;158:687–94. • Bodnar LM, Davidian M, Siega-Riz AM, Tsiatis AA. Marginal structural models for analyzing causal effects of time-dependent treatments: an application in perinatal epidemiology. Am J Epidemiol. 2004;159:926–34.

Tager IB, Haight T, Sternfeld B, Yu Z, van der Laan M. Effects of physical activity and body composition on functional limitation in the elderly: application of the marginal structural model. Epidemiology. 2004;15:479–93. • Cole SR, Hernán MA, Margolick JB, Cohen MH, Robins JM. Marginal structural models for estimating the effect of highly active antiretroviral therapy initiation on CD4 cell count. Am J Epidemiol. 2005;162:471–8. • Mortimer KM, Neugebauer R, van der Laan M, Tager IB. An application of model-fitting procedures for marginal structural models. Am J Epidemiol. 2005;162:382–8. • Haight T, Tager I, Sternfeld B, Satariano W, van der Laan M. Effects of body composition and leisure-time physical activity on transitions in physical functioning in the elderly. Am J Epidemiol. 2005;162:607–17.

Cole SR, Hernán MA, Anastos K, Jamieson BD, Robins JM. Determining the effect of highly active antiretroviral therapy on changes in human immunodeficiency virus type 1 RNA viral load using a marginal structural left-censored mean model. Am J Epidemiol. 2007;166:219–27. • Petersen ML, Deeks SG, Martin JN, van der Laan MJ. History-adjusted marginal structural models for estimating time-varying effect modification. Am J Epidemiol. 2007;166:985–93. • Brotman RM, Klebanoff MA, Nansel TR, Andrews WW, Schwebke JR, Zhang J, Yu KF, Zenilman JM, Scharfstein DO. A longitudinal study of vaginal douching and bacterial vaginosis--a marginal structural modeling analysis. Am J Epidemiol. 2008;168:188–96. • Lopez-Gatell, Cole SR, Hessol NA, French AL, Greenblatt RM, Landesman S, et al. Effect of tuberculosis on the survival of women infected with human immunodeficiency virus. Am J Epidemiol. 2007;165:1134–42.

Bembom O, van der Laan M, Haight T, Tager I. Leisure-time physical activity and all-cause mortality in an elderly cohort. Epidemiology. 2009;20:424–30.
