



Effect of physical activity on functional performance and knee pain in patients with knee osteoarthritis using Marginal Structural Models

Supervisors: Prof. K. Mohammad, Dr. M. H. Forouzanfar

Advisor: Prof. M. Mahmoodi

Author: Dr. M. A. Mansournia

January, 2012

- I am very thankful to
- Dr. Goodarz Danaei of Harvard for many useful discussions and detailed comments on several drafts of my thesis paper
- Prof. Miguel Hernán of Harvard for his expert advice on data analysis and for his thoughtful comments on a draft of my thesis paper
- Dr. Jay Kaufman, the editor of EPIDEMIOLOGY, and two anonymous reviewers for their insightful comments and constructive suggestions on a draft of my thesis paper

Miguel Hernán, Professor of Epidemiology, Harvard School of Public Health

Goodarz Danaei, Assistant Professor of Global Health & Population, Harvard School of Public Health

- Knee osteoarthritis (OA) is a leading cause of pain and disability in the elderly
- Several systematic reviews of randomized trials have demonstrated that exercise reduces pain and disability in patients with knee OA
- Prospective observational studies are needed to evaluate the long-term effects of lifestyle physical activity in the knee OA population

- We define an exposure to be fixed if every subject’s baseline exposure level determines the subject’s exposure level at all later times
- Exposures can be fixed because
- they only occur at the start of follow-up (e.g., a bomb explosion, a one-dose vaccine, a traffic accident)
- they do not change over time (e.g., genotype)
- they evolve over time in a deterministic way (e.g., time since baseline exposure)

- Any exposure that is not fixed is said to be time-dependent (e.g., medical treatment, diet, cigarette smoking, occupational exposure)

- We define a time-dependent covariate to be a time-dependent confounder if a post-baseline value of the covariate is an independent predictor of (i.e., a risk factor for) both subsequent exposure and the outcome within strata jointly determined by baseline covariates and prior exposure
- Time-dependent confounding occurs only with time-dependent exposures
- Nearly all exposures of epidemiologic interest are time-dependent

- Dunlop et al. evaluated the effect of physical activity on subsequent 1-year functional performance in adults with knee OA using two-year follow-up data from the Osteoarthritis Initiative (OAI) study
- Using GEE to estimate a linear model, they adjusted for the potential confounders available only at baseline as well as the concurrent values of time-dependent confounders and reported a dose-response relationship between physical activity and better performance

- Standard methods for analysis of longitudinal data may lead to biased estimates, whether or not one adjusts for time-dependent confounders in the analysis, when exposure affects a time-dependent confounder or when exposure is affected by prior outcome and affects future outcome
- Both of these conditions are possible in the OAI data, e.g., physical activity may affect body mass index as a potential confounder and may also affect as well as be affected by functional performance

- A(t) and Y(t) denote physical activity and outcome (i.e., functional performance or knee pain) at visit t (i.e., 0, 1, 2 and 3 years)
- L(t) corresponds to a vector of measured time-dependent potential confounders at visit t
- C(t) denotes censoring during the period (visit t-1, visit t]
- We use overbars to indicate the history of time-dependent variables through visit t, e.g., $\bar{A}(t) = \{A(0), A(1), \ldots, A(t)\}$

[Figure: Causal diagram for the OAI study under the time ordering {L(t), Y(t)}, A(t), Y(t+1); nodes U0 to U3, L0 to L3, C1 to C3, A0 to A3, Y0 to Y3]

[Figure: Bias of standard methods, shown on the causal diagram for the OAI study; nodes U0 to U3, L0 to L3, C1 to C3, A0 to A3, Y0 to Y3]

- Neyman (1923)
- Effects of fixed exposures in randomized experiments

- Rubin (1974)
- Effects of fixed exposures in randomized and observational studies

- Robins (1986)
- Effects of time-dependent exposures in randomized and observational studies

- The “g” in g-methods stands for “generalized”. Unlike conventional stratification-based methods, g-methods can generally be used to estimate the effects of time-dependent treatments
- g-methods include:
- g-computation algorithm formula (Robins, 1986)
- g-estimation of structural nested models (Robins, 1989)
- inverse probability weighting of marginal structural models (Robins, 1998)

- Models for the marginal distribution of counterfactual outcomes, possibly conditional on baseline covariates, e.g.,
- $Y_{\bar{a}(t)}(t+1)$ is the potential or counterfactual random variable representing a subject's outcome at visit t+1 had the subject, possibly contrary to fact, received the exposure history $\bar{a}(t)$ rather than his/her observed exposure history
- β5 is a column vector of parameters
- a1(t), a2(t), and a3(t) are indicators for moderate, high, and very high levels of physical activity at visit t
- V is a subset of the baseline covariates (In our analysis, V includes L(0) and Y(0))

A recent discussion about MSMs with Sander Greenland (SG), Miguel Hernán (MH), Jay Kaufman (JK), and James Robins (JR)

- I have a basic question about the term "marginal structural models (MSMs)". Why are MSMs called marginal? Because they model the marginal distribution of counterfactual variables, not the joint distribution of them (i.e., causal types), or because they do not include any covariates in the model (leading to marginal interpretations of the model coefficients)?

- My impression is that it is the first reason you give, but just to verify the actual intent I am forwarding it to the primary source authors. As you might note in their articles, an MSM for potential outcomes can include many covariates and thus not be very marginal in the ordinary regression sense; the covariate levels might even happen to identify each individual uniquely

- MSMs model the marginal distribution of counterfactual outcomes. When MSMs are conditional on baseline variables, we still refer to them as marginal. See Section 12.4 and Fine Point 12.3 of our book and let us know if this issue is unclear

- I read the relevant sections in your book. To avoid confusion, it is a good idea to emphasize that the two types of marginality implicit in MSMs are distinct issues:
1) MSMs model the marginal distribution of counterfactual outcomes. Thus the model stated for $E(Y_a \mid V)$ in Fine Point 12.3 is marginal with respect to the other potential outcomes $Y_{a^*}$ for $a^* \neq a$, irrespective of the covariates included in V. The parameters of the joint density of counterfactual outcomes (i.e., subject-specific or causal types) cannot generally be identified, even from randomized trials

2) MSMs do not need to include covariates to adjust for confounding, because they are, by definition, models for causal effects, not observed associations. This, in turn, leads to marginal causal effects with respect to the covariates not included in the model

- It seems to me #2 as stated above needs a subtle caution, and more generally some clarification of this issue is needed:
1) The weight stabilization outlined in the pair of Epidemiology 2000 articles includes the "modifiers" V in the numerator as well as the denominator probability. To see my concern clearly, suppose that V is independent of all other covariates but not independent of A given those covariates. Then it seems that in fitting the marginal-structural outcome model (MSM) with V-stabilized weights, confounding by V is no longer controlled by the weighting, as the weights no longer capture covariation of V and A; instead, adjustment for confounding by V is accomplished by the conditioning on V in the MSM, just as in conventional regression adjustment. That fact might lead one to desire more detail in specifying V than if the only concern were modification.
2) I'll also repeat that the meaning of "marginal" in MSM is brought home by an example in which V is unique for each individual (e.g., age in continuous time), and thus there is no marginalization in the population-aggregation sense. In terms commonly seen elsewhere it would be a fully conditional or subject-specific model; the marginalization is strictly with respect to the nonidentified joint potential-outcomes distribution.
Let me know if I have missed something here

- I have just noticed that our discussion is the topic of a recent commentary in EPIDEMIOLOGY (please see attached). Interestingly the author, Dr. Jay Kaufman, stated that "…..I note that effect measures adjusted simply via inverse probability of treatment weights have a marginal interpretation with respect to the covariates, and in this sense the word “marginal” has nothing to do with the first word in “marginal structural models."

- Thanks to Mohammad for pointing out Kaufman's 2010 "Marginalia" commentary. I'd read this when it appeared but the present discussion has made me notice a problem with it - I think its description is not quite accurate so I am copying this e-mail to him as well.
- It states at the start: "stabilization of the weights in a marginal structural model can change this interpretation from marginal to conditional—a potentially important consequence that appears to have not yet been widely discussed in published work."
- I believe it is not the stabilization with V (=Z in Kaufman) in the numerator of the weight that changes the interpretation. It is inclusion of V in the outcome MSM. Without V in the numerator, V is adjusted both by the weighting and the outcome model, so for V the entire procedure is in this sense doubly robust.

- Putting V in the weight numerator doesn't change the interpretation of the treatment MSM coefficient, which is already conditional on V if V is in the outcome MSM. But, as Kaufman notes, it removes adjustment for V through the weighting so that adjustment now depends solely on the outcome model. Any precision improvement from including V in the weight numerator model is thus bought at the cost of increased bias risk: Bias due to mismodeling of V in the outcome MSM is no longer removed by the weighting. This seems just another example of the usual bias-variance trade-off.
- The general point needing emphasis is that weighting is not conditioning. I think of it as reconfiguration of the sampling distribution as part of a strategy to remove biases while minimizing the variance inflation that may ensue from that bias removal. In the methods we are discussing, we remove unwanted dependencies (via the denominator) and then restore certain marginals (via the numerator) to recapture some of the precision thus lost, leading to more critical dependence on adjustment via the outcome model.
- That said, I think the overall caution raised in the Kaufman commentary about MSM interpretation is warranted.
- Again, I will appreciate verification or correction of my observations.

- I am forced to concede this point to Sander. I did write this a bit incompletely, and I wish that I had stated this point more clearly. What happens IN PRACTICE is that when authors stabilize for a baseline variable V, they then include that variable into the outcome model, and therefore the adjustment for this variable changes from marginal to conditional. Because this is the common practice, I took it for granted that putting V in the numerator implied also putting it into the outcome model, and I should have made this point more explicitly. I agree with Sander that simply stabilizing by V by putting it into the numerator of the weights without also adding it to the outcome model will not change the interpretation to a conditional one, but it will also fail to adjust for any potential confounding by V. My colleague Charlie Poole has always told me that real peer-review begins only AFTER the article is published. This e-mail exchange would be an example of his dictum.

- I am in Chile at the moment, and in a remarkable coincidence, I was discussing this very point this morning with a very sharp young Chilean statistician named José Zubizarreta. He is currently a doctoral student with Paul Rosenbaum at Penn, but lives in NYC, and has just discovered that he commutes on the same train with Marshall Joffe (who also lives in NYC). So they have taken to having long conversations on the train, and hit upon this very topic as a problem that needs some further development. José said that they had an idea for a new way to estimate the weights that would make the bias-variance tradeoff without having to rely on stabilization or trimming the weights. I don’t know much more detail than that at present, but mention this to indicate that the current practice around stabilization strikes some people as suffering from an arbitrariness that requires further development.
- Far from a correction, I thank Sander for his careful attention to this text, and I appreciate very much his clarification, which I endorse.
- I would only add that if we stick to difference or ratio contrasts of risks, this marginal vs conditional distinction is largely obviated, since marginal and conditional RD and RR yield identical numerical values.

- This is a slight misreading of what I meant. I assumed V is already in the outcome model and thus controlled through that, otherwise it would not be V (a modifier in the original 2000 formulation). Thus with V we are already in the conditional world. The only question then is whether V is not in the weight numerator (hence we have some double robustness with removable inefficiency) or in the weight numerator (hence only single robustness with improved efficiency).
- Interesting. However, I think there have already been a few lines of work on this problem:
1) All the machine-learning stuff about weight estimation, starting with Ridgeway and McCaffrey at RAND in 2004 (see their summary in Stat Sci 2007) and going on up through Lee et al. last year (attached), all of which pretty much obviates ad hoc weight trimming.

2) My impression is that van der Laan and his group claim to have solved the entire bias-vs-variance issue in all causal-inference problems via "targeted maximum likelihood" (TML) with a "superlearner" (again a machine-learning based approach). They certainly have written a lot on the topic. I can claim no expertise at all on the issue, but note that at least they seem to frame the problem more clearly than most have so far: Identify your structural model and the parameter of interest within it; state your inference-optimality criteria; then construct an optimal procedure given your criteria, structural model, and any further constraints (model) on the data-generating process. In other words, apply classical frequentist decision theory to these problems. I am sure Jamie could improve dramatically on my impression and cite his own solutions.

- That holds only if both the conditional and marginal measures are derived using the same model and that model imposes conditional homogeneity. But off the null, at least one of RD and RR will be heterogeneous. Under a model allowing heterogeneity, you could say the properly weighted averages of the conditional effects will equal the marginal effects. But for the RR the weights for that average are not the IPTW or standardization weights.
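Greenland's point can be illustrated with hypothetical numbers. In this minimal sketch there are two equally sized strata of a covariate V. Exposure is assumed independent of V, so there is no confounding. The risk difference is the same in both strata, but the risk ratios differ. The marginal RR then equals an average of the stratum RRs weighted by the stratum weight times the baseline (unexposed) risk, not by the standardization weights alone. All names and risks are illustrative only.

```python
# Hypothetical risks in two strata of a covariate V, each holding half the population.
# Exposure is assumed independent of V (no confounding), so averaging the stratum
# risks over the stratum weights gives the marginal risks.
STRATA = [
    # (stratum weight, risk if unexposed, risk if exposed)
    (0.5, 0.10, 0.20),
    (0.5, 0.40, 0.50),
]


def stratum_measures(strata):
    """Risk difference (RD) and risk ratio (RR) within each stratum."""
    return [(r1 - r0, r1 / r0) for _, r0, r1 in strata]


def marginal_measures(strata):
    """Standardize the risks over the stratum weights, then take RD and RR."""
    m0 = sum(w * r0 for w, r0, _ in strata)
    m1 = sum(w * r1 for w, _, r1 in strata)
    return m1 - m0, m1 / m0


def rr_weighted_average(strata, weight_by_baseline_risk):
    """Average of stratum RRs, weighted by w or by w * (risk if unexposed)."""
    weights = [w * r0 if weight_by_baseline_risk else w for w, r0, _ in strata]
    rrs = [r1 / r0 for _, r0, r1 in strata]
    return sum(wt * rr for wt, rr in zip(weights, rrs)) / sum(weights)


print([(round(rd, 3), round(rr, 3)) for rd, rr in stratum_measures(STRATA)])
print([round(v, 3) for v in marginal_measures(STRATA)])
print(round(rr_weighted_average(STRATA, False), 3), round(rr_weighted_average(STRATA, True), 3))
```

Both strata have RD = 0.1, which also equals the marginal RD. The stratum RRs, however, are 2.0 and 1.25. The marginal RR of 1.4 is recovered only when the stratum RRs are weighted by w times the baseline risk; the standardization weights alone give 1.625.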

- "Without V in the numerator, V is adjusted both by the weighting and the outcome model, so for V the entire procedure is in this sense doubly robust."
- The attached paper uses such weights in the context of a point-treatment study; see equations (1) and (2) on page 273 as well as the weight equation on page 274. Note that here (i) X denotes a vector of measured pretreatment covariates, (ii) V is part of X and denotes a vector of useful covariates in predicting the probability of outcome, and (iii) those variables in X but not in V are the nuisance variables.

- Thanks for reminding us of this Joffe et al. 2004 article. However, I'm unclear on how what you point out about it illuminates the quote from me or what we were discussing. They mention stabilization only briefly in the discussion, and while that is fine, they explore no details that I can see. Perhaps you can explain more?
- I don't think it true that model-based standardization (MBS) is more difficult to implement than IPTW MSM fitting; their claim otherwise (p. 276) is just an artifact of comparing consistent variance formulas for MBS to the conservative approximations used in IPTW. Especially, if one uses a simulation method (e.g., bootstrap) to get standard errors they are both of the same coding complexity; in fact MBS will tend to be more stable computationally (since it uses no estimator inverses). But IPTW has received an order of magnitude more attention so that procs are available for it.

- Perhaps more controversially, and analogously, I think the oft-asserted sparse-data superiority of IPTW over MBS is an artifact reflecting that the sparsity problem has gotten more attention in the IPTW world. But ultimately the two worlds aren't in competition since they can be combined along the lines seen in doubly robust estimation (by using sparse-data machine learners for all regression fits). In particular, I wish the Joffe et al. paper had discussed unconditional logistic regression with nonparametric nuisance-parameter estimation as an alternative or supplement to conditional logistic regression and MSMs, and how it could be combined with IPTW to fit MSMs.
- Finally, as a very minor point in honor of Phil Dawid, I think strong ignorability (eq. 3 p. 273) is stronger than needed for MSM estimation as we have been discussing; since we are only estimating marginal potential-outcome distributions, all that is needed is the slightly weaker marginal independence condition of weak ignorability: $\Pr(A=a \mid X=x, Y_j=y_j) = \Pr(A=a \mid X=x)$ for all $a, j$, where $Y_j$ is the potential outcome at $A=j$ (component $j$ of the vector $Y$)

- It seems to me that V is adjusted both by the weighting and the outcome MSM in the paper. Here the inclusion of V in the MSM changes the interpretation of βa from marginal to conditional, even though the weights are not V-stabilized.

- I will try to write something today
- Although your back-and-forth has gotten some points correct, there remain fundamental misconceptions regarding MSMs in your dialogue, which I will try to clear up. Accurate statements are available in my 1999 paper "Marginal structural models versus structural nested models as tools for causal inference", but it is quite technical. Also look at my Synthese 1999 paper, which is accessible but not the complete story. See also the Bang and Robins 2005 paper about doubly robust (DR) estimation for MSMs, and the 2007 Kang and Schafer article regarding DR, especially my discussion.
- There is no correct answer regarding extreme weights. They often reflect that the parameter is barely identifiable. The best answer may be to change the question, or to do some culling of covariates associated with treatment but not the outcome based on some type of pretest, although CIs after such a pretest are no longer valid.
- My discussion in the Epi paper from 2006 (Kurth, Walker, et al.) may be useful
- If you are interested, an article to Epi regarding these issues may be useful, since there is obviously much confusion; Sander, since we get each other, maybe we could draft one after you read my responses, if you wish.

- Looking forward to your explication... just to inspire details:
- There have to be better answers than others - and weight trimming I've seen implemented in the MSM lit so far looks desperate, at least when compared to using a modeling procedure that limits global influence (in particular by curtailing downward leveraging of small probability estimates) and thus is more capable of limiting inessential weight instability. Is there some kind of theory and algorithm for trimming that achieves the same goal?
- Unless you set CIs via a method (simulation or CV) that accounts for that. The problems here may impair naive bootstrapping, but not all (smoother) methods. If the culling is based on the covariate-treatment distribution only, it's unclear to me how that affects the treatment-outcome CI (as opposed to the clear variance underestimation from covariate-outcome-based selection). Anyway, I thought if we were really serious, we'd want to transform the covariate vector into an estimate of an adjustment-sufficient reduction with minimal relation to treatment.

- Only saw allusions there to problems beyond modification (and thus impacts of different weightings/target distributions). The issue of modification has to be addressed at some point, but it seems it might best be kept separate at first from the present confounding-control issue, e.g., by starting with estimation of the marginal RR in a situation where the conditional RRs are constant and thus equal the target parameter (but we don't know that).
- That's an understatement if articles can't get one degree of separation away from the originator before misstatements arise!
- Definitely, a paper tying this all up would be good, so I await your clarifications.

- Under the 3 identifiability assumptions of consistency, no unmeasured confounders for exposure and censoring histories, and positivity, one can use inverse-probability-of-exposure-and-censoring weighting to consistently estimate the parameters of MSMs
- The V-stabilized inverse-probability-of-exposure-and-censoring weights at visit t are
- The denominator of SWi(t) is informally the conditional probability that the ith subject remains uncensored through visit t+1 and receives his/her observed exposure history, hence the name "inverse-probability-of-exposure-and-censoring weight"
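This is a minimal sketch of how such weights are typically assembled, following the usual construction in the MSM literature (e.g., Hernán, Brumback, and Robins, 2000). SWi(t) is the running product over visits k = 0, ..., t of a per-visit ratio. That ratio compares numerator and denominator probabilities of the observed exposure level and of remaining uncensored until the next visit. The function name and all probabilities are hypothetical; in practice they would come from the fitted numerator models (exposure history and V only) and denominator models (which also use the L and Y histories).

```python
from itertools import accumulate
from operator import mul


def stabilized_weights(num_exposure, den_exposure, num_uncensored, den_uncensored):
    """Cumulative V-stabilized weights SW(0), ..., SW(T) for one subject.

    Each argument lists one predicted probability per visit k = 0, ..., T:
      num_exposure[k]   P(A(k) = observed level | exposure history, V)
      den_exposure[k]   P(A(k) = observed level | exposure history, L and Y histories, V)
      num_uncensored[k] P(C(k+1) = 0 | uncensored so far, exposure history, V)
      den_uncensored[k] P(C(k+1) = 0 | uncensored so far, exposure history, L and Y histories, V)
    """
    lists = (num_exposure, den_exposure, num_uncensored, den_uncensored)
    if len({len(p) for p in lists}) != 1:
        raise ValueError("need one probability per visit in every list")
    per_visit = [
        (ne * nc) / (de * dc)
        for ne, de, nc, dc in zip(num_exposure, den_exposure, num_uncensored, den_uncensored)
    ]
    # SW(t) is the running product of the per-visit ratios up to visit t.
    return list(accumulate(per_visit, mul))


# Hypothetical probabilities for one subject at visits 0, 1, 2. At visit 0 the
# numerator and denominator coincide because V already contains L(0) and Y(0).
sw = stabilized_weights(
    num_exposure=[0.30, 0.40, 0.40],
    den_exposure=[0.30, 0.50, 0.20],
    num_uncensored=[0.95, 0.90, 0.90],
    den_uncensored=[0.95, 0.80, 0.90],
)
print([round(w, 3) for w in sw])  # [1.0, 0.9, 1.8]
```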

- Including the numerator, which does not depend on time-dependent covariates, results in more efficient estimation
- Then if we fit the following linear regression model
to the observed data by weighted least squares with weights SWi(t), the weighted least squares estimates of γ1, γ2, and γ3 will be consistent for the causal parameters of interest β1, β2, and β3 of our MSM

- Weighting with V-stabilized weights creates a pseudo-population in which
(i) $\bar{L}(t)$ and $\bar{Y}(t)$ do not predict exposure at visit t or censoring during (visit t, visit t+1], given past exposure history and baseline covariates V

(ii) the conditional distribution of the counterfactual outcomes $Y_{\bar{a}(t)}(t+1)$ given V is identical to that in the actual study population, i.e., if our MSM holds in the original population, the same will be true of the pseudo-population

- Hence we would like to do ordinary regression in the pseudo-population. But this is exactly what our IPW estimator does, because the weights SW(t) create, as required, SW(t) copies of each subject.
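The pseudo-population argument can be checked numerically. This minimal sketch assumes integer weights and a single binary exposure indicator (toy data, hypothetical function names). Under those assumptions, WLS with weights w gives exactly the same point estimates as ordinary least squares on a data set containing w copies of each record.

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = g0 + g1 * x; returns (g0, g1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    g1 = sxy / sxx
    return ybar - g1 * xbar, g1


def pseudo_population(x, y, w):
    """Replicate each record w times (integer weights only)."""
    xs, ys = [], []
    for xi, yi, wi in zip(x, y, w):
        xs.extend([xi] * wi)
        ys.extend([yi] * wi)
    return xs, ys


# Toy data: x is an exposure indicator, y a walking speed (m/min), w integer weights.
x = [0, 0, 1, 1, 1]
y = [60.0, 62.0, 70.0, 66.0, 71.0]
w = [1, 3, 2, 1, 2]

weighted = wls_line(x, y, w)
xs, ys = pseudo_population(x, y, w)
unweighted = wls_line(xs, ys, [1] * len(xs))
print(weighted, unweighted)  # equal up to floating-point rounding
```

Actual SW(t) values are not integers, so the copies are a conceptual device. Standard errors computed naively from the replicated data would be too small because the copies are not independent observations. This is one reason a robust variance estimator is needed.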

[Figure: Causal diagram for the OAI study under the time ordering {L(t), Y(t)}, A(t), Y(t+1); nodes U0 to U3, L0 to L3, C1 to C3, A0 to A3, Y0 to Y3]

[Figure: Causal diagram for the pseudo-population; nodes U0 to U3, L0 to L3, C1 to C3, A0 to A3, Y0 to Y3]

- Hernán MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology. 2000;11:561–70.
- Cook NR, Cole SR, Hennekens CH. Use of a marginal structural model to determine the effect of aspirin on cardiovascular mortality in the Physicians’ Health Study. Am J Epidemiol. 2002;155:1045–53.
- Cole SR, Hernán MA, Robins JM, Anastos K, Chmiel J, Detels R, et al. Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models. Am J Epidemiol. 2003;158:687–94.
- Bodnar LM, Davidian M, Siega-Riz AM, Tsiatis AA. Marginal structural models for analyzing causal effects of time-dependent treatments: an application in perinatal epidemiology. Am J Epidemiol. 2004;159:926–34.

- Tager IB, Haight T, Sternfeld B, Yu Z, van der Laan M. Effects of physical activity and body composition on functional limitation in the elderly: application of the marginal structural model. Epidemiology. 2004;15:479–93.
- Cole SR, Hernán MA, Margolick JB, Cohen MH, Robins JM. Marginal structural models for estimating the effect of highly active antiretroviral therapy initiation on CD4 cell count. Am J Epidemiol. 2005;162:471–8.
- Mortimer KM, Neugebauer R, van der Laan M, Tager IB. An application of model-fitting procedures for marginal structural models. Am J Epidemiol. 2005;162:382–8.
- Haight T, Tager I, Sternfeld B, Satariano W, van der Laan M. Effects of body composition and leisure-time physical activity on transitions in physical functioning in the elderly. Am J Epidemiol. 2005;162:607–17.

- Cole SR, Hernán MA, Anastos K, Jamieson BD, Robins JM. Determining the effect of highly active antiretroviral therapy on changes in human immunodeficiency virus type 1 RNA viral load using a marginal structural left-censored mean model. Am J Epidemiol. 2007;166:219–27.
- Petersen ML, Deeks SG, Martin JN, van der Laan MJ. History-adjusted marginal structural models for estimating time-varying effect modification. Am J Epidemiol. 2007;166:985–93.
- Brotman RM, Klebanoff MA, Nansel TR, Andrews WW, Schwebke JR, Zhang J, Yu KF, Zenilman JM, Scharfstein DO. A longitudinal study of vaginal douching and bacterial vaginosis--a marginal structural modeling analysis. Am J Epidemiol. 2008;168:188–96.
- Lopez-Gatell, Cole SR, Hessol NA, French AL, Greenblatt RM, Landesman S, et al. Effect of tuberculosis on the survival of women infected with human immunodeficiency virus. Am J Epidemiol. 2007;165:1134–42.

- Bembom O, van der Laan M, Haight T, Tager I. Leisure-time physical activity and all-cause mortality in an elderly cohort. Epidemiology. 2009;20:424–30.

- The aim of this study is to estimate the causal effect of physical activity on functional performance and knee pain in the OAI cohort using inverse probability weighted (IPW) estimators of marginal structural models
- In my thesis paper, I discussed the theoretical basis and highlighted the key assumptions of this methodology in the context of the OAI study

- OAI is an ongoing, multi-center, longitudinal study of knee OA in men and women aged 45 to 79 years, either with, or at risk of developing, knee OA
- Subjects were recruited between 2004 and 2006 from four clinical sites: Baltimore, MD; Columbus, OH; Pittsburgh, PA; and Pawtucket, RI
- Data from baseline and 3 annual follow-up visits (enrollees dataset 12 and clinical datasets 0.2.2, 1.2.1, 3.2.1, and 5.2.1) were obtained from the OAI database, which is publicly available at http://www.oai.ucsf.edu/

- A total of 2545 subjects with radiographic knee OA (i.e., a Kellgren/Lawrence grade of 2 or 3 in one or both knees) who had data on all baseline covariates were selected for analysis
- Patients with any of the following were excluded from the study:
- rheumatoid arthritis or inflammatory arthritis
- severe joint space narrowing in both knees, unilateral total knee replacement with severe joint space narrowing in the other knee, bilateral total knee replacement, or plans to have bilateral knee replacement in the next 3 years
- comorbid conditions that might interfere with the ability to participate in a 4-year study
- a positive pregnancy test
- inability to provide a blood sample for any reason
- plans to relocate in the next 3 years
- current participation in a randomized double-blind trial
- body weight over 285 pounds (men) or over 250 pounds (women)

- Physical activity level was measured at each visit, using the Physical Activity Scale for the Elderly (PASE), a 21-item questionnaire designed to assess leisure-time, household, and occupational activities during the past 7 days
- The PASE score was computed by multiplying the activity frequency by the activity intensity weights and then summing over all activities
- Quartiles of the PASE score at baseline (i.e., Q1=(0 to 94), Q2=(95 to 146), Q3=(147 to 206) and Q4=(207 to 465)) were used to define low, moderate, high, and very high levels of physical activity at all visits
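The scoring and categorization steps can be sketched as follows. The (frequency, weight) pairs are hypothetical placeholders, not the published PASE item weights. Assigning a non-integer score that falls between two published limits (e.g., 94.5) to the higher level is an assumption. The indicators correspond to a1(t), a2(t), and a3(t) in the MSM, with low activity as the reference.

```python
def pase_score(items):
    """PASE-style score: sum over activities of frequency x intensity weight.

    items is a list of (frequency, weight) pairs; the values used below are
    hypothetical and are not the published PASE weights.
    """
    return sum(freq * weight for freq, weight in items)


# Upper limits of the baseline quartiles: Q1 0-94, Q2 95-146, Q3 147-206, Q4 207-465.
UPPER_LIMITS = [(94, "low"), (146, "moderate"), (206, "high")]


def activity_level(score):
    """Map a score to a level; a score above an upper limit moves to the next level."""
    for upper, label in UPPER_LIMITS:
        if score <= upper:
            return label
    return "very high"


def indicators(level):
    """(a1, a2, a3): indicators for moderate, high and very high; low is the reference."""
    return tuple(int(level == lab) for lab in ("moderate", "high", "very high"))


score = pase_score([(2.0, 20), (1.5, 25)])
print(score, activity_level(score), indicators(activity_level(score)))
```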

- At each visit, functional performance was assessed using the timed 20-meter walk test
- We used average speed in m/min over two 20-meter tests as the main outcome
- Self-reported knee pain was measured separately for each knee using the 5-item pain subscale of the Western Ontario and McMaster Universities OA Index (WOMAC; Likert version 3.1)
- We defined the score for each subject as the higher of the scores for the two knees
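This sketch covers the two outcome definitions. It assumes that "average speed over two tests" means the mean of the two per-trial speeds. Dividing the total distance by the total time would give a slightly different value: for trials of 15 s and 20 s, 40 m in 35 s is about 68.6 m/min rather than 70. Function names are hypothetical, and both knees are assumed to have a pain score.

```python
def walk_speed(times_s, distance_m=20.0):
    """Mean of per-trial speeds in m/min for the timed 20-meter walk trials."""
    speeds = [distance_m / (t / 60.0) for t in times_s]
    return sum(speeds) / len(speeds)


def womac_pain_score(right_knee, left_knee):
    """Subject-level WOMAC pain score: the higher (worse) of the two knees."""
    return max(right_knee, left_knee)


print(walk_speed([15.0, 20.0]))   # trials at 80 and 60 m/min
print(womac_pain_score(3, 7))
```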

- Subjects were (right) censored at their first missing outcome measurement
- From the 2545 subjects included in our analysis, 2260, 2027, and 1874 subjects completed the timed 20-meter walk test by the first, second, and third follow-up visit, respectively
- The same numbers for the pain subscale of WOMAC were 2386, 2243, and 2156 subjects
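The censoring rule can be sketched as follows. The data layout is a hypothetical assumption: each subject's outcomes are stored as a list over visits 0 to 3, with None marking a missing measurement. Once a subject is censored, later measurements are ignored even if present.

```python
def apply_censoring(outcomes):
    """Return (usable outcomes, censoring indicators) for one subject.

    C[t] = 1 means the subject is censored by visit t; censoring starts at the
    first missing outcome and is permanent thereafter.
    """
    usable, censored = [], []
    is_censored = False
    for y in outcomes:
        if y is None:
            is_censored = True
        censored.append(int(is_censored))
        usable.append(None if is_censored else y)
    return usable, censored


def completers_by_visit(subjects, n_visits=4):
    """Number of uncensored subjects at each follow-up visit 1, ..., n_visits - 1."""
    counts = [0] * n_visits
    for outcomes in subjects:
        _, c = apply_censoring(outcomes)
        for t in range(n_visits):
            counts[t] += 1 - c[t]
    return counts[1:]


# Hypothetical walk speeds (m/min) at visits 0-3 for three subjects.
subjects = [
    [70.0, 72.0, None, 75.0],   # missing at visit 2: censored from visit 2 on
    [65.0, 66.0, 67.0, 68.0],   # complete follow-up
    [80.0, None, None, None],   # censored from visit 1 on
]
print(completers_by_visit(subjects))  # [2, 1, 1]
```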

- Age (restricted cubic splines with four knots located at the 5th, 35th, 65th, and 95th percentiles)
- Sex
- Race (White, African American, other)
- Education (less than high school, high school graduate, college)
- Marital status (married, other)
- Current smoking
- Current alcohol use (0, <1, ≥1 drinks/day)
- Knee OA severity, defined as joint space narrowing (none, narrowed, or severely narrowed, using data from the worse knee)
- Comorbidity (defined as a Charlson index greater than 0)
- Prior knee injury, hip pain, ankle pain, and foot pain
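This sketch builds the restricted cubic spline terms for age using Harrell's parameterization, with knots at the 5th, 35th, 65th, and 95th percentiles (the placement Harrell recommends for four knots). Two details are assumptions. The percentile interpolation rule (Python's "inclusive" method) may shift the knots slightly relative to other software. Dividing by the squared knot range only rescales the columns. The ages are hypothetical.

```python
from statistics import quantiles


def rcs_knots(values, percentiles=(5, 35, 65, 95)):
    """Knots at the given percentiles of the data ('inclusive' interpolation)."""
    cuts = quantiles(values, n=100, method="inclusive")  # cuts[p - 1] is the p-th percentile
    return [cuts[p - 1] for p in percentiles]


def rcs_basis(x, knots):
    """Restricted cubic spline terms for a scalar x (Harrell's parameterization).

    Returns [x, s_1, ..., s_{k-2}] for k knots; a linear combination of these terms
    is cubic between knots and linear below the first and above the last knot.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_pos(u):
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2  # common normalization; only rescales the columns
    span = t[k - 1] - t[k - 2]
    terms = [x]
    for j in range(k - 2):
        s = (cube_pos(x - t[j])
             - cube_pos(x - t[k - 2]) * (t[k - 1] - t[j]) / span
             + cube_pos(x - t[k - 1]) * (t[k - 2] - t[j]) / span)
        terms.append(s / scale)
    return terms


ages = list(range(45, 80))  # hypothetical baseline ages
knots = rcs_knots(ages)
print([round(v, 2) for v in knots])
print([round(v, 4) for v in rcs_basis(62.0, knots)])
```

With four knots, age enters the models as three columns: age itself and two nonlinear terms.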

- Knee symptoms (presence of pain, aching, or stiffness in or around either knee on most days in at least one month during the past year)
- Depressive symptoms during the past 7 days based on the Center for Epidemiologic Studies Depression Scale (CES-D) (≥16, <16)
- Body mass index (calculated as weight [kg]/height [m]² and categorized into 3 groups: <25, 25 to <30, and ≥30 kg/m²)
- Functional performance or knee pain (depending on the outcome of interest)
- In the case of missing data on time-dependent covariates (<0.5% for the functional performance analysis and 5% for the knee pain analysis), the last observed values were carried forward
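This sketch shows the last-observation-carried-forward step for one subject's series of visits, together with the BMI grouping used for one of the time-dependent covariates. The data layout (a list over visits, with None for missing) and the function names are hypothetical. A missing value with no earlier observation stays missing.

```python
def locf(values):
    """Last observation carried forward over one subject's visits (None = missing)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def bmi_category(weight_kg, height_m):
    """BMI = weight (kg) / height (m)^2, grouped as <25, 25 to <30, and >=30 kg/m^2."""
    bmi = weight_kg / height_m ** 2
    if bmi < 25:
        return "<25"
    if bmi < 30:
        return "25-<30"
    return ">=30"


# Hypothetical BMI categories at visits 0-3 with a missing value at visit 2.
print(locf(["25-<30", ">=30", None, "25-<30"]))
```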

- Because V includes L(0) and Y(0), the numerator and denominator of SWi(0) condition on the same variables, so SWi(0) equals 1 for all subjects
- The true weights at visits 1 and 2 are unknown and need to be estimated

- We estimated the exposure probabilities in the denominator of SWi(t) using the following pooled polytomous logistic regression model (for k = 1, 2 and j = 1, 2, 3):
- To estimate the censoring probabilities in the denominator of SWi(t) we used the following pooled binary logistic regression model (for k = 1, 2):

- The exposure and censoring probabilities in the numerator of SWi(t) were estimated by fitting the above polytomous and binary logistic models but without the covariates L(k) and Y(k)
- Misspecification of the denominator models results in biased effect estimates. In contrast, replacing the numerator of SWi(t) with an estimate based on misspecified models does not result in bias
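Given fitted pooled models, the probability of each subject's observed activity level at a visit follows from the model's linear predictors. This minimal sketch takes "low" as the reference category, an assumption consistent with j = 1, 2, 3. The linear-predictor values are hypothetical stand-ins for fitted coefficients times covariates. The denominator model also includes L(k) and Y(k); the numerator model omits them. The binary censoring model is the two-category special case.

```python
import math

LEVELS = ("low", "moderate", "high", "very high")


def level_probabilities(etas):
    """Probabilities of the four activity levels from a polytomous logistic model.

    etas = (eta_1, eta_2, eta_3) are the linear predictors for moderate, high and
    very high versus the reference level low, whose linear predictor is 0.
    """
    scores = [0.0, *etas]
    top = max(scores)  # subtracting the maximum avoids overflow in exp()
    expd = [math.exp(s - top) for s in scores]
    total = sum(expd)
    return {level: e / total for level, e in zip(LEVELS, expd)}


def exposure_ratio(observed_level, etas_numerator, etas_denominator):
    """Numerator / denominator probability of the observed level at one visit."""
    num = level_probabilities(etas_numerator)[observed_level]
    den = level_probabilities(etas_denominator)[observed_level]
    return num / den


# Hypothetical linear predictors for one subject-visit.
print(exposure_ratio("high", etas_numerator=(0.2, 0.1, -0.5), etas_denominator=(0.4, 0.6, -0.2)))
```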

- The temporal ordering assumption {L(t), Y(t)}, A(t), Y(t+1) for t = 0, 1, 2 may be violated in the OAI study, because the covariates {L(t), Y(t)} were measured after the exposure changed for the last time before visit t
- Physical activity at any time may change in response to the time-dependent confounders or the outcomes at that time, but the times of exposure change don't necessarily coincide with the times of visits
- We can use either {L(t), Y(t)} or {L(t-1), Y(t-1)} as mismeasured versions of the confounder values at the time the exposure last changed before visit t
- We also conducted a different analysis using IPW models with weights adjusting for {L(t-1), Y(t-1)}. All uncensored subjects at visit 1 were included in this analysis
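
Constructing the lagged covariates {L(t-1), Y(t-1)} amounts to shifting each subject's covariate and outcome values by one visit; the first visit has no lagged value and drops out. A minimal sketch, assuming consecutive visits sorted in visit order and hypothetical variable names `func` (functional performance) and `pain` (knee pain):

```python
from typing import Any, Dict, List


def add_lags(visits: List[Dict[str, Any]], keys: List[str]) -> List[Dict[str, Any]]:
    """Attach key_lag1 = value of key at the previous visit, within one subject.

    `visits` must be sorted by visit number with no skipped visits. The first
    visit has no previous value, so it is omitted from the result.
    """
    out = []
    for prev, cur in zip(visits, visits[1:]):
        row = dict(cur)
        for k in keys:
            row[k + "_lag1"] = prev[k]  # value measured one visit earlier
        out.append(row)
    return out


# Hypothetical subject with three visits.
subject = [
    {"visit": 0, "func": 1.2, "pain": 3},
    {"visit": 1, "func": 1.1, "pain": 4},
    {"visit": 2, "func": 1.3, "pain": 2},
]
for row in add_lags(subject, ["func", "pain"]):
    print(row)
```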

Causal diagram for the OAI study under the time ordering {L(t-1), Y(t-1)}, A(t), Y(t+1) (figure nodes: U0 to U3, L0 to L3, C2 and C3, A0 to A3, Y0 to Y3)

- Because the repeated outcome measurements and the use of weights induce within-subject correlation, we used a Huber-White sandwich variance estimator with clustering by subject to obtain valid, but conservative, confidence intervals for our IPW estimator
- For comparison, we also used GEE to estimate two conventional linear regression models that included as covariates 1) only the baseline confounders and 2) the baseline and time-dependent confounders
- We also investigated the cumulative effect of physical activity on the outcomes by specifying alternative MSMs that included a cumulative measure of physical activity
- All statistical analyses were performed using Stata version 10
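
The clustered sandwich variance can be sketched for a linear MSM fitted by weighted least squares. This is a pure-Python illustration under stated assumptions, not Stata's implementation: the design matrix (e.g., indicators for physical activity levels) is up to the analyst, the data below are hypothetical, and the finite-sample adjustment factors that statistical packages commonly apply are omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Invert a small non-singular square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def wls_cluster_robust(
    x: Sequence[Sequence[float]],
    y: Sequence[float],
    w: Sequence[float],
    cluster: Sequence[str],
) -> Tuple[List[float], List[List[float]]]:
    """Weighted least squares with a Huber-White sandwich variance clustered by subject.

    x: design rows (first column 1 for the intercept); y: outcomes;
    w: stabilized weights; cluster: subject identifier of each person-visit.
    """
    p = len(x[0])
    # Bread: (X'WX)^{-1}
    xtwx = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(x, w)) for k in range(p)] for j in range(p)]
    bread = invert(xtwx)
    xtwy = [[sum(wi * xi[j] * yi for xi, yi, wi in zip(x, y, w))] for j in range(p)]
    beta = [row[0] for row in matmul(bread, xtwy)]
    # Meat: sum over subjects of the outer product of each subject's summed weighted score.
    scores: Dict[str, List[float]] = defaultdict(lambda: [0.0] * p)
    for xi, yi, wi, ci in zip(x, y, w, cluster):
        resid = yi - sum(b * v for b, v in zip(beta, xi))
        for j in range(p):
            scores[ci][j] += wi * xi[j] * resid
    meat = [[sum(s[j] * s[k] for s in scores.values()) for k in range(p)] for j in range(p)]
    vcov = matmul(matmul(bread, meat), bread)
    return beta, vcov


# Hypothetical data: 4 person-visits, exposure indicator in the second column.
x = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0]
w = [1.0, 1.0, 1.0, 1.0]
beta, vcov = wls_cluster_robust(x, y, w, ["s1", "s2", "s3", "s4"])
print(beta, vcov[1][1] ** 0.5)
```

When every subject contributes a single row, this reduces to the ordinary (HC0) robust variance; with repeated visits, residual scores are summed within each subject before squaring, which is what accounts for the within-subject correlation.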

- Compared to patients with low physical activity levels, patients with higher physical activity levels were younger; more likely to be men; had higher education; were more likely to drink alcohol; and included more white and married individuals
- They were also more likely to have better functional performance; more likely to have prior injuries; and less likely to have comorbidities
- Comparison of baseline characteristics of study participants who completed the 3-year follow-up for functional performance and those who did not indicated that the latter were more likely to be women; had less education; were more likely to currently smoke; were less likely to drink alcohol; and included fewer white and married individuals
- They were more likely to have worse functional performance; had higher body mass index; were more likely to have depression, knee pain, comorbidities and prior hip, ankle and foot pain; and had lower levels of physical activity

Mean differences (with 95% confidence intervals) in functional performance between each level of physical activity and the low level, using IPW and conventional models adjusting for concurrent time-dependent confounders (n=6161 person-visits)

Mean differences (with 95% confidence intervals) in knee pain between each level of physical activity and the low level, using IPW and conventional models adjusting for concurrent time-dependent confounders (n=6785 person-visits)

- In a post-hoc analysis, we included linear interaction terms between age and physical activity in the MSM for functional performance. The results indicated that the effect of physical activity on functional performance increased linearly with age (interaction p = 0.017)
- Therefore, we fit an MSM with interaction terms between the indicator of being 65 years of age or older and physical activity

Mean differences (with 95% confidence intervals) in functional performance between each level of physical activity and the low level, separately in the elderly (≥ 65 years old at baseline) and younger participants, using IPW models adjusting for concurrent time-dependent confounders

Mean differences (with 95% confidence intervals) in functional performance between each level of physical activity and the low level, using IPW and conventional models adjusting for lagged time-dependent confounders (n=3901 person-visits)

Mean differences (with 95% confidence intervals) in knee pain between each level of physical activity and the low level, using IPW and conventional models adjusting for lagged time-dependent confounders (n=4399 person-visits)

Mean differences (with 95% confidence intervals) in functional performance between each level of physical activity and the low level, separately in the elderly (≥ 65 years old at baseline) and younger participants, using IPW models adjusting for lagged time-dependent confounders

- A chunk test of the linear interaction terms between age and physical activity in the IPW model for functional performance was not statistically significant (interaction p = 0.19)

- Using marginal structural models, we estimated the causal effects of physical activity on functional performance and knee pain in a prospective cohort study
- We used this methodology because conventional methods may give biased effect estimates when (i) exposure affects a confounder or (ii) exposure both affects and is affected by the study outcome
- In the OAI study, as in other interval cohorts, confounders are not measured at the times of exposure change. However, either concurrent or lagged time-dependent confounders could be considered mismeasured versions of the true confounder values
- Therefore, for each of the two outcomes, we fitted two IPW models using two sets of weights; one adjusting for concurrent and one adjusting for lagged time-dependent confounders

- Both IPW models showed that physical activity does not affect knee pain in adults with knee OA
- While the IPW model with weights adjusting for concurrent confounders indicated that physical activity has no effect on functional performance, the IPW model with weights adjusting for lagged confounders demonstrated a weak effect without a clear dose-response pattern
- In general, these two models can yield different effect estimates because
(1) concurrent confounders are expected to have less measurement error than lagged confounders, because measurement intervals are relatively long (i.e., 1 year)
(2) the IPW model with weights adjusting for concurrent confounders may give biased estimates if either L(t) or Y(t) is affected by the true A(t); e.g., a participant may change his/her physical activity level six months before the current visit and maintain that level of activity until the visit occurs

- A previous analysis used data from the first two years of follow-up in the OAI and applied a conventional linear regression model to adjust for the potential confounders available only at baseline as well as the concurrent time-dependent confounders
- The results of that analysis showed a significant increase in mean walking speed with higher levels of physical activity. The mean differences between the second, third, and fourth quartiles of the PASE score and the first quartile were 2.01, 2.93, and 4.02 m/min, respectively
- The authors stated that a sensitivity analysis that additionally adjusted for baseline functional performance confirmed a significant statistical trend, but they did not report the results
- It is difficult to directly compare these results and ours, because of different follow-up times and some minor differences in the covariates adjusted for in the regression model

- A possible explanation for the discrepancy between the results of Dunlop et al. and ours is that adjustment for time-dependent confounding by prior functional performance is essential in this context, because post-baseline values of functional performance are independent predictors of both subsequent physical activity and functional performance within strata jointly defined by baseline covariates and prior physical activity
- To determine whether past history of functional performance is an independent risk factor for current functional performance, we fit a linear regression for functional performance at visit t that included functional performance and physical activity at visit t-1 and all baseline covariates V. The coefficient of functional performance at visit t-1 was 0.54 (p = 1.34×10⁻¹⁴⁹)
- To determine whether past history of functional performance independently predicts current physical activity, we fit a polytomous logistic regression for physical activity at visit t that included functional performance at visit t, physical activity at visit t-1, and all baseline covariates V. A joint test of the coefficients of functional performance at visit t yielded χ²(3) = 41.07 (p = 6.31×10⁻⁹)
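
As an arithmetic check, the reported p-value for χ²(3) = 41.07 can be reproduced from the closed-form upper tail of a chi-square distribution with 3 degrees of freedom. The 3 degrees of freedom presumably correspond to the three logit equations (j = 1, 2, 3) of the polytomous model, assuming functional performance enters each as a single term. A minimal sketch using only the Python standard library:

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom.

    Closed form for 3 df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


print(f"{chi2_sf_df3(41.07):.3g}")
```

For 41.07 this gives roughly 6.3×10⁻⁹, matching the reported p-value to the precision of the rounded test statistic.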

- Several systematic reviews of randomized trials have demonstrated that exercise therapy reduces knee pain and improves self-reported physical function in patients with knee OA
- Furthermore, beneficial effects of exercise on walking performance were shown in some randomized trials
- The discrepancy between the literature and our results may reflect the fact that the exercise programs used in these randomized trials are much more structured and of higher intensity than the physical activity usually undertaken by adults

- The results from both IPW models indicated that physical activity improves functional performance slightly in a dose-response manner in the elderly
- There was some evidence, based on the IPW model with weights adjusting for concurrent time-dependent confounders, that the beneficial effect of physical activity on functional performance increases with age
- These findings are consistent with studies that have shown that exercise improves function in patients with knee OA partly via strengthening the quadriceps muscles, and that the strength of the quadriceps declines with age

- There were no qualitative differences in the conclusions obtained from IPW and conventional models adjusting for time-dependent confounders
- It should be noted that marginal and conditional effects are not directly comparable, for the following reasons:
i) bias caused by conditioning on a variable affected by exposure
ii) non-collapsibility of certain effect measures
iii) misspecification of the conditional model when effect modification by the conditioning covariate(s) is ignored

- Our exploratory analyses revealed that none of the time-dependent confounders are strongly predicted by physical activity. Moreover, the effect measure of interest in our study is mean difference, which is a collapsible measure
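
Why collapsibility matters can be shown with a small numeric sketch using hypothetical risks. Exposure is independent of the stratifying covariate Z, so there is no confounding. The risk difference (the mean difference of a binary outcome) is identical within strata and marginally, whereas the odds ratio is 4 in both strata but about 3.45 marginally.

```python
def odds(p: float) -> float:
    return p / (1.0 - p)


# Hypothetical risks of a binary outcome in two equally sized strata of Z.
# Exposure is independent of Z, so Z is not a confounder.
strata = {
    "Z=0": {"unexposed": 0.2, "exposed": 0.5},
    "Z=1": {"unexposed": 0.5, "exposed": 0.8},
}

for z, r in strata.items():
    rd = r["exposed"] - r["unexposed"]
    odds_ratio = odds(r["exposed"]) / odds(r["unexposed"])
    print(f"{z}: risk difference = {rd:.2f}, odds ratio = {odds_ratio:.2f}")

# Marginal risks: average over the two equally sized strata.
marg_unexp = sum(r["unexposed"] for r in strata.values()) / len(strata)
marg_exp = sum(r["exposed"] for r in strata.values()) / len(strata)
marginal_rd = marg_exp - marg_unexp
marginal_or = odds(marg_exp) / odds(marg_unexp)
print(f"marginal: risk difference = {marginal_rd:.2f}, odds ratio = {marginal_or:.2f}")
```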

- Measuring physical activity using questionnaires could result in substantial error especially when questionnaires are administered relatively infrequently as was the case in the OAI
- Using quartiles instead of actual scores in the analysis may reduce power, induce a biased impression of dose–response, and change non-differential to differential error
- Residual confounding bias can arise from measurement error in self-reported confounders, e.g., knee pain, knee symptoms, and depression score
- The direction and magnitude of the bias due to measurement error in exposure and confounders cannot be predicted without knowledge of the error structure

- There may be a positive association between the errors in self-reported confounders and physical activity and this could increase residual confounding
- Infrequent measurements of the confounders could also lead to residual confounding
- Another limitation of our study is that we could not fit separate censoring models for different censoring mechanisms (e.g., knee surgery versus death), because the reasons for loss to follow-up were not publicly available

- Our results suggested that physical activity has either no effect or a very small effect on functional performance and does not affect knee pain in adults with knee OA
- Post-hoc analyses indicated that physical activity may slightly improve functional performance in the elderly with knee OA
- Overall, our results do not support a long-term beneficial effect of lifestyle physical activity on functional performance and knee pain in adults with knee OA
- Our conclusion does not preclude the effectiveness of short-term structured physical activity programs in patients with knee OA

- The methodological challenges for estimation of the causal effect of a time-dependent treatment on a repeated outcome from interval cohorts can be summarized as follows:
- Past outcome history should be considered as a strong potential time-dependent confounder and adjusted for in the analysis along with the other time-dependent confounders
- One must use causal methods to appropriately adjust for time-dependent confounders affected by exposure
- A limitation in interval cohorts is that the values of confounders at the times of treatment change are unknown, increasing the potential for residual confounding. Moreover, in many interval cohorts even the changes in the value of treatment cannot be ascertained. The best strategy for analysis of such cohorts is to perform alternative analyses using both concurrent and lagged values of time-dependent confounders

- Existence of counterfactuals and consistency
- No interference
- No unmeasured confounders for exposure and censoring histories
- Positivity (weaker form)
- Correct model specification

- For each possible exposure history, we assume that a subject's counterfactual outcome is well defined
- To link the observed data to ideal data, we make the consistency assumption: a subject’s counterfactual outcome at visit t under his/her observed exposure history is equal to his/her actual outcome at that visit, i.e.,
- No causation without manipulation
- Difference between choice and intervention

- The counterfactual outcomes for a participant do not depend on the exposure histories of other participants

- Also known as the assumption of conditional exchangeability or the assumption of sequential randomization
- For all exposure histories and t ≥ k, exposure at visit k and censoring during the period (visit k, visit k+1] are independent of all future counterfactual outcomes, conditional on the history of all measured prognostic factors, i.e.,

- Based on subject-matter knowledge and univariate associations, we included potential confounders in the exposure models but avoided including predictors of exposure that were not associated with the outcome
- The same variables were used for censoring models
- The inclusion of the variables that are unrelated to the exposure but related to the outcome in the exposure models accounts for chance confounding in the data and thus decreases the variance of the estimates across the realizations of a study without increasing bias
- In contrast, the inclusion of the variables that are related to the exposure but not to the outcome in the exposure models increases the variance of the estimates without decreasing bias. It can also lead to finite-sample bias

- Also known as the experimental treatment assignment assumption
- As we used V-stabilized weights in our analyses, a V-specific weaker form of the positivity assumption is required
- It states that, for each combination of values of the baseline covariates V, there should be a nonzero probability of every level of exposure A(k) for every combination of exposure and covariate histories that occurs among individuals with that combination of values of V, i.e.,

- Structural violations of the positivity assumption (i.e., subjects with certain values of confounders theoretically cannot receive a given level of the exposure) are unlikely in our study
- Weight stabilization with all baseline covariates decreases the chance of random violations of the positivity assumption (i.e., no subjects with certain values of confounders happen to receive a given level of the exposure)
- Structural and random violations of the positivity assumption can result in bias and/or increase in variance for the IPW estimators
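
A crude screen for random positivity violations, assuming the confounders have been discretized into strata, is to cross-tabulate person-visits by stratum and exposure level and list the empty cells. The strata and quartile labels below are hypothetical; with continuous confounders one would instead inspect the estimated exposure probabilities or the weights themselves.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple


def empty_exposure_cells(
    records: Iterable[Tuple[Hashable, str]], levels: List[str]
) -> Dict[Hashable, List[str]]:
    """For each observed covariate stratum, list the exposure levels with no person-visits.

    records: (stratum, exposure_level) pairs, one per person-visit.
    """
    counts = Counter(records)
    strata = {s for s, _ in counts}
    gaps = {}
    for s in strata:
        missing = [lvl for lvl in levels if counts[(s, lvl)] == 0]
        if missing:
            gaps[s] = missing  # nobody in this stratum received these levels
    return gaps


levels = ["Q1", "Q2", "Q3", "Q4"]  # hypothetical physical activity quartiles
data = [
    (("65+", "BMI>=30"), "Q1"), (("65+", "BMI>=30"), "Q2"),
    (("<65", "BMI<25"), "Q1"), (("<65", "BMI<25"), "Q2"),
    (("<65", "BMI<25"), "Q3"), (("<65", "BMI<25"), "Q4"),
]
print(empty_exposure_cells(data, levels))
```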

- A weaker form of the positivity assumption is also required for censoring
- It states that, for each combination of values of the baseline covariates V, there should be a nonzero probability of remaining uncensored during the period (visit k, visit k+1] for every combination of exposure and covariate histories that occurs among individuals with that combination of values of V, i.e.,

- We assume that our MSM and the models for exposure and censoring that were used in estimating the denominator of SWi(t) were correctly specified
- We informally explored the sensitivity of the bias and precision of the effect estimate to different specifications of the weight models (e.g., linear vs. non-linear terms for continuous confounders, adding lagged confounders, and so on), as described by Cole and Hernán
- More formal approaches evaluate models based on the prediction error and use cross-validation to avoid overfitting
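
A common diagnostic in this setting, discussed by Cole and Hernán, is to summarize the estimated stabilized weights at each visit: their mean should be close to 1, and a mean far from 1 or very large weights may signal misspecified weight models or near-violations of positivity. A minimal sketch with hypothetical weights:

```python
import statistics
from typing import Dict, List


def summarize_weights(weights_by_visit: Dict[int, List[float]]) -> Dict[int, Dict[str, float]]:
    """Mean, standard deviation, minimum and maximum of the stabilized weights at each visit."""
    summary = {}
    for visit, w in sorted(weights_by_visit.items()):
        summary[visit] = {
            "mean": statistics.mean(w),
            "sd": statistics.stdev(w) if len(w) > 1 else 0.0,
            "min": min(w),
            "max": max(w),
        }
    return summary


# Hypothetical stabilized weights at visits 1 and 2.
print(summarize_weights({1: [0.8, 1.1, 0.9, 1.2], 2: [0.5, 1.0, 1.4, 2.9]}))
```

Comparing these summaries across alternative weight-model specifications is one informal way to carry out the sensitivity exploration described above.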

Some cause happiness wherever they go; others whenever they go

Oscar Wilde