
Presentation Transcript


  1. Methods for Analyzing Data from Global Cohort Collaborations: Causal Frameworks and Efficient Estimators Maya Petersen, MD PhD works.bepress.com/maya_petersen Divisions of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley

  2. Global Cohort Collaborations • Unique opportunity to learn how to optimize HIV care delivery in practice • Real-world settings • Large samples • High-quality longitudinal data on many variables • We can be ambitious! • Analyses can and should directly target complex policy and clinical questions • Novel methods are needed to • Translate complex questions into statistical problems • Provide rigorous answers

  3. Causal Models/Counterfactuals • Tool for translating a research question into a statistical estimation problem • Target the analysis at the question you care about • Many questions can’t be translated into a coefficient in a regression model • Tool for ensuring that assumptions are • Explicit • Interpretable to those able to evaluate their plausibility

  4. Some examples of causal research questions that can be defined using counterfactuals ….

  5. Marginal Structural Models • Specify the relationship between an exposure and the expectation of a counterfactual outcome • Ex: How would mortality differ under immediate versus delayed switch to second-line therapy following immunologic failure? • Useful for questions about • Cumulative effects of longitudinal exposures/sequential decisions • Causal dose-response curves
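
To make the first bullet concrete, a minimal MSM for the switch-timing example might posit a parametric form for counterfactual mortality risk as a function of switch delay. The model below is purely illustrative, not from the talk:

```latex
% Illustrative MSM (assumed, not from the slides): counterfactual
% mortality risk as a function of switch delay a, e.g. months from
% immunologic failure to switch. Y_a denotes the counterfactual
% outcome under delay a; beta_0, beta_1 are the causal parameters.
\operatorname{logit}\, P(Y_a = 1) = \beta_0 + \beta_1 a
```

The causal dose-response curve is then traced out by varying a, rather than read off a single regression coefficient for the observed exposure.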

  6. Dynamic Regimes • Rules for assigning treatment in response to a subject’s observed past • Ex: How would availability of routine HIV RNA testing to guide switch (as compared to CD4 counts only) affect mortality? • Many key questions involve dynamic regimes • Good medicine and good policy require understanding how best to respond to new data • Helps ensure that our questions are realistic and supported by the data
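
A dynamic regime is simply a deterministic function of a subject's observed past. A hypothetical sketch of the two rules compared in the example, with illustrative (not talk-specified) thresholds:

```python
# Hypothetical sketch of a dynamic regime: a rule mapping a subject's
# observed past to a treatment decision. Thresholds are illustrative.
def switch_rule(cd4, rna=None):
    """Return True if the rule says to switch to second-line therapy."""
    if rna is not None:
        # Regime 1: routine HIV RNA testing available to guide switch.
        return rna > 1000        # e.g., confirmed virologic failure
    # Regime 2: CD4-guided rule only.
    return cd4 < 100             # e.g., immunologic failure
```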

  7. Direct and Indirect Effects • How much of an exposure’s effect is mediated by a specific pathway? • Ex: How does implementing a task-sharing program affect patient outcomes, and how much of this effect is mediated through individual enrollment in the program? • Useful for investigating • Why an intervention did (or didn’t) work • Unintended consequences/spillover effects
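
In counterfactual notation, the total effect decomposes into natural direct and indirect effects. Mapping the example onto exposure A (program implementation) and mediator M (individual enrollment) is an assumption for illustration:

```latex
% Standard decomposition of a total effect into natural direct and
% indirect effects for exposure A and mediator M. Y_{a,m} is the
% counterfactual outcome under exposure a and mediator value m;
% M_a is the counterfactual mediator under exposure a.
E[Y_{1,M_1}] - E[Y_{0,M_0}]
  = \underbrace{E[Y_{1,M_0}] - E[Y_{0,M_0}]}_{\text{natural direct effect}}
  + \underbrace{E[Y_{1,M_1}] - E[Y_{1,M_0}]}_{\text{natural indirect effect}}
```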

  8. Statistical methods to estimate these counterfactual quantities A growing toolbox…

  9. The statistical challenge • Novel statistical methods are needed to provide the best possible answers to these questions • Standard parametric regression not sufficient • Data are complex: Many variables measured at potentially informative intervals over long periods • Need estimators that are • Robust: Avoid introducing bias • Efficient: Maximize precision

  10. Inverse probability weighting • Estimate how exposure depends on the observed past • Use this estimate to reweight the data • Limitations • Requires good estimation of the weights • Subject to bias and high variance under strong confounding
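
A minimal point-treatment sketch of the two steps on this slide, assuming a pandas DataFrame with hypothetical columns A (binary exposure), Y (outcome), and baseline confounders W1, W2:

```python
# Minimal IPW sketch for a binary point exposure; all column names
# are hypothetical, and the longitudinal case multiplies weights
# over time points.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_estimate(df):
    W = df[["W1", "W2"]].values
    A = df["A"].values
    Y = df["Y"].values

    # Step 1: estimate how exposure depends on the observed past.
    g1 = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]

    # Step 2: stabilized weights reweight the data so that, in the
    # weighted sample, exposure is independent of measured confounders.
    pA1 = A.mean()
    sw = np.where(A == 1, pA1 / g1, (1 - pA1) / (1 - g1))

    # Weighted outcome means under exposure and non-exposure.
    mu1 = np.sum(sw * A * Y) / np.sum(sw * A)
    mu0 = np.sum(sw * (1 - A) * Y) / np.sum(sw * (1 - A))
    return mu1 - mu0
```

The "good job estimating your weights" caveat bites in Step 1: a misspecified exposure model, or propensities near 0 or 1 under strong confounding, yields biased or highly variable weighted estimates.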

  11. Parametric longitudinal G-formula • Estimate everything else about the data-generating process • Use these estimates to set up simulations • Limitations • You have to model essentially the whole data-generating process correctly • Inference can be tricky
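
For intuition, here is the single-time-point version (g-computation) with a binary outcome; the longitudinal g-formula iterates fitted covariate and outcome models over time. The formula and column names are hypothetical:

```python
# Minimal g-computation sketch (point-treatment special case of the
# g-formula). Assumes binary Y and hypothetical columns A, W1, W2.
import statsmodels.formula.api as smf

def gcomp_estimate(df):
    # Step 1: model the outcome part of the data-generating process.
    outcome_model = smf.logit("Y ~ A + W1 + W2", data=df).fit(disp=0)

    # Step 2: "simulate" by predicting every subject's outcome under
    # the static interventions set A=1 and set A=0, then averaging.
    mu1 = outcome_model.predict(df.assign(A=1)).mean()
    mu0 = outcome_model.predict(df.assign(A=0)).mean()
    return mu1 - mu0
```

The slide's limitation is visible here: everything rests on the outcome (and, longitudinally, covariate) models being correct, and standard errors typically require the bootstrap.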

  12. “Efficient, double-robust” methods • Minimize bias due to model misspecification • Maximize precision of effect estimates • Targeted Maximum Likelihood Estimation • New results for longitudinal effect estimation • Often reduces both bias and variance • Software coming soon…
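
To show how TMLE combines the outcome model of the g-formula with the exposure model of IPW, here is a bare-bones point-treatment TMLE for the average treatment effect with a binary outcome. The longitudinal results mentioned on the slide build on the same targeting step; every modeling choice below is illustrative:

```python
# Bare-bones point-treatment TMLE sketch for the ATE with binary Y.
# W: (n, p) confounder matrix; A, Y: binary arrays. Illustrative only.
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit
from sklearn.linear_model import LogisticRegression

def tmle_estimate(W, A, Y, eps=1e-6):
    # Initial outcome regressions Q(a, W) = E[Y | A=a, W].
    X = np.column_stack([A, W])
    Q_model = LogisticRegression().fit(X, Y)
    Q1 = Q_model.predict_proba(np.column_stack([np.ones_like(A), W]))[:, 1]
    Q0 = Q_model.predict_proba(np.column_stack([np.zeros_like(A), W]))[:, 1]
    QA = np.where(A == 1, Q1, Q0)

    # Propensity score g(W) = P(A=1 | W), bounded away from 0 and 1.
    g = np.clip(LogisticRegression().fit(W, A).predict_proba(W)[:, 1],
                eps, 1 - eps)

    # Targeting step: fluctuate the initial fit along the "clever
    # covariate" so the update solves the efficient score equation.
    H1, H0 = 1.0 / g, -1.0 / (1.0 - g)
    HA = np.where(A == 1, H1, H0)
    offset = logit(np.clip(QA, eps, 1 - eps))
    fluct = sm.GLM(Y, HA, offset=offset,
                   family=sm.families.Binomial()).fit()
    eps_hat = fluct.params[0]

    # Updated counterfactual predictions and the targeted ATE.
    Q1s = expit(logit(np.clip(Q1, eps, 1 - eps)) + eps_hat * H1)
    Q0s = expit(logit(np.clip(Q0, eps, 1 - eps)) + eps_hat * H0)
    return np.mean(Q1s - Q0s)
```

The double robustness comes from using both nuisance fits: the estimate is consistent if either the outcome regression or the propensity score is estimated well.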

  13. Data-adaptive estimation • Which variables to adjust for, and how? • Misspecified parametric model -> bias • If the model is built ad hoc, it is susceptible to “evaluation pressure” • Need formal tools that can handle these settings • A priori specified algorithms for learning from data • Ex: Super Learner • Library of data-adaptive algorithms • Internal cross-validation to choose how to combine them
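
A compact sketch of the Super Learner recipe from the last two bullets: cross-validate a pre-specified library of learners and combine them with convex weights chosen to minimize cross-validated risk. The three-learner library is hypothetical:

```python
# Minimal Super Learner sketch for regression, with a hypothetical
# three-learner library and NNLS-based convex combination weights.
import numpy as np
from scipy.optimize import nnls
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

def super_learner(X, y, cv=10):
    library = [LinearRegression(),
               RandomForestRegressor(n_estimators=200),
               KNeighborsRegressor()]

    # Internal cross-validation: out-of-fold predictions per learner.
    Z = np.column_stack([cross_val_predict(m, X, y, cv=cv)
                         for m in library])

    # Non-negative least squares on the out-of-fold predictions gives
    # the weights; normalizing yields a convex combination.
    w, _ = nnls(Z, y)
    w = w / w.sum()

    # Refit each learner on the full data for prediction.
    fitted = [m.fit(X, y) for m in library]
    return lambda Xnew: np.column_stack(
        [m.predict(Xnew) for m in fitted]) @ w
```

Because the library and the combination rule are fixed a priori, model selection is part of the pre-specified algorithm rather than an ad hoc, evaluation-pressured choice.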

  14. Current Methods Research • Hierarchical data • Interventions at the clinic and individual levels • Interference/Spillover • An individual’s outcome is affected by the exposures of other individuals • Estimation and inference with small sample sizes • For many implementation questions, the clinic rather than the individual may be the independent sampling unit

  15. Acknowledgements • Collaborations with • IeDEA-SA and IeDEA-EA • Mark van der Laan, UC Berkeley • Elvin Geng, UCSF • This talk is based on a huge body of research • Theoretical work by Robins, Pearl, van der Laan, Hernan, and others • Applied work by many
