Targeted Maximum Likelihood Super Learning Application

Targeted Maximum Likelihood Super Learning Application to Assess Effects in RCT, Observational Studies, and Genomics. Mark van der Laan, works.bepress.com/mark_van_der_laan, Division of Biostatistics, University of California, Berkeley. Workshop Brad Efron, December 2009

Outline • Super Learning and Targeted Maximum Likelihood Learning • Causal effect in observational studies • Causal effect in RCTs • Variable importance analysis in genomics • Multiple testing • Case-control data

Motivation • Avoid reliance on human art and parametric models • Adapt the model fit to the data • Target the fit to the parameter of interest • Statistical inference • TMLE/SL: Targeted Maximum Likelihood coupled with Super Learner methodology

TMLE/SL Toolbox • Targeted effects • Effect of static or dynamic multiple time point treatments (e.g., on survival time) • Direct and indirect effects • Variable importance analysis in genomics • Types of data • Point treatment • Longitudinal/repeated measures • Censoring/missingness/time-dependent confounding • Case-control • Randomized clinical trials and observational data

Two-stage Methodology: SL/TMLE • 1. Super Learning • Works on a library of model fits • Builds a data-adaptive composite model by assigning weights • Weights are optimized by loss-function-specific cross-validation to guarantee the best overall fit • 2. Targeted Maximum Likelihood Estimation • Zooms in on one aspect of the model fit: the target • Removes bias for the target

Loss-Based Super Learning in Semiparametric Models • Allows one to combine many data-adaptive (e.g.) MLEs into one improved MLE • Grounded by oracle results for loss-function-based cross-validation (vdL&D, 2003); the loss function needs to be bounded • Performs asymptotically as well as the best (oracle) weighted combination, or achieves the parametric rate of convergence
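The weighting step above can be illustrated with a small sketch. This is not the SuperLearner software: the three candidate learners (`fit_mean`, linear and quadratic polynomial fits), the 5-fold split, and the grid search over convex weights are all illustrative assumptions. In practice the weights are usually found by a constrained least-squares solver rather than a grid.

```python
import itertools
import numpy as np

# Hypothetical candidate learners: each takes (x, y) and returns a prediction function.
def fit_mean(x, y):
    m = y.mean()
    return lambda x_new: np.full(len(x_new), m)

def make_poly_learner(degree):
    def fit(x, y):
        coef = np.polyfit(x, y, degree)
        return lambda x_new: np.polyval(coef, x_new)
    return fit

def cv_predictions(learners, x, y, V=5, seed=0):
    """Column k holds learner k's out-of-fold (V-fold cross-validated) predictions."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % V
    Z = np.empty((n, len(learners)))
    for v in range(V):
        train, valid = folds != v, folds == v
        for k, learner in enumerate(learners):
            Z[valid, k] = learner(x[train], y[train])(x[valid])
    return Z

def convex_weights(Z, y, step=0.05):
    """Weights on the simplex minimizing cross-validated squared-error risk (grid search)."""
    K = Z.shape[1]
    grid = np.arange(0.0, 1.0 + step / 2, step)
    best_w, best_risk = None, np.inf
    for head in itertools.product(grid, repeat=K - 1):
        last = 1.0 - sum(head)
        if last < -1e-9:
            continue  # weights would leave the simplex
        w = np.append(head, max(last, 0.0))
        risk = np.mean((y - Z @ w) ** 2)
        if risk < best_risk:
            best_w, best_risk = w, risk
    return best_w

def super_learner(learners, x, y, V=5):
    w = convex_weights(cv_predictions(learners, x, y, V), y)
    fits = [learner(x, y) for learner in learners]  # refit every learner on all data
    def predict(x_new):
        return sum(wk * f(x_new) for wk, f in zip(w, fits))
    return predict, w

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 500)
y = x**2 + rng.normal(scale=0.5, size=500)
learners = [fit_mean, make_poly_learner(1), make_poly_learner(2)]
predict, w = super_learner(learners, x, y)
print(w)  # most of the weight is expected to land on the quadratic learner
```

Because the weights are chosen by cross-validated risk, a misspecified learner only receives weight if it actually lowers out-of-sample error.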

Super Learner Flow Chart in Prediction

Super Learner Prediction

Targeted Maximum Likelihood Estimation Flow Chart [figure: inputs are the model (a set of possible probability distributions of the data), the user dataset of observations O(1), O(2), …, O(n) drawn from the true probability distribution P_TRUE, and the target feature map Ψ(·); the initial P-estimator P̂ of the probability distribution of the data is updated to a targeted P-estimator P̂*, and better estimates of the target feature are closer to the true value Ψ(P_TRUE), so Ψ(P̂*) improves on the initial feature estimator Ψ(P̂)]

Targeted Maximum Likelihood • MLE/SL aims to do a good job of estimating the whole density • Targeted MLE aims to do a good job at the parameter of interest • General decrease in bias for the parameter of interest • Fewer false positives • Honest p-values, inference, multiple testing

(Iterative) Targeted MLE • Identify the optimal strategy for “stretching” the initial P̂ • Small “stretch” -> maximum change in target • Given the strategy, identify the optimal amount of stretch by MLE • Apply the optimal stretch to P̂ using the optimal stretching function -> 1st-step targeted maximum likelihood estimator • Repeat until the incremental “stretch” is zero • Some important cases: 1 step to convergence • The final probability distribution solves the efficient influence curve equation • Hence (iterative) T-MLE is double robust & locally efficient
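The “stretch” loop above can be sketched for a binary outcome, assuming a logistic fluctuation logit P_eps = logit P + eps * h, where h (the clever covariate) may depend on the current fit. The names `fit_epsilon` and `iterative_tmle` are hypothetical. When h does not depend on the fit, the first MLE of eps already solves the score equation, so the next eps is numerically zero, which is the “1 step to convergence” case.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_epsilon(offset, h, y, n_newton=50):
    """MLE of eps in the logistic submodel logit P_eps = offset + eps * h (Newton-Raphson)."""
    eps = 0.0
    for _ in range(n_newton):
        p = expit(offset + eps * h)
        eps += np.sum(h * (y - p)) / np.sum(h * h * p * (1 - p))
    return eps

def iterative_tmle(offset, clever_covariate, y, tol=1e-8, max_iter=100):
    """Recompute the clever covariate from the current fit, fit eps by MLE, apply the
    stretch, and stop once the incremental stretch eps is numerically zero."""
    logit_fit = offset.copy()
    n_updates = 0
    for _ in range(max_iter):
        h = clever_covariate(expit(logit_fit))
        eps = fit_epsilon(logit_fit, h, y)
        if abs(eps) < tol:
            break
        logit_fit = logit_fit + eps * h
        n_updates += 1
    return logit_fit, n_updates

rng = np.random.default_rng(0)
n = 1000
h_fixed = rng.normal(size=n)
y = rng.binomial(1, 0.5, size=n)
offset = np.zeros(n)  # crude initial fit: P = 0.5 for everyone
fit, n_updates = iterative_tmle(offset, lambda p: h_fixed, y)
print(n_updates)  # one update suffices when h does not depend on the fit
```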

Example: Targeted MLE of Causal effect of point treatment on outcome Impact of Treatment on Disease

Likelihood of Point Treatment with Single Endpoint Outcome • Draw baseline characteristics • Draw treatment • Draw missing indicator • If not missing, draw outcome • Counterfactual outcome distributions defined by intervening on treatment and enforcing no missingness • Causal effects defined as user supplied function of these counterfactual distributions
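The factorization above can be mirrored by a simulation that draws each component in order; the specific probabilities below are hypothetical, chosen only so that the true additive effect is 0.2. The counterfactual draw intervenes by setting the treatment and forcing the outcome to be observed.

```python
import numpy as np

rng = np.random.default_rng(7)

def draw_observed(n):
    """Draw O = (W, A, Delta, Y) in the order of the likelihood factorization."""
    W = rng.binomial(1, 0.5, n)                      # baseline characteristics
    A = rng.binomial(1, 0.2 + 0.6 * W)               # treatment, given W
    Delta = rng.binomial(1, 0.7 + 0.2 * A)           # 1 = outcome observed, given A, W
    Y = rng.binomial(1, 0.2 + 0.2 * A + 0.4 * W).astype(float)
    Y[Delta == 0] = np.nan                           # outcome drawn only if not missing
    return W, A, Delta, Y

def draw_counterfactual_pair(n):
    """Intervene: set A = 1 or A = 0 and enforce Delta = 1 (no missingness)."""
    W = rng.binomial(1, 0.5, n)
    Y1 = rng.binomial(1, 0.2 + 0.2 * 1 + 0.4 * W)
    Y0 = rng.binomial(1, 0.2 + 0.2 * 0 + 0.4 * W)
    return Y1, Y0

Y1, Y0 = draw_counterfactual_pair(400_000)
ate = Y1.mean() - Y0.mean()  # Monte Carlo approximation of E[Y(1) - Y(0)] = 0.2
print(round(ate, 3))
```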

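TMLE for Average Causal Effect • Observe predictors W, treatment A, missingness indicator Delta, and outcome Y (observed only when Delta = 1) • Target is the additive causal effect $E[Y(1) - Y(0)]$ • Regress Y on treatment A and W among observations with Delta = 1 (e.g., by Super Learning), and add the clever covariate $H(A,W) = \left[\frac{I(A=1)}{g(1 \mid W)} - \frac{I(A=0)}{g(0 \mid W)}\right] \frac{1}{P(\Delta = 1 \mid A, W)}$, where $g(a \mid W)$ is the treatment mechanism • Then average the updated regression over W for fixed treatment a: $E_n Y_a$ • Evaluate the average effect: $E_n Y_1 - E_n Y_0$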
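A compact sketch of these steps, under simplifying assumptions: W is discrete so that the treatment and missingness mechanisms can be estimated by stratum means, and the initial outcome regression deliberately ignores W (a stand-in for a poor first-stage fit) to show that the targeting step removes the resulting confounding bias when g and the missingness mechanism are correct. The function names and the simulated data are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p / (1.0 - p))

def fit_epsilon(offset, h, y, n_newton=50):
    """MLE of eps in the logistic fluctuation logit Q_eps = offset + eps * h."""
    eps = 0.0
    for _ in range(n_newton):
        p = expit(offset + eps * h)
        eps += np.sum(h * (y - p)) / np.sum(h * h * p * (1 - p))
    return eps

def tmle_ate(W, A, Delta, Y):
    """TMLE of E[Y(1) - Y(0)] with discrete W, binary A and Y, and missingness Delta."""
    n, obs = len(W), Delta == 1
    # Crude initial regression E(Y | A, Delta = 1) that ignores W (misspecified on purpose)
    q = {a: Y[obs & (A == a)].mean() for a in (0, 1)}
    Q1, Q0 = np.full(n, q[1]), np.full(n, q[0])
    QA = np.where(A == 1, Q1, Q0)
    # g(1 | W) and P(Delta = 1 | A = a, W), estimated within strata of W
    g1 = np.empty(n)
    pD = {0: np.empty(n), 1: np.empty(n)}
    for w in np.unique(W):
        in_w = W == w
        g1[in_w] = A[in_w].mean()
        for a in (0, 1):
            pD[a][in_w] = Delta[in_w & (A == a)].mean()
    # Clever covariate evaluated at A = 1, A = 0 and at the observed A
    H1 = 1.0 / (g1 * pD[1])
    H0 = -1.0 / ((1.0 - g1) * pD[0])
    HA = np.where(A == 1, H1, H0)
    # Targeting step on complete cases, then update both counterfactual predictions
    eps = fit_epsilon(logit(QA[obs]), HA[obs], Y[obs])
    Q1s, Q0s = expit(logit(Q1) + eps * H1), expit(logit(Q0) + eps * H0)
    QAs = np.where(A == 1, Q1s, Q0s)
    psi = np.mean(Q1s - Q0s)
    # Efficient influence curve -> standard error
    resid = np.where(obs, Y - QAs, 0.0)
    ic = HA * resid + Q1s - Q0s - psi
    return psi, ic.std(ddof=1) / np.sqrt(n)

rng = np.random.default_rng(2009)
n = 50_000
W = rng.binomial(1, 0.5, n)
A = rng.binomial(1, 0.2 + 0.6 * W)
Delta = rng.binomial(1, 0.7 + 0.2 * A)
Y = rng.binomial(1, 0.2 + 0.2 * A + 0.4 * W).astype(float)
Y[Delta == 0] = np.nan
psi, se = tmle_ate(W, A, Delta, Y)
naive = Y[(Delta == 1) & (A == 1)].mean() - Y[(Delta == 1) & (A == 0)].mean()
print(f"TMLE {psi:.3f} (SE {se:.3f}), naive {naive:.3f}, truth 0.2")
```

The naive contrast is confounded by W (about 0.44 here), while the targeted estimate should be close to 0.2.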

TMLE is Collaborative Double Robust • Suppose the initial fit minus the true outcome regression depends on W only through S • Suppose the treatment mechanism adjusts correctly for a set of variables that includes S • Then the Targeted MLE is consistent • Thus the treatment mechanism only needs to adjust for covariates whose effect has not yet been captured by the initial fit • Formally,

TMLE/SL: more accurate information from less data Simulated Safety Analysis of Epogen (Amgen)

Example: Targeted MLE in RCT Impact of Treatment on Disease

The Gain in Relative Efficiency in an RCT is a function of the Gain in R^2 relative to the unadjusted estimator • We observe (W, A, Y) on each unit • A is randomized, P(A=1) = 0.5 • Suppose the target parameter is the additive causal effect $E[Y(1) - Y(0)]$ • The relative efficiency of the unadjusted estimator and a targeted MLE equals 1 minus the R-square of the regression $0.5\,Q(1,W) + 0.5\,Q(0,W)$, where Q(A,W) is the regression of Y on A and W obtained with targeted MLE
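One way to read this statement as a sample-size saving, assuming the relative efficiency is the ratio of asymptotic variances (targeted over unadjusted) and that variance scales as 1/n; this is a back-of-the-envelope translation, not a result stated on the slide:

$$
\frac{\operatorname{Var}(\hat\psi_{\mathrm{TMLE}})}{\operatorname{Var}(\hat\psi_{\mathrm{unadj}})} = 1 - R^2
\quad\Longrightarrow\quad
n_{\mathrm{TMLE}} \approx (1 - R^2)\, n_{\mathrm{unadj}} .
$$

For example, with $R^2 = 0.3$, a targeted MLE on about 700 subjects would match the precision of an unadjusted analysis on 1000.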

TMLE in Actual Phase IV RCT • Study: RCT that aims to evaluate safety based on mortality due to drug-drug interaction among patients with severe disease • Data obtained by random sampling from the original, real FDA RCT dataset • Goal: estimate the risk difference (RD) in survival at 28 days (0/1 outcome) between the treated and placebo groups

TMLE in Phase IV RCT • TMLE adjusts for a small amount of empirical confounding (imbalance in the AGE covariate) • TMLE exploits the covariate information to gain efficiency, and thus power, over the unadjusted estimator • The TMLE result is significant at the 0.05 level

TMLE in RCT: Summary • The TMLE approach handles censoring and improves efficiency over standard approaches • Measure strong predictors of outcome • Implications • Unbiased estimates with informative censoring • Improved power for clinical trials • Smaller sample sizes needed • Possible to employ earlier stopping rules • Less need for homogeneity in sample • More representative sampling • Expanded opportunities for subgroup analyses

Targeted MLE Analysis of Genomic Data: Biomarker discovery, impact of mutations on disease, or response to treatment

The Need for Experimentation • Estimation of variable importance/causal effect requires an assumption not needed for prediction • “Experimental Treatment Assignment” (ETA) • There must be some variation in the treatment variable A within every stratum of confounders W • W must not perfectly predict/determine A • $g(a \mid W) > 0$ for all (a, W)
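A minimal empirical check of the ETA assumption, assuming a discrete confounder W: flag strata where the estimated g(1 | w) is at or near 0 or 1. The function name `eta_check` and the 0.05 cutoff are illustrative choices, not part of the original analysis.

```python
import numpy as np

def eta_check(A, W, min_prob=0.05):
    """Return {stratum: g(1|w)} for strata of a discrete W whose empirical treatment
    probability is below min_prob or above 1 - min_prob (little or no variation in A)."""
    A, W = np.asarray(A), np.asarray(W)
    flagged = {}
    for w in np.unique(W):
        g1 = float(A[W == w].mean())
        if g1 < min_prob or g1 > 1 - min_prob:
            flagged[w.item()] = g1
    return flagged

# Stratum 0 is always treated, so W perfectly determines A there.
print(eta_check([1, 1, 1, 0, 1, 0], [0, 0, 0, 1, 1, 1]))
```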

Biomarker Discovery: HIV Resistance Mutations • Goal: Rank a set of genetic mutations based on their importance for determining an outcome • Mutations (A) in the HIV protease enzyme • Measured by sequencing • Outcome (Y) = change in viral load 12 weeks after starting new regimen containing saquinavir • How important is each mutation for viral resistance to this specific protease inhibitor drug? • Inform genotypic scoring systems

Stanford Drug Resistance Database • All Treatment Change Episodes (TCEs) in the Stanford Drug Resistance Database • Patients drawn from 16 clinics in Northern CA • 333 patients on saquinavir regimen [figure: TCE timeline: baseline viral load, change in regimen (TCE: change of ≥ 1 drug), final viral load at 12 weeks (< 24 weeks)] Table 2: LPV

Parameter of Interest • Need to control for a range of other covariates W • Include: past treatment history, baseline clinical characteristics, non-protease mutations, other drugs in regimen • Parameter of interest: variable importance $\psi_j = E[E(Y \mid A_j = 1, W) - E(Y \mid A_j = 0, W)]$ • For each protease mutation (indexed by j)
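The parameter can be computed mutation by mutation with a simple substitution (G-computation) estimator. This sketch assumes a discrete W and estimates E(Y | A_j, W) by stratum means; it is not the targeted MLE itself, and `variable_importance` and the simulated data are hypothetical. An empty cell (a stratum where A_j never varies) is exactly an ETA violation and yields NaN.

```python
import numpy as np

def variable_importance(Y, A, W):
    """psi_j = E[E(Y | A_j=1, W) - E(Y | A_j=0, W)] for each column j of the 0/1 matrix A,
    with E(Y | A_j, W) estimated by means within strata of a discrete W."""
    n, J = A.shape
    psi = np.empty(J)
    for j in range(J):
        diff = np.empty(n)
        for w in np.unique(W):
            in_w = W == w
            m1 = Y[in_w & (A[:, j] == 1)].mean()
            m0 = Y[in_w & (A[:, j] == 0)].mean()
            diff[in_w] = m1 - m0
        psi[j] = diff.mean()  # average over the empirical distribution of W
    return psi

rng = np.random.default_rng(5)
n = 4000
W = rng.integers(0, 3, n)
A = np.column_stack([rng.binomial(1, 0.3 + 0.2 * (W == 2)),  # mutation 0: true effect 1.0
                     rng.binomial(1, 0.5, n)])              # mutation 1: no effect
Y = 1.0 * A[:, 0] + 0.5 * W + rng.normal(size=n)
print(variable_importance(Y, A, W))
```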

Parameter of Interest • Assuming no unmeasured confounders (W sufficient to control for confounding) • The causal effect equals the W-adjusted variable importance: $E(Y_1) - E(Y_0) = E[E(Y \mid A=1, W) - E(Y \mid A=0, W)] = \psi$ • The same advantages of T-MLE apply

Targeted Maximum Likelihood Estimation of the Adjusted Effect of HIV Mutation on Resistance to Lopinavir Stanford mutation score, http://hivdb.stanford.edu, accessed September, 1997

Multiple Testing: Combining Targeted MLE with Type-I Error Control

Hypothesis Testing Ingredients • Data (X1,…,Xn) • Hypotheses • Test Statistics • Type I Error • Null Distribution • Marginal (p-values) or • Joint distribution of the test statistics • Rejection Region • Adjusted p-values

Type I Error Rates • FWER: control the probability of at least one Type I error, $P(V_n > 0) \le \alpha$ • gFWER: control the probability of more than k Type I errors, $P(V_n > k) \le \alpha$ • TPPFP: control the probability that the proportion of Type I errors ($V_n$) among total rejections ($R_n$) exceeds a user-defined level q, $P(V_n/R_n > q) \le \alpha$ • FDR: control the expected proportion of Type I errors among total rejections, $E(V_n/R_n) \le \alpha$
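The four error rates can be estimated by simulation from per-replicate counts of false rejections $V_n$ and total rejections $R_n$. The sketch below assumes the usual convention $V_n/R_n = 0$ when $R_n = 0$; the function name `error_rates` is hypothetical.

```python
import numpy as np

def error_rates(V, R, k=1, q=0.1):
    """Monte Carlo estimates of FWER, gFWER(k), TPPFP(q) and FDR.
    V: false rejections per replicate; R: total rejections per replicate."""
    V = np.asarray(V, dtype=float)
    R = np.asarray(R, dtype=float)
    prop = np.divide(V, R, out=np.zeros_like(V), where=R > 0)  # V/R, set to 0 if R = 0
    return {
        "FWER": np.mean(V > 0),
        "gFWER": np.mean(V > k),
        "TPPFP": np.mean(prop > q),
        "FDR": np.mean(prop),
    }

print(error_rates(V=[0, 1, 2, 0], R=[3, 2, 4, 0], k=1, q=0.1))
```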

Multivariate Normal Null Distribution • Suppose the null hypotheses test target parameters, $H_0: \psi(j) \le 0$ • We estimate the target parameters with T-MLE and use t-statistics for testing • The T-MLE, as a vector, is asymptotically linear with known influence curve IC • A valid joint null distribution for multiple testing is $N(0, \Sigma)$ with $\Sigma = E[IC\, IC^\top]$ • This null distribution can be used as input to any multiple testing procedure (MTP) (Dudoit, vdL, 2009, Springer)
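As one example of plugging this null distribution into an MTP, the sketch below computes single-step maxT adjusted p-values for one-sided nulls: the t-statistics are standardized, so the null draws use the correlation matrix of the estimated influence curves. The choice of maxT, the function name, and the simulated sample-mean example (whose influence curve is X minus its mean) are assumptions for illustration.

```python
import numpy as np

def maxT_adjusted_pvalues(psi_hat, ic, n_draws=100_000, seed=0):
    """Single-step maxT adjusted p-values for H0j: psi_j <= 0.
    psi_hat: (J,) estimates; ic: (n, J) estimated influence-curve values, J >= 2."""
    n, J = ic.shape
    se = ic.std(axis=0, ddof=1) / np.sqrt(n)
    t_stat = psi_hat / se
    corr = np.corrcoef(ic, rowvar=False)  # null covariance of the standardized statistics
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(J), corr, size=n_draws)
    max_z = z.max(axis=1)
    p_adj = np.array([np.mean(max_z >= t) for t in t_stat])
    return t_stat, p_adj

rng = np.random.default_rng(3)
n = 500
common = rng.normal(size=(n, 1))
X = 0.6 * common + 0.8 * rng.normal(size=(n, 3))  # correlated columns, unit variance
X[:, 0] += 0.5                                    # only the first parameter is > 0
psi_hat = X.mean(axis=0)
ic = X - psi_hat                                  # influence curve of the sample mean
t_stat, p_adj = maxT_adjusted_pvalues(psi_hat, ic)
print(np.round(t_stat, 2), p_adj)
```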

GENERAL JOINT NULL DISTRIBUTION Let $Q_{0j}$ be a marginal null distribution such that, for j in the set $S_0$ of true nulls, $Q_{0j}^{-1} Q_{nj}(x) \ge x$ for all x, where $Q_{nj}$ is the j-th marginal distribution of the true distribution $Q_n(P)$ of the test statistic vector $T_n$.

JOINT NULL DISTRIBUTION We propose as null distribution the distribution $Q_{0n}$ of $T_n^*(j) = Q_{0j}^{-1} Q_{nj}(T_n(j))$, $j = 1, \ldots, J$. This joint null distribution $Q_{0n}(P)$ does indeed satisfy the desired multivariate asymptotic domination condition in (Dudoit, van der Laan, Pollard, 2004).

BOOTSTRAP BASED JOINT NULL DISTRIBUTION We estimate this null distribution $Q_{0n}(P)$ with the bootstrap analogue $T_n^{*\#}(j) = Q_{0j}^{-1} Q_{nj}^{\#}(T_n^{\#}(j))$, where # denotes the analogue based on a bootstrap sample $O_1^{\#}, \ldots, O_n^{\#}$ from an approximation $P_n$ of the true distribution P.
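A sketch of the quantile transformation applied to a matrix of bootstrap test statistics, assuming each marginal null $Q_{0j}$ is standard normal and estimating $Q_{nj}^{\#}$ by the empirical CDF across bootstrap draws. Using rank/(B+1) instead of rank/B is an assumption made to keep the values strictly inside (0, 1); the function name is hypothetical. The transformation keeps the dependence structure (ranks) of the bootstrap draws and replaces each marginal by $Q_{0j}$.

```python
import numpy as np
from statistics import NormalDist

def quantile_transformed_null(boot_stats):
    """boot_stats: (B, J) test statistics from B bootstrap samples.
    Returns draws Q0j^{-1}(Qnj#(T#(j))) with Q0j = N(0, 1) for every j."""
    B, J = boot_stats.shape
    ranks = boot_stats.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..B within each column
    u = ranks / (B + 1)                                     # empirical CDF values in (0, 1)
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(u)

rng = np.random.default_rng(0)
boot = rng.exponential(size=(999, 4))  # skewed bootstrap statistics
null_draws = quantile_transformed_null(boot)
print(null_draws.mean(axis=0).round(6), null_draws.std(axis=0).round(3))
```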

Case-Control Weighted Targeted MLE • Case-control weighting in targeted MLE successfully maps an estimation method designed for prospective sampling into a method for case-control sampling. • This technique relies on knowledge of the true prevalence probability P(Y=1)=q0 to eliminate the bias of the case-control sampling design. • The procedure is double robust and locally efficient. It produces efficient estimators when its prospective sample counterpart is efficient.
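The weighting idea can be sketched as follows, assuming the commonly described scheme in which each case gets weight q0 and each control gets (1 - q0)/J, with J the number of controls per case. The sketch applies the weights only to a crude risk P(Y = 1 | A = a), not to a full targeted MLE; the population, sample sizes, and function names are hypothetical.

```python
import numpy as np

def case_control_weights(Y, q0):
    """Weight q0 for each case and (1 - q0) / J for each control, J = controls per case."""
    Y = np.asarray(Y)
    J = np.sum(Y == 0) / np.sum(Y == 1)
    return np.where(Y == 1, q0, (1 - q0) / J)

def weighted_risk(Y, A, w, a):
    """Case-control weighted estimate of P(Y = 1 | A = a) in the source population."""
    sel = A == a
    return np.sum(w[sel] * Y[sel]) / np.sum(w[sel])

rng = np.random.default_rng(11)
N = 120_000
A_pop = rng.binomial(1, 0.3, N)
Y_pop = rng.binomial(1, 0.02 + 0.04 * A_pop)
q0 = Y_pop.mean()  # prevalence, assumed known
cases = rng.choice(np.flatnonzero(Y_pop == 1), 500, replace=False)
controls = rng.choice(np.flatnonzero(Y_pop == 0), 1000, replace=False)
idx = np.concatenate([cases, controls])
Y, A = Y_pop[idx], A_pop[idx]
w = case_control_weights(Y, q0)
print(weighted_risk(Y, A, w, 1), Y_pop[A_pop == 1].mean())  # should be close
```

Without the weights, the sample risk among the treated would be far above the population risk, because cases are heavily oversampled.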

Comparison to Existing Methodology Case-control weighted targeted MLE differs from other approaches as it can estimate any type of parameter, incorporates q0, and is double robust and locally efficient.

Case-Control Weighted Targeted MLE Simulation Results • We showed striking improvements in efficiency and bias in our case-control weighted method versus the IPTW estimator (Mansson 2007, Robins 1999), which does not utilize q0. • Our complete simulation results bolster our theoretical arguments that gains in efficiency and reductions in bias can be obtained by having known q0 and using a targeted estimator. Table results for a sample of 500 cases and 1000 controls taken from a population of 120,000 where q0 = 0.035

Closing Remarks • True knowledge is embodied by semi- or nonparametric models • Semiparametric models require fully automated, state-of-the-art machine learning (super learning) • Targeted bias removal is essential and is achieved by targeted MLE • Statistical inference is now sensible • The machine learning algorithms are (super) efficient for the target parameters

Closing Remarks • (RC) Clinical Trials and Observational Studies can be analyzed with TMLE. • TMLE outperforms current standards in analysis of clinical trials and observational studies, including double robust methods • It is the only targeted method that is collaborative double robust, efficient, and naturally incorporates machine learning

UC Berkeley Oliver Bembom Susan Gruber Kelly Moore Maya Petersen Dan Rubin Cathy Tuglus Sherri Rose Michael Rosenblum Eric Polley P.I. Ira Tager (Epi). Stanford Univ. Robert Shafer Kaiser: Dr. Jeffrey Fessels.… FDA: Thamban Valappil, Greg Soon, Harvard: David Bangsberg, Victor DeGruttolas Acknowledgements


Collaborative T-MLE: Building the Propensity Score Based on Outcome Data • Initial outcome regression based on super learning • Construct a rich set of one-dimensional dimension reductions of W, to be used as main terms below • Select main terms in the propensity score using forward selection based on the empirical fit (e.g., log-likelihood) of the T-MLE • If no main term increases the empirical fit of the TMLE, carry out a T-MLE update of the initial outcome regression • Proceed to generate a sequence of T-MLEs using increasingly nonparametric treatment mechanisms • Select the final T-MLE with cross-validation

The Likelihood for Right Censored Survival Data • It starts with the marginal probability distribution of the baseline covariates • Then follows the treatment mechanism • Then it follows with a product over time points t • At each time point t, one writes down the likelihood of censoring at time t and of death at time t, and it stops at the first event • Counterfactual survival distributions are obtained by intervening on treatment and censoring • This then defines the causal effects of interest as parameters of the likelihood
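In discrete time, the product over time points is usually evaluated on “person-time” records, one per subject per time point until the first event. The sketch below builds those records; the ordering convention (censoring at t is recorded before death at t, so a censored subject contributes no death indicator at that time) follows the slide's listing, and `to_person_time` is a hypothetical helper.

```python
def to_person_time(T, delta):
    """Expand discrete-time right-censored data into per-time-point records.
    T[i]: last time point (1, 2, ...) at which subject i is seen.
    delta[i]: 1 if death was observed at T[i], 0 if censored at T[i].
    Each record is (id, t, censored_at_t, died_at_t); died_at_t is None once censored."""
    rows = []
    for i, (t_last, d) in enumerate(zip(T, delta)):
        for t in range(1, t_last + 1):
            if t < t_last:
                rows.append((i, t, 0, 0))     # neither censored nor dead at t
            elif d == 1:
                rows.append((i, t, 0, 1))     # not censored, died at t
            else:
                rows.append((i, t, 1, None))  # censored at t: death not observed
    return rows

for row in to_person_time([2, 1], [1, 0]):
    print(row)
```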

TMLE with Survival Outcome • Suppose one observes baseline covariates and treatment, and follows each subject until the end of follow-up or death • One wishes to estimate the causal effect of treatment A on survival T • Targeted MLE uses covariate information to adjust for confounding and informative dropout, and to gain efficiency

TMLE with Survival Outcome • Target $\psi_1(t_0) = \Pr(T_1 > t_0)$ and $\psi_0(t_0) = \Pr(T_0 > t_0)$, and thereby the treatment effect, e.g., 1) difference: $\Pr(T_1 > t_0) - \Pr(T_0 > t_0)$, 2) log RH • Obtain an initial conditional hazard fit (e.g., super learner for discrete survival) and add two time-dependent covariates • Iterate until convergence, then use the updated conditional hazard from the final step, and average the corresponding conditional survival over W for fixed treatments 0 and 1
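The last step, turning a conditional hazard into a marginal survival probability, can be sketched as follows: $S(t_0 \mid a, W) = \prod_{t \le t_0} (1 - h(t \mid a, W))$, averaged over the empirical distribution of W. The targeting iterations with the two time-dependent covariates are not shown, and the hazard used in the example is hypothetical.

```python
import numpy as np

def marginal_survival(hazard, a, W, t0):
    """Average over the empirical W of S(t0 | a, W) = prod_{t=1..t0} (1 - h(t | a, W)).
    hazard(t, a, W) must return one discrete-time hazard per element of W."""
    surv = np.ones(len(W))
    for t in range(1, t0 + 1):
        surv *= 1.0 - hazard(t, a, W)
    return surv.mean()

# Hypothetical hazard: higher when W = 1, lower under treatment a = 1, constant in t.
hazard = lambda t, a, W: 0.1 + 0.1 * W - 0.05 * a
W = np.array([0, 1])
S1 = marginal_survival(hazard, 1, W, t0=3)
S0 = marginal_survival(hazard, 0, W, t0=3)
print(S1, S0, S1 - S0)  # survival difference at t0 = 3
```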

TMLE analogue to log rank test • The parameter corresponds to the Cox proportional hazards parameter, and thus to the log-rank parameter • The targeted MLE targeting this parameter is double robust

TMLE in RCT with Survival Outcome: Difference at Fixed End Point • Independent censoring: TMLE gains power over Kaplan-Meier (KM) • Informative censoring: TMLE is unbiased
