Variable Selection for Individualized Treatment Decision-Making

Presenter: Daniel Almirall, University of Michigan, Institute for Social Research. Joint Statistical Meetings, San Diego, California, July 30, 2012

Warm-up

Warm-up: Data=(S,A,Y). Suppose we want the effect of A on Y. Why condition on a pre-treatment variable S? • Confounding (specific to observational studies): S is correlated with both A and Y. • Precision: S may be a pre-treatment measure of Y, or any other variable highly correlated with Y. • Missing Data: Y is missing for some units, S and A predict missing-ness, and S is associated with Y. • Effect Heterogeneity/Moderation/Modification: S may moderate, or specify the effect of A on Y.

Warm-up, continued: Of the reasons above for conditioning on S, this talk concerns Effect Heterogeneity/Moderation/Modification: S may moderate, or specify, the effect of A on Y. • Actually, our focus is on a specific type of effect moderation…

Tailoring Variables are specific types of moderator variables. • A tailoring variable is a pre-treatment measure such that individuals who measure at some values of the variable benefit more (equally) from one (multiple) type(s) of treatment, whereas individuals who measure at other values of the variable benefit from a different (specific) type of treatment. • Tailoring variables are prescriptive. • They help individualize treatment decision making.

Example of a Tailoring Variable • Provide outpatient treatment to individuals with higher levels of social support. Provide either one to individuals with low levels of social support.

What is the Relevance? • Theoretical Implication: Understanding the heterogeneity of treatment effects enhances our understanding of scientific theories and may suggest new scientific hypotheses. • Practical Implication 1: Identifying types of individuals for whom treatment is not effective may suggest altering the treatment to suit the needs of those individuals. • Practical Implication 2: Individualized decision making: Provide different treatments for different types of individuals.

Prototypical Linear Regression with Covariate-by-Treatment Interactions • In the example linear model $E(Y(a) \mid S=s) = \eta_0 + \eta_1 s + \beta_1 a + \beta_2 a s$: • S is a tailoring variable if $\beta_1 + \beta_2 s$ is negative (positive) or zero for some values of S=s, yet positive (negative) for other values of S=s.
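The sign condition on this slide can be checked numerically. The sketch below is only an illustration: it writes the effect of a=1 versus a=0 at S=s as the treatment coefficient plus the interaction coefficient times s, and the names `beta1`, `beta2`, the helper functions, and the coefficient values are all hypothetical, not taken from the talk.

```python
import numpy as np

def treatment_effect(s, beta1, beta2):
    """Effect of a=1 versus a=0 at S=s under the linear interaction model."""
    return beta1 + beta2 * np.asarray(s, dtype=float)

def is_tailoring_variable(s_values, beta1, beta2):
    """True if the effect is positive for some s and zero-or-negative for others
    (or negative for some s and zero-or-positive for others)."""
    eff = treatment_effect(s_values, beta1, beta2)
    pos_vs_rest = np.any(eff > 0) and np.any(eff <= 0)
    neg_vs_rest = np.any(eff < 0) and np.any(eff >= 0)
    return bool(pos_vs_rest or neg_vs_rest)

# Hypothetical range of S and hypothetical coefficients.
s_grid = np.linspace(0, 10, 11)
crossing = is_tailoring_variable(s_grid, beta1=-2.0, beta2=0.5)    # effect changes sign at s=4
no_crossing = is_tailoring_variable(s_grid, beta1=1.0, beta2=0.5)  # effect positive everywhere
print(crossing, no_crossing)
```

Note that the answer depends on the range of S actually observed: the same coefficients can imply a sign change inside one population's range of S and none inside another's.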

Goals and context

Goal • Primary Goal: To devise a method for Tailoring Variable selection • Do this in such a way that the method • Yields results that are more likely to replicate • Does not require a priori knowledge of functional forms, especially the main effects of candidate tailoring variables • Permits subsequent exploratory data analysis of the ways in which the selected tailoring variables may be combined for individualizing treatment • This subsequent step is called Tailoring Variable Feature Construction

Need for a Principled Method of Variable Selection for Tailoring • Current practice may be of some concern: • Recall: $E(Y(a) \mid S=s) = \eta_0 + \eta_1 s + \beta_1 a + \beta_2 a s$ • Fit many different interactions; look for p-value < 0.05 • Theoretical explanations sometimes come after the fact • Unfortunate Results: • Proposals for tailoring variables do not replicate • For example: Project MATCH in alcohol research • Issues: Wealth of data, cost, statistical power • The “process of discovery” is often considered fun!

Variable Selection for Prediction is not the same thing as Variable Selection for Tailoring • Prediction: • What variables predict Y? • Tailoring: • What variables predict the individual (differential) effects of A on Y, e.g., D = Y(1) – Y(0)? • An obvious challenge is that we do not observe D for each individual. D can be thought of as latent. • Stated differently, what variables are useful in making decisions about A=1 vs A=0 in terms of optimizing Y?
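A small simulation can make the distinction concrete. The sketch below assumes a randomized binary treatment and an invented data-generating model in which one variable (`s_pred`, a hypothetical name) predicts Y strongly but does not moderate the effect of A, while another (`s_tail`, also hypothetical) barely predicts Y yet fully determines D. Because the data are simulated, D is available here; in real data it is not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
s_pred = rng.normal(size=n)      # strong predictor of Y, no interaction with A
s_tail = rng.normal(size=n)      # weak predictor of Y, but moderates the effect of A
a = rng.integers(0, 2, size=n)   # randomized binary treatment (assumption)

y0 = 3.0 * s_pred + rng.normal(size=n)  # potential outcome under A=0
d = 1.0 * s_tail                        # D = Y(1) - Y(0); latent in real data
y = y0 + a * d                          # observed outcome

def corr(u, v):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(u, v)[0, 1])

corr_y = {"s_pred": corr(s_pred, y), "s_tail": corr(s_tail, y)}
corr_d = {"s_pred": corr(s_pred, d), "s_tail": corr(s_tail, d)}
print("correlation with Y:", corr_y)
print("correlation with D:", corr_d)
```

A selection method aimed at predicting Y would favor `s_pred`, whereas the variable that matters for choosing between A=1 and A=0 is `s_tail`.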

Variable Selection: Effect Heterogeneity • Imai, K. and Ratkovic, M. (2012) “Estimating treatment effect heterogeneity in Randomized Program Evaluation” ACC, Tom Ten Have Memorial Award! Session this Wed 8/1 200P-350P. • Kang, J., et al. (2012). “Tree-structured analysis of treatment effects with large observational data.” Applied Statistics • Loh, W.Y., et al. (2012) “Should all smokers use combination smoking cessation pharmacotherapy?” Nicotine and Tobacco Res. • He, X. (2012) “Identification of subgroups with large differential treatment effects in GWAS” Thesis. • Siddique, J. et al. (2011) “Comparative effectiveness of medication vs CBT in Depressed Low-income Women” ICHPS 2011, Cleveland • Gunter, L., et al. (2011) “Variable selection for qualitative interactions” Statistical Methodology • Imai, K. and Strauss, A. (2011). “Estimation of heterogeneous treatment effects from randomized experiments” • Cai, T., et al. (2010) “Analysis of randomized comparative clinical trial data for personalized treatment selections” Biostatistics. Also in the session this Wed 8/1 200P-350P. • King, A.C., and Kraemer H.C. et al. (2008) “Exploring refinements in targeted behavioral medicine intervention to advance public health.” Annals of Behavioral Medicine • Kraemer, H.C. (2007) “Toward non-parametric and clinically meaningful moderators and mediators.” Stat. in Medicine. Has many other useful and recent articles in this area!!

THE MOTIVATING DATA SET

Adolescent Substance Use Data • Observational study of N=2870 adolescents with substance use problems • From substance use programs across the US (CSAT) • GAIN: Global Appraisal of Individual Needs • structured clinical interview; baseline, 3, 6, 9, 12 months • demographics & measures along 6 dimensions of need • Data { (S0,X0), A1, (S1,X1), A2, (S2,X2), A3, Y } • St = pre-specified candidate tailoring variables • Xt = a large number of auxiliary variables; e.g., X2 has 126 • At = did the adolescent receive treatment in the 3-month interval? • Y = substance use frequency at 12 months

A PROPOSAL for tailoring variable selection

Adolescent Substance Use Data • Data: { (S0,X0), A1, (S1,X1), A2, (S2,X2), A3, Y } • St: Candidate Tailoring Variables • E.g., eps7p3, sfs8p6, etc… (E.g., 18 variables in S2) • Xt: Auxiliary variables • At: Treatment • Y: Outcome = sfs8p12 • We describe the method for final time point only. (Extends readily but beyond scope of talk.) • Choose among the 18 variables S2 to make a decision about A3=1=treatment vs A3=0=no treatment during months 6-9.

• Use theory, clinical experience, cost to choose a pre-specified list of candidate tailoring variables S. • Randomly split data set: discovery & evaluation. • Using discovery data set: • Using A=1: Build a machine to predict Y. Call this f1(S,X). • Using A=0: Build a machine to predict Y. Call this f0(S,X). • Using all data: Calculate D = f1(S,X) – f0(S,X) • Variable selection on D: e.g., Use LASSO with STABILITY SELECTION for variable selection in a regression of D (or D>0) on S. Selected variables denoted by S* ⊆ S. • Using the evaluation data set: • Test selected variables for differential effects in (IPTW) regression; e.g., of Y on A, S*, S*-by-A interaction terms.
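The steps on this slide can be sketched end to end on simulated data. Everything specific below is an assumption made for illustration: the data-generating model, the random forest standing in for the unspecified "machine", a plain LASSO with a fixed penalty standing in for LASSO with stability selection, and a known treatment probability of 0.5 (so the inverse-probability weights are constant; in an observational study they would be estimated).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p_s, p_x = 2000, 5, 3
S = rng.normal(size=(n, p_s))   # candidate tailoring variables
X = rng.normal(size=(n, p_x))   # auxiliary variables
A = rng.integers(0, 2, size=n)  # treatment, randomized with probability 0.5
# Simulated truth (assumption): only S[:, 1] moderates the effect of A.
Y = X[:, 0] + S[:, 0] + A * 1.5 * S[:, 1] + rng.normal(size=n)

# Randomly split into discovery and evaluation halves.
idx = rng.permutation(n)
disc, ev = idx[: n // 2], idx[n // 2:]
SX = np.hstack([S, X])

def fit_machine(rows):
    """Fit a predictor of Y from (S, X) on the given rows."""
    model = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
    return model.fit(SX[rows], Y[rows])

# Discovery: one machine per treatment arm, then predicted differences for everyone.
f1 = fit_machine(disc[A[disc] == 1])
f0 = fit_machine(disc[A[disc] == 0])
D = f1.predict(SX[disc]) - f0.predict(SX[disc])

# Variable selection on D (plain LASSO here, as a stand-in).
lasso = Lasso(alpha=0.1).fit(S[disc], D)
selected = np.flatnonzero(lasso.coef_ != 0)

# Evaluation: weighted regression of Y on A, S*, and S*-by-A interactions.
w = np.full(ev.size, 1.0 / 0.5)  # inverse probability of the treatment received
S_star = S[ev][:, selected]
design = np.column_stack([np.ones(ev.size), A[ev], S_star, A[ev][:, None] * S_star])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(design * sw[:, None], Y[ev] * sw, rcond=None)
interactions = dict(zip(selected.tolist(), coef[2 + selected.size:].tolist()))
print("selected columns of S:", selected.tolist())
print("interaction estimates:", interactions)
```

Keeping the evaluation half untouched during discovery is what allows the interaction tests to be read as confirmatory rather than as part of the search.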

Stability Selection (with LASSO) Meinshausen and Bühlmann (2010) JRSSB • LASSO: Often difficult to select the right amount of regularization (tuning parameter) to select S* exactly. • Bootstrap the LASSO. For every value of the tuning parameter $\lambda$, calculate the probability (over bootstraps) $\hat{\Pi}_K^{\lambda}$ of selecting the variable K with LASSO. These are “stability paths”. • For each variable K, calculate $\max_{\lambda} \hat{\Pi}_K^{\lambda}$. These are called “selection probabilities”. • Keep variables that have $\max_{\lambda} \hat{\Pi}_K^{\lambda} \ge \pi_{thr}$. • $\pi_{thr}$ chosen by the user (another tuning parameter!?)
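A minimal sketch of the procedure as described on this slide follows. It uses bootstrap resampling because that is what the slide describes; the function name, the penalty grid, the threshold of 0.8, and the simulated data are all illustrative assumptions. scikit-learn calls the LASSO tuning parameter `alpha`.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(S, d, alphas, n_boot=100, threshold=0.8, seed=0):
    """Bootstrap the LASSO over a grid of penalties.

    Returns the indices of kept variables and each variable's selection
    probability (its maximum selection frequency over the penalty grid).
    """
    rng = np.random.default_rng(seed)
    n, p = S.shape
    counts = np.zeros((len(alphas), p))
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)  # bootstrap resample of the rows
        for j, alpha in enumerate(alphas):
            coef = Lasso(alpha=alpha).fit(S[rows], d[rows]).coef_
            counts[j] += coef != 0
    paths = counts / n_boot        # stability paths: frequency per (penalty, variable)
    sel_prob = paths.max(axis=0)   # selection probability per variable
    return np.flatnonzero(sel_prob >= threshold), sel_prob

# Simulated example (assumption): only columns 2 and 7 are related to d.
rng = np.random.default_rng(42)
S = rng.normal(size=(300, 10))
d = 1.0 * S[:, 2] - 0.8 * S[:, 7] + rng.normal(size=300)
alphas = np.linspace(0.2, 1.0, 5)
kept, sel_prob = stability_selection(S, d, alphas)
print("kept:", kept.tolist())
print("selection probabilities:", np.round(sel_prob, 2))
```

The lower end of the penalty grid matters: with very small penalties nearly every variable is selected in most resamples, so the maximum over the grid stops discriminating between signal and noise variables.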

Variable Selection Results

Evaluation Results (IPTW Regression)

Evaluation Results (IPTW Regression):Specific Contrasts from the Regression Model

Can we do more?

Thank you. Contact information: dalmiral@umich.edu Funding: R03-MH-097954 (PI: Almirall) R01-MH-080015 (PI: Murphy) P50-DA-010075 (PI: Collins) R01-DA-015697 (PI: McCaffrey & Griffin) Ownership of Errors: Any errors, confusion, or misconceptions are my own—my colleagues did not completely vet all of my statements in this talk.

Extra slides

[Three figure panels: Y = Future Days Abstinent (high is better) plotted against S (S=0: no heavy drinking; S=1: returned to heavy drinking), with one line for Tx=NTX+CBI and one for Tx=NTX.] • NO TAILORING: S is a moderator variable because the magnitude of the effect of Tx=NTX+CBI versus Tx=NTX differs by levels of S. However, S is not a tailoring variable: Tx=NTX+CBI is better for all subjects. • BETTER: S is a weak tailoring variable because the direction of the effect of Tx=NTX+CBI versus Tx=NTX differs by levels of S but the magnitude is small. S is somewhat prescriptive: Offer Tx=NTX+CBI to S=1 subjects; the difference in effects is not substantial for S=0 subjects. • BEST: S is a strong tailoring variable because the direction of the effect of Tx=NTX+CBI versus Tx=NTX differs by levels of S. S is very prescriptive: Offer Tx=NTX to S=0 subjects; offer Tx=NTX+CBI to S=1 subjects. Large magnitudes of clinical significance.
