Inference for Clinical Decision Making Policies

D. Lizotte, L. Gunter, S. Murphy
INFORMS, October 2008

Sequenced Treatment Alternatives to Relieve Depression (STAR*D)
• A goal of the clinical trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.
• The treatment goal is to achieve remission.
• www.star-d.org

[Figure: STAR*D treatment levels. Level 1 (max 12 weeks): CIT. Level 2 (max 12 weeks): by preference, switch to SER, BUP or VEN, or augment with CIT+BUS or CIT+BUP. Level 3 (max 12 weeks): by preference, switch to MIRT or NTP, or augment the Level 2 treatment (L2+Li, L2+THY). Level 4 (12 weeks): TCP or MIRT+VEN. After each of Levels 1 to 3, patients with QIDS ≤ 5 go to follow-up and patients with QIDS > 5 continue to the next level; Level 4 ends in follow-up.]

STAR*D
• Level 1 observation:
  • QIDS: Quick Inventory of Depressive Symptoms (16 items, score range 0-27, self-reported).
  • Preference for type of Level 2 treatment: Switch or Augment.
• Level 2 treatment action: if the Level 1 preference is Switch, switch to Ser, Bup or Ven; if it is Augment, augment with Bup or Bus.
• Level 2 observation:
  • QIDS.
  • Preference for type of Level 3 treatment: Switch or Augment.
• Level 3 treatment action: if the Level 2 preference is Switch, switch to Mirt or Ntp; if it is Augment, augment with Li or Thy.
• Level 3 observation:
  • QIDS.
• Patients exit to follow-up if remission is achieved (QIDS ≤ 5).
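The action structure above can be written down as a small lookup table, which is convenient when a later step has to maximize only over the treatments a patient could actually receive. A minimal sketch, assuming a plain Python dictionary keyed by level and preference; the names `FEASIBLE_ACTIONS`, `REMISSION_CUTOFF` and `feasible_actions` are hypothetical and not part of the study's software.

```python
# Hypothetical encoding of the Level 2 and Level 3 options listed on the slide.
FEASIBLE_ACTIONS = {
    2: {"Switch": ("Ser", "Bup", "Ven"), "Augment": ("Bup", "Bus")},
    3: {"Switch": ("Mirt", "Ntp"), "Augment": ("Li", "Thy")},
}
REMISSION_CUTOFF = 5  # QIDS <= 5 counts as remission


def feasible_actions(level, preference, prev_qids):
    """Treatment options at `level` for a patient who stated `preference` and
    scored `prev_qids` on the QIDS at the end of the previous level.

    Remitters (QIDS <= 5) exit to follow-up, so they get no options.
    """
    if prev_qids <= REMISSION_CUTOFF:
        return ()
    return FEASIBLE_ACTIONS[level][preference]


print(feasible_actions(2, "Switch", 14))  # ('Ser', 'Bup', 'Ven')
print(feasible_actions(3, "Augment", 9))  # ('Li', 'Thy')
print(feasible_actions(2, "Augment", 4))  # ()
```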

Construct the policy to maximize the average sum of rewards
• Convert Level 2 and Level 3 QIDS scores to standardized percentiles, denoted %QIDS.
• Reward: $R_j = 1 - (\%\mathrm{QIDS}_j - \%5)/100$ for $j = 2, 3$, where %5 denotes the percentile of the remission cutoff, QIDS = 5.
• If a patient remits in Level 2, $R_2 = 1 + 1$ and $R_3 = 0$.
• Construct the policy so as to maximize $E[R_2 + R_3]$.
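The reward definition can be computed per patient as follows. This is a sketch only: the slide does not say how the "standardized percentiles" are obtained, so here they are assumed to be empirical percentiles within a reference sample of QIDS scores. The names `percentile_of`, `rewards` and `reference` are hypothetical.

```python
from typing import Optional

import numpy as np

REMISSION_CUTOFF = 5


def percentile_of(score, reference):
    """ASSUMPTION: approximates the slide's 'standardized percentile' by the
    share (0-100) of a reference sample of QIDS scores at or below `score`."""
    return 100.0 * np.mean(np.asarray(reference) <= score)


def rewards(qids2, qids3: Optional[float], reference):
    """Return (R2, R3) for one patient, following the slide's definitions."""
    if qids2 <= REMISSION_CUTOFF:
        # Remission in Level 2: R2 = 1 + 1 and R3 = 0 (patient exits to follow-up).
        return 2.0, 0.0
    if qids3 is None:
        raise ValueError("non-remitters need a Level 3 QIDS score")
    pct5 = percentile_of(REMISSION_CUTOFF, reference)  # the '%5' term
    r2 = 1.0 - (percentile_of(qids2, reference) - pct5) / 100.0
    r3 = 1.0 - (percentile_of(qids3, reference) - pct5) / 100.0
    return r2, r3


# Toy reference sample: every possible QIDS score once (illustration only).
ref = np.arange(28)
print(rewards(12, 5, ref))  # R2 is about 0.75, R3 = 1.0
print(rewards(4, None, ref))  # (2.0, 0.0)
```

With this form a score exactly at the remission cutoff earns a reward of 1, and each percentile point above the cutoff lowers the reward by 0.01.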

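Batch version of Q-learning for finite-horizon problems
• Approximate Q3 by regressing R3 on Level 1 and Level 2 QIDS within each (present action, preference, past action) category.
• The best Level 3 action is $\arg\max_{a} Q_3(o, a)$ and the value is $V_3(o) = \max_{a} Q_3(o, a)$, where o is the patient's history up to Level 3 (Level 1 and 2 QIDS, preference and Level 2 action).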

Batch version of Q-learning for finite-horizon problems
• Approximate Q2 by regressing R2 + V3 on Level 1 QIDS within each (present action × preference) category.
• The best Level 2 action is $\arg\max_{a} Q_2(o, a)$ and the value is $V_2(o) = \max_{a} Q_2(o, a)$.
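The two backward steps can be sketched with ordinary least squares fitted separately in each category. The slides do not state the form of the regressions, so this sketch assumes Q functions that are linear in the QIDS scores; the data layout `d` and the helpers `design`, `fit_by_category`, `q_learning`, `q2_value` and `best_action` are hypothetical.

```python
from collections import defaultdict

import numpy as np


def design(*cols):
    """Design matrix with an intercept column followed by `cols`."""
    return np.column_stack([np.ones(len(cols[0]))] + list(cols))


def fit_by_category(X, y, keys):
    """Ordinary least squares of y on X, fitted separately within each category."""
    groups = defaultdict(list)
    for i, k in enumerate(keys):
        groups[k].append(i)
    return {k: np.linalg.lstsq(X[idx], y[idx], rcond=None)[0] for k, idx in groups.items()}


def q_learning(d, actions3):
    """Two-stage batch Q-learning (ASSUMPTION: Q functions linear in QIDS).

    d: dict of equal-length arrays 'q1', 'q2', 'pref2', 'a2', 'r2', 'remit2',
       'pref3', 'a3', 'r3' (hypothetical layout; a3/pref3/r3 unused for remitters).
    actions3: {Level 3 preference: tuple of Level 3 actions}.
    Returns the fitted Level 2 coefficients keyed by (action, preference).
    """
    stay = ~d["remit2"]  # patients who entered Level 3
    # Stage 3: regress R3 on Level 1 and 2 QIDS within each
    # (present action, preference, past action) category.
    keys3 = list(zip(d["a3"][stay], d["pref3"][stay], d["a2"][stay]))
    beta3 = fit_by_category(design(d["q1"][stay], d["q2"][stay]), d["r3"][stay], keys3)

    # V3 = max over fitted Level 3 actions of Q3; remitters keep V3 = 0.
    v3 = np.zeros(len(d["q1"]))
    x3 = design(d["q1"], d["q2"])
    for i in np.flatnonzero(stay):
        pref, past = d["pref3"][i], d["a2"][i]
        v3[i] = max(beta3[(a, pref, past)] @ x3[i]
                    for a in actions3[pref] if (a, pref, past) in beta3)

    # Stage 2: regress R2 + V3 on Level 1 QIDS within each (action, preference) category.
    keys2 = list(zip(d["a2"], d["pref2"]))
    return fit_by_category(design(d["q1"]), d["r2"] + v3, keys2)


def q2_value(beta2, q1, pref, a):
    """Fitted Q2 for Level 1 QIDS `q1`, Level 2 preference `pref`, action `a`."""
    return beta2[(a, pref)] @ np.array([1.0, q1])


def best_action(beta2, q1, pref, actions):
    """Level 2 action with the largest fitted Q2 among actions that have a model."""
    fitted = [a for a in actions if (a, pref) in beta2]
    return max(fitted, key=lambda a: q2_value(beta2, q1, pref, a))
```

For example, with `beta2 = q_learning(d, {"Switch": ("Mirt", "Ntp"), "Augment": ("Li", "Thy")})`, the call `best_action(beta2, 14.0, "Switch", ("Ser", "Bup", "Ven"))` returns the switch option with the largest fitted Q2 at a Level 1 QIDS of 14. Fitting within categories rather than with action indicators keeps each treatment's slope on QIDS separate, which is what allows the best action to change with the QIDS score.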

Use voting across bootstrap samples to assess confidence
• 100 bootstrap samples.
• Each sample produces an estimate of Q2; for each Level 1 QIDS score, we calculate the Level 2 action that maximizes Q2(o, a). This is that bootstrap sample's vote for the action.
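The voting procedure can be sketched as a loop over resamples of patients. In the study each resample would rerun the full Q-learning fit; here that step is abstracted as a caller-supplied function, and `bootstrap_votes`, `fit_and_choose` and the single-stage `linear_chooser` example are hypothetical stand-ins.

```python
from collections import Counter

import numpy as np


def bootstrap_votes(n, fit_and_choose, grid, n_boot=100, seed=0):
    """Resample the n patients with replacement n_boot times, refit the policy on
    each resample, and count which Level 2 action it picks at each QIDS in `grid`.

    fit_and_choose(idx) must refit on rows `idx` and return a function that maps
    a Level 1 QIDS score to the chosen Level 2 action.
    """
    rng = np.random.default_rng(seed)
    votes = {q: Counter() for q in grid}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one bootstrap sample of patients
        choose = fit_and_choose(idx)
        for q in grid:
            votes[q][choose(q)] += 1  # this sample's vote at QIDS = q
    return votes


def linear_chooser(q1, action, y):
    """Hypothetical single-stage example: within each action, regress y on q1 and
    choose the action with the largest fitted value."""
    def fit_and_choose(idx):
        coefs = {}
        for a in np.unique(action[idx]):
            rows = idx[action[idx] == a]
            X = np.column_stack([np.ones(len(rows)), q1[rows]])
            coefs[a] = np.linalg.lstsq(X, y[rows], rcond=None)[0]
        return lambda q: max(coefs, key=lambda a: coefs[a] @ np.array([1.0, q]))
    return fit_and_choose
```

Dividing `votes[q][a]` by `n_boot` gives the share of bootstrap samples that favour action `a` at QIDS = q; the slide does not specify what share is needed before an action is called best.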

Conclusion
• If Level 1 QIDS is > 12, then Ven is the best treatment action at Level 2.
• If Level 1 QIDS is < 11, then Ser is the best treatment action at Level 2.
• If Level 1 QIDS is around 11 or 12, then Ven and Ser are the best treatment actions at Level 2.

The Problem
• Many patients drop out of the study.

Two Approaches to Study Dropout
• Complete case analysis: remove all patients with incomplete data from the analysis. This makes strong assumptions about why people do or do not drop out. N = 1201 → N = 679.
• Use a Bayesian method: multiple imputation.
  • This method imputes the missing data multiple times. Intuitively, an imputation model is used to group similar patients; data from similar patients who remain in the study are used to construct the imputations for the missing data of dropouts.
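The intuition of "grouping similar patients and borrowing from those who stayed" can be illustrated with the approximate Bayesian bootstrap, a simple hot-deck form of multiple imputation. This is a swapped-in technique for illustration, not the imputation model used in the study: `strata` is assumed to be a plain categorical grouping, and `abb_impute` is a hypothetical name. Each completed copy would then be analyzed separately.

```python
import numpy as np


def abb_impute(y, strata, m=5, seed=0):
    """Return m completed copies of outcome `y` (NaN = missing after dropout),
    filled by the approximate Bayesian bootstrap within each stratum.

    `strata` stands in for the imputation model's grouping of similar patients
    (ASSUMPTION: a simple categorical grouping, not the model used in the study).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    completed = []
    for _ in range(m):
        y_imp = y.copy()
        for s in np.unique(strata):
            miss = (strata == s) & np.isnan(y)
            if not miss.any():
                continue
            donors = y[(strata == s) & ~np.isnan(y)]  # similar patients who stayed
            if donors.size == 0:
                raise ValueError(f"stratum {s!r} has no observed patients")
            # Step 1: bootstrap the donors, so each imputation reflects uncertainty
            # about the stratum's outcome distribution.
            boot = rng.choice(donors, size=donors.size, replace=True)
            # Step 2: draw the missing values from that bootstrap sample.
            y_imp[miss] = rng.choice(boot, size=int(miss.sum()), replace=True)
        completed.append(y_imp)
    return completed


# Complete case analysis, by contrast, simply drops the incomplete rows:
# y_cc = y[~np.isnan(y)]
```

The bootstrap in step 1 is what makes this "multiple" imputation rather than repeated single imputation: the spread across the m copies carries the uncertainty about the missing values into the later analysis.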

Conclusion
• If Level 1 QIDS is > 20, then Ven and Bup are the best treatment actions at Level 2.
• If Level 1 QIDS is < 12, then Ven and Ser are the best treatment actions at Level 2.
• If Level 1 QIDS is around 12 to 20, then Ven is the best treatment action at Level 2.

Discussion
• If reinforcement learning and modern control methods are to be used with clinical trial data, then these methods must be combined with modern missing-data methods and methods for assessing confidence.
• The multiple imputation + bootstrap approach we used is likely conservative in its assessment of confidence.
• We are developing more principled methods for assessing confidence.

This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/INFORMS10.08.ppt
Email me with questions or if you would like a copy! samurphy@umich.edu
