Machine/Reinforcement Learning in Clinical Research


S.A. Murphy
May 19, 2008

Outline
Goal: Improving Clinical Decision Making Using Data
  • Clinical Decision Making
  • Types of Training Data
  • Incomplete Mechanistic Models
  • Clinical Trials
  • Some Open Problems
  • Example

Patient Evaluation Screen with MSE


Policies are individually tailored treatments, with treatment type and dosage changing according to the patient’s outcomes.

$k$ stages for each patient

Observation available at the $j$th stage: $O_j$

Action at the $j$th stage (usually a treatment): $A_j$


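$k$ Stages

History available at the $j$th stage: $H_j = (O_1, A_1, \ldots, O_{j-1}, A_{j-1}, O_j)$

Reward following the $j$th stage: $R_j = r_j(H_j, A_j, O_{j+1})$, where $r_j$ is a known function

Primary outcome: $Y = R_1 + R_2 + \cdots + R_k$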



Use training data to construct decision rules $d_1, \ldots, d_k$ that take the information in the history at each stage as input and output a recommended action; these decision rules should maximize the mean of $Y$ (the cumulative reward).

The policy is the sequence of decision rules $d_1, \ldots, d_k$.

When the policy is implemented, the actions are set to $A_j = d_j(H_j)$.
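To make the notation concrete, here is a minimal Python sketch of how a policy $(d_1, \ldots, d_k)$ is applied to one patient, assuming the history is stored as a flat list. At each stage the decision rule maps the history to an action, a reward is computed from the history, the action, and the next observation, and $Y$ is the sum of the rewards. The functions `observe`, `reward`, and `treat_if_positive` are hypothetical stand-ins for patient dynamics, the known reward functions $r_j$, and a decision rule; they do not come from the slides.

```python
from typing import Callable, List, Sequence

# A history is stored as a flat list (O_1, A_1, O_2, A_2, ..., O_j).
History = List[float]
DecisionRule = Callable[[History], int]


def run_policy(observe: Callable[[History], float],
               rules: Sequence[DecisionRule],
               reward: Callable[[int, History, int, float], float]) -> float:
    """Follow the policy (d_1, ..., d_k) for one patient and return Y = R_1 + ... + R_k.

    observe(history) and reward(j, H_j, A_j, O_{j+1}) are hypothetical stand-ins
    for the patient's response and the known reward functions r_j.
    """
    history: History = [observe([])]            # H_1 = (O_1)
    y = 0.0
    for j, d_j in enumerate(rules, start=1):
        a_j = d_j(history)                      # A_j = d_j(H_j)
        o_next = observe(history + [a_j])       # O_{j+1}
        y += reward(j, history, a_j, o_next)    # R_j = r_j(H_j, A_j, O_{j+1})
        history = history + [a_j, o_next]       # H_{j+1}
    return y


# Toy (hypothetical) dynamics: the next observation is the current history length.
def observe(h):
    return float(len(h))


def treat_if_positive(h):
    return 1 if h[-1] > 0 else -1


def reward(j, h, a, o_next):
    return a * o_next


print(run_policy(observe, [treat_if_positive, treat_if_positive], reward))  # 2.0
```

Passing the rules as a sequence keeps the policy separate from the environment, so the same rollout can be reused to compare different candidate policies on the same simulated dynamics.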

Some Characteristics of a Clinical Decision Making Policy
  • The learned policy should be decisive only when warranted.
  • The learned policy should not require excessive data collection in order to implement.
  • The learned policy should be justifiable to clinical scientists.
Types of Data
  • Clinical Trial Data
    • Actions are manipulated (randomized)
  • Large Databases or Observational Data Sets
    • Actions are not manipulated by scientist
  • Bench research on cells/animals/humans
Clinical Trial Data Sets
  • Experimental trials conducted for research purposes
    • Scientists decide proactively which data to collect and how to collect this data
    • Use scientific knowledge to enhance the quality of the proxies for observation, reward
    • Actions are manipulated (randomized) by scientist
    • Short horizon (fewer than 5 stages)
    • Hundreds of subjects.
Observational Data Sets
  • Observational data collected for research purposes
    • Use scientific knowledge to pinpoint high quality proxies for observation, action, reward
    • Scientists decide proactively which proxies to collect and how to collect this data
    • Actions are not manipulated by scientist
    • Moderate Horizon
    • Hundreds to thousands of subjects.
Observational Data Sets
  • Clinical databases or registries (e.g., the VA registries in the US)
    • Data were not collected for research purposes
    • Only gross proxies are available to define observation, action, reward
    • Moderate to Long Horizon
    • Thousands to Millions of subjects
Mechanistic Models
  • In many areas of RL, scientists can use mechanistic theory, e.g., physical laws, to model or simulate the interrelationships between observations and how the actions might impact the observations.
  • Scientists know many of the causes of the observations (including the most important ones) and have a model for how the observations relate to one another.
Low Availability of Mechanistic Models
  • Clinical scientists have recourse to only crude, qualitative models
  • Unknown causes create problems. Scientists who want to use observational data to construct policies must confront the fact that non-causal “associations” occur due to the unknown causes of the observations.

Unknown, Unobserved Causes (Incomplete Mechanistic Models)

Unknown, Unobserved Causes (Incomplete Mechanistic Models)
  • Problem: Non-causal associations between treatment (here counseling) and rewards are likely.
  • Solutions:
    • Collect clinical trial data in which treatment actions are randomized. Randomization breaks the non-causal associations while preserving the causal ones.
    • Participate in the observational data collection; proactively brainstorm with domain experts to ascertain and measure the main determinants of treatment selection. Then take advantage of causal inference methods designed to utilize this information.
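A small simulation can illustrate the problem and the first solution. The model below is hypothetical: an unobserved cause `u` (think of severity) lowers the reward and, in the observational setting, also makes treatment (e.g., counseling) more likely, while the treatment itself has no effect on the reward. Comparing treated and untreated patients then shows a non-causal association; randomizing the action removes it.

```python
import random


def mean_difference(n, randomized, seed=0):
    """Mean reward of treated minus untreated patients under a hypothetical model.

    An unobserved cause u lowers the reward. In the observational setting u also
    drives who receives treatment; in the trial a coin flip decides instead.
    The treatment has NO causal effect on the reward in this model.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                              # unknown, unobserved cause
        if randomized:
            a = rng.random() < 0.5                       # randomized, independent of u
        else:
            a = rng.random() < (0.8 if u > 0 else 0.2)   # treatment selection depends on u
        y = -u + rng.gauss(0, 1)                         # reward depends on u only
        (treated if a else untreated).append(y)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


print(round(mean_difference(20000, randomized=False), 2))  # clearly negative
print(round(mean_difference(20000, randomized=True), 2))   # close to 0
```

Under this model the observational difference is roughly -0.96 even though treatment does nothing, which is the kind of non-causal association the slide warns about.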
STAR*D
  • The statistical expertise relevant for policy construction was unavailable at the time the trial was designed.
  • This trial is complete, and one can apply for access to the data.
  • One goal of the trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.

  • Ongoing study at U. Pennsylvania
  • Goal is to learn how best to help alcohol dependent individuals reduce alcohol consumption.
Oslin ExTENd


[Figure: ExTENd trial design with two arms, "Early Trigger for …" and "Late Trigger for …"; each arm shows an 8-week response assessment and the treatment options TDM + Naltrexone and CBI + Naltrexone.]

Clinical Trials
  • Data from short-horizon clinical trials make excellent test beds for combinations of supervised/unsupervised and reinforcement learning methods.
    • In a clinical trial, large amounts of data are collected at each stage of treatment
    • Small number of finite-horizon patient trajectories
    • The learned policy can vary greatly from one training set to another.
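The instability of a learned policy across small training sets can be seen in a toy setting. This sketch assumes a hypothetical trial of 30 patients, each with a binary subgroup and a randomized action in {-1, 1}, and learns the policy by choosing, within each subgroup, the action with the larger mean reward. That plug-in rule is chosen only for illustration and is not a method from the slides. Repeating the fit on fresh simulated trials counts how often each policy is learned.

```python
import random
from collections import Counter


def learn_policy(data):
    """For each subgroup (0 or 1), pick the action in {-1, 1} with the higher mean reward."""
    policy = {}
    for g in (0, 1):
        means = {}
        for a in (-1, 1):
            ys = [y for (og, ag, y) in data if og == g and ag == a]
            means[a] = sum(ys) / len(ys) if ys else 0.0
        policy[g] = max((-1, 1), key=lambda act: means[act])
    return (policy[0], policy[1])


def draw_trial(n, rng):
    """Hypothetical small randomized trial of (subgroup, action, reward) triples."""
    data = []
    for _ in range(n):
        g = rng.randrange(2)
        a = rng.choice((-1, 1))                  # randomized action
        effect = 0.2 if g == 1 else -0.1         # small true treatment effects
        data.append((g, a, effect * a + rng.gauss(0, 1)))
    return data


rng = random.Random(1)
counts = Counter(learn_policy(draw_trial(30, rng)) for _ in range(500))
print(counts.most_common())  # several different policies are learned
```

In this model the best policy is `(-1, 1)`, yet with only 30 patients per trial the other policies are also learned a substantial fraction of the time.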
Open Problems
  • Equivalent Actions
    • Need to know when a subset of actions is equivalent, that is, when there is little or no evidence to contradict this equivalence.
  • Evaluation
    • Need to assess the quality of the learned policy (or compare policies) using training data
Open Problems
  • Variable Selection
    • To reduce the large number of variables to those most useful for decision making
    • Once a small number of variables is identified, we need to know if there is sufficient evidence that a particular variable (e.g. output of a biological test) should be part of the policy.
Measures of Confidence
  • A statistician’s approach: use measures of confidence to address these three challenges
    • Pinpointing equivalent actions
    • Pinpointing necessary patient inputs to the policy
    • Evaluating the quality of a learned policy
Evaluating the quality of a learned policy using the training data
  • Traditional methods for constructing measures of confidence require differentiability (to assess the variation in the policy from training set to training set).
  • The mean outcome following use of a policy (the value of the policy) is a non-differentiable function of the policy.
Example: Single Stage (k=1)
  • Find a prediction interval for the mean outcome if a particular estimated policy (here one decision rule) is employed.
  • Action A is binary in {-1,1}.
  • Suppose the decision rule is of the form $d(O_1) = \operatorname{sign}(\beta^T O_1)$.
  • We do not assume the Bayes decision boundary is linear.
Single Stage (k=1)

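Mean outcome following this policy is

$V(\beta) = E\left[\frac{\mathbf{1}\{A = \operatorname{sign}(\beta^T O_1)\}}{p(A \mid O_1)}\, Y\right]$

where $p(A \mid O_1)$ is the randomization probability.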
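For a rule of the form $\operatorname{sign}(\beta^T O_1)$ in a randomized trial, the mean outcome can be estimated from training data by inverse probability weighting: average $Y \cdot \mathbf{1}\{A = \operatorname{sign}(\beta^T O_1)\} / p(A \mid O_1)$ over patients. The sketch below assumes a constant, known randomization probability `p` (0.5 by default) and an intercept supplied as the first feature of $O_1$; mapping `sign(0)` to 1 is an arbitrary tie-break.

```python
import random


def sign(x):
    # Actions are coded -1/1; sign(0) is mapped to 1 (arbitrary tie-break).
    return 1 if x >= 0 else -1


def ipw_value(beta, data, p=0.5):
    """Inverse-probability-weighted estimate of the mean outcome of sign(beta^T O_1).

    data: list of (o1, a, y), where o1 is a tuple of features, a in {-1, 1} was
    randomized with known probability p, and y is the observed outcome.
    """
    total = 0.0
    for o1, a, y in data:
        d = sign(sum(b * o for b, o in zip(beta, o1)))
        if a == d:                 # only patients whose action agrees with the rule count
            total += y / p
    return total / len(data)


# Hand-checkable example: only the first patient followed the rule, contributing 3.0 / 0.5.
toy = [((1.0, 2.0), 1, 3.0), ((1.0, -1.0), 1, 1.0)]
print(ipw_value((0.0, 1.0), toy))  # 3.0
```

With simulated data where the outcome is `a * o` plus noise, the rule with `beta = (0, 1)` should receive a higher estimated value than `beta = (0, -1)`.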

Prediction Interval for $V(\hat\beta)$

Two problems:

  • $V(\beta)$ is not necessarily smooth in $\beta$.
  • We don't know $V$, so $V$ must be estimated as well. The data set is small, so overfitting is a problem.
Similar Problem in Classification

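Misclassification rate for a given decision rule (classifier): $V(\hat\beta)$

where $V$ is defined by

$V(\beta) = E\left[\mathbf{1}\{A \neq \operatorname{sign}(\beta^T O_1)\}\right]$

($A$ is the $\{-1, 1\}$ classification; $O_1$ is the observation; $\beta^T O_1$ is a linear classification boundary)

$V(\beta)$ is non-smooth in $\beta$.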
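The non-smoothness is easiest to see in the empirical version of the misclassification rate (the training error): as a function of $\beta$ it is piecewise constant and jumps whenever the boundary crosses a data point, so it has no useful derivative. This sketch uses four made-up one-dimensional examples and the rule $\operatorname{sign}(\beta_0 + \beta_1 o)$; the data are illustrative only.

```python
def empirical_error(beta0, beta1, data):
    """Fraction of examples (o, a) with a != sign(beta0 + beta1 * o); sign(0) counts as 1."""
    wrong = sum(1 for o, a in data if a != (1 if beta0 + beta1 * o >= 0 else -1))
    return wrong / len(data)


data = [(-2.0, -1), (-0.5, 1), (0.3, -1), (1.5, 1)]

# Sweep the intercept with the slope fixed at 1: the error only takes values k/4.
for b0 in (-2.5, -1.0, 0.0, 1.0, 2.5):
    print(b0, empirical_error(b0, 1.0, data))  # errors: 0.5, 0.25, 0.5, 0.25, 0.5

# A tiny change in beta0 moves the boundary past o = 0.3 and the error jumps.
print(empirical_error(-0.31, 1.0, data), empirical_error(-0.29, 1.0, data))  # 0.25 0.5
```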

Toy Example: The unknown Bayes classifier has a quadratic decision boundary. We fit, by least squares, a linear decision boundary

$f(o) = \operatorname{sign}(\beta_0 + \beta_1 o)$
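The toy example can be mimicked as follows. The data-generating model is an assumption made for illustration: $o$ is uniform on $[-2, 2]$, the Bayes rule labels $A = 1$ exactly when $o^2 < 1$ (a quadratic boundary), and 10% of labels are flipped. Regressing $A$ on $(1, o)$ by least squares and taking the sign gives the linear rule $f(o)$. Under this model no linear rule can get close to the Bayes error of 0.1.

```python
import random


def fit_least_squares(data):
    """Least-squares fit of a in {-1, 1} on (1, o) for scalar o; returns (beta0, beta1)."""
    n = len(data)
    mo = sum(o for o, _ in data) / n
    ma = sum(a for _, a in data) / n
    sxx = sum((o - mo) ** 2 for o, _ in data)
    sxa = sum((o - mo) * (a - ma) for o, a in data)
    beta1 = sxa / sxx
    return ma - beta1 * mo, beta1


def draw(n, rng):
    """Hypothetical toy model: Bayes rule a = 1 iff o**2 < 1, labels flipped w.p. 0.1."""
    data = []
    for _ in range(n):
        o = rng.uniform(-2, 2)
        a = 1 if o * o < 1 else -1
        if rng.random() < 0.1:
            a = -a
        data.append((o, a))
    return data


def f(o, beta):
    """The fitted linear rule sign(beta0 + beta1 * o), with sign(0) mapped to 1."""
    beta0, beta1 = beta
    return 1 if beta0 + beta1 * o >= 0 else -1


rng = random.Random(0)
beta = fit_least_squares(draw(30, rng))     # learn from a small training set
test = draw(10000, rng)
test_error = sum(f(o, beta) != a for o, a in test) / len(test)
print(beta, test_error)
```

Rerunning with different seeds for the training draw shows the fitted boundary, and hence its test error, moving from one training set to the next.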

Jittering of $V(\hat\beta)$



Simulation Example
  • Data Sets from the UCI repository
  • Use squared error loss to form the classification rule
  • Sample 30 examples from each data set; for each sample construct prediction interval. Assess coverage using remaining examples.
  • Repeat 1000 times
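The coverage experiment can be sketched as a small harness: draw 30 training examples from a pool, fit a least-squares linear rule, build an interval, and check whether the error on the remaining examples falls inside it. Two substitutions are assumptions: a synthetic pool (the same hypothetical quadratic-boundary model as a toy) stands in for the UCI data sets, and `naive_interval` is a Wald interval around the training error used only as a placeholder baseline. It is not the method from the slides.

```python
import math
import random


def fit_ls(data):
    # Least-squares fit of a in {-1, 1} on (1, o) for scalar o; returns (b0, b1).
    n = len(data)
    mo = sum(o for o, _ in data) / n
    ma = sum(a for _, a in data) / n
    sxx = sum((o - mo) ** 2 for o, _ in data) or 1e-12   # guard against constant o
    b1 = sum((o - mo) * (a - ma) for o, a in data) / sxx
    return ma - b1 * mo, b1


def error(beta, data):
    # Misclassification rate of sign(b0 + b1 * o) on data.
    b0, b1 = beta
    return sum(a != (1 if b0 + b1 * o >= 0 else -1) for o, a in data) / len(data)


def naive_interval(train, beta, z=1.96):
    # Placeholder baseline: Wald interval centred at the (optimistic) training error.
    e = error(beta, train)
    half = z * math.sqrt(e * (1 - e) / len(train))
    return e - half, e + half


def coverage(pool, n_train=30, reps=1000, seed=0):
    """Fraction of repetitions in which the interval contains the held-out error."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        idx = set(rng.sample(range(len(pool)), n_train))
        train = [pool[i] for i in idx]
        rest = [pool[i] for i in range(len(pool)) if i not in idx]
        beta = fit_ls(train)
        lo, hi = naive_interval(train, beta)
        hits += lo <= error(beta, rest) <= hi
    return hits / reps


# Hypothetical synthetic pool standing in for a UCI data set.
rng = random.Random(42)
pool = []
for _ in range(2000):
    o = rng.uniform(-2, 2)
    a = 1 if o * o < 1 else -1
    pool.append((o, -a if rng.random() < 0.1 else a))

print(coverage(pool, reps=200))  # fewer than 1000 repetitions, just to keep the run short
```

A baseline like this may cover less often than the nominal 95% because the training error is computed on the same 30 examples used to fit the rule; any candidate interval can be swapped in for `naive_interval`.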
“95% Prediction Intervals”

Confidence rate should be ≥ .95

Prediction Interval for $V(\hat\beta)$

Our method obtains a prediction interval for a smooth upper bound on

is training error.

Prediction Interval for $V(\hat\beta)$

where is the set of close to in terms of squared error loss. Form a percentile bootstrap interval for this smooth upper bound.

  • This method is generally too conservative
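The exact smooth upper bound is not recoverable from this transcript, so the sketch below shows only the generic percentile bootstrap step: resample the training data with replacement, recompute a statistic, and take empirical quantiles of the recomputed values. The statistic used here, the mean of hypothetical 0/1 losses, is a stand-in and not the bound from the slides.

```python
import random


def percentile_bootstrap(data, statistic, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap interval for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    stats = sorted(statistic([data[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    alpha = (1 - level) / 2
    lo = stats[int(alpha * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lo, hi


def mean_loss(xs):
    return sum(xs) / len(xs)


rng = random.Random(3)
losses = [1 if rng.random() < 0.3 else 0 for _ in range(30)]  # hypothetical 0/1 losses
print(mean_loss(losses), percentile_bootstrap(losses, mean_loss))
```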
“95% Prediction Intervals”

Confidence rate should be ≥ .95

A Challenge!

Methods for constructing the policy (or classifier) and providing an evaluation of the policy (or classifier) must use the same small data set.

How might you better address this problem?

  • Equivalent Actions: Need to know when a subset of actions is equivalent, that is, when there is little or no evidence to contradict this equivalence.
  • Evaluating the usefulness of a particular variable in the learned policy.
  • Methods for producing composite rewards.
    • High quality elicitation of functionality
  • Feature construction for decision making in addition to prediction
This seminar can be found at:


Email me with questions or if you would like a copy:

