Machine reinforcement learning in clinical research


Machine/Reinforcement Learning in Clinical Research

S.A. Murphy, May 19, 2008

Presentation Transcript

Outline

Goal: Improving Clinical Decision Making Using Data

  • Clinical Decision Making

  • Types of Training Data

  • Incomplete Mechanistic Models

  • Clinical Trials

  • Some Open Problems

  • Example


Questions

Patient Evaluation Screen with MSE



Policies are individually tailored treatments, with treatment type and dosage changing according to the patient’s outcomes.

k Stages for each patient

O_j: Observation available at jth stage

A_j: Action at jth stage (usually a treatment)


k Stages

H_j = (O_1, A_1, …, A_{j-1}, O_j): History available at jth stage

R_j: Reward following jth stage (r_j is a known function)

Primary Outcome: Y = R_1 + R_2 + … + R_k



Goal:

Use training data to construct decision rules, d1,…, dk that input information in the history at each stage and output a recommended action; these decision rules should lead to a maximal mean Y (cumulative reward).

The policy is the sequence of decision rules, d1,…, dk .

In implementation of the policy, the action at each stage j is set to the output of decision rule d_j applied to the history available at that stage.
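A minimal sketch of how a policy, viewed as a sequence of decision rules, could be applied to one patient. The rule bodies, the numeric severity observations, and the action labels are hypothetical illustrations and do not come from the talk. For simplicity the observations are passed in up front, whereas in practice each one only becomes available after the previous action.

```python
from typing import Callable, List, Sequence

# Simplified history: a flat list alternating observation, action, observation, ...
History = List[object]
DecisionRule = Callable[[History], str]


def apply_policy(rules: Sequence[DecisionRule],
                 observations: Sequence[float]) -> List[str]:
    """Apply d_1, ..., d_k in order; each rule sees the history up to its stage."""
    history: History = []
    actions: List[str] = []
    for rule, obs in zip(rules, observations):
        history.append(obs)        # observation available at this stage
        action = rule(history)     # action = d_j(history at stage j)
        history.append(action)     # the action becomes part of later histories
        actions.append(action)
    return actions


# Hypothetical rules: the threshold and the action names are made up.
def d1(history: History) -> str:
    return "high dose" if history[-1] > 5 else "low dose"


def d2(history: History) -> str:
    first_action = history[1]
    improved = history[-1] < history[0]   # lower severity counts as improvement
    return first_action if improved else "switch treatment"


print(apply_policy([d1, d2], [7.0, 8.0]))
```

The second rule depends on both the first action and the change in the observation, which is what "individually tailored" means here: the recommendation adapts to the patient's own trajectory.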


Some Characteristics of a Clinical Decision Making Policy

  • The learned policy should be decisive only when warranted.

  • The learned policy should not require excessive data collection in order to implement.

  • The learned policy should be justifiable to clinical scientists.


Types of Data

  • Clinical Trial Data

    • Actions are manipulated (randomized)

  • Large Databases or Observational Data Sets

    • Actions are not manipulated by scientist

  • Bench research on cells/animals/humans


Clinical Trial Data Sets

  • Experimental trials conducted for research purposes

    • Scientists decide proactively which data to collect and how to collect this data

    • Use scientific knowledge to enhance the quality of the proxies for observation, reward

    • Actions are manipulated (randomized) by scientist

    • Short Horizon (less than 5)

    • Hundreds of subjects.


Observational Data Sets

  • Observational data collected for research purposes

    • Use scientific knowledge to pinpoint high quality proxies for observation, action, reward

    • Scientists decide proactively which proxies to collect and how to collect this data

    • Actions are not manipulated by scientist

    • Moderate Horizon

    • Hundreds to thousands of subjects.


Observational Data Sets

  • Clinical databases or registries (an example in the US would be the VA registries)

    • Data was not collected for research purposes

    • Only gross proxies are available to define observation, action, reward

    • Moderate to Long Horizon

    • Thousands to Millions of subjects


Mechanistic Models

  • In many areas of RL, scientists can use mechanistic theory, e.g., physical laws, to model or simulate the interrelationships between observations and how the actions might impact the observations.

  • Scientists know many (the most important) of the causes of the observations and know a model for how the observations relate to one another.


Low Availability of Mechanistic Models

  • Clinical scientists have recourse to only crude, qualitative models

  • Unknown causes create problems. Scientists who want to use observational data to construct policies must confront the fact that non-causal “associations” occur due to the unknown causes of the observations.



Unknown, Unobserved Causes (Incomplete Mechanistic Models)

  • Problem: Non-causal associations between treatment (here counseling) and rewards are likely.

  • Solutions:

    • Collect clinical trial data in which treatment actions are randomized. This breaks the non-causal associations yet permits causal associations.

    • Participate in the observational data collection; proactively brainstorm with domain experts to ascertain and measure the main determinants of treatment selection. Then take advantage of causal inference methods designed to utilize this information.
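The following toy simulation sketches why randomization breaks non-causal associations. Everything in it is an assumption made for illustration: the treatment has no causal effect at all, a single unobserved cause `u` drives the reward, and in the observational setting `u` also drives who gets treated (the 0.9 and 0.1 selection probabilities are made up).

```python
import random
from statistics import mean


def simulate(n: int, randomized: bool, seed: int = 0) -> float:
    """Return the difference in mean reward, treated minus untreated.

    The treatment has NO causal effect in this toy model; an unobserved
    cause u drives the reward and, in the observational setting, also
    drives treatment selection.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                  # unobserved cause
        if randomized:
            a = rng.random() < 0.5               # coin flip, independent of u
        else:
            # patients with low u are far more likely to be treated
            a = rng.random() < (0.9 if u < 0 else 0.1)
        reward = u + rng.gauss(0.0, 1.0)         # reward depends on u only
        (treated if a else untreated).append(reward)
    return mean(treated) - mean(untreated)


obs_diff = simulate(20000, randomized=False)
rct_diff = simulate(20000, randomized=True)
print(f"observational difference: {obs_diff:+.2f}")
print(f"randomized difference:    {rct_diff:+.2f}")
```

In the observational run the treated group looks clearly worse even though the treatment does nothing, while the randomized run shows a difference close to zero.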


Conceptual Structure in the Clinical Sciences (experimental trial data)


STAR*D

  • The statistical expertise relevant for policy construction was unavailable at the time the trial was designed.

  • This trial is over, and one can apply for access to the data.

  • One goal of the trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.

www.star-d.org


ExTENd

  • Ongoing study at U. Pennsylvania

  • Goal is to learn how best to help alcohol dependent individuals reduce alcohol consumption.


Oslin ExTENd

  • Random assignment: Early Trigger for Nonresponse or Late Trigger for Nonresponse

  • Within each trigger arm:

    • 8 wks Response, then random assignment: Naltrexone or TDM + Naltrexone

    • Nonresponse, then random assignment: CBI or CBI + Naltrexone
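Reading the diagram as an initial random assignment to an early or late trigger for nonresponse, followed by a second random assignment that depends on response status, the design could be sketched as follows. Equal allocation probabilities are an assumption of this sketch, since the slide does not state allocation ratios, and the `assign` helper is hypothetical.

```python
import random
from itertools import product

FIRST_STAGE = ["Early Trigger for Nonresponse", "Late Trigger for Nonresponse"]
SECOND_STAGE = {
    "response": ["Naltrexone", "TDM + Naltrexone"],
    "nonresponse": ["CBI", "CBI + Naltrexone"],
}


def assign(responded: bool, rng: random.Random) -> tuple:
    """Draw one participant's two random assignments (equal odds assumed)."""
    trigger = rng.choice(FIRST_STAGE)
    status = "response" if responded else "nonresponse"
    second = rng.choice(SECOND_STAGE[status])
    return trigger, status, second


# Each embedded policy fixes a trigger, an option for responders,
# and an option for nonresponders.
embedded_policies = list(
    product(FIRST_STAGE, SECOND_STAGE["response"], SECOND_STAGE["nonresponse"])
)

rng = random.Random(0)
print(assign(responded=True, rng=rng))
print(len(embedded_policies), "embedded policies")
```

Under this reading, a single trial contains data on every combination of trigger, responder option, and nonresponder option, which is what makes it usable for comparing whole treatment sequences.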


Clinical Trials

  • Data from short-horizon clinical trials make excellent test beds for combinations of supervised/unsupervised and reinforcement learning methods.

    • In the clinical trial large amounts of data are collected at each stage of treatment

    • Small number of finite horizon patient trajectories

    • The learned policy can vary greatly from one training set to another.


Open Problems

  • Equivalent Actions

    • Need to know when the actions in a subset are equivalent, that is, when there is little or no evidence to contradict this equivalence.

  • Evaluation

    • Need to assess the quality of the learned policy (or compare policies) using training data


Open Problems

  • Variable Selection

    • To reduce the large number of variables to those most useful for decision making

    • Once a small number of variables is identified, we need to know if there is sufficient evidence that a particular variable (e.g. output of a biological test) should be part of the policy.


Measures of Confidence

  • A statistician’s approach: use measures of confidence to address these three challenges

    • Pinpointing equivalent actions

    • Pinpointing necessary patient inputs to the policy

    • Evaluating the quality of a learned policy


Evaluating the quality of a learned policy using the training data

  • Traditional methods for constructing measures of confidence require differentiability (to assess the variation in the policy from training set to training set).

  • The mean outcome following use of a policy (the value of the policy) is a non-differentiable function of the policy.


Example: Single Stage (k=1)

  • Find a prediction interval for the mean outcome if a particular estimated policy (here one decision rule) is employed.

  • Action A is binary in {-1,1}.

  • Suppose the decision rule is of the form sign(β^T O_1), that is, linear in the observation O_1.

  • We do not assume the Bayes decision boundary is linear.


Single Stage (k=1)

Mean outcome following this policy is

V(β) = E[ 1{A = sign(β^T O_1)} Y / p(A | O_1) ]

where p(A | O_1) is the randomization probability.
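One common way to estimate the mean outcome of a policy from randomized data is inverse-probability weighting: keep only the patients whose randomized action agrees with the policy and weight their outcomes by one over the randomization probability. The sketch below assumes that estimator. The simulated trial, the outcome model, and the two rules being compared are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

Record = Tuple[float, int, float]  # (observation o1, action a in {-1, 1}, outcome y)


def ipw_value(data: List[Record],
              rule: Callable[[float], int],
              p: Callable[[int, float], float]) -> float:
    """Average of 1{a == rule(o1)} * y / p(a | o1) over the training data."""
    total = 0.0
    for o1, a, y in data:
        if a == rule(o1):
            total += y / p(a, o1)
    return total / len(data)


# Hypothetical randomized trial: actions assigned by a fair coin.
rng = random.Random(0)
data: List[Record] = []
for _ in range(5000):
    o1 = rng.uniform(-1.0, 1.0)
    a = rng.choice([-1, 1])
    y = 1.0 + a * o1 + rng.gauss(0.0, 0.5)   # made-up model: action 1 helps when o1 > 0
    data.append((o1, a, y))

tailored = ipw_value(data, lambda o: 1 if o > 0 else -1, lambda a, o: 0.5)
always_1 = ipw_value(data, lambda o: 1, lambda a, o: 0.5)
print(f"tailored rule: {tailored:.2f}, always action 1: {always_1:.2f}")
```

Because the indicator inside the average flips as the rule changes, the estimate moves in jumps rather than smoothly when the rule's parameters are varied.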


Prediction Interval for V(β̂)

Two problems

  • V(β) is not necessarily smooth in β.

  • We don’t know V so V must be estimated as well. Data set is small so overfitting is a problem.


Similar Problem in Classification

Misclassification rate for a given decision rule (classifier) sign(β^T O_1):

V(β) = P[ A ≠ sign(β^T O_1) ]

(A is the {-1,1} classification; O_1 is the observation; β^T O_1 is a linear classification boundary)


Jittering

is non-smooth.

Toy Example: The unknown Bayes classifier has a quadratic decision boundary. We fit, by least squares, a linear decision boundary:

f(o) = sign(β_0 + β_1 o)
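A small sketch of this kind of toy setup, under stated assumptions: the data-generating model (a label of +1 when o² > 0.5, with 10% label noise) and the sample of 30 points are invented for illustration. It fits the linear rule by least squares on the ±1 labels and then shows that the training error, as a function of the intercept, only ever takes values that are multiples of 1/30.

```python
import random


def fit_least_squares(xs, ys):
    """Closed-form simple linear regression of labels ys (in {-1, 1}) on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def classify(b0, b1, o):
    """f(o) = sign(b0 + b1 * o), with ties broken towards +1."""
    return 1 if b0 + b1 * o >= 0 else -1


def error_rate(b0, b1, xs, ys):
    return sum(classify(b0, b1, x) != y for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)
xs = [rng.uniform(-1.0, 2.0) for _ in range(30)]
# Hypothetical truth: quadratic boundary at o^2 = 0.5, with 10% label noise.
ys = [(1 if x * x > 0.5 else -1) * (1 if rng.random() > 0.1 else -1) for x in xs]

b0, b1 = fit_least_squares(xs, ys)
# The error rate is piecewise constant in b0: perturbing the intercept
# either leaves it unchanged or moves it by a multiple of 1/30.
rates = [error_rate(b0 + delta / 100.0, b1, xs, ys) for delta in range(-50, 51)]
print(sorted(set(rates)))
```

A step function like this has no useful derivative in the coefficients, which is why differentiability-based measures of confidence do not apply directly.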


Jittering of

[Figure: two panels, N=30 and N=100]


Simulation Example

  • Data Sets from the UCI repository

  • Use squared error loss to form classification rule

  • Sample 30 examples from each data set; for each sample construct prediction interval. Assess coverage using remaining examples.

  • Repeat 1000 times
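The protocol above can be sketched as follows. Several things here are assumptions rather than the talk's setup: the population is synthetic (not a UCI data set), there is a single feature, the interval is a naive percentile bootstrap of the training error swapped in for illustration, and the repetition and resample counts are kept small so the sketch runs quickly.

```python
import random


def fit(xs, ys):
    """Least squares fit of labels in {-1, 1} on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) or 1e-12   # guard against a constant feature
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1


def error(beta, xs, ys):
    b0, b1 = beta
    return sum((1 if b0 + b1 * x >= 0 else -1) != y for x, y in zip(xs, ys)) / len(xs)


def bootstrap_interval(xs, ys, rng, n_boot=100, level=0.95):
    """Naive percentile bootstrap: refit on each resample, record its error."""
    n, stats = len(xs), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [xs[i] for i in idx], [ys[i] for i in idx]
        stats.append(error(fit(bx, by), bx, by))
    stats.sort()
    lo = stats[int((1 - level) / 2 * (n_boot - 1))]
    hi = stats[int((1 + level) / 2 * (n_boot - 1))]
    return lo, hi


def coverage(pop_x, pop_y, rng, n_train=30, n_rep=100):
    """Fraction of repetitions whose interval contains the held-out error."""
    hits = 0
    indices = list(range(len(pop_x)))
    for _ in range(n_rep):
        rng.shuffle(indices)
        train, rest = indices[:n_train], indices[n_train:]
        tx, ty = [pop_x[i] for i in train], [pop_y[i] for i in train]
        lo, hi = bootstrap_interval(tx, ty, rng)
        held_out = error(fit(tx, ty), [pop_x[i] for i in rest], [pop_y[i] for i in rest])
        hits += lo <= held_out <= hi
    return hits / n_rep


rng = random.Random(0)
# Synthetic stand-in population: threshold labels with 20% label noise.
pop_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
pop_y = [(1 if x > 0 else -1) * (1 if rng.random() > 0.2 else -1) for x in pop_x]
print("estimated coverage:", coverage(pop_x, pop_y, rng))
```

The printed fraction is the quantity to compare against the nominal level; the same harness can be reused with any other interval construction by swapping out `bootstrap_interval`.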


“95% Prediction Intervals”

Confidence rate should be ≥ .95


Prediction Interval for V(β̂)

Our method obtains a prediction interval for a smooth upper bound on

is training error.


Prediction Interval for V(β̂)

where is the set of close to in terms of squared error loss. Form a percentile bootstrap interval for this smooth upper bound.

  • This method is generally too conservative


“95% Prediction Intervals”

Confidence rate should be ≥ .95


A Challenge!

Methods for constructing the policy (or classifier) and providing an evaluation of the policy (or classifier) must use the same small data set.

How might you better address this problem?


Discussion

  • Equivalent Actions: Need to know when the actions in a subset are equivalent, that is, when there is little or no evidence to contradict this equivalence.

  • Evaluating the usefulness of a particular variable in the learned policy.

  • Methods for producing composite rewards.

    • High quality elicitation of functionality

  • Feature construction for decision making in addition to prediction


This seminar can be found at:

http://www.stat.lsa.umich.edu/~samurphy/seminars/Benelearn08.ppt

Email me with questions or if you would like a copy:

[email protected]

