
Developing Dynamic Treatment Regimes for Chronic Disorders


Presentation Transcript


  1. Developing Dynamic Treatment Regimes for Chronic Disorders S.A. Murphy Univ. of Michigan RAND: August, 2005

  2. Goals • Today we review: • Four categories of methods for constructing dynamic treatment regimes using data • Generalization error

  3. Review • Definition of a dynamic treatment regime and the role of tailoring variables, decision options and decision rules • Using scientific theory, clinical experience and expert opinion to construct dynamic treatment regimes • Designing randomized experiments to inform the construction of dynamic treatment regimes

  4. Conceptual Formulation

  5. Four Categories of Methods for Constructing Dynamic Treatment Regimes Using Data (Secondary Analyses)

  6. k Decisions on one individual: $O_j$ = observation made prior to the jth decision point; $A_j$ = decision or action at the jth decision point; $\bar{O}_j = (O_1, \ldots, O_j)$ = present and past observations; $\bar{A}_j = (A_1, \ldots, A_j)$ = present and past decisions.

  7. k Decisions: $O_j$ = observation made prior to the jth decision point; $A_j$ = decision or "action" at the jth decision point; $R_j$ = "reward" following the jth decision point. Primary outcome: $Y = \sum_{j=1}^{k} R_j$.

  8. Goal: Construct decision rules that input data at each decision point and output a recommended decision; these decision rules should lead to a maximal mean Y. The dynamic treatment regime is the sequence of decision rules $(d_1, \ldots, d_k)$, where the jth rule $d_j$ maps the history $(\bar{O}_j, \bar{A}_{j-1})$ to a decision $A_j$.

  9. An example of a simple decision rule: alter treatment at time j if $O_j < c$ (in which case we offer a new treatment in the future); otherwise maintain on the current treatment.

  10. Nature is your best friend and tells you all you need to know! You know the conditional densities $p_j(o_j \mid \bar{o}_{j-1}, \bar{a}_{j-1})$, $j = 1, \ldots, k$, for all values of $(\bar{o}_{j-1}, \bar{a}_{j-1})$.

  11. Use Dynamic Programming (k = 2). Work backward: $Q_2(\bar{o}_2, \bar{a}_2) = E[Y \mid \bar{O}_2 = \bar{o}_2, \bar{A}_2 = \bar{a}_2]$, with $d_2^*(\bar{o}_2, a_1) = \arg\max_{a_2} Q_2(\bar{o}_2, a_1, a_2)$; then $Q_1(o_1, a_1) = E[\max_{a_2} Q_2(\bar{O}_2, a_1, a_2) \mid O_1 = o_1, A_1 = a_1]$, with $d_1^*(o_1) = \arg\max_{a_1} Q_1(o_1, a_1)$.

  12. Data (k = 2) on n subjects: $(O_1, A_1, O_2, A_2, Y)$, where $A_j$ is the decision/action at the jth decision point, assigned via randomization (a SMART design). Then the randomization probabilities $p_j(a_j \mid \bar{o}_j, \bar{a}_{j-1})$ are known by design.

  13. Data (k = 2) on n subjects. Suppose n is large, the observation space is small (e.g., a binary scalar), the decision space is small, and k is small. Use a nonparametric model for the conditional distributions of each $O_j$ given the past (e.g., empirical cell frequencies) and construct decision rules via dynamic programming.
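
The tabular case on slide 13 is easy to make concrete. Below is a minimal sketch, not from the talk, of dynamic programming over a simulated SMART dataset with binary observations and actions (k = 2); the data-generating model and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated SMART data: binary O1, O2 and randomized binary A1, A2 (illustrative).
O1 = rng.integers(0, 2, n)
A1 = rng.integers(0, 2, n)                      # randomized with probability 1/2
O2 = rng.binomial(1, 0.3 + 0.4 * (A1 == O1))    # O2 responds to matching A1 to O1
A2 = rng.integers(0, 2, n)                      # randomized with probability 1/2
Y = O2 + (A2 == O2) + rng.normal(0, 1, n)       # outcome favors matching A2 to O2

# Stage 2: nonparametric Q2 = E[Y | O1, A1, O2, A2] via cell means.
Q2 = np.zeros((2, 2, 2, 2))
for o1 in range(2):
    for a1 in range(2):
        for o2 in range(2):
            for a2 in range(2):
                cell = (O1 == o1) & (A1 == a1) & (O2 == o2) & (A2 == a2)
                Q2[o1, a1, o2, a2] = Y[cell].mean()

V2 = Q2.max(axis=3)          # value after the optimal second decision
d2 = Q2.argmax(axis=3)       # optimal second-stage decision rule

# Stage 1: Q1(o1, a1) = E[ V2(o1, a1, O2) | O1 = o1, A1 = a1 ].
Q1 = np.zeros((2, 2))
for o1 in range(2):
    for a1 in range(2):
        cell = (O1 == o1) & (A1 == a1)
        Q1[o1, a1] = V2[o1, a1, O2[cell]].mean()

d1 = Q1.argmax(axis=1)       # optimal first-stage decision rule
print("d1(o1):", d1, "  d2(o1,a1,o2):\n", d2)
```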

  14. Our Setting (k = 2): The number of subjects n is not large compared to the size of the observation space and decision space. We are forced to use function approximation (i.e., parametric or semiparametric models) to add information to the data (e.g., to effectively pool over subjects' data). The game is "What are you going to approximate?"

  15. Four Categories of Methods • Likelihood-based (Thall et al., 2000, 2002; POMDPs in reinforcement learning) • Q-learning (Watkins, 1989), a popular method from reinforcement learning: regression • A-learning (Murphy, 2003; Robins, 2004): regression on a mean-zero space • Weighting (Murphy et al., 2002; related to policy search in reinforcement learning): weighted mean

  16. Conceptual Formulation

  17. Q-Learning

  18. Q-Learning: Approximate the Q-functions $Q_1, Q_2$ using some approximation class (splines, linear regression, etc.). Fit by regression (least squares, penalized least squares).

  19. A Simple Version of Batch Q-Learning: Approximate the Q-functions by linear working models, e.g. $Q_j(\bar{O}_j, \bar{A}_j) \approx \alpha_j^\top f_j(\bar{O}_j, \bar{A}_{j-1}) + \left(\beta_j^\top g_j(\bar{O}_j, \bar{A}_{j-1})\right) A_j$, j = 1, 2, where $f_j, g_j$ are known feature vectors.

  20. Decision Rules: $\hat{d}_j(\bar{o}_j, \bar{a}_{j-1}) = \arg\max_{a_j} \hat{Q}_j(\bar{o}_j, \bar{a}_{j-1}, a_j)$, j = 1, 2.
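
The recipe on slides 18-20 can be sketched in a few lines. This is a minimal illustration, not the talk's code: linear working models for Q2 and then Q1 are fit by ordinary least squares on simulated two-stage data, and the decision rules take the argmax over the fitted treatment effect; the feature choices and the data-generating model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated two-stage data with randomized binary actions A1, A2 (illustrative).
O1 = rng.normal(size=n)
A1 = rng.integers(0, 2, n)
O2 = 0.5 * O1 + A1 + rng.normal(size=n)
A2 = rng.integers(0, 2, n)
Y = O2 + A2 * (O2 - 0.5) + rng.normal(size=n)    # A2 = 1 helps when O2 > 0.5

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 2 working model: Q2 = a + b*O2 + A2*(c + d*O2), fit to Y.
X2 = np.column_stack([np.ones(n), O2, A2, A2 * O2])
beta2 = ols(X2, Y)

# Plug in A2 = 0 and A2 = 1, take the max -> stage-2 value V2.
Q2_0 = beta2[0] + beta2[1] * O2
Q2_1 = Q2_0 + beta2[2] + beta2[3] * O2
V2 = np.maximum(Q2_0, Q2_1)

# Stage 1 working model: Q1 = a + b*O1 + A1*(c + d*O1), fit to V2.
X1 = np.column_stack([np.ones(n), O1, A1, A1 * O1])
beta1 = ols(X1, V2)

# Estimated decision rules: treat when the fitted advantage of A = 1 is positive.
d2 = lambda o2: (beta2[2] + beta2[3] * o2 > 0).astype(int)
d1 = lambda o1: (beta1[2] + beta1[3] * o1 > 0).astype(int)
print("d2 treats when O2 >", -beta2[2] / beta2[3])   # ~0.5 (fitted slope is positive)
```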

  21. Disadvantages of batch Q-learning that motivate A-learning: • (1) We are adding information that we do not need in order to construct the decision rules. Essentially this information assists in pooling subjects' data. If this information is false, then any decrease in variance achieved by using the information may be offset by bias. • (2) The unnecessary information may imply that implicit, unpleasant assumptions are made on the conditional distribution of each observation given the past.

  22. Disadvantages of batch Q-learning that motivate A-learning: • (3) Depending on the approximation class (e.g., the model) for the Q-functions, the model for the system dynamics may not be coherent; that is, it may not be possible for this model to be true: you might not be able to generate data with these models as the true Q-functions. • (4) Usually when we add information to the data, this information is related to our understanding of the causal structure. It turns out that in this case the unnecessary information concerns non-causal quantities.

  23. First point: Recall $Q_j = V_j + (Q_j - V_j)$, where $V_j(\bar{o}_j, \bar{a}_{j-1}) = \max_{a_j} Q_j(\bar{o}_j, \bar{a}_{j-1}, a_j)$ is the value and $Q_j - V_j$ is the advantage. You have modeled both the advantage and the value functions. You don't need to model the value.

  24. Second and third points: Recall you modeled $Q_2(\bar{o}_2, \bar{a}_2) = E[Y \mid \bar{O}_2 = \bar{o}_2, \bar{A}_2 = \bar{a}_2]$, and you have a model for $Q_1(o_1, a_1) = E[\max_{a_2} Q_2(\bar{O}_2, a_1, a_2) \mid O_1 = o_1, A_1 = a_1]$. The two models must cohere: the stage-1 model must equal the conditional expectation of the maximized stage-2 model, which a given approximation class may not permit.

  25. Fourth point: In general the weightings of the various observations in the Q-function are non-causal!! Why is this? Berkson's paradox: conditioning on a common effect of two independent causes induces a spurious association between them.
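
Berkson's paradox is easy to see in a simulation (an illustrative sketch, not part of the slides): two independent variables become correlated once we condition on their common effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

talent = rng.normal(size=n)          # two independent traits
looks = rng.normal(size=n)
famous = talent + looks > 1.5        # selection on the common effect

print(np.corrcoef(talent, looks)[0, 1])                  # ~ 0
print(np.corrcoef(talent[famous], looks[famous])[0, 1])  # clearly negative
```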

  26. Conceptual Formulation

  27. A-Learning

  28. A-Learning: Approximate/parameterize the advantages: $\mu_j(\bar{o}_j, \bar{a}_j) = Q_j(\bar{o}_j, \bar{a}_j) - \max_{a_j'} Q_j(\bar{o}_j, \bar{a}_{j-1}, a_j')$.

  29. A-Learning: Equivalently, we can approximate/parameterize the contrasts relative to a reference action, e.g. $C_j(\bar{o}_j, \bar{a}_j) = Q_j(\bar{o}_j, \bar{a}_{j-1}, a_j) - Q_j(\bar{o}_j, \bar{a}_{j-1}, 0)$. Why is this equivalent to approximating the advantages??? Because $C_j$ and the advantage $\mu_j$ differ only by a term that does not depend on $a_j$, so both are maximized by the same decision rule.

  30. A-Learning: The estimating equations will be in terms of the randomization probabilities $p_j(a_j \mid \bar{o}_j, \bar{a}_{j-1})$, which are known by design in a SMART.

  31. A Simple Version of Batch A-Learning: Approximate the contrasts by linear working models, e.g. $C_j(\bar{O}_j, \bar{A}_j) \approx \left(\psi_j^\top g_j(\bar{O}_j, \bar{A}_{j-1})\right) A_j$ for binary $A_j$, j = 1, 2. Then the estimating function is in terms of the centered actions $A_j - p_j(1 \mid \bar{O}_j, \bar{A}_{j-1})$ and the outcome residuals.

  32. A Simple Version of Batch A-Learning (continued): solve the resulting estimating equations, working backward from stage 2 to stage 1, for $\hat{\psi}_2$ and then $\hat{\psi}_1$.

  33. Constructed Decision Rules: $\hat{d}_j(\bar{o}_j, \bar{a}_{j-1}) = \arg\max_{a_j} \hat{C}_j(\bar{o}_j, \bar{a}_{j-1}, a_j)$, j = 1, 2.
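
The robustness property behind A-learning can be sketched for a single decision (illustrative, not the talk's exact estimating equations). Because the action is randomized, the centered regressor (A - p)g(O) has mean zero given O, so the advantage parameters psi are estimated consistently even when the nuisance model h(O) is badly misspecified; the data-generating model below is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20_000, 0.5

O = rng.normal(size=n)
A = rng.binomial(1, p, n)                        # randomized with known prob p
# Truth: advantage of A=1 vs A=0 is 1 - 2*O; nuisance mean is nonlinear in O.
Y = np.sin(3 * O) + A * (1 - 2 * O) + rng.normal(size=n)

# Working model: Y ~ h(O; b) + (A - p) * (psi0 + psi1 * O),
# with h deliberately misspecified as linear in O.
X = np.column_stack([np.ones(n), O, (A - p), (A - p) * O])
coef = np.linalg.lstsq(X, Y, rcond=None)[0]
psi = coef[2:]
print("psi estimates:", psi)         # close to (1, -2) despite the bad h

# Constructed decision rule: treat when the estimated advantage is positive.
d = lambda o: (psi[0] + psi[1] * o > 0).astype(int)
```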

  34. Disadvantages of batch Q-learning that motivated A-learning: • We are adding information that we do not need in order to construct the decision rules. Essentially this information assists in pooling subjects' data. The unnecessary information may imply that implicit, unpleasant assumptions are made on the conditional distribution of each observation given the past. • Depending on the approximation class (e.g., the model) for the Q-functions, the model for the system dynamics may not be coherent; that is, it may not be possible for this model to be true: you might not be able to generate data with these models as the true Q-functions.

  35. Telescoping decomposition of the conditional mean (k = 2): $E[Y \mid \bar{O}_2, \bar{A}_2] = E[Y] + \sum_{j=1}^{2} \left\{ E[Y \mid \bar{O}_j, \bar{A}_{j-1}] - E[Y \mid \bar{O}_{j-1}, \bar{A}_{j-1}] \right\} + \sum_{j=1}^{2} \left\{ E[Y \mid \bar{O}_j, \bar{A}_j] - E[Y \mid \bar{O}_j, \bar{A}_{j-1}] \right\}$. Only the second sum (the action terms) is needed to construct the decision rules.

  36. Disadvantages of batch Q-Learning that motivated A-Learning: • Usually when we add information to the data, this information is related to our understanding of the causal structure. It turns out that in this case, the unnecessary information involves non-causal quantities.

  37. Conceptual Formulation

  38. Disadvantages of batch A-learning that motivate Weighting: • We are adding information that we do not need in order to construct the decision rules. Essentially this information assists in pooling subjects' data. If this information is false, then any decrease in variance achieved by using the information may be offset by bias! • Except in very simple cases, we are implicitly (as opposed to explicitly) approximating/modeling the best decision rules. Often experts have good ideas about the form of the best decision rules.

  39. Disadvantages of batch A-learning that motivate Weighting: • Often there are constraints on the decision rules; we will want to find the best decision rules within the constrained space. These constraints may be: • Decision rules must make sense to clinicians and patients. • In impoverished environments, one may not have access to all of the observations collected in the experimental trial.

  40. Weighting: Consider the likelihood of the data: $\prod_{j=1}^{k} p_j(o_j \mid \bar{o}_{j-1}, \bar{a}_{j-1}) \prod_{j=1}^{k} \pi_j(a_j \mid \bar{o}_j, \bar{a}_{j-1})$, where the $\pi_j$ are the randomization probabilities. If the regime $d = (d_1, \ldots, d_k)$ is implemented, then the likelihood is $\prod_{j=1}^{k} p_j(o_j \mid \bar{o}_{j-1}, \bar{a}_{j-1}) \prod_{j=1}^{k} \mathbf{1}\!\left[a_j = d_j(\bar{o}_j, \bar{a}_{j-1})\right]$.

  41. Weighting: Given parameterized decision rules $d_j(\cdot\,; \theta)$, we'd like to choose θ to maximize the mean outcome under the regime, $E_{d_\theta}[Y]$.

  42. A Simple Version of Weighting: Maximize over θ the weighted mean $\frac{1}{n} \sum_{i=1}^{n} \left( \prod_{j=1}^{k} \frac{\mathbf{1}\!\left[A_{ij} = d_j(\bar{O}_{ij}, \bar{A}_{i,j-1}; \theta)\right]}{\pi_j(A_{ij} \mid \bar{O}_{ij}, \bar{A}_{i,j-1})} \right) Y_i$.
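
A minimal sketch of the weighted mean on slide 42 for the one-decision case (illustrative; the threshold parameterization of the rule and the data-generating model are assumptions): each subject whose randomized action agrees with the candidate rule is weighted by the inverse of its randomization probability, and θ is chosen to maximize the resulting weighted mean of Y.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 10_000, 0.5

O = rng.normal(size=n)
A = rng.binomial(1, p, n)                   # randomized; prob p known by design
Y = O + A * (O - 0.3) + rng.normal(size=n)  # treating helps when O > 0.3

def ipw_value(theta):
    """Weighted-mean estimate of E[Y] under the rule d(o) = 1{o > theta}."""
    d = (O > theta).astype(int)
    agree = (A == d)
    w = agree / np.where(A == 1, p, 1 - p)  # indicator / randomization prob
    return np.mean(w * Y)

# Policy search: maximize the weighted mean over a grid of thresholds.
grid = np.linspace(-2, 2, 401)
theta_hat = grid[np.argmax([ipw_value(t) for t in grid])]
print("estimated threshold:", theta_hat)    # near the true 0.3
```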

  43. Discussion • The four classes of methods make tradeoffs. • The classes differ in the size of the model space (parameter space): the more information you add via modeling assumptions, the smaller the model space and the lower the variance in estimation, and the greater the potential for bias due to the use of incorrect information. • What are we talking about? • Bias of what?! • Variance of what?! • Do we really want/need standard errors for the estimators of the β's, γ's, θ's??!

  44. Generalization Error

  45. Generalization Error • The use of function approximation combined with the likelihood-based, Q-learning, and A-learning methods implicitly constrains the space of decision rules. • Yet do these methods try to find the best decision rules within the constrained space? No. • What is the goal here? • We want to be able to characterize the generalization error (mean, variance, confidence interval, error bound, etc.).

  46. One decision only. Data: $(O, A, Y)$ on n subjects; $A$ is randomized with probability $\pi(a \mid O)$.

  47. Goal: Find the decision rules that are best (maximize mean Y) within a restricted class of decision rules, e.g., threshold rules of the form $d_\theta(o) = \mathbf{1}\!\left[\theta_0 + \theta_1^\top o > 0\right]$.

  48. The estimand is $\theta^* = \arg\max_\theta E_{d_\theta}[Y]$, or equivalently $\theta^* = \arg\max_\theta E\!\left[ \frac{\mathbf{1}[A = d_\theta(O)]}{\pi(A \mid O)}\, Y \right]$.

  49. The generalization error is $\max_\theta E_{d_\theta}[Y] - E_{d_{\hat\theta}}[Y]$, the shortfall of the estimated rule relative to the best rule in the restricted class; it is random through $\hat\theta$. The bias is the mean of the generalization error over the distribution of the training data.
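
One way to see the generalization error and its mean (an illustrative simulation under the same kind of single-decision setup as above; the data-generating model is an assumption, not the talk's example): repeatedly fit θ on a small trial by maximizing the weighted mean, compute the true value of the fitted rule by Monte Carlo, and subtract it from the best value attainable in the class.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 0.5

def true_value(theta, m=200_000):
    """Monte Carlo value E[Y] when everyone follows d(o) = 1{o > theta}."""
    O = rng.normal(size=m)
    A = (O > theta).astype(int)
    return np.mean(O + A * (O - 0.3))       # same illustrative outcome model

def fit_theta(n=500):
    """Estimate theta on a trial of size n by maximizing the IPW weighted mean."""
    O = rng.normal(size=n)
    A = rng.binomial(1, p, n)
    Y = O + A * (O - 0.3) + rng.normal(size=n)
    grid = np.linspace(-2, 2, 201)
    vals = [np.mean((A == (O > t)) / np.where(A == 1, p, 1 - p) * Y) for t in grid]
    return grid[np.argmax(vals)]

best = true_value(0.3)                      # best value in the restricted class
gen_errors = [best - true_value(fit_theta()) for _ in range(50)]
print("mean generalization error (bias):", np.mean(gen_errors))
```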
