
Percentile-Finding and the Sorcerer's Stone? Combining Up-and-Down and Bayesian Designs





Presentation Transcript


    1. Percentile-Finding and the Sorcerer’s Stone? Combining Up-and-Down and Bayesian Designs Assaf Oron and Peter Hoff Statistics Dept., University of Washington, Seattle assaf@u.washington.edu


    3. Percentile Finding: The Problem Binary Response Experiments (‘yes’ or ‘no’) Positive response probability increases with increasing treatment (x) Sensory experiments, toxicity studies, material stress failure studies, etc. Thresholds assumed to have a (sub-)CDF F(x)

    4. Percentile Finding: The Problem Goal: find the treatment that would give a fixed probability p of positive response, i.e., a percentile of F: Qp = F⁻¹(p), a.k.a. the target (in this talk we use p=0.3, encountered in Phase I clinical trials) Constraints: a fixed discrete set of treatments; small to moderate sample size (n < 10 to n ≈ 100)
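As a concrete illustration of the target Qp = F⁻¹(p) and the discrete-grid constraint, here is a minimal sketch assuming a logistic threshold CDF; the logistic form, the treatment grid, and all parameter values are illustrative assumptions, not taken from the talk:

```python
import math

def logistic_cdf(x, mu=0.0, s=1.0):
    """Illustrative threshold CDF F(x); in a real experiment F is unknown."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))

def target_qp(p, mu=0.0, s=1.0):
    """Exact percentile Qp = F^{-1}(p) for the logistic example."""
    return mu + s * math.log(p / (1.0 - p))

def nearest_level(q, levels):
    """Snap the continuous target onto the fixed discrete treatment grid."""
    return min(levels, key=lambda x: abs(x - q))

levels = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical treatment grid
q30 = target_qp(0.3)                  # target for p = 0.3
```

With p = 0.3 this logistic target is log(0.3/0.7) ≈ −0.85, which snaps to the grid level −1.0; the design can only ever allocate on the grid, never at Qp itself.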

    5. Two Sequential Sampling Approaches

    6. Method Basics Up-and-Down (Dixon and Mood, 1948) Ubiquitous in applied research (psychophysics, engineering, life sciences, medicine, …) Generates a Markov chain whose stationary distribution π is peaked around the target (Tsutakawa, 1967) Bayesian (QUEST, Watson and Pelli, 1983; CRM, O'Quigley et al., 1990) CRM increasingly popular in Phase I clinical trials Aims to 'zoom' perfectly onto the level closest to target
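The classic Dixon–Mood rule targets the median (p = 0.5); for p = 0.3 a biased-coin up-and-down variant (in the spirit of Durham and Flournoy) is one option. A minimal simulation sketch, where `f_true` stands for the true response curve, known only in simulation, and all names are illustrative:

```python
import random

def biased_coin_ud(f_true, levels, n, p=0.3, start=0, seed=1):
    """Biased-coin up-and-down targeting F^{-1}(p) for p < 0.5:
    step down after a positive response; after a negative response,
    step up with probability p/(1-p), else stay put.
    f_true(x) gives the true response probability at treatment x."""
    rng = random.Random(seed)
    i = start
    path = []
    for _ in range(n):
        y = rng.random() < f_true(levels[i])   # simulate binary response
        path.append((levels[i], int(y)))
        if y:
            i = max(i - 1, 0)                  # positive response: step down
        elif rng.random() < p / (1.0 - p):
            i = min(i + 1, len(levels) - 1)    # biased coin says step up
        # otherwise stay at the current level
    return path
```

Because each move depends only on the current level and response, the allocations form a Markov chain on the treatment grid, whose stationary distribution peaks near the target, matching the Tsutakawa (1967) result cited above.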

    7. Convergence Comparison

    8. Convergence Comparison

    9. U&D Convergence Limitations

    10. Robustness Comparison

    11. Robustness Comparison

    12. Sorcerer's Stone: Bayesian Quick Gambling Now we reach the magic connection of this talk. Bayesian designs tend to create the impression that they have some magical knowledge of where the target is. Very typically – not just in simulation, but also in experiments – the design locks onto a single level and gambles on it as the correct one. The chart shows 7 distribution scenarios – never mind their names for now – and, for each, how often the Bayesian design allocated 12 or more of the first 18 trials to the same single level. In blue: how often it got it right; in red: how often it got it wrong. What happens when it gets it wrong? We have wasted a good part of the experiment gathering information in the wrong place, and now have to dig ourselves out of the hole. Why does this happen? Bayesian designs essentially gamble that any 'unlucky' sequence, if it happens, will happen late in the experiment. If the gamble fails and an excursion is observed early, the experiment starts with very poor point estimates of F, which feed into the model and throw it off target. It then takes quite a while for these estimates to correct themselves.
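The lock-on behavior described above can be reproduced with a toy one-parameter CRM. This sketch assumes a power model pᵢ = qᵢ^exp(a) with a N(0, 2²) prior on a and a brute-force grid posterior; the skeleton, prior, and function names are illustrative choices, not the talk's actual model:

```python
import math

def crm_next_level(doses_tried, responses, skeleton, target=0.3):
    """Toy one-parameter CRM: power model p_i = q_i ** exp(a),
    N(0, 2^2) prior on a, posterior evaluated on a grid over a.
    Returns the index of the level whose posterior-mean response
    probability is closest to the target."""
    grid = [-3.0 + 6.0 * k / 200 for k in range(201)]
    def loglik(a):
        ll = 0.0
        for i, y in zip(doses_tried, responses):
            p = skeleton[i] ** math.exp(a)
            ll += math.log(p) if y else math.log(1.0 - p)
        return ll
    logpost = [loglik(a) - a * a / 8.0 for a in grid]  # log prior: -a^2/(2*2^2)
    m = max(logpost)
    w = [math.exp(lp - m) for lp in logpost]           # unnormalized posterior weights
    z = sum(w)
    pbar = [sum(wi * q ** math.exp(a) for wi, a in zip(w, grid)) / z
            for q in skeleton]                         # posterior-mean p at each level
    return min(range(len(skeleton)), key=lambda i: abs(pbar[i] - target))
```

Because the model has a single parameter, even one or two early responses swing the whole fitted curve, pulling the recommended level sharply in one direction, which is exactly the 'gambling' behavior described above.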

    13. Bayesian Up-and-Down (BUD) We run a U&D chain, but calculate the Bayesian model at each step: if the Bayesian allocation is closer to target with 100(1−β)% posterior credibility, we allow the Bayesian allocation to override U&D
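The override rule can be sketched as follows, assuming the Bayesian model delivers posterior draws of the target Qp (how that posterior is computed is left open here; the function and argument names are hypothetical):

```python
def bud_next_level(ud_level, bayes_level, qp_samples, beta=0.2):
    """BUD override (sketch): follow the U&D move unless the Bayesian
    allocation is closer to the target with >= 100*(1-beta)% posterior
    credibility, estimated from posterior draws of Qp."""
    closer = sum(abs(bayes_level - q) < abs(ud_level - q) for q in qp_samples)
    if closer / len(qp_samples) >= 1.0 - beta:
        return bayes_level   # Bayesian override
    return ud_level          # stick with the up-and-down chain
```

Smaller β makes the override rarer, approaching pure U&D, while β = 0.5 lets a bare posterior majority trigger it, approaching the pure Bayesian design.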

    14. BUD Credibility: How it Works (1)

    15. BUD Credibility: How it Works (2)

    16. BUD: More about β β=0.5 is 'pure' Bayesian (Bayesian override guaranteed when using median-based posterior allocation) β=0 is 'pure' U&D (override never happens) β=0.15 to 0.25 seems to work reasonably well Notes: In toxicity-averse applications one can use different β values for 'up' and 'down' moves, in order to limit toxic responses (the target remains unchanged) β is roughly analogous to frequentist Type II error risk

    17. BUD Estimation Performance

    18. Conclusions: There’s No Sorcerer’s Stone Either way, this is a small-n, discrete-level, censored sampling of thresholds There’s a limit on how well we can expect to do; and we are quite at risk of ‘meltdown’ ‘Unlucky’ sequences are common and can be devastating Current long-memory designs do not address this risk BUD offers a way to reduce our exposure, while still improving allocation sharpness with time

    19. Acknowledgements Original Motivation for Studying U&D Michael J. Souter, M.D., Harborview, Seattle Ph.D. Committee at UW Peter Hoff, Margaret Pepe, Paul Sampson, Barry Storer, Jon Wellner Discussion and Information Malachi Columb, Nancy Flournoy, Mauro Gasparini, Miguel A. García-Pérez, Mizrak Gezmu, Mario Stylianou Help with this Talk Veronica Berrocal, Qunhua Li, Gail Potter

    20. References (chronologically ordered) Dixon and Mood: JASA 43 (1948), 109-126 Wetherill et al.: Biometrika 53 (1966), 439-454 Tsutakawa: JASA 62 (1967), 842-856 Watson and Pelli: Percept. Psychophys. 33 (1983), 113-120 O'Quigley et al.: Biometrics 46 (1990), 33-48 Goodman et al.: Stat. Med. 14 (1995), 1149-1161 Shen and O'Quigley: Biometrika 83 (1996), 395-405 Babb et al.: Stat. Med. 17 (1998), 1103-1120 Stylianou and Flournoy: Biometrics 58 (2002), 171-177 Cheung and Chappell: Biometrics 58 (2002), 671-674 Oron: Ph.D. Dissertation, forthcoming (Fall 2007)
