Who Needs Bayesian Phase I Trials?

Rick Chappell
Professor, Departments of Statistics and of Biostatistics and Medical Informatics
University of Wisconsin Madison
chappell@stat.wisc.edu
2010 Joint Statistical Meetings; Vancouver

My thanks to:
• The International Indian Statistical Association, Biometrics Section and Professor Debajyoti Sinha, Florida State University, for inviting me to speak;
• An unnamed CMO of an unnamed pharmaceutical company;
• The audience.

Outline
• Goal of Phase I Trials in Cancer & Other Serious Diseases
• How we need Bayesian Phase I Trials
• Options for Dose Escalation
• Example, with Design and Simulations
• Conclusions

I. Goal of Phase I Trials in Cancer & Other Serious Diseases
• Goal first delineated by Schneiderman (1967) as estimation of the dose yielding a fixed percent (often 33%) of dose-limiting toxicities (DLTs) but not lower. This is called the Maximum Tolerable Dose (MTD).
• Phase I trials are usually outcome-adaptive.

Phases of a Clinical Trial
• Preclinical - biochemical and animal studies.
• Phase I, Cancer - estimates toxicity rates using a few, typically very sick, subjects.
• Phase II - determines a therapy’s potential efficacy, usually using a few very sick patients.
• Phase III - a large randomized controlled, possibly blinded, possibly multicenter, experiment.
• Phase IV - a controlled trial of an approved treatment with long-term follow-up of safety and efficacy.

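Schematic of Phase I Trial
[Figure: dose-toxicity curve. Vertical axis: % Toxicity (0 to 100); horizontal axis: Dose (d1, d2, ...). The MTD is marked as the dose at which the curve reaches 33% toxicity.]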

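Allowable DLT rate can be lower than 33% but rarely higher
[Figure: the same dose-toxicity curve. Vertical axis: % Toxicity (0 to 100); horizontal axis: Dose (d1, d2, ...). The MTD is marked as the dose at which the curve reaches 20% toxicity.]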

How We Need Bayesian Phase I Trials

II. How we need Bayesian Phase I Trials
A. In the Design
- How will we set the doses? Guess at a series of levels starting at a low toxicity rate and moving to higher rates.
- Isn’t this a prior?
- Formal prior elicitation methods can be very helpful.
B. In Dose Escalation
C. In the Analysis
- The data from a single phase I trial usually provide only a little information.
- Information is usually available from other sources.
- Model-based smoothing (e.g., for monotonicity) is often useful.
- Computation using conjugate priors is simple.
- Multiple analyses can (should) be done.
- Storer (1989) recommended that analysis be separated from design.
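To illustrate the point that computation with conjugate priors is simple, here is a minimal sketch of a Beta-binomial update for the DLT rate at a single dose. The prior Beta(1, 3) and the data (1 DLT in 6 patients) are hypothetical values chosen only for illustration; they do not come from the talk.

```python
def beta_posterior(a, b, dlts, n):
    """Conjugate update: Beta(a, b) prior plus `dlts` DLTs among `n` patients."""
    if not 0 <= dlts <= n:
        raise ValueError("dlts must lie between 0 and n")
    # Toxicities add to the first parameter, non-toxicities to the second.
    return a + dlts, b + (n - dlts)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior with mean 0.25, carrying the weight of about 4 patients.
prior = (1.0, 3.0)
# Hypothetical data at one dose level: 1 DLT among 6 patients.
posterior = beta_posterior(*prior, dlts=1, n=6)
print(posterior, beta_mean(*posterior))
```

The whole analysis is two additions per dose level, which is why several alternative priors can be tried cheaply as a sensitivity analysis.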

III. Options for Dose Escalation
A. Algorithmic designs (Shih & Lin, 2006, in Chevret’s text).
- These could be defined as designs where (de-)escalation depends only on outcomes in the most recent groups of patients.
- However, some “permanent” changes, such as adjustments in cohort sizes, have been included in “algorithms”.
- The most intuitive definition is “A (de-)escalation system which a human being can keep in his head.”
B. Bayesian and other model-based designs.

A. Simple Algorithmic “Up-and-Down” Designs
• Usually small cohorts of patients (1 to 5);
• Dose levels increased and/or decreased until a predetermined sample size is reached;
• Designed to estimate the MTD as the 33%-ile with cohorts of size 3, the 20%-ile with cohorts of size 5, etc.;
• Not flexible (can also spend a lot of patients at low-toxicity doses), but can easily be modified to speed up via adaptive cohort sizes;
• Other sources of information can be used.

Storer’s (1989) phase I up-and-down classification
Design A (“Traditional”): Groups of three patients are treated. Escalation occurs if no toxicity is observed in any of the three; otherwise, an additional three patients are treated at the same dose level. If only one of the six has toxicity, escalation continues; otherwise, the trial stops.
Design B: Single patients are treated. The next patient is treated at the next lower dose level if a toxic response is observed, otherwise at the next higher dose level, until the sample size is reached.
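The Design A rule as worded above can be sketched as a small decision function. The function names are hypothetical, and the escalation probability assumes independent patients who share one true DLT probability p at the dose, which is an illustrative assumption rather than something stated in the slides.

```python
def design_a_decision(dlts_first3, dlts_second3=None):
    """Decision at one dose level under Design A as described in the text.

    Returns 'escalate', 'expand' (treat three more at the same dose) or 'stop'.
    """
    if dlts_first3 == 0:
        return "escalate"
    if dlts_second3 is None:
        # Any toxicity among the first three: treat three more at this dose.
        return "expand"
    # Six patients treated: escalate only if exactly one of six had toxicity.
    return "escalate" if dlts_first3 + dlts_second3 == 1 else "stop"


def prob_escalate(p):
    """Chance of escalating past a dose whose true DLT probability is p."""
    q = 1.0 - p
    none_of_three = q ** 3
    one_of_three_then_none = 3 * p * q ** 2 * q ** 3
    return none_of_three + one_of_three_then_none


print(design_a_decision(1, 0), prob_escalate(1 / 3))
```

Evaluating prob_escalate over a grid of p values gives a quick, simulation-free view of one operating characteristic of the design.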

Design D: Groups of three patients are treated. Escalate if no toxicity is seen and de-escalate if more than one patient has toxicity. If one patient has toxicity, the next group of three is treated at the same dose level; repeat until the sample size is reached. Similar to the traditional design except that the dose can go down.
(Design C: Similar to Design D, except that the rule applies to the preceding three patients at any point, instead of using discrete batches of three.)
- Larger cohorts can be used to estimate the MTD as the dose with a 1/4 = 25%, 1/5 = 20% or 1/6 = 17% toxicity rate.
- Designs can be combined, such as “BD”.
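The step rules for Designs B and D can be sketched as follows. Dose levels are indexed from 0, and holding the dose at the lowest or highest available level when the rule would move past it is an assumption added here; the slides do not say what happens at the boundaries.

```python
def _clamp(level, n_levels):
    """Keep a dose-level index inside 0 .. n_levels - 1 (boundary rule is assumed)."""
    return min(max(level, 0), n_levels - 1)


def next_level_design_b(level, toxic, n_levels):
    """Design B: one patient at a time; down after a toxicity, otherwise up."""
    return _clamp(level - 1 if toxic else level + 1, n_levels)


def next_level_design_d(level, dlts_in_cohort, n_levels):
    """Design D: cohorts of three; up on 0 DLTs, stay on 1, down on 2 or more."""
    if dlts_in_cohort == 0:
        step = 1
    elif dlts_in_cohort == 1:
        step = 0
    else:
        step = -1
    return _clamp(level + step, n_levels)


# A short hypothetical Design D path over 5 dose levels.
level = 0
for dlts in [0, 0, 1, 2]:
    level = next_level_design_d(level, dlts, n_levels=5)
print(level)
```

A combined design such as “BD” would run the Design B rule first and switch to the Design D rule at some trigger; that switch is not sketched here because the slides do not specify it.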

B. Bayesian Designs - The Continual Reassessment Method (CRM; O’Quigley, et al., 1990)
1. A prior guess is made as to the dose-response (toxicity) curve;
2. The first patient is assigned to the prior MTD;
3. After the patient is fully followed, his or her outcome is used to update the prior curve;
4. The next patient is assigned to the new “posterior” MTD;
• Repeated until the sample size is reached.
• The final posterior MTD is the estimate.
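The CRM steps above can be sketched with a one-parameter working model. Everything specific here is an assumption made for illustration: the power model p_i(a) = skeleton_i ** exp(a), the normal prior on a, the grid approximation to the posterior, the skeleton values, and the 0.33 target. The slides do not specify a working model.

```python
import math


def crm_next_dose(skeleton, doses_given, toxicities,
                  target=0.33, prior_sd=1.0, grid_size=401):
    """Return (index of dose closest to target, posterior mean DLT rates).

    skeleton    : prior guesses of the DLT rate at each dose, strictly in (0, 1)
    doses_given : dose index given to each patient so far
    toxicities  : matching outcomes, True for a DLT
    """
    grid = [-4.0 + 8.0 * k / (grid_size - 1) for k in range(grid_size)]
    weights = []
    for a in grid:
        log_w = -0.5 * (a / prior_sd) ** 2          # normal prior on a
        for d, y in zip(doses_given, toxicities):
            p = skeleton[d] ** math.exp(a)          # assumed power model
            log_w += math.log(p) if y else math.log1p(-p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    post_tox = [
        sum(w * s ** math.exp(a) for a, w in zip(grid, weights)) / total
        for s in skeleton
    ]
    best = min(range(len(skeleton)), key=lambda i: abs(post_tox[i] - target))
    return best, post_tox


skeleton = [0.05, 0.10, 0.20, 0.33, 0.50]           # hypothetical prior curve
dose_if_tox, curve_if_tox = crm_next_dose(skeleton, [3], [True])
dose_if_ok, curve_if_ok = crm_next_dose(skeleton, [3], [False])
print(dose_if_tox, dose_if_ok)
```

Running the function after each patient, and printing the recommended dose for every possible next outcome, is one way to show clinicians how the "black box" would behave before the trial starts.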

C. Operating Characteristics are Paramount
1. “Operating Characteristics”: How a design escalates the dose and changes cohort size, and how it stops.
2. Clinicians want to know how treatment decisions are made - they are fearful of the black box.
3. Operating Characteristics of Bayesian / other model-based designs can be described by example and simulation.
4. One cannot rely on clinicians' priors to produce [Bayesian] designs with acceptable operating characteristics. In practice, the statistician must at least specify prior uncertainty.

“4. One cannot rely on clinicians' priors to produce [Bayesian] designs with acceptable operating characteristics. In practice, the statistician must at least specify prior uncertainty.”
Sometimes this requirement leads to absurdities. Ji, Li, and Bekele (2007) proposed a prior for a Bayesian method “that can be easily understood and implemented by nonstatisticians.” It is simple, and leads to a practical and sensible algorithm. But the recommended prior was Beta(0.005, 0.005), a U-shaped density that puts nearly all of its mass next to toxicity probabilities of 0 and 1.
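One way to see what a Beta(0.005, 0.005) prior implies is to write out the conjugate update for the DLT probability p at a dose after x DLTs in n patients. The worked numbers (1 DLT in 3 patients) are hypothetical and chosen only for illustration.

```latex
\begin{align*}
p &\sim \mathrm{Beta}(0.005,\ 0.005), \qquad x \mid p \sim \mathrm{Binomial}(n, p) \\
p \mid x &\sim \mathrm{Beta}(0.005 + x,\ 0.005 + n - x) \\
\mathrm{E}[p \mid x] &= \frac{0.005 + x}{0.01 + n}
  \quad\text{e.g. } x = 1,\ n = 3:\ \frac{1.005}{3.01} \approx 0.334
\end{align*}
```

The prior carries the weight of 0.01 patients, so the posterior mean is almost exactly the observed proportion x/n. That explains why the resulting algorithm is sensible even though the prior itself describes no clinician's actual belief.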

D. How ethical operating characteristics subvert current Bayesian designs
1. The “Continual Reassessment Method” [CRM; O’Quigley et al., 1990] attempts to assign subjects to the MTD and so efficiently estimate it.
2. The CRM does not directly address the following ethical/logistical issues in Phase I trials:
a) The need for a starting dose lower than the prior MTD;
b) The need for a maximum increase in dose per cohort;
c) A desire to let cohort sizes vary;
d) A desire to allow data on sub-dose-limiting toxicities to inform escalation & cohort size choices.
3. The CRM can be modified; or one can incorporate elements of Escalation with Overdose Control (EWOC), which constrains predicted overdose rates; or we can combine the two.

3. The CRM can be modified (cont.)
The most common way to modify the CRM is to overlay algorithms (for initial dose, maximum increase, etc.) on top of it. But this also has disadvantages.
Cheung (2005) defined coherence as “An escalation for a new patient is said to be coherent only when the previous patient does not show sign of toxicity. Likewise, a de-escalation is coherent only when a toxic outcome has just been seen.”
He shows the CRM, EWOC, & all unmodified likelihood-based methods to be coherent. But many of the simple common modifications are not.
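Cheung's definition quoted above can be checked mechanically on any simulated or real sequence of single-patient assignments. The sketch below is one reading of that definition; the function name and the example trial path are hypothetical.

```python
def incoherent_moves(levels, toxic):
    """Return indices of patients whose dose assignment was incoherent.

    levels[i] is the dose level given to patient i and toxic[i] is True if
    that patient had a DLT. A move is incoherent if it escalates right after
    a toxicity or de-escalates right after a non-toxic outcome.
    """
    flagged = []
    for i in range(1, len(levels)):
        escalated = levels[i] > levels[i - 1]
        deescalated = levels[i] < levels[i - 1]
        if (escalated and toxic[i - 1]) or (deescalated and not toxic[i - 1]):
            flagged.append(i)
    return flagged


# Hypothetical path: patient 4 is escalated although patient 3 had a DLT.
levels = [0, 1, 2, 1, 2]
toxic = [False, False, True, True, False]
print(incoherent_moves(levels, toxic))
```

Running such a check over simulated trials from a modified CRM shows how often an overlay rule produces moves that a clinician would find hard to justify.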

A Practical, Easy, Understandable, and Scientifically Sound Algorithmic Design for Phase I Trials

IV. Example, with Design and Simulations
A. Design Considerations
1. Agent is a radiopharmaceutical available in multiples of 6.25;
2. Concern centers on toxicity to bone marrow and also kidney, bladder, and liver;
3. FDA requires that the starting dose be 12.5 (units unspecified);
4. Investigators consider this to be very low and want rapid initial escalation;
5. Information is available on “sub-dose-limiting” toxicities - e.g., 999 neutrophils/mm³ is grade 3 neutropenia but 1,001 is grade 2.

4. Investigators consider a 12.5 starting dose to be very low and want rapid initial escalation;
5. Information is available on “sub-dose-limiting” toxicities - e.g., 999 neutrophils/mm³ is grade 3 neutropenia but 1,001 is grade 2.
How to address these requirements?
• Escalate if 0 DLTs in a cohort;
• De-escalate if ≥ 2 DLTs in a cohort;
• Start with small cohorts (size 2), escalate dose in increments of 12.5, maintain until higher risk suspected;
• Permanently expand cohorts to size 5 when excessive risk seen, defined as ≥ 1 DLT and/or 2 sub-DLTs;
• Permanently halve dose increment to 6.25 when ≥ 2 DLTs and/or ≥ 3 sub-DLTs seen;
• Maintain dose if 1 DLT in a cohort;
• Stop when ≥ 20 subjects reached (max 24);
• “Highest dose with ≤ 20% DLTs recommended for phase II”.
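The rules above are simple enough to write as a single cohort-by-cohort update. The sketch below fills several gaps with assumptions that the slides do not settle: the DLT and sub-DLT thresholds are applied to cumulative counts across the trial, "2 sub-DLTs" is read as 2 or more, the cohort-size and increment changes take effect before the dose moves, and the dose never drops below the 12.5 starting dose. The class and function names are hypothetical.

```python
from dataclasses import dataclass

MIN_DOSE = 12.5  # FDA-required starting dose; using it as a floor is an assumption


@dataclass
class TrialState:
    dose: float = 12.5
    cohort_size: int = 2
    increment: float = 12.5
    total_dlts: int = 0
    total_sub_dlts: int = 0
    n_treated: int = 0
    stopped: bool = False


def update(state, dlts, sub_dlts):
    """Apply the outcomes of the cohort just treated and set up the next one."""
    state.n_treated += state.cohort_size
    state.total_dlts += dlts
    state.total_sub_dlts += sub_dlts
    # Permanent changes, triggered here by cumulative counts (assumed).
    if state.total_dlts >= 1 or state.total_sub_dlts >= 2:
        state.cohort_size = 5
    if state.total_dlts >= 2 or state.total_sub_dlts >= 3:
        state.increment = 6.25
    # Dose move depends only on DLTs in the cohort just treated.
    if dlts == 0:
        state.dose += state.increment
    elif dlts >= 2:
        state.dose = max(state.dose - state.increment, MIN_DOSE)
    # Exactly 1 DLT: maintain the dose.
    if state.n_treated >= 20:
        state.stopped = True
    return state


state = TrialState()
for dlts, sub_dlts in [(0, 0), (1, 0), (2, 0)]:   # hypothetical cohort outcomes
    update(state, dlts, sub_dlts)
print(state)
```

Under these assumptions the largest possible trial reaches 19 subjects and then treats one more cohort of 5, which matches the stated maximum of 24.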

B. Designs under investigation

B. (cont.) Other designs considered
1. 3+3, up-and-down until ≥ 20 patients reached: Slow, focuses on too-high doses, ignores sub-DLTs.
2. 5+5, up-and-down until ≥ 20 patients reached: Very slow, ignores sub-DLTs.
3. Like Design 3 but cohort size expands 2 → 5 with a single DLT instead of two: Too slow, stays at low doses, doesn't identify phase II dose well.
4. Like Design 3 but cohort size expands with a single DLT instead of 2 and continues to accrue subjects until ≥ 2 toxicities are observed in a single cohort: Can have very large sample size and long duration.

C. Simulations
1. Simulations compared:
Design 1: Traditional 3+3, escalation only;
Design 2: “Traditional 5+5”, escalation only;
Design 3: Up-and-down, cohort size adaptive, dose increment size adaptive.
2. Two scenarios used, High Toxicity and Low Toxicity.
3. Comparisons made in terms of Accuracy, Toxicity, Sample size, and Duration.
4. 10,000 simulations in each case; all figures are accurate to the number of digits reported.
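For results that are proportions (for example, the fraction of simulated trials recommending a given dose), the precision from N = 10,000 independent runs can be bounded with the usual binomial standard error. This is a general Monte Carlo calculation, not one taken from the slides.

```latex
\[
\mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{N}}
\;\le\; \frac{0.5}{\sqrt{10\,000}} = 0.005
\]
```

So an estimated proportion has a standard error of at most half a percentage point, and less when the true proportion is far from 50%.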

Simulation: High-Toxicity Scenario

Simulation Results: High-Toxicity Scenario Precision

Simulation Results: High-Toxicity Scenario Toxicities “Sub-therapeutic”: at least 12.5 below true phase II dose

Simulation Results: High-Toxicity Scenario Sample Size / Duration

Simulation: Low-Toxicity Scenario

Simulation Results: Low-Toxicity Scenario Precision

Simulation Results: Low-Toxicity Scenario Toxicities “Sub-therapeutic”: at least 12.5 below true phase II dose

Simulation Results: Low-Toxicity Scenario Sample Size / Duration

V. Conclusions:
• A very simple algorithmic phase I design was presented which had good operating characteristics and produced good estimates of the phase II dose.
• The protocol of which this design formed a part was recently approved by the FDA with no comments whatsoever.
• The Chief Medical Officer with authority over the trial was satisfied: “We'd be delighted for you to present the statistics publically. I hope that the title is something along the line of "An incredibly brilliant and clinically perfect clinical trial design".....!”

An Incredibly Brilliant and Clinically Perfect Clinical Trial Design