**The changing landscape of interim analyses for efficacy / futility** Marc Buyse, ScD, IDDI, Louvain-la-Neuve, Belgium, marc.buyse@iddi.com. Massachusetts Biotechnology Council, Cambridge, Mass., June 2, 2009

**Reasons for Interim Analyses** • Early stopping for safety, extreme efficacy, or futility • Adaptation of design based on observed data, to play the winner / drop the loser or to maintain power • Any adaptation, for whatever reason and whether or not data-derived, whilst controlling for α

**Methods for Interim Analyses** • Multi-stage designs / seamless transition designs • Group-sequential designs • Stochastic curtailment • Sample size adjustments • Adaptive (« flexible ») designs

**Early Stopping** • Helsinki Declaration: “Physician should cease any investigation if the hazards are found to outweigh the potential benefits.” (« Primum non nocere ») • Trials with serious, irreversible endpoints should be stopped if one treatment is “proven” to be superior, and such potential stopping should be formally pre-specified in the trial design.

**The Cost of Delay** « Blockbusters » reach sales > 500 M$ a year (> 1 M$ a day)

**Fixed Sample Size Trials…** 1 – the sample size is calculated to detect a given difference at given significance and power; 2 – the required number of patients is accrued; 3 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified number of events

**…vs (Group) Sequential Trials…** 1 – the sample size is calculated to detect a given difference at given significance and power; 2 – patients are accrued until a pre-planned interim analysis of patient outcomes takes place; 3a – the trial is terminated early, or 3b – the trial continues unchanged; 4 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified number of events

**…vs Adaptive Trials** 1 – the sample size is calculated to detect a given difference at given significance and power; 2 – patients are accrued until a pre-planned interim analysis of patient outcomes takes place; 3a – the trial is terminated early, or 3b – the trial continues unchanged, or 3c – the trial continues with adaptations; 4 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified or modified number of events

**Randomized phase II trial with continuation as phase III trial** Simultaneous screening of several treatment groups with continuation as phase III trial. [Diagram: Arm 1, Arm 2 and Arm 3 enter PHASE II; early stopping of one or more arms; the remaining arms continue into PHASE III for the comparison of the arms.]

**Phase III trial with interim analysis** Phase III trial with interim look at data. [Diagram: Arm 1, Arm 2 and Arm 3 enter PHASE III; interim comparison of the arms at the INTERIM look; comparison of the arms at the end of PHASE III.]

**Seamless transition designs (e.g. for dose selection)** Designs can be operationally or inferentially seamless.

**Group Sequential Trials** • If several analyses are carried out, the Type I error is inflated if each analysis is carried out at the target level of significance. • So, the interim analyses must use an adjusted level of significance so as to preserve the overall Type I error.

**Inflation of α with multiple analyses** With 5 analyses performed at level 0.05, the overall level is approximately 0.15

**Adjusting for multiple analyses** The 5 analyses must be performed at level 0.0159 in order to preserve an overall level of 0.05
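The inflation and its correction can be checked by simulation. The sketch below is a minimal Monte Carlo illustration, not the method used for the slides: it assumes 5 equally spaced two-sided analyses of a normally distributed statistic under H0, and the function name, seed and number of simulated trials are arbitrary choices made here.

```python
import math
import random
from statistics import NormalDist


def overall_type_one_error(nominal_level, n_looks, n_trials=100_000, seed=2009):
    """Estimate P(|Z_k| exceeds the nominal two-sided critical value at any look) under H0."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1.0 - nominal_level / 2.0)
    rejections = 0
    for _ in range(n_trials):
        score = 0.0  # running sum of independent N(0, 1) increments
        for k in range(1, n_looks + 1):
            score += rng.gauss(0.0, 1.0)
            # Z_k is the running sum standardized by sqrt(k)
            if abs(score) / math.sqrt(k) >= crit:
                rejections += 1
                break  # the trial stops at the first rejection
    return rejections / n_trials


if __name__ == "__main__":
    print("5 looks at 0.05:  ", overall_type_one_error(0.05, 5))
    print("5 looks at 0.0159:", overall_type_one_error(0.0159, 5))
```

With these assumptions the first estimate should land near 0.14, in line with the rounded figure quoted above, and the second near 0.05.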

**Group sequential designs** • Test $H_0: \Delta = 0$ vs. $H_A: \Delta \neq 0$ • $m$ patients accrued to each arm between analyses • Use standardized test statistic $Z_k$, $k = 1, \ldots, K$

**Group-Sequential Designs – Type I Error** • Probability of wrongly stopping/rejecting $H_0$ at analysis $k$: $P_{H_0}(|Z_1| < c_1, \ldots, |Z_{k-1}| < c_{k-1}, |Z_k| \geq c_k) = \pi_k$ • “Type I error spent at stage k” • $P(\text{Type I error}) = \sum_{k=1}^{K} \pi_k$ • Choose the $c_k$’s so that $\sum_{k=1}^{K} \pi_k = \alpha$

**Group-Sequential Designs – Type II Error** • Probability of Type II error is $1 - P_{H_A}\left( \bigcup_{k=1}^{K} \{ |Z_1| < c_1, \ldots, |Z_{k-1}| < c_{k-1}, |Z_k| \geq c_k \} \right)$ • Depends on $K$, $\alpha$, $\beta$ and the $c_k$’s • Given these values, the required sample size can be computed • It can be expressed as $R \times$ (fixed sample size)

**Pocock Boundaries** • Reject $H_0$ if $|Z_k| > c_P(K,\alpha)$ • $c_P(K,\alpha)$ chosen so that P(Type I error) = α • All analyses are carried out at the same adjusted significance level • The probability of early rejection is high but the power at the final analysis may be compromised

**Pocock Boundaries** • p-values for $Z_k$ (two-sided) per interim analysis (K = 5)

**O’Brien-Fleming Boundaries** • Reject $H_0$ if $|Z_k| > c_{OBF}(K,\alpha)\sqrt{K/k}$ • For $k = K$ we get $|Z_K| > c_{OBF}(K,\alpha)$ • $c_{OBF}(K,\alpha)$ chosen so that P(Type I error) = α • Early analyses are carried out at extreme adjusted significance levels • The probability of early rejection is low but the power at the final analysis is almost unaffected

**O’Brien-Fleming Boundaries** • p-values for $Z_k$ (two-sided) per interim analysis (K = 5)

**Wang & Tsiatis Boundaries** • Wang & Tsiatis (1987): Reject $H_0$ if $|Z_k| > c_{WT}(K,\alpha,\theta)\,(k/K)^{\theta - 1/2}$ • $\theta = 0.5$ gives Pocock’s test; $\theta = 0$ gives O’Brien-Fleming • Implemented in some software (e.g. EaSt) • Can accommodate any intermediate choice between Pocock and O’Brien-Fleming

**Wang & Tsiatis Boundaries** • p-values for $Z_k$ (two-sided) per interim analysis (K = 5) with θ = 0.2

**Haybittle & Peto Boundaries** • Haybittle & Peto (1976): Reject $H_0$ if $|Z_k| > 3$ for $k = 1, \ldots, K-1$; reject $H_0$ if $|Z_k| > c_{HP}(K,\alpha)$ for $k = K$ • $|Z_k| > 3$ corresponds to using p < 0.0027 • Early analyses are carried out at extreme, yet reasonable adjusted significance levels • Intuitive and easily implemented if correction to final significance level is ignored (pragmatic approach)

**Haybittle & Peto Boundaries** • p-values for $Z_k$ (two-sided) per interim analysis (K = 5)

**Boundaries compared** • p-values for $Z_k$ (two-sided) per interim analysis (K = 5)

**Boundaries compared** • Zk per interim analysis (K=5)
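The comparison tables did not survive in this text, but boundaries of this kind can be recomputed. The sketch below is one possible way to do so, not the software behind the slides: it assumes K equally spaced two-sided analyses, evaluates the Type I error by recursive numerical integration on a midpoint grid, and finds each constant by bisection. The Wang & Tsiatis shape is written as $(k/K)^{\theta - 1/2}$ so that θ = 0 reproduces the $\sqrt{K/k}$ shape of O’Brien-Fleming. All function names, the grid size and the bisection limits are choices made here.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def type_one_error(crit, n_grid=100):
    """Two-sided Type I error of a design with Z-scale critical values crit[0..K-1]."""
    n_looks = len(crit)
    # Work on the score scale S_k = Z_k * sqrt(k); its increments are N(0, 1) under H0.
    bounds = [c * math.sqrt(k + 1) for k, c in enumerate(crit)]
    total = 2.0 * (1.0 - norm_cdf(bounds[0]))  # rejection at the first look
    step = 2.0 * bounds[0] / n_grid
    points = [-bounds[0] + (i + 0.5) * step for i in range(n_grid)]
    mass = [norm_pdf(s) * step for s in points]  # sub-density times midpoint weight
    for k in range(1, n_looks):
        b = bounds[k]
        # Probability of continuing so far and first crossing at look k + 1.
        total += sum(m * (norm_cdf(-b - s) + 1.0 - norm_cdf(b - s))
                     for s, m in zip(points, mass))
        if k < n_looks - 1:
            # Propagate the sub-density of S inside the new continuation region.
            step = 2.0 * b / n_grid
            new_points = [-b + (j + 0.5) * step for j in range(n_grid)]
            mass = [step * sum(m * norm_pdf(u - s) for s, m in zip(points, mass))
                    for u in new_points]
            points = new_points
    return total


def solve_constant(make_crit, alpha=0.05, lo=1.0, hi=6.0, n_iter=25):
    """Bisection for the constant c with type_one_error(make_crit(c)) = alpha."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if type_one_error(make_crit(mid)) > alpha:
            lo = mid  # boundary too liberal: raise it
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wang_tsiatis(n_looks, theta, alpha=0.05):
    """theta = 0.5 gives Pocock, theta = 0 gives O'Brien-Fleming."""
    shape = [((k + 1) / n_looks) ** (theta - 0.5) for k in range(n_looks)]
    c = solve_constant(lambda const: [const * s for s in shape], alpha)
    return [c * s for s in shape]


def haybittle_peto(n_looks, alpha=0.05, interim=3.0):
    """|Z| > 3 at the interim looks, adjusted critical value at the final look."""
    c = solve_constant(lambda const: [interim] * (n_looks - 1) + [const], alpha)
    return [interim] * (n_looks - 1) + [c]


def two_sided_p(z):
    return 2.0 * (1.0 - norm_cdf(z))


if __name__ == "__main__":
    designs = {
        "Pocock": wang_tsiatis(5, 0.5),
        "O'Brien-Fleming": wang_tsiatis(5, 0.0),
        "Wang-Tsiatis (0.2)": wang_tsiatis(5, 0.2),
        "Haybittle-Peto": haybittle_peto(5),
    }
    for name, crit in designs.items():
        print(name, [round(c, 3) for c in crit], [round(two_sided_p(c), 4) for c in crit])
```

The grid size trades accuracy for speed; 100 points per look keeps the pure-Python run to a few seconds, and a finer grid can be used when more decimal places are needed.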

**Potential savings / costs in using group sequential designs** Expected sample sizes for different designs (K = 5): outcomes normally distributed with $\sigma = 2$; $\alpha = 0.05$; $\beta = 0.1$ for $\mu_A - \mu_B = 1$
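As a reference point for such comparisons, the fixed sample size can be computed directly. The sketch below is a hedged illustration: it assumes the symbols stripped from the slide read σ = 2, α = 0.05 (two-sided) and β = 0.1 for a true difference of 1 between arms A and B, and it uses the usual known-variance formula for two equal arms. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def fixed_sample_size_per_arm(sigma, delta, alpha=0.05, beta=0.1):
    """Per-arm size for a two-sided, two-sample z-test with known sigma."""
    z = NormalDist().inv_cdf
    return 2.0 * sigma ** 2 * (z(1.0 - alpha / 2.0) + z(1.0 - beta)) ** 2 / delta ** 2


# Assumed reading of the slide: sigma = 2, difference between arms = 1.
n_fixed = fixed_sample_size_per_arm(sigma=2.0, delta=1.0)
print(round(n_fixed, 2), "per arm before rounding;", math.ceil(n_fixed), "after rounding up")
```

The maximum sample size of a group sequential design is then R times this value, where R depends on the boundary chosen; the expected sample size under the alternative is typically smaller than the fixed size.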

**Error-Spending Approach** • Removes the requirement of a fixed number of equally-spaced analyses • Lan & DeMets (1983): two-sided tests “spending” Type I error • Maximum information design: • Error spending function → defines boundaries • Accept $H_0$ if $I_{max}$ is attained without rejecting the null

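**Error-Spending Approach** • $f(t) = \min\left(2 - 2\Phi\left(z_{1-\alpha/2}/\sqrt{t}\right), \alpha\right)$ yields ≈ O’B-F boundaries • $f(t) = \min\left(\alpha \ln\left(1 + (e - 1)t\right), \alpha\right)$ yields ≈ Pocock boundaries • $f(t) = \min\left(\alpha t^{\theta}, \alpha\right)$: $\theta = 1$ or $3$ corresponds approximately to Pocock and O’B-F, respectively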
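The spending functions are easy to tabulate. The sketch below is an illustration with function names chosen here; it takes the O’Brien-Fleming-type function in its usual Lan & DeMets form, with the normal quantile divided by √t, and prints the cumulative Type I error spent at five equally spaced information fractions. The error spent at a given look is the difference between consecutive values.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_type_spending(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t, O'Brien-Fleming type."""
    if t <= 0.0:
        return 0.0
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return min(2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t)), alpha)


def pocock_type_spending(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t, Pocock type."""
    return min(alpha * math.log(1.0 + (math.e - 1.0) * t), alpha)


def power_spending(t, theta, alpha=0.05):
    """Power family alpha * t**theta."""
    return min(alpha * t ** theta, alpha)


if __name__ == "__main__":
    for t in (0.2, 0.4, 0.6, 0.8, 1.0):
        print(t,
              round(obf_type_spending(t), 5),
              round(pocock_type_spending(t), 5),
              round(power_spending(t, 1), 5),
              round(power_spending(t, 3), 5))
```

Because the functions depend only on the information fraction t, the number and timing of the analyses need not be fixed in advance.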

**How Many Interim Analyses?** • One or two interim analyses give most benefit in terms of a reduction of the expected sample size • Not much gain from going beyond 5 analyses

**When to Conduct Interim Analyses?** • With error-spending, full flexibility as to number and timing of analyses • First analysis should not be “too early” (often at 50% of information time) • Equally-spaced analyses advisable • In principle, strategy/timing should not be chosen based on the observed results

**Who conducts interim analyses?** • Independent Data Monitoring Committee • Experts from different disciplines (clinicians, statisticians, ethicists, patient advocates, …) • Reviews trial conduct, safety and efficacy data • Recommends stopping the trial, continuing the trial unchanged, or amending the trial

**Sample Size Re-Estimation** • Assume normally distributed endpoints • Sample size depends on $\sigma^2$ • If misspecified, the initial sample size $n_I$ can be too small • Idea: internal pilot study • estimate $\sigma^2$ based on early observed data • compute new sample size, $n_A$ • if necessary, accrue extra patients above $n_I$
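The internal pilot idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a validated procedure: it uses the known-variance two-arm formula, an unblinded pooled within-arm variance estimate, and the rule that the sample size is never reduced below the initial value. The function names and the interim data are hypothetical.

```python
import math
from statistics import NormalDist, variance


def per_arm_sample_size(sigma2, delta, alpha=0.05, beta=0.1):
    """Per-arm size for a two-sided, two-sample z-test with variance sigma2."""
    z = NormalDist().inv_cdf
    return math.ceil(2.0 * sigma2 * (z(1.0 - alpha / 2.0) + z(1.0 - beta)) ** 2 / delta ** 2)


def reestimate(n_initial, arm_a, arm_b, delta, alpha=0.05, beta=0.1):
    """Return (pooled variance, recomputed size n_A, final per-arm size)."""
    df_a, df_b = len(arm_a) - 1, len(arm_b) - 1
    pooled = (df_a * variance(arm_a) + df_b * variance(arm_b)) / (df_a + df_b)
    n_adjusted = per_arm_sample_size(pooled, delta, alpha, beta)
    # Assumption made here: extra patients may be added, but n is never reduced.
    return pooled, n_adjusted, max(n_initial, n_adjusted)


if __name__ == "__main__":
    n_initial = per_arm_sample_size(sigma2=4.0, delta=1.0)  # planning assumption
    interim_a = [0.0, 2.0, 4.0, 6.0]  # hypothetical early observations, arm A
    interim_b = [1.0, 3.0, 5.0, 7.0]  # hypothetical early observations, arm B
    print(n_initial, reestimate(n_initial, interim_a, interim_b, delta=1.0))
```

In practice the variance estimate would come from far more than four patients per arm, and the effect of the re-estimation on the Type I error would need to be considered.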

**Early Stopping for Futility** • Stopping to reject $H_0$ of no treatment difference • Avoids exposing further patients to the inferior treatment • Appropriate if no further checks are needed on, e.g., treatment safety or long-term effects • Stopping to accept $H_0$ of no treatment difference • Stopping “for futility” or “abandoning a lost cause” • Saves time and effort when a study is unlikely to lead to a positive conclusion

**Two-Sided Test**

**Stochastic Curtailment** Idea: • Terminate the trial for efficacy if there is high probability of rejecting the null, given the current data and assuming the null is true among future patients • Conversely, terminate the trial for futility if there is low probability of rejecting the null, given the current data and assuming the alternative is true among future patients

**Conditional Power** • At interim analysis $k$, define $p_k(\Delta) = P_{\Delta}(\text{test will reject } H_0 \mid \text{current data})$ • A high value of $p_k(0)$ suggests the test will reject $H_0$ • Terminate the trial & reject $H_0$ if $p_k(0) > \xi$ • Terminate the trial & accept $H_0$ if $1 - p_k(\Delta) > \xi'$ (1-sided) • Probabilities of error: type I $\leq \alpha/\xi$, type II $\leq \beta/\xi'$ • Note: $\xi$ and $\xi'$ $\geq 0.8$

**Conditional Power** • Unconditional power for α = 0.05 and β = 0.1 at Δ = 0.2 • Conditional power for a mid-trial analysis with an estimate of Δ of 0.1 • Probability of rejecting the null at the end of the trial has been reduced from 0.9 to 0.1

**Conditional Power** $B(t) = Z(t)\,t^{1/2}$, with expected value proportional to the information fraction $t$

**Conditional Power** Slope = assumed treatment effect in future patients

**Conditional Power** Crosshatched area = conditional power

**Predictive Power** • Problem with the conditional power approach: it is computed assuming a value of Δ that may not be supported by the current data • A solution: average across the values of Δ • “Predictive power”: $P_k = \int p_k(\Delta)\,\pi(\Delta \mid \text{data})\,d\Delta$ • $\pi(\Delta \mid \text{data})$ is the posterior density • Termination against $H_0$ if $P_k > \xi$, etc. • What prior?
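The averaging step can be made concrete. The sketch below is an illustration that assumes a flat (improper) prior, which is only one possible answer to the question of the prior; under that assumption the posterior of the drift given the interim data is normal with mean B(t)/t and variance 1/t. The code averages conditional power over this posterior numerically and compares the result with the closed form that holds in this special case. Function names, the grid size and the interim value are choices made here.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(b_value, t, drift, z_crit):
    """P(B(1) > z_crit | B(t) = b_value) when future data follow the given drift."""
    return 1.0 - _STD_NORMAL.cdf((z_crit - b_value - drift * (1.0 - t)) / math.sqrt(1.0 - t))


def predictive_power(z_interim, t, alpha_one_sided=0.025, n_grid=2000, width=8.0):
    """Average conditional power over the flat-prior posterior of the drift."""
    b_value = z_interim * math.sqrt(t)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha_one_sided)
    posterior = NormalDist(mu=b_value / t, sigma=1.0 / math.sqrt(t))
    lower = posterior.mean - width * posterior.stdev
    step = 2.0 * width * posterior.stdev / n_grid
    total = 0.0
    for i in range(n_grid):
        drift = lower + (i + 0.5) * step  # midpoint rule
        total += conditional_power(b_value, t, drift, z_crit) * posterior.pdf(drift) * step
    return total


def predictive_power_closed_form(z_interim, t, alpha_one_sided=0.025):
    """Closed form under the flat prior, for comparison."""
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha_one_sided)
    return _STD_NORMAL.cdf((z_interim - z_crit * math.sqrt(t)) / math.sqrt(1.0 - t))


if __name__ == "__main__":
    z_mid_trial = 1.146  # hypothetical interim Z at half of the information
    print(predictive_power(z_mid_trial, 0.5), predictive_power_closed_form(z_mid_trial, 0.5))
```

An informative prior would change the posterior and therefore the result; only the `posterior` line would need to be replaced.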

**Futility guidelines**

**Overruling futility boundaries**

**Adaptive Designs** • Based on combining p-values from different analyses • Allow for flexible designs • sample size re-calculation • any changes to the design (including endpoint, test, etc!)

**Adaptive Designs** • Lehmacher and Wassmer (1999): At stage $k$, combine one-sided p-values $p_1, \ldots, p_k$: $L_k = k^{-1/2} \sum_{j=1}^{k} \Phi^{-1}(1 - p_j)$ • Use any group sequential design for $L_k$ • Slight power loss as compared to a group-sequential plan • Flexibility as to design modifications: OK for control of type I error, BUT…
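The combination statistic is straightforward to compute. The sketch below is a minimal illustration with a function name and p-values chosen here; it uses the equal stage weights of the formula above and assumes each p-value is computed only from the patients of its own stage.

```python
import math
from statistics import NormalDist


def inverse_normal_combination(p_values):
    """k**(-1/2) times the sum of Phi^{-1}(1 - p_j) over the stage-wise one-sided p-values."""
    if not p_values or any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    inv = NormalDist().inv_cdf
    return sum(inv(1.0 - p) for p in p_values) / math.sqrt(len(p_values))


if __name__ == "__main__":
    stage_p = [0.10, 0.02]  # hypothetical one-sided p-values from two stages
    for k in range(1, len(stage_p) + 1):
        print(k, inverse_normal_combination(stage_p[:k]))
```

Under the null hypothesis the stage-wise terms are independent standard normal variables, so the combined statistic is standard normal at every stage and can be compared with ordinary group sequential critical values, whatever design change was made between stages.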

**Potential concerns with adaptive designs** • Major changes between cohorts make clinical interpretation difficult • If eligibility / endpoint changed, what is adequate label? • Temporal trends • Operational bias • Less efficient than group sequential for sample size adjustments • Modest gains (in general), high risks