Some Considerations for Choosing Among Types of Phase II Designs

Some Considerations for Choosing Among Types of Phase II Designs • Paul Catalano • June 26, 2009

Purpose of Phase II Studies • Determine if a new agent or a new treatment regimen appears sufficiently efficacious to be worth further investigation • Not attempting to prove or establish that the new agent improves outcome • Verify the safety of the therapy • Provide statistical rigor/formal evaluation context and targeted patient population

General Approach • Often formulate as testing a null hypothesis vs. an alternative • E.g. H0: pr = 0.05 vs. Ha: pr = 0.20, where pr is the true proportion of patients who will respond to the new agent • Consequence of a type I error (α): an ineffective agent will be studied further • Use α = 0.10 (one-sided) • Larger than in phase III studies

General Approach • Consequence of a type II error (β): an effective agent will not be studied further • β should be < 0.10 • In practice, there tend to be multiple phase II studies performed in multiple diseases, so the overall chance of missing an effective treatment is lower • Selection of therapies for phase III testing is based on all available data, not on a single phase II study

Types of Phase II Designs • Single arm with single analysis (can have multiple single arm studies in one protocol) • Single arm with interim stopping rules (usually with suspension of accrual) • Randomized selection designs (pick-the-winner) • Comparative randomized control • Randomized discontinuation designs

Classic Design for Screening New Agents • Patients refractory to standard therapy • If some patients improve, agent must have some activity • Often use H0: pr = 0.05 vs. Ha: pr = 0.20 • Simon’s (1989) optimal two-stage designs minimize expected sample size under H0

Classic Design for Screening New Agents • Simon’s optimal design for pr = 0.05 vs 0.20: • 1st stage: treat 12 patients; stop if no responses • 2nd stage: treat 25 patients; conclude inactive if < 4 / 37 (11%) respond • CTEP / IDB has been pushing this design for new agents in diseases without prior evidence of activity
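
The operating characteristics of the two-stage rule above can be checked with exact binomial arithmetic. The following is a minimal sketch, not code from the presentation; the function names are illustrative, and it assumes the rule means "stop if 0 of 12 respond, otherwise declare the agent inactive if 3 or fewer of 37 respond".

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_stage_properties(p: float, n1: int, r1: int, n: int, r: int) -> dict:
    """Operating characteristics of a two-stage design at true response rate p.

    Stop after stage 1 if responses <= r1 out of n1; otherwise accrue to n
    patients in total and declare the agent inactive if total responses <= r.
    """
    n2 = n - n1
    # Probability of early termination after the first stage.
    pet = sum(binom_pmf(x, n1, p) for x in range(r1 + 1))
    # Probability of declaring the agent active: pass stage 1 AND finish
    # with more than r responses overall.
    p_active = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(r + 1 - x1, 0)  # responses still needed in stage 2
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        p_active += binom_pmf(x1, n1, p) * tail
    return {"pet": pet, "expected_n": n1 + (1 - pet) * n2, "p_active": p_active}


# Design from the slide: stop if 0/12 respond; inactive if < 4/37 (i.e. <= 3).
null = two_stage_properties(0.05, n1=12, r1=0, n=37, r=3)
alt = two_stage_properties(0.20, n1=12, r1=0, n=37, r=3)
print(f"Under H0: PET = {null['pet']:.2f}, E[N] = {null['expected_n']:.1f}, "
      f"type I error = {null['p_active']:.3f}")
print(f"Under Ha: power = {alt['p_active']:.3f}")
```

Under H0 the study stops after 12 patients a little more than half the time, which is where the saving in expected sample size comes from.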

Classic Design for Screening New Agents • Single arm two-stage designs are inefficient for multicenter studies • Time and effort needed to develop protocol and CRFs and set up database • Cost of activation at institutions • Prefer settings where single stage designs are appropriate or studies with multiple strata and / or multiple arms

Single Stage Accrual Designs • Might be appropriate • If some prior evidence of activity • For combinations of new drugs with standard treatments • Example: H0: pr = 0.20 vs Ha: pr = 0.37 (null rate depends on level of activity for standard rx) • 1-stage: 45 patients, reject H0 if > 12 / 45 (27%) respond • 2-stage: conclude inactive if < 5 / 25 (20% 1st stage) or 13/50 (26% overall) respond
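
The one-stage example above can be checked the same way. This is a small sketch under the assumption that "reject H0 if > 12 / 45 respond" means 13 or more responses; the helper name is illustrative.

```python
from math import comb


def upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


n, cutoff = 45, 13  # reject H0 if more than 12 of 45 patients respond
alpha = upper_tail(cutoff, n, 0.20)  # false positive rate when pr = 0.20
power = upper_tail(cutoff, n, 0.37)  # true positive rate when pr = 0.37
print(f"type I error = {alpha:.3f}, power = {power:.3f}")
```

Changing `cutoff` by one response in either direction shows how sharply the error rates trade off at this sample size.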

Improvement in Disease Stabilization • Cytostatic agents might improve disease stabilization rates rather than improve response rates • Test for improvement in disease stabilization rates; e.g. H0: ps = 0.30 vs. Ha: ps = 0.50, where ps = proportion stable or responding (free of progression) at x months (e.g. x = 4) • Calculations the same as for response

Other Endpoints • Multinomial: test e.g. H0: pr = 0.05 and ps = 0.30 vs. Ha: pr > 0.05 or ps > 0.30 • Less efficient than binomial • May be more difficult to interpret • TTP or PFS • Kaplan-Meier estimate at single time or other nonparametric test • Parametric (e.g. exponential) models can be slightly more efficient • Survival generally not appropriate

Example Multinomial Design • Test e.g. H0: pr = 0.05 and ps = 0.30 vs. Ha: pr > 0.05 or ps > 0.30 • Need to consider power against multiple alternative values; e.g. Ha1: pr = 0.20, ps = 0.30; Ha2: pr = 0.14, ps = 0.40; Ha3: pr = 0.05, ps = 0.50 • 1-stage: n = 46, reject H0 if > 6 responses or > 20 cases responding or stable • α = 0.09; power = 0.92 for Ha1, Ha2, and Ha3

Types of Randomized Phase II Designs • Separate evaluation of each arm • Each arm evaluated in a similar population • Selection designs: select the ‘best’ arm for further study • Comparative randomized control • Randomized discontinuation • Note: randomized designs are larger and more complex (need to explain each arm to patients)

Control Arms • Concern about selection bias in studies without a simultaneous control group • Studies can enroll different patient groups even with the same nominal population • Population drift and stage migration • Control groups more appropriate for evaluating contribution to a combination or effect on progression than for determining if any response activity • Comparing studies from different groups

Control Arms • Often not needed because • Phase II studies can only detect fairly large effects, so biases would need to be large • Consequence of a false positive is further testing of an inactive drug • Cooperative group or other studies conducted in the same network with central data review produce fairly consistent results • Increase the time and expense for phase II evaluation

Randomized Selection (Pick-the-Winner) Design • (Simon, Wittes and Ellenberg, 1985) randomize between 2 or more experimental arms (no control arm) • In a sense, least efficacious arm is a control for the others • Select the best arm for further evaluation • Usually define ‘best’ to be the arm with the best outcome, no matter how small the difference

Randomized Selection (Pick-the-Winner) Design • With two arms, α ≈ 0.50 • Rationale: doesn’t matter which arm is selected if they are nearly equivalent • Often separate efficacy test for each arm, too • 1-stage or 2-stage • Usually prefer randomizing over a series of separate studies • Facilitates (informal) comparisons • Guards against sampling bias

Randomized Selection (Pick-the-Winner) Design • [Diagram: patients are randomized among arms RX1, RX2, ..., RXk; arm j yields estimated response rate Rj/nj] • RXj is ‘best’ if Rj/nj > Ri/ni for all i ≠ j • Can use other endpoints

Randomized Selection (Pick-the-Winner) Design • Example: Simon’s optimal 2-stage design for H0: pr = 0.20 vs Ha: pr = 0.40 enrolls 17 patients in the 1st stage and 20 in the 2nd (α = β = 0.10) • Apply this design to each arm in a 2-arm randomized selection design

Randomized Selection (Pick-the-Winner) Design • Probability of selecting the best arm declines as the number of arms increases. With independent arms:
$P\{X_1 > \max(X_2,\dots,X_k)\} = \sum_x P(X_1 = x)\,P(X_2 < x, X_3 < x, \dots, X_k < x \mid X_1 = x)$
$= \sum_x P(X_1 = x)\,P(X_2 < x)\,P(X_3 < x)\cdots P(X_k < x)$
$= \sum_x P(X_1 = x)\,[P(X_2 < x)]^{k-1}$ if $X_2,\dots,X_k$ have the same distribution

Randomized Selection (Pick-the-Winner) Design • $X_1 \sim \mathrm{Bin}(50, 0.32)$; $X_2,\dots,X_k \sim \mathrm{Bin}(50, 0.20)$ gives $P\{X_1 > \max(X_2,\dots,X_k)\} = 0.90$ for k = 2 and 0.72 for k = 6 • Advanced renal trial of several targeted agents: 6 arms, n = 55 / arm • TTP compared via Cox model • If one arm has median TTP of 7.2 months and the other 5 have median TTP of 4.8 months (50% improvement), then the probability of selecting the best arm is 0.87
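
The binomial example above can be reproduced directly from the selection-probability sum. This is a minimal sketch, assuming independent arms and counting a tie for the best observed rate as a failure to select the best arm (the strict inequality in the formula); the function names are illustrative.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_select_best(k: int, n: int, p_best: float, p_other: float) -> float:
    """P{X1 > max(X2, ..., Xk)} with X1 ~ Bin(n, p_best) and the remaining
    k - 1 arms independent Bin(n, p_other)."""
    total = 0.0
    cdf_below = 0.0  # running value of P(X2 < x)
    for x in range(n + 1):
        total += binom_pmf(x, n, p_best) * cdf_below ** (k - 1)
        cdf_below += binom_pmf(x, n, p_other)  # now equals P(X2 < x + 1)
    return total


for k in (2, 3, 4, 6):
    print(f"k = {k}: P(select best) = {prob_select_best(k, 50, 0.32, 0.20):.2f}")
```

Breaking ties at random instead of counting them as failures would give slightly higher probabilities.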

Comparative Randomized Control Design • Discussed for evaluating cytostatic agents in Korn et al. (2001) • Randomize experimental vs. standard and formally compare the arms • Appropriate if don’t have a reasonable prior estimate of expected control arm outcomes • Endpoint could be any of the standard phase II endpoints (e.g. TTP, response) • Might target larger differences than a phase III

Comparative Randomized Control Design • Test could be a definitive (phase III) evaluation with α < 0.025 (one-sided) • If little prior phase II efficacy data, need early stopping rules for lack of benefit • Might not be appropriate if a second phase III study evaluating survival would be needed

Comparative Randomized Control Design • Test could be a suggestive (phase II) evaluation with a larger α (e.g. 0.10 to 0.20) • Appropriate for screening new agents • If positive, still needs to be followed by a definitive phase III study • Korn et al. suggest using α = 0.20, because the sample size with α = 0.10 is large enough that it might be better to go directly to the definitive study

Study of Bevacizumab vs. Placebo in RCC • 3-arm comparison of TTP (two dose levels of bevacizumab), targeting a large difference (100% improvement in median TTP), but designed to be definitive (Yang, 2003) • Overall α = 0.05 (two-sided), β = 0.20 • Each comparison at one-sided 0.0125 • Needed about 50 patients / arm (stopped early because of highly significant results) • Crossover from placebo to low dose drug

Yang Study of Bevacizumab vs. Placebo • Was overall α = 0.05 appropriate? • A second, larger study is still needed for survival • Could have identified drug as promising with even fewer patients (larger α)

Yang Study of Bevacizumab vs. Placebo • Was a placebo needed? • Evaluation bias should be much smaller than a doubling of TTP • May not be needed to identify promising drugs • FDA tends to require a placebo for TTP • Was the control arm needed? • Would results from a single arm, single institution study have been convincing?

E5397 – Advanced Head and Neck Cancer • Cisplatin + C225 vs. Cisplatin + Placebo • Designed to have 90% power to detect an improvement in median PFS from 2 months to 4 months (100% improvement) with α = 0.025 (one-sided) • With allowance for non-compliance, required 54 eligible patients / arm • Final accrual was 117 eligible patients
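
For a rough sense of scale, the number of PFS events implied by these design parameters can be approximated with Schoenfeld's formula for a log-rank test with 1:1 allocation. This is a sketch only: the slides do not say how the E5397 sample size was actually computed, and the function name is illustrative. Under an exponential model, doubling the median corresponds to a hazard ratio of 2 (control / experimental).

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio: float, alpha_one_sided: float, power: float) -> float:
    """Approximate total events for a 1:1 log-rank comparison (Schoenfeld):
    d = 4 * (z_alpha + z_beta)^2 / (ln HR)^2."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha_one_sided)
    z_beta = z(power)
    return 4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2


# Median PFS 2 vs 4 months -> hazard ratio 2 under exponential PFS (assumption).
events = required_events(2.0, alpha_one_sided=0.025, power=0.90)
print(f"about {ceil(events)} events in total")
```

The slide's figure of 54 eligible patients per arm also includes an allowance for non-compliance, so it is not expected to match this event count exactly.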

E5397 Summary of Results • Hazard ratios (placebo / C225) and 95% CIs • PFS: 1.31 (0.91, 1.89) • Survival: 1.16 (0.80, 1.69)

[Figure: E5397 PFS by treatment]

[Figure: E5397 survival by treatment]

E5397 • Study is not definitive – underpowered for both PFS and survival • Is it promising – should a follow-up study of C225 be done? • Would a better strategy have been a single arm phase II with a response endpoint, followed by a definitive phase III based on the ‘promising’ response rate of 26%?

E5397 • PFS reaches the one-sided α = 0.10 cutoff for a ‘promising’ phase II result • Survival would not have been an appropriate endpoint • Estimated improvement is 16% • Confidence interval consistent with 20% decrease to a 69% increase • Phase II sample sizes are not adequate to detect realistic survival effects
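
The claim about the one-sided 0.10 cutoff can be checked approximately from the reported E5397 hazard ratios and 95% confidence intervals. The sketch below assumes the intervals are Wald-type intervals on the log hazard ratio scale, which the slides do not state, and works from rounded published values; the function name is illustrative.

```python
from math import log
from statistics import NormalDist


def one_sided_p(hr: float, lower: float, upper: float, conf: float = 0.95) -> float:
    """Approximate one-sided p-value for HR > 1, backing the standard error
    of log(HR) out of a Wald-type confidence interval."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hr) / se
    return 1 - NormalDist().cdf(z)


p_pfs = one_sided_p(1.31, 0.91, 1.89)  # PFS: placebo / C225
p_os = one_sided_p(1.16, 0.80, 1.69)   # survival: placebo / C225
print(f"PFS one-sided p = {p_pfs:.3f}; survival one-sided p = {p_os:.3f}")
```

Under these assumptions the PFS result falls below 0.10 while the survival result does not, in line with the slide.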

Randomized Discontinuation Design • An enrichment strategy based on randomizing patients who appear to be doing well on the treatment (Rosner, Stadler, Ratain, 2002) • Initially all patients are treated, patients free of progression for some period of time are randomized between continuing treatment and placebo, with crossover from placebo to treatment at progression or specified PFI • Complex design with a blinded randomization and 3 registration points

Randomized Discontinuation Design • [Diagram: register → initial RX (run-in) → reassess. PD: off study. Response: continue RX. SD: randomize between RX and placebo, with crossover from placebo at PD or after specified PFI.]

Randomized Discontinuation Design • Usefulness depends on how successful the run-in is in selecting patients benefiting from treatment • TTP is highly variable in most diseases, so randomized population will be a mixture • Korn et al. (2001), Capra (2004) suggest often less efficient than standard RCT • Carry-over effect could dilute difference between randomized arms • Requires much larger sample size

Randomized Discontinuation Design • CALGB 69901 (CAI in RCC) • Randomize patients if stable after 16 weeks • Enrolled 374 patients; randomized 65 eligible patients (17%) • Enrichment strategy was not successful, but does CAI have any activity? • Did they learn any more from 374 patients than ECOG did from 57 patients in a more traditional two-stage phase II design (E4896)?

Main Points • In many settings, conventional phase II designs may still be appropriate • Start-up costs for single-arm two-stage designs are a concern • Randomized phase II studies allow evaluation of multiple agents or schedules and protect against sampling bias • Selection designs are useful for informal comparison and identifying promising agents

Main Points • Control arms should not ordinarily be needed, but can be effective in some settings • Survival is seldom (never?) the best phase II endpoint • Randomized discontinuation designs may not be appropriate and need to be strongly justified

References
Capra WB (2004). Comparing the power of the discontinuation design to that of the classic randomized design on time-to-event endpoints. Controlled Clinical Trials 25:168-177.
Freidlin B, Dancey J, Korn EL, Zee B, Eisenhauer E (2002). Multinomial phase II trial designs (letter to the editor). Journal of Clinical Oncology 20:599.
Korn EL, Arbuck SG, Pluda JM, Simon R, Kaplan RS, Christian MC (2001). Clinical trial designs for cytostatic agents: are new approaches needed? Journal of Clinical Oncology 19:265-272.
Rosner GL, Stadler W, Ratain MJ (2002). Randomized discontinuation design: application to cytostatic antineoplastic agents. Journal of Clinical Oncology 20:4478-4484.
Simon R (1989). Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 10:1-10.
Simon R, Wittes RE, Ellenberg SS (1985). Randomized phase II clinical trials. Cancer Treatment Reports 12:1375-1381.
Yang JC et al. (2003). A randomized trial of bevacizumab, an anti-vascular endothelial growth factor antibody, for metastatic renal cancer. New England Journal of Medicine 349:427-434.
