Comments on “Adaptation and Heterogeneity” by Armin Koch

Paul Gallo, Willi Maurer • PhRMA Adaptive Design KOL Lecture Series • November 14, 2008

Bottom line • We agree with many of Armin’s bottom-line statements. • Differences in opinion or perspective largely come down to questions of degree. • e.g., how heavy is the burden of proof?

Motivation • Not simply a matter of obtaining lower standards for sponsor advantage. • Statistical issues: subsets / subgroups always present challenges, and there can be a tendency in many contexts to over-react to signals. • No party wants to falsely invalidate or substantially undervalue meaningful results. • How do we go about this most rationally?

Scope • Our scope: changes that do not fundamentally alter the structure of the data. • e.g., sample size (SS) re-estimation, dropping an arm (as in seamless Phase II/III designs), changing the randomization allocation, etc. • More substantial structural changes, e.g., changes to the nature of the endpoints, would likely raise much more challenging issues regarding whether information can be combined at all to test a common hypothesis.

“One trial or two?” A big question: • Where is it important to investigate and understand within-trial heterogeneity? • All trials? (but we don’t routinely do so, and there don’t seem to be standards) • Group sequential trials? (information leakage is more easily envisioned and the evidence more accessible, yet this does not seem to be routinely explored) • Adaptive trials?

Meta-analyses (slides 9-10) • Is there an implication that meta-analyses are in some sense less important than confirmatory trials? • Meta-analyses can affect medical practice. • Not all are the same, but we would start from a view that both can be important within their contexts, and the question is one of most accurately interpreting the results.

Meta-analyses (continued) • We may know that trials within a meta-analysis differ in many identifiable aspects: • investigators following different protocols and procedures, monitoring standards, calendar times, information from other trials being publicly available, “therapeutic drift”, and so on.

Meta-analyses (continued) • Adaptive trials without major structural changes may have none of these problems. • When judging whether non-definitive but potentially important signals of interaction might be real, shouldn’t plausibility and scientific rationale play an important role?

“Natural reasons” (slide 11) • Because other factors can lead to some heterogeneity, “can we ignore the problem?” • No, but this highlights a difficulty and points toward caution in interpretation. • The concern: results might potentially be invalidated by a signal (real or not) unrelated to the adaptation, which would not have raised a concern had it arisen in a conventional trial.

“Natural reasons” • Underlying minor heterogeneity can increase the chance of a formal test flagging a signal. • e.g., an interaction test at the 0.15 level may have an effective false positive (interaction) rate above its nominal 0.15, because its null hypothesis of exact homogeneity is already not true.
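The point about the 0.15-level test can be made concrete with a short calculation. The Python sketch below is illustrative only and rests on assumptions not stated in the slides: two stages, independent stage-wise treatment effect estimates that are approximately normal with known standard errors, and an interaction test that is a two-sided z-test of equal stage effects. The function name `interaction_rejection_rate` and the numbers in the usage lines are hypothetical.

```python
from statistics import NormalDist


def interaction_rejection_rate(true_diff, se_stage1, se_stage2, alpha=0.15):
    """Probability that a two-sided z-test of equal stage-wise treatment effects
    rejects at level alpha, when the true (stage 1 minus stage 2) effect
    difference is true_diff.

    Hypothetical setup: independent, approximately normal stage-wise
    estimates with known standard errors."""
    std = NormalDist()
    se_diff = (se_stage1 ** 2 + se_stage2 ** 2) ** 0.5
    z_crit = std.inv_cdf(1 - alpha / 2)
    shift = true_diff / se_diff  # mean of the z-statistic given the true difference
    # P(|Z| > z_crit) where Z ~ N(shift, 1)
    return std.cdf(-z_crit - shift) + (1 - std.cdf(z_crit - shift))


if __name__ == "__main__":
    # Exact homogeneity: the rejection rate equals the nominal alpha.
    print(round(interaction_rejection_rate(0.0, 0.1, 0.1), 3))
    # Minor underlying heterogeneity (hypothetical values): the rate exceeds alpha.
    print(round(interaction_rejection_rate(0.1, 0.1, 0.1), 3))
```

Under exact homogeneity the rate is the nominal 0.15. With a true stage difference of about 0.7 standard errors of the difference, it rises to roughly 0.25, so a "signal" becomes more likely even though nothing related to the adaptation has happened. How much background heterogeneity is realistic is a judgment for each setting.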

Relationship to effect size (slide 12) • This seems not unlike how some important subgroup issues might be viewed. • A situation with a signal of heterogeneity, but with each subgroup showing benefit, would often be judged differently from one in which some subgroup did not appear to benefit. • This relates to the distinction between quantitative and qualitative interactions.

“Information leakage” (slide 14) • We “can’t prove that (good procedures) have been followed” • True, but we would not regard this alone as sufficient grounds to discount the results; it is just one part of the case. • Evidence of confidentiality and compliance, the numerical strength of the signal, scientific plausibility or explanation, etc., are all part of the story.

“Price to be paid” (slide 16) • Generally there are statistical prices to be paid, but these are usually easily handled. • “Further investigation” of signals, as a price for the added complexity of an adaptive design, seems fine also. • But the goal should be to draw the most accurate conclusions from the data at hand; falsely discounting meaningful results at some non-trivial frequency may not be a good price to pay.

Practical implementation issues • Don’t lose sight of the challenges inherent in carrying out such an investigation. • Defining stages is not always as clear as it might initially sound. • e.g., the data cut reviewed by the DMC does not coincide with the point at which the adaptation is implemented.

Practical implementation (continued) • What’s the relevant unit for defining stages? • patients (when they are randomized) or events (when they occur)? • Too much unstructured exploration of these types of questions may increase the chance of false signals. • Time-to-event analyses may present special challenges, because differences between stages can be confounded with non-proportional hazards.
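The choice of unit can change which stage an observation belongs to. The Python sketch below uses hypothetical patients, dates, and an assumed adaptation date; the `Patient` class and the two helper functions are illustrative only, not an established procedure. It shows that a patient randomized before the adaptation can contribute an event after it, so stage membership depends on the definition chosen.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    pid: str
    randomized: date
    event: Optional[date]  # None if no event observed (censored)


def stage_by_randomization(p: Patient, cutoff: date) -> int:
    """Hypothetical rule: stage defined by when the patient was randomized."""
    return 1 if p.randomized < cutoff else 2


def stage_by_event(p: Patient, cutoff: date) -> Optional[int]:
    """Hypothetical rule: stage defined by when the event occurred; None if no event."""
    if p.event is None:
        return None
    return 1 if p.event < cutoff else 2


# Hypothetical data and adaptation date, for illustration only.
ADAPTATION_DATE = date(2008, 6, 1)
patients = [
    Patient("A", date(2008, 1, 10), date(2008, 3, 5)),
    Patient("B", date(2008, 2, 20), date(2008, 8, 14)),  # randomized before, event after
    Patient("C", date(2008, 7, 2), date(2008, 10, 30)),
    Patient("D", date(2008, 4, 15), None),
]

for p in patients:
    print(p.pid, stage_by_randomization(p, ADAPTATION_DATE), stage_by_event(p, ADAPTATION_DATE))
```

Patient B illustrates the ambiguity: by randomization date it belongs to stage 1, yet its event falls in the post-adaptation period of a time-to-event analysis. Fixing such definitions in advance limits the unstructured exploration the slide warns against.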

Pre-specification • Not only with regard to methodology . . . • What am I concerned about in advance, i.e., what’s the mechanism by which a change might be induced, and is it consistent with what I later see in the data? • Shouldn’t a signal be taken more seriously if consistent with sound pre-stated concerns? • Of course we do have to leave room for exploration of unanticipated signals.
