Interim Analyses of Clinical Trials: A Requirement
Outline • Background and how DSMBs arose and function • Group sequential methods • Examples
References • Ellenberg SS, Fleming TR, DeMets DL. Data Monitoring Committees in Clinical Trials. Wiley, 2002. • DeMets DL, Furberg CD, Friedman LM. Data Monitoring in Clinical Trials: A Case Studies Approach. Springer, 2006. • Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. Chapman and Hall, 2000. • Proschan MA, Lan KKG, Wittes J. Statistical Monitoring of Clinical Trials: A Unified Approach. Springer, 2006. • http://www.biostat.wisc.edu/landemets
Structure for Cooperative Studies (Greenberg Report) [Organization chart with the following boxes: Policy Board or Advisory Committee; National Advisory Heart Council; Initial review group; Institute staff; Executive Committee or Steering Committee; Coordinating Center; Participating Units] Cont Clinical Trials 9:137-48, 1988.
Monitoring Committee Acronyms • PAB = Policy advisory board • DSMB = Data and Safety Monitoring Board • DMC = Data Monitoring Committee • ESMB = Efficacy and safety monitoring board • OSMB = Observational study monitoring board
Responsibilities
Steering/Executive Committee/Protocol Team:
• Study design
• Patient recruitment and follow-up
• Data collection
• Quality assurance
• Review of external data
• Study reports
DMC or DSMB:
• Safety of patients
• Protection of integrity of study
• Review of blinded data on safety and efficacy of treatments
• Review of trial conduct, amendments and external data
DMCs are responsible to patients, investigators, IRBs, regulatory agencies and sponsor.
Data Monitoring Rationale • Accumulating data need to be monitored for risk/benefit (safety is best assured by comparing the rate of adverse events with that in a control group) • Reasons: • Ethical: do not expose participants to an inferior intervention longer than needed to test the hypothesis • Scientific: assessment of relevance of question (e.g., external data), design assumptions, logistical problems • Economic: do not waste financial or human resources on a futile trial.
Reasons for Early Termination of Clinical Trials
• Based on accumulated data from the trial:
  • Unequivocal evidence of treatment benefit or harm
  • Unexpected, unacceptable side effects
  • No emerging trends and no reasonable chance of demonstrating benefit
• Based on overall progress of the trial:
  • Failure to include enough patients at a sufficient rate
  • Lack of compliance in a large number of patients
  • Poor follow-up
  • Poor data quality
Today • All NIH sponsored clinical trials are required to have a data monitoring plan • NIH-sponsored trials with clinical endpoints have a DSMB • Many industry sponsored studies have a DSMB • The FDA has prepared a guidance document (Establishment and Operation of Clinical Trial Data Monitoring Committees) http://www.fda.gov/RegulatoryInformation/Guidances/ucm127069.htm • There is variation in operating procedures for DSMBs
When Is an Independent DSMB Needed?
• Early phase studies
  • Monitoring usually at local level; independent DMC not usually needed
• Phase III & IV studies with morbidity/mortality outcomes; pivotal phase III trials
• Frail populations, e.g., children, elderly
• Trial with substantial uncertainty about safety, e.g., gene therapy
See FDA Guidance and ICH/E9, section 4.5.
DSMB Composition: Multidisciplinary • Clinical experts in the subject matter area • Biostatisticians with expertise in clinical trials and preferably in the subject matter area • Others depending on the nature of the study, e.g., ethicist, pharmacologist, patient advocate • Senior investigators without significant conflicts of interest
Independence of DSMB: • Voting members should not be part of the investigative team or work for the sponsor • There should be a clear “need to know” policy for non-DSMB members, e.g., the statistician preparing interim summaries needs to know and may be an employee of the sponsor or member of the investigative team • Members should state potential conflicts This view is not shared by all. See Meinert CL and discussion, Cont Clin Trials, 1998
Typical DSMB Meeting Format
• Open Session
  • Progress report using open data (no outcome data by treatment group)
  • Sponsor, e.g., NIH, Executive Committee, Protocol Chairs, DSMB and unblinded statisticians
• Closed Session
  • Outcome data by treatment group (usually coded)
  • DSMB and unblinded statisticians only
• Executive Session (DSMB only)
• Debriefing Session
  • DSMB, Sponsor, Executive Committee, Protocol Chairs, and unblinded statisticians
DSMB Confidentiality • Interim data reviewed by the DSMB must remain confidential • Members must not share interim data with anyone outside DSMB • Leaks can affect • Patient recruitment • Protocol compliance • Outcome assessment • Trial integrity and support
DMC Recommendations
• Continue the study unmodified
• Modify the study protocol
• Terminate the study
  • Serious toxicity
  • Clear benefit
  • Futility
  • Design/logistical problems
Outline • Background and how DSMBs function • Group sequential methods • Examples
DSMB Decision Making Can Be Complex • Internal consistency • Benefit/Risk • External consistency • Current versus future patients • Clinical and public health impact • Statistical issues – monitoring guidelines
Overall Probability of Achieving a Result with Given Nominal Significance of 0.05 After N Repeated Tests Under H0
No. of Tests (N)   Probability
 1                 .05
 2                 .083
 3                 .107
 4                 .126
 5                 .142
10                 .193
25                 .266
Ref: McPherson, NEJM, 1974.
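The inflation in the table above can be checked by simulation. The following is only a minimal sketch, assuming equally spaced looks and a normally distributed test statistic built from cumulative sums; the function name, seed, and number of simulated trials are illustrative choices, not part of the original slides.

```python
import random
from math import sqrt


def overall_type1_error(n_looks, z_crit=1.96, n_sim=100_000, seed=2024):
    """Estimate P(|Z_k| >= z_crit at any of n_looks equally spaced looks) under H0.

    Assumption: each look adds one independent standard normal increment,
    so Z_k = (sum of k increments) / sqrt(k).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        cumulative = 0.0
        for k in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)  # data from one new group
            if abs(cumulative / sqrt(k)) >= z_crit:
                rejections += 1  # a "significant" result at look k
                break
    return rejections / n_sim


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, round(overall_type1_error(n), 3))
```

The estimates should land close to the tabulated probabilities, up to Monte Carlo error. Raising `z_crit` to about 2.41 for 5 looks (a nominal two-sided level near .016) should bring the overall estimate back to roughly 0.05.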
Value of Nominal Significance Level Necessary to Achieve a True Level of 0.05 After N Repeated Tests
No. of Tests (N)   Significance Level Which Should be Used
 1                 .05
 2                 .0296
 3                 .0221
 4                 .0183
 5                 .0159
10                 .0107
Ref: McPherson, NEJM, 1974.
Early Work • Acceptance sampling • Wald (1947) sequential probability ratio test Manufacturing problems, continuous monitoring of the data, no upper bound on sample size
Group Sequential Methods • Calculate a summary statistic (e.g., Z for logrank test) on each additional new group of participants (events) • Compare the test statistic to a critical value that preserves overall type I error (e.g., 0.05).
Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks
Interim Analysis   Pocock   O’Brien/Fleming   Haybittle/Peto
1                  2.49     5.46              3.0
2                  2.49     3.85              3.0
3                  2.49     3.15              3.0
4                  2.49     2.73              3.0
5                  2.49     2.44              3.0
6                  2.49     2.23              3.0
7                  2.49     2.06              1.96 (2.00)
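The shapes of the three boundaries in the table can be sketched in a few lines. This is an illustrative sketch only: it takes the final or constant critical values from the table as inputs rather than deriving them (that requires numerical integration), and it assumes equally spaced looks, where the O'Brien-Fleming value at look k of K is the final value times sqrt(K/k). The function names are hypothetical.

```python
from math import sqrt


def obrien_fleming_bounds(n_looks, final_z):
    """O'Brien-Fleming shape for equally spaced looks: z_k = final_z * sqrt(K / k)."""
    return [final_z * sqrt(n_looks / k) for k in range(1, n_looks + 1)]


def pocock_bounds(n_looks, constant_z):
    """Pocock shape: the same critical value at every look."""
    return [constant_z] * n_looks


def haybittle_peto_bounds(n_looks, interim_z=3.0, final_z=1.96):
    """Haybittle-Peto shape: a fixed strict value at interim looks, near-nominal at the end."""
    return [interim_z] * (n_looks - 1) + [final_z]


if __name__ == "__main__":
    # Final/constant values below are read off the table, not computed here.
    for name, bounds in [
        ("Pocock", pocock_bounds(7, 2.49)),
        ("O'Brien-Fleming", obrien_fleming_bounds(7, 2.06)),
        ("Haybittle-Peto", haybittle_peto_bounds(7)),
    ]:
        print(name, [round(b, 2) for b in bounds])
```

Because the tabulated final value 2.06 is itself rounded, the reconstructed O'Brien-Fleming column may differ from the table in the second decimal place.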
General Approach • Compute sample size as if a single look (fixed sample approach) • Specify number of interim analyses and stopping boundary (usually OBF). • Inflate sample size to preserve assumed power using constants in table (not always done as adjustment is minor). • Compute the standardized statistic Zk at each analysis and compare with critical values corresponding to monitoring boundary chosen. • At the end or upon early termination determine P-values and confidence intervals in the usual manner.
Problems with Initial Approach • Difficult to specify number of analyses in advance • Logistically difficult to organize reviews after equal increments of information. Solutions: Slud and Wei and Lan-DeMets
Flexible Approaches • Slud and Wei (JASA, 1982) – specify exit probabilities for each look (stage) such that they sum to α, e.g., the prob of exiting at the kth stage is the joint prob of not exiting in the first k-1 stages and exiting at the kth one. • Lan-DeMets (Biometrika, 1983) – specify a use function or type I error spending function, e.g., at time zero, α used = 0 and with full information α used = 0.05 (or nominal level)
Spending Function α(t) [Figure: cumulative alpha spent, α(t), rising from 0 at t = 0 to .05 at t = 1, plotted over the information fraction t (the fraction of total information to be obtained in the study) and evaluated at two arbitrary points, t1 and t2, in the study.] t = (number of events observed at monitoring) / (total number of anticipated events). Cont Clin Trial 2000;21:190-207
Spending Functions [Figure: plots of Pocock-type and O’Brien-Fleming-type spending functions for a one-sided 0.025 significance level, for four analyses at 25%, 50%, 75% and 100% of the expected information.]
Approximate O’Brien-Fleming Boundaries Using Lan-DeMets Spending Function Approach: Overall Significance = 0.05 and 4 Looks
Interim Analysis   O’Brien-Fleming   OBF Lan-DeMets
1                  4.05              4.33
2                  2.86              2.96
3                  2.34              2.36
4                  2.02              2.01
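The two spending functions and the first Lan-DeMets boundary can be sketched as follows. The slides do not give formulas, so the functional forms here are an assumption: the commonly cited O'Brien-Fleming-type form 2 - 2Φ(z_{α/2}/√t) and Pocock-type form α ln(1 + (e - 1)t), for a one-sided level α. Function names are hypothetical. Only the first-look critical value is computed directly; later looks need recursive numerical integration, which is not attempted here.

```python
from math import e, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_type_spending(t, alpha=0.025):
    """Assumed O'Brien-Fleming-type form: alpha(t) = 2 - 2*Phi(z_{alpha/2} / sqrt(t))."""
    if t <= 0:
        return 0.0
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # 2 - 2*Phi(x) equals 2*Phi(-x); the lower tail is numerically safer.
    return 2 * _STD_NORMAL.cdf(-z / sqrt(t))


def pocock_type_spending(t, alpha=0.025):
    """Assumed Pocock-type form: alpha(t) = alpha * ln(1 + (e - 1) * t)."""
    return alpha * log(1 + (e - 1) * t)


def first_look_critical_value(t1, alpha=0.025):
    """At the first look the one-sided tail area simply equals alpha(t1)."""
    return -_STD_NORMAL.inv_cdf(obf_type_spending(t1, alpha))


if __name__ == "__main__":
    for t in (0.25, 0.50, 0.75, 1.00):
        print(t, obf_type_spending(t), pocock_type_spending(t))
    print("first-look z at t = 0.25:", round(first_look_critical_value(0.25), 2))
```

Both functions spend the full 0.025 at t = 1, but the O'Brien-Fleming-type function spends almost nothing early, which is why its first critical value is so large.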
Usual Choices for Information • Planned number of events in event-driven trial with common closing date chosen to achieve event target. • Follow-up time, e.g., percent of participants attending final follow-up visit in trial with fixed follow-up for each participant. • Calendar time, e.g., trial with common calendar closing date (e.g., to ensure some minimum follow-up for each participant) but not event-driven.
Beta-Blocker Heart Attack Trial (BHAT) • Placebo-controlled trial of propranolol in patients with a recent MI • Recruitment began in June 1978; planned termination June 1982; average of 3 years of follow-up and maximum of 4 • Primary endpoint – all-cause mortality • Event target - 629 deaths • Stopped early in October 1981 JAMA 1982; 247:1707-1714.
Interim Monitoring of BHAT Study
Look Number   Monitoring Date   Months Since Start   Cumulative Deaths   Logrank Statistic
1             May 1979          11 (.23)             56 (.09)            1.68
2             Oct 1979          16 (.33)             77 (.12)            2.24
3             Mar 1980          21 (.44)             126 (.20)           2.37
4             Oct 1980          28 (.58)             177 (.28)           2.30
5             Apr 1981          34 (.71)             247 (.39)           2.34
6             Oct 1981          40 (.83)             318 (.51)           2.82
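The parenthesized values in the monitoring table appear to be information fractions on two scales. A small sketch, assuming the denominators are the 629-death event target and the 48 months between the June 1978 start and the planned June 1982 termination:

```python
# Values copied from the BHAT monitoring table.
cumulative_deaths = [56, 77, 126, 177, 247, 318]
months_since_start = [11, 16, 21, 28, 34, 40]

EVENT_TARGET = 629    # planned number of deaths
PLANNED_MONTHS = 48   # assumption: June 1978 to June 1982

# Information fraction = information observed so far / total planned information.
event_fraction = [round(d / EVENT_TARGET, 2) for d in cumulative_deaths]
calendar_fraction = [round(m / PLANNED_MONTHS, 2) for m in months_since_start]

if __name__ == "__main__":
    for look, (ef, cf) in enumerate(zip(event_fraction, calendar_fraction), start=1):
        print(f"look {look}: events {ef:.2f}, calendar {cf:.2f}")
```

At the sixth look the trial was 83% complete in calendar time but had only about half of the planned deaths, so the choice of information scale changes how much alpha has been spent.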
Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)
Interim Analysis   OBF    Lan-DeMets (OBF), Events   Lan-DeMets (OBF), Calendar
1                  5.46   8.00                       4.53
2                  3.85   8.00                       3.73
3                  3.15   4.86                       3.20
4                  2.73   4.08                       2.75
5                  2.44   3.41                       2.47
6                  2.23   2.95                       2.28
7                  2.06   1.97                       2.05
Logrank Z = 2.82
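Comparing the observed BHAT logrank statistics with each set of critical values is a simple look-by-look check. A minimal sketch, with the Z values and boundaries copied from the tables above; the helper name `first_crossing` is hypothetical:

```python
def first_crossing(z_values, boundaries):
    """Return the 1-based look at which |Z| first reaches its boundary, or None."""
    for look, (z, bound) in enumerate(zip(z_values, boundaries), start=1):
        if abs(z) >= bound:
            return look
    return None


# Observed logrank Z at the six BHAT looks.
bhat_z = [1.68, 2.24, 2.37, 2.30, 2.34, 2.82]

# Critical values for the three monitoring plans in the table.
boundaries = {
    "OBF": [5.46, 3.85, 3.15, 2.73, 2.44, 2.23, 2.06],
    "Lan-DeMets (OBF), events": [8.00, 8.00, 4.86, 4.08, 3.41, 2.95, 1.97],
    "Lan-DeMets (OBF), calendar": [4.53, 3.73, 3.20, 2.75, 2.47, 2.28, 2.05],
}

crossings = {name: first_crossing(bhat_z, b) for name, b in boundaries.items()}

if __name__ == "__main__":
    for name, look in crossings.items():
        print(name, "->", look)
```

With these inputs the sixth-look statistic of 2.82 crosses the OBF and calendar-time boundaries but stays below the event-based value of 2.95, which illustrates how much the information scale matters.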
Flexible Number of Looks • Another advantage of the Lan-DeMets spending function approach is the flexibility with the number of looks. • Suppose BHAT had not been stopped and there were 3 more looks than planned before the end (10 total). • Looks 7-10 correspond to event-based information fractions of 0.65, 0.75, 0.85 and 1.0. • Stopping boundaries can be calculated conditional on the previous tests
Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)
Lan-DeMets (OBF)
Interim Analysis   7 Looks   10 Looks
 1                 8.00      8.00
 2                 8.00      8.00
 3                 4.86      4.86
 4                 4.08      4.08
 5                 3.41      3.41
 6                 2.95      2.95
 7                 1.97      2.58
 8                           2.41
 9                           2.26
10                           2.06
Suppose We Get To the 6th Analysis by A Different Route • Information fractions are .05, .20, .30, .40, .45 • Instead of .09, .12, .20, .28, and .39
Critical Values (z) for 2-sided Group Sequential Design with .05 Overall Significance and 7 Looks (BHAT)
Lan-DeMets (OBF)
Interim Analysis   7 Looks   7 Looks (different route)
1                  8.00      8.00
2                  8.00      4.89
3                  4.86      3.93
4                  4.08      3.33
5                  3.41      3.19
6                  2.95      2.98
Variations of the Theme • Asymmetric boundaries (e.g., non-significant harmful effect of new treatment) • Use upper boundary for superiority and less conservative boundary for harm (Z= -1.5 or –2.0, or OBF for efficacy and Pocock for harm) • Appropriate for an investigational product but probably not for a product already approved and used as part of standard of care • Multiple outcomes, e.g., efficacy and safety, and composites • Multiple trials (CHARM heart failure, Cox-2 chemo-prevention) • Futility and curtailed sampling procedures (conditional and unconditional power) • Repeated confidence intervals (e.g., use OBF critical values to compute interim CIs)
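Asymmetric Monitoring Boundary for Harm [Figure: Z statistic over the course of the trial, with a boundary on the benefit side and a Pocock-type boundary on the harm side; the values 2.4 and 1.5 are marked on the plot.]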
SMART Study Design
• Participants: CD4+ cell count > 350 cells/mm3
• Drug Conservation (DC) Strategy (n = 2720): stop or defer ART until CD4+ < 250; then episodic ART based on CD4+ cell count to increase counts to > 350
• Virologic Suppression (VS) Strategy (n = 2752): use of ART to maintain viral load as low as possible throughout follow-up
• Plan: 910 primary endpoints; 8 years average follow-up
• Intervention interrupted on 11 January 2006
N Engl J Med 2006.
SMART Guideline “…it is recommended that the DSMB consider early termination or protocol modification only when the O’Brien-Fleming boundary is crossed for the primary endpoint and the findings for the primary and the composite cardiovascular, metabolic endpoint are consistent...”
Interim Monitoring: O’Brien Fleming Boundaries for the Primary Endpoint, by DSMB Date
Interim Monitoring: O’Brien Fleming Boundaries for the Primary Endpoint, by Cut Date
SMART Primary and Supportive Endpoint Results
                                 DC Group      VS Group      HR (DC/VS)
Endpoint                         N     Rate    N     Rate    [95% CI]          P-value
OD or death (primary endpoint)   122   3.4     50    1.4     2.5 [1.8, 3.5]    <0.001
CVD, Renal, Liver                65    1.8     39    1.1     1.7 [1.1, 2.5]    0.009
- CVD                            48    1.3     31    0.8     1.6 [1.0, 2.5]    0.05
- Renal                          9     0.2     2     0.1     4.5 [1.0, 20.9]   0.05
- Liver                          10    0.3     7     0.2     1.4 [0.6, 3.8]    0.46
Futility • Usual definition - convincing evidence exists that the new treatment is not beneficial. • If this is the case, minimizing exposure to an ineffective treatment with potential toxicities and saving resources should lead to a consideration to stop the trial. • What is convincing? • Futility, more generally, can also be impacted by low event rate or slow enrollment (e.g., CVD mortality outcome in the Physician’s Health Study).
Conditional Power (or Stochastic Curtailment) to Assess Futility
• What is the probability of rejecting the null hypothesis (i.e., getting a significant result) given the data to date and my best guess about the future? E.g., the future data
  • will look like the past
  • will show no difference
  • will be as assumed in the design
Lan KKG, Wittes J, Biometrics, 1988.
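The three scenarios can be made concrete with the B-value formulation, in which B(t) = Z(t)√t and the remaining data contribute an independent normal increment. This is a hedged sketch, not the slides' own calculation: it assumes a single final test at a fixed one-sided critical value (ignoring interim stopping boundaries), and the 90% design power is a hypothetical default.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def conditional_power(z_interim, info_frac, drift, alpha_one_sided=0.025):
    """P(final Z >= z_crit | interim Z) under an assumed drift.

    drift is the expected Z at full information for the remaining data.
    Assumption: fixed final critical value, no adjustment for interim looks.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha_one_sided)
    b_value = z_interim * sqrt(info_frac)             # B(t) = Z(t) * sqrt(t)
    mean_final = b_value + drift * (1 - info_frac)    # E[B(1) | B(t)]
    sd_remaining = sqrt(1 - info_frac)                # SD of B(1) - B(t)
    return 1 - _STD_NORMAL.cdf((z_crit - mean_final) / sd_remaining)


def drift_scenarios(z_interim, info_frac, alpha_one_sided=0.025, design_power=0.90):
    """Drift under the three 'best guesses about the future' listed above."""
    return {
        "future like the past": z_interim / sqrt(info_frac),  # B(t) / t
        "no difference": 0.0,
        "as assumed in the design": _STD_NORMAL.inv_cdf(1 - alpha_one_sided)
        + _STD_NORMAL.inv_cdf(design_power),  # hypothetical design power
    }


if __name__ == "__main__":
    z, t = 1.0, 0.5  # illustrative interim result, not trial data
    for name, drift in drift_scenarios(z, t).items():
        print(name, round(conditional_power(z, t, drift), 3))
```

A low conditional power even under the design assumption is the usual argument for futility; a low value only under "no difference" says much less.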