Robustness of an Odds-ratio Test in a Stratified Group Sequential Trial with a Binary Outcome Measure
Jianmin Pan
Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN, USA
Nankai University, July 2006
Background for the Problem
• Motivation: design of a clinical trial at St. Jude Children’s Research Hospital, Memphis, TN
• Subjects: patients undergoing treatment for acute lymphoblastic leukemia (ALL) or acute myeloid leukemia (AML) who receive intrathecal chemotherapy at the hospital
• Procedure: the cerebrospinal fluid (CSF) is accessed through lumbar puncture (LP)
• Problem: leakage of CSF during the procedure causes postdural puncture headache (PDPH)
• Treatment: two types of needles have been used for LP: (1) the Sprotte needle (Arm A) and (2) the Quincke needle (Arm B); Arm A is believed to carry a lower risk of CSF leakage than Arm B
• Research purpose: test whether the frequency of PDPH is lower in patients receiving LP with the Sprotte needle than in those receiving LP with the Quincke needle
Introduction
• Clinical trials often compare two or more treatment groups using a binary outcome measure
• Group sequential designs set up stopping boundaries for the interim analyses using standard procedures
• A stratified analysis based on the odds ratio is used in this research
• Several factors often affect the primary outcome of interest, but their true distribution and effect sizes are unknown
• Zelen’s block randomization scheme is employed to avoid significant imbalance between the two treatment groups with respect to the stratification factors
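The slides do not spell out Zelen’s scheme, so the sketch below substitutes a generic stratified permuted-block randomization. It serves the same stated goal (keeping Arms A and B balanced within each stratum) but should not be read as Zelen’s exact procedure. The block size of 4, the stratum labels, and the name `stratified_block_randomizer` are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_block_randomizer(block_size=4, seed=None):
    """Return a function assigning 'A' or 'B' to each new patient in a stratum.

    Each stratum keeps its own shuffled block containing equal numbers of
    'A' and 'B'; a new block is drawn when the current one is used up, so the
    arms are exactly balanced within a stratum after every complete block.
    """
    if block_size <= 0 or block_size % 2 != 0:
        raise ValueError("block_size must be a positive even number for 1:1 allocation")
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> assignments left in its current block

    def assign(stratum):
        if not pending[stratum]:
            block = ["A", "B"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return assign


# Hypothetical strata: (disease type, age group)
assign = stratified_block_randomizer(block_size=4, seed=7)
print([assign(("ALL", "1-9 years")) for _ in range(4)])
```

Keeping a separate block per stratum is what makes the balance hold within strata rather than only overall; a smaller block size tightens balance but makes the next assignment easier to predict.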
Statistical Problem
• To ensure that patients are not deprived of the better procedure, we used a group sequential design with five equally spaced interim analyses
• Roughly 20% of children undergoing LP with the Quincke needle are expected to experience PDPH
• LP with the Sprotte needle would be considered better if we can demonstrate a 10-percentage-point reduction in the incidence of PDPH (from 20% to 10%) with at least 80% power and type I error controlled at α = 0.05
• Hypothesis testing: $H_0\colon \pi_A = \pi_B$ versus $H_1\colon \pi_A < \pi_B$, where $\pi_A$ and $\pi_B$ are the probabilities of PDPH in Arms A and B
• A design that allows the trial to stop early to reject either the null or the alternative hypothesis requires enrolling at most 359 patients (East, 2000)
• The stopping rules for the five interim analyses are given in Table A (Jennison and Turnbull, 2000)
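As a rough cross-check of the design arithmetic, the sketch below computes the fixed-sample (non-sequential) size for detecting 20% vs. 10% with 80% power, using the usual normal approximation for two proportions. Treating the test as one-sided at α = 0.05 is our assumption, since the sidedness is not stated here; the maximum of 359 patients from East also accounts for the five analyses and early stopping, which this formula ignores. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def fixed_sample_size_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """Per-arm size for a one-sided two-proportion z-test (no interim looks)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treat) / 2
    root = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
            + z_beta * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat)))
    return ceil(root ** 2 / (p_control - p_treat) ** 2)


n = fixed_sample_size_per_arm(0.20, 0.10)
print(n, 2 * n)  # 157 314
```

Under this assumption a fixed design would need 314 patients in total. A group sequential design that preserves the same power generally needs a somewhat larger maximum sample size than the corresponding fixed design, so a maximum above 314 is expected.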
Stratified Analyses
• The incidence of PDPH was expected to differ by disease type (ALL vs. AML) and age (younger patients, 1-9 years, vs. older patients, 10-18 years)
• The a priori distribution of patients across these categories was unknown
• We therefore stratified the randomization by the two factors, using Zelen’s (1978) block randomization scheme to reduce imbalance between the two groups
• All patients were registered and randomized, consistent with the “intent-to-treat” principle
• The odds ratio between the two response probabilities $\pi_A$ and $\pi_B$: $\psi = \dfrac{\pi_A/(1-\pi_A)}{\pi_B/(1-\pi_B)}$
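A quick numerical check of the odds ratio, assuming Arm A’s odds go in the numerator: the design values of 10% (Arm A) and 20% (Arm B) give ψ ≈ 0.44, which matches the per-stratum odds ratio quoted for Scenario III in the concluding remarks. The helper names are illustrative.

```python
def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio of Arm A relative to Arm B."""
    return odds(p_a) / odds(p_b)


psi = odds_ratio(0.10, 0.20)
print(round(psi, 3))  # 0.444
```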
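Test Statistic
• Test statistic at the k-th analysis: (1)
where $\hat{V}_{jk}$ is the estimated variance of the stratum-specific log odds ratio $\log\hat{\psi}_{jk}$; $x_{Ajk}$ and $x_{Bjk}$ are the numbers of responses, and $n_{Ajk}$ and $n_{Bjk}$ the numbers of observations, in the j-th stratum at the k-th analysis in Arms A and B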
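The formula for statistic (1) did not survive on this slide. One common way to combine stratum-level odds ratios, sketched here as an assumption rather than as the author’s exact statistic, is the inverse-variance (Woolf) weighted log odds ratio. Each stratum uses the Woolf variance 1/x_A + 1/(n_A − x_A) + 1/x_B + 1/(n_B − x_B), with 0.5 added to every cell of a table that contains a zero. The Mantel-Haenszel estimator would be a reasonable alternative. Function names are hypothetical.

```python
from math import log, sqrt


def stratum_log_or(x_a, n_a, x_b, n_b):
    """Log odds ratio (Arm A vs. Arm B) and its Woolf variance for one stratum.

    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe correction).
    """
    cells = [x_a, n_a - x_a, x_b, n_b - x_b]
    if min(cells) == 0:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells  # A responders, A non-responders, B responders, B non-responders
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def stratified_z(strata):
    """Inverse-variance weighted stratified log-odds-ratio z statistic.

    strata: (x_a, n_a, x_b, n_b) per stratum at one analysis.
    Negative values favor Arm A (lower response probability).
    """
    num = 0.0
    total_weight = 0.0
    for x_a, n_a, x_b, n_b in strata:
        if n_a == 0 or n_b == 0:
            continue  # stratum has no between-arm comparison yet
        log_or, var = stratum_log_or(x_a, n_a, x_b, n_b)
        weight = 1 / var
        num += weight * log_or
        total_weight += weight
    return num / sqrt(total_weight) if total_weight > 0 else 0.0


print(round(stratified_z([(10, 100, 20, 100)]), 3))  # -1.946
```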
Simulation Study
• To assess the conditions under which this approach is reasonable, we undertook a simulation study
• We examined its behavior under various sampling schemes and a variety of response probabilities in the different strata
• We considered only two stratification factors: age and disease type
• The sample size was fixed at a maximum enrollment of 359 patients
• We evaluated how well the approach maintains type I error control and assessed its effect on power
• Each combination of response probabilities and sampling scheme was replicated 10,000 times
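The slides do not include simulation code. The following is a minimal Monte Carlo sketch of how rejection rates of a stratified group sequential test can be estimated, under several assumptions: an inverse-variance (Woolf) weighted log-odds-ratio statistic stands in for equation (1); patients are allocated by a fair coin rather than by blocks; there are five equally spaced looks up to 359 patients; and the boundaries are efficacy-only, O'Brien-Fleming-shaped values with a hypothetical final critical value of 1.75. These are not the Table A stopping rules (which also allow stopping in favor of the alternative) and have not been calibrated to give exactly α = 0.05. The default of 2,000 replicates keeps the run short; the study itself used 10,000. Function names are hypothetical.

```python
import random
from itertools import accumulate
from math import log, sqrt


def woolf_z(tables):
    """Inverse-variance weighted log-odds-ratio z; negative values favor Arm A.

    tables: per-stratum [x_a, n_a, x_b, n_b] (responses and sizes by arm).
    """
    num = den = 0.0
    for x_a, n_a, x_b, n_b in tables:
        if n_a == 0 or n_b == 0:
            continue  # no between-arm information in this stratum yet
        cells = [x_a, n_a - x_a, x_b, n_b - x_b]
        if min(cells) == 0:
            cells = [c + 0.5 for c in cells]  # zero-cell correction
        a, b, c, d = cells
        weight = 1 / (1 / a + 1 / b + 1 / c + 1 / d)
        num += weight * log(a * d / (b * c))
        den += weight
    return num / sqrt(den) if den > 0 else 0.0


def rejection_rate(sampling, p_a, p_b, bounds, stratified=True,
                   n_max=359, reps=2000, seed=2006):
    """Fraction of simulated trials in which H0 is rejected at some analysis.

    sampling: probability of each stratum; p_a, p_b: per-stratum response
    probabilities in Arms A and B; bounds: one critical value per equally
    spaced analysis, rejecting H0 when z <= -bound (efficacy stopping only).
    """
    rng = random.Random(seed)
    strata = range(len(sampling))
    cum_weights = list(accumulate(sampling))
    k_total = len(bounds)
    looks = [round(n_max * k / k_total) for k in range(1, k_total + 1)]
    rejections = 0
    for _ in range(reps):
        tables = [[0, 0, 0, 0] for _ in strata]
        enrolled = 0
        for look, bound in zip(looks, bounds):
            while enrolled < look:
                j = rng.choices(strata, cum_weights=cum_weights)[0]
                if rng.random() < 0.5:  # simple 1:1 coin-flip allocation
                    tables[j][1] += 1
                    tables[j][0] += rng.random() < p_a[j]
                else:
                    tables[j][3] += 1
                    tables[j][2] += rng.random() < p_b[j]
                enrolled += 1
            if stratified:
                analysed = tables
            else:  # pool the strata into a single 2x2 table
                analysed = [[sum(col) for col in zip(*tables)]]
            if woolf_z(analysed) <= -bound:
                rejections += 1
                break
    return rejections / reps


# Hypothetical O'Brien-Fleming-shaped boundaries (NOT the Table A values).
BOUNDS = [1.75 * sqrt(5 / k) for k in range(1, 6)]
equal = [0.25] * 4
type1 = rejection_rate(equal, [0.20] * 4, [0.20] * 4, BOUNDS)
power = rejection_rate(equal, [0.10] * 4, [0.20] * 4, BOUNDS)
print(type1, power)
```

Passing `stratified=False` pools the four strata into one 2×2 table, which gives the unstratified comparison discussed in the concluding remarks; other sampling schemes and stratum response probabilities are supplied through `sampling`, `p_a`, and `p_b`.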
Simulation Setup (4 Scenarios): Null Distribution
• Scenario I: equal or unequal sampling scheme, i.e., the probabilities of a patient falling into each of the four strata are the same or different, with the response probabilities assumed to be the same across the four strata, that is, or ( ).
• Scenario II: equal or unequal sampling scheme with different odds ratios across the four strata, assuming that and that is varied so that
• Scenario III: equal or unequal sampling scheme with the same odds ratio but different response probabilities across the four strata; that is, or .
• Scenario IV: unequal sampling scheme with different odds ratios across the four strata, assuming that and that is varied so that
Simulation Setup (4 Scenarios): Power Properties
• Scenario I: equal or unequal sampling scheme, with the response probabilities assumed to be the same across the four strata, that is, and
• Scenario II: equal or unequal sampling scheme with different odds ratios across the four strata, assuming that and that is varied so that
• Scenario III: equal or unequal sampling scheme with the same odds ratio but different response probabilities across the four strata; that is,
• Scenario IV: unequal sampling scheme with different odds ratios across the four strata, assuming that and that is varied so that
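When the odds ratio is held fixed while the response probabilities vary across strata, each Arm A probability follows from the Arm B probability and ψ via π_A = ψπ_B / (1 − π_B + ψπ_B). The sketch below assumes ψ = 4/9 ≈ 0.44, the odds ratio implied by 10% vs. 20%. With it, the Arm B values (.45, .10, .10, .15) reproduce the Arm A values (.267, .047, .047, .073) listed for Power Case 1 in Chart 3. The function name is illustrative.

```python
def arm_a_probability(p_b, psi):
    """Arm A response probability whose odds are psi times the odds of p_b."""
    odds_a = psi * p_b / (1 - p_b)
    return odds_a / (1 + odds_a)


PSI = 4 / 9  # odds ratio implied by 0.10 vs. 0.20 (about 0.44)
arm_b = [0.45, 0.10, 0.10, 0.15]
arm_a = [round(arm_a_probability(p, PSI), 3) for p in arm_b]
print(arm_a)  # [0.267, 0.047, 0.047, 0.073]
```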
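Chart 3 (Scenario III)
• Type I error control (stratum sampling probabilities; response probabilities, same in both arms)
Case 1: (.10, .10, .40, .40); (.30, .30, .10, .10)
Case 2: (.25, .25, .25, .25); (.25, .15, .20, .20)
Case 3: (.10, .20, .30, .40); (.30, .40, .50, .60)
Case 4: (.40, .40, .10, .10); (.70, .60, .50, .40)
Case 5: (.10, .10, .40, .40); (.60, .60, .10, .10)
Case 6: (.40, .40, .10, .10); (.80, .80, .10, .10)
• Power (stratum sampling probabilities; Arm A response probabilities // Arm B response probabilities)
Case 1: (.25, .25, .25, .25); (.267, .047, .047, .073) // (.45, .10, .10, .15)
Case 2: (.25, .25, .25, .25); (.073, .129, .160, .047) // (.15, .25, .30, .10)
Case 3: (.40, .40, .10, .10); (.800, .800, .100, .100) // (.90, .90, .20, .20)
Case 4: (.40, .10, .25, .25); (.193, .023, .100, .100) // (.35, .05, .20, .20)
Case 5: (.45, .30, .05, .20); (.308, .073, .023, .047) // (.50, .15, .05, .10)
Case 6: (.40, .40, .10, .10); (.600, .600, .100, .100) // (.77, .77, .20, .20)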
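Reading each power case as (stratum sampling probabilities; Arm A // Arm B response probabilities), which is our interpretation of the flattened chart, the weighted averages referred to in the concluding remarks can be computed directly. For Power Case 5, the weighted difference between the arms is about 0.12, while the raw (unweighted) difference is about 0.09, which shows how raw averages can misstate the effective difference. Function names are illustrative.

```python
def weighted_average(sampling, probs):
    """Marginal response probability: stratum probabilities weighted by sampling probabilities."""
    return sum(w * p for w, p in zip(sampling, probs))


def raw_average(probs):
    """Unweighted mean of the stratum response probabilities."""
    return sum(probs) / len(probs)


# Chart 3, power Case 5
sampling = [0.45, 0.30, 0.05, 0.20]
arm_a = [0.308, 0.073, 0.023, 0.047]
arm_b = [0.50, 0.15, 0.05, 0.10]

weighted_diff = weighted_average(sampling, arm_b) - weighted_average(sampling, arm_a)
raw_diff = raw_average(arm_b) - raw_average(arm_a)
print(round(weighted_diff, 3), round(raw_diff, 3))  # 0.121 0.087
```

Applied to Power Case 3, the same functions give a weighted difference of 0.10.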
Concluding Remarks
• Scenario I (Table 1):
1. Type I error is well controlled, close to the nominal 5%, for both the stratified and the unstratified methods.
2. Power estimates are in the desired range (about 80%).
• Scenario II (Table 2):
1. As long as a weighted average of the stratum response probabilities (weighted by the stratum sampling probabilities) of 0.20 is achieved for the treatment group (Arm A), type I error is well controlled for both methods, and an unequal sampling distribution has little impact on type I error control.
2. As long as the weighted average of the response probabilities is 0.10, one can be reasonably sure that 80% power will be achieved.
For Scenarios I and II, the type I error is well controlled and the power estimates are in the desired range.
Scenario III (Chart 3):
1. Type I error estimates based on the stratified analysis are well controlled, close to 5%.
2. When there is large variation among the stratum response probabilities, the unstratified analysis appears somewhat conservative, irrespective of the sampling scheme used.
3. Even though the odds ratio in each stratum is fixed at 0.44, the power estimates are highly variable, depending on the difference between the weighted averages of the response probabilities in the two groups.
4. The power estimate is more stable, and approximately 80%, when the difference between the weighted averages (Arm B minus Arm A) is approximately 0.10.
5. Furthermore, the stratified analysis shows consistently better power than the unstratified analysis.
If this difference is smaller than 0.10, the power could be substantially lower, and the unstratified approach is likely to yield even lower power. If this difference is larger than 0.10, the stratified approach is more likely to provide higher power than anticipated.
Scenario IV (Chart 4)
When and , then, depending on the sampling scheme:
1. Type I error is not controlled with either approach; the test can be either very conservative or invalid (anti-conservative).
2. There is no point in comparing the power estimates, since the test is not valid.
3. The power estimates are consistent with the null-distribution findings: power is either very low or artificially inflated.
Conclusions
Based on the simulation study presented above:
• The most important factor in ensuring that the proposed sample size delivers the desired power is the weighted average of the response probabilities.
• As long as the difference in the weighted averages between the two groups is maintained at about 0.10, we can be reasonably sure that the desired power of at least 80% will be achieved. Using the raw averages could lead to misleading conclusions.
• In most situations the stratified approach is superior to the unstratified approach and should be used in designing and analyzing clinical trials.
• One must check closely whether the observed weighted response probabilities for the two groups fall within the range specified when the study was designed.