Power and sample size calculations Michael Væth, University of Aarhus

• Introductory remarks
• Two-sample problem with normal data
• Comparison of two proportions
• Sample size and power calculations based on Wald’s test
• Two-sample problem with censored survival data
• Non-inferiority trials and equivalence trials
• Sample size and confidence intervals
DSTS meeting, Copenhagen

Power and sample size calculations
“Investigators often ask statisticians how many observations they should make (fortunately, usually before the study begins). To be answerable, this question needs fuller formulation. There is resemblance to the question, How much money should I take when I go on vacation? Fuller information is needed there too. How long a vacation? Where? With whom?” Moses (NEJM, 1985)

Power and sample size calculations
• A study should:
  • Allow conclusive answers to the questions being addressed
  • Provide estimates of relevant quantities with sufficient precision
• Standard approach:
  • Identify a maximal risk of wrong conclusions, or quantify the size of a sufficient precision
  • Determine the minimum sample size for which the study achieves the design goals

Power and sample size calculations
Implementation of the standard approach:
• Use commercial special-purpose software
• Simulations
• Analytic methods

Two-sample problem with continuous outcome
• RCT, equal allocation probabilities
• Outcomes follow a normal distribution
• Means
• Common standard deviation, assumed known
• Expected treatment difference
• Minimal relevant difference
• Estimated treatment difference
• Hypothesis
• Test statistic

Two-sample problem with continuous outcome (2)
• Under the null hypothesis, the test statistic has a standard normal distribution.
• In general, the test statistic is normally distributed with standard deviation 1 and a mean that depends on the true treatment difference.

Two-sample problem with continuous outcome (3)
[Figure: two distributions of the test statistic, with the level of significance and the power marked. A: distribution of the test statistic under the null hypothesis. B: distribution of the test statistic for an alternative value of the treatment difference.]

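Two-sample problem with continuous outcome (4)
• The power is a sum of two tail terms; only one term contributes unless the power is close to the level of significance.
• Assume a positive treatment difference, so only the upper term matters.
• Basic relation: the mean of the test statistic equals the sum of the standard normal quantiles fixed by the level of significance and by the power.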

Two-sample problem with continuous outcome (5)
• Sample size for given power
• Power for given sample size
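A minimal sketch of the two calculations for the two-sample problem with known standard deviation. The notation is an assumption, not taken from the slides: `delta` is the treatment difference, `sigma` the common standard deviation, and N is the total sample size with equal allocation, so the estimated difference has standard error 2·sigma/√N. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def total_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Total N (both groups, 1:1 allocation) for a two-sided z-test."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    # Solve sqrt(N) * delta / (2 * sigma) = z_alpha + z_beta for N.
    return ceil(4 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)


def power_for_sample_size(n_total, delta, sigma, alpha=0.05):
    """Approximate power, keeping only the dominant tail term."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    mean_of_test_statistic = sqrt(n_total) * abs(delta) / (2 * sigma)
    return STD_NORMAL.cdf(mean_of_test_statistic - z_alpha)
```

For example, `total_sample_size(5, 10)` gives the total N for a difference of half a standard deviation at a two-sided 5% level and 80% power, and `power_for_sample_size` applied to that N returns a value slightly above 0.80 because N was rounded up.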

Two-sample problem with continuous outcome (6)
• The sample size is a product of two terms: one depends on the error probabilities, the other depends on the problem.
• Table of the error term for selected values of the error probabilities
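The table values are not reproduced in the text. As a hedged illustration, the error term can be tabulated as below, assuming it is the squared sum of the two standard normal quantiles (two-sided level `alpha`, power `power`). The function names and the selection of values are hypothetical.

```python
from statistics import NormalDist


def error_term(alpha, power):
    """(z_{1-alpha/2} + z_{power})^2 for a two-sided test."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def error_term_table(alphas=(0.05, 0.01), powers=(0.80, 0.90, 0.95)):
    """Map (alpha, power) to the error term, rounded to two decimals."""
    return {(a, p): round(error_term(a, p), 2) for a in alphas for p in powers}


for (a, p), value in error_term_table().items():
    print(f"alpha={a:.2f}  power={p:.2f}  error term={value:.2f}")
```

The error term is the same for every problem with a one-dimensional parameter and a normally distributed test statistic; only the problem-specific factor changes.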

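Comparison of two proportions
• Score test
• The basic relation changes, because the variance of the estimated difference under the null hypothesis differs from the variance under the alternative.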

Comparison of two proportions (2)
• Wald’s test
• The basic relation changes accordingly, and the simple structure N = (model term) × (error term) is recovered.
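A sketch of how the two versions of the calculation might be compared. It assumes the usual large-sample forms: the score test uses the pooled proportion for the null variance, while Wald’s test uses the variance under the alternative throughout. `w` is the fraction of the total sample allocated to group 1. The function name and the example proportions are hypothetical; the proportions behind Example 1 are not given in the text.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_two_proportions(p1, p2, w=0.5, alpha=0.05, power=0.80, test="wald"):
    """Total sample size for comparing two proportions (two-sided test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # N times the variance of the estimated difference under the alternative.
    v_alt = p1 * (1 - p1) / w + p2 * (1 - p2) / (1 - w)
    if test == "wald":
        root = (z_alpha + z_beta) * sqrt(v_alt)
    elif test == "score":
        # N times the variance under the null, using the pooled proportion.
        p_bar = w * p1 + (1 - w) * p2
        v_null = p_bar * (1 - p_bar) / (w * (1 - w))
        root = z_alpha * sqrt(v_null) + z_beta * sqrt(v_alt)
    else:
        raise ValueError("test must be 'wald' or 'score'")
    return ceil(root**2 / (p1 - p2) ** 2)
```

With equal allocation the two results are close; with unequal allocation they can differ in either direction, depending on which group receives the larger fraction.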

Comparison of two proportions (3)
Example 1

Setting                   N(Score)   N(Wald)
Example 1                 2894       2888
Other sample fractions    4610       4278
                          4422       4749

Sample size and power calculations based on Wald’s test
• Data and statistical model
• Question: hypothesis about a one-dimensional parameter
• Wald’s test
• Sample size for given power
• Power for given sample size

Sample size and power calculations based on Wald’s test (2)
Example 1 (ctd.)
Same problem, but now use Wald’s test based on ln(odds)

N(Score)   N(Wald)   N(Wald, ln(odds))
2894       2888      2906
4610       4278      4778
4422       4749      4304
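A hedged sketch of the ln(odds) version. It assumes the large-sample variance of the estimated log odds ratio, 1/(n1·p1·q1) + 1/(n2·p2·q2), and keeps the (model term) × (error term) structure. `w` is the fraction allocated to group 1. The function name and the example proportions are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def n_log_odds_ratio(p1, p2, w=0.5, alpha=0.05, power=0.80):
    """Total sample size for Wald's test of the log odds ratio."""
    z = NormalDist()
    error_term = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    log_or = log(p1 / (1 - p1)) - log(p2 / (1 - p2))
    # N times the variance of the estimated log odds ratio, divided by
    # the squared parameter value.
    model_term = (1 / (w * p1 * (1 - p1)) + 1 / ((1 - w) * p2 * (1 - p2))) / log_or**2
    return ceil(model_term * error_term)
```

On this scale the smaller group is most costly when it is the one with the smaller binomial variance, which is the reverse of the pattern for Wald’s test on the difference of proportions.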

Sample size and power calculations based on Wald’s test (3)
Use of simulations
• The computer generates a large number of independent samples of a given size from a scenario representing a relevant difference.
• Power is estimated as the proportion of samples for which Wald’s test is statistically significant at the chosen level.
• Sample size for a given power level
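The simulation recipe can be sketched as follows for the comparison of two proportions. This is an illustration under stated assumptions, not the procedure used in the talk: equal group sizes, Wald’s test on the difference of proportions, and a fixed seed for reproducibility. The function name and arguments are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_power(p1, p2, n_per_group, alpha=0.05, n_sim=2000, seed=1):
    """Estimate power as the fraction of simulated trials that reject."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_sim):
        # Draw the number of events in each group as a sum of Bernoulli trials.
        x1 = sum(rng.random() < p1 for _ in range(n_per_group))
        x2 = sum(rng.random() < p2 for _ in range(n_per_group))
        ph1, ph2 = x1 / n_per_group, x2 / n_per_group
        se = sqrt(ph1 * (1 - ph1) / n_per_group + ph2 * (1 - ph2) / n_per_group)
        # Wald's test: reject when |estimate / standard error| exceeds z_crit.
        if se > 0 and abs(ph1 - ph2) / se > z_crit:
            significant += 1
    return significant / n_sim
```

The estimate carries Monte Carlo error of roughly sqrt(power × (1 − power) / n_sim), so `n_sim` should be chosen with the required precision in mind.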

Sample size and power calculations based on Wald’s test (4)
Use of simulations
• Sample size multiplier
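The formula for the multiplier is not preserved in the text. One plausible reading, offered here as an assumption, is that the power simulated at a trial sample size N0 is converted to the sample size for the target power through the basic relation, in which the mean of the test statistic grows with √N. The function name is hypothetical.

```python
from statistics import NormalDist


def sample_size_multiplier(power_at_n0, target_power=0.80, alpha=0.05):
    """Factor by which N0 must be multiplied to reach the target power.

    Assumes the mean of the test statistic is proportional to sqrt(N),
    so that z_alpha + z_power scales with sqrt(N).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    current = z_alpha + z.inv_cdf(power_at_n0)
    if current <= 0:
        raise ValueError("simulated power is too low for the approximation")
    return ((z_alpha + z.inv_cdf(target_power)) / current) ** 2
```

For instance, a simulated power of 0.50 at N0 suggests roughly doubling the sample size to reach 80% power at the two-sided 5% level. The simulated power must lie strictly between 0 and 1.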

Two-sample problem with censored survival data
• Time-to-event data
• Two samples, proportional hazards model
• Hazard rates
• Parameter of interest
• Wald’s test, expressed through the number of events in group i

Two-samples with censored data (2)
• Wald’s test statistic is approximately normal with sd = 1 and a mean that depends on the average probability of an event in group i
• Sample size
• The sample size depends primarily on the number of events
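A sketch of the sample size calculation for censored data. It assumes that the parameter of interest is the log hazard ratio, that its estimate has variance approximately 1/d1 + 1/d2 with d_i the expected number of events in group i, and that allocation is equal. The function name and arguments are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def survival_sample_size(hazard_ratio, event_prob_1, event_prob_2,
                         alpha=0.05, power=0.80):
    """Return (total N, expected number of events) for 1:1 allocation."""
    z = NormalDist()
    error_term = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    # d_i = (N / 2) * event_prob_i, so N * (1/d1 + 1/d2) is the bracket below.
    variance_factor = 2 / event_prob_1 + 2 / event_prob_2
    n_total = ceil(error_term * variance_factor / log(hazard_ratio) ** 2)
    expected_events = n_total / 2 * (event_prob_1 + event_prob_2)
    return n_total, expected_events
```

Halving both event probabilities doubles N but leaves the required number of events essentially unchanged, which illustrates why the number of events drives the design.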

Two-samples with censored data (3)
Example 2
• Design of an RCT with a survival endpoint
• Comparison of new and standard treatment
• Endpoint: all-cause mortality
• Requirements: max. 6 years; power = 80% for HR = 0.8
[Figure: study timeline. Study start at 0, accrual ends at A, study ends at T = A + F. Accrual period from 0 to A, follow-up period from A to T.]
• Cases considered: no additional follow-up; in general

Two-samples with censored data (3)
Example 2 (ctd.)
[Figure: KM-estimate for standard treatment, shown as 1 - KM.]
• Standard treatment: average event probability = AUC/baseline
• Average event probability with new treatment
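The "AUC/baseline" rule can be sketched numerically. The sketch assumes uniform accrual over a period of length A followed by F years of additional follow-up, so that individual follow-up times range from F to A + F, and it assumes proportional hazards, under which the survival function with the new treatment is the standard one raised to the power HR. The exponential survival curve in the usage note is purely illustrative; it is not the KM-estimate of Example 2.

```python
from math import exp


def average_event_probability(survival, accrual, follow_up,
                              hazard_ratio=1.0, steps=1000):
    """Average of 1 - S(t)**HR over t in [F, A + F] (midpoint rule)."""
    if accrual == 0:
        # Everyone enters at time 0 and is followed for exactly F.
        return 1 - survival(follow_up) ** hazard_ratio
    width = accrual / steps
    area = sum(
        1 - survival(follow_up + (k + 0.5) * width) ** hazard_ratio
        for k in range(steps)
    ) * width
    # Area under 1 - S divided by the length of the baseline interval.
    return area / accrual


def exponential_survival(t, rate=0.1):
    """Hypothetical survival curve used only for illustration."""
    return exp(-rate * t)
```

For example, `average_event_probability(exponential_survival, 4, 2)` gives the standard-treatment value for 4 years of accrual and 2 years of follow-up, and passing `hazard_ratio=0.8` gives the corresponding value with the new treatment.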

Two-samples with censored data (4)
Example 2 (ctd.)
• 635 events are needed to meet the design requirements
• This can be achieved in different ways
• 6 designs with the same expected number of events (635)
• Competing risks: replace 1 - KM with the cumulative incidence

Non-inferiority & equivalence trials
• Minimal relevant difference ⇒ maximal irrelevant difference
• Null hypothesis

Non-inferiority & equivalence trials
Two-sample problem with normal data (Wald’s test approach for 1-parameter problem)
• Non-inferiority: a one-sided hypothesis
• Basic relation
• Sample size
Note: If the power is assessed at a zero difference, the sample size needed to achieve this power will be underestimated if the effect of the new product is less than that of the active control.
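A sketch of the non-inferiority calculation for normal data. The set-up is an assumption: `true_diff` is the mean of the new product minus the mean of the active control (larger is better), `margin` is the maximal irrelevant difference, the null hypothesis is that the difference is at most −margin, and the test is one-sided at level `alpha`. N is the total sample size with equal allocation. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def non_inferiority_sample_size(sigma, margin, true_diff=0.0,
                                alpha=0.025, power=0.80):
    """Total N for a one-sided non-inferiority z-test (1:1 allocation)."""
    distance = true_diff + margin  # distance from the null boundary
    if distance <= 0:
        raise ValueError("true difference must lie above -margin")
    z = NormalDist()
    error_term = (z.inv_cdf(1 - alpha) + z.inv_cdf(power)) ** 2
    return ceil(4 * sigma**2 * error_term / distance**2)
```

Comparing `non_inferiority_sample_size(10, 5)` with `non_inferiority_sample_size(10, 5, true_diff=-1)` illustrates the note above: a new product that is slightly worse than the control needs a clearly larger sample than the calculation at zero difference suggests.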

Non-inferiority & equivalence trials
• Equivalence: intersection-union test
• Two one-sided tests
• Basic relations
• Sample size: the power is specified for a chosen value of the difference
Note: If the power is assessed at a zero difference, then the sample size needed to achieve this power is underestimated if the true difference is not zero.
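A sketch of the equivalence case with two one-sided tests, under assumptions not taken from the slides: a symmetric margin (−margin, +margin), each one-sided test at level `alpha`, known `sigma`, total N with equal allocation, and the usual normal approximation to the power. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def equivalence_power(n_total, sigma, margin, true_diff=0.0, alpha=0.05):
    """Approximate power of two one-sided z-tests for equivalence."""
    z = NormalDist()
    se = 2 * sigma / sqrt(n_total)
    z_alpha = z.inv_cdf(1 - alpha)
    upper = z.cdf((margin - true_diff) / se - z_alpha)
    lower = z.cdf((margin + true_diff) / se - z_alpha)
    # Both one-sided tests must reject; the approximation is floored at 0.
    return max(0.0, upper + lower - 1)


def equivalence_sample_size_at_zero(sigma, margin, alpha=0.05, power=0.80):
    """Total N when the power is specified at a true difference of zero."""
    z = NormalDist()
    # At zero difference the type II error is split between the two tests.
    error_term = (z.inv_cdf(1 - alpha) + z.inv_cdf((1 + power) / 2)) ** 2
    return ceil(4 * sigma**2 * error_term / margin**2)
```

Evaluating `equivalence_power` at the N returned by `equivalence_sample_size_at_zero`, first with `true_diff=0` and then with a non-zero `true_diff`, shows how quickly the power falls once the true difference moves away from zero.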

Sample size and confidence intervals
• Design phase:
  • Sample size considerations are traditionally phrased in the terminology of hypothesis testing
  • Formulas are derived by controlling error probabilities
• Reporting and interpreting results:
  • Focus on estimates and confidence intervals
  • Hypothesis tests are downplayed
Why not use the same approach on both occasions?

Sample size and confidence intervals
Power calculations when reporting the results?
• Probability statements should utilize the collected data and not be based on anticipated values of the parameters.
• Some statistical packages provide calculation of "post-hoc power" or "observed power", i.e. power computed at the estimated parameter value.
• This does not make sense: the power becomes a (known) function of the observed significance level (the p-value).
• Interpretation: probability of replication
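A short sketch of why "observed power" adds no information. It assumes a two-sided z-test, so that plugging the estimate in as the true value makes the mean of the test statistic equal to the observed |z|, which is itself determined by the p-value. The function name is hypothetical.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Power of a two-sided z-test evaluated at the estimated effect."""
    z = NormalDist()
    z_obs = z.inv_cdf(1 - p_value / 2)   # |z| implied by the p-value
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Both tail terms are kept; the second is negligible unless z_obs is small.
    return z.cdf(z_obs - z_alpha) + z.cdf(-z_obs - z_alpha)
```

A result with a p-value exactly equal to the significance level always has observed power of about 0.50, and any larger p-value gives a smaller observed power, whatever the data or the sample size.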

Sample size and confidence intervals
Sample size calculations based on confidence intervals?
• Two-sample problem with normal data (Wald’s test approach for 1-parameter problem)
• 95% confidence interval
• Choose the smallest N such that a confidence interval centered at the expected treatment difference excludes 0
• This corresponds to power = 0.50
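A sketch of the confidence-interval rule and of the power it implies. The notation is an assumption: `delta` is the expected difference, `sigma` the known common standard deviation, and N the total sample size with equal allocation, so the interval half-width is z·2·sigma/√N. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_interval_excludes_zero(delta, sigma, conf_level=0.95):
    """Smallest total N for which an interval centered at delta excludes 0."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Require the half-width z * 2 * sigma / sqrt(N) to be at most |delta|.
    return ceil(4 * sigma**2 * z**2 / delta**2)


def power_at(n_total, delta, sigma, alpha=0.05):
    """Approximate power of the matching two-sided z-test."""
    z = NormalDist()
    return z.cdf(sqrt(n_total) * abs(delta) / (2 * sigma) - z.inv_cdf(1 - alpha / 2))
```

The resulting N is about half of what an 80% power requirement would give for the same `delta` and `sigma`, because the rule ignores the sampling variation of the estimate around the expected difference.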

Sample size and confidence intervals
Use the fundamental relation between hypothesis tests and confidence intervals to formulate the sample size requirements in confidence interval terminology: Greenland (AJE, 1988), Daly (BMJ, 1991)
• To compute a sample size, specify:
  • The confidence level
  • The minimum parameter value that we wish to estimate unambiguously, i.e. with a confidence interval that excludes the null value
  • The probability of achieving this if the true value is this minimum value

Power and sample size calculations
A "commentary" on the world-wide-web: "How not to collaborate with a biostatistician"
http://www.xtranormal.com/watch/6878253/
