2/27/03 Outline


• Part I: Misc. Statistical Issues
• Multiple comparisons in clinical trials
• Multiple endpoints
• Subgroups
• Adverse experience categorization
• Multivariate adjustment
• Part II: Multi-center trials and working with industry (Cummings left over)
Multiple comparisons
• The general problem
• Each statistical test at the 0.05 level has a 5% chance of a Type I error when there is no true effect
• We are wrong 1 time out of 20
• Easy to come up with spurious results
• Take a worthless drug (placebo 2) and compare it to placebo 1
• 1 study: P(Type I error) = 5%
• 2 studies: P(1 or 2 Type I errors) = almost 10%
• 20 studies: P(at least one significant) = 64%
• Publication bias
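The percentages on this slide follow from 1 - (1 - alpha)^k. A minimal sketch, assuming the k tests are independent and every null hypothesis is true; the function name is illustrative and not from the lecture:

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error across n_tests tests.

    Assumes the tests are independent and that no true effect exists.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # Reproduce the slide's cases: 1, 2 and 20 studies of a worthless drug.
    for k in (1, 2, 20):
        print(f"{k:>2} studies: {familywise_error_rate(k):.1%}")
```

Real trial comparisons are rarely independent (overlapping endpoints, nested subgroups), so this is best read as a rough guide rather than an exact error rate.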
Multiple comparisons: solutions?
• Bonferroni
• Divide overall significance level (alpha) by number of tests
• Unacceptable losses of power
• Use common sense/Bayesian
• Does result make sense?
• Biologic plausibility
• Is result supported by previous data?
• Was analysis defined a priori?
• Examples of problem in clinical trials
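The Bonferroni rule can be sketched in two equivalent ways: shrink the per-test threshold to alpha divided by the number of tests, or multiply each p-value by the number of tests and compare to alpha. The helper names and the example p-values below are hypothetical:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the overall level at alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: p times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    raw = [0.04, 0.001, 0.2]  # hypothetical p-values from three tests
    print("per-test threshold:", bonferroni_threshold(0.05, len(raw)))
    print("adjusted p-values:", bonferroni_adjust(raw))
```

With many tests the per-test threshold becomes very small, which is the loss of power the slide warns about.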
Multiple comparisons in RCTs are pervasive
• Monitoring of trials: look at results as they accumulate
• Lots of statistical machinery
• Multiple endpoints in a trial
• Primary endpoint: “all fractures” but also found significant reductions in hip fractures
• Primary endpoint: fractures, significant reductions in breast cancer
• Safety
• Subgroup analyses
• Multivariate analysis (adjustment) for baseline (BL) covariates
• Rothman, 1990
• A policy of not making adjustments for multiple comparisons is preferable
• “Scientists should not be so reluctant to explore leads that may turn out to be wrong that they penalize themselves for missing possibly important findings”
Multiple Endpoints: Making a Mountain Out of a Molehill
• Multiple Outcomes of Raloxifene Evaluation (MORE) trial
• Main outcome: vertebral fractures
• Secondary outcome: non-vertebral fractures
• Main osteoporotic subtypes: hip, wrist
• Overall, no effect of raloxifene on NV fractures
• Looked at 14 subtypes of fractures
• One significant: ankle. Wanted to title paper: “Raloxifene reduces ankle fractures”
Multiple Endpoints in PEPI: Strict Bonferroni Rule
• Postmenopausal Estrogen/Progestin Interventions (PEPI) trial (website)
• 4 treatment groups, several primary outcomes: all continuous
• Adjust all p-values to account for multiple comparisons
• Multiple primary endpoints (4)
• Within each endpoint, adjust for 4 treatments
Multiple endpoints
• Often many ways to slice the outcome pie
• Different subgroups of endpoints
• Fractures: all, leg, arm, rib, etc. (MORE)
• Multiple comparisons problems
• Some solutions
• Very explicit predefinition of endpoints
• Limit number of endpoints
• FDA: single endpoint only
Subgroups
• After primary analysis, want to look at subgroups
• Does effectiveness vary by subgroup?
• If drug effective, is it more effective in some populations?
• If results overall show no effect, does drug work in subgroup of participants?
• Are adverse effects concentrated in some subgroups?
Example: Efficacy of alendronate
• FIT II: Women with BMD T-score < -1.6 (osteopenic--only 1/3 osteoporotic)
• Women without existing vertebral fractures (2)
• Overall results: 14% reduction, p=.07
• Wimpy

[Figure: relative risk (95% CI), overall: 0.86 (0.73 - 1.01), P=0.07]

[Figure: relative risk (95% CI) by baseline femoral neck BMD T-score. Overall: 0.86 (0.73 - 1.01); T < -2.5: 0.64 (0.50 - 0.82); -2.5 < T < -2.0: 1.03 (0.77 - 1.39); T > -2.0: 1.14 (0.82 - 1.60)]

What to Do With an Unexpected Subgroup Finding
• Is this a real finding? (not really specified a priori)
• Has this been previously observed?
• Increase prior probability
• Ways to verify
• Examine for other similar subgrouping variables (BMD at hip, spine, radius)
• Examine for other similar endpoints (hip fractures, etc.)
• Most important: look at other trials, if possible and available
• Examine biologic plausibility
Effect of alendronate on hip fx depends on baseline hip BMD

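[Figure: relative hazard (± 95% CI) by baseline BMD T-score, log axis from 0.1 to 10]

| Baseline BMD T-score | Relative hazard (95% CI) |
| --- | --- |
| -1.6 to -2.5 | 1.84 (0.7, 5.4) |
| < -2.5 | 0.44 (0.18, 0.97) |
| Overall | 0.79 (0.43, 1.44) |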

Fosamax International Trial (FOSIT)
• 1908 women, 34 countries
• Lumbar spine BMD T-score < -2
• Alendronate (10 mg) vs. placebo
• One year follow-up
• BMD main endpoint
• 47% reduction in all clinical fractures (p<.05)

| Baseline hip BMD T-score | N | RR* | 95% CI |
| --- | --- | --- | --- |
| Overall | 1908 | 0.53 | (0.3, 0.9) |
| > -2 | 955 | 1.2 | (0.5, 2.9) |
| -2 to -2.5 | 279 | 0.32 | (0.07, 1.5) |
| < -2.5 | 674 | 0.26 | (0.1, 0.7) |

Subgroup analysis in HERS
• Overall no effect of HRT or perhaps harm in year 1
• Is there a subgroup who benefit?
• Is there subgroup with significant harm?
• Look at relative hazard (RH) within subgroups defined by baseline variables
• Medication use at baseline
• Prior disease
• Health habits
• Compare RH in those with and without risk factor
• RH in those using beta blockers compared to those not using
• RH > 1 ==> harm
• Get p-value for significance of difference of RH in those with and without
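One rough way to get a p-value for the difference between two subgroup estimates is a z-test on the log scale, recovering each standard error from its 95% CI. This is only a sketch: it assumes the two subgroups are independent and the CIs are symmetric on the log scale, and it is not necessarily how the trial analyses were done (those would more likely use an interaction term in the survival model). The inputs are the alendronate hip-fracture subgroup estimates quoted earlier in these slides, 0.44 (0.18, 0.97) and 1.84 (0.7, 5.4); all function names are illustrative.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% confidence interval


def log_se_from_ci(lower: float, upper: float) -> float:
    """Standard error of log(RH), recovered from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def interaction_test(rh_a, ci_a, rh_b, ci_b):
    """Compare two subgroup relative hazards.

    Returns (ratio of RHs, z statistic, two-sided p-value).
    """
    diff = math.log(rh_a) - math.log(rh_b)
    se = math.sqrt(log_se_from_ci(*ci_a) ** 2 + log_se_from_ci(*ci_b) ** 2)
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return math.exp(diff), z, p


if __name__ == "__main__":
    ratio, z, p = interaction_test(0.44, (0.18, 0.97), 1.84, (0.7, 5.4))
    print(f"ratio of RHs = {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

A small p-value here still has to be read against the number of subgroups examined.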
HERS: 4 years of HRT increased then decreased CHD Events

| Year | E + P | Placebo | RH | p-value |
| --- | --- | --- | --- | --- |
| 1 | 57 | 38 | 1.5 | .04 |
| 2 | 47 | 48 | 1.0 | 1.0 |
| 3 | 35 | 41 | 0.9 | .6 |
| 4 + 5 | 33 | 49 | 0.7 | .07 |

Year > 5: ???

P for trend = 0.009
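As a sanity check on the by-year pattern, the crude ratio of event counts can be compared with the reported RH. This sketch assumes the two arms had roughly equal numbers at risk in each period, which is an assumption made here; the reported RH values presumably come from a survival model rather than from this ratio.

```python
# CHD events per period as (E + P, placebo), copied from the table above.
EVENTS_BY_YEAR = {
    "1": (57, 38),
    "2": (47, 48),
    "3": (35, 41),
    "4 + 5": (33, 49),
}


def crude_event_ratio(treated_events: int, placebo_events: int) -> float:
    """Ratio of event counts; approximates the RH only if arms are comparable in size."""
    return treated_events / placebo_events


if __name__ == "__main__":
    for year, (treated, placebo) in EVENTS_BY_YEAR.items():
        print(f"year {year}: crude ratio = {crude_event_ratio(treated, placebo):.2f}")
```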

Subgroups: the final frontier in HERS

Relative hazard (E vs. placebo)

| Subgroup | N (%) | RH within subgroup | RH among others | p* |
| --- | --- | --- | --- | --- |
| history of smoking | 1712 (62) | 1.01 | 3.39 | .01 |
| current smoker | 360 (13) | 0.55 | 1.92 | .03 |
| digitalis use | 275 (10) | 4.98 | 1.26 | .04 |
| >= 3 live births | 1616 (58) | 1.09 | 2.72 | .04 |
| lives alone | 775 (28) | 2.97 | 1.14 | .05 |
| prior mi by chart review | 1409 (51) | 2.14 | 0.93 | .05 |
| beta-blocker use | 899 (33) | 2.89 | 1.15 | .06 |
| age >= 70 at randomization | 1019 (37) | 2.65 | 1.14 | .06 |

* Statistical significance of interaction

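Lots of subgroups were analyzed in HERS
Columns: subgroup, N (%), RH within subgroup, RH among others, ratio of the two RHs, p for interaction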
• history of smoking (at rv) 1712 (62) 1.01 3.39 0.30 .01
• current smoker (at rv) 360 (13) 0.55 1.92 0.29 .03
• digitalis use (at rv) 275 (10) 4.98 1.26 3.96 .04
• >= 3 live births 1616 (58) 1.09 2.72 0.40 .04
• lives alone (at rv) 775 (28) 2.97 1.14 2.60 .05
• prior mi by chart review (cr) 1409 (51) 2.14 0.93 2.30 .05
• beta-blocker use (at rv) 899 (33) 2.89 1.15 2.51 .06
• age >= 70 at randomization 1019 (37) 2.65 1.14 2.32 .06
• prior mi in most distant tertile 447 (16) 2.64 0.93 2.82 .07
• walk 10m or in exercise program (at rv) 1770 (64) 2.35 1.11 2.12 .08
• prior ptca by chart review (cr) 1189 (43) 0.92 1.98 0.46 .08
• prior mi within 2 years 420 (15) 3.20 1.28 2.50 .11
• tg > median (at rv) 1377 (50) 2.02 1.05 1.93 .12
• rales in the lungs (at rv) 80 ( 3) 0.43 1.65 0.26 .13
• digitalis or ace-inhibitor use (at rv) 653 (24) 2.33 1.24 1.88 .16
• previous ert for >= 12 months 302 (11) 4.19 1.41 2.98 .18
• serious medical conditions 1028 (37) 1.05 1.81 0.58 .21
• age >= 53 at lmp 578 (21) 3.19 1.38 2.31 .23
• hdl > median (at rv) 1315 (48) 1.18 1.95 0.61 .24
• lp(a) > median (at rv) 1378 (50) 1.26 2.08 0.60 .25
• use of non-statin llm (at rv) 420 (15) 0.89 1.69 0.52 .25
• married (at rv) 1588 (57) 1.26 1.98 0.64 .29
• lvef <= 40% 178 ( 6) 2.16 1.01 2.13 .31
• prior mi within 4 years 765 (28) 2.07 1.32 1.57 .32
• previous ert use for >= 1 year 327 (12) 2.86 1.41 2.03 .32
• prior mi within 1 year 194 ( 7) 2.88 1.43 2.02 .33
• chest pain (at rv) 982 (36) 1.25 1.88 0.67 .33
• dbp >= 90 mmhg (at rv) 149 ( 5) 0.91 1.62 0.56 .35
• prior ptca within 1 year 206 ( 7) 3.94 1.46 2.71 .38
• prior mi within 3 years 612 (22) 2.05 1.37 1.50 .40
• prior ptca within 4 years 838 (30) 1.15 1.70 0.68 .40
• use of any llm (at rv) 1296 (47) 1.23 1.76 0.70 .40
• diuretic use (at rv) 775 (28) 1.89 1.33 1.42 .41
• signs and symptoms of chf (at rv) 118 ( 4) 0.94 1.60 0.58 .42
• ace inhibitor use (at rv) 483 (17) 2.05 1.40 1.46 .44
• total cholesterol > median (at rv) 1377 (50) 1.32 1.80 0.74 .47
• l-thyroxine use (at rv) 414 (15) 2.29 1.43 1.60 .47
• poor/fair self-rated health (at rv) 665 (24) 1.30 1.72 0.76 .51
• heart murmur (at rv) 540 (20) 1.89 1.42 1.34 .53
• sbp >= 140 mmhg (at rv) 1051 (38) 1.37 1.72 0.80 .59
• prior ptca within 3 years 695 (25) 1.27 1.61 0.78 .62
• s3 heart sounds (at rv) 19 ( 1) 2.74 1.50 1.82 .63
• htn by physical exam (at rv) 557 (20) 1.32 1.62 0.81 .64
• >= 2 severely obstructed main vessels 1312 (47) 1.53 1.26 1.22 .69
• statin use (at rv) 1004 (36) 1.34 1.59 0.84 .71
• have you ever been pregnant 2564 (93) 1.55 1.15 1.35 .72
• calcium-channel blocker (at rv) 1511 (55) 1.61 1.38 1.17 .73
• previous hrt for >= 12 months 132 ( 5) 1.24 1.60 0.78 .77
• ldl > median (at rv) 1373 (50) 1.44 1.63 0.89 .77
• prior ptca within 2 years 475 (17) 1.35 1.56 0.87 .81
• baseline left bundle branch block 212 ( 8) 1.31 1.55 0.85 .82
• white 2451 (89) 1.48 1.62 0.92 .88
• ever told you had diabetes 634 (23) 1.48 1.53 0.97 .94
• aspirin use (at rv) 2183 (79) 1.51 1.56 0.97 .95
• any alcohol consumption (at rv) 1081 (39) 1.54 1.57 0.98 .97
• gallstones or gallbladder dis. 633 (23) 1.55 1.52 1.02 .97
• baseline atrial fibrillation/flutter 33 ( 1) - 1.50 - -

Total subgroups examined: 102

Total subgroups with p< .05: 6
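To put "6 of 102" in context, one can ask how many nominally significant subgroups chance alone would produce. A minimal sketch, treating the 102 tests as independent with a 5% false-positive rate each; that is a simplification, since many of these subgroups overlap. Function names are illustrative:

```python
import math


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected number of tests with p < alpha when no true effects exist."""
    return n_tests * alpha


def prob_at_least(k: int, n_tests: int, alpha: float = 0.05) -> float:
    """Binomial probability of k or more false positives among n_tests tests."""
    below = sum(
        math.comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    print("expected by chance:", expected_false_positives(102))
    print("P(6 or more by chance):", round(prob_at_least(6, 102), 2))
```

Under these assumptions about 5 significant subgroups are expected by chance, so finding 6 is unremarkable.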

Subgroups: conclusions
• Subgroups are full of statistical problems
• Multiple comparisons may lead to erroneous conclusions
• Limited power for subgroup analyses
• Subgroups based on baseline variables are less bad
• Subgroups based on post-randomization variables are more problematic
Safety assessment
• Often many categories (FIT: 200 or more)
• Some are rare
• Ex: Risedronate and lung cancer
• How to control for spurious findings?
• P-values almost meaningless
• AEs collected as “open text”
• Need to categorize and compare by treatment
• Options:
• Many categories: few events per treatment, low power
• Few categories: heterogeneous, may miss important effects
• No correct solution
• MedDRA coding
• ~15,000 standard clinical terms (“specific terms”)
• Various levels of grouping
• May be nonsensical in some situations
Multivariate adjustment
• Sometimes adjust for baseline variables
• Especially those that are maldistributed
• If algorithm for adjustment not pre-defined, adds subjective element to “objective” RCT
• Given ineffective treatment, with enough fiddling with adjustments, can come up with significant effect (Paul Meier)
• Conclusions: Many argue that one should NEVER do adjustments in RCTs
• If adjusting, severely limit plans
Statistical issues: Summary
• ITT (from 1/30 lecture):
• All participants remain on medication
• All participants are followed until end of study
• Pre-planned analysis
• Multiple comparisons are ubiquitous
• Monitoring
• Subgroup analyses
• Safety analyses
• Where possible, minimize subjectivity and ad hoc-ness
• Use judgement