Design and Analysis of Clinical Trials

##### Presentation Transcript

1. Design and Analysis of Clinical Trials. Instructor: Jen-pei Liu, Ph.D., Department of Statistics, National Cheng-Kung University, and Division of Biostatistics, National Health Research Institutes. Lecture III: Statistical Principles for Analysis of Clinical Data

2. Statistical Methods for Biotechnology Products II. Statistical Principles for Analysis of Clinical Data. Instructor: Jen-pei Liu, Ph.D., Division of Biometry, Department of Agronomy, National Taiwan University, and Division of Biostatistics and Bioinformatics, National Health Research Institutes

3. Types of Data • Continuous Endpoints • Numerical discrete data • Heartbeats per minute • Total NIHSS • Total Hamilton Rating Scale for Depression • Total Alzheimer’s Disease Assessment Scale

4. Types of Data • Continuous Endpoints • Numerical continuous data • Age • Weight • ALT • Peak flow rate (liters per minute) • FEV1 (% of predicted value)

5. Types of Data • Categorical Endpoints • Nominal scale data: classification of patients according to their attributes • Gender • Race • Occurrence of a particular adverse reaction • Occurrence of ALT > 3 times the upper limit of normal

6. Types of Data • Ordered (ordinal scale) categorical data • A certain order among different categories • Symptom score 0 = no symptom, 1 = mild, 2 = moderate, 3 = severe • Severity of adverse reactions • Severity of disease

7. Types of Data • Censored Endpoints • Time to the occurrence of a pre-defined event • Combines time (continuous) and occurrence (categorical) • The event may not be observed for some patients; the time to the occurrence of the event for these patients is then censored

8. Types of Data • Chapman, et al (NEJM 1991; 324: 788-94) The use of prednisone in reduction of relapse within 21 days of the treatment of acute asthma in the emergency room. • Primary endpoint Time to unscheduled visit to clinics because of worsening asthma.

9. Types of Data • Cross-sectional vs. longitudinal data • Cross-sectional data (a snapshot at one time point): clinical data are collected and evaluated at a particular time point during the trial • Longitudinal data (snapshots at several time points): clinical data are collected and evaluated over a series of time points during the trial

10. Example: Knapp et al (JAMA 1994; 271: 985-991) • A multicenter trial with 33 centers • Double-blind, randomized, 4 parallel groups • Forced dose escalation; 30 weeks of randomized treatment • 6 visits: the start of randomized treatment (baseline) and weeks 6, 12, 18, 24, and 30 • Cross-sectional data: CIBI and ADAS-cog evaluated at the start of randomized treatment • Longitudinal data: a series of CIBI and ADAS-cog evaluations at the start of the study, the start of randomized treatment, and weeks 6, 12, 18, 24, and 30

11. Types of Comparison • Within-group (patient) comparison Comparison of the changes within the same patients at different time points during the trial. • Between-group (patient) comparison Comparison between groups of patients under different treatments.

12. Example: Major depressive disorder. Stark and Hardison (JCP 1985; 46: 53-58); Cohn and Wilcox (JCP 1985; 46: 21-31) • Double-blind, randomized, three parallel groups • One-week placebo washout period • Fluoxetine vs. imipramine vs. placebo • 6 weeks of randomized treatment • Primary efficacy endpoint: HAM-D score at the last follow-up visit • Within each group: change from baseline in HAM-D score • Between groups: comparison of the change from baseline in HAM-D score across groups
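The within-group and between-group comparisons above can be sketched as follows. The HAM-D scores are hypothetical numbers invented for illustration (they are not data from the cited trials), and "last visit minus baseline" is assumed as the definition of change, so a negative change means improvement.

```python
from statistics import mean

# Hypothetical (baseline, last-visit) HAM-D scores per patient; NOT trial data.
hamd = {
    "fluoxetine": [(26, 12), (24, 14), (28, 15), (25, 11)],
    "imipramine": [(27, 15), (25, 13), (26, 16), (24, 12)],
    "placebo": [(26, 20), (25, 19), (27, 21), (24, 18)],
}

def changes(pairs):
    """Within-patient change from baseline: last visit minus baseline."""
    return [last - base for base, last in pairs]

# Within-group comparison: mean change from baseline in each arm.
within = {arm: mean(changes(pairs)) for arm, pairs in hamd.items()}

# Between-group comparison: difference in mean change versus placebo.
between = {arm: within[arm] - within["placebo"] for arm in hamd if arm != "placebo"}

for arm, m in within.items():
    print(f"{arm}: mean change from baseline = {m:.2f}")
for arm, d in between.items():
    print(f"{arm} vs placebo: difference in mean change = {d:.2f}")
```

In a real analysis the between-group difference would be accompanied by a test statistic or confidence interval rather than a bare difference in means.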

13. Endpoints • Raw measurement at a time point • Change from baseline at a time point • Percent change from baseline at a time point • Attainment of a clinically meaningful target value at a time point, e.g., sitting DBP ≤ 85 mm Hg • Time points should be selected so that they can capture the effect of the intervention.

14. Selection of Endpoints • Endpoints should reflect the change of clinical status caused by the intervention. • Endpoints should be sensitive to the change of clinical status caused by the intervention. • Endpoints should be validated. • Raw measurements at a time point can only measure the static clinical status. • Change at a time point from baseline can measure the magnitude of the change of clinical status caused by the intervention. • Change from baseline has the same unit as the raw measurement

15. Selection of Endpoints • Percent change from baseline at a time point measures the relative magnitude of the change in clinical status caused by the intervention. • Percent change from baseline is unitless. • The same percent change may reflect different magnitudes of change: 20/100 = 2/10 = 200/1000 = 20%

16. Selection of Endpoints • One of the key inclusion criteria for clinical trials in the treatment of mild to moderate essential hypertension is a sitting DBP between 95 and 115 mm Hg. • Three changes from baseline, each a reduction of 10 mm Hg: 115 → 105, 105 → 95, 95 → 85. • Percent reductions from baseline: 8.7%, 9.5%, 10.5% • Only 95 → 85 reaches the clinically meaningful target value (sitting DBP ≤ 85 mm Hg).
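The arithmetic in the hypertension example can be checked with a short sketch. The helper names are illustrative, and the sign convention (negative percent change = reduction) is an assumption; the slide reports the reductions as positive percentages.

```python
# Sitting DBP (mm Hg) at baseline and at the evaluation time point, as on the slide.
TARGET_DBP = 85  # clinically meaningful target: sitting DBP <= 85 mm Hg

cases = [(115, 105), (105, 95), (95, 85)]

def change_from_baseline(baseline, value):
    """Absolute change, in the same unit as the raw measurement (mm Hg)."""
    return value - baseline

def percent_change(baseline, value):
    """Relative change, unitless; negative values are reductions."""
    return 100.0 * (value - baseline) / baseline

for baseline, value in cases:
    chg = change_from_baseline(baseline, value)
    pct = percent_change(baseline, value)
    reached = value <= TARGET_DBP
    print(f"{baseline} -> {value}: change = {chg} mm Hg, "
          f"percent change = {pct:.1f}%, target reached = {reached}")
```

All three cases share the same absolute change (-10 mm Hg), yet their percent changes differ and only one attains the target value, which is the point of the slide.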

17. Selection of Endpoints • Endpoints should have a clinically meaningful interpretation and applicability. • Order of preference: clinically meaningful target value > change from baseline > percent change from baseline. • Clinical investigators should be responsible for determining the efficacy endpoints used in clinical trials.

18. Selection of Endpoints

| | LDL | HDL | TG |
|---|---|---|---|
| Target value | < 100 mg/dL | 40-60 mg/dL | < 150 mg/dL |
| Bile acid binding resin | ↓ 15-30% | ↑ 3-5% | No change |
| Nicotinic acid | ↓ 5-25% | ↑ 15-35% | ↓ 15-25% |
| Fibric acid | ↓ 5-20% | ↑ 10-20% | ↓ 20-50% |
| HMG-CoA reductase inhibitor | ↓ 18-55% | ↑ 3-5% | ↓ 7-30% |

19. Descriptive Statistics • All statistics are estimates subject to sampling error • Continuous data • Central tendency: Mean (ȳ): the arithmetic average of all observations; Median: the middle observation • Dispersion: Standard deviation (s); Minimum: the smallest observation; Maximum: the largest observation; Range: maximum minus minimum • Log transformation: exp(mean on the log scale) = geometric mean on the original scale
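The summary statistics listed above can be computed with Python's standard `statistics` module. The ALT values below are hypothetical, chosen only to show a right-skewed sample where the geometric mean (back-transformed mean of the logs) falls below the arithmetic mean.

```python
import math
from statistics import mean, median, stdev

# Hypothetical ALT values (U/L); illustrative only, not from any trial.
alt = [18.0, 22.0, 25.0, 31.0, 40.0, 58.0, 95.0]

summary = {
    "n": len(alt),
    "mean": mean(alt),
    "median": median(alt),
    "sd": stdev(alt),  # sample standard deviation (n - 1 divisor)
    "min": min(alt),
    "max": max(alt),
    "range": max(alt) - min(alt),
}

# Geometric mean: exponentiate the mean of the log-transformed observations.
summary["geometric_mean"] = math.exp(mean(math.log(x) for x in alt))

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```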

20. Descriptive Statistics • Presentation of results • Individual groups • Comparative difference • Example Adkinson, et al (NEJM 1997;336:324-31) Immunotherapy for asthma in allergic children

21. Categorical Data • Proportion of patients with a certain attribute: the number of patients with the attribute divided by the total number of patients in the group • Present both counts and proportions (m, p) • Chapman, et al (NEJM 1991; 324: 788-94) The use of prednisone in reduction of relapse within 21 days of the treatment of acute asthma in the emergency room

22. Measures for comparison between groups • Difference in proportions • Relative risk: the ratio of the proportion in the test group to the proportion in the control group • Odds ratio: the ratio of the odds in the test group to the odds in the control group • Odds: the ratio of the number of patients with the attribute to the number of patients without the attribute

23. Categorical Endpoints • The difference in proportions gives the absolute magnitude of the difference. • Both the relative risk and the odds ratio give the relative magnitude of the difference. • 50% → 25% and 0.05% → 0.025% both yield a relative risk of 50%, but the differences in proportions are 25% and 0.025%, respectively. • The relative risk and the odds ratio are appropriate when the proportion of the event in the control group is small (< 5%). • When the proportion of the event is small (< 5%), the relative risk ≈ the odds ratio.
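The three comparison measures can be computed side by side for the two scenarios on the slide (control 50% vs. test 25%, and control 0.05% vs. test 0.025%). The function name is illustrative; the sketch only reproduces the point estimates, not their standard errors.

```python
def compare_proportions(p_test, p_control):
    """Return (difference, relative risk, odds ratio) of test vs. control."""
    difference = p_test - p_control
    relative_risk = p_test / p_control
    odds_test = p_test / (1 - p_test)
    odds_control = p_control / (1 - p_control)
    odds_ratio = odds_test / odds_control
    return difference, relative_risk, odds_ratio

# Scenarios from the slide: (control, test) proportions.
scenarios = [(0.50, 0.25), (0.0005, 0.00025)]
results = [compare_proportions(p_test, p_control) for p_control, p_test in scenarios]

for (p_control, p_test), (d, rr, or_) in zip(scenarios, results):
    print(f"control={p_control:.4%} test={p_test:.4%}: "
          f"difference={d:.4%}, RR={rr:.3f}, OR={or_:.3f}")
```

Both scenarios give RR = 0.5, but only in the rare-event scenario is the odds ratio close to the relative risk; with a 50% control rate the odds ratio (1/3) is noticeably further from 1.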

24. Censored Data • Kaplan-Meier curve (actuarial probabilities): the proportion of patients who have experienced a pre-defined event over a period of time • Median survival: the time by which the pre-defined event (e.g., death) has occurred in 50% of the patients • Hazard ratio: the ratio of the hazard of the pre-defined event in the test group to that in the control group
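A minimal, hand-rolled sketch of the Kaplan-Meier estimator and the median survival time is shown below, assuming right-censored data coded as 1 = event, 0 = censored. It returns the event-free probability S(t); the proportion of patients with the event, as plotted on the slides, is 1 - S(t). The follow-up times in the usage example are hypothetical. The hazard ratio is not computed here; in practice it is usually estimated with a Cox proportional hazards model.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the event-free probability S(t).

    times  -- follow-up time for each patient
    events -- 1 if the event occurred at that time, 0 if censored
    Returns a list of (event_time, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        # Count events and censorings tied at time t; patients censored at t
        # are still counted in the risk set at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= d + c
    return curve

def median_time(curve):
    """Smallest event time at which S(t) <= 0.5 (None if not reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical follow-up data (months); 0 marks a censored observation.
curve = kaplan_meier([2, 3, 3, 5, 6, 8, 9, 12], [1, 1, 0, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}, cumulative risk={1 - s:.3f}")
print("median survival:", median_time(curve))
```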

25. Example: Crawford, et al (NEJM 1989; 321: 419-24) • A controlled trial of leuprolide with and without flutamide in prostatic carcinoma • Randomized, double-blind, 2 parallel groups • Primary endpoint: overall survival

26. Kaplan-Meier Estimates of the Risk of Serious CV Events in the APC Trial by Treatment Arm*

27. Kaplan-Meier Estimates of the Risk of Serious CV Events in the APC Trial by Treatment Arm* *In this analysis, “serious CV events” include death from CV causes, MI, stroke, or heart failure. Solomon SD, et al: N Engl J Med 352, 2005

28. Inferential Statistics • Inference from the sample to the target population • A decision process for clinical hypotheses based on the trial objective through statistical testing procedures

29. Example: Farlow et al (JAMA 1992; 268: 2523-2529) • Randomized, double-blind, parallel groups • Objective: to compare tacrine (20, 40, and 80 mg per day) with placebo for probable Alzheimer’s disease • Null hypothesis: no difference in ADAS-cog scale between 80 mg of tacrine and placebo • Alternative hypothesis: there exists a true difference in ADAS-cog scale between 80 mg of tacrine and placebo

30. Example: The NINDS rt-PA Stroke Study Group (NEJM 1996; 335: 841-7) • Objective for Part I: a greater proportion of patients with acute ischemic stroke treated with t-PA, as compared with those given placebo, have early improvement (an improvement of ≥ 4 points from baseline on the NIHSS) • Primary efficacy endpoint: proportion of patients with improvement • Null hypothesis: no difference in the proportions of patients with improvement between t-PA and placebo • Alternative hypothesis: the difference in the proportions of patients with improvement between t-PA and placebo is at least 24%

31. Decision Based on Results

32. Decision Based on Results • Significance level (the consumer’s risk): the chance of concluding, based on the results, that there is a difference of at least 24% in improvement between t-PA and placebo when in fact there is no difference. • Power = 1 – producer’s risk: the chance of concluding, based on the results, that there is a difference of at least 24% in improvement between t-PA and placebo when in fact there is such a difference.
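To make the significance level and power concrete, here is a sketch of the usual normal-approximation power formula for a two-sided, two-sample comparison of proportions. The 30% placebo improvement rate and the sample sizes are hypothetical assumptions, not the NINDS design values; only the 24% difference and α = 0.05 come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control, delta, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for proportions.

    Normal approximation; ignores the negligible chance of rejecting in the
    opposite tail.
    """
    z = NormalDist()
    p_test = p_control + delta
    p_bar = (p_control + p_test) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)  # SE under H0
    se_alt = sqrt((p_control * (1 - p_control) + p_test * (1 - p_test)) / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf((abs(delta) - z_crit * se_null) / se_alt)

# Hypothetical placebo rate of 30%; difference of 24% from the slide.
for n in (20, 40, 60, 80):
    print(f"n per group = {n}: power = {power_two_proportions(0.30, 0.24, n):.3f}")
```

Holding α (the consumer's risk) fixed, larger groups lower the producer's risk and so raise the power.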

33. Statistical Testing Procedures • Step 1: State the null and alternative hypotheses • Null hypothesis (the one to be questioned): no difference in the proportions of patients with improvement between t-PA and placebo • Alternative hypothesis (the one of particular interest to investigators): the difference in the proportions of patients with improvement between t-PA and placebo is at least 24%

34. Statistical Testing Procedures • Step 2: Choose an appropriate test statistic, such as the two-sample t-statistic. • Step 3: Select the nominal significance level, i.e., the risk of type I error you are willing to commit (usually 5%).

35. Statistical Testing Procedures • Step 4: Determine the critical value, rejection region, and decision rule. For large samples, a two-sided alternative, and α = 0.05, the critical value is z(0.025) = 1.96, and the rejection region consists of all values of the test statistic whose absolute value is greater than 1.96. • Decision rule: reject the null hypothesis if the resulting test statistic falls in the rejection region.
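Step 4 can be sketched with the standard normal quantile function from Python's `statistics.NormalDist`. This assumes a large-sample z statistic and a two-sided alternative; the function names are illustrative.

```python
from statistics import NormalDist

def critical_value(alpha=0.05):
    """Two-sided critical value z(alpha/2) from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(test_statistic, alpha=0.05):
    """Decision rule: reject H0 if |test statistic| exceeds the critical value."""
    return abs(test_statistic) > critical_value(alpha)

print(f"z(0.025) = {critical_value(0.05):.4f}")
for stat in (-2.5, -1.5, 1.0, 2.1):
    print(f"statistic {stat:+.1f}: reject H0 = {reject_null(stat)}")
```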

36. Statistical Testing Procedures • Steps 1 to 4 should be determined and pre-specified in the Statistical Methods section of the protocol before initiation of the study.

37. Statistical Testing Procedures • Step 5: When the study is completed, or when the data become available for an interim analysis, compute the value of the test statistic specified in Step 2 (protocol). • Step 6: Make a decision based on the resulting value of the test statistic and the decision rule specified in Step 4 (protocol).

38. Statistical Testing Procedures • Conclusion • Reject the null hypothesis: sampling error is an unlikely explanation of the discrepancy between the null hypothesis and the observed values, and the alternative hypothesis is supported at a 5% risk of type I error. • Fail to reject the null hypothesis: sampling error is a likely explanation, and the data fail to provide sufficient evidence to doubt the validity of the null hypothesis. • Do NOT claim that the null hypothesis is accepted.
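Steps 5 and 6, together with the wording of the conclusion, can be sketched for a comparison of two proportions using the pooled large-sample z statistic. The counts are hypothetical and do not come from the NINDS trial; α and the critical value stand in for the values pre-specified in the protocol.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x_test, n_test, x_control, n_control):
    """Pooled two-sample z statistic for a difference in proportions (large samples)."""
    p_test = x_test / n_test
    p_control = x_control / n_control
    p_pool = (x_test + x_control) / (n_test + n_control)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_control))
    return (p_test - p_control) / se

# Step 5: compute the statistic from hypothetical counts (not trial results).
z_stat = two_proportion_z(x_test=60, n_test=150, x_control=40, n_control=150)

# Step 6: apply the decision rule fixed in advance (alpha = 0.05, two-sided).
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
if abs(z_stat) > z_crit:
    conclusion = "Reject H0: sampling error is an unlikely explanation."
else:
    conclusion = "Fail to reject H0 (do not claim that H0 is accepted)."
print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}. {conclusion}")
```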

39. P - value • If there is no difference in ADAS-cog between the two groups (i.e., the null hypothesis is true), the chance of obtaining a mean difference at least as large as the observed mean difference. • If p-value is small, it implies that the observed difference is unlikely to occur if there is no difference in ADAS-cog scale between 80mg of tacrine and placebo.

40. P-value • How small must the p-value be to conclude that there exists a true difference in ADAS-cog scale between 80 mg of tacrine and placebo? • It depends on the risk of committing a type I error that the investigator is willing to take. • Nominal significance level = risk of type I error (the chance of concluding that a true difference in ADAS-cog exists when in fact there is no difference)

41. P-value • If the observed p-value < the nominal significance level (i.e., the observed p-value < the risk of type I error), then conclude that there exists a true difference in ADAS-cog. • The nominal significance level is usually 5% or 1%. • The p-value for the observed difference in mean ADAS-cog is 0.015. • If the nominal significance level is 5%, then it is concluded that there is a difference in ADAS-cog between 80 mg of tacrine and placebo in the target population of patients with probable Alzheimer’s disease.

42. P-value • We cannot make the same decision if the nominal significance level is chosen to be 1%. • Always report the observed p-value, rather than only "p < 0.05", so that readers and reviewers can judge the strength of the evidence for themselves.
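The p-value logic above can be sketched as follows: a two-sided p-value from a z statistic, and the comparison of the slide's observed p-value (0.015) with nominal levels of 5% and 1%. A normal reference distribution is an assumption of this sketch; the original analysis may have used a different test.

```python
from statistics import NormalDist

def two_sided_p_value(z_stat):
    """P(|Z| >= |z_stat|) under H0, using the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

def significant(p_value, alpha):
    """Conclude a true difference only if the p-value is below the nominal level."""
    return p_value < alpha

# Observed p-value for ADAS-cog (80 mg tacrine vs. placebo) from the slide.
p_obs = 0.015
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: p = {p_obs} -> conclude a difference: {significant(p_obs, alpha)}")
```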

43. Confidence Interval • Example Adkinson, et al (NEJM 1997; 336: 324-31) Immunotherapy for asthma in allergic children

44. Confidence Interval • An interval estimate of the true population difference. • Confidence intervals are random intervals: they can differ if the same trial is repeated. • A 95% confidence interval means that, if the trial were repeated many times, about 95% of the intervals constructed in this way would cover the true difference in average PEFR between the two groups; (-7.8, 0.1) is one such interval. • When the interval and the test are based on the same statistic, a 95% confidence interval for the difference excludes 0 if and only if the two-sided p-value < 0.05.
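The repeated-sampling interpretation of a 95% confidence interval can be illustrated by simulation. This sketch assumes normally distributed outcomes, a hypothetical true difference of -4.0 in average PEFR, a standard deviation of 15, and 50 patients per group (none of these values come from the Adkinson trial), and uses a large-sample normal-approximation interval.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_difference(x, y, level=0.95):
    """Large-sample (normal approximation) CI for mean(x) - mean(y)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean(x) - mean(y)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return diff - z * se, diff + z * se

def coverage(true_diff=-4.0, sd=15.0, n=50, trials=1000, seed=1):
    """Fraction of simulated repeated trials whose 95% CI covers true_diff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(true_diff, sd) for _ in range(n)]  # test group
        y = [rng.gauss(0.0, sd) for _ in range(n)]        # control group
        lower, upper = ci_difference(x, y)
        hits += lower <= true_diff <= upper
    return hits / trials

print(f"empirical coverage: {coverage():.3f}")
```

The true difference is fixed; it is the intervals that vary from trial to trial, and roughly 95% of them cover it.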