Implementation and Analysis of Patient Reported Outcomes in Clinical Trials

Implementation and Analysis of PROs in Clinical Trials Jeff A. Sloan, Ph.D. DIA Meeting, D.C., June 26, 2005

Why is it difficult to deal with PROs? • Relatively recent acceptance • 25 years ago physicians were the sole raters of patient pain • JCAHO 2000 guideline: every patient’s pain to be assessed upon intake on a 0-10 scale • Time and experience alleviate novelty and skepticism

Checklist for designing, conducting and reporting HRQL / PRO in clinical trials
Source: Patient Reported Outcomes (PRO) and Regulatory Issues: A European Guidance Document for the improved integration of health-related quality of life assessment in the drug regulatory process. Chassany O and the ERIQA Working Group. Drug Information Journal 2002.
• HRQL / PRO objectives: added value of HRQL / PRO; choice of the questionnaires; hypotheses of HRQL / PRO changes
• Study design: basic principles of RCT fulfilled?; timing and frequency of assessment; mode and site of administration...
• HRQL / PRO measure: description of the measure (items, domains…); evidence of validity; evidence of cultural adaptation
• Statistical analysis plan: primary or secondary endpoint; superiority or equivalence trial; sample size; ITT, type I error, missing data
• Reporting of results: participation rate, data completeness; distribution of HRQL / PRO scores
• Interpreting the results: effect size; Minimal Clinically Important Difference; comparison with other criteria / scores; number needed to treat…

Take home messages: there is good news • There are problems with using PROs as indicators of efficacy in clinical trials. • There are scientifically sound solutions to these problems. • The problems have been disseminated widely and consistently. The solutions have not.

It takes a certain amount of bravery to work with PROs

Primary goal: advance the state of the science to help patients soar

How do you analyze PRO data?

“Science is a candle in the dark” - Carl Sagan. We will use the candle of science to improve the QOL of cancer patients

… by answering scientific questions • What is the value added of PROs to treatment trials? • How do you deal with multiple endpoints? • How do you handle missing data? • What is the clinical significance of PRO assessments?

What is the value added of additional questions?

Single-Item or Multiple-Item PRO?

Guidelines for endpoint determination • Several good references (Beitz, 1996; Chassany, 2002; Fayers, 1999; Sloan, 2002) • Are reliability and validity (R/V) data available? • If not, use pilot studies or focus groups to establish R/V • What aspects of PROs are likely to change? • Can one expect an overall change in well-being, health status or QOL?

Recipe for endpoint determination • List PRO aspects likely to change. • Operationalize each item from a tool. • Survey clinicians/patients if unsure. • Keep the total number of items under 25. • Mock up tables with “perfect world” data, labels with “perfect” results. • Link sample size to a priori clinical significance.
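The last step of the recipe, linking sample size to an a priori clinically significant difference, can be sketched as follows. This is a minimal illustration assuming a two-arm trial, a two-sided test, and the standard normal-approximation formula; the function name and the default alpha and power are illustrative assumptions, not values from the talk.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_sd, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a difference of
    `effect_size_sd` standard deviations between two group means
    (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_sd ** 2)


# Per-arm sample sizes for small, moderate, and large a priori differences.
for d in (0.2, 0.5, 0.8):
    print(f"effect size {d} SD: {n_per_group(d)} per arm")
```

Choosing the clinically significant difference first and deriving the sample size from it keeps the trial powered for a difference that matters to patients, rather than for whatever difference the available sample happens to detect.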

How do you deal with multiple endpoints?

An example of combined symptoms: Gemzar (gemcitabine) • Indication: advanced pancreatic cancer • Instrument or method: negotiated PRO outcome, “clinical benefit response” • PRO domains assessed: pain, analgesic consumption, performance status, weight • Results: clinical benefit response was experienced by 24% of patients receiving Gemzar versus 5% of patients receiving 5-FU, p=0.002

Gemzar-specific clinical benefit response

A patient was considered a clinical benefit responder to Gemzar if either: • The patient showed a ≥50% reduction in pain intensity or analgesic consumption, or an improvement of 20 or more points in performance status, for at least 4 weeks with no worsening of the other parameters (measured with the Memorial Pain Assessment Card and the Karnofsky Performance Scale) • The patient was stable on all of these parameters and showed a marked, sustained weight gain not due to fluid accumulation (>7% increase maintained for 4 weeks)
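The two routes to a clinical benefit response can be written as a small classification rule. This is a sketch under simplifying assumptions: each input is taken to be already summarized over the required 4-week window, and the class and field names are hypothetical rather than taken from the trial protocol.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    # All values are assumed to be sustained for at least 4 weeks.
    pain_reduction_pct: float       # % reduction in pain intensity
    analgesic_reduction_pct: float  # % reduction in analgesic consumption
    kps_improvement: float          # Karnofsky points gained
    any_worsening: bool             # True if any parameter worsened
    weight_gain_pct: float          # % weight gain not due to fluid


def is_clinical_benefit_responder(p):
    """Apply the two-route rule described on the slide."""
    improved = (p.pain_reduction_pct >= 50
                or p.analgesic_reduction_pct >= 50
                or p.kps_improvement >= 20)
    # Route 1: improvement on a primary parameter with no worsening elsewhere.
    if improved and not p.any_worsening:
        return True
    # Route 2: stable on the primary parameters plus marked weight gain.
    stable = not improved and not p.any_worsening
    return stable and p.weight_gain_pct > 7


patient = PatientSummary(60, 10, 0, False, 0)
print(is_clinical_benefit_responder(patient))
```

Collapsing several domains into one yes/no response sidesteps the multiple-endpoint problem, at the cost of fixing the thresholds in advance.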

O’Brien Global Test for Multiple Outcomes • Example: Venlafaxine for Hot Flashes • Hot flash frequency per day • Hot flash average severity per day (none, mild, moderate, severe, very severe, scored 0, 1, 2, 3, 4) • Hot flash score (severity times frequency) • Uniscale QOL • Hot flash effect on QOL • Toxicity incidence on 11 variables
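The first three hot flash endpoints are simple functions of a daily diary. The sketch below assumes the diary records a count of hot flashes at each severity level, which is an assumption about the data layout; the function and variable names are hypothetical.

```python
# Severity labels and their scores, as listed on the slide.
SEVERITY_POINTS = {"none": 0, "mild": 1, "moderate": 2, "severe": 3, "very severe": 4}


def daily_hot_flash_summary(counts_by_severity):
    """counts_by_severity: dict mapping a severity label to the number of
    hot flashes of that severity recorded in one day."""
    frequency = sum(counts_by_severity.values())
    points = sum(SEVERITY_POINTS[label] * n for label, n in counts_by_severity.items())
    average_severity = points / frequency if frequency else 0.0
    return {
        "frequency": frequency,
        "average_severity": average_severity,
        "score": average_severity * frequency,  # severity times frequency
    }


print(daily_hot_flash_summary({"mild": 2, "moderate": 1, "severe": 1}))
```

Because the score is the product of the other two endpoints, the three are strongly correlated, which is one reason to test them jointly rather than one at a time.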

O’Brien p-values • Endpoints Included / p-value • Hot Flash Frequency Hot Flash Average Severity 0.0071 Hot Flash Score 0.0050 Uniscale QOL 0.7528 Hot Flash Affects QOL Toxicity
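A global test of this kind can be sketched with the rank-sum form of O'Brien's procedure: rank each endpoint across all patients, sum the ranks per patient, and compare the rank sums between arms. The slides do not say which variant was used, so this is an assumption, and the data below are invented. All endpoints must be oriented so that larger values mean a better outcome.

```python
from math import sqrt
from statistics import mean, variance


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def obrien_rank_sum_t(group_a, group_b):
    """group_a, group_b: lists of per-patient tuples, one value per endpoint.
    Returns the pooled-variance t statistic on per-patient rank sums."""
    pooled = group_a + group_b
    rank_sums = [0.0] * len(pooled)
    for e in range(len(pooled[0])):
        for idx, r in enumerate(average_ranks([row[e] for row in pooled])):
            rank_sums[idx] += r
    a, b = rank_sums[:len(group_a)], rank_sums[len(group_a):]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


# Hypothetical two-endpoint data for three patients per arm.
treated = [(5, 3), (6, 4), (7, 5)]
control = [(1, 1), (2, 2), (3, 0)]
print(obrien_rank_sum_t(treated, control))
```

The statistic is referred to a t distribution with n_a + n_b - 2 degrees of freedom, giving a single p-value for the chosen set of endpoints.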

How do you handle the problem of missing data?

Impact of hydrazine sulfate on colorectal cancer patient QOL: impact of different imputation methods for missing data

Effect of imputation method on treatment comparison
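How much the imputation method can move a treatment comparison is easy to demonstrate on toy data. The slides do not name the methods compared in the hydrazine sulfate analysis; the three below (complete-case, last observation carried forward, and worst-score imputation) are common choices used here for illustration, and the scores are invented.

```python
from statistics import mean


def final_scores(patients, method):
    """patients: list of per-visit score lists, with None for a missing visit.
    Returns one final-visit value per patient under the chosen method."""
    out = []
    for visits in patients:
        last = visits[-1]
        if last is not None:
            out.append(last)
        elif method == "complete_case":
            continue                      # drop patients missing the final score
        elif method == "locf":
            observed = [v for v in visits if v is not None]
            if observed:
                out.append(observed[-1])  # carry the last observed value forward
        elif method == "zero":
            out.append(0)                 # assume the worst possible score
        else:
            raise ValueError(f"unknown method: {method}")
    return out


def treatment_difference(arm_a, arm_b, method):
    """Difference in mean final score, arm A minus arm B."""
    return mean(final_scores(arm_a, method)) - mean(final_scores(arm_b, method))


# Hypothetical QOL scores (0-100) at three visits.
arm_a = [[70, 75, 80], [60, 55, None], [65, None, None]]
arm_b = [[70, 70, 70], [60, 60, 60], [50, 40, None]]

for method in ("complete_case", "locf", "zero"):
    print(method, round(treatment_difference(arm_a, arm_b, method), 1))
```

In this invented example the estimated difference changes sign depending on the method, which is why it helps to report results under several imputation approaches rather than one.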

The data are usually trying to tell you something… you just have to pay attention

What is the clinical significance of PRO assessments?

Two general methods for clinical significance • Anchor-based methods require an independent, interpretable measure (the anchor) that correlates appreciably with the target measure • Distribution-based methods express the magnitude of the effect in terms of the variability of the results (effect size)
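The two approaches can be contrasted in a few lines. This is a sketch on invented data: the anchor here is a hypothetical global rating of change with a "a little better" category, and the distribution-based estimate divides the mean change by the baseline standard deviation, which is one common convention among several.

```python
from statistics import mean, stdev


def anchor_based_mid(changes, anchor_ratings, minimal_label="a little better"):
    """Mean PRO change among patients whose anchor rating falls in the
    minimal-improvement category."""
    selected = [c for c, a in zip(changes, anchor_ratings) if a == minimal_label]
    return mean(selected)


def distribution_based_effect_size(changes, baseline_scores):
    """Mean change expressed in units of the baseline standard deviation."""
    return mean(changes) / stdev(baseline_scores)


# Hypothetical data for six patients on a 0-100 scale.
changes = [2, 8, 12, 0, 10, 20]
anchors = ["no change", "a little better", "a little better",
           "no change", "a little better", "much better"]
baseline = [40, 50, 60, 55, 45, 50]

print(anchor_based_mid(changes, anchors))
print(round(distribution_based_effect_size(changes, baseline), 2))
```

The anchor-based result is in the units of the scale and depends on how well the anchor tracks the target; the distribution-based result is unitless and depends only on the spread of the scores.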

The MID method in one slide

The ERES Approach • QOL tool range = 6 standard deviations • SD estimate = 100 percent / 6 = 16.7% of theoretical range • Two-sample t-test effect sizes (Cohen): small, moderate, large effect (0.2, 0.5, 0.8 SD shift) • S, M, L effects = about 3%, 8%, 13% of range
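The ERES arithmetic is short enough to check directly. The sketch below assumes only what the slide states, that the theoretical range of the tool spans about six standard deviations; the function name is hypothetical. Before rounding, the products come to roughly 3.3, 8.3, and 13.3 percent of the range.

```python
def eres_thresholds(scale_min, scale_max, effect_sizes=(0.2, 0.5, 0.8)):
    """Translate Cohen's effect sizes into points on the scale, taking the
    SD to be one sixth of the theoretical range."""
    sd_estimate = (scale_max - scale_min) / 6
    return {d: d * sd_estimate for d in effect_sizes}


# On a 0-100 scale the thresholds are also percentages of the range.
for d, points in eres_thresholds(0, 100).items():
    print(f"{d} SD = {points:.1f} points")
```

The same function applies to a tool with any range, for example `eres_thresholds(0, 10)` for a 0-10 single-item scale.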

Assessing Clinical Significance • 1) Methods used to date • 2) Group versus individual differences • 3) Single item versus multi-item • 4) Patient, clinician, population perspectives • 5) Changes over time • 6) Practical considerations for specific audiences • MCP, April, May, June 2002

The solutions found for tumor response cutoffs may provide guidance • We call a reduction of 50% a response. • We see reductions of 49% all the time, but do not worry about misclassification. • Moertel (1976) is the basis for the 50% cutoff • Find a cutoff and stick to it?

The Good News • Statistical, Philosophical, Empirical, Clinical, Historical, Practical approaches to defining a clinically significant effect for symptom assessments are all in the same ballpark • A 10 point difference on a 100-point scale (1/2 SD) is almost always going to be clinically significant • Smaller differences may also be meaningful (data) • Applies to groups or individuals (just different SD) • Norman GR, Sloan JA, Wyrwich KW. Expert Review of Pharmacoeconomics and Outcomes Research, Sept 2004; 4(5): 515-519. • Sloan JA, Cella D, Hays R. J Clin Epidemiol (in press).
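Applying the 10-point rule to an instrument with a different raw range first requires rescaling to 0-100. A minimal sketch, assuming a simple linear transformation; the function names and example scores are hypothetical.

```python
def to_0_100(raw, raw_min, raw_max):
    """Linearly rescale a raw score onto a 0-100 scale."""
    return 100 * (raw - raw_min) / (raw_max - raw_min)


def meets_ten_point_rule(before, after, raw_min, raw_max, threshold=10):
    """True if the change, in either direction, is at least `threshold`
    points on the 0-100 scale."""
    change = to_0_100(after, raw_min, raw_max) - to_0_100(before, raw_min, raw_max)
    return abs(change) >= threshold


# One point on a 0-10 scale equals 10 points on the 0-100 scale.
print(meets_ten_point_rule(4, 5, raw_min=0, raw_max=10))
# Six points on a 0-112 scale is only about 5.4 points on the 0-100 scale.
print(meets_ten_point_rule(60, 66, raw_min=0, raw_max=112))
```

The same check works for a group mean difference or an individual change; what differs between the two settings is the standard deviation that the 10 points are being compared with.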

What’s next?

A Mayo/NCCTG meeting on FDA guidances on patient-reported outcomes (PRO): Discussion, Education, and Operationalization • FDA to release guidances for assessing PROs in all clinical trials (3rd quarter 2005?) • Meeting co-sponsored with FDA to: provide a focused process to facilitate discussion among all stakeholders; educate stakeholders on background, content, and concerns; provide an opportunity for input; delineate ways to best operationalize the guidance into clinical trials • February 23-25, 2006, DC (Westfields Marriott, Chantilly, VA, 7 miles from Dulles) • Seeking stakeholder involvement

New ideas have enabled us to make advances in PRO science. With your help, there will be more to come.

Thank you • References: jsloan@mayo.edu
