Statistical issues in the evaluation of predictive biomarkers

Statistical Issues in the Evaluation of Predictive Biomarkers

Richard Simon, D.Sc.

Chief, Biometric Research Branch

National Cancer Institute

http://brb.nci.nih.gov


Kinds of Biomarkers

  • Surrogate endpoint

    • Pre & post rx, early measure of clinical outcome

  • Pharmacodynamic

    • Pre & post rx, measures an effect of rx on disease

  • Prognostic

    • Which patients need rx

  • Predictive

    • Which patients are likely to benefit from a specific rx

  • Product characterization


Prognostic Biomarkers Can Be Therapeutically Relevant

  • 3-5% of node negative ER+ breast cancer patients require or benefit from systemic rx other than endocrine rx

  • Prognostic biomarker development should focus on specific therapeutic decision context


Predictive Biomarkers

  • In the past often studied as un-focused post-hoc subset analyses of RCTs.

    • Numerous subsets examined

    • Same data used to define subsets for analysis and for comparing treatments within subsets

    • No control of type I error

  • Led to conventional wisdom

    • Only hypothesis generation

    • Only valid if overall treatment difference is significant


  • Cancers of a primary site are often a heterogeneous grouping of diverse molecular diseases

  • The molecular diseases vary enormously in their responsiveness to a given treatment

  • It is feasible (but difficult) to develop prognostic markers that identify which patients need systemic treatment and which have tumors likely to respond to a given treatment

    • e.g. breast cancer and ER/PR, Her2


  • “Hypertension is not one single entity, neither is schizophrenia. It is likely that we will find 10 if we are lucky, or 50, if we are not very lucky, different disorders masquerading under the umbrella of hypertension. I don’t see how once we have that knowledge, we are not going to use it to genotype individuals and try to tailor therapies, because if they are that different, then they’re likely fundamentally … different problems…”

    • George Poste


The standard approach to designing phase III clinical trials is based on three assumptions

  • Qualitative treatment by subset interactions are unlikely

  • “Costs” of over-treatment are less than “costs” of under-treatment

  • It is not feasible to reliably evaluate treatments for subsets


  • Qualitative treatment by subset interactions are unlikely

    • Biology has shown that this is often false

  • “Costs” of over-treatment are less than “costs” of under-treatment

    • With today’s drugs this is economically unsustainable

  • It is not feasible to reliably evaluate treatments for subsets

    • With molecularly targeted treatment, and prospectively defined candidate subsets, this is feasible


Standard Clinical Trial Approaches

  • Have led to widespread over-treatment of patients with drugs from which few benefit

  • Possible failure to appreciate the effectiveness of some drugs in biologically restricted target populations



The Roadmap

  • Develop a completely specified genomic classifier of the patients likely to benefit from a new drug

  • Establish analytical and pre-analytical validity of the classifier

  • Use the completely specified classifier to design and analyze a new clinical trial to evaluate effectiveness of the new treatment with a pre-defined analysis plan that preserves the overall type-I error of the study.


Guiding Principle

  • The data used to develop the classifier must be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier

    • Developmental studies are exploratory

    • Studies on which treatment effectiveness claims are to be based should be definitive studies that test a treatment hypothesis in a patient population completely pre-specified by the classifier


New Drug Developmental Strategy I

  • Restrict entry to the phase III trial based on the binary predictive classifier, i.e. targeted design


Develop Predictor of Response to New Drug

  • Using phase II data, develop predictor of response to new drug

  • Patient predicted responsive: randomize to new drug or control

  • Patient predicted non-responsive: off study


Applicability of Design I

  • Primarily for settings where the classifier is based on a single gene whose protein product is the target of the drug

    • e.g. trastuzumab

  • With a strong biological basis for the classifier, it may be unacceptable to expose classifier negative patients to the new drug

  • Analytical validation, biological rationale and phase II data provide basis for regulatory approval of the test

  • Phase III study focused on test + patients to provide data for approving the drug


  • If a drug is found safe and effective in a defined patient population, approval should not depend on finding the drug ineffective in some other population

  • Consequently, if the drug is found safe and effective in biomarker classifier positive patients, approval of the drug should not be contingent on testing the drug in classifier negative patients


Evaluating the Efficiency of Strategy (I)

  • Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006

  • Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

  • reprints and interactive sample size calculations at http://linus.nci.nih.gov


  • Relative efficiency of targeted design depends on

    • proportion of patients who are test positive

    • effectiveness of new drug (compared to control) for test negative patients

  • When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients

  • The targeted design may require fewer or more screened patients than the standard design
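The dependence described above can be illustrated with a rough approximation. The sketch below assumes that the number of randomized patients scales as the inverse square of the average treatment effect in the enrolled population, and that the assay classifies patients perfectly. The function names and the parameters `gamma`, `delta_pos` and `delta_neg` are illustrative and do not come from the slides; the cited papers give more exact calculations.

```python
def randomized_ratio(gamma, delta_pos, delta_neg):
    """Approximate ratio of randomized patients: standard design / targeted design.

    gamma     -- proportion of patients who are test positive (assumed known)
    delta_pos -- treatment effect in test positive patients
    delta_neg -- treatment effect in test negative patients
    Assumes required sample size is proportional to 1 / (average effect)^2.
    """
    average_effect = gamma * delta_pos + (1 - gamma) * delta_neg
    return (delta_pos / average_effect) ** 2


def screened_ratio(gamma, delta_pos, delta_neg):
    """Approximate ratio: patients randomized in the standard design divided by
    patients screened in the targeted design (which screens n_targeted / gamma)."""
    return gamma * randomized_ratio(gamma, delta_pos, delta_neg)


# 25% test positive, no benefit in test negative patients:
# the targeted design randomizes about 16 times fewer patients
# and also screens fewer patients than the standard design randomizes.
print(randomized_ratio(0.25, 1.0, 0.0), screened_ratio(0.25, 1.0, 0.0))

# 25% test positive, half the benefit in test negative patients:
# fewer randomized patients, but more screened patients (ratio below 1).
print(randomized_ratio(0.25, 1.0, 0.5), screened_ratio(0.25, 1.0, 0.5))
```

A screened ratio above 1 means the targeted design screens fewer patients than the standard design randomizes; below 1 it screens more.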


Trastuzumab (Herceptin)

  • Metastatic breast cancer

  • 234 randomized patients per arm

  • 90% power for 13.5% improvement in 1-year survival over 67% baseline at 2-sided .05 level

  • If benefit were limited to the 25% assay + patients, overall improvement in survival would have been 3.375%

    • 4025 patients/arm would have been required
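The arithmetic behind these figures can be sketched as follows. The slide does not say which sample size formula was used; the sketch assumes a two-proportion normal approximation with a continuity correction, so it lands close to, but not exactly on, 234 and 4025 patients per arm. The function name `n_per_arm` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control, p_new, alpha=0.05, power=0.90):
    """Approximate patients per arm for comparing two proportions.

    Assumption: unpooled normal approximation plus a continuity
    correction; the slide's own method is not stated.
    """
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    delta = abs(p_new - p_control)
    variance_sum = p_control * (1 - p_control) + p_new * (1 - p_new)
    n = (z_alpha + z_beta) ** 2 * variance_sum / delta ** 2
    # Continuity correction inflates the uncorrected sample size.
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n_corrected)


baseline = 0.67          # 1-year survival on control
benefit = 0.135          # improvement in assay + patients
fraction_positive = 0.25

# If only assay + patients benefit, the overall improvement is diluted.
diluted_benefit = fraction_positive * benefit   # 0.03375, i.e. 3.375%

n_targeted = n_per_arm(baseline, baseline + benefit)
n_untargeted = n_per_arm(baseline, baseline + diluted_benefit)
print(diluted_benefit, n_targeted, n_untargeted)
```

Under these assumptions the untargeted trial needs roughly 17 times as many patients per arm, in line with the slide's comparison of 234 and 4025.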


Web Based Software for Comparing Sample Size Requirements

  • http://linus.nci.nih.gov/brb/


Developmental Strategy (II)

  • Develop predictor of response to new rx

  • Predicted responsive to new rx: new rx vs. control

  • Predicted non-responsive to new rx: new rx vs. control


Developmental Strategy (II)

  • Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan

  • Having a prospective analysis plan is essential

  • “Stratifying” (balancing) the randomization is useful to ensure that all randomized patients have tissue available but is not a substitute for a prospective analysis plan

  • The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets; not to modify or refine the classifier

  • The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier


Validation of EGFR biomarkers for selection of EGFR-TK inhibitor therapy for previously treated NSCLC patients

  • PFS endpoint

    • 90% power to detect 50% PFS improvement in FISH+

    • 90% power to detect 30% PFS improvement in FISH−

  • Evaluate EGFR IHC and mutations as predictive markers

  • Evaluate the role of RAS mutation as a negative predictive marker

  • 2nd line NSCLC with specimen, followed by FISH testing

    • FISH + (~ 30%): Erlotinib vs. Pemetrexed

    • FISH − (~ 70%): Erlotinib vs. Pemetrexed

  • Outcome: 1° PFS; 2° OS, ORR

  • 4 years accrual, 1196 patients (the schema also shows 957 patients)

  • 1-2 years minimum additional follow-up


Analysis Plan B (Limited confidence in test)

  • Compare the new drug to the control overall for all patients ignoring the classifier.

    • If poverall 0.03 claim effectiveness for the eligible population as a whole

  • Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients

    • If psubset 0.02 claim effectiveness for the classifier + patients.



Analysis Plan C (adaptive)

  • Test for difference (interaction) between treatment effect in test positive patients and treatment effect in test negative patients

  • If interaction is significant at level int then compare treatments separately for test positive patients and test negative patients

  • Otherwise, compare treatments overall
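The branching in Analysis Plan C can be sketched as a small function. The names are illustrative, and the 5% level used for the treatment comparisons is an assumption taken from the sample size planning for this plan rather than something this slide states.

```python
def analysis_plan_c(p_interaction, p_positive, p_negative, p_overall,
                    alpha_int=0.10, alpha=0.05):
    """Return which comparisons are reported and whether each is significant.

    p_interaction -- p-value for treatment-by-test interaction
    p_positive    -- treatment comparison in test positive patients
    p_negative    -- treatment comparison in test negative patients
    p_overall     -- treatment comparison in all patients
    alpha         -- assumed level for the treatment comparisons
    """
    if p_interaction <= alpha_int:
        # Significant interaction: evaluate each subset separately.
        return {
            "test positive": p_positive <= alpha,
            "test negative": p_negative <= alpha,
        }
    # No significant interaction: evaluate the treatment overall.
    return {"overall": p_overall <= alpha}


print(analysis_plan_c(0.04, 0.001, 0.60, 0.03))
print(analysis_plan_c(0.45, 0.02, 0.04, 0.001))
```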


Sample Size Planning for Analysis Plan C

  • 88 events in test + patients needed to detect 50% reduction in hazard at 5% two-sided significance level with 90% power

  • If 25% of patients are positive, when there are 88 events in positive patients there will be about 264 events in negative patients

    • 264 events provides 90% power for detecting 33% reduction in hazard at 5% two-sided significance level
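These event counts can be checked with a standard approximation for the log-rank test under 1:1 randomization, in which the required number of events is 4(z_α/2 + z_β)² / (ln HR)². The slide does not state its formula, so the sketch below is an assumption; it also assumes equal event rates in test positive and test negative patients, and the function names are illustrative.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def events_needed(hazard_ratio, alpha=0.05, power=0.90):
    """Events required to detect a hazard ratio (two-sided test, 1:1 allocation)."""
    z_sum = NORMAL.inv_cdf(1 - alpha / 2) + NORMAL.inv_cdf(power)
    return ceil(4 * z_sum ** 2 / log(hazard_ratio) ** 2)


def power_for_events(events, hazard_ratio, alpha=0.05):
    """Approximate power with a given number of events."""
    z_effect = sqrt(events) / 2 * abs(log(hazard_ratio))
    return NORMAL.cdf(z_effect - NORMAL.inv_cdf(1 - alpha / 2))


events_positive = events_needed(0.5)            # 50% reduction in hazard
fraction_positive = 0.25
# With 25% of patients test positive, negatives contribute 3 times as many events.
events_negative = round(events_positive * (1 - fraction_positive) / fraction_positive)

print(events_positive, events_negative)
print(power_for_events(events_negative, 0.67))  # 33% reduction in hazard
```

Under these assumptions the calculation gives 88 events in test positive patients, 264 in test negative patients, and power of about 90% for a 33% reduction in hazard.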


Simulation Results for Analysis Plan C

  • Using int=0.10, the interaction test has power 93.7% when there is a 50% reduction in hazard in test positive patients and no treatment effect in test negative patients

  • A significant interaction and significant treatment effect in test positive patients is obtained in 88% of cases under the above conditions

  • If the treatment reduces hazard by 33% uniformly, the interaction test is negative and the overall test is significant in 87% of cases
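The simulated figures can be roughly reproduced without simulation. The sketch below assumes the estimated log hazard ratio in a subset with d events is approximately normal with variance 4/d, with 88 events in test positive and 264 in test negative patients. It also assumes the interaction test is one-sided at 0.10, which is the setting under which this approximation matches the 93.7% on the slide; the slide itself does not say how the test was set up. Function names are illustrative.

```python
from math import log, sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def interaction_power(hr_pos, hr_neg, d_pos, d_neg, alpha_int=0.10):
    """Approximate power of a one-sided interaction test (assumed set-up)."""
    standard_error = sqrt(4 / d_pos + 4 / d_neg)
    z_effect = abs(log(hr_pos) - log(hr_neg)) / standard_error
    return NORMAL.cdf(z_effect - NORMAL.inv_cdf(1 - alpha_int))


def overall_power(hazard_ratio, d_total, alpha=0.05):
    """Approximate power of the two-sided overall comparison."""
    z_effect = sqrt(d_total) / 2 * abs(log(hazard_ratio))
    return NORMAL.cdf(z_effect - NORMAL.inv_cdf(1 - alpha / 2))


# 50% hazard reduction in test positive patients, none in test negative patients.
p_interaction = interaction_power(0.5, 1.0, d_pos=88, d_neg=264)

# Uniform 33% hazard reduction: the interaction test is negative with
# probability 1 - 0.10, and the overall test is then applied. The two are
# treated as independent here, which is an approximation.
p_uniform = (1 - 0.10) * overall_power(0.67, d_total=88 + 264)

print(round(p_interaction, 3), round(p_uniform, 3))
```

Both values come out close to the slide's 93.7% and 87%, which suggests the simulation results are consistent with this normal approximation.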


Development of Genomic Classifiers

  • Single gene or protein based on knowledge of therapeutic target

  • Empirically determined based on evaluation of a set of candidate genes

  • Empirically determined based on genome-wide correlation of gene expression, copy number variation or genotype with patient outcome after treatment


Development of Genomic Classifiers

  • During phase II development or

  • After failed phase III trial using archived specimens.

  • Adaptively during early portion of phase III trial.


Conclusions

  • Academic research, industry, NCI and FDA have not adequately adapted to the fundamental discoveries of the heterogeneity of human cancers

  • There is great potential for developing treatments that are highly effective for the right patients using prognostic and predictive biomarkers

  • There is great potential for reducing the waste of economic resources from vast over-treatment of cancer patients

  • Critical path objectives are more likely to be achieved through development of predictive biomarkers than through development of surrogate endpoint biomarkers

