Imperfect gold standards for biomarker evaluation

Imperfect Gold Standards for Biomarker Evaluation. Rebecca A. Betensky, Conference on Statistical Issues in Clinical Trials, University of Pennsylvania, April 18, 2012.


Imperfect Gold Standards for Biomarker Evaluation

Rebecca A. Betensky

Conference on Statistical Issues in Clinical Trials

University of Pennsylvania

April 18, 2012


Outline

  • Motivation: need for kidney injury biomarkers for diagnosis of acute kidney injury (AKI)

  • Impact of imperfect gold standard on apparent sensitivity and specificity of perfect biomarker

  • Examine conditional independence assumption: implicit restrictions

  • Bounds on true sensitivity and specificity

Serum creatinine for AKI

  • Clinicians have used SCr to diagnose AKI for decades.

  • Acknowledged as inadequate gold standard:

    • Poor specificity in some settings that are not associated with kidney injury

    • Poor sensitivity in setting of adequate renal reserve

    • Relatively slow kinetics after injury

  • Considerable interest in identifying better biomarkers of tubular injury: potentially more accurate and earlier diagnosis.

How to evaluate new biomarkers?

  • Studies have used changes in SCr as the gold standard against which to test novel tubular injury biomarkers.

  • Aside from problems of specificity and sensitivity,

    • SCr does not directly reflect tubular function or injury

    • Based on a cutoff, which will impact its true spec and sens, and thus that of novel marker.

Conceptual framework

  • Actual disease that is the target of the diagnostic test (AKI) is not synonymous with clinical conditions identified by imperfect gold standard (SCr).

  • AKI is difficult to establish without invasive and risky histopathological assessment.

  • Using imperfect gold standard (i.e., imperfect reference test) may distort apparent diagnostic performance of novel biomarker.

Idealized example of perfect novel biomarker

disease prevalence=20%

imperfect gold standard sensitivity=80%, specificity=80%

Relative to imperfect gold standard, a perfect novel biomarker will have apparent sensitivity of 50% and apparent specificity of 64/68=94%.
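
The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, assuming a perfect biomarker (B always equals the true disease status D); the function and variable names are hypothetical and not from the presentation.

```python
def apparent_performance_of_perfect_marker(prevalence, sens_g, spec_g):
    """Apparent sens/spec of a perfect biomarker (B == D) judged against G."""
    # Joint probabilities of true disease status D and gold-standard result G.
    d1_g1 = prevalence * sens_g                # diseased, G positive
    d1_g0 = prevalence * (1 - sens_g)          # diseased, G negative
    d0_g1 = (1 - prevalence) * (1 - spec_g)    # not diseased, G positive
    d0_g0 = (1 - prevalence) * spec_g          # not diseased, G negative
    # Because B == D, P(B=1 | G=1) is simply P(D=1 | G=1), and likewise for spec.
    apparent_sens = d1_g1 / (d1_g1 + d0_g1)
    apparent_spec = d0_g0 / (d0_g0 + d1_g0)
    return apparent_sens, apparent_spec


# Slide values: prevalence 20%, gold standard sens 80% and spec 80%.
sens, spec = apparent_performance_of_perfect_marker(0.20, 0.80, 0.80)
print(f"apparent sens = {sens:.2f}, apparent spec = {spec:.2f}")
# prints: apparent sens = 0.50, apparent spec = 0.94
```

Lowering `prevalence` in this sketch shrinks the true-positive share of G-positive results, which is why the apparent sensitivity is hit hardest.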

At lower prevalence, dominant effect of imperfect gold standard is on perfect biomarker’s apparent sensitivity:

apparent sens= apparent spec=

This is similar to imperfect gold standard=“need for dialysis”.

At prevalence of 20%, apparent sensitivity of perfect biomarker is 100% and apparent specificity is 84%. The bounds of the apparent AUC are 0.84-1.00.

Even rare false positives (imperfect gold standard spec=99%) lead to apparent sensitivity of 86% and bounds of apparent AUC of 0.72-0.98.
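
The AUC range of 0.84-1.00 quoted above is consistent with the smallest and largest areas under a monotone ROC curve that passes through a single operating point. A minimal sketch under that assumption (the function name is hypothetical, and this is not necessarily how the presenter computed the range):

```python
def apparent_auc_bounds(apparent_sens, apparent_spec):
    """Range of AUC for a monotone ROC curve through one (1 - spec, sens) point."""
    # Smallest area: the curve stays at 0 until the point, then stays at sens.
    lower = apparent_sens * apparent_spec
    # Largest area: the curve sits at sens up to the point, then jumps to 1.
    upper = apparent_sens + (1 - apparent_sens) * apparent_spec
    return lower, upper


# Apparent sens 100% and apparent spec 84%, as in the 20% prevalence example.
lower, upper = apparent_auc_bounds(1.00, 0.84)
print(f"apparent AUC between {lower:.2f} and {upper:.2f}")
# prints: apparent AUC between 0.84 and 1.00
```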

Cut-offs for SCr

  • Recent clinical studies of novel AKI biomarkers have used a variety of SCr criteria to define AKI.

  • These examples illustrate that different choices of cut-offs can lead to hugely different apparent properties of a novel biomarker.

What if new biomarker is not perfect?

  • Need assumptions on relationship between new biomarker and imperfect gold standard and disease to evaluate new biomarker.

  • Conditional independence is convenient; allows for latent class models.

  • However, it introduces implicit restrictions.

What can we learn for imperfect novel biomarker?

  • Previous illustration assumes perfect novel biomarker.

  • Common assumption is conditional independence: P(B=b|G=g,D=d)=P(B=b|D=d)

  • Apparent sensitivity of B relative to G:

  • Apparent specificity of B relative to G:

  • Use these to solve for “true sensitivity” and specificity of B relative to D

  • Bounds on apparent AUC:

    • Apparent AUC ≥ apparent sens × apparent spec

    • Apparent AUC ≤ apparent sens + (1 − apparent sens) × apparent spec
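
The expressions for apparent sensitivity and specificity did not survive in this transcript. Under the conditional independence assumption stated above, they can be written out by conditioning on D. The following is a reconstruction rather than the slide's own formula; the notation p = P(D=1), Se_G, Sp_G (sensitivity and specificity of G) and Se_B, Sp_B (true sensitivity and specificity of B) is assumed here.

```latex
\[
P(B=1 \mid G=1)
  = \frac{p\,Se_B\,Se_G + (1-p)\,(1-Sp_B)\,(1-Sp_G)}
         {p\,Se_G + (1-p)\,(1-Sp_G)}
\]
\[
P(B=0 \mid G=0)
  = \frac{(1-p)\,Sp_B\,Sp_G + p\,(1-Se_B)\,(1-Se_G)}
         {(1-p)\,Sp_G + p\,(1-Se_G)}
\]
```

With p, Se_G and Sp_G treated as known, these are two linear equations in the two unknowns Se_B and Sp_B.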

Problems with conditional independence

  • May not be plausible from mechanistic or physiological perspective; the two tests measure related phenomena.

  • May be association between disease severity and test results; two tests may be conditionally independent given disease severity, but not conditionally independent given presence or absence of disease.

  • Assumption of conditional independence constrains the disease prevalence; may not be plausible.

Conditional Independence: disease severity

  • Independence given disease severity:

    P(G=1, B=1|D=1,X)=P(G=1|D=1,X)×P(B=1|D=1,X)

    does not imply independence given disease:
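
One way to see why, sketched here as a reconstruction rather than the slide's original expression: averaging over severity X among the diseased gives the mean of a product, which generally differs from the product of the means.

```latex
\[
P(G=1, B=1 \mid D=1)
  = E_X\!\left[\, P(G=1 \mid D=1, X)\, P(B=1 \mid D=1, X) \,\middle|\, D=1 \right]
\]
\[
P(G=1 \mid D=1)\, P(B=1 \mid D=1)
  = E_X\!\left[ P(G=1 \mid D=1, X) \,\middle|\, D=1 \right]
    E_X\!\left[ P(B=1 \mid D=1, X) \,\middle|\, D=1 \right]
\]
```

The two differ by the covariance, across severity levels, of P(G=1|D=1,X) and P(B=1|D=1,X). That covariance is nonzero whenever both tests become more likely to be positive as severity increases.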


Conditional Independence: disease prevalence

Conditional independence may not be possible at a given disease prevalence.

Bounds on prevalence under conditional independence

Under conditional independence, split into two tables, with some constraints:



p=P(D=1)= a+ b+c+ d

Example

Ignoring sampling variability, for p ∈ (0.285, 0.715), conditional independence is not possible.

Other dependence assumptions

  • With more tests, some methods model relationships between some tests. This is arbitrary, and cannot be tested without a rich enough study.

  • Discrepant resolution method; disfavored due to bias.

  • Composite reference method; success depends on reliability of reference tests.

Bounds on true sensitivity and specificity of a new biomarker

  • Explore information available from the comparison of B and G, when no assumptions are made regarding their dependence.

  • Assume operating characteristics of G are known.

  • Derive bounds for operating characteristics of B.

Idea

  • Simply by bounding cells in cross tabulation of G and (B,D) to be between 0 and 1 we derive bounds for

    • P(D=1, B=1|G=1)

    • P(D=0, B=0|G=0)

  • True sensitivity and specificity of B maximized at maxima of these and minimized at minima of these.
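
Cell bounds of this kind can be written as Fréchet-type inequalities. This is a sketch of one way to express them, not necessarily the presenter's derivation; P(D=1|G=1) and P(D=0|G=0) are assumed to be computed from the known sensitivity and specificity of G together with the observed P(G=1).

```latex
\[
\max\{0,\; P(B=1 \mid G=1) + P(D=1 \mid G=1) - 1\}
  \;\le\; P(D=1, B=1 \mid G=1)
  \;\le\; \min\{P(B=1 \mid G=1),\; P(D=1 \mid G=1)\}
\]
\[
\max\{0,\; P(B=0 \mid G=0) + P(D=0 \mid G=0) - 1\}
  \;\le\; P(D=0, B=0 \mid G=0)
  \;\le\; \min\{P(B=0 \mid G=0),\; P(D=0 \mid G=0)\}
\]
```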

Example

  • Apparent sens=25/35=71%

  • Apparent spec=60/65=92%

  • Suppose sens of G is 90% and spec of G is 95%

  • True sens of B is (61%,81%)

  • True spec of B is (87%,98%)

  • These bounds are reasonably narrow.
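
The bounds in this example can be approximately reproduced with a short calculation. The sketch below assumes the 2×2 counts implied by the apparent sensitivity and specificity above (25 and 10 among G-positive, 5 and 60 among G-negative) and applies Fréchet-type bounds to each cell with no dependence assumption. The function names are hypothetical, and this may differ in detail from the presenter's method.

```python
def frechet(a, b, total):
    """Lower/upper bounds on a joint cell given its two margins within `total`."""
    return max(0.0, a + b - total), min(a, b)


def true_accuracy_bounds(n_b1g1, n_b0g1, n_b1g0, n_b0g0, sens_g, spec_g):
    """Bounds on true sens/spec of B, given the B-by-G table and G's accuracy."""
    n = n_b1g1 + n_b0g1 + n_b1g0 + n_b0g0
    b1g1, b0g1, b1g0, b0g0 = (x / n for x in (n_b1g1, n_b0g1, n_b1g0, n_b0g0))
    g1 = b1g1 + b0g1
    g0 = 1 - g1
    # Prevalence implied by P(G=1) = p * sens_g + (1 - p) * (1 - spec_g).
    p = (g1 - (1 - spec_g)) / (sens_g + spec_g - 1)
    # Joint probabilities of D and G fixed by G's known accuracy.
    d1g1, d1g0 = p * sens_g, p * (1 - sens_g)
    d0g1, d0g0 = (1 - p) * (1 - spec_g), (1 - p) * spec_g
    # Bound each (B, D) cell separately within the G=1 and G=0 strata.
    lo11, hi11 = frechet(b1g1, d1g1, g1)  # P(B=1, D=1, G=1)
    lo10, hi10 = frechet(b1g0, d1g0, g0)  # P(B=1, D=1, G=0)
    lo00, hi00 = frechet(b0g0, d0g0, g0)  # P(B=0, D=0, G=0)
    lo01, hi01 = frechet(b0g1, d0g1, g1)  # P(B=0, D=0, G=1)
    sens = ((lo11 + lo10) / p, (hi11 + hi10) / p)
    spec = ((lo00 + lo01) / (1 - p), (hi00 + hi01) / (1 - p))
    return sens, spec


sens, spec = true_accuracy_bounds(25, 10, 5, 60, sens_g=0.90, spec_g=0.95)
print(f"true sens of B in ({sens[0]:.3f}, {sens[1]:.3f})")
print(f"true spec of B in ({spec[0]:.3f}, {spec[1]:.3f})")
# prints: true sens of B in (0.617, 0.808)
# prints: true spec of B in (0.873, 0.977)
```

These agree with the quoted ranges to within a percentage point, which suggests (but does not confirm) that the slide's bounds come from a similar cell-bounding argument.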

Example

  • Apparent sens=50%

  • Apparent spec=75%

  • Suppose sens of G is 90% and spec of G is 95%

  • The true sens of B is (33%,67%)

  • True spec of B is (71%,78%)

  • Bound for sens is quite wide, ranging from poor test to possibly adequate; bound for spec is narrow.

Conclusions

  • Low sensitivity of a promising kidney injury biomarker when expected prevalence of disease is low (e.g., contrast nephropathy – NGAL sensitivity=78%) raises question of imperfect specificity of “gold standard”.

  • Likewise, low specificity when expected prevalence is high (e.g., ICU with hypotension and sepsis – NGAL spec=76% when applied to critically ill patients) raises question of imperfect sensitivity of gold standard.

Conclusions

  • Need “hard” clinical endpoints for use as gold standard, but even these have potential problems (e.g., long latency, confounding by other risk factors).

  • Could use exposure status, such as to nephrotoxic drug, to avoid SCr.

  • Amount of information in comparing new biomarker to imperfect gold standard may not be very high, even if imperfect gold standard is a good test itself.

  • Conditional independence is problematic – physiologically and technically.

  • Nonparametric bounds may or may not be useful; but certainly reflect true information content.

  • Ultimate validation of a biomarker’s utility is demonstration in a randomized clinical trial that it alters clinical management and improves clinical outcomes.

Acknowledgments

  • Sarah Emerson, PhD

  • Sushrut Waikar, MD

  • Joseph Bonventre, MD

    Waikar SS, Betensky RA, Emerson SC, Bonventre JV (2012). Imperfect gold standards for kidney injury biomarker evaluation. J Am Soc Nephrol 23: 13-21.

    Emerson SC, Waikar SS, Bonventre JV, Betensky RA (2012). Biomarker validation with an imperfect reference: issues and bounds. Unpublished manuscript.