ROC Analysis


  1. ROC Analysis • Emily Kistner-Griffin, PhD • Amy Wahlquist, MS • Cancer Prevention and Control Statistics Tutorial • August 13, 2009

  2. Outline • Motivating Example: Chest CT • Classification • Sensitivity and Specificity • ROC curve and AUC estimation • Nonparametric Curve • Parametric Curve • ROC and Logistic Regression • Comparing ROC curves

  3. I. Motivating Example: Chest CT • Evaluating the probability of malignancy in pulmonary nodules seen on chest CT in 213 MUSC patients from two cohorts • Sample of 194 subjects seen in pulmonary clinic and 19 subjects with CT previous to an unrelated surgical intervention • Develop a prediction model from clinical data and radiological characteristics of lung nodules

  4. Chest CT • A model of P(malignancy) of pulmonary nodules has been described in the literature (Swensen SJ et al., 1997) • Model included three demographic characteristics: patient age, smoking status (ever vs. never), any history of cancer • Model included three radiological characteristics: diameter, upper lobe location, and spiculation

  5. Chest CT • Swensen et al. reported an area under the receiver operating characteristic (ROC) curve of 0.8014 ± 0.0360 in a validation sample, using a logistic regression approach • Interested in how well Swensen’s model performs in the MUSC cohort • Interested in evaluating whether we can improve the prediction model by including other patient characteristics

  6. II. Classification • Consider medical tests that are measured on a continuous or ordinal scale • Goal: to describe the performance of the medical test in classifying subjects into individuals with and without disease • Examples: PSA and CA-125 as biomarkers of prostate and ovarian cancer; BI-RADS for breast imaging (radiologist determined probability of malignancy)

  7. Classification from CT • Consider the diameter of the nodule as measured on the CT scan (range: 3.3mm-15mm) • Larger nodules are more likely to be malignant (OR: 1.34, 95% CI: 1.20-1.49) • How well can we predict malignancy from nodule diameter?

  8. Contingency Table

  9. Classification Tables • Choose a cut-point on continuous or ordinal scale in order to assign disease status

  10. III. Sensitivity & Specificity • For a selected cut-point, determine the sensitivity and specificity of the medical test (or prediction model) • Sensitivity = Pr(test positive | disease present) = TP / (TP+FN) = TPF (true positive fraction) • Specificity = Pr(test negative | disease absent) = TN / (TN+FP) = TNF (true negative fraction) • To summarize test characteristics, sensitivity and specificity must be computed at multiple cut-points
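The definitions above can be sketched in a few lines of Python. The counts below are not reported in the slides: they are back-calculated from the percentages in the roctab output on slide 14 for the cut-point diameter >= 10 mm (72 malignant and 141 benign nodules), so treat them as an assumption used for illustration.

```python
# Counts at the cut-point "diameter >= 10 mm".
# ASSUMPTION: back-calculated from the percentages on slide 14, not reported directly.
tp, fn = 47, 25    # malignant nodules called positive / negative
tn, fp = 100, 41   # benign nodules called negative / positive


def sensitivity(tp, fn):
    """True positive fraction: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative fraction: TN / (TN + FP)."""
    return tn / (tn + fp)


sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
accuracy = (tp + tn) / (tp + fn + tn + fp)  # "correctly classified"

print(f"sensitivity          = {sens:.2%}")
print(f"specificity          = {spec:.2%}")
print(f"correctly classified = {accuracy:.2%}")
```

With these counts the three values round to 65.28%, 70.92%, and 69.01%, which agrees with the ( >= 10 ) row of the roctab report.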

  11. Sensitivity & Specificity Example

  12. From Metz CE (1978) Basic Principles of ROC Analysis. Seminars in Nuclear Medicine; 8 (4): 283 – 297.

  13. Decision Threshold • Lowering the threshold increases TPF (sensitivity) and the FPF (1-specificity) • Raising the threshold decreases the TPF and the FPF • Points representing all possible TPF and FPF lie on a curve – passing through the lower (0,0) corner when all tests are called negative and the upper (1,1) corner when all the tests are called positive • If the test is informative then all other points on the curve must be above the diagonal (TP more likely than FP) • The curve describing the compromises between TPF and FPF is called the ROC curve

  14. . roctab malignant diameter, detail graph

      Detailed report of Sensitivity and Specificity
      ------------------------------------------------------------------------------
                                                  Correctly
      Cutpoint      Sensitivity   Specificity   Classified        LR+        LR-
      ------------------------------------------------------------------------------
      ( >= 3.3 )        100.00%         0.00%       33.80%     1.0000
      ( >= 4 )          100.00%         1.42%       34.74%     1.0144     0.0000
      ( >= 5 )           97.22%        13.48%       41.78%     1.1236     0.2061
      ( >= 6 )           97.22%        29.08%       52.11%     1.3708     0.0955
      ( >= 7 )           93.06%        39.72%       57.75%     1.5436     0.1749
      ( >= 8 )           83.33%        50.35%       61.50%     1.6786     0.3310
      ( >= 9.1 )         70.83%        61.70%       64.79%     1.8495     0.4727
      ( >= 10 )          65.28%        70.92%       69.01%     2.2449     0.4896
      ( >= 11 )          56.94%        78.72%       71.36%     2.6764     0.5469
      ( >= 12 )          45.83%        82.98%       70.42%     2.6927     0.6528
      ( >= 13 )          25.00%        90.78%       68.54%     2.7115     0.8262
      ( >= 14 )          13.89%        95.04%       67.61%     2.7976     0.9061
      ( >= 15 )           0.00%        98.58%       65.26%     0.0000     1.0144
      ( > 15 )            0.00%       100.00%       66.20%                1.0000
      ------------------------------------------------------------------------------

                        ROC                    -Asymptotic Normal--
             Obs       Area     Std. Err.      [95% Conf. Interval]
      --------------------------------------------------------
             213     0.7411       0.0347        0.67317     0.80900

  15. Likelihood Ratios • LR+ = sensitivity / (1-specificity) = TPF / FPF • LR- = (1-sensitivity) / specificity = (1-TPF) / (1-FPF) • LR+ is the slope between the origin and the point on the ROC curve and LR- is the slope between the point on the curve and the (1,1) point (Choi 1998)
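As a quick check of these formulas, the Python sketch below recomputes LR+ and LR- from the rounded sensitivity and specificity in the ( >= 10 ) row of the roctab report on slide 14. The function name is illustrative, and because the inputs are rounded percentages, the results agree with the reported 2.2449 and 0.4896 only up to rounding.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a given sensitivity and specificity.

    LR+ = TPF / FPF and LR- = (1 - TPF) / (1 - FPF), with FPF = 1 - specificity.
    A ratio with a zero denominator is returned as None (undefined).
    """
    tpf, fpf = sens, 1.0 - spec
    lr_pos = tpf / fpf if fpf > 0 else None
    lr_neg = (1.0 - tpf) / (1.0 - fpf) if fpf < 1 else None
    return lr_pos, lr_neg


# Rounded values from the ( >= 10 ) row on slide 14.
lr_pos, lr_neg = likelihood_ratios(0.6528, 0.7092)
print(f"LR+ = {lr_pos:.3f}, LR- = {lr_neg:.3f}")
```

Geometrically, LR+ is the slope of the segment from (0,0) to the operating point (FPF, TPF), and LR- is the slope of the segment from that point to (1,1), as the slide states.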

  16. IV. ROC curve and AUC estimation • ROC: Receiver Operating Characteristic • Developed in signal detection theory to illustrate how the receiver distinguishes signal from noise (1960s) • Illustration of two test characteristics: sensitivity and specificity at selected cut-points (decision thresholds) • Popularized in medical testing in the field of Radiology (1980s)

  17. ROC curve and Thresholds • ROC curve describes disease detection independent of disease prevalence (as do sensitivity and specificity) • Prevalence may help determine the operating threshold: • Low prevalence suggests reducing FPF (higher specificity, higher threshold, lower part of the curve) • High prevalence suggests increasing TPF (higher sensitivity, lower threshold, higher part of the curve) • In practice, must consider costs and consequences of FP and FN before selecting the desirable cut-off: • Consequence of FN: death? • Consequence of FP: stressful, costly work-up or treatment

  18. Area Under the ROC Curve • Summarizes the performance of the test • Probability that the result of the test for a randomly selected abnormal subject will be greater than the result of the test for a randomly selected normal subject • Average TPF: averaged across whole range of FPF in (0,1) • Perfect test gives AUC = 1.0 and an uninformative test gives AUC=0.50 • Parametric and non-parametric approaches to constructing the ROC curve and calculating the area under the curve (AUC)

  19. a. Nonparametric ROC Curve • Constructed by plotting sensitivity against (1 – specificity) at each possible cut-point • Area under the curve (AUC) computed using the trapezoidal rule • Variance estimators have been derived by DeLong et al. (1988), Hanley and McNeil (1982), and Bamber (1975)
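The construction above can be sketched in plain Python. The data are made up for illustration (they are not the MUSC data) and the function names are hypothetical. The sketch builds the empirical ROC points, applies the trapezoidal rule, and confirms that the result equals the probability interpretation of the AUC from slide 18 (ties counted as one half).

```python
def roc_points(scores, labels):
    """(FPF, TPF) pairs for the rule 'positive if score >= cut', one per distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # cut above every score: all tests called negative
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cut)
        points.append((fp / n_neg, tp / n_pos))
    return points  # last point is (1, 1): all tests called positive


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def mann_whitney_auc(scores, labels):
    """Pr(diseased score > non-diseased score), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up nodule diameters (mm) and malignancy indicators, for illustration only.
diam = [4, 5, 6, 7, 8, 8, 9, 10, 11, 12, 13, 14]
malig = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1]

auc_trap = trapezoid_auc(roc_points(diam, malig))
auc_mw = mann_whitney_auc(diam, malig)
print(f"trapezoidal AUC = {auc_trap:.4f}, Mann-Whitney AUC = {auc_mw:.4f}")
```

The two numbers coincide because a tie between a diseased and a non-diseased subject produces a diagonal segment on the empirical curve, and the trapezoid under it contributes exactly half of that pair's area.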

  20. Variance of AUC • Specifically, for the DeLong et al. (1988) variance estimate:
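A minimal Python sketch of the DeLong et al. (1988) estimator as it is usually stated: each diseased subject gets a component V10 (the fraction of non-diseased subjects it outscores), each non-diseased subject gets a component V01, and the variance is S10/m + S01/n, where S10 and S01 are the sample variances of those components. The data and function names are made up for illustration, and this is not the internal code of roctab.

```python
import math


def psi(x, y):
    """Kernel: 1 if the diseased score exceeds the non-diseased score, 1/2 on ties."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def delong_variance(pos, neg):
    """Return (AUC, variance) from diseased scores `pos` and non-diseased scores `neg`."""
    m, n = len(pos), len(neg)
    v10 = [sum(psi(x, y) for y in neg) / n for x in pos]  # one value per diseased subject
    v01 = [sum(psi(x, y) for x in pos) / m for y in neg]  # one value per non-diseased subject
    auc = sum(v10) / m
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    return auc, s10 / m + s01 / n


# Made-up diameters (mm), for illustration only.
malignant = [7, 8, 10, 12, 13, 14]
benign = [4, 5, 6, 8, 9, 11]

auc, var = delong_variance(malignant, benign)
print(f"AUC = {auc:.4f}, SE = {math.sqrt(var):.4f}")
```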

  21. Confidence Intervals for AUC • Must consider distribution of AUC estimate: asymptotically normal or binomial assumption • Must select standard error estimate (DeLong et al. approach is the default):

                        ROC                    -Asymptotic Normal--
             Obs       Area     Std. Err.      [95% Conf. Interval]
      --------------------------------------------------------
             213     0.7411       0.0347        0.67317     0.80900

      . roctab malignant diameter, binomial

                        ROC                    -- Binomial Exact --
             Obs       Area     Std. Err.      [95% Conf. Interval]
      --------------------------------------------------------
             213     0.7411       0.0347        0.67754     0.79916
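The asymptotic normal interval is the estimate plus or minus a normal quantile times the standard error. The Python sketch below applies that to the rounded area and standard error shown above; the function name is illustrative, and the small difference from the reported limits comes from using rounded inputs.

```python
from statistics import NormalDist


def normal_ci(estimate, std_err, level=0.95):
    """Wald-type interval: estimate +/- z * SE, with z the two-sided normal quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_err, estimate + z * std_err


lo, hi = normal_ci(0.7411, 0.0347)  # area and standard error as reported by roctab
print(f"95% CI: ({lo:.4f}, {hi:.4f})")
```

The exact binomial interval reported by the `binomial` option is computed differently and is not reproduced here.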

  22. b. Parametric ROC Curve • Assumes a binormal model • A monotone transformation of the test results exists to give results that are normally distributed in the diseased and non-diseased populations • Method involves fitting a straight line to the empirical ROC points by plotting using normal probability scales on each axis (plot inverse of the standard normal cumulative distribution function for sensitivity and specificity) • Intercept of the line is the standardized difference in the continuous variable between the two populations; slope is a ratio of the standard deviations

  23. Parametric AUC Estimation AUC is a function of the slope and intercept of the estimated line – using the standard normal cumulative distribution function
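Under the binormal model this function is usually written AUC = Φ(a / sqrt(1 + b²)), where a is the intercept, b is the slope, and Φ is the standard normal CDF. The Python sketch below assumes that standard form and plugs in the intercept and slope from the rocfit output on slide 25; it returns approximately 0.7415, matching the reported ROC area, and the same quantity scaled by sqrt(2) matches the reported d(a).

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binormal_auc(intercept, slope):
    """AUC implied by a binormal ROC line with the given intercept and slope."""
    return std_normal_cdf(intercept / math.sqrt(1.0 + slope ** 2))


a, b = 0.997803, 1.170680  # intercept and slope from the rocfit output (slide 25)
auc = binormal_auc(a, b)
d_a = math.sqrt(2.0) * a / math.sqrt(1.0 + b ** 2)  # the d(a) index

print(f"binormal AUC = {auc:.4f}, d(a) = {d_a:.4f}")
```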

  24. Nonparametric vs. Parametric • Parametric approaches assume a binormal distribution to make inferences (obtain MLE): only when the assumption is true are the estimators unbiased • With continuous data a nonparametric approach is recommended • With discrete ratings a parametric approach is recommended as nonparametric approaches tend to underestimate the true AUC • Note standard error of the AUC is smaller using a continuous scale

  25. . rocfit malignant diameter, cont(10)

      Fitting binormal model:

      Binormal model of malignant on diameter         Number of obs   =        213
      Goodness-of-fit chi2(7) =       8.52
      Prob > chi2             =     0.2894
      Log likelihood          = -456.16837

      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
         intercept |   0.997803    0.181857    5.49   0.000     0.641370    1.354236
         slope (*) |   1.170680    0.139487    1.22   0.221     0.897290    1.444070
      -------------+----------------------------------------------------------------
             /cut1 |  -1.296367    0.141750   -9.15   0.000    -1.574192   -1.018542
             /cut2 |  -0.668255    0.110960   -6.02   0.000    -0.885733   -0.450777
             /cut3 |  -0.222392    0.102919   -2.16   0.031    -0.424110   -0.020674
             /cut4 |   0.202507    0.101135    2.00   0.045     0.004286    0.400729
             /cut5 |   0.499186    0.103559    4.82   0.000     0.296214    0.702159
             /cut6 |   0.756664    0.109249    6.93   0.000     0.542539    0.970788
             /cut7 |   1.040925    0.119741    8.69   0.000     0.806237    1.275614
             /cut8 |   1.541544    0.150124   10.27   0.000     1.247307    1.835781
             /cut9 |   2.369036    0.244933    9.67   0.000     1.888975    2.849096
      ------------------------------------------------------------------------------

      ------------------------------------------------------------------------------
                   |                  Indices from binormal fit
             Index |   Estimate   Std. Err.                 [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          ROC area |   0.741532    0.034471                 0.673970    0.809094
          delta(m) |   0.852328    0.144007                 0.570080    1.134576
              d(e) |   0.919346    0.151542                 0.622329    1.216364
              d(a) |   0.916517    0.150751                 0.621050    1.211985
      ------------------------------------------------------------------------------
      (*) z test for slope==1

  26. . rocplot, confband

  27. V. ROC and Logistic Regression • Prediction Model from Chest CT • Use logistic regression to create probabilities of malignancy (represent diagnostic results from multiple predictors) • Compare two logistic models of malignancy – one from previous literature and model with selected variables from the MUSC data • Variables suggested in Swensen SJ et al. + surgical cohort (variable describing collection of samples) • Variables selected using backwards regression in MUSC data

  28. . logistic malignant surgical_cohort patient_age any_non_lung_cancer_history lung_cancer_history smoker_ever diameter upper_lobe spiculated

      Logistic regression                               Number of obs   =        207
                                                        LR chi2(8)      =      73.20
                                                        Prob > chi2     =     0.0000
      Log likelihood = -94.454613                       Pseudo R2       =     0.2793

      ------------------------------------------------------------------------------
         malignant | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      surgical_c~t |   7.045799     4.91585    2.80   0.005     1.794929    27.65751
       patient_age |   .9933921    .0184868   -0.36   0.722     .9578115    1.030294
      any_non_lu~y |   4.017493    1.537066    3.63   0.000     1.897978     8.50392
      lung_cance~y |   10.43958    8.011157    3.06   0.002     2.319987     46.9765
       smoker_ever |   1.026627    .5138138    0.05   0.958     .3849437    2.737967
          diameter |   1.233463    .0787204    3.29   0.001     1.088433    1.397817
        upper_lobe |   1.483983    .5613942    1.04   0.297     .7069965    3.114874
        spiculated |   2.094564    .8488535    1.82   0.068     .9465232    4.635065
      ------------------------------------------------------------------------------

      . predict swensen

  29. . lsens, gensens(sensitivity) genspec(specificity) genpr(cutoffs)

  30. . lroc

  31. Postestimation: 95% CI • Use saved predicted probabilities from logistic model:

      . roctab malignant swensen

                        ROC                    -Asymptotic Normal--
             Obs       Area     Std. Err.      [95% Conf. Interval]
      --------------------------------------------------------
             207     0.8344       0.0294        0.77682     0.89203

      . roctab malignant swensen, graph

  32. VI. Comparing ROC curves • Again use saved predicted probabilities from logistic model:

      . roccomp malignant diameter swensen

                                   ROC                    -Asymptotic Normal--
                      Obs         Area     Std. Err.      [95% Conf. Interval]
      -------------------------------------------------------------------------
      diameter        207       0.7351       0.0357        0.66518     0.80499
      swensen         207       0.8344       0.0294        0.77682     0.89203
      -------------------------------------------------------------------------
      Ho: area(diameter) = area(swensen)
          chi2(1) =     9.52       Prob>chi2 =   0.0020

  33. Testing AUC Equality • Quantities defined by DeLong et al. (1988) for variance estimation are used to define the chi-squared test statistic:
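For two markers measured on the same subjects, the test is usually stated as chi2 = (AUC_A - AUC_B)² / Var(AUC_A - AUC_B) on 1 degree of freedom, with the variance built from the DeLong components and their covariance. The Python sketch below assumes that standard form; the data and function names are made up for illustration, and it is not the internal code of roccomp.

```python
import math


def psi(x, y):
    return 1.0 if x > y else 0.5 if x == y else 0.0


def components(scores, labels):
    """DeLong components: V10 per diseased subject, V01 per non-diseased subject."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def sample_cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(scores_a, scores_b, labels):
    """Chi-squared test (1 df) of equal AUCs for two markers on the same subjects."""
    auc_a, v10_a, v01_a = components(scores_a, labels)
    auc_b, v10_b, v01_b = components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var_a = sample_cov(v10_a, v10_a) / m + sample_cov(v01_a, v01_a) / n
    var_b = sample_cov(v10_b, v10_b) / m + sample_cov(v01_b, v01_b) / n
    cov_ab = sample_cov(v10_a, v10_b) / m + sample_cov(v01_a, v01_b) / n
    chi2 = (auc_a - auc_b) ** 2 / (var_a + var_b - 2.0 * cov_ab)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-squared, 1 df
    return auc_a, auc_b, chi2, p_value


# Made-up data for illustration: a diameter (mm) and a model probability per subject.
malig = [0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1]
diam = [4, 5, 6, 7, 8, 8, 9, 10, 11, 12, 13, 14]
prob = [0.05, 0.10, 0.20, 0.60, 0.15, 0.70, 0.30, 0.55, 0.65, 0.80, 0.90, 0.95]

auc_a, auc_b, chi2, p_value = delong_test(diam, prob, malig)
print(f"AUC(diam) = {auc_a:.4f}, AUC(prob) = {auc_b:.4f}, chi2(1) = {chi2:.2f}, p = {p_value:.4f}")
```

The covariance term matters because both markers come from the same patients; dropping it would treat the two AUCs as independent and usually overstate the variance of their difference.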

  34. Models with Multiple Predictors

      . logistic malignant diameter any_non_lung_cancer_history surgical_cohort lung_cancer_history pet_positive pack

      Logistic regression                               Number of obs   =        206
                                                        LR chi2(6)      =     112.09
                                                        Prob > chi2     =     0.0000
      Log likelihood = -75.983489                       Pseudo R2       =     0.4245

      ------------------------------------------------------------------------------
         malignant | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          diameter |   1.218243      .08847    2.72   0.007      1.05662    1.404588
      any_non_lu~y |   3.830492    1.693158    3.04   0.002     1.610666    9.109691
      surgical_c~t |   6.996053    5.682876    2.39   0.017     1.423719    34.37811
      lung_cance~y |   10.16367    8.299078    2.84   0.005     2.051197    50.36092
      pet_positive |   11.38458    5.025505    5.51   0.000      4.79259    27.04355
              pack |   1.007755    .0046908    1.66   0.097     .9986032    1.016991
      ------------------------------------------------------------------------------

      . predict musc

  35. . roccomp malignant diameter swensen musc, graph summary

  36. . roccomp malignant diameter swensen musc

                                   ROC                    -Asymptotic Normal--
                      Obs         Area     Std. Err.      [95% Conf. Interval]
      -------------------------------------------------------------------------
      diameter        202       0.7410       0.0359        0.67062     0.81131
      swensen         202       0.8344       0.0298        0.77605     0.89272
      musc            202       0.8987       0.0230        0.85374     0.94372
      -------------------------------------------------------------------------
      Ho: area(diameter) = area(swensen) = area(musc)
          chi2(2) =    22.81       Prob>chi2 =   0.0000

      . rocgold malignant swensen diameter musc

      -------------------------------------------------------------------------------
                              ROC                                         Bonferroni
                             Area     Std. Err.       chi2    df   Pr>chi2    Pr>chi2
      -------------------------------------------------------------------------------
      swensen (standard)   0.8344       0.0298
      diameter             0.7410       0.0359     8.2690     1    0.0040     0.0081
      musc                 0.8987       0.0230     8.6304     1    0.0033     0.0066
      -------------------------------------------------------------------------------
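The p-value columns in the rocgold table can be checked by hand. The Python sketch below converts each 1-df chi-squared statistic to a p-value and applies a Bonferroni adjustment for the two comparisons against the standard; the function names are illustrative, and the multiplier of 2 is inferred from the table (two comparisons), not taken from Stata documentation.

```python
import math


def chi2_1df_pvalue(chi2):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# chi2 statistics from the rocgold output, each compared with the swensen standard.
stats = {"diameter": 8.2690, "musc": 8.6304}
results = {}
for name, chi2 in stats.items():
    p = chi2_1df_pvalue(chi2)
    results[name] = (p, bonferroni(p, len(stats)))
    print(f"{name:<10} Pr>chi2 = {p:.4f}   Bonferroni = {results[name][1]:.4f}")
```

Both adjusted values stay below 0.05, so in these data the diameter-only classifier is significantly worse than the Swensen model and the MUSC model is significantly better.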

  37. Questions? Next: ROC in SPSS

  38. b. Lorenz Curves • ROC curve represents a monotone increasing function of the FPF (1-specificity) • If the risk of disease does not vary monotonically with the diagnostic test then the ROC may not be convex • Lee (1999) suggested a Lorenz curve (used commonly in economics) for such data • The methodology involves reordering the test results to ensure that the ratio of disease subjects / no disease subjects in each category is increasing • Must consider whether reordering makes practical sense (usually sensible on an ordinal scale but not necessarily on a continuous scale)

  39. Defining Lorenz Curves • Plot cumulative percent of individuals with disease against the cumulative percent of individuals without the disease at each cut-point • Examples when a Lorenz might be appropriate: • Test has similar means but different variances across populations with and without disease • Bimodal distribution of test in either population • Skewed distribution in population with disease and symmetric distribution in population without the disease • A flatter Lorenz curve suggests a worse diagnostic test • Two summary indices describe the curvature • Gini index: twice the area between the Lorenz curve and the diagonal line • Pietra index: twice the area of the largest triangle inscribed between the diagonal line and the curve
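The two indices can be sketched in Python from a table of counts per test category. The counts are made up for illustration and the function name is hypothetical. The sketch relies on one geometric fact: a triangle with the diagonal as its base and its apex at a curve point (x, y) has area |x - y| / 2, so the Pietra index as defined above equals the largest vertical gap between the diagonal and the curve.

```python
def lorenz_indices(categories):
    """Return (Gini, Pietra) from a list of (diseased, non_diseased) counts per category.

    Categories are reordered so the diseased / non-diseased ratio is increasing,
    then cumulative fractions are accumulated to trace the Lorenz curve.
    """
    ordered = sorted(categories, key=lambda c: c[0] / c[1] if c[1] > 0 else float("inf"))
    total_d = sum(d for d, _ in ordered)
    total_n = sum(n for _, n in ordered)

    x, y = 0.0, 0.0          # cumulative fraction without / with disease
    area_under_curve = 0.0
    pietra = 0.0
    for d, n in ordered:
        x_next, y_next = x + n / total_n, y + d / total_d
        area_under_curve += (x_next - x) * (y + y_next) / 2.0  # trapezoid
        pietra = max(pietra, abs(x_next - y_next))             # gap to the diagonal
        x, y = x_next, y_next

    gini = 2.0 * (0.5 - area_under_curve)  # twice the area between curve and diagonal
    return gini, pietra


# Made-up (diseased, non-diseased) counts in three test categories.
gini, pietra = lorenz_indices([(5, 10), (1, 20), (14, 10)])
print(f"Gini = {gini:.4f}, Pietra = {pietra:.4f}")
```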

  40. Lorenz Curves and ROC

      . roctab malignant diameter, lorenz graph
      . roctab malignant diameter, lorenz

      Lorenz curve
      ---------------------------
      Pietra index =   0.2322
      Gini index   =   0.3301

      • If the at-risk probabilities increase (or decrease) with increasing values of the test results then Gini = 2(AUC)-1 • Larger Pietra and Gini indices describe better diagnostic tests • Gini index is related to average difference in post-test probabilities for two randomly selected subjects and Pietra index is related to average absolute change between pre and post test probabilities of disease
