
Evaluation of Support Vector Machines for Risk Modeling in Interventional Cardiology

Michael E. Matheny, M.D. Goal: comparison of support vector machines and logistic regression risk modeling performance over time for the outcome of death in pre-intervention cardiac catheterization patients.


Presentation Transcript


  1. Evaluation of Support Vector Machines for Risk Modeling in Interventional Cardiology • Michael E. Matheny, M.D.

  2. Goal • Comparison of support vector machines and logistic regression risk modeling performance over time for the outcome of death in pre-intervention cardiac catheterization patients.

  3. Pre-intervention Risk Assessment • Percutaneous Coronary Intervention (PCI) is a high-volume procedure with significant morbidity and mortality • Risk of death in PCI varies widely based on co-morbidities • Providing accurate case-level estimates can greatly aid patient and physician decision-making

  4. Domain Data Quality • The American College of Cardiology has published a standardized data dictionary (ACC-NCDR) and mandates that accredited centers maintain detailed data on all PCI patients • Some states, including Massachusetts, now have mandatory reporting of case data based on the ACC-NCDR

  5. Current Risk Model Standard: Logistic Regression (LR) • Gold standard for risk modeling in interventional cardiology • Type of generalized linear model • Used in analysis of a binary outcome • Predicted probability is bounded by 0 and 1 • Feature (variable) selection: from all available data; known risk factors from prior studies; selected subset of data based on study design
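To make the "bounded by 0 and 1" point concrete, here is a minimal sketch of how a fitted logistic regression turns a linear combination of features into a risk estimate. The intercept, coefficients, and feature values are hypothetical placeholders, not values from the presentation.

```python
import math

def logistic(z: float) -> float:
    """Map a linear predictor z to a probability between 0 and 1."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_risk(intercept, coefs, features):
    """Risk estimate: logistic(intercept + sum of coef * feature)."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return logistic(z)

# Hypothetical model with two features (say, age in years and a 0/1 flag).
risk = predict_risk(-4.0, [0.05, 1.2], [70.0, 1.0])
print(round(risk, 3))
```

Whatever the feature values, the output stays between 0 and 1, which is what lets it be read as a probability of the binary outcome.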

  6. Alternative Risk Model: Support Vector Machine (SVM) • Key features • Kernel functions introduce non-linearity in the hypothesis space without explicitly requiring a non-linear algorithm: linear, polynomial, radial basis function • Training is a convex optimization problem, so the solution found is a global minimum
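The three kernel families named on this slide can be sketched in a few lines. The parameterization below (a `coef0` offset for the polynomial kernel, `sigma` as the radial width) follows common textbook forms and is an assumption; the exact forms used by GIST are not given in the slides.

```python
import math

def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(u, v):
    return dot(u, v)

def polynomial_kernel(u, v, degree=3, coef0=1.0):
    # (u.v + coef0) ** degree; degree=1 with coef0=0 reduces to the linear kernel.
    return (dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, sigma=1.0):
    # exp(-||u - v||^2 / (2 * sigma^2)); sigma controls the radial width.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v, degree=2), rbf_kernel(u, v))
```

Each function returns a similarity between two cases. The SVM algorithm itself stays linear in the kernel-induced space, which is how non-linearity enters without a non-linear algorithm.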

  7. Risk Model Evaluation: Discrimination • Provides an estimate of population-level accuracy • Area under the Receiver Operating Characteristic (ROC) curve • Graphed as sensitivity vs. 1 − specificity at different thresholds
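As an illustration of the discrimination measure, the sketch below builds ROC points by sweeping a threshold over the predicted scores and computes the area under the curve as the fraction of (death, survivor) pairs ranked correctly. The labels and scores are made-up toy values.

```python
def roc_points(labels, scores):
    """(1 - specificity, sensitivity) at each distinct score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points

def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_points(labels, scores))
print(roc_auc(labels, scores))  # 0.75
```

Because the AUC depends only on how cases are ranked, it says nothing about whether the predicted probabilities themselves are accurate; that is the job of calibration.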

  8. Risk Model Evaluation: Calibration • Provides an estimate of case-level accuracy • Hosmer-Lemeshow goodness-of-fit test: primarily used in logistic regression; calculates how well the observed and expected frequencies match; handles data sparsity better than more common methods (Variance, Pearson’s) • P > 0.05 is taken to indicate a good fit
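A rough sketch of the Hosmer-Lemeshow calculation may help here: cases are sorted by predicted risk, cut into groups (ten by convention), and observed deaths are compared with expected deaths in each group. The grouping rule and the conventional choice of `groups - 2` degrees of freedom are assumptions of this sketch. The chi-square tail function uses a closed form that is valid only for even degrees of freedom.

```python
import math

def hosmer_lemeshow(labels, probs, groups=10):
    """HL chi-square statistic over equal-sized groups of sorted risk."""
    # Sort by predicted risk only; the stable sort keeps tied cases in input order.
    pairs = sorted(zip(probs, labels), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        # Variance term n_g * pbar * (1 - pbar), written with expected = n_g * pbar.
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Toy data: every case predicted at 0.5, half of them die.
stat = hosmer_lemeshow([0, 1] * 10, [0.5] * 20, groups=10)
p_value = chi2_sf_even_df(stat, 10 - 2)
print(stat, p_value)
```

A p-value above 0.05 means the test found no significant gap between observed and expected frequencies, which is the sense in which the slide calls the fit good.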

  9. Source Data • Brigham & Women’s Hospital • Interventional Cardiology Database • January 1, 2002 – October 30, 2004 • 5383 cases • Data split two ways, each into 2/3 training (3588) and 1/3 test (1795) • Sequential split: cases sorted chronologically and split at October 27, 2003 • Random split
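The two splitting schemes can be sketched as follows. The case records here are synthetic stand-ins (an integer `day` field in place of a procedure date), and the seed is arbitrary; only the 5383 case count and the 2/3 to 1/3 proportions come from the slide.

```python
import random

def sequential_split(cases, key):
    """Sort chronologically, then train on the earliest 2/3 of cases."""
    ordered = sorted(cases, key=key)
    cut = len(ordered) * 2 // 3
    return ordered[:cut], ordered[cut:]

def random_split(cases, seed=0):
    """Shuffle reproducibly, then take 2/3 for training."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 2 // 3
    return shuffled[:cut], shuffled[cut:]

# Synthetic stand-in for the 5383 cases; "day" plays the role of procedure date.
cases = [{"id": i, "day": i} for i in range(5383)]
seq_train, seq_test = sequential_split(cases, key=lambda c: c["day"])
rnd_train, rnd_test = random_split(cases, seed=42)
print(len(seq_train), len(seq_test))  # 3588 1795
```

The sequential split tests a model on patients treated after everyone it was trained on, so it exposes drift over time. The random split mixes periods and hides that drift.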

  10. Sample Demographics: Overview

  11. Model Features

  12. Logistic Regression: Model Development • STATA 8.2 (College Station, TX) • Backwards stepwise technique • Exclusion threshold (P = 0.05 – 0.15) • Feature selection

  13. Logistic Regression: Feature Selection • Model development on the sequential training set • Backwards stepwise (P = 0.10) used for feature selection • Stepwise feature removal based on optimization of ROC and Hosmer-Lemeshow (HL) goodness-of-fit
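The backwards stepwise idea can be outlined as a loop: fit the model, drop the feature with the largest p-value if it exceeds the exclusion threshold, and refit. In this sketch `fit_pvalues` is a hypothetical callable standing in for a real model fit (the presentation used STATA), and the feature names and p-values are invented for illustration.

```python
def backward_stepwise(features, fit_pvalues, threshold=0.10):
    """Drop the least significant feature until all p-values <= threshold."""
    selected = list(features)
    while selected:
        pvalues = fit_pvalues(selected)  # refit on the current feature set
        worst = max(selected, key=lambda f: pvalues[f])
        if pvalues[worst] <= threshold:
            break
        selected.remove(worst)
    return selected

# Hypothetical stand-in for a model fit: fixed p-values per feature.
# A real fit would return different p-values each time a feature is removed.
TOY_PVALUES = {"age": 0.001, "diabetes": 0.08, "shock": 0.0001, "noise": 0.60}

def toy_fit(selected):
    return {f: TOY_PVALUES[f] for f in selected}

kept = backward_stepwise(["age", "diabetes", "shock", "noise"], toy_fit)
print(kept)  # ['age', 'diabetes', 'shock']
```

Moving the threshold across the 0.05 to 0.15 range changes how aggressively features are pruned, which is one reason this step calls for manual judgment.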

  14. Logistic Regression: Feature Selection

  15. Logistic Regression: Evaluation

  16. Support Vector Machine: Model Development • GIST 2.1.1 (Columbia University, New York, NY) • STATA 8.2 (College Station, TX) • All variables used • Kernel choice: polynomial (1-6); radial width factor (related to sigma) (0.1-20) • Probabilistic output methodology: the discriminant is the distance from the hyperplane; an LR model is fit using the discriminant as the only feature; this is an established method to convert SVM classification to regression; it allows use of HL goodness-of-fit
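The probabilistic output step, a logistic regression with the SVM discriminant as its only feature, is similar in spirit to what is commonly called Platt scaling. The sketch below fits the two parameters by plain gradient ascent on the log-likelihood. The learning rate, epoch count, and toy discriminant values are assumptions of this sketch, and the presentation itself fit this model in STATA.

```python
import math

def discriminant_to_probability(d, a, b):
    """Logistic model on a single feature: P(death) = 1 / (1 + exp(-(a + b*d)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * d)))

def fit_discriminant_lr(discriminants, labels, lr=0.1, epochs=5000):
    """Fit intercept a and slope b by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for d, y in zip(discriminants, labels):
            p = discriminant_to_probability(d, a, b)
            grad_a += y - p
            grad_b += (y - p) * d
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Toy SVM outputs (signed distance from the hyperplane) and outcomes.
scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
died = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_discriminant_lr(scores, died)
probs = [discriminant_to_probability(s, a, b) for s in scores]
print([round(p, 2) for p in probs])
```

Because the mapping is monotone in the discriminant, it leaves the ROC curve unchanged while producing probabilities that a Hosmer-Lemeshow test can assess.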

  17. Support Vector Machine: Polynomial Evaluation

  18. Support Vector Machine: Polynomial Evaluation

  19. Support Vector Machine: Radial Evaluation

  20. Support Vector Machine: Radial Evaluation

  21. Discussion: Discrimination (All Models) • All models showed excellent performance • No model differed significantly from the others in performance • This measure was relatively insensitive to changes in the data, even across widely variable levels of calibration

  22. Discussion: LR Calibration • For this data, LR was unable to maintain calibration, likely because of temporal data drift • The LR models required manual feature selection and expert knowledge to achieve calibration on the training data sets

  23. Discussion: SVM Calibration • Some versions of both kernel types were able to maintain calibration on both data sets • Calibration was maintained across larger parameter ranges of both kernels for the random data set than for the sequential data set • Current assessments of discrimination and calibration on the training set are insufficient to choose the optimal kernel parameter

  24. Conclusions • SVMs could be superior to LR in terms of maintaining calibration over time in this domain • Further exploration is needed to develop additional markers of model robustness • Further work is needed to evaluate optimal time intervals for creating new models or recalibrating old ones

  25. The end
