Using Machine Learning to Model Standard Practice: Retrospective Analysis of Group C-Section Rate via Bagged Decision Trees


Presentation Transcript


  1. Using Machine Learning to Model Standard Practice: Retrospective Analysis of Group C-Section Rate via Bagged Decision Trees Rich Caruana Cornell CS Stefan Niculescu CMU CS Bharat Rao Siemens Medical Cynthia Simms Magee Hospital

  2. C-Section in the U.S. • The C-section rate in the U.S. is too high • Western Europe has lower rates but comparable outcomes • C-section is major surgery => hard on the mother • C-section is expensive • Why is the U.S. rate so high? • Convenience (rates are highest on Fridays and before sporting events, …) • Litigation • Social and demographic issues • Current controls • Financial: pay-per-patient instead of pay-per-procedure • Physician reviews: monthly/quarterly evaluation of rates

  3. Risk Adjustment • Some practices specialize in high-risk patients • Some practices have low-risk demographics • Must correct for the patient population seen by each practice

  4. Risk Adjustment (cont.) • Need an accurate, objective, evidence-based method for assessing the c-section rate of different physician practices

  5. Modeling "Standard Practice" with Machine Learning • Not trying to improve outcomes • Maintain the quality of outcomes while reducing the c-sec rate • Compare physicians/practices to other physicians/practices • Warn physicians if their rate is higher than that of other practitioners

  6. Data • 3 years of data, 1995-1997, from South-Western PA • 22,175 expectant mothers • 16.8% c-section rate • 144 attributes per patient (82 used for learning) • 17 physician practices • C-section rate varies from 13% to 23% across practices

  7. Learning Methods • Logistic Regression • Artificial Neural Nets • MML Decision Trees • Buntine's IND decision tree package • maximum-size trees (thousands of nodes) • probabilities generated by smoothing counts along the path to each leaf • Bagged MML Decision Trees • better accuracy • better ROC area • excellent calibration

  8. Bagged Decision Trees • Draw 100 bootstrap samples of the data • Train a tree on each sample -> 100 trees • Average the predictions of the trees on out-of-bag samples • Example from the slide's figure: average prediction = (0.23 + 0.19 + 0.34 + 0.22 + 0.26 + … + 0.31) / #trees = 0.24 (a sketch of this procedure follows below)
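
A minimal sketch of this procedure in Python, assuming scikit-learn's DecisionTreeClassifier as a stand-in for the MML/IND trees used in the talk (scikit-learn does not smooth leaf counts the way IND does); the names X, y, and bagged_oob_predictions are illustrative, not from the authors' code:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagged_oob_predictions(X, y, n_trees=100, seed=0):
        # For each tree: fit on a bootstrap sample, predict on the rows
        # left out of that sample (out-of-bag), then average per patient.
        rng = np.random.default_rng(seed)
        n = len(y)
        prob_sum = np.zeros(n)   # running sum of out-of-bag probabilities
        n_votes = np.zeros(n)    # how many trees saw each patient out-of-bag
        for _ in range(n_trees):
            idx = rng.integers(0, n, size=n)        # bootstrap sample (with replacement)
            oob = np.setdiff1d(np.arange(n), idx)   # patients not drawn this round
            tree = DecisionTreeClassifier().fit(X[idx], y[idx])
            prob_sum[oob] += tree.predict_proba(X[oob])[:, 1]
            n_votes[oob] += 1
        # e.g. (0.23 + 0.19 + 0.34 + ...) / number of out-of-bag trees
        return prob_sum / np.maximum(n_votes, 1)

Averaging only out-of-bag predictions gives each patient an honest estimate, which is what the performance numbers on slide 15 rely on.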

  9. Bagging Improves Accuracy and AUC

  10. Calibration • Good calibration: • if 1000 patients have pred(x) = 0.1, then ~100 of them should actually get a c-section

  11. Calibration • A model can be accurate but poorly calibrated • a good decision threshold can exist even with uncalibrated probabilities • A model can have a good ROC area but be poorly calibrated • ROC is insensitive to scaling/stretching of the scores • only the ordering has to be correct, not the probabilities themselves (see the toy example below) • A model can have very high variance but still be well calibrated • if the expected value of pred(patient) is correct -> good calibration • we are not worried about the prediction for any one patient: • aggregating predictions over groups of patients reduces variance • high variance in the per-patient predictions is OK • bias/variance tradeoff: favor reducing bias at the cost of variance
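
A toy illustration of the ROC point above, assuming scikit-learn's roc_auc_score: a monotone squashing of well-calibrated scores leaves the ordering, and hence the ROC area, unchanged, while the probabilities drift far from the observed rate. All data here are synthetic:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    p = rng.uniform(0, 1, 10_000)                     # well calibrated by construction
    y = (rng.uniform(0, 1, 10_000) < p).astype(int)   # outcomes drawn with probability p

    q = p ** 4   # monotone transform: same ordering, badly shrunken probabilities
    print(roc_auc_score(y, p), roc_auc_score(y, q))   # identical ROC areas
    # overall observed rate ~0.5; p matches it, q averages ~0.2
    print(abs(p.mean() - y.mean()), abs(q.mean() - y.mean()))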

  12. Measuring Calibration • Bucket method: divide the predictions into ten equal-width buckets (0.0-0.1, 0.1-0.2, …, 0.9-1.0) • In each bucket: • measure the observed c-sec rate • measure the predicted c-sec rate (average of the predicted probabilities) • if the observed c-sec rate is similar to the predicted c-sec rate => good calibration in that bucket (a sketch of this computation follows below) [Figure: histogram of predictions falling into the ten buckets, with bucket centers at 0.05, 0.15, …, 0.95 on a 0.0-1.0 axis]
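
A sketch of the bucket method, assuming the predictions p and 0/1 outcomes y are NumPy arrays; the ten fixed-width buckets correspond to the bucket centers 0.05, 0.15, …, 0.95 in the figure. Weighting buckets by patient count when summarizing is an assumption, not something the slides specify:

    import numpy as np

    def bucket_calibration(p, y, n_buckets=10):
        # Compare observed vs. predicted c-sec rate inside each probability bucket.
        edges = np.linspace(0.0, 1.0, n_buckets + 1)
        errors, sizes = [], []
        for i in range(n_buckets):
            lo, hi = edges[i], edges[i + 1]
            in_bucket = (p >= lo) & ((p < hi) | (hi == 1.0))  # last bucket includes 1.0
            if not in_bucket.any():
                continue
            observed = y[in_bucket].mean()    # observed c-sec rate in this bucket
            predicted = p[in_bucket].mean()   # average predicted probability
            errors.append(abs(observed - predicted))
            sizes.append(in_bucket.sum())
        # mean absolute calibration error, weighting buckets by patient count
        return np.average(errors, weights=sizes)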

  13. Calibration Plot

  14. Bagging Improves Calibration by an Order of Magnitude!

  15. Overall Bagged MML DT Performance • Evaluated on out-of-bag test points (bagging's analogue of cross-validation) • Accuracy: 90.1% (baseline: 83.2%) • ROC Area: 0.9233 • Mean absolute calibration error: 0.013 • Calibration is as good in the tails as in the center of the distribution (these numbers could be computed as sketched below)
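
Under the same assumptions as the earlier sketches (with X and y standing for the 22,175-patient feature matrix and c-section labels, which are not public), the three reported numbers could be computed roughly as follows; the 83.2% baseline is simply always predicting "no c-section":

    from sklearn.metrics import accuracy_score, roc_auc_score

    # p_oob: out-of-bag probabilities from the bagging sketch after slide 8
    p_oob = bagged_oob_predictions(X, y)
    print("accuracy:   ", accuracy_score(y, p_oob >= 0.5))   # reported: 90.1%
    print("ROC area:   ", roc_auc_score(y, p_oob))           # reported: 0.9233
    print("calibration:", bucket_calibration(p_oob, y))      # reported: 0.013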

  16. Aggregate Risk • The aggregate risk of c-section for a group is just the average predicted probability of c-section over the patients in that group • If a group's aggregate risk matches its observed c-sec rate, then the group is performing c-sections in accordance with standard practice, as modeled by the learned model (sketched below)
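
The comparison then reduces to a grouped mean, as in this sketch; practice_id is a hypothetical array assigning each patient to one of the 17 practices:

    import numpy as np

    def practice_report(practice_id, p, y):
        # For each practice, compare aggregate risk (mean predicted probability)
        # with the observed c-section rate; a large gap flags the practice.
        for g in np.unique(practice_id):
            rows = practice_id == g
            predicted = p[rows].mean()   # aggregate risk for the practice
            observed = y[rows].mean()    # observed c-sec rate for the practice
            print(f"practice {g}: predicted {predicted:.3f}, "
                  f"observed {observed:.3f}, gap {observed - predicted:+.3f}")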

  17. Sanity Check • Could the models have good calibration in the tails, but still be inaccurate on some groups of patients? • Suppose there are two kinds of patients, A and B, both with elevated risk: p(A) = p(B) = 0.8 • Since patients A and B differ, the model might be more accurate or better calibrated on A than on B, so a practice seeing mostly A patients might be over- or under-estimated relative to a practice seeing mostly B patients • Are the models good on each physician group?

  18. Assumptions • Intrinsic factors: variables given to learning as inputs • Extrinsic factors: variables not given to learning • provider (HMO vs. pay-per-procedure vs. no insurance vs. …) • patient/physician preference (e.g. socio-economic class) • Learning compensates for intrinsic factors, not extrinsic ones • assumes extrinsic factors are not correlated with intrinsic ones • correlations may weaken sensitivity • Differences between observed and predicted rates are due to extrinsic factors, model bias, and noise • If the models are good, differences are due mainly to extrinsic factors such as provider and physician/patient preferences

  19. Summary & Conclusions • Bagged probabilistic decision trees are excellent • Accuracy: 86.9% -> 90.1% • ROC Area: 0.891 -> 0.923 • Calibration: 0.169 -> 0.013! • The learned models of standard practice are good • 3 of 17 groups have a high c-section rate • one has a high-risk population • one has a normal-risk population • one has a low-risk population!!! • One group has a c-sec rate 4% below its predicted rate • how do they do it? • are their outcomes good?

  20. Future Work • 3 more years of unused data: 1992-1994 • Do discrepancies from standard practice correlate with: • practice characteristics? • patient characteristics? • payor characteristics? • outcomes? • Compare with data/models for patients in Europe • Confidence intervals • Research on machine learning methods to improve calibration • Apply this methodology to other problems in medicine

  21. Sub-Population Assessment • Accurately predict the c-section rate for groups of patients • aggregate prediction • not worried about predictions for individual patients • Retrospective analysis • not prediction for future patients (i.e. women currently pregnant) • prediction for pregnancies that have already gone to term • the risk is assumed by the physician, not by the machine learning • Tell a physician/practice if their rate does not match the rate predicted by the standard-practice model

  22. Calibration • Good calibration: • given 1000 patients with identical features, the observed c-sec rate should equal the average predicted rate on those patients • We often have the overall observed rate matching the overall average prediction, but that's not good enough: predicting 0.168 for every patient would match the overall 16.8% c-sec rate while saying nothing about which patients are at risk
