
Using Machine Learning to Model Standard Practice: Retrospective Analysis of Group C-Section Rate via Bagged Decision Trees

Rich Caruana Cornell CS

Stefan Niculescu CMU CS

Bharat Rao Siemens Medical

Cynthia Simms Magee Hospital

C-Section in the U.S.
  • C-section rate in U.S. too high
    • Western Europe has lower rates, but comparable outcomes
    • C-section is major surgery => tough on mother
    • C-section is expensive
  • Why is U.S. rate so high?
    • Convenience (rate highest Fridays, before sporting events, …)
    • Litigation
    • Social and Demographic issues
  • Current controls
    • Financial: pay-per-patient instead of pay-per-procedure
    • Physician reviews: monthly/quarterly evaluation of rates
Risk Adjustment
  • Some practices specialize in high risk patients
  • Some practices have low-risk demographics
  • Must correct for patient population seen by each practice

Need an accurate, objective, evidence-based method for assessing the c-section rates of different physician practices

Modeling "Standard Practice" with Machine Learning
  • Not trying to improve outcomes
  • Maintain quality of outcomes while reducing c-sec rate
  • Compare physicians/practices to other physicians/practices
  • Warn physicians if their rate is higher than other practitioners'
Data
  • 3 years of data: 1995-1997, from southwestern PA
  • 22,175 expectant mothers
  • 16.8% c-section
  • 144 attributes per patient (82 used for learning)
  • 17 physician practices
  • C-section rate varies from 13% to 23% across practices
Learning Methods
  • Logistic Regression
  • Artificial Neural Nets
  • MML Decision Trees
    • Buntine's IND decision tree package
    • Maximum-size trees (thousands of nodes)
    • Probabilities generated by smoothing counts along the path to the leaf (one possible scheme is sketched after this list)
  • Bagged MML Decision Trees
    • Better accuracy
    • Better ROC Area
    • Excellent calibration
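
The talk doesn't spell out IND's smoothing scheme; below is a hedged sketch of one common approach (m-estimate shrinkage toward the parent's estimate along the root-to-leaf path). The weight m and the path-count representation are assumptions, not IND's actual method.

```python
# Hypothetical sketch of path smoothing, NOT IND's actual scheme: shrink each
# node's empirical c-sec rate toward the estimate inherited from its parent,
# walking from the root down to the leaf.
def smoothed_leaf_probability(path_counts, m=10.0):
    """path_counts: (n_csec, n_total) pairs from root down to the leaf;
    m is an assumed smoothing weight."""
    n_csec, n_total = path_counts[0]
    p = n_csec / n_total                      # root rate acts as the prior
    for n_csec, n_total in path_counts[1:]:
        p = (n_csec + m * p) / (n_total + m)  # m-estimate blend with parent estimate
    return p

# e.g. a root of ~3,726/22,175 c-secs narrowing to a 45/60 leaf
# (the intermediate node counts here are made up for illustration):
# smoothed_leaf_probability([(3726, 22175), (500, 1200), (45, 60)])
```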
Bagged Decision Trees
  • Draw 100 bootstrap samples of data
  • Train a tree on each sample -> 100 trees
  • Average prediction of trees on out-of-bag samples

Average prediction: (0.23 + 0.19 + 0.34 + 0.22 + 0.26 + … + 0.31) / #Trees = 0.24
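
A minimal sketch of this procedure, assuming a NumPy feature matrix X and 0/1 labels y (1 = c-section); sklearn's CART trees stand in for IND's MML trees, and leaf probabilities are left unsmoothed here:

```python
# Bagging with out-of-bag averaging: each patient's probability is averaged
# over the trees that never saw that patient during training.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_oob_probabilities(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    prob_sum = np.zeros(n)                       # sum of out-of-bag predictions
    oob_count = np.zeros(n)                      # trees that never saw the patient
    for _ in range(n_trees):
        boot = rng.integers(0, n, size=n)        # bootstrap sample, with replacement
        oob = np.setdiff1d(np.arange(n), boot)   # patients left out of this sample
        tree = DecisionTreeClassifier()          # grown to maximum size, as in the talk
        tree.fit(X[boot], y[boot])
        prob_sum[oob] += tree.predict_proba(X[oob])[:, 1]
        oob_count[oob] += 1
    return prob_sum / np.maximum(oob_count, 1)   # e.g. (0.23 + 0.19 + …) / #Trees
```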

Calibration
  • Good calibration: if 1000 patients have pred(x) = 0.1, ~100 of them should get a c-sec
Calibration
  • Model can be accurate but poorly calibrated
    • a good decision threshold can still be found even with uncalibrated probabilities
  • Model can have good ROC but be poorly calibrated
    • ROC is insensitive to scaling/stretching of the scores
    • only the ordering has to be correct, not the probabilities themselves (see the sketch after this list)
  • Model can have very high variance, but be well calibrated
    • if the expected value of pred(patient) is correct -> good calibration
    • not worried about prediction for any one patient:
      • aggregate prediction over groups of patients reduces variance
      • high variance of per patient prediction is OK
      • bias/variance tradeoff: favor reducing bias even at the cost of higher variance
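
A small self-contained sketch of the ROC point above, on synthetic data: a monotone squashing of the scores preserves their ordering, so ROC area is unchanged, while aggregate calibration is destroyed:

```python
# ROC area depends only on the ordering of the scores; calibration does not.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)                 # scores calibrated by construction
y = (rng.uniform(size=p.size) < p).astype(int)

p_squashed = p ** 3                          # monotone transform: same ordering

print(roc_auc_score(y, p), roc_auc_score(y, p_squashed))  # identical ROC areas
print(p.mean() - y.mean())                   # ~0.00: calibrated in aggregate
print(p_squashed.mean() - y.mean())          # ~-0.25: badly miscalibrated
```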
Measuring Calibration
  • Bucket method: group patients into buckets by predicted probability (see the sketch below the histogram)
  • In each bucket:
    • measure the observed c-sec rate
    • and the predicted c-sec rate (average of the probabilities)
    • if the observed c-sec rate is similar to the predicted c-sec rate => good calibration in that bucket

[Histogram: patients binned into ten prediction buckets with centers 0.05-0.95; x-axis: predicted c-section probability, 0.0-1.0]
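
A sketch of the bucket method with ten equal-width buckets, assuming arrays p (predicted probabilities) and y (0/1 outcomes); the count-weighted average gap at the end is one plausible reading of the "mean absolute calibration error" reported on the next slide:

```python
# Bucket-method calibration check: compare predicted and observed c-sec rates
# within each prediction bucket.
import numpy as np

def bucket_calibration(p, y, n_buckets=10):
    """Per bucket: (center, count, mean predicted rate, observed rate)."""
    edges = np.linspace(0.0, 1.0, n_buckets + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p >= lo) & ((p < hi) if hi < 1.0 else (p <= hi))
        if mask.any():
            rows.append(((lo + hi) / 2, int(mask.sum()), p[mask].mean(), y[mask].mean()))
    return rows

def mean_abs_calibration_error(p, y, n_buckets=10):
    rows = bucket_calibration(p, y, n_buckets)
    counts = np.array([r[1] for r in rows], dtype=float)
    gaps = np.abs(np.array([r[2] - r[3] for r in rows]))
    return np.average(gaps, weights=counts)   # an assumed count-weighted summary
```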

Overall Bagged MML DT Performance
  • Out-of-bag test points (bagging's built-in analogue of cross-validation)
  • Accuracy: 90.1% (baseline 83.2%)
  • ROC Area: 0.9233
  • Mean absolute calibration error = 0.013
  • Calibration is as good in the tails as in the center of the distribution
Aggregate Risk
  • The aggregate risk of c-section for a group is just the average of the model's predicted c-section probabilities over the patients in that group (sketched below)
  • If a group's aggregate risk matches its observed c-sec rate, then the group is performing c-secs in accordance with standard practice (as modeled by the learned model)
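
A sketch of this group comparison using a hypothetical pandas frame; the column names practice, p_csec (model probability), and csec (0/1 outcome) are illustrative, not from the study's data:

```python
# Per-practice aggregate risk vs. observed rate. Groups whose gap is far
# above zero perform more c-secs than the standard-practice model predicts.
import pandas as pd

def practice_report(df: pd.DataFrame) -> pd.DataFrame:
    out = df.groupby("practice").agg(
        n=("csec", "size"),
        predicted_rate=("p_csec", "mean"),   # aggregate risk for the group
        observed_rate=("csec", "mean"),
    )
    out["gap"] = out["observed_rate"] - out["predicted_rate"]
    return out.sort_values("gap")
```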
Sanity Check
  • Could models have good calibration in tails, but still be inaccurate on some groups of patients?
  • Suppose two kinds of patients, A and B, both with elevated risk: p(A) = p(B) = 0.8
  • Patients of kinds A and B differ, so the model might be more accurate or better calibrated on A than on B; a practice seeing mostly A patients could then be over- or under-estimated relative to a practice seeing mostly B
  • Are models good on each physician group?
Assumptions
  • Intrinsic factors: variables given to learning as inputs
  • Extrinsic factors: variables not given to learning
    • Provider (HMO vs. pay-per-procedure vs. no insurance vs. …)
    • patient/physician preference (e.g. socio-economic class)
  • Learning compensates for intrinsics, not extrinsics
    • assumes extrinsic factors are not correlated with intrinsic ones
    • correlations may weaken sensitivity
  • Differences between observed and predicted rates are due to extrinsic factors, model bias, and noise
  • If models are good, differences due mainly to extrinsic factors such as provider and physician/patient preferences
Summary & Conclusions
  • Bagged probabilistic decision trees are excellent
    • Accuracy: 86.9% -> 90.1%
    • ROC Area: 0.891 -> 0.923
    • Calibration: 0.169 -> 0.013!
  • Learned models of standard practice are good
  • 3 of 17 groups have high c-section rate
    • one has high-risk population
    • one has normal risk population
    • one has low-risk population!!!
  • One group has a c-sec rate 4% below its predicted rate
    • how do they do it?
    • are their outcomes good?
Future Work
  • 3 more years of unused data: 1992-1994
  • Do discrepancies from standard practice correlate with:
    • practice characteristics?
    • patient characteristics?
    • payor characteristics?
    • outcomes?
  • Compare with data/models for patients in Europe
  • Confidence intervals
  • Research on machine learning to improve calibration
  • Apply this methodology to other problems in medicine
Sub-Population Assessment
  • Accurately predict c-section rate for groups of patients
    • aggregate prediction
    • not worried about prediction for individual patients
  • Retrospective analysis
    • not prediction for future patients (i.e. pregnant women)
    • prediction for pregnancies that have already gone full term
    • risk assumed by physician, not machine learning
  • Tell physician/practice if their rate does not match rate predicted by the standard practice model
Calibration
  • Good calibration: given 1000 patients with identical features, the observed c-sec rate should equal the average predicted rate on those patients
  • Often have: the overall average prediction matching the overall observed rate, but that's not good enough
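
One way to formalize the distinction, writing \hat{p}(x) for the model's prediction and y for the 0/1 outcome (notation assumed here, not taken from the talk):

```latex
% Aggregate agreement -- often achieved, but not good enough:
\frac{1}{N}\sum_{i=1}^{N}\hat{p}(x_i) \;\approx\; \frac{1}{N}\sum_{i=1}^{N} y_i
% Good calibration requires the match to hold at every prediction level:
\mathbb{E}\!\left[\, y \mid \hat{p}(x) = p \,\right] = p \quad \text{for all } p \in [0,1]
```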