
Recurrence using Claims and PROs for SEER Enhancement: The ReCAPSE Project

The ReCAPSE project aims to enhance cancer registries by adding information on recurrence events derived from medical claims and patient self-report. Previous work using claims to identify recurrence has shown promising results. This project focuses on developing a prediction algorithm that uses rich features from SEER registry and claims data to accurately identify and time recurrence events.


Presentation Transcript


  1. Recurrence using Claims and PROs for SEER Enhancement: The ReCAPSE Project Ruth Etzioni Fred Hutchinson Cancer Research Center With Teresa A’mar, Jessica Chubak, Bin Huang, David Beatty, Tomas Corey, Daniel Markowitz, Catherine Fedorenko, Chris Li, Kathi Malone and Steve Schwartz

  2. Augmenting cancer registries to capture recurrence • Problem: Information on recurrence events is not prospectively collected • Ideal: Add information about recurrence as part of the abstract of data collected for each registry patient Two data approaches: • Basic or minimalist: target a few broadly available data streams to inform the likely occurrence and timing of the recurrence event • Exhaustive: target every stream of relevant data to be sure to cover the occurrence and timing of the recurrence event • Either way, we will generally be predicting via some sort of algorithm

  3. What is the objective? • To add recurrence status and timing to the data abstract for each individual in the registry? • To summarize the population burden of recurrence within subgroups defined by patient and disease characteristics? • To provide guidance to registries doing automated and manual abstraction of recurrence data to reduce time and effort?

  4. What is the objective? • To add recurrence status and timing to the data abstract for each individual in the registry? • To summarize the population burden of recurrence within subgroups defined by patient and disease characteristics? • To provide guidance to registries doing automated and manual abstraction of recurrence data to reduce time and effort? • Need high sensitivity and high specificity • Need high positive predictive value • Need high negative predictive value

  5. ReCAPSE ReCAPSE aims to identify recurrence events using • Medical claims • Patient self-report Why medical claims? • STRUCTURED, SCALABLE Why patient self-report? • IMPROVE ACCURACY, MORE CONVENIENT THAN EVER

  6. Previous work using claims for recurrence Chubak et al. 2012: “Second BC event” • 3152 women from a single network diagnosed with stage I/II BC • Classification tree algorithm using procedure, diagnosis and drug codes • High-sensitivity algorithm • Sensitivity = 96% • Specificity = 95% • High-specificity algorithm • Sensitivity = 89% • Specificity = 99% A (SINGLE) DECISION TREE
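A minimal sketch of the kind of single classification tree Chubak et al. describe, assuming claims have already been rolled up into per-patient counts of selected procedure, diagnosis, and drug code groups (the feature matrix and labels below are synthetic placeholders, not their data):

```python
# Hedged sketch: a single classification tree on claims-derived features,
# in the spirit of Chubak et al. 2012 (features and labels are illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n = 500
X = rng.poisson(1.0, size=(n, 3))    # toy counts of procedure/diagnosis/drug code groups
y = rng.binomial(1, 0.1, size=n)     # 1 = second breast-cancer event, 0 = none

tree = DecisionTreeClassifier(max_depth=3, class_weight="balanced", random_state=0)
tree.fit(X, y)

# "High-sensitivity" vs. "high-specificity" variants come from tuning depth,
# class weights, or the classification threshold.
pred = tree.predict(X)
sensitivity = recall_score(y, pred)               # P(prediction positive | recurrent)
specificity = recall_score(y, pred, pos_label=0)  # P(prediction negative | non-recurrent)
```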

  7. Recent work using claims for recurrence Ritzwoller et al. 2018 • 6740 stage I-III BC cases • Two-phase algorithm • Phase I – recurrence classification via ICD9 scores and regression • Phase II – recurrence timing using changepoint analysis to locate the point of biggest change in code group count • Reconcile changepoints across code groups • AUC for phase I: 0.9-0.95 • Timing error 11-12%
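A minimal sketch of the phase-II idea, using a simplified two-segment mean-shift rule as a stand-in for the published changepoint analysis: for one code group's monthly count series, pick the month where the mean count before versus after changes the most.

```python
# Hedged sketch: locate the month where a code group's monthly counts shift most
# (simplified mean-shift changepoint; illustrative, not the published method).
import numpy as np

def changepoint(counts):
    counts = np.asarray(counts, dtype=float)
    best_t, best_gap = None, -np.inf
    for t in range(1, len(counts)):          # candidate split: months [0, t) vs. [t, end)
        gap = abs(counts[t:].mean() - counts[:t].mean())
        if gap > best_gap:
            best_t, best_gap = t, gap
    return best_t

monthly_counts = [0, 1, 0, 0, 1, 4, 6, 5, 7, 6]   # toy series for one code group
print(changepoint(monthly_counts))                 # -> 5, the month the counts jump
```

Reconciling changepoints across code groups would then combine the per-group estimates (e.g., by consensus or weighting), as the slide notes.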

  8. Outline • Problem statement • Previous work on claims for recurrence • The critical ingredients • Data with rich “features” for prediction • Gold standard for training and evaluation • A prediction algorithm • Metrics for performance on test data • Ongoing and future work

  9. Outline • Problem statement • Previous work on claims for recurrence • The critical ingredients • Data with rich features for prediction • Gold standard for training and evaluation • A prediction algorithm • Metrics for performance on test data • Ongoing and future work

  10. DATA: SEER registry and claims from KPWA • Registry data from CSS: Puget Sound Cancer Surveillance System • Claims from KPWA internal data warehouse • Gold standard: manual medical record review [Figure: counts of monthly claims by month since diagnosis]

  11. Patient characteristics and incidence of recurrence 2698 non-recurrent and 404 recurrent cases [Figure: timing of gold-standard recurrence events, based on medical record review]

  12. Outline • Problem statement • Previous work on claims for recurrence • The critical ingredients • Data with rich features for prediction • Gold standard for training and evaluation • A prediction algorithm • Metrics for performance on test data • Ongoing and future work

  13. Prediction algorithm concept • Predict the event of recurrence at the month level • Classify each month as pre-recurrence (0) or post-recurrence (1) • The prediction is a probability that the month is post-recurrence [Figure: gold-standard data (blue dots) for each month: 0 if pre- and 1 if post-recurrent]
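A minimal sketch of the month-level framing, assuming each patient has a known gold-standard recurrence month (the function name and input format are illustrative):

```python
# Hedged sketch: label each follow-up month 0 (pre-recurrence) or 1 (post-recurrence).
def month_labels(n_followup_months, recurrence_month=None):
    """recurrence_month is the gold-standard month of recurrence (None if never recurrent)."""
    return [
        0 if recurrence_month is None or m < recurrence_month else 1
        for m in range(n_followup_months)
    ]

print(month_labels(8, recurrence_month=5))   # [0, 0, 0, 0, 0, 1, 1, 1]
print(month_labels(8))                       # non-recurrent: all zeros
```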

  14. Features consist of registry variables and monthly counts of code groups: 77 diagnostic and 156 procedure code groups

  15. Feature engineering The feature set is designed to: • Identify changes in the code groups that associate with a second event • Be able to handle gaps in the data • Use all available information to predict the outcome at any given month Each month, predict pre- or post-recurrence based on: • Count of each code group in that month • Months since last instance of that code group • Months until next instance of that code group • Fraction of prior months in which the code group appears Note: this feature set looks backward and forward to predict the present
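A minimal sketch of these per-month, per-code-group features, using the names that appear on the "Top features" slide (cumul_ratio, time_since, time_until); the input layout and the exact conventions (e.g., whether the current month counts toward time_since) are assumptions:

```python
# Hedged sketch of the month-level feature set for a single code group:
# count, months since last instance, months until next instance, and the
# fraction of prior months containing the code (cumul_ratio).
import math

def code_group_features(monthly_counts):
    n = len(monthly_counts)
    # precompute the next month (current or later) in which the code appears
    next_seen, nxt = [None] * n, None
    for m in range(n - 1, -1, -1):
        if monthly_counts[m] > 0:
            nxt = m
        next_seen[m] = nxt

    feats, last_seen, months_with_code = [], None, 0
    for m in range(n):
        time_since = (m - last_seen) if last_seen is not None else math.inf
        time_until = (next_seen[m] - m) if next_seen[m] is not None else math.inf
        cumul_ratio = months_with_code / m if m > 0 else 0.0  # fraction of prior months with the code
        feats.append({
            "count": monthly_counts[m],
            "time_since": time_since,
            "time_until": time_until,
            "cumul_ratio": cumul_ratio,
        })
        if monthly_counts[m] > 0:
            last_seen = m
            months_with_code += 1
    return feats

for row in code_group_features([0, 2, 0, 0, 3]):
    print(row)
```

Repeating this across all 77 diagnostic and 156 procedure code groups, plus the registry variables, yields the month-level feature rows fed to the model.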

  16. Gradient boosted trees • An ensemble algorithm: prediction is based on combining many trees • Each tree targets weaknesses (residuals) in the prior trees • There is no formula for the prediction; just a list of variables in order of their importance in the ensemble
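A minimal sketch of fitting such an ensemble on the month-level rows, using scikit-learn's gradient boosting as a stand-in for the project's actual implementation; the data here are synthetic placeholders:

```python
# Hedged sketch: gradient boosted trees on month-level feature rows.
# Each tree fits the residuals of the ensemble built so far; the fitted model
# exposes feature importances rather than a closed-form prediction formula.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n_rows, n_features = 5000, 20                  # month-level rows x engineered features (toy)
X = rng.normal(size=(n_rows, n_features))
y = rng.binomial(1, 0.1, size=n_rows)          # 1 = post-recurrence month (toy labels)

gbt = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3)
gbt.fit(X, y)

prob_post = gbt.predict_proba(X)[:, 1]         # probability each month is post-recurrence
ranked = np.argsort(gbt.feature_importances_)[::-1]
print(ranked[:5])                              # indices of the most important features
```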

  17. Top features • All features are month-specific • cumul_ratio • Concentration of this code in previous months • time_since • Months since this code group was last seen • time_until • Months until next instance of this code group

  18. Month-level predictions – test set (recurrent case) [Figure: blue = observed, red = predicted]

  19. Month-level predictions – test set (recurrent case) [Figure: blue = observed, red = predicted] Predict that a person is recurrent given threshold T if any of their monthly predictions exceeds T

  20. Month-level predictions – test set (non-recurrent case) [Figure: blue = observed, red = predicted] Predict that a person is recurrent given threshold T if any of their monthly predictions exceeds T
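A minimal sketch of the person-level rule stated on these slides, with the predicted recurrence month taken as the first exceedance; variable names are illustrative:

```python
# Hedged sketch: collapse one patient's month-level probabilities to a person-level call.
def person_prediction(monthly_probs, threshold=0.5):
    """Return (is_recurrent, predicted_month) for one patient's monthly probabilities."""
    for month, p in enumerate(monthly_probs):
        if p > threshold:
            return True, month          # first month whose prediction exceeds T
    return False, None

print(person_prediction([0.02, 0.05, 0.10, 0.72, 0.93], threshold=0.5))   # (True, 3)
print(person_prediction([0.01, 0.03, 0.04], threshold=0.5))               # (False, None)
```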

  21. Outline • Vision • Previous work on claims for recurrence • The critical ingredients • Data with rich features for prediction • Gold standard for training and evaluation • A prediction algorithm • Metrics for performance on test data • Ongoing and future work

  22. Performance on test data • Month level for classification of pre- versus post-recurrent • Month-level ROC curve and AUC • Person level for classification of recurrent or not • For a threshold T, classify as recurrent if any predicted probabilities exceed T • Calculate sensitivity and specificity for each of a set of thresholds T • Accuracy of timing of event predictions given threshold T • Choose the predicted month as the first month for which the prediction exceeds T • Summarize predicted minus observed month for truly recurrent patients • Draw Kaplan-Meier curve of observed versus predicted time to recurrence
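A minimal sketch of this evaluation, assuming month-level labels and probabilities plus each patient's gold-standard recurrence month are available; the data layout (a list of per-patient dicts) is an assumption:

```python
# Hedged sketch of the evaluation steps: month-level AUC, person-level
# sensitivity/specificity at a threshold T, and predicted-minus-observed timing error.
import numpy as np
from sklearn.metrics import roc_auc_score

def month_level_auc(month_labels, month_probs):
    return roc_auc_score(month_labels, month_probs)

def person_level_metrics(patients, threshold):
    """patients: dicts with 'probs' (monthly probabilities) and 'recurrence_month' (None if none)."""
    tp = fp = tn = fn = 0
    timing_errors = []
    for p in patients:
        exceed = np.flatnonzero(np.asarray(p["probs"]) > threshold)
        predicted_recurrent = exceed.size > 0
        truly_recurrent = p["recurrence_month"] is not None
        if truly_recurrent and predicted_recurrent:
            tp += 1
            timing_errors.append(int(exceed[0]) - p["recurrence_month"])  # predicted minus observed
        elif truly_recurrent:
            fn += 1
        elif predicted_recurrent:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "mean_timing_error": float(np.mean(timing_errors)) if timing_errors else float("nan"),
    }
```

Running person_level_metrics over a grid of thresholds T gives the threshold-by-threshold table shown two slides below.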

  23. Month-level ROC curve and person-level performance [Figure: month-level ROC curve, AUC = 0.986] Person-level performance is based on classifying a patient as recurrent if any predicted probability exceeds T = 0.5

  24. Person-level performance by T: Test set

  T     #nr   #r   Sens    Spec    PPV     NPV
  0.10  538   79   0.962   0.942   0.710   0.994
  0.20  538   79   0.924   0.955   0.753   0.988
  0.30  538   79   0.886   0.959   0.761   0.983
  0.40  538   79   0.886   0.972   0.824   0.983
  0.50  538   79   0.886   0.978   0.854   0.983
  0.60  538   79   0.886   0.980   0.864   0.983
  0.70  538   79   0.873   0.983   0.885   0.981

  Sensitivity: P(prediction positive given recurrent) Specificity: P(prediction negative given non-recurrent) PPV: P(recurrent given prediction positive) NPV: P(non-recurrent given prediction negative)
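As a quick arithmetic check of these definitions against the T = 0.5 row (79 recurrent and 538 non-recurrent test cases), PPV can be recovered from sensitivity and specificity:

```python
# Hedged check: recompute PPV at T = 0.5 from the test-set counts and the tabled sens/spec.
n_recurrent, n_nonrecurrent = 79, 538
sens, spec = 0.886, 0.978
true_pos  = sens * n_recurrent              # ~70 recurrent cases correctly flagged
false_pos = (1 - spec) * n_nonrecurrent     # ~12 non-recurrent cases flagged
ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 3))                        # ~0.855, matching the tabled 0.854 up to rounding
```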

  25. Accuracy of predicted timing of recurrence: T = 0.5 The curve is constructed among the set of all truly recurrent cases in the test set (N = 79)
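A minimal sketch of that comparison using the lifelines package, assuming arrays of observed and predicted recurrence months for the truly recurrent test cases (all events observed; the numbers below are toy values, not the study data):

```python
# Hedged sketch: Kaplan-Meier curves of observed vs. predicted time to recurrence
# among truly recurrent test-set cases (toy data; lifelines used for illustration).
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

observed_months  = [6, 9, 12, 14, 20, 25, 31]    # gold-standard recurrence months (toy)
predicted_months = [6, 10, 12, 15, 19, 26, 33]   # first month predicted probability exceeds T

ax = plt.subplot(111)
for label, times in [("Observed", observed_months), ("Predicted", predicted_months)]:
    kmf = KaplanMeierFitter()
    kmf.fit(times, event_observed=[1] * len(times), label=label)  # every case recurs
    kmf.plot_survival_function(ax=ax)
plt.xlabel("Months since diagnosis")
plt.ylabel("Proportion recurrence-free")
plt.show()
```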

  26. Next steps Compare performance with other algorithms • Chubak et al. and Ritzwoller et al. Repeat and evaluate performance in a multi-insurer setting • HICOR-CSS linkage links registry, private insurers and Medicare/Medicaid in Puget Sound • Data include drug codes and codes for non-cancer claims in addition to diagnosis and procedure codes • The BRAVO research study at Fred Hutch has collected gold-standard recurrence data by interview and chart review from a subset of these cases Add patient-reported disease status for a subset of gold-standard cases • Cases from the Swedish Hospital Breast Cancer Registry whom we are surveying Deploy in other registries (Kentucky and beyond)

  27. What is the objective? • To add recurrence status and timing to the data abstract for each individual in the registry? • To summarize the population burden of recurrence within subgroups defined by patient and disease characteristics? • To provide guidance to registries doing automated and manual abstraction of recurrence data to reduce time and effort? • Need high sensitivity and high specificity • Need high positive predictive value • Need high negative predictive value

  28. STEVE SCHWARTZ CHRIS LI KATHI MALONE TOMAS COREY TERESA AMAR BIN HUANG (Kentucky Registry) CATHERINE FEDORENKO JESSICA CHUBAK (KPWA) DAVID BEATTY MD DANIEL MARKOWITZ MD NCI UG3-UH3 SUPPORT, NATIONAL CANCER INSTITUTE
