Imputation-enhanced Prediction of Septic Shock In ICU Patients

Imputation-enhanced Prediction of Septic Shock in ICU Patients. Joyce C. Ho, Cheng H. Lee and Joydeep Ghosh, University of Texas at Austin. HI-KDD 2012: ACM SIGKDD Workshop on Health Informatics. Presenter: Kiyana Zolfaghar.


Presentation Transcript


  1. Imputation-enhanced Prediction of Septic Shock in ICU Patients • Joyce C. Ho, Cheng H. Lee and Joydeep Ghosh, University of Texas at Austin • HI-KDD 2012: ACM SIGKDD Workshop on Health Informatics • Presenter: Kiyana Zolfaghar

  2. Outline • Motivation • Challenges of Clinical Data • Predictive model for • Sepsis Risk • Septic Shock • Impact of imputation methods on prediction • Results

  3. Sepsis and Septic Shock • Sepsis: a severe, systemic inflammatory response with a presumed or identified source of infection • Severe sepsis: sepsis with one or more of organ dysfunction, hypoperfusion, or hypotension • Septic shock: a complication characterized by low blood pressure despite treatment with >600 mL of fluid inputs in the last hour

  4. Motivation • Septic shock as a severe illness • the most common cause of death in western societies • 25% of ICU bed utilization in western countries • mortality rates range from 12.8% for sepsis to 45.7% for septic shock • Motivation for prediction of septic shock in ICU patients • Early intervention and therapy can improve patient outcomes • Treatment transition: from treatment by critical care physicians in later phases to proactive treatment in early phases

  5. Prediction of Sepsis and Septic Shock • Data mining approach for identifying patients at risk of developing sepsis • Issues regarding classification and prediction • Data preparation • Feature selection • Data cleaning • remove or reduce noise • treatment of missing values • Predictive models • Regression methods • Support vector machines • Decision trees • Bayesian classification …

  6. Challenges of Clinical Data • Typically noisy and inconsistently gathered • Manual recording of patient data at irregular intervals • Accurate measurement of physiological variables requires invasive techniques • Naïve solution • Simply ignore subjects or features with missing data • With the large amounts of missing data in clinical studies, this causes • a dramatic decrease in sample sizes or feature spaces • bias in the results
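
To make the cost of the naïve solution concrete, here is a minimal sketch of dropping incomplete subjects or incomplete features. The records and variable names (`hr`, `temp`, `lactate`) are hypothetical toy values, not taken from MIMIC-II.

```python
# Hypothetical toy records; None marks a value that was never recorded.
records = [
    {"hr": 88,   "temp": 37.1, "lactate": None},
    {"hr": 102,  "temp": None, "lactate": 2.4},
    {"hr": 95,   "temp": 38.3, "lactate": 3.1},
    {"hr": None, "temp": 36.8, "lactate": None},
    {"hr": 110,  "temp": 39.0, "lactate": 4.0},
]

def drop_incomplete_rows(rows):
    """Keep only subjects with every feature observed."""
    return [r for r in rows if all(v is not None for v in r.values())]

def drop_incomplete_features(rows):
    """Keep only features that are observed for every subject."""
    complete = [k for k in rows[0] if all(r[k] is not None for r in rows)]
    return [{k: r[k] for k in complete} for r in rows]

kept_rows = drop_incomplete_rows(records)
kept_features = drop_incomplete_features(records)

print(f"subjects: {len(records)} -> {len(kept_rows)}")
print(f"features: {len(records[0])} -> {len(kept_features[0])}")
```

Even with only a few gaps per record, most of the toy subjects and all of the toy features are lost, which is the shrinkage the slide warns about.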

  7. The Paper Contribution • Investigates the role and impact of imputation methods while building predictive models for • Sepsis risk • Septic shock • Methodology of Research • Data Selection • Building predictive models for sepsis and Septic shock • Leveraging different imputation methods on data • Results

  8. Dataset Description • MIMIC-II Database (Multiparameter Intelligent Monitoring in Intensive Care) • Publicly and freely available • Includes a very large population of ICU patients • Contains high temporal resolution data including • lab results • electronic documentation • monitor trends and waveforms • Funded by: National Institute of Biomedical Imaging and Bioengineering

  9. Clinical Records in MIMIC-II • Overview of the data categories • General • Patient demographics • Hospital admissions & discharge Info. • Room tracking, death dates • ICD-9 codes • Physiological measures • Hourly vital sign metrics • Medication records • Lab test results • Fluid Balance • Input and output records • Notes and Reports • Discharge summary, nursing progress notes • Radiology and echo reports.

  10. Data Selection and Target Classes • Dataset size: 12,179 patients • Exclude patients younger than 18 at time of admission • Include patients with at least ten observations of BP, TEMP, HR… • Target classes • Sepsis risk prediction • Patients identified by ICD-9 codes ("995.91" or "995.92") • ~10.8% of the dataset (1,310 patients) • Septic shock prediction • Patients with hypotension and total fluid intake >600 mL • ~44.7% of sepsis patients (586 patients)
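
The class proportions on this slide can be checked from the patient counts, and the sepsis label reduces to a code lookup. A small sketch, where the helper name `has_sepsis_code` is hypothetical and the counts are the ones quoted on the slide:

```python
# Counts quoted on the slide.
total_patients = 12179
sepsis_patients = 1310
shock_patients = 586

sepsis_pct = 100 * sepsis_patients / total_patients   # share of the dataset
shock_pct = 100 * shock_patients / sepsis_patients    # share of sepsis patients

print(f"sepsis: {sepsis_pct:.1f}% of dataset")
print(f"septic shock: {shock_pct:.1f}% of sepsis patients")

# ICD-9 codes used as the sepsis target, as listed on the slide.
SEPSIS_CODES = {"995.91", "995.92"}

def has_sepsis_code(icd9_codes):
    """Hypothetical helper: True if any of a patient's codes marks sepsis."""
    return any(code in SEPSIS_CODES for code in icd9_codes)
```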

  11. Predictive Model for Sepsis Risk • Features • Patient's clinical history • Demographic data (gender and age) • Medical history • Basic health data (weight ..) • Measurements of physiological variables • Logistic regression as prediction model, fitted on four feature sets • clinical history features only • clinical history features after stepwise regression • all available features • all available features after stepwise regression

  12. Stepwise logistic Regression model • Logistic Regression • Type of regression analysis used for predicting the outcome of a categorical target variable • Stepwise Regression • the choice of predictive variables is carried out by an automatic procedure • starting with no variables in the model • testing the addition of each variable using a chosen model comparison criterion • adding the variable (if any) that improves the model the most • repeating this process until none improves the model.
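
The forward stepwise procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the comparison criterion is assumed to be AIC, the logistic model is fitted with a plain Newton iteration, and the data are synthetic.

```python
import numpy as np

def log_likelihood(X, y, n_iter=25):
    """Fit logistic regression (with intercept) by Newton's method and
    return the maximized log-likelihood."""
    Xb = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
        grad = Xb.T @ (y - p)
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb
        hess += 1e-8 * np.eye(Xb.shape[1])  # tiny jitter for numerical stability
        w += np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def aic(X, y):
    """Akaike information criterion; lower is better (assumed criterion)."""
    n_params = X.shape[1] + 1  # coefficients plus intercept
    return 2 * n_params - 2 * log_likelihood(X, y)

def forward_stepwise(X, y):
    """Start with no variables; repeatedly add the one that lowers AIC most."""
    selected, remaining = [], list(range(X.shape[1]))
    best = aic(np.empty((len(y), 0)), y)  # intercept-only model
    while remaining:
        scores = {j: aic(X[:, selected + [j]], y) for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:  # no candidate improves the model
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected

# Synthetic data: only columns 0 and 2 influence the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
prob = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.5 * X[:, 2])))
y = (rng.random(400) < prob).astype(float)

selected = forward_stepwise(X, y)
print("selected columns:", selected)
```

Backward stepwise selection follows the same pattern in reverse: start from all variables and repeatedly remove the one whose removal improves the criterion most.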

  13. Septic Shock Prediction Model • Features • Physiologic and laboratory values • Importance of time in septic shock • Feature matrices created at reference times of 30, 60, 90, and 120 minutes prior to the onset of septic shock • Prediction models • Logistic regression • Support vector machine • Classification tree • Feature sets • all available features • feature set after forward stepwise regression • feature set after backward stepwise regression
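
One way to picture the reference-time feature matrices is sketched below. The slide does not say how values are chosen at each reference time, so this assumes the most recent observation at or before the cutoff is used; the observations, variable names, and onset time are hypothetical.

```python
REFERENCE_MINUTES = (30, 60, 90, 120)

def feature_row(observations, onset, lead, variables):
    """Most recent value of each variable at or before (onset - lead).

    observations: list of (minute, variable_name, value) tuples.
    Returns None for a variable with no observation before the cutoff.
    """
    cutoff = onset - lead
    row = []
    for name in variables:
        past = [(t, v) for (t, var, v) in observations
                if var == name and t <= cutoff]
        row.append(max(past, key=lambda tv: tv[0])[1] if past else None)
    return row

# Hypothetical patient: minutes since admission, variable, value.
observations = [
    (0, "hr", 80), (45, "hr", 92), (100, "hr", 105), (170, "hr", 118),
    (10, "sbp", 120), (95, "sbp", 101), (160, "sbp", 88),
]
onset = 200  # hypothetical minute at which septic shock begins

rows = {lead: feature_row(observations, onset, lead, ["hr", "sbp"])
        for lead in REFERENCE_MINUTES}
for lead, row in rows.items():
    print(f"{lead:>3} min before onset: {row}")
```

Stacking one such row per patient gives one feature matrix per reference time; variables with no earlier observation come out as missing, which is where imputation enters.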

  14. Decision Tree Learning • Goal • Create a model that predicts the value of a target variable based on several input variables • Learning a decision tree • Recursive partitioning based on a selected attribute • Stop partitioning when all samples for a given node belong to the same class • Types of decision tree • Classification trees • Regression trees • [Figure: example classification tree that splits on sex (male / female), age (<= 9.5 / > 9.5), and sibsp (<= 2.5 / > 2.5), with leaves labeled survived or died]
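
Recursive partitioning with the stopping rule above can be sketched in a few lines. This is a minimal illustration that assumes Gini impurity as the attribute-selection measure (the slide does not name one); the toy rows loosely echo the sex and age splits in the figure and are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (impurity, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        # Excluding the largest value keeps both sides of the split non-empty.
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [l for r, l in zip(rows, labels) if r[f] <= thr]
            right = [l for r, l in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best

def build_tree(rows, labels):
    """Split recursively until every node holds a single class."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels)
    if split is None:  # identical rows with mixed labels: majority vote
        return Counter(labels).most_common(1)[0][0]
    _, f, thr = split
    lo = [(r, l) for r, l in zip(rows, labels) if r[f] <= thr]
    hi = [(r, l) for r, l in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in lo], [l for _, l in lo]),
            build_tree([r for r, _ in hi], [l for _, l in hi]))

def predict(tree, row):
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        f, thr, lo, hi = tree
        tree = lo if row[f] <= thr else hi
    return tree

# Hypothetical toy data: [sex (0 = female, 1 = male), age].
rows = [[0, 30], [0, 8], [0, 45], [1, 5], [1, 8], [1, 30], [1, 50], [1, 22]]
labels = ["survived"] * 5 + ["died"] * 3

tree = build_tree(rows, labels)
print(predict(tree, [1, 40]), predict(tree, [0, 60]))
```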

  15. Missing Value Imputation • Missing data in MIMIC-II • Excluding records with missing values causes a 47.2% reduction in dataset size

  16. Imputation Methods 1) Mean Feature Values (Mean for Subgroup) • Derived from the patient's gender and age group • Accounts for fundamental physiological differences between genders and among age groups • Challenges • Mean substitution is especially problematic when there are many missing values • It distorts the distribution and variance
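
Subgroup mean imputation can be sketched as below. The age bins, field names, and patients are hypothetical; the slide only states that the mean is derived from gender and age group. Falling back to the overall mean for an empty subgroup is an added assumption.

```python
from collections import defaultdict

def age_group(age):
    """Hypothetical age bins; the slide does not specify the grouping."""
    return "18-39" if age < 40 else "40-64" if age < 65 else "65+"

def impute_group_means(patients, feature):
    """Fill missing values of `feature` with the (gender, age group) mean."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for p in patients:
        if p[feature] is not None:
            key = (p["gender"], age_group(p["age"]))
            sums[key][0] += p[feature]
            sums[key][1] += 1
    observed = [p[feature] for p in patients if p[feature] is not None]
    overall = sum(observed) / len(observed)

    filled = []
    for p in patients:
        q = dict(p)  # copy so the input records are left untouched
        if q[feature] is None:
            total, count = sums.get((q["gender"], age_group(q["age"])), (0.0, 0))
            # Assumption: fall back to the overall mean for an empty subgroup.
            q[feature] = total / count if count else overall
        filled.append(q)
    return filled

patients = [
    {"gender": "F", "age": 30, "hr": 70},
    {"gender": "F", "age": 35, "hr": 80},
    {"gender": "F", "age": 25, "hr": None},
    {"gender": "M", "age": 70, "hr": 90},
    {"gender": "M", "age": 72, "hr": None},
    {"gender": "M", "age": 50, "hr": None},
]
filled = impute_group_means(patients, "hr")
print([p["hr"] for p in filled])
```

Every imputed patient in a subgroup receives the same value, which is why this approach shrinks the variance when many values are missing.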

  17. Imputation Methods 2) Matrix Factorization-based Approaches (very popular in bioinformatics) • SVDImpute • Uses a linear combination of the k most significant eigenvectors to predict the missing value • Probabilistic Principal Component Analysis (PPCA) • Combines an Expectation-Maximization (EM) approach to Principal Component Analysis (PCA) with a probabilistic model • Uses a likelihood function to penalize data far from the training set • Bayesian PCA (BPCA) • EM approach + Bayesian model to calculate the likelihood for reconstructed data
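
The low-rank idea behind these methods can be sketched with an iterative truncated-SVD fill. This is a simplified variant for illustration, not the exact SVDImpute, PPCA, or BPCA procedure: missing entries start at the column mean and are repeatedly replaced by a rank-k reconstruction. The function name, the choice of k, and the toy matrix are assumptions.

```python
import numpy as np

def svd_impute(X, k=1, n_iter=100, tol=1e-6):
    """Fill NaN entries of X with values from a rank-k SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k]
        # Observed entries are kept; only missing ones are updated.
        updated = np.where(missing, low_rank, X)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled

# Toy rank-1 matrix with three entries removed.
truth = np.outer([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 2.0, 3.0, 4.0])
X_missing = truth.copy()
X_missing[0, 1] = X_missing[3, 2] = X_missing[5, 0] = np.nan

completed = svd_impute(X_missing, k=1)
print(np.round(completed, 2))
```

Unlike a group mean, the filled value depends on the patient's other measurements, because the reconstruction uses correlations across columns.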

  18. Sepsis Risk Prediction Results • No baseline model to compare the results with • Evaluation metric • AUC (area under the ROC curve)
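
AUC can be read as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal sketch of that pairwise definition, with hypothetical labels and scores:

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Each positive/negative pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = developed sepsis, 0 = did not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.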

  19. Septic Shock Prediction Results • The septic shock EWS as baseline • Prediction model: logistic regression • Predicts the onset of septic shock one hour in advance • Uses invasively gathered data from MIMIC waveform data

  20. Imputation-enhanced Prediction of Septic Shock • Impact of various imputation methods at different reference times • Compared against the baseline, using the logistic regression model

  21. AUC Curves for predicting septic shock 60 minutes before onset

  22. Septic shock prediction 60 minutes before onset for three types of models:

  23. Effect of Imputation on Logistic Regression Coefficients for Predicting Septic Shock • Consistency across different imputation methods • Inconsistency of values obtained with and without imputation • Non-imputed model suffers from over-fitting

  24. Conclusion • Imputing missing data can improve model performance, especially when dealing with larger, noisier, and more incomplete datasets • Matrix factorization imputation methods like BPCA lead to models with better predictive accuracy than simpler approaches like group means.
