Patient LOS prediction: Evaluating the impact of different CS on the prediction accuracy of the C4.5 algorithm

Revlin Abbi, Elia El-Darzi, and Christos Vasilakis • University of Westminster, Harrow School of Computer Science

Presentation overview • Main Focus • Patient spell classification methodology • Impact of LOS classification scheme on prediction accuracy • Content of the presentation • The use of patient length of stay for decision making • Aim, objectives, and methods • Results and conclusion

Introduction • Health care issues • Health care systems are complex • Ageing of population in developed world • Increased demand and escalating costs • Patient length of stay (LOS) and decision making • Duration of time a patient spends in hospital • Readily available and easy to calculate • Proxy measure for resource consumption

Grouping and predicting LOS • Why automate the grouping according to LOS? • Provides simplified representation of population • Clinical judgement and visual inspection are subjective • Why predict patient LOS? • Improved discharge planning • Better allocation and scheduling of resources

Classification algorithms • Unsupervised algorithms • Partitions records into groups • Used when groups are unknown • E.g. K-means, Gaussian mixture modelling (GMM) • Supervised algorithms • Prediction of patient LOS • Maps patient characteristics to LOS classes • Previously combined with clinical judgement • E.g. C4.5, neural networks

Aim and objectives • Aim • Investigate the impact of different classification schemes on prediction accuracy • Identify if: • Prediction accuracy of a decision tree is affected by the choice of LOS classification scheme • Tree structure is affected by the LOS classification scheme • Number of patients within a class affects class accuracy

Dataset • Admissions over a 16-month period • Consists of 7723 records of patients undergoing surgery • Variables • Gender • Age • Date of admission and discharge • Public or private patient • Case type (emergency/planned) • Major diagnostic category (MDC)

Variability of patient LOS (LOS in days, by percentile)

Group                   25th  50th  75th  90th  95th  99th  100th
All patient records        1     3     7    13    20    45    228
Gender
  Women                    1     3     8    14    21    47    228
  Men                      1     3     6    12    19    43    106
Case type
  Emergency                2     3     8    15    23    48    228
  Non-emergency            1     2     5    11    16    30     98
Age
  0-19                     1     2     4     8    13    32     49
  20-39                    1     2     4     8    12    38    228
  40-59                    1     3     6    12    18    43    174
  60-79                    2     4     9    16    24    49    179
  80-100                   2     6    11    19    25    41     98
Patient type
  Public                   1     3     7    13    20    45    228
  Private                  2     4     8    13    21    41    106

Proposed methodology (flow diagram; the legend distinguishes inputs/outputs from processing steps) • Input data: Splitting into training data and testing data • Fitting GMMs to LOS using EM (output: set of GMMs) • Selecting a GMM using MDL (output: a single GMM) • Deriving LOS classification scheme and merging of boundaries (output: 127 LOS classification schemes) • Building the decision tree (C4.5) from the training data (output: decision tree) • Testing decision tree on the testing data (output: confusion matrix) • Calculate performance measures (output: performance measures)

Gaussian mixture model (GMM) • Fitted to the raw LOS data • Mixture of Gaussian functions • Expectation Maximisation (EM) algorithm • Iterative optimisation algorithm: fast and efficient • Prior, posterior and unconditional probability • Criterion for selecting the appropriate model • Need to decide on the number of Gaussian components • Minimum description length (MDL) • Quantifies each GMM: One to six components
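The fit-then-select loop on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`fit_gmm`, `mdl_score`), the quantile-based initialisation, the variance floor, and the synthetic data are all assumptions, and the MDL score uses the common two-part form (negative log-likelihood plus half the parameter count times log n), which the slides do not spell out.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    """Unconditional density of x under the whole mixture."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm(xs, k, iters=100, var_floor=1e-3):
    """Fit a k-component 1-D GMM by EM; returns (weights, means, variances, loglik)."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumed initialisation: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    variances = [max(sum((x - centre) ** 2 for x in xs) / n, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each component given each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate priors (weights), means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)
    loglik = sum(
        math.log(max(mixture_density(x, weights, means, variances), 1e-300)) for x in xs
    )
    return weights, means, variances, loglik


def mdl_score(loglik, k, n):
    """Assumed two-part MDL: lower is better. A 1-D GMM has 3k - 1 free parameters."""
    n_params = 3 * k - 1
    return -loglik + 0.5 * n_params * math.log(n)


# Synthetic stand-in for LOS values (NOT the study data): two well-separated groups.
rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(200)] + [rng.gauss(20.0, 2.0) for _ in range(100)]

# One to six components, as on the slide; keep the model with the smallest MDL.
fits = {k: fit_gmm(data, k) for k in range(1, 7)}
scores = {k: mdl_score(fits[k][3], k, len(data)) for k in fits}
best_k = min(scores, key=scores.get)
print(best_k, [round(m, 1) for m in sorted(fits[best_k][1])])
```

The penalty term is what stops the selection from always preferring six components: each extra Gaussian must improve the log-likelihood by more than its share of the penalty to be chosen.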

Building the decision tree • C4.5 - divide and conquer algorithm • Creates the decision tree from the training data • Example segment of a decision tree

Age <= 61 :
|   MDC = 19: 0-2 (6.0/2.0)
|   MDC = 0:
|   |   Pub_Priv = 2: 6-13 (2.0/1.0)
|   |   Pub_Priv = 1:
|   |   |   Age <= 49 :
|   |   |   |   Adm_Cat = 1: 14-36 (2.0/1.0)
|   |   |   |   Adm_Cat = 2:
|   |   |   |   |   Age <= 32 : 14-36 (4.0/2.0)
|   |   |   |   |   Age > 32 : 37-228 (9.0/1.0)
|   |   |   Age > 49 :

Reading the first leaf: If Age <= 61 and MDC is 19 then LOS is between 0 and 2 days
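To make the reading of the tree segment concrete, the branches shown on the slide can be transcribed as nested conditions. This is only a sketch: the function name `predict_los_class` and the argument names are hypothetical, and every branch the slide does not show (including the cut-off `Age > 49` subtree) returns `None` rather than a guessed class. In C4.5 output, a leaf annotation such as `(6.0/2.0)` means 6 training cases reached the leaf and 2 of them were misclassified.

```python
def predict_los_class(age, mdc, pub_priv, adm_cat):
    """Follow the tree segment from the slide.

    Returns the LOS class label (in days) or None where the slide
    does not show the branch.
    """
    if age <= 61:
        if mdc == 19:
            return "0-2"
        if mdc == 0:
            if pub_priv == 2:
                return "6-13"
            if pub_priv == 1:
                if age <= 49:
                    if adm_cat == 1:
                        return "14-36"
                    if adm_cat == 2:
                        return "14-36" if age <= 32 else "37-228"
                # Age > 49: this subtree is cut off on the slide.
    # Age > 61 and other MDC values are not part of the segment shown.
    return None


print(predict_los_class(age=55, mdc=19, pub_priv=1, adm_cat=1))
print(predict_los_class(age=40, mdc=0, pub_priv=1, adm_cat=2))
```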

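Performance measures
• Overall accuracy: $\text{Overall accuracy} = \frac{\text{No. of correct predictions}}{\text{Total no. of records}} \times 100$
• Class accuracy: $\text{Class accuracy} = \frac{\text{No. of correct predictions in class}}{\text{Total no. belonging to class}} \times 100$
• Prediction profit: $\text{Prediction profit} = \frac{\text{Overall accuracy} - \text{\% of max class}}{\text{\% of max class}} \times 100$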
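The three measures can be computed directly from a confusion matrix, as in the sketch below. The function name `performance` and the toy matrices are illustrative, and "% of max class" is read here as the percentage of records that belong to the largest class, which is the accuracy obtained by always predicting that class; the slides do not state this definition explicitly.

```python
def performance(confusion):
    """confusion[i][j] = number of records of true class i predicted as class j.

    Returns (overall accuracy %, list of class accuracies %, prediction profit).
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    overall = 100.0 * correct / total
    # Class accuracy: correct predictions in a class / records belonging to it.
    class_acc = [
        100.0 * row[i] / sum(row) if sum(row) else float("nan")
        for i, row in enumerate(confusion)
    ]
    # Assumed meaning of "% of max class": share of records in the largest class.
    max_class = 100.0 * max(sum(row) for row in confusion) / total
    profit = 100.0 * (overall - max_class) / max_class
    return overall, class_acc, profit


# Toy two-class example (made-up counts, not the study data).
overall, class_acc, profit = performance([[50, 10], [20, 20]])
print(round(overall, 1), [round(a, 1) for a in class_acc], round(profit, 1))

# A tree that predicts the largest class for every record gains nothing:
print(performance([[983, 0], [17, 0]])[2])
```

Under this reading, a prediction profit of zero or below means the tree does no better than always predicting the largest class, even when its overall accuracy looks high.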

Computational experiment • Enumeration: Merging class boundaries • 8 classes • 127 classification schemes • Merging of 8 classes (2-7 classes)
1) 0-1, 2, 3-5, 6-9, 10-11, 12-17, 18-33, and 34-228 (8 classes)
2) 0-1, 2, 3-5, 6-9, 10-11, 12-17, and 18-228 (7 classes)
3) 0-1, 2, 3-5, 6-9, 10-11, 12-33, and 34-228 (7 classes)
…
127) 0-33, 34-228 (2 classes)
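The count of 127 follows from the 7 internal boundaries between the 8 classes: each boundary is either kept or merged away, giving 2^7 = 128 combinations, minus the one that removes every boundary and leaves a single class. The sketch below enumerates them; the names `BASE_CLASSES`, `enumerate_schemes` and `label` are illustrative, and the ordering of the output is not claimed to match the numbering used in the study beyond the first and last schemes.

```python
from itertools import combinations

# The eight LOS classes (in days) listed as scheme 1 on the slide.
BASE_CLASSES = [(0, 1), (2, 2), (3, 5), (6, 9), (10, 11), (12, 17), (18, 33), (34, 228)]


def enumerate_schemes(classes):
    """Yield every scheme that keeps a non-empty subset of the internal
    boundaries between adjacent classes (so each scheme has >= 2 classes)."""
    n_cuts = len(classes) - 1              # 7 internal boundaries for 8 classes
    lo, hi = classes[0][0], classes[-1][1]
    for n_kept in range(n_cuts, 0, -1):    # from all 7 boundaries down to 1
        for kept in combinations(range(n_cuts), n_kept):
            scheme, start = [], lo
            for cut in kept:
                end = classes[cut][1]      # upper edge of the class before the cut
                scheme.append((start, end))
                start = end + 1            # classes are contiguous whole days
            scheme.append((start, hi))
            yield scheme


def label(scheme):
    """Format a scheme the way the slide does, e.g. '0-33, 34-228'."""
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in scheme)


schemes = list(enumerate_schemes(BASE_CLASSES))
print(len(schemes))
print(label(schemes[0]))
print(label(schemes[-1]))
```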

Results • Overall accuracy is affected by the classification scheme (CS) • Overall accuracy ranges 30.6%-98.3% • 47% of CS give an overall accuracy above 50%

Results • Prediction profit (PP) is affected by the selection of CS • 127 different CS resulted in a PP range from -7 to 25.3 • 49% of CS give a PP of zero or below • 46% of CS give a PP above one

Results • The structure of the decision tree changes • The size of the tree is affected (number of nodes) • Importance of variables changes • Gender selected before age and admission category • Age selected before admission category and gender • Wide class intervals • Tree predicts the same classification for all patients • 25% of CS resulted in the tree predicting the same class • e.g. 0-33 days consists of 98.3% of patient records

Conclusion • Selection of LOS classification scheme has an impact on prediction accuracy of decision tree • Structure of the decision tree is also affected • Classes with large proportions of records cause the tree to predict the same LOS class • Standard performance measures are not adequate • A new measure for accuracy robustness is needed

