
Patient LOS prediction: Evaluating the impact of different CS on the prediction accuracy of the C4.5 algorithm

Revlin Abbi, Elia El-Darzi, and Christos Vasilakis. University of Westminster, Harrow School of Computer Science.



Presentation Transcript


  1. Patient LOS prediction: Evaluating the impact of different CS on the prediction accuracy of the C4.5 algorithm • Revlin Abbi, Elia El-Darzi, and Christos Vasilakis • University of Westminster, Harrow School of Computer Science

  2. Presentation overview • Main Focus • Patient spell classification methodology • Impact of LOS classification scheme on prediction accuracy • Content of the presentation • The use of patient length of stay for decision making • Aim, objectives, and methods • Results and conclusion

  3. Introduction • Health care issues • Health care systems are complex • Ageing of population in developed world • Increased demand and escalating costs • Patient length of stay (LOS) and decision making • Duration of time a patient spends in hospital • Readily available and easy to calculate • Proxy measure for resource consumption

  4. Grouping and predicting LOS • Why automate the grouping according to LOS? • Provides simplified representation of population • Clinical judgement and visual inspection are subjective • Why predict patient LOS? • Improved discharge planning • Better allocation and scheduling of resources

  5. Classification algorithms • Unsupervised algorithms • Partitions records into groups • Used when groups are unknown • E.g. K-means, Gaussian mixture modelling (GMM) • Supervised algorithms • Prediction of patient LOS • Maps patient characteristics to LOS classes • Previously combined with clinical judgement • E.g. C4.5, neural networks

  6. Aim and objectives • Aim • Investigate the impact of different classification schemes on prediction accuracy • Identify if: • Prediction accuracy of a decision tree is affected by the choice of LOS classification scheme • Tree structure is affected by the LOS classification scheme • Number of patients within a class affects class accuracy

  7. Dataset • Admissions over a 16-month period • Consists of 7723 records of patients undergoing surgery • Variables • Gender • Age • Date of admission and discharge • Public or private patient • Case type (emergency/planned) • Major diagnostic category (MDC)
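LOS can be derived directly from the admission and discharge dates listed among the variables. The following is a minimal sketch, not the authors' code: the function name and the example dates are hypothetical, and it assumes LOS is counted in whole days with a same-day discharge counting as 0 (consistent with the LOS classes later in the slides starting at 0).

```python
from datetime import date


def length_of_stay(admitted: date, discharged: date) -> int:
    """Return LOS in whole days; a same-day discharge counts as 0 (assumption)."""
    if discharged < admitted:
        raise ValueError("discharge date precedes admission date")
    return (discharged - admitted).days


# Hypothetical spell: admitted 1 March, discharged 8 March.
print(length_of_stay(date(2004, 3, 1), date(2004, 3, 8)))
```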

  8. Variability of patient LOS (LOS in days at each percentile)

     Group                   25th  50th  75th  90th  95th  99th  100th
     All patient records        1     3     7    13    20    45    228
     Gender
       Women                    1     3     8    14    21    47    228
       Men                      1     3     6    12    19    43    106
     Case type
       Emergency                2     3     8    15    23    48    228
       Non-emergency            1     2     5    11    16    30     98
     Age
       0-19                     1     2     4     8    13    32     49
       20-39                    1     2     4     8    12    38    228
       40-59                    1     3     6    12    18    43    174
       60-79                    2     4     9    16    24    49    179
       80-100                   2     6    11    19    25    41     98
     Patient type
       Public                   1     3     7    13    20    45    228
       Private                  2     4     8    13    21    41    106

  9. Proposed methodology [flowchart; legend: input/output, processing step] • Processing steps • Splitting • Fitting GMMs to LOS using EM • Selecting a GMM using MDL • Deriving LOS classification scheme and merging of boundaries • Building the decision tree (C4.5) • Testing decision tree • Calculate performance measures • Inputs/outputs • Input data • Training data • Testing data • Set of GMMs • A single GMM • 127 LOS classification schemes • Decision tree • Confusion matrix • Performance measures

  10. Gaussian mixture model (GMM) • Fitted to the raw LOS data • Mixture of Gaussian functions • Expectation Maximisation (EM) algorithm • Iterative optimisation algorithm: fast and efficient • Prior, posterior and unconditional probability • Criterion for selecting the appropriate model • Need to decide on the number of Gaussians • Minimum description length (MDL) • Quantifies each GMM: one to six components
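The fit-then-select loop on this slide can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the slides do not give the MDL formula, so the common two-part form (negative log-likelihood plus half the number of free parameters times log n) is assumed; the initialisation, the variance floor, the function names, and the synthetic "LOS" sample are all hypothetical.

```python
import math
import random


def log_likelihood(xs, weights, means, variances):
    """Sum over records of the log unconditional (mixture) density."""
    total = 0.0
    for x in xs:
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total


def fit_gmm(xs, k, iters=100, min_var=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM; return (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Assumed deterministic start: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    spread = max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)
    variances = [spread] * k
    weights = [1.0 / k] * k  # prior probability of each component
    for _ in range(iters):
        # E-step: posterior probability of each component for each record.
        post = []
        for x in xs:
            joint = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                     for w, m, v in zip(weights, means, variances)]
            evidence = max(sum(joint), 1e-300)  # unconditional density of x
            post.append([p / evidence for p in joint])
        # M-step: re-estimate priors, means and variances from the posteriors.
        for j in range(k):
            mass = max(sum(r[j] for r in post), 1e-300)
            weights[j] = mass / n
            means[j] = sum(r[j] * x for r, x in zip(post, xs)) / mass
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(post, xs)) / mass,
                min_var)
    return weights, means, variances


def mdl(xs, k, params):
    """Assumed MDL score: -log L + (free parameters / 2) * log n; lower is better."""
    free = 3 * k - 1  # k means, k variances, k - 1 independent weights
    return -log_likelihood(xs, *params) + 0.5 * free * math.log(len(xs))


# Synthetic stand-in for LOS data: many short stays plus a smaller long-stay group.
random.seed(0)
los = ([random.gauss(3, 1) for _ in range(150)]
       + [random.gauss(20, 4) for _ in range(50)])

# Score mixtures with one to six components and keep the lowest MDL.
scores = {k: mdl(los, k, fit_gmm(los, k)) for k in range(1, 7)}
best_k = min(scores, key=scores.get)
print(best_k, round(scores[best_k], 1))
```

The penalty term is what stops the selection from always favouring six components: each extra Gaussian must improve the log-likelihood by more than 1.5 log n to be kept.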

  11. Building the decision tree • C4.5 - divide and conquer algorithm • Creates the decision tree from the training data • Example segment of a decision tree:

      Age <= 61 :
      |   MDC = 19: 0-2 (6.0/2.0)
      |   MDC = 0:
      |   |   Pub_Priv = 2: 6-13 (2.0/1.0)
      |   |   Pub_Priv = 1:
      |   |   |   Age <= 49 :
      |   |   |   |   Adm_Cat = 1: 14-36 (2.0/1.0)
      |   |   |   |   Adm_Cat = 2:
      |   |   |   |   |   Age <= 32 : 14-36 (4.0/2.0)
      |   |   |   |   |   Age > 32 : 37-228 (9.0/1.0)
      |   |   |   Age > 49 :

      Reading the first rule: if Age <= 61 and MDC is 19, then LOS is between 0 and 2 days
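To make the reading of the tree concrete, the segment above can be transcribed as nested conditions. This is only a sketch: the function and parameter names are hypothetical, and it returns None wherever the slide shows no leaf (Age > 61, other MDC values, and the truncated Age > 49 branch). In C4.5 output, the pair after each leaf, such as (6.0/2.0), gives the number of training cases reaching the leaf and how many of them it misclassifies.

```python
from typing import Optional


def predict_los_class(age: int, mdc: int, pub_priv: int, adm_cat: int) -> Optional[str]:
    """Follow the tree segment from the slide; None means the slide shows no leaf."""
    if age > 61:
        return None  # branch not shown on the slide
    if mdc == 19:
        return "0-2"
    if mdc != 0:
        return None  # other MDC values are not shown
    if pub_priv == 2:
        return "6-13"
    if pub_priv != 1:
        return None
    if age > 49:
        return None  # "Age > 49" subtree is truncated on the slide
    if adm_cat == 1:
        return "14-36"
    if adm_cat == 2:
        return "14-36" if age <= 32 else "37-228"
    return None


# The rule quoted on the slide: Age <= 61 and MDC = 19 gives the 0-2 day class.
print(predict_los_class(age=60, mdc=19, pub_priv=1, adm_cat=1))
```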

  12. Performance measures • Overall accuracy = (No. of correct predictions / Total no. of records) * 100 • Class accuracy = (No. of correct predictions in class / Total no. belonging to class) * 100 • Prediction profit = ((Overall accuracy - % of max class) / % of max class) * 100
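The three measures can be computed from a confusion matrix as follows. This is a sketch under assumptions: rows are taken as actual classes and columns as predicted classes, "% of max class" is read as the percentage of records in the largest class, and the counts in the example are hypothetical (chosen so that the largest class holds 98.3% of records).

```python
def overall_accuracy(confusion):
    """Percentage of all records whose class was predicted correctly."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total * 100


def class_accuracy(confusion, i):
    """Percentage of records belonging to class i that were predicted as class i."""
    return confusion[i][i] / sum(confusion[i]) * 100


def max_class_share(confusion):
    """Percentage of records in the largest actual class (assumed '% of max class')."""
    total = sum(sum(row) for row in confusion)
    return max(sum(row) for row in confusion) / total * 100


def prediction_profit(confusion):
    """Relative gain of overall accuracy over always predicting the largest class."""
    base = max_class_share(confusion)
    return (overall_accuracy(confusion) - base) / base * 100


# Hypothetical two-class case: the tree predicts the first class for every record.
confusion = [[983, 0],
             [17, 0]]
print(overall_accuracy(confusion), prediction_profit(confusion))
```

In this case overall accuracy equals the share of the largest class, so the prediction profit is zero even though the accuracy looks high: the tree has learned nothing beyond the class sizes.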

  13. Computational experiment • Enumeration: Merging class boundaries • 8 classes • 127 classification schemes • Merging of 8 classes (2-7 classes)

      1) 0-1, 2, 3-5, 6-9, 10-11, 12-17, 18-33, and 34-228 (8 classes)
      2) 0-1, 2, 3-5, 6-9, 10-11, 12-17, and 18-228 (7 classes)
      3) 0-1, 2, 3-5, 6-9, 10-11, 12-33, and 34-228 (7 classes)
      … …
      127) 0-33, 34-228 (2 classes)
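The count of 127 follows from the seven internal boundaries between the eight classes: every non-empty subset of boundaries that is kept gives one scheme with at least two classes, so there are 2^7 - 1 = 127 of them. A minimal sketch of that enumeration follows; the function names and the ordering of schemes not shown on the slide are assumptions.

```python
from itertools import combinations

# Upper bound (in days) of each of the 8 original classes from the slide.
UPPER_BOUNDS = [1, 2, 5, 9, 11, 17, 33, 228]


def label(lo, hi):
    """Format a class as 'lo-hi', or as a single number when it spans one day."""
    return str(lo) if lo == hi else f"{lo}-{hi}"


def enumerate_schemes(upper_bounds):
    """List every scheme obtained by keeping a non-empty subset of internal boundaries."""
    cuts, top = upper_bounds[:-1], upper_bounds[-1]
    schemes = []
    # Start with all boundaries kept (8 classes) and end with one kept (2 classes).
    for n_cuts in range(len(cuts), 0, -1):
        for kept in combinations(cuts, n_cuts):
            lo, classes = 0, []
            for hi in list(kept) + [top]:
                classes.append(label(lo, hi))
                lo = hi + 1
            schemes.append(classes)
    return schemes


schemes = enumerate_schemes(UPPER_BOUNDS)
print(len(schemes))
print(schemes[0])
print(schemes[-1])
```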

  14. Results • Overall accuracy is affected by the CS • Overall accuracy ranges from 30.6% to 98.3% • 47% of CS have an overall accuracy above 50%

  15. Results • Prediction profit (PP) is affected by the selection of CS • The 127 different CS resulted in a PP ranging from -7 to 25.3 • 49% of CS have a PP of zero or below • 46% of CS have a PP above one

  16. Results • The structure of the decision tree changes • The size of the tree is affected (number of nodes) • Importance of variables changes • Gender selected before age and admission category • Age selected before admission category and gender • Wide class intervals • Tree predicts the same classification for all patients • 25% of CS resulted in the tree predicting the same class • e.g. the 0-33 days class contains 98.3% of patient records

  17. Conclusion • Selection of LOS classification scheme has an impact on prediction accuracy of decision tree • Structure of the decision tree is also affected • Classes with large proportions of records cause the tree to predict the same LOS class • Standard performance measures are not adequate • A new measure for accuracy robustness is needed
