Automatic Transformation of Raw Clinical Data into Clean Data Using Decision Tree Learning


  1. Automatic Transformation of Raw Clinical Data into Clean Data Using Decision Tree Learning • Jian Zhang • Supervised by: Karen Petrie

  2. Background • Cancer research has become an extremely data-rich environment. • Many analysis packages are available for analyzing the data. • The data must first be preprocessed.

  3. Rich data environment • Breast cancer data involve several factors

  4. Raw clinical data sample • Yes-No data: yes: yes, Yes, Ye, yed, yef, …; no: No, n, not, …; null: don’t know, no data, waiting for lab • Positive-Negative data: positive: +, ++, p, p++, …; negative: -, n, neg, n---, …; null: no data, ruined sample, waiting for lab
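
The variants above suggest why a hand-built mapping is brittle. As a minimal sketch (the table names `YES_NO` and `POS_NEG` and the `normalize` helper are hypothetical, not taken from the slides), a lookup keyed on lower-cased entries handles every listed variant but returns nothing for a spelling that is not already in the table. Note also that "n" means "no" in a yes/no column but "negative" in a positive/negative column, so each column type needs its own table.

```python
from __future__ import annotations

# Hypothetical lookup tables built only from the variants listed on the slide.
# Keys are lower-case, so "Yes" and "yes" share one entry.
YES_NO = {
    "yes": "yes", "ye": "yes", "yed": "yes", "yef": "yes",
    "no": "no", "n": "no", "not": "no",
    "don't know": "null", "no data": "null", "waiting for lab": "null",
}

POS_NEG = {
    "+": "positive", "++": "positive", "p": "positive", "p++": "positive",
    "-": "negative", "n": "negative", "neg": "negative", "n---": "negative",
    "no data": "null", "ruined sample": "null", "waiting for lab": "null",
}


def normalize(raw: str, table: dict[str, str]) -> str | None:
    """Return the clean value for a raw entry, or None if the variant is not in the table."""
    key = raw.strip().lower().replace("\u2019", "'")  # treat a curly apostrophe as a straight one
    return table.get(key)


print(normalize("Ye", YES_NO))    # yes
print(normalize("n", YES_NO))     # no
print(normalize("n", POS_NEG))    # negative
print(normalize("yess", YES_NO))  # None: an unseen variant still needs manual handling
```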

  5. Basic version

  6. Question: Could the process be automated?

  7. Introduction • Decision tree learning • Weka

  8. Decision Tree Learning • Decision tree learning is a method for approximating discrete-valued target functions and is one of the most popular inductive inference algorithms.
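
A minimal ID3-style sketch of the idea in Python follows. It is an illustration under stated assumptions, not the presenter's setup: the experiments used Weka, whose J48 learner, for example, implements C4.5 and ranks splits by gain ratio rather than the plain information gain used here. The slides also do not say how raw entries were encoded, so the character-level `features` function is a hypothetical choice. The training pairs are the yes/no variants from the raw data sample slide.

```python
import math
from collections import Counter


def features(raw):
    """Hypothetical character-level features of a raw entry (not from the slides)."""
    s = raw.strip().lower()
    return {
        "first": s[:1],         # first character, e.g. "y", "n", "+"
        "has_plus": "+" in s,
        "has_minus": "-" in s,
        "has_space": " " in s,  # multi-word entries such as "no data"
    }


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy from splitting the rows on attribute `attr` (ID3's criterion)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Grow an ID3-style tree. Leaves are labels; internal nodes are (attr, branches, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    gains = {a: information_gain(rows, labels, a) for a in attrs}
    best = max(attrs, key=gains.get)
    if gains[best] <= 0:
        return majority
    branches = {}
    for value in {row[best] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in keep],
                                     [labels[i] for i in keep],
                                     [a for a in attrs if a != best])
    return (best, branches, majority)


def predict(tree, row):
    """Walk the tree; a value never seen in training falls back to that node's majority label."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree


# Yes-No variants from the raw data sample slide, paired with their clean labels.
TRAIN = [("yes", "yes"), ("Ye", "yes"), ("yed", "yes"),
         ("No", "no"), ("n", "no"), ("not", "no"),
         ("no data", "null"), ("waiting for lab", "null")]
rows = [features(raw) for raw, _ in TRAIN]
labels = [label for _, label in TRAIN]
tree = build_tree(rows, labels, list(rows[0]))

print(predict(tree, features("yef")))         # yes
print(predict(tree, features("No data")))     # null
print(predict(tree, features("don't know")))  # yes (wrong: "d" never occurs at the root)
```

The last call is a toy version of the difficulty with unknown entries: no training entry starts with "d", so `predict` can only return the root's majority label (here a 3-3 tie between "yes" and "no", broken by order of first appearance).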

  9. Decision tree sample

  10. Weka • Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software, written in Java, that contains a collection of algorithms for data analysis and predictive modeling.
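
Weka reads data in its ARFF text format. The following is a minimal sketch of turning (raw entry, clean label) pairs into ARFF, assuming the raw entry is stored as a `string` attribute and the clean value as a nominal class; the relation name `yes_no` and the `to_arff` helper are hypothetical. Weka's tree learners such as J48 do not accept string attributes directly, so in practice the raw text would first be converted, for example with the StringToWordVector filter or with hand-built features.

```python
def to_arff(relation, pairs, classes):
    """Render (raw_entry, clean_label) pairs as Weka ARFF text.

    The raw entry becomes a string attribute; the clean label becomes a
    nominal class attribute whose allowed values are listed in `classes`.
    """
    def quote(value):
        # Single-quote every raw entry; backslash-escape backslashes and quotes.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    lines = [
        f"@relation {relation}",
        "@attribute raw string",
        "@attribute class {" + ",".join(classes) + "}",
        "@data",
    ]
    lines += [f"{quote(raw)},{label}" for raw, label in pairs]
    return "\n".join(lines) + "\n"


print(to_arff("yes_no",
              [("Ye", "yes"), ("n", "no"), ("don't know", "null")],
              ["yes", "no", "null"]))
```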

  11. Experiment • Data: a training dataset of 100 instances and a test dataset of 100 instances, 17 of whose values do not appear in the training dataset • Tool: Weka

  12. Experiment • Experiment 1: training dataset • Experiment 2: training dataset and test dataset
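
Experiment 2 mixes test entries that also occur in the training data with the 17 values that occur only in the test set, so a single overall accuracy can hide the difference between the two groups. A minimal sketch of reporting them separately, assuming a hypothetical `predict` callable and exact string matching to decide what counts as "seen":

```python
def accuracy_by_novelty(predict, train_raw, test_pairs):
    """Accuracy on test entries that occur verbatim in the training data vs. those that do not.

    predict: hypothetical callable mapping a raw entry to a clean label.
    Returns {"seen": accuracy, "unseen": accuracy}; an empty bucket gives None.
    """
    seen_values = set(train_raw)
    counts = {"seen": [0, 0], "unseen": [0, 0]}  # bucket -> [correct, total]
    for raw, label in test_pairs:
        bucket = "seen" if raw in seen_values else "unseen"
        counts[bucket][0] += int(predict(raw) == label)
        counts[bucket][1] += 1
    return {b: (c / t if t else None) for b, (c, t) in counts.items()}


# Toy predictor for illustration only: exact lookup, defaulting to "null".
lookup = {"yes": "yes", "no": "no"}
result = accuracy_by_novelty(lambda raw: lookup.get(raw, "null"),
                             train_raw=["yes", "no"],
                             test_pairs=[("yes", "yes"), ("no", "no"),
                                         ("Ye", "yes"), ("waiting for lab", "null")])
print(result)  # {'seen': 1.0, 'unseen': 0.5}
```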

  13. Experiment 1

  14. Experiment 2

  15. Result • The decision tree classifies and predicts entries that appear in the training data well, but its predictions for unknown entries are not as good as expected.

  16. Future work • Find and correct incorrect predictions in the process • Automate the transformation of unknown entries

  17. Thank you!