
Prognostic Prediction of Breast Cancer Using C5

Prognostic Prediction of Breast Cancer Using C5. Sakina Begum May 1, 2001. Breast Cancer Diagnosis Second leading cause of cancer death in women. Fine Needle Aspirate (FNA) extract cells and fluid from mass using thin needle examine cells under microscope


Presentation Transcript


  1. Prognostic Prediction of Breast Cancer Using C5 Sakina Begum May 1, 2001

  2. Breast Cancer Diagnosis • Second leading cause of cancer death in women. • Fine Needle Aspirate (FNA): extract cells and fluid from the mass using a thin needle, then examine the cells under a microscope. • Early detection of breast cancer depends on accurate diagnosis.

  3. Ability to correctly diagnose cancer using FNA and visual interpretation varies from 65% to 98%.

  4. University of Wisconsin hospitals use Xcyt. Xcyt uses information about cell characteristics from the FNA and the multisurface method to determine whether a tumor is benign or malignant. I wanted to do the same thing using C5.

  5. Data Preparation • File has 569 patients, 32 attributes for each patient • ID • diagnosis • 10 average cell characteristics • standard deviation of each of the 10 cell characteristics • “worst” value of each of the 10 cell characteristics • Two files: • All 32 attributes • 12 attributes (ID, diagnosis, and the 10 average cell characteristics)

  6. sed and awk are programmable UNIX utilities that perform actions on lines that match a particular condition. • Command: awk -f awkfile -F, data1 > data2 • Contents of awkfile: {print($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)} • Sample line of data2: 842302 M 17.99 10.38 122.8 1001 0.1184 0.2776 0.3001 0.1471 0.2419 0.07871 • Command: sed 's/ /,/g' data2 > cancer.data • Sample line of cancer.data: 842302,M,17.99,10.38,122.8,1001,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871
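The awk and sed steps on slide 6 could probably be collapsed into one awk call by setting the output field separator to a comma. This is a minimal sketch, assuming the input is comma-separated as the -F, flag implies; the function name first12 and the lettered sample record are hypothetical and are not patient data.

```shell
# Keep the first 12 comma-separated fields of each input line and write
# them back out comma-separated, in a single awk pass.
# Intended use (file names taken from the slide): first12 < data1 > cancer.data
first12() {
    awk -F, -v OFS=, '{ print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12 }'
}

# Demonstration on a made-up 14-field record; only fields 1 to 12 survive.
printf 'a,b,c,d,e,f,g,h,i,j,k,l,m,n\n' | first12
```

Setting OFS avoids the intermediate space-separated file, and it also avoids sed rewriting any space that happens to sit inside a field.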

  7. Data Mining C5 extracts informative patterns from data. Options: • -f identifies the application name (called a filestem). • -r causes rules to be derived from trees. • -S x constructs a classifier from a random sample of x% of the cases in the data file; the classifier is then evaluated on a non-overlapping set of test cases.
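To make the idea of a non-overlapping training and test set concrete, the split can be imitated outside C5 with awk. This is only an illustrative sketch: it does not reproduce C5's internal sampling, it yields roughly (not exactly) the requested percentage, and the function name split_sample and its arguments are hypothetical.

```shell
# split_sample PCT SEED INPUT TRAIN TEST
# Sends each line of INPUT to TRAIN with probability PCT/100 and to TEST
# otherwise, so no case can appear in both output files.
split_sample() {
    : > "$4"    # create or empty the training file
    : > "$5"    # create or empty the test file
    awk -v pct="$1" -v seed="$2" -v trainf="$4" -v testf="$5" '
        BEGIN { srand(seed) }    # fixed seed makes the split repeatable
        {
            if (rand() * 100 < pct) print > trainf
            else                    print > testf
        }
    ' "$3"
}

# Example (not run here): split_sample 30 1 cancer.data train.data test.data
```

Because every line goes to exactly one of the two files, the line counts of the outputs always add up to the line count of the input.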

  8. By default, the random sample changes every time a classifier is constructed, so successive runs of C5 with sampling will usually produce different results. I used sample sizes of 10%, 30%, 50%, 70%, and 90%, and ran C5 three times at each sample size.
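The experiment grid described on slide 8 (five sample sizes, three runs each) can be written as a small loop. The sketch below only prints the commands rather than running them. The -f, -r, and -S options come from slide 7; the executable name c5.0 and the filestem cancer (inferred from the file name cancer.data) are assumptions, and list_runs is a hypothetical helper name.

```shell
# Print one C5 command line per experiment: 5 sample sizes x 3 runs = 15.
# Nothing is executed here; pipe the output to sh to actually run it.
list_runs() {
    for pct in 10 30 50 70 90; do
        for run in 1 2 3; do
            # "c5.0" and the filestem "cancer" are assumed names
            echo "c5.0 -f cancer -r -S $pct"
        done
    done
}

list_runs
```

The three commands for each sample size are identical on purpose: since the random sample changes on every run, repeating the same command is what produces the different results.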

  9. [Figure: decision tree. Internal nodes test concave points, area, perimeter, texture, symmetry, and compactness; the split thresholds shown are 0.049, 693.7, 102.8, 19.73, 102.1, 15.45, 0.211, 0.085, and 0.123; leaves are labeled M (malignant) or B (benign).]

  10. Each rule consists of: • an arbitrary rule number • statistics • one or more conditions that must be satisfied • the class predicted by the rule • the confidence with which the prediction is made • Statistics: • n/m: the number of training cases covered by the rule / the number of those cases that do not belong to the class predicted by the rule • lift: the rule's estimated accuracy divided by the relative frequency of the predicted class.
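A worked example may make the lift statistic easier to read. The sketch below assumes the rule's accuracy is estimated with the Laplace ratio (n - m + 1) / (n + 2), which is how I understand the C5 documentation to define it; treat that formula as an assumption. The input numbers are invented for illustration and are not results from this data set, and rule_lift is a hypothetical helper name.

```shell
# rule_lift N M CLASS_FREQ
#   N          = training cases covered by the rule
#   M          = covered cases that do not belong to the predicted class
#   CLASS_FREQ = relative frequency of the predicted class (0 to 1)
# Accuracy is estimated with the Laplace ratio (assumed, see lead-in).
rule_lift() {
    awk -v n="$1" -v m="$2" -v f="$3" 'BEGIN {
        acc = (n - m + 1) / (n + 2)
        printf "accuracy=%.3f lift=%.3f\n", acc, acc / f
    }'
}

# Invented example: a rule covering 50 cases with 1 exception, predicting
# a class that makes up 40% of the training cases.
rule_lift 50 1 0.4
```

A lift above 1 means the rule predicts its class more accurately than always guessing that class would.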

  11. Conclusion • The decision tree averages a 6% to 7% error rate. • The classifier may be overtrained. • Better results may be possible by selecting only a few cell features. • The developers of Xcyt obtained their best results using three features: worst area, worst smoothness, and average texture.

  12. Lessons Learned • Became familiar with C5. • Learned the importance of domain knowledge. • Further work: • Build classifiers using different subsets of features. • Use the adaptive boosting option.
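For the feature-subset idea under further work, the same awk approach used for data preparation could pull out any chosen columns. This is a sketch only: select_columns is a hypothetical helper, and the column numbers in the usage comment (2 for diagnosis, 4 for average texture, 26 for worst area, 27 for worst smoothness) assume a particular column order in the 32-attribute file that should be checked against the data before use.

```shell
# select_columns COLS
# Reads comma-separated lines on stdin and prints only the columns listed
# in COLS (a comma-separated list of 1-based column numbers), in that order.
select_columns() {
    awk -F, -v OFS=, -v cols="$1" '
        BEGIN { n = split(cols, c, ",") }
        {
            line = $(c[1])
            for (i = 2; i <= n; i++) line = line OFS $(c[i])
            print line
        }'
}

# Example (not run here; file names and column numbers are assumptions):
#   select_columns "2,4,26,27" < all32.data > subset.data
```

Unlike cut -d, -f, this version keeps the columns in the order they are requested, which is convenient if the class attribute needs to come first or last.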

  13. References • W. N. Street, O. L. Mangasarian, W. H. Wolberg. An Inductive Learning Approach to Prognostic Prediction. • O. L. Mangasarian, W. N. Street, W. H. Wolberg. Breast Cancer Diagnosis and Prognosis via Linear Programming. • Machine Learning for Cancer Diagnosis and Prognosis: http://www.cs.wisc.edu/~olvi/uwmp/cancer.html
