
Support Vector Machine


nanji


Presentation Transcript


  1. Classification of multiple cancer types by multicategory support vector machines using gene expression data

  2. Support Vector Machine • A classification method that has been applied successfully to cancer diagnosis problems • Two types • Binary SVM: its optimal extension to more than two classes is not obvious, which limits its application to multiple tumor types • Multicategory SVM (MSVM, recently proposed): demonstrated on leukemia data and on small round blue cell tumors of childhood.

  3. DNA microarray technology • This method measures the relative amount of mRNA in isolated cells or biopsied tissues • Multiclass use of SVM usually solves a series of binary problems, e.g. the DAG SVM algorithm • MSVM is applied to two gene expression data sets

  4. Features • Effectiveness • Prediction strength • Effect of data preprocessing • Gene selection • Dimension reduction

  5. Binary SVM

  6. MSVM

  7. Procedure: 3-class problem • Gene expression was monitored to classify two leukemias, ALL (acute lymphoblastic leukemia) and AML (acute myeloid leukemia) • ALL is further divided into • B-cell • T-cell

  8. Procedure cont. • Number of genes: 7129 • Training set: 38 samples • Test set: 34 samples • Preprocessing steps performed • Thresholding (floor 100, ceiling 16000) • Filtering of genes (max/min <= 5 and max - min <= 500) • Base 10 logarithmic transformation
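The three preprocessing steps on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `preprocess` and its parameter names are hypothetical, the data layout (one list of expression values per gene) is assumed, and the filter is read literally as "drop a gene when both max/min <= 5 and max - min <= 500 hold". Combining the two conditions with "or" instead would drop more genes.

```python
import math

def preprocess(expr, floor=100.0, ceiling=16000.0, max_ratio=5.0, max_diff=500.0):
    """Hypothetical helper. expr: one list of expression values per gene.

    Returns (indices of kept genes, log10-transformed rows of kept genes).
    """
    kept, out = [], []
    for g, row in enumerate(expr):
        # Step 1: thresholding, clip every value into [floor, ceiling].
        clipped = [min(max(v, floor), ceiling) for v in row]
        hi, lo = max(clipped), min(clipped)
        # Step 2: filtering. Assumption: a gene is excluded when BOTH
        # conditions on the slide hold (literal "and" reading).
        if hi / lo <= max_ratio and hi - lo <= max_diff:
            continue
        # Step 3: base 10 logarithmic transformation.
        kept.append(g)
        out.append([math.log10(v) for v in clipped])
    return kept, out

# Toy data: gene 1 barely varies across samples, so it is filtered out.
toy = [[50, 20000, 300], [200, 210, 220], [100, 900, 400]]
kept, logged = preprocess(toy)
```

Clipping before filtering keeps the ratio max/min well defined, since no value can fall below the floor of 100.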

  9. Procedure cont. • Standardization of each variable • Variable selection • Prescreening measure: the ratio of the between-class sum of squares to the within-class sum of squares for each gene (genes with the largest ratios are kept)
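A minimal sketch of the standardization and prescreening steps follows. The function names are hypothetical, and two details are assumptions not stated on the slide: each gene is standardized across samples, and the population standard deviation is used.

```python
from statistics import mean, pstdev

def standardize(row):
    """Center one gene's values to mean 0 and scale to unit (population) std."""
    m, s = mean(row), pstdev(row)
    return [(v - m) / s for v in row] if s > 0 else [0.0 for _ in row]

def bss_wss_ratio(values, labels):
    """Between-class over within-class sum of squares for a single gene."""
    overall = mean(values)
    bss = wss = 0.0
    for k in set(labels):
        group = [v for v, y in zip(values, labels) if y == k]
        center = mean(group)
        bss += len(group) * (center - overall) ** 2
        wss += sum((v - center) ** 2 for v in group)
    return bss / wss if wss > 0 else float("inf")

def top_genes(expr, labels, n):
    """Indices of the n genes with the largest BSS/WSS ratio."""
    scores = [bss_wss_ratio(row, labels) for row in expr]
    order = sorted(range(len(expr)), key=lambda g: scores[g], reverse=True)
    return order[:n]

# Gene 0 separates the two classes cleanly; gene 1 does not.
labels = [0, 0, 1, 1]
expr = [[1, 2, 9, 10], [1, 9, 2, 10]]
best = top_genes(expr, labels, 1)
```

A large ratio means the class means are far apart relative to the spread within each class, which is why the genes with the largest ratios are the ones retained.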

  10. Heat Map of 40 most important genes in training set

  11. Small round blue cell tumors (SRBCTs) data • 4 types • Neuroblastoma (NB) • Rhabdomyosarcoma (RMS) • Non-Hodgkin lymphoma (NHL) • Ewing family of tumors (EWS)

  12. Used Artificial Neural Networks (ANN) • Training set: 63 samples • Test set: 20 samples • Nearest neighbor, weighted voting, and linear SVM were applied to the data • MSVM was applied for comparison • Base 10 logarithm of expression levels

  13. Predicted decision vectors

  14. SANN • For multiclass classification • Classification results superior to ANN • ANN uses the backpropagation algorithm • Why? • Nonlinear connections • Inclusion of interactions among the independent (input) variables • Independence from conventional processes

  15. Limitations • Learned knowledge is contained in hundreds to thousands of weights (synapses) • Cannot be analyzed as a single regression formula

  16. Combining several ANNs • Through ensembles of networks • An ensemble: a collection of a finite number of different classifiers • Cascading ANNs

  17. Two-level ANN • Task: chest radiograms • With lung nodules (Class A) • Without lung nodules (Class B)

  18. Two-level architecture carrying lower-level and higher-level concepts • Task: differentiate • Higher level: normal cells (class A) from malignant cells (class B) • Lower level: subclasses of class B • Class B_1 • Class B_2 • Class B_3 • Class B_4

  19. One vs. all • Used with SVM • K binary classifiers: each distinguishes one class from all the others lumped together • A sample is assigned to the class whose classifier achieves the greatest output activity
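The one-vs-all scheme can be sketched as below. To keep the example self-contained, the binary classifier is a simple centroid-difference linear scorer, which is a stand-in and not an SVM. All function names are hypothetical.

```python
def train_centroid_scorer(X, is_positive):
    """Stand-in binary classifier (NOT an SVM).

    Returns f(x) = w.x + b, where w points from the negative centroid to the
    positive centroid and b makes the midpoint between them score zero.
    """
    d = len(X[0])
    pos = [x for x, t in zip(X, is_positive) if t]
    neg = [x for x, t in zip(X, is_positive) if not t]
    mp = [sum(x[j] for x in pos) / len(pos) for j in range(d)]
    mn = [sum(x[j] for x in neg) / len(neg) for j in range(d)]
    w = [a - b for a, b in zip(mp, mn)]
    bias = -sum(wj * (a + b) / 2 for wj, a, b in zip(w, mp, mn))
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + bias

def train_one_vs_all(X, y):
    """One scorer per class: that class against all the others lumped together."""
    return {k: train_centroid_scorer(X, [t == k for t in y]) for k in sorted(set(y))}

def predict_one_vs_all(scorers, x):
    """Assign x to the class whose scorer gives the greatest output."""
    return max(scorers, key=lambda k: scorers[k](x))

# Toy 3-class data in two dimensions.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = [0, 0, 1, 1, 2, 2]
scorers = train_one_vs_all(X, y)
```

Calling `predict_one_vs_all(scorers, [5, 5.2])` evaluates all K scorers and returns the label of the largest one. Any binary learner that outputs a real-valued score could replace the stand-in.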

  20. All-pairs approach • Builds K(K-1)/2 binary classifiers • For each class, K-1 binary classifiers distinguish it from each of the other classes • Output activities are summed; the class with the greatest total activity is the winning class
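A small sketch of the all-pairs scheme, again with a centroid-difference scorer standing in for the binary SVM. Function names are hypothetical. One further assumption: a pairwise output s counts as activity +s for the first class of the pair and -s for the second.

```python
from itertools import combinations

def centroid(points):
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]

def train_pair(X, y, a, b):
    """Stand-in binary scorer (NOT an SVM): positive favors a, negative favors b."""
    ca = centroid([x for x, t in zip(X, y) if t == a])
    cb = centroid([x for x, t in zip(X, y) if t == b])
    w = [p - q for p, q in zip(ca, cb)]
    bias = -sum(wj * (p + q) / 2 for wj, p, q in zip(w, ca, cb))
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + bias

def train_all_pairs(X, y):
    """One scorer per unordered pair of classes: K(K-1)/2 in total."""
    classes = sorted(set(y))
    return {(a, b): train_pair(X, y, a, b) for a, b in combinations(classes, 2)}

def predict_all_pairs(pair_scorers, x):
    """Sum each class's activity over its K-1 pairwise scorers; largest total wins."""
    totals = {}
    for (a, b), f in pair_scorers.items():
        s = f(x)
        totals[a] = totals.get(a, 0.0) + s  # assumption: +s is activity for a
        totals[b] = totals.get(b, 0.0) - s  # assumption: -s is activity for b
    return max(totals, key=totals.get)

# Toy 3-class data: K = 3 gives 3 * 2 / 2 = 3 pairwise scorers.
X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = [0, 0, 1, 1, 2, 2]
pairs = train_all_pairs(X, y)
```

Compared with one vs. all, each scorer here is trained on only two classes, at the cost of building K(K-1)/2 scorers instead of K.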

  21. SANN • Oriented to human decision making • Exclusion is performed: the preferences are narrowed down • The classification made by the first ANN is a preselection for the second, successive ANN

  22. References • http://info.cchmc.org/presentations/ylee_13Dec02.pdf
