A HYBRID OF GENETIC ALGORITHMS AND SUPPORT VECTOR MACHINES (GASVM) FOR GENE SELECTION

vivian
Presentation Transcript


  1. A HYBRID OF GENETIC ALGORITHMS AND SUPPORT VECTOR MACHINES (GASVM) FOR GENE SELECTION

  2. A Flowchart of GASVM

  3. A Flowchart of GASVM • The overall hybrid method consists of two main components: • GA and • SVM classifier. • The GA selects subsets of features, and the SVM classifier then evaluates each subset during a classification process. • The classification result, accuracy(x), is used as the fitness value of the GA, • where accuracy(x) is the leave-one-out cross-validation (LOOCV) accuracy of the classifier built on the feature subset represented by x.
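The evaluation step on this slide can be sketched as a fitness function that scores a chromosome by LOOCV accuracy. This is a minimal illustration, not the authors' implementation. To stay dependency-free it swaps the SVM for a nearest-centroid classifier, which is an assumption made purely for the sketch. The names `loocv_fitness` and `nearest_centroid_predict` and the toy data are hypothetical.

```python
import math

def nearest_centroid_predict(train_X, train_y, sample):
    """Stand-in classifier (NOT an SVM): return the class with the nearest centroid."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = math.dist(centroid, sample)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_fitness(chromosome, X, y):
    """Fitness of a binary chromosome, assumed here to equal its LOOCV accuracy."""
    selected = [j for j, bit in enumerate(chromosome) if bit == 1]
    if not selected:
        return 0.0  # an empty gene subset cannot classify anything
    reduced = [[row[j] for j in selected] for row in X]
    correct = 0
    for i in range(len(reduced)):
        # Hold out sample i, train on the rest, and test on the held-out sample.
        train_X = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, reduced[i]) == y[i]:
            correct += 1
    return correct / len(reduced)

# Toy data (hypothetical): gene 0 separates the classes, gene 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [0.9, 4.0], [1.0, 2.0], [1.1, 0.5]]
y = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
fitness_gene0 = loocv_fitness([1, 0], X, y)
fitness_gene1 = loocv_fitness([0, 1], X, y)
```

In a full GA loop, this value would rank chromosomes for selection, crossover, and mutation. The stand-in classifier could be replaced by any SVM implementation that exposes fit and predict steps.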

  4. GASVM for Genes Selection and Classification

  5. Chromosome Representation in GASVM • Let n be the total number of genes available for representing the data to be classified. • Hence, the chromosome is represented by a binary vector of dimension n.

  6. Chromosome Representation in GASVM • A chromosome = a solution or a gene subset. • If a bit is 1, the corresponding gene is selected. If a bit is 0, the gene is not selected. • An example of chromosome representation in GASVM for gene selection.
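The binary encoding described on these two slides can be illustrated with a short sketch. The helper names `decode` and `random_chromosome` and the example bit pattern are hypothetical and chosen only for illustration.

```python
import random

def decode(chromosome):
    """Return the indices of the genes whose bit is 1 (the selected genes)."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]

def random_chromosome(n, rng):
    """Draw a random binary vector of dimension n, one bit per available gene."""
    return [rng.randint(0, 1) for _ in range(n)]

# A 10-gene example: bits set at positions 0, 3, 4 and 8.
chromosome = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
selected_genes = decode(chromosome)

# An initial GA population would be a list of such vectors.
population = [random_chromosome(10, random.Random(seed)) for seed in range(5)]
```

Each chromosome has exactly one bit per gene, so its length grows with the dimensionality of the dataset.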

  7. Investigation of GASVM Limitation • The investigation demonstrated that the number of possible gene subsets grows exponentially as the number of features (genes) increases -> NP-complete
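The exponential growth noted above follows directly from the binary encoding: each of the n genes is either in or out of a subset. The worked identity below is standard combinatorics rather than something taken from the slides.

```latex
\[
  \sum_{k=0}^{n} \binom{n}{k} = 2^{n},
  \qquad\text{so the number of non-empty gene subsets is } 2^{n} - 1 .
\]
```

For example, with n = 2000 genes (the size of the Colon dataset used later in this transcript), 2^n is roughly 10^602, so exhaustive enumeration is out of the question.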

  8. Drawback of GASVM • With high-dimensional data, the GASVM search space is too large, which leads to: • a complex search space • low accuracy • a high number of selected genes

  9. Proposed Solution • Relationship between the number of subsets y and the number of selected features x out of n total features (figure; x-axis marked at N/2 and N).
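The curve referred to on this slide can be reproduced numerically, assuming that y is the binomial coefficient C(n, x), that is, the number of distinct subsets containing exactly x of the n features. That reading is an assumption, since the figure itself is not part of the transcript. The function name `subsets_of_size` is hypothetical.

```python
from math import comb

def subsets_of_size(n, x):
    """Number of distinct subsets that contain exactly x of n features."""
    return comb(n, x)

n = 20  # small n so the numbers stay readable
counts = [subsets_of_size(n, x) for x in range(n + 1)]

# The count peaks when half of the features are selected.
peak_x = max(range(n + 1), key=lambda x: counts[x])

# Summing over every subset size recovers the full 2**n search space.
total = sum(counts)
```

Under this assumption, fixing x to a small value shrinks the search space dramatically: with n = 20 there are only 190 subsets of size 2, compared with 1,048,576 subsets overall.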

  10. Chromosome Representation in GASVM-II • An example of chromosome representation in GASVM-II for gene selection.

  11. A Flowchart of GASVM-II

  12. GASVM-II for Genes Selection and Classification

  13. Drawback of GASVM-II • GASVM-II • gene selection is done manually. • overfitting: high LOOCV accuracy but low test accuracy, giving inconsistent results

  14. Case Study: GASVM Versus GASVM-II for Gene Selection • Leukemia Dataset • The first benchmark gene expression microarray dataset is Leukemia Cancer. The data contains examples of human acute leukemia, originally analyzed by Golub et al. • The dataset, containing expression levels of 7129 genes, can be obtained at http://www.genome.wi.mit.edu/mpr. • The bone marrow or blood samples were taken from 72 patients, 25 with acute myeloid leukemia (AML) and 47 with acute lymphoblastic leukemia (ALL). • The training data consists of 38 samples, and the remaining 34 samples were used as testing data.

  15. Colon Dataset • The second benchmark dataset is Colon Cancer. The data contains expression levels of 2000 genes from 40 tumor and 22 normal colon tissues. • The dataset has only 62 samples, all used as training data. It was originally analyzed by Alon et al. and downloaded from http://microarray.princeton.edu/oncology/affydata/index.html.

  16. Experimental environment • Parameters of the GASVM and GASVM-II for the Leukemia and Colon Cancer datasets

  17. Results analysis and discussions • Classification accuracies for different gene subsets using GASVM-II method

  18. Results analysis and discussions • Benchmark of GASVM, GASVM-II and SVM performances and current best of previous methods on the Leukemia Cancer dataset

  19. Results analysis and discussions • Benchmark of GASVM, GASVM-II and SVM performances and current best of previous methods on the Colon Cancer dataset

  20. Biological plausibility for informative genes in datasets • List of the same informative genes in the Leukemia Cancer dataset produced by GASVM-II and previous works

  21. Biological plausibility for informative genes in datasets • List of the same informative genes in the Colon Cancer dataset produced by GASVM-II and previous works
