
Integration II

Integration II: Prediction. Kernel-based data integration: SVMs and the kernel “trick”; multiple-kernel learning; applications (protein function prediction, clinical prognosis). SVMs: these are expression measurements from two genes for two populations (cancer types).

raoul


Presentation Transcript


  1. Integration II Prediction

  2. Kernel-based data integration • SVMs and the kernel “trick” • Multiple-kernel learning • Applications • Protein function prediction • Clinical prognosis

  3. SVMs These are expression measurements from two genes for two populations (cancer types) The goal is to define a cancer type classifier... [Noble, Nat. Biotechnology, 2006]

  4. SVMs These are expression measurements from two genes for two populations (cancer types) The goal is to define a cancer type classifier... One type of classifier is a “hyper-plane” that separates measurements from two cancer types [Noble, Nat. Biotechnology, 2006]

  5. SVMs These are expression measurements from two genes for two populations (cancer types) The goal is to define a cancer type classifier... One type of classifier is a “hyper-plane” that separates measurements from two cancer types E.g.: a one-dimensional hyper-plane [Noble, Nat. Biotechnology, 2006]

  6. SVMs These are expression measurements from two genes for two populations (cancer types) The goal is to define a cancer type classifier... One type of classifier is a “hyper-plane” that separates measurements from two cancer types E.g.: a two-dimensional hyper-plane [Noble, Nat. Biotechnology, 2006]

  7. SVMs Suppose that measurements are separable: there exists a hyperplane that separates the two types. Then there are an infinite number of separating hyperplanes. Which to use? [Noble, Nat. Biotechnology, 2006]

  8. SVMs Suppose that measurements are separable: there exists a hyperplane that separates two types Then there are an infinite number of separating hyperplanes Which to use? The maximum-margin hyperplane Equivalently: minimizer of [Noble, Nat. Biotechnology, 2006]
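
The formula on this slide did not survive transcription. A standard way to write the maximum-margin (hard-margin) problem is sketched below; the symbols are assumptions, not taken from the slides: samples $x_i$, labels $y_i \in \{-1, +1\}$, weight vector $w$, and offset $b$.

```latex
\min_{w,\,b}\ \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^{\top} x_i + b) \ \ge\ 1, \qquad i = 1, \dots, n
```

Under this scaling the margin width is $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ is the same as maximizing the margin.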

  9. SVMs Which hyper-plane to use? In reality: the minimizer of a trade-off between 1. classification error (the loss term) and 2. margin size (the penalty term)
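
The objective on this slide is missing from the transcript. As a minimal sketch, assuming the usual soft-margin form with a hinge loss and a trade-off constant `C` (both assumptions, as are the toy data and the function name), the two terms can be computed like this:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C=1.0):
    """Return (penalty, loss, total) for a linear classifier sign(w.x + b).

    penalty = 0.5 * ||w||^2        (small penalty <=> large margin)
    loss    = C * sum_i max(0, 1 - y_i * (w.x_i + b))   (hinge loss)
    """
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    penalty = 0.5 * float(w @ w)
    loss = C * float(hinge.sum())
    return penalty, loss, penalty + loss

# Toy data (made up): the last sample lies on the wrong side of the hyperplane.
X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([1.0, 1.0])
b = 0.0

penalty, loss, total = soft_margin_objective(w, b, X, y, C=1.0)
print(penalty, loss, total)
```

Raising `C` weights classification error more heavily; lowering it favours a wider margin.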

  10. SVMs This is the primal problem This is the dual problem
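
Both formulas on this slide are missing from the transcript. A common textbook form of the soft-margin primal and its dual is given below as a sketch; the slack variables $\xi_i$, the constant $C$, and the dual variables $\alpha_i$ are assumed notation, not recovered from the slides.

```latex
% Primal
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i\,(w^{\top} x_i + b) \ \ge\ 1 - \xi_i, \quad \xi_i \ge 0

% Dual
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
 - \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K_{ij}
\quad \text{s.t.} \quad
0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0,
\qquad K_{ij} = x_i^{\top} x_j
```

In the dual, the samples appear only through the inner products $K_{ij}$.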

  11. SVMs What is K? The kernel matrix: each entry is the inner product of two samples. One interpretation: sample similarity. As far as the SVM is concerned, the measurements are completely described by K.
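
As a small illustration (the data and the function name are made up for this sketch), the linear kernel matrix of a sample-by-feature matrix `X` is simply `X @ X.T`:

```python
import numpy as np

def linear_kernel_matrix(X):
    """K[i, j] = inner product of sample i and sample j (rows of X)."""
    return X @ X.T

# Three samples, two measurements each (toy values).
X = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [0.0, 1.0]])
K = linear_kernel_matrix(X)
print(K)
```

`K` is symmetric with one row and column per sample, whatever the number of measurements per sample.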

  12. SVMs Implication: Non-linearity is obtained by appropriately defining kernel matrix K E.g. quadratic kernel:
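
The kernel formula on this slide is missing. The sketch below assumes the homogeneous quadratic kernel k(x, z) = (x·z)², which may differ from the variant shown in the original slide, and checks that it equals an ordinary inner product after an explicit quadratic feature map `phi` (a name introduced here for illustration):

```python
import numpy as np

def quadratic_kernel(x, z):
    """Homogeneous quadratic kernel: (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit feature map for 2-D inputs such that phi(x).phi(z) = (x.z)^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

k_implicit = quadratic_kernel(x, z)    # computed in the original 2-D space
k_explicit = float(phi(x) @ phi(z))    # computed in the 3-D feature space
print(k_implicit, k_explicit)
```

A hyperplane in the feature space of `phi` corresponds to a quadratic decision boundary in the original space, yet the kernel never has to build `phi(x)` explicitly.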

  13. SVMs Another implication: no need for measurement vectors; all that is required is similarity between samples. E.g. string kernels
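
The slides do not say which string kernel is meant. As one simple example, the k-mer spectrum kernel scores two sequences by the inner product of their k-mer counts; the sequences and function names below are made up for this sketch:

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s, t, k=2):
    """Inner product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[m] * ct[m] for m in cs.keys() & ct.keys())

# Toy protein-like sequences (not real data).
seqs = ["MKVLAAG", "MKVLSAG", "GGGGGGG"]
K = [[spectrum_kernel(a, b) for b in seqs] for a in seqs]
for row in K:
    print(row)
```

The resulting matrix can be used as a kernel matrix directly, without ever representing a sequence as a fixed-length measurement vector.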

  14. Protein Structure Prediction Protein structure Sequence similarity Protein sequence

  15. Protein Structure Prediction

  16. Kernel-based data fusion Core idea: use different kernels for different genomic data sources. A linear combination of kernel matrices is a kernel (under certain conditions)
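
The slide does not spell out the conditions. One sufficient condition is that every kernel matrix is positive semi-definite and every weight is non-negative. The sketch below checks this numerically on random stand-ins for two data sources; the data, weights, and function names are assumptions made for illustration:

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; weights must be non-negative."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    return sum(w * K for w, K in zip(weights, kernels))

def is_psd(K, tol=1e-9):
    """True if the symmetrized matrix has no eigenvalue below -tol."""
    return bool(np.all(np.linalg.eigvalsh((K + K.T) / 2.0) >= -tol))

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # source 1: 5 samples, 3 features
B = rng.normal(size=(5, 4))   # source 2: same 5 samples, 4 features
K1, K2 = A @ A.T, B @ B.T     # one linear kernel per source

K = combine_kernels([K1, K2], [0.7, 0.3])
print(is_psd(K1), is_psd(K2), is_psd(K))
```

Both sources must describe the same samples in the same order so that the matrices have matching rows and columns.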

  17. Kernel-based data fusion Kernel to use in prediction:
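
The formula on this slide is missing from the transcript. A plausible reconstruction, assuming $m$ data sources with kernel matrices $K_1, \dots, K_m$ and non-negative weights $\mu_i$ (notation introduced here, not taken from the slides), is:

```latex
K \;=\; \sum_{i=1}^{m} \mu_i\, K_i, \qquad \mu_i \ge 0

f(x) \;=\; \operatorname{sign}\!\Big( \sum_{j=1}^{n} \alpha_j\, y_j \sum_{i=1}^{m} \mu_i\, k_i(x_j, x) \;+\; b \Big)
```

Here $k_i$ is the kernel function behind $K_i$, and $\alpha_j$, $y_j$, and $b$ are the dual variables, labels, and offset of the SVM trained on the combined kernel.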

  18. Kernel-based data fusion In general, the task is to estimate the SVM function along with the coefficients of the kernel matrix combination. This is a well-studied type of optimization problem (a semi-definite program)

  19. Kernel-based data fusion

  20. Kernel-based data fusion

  21. Kernel-based data fusion Same idea applied to cancer classification from expression and proteomic data

  22. Kernel-based data fusion • Prostate cancer dataset • 55 samples • Expression from microarray • Copy number variants • Outcomes predicted: • Grade, stage, metastasis, recurrence

  23. Kernel-based data fusion
