
Predicting protein function from heterogeneous data


Presentation Transcript


  1. Predicting protein function from heterogeneous data Prof. William Stafford Noble GENOME 541 Intro to Computational Molecular Biology

  2. Outline • Bayesian networks • Support vector machines • Diffusion / message passing

  3. Annotation transfer • Rule: If two proteins are linked with high confidence, and one protein’s function is unknown, then transfer the annotation. [Figure: a protein of known function linked to a protein of unknown function]

  4. Guilt by association • Rule: Assign a function to an unannotated protein by a majority vote among its immediate neighbors. [Figure: an unannotated protein surrounded by annotated neighbors]
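A minimal Python sketch of the majority-vote rule; the adjacency list, annotation dictionary, and protein names below are hypothetical, chosen only to illustrate the idea.

from collections import Counter

def guilt_by_association(protein, neighbors, annotations):
    # Return the most common annotation among a protein's annotated neighbors.
    votes = Counter(annotations[n] for n in neighbors[protein] if n in annotations)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical toy network and annotations.
neighbors = {"yfg1": ["act1", "cdc28", "yfg2"], "yfg2": ["yfg1"]}
annotations = {"act1": "cytoskeleton", "cdc28": "cell cycle", "yfg2": "cell cycle"}
print(guilt_by_association("yfg1", neighbors, annotations))   # "cell cycle"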

  5. An example Bayesian network: Burglary and Earthquake are parents of Alarm, which in turn is a parent of both John calls and Mary calls. Conditional probability tables: P(B) = 0.001; P(E) = 0.002; P(A|B,E) = 0.95, P(A|B,¬E) = 0.94, P(A|¬B,E) = 0.29, P(A|¬B,¬E) = 0.001; P(J|A) = 0.90, P(J|¬A) = 0.05; P(M|A) = 0.70, P(M|¬A) = 0.01.
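This is the standard alarm-network example; a short Python sketch (the enumeration query is added here for illustration and is not on the slide) shows how the tables combine into a joint distribution and support queries.

from itertools import product

# Conditional probability tables from the slide (probabilities of "true").
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(John calls | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(Mary calls | Alarm)

def joint(b, e, a, j, m):
    # P(B, E, A, J, M) factorizes along the network's edges.
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Query by enumeration, e.g. P(Burglary | John calls, Mary calls).
num = sum(joint(True, e, a, True, True) for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True) for b, e, a in product([True, False], repeat=3))
print(num / den)   # roughly 0.28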

  6. One network per gene pair: for each pair of genes A and B, the network outputs the probability that A and B are functionally linked.

  7. Bayesian Network

  8. Conditional probability tables • A pair of yeast proteins that have a physical association will have a positive affinity precipitation result 75% of the time and a negative result in the remaining 25%. • Two proteins that do not physically interact in vivo will have a positive affinity precipitation result in 5% of the experiments, and a negative one in 95%.
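A quick Bayes-rule calculation shows how a single experiment updates belief in an interaction; the likelihoods come from the slide, while the 1% prior is a hypothetical value chosen only for illustration.

# Likelihoods from the slide; the prior on a physical interaction is hypothetical.
prior = 0.01
p_pos_given_interaction = 0.75      # positive affinity precipitation result | interaction
p_pos_given_no_interaction = 0.05   # positive result | no interaction

evidence = prior * p_pos_given_interaction + (1 - prior) * p_pos_given_no_interaction
posterior = prior * p_pos_given_interaction / evidence
print(posterior)   # about 0.13: one positive experiment raises a 1% prior to ~13%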

  9. Inputs • Protein-protein interaction data from GRID. • Transcription factor binding site data from SGD. • Stress-response microarray data set.

  10. ROC analysis Using Gene Ontology biological process annotation as the gold standard.
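In practice the ROC curve and its area can be computed with standard tools; a sketch assuming scikit-learn is available, with made-up labels (does the gene pair share a GO biological process term?) and made-up predicted link probabilities.

from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical gold standard and hypothetical predicted probabilities of a functional link.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.7, 0.8, 0.2, 0.6, 0.5, 0.1]

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points on the ROC curve
print(roc_auc_score(y_true, y_score))               # area under the curve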

  11. Pros and cons • Bayesian network framework is rigorous. • Exploits expert knowledge. • Does not (yet) learn from data. • Treats each gene pair independently.

  12. Supervised learning

  13. Support vector machine

  14. Support vector machine • Locate a plane that separates positive from negative examples. • Focus on the examples closest to the boundary. [Figure: positive (+) and negative (−) examples scattered in the plane]

  15. Four key concepts • Separating hyperplane • Maximum margin hyperplane • Soft margin • Kernel function (input space → feature space)

  16. Input space

              gene1   gene2
  patient1     -1.7     2.1
  patient2      0.3     0.5
  patient3     -0.4     1.9
  patient4     -1.3     0.2
  patient5      0.9    -1.2

  [Figure: the five patients plotted as points in the gene1/gene2 plane]

  17. Each subject may be thought of as a point in an m-dimensional space.

  18. Separating hyperplane • Construct a hyperplane separating ALL (acute lymphoblastic leukemia) from AML (acute myeloid leukemia) subjects.

  19. Choosing a hyperplane • For a given set of data, many possible separating hyperplanes exist.

  20. Maximum margin hyperplane • Choose the separating hyperplane that is farthest from any training example.

  21. Support vectors • The location of the hyperplane is specified via a weight associated with each training example. • Examples near the hyperplane receive non-zero weights and are called support vectors.
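Slides 16-21 can be tied together in a few lines of scikit-learn; the ±1 class labels attached to the five patients are hypothetical, since the table on slide 16 lists only expression values.

import numpy as np
from sklearn.svm import SVC

# Expression values from the toy table on slide 16 (columns: gene1, gene2).
X = np.array([[-1.7, 2.1],
              [ 0.3, 0.5],
              [-0.4, 1.9],
              [-1.3, 0.2],
              [ 0.9, -1.2]])
y = np.array([1, -1, 1, -1, -1])   # hypothetical class labels for illustration

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)        # the training examples with non-zero weight
print(clf.dual_coef_)              # their weights (y_i * alpha_i)
print(clf.coef_, clf.intercept_)   # the separating hyperplane w and bias b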

  22. Soft margin • When no separating hyperplane exists, the SVM uses a soft margin hyperplane with minimal cost. • A parameter C specifies the relative cost of a misclassification versus the size of the margin.
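A small sketch of the effect of C, assuming scikit-learn; the two overlapping Gaussian clouds are synthetic data, so no separating hyperplane exists and the soft margin must balance margin width against training errors.

import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian clouds drawn for illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1.5, size=(20, 2)), rng.normal(1, 1.5, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(C, clf.n_support_, clf.score(X, y))   # support vectors per class, training accuracy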

  23. Why a soft margin? • Incorrectly measured or labeled data • The separating hyperplane does not generalize well • No separating hyperplane exists

  24. Soft margin

  25. The kernel function • “The introduction of SVMs was very good for the most part, but I got confused when you began to talk about kernels.” • “I found the discussion of kernel functions to be slightly tough to follow.” • “I understood most of the lecture. The part that was more challenging was the kernel functions.” • “Still a little unclear on how the kernel is used in the SVM.”

  26. Why kernels?

  27. Separating previously unseparable data

  28. Input space to feature space • SVMs first map the data from the input space to a higher-dimensional feature space.

  29. Kernel function as dot product • Consider two training examples A = (a1, a2) and B = (b1, b2). • Define a mapping from input space to feature space: Φ(X) = (x1x1, x1x2, x2x1, x2x2) • Let K(X,Y) = (X • Y)² • Write Φ(A) • Φ(B) in terms of K.

  30. Kernel function as dot product • Consider two training examples A = (a1, a2) and B = (b1, b2). • Define a mapping from input space to feature space: Φ(X) = (x1x1, x1x2, x2x1, x2x2) • Let K(X,Y) = (X • Y)² • Write Φ(A) • Φ(B) in terms of K. • Φ(A) • Φ(B) = (a1a1, a1a2, a2a1, a2a2) • (b1b1, b1b2, b2b1, b2b2)

  31. Kernel function as dot product Φ(A) • Φ(B) = (a1a1, a1a2, a2a1, a2a2) • (b1b1, b1b2, b2b1, b2b2)

  32. Kernel function as dot product Φ(A) • Φ(B) = (a1a1, a1a2, a2a1, a2a2) • (b1b1, b1b2, b2b1, b2b2) = a1a1b1b1 + a1a2b1b2 + a2a1b2b1 + a2a2b2b2

  33. Kernel function as dot product Φ(A) • Φ(B) = (a1a1, a1a2, a2a1, a2a2) • (b1b1, b1b2, b2b1, b2b2) = a1a1b1b1 + a1a2b1b2 + a2a1b2b1 + a2a2b2b2 = a1b1a1b1 + a1b1a2b2 + a2b2a1b1 + a2b2a2b2

  34. Kernel function as dot product Φ(A) • Φ(B) = (a1a1, a1a2, a2a1, a2a2) • (b1b1, b1b2, b2b1, b2b2) = a1a1b1b1 + a1a2b1b2 + a2a1b2b1 + a2a2b2b2 = a1b1a1b1 + a1b1a2b2 + a2b2a1b1 + a2b2a2b2 = (a1b1 + a2b2)(a1b1 + a2b2)

  35. Kernel function as dot product Φ(A) • Φ(B) = (a1a1, a1a2, a2a1, a2a2) • (b1b1, b1b2, b2b1, b2b2) = a1a1b1b1 + a1a2b1b2 + a2a1b2b1 + a2a2b2b2 = a1b1a1b1 + a1b1a2b2 + a2b2a1b1 + a2b2a2b2 = (a1b1 + a2b2)(a1b1 + a2b2) = [(a1,a2) • (b1,b2)]²

  36. Kernel function as dot product Φ(A) • Φ(B) = (a1a1, a1a2, a2a1, a2a2) • (b1b1, b1b2, b2b1, b2b2) = a1a1b1b1 + a1a2b1b2 + a2a1b2b1 + a2a2b2b2 = a1b1a1b1 + a1b1a2b2 + a2b2a1b1 + a2b2a2b2 = (a1b1 + a2b2)(a1b1 + a2b2) = [(a1,a2) • (b1,b2)]² = (A • B)² = K(A, B)
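The identity on slides 29-36 is easy to verify numerically; a short sketch with two arbitrary example points.

import numpy as np

def phi(x):
    # Explicit feature map: all pairwise products (x1x1, x1x2, x2x1, x2x2).
    return np.array([x[0] * x[0], x[0] * x[1], x[1] * x[0], x[1] * x[1]])

def k(x, y):
    # The same quantity computed entirely in the input space.
    return np.dot(x, y) ** 2

A = np.array([1.0, 2.0])    # arbitrary example points
B = np.array([3.0, -1.0])

print(np.dot(phi(A), phi(B)))   # dot product in the 4-D feature space
print(k(A, B))                  # identical value, without ever forming phi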

  37. Separating in 2D with a 4D kernel

  38. “Kernelizing” Euclidean distance
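Expanding the squared norm gives ||Φ(x) − Φ(y)||² = Φ(x) • Φ(x) − 2 Φ(x) • Φ(y) + Φ(y) • Φ(y) = K(x,x) − 2 K(x,y) + K(y,y), so every term can be replaced by a kernel evaluation and the feature-space distance never requires forming Φ explicitly. A minimal sketch using the quadratic kernel from the previous slides:

import numpy as np

def k(x, y):
    return np.dot(x, y) ** 2   # quadratic kernel from the previous slides

def feature_space_distance_sq(x, y):
    # Squared Euclidean distance between phi(x) and phi(y), computed only via k.
    return k(x, x) - 2 * k(x, y) + k(y, y)

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(feature_space_distance_sq(x, y))   # 25 - 2 + 100 = 123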

  39. Kernel function • The kernel function plays the role of the dot product operation in the feature space. • The mapping from input to feature space is implicit. • Using a kernel function avoids representing the feature space vectors explicitly. • Any continuous, positive semi-definite function can act as a kernel function. Student question: why must the kernel function be positive semidefinite? For a proof of Mercer’s Theorem, see Cristianini and Shawe-Taylor, An Introduction to Support Vector Machines, 2000, pp. 33-35.
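One way to see what positive semidefiniteness buys: for any finite set of points, the kernel (Gram) matrix Kij = K(xi, xj) has no negative eigenvalues, which is exactly the property that lets it behave like a dot product in some feature space. A quick numerical sanity check on random points (not a proof) with the quadratic kernel:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))                  # ten random 2-D points
K = (X @ X.T) ** 2                            # Gram matrix of the quadratic kernel
print(np.linalg.eigvalsh(K).min() >= -1e-9)   # True: no (numerically) negative eigenvalues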

  40. Overfitting with a Gaussian kernel
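A sketch of the overfitting effect, assuming scikit-learn; the overlapping clouds are synthetic and the gamma values are arbitrary. With a very large gamma, the Gaussian kernel places a narrow bump around each training point, so training accuracy typically approaches 100% while accuracy on fresh points from the same distribution drops.

import numpy as np
from sklearn.svm import SVC

# Synthetic overlapping classes; train and test sets drawn from the same distribution.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1.5, size=(40, 2)), rng.normal(1, 1.5, size=(40, 2))])
X_test = np.vstack([rng.normal(-1, 1.5, size=(40, 2)), rng.normal(1, 1.5, size=(40, 2))])
y = np.array([-1] * 40 + [1] * 40)

for gamma in (0.1, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(gamma, clf.score(X, y), clf.score(X_test, y))   # training vs. held-out accuracy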

  41. The SVM learning problem • Input: training vectors x1, …, xn and labels y1, …, yn. • Output: a bias b plus one weight wi per training example. • The weights specify the location of the separating hyperplane. • The optimization problem is a convex, quadratic optimization problem. • It can be solved using standard packages such as MATLAB.

  42. SVM prediction architecture [Figure: the query x is compared to each training example x1, x2, x3, …, xn via the kernel function k; the resulting kernel values are combined in a weighted sum with weights w1, w2, w3, …, wn]
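The architecture on this slide reduces to a weighted sum of kernel values plus a bias; a sketch in which the training points, weights, and bias are placeholders rather than learned values.

import numpy as np

def predict(x_query, X_train, weights, b, kernel):
    # Weighted sum of kernel values against every training example, plus the bias.
    scores = np.array([kernel(x_i, x_query) for x_i in X_train])
    return np.dot(weights, scores) + b

# Placeholder training examples, weights, and bias, for illustration only.
X_train = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]])
weights = np.array([0.4, -0.7, 0.0])          # a zero weight means "not a support vector"
b = 0.1
kernel = lambda u, v: np.dot(u, v) ** 2       # quadratic kernel as an example

print(predict(np.array([1.0, 1.0]), X_train, weights, b, kernel))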

  43. A simple SVM training algorithm • Jaakkola, Diekhans, Haussler. “A discriminative framework for detecting remote protein homologies.” ISMB 99.

  do
    randomly select a training example
    find its optimal weight w.r.t. all other (fixed) weights
  until the weights stop changing
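A sketch of the "pick an example, optimize its weight, repeat" loop, applied here to the standard SVM dual with the bias term dropped for simplicity; it illustrates the idea on the slide rather than reproducing the exact update of Jaakkola et al.

import numpy as np

def train_svm_dual(X, y, kernel, C=1.0, n_iter=2000, seed=0):
    # Coordinate ascent on the SVM dual (bias term dropped): repeatedly pick a
    # random example and set its weight to the optimum with all other weights
    # held fixed, clipped to the box [0, C].
    n = len(y)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    alpha = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(n_iter):
        i = rng.integers(n)
        g_i = np.dot(alpha * y, K[i]) - alpha[i] * y[i] * K[i, i]   # influence of the other examples
        alpha[i] = np.clip((1.0 - y[i] * g_i) / K[i, i], 0.0, C)    # single-coordinate optimum
    return alpha

# Tiny illustrative run with a linear kernel on four points.
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
print(train_svm_dual(X, y, kernel=lambda u, v: np.dot(u, v), C=10.0))   # non-zero entries are support vectors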
