## Predicting protein function from heterogeneous data


Prof. William Stafford Noble
GENOME 541: Introduction to Computational Molecular Biology

**One-minute response**

- Wasn't sure what the relevance was of the dot product in the feature space.
- I think more examples would be helpful. Anything visual is helpful.
- Confused about how exactly to use the kernel to determine the separating plane.
- More time on the last three slides.
- Confused about how the weights in the kernel function will be used in the final prediction.
- Please include a toy example with numbers for the Bayesian network.
- Also, a biologically relevant motivating example for the SVM.
- This was the first time I understood the kernel thing.
- I am not sure when to use the SVM versus other clustering approaches.
- For the kernel trick, the main thing that is missing is the motivation for not enumerating the entire feature space.
- The kernel discussion was hard to follow. More math background would have helped.
- I was good with everything up to the "Kernel function as dot product" slide. I'm not sure what the purpose of phi is.
- I liked the concrete examples.
- I got a good feel for the big picture but failed to fully grasp everything about the kernel section.
- I was distracted by jargon in parts of the lecture. Better to introduce the term "kernel" when it is first used.
- Still a bit shaky on the weights and the SVM optimization.
- What are some examples of other common methods that use kernels?
- Draw bigger on the board.
- Hope you go into why we choose certain kernels over others.

**Outline**

- Support vector machines
- Diffusion / message passing

**Kernel function**

- The kernel function plays the role of the dot product operation in the feature space.
- The mapping from input space to feature space is implicit.
- Using a kernel function avoids representing the feature-space vectors explicitly.
- Any continuous, positive semidefinite function can act as a kernel function.

Proof of Mercer's theorem: Cristianini and Shawe-Taylor, An Introduction to Support Vector Machines, 2000, pp. 33-35.

**Learning gene classes**

(Figure: training/testing workflow.) A training set of 2,465 genes × 79 expression experiments (Eisen et al.), labeled with functional classes from MYGD, is passed to the learner, which produces a model. A test set of 3,500 genes × 79 experiments is then passed to the predictor, which assigns a class to each gene.

**Predictions of gene function**

Fleischer et al., "Systematic identification and functional screens of uncharacterized proteins associated with eukaryotic ribosomal complexes," Genes Dev, 2006.

**Overview**

- 218 human tumor samples spanning 14 common tumor types.
- 90 normal samples.
- 16,063 "genes" measured per sample.
- Overall SVM classification accuracy: 78%.
- Random classification accuracy: 1/14 ≈ 7%.

**Summary: Support vector machine learning**

- The SVM learning algorithm finds a linear decision boundary.
- The hyperplane maximizes the margin, i.e., the distance to the nearest training example.
- The optimization is convex; the solution is sparse.
- A soft margin allows for noise in the training set.
- A complex decision surface can be learned by using a non-linear kernel function.

**Costs/benefits of SVMs**

- SVMs perform well in high-dimensional data sets with few examples.
- Convex optimization implies that you get the same answer every time.
- Kernel functions allow encoding of prior knowledge.
- Kernel functions handle arbitrary data types.
- The hyperplane does not provide a good explanation, especially with a non-linear kernel function.

**Vector representation**

- Each matrix entry is an mRNA expression measurement.
- Each column corresponds to an experiment.
- Each row corresponds to a gene.

**Similarity measurement**

- Normalized scalar product: K(X, Y) = (X · Y) / (‖X‖ ‖Y‖); see the sketch below.
- Similar vectors receive high values; dissimilar vectors receive low values.

(Figure: a similar pair and a dissimilar pair of expression profiles.)
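Several one-minute responses asked about the purpose of phi and why the feature space is never enumerated, so here is a minimal Python sketch using made-up toy vectors (not course data). It computes the normalized scalar product defined above, then demonstrates the kernel trick for a quadratic kernel: mapping explicitly into feature space with phi and taking the dot product there yields the same value as applying the kernel function directly to the inputs.

```python
import numpy as np

def cosine_kernel(x, y):
    """Normalized scalar product: similar vectors score near 1."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def phi(x):
    """Explicit quadratic feature map for a 2-D input:
    phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def quadratic_kernel(x, y):
    """Kernel trick: (x . y)^2 equals phi(x) . phi(y), computed
    without ever constructing the feature-space vectors."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])
print(cosine_kernel(x, y))      # similarity in the input space
print(np.dot(phi(x), phi(y)))   # dot product in the feature space: 16.0
print(quadratic_kernel(x, y))   # same value via the kernel: 16.0
```

For higher-degree polynomial or Gaussian kernels, the explicit feature space becomes enormous or infinite-dimensional; that is the motivation for not enumerating it.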
**Sequence kernels**

- We cannot compute a scalar product on a pair of variable-length, discrete strings:

    >ICYA_MANSE
    GDIFYPGYCPDVKPVNDFDLSAFAGAWHEIAKLPLENENQGKCTIAEYKY
    DGKKASVYNSFVSNGVKEYMEGDLEIAPDAKYTKQGKYVMTFKFGQRVVN
    LVPWVLATDYKNYAINYNCDYHPDKKAHSIHAWILSKSKVLEGNTKEVVD
    NVLKTFSHLIDASKFISNDFSEAACQYSTTYSLTGPDRH
    >LACB_BOVIN
    MKCLLLALALTCGAQALIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDA
    QSAPLRVYVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTKIPAVFKI
    DALNENKVLVLDTDYKKYLLFCMENSAEPEQSLACQCLVRTPEVDDEALE
    KFDKALKALPMHIRLSFNPTQLEEQCHI

**Protein-protein interactions**

- Pairwise interactions can be represented as a graph or as a binary matrix, with one row and one column per protein and a 1 wherever two proteins interact.

**Linear interaction kernel**

- The simplest interaction kernel is the scalar product of two proteins' rows in the binary interaction matrix, which counts the interaction partners the pair has in common (3 for the example pair on the slide).

**Diffusion kernel**

- A general method for establishing similarities between the nodes of a graph.
- Based upon a random walk.
- Efficiently accounts for all paths connecting two nodes, weighted by path length.

**Hydrophobicity profile**

- Transmembrane regions are typically hydrophobic; non-transmembrane regions typically are not.
- The hydrophobicity profile of a membrane protein is evolutionarily conserved.

(Figure: hydrophobicity profiles of a membrane protein and a non-membrane protein.)

**Hydrophobicity kernel**

- Generate a hydropathy profile from the amino acid sequence using the Kyte-Doolittle index.
- Prefilter the profiles.
- Compare two profiles by computing the fast Fourier transform (FFT) and then applying a Gaussian kernel function.
- This kernel detects periodicities in the hydrophobicity profile.

**Combining kernels**

(Figure.) Concatenating two representations A and B of the same protein (A:B) is equivalent to summing the corresponding kernel matrices: K(A:B) = K(A) + K(B).

**Semidefinite programming**

- Define a convex cost function to assess the quality of a kernel matrix.
- Semidefinite programming (SDP) optimizes convex cost functions over the convex cone of positive semidefinite matrices.
- Learn K from the convex cone of positive semidefinite matrices, or a convex subset of it, according to a convex quality measure.

**Integrate constructed kernels**

- Learn a linear mixture of the constructed kernels and train a large-margin classifier (SVM); maximizing the margin is the convex quality measure optimized by the SDP.

**Markov random field**

- A general Bayesian method, applied by Deng et al. to yeast functional classification.
- Used five different types of data.
- For their model, the input data must be binary.
- Reported improved accuracy compared to using any single data type.

**Six types of data**

- Presence of Pfam domains.
- Genetic interactions from CYGD.
- Physical interactions from CYGD.
- Protein-protein interaction by TAP.
- mRNA expression profiles.
- (Smith-Waterman scores.)

**Results**

(Figure: performance comparison of the MRF, SDP/SVM with binary data, and SDP/SVM with enriched data.)

**Pros and cons**

- Learns the relevance of each data set with respect to the problem at hand.
- Accounts for redundancy among data sets, as well as noise and relevance.
- The discriminative approach yields good performance.
- Kernel-by-kernel weighting is simplistic.
- In most cases, an unweighted combination of kernels works fine.
- Does not provide a good explanation.

**GeneMANIA**

- Normalize each network: divide each element by the square root of the product of its row sum and column sum.
- Learn a weight for each network via ridge regression. Essentially, learn how informative the network is with respect to the task at hand.
- Sum the weighted networks.
- Assign labels to the nodes, using (n+ + n-)/n for unlabeled genes.
- Perform label propagation in the combined network (sketched below).

Mostafavi et al., Genome Biology, 9:S4, 2008.
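A minimal Python sketch of the normalization and propagation steps just listed, assuming each network is a symmetric, non-negative adjacency matrix with no all-zero rows. The two network weights are hard-coded for illustration only; GeneMANIA learns them by ridge regression, as described above.

```python
import numpy as np

def normalize_network(W):
    """GeneMANIA-style normalization: divide each element by the
    square root of the product of its row sum and column sum."""
    r = W.sum(axis=1)  # row sums
    c = W.sum(axis=0)  # column sums
    return W / np.sqrt(np.outer(r, c))

def propagate(W, y, alpha=0.95, n_rounds=100):
    """Label propagation: each round, a node's score becomes its initial
    label plus alpha times the weighted sum of its neighbors' scores.
    Converges for alpha < 1 because the normalization above bounds the
    largest eigenvalue of W by 1."""
    f = y.copy()
    for _ in range(n_rounds):
        f = y + alpha * (W @ f)
    return f

# Two toy networks over three genes, combined with illustrative weights.
W1 = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
W2 = np.array([[0., 0., 1.], [0., 0., 1.], [1., 1., 0.]])
W = 0.7 * normalize_network(W1) + 0.3 * normalize_network(W2)

# One positive, one unlabeled, one negative gene; the unlabeled gene
# gets (n+ + n-)/n = 2/3, per the slide above. The +1/-1 coding for
# labeled genes is an illustrative assumption.
y = np.array([1.0, (1 + 1) / 3, -1.0])
print(propagate(W, y))
```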
**Toy example**

(Figure series: six slides step through label propagation on a small graph with α = 0.95, showing every node's score after successive rounds; the values 0, 0.2, 0.5, 0.7, 0.8, 0.9, and 1 appear as initial node scores and edge weights.) In each round, a node's new score is its initial label plus α times the sum, over its neighbors, of edge weight × the neighbor's previous score. The two updates worked out on the slides (checked in the sketch below):

- 0.2 + (0.95 × 0.8 × 0.9) = 0.884
- 0 + (0.95 × 0.2 × 0.7) = 0.133
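The toy updates can be verified directly. Only the two nodes whose updates the slides spell out are reproduced here; the full graph is not recoverable from the transcript, so the neighbor lists below are just those two worked examples.

```python
ALPHA = 0.95

def update(initial_label, neighbors):
    """One propagation round for a single node:
    new score = initial label + alpha * sum(edge weight * neighbor score)."""
    return initial_label + ALPHA * sum(w * s for w, s in neighbors)

# Node with initial label 0.2 and one neighbor scoring 0.8 across an
# edge of weight 0.9: 0.2 + 0.95 * 0.9 * 0.8 = 0.884, as on the slide.
print(update(0.2, [(0.9, 0.8)]))  # ~0.884

# Node with initial label 0 and one neighbor scoring 0.2 across an
# edge of weight 0.7: 0 + 0.95 * 0.7 * 0.2 = 0.133, as on the slide.
print(update(0.0, [(0.7, 0.2)]))  # ~0.133
```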