1 / 26

Protein Function Prediction

Protein Function Prediction. Function categories of proteins : Proteins can be divided into 3 categories 1- Biochemical functions. 2- Sub cellular locations. 3- Cell role. Sub cellular locations. 1- Cytoplasm 2- Nuclear 3- Mitochondria 4- Extra cellular 5- Golgi apparatus

Download Presentation

Protein Function Prediction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Protein Function Prediction Function categories of proteins : Proteins can be divided into 3 categories 1- Biochemical functions. 2- Sub cellular locations. 3- Cell role.

  2. Sub cellular locations • 1- Cytoplasm • 2- Nuclear • 3- Mitochondria • 4- Extra cellular • 5- Golgi apparatus • 6- Chloroplast • 7- Endoplasmic reticulum • 8- Cytoskeleton • 9- Vacuole • 10- Peroxisome • 11- Lysosome • 12- Plasma membrane

  3. Protein function prediction methods 1- Analyzing Gene expression 2- phylogenetic profiles 3- protein fusion 4- Protein sequences - N- protein protein interaction

  4. protein protein interaction What do we mean by protein interaction ? Do you mean physical contact ? no, but higher levels of relations 1- Inclusion in multi protein complexes 2- Common cellular compartments 3- Same signalling path way 4- Same metabolic path way 5- Co-expression 6- Genetic co-regulation 7- Molecular co-evolution

  5. protein interaction Complete protein interaction  inter actom Very complicated Not only for large number of proteins but for the range of distinct types of protein interaction

  6. Types of protein interaction Three levels of associations 1- Physical interaction 2- Correlated proteins 3- Co-located proteins

  7. Permanent interaction Transient interaction

  8. Metabolic correlation Genetic correlation

  9. Soluble location Membrane location

  10. Databases of Protein interaction • Bimolecular Interaction Network Database (BIND); • Database of Interacting Proteins (DIP); • the General Repository for Interaction Datasets (GRID); • Molecular Interactions Database (MINT); • Database of predicted functional associations among genes/proteins (HNB) that has 3 tools [ SMART - mini PEDANT – STRING ].

  11. How can we predict the function of un annotated proteins through PPI If two proteins interact, they are neighborhood of each other. the functions of un annotated protein’s neighbors contain information about the un annotated proteins f8 f1 f9 p2 p1 f7 f1 p3 m2 f5 m1 p5 Annotated protein f1 f2 Un Annotated protein f5 function f3 f4 f6

  12. Our objective • Is to assign functions to all the un annotated proteins based on the functions of the annotated proteins. Conditions • Protein may have several different functions up to 8 functions • We do not know which combinations of the functions contribute the interaction

  13. Given • Suppose that genome has N proteins P1 PN P1 Pn un annotated proteins Pn+1 Pn+m annotated proteins N=n+m Xi = 1 protein has function 0 protein has not function X = ( x1,x2,…….,xn+m)

  14. Assumptions • Let Oi j observed interaction between Pi and Pj. • Oi j = 1 proteins have interaction • Oi j = 0 other wise • S = { Pi <-> PJ : Oi j , i, j =1…N} • Nei (i) set of proteins interact with Pi • ∏ jfraction of all proteins having function Fj

  15. Previous methods • Neighborhood counting method • Frequencies of its neighbors method • Chi square method • Markov random field method

  16. Neighborhood counting method • Method to predict the function based on the functions of its neighbors ( all annotated functions are ordered in list ) • Dis advantages 1- no significance value 2- ignore the size of functional classes 3- equal weights for distance neighbors F1 most frequent F2 F3 F4 | | FN least frequent

  17. Frequencies of its neighbors method • As same as the previous method Neighborhood counting method but assign k functions for un annotated protein with k largest frequencies in its neighbors. Dis advantages 1- it does not take consideration that some proteins have same function. 2- if we have famous function in the neighbors, the probability that the un annotated protein has the same function is larger than any other function

  18. Chi square method Test • Let ni (j) number of proteins interact with protein i & having function j. observed freq • #Nei (i) number of proteins in Nei (i) • ei (j) = # Nei (j) . ∏j Expected freq Si (j) = [ni (j) – ei (j) ]²/ ei (j) We will take the highest score Significance value but …………………?

  19. Problem of Chi square method • Equally weighted it is obvious that proteins far away from Pi contribute less information than those close neighbors. It is not clear how can we choose the correct weights p2 f1 m2 f7 p5 f5 f6

  20. Problem • The problems are how to assign different weights to the parameters. • How to estimate the probabilities based on the network.

  21. Markov random field method Over come all the above problems • X=(x1, x2, ….., xN) functional annotation • Xi = 1 i protein has the function = 0 other wise Get the prior probability distribution of x based on the interaction network (Gibbs distribution) Calc the posterior probability of the function of the un annotated protein ob Pr (g) ps

  22. Check the accuracy • By using leave one out method for each annotated protein with at least one interaction. We assume that it as un annotated protein and predict the function. Compare the prediction and the annotation Repeat the LOO for all proteins Pi…..Pk Check the sensitivity = ∑ Ki / ∑ ni where Ki over lap between the set of observed & predicted functions Ni number of the functions of protein Pi

  23. Limitations and assumptions • Limitations - Interaction network and functional annotation of proteins are incomplete. - The actual number of interacting protein much than what we have. • Assumptions - annotated proteins have complete functional annotation.

  24. Note • We can take into consideration the correlation between the functions. Ex: protein has function A may increase the chance to have function B because A,B are highly correlated.

  25. Chi square Test • Is a test of significance of overall deviation square in the observed and expected frequencies divided by the expected ones. • X²= ∑ { (o - e)² / e} where o observed freq • e expected freq • Compare x² and the tabulated value • If x² > the tabulated value  value is significance • x² depends on the - number of the classes • - the degree of the freedom • 0--- ∞ Back

  26. Thanks

More Related