LECTURE 3 Complex Network Models Properties of Protein-Protein Interaction Networks


1. LECTURE 3 • Complex Network Models • Properties of Protein-Protein Interaction Networks • Usage of KNApSAcK Database

2. Complex Network Models: • Average path length L, clustering coefficient C, and degree distribution P(k) help characterize the global structure of a network. • Some well-known types of network models are: • Regular coupled networks • Random graphs • Small-world networks • Scale-free networks • Hierarchical networks
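The three measures named on this slide can be computed directly from an adjacency list. The following is a minimal sketch, not code from the lecture: the function names and the toy graph (a triangle a-b-c with a pendant node d) are illustrative assumptions, and L is averaged over connected node pairs only.

```python
from collections import Counter, deque

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS from each node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def clustering_coefficient(adj):
    """Average over nodes of (links among neighbours) / (possible links among neighbours)."""
    values = []
    for u, nbrs in adj.items():
        nb = list(nbrs)
        k = len(nb)
        if k < 2:
            values.append(0.0)  # convention assumed here: C = 0 for degree < 2
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values)

def degree_distribution(adj):
    """P(k): fraction of nodes that have degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in sorted(counts.items())}

# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_path_length(toy), clustering_coefficient(toy), degree_distribution(toy))
```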

3. Regular networks

4. Regular networks: diamond crystal and graphite crystal (both diamond and graphite are carbon)

5. Regular network (a ring lattice) • Average path length L is high • Clustering coefficient C is high • Degree distribution is delta type
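These three properties can be checked on a small ring lattice. The sketch below assumes a ring of 20 nodes in which every node is linked to its 4 nearest neighbours; the sizes and function names are illustrative, not taken from the slides.

```python
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even, k < n)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def local_clustering(adj, u):
    """Fraction of neighbour pairs of u that are themselves linked."""
    nb = sorted(adj[u])
    k = len(nb)
    links = sum(1 for a in range(k) for b in range(a + 1, k) if nb[b] in adj[nb[a]])
    return 2 * links / (k * (k - 1))

def mean_distance_from(adj, source):
    """Mean BFS distance from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values()) / (len(dist) - 1)

lattice = ring_lattice(20, 4)
degrees = {len(nbrs) for nbrs in lattice.values()}  # a single value: delta-type distribution
print(degrees, local_clustering(lattice, 0), mean_distance_from(lattice, 0))
```

Because every node of the lattice looks the same, the mean distance from node 0 equals the average path length of the whole graph. It grows roughly in proportion to the number of nodes, which is what "L is high" refers to.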

6. Random Graph Erdős and Rényi introduced the concept of the random graph around 60 years ago.

7. Random Graph N = 10, Emax = N(N-1)/2 = 45; example graphs for p = 0, p = 0.1, p = 0.15 and p = 0.25
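A random graph of this kind can be generated by visiting each of the Emax possible node pairs and keeping it as an edge with probability p. This is a minimal sketch; the function names and the fixed seed are assumptions made for reproducibility.

```python
import random
from itertools import combinations

def max_edges(n):
    """Emax = N(N-1)/2, the number of node pairs in an undirected graph."""
    return n * (n - 1) // 2

def random_graph(n, p, seed=None):
    """Keep each of the N(N-1)/2 possible edges independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]

for p in (0, 0.1, 0.15, 0.25):
    edges = random_graph(10, p, seed=0)
    print(f"p={p}: {len(edges)} edges out of {max_edges(10)} possible")
```

On average such a graph has p × Emax edges, for example 0.1 × 45 = 4.5 edges for N = 10 and p = 0.1.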

8. Random Graph (p = 0.25) • Average path length L is low • Clustering coefficient C is low • Degree distribution is of exponential type (binomial, approximately Poisson for large N)

9. Random Graph • To compare a real network with a random network, we usually first generate a random network of the same size, i.e. with the same number of nodes and edges. • Besides Erdős-Rényi random graphs, there are other types of random graphs. • A random graph can be constructed so that it matches the degree distribution or some other topological property of a given graph. • Geometric random graphs: each vertex is assigned random coordinates in a geometric space of arbitrary dimensionality, and random edges are allowed between adjacent points or between points constrained by a threshold distance.

10. Geometric random graph: Example
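One simple variant of a geometric random graph can be sketched as follows: points are drawn uniformly in the unit square (or unit hypercube), and two points are linked when their Euclidean distance does not exceed a threshold. The uniform placement, the Euclidean metric and the parameter names are assumptions for illustration; the slide does not fix them.

```python
import math
import random
from itertools import combinations

def geometric_random_graph(n, radius, dim=2, seed=None):
    """Place n points uniformly in the unit hypercube; link pairs no farther apart than radius."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    edges = [(u, v) for u, v in combinations(range(n), 2)
             if math.dist(points[u], points[v]) <= radius]
    return points, edges

points, edges = geometric_random_graph(30, 0.3, seed=2)
print(len(edges), "edges among", len(points), "points")
```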

11. Small world model (Watts and Strogatz) Oftentimes, soon after meeting a stranger, one is surprised to find that they have a common friend; so they both cheer: “What a small world!”

12. Small world model (Watts and Strogatz) • Begin with a nearest-neighbor coupled network • Randomly rewire each edge of the network with some probability p
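The two steps can be sketched as follows. The details are assumptions rather than the lecture's own procedure: the starting network is a ring lattice with k neighbours per node, and a rewired edge keeps one endpoint and moves the other to a uniformly chosen node, avoiding self-loops and duplicate edges.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice with k neighbours per node (k even, k < n); each edge rewired with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    # Step 1: nearest-neighbour coupled network (ring lattice).
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    # Step 2: visit every lattice edge (i, i+j) once and rewire it with probability p.
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if old in adj[i] and rng.random() < p:
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj

def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2

small_world = watts_strogatz(20, 4, 0.2, seed=1)
print(edge_count(small_world), "edges after rewiring")
```

Rewiring moves edges but never creates or destroys them, so the number of edges stays at n × k / 2 for every value of p.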

13. Small world model (Watts and Strogatz) • Average path length L is low • Clustering coefficient C is high • Degree distribution is exponential type

14. Scale-free model (Barabási and Albert) Start with a small number of nodes; at every time step, a new node is introduced and connected to already-existing nodes following preferential attachment (the probability is high that a new node is connected to high-degree nodes).
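Preferential attachment can be sketched with a list in which every node appears once per edge end it owns, so that a uniform draw from the list picks a node with probability proportional to its degree. The seed network (a small clique of m + 1 nodes) and the parameter names are assumptions for illustration.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes; each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree (requires n > m + 1)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(m + 1)}
    ends = []  # every node appears here once per edge end, i.e. degree-many times
    # Assumed seed network: a fully connected cluster of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            ends += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))  # degree-proportional choice
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends += [new, t]
    return adj

network = barabasi_albert(200, 2, seed=0)
histogram = Counter(len(nbrs) for nbrs in network.values())
print(sorted(histogram.items()))
```

In the printed degree histogram most nodes have a low degree while a few nodes have a much higher one.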

15. • Average path length L is low • Clustering coefficient C is not clearly known • Degree distribution is power-law type: P(k) ~ k^(-γ)

16. Scale-free networks exhibit robustness • Robustness: the ability of complex systems to maintain their function even when the structure of the system changes significantly • Tolerant to random removal of nodes (mutations) • Vulnerable to targeted attack of hubs (mutations): drug targets
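The contrast between random removal and targeted attack can be illustrated on an extreme toy case: a single hub linked to nine leaves. This star is a hypothetical example, not a scale-free network, but it shows the same asymmetry in the size of the largest connected component.

```python
def largest_component_size(adj, removed=()):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

# Hypothetical toy network: hub 0 linked to leaves 1..9.
star = {0: set(range(1, 10)), **{i: {0} for i in range(1, 10)}}

print("intact:      ", largest_component_size(star))
print("leaf removed:", largest_component_size(star, removed=[5]))
print("hub removed: ", largest_component_size(star, removed=[0]))
```

Removing any one of the nine leaves barely changes the network, while removing the single hub leaves only isolated nodes.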

17. Scale-free model (Barabási and Albert) The term “scale-free” refers to any functional form f(x) that remains unchanged to within a multiplicative factor under a rescaling of the independent variable x, i.e. f(ax) = bf(x). Power-law forms (P(k) ~ k^(-γ)) have this property, and they are the only solutions to f(ax) = bf(x); hence “power-law” is referred to as “scale-free”.
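A short derivation shows why only power laws satisfy this condition. It is a sketch under added assumptions that are not on the slide: f is differentiable, f(1) is non-zero, and the factor b may depend on a, written b(a).

```latex
% Assumption: f(ax) = b(a) f(x) for all a > 0 and x > 0, with f differentiable and f(1) \neq 0.
\begin{align*}
f(a) &= b(a)\, f(1) && \text{(set } x = 1\text{)} \\
f(ax) &= \frac{f(a)\, f(x)}{f(1)} && \text{(substitute } b(a) = f(a)/f(1)\text{)} \\
x\, f'(ax) &= \frac{f'(a)\, f(x)}{f(1)} && \text{(differentiate with respect to } a\text{)} \\
x\, f'(x) &= \frac{f'(1)}{f(1)}\, f(x) && \text{(set } a = 1\text{)} \\
f(x) &= f(1)\, x^{-\gamma}, \qquad \gamma = -\frac{f'(1)}{f(1)} && \text{(solve the differential equation)}
\end{align*}
```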

18. Hierarchical Graphs The starting point of this construction is a small cluster of four densely linked nodes (see the four central nodes in figure). Next, three replicas of this module are generated and the three external nodes of the replicated clusters connected to the central node of the old cluster, which produces a large 16-node module. Three replicas of this 16-node module are then generated and the 12 peripheral nodes connected to the central node of the old module, which produces a new module of 64 nodes. (Source: Albert-László Barabási & Zoltán N. Oltvai, Network Biology: Understanding the Cell’s Functional Organization, Nature Reviews Genetics, Volume 5, February 2004, 101)

19. Hierarchical Graphs The hierarchical network model seamlessly integrates a scale-free topology with an inherent modular structure by generating a network that has a power-law degree distribution with degree exponent γ = 1 + ln4/ln3 ≈ 2.26 and a large, system-size independent average clustering coefficient <C> ~ 0.6. The most important signature of hierarchical modularity is the scaling of the clustering coefficient, which follows C(k) ~ k^(-1), a straight line of slope -1 on a log-log plot. (Source: Albert-László Barabási & Zoltán N. Oltvai, Network Biology: Understanding the Cell’s Functional Organization, Nature Reviews Genetics, Volume 5, February 2004, 101)

20. Comparison of random, scale-free and hierarchical networks (Source: Albert-László Barabási & Zoltán N. Oltvai, Network Biology: Understanding the Cell’s Functional Organization, Nature Reviews Genetics, Volume 5, February 2004, 101)

21. Protein-protein interaction Typical protein-protein interaction: a protein binds with one or several other proteins in order to perform different biological functions; the resulting assemblies are called protein complexes.

22. Protein-protein interaction This complex transports oxygen from the lungs to cells all over the body through blood circulation. (Source: Protein-Protein Interactions by Catherine Royer, Biophysics Textbook Online)

23. protein-protein interaction PROTEIN-PROTEIN INTERACTIONS by Catherine Royer Biophysics Textbook Online

24. Twenty amino acids

25. Four nucleotides

26. Four nucleotides

27. Network of interactions and complexes [Figure: detected complex data (a bait protein and its interacted proteins, proteins A to F) converted to binary interactions by the spoke approach and by the matrix approach] • Usually protein-protein interaction data are produced by laboratory experiments (yeast two-hybrid, pull-down assay, etc.). • The results of the experiments are converted to binary interactions. • The binary interactions can be represented as a network/graph where a node represents a protein and an edge represents an interaction.
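The two ways of converting a detected complex into binary interactions can be sketched as follows. The spoke approach links the bait to each protein found with it; the matrix approach links every pair of proteins in the complex. The complex used here (bait A with proteins B, C and D) is a hypothetical example, not the one drawn on the slide.

```python
from itertools import combinations

def spoke(bait, preys):
    """Spoke approach: the bait interacts with each co-detected protein."""
    return sorted(tuple(sorted((bait, prey))) for prey in preys)

def matrix(bait, preys):
    """Matrix approach: every pair of proteins in the complex interacts."""
    return sorted(tuple(sorted(pair)) for pair in combinations([bait, *preys], 2))

# Hypothetical complex: bait "A" detected together with "B", "C" and "D".
print(spoke("A", ["B", "C", "D"]))
print(matrix("A", ["B", "C", "D"]))
```

For a complex of n proteins the spoke approach yields n - 1 interactions and the matrix approach n(n-1)/2, so the matrix approach gives denser networks.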

28. Network of interactions
List of interactions: AtpB-AtpA, AtpG-AtpE, AtpA-AtpH, AtpB-AtpH, AtpG-AtpH, AtpE-AtpH
Adjacency matrix:
0 0 1 0 1
0 0 0 1 1
1 0 0 0 1
0 1 0 0 1
1 1 1 1 0
[Figure: corresponding network]
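The conversion from a list of interactions to an adjacency matrix can be sketched as follows. Two assumptions are made here: consecutive protein names on the slide are read as interacting pairs, and the rows are ordered AtpA, AtpE, AtpB, AtpG, AtpH, which is one ordering that reproduces the matrix on the slide (the slide itself does not label the rows).

```python
# Interaction list read from the slide as consecutive pairs (assumption).
interactions = [
    ("AtpB", "AtpA"), ("AtpG", "AtpE"), ("AtpA", "AtpH"),
    ("AtpB", "AtpH"), ("AtpG", "AtpH"), ("AtpE", "AtpH"),
]

# Assumed row/column order; the slide does not label the matrix.
order = ["AtpA", "AtpE", "AtpB", "AtpG", "AtpH"]

def adjacency_matrix(edges, nodes):
    """Symmetric 0/1 matrix with a 1 wherever two proteins interact."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix

for name, row in zip(order, adjacency_matrix(interactions, order)):
    print(f"{name}: {' '.join(map(str, row))}")
```

The matrix is symmetric because the interactions are undirected, and the row sums give the degree of each protein (4 for AtpH, 2 for each of the others).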

29. S. cerevisiae network from “The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes” by A. Wagner, Mol. Biol. Evol., 2001 • 985 proteins and 899 interactions • Giant component consists of 466 proteins

30. S. cerevisiae network from “The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes” by A. Wagner, Mol. Biol. Evol., 2001 • Average degree ~ 2 • Clustering coefficient = 0.022 • Degree distribution is scale free

31. An E. coli interaction network from DIP (http://dip.mbi.ucla.edu/) • 300 proteins and 287 interactions • The components of this graph have been determined by applying the depth-first search algorithm • There are 62 components in total • The giant component contains 93 proteins
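Finding the components of a graph with depth-first search can be sketched as follows. The toy graph and its node names are hypothetical; the DIP network itself is not reproduced here.

```python
def connected_components(adj):
    """Iterative depth-first search; returns the components, largest first."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            component.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)

# Hypothetical toy graph with three components.
toy = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}, "D": {"E"}, "E": {"D"}, "F": set()}
components = connected_components(toy)
print(len(components), "components; giant component:", components[0])
```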

32. An E. coli interaction network from DIP (http://dip.mbi.ucla.edu/) • Average degree ~ 1.913 • Clustering coefficient = 0.29 • Degree distribution ~ scale free

33. S. cerevisiae network from “Lethality and centrality in protein networks” by H. Jeong, S. P. Mason, A.-L. Barabási, Z. N. Oltvai, Nature, May 2001 • 1870 proteins and 2240 interactions • Almost all proteins are connected • Degree distribution is scale free

34. PPI network based on MIPS database consisting of 4546 proteins and 12319 interactions • Average degree 5.42 • Clustering coefficient = 0.18 • Giant component consists of 4385 proteins
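The average degrees quoted on these slides follow from the protein and interaction counts, since every interaction contributes to the degree of two proteins: average degree = 2E/N. A minimal check, using the counts given on the slides (the dictionary labels are only for display):

```python
def average_degree(n_nodes, n_edges):
    """Each edge adds one to the degree of two nodes, so <k> = 2E / N."""
    return 2 * n_edges / n_nodes

# (proteins, interactions) as stated on the slides.
networks = {
    "S. cerevisiae (Wagner 2001)": (985, 899),
    "E. coli (DIP)": (300, 287),
    "PPI network (MIPS)": (4546, 12319),
}

for name, (n, e) in networks.items():
    print(f"{name}: average degree = {average_degree(n, e):.3f}")
```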

35. PPI network based on MIPS database consisting of 4546 proteins and 12319 interactions • Degree distribution ~ scale free

36. A complete PPI network tends to be a connected graph and tends to have a power-law degree distribution.

37. Introduction to KNApSAcK database http://kanaya.aist-nara.ac.jp/KNApSAcK/

38. FT-MS high accurate MW for metabolites [molecular weight (ppm)] Since 2004 Candidates of Metabolites accurate mass: 226.0477 Molecular formula 597 # of candidates for molecular formula Error level for FT-MS 251 32 1 ±1 ±0.1 ±0.01 ±0.001 (Mw  margin ) C10H10O6 KNApSAcK: Species-metabolite relation DB Chorismic acid Isochorismic acid

39. Now! (last update information, species and metabolites) • 50,048 unique metabolites • 101,500 species-metabolite relations • Since 2004

40. Current Status of KNApSAcK project • Plant kingdom (predicted): 200,000, D. Strack and R. Dixon (2003) • Known NPs (predicted): 50,000 / plants, Luca and Pierre (2000) • KNApSAcK (last update): 50,048 unique metabolites, 101,500 species-metabolite relations • Model species Arabidopsis thaliana: 5,000, ca. 1/3 of 1200 protein types • Human: 2,500, Ryals (2004) • Bacteria (E. coli, B. subtilis): 800-1700 • Systematization of species-metabolite relation DB (KNApSAcK) • Basic study: Metabolomics (Systems Biol); Evolution of NPs; Gene to metabolite relations • Applied works: Food Sciences; Health creation; Herbal medicine; Drug development by Herb.

41. Main window http://kanaya.naist.jp/KNApSAcK/ We can retrieve metabolite information by: (a) Name (Organism, Metabolite) (b) Mw ± margin (c) Molecular formula (d) Taxonomic hierarchy (e) Ion mass of FT-MS with ionization mode; Substructure. Other elements of the window: (g) A list of retrieved metabolites (h) Mode selection

42. Metabolites can be linked to KNApSAcK easily by Keywords (Organism, Metabolite, Molecular Formula)

43. (+)-Sesamin is reported in 122 species

44. Input: Allium cepa 38 Metabolites

45. KNApSAcK (http://kanaya.naist.jp/KNApSAcK) (since 2004): papers that utilized the KNApSAcK DB to examine metabolomics (Thanks!)
2009 (6 papers):
Davey, M.P., et al., Metabolomics, (2009)
Hounsome, N., et al., Postharvest Biol. Technol., (2009)
Xie, Z., et al., J. Exp. Botany, 60, 87-97, (2009)
Giavalisco, Anal. Chem., (2009)
Draper et al., BMC Bioinformatics, (2009)
Shroff et al., PNAS, (2009)
2008 (17 papers):
Malitsky, S., et al., Plant Physiol., (2008)
Warner, E., et al., J. Chromatography B, (2008)
Fait, A., et al., Plant Physiol., 148, 730-750, (2008)
Mintz-Oron, S., et al., Plant Physiol., 147, 823-851, (2008)
Hanhineva, K., et al., Phytochemistry, 69, 2463-2481, (2008)
Bottcher, C., et al., Plant Physiol., 147, 2107-2120, (2008)
Farder, A., et al., J. Nutrition, 138, 1282-1287, (2008)
Mintz-Oron, S., et al., Plant Physiol., 147, 823-825, (2008)
Overy, D.P., et al., Nature Protocols, 3, 471-485, (2008)
Dunn, W.B., Physical Biol., 5, 1-24, (2008)
Akiyama, K., In Silico Biol., 8, 27, (2008)
Sawada, Plant Cell Physiol., (2008)
Arita, M. and Suwa, K., BioData Mining, 1, 7.1-8, (2008)
Saito, K., et al., Trends in Plant Sci., 13, 36-43, (2008)
Akiyama, K., et al., In Silico Biol., 8, 339-345, (2008)
Takahashi, H., Anal. Bioanal Chem. (in press), (2008)
Iijima, Y., et al., Plant J., 54, 949-962, (2008)
2007 (10 papers):
Want, E.J., et al., J. Proteome Res., 6, 459-468, (2007)
Sofia, M., et al., Trends in Anal. Chem., 26, 855-866, (2007)
Hummel, J., et al., Topics in Curr. Genet., 18, 75-95, (2007)
Gaida, A., and Neumann, S., J. Int. Bioinf., (2007)
Griffiths, W.J., Metabolomics, Metabonomics and Metabolite Profiling, (Royal Soc. Chem.), (2007)
Ohta, D., et al., Anal. Biol. Chem., (2007)
Nakamura, Y., et al., Planta, (2007)
Suzuki, H., et al., Phytochemistry, (2007)
Sakakibara, K., et al., J. Biol. Chem., 282, 14932-14941, (2007)
Saito, K., et al., Trends in Plant Sci., 13, 36-42, (2007)
2006 (4 papers):
Kikuchi, K. and Kakeya, H., Nature Chem. Biol., 2, 392-394, (2006)
Oikawa, A., et al., Plant Physiol., 142, 398-413, (2006)
Shinbo, Y., et al., Biotechnol. Agric. Forestry, 57, 166-181, (2006)
Shinbo, Y., et al., J. Comput. Aided Chem., 7, 94-101, (2006)
Web sites linked to KNApSAcK:
(WikiBook) http://en.wikibooks.org/wiki/Metabolomics/Databases
(UC Davis) http://fiehnlab.ucdavis.edu/staff/kind/Metabolomics/Structure_Elucidation/
(KEGG) http://fire3.scl.genome.ad.jp/dbget-bin/www_bfind?knapsack
(TAIR-Metabolomics Resource - Databases) http://www.arabidopsis.org/portals/metabolome/metabolome_database.jsp/
(LECO manual) Form No. 203-821-333
(PubMed) referred by C-ID
http://metabolomics.jp/wiki/
[Figure: bar charts of the number of papers by target of research (Metabolomics, Non-targeted Analysis, Review Article, Bioinformatics, Methodology Development) and by organism (Arabidopsis thaliana, Fragaria x ananassa, Solanum lycopersicum, Brassica oleracea, Curcuma longa, E. coli, Rattus norvegicus)]


47. Recent information about the research works that used/introduced the KNApSAcK database

48. Recent information about the research works that used/introduced the KNApSAcK database