
Biochemical networks Concepts and definitions



Presentation Transcript


  1. 9th BioSapiens European School of Bioinformatics. Biochemical networks: concepts and definitions. Jacques.van.Helden@ulb.ac.be, Université Libre de Bruxelles, Belgium. Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), http://www.bigre.ulb.ac.be/

  2. Graph • A graph G contains a set of vertices V and a set of edges E. • A simple graph contains no self-loop and no multi-edge. [Figure labels: vertex (node), edge, self-loop, proper edge, multi-edge, simple graph]
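The definition of a simple graph can be checked mechanically. The sketch below assumes an undirected graph given as a set of vertex names plus a list of (u, v) pairs; the function name `is_simple_graph` is hypothetical. Each edge is stored as a `frozenset`, so that (a, b) and (b, a) count as the same edge and a repeated pair is detected as a multi-edge.

```python
from collections import Counter

def is_simple_graph(vertices, edges):
    """Return True if an undirected graph has no self-loop and no multi-edge.

    vertices: set of vertex names; edges: list of (u, v) pairs, where
    (u, v) and (v, u) denote the same undirected edge.
    """
    for u, v in edges:
        if u not in vertices or v not in vertices:
            raise ValueError(f"edge ({u}, {v}) uses an unknown vertex")
        if u == v:  # self-loop
            return False
    # Undirected edges are unordered pairs, so count them as frozensets.
    edge_counts = Counter(frozenset(e) for e in edges)
    return all(count == 1 for count in edge_counts.values())  # no multi-edge


V = {"a", "b", "c"}
print(is_simple_graph(V, [("a", "b"), ("b", "c")]))  # True
print(is_simple_graph(V, [("a", "b"), ("b", "a")]))  # False: multi-edge
```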

  3. Directed graph (= digraph) • A directed edge (or arc) is characterized by a head and a tail. • A digraph is a graph whose edges are directed. • A partially directed graph is a graph combining directed and non-directed edges. [Figure labels: digraph, vertex (= node), arc (= directed edge), self-loop, proper arc, multi-arc, "not a multi-arc!", simple digraph]
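For digraphs the orientation of each arc matters. In this sketch (function name hypothetical), arcs are assumed to be ordered (tail, head) tuples. Two arcs joining the same vertices in opposite directions, presumably the "not a multi-arc!" case in the figure, remain distinct, whereas two arcs with the same tail and the same head form a multi-arc.

```python
from collections import Counter

def is_simple_digraph(arcs):
    """Return True if a digraph has no self-loop and no multi-arc.

    arcs: list of (tail, head) pairs. Direction matters, so (a, b) and
    (b, a) are two distinct arcs, not a multi-arc.
    """
    if any(tail == head for tail, head in arcs):  # self-loop
        return False
    arc_counts = Counter(arcs)  # tuples keep the orientation of each arc
    return all(count == 1 for count in arc_counts.values())


print(is_simple_digraph([("a", "b"), ("b", "a")]))  # True: opposite arcs
print(is_simple_digraph([("a", "b"), ("a", "b")]))  # False: multi-arc
```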

  4. Functional genomics: protein interaction networks represented as undirected graphs

  5. Two-hybrid method [Figure: a transcription factor consists of a DNA-binding domain and an activation domain that recruits RNA polymerase. In the hybrid constructions, ORF A is fused to the DNA-binding domain (bait) and ORF B to the activation domain (prey).] • Interaction: bait and prey bring the two domains together, and the reporter gene is expressed. • No interaction: the reporter gene is not expressed.

  6. Two-hybrid • Uetz et al. (2000). Nature 403: 623-631. • Ito et al. (2001). PNAS 98: 4569-4574.

  7. Comparison of the results (Ito et al. (2001). PNAS 98: 4569-4574) • When the second “comprehensive” analysis was published, the overlap between the results obtained in the two independent studies was surprisingly low. • How should this be interpreted? • Problem of coverage? Each study might represent only a fraction of the interactions that remain to be discovered. • Problem of noise? Either or both studies might contain a large number of false positives. • Differences in experimental conditions?
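The overlap between two interaction screens can be quantified as the number of shared pairs and the Jaccard index (shared pairs divided by all distinct pairs). This is a minimal sketch assuming each study is a list of protein pairs and that interactions are undirected; the protein names and function names are hypothetical toy values, not data from Uetz et al. or Ito et al.

```python
def as_interaction_set(pairs):
    """Undirected interactions: store each pair as a frozenset so A-B == B-A."""
    return {frozenset(pair) for pair in pairs}

def overlap(study_1, study_2):
    """Return (number of shared interactions, Jaccard index of the two sets)."""
    s1, s2 = as_interaction_set(study_1), as_interaction_set(study_2)
    shared = s1 & s2
    union = s1 | s2
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Hypothetical toy data, not taken from either published screen.
study_1 = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
study_2 = [("P2", "P1"), ("P5", "P6")]
print(overlap(study_1, study_2))  # (1, 0.25)
```

A low Jaccard index alone does not distinguish the coverage and noise hypotheses listed above; both would produce a small intersection.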

  8. Connectivity in protein interaction networks • Jeong et al. (2001) calculated the connectivity (number of interaction partners) of each protein in the network revealed by the two-hybrid analysis of Uetz and co-workers. • The connectivity follows a power law: • most proteins have few connections; • a few proteins are highly connected. • Highly connected proteins (the so-called “hubs”) are more often essential than weakly connected ones. Jeong, H., S.P. Mason, A.L. Barabási, and Z.N. Oltvai. 2001. Lethality and centrality in protein networks. Nature 411: 41-42.
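Connectivity here is the degree k of each protein, and the degree distribution P(k) is the fraction of proteins with exactly k partners; a power law means P(k) is roughly proportional to k^(-γ), which appears as a straight line on a log-log plot. The sketch below computes degrees and P(k) from an undirected edge list; fitting γ is not shown, and the hub threshold `min_degree` is an arbitrary parameter chosen for illustration.

```python
from collections import Counter

def degrees(edges):
    """Connectivity k of each protein: its number of distinct partners."""
    partners = {}
    for u, v in edges:
        partners.setdefault(u, set()).add(v)
        partners.setdefault(v, set()).add(u)
    return {protein: len(p) for protein, p in partners.items()}

def degree_distribution(edges):
    """P(k): fraction of proteins having exactly k partners."""
    deg = degrees(edges)
    counts = Counter(deg.values())
    n = len(deg)
    return {k: c / n for k, c in sorted(counts.items())}

def hubs(edges, min_degree):
    """Proteins whose degree reaches an (arbitrary) threshold."""
    return sorted(p for p, k in degrees(edges).items() if k >= min_degree)


# Hypothetical toy network: H is connected to every other protein.
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]
print(degree_distribution(edges))  # {1: 0.4, 2: 0.4, 4: 0.2}
print(hubs(edges, 3))              # ['H']
```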

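  9. Isolation of protein complexes (slide from Nicolas Simonis) 1. Construction of a bank of tag-fused ORFs. 2. Expression of the tagged baits in yeast. 3. Cell lysis: the tagged bait is released together with all other cellular proteins. 4. Affinity purification with an anti-tag epitope retains the bait and its partners, separating them from the other proteins. [Figure: tagged bait Y in a complex with proteins A, B, C, D and E]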

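  10. Mass spectrometry: protein identification (slide from Nicolas Simonis) • The purified complex (bait Y with partners A to E) is separated by one-dimensional SDS-PAGE. • The isolated proteins are identified by mass spectrometry. • Identified proteins: YLR258w, YER133w, YER054c, YPR184w, YKL085w, YPR160w (labelled A to E and Y in the figure).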

  11. Protein complexes • Gavin et al. (2002). Nature 415: 141-147. Tandem Affinity Purification (TAP), CELLZOME: 232 complexes. • Ho et al. (2002). Nature 415: 180-183. High-throughput mass-spectrometric protein complex identification (HMS-PCI), MDS Proteomics: 493 complexes.

  12. Network of complexes. Gavin et al. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature (2002) vol. 415 (6868) pp. 141-7
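One way to turn a list of purified complexes into a network of complexes is to treat each complex as a vertex and link two complexes when they share proteins, weighting the edge by the number of shared members. This is a sketch under that assumption; Gavin et al.'s exact linking criteria may differ, and the complex names and `min_shared` parameter are hypothetical.

```python
from itertools import combinations

def complex_network(complexes, min_shared=1):
    """Link complexes that share proteins.

    complexes: dict complex name -> set of member proteins.
    Returns weighted edges (c1, c2, n_shared) for every pair of complexes
    sharing at least min_shared proteins.
    """
    edges = []
    for c1, c2 in combinations(sorted(complexes), 2):
        n_shared = len(complexes[c1] & complexes[c2])
        if n_shared >= min_shared:
            edges.append((c1, c2, n_shared))
    return edges


# Hypothetical toy complexes: C1 and C2 share protein B.
complexes = {"C1": {"Y", "A", "B"}, "C2": {"B", "C"}, "C3": {"D", "E"}}
print(complex_network(complexes))  # [('C1', 'C2', 1)]
```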

  13. Functional genomics: assessment of interactome data

  14. Assessment of interactome data. von Mering et al. (2002). Nature.

  15. Comparison of large-scale interaction data • von Mering et al. (2002) compared the results from: • two-hybrid assays; • mass spectrometry (TAP and HMS-PCI); • co-expression in microarray experiments; • synthetic lethality; • comparative genomics (conservation of operons, phylogenetic profiles and gene fusion). • Among 80,000 interactions, only 2,400 are supported by more than one method. • Each method is biased toward particular functional classes and cellular locations. von Mering et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature (2002) vol. 417 (6887) pp. 399-403
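Counting how many interactions are supported by several methods amounts to recording, for each unordered protein pair, the set of methods that report it. This sketch assumes one list of pairs per method; the method labels and protein names are hypothetical toy values, not the von Mering et al. data sets.

```python
from collections import defaultdict

def support_by_method(datasets):
    """datasets: dict method -> iterable of (protein1, protein2) pairs.

    Returns dict interaction (frozenset) -> set of methods reporting it.
    """
    support = defaultdict(set)
    for method, pairs in datasets.items():
        for p1, p2 in pairs:
            support[frozenset((p1, p2))].add(method)
    return support

def multiply_supported(datasets, min_methods=2):
    """Interactions reported by at least min_methods different methods."""
    support = support_by_method(datasets)
    return {pair for pair, methods in support.items() if len(methods) >= min_methods}


# Hypothetical toy data sets.
datasets = {
    "two-hybrid": [("A", "B"), ("C", "D")],
    "TAP": [("B", "A"), ("E", "F")],
    "co-expression": [("E", "F"), ("G", "H")],
}
print(len(support_by_method(datasets)), len(multiply_supported(datasets)))  # 4 2
```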

  16. Comparison of pairs of interacting proteins with functional classes. von Mering et al. (2002). Nature.

  17. Validation with annotated complexes • von Mering et al. (2002) collected information on experimentally proven physical protein-protein interactions and measured the coverage and positive predictive value of each prediction method. • Coverage: fraction of the reference set covered by the data. • Positive predictive value: fraction of the data confirmed by the reference set. (Note: the authors call this “accuracy”, but the term is not usually used in this sense.) • Beware: the scale is logarithmic! This exaggerates the perceived differences among low percentages (0-10%) but “compresses” the values between 10% and 50%, which gives a false impression of good accuracy. von Mering et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature (2002) vol. 417 (6887) pp. 399-403
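With the data D and the reference set R both treated as sets of unordered protein pairs, the two measures are coverage = |D ∩ R| / |R| and positive predictive value = |D ∩ R| / |D|. The sketch below is a simplified set-based version with hypothetical toy pairs; it does not attempt to reproduce the exact filtering used by von Mering et al.

```python
def as_pairs(interactions):
    """Undirected interactions as frozensets, so A-B == B-A."""
    return {frozenset(p) for p in interactions}

def coverage_and_ppv(data, reference):
    """Return (coverage, positive predictive value) of data against reference."""
    d, r = as_pairs(data), as_pairs(reference)
    confirmed = len(d & r)                        # data pairs found in the reference
    coverage = confirmed / len(r) if r else 0.0   # fraction of reference recovered
    ppv = confirmed / len(d) if d else 0.0        # fraction of data confirmed
    return coverage, ppv


# Hypothetical toy data: 2 of 4 data pairs are in a 5-pair reference set.
data = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
reference = [("B", "A"), ("C", "D"), ("X", "Y"), ("Y", "Z"), ("Z", "X")]
print(coverage_and_ppv(data, reference))  # (0.4, 0.5)
```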

  18. Transcription factor-gene networks as directed graphs

  19. Network motifs in regulons. Janga et al. Structure and evolution of gene regulatory networks in microbial genomes. Res Microbiol (2007) vol. 158 (10) pp. 787-94 • RegulonDB: a database of transcriptional regulation in Escherichia coli K12. • Experimentally proven interactions: transcription factors (TF) and target genes, binding sites, promoters, operons. • These data can be used to construct a TF-gene network, in which one can identify “network motifs”, i.e. topological motifs formed by combinations of edges. Shen-Orr, Milo, Mangan & Alon (2002). Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet vol. 31 (1) pp. 64-8
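The feed-forward loop is one of the motifs described by Shen-Orr et al. (2002): a regulator X controls a regulator Y, and both control a target Z (arcs X→Y, Y→Z and X→Z). The sketch below only enumerates occurrences in a directed TF-gene network given as (regulator, target) pairs; deciding whether a pattern is a genuine motif additionally requires comparing its frequency with randomized networks, which is not shown. Gene names are hypothetical.

```python
from collections import defaultdict

def feed_forward_loops(arcs):
    """Return sorted (X, Y, Z) triples with arcs X->Y, Y->Z and X->Z.

    arcs: iterable of (regulator, target) pairs.
    """
    targets = defaultdict(set)
    for regulator, target in arcs:
        if regulator != target:  # ignore autoregulation for this motif
            targets[regulator].add(target)
    loops = []
    for x in sorted(targets):
        for y in sorted(targets[x]):
            for z in sorted(targets.get(y, ())):
                if z in targets[x]:  # X also regulates Z directly
                    loops.append((x, y, z))
    return loops


# Hypothetical toy network containing one feed-forward loop.
arcs = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Y", "W")]
print(feed_forward_loops(arcs))  # [('X', 'Y', 'Z')]
```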

  20. Using weighted graphs to represent the reliability of interactions

  21. The ChIP-chip method • Chromatin immunoprecipitation (ChIP): • tagging of a transcription factor of interest with a protein fragment recognized by an antibody; • immobilization of protein-DNA interactions with a fixative agent; • DNA fragmentation by ultrasonication; • precipitation of the DNA-protein complexes; • release of the DNA from the DNA-protein complexes. • Measurement of DNA enrichment: • two extracts are co-hybridized on a microarray (chip), where each spot contains one DNA fragment to which a factor is likely to bind (e.g. an intergenic region, or a smaller fragment); • for the yeast S. cerevisiae, chips have been designed with all the intergenic regions (6,000 regions, 500 bp per region on average); • recent technology allows 300,000 DNA fragments of 300 bp to be spotted on a single slide; • the first extract (labelled in red) is enriched in DNA fragments bound to the tagged transcription factor; • the second extract (labelled in green) has not been enriched; • the log-ratio between the red and green channels indicates the enrichment of each intergenic region.
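The last step, computing a red/green log-ratio per spotted region, can be sketched as follows. The log base (2), the pseudocount, and the region names are assumptions for illustration; real ChIP-chip pipelines also normalize the two channels and model measurement error, which this sketch omits.

```python
import math

def log_ratios(red, green, pseudocount=1.0):
    """Return log2((red + c) / (green + c)) for each region measured in both channels.

    red, green: dict region -> raw intensity. The pseudocount c avoids
    division by zero; log base 2 and c are assumptions of this sketch.
    """
    shared_regions = red.keys() & green.keys()
    return {
        region: math.log2((red[region] + pseudocount) / (green[region] + pseudocount))
        for region in sorted(shared_regions)
    }


# Hypothetical intensities: region_1 is 8-fold enriched in the ChIP (red) extract.
red = {"region_1": 799.0, "region_2": 99.0}
green = {"region_1": 99.0, "region_2": 99.0}
print(log_ratios(red, green))  # {'region_1': 3.0, 'region_2': 0.0}
```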

  22. Lee et al. (2002) • In 2002, Lee et al. published a systematic characterization of the binding regions of 106 yeast transcription factors. Lee et al. 2002. Science 298: 799-804.

  23. Weighted graphs: scoring each edge or node of a graph • Harbison et al. (2004) extended the analysis of Lee et al. (2002) by analyzing protein-DNA interactions under different culture conditions. • In both articles, the relevance of each interaction is estimated by a P-value. • This P-value can be converted into a significance score: sig = -log10(P-value). • The significance score gives an intuitive perception of edge relevance: a P-value of 0.001 gives sig = 3, and smaller P-values give higher scores.
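The conversion sig = -log10(P-value) can be used directly as an edge weight. In this sketch the interactions are assumed to be (factor, gene, P-value) triples, and the cut-off `max_p = 0.001` is an illustrative assumption, not necessarily the threshold used by Lee et al. or Harbison et al.; factor and gene names are hypothetical.

```python
import math

def significance(p_value):
    """sig = -log10(P-value); e.g. 0.01 -> 2, 0.001 -> 3."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must lie in (0, 1]")
    return -math.log10(p_value)

def weighted_edges(interactions, max_p=0.001):
    """Keep (factor, gene, sig) edges whose P-value is at most max_p (assumed cut-off)."""
    return [(factor, gene, significance(p))
            for factor, gene, p in interactions if p <= max_p]


# Hypothetical factor-gene P-values.
print(weighted_edges([("TF1", "G1", 1e-5), ("TF1", "G2", 0.2)]))
```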

  24. Graph-based analysis of biochemical networks: metabolic networks represented as bipartite graphs

  25. Boehringer Mannheim Metabolic Wall Chart http://www.expasy.ch/cgi-bin/show_thumbnails.pl

  26. EcoCyc metabolic chart http://biocyc.org/ECOLI/new-image?type=OVERVIEW

  27. Reactions and compounds: directed bipartite graph • A bipartite graph is a graph whose vertex set V can be partitioned into two subsets U and W, such that each edge of G has one endpoint in U and one endpoint in W. • Arcs never go from compound to compound. • Arcs never go from reaction to reaction. • Network size: 5,871 compounds, 5,223 reactions, 21,194 arcs.
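The two rules above (no compound-to-compound arc, no reaction-to-reaction arc) can be verified by checking that every arc goes from a compound to a reaction (substrate) or from a reaction to a compound (product). In this sketch, "R1" is a hypothetical reaction identifier standing for the hexokinase reaction; the function name is also hypothetical.

```python
def is_directed_bipartite(compounds, reactions, arcs):
    """True if every (tail, head) arc links a compound to a reaction or vice versa."""
    compounds, reactions = set(compounds), set(reactions)
    if compounds & reactions:
        raise ValueError("a vertex cannot be both a compound and a reaction")
    for tail, head in arcs:
        substrate_arc = tail in compounds and head in reactions  # compound -> reaction
        product_arc = tail in reactions and head in compounds    # reaction -> compound
        if not (substrate_arc or product_arc):
            return False
    return True


# Toy example: glucose + ATP -> glucose-6-phosphate + ADP (reaction "R1").
compounds = ["glucose", "ATP", "glucose-6-phosphate", "ADP"]
reactions = ["R1"]
arcs = [("glucose", "R1"), ("ATP", "R1"), ("R1", "glucose-6-phosphate"), ("R1", "ADP")]
print(is_directed_bipartite(compounds, reactions, arcs))                          # True
print(is_directed_bipartite(compounds, reactions, arcs + [("glucose", "ATP")]))  # False
```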

  28. Basic concepts of graph theory: phylogenetic trees are acyclic graphs

  29. The tree of life • This is the only figure of the book “The Origin of Species” (C. Darwin, 1859).
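An undirected graph is a tree when it is connected and contains no cycle. The sketch below tests both conditions with a union-find structure: an edge whose two endpoints are already connected would close a cycle. The taxon names are hypothetical; adding a single edge between two leaves, as a reticulation event would, makes the graph cyclic, which is the situation discussed in the slides on cyclic graphs that follow.

```python
def is_tree(vertices, edges):
    """True if the undirected graph (vertices, edges) is connected and acyclic."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the component representative (with path halving).
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # u and v already connected: this edge closes a cycle
        parent[root_u] = root_v
    # Acyclic so far; a tree also needs exactly one connected component.
    return len({find(v) for v in vertices}) == 1


# Hypothetical phylogeny: root -> n1 -> (A, B), root -> C.
V = {"root", "n1", "A", "B", "C"}
E = [("root", "n1"), ("n1", "A"), ("n1", "B"), ("root", "C")]
print(is_tree(V, E))                 # True
print(is_tree(V, E + [("A", "C")]))  # False: the extra edge creates a cycle
```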

  30. Basic concepts of graph theory: from trees back to cyclic graphs

  31. The ramifications of the universal tree • Rivera, M. C. and Lake, J. A. (2004). The ring of life provides evidence for a genome fusion origin of eukaryotes. Nature 431, 152-5. • Doolittle, W. F. (1999). Phylogenetic classification and the universal tree. Science 284, 2124-9. • Lima-Mendez, G., Van Helden, J., Toussaint, A. and Leplae, R. (2008). Reticulate representation of evolutionary and functional relationships between phage genomes. Mol Biol Evol 25, 762-77.

  32. Basic concepts of graph theory: the Gene Ontology as a directed acyclic graph (DAG)

  33. Gene Ontology: processes

  34. Gene Ontology: molecular functions

  35. Gene Ontology: cellular components

  36. Gene Ontology Database: http://www.geneontology.org/

  37. Gene Ontology Database (http://www.geneontology.org/). Example: methionine biosynthetic process
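Unlike a tree, a DAG allows a term to have several parents, and following child-to-parent links from a term collects all its ancestors (in GO, annotation to a term implies annotation to its ancestors). The sketch below assumes the ontology is given as a dict from each term to its list of parents; the term names are hypothetical placeholders, not real GO identifiers. Acyclicity is checked with Kahn's topological-sort algorithm.

```python
from collections import defaultdict, deque

def ancestors(term, parents):
    """All terms reachable from term by following child -> parent links."""
    seen, stack = set(), [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def is_dag(parents):
    """Kahn's algorithm: the graph is acyclic iff every node can be sorted."""
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    in_degree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
            in_degree[child] += 1
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for c in children[n]:
            in_degree[c] -= 1
            if in_degree[c] == 0:
                queue.append(c)
    return visited == len(nodes)


# Hypothetical mini-ontology: term_C has two parents, which a tree would forbid.
parents = {"term_C": ["term_A", "term_B"], "term_A": ["root"], "term_B": ["root"]}
print(sorted(ancestors("term_C", parents)))  # ['root', 'term_A', 'term_B']
print(is_dag(parents))                       # True
```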

  38. STRING

  39. STRING: an integrative database of interactions • http://string.embl.de/ • The STRING database integrates interaction networks (genetic and protein interactions) inferred from various types of evidence: • Literature curation • High-throughput interactome data • Co-expression • Automatic text mining • Synteny (conservation of gene order across genomes) • Gene proximity (operon inference for bacteria) • Phylogenetic profiles (co-occurrence of gene pairs across genomes)
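When each evidence channel yields a score between 0 and 1, a common way to integrate them under an independence assumption is S = 1 - ∏(1 - s_i): an interaction is missed only if every channel misses it. This is a simplified sketch; STRING's documented scoring also corrects for a prior probability of random interaction, which is omitted here, and the channel names are illustrative. `math.prod` requires Python 3.8 or later.

```python
import math

def combine_scores(channel_scores):
    """Combine independent evidence scores s_i in [0, 1]: S = 1 - prod(1 - s_i)."""
    for s in channel_scores.values():
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    return 1.0 - math.prod(1.0 - s for s in channel_scores.values())


print(combine_scores({"co-expression": 0.5, "text mining": 0.5}))  # 0.75
```

Two moderate, independent pieces of evidence thus yield a higher combined score than either alone, which is the intended behaviour of an integrative database.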

  40. Basic concepts of graph theory: supplementary material

  41. Basic concepts of graph theory: graph descriptions

  42. Graph descriptions: adjacency matrix [Figure: example graph with vertices a to h and edges e1 to e8] • One row per vertex. • One column per vertex. • Value = 1 if the vertices are adjacent. • Diagonal = self-loops. • Problems: • multi-arcs cannot be represented; • inefficient storage (many empty cells).
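A minimal adjacency-matrix builder, assuming a list of vertex names and (tail, head) arcs (function name hypothetical), shows both problems listed on the slide: a repeated arc collapses into a single 1, and an n-vertex graph always needs n × n cells, however few arcs it has.

```python
def adjacency_matrix(vertices, arcs):
    """One row and one column per vertex; cell [i][j] = 1 if there is an arc i -> j.

    A self-loop sets a diagonal cell. Multi-arcs collapse to a single 1.
    """
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for tail, head in arcs:
        matrix[index[tail]][index[head]] = 1
    return matrix


# The duplicated arc a -> b is stored only once; c -> c lands on the diagonal.
print(adjacency_matrix(["a", "b", "c"], [("a", "b"), ("a", "b"), ("c", "c")]))
# [[0, 1, 0], [0, 0, 0], [0, 0, 1]]
```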

  43. Graph descriptions: adjacency list [Figure: example graph with vertices a to h and edges e1 to e8] • A list of out-going vertices is associated with each vertex. • Compact representation. • Optionally, a list of in-going vertices can be added to allow reverse traversal of the graph.
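The adjacency lists described above can be built in one pass over the arcs. This sketch (function name hypothetical) returns both the out-going lists and the optional in-going lists used for reverse traversal; storage grows with the number of arcs rather than with the square of the number of vertices, and a repeated arc simply appears twice, so multi-arcs are preserved.

```python
from collections import defaultdict

def adjacency_lists(arcs):
    """Return (out_lists, in_lists) for a list of (tail, head) arcs."""
    out_lists, in_lists = defaultdict(list), defaultdict(list)
    for tail, head in arcs:
        out_lists[tail].append(head)  # forward traversal
        in_lists[head].append(tail)   # reverse traversal
    return dict(out_lists), dict(in_lists)


out_lists, in_lists = adjacency_lists([("a", "b"), ("a", "c"), ("c", "a")])
print(out_lists)  # {'a': ['b', 'c'], 'c': ['a']}
print(in_lists)   # {'b': ['a'], 'c': ['a'], 'a': ['c']}
```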

  44. Graph descriptions: list of arcs [Figure: example graph with vertices a to h and edges e1 to e8] • Tab-delimited text file. • One row per arc. • One column for heads. • One column for tails. • Optional columns for arc attributes (label, weight, color, …).
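Reading such a file needs only the standard `csv` module with a tab delimiter. In this sketch the column order (tail first, then head), the treatment of lines starting with "#" as comments, and the function name are assumptions; any further columns are kept as a list of raw attribute strings.

```python
import csv
import io

def read_arcs(text):
    """Parse a tab-delimited list of arcs: tail, head, then optional attributes.

    Assumes tail comes before head and that '#' lines are comments.
    Rows with fewer than two columns raise ValueError.
    """
    arcs = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment/header lines
        tail, head, *attributes = row
        arcs.append((tail, head, attributes))
    return arcs


text = "#tail\thead\tweight\na\tb\t0.8\nb\tc\t0.5\n"
print(read_arcs(text))  # [('a', 'b', ['0.8']), ('b', 'c', ['0.5'])]
```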
