
Evolutionary Biology in Genome Annotation

Explore the use of evolutionary biology concepts in genome annotation for gene structural and functional annotation. Discover the evolution of genomes through nucleotide substitution, gene loss, duplication, and rearrangement. Understand orthologs, paralogs, and the implications of speciation and functional shifts.


Presentation Transcript


  1. The use of the concepts of evolutionary biology in genome (biological) annotation. Pierre Pontarotti EA 3781 Evolution Biologique pontarot@up.univ-mrs.fr http://www.up.univ-mrs.fr/evol/

  2. Some concepts in evolutionary biology • Use of the concepts for gene structural and functional annotation • Computerization • Other concepts

  3. [Figure: Metazoan phylogeny (Adoutte et al. 2000). BILATERIA, with Urbilateria at its base, divides into PROTOSTOMES (ECDYSOZOANS and LOPHOTROCHOZOANS) and DEUTEROSTOMES. Protostome taxa shown: Arthropods, Gastrotrichs, Nematodes, Onychophorans, Tardigrades, Kinorhynchs, Priapulids, Molluscs, Rotifers, Annelids, Gnathostomulids, Sipunculans, Nemerteans, Pogonophorans, Platyhelminthes, Entoprocts, Bryozoans, Brachiopods, Phoronids. Deuterostome taxa shown: Vertebrates, Cephalochordates, Urochordates, Hemichordates, Echinoderms. Outside Bilateria: Ctenophorans, Cnidarians, Poriferans. One placement is marked "??".]

  4. URBILATERIA: the hypothetical ancestor of bilaterian animals (an idea going back to Geoffroy Saint-Hilaire in the 19th century). The Urbilateria genome evolved by the fixation of:
      • Nucleotide substitutions
      • Gene loss
      • Duplications: gene duplication, genome region duplication, whole genome duplication
      • Chromosomal rearrangements

  5. Large-scale gene duplication in the vertebrate lineage. [Figure: dated tree. Taxa shown: Amniota (human), Lissamphibia, Actinopterygii (zebrafish), Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Myxini (hagfish), Cephalochordata (amphioxus), Urochordata (Ciona), Echinodermata, insects (Drosophila), nematodes (C. elegans). Clade labels: Vertebrates, Deuterostomata, Protostomata. Node labels: 360, 450, 528, 564, 751, >751, <833-993, 833-993. Additional annotations: T1, T2, Pikaia, "20 000 genes".]

  6. From alleles to orthologs: point mutations (I). Population POP 1 carries alleles A, B, C and D. POP 1 splits into 2 autonomous populations:
      • POP 1A: allele A is fixed and accumulates new mutations, giving alleles A1 and A2.
      • POP 1B: allele B is fixed and accumulates new mutations, giving alleles B1 and B2.

  7. From alleles to orthologs: point mutations (continued). Each population splits again into 2 autonomous populations:
      • POP 1A1: allele A1 is fixed and accumulates new mutations, giving A11 and A12.
      • POP 1A2: allele A2 is fixed and accumulates new mutations, giving A21 and A22.
      • POP 1B1: allele B1 is fixed and accumulates new mutations, giving B11 and B12.
      • POP 1B2: allele B2 is fixed and accumulates new mutations, giving B21 and B22.

  8. From alleles to orthologs. [Figure: A.1.1 and A.1.2, A.2.1 and A.2.2, B.1.1 and B.1.2, B.2.1 and B.2.2 are each labelled as alleles within one population; the relationship between populations is labelled orthologs.]

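  9. Orthologs and paralogs. [Figure: an ancestral gene A duplicates into A1/2 and A3, and A1/2 duplicates into A1 and A2, so that URBILATERIA carries A1, A2 and A3. After speciation, the DROSOPHILA multigenic family is A1, A2, A3 and the HUMAN multigenic family is A1, A2, A3’, A3”. Members of a family within one genome (labelled "A1, A2, B" on the slide) are paralogs.]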

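  10. Orthology / Paralogy
      • Orthologs: 2 genes in different species that come from a common ancestor and were separated by a speciation event (for example A1 HUMAN and A1 DROSO).
      • Paralogs: 2 genes resulting from a duplication event in a genome (for example A1 and A2).
      • Co-orthologs: A3’ HUMAN and A3” HUMAN are both orthologous to A3 DROSO.
      [Figure: gene tree. A duplicates into A1/2 and A3; A1/2 duplicates into A1 and A2; speciation separates the HUMAN and DROSO copies of A1, A2 and A3; A3 duplicates again in HUMAN into A3’ and A3”.]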
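The definitions on this slide can be made concrete with a small sketch. It assumes the slide's gene tree can be written as nested `(event, left, right)` nodes, and it calls two genes orthologs when their last common ancestor is a speciation node and paralogs when it is a duplication node. The tuple encoding, the leaf names and the helper functions are illustrative assumptions; they are not taken from the presentation or from any annotation tool.

```python
# Gene tree as read from the slide, written as nested tuples (event, left, right).
# "D" marks a duplication node, "S" a speciation node; leaves are gene names.
# This encoding is an assumption made for illustration.
TREE = ("D",
        ("D",
         ("S", "A1_HUMAN", "A1_DROSO"),
         ("S", "A2_HUMAN", "A2_DROSO")),
        ("S",
         ("D", "A3'_HUMAN", "A3''_HUMAN"),
         "A3_DROSO"))


def leaves(node):
    """Return the gene names found below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def classify_pairs(node, out=None):
    """Label each pair of leaves by the event at their last common ancestor."""
    if out is None:
        out = {}
    if isinstance(node, str):
        return out
    event, left, right = node
    label = "ortholog" if event == "S" else "paralog"
    # Genes on opposite sides of this node have this node as last common ancestor.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


PAIRS = classify_pairs(TREE)


def relation(a, b):
    """Look up the relationship between two genes of the tree."""
    return PAIRS[frozenset((a, b))]
```

In this sketch `relation("A1_HUMAN", "A2_DROSO")` comes out as a paralog even though the genes sit in different species, because the duplication that separates A1 from A2 predates the speciation. Both human A3 copies are orthologs of `A3_DROSO`, which is the co-ortholog case.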

  11. From Gene History To Gene Function

  12. Orthologs under purifying selection. [Figure: gene A in URBILATERIA; after speciation, A evolves under purifying selection in both lineages and keeps the ancestral function in HUMAN and in DROSOPHILA.]

  13. Ortholog functional switch. [Figure: gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function, while the HUMAN copy (A2) evolves under positive selection or relaxed purifying selection and may acquire a new function.]

  14. Co-ortholog subfunctionalization. [Figure: gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function. In HUMAN, a duplication gives A’ and A”, each retaining a sub-function. Purifying selection is indicated on the tree.]

  15. Co-ortholog neofunctionalization. [Figure: gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function under purifying selection. In HUMAN, a duplication gives A, which keeps the ancestral function under purifying selection, and A2, which evolves under positive or relaxed selection and acquires a new function.]

  16. Orthology/paralogy information • is important for functional inference • (but is of little use for species with a high level of horizontal gene transfer)

  17. Orthology / Paralogy
      • Orthologs: 2 genes in different species that come from a common ancestor and were separated by a speciation event (for example A1 HUMAN and A1 DROSO).
      • Paralogs: 2 genes resulting from a duplication event in a genome (for example A1 and A2).
      • Co-orthologs: A3’ HUMAN and A3” HUMAN are both orthologous to A3 DROSO.

  18. A warning that other speakers will discuss. Many scientists use the best BLAST hit to infer orthologous relationships, BUT:
      • Several co-orthologs can be present.
      • The approach is unreliable for genomes that are not fully sequenced, or when gene loss has occurred.
      AND even with phylogenetic analysis:
      • Biases must be corrected.
      • A phylogenetic tree is a hypothesis.
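The co-ortholog problem raised on this slide can be illustrated with a toy reciprocal-best-hit search. The similarity scores below are invented for illustration (they are not BLAST output), and `best_hit` and `reciprocal_best_hits` are hypothetical helper names. The sketch only shows the general idea of a best-hit approach, not any specific published method.

```python
# Hypothetical similarity scores between Drosophila and human genes
# (higher means more similar). These numbers are invented for illustration.
SCORES = {
    ("A1_DROSO", "A1_HUMAN"): 400,
    ("A1_DROSO", "A3'_HUMAN"): 110,
    ("A1_DROSO", "A3''_HUMAN"): 100,
    ("A3_DROSO", "A1_HUMAN"): 120,
    ("A3_DROSO", "A3'_HUMAN"): 310,
    ("A3_DROSO", "A3''_HUMAN"): 295,
}

DROSO = ["A1_DROSO", "A3_DROSO"]
HUMAN = ["A1_HUMAN", "A3'_HUMAN", "A3''_HUMAN"]


def score(a, b):
    """Symmetric lookup in the score table; unknown pairs score 0."""
    return SCORES.get((a, b), SCORES.get((b, a), 0))


def best_hit(query, targets):
    """Return the target with the highest score against the query."""
    return max(targets, key=lambda t: score(query, t))


def reciprocal_best_hits(genome_a, genome_b):
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    pairs = []
    for a in genome_a:
        b = best_hit(a, genome_b)
        if best_hit(b, genome_a) == a:
            pairs.append((a, b))
    return pairs


RBH = reciprocal_best_hits(DROSO, HUMAN)
MISSED = [g for g in HUMAN if g not in {b for _, b in RBH}]
```

With these toy scores, `RBH` pairs `A3_DROSO` only with `A3'_HUMAN`, and `MISSED` contains `A3''_HUMAN`: the second human co-ortholog is silently dropped, which is the situation a gene tree is meant to catch.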

  19. An evolutionary shift (due to positive or relaxed selection) could be linked to a functional shift. See the talks by N. Galtier and A. Levasseur.

  20. Detection of Positive selection and functional shift

  21. Detection of Evolutionary constraint relaxation and functional shift

  22. Co-ortholog neofunctionalization. [Figure: gene A in URBILATERIA; after speciation, DROSOPHILA A keeps the ancestral function under purifying selection. In HUMAN, a duplication gives A, which keeps the ancestral function under purifying selection, and A2, which acquires a new function.]

  23. Paralogue replacement (paralogue = duplicated gene). After interferon-γ stimulation, the constitutive proteasome β-subunits are replaced by their paralogues: PSMB5 by PSMB8 (LMP 7), PSMB6 by PSMB9 (LMP 2), PSMB7 by PSMB10 (LMP Z).
      Constitutive proteasome:
      • Ancestral function: protein degradation
      • Present in all metazoans, therefore present in Urbilateria (the bilaterian ancestor)
      Immunoproteasome:
      • New function (specialization): degradation into proteins or peptides of a specific size, used by the MHC system
      • Only found in vertebrates

  24. Large-scale gene duplication in the vertebrate lineage. [Figure: dated tree with the same taxa and node labels as before: Amniota (human), Lissamphibia, Actinopterygii (zebrafish), Chondrichthyes (shark), Cephalaspidomorphi (lamprey), Myxini (hagfish), Cephalochordata (amphioxus), Urochordata (Ciona), Echinodermata, insects (Drosophila), nematodes (C. elegans); node labels 360, 450, 528, 564, 751, >751, <833-993, 833-993. The tree is annotated with "Proteasome" / "PROTEASOME" and "ImmunoProteasome".]

  25. [Figure: phylogenetic tree of PSMB7 and PSMB10 with node support values and a scale bar of 0.1. A node labelled "Duplication" separates a PSMB7 group (Mus, Ratt, Bos, Homo, Gall, Xeno, Zebra, Fugu) from a PSMB10 group (Zebra, Fugu, Bos, Mus, Homo). Sequences labelled PSMB7/10: Bran, Ci-zeta (Ciona), Bombyx, Prosbeta2, CG18341 (Drosophila).]

  26. The study of gene and genome HISTORY helps to find evidence for gene FUNCTION.

  27. Concepts in evolutionary biology • Use of the concepts for structural and functional annotation • Structural annotation (deciphering gene structure) • Functional annotation (especially the use of phylogeny to decipher protein function)

  28. Functional annotation: biochemical and biological process • Experimental approaches: RNA interference; tandem affinity purification and mass spectrometry • In silico approaches

  29. Functional annotation • Functional annotation based on phylogeny, starting from experimentally annotated genes…

  30. INTERLUDE • FUNCTION? • A complex concept

  31. Function prediction
      • Using orthology information (done)
      • Using the evolutionary shift information (in progress)
      • By integrative phylogenomics (Engelhardt et al., PLoS Computational Biology 2005) (in progress)

  32. Functional annotation. Homologs with experimentally known function: where the information can be found. [Figure: sources Gene Ontology, SwissProt, GenBank and MedLine; labels "Textual information analysis" and "G.O. standard".]

  33. Functional annotation: Gene Ontology classification
      • Biological process: the biological process to which the gene or gene product contributes. Examples: cell growth and maintenance; pyrimidine metabolism.
      • Molecular function: the biochemical activity of a gene product, including specific binding to ligands or structures. Examples: enzyme; transporter; Toll receptor ligand.
      • Cellular component: the place in the cell where a gene product is active. Examples: cytoplasm; ribosome.
      Plus other classifications to develop, in particular an evolution-based ontology.

  34. Only a small fraction corresponds to known, well-characterized proteins. If the function is unknown: phylogenetic analysis, then functional prediction:
      • Using orthology information
      • Using the evolutionary shift information
      • By integrative phylogenomics
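A minimal sketch of the first option, prediction from orthology information, might look as follows. It assumes orthology calls are already available (for example, read off a gene tree) and transfers only annotations that carry a Gene Ontology experimental evidence code. The gene names, the annotation table and `predict_terms` are hypothetical and chosen for illustration; the term labels are borrowed from the Gene Ontology examples given earlier in the talk.

```python
# Gene Ontology experimental evidence codes (electronic codes such as IEA are excluded).
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

# Hypothetical annotation table: gene -> list of (term, evidence code).
ANNOTATIONS = {
    "geneX_DROSO": [("pyrimidine metabolism", "IDA")],
    "geneY_DROSO": [("transporter", "IEA")],  # electronic annotation only
}

# Hypothetical orthology calls: unannotated gene -> its orthologs.
ORTHOLOGS = {
    "geneX_HUMAN": ["geneX_DROSO"],
    "geneY_HUMAN": ["geneY_DROSO"],
}


def predict_terms(gene, orthologs, annotations):
    """Collect terms that orthologs of `gene` carry with experimental evidence."""
    terms = set()
    for partner in orthologs.get(gene, []):
        for term, evidence in annotations.get(partner, []):
            if evidence in EXPERIMENTAL:
                terms.add(term)
    return terms


PREDICTED_X = predict_terms("geneX_HUMAN", ORTHOLOGS, ANNOTATIONS)
PREDICTED_Y = predict_terms("geneY_HUMAN", ORTHOLOGS, ANNOTATIONS)
```

Restricting the transfer to experimentally supported terms is a design choice of this sketch: it avoids propagating annotations that were themselves inferred electronically. A fuller treatment would also check for an evolutionary shift before transferring, which this sketch does not attempt.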

  35. Tumor necrosis factor family phylogenetic tree: ortholog identification. [Figure: tree of TNFSF sequences with species-code prefixes (Hsa, Mmu, Rno, Gga, Xla, Dre, Pol, Bbo), node support values and a scale bar of 0.2. Family members shown: TNFSF1, TNFSF2, TNFSF5, TNFSF6, TNFSF7, TNFSF8, TNFSF9, TNFSF10, TNFSF11, TNFSF13, TNFSF14, TNFSF15, plus EIGER (DmeTNF). Groups labelled DF1, DF2, DF3. Annotations: atherosclerotic plaque formation; ALPS - LPR/GLD lymphoproliferative syndrome. Source: Trends in Immunology (July 2003).]

  36. Human TNF family phylogenetic tree: search for the closest paralog. Functional annotation (molecular function, biological process). [Figure: ligands and receptors of the human TNF family, each annotated with a biological process; the pairing of annotations to genes is not recoverable from the transcript. Source: Trends in Immunology (July 2003).]
      • Ligands shown: TNFSF1, TNFSF2, TNFSF3, TNFSF4, TNFSF5, TNFSF6, TNFSF7, TNFSF8, TNFSF9, TNFSF10, TNFSF11, TNFSF12?, TNFSF13, TNFSF13B, TNFSF14, TNFSF15, TNFSF18, EDA-A1, EDA-A2.
      • Receptors shown: TNFRSF1A, TNFRSF1B, TNFRSF3, TNFRSF4, TNFRSF5, TNFRSF6, TNFRSF6B, TNFRSF7, TNFRSF8, TNFRSF9, TNFRSF10A, TNFRSF10B, TNFRSF10C, TNFRSF10D, TNFRSF11A, TNFRSF11B, TNFRSF12, TNFRSF14, TNFRSF17, TNFRSF18, TNFRSF19, TNFRSF21, BR3, TACI, EDAR, XEDAR, RELT.
      • Biological processes listed (question marks as on the slide): LN, PP, GC, tumorocidal activity; PP, GC, T cell homeostasis (death); T cell homeostasis (death); T cell costimulation, negative selection?; T cell homeostasis (survival?), CTL activation, peripheral tolerance?; T cell homeostasis (death), CTL function, peripheral tolerance, T cell costimulation, chemotaxis; T cell transmigration and homeostasis (survival)?; T cell homeostasis (survival), peripheral tolerance; GC, B cell function, peripheral tolerance, T cell priming; tumorocidal activity, T cell function?; LN, bone homeostasis, mammary gland development; B cell homeostasis; T cell activation?; T cell activation and survival, CTL activity, tumorocidal activity?; negative selection, autoimmunity; tooth, hair, sweat gland formation; tooth, hair, skin formation?

  37. Only a small fraction corresponds to known, well-characterized proteins. If the function is unknown: phylogenetic analysis, then gene function prediction:
      • Using orthology information
      • Using the evolutionary shift information (see the Levasseur talk)
      • By integrative phylogenomics

  38. Evolutionary biology concepts for genome annotation: further reading
      Concepts:
      • Levasseur A, Danchin E, Orlando L, Bailly X, Pontarotti P. Conceptual bases for quantifying the role of the environment on genomes evolution: the participation of positive selection and neutral evolution. Biological Reviews, in press.
      • Danchin E.G.J, et al. The Major Histocompatibility Complex origin. Immunological Reviews. 2004;198(1):216-232.
      Concepts for applied evolution:
      • Danchin E.G.J, Levasseur A, Lopez-Rascol V, Gouret P, Pontarotti P. The use of evolutionary biology concepts for genome annotation. J. Exp. Zoology Part B: Mol. and Dev. Evol. 2006.

  39. Computerization of concepts and knowledge • Phylogeny • Detection of orthologous and paralogous genes • Detection of evolutionary shifts (in progress) • Function prediction

  40. FIGENIX is a multi-user software platform dedicated to structural and functional annotation tasks:
      - Gene prediction for large DNA sequences
      - Construction of robust phylogenetic trees
      - Automatic detection of orthologs and paralogs
      - Automatic retrieval of functional data on genes from Web databases
      - Filtering and construction of protein databases (EST contig assembly)
      - Chained processes (e.g. gene prediction followed by a phylogenetic study of each predicted gene)

  41. STEPS OF THE phylogeny PIPELINE (1). [Flowchart; box labels translated from French, branch assignments not recoverable from the transcript.]
      • Input: protein sequence encoded by a putative gene
      • BLAST + filtering (databases: Ensembl, NR…)
      • Multiple alignment: CLUSTAL W + purification + bias correction
      • Domain search with HmmPFAM (PFAM); enumeration of domains
      • Decision: do "repeats" exist? (yes / no)
      • Keep monophyletic "repeats"; alignment of merged "repeats"
      • Keep the complete alignment
      • Composition test with TREE-PUZZLE to eliminate sequences that are too divergent
      • Creation of the "FIGENIX" domain (correctDomains)
      • Construction of the Tree of Life; reference tree

  42. STEPS OF THE phylogeny PIPELINE (2). [Flowchart; box labels translated from French, branch assignments not recoverable from the transcript.]
      • Detection of "paralogy groups" + elimination of sites that evolve too fast ("Gu test")
      • Elimination of sequences with more than 30% gaps
      • Elimination of the most incongruent domains, detected by HomPart in PAUP
      • Construction of the Tree of Life; reference tree
      • Saturation test
      • Three trees built: NJ, parsimony, maximum likelihood
      • Comparison of topologies by Templeton-Hasegawa tests
      • Decision: congruent topologies? (yes / no), leading to a consensus tree or the NJ tree
      • Detection of orthologs, then search for functions
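One step of this pipeline, the removal of sequences with more than 30% gaps, is simple enough to sketch. The code below is an illustration under stated assumptions, not FIGENIX code: it assumes the alignment is held as a dictionary of equal-length strings with "-" as the gap character, and the names `gap_fraction` and `drop_gappy` are hypothetical.

```python
def gap_fraction(seq, gap_char="-"):
    """Fraction of alignment columns in which this sequence has a gap."""
    if not seq:
        return 1.0  # treat an empty row as entirely gaps
    return seq.count(gap_char) / len(seq)


def drop_gappy(alignment, max_gap_fraction=0.30):
    """Keep only the sequences whose gap fraction does not exceed the threshold."""
    return {
        name: seq
        for name, seq in alignment.items()
        if gap_fraction(seq) <= max_gap_fraction
    }


# Toy alignment of 10 columns, invented for illustration.
ALIGNMENT = {
    "seq1": "MKT-LLVAGA",  # 1 gap in 10 columns
    "seq2": "MK--LLVAGA",  # 2 gaps in 10 columns
    "seq3": "M----LV-GA",  # 5 gaps in 10 columns, above the 30% threshold
}

KEPT = drop_gappy(ALIGNMENT)
```

Here `KEPT` retains `seq1` and `seq2` and drops `seq3`. The 30% value comes from the slide; the transcript does not say whether it is applied before or after other filtering steps, so the placement of this filter in a real pipeline is left open.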
