Phylogenetics workshop: Protein sequence phylogeny week 2 - PowerPoint PPT Presentation

cormac
Presentation Transcript

  1. Phylogenetics workshop: Protein sequence phylogeny, week 2. Darren Soanes

  2. Species trees • Interpretation of trees • Taxon sampling • Tools • Lateral (horizontal) gene transfer • Fast evolving genes

  3. Using DNA sequence to construct trees. [Figure: tree with tip sequences TGCTATT, TGCTTTT and TGCTTTT. Key: TGCTTTT – sequence change due to mutation; TGCTATT – ancestral DNA sequence.]

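  4. Reversals can confuse phylogenies. A reversal is a later mutation that restores the ancestral base, so a lineage that has changed twice looks as if it never changed. [Figure: tree with sequence labels TGCTATT, TGCTTTT, TGCTTTT, TGCTTTT, TGCTATT and TGCTATT, with one branch marked "reversal". Key: TGCTTTT – sequence change; TGCTATT – ancestral DNA sequence.]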

  5. To minimise the effect of reversals • Use DNA sequences that are evolving slowly – mutations happen rarely. • Use long stretches of DNA. • Align sequences, use the parts of the alignment that show a high degree of conservation. • rDNA sequences (genes that encode ribosomal RNA) are often used.
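
The advice to use only the well-conserved parts of an alignment can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the workshop: the function names, the `min_identity` threshold and the toy sequences are assumptions made for the example, and real analyses normally use a dedicated alignment-trimming tool.

```python
from collections import Counter


def conserved_columns(alignment, min_identity=0.8, gap="-"):
    """Return indices of gap-free columns whose most common residue
    makes up at least min_identity of the column."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if gap in column:
            continue  # skip columns that contain a gap
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) >= min_identity:
            keep.append(i)
    return keep


def trim(alignment, columns):
    """Keep only the chosen columns of every aligned sequence."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Toy alignment (hypothetical): the last two columns are gapped or variable.
toy = ["TGCTATT-A", "TGCTTTTGA", "TGCTTTTGC", "TGCTTTTGG"]
cols = conserved_columns(toy, min_identity=0.75)
trimmed = trim(toy, cols)
print(cols)
print(trimmed)
```

The threshold is a judgment call: a stricter value keeps fewer, more reliable columns at the cost of a shorter alignment.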

  6. Species tree constructed using ribosomal DNA (rDNA) sequence

  7. Using protein sequences to create species trees • Advantages: • protein sequences evolve more slowly than DNA sequences (many DNA mutations are neutral – they do not change amino acid sequences) • reversals are less common than in DNA • Method: • single-copy protein-encoding genes are identified • the protein sequences are joined together (concatenated) to create a multiple protein sequence for each species • the sequences are aligned • Disadvantage – need sequenced genomes
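
The joining step can be sketched as below. This is a hedged illustration only: it assumes each protein has already been aligned on its own before joining (a common variant, whereas the slide lists joining before alignment), it pads a species that lacks a protein with gap characters (an assumption, not something the slide states), and the function name and toy data are hypothetical.

```python
def concatenate(alignments, gap="-"):
    """Join per-protein alignments into one long sequence per species.

    alignments: list of dicts mapping species name -> aligned sequence.
    A species missing from one alignment is padded with gaps (assumption).
    """
    species = sorted({name for aln in alignments for name in aln})
    parts = {name: [] for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must be equal length")
        width = lengths.pop()
        for name in species:
            parts[name].append(aln.get(name, gap * width))
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy data (hypothetical): two small protein alignments.
protein_1 = {"species_A": "MKV-L", "species_B": "MKVAL", "species_C": "MRVAL"}
protein_2 = {"species_A": "GDE", "species_B": "GDD"}
joined = concatenate([protein_1, protein_2])
for name, seq in joined.items():
    print(name, seq)
```

Every species ends up with a sequence of the same length, which is what tree-building programs expect from a concatenated alignment.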

  8. Fungal species trees – more proteins = better resolution. [Figure: two trees, one built from 30 proteins and one from 60 proteins. Labelled groups: oomycete (not fungi), microsporidia, plant, zygomycete, basidiomycetes, yeasts, ascomycetes, filamentous ascomycetes.]

  9. Fungal Species Tree (based on 153 concatenated protein sequences)

  10. Clades A clade consists of an ancestor organism and all its descendants.
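
The definition of a clade can be made concrete with a small check on a rooted tree. The sketch below assumes a tree written as nested Python tuples with species names as leaves; the representation, the function names and the toy topology are all illustrative assumptions rather than anything taken from the slides.

```python
def leaf_sets(tree):
    """Return (leaves under this node, list of every clade in the subtree)."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)  # this node plus all of its descendants
    return leaves, clades


def is_clade(tree, taxa):
    """True if taxa are exactly the descendants of one node in the tree."""
    _, clades = leaf_sets(tree)
    return frozenset(taxa) in clades


# Toy rooted tree (illustrative topology only).
toy_tree = ((("yeast", "filamentous ascomycete"), "basidiomycete"), "zygomycete")
print(is_clade(toy_tree, ["yeast", "filamentous ascomycete"]))
print(is_clade(toy_tree, ["yeast", "basidiomycete"]))
```

The second group fails the check because the smallest subtree containing both species also contains the filamentous ascomycete, so the group leaves out a descendant of its common ancestor.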

  11. Gene trees • The evolutionary history of genes can be represented as phylogenetic trees based on alignment of protein sequences. • Gene duplication and loss can be inferred from phylogenetic trees. • Protein sequences evolve more slowly than DNA sequences (due to redundancy in the genetic code).

  12. Gene duplication Gene duplication due to unequal crossing over during meiosis can create gene families. Sequence and function of different members of a gene family can diverge.

  13. Gene duplication

  14. Sequence homology (1) Genes are said to be homologous if they share a common evolutionary ancestor. Orthologues are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologues retain the same function in the course of evolution. (e.g. myoglobin in mammals).

  15. Sequence homology (2) Paralogous genes are related by duplication within a genome. Paralogues often evolve new functions, even if these are related to the original one. In-paralogues are paralogues that were duplicated after a speciation and are therefore in the same species. Out-paralogues are paralogues that were duplicated before a speciation; they are not necessarily in the same species.

  16. Orthology and paralogy

  17. Paralogues. [Figure: tree showing out-paralogues and in-paralogues.] A, B and C are different species; α and β are different paralogues of the same gene.

  18. Evolution of globin superfamily in human lineage

  19. TOR gene duplication events in fungi. TOR: protein kinase, subunit of a complex that regulates cell growth in response to nutrient availability and cellular stresses.

  20. Taxon sampling methods • BLAST: easiest, though subjective • Occurrence of Pfam (protein family) motif • Clustering, e.g. • INPARANOID http://inparanoid.sbc.su.se/cgi-bin/index.cgi • orthoMCL http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi

  21. Minimum bootstrap • A bootstrap value of 70% is thought to be broadly similar to a P-value of 0.05 • The minimum bootstrap used depends on the study • To improve bootstrap support: • remove poorly aligned sequences if possible (these can be due to mis-annotation of genomes) • change taxon sampling

  22. Collapse branches with bootstrap less than defined value
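
Collapsing poorly supported branches can be sketched as follows. This is a minimal illustration under stated assumptions: the tree is a nested dictionary with hypothetical `name`, `support` and `children` keys, support is a bootstrap percentage, and internal nodes with no support value (such as the root) are left alone. Tree viewers and phylogenetics libraries offer their own versions of this operation.

```python
def collapse(node, threshold):
    """Return a copy of the tree in which internal branches with support
    below threshold are removed, leaving a polytomy at the parent."""
    if not node.get("children"):
        return {"name": node["name"]}
    new_children = []
    for child in node["children"]:
        collapsed = collapse(child, threshold)
        support = child.get("support")
        is_internal = "children" in collapsed
        if is_internal and support is not None and support < threshold:
            # Attach the grandchildren directly to this node.
            new_children.extend(collapsed["children"])
        else:
            new_children.append(collapsed)
    result = {"children": new_children}
    if "support" in node:
        result["support"] = node["support"]
    return result


def to_newick(node):
    """Write the topology in Newick-style brackets, without support values."""
    if "children" not in node:
        return node["name"]
    return "(" + ",".join(to_newick(c) for c in node["children"]) + ")"


# Toy tree (hypothetical bootstrap values): ((A,B)95,(C,(D,E)80)40)
toy = {"children": [
    {"support": 95, "children": [{"name": "A"}, {"name": "B"}]},
    {"support": 40, "children": [
        {"name": "C"},
        {"support": 80, "children": [{"name": "D"}, {"name": "E"}]},
    ]},
]}
collapsed_tree = collapse(toy, threshold=70)
print(to_newick(toy) + ";")             # (((A,B),(C,(D,E)));  before collapsing
print(to_newick(collapsed_tree) + ";")  # after collapsing at 70%
```

With a threshold of 70, the branch supported at 40% disappears, so C and the (D,E) group attach directly to the root alongside (A,B).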

  23. Lateral gene transfer (purine-cytosine permease). [Figure: gene tree with oomycete and fungi sequences labelled.]

  24. Eukaryotic Tree of Life. [Figure: tree with Phytophthora sojae and Aspergillus oryzae marked.]

  25. Genes that evolve quickly (1) • Synonymous substitution – change in DNA sequence that does not affect the amino acid sequence, often in the third position of a codon, e.g. CCG (Pro)→CCA (Pro). • Non-synonymous substitution - change in DNA sequence that does affect the amino acid sequence, often in the first or second position of a codon, e.g. CCG (Pro)→CAG (Gln).
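
The two kinds of substitution can be told apart by translating the codon before and after the change. The sketch below builds the standard genetic code from its usual compact form and classifies the two examples from the slide; the function name `classify` is an assumption made for illustration.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(codon_before, codon_after):
    """Label a codon change as identical, synonymous or non-synonymous."""
    before, after = codon_before.upper(), codon_after.upper()
    if before == after:
        return "identical"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "non-synonymous"


print(classify("CCG", "CCA"))  # Pro -> Pro: synonymous
print(classify("CCG", "CAG"))  # Pro -> Gln: non-synonymous
```

Because all four CCN codons encode proline, any change at the third position of CCG is synonymous, while a change at the second position alters the amino acid.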

  26. Genes that evolve quickly (2) • For a given protein-encoding gene (comparison between orthologues in more than one species) • dN = number of non-synonymous substitutions per non-synonymous site • dS = number of synonymous substitutions per synonymous site • We can calculate the ratio dN/dS. • For most genes this is < 1 • Genes under evolutionary pressure to change protein sequence (diversify) have dN/dS > 1
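
Once dN and dS have been estimated, reading the ratio is simple arithmetic. The sketch below only illustrates the thresholds given on the slide; it does not estimate dN or dS from sequences, and the function names and the example values are hypothetical, not measured data.

```python
def dn_ds_ratio(dn, ds):
    """Return dN/dS; dS must be positive for the ratio to be defined."""
    if ds <= 0:
        raise ValueError("dS must be greater than zero")
    return dn / ds


def describe(ratio):
    """Describe a dN/dS ratio using the thresholds from the slide."""
    if ratio < 1:
        return "below 1: protein sequence is constrained"
    if ratio > 1:
        return "above 1: pressure to diversify"
    return "equal to 1"


# Hypothetical values for illustration only.
examples = {"gene_x": (0.01, 0.20), "gene_y": (0.30, 0.15)}
for gene, (dn, ds) in examples.items():
    ratio = dn_ds_ratio(dn, ds)
    print(gene, round(ratio, 2), describe(ratio))
```

A gene with no synonymous changes gives an undefined ratio, which is why the function refuses a dS of zero rather than returning infinity.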

  27. Genes that evolve quickly (3) • CodeML (part of the PAML package) will calculate dN/dS for a set of orthologues from different (closely related) species. • Human vs Chimpanzee – rapidly evolving genes involved in immunity, reproduction and olfaction (smell). • Genes with very low dN/dS (under purifying selection) involved in metabolism, intracellular signalling, nerve / brain function.