
Detecting horizontal gene transfers using discrepancies in species and gene classifications


Presentation Transcript


  1. Detecting horizontal gene transfers using discrepancies in species and gene classifications. Alix Boc and Vladimir Makarenkov, Université du Québec à Montréal

  2. Presentation summary • Some words about phylogeny • Network models in phylogenetic analysis • What is a horizontal gene transfer (HGT)? • Description of the new method • Examples of application • Future work • T-Rex software

  3. Reconstruction of a phylogenetic tree: DNA sequences → distance matrix → phylogenetic tree. Example sequences: A: CGTAAT, B: CGTACG, C: CGTCGA, D: ACT…, E: …, F: …
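The first arrow on this slide (sequences to distance matrix) can be illustrated with a minimal sketch. It assumes the simplest possible measure, the proportion of differing sites (p-distance); the slides do not say which distance the authors actually used, and the function names are hypothetical. Only the three complete sequences from the slide are used.

```python
from itertools import combinations


def p_distance(s, t):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def distance_matrix(seqs):
    """Build a symmetric matrix (dict of dicts) of pairwise p-distances."""
    names = list(seqs)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = p_distance(seqs[a], seqs[b])
    return dist


# The three fully spelled-out sequences from the slide.
seqs = {"A": "CGTAAT", "B": "CGTACG", "C": "CGTCGA"}
matrix = distance_matrix(seqs)
for name in seqs:
    print(name, [round(matrix[name][other], 3) for other in seqs])
```

A and B differ at two of six sites, while C differs from both A and B at three sites, so A and B come out as the closest pair.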

  4. Inferring phylogenetic trees
     Four main approaches:
     • Distance-based methods
       • UPGMA by Michener and Sokal (1957)
       • ADDTREE by Sattath and Tversky (1977)
       • Neighbor-joining (NJ) by Saitou and Nei (1988)
       • UNJ and BioNJ methods by Gascuel (1997)
       • Fitch by Felsenstein (1997)
       • Weighted least-squares MW by Makarenkov and Leclerc (1999)
     • Maximum Parsimony (Camin and Sokal 1965; Farris 1970; Fitch 1971)
     • Maximum Likelihood (Felsenstein 1981)
     • Bayesian approach (Rannala and Yang 1996; Huelsenbeck and Ronquist 2001)

  5. Phylogenetic mechanisms requiring a network representation • Horizontal gene transfer (i.e. lateral gene transfer) • Hybridization • Homoplasy and gene convergence • Gene duplication and gene loss

  6. Software for building phylogenetic networks • SplitsTree, Huson (1998) • T-Rex, Makarenkov (2001) • NeighborNet, Bryant and Moulton (2002)

  7. Methods for detecting horizontal gene transfers • Hein (1990) and Hein et al. (1995, 1996) • von Haeseler and Churchill (1993) • Page (1994); Page and Charleston (1998) • Charleston (1998) • Hallett and Lagergren (2001) • Mirkin, Fenner, Galperin and Koonin (2003) • V’yugin, Gelfand and Lyubetsky (2003) • Boc and Makarenkov (2003); Makarenkov, Boc and Diallo (2004)

  8. Three types of horizontal gene transfer

  9. The new model. Basic ideas: 1) Reconcile the species and gene phylogenetic trees using either a topological criterion (the Robinson and Foulds topological distance) or a metric criterion (least squares) 2) Incorporate the necessary biological rules into the mathematical model 3) Keep the time complexity of the algorithm polynomial

  10. Partial gene transfer versus complete transfer (a) (b)

  11. Biological rules

  12. Partial gene transfer. Incorporating biological rules. Situations when a new HGT branch (a,b) can affect the evolutionary distance between species i and j, and cannot affect the distance between i1 and j.

  13. Partial gene transfer. Incorporating biological rules (2). Three cases when the evolutionary distance between the species i and j is not affected by addition of a new HGT branch (a,b)

  14. Partial gene transfer. Incorporating biological rules (3). No HGTs can be considered when affected branches are located on the same lineage

  15. Partial gene transfer. Incorporating biological rules (4). No HGT can be considered when two HGTs affecting a pair of lineages intersect as shown

  16. Partial gene transfer. Incorporating biological rules (5). • Cases (a) and (b): path between the leaves i and j is allowed to go through both HGT branches (a,b) and (a1,b1). • Cases (c) and (d) : path between the leaves i and j is not allowed to go through both HGT branches (a,b) and (a1,b1).

  17. Sub-tree constraint • To resolve the topological conflicts between T and T1 that are due to transfers between single species or their close ancestors. • To identify the transfers that have occurred deeper in the phylogeny. Timing constraint: a transfer between the branches (z,w) and (x,y) of the species tree T is allowed if and only if the cluster grouping both affected sub-trees is present in the gene tree T1. Here and below, a single branch is depicted by a plain line and a path by a wavy line.

  18. Optimization

  19. Optimization problem: least squares. The least-squares loss function to be minimized with an unknown length l of the HGT branch (a,b):

$$Q(ab,l) = \sum_{dist(i,j) > l} \bigl(\min\{d(i,a)+d(j,b);\ d(j,a)+d(i,b)\} + l - \delta(i,j)\bigr)^2 + \sum_{dist(i,j) \le l} \bigl(d(i,j) - \delta(i,j)\bigr)^2 \rightarrow \min$$

      where d(i,j) is the minimum path-length distance between the leaves (i.e. taxa) i and j in the tree T, δ(i,j) is the given dissimilarity value between i and j, and dist(i,j) = d(i,j) − min{ d(i,a) + d(j,b); d(j,a) + d(i,b) }.
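The following is a minimal sketch of how such a loss could be evaluated for one candidate HGT branch. It assumes the usual reading of this criterion: a pair (i,j) is routed through the new branch (a,b) only when dist(i,j) exceeds l, that is, when the detour through (a,b) is shorter than the tree path; otherwise the tree distance d(i,j) is kept. The function name and the dictionary layout (`to_a[i]` for d(i,a), `to_b[i]` for d(i,b)) are hypothetical, and the two-leaf data are made up for illustration.

```python
from itertools import combinations


def hgt_least_squares(leaves, d, delta, to_a, to_b, length):
    """Sum of squared errors after adding an HGT branch (a,b) of given length.

    d[i][j]     tree path-length distance between leaves i and j
    delta[i][j] given dissimilarity between i and j
    to_a[i]     tree distance from leaf i to the point a (assumed layout)
    to_b[i]     tree distance from leaf i to the point b (assumed layout)
    """
    total = 0.0
    for i, j in combinations(leaves, 2):
        detour = min(to_a[i] + to_b[j], to_a[j] + to_b[i])
        saving = d[i][j] - detour  # dist(i,j) in the slide's notation
        if saving > length:
            fitted = detour + length  # path through the HGT branch is shorter
        else:
            fitted = d[i][j]  # the HGT branch does not affect this pair
        total += (fitted - delta[i][j]) ** 2
    return total


# Toy data (hypothetical): two leaves far apart in the tree but similar
# according to the gene dissimilarity.
leaves = ["i", "j"]
d = {"i": {"j": 10.0}, "j": {"i": 10.0}}
delta = {"i": {"j": 4.0}, "j": {"i": 4.0}}
to_a = {"i": 1.0, "j": 9.0}
to_b = {"i": 9.0, "j": 1.0}

print(hgt_least_squares(leaves, d, delta, to_a, to_b, 2.0))  # short branch is used
print(hgt_least_squares(leaves, d, delta, to_a, to_b, 9.0))  # branch too long to matter
```

With l = 2 the detour 1 + 2 + 1 matches the dissimilarity 4 exactly, so the loss is 0; with l = 9 the branch is never used and the loss stays at (10 − 4)² = 36.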

  20. Optimization problem : Robinson and Foulds topological distance The topological distance of Robinson and Foulds (1981) between two phylogenetic trees is equal to the minimum number of elementary operations consisting of merging or splitting vertices necessary to transform one tree into another.

  21. Robinson and Foulds topological distance Robinson and Foulds distance between T and T1 is 2. The HGT minimizing the Robinson and Foulds topological distance between the species and gene phylogenetic trees can be considered as the best candidate to reconcile the species and gene phylogenies.
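As a hedged illustration of the Robinson and Foulds distance, the sketch below counts the non-trivial bipartitions (splits) that occur in exactly one of two trees, which for unrooted trees is the usual way to compute the distance defined above. Trees are written as nested tuples of leaf names; this representation, the function names, and the two five-leaf trees are assumptions of the sketch, not the trees drawn on the slide.

```python
def leaf_set(tree):
    """All leaf names below a node; a leaf is a string, an internal node a tuple."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the leaf set induced by internal branches."""
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # each split is stored as the side without this leaf
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree1, tree2):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree1) ^ splits(tree2))


# Hypothetical example: the two trees share the split {D,E} but disagree
# on whether A groups with B or with C.
t = (("A", "B"), "C", ("D", "E"))
t1 = (("A", "C"), "B", ("D", "E"))
print(robinson_foulds(t, t1))
```

Each of the two trees has one split the other lacks, so the distance is 2; identical topologies give 0.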

  22. Algorithm

  23. Input file for our program. Set X of taxa = {A,B,C,D,E,F}

      Distance matrix for the species tree:
      6
      A 0 2 3 5 5 4
      B 2 0 3 5 5 4
      C 3 3 0 4 4 3
      D 5 5 4 0 2 3
      E 5 5 4 2 0 3
      F 4 4 3 3 3 0

      Distance matrix for the gene tree:
      6
      A 0 4 4 2 4 4
      B 4 0 4 4 2 4
      C 4 4 0 4 4 2
      D 2 4 4 0 4 4
      E 4 2 4 4 0 4
      F 4 4 2 4 4 0
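A reader for this kind of input might look like the sketch below. It assumes the layout suggested by the slide: a taxon count, then one labelled row per taxon, with the species matrix first and the gene matrix second. The exact file format accepted by the program is not documented here, so the parser and its name are illustrative only. It works on whitespace-separated tokens, so line breaks do not matter.

```python
def parse_distance_matrices(text):
    """Parse consecutive blocks of the form: n, then n rows of 'name v1 ... vn'."""
    tokens = text.split()
    pos = 0
    matrices = []
    while pos < len(tokens):
        n = int(tokens[pos])
        pos += 1
        names, rows = [], []
        for _ in range(n):
            names.append(tokens[pos])
            rows.append([float(x) for x in tokens[pos + 1:pos + 1 + n]])
            pos += 1 + n
        matrices.append((names, rows))
    return matrices


# The two matrices shown on the slide (species tree first, gene tree second).
INPUT = """
6
A 0 2 3 5 5 4
B 2 0 3 5 5 4
C 3 3 0 4 4 3
D 5 5 4 0 2 3
E 5 5 4 2 0 3
F 4 4 3 3 3 0
6
A 0 4 4 2 4 4
B 4 0 4 4 2 4
C 4 4 0 4 4 2
D 2 4 4 0 4 4
E 4 2 4 4 0 4
F 4 4 2 4 4 0
"""

(species_names, species), (gene_names, gene) = parse_distance_matrices(INPUT)
print(species_names, species[0])
print(gene_names, gene[0])
```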

  24. Program options • Optimization criterion: least squares or Robinson and Foulds distance. • Type of scenario: unique or multiple. • Maximum number of HGTs. • Position of the root.

  25. Algorithm: unique scenario
      Begin
        Reconstruction of the species tree T
        Reestimate the length of each branch in T
        While Optimization criterion > 0 loop
          Test all possible HGTs
          Add the best HGT
          Reestimate the length of each branch in T
          Compute the value of the optimization criterion
        EndLoop
      End
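The control flow of the unique scenario is a greedy loop, which can be sketched generically. Everything specific is passed in as a callable: `candidates`, `apply_hgt` (assumed to add the transfer and reestimate branch lengths), and `criterion` (LS or RF) are hypothetical placeholders, not functions of the T-Rex program. The `max_hgts` cap corresponds to the "maximum number of HGTs" program option.

```python
def unique_scenario(tree, candidates, apply_hgt, criterion, max_hgts):
    """Greedily add the HGT that most reduces the criterion until it reaches 0."""
    accepted = []
    score = criterion(tree)
    while score > 0 and len(accepted) < max_hgts:
        options = candidates(tree)  # "Test all possible HGTs"
        if not options:
            break
        best = min(options, key=lambda hgt: criterion(apply_hgt(tree, hgt)))
        tree = apply_hgt(tree, best)  # "Add the best HGT" and reestimate lengths
        accepted.append(best)
        score = criterion(tree)  # recompute the optimization criterion
    return tree, accepted, score


# Toy stand-in (hypothetical): the "tree" is a tuple of residual conflicts,
# an "HGT" removes one conflict, and the criterion is the total residual.
toy_tree = (3, 0, 5)
toy_candidates = lambda t: [k for k, value in enumerate(t) if value > 0]
toy_apply = lambda t, k: t[:k] + (0,) + t[k + 1:]
toy_criterion = sum

final_tree, chosen, final_score = unique_scenario(
    toy_tree, toy_candidates, toy_apply, toy_criterion, max_hgts=10
)
print(chosen, final_score)
```

In the toy run the loop first removes the largest conflict (index 2), then the remaining one (index 0), and stops once the criterion is 0.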

  26. Algorithm: multiple scenario
      Begin
        Reconstruction of the species tree T
        Reestimate the length of each branch in T
        Test all connections between pairs of branches
        Establish a list of HGTs ordered according to the optimization criterion
      End

  27. Algorithm: Step 1 • Reconstruction of the species tree T with Neighbor-Joining • Set X of n taxa • Binary tree: all internal nodes are of degree 3, giving 2n-3 branches • T is explicitly rooted
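Step 1 can be sketched with a compact neighbor-joining routine. This assumes the standard Saitou and Nei (1988) formulation; the node numbering (internal nodes numbered from n+1) and the tie-breaking rule (first minimal pair wins) are choices of this sketch and are not claimed to match the program. The input is the species distance matrix from slide 23.

```python
def neighbor_joining(names, dist):
    """Return the branches (parent, child, length) of the unrooted NJ tree."""
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    branches = []
    next_id = len(names) + 1
    while len(active) > 3:
        n = len(active)
        r = {a: sum(d[a][b] for b in active) for a in active}
        pairs = [(a, b) for k, a in enumerate(active) for b in active[k + 1:]]
        # Join the pair minimising the NJ criterion (n-2)*d(a,b) - r(a) - r(b).
        a, b = min(pairs, key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = str(next_id)
        next_id += 1
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        branches += [(u, a, len_a), (u, b, d[a][b] - len_a)]
        d[u] = {u: 0.0}
        for c in active:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        active = [c for c in active if c not in (a, b)] + [u]
    # The last three nodes meet at one final internal node.
    a, b, c = active
    u = str(next_id)
    branches.append((u, a, 0.5 * (d[a][b] + d[a][c] - d[b][c])))
    branches.append((u, b, 0.5 * (d[a][b] + d[b][c] - d[a][c])))
    branches.append((u, c, 0.5 * (d[a][c] + d[b][c] - d[a][b])))
    return branches


taxa = ["A", "B", "C", "D", "E", "F"]
species_matrix = [
    [0, 2, 3, 5, 5, 4],
    [2, 0, 3, 5, 5, 4],
    [3, 3, 0, 4, 4, 3],
    [5, 5, 4, 0, 2, 3],
    [5, 5, 4, 2, 0, 3],
    [4, 4, 3, 3, 3, 0],
]
nj_branches = neighbor_joining(taxa, species_matrix)
for parent, child, length in nj_branches:
    print(f"{parent}---{child} {length:.6f}")
```

For n = 6 the routine returns 2n-3 = 9 branches. Because this matrix is exactly tree-like, every branch gets length 1, with A and B joined at node 7, C attached at node 8, D and E joined at node 9, and F attached at node 10.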

  28. Algorithm: Step 2 • Comparing the gene tree T1 and the species tree T. Criterion 1: RF, the Robinson and Foulds distance between T and T1. If RF == 0, there are no HGTs; otherwise go to Step 3 (next slide). Criterion 2: reestimate the length of each branch of the species tree T according to the distances in T1, then compute LS, the least-squares coefficient between the distances in T and T1. If LS == 0, there are no HGTs; otherwise go to Step 3 (next slide).

  29. Algorithm: Step 3 • Multiple scenario • Test all connections between pairs of branches. • Reestimate the length of each branch in T according to the gene distance matrix. • Establish a list of HGTs ordered according to the least-squares coefficient or the Robinson-Foulds distance.

  30. Algorithm: Step 3 • Unique scenario • The best HGT found is added to the species tree. • The length of each branch is reestimated according to the gene tree. • The RF distance or the LS coefficient is computed. (Figure: species tree → species tree + HGT1 → species tree + HGT2 → species tree + HGT3, which equals the gene tree; the upcoming HGT is marked at each stage.)

  31. Output

      Scenario type: Unique
      List of the branches, with their lengths, of the species tree built with NJ
      1   7---B    1.800000
      2   8---C    1.800000
      3   9---D    1.800000
      4   10---9   0.000020
      5   9---E    1.800000
      6   10---F   1.800000
      7   7---A    1.800000
      8   7---8    0.000020
      9   10---8   0.000020
      The least-squares criterion LS for the species tree whose branches are evaluated according to the gene tree is: 9.600160
      The root is located on the branch 8--10
      ===================== HGT #1 ======================
      From the branch 7--B to the branch 10--9
      LS = 5.333387   RF = 4
      ===================== HGT #2 ======================
      From the branch A--7 to the branch 9--D
      LS = 0.000000   RF = 0
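The initial LS value in this output can be checked by hand. The sketch below takes the nine branch lengths listed in the output, computes the path length between every pair of taxa, and sums the squared differences to the gene distance matrix from slide 23. It assumes that LS is the plain sum of squared differences over unordered pairs of taxa; the helper names are hypothetical.

```python
from itertools import combinations

# Branches of the species tree as listed in the program output.
branches = [
    ("7", "B", 1.8), ("8", "C", 1.8), ("9", "D", 1.8), ("10", "9", 0.00002),
    ("9", "E", 1.8), ("10", "F", 1.8), ("7", "A", 1.8), ("7", "8", 0.00002),
    ("10", "8", 0.00002),
]

taxa = ["A", "B", "C", "D", "E", "F"]
gene_rows = [
    [0, 4, 4, 2, 4, 4],
    [4, 0, 4, 4, 2, 4],
    [4, 4, 0, 4, 4, 2],
    [2, 4, 4, 0, 4, 4],
    [4, 2, 4, 4, 0, 4],
    [4, 4, 2, 4, 4, 0],
]
gene = {a: dict(zip(taxa, row)) for a, row in zip(taxa, gene_rows)}


def path_lengths(branch_list):
    """Path length between every pair of nodes of a tree given as weighted edges."""
    adjacency = {}
    for u, v, w in branch_list:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))

    def from_source(source):
        dist = {source: 0.0}
        stack = [source]
        while stack:  # a tree has a unique path, so a plain traversal suffices
            node = stack.pop()
            for neighbour, w in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + w
                    stack.append(neighbour)
        return dist

    return {node: from_source(node) for node in adjacency}


tree_dist = path_lengths(branches)
ls = sum((tree_dist[a][b] - gene[a][b]) ** 2 for a, b in combinations(taxa, 2))
print(f"LS = {ls:.6f}")
```

Twelve pairs have a gene distance of 4 against a tree distance of about 3.6, and three pairs (A-D, B-E, C-F) have a gene distance of 2, which gives roughly 12 × 0.16 + 3 × 2.56 = 9.6, in agreement with the reported 9.600160.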

  32. Examples

  33. Application example 1 Horizontal transfer of the Rubisco Large subunit gene Delwiche, C.F., and J. D. Palmer. 1996. Rampant Horizontal Transfer and Duplication of Rubisco Genes in Eubacteria and Plastids. Mol. Biol. Evol. 13:873-882.

  34. rbcL Gene Phylogeny

  35. Delwiche and Palmer (1996): hypotheses of HGTs 1. Cyanobacteria → γ-Proteobacteria 2. α-Proteobacteria → Red and brown algae 3. γ-Proteobacteria → α-Proteobacteria 4. γ-Proteobacteria → β-Proteobacteria

  36. HGTs of the rbcL gene (figure: transfers numbered 1 to 8 drawn on the tree)

  37. HGTs of the rbcL gene: comparison
      Hypotheses by Delwiche and Palmer (1996):
      1. Cyanobacteria → γ-Proteobacteria
      2. α-Proteobacteria → Red and brown algae
      3. γ-Proteobacteria → α-Proteobacteria
      4. γ-Proteobacteria → β-Proteobacteria
      Solution:
      1. α-Proteobacteria → β-Proteobacteria
      2. α-Proteobacteria → Red and brown algae
      3. β-Proteobacteria → γ-Proteobacteria
      4. β-Proteobacteria → α-Proteobacteria
      5. γ-Proteobacteria → Cyanobacteria
      6. β-Proteobacteria → γ-Proteobacteria
      7. γ-Proteobacteria → β-Proteobacteria
      8. Cyanobacteria → γ-Proteobacteria

  38. Application example 2 Horizontal transfers of the protein rpl12e Data taken from: Matte-Tailliez O., Brochier C., Forterre P. & Philippe H. Archaeal phylogeny based on ribosomal proteins. (2002). Mol. Biol. Evol. 19, 631-639.

  39. Rpl12e HGTs. Assumed HGTs of the rpl12e gene involved the clusters of Crenarchaeota and Thermoplasmatales (Matte-Tailliez, 2004). (Figure panels: species tree; rpl12e gene tree.)

  40. Reconciliation scenario (figure: HGTs numbered 1 to 5, with the percentages 74%, 60%, 69%, 60% and 55% shown on the tree)

  41. Application example 3 • Horizontal transfers of the PheRS synthetase • Data taken from: Woese, C. R., G. Olsen, M. Ibba, and D. Söll. 2000. Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64:202-236.

  42. PheRS synthetase

  43. Reconciliation scenario (figure: HGTs numbered 1 to 5, with the percentages 60%, 85%, 65%, 88% and 62% shown on the tree)

  44. Software

  45. T-REX: Tree and Reticulogram Reconstruction [1]. Downloadable from http://www.info.uqam.ca/~makarenv/trex.html. Author: Vladimir Makarenkov, with contributions from A. Boc, P. Casgrain, A. B. Diallo, O. Gascuel, A. Guénoche, P.-A. Landry, F.-J. Lapointe, B. Leclerc, and P. Legendre. Versions: Windows 9x/NT/2000/XP and Macintosh. [1] Makarenkov, V. 2001. T-REX: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics 17: 664-668.

  46. T-Rex : Multiple scenario screenshot Bioinformatics software

  47. T-Rex Web infrastructure

  48. Future developments • Maximum Likelihood model • Maximum Parsimony model • Decreasing the running time

  49. Bibliography
      • Boc, A. and Makarenkov, V. (2003). New Efficient Algorithm for Detection of Horizontal Gene Transfer Events. In Algorithms in Bioinformatics, G. Benson and R. Page (Eds.), 3rd Workshop on Algorithms in Bioinformatics, Springer-Verlag, pp. 190-201.
      • Delwiche, C. F. and Palmer, J. D. (1996). Rampant Horizontal Transfer and Duplication of Rubisco Genes in Eubacteria and Plastids. Mol. Biol. Evol. 13, 873-882.
      • Makarenkov, V. (2001). T-Rex: reconstructing and visualizing phylogenetic trees and reticulation networks. Bioinformatics 17, 664-668.
      • Makarenkov, V., Boc, A., Delwiche, C. F. and Philippe, H. (2005). A novel approach for detecting horizontal gene transfers: Modeling partial and complete gene transfer scenarios. Submitted to Mol. Biol. Evol.
      • Makarenkov, V., Boc, A. and Diallo, A. B. (2004). Representing Lateral gene transfer in species classification. Unique scenario. IFCS’2004 proceedings, Chicago.
      • Matte-Tailliez, O., Brochier, C., Forterre, P. and Philippe, H. (2002). Archaeal phylogeny based on ribosomal proteins. Mol. Biol. Evol. 19, 631-639.
      • Robinson, D. F. and Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical Biosciences 53, 131-147.
      • Woese, C. R., Olsen, G., Ibba, M. and Söll, D. (2000). Aminoacyl-tRNA synthetases, the genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64, 202-236.
