1 / 40

Distance-Based Genome Rearrangement Phylogeny

Distance-Based Genome Rearrangement Phylogeny. Li-San Wang University of Texas at Austin and University of Pennsylvania. Genomes As Signed Permutations. 1. 6. 1 –5 3 4 -2 -6 or 5 –1 6 2 -4 -3 etc. 2. 5. 4. 3. Genomes Evolve by Rearrangements.

jase
Download Presentation

Distance-Based Genome Rearrangement Phylogeny

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distance-Based Genome Rearrangement Phylogeny Li-San Wang University of Texas at Austin and University of Pennsylvania

  2. Genomes As Signed Permutations 1 6 1 –5 3 4 -2 -6 or 5 –1 6 2 -4 -3 etc. 2 5 4 3

  3. Genomes Evolve by Rearrangements 1 2 3 4 5 6 7 8 9 10 Inversion: 1 2 –6 –5 -4 -3 7 8 9 10 Transposition: 1 2 7 8 3 4 5 6 9 10 Inverted Transposition: 1 2 7 8 –6 -5 -4 -3 9 10

  4. S3 S1 S2 FN S5 S4 S1 S2 S3 S4 S5 S4 S1 S3 FP S2 S5 Phylogeny Reconstruction FN: false negative (missing edge) FP: false positive (incorrect edge) 50% error rate

  5. Results 5% Error

  6. Outline • Genome Rearrangement Evolution • True evolutionary distance estimators • Distance-based phylogeny reconstruction • Simulation study: accuracy of tree reconstruction

  7. Our Model: the Generalized Nadeau-Taylor Model [STOC’01] • Three types of events: • Inversions (INV) • Transpositions (TRP) • Inverted Transpositions (ITP) • Events of the same type are equiprobable • Probabilities of the three types have fixed ratio • We focus on signed circular genomes in this talk.

  8. Additive Distance Matrix and True Evolutionary Distance (T.E.D.) S1 S2 S3 S4 S5 S3 S1 0 9 15 14 17 S1 S2 0 14 13 16 S4 7 5 5 S3 0 13 16 3 1 S4 0 13 4 S5 0 8 S2 S5 Theorem [Waterman et al. 1977] Given an m×m additive distance matrix, we can reconstruct a tree realizing the distance in O(m2) time.

  9. Error Tolerance of Neighbor Joining Theorem[Atteson 1999]Let {Dij} be the true evolutionary distances, and {dij} be the estimated distances for T. Let be the length of the shortest edge in T. If for all taxa i,j, we havethen neighbor joining returns T.

  10. Edit Distances Between Genomes • (INV) Inversion distance [Hannenhalli & Pevzner 1995] • Computable in linear time [Moret et al 2001] • (BP) Breakpoint distance [Watterson et al. 1982] • Computable in linear time • NJ(BP): [Blanchette, Kunisawa, Sankoff, 1999] A = 1 2 3 4 5 6 7 8 9 10 B = 1 2 3 -8 -7 -6 4 5 9 10 BP(A,B)=3

  11. NJ(BP) and NJ(INV) Inversion only Transpositions/inverted transpositions only 120 genes, 160 leaves Uniformly Random Trees

  12. BP and INV BP/2 vs K (120 genes) INV vs K (K: Actual number of inversions) (Inversion-only evolution)

  13. Objectives • Better techniques for estimating true evolutionary distances • Mathematical guarantees whenever possible • Good accuracy in simulation • Robust to model violations • Main goal: improve topological accuracy of tree reconstructions using neighbor joining and weighbor

  14. New Estimators • Exact-IEBP • Approx-IEBP • EDEIEBP: Inverting Expected BreakPoint distanceEDE: Empirically Derived Estimator (inverts the expected inversion distance)

  15. Distance-Based Methods BP NJ INV Weighbor IEBP(Exact-,Approx-) EDE

  16. K BP/2 (2) (1) BP/2 K Estimate True Evolutionary DistancesUsing BP • To use the scatter plot to • estimate the actual number • of events (K): • Compute BP/2 • From the curve, look up the corresponding valueof K BP/2 vs K (120 genes) (K: Actual number of inversions) (Inversion-only evolution)

  17. Using Breakpoints to Estimate True Evolutionary Distances • Compute fn(k)= E[BP(G0,Gk)](i.e. the expected number of breakpoints after k random events; n is the number of genes) • Given two genomes G and G’: • Compute breakpoint distance d=BP(G,G’) • Find k so that fn (k) is closest to d • Challenge: finding fn (k)

  18. True Evolutionary Distance (t.e.d.) Estimators for Gene Order Data IEBP: Inverting the Expected BreakPoint distance EDE: Empirically Derived Estimator

  19. Approx-IEBP [Wang & Warnow, STOC’01]

  20. True Evolutionary Distance Estimators BP vs K (120 genes) IEBP vs K (K: Actual number of inversions) (Inversion-only evolution)

  21. BP INV IEBP True Evolutionary Distance Estimators EDE

  22. 2. Regression Formula for E(INV) • Let n be the number of genes, and k be the number of inversions • We use nonlinear regression to obtain easily computable formulas for E(INV) and Var(INV): -> b=0.5956, c=0.4577 1. 3. 4. exists for all

  23. Absolute Difference Plot • Motivation: Recall the criterion of Atteson’s Theorem. For all i,j:

  24. Error Tolerance of Neighbor Joining Theorem[Atteson 1999]Let {Dij} be the true evolutionary distances, and {dij} be the estimated distances for T. Let be the length of the shortest edge in T. If for all taxa i,j, we havethen neighbor joining returns T.

  25. G G’ Absolute Difference Plot • Motivation: Recall the criterion of Atteson’s Theorem. For all i,j: • Absolute difference plot: • Compute d=d(G,G’) • Plot |d-k| (y-axis) vs k (x-axis) NT Model, k rearrangement events

  26. 120 Genes, Inversion-only Model

  27. 120 Genes, Inv:Transp=1:1

  28. 120 Genes, Transp Only

  29. Using True Evolutionary Distance Helps 120 genes160 taxa Uniformly random treesTranspositions/invertedtranspositions only(180 runs per figure) 5%

  30. There are new distance-based phylogeny reconstruction methods (though designed for DNA sequences) Weighbor[Bruno et al. 2000]uses the variance of good t.e.d.s, and yield more accurate trees than NJ. Variance estimates for the t.e.d.s[Wang WABI’02] Weighbor(IEBP), Weighbor(EDE) Variance of True Evolutionary Distance Estimators K vs Exact-IEBP (120 genes)

  31. Using True Evolutionary Distance Helps 120 genes160 taxa Uniformly random treesTranspositions/invertedtranspositions only(180 runs per figure) 5%

  32. Robustness • EDE • Assumes inversion-only • When used with NJ or Weighbor, gives very accurate results even under transposition-only evolution • IEBP?

  33. IEBP is Robust to Model Violations NJ(Exact-IEBP) 120 genes, 160 taxa Uniformly Random Trees(alpha,beta)=(0,0) (inversion only)

  34. Comb UniformlyRandom Aldous’suggestion Yule CompletelyBalanced Beta Splitting Model [Aldous 1995] ¥ b -2 -1.5 -1 0 20

  35. Effect of Beta on Accuracy(120 Genes, 160 Taxa)(Beta= -1.5: Uniform, -1: Aldous, 0: Yule)

  36. Number of Genes

  37. Number of Taxa

  38. Observations • Number of taxa has little effect on the accuracy of trees (at least under our way of model tree generation) • Datasets with more genes give more accurate trees • Topology has effect on the accuracy of trees for NJ(BP) and NJ(INV); using corrected distances reduces this effect.

  39. Summary • True evolutionary distance estimators for gene order data • When used with Neighbor Joining and Weighbor, produce highly accurate trees • How model trees are generated has effect on the accuracy of tree reconstruction methods

  40. Acknowledgements • University of Texas Tandy Warnow Robert K. Jansen Randy Linder Stacia Wyman • University of New Mexico Bernard M.E. Moret David Bader Jijun Tang Mi Yan • Central Washington University Linda Raubeson • University of Canterbury Mike Steel • NSF • Packard Foundation

More Related