
Intro. To Phylogenetic Analysis

Slides modified by David Ardell from Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson.


Presentation Transcript


  1. Intro. to Phylogenetic Analysis. Slides modified by David Ardell from Caro-Beth Stewart, Paul Higgs, Joe Felsenstein and Mikael Thollesson.

  2. What is phylogenetic analysis and why should we perform it? Phylogenetic analysis has two major components: (1) phylogeny inference or “tree building”, which infers the evolutionary relationships between genes or species; (2) character and rate analysis, which maps information onto trees. C-B Stewart, NHGRI lecture, 12/5/00

  3. Common Phylogenetic Tree Terminology. [Figure: rooted tree on taxa A to E with the following parts labeled.]
       • Terminal nodes (A, B, C, D, E): represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
       • Branches or lineages
       • Internal nodes: represent hypothetical ancestors of the taxa
       • Ancestral node or ROOT of the tree
       • CLADE
     C-B Stewart, NHGRI lecture, 12/5/00

  4. X and Y are defined to be more closely related to each other than to Z if, and only if, they share a more recent common ancestor with each other than either does with Z. [Figure: tree diagrams on taxa A, B, C and D.]

  5. [Figure: several drawings of rooted tree 1a on taxa A, B, C and D.] All of these rearrangements show the same evolutionary relationships between the taxa. C-B Stewart, NHGRI lecture, 12/5/00



  9. Three types of trees. All show the same branching orders between taxa (Taxon A, Taxon B, Taxon C, Taxon D).
       Tree type          Shows                   Branch-length axis
       Cladogram          groupings               no meaning
       Phylogram          groupings + distance    evolutionary distance
       Ultrametric tree   groupings + time        time
     [Figure: the phylogram carries branch-length labels 6, 1, 1, 3, 1 and 5.] C-B Stewart, NHGRI lecture, 12/5/00

  10. Similarity vs. Evolutionary Relationship. [Figure: phylogram of Taxon A, Taxon B, Taxon C (think lamprey) and Taxon D with branch-length labels 6, 1, 1, 3, 1 and 5.] C is closer to A but more closely related to B. Since taxa evolve at different rates, your closest relative could be very different from you. This is why the closest BLAST hit is not necessarily the closest relative, and why you need to make trees. C-B Stewart, NHGRI lecture, 12/5/00
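
The contrast between "closer" and "more closely related" can be checked numerically. The sketch below is only an illustration: it assumes the flattened figure labels describe the tree (((B:6, C:1):1, A:3):1, D:5), and the internal node names `BC`, `ABC` and `root` are hypothetical. Under that assumption, C has the smaller path distance to A but shares its most recent common ancestor with B.

```python
# Assumed reading of the slide's phylogram: child -> (parent, branch length).
# Topology (((B:6, C:1):1, A:3):1, D:5); internal node names are made up.
PARENT = {
    "B": ("BC", 6.0),
    "C": ("BC", 1.0),
    "BC": ("ABC", 1.0),
    "A": ("ABC", 3.0),
    "ABC": ("root", 1.0),
    "D": ("root", 5.0),
}


def path_to_root(node):
    """Return [(ancestor, distance from node)], starting with the node itself."""
    path = [(node, 0.0)]
    dist = 0.0
    while node in PARENT:
        node, length = PARENT[node]
        dist += length
        path.append((node, dist))
    return path


def mrca(x, y):
    """Most recent common ancestor: first ancestor of y that is also one of x."""
    ancestors_of_x = dict(path_to_root(x))
    for node, _ in path_to_root(y):
        if node in ancestors_of_x:
            return node
    raise ValueError("nodes are not in the same tree")


def patristic_distance(x, y):
    """Sum of branch lengths on the path between x and y."""
    ancestors_of_x = dict(path_to_root(x))
    for node, dist_from_y in path_to_root(y):
        if node in ancestors_of_x:
            return ancestors_of_x[node] + dist_from_y
    raise ValueError("nodes are not in the same tree")


def depth(node):
    """Number of branches between a node and the root."""
    return len(path_to_root(node)) - 1


if __name__ == "__main__":
    print("distance C-A:", patristic_distance("C", "A"))
    print("distance C-B:", patristic_distance("C", "B"))
    # A deeper MRCA (further from the root) is a more recent common ancestor.
    print("MRCA(C, B):", mrca("C", "B"), "depth", depth(mrca("C", "B")))
    print("MRCA(C, A):", mrca("C", "A"), "depth", depth(mrca("C", "A")))
```

A similarity search ranks by something like the path distance, which is why the best hit for C here would be A even though B is its sister taxon.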

  11. Types of Similarity. Observed similarity between two entities can be due to:
       • Evolutionary relationship: shared ancestral characters (‘plesiomorphies’) or shared derived characters (‘synapomorphies’)
       • Homoplasy (independent evolution of the same character): convergent events, parallel events, reversals
     [Figure: example trees with character states C, G and T at the tips.]

  12. A few examples of what can be inferred from phylogenetic trees built from DNA or protein sequence data: • Which species are the closest living relatives of modern humans? • Did the infamous Florida Dentist infect his patients with HIV? • What were the origins of specific transposable elements? C-B Stewart, NHGRI lecture, 12/5/00

  13. Which species are the closest living relatives of modern humans? [Figure, classical view: tree of gorillas, chimpanzees, bonobos, orangutans and humans on a time axis running from 15-30 MYA to 0.]

  14. Which species are the closest living relatives of modern humans? [Figure: the classical view (time axis from 15-30 MYA to 0) beside the molecular view (time axis from 14 MYA to 0), each a tree of humans, chimpanzees, bonobos, gorillas and orangutans.]

  15. Did the Florida Dentist infect his patients with HIV? Phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people. [Figure: tree whose tip labels read, in order: DENTIST, Patient C, Patient A, Patient G, Patient B, Patient E, Patient A, DENTIST, Local control 2, Local control 3, Patient F, Local control 9, Local control 35, Local control 3, Patient D; two “No” labels also appear.] Yes: the HIV sequences from these patients fall within the clade of HIV sequences found in the dentist. From Ou et al. (1992) and Page & Holmes (1998)

  16. Uses of character mapping: • Dating adaptive evolutionary events • Ancestral reconstruction • Testing biological hypotheses of correlated function or change C-B Stewart, NHGRI lecture, 12/5/00

  17. Ex: Where geographically was the common ancestor of African apes and humans? Scenario A: Africa as species fountain. Scenario B: Eurasia as ancestral homeland. Scenario B requires four fewer dispersal events. [Figure key: Eurasia = black, Africa = red; a separate symbol marks dispersal events.] Modified from: Stewart, C.-B. & Disotell, T.R. (1998) Current Biology 8: R582-588.

  18. Building Trees
                     COMPUTATIONAL METHOD
       DATA TYPE     Optimality criterion                 Clustering algorithm
       Characters    PARSIMONY, MAXIMUM LIKELIHOOD        -
       Distances     MINIMUM EVOLUTION, LEAST SQUARES     UPGMA, NEIGHBOR-JOINING
     C-B Stewart, NHGRI lecture, 12/5/00


  21. Types of data
     Character data (taxa and characters):
       Species A  ATGGCTATTCTTATAGTACG
       Species B  ATCGCTAGTCTTATATTACA
       Species C  TTCACTAGACCTGTGGTCCA
       Species D  TTGACCAGACCTGTGGTCCG
       Species E  TTGACCAGTTCTCTAGTTCG
     Distance-based data: pairwise distances (dissimilarities). Above the diagonal: uncorrected “p” distance. Below the diagonal (Example 2): Kimura 2-parameter distance.
                   A     B     C     D     E
       Species A  ----  0.20  0.50  0.45  0.40
       Species B  0.23  ----  0.40  0.55  0.50
       Species C  0.87  0.59  ----  0.15  0.40
       Species D  0.73  1.12  0.17  ----  0.25
       Species E  0.59  0.89  0.61  0.31  ----
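
The step from character data to a distance can be sketched in a few lines. The functions below are a minimal illustration, not the slide author's code: `p_distance` counts the fraction of differing sites, and `k2p_distance` applies the standard Kimura 2-parameter formula d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), with P and Q the fractions of transitions and transversions. The function names are hypothetical, and the sequences are the Species A, B and C rows from the slide.

```python
import math

PURINES = {"A", "G"}

SPECIES_A = "ATGGCTATTCTTATAGTACG"
SPECIES_B = "ATCGCTAGTCTTATATTACA"
SPECIES_C = "TTCACTAGACCTGTGGTCCA"


def p_distance(seq1, seq2):
    """Uncorrected p distance: fraction of aligned sites that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance from transition and transversion fractions."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    p = transitions / len(seq1)
    q = transversions / len(seq1)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    for name, other in (("B", SPECIES_B), ("C", SPECIES_C)):
        print(f"A vs {name}: p = {p_distance(SPECIES_A, other):.2f}, "
              f"K2P = {k2p_distance(SPECIES_A, other):.2f}")
```

For the pairs A-B and A-C this gives p distances of 0.20 and 0.50 and Kimura distances that round to 0.23 and 0.87, matching the corresponding cells of the matrix.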



  25. Parsimony. Given two trees, the one requiring the lowest number of character changes to explain the observations is the better. The parsimony score for a tree is the minimum number of required changes. This score is frequently referred to as the number of steps or tree length.

  26. Parsimony – an example
       a  acgtatgga
       b  acgggtgca
       g  aacggtgga
       d  aactgtgca
     [Figure: the three possible trees for taxa a, b, g and d, each annotated with the states of one character (a: c, b: c, g: a, d: a).] Total tree lengths: 7, 8 and 8.
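
The tree lengths on this slide can be reproduced with Fitch's small-parsimony algorithm, which is one common way to count the minimum number of changes on a fixed tree (the slide does not say which method was used). The sketch assumes each run on the slide is a one-letter taxon label (a, b, g, d) followed by a nine-base sequence; trees are written as nested pairs, and all function names are hypothetical.

```python
# Assumed parsing of the slide: one-letter taxon label + nine-base sequence.
SEQS = {
    "a": "acgtatgga",
    "b": "acgggtgca",
    "g": "aacggtgga",
    "d": "aactgtgca",
}

# The three possible groupings of four taxa, rooted arbitrarily on the
# central branch (root placement does not change the parsimony length).
TREES = {
    "ab|gd": (("a", "b"), ("g", "d")),
    "ag|bd": (("a", "g"), ("b", "d")),
    "ad|bg": (("a", "d"), ("b", "g")),
}


def fitch_site(tree, states):
    """Return (possible state set, minimum changes) for one site on a subtree."""
    if isinstance(tree, str):  # a leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed here
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, seqs):
    """Parsimony score: minimum changes summed over all sites."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, states)[1]
    return total


LENGTHS = {name: tree_length(tree, SEQS) for name, tree in TREES.items()}

if __name__ == "__main__":
    for name, length in LENGTHS.items():
        print(name, length)
```

Under these assumptions the grouping of a with b gives length 7 and the other two groupings give 8, so the first tree is the most parsimonious.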


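  28. Using models. Example: Jukes-Cantor. [Figure: substitution diagrams among the bases A, C, G and T.] [Equation: substitution probability, given separately for the cases i≠j and i=j.] [Figure: plot of observed differences against actual changes.]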
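
The slide's own expressions are not legible in this transcript, so the following is a sketch of the textbook Jukes-Cantor model rather than a reconstruction of the slide. It assumes branch length d is measured in expected substitutions per site, giving P(i→j) = 1/4 + 3/4 e^(-4d/3) for i = j and 1/4 - 1/4 e^(-4d/3) for i ≠ j. The function names are hypothetical.

```python
import math


def jc_probability(i, j, d):
    """Jukes-Cantor probability of base j after d expected substitutions/site,
    given base i at the start."""
    decay = math.exp(-4.0 * d / 3.0)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def expected_observed_difference(d):
    """Fraction of sites expected to look different after d actual changes/site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc_distance(p):
    """Invert the curve: estimate actual changes/site from observed fraction p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor correction needs 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Observed differences saturate towards 0.75 while actual changes keep growing.
    for d in (0.1, 0.5, 1.0, 2.0):
        print(f"actual {d:.1f} -> observed {expected_observed_difference(d):.3f}")
    print("p = 0.20 corrected:", round(jc_distance(0.20), 4))
```

The loop shows why a correction is needed: multiple changes at the same site hide each other, so the observed fraction always underestimates the actual number of changes.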


  33. Likelihood of a one-branch tree… 30 nucleotides from ψη-globin genes of two primates on a one-edge tree (* marks the differing sites):
                   * *
       Gorilla    GAAGTCCTTGAGAAATAAACTGCACACTGG
       Orangutan  GGACTCCTTGAGAAATAAACTGCACACTGG
     There are two differences and 28 similarities. [Figure: plot of lnL against αt, with the maximum at αt = 0.02327, lnL = -51.133956.]
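
The two numbers on this slide can be reproduced under one set of assumptions, which the slide itself does not spell out: a Jukes-Cantor model with equal base frequencies of 1/4, and a branch parameter αt where α is the rate of change to each of the three other bases, so that a site stays the same with probability 1/4 + 3/4 e^(-4αt). The sketch below computes lnL for the two sequences and the closed-form maximum; the function names are hypothetical.

```python
import math

GORILLA = "GAAGTCCTTGAGAAATAAACTGCACACTGG"
ORANGUTAN = "GGACTCCTTGAGAAATAAACTGCACACTGG"


def log_likelihood(at, seq1, seq2):
    """lnL of two aligned sequences joined by one branch of length at (= alpha * t),
    assuming Jukes-Cantor with equal base frequencies."""
    decay = math.exp(-4.0 * at)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    total = 0.0
    for a, b in zip(seq1, seq2):
        # 1/4 for the base at one end, then the chance of the base at the other.
        total += math.log(0.25) + math.log(p_same if a == b else p_diff)
    return total


def ml_branch_length(seq1, seq2):
    """Closed-form maximum-likelihood estimate of alpha * t under Jukes-Cantor."""
    differences = sum(a != b for a, b in zip(seq1, seq2))
    p = differences / len(seq1)
    return -0.25 * math.log(1.0 - 4.0 * p / 3.0)


AT_HAT = ml_branch_length(GORILLA, ORANGUTAN)
LNL_HAT = log_likelihood(AT_HAT, GORILLA, ORANGUTAN)

if __name__ == "__main__":
    print(f"alpha*t = {AT_HAT:.5f}, lnL = {LNL_HAT:.6f}")
    # The likelihood falls off on either side of the estimate.
    for at in (0.005, 0.01, 0.05, 0.1):
        print(f"  alpha*t = {at}: lnL = {log_likelihood(at, GORILLA, ORANGUTAN):.4f}")
```

In this parameterization the expected number of substitutions per site is 3αt, so the estimate corresponds to roughly 0.07 substitutions per site.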

  34. A recipe for phylogenetic inference: (1) Collect your data. (2) Select an optimality criterion (“which tree is better?”, tree score). (3) Optional: do data transformation (“corrections”). (4) Select a search strategy to find the best tree. (5) Find the best hypothesis according to that criterion. (6) Assess the variation in your data in some way.

  35. Finding the best tree
     Number of (rooted) trees:
       3 taxa -> 3 trees
       4 taxa -> 15 trees
       10 taxa -> 34 459 425 trees
       25 taxa -> 1.19·10^30 trees
       52 taxa -> 2.75·10^80 trees
     Finding the optimal tree is an NP-complete problem.
     Search strategies:
       • Exact: exhaustive; branch and bound
       • Algorithmic: greedy algorithms, a.k.a. hill-climbing (including Neighbor-joining)
       • Heuristic:
           Systematic: branch-swapping (NNI, SPR, TBR)
           Stochastic: Markov Chain Monte Carlo (MCMC); genetic algorithms

  36. “Star-Decomposition”. [Figure: three trees on taxa A to E: a completely unresolved or "star" phylogeny; a partially resolved phylogeny; a fully resolved, bifurcating phylogeny. Labels mark a polytomy or multifurcation and a bifurcation.] C-B Stewart, NHGRI lecture, 12/5/00

  37. There are three possible unrooted trees on four taxa (A, B, C, D). [Figure: Tree 1, Tree 2 and Tree 3, the three ways of splitting the taxa into two pairs: AB|CD, AC|BD and AD|BC.] C-B Stewart, NHGRI lecture, 12/5/00

  38. The number of unrooted trees increases in a greater than exponential manner with number of taxa: (2N - 5)!! = # unrooted trees for N taxa. [Figure: unrooted trees on three, four, five and six taxa.] C-B Stewart, NHGRI lecture, 12/5/00
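
The growth in tree numbers is easy to tabulate. The sketch below evaluates the slide's (2N - 5)!! formula for unrooted trees; the rooted count uses (2N - 3)!!, which is not written on the slides but is the standard companion formula and reproduces the rooted-tree counts listed on slide 35. The function names are hypothetical.

```python
def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2; defined as 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def unrooted_trees(n_taxa):
    """Number of bifurcating unrooted trees: (2N - 5)!! for N >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa):
    """Number of bifurcating rooted trees: (2N - 3)!! for N >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (3, 4, 10, 25, 52):
        print(f"{n:>2} taxa: {rooted_trees(n):.3e} rooted, "
              f"{unrooted_trees(n):.3e} unrooted")
```

Python integers have arbitrary precision, so the counts stay exact even for 52 taxa; the numbers make clear why exhaustive search is only feasible for a handful of taxa.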


  40. What is a “good” method?
       • Efficiency – Time to find a/the solution
       • Power – Rate of convergence/how much data are needed
       • Consistency – Convergence to “correct” solution as data are added
       • Robustness – Performance when assumptions are violated
       • Falsifiability – Rejection of the model when inadequate


  43. Performance on simulated data. [Figure: frequency of correct inference plotted against sequence length; panel labels: “0.30 and 0.05 respectively” and “All 0.50”.]

  44. + and – of the methods
     Pair-wise, NJ, distance approach
       + Fast (efficiency)
       + Models can be used to make distances (can be consistent)
       – Pairwise distances throw out information (loss of power)
       – One will get a tree, but no score to compare with other trees or hypotheses
     Parsimony and tree-search
       + Philosophically appealing (Occam’s razor)
       – Can be inconsistent
       – Can be computationally slow due to a huge number of possible trees
     Maximum likelihood and tree-search
       + Model-based, can be consistent, powerful, gain biological info
       – Model-based, bad when you have the wrong model
       – Computationally very slow due to heavy calculations in determining the tree score and a huge number of possible trees

  45. The quick and dirty, pretty good tree: (1) Calculate model-based pairwise distances. (2) Make a Neighbor-Joining tree. (3) Do a bootstrap.



  48. Assessing the variation. Jackknife – resampling without replacement. Bootstrap – resampling with replacement. Resample columns from an alignment with replacement to make a simulated sample of the same size.
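
The two resampling schemes can be sketched directly on an alignment stored as a dictionary of equal-length strings. This is an illustration only: the function names, the `keep_fraction` parameter and the seed are hypothetical choices, and the alignment reuses the five example sequences from slide 21.

```python
import random

# Example alignment (the character data shown on slide 21).
ALIGNMENT = {
    "Species A": "ATGGCTATTCTTATAGTACG",
    "Species B": "ATCGCTAGTCTTATATTACA",
    "Species C": "TTCACTAGACCTGTGGTCCA",
    "Species D": "TTGACCAGACCTGTGGTCCG",
    "Species E": "TTGACCAGTTCTCTAGTTCG",
}


def n_columns(alignment):
    """Number of columns, checking that all sequences have the same length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    return lengths.pop()


def select_columns(alignment, columns):
    """Build a new alignment from the given column indices, in that order."""
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_alignment(alignment, rng):
    """Bootstrap: draw as many columns as the original has, WITH replacement."""
    n = n_columns(alignment)
    columns = [rng.randrange(n) for _ in range(n)]
    return select_columns(alignment, columns)


def jackknife_alignment(alignment, rng, keep_fraction=0.5):
    """Jackknife: keep a subset of the columns, drawn WITHOUT replacement."""
    n = n_columns(alignment)
    n_keep = max(1, int(n * keep_fraction))
    columns = sorted(rng.sample(range(n), n_keep))
    return select_columns(alignment, columns)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only so the demo is repeatable
    for name, seq in bootstrap_alignment(ALIGNMENT, rng).items():
        print(name, seq)
```

In practice each pseudo-replicate is fed through the same tree-building procedure as the original data, and the support for a grouping is the fraction of replicate trees that contain it.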
