The Genome Access Course Phylogenetic Analysis - PowerPoint PPT Presentation

nerita
Presentation Transcript

  1. The Genome Access Course Phylogenetic Analysis

  2. Phylogenetics • Developed by Willi Hennig (Grundzüge einer Theorie der phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)

  3. What is the ancestral sequence? • pfeffer • pepper • (pf/p)e(ff/pp)er

  4. Evolutionary Trees • A tree is a connected, acyclic 2D graph • Leaf: Taxon • Node: Vertex • Branch: Edge • Tree length = sum of all branch lengths • Phylogenetic trees are binary trees
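
The vocabulary on this slide maps directly onto a small data structure. The sketch below is only an illustration: the edge-list representation, the taxon names, and the branch lengths are all made up, and `tree_length` and `leaves` are hypothetical helper names.

```python
# Hypothetical toy tree: each branch (edge) is (parent, child, branch_length).
# Names and lengths are invented for illustration only.
edges = [
    ("root", "n1", 0.10),
    ("root", "C", 0.35),
    ("n1", "A", 0.20),
    ("n1", "B", 0.15),
]


def tree_length(edges):
    """Tree length = sum of all branch lengths."""
    return sum(length for _parent, _child, length in edges)


def leaves(edges):
    """Leaves (taxa) are the vertices that never appear as a parent."""
    parents = {parent for parent, _child, _length in edges}
    children = {child for _parent, child, _length in edges}
    return sorted(children - parents)


total = tree_length(edges)
taxa = leaves(edges)
```

Here every internal vertex has exactly two children, which is what "binary tree" means on the slide.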

  5. A Generic Tree

  6. Evolutionary Trees • Rooted • common ancestor • unique path to any leaf • directed • Unrooted • root could be placed anywhere • fewer possible trees than rooted for the same set of taxa
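
The claim that there are fewer unrooted than rooted trees can be checked with the standard counting formulas for fully binary trees on n labelled taxa: (2n-5)!! unrooted and (2n-3)!! rooted topologies. The function names below are hypothetical; the formulas are not from the slides but are standard results.

```python
def double_factorial(k):
    """Product k * (k-2) * (k-4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted(n_taxa):
    """Number of unrooted binary tree topologies for n labelled taxa (n >= 3)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def count_rooted(n_taxa):
    """Number of rooted binary tree topologies for n labelled taxa (n >= 2)."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


for n in (4, 5, 10):
    print(n, count_unrooted(n), count_rooted(n))
```

For 10 taxa this already gives 2,027,025 unrooted and 34,459,425 rooted topologies, which is why exhaustive tree search quickly becomes impractical.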

  7. Rooted Tree generated by DRAWGRAM (PHYLIP)

  8. Unrooted Tree generated by DRAWTREE (PHYLIP)

  9. Possible Evolutionary Trees

  10. Genes vs. Species • Sequences show gene relationships, but phylogenetic histories may be different for gene and species • Genes evolve at different speeds • Horizontal gene transfer

  11. Methods for Phylogenetic Analysis • Character-State • Maximum Parsimony • Maximum Likelihood • Genetic Distance • Fitch & Margoliash • Neighbor-Joining • Unweighted Pair Group

  12. Phylogenetic Software • PHYLIP • PAUP (Available in GCG) • TREE-PUZZLE • PhyloBLAST • Felsenstein maintains an extensive list of programs on the PHYLIP site

  13. PHYLIP Programs • dnapars/protpars • dnadist/protdist • dnaml (use fastDNAml instead) • neighbor • fitch/kitsch • drawtree/drawgram

  14. Maximum Parsimony • Most common method • Allows use of all evolutionary information • Build and score all possible trees • Each node is a transformation in a character state • Minimize tree length • Best tree requires the fewest changes to derive all sequences
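
Scoring a single candidate tree can be sketched with Fitch's small-parsimony algorithm, which counts the minimum number of state changes a given topology needs. This is an assumption about how the scoring might be done, not necessarily what dnapars/protpars do internally; the nested-tuple tree encoding and the four toy sequences are invented for illustration.

```python
def fitch_site(tree, states):
    """Fitch pass for one alignment column.

    tree   : a leaf name (str) or a 2-tuple of subtrees
    states : dict mapping leaf name -> character at this site
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch costs over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Toy aligned sequences (made up).
alignment = {"A": "AAGT", "B": "AAGC", "C": "CTGT", "D": "CTGC"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

score1 = parsimony_score(tree1, alignment)  # 4 changes
score2 = parsimony_score(tree2, alignment)  # 5 changes
```

With these toy data the first topology needs 4 changes and the second 5, so the first is the more parsimonious of the two. A full search would repeat this scoring over every candidate topology.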

  15. Which is the more parsimonious tree? [Figure: two candidate trees, each with 3 nodes; one requires 9 node crossings, the other 8 node crossings]

  16. Maximum Likelihood • Reconstruction using an explicit evolutionary model • The likelihood of a tree is calculated separately for each nucleotide site; the product of the per-site likelihoods gives the overall likelihood of the observed data • Demanding computationally • Slowest method • Use to test (or improve) an existing tree
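
The "product of per-site likelihoods" idea can be shown on the smallest possible case: two aligned sequences joined by a single branch. The sketch assumes the Jukes-Cantor (JC69) substitution model, which the slides do not name, and finds the branch length by a crude grid search; real programs such as dnaml handle full trees and richer models. The sequences are invented.

```python
import math


def jc69_prob(x, y, d):
    """JC69 probability of observing base y given base x after branch length d
    (d in expected substitutions per site)."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def log_likelihood(seq1, seq2, d):
    """Log of the product of per-site likelihoods (equal base frequencies 1/4)."""
    return sum(math.log(0.25 * jc69_prob(x, y, d)) for x, y in zip(seq1, seq2))


# Toy aligned sequences (made up): 2 differences in 10 sites.
seq1 = "ACGTACGTAC"
seq2 = "ACGTACGTTT"

# Crude grid search over candidate branch lengths 0.001 .. 2.000.
grid = [i / 1000 for i in range(1, 2001)]
best_d = max(grid, key=lambda d: log_likelihood(seq1, seq2, d))
```

Working in log space turns the product into a sum and avoids numerical underflow on long alignments. For this two-sequence case the maximum should sit close to the analytic Jukes-Cantor distance for a proportion of differing sites of 0.2.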

  17. Clustering Algorithms • Use distances to calculate phylogenetic trees • Trees are based on the relative numbers of similarities and differences between sequences • A distance matrix is constructed by computing pairwise distances for all sequences • Clustering links successively more distant taxa

  18. DNA Distances • Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences • Can only work for pairs of sequences that are similar enough to be aligned • All base changes are considered equal • Insertions/deletions are generally given a larger weight than replacements (gap penalties) • Possible to correct for multiple substitutions at a single site, which is common in distant relationships and for rapidly evolving sites
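
A minimal sketch of these distance calculations follows. It counts mismatches between two aligned sequences, converts the count to a proportion, and applies the Jukes-Cantor correction as one possible correction for multiple substitutions (the slides do not say which correction is meant). As a simplification, columns containing a gap are skipped rather than given a larger weight. The function names and sequences are hypothetical.

```python
import math


def count_differences(a, b):
    """Number of mismatched bases between two aligned sequences.
    Columns with a gap ('-') in either sequence are ignored (a simplification)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x != "-" and y != "-")


def p_distance(a, b):
    """Proportion of differing sites among the gap-free columns."""
    compared = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
    return count_differences(a, b) / compared


def jukes_cantor(p):
    """Jukes-Cantor correction: estimated substitutions per site given an
    observed proportion p of differing sites (requires p < 0.75)."""
    if p >= 0.75:
        raise ValueError("p must be below 0.75 for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned sequences (made up).
s1 = "ACGTACGTAC"
s2 = "ACGTACGTTT"
p = p_distance(s1, s2)
d = jukes_cantor(p)
```

The corrected distance is always at least as large as the raw proportion, and the gap between them grows as sequences diverge, which is the effect of multiple hits at the same site.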

  19. Amino Acid Distances • More difficult to compute • Substitutions have differing effects on structure • Some substitutions require more than one DNA mutation • Use replacement frequencies (PAM, BLOSUM)

  20. Fitch & Margoliash • 3 sequences are combined at a time to define branches and calculate their length • Additive branch lengths • Accurate for short branches
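
The three-sequence step can be written out explicitly. If A, B and C join at one internal node with branch lengths a, b and c, additivity means d(A,B) = a + b, d(A,C) = a + c and d(B,C) = b + c, which can be solved for the three branches. The sketch below shows only this step, not the full Fitch & Margoliash procedure, and the distances are invented.

```python
def three_point_branches(d_ab, d_ac, d_bc):
    """Solve a+b=d_ab, a+c=d_ac, b+c=d_bc for the three branch lengths
    meeting at a single internal node."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


# Toy pairwise distances (made up).
a, b, c = three_point_branches(d_ab=22, d_ac=39, d_bc=41)
# Additivity check: the branch lengths reproduce the input distances.
reconstructed = (a + b, a + c, b + c)
```

For these inputs the branches come out as 10, 12 and 29, and summing them pairwise gives back 22, 39 and 41.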

  21. Neighbor Joining • Most common method of tree construction • Distance matrix adjusted for each taxon depending on its rate of evolution • Good for simulation studies • Most efficient computationally
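
The rate adjustment can be illustrated with one step of neighbor joining using the standard Q criterion, Q(i,j) = (n-2)·d(i,j) - Σd(i,k) - Σd(j,k). The sketch picks the first pair to join and computes its two branch lengths; it is a single step, not a complete implementation of PHYLIP's neighbor program. The five-taxon distance table is a toy example and `nj_pick_pair` is a hypothetical name.

```python
from itertools import combinations


def nj_pick_pair(names, dist):
    """One neighbor-joining step (requires more than 2 taxa).

    dist maps frozenset({x, y}) -> distance.
    Returns (pair to join, its Q value, branch lengths to the new node).
    """
    n = len(names)
    # Total distance from each taxon to all others (reflects its overall rate).
    totals = {i: sum(dist[frozenset((i, k))] for k in names if k != i) for i in names}
    best_pair, best_q = None, None
    for i, j in combinations(names, 2):
        q = (n - 2) * dist[frozenset((i, j))] - totals[i] - totals[j]
        if best_q is None or q < best_q:
            best_pair, best_q = (i, j), q
    i, j = best_pair
    d_ij = dist[frozenset((i, j))]
    len_i = 0.5 * d_ij + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = d_ij - len_i
    return best_pair, best_q, (len_i, len_j)


# Toy distance table (made up for illustration).
names = ["a", "b", "c", "d", "e"]
raw = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7,
    ("d", "e"): 3,
}
dist = {frozenset(pair): value for pair, value in raw.items()}

pair, q_value, branch_lengths = nj_pick_pair(names, dist)
```

In this example the smallest raw distance is between d and e, yet the adjusted criterion joins a and b first. That difference from simply joining the closest pair is what the rate adjustment buys.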

  22. UPGMA – Unweighted Pair Group Method with Arithmetic Mean • Simplest method • Calculates branch lengths between most closely related sequences • Averages distance to next sequence or cluster • Predicts a position for the root
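
The procedure described above can be sketched in a few lines: repeatedly merge the two closest clusters, place the new node at half their distance, and average the distances to every remaining cluster weighted by cluster size. The nested-tuple output format `(left, right, node_height)` and the toy matrix are assumptions made for this illustration.

```python
def upgma(names, matrix):
    """Minimal UPGMA sketch. matrix[i][j] is the distance between names[i]
    and names[j]. Returns a nested tuple (left, right, node_height)."""
    # Active clusters: id -> (subtree, number of leaves in it).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of clusters.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Size-weighted average distance from the merged cluster to the rest.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        del dist[(i, j)]
        # The new internal node sits halfway between the two merged clusters.
        clusters[next_id] = ((tree_i, tree_j, d_ij / 2), size_i + size_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# Toy ultrametric distance matrix (made up).
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(names, matrix)
```

The last merge defines the root, which is how UPGMA "predicts a position for the root". That placement is only trustworthy when the sequences evolve at roughly equal rates.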

  23. Phylogenetic Complications • Errors • Loss of function • Convergent evolution • Lateral gene transfer

  24. Validation • Use several different algorithms and data sets • NJ methods generate one tree, possibly supporting a tree built by parsimony or maximum likelihood • Bootstrapping • Perturb data and note effect on tree • Repeat many times • If a grouping is unchanged in ~90% of replicates, the tree’s correctness is supported
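
The bootstrapping steps above can be sketched as follows: resample alignment columns with replacement, redo the analysis on each replicate, and report how often a grouping survives. To keep the example short, "the analysis" here is a stand-in that only asks which two sequences are closest; a real bootstrap would rebuild the whole tree for each replicate. The sequences, the seed, and all function names are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same length as the original)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Stand-in for tree building: the pair of sequences with fewest differences."""
    names = sorted(alignment)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            diff = sum(x != y for x, y in zip(alignment[a], alignment[b]))
            if best is None or diff < best[0]:
                best = (diff, (a, b))
    return best[1]


def bootstrap_support(alignment, pair, replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which `pair` is still the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_alignment(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return hits / replicates


# Toy aligned sequences (made up).
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACTTGT",
    "D": "TCGAACTTGG",
}
support = bootstrap_support(alignment, ("A", "B"), replicates=200)
```

The returned fraction plays the role of the "~90%" figure on the slide. With an alignment this short the estimate is noisy, so it should be read as a demonstration of the mechanics only.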

  25. Are there bugs in our genome? N-acetylneuraminate lyase

  26. The End