## The Genome Access Course: Phylogenetic Analysis

**Phylogenetics**
• Developed by Willi Hennig (Grundzüge einer Theorie der Phylogenetischen Systematik, 1950; Phylogenetic Systematics, 1966)

**What is the ancestral sequence?**
• pfeffer
• pepper
• (pf/p)e(ff/pp)er

**Evolutionary Trees**
• A tree is a connected, acyclic 2D graph
• Leaf: Taxon
• Node: Vertex
• Branch: Edge
• Tree length = sum of all branch lengths
• Phylogenetic trees are binary trees

**Evolutionary Trees**
• Rooted
  • common ancestor
  • unique path to any leaf
  • directed
• Unrooted
  • root could be placed anywhere
  • fewer possible trees than rooted

**Rooted Tree**
[Figure: rooted tree generated by DRAWGRAM (PHYLIP)]

**Unrooted Tree**
[Figure: unrooted tree generated by DRAWTREE (PHYLIP)]

**Genes vs. Species**
• Sequences show gene relationships, but the phylogenetic history of a gene may differ from that of the species
• Genes evolve at different speeds
• Horizontal gene transfer

**Methods for Phylogenetic Analysis**
• Character-State
  • Maximum Parsimony
  • Maximum Likelihood
• Genetic Distance
  • Fitch & Margoliash
  • Neighbor-Joining
  • Unweighted Pair Group

**Phylogenetic Software**
• PHYLIP
• PAUP (Available in GCG)
• TREE-PUZZLE
• PhyloBLAST
• Felsenstein maintains an extensive list of programs on the PHYLIP site

**PHYLIP Programs**
• dnapars/protpars
• dnadist/protdist
• dnaml (use fastDNAml instead)
• neighbor
• fitch/kitsch
• drawtree/drawgram

**Maximum Parsimony**
• Most common method
• Allows use of all evolutionary information
• Build and score all possible trees
• Each node is a transformation in a character state
• Minimize tree length
• Best tree requires the fewest changes to derive all sequences

**Which is the more parsimonious tree?**
[Figure: two candidate trees with 3 nodes each; one requires 9 node crossings, the other 8]

**Maximum Likelihood**
• Reconstruction using an explicit evolutionary model
• The likelihood of a tree is calculated separately for each nucleotide site. The product of the likelihoods for each site provides the overall likelihood of the observed data.
• Demanding computationally
• Slowest method
• Use to test (or improve) an existing tree

**Clustering Algorithms**
• Use distances to calculate phylogenetic trees
• Trees are based on the relative numbers of similarities and differences between sequences
• A distance matrix is constructed by computing pairwise distances for all sequences
• Clustering links successively more distant taxa

**DNA Distances**
• Distances between pairs of DNA sequences are relatively simple to compute as the sum of all base pair differences between the two sequences
• Can only work for pairs of sequences that are similar enough to be aligned
• All base changes are considered equal
• Insertions/deletions are generally given a larger weight than replacements (gap penalties)
• Possible to correct for multiple substitutions at a single site, which are common in distant relationships and at rapidly evolving sites

**Amino Acid Distances**
• More difficult to compute
• Substitutions have differing effects on structure
• Some substitutions require more than one DNA mutation
• Use replacement frequencies (PAM, BLOSUM)

**Fitch & Margoliash**
• Three sequences are combined at a time to define branches and calculate their lengths
• Additive branch lengths
• Accurate for short branches

**Neighbor Joining**
• Most common method of tree construction
• Distance matrix adjusted for each taxon depending on its rate of evolution
• Good for simulation studies
• Most efficient computationally

**UPGMA: Unweighted Pair Group Method Using Arithmetic Averages**
• Simplest method
• Calculates branch lengths between most closely related sequences
• Averages distance to next sequence or cluster
• Predicts a position for the root

**Phylogenetic Complications**
• Errors
• Loss of function
• Convergent evolution
• Lateral gene transfer

**Validation**
• Use several different algorithms and data sets
• NJ methods generate one tree, possibly supporting a tree built by parsimony or maximum likelihood
• Bootstrapping
  • Perturb data and note effect on tree
  • Repeat many times
  • If the tree is unchanged in ~90% of replicates, its correctness is supported

**Are there bugs in our genome?**
N-acetylneuraminate lyase
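
**Sketch: DNA distance matrix**
The DNA Distances and Clustering Algorithms slides describe counting base differences between aligned pairs, optionally correcting for multiple substitutions, and collecting the results in a distance matrix. The Python sketch below illustrates that idea under stated assumptions: the sequences are already aligned to equal length, gap columns are skipped rather than weighted (a simplification of the gap-penalty point), and the correction used is Jukes-Cantor, which the slides do not name. The function names are hypothetical and are not part of PHYLIP.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in pairs if x != y)
    return diffs / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor correction for multiple substitutions at one site.

    d = -3/4 * ln(1 - 4p/3); undefined (infinite) once p reaches 0.75.
    """
    if p == 0:
        return 0.0
    if p >= 0.75:
        return float("inf")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs, correct=True):
    """Pairwise distances for all sequences, keyed by (name, name)."""
    names = list(seqs)
    matrix = {}
    for i in names:
        for j in names:
            p = p_distance(seqs[i], seqs[j])
            matrix[i, j] = jukes_cantor(p) if correct else p
    return matrix
```

A call such as `distance_matrix({"a": "ACGT", "b": "ACGA"}, correct=False)` returns the uncorrected proportions; with `correct=True` the same pairs get slightly larger distances, reflecting substitutions hidden by repeated changes at one site.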
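
**Sketch: bootstrapping**
The bootstrapping steps on the Validation slide (perturb the data, repeat many times, count how often the result is unchanged) can be sketched in Python as follows. This is an illustration under assumptions, not the procedure used by any particular program: the perturbation is the usual resampling of alignment columns with replacement, and `closest_pair` is a deliberately trivial, hypothetical stand-in for a real tree builder such as neighbor joining. All names here are invented for the example.

```python
import random

def resample_columns(alignment, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in cols) for n in names}

def closest_pair(alignment):
    """Stand-in for a tree builder (assumption): the pair of taxa with
    the fewest differing sites, i.e. the first cluster a distance
    method would join."""
    names = sorted(alignment)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = sum(x != y for x, y in zip(alignment[a], alignment[b]))
            if best is None or d < best[0]:
                best = (d, (a, b))
    return best[1]

def bootstrap_support(alignment, build=closest_pair, replicates=1000, seed=0):
    """Fraction of replicates in which `build` gives the same result
    as it does on the original alignment."""
    rng = random.Random(seed)
    reference = build(alignment)
    same = sum(
        build(resample_columns(alignment, rng)) == reference
        for _ in range(replicates)
    )
    return same / replicates
```

Passing a different `build` callable (anything that returns a comparable summary of the tree) lets the same loop be reused with a real tree-construction method; a support value near the ~90% level mentioned on the slide would then be read as support for that grouping.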