BCB 444/544

Lecture 30

Phylogenetics – Distance-Based Methods

#30_Nov02

BCB 444/544 F07 ISU Terribilini #30- Phylogenetics - Distance-Based Methods


Required Reading (before lecture)

Wed Oct 31 - Lecture 29

Phylogenetics Basics

  • Chp 10 - pp 127 - 141

Thurs Nov 1 - Lab 9

Gene & Regulatory Element Prediction

Fri Nov 2 - Lecture 30

Phylogenetics – Distance-Based Methods

  • Chp 11 - pp 142 - 169

Mon Nov 5 - Lecture 31

Phylogenetics – Parsimony and ML

  • Chp 11 - pp 142 - 169


Assignments & Announcements

Mon Oct 29 - HW#5

HW#5 = Hands-on exercises with phylogenetics

and tree-building software

Due: Mon Nov 5 (not Fri Nov 1 as previously posted)


BCB 544 "Team" Projects

Last week of classes will be devoted to Projects

  • Written reports due:

    • Mon Dec 3 (no class that day)

  • Oral presentations (20-30 min) will be:

    • Wed-Fri Dec 5, 6, 7

    • 1 or 2 teams will present during each class period

  • See Guidelines for Projects posted online


    BCB 544 Only: New Homework Assignment

    544 Extra#2

    Due: √PART 1 - ASAP

    PART 2 - meeting prior to 5 PM Fri Nov 2

    Part 1 - Brief outline of Project, email to Drena & Michael

    after response/approval, then:

    Part 2 - More detailed outline of project

    Read a few papers and summarize status of problem

    Schedule meeting with Drena & Michael to discuss ideas


    Seminars this Week

    BCB List of URLs for Seminars related to Bioinformatics:

    http://www.bcb.iastate.edu/seminars/index.html

    • Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI

      • Bob Jernigan BBMB, ISU

        • Control of Protein Motions by Structure


    Chp 10 - Phylogenetics

    SECTION IV MOLECULAR PHYLOGENETICS

    Xiong: Chp 10 Phylogenetics Basics

    • Evolution and Phylogenetics

    • Terminology

    • Gene Phylogeny vs. Species Phylogeny

    • Forms of Tree Representation

    • Why Finding a True Tree is Difficult

    • Procedure of Building a Phylogenetic Tree


    Tree Building Procedure

    • Choose molecular markers

    • Perform MSA

    • Choose a model of evolution

    • Determine tree building method

    • Assess tree reliability


    Choice of Molecular Markers

    • Very closely related organisms - use nucleic acid sequences, which show more differences than protein sequences

    • For individuals within a species - use noncoding regions of mtDNA, which have a faster mutation rate

    • More distantly related species - slowly evolving nucleic acid sequences like ribosomal RNA or protein sequences

    • Very distantly related species - use highly conserved protein sequences


    Multiple Sequence Alignment

    • Most critical step in tree building - cannot build correct tree without correct alignment

    • Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one

    • Most alignments need manual editing

      • Make sure important functional residues align

      • Align secondary structure elements

      • Use full alignment or just parts


    Automatic Editing of Alignments

    • Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences

    • Gblocks – detect and eliminate poorly aligned positions and divergent regions


    How do we measure divergence between sequences?

    • Simple measure – just count the number of substitutions observed between the sequences in the MSA

    • Problem – number of substitutions may not represent the number of evolutionary events that actually occurred


    Multiple Substitutions

    [Figure: substitution history at a single site, with nucleotide labels C, A, A, T, G]

    Seeing only one difference does not mean that there was only one evolutionary event


    Multiple Substitutions

    [Figure: substitution history at a single site, with nucleotide labels A, A, A, T, G]

    Seeing no difference does not mean that there were no evolutionary events


    Choosing Substitution Models

    • Statistical models of evolution are used to correct for the multiple substitution problem

    • Focus on DNA models


    Jukes-Cantor Model

    • Jukes-Cantor model assumes all nucleotides are substituted with equal probability

    • Can be used to correct for multiple substitutions
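The correction itself can be sketched in a few lines. This assumes the standard Jukes-Cantor distance formula, d = -(3/4) ln(1 - (4/3) p), where p is the observed proportion of differing sites; the function name is hypothetical and the slide's own formula did not survive in this transcript.

```python
import math

def jukes_cantor_distance(p):
    """Correct an observed proportion of differing sites p for multiple hits.

    Standard Jukes-Cantor formula: d = -(3/4) * ln(1 - (4/3) * p).
    The correction is undefined once p reaches 0.75 (saturation).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

# The corrected distance grows faster than the observed proportion
for p in (0.05, 0.30, 0.60):
    print(f"observed p = {p:.2f} -> corrected d = {jukes_cantor_distance(p):.3f}")
```

The gap between p and d widens as sequences diverge, which is the multiple substitution problem in numeric form.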


    Many Other Models


    Evolutionary Models for Protein Sequences

    • PAM and JTT substitution matrices already take into account multiple substitutions

    • There are also models similar to Jukes-Cantor for protein sequences


    What about differences in mutation rates between positions within a sequence?

    • One of our assumptions was that all positions in a sequence are evolving at the same rate

    • Bad assumption

      • Third position in a codon changes with higher frequency

      • In proteins, some amino acids can change and others cannot

    • This variation is called among-site rate heterogeneity

    • Many tree building programs have parameters meant to deal with this problem – adds to complexity of getting the correct tree


    Chp 11 – Phylogenetic Tree Construction Methods and Programs

    SECTION IV MOLECULAR PHYLOGENETICS

    Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs

    • Distance-Based Methods

    • Character-Based Methods

    • Phylogenetic Tree Evaluation

    • Phylogenetic Programs


    Tree Construction

    • Two main categories of tree building methods

    • Distance-based

      • Overall similarity between sequences

    • Character-based

      • Consider the entire MSA


    Distance-Based Methods

    • Given a MSA and an evolutionary model, calculate the distance between all pairs of sequences

    • Construct distance matrix

    • Construct phylogenetic tree based on the distance matrix
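The first two steps might look like the sketch below. The toy alignment, the function names, and the choice to skip gapped columns are all assumptions made for illustration; the evolutionary model here is the Jukes-Cantor correction.

```python
import math
from itertools import combinations

def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def jc_correct(p):
    """Jukes-Cantor correction of an observed proportion of differences."""
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)

def distance_matrix(msa, correct=jc_correct):
    """msa maps name -> aligned sequence; returns a symmetric dict of dicts."""
    names = list(msa)
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        dist[a][b] = dist[b][a] = correct(p_distance(msa[a], msa[b]))
    return dist

# Toy alignment, made up for illustration
msa = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTTC",
    "seq3": "ACGAACGTTC",
    "seq4": "TCGAAC-TTG",
}
dist = distance_matrix(msa)
for name, row in dist.items():
    print(name, [round(v, 3) for v in row.values()])
```

Passing a different `correct` function swaps the evolutionary model without touching the rest of the pipeline.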


    Distance Matrices

        a    b    c    d
    a   0
    b   6    0
    c   7    3    0
    d  14   10    9    0

    [Figure: tree for taxa a, b, c, d drawn against a distance scale from 0 to 8]


    Distance-Based Methods

    • Two ways to construct a tree based on a distance matrix

      • Clustering

      • Optimality


    Clustering-Based Methods

    • E.g., UPGMA and Neighbor-Joining

    • A cluster is a set of taxa

    • Interspecies distances translate into intercluster distances

    • Clusters are repeatedly merged

    • “Closest” clusters merged first

    • Distances are recomputed after merging


    UPGMA

    • UPGMA – Unweighted Pair Group Method Using Arithmetic Average

    • Uses molecular clock assumption – all taxa evolve at a constant rate and are equally distant from the root (ultrametric tree)

    • This assumption is usually wrong

    • So why use UPGMA?

      • Very fast
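A minimal UPGMA sketch follows, run on the four-taxon matrix from the Distance Matrices slide (a-b 6, a-c 7, b-c 3, a-d 14, b-d 10, c-d 9). The data layout and the function name are assumptions; this is not the worked example from the lecture, whose figures are missing from this transcript.

```python
def upgma(dist):
    """Cluster taxa by UPGMA.

    dist maps frozenset({x, y}) -> distance for every pair of taxon names.
    Returns the merges in order as (cluster_x, cluster_y, node_height).
    """
    d = dict(dist)
    size = {name: 1 for pair in dist for name in pair}
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)        # closest pair of clusters
        x, y = sorted(pair)
        height = d[pair] / 2.0          # molecular clock: node sits halfway
        new = f"({x},{y})"
        for k in [c for c in size if c not in pair]:
            # size-weighted mean = arithmetic average over all original taxon pairs
            d[frozenset((new, k))] = (
                size[x] * d[frozenset((x, k))] + size[y] * d[frozenset((y, k))]
            ) / (size[x] + size[y])
        # drop every distance that still involves the two merged clusters
        d = {p: v for p, v in d.items() if x not in p and y not in p}
        size[new] = size.pop(x) + size.pop(y)
        merges.append((x, y, height))
    return merges

dist = {
    frozenset("ab"): 6, frozenset("ac"): 7, frozenset("bc"): 3,
    frozenset("ad"): 14, frozenset("bd"): 10, frozenset("cd"): 9,
}
merges = upgma(dist)
for x, y, height in merges:
    print(f"merge {x} + {y} at height {height}")
```

Because every node is placed at half the cluster distance, the result is always ultrametric, even when the input distances are not clock-like.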


    UPGMA Example


    Neighbor Joining

    • Idea: Find a pair of taxa that are close to each other but far from other taxa

      • Implicitly finds a pair of neighboring taxa

    • No molecular clock assumption


    Neighbor Joining

    • NJ corrects for unequal evolutionary rates between sequences by using a conversion step

    • The conversion step requires calculation of “r-values” and “transformed r-values”


    Neighbor Joining

    The r-value for a sequence i is the sum of the distances between sequence i and all other sequences:

    r_i = Σ_{j≠i} d_ij


    Neighbor Joining

    The transformed r-value for a sequence i is:

    r'_i = r_i / (n − 2)

    where n is the number of taxa

    Transformed r-values are used to determine the distance of a taxon to the nearest node


    Neighbor Joining

    The converted distance between two sequences i and j is:

    d'_ij = d_ij − (r'_i + r'_j)

    These converted distances are used in building the tree


    Neighbor Joining

    The final equation we need is for computing the distance from a new cluster to each taxon. Assume taxa i and j were merged into a cluster u. The distance from taxon i to cluster u is:

    d_iu = [d_ij + (r'_i − r'_j)] / 2
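The four quantities can be sketched together as follows, assuming the standard neighbor-joining formulation (the slide images with the equations are missing from this transcript). The distance matrix is illustrative only: it is an assumption, chosen so that the A-B and C-D converted distances tie and the branch lengths from A and B to their joining node come out as 0.15 and 0.25.

```python
from itertools import combinations

# Illustrative distances (an assumption, not recovered from the slides)
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 0.40, ("A", "C"): 0.35, ("A", "D"): 0.60,
         ("B", "C"): 0.45, ("B", "D"): 0.70, ("C", "D"): 0.55}
d = {i: {} for i in names}
for (i, j), v in pairs.items():
    d[i][j] = d[j][i] = v

n = len(names)
# r-value: sum of the distances from taxon i to every other taxon
r = {i: sum(d[i].values()) for i in names}
# transformed r-value: r'_i = r_i / (n - 2)
r_t = {i: r[i] / (n - 2) for i in names}
# converted distance: d'_ij = d_ij - (r'_i + r'_j)
conv = {(i, j): d[i][j] - (r_t[i] + r_t[j]) for i, j in combinations(names, 2)}

def branch_lengths(i, j):
    """Distances from taxa i and j to the node u that joins them."""
    d_iu = (d[i][j] + r_t[i] - r_t[j]) / 2
    return d_iu, d[i][j] - d_iu

for pair, value in conv.items():
    print(pair, round(value, 3))
print("A-U, B-U:", tuple(round(x, 3) for x in branch_lengths("A", "B")))
```

The pair with the smallest converted distance is joined first; subtracting the transformed r-values is what lets a fast-evolving taxon pair with its true neighbor.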


    Neighbor Joining Example


    Neighbor Joining Example

    • Initialize tree into a star shape with all taxa connected to the center

    • Step 1: Compute r-values and transformed r-values for all taxa


    Neighbor Joining Example

    • Step 2: Compute converted distances


    Neighbor Joining Example

    • Step 3: Fill out converted distance matrix


    Neighbor Joining Example

    • Step 4: Create a node by merging closest taxa

    • In this example, the distance between A and B is the same as the distance between C and D

    • We can pick either pair to start with

    • Let’s pick A and B and create a node called U

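    [Figure: star tree of taxa A, B, C, D, and the tree after A and B are joined at the new node U; the branch lengths from A and B to U are still unknown (?)]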


    Neighbor Joining Example

    • Step 5: Compute branch lengths

    • Use the equation for computing the distance from a taxon to a node

    [Figure: node U joined to A and B; A–U branch length 0.15, B–U branch length 0.25]


    Neighbor Joining Example

    • Step 6: Construct reduced distance matrix by computing converted distances from each taxon to the new node U

    • In UPGMA, we simply calculated the average


    Neighbor Joining Example

    Our reduced distance matrix:


    Neighbor Joining Example

    • From here, we go back to step 1

    • Continue until all taxa have been decomposed from the star tree
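The whole loop, from star tree to resolved tree, can be sketched as below. It assumes the standard neighbor-joining update for the reduced matrix, d_kU = (d_ik + d_jk − d_ij) / 2, and reuses an illustrative four-taxon matrix (an assumption, since the lecture's own matrix is not in this transcript). Node names U1, U2, ... and the function name are hypothetical.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return the branches of an unrooted NJ tree as (node, node, length)."""
    d = {i: dict(dist[i]) for i in names}   # work on a copy
    nodes = list(names)
    branches = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: transformed r-values
        r_t = {i: sum(d[i][k] for k in nodes if k != i) / (n - 2) for i in nodes}
        # Steps 2-4: join the pair with the smallest converted distance
        i, j = min(combinations(nodes, 2),
                   key=lambda p: d[p[0]][p[1]] - r_t[p[0]] - r_t[p[1]])
        u = f"U{next_id}"
        next_id += 1
        # Step 5: branch lengths from i and j to the new node u
        d_iu = (d[i][j] + r_t[i] - r_t[j]) / 2
        branches += [(i, u, d_iu), (j, u, d[i][j] - d_iu)]
        # Step 6: reduced matrix, distances from every other node to u
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    i, j = nodes
    branches.append((i, j, d[i][j]))        # last remaining branch
    return branches

# Illustrative distances (an assumption, not recovered from the slides)
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 0.40, ("A", "C"): 0.35, ("A", "D"): 0.60,
         ("B", "C"): 0.45, ("B", "D"): 0.70, ("C", "D"): 0.55}
dist = {i: {} for i in names}
for (i, j), v in pairs.items():
    dist[i][j] = dist[j][i] = v

branches = neighbor_joining(names, dist)
for a, b, length in branches:
    print(f"{a} - {b}: {length:.3f}")
```

Ties in the converted distances are broken by whichever pair `min` sees first; for this matrix either choice leads to the same unrooted tree and branch lengths.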


    Optimality-Based Methods

    • Clustering methods produce a single tree with no ability to judge how good it is compared to alternative tree topologies

    • Optimality-based methods compare all possible tree topologies and select a tree that best fits the distance matrix

    • Two algorithms:

      • Fitch-Margoliash

      • Minimum evolution


    Fitch-Margoliash

    • Selects best tree among all possible trees based on minimum deviation between distances calculated in the tree and distances in the distance matrix

    • Basically, a least squares method

    • Dij = distance between i and j in matrix

    • dij = distance between i and j in tree

    • Objective: Find tree that minimizes Σ_{i<j} (D_ij − d_ij)² / D_ij²
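As a sketch, the criterion can be scored for two candidate trees, assuming the usual Fitch-Margoliash weighting of each squared deviation by 1 / D_ij². The matrix is the a, b, c, d example from the Distance Matrices slide. The two sets of tree path lengths were worked out by hand for this illustration and are assumptions, as is the function name.

```python
def fitch_margoliash_error(D, d):
    """Weighted least-squares misfit between matrix distances D and tree distances d.

    Both arguments map a pair (i, j) to a distance. Each squared deviation is
    weighted by 1 / D_ij**2, so long distances are not allowed to dominate.
    """
    return sum((D[p] - d[p]) ** 2 / D[p] ** 2 for p in D)

# Distance matrix for taxa a, b, c, d
D = {("a", "b"): 6, ("a", "c"): 7, ("b", "c"): 3,
     ("a", "d"): 14, ("b", "d"): 10, ("c", "d"): 9}

# Candidate 1: clock-like tree (((b,c),a),d) with node heights 1.5, 3.25, 5.5
clock_tree = {("a", "b"): 6.5, ("a", "c"): 6.5, ("b", "c"): 3.0,
              ("a", "d"): 11.0, ("b", "d"): 11.0, ("c", "d"): 11.0}

# Candidate 2: unrooted tree ((a,b),(c,d)) with leaf branches a=5, b=1, c=1, d=8
# and an internal branch of length 1
additive_tree = {("a", "b"): 6, ("a", "c"): 7, ("b", "c"): 3,
                 ("a", "d"): 14, ("b", "d"): 10, ("c", "d"): 9}

err_clock = fitch_margoliash_error(D, clock_tree)
err_additive = fitch_margoliash_error(D, additive_tree)
print(f"clock-like tree: {err_clock:.4f}")
print(f"additive tree:   {err_additive:.4f}")
```

A full Fitch-Margoliash search would also fit the branch lengths for each topology; here they are fixed so that only the scoring step is shown.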


    Minimum Evolution

    • Similar to Fitch-Margoliash, but uses a different optimality criterion

    • Searches for a tree with the minimum total branch length

    • This is an indirect way of achieving the best fit of the branch lengths with the original data


    Summary of Distance-Based Methods

    • Clustering-based methods:

      • Computationally very fast and can handle large datasets that other methods cannot

      • Not guaranteed to find the best tree

    • Optimality-based methods:

      • Better overall accuracies

      • Computationally slow

    • All distance-based methods reduce the alignment to pairwise distances, so site-level sequence information is lost and the most likely state at an internal node cannot be inferred
