
BCB 444/544


Lecture 30

Phylogenetics – Distance-Based Methods

#30_Nov02

BCB 444/544 F07 ISU Terribilini #30- Phylogenetics - Distance-Based Methods

Wed Oct 31 - Lecture 29

Phylogenetics Basics

- Chp 10 - pp 127 - 141
Thurs Nov 1 - Lab 9

Gene & Regulatory Element Prediction

Fri Nov 2 - Lecture 30

Phylogenetics – Distance-Based Methods

- Chp 11 - pp 142 – 169
Mon Nov 5 - Lecture 31

Phylogenetics – Parsimony and ML

- Chp 11 - pp 142 - 169

Mon Oct 29 - HW#5

HW#5 = Hands-on exercises with phylogenetics

and tree-building software

Due: Mon Nov 5 (not Fri Nov 1 as previously posted)

Last week of classes will be devoted to Projects

- Written reports due:
- Mon Dec 3 (no class that day)

- Presentations: Wed-Fri Dec 5, 6, 7
- 1 or 2 teams will present during each class period

544 Extra#2

Due: √PART 1 - ASAP

PART 2 - meeting prior to 5 PM Fri Nov 2

Part 1 - Brief outline of Project, email to Drena & Michael

after response/approval, then:

Part 2 - More detailed outline of project

Read a few papers and summarize status of problem

Schedule meeting with Drena & Michael to discuss ideas

BCB List of URLs for Seminars related to Bioinformatics:

http://www.bcb.iastate.edu/seminars/index.html

- Nov 2 Fri - BCB Faculty Seminar 2:10 in 102 ScI
- Bob Jernigan BBMB, ISU
- Control of Protein Motions by Structure


SECTION IV MOLECULAR PHYLOGENETICS

Xiong: Chp 10 Phylogenetics Basics

- Evolution and Phylogenetics
- Terminology
- Gene Phylogeny vs. Species Phylogeny
- Forms of Tree Representation
- Why Finding a True Tree is Difficult
- Procedure of Building a Phylogenetic Tree

- Choose molecular markers
- Perform MSA
- Choose a model of evolution
- Determine tree building method
- Assess tree reliability

- Very closely related organisms - use nucleic acid sequences, which show more differences than protein sequences
- For individuals within a species - use noncoding regions of mtDNA, which have a faster mutation rate
- More distantly related species - use slowly evolving nucleic acid sequences, like ribosomal RNA, or protein sequences
- Very distantly related species - use highly conserved protein sequences

- Most critical step in tree building - cannot build correct tree without correct alignment
- Should build alignments with multiple programs, then inspect and compare to identify the most reasonable one
- Most alignments need manual editing
- Make sure important functional residues align
- Align secondary structure elements
- Use full alignment or just parts

- Rascal and NorMD – correct alignment errors, remove potentially unrelated or highly divergent sequences
- Gblocks – detect and eliminate poorly aligned positions and divergent regions

- Simple measure – just count the number of substitutions observed between the sequences in the MSA
- Problem – number of substitutions may not represent the number of evolutionary events that actually occurred

[Figure: diagram of nucleotide substitutions at a single site (C, A, A, T, G)]

Just because we see only one difference does not mean that there was only one evolutionary event

[Figure: diagram of nucleotide substitutions at a single site (A, A, A, T, G)]

Just because we see no difference does not mean that there were no evolutionary events

- Statistical models of evolution are used to correct for the multiple substitution problem
- Focus on DNA models

- Jukes-Cantor model assumes all nucleotides are substituted with equal probability
- Can be used to correct for multiple substitutions
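
The Jukes-Cantor correction can be sketched in a few lines. This is a minimal example, assuming p is the observed proportion of differing sites between two aligned DNA sequences and using the standard formula d = -(3/4) ln(1 - 4p/3); the function name is illustrative and not from the lecture.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site under the Jukes-Cantor model.

    p is the observed proportion of sites that differ (assumed input).
    """
    # The formula is undefined once p reaches 0.75, the value expected
    # for two random (saturated) DNA sequences.
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# 30% observed differences correspond to a larger corrected distance,
# because some sites are assumed to have changed more than once.
print(round(jukes_cantor_distance(0.30), 4))
```

The corrected distance is always at least as large as p, and the gap widens as the sequences diverge.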

- PAM and JTT substitution matrices already take into account multiple substitutions
- There are also models similar to Jukes-Cantor for protein sequences

- One of our assumptions was that all positions in a sequence are evolving at the same rate
- Bad assumption
- Third position in a codon changes with higher frequency
- In proteins, some amino acids can change and others cannot

- This variation is called among-site rate heterogeneity
- Many tree building programs have parameters meant to deal with this problem – adds to complexity of getting the correct tree

SECTION IV MOLECULAR PHYLOGENETICS

Xiong: Chp 11 Phylogenetic Tree Construction Methods and Programs

- Distance-Based Methods
- Character-Based Methods
- Phylogenetic Tree Evaluation
- Phylogenetic Programs

- Two main categories of tree building methods
- Distance-based
- Overall similarity between sequences

- Character-based
- Consider the entire MSA

- Given an MSA and an evolutionary model, calculate the distance between all pairs of sequences
- Construct distance matrix
- Construct phylogenetic tree based on the distance matrix
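
The first two steps above can be sketched as follows. This is an illustration only: the three-sequence alignment is made up, Jukes-Cantor is assumed as the evolutionary model, gap columns are skipped (one common convention), and the function names are hypothetical.

```python
import math
from itertools import combinations


def jc_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two rows of the same alignment."""
    # Compare only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: dict) -> dict:
    """Symmetric matrix, as nested dicts, of all pairwise distances."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = jc_distance(alignment[a], alignment[b])
    return matrix


# Made-up alignment, for illustration only.
msa = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "ACGAACGTTA"}
for name, row in distance_matrix(msa).items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

The resulting matrix is the only input the tree-building step sees.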

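Example distance matrix (lower triangle):

     a    b    c    d
a    0
b    6    0
c    7    3    0
d   14   10    9    0

[Figure: tree with leaves a, b, c, d drawn against a distance scale from 0 to 8]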

- Two ways to construct a tree based on a distance matrix
- Clustering
- Optimality

- E.g., UPGMA and Neighbor-Joining
- A cluster is a set of taxa
- Interspecies distances translate into intercluster distances
- Clusters are repeatedly merged
- “Closest” clusters merged first
- Distances are recomputed after merging

- UPGMA – Unweighted Pair Group Method Using Arithmetic Average
- Uses molecular clock assumption – all taxa evolve at a constant rate and are equally distant from the root (ultrametric tree)
- This assumption is usually wrong
- So why use UPGMA?
- Very fast
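
A minimal UPGMA sketch, to make the merge-and-recompute loop concrete. The input distances (a-b 6, a-c 7, b-c 3, a-d 14, b-d 10, c-d 9) are taken from the four-taxon example matrix earlier in the deck as read from this transcript; the function name and the nested-tuple tree format are assumptions made for illustration.

```python
def upgma(names, distances):
    """Cluster taxa by UPGMA.

    distances maps frozenset({x, y}) to the distance between x and y.
    Returns the rooted tree as nested tuples (left, right, node_height).
    """
    size = {name: 1 for name in names}  # leaves in each active cluster
    d = dict(distances)
    while len(size) > 1:
        # Merge the closest pair of active clusters.
        pair = min(d, key=d.get)
        i, j = sorted(pair, key=str)
        new = (i, j, d[pair] / 2)  # clock assumption: node sits halfway
        # Size-weighted average, so every original taxon counts equally.
        for k in size:
            if k not in pair:
                d[frozenset((new, k))] = (
                    size[i] * d[frozenset((i, k))]
                    + size[j] * d[frozenset((j, k))]
                ) / (size[i] + size[j])
        # Drop distances that involve the two merged clusters.
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        size[new] = size.pop(i) + size.pop(j)
    return next(iter(size))


example = {
    frozenset("ab"): 6, frozenset("ac"): 7, frozenset("bc"): 3,
    frozenset("ad"): 14, frozenset("bd"): 10, frozenset("cd"): 9,
}
print(upgma("abcd", example))
```

These example distances are not ultrametric (for a, c and d the two largest distances, 14 and 9, differ), so the output illustrates the mechanics of the method rather than a trustworthy tree.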

- Idea: Find a pair of taxa that are close to each other but far from other taxa
- Implicitly finds a pair of neighboring taxa

- No molecular clock assumption

- NJ corrects for unequal evolutionary rates between sequences by using a conversion step
- The conversion step requires calculation of “r-values” and “transformed r-values”

The r-value for a sequence i is:

$r_i = \sum_{j \neq i} d_{ij}$

The sum of the distances between sequence i and all other sequences

The transformed r-value for a sequence i is:

$r'_i = r_i / (n - 2)$

Where n is the number of taxa

Transformed r-values are used to determine the distance of a taxon to the nearest node

The converted distance between two sequences is:

These converted distances are used in building the tree

The final equation we need is for computing the distance from each taxon to a new cluster. Assume taxa i and j were merged into a cluster u. The distance from taxon i to cluster u is:
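
The slide equations for the conversion step are not present in this transcript. As a hedged reference, the standard neighbor-joining formulas are written out here in the r-value notation, with d_ij the original distance, n the current number of taxa, u the new node created by joining i and j, and k any other taxon. It is an assumption that the lecture's own equations were equivalent to these.

```latex
\begin{align}
r_i     &= \sum_{j \neq i} d_{ij}                 && \text{r-value} \\
r'_i    &= \frac{r_i}{n - 2}                      && \text{transformed r-value} \\
d'_{ij} &= d_{ij} - \left( r'_i + r'_j \right)    && \text{converted distance} \\
d_{iu}  &= \frac{d_{ij} + r'_i - r'_j}{2}, \qquad
d_{ju}   = d_{ij} - d_{iu}                        && \text{branch lengths to } u \\
d_{ku}  &= \frac{d_{ik} + d_{jk} - d_{ij}}{2}     && \text{distance from } k \text{ to } u
\end{align}
```

The pair with the smallest converted distance is joined first.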

- Initialize tree into a star shape with all taxa connected to the center
- Step 1: Compute r-values and transformed r-values for all taxa

- Step 2: Compute converted distances

- Step 3: Fill out converted distance matrix

- Step 4: Create a node by merging closest taxa
- In this example, the distance between A and B is the same as the distance between C and D
- We can pick either pair to start with
- Let’s pick A and B and create a node called U

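[Figure: star tree with taxa A, B, C, D; A and B are joined at a new node U, with the branch lengths A-U and B-U still unknown (?)]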

- Step 5: Compute branch lengths
- Use the equation for computing the distance from a taxon to a node

[Figure: node U joined to A with branch length 0.15 and to B with branch length 0.25]

- Step 6: Construct reduced distance matrix by computing converted distances from each taxon to the new node U
- In UPGMA, we simply calculated the average

Our reduced distance matrix:

- From here, we go back to step 1
- Continue until all taxa have been decomposed from the star tree
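
The six steps can be sketched end to end as below. Treat this as an illustration under stated assumptions: the distance matrix from the worked example is not in this transcript, so the input values are hypothetical, chosen so that the first join gives the 0.15 and 0.25 branch lengths shown for A and B. The conversion uses the standard neighbor-joining form d_ij - (r'_i + r'_j), and the function name, the edge-list output, and the node labels U1, U2 are all illustrative.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the unrooted tree as (child, parent, branch_length) edges.

    dist maps frozenset({x, y}) to the distance between taxa x and y.
    Taxon names must not clash with the generated labels U1, U2, ...
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    next_id = 1
    while len(nodes) > 3:
        n = len(nodes)
        # Step 1: r-values and transformed r-values.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        rt = {i: r[i] / (n - 2) for i in nodes}
        # Steps 2-3: converted distance for every pair.
        conv = {(i, j): d[frozenset((i, j))] - rt[i] - rt[j]
                for i, j in combinations(nodes, 2)}
        # Step 4: join the closest pair; rounding makes floating-point
        # ties break by name order.
        i, j = min(conv, key=lambda p: (round(conv[p], 9), p))
        u = f"U{next_id}"
        next_id += 1
        # Step 5: branch lengths from i and j to the new node u.
        dij = d[frozenset((i, j))]
        d_iu = (dij + rt[i] - rt[j]) / 2
        edges += [(i, u, d_iu), (j, u, dij - d_iu)]
        # Step 6: reduced matrix, distance from u to each remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # The last three nodes meet at one final internal node.
    x, y, z = nodes
    dxy, dxz, dyz = (d[frozenset(p)] for p in ((x, y), (x, z), (y, z)))
    center = f"U{next_id}"
    edges += [(x, center, (dxy + dxz - dyz) / 2),
              (y, center, (dxy + dyz - dxz) / 2),
              (z, center, (dxz + dyz - dxy) / 2)]
    return edges


# Hypothetical input distances (assumption, see the note above).
example = {
    frozenset("AB"): 0.40, frozenset("AC"): 0.35, frozenset("AD"): 0.60,
    frozenset("BC"): 0.45, frozenset("BD"): 0.70, frozenset("CD"): 0.55,
}
for child, parent, length in neighbor_joining("ABCD", example):
    print(child, parent, round(length, 2))
```

With these inputs the converted distances for A-B and C-D tie, and the sketch picks A and B first, as in the example.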

- Clustering methods produce a single tree with no ability to judge how good it is compared to alternative tree topologies
- Optimality-based methods compare all possible tree topologies and select a tree that best fits the distance matrix
- Two algorithms:
- Fitch-Margoliash
- Minimum evolution
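
To give a sense of what "all possible tree topologies" means in practice, the sketch below counts unrooted bifurcating trees for n taxa using the standard result (2n - 5)!! = 1 x 3 x 5 x ... x (2n - 5). The function name is illustrative.

```python
def unrooted_tree_count(n_taxa: int) -> int:
    """Number of distinct unrooted bifurcating trees for n_taxa leaves."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n))
```

The count grows so quickly that exhaustive comparison is only feasible for small numbers of taxa.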

- Selects best tree among all possible trees based on minimum deviation between distances calculated in the tree and distances in the distance matrix
- Basically, a least squares method
- Dij = distance between i and j in matrix
- dij = distance between i and j in tree
- Objective: Find tree that minimizes
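
The objective itself is not present in this transcript. A hedged sketch follows, assuming the classic Fitch-Margoliash criterion in which each squared deviation is weighted by 1/Dij^2, so the quantity minimized is the sum over pairs of (Dij - dij)^2 / Dij^2. The observed distances are the four-taxon example matrix from earlier in the deck; both candidate trees and their branch lengths are hypothetical, and a real implementation would fit branch lengths for each topology rather than take them as given.

```python
from itertools import combinations


def path_lengths(edges, leaves):
    """Along-the-tree distance between every pair of leaves."""
    adj = {}
    for a, b, length in edges:
        adj.setdefault(a, []).append((b, length))
        adj.setdefault(b, []).append((a, length))

    def walk(start):
        # Depth-first traversal accumulating branch lengths from start.
        seen, stack = {start: 0.0}, [start]
        while stack:
            node = stack.pop()
            for nxt, length in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + length
                    stack.append(nxt)
        return seen

    from_leaf = {leaf: walk(leaf) for leaf in leaves}
    return {frozenset((i, j)): from_leaf[i][j]
            for i, j in combinations(leaves, 2)}


def fitch_margoliash_error(observed, tree):
    """Weighted least-squares deviation between matrix and tree distances."""
    return sum((observed[p] - tree[p]) ** 2 / observed[p] ** 2 for p in observed)


observed = {
    frozenset("ab"): 6, frozenset("ac"): 7, frozenset("bc"): 3,
    frozenset("ad"): 14, frozenset("bd"): 10, frozenset("cd"): 9,
}
# Two hypothetical candidate trees with internal nodes X and Y.
tree_ab_cd = [("a", "X", 5), ("b", "X", 1), ("X", "Y", 1), ("c", "Y", 1), ("d", "Y", 8)]
tree_ac_bd = [("a", "X", 5), ("c", "X", 1), ("X", "Y", 1), ("b", "Y", 1), ("d", "Y", 8)]
for tree in (tree_ab_cd, tree_ac_bd):
    print(fitch_margoliash_error(observed, path_lengths(tree, "abcd")))
```

The tree with the smaller value is preferred; here the first candidate reproduces the matrix exactly.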

- Similar to Fitch-Margoliash, but uses a different optimality criterion
- Searches for a tree with the minimum total branch length
- This is an indirect way of achieving the best fit of the branch lengths with the original data

- Clustering-based methods:
- Computationally very fast and can handle large datasets that other methods cannot
- Not guaranteed to find the best tree

- Optimality-based methods:
- Better overall accuracies
- Computationally slow

- All distance-based methods reduce the alignment to pairwise distances, so site-level sequence information is lost and the most likely state at an internal node cannot be inferred
