Terminology of phylogenetic trees
Types of phylogenetic trees
Types of data
Character evolution
Approaches to phylogeny reconstruction

TERMINOLOGY OF PHYLOGENETIC TREES
Phylogenetic tree (dendrogram)
Nodes: branching points
Branches: lines
Topology: branching pattern
Sister taxa: two taxa that are more closely related to each other than either is to a third taxon (e.g., A and B; C and D).
relationships among the OTUs (operational taxonomic units).
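Soft polytomy: a node with more than two descendant branches that reflects lack of resolution (insufficient data) rather than simultaneous divergence.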
Unrooted tree: shows the degree of kinship among OTUs but not the evolutionary path (there is no root, so the direction of descent is unspecified).
3 OTUs: 1 unrooted tree, 3 rooted trees
4 OTUs: 3 unrooted trees, 15 rooted trees
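These counts follow the standard formulas for fully resolved (bifurcating) trees: (2n - 5)!! unrooted and (2n - 3)!! rooted trees for n OTUs. A minimal Python sketch, with illustrative function names, shows how quickly the numbers grow:

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n: int) -> int:
    """Number of bifurcating unrooted trees for n >= 3 OTUs: (2n - 5)!!."""
    if n < 3:
        raise ValueError("unrooted bifurcating trees need at least 3 OTUs")
    return double_factorial(2 * n - 5)


def count_rooted_trees(n: int) -> int:
    """Number of bifurcating rooted trees for n >= 2 OTUs: (2n - 3)!!."""
    if n < 2:
        raise ValueError("rooted bifurcating trees need at least 2 OTUs")
    return double_factorial(2 * n - 3)


for n in (3, 4, 10):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

For 10 OTUs this already gives 2,027,025 unrooted and 34,459,425 rooted trees, which is why exhaustive searches become impractical.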
TYPES OF TREES
- text-based representation of relationships.
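The slide does not name a specific format; one widely used text-based representation is Newick (parenthetical) notation, assumed here. A minimal sketch that renders a tree written as nested Python tuples (a hypothetical in-memory representation) as a Newick string:

```python
def to_newick(tree) -> str:
    """Render a tree given as nested tuples of taxon names in Newick notation."""
    if isinstance(tree, str):  # a leaf (OTU)
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


# ((A,B),(C,D)): A and B are sister taxa, as are C and D.
print(to_newick((("A", "B"), ("C", "D"))) + ";")  # ((A,B),(C,D));
```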
TYPES OF DATA
Quantitative: continuous data (e.g., height or length)
Qualitative: discrete (2 or more values)
Binary: 2 values
Multistate: more than 2 values
Most molecular data are qualitative
Binary: presence or absence of a band, or of a gap in a sequence
Multistate: nucleotide data (A, T, G, C)
Characters: positions in the nucleotide sequence (e.g., position 352).
Character states: the nucleotide at that position in the sequence (G, A, T, or C).
CHARACTER EVOLUTION
Unordered: a change from any character state to any other occurs in one step (e.g., nucleotide substitutions).
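Ordered: the number of steps from one state to another equals the absolute value of the difference between their state numbers.
1 → 2 → 3 → 4 → 5: changing state 1 into state 5 requires 4 steps
5 → 4 → 3 → 2 → 1: changing state 5 into state 1 requires 4 steps
(Characters may be reversible, as in this example, or irreversible.)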
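The step counts above can be written as cost functions. A minimal sketch; the irreversible variant, in which states may only increase, is a hypothetical illustration of the reversible/irreversible distinction rather than something defined on the slide:

```python
import math


def unordered_cost(a, b) -> int:
    """Unordered character: any change between two different states costs one step."""
    return 0 if a == b else 1


def ordered_cost(a: int, b: int) -> int:
    """Ordered character: cost is the absolute difference between state numbers."""
    return abs(a - b)


def irreversible_ordered_cost(a: int, b: int) -> float:
    """Hypothetical irreversible variant: states may only increase; reversals are forbidden."""
    return b - a if b >= a else math.inf


print(unordered_cost("A", "G"))                 # 1
print(ordered_cost(1, 5), ordered_cost(5, 1))   # 4 4
print(irreversible_ordered_cost(5, 1))          # inf
```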
Character state changes can be described by:
(1) the number of discrete steps required for one character state to change into another;
(2) the probability with which such a change occurs.
APPROACHES TO PHYLOGENY RECONSTRUCTION
Cladistics (parsimony): recency of common ancestry
Maximum likelihood: model of sequence evolution
Phenetics (UPGMA, neighbor joining): overall similarity
Principle of parsimony: a general scientific criterion for choosing among competing hypotheses, which states that we should accept the hypothesis that explains the data most simply and efficiently.
Maximum parsimony method of phylogeny reconstruction:
The optimum reconstruction of ancestral character states is
the one which requires the fewest mutations in the phylogenetic
tree to account for contemporary character states.
1. Identify all of the informative sites.
Invariant: all OTUs possess the same character state at the site.
Any invariant site is uninformative.
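Informative: favors a subset of trees over the other possible trees (under parsimony, a site is informative only if at least two character states each occur in at least two OTUs).
Uninformative: a character that contains no grouping information relevant to a cladistic problem (e.g., autapomorphies, states unique to a single OTU).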
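Step 1 can be automated with the usual parsimony rule for informative sites. A minimal sketch, assuming an alignment stored as a dict of equal-length sequences; the four-sequence alignment is made up for illustration:

```python
from collections import Counter


def is_informative(column: str) -> bool:
    """At least two character states must each occur in at least two OTUs."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def classify_sites(seqs: dict[str, str]) -> dict[str, list[int]]:
    """Split positions into invariant, informative, and other uninformative sites.

    Invariant sites are also uninformative; they are listed separately for clarity.
    """
    columns = ["".join(site) for site in zip(*seqs.values())]
    result = {"invariant": [], "informative": [], "uninformative": []}
    for i, col in enumerate(columns):
        if len(set(col)) == 1:
            result["invariant"].append(i)
        elif is_informative(col):
            result["informative"].append(i)
        else:
            result["uninformative"].append(i)  # e.g. a state unique to one OTU
    return result


seqs = {"A": "AAAAG", "B": "AAGCA", "C": "AGAGG", "D": "AGATA"}
print(classify_sites(seqs))
# {'invariant': [0], 'informative': [1, 4], 'uninformative': [2, 3]}
```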
2. Calculate the minimum number of substitutions at each informative site.
(Example: an informative site that favors tree 1 over the other 2 trees.)
3. Sum the number of changes over all informative sites for each possible tree, and choose the tree associated with the smallest number of changes.
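The slides do not name an algorithm for step 2; a standard choice is Fitch's algorithm, which counts the minimum number of changes at one site on a rooted binary tree (rooting an unrooted tree anywhere gives the same count). A minimal sketch of steps 2 and 3, assuming trees are nested tuples and using a hypothetical three-site alignment in which every site is informative:

```python
def fitch(tree, site: dict[str, str]):
    """Fitch's algorithm on a rooted binary tree (nested tuples of taxon names).

    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_set, left_cost = fitch(tree[0], site)
    right_set, right_cost = fitch(tree[1], site)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change needed


def tree_length(tree, seqs: dict[str, str]) -> int:
    """Step 3: sum the minimum number of changes over all sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {t: s[i] for t, s in seqs.items()})[1] for i in range(n_sites))


seqs = {"A": "ACG", "B": "ACA", "C": "GTG", "D": "GTA"}
# The three possible unrooted topologies for four OTUs, written as rooted tuples.
trees = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
lengths = [tree_length(t, seqs) for t in trees]
print(lengths)                              # [4, 5, 6]
print(trees[lengths.index(min(lengths))])   # (('A', 'B'), ('C', 'D'))
```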
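Exhaustive search method: searches all possible fully resolved topologies and guarantees that all of the minimum-length cladograms will be found.
(Not a practical option except for small numbers of taxa: the number of possible trees grows explosively, so the search is very time consuming.)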
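An exhaustive search can be sketched by generating every rooted topology through stepwise addition (attaching each new taxon to every branch) and scoring each with Fitch parsimony. This is a minimal illustration under the same assumptions as before (nested-tuple trees, hypothetical data); enumerating rooted trees visits each unrooted topology several times, which is harmless here because parsimony length does not depend on the root:

```python
def fitch_length(tree, seqs):
    """Fitch parsimony length summed over all sites (tree: nested tuples of taxon names)."""
    def visit(node, i):
        if isinstance(node, str):
            return {seqs[node][i]}, 0
        (ls, lc), (rs, rc) = visit(node[0], i), visit(node[1], i)
        return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)
    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, i)[1] for i in range(n_sites))


def add_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to each branch of `tree`
    (including the branch above the root)."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def all_rooted_trees(taxa):
    """Every fully resolved rooted topology, built by stepwise addition."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for t in trees for new in add_leaf(t, leaf)]
    return trees


def exhaustive_search(seqs):
    """Score every topology; return the minimum length and all trees achieving it."""
    scored = [(fitch_length(t, seqs), t) for t in all_rooted_trees(sorted(seqs))]
    best = min(length for length, _ in scored)
    return best, [t for length, t in scored if length == best]


seqs = {"A": "ACG", "B": "ACA", "C": "GTG", "D": "GTA"}
print(len(all_rooted_trees(sorted(seqs))))  # 15
best, optimal = exhaustive_search(seqs)
print(best, len(optimal))                   # 4 5 (the five rootings of ((A,B),(C,D)))
```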
Branch and bound methods: begin with a cladogram whose length is retained as an upper bound during subsequent cladogram construction. Taxa are then added one at a time; as soon as the length of a partially built cladogram exceeds the upper bound, that cladogram (and every tree that could be built from it) is abandoned. A completed cladogram of equal length is saved as an additional optimal topology; one of lesser length replaces the original as the optimal topology, and its length becomes the new upper bound. (A good option for fewer than 20 taxa, but time consuming.)
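The branch-and-bound idea can be sketched on top of stepwise addition: because adding a taxon can never shorten a parsimony tree, any partial tree already longer than the bound can be discarded together with everything built from it. A minimal sketch; the "ladder" starting cladogram and the data are hypothetical choices:

```python
def fitch_length(tree, seqs):
    """Fitch parsimony length of a (possibly partial) tree, counting only the taxa it contains."""
    def visit(node, i):
        if isinstance(node, str):
            return {seqs[node][i]}, 0
        (ls, lc), (rs, rc) = visit(node[0], i), visit(node[1], i)
        return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)
    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, i)[1] for i in range(n_sites))


def add_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to each branch of `tree`."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def branch_and_bound(seqs):
    taxa = sorted(seqs)
    # Starting cladogram (hypothetical choice): a "ladder" adding taxa in order.
    start = taxa[0]
    for leaf in taxa[1:]:
        start = (start, leaf)
    bound, optimal = fitch_length(start, seqs), []

    def grow(tree, k):
        nonlocal bound, optimal
        length = fitch_length(tree, seqs)
        if length > bound:       # adding taxa cannot shorten it: abandon this branch
            return
        if k == len(taxa):       # a complete cladogram
            if length < bound:   # shorter: new optimum and new upper bound
                bound, optimal = length, [tree]
            else:                # equal length: another optimal topology
                optimal.append(tree)
            return
        for new_tree in add_leaf(tree, taxa[k]):
            grow(new_tree, k + 1)

    grow(taxa[0], 1)
    return bound, optimal


seqs = {"A": "ACG", "B": "ACA", "C": "GTG", "D": "GTA"}
bound, optimal = branch_and_bound(seqs)
print(bound, len(optimal))  # 4 5
```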
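Heuristic methods: approximate, “hill climbing” techniques. Begin with a cladogram, add taxa, and swap branches, keeping each rearrangement that gives a shorter cladogram until no shorter one is found. Because the search can stop at a local optimum, the procedure is replicated many times (e.g., with different taxon addition orders) to increase the chance of finding the minimum-length cladogram.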
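A minimal hill-climbing sketch: taxa are added greedily in a random order, then branches are swapped by nearest-neighbor interchange (NNI), one simple swapping scheme chosen here for brevity; real programs typically also offer broader rearrangements such as SPR or TBR. The whole procedure is replicated with different random addition orders. Names and data are illustrative:

```python
import random


def fitch_length(tree, seqs):
    """Fitch parsimony length summed over all sites (tree: nested tuples of taxon names)."""
    def visit(node, i):
        if isinstance(node, str):
            return {seqs[node][i]}, 0
        (ls, lc), (rs, rc) = visit(node[0], i), visit(node[1], i)
        return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)
    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, i)[1] for i in range(n_sites))


def add_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to each branch of `tree`."""
    yield (tree, leaf)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in add_leaf(left, leaf):
            yield (new_left, right)
        for new_right in add_leaf(right, leaf):
            yield (left, new_right)


def nni_neighbors(tree):
    """Branch swapping by NNI on a rooted binary tree: at each internal node (x, y),
    swap y with one of x's children, or x with one of y's children."""
    if isinstance(tree, str):
        return
    x, y = tree
    if not isinstance(x, str):
        a, b = x
        yield ((y, b), a)
        yield ((a, y), b)
    if not isinstance(y, str):
        c, d = y
        yield (c, (x, d))
        yield (d, (c, x))
    for new_x in nni_neighbors(x):
        yield (new_x, y)
    for new_y in nni_neighbors(y):
        yield (x, new_y)


def heuristic_search(seqs, replicates=10, seed=0):
    rng = random.Random(seed)
    best_len, best_tree = None, None
    for _ in range(replicates):
        order = sorted(seqs)
        rng.shuffle(order)                      # random taxon addition order
        tree = order[0]
        for leaf in order[1:]:                  # greedy stepwise addition
            tree = min(add_leaf(tree, leaf), key=lambda t: fitch_length(t, seqs))
        length = fitch_length(tree, seqs)
        improved = True
        while improved:                         # climb until no swap shortens the tree
            improved = False
            for neighbor in nni_neighbors(tree):
                n_len = fitch_length(neighbor, seqs)
                if n_len < length:
                    tree, length, improved = neighbor, n_len, True
                    break
        if best_len is None or length < best_len:
            best_len, best_tree = length, tree
    return best_len, best_tree


seqs = {"A": "ACG", "B": "ACA", "C": "GTG", "D": "GTA"}
print(heuristic_search(seqs, replicates=5, seed=1)[0])  # 4
```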
Unweighted parsimony: all character state changes are
given equal weight in the step matrix.
Weighted parsimony: different weights are assigned to different character state changes.
Transversion parsimony: transitions are completely ignored in the analysis; only transversions are considered.
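The three weighting schemes can be expressed as nucleotide step matrices. A minimal sketch; the transversion weight of 2 in the weighted scheme is a hypothetical value, not one given on the slides:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """Transitions are purine<->purine or pyrimidine<->pyrimidine changes."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def step_cost(a: str, b: str, scheme: str = "unweighted", tv_weight: float = 2.0) -> float:
    """Cost of changing nucleotide a into b under three parsimony weighting schemes.

    tv_weight is a hypothetical transversion weight used only by the "weighted" scheme.
    """
    if a == b:
        return 0.0
    if scheme == "unweighted":
        return 1.0
    if scheme == "weighted":
        return 1.0 if is_transition(a, b) else tv_weight
    if scheme == "transversion":
        return 0.0 if is_transition(a, b) else 1.0  # transitions ignored
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("unweighted", "weighted", "transversion"):
    print(scheme, [[step_cost(a, b, scheme) for b in "ACGT"] for a in "ACGT"])
```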
The likelihood (L) of a phylogenetic tree is the
probability of observing the data (nucleotide sequences)
under a given tree and a specified model of
character state changes.
The aim is to find the tree (among all possible trees)
with the highest L value.
Jukes and Cantor one-parameter model: all substitutions occur with equal probability
Kimura two-parameter model: transitions occur more frequently than transversions
Other, more complex models…
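Under the Jukes and Cantor model, the probability of each outcome along a branch depends only on the branch length d (expected substitutions per site). A minimal sketch of those transition probabilities; the Kimura model, which treats transitions and transversions separately, is not implemented here:

```python
import math


def jc69_prob(x: str, y: str, d: float) -> float:
    """Jukes-Cantor probability that base x is observed as base y after a branch of
    length d (expected substitutions per site); all changes are equally likely."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


for d in (0.0, 0.1, 1.0, 10.0):
    print(d, round(jc69_prob("A", "A", d), 4), round(jc69_prob("A", "G", d), 4))
```

At d = 0 the base is unchanged with probability 1; as d grows, every base approaches probability 1/4.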
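1. Calculate the likelihood (L) for each site on a given tree.
2. Sum the log-likelihood (ln L) values for all sites on that tree (equivalent to multiplying the site likelihoods).
3. Compare the L values for all possible trees.
4. Choose the tree with the highest L value.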
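Steps 1 to 3 can be sketched with Felsenstein's pruning algorithm under the Jukes and Cantor model. The nested-tuple tree format (a leaf is a taxon name; an internal node is a tuple of `(child, branch_length)` pairs), the equal base frequencies at the root, and the branch lengths and sequences are all assumptions for illustration; real programs also optimize branch lengths:

```python
import math

BASES = "ACGT"


def jc69_prob(x, y, d):
    """Jukes-Cantor transition probability along a branch of length d."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def conditional_likelihoods(node, site):
    """Pruning: for each base x, P(observed data below node | node has base x)."""
    if isinstance(node, str):
        return [1.0 if x == site[node] else 0.0 for x in BASES]
    result = [1.0] * 4
    for child, length in node:
        child_l = conditional_likelihoods(child, site)
        for i, x in enumerate(BASES):
            result[i] *= sum(jc69_prob(x, y, length) * child_l[j] for j, y in enumerate(BASES))
    return result


def log_likelihood(tree, seqs):
    """Likelihood of each site (equal base frequencies at the root), summed as ln L."""
    total = 0.0
    for i in range(len(next(iter(seqs.values())))):
        site = {taxon: s[i] for taxon, s in seqs.items()}
        site_l = sum(0.25 * p for p in conditional_likelihoods(tree, site))
        total += math.log(site_l)
    return total


seqs = {"A": "AAC", "B": "AAC", "C": "GGC", "D": "GGC"}
tree1 = (((("A", 0.1), ("B", 0.1)), 0.1), ((("C", 0.1), ("D", 0.1)), 0.1))  # ((A,B),(C,D))
tree2 = (((("A", 0.1), ("C", 0.1)), 0.1), ((("B", 0.1), ("D", 0.1)), 0.1))  # ((A,C),(B,D))
scores = {"((A,B),(C,D))": log_likelihood(tree1, seqs), "((A,C),(B,D))": log_likelihood(tree2, seqs)}
print(max(scores, key=scores.get))  # ((A,B),(C,D))
```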
Distance methods: evolutionary distances (e.g., estimated numbers of substitutions per site) are computed for all pairs of taxa, and a tree is built from the resulting distance matrix.
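One common way to turn observed differences into an estimated number of substitutions is the Jukes and Cantor correction, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. A minimal sketch of building a pairwise distance table with it; the correction is one choice among several:

```python
import math
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69_distance(s1: str, s2: str) -> float:
    """Jukes-Cantor corrected distance (estimated substitutions per site)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # correction undefined: sequences look saturated
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: dict[str, str]) -> dict[frozenset, float]:
    """Distances for all pairs of taxa, keyed by frozenset({a, b})."""
    return {frozenset((a, b)): jc69_distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}


print(round(jc69_distance("AAAA", "AAAG"), 4))  # 0.3041 (the raw p-distance is 0.25)
```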
UPGMA: unweighted pair group method with arithmetic mean
- assumes a constant rate of substitution in all lineages (molecular clock)
- sequential clustering algorithm
- taxa (or clusters) are joined in order of decreasing similarity
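A minimal UPGMA sketch: repeatedly merge the closest pair of clusters, place the new node at half their distance (the clock assumption), and set the new cluster's distance to every other cluster to the size-weighted arithmetic mean. The distance matrix is made up for illustration:

```python
from itertools import combinations


def upgma(taxa, dist):
    """UPGMA on a distance dict keyed by frozenset({a, b}).

    Returns the tree as nested tuples and the height (distance / 2) of each merged cluster.
    """
    d = dict(dist)
    size = {t: 1 for t in taxa}
    tree = {t: t for t in taxa}
    heights = {}
    active = list(taxa)
    while len(active) > 1:
        # The closest (most similar) pair of clusters is merged first.
        i, j = min(combinations(active, 2), key=lambda p: d[frozenset(p)])
        new = f"({i},{j})"
        heights[new] = d[frozenset((i, j))] / 2
        for k in active:
            if k not in (i, j):
                # Arithmetic mean of all taxon-to-taxon distances (size-weighted).
                d[frozenset((new, k))] = (
                    size[i] * d[frozenset((i, k))] + size[j] * d[frozenset((j, k))]
                ) / (size[i] + size[j])
        size[new] = size[i] + size[j]
        tree[new] = (tree[i], tree[j])
        active = [k for k in active if k not in (i, j)] + [new]
    return tree[active[0]], heights


pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 6, ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4}
dist = {frozenset(p): v for p, v in pairs.items()}
tree, heights = upgma(["A", "B", "C", "D"], dist)
print(tree)     # (('A', 'B'), ('C', 'D'))
print(heights)  # {'(A,B)': 1.0, '(C,D)': 2.0, '((A,B),(C,D))': 3.0}
```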
Neighbor joining: seeks the shortest (minimum evolution) tree by successively choosing the pair of neighbors whose joining minimizes the total length of the tree. The chosen pair is joined and replaced in the distance matrix by a single new OTU, and the process repeats until the tree is complete.
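The pair to join is chosen with the usual Q criterion, Q(a, b) = (n - 2) d(a, b) - r(a) - r(b), where r is the sum of a taxon's distances; this is not the same as picking the closest pair. A minimal sketch, using a small illustrative five-taxon matrix:

```python
from itertools import combinations


def neighbor_joining(taxa, dist):
    """Neighbor joining on a symmetric nested dict dist[a][b].

    Returns the list of joins (a, b, branch_a, branch_b) and the final connecting branch.
    """
    d = {a: dict(dist[a]) for a in taxa}
    nodes = list(taxa)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        branch_a = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = d[a][b] - branch_a
        u = f"({a},{b})"  # the joined pair becomes one new OTU
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b, branch_a, branch_b))
    x, y = nodes
    return joins, (x, y, d[x][y])


labels = "abcde"
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7], [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
dist = {a: {b: matrix[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
joins, last = neighbor_joining(list(labels), dist)
print(joins[0])  # ('a', 'b', 2.0, 3.0)
```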
Consensus trees are derived from a set of trees and
summarize the phylogenetic information of several
trees in a single tree.
Most commonly used consensus trees:
Strict consensus: all conflicting branching patterns are collapsed; only groups present in every tree are retained.
50% majority rule consensus: branching patterns that occur in more than 50% of the trees are retained; all others are collapsed.
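Both consensus rules reduce to counting how often each group (clade) appears across the input trees. A minimal sketch, assuming rooted trees as nested tuples (unrooted trees would be compared by their splits instead); the three trees are hypothetical:

```python
from collections import Counter


def leaves(tree) -> frozenset:
    """Set of taxon names below a node."""
    return frozenset([tree]) if isinstance(tree, str) else frozenset().union(*map(leaves, tree))


def clades(tree) -> set:
    """Groups defined by the internal nodes below the root of a rooted tree."""
    found = set()

    def walk(node):
        if not isinstance(node, str):
            found.add(leaves(node))
            for child in node:
                walk(child)

    for child in tree:
        walk(child)
    return found


def consensus(trees, rule: str = "majority") -> set:
    """Clades kept by the strict (all trees) or majority-rule (more than half) consensus."""
    counts = Counter(c for t in trees for c in clades(t))
    if rule == "strict":
        return {c for c, n in counts.items() if n == len(trees)}
    return {c for c, n in counts.items() if n > len(trees) / 2}


trees = [(("A", "B"), ("C", "D")), ((("A", "B"), "C"), "D"), (("A", "C"), ("B", "D"))]
print(consensus(trees, "strict"))    # set(): every group conflicts with some tree
print(consensus(trees, "majority"))  # {frozenset({'A', 'B'})} (in 2 of 3 trees)
```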
An inferred tree is constructed from the original data set.
Characters (e.g., aligned sites) are resampled from the data set with replacement, producing pseudo-replicate data sets of the same size.
Resampling is replicated many (typically 100-1000) times.
A bootstrap tree is constructed from each resampled data set.
The bootstrap trees are compared to the original inferred tree.
The percentage of bootstrap trees supporting each node is determined for every node in the tree.
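A minimal bootstrap sketch: alignment columns are resampled with replacement, a tree is built from each pseudo-replicate, and support is the fraction of replicates that recover the group of interest. To keep it short, the tree builder is a hypothetical stand-in for exactly four taxa (pick the split whose within-pair distances are smallest); any real method could be passed as `build` instead:

```python
import random


def quartet_split(seqs):
    """Stand-in tree builder for four taxa: choose the split whose within-pair
    Hamming distances sum to the smallest value."""
    a, b, c, d = sorted(seqs)

    def dist(x, y):
        return sum(p != q for p, q in zip(seqs[x], seqs[y]))

    options = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    best = min(options, key=lambda o: dist(*o[0]) + dist(*o[1]))
    return frozenset(map(frozenset, best))


def bootstrap_support(seqs, build, target, replicates=100, seed=0):
    """Fraction of bootstrap replicates whose tree recovers `target`."""
    rng = random.Random(seed)
    n_sites = len(next(iter(seqs.values())))
    hits = 0
    for _ in range(replicates):
        # Resample columns (characters) with replacement, keeping the original length.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {t: "".join(s[c] for c in cols) for t, s in seqs.items()}
        hits += build(pseudo) == target
    return hits / replicates


seqs = {"A": "AAGGCT", "B": "AAGGCT", "C": "GGAATC", "D": "GGAATC"}
ab_cd = frozenset({frozenset({"A", "B"}), frozenset({"C", "D"})})
print(bootstrap_support(seqs, quartet_split, ab_cd))  # 1.0: every site supports AB|CD
```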
Homoplasy:
- resemblance not due to common ancestry
- evolved independently
- considered “noise”
ancestors at each node known.
Hillis & Huelsenbeck 1992
tested the ability of different methods
to recover the “true” phylogeny.
Maximum parsimony and
maximum likelihood performed
well, UPGMA & neighbor
joining did not.
UPGMA & neighbor joining: fast, but not as accurate as maximum parsimony or maximum likelihood.
Maximum parsimony: time consuming, but more accurate; can combine morphological characters with DNA characters in a single analysis.
Maximum likelihood: very time consuming; including information from morphology is a new (and controversial) technique; can invoke a specific model of sequence evolution.
Reference: Hillis et al. (1996). Molecular Systematics, 2nd ed. Sinauer Associates. ISBN 0-87893-282-8.