CIS 786, Lecture 2
Usman Roshan
http://creativecommons.org/licenses/by-sa/2.0/
Phylogenetics
Study of how species relate to each other
“Nothing in biology makes sense, except in the light of evolution”, Theodosius Dobzhansky, Am. Biol. Teacher (1973)
Rich in computational problems
[Figure: a tree on five taxa U, V, W, X, Y, shown with the sequences AGGGCAT, TAGCCCA, TAGACTT, TGCACAA, TGCGCTT]
UPGMA is not guaranteed to recover the tree from an additive matrix in general, but it works for ultrametric trees. It takes O(n^2) time.
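A minimal sketch of UPGMA may help make the clustering step concrete. The function name `upgma`, the nested-tuple tree representation, and the `frozenset`-keyed distance dictionary are illustrative choices, not anything from the lecture, and the matrix values are the ultrametric example as best it can be read from the slide figure (d(A,B) = d(C,D) = 6, all other pairs 26). This naive version rescans all pairs on every merge, so it runs in O(n^3) rather than the O(n^2) quoted above.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster leaves by average linkage.

    names: list of leaf names.
    dist:  dict mapping frozenset({x, y}) -> distance for every pair of leaves.
    Returns (tree, merges): tree is a nested tuple of leaf names and
    merges lists (left, right, height) in the order clusters were joined.
    """
    size = {name: 1 for name in names}   # cluster -> number of leaves in it
    d = dict(dist)                       # working copy, extended as clusters merge
    merges = []
    while len(size) > 1:
        # Pick the closest pair of current clusters (first one wins on ties).
        x, y = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((x, y))] / 2
        merged = (x, y)
        # Distance from the new cluster to every other cluster is the
        # size-weighted average of the distances from its two parts.
        for z in size:
            if z != x and z != y:
                d[frozenset((merged, z))] = (
                    size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[merged] = size.pop(x) + size.pop(y)
        merges.append((x, y, height))
    (tree,) = size
    return tree, merges

# Ultrametric example (values assumed from the slide figure).
pairs = {("A", "B"): 6, ("A", "C"): 26, ("A", "D"): 26,
         ("B", "C"): 26, ("B", "D"): 26, ("C", "D"): 6}
matrix = {frozenset(p): v for p, v in pairs.items()}
tree, merges = upgma(["A", "B", "C", "D"], matrix)
print(tree)    # (('A', 'B'), ('C', 'D'))
print(merges)  # heights 3.0, 3.0, 13.0
```

On this input the two cherries are joined at height 3 and the root sits at height 13, which is what an ultrametric tree with leaf edges of length 3 and internal edges of length 10 requires.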
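[Figure: UPGMA on an ultrametric tree with leaves A, B, C, D. Distance matrix: d(A,B) = 6, d(C,D) = 6, all other pairs 26. Leaf edges have length 3 and the two internal edges have length 10; UPGMA joins (A,B) and (C,D) and places the root at height 13.]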
UPGMA doesn’t work (in general) for non-ultrametric trees
[Figure: a non-ultrametric tree on leaves A, B, C, D with its distance matrix]
UPGMA constructs incorrect tree here
[Figure: UPGMA tree built from the same distance matrix, with branch heights 6 and 7.25; it groups B with C]
Bipartition (BC,AD) is not in true tree
[Figure: two panels on leaves A, B, C, D. True tree (edge lengths 3 and 10) and UPGMA tree (heights 6 and 7.25)]
Neighbor joining (NJ) constructs the correct tree for additive matrices
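To illustrate the claim, here is a small sketch of neighbor joining using the standard selection criterion Q(i,j) = (n-2)d(i,j) - r_i - r_j, where r_i is the sum of distances from i to all other nodes. The function name `neighbor_joining` and the matrix below are assumptions made for this example: the matrix is a hypothetical additive one generated from the tree AB|CD with pendant edges A = 1, B = 5, C = 1, D = 5 and internal edge 1, not the matrix from the slide. Branch lengths are not computed; only the order of joins is returned.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return (tree, joins) for a distance matrix.

    names: list of leaf names.
    dist:  dict mapping frozenset({x, y}) -> distance for every pair of leaves.
    tree is a nested tuple holding the last two clusters (an unrooted tree);
    joins lists the pairs in the order they were merged.
    """
    nodes = list(names)
    d = dict(dist)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # r[x]: total distance from x to every other current node.
        r = {x: sum(d[frozenset((x, y))] for y in nodes if y != x) for x in nodes}
        # Join the pair minimizing the Q-criterion (first pair wins on ties).
        x, y = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = (x, y)
        # Distance from the new internal node to each remaining node.
        for z in nodes:
            if z != x and z != y:
                d[frozenset((new, z))] = (
                    d[frozenset((x, z))] + d[frozenset((y, z))] - d[frozenset((x, y))]
                ) / 2
        nodes = [z for z in nodes if z != x and z != y] + [new]
        joins.append((x, y))
    return tuple(nodes), joins

# Hypothetical additive matrix from the tree AB|CD described above.
pairs = {("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 7,
         ("B", "C"): 7, ("B", "D"): 11, ("C", "D"): 6}
matrix = {frozenset(p): v for p, v in pairs.items()}
tree, joins = neighbor_joining(["A", "B", "C", "D"], matrix)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

Note that the smallest entry in this matrix is d(A,C) = 3, so a method that always joins the closest pair would start with A and C, while the Q-criterion selects (A,B), which is a cherry of the generating tree.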
[Figure: tree on leaves A, B, C, D with its additive distance matrix]
Can be studied experimentally or theoretically
Theoretical results offer loose bounds
Experiments (under simulation) provide more realistic bounds on sequence lengths
Sequence lengths required to obtain 90% accuracy
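Maximum parsimony (MP): find a tree on the input sequences, with sequences labeling the internal nodes, such that the sum of Hamming distances across its edges is minimized.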
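As a worked example of the quantity being minimized, the sketch below scores a fully labeled tree by summing Hamming distances over its edges. The node names (`l1` to `l4`, `x`, `y`) and both function names are hypothetical; the sequences and internal labels are those of the four-leaf tree from the slides in which ACT and ACA hang off an internal node labeled ACA, and GTT and GTA hang off an internal node labeled GTA.

```python
def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s, t))

def labeled_tree_score(labels, edges):
    """Sum of Hamming distances over all edges of a fully labeled tree."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)

# Leaves l1..l4 and internal nodes x, y (names are illustrative only).
labels = {"l1": "ACT", "l2": "ACA", "l3": "GTT", "l4": "GTA",
          "x": "ACA", "y": "GTA"}
edges = [("l1", "x"), ("l2", "x"), ("x", "y"), ("y", "l3"), ("y", "l4")]
print(labeled_tree_score(labels, edges))  # 4
```

The edge costs are 1, 0, 2, 1, 0, which gives the MP score of 4.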
[Figure: alternative trees and internal labelings for the sequences ACT, ACA, GTT, GTA, with Hamming distances on the edges. The examples shown have MP score = 7 and MP score = 5.]
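[Figure: optimal MP tree. ACT and ACA attach to an internal node labeled ACA, GTT and GTA attach to an internal node labeled GTA; nonzero edge costs are 1, 2, 1. MP score = 4]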
For a fixed tree, an optimal labeling can be computed in linear time O(nk)
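One well-known way to get the optimal score of a fixed tree is Fitch's algorithm, sketched here under a few assumptions: the tree is given as nested 2-tuples of leaf sequences (a representation chosen for this example), it is rooted arbitrarily (the parsimony score does not depend on where the root is placed), and all sequences have the same length. The function name `fitch_score` is illustrative. Only the bottom-up pass that computes the score is shown, not the top-down pass that assigns the internal labels.

```python
def fitch_score(tree):
    """Minimum number of changes needed on a rooted binary tree.

    tree: a leaf sequence (str) or a pair (left, right) of subtrees.
    """
    def pass_up(node):
        # Returns (candidate state sets per site, cost within this subtree).
        if isinstance(node, str):
            return [{c} for c in node], 0
        left, right = node
        left_sets, left_cost = pass_up(left)
        right_sets, right_cost = pass_up(right)
        sets, cost = [], left_cost + right_cost
        for a, b in zip(left_sets, right_sets):
            common = a & b
            if common:
                sets.append(common)   # children agree: no change needed here
            else:
                sets.append(a | b)    # children disagree: one change
                cost += 1
        return sets, cost

    return pass_up(tree)[1]

print(fitch_score((("ACT", "ACA"), ("GTT", "GTA"))))  # 4
print(fitch_score((("ACT", "GTT"), ("ACA", "GTA"))))  # 5
print(fitch_score((("ACT", "GTA"), ("ACA", "GTT"))))  # 6
```

Each node does a constant amount of set work per site for a fixed alphabet, so the whole pass is O(nk) for n sequences of length k. Of the three possible topologies on these four sequences, the pairing of ACT with ACA gives the lowest score, 4.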
Finding the optimal MP tree is NP-hard
[Figure: cost plotted over the space of phylogenetic trees, showing a local optimum and the global optimum]
Greedy-MP takes O(n^2k^2) time
NNI: for each internal edge we get two different topologies
Neighborhood size is 2n-6
SPR: neighborhood size is quadratic in the number of taxa
Computing the minimum number of SPR moves between two rooted phylogenies is NP-hard
[Figure: iterated local search. Local search reaches a local optimum; a perturbation moves away from it; local search is then applied to the output of the perturbation, and the cycle repeats.]
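The local search and perturbation cycle can be sketched generically. Everything in this block is an illustrative assumption rather than the lecture's procedure: the function names, the rule of keeping a new local optimum only if it is strictly better, and the toy one-dimensional landscape standing in for tree space. In the phylogenetic setting the state would be a tree, `neighbors` would enumerate NNI or SPR moves, and `cost` would be the parsimony score.

```python
import random

def local_search(x, cost, neighbors):
    """Hill-climb: move to the best neighbor until no neighbor improves the cost."""
    while True:
        best = min(neighbors(x), key=cost)
        if cost(best) >= cost(x):
            return x          # x is a local optimum
        x = best

def iterated_local_search(start, cost, neighbors, perturb, rounds, rng):
    """Alternate perturbation and local search, keeping the best local optimum seen."""
    best = local_search(start, cost, neighbors)
    for _ in range(rounds):
        candidate = local_search(perturb(best, rng), cost, neighbors)
        if cost(candidate) < cost(best):
            best = candidate
    return best

# Toy landscape: positions 0..9 with several local optima (hypothetical data).
landscape = [5, 3, 4, 6, 2, 7, 1, 8, 0, 9]

def cost(i):
    return landscape[i]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

def perturb(i, rng):
    # Jump up to three positions in either direction, staying in range.
    return min(max(i + rng.randint(-3, 3), 0), len(landscape) - 1)

plain = local_search(0, cost, neighbors)
ils = iterated_local_search(0, cost, neighbors, perturb, 50, random.Random(0))
print(plain, cost(plain))
print(ils, cost(ils))
```

Plain local search from position 0 stops at the first local optimum it reaches. The iterated version can never return something worse than that starting optimum, since it only replaces the current best when the cost strictly decreases.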