Tree Searching Methods. Exhaustive search (exact) Branchandbound search (exact) Heuristic search methods (approximate) Stepwise addition Branch swapping Star decomposition. Exhaustive Search. 12. 12. 13. 13. 13. 12. 13. 13. 12. 13. 11. 13. 13. 13. 13. Searching for trees.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
12
12
13
13
13
12
13
13
12
13
11
13
13
13
13
1.Generate all 3 trees for first 4 taxa:
2. Generate all 15 trees for first 5 taxa:
(likewise for each of the other two 4taxon trees)
3. Full search tree:
The search tree is the same as for exhaustive search, with tree lengths for a hypothetical data set shown in boldface type. If a tree lying at a node of this search tree has a length that exceeds the current lower bound on the optimal tree length, this path of the search tree is terminated (indicated by a crossbar), and the algorithm backtracks and takes the next available path. When a tip of the search tree is reached (i.e., when we arrive at a tree containing the full set of taxa), the tree is either optimal (and hence retained) or suboptimal (and rejected). When all paths leading from the initial 3taxon tree have been explored, the algorithm terminates, and all mostparsimonious trees will have been identified. Asterisks indicate points at which the current lower bound is reduced. Circled numbers represent the order in which phylogenetic trees are visited in the search tree.
Branch and bound algorithm:
1
2
2
1
1
3
4
3
3
4
4
2
2
1
3
Stepwise addition
A greedy stepwiseaddition search applied to the example used for branchandbound. The best 4taxon tree is determined by evaluating the lengths of the three trees obtained by joining taxon D to tree 1 containing only the first three taxa. Taxa E and F are then connected to the five and seven possible locations, respectively, on trees 4 and 9, with only the shortest trees found during each step being used for the next step. In this example, the 233step tree obtained is not a global optimum. Circled numbers indicate the order in which phylogenetic trees are evaluated in the stepwiseaddition search.
B
C
E
A
D
E
D
A
B
C
B
C
D
A
E
B
A
E
B
D
A
C
F
C
G
D
D
C
E
E
A
F
F
B
G
G
a
E
D
B
A
F
G
B
A
A
C
C
E
D
B
C
D
D
F
G
E
E
E
D
C
F
A
F
F
G
B
G
G
B
A
C
Reconnection distances:
Reconnection limits in TBR
Reconnection distances:
In PAUP*, use “ReconLim” to set maximum reconnection distance
Stardecomposition search
Likelihood(hypothesis) µProb(datahypothesis)
Likelihood(tree,model) = k Prob(observed sequencestree,model)
[not Prob(treedata,model)]
C
C
A
G
(1)
(3)
(5)
(2)
(4)
(6)
1 jN(1) C…GGACA…C…GTTTA…C(2) C…AGACA…C…CTCTA…C(3) C…GGATA…A…GTTAA…C(4) C…GGATA…G…CCTAG…C
C
C
A
G
C
C
A
G
Prob
+ Prob
A
C
A
A
C
C
A
G
+ … +
Prob
T
T
Likelihood at site j =
But use Felsenstein (1981) pruning algorithm
Note: PAUP* reports ln L, so lower ln L implies higher likelihood
JukesCantor
Kimura 2parameter
HasegawaKishinoYano (HKY)
Felsenstein 1981, 1984
General timereversible
JukesCantor (1969)
Kimura (1980) “2parameter”
HasegawaKishinoYano (1985)
GeneralTime Reversible
C
C
A
A
A
A
A
A
A
A
C
A
C
C
A
A
A
A
A
A
A
A
C
A
A
C
B
D
(Felsenstein, 1978)
A
B
0.8
0.8
0.1
0.1
0.1
C
D
A
C
G
T
A

5
6
2
C
5

3
8
Substitution rates:
G
6
3

1
T
2
8
1

Base frequencies:
A=0.1
C=0.2
G=0.3
T=0.4
1
0.8
parsimony
0.6
Proportion correct
MLGTR
0.4
0.2
0
0
5000
10000
Sequence Length
The longbranch attraction (LBA) problem
Pattern type
14
AI = Uninformative (constant)A
The true phylogeny of
1, 2, 3 and 4
(zero changes required on any tree)
A A
23
The longbranch attraction (LBA) problem
Pattern type
14
AI = Uninformative (constant)A
AII = UninformativeG
The true phylogeny of
1, 2, 3 and 4
(one change required on any tree)
A A
23
The longbranch attraction (LBA) problem
Pattern type
14
AI = Uninformative (constant)A
AII = UninformativeG
CIII = UninformativeG
The true phylogeny of
1, 2, 3 and 4
(two changes required on any tree)
A A
23
The longbranch attraction (LBA) problem
Pattern type
14
AI = Uninformative (constant)A
AII = UninformativeG
CIII = UninformativeG
G IV = MisinformativeG
The true phylogeny of
1, 2, 3 and 4
(two changes required on true tree)
A A
23
The longbranch attraction (LBA) problem
… but this tree needs only one step
G
1
A
2
A
3
G
4
Consistency
If an estimator converges to the true value of a parameter as the amount of data increases toward infinity, the estimator is consistent.
1
3
2
4
A
C
(Siddall, 1998)
D
B
a
b
b
a
b
0.75
p
a
Both methods do poorly
Parsimony has higher
accuracy than likelihood
0
0.75
p
b
Both methods do well
1
B
B
B
B
B
B
B
B
B
B
B
0.9
J
15%
B
67.5%
0.8
J
0.7
67.5%
0.6
J
(expected differences/site)
J
Accuracy
0.5
J
J
J
J
0.4
J
J
J
J
0.3
0.2
B
Parsimony
J
ML/JC
0.1
0
20
100
1,000
10,000
100,000
Sequence length
C
A
C
A
True synapomorphy
A
C
C
C
A
A
A
C
A
Apparent synapomorphies
actually due to
misinterpreted homoplasy
C
C
C
A
A
A
A
G
G
C
C
1
J
J
J
J
J
0.9
J
0.8
J
0.7
67.5%
67.5%
J
0.6
15%
Accuracy
0.5
J
(expected differences/site)
J
0.4
J
0.3
J
0.2
B
Parsimony
B
J
ML/JC
0.1
B
B
0
B
B
B
B
B
B
B
B
B
20
100
1,000
10,000
100,000
Sequence length
A
A
A
C
C
C
D
D
D
B
B
B
A
A
C
C
D
D
B
B
External branches = 0.5 or 0.05 substitutions/site, JukesCantor model of nucleotide substitution
G
H
H
H
H
H
G
1.0
H
G
G
G
G
J
J
J
J
J
J
0.8
0.6
Accuracy
0.4
100 sites
J
1,000 sites
G
J
J
10,000 sites
H
0.2
J
J
J
J
G
G
H
H
G
H
G
H
G
G
H
0
H
0.05
0.04
0.03
0.02
0.01
0
0.01
0.02
0.03
0.04
0.05
Farris zon
e
Length of internal branch (
d
)
Felsenstein zone
H
H
1.0
H
H
H
H
G
G
H
H
G
H
G
G
G
0.8
H
J
G
G
J
J
G
0.6
J
J
J
Accuracy
J
G
J
J
0.4
J
H
G
J
G
J
H
100 sites
J
0.2
1,000 sites
G
10,000 sites
H
ML/JC
0
0.05
0.04
0.03
0.02
0.01
0
0.01
0.02
0.03
0.04
0.05
Farris zon
e
Length of internal branch (
d
)
Felsenstein zone
Simulation
results:
Parsimony
Likelihood
A
B
0.8
0.8
0.1
0.1
0.1
C
D
A
C
G
T
A

5
6
2
C
5

3
8
Substitution rates:
G
6
3

1
T
2
8
1

Base frequencies:
A=0.1
C=0.2
G=0.3
T=0.4
equal rates?
Lemur AAGCTTCATAG TTGCATCATCCA …TTACATCATCCA
Homo AAGCTTCACCG TTGCATCATCCA …TTACATCCTCAT
Pan AAGCTTCACCG TTACGCCATCCA …TTACATCCTCAT
Goril AAGCTTCACCG TTACGCCATCCA …CCCACGGACTTA
Pongo AAGCTTCACCG TTACGCCATCCT …GCAACCACCCTC
Hylo AAGCTTTACAG TTACATTATCCG …TGCAACCGTCCT
Maca AAGCTTTTCCG TTACATTATCCG …CGCAACCATCCT
.
.
.
.
.
Modeling amongsite rate variation with a gamma distribution...
0.08
a=200
0.06
a=0.5
a=2
Frequency
0.04
a=50
0.02
0
0
1
2
Rate
…can also estimate a proportion of “invariable” sites (pinv)
Pr(D) = Pr(D) Pr() Pr(D)
Pr(D) Pr()
L() Pr()
( =tree topology, branchlengths, and substitutionmodel parameters)
r =Pr(*D) Pr( *)
Pr(D) Pr(* )
B
A
“burn in”
C
D
B
A
B
B
B
A
A
A
C
D
B
A
A
D
D
D
A
C
C
C
A
C
B
C
D
C
D
B
D
D
B
C
C
A
A
B
ABCD
ABCD
ABCD
BCAD
BCAD
BCAD
BCAD
ACBD
D
C
B
D
Initialize the chain, e.g., by picking a random state X0 (topology,branch lengths, substitutionmodel parameters) from the assumed prior distribution
2.For each time t, sample a new candidate state Y from some proposal distribution q(.Xt) (e.g., change branch lengths or topology plus branch lengths)
Calculate acceptance probability
A brief intro to Markov chain Monte Carlo (MCMC)
...
Likelihood
Iterations
If Y is accepted, let Xt+1 = Y; otherwise let Xt+1 = Xt
If the chain is run “long enough”, the stationary distribution of states in the chain will represent a good approximation to the target distribution (in this case, the Bayesian posterior)
1
3
1
2
a
d
c
3
b
e
4
2
4
1
2
3
4
p12 = a+b
p13 = a+c+d
p14 = a+c+e
p23= b+c+d
p24= b+c+e
p34= d+e
pij = dij for all i and j if the tree
topology is correct and distances
are additive
Minumum
evolution
(ME)
LeastSquares
pij
dij
SS
LS branch lengths