
# Tree Searching Methods - PowerPoint PPT Presentation

Tree Searching Methods. Exhaustive search (exact). Branch-and-bound search (exact). Heuristic search methods (approximate): stepwise addition, branch swapping, star decomposition. Exhaustive Search. Searching for trees.


## PowerPoint Slideshow about ' Tree Searching Methods' - melinda-oliver

Presentation Transcript

• Exhaustive search (exact)

• Branch-and-bound search (exact)

• Heuristic search methods (approximate)

• Stepwise addition

• Branch swapping

• Star decomposition

Exhaustive Search

[Figure: candidate trees, each labeled with its tree length (values from 11 to 13)]

• Generation of all possible trees

1. Generate all 3 trees for the first 4 taxa.

2. Generate all 15 trees for the first 5 taxa by attaching the fifth taxon to each of the 5 branches of a 4-taxon tree (likewise for each of the other two 4-taxon trees).

3. Full search tree:
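The counts above (3 trees for 4 taxa, 15 for 5) follow from the fact that an unrooted binary tree with k - 1 taxa has 2(k - 1) - 3 = 2k - 5 branches, each of which can receive taxon k. A minimal sketch of that count; the function name `num_unrooted_trees` is only illustrative:

```python
def num_unrooted_trees(n_taxa):
    """Number of distinct unrooted, fully bifurcating trees on n_taxa labeled taxa."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1  # exactly one unrooted tree for 3 taxa
    for k in range(4, n_taxa + 1):
        # taxon k can be attached to any of the 2k - 5 branches of a (k - 1)-taxon tree
        count *= 2 * k - 5
    return count


for n in (4, 5, 6, 10):
    print(n, num_unrooted_trees(n))
```

The rapid growth of this number is why exhaustive search is only practical for small numbers of taxa.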

The search tree is the same as for exhaustive search, with tree lengths for a hypothetical data set shown in boldface type. If a tree lying at a node of this search tree has a length that exceeds the current upper bound on the optimal tree length (the length of the shortest complete tree found so far), this path of the search tree is terminated (indicated by a cross-bar), and the algorithm backtracks and takes the next available path. When a tip of the search tree is reached (i.e., when we arrive at a tree containing the full set of taxa), the tree is either optimal (and hence retained) or suboptimal (and rejected). When all paths leading from the initial 3-taxon tree have been explored, the algorithm terminates, and all most-parsimonious trees will have been identified. Asterisks indicate points at which the current upper bound is reduced. Circled numbers represent the order in which phylogenetic trees are visited in the search tree.

Branch-and-bound algorithm

[Figure: branch-and-bound search tree; circled numbers give the order in which trees are visited]
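The branch-and-bound idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in PAUP*: tree length is computed with Fitch parsimony on unordered characters, the unrooted tree is represented as taxon 0 joined to a nested-tuple subtree, and the function names (`fitch`, `tree_length`, `insertions`, `branch_and_bound`) are hypothetical. It relies on the fact that adding a taxon never shortens a tree, so a partial tree longer than the best complete tree can be abandoned.

```python
def fitch(tree, site):
    """Return (state set, number of changes) for one site using Fitch's method."""
    if not isinstance(tree, tuple):          # leaf: an integer taxon index
        return {site[tree]}, 0
    left_states, left_cost = fitch(tree[0], site)
    right_states, right_cost = fitch(tree[1], site)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(subtree, data):
    """Parsimony length of the unrooted tree formed by joining taxon 0 to subtree."""
    return sum(fitch((0, subtree), site)[1] for site in zip(*data))


def insertions(tree, taxon):
    """Yield every tree obtained by attaching taxon to one branch of tree."""
    yield (tree, taxon)                      # the branch leading to this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def branch_and_bound(data):
    """Return (minimum length, all trees of that length) for aligned sequences."""
    n_taxa = len(data)
    best = {"length": float("inf"), "trees": []}

    def extend(subtree, next_taxon):
        score = tree_length(subtree, data)
        if score > best["length"]:           # bound exceeded: terminate this path
            return
        if next_taxon == n_taxa:             # a tip of the search tree
            if score < best["length"]:       # the bound is reduced
                best["length"], best["trees"] = score, []
            best["trees"].append((0, subtree))
            return
        for candidate in insertions(subtree, next_taxon):
            extend(candidate, next_taxon + 1)

    extend((1, 2), 3)                        # start from the single 3-taxon tree
    return best["length"], best["trees"]


example = ["AAGGT", "AAGCT", "CAGGA", "CCTGA", "CCTCA", "ACTCA"]
print(branch_and_bound(example))
```

Pruning uses a strict comparison so that ties are kept, which is what allows all most-parsimonious trees to be returned rather than just one.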

Stepwise Addition (in a nutshell)


A greedy stepwise-addition search applied to the example used for branch-and-bound. The best 4-taxon tree is determined by evaluating the lengths of the three trees obtained by joining taxon D to tree 1 containing only the first three taxa. Taxa E and F are then connected to the five and seven possible locations, respectively, on trees 4 and 9, with only the shortest trees found during each step being used for the next step. In this example, the 233-step tree obtained is not a global optimum. Circled numbers indicate the order in which phylogenetic trees are evaluated in the stepwise-addition search.

• As Is: add taxa in the order found in the data matrix

• Closest: add the unplaced taxon that requires the smallest increase in tree length

• Furthest: add the unplaced taxon that requires the largest increase in tree length

• Simple: Farris’s (1970) “simple algorithm” uses a set of pairwise reference distances

• Random: a random permutation of the taxa is used to select the order

Branch swapping: Nearest Neighbor Interchange (NNI)

[Figure: NNI rearrangements illustrated on trees with taxa A through G]

Branch swapping: Subtree Pruning and Regrafting (SPR)

[Figure: SPR rearrangements illustrated on trees with taxa A through G]

Branch swapping: Tree Bisection and Reconnection (TBR)

Reconnection distances:

In PAUP*, use “ReconLim” to set maximum reconnection distance

• Overall goal: Find a tree topology (and associated parameter estimates) that maximizes the probability of obtaining the observed data, given a model of evolution

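$$\mathrm{Likelihood}(\text{hypothesis}) \propto \mathrm{Prob}(\text{data} \mid \text{hypothesis})$$

$$\mathrm{Likelihood}(\text{tree}, \text{model}) = k \, \mathrm{Prob}(\text{observed sequences} \mid \text{tree}, \text{model})$$

[not $\mathrm{Prob}(\text{tree} \mid \text{data}, \text{model})$]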

[Figure: tree with nodes labeled (1) through (6)]

Computing the likelihood of a single tree

Sites 1 … j … N

(1) C…GGACA…C…GTTTA…C

(2) C…AGACA…C…CTCTA…C

(3) C…GGATA…A…GTTAA…C

(4) C…GGATA…G…CCTAG…C

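Likelihood at site j, summing over every assignment of states to the internal nodes (5) and (6):

$$L_j = \mathrm{Prob}(x_5 = A, x_6 = A) + \mathrm{Prob}(x_5 = A, x_6 = C) + \dots + \mathrm{Prob}(x_5 = T, x_6 = T)$$

where each term is the probability of the tip states observed at site j (C, C, A, G) together with states $x_5$ and $x_6$ at the internal nodes.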

In practice, use the Felsenstein (1981) pruning algorithm rather than enumerating every combination of internal states.

Note: PAUP* reports -ln L, so lower -ln L implies higher likelihood
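A minimal sketch of the pruning idea for one site, under assumptions that are not in the slides: the Jukes-Cantor model, equal base frequencies of 0.25, and a tree written as nested `(child, branch_length)` pairs. The helper names (`jc_prob`, `conditional`, `site_likelihood`, `four_taxon_tree`) and the branch lengths in the example are illustrative only.

```python
import math

STATES = "ACGT"


def jc_prob(i, j, t):
    """Jukes-Cantor probability of state j given state i after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def conditional(node):
    """Conditional likelihoods of the tips below node, for each state at node."""
    if isinstance(node, str):                # tip: observed nucleotide
        return {s: 1.0 if s == node else 0.0 for s in STATES}
    (left, t_left), (right, t_right) = node
    below_left, below_right = conditional(left), conditional(right)
    return {
        s: sum(jc_prob(s, x, t_left) * below_left[x] for x in STATES)
        * sum(jc_prob(s, y, t_right) * below_right[y] for y in STATES)
        for s in STATES
    }


def site_likelihood(root):
    """Likelihood of one site: average the root conditionals over base frequencies."""
    cond = conditional(root)
    return sum(0.25 * cond[s] for s in STATES)


def four_taxon_tree(tips, ext, internal):
    """Tree ((1,2),(3,4)), rooted at the midpoint of its internal branch."""
    node5 = ((tips[0], ext[0]), (tips[1], ext[1]))
    node6 = ((tips[2], ext[2]), (tips[3], ext[3]))
    return ((node5, internal / 2), (node6, internal / 2))


# Site j of the alignment has tip states C, C, A, G for sequences (1) to (4).
tree = four_taxon_tree("CCAG", [0.1, 0.2, 0.3, 0.4], 0.05)
lik = site_likelihood(tree)
print(lik, -math.log(lik))                   # likelihood and -ln L for this site
```

For a full alignment the per-site values of -ln L would be summed, assuming sites evolve independently. Because the model is time-reversible, the placement of the root does not change the result.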

Finding the maximum-likelihood tree (in principle)

• Evaluate the likelihood of each possible tree for a given collection of taxa.

• Choose the tree topology which maximizes the likelihood over all possible trees.

• An explicit model of substitution that specifies change probabilities for a given branch length (“instantaneous rate matrix”). Models include Jukes-Cantor, Kimura (1980) “2-parameter”, Hasegawa-Kishino-Yano (HKY, 1985), Felsenstein 1981 and 1984, and general time-reversible (GTR).

• An estimate of optimal branch lengths in units of expected amount of change (= rate × time)
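As a worked example of how a substitution model turns a branch length into change probabilities, the simplest case (Jukes-Cantor, where all substitutions are equally likely) has a closed form. Here $d$ denotes the branch length in expected substitutions per site (rate × time); the symbol names are a notational assumption, not taken from the slides.

```latex
P_{ii}(d) = \frac{1}{4} + \frac{3}{4}\, e^{-4d/3},
\qquad
P_{ij}(d) = \frac{1}{4} - \frac{1}{4}\, e^{-4d/3} \quad (i \neq j)
```

At $d = 0$ no change is possible ($P_{ii} = 1$), and as $d \to \infty$ every state becomes equally likely ($P_{ij} \to 1/4$), so very long branches carry little information about the ancestral state.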

The Relevance of Branch Lengths

[Figure: trees with tips in states A and C, illustrating the relevance of branch lengths]

When does maximum likelihood work better than parsimony?

• When you’re in the “Felsenstein Zone”

(Felsenstein, 1978)

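[Figure: four-taxon tree with branch lengths 0.8, 0.8, 0.1, 0.1, and 0.1]

Substitution rates:

|   | A | C | G | T |
|---|---|---|---|---|
| A | - | 5 | 6 | 2 |
| C | 5 | - | 3 | 8 |
| G | 6 | 3 | - | 1 |
| T | 2 | 8 | 1 | - |

Base frequencies: A=0.1, C=0.2, G=0.3, T=0.4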
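One way these numbers could define a general time-reversible model is sketched below. It assumes (this is an interpretation, not stated on the slide) that the symmetric table holds relative exchangeabilities, so that the instantaneous rate from i to j is the table entry times the frequency of j. The function name `gtr_matrix` and the rescaling to a mean rate of 1 are illustrative choices.

```python
BASES = "ACGT"

# Values copied from the slide; treated here as relative exchangeabilities.
RATES = {("A", "C"): 5, ("A", "G"): 6, ("A", "T"): 2,
         ("C", "G"): 3, ("C", "T"): 8, ("G", "T"): 1}
FREQS = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}


def gtr_matrix(rates, freqs):
    """Build a GTR instantaneous rate matrix Q, scaled to one substitution per unit time."""
    q = {i: {} for i in BASES}
    for i in BASES:
        for j in BASES:
            if i != j:
                r = rates[(i, j)] if (i, j) in rates else rates[(j, i)]
                q[i][j] = r * freqs[j]
        # each diagonal entry makes its row sum to zero
        q[i][i] = -sum(q[i][j] for j in BASES if j != i)
    mean_rate = -sum(freqs[i] * q[i][i] for i in BASES)
    return {i: {j: q[i][j] / mean_rate for j in BASES} for i in BASES}


Q = gtr_matrix(RATES, FREQS)
for i in BASES:
    print(i, [round(Q[i][j], 3) for j in BASES])
```

With this scaling, branch lengths are measured in expected substitutions per site.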

In the Felsenstein Zone

[Figure: proportion correct (0 to 1) versus sequence length (0 to 10,000) for parsimony and ML-GTR]

Pattern types for the true phylogeny of taxa 1, 2, 3 and 4:

| Pattern type | Taxon 1 | Taxon 2 | Taxon 3 | Taxon 4 | Classification | Changes required |
|---|---|---|---|---|---|---|
| I | A | A | A | A | Uninformative (constant) | zero on any tree |
| II | A | A | A | G | Uninformative | one on any tree |
| III | C | A | A | G | Uninformative | two on any tree |
| IV | G | A | A | G | Misinformative | two on the true tree |

… but for pattern IV, the tree that groups taxa 1 and 4 (both G) apart from taxa 2 and 3 (both A) needs only one step.
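The pattern types can be checked mechanically. The sketch below applies the usual parsimony-informativeness rule (at least two states, each shared by at least two taxa) and reports which four-taxon split a pattern favors. Function names are hypothetical, and patterns are written as the states of taxa 1 to 4 in order.

```python
from collections import Counter


def classify(pattern):
    """Classify a site pattern as constant, uninformative, or informative."""
    counts = Counter(pattern)
    if len(counts) == 1:
        return "constant"
    shared = [state for state, n in counts.items() if n >= 2]
    return "informative" if len(shared) >= 2 else "uninformative"


def favored_split(pattern):
    """For four taxa, return the split that needs only one change, if any."""
    t1, t2, t3, t4 = pattern
    if t1 == t2 and t3 == t4 and t1 != t3:
        return "12|34"
    if t1 == t3 and t2 == t4 and t1 != t2:
        return "13|24"
    if t1 == t4 and t2 == t3 and t1 != t2:
        return "14|23"
    return None


for name, pattern in [("I", "AAAA"), ("II", "AAAG"), ("III", "CAAG"), ("IV", "GAAG")]:
    print(name, pattern, classify(pattern), favored_split(pattern))
```

Pattern IV is parsimony-informative, but the split it favors (14|23) is not the true one, which is why the slide calls it misinformative.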

Concerns about statistical properties and suitability of models (assumptions)

Consistency

If an estimator converges to the true value of a parameter as the amount of data increases toward infinity, the estimator is consistent.


When do both methods fail?

• When there is insufficient phylogenetic signal...

• When you’re in the Inverse-Felsenstein (“Farris”) zone (Siddall, 1998)

Siddall (1998) parameter space

[Figure: parameter space with axes p_a and p_b, each running from 0 to 0.75, divided into regions labeled “Both methods do well”, “Parsimony has higher accuracy than likelihood”, and “Both methods do poorly”]

[Figure: accuracy (0 to 1) versus sequence length (20 to 100,000) for parsimony and ML/JC; branch lengths of 15% and 67.5% (expected differences/site)]

Why does parsimony do so well in the Inverse-Felsenstein zone?

[Figure: site patterns of A, C, and G states contrasting a true synapomorphy with apparent synapomorphies that are actually due to misinterpreted homoplasy]

[Figure: accuracy (0 to 1) versus sequence length (20 to 100,000) for parsimony and ML/JC; branch lengths of 67.5% and 15% (expected differences/site)]

From the Farris Zone to the Felsenstein Zone

[Figure: series of four-taxon trees (A, B, C, D) running from the Farris zone to the Felsenstein zone]

External branches = 0.5 or 0.05 substitutions/site, Jukes-Cantor model of nucleotide substitution

Simulation results:

[Figure: two panels, Parsimony and Likelihood (ML/JC), each plotting accuracy (0 to 1.0) against length of internal branch (d), from 0.05 in the Farris zone through 0 to 0.05 in the Felsenstein zone, for 100, 1,000, and 10,000 sites]

Maximum likelihood models are oversimplifications of reality. If I assume the wrong model, won’t my results be meaningless?

• Not necessarily (maximum likelihood is pretty robust)

Model used for simulation: a four-taxon tree (A, B, C, D) with branch lengths 0.8, 0.8, 0.1, 0.1, and 0.1; substitution rates A-C = 5, A-G = 6, A-T = 2, C-G = 3, C-T = 8, G-T = 1; base frequencies A=0.1, C=0.2, G=0.3, T=0.4.

Performance of ML when its model is violated (one example)

Among-site rate heterogeneity

Equal rates?

Lemur AAGCTTCATAG TTGCATCATCCA …TTACATCATCCA

Homo AAGCTTCACCG TTGCATCATCCA …TTACATCCTCAT

Pan AAGCTTCACCG TTACGCCATCCA …TTACATCCTCAT

Goril AAGCTTCACCG TTACGCCATCCA …CCCACGGACTTA

Pongo AAGCTTCACCG TTACGCCATCCT …GCAACCACCCTC

Hylo AAGCTTTACAG TTACATTATCCG …TGCAACCGTCCT

Maca AAGCTTTTCCG TTACATTATCCG …CGCAACCATCCT

• Proportion of invariable sites: some sites don’t change due to strong functional or structural constraint (Hasegawa et al., 1985)

• Site-specific rates: different relative rates assumed for pre-assigned subsets of sites

• Gamma-distributed rates: rate variation assumed to follow a gamma distribution with shape parameter α

Performance of ML when its model is violated (another example)

Modeling among-site rate variation with a gamma distribution...

[Figure: gamma distributions of relative rate (x-axis “Rate”, 0 to 2; y-axis “Frequency”, 0 to 0.08) for shape parameters α = 0.5, 2, 50, and 200]

…can also estimate a proportion of “invariable” sites (pinv)
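To see how the shape parameter controls rate variation, the gamma density can be evaluated directly. The sketch assumes the common convention that the mean rate is fixed at 1 (shape α, scale 1/α); that convention and the function name `gamma_rate_density` are assumptions, not taken from the slides.

```python
import math


def gamma_rate_density(rate, alpha):
    """Density of a gamma distribution with shape alpha and mean 1 at the given rate."""
    if rate <= 0:
        return 0.0
    log_density = (alpha * math.log(alpha) - math.lgamma(alpha)
                   + (alpha - 1.0) * math.log(rate) - alpha * rate)
    return math.exp(log_density)


for alpha in (0.5, 2, 50, 200):
    values = [round(gamma_rate_density(r, alpha), 3) for r in (0.1, 0.5, 1.0, 2.0)]
    print(alpha, values)
```

Small α gives an L-shaped curve (most sites nearly invariant, a few evolving fast), while large α concentrates the rates near 1, approaching the equal-rates case.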

• Uses Bayes formula:

$$\Pr(\theta \mid D) = \frac{\Pr(D \mid \theta)\,\Pr(\theta)}{\Pr(D)} \propto \Pr(D \mid \theta)\,\Pr(\theta) \propto L(\theta)\,\Pr(\theta)$$

• Calculation involves integrating over all tree topologies and model-parameter values, subject to assumed prior distribution on parameters

($\theta$ = tree topology, branch lengths, and substitution-model parameters)

• To approximate this posterior density (complicated multidimensional integral) we use Markov chain Monte Carlo (MCMC)

• Simulated Markov chain in which transition probabilities are assigned such that the stationary distribution of the chain is the posterior density of interest

• E.g., Metropolis-Hastings algorithm: accept a proposed move from one state $\theta$ to another state $\theta^*$ with probability $\min(r, 1)$, where

$$r = \frac{\Pr(\theta^* \mid D)\,\Pr(\theta \mid \theta^*)}{\Pr(\theta \mid D)\,\Pr(\theta^* \mid \theta)}$$

• Sample chain at regular intervals to approximate posterior distribution

• MrBayes (by John Huelsenbeck and Fredrik Ronquist) is most popular Bayesian inference program

A brief intro to Markov chain Monte Carlo (MCMC)

1. Initialize the chain, e.g., by picking a random state X0 (topology, branch lengths, substitution-model parameters) from the assumed prior distribution

2. For each time t, sample a new candidate state Y from some proposal distribution q(.|Xt) (e.g., change branch lengths or topology plus branch lengths)

3. Calculate the acceptance probability

4. If Y is accepted, let Xt+1 = Y; otherwise let Xt+1 = Xt

If the chain is run “long enough”, the stationary distribution of states in the chain will represent a good approximation to the target distribution (in this case, the Bayesian posterior)

[Figure: likelihood versus iterations, with an initial “burn in” phase, alongside sampled four-taxon topologies such as AB|CD and AC|BD]
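The steps above can be sketched on a deliberately tiny problem: the posterior of a single Jukes-Cantor distance between two sequences, rather than a full tree. Everything specific here is an assumption made for illustration (the exponential prior, the uniform random-walk proposal, the step size, the burn-in length, and the function names); it is not how MrBayes is implemented.

```python
import math
import random


def log_posterior(d, n_sites, n_diff, prior_rate):
    """Unnormalized log posterior of JC distance d (exponential prior on d)."""
    if d <= 0:
        return float("-inf")
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))   # expected proportion of differing sites
    log_lik = n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)
    return log_lik - prior_rate * d


def metropolis_hastings(n_sites, n_diff, steps=20000, burn_in=2000,
                        step_size=0.02, prior_rate=10.0, seed=1):
    """Sample the posterior of d; returns the samples kept after burn-in."""
    rng = random.Random(seed)
    x = 0.1                                        # initial state X0
    current = log_posterior(x, n_sites, n_diff, prior_rate)
    samples = []
    for t in range(steps):
        y = x + rng.uniform(-step_size, step_size)  # candidate from q(.|Xt)
        proposed = log_posterior(y, n_sites, n_diff, prior_rate)
        # The proposal is symmetric, so r reduces to the ratio of posteriors.
        accept_prob = math.exp(min(0.0, proposed - current))
        if rng.random() < accept_prob:
            x, current = y, proposed               # accept: X(t+1) = Y
        if t >= burn_in:                           # otherwise X(t+1) = Xt
            samples.append(x)
    return samples


samples = metropolis_hastings(n_sites=1000, n_diff=100)
print(sum(samples) / len(samples))                 # posterior mean of d
```

The normalizing constant Pr(D) never has to be computed, because it cancels in the acceptance ratio. That cancellation is what makes MCMC practical for posteriors over trees.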

Model-based distances

• Can also calculate pairwise distances based on these models

• These distances estimate the number of substitutions per site that have accumulated since the two sequences shared a common ancestor, allowing for superimposed substitutions (“multiple hits”)

• E.g.:

• Jukes-Cantor distance

• Kimura 2-parameter distance

• General maximum-likelihood distances available for other models
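The two named distances have closed forms, sketched below. The code assumes aligned sequences of equal length containing only unambiguous A, C, G, T with no gaps; the function names are illustrative.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def proportions(seq1, seq2):
    """Return (P, Q): proportions of sites differing by a transition or a transversion."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(zip(seq1, seq2))
    n_ts = sum(1 for pair in pairs if pair in TRANSITIONS)
    n_tv = sum(1 for a, b in pairs if a != b and (a, b) not in TRANSITIONS)
    return n_ts / len(pairs), n_tv / len(pairs)


def jukes_cantor(p):
    """JC distance from the observed proportion p of differing sites (p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(P, Q):
    """Kimura 2-parameter distance from transition (P) and transversion (Q) proportions."""
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


P, Q = proportions("AACGTACGTA", "GACTTACGTA")
print(P + Q, jukes_cantor(P + Q), kimura_2p(P, Q))
```

Both corrected distances exceed the raw proportion of differences, reflecting the allowance for multiple hits at the same site.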

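Distance-based optimality criteria: “Additive trees”

[Figure: unrooted tree ((1,2),(3,4)) with external branches a, b, d, and e leading to taxa 1, 2, 3, and 4, and internal branch c]

$$\begin{aligned}
p_{12} &= a + b \\
p_{13} &= a + c + d \\
p_{14} &= a + c + e \\
p_{23} &= b + c + d \\
p_{24} &= b + c + e \\
p_{34} &= d + e
\end{aligned}$$

$p_{ij} = d_{ij}$ for all i and j if the tree topology is correct and distances …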
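The path-length equations can be written out directly. The sketch below computes the p values for the tree ((1,2),(3,4)) and checks the four-point condition that characterizes additive distances (of the three pairwise sums, the two largest are equal). The branch lengths in the example are arbitrary illustration values, and the function names are hypothetical.

```python
def path_lengths(a, b, c, d, e):
    """Path-length distances on the tree ((1,2),(3,4)) with internal branch c."""
    return {
        (1, 2): a + b,
        (1, 3): a + c + d,
        (1, 4): a + c + e,
        (2, 3): b + c + d,
        (2, 4): b + c + e,
        (3, 4): d + e,
    }


def satisfies_four_point(p, tol=1e-9):
    """True if the two largest of the three pairwise sums are equal."""
    sums = sorted([
        p[(1, 2)] + p[(3, 4)],
        p[(1, 3)] + p[(2, 4)],
        p[(1, 4)] + p[(2, 3)],
    ])
    return abs(sums[2] - sums[1]) <= tol


p = path_lengths(a=1, b=2, c=3, d=4, e=5)
print(p, satisfies_four_point(p))
```

The smallest of the three sums identifies the pairs separated by the internal branch, here (1,2) and (3,4), and it falls short of the other two by exactly 2c.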

Distance-based optimality criteria: Minimum evolution and least-squares

• Minimum evolution (ME): uses LS branch lengths

• Least-squares: sum of squares (SS) between pij and dij
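The formulas on this slide did not survive in the transcript. The standard textbook definitions are given here as an assumption about what was shown: $d_{ij}$ is the observed pairwise distance, $p_{ij}$ the path length on the tree, $w_{ij}$ an optional weight (1 for unweighted least squares), and $\hat{v}_k$ the least-squares estimate of the length of branch $k$.

```latex
SS = \sum_{i<j} w_{ij}\,\bigl(d_{ij} - p_{ij}\bigr)^{2},
\qquad
ME = \sum_{k} \hat{v}_k
```

Under least squares the preferred tree minimizes $SS$. Under minimum evolution the branch lengths are still fitted by least squares, but the preferred tree is the one with the smallest total length.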