Phylogenetics

“Inferring Phylogenies”

Joseph Felsenstein

Excellent reference

Different Representations
  • Cladogram - branching pattern only
  • Phylogram - branch lengths are estimated and drawn proportional to the amount of change along the branch
  • Rooted - implies directionality of change
  • Unrooted - does not
  • How do you root a tree?
Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Working Tree

Figure (built up over several slides): a tree of sp1 to sp5 onto which character changes are mapped one at a time, first c2, then c4, c7, c9 and c10; cN marks a change at alignment column N.

Final Tree

Figure: the completed tree of sp1 to sp5 with changes at columns c2, c4, c7, c9, c10 and c11 mapped onto its branches.
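The cN labels on the trees correspond to the alignment columns where the five sequences differ. A minimal Python sketch (the helper names `variable_columns` and `is_informative` are hypothetical, not from the slides) lists those columns, the taxa sharing each state, and whether each column is parsimony-informative (at least two states, each shared by two or more taxa):

```python
# Alignment copied from the "Estimate a Phylogeny" slide.
ALIGNMENT = {
    "Sp1": "ACCGTCTTGTTA",
    "Sp2": "AGCGTCATCAAA",
    "Sp3": "AGCGTCATCAAA",
    "Sp4": "ACCGTCTTGATA",
    "Sp5": "AGCCTCTTCATA",
}


def variable_columns(alignment):
    """Map each variable column (1-based) to {state: [taxa with that state]}."""
    n_sites = len(next(iter(alignment.values())))
    result = {}
    for col in range(n_sites):
        groups = {}
        for taxon, seq in alignment.items():
            groups.setdefault(seq[col], []).append(taxon)
        if len(groups) > 1:  # more than one state means the column varies
            result[col + 1] = groups
    return result


def is_informative(groups):
    """Parsimony-informative: at least two states, each shared by >= 2 taxa."""
    return sum(1 for taxa in groups.values() if len(taxa) >= 2) >= 2


for col, groups in variable_columns(ALIGNMENT).items():
    label = "informative" if is_informative(groups) else "uninformative"
    print(f"c{col}: {groups} ({label})")
```

Columns 2, 7, 9 and 11 split the taxa into groups and so can favour one topology over another; columns 4 and 10 each change in a single taxon (Sp5 and Sp1) and add one step to every tree.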

What optimality criteria do we use then?
  • Parsimony
  • Likelihood
  • Bayesian
  • Distance methods?
Parsimony
  • Why should we choose a specific grouping?
  • Maximum parsimony: we should accept the hypothesis that explains the data most simply and efficiently
  • “Parsimony is simply the most robust criterion for choosing between competing scientific hypotheses. It is not a statement about how evolution may or may not have taken place”1

1 Kitching, I. J., Forey, P. L., Humphries, C. J. & Williams, D. M. 1998. Cladistics: The Theory and Practice of Parsimony Analysis. The Systematics Association Publication No. 11.

Parsimony
  • An optimality criterion that chooses the topology requiring the fewest transformations of character states
  • Optimizes one component: the tree topology (pattern based)
  • Most parsimonious tree: the tree (or trees) with the minimum number of evolutionary changes, i.e., the shortest tree length
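Counting the minimum number of changes a tree requires can be sketched with Fitch's algorithm, one standard way to compute parsimony tree length (the slides do not name a specific algorithm). The sketch below assumes a rooted binary tree written as nested tuples; for an unrooted tree the count is the same wherever the root is placed. Both topologies (`candidate`, `alternative`) are hypothetical trees chosen for illustration, scored on the alignment from the "Estimate a Phylogeny" slide:

```python
# Alignment from the "Estimate a Phylogeny" slide.
ALIGNMENT = {
    "Sp1": "ACCGTCTTGTTA",
    "Sp2": "AGCGTCATCAAA",
    "Sp3": "AGCGTCATCAAA",
    "Sp4": "ACCGTCTTGATA",
    "Sp5": "AGCCTCTTCATA",
}


def fitch_site(tree, states):
    """Return (state_set, changes) for one site.

    tree: a taxon name (str) or a 2-tuple of subtrees.
    states: mapping taxon -> nucleotide at this site.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for col in range(n_sites):
        states = {taxon: seq[col] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total


candidate = (("Sp1", "Sp4"), (("Sp2", "Sp3"), "Sp5"))
alternative = (("Sp1", "Sp2"), (("Sp3", "Sp4"), "Sp5"))
print(tree_length(candidate, ALIGNMENT))    # 6
print(tree_length(alternative, ALIGNMENT))  # 10
```

Under the parsimony criterion the 6-step tree is preferred over the 10-step one; a full analysis would compare every candidate topology (or use a search heuristic) rather than just two.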
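Reconstructing trees via sequence data

Figure: a tree for A, B, C, D and O with numbered character-state changes mapped onto its branches: 1. T=>A; 2. G=>A; 3. T=>C; 4. A=>C; 4. A=>G; 5. A=>gap; 6. T=>G (character 6 changes T=>G twice).

Tree length = 8 (the eight changes listed above)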

Models of Evolution

Figure: the purines (A, G) and the pyrimidines (C, T); substitutions within a group are transitions, substitutions between the groups are transversions.
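The transition/transversion distinction is easy to apply mechanically. A small Python sketch (hypothetical helper names; it accepts only the four unambiguous bases and rejects gaps or ambiguity codes) classifies each difference between two aligned sequences:

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def classify(a, b):
    """Return None if a == b, else 'transition' or 'transversion'."""
    if a not in "ACGT" or b not in "ACGT":
        raise ValueError(f"unsupported character: {a!r} or {b!r}")
    if a == b:
        return None
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"      # change within purines or within pyrimidines
    return "transversion"        # change between a purine and a pyrimidine


def count_substitutions(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify(a, b)
        if kind is not None:
            counts[kind] += 1
    return counts


# Sp1 vs Sp2 from the alignment slide.
print(count_substitutions("ACCGTCTTGTTA", "AGCGTCATCAAA"))
# {'transition': 0, 'transversion': 5}
```

All five differences between Sp1 and Sp2 happen to be transversions; models that give transitions and transversions separate rates treat these two kinds of change differently.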

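Maximum Likelihood
  • Base frequencies: $f_A + f_G + f_C + f_T = 1$
  • Base exchange (proportions of transitions, $s$, and transversions, $v$): $f_s + f_v = 1$
  • R-matrix (six relative substitution rates): $r_{AC} + r_{AG} + r_{AT} + r_{CG} + r_{CT} + r_{GT} = 1$
  • Gamma shape parameter ($\alpha$)
  • Number of discrete gamma-distribution categories
  • Pinvar (proportion of invariable sites): $f_{var} + f_{inv} = 1$
  • Likelihood: $L = \prod_i l_i$, where $l_i$ is the likelihood of character (site) $i$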
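Because $L$ is a product over sites, it becomes vanishingly small for real alignments, so programs work with $\ln L = \sum_i \ln l_i$ instead. A short Python illustration with invented per-site likelihoods (2000 sites, each $l_i = 0.01$; these numbers are hypothetical, not from the slides):

```python
import math

# Hypothetical per-site likelihoods l_i.
site_likelihoods = [0.01] * 2000

# Direct product: 0.01**2000 = 1e-4000, far below the smallest double,
# so the running product underflows to 0.0.
product = 1.0
for l in site_likelihoods:
    product *= l

# Summing logs keeps the information: ln L = sum of ln l_i.
log_likelihood = sum(math.log(l) for l in site_likelihoods)

print(product)         # 0.0
print(log_likelihood)  # approximately -9210.34
```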
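Maximum Likelihood

Figure: a tree with observed nucleotides C, G, G, A and G at its tips, unobserved ancestral states w, x, y and z at its internal nodes, and branch lengths t1 to t8.
  • $L = \Pr(D \mid H)$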

ML cont.

The probability that the nucleotide at a site is i at time t, given that it was i at time 0, is $P_{ii}(t)$.

The probability that the nucleotide at time t is j ($j \neq i$), given that it was i at time 0, is $P_{ij}(t)$.
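The slide's formulas did not survive extraction. Assuming the Jukes-Cantor model, where each base changes to each of the other three at rate $\alpha$, these probabilities are $P_{ii}(t) = \tfrac{1}{4} + \tfrac{3}{4}e^{-4\alpha t}$ and $P_{ij}(t) = \tfrac{1}{4} - \tfrac{1}{4}e^{-4\alpha t}$. The Python sketch below uses them to compute the likelihood of one site on a tree shaped like the figure (tip states C, G, G, A, G; internal nodes w, x, y, z; eight branches). The topology, branch lengths and $\alpha$ are all hypothetical. It computes the site likelihood with Felsenstein's pruning algorithm and checks it against a brute-force sum over all $4^4$ assignments of ancestral states:

```python
import itertools
import math

BASES = "ACGT"
ALPHA = 1.0  # hypothetical rate of change to each other base


def jc_prob(i, j, alpha, t):
    """Jukes-Cantor probability that base i is base j after time t."""
    decay = math.exp(-4.0 * alpha * t)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


# Hypothetical tree: internal node -> [(child, branch length), ...].
TREE = {
    "w": [("x", 0.1), ("y", 0.2)],
    "x": [("tip1", 0.1), ("tip2", 0.3)],
    "y": [("tip3", 0.2), ("z", 0.1)],
    "z": [("tip4", 0.4), ("tip5", 0.1)],
}
TIPS = {"tip1": "C", "tip2": "G", "tip3": "G", "tip4": "A", "tip5": "G"}


def partial(node):
    """Pruning: {base: Pr(tip data below node | node has that base)}."""
    if node in TIPS:
        return {b: 1.0 if b == TIPS[node] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in TREE[node]:
        child_partial = partial(child)
        for b in BASES:
            result[b] *= sum(jc_prob(b, c, ALPHA, t) * child_partial[c] for c in BASES)
    return result


def site_likelihood_pruning(root="w"):
    # Jukes-Cantor assumes equal base frequencies (1/4) at the root.
    return sum(0.25 * p for p in partial(root).values())


def site_likelihood_brute_force():
    """Sum over every assignment of bases to the internal nodes w, x, y, z."""
    internal = list(TREE)
    total = 0.0
    for assignment in itertools.product(BASES, repeat=len(internal)):
        state = dict(zip(internal, assignment), **TIPS)
        term = 0.25  # root base frequency
        for parent, children in TREE.items():
            for child, t in children:
                term *= jc_prob(state[parent], state[child], ALPHA, t)
        total += term
    return total


print(site_likelihood_pruning())
print(site_likelihood_brute_force())  # same value, up to rounding
```

Both functions give the same $\Pr(D \mid H)$ for this site; pruning avoids enumerating ancestral states, which matters because the number of assignments grows as $4^{n-1}$ for $n$ tips on a rooted tree.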

Bayes Theorem

$$P(H \mid D) = \frac{P(H)\,P(D \mid H)}{P(D)}, \qquad P(D) = \sum_H P(H)\,P(D \mid H)$$

  • H = hypothesis; D = data
  • $P(H \mid D)$: the conditional probability of H given D, the posterior probability
  • $P(H)$: the prior probability, or marginal probability of H
  • $P(D \mid H)$: the likelihood function
  • $P(D)$: the prior probability, or marginal probability of D; it is the normalizing constant that ensures $\sum_H P(H \mid D) = 1$
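For a finite set of hypotheses, Bayes' theorem is a few lines of arithmetic. A minimal Python sketch with three hypothetical hypotheses, equal priors and invented likelihoods:

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a finite set of hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}  # P(H) P(D|H)
    marginal = sum(joint.values())  # P(D) = sum over H of P(H) P(D|H)
    return {h: p / marginal for h, p in joint.items()}


priors = {"H1": 1 / 3, "H2": 1 / 3, "H3": 1 / 3}
likelihoods = {"H1": 0.02, "H2": 0.01, "H3": 0.01}
post = posterior(priors, likelihoods)
print(post)  # approximately H1: 0.5, H2: 0.25, H3: 0.25
```

Dividing by $P(D)$ is what makes the posteriors sum to 1; with equal priors the posterior ranking simply follows the likelihoods.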

Take Home Message
  • Likelihood: the probability of the data given the hypothesis, P(D|H) => difficult to interpret as support for the hypothesis
  • Bayesian approach: the probability of the hypothesis given the data, P(H|D) => a probability for the hypothesis of interest
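Bayesian Inference of Phylogeny

$$f(\tau_i \mid X) = \frac{f(\tau_i)\, f(X \mid \tau_i)}{\sum_{j=1}^{B(s)} f(\tau_j)\, f(X \mid \tau_j)}$$

$$f(\tau_i, v_i, \theta \mid X) = \frac{f(\tau_i, v_i, \theta)\, f(X \mid \tau_i, v_i, \theta)}{\sum_{j=1}^{B(s)} \int_{v_j}\int_{\theta} f(\tau_j, v_j, \theta)\, f(X \mid \tau_j, v_j, \theta)\, dv_j\, d\theta}$$

$$f(\tau_i \mid X) = \frac{\int_{v_i}\int_{\theta} f(\tau_i, v_i, \theta)\, f(X \mid \tau_i, v_i, \theta)\, dv_i\, d\theta}{\sum_{j=1}^{B(s)} \int_{v_j}\int_{\theta} f(\tau_j, v_j, \theta)\, f(X \mid \tau_j, v_j, \theta)\, dv_j\, d\theta}$$

Here $\tau_i$ is the i-th tree topology, $v_i$ its branch lengths, $\theta$ the substitution-model parameters, X the data, and B(s) the number of possible trees for s species.
  • Calculating the posterior probability of a tree involves a summation over all possible trees and, for each tree, integration over all combinations of branch lengths and substitution-model parameter values
  • Inferences about any single parameter are based on the marginal distribution of that parameter
  • The marginal posterior distribution of the topology, for example, integrates out all the other parameters
  • Advantage: the power of the analysis is focused on the parameter of interest (e.g., the tree topology)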
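Marginalizing over nuisance parameters can be shown on a toy discrete grid. In this Python sketch the three topologies, the single nuisance parameter theta, and all weights (unnormalized prior times likelihood) are invented, and a plain sum over equally spaced grid points stands in for the integral:

```python
# Hypothetical unnormalized values of f(tau, theta) * f(X | tau, theta)
# on an equally spaced grid of one nuisance parameter theta.
JOINT = {
    ("T1", 0.5): 2.0, ("T1", 1.0): 3.0, ("T1", 1.5): 1.0,
    ("T2", 0.5): 1.0, ("T2", 1.0): 1.0, ("T2", 1.5): 0.0,
    ("T3", 0.5): 0.5, ("T3", 1.0): 0.5, ("T3", 1.5): 1.0,
}


def marginal_topology_posterior(joint):
    """Sum theta out for each topology, then normalize over topologies."""
    totals = {}
    for (tau, _theta), weight in joint.items():
        totals[tau] = totals.get(tau, 0.0) + weight  # "integrate" out theta
    norm = sum(totals.values())  # denominator: sum over all trees
    return {tau: w / norm for tau, w in totals.items()}


print(marginal_topology_posterior(JOINT))
# approximately {'T1': 0.6, 'T2': 0.2, 'T3': 0.2}
```

Real analyses cannot enumerate trees and parameter values like this, which is why Bayesian phylogenetics programs sample the joint posterior (e.g. by MCMC) and read topology frequencies off the samples.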
Estimating phylogenies
  • Exhaustive Searches
  • Branch and bound methods
  • Rise in computational time versus rise in solution space
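The solution space grows much faster than the number of taxa: there are $(2s-5)!!$ unrooted and $(2s-3)!!$ rooted binary trees for $s$ labelled taxa, which is the $B(s)$ that an exhaustive search must score. A short Python sketch (helper names are hypothetical):

```python
def double_factorial(n):
    """n!! = n * (n - 2) * ... down to 1 (returns 1 for n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def n_unrooted_trees(s):
    """Number of unrooted binary topologies for s >= 3 labelled taxa."""
    if s < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * s - 5)


def n_rooted_trees(s):
    """Number of rooted binary topologies for s >= 2 labelled taxa."""
    if s < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * s - 3)


for s in (4, 5, 10, 20):
    print(s, n_unrooted_trees(s), n_rooted_trees(s))
```

Five taxa give 15 unrooted trees and ten give 2,027,025, so exhaustive and branch-and-bound searches become impractical well before the data sets on the next slide.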
HIV-1 Whole Genomes

1993 - 15

2003 (JAN) - 397

Heuristic Searches
  • Nearest-neighbor interchanges (NNI) - exchange two subtrees that lie on opposite sides of a single interior branch
  • Subtree pruning and regrafting (SPR) - remove a branch from the tree (either an interior or an exterior branch) together with the subtree attached to it, then reinsert the subtree into the remaining tree in all possible places
  • Tree bisection and reconnection (TBR) - break an interior branch and treat the two resulting fragments of the tree as separate trees; then make all possible connections between a branch of one and a branch of the other
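NNI is the simplest of these rearrangements to code. The Python sketch below works on rooted binary trees written as nested tuples (a hypothetical representation chosen for brevity): for every interior branch joining a node to an internal child, the child's sibling is swapped with each of the child's two subtrees, giving two neighbours per interior branch:

```python
def nni_neighbors(tree):
    """Return every tree one NNI move away from `tree`.

    A tip is a str; an internal node is a 2-tuple (left, right);
    the outermost tuple is the root.
    """
    if isinstance(tree, str):
        return []
    left, right = tree
    neighbors = []
    # Interior branch to the left child: swap `right` with a or b.
    if isinstance(left, tuple):
        a, b = left
        neighbors.append(((a, right), b))
        neighbors.append(((b, right), a))
    # Interior branch to the right child: swap `left` with a or b.
    if isinstance(right, tuple):
        a, b = right
        neighbors.append((b, (a, left)))
        neighbors.append((a, (b, left)))
    # Interior branches further down the tree.
    neighbors.extend((new_left, right) for new_left in nni_neighbors(left))
    neighbors.extend((left, new_right) for new_right in nni_neighbors(right))
    return neighbors


tree = ((("A", "B"), "C"), "D")
for neighbor in nni_neighbors(tree):
    print(neighbor)
print(len(nni_neighbors(tree)))  # 4
```

A hill-climbing search would score each neighbour (for example by parsimony length), move to the best one, and stop when no neighbour improves the score. SPR and TBR examine larger neighbourhoods at higher cost per step.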
Other approaches
  • Tree-fusing - find two near optimal trees and exchange subgroups between the two trees
  • Genetic Algorithms - a simulation of evolution with a genotype that describes the tree and a fitness function that reflects the optimality of the tree
  • Disc Covering - upcoming paper
Phylogenetic Accuracy?
  • Consistency - A phylogenetic method is consistent for a given evolutionary model if the method converges on the correct tree as the data available to the method become infinite.
  • Efficiency - Statistical efficiency is a measure of how quickly a method converges on the correct solution as more data are applied to the problem.
  • Robustness - Robustness refers to the degree to which violations of a method's assumptions affect its performance.
How reliable is MY phylogeny?
  • Bootstrap Analysis
  • Jackknife Analysis
  • Posterior Probabilities (Bayesian Approaches)
  • Decay Indices
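The bootstrap and jackknife both build pseudo-replicate data sets from the original alignment: the bootstrap draws columns with replacement (same length), the jackknife deletes a fraction of columns. A Python sketch of just the resampling step, using the alignment from the earlier slides; the 50% deletion fraction, the seed and the helper names are hypothetical choices:

```python
import random

ALIGNMENT = {
    "Sp1": "ACCGTCTTGTTA",
    "Sp2": "AGCGTCATCAAA",
    "Sp3": "AGCGTCATCAAA",
    "Sp4": "ACCGTCTTGATA",
    "Sp5": "AGCCTCTTCATA",
}


def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def jackknife_replicate(alignment, rng, delete_fraction=0.5):
    """Keep a random subset of columns (without replacement), in order."""
    n_sites = len(next(iter(alignment.values())))
    n_keep = n_sites - int(n_sites * delete_fraction)
    cols = sorted(rng.sample(range(n_sites), n_keep))
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


rng = random.Random(1)
boot = [bootstrap_replicate(ALIGNMENT, rng) for _ in range(100)]
jack = [jackknife_replicate(ALIGNMENT, rng) for _ in range(100)]
print(boot[0])
print(jack[0])
```

Each replicate would then be analysed with the same tree-building method; the support for a group (for example Sp2 + Sp3) is the percentage of replicate trees that contain it.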