Lecture 1: Overview of phylogenetic methods and applications. Allan Wilson. Charles Darwin and Alfred Russel Wallace: evolution as descent with modification, implying relationships between organisms by unbroken genetic lines. Phylogenetics seeks to determine these genetic relationships.
Alfred Russel Wallace
Darwin’s sketch: the first phylogenetic tree?
Figure: competing mammal trees from Ji et al. and Hu et al. (panels labelled archaic therians and modern therians), with the opalized lower jaw of the monotreme Steropodon.
e.g. Jaw rotation: weak (0), moderate (1), strong (2) as indicated by vertical wear facets on molars. Hu et al. (Nature, 1997) and Ji et al. (Nature, 1999) coded Steropodon (1) and (2) respectively, helping to account for their alternative placements of monotremes
Figure: Hominid phylogeny from DNA (time axis in millions of years).
Biological problem (the question)
Which data to obtain (data sampling)
Finding the best tree (search strategy)
Defining the best tree (optimality criterion)
What is the relationship of the extinct American Cheetah (Miracinonyx trumani) to other cats?
Two main sister group hypotheses
A. Cheetahs (Acinonyx jubatus): limb, skull and vertebral morphology
B. Pumas (Felis concolor): Geography, early fossils less cheetah-like
See Barnett et al. (Curr. Biol., 2005)
Mitochondrial (mt) control region: best for divergences < 2 million years
mt protein/RNA-coding genes: best for 2-25 million years
Nuclear protein-coding genes: best for > 25 million years
Dimensions ntax=29 nchar=10692;
Format datatype=dna gap=-;
Type of data: distances vs. discrete characters (e.g. nucleotides)
Reducing discrete characters to distances loses information, which often means a loss of statistical power
Unweighted pair group method with arithmetic mean (UPGMA)
Maximum parsimony (MP)
Minimum evolution (ME)
Maximum likelihood (ML)
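Number of possible bifurcating trees (where n is the number of taxa):
Unrooted trees: $(2n-5)(2n-7)\cdots 3 \cdot 1 = (2n-5)!!$
Rooted trees: $(2n-3)(2n-5)\cdots 3 \cdot 1 = (2n-3)!!$
For the 11-taxon cat phylogeny:
Unrooted $= 17 \times 15 \times 13 \times 11 \times 9 \times 7 \times 5 \times 3 \times 1 = 34{,}459{,}425$
Rooted $=$ Unrooted $\times (2n-3) = 34{,}459{,}425 \times 19 = 654{,}729{,}075$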
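The double-factorial formulas above are easy to check numerically. This is a minimal Python sketch; the function names are illustrative and not taken from any phylogenetics package.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted(n):
    """Number of unrooted bifurcating trees for n >= 3 taxa: (2n-5)!!."""
    return double_factorial(2 * n - 5)


def n_rooted(n):
    """Number of rooted bifurcating trees for n >= 2 taxa: (2n-3)!!."""
    return double_factorial(2 * n - 3)


for n in (4, 11, 12, 20):
    print(n, n_unrooted(n), n_rooted(n))
```

The counts grow faster than exponentially, which is why exhaustive search stops being practical beyond about a dozen taxa.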
An exhaustive search will examine all trees, but is not practical for n > 12
Heuristic search: find an initial tree and move within nearby tree space, discarding worse alternatives
Only a small part of tree space is searched and there is no guarantee of finding the optimal tree; the search can be trapped in local optima
As trees are built and branches added, if the addition of a taxon to a particular branch results in a tree-length greater than a previously determined upper bound for the tree, then this topology and all those derived from it are ignored and the search continues with a new placement for that taxon
Branch and bound guarantees finding globally optimal trees
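Branch and bound can be sketched for a small parsimony problem. This minimal Python illustration assumes Fitch parsimony as the optimality criterion and builds unrooted trees by stepwise addition of taxa; unlike practical programs, it starts with an infinite bound instead of one taken from a heuristic tree. The function names are hypothetical, and the example data are the four short sequences (taxa X, Y, Z and an outgroup) from the parsimony example in this lecture.

```python
from collections import defaultdict
from itertools import count


def fitch_length(edges, seqs):
    """Fitch parsimony length of an unrooted binary tree.

    edges: list of (node, node) pairs; leaves are taxon names (keys of seqs),
    internal nodes are integers.
    """
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def down(node, parent, site):
        # Post-order Fitch pass: (possible states, changes within this subtree)
        if node in seqs:
            return {seqs[node][site]}, 0
        (s1, c1), (s2, c2) = [down(child, node, site)
                              for child in adj[node] if child != parent]
        common = s1 & s2
        return (common, c1 + c2) if common else (s1 | s2, c1 + c2 + 1)

    first = next(iter(seqs))  # root the tree on the branch leading to this leaf
    total = 0
    for site in range(len(seqs[first])):
        states, changes = down(adj[first][0], first, site)
        total += changes + (0 if seqs[first][site] in states else 1)
    return total


def search(seqs, use_bound=True):
    """Stepwise-addition search; with use_bound, prunes partial trees longer than the best tree."""
    taxa = list(seqs)
    best = {"length": float("inf"), "trees": []}
    new_node = count()  # integer ids for internal nodes

    def grow(edges, k):
        length = fitch_length(edges, {t: seqs[t] for t in taxa[:k]})
        # Adding a taxon never shortens a parsimony tree, so every tree derived
        # from this one would also exceed the bound
        if use_bound and length > best["length"]:
            return
        if k == len(taxa):
            if length < best["length"]:
                best["length"], best["trees"] = length, [edges]
            elif length == best["length"]:
                best["trees"].append(edges)
            return
        for i, (a, b) in enumerate(edges):
            u = next(new_node)  # new internal node that splits edge (a, b)
            grow(edges[:i] + edges[i + 1:] + [(a, u), (u, b), (u, taxa[k])], k + 1)

    hub = next(new_node)
    grow([(hub, t) for t in taxa[:3]], 3)  # start from the only 3-taxon tree
    return best["length"], best["trees"]


seqs = {"Y": "TCAGCTA", "X": "ACATGTG", "Z": "ACGTCAG", "Outgroup": "AAGTCTG"}
length, trees = search(seqs)
print(length, len(trees))
```

Setting `use_bound=False` turns the same routine into an exhaustive search, which is a convenient way to confirm that pruning never discards an optimal tree.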
Absolute distance matrix
1 2 3 4 5 6 7 8 9 10 11
1 Mongoose -
2 Hyena 156 -
3 Sabretooth 207 147 -
4 Am.Cheetah 192 140 159 -
5 Lion 186 134 148 131 -
6 Tiger 160 143 132 111 64 -
7 Puma 194 139 162 70 124 100 -
8 House.Cat 206 133 163 124 118 100 117 -
9 Cheetah 192 139 162 108 127 109 96 110 -
10 Ocelot 206 123 165 116 116 98 111 98 113 -
11 Jaguarundi 204 147 177 123 143 121 101 119 128 131 -
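UPGMA can be run directly on this matrix. The sketch below is a plain-Python illustration, not the API of any package: it repeatedly joins the closest pair of clusters, replaces their distances to every other cluster with a size-weighted average, and reports each join at half the joining distance, since UPGMA assumes a molecular clock. The lower triangle is entered row by row exactly as shown above.

```python
def upgma(names, lower):
    """UPGMA clustering from a lower-triangular distance matrix.

    lower[i][j] (j < i) is the distance between taxa i and j.
    Returns (left, right, height) for each join, in the order made.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (label, size)
    dist = {frozenset((i, j)): float(lower[i][j])
            for i in range(len(names)) for j in range(i)}
    joins = []
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        (label_a, size_a), (label_b, size_b) = clusters.pop(a), clusters.pop(b)
        joins.append((label_a, label_b, dist.pop(pair) / 2))
        for c in clusters:
            # Arithmetic mean over all taxon pairs, i.e. weighted by cluster size
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((next_id, c))] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    return joins


names = ["Mongoose", "Hyena", "Sabretooth", "Am.Cheetah", "Lion", "Tiger",
         "Puma", "House.Cat", "Cheetah", "Ocelot", "Jaguarundi"]
lower = [
    [],
    [156],
    [207, 147],
    [192, 140, 159],
    [186, 134, 148, 131],
    [160, 143, 132, 111, 64],
    [194, 139, 162, 70, 124, 100],
    [206, 133, 163, 124, 118, 100, 117],
    [192, 139, 162, 108, 127, 109, 96, 110],
    [206, 123, 165, 116, 116, 98, 111, 98, 113],
    [204, 147, 177, 123, 143, 121, 101, 119, 128, 131],
]

for left, right, height in upgma(names, lower):
    print(f"{left} + {right} at height {height}")
```

The first two joins are Lion with Tiger (distance 64) and the American Cheetah with the Puma (distance 70).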
Taxon Y  TCAGCTA
Taxon X  ACATGTG
Taxon Z  ACGTCAG
XZ = 3 differences, YZ = 5 differences, XY = 4 differences
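The difference counts above are Hamming distances between aligned sequences. A short Python check (the function name is illustrative):

```python
def count_differences(seq1, seq2):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


seqs = {"Y": "TCAGCTA", "X": "ACATGTG", "Z": "ACGTCAG"}
for p, q in [("X", "Z"), ("Y", "Z"), ("X", "Y")]:
    print(p + q, count_differences(seqs[p], seqs[q]))
```

A distance method would group X with Z, the closest pair.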
Taxon Y   TC A GCTA
Taxon X   AC A TGTG
Taxon Z   AC G TCAG
Outgroup  AA G TCTG
Synapomorphies are shared derived characters and so are considered to define clades (relationship groupings)
* Character 3 changes G to A
8 step sub-optimal phenetic tree
7 steps (MP tree)
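The 7- and 8-step counts can be reproduced with Fitch's parsimony algorithm. In this minimal sketch (an illustration, not a particular program's implementation) each tree is written as nested tuples rooted on the outgroup; moving the root does not change a Fitch length. The MP tree groups X with Y, while the phenetic tree groups X with Z.

```python
def fitch(tree, seqs, site):
    """Return (possible ancestral states, number of changes) for one site."""
    if isinstance(tree, str):  # a leaf is just the taxon name
        return {seqs[tree][site]}, 0
    left, right = tree
    s1, c1 = fitch(left, seqs, site)
    s2, c2 = fitch(right, seqs, site)
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1  # empty intersection costs one change


def tree_length(tree, seqs):
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, seqs, i)[1] for i in range(n_sites))


seqs = {"Y": "TCAGCTA", "X": "ACATGTG", "Z": "ACGTCAG", "Outgroup": "AAGTCTG"}
mp_tree = ((("Y", "X"), "Z"), "Outgroup")
phenetic_tree = ((("X", "Z"), "Y"), "Outgroup")
print(tree_length(mp_tree, seqs), tree_length(phenetic_tree, seqs))
```

Only character 3 is parsimony-informative here: it costs one step on the MP tree and two on the phenetic tree, while the other six characters cost one step on either tree.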
L = Pr(D|H)
Probability of the data, given an hypothesis
The hypothesis is a tree topology, its branch-lengths and a model under which the data evolved
First use in phylogenetics: Cavalli-Sforza and Edwards (1967) for gene frequency data; Felsenstein (1981) for DNA sequences
Model of sequence evolution, e.g. Hasegawa-Kishino-Yano (HKY85; 1985): 4 base frequencies and a transition/transversion (ti/tv) ratio
Figure: likelihood calculation for a single nucleotide site of an alignment on a tree (scale bar: 0.5 substitutions per site).
Sum the probabilities for each of the 16 internal node combinations to get the likelihood for this single nucleotide site
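A minimal sketch of that 16-term sum, assuming a four-taxon unrooted tree ((1,2),(3,4)) and, for simplicity, the Jukes-Cantor (JC69) model instead of the richer model mentioned above. The branch lengths are hypothetical. Because JC69 is time-reversible with equal base frequencies, the calculation can start from either internal node.

```python
import itertools
import math

BASES = "ACGT"


def jc69(i, j, t):
    """JC69 probability that base i is base j after t substitutions per site."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(tips, t):
    """Likelihood of one alignment column on the unrooted tree ((1,2),(3,4)).

    tips: the four bases at taxa 1-4; t: branch lengths, t[0]..t[3] for the
    terminal branches of taxa 1-4 and t[4] for the internal branch x-y.
    """
    a, b, c, d = tips
    total = 0.0
    for x, y in itertools.product(BASES, repeat=2):  # the 16 internal-node combinations
        total += (0.25  # JC69 equilibrium frequency of base x
                  * jc69(x, a, t[0]) * jc69(x, b, t[1])
                  * jc69(x, y, t[4])
                  * jc69(y, c, t[2]) * jc69(y, d, t[3]))
    return total


# Hypothetical example: taxa Y, X | Z, Outgroup with every branch 0.1 subs/site
seqs = ["TCAGCTA", "ACATGTG", "ACGTCAG", "AAGTCTG"]
branches = [0.1] * 5
ln_l = sum(math.log(site_likelihood(column, branches)) for column in zip(*seqs))
print(f"lnL = {ln_l:.3f}")
```

Sites are assumed to evolve independently, so the log-likelihood of the alignment is the sum of the per-site log-likelihoods; comparing this value across topologies and branch lengths is what an ML search does.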
The tree with the highest lnL (equivalently, the lowest -lnL) is the ML tree
Prior probability, the probability of the hypothesis on previous knowledge
Likelihood function, probability of the data given the hypothesis
Posterior probability, the probability of the hypothesis given the data
Unconditional probability of the data, a normalizing constant ensuring the posterior probabilities sum to 1.00
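The four quantities above combine in Bayes' theorem. It is written here in the same notation as the likelihood L = Pr(D|H), assuming for the normalizing constant a discrete set of hypotheses H_i; with continuous parameters such as branch lengths, the sum becomes an integral.

```latex
\Pr(H \mid D) = \frac{\Pr(D \mid H)\,\Pr(H)}{\Pr(D)},
\qquad
\Pr(D) = \sum_{i} \Pr(D \mid H_i)\,\Pr(H_i)
```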
First use in phylogenetics: Li (1996, PhD thesis), Rannala and Yang (1996)
Markov chain Monte Carlo (MCMC) is used to approximate Bayesian posterior probabilities (BPP), running for thousands to millions of generations
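Figure: MCMC trace over generations 1-6, with each proposed new state either accepted or rejected. BPP(tree 1) = 4/6, the proportion of generations in which tree 1 is the current state.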
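The accept/reject step can be sketched for a toy problem. This illustration assumes three candidate trees with hypothetical unnormalized posterior values (likelihood × prior); real phylogenetic MCMC also proposes changes to branch lengths and model parameters. Each generation proposes a different tree, accepts it with probability min(1, posterior ratio), and the BPP of a tree is estimated as the fraction of generations the chain spends on it. The normalizing constant Pr(D) cancels in the ratio, which is why MCMC never needs to compute it.

```python
import random


def mcmc_tree_posterior(unnorm_post, n_gen, seed=0):
    """Metropolis sampler over a small, discrete set of trees.

    unnorm_post: hypothetical likelihood x prior value for each tree.
    Returns the fraction of generations spent on each tree (the BPP estimate).
    """
    rng = random.Random(seed)
    n = len(unnorm_post)
    current = 0
    visits = [0] * n
    for _ in range(n_gen):
        # Propose one of the other trees uniformly at random (a symmetric proposal)
        proposal = rng.randrange(n - 1)
        if proposal >= current:
            proposal += 1
        # Accept with probability min(1, posterior ratio); Pr(D) cancels here
        if rng.random() < unnorm_post[proposal] / unnorm_post[current]:
            current = proposal  # new state accepted
        visits[current] += 1    # after a rejection the current tree is counted again
    return [v / n_gen for v in visits]


bpp = mcmc_tree_posterior([6.0, 3.0, 1.0], n_gen=200_000, seed=1)
print([round(p, 3) for p in bpp])  # should approach the normalized values 0.6, 0.3, 0.1
```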
Figure: prior for a parameter value (e.g. the proportion of invariant sites) and the resulting posterior for the proportion of invariant sites, each plotted on a 0 to 1 scale.
Maximum parsimony and neighbour-joining (distance) cladogram
Maximum likelihood and Bayesian inference phylogram
The tree of life and inferring our origins
Little evidence from fossils
ACA GAG CGC Threonine - Glutamic acid - Arginine
ACG GAG AGC Threonine - Glutamic acid - Serine
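The two codon sequences above differ at one synonymous and one non-synonymous codon. A minimal Python sketch that classifies codon differences using the standard genetic code (built from the usual TCAG-ordered 64-letter table; `*` marks stop codons). It counts differing codons only; a real dN/dS estimate also needs the numbers of synonymous and non-synonymous sites (e.g. Nei-Gojobori counting) or a codon model, which this sketch does not compute.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def classify_differences(seq1, seq2):
    """Count differing codons that are synonymous vs. non-synonymous."""
    syn = nonsyn = 0
    for p in range(0, len(seq1) - len(seq1) % 3, 3):
        c1, c2 = seq1[p:p + 3], seq2[p:p + 3]
        if c1 == c2:
            continue
        if CODE[c1] == CODE[c2]:
            syn += 1     # same amino acid
        else:
            nonsyn += 1  # amino acid changes
    return syn, nonsyn


# (1, 1): ACA->ACG keeps threonine; CGC->AGC changes arginine to serine
print(classify_differences("ACAGAGCGC", "ACGGAGAGC"))
```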
Decreased dN/dS (below 1) suggests purifying selection
dN and dS: rates of non-synonymous (N) and synonymous (S) substitutions per site
The dN/dS ratio can be estimated along branches of phylogenetic trees (e.g. Guindon et al. PNAS, 2004)
Here dN/dS is indicated by branch width
Increased dN/dS (above 1) suggests positive selection
Non-synonymous/synonymous ratios for peptide binding regions and non-peptide binding regions
MHC (Major histocompatibility complex) binds antigens and presents them to T-cells as part of the immune response.
Positive selection at binding sites provides high MHC variability with which to confront new pathogenic threats.
Mhc class II B with inferred locations of population-specific amino acid changes for Gloucester and Hot Spot.
Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) has a recombinant history with lineages of types I and III coronavirus
Understanding sequence evolution, and the biases that may result from models (which are necessarily simplifications), is of vital importance in phylogenetic inference
Caliciviruses infect diverse mammalian hosts and include Norovirus, the major cause of food-borne viral gastroenteritis in humans.
Host switching by caliciviruses is rare, although pigs have strains from co-speciation (artiodactyl strain) and host switching (carnivoran strain).
Many plants: follow wind dispersal patterns
Many land animals: follow continental break-up
Southern South America
From: Sanmartín and Ronquist (Syst. Biol., 2004)
Relict population of 25-40 individuals in the Russian Far East.
Does the 65 Ma meteor impact (Alvarez et al. Science, 1980) fully explain the “great reptile extinction” and the rise of modern birds and mammals?
Bison (Lascaux, France)
The distribution of coalescence events over time on the tree allows inference of relative population size
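One simple way to turn coalescence times into population sizes is a skyline-style estimator. The sketch below assumes a constant-size haploid coalescent within each interval, where k lineages coalesce at rate k(k-1)/2 divided by N (time in generations), so each interval length multiplied by k(k-1)/2 estimates N for that interval. The simulation helper, population size and replicate count are hypothetical choices made only to show that the estimates centre on the true value.

```python
import random


def skyline(intervals):
    """Population-size estimate for each (k, duration) coalescent interval."""
    return [t * k * (k - 1) / 2 for k, t in intervals]


def simulate_intervals(n, pop_size, rng):
    """Coalescent intervals for n sampled lineages in a constant haploid population."""
    intervals = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2 / pop_size  # number of lineage pairs divided by N
        intervals.append((k, rng.expovariate(rate)))
    return intervals


rng = random.Random(42)
reps = [skyline(simulate_intervals(10, 1000.0, rng)) for _ in range(20000)]
mean_per_interval = [sum(col) / len(col) for col in zip(*reps)]
print([round(m) for m in mean_per_interval])  # averages should scatter around 1000
```

A single genealogy gives a noisy estimate for each interval; plotting these estimates against time is what lets changes in population size, for example around the last glacial maximum, show up on a tree.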
Last glacial maximum