Molecular Phylogenetics


Introduction to evolution and phylogeny nomenclature of trees four stages of molecular phylogeny

Molecular Phylogenetics

Introduction to evolution and phylogeny

Nomenclature of trees

Four stages of molecular phylogeny:

[1] selecting sequences

[2] multiple sequence alignment

[3] tree-building

[4] tree evaluation

Practical approaches to making trees


Introduction

At the molecular level, evolution is a process of mutation with selection.

Molecular evolution is the study of changes in genes and proteins throughout different branches of the tree of life.

Phylogeny is the inference of evolutionary relationships. Traditionally, phylogeny relied on the comparison of morphological features between organisms. Today, molecular sequence data are also used for phylogenetic analyses.


[Figure: Dickerson (1971). Corrected amino acid changes per 100 residues (m) plotted against millions of years since divergence.]


Molecular clock for proteins: rate of substitutions per amino acid site per 10^9 years

  Protein            Rate
  Fibrinopeptides    9.0
  Kappa casein       3.3
  Lactalbumin        2.7
  Serum albumin      1.9
  Lysozyme           0.98
  Trypsin            0.59
  Insulin            0.44
  Cytochrome c       0.22
  Histone H2B        0.09
  Ubiquitin          0.010
  Histone H4         0.010


Molecular clock hypothesis: implications

If protein sequences evolve at constant rates, they can be used to estimate the times that sequences diverged. This is analogous to dating geological specimens by radioactive decay.


N = total number of substitutions

L = number of nucleotide sites compared between two sequences

K = N / L = number of substitutions per nucleotide site


Rate of nucleotide substitution r and time of divergence T

r = rate of substitution = 0.56 x 10^-9 per site per year for hemoglobin alpha

K = 0.093 = number of substitutions per nucleotide site (rat versus human)

r = K / 2T

T = K / 2r = 0.093 / (2 x 0.56 x 10^-9) ≈ 83 million years (roughly 80 million years)
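The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and the factor of 2 is read as both lineages accumulating substitutions since they diverged.

```python
def substitutions_per_site(n_substitutions, n_sites):
    """K = N / L: substitutions per nucleotide site."""
    return n_substitutions / n_sites


def divergence_time(k, rate):
    """T = K / (2r): two lineages each evolve for T years at rate r."""
    return k / (2.0 * rate)


r = 0.56e-9   # substitutions per site per year (hemoglobin alpha)
K = 0.093     # substitutions per site, rat versus human

T = divergence_time(K, r)
print(f"T = {T / 1e6:.0f} million years")
```

With these inputs the estimate is about 83 million years, which the slide rounds to 80 million.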


Neutral theory of evolution

An often-held view of evolution is that just as organisms propagate through natural selection, so also DNA and protein molecules are selected for.

According to Motoo Kimura’s 1968 neutral theory of molecular evolution, the vast majority of DNA changes are not selected for in a Darwinian sense. The main cause of evolutionary change is random drift of mutant alleles that are selectively neutral (or nearly neutral). Positive Darwinian selection does occur, but it has a limited role.


Goals of molecular phylogeny

Phylogeny can answer questions such as:

  • How many genes are related to my favorite gene?

  • Was the extinct quagga more like a zebra or a horse?

  • Was Darwin correct that humans are closest to chimps and gorillas?

  • How related are whales, dolphins & porpoises to cows?

  • Where and when did HIV originate?

  • What is the history of life on earth?


[Figure credit: Woese, PNAS]


Molecular phylogeny: nomenclature of trees

There are two main kinds of information inherent in any tree: topology and branch lengths.

We will now describe the parts of a tree.


Molecular phylogeny uses trees to depict evolutionary relationships among organisms. These trees are based upon DNA, RNA, and protein sequence data.

[Figure: a five-taxon tree (A-E, internal nodes F-I, numbered branch lengths) shown as a chronogram with a time axis and as a phylogram with a one-unit scale bar.]


Tree nomenclature

[Figure: the same two five-taxon trees (A-E), with the terminal labels marked "taxon".]


Tree nomenclature

[Figure: the same two five-taxon trees (A-E). Labels: taxon; operational taxonomic unit (OTU), such as a protein sequence.]


Tree nomenclature

[Figure: the same two five-taxon trees (A-E). Labels: node (intersection or terminating point of two or more branches); branch (edge).]


Tree nomenclature

Branches are unscaled: OTUs are neatly aligned, and nodes reflect time.

Branches are scaled: branch lengths are proportional to the number of amino acid changes.

[Figure: the same five-taxon tree (A-E) drawn with unscaled branches along a time axis and with scaled branches beside a one-unit scale bar.]


Tree nomenclature

[Figure: the five-taxon trees (A-E). Labels: bifurcating internal node; multifurcating internal node.]


Examples of multifurcation: failure to resolve the branching order of some metazoans and protostomes

Rokas A. et al., Animal Evolution and the Molecular Signature of Radiations Compressed in Time, Science 310:1933, 23 December 2005, Fig. 1.


Tree nomenclature: clades

Clade ABF (monophyletic group)

A group is monophyletic (Greek: "of one race") if it consists of a common ancestor and all its descendants.

(http://en.wikipedia.org/wiki/)

[Figure: the five-taxon tree (A-E, internal nodes F-I) along a time axis, with clade ABF marked.]


Tree roots

The root of a phylogenetic tree represents the common ancestor of the sequences. Some trees are unrooted, and thus do not specify the common ancestor.

A tree can be rooted using an outgroup (that is, a taxon known to be distantly related to all other OTUs).


Tree nomenclature: roots

[Figure: a rooted tree (specifies the evolutionary path, drawn from past to present) beside an unrooted tree, each with numbered nodes.]


Tree nomenclature: outgroup rooting

[Figure: a tree with numbered nodes in which an outgroup is used to place the root, shown beside the resulting rooted tree drawn from past to present.]


Enumerating trees

Cavalli-Sforza and Edwards (1967) derived the number of possible unrooted trees (NU) for n OTUs (n > 3):

NU = (2n-5)! / [2^(n-3) (n-3)!]

The number of bifurcating rooted trees (NR):

NR = (2n-3)! / [2^(n-2) (n-2)!]

For 10 OTUs (e.g. 10 DNA or protein sequences), the number of possible rooted trees is ≈ 34 million, and the number of unrooted trees is ≈ 2 million. Many tree-making algorithms can exhaustively examine every possible tree for up to ten to twelve sequences.
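To see how quickly these counts grow, the two formulas can be evaluated directly. The following is a small sketch in Python; the function names are illustrative and are not taken from any phylogenetics package.

```python
from math import factorial


def unrooted_trees(n):
    """NU = (2n-5)! / (2^(n-3) (n-3)!) bifurcating unrooted trees for n OTUs."""
    if n < 3:
        raise ValueError("need at least 3 OTUs")
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


def rooted_trees(n):
    """NR = (2n-3)! / (2^(n-2) (n-2)!) bifurcating rooted trees for n OTUs."""
    if n < 2:
        raise ValueError("need at least 2 OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


for n in (4, 5, 10, 12):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For n = 10 this gives 2,027,025 unrooted and 34,459,425 rooted trees, matching the rounded figures above. Each extra OTU multiplies the count by an ever larger odd number, which is why exhaustive search stops being practical beyond about a dozen sequences.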


Species trees versus gene/protein trees

Molecular evolutionary studies can be complicated by the fact that both species and genes evolve. Speciation usually occurs when a species becomes reproductively isolated. In a species tree, each internal node represents a speciation event.

Genes (and proteins) may duplicate or otherwise evolve before or after any given speciation event. The topology of a gene (or protein) based tree may differ from the topology of a species tree.


Species trees versus gene/protein trees

[Figure: a species tree drawn from past to present, in which one speciation event gives rise to species 1 and species 2. Gene duplication events are marked on the tree, and the gene copies found in each species are the OTUs.]


Orthology/paralogy

Orthologous genes are homologous (corresponding) genes in different species (genomes) that diverged through a speciation event.

Paralogous genes are homologous genes that arose through gene duplication, typically found within the same species (genome).


Four stages of phylogenetic analysis

Molecular phylogenetic analysis may be described in four stages:

[1] Selection of sequences for analysis

[2] Multiple sequence alignment

[3] Tree building

[4] Tree evaluation


Stage 2: Multiple sequence alignment

The fundamental basis of a phylogenetic tree is a multiple sequence alignment.

(If there is a misalignment, or if a nonhomologous sequence is included in the alignment, it will still be possible to generate a tree, so obtaining a tree does not by itself validate the alignment.)


Two Major Approaches to Phylogeny Inference

1) Distance Matrix Methods

Calculate a matrix of pairwise distances from all the data, then infer a tree using a clustering algorithm.

2) Character Based Methods (maximum parsimony)

Inspect the columns of characters, identify the columns that contain “informative” characters, and use these to infer the tree that best explains the data.


Distance Matrix Methods (matrix calculation)

Reality: not all sites are free to change, and the same site can change multiple times.

The simplest model is that of Jukes & Cantor


Jukes & Cantor: dxy = -(3/4) ln(1 - (4/3) D)

  • dxy = distance between sequence x and sequence y, expressed as the number of changes per site

  • (Note: a distance per site has the form r/n, where r is the number of replacements and n is the total number of sites. This assumes all sites can vary; when invariant sites are present in the two sequences, it underestimates the amount of change that has occurred at the variable sites. This is the "reality" point: not all sites are free to change.)

  • D = the observed proportion of nucleotides that differ between the two sequences (fractional dissimilarity)

  • ln = natural log function, used to correct for superimposed substitutions (in general, taking logs tends to convert exponential trends to linear trends)

  • The 3/4 and 4/3 terms reflect that there are four types of nucleotides and three ways in which a second nucleotide may not match a first, with all types of change being equally likely (i.e. unrelated sequences should be 25% identical by chance alone)


The natural logarithm ln is used to correct for superimposed changes at the same site

  • If two sequences are 95% identical, they differ at 5% of sites (D = 0.05), thus:

    • dxy = -(3/4) ln(1 - (4/3)(0.05)) = 0.0517

  • Note that the observed dissimilarity of 0.05 increases only slightly, to an estimated 0.0517. This makes sense because in two very similar sequences one would expect very few changes to have been superimposed at the same site in the short time since the sequences diverged

  • However, if two sequences are only 50% identical, they differ at 50% of sites (D = 0.50), thus:

    • dxy = -(3/4) ln(1 - (4/3)(0.5)) = 0.824

  • For dissimilar sequences, which may have diverged a long time ago, the ln correction implies that a much larger number of superimposed changes have occurred at the same site
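The two worked values can be reproduced with a short function. This is a minimal sketch of the Jukes & Cantor correction as given in the formula; the function name and the input check are assumptions added for illustration.

```python
import math


def jukes_cantor_distance(D):
    """dxy = -(3/4) ln(1 - (4/3) D) for an observed proportion of differing sites D."""
    if not 0.0 <= D < 0.75:
        # At D >= 0.75 the log argument is zero or negative: the sequences look
        # as different as unrelated ones and the correction is undefined.
        raise ValueError("D must satisfy 0 <= D < 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * D)


for D in (0.05, 0.50):
    print(f"D = {D:.2f}  ->  dxy = {jukes_cantor_distance(D):.4f}")
```

The correction is negligible for near-identical sequences and grows without bound as D approaches 0.75, the dissimilarity expected between unrelated sequences under this model.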


Distance Matrix Methods (tree construction)

UPGMA is the unweighted pair group method using arithmetic mean.


Tree-building methods: UPGMA

Step 1: Compute the pairwise distances of all the proteins. Get ready to put the numbers 1-5 at the bottom of your new tree.

Step 2: Find the two proteins with the smallest pairwise distance. Cluster them. (In the figure, proteins 1 and 2 join at node 6.)

Step 3: Do it again. Find the next two proteins with the smallest pairwise distance. Cluster them. (In the figure, proteins 4 and 5 join at node 7.)

Step 4: Keep going. Cluster. (In the figure, node 8 is added.)

Step 5: Last cluster! This is your tree. (In the figure, node 9 is the root and the proteins appear in the order 1, 2, 4, 5, 3.)
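The clustering steps can be sketched in Python as follows. This is an illustrative implementation, not the code behind the figures: the function name, the nested-tuple tree format, and the example distances are all made up for the sketch.

```python
from itertools import combinations


def upgma(labels, distances):
    """Cluster `labels` by UPGMA.

    `distances` maps frozenset({a, b}) to the distance between labels a and b.
    Returns (tree, root_height), where tree is a nested tuple of labels.
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    d = {frozenset((i, j)): distances[frozenset((labels[i], labels[j]))]
         for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    height = 0.0
    while len(clusters) > 1:
        # Find the closest pair of active clusters and merge them.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: d[frozenset(pair)])
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        height = d[frozenset((i, j))] / 2.0  # the new node sits halfway
        # Distance to the merged cluster: arithmetic mean over all its leaves.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height


# Hypothetical distances for five proteins (not taken from the slides).
labels = ["1", "2", "3", "4", "5"]
example = {frozenset(p): (12.0 if "3" in p else 8.0)
           for p in combinations(labels, 2)}
example[frozenset(("1", "2"))] = 2.0
example[frozenset(("4", "5"))] = 4.0

tree, root_height = upgma(labels, example)
print(tree, root_height)
```

Because every new node is placed at half the distance between the clusters it joins, all tips end up equidistant from the root, which is where the constant-clock assumption enters.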


Distance-based methods: UPGMA trees

  • UPGMA is a simple approach for making trees.

  • A UPGMA tree is always rooted.

  • An assumption of the algorithm is that the molecular clock is constant for sequences in the tree. If there are unequal substitution rates, the tree may be wrong.

  • While UPGMA is simple, it is less accurate than the neighbor-joining approach.


Distance method: Advantages

  • Fast: suitable for analysing data sets that are too large for other, more computationally intensive methods such as maximum likelihood

  • A large number of models are available, with many parameters, which improves the estimation of distances


Distance method: Disadvantages

  • Information is lost: given only the distances, it is impossible to derive the original sequences

  • Only through character based analyses can the history of sites be investigated, e.g. to infer which positions are most informative


Character Based Methods: Maximum Parsimony

The best tree should be the one that requires the smallest number of substitutions to explain the differences among the sequences being studied.

Occam's razor: among William of Ockham's statements (translated from Latin) are "Plurality is not to be assumed without necessity" and "What can be done with fewer [assumptions] is done in vain with more."

One consequence of this methodology is the idea that the simplest or most obvious of several competing explanations is the one that should be preferred until it is proven wrong.


Not all Characters are Used in Parsimony Analysis

  • informative sites - nucleotide (or amino acid) columns that contain at least two different character states, each found in at least two different sequences. These sites allow alternative trees to be distinguished.

  • uninformative sites - nucleotide (or amino acid) columns that do not allow two trees to be distinguished (e.g., constant columns)


Maximum Parsimony (4-taxon case)

  Site  1  2  3  4  5  6  7  8  9 10

  1 -   A  G  G  G  T  A  A  C  T  G
  2 -   A  C  G  A  T  T  A  T  T  A
  3 -   A  T  A  A  T  T  G  T  C  T
  4 -   A  A  T  G  T  T  G  T  C  G

How many informative sites are there in this data set?
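The definition of an informative site translates directly into a column test. The sketch below applies it to the four sequences above; the function name and the dictionary layout are assumptions made for illustration.

```python
from collections import Counter

alignment = {
    "1": "AGGGTAACTG",
    "2": "ACGATTATTA",
    "3": "ATAATTGTCT",
    "4": "AATGTTGTCG",
}


def informative_sites(seqs):
    """Return 1-based positions with at least two states that each occur twice or more."""
    positions = []
    for pos, column in enumerate(zip(*seqs), start=1):
        counts = Counter(column)
        shared_states = [state for state, c in counts.items() if c >= 2]
        if len(shared_states) >= 2:
            positions.append(pos)
    return positions


print(informative_sites(alignment.values()))
```

For this alignment the test selects sites 4, 7 and 9. Every other column is either constant or has at most one state shared by two sequences, so it costs the same number of changes on every tree.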


Maximum Parsimony (4-taxon case)

Four taxa can be joined in three possible unrooted trees. For each tree, count the minimum number of changes needed at each site, then sum over the sites.

  Site 2 (1 - G, 2 - C, 3 - T, 4 - A): 3 changes on every tree

  Site 3 (1 - G, 2 - G, 3 - A, 4 - T): 2 changes on every tree

  Site 4 (1 - G, 2 - A, 3 - A, 4 - G): 2, 2 and 1 changes on the three trees

  Site             1  2  3  4  5  6  7  8  9 10  Total

  ((1,2),(3,4))    0  3  2  2  0  1  1  1  1  2   13
  ((1,3),(2,4))    0  3  2  2  0  1  2  1  2  2   15
  ((1,4),(2,3))    0  3  2  1  0  1  2  1  2  2   14

The tree ((1,2),(3,4)) requires the fewest changes and is therefore the most parsimonious.
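The per-site counting can be sketched with Fitch's set-intersection rule, specialised here to four taxa. The function names are illustrative; the sequences are the ones in the alignment as printed. On that alignment, site 10 (G, A, T, G) has three states with G shared by two sequences, so it costs two changes on every tree and the sketch returns totals of 13, 15 and 14.

```python
alignment = {
    "1": "AGGGTAACTG",
    "2": "ACGATTATTA",
    "3": "ATAATTGTCT",
    "4": "AATGTTGTCG",
}

# The three possible unrooted trees for four taxa.
TREES = [
    (("1", "2"), ("3", "4")),
    (("1", "3"), ("2", "4")),
    (("1", "4"), ("2", "3")),
]


def quartet_site_cost(a, b, c, d):
    """Minimum number of changes at one site on the unrooted tree ((a,b),(c,d))."""
    cost = 0
    left = {a} & {b}
    if not left:              # the two states disagree: one change needed
        left = {a, b}
        cost += 1
    right = {c} & {d}
    if not right:
        right = {c, d}
        cost += 1
    if not (left & right):    # the two sides share no state: one more change
        cost += 1
    return cost


def tree_costs(seqs, tree):
    """Per-site costs of one quartet tree."""
    (p, q), (r, s) = tree
    return [quartet_site_cost(w, x, y, z)
            for w, x, y, z in zip(seqs[p], seqs[q], seqs[r], seqs[s])]


totals = [sum(tree_costs(alignment, tree)) for tree in TREES]
for tree, total in zip(TREES, totals):
    print(tree, tree_costs(alignment, tree), total)
```

Only sites 4, 7 and 9 differ between the rows, which is why uninformative sites can be ignored when ranking trees.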


    Parsimony - advantages

    • is a simple method - easily understood operation

    • does not seem to depend on an explicit model of evolution

    • gives both trees and associated hypotheses of character evolution

    • should give reliable results if the data is well structured and homoplasy is either rare or widely (randomly) distributed on the tree


    Parsimony - disadvantages

    • May give misleading results if homoplasy is common or concentrated in particular parts of the tree, e.g:

      • thermophilic convergence

      • base composition biases

      • long branch attraction

    • Underestimates branch lengths (Why?)

    • Model of evolution is implicit - behaviour of method not well understood

    • Parsimony often justified on purely philosophical grounds - we must prefer simplest hypotheses - particularly by morphologists

    • For most molecular systematists, this is uncompelling


    Parsimony can be inconsistent

    • Felsenstein (1978) developed a simple model phylogeny including four taxa and a mixture of short and long branches

    • Under this model parsimony will give the wrong tree

    Long branches are attracted, but the similarity is homoplastic

    • With more data the certainty that parsimony will give the wrong tree increases - so that parsimony is statistically inconsistent

    • Advocates of parsimony initially responded by claiming that Felsenstein’s result showed only that his model was unrealistic

    • It is now recognised that the long-branch attraction (in the “Felsenstein Zone”) is one of the most serious problems in phylogenetic inference


    Summary and recommendations

    • Remember that molecular phylogenetics yields gene trees

    • Accurate gene trees may not be accurate organismal trees

    • Gene duplications and paralogy, and lateral transfer can produce mismatches between gene and organismal phylogenies

    • Use congruence between separate gene trees to identify robust organismal phylogenies or mismatches that require further information


The most famous case of long-branch attraction (LBA) misleading biologists…

The Universal SSU rRNA Tree

Wheelis et al. 1992 PNAS 89: 2930


The SSU Ribosomal RNA Tree for Eukaryotes

[Figure: SSU rRNA tree for eukaryotes. Labels: Mitochondria?; Prokaryotic outgroup; Archezoa.]

