Building phylogenetic trees

Building phylogenetic trees - PowerPoint PPT Presentation



PowerPoint Slideshow about ' Building phylogenetic trees' - shilah


Presentation Transcript

Contents

  • Phylogeny

  • Phylogenetic trees

  • How to make a phylogenetic tree from pairwise distances

    • UPGMA method (+ an example)

    • Neighbor-Joining method (+ an example)

  • Comparison of methods

  • Conclusion


Phylogeny

  • Phylogeny is the evolutionary history of related species or genes

  • Phylogenetic tree: diagram showing evolutionary lineages of species/genes

  • The history of a gene may be very different from the history of the species that carry it

  • Genes may resemble each other whether they are homologous or analogous

  • Homologous sequences can be divided into two types

    • Orthologous sequences diverged by speciation from a common ancestor

    • Paralogous sequences evolved by gene duplication within a species

  • Analogous sequences may appear and function very similarly, but they do not have a common ancestor

  • WHEN WE WANT TO EXPLORE EVOLUTIONARY RELATIONSHIPS, WE NEED TO HANDLE ORTHOLOGOUS SEQUENCES


Phylogenetic trees

  • WHY construct a phylogenetic tree?

    • to understand lineage of various species

    • to understand how various functions evolved

    • to inform multiple alignments

  • Trees can be rooted (a common ancestor is known) or unrooted

  • Leaves are the terminal nodes that correspond to the observed sequences of genes or species (A, B, C, D)

  • Internal nodes are hypothetical ancestral nodes

  • All trees will be assumed to be binary, meaning that an edge that branches splits into two daughter edges

  • Each edge has a certain amount of evolutionary divergence associated with it, defined by some measure of distance between sequences or by a model of substitution of residues over the course of evolution


Phylogenetic trees (cont.)

  • Different ways to represent a phylogenetic tree (illustrated by Treeview)


Different algorithms used to infer phylogeny from sequence data

  • Distance methods

  • Parsimony

  • Likelihood

  • Probabilistic methods

  • Phylogenetic invariants


Route from the molecular sequences to the phylogenetic tree

Distance methods:

  • Select a set of related (orthologous) nucleotide or amino acid sequences

  • Perform multiple sequence alignment (Clustal series widely used)

  • Calculate pairwise distances between the sequences using a chosen evolutionary model of substitution (the distances describe the evolution: the smaller the distance between two sequences, the more closely they are related)

  • Select the most suitable algorithm to infer the phylogeny

  • View the tree with a tree-viewing program (Treeview, NJPlot, ...)



Making a tree from pairwise distances

  • Distances $d_{ij}$ between each pair of sequences i and j in the given dataset are calculated

  • There are different ways of defining distances

    • For nucleotide sequences:

      Jukes-Cantor, Kimura-2-parameter K2P, HKY (Hasegawa-Kishino-Yano), F84, Tamura-Nei, General time-reversible model, General 12-parameter model

    • For amino acid sequences:

      PAM matrices, BLOSUM matrices
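As an illustration of the simplest nucleotide model in the list above, the following minimal sketch computes a Jukes-Cantor distance between two aligned sequences. The function names `p_distance` and `jukes_cantor` and the example sequences are assumptions made for this sketch, not part of the original slides; gapped sites are simply skipped.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only sites where neither sequence has a gap character.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # The correction is undefined once the observed difference reaches 3/4.
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned sequences differing at one of eight sites.
example_distance = jukes_cantor("ACGTACGT", "ACGTACGA")
```

The corrected distance is slightly larger than the raw proportion of differing sites, because the model allows for multiple substitutions at the same site.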


Distance matrix methods

  • UPGMA

    • Algorithm introduced by Sokal and Michener 1958

  • Neighbor-Joining

    • Algorithm introduced by Saitou and Nei 1987

    • Modified by Studier and Keppler 1988


Clustering method: UPGMA

  • UPGMA = Unweighted pair group method using arithmetic averages

  • Simple method

  • It works by clustering the sequences, at each stage merging two clusters and creating a new node on the tree

  • The method assumes an equal rate of evolutionary change along all branches (the molecular clock assumption)


UPGMA

[Figure: example tree with leaves A, C, B and D]

  • UPGMA produces a rooted tree

  • Branch lengths satisfy a molecular clock: the divergence of sequences is assumed to occur at the same constant rate at all points in the tree

  • Clocklike trees are rooted, and the total branch length from the root to any leaf is the same

  • Such trees are often said to be ultrametric

  • Distances are ultrametric if, for every triple of sequences i, j and k, either all three distances are equal, $d_{ij} = d_{ik} = d_{jk}$, or two of them are equal and the third is smaller, $d_{jk} < d_{ij} = d_{ik}$

  • UPGMA is guaranteed to build the correct tree if the distances are ultrametric

  • The method can be used for reconstructing phylogenies only if evolutionary rates are assumed to be the same in all lineages, an assumption that has been criticised in the phylogeny literature

    • Suitable for closely related species

  • Running time $O(n^2)$
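The ultrametric condition can be checked directly on a distance matrix. The sketch below is a minimal illustration; the function name `is_ultrametric`, the dictionary layout and both matrices are assumptions. In particular, `assumed_example` is not copied from the slides: it is the matrix implied by the edge lengths reported in the later N-J example (3, 5, 1, 3, 8), and `clocklike` is an invented three-sequence case.

```python
from itertools import combinations


def is_ultrametric(names, d, tol=1e-9):
    """Three-point test: in every triple the two largest distances must be equal."""
    for i, j, k in combinations(names, 3):
        smallest, middle, largest = sorted(
            (d[frozenset((i, j))], d[frozenset((i, k))], d[frozenset((j, k))])
        )
        if largest - middle > tol:
            return False
    return True


# ASSUMPTION: matrix reconstructed from the edge lengths shown in the N-J example.
assumed_example = {
    frozenset("AB"): 8, frozenset("AC"): 7, frozenset("AD"): 12,
    frozenset("BC"): 9, frozenset("BD"): 14, frozenset("CD"): 11,
}

# Hypothetical clocklike case: A and B are close, C is equally far from both.
clocklike = {frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6}

example_is_ultrametric = is_ultrametric("ABCD", assumed_example)
clocklike_is_ultrametric = is_ultrametric("ABC", clocklike)
```

Under these assumptions the four-sequence matrix fails the test (for A, B and C the distances 7, 8 and 9 are all different), while the clocklike case passes.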


Algorithm: UPGMA

Initialisation:

Assign each sequence i in the dataset to its own cluster $C_i$

Define one leaf of the tree T for each sequence, and place it at height zero

Iteration:

Find the two clusters i and j for which $d_{ij}$ is the smallest (pick randomly if several distances are equal)

Define a new cluster ij by $C_{ij} = C_i \cup C_j$. Cluster ij has $n_{ij} = n_i + n_j$ members (initially $n_i = 1$)

Connect i and j on the tree to a new node v, placed at height $d_{ij}/2$

The branch lengths from the new node v to i and j are $d_{ij}/2$ minus the heights of i and j (for two leaves, both are $d_{ij}/2$)


Algorithm: UPGMA (cont.)

Iteration (cont.)

Compute the distances between the new cluster ij and each remaining cluster k by using $d_{(ij)k} = \frac{n_i d_{ik} + n_j d_{jk}}{n_i + n_j}$

Add ij to the current clusters and remove i and j

Termination:

When only two clusters i and j remain, place the root at height $d_{ij}/2$
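The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `upgma`, the data layout and the cluster labels (formed by concatenating member names) are assumptions, ties are broken by taking the first minimal pair instead of a random one, and the input matrix is the one implied by the edge lengths in the later N-J example, not a matrix copied from the slides.

```python
from itertools import combinations


def upgma(names, dist):
    """Return (root label, node heights, list of (parent, child, branch length))."""
    d = dict(dist)
    size = {name: 1 for name in names}        # n_i = 1 initially
    height = {name: 0.0 for name in names}    # leaves sit at height zero
    clusters = list(names)
    edges = []
    while len(clusters) > 1:
        # Closest pair of clusters (first minimal pair on ties).
        i, j = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        d_ij = d[frozenset((i, j))]
        new = i + j
        height[new] = d_ij / 2
        # Branch length = height of the new node minus height of the child.
        edges.append((new, i, height[new] - height[i]))
        edges.append((new, j, height[new] - height[j]))
        size[new] = size[i] + size[j]
        # Size-weighted average distance to every remaining cluster.
        for k in clusters:
            if k not in (i, j):
                d[frozenset((new, k))] = (
                    size[i] * d[frozenset((i, k))] + size[j] * d[frozenset((j, k))]
                ) / size[new]
        clusters = [new] + [k for k in clusters if k not in (i, j)]
    return clusters[0], height, edges


# ASSUMPTION: matrix reconstructed from the N-J example's edge lengths.
assumed_matrix = {
    frozenset("AB"): 8, frozenset("AC"): 7, frozenset("AD"): 12,
    frozenset("BC"): 9, frozenset("BD"): 14, frozenset("CD"): 11,
}
root, heights, branches = upgma(["A", "B", "C", "D"], assumed_matrix)
```

With this assumed matrix the sketch joins A and C first, then B, then D, and its node heights agree with the branch lengths quoted in the UPGMA example (3.5, 4.25 and about 6.17).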


An example UPGMA (1)

  • Distance matrix (arbitrary) for four items (sequences) A, B, C and D

  • These distances are not ultrametric: not every triple has either three equal distances or two equal distances and a smaller third one

Step 1. Find the smallest distance $d_{ij}$ between two clusters: A and C, with $d_{AC} = 7$


An example UPGMA (2)

Step 2. Define a new cluster ij, which has $n_{ij} = n_i + n_j$ members (initially $n_i = 1$): the new cluster AC, with $n_{AC} = n_A + n_C = 2$

Step 3. Connect A and C on the tree to a new node v1

Step 4. The branch lengths from the new node v1 to A and C are both 3.5

[Figure: A and C joined at v1, each with branch length 3.5]


An example UPGMA (3)

Step 5. Compute the distances between the new cluster AC and the remaining clusters (B and D):

Step 6. Delete the columns and rows of the distance matrix that correspond to clusters A and C, and add a column and a row for cluster AC

New distance matrix
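As a hedged illustration of Step 5, suppose the original distances were $d_{AB} = 8$, $d_{AC} = 7$, $d_{AD} = 12$, $d_{BC} = 9$, $d_{BD} = 14$ and $d_{CD} = 11$. These values are an assumption: they are the distances implied by the edge lengths of the N-J example, not a matrix quoted from the slides. Because A and C are both single sequences, the averages are unweighted:

```latex
\begin{align*}
d_{(AC)B} &= \tfrac{1}{2}\,(d_{AB} + d_{CB}) = \tfrac{1}{2}\,(8 + 9) = 8.5 \\
d_{(AC)D} &= \tfrac{1}{2}\,(d_{AD} + d_{CD}) = \tfrac{1}{2}\,(12 + 11) = 11.5
\end{align*}
```

The reduced matrix would then hold 8.5 for AC and B, 11.5 for AC and D, and 14 for B and D, which is consistent with AC and B being joined next at height 4.25.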


An example UPGMA (4)

Second iteration:

Step 1. Find the two clusters i and j for which $d_{ij}$ is the smallest (pick randomly if several distances are equal): AC and B

Step 2. Define a new cluster ij, which has $n_{ij} = n_i + n_j$ members: the new cluster ACB, with $n_{ACB} = n_{AC} + n_B = 2 + 1 = 3$

Step 3. Connect AC and B on the tree to a new node v2

Step 4. Compute the branch lengths from the new node v2 to AC and B

[Figure: A and C joined at v1 with branch lengths 3.5 each; B joined at v2 with branch length 4.25]


An example UPGMA (5)

Step 5. Compute the distances between the new cluster and the remaining cluster (D)

Step 6. Delete the columns and rows of the distance matrix that correspond to clusters AC and B, and add a column and a row for cluster ACB

New distance matrix


An example UPGMA (6)

Termination:

Only two clusters (ACB and D) remain

Place the root at height $d_{(ACB)D}/2$

[Figure: original distance matrix and final phylogenetic tree with branch lengths: A 3.5, C 3.5, v1 to v2 0.75, B 4.25, v2 to root 1.92, D 6.17]


Neighbor-Joining (N-J)

[Figure: example tree with leaves A, B, C and D]

  • Another algorithm that works by clustering the sequences

  • Does not assume a molecular clock

  • N-J trees are unrooted

  • N-J assumes additivity

    Def. Edge lengths are said to be additive if the distance between any pair of leaves is the sum of lengths of the edges on the path connecting them

  • The method uses a greedy, approximate algorithm: at each step it finds the pair of neighboring leaves i and j whose joining minimizes the total length of the tree, and joins them

  • Running time $O(n^3)$ for the standard algorithm
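The definition of additivity can be illustrated by summing edge lengths along the path between each pair of leaves. The sketch below is a minimal example; the function name `path_lengths` and the edge-list layout are assumptions, and the edge lengths are those shown in the final tree of the N-J example in these slides (A 3, B 5, internal edge 1, C 3, D 8).

```python
def path_lengths(edges, leaves):
    """Sum edge lengths along the unique tree path between every pair of leaves."""
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    distances = {}
    for start in leaves:
        # Depth-first walk away from the starting leaf, accumulating length.
        stack = [(start, None, 0.0)]
        while stack:
            node, parent, total = stack.pop()
            if node in leaves and node != start:
                distances[frozenset((start, node))] = total
            for neighbour, length in adjacency[node]:
                if neighbour != parent:
                    stack.append((neighbour, node, total + length))
    return distances


# Edge lengths as reported in the final N-J tree of the example.
nj_tree = [("A", "v1", 3), ("B", "v1", 5), ("v1", "v2", 1),
           ("C", "v2", 3), ("D", "v2", 8)]
leaf_distances = path_lengths(nj_tree, {"A", "B", "C", "D"})
```

For this tree the path sums are 8 (A to B), 7 (A to C), 12 (A to D), 9 (B to C), 14 (B to D) and 11 (C to D). A distance matrix with exactly these entries is additive with respect to the tree.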



Algorithm: Neighbor-Joining

Initialisation:

Define T to be the set of leaf nodes, one for each given sequence

Iteration:

Compute $u_i = \sum_{k \neq i} \frac{d_{ik}}{n - 2}$ for each sequence i, where n is the number of sequences in the distance matrix

Pick a pair i and j for which $d_{ij} - u_i - u_j$ is the smallest (pick randomly if several are equal)

Join items i and j with a new node v

Compute the branch lengths from the new node v to items i and j: $d_{iv} = \tfrac{1}{2} d_{ij} + \tfrac{1}{2}(u_i - u_j)$ and $d_{jv} = \tfrac{1}{2} d_{ij} + \tfrac{1}{2}(u_j - u_i)$

Compute the distances between the new node v and each remaining item k: $d_{vk} = \tfrac{1}{2}(d_{ik} + d_{jk} - d_{ij})$

Remove i and j from the distance matrix and replace them by the new node v

Termination:

When only two items i and j remain, add the remaining edge between i and j, with length $d_{ij}$
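The iteration above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `neighbor_joining`, the node labels `v1`, `v2`, ... and the data layout are invented for the sketch, ties are broken by taking the first minimal pair rather than a random one, and the input matrix is the one implied by the edge lengths of the example tree rather than a matrix quoted from the slides.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return the unrooted tree as a list of (node, node, edge length)."""
    d = dict(dist)
    nodes = list(names)
    edges = []
    joined = 0
    while len(nodes) > 2:
        n = len(nodes)
        # u_i: summed distance from i to all other items, divided by n - 2.
        u = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) / (n - 2)
             for i in nodes}
        # Pair minimising d_ij - u_i - u_j (first minimal pair on ties).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: d[frozenset(p)] - u[p[0]] - u[p[1]])
        d_ij = d[frozenset((i, j))]
        joined += 1
        v = f"v{joined}"
        edges.append((i, v, 0.5 * d_ij + 0.5 * (u[i] - u[j])))
        edges.append((j, v, 0.5 * d_ij + 0.5 * (u[j] - u[i])))
        # Distance from the new node to every remaining item.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((v, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - d_ij
                )
        nodes = [v] + [k for k in nodes if k not in (i, j)]
    i, j = nodes
    edges.append((i, j, d[frozenset((i, j))]))
    return edges


# ASSUMPTION: matrix reconstructed from the example tree's edge lengths.
assumed_matrix = {
    frozenset("AB"): 8, frozenset("AC"): 7, frozenset("AD"): 12,
    frozenset("BC"): 9, frozenset("BD"): 14, frozenset("CD"): 11,
}
nj_edges = neighbor_joining(["A", "B", "C", "D"], assumed_matrix)
```

On the assumed matrix the sketch joins A and B at v1, then v1 and C at v2, and finally connects v2 to D, reproducing the edge lengths 3, 5, 1, 3 and 8 shown in the example.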


An example N-J (1)

Step 1. Compute $u_i$ for each row in the distance matrix

Step 2. Compute $d_{ij} - u_i - u_j$ for each pair (the lower-diagonal matrix) and choose the smallest (most negative) value
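As a hedged worked example of Steps 1 and 2, assume the distances $d_{AB} = 8$, $d_{AC} = 7$, $d_{AD} = 12$, $d_{BC} = 9$, $d_{BD} = 14$ and $d_{CD} = 11$. This matrix is an assumption: it is the one implied by the edge lengths of the final N-J tree in the example, not a matrix quoted from the slides. With $n = 4$, each row sum is divided by $n - 2 = 2$:

```latex
\begin{align*}
u_A &= \tfrac{1}{2}(8 + 7 + 12) = 13.5, &
u_B &= \tfrac{1}{2}(8 + 9 + 14) = 15.5, \\
u_C &= \tfrac{1}{2}(7 + 9 + 11) = 13.5, &
u_D &= \tfrac{1}{2}(12 + 14 + 11) = 18.5, \\[4pt]
d_{AB} - u_A - u_B &= 8 - 29 = -21, &
d_{CD} - u_C - u_D &= 11 - 32 = -21, \\
d_{AC} - u_A - u_C &= 7 - 27 = -20, &
d_{BD} - u_B - u_D &= 14 - 34 = -20, \\
d_{AD} - u_A - u_D &= 12 - 32 = -20, &
d_{BC} - u_B - u_C &= 9 - 29 = -20.
\end{align*}
```

Under this assumption the pairs A, B and C, D tie at $-21$; choosing A and B matches the join made in the example.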


An example N-J (2)

Step 3. Join A and B together with a new node v1. Compute the edge lengths from A to node v1 and from B to node v1

Step 4. Compute the distances between the new node v1 and the remaining items (C and D)

[Figure: A and B joined at v1, with edge lengths 3 (A) and 5 (B)]


An example N-J (3)

Step 5. Delete A and B from the distance matrix and replace them by the new item AB (node v1), giving a new reduced distance matrix

Step 6. Continue from step 1, because more than two items remain

Step 1. Compute $u_i$ for each row in the reduced distance matrix

Step 2. Compute $d_{ij} - u_i - u_j$ for each pair (the lower-diagonal matrix) and choose the smallest


An example N-J (4)

Step 3. Join v1 and C together with a new node v2. Compute the edge lengths from v1 to node v2 and from C to node v2

Step 4. Compute the distances between the new node v2 and the remaining item (D)

[Figure: A and B joined at v1 (edge lengths 3 and 5); v1 and C joined at v2 (edge lengths 1 and 3)]


An example N-J (5)

Step 5. Delete AB and C from the distance matrix and replace them by ABC

Step 6. Only two nodes remain, so connect them

[Figure: original distance matrix and final phylogenetic tree with edge lengths: A 3, B 5, v1 to v2 1, C 3, D 8]


Comparison

UPGMA

The total branch length from the root to any leaf is the same

Produces a rooted tree, where the root is the hypothesized ancestor of the sequences in the tree

Suitable for closely related sequences

Can be used to infer phylogenies if one can assume that evolutionary rates are the same in all lineages

Neighbor-joining

Unrooted tree, where the direction of evolution is unknown

Suitable for datasets with widely varying rates of evolution

Suitable for large datasets

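Comparison (cont.)

[Figure: the N-J tree (edge lengths A 3, B 5, C 3, D 8, internal edge 1) shown next to the UPGMA tree (branch lengths A 3.5, C 3.5, B 4.25, D 6.17)]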


Conclusion

  • The UPGMA method constructs a rooted phylogenetic tree correctly if there is a molecular clock with a constant rate of mutation

  • The UPGMA method is rarely used, because the molecular clock assumption is not generally true: selection pressures vary across time periods, across organisms, across genes within an organism and across regions within a gene

  • The N-J method produces an unrooted tree without the molecular clock hypothesis

  • The N-J method is one of the most popular methods and is widely used by molecular evolutionists

  • Distance methods are strongly dependent on the model of evolution used

  • Sequence information is lost when sequence data are transformed into distances

  • Distance methods are computationally fast


Reference

  • Durbin, R., Eddy, S., Krogh, A., Mitchison, G. 2003. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.

  • Li, W. 1997. Molecular Evolution. Sinauer Associates, Sunderland, MA. p. 108

  • Felsenstein, J. 2003. Inferring Phylogenies. Sinauer Associates, Sunderland, MA. p. 147-170


Examples of phylogeny programs

Multiple sequence alignment

  • Clustal series (W, V) (free, http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html )

Phylogeny packages

  • PAUP (http://paup.csit.fsu.edu/ )

  • Phylip (free, http://evolution.gs.washington.edu)

  • MEGA (free, http://www.megasoftware.net)

Viewing/plotting phylogenetic trees

  • Treeview (free, http://taxonomy.zoology.gla.ac.uk/rod/treeview.html)

  • NJPlot (free, http://pbil.univ-lyon1.fr/software/njplot.html)


Further reading

  • N-J: Saitou, N. and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4(4): 406-25.

  • N-J: Studier, J. A. and K. J. Keppler. 1988. A note on the neighbor-joining algorithm of Saitou and Nei. Mol Biol Evol 5(6): 729-31.

  • UPGMA: Michener, C. D., and R. R. Sokal. 1957. A quantitative approach to a problem in classification. Evolution 11: 130-162.

  • ClustalW: Thompson, J. D., T. J. Gibson, et al. 1997. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25(24): 4876-82.

