Phylogenetics

PowerPoint Slideshow about 'Phylogenetics' - hedwig
Presentation Transcript
Tree of Life Web Project

http://www.tolweb.org/tree/

Fig. 26-21

[Figure: the tree of life, with the domains BACTERIA, ARCHAEA, and EUKARYA descending from the COMMON ANCESTOR OF ALL LIFE. Labeled lineages: Dinoflagellates, Land plants, Forams, Green algae, Ciliates, Diatoms, Red algae, Amoebas, Cellular slime molds, Euglena, Trypanosomes, Animals, Leishmania, Fungi, Sulfolobus, Green nonsulfur bacteria, Thermophiles, (Mitochondrion), Spirochetes, Chlamydia, Halophiles, Green sulfur bacteria, Methanobacterium, Cyanobacteria, (Plastids, including chloroplasts)]

Outline
  • What is a phylogeny?
  • How do you construct a phylogeny?
  • The Molecular Clock
  • Statistical Methods

Think about the relationships among the major lineages of life and when they appeared in the fossil record.

Are genetic distances and the fossil record roughly congruent?

Fossil Record vs Molecular Clock
  • Molecular clock and fossil record are not always congruent
    • Fossil record is incomplete, and soft-bodied species are usually not preserved
    • Mutation rates can vary among species (depending on generation time, replication error, mismatch repair)
  • But they provide complementary information
    • Fossil record contains extinct species, while molecular data is based on extant taxa
    • Major events in fossil record could be used to calibrate the molecular clock
Evolutionary History of HIV

HIV evolved multiple times from SIV (Simian Immunodeficiency Virus)

[Figure: phylogeny of HIV and SIV strains plotted against time; from Freeman & Herron, 2004, Evolutionary Analysis]

Charles Darwin (1809-1882)

On the Origin of Species (1859)

  • Living species are related by common ancestry
  • Change through time occurs at the level of the population, not the individual organism
  • The main cause of adaptive evolution is natural selection
Darwin envisaged evolution as a tree

The affinities of all the beings of the same class have sometimes been represented by a great tree. I believe this simile largely speaks the truth……

…The green and budding twigs may represent existing species; and those produced during former years may represent the long succession of extinct species…..

….the great Tree of Life….covers the earth with ever-branching and beautiful ramifications

Charles Darwin, On the Origin of Species; pages 131-132

What did people believe before Darwin?

Lamarck proposed a ladder of life

[Figure: a ladder of life, oriented from Past to Future]

Jean-Baptiste Lamarck
  • French naturalist (1744-1829)
  • “Professor of Worms and Insects” in Paris
  • Proposed the first scientific theory of evolution (inheritance of acquired traits)
Lamarck’s View of Evolution
  • Continuum between the physical and biological world (followed Aristotle)
  • Scala Naturae (“Ladder of Life” or “Great Chain of Being”)

[Figure: the Scala Naturae, ranked from God through Angels, Demons, Man, Animals, Plants, and Minerals down to Non-Being, with the labels “Realm of Being” and “Realm of Becoming”]

What is wrong with a ladder?
  • Evolution is not linear but branching
  • Living organisms are not ancestors of one another
  • The ladder implies progress
What is right with the tree?
  • Evolution is a branching process
  • When a lineage splits (speciation), one species is not turning into another; instead, both lineages continue to evolve
  • So evolution is not progressive: all living taxa are equally “successful”
  • Phylogenies (trees) reflect the hierarchical structure of relationships
The Tree of Life is a Fractal

http://tolweb.org/tree/phylogeny.html

Genealogical structures
  • Phylogeny
    • A depiction of the ancestry relations between species (it includes speciation events)
    • Tree-like (divergent)
  • Pedigree
    • A depiction of the ancestry relations within populations
    • Net-like (reticulating)
[Figure: individuals within a population, connected by ancestry from past to future]

[Figure: populations and lineages/species through time, with the labels “What happened here?”, “Lineage-branching”, “Speciation”, and “Phylogeny”]

Representation of phylogenies?

[Figure: taxa A, B, and C drawn twice, once as “A simplified representation” and once as “The True History”]

Some terms used to describe a phylogenetic tree
  • Taxon (taxa)
  • Tip
  • Internal branch
  • Internode
  • Node (speciation event)
  • Root
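These terms map naturally onto a nested data structure. The sketch below uses a hypothetical representation (not a standard file format): a tree is a nested Python tuple, each string is a tip (taxon), each tuple is a node (branch point), and the outermost tuple is the root. The helper names `count_tips` and `count_nodes` are illustrative only.

```python
from typing import Tuple, Union

# A tip is a taxon name; an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]

def count_tips(tree: Tree) -> int:
    """Number of tips (taxa) at or below this node."""
    if isinstance(tree, str):
        return 1
    return sum(count_tips(child) for child in tree)

def count_nodes(tree: Tree) -> int:
    """Number of internal nodes (branch points), including the root."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_nodes(child) for child in tree)

# Hypothetical rooted tree: ((A, B), (C, (D, E)))
root = (("A", "B"), ("C", ("D", "E")))
print(count_tips(root))   # 5 tips
print(count_nodes(root))  # 4 internal nodes, the root among them
```

In a fully bifurcating rooted tree, n tips always imply n - 1 internal nodes; a polytomy such as `("A", "B", "C")` resolves three tips with a single node.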

Outline
  • What is a phylogeny?
  • How do you construct a phylogeny?
  • The Molecular Clock
  • Statistical Methods


What is a Phylogeny?

  • A phylogenetic tree represents a hypothesis about evolutionary relationships
  • Each branch point represents the divergence of two taxa (e.g. species)
  • Sister taxa are groups that share an immediate common ancestor
Molecular Clock
  • Mutations accumulate over time
  • On average, mutations accumulate at a characteristic, roughly constant rate

Example:

Mitochondrial DNA: ~2.2% sequence divergence per million years
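Under a strict clock, elapsed time is simply genetic distance divided by rate, and the rate itself can be calibrated from a split whose age is known from fossils. The sketch below assumes the ~2.2% figure is a pairwise divergence rate (between two lineages); if a rate is quoted per lineage, divide the pairwise distance by twice that rate instead. The function names and all numbers in the usage lines are hypothetical.

```python
def calibrate_rate(pairwise_distance: float, split_age_myr: float) -> float:
    """Pairwise divergence per million years, from a fossil-dated split."""
    if split_age_myr <= 0:
        raise ValueError("split age must be positive")
    return pairwise_distance / split_age_myr

def divergence_time_myr(pairwise_distance: float, rate_per_myr: float = 0.022) -> float:
    """Estimated time (millions of years) since two lineages split, assuming a
    strict clock where `rate_per_myr` is pairwise divergence per million years."""
    if rate_per_myr <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / rate_per_myr

# Hypothetical calibration: two taxa that split 10 Myr ago differ at 22% of sites
rate = calibrate_rate(0.22, 10.0)             # about 0.022 per Myr
# Two hypothetical mitochondrial sequences differing at 11% of sites
print(round(divergence_time_myr(0.11, rate), 2))  # about 5 million years
```

Real data rarely follow a strict clock (see the factors on the next slide), so this arithmetic is a first approximation rather than a dating method.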

Molecular Clock

Faster if the mutation rate is faster, due to:
  • Shorter generation time (a greater number of meiosis or mitosis events in a given time)
  • Replication error (e.g., sloppy DNA or RNA polymerase, inefficient mismatch repair)
Phylogenetic Trees with Proportional Branch Lengths
  • In some trees, the length of a branch reflects the number of genetic changes that have taken place in a particular DNA sequence in that lineage
  • Longer branches therefore indicate greater evolutionary distance
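When branch lengths are proportional to genetic change, the distance between two taxa is the sum of the branch lengths on the path joining them. The sketch below assumes a hypothetical representation in which an internal node is a list of `(child, branch_length)` pairs; the function names and the example tree are illustrative only.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical representation: a tip is a taxon name; an internal node is a
# list of (child, branch_length) pairs.
Node = Union[str, List[Tuple["Node", float]]]

def distances_to_tips(node: Node) -> Dict[str, float]:
    """Sum of branch lengths from `node` down to each tip below it."""
    if isinstance(node, str):
        return {node: 0.0}
    result: Dict[str, float] = {}
    for child, length in node:
        for tip, d in distances_to_tips(child).items():
            result[tip] = d + length
    return result

def path_length(node: Node, a: str, b: str) -> float:
    """Total branch length on the path joining tips a and b."""
    if a == b:
        return 0.0
    for child, _ in node:  # descend while both tips sit in the same child
        below = distances_to_tips(child)
        if a in below and b in below:
            return path_length(child, a, b)
    below = distances_to_tips(node)  # `node` is now their most recent common ancestor
    return below[a] + below[b]

# Hypothetical tree: B has evolved more than A since they split
tree = [([("A", 0.1), ("B", 0.3)], 0.2), ("C", 0.4)]
print(round(path_length(tree, "A", "B"), 2))  # 0.4
print(round(path_length(tree, "B", "C"), 2))  # 0.9
```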
Phylogenetically Informative Characters (Mutations)

Neutral mutations: mutations that are not subject to selection

They are better for constructing phylogenies, because selection could make unrelated taxa appear more similar, or related taxa appear more different

Examples: noncoding regions of DNA, third codon positions in protein-coding genes, introns, microsatellites (“junk DNA”)

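Codon Bias
  • In protein-coding sequences, the effect of a mutation depends on its position within the codon
  • Mutations at positions 1 and 2 usually change the amino acid
  • Mutations at position 3 often do not (they are frequently synonymous), because the genetic code is redundant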
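The position effect can be checked directly against the genetic code. This sketch assumes the standard genetic code (NCBI translation table 1), considers every single-base change starting from each of the 61 sense codons, and counts a change to a stop codon as nonsynonymous. It is an illustration of code redundancy, not a measure of codon usage bias in any real genome.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order for
# positions 1, 2, 3 (position 3 varies fastest). "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def synonymous_fraction(position: int) -> float:
    """Fraction of single-base changes at `position` (0, 1 or 2) in sense
    codons that leave the encoded amino acid unchanged."""
    synonymous = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from sense codons
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if CODON_TABLE[mutant] == aa:
                synonymous += 1
    return synonymous / total

for pos in range(3):
    print(f"codon position {pos + 1}: {synonymous_fraction(pos):.2f} synonymous")
```

This prints 0.04, 0.00, and 0.69 for positions 1, 2, and 3: no second-position change is synonymous, a few first-position changes are (e.g., TTA and CTA both encode leucine), and roughly two-thirds of third-position changes are.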
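[Figure: classification levels (Species, Genus, Family, Order) mapped onto a phylogeny of the order Carnivora: Panthera pardus (genus Panthera, family Felidae); Taxidea taxus (genus Taxidea) and Lutra lutra (genus Lutra), both in family Mustelidae; and Canis latrans and Canis lupus (genus Canis, family Canidae)]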

[Figure: a tree of Taxon A through Taxon F arising from an ancestral lineage, labeling a branch point (node), sister taxa, the common ancestor of taxa A–F, and a polytomy (unresolved branching point)]

A monophyletic group (clade) consists of an ancestral taxon and all of its descendants

[Figure: the same tree of taxa A–G shown three times, illustrating (a) a monophyletic group (clade), (b) a paraphyletic group, and (c) a polyphyletic group; the groups are labeled Group I, Group II, and Group III]
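The definition above gives a direct test: a set of taxa is monophyletic exactly when it equals the set of all tips descended from some single node. The sketch below assumes the same hypothetical nested-tuple tree representation used for illustration elsewhere (strings are tips, tuples are nodes); the function names are illustrative.

```python
from typing import FrozenSet, List, Tuple, Union

# A tip is a taxon name; an internal node is a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]

def tips(tree: Tree) -> FrozenSet[str]:
    """All tip names at or below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))

def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Tip sets of every clade: each node together with all of its descendants."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = [tips(tree)]
    for child in tree:
        result.extend(clades(child))
    return result

def is_monophyletic(tree: Tree, group) -> bool:
    """True if `group` is exactly the set of tips descended from one node."""
    return frozenset(group) in clades(tree)

# Hypothetical rooted tree: ((A, B), (C, (D, E)))
tree = (("A", "B"), ("C", ("D", "E")))
print(is_monophyletic(tree, {"C", "D", "E"}))  # True: a node and all its descendants
print(is_monophyletic(tree, {"B", "C"}))       # False: not a complete clade
```

A group that fails this test is either paraphyletic or polyphyletic; telling those two apart depends on whether the group is taken to include the common ancestor, which this tip-only sketch does not model.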

[Figure: (a) Monophyletic group (clade): Group I within a tree of taxa A–G]

(in the lecture on species concepts we discussed that the “smallest” monophyletic group is a “phylogenetic species”)


Synapomorphies

Synapomorphies are shared derived homologous traits

They can be DNA nucleotides or other heritable traits

They are used to group taxa that are more closely related to one another

Sometimes similar-looking traits are not homologous and are not synapomorphies; instead, they are the result of convergent evolution

Phylogenetic Methods
  • Parsimony: minimize the number of steps (character changes)
  • Distance matrix: build the tree from pairwise genetic distances
  • Maximum likelihood: probability of the data given the tree
  • Bayesian: probability of the tree given the data
Parsimony

Uses discrete characters (like mutations, or some heritable trait)

Select the tree with the minimum number of character-state transitions summed across all characters
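One standard way to count the minimum number of changes a character needs on a given tree is Fitch's algorithm (named here as a swapped-in technique; the slides count steps by hand). Working up from the tips, each node keeps the intersection of its children's state sets if they overlap, otherwise their union plus one step. The character matrix below is hypothetical, because the example matrices on the following slides did not survive extraction; it was chosen so that it yields the same tree lengths as Parsimony Example 2 (6, 5, 6). The trees are rooted with taxon O, and the algorithm assumes strictly bifurcating trees.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical character matrix (5 sites), for illustration only
MATRIX: Dict[str, str] = {
    "O": "TGGAA",
    "A": "GCGAA",
    "B": "GCAAA",
    "C": "GCACT",
}

def fitch(tree: Tree, site: int, matrix: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's algorithm for one character: returns (candidate state set, steps)."""
    if isinstance(tree, str):
        return {matrix[tree][site]}, 0
    left, right = tree
    left_states, left_steps = fitch(left, site, matrix)
    right_states, right_steps = fitch(right, site, matrix)
    shared = left_states & right_states
    if shared:
        return shared, left_steps + right_steps
    # No shared state: take the union and count one state change
    return left_states | right_states, left_steps + right_steps + 1

def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Parsimony length: minimum number of changes summed over all characters."""
    n_sites = len(next(iter(matrix.values())))
    return sum(fitch(tree, site, matrix)[1] for site in range(n_sites))

candidates = {
    "(A,B) grouped": ("O", ("C", ("B", "A"))),
    "(B,C) grouped": ("O", ("A", ("B", "C"))),
    "(A,C) grouped": ("O", ("B", ("A", "C"))),
}
for label, tree in candidates.items():
    print(label, tree_length(tree, MATRIX))
```

With this matrix the lengths are 6, 5, and 6, so the tree grouping B with C is the most parsimonious; only the third character distinguishes the trees.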

Fig. 26-15-1

Parsimony: Example 1

[Figure: three phylogenetic hypotheses relating Species I, Species II, and Species III]

Fig. 26-15-2

[Figure: DNA sequences at sites 1–4 for Species I, II, and III and the ancestral sequence; site changes are mapped onto each of the three hypotheses, with labels such as 1/C marking a change to C at site 1]

Fig. 26-15-3

[Figure: the remaining site changes (2/T, 3/A, 4/C) mapped onto each of the three hypotheses]

Fig. 26-15-4

[Figure: the total number of events required by each of the three hypotheses: 7 events, 6 events, and 7 events]

The hypothesis requiring only 6 events is the most parsimonious.

Parsimony: Example 2

Three possible trees

[Figure: three rooted trees (Tree 1, Tree 2, Tree 3) relating taxa A, B, and C, each rooted with outgroup O]

Map the characters (mutations) onto tree 1

[Figure: nucleotide states of taxa O, A, B, and C at characters 1–5, with the changes being mapped onto Tree 1]

Map the characters (mutations) onto tree 1

[Figure: all five characters mapped onto Tree 1; character 3 changes twice]

Total number of steps = 6

Actually, there is more than one way to map character 3

[Figure: two alternative mappings of character 3 onto Tree 1]

Either way, the character contributes 2 steps to the overall tree length

Map the characters onto tree 2

[Figure: the five characters mapped onto Tree 2]

Number of steps = 5

Tree 3

[Figure: the five characters mapped onto Tree 3; character 3 changes twice]

Length = 6 steps

Most parsimonious tree

Which tree has the shortest total length (is most parsimonious)?

[Figure: Trees 1, 2, and 3 for taxa A, B, and C with outgroup O]

Tree 1: length = 6

Tree 2: length = 5

Tree 3: length = 6

Tree 2, which requires the fewest steps, is the most parsimonious tree.

Where do the Whales belong?

Example from Freeman & Herron, Fig. 4.8

Freeman & Herron, Fig. 4.9: Using maximum parsimony, the whales appear to cluster with the hippos (and cows)

Parsimony
  • Simplest and a relatively fast method of phylogenetic reconstruction
  • Can give misleading results if rates of evolution (the rates at which mutations accumulate) differ among lineages
  • Tends to become less accurate as genetic distances get greater
    • Can be misled by reversals and other homoplasy: with only 4 nucleotides, the same changes eventually occur repeatedly at a given site (called “saturation”)
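Saturation can be made concrete with the Jukes-Cantor (JC69) substitution model, which is an assumption here (the slide names no model). Under JC69 the expected proportion of differing sites after d substitutions per site is p = 3/4 (1 - e^(-4d/3)), so the observed difference can never exceed 0.75 no matter how much change has occurred. The function name is illustrative.

```python
import math

def expected_p_distance(d: float) -> float:
    """Expected observed proportion of differing sites after d substitutions
    per site, under the Jukes-Cantor (JC69) model."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

for d in [0.1, 0.5, 1.0, 2.0, 5.0]:
    print(f"true substitutions/site = {d:4.1f}   observed difference = {expected_p_distance(d):.3f}")
```

The observed difference grows almost linearly for small d but levels off near 0.75 for large d: multiple hits at the same site hide most of the change, which is why counting visible differences (as parsimony does) underestimates long branches.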
Distance Matrix

Uses continuous or discrete characters

Distance Matrix
  • Calculate pairwise genetic distances between taxa
  • Choose the tree that best fits those distances (for example, the tree with the smallest total branch length)

Proportion sequence distance at 2 genes (hypothetical data)

           Mouse   Cat    Dog    Dolphin   Seal
Mouse      0
Cat        0.05    0
Dog        0.03    0.02   0
Dolphin    0.08    0.15   0.03   0
Seal       0.09    0.23   0.01   0.02      0
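One of the simplest distance methods is UPGMA (average-linkage clustering), used here as a swapped-in example because the slides do not name a specific algorithm. UPGMA repeatedly joins the two closest clusters, defining the distance between clusters as the mean of all pairwise distances between their members. It assumes a constant molecular clock, and this sketch returns only the topology (in parenthesized Newick-like form), not branch lengths. The distances are those of the hypothetical table above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Pairwise distances from the hypothetical table above
PAIRS = [
    (("Mouse", "Cat"), 0.05), (("Mouse", "Dog"), 0.03), (("Mouse", "Dolphin"), 0.08),
    (("Mouse", "Seal"), 0.09), (("Cat", "Dog"), 0.02), (("Cat", "Dolphin"), 0.15),
    (("Cat", "Seal"), 0.23), (("Dog", "Dolphin"), 0.03), (("Dog", "Seal"), 0.01),
    (("Dolphin", "Seal"), 0.02),
]
DIST: Dict[FrozenSet[str], float] = {frozenset(pair): d for pair, d in PAIRS}

def upgma(names: List[str], dist: Dict[FrozenSet[str], float]) -> str:
    """Return the UPGMA topology as a parenthesized string."""
    # Each cluster is (label, set of member taxa)
    clusters: List[Tuple[str, FrozenSet[str]]] = [(n, frozenset([n])) for n in names]

    def mean_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean_distance(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (label_i, members_i), (label_j, members_j) = clusters[i], clusters[j]
        merged = (f"({label_i},{label_j})", members_i | members_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

print(upgma(["Mouse", "Cat", "Dog", "Dolphin", "Seal"], DIST))
# ((Dolphin,(Dog,Seal)),(Mouse,Cat))
```

Dog and Seal (0.01) join first, then Dolphin joins them (mean distance 0.025), then Mouse and Cat (0.05). Methods such as neighbor-joining relax UPGMA's clock assumption.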

Freeman & Herron, Fig. 4.10: Using genetic distances, the whales again appear to cluster with the hippos (and cows)

Distance Matrix
  • Generally more accurate than parsimony
  • Like parsimony, it tends to be computationally fast
Maximum Likelihood (R.A. Fisher)
  • Probability of the data given the tree
  • This is a “frequentist” method: it assumes one true answer (one true tree)
  • Treats the DNA sequence data as a draw from a probability distribution defined by the tree (and a model of evolution)
  • Choose the tree (x, y axes) that maximizes the probability of the observed data (z axis)

[Figure: likelihood surface; x, y: tree space; z: probability of the data]

Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17(6):368-376.

Maximum Likelihood (R.A. Fisher)
  • Probability of the data given the tree
  • The aim of maximum likelihood estimation is to find the parameter value(s) that make the observed data most likely
  • For example, finding a mean: if you want a single number that describes data such as human heights, you could estimate the mean (under a normal model, the maximum likelihood estimate of the mean is the sample mean)
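The height example can be run as a small maximum likelihood search. The sketch assumes heights follow a normal distribution with a known, fixed spread (sigma = 7 cm, an arbitrary choice), uses a hypothetical sample, and finds the mean by brute-force grid search over candidate values, the one-parameter analogue of searching tree space.

```python
import math

def normal_log_likelihood(data, mu, sigma):
    """Log-probability of `data` under a normal distribution N(mu, sigma^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

heights_cm = [162.0, 170.5, 175.0, 168.5, 181.0]  # hypothetical sample
sigma = 7.0  # assumed known spread, for simplicity

# Grid search over candidate means from 150.0 to 200.0 cm in 0.1 cm steps
candidates = [150.0 + 0.1 * i for i in range(501)]
mle_mu = max(candidates, key=lambda mu: normal_log_likelihood(heights_cm, mu, sigma))

print(round(mle_mu, 1))                             # 171.4
print(round(sum(heights_cm) / len(heights_cm), 1))  # 171.4, the sample mean
```

The parameter value that makes the observed heights most probable is the sample mean; a tree search works the same way, except that each candidate is a tree (plus branch lengths) and the likelihood comes from a model of sequence evolution.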

Maximum Likelihood (R.A. Fisher)
  • Often yields a more accurate tree than parsimony or distance methods
  • Relies on an accurate model of molecular evolution, i.e., of which changes are more probable (e.g., A->G occurs more often than A->T or A->C)
  • Computationally intensive
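A minimal sketch of model-based likelihood for two sequences, assuming the Jukes-Cantor (JC69) model, the simplest model of DNA evolution. JC69 treats every change as equally likely, which is exactly the kind of assumption the slide cautions about; richer models such as Kimura's two-parameter (K80) model give transitions (A<->G, C<->T) their own rate. The example estimates the distance between two hypothetical sequences that differ at 20 of 100 sites by maximizing the likelihood on a grid, and checks the result against the known closed-form JC69 estimate d = -3/4 ln(1 - 4p/3).

```python
import math

def jc69_p(d: float) -> float:
    """Probability that a site differs after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))

def jc69_log_likelihood(d: float, n_sites: int, n_diff: int) -> float:
    """Log-likelihood of observing n_diff differing sites out of n_sites."""
    p = jc69_p(d)
    # A differing site shows one specific other base (probability p/3);
    # an identical site has probability 1 - p.
    return n_diff * math.log(p / 3.0) + (n_sites - n_diff) * math.log(1.0 - p)

def jc69_mle_grid(n_sites: int, n_diff: int, step: float = 1e-4, d_max: float = 3.0) -> float:
    """Distance maximizing the likelihood over a grid of d values (d > 0)."""
    grid = [step * i for i in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_sites, n_diff))

def jc69_closed_form(n_sites: int, n_diff: int) -> float:
    """Analytical JC69 distance; valid only while the proportion differing is < 0.75."""
    p_hat = n_diff / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p_hat / 3.0)

print(round(jc69_mle_grid(100, 20), 3), round(jc69_closed_form(100, 20), 3))
```

Both estimates come out at about 0.233 substitutions per site, larger than the raw 20% difference because the model accounts for changes hidden by multiple hits.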
Bayesian Inference: Reverend Thomas Bayes (c. 1701-1761)
  • Probability of a tree given the data
  • Uses prior information on the tree
  • Does not assume that there is one correct tree
  • Will modify the estimate based on additional information
  • Uses Bayes’ Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayesian Inference: Reverend Thomas Bayes (c. 1701-1761)
  • Probability of a tree given the data
  • Uses prior information on the tree: this is where you start
  • Will modify the estimate based on additional information: as you get more data, you update your hypothesis for the tree
  • Sequential (recursive) use of Bayes' formula: when more data become available after calculating a posterior distribution, the posterior becomes the next prior
  • Does not assume that there is one correct tree

Bayesian Inference: Reverend Thomas Bayes (c. 1701-1761)

Uses Bayes’ Theorem

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

P(A) = prior probability

P(A|B) = posterior probability (here, A is the tree and B is the data, so this is the probability of the tree given the data)

P(B|A) = the probability of observing B given A, also known as the likelihood. It indicates the compatibility of the evidence with the given hypothesis.

P(B) = the overall probability of the data (the evidence), which normalizes the posterior
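Bayes' Theorem and its sequential use can be sketched over a small, fixed set of candidate trees. The tree names, the flat prior, and the likelihood values below are all hypothetical; real phylogenetic analyses cannot enumerate every tree and instead sample the posterior with Markov chain Monte Carlo (as programs such as MrBayes do). The sketch assumes the two data batches are independent given the tree.

```python
from typing import Dict

def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(tree | data) = P(data | tree) * P(tree) / P(data)."""
    unnormalized = {tree: likelihood[tree] * p for tree, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(data), summed over the candidate trees
    return {tree: value / evidence for tree, value in unnormalized.items()}

prior = {"Tree 1": 1 / 3, "Tree 2": 1 / 3, "Tree 3": 1 / 3}  # flat prior
batch1 = {"Tree 1": 0.02, "Tree 2": 0.06, "Tree 3": 0.02}    # hypothetical P(data1 | tree)
batch2 = {"Tree 1": 0.05, "Tree 2": 0.10, "Tree 3": 0.01}    # hypothetical P(data2 | tree)

posterior1 = bayes_update(prior, batch1)
posterior2 = bayes_update(posterior1, batch2)  # the first posterior is the prior for batch 2

print({t: round(p, 3) for t, p in posterior1.items()})
print({t: round(p, 3) for t, p in posterior2.items()})
```

Support for Tree 2 rises from 1/3 to 0.6 after the first batch and to about 0.83 after the second. Updating in two steps gives the same answer as one update using the product of both likelihoods, which is what makes the recursive use of the formula valid.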

Bayesian Inference
  • Like maximum likelihood, often yields a more accurate tree than parsimony or distance methods
  • Computationally more intensive than parsimony or distance matrix methods, but less intensive than likelihood
  • Needs prior probabilities for the tree and the model
Potential Problems of Phylogenetic Reconstruction
  • Sufficient amount of data:
    • With enough data, most statistical methods usually yield the same tree
    • Insufficient data yield a tree that lacks resolution (lacks statistical power)
  • Gene trees vs. species trees:
    • The evolutionary histories of individual genes are not necessarily the same
    • Should try to get data from many genes, or the whole genome
Challenges of Phylogenetic Reconstructions
  • Different parts of the genome might have different evolutionary histories (different gene genealogies, horizontal gene transfer, allopolyploidy, etc.)
  • So there might not be one true tree for a group of taxa, and relationships might be difficult to resolve because they are inherently complex
Current trend is to use whole-genome data to reconstruct phylogenies

  • Gain a comprehensive, genome-wide picture of the evolutionary relationships among taxa
Neutral data are better for capturing genetic distances (the molecular clock) than genes that might be under selection
  • Why?
Phylogenetic Reconstructions

  • Typically, evolutionary biologists will use a variety of methods to reconstruct a phylogeny.
    • Maximum likelihood and Bayesian methods are considered more robust.
  • A tree is only as good as the data. Having many homoplastic characters (due to convergent evolution, reversals, etc.) will make the reconstruction less robust
    • It is standard to use bootstrapping to assess support for the tree (how consistently each grouping is recovered when the data are resampled)
  • Understanding statistics is fundamental to understanding evolution
    • Many statistical methods were in fact developed to model evolutionary and genetic processes (such as ANOVA, analysis of variance)
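Bootstrapping resamples alignment columns with replacement, rebuilds the tree from each pseudo-alignment, and reports how often each grouping is recovered. The sketch below makes several assumptions: the 4-taxon alignment is hypothetical, the tree-building step is parsimony (Fitch's algorithm) over the three possible rooted topologies, and tied replicates are split evenly between the tied trees. Real analyses use many more taxa and a heuristic tree search in each replicate.

```python
import random
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length of a rooted binary tree (Fitch's algorithm)."""
    def walk(node: Tree, site: int) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {alignment[node][site]}, 0
        (ls, lc), (rs, rc) = walk(node[0], site), walk(node[1], site)
        return (ls & rs, lc + rc) if ls & rs else (ls | rs, lc + rc + 1)
    n_sites = len(next(iter(alignment.values())))
    return sum(walk(tree, s)[1] for s in range(n_sites))

def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One bootstrap replicate: resample columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}

def bootstrap_support(trees, alignment, replicates=1000, seed=1):
    """Fraction of replicates in which each tree is most parsimonious."""
    rng = random.Random(seed)
    support = {label: 0.0 for label in trees}
    for _ in range(replicates):
        pseudo = bootstrap_alignment(alignment, rng)
        lengths = {label: fitch_length(t, pseudo) for label, t in trees.items()}
        best = min(lengths.values())
        winners = [label for label, n in lengths.items() if n == best]
        for label in winners:
            support[label] += 1.0 / len(winners)  # split ties evenly
    return {label: count / replicates for label, count in support.items()}

ALIGNMENT = {  # hypothetical 10-site alignment
    "O": "TGGAATCAGT",
    "A": "GCGAATCAGT",
    "B": "GCAACTGAGT",
    "C": "GCACTTGACT",
}
TREES = {
    "(A,B) grouped": ("O", ("C", ("B", "A"))),
    "(B,C) grouped": ("O", ("A", ("B", "C"))),
    "(A,C) grouped": ("O", ("B", ("A", "C"))),
}
print(bootstrap_support(TREES, ALIGNMENT))
```

Only two sites in this alignment group B with C, and no site favors another pairing, so the (B,C) tree is never beaten. Its support falls short of 100% only in replicates that happen to omit both informative sites, where all three trees tie.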
1. Sometimes the Molecular Clock (based on genetic data) conflicts with the Geological Record. Why would this happen?

(A) Sometimes there are gaps in the geological record, because fossils do not form everywhere, and mutation rate might vary between different species

(B) Radiometric dating relies on chance events in the preservation of isotopes, making the timing of events in the geological time scale less accurate than the molecular clock

(C) Mutation rates slow down as you go back in time, making estimation of timing of events less accurate as you go back in time

(D) The molecular clock is calculated from radioisotopes, while the geological record is obtained from fossil data. The two can conflict when fossils end up displaced from their original sedimentary layer

2. You are a medical researcher working on HIV. A novel strain has appeared in Madison, Wisconsin. To determine which drugs would be most effective in treating this new strain (because different strains are resistant to different drugs), you need to determine its recent evolutionary history. You decide to reconstruct the evolutionary history of HIV by using a phylogenetic approach. Thus, you collect samples from patients in various geographic locations and sequence a fragment of RNA. Using parsimony, which is the correct phylogeny for HIV-1 based on the data below?

HIV-1, Uganda, Africa ACAUG

HIV-1, San Francisco, USA UGAUG

HIV-1, Madison, USA UAAGG

HIV-1, New York, USA UAAAG

HIV-1, Paris ACAUC

HIV-2 Africa (ancestral outgroup): ACCUG

3. Which of the following is most TRUE regarding phylogenetic reconstructions?
  • (A) Phylogenetic reconstruction based on any gene would yield the same tree
  • (B) Parsimony is the most accurate method for reconstructing phylogenies
  • (C) Some DNA sequence data are better for phylogenetic reconstruction than others, such as sites that tend to be less subject to selection (3rd codon positions, introns)
  • (D) Maximum likelihood relies on maximizing distances among taxa
4. Which of the following types of data would be best suited for constructing a phylogeny?

(a) Non-coding and regulatory sequences

(b) Non-coding and non functional sequences

(c) Paralogous genes

(d) Genes that have undergone purifying selection

(e) Intron sequences within rapidly evolving genes

5. Which of the following reasons for why the type of data chosen in the previous question would be optimal for constructing a phylogeny is FALSE?

(a) Because selection might make taxa seem more closely related due to convergent evolution

(b) Because selection might make taxa seem more distantly related due to disruptive selection

(c) Because selection might make taxa seem more closely related due to purifying selection

(d) Because non-coding regulatory sequences are likely to be neutral

(e) Because coding sequences are likely to be under selection

Answers
  • 1A
  • 2C
  • 3C
  • 4B
  • 5D