
CS 177 Phylogenetics I

Taxonomy and phylogenetics

Phylogenetic trees

Cladistic versus phenetic analyses

Model of sequence evolution

Phylogenetic trees and networks

Cladistic and phenetic methods

Computer software and demos

Taxonomy and phylogenetics

Phylogenetic trees

Cladistic versus phenetic analyses

Homology and homoplasy


(very) basic

advanced

Phylogenetic Inference I

Recommended readings

A science primer: Phylogenetics

http://www.ncbi.nlm.nih.gov/About/primer/phylo.html

Brown, S.M. (2000) Bioinformatics, Eaton Publishing, pp. 145-160

Brown, S.M.: Molecular Phylogenetics

www.med.nyu.edu/rcr/rcr/course/PPT/phylogen.ppt

Hillis, D.M.; Moritz, C. & Mable, B.K. (1996) Molecular Systematics, 2nd edition, Sinauer Associates, 655 pp.

Mount, D.W. (2001) Bioinformatics, Cold Spring Harbor Lab Press, pp. 237-280


Evolution

The theory of evolution is the foundation upon which all of modern biology is built

From anatomy to behavior to genomics, the scientific method requires an appreciation of changes in organisms over time

It is impossible to evaluate relationships among gene sequences without taking into consideration the way these sequences have been modified over time


Ernst Haeckel (1834-1919)



Relationships

Similarity searches and multiple alignments of sequences naturally lead to the question

“How are these sequences related?”

and more generally:

“How are the organisms from which these sequences come related?”



Classifying Organisms

Nomenclature is the science of naming organisms

Evolution has created an enormous diversity, so how do we deal with it?

Names allow us to talk about groups of organisms.

- Scientific names were originally descriptive phrases; not practical

- Binomial nomenclature

> Developed by Linnaeus, a Swedish naturalist

> Names are in Latin, formerly the language of science

> binomials - names consisting of two parts

> The generic name is a noun.

> The epithet is a descriptive adjective.

- Thus a species' name is two words e.g. Homo sapiens


Carolus Linnaeus (1707-1778)


Classifying Organisms

Taxonomy is the science of the classification of organisms

Taxonomy deals with the naming and ordering of taxa.

The Linnaean hierarchy:

1. Kingdom

2. Division

3. Class

4. Order

5. Family

6. Genus

7. Species


Evolutionary distance


Classifying Organisms

Systematics is the science of the relationships of organisms

Systematics is the science of how organisms are related and the evidence for those relationships

Systematics is divided primarily into phylogenetics and taxonomy

Speciation -- the origin of new species from previously existing ones

- anagenesis - one species changes into another over time

- cladogenesis - one species splits to make two


Reconstruct evolutionary history

Phylogeny


Phylogenetics

Phylogenetics is the science of the pattern of evolution.

A. Evolutionary biology is the study of the processes that generate diversity, while phylogenetics is the study of the pattern of diversity produced by those processes.

B. The central problem of phylogenetics:

1. How do we determine the relationships between species?

2. Use evidence from shared characteristics, not differences

3. Use homologies, not analogies

4. Use derived condition, not ancestral

a. synapomorphy - shared derived characteristic

b. plesiomorphy - ancestral characteristic

C. Cladistics is phylogenetics based on synapomorphies.

1. Cladistic classification creates and names taxa based only on synapomorphies.

2. This is the principle of monophyly

3. monophyletic, paraphyletic, polyphyletic

4. Cladistics is now the preferred approach to phylogeny


The phylogeny and classification of life as proposed by Haeckel (1866)


Phylogenetics

Evolutionary theory states that groups of similar organisms are descended from a common ancestor.

Phylogenetic systematics is a method of classifying organisms based on their evolutionary history.

It was developed by Hennig, a German entomologist, in 1950.


Willi Hennig (1913-1976)


Phylogenetics

Phylogenetics is the science of the pattern of evolution

Evolutionary biology versus phylogenetics

- Evolutionary biology is the study of the processes that generate diversity

- Phylogenetics is the study of the pattern of diversity produced by those processes



Phylogenetics

Who uses phylogenetics? Some examples:

Evolutionary biologists (e.g. reconstructing tree of life)

Systematists (e.g. classification of groups)

Anthropologists (e.g. origin of human populations)

Forensics (e.g. transmission of HIV to a rape victim)

Parasitologists (e.g. phylogeny of parasites, co-evolution)

Epidemiologists (e.g. reconstruction of disease transmission)

Genomics/Proteomics (e.g. homology comparison of new proteins)



Phylogenetic trees

The central problem of phylogenetics:

how do we determine the relationships between taxa?


In phylogenetic studies, the most convenient way of presenting evolutionary relationships among a group of organisms is the phylogenetic tree


Phylogenetic trees

Node: a branchpoint in a tree (a presumed ancestral OTU)

Branch: defines the relationship between the taxa in terms of descent and ancestry

Topology: the branching patterns of the tree

Branch length (scaled trees only): represents the number of changes that have occurred in the branch

Root: the common ancestor of all taxa

Clade: a group of two or more taxa or DNA sequences that includes both their common ancestor and all of its descendants

Operational Taxonomic Unit (OTU): taxonomic level of sampling selected by the user to be used in a study, such as individuals, populations, species, genera, or bacterial strains
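The terms above can be made concrete with a small sketch. It assumes a rooted tree written as nested Python tuples, which is a hypothetical encoding chosen for illustration and not one used in the lecture; the taxon names A to E and the helper names are likewise made up.

```python
# A rooted tree as nested tuples: each tuple is an internal node (a presumed
# ancestor), each string is a taxon (OTU). Encoding and names are illustrative.
tree = ((("A", "B"), "C"), ("D", "E"))

def leaves(node):
    """Return the set of taxa found below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result

def clades(node):
    """Return the taxon set of every internal node, i.e. every clade."""
    if isinstance(node, str):
        return []
    found = [frozenset(leaves(node))]
    for child in node:
        found.extend(clades(child))
    return found

def is_monophyletic(tree, taxa):
    """A group is monophyletic here if it matches exactly one clade of the tree."""
    return frozenset(taxa) in clades(tree)

print(sorted(sorted(c) for c in clades(tree)))
print(is_monophyletic(tree, {"A", "B"}))  # A and B share an exclusive ancestor
print(is_monophyletic(tree, {"B", "C"}))  # B and C do not form a clade without A
```

The outermost tuple plays the role of the root, and the order of children inside a tuple carries no meaning, just as rotating branches around a node does not change a drawn tree.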

[Figure: example tree with the labels Branch, Node, Clade and Root]


Phylogenetic trees

There are many ways of drawing a tree



Phylogenetic trees

There are many ways of drawing a tree

[Figure: equivalent (=) drawings of a tree with the taxa A, B, C, D and E]


Phylogenetic trees

There are many ways of drawing a tree

[Figure: equivalent (=) drawings of a tree; one element of the drawing is labelled "no meaning"]



Phylogenetic trees

There are many ways of drawing a tree

Bifurcation

Trifurcation


Bifurcation versus Multifurcation (e.g. Trifurcation)

Multifurcation (also called polytomy): a node in a tree that connects more than three branches. A multifurcation may represent a lack of resolution because of too few data available for inferring the phylogeny (in which case it is said to be a soft multifurcation) or it may represent the hypothesized simultaneous splitting of several lineages (in which case it is said to be a hard multifurcation).


Phylogenetic trees

Trees can be scaled or unscaled (with or without branch lengths)



Phylogenetic trees

Trees can be unrooted or rooted

[Figure: an unrooted tree and rooted trees of the taxa A, B, C and D, with possible root positions marked]


Phylogenetic trees

Trees can be unrooted or rooted


These trees show five different evolutionary relationships among the taxa!


Phylogenetic trees

Possible evolutionary trees

Taxa (n)   Unrooted   Rooted
2          1          1
3          1          3
4          3          15
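The counts in this table follow a standard closed form that the slides do not state: for strictly bifurcating trees with n labelled taxa there are (2n-5)!! unrooted and (2n-3)!! rooted topologies, where !! is the product of every second integer down to 1. A minimal sketch, with function names chosen here for illustration:

```python
def odd_double_factorial(k):
    """Product k * (k-2) * (k-4) * ... * 1 for odd k; returns 1 when k < 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_trees(n):
    """Number of unrooted bifurcating topologies for n >= 2 labelled taxa."""
    return odd_double_factorial(2 * n - 5)

def rooted_trees(n):
    """Number of rooted bifurcating topologies for n >= 2 labelled taxa."""
    return odd_double_factorial(2 * n - 3)

for n in (2, 3, 4, 5, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
```

For 10 taxa this already gives about two million unrooted and over 34 million rooted trees, which is why exhaustive tree searches become impractical so quickly.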



Phylogenetic trees

Possible evolutionary trees



Phylogenetic trees

How to root?

Use information from ancestors

In most cases not available


Phylogenetic trees

How to root?

Statistical tools can root trees automatically (e.g. mid-point rooting)

This must involve assumptions … BEWARE!


Phylogenetic trees

How to root?

Using “outgroups”

- the outgroup should be a taxon known to be less closely related to the rest of the taxa (ingroups)

- it should ideally be as closely related as possible to the rest of the taxa while still satisfying the above condition


Phylogenetic trees

Exercise: rooted/unrooted; scaled/unscaled

[Figure: exercise trees labelled A, B, C, D, E and F]


Phylogenetics

  • What are useful characters?

  • Use homologies, not analogies!

  • Homology: common ancestry of two or more character states

  • Analogy: similarity of character states not due to shared ancestry

  • Homoplasy: a collection of phenomena that lead to similarities in character states for reasons other than inheritance from a common ancestor (e.g. convergence, parallelism, reversal)

  • Homoplasy is a huge problem in morphology data sets!

  • But in molecular data sets, too!


Cactaceae and Euphorbiaceae


Phylogenetics

Molecular data and homoplasy

gene sequences represent character data

characters are positions in the sequence (not all workers agree; some say one gene is one character)

character states are the nucleotides in the sequence (or amino acids in the case of proteins)


Problems:

the probability that two nucleotides are the same just by chance mutation is 25%

what to do with insertions or deletions (which may themselves be characters)

homoplasy in sequences may cause alignment errors
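As a rough illustration of "positions are characters, nucleotides are character states", the sketch below walks over the columns of a toy alignment. The sequences, taxon names and helper functions are invented for this example, and gaps are simply ignored, which is only one of several ways to treat insertions and deletions.

```python
# Toy alignment; sequences and taxon names are made up for illustration.
alignment = {
    "taxon1": "ACGTAC",
    "taxon2": "ACGTTC",
    "taxon3": "ACATTC",
    "taxon4": "AC-TTG",
}

def columns(aln):
    """Yield (position, states), where states maps each taxon to its character state."""
    names = list(aln)
    length = len(aln[names[0]])
    for pos in range(length):
        yield pos, {name: aln[name][pos] for name in names}

def is_variable(states):
    """A character is variable if it shows more than one nucleotide (gaps ignored)."""
    observed = {s for s in states.values() if s != "-"}
    return len(observed) > 1

variable_sites = [pos for pos, states in columns(alignment) if is_variable(states)]
print(variable_sites)
```

Only the variable positions can carry phylogenetic signal; whether a shared state at such a position reflects homology or homoplasy cannot be read from the column alone.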


Phylogenetics

Molecular data and homoplasy: Orthologs vs. Paralogs

When comparing gene sequences, it is important to distinguish between identical vs. merely similar genes in different organisms

Orthologs are homologous genes in different species that arose through speciation and typically retain the same function

Paralogs are homologous genes that are the result of a gene duplication

A phylogeny that includes both orthologs and paralogs is likely to be incorrect

Sometimes phylogenetic analysis is the best way to determine if a new gene is an ortholog or paralog to other known genes



Phylogenetics

What are useful characters?

  • Use derived condition, not ancestral

  • Synapomorphy (shared derived character): homologous traits share the same character state because it originated in their immediate common ancestor

  • Plesiomorphy (“shared ancestral character”): homologous traits share the same character state because they are inherited from a common distant ancestor



Phenetics versus cladistics

Within the field of taxonomy there are two different methods and philosophies of building phylogenetic trees: cladistic and phenetic

  • Phenetic methods construct trees (phenograms) by considering the current states of characters without regard to the evolutionary history that brought the species to their current phenotypes; phenograms are based on overall similarity

  • Cladistic methods construct trees (cladograms) that rely on assumptions about ancestral relationships as well as on current data; cladograms are based on character evolution (e.g. shared derived characters)

Cladistics is becoming the method of choice; it is considered to be more powerful and to provide more realistic estimates. However, it is slower than phenetic algorithms
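To show what "overall similarity" means in practice, here is a minimal sketch that computes the proportion of differing sites (the p-distance) for every pair of sequences and picks the most similar pair, which is the first step a distance-based clustering method would take. The sequences are invented, and the p-distance is only one possible similarity measure.

```python
from itertools import combinations

# Made-up aligned sequences for three taxa (illustrative only).
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACCTTCGA"}

def p_distance(s1, s2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(s1, s2))
    return diffs / len(s1)

# Distance for every unordered pair of taxa.
distances = {
    pair: p_distance(seqs[pair[0]], seqs[pair[1]])
    for pair in combinations(sorted(seqs), 2)
}

# The pair with the smallest distance is the most similar overall.
closest = min(distances, key=distances.get)
print(distances)
print(closest)
```

Nothing in this calculation asks whether a shared state is ancestral or derived, which is exactly the information a cladistic analysis would add.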


Phenetics vs. cladistics

An example


Phenetics vs. cladistics

Phenetic (overall similarity)

[Figure: tree of the taxa A, B and C grouped by overall similarity; the diagram carries the values 4, 3 and 5]


Phenetics vs. cladistics

Cladistics (character evolution; e.g. shared derived characters)

[Figure: tree of the taxa A, B and C grouped by shared derived characters; the diagram carries the values 2, 1 and 1]


Model of sequence evolution

The problem

- A basic process in the evolution of a sequence is change in that sequence over time

- Now we are interested in a mathematical model to describe that

- Such a model is essential for understanding the mechanisms of change, and it is required to estimate both the rate of evolution and the evolutionary history of sequences


Model of sequence evolution

Nucleotide

base + sugar + phosphate

Pyrimidine (C4N2H4): Thymine, Cytosine

Purine (C5N4H4): Adenine, Guanine


Models of sequence evolution

Examples

Jukes-Cantor model (1969)

All substitutions have an equal probability and base frequencies are equal
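Under these assumptions, the observed proportion of differing sites p underestimates the true number of substitutions because repeated changes at one site are hidden. The standard Jukes-Cantor correction, which the slide does not spell out, is d = -(3/4) ln(1 - 4p/3). A minimal sketch, with the function name chosen here for illustration:

```python
import math

def jukes_cantor_distance(p):
    """Expected substitutions per site under the Jukes-Cantor model,
    given the observed proportion of differing sites p."""
    if not 0 <= p < 0.75:
        # At p >= 0.75 the sequences look random and the correction is undefined.
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)

for p in (0.01, 0.10, 0.30, 0.60):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance stays close to p for similar sequences and grows much faster than p as sequences diverge.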


Models of sequence evolution

Examples

Felsenstein (1981)

All substitutions have an equal probability, but there are unequal base frequencies


Models of sequence evolution

Examples

Kimura 2 parameter model (K2P) (1980)

Transitions and transversions have different probabilities
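A sketch of the corresponding distance, assuming the usual Kimura two-parameter formula d = -(1/2) ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P is the proportion of sites showing a transition (A<->G or C<->T) and Q the proportion showing a transversion. The formula is the textbook one rather than something given on the slide, and the function name and example sequences are invented.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(s1, s2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    sites = transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    P = transitions / sites
    Q = transversions / sites
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

# One transition (A/G) and one transversion (A/C) in ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```

When transitions and transversions are equally frequent per type, this distance approaches the Jukes-Cantor value; the two diverge as the transition bias grows.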


Models of sequence evolution

Examples

Hasegawa, Kishino & Yano (HKY) (1985)

Transitions and transversions have different probabilities, base frequencies are unequal


Models of sequence evolution

Examples

General time reversible model (GTR)

Different probabilities for each substitution, base frequencies are unequal


Models of sequence evolution

[Figure: substitution diagram among the bases A, G, C and T with rates labelled a and b, shown for the Jukes-Cantor, Felsenstein, K2P, HKY and GTR models]


More models of sequence evolution …

  • Currently, there are more than 60 models described

  • plus gamma distribution and invariable sites

  • accuracy of models rapidly decreases for highly divergent sequences

  • problem: more complicated models tend to be less accurate (and slower)

  • How to pick an appropriate model?

  • use a likelihood ratio test

  • implemented in Modeltest 3.06 (Posada & Crandall, 1998)


More models of sequence evolution …

Example for Modeltest file

A

JC = 3158.0095
F81 = 3121.2188
K80 = 2994.6611
HKY = 2924.4182
TrNef = 2994.5491
TrN = 2923.6340
K81 = 2987.6548
K81uf = 2923.5620
TIMef = 2987.6196
TIM = 2922.9878
TVMef = 2983.3450
TVM = 2922.1970
SYM = 2983.3069
GTR = 2921.1187

B

Equal base frequencies
Null model = JC, -lnL0 = 3369.2803
Alternative model = F81, -lnL1 = 3342.5513
2(lnL1-lnL0) = 53.4580, df = 3
P-value = <0.000001

C

Model selected: TVM+G
-lnL = 2911.3660
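The likelihood ratio test for equal base frequencies in this Modeltest example can be reproduced by hand. The sketch below takes the two -lnL values quoted for JC and F81, forms the test statistic, and compares it with a chi-square distribution with 3 degrees of freedom. The closed-form tail probability used here holds for df = 3 only and is a convenience for this example, not how Modeltest itself computes the P-value.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

neg_lnL_null = 3369.2803  # JC: equal base frequencies (value from the example)
neg_lnL_alt = 3342.5513   # F81: unequal base frequencies (value from the example)

# 2 * (lnL1 - lnL0) written in terms of the negative log-likelihoods.
statistic = 2 * (neg_lnL_null - neg_lnL_alt)
p_value = chi2_sf_df3(statistic)

print(round(statistic, 4), p_value)
```

The three degrees of freedom correspond to the three free base frequencies that F81 adds to JC; because the P-value is far below any usual threshold, the simpler model is rejected.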



Did the Florida dentist infect his patients with HIV?


From Ou et al. (1992) and Page & Holmes (1998)

