Molecular Clocks and HIV-1

Polly R. Walker

D. Phil Student

Dept of Zoology, University of Oxford

Summary of Talk
  • Molecular clocks
  • Measurably Evolving Populations (MEPs)
  • Methods for measuring evolution
  • Coalescent theory
  • Application of the molecular clock
    • Estimating divergence times
    • Population dynamics using coalescent theory
  • Demonstration: HIV-1 in South Africa.
The Molecular Clock
  • The molecular clock hypothesis: gene sequences accumulate substitutions at an approximately constant rate, so gene sequences can be used to time divergences. This is referred to as a ‘Molecular Clock’.
  • The idea of a molecular clock was first suggested by Zuckerkandl and Pauling in 1962. They noted that rates of amino acid replacement in animal haemoglobins were roughly proportional to real time, as judged against the fossil record.
  • The “constancy” of the molecular clock is particularly striking when compared with the obvious variation in rates of morphological evolution (e.g. the existence of “living fossils”).
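A minimal sketch of how a strict clock turns a genetic distance into a date, assuming both lineages have evolved at the same rate r since they split, so the expected distance between them is K = 2rT. The function name and the example numbers are hypothetical, chosen only for illustration.

```python
def divergence_time(distance: float, rate: float) -> float:
    """Years since two sequences diverged under a strict molecular clock.

    distance: substitutions per site separating the two sequences (K).
    rate: substitutions per site per year on each lineage (r).
    Each lineage evolves for T years after the split, so K = 2 * r * T.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate)


# Hypothetical example: 10% divergence at 1e-9 substitutions/site/year
# gives roughly 50 million years.
print(divergence_time(0.10, 1e-9))
```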
There is No “Universal” Molecular Clock
  • Sources of variation in the Clock:
  • Mutation rates vary among lineages and through time:
  • - different generation times of organisms
  • - different metabolic rates
  • - different genomic systems, e.g. DNA repair mechanisms
  • - different genes, gene regions, or sites within a molecule
  • (together referred to as lineage effects - a neutralist explanation)
  • The existence of “nearly” neutral mutations and fluctuations in population size (the nearly neutral theory).
  • Natural selection - species adapt to variable environments.
  • The molecular clock can vary over time
  • - how constant is the environment?
  • - how neutral is evolution?
Average Rates of Nucleotide Substitution in Different Organisms

(per site, per year; powers of ten and some labels missing from the original slide)

  • Chloroplast DNA: ~1 x 10^?
  • Mammalian nuclear DNA: 3.5 x 10^?
  • Plant nuclear DNA: ~5 x 10^?
  • (label missing): ~5 x 10^?
  • (organism missing) nuclear DNA: 1.5 x 10^?
  • Mammalian mitochondrial DNA: 5.7 x 10^?
  • (label missing): 6.6 x 10^?

Constant Molecular Clocks are Difficult to Obtain Under Natural Selection

• For natural selection to produce a molecular clock, population sizes, selection pressures, and mutation rates must all be constant over evolutionary time.

How true is THAT for HIV?


• The rate of substitution (k) of mutations with a selective advantage depends on:

i. effective population size (Ne)

ii. degree of selective advantage (s)

iii. mutation rate (μ)

$k = 4 N_e s \mu$
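To make the dependence concrete, a small sketch of k = 4 Ne s μ with hypothetical parameter values: doubling Ne (or s, or μ) doubles k, so a steady clock under selection needs all three to stay constant. The function name is an illustrative choice.

```python
def advantageous_substitution_rate(ne: float, s: float, mu: float) -> float:
    """k = 4 * Ne * s * mu: substitution rate of advantageous mutations.

    Units follow mu (e.g. per site per generation if mu is per site per generation).
    """
    return 4.0 * ne * s * mu


# Hypothetical values, only to show how k scales with each factor.
base = advantageous_substitution_rate(ne=1e4, s=0.01, mu=1e-8)
double_ne = advantageous_substitution_rate(ne=2e4, s=0.01, mu=1e-8)
print(double_ne / base)  # doubling Ne doubles k
```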

Testing the Molecular Clock

• So, is there a good molecular clock?

• There are a variety of ways to test the molecular clock.

i. The dispersion index, R(t)

ii. The relative rate test

iii. The Likelihood Ratio test using ML statistics.
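A minimal sketch of the first test, assuming R(t) is computed as the variance divided by the mean of the number of substitutions observed in independent lineages of the same age (here using the sample variance). Under a Poisson clock R(t) is expected to be about 1; values well above 1 indicate an overdispersed clock. The counts and the function name are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(substitution_counts):
    """R(t) = variance / mean of substitution counts across lineages.

    Under a Poisson molecular clock the variance equals the mean, so R(t)
    should be close to 1; values well above 1 suggest rate variation.
    """
    if len(substitution_counts) < 2:
        raise ValueError("need at least two lineages")
    m = mean(substitution_counts)
    if m <= 0:
        raise ValueError("mean number of substitutions must be positive")
    return variance(substitution_counts) / m  # sample variance (n - 1)


# Hypothetical substitution counts for four lineages of equal age.
print(dispersion_index([10, 12, 8, 10]))
```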

Maximum Likelihood Tests of the Molecular Clock

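[Figure: primate mitochondrial DNA trees estimated with and without a molecular clock. Clock-constrained tree: log likelihood = -2660.61; unconstrained tree: log likelihood = -2659.18]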

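• Likelihood Ratio Test: the log likelihoods of the clock and no-clock trees can be compared directly.

$\mathrm{LRT} = 2\,\lvert \ln L_{\text{clock}} - \ln L_{\text{no clock}} \rvert$, compared with a $\chi^2$ distribution with $n - 2$ degrees of freedom ($n$ = number of sequences), e.g. p = CHIDIST(LRT, n - 2) in Excel.

(Not significantly different in this case: primate mitochondrial DNA.)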
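The test can be sketched in Python without external libraries. `chi2_sf` is a hypothetical helper playing the role of Excel's CHIDIST (upper tail of the chi-square distribution), computed from a power series for the regularised incomplete gamma function. The log likelihoods are the ones on this slide, but the number of sequences (n = 5) is an assumption, since the slide does not state it; with an LRT of 2.86 the result is non-significant at the 5% level for any n of 3 or more.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square probability (the quantity CHIDIST returns).

    Computed as 1 - P(df/2, stat/2), where P is the regularised lower
    incomplete gamma function summed as a power series. Adequate for the
    moderate statistics met in clock tests; tiny p-values lose precision.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > total * 1e-15:
        term *= x / (a + k)
        total += term
        k += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def clock_lrt(lnl_clock: float, lnl_no_clock: float, n_seqs: int):
    """LRT of a strict molecular clock: statistic and p-value with n - 2 df."""
    stat = 2.0 * abs(lnl_no_clock - lnl_clock)
    return stat, chi2_sf(stat, n_seqs - 2)


# Log likelihoods from the primate mtDNA slide; n_seqs = 5 is an assumption.
stat, p = clock_lrt(-2660.61, -2659.18, n_seqs=5)
print(f"LRT = {stat:.2f}, p = {p:.3f}")
```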

Measurably Evolving Populations

The population is sampled heterochronously (at different time points), the samples span hundreds or thousands of generations, and they contain a significant amount of genetic variation.

Hence, this typically includes either

1. Organisms with rapid evolution and short generation times

e.g. RNA viruses

2. Organisms with a wide range of sampling dates

e.g. ancient DNA samples

Maximum Likelihood Estimation of Viral Substitution Rates

• RNA viruses are often sampled at different times; because they evolve so rapidly, even small differences in sampling date can have big effects.






Programme “Tip-Date” or “Rhino”

• Construct rooted maximum likelihood tree

• Optimise branch lengths under a single rate with relative tip positions consistent with isolation dates

• Test molecular clock using a likelihood ratio test

• Estimate confidence intervals



The Coalescent



Demographic History

  • Tells us how the phylogenies of sampled populations are affected by changes in population size and structure (demography).
  • The descent of lineages is traced backwards in time to the point at which they share common ancestral alleles. The number of lineages is reduced by one at each coalescent event (creating a node on the tree).
  • The probability that two sequences share a common ancestor in the previous generation (i.e. that a coalescent event occurs) is $1/2N$. Therefore the probability that two sequences first shared a common ancestor $G$ generations ago is approximately $f(G) = \frac{1}{2N} e^{-(G-1)/2N}$.
  • Therefore the probability that sequences sampled randomly from a population share a common ancestor in a given generation depends on population size.
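A small sketch comparing the exact probability with the exponential approximation above, assuming a diploid Wright-Fisher population of N individuals (2N gene copies), in which the waiting time to coalescence is geometric. For large N the two agree closely, and a smaller N gives a higher probability of recent coalescence. The function names are hypothetical.

```python
import math


def coalescence_prob_exact(g: int, n: float) -> float:
    """P(two gene copies first coalesce exactly g generations ago).

    With 2N gene copies, two lineages pick the same parent with probability
    1/(2N) each generation, so the waiting time is geometric.
    """
    p = 1.0 / (2.0 * n)
    return p * (1.0 - p) ** (g - 1)


def coalescence_prob_approx(g: int, n: float) -> float:
    """Exponential approximation f(G) = (1/2N) * exp(-(G - 1) / 2N)."""
    return math.exp(-(g - 1) / (2.0 * n)) / (2.0 * n)


for n in (100, 10_000):
    print(n, coalescence_prob_exact(10, n), coalescence_prob_approx(10, n))
```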

The Coalescent

• Changes in population size affect the distribution of coalescent times (i.e. when in time branching events occur).

• In a constant-sized population more coalescent events occur near the tips than near the root, but in a growing population coalescent events occur closer to the root: going back in time the population becomes smaller, so coalescent events become more likely (i.e. drift is more powerful in small populations).

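[Figure: genealogies sampled from a large (big N) and a small (small N) population]

• Therefore it is possible to distinguish continually large populations from those that have only recently grown in size.

[Figure: genealogies under rapid growth, slow growth, a large constant population, and a small constant population]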
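The contrast between constant and growing populations can be checked with a small simulation. This sketch assumes a haploid coalescent in which population size going back in time is N(t) = N0 exp(-r t) (r = 0 gives a constant size), and draws each waiting time by inverting the cumulative coalescence hazard. The summary statistic (the share of coalescent events in the older half of the tree) and all parameter values are illustrative choices, not taken from the talk.

```python
import math
import random


def coalescent_times(n, n0, growth_rate, rng):
    """Times (back from the present) of the n - 1 coalescent events.

    Size back in time: N(t) = n0 * exp(-growth_rate * t). With k lineages
    the coalescence rate is C(k, 2) / N(t); time is in generations.
    """
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        c = k * (k - 1) / 2.0
        e = rng.expovariate(1.0)  # unit exponential, mapped to a waiting time
        if growth_rate == 0:
            wait = e * n0 / c
        else:
            # Solve c / (n0 * r) * exp(r * t) * (exp(r * s) - 1) = e for s.
            x = e * n0 * growth_rate * math.exp(-growth_rate * t) / c
            wait = math.log1p(x) / growth_rate
        t += wait
        times.append(t)
    return times


def rootward_fraction(times):
    """Share of coalescent events in the older half of the tree."""
    tmrca = times[-1]
    return sum(1 for t in times if t > tmrca / 2) / len(times)


def mean_rootward_fraction(n, n0, growth_rate, reps=500, seed=1):
    rng = random.Random(seed)
    total = sum(rootward_fraction(coalescent_times(n, n0, growth_rate, rng))
                for _ in range(reps))
    return total / reps


constant = mean_rootward_fraction(20, 1e4, 0.0)
growing = mean_rootward_fraction(20, 1e4, 0.05)
print(f"constant: {constant:.2f}, exponential growth: {growing:.2f}")
```

A higher share under growth reflects the more star-like genealogy, with branching concentrated towards the root.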


Models of Demographic History

• Constant size (endemic) population:

- 1 parameter, population size (N)

• Exponentially growing (epidemic) population:

- 2 parameters, current population size (N0) and growth rate (r)

• More complex models:

- logistic (growth slows down toward the present)

- expansion (sudden change in growth rate)

• Estimate all parameters (e.g. N0, r) from tree structure

These nested models can be compared using the likelihood ratio test.
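For two nested models that differ by one parameter, such as constant size (N) versus exponential growth (N0, r), the test has 1 degree of freedom, and the chi-square upper tail then reduces to erfc(sqrt(LRT/2)). A minimal sketch; the function name and the log likelihoods are hypothetical.

```python
import math


def nested_lrt_one_param(lnl_simple: float, lnl_complex: float):
    """LRT for nested models differing by one parameter.

    Statistic 2 * (lnL_complex - lnL_simple) is compared with chi-square
    on 1 df, whose upper tail is erfc(sqrt(stat / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_complex - lnl_simple))
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log likelihoods: constant size vs exponential growth.
stat, p = nested_lrt_one_param(lnl_simple=-1234.5, lnl_complex=-1229.1)
print(f"LRT = {stat:.2f}, p = {p:.4f}")
```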

Assumptions of the Model

A) Lineages coalesce independently

B) No more than one coalescent event can occur in a single generation

C) The time-scale is so large that it can be represented as continuous

• Works best for neutral mutations subject to genetic drift in non-recombining populations, i.e. where any change in the structure of the genealogy must be due to demographic processes rather than to fitness differences (e.g. fitter alleles producing more branches).


Los Alamos Sequence Database (

Estimating Demographic History of HIV-1 Subtype C

Step 1. Sequence Selection

  • A large range of sampling dates, e.g. 1989-2001
  • Monophyletic (relative to a comparison group, e.g. subtype B)
  • Length of the sequences available: optimise sequence length against sample size.

Example: CgagSR - ntax = 29, nchar = 1659

1986: C.ET.86.ETH2220

1993: C.IN.93.N904 C.IN.93.IN905 C.IN.93.IN101 C.IN.93.IN99

1995: C.IN.95.IN21068 C.IN.95.IN21301

1996: C.BW.96.BW17B05 C.BW.96.BWM032 C.BW.96.BW0504 C.BW.96.BW1626 C.ZM.96.ZM651 C.ZM.96.ZM751

1997: C.ZA.97.ZA012

1998: C.TZ.98.TZ013 C.TZ.98.TZ017 C.ZA.98.TV001 C.ZA.TV002

1999: C.ZA.99.DU151 C.ZA.99.DU179 C.BW.99.BW47547 C.BW.99.BWMC168

2000: C.BW.00.BW18595 C.BW.00.BW18802 C.BW.00.BW192113 C.BW.00.BW20361 C.BW.00.BW20636
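The isolation year needed for tip dating can be read straight from these names. A minimal sketch, assuming the Subtype.Country.YY.Isolate pattern seen in the list above and a hypothetical pivot of 50 for expanding two-digit years; a name without a year field (such as C.ZA.TV002 in the list) returns None so it can be checked by hand.

```python
import re


def sampling_year(name: str, pivot: int = 50):
    """Isolation year from a name like 'C.BW.96.BW17B05'.

    Assumes the third dot-separated field is a two-digit year; years >= pivot
    are read as 19YY and the rest as 20YY. Returns None if no year is found.
    """
    fields = name.split(".")
    if len(fields) < 4 or not re.fullmatch(r"\d{2}", fields[2]):
        return None
    yy = int(fields[2])
    return 1900 + yy if yy >= pivot else 2000 + yy


for name in ["C.ET.86.ETH2220", "C.IN.93.N904", "C.BW.00.BW18595", "C.ZA.TV002"]:
    print(name, sampling_year(name))
```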


Sequences are out-of-frame

Step 1. Sequence Alignment

Using Clustal

AND manual alignment e.g. Se-Al version

Remove all incomplete codons and codons containing stop (*) or unknown (?) characters, and keep the sequences in the correct reading frame.
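One way to script this clean-up, as a sketch under stated assumptions: the sequences are already aligned, equal in length and start at the first codon position; '-', '?' and 'N' are treated as missing data, and TAA, TAG and TGA as stop codons (standard genetic code). Any codon column that is incomplete, ambiguous or a stop codon in any sequence is removed whole, which keeps the remaining alignment in frame. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
MISSING = set("-?N")  # assumed symbols for gaps and unknown bases


def clean_codon_alignment(seqs: dict) -> dict:
    """Drop codon columns that are incomplete, ambiguous or stops in any sequence.

    Assumes all sequences are aligned to the same length and start in frame 1.
    A trailing partial codon is discarded.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    keep = []
    for i in range(0, length - length % 3, 3):
        codons = [s[i:i + 3].upper() for s in seqs.values()]
        if any((set(c) & MISSING) or (c in STOP_CODONS) for c in codons):
            continue
        keep.append(i)
    return {name: "".join(s[i:i + 3] for i in keep) for name, s in seqs.items()}


print(clean_codon_alignment({"a": "ATGAAATAA", "b": "ATG-AACCC"}))
```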


The sequence(s) closest to the root of the tree are defined as the outgroup.

Return to your original tree and use this sequence to root the tree (under rooting options).

Subtype B is the most distantly related sequence.

Step 2. ML tree construction

  • Make a Neighbour-Joining tree, check this tree, and remove identical or almost identical sequences.
  • Estimate all parameters under a realistic evolutionary model, e.g. GTR + gamma, and derive the best ML tree.
  • Root the tree, e.g. by outgroup rooting:
    • add in a distantly related sequence, such as one from another subtype.

Rhino Version 1.2 http//evolve/

Macintosh version - Runs on MacOS9 and MacOSX

UNIX/Linux version - could be compiled for Windows

Step 3. Tip-dating the Tree

  • Prepare the correct input format: a sequence file in NEXUS format, a rooted tree file, and tip-dating information.
  • Use the same evolutionary model here as you used to generate the tree (get the commands from the manual).
  • Estimate the rate of evolution (absrate) and its confidence intervals (interval tree) using bootstrapping.

Begin RHINO;





STATUS param;

interval tree:absrate;


  • Carry out the likelihood ratio test: is it significant?

The likelihood ratio test tells us whether we are justified in assuming a molecular clock. If the data fit a clock, the difference in log likelihood is not significant.

  • $\mathrm{LRT} = 2\,\lvert \ln L_{\text{VR}} - \ln L_{\text{clock}} \rvert$ (VR = variable rates), compared with a $\chi^2$ distribution
  • df = n - 2
  • This is a very strict test of a molecular clock. Also look at root-to-tip regression lines.
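A root-to-tip regression can be sketched with ordinary least squares: regress each tip's distance from the root (substitutions/site, read from the rooted ML tree) on its sampling year. The slope is a rough rate estimate and the x-intercept a rough date for the root. This is a hypothetical helper with made-up data lying exactly on a line; it is a diagnostic, not a replacement for the likelihood estimate from Rhino.

```python
def root_to_tip(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, root_date): slope is a rough substitution rate
    (substitutions/site/year); root_date is where the fitted line reaches
    zero distance, a rough date for the root.
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (date, distance) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("sampling dates must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("no positive temporal signal")
    intercept = mean_y - slope * mean_x
    return slope, -intercept / slope


# Made-up tips lying exactly on a line with rate 0.0025 and root in 1960.
years = [1986, 1993, 1996, 2000]
dists = [0.0025 * (y - 1960) for y in years]
rate, root = root_to_tip(years, dists)
print(f"rate = {rate:.4f} subs/site/year, root = {root:.1f}")
```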

Using the Clock: 1. Timing the Origin of the Epidemic

$\text{TMRCA (years since MRCA)} = \dfrac{\text{tree node height (substitutions/site)}}{\text{substitution rate (substitutions/site/year)}}$

No significant difference between the timing of the two subtypes.

Subtype C has a slightly lower point estimate for rate but broader CIs

Can apply the rates to other data sets, provided it is the same gene region
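A sketch of the TMRCA calculation, assuming the node height is measured in substitutions/site back from the most recently sampled tip, so subtracting the years since the MRCA from that sampling year gives a calendar date. Because a slower rate implies an older MRCA, the lower bound of the rate interval gives the earliest date. The names and numbers are hypothetical.

```python
def years_since_mrca(node_height: float, rate: float) -> float:
    """Years since the MRCA = node height (subs/site) / rate (subs/site/year)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return node_height / rate


def mrca_date(node_height: float, rate: float, latest_sample_year: float) -> float:
    """Calendar date of the MRCA, counting back from the most recent sample."""
    return latest_sample_year - years_since_mrca(node_height, rate)


# Hypothetical values: root height 0.1 subs/site, rate 0.0025 (CI 0.002-0.003).
height, latest = 0.1, 2000.0
point = mrca_date(height, 0.0025, latest)
oldest = mrca_date(height, 0.002, latest)    # slower rate -> older MRCA
youngest = mrca_date(height, 0.003, latest)  # faster rate -> younger MRCA
print(point, oldest, youngest)
```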

Population Dynamics

1. Determine the maximum likelihood population growth model.

2. Estimate the parameter rho under the best-fit growth model.

3. Scale skyline plots according to the substitution rate.

4. Estimate the parameter R, the growth rate in units of year^-1, or rho/

5. Estimate the doubling time:

Doubling time (years) = $\ln(2) / R$
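The skyline scaling and doubling-time steps can be sketched as below. `doubling_time` follows directly from exponential growth N(t) = N0 exp(R t). `classic_skyline` is a hypothetical helper implementing the classic skyline estimator (C(k, 2) times each coalescent interval), assuming contemporaneous tips and intervals measured in substitutions/site; dividing by the substitution rate first converts the intervals to years, so each estimate is effective population size times generation time, in years.

```python
import math


def doubling_time(growth_rate_per_year: float) -> float:
    """Doubling time (years) = ln(2) / R for N(t) = N0 * exp(R * t)."""
    if growth_rate_per_year <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / growth_rate_per_year


def classic_skyline(intervals_subs, n_tips, rate):
    """Classic skyline estimates, one per coalescent interval (tips to root).

    intervals_subs: the n_tips - 1 coalescent interval lengths, in
    substitutions/site, ordered from the tips toward the root.
    rate: substitutions/site/year, used to convert intervals to years.
    In interval i (0-based) there are k = n_tips - i lineages.
    """
    if len(intervals_subs) != n_tips - 1:
        raise ValueError("expected n_tips - 1 coalescent intervals")
    if rate <= 0:
        raise ValueError("rate must be positive")
    estimates = []
    for i, length in enumerate(intervals_subs):
        k = n_tips - i
        estimates.append(k * (k - 1) / 2 * (length / rate))
    return estimates


print(classic_skyline([0.001, 0.002, 0.004], n_tips=4, rate=0.002))
print(doubling_time(0.5))  # hypothetical R = 0.5 per year
```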


  • Within subtype C confidence intervals overlap. Subtype B and C show different demographic histories.
  • Subtype C has a slower exponential phase than subtype B
  • Subtype C on a global level is showing a logistic trend, not yet significant, but in Africa it is still exponentially growing.

Potential Applications:

Comparing growth rates within different groups, e.g. risk group, HLA type, or the spread of different clades.

Detecting decreases in epidemic growth rate.


Molecular Clocks can be used to:

a.) time the origin of an epidemic

b.) determine population dynamics

c.) Your estimates are only as good as your clock.

d.) HIV is subject to variable rates of evolution among branches: this requires new models that allow for rate variation (relaxed clocks).