Evolutionary models
1 / 25

Evolutionary Models - PowerPoint PPT Presentation

  • Updated On :

Evolutionary Models. CS 498 SS Saurabh Sinha. Models of nucleotide substitution. The DNA that we study in bioinformatics is the end(??)-product of evolution Evolution is a very complicated process Very simplified models of this process can be studied within a probabilistic framework

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Evolutionary Models' - zeheb

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Evolutionary models l.jpg

Evolutionary Models

CS 498 SS

Saurabh Sinha

Models of nucleotide substitution l.jpg
Models of nucleotide substitution

  • The DNA that we study in bioinformatics is the end(??)-product of evolution

  • Evolution is a very complicated process

  • Very simplified models of this process can be studied within a probabilistic framework

  • Allows testing of various hypotheses about the evolutionary process, from multi-species data

Source: Ewens and Grant, Chapter 14.

Diversity in a population l.jpg
Diversity in a population

  • There IS genetic variation between individuals in a population

  • But relatively little variation at nucl. level

  • E.g., two humans differ at the nucl. level at one in 500 to 1000 nucls.

  • Roughly speaking, a single nucleotide dominates the population at a particular position in the genome

Substitution l.jpg

  • Over long time periods, the nucleotide at a given position remains the same

  • But periodically, this nucleotide changes (over the entire population)

  • This is called “substitution”, i.e., replacement of the predominant nucl. for that position with another predominant nucl.

Markov chain to model substitution l.jpg
Markov Chain to model substitution

  • Markov chain to describe the substitution process at a position

  • States are “a”, “c”, “g”, “t”

  • The chain “runs” in certain units of time, i.e., the state may change from one time point to the next time point

  • The unit of time (difference between successive time points) may be arbitrary, e.g., 20000 generations.

Markov chain to model substitution6 l.jpg
Markov Chain to model substitution

  • A symbol such as “pag” is the probability of a change from “a” to “g” in one unit of time

  • When studying two extant species, the evolutionary model has to provide the joint probability of the two species’ data

  • Sometimes, this is done by computing probability of the ancestor, starting from one extant species, and then the probability of the other extant species, starting from the ancestor

  • If we want to do this, the evolutionary process (model) must be “time reversible”: P(x)P(x->y) = P(y)P(y->x)

Jukes cantor model l.jpg
Jukes Cantor Model

  • Markov chain with four states: a,c,g,t

  • Transition matrix P given by:

Jukes cantor model8 l.jpg
Jukes Cantor Model

  •  is a parameter depending on what a “time unit” means. If time unit represents more #generations,  will be larger

  •  must be less than 1/3 though

Jukes cantor model9 l.jpg
Jukes Cantor Model

  • Whatever the current nucl is, each of the other three nucls are equally likely to substitute for it

Understanding the j c model l.jpg
Understanding the J-C Model

  • Consider a transition matrix P, and a probability vector v (a row vector)

  • What does w = vP represent ?

  • If v is the probability distribution of the 4 nucls (at a position) now, w is the prob. distr. at the next time step.

Understanding the j c model11 l.jpg
Understanding the J-C model

  • Suppose we can find a vector  such that P = 

  • If the probability distribution is , it will continue to remain  at future times

  • This is called the stationary distribution of the Markov Chain

Understanding the j c model12 l.jpg
Understanding the J-C model

  • Check that  = (0.25, 0.25, 0.25, 0.25) satisfies  P = 

  • Therefore, if a position evolves as per this model, for long enough, it will be equally likely to have any of the 4 nucls!

  • This is the very long term prediction, but can we write down what the position will be as a function of time (steps) ?

Spectral decomposition l.jpg
Spectral Decomposition

  • Recall that we found a  such that

     P = 

  • Such a vector is called an “eigenvector” of P, and the corresponding “eigenvalue” is 1.

  • In general, if v P =  v (for scalar ), is called an eigenvalue, and v is a left eigenvector of P

Spectral decomposition14 l.jpg
Spectral decomposition

  • Similarly, if P uT =  uT, then u is called a right eigenvector

  • In general, there may be multiple eigenvalues jand their corresponding left and right eigenvectors vjand uj

  • We can write P as

Spectral decomposition15 l.jpg
Spectral decomposition

  • Then, for any positive integer, it is true that

  • Why is Pninteresting to us ?

  • Because it tells us what the probability distribution will be after n time steps

  • If we started with v, then Pnv will be the prob. distr. after n steps

Back to the j c model l.jpg
Back to the J-C model

  • We reasoned that  = (.25,.25,.25,.25) is a left eigenvector for the eigenvalue 1.

  • Actually, the J-C transition matrix has this eigenvalue and the eigenvalue (1-4), and if we do the math we get the spectral decomposition of P as:

Back to the j c model17 l.jpg
Back to the J-C model

  • So, if we started with (1,0,0,0), i.e., an “a”, the probability that we’ll see an “a” at that position after n time steps is:


  • And the probability that the “a” would have mutated to say “c” is:

    0.25 - 0.25(1-4)n

Substitution probability l.jpg
Substitution probability

  • As a function of time n, we therefore get

  • Pr(x -> y) = 0.25 + 0.75 (1-4)n if x = y

  • and = 0.25 - 0.25 (1-4)n otherwise

  • If n ->, we get back our (0.25, 0.25, 0.25, 0.25) calculation

More advanced models l.jpg
More advanced models

  • The J-C model made highly “symmetric” assumptions, in its formulation of the transition matrix P

  • In reality, for example, “transitions” are more common than “transversions”

    • What are these? Purine = A or G. Pyrimidine = C or T. Transition is substitution in the same category; transversion is substitution across categories

    • Purines are similarly sized, and pyrimidines are similarly sized. More likely to be replaced by similar sized nucl.

  • The “Kimura” model captures this transition/transversion bias

Kimura model l.jpg
Kimura model

  • This of course is the transition probability matrix P of the Markov chain

  • Two parameters now, instead of one.

Kimura model21 l.jpg
Kimura model

  • Again, one of the eigenvalues is 1, and the left eigenvector corresponding to it is  = (.25,.25,.25,.25)

  • So again, the stationary distribution is uniform

  • P(x -> x) = .25+.25(1-4)n+.5(1-2( +))n

  • P(x -> y) = .25+.25(1-4)n+.5(1-2( +))nif x is a purine and y is the other purine

Even more advanced models l.jpg
Even more advanced models

  • Get to greater levels of realism

  • Kimura model still has a uniform stationary distribution, which is not true of real data

  • One extension: purine to pyrimidine subst. prob. is different from pyrimidine to purine subst. prob.

    • This leads to a non-uniform stationary probability

Felsenstein models l.jpg
Felsenstein models

Transition probability proportional to the stationary

probability of the target nucleotide. Stationary distribution is

(a, g, c, t)

Reversible models l.jpg
Reversible models

  • Many inference procedures require that the evolutionary model be time reversible

  • What does this mean?

Reversible markov chain l.jpg
Reversible Markov Chain

Looks like time has been reversed. That is, if we can find

a  such that

The models we have seen today all have this property.

Source: Wikipedia