
Statistical Alignment: Computational Properties, Homology Testing and Goodness-of-Fit

J. Hein, C. Wiuf, B. Knudsen, M.B. Moller and G. Wibling.


Presentation Transcript


  1. Statistical Alignment: Computational Properties, Homology Testing and Goodness-of-Fit J. Hein, C. Wiuf, B. Knudsen, M.B. Moller and G. Wibling

  2. Main Objective of the paper
     • To show how to accelerate statistical alignment algorithms by several orders of magnitude, using the insertion-deletion model of Thorne, Kishino, and Felsenstein (1991; the TKF91 model).
     • To propose a new homology test based on the model.
     • To describe a goodness-of-fit test for the insertion-deletion process inherent to the model.

  3. Why isn’t statistical alignment popular?
     • Computationally VERY SLOW
       • The authors accelerated the statistical alignment algorithms by several orders of magnitude compared with the original TKF91 algorithm.
     • Lack of user-friendly software?
       • Programs are usually written in Fortran or C, or the compiled program runs only in a UNIX environment, which most biologists are not familiar with.
       • The authors provide a web interface to the program.

  4. Parsimony and similarity alignments
     • Parsimony strategy: minimizing the distance
       • For example:
     • Similarity strategy: maximizing the similarity score
       • For example: BLAST
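The two strategies on this slide can be contrasted with a small sketch. It assumes unit costs for the parsimony distance and a hypothetical scoring scheme (match +1, mismatch -1, gap -1) for the similarity score; neither is taken from the slides. The similarity function is a plain global dynamic program, not the heuristic local search that BLAST performs.

```python
def edit_distance(a: str, b: str) -> int:
    """Parsimony view: fewest substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


def similarity_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Similarity view: highest-scoring global alignment under the assumed scores."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
    print(similarity_score("ACGT", "AGT"))
```

Both functions fill the same grid; only the direction of optimization and the scoring differ.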

  5. TKF91 model of substitutions
     • Continuous-time Markov model on the state space of nucleotides or amino acids
     • A rate matrix Q is specified
       • It describes the intensity of the different substitution events over an infinitesimal time period.
     • The probability that i has changed to j after time t is $P_{ij}(t) = [e^{Qt}]_{ij}$
     • The process is assumed to be time reversible: $\pi_i P_{ij}(t) = \pi_j P_{ji}(t)$, where $\pi$ is the equilibrium distribution
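As a rough illustration of the substitution part, the sketch below computes the transition probabilities as the matrix exponential of Qt and checks time reversibility. The Jukes-Cantor rate matrix, the rate `alpha`, and all function names are assumptions chosen for the example; the slides do not say which rate matrix is used. The exponential is computed in plain Python by scaling and squaring with a truncated Taylor series.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, terms=20, squarings=10):
    """exp(m) via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scale = 2 ** squarings
    small = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, small)]  # small^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # undo the scaling
    return result


def jc69_rate_matrix(alpha=1.0):
    """Assumed example: Jukes-Cantor rates, every off-diagonal entry equal to alpha."""
    return [[-3 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


def transition_matrix(q, t):
    """P(t) = exp(Q t)."""
    return mat_exp([[x * t for x in row] for row in q])


if __name__ == "__main__":
    p = transition_matrix(jc69_rate_matrix(0.5), 0.3)
    pi = [0.25] * 4  # equilibrium distribution of the Jukes-Cantor model
    reversible = all(
        math.isclose(pi[i] * p[i][j], pi[j] * p[j][i], abs_tol=1e-12)
        for i in range(4) for j in range(4)
    )
    print(reversible)
```

For a production calculation a library routine for the matrix exponential would be the more usual choice; the hand-written version only keeps the example dependency-free.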

  6. TKF91 model of the indel process
     • Can be viewed as a Markov model with all sequences as possible states
     • Indel part of the model
       • links connect the letters of the sequences
       • each letter has a mortal link on its right
       • the left end has an immortal link
     • For example (o = immortal link, * = mortal link): o A * G * G *
     • If the type of the nucleotide is ignored, this can be represented as o * * *

  7. TKF91 model
     • A mortal link can give birth to a new mortal link or die out
     • The immortal link can also give birth but does not die
     • Therefore, the rates can be written as (o = immortal link, * = mortal link):

         o    A    *      G    *      G    *
         I0   S1   I1/D1  S2   I2/D2  S3   I3/D3

       where I is the birth rate, D is the death rate (D > I), and S is the substitution rate
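The birth and death rules above can be sketched as a small stochastic simulation of the number of mortal links. This is a minimal sketch that assumes every link shares one birth rate and every mortal link one death rate; the function name and parameters are hypothetical, and letters and substitutions are ignored.

```python
import random


def simulate_length(n0, birth, death, t_end, rng):
    """Number of mortal links after time t_end, starting from n0 mortal links."""
    n, t = n0, 0.0
    while True:
        birth_total = (n + 1) * birth  # n mortal links plus the immortal link can give birth
        death_total = n * death        # only mortal links can die
        total = birth_total + death_total
        if total == 0:
            return n                   # no event can ever happen
        t += rng.expovariate(total)    # waiting time to the next event
        if t > t_end:
            return n
        if rng.random() < birth_total / total:
            n += 1
        else:
            n -= 1


if __name__ == "__main__":
    rng = random.Random(1)
    lengths = [simulate_length(5, 0.5, 1.0, 20.0, rng) for _ in range(2000)]
    print(sum(lengths) / len(lengths))
```

With the death rate above the birth rate, the sequence length does not grow without bound, which is why the slide requires D > I.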

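  8. TKF91 model
     To calculate the probability of a particular alignment (o = immortal link, * = mortal link):

         s(1):  o  A  *  T  *  -
         s(2):  o  C  *  T  *  G  *

     $P(s^{(1)}, s^{(2)}, \text{alignment}) = p''_1 \, (\pi_A \, p_1 \, P_{AC}) \, (\pi_T \, p_2 \, P_{TT} \, \pi_G)$

     Here $\pi$ denotes the equilibrium letter frequencies, $P_{ij}$ the substitution probabilities, $p_n$ the probability that a mortal link survives with n descendants (itself included), and $p''_n$ the corresponding probability for the immortal link.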
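The product on this slide can be evaluated numerically with a short sketch. It uses the link probabilities of the TKF91 model written with birth rate `lam` and death rate `mu` (the slide's I and D), and it assumes a Jukes-Cantor substitution model with rate `alpha` and uniform letter frequencies purely for illustration. The code multiplies only the factors shown on the slide and adds no further terms.

```python
import math


def beta(t, lam, mu):
    """Auxiliary TKF91 quantity shared by the link probabilities (requires lam < mu)."""
    e = math.exp((lam - mu) * t)
    return (1 - e) / (mu - lam * e)


def p_mortal(n, t, lam, mu):
    """p_n(t): a mortal link survives and has n descendants, itself included (n >= 1)."""
    b = beta(t, lam, mu)
    return math.exp(-mu * t) * (1 - lam * b) * (lam * b) ** (n - 1)


def p_immortal(n, t, lam, mu):
    """p''_n(t): the immortal link has n descendants, itself included (n >= 1)."""
    b = beta(t, lam, mu)
    return (1 - lam * b) * (lam * b) ** (n - 1)


def jc69(i, j, t, alpha):
    """Assumed substitution model: Jukes-Cantor probability of i becoming j in time t."""
    decay = math.exp(-4 * alpha * t)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def example_alignment_probability(t, lam, mu, alpha):
    """Product of the slide's three factors for the A/C, T/T, -/G alignment."""
    pi = 0.25  # uniform equilibrium frequency under Jukes-Cantor
    immortal = p_immortal(1, t, lam, mu)                                   # p''_1
    a_to_c = pi * p_mortal(1, t, lam, mu) * jc69("A", "C", t, alpha)       # pi_A p_1 P_AC
    t_to_t_plus_g = pi * p_mortal(2, t, lam, mu) * jc69("T", "T", t, alpha) * pi  # pi_T p_2 P_TT pi_G
    return immortal * a_to_c * t_to_t_plus_g


if __name__ == "__main__":
    print(example_alignment_probability(t=1.0, lam=0.5, mu=1.0, alpha=0.3))
```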

  9. Calculating the probability of two sequences
     • Without conditioning on the alignment, it is necessary to sum over all alignments, weighted by their probabilities under the TKF91 process.
     • Confining the likelihood calculations to a band close to the similarity-based alignment allows an efficient numerical optimization algorithm for finding the maximum likelihood estimate.
     • The recursions originally presented by Thorne, Kishino and Felsenstein can be simplified.
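To give a feel for why the sum needs dynamic programming and what a band saves, the sketch below counts alignment paths instead of summing their probabilities. It is a stand-in, not the TKF91 recursion: each path is a sequence of match, insert and delete steps through the grid, and the hypothetical `band` parameter restricts cells to |i - j| <= band around the main diagonal, whereas the paper places the band around a similarity-based alignment.

```python
def count_alignments(n, m, band=None):
    """Count match/insert/delete paths from (0, 0) to (n, m), optionally within a band."""
    def inside(i, j):
        return band is None or abs(i - j) <= band

    counts = [[0] * (m + 1) for _ in range(n + 1)]
    counts[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) == (0, 0) or not inside(i, j):
                continue  # cells outside the band keep a count of zero
            total = 0
            if i > 0:
                total += counts[i - 1][j]      # letter of the first sequence against a gap
            if j > 0:
                total += counts[i][j - 1]      # letter of the second sequence against a gap
            if i > 0 and j > 0:
                total += counts[i - 1][j - 1]  # two letters aligned
            counts[i][j] = total
    return counts[n][m]


if __name__ == "__main__":
    print(count_alignments(10, 10))
    print(count_alignments(10, 10, band=2))
```

Replacing each added count with the corresponding probability weight turns the same grid traversal into a sum over alignments; the band simply skips cells that are assumed to contribute little.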
