Evolutionary HMMs: Bayesian Approach to Multiple Alignment


Presentation Transcript

### Evolutionary HMMs: Bayesian Approach to Multiple Alignment

Siva Theja Maguluri

CS 598 SS

Goal
• Given a set of sequences and a tree representing their evolutionary relationship, find a multiple sequence alignment that maximizes the probability of the evolutionary relationships between the sequences.

Evolutionary Model
• Pairwise likelihood for relation between two sequences
• Reversibility

Alignment can be inferred from the sequences using DP if the Markov condition applies

• Joint likelihood of a multiple alignment on a tree

Alignment Model
• Substitution models

• Birth-death process with immigration, i.e. each residue can either spawn a child or die
• Birth rate λ, death rate µ
• Immortal link at the left-hand end of the sequence
• Independent, homogeneous substitution

Probability evolution in Links Model
• Time evolution of the probability of a link surviving and spawning n descendants
• Time evolution of the probability of a link dying before time t and spawning n descendants

Probability evolution in Links Model
• Time evolution of the probability of the immortal link spawning n descendants at time t

Probability evolution in Links Model
• The solution of these differential equations is expressed in terms of the quantities α, β and γ defined below

Probability evolution in Links Model
• Conceptually, α is the probability that the ancestral residue survives
• β is the probability of more insertions given one or more surviving descendants
• γ is the probability of an insertion given that the ancestor did not survive
• In the limit, the immortal link generates residues according to a geometric distribution
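Under this interpretation, the links-model quantities can be sketched numerically. The closed forms below are the standard TKF91 expressions, reproduced here as an illustration rather than taken from the slides:

```python
import math

def links_model_probs(lam, mu, t):
    """Illustrative TKF91 links-model quantities at divergence time t.

    alpha: probability the ancestral residue survives to time t
    beta : probability of a further insertion given surviving descendants
    gamma: probability of an insertion given the ancestor did not survive
    Requires birth rate lam < death rate mu so sequence lengths stay finite.
    """
    assert 0 < lam < mu
    alpha = math.exp(-mu * t)
    e = math.exp((lam - mu) * t)
    beta = lam * (1.0 - e) / (mu - lam * e)
    gamma = 1.0 - mu * (1.0 - e) / ((1.0 - alpha) * (mu - lam * e))
    return alpha, beta, gamma

# As t -> 0 almost every residue survives and indels are rare.
a, b, g = links_model_probs(0.1, 0.2, 0.01)
```

At equilibrium the geometric length distribution seeded by the immortal link has parameter λ/µ, which is why the model needs λ < µ.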

Links model as a Pair HMM
• Just like a standard HMM, but emits two sequences instead of one
• The state path by which a pair HMM emits two sequences implicitly defines an alignment of them

Pair HMM for Links model
• Either the residue lives or it dies, spawning a geometrically distributed number of new residues in each case

Links model as a Pair HMM
• The path through the pair HMM is π
• DP is used to infer the alignment of two sequences
• Viterbi algorithm for finding the optimal π
• Forward algorithm to sum over all alignments or to sample from the posterior
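As a concrete sketch, a minimal Viterbi recursion for a generic three-state pair HMM looks as follows; the states, transition values and emission probabilities are illustrative toy numbers, not the fitted links-model parameters:

```python
import math

# Illustrative 3-state pair HMM: M = match, X = gap in y, Y = gap in x.
STATES = ("M", "X", "Y")
TRANS = {(a, b): math.log(p) for (a, b), p in {
    ("M", "M"): 0.8, ("M", "X"): 0.1, ("M", "Y"): 0.1,
    ("X", "M"): 0.6, ("X", "X"): 0.3, ("X", "Y"): 0.1,
    ("Y", "M"): 0.6, ("Y", "X"): 0.1, ("Y", "Y"): 0.3,
}.items()}

def viterbi_score(x, y, p_match=0.7, p_mismatch=0.1, p_gap=0.25):
    """Log-probability of the best state path pi aligning x and y."""
    NEG = float("-inf")
    n, m = len(x), len(y)
    V = {s: [[NEG] * (m + 1) for _ in range(n + 1)] for s in STATES}
    V["M"][0][0] = 0.0  # start in the match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:  # M emits the pair (x[i-1], y[j-1])
                e = math.log(p_match if x[i-1] == y[j-1] else p_mismatch)
                V["M"][i][j] = e + max(
                    V[s][i-1][j-1] + TRANS[(s, "M")] for s in STATES)
            if i > 0:            # X emits x[i-1] against a gap
                V["X"][i][j] = math.log(p_gap) + max(
                    V[s][i-1][j] + TRANS[(s, "X")] for s in STATES)
            if j > 0:            # Y emits y[j-1] against a gap
                V["Y"][i][j] = math.log(p_gap) + max(
                    V[s][i][j-1] + TRANS[(s, "Y")] for s in STATES)
    return max(V[s][n][m] for s in STATES)
```

Replacing `max` with a log-sum over predecessors turns the same recursion into the forward algorithm, which is what the sampling moves below rely on.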

Multiple HMMs
• Instead of emitting 2 sequences, emit N sequences
• Up to 2^N − 1 emit states!
• Such a model can be developed for any tree
• Viterbi and Forward algorithms use an N-dimensional dynamic programming matrix
• Given a tree relating N sequences, a multiple HMM can be constructed from pair HMMs so that the joint likelihood factorizes over the branches of the tree

Composing multiple alignment from branch alignments
• Residues Xi and Yj in a multiple alignment containing sequences X and Y are aligned iff
• They are in the same column
• That column contains no gaps for intermediate sequences
• No deletion followed by re-insertion is allowed
• Ignoring all-gap columns provides an unambiguous way of composing a multiple alignment from branch alignments, and vice versa
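The column criterion above can be made concrete with a small helper; the representation (rows of equal length with '-' gaps, and an explicitly supplied list of intermediate rows on the tree path) is an assumption for illustration:

```python
def aligned(rows, x, y, i, j, path):
    """True iff residue i of row x and residue j of row y occupy the same
    column, and no row on the tree path between them has a gap there."""
    def column_of(row, k):
        # column index holding the k-th (0-based) residue of the row
        seen = -1
        for c, ch in enumerate(rows[row]):
            if ch != "-":
                seen += 1
                if seen == k:
                    return c
        raise IndexError(k)
    cx, cy = column_of(x, i), column_of(y, j)
    return cx == cy and all(rows[r][cx] != "-" for r in path)

rows = ["AC-G",   # X
        "A-CG",   # Y
        "ACCG"]   # intermediate sequence on the X-Y path
```

For example, the first residues of X and Y above share column 0 and the intermediate row is ungapped there, so they are aligned; the two 'C' residues sit in different columns, so they are not.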

Eliminating internal nodes
• Internal nodes are missing data
• Sum them out of the likelihood function
• Summing over indel histories would destroy the independence between branches
• Sum over substitution histories using the post-order traversal (pruning) algorithm of Felsenstein
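Felsenstein's pruning recursion for a single alignment column can be sketched as follows; the two-state substitution model, tree and probabilities are toy values chosen for illustration:

```python
def prune(node, tree, leaves, P):
    """Conditional likelihoods L[state] at `node` by post-order traversal.

    tree  : dict mapping internal node -> list of children
    leaves: dict mapping leaf name -> observed state (0 or 1)
    P     : P[a][b] = substitution probability a -> b along a branch
    """
    if node in leaves:  # leaf: indicator vector on the observed state
        return [1.0 if s == leaves[node] else 0.0 for s in (0, 1)]
    like = [1.0, 1.0]
    for child in tree[node]:  # multiply contributions of each subtree
        cl = prune(child, tree, leaves, P)
        for a in (0, 1):
            like[a] *= sum(P[a][b] * cl[b] for b in (0, 1))
    return like

# Two leaves under one root, both observed in state 0, uniform root prior.
P = [[0.9, 0.1], [0.1, 0.9]]
L = prune("root", {"root": ["x", "y"]}, {"x": 0, "y": 0}, P)
likelihood = 0.5 * L[0] + 0.5 * L[1]   # 0.5*0.81 + 0.5*0.01 = 0.41
```

The same traversal applied per column, with residues at internal nodes summed out, is what lets indel and substitution histories be handled separately.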

Algorithm
• Progressive alignment – profiles of parents estimated by aligning siblings on a post-order traversal – impatient strategy
• Iterative refinement – revisit branches following initial alignment phase – Greedy
• Sample from a population of alignments, exploring suboptimal alignments in anticipation of long term improvements

Algorithm
• Moves to explore alignment space
• These moves need to be ergodic, i.e. allow transformation of any alignment into any other alignment
• These moves need to satisfy detailed balance, i.e. guarantee convergence to the desired stationary distribution

Move 1: Parent Sampling
• Goal: Align two sibling nodes Y and Z and infer their parent X
• Construct the multiple HMM for X,Y and Z
• Sample an alignment of Y and Z using the forward algorithm
• This induces branch alignments of X with Y and of X with Z
• Similar to sibling alignment step of impatient-progressive alignment

Move 2: Branch Sampling
• Goal: realign two adjacent nodes X and Y
• Construct the pair HMM for X and Y, fixing everything else
• Resample the alignment using the forward algorithm
• This is similar to branch alignment step of greedy-refined algorithm

Move 3: Node Sampling
• Goal: resample the sequence at an internal node X
• Construct the multiple HMM relating X, its parent W and its children Y and Z, fixing everything else
• Resample the sequence of X, conditioned on the relative alignment of W, Y and Z
• This is similar to inferring parent sequence lengths in impatient-progressive algorithms

Algorithm
• Parent-sample up the guide tree to construct an initial multiple alignment
• Visit each branch and node once, applying branch sampling or node sampling respectively
• Repeat the previous step to obtain more samples

Algorithm
• Replacing ‘sampling by the forward algorithm’ with ‘optimizing by the Viterbi algorithm’ gives maximum-likelihood versions of the moves
• Impatient-progressive is the ML version of parent sampling
• Greedy refinement is the ML version of branch and node sampling

Gibbs sampling in ML context
• Periodically save the current alignment, take a greedy approach to record the likelihood of the refined alignment, then return to the saved alignment
• Store this likelihood and compare it to those of other alignments at the end of the run

Ordered over-relaxation
• Sampling is a random walk on a Markov chain, so it behaves like Brownian motion, i.e. RMS drift grows as √n
• It would be better to avoid previously explored regions, i.e. ‘boldly go where no alignment has gone before’
• Impose a strict weak order on alignments
• Sample N alignments at each stage and sort them
• If the original sample ends up in position k, choose the (N−k)th sample for the next step
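The sort-and-mirror step can be sketched as below; the conditional sampler and scoring function are placeholders, since any total order consistent with the strict weak order on alignments would do:

```python
import random

def over_relaxed_step(current, sample_conditional, score, N=10):
    """One ordered over-relaxation update: draw N candidates from the
    conditional distribution, rank the current state among them, and
    return the candidate in the mirrored position N - k."""
    pool = sorted([sample_conditional() for _ in range(N)] + [current],
                  key=score)
    k = pool.index(current)          # rank k of the current state
    return pool[len(pool) - 1 - k]   # mirrored rank pushes away from it

# Toy usage: states are plain numbers scored by their own value.
rng = random.Random(0)
new_state = over_relaxed_step(0.5, rng.random, score=lambda s: s, N=5)
```

Because the returned sample sits on the opposite side of the ordering from the current state, successive states are anticorrelated, suppressing the √n random-walk behaviour.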

Implementation and results
• A: true alignment
• B: impatient-progressive
• C: greedy refinement
• D: Gibbs sampling followed by greedy refinement
• E: Gibbs sampling with simulated annealing
• F: Gibbs sampling with over-relaxation
• G: without Felsenstein wildcards

Discussion
• Outlines a very appealing Bayesian framework for multiple alignment
• Performs very well, considering the simplicity of the model
• Could add profile information and variable sized indels to the model to improve performance

Questions
• What assumption enabled us to use this algorithm, avoiding the N-dimensional DP matrices?
• What is the importance of the immortal link in the links model?

References
• Holmes, I. and Bruno, W.J., “Evolutionary HMMs: a Bayesian approach to multiple alignment,” Bioinformatics, 2001.

More results
• Poor performance on 4 is probably because Handel produces a global alignment and doesn’t handle affine gaps
• Handel doesn’t incorporate any profile information
• Handel cannot use BLOSUM (it’s not additive)
