Comprehensive study on evolutionary hidden Markov models (HMMs) for multiple sequence alignment based on probabilistic modeling of evolutionary relationships. The research covers concepts of pairwise likelihood, substitution models, birth-death processes, and probability evolution in links models. Techniques such as dynamic programming (DP), Viterbi algorithm, and forward algorithm are employed for optimal alignment inference. The study explores the development of multiple HMMs for aligning N sequences in a tree structure, along with strategies for composing alignments and eliminating internal nodes. Furthermore, algorithms for progressive alignment, iterative refinement, and alignment space exploration are discussed. Moves for sampling parent nodes, sibling nodes, and internal nodes in a Bayesian framework are detailed for accurate alignment reconstruction.
Evolutionary HMMs: a Bayesian Approach to Multiple Alignment. Siva Theja Maguluri, CS 598 SS
Goal • Given a set of sequences and a tree representing their evolutionary relationship, to find a multiple sequence alignment which maximizes the probability of the evolutionary relationships between the sequences.
Evolutionary Model • Pairwise likelihood for the relation between two sequences • Reversibility • Additivity
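The reversibility and additivity properties can be checked numerically. Below is a minimal sketch using the Jukes-Cantor substitution model (not necessarily the model used in the paper) as an illustration: additivity means P(s)P(t) = P(s+t), and reversibility means the flow π_i P_ij equals π_j P_ji.

```python
import numpy as np

# Illustrative sketch: Jukes-Cantor transition matrix in closed form,
# used to demonstrate the additivity and reversibility properties.
def jc_matrix(t):
    same = 0.25 + 0.75 * np.exp(-t)   # P(no net change) after time t
    diff = 0.25 - 0.25 * np.exp(-t)   # P(change to a specific other residue)
    return np.full((4, 4), diff) + np.eye(4) * (same - diff)

pi = np.full(4, 0.25)                 # uniform equilibrium distribution

P1, P2, P3 = jc_matrix(0.3), jc_matrix(0.7), jc_matrix(1.0)
assert np.allclose(P1 @ P2, P3)       # additivity: P(0.3) P(0.7) = P(1.0)
flow = pi[:, None] * P3
assert np.allclose(flow, flow.T)      # reversibility: pi_i P_ij = pi_j P_ji
```

Additivity is what lets branch lengths on a tree compose, and reversibility lets the root be placed anywhere without changing the likelihood.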
• The alignment can be inferred from the sequences using DP if the Markov condition applies • Joint likelihood of a multiple alignment on a tree
Alignment Model • Substitution models
Links Model • Birth-death process with immigration, i.e. each residue can either spawn a child or die • Birth rate λ, death rate µ • Immortal link at the left-hand side • Independent, homogeneous substitution
Probability evolution in Links Model • Time evolution of the probability of a link surviving and spawning n descendants • Time evolution of the probability of a link dying before time t and spawning n descendants
Probability evolution in Links Model • Time evolution of the probability of the immortal link spawning n descendants at time t
Probability evolution in Links Model • The solution of these differential equations yields the quantities α, β and γ described below
Probability evolution in Links Model • Conceptually, α is the probability that the ancestral residue survives • β is the probability of more insertions given one or more descendants • γ is the probability of insertion given the ancestor did not survive • In the limit, the immortal link generates residues according to a geometric distribution
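A minimal sketch of these quantities, assuming the standard TKF91 closed forms for α(t), β(t) and γ(t) with birth rate λ < death rate µ (the function name and parameter values are illustrative):

```python
import math

def links_model_probs(lam, mu, t):
    """TKF91-style links-model quantities at divergence time t:
    alpha: ancestral residue survives to time t
    beta:  more insertions, given one or more descendants
    gamma: insertion occurred, given the ancestor did not survive
    """
    alpha = math.exp(-mu * t)
    e = math.exp((lam - mu) * t)
    beta = lam * (1.0 - e) / (mu - lam * e)
    gamma = 1.0 - mu * (1.0 - e) / ((1.0 - math.exp(-mu * t)) * (mu - lam * e))
    return alpha, beta, gamma
```

As t grows, β tends to λ/µ, the parameter of the geometric equilibrium length distribution generated by the immortal link.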
Links model as a Pair HMM • Just like a standard HMM, but emits two sequences instead of one • A state path through a pair HMM implicitly aligns the two sequences
Pair HMM for Links model • Either the residue lives or dies, spawning geometrically distributed residues in each case
Links model as a Pair HMM • The path through the pair HMM is π • DP is used to infer the alignment of two sequences • Viterbi algorithm for finding the optimal π • Forward algorithm to sum over all alignments, or to sample from the posterior
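The Viterbi DP over a pair HMM can be sketched as follows. This uses a generic 3-state pair HMM (match M, and two insert states X, Y) in the style of Durbin et al. with illustrative parameters, not the links-model parameterisation:

```python
import math

# Illustrative pair-HMM parameters (assumptions, not from the paper):
DELTA, EPSILON = 0.2, 0.1                 # gap-open / gap-extend transitions
P_MATCH, P_MISMATCH, Q = 0.7, 0.1, 0.25   # emission probabilities

def viterbi_pair(x, y):
    """Log-probability of the best alignment (state path) of x and y."""
    n, m = len(x), len(y)
    NEG = float("-inf")
    # V[s][i][j]: best log-prob of aligning x[:i], y[:j], ending in state s
    V = {s: [[NEG] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0                    # start in the match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:           # M emits the pair (x_i, y_j)
                e = math.log(P_MATCH if x[i-1] == y[j-1] else P_MISMATCH)
                V["M"][i][j] = e + max(
                    V["M"][i-1][j-1] + math.log(1 - 2 * DELTA),
                    V["X"][i-1][j-1] + math.log(1 - EPSILON),
                    V["Y"][i-1][j-1] + math.log(1 - EPSILON))
            if i > 0:                     # X emits x_i against a gap
                V["X"][i][j] = math.log(Q) + max(
                    V["M"][i-1][j] + math.log(DELTA),
                    V["X"][i-1][j] + math.log(EPSILON))
            if j > 0:                     # Y emits y_j against a gap
                V["Y"][i][j] = math.log(Q) + max(
                    V["M"][i][j-1] + math.log(DELTA),
                    V["Y"][i][j-1] + math.log(EPSILON))
    return max(V[s][n][m] for s in "MXY")
```

Replacing `max` with log-sum-exp gives the forward algorithm, which sums over all alignments instead of picking the best one; sampling tracebacks from the forward matrix draws alignments from the posterior.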
Multiple HMMs • Instead of emitting 2 sequences, emit N sequences • 2^N − 1 emit states! • Can develop such a model for any tree • Viterbi and forward algorithms use an N-dimensional dynamic programming matrix • Given a tree relating N sequences, the multiple HMM can be constructed from pair HMMs so that the likelihood is a product of pairwise branch terms
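The 2^N − 1 count arises because each emit state emits residues into some non-empty subset of the N sequences in an alignment column. A small sketch of this enumeration (helper name is illustrative):

```python
from itertools import combinations

# Each emit-state type corresponds to a non-empty subset of the N
# sequences receiving a residue in that column: 2**N - 1 subsets.
def emit_state_types(n):
    return [set(c) for r in range(1, n + 1)
                   for c in combinations(range(n), r)]

assert len(emit_state_types(3)) == 2**3 - 1   # 7 emit-state types for N = 3
```

This exponential state count is why naive multi-sequence DP is impractical and why the paper works with pairwise branch alignments instead.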
Composing multiple alignment from branch alignments • Residues Xi and Yj in a multiple alignment containing sequences X and Y are aligned iff • They are in the same column • That column contains no gaps for intermediate sequences • No deletion followed by re-insertion is allowed • Ignoring all-gap columns, this provides an unambiguous way of composing a multiple alignment from branch alignments and vice versa
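The same-column rule can be sketched for two gapped rows of a multiple alignment (this two-sequence sketch ignores the intermediate-node condition, which matters when composing branch alignments through internal sequences):

```python
# Sketch: read off which residue pairs of X and Y are aligned in a
# gapped alignment (rows padded with '-') by the same-column rule.
def aligned_pairs(row_x, row_y):
    pairs, i, j = [], 0, 0
    for cx, cy in zip(row_x, row_y):
        if cx != '-' and cy != '-':
            pairs.append((i, j))   # residue i of X aligned to residue j of Y
        if cx != '-':
            i += 1
        if cy != '-':
            j += 1
    return pairs
```

For example, `aligned_pairs("AC-GT", "A-CGT")` pairs the first, fourth and fifth columns, i.e. residues (0,0), (2,2) and (3,3); the C in each sequence is gapped against the other, so the two Cs are not aligned.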
Eliminating internal nodes • Internal nodes are missing data • Sum them out of the likelihood function • Summing over indel histories would destroy the independence • Sum over substitution histories using the post-order traversal algorithm of Felsenstein
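Felsenstein's pruning recursion for one alignment column can be sketched as below. The tree encoding and function names are illustrative: a leaf is an observed residue index, an internal node is a tuple of its two children and the substitution matrices on the branches leading to them:

```python
import numpy as np

# Sketch of Felsenstein's post-order (pruning) algorithm for one column
# of a binary tree. Leaf: observed residue index (int).
# Internal node: (left_child, right_child, P_left, P_right).
def pruning(node, n_states=4):
    if isinstance(node, int):                 # leaf: indicator vector
        L = np.zeros(n_states)
        L[node] = 1.0
        return L
    left, right, P_left, P_right = node
    Ll = pruning(left, n_states)
    Lr = pruning(right, n_states)
    # likelihood of the subtree below, given each state at this node
    return (P_left @ Ll) * (P_right @ Lr)

def column_likelihood(root, pi):
    """Sum out all internal states against the root distribution pi."""
    return float(pi @ pruning(root, len(pi)))
```

A two-leaf cherry with a symmetric branch matrix P (0.7 on the diagonal, 0.1 elsewhere) and uniform pi gives `column_likelihood((0, 1, P, P), pi)` = 0.04, matching the direct sum over the four possible root states.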
Algorithm • Progressive alignment – profiles of parents estimated by aligning siblings on a post-order traversal – the impatient strategy • Iterative refinement – revisit branches after an initial alignment phase – the greedy strategy • Sample from a population of alignments, exploring suboptimal alignments in anticipation of long-term improvements
Algorithm • Moves to explore alignment space • These moves need to be ergodic, i.e. allow any alignment to be transformed into any other alignment • These moves need to satisfy detailed balance, i.e. the chain converges to the desired stationary distribution
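Detailed balance is easy to verify on a toy chain, which is a minimal sketch of the property required of the alignment moves (the 2-state chain here is purely illustrative):

```python
import numpy as np

# A 2-state chain satisfying detailed balance: pi_i P_ij = pi_j P_ji.
# Detailed balance implies pi is stationary: pi @ P = pi.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2 / 3, 1 / 3])      # solves pi @ P = pi

assert np.isclose(pi[0] * P[0, 1], pi[1] * P[1, 0])   # detailed balance
assert np.allclose(pi @ P, pi)                         # hence stationarity
```

For the alignment sampler, the Gibbs-style moves satisfy detailed balance by construction because each move resamples a block of variables exactly from its conditional posterior.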
Move 1: Parent Sampling • Goal: align two sibling nodes Y and Z and infer their parent X • Construct the multiple HMM for X, Y and Z • Sample an alignment of Y and Z using the forward algorithm • This induces the branch alignments X-Y and X-Z • Similar to the sibling alignment step of impatient-progressive alignment
Move 2: Branch Sampling • Goal: realign two adjacent nodes X and Y • Construct the pair HMM for X and Y, fixing everything else • Resample the alignment using the forward algorithm • This is similar to the branch alignment step of the greedy-refinement algorithm
Move 3: Node Sampling • Goal: resample the sequence at an internal node X • Construct the multiple HMM for X, its parent W and children Y and Z, fixing everything else • Resample the sequence of X, conditioned on the relative alignment of W, Y and Z • This is similar to inferring parent sequence lengths in the impatient-progressive algorithm
Algorithm • 1. Parent-sample up the guide tree to construct an initial multiple alignment • 2. Visit each branch and node once, applying branch sampling or node sampling respectively • 3. Repeat step 2 to collect more samples
Algorithm • Replacing ‘sampling by the forward algorithm’ with ‘optimizing by the Viterbi algorithm’ gives the ML versions • Impatient-progressive is the ML version of parent sampling • Greedy refinement is the ML version of branch and node sampling
Gibbs sampling in ML context • Periodically save the current alignment, take a greedy approach to record the likelihood of the refined alignment, then return to the saved alignment • Store this refined alignment and compare its likelihood to the others at the end of the run
Ordered over-relaxation • Sampling is a random walk on a Markov chain, so it behaves like Brownian motion, i.e. RMS drift grows as sqrt(n) • Would be better to avoid previously explored spaces, i.e. ‘boldly go where no alignment has gone before’ • Impose a strict weak order on alignments • Sample N alignments at each stage and sort them • If the original sample ends up in position k, choose the (N−k)th sample for the next emission
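The selection step can be sketched as below. The function name and interface are illustrative; for alignments, `key` would be the ordering from the slide (e.g. based on likelihood), and ties would need careful handling, which this sketch ignores by assuming distinct values:

```python
# Sketch of the ordered over-relaxation selection step: draw N candidate
# samples, sort them together with the current state, and move to the
# sample whose rank mirrors the current state's rank.
def overrelaxed_step(current, sample, key, N=10):
    candidates = sorted([current] + [sample() for _ in range(N)], key=key)
    k = candidates.index(current)      # rank of the current state
    return candidates[N - k]           # reflect rank k to rank N - k
```

Reflecting the rank pushes the chain to the far side of the conditional distribution, suppressing the random-walk behaviour while leaving the stationary distribution unchanged.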
Implementation and results
Implementation and results • A: true alignment • B: impatient-progressive • C: greedy-refined • D: Gibbs sampling followed by greedy refinement • E: Gibbs sampling with simulated annealing • F: Gibbs sampling with over-relaxation • G: without Felsenstein wildcards
Discussion • Outlines a very appealing Bayesian framework for multiple alignment • Performs very well, considering the simplicity of the model • Could add profile information and variable-sized indels to the model to improve performance
Questions • What is the assumption that enabled us to use this algorithm, avoiding the N-dimensional DP matrices? • What is the importance of the immortal link in the Links model?
References • Holmes, I. and Bruno, W.J., “Evolutionary HMMs: a Bayesian approach to multiple alignment”, Bioinformatics, 2001.
More results • Poor performance on 4 is probably because Handel produces a global alignment and doesn’t handle affine gaps • Handel doesn’t incorporate any profile information • Handel cannot use BLOSUM (it’s not additive)