
Hidden Markov models in Computational Biology



  1. Hidden Markov models in Computational Biology. DTC Gerton Lunter, WTCHG, February 2011. Includes material from: Dirk Husmeier, Heng Li

  2. Overview
     • First part:
       • Mathematical context: Bayesian networks
       • Markov models
       • Hidden Markov models
     • Second part:
       • Worked example: the occasionally crooked casino
       • Applications in biology
     • Third part:
       • Practical 0: more theory on HMMs
       • Practicals I-V: theory, implementation, biology. Pick & choose.

  3. Part I: HMMs in (mathematical) context

  4. Probabilistic models
     • Mathematical model describing a joint distribution over many variables.
     • Three types of variables are distinguished:
       • Observed variables
       • Latent (hidden) variables
       • Parameters
     • Latent variables are often the quantities of interest, to be inferred from observations using the model. Sometimes they represent "nuisance variables" necessary to correctly describe the relationships in the data.
     Example: P(clouds, sprinkler_used, rain, wet_grass)

  5. Some notation / terminology
     • P(X,Y,Z): probability of (X,Y,Z) occurring simultaneously
     • P(X,Y): probability of (X,Y) occurring in combination with any Z ("marginalized over Z")
     • P(X,Y | Z): probability of (X,Y) occurring, provided that it is known that Z occurs ("conditional on Z", or "given Z")
     P(X,Y) = Σ_Z P(X,Y,Z)
     P(Z) = Σ_{X,Y} P(X,Y,Z)
     P(X,Y | Z) = P(X,Y,Z) / P(Z)
     Σ_{X,Y,Z} P(X,Y,Z) = 1
     P(Y | X) = P(X | Y) P(Y) / P(X)   (Bayes' rule)
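To make the notation concrete, here is a minimal Python sketch (not from the slides): a small joint distribution P(X,Y,Z) over three binary variables, stored as a table, from which a marginal, a conditional, and Bayes' rule are computed numerically. The probability values are arbitrary illustrative assumptions.

```python
from itertools import product

# Illustrative joint distribution P(X, Y, Z) over binary variables;
# the numbers are arbitrary but sum to 1.
vals = [0, 1]
weights = [0.02, 0.08, 0.10, 0.05, 0.20, 0.15, 0.25, 0.15]
P = {xyz: w for xyz, w in zip(product(vals, repeat=3), weights)}

def marginal_xy(x, y):
    """P(X=x, Y=y) = sum_Z P(X=x, Y=y, Z)."""
    return sum(P[(x, y, z)] for z in vals)

def marginal_z(z):
    """P(Z=z) = sum_{X,Y} P(X, Y, Z=z)."""
    return sum(P[(x, y, z)] for x in vals for y in vals)

def conditional_xy_given_z(x, y, z):
    """P(X=x, Y=y | Z=z) = P(X=x, Y=y, Z=z) / P(Z=z)."""
    return P[(x, y, z)] / marginal_z(z)

def p_x(x): return sum(P[(x, y, z)] for y in vals for z in vals)
def p_y(y): return sum(P[(x, y, z)] for x in vals for z in vals)

# Sanity checks: total probability is 1, and Bayes' rule holds.
assert abs(sum(P.values()) - 1.0) < 1e-12
x, y = 1, 0
lhs = marginal_xy(x, y) / p_x(x)                        # P(Y=y | X=x)
rhs = (marginal_xy(x, y) / p_y(y)) * p_y(y) / p_x(x)    # P(X=x | Y=y) P(Y=y) / P(X=x)
assert abs(lhs - rhs) < 1e-12
print("P(Y=%d | X=%d) = %.4f" % (y, x, lhs))
print("P(X=1, Y=0 | Z=1) = %.4f" % conditional_xy_given_z(1, 0, 1))
```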

  6. Independence
     • Two variables X, Y are independent if P(X,Y) = P(X) P(Y).
     Knowing or assuming that two variables are independent reduces the model complexity. Suppose X and Y each take N possible values: specifying P(X,Y) requires N² − 1 numbers, whereas specifying P(X) and P(Y) requires only 2N − 2 numbers.
     • Two variables X, Y are conditionally independent (given Z) if P(X,Y | Z) = P(X | Z) P(Y | Z).
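As a small numerical illustration (not from the slides), the sketch below builds a joint P(X,Y) that factorizes as P(X) P(Y) from two made-up marginals, checks the factorization, and prints the parameter counts quoted above; N = 4 is an arbitrary choice.

```python
import numpy as np

N = 4  # number of values each of X and Y can take (illustrative choice)

# Arbitrary marginals P(X) and P(Y).
rng = np.random.default_rng(0)
p_x = rng.dirichlet(np.ones(N))
p_y = rng.dirichlet(np.ones(N))

# If X and Y are independent, the joint is the outer product P(X) P(Y).
p_xy = np.outer(p_x, p_y)

# Independence check: P(X=x, Y=y) == P(X=x) P(Y=y) for all x, y.
assert np.allclose(p_xy, p_x[:, None] * p_y[None, :])
assert abs(p_xy.sum() - 1.0) < 1e-12

# Free parameters: a general joint needs N^2 - 1 numbers,
# two independent marginals need only 2N - 2.
print("general joint      :", N**2 - 1, "parameters")
print("independent X and Y:", 2 * N - 2, "parameters")
```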

  7. Probabilistic model: example
     • P(Clouds, Sprinkler, Rain, WetGrass) = P(Clouds) × P(Sprinkler | Clouds) × P(Rain | Clouds) × P(WetGrass | Sprinkler, Rain)
     • This specification of the model determines which variables are deemed to be (conditionally) independent (e.g. Sprinkler and Rain given Clouds; WetGrass and Clouds given Sprinkler and Rain). These independence assumptions simplify the model.
     • Using formulas as above to describe the independence relationships is not very intuitive, particularly for large models. Graphical models (in particular, Bayesian networks) are a more intuitive way to do the same thing.

  8. Bayesian network: example
     P(Clouds) × P(Sprinkler | Clouds) × P(Rain | Clouds) × P(WetGrass | Sprinkler, Rain)
     [Graph: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet grass, Rain → Wet grass]
     Rule: two nodes of the graph are conditionally independent given the state of their parents. E.g. Sprinkler and Rain are independent given Cloudy.

  9. Bayesian network: example
     P(Clouds) × P(Sprinkler | Clouds) × P(Rain | Clouds) × P(WetGrass | Sprinkler, Rain)
     [Graph as before: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet grass, Rain → Wet grass]
     Convention: latent variables are drawn open; observed variables are shaded.
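A minimal sketch of this network in Python, assuming made-up conditional probability tables (the slides give no numbers), which computes P(Rain | WetGrass) by summing the factorized joint over the remaining variables:

```python
# Sprinkler network: P(C) P(S|C) P(R|C) P(W|S,R).
# All probability values below are illustrative assumptions, not from the slides.
P_C = {True: 0.5, False: 0.5}                            # P(Cloudy)
P_S_given_C = {True: 0.1, False: 0.5}                    # P(Sprinkler=on | Cloudy)
P_R_given_C = {True: 0.8, False: 0.2}                    # P(Rain | Cloudy)
P_W_given_SR = {(True, True): 0.99, (True, False): 0.9,  # P(WetGrass | Sprinkler, Rain)
                (False, True): 0.9, (False, False): 0.0}

def joint(c, s, r, w):
    """Joint probability from the factorization on the slide."""
    pc = P_C[c]
    ps = P_S_given_C[c] if s else 1 - P_S_given_C[c]
    pr = P_R_given_C[c] if r else 1 - P_R_given_C[c]
    pw = P_W_given_SR[(s, r)] if w else 1 - P_W_given_SR[(s, r)]
    return pc * ps * pr * pw

# Inference by enumeration: P(Rain=true | WetGrass=true),
# marginalizing over the latent variables Cloudy and Sprinkler.
tf = (True, False)
num = sum(joint(c, s, True, True) for c in tf for s in tf)
den = sum(joint(c, s, r, True) for c in tf for s in tf for r in tf)
print("P(Rain | wet grass) = %.3f" % (num / den))
```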

  10. Bayesian network: example Combat Air Identification algorithm; www.wagner.com

  11. Bayesian networks
     • Intuitive formalism to develop models
     • Algorithms to learn parameters from training data (maximum likelihood; EM)
     • General and efficient algorithms to infer latent variables from observations ("message passing algorithm")
     • Allow dealing with missing data in a robust and coherent way (make the relevant node a latent variable)
     • Simulate data

  12. Markov model
     • A particular kind of Bayesian network
     • All variables are observed
     • Suitable for modeling dependencies within sequences
     P(S_n | S_1, S_2, …, S_{n-1}) = P(S_n | S_{n-1})   (Markov property)
     P(S_1, S_2, S_3, …, S_n) = P(S_1) P(S_2 | S_1) … P(S_n | S_{n-1})
     [Graph: a chain S_1 → S_2 → S_3 → … ]
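A minimal sketch of this factorization, assuming an illustrative two-state chain (the states and probabilities below are made up, not from the slides):

```python
import numpy as np

# Illustrative two-state Markov chain.
states = ["a", "b"]
start = np.array([0.6, 0.4])            # P(S_1)
trans = np.array([[0.7, 0.3],           # P(S_n | S_{n-1} = a)
                  [0.2, 0.8]])          # P(S_n | S_{n-1} = b)

def sequence_probability(seq):
    """P(S_1, ..., S_n) = P(S_1) * prod_n P(S_n | S_{n-1})."""
    idx = [states.index(s) for s in seq]
    p = start[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        p *= trans[prev, cur]
    return p

print(sequence_probability(["a", "a", "b", "b"]))  # 0.6 * 0.7 * 0.3 * 0.8
```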

  13. Markov model
     • States: letters in English words
     • Transitions: which letter follows which
     [Graph: a chain S_1 → S_2 → S_3 → … ]
     Training text:
     MR SHERLOCK HOLMES WHO WAS USUALLY VERY LATE IN THE MORNINGS SAVE UPON THOSE NOT INFREQUENT OCCASIONS WHEN HE WAS UP ALL …
     S_1 = M, S_2 = R, S_3 = <space>, S_4 = S, S_5 = H, …
     Parameters (maximum likelihood):
     P(S_n = y | S_{n-1} = x) = P(S_{n-1}S_n = xy) / P(S_{n-1} = x) = (frequency of xy) / (frequency of x)
     Generated text:
     UNOWANGED HE RULID THAND TROPONE AS ORTIUTORVE OD T HASOUT TIVEIS MSHO CE BURKES HEST MASO TELEM TS OME SSTALE MISSTISE S TEWHERO
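A sketch of how such a letter model could be fitted and sampled in Python; `TEXT` is a placeholder snippet standing in for the full training corpus, and the maximum-likelihood estimate is the bigram/unigram frequency ratio given above.

```python
import random
from collections import Counter, defaultdict

# Placeholder corpus; in the lecture this would be the full training text.
TEXT = ("MR SHERLOCK HOLMES WHO WAS USUALLY VERY LATE IN THE MORNINGS "
        "SAVE UPON THOSE NOT INFREQUENT OCCASIONS WHEN HE WAS UP ALL")

# Maximum-likelihood estimate: P(S_n = y | S_{n-1} = x) = count(xy) / count(x).
counts = defaultdict(Counter)
for x, y in zip(TEXT, TEXT[1:]):
    counts[x][y] += 1
trans = {x: {y: c / sum(cnt.values()) for y, c in cnt.items()}
         for x, cnt in counts.items()}

def generate(n, seed="M"):
    """Sample n letters from the fitted first-order letter model."""
    out = [seed]
    while len(out) < n:
        nxt = trans.get(out[-1])
        if not nxt:                       # no observed successor: stop early
            break
        letters, probs = zip(*nxt.items())
        out.append(random.choices(letters, weights=probs)[0])
    return "".join(out)

print(generate(80))
```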

  14. Markov model
     • States: triplets of letters
     • Transitions: which (overlapping) triplet follows which
     Training text:
     MR SHERLOCK HOLMES WHO WAS USUALLY VERY LATE IN THE MORNINGS SAVE UPON THOSE NOT INFREQUENT OCCASIONS WHEN HE WAS UP ALL …
     S_1 = MR<space>, S_2 = R<space>S, S_3 = <space>SH, S_4 = SHE, S_5 = HER, …
     P(S_n = xyz | S_{n-1} = wxy) = P(wxyz) / P(wxy) = (frequency of wxyz) / (frequency of wxy)
     Generated text:
     THERE THE YOU SOME OF FEELING WILL PREOCCUPATIENCE CREASON LITTLEDMASTIFF HENRY MALIGNATIVE LL HAVE MAY UPON IMPRESENT WARNESTLY
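The previous sketch extends to the triplet model by conditioning the next letter on the preceding three letters, which is equivalent to a first-order chain over overlapping triplets; again only a sketch on a placeholder snippet.

```python
import random
from collections import Counter, defaultdict

# Same placeholder corpus as in the previous sketch.
TEXT = ("MR SHERLOCK HOLMES WHO WAS USUALLY VERY LATE IN THE MORNINGS "
        "SAVE UPON THOSE NOT INFREQUENT OCCASIONS WHEN HE WAS UP ALL")
K = 3  # overlapping triplet states = condition the next letter on the previous 3

# P(S_n = xyz | S_{n-1} = wxy) = count(wxyz) / count(wxy)
counts = defaultdict(Counter)
for i in range(len(TEXT) - K):
    counts[TEXT[i:i + K]][TEXT[i + K]] += 1

def generate(n, seed=TEXT[:K]):
    """Sample text from the triplet (order-3) letter model."""
    out = list(seed)
    while len(out) < n:
        cnt = counts.get("".join(out[-K:]))
        if not cnt:                       # unseen context: stop early
            break
        letters, weights = zip(*cnt.items())
        out.append(random.choices(letters, weights=weights)[0])
    return "".join(out)

print(generate(80))
```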

  15. Markov model • States: word pairs • Text from: http://www.gutenberg.org/etext/1105 When thou thy sins enclose! That tongue that tells the story of thy love Ay fill it full with feasting on your sight Book both my wilfulness and errors down And on just proof surmise accumulate Bring me within the level of your eyes And in mine own when I of you beauteous and lovely youth When that churl death my bones with dust shall cover And shalt by fortune once more re-survey These poor rude lines of life thou art forced to break a twofold truth Hers by thy deeds Then churls their thoughts (although their eyes were kind) To thy fair appearance lies To side this title is impanelled A quest of thoughts all tenants to the sober west As those gold candles fixed in heaven's air Let them say more that like of hearsay well I will drink Potions of eisel 'gainst my strong infection No bitterness that I was false of heart Though absence seemed my flame to qualify As easy might I not free

  16. Hidden Markov model
     • HMM = probabilistic observation of a Markov chain
     • Another special kind of Bayesian network
     • The S_i form a Markov chain as before, but the states are unobserved
     • Instead, the y_i (each dependent on S_i) are observed
     • Generative viewpoint: state S_i "emits" symbol y_i
     • The y_i do not form a Markov chain (= do not satisfy the Markov property); they exhibit more complex (long-range) dependencies
     [Graph: a chain S_1 → S_2 → S_3 → …, with each S_i emitting an observed symbol y_i]

  17. Hidden Markov model
     • The representation above emphasizes the relation to Bayesian networks.
     • A different graph representation emphasizes the "transition probabilities" P(S_i | S_{i-1}), e.g. in the case S_i ∈ {A, B, C, D}: one node per state A, B, C, D, with arrows for the allowed transitions.
     Notes:
     • "Emission probabilities" P(y_i | S_i) are not explicitly represented
     • The advance from i to i+1 is also implicit
     • Not all arrows need to be present (probability = 0)
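As a concrete toy instance (all probabilities below are made-up assumptions, not from the slides): a transition matrix over the states {A, B, C, D}, including zero entries for missing arrows, an emission distribution over two symbols, and a generative sampler.

```python
import numpy as np

rng = np.random.default_rng(1)

states = ["A", "B", "C", "D"]
symbols = ["x", "y"]

# Illustrative parameters.
start = np.array([0.25, 0.25, 0.25, 0.25])      # P(S_1)
trans = np.array([[0.90, 0.10, 0.00, 0.00],     # P(S_i | S_{i-1}); zero entries
                  [0.00, 0.80, 0.20, 0.00],     # correspond to missing arrows
                  [0.00, 0.00, 0.70, 0.30],
                  [0.10, 0.00, 0.00, 0.90]])
emit = np.array([[0.9, 0.1],                    # P(y_i | S_i = A)
                 [0.6, 0.4],                    # P(y_i | S_i = B)
                 [0.3, 0.7],                    # P(y_i | S_i = C)
                 [0.1, 0.9]])                   # P(y_i | S_i = D)

def sample(n):
    """Generate a hidden state path and the observed symbol sequence."""
    s = rng.choice(4, p=start)
    path, obs = [], []
    for _ in range(n):
        path.append(states[s])
        obs.append(symbols[rng.choice(2, p=emit[s])])
        s = rng.choice(4, p=trans[s])
    return path, obs

path, obs = sample(20)
print("hidden:  ", "".join(path))
print("observed:", "".join(obs))
```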

  18. Pair Hidden Markov model
     [Figure: a 2D grid of states S_ij, with columns indexed by positions y_1 … y_5 in sequence y and rows indexed by positions z_1, z_2, z_3 in sequence z]

  19. Pair Hidden Markov model
     • States may "emit" a symbol in sequence y, or in z, or both, or neither ("silent" state).
     • If a symbol is emitted, the associated coordinate subscript increases by one. E.g. diagonal transitions in the 2D grid are associated with simultaneous emissions in both sequences.
     • A realization of the pair HMM consists of a state sequence, with each symbol emitted by exactly one state, and the associated path through the 2D table.
     (A slightly more general viewpoint decouples the states and the path; then the hidden variables are the sequence of states S and a path through the table. In this viewpoint the transitions, not the states, emit symbols. The technical term from finite state machine theory is a Mealy machine; the standard viewpoint is also known as a Moore machine.)
     Normalization: Σ_{paths p} Σ_{s_{p(1)} … s_{p(N)}} Σ_{y_1 … y_A} Σ_{z_1 … z_B} P(s_{p(1)}, …, s_{p(N)}, y_1 … y_A, z_1 … z_B) = 1, where N = N(p) is the length of the path.

  20. Inference in HMMs
     So HMMs can describe complex (temporal, spatial) relationships in data. But how can we use the model? A number of efficient inference algorithms exist for HMMs:
     • Viterbi algorithm: most likely state sequence, given the observables (see the sketch below)
     • Forward algorithm: likelihood of the model given the observables
     • Backward algorithm: together with Forward, allows computation of posterior probabilities
     • Baum-Welch algorithm: parameter estimation given the observables
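A minimal log-space Viterbi sketch in Python (an illustration, not the lecture's own code); `start`, `trans` and `emit` are the initial, transition and emission probability arrays in the same layout as the previous sketch, and the tiny two-state example at the bottom uses made-up numbers.

```python
import numpy as np

def viterbi(obs, start, trans, emit):
    """Most likely hidden state path, given observed symbol indices.

    obs   : list of observation indices
    start : (K,)   initial state probabilities P(S_1)
    trans : (K, K) transition probabilities  P(S_i | S_{i-1})
    emit  : (K, M) emission probabilities    P(y_i | S_i)
    """
    with np.errstate(divide="ignore"):          # log(0) = -inf is fine here
        log_start, log_trans, log_emit = map(np.log, (start, trans, emit))
    n, K = len(obs), len(start)
    v = np.full((n, K), -np.inf)                # v[i, k] = best log-prob of a path ending in state k
    ptr = np.zeros((n, K), dtype=int)           # back-pointers
    v[0] = log_start + log_emit[:, obs[0]]
    for i in range(1, n):
        scores = v[i - 1][:, None] + log_trans  # scores[j, k]: come from j, go to k
        ptr[i] = scores.argmax(axis=0)
        v[i] = scores.max(axis=0) + log_emit[:, obs[i]]
    # Trace back the best path from the best final state.
    path = [int(v[-1].argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(ptr[i, path[-1]]))
    return path[::-1], float(v[-1].max())

# Tiny usage example with two states and two symbols (illustrative numbers).
start = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
emit  = np.array([[0.8, 0.2], [0.3, 0.7]])
path, logp = viterbi([0, 0, 1, 1, 1], start, trans, emit)
print(path, logp)
```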

  21. Summary of part I
     • Probabilistic models
       • Observed variables
       • Latent variables: of interest for inference, or nuisance variables
       • Parameters: obtained from training data, or prior knowledge
     • Bayesian networks
       • Independence structure of the model represented as a graph
     • Markov models
       • Linear Bayesian network; all nodes observed
     • Hidden Markov models
       • Observed layer, and hidden (latent) layer of nodes
       • Efficient inference algorithm (Viterbi algorithm)
     • Pair Hidden Markov models
       • Two observed sequences with interdependencies, determined by an unobserved Markov sequence

  22. Part II: Examples of HMMs

  23. Example: The Occasionally Corrupt Casino
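This is the classic dishonest-casino example: a fair die and a loaded die that favours six, with occasional switching between them. The sketch below generates rolls from such an HMM; the switching and emission probabilities are illustrative assumptions (the lecture's actual numbers are not in this transcript), and the Viterbi sketch above could be used to recover the hidden fair/loaded path from the rolls.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two hidden states: 0 = fair die, 1 = loaded die (illustrative parameters).
start = np.array([0.5, 0.5])
trans = np.array([[0.95, 0.05],                     # fair   -> fair / loaded
                  [0.10, 0.90]])                    # loaded -> fair / loaded
emit = np.array([[1/6] * 6,                         # fair: uniform over 1..6
                 [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]])   # loaded: biased towards 6

def roll_sequence(n):
    """Sample a hidden fair/loaded path and the observed die rolls."""
    s = rng.choice(2, p=start)
    states, rolls = [], []
    for _ in range(n):
        states.append("FL"[s])
        rolls.append(str(rng.choice(6, p=emit[s]) + 1))
        s = rng.choice(2, p=trans[s])
    return "".join(states), "".join(rolls)

hidden, observed = roll_sequence(50)
print("rolls :", observed)
print("hidden:", hidden)
```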

  24. Application: Sequence alignment

  25. Application: Profile HMMs
