Lecture 5

Hidden Markov Models

Jones & Pevzner, Chapter 11 (handout)


What are HMMs?

  • An important machine learning method. Widely used in sequence analysis, voice recognition, pattern analysis.

  • A state machine.

    • An HMM has a finite number of states.

    • Each state can emit a character, and make a transition to another state.

    • An HMM is a probabilistic state machine; character emissions and state transitions are not deterministic, but are random events that obey fixed probability distributions.

Focus problem - CG Islands

  • CG is the least frequently observed dinucleotide.

    • The C in this pair is easily methylated, and subsequently replaced with a T.

    • Methylation is an important mechanism for silencing genes, and actively expressed genes lie in regions of the genome that are maintained in an unmethylated state. As a consequence, CG dinucleotides are observed much more frequently in regions of the genome that include active genes.

    • Location of a putative gene in a “CG island” provides important supporting evidence that it is in fact an expressed gene.

    • Identifying CG islands computationally requires that we be able to observe a subtle shift in dinucleotide frequencies.

Related problem - the “Fair Bet Casino”

  • In the Fair Bet Casino you can wager on the outcome of a coin-flipping game.

  • The trouble is, the dealer uses two different coins: one is fair and the other is weighted. He changes coins randomly, but infrequently.

  • Given a series of tosses, can we estimate when the fair coin was being used and when the biased one was in play?

The probabilities…

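For the outcome of a single flip (0 = tails, 1 = heads):

The fair coin: $P(0 \mid F) = P(1 \mid F) = 1/2$

The biased coin: $P(0 \mid B) = 1/4$, $P(1 \mid B) = 3/4$

For a series of throws $x = x_1 x_2 \ldots x_n$ in which there are $k$ heads:

$P(x \mid F) = (1/2)^n$

$P(x \mid B) = (1/4)^{n-k} (3/4)^k = 3^k / 4^n$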
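The two likelihoods for a series of tosses can be sketched in a few lines of Python. This is only an illustration: the function names and the example data are hypothetical, and the biased coin is assumed to show heads with probability 3/4, matching the emission probabilities of the casino model.

```python
def prob_fair(tosses):
    """P(x | fair coin): every toss has probability 1/2."""
    return 0.5 ** len(tosses)


def prob_biased(tosses, p_heads=0.75):
    """P(x | biased coin) for k heads: p_heads^k * (1 - p_heads)^(n - k)."""
    k = sum(tosses)  # number of heads (1s)
    return p_heads ** k * (1.0 - p_heads) ** (len(tosses) - k)


tosses = [1, 1, 0, 1]  # hypothetical example data: 3 heads in 4 tosses
print(prob_fair(tosses), prob_biased(tosses))
```

For this short run of mostly heads, the biased coin gives the larger likelihood.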

An elementary approach:

  • Slide a window n characters long along the sequence of coin flips, and at each position compute a measure of the likelihood that the fair coin was used as opposed to the biased one; we often use the log-odds ratio: $\log_2 \frac{P(x \mid F)}{P(x \mid B)} = n - k \log_2 3$, where $k$ is the number of heads in the window.
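A minimal sketch of the sliding-window score follows. It assumes the biased coin shows heads with probability 3/4, so that the log-odds ratio for a window with k heads reduces to n - k log2(3); the function name `window_log_odds` is hypothetical.

```python
import math

LOG2_3 = math.log2(3)


def window_log_odds(tosses, n):
    """Return log2(P(window | fair) / P(window | biased)) for every window of length n."""
    scores = []
    for start in range(len(tosses) - n + 1):
        k = sum(tosses[start:start + n])  # heads in this window
        scores.append(n - k * LOG2_3)
    return scores


# Positive scores favor the fair coin, negative scores the biased one.
print(window_log_odds([0, 0, 1, 1, 1, 1, 0, 1], 4))
```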

The elementary approach - evaluation

  • If the ratio of probabilities of fair to biased is unity, then $n - k \log_2 3 = 0$, or $k = n / \log_2 3$. With this many heads in the sequence, the fair and biased coins are equally likely.

  • If the ratio is greater than unity, then we are more likely to have a fair coin; in this case $n - k \log_2 3 > 0$, or $k < n / \log_2 3$.

  • If the ratio is less than unity, then it is more likely we have a biased coin; in this case $n - k \log_2 3 < 0$, or $k > n / \log_2 3$.
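As a worked instance of the threshold, take a hypothetical window of n = 10 tosses (the window length is an assumption chosen only for illustration):

```latex
k^{*} = \frac{n}{\log_2 3} = \frac{10}{1.585} \approx 6.31
```

With 6 or fewer heads the log-odds ratio is positive and the fair coin is favored (for k = 6, 10 - 6(1.585) ≈ 0.49); with 7 or more heads it is negative and the biased coin is favored (for k = 7, 10 - 7(1.585) ≈ -1.09).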

How useful is the elementary approach?

  • Our underlying assumption is that only one variety of coin is used to generate a sequence; we cannot consider the possibility that the coins are swapped in the middle of generating the sequence.

  • If we slide a window across the sequence, we need to “capture” a subsequence generated by either the fair or biased coins.

  • Do we know ahead of time how big the islands will be? In fact they will have a distribution of sizes, which cannot be captured using a window of fixed width.

The Hidden Markov (state machine) Approach

Components of the Model

  • S, an alphabet of symbols to be emitted.

  • Q, a set of states, each of which can emit symbols from the alphabet S.

  • A, a $|Q| \times |Q|$ matrix; $A_{ij}$ is the probability that the machine will switch to state j from state i. These are the transition probabilities.

  • E = ($e_k(b)$), a $|Q| \times |S|$ matrix of probabilities. $e_k(b)$ is the probability of emitting character b while in state k. These are the emission probabilities.

The Components of the Fair Bet Casino Model

  • S = {0,1} corresponding to tails (0) or heads (1).

  • Q = {F, B}, corresponding to fair or biased coin.

  • $A_{FF} = A_{BB} = 0.9$, $A_{FB} = A_{BF} = 0.1$

  • $e_F(0) = e_F(1) = 0.5$, $e_B(0) = 0.25$, $e_B(1) = 0.75$
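To make the model concrete, here is a sketch that stores the four components as Python dictionaries and uses them to generate a sequence of tosses together with the hidden path. The names (`STATES`, `TRANSITION`, `EMISSION`, `sample`) are hypothetical, and choosing the first state with probability 1/2 each is an assumption; the probabilities are the ones listed above.

```python
import random

STATES = ("F", "B")
TRANSITION = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
EMISSION = {"F": {0: 0.5, 1: 0.5}, "B": {0: 0.25, 1: 0.75}}


def sample(n, rng):
    """Generate n tosses and the hidden state path that produced them."""
    state = rng.choice(STATES)  # assumption: either coin is equally likely at the start
    path, tosses = [], []
    for _ in range(n):
        path.append(state)
        # Emit a symbol according to the current state's emission distribution.
        symbols = list(EMISSION[state])
        weights = [EMISSION[state][s] for s in symbols]
        tosses.append(rng.choices(symbols, weights=weights)[0])
        # Move to the next state according to the transition distribution.
        targets = list(TRANSITION[state])
        weights = [TRANSITION[state][t] for t in targets]
        state = rng.choices(targets, weights=weights)[0]
    return tosses, "".join(path)


tosses, path = sample(20, random.Random(0))
print(tosses)
print(path)
```

Only `tosses` would be visible to a player in the casino; `path` is the hidden part of the model.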

Paths in HMMs

A path in an HMM is a sequence of states (not emitted characters). It is symbolized $\pi = \pi_1 \pi_2 \ldots \pi_n$.

We can match up an observed sequence of characters with a hypothetical series of states:

The Probability of a Path

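We need to factor in the probabilities of transitions between states as well as the emission probabilities:

$P(x \mid \pi) = A_{\pi_0 \pi_1} \prod_{i=1}^{n} e_{\pi_i}(x_i) \, A_{\pi_i \pi_{i+1}}$

where $\pi = \pi_1 \pi_2 \ldots \pi_n$ is the path, and $\pi_0$ and $\pi_{n+1}$ are notional begin and end states.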

Notational Extensions…

  • In the preceding equation, there is an initial transition probability that gets us to the first state. It is introduced to get us into the sequence; we can set it to 1/2 to represent that the preceding state (fair or biased) is unknown.

  • We also introduce a final probability for exiting the sequence. I have set this to unity (10/10).
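The path probability, including the initial and final factors just described, can be sketched as follows. The function name `path_probability` and its parameter names are hypothetical; the defaults follow the conventions above (initial probability 1/2, final probability 1), and the transition and emission values are those of the Fair Bet Casino model.

```python
TRANSITION = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
EMISSION = {"F": {0: 0.5, 1: 0.5}, "B": {0: 0.25, 1: 0.75}}


def path_probability(tosses, path, initial=0.5, final=1.0):
    """Probability of emitting `tosses` while following the state string `path`."""
    if len(tosses) != len(path):
        raise ValueError("tosses and path must have the same length")
    prob = initial  # transition into the first state
    for i, (x, state) in enumerate(zip(tosses, path)):
        prob *= EMISSION[state][x]  # emit character x in this state
        if i + 1 < len(path):
            prob *= TRANSITION[state][path[i + 1]]  # move to the next state
    return prob * final  # transition out of the last state


# 1/2 * e_F(0) * A_FB * e_B(1) * 1 = 0.5 * 0.5 * 0.1 * 0.75
print(path_probability([0, 1], "FB"))
```

Comparing two candidate paths for the same tosses then amounts to calling the function twice and keeping the larger value.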

Examples

  • For the given sequence, we compute the probability of the path FFFBBBBBFFF to be $2.66 \times 10^{-6}$. Is this an optimal choice for the path?

  • No: the path FFFBBBFFFFF has a higher probability, $3.54 \times 10^{-6}$.

The HMM Decoding Problem:

  • Given a sequence of observed characters $x = x_1 x_2 \ldots x_n$ generated by the HMM M = (S, Q, A, E), find the path that maximizes P(x|path).

Solving the Decoding Problem

  • Solution due to Viterbi (1967): Use dynamic programming to find the optimal series of states.

  • Have a matrix of n columns and |Q| rows. Being in the kth row of the ith column associates state k with character i. (The columns are thus labeled by the characters $x_i$ of the sequence.)

  • Each state in column i is connected by an edge to every state in column i+1; the edge from (k, i) to (l, i+1) has weight $e_l(x_{i+1}) A_{kl}$.

  • A path through the matrix corresponds to a path in the HMM; multiplying the weights of all the edges reproduces the probability of the HMM path.

  • The optimal path through the matrix is the optimal path in the HMM.
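The dynamic program described above can be sketched as follows. This is an illustrative implementation rather than the lecture's own code: it works with sums of log-probabilities instead of products to avoid numerical underflow on long sequences, assumes an initial probability of 1/2 for either state, and uses the Fair Bet Casino parameters. The name `viterbi` and the dictionary layout are hypothetical.

```python
import math

STATES = ("F", "B")
TRANSITION = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
EMISSION = {"F": {0: 0.5, 1: 0.5}, "B": {0: 0.25, 1: 0.75}}


def viterbi(tosses, initial=0.5):
    """Return the most probable state path for the observed tosses."""
    if not tosses:
        return ""
    # score[k]: log-probability of the best path that ends in state k at the current column.
    score = {k: math.log(initial) + math.log(EMISSION[k][tosses[0]]) for k in STATES}
    back = []  # back[i][l]: best predecessor of state l in column i + 1
    for x in tosses[1:]:
        pointers, new_score = {}, {}
        for l in STATES:
            # Pick the predecessor k that maximizes score[k] + log A_kl.
            best = max(STATES, key=lambda k: score[k] + math.log(TRANSITION[k][l]))
            pointers[l] = best
            new_score[l] = (score[best] + math.log(TRANSITION[best][l])
                            + math.log(EMISSION[l][x]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi([0] * 10 + [1] * 10))
```

The running time is proportional to n times |Q| squared, since each of the n columns examines every edge between adjacent columns.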

Viterbi Decoding:

The path shown corresponds to FBFB

Parameter Estimation

  • The Viterbi algorithm assumes that we already know the emission and transition probabilities, and given these we want to know the most probable series of states for an observed sequence of characters.

  • The usual case is that we don’t know the parameters (probabilities), and need to estimate them from data.

  • General approach: we are given a set of training strings, and our goal is to find the parameter set that maximizes the probability of generating the strings. This is a difficult problem; we need an initial set of parameters, which are then adjusted by an optimization algorithm. The Baum-Welch algorithm is a popular iterative method.

An easier situation…

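  • Sometimes we are blessed in knowing not just the sequence of characters in a set of training strings, but also the state sequences! In this lucky situation we can directly estimate the probabilities by accumulating simple statistics. Let $A_{kl}$ be the number of times we observe transitions from state k to l, and let $E_k(b)$ be the number of times we observe character b emitted by state k; then our estimated parameters are $a_{kl} = A_{kl} / \sum_{q} A_{kq}$ and $e_k(b) = E_k(b) / \sum_{\sigma} E_k(\sigma)$, where q ranges over the states in Q and $\sigma$ over the symbols in S. (The estimated transition probability is written in lowercase here to distinguish it from the count $A_{kl}$.)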
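A minimal sketch of this counting estimate for a single labeled training string follows. The function name `estimate_parameters` and the example data are hypothetical, and no pseudocounts are added, so any transition or emission that never occurs in the training data receives probability zero.

```python
from collections import Counter


def estimate_parameters(tosses, path):
    """Estimate transition and emission probabilities from one labeled training string."""
    transition_counts = Counter(zip(path, path[1:]))  # counts of k -> l transitions
    emission_counts = Counter(zip(path, tosses))      # counts of character b emitted in state k
    states = sorted(set(path))
    symbols = sorted(set(tosses))
    transition, emission = {}, {}
    for k in states:
        row_total = sum(transition_counts[(k, l)] for l in states)
        if row_total:  # a state seen only at the last position has no outgoing counts
            transition[k] = {l: transition_counts[(k, l)] / row_total for l in states}
        emit_total = sum(emission_counts[(k, b)] for b in symbols)
        emission[k] = {b: emission_counts[(k, b)] / emit_total for b in symbols}
    return transition, emission


# Hypothetical labeled data: six tosses and the coin used for each one.
transition, emission = estimate_parameters([0, 1, 1, 1, 0, 1], "FFBBBF")
print(transition)
print(emission)
```

With several training strings, the same counts would be accumulated over all strings before normalizing.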