


Hidden Markov Modeling, Multiple Alignments and Structure

Bioinformatic Modeling Techniques

Student: Patricia Pearl

The basic notion of a hidden Markov model was covered during the class lectures and in our midterm.

There are more issues about its



and future

that we’ll discuss tonight.

There was a time when scientists started to think about using hidden Markov models for multiple protein alignments.

When was that?

Which professional field was using it already?

This is the bibliographic reference for the article that protein scientists used when they got started:

Rabiner, L. R. “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77 (2), 257-286. 1989.

This work was sophisticated, and a group of scientists at the University of California at Santa Cruz was able to draw an analogy between computer speech recognition and multiple protein alignments.

How did they make the analogy between speech recognition and multiple protein and DNA alignments?

                                           Speech Recognition             Multiple Alignments

Alphabet                                   phonemes                       amino acids

Observation                                words or strings of phonemes   primary sequence

A good model assigns high probability to   sounds that are real words     sequences in the set
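The last row of the comparison, a good model assigning high probability to the sequences it should recognize, can be sketched with the forward algorithm from Rabiner's tutorial. This is only a minimal illustration: the two states, the two-letter alphabet (L, K) and every number below are invented for the example and do not come from any real protein family or from SAM or HMMER.

```python
from itertools import product

# Hypothetical two-state model over a two-letter alphabet (L, K).
# All probabilities are made up for illustration only.
START = {"H": 0.5, "P": 0.5}
TRANS = {"H": {"H": 0.8, "P": 0.2}, "P": {"H": 0.2, "P": 0.8}}
EMIT = {"H": {"L": 0.9, "K": 0.1}, "P": {"L": 0.1, "K": 0.9}}


def forward_probability(seq, start=START, trans=TRANS, emit=EMIT):
    """Total probability of a sequence under the HMM, summed over all state paths."""
    if not seq:
        return 1.0
    states = list(start)
    # alpha[s] = P(symbols seen so far, current hidden state is s)
    alpha = {s: start[s] * emit[s].get(seq[0], 0.0) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: emit[s].get(symbol, 0.0) * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# A sequence that fits the model's "runs of similar letters" scores higher
# than one that alternates on every position.
print(forward_probability("LLLL"))
print(forward_probability("LKLK"))

# Sanity check: the probabilities of all length-3 sequences add up to 1.
print(sum(forward_probability("".join(s)) for s in product("LK", repeat=3)))
```

In a profile HMM for a protein family the states correspond to alignment columns rather than to two generic classes, but the scoring idea is the same.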

The paper they published is:

Krogh, A., Brown, M., Mian, I.S., Sjölander, K., and Haussler, D. “Hidden Markov Models in Computational Biology: Applications to Protein Modeling.” Journal of Molecular Biology, 1994, 235:1501-1531.

Sean Eddy was a student at UCSC then. In a 1996 article of his, he describes the paper referenced above as:

“The paper that introduced the use of HMM methods for protein and DNA sequence profiles.”

The software was then developed separately by two groups of scientists and graduate students: one at the University of California at Santa Cruz, and one at Washington University in St. Louis, Missouri, led by UCSC’s former student Sean Eddy and his research group. Many other researchers in the field work outside these two labs.

Two suites of software have been developed. Their differences are non-trivial.

SAM (Sequence Alignment and Modeling) at UCSC

HMMER at Washington University in St. Louis

Both suites can be downloaded. SAM requires UNIX. HMMER runs on many systems.

As has been emphasized in lecture, the advantage of the HMM approach is that it does not guess about gap penalties, nor about amino acids, nor about states. It bases those values on actual data: they are Bayesian probabilities estimated from the observed sequences.
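A rough sketch of what "basing the values on actual data" means: given an existing alignment, the residue frequencies in each column and the fraction of gaps in each column can simply be counted. The toy alignment and the function name `column_statistics` are hypothetical, and real packages do considerably more than this (iterative training and priors, for example).

```python
from collections import Counter


def column_statistics(alignment, gap="-"):
    """Return (residue_frequencies, gap_fraction) for every alignment column."""
    stats = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        counts = Counter(residues)
        total = len(residues)
        # Observed frequency of each residue among the non-gap characters.
        freqs = {r: n / total for r, n in counts.items()} if total else {}
        # Fraction of sequences with a gap here: a data-driven stand-in
        # for a position-specific gap penalty.
        gap_fraction = (len(column) - total) / len(column)
        stats.append((freqs, gap_fraction))
    return stats


# Hypothetical four-sequence alignment, invented for illustration.
toy_alignment = ["ACGA", "A-GA", "ACGC", "A-GA"]
for i, (freqs, gap_fraction) in enumerate(column_statistics(toy_alignment), start=1):
    print(i, freqs, gap_fraction)
```

Column 2 of the toy alignment has gaps in half of the sequences, so a model built from it would treat a gap there as unsurprising, while a gap in column 1 would be heavily penalized.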


SAM: Sequence Alignment and Modeling System

The software is based on HMMs. The group also uses a mathematical approach called Dirichlet mixtures to improve detection of weak homologies and to derive hidden Markov models for protein families.
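To give a feel for how a Dirichlet mixture helps with weak homologies, here is a minimal sketch of the posterior mean estimate of column probabilities from observed counts. The mixture below has two invented components over a four-letter alphabet; real mixtures for proteins cover 20 amino acids and are estimated from large sets of alignments. The function names are hypothetical and not part of SAM.

```python
import math


def log_beta(alpha):
    """Log of the multivariate Beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_mixture_estimate(counts, components):
    """Posterior mean probabilities given counts and a list of (weight, alpha) components."""
    # Posterior weight of each component: prior weight times how well
    # that component explains the observed counts.
    log_post = [
        math.log(q) + log_beta([a + n for a, n in zip(alpha, counts)]) - log_beta(alpha)
        for q, alpha in components
    ]
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    norm = sum(post)
    post = [p / norm for p in post]

    total = sum(counts)
    estimate = [0.0] * len(counts)
    for weight, (_, alpha) in zip(post, components):
        denom = total + sum(alpha)
        for i, (n, a) in enumerate(zip(counts, alpha)):
            # Each component contributes a pseudocount-smoothed frequency.
            estimate[i] += weight * (n + a) / denom
    return estimate


# Hypothetical mixture: one component favors letter 0, the other favors the rest.
toy_mixture = [(0.5, [5.0, 1.0, 1.0, 1.0]), (0.5, [1.0, 5.0, 5.0, 5.0])]
print(dirichlet_mixture_estimate([3, 0, 0, 0], toy_mixture))
```

Even with only three observations, letters that were never seen keep a non-zero probability, which is what lets a model built from few sequences still match distant relatives.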

HMMER at Washington University in St. Louis

Sean Eddy’s Lab Home Page


This page and related pages have many articles that are available to download.

URL for User’s Guide


If HMMER were installed for us at Brandeis, we could all use it with the help of this manual.


One of the approaches that Sean Eddy has taken to improve HMMER comes from computational physical chemistry and X-ray diffraction protein crystallography: simulated annealing. The probability values of the fundamental recursive HMM algorithm are varied by an exponential factor taken from the Boltzmann formula for physical entropy.

S = k_B ln Ω

The Boltzmann constant, k_B, is multiplied by the temperature T. T starts at a high value and is gradually decreased. The product kT is used in an exponent: each probability P is raised to the power 1/(kT). At high T this flattens the differences between probabilities, and as T falls the most probable choices dominate. Eddy reports that this improves accuracy (Eddy, S., 1995).
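The effect of the exponent 1/(kT) can be seen on any small probability distribution. The sketch below is not Eddy's implementation: it applies the exponent to a generic list of probabilities, and the function names, the example distribution and the cooling schedule are all assumptions made for illustration.

```python
import random


def tempered(probs, kT):
    """Raise each probability to the power 1/kT and renormalize."""
    if kT <= 0:
        raise ValueError("kT must be positive")
    weights = [p ** (1.0 / kT) for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def anneal_choices(probs, schedule, rng=None):
    """Sample one choice per temperature step, from hot (random) to cold (greedy)."""
    rng = rng or random.Random(0)
    picks = []
    for kT in schedule:
        dist = tempered(probs, kT)
        picks.append(rng.choices(range(len(probs)), weights=dist, k=1)[0])
    return picks


# Hypothetical probabilities for three competing alignment choices.
p = [0.6, 0.3, 0.1]
print(tempered(p, 5.0))   # hot: close to uniform
print(tempered(p, 1.0))   # kT = 1: the original distribution
print(tempered(p, 0.2))   # cold: nearly all weight on the best choice
print(anneal_choices(p, [5.0, 2.0, 1.0, 0.5, 0.2]))
```

Starting hot lets the search wander away from a poor early alignment, and cooling then lets it settle on a high-probability one.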

Many people are developing the HMM approach for use on RNA sequences. It is worth briefly describing a recent paper that makes extensive use of RNA alignments done primarily by hand, using both primary sequence and secondary RNA structure. It produces evidence toward resolving a problem in systematic or evolutionary biology.

With HMMER, or any similar software, for RNA alignments, much of this work may become much easier in the future and come with measurable probabilistic statistics.

“However, accurate alignment is only possible for proteins of known structure – at least for an identifiable core of residues that comprises the secondary structure elements and active site of the molecule.”

S. Eddy (1995), quoting Chothia and Lesk (1986)

[Figure: diagram with "Common ancestor" labels]
             10        20        30        40
Seq1 A-CC-----GC--------GA--CUUG--GA-CC-CG--G
Seq2 A-CC-----GU--------GA--CUUG--GA-CC-CG--G

Figure 1. The problem of aligning short and long sequences.

Sequences 1 and 2 are like the reptilian and bird 18S ribosomal RNA.

Sequences 3 and 4 are like those of mammals.

Reference: Xia, X., Xie, Z., Kjer, K.M. “18S ribosomal RNA and tetrapod phylogeny.” Systematic Biology. Washington: Jun 2003. Vol. 52, Iss. 3; pg 283.

Phylogenetic tree

From: Xia et al., 2003

They produced several phylogenetic trees, using different methods, with the careful manual alignments that took secondary structure into account. In all of them, the birds are closer to the crocodiles than to the mammals.

“Our research indicates that the previous discrepancy of phylogenetic results between the 18S rRNA gene and other genes is caused mainly by:

1.) misalignment of sequences

2.) the inappropriate use of the frequency parameters

3.) poor sequence quality.

When the sequences are aligned with the aid of the secondary structure of the 18S rRNA molecule and when the frequency parameters are estimated either from all sites or from the variable domains where substitutions have occurred, the 18S rRNA sequences no longer support the grouping of the avian species with the mammalian species.”

Xia, X., et al., 2003
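The point about frequency parameters can be illustrated with a small sketch that estimates nucleotide frequencies either from every column of an alignment or only from the columns where substitutions have occurred. The three-sequence alignment and the function name `base_frequencies` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def base_frequencies(alignment, variable_only=False, gap="-"):
    """Nucleotide frequencies from all columns, or only from variable columns."""
    columns = list(zip(*alignment))
    if variable_only:
        # A column is variable if it holds more than one distinct non-gap base.
        columns = [c for c in columns if len(set(c) - {gap}) > 1]
    counts = Counter(b for col in columns for b in col if b != gap)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in sorted(counts)}


# Hypothetical toy alignment, invented for illustration.
toy = ["ACGU", "ACGC", "ACAU"]
print(base_frequencies(toy))                      # all sites
print(base_frequencies(toy, variable_only=True))  # variable sites only
```

In this toy case the conserved columns inflate the share of A and C, so the two estimates differ noticeably, which is the kind of discrepancy the authors describe.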

If there were more time, this presentation would also include discussions of PSI-BLAST and of SuperFam.

PSI-BLAST is a BLAST program at NCBI that iteratively builds a position-specific profile from a multiple alignment of its hits, an approach closely related to profile HMMs.

<http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html> a tutorial

<http://www.ncbi.nlm.nih.gov/BLAST/> the site

SuperFam is a relatively new website. It uses the HMM approach, 59 genomes, and all the publicly available solved structures from those genomes.


The head scientist of SuperFam, Prof. Cyrus Chothia, also supervised a web site called SCOP, or Structural Classification of Proteins. You might find it interesting that all of the protein structures that are “solved” are actually organized and classified.



Eddy, S.R. “Multiple alignment using hidden Markov models.” Proc. Int. Conf. Intell. Syst. Mol. Biol. 1995;3:114-120.

Eddy, S.R. “Hidden Markov models.” Curr Opin Struct Biol. 1996 Jun;6(3):361-5. Review.

Eddy, S.R. “Profile hidden Markov models.” Bioinformatics, 1998; 14(9): 755-763. Review.

Gough, J., and Chothia, C. “SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments.” Nucleic Acids Research, 2002, Vol 30:1.

Krogh, A., Brown, M., Mian, I.S., Sjölander, K., and Haussler, D. “Hidden Markov models in computational biology: Applications to protein modeling.” Journal of Molecular Biology, 235:1501-1531, February 1994.

Rabiner, L. R. “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77 (2), 257-286. 1989.

Xia, X., Xie, Z., Kjer, K.M. “18S ribosomal RNA and tetrapod phylogeny.” Systematic Biology. Washington: Jun 2003. Vol. 52, Iss. 3; pg 283.
