
Hidden Markov Modeling, Multiple Alignments

Hidden Markov Modeling, Multiple Alignments and Structure. Bioinformatic Modeling Techniques Student: Patricia Pearl. The basic notion of a hidden Markov model was covered during the class lectures and in our midterm.


Presentation Transcript


  1. Hidden Markov Modeling, Multiple Alignments and Structure. Bioinformatic Modeling Techniques. Student: Patricia Pearl

  2. The basic notion of a hidden Markov model was covered during the class lectures and in our midterm. Tonight we’ll discuss more about its history, development, and future.

  3. There was a time when scientists started to think about using hidden Markov models for multiple protein alignments. When was that? Which professional field was using it already?

  4. This is the bibliographic reference for the article that protein scientists used when they got started: Rabiner, L. R. “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77 (2), 257-286. 1989. From this detailed tutorial, a group of scientists at the University of California at Santa Cruz drew an analogy between computer speech recognition and protein multiple alignments.

  5. How did they make the analogy between speech recognition and multiple protein and DNA alignments?

                                 Speech Recognition             Multiple Alignments
     Alphabet                    phonemes                       amino acids
     Observation                 words or strings of phonemes   primary sequence
     Good: assigns high          sounds that are real words     sequences in the set
     probability to
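The last row of the analogy, a good model assigning high probability to the sequences in the set, can be made concrete with the forward algorithm. What follows is a minimal Python sketch on a made-up two-state model. The state names ("H", "P"), the two-letter alphabet ("L", "K"), and every probability are assumptions chosen for illustration; none of them come from Krogh et al. or from SAM or HMMER.

```python
def forward_probability(obs, start, trans, emit):
    """Return P(obs | model), summed over all hidden state paths."""
    states = list(start)
    # Initialise with the first observed symbol.
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    # Recurse over the remaining symbols.
    for symbol in obs[1:]:
        alpha = {
            t: sum(alpha[s] * trans[s][t] for s in states) * emit[t][symbol]
            for t in states
        }
    return sum(alpha.values())


# Hypothetical toy model: each state prefers one residue and tends to persist.
START = {"H": 0.5, "P": 0.5}
TRANS = {"H": {"H": 0.8, "P": 0.2}, "P": {"H": 0.2, "P": 0.8}}
EMIT = {"H": {"L": 0.9, "K": 0.1}, "P": {"L": 0.1, "K": 0.9}}

if __name__ == "__main__":
    for seq in ("LLLL", "LKLK"):
        print(seq, forward_probability(seq, START, TRANS, EMIT))
```

Under these assumed parameters the model scores a run of one residue higher than a strictly alternating sequence, which is the sense in which a model "fits" a family of sequences.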

  6. The paper they published is: Krogh, A., Brown, M., Mian, I.S., Sjölander, K., and Haussler, D. “Hidden Markov Models in Computational Biology: Applications to Protein Modeling.” Journal of Molecular Biology, 1994, 235:1501-1531. Sean Eddy was a student at UCSC then. In a 1996 article, he describes the paper referenced above as “The paper that introduced the use of HMM methods for protein and DNA sequence profiles.”

  7. The software was then developed separately by two groups of scientists and grad students: one at the University of California at Santa Cruz, and one at Washington University in St. Louis, Missouri, led by UCSC’s former student Sean Eddy and his research group. Many other researchers in the field work outside these two labs. Two suites of software have been developed, and their differences are non-trivial: SAM (Sequence Alignment and Modeling System) at UCSC, and HMMER at Washington University. Both suites can be downloaded. SAM needs UNIX. HMMER runs on many systems.

  8. As has been emphasized in lecture, the advantage of the HMM approach is that it does not guess about gap penalties, about amino acids, or about states. It estimates those values from actual data, as Bayesian probabilities grounded in observed sequences. SAM (Sequence Alignment and Modeling System) at UCSC: <http://www.cse.ucsc.edu/research/compbio/> Their software is based on HMMs. They also use a mathematical approach called Dirichlet mixtures to improve detection of weak homologies and to derive hidden Markov models for protein families.
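As a rough illustration of estimating values from data rather than guessing them, the Python sketch below turns one column of a protein multiple alignment into emission probabilities. It deliberately uses the simplest possible prior, a single uniform pseudocount per amino acid, which is a stand-in and not the Dirichlet mixture method used in SAM. The function name and the example column are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_emissions(column, pseudocount=1.0):
    """Estimate emission probabilities for one alignment column.

    Gap characters are ignored. A uniform pseudocount keeps residues that
    were never observed from getting probability zero.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


if __name__ == "__main__":
    probs = column_emissions("LLLIV-")  # hypothetical column from six sequences
    print(round(probs["L"], 3), round(probs["W"], 3))
```

A Dirichlet mixture replaces the single uniform prior with several priors, each reflecting a typical kind of column, which is why it helps when only a few sequences are available.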

  9. HMMER at Washington University in St. Louis. Sean Eddy’s Lab Home Page: http://www.genetics.wustl.edu/eddy/publications/ This page and related pages have many articles that are available to download. URL for the User’s Guide: http://www.psc.edu/general/software/packages/hmmer/manual/main.html If we had HMMER installed at Brandeis, we could all use it with the help of this manual.

  10. HMMER. One of the approaches that Sean Eddy has taken to improve HMMER is simulated annealing, a technique from computational physical chemistry and x-ray diffraction protein crystallography. The probability values of the fundamental recursive HMM algorithm are varied by an exponential factor taken from the Boltzmann formula for physical entropy, S = k_B ln Ω. The Boltzmann constant, k_B, is multiplied by a temperature T, which starts high and is then decreased. The product kT is used in an exponent: each probability P becomes P^(1/kT). Eddy reports that it improves accuracy. (Eddy, S., 1995)
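The exponent step alone can be sketched in a few lines of Python. This is only an illustration of raising probabilities to the power 1/kT and renormalizing, under an assumed cooling schedule. It is not HMMER's training code, and the function name, the example distribution, and the schedule values are all hypothetical.

```python
import random


def temper(probs, kT):
    """Raise each probability to the power 1/kT and renormalize."""
    powered = [p ** (1.0 / kT) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are repeatable
    choices = [0.7, 0.2, 0.1]       # hypothetical probabilities of three alternatives
    kT = 5.0                        # assumed starting temperature
    while kT >= 0.1:
        tempered = temper(choices, kT)
        pick = rng.choices(range(len(choices)), weights=tempered, k=1)[0]
        print(f"kT={kT:.2f}", [round(p, 3) for p in tempered], "picked", pick)
        kT *= 0.5                   # assumed geometric cooling schedule
```

At high kT the alternatives become nearly equally likely, so the search can escape a poor local optimum. At kT = 1 the probabilities are unchanged, and as kT approaches zero the most probable alternative is chosen almost every time.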

  11. Many people are developing the HMM approach for use on RNA sequences. It is worth briefly describing a recent paper that makes extensive use of RNA alignments done primarily by hand, using both primary sequence and secondary RNA structure. It produces evidence toward resolving a problem in systematic or evolutionary biology. With HMMER, or similar software for RNA alignments, much of this work may become easier in the future and come with measurable probabilistic statistics.

  12. “However, accurate alignment is only possible for proteins of known structure – at least for an identifiable core of residues that comprises the secondary structure elements and active site of the molecule.” S. Eddy (1995), quoting Chothia and Lesk (1986)

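  13. [Figure: two alternative trees, each descending from a common ancestor, relating Mammal, Bird, and Crocodile. One is labeled “Anatomical evidence and more”; the other, introduced by “OR”, is labeled “rRNA multiple alignments w/out secondary structure”.]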

  14.
                  10        20        30        40
          ----|----|----|----|----|----|----|----|
     Seq1 A-CC-----GC--------GA--CUUG--GA-CC-CG--G
     Seq2 A-CC-----GU--------GA--CUUG--GA-CC-CG--G
     Seq3 AACCCCGGUGUAGGGGGAAGAACCUUGAUGAACCUCGAUG
     Seq4 AACCCCGGUGCAGGGGGAAGAACCUUCAUGAACCUCGAUG

     Figure 1. The problem of aligning short and long sequences. Sequences 1 and 2 are like the reptilian and bird 18S ribosomal RNA. Sequences 3 and 4 are like mammals. Reference: Xia, X., Xie, Z., Kjer, K.M. “18S ribosomal RNA and tetrapod phylogeny.” Systematic Biology. Washington: Jun 2003. Vol. 52, Iss. 3; pg 283.
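To put numbers on the figure, the short Python sketch below measures how much of each aligned row is gap and how similar two rows are over the columns where both have a residue. The four rows are copied from Figure 1. The helper names are hypothetical and this is simple counting, not a phylogenetic method.

```python
ALIGNMENT = {
    "Seq1": "A-CC-----GC--------GA--CUUG--GA-CC-CG--G",
    "Seq2": "A-CC-----GU--------GA--CUUG--GA-CC-CG--G",
    "Seq3": "AACCCCGGUGUAGGGGGAAGAACCUUGAUGAACCUCGAUG",
    "Seq4": "AACCCCGGUGCAGGGGGAAGAACCUUCAUGAACCUCGAUG",
}


def gap_fraction(row):
    """Fraction of alignment columns that are gaps in this row."""
    return row.count("-") / len(row)


def identity(row_a, row_b):
    """Return (matches, compared) over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    matches = sum(1 for a, b in pairs if a == b)
    return matches, len(pairs)


if __name__ == "__main__":
    for name, row in ALIGNMENT.items():
        print(name, f"gap fraction = {gap_fraction(row):.2f}")
    print("Seq1 vs Seq3:", identity(ALIGNMENT["Seq1"], ALIGNMENT["Seq3"]))
```

More than half of each short row is gap, so where those gaps are placed largely decides which residues get compared. That is why an alignment guided by secondary structure can lead to a different tree.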

  15. [Figure: phylogenetic tree. From: Xia et al., 2003]

  16. They produced several phylogenetic trees, using different methods, from careful manual alignments that took secondary structure into account. In all of them, the birds are closer to the crocodiles than to the mammals. “Our research indicates that the previous discrepancy of phylogenetic results between the 18S rRNA gene and other genes is caused mainly by: 1.) misalignment of sequences 2.) the inappropriate use of the frequency parameters 3.) poor sequence quality. When the sequences are aligned with the aid of the secondary structure of the 18S rRNA molecule and when the frequency parameters are estimated either from all sites or from the variable domains where substitutions have occurred, the 18S rRNA sequences no longer support the grouping of the avian species with the mammalian species.” Xia, X., et al., 2003

  17. If there were more time, this presentation would also include discussions of PSI-BLAST and of SuperFam. PSI-BLAST is a BLAST program at NCBI that searches iteratively with position-specific profiles built from multiple alignments, an approach closely related to profile HMMs. Tutorial: <http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html> Site: <http://www.ncbi.nlm.nih.gov/BLAST/>

  18. SuperFam is a relatively new website. It uses the HMM approach, 59 genomes, and all the publicly available solved structures from those genomes. <http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/> The head scientist of SuperFam, Prof. Cyrus Chothia, also supervised a web site called SCOP, or Structural Classification of Proteins. You might find it interesting that all of the protein structures that are “solved” are actually organized and classified. <http://scop.mrc-lmb.cam.ac.uk/scop/>

  19. Bibliography
     Eddy, S.R. “Multiple alignment using hidden Markov models.” Proc. Int. Conf. Intell. Syst. Mol. Biol. 1995; 3:114-120.
     Eddy, S.R. “Hidden Markov models.” Curr. Opin. Struct. Biol. 1996 Jun; 6(3):361-5. Review.
     Eddy, S.R. “Profile hidden Markov models.” Bioinformatics, 1998; 14(9):755-763. Review.
     Gough, J., and Chothia, C. “SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments and genome assignments.” Nucleic Acids Research, 2002, Vol. 30:1.
     Krogh, A., Brown, M., Mian, I.S., Sjölander, K., and Haussler, D. “Hidden Markov models in computational biology: Applications to protein modeling.” Journal of Molecular Biology, 235:1501-1531, February 1994.

  20. Bibliography (continued)
     Rabiner, L. R. “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77 (2), 257-286. 1989.
     Xia, X., Xie, Z., Kjer, K.M. “18S ribosomal RNA and tetrapod phylogeny.” Systematic Biology. Washington: Jun 2003. Vol. 52, Iss. 3; pg 283.
