Hidden Markov Models Sasha Tkachev and Ed Anderson

Presentation Transcript


  1. Hidden Markov Models. Sasha Tkachev and Ed Anderson. Presenter: Sasha Tkachev

  2. Forward algorithm
     • We want to find P(sequence | HMM)
     • Naïve way: sum up the probabilities of all possible paths
     • Using recursion this can be done more efficiently: the probability of being in the cloudy state at t=2 depends only on the state probabilities at t=1 and the observation at t=2
     • When we reach the last step, t=3, P is simply the sum of the probabilities of being sunny, cloudy or rainy at t=3
     http://www.comp.leeds.ac.uk/roger/HiddenMarkovModels/html_dev/forward_algorithm/s1_pg1.html
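The two approaches on the forward-algorithm slide can be compared with a minimal Python sketch. The state names follow the slide; the observation names ("dry", "damp", "soggy") and every probability below are made-up values chosen only for illustration, not numbers from the presentation.

```python
from itertools import product

STATES = ["sunny", "cloudy", "rainy"]

# All numbers below are hypothetical illustration values.
INITIAL = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}
TRANSITION = {  # TRANSITION[prev][cur] = P(cur at t | prev at t-1)
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}
EMISSION = {  # EMISSION[state][obs] = P(obs | state)
    "sunny":  {"dry": 0.7, "damp": 0.2, "soggy": 0.1},
    "cloudy": {"dry": 0.3, "damp": 0.4, "soggy": 0.3},
    "rainy":  {"dry": 0.1, "damp": 0.3, "soggy": 0.6},
}


def naive_probability(observations):
    """Sum the joint probability over every possible state path."""
    total = 0.0
    for path in product(STATES, repeat=len(observations)):
        p = INITIAL[path[0]] * EMISSION[path[0]][observations[0]]
        for prev, cur, obs in zip(path, path[1:], observations[1:]):
            p *= TRANSITION[prev][cur] * EMISSION[cur][obs]
        total += p
    return total


def forward_probability(observations):
    """Forward recursion: alpha at time t depends only on alpha at t-1."""
    alpha = {s: INITIAL[s] * EMISSION[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            cur: EMISSION[cur][obs]
            * sum(alpha[prev] * TRANSITION[prev][cur] for prev in STATES)
            for cur in STATES
        }
    # At the last time step, P(sequence | HMM) is the sum over all states.
    return sum(alpha.values())


observations = ["dry", "damp", "soggy"]
print(naive_probability(observations))
print(forward_probability(observations))
```

Both functions return the same value. The naive version visits 3^T paths for T observations, while the recursion does a fixed amount of work per time step.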

  3. Pfam
     • Database of protein domains and domain families
     • Contains a multiple sequence alignment and a profile HMM for every domain
     • “Seed” and “full” alignments: the seed alignment is rather small; the full alignment contains everything and is built from the seed alignment using HMMER
     http://www.sanger.ac.uk/Software/Pfam

  4. Using Pfam
     • For known proteins, get a pre-calculated domain structure
     • For new sequences, get a list of matching domains
     • Analyse domain structure, e.g. find a list of proteins with a similar domain structure, or find a list of proteins containing domains A and B
     • Species-specific analysis, e.g. find all domains unique to a certain virus
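The domain-structure queries listed on this slide amount to simple lookups once each protein is mapped to its ordered list of domains. The sketch below uses a toy in-memory table; the protein names, the domain labels and both helper functions are hypothetical and do not reflect Pfam's actual interface or data.

```python
# Toy table: protein -> ordered list of domains (hypothetical data).
ARCHITECTURES = {
    "protein_1": ["A", "B"],
    "protein_2": ["A", "C", "B"],
    "protein_3": ["C"],
    "protein_4": ["A", "B"],
}


def proteins_with_all(domains, table):
    """Proteins that contain every domain in `domains`, in any order."""
    wanted = set(domains)
    return sorted(p for p, doms in table.items() if wanted <= set(doms))


def proteins_with_same_architecture(query, table):
    """Proteins whose ordered domain list equals that of `query`.

    Exact equality is the simplest possible notion of "similar"
    domain structure; it is an assumption made for this sketch.
    """
    target = table[query]
    return sorted(p for p, doms in table.items() if p != query and doms == target)


print(proteins_with_all(["A", "B"], ARCHITECTURES))
print(proteins_with_same_architecture("protein_1", ARCHITECTURES))
```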

  5. Generalized HMM (GENSCAN)
     Gene prediction, GENSCAN (1997)
     • “Explicit state duration HMM”, also called a generalized HMM (GHMM)
     • P(Φ, S) = P(s_1|q_1,d_1) f(d_1) T(q_2|q_1) × P(s_2|q_2,d_2) f(d_2) … T(q_N|q_{N-1}) × P(s_N|q_N,d_N) f(d_N)
       Φ: sequence of states {q_1 … q_N}
       S: the sequence, split into segments s_1 … s_N, where state q_i emits segment s_i of length d_i
       T(q|q’): transition probability q’ → q
       f(d): state duration probability according to a distribution
     • Individual states can themselves be an HMM, e.g. coding exon states
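The product on the GHMM slide can be evaluated for one candidate parse with a short Python sketch. Everything concrete here is an assumption made for illustration: the two states, the per-state base composition used as the emission model, the geometric duration distribution, and all numbers. GENSCAN's real state set and distributions are far richer. As on the slide, no initial-state term is included.

```python
import math

# Hypothetical emission model: independent bases with per-state composition.
BASE_PROBS = {
    "exon":   {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}
# Hypothetical duration model: geometric, f(d) = (1 - p) * p**(d - 1).
CONTINUE_PROB = {"exon": 0.9, "intron": 0.95}
# TRANSITION[prev][cur] = T(cur | prev); hypothetical values.
TRANSITION = {"exon": {"intron": 1.0}, "intron": {"exon": 1.0}}


def duration_prob(state, d):
    """f(d): probability that `state` lasts exactly d bases."""
    p = CONTINUE_PROB[state]
    return (1 - p) * p ** (d - 1)


def segment_prob(state, segment):
    """P(s | q, d): probability of the segment given state and length."""
    return math.prod(BASE_PROBS[state][base] for base in segment)


def parse_log_prob(parse):
    """log P(parse, sequence) for a list of (state, segment) pairs.

    The parse must only use transitions present in TRANSITION.
    """
    log_p = 0.0
    prev = None
    for state, segment in parse:
        if prev is not None:
            log_p += math.log(TRANSITION[prev][state])
        log_p += math.log(segment_prob(state, segment))
        log_p += math.log(duration_prob(state, len(segment)))
        prev = state
    return log_p


print(parse_log_prob([("exon", "ACG"), ("intron", "TT")]))
```

Working in log space avoids underflow when the parse covers a long sequence.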

  6. Modelling Internal Coding Exons
     • See whether the evaluated sequence looks like a coding or a non-coding region by looking at hexamer (a “word” 6 bp long) frequencies in exons and introns. This is done with a 5th-order Markov model, in which each base depends on the five preceding bases
     • Take into account splice signals and translational start and stop signals (all non-HMM)
     • Use a modified Viterbi algorithm to get the optimal parse
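The hexamer idea on this slide can be sketched as a log-odds score between two 5th-order Markov models, one trained on exons and one on introns. This is a simplified illustration, not GENSCAN's actual model: it ignores reading frame, assumes uppercase A/C/G/T input, uses add-one pseudocounts, and the training sequences are toy strings.

```python
import math
from collections import defaultdict

ORDER = 5  # each base is conditioned on the 5 preceding bases (a hexamer)
UNIFORM = {base: 0.25 for base in "ACGT"}


def train_markov(sequences, pseudocount=1.0):
    """Estimate P(base | previous 5 bases) from hexamer counts."""
    counts = defaultdict(lambda: {base: pseudocount for base in "ACGT"})
    for seq in sequences:
        for i in range(ORDER, len(seq)):
            counts[seq[i - ORDER:i]][seq[i]] += 1
    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        model[context] = {b: c / total for b, c in next_counts.items()}
    return model


def coding_log_odds(seq, coding, noncoding):
    """Positive score: looks more like coding; negative: more like non-coding."""
    score = 0.0
    for i in range(ORDER, len(seq)):
        context, base = seq[i - ORDER:i], seq[i]
        # Contexts never seen in training fall back to a uniform distribution.
        p_coding = coding.get(context, UNIFORM)[base]
        p_noncoding = noncoding.get(context, UNIFORM)[base]
        score += math.log(p_coding) - math.log(p_noncoding)
    return score


# Toy training data (hypothetical, not real exons or introns).
coding_model = train_markov(["ATGGCCGCCGCCGCCGCC"])
noncoding_model = train_markov(["ATATATATATATATATAT"])

print(coding_log_odds("GCCGCCGCCGCC", coding_model, noncoding_model))
print(coding_log_odds("ATATATATATAT", coding_model, noncoding_model))
```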

  7. Comparative genomic methods
     • Mouse and human genome sequences provide new data. How can it be used?
     • Use a generalized pair HMM (GPHMM) to align both genomes and predict genes in them at the same time (SLAM)
     • Or modify the GENSCAN scoring scheme with alignment scores (TWINSCAN)
     • Methods that can use more than two genomes are being developed, e.g. TWINSCAN 3.0
