Applications of Hidden Markov Models

(Lecture for CS397-CXZ Algorithms in Bioinformatics)

March 6, 2004

ChengXiang Zhai

Department of Computer Science

University of Illinois, Urbana-Champaign


Today’s Lecture

  • HMM Applications

    • Profile HMMs (Classification)

    • HMMs for Multiple Sequence Alignment (Pattern discovery)

    • HMMs for Gene Finding (Segmentation)

  • Special issues in HMMs

    • Local maxima

    • Model construction

    • Weighting training sequences


HMM Applications

  • Classification (e.g., Profile HMMs)

    • Build an HMM for each class (profile HMMs)

    • Classify a sequence using Bayes rule

  • Multiple sequence alignment

    • Build an HMM based on a set of sequences

    • Decode each sequence to find a multiple alignment

  • Segmentation (e.g., gene finding)

    • Use different states to model different regions

    • Decode a sequence to reveal the region boundaries


HMMs for Classification

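  • Example: protein families

  • Task: assign a family C to a sequence X, choosing the C that maximizes p(C|X) ∝ p(X|C) p(C) (Bayes rule)

  • p(X|C) is modeled by a profile HMM built specifically for C

  • This assumes example sequences are available for each family C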
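The classification rule above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the two-state HMMs below are toy stand-ins for real profile HMMs (which have match, insert, and delete states), and the names `forward_loglik`, `classify`, `MODELS`, and `PRIORS` as well as all probabilities are assumptions made up for the example. The likelihood p(X|C) is computed with the forward algorithm, then combined with a class prior via Bayes rule.

```python
import math

def forward_loglik(seq, init, trans, emit):
    """Return log p(seq | HMM) using the forward algorithm with per-step scaling."""
    states = list(init)
    alpha = {s: init[s] * emit[s][seq[0]] for s in states}
    loglik = 0.0
    for t in range(len(seq)):
        if t > 0:
            # Propagate one step: sum over predecessor states, then emit seq[t].
            alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][seq[t]]
                     for s in states}
        scale = sum(alpha.values())          # p(x_t | x_1..x_{t-1})
        loglik += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return loglik

def classify(seq, models, priors):
    """Return the posterior p(C | seq) for every class C via Bayes rule."""
    log_joint = {c: forward_loglik(seq, *models[c]) + math.log(priors[c])
                 for c in models}
    top = max(log_joint.values())            # subtract the max for numerical stability
    norm = sum(math.exp(v - top) for v in log_joint.values())
    return {c: math.exp(v - top) / norm for c, v in log_joint.items()}

# Hypothetical toy models over the alphabet {a, b}: (initial, transition, emission).
INIT = {"s1": 0.5, "s2": 0.5}
TRANS = {"s1": {"s1": 0.9, "s2": 0.1}, "s2": {"s1": 0.1, "s2": 0.9}}
MODELS = {
    "family_A": (INIT, TRANS, {"s1": {"a": 0.9, "b": 0.1}, "s2": {"a": 0.6, "b": 0.4}}),
    "family_B": (INIT, TRANS, {"s1": {"a": 0.1, "b": 0.9}, "s2": {"a": 0.4, "b": 0.6}}),
}
PRIORS = {"family_A": 0.5, "family_B": 0.5}

posterior = classify("aabaaa", MODELS, PRIORS)
predicted = max(posterior, key=posterior.get)
```

Working in log space and rescaling the forward variables keeps long sequences from underflowing, which matters for realistic protein lengths.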


HMMs for Multiple Alignment

  • Given a set of sequences S={X1, …,Xk}

  • Train an HMM, e.g., using Baum-Welch (finding the HMM that maximizes the probability of S)

  • Decode each sequence Xi

  • Assemble the Viterbi paths to form a multiple alignment (residues emitted by insert states are not reliably aligned to one another)
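The last step, assembling per-sequence Viterbi paths into one alignment, might look like the sketch below. It assumes a profile-HMM state naming that the slides do not spell out: each path is a list of hypothetical `(kind, k)` pairs, where `"M"` (match) and `"I"` (insert) consume one residue and `"D"` (delete) consumes none, and insert state `k` sits after match column `k`. Inserted residues are written in lowercase and padded with `.`, which makes it visible that they are not aligned to each other.

```python
def assemble_alignment(seqs, paths, n_match):
    """Turn per-sequence state paths into equal-length alignment rows."""
    match = [["-"] * n_match for _ in seqs]           # '-' marks a delete state
    inserts = [[""] * (n_match + 1) for _ in seqs]    # slot k follows match column k
    for i, (seq, path) in enumerate(zip(seqs, paths)):
        pos = 0
        for kind, k in path:
            if kind == "M":
                match[i][k - 1] = seq[pos].upper()
                pos += 1
            elif kind == "I":
                inserts[i][k] += seq[pos].lower()
                pos += 1
            elif kind != "D":
                raise ValueError(f"unknown state kind: {kind!r}")
        if pos != len(seq):
            raise ValueError("path does not consume the whole sequence")
    # Each insert slot is as wide as the longest insertion observed there.
    widths = [max(len(row[k]) for row in inserts) for k in range(n_match + 1)]
    rows = []
    for i in range(len(seqs)):
        parts = [inserts[i][0].ljust(widths[0], ".")]
        for k in range(1, n_match + 1):
            parts.append(match[i][k - 1])
            parts.append(inserts[i][k].ljust(widths[k], "."))
        rows.append("".join(parts))
    return rows

# Hypothetical sequences and paths for a model with three match columns.
SEQS = ["ACG", "AG", "ACTG"]
PATHS = [
    [("M", 1), ("M", 2), ("M", 3)],
    [("M", 1), ("D", 2), ("M", 3)],
    [("M", 1), ("M", 2), ("I", 2), ("M", 3)],
]
alignment = assemble_alignment(SEQS, PATHS, n_match=3)
```

For these inputs the rows are `AC.G`, `A-.G`, and `ACtG`: the match columns line up, while the inserted `t` only occupies a padded slot.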


HMM-based Gene Finding

  • Design two types of states

    • “Within Gene” States

    • “Outside Gene” States

  • Use known genes to estimate the HMM

  • Decode a new sequence to reveal which part is a gene

  • Example software:

    • GENSCAN (Burge 1997)

    • FGENESH (Solovyev 1997)

    • HMMgene (Krogh 1997)

    • GENIE (Kulp 1996)

    • GENMARK (Borodovsky & McIninch 1993)

    • VEIL (Henderson, Salzberg, & Fasman 1997)
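The "within gene" / "outside gene" design can be illustrated with a two-state HMM decoded by the Viterbi algorithm. This is only a sketch under made-up assumptions: the states `G` and `O`, the GC-rich versus AT-rich emission probabilities, and the transition probabilities are invented for the example, whereas the gene finders listed above use far richer state structures and estimate their parameters from known genes.

```python
import math
from itertools import groupby

def viterbi(seq, init, trans, emit):
    """Return the most probable state path for seq (log-space Viterbi)."""
    states = list(init)
    score = {s: math.log(init[s]) + math.log(emit[s][seq[0]]) for s in states}
    backpointers = []
    for x in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][x])
            ptr[s] = best
        backpointers.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=score.get)]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

def segments(path):
    """Collapse a state path into (state, start, end) runs, end exclusive."""
    out, start = [], 0
    for state, run in groupby(path):
        length = len(list(run))
        out.append((state, start, start + length))
        start += length
    return out

# Hypothetical parameters: "G" = within gene (GC-rich), "O" = outside gene (AT-rich).
INIT = {"O": 0.9, "G": 0.1}
TRANS = {"O": {"O": 0.9, "G": 0.1}, "G": {"O": 0.1, "G": 0.9}}
EMIT = {
    "O": {"a": 0.4, "t": 0.4, "c": 0.1, "g": 0.1},
    "G": {"a": 0.1, "t": 0.1, "c": 0.4, "g": 0.4},
}

dna = "atatat" + "gcgcgcgc" + "atatat"
regions = segments(viterbi(dna, INIT, TRANS, EMIT))
```

The region boundaries fall where the decoded path switches state; on this toy input the GC-rich stretch is labeled `G` and its flanks `O`.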


VEIL: Viterbi Exon-Intron Locator

VEIL Architecture (figure; slide from N. F. Samatova’s lecture)

  • Components shown: Upstream, Start Codon, Exon, 3’ Splice Site, Intron, 5’ Splice Site, Stop Codon, Downstream, 5’ Poly-A Site

Exon HMM Model

  • Enter: start codon or intron (3’ Splice Site)

  • Exit: 5’ Splice site or three stop codons (taa, tag, tga)


GENSCAN Architecture

  • Based on a generalized HMM (GHMM)

    • Each state may output a string of symbols (according to some probability distribution)

  • Models both strands at once

    • Other models predict on one strand first, then on the other strand

    • Avoids prediction of overlapping genes on the two strands (rare)

  • Explicit intron/exon length modeling

  • Special sensors for Cap-site and TATA-box

  • Advanced splice site sensors

(Figure: Fig. 3, Burge and Karlin 1997)


Special Issues

  • Local maxima

  • Optimal model construction

  • Weighting training sequences


Solutions to the Local Maxima Problem

  • Repeat with different initializations

  • Start with the most reasonable initial model

  • Simulated annealing (slow down the convergence speed)
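The first remedy, repeating the search from different initializations and keeping the best result, is sketched below on a toy problem. To stay short, this swaps in plain hill climbing on a one-dimensional function for Baum-Welch on an HMM likelihood; the function `objective`, its two peaks, and the helper names are all assumptions chosen for illustration. With real HMM training, each restart would be a full Baum-Welch run from random initial parameters, compared by final log-likelihood.

```python
import math
import random

def objective(x):
    """Toy stand-in for a likelihood: local peak near x = -2, global peak near x = 3."""
    return math.exp(-(x + 2) ** 2) + 2 * math.exp(-(x - 3) ** 2)

def hill_climb(x, step=0.01, max_iters=10000):
    """Greedy local search: move to the better neighbor until none improves."""
    for _ in range(max_iters):
        best = max((x - step, x, x + step), key=objective)
        if best == x:
            break
        x = best
    return x

def best_of_restarts(n_restarts, rng):
    """Run the local search from random starting points and keep the best optimum."""
    starts = [rng.uniform(-5.0, 5.0) for _ in range(n_restarts)]
    return max((hill_climb(s) for s in starts), key=objective)

bad_start = hill_climb(-3.0)                        # gets stuck on the local peak
good_start = hill_climb(2.0)                        # reaches the global peak
restarted = best_of_restarts(20, random.Random(0))  # seeded for reproducibility
```

A single run is only as good as its starting point, while the restarted search reaches the global peak as soon as one start lands in its basin of attraction.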


Local Maxima: Illustration

(Figure labels: global maximum, local maximum, good starting point, bad starting point)


Optimal Model Construction

Bayesian model selection:

P(HMM) should prefer simpler models
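One standard way to write the Bayesian selection rule is given below. The symbol D for the training sequences is an assumption introduced here, and the transcript does not show the slide's own formula, so this is a reconstruction of the usual form rather than a quotation.

```latex
\[
P(\mathrm{HMM} \mid D) = \frac{P(D \mid \mathrm{HMM})\, P(\mathrm{HMM})}{P(D)}
\quad\Longrightarrow\quad
\mathrm{HMM}^{*} = \arg\max_{\mathrm{HMM}} \bigl[\, \log P(D \mid \mathrm{HMM}) + \log P(\mathrm{HMM}) \,\bigr]
\]
```

Since P(D) does not depend on the model, it drops out of the maximization. The prior term log P(HMM) then acts as a penalty that a more complex model must overcome with a sufficiently better fit to the data.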


Sequence Weighting

  • Avoid over-counting similar sequences from the same organisms

  • Typically compute a weight for a sequence based on an evolutionary tree

  • Many ways to incorporate the weights, e.g.,

    • Unequal likelihood

    • Unequal weight contribution in parameter estimation
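The second option, unequal weight contribution in parameter estimation, can be sketched as weighted counting when estimating one state's emission probabilities. The function name, the optional pseudocount, and the hand-set weights are assumptions for illustration; in practice the weights would come from a weighting scheme such as the tree-based one mentioned above.

```python
def weighted_emission_probs(residues, weights, alphabet, pseudocount=0.0):
    """Estimate emission probabilities for one state from weighted observations.

    residues[i] is the symbol that training sequence i emits in this state,
    and weights[i] is that sequence's weight.
    """
    counts = {symbol: pseudocount for symbol in alphabet}
    for residue, weight in zip(residues, weights):
        counts[residue] += weight        # each sequence contributes its weight, not 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no weighted observations and no pseudocount")
    return {symbol: count / total for symbol, count in counts.items()}

# Hypothetical column: three near-identical sequences emit A, one distant sequence emits G.
residues = ["A", "A", "A", "G"]
unweighted = weighted_emission_probs(residues, [1.0, 1.0, 1.0, 1.0], "ACGT")
weighted = weighted_emission_probs(residues, [1 / 3, 1 / 3, 1 / 3, 1.0], "ACGT")
```

With equal weights the three similar sequences dominate the estimate for A; sharing one unit of weight among them gives the distant sequence the same influence as the whole cluster.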


HMMs in Real Applications

  • SAM-T98 Tutorial:

    • http://www.cse.ucsc.edu/research/compbio/ismb99.tutorial.html

  • Pfam

    • http://www.sanger.ac.uk/Software/Pfam/

