Hidden Markov Models in Bioinformatics 14.11 60 min PowerPoint Presentation

PowerPoint Slideshow about 'Hidden Markov Models in Bioinformatics 14.11 60 min' - kerri


Presentation Transcript
slide1

Hidden Markov Models in Bioinformatics 14.11 60 min

  • Definition
  • Three Key Algorithms
  • Summing over Unknown States
  • Most Probable Unknown States
  • Marginalizing Unknown States
  • Key Bioinformatic Applications
  • Pedigree Analysis
  • Isochores in Genomes (CG-rich regions)
  • Profile HMM Alignment
  • Fast/Slowly Evolving States
  • Secondary Structure Elements in Proteins
  • Gene Finding
  • Statistical Alignment
slide2

Hidden Markov Models

[Figure: observations O1, ..., O10 and hidden states H1, H2, H3]

  • $(O_1,H_1), (O_2,H_2), \dots, (O_n,H_n)$ is a sequence of stochastic variables with two components: one that is observed ($O_i$) and one that is hidden ($H_i$).
  • The marginal distribution of the $H_i$'s is described by a homogeneous Markov chain:
  • Let $p_{i,j} = P(H_{k+1}=j \mid H_k=i)$ be the transition probabilities.
  • Let $\pi_i = P(H_1=i)$; often $\pi$ is the equilibrium distribution of the Markov chain.
  • Conditional on the $H_k$ (all $k$), the $O_k$ are independent.
  • The distribution of $O_k$ depends only on the value of $H_k$ and is called the emit function.
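To make the definition concrete, here is a minimal Python sketch that draws a hidden and an observed sequence from an HMM. The two states `A` and `B`, the symbols `x` and `y`, and all probabilities are hypothetical values chosen only for illustration; they do not come from the slides.

```python
import random

def sample_hmm(n, initial, transition, emission, rng):
    """Draw (hidden, observed) sequences of length n from an HMM.

    initial[i]       = P(H_1 = i)
    transition[i][j] = P(H_{k+1} = j | H_k = i)
    emission[i][o]   = P(O_k = o | H_k = i)   (the emit function)
    """
    def draw(dist):
        # Sample one key of `dist` with probability equal to its value.
        keys = list(dist)
        return rng.choices(keys, weights=[dist[k] for k in keys])[0]

    hidden, observed = [], []
    state = draw(initial)
    for _ in range(n):
        hidden.append(state)
        # The observation depends only on the current hidden state.
        observed.append(draw(emission[state]))
        # The next hidden state depends only on the current one.
        state = draw(transition[state])
    return hidden, observed

# Hypothetical two-state model (illustrative numbers only).
initial = {"A": 0.5, "B": 0.5}
transition = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emission = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.3, "y": 0.7}}

hidden, observed = sample_hmm(10, initial, transition, emission, random.Random(0))
print("hidden:  ", "".join(hidden))
print("observed:", "".join(observed))
```

Only the `observed` string would be available in practice; the algorithms on the following slides reason about the `hidden` one.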
slide3

What is the probability of the data?


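The probability of the observed sequence is $P(O) = \sum_{H} P(H)\, P(O \mid H)$, where the sum runs over all sequences of hidden states, which can be hard to calculate directly. However, these calculations can be considerably accelerated. Let $F_k(j) = P(O_1, \dots, O_k, H_k = j)$ be the probability of the observations $(O_1, \dots, O_k)$ together with $H_k = j$. The following recursion is obeyed:

$$F_1(j) = \pi_j\, P(O_1 \mid H_1 = j), \qquad F_k(j) = P(O_k \mid H_k = j) \sum_i F_{k-1}(i)\, p_{i,j}, \qquad P(O) = \sum_j F_n(j),$$

where $\pi_j = P(H_1 = j)$ and $p_{i,j}$ is the transition probability from hidden state $i$ to hidden state $j$.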
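The saving from the recursion can be checked on a small case. The sketch below computes the probability of the data twice: once by the recursion, and once by summing over every hidden path. The model parameters and the observed string are hypothetical illustration values.

```python
from itertools import product

def forward_probability(obs, initial, transition, emission):
    """P(O) by the recursion: O(n * S^2) work for S hidden states."""
    states = list(initial)
    # f[j] = P(O_1..O_k, H_k = j), starting at k = 1.
    f = {j: initial[j] * emission[j][obs[0]] for j in states}
    for o in obs[1:]:
        f = {j: emission[j][o] * sum(f[i] * transition[i][j] for i in states)
             for j in states}
    return sum(f.values())

def brute_force_probability(obs, initial, transition, emission):
    """P(O) by summing P(H) P(O | H) over all S^n hidden paths."""
    total = 0.0
    for path in product(list(initial), repeat=len(obs)):
        p = initial[path[0]] * emission[path[0]][obs[0]]
        for k in range(1, len(obs)):
            p *= transition[path[k - 1]][path[k]] * emission[path[k]][obs[k]]
        total += p
    return total

# Hypothetical two-state model (illustrative numbers only).
initial = {"A": 0.5, "B": 0.5}
transition = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emission = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.3, "y": 0.7}}
obs = "xxyyx"

p_fast = forward_probability(obs, initial, transition, emission)
p_slow = brute_force_probability(obs, initial, transition, emission)
print(p_fast, p_slow)
```

Both numbers agree up to floating-point rounding, but the brute-force sum visits 2^5 paths here and grows exponentially with the sequence length, while the recursion grows linearly.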

slide4

What is the most probable ”hidden” configuration?


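Let $H^*$ be the sequence of hidden states that maximizes the probability of the observed sequence $O$, i.e. $H^* = \arg\max_H P(O, H)$. Let $V_k(j)$ be the probability of the most probable path up to $k$ ending in hidden state $j$.

Again recursions can be found:

$$V_1(j) = \pi_j\, P(O_1 \mid H_1 = j), \qquad V_k(j) = P(O_k \mid H_k = j)\, \max_i V_{k-1}(i)\, p_{i,j}$$

The actual sequence of hidden states can be found recursively by tracing back from the last position:

$$H^*_n = \arg\max_j V_n(j), \qquad H^*_{k-1} = \arg\max_i V_{k-1}(i)\, p_{i,H^*_k}$$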
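The maximization and the traceback can be sketched in Python as follows. This is a plain-probability version for short sequences, with hypothetical model parameters chosen only for illustration.

```python
def viterbi(obs, initial, transition, emission):
    """Return (most probable hidden path, its joint probability with obs)."""
    states = list(initial)
    # v[j] = probability of the best path up to the current position ending in j.
    v = {j: initial[j] * emission[j][obs[0]] for j in states}
    back = []  # back[k][j] = best predecessor of state j at position k + 1
    for o in obs[1:]:
        ptr = {j: max(states, key=lambda i: v[i] * transition[i][j])
               for j in states}
        v = {j: emission[j][o] * v[ptr[j]] * transition[ptr[j]][j]
             for j in states}
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda j: v[j])
    best_prob = v[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_prob

# Hypothetical two-state model (illustrative numbers only).
initial = {"A": 0.5, "B": 0.5}
transition = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emission = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.3, "y": 0.7}}
obs = "xxxyyyy"

path, prob = viterbi(obs, initial, transition, emission)
print("".join(path), prob)
```

For long sequences the products underflow, so an implementation would normally add logarithms instead of multiplying probabilities.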

slide5

What is the probability of specific ”hidden” state?


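Let $B_k(j) = P(O_{k+1}, \dots, O_n \mid H_k = j)$ be the probability of the observations from $k+1$ to $n$ given $H_k = j$. These also obey recursions:

$$B_n(j) = 1, \qquad B_k(j) = \sum_i p_{j,i}\, P(O_{k+1} \mid H_{k+1} = i)\, B_{k+1}(i)$$

The probability of the observations and a specific hidden state can be found as:

$$P(O, H_k = j) = F_k(j)\, B_k(j),$$

where $F_k(j) = P(O_1, \dots, O_k, H_k = j)$ is the probability of the first $k$ observations together with $H_k = j$.

And the probability of a specific hidden state can be found as:

$$P(H_k = j \mid O) = \frac{F_k(j)\, B_k(j)}{P(O)}$$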
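Combining a left-to-right pass with a right-to-left pass gives the marginal probability of each hidden state at each position. A minimal Python sketch, again with hypothetical model parameters:

```python
def posterior_marginals(obs, initial, transition, emission):
    """Return a list whose k-th entry maps state j to P(H_k = j | O)."""
    states = list(initial)
    # Left-to-right pass: f[k][j] = P(O_1..O_k, H_k = j) (0-based list index).
    f = [{j: initial[j] * emission[j][obs[0]] for j in states}]
    for o in obs[1:]:
        f.append({j: emission[j][o] * sum(f[-1][i] * transition[i][j] for i in states)
                  for j in states})
    # Right-to-left pass: b[k][j] = P(O_{k+1}..O_n | H_k = j), with b[n] = 1.
    b = [dict.fromkeys(states, 1.0)]
    for o in reversed(obs[1:]):
        nxt = b[0]
        b.insert(0, {j: sum(transition[j][i] * emission[i][o] * nxt[i] for i in states)
                     for j in states})
    total = sum(f[-1].values())  # P(O)
    return [{j: f[k][j] * b[k][j] / total for j in states}
            for k in range(len(obs))]

# Hypothetical two-state model (illustrative numbers only).
initial = {"A": 0.5, "B": 0.5}
transition = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emission = {"A": {"x": 0.8, "y": 0.2}, "B": {"x": 0.3, "y": 0.7}}
obs = "xxxyyyy"

post = posterior_marginals(obs, initial, transition, emission)
for k, dist in enumerate(post, start=1):
    print(k, round(dist["A"], 3), round(dist["B"], 3))
```

Unlike the single most probable path, these marginals quantify how uncertain the hidden state is at each position.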

slide6

Fast/Slowly Evolving States

Felsenstein & Churchill, 1996

[Figure: alignment of k sequences (rows 1 to k) over n positions (columns 1 to n); HMM with two hidden rate states, slow (rs) and fast (rf)]

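  • $\pi_r$ - equilibrium distribution of hidden states (rates) at first position
  • $p_{i,j}$ - transition probabilities between hidden states
  • $L^{(j,r)}$ - likelihood for j'th column given rate r.
  • $L_{(j,r)}$ - likelihood for first j columns given j'th column has rate r.

Likelihood Recursions:

$$L_{(j,f)} = \left(L_{(j-1,f)}\, p_{f,f} + L_{(j-1,s)}\, p_{s,f}\right) L^{(j,f)}, \qquad L_{(j,s)} = \left(L_{(j-1,f)}\, p_{f,s} + L_{(j-1,s)}\, p_{s,s}\right) L^{(j,s)}$$

Likelihood Initialisations:

$$L_{(1,f)} = \pi_f\, L^{(1,f)}, \qquad L_{(1,s)} = \pi_s\, L^{(1,s)}$$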
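For a long alignment the likelihoods above become too small for floating-point numbers, so a practical version rescales at every column and accumulates a log-likelihood. The sketch below assumes the per-column likelihoods for each rate have already been computed (for example on a phylogeny); the numbers used are hypothetical placeholders, not values from Felsenstein & Churchill.

```python
import math

def rate_hmm_log_likelihood(column_lik, equilibrium, transition):
    """Log-likelihood of an alignment under a rate HMM.

    column_lik[j][r]  = likelihood of column j given rate r (assumed precomputed)
    equilibrium[r]    = probability of rate r at the first position
    transition[q][r]  = probability of moving from rate q to rate r
    """
    rates = list(equilibrium)
    log_total = 0.0
    # Initialisation: first column weighted by the equilibrium distribution.
    current = {r: equilibrium[r] * column_lik[0][r] for r in rates}
    for col in column_lik[1:]:
        # Rescale so the running values stay near 1; remember the factor in log space.
        scale = sum(current.values())
        log_total += math.log(scale)
        current = {r: col[r] * sum(current[q] / scale * transition[q][r] for q in rates)
                   for r in rates}
    return log_total + math.log(sum(current.values()))

# Hypothetical inputs: two rates and four alignment columns.
equilibrium = {"slow": 0.5, "fast": 0.5}
transition = {"slow": {"slow": 0.9, "fast": 0.1},
              "fast": {"slow": 0.1, "fast": 0.9}}
column_lik = [
    {"slow": 0.020, "fast": 0.005},
    {"slow": 0.018, "fast": 0.006},
    {"slow": 0.001, "fast": 0.004},
    {"slow": 0.002, "fast": 0.005},
]

loglik = rate_hmm_log_likelihood(column_lik, equilibrium, transition)
print(loglik)
```

The rescaling does not change the result; it only moves the magnitude of the running values into the accumulated logarithm.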

slide7

Statistical Alignment

Steel and Hein,2000 + Holmes and Bruno,2000


Emit functions:

e(##) = π(N1) f(N1,N2)

e(#-) = π(N1), e(-#) = π(N2)

π(N) - equilibrium probability of nucleotide N

f(N1,N2) - probability that N1 evolves into N2
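As an illustration of the emit functions, the sketch below fills in the substitution probability with the Jukes-Cantor model. That choice, the uniform equilibrium probability of 0.25 and the evolutionary distance `t` are assumptions made for this example; the slides do not specify a substitution model.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGT"

def jc69(n1, n2, t):
    """Jukes-Cantor probability that n1 evolves into n2 over distance t (assumed model)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if n1 == n2 else 0.25 - 0.25 * decay

def emit(column, t, equilibrium=0.25):
    """Emission probability of an alignment column (top, bottom); '-' marks a gap."""
    top, bottom = column
    if top == "-" and bottom == "-":
        raise ValueError("a column cannot consist of two gaps")
    if top != "-" and bottom != "-":
        # Match column: equilibrium probability of the top nucleotide
        # times the probability that it evolves into the bottom one.
        return equilibrium * jc69(top, bottom, t)
    # Column with one gap: equilibrium probability of the single nucleotide.
    return equilibrium

t = 0.5  # hypothetical evolutionary distance
match_total = sum(emit((a, b), t) for a, b in product(NUCLEOTIDES, repeat=2))
print(emit(("T", "C"), t), emit(("C", "C"), t), emit(("A", "-"), t))
print(match_total)
```

Within each column type the emission probabilities sum to one, which is what lets the HMM define a proper distribution over alignments.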

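An HMM Generating Alignments

Transition probabilities between alignment columns (written top/bottom; * is the start state and E the end state). Rows give the current state, columns the next state:

From \ To    -/#    #/#                   #/-                       E
*/*          λβ     (λ/μ)(1-λβ)e^(-μ)     (λ/μ)(1-λβ)(1-e^(-μ))     (1-λ/μ)(1-λβ)
-/#          λβ     (λ/μ)(1-λβ)e^(-μ)     (λ/μ)(1-λβ)(1-e^(-μ))     (1-λ/μ)(1-λβ)
#/#          λβ     (λ/μ)(1-λβ)e^(-μ)     (λ/μ)(1-λβ)(1-e^(-μ))     (1-λ/μ)(1-λβ)
#/-          λβ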
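The four entries shared by the rows of the transition table can be evaluated numerically to confirm that they form a probability distribution. In the sketch below the formula for β is the one from the TKF91 links model and is an assumption here, since the slides do not define β; the rates `lam` and `mu` are hypothetical values with `lam < mu`.

```python
import math

def tkf_beta(lam, mu, t=1.0):
    """beta as in the TKF91 links model (assumed formula; requires lam < mu)."""
    g = math.exp((lam - mu) * t)
    return (1.0 - g) / (mu - lam * g)

def transition_row(lam, mu, t=1.0):
    """Transition probabilities out of a non-deletion state, keyed by next column."""
    lb = lam * tkf_beta(lam, mu, t)   # lambda * beta
    survive = math.exp(-mu * t)       # e^(-mu) when t = 1, as in the table
    return {
        "-/#": lb,
        "#/#": (lam / mu) * (1.0 - lb) * survive,
        "#/-": (lam / mu) * (1.0 - lb) * (1.0 - survive),
        "E": (1.0 - lam / mu) * (1.0 - lb),
    }

row = transition_row(lam=0.03, mu=0.04)  # hypothetical insertion and deletion rates
for target, p in row.items():
    print(target, round(p, 4))
print(sum(row.values()))
```

The sum equals one for any value of β, because the last three entries add up to 1 - λβ.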

slide8

Probability of Data given a pedigree.

Elston-Stewart (1971) - Temporal Peeling Algorithm:

  • Condition on parental states
  • Recombination and mutation are Markovian

Lander-Green (1987) - Genotype Scanning Algorithm:

  • Condition on paternal/maternal inheritance
  • Recombination and mutation are Markovian

[Figure: pedigree diagrams with Father and Mother for each algorithm]

Comment: There is an obvious parallel to the Wiuf and Hein (1999) reformulation of Hudson's 1983 algorithm.
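In the genotype scanning view, the hidden state at a locus records, for every meiosis in the pedigree, whether the paternal or the maternal copy was passed on. A minimal sketch of the resulting Markov chain between two adjacent loci follows; it assumes that each meiosis recombines independently with probability `theta`, and the values of `theta` and the number of meioses are hypothetical.

```python
from itertools import product

def inheritance_transition(theta, meioses):
    """Transition probabilities between inheritance vectors at adjacent loci.

    Each vector has one bit per meiosis (0 = paternal copy, 1 = maternal copy).
    A bit flips between loci exactly when that meiosis recombines.
    """
    vectors = list(product((0, 1), repeat=meioses))
    matrix = {}
    for v in vectors:
        matrix[v] = {}
        for w in vectors:
            flips = sum(a != b for a, b in zip(v, w))
            matrix[v][w] = theta ** flips * (1.0 - theta) ** (meioses - flips)
    return matrix

matrix = inheritance_transition(theta=0.1, meioses=2)
for v, row in matrix.items():
    print(v, {w: round(p, 2) for w, p in row.items()})
```

The number of hidden states doubles with every meiosis, which is why this approach suits small pedigrees with many loci.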

slide9

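Further Examples I

Isochore: Churchill, 1989, 1992

HMM: two hidden states, poor (p) and rich (r), for CG-poor and CG-rich regions.

Emission probabilities: Lp(C)=Lp(G)=0.1, Lp(A)=Lp(T)=0.4, Lr(C)=Lr(G)=0.4, Lr(A)=Lr(T)=0.1

Likelihood Recursions: with $L_{(j,s)}$ the likelihood of the first $j$ nucleotides given that position $j$ is in state $s \in \{p, r\}$, $O_j$ the nucleotide at position $j$ and $p_{s',s}$ the transition probabilities,

$$L_{(j,s)} = \left(L_{(j-1,p)}\, p_{p,s} + L_{(j-1,r)}\, p_{r,s}\right) L_s(O_j)$$

Likelihood Initialisations:

$$L_{(1,s)} = \pi_s\, L_s(O_1)$$

where $\pi_s$ is the probability of state $s$ at the first position.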
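With the emission probabilities given for the poor and rich states, a short DNA string can be segmented into CG-poor and CG-rich stretches. The sketch below finds the most probable state path in log space; the probability 0.9 of staying in the same state, the uniform start distribution and the test sequence are assumptions made for this example.

```python
import math

# Emission probabilities from the slide: p = CG-poor, r = CG-rich.
EMIT = {"p": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
        "r": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def segment_isochores(seq, stay=0.9):
    """Most probable poor/rich labelling of seq (stay probability is assumed)."""
    states = "pr"
    log_trans = {i: {j: math.log(stay if i == j else 1.0 - stay) for j in states}
                 for i in states}
    # Log probability of the best path ending in each state (uniform start).
    score = {j: math.log(0.5) + math.log(EMIT[j][seq[0]]) for j in states}
    back = []
    for base in seq[1:]:
        ptr = {j: max(states, key=lambda i: score[i] + log_trans[i][j])
               for j in states}
        score = {j: score[ptr[j]] + log_trans[ptr[j]][j] + math.log(EMIT[j][base])
                 for j in states}
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

seq = "ATTATAAT" + "GCGCCGGC" + "TATAATTA"  # hypothetical sequence
labels = segment_isochores(seq)
print(seq)
print(labels)
```

Real isochores span many kilobases, so a realistic stay probability would be much closer to one than the value used here.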

Gene Finding: Burge and Karlin, 1996

[Figure: simple prokaryotic and simple eukaryotic gene models]

slide10

Further Examples II

Secondary Structure Elements: Goldman, 1996

[Figure: three panels, "HMM for SSEs" (with states labelled L and α), "Adding Evolution" and "SSE Prediction"]

Profile HMM Alignment: Krogh et al., 1994

slide11

Summary


  • Definition
  • Three Key Algorithms
  • Summing over Unknown States
  • Most Probable Unknown States
  • Marginalizing Unknown States
  • Key Bioinformatic Applications
  • Pedigree Analysis
  • Isochores in Genomes (CG-rich regions)
  • Profile HMM Alignment
  • Fast/Slowly Evolving States
  • Secondary Structure Elements in Proteins
  • Gene Finding
  • Statistical Alignment