Hidden Markov Models in Bioinformatics 14.11 60 min. Definition Three Key Algorithms Summing over Unknown States Most Probable Unknown States Marginalizing Unknown States Key Bioinformatic Applications Pedigree Analysis Isochores in Genomes (CGrich regions)
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Hidden Markov Models in Bioinformatics 14.11 60 min
O1 O2O3 O4O5 O6O7 O8 O9 O10
H1
H2
H3
What is the probability of the data?
O1 O2O3 O4O5 O6O7 O8 O9 O10
H1
H2
H3
The probability of the observed is , which can be hard to calculate. However, these calculations can be considerably accelerated. Let the probability of the observations (O1,..Ok) conditional on Hk=j. Following recursion will be obeyed:
What is the most probable ”hidden” configuration?
O1 O2O3 O4O5 O6O7 O8 O9 O10
H1
H2
H3
Let be the sequences of hidden states that maximizes the observed sequence O ie ArgMaxH[ ]. Let probability of the most probability of the most probable path up to k ending in hidden state j.
Again recursions can be found
The actual sequence of hidden states can be found recursively by
What is the probability of specific ”hidden” state?
O1 O2O3 O4O5 O6O7 O8 O9 O10
H1
H2
H3
Let be the probability of the observations from j+1 to n given Hk=j. These will also obey recursions:
The probability of the observations and a specific hidden state can found as:
And of a specific hidden state can found as:
Felsenstein & Churchill, 1996
positions
1
n
1
sequences
k
slow  rs
HMM:
fast  rf
Likelihood Recursions:
Likelihood Initialisations:
Steel and Hein,2000 + Holmes and Bruno,2000
T
C
C
A
C
Emit functions:
e(##)= p(N1)f(N1,N2)
e(#)= p(N1),e(#)= p(N2)
p(N1)  equilibrium prob. of N
f(N1,N2)  prob. that N1 evolves into N2
An HMM Generating Alignments
 # # E
# #  E
*
* lb l/m (1 lb)em l/m (1 lb)(1 em) (1 l/m) (1 lb)

# lb l/m (1 lb)em l/m (1 lb)(1 em) (1 l/m) (1 lb)
_
#lb l/m (1 lb)em l/m (1 lb)(1 em) (1 l/m) (1 lb)
#
 lb
Probability of Data given a pedigree.
ElstonStewart (1971) Temporal Peeling Algorithm:
Father
Mother
Condition on parental states
Recombination and mutation are Markovian
LanderGreen (1987)  Genotype Scanning Algorithm:
Father
Mother
Condition on paternal/maternal inheritance
Recombination and mutation are Markovian
Comment: Obvious parallel to WiufHein99 reformulation of Hudson’s 1983 algorithm
poor
HMM:
rich
Isochore:
Churchill,1989,92
Lp(C)=Lp(G)=0.1, Lp(A)=Lp(T)=0.4, Lr(C)=Lr(G)=0.4, Lr(A)=Lr(T)=0.1
Likelihood Recursions:
Likelihood Initialisations:
Simple Eukaryotic
Gene Finding:
Burge and Karlin, 1996
Simple Prokaryotic
Goldman, 1996
Further Examples II
L
L
L
a
a
HMM for SSEs:
Adding Evolution:
SSE Prediction:
Profile HMM Alignment:
Krogh et al.,1994
O1 O2O3 O4O5 O6O7 O8 O9 O10
H1
H2
H3
Vineet Bafna and Daniel H. Huson (2000) The Conserved Exon Method for Gene Finding ISMB 2000 pp. 312
S.Batzoglou et al.(2000) Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction. Genome Research. 10.95058.
Blayo, Rouze & Sagot (2002) ”Orphan Gene Finding  An exon assembly approach” J.Comp.Biol.
Delcher, AL et al.(1998) Alignment of Whole Genomes Nuc.Ac.Res. 27.11.236976.
Gravely, BR (2001) Alternative Splicing: increasing diversity in the proteomic world. TIGS 17.2.100
Guigo, R.et al.(2000) An Assesment of Gene Prediction Accuracy in Large DNA Sequences. Genome Research 10.163142
Kan, Z. Et al. (2001) Gene Structure Prediction and Alternative Splicing Using Genomically Aligned ESTs Genome Research 11.889900.
Ian Korf et al.(2001) Integrating genomic homology into gene structure prediction. Bioinformatics vol17.Suppl.1 pages 140148
Tejs Scharling (2001) Geneidentification using sequence comparison. Aarhus University
JS Pedersen (2001) Progress Report: Comparative Gene Finding. Aarhus University
Reese,MG et al.(2000) Genome Annotation Assessment in Drosophila melanogaster Genome Research 10.483501.
Stein,L.(2001) Genome Annotation: From Sequence to Biology. Nature Reviews Genetics 2.493