CS5263 Bioinformatics
Lecture 9: Motif finding
Biological & Statistical background

Roadmap
Review of last lecture
Intro to probability and statistics
Intro to motif finding problems
Biological background

Multiple Sequence Alignment

Scoring functions
Ideally:
[Figure: diagram with sequences x, y, z, w, v and an unknown node "?"]
x: ACGCGGC
y: ACGCGAG
z: GCCGCGAG
Pairs (x, y), (x, z), (y, z):
x: ACGCGGC    x: ACGCGGC     y: ACGCGAG
y: ACGCGAG    z: GCCGCGAG    z: GCCGCGAG
[Figure: cube of dynamic-programming cells with corners (i-1,j-1,k-1), (i-1,j,k-1), (i-1,j-1,k), (i-1,j,k), (i,j,k-1), (i,j-1,k-1), (i,j-1,k), (i,j,k); axes x, y, z]

F(i,j,k) = max of:
    F(i-1,j-1,k-1) + S(xi, xj, xk),
    F(i-1,j-1,k  ) + S(xi, xj, - ),
    F(i-1,j  ,k-1) + S(xi, - , xk),
    F(i  ,j-1,k-1) + S(- , xj, xk),
    F(i-1,j  ,k  ) + S(xi, - , - ),
    F(i  ,j-1,k  ) + S(- , xj, - ),
    F(i  ,j  ,k-1) + S(- , - , xk)
Running Time: O(2^N R^(N-1) L)
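The seven-way recurrence for three sequences can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the column score S is assumed here to be a sum-of-pairs score with match = +1, mismatch = -1, gap = -1 and gap-versus-gap = 0, and the names `pair_score`, `column_score` and `align3_score` are hypothetical.

```python
from itertools import product

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score one pair of symbols (assumed scoring scheme, not from the slides)."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def column_score(a, b, c):
    """Assumed S(a, b, c): sum of the three pairwise scores in a column."""
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


def align3_score(x, y, z):
    """Best global alignment score of three sequences via the 3-D recurrence."""
    n, m, p = len(x), len(y), len(z)
    neg_inf = float("-inf")
    F = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    F[0][0][0] = 0
    # The 7 predecessor moves: every non-empty subset of sequences advances.
    moves = [d for d in product((0, 1), repeat=3) if any(d)]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(p + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # predecessor lies outside the cube
                    a = x[i - 1] if di else GAP
                    b = y[j - 1] if dj else GAP
                    c = z[k - 1] if dk else GAP
                    cand = F[pi][pj][pk] + column_score(a, b, c)
                    if cand > F[i][j][k]:
                        F[i][j][k] = cand
    return F[n][m][p]


score = align3_score("ACGCGGC", "ACGCGAG", "GCCGCGAG")
```

Each cell looks at 7 = 2^3 - 1 neighbours, which is where the exponential factor in the number of sequences comes from; the sketch returns only the score, and a traceback would be needed to recover the alignment itself.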
F (x) = P(X ≤ x) for −∞ < x < +∞
[Figure: events A and B]
P(A | B) = P(A ∩ B) / P(B)
=> P(A ∩ B) = P(B) * P(A | B)
Prob(even)
= Prob(even | d < 6) * Prob(d < 6)
+ Prob(even | d = 6) * Prob(d = 6)
= 2/5 * 0.5 + 1 * 0.5
= 0.7
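The same calculation can be checked numerically. The sketch below assumes a loaded die in which face 6 has probability 1/2 and faces 1 to 5 each have probability 1/10, which is the distribution implied by the 2/5 and 0.5 terms above; the helper names `prob` and `cond_prob` are illustrative only.

```python
from fractions import Fraction

# Assumed loaded die: P(6) = 1/2, the other five faces share the rest equally.
die = {face: Fraction(1, 10) for face in range(1, 6)}
die[6] = Fraction(1, 2)


def prob(event, dist):
    """P(event): sum the probabilities of the outcomes where event holds."""
    return sum(p for outcome, p in dist.items() if event(outcome))


def cond_prob(event, given, dist):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda d: event(d) and given(d), dist) / prob(given, dist)


def is_even(d):
    return d % 2 == 0


def below_six(d):
    return d < 6


def is_six(d):
    return d == 6


# Law of total probability over the partition {d < 6, d = 6}.
total = (cond_prob(is_even, below_six, die) * prob(below_six, die)
         + cond_prob(is_even, is_six, die) * prob(is_six, die))

# Direct computation for comparison.
direct = prob(is_even, die)
```

Both `total` and `direct` come out as 7/10, matching the 0.7 above.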
=> P(B | A) = P(A | B) * P(B) / P(A)

P(A | B): likelihood
P(B): prior of B
P(B | A): posterior probability of B given A
P(A): normalizing constant
This is known as Bayes' theorem or Bayes' rule, and is one of the most useful relations in probability and statistics
Bayes' theorem is the fundamental relation in Statistical Pattern Recognition
P(Bj | A) = P(A | Bj) * P(Bj) / Σj P(A | Bj) * P(Bj)
Bj: different models
Having observed A, should you choose the model that maximizes P(Bj | A) or P(A | Bj)? It depends on how much you know about Bj!
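The difference between the two choices is easy to see with numbers. The following is a small sketch with made-up likelihoods and priors (the values and the name `posteriors` are hypothetical, chosen only so the two criteria disagree): maximizing P(A | Bj) ignores the prior, while maximizing P(Bj | A) weighs it in.

```python
def posteriors(likelihoods, priors):
    """Return P(Bj | A) for every model Bj via Bayes' theorem."""
    joint = {m: likelihoods[m] * priors[m] for m in likelihoods}
    evidence = sum(joint.values())  # P(A) = sum_j P(A | Bj) * P(Bj)
    return {m: joint[m] / evidence for m in joint}


# Hypothetical values for illustration only.
likelihoods = {"B1": 0.8, "B2": 0.3}   # P(A | Bj)
priors = {"B1": 0.1, "B2": 0.9}        # P(Bj)

post = posteriors(likelihoods, priors)

# Maximum likelihood: pick the model with the largest P(A | Bj).
ml_choice = max(likelihoods, key=likelihoods.get)
# Maximum a posteriori: pick the model with the largest P(Bj | A).
map_choice = max(post, key=post.get)
```

With these numbers the likelihood favours B1 but the posterior favours B2, because B2 is much more probable a priori. When nothing is known about the priors (or they are taken to be equal), the two criteria pick the same model.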
P(g | m) = P(m | g) * P(g) / P(m)
         ~ P(g) / P(m)
P(i | m) = P(m | i) * P(i) / P(m)
         = 10^-6 * P(i) / P(m)
P(i | m) / P(g | m) = 10^-6 * P(i) / P(g)
log (P(model1 | observation) / P(model2 | observation))
= LLR + log P(model1) - log P(model2)
P(m) = P(m | i) * P(i) + P(m | g) * P(g)
     = 10^-6 * 1 + 1 * 10^-7
     = 1.1 x 10^-6
P(i | m) = P(m | i) * P(i) / P(m) = 1 / 1.1 = 0.91
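The arithmetic can be reproduced directly. The sketch below treats m as the observation and i, g as the two competing hypotheses, with P(m | i) = 10^-6, P(m | g) = 1 and P(g) = 10^-7 read off the numbers in this example; it uses P(i) = 1 - P(g) rather than rounding P(i) to 1, and the variable names are illustrative.

```python
import math

# Values taken from the worked example; the variable names are hypothetical.
p_m_given_i = 1e-6   # P(m | i)
p_m_given_g = 1.0    # P(m | g)
p_g = 1e-7           # P(g), prior of hypothesis g
p_i = 1.0 - p_g      # P(i), essentially 1

# Normalizing constant: P(m) = P(m | i) P(i) + P(m | g) P(g)
p_m = p_m_given_i * p_i + p_m_given_g * p_g

# Posteriors via Bayes' theorem.
p_i_given_m = p_m_given_i * p_i / p_m
p_g_given_m = p_m_given_g * p_g / p_m

# Log posterior odds = log-likelihood ratio + log prior ratio.
llr = math.log(p_m_given_i / p_m_given_g)
log_posterior_odds = llr + math.log(p_i) - math.log(p_g)
```

`p_i_given_m` comes out at about 0.91, and `log_posterior_odds` agrees with `math.log(p_i_given_m / p_g_given_m)`: a likelihood ratio of 10^-6 against i is almost cancelled by prior odds of about 10^7 in its favour.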
P(high score | unrelated) * P(unrelated)
P(high score | related) * P(related)
Likelihood ratio