How to find foreign genes? Markov Models: building the model
Training set: AAA AAC AAG AAT ACA . . . TTG TTT
Frequencies of the base following AAA: AAAA: 10%, AAAC: 15%, AAAG: 40%, AAAT: 35%
How to find foreign genes? Markov Models: 3rd order Markov model
Candidate gene: AAAACAA… (the first transition, AAA followed by A, has probability 0.10)

          A     C     G     T
AAA     0.10  0.15  0.40  0.35
AAC     0.25  0.45  0.25  0.05
AAG     0.25  0.20  0.30  0.25
AAT     0.25  0.20  0.30  0.25
ACA     0.15  0.20  0.25  0.40
. . .
TTG     0.20  0.50  0.05  0.25
TTT     0.10  0.55  0.25  0.10
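As a rough illustration of how such a table is read, the sketch below looks up the transition probability for each base of a short prefix of the candidate gene. Only the rows printed on the slide are included; the dictionary name `TABLE` and the helper `transition_probs` are hypothetical names chosen for this example, and contexts the slide elides are simply absent.

```python
# Rows copied from the slide's table; contexts the slide elides (". . .") are absent.
# Each row gives P(next base | previous three bases) in the column order A, C, G, T.
TABLE = {
    "AAA": (0.10, 0.15, 0.40, 0.35),
    "AAC": (0.25, 0.45, 0.25, 0.05),
    "AAG": (0.25, 0.20, 0.30, 0.25),
    "AAT": (0.25, 0.20, 0.30, 0.25),
    "ACA": (0.15, 0.20, 0.25, 0.40),
    "TTG": (0.20, 0.50, 0.05, 0.25),
    "TTT": (0.10, 0.55, 0.25, 0.10),
}
BASES = "ACGT"


def transition_probs(seq, table=TABLE, order=3):
    """Return P(seq[i] | seq[i-order:i]) for every position i >= order."""
    probs = []
    for i in range(order, len(seq)):
        context = seq[i - order:i]
        row = table[context]  # raises KeyError for contexts missing from the partial table
        probs.append(row[BASES.index(seq[i])])
    return probs


# First six bases of the candidate gene from the slide.
steps = transition_probs("AAAACA")
print(steps)  # one probability per base after the first three
```

The first value is the 0.10 shown next to the candidate gene: the probability of A given the preceding AAA.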
Markov Chains: a traffic light considered as a sequence of states
A trivial Markov chain: the transition probability between the states is always 1.
$P_{rg} = 1$, $P_{gy} = 1$, $P_{yr} = 1$
Markov Chains: a traffic light considered as a sequence of states
If we watch our traffic light, it will emit a string of states.
In the case of a simple Markov model, the state labels (e.g. green, red, yellow) are the observable outputs of the process.
Markov Chains: an occasionally malfunctioning traffic light!
$P_{gy} = 1$, $P_{yr} = 0.90$, $P_{yg} = 0.10$, $P_{rg} = 0.85$, $P_{ry} = 0.15$
The Markov property is that the probability of observing a given next state depends only on the current state!
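A minimal simulation sketch of this malfunctioning light, using the transition probabilities from the slide. The names `TRANSITIONS` and `simulate`, the starting state, and the random seed are assumptions made for illustration only.

```python
import random

# Transition probabilities from the slide; outer keys are the current state.
TRANSITIONS = {
    "g": {"y": 1.0},
    "y": {"r": 0.90, "g": 0.10},
    "r": {"g": 0.85, "y": 0.15},
}


def simulate(start, n_steps, rng):
    """Emit a string of states; each step depends only on the current state."""
    state = start
    path = [state]
    for _ in range(n_steps):
        options = TRANSITIONS[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return "".join(path)


# Assumption: start on green and use a fixed seed so the run is repeatable.
path = simulate("g", 20, random.Random(0))
print(path)
```

Note that the sampling step looks only at `state`, never at earlier entries of `path`, which is the Markov property in code form.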
Markov Chains: the Markov property
$a_{st} = P(x_i = t \mid x_{i-1} = s)$
English translation: the transition probability $a_{st}$ from state $s$ to state $t$ is equal to the probability that the $i$th state was $t$, given that the immediately preceding state ($x_{i-1}$) was $s$.
This is a form of conditional probability.
Markov Chains: an occasionally malfunctioning traffic light!
Now we can consider the probability of an observed sequence!
Markov Chains: what is the probability of a chain of events x?
$P(x) = P(x_L, x_{L-1}, \ldots, x_1)$
English translation: the probability of observing the sequence of states $x$ is equal to the probability that the $L$th state was $x_L$ AND the $(L-1)$th state was $x_{L-1}$, AND so on, down to the first state $x_1$.
This is a form of joint probability.
Markov Chains: what is the probability of a chain of events x?
$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$
This is because $P(X, Y) = P(X \mid Y)\, P(Y)$.
English translation: the probability of events X AND Y happening is equal to the probability of X happening given that Y has already happened, times the probability of event Y.
Markov Chains: what is the probability of a chain of events x?
$P(x) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$
But remember, the key property of a Markov chain is that the probability of symbol $x_i$ depends ONLY on the value of the preceding symbol $x_{i-1}$!
Therefore:
$P(x) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$
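The product formula can be sketched in a few lines, here applied to the malfunctioning traffic light. The slides do not give an initial distribution $P(x_1)$, so the `INITIAL` values below (always start on green) are an assumption for this example, as are the names `chain_probability`, `TRANSITIONS`, and `INITIAL`.

```python
# Transition probabilities of the malfunctioning traffic light (from the slides).
TRANSITIONS = {
    "g": {"y": 1.0},
    "y": {"r": 0.90, "g": 0.10},
    "r": {"g": 0.85, "y": 0.15},
}
# Assumption: the slides give no P(x_1); here the light always starts on green.
INITIAL = {"g": 1.0, "y": 0.0, "r": 0.0}


def chain_probability(seq, initial, transitions):
    """P(x) = P(x_1) * product over i = 2..L of a[x_{i-1}][x_i]."""
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        # A transition that is not listed has probability 0.
        p *= transitions[prev].get(cur, 0.0)
    return p


p_normal = chain_probability("gyrg", INITIAL, TRANSITIONS)  # 1 * 1 * 0.90 * 0.85
p_glitch = chain_probability("gyg", INITIAL, TRANSITIONS)   # 1 * 1 * 0.10
print(p_normal, p_glitch)
```

For long sequences the product underflows quickly, so summing logarithms of the factors is a common alternative.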
Markov Chains: how about nucleic acid sequences?
States: A, C, T, G
There is no reason why nucleic acid sequences found in an organism cannot be modeled using Markov chains.
Markov Model: what do we need to probabilistically model DNA sequences?
States (A, C, T, G) and transition probabilities.
The states are the same for all organisms, so the transition probabilities are the model parameters we need to estimate.
Building the Markov Model: parameter estimation
Training set: AAA AAC AAG AAT ACA . . . TTG TTT
Frequencies of the base following AAA: AAAA: 10%, AAAC: 15%, AAAG: 40%, AAAT: 35%
This is a maximum likelihood approach to parameter estimation. Such procedures maximize the overall probability of the training set data.
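A minimal counting sketch of this estimation step: for each three-base context, the maximum likelihood estimate of a transition probability is the observed fraction of times each base follows that context. The function name `estimate_transitions` and the toy training set are made up for illustration and are not data from the slides.

```python
from collections import Counter, defaultdict


def estimate_transitions(training_seqs, order=3, alphabet="ACGT"):
    """Estimate P(next base | previous `order` bases) by relative frequency.

    Assumes the sequences contain only characters from `alphabet`.
    """
    counts = defaultdict(Counter)
    for seq in training_seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        model[context] = {base: next_counts[base] / total for base in alphabet}
    return model


# Toy training set (hypothetical): the context AAA is followed by G, G, T, A.
toy_training_set = ["AAAG", "AAAG", "AAAT", "AAAA"]
model = estimate_transitions(toy_training_set)
print(model["AAA"])
```

Contexts never seen in training get no row at all, and unseen transitions get probability 0; adding pseudocounts is one common way to soften this, though the slides do not say whether that was done.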
Markov Model: which model best explains a newly observed sequence?
[Figure: two Markov chains over the states A, C, G, T, one for Organism A and one for Organism B]
Each organism will have different transition probability parameters, so you can ask: "Was the sequence more likely to be generated by model A or by model B?"
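Markov Model: which model best explains a newly observed sequence?
A commonly used metric for discrimination using Markov chains is the log-odds ratio:
$S(x) = \log \frac{P(x \mid \text{model A})}{P(x \mid \text{model B})} = \sum_{i=1}^{L} \log \frac{a^{A}_{x_{i-1} x_i}}{a^{B}_{x_{i-1} x_i}}$
With the sum starting at $i = 1$, $x_0$ is read as a begin state, so the first term accounts for the first symbol.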
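A small sketch of the log-odds score for first-order models. Both transition tables are invented toy values, not parameters from any real organism, and the names `log_odds`, `MODEL_A`, and `MODEL_B` are hypothetical. For simplicity the sum covers only the transitions between observed symbols, which amounts to assuming that the begin-state term is the same under both models.

```python
import math

BASES = "ACGT"
# Toy model A (hypothetical): each base tends to repeat itself.
MODEL_A = {s: {t: (0.4 if s == t else 0.2) for t in BASES} for s in BASES}
# Toy model B (hypothetical): all transitions equally likely.
MODEL_B = {s: {t: 0.25 for t in BASES} for s in BASES}


def log_odds(seq, model_a, model_b):
    """Sum of log(a^A / a^B) over consecutive symbol pairs.

    Assumes every transition probability is strictly positive in both models.
    """
    return sum(
        math.log(model_a[prev][cur] / model_b[prev][cur])
        for prev, cur in zip(seq, seq[1:])
    )


score_repeat = log_odds("AAAA", MODEL_A, MODEL_B)  # positive: favours model A
score_mixed = log_odds("ACGT", MODEL_A, MODEL_B)   # negative: favours model B
print(score_repeat, score_mixed)
```

A positive score means the sequence is better explained by model A, a negative score by model B, and a score near zero means the two models are hard to tell apart on that sequence.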