194 Views

Download Presentation
## Lirong Xia

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Hidden Markov Models**Lirong Xia Tue, March 28, 2014**The “Markov”s we have learned so far**• Markov decision process (MDP) • transition probability only depends on (state,action) in the previous step • Reinforcement learning • unknown probability/rewards • Markov models • Hidden Markov models**Markov Models**• A Markov model is a chain-structured BN • Conditional probabilities are the same (stationarity) • Value of X at a given time is called the state • As a BN: • Parameters: called transition probabilities p(X1) p(X|X-1)**Computing the stationary distribution**• p(X=sun)=p(X=sun|X-1=sun)p(X=sun)+ p(X=sun|X-1=rain)p(X=rain) • p(X=rain)=p(X=rain|X-1=sun)p(X=sun)+ p(X=rain|X-1=rain)p(X=rain)**Hidden Markov Models**• Hidden Markov models (HMMs) • Underlying Markov chain over state X • Effects(observations) at each time step • As a Bayes’ net:**Example**• An HMM is defined by: • Initial distribution: p(X1) • Transitions: p(X|X-1) • Emissions: p(E|X)**Filtering / Monitoring**• Filtering, or monitoring, is the task of tracking the distribution B(X) (the belief state) over time • B(Xt) = p(Xt|e1:t) • We start with B(X) in an initial setting, usually uniform • As time passes, or we get observations, we update B(X)**Example: Robot Localization**Sensor model: never more than 1 mistake Motion model: may not execute action with small prob.**HMM weather example: a question**.6 p(w|s) = .1 p(w|c) = .3 p(w|r) = .8 s .1 .3 .4 .3 .2 c r .3 .3 .5 • You have been stuck in the lab for three days (!) • On those days, your labmate was dry, wet, wet, respectively • What is the probability that it is now raining outside? • p(X3 = r | E1= d, E2 = w, E3 = w)**Filtering**.6 p(w|s) = .1 p(w|c) = .3 p(w|r) = .8 s .1 .3 .4 .3 .2 c r .3 .3 .5 • Computationally efficient approach: first compute • p(X1 = i, E1 = d) for all states i • p(Xt, e1:t) = p(et | Xt)Σxt-1 p(xt-1, e1:t-1) p(Xt| xt-1)**Today**• Formal algorithm for filtering • Elapse of time • compute p(Xt+1|Xt,e1:t) from p(Xt|e1:t) • Observe • compute p(Xt+1|e1:t+1) from p(Xt+1|e1:t) • Renormalization • Introduction to sampling**Elapse of Time**• Assume we have current belief p(Xt-1|evidence to t-1) B(Xt-1)=p(Xt-1|e1:t-1) • Then, after one time step passes: p(Xt|e1:t-1)=Σxt-1p(Xt|xt-1)p(Xt-1|e1:t-1) • Or, compactly B’(Xt)=Σxt-1p(Xt|xt-1)B(xt-1) • With the “B” notation, be careful about • what time step t the belief is about, • what evidence it includes**Observe and renormalization**• Assume we have current belief p(Xt| previous evidence): B’(Xt)=p(Xt|e1:t-1) • Then: p(Xt|e1:t)∝p(et|Xt)p(Xt|e1:t-1) • Or: B(Xt) ∝p(et|Xt)B’(Xt) • Basic idea: beliefs reweighted by likelihood of evidence • Need to renormalize B(Xt)**Recap: The Forward Algorithm**• We are given evidence at each time and want to know • We can derive the following updates We can normalize as we go if we want to have p(x|e) at each time step, or just once at the end…**Observe and time elapse**• Want to know B(Rain2)=p(Rain2|+u1,+u2) Time elapse and renormalize Observe**Online Belief Updates**• Each time step, we start with p(Xt-1 | previous evidence): • Elapse of time B’(Xt)=Σxt-1p(Xt|xt-1)B(xt-1) • Observe B(Xt) ∝p(et|Xt)B’(Xt) • Renormalize B(Xt) • Problem: space is |X| and time is |X|2 per time step • what if the state is continuous?**Continuous probability space**• Real-world robot localization**Approximate Inference**• Sampling is a hot topic in machine learning, and it’s really simple • Basic idea: • Draw N samples from a sampling distribution S • Compute an approximate posterior probability • Show this converges to the true probability P • Why sample? • Learning: get samples from a distribution you don’t know • Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)**Prior Sampling**Samples: +c, -s, +r, +w -c, +s, -r, +w**Prior Sampling (w/o evidences)**• This process generates samples with probability: i.e. the BN’s joint probability • Let the number of samples of an event be • Then • I.e., the sampling procedure is consistent**Example**• We’ll get a bunch of samples from the BN: +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w • If we want to p(W) • We have counts <+w:4, -w:1> • Normalize to get p(W) = <+w:0.8, -w:0.2> • This will get closer to the true distribution with more samples • Can estimate anything else, too • What about p(C|+w)? p(C|+r,+w)? p(C|-r,-w)? • Fast: can use fewer samples if less time (what’s the drawback?)**Rejection Sampling**• Let’s say we want p(C) • No point keeping all samples around • Just tally counts of C as we go • Let’s say we want p(C|+s) • Same thing: tally C outcomes, but ignore (reject) samples which don’t have S=+s • This is called rejection sampling • It is also consistent for conditional probabilities (i.e., correct in the limit) +c, -s, +r, +w +c, +s, +r, +w -c, +s, +r, -w +c, -s, +r, +w -c, -s, -r, +w**Likelihood Weighting**• Problem with rejection sampling: • If evidence is unlikely, you reject a lot of samples • You don’t exploit your evidence as you sample • Consider p(B|+a) • Idea: fix evidence variables and sample the rest • Problem: sample distribution not consistent! • Solution: weight by probability of evidence given parents -b, -a -b, -a -b, -a -b, -a +b, +a -b, +a -b, +a -b, +a -b, +a +b, +a**Likelihood Weighting**Samples: +c, +s, +r, +w ……**Likelihood Weighting**• Sampling distribution if z sampled and e fixed evidence • now, samples have weights • Together, weighted sampling distribution is consistent**Ghostbusters HMM**• p(X1) = uniform • p(X|X’) = usually move clockwise, but sometimes move in a random direction or stay in place • p(Rij|X) = same sensor model as before: red means close, green means far away. p(X1) p(X|X’=<1,2>)**Example: Passage of Time**• As time passes, uncertainty “accumulates” T = 1 T = 2 T= 5 Transition model: ghosts usually go clockwise**Example: Observation**• As we get observations, beliefs get reweighted, uncertainty “decreases” Before observation After observation