Temporal Probabilistic Models
Motivation
  • Observing a stream of data
    • Monitoring (of people, computer systems, etc)
    • Surveillance, tracking
    • Finance & economics
    • Science
  • Questions:
    • Modeling & forecasting
    • Unobserved variables
Time Series Modeling
  • Time occurs in steps t=0,1,2,…
    • Time step can be seconds, days, years, etc
  • State variable Xt, t=0,1,2,…
  • For partially observed problems, we see observations Ot, t=1,2,… and do not see the X’s
    • X’s are hidden variables (aka latent variables)
Modeling Time
  • Arrow of time
  • Causality? Bayesian networks to the rescue

[Diagram: Causes → Effects]

Probabilistic Modeling
  • For now, assume fully observable case
  • What parents?

[Diagram: candidate parent structures over X0, X1, X2, X3]

Markov Assumption
  • Assume X_{t+k} is independent of all X_i for i < t:
    P(X_{t+k} | X_0, …, X_{t+k-1}) = P(X_{t+k} | X_t, …, X_{t+k-1})
  • k-th order Markov chain

[Diagrams: dependency structures for Markov chains of order 0, 1, 2, and 3 over X0 … X3]


1st order Markov Chain
  • MCs of order k > 1 can be converted into a 1st order MC [left as exercise]
  • So w.o.l.o.g., “MC” refers to a 1st order MC
Inference in MC
  • What independence relationships can we read from the BN?

X0

X1

X2

X3

  • Observe X1: then X0 is independent of X2, X3, …
  • P(X_t|X_{t-1}) is known as the transition model

Inference in MC
  • Prediction: what is the probability of a future state?
    P(X_t) = Σ_{x_0,…,x_{t-1}} P(x_0,…,x_{t-1}, X_t) = Σ_{x_0,…,x_{t-1}} P(x_0) Π_{i=1}^{t} P(x_i|x_{i-1}) = Σ_{x_{t-1}} P(X_t|x_{t-1}) P(x_{t-1})
  • “Blurs” over time, and approaches stationary distribution as t grows
    • Limited prediction power
    • Rate of blurring known as mixing time

[Incremental approach]
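A minimal sketch of this incremental approach in Python/numpy, assuming a discrete state space with transition matrix T (the function name predict_mc and the example numbers are illustrative, not from the slides):

```python
import numpy as np

def predict_mc(prior, T, t):
    """Incremental prediction for a discrete 1st-order Markov chain.

    prior : P(X0) as a 1-D array over states
    T     : transition matrix, T[i, j] = P(X_t = j | X_{t-1} = i)
    Applies P(X_t) = sum_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1}) t times.
    """
    p = np.asarray(prior, dtype=float)
    for _ in range(t):
        p = T.T @ p  # one application of the transition model
    return p

# The prediction "blurs" toward the stationary distribution as t grows:
T = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(predict_mc([1.0, 0.0], T, 1))    # one-step prediction
print(predict_mc([1.0, 0.0], T, 50))   # approximately the stationary distribution
```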

How does the Markov assumption affect the choice of state?
  • Suppose we’re tracking a point (x,y) in 2D
  • What if the point is…
    • A momentumless particle subject to thermal vibration?
    • A particle with velocity?
    • A particle with intent, like a person?
How does the Markov assumption affect the choice of state?
  • Suppose the point is the position of our robot, and we observe velocity and intent
  • What if:
    • Terrain conditions affect speed?
    • Battery level affects speed?
    • Position is noisy, e.g. GPS?
Is the Markov assumption appropriate for:
  • A car on a slippery road?
  • Sales of toothpaste?
  • The stock market?
History Dependence
  • In Markov models, the state must be chosen so that the future is independent of history given the current state
  • Often this requires adding variables that cannot be directly observed
Partial Observability
  • Hidden Markov Model (HMM)

[Diagram: hidden state variables X0 → X1 → X2 → X3; observed variables O1, O2, O3, with each Ot depending on Xt]

P(O_t|X_t) is called the observation model (or sensor model)
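For the discrete examples on the following slides it helps to have a concrete HMM to point at. Here is a toy parameterization in Python/numpy; the names prior, T, Obs and all the numbers are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Toy discrete HMM: 2 hidden states, 3 possible observation symbols.
prior = np.array([0.5, 0.5])          # P(X0)
T = np.array([[0.7, 0.3],             # T[i, j]   = P(X_{t+1} = j | X_t = i)   (transition model)
              [0.2, 0.8]])
Obs = np.array([[0.6, 0.3, 0.1],      # Obs[i, k] = P(O_t = k | X_t = i)       (observation model)
                [0.1, 0.3, 0.6]])
```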

Inference in HMMs
  • Filtering
  • Prediction
  • Smoothing, aka hindsight
  • Most likely explanation

[Diagram: HMM with hidden chain X0 → X1 → X2 → X3 and observations O1, O2, O3]

Filtering
  • Name comes from signal processing
  • P(X_t|o_{1:t}) = Σ_{x_{t-1}} P(x_{t-1}|o_{1:t-1}) P(X_t|x_{t-1}, o_t)
  • P(X_t|x_{t-1}, o_t) = P(o_t|x_{t-1}, X_t) P(X_t|x_{t-1}) / P(o_t|x_{t-1}) = α P(o_t|X_t) P(X_t|x_{t-1})

  • P(X_t|o_{1:t}) = α Σ_{x_{t-1}} P(x_{t-1}|o_{1:t-1}) P(o_t|X_t) P(X_t|x_{t-1})
  • Forward recursion (see the sketch below)
  • If we keep track of P(X_t|o_{1:t}), each new observation costs only O(1) work per time step, independent of t!
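A sketch of the forward recursion for a discrete HMM, reusing the toy prior/T/Obs parameterization above (the function name forward_filter is an illustrative choice):

```python
import numpy as np

def forward_filter(prior, T, Obs, observations):
    """Forward recursion: returns [P(X_1|o_1), ..., P(X_t|o_{1:t})].

    prior        : P(X0)
    T[i, j]      : P(X_{t+1} = j | X_t = i)
    Obs[i, k]    : P(O_t = k | X_t = i)
    observations : list of observation indices o_1, ..., o_t
    """
    belief = np.asarray(prior, dtype=float)
    beliefs = []
    for o in observations:
        # Predict with the transition model, then weight by P(o_t | X_t)
        belief = Obs[:, o] * (T.T @ belief)
        belief /= belief.sum()        # normalization plays the role of alpha
        beliefs.append(belief.copy())
    return beliefs

# e.g. with the toy HMM above: forward_filter(prior, T, Obs, [0, 2, 2])
```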

Prediction
  • P(X_{t+k}|o_{1:t})
  • Two steps: compute P(X_t|o_{1:t}), then P(X_{t+k}|X_t)
  • Filter, then predict as with a standard MC (see the sketch below)
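A short sketch of this two-step recipe, reusing the forward_filter sketch above (illustrative, not from the slides):

```python
def predict_k_steps(prior, T, Obs, observations, k):
    """P(X_{t+k} | o_{1:t}): filter to time t, then apply the transition
    model k more times with no new observations."""
    belief = forward_filter(prior, T, Obs, observations)[-1]   # P(X_t | o_{1:t})
    for _ in range(k):
        belief = T.T @ belief
    return belief
```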


Smoothing
  • P(X_k|o_{1:t}) for k < t
  • P(X_k|o_{1:k}, o_{k+1:t}) = P(o_{k+1:t}|X_k, o_{1:k}) P(X_k|o_{1:k}) / P(o_{k+1:t}|o_{1:k}) = α P(o_{k+1:t}|X_k) P(X_k|o_{1:k})
  • P(X_k|o_{1:k}) comes from standard filtering to time k


Smoothing: backward recursion
  • Computing P(o_{k+1:t}|X_k)
  • P(o_{k+1:t}|X_k) = Σ_{x_{k+1}} P(o_{k+1:t}|X_k, x_{k+1}) P(x_{k+1}|X_k) = Σ_{x_{k+1}} P(o_{k+1:t}|x_{k+1}) P(x_{k+1}|X_k) = Σ_{x_{k+1}} P(o_{k+2:t}|x_{k+1}) P(o_{k+1}|x_{k+1}) P(x_{k+1}|X_k)
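A sketch of the backward recursion and the resulting smoother on the same discrete parameterization, reusing the forward_filter sketch from the filtering slides (the function names are illustrative):

```python
import numpy as np

def backward_messages(T, Obs, observations):
    """Backward recursion: returns b[k] = P(o_{k+1:t} | X_k) for k = 0..t."""
    n = T.shape[0]
    t = len(observations)
    b = [np.ones(n) for _ in range(t + 1)]        # base case: b[t] = 1
    for k in range(t - 1, -1, -1):
        o = observations[k]                       # observations[k] holds o_{k+1}
        b[k] = T @ (Obs[:, o] * b[k + 1])
    return b

def smooth(prior, T, Obs, observations, k):
    """P(X_k | o_{1:t}) = alpha * P(o_{k+1:t} | X_k) * P(X_k | o_{1:k}), for 1 <= k <= t."""
    forward = forward_filter(prior, T, Obs, observations)[k - 1]   # P(X_k | o_{1:k})
    backward = backward_messages(T, Obs, observations)[k]          # P(o_{k+1:t} | X_k)
    post = forward * backward
    return post / post.sum()
```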

[Diagram: HMM with states X0 … X3 and observations O1 … O3. Given prior states, what is the probability of this sequence?]

Most likely explanation: the query returns a path through state space x0,…,x3

MLE: Viterbi Algorithm
  • Recursive computation of the maximum likelihood of a path ending in each x_t in Val(X_t) (see the sketch below)
  • m_t(X_t) = max_{x_{1:t-1}} P(x_1,…,x_{t-1},X_t|o_{1:t}) = α P(o_t|X_t) max_{x_{t-1}} P(X_t|x_{t-1}) m_{t-1}(x_{t-1})
  • Previous ML state: argmax_{x_{t-1}} P(X_t|x_{t-1}) m_{t-1}(x_{t-1})
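A sketch of the Viterbi recursion on the same discrete parameterization; renormalizing m_t plays the role of alpha and does not change the argmax (names are illustrative):

```python
import numpy as np

def viterbi(prior, T, Obs, observations):
    """Most likely state sequence [x_0, x_1, ..., x_t] for a discrete HMM."""
    m = np.asarray(prior, dtype=float)   # m over X0
    back = []                            # backpointers per step
    for o in observations:
        scores = m[:, None] * T          # scores[x_prev, x] = m(x_prev) * P(x | x_prev)
        back.append(scores.argmax(axis=0))   # previous ML state for each current state
        m = Obs[:, o] * scores.max(axis=0)   # m_t(x) = P(o_t|x) * max_{x_prev} P(x|x_prev) m_{t-1}(x_prev)
        m /= m.sum()                     # optional renormalization (the alpha)
    path = [int(np.argmax(m))]
    for bp in reversed(back):            # follow backpointers from x_t back to x_0
        path.append(int(bp[path[-1]]))
    return path[::-1]
```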
Applications of HMMs in NLP
  • Speech recognition
  • Hidden phones (e.g., ah eh ee th r)
  • Observed, noisy acoustic features (produced by signal processing)
Phone Observation Models
[Diagram: Phone_t → Features_t; signal processing produces feature vectors, e.g. (24, 13, 3, 59)]
  • Model defined to be robust over variations in accent, speed, pitch, noise

Phone Transition Models
[Diagram: Phone_t → Phone_{t+1}, with Features_t observed from Phone_t]
  • Good models will capture (among other things):
    • Pronunciation of words
    • Subphone structure
    • Coarticulation effects
  • Triphone models = order-3 Markov chain

Word Segmentation
  • Words run together when pronounced
  • Unigrams P(w_i)
  • Bigrams P(w_i|w_{i-1})
  • Trigrams P(w_i|w_{i-1}, w_{i-2})

Random 20 word samples from R&N using N-gram models

Logical are as confusion a may right tries agent goal the was diesel more object then information-gathering search is

Planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic tac toe a predicate

Planning and scheduling are integrated the success of naïve bayes model is just a possible prior source by that time
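A toy count-based bigram model P(w_i | w_{i-1}) can be trained and sampled in a few lines. This is purely illustrative (hypothetical helper names, no smoothing), not the model that produced the R&N samples above:

```python
import random
from collections import defaultdict

def train_bigrams(tokens):
    """Count followers of each word: counts[w_prev][w] is proportional to P(w | w_prev)."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, word in zip(tokens, tokens[1:]):
        counts[prev][word] += 1
    return counts

def sample_bigrams(counts, start, n=20):
    """Generate up to n words, sampling each word given the previous one."""
    out = [start]
    for _ in range(n - 1):
        followers = counts[out[-1]]
        if not followers:
            break
        words, freqs = zip(*followers.items())
        out.append(random.choices(words, weights=freqs)[0])
    return " ".join(out)
```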

Tricks to improve recognition
  • Narrow the # of variables
    • Digits, yes/no, phone tree
  • Training with real user data
    • Real story: “Yes ma’am”
Kalman Filtering
  • In a nutshell
    • Efficient filtering in continuous state spaces
    • Gaussian transition and observation models
  • Ubiquitous for tracking with noisy sensors, e.g. radar, GPS, cameras

Hidden Markov Model for Robot Localization
  • Use observations to get a better idea of where the robot is at time t

[Diagram: hidden states X0 → X1 → X2 → X3; observed variables z1, z2, z3]

Predict – observe – predict – observe…

Linear Gaussian Transition Model
  • Consider position and velocity x_t, v_t
  • Time step h
  • Without noise: x_{t+1} = x_t + h v_t, v_{t+1} = v_t
  • With Gaussian noise of std σ_1:
    P(x_{t+1}|x_t) ∝ exp(-(x_{t+1} - (x_t + h v_t))^2 / (2σ_1^2))
    i.e. x_{t+1} ~ N(x_t + h v_t, σ_1^2)


Linear Gaussian Transition Model
  • If prior on position is Gaussian, then the posterior is also Gaussian

N(μ, σ^2) → N(μ + vh, σ^2 + σ_1^2)

Linear Gaussian Observation Model
  • Position observation z_t
  • Gaussian noise of std σ_2:
    z_t ~ N(x_t, σ_2^2)

Linear Gaussian Observation Model
  • If the prior on position is Gaussian, then the posterior is also Gaussian
[Plot: position prior, observation probability, and posterior probability]
  • Posterior mean: μ' = (σ^2 z + σ_2^2 μ) / (σ^2 + σ_2^2)
  • Posterior variance: σ'^2 = σ^2 σ_2^2 / (σ^2 + σ_2^2)

Multivariate Case
  • Transition matrix F, transition noise covariance Σ_x
  • Observation matrix H, observation noise covariance Σ_z
    μ_{t+1} = F μ_t + K_{t+1} (z_{t+1} - H F μ_t)
    Σ_{t+1} = (I - K_{t+1} H)(F Σ_t F^T + Σ_x)
    where K_{t+1} = (F Σ_t F^T + Σ_x) H^T (H (F Σ_t F^T + Σ_x) H^T + Σ_z)^{-1}
  • Got that memorized?
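A direct transcription of these equations as a numpy sketch (kalman_step is an illustrative name; the explicit matrix inverse is kept for readability, a real implementation would use a linear solver):

```python
import numpy as np

def kalman_step(mu, Sigma, z, F, H, Sx, Sz):
    """One multivariate Kalman filter predict/update cycle.

    mu, Sigma : current belief N(mu, Sigma)
    F, Sx     : transition matrix and transition noise covariance
    H, Sz     : observation matrix and observation noise covariance
    z         : new observation
    """
    # Predict
    mu_pred = F @ mu
    P = F @ Sigma @ F.T + Sx
    # Kalman gain
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Sz)
    # Update
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    Sigma_new = (np.eye(len(mu)) - K @ H) @ P
    return mu_new, Sigma_new
```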
Properties of Kalman Filter
  • Optimal Bayesian estimate for linear Gaussian transition/observation models
  • Need estimates of covariance… model identification necessary
  • Extensions to nonlinear systems
    • Extended Kalman Filter: linearize models
    • Unscented Kalman Filter: pass sample points through the nonlinear model to reconstruct a Gaussian
    • Work as long as systems aren’t too nonlinear
Non-Gaussian distributions
  • Gaussian distributions are a “lump”

[Plot: Kalman filter estimate]

Non-Gaussian distributions
  • Integrating continuous and discrete states

[Diagram: a distribution splitting into two modes after a binary “up”/“down” choice]

Example: Failure detection
  • Consider a battery meter sensor
    • Battery = true level of battery
    • BMeter = sensor reading
  • Transient failures: send garbage at time t
    • 5555500555…
  • Persistent failures: sensor is broken
    • 5555500000…
Dynamic Bayesian Network
(Think of this structure “unrolled” forever…)
[Diagram: Battery_{t-1} → Battery_t → BMeter_t]
BMeter_t ~ N(Battery_t, σ)

Dynamic Bayesian Network: transient failure model
[Diagram: same network, Battery_{t-1} → Battery_t → BMeter_t]
BMeter_t ~ N(Battery_t, σ)
P(BMeter_t = 0 | Battery_t = 5) = 0.03

Results on Transient Failure
[Plot: E(Battery_t) over time as the meter reads 55555005555…, with and without the transient failure model; a transient failure occurs mid-sequence]

Results on Persistent Failure
[Plot: E(Battery_t) over time as the meter reads 5555500000…, with the transient failure model; a persistent failure occurs]

Persistent Failure Model
  • Example of a Dynamic Bayesian Network (DBN)
[Diagram: Broken_{t-1} → Broken_t and Battery_{t-1} → Battery_t, with BMeter_t depending on both Broken_t and Battery_t]
BMeter_t ~ N(Battery_t, σ)
P(BMeter_t = 0 | Battery_t = 5) = 0.03
P(BMeter_t = 0 | Broken_t) = 1

Results on Persistent Failure
[Plot: E(Battery_t) over time as the meter reads 5555500000…, comparing the transient-only model with the persistent failure model; a persistent failure occurs]

How to perform inference on DBN?
  • Exact inference on “unrolled” BN
    • Variable Elimination – eliminate old time steps
    • After a few time steps, all variables in the state space become dependent!
    • Lost sparsity structure
  • Approximate inference
    • Particle Filtering
Particle Filtering (aka Sequential Monte Carlo)
  • Represent distributions as a set of particles
  • Applicable to non-Gaussian, high-dimensional distributions
  • Convenient implementations
  • Widely used in vision, robotics
Particle Representation
  • Bel(x_t) = {(w_k, x_k)}
  • w_k are weights, x_k are state hypotheses
  • Weights sum to 1
  • Approximates the underlying distribution
Particle Filtering
  • Represent a distribution at time t as a set of N “particles” S_t^1, …, S_t^N
  • Repeat for t = 0, 1, 2, …
    • Sample S[i] from P(X_{t+1} | X_t = S_t^i) for all i
    • Compute weight w[i] = P(e | X_{t+1} = S[i]) for all i
    • Sample S_{t+1}^i from S[·] according to weights w[·] (weighted resampling step)
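A sketch of one particle filter step following the three steps above. Here sample_transition and obs_likelihood are hypothetical model callbacks (one sample from the transition model, and the observation likelihood), not functions defined in the slides:

```python
import numpy as np

def particle_filter_step(particles, observation, sample_transition, obs_likelihood, rng=None):
    """Propagate, weight, and resample a set of particles.

    particles         : list of state hypotheses S_t^1 ... S_t^N
    sample_transition : x_t -> one sample from P(X_{t+1} | X_t = x_t)
    obs_likelihood    : (o, x) -> P(o | X_{t+1} = x)
    """
    if rng is None:
        rng = np.random.default_rng()
    # 1. Sample S[i] from P(X_{t+1} | X_t = S_t^i) for all i
    proposals = [sample_transition(x) for x in particles]
    # 2. Compute weight w[i] = P(o | X_{t+1} = S[i]) for all i
    w = np.array([obs_likelihood(observation, x) for x in proposals], dtype=float)
    w /= w.sum()
    # 3. Weighted resampling: draw N new particles with replacement according to w
    idx = rng.choice(len(proposals), size=len(proposals), p=w)
    return [proposals[i] for i in idx]
```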
Battery Example
[Diagram sequence: particle filtering on the failure-model DBN, with particles over (Broken_t, Battery_t)]
  • Sampling step: propagate each particle through the transition model
  • Suppose we now observe BMeter = 0
  • Compute weights P(BMeter = 0 | sample): 0.03 or 1 depending on the particle, drawn as particle size
  • Weighted resampling
  • Sampling step again
  • Now observe BMeter_t = 5
  • Compute weights: 1 or 0 depending on the particle
  • Weighted resample

Applications of Particle Filtering in Robotics
  • Simultaneous Localization and Mapping (SLAM)
  • Observations: laser rangefinder
  • State variables: position, walls
Simultaneous Localization and Mapping (SLAM)
  • Mobile robots
  • Odometry
    • Locally accurate
    • Drifts significantly over time
  • Vision/ladar/sonar
    • Inaccurate locally
    • Global reference frame
  • Combine the two
    • State: (robot pose, map)
    • Observations: (sensor input)
Couple of Plugs
  • CSCI B553
  • CSCI B659: Principles of Intelligent Robot Motion
    • http://cs.indiana.edu/classes/b659-hauserk
  • CSCI B657: Computer Vision
    • David Crandall/Chen Yu
Next Time
  • Learning distributions from data
  • Read R&N 20.1-3
MLE: Viterbi Algorithm (revisited)
  • Recall: m_t(X_t) = max_{x_{1:t-1}} P(x_1,…,x_{t-1},X_t|o_{1:t}) = α P(o_t|X_t) max_{x_{t-1}} P(X_t|x_{t-1}) m_{t-1}(x_{t-1})
  • Previous ML state: argmax_{x_{t-1}} P(X_t|x_{t-1}) m_{t-1}(x_{t-1})

Does this sound familiar?

MLE: Viterbi Algorithm
  • Do the “logarithm trick”
  • log m_t(X_t) = log α P(o_t|X_t) + max_{x_{t-1}} [ log P(X_t|x_{t-1}) + log m_{t-1}(x_{t-1}) ]
  • View:
    • log α P(o_t|X_t) as a reward
    • log P(X_t|x_{t-1}) as a cost
    • log m_t(X_t) as a value function
  • Bellman equation
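A log-space variant of the earlier viterbi sketch makes the connection concrete: the max-product recursion becomes a max-sum, Bellman-style backup over log-probabilities (illustrative, and it assumes strictly positive probabilities so the logs are finite):

```python
import numpy as np

def viterbi_log(prior, T, Obs, observations):
    """Log-space Viterbi: same backpointers as the probability-space version,
    but the recursion is a max-sum backup, like a Bellman equation."""
    logT, logObs = np.log(T), np.log(Obs)
    m = np.log(np.asarray(prior, dtype=float))      # log m_0
    back = []
    for o in observations:
        scores = m[:, None] + logT                  # log m_{t-1}(x') + log P(x | x')  (the "cost")
        back.append(scores.argmax(axis=0))
        m = logObs[:, o] + scores.max(axis=0)       # + log P(o_t | x)                 (the "reward")
    path = [int(np.argmax(m))]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]
```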