Inference V: MCMC Methods


PowerPoint Slideshow about ' Inference V: MCMC Methods' - shiloh




Stochastic Sampling

  • In the previous class, we examined methods that use independent samples to estimate P(X = x | e)

    Problem: It is difficult to sample directly from P(X1, …, Xn | e)

  • We had to use likelihood weighting to reweight our samples

  • This introduced bias into the estimate

  • In some cases, such as when the evidence is on the leaves, these methods are inefficient

MCMC Methods

  • We are going to discuss sampling methods that are based on Markov chains

    • Markov Chain Monte Carlo (MCMC) methods

  • Key ideas:

    • Sampling process as a Markov Chain

      • Next sample depends on the previous one

    • Run long enough, the chain produces samples that approximate any desired posterior distribution

  • We start by reviewing key ideas from the theory of Markov chains

Markov Chains

  • Suppose X1, X2, … take values in some set

    • Without loss of generality, these values are 1, 2, …

  • A Markov chain is a process that corresponds to the network X1 → X2 → X3 → …

  • To quantify the chain, we need to specify

    • Initial probability: P(X1)

    • Transition probability: P(Xt+1|Xt)

  • A Markov chain has stationary transition probabilities if

    • P(Xt+1|Xt) is the same for all times t
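The two ingredients above (an initial probability and a stationary transition probability) are enough to simulate a chain. The following is a minimal sketch, assuming a hypothetical two-state chain whose numbers are made up for illustration. States are numbered from 0 rather than 1 to match Python indexing, and `simulate_chain` is an illustrative name, not something from the slides.

```python
import random

# Hypothetical two-state chain (illustrative numbers only).
INITIAL = [1.0, 0.0]                 # P(X1)
TRANSITION = [[0.9, 0.1],            # row i holds P(X_{t+1} = j | X_t = i)
              [0.5, 0.5]]

def simulate_chain(initial, transition, steps, rng):
    """Return the list [x_1, ..., x_steps] sampled from the chain."""
    states = range(len(initial))
    x = rng.choices(states, weights=initial)[0]
    path = [x]
    for _ in range(steps - 1):
        # The same transition table is used at every time step.
        x = rng.choices(states, weights=transition[x])[0]
        path.append(x)
    return path

path = simulate_chain(INITIAL, TRANSITION, steps=10, rng=random.Random(0))
print(path)
```

Because the transition table does not change with t, a single matrix fully describes the dynamics.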

Irreducible Chains

  • A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0

    • There is a positive probability of reaching j from i after some number of steps

  • A chain is irreducible if every state is accessible from every state
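For a finite chain, irreducibility can be checked by a graph search over the transitions with positive probability. This is a sketch under that assumption; the helper names and the two example matrices are hypothetical.

```python
def reachable(transition, start):
    """Set of states reachable from start along positive-probability transitions."""
    seen = {start}
    frontier = [start]
    while frontier:
        i = frontier.pop()
        for j, p in enumerate(transition[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                frontier.append(j)
    return seen

def is_irreducible(transition):
    """True if every state can reach every state."""
    n = len(transition)
    return all(len(reachable(transition, i)) == n for i in range(n))

MIXING = [[0.9, 0.1],
          [0.5, 0.5]]
ABSORBING = [[1.0, 0.0],      # state 0 never leaves, so state 1 is not accessible from it
             [0.5, 0.5]]

print(is_irreducible(MIXING), is_irreducible(ABSORBING))
```

The search includes the start state itself; in a chain with at least two states that all reach one another, each state is also accessible from itself in a positive number of steps.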

Ergodic Chains

  • A state i is positive recurrent if the expected time to get back to state i, after being in state i, is finite

    • If X has a finite number of states, it suffices that i is accessible from every state that can be reached from i; in particular, this holds for every state of an irreducible finite chain

  • A chain is ergodic if it is irreducible and every state is positive recurrent

(A)periodic Chains

  • A state i is periodic if there is an integer d > 1 such that P(Xn+1 = i | X1 = i) = 0 when n is not divisible by d (that is, the chain can return to i only after a multiple of d steps)

  • A chain is aperiodic if it contains no periodic state
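A state's period can be estimated numerically as the greatest common divisor of the step counts at which a return has positive probability. The sketch below assumes a small finite chain and checks only the first `max_steps` powers of the transition matrix, so it is a heuristic rather than a proof. Both example chains are hypothetical.

```python
from math import gcd

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def period(transition, i, max_steps=50):
    """gcd of all n <= max_steps with positive n-step return probability to i."""
    d = 0                      # gcd(0, n) == n, so 0 acts as "no return seen yet"
    power = transition         # holds the n-step transition probabilities
    for n in range(1, max_steps + 1):
        if power[i][i] > 0:
            d = gcd(d, n)
        power = mat_mul(power, transition)
    return d

FLIP = [[0.0, 1.0],            # always jumps to the other state
        [1.0, 0.0]]
LAZY = [[0.9, 0.1],
        [0.5, 0.5]]

print(period(FLIP, 0), period(LAZY, 0))
```

A result greater than 1 marks the state as periodic; `FLIP` returns to a state only after an even number of steps, while the self-loops in `LAZY` make it aperiodic.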

Stationary Probabilities


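  • If a chain is ergodic and aperiodic, then the limit

      $\lim_{n \to \infty} P(X_n = j \mid X_1 = i)$

    exists, and does not depend on i

  • Moreover, let

      $P^*(X = j) = \lim_{n \to \infty} P(X_n = j \mid X_1 = i)$

    then P*(X) is the unique probability satisfying

      $P^*(X = j) = \sum_i P(X_{t+1} = j \mid X_t = i)\, P^*(X = i)$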
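The claim that the limiting distribution does not depend on the starting state can be checked numerically by pushing a point-mass distribution through the transition matrix many times. This is a sketch on a hypothetical two-state chain; its stationary probability works out by hand to (5/6, 1/6).

```python
TRANSITION = [[0.9, 0.1],      # row i holds P(X_{t+1} = j | X_t = i)
              [0.5, 0.5]]

def limit_distribution(transition, start, steps=200):
    """Distribution of the chain after `steps` transitions, starting at `start`."""
    n = len(transition)
    dist = [1.0 if j == start else 0.0 for j in range(n)]
    for _ in range(steps):
        # One step: new_dist[j] = sum_i dist[i] * P(j | i)
        dist = [sum(dist[i] * transition[i][j] for i in range(n))
                for j in range(n)]
    return dist

from_0 = limit_distribution(TRANSITION, 0)
from_1 = limit_distribution(TRANSITION, 1)
print(from_0, from_1)
```

Both starting states lead to the same distribution, and applying one more step leaves it unchanged, which is the fixed-point property of P*(X).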

Stationary Probabilities

  • The probability P*(X) is the stationary probability of the process

  • Regardless of the starting point, the process will converge to this probability

  • The rate of convergence depends on properties of the transition probability

Sampling from the stationary probability

  • This theory suggests how to sample from the stationary probability:

    • Set X1 = i, for some random/arbitrary i

    • For t = 1, 2, …, n

      • Sample a value xt+1 for Xt+1 from P(Xt+1|Xt=xt)

    • return xn

  • If n is large enough, then this is a sample from P*(X)
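The procedure above can be sketched directly. The example assumes the same hypothetical two-state chain with stationary probability (5/6, 1/6); the choice of 30 steps per sample is arbitrary and only illustrates "n large enough" for this particular chain.

```python
import random

TRANSITION = [[0.9, 0.1],      # row i holds P(X_{t+1} = j | X_t = i)
              [0.5, 0.5]]

def sample_after_n_steps(transition, n, rng):
    """Start at an arbitrary state, simulate n transitions, return the final state."""
    states = range(len(transition))
    x = rng.choice(states)                     # arbitrary starting state
    for _ in range(n):
        x = rng.choices(states, weights=transition[x])[0]
    return x

rng = random.Random(0)
draws = [sample_after_n_steps(TRANSITION, 30, rng) for _ in range(2000)]
freq_state_0 = draws.count(0) / len(draws)
print(freq_state_0)            # should be close to 5/6
```

Each call restarts the chain, so the 2000 draws are independent of one another; the cost is 30 transitions per sample.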

Designing Markov Chains

  • How do we construct the right chain to sample from?

    • Ensuring aperiodicity and irreducibility is usually easy

  • The problem is ensuring the desired stationary probability

Designing Markov Chains

Key tool:

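  • If the transition probability satisfies

      $\frac{P(X_{t+1} = j \mid X_t = i)}{P(X_{t+1} = i \mid X_t = j)} = \frac{Q(X = j)}{Q(X = i)}$  whenever $P(X_{t+1} = i \mid X_t = j) > 0$

    then P*(X) = Q(X)

  • This gives a local criterion for checking that the chain will have the right stationary distribution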
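The ratio criterion is equivalent to the product form Q(i) P(i → j) = Q(j) P(j → i), which avoids division and is easy to test pair by pair. The sketch below checks the criterion and, separately, stationarity for a hypothetical two-state chain; the function names are illustrative.

```python
from math import isclose

def satisfies_ratio_criterion(q, transition):
    """Check Q(i) * P(i -> j) == Q(j) * P(j -> i) for every pair of states."""
    n = len(q)
    return all(isclose(q[i] * transition[i][j], q[j] * transition[j][i],
                       abs_tol=1e-12)
               for i in range(n) for j in range(n))

def is_stationary(q, transition):
    """Check Q(j) == sum_i Q(i) * P(i -> j) for every state j."""
    n = len(q)
    return all(isclose(sum(q[i] * transition[i][j] for i in range(n)), q[j],
                       abs_tol=1e-12)
               for j in range(n))

TRANSITION = [[0.9, 0.1],
              [0.5, 0.5]]
Q_GOOD = [5 / 6, 1 / 6]
Q_BAD = [0.5, 0.5]

print(satisfies_ratio_criterion(Q_GOOD, TRANSITION), is_stationary(Q_GOOD, TRANSITION))
print(satisfies_ratio_criterion(Q_BAD, TRANSITION), is_stationary(Q_BAD, TRANSITION))
```

The criterion is local because each check involves only one pair of states, whereas the stationarity check sums over all predecessors of a state.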

MCMC Methods

  • We can use these results to sample from P(X1,…,Xn|e)


  • Construct an ergodic & aperiodic Markov chain such that P*(X1,…,Xn) = P(X1,…,Xn|e)

  • Simulate the chain for a sufficiently large number of steps to get a sample

MCMC Methods


  • The Markov chain variable Y takes as its values the assignments to all variables that are consistent with the evidence

  • For simplicity, we will denote such a state using the vector of variables

Gibbs Sampler

  • One of the simplest MCMC methods

  • At each transition, change the state of just one Xi

  • We can describe the transition probability as a stochastic procedure:

    • Input: a state x1,…,xn

    • Choose i at random (using uniform probability)

    • Sample x’i from P(Xi|x1, …, xi-1, xi+1 ,…, xn, e)

    • let x’j = xj for all j ≠ i

    • return x’1,…,x’n
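The transition procedure above can be sketched for a small discrete target. The example assumes a hypothetical joint table over two binary variables standing in for P(X1, X2 | e); in a real network the conditional would be computed from the model rather than looked up in a table.

```python
import random

# Hypothetical target distribution, standing in for P(X1, X2 | e).
TARGET = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
DOMAIN = (0, 1)

def gibbs_step(state, rng):
    """One Gibbs transition: resample a single, uniformly chosen variable."""
    i = rng.randrange(len(state))              # choose i uniformly at random
    candidates = [state[:i] + (v,) + state[i + 1:] for v in DOMAIN]
    # P(Xi = v | all other variables) is proportional to the joint with
    # Xi set to v, so the joint values serve as unnormalized weights.
    weights = [TARGET[c] for c in candidates]
    return rng.choices(candidates, weights=weights)[0]

rng = random.Random(0)
state = (0, 0)
counts = {s: 0 for s in TARGET}
NUM_STEPS = 50000
for _ in range(NUM_STEPS):
    state = gibbs_step(state, rng)
    counts[state] += 1
freqs = {s: c / NUM_STEPS for s, c in counts.items()}
print(freqs)                   # should be close to TARGET
```

Only the chosen coordinate can change in a step, so consecutive states differ in at most one variable.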

Correctness of Gibbs Sampler

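  • By the chain rule

    P(x1, …, xi-1, xi, xi+1, …, xn | e) = P(x1, …, xi-1, xi+1, …, xn | e) P(xi | x1, …, xi-1, xi+1, …, xn, e)

  • Thus, we get

      $\frac{P(x_1, \ldots, x_i, \ldots, x_n \mid e)}{P(x_1, \ldots, x'_i, \ldots, x_n \mid e)} = \frac{P(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}{P(x'_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n, e)}$

  • Since we choose i from the same distribution at each stage, the right-hand side is the ratio of the transition probabilities between the two states, so this procedure satisfies the ratio criterion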

Gibbs Sampling for Bayesian Networks

  • Why is the Gibbs sampler “easy” in BNs?

  • Recall that the Markov blanket of a variable separates it from the other variables in the network

    • P(Xi | X1,…,Xi-1,Xi+1,…,Xn) = P(Xi | Mbi )

  • This property allows us to use local computations to perform sampling in each transition

Gibbs Sampling in Bayesian Networks

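  • How do we evaluate P(Xi | x1,…,xi-1,xi+1,…,xn)?

  • Let Y1, …, Yk be the children of Xi

    • By definition of Mbi, the parents of Yj are in Mbi ∪ {Xi}

  • It is easy to show that

      $P(x_i \mid Mb_i) = \frac{P(x_i \mid Pa_i) \prod_{j} P(y_j \mid pa_{Y_j})}{\sum_{x'_i} P(x'_i \mid Pa_i) \prod_{j} P(y_j \mid pa'_{Y_j})}$

    where Pai denotes the parents of Xi, pa_{Yj} is the assignment to the parents of Yj with Xi = xi, and pa'_{Yj} is the same assignment with Xi = x'i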
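For a chain-structured network A → B → C, the Markov blanket of B is {A, C}, and the Gibbs conditional for B is proportional to P(b | a) P(c | b): the variable's own CPT entry times the CPT entries of its children. The sketch below uses hypothetical CPT numbers and compares this local computation with a brute-force computation from the full joint.

```python
from math import isclose

# Hypothetical CPTs for the network A -> B -> C (all variables binary).
P_A = {0: 0.6, 1: 0.4}
P_B_GIVEN_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P_B_GIVEN_A[a][b]
P_C_GIVEN_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # P_C_GIVEN_B[b][c]

def blanket_conditional_b(a, c):
    """P(B | a, c) using only B's CPT and the CPT of its child C."""
    scores = {b: P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c] for b in (0, 1)}
    total = sum(scores.values())
    return {b: s / total for b, s in scores.items()}

def brute_force_conditional_b(a, c):
    """P(B | a, c) obtained by normalizing the full joint P(a, b, c) over b."""
    joint = {b: P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c] for b in (0, 1)}
    total = sum(joint.values())
    return {b: p / total for b, p in joint.items()}

local = blanket_conditional_b(a=0, c=1)
full = brute_force_conditional_b(a=0, c=1)
print(local, full)
```

The factor P(a) cancels in the normalization, which is why the local computation never needs it; in larger networks every factor that does not mention Xi cancels in the same way.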

Sampling Strategy

  • How do we collect the samples?

Strategy I:

  • Run the chain M times, each run for N steps

    • each run starts from a different starting point

  • Return the last state in each run

[Figure: M chains]

Sampling Strategy

Strategy II:

  • Run one chain for a long time

  • After some “burn in” period, take a sample every fixed number of steps

[Figure: “burn in” period followed by M samples from one chain]

Comparing Strategies

Strategy I:

  • Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity

  • Have to perform “burn in” steps for each chain

Strategy II:

  • Perform “burn in” only once

  • Samples might be correlated (although only weakly)

Hybrid strategy:

  • Run several chains, and take a few samples from each

  • Combines benefits of both strategies
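The two basic strategies differ only in how the loop over transitions is organized. The sketch below assumes a hypothetical two-state chain with stationary probability (5/6, 1/6); the values chosen for the run length, burn-in, and gap are arbitrary illustrations, not recommendations.

```python
import random

TRANSITION = [[0.9, 0.1],      # row i holds P(X_{t+1} = j | X_t = i)
              [0.5, 0.5]]

def step(x, rng):
    """One transition of the chain from state x."""
    return rng.choices(range(len(TRANSITION)), weights=TRANSITION[x])[0]

def strategy_one(num_chains, steps_per_chain, rng):
    """Strategy I: M independent runs of N steps, keep the last state of each."""
    samples = []
    for _ in range(num_chains):
        x = rng.randrange(len(TRANSITION))     # a fresh random start per run
        for _ in range(steps_per_chain):
            x = step(x, rng)
        samples.append(x)
    return samples

def strategy_two(num_samples, burn_in, gap, rng):
    """Strategy II: one long run, discard the burn-in, keep every gap-th state."""
    x = rng.randrange(len(TRANSITION))
    for _ in range(burn_in):
        x = step(x, rng)
    samples = []
    for _ in range(num_samples):
        for _ in range(gap):
            x = step(x, rng)
        samples.append(x)
    return samples

rng = random.Random(0)
samples_one = strategy_one(num_chains=2000, steps_per_chain=30, rng=rng)
samples_two = strategy_two(num_samples=2000, burn_in=30, gap=5, rng=rng)
print(samples_one.count(0) / 2000, samples_two.count(0) / 2000)
```

With these settings Strategy I performs 60,000 transitions and Strategy II performs 10,030 for the same number of samples, which is the burn-in cost difference noted above. A hybrid would call `strategy_two` several times with different starting states and pool the results.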