# Inference V: MCMC Methods

### Inference V: MCMC Methods

• In the previous class, we examined methods that use independent samples to estimate P(X = x | e)

Problem: It is difficult to sample from P(X1, …, Xn | e)

• We had to use likelihood weighting to reweigh our samples

• This introduced bias in estimation

• In some cases, such as when the evidence is at the leaves, these methods are inefficient

• We are going to discuss sampling methods that are based on Markov chains

• Markov Chain Monte Carlo (MCMC) methods

• Key ideas:

• Sampling process as a Markov Chain

• Next sample depends on the previous one

• With a suitably constructed chain, these samples approximate the desired posterior distribution

• We start by reviewing key ideas from the theory of Markov chains

Markov Chains

[Figure: chain-structured network X1 → X2 → X3 → … → Xn]

• Suppose X1, X2, … take some set of values

• W.l.o.g., these values are 1, 2, …

• A Markov chain is a process that corresponds to the chain-structured network shown above

• To quantify the chain, we need to specify

• Initial probability: P(X1)

• Transition probability: P(Xt+1|Xt)

• A Markov chain has stationary transition probabilities

• P(Xt+1|Xt) is the same for all times t

• A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0

• There is a positive probability of reaching j from i after some number of steps

• A chain is irreducible if every state is accessible from every state

• A state i is positive recurrent if there is a finite expected time to return to state i after being in state i

• If X has a finite number of states, then it suffices that i is accessible from itself

• A chain is ergodic if it is irreducible and every state is positively recurrent

• A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 whenever n is not divisible by d

• A chain is aperiodic if it contains no periodic state
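
As a concrete illustration of the accessibility and irreducibility definitions above, here is a minimal sketch (not from the slides) that represents a finite chain by its transition matrix and checks reachability via matrix powers; the 2-state matrix T is a made-up example.

```python
import numpy as np

def accessible(T):
    """accessible(T)[i, j] is True iff state j is reachable from state i,
    i.e. P(Xn = j | X1 = i) > 0 for some n >= 1."""
    k = T.shape[0]
    reach = np.zeros_like(T)
    power = np.eye(k)
    for _ in range(k):            # paths of length <= k suffice for a k-state chain
        power = power @ T         # power holds the n-step transition probabilities
        reach += power
    return reach > 0

def irreducible(T):
    """A chain is irreducible if every state is accessible from every state."""
    return bool(accessible(T).all())

# Hypothetical 2-state chain; rows index the current state, columns the next state.
T = np.array([[0.9, 0.1],
              [0.4, 0.6]])
print(irreducible(T))             # True: each state can reach the other
```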

Thm:

• If a chain is ergodic and aperiodic, then the limit limn→∞ P(Xn = j | X1 = i) exists and does not depend on i

• Moreover, let P*(X = j) = limn→∞ P(Xn = j | X1 = i); then P*(X) is the unique probability satisfying P*(X = j) = Σi P*(X = i) P(Xt+1 = j | Xt = i)

• The probability P*(X) is the stationary probability of the process

• Regardless of the starting point, the process will converge to this probability

• The rate of convergence depends on properties of the transition probability
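
To make the theorem concrete, here is a minimal sketch (assuming NumPy and the same hypothetical 2-state matrix T as above) showing that the distribution of Xn converges to the same stationary probability P* from two different starting distributions, and that P* is a fixed point of the transition.

```python
import numpy as np

# Hypothetical 2-state chain: T[i, j] = P(Xt+1 = j | Xt = i)
T = np.array([[0.9, 0.1],
              [0.4, 0.6]])

def distribution_after(n, p1, T):
    """Distribution of Xn given the initial distribution p1 = P(X1)."""
    p = p1.copy()
    for _ in range(n - 1):
        p = p @ T                  # P(Xt+1 = j) = sum_i P(Xt = i) P(Xt+1 = j | Xt = i)
    return p

# Two different starting distributions converge to the same limit (here [0.8, 0.2]).
print(distribution_after(100, np.array([1.0, 0.0]), T))
print(distribution_after(100, np.array([0.0, 1.0]), T))

# The limit P* is the unique distribution satisfying P* = P* T.
p_star = np.array([0.8, 0.2])
print(np.allclose(p_star @ T, p_star))  # True
```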

• This theory suggests how to sample from the stationary probability:

• Set X1 = i, for some random/arbitrary i

• For t = 1, 2, …, n-1

• Sample a value xt+1 for Xt+1 from P(Xt+1|Xt=xt)

• return xn

• If n is large enough, then this is a sample from P*(X)
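
A minimal sketch of this sampling procedure, under the same transition-matrix representation (an assumption made here for illustration, not the slides' notation): simulate the chain for n steps and return the last state.

```python
import numpy as np

def sample_from_chain(T, n, rng=None):
    """Run the chain for n steps and return Xn, approximately distributed as P*."""
    rng = rng or np.random.default_rng()
    k = T.shape[0]
    x = rng.integers(k)                      # X1 = i for an arbitrary/random i
    for _ in range(n - 1):
        x = rng.choice(k, p=T[x])            # sample Xt+1 from P(Xt+1 | Xt = x)
    return x

T = np.array([[0.9, 0.1],
              [0.4, 0.6]])
samples = [sample_from_chain(T, 200) for _ in range(1000)]
print(np.bincount(samples, minlength=2) / len(samples))   # roughly [0.8, 0.2]
```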

• How do we construct the right chain to sample from?

• Ensuring aperiodicity and irreducibility is usually easy

• The problem is ensuring the desired stationary probability

Key tool:

• If the transition probability satisfies the detailed balance condition Q(x) P(Xt+1 = y | Xt = x) = Q(y) P(Xt+1 = x | Xt = y) for all states x, y (equivalently, P(x → y) / P(y → x) = Q(y) / Q(x)), then P*(X) = Q(X)

• This gives a local criterion for checking that the chain will have the right stationary distribution
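
A small numeric check of this criterion, in its detailed-balance form, for the hypothetical 2-state chain used above and a candidate distribution Q (a sketch for illustration only):

```python
import numpy as np

T = np.array([[0.9, 0.1],         # T[x, y] = P(Xt+1 = y | Xt = x)
              [0.4, 0.6]])
Q = np.array([0.8, 0.2])          # candidate stationary distribution

# Check Q(x) P(y | x) = Q(y) P(x | y) for all pairs of states x, y.
balanced = all(np.isclose(Q[x] * T[x, y], Q[y] * T[y, x])
               for x in range(2) for y in range(2))
print(balanced)                   # True, so P*(X) = Q(X)
```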

• We can use these results to sample from P(X1,…,Xn|e)

Idea:

• Construct an ergodic & aperiodic Markov Chain such that P*(X1,…,Xn) = P(X1,…,Xn|e)

• Simulate the chain for n steps to get a sample

Notes:

• The Markov chain state variable Y takes as values assignments to all variables that are consistent with the evidence

• For simplicity, we will denote such a state using the vector of variables

• The Gibbs sampler is one of the simplest MCMC methods

• At each transition, change the state of just one Xi

• We can describe the transition probability as a stochastic procedure:

• Input: a state x1,…,xn

• Choose i at random (using uniform probability)

• Sample x’i from P(Xi | x1, …, xi-1, xi+1, …, xn, e)

• Let x’j = xj for all j ≠ i

• return x’1,…,x’n
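
A minimal sketch of this transition step, assuming a helper full_conditional(i, x) (hypothetical, supplied by the user) that returns P(Xi | x1, …, xi-1, xi+1, …, xn, e) as a dictionary from values of Xi to probabilities:

```python
import random

def gibbs_transition(x, full_conditional, rng=random):
    """One Gibbs step: resample a single, randomly chosen variable.

    x                -- current state, a list [x1, ..., xn] consistent with e
    full_conditional -- hypothetical helper: full_conditional(i, x) returns a
                        dict {value: P(Xi = value | x1,...,xi-1,xi+1,...,xn, e)}
    """
    i = rng.randrange(len(x))                         # choose i uniformly at random
    dist = full_conditional(i, x)                     # P(Xi | all other variables, e)
    values, probs = zip(*dist.items())
    x_new = list(x)                                   # x'j = xj for all j != i
    x_new[i] = rng.choices(values, weights=probs)[0]  # sample x'i
    return x_new
```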

• By the chain rule,

P(x1, …, xi-1, xi, xi+1, …, xn | e) = P(x1, …, xi-1, xi+1, …, xn | e) · P(xi | x1, …, xi-1, xi+1, …, xn, e)

• Thus, writing x for the current state and x’ for the state with xi replaced by x’i, the transition probabilities satisfy

T(x → x’) / T(x’ → x) = P(x’i | x1, …, xi-1, xi+1, …, xn, e) / P(xi | x1, …, xi-1, xi+1, …, xn, e) = P(x’ | e) / P(x | e)

• Since we choose i from the same (uniform) distribution at each step, this procedure satisfies the ratio criterion, so its stationary distribution is P(X1, …, Xn | e)

• Why is the Gibbs sampler “easy” in BNs?

• Recall that the Markov blanket of a variable separates it from the other variables in the network

• P(Xi | X1,…,Xi-1,Xi+1,…,Xn) = P(Xi | Mbi )

• This property allows us to use local computations to perform sampling in each transition

• How do we evaluate P(Xi | x1,…,xi-1,xi+1,…,xn) ?

• Let Y1, …, Yk be the children of Xi

• By definition of Mbi, the parents of each Yj are contained in Mbi ∪ {Xi}

• It is easy to show that

P(xi | x1, …, xi-1, xi+1, …, xn) ∝ P(xi | pai) ∏j P(yj | paj)

where pai and paj are the values of the parents of Xi and Yj under the current assignment; every factor involves only Xi and its Markov blanket
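
A sketch of that local computation, under an assumed (hypothetical) representation of the network: parents[v] lists the parent indices of variable v, children[i] lists Xi's children, values[i] lists Xi's possible values, and cpt[v] maps (value of v, tuple of parent values) to P(v | parents). None of these names come from the slides.

```python
def full_conditional(i, x, parents, children, cpt, values):
    """P(Xi | all other variables, e), computed from Xi's Markov blanket only."""
    weights = {}
    for v in values[i]:
        x_try = list(x)
        x_try[i] = v
        # P(xi | pa(Xi)) * prod_j P(yj | pa(Yj)): every factor touches only
        # Xi, its parents, its children, and its children's parents (Mbi).
        w = cpt[i][(v, tuple(x_try[p] for p in parents[i]))]
        for j in children[i]:
            w *= cpt[j][(x_try[j], tuple(x_try[p] for p in parents[j]))]
        weights[v] = w
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}   # normalize over values of Xi
```

This is the distribution the Gibbs transition above resamples from, so each transition only needs the CPTs of Xi and its children rather than the whole network.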

• How do we collect the samples?

Strategy I:

• Run the chain M times, each run for N steps

• each run starts from a different starting point

• Return the last state in each run

[Figure: M independent chains, the last state of each returned as a sample]

Strategy II:

• Run one chain for a long time

• After some “burn-in” period, take a sample every fixed number of steps

[Figure: one long chain; after a “burn-in” period, M samples are taken at fixed intervals]

Strategy I:

• Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity

• Have to perform “burn-in” steps for each chain

Strategy II:

• Perform “burn in” only once

• Samples might be correlated (although only weakly)

Hybrid strategy:

• Run several chains, and draw a few samples from each

• Combines benefits of both strategies
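
A minimal sketch of the hybrid strategy, assuming a one-argument transition function (e.g. the Gibbs step sketched earlier, with its full conditional already bound) and made-up values for the number of chains, burn-in length, thinning interval, and samples per chain:

```python
def collect_samples(make_start, transition, n_chains=5, burn_in=1000,
                    thin=50, samples_per_chain=20):
    """Hybrid strategy: run several chains, draw a few thinned samples from each.

    make_start -- callable returning a fresh (e.g. random) starting state
    transition -- callable mapping a state to the next state of the chain
    """
    collected = []
    for _ in range(n_chains):
        x = make_start()                   # different starting point per chain
        for _ in range(burn_in):           # "burn-in": let the chain approach P*
            x = transition(x)
        for _ in range(samples_per_chain):
            for _ in range(thin):          # keep only every thin-th state
                x = transition(x)
            collected.append(x)
    return collected
```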