Bayesian Statistics, MCMC, and the Expectation Maximization Algorithm

The Burglar Alarm Problem
  • A burglar alarm is sensitive to both burglaries and earthquakes. In California, earthquakes happen fairly frequently.
    • You are at a conference far from California but in phone contact with the alarm: you observe that the alarm rings.
      • A = alarm rings
      • Aᶜ = alarm does not ring
Alarm problem (continued)
  • The alarm could be due to a burglary (b) or an earthquake (e); neither event is assumed to be observed directly.
Likelihood
  • The likelihood is concerned with what is observed – we observe whether the alarm goes off or not:
  • P(A|b=1,e=1) = .607 (the chance that the alarm goes off given a burglary and an earthquake)
  • P(A|b=0,e=1) = .135 (the chance that the alarm goes off given no burglary but an earthquake)
  • P(A|b=1,e=0) = .368 (the chance that the alarm goes off given a burglary but no earthquake)
  • P(A|b=0,e=0) = .001 (the chance that the alarm goes off given no burglary and no earthquake)
Prior
  • The prior governs probability distributions over the presence/absence of burglaries and earthquakes:
  • P(b=1) = .1, P(e=1) = .1
  • b and e are mutually independent
  • The priors characterize the information available about burglaries and earthquakes before the alarm is observed
Bayes Theorem
  • Bayes' rule and related results: for events A and B,

P(B|A) = P(A|B)P(B) / P(A),  where P(A) = P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)

  • Bayes' theorem (a consequence of the above) serves to combine prior expertise with the likelihood. Suppose D stands for data and Θ for parameters. Then:

P(Θ|D) = P(D|Θ)P(Θ) / P(D)
Bayes Theorem (continued)

We use Bayes’ theorem to find the probability that there was a burglary given that the alarm went off and the probability that there was an earthquake given that the alarm went off. To do this, we need to make use of two quantities:

  • a) the likelihood: the probability that the alarm went off given that the burglary did/didn’t take place and/or the earthquake did or did not take place.
  • b) the prior: the probability that the burglary did/didn’t take place and/or the earthquake did or didn’t take place.
Bayes Theorem for burglaries
  • We first update the likelihood relative to earthquakes and then use Bayes' rule to calculate the probability of interest:

P(b=1|A) = P(A|b=1)P(b=1) / P(A) = (.3919)(.1) / .05215 ≈ .7515

  • So, about 75% of the time when the alarm goes off, there is a burglary.
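The burglary calculation can be reproduced directly from the numbers on the slides (the variable names below are my own):

```python
# Probabilities from the slides.
prior_b, prior_e = .1, .1                      # P(b=1), P(e=1)
p_alarm = {(1, 1): .607, (1, 0): .368,         # P(A | b, e)
           (0, 1): .135, (0, 0): .001}

# Update the likelihood relative to earthquakes: average e out.
lik_b1 = p_alarm[(1, 1)] * prior_e + p_alarm[(1, 0)] * (1 - prior_e)  # .3919
lik_b0 = p_alarm[(0, 1)] * prior_e + p_alarm[(0, 0)] * (1 - prior_e)  # .0144

# Bayes' rule for the posterior probability of a burglary.
p_A = prior_b * lik_b1 + (1 - prior_b) * lik_b0   # .05215
p_b_given_A = prior_b * lik_b1 / p_A              # about .7515
```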
Updating likelihood relative to earthquakes

Averaging the earthquake states out against their prior probabilities, P(e=1) = .1 and P(e=0) = .9, turns the likelihood probabilities into new likelihood probabilities:

P(A|b=1) = P(A|b=1,e=1)P(e=1) + P(A|b=1,e=0)P(e=0) = (.607)(.1) + (.368)(.9) = .3919
P(A|b=0) = P(A|b=0,e=1)P(e=1) + P(A|b=0,e=0)P(e=0) = (.135)(.1) + (.001)(.9) = .0144

Bayes' Law

Combining the new likelihood probabilities with the prior on burglaries:

P(A|b=1) = .3919   P(Aᶜ|b=1) = .6081   P(b=1) = .1
P(A|b=0) = .0144   P(Aᶜ|b=0) = .9856   P(b=0) = .9

P(b=1 & A) = .1 × .3919 = .03919
P(b=0 & A) = .9 × .0144 = .01296
P(A) = .03919 + .01296 = .05215

Bayes Theorem for earthquakes
  • We first update the likelihood relative to burglaries, giving P(A|e=1) = (.607)(.1) + (.135)(.9) = .1822 and P(A|e=0) = (.368)(.1) + (.001)(.9) = .0377, and then calculate the probability of interest:

P(e=1|A) = P(A|e=1)P(e=1) / P(A) = (.1822)(.1) / .05215 ≈ .3494

  • So, about 35% of the time when the alarm goes off, there is an earthquake.
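The earthquake posterior can be reproduced the same way, this time averaging b out of the likelihood (again, variable names are my own):

```python
prior_b, prior_e = .1, .1                      # P(b=1), P(e=1)
p_alarm = {(1, 1): .607, (1, 0): .368,         # P(A | b, e)
           (0, 1): .135, (0, 0): .001}

# Update the likelihood relative to burglaries: average b out.
lik_e1 = p_alarm[(1, 1)] * prior_b + p_alarm[(0, 1)] * (1 - prior_b)  # .1822
lik_e0 = p_alarm[(1, 0)] * prior_b + p_alarm[(0, 0)] * (1 - prior_b)  # .0377

p_A = prior_e * lik_e1 + (1 - prior_e) * lik_e0   # .05215, same as before
p_e_given_A = prior_e * lik_e1 / p_A              # about .3494
```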
Expectation Maximization Algorithm
  • The EM algorithm concerns how to make inferences about parameters in the presence of latent variables. A latent variable indicates the unobserved state of the process to which an observation belongs. Inference is based on estimating these parameters.
EM algorithm for the alarm problem
  • Let b, e be latent variables. We now represent the prior on these latent variables by parameters p_b = P(b=1) and p_e = P(e=1).
  • The EM algorithm yields estimates of the parameters by iteratively maximizing the expected complete-data log-likelihood Q(Θ|Θ#) derived in the appendix.
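The slide's formulas were images and did not survive, so the following is only a hedged sketch of one natural reading: treat the prior probabilities p_b = P(b=1) and p_e = P(e=1) as the unknown parameters and run EM on the single observation that the alarm rang. The names `em` and `marginal_lik` are illustrative, not from the slides:

```python
from itertools import product

# Likelihood P(alarm rings | b, e) from the slides.
p_alarm = {(1, 1): .607, (1, 0): .368, (0, 1): .135, (0, 0): .001}

def marginal_lik(p_b, p_e):
    """P(A) under prior parameters p_b = P(b=1), p_e = P(e=1)."""
    return sum(p_alarm[(b, e)]
               * (p_b if b else 1 - p_b)
               * (p_e if e else 1 - p_e)
               for b, e in product((0, 1), repeat=2))

def em(p_b=0.5, p_e=0.5, n_iter=100):
    """EM for p_b, p_e given the single observation that the alarm rang."""
    for _ in range(n_iter):
        # E-step: posterior P(b, e | A) under the current parameters.
        joint = {(b, e): p_alarm[(b, e)]
                 * (p_b if b else 1 - p_b)
                 * (p_e if e else 1 - p_e)
                 for b, e in product((0, 1), repeat=2)}
        z = sum(joint.values())
        post = {k: v / z for k, v in joint.items()}
        # M-step: maximizing Q sets each parameter to its posterior mean.
        p_b = post[(1, 0)] + post[(1, 1)]
        p_e = post[(0, 1)] + post[(1, 1)]
    return p_b, p_e
```

The point of the sketch is EM's guarantee: the marginal likelihood P(A) never decreases across iterations. With only this single observation the likelihood is maximized on the boundary, so the estimates drift toward 1; richer data would pin them down in the interior.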
EM estimate
  • The EM estimates are the parameter values at which iterating the E- and M-steps converges.
MCMC
  • Gibbs sampling is one example of Markov chain Monte Carlo (MCMC). The idea behind MCMC is to simulate a posterior distribution by visiting the values of the parameters in proportion to their posterior probability. In Gibbs sampling, visits depend entirely on the conditional posterior probabilities. Another form of MCMC is Metropolis-Hastings (MH), in which visits are governed by a Markov transition kernel together with an accept/reject step.
Gibbs Sampling (MCMC)
  • Gibbs sampling is an iterative algorithm which successively visits the parameter values of b and e in proportion to their posterior probabilities.

Gibbs sampler

  • 1. Simulate b, e according to their priors.
  • 2. For the given b, simulate e as a Bernoulli variable with probability

P(e=1|b,A) = P(A|b,e=1)P(e=1) / [P(A|b,e=1)P(e=1) + P(A|b,e=0)P(e=0)]
Gibbs sampling (continued)
  • 3. For the e obtained from step 2, simulate b as a Bernoulli variable with probability

P(b=1|e,A) = P(A|b=1,e)P(b=1) / [P(A|b=1,e)P(b=1) + P(A|b=0,e)P(b=0)]

  • 4. Iterate steps 2 and 3. The proportion of times b=1 in this chain estimates P(b=1|A); the proportion of times e=1 in this chain estimates P(e=1|A).
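The four steps above can be sketched in code (a minimal illustration; the function name, iteration count, and seed are my own choices):

```python
import random

# Likelihood table from the slides: P(A | b, e), and the priors.
p_alarm = {(1, 1): .607, (1, 0): .368, (0, 1): .135, (0, 0): .001}
p_b1, p_e1 = .1, .1

def gibbs(n_iter=200_000, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize b, e from their priors.
    b = 1 if rng.random() < p_b1 else 0
    e = 1 if rng.random() < p_e1 else 0
    b_count = e_count = 0
    for _ in range(n_iter):
        # Step 2: draw e ~ P(e=1 | b, A), conditioning on the alarm ringing.
        num = p_alarm[(b, 1)] * p_e1
        den = num + p_alarm[(b, 0)] * (1 - p_e1)
        e = 1 if rng.random() < num / den else 0
        # Step 3: draw b ~ P(b=1 | e, A).
        num = p_alarm[(1, e)] * p_b1
        den = num + p_alarm[(0, e)] * (1 - p_b1)
        b = 1 if rng.random() < num / den else 0
        # Step 4: accumulate the visit proportions.
        b_count += b
        e_count += e
    # Long-run proportions estimate P(b=1|A) ≈ .75 and P(e=1|A) ≈ .35.
    return b_count / n_iter, e_count / n_iter
```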
Gibbs sampler convergence: cumulative proportion of times that b=1 (burglary) in the sampler
Gibbs sampler convergence: cumulative proportion of times that e=1 (earthquake) in the sampler
Appendix: Derivation of EM
  • For data D, latent variables Z, and parameters Θ, Bayes' theorem shows that

P(Z|D,Θ) = P(D|Z,Θ)P(Z|Θ) / P(D|Θ)

Solving for P(D|Θ) and taking logs, we get:

log P(D|Θ) = log P(D|Z,Θ) + log P(Z|Θ) - log P(Z|D,Θ)

Integrating both sides against P(Z|D,Θ#), where Θ# is the current parameter value, we get:

log P(D|Θ) = Q(Θ|Θ#) - H(Θ|Θ#),

where Q(Θ|Θ#) = E[log P(D,Z|Θ) | D,Θ#] and H(Θ|Θ#) = E[log P(Z|D,Θ) | D,Θ#].
EM (continued)

It follows that log P(D|Θ) is improved in Θ whenever the first term on the right, Q(Θ|Θ#), is improved: by Jensen's inequality, H(Θ|Θ#) is maximized at Θ = Θ#, so the second term cannot offset an increase in Q.

  • The resulting quantity to be maximized in Θ is:

Q(Θ|Θ#) = E[log P(D,Z|Θ) | D,Θ#]