
Bayesian Inference

Ekaterina Lomakina

TNU seminar: Bayesian inference

1 March 2013


Outline

  • Probability distributions

  • Maximum likelihood estimation

  • Maximum a posteriori estimation

  • Conjugate priors

  • Conceptualizing models as collections of priors

  • Noninformative priors

  • Empirical Bayes


Probability distribution

  • Density estimation – to model the distribution p(x) of a random variable x given a finite set of observations x1, …, xN. Two broad families of approaches exist; a small sketch comparing them follows the list.

Nonparametric approach:

  • Histogram

  • Kernel density estimation

  • Nearest neighbor approach

Parametric approach:

  • Gaussian distribution

  • Beta distribution
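
As a minimal sketch of this contrast (assuming NumPy and SciPy are available; the two-component data are an illustrative assumption), a nonparametric kernel density estimate adapts to both modes of the data, while a single parametric Gaussian cannot:

    import numpy as np
    from scipy import stats

    # Illustrative sample: a mixture of two Gaussians (assumed data).
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(1.0, 1.0, 500)])

    # Nonparametric: kernel density estimation with Gaussian kernels.
    kde = stats.gaussian_kde(x)

    # Parametric: fit a single Gaussian by maximum likelihood (mean, std).
    mu_ml, sigma_ml = x.mean(), x.std()

    grid = np.linspace(-4, 4, 9)
    print("KDE:     ", np.round(kde(grid), 3))
    print("Gaussian:", np.round(stats.norm.pdf(grid, mu_ml, sigma_ml), 3))
    # The KDE tracks both modes; the single Gaussian cannot, illustrating
    # the flexibility/assumption trade-off between the two families.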


The Exponential Family

Distributions that can be written in the form p(x|η) = h(x) g(η) exp(ηᵀ u(x)). Members include:

  • Gaussian distribution

  • Binomial distribution

  • Beta distribution

  • etc.


Gaussian distribution

  • The central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and well-defined variance, will be approximately normally distributed (a numeric sketch follows the figure).

Bean machine by Sir Francis Galton
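
A quick numeric illustration of the CLT (a sketch using NumPy; the bean-machine analogy maps each row of pins to an independent Bernoulli step):

    import numpy as np

    rng = np.random.default_rng(1)
    # Bean machine: each ball takes 100 independent left/right (0/1) steps.
    steps = rng.integers(0, 2, size=(10_000, 100))
    final_bin = steps.sum(axis=1)   # the bin each ball lands in

    # CLT: the sum of many i.i.d. Bernoulli steps is approximately
    # Normal(n*p, n*p*(1-p)) = Normal(50, 25).
    print(final_bin.mean())   # ~ 50
    print(final_bin.std())    # ~ 5
    print(np.mean(np.abs(final_bin - 50) <= 5))  # ~ 0.7 (Gaussian: ~ 0.68)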


Maximum likelihood estimation

  • The frequentist approach to estimating the parameters of a distribution from a set of observations is to maximize the likelihood:

θML = argmaxθ p(X|θ) = argmaxθ ∏n p(xn|θ) – the data are i.i.d.

In practice one maximizes ln p(X|θ) = Σn ln p(xn|θ) – the logarithm is a monotonic transformation, so the maximizer is unchanged.


MLE for Gaussian distribution

Setting the derivative of the log-likelihood with respect to μ to zero gives μML = (1/N) Σn xn – the simple average.
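
A small numeric check of this (a sketch with NumPy/SciPy; the data and the fixed unit variance are illustrative assumptions): the sample mean maximizes the Gaussian log-likelihood.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(3.0, 1.0, 1000)

    def loglik(mu):
        # Gaussian log-likelihood with the variance fixed at 1 for simplicity.
        return stats.norm.logpdf(x, loc=mu, scale=1.0).sum()

    mus = np.linspace(2.5, 3.5, 1001)
    mu_hat = mus[np.argmax([loglik(m) for m in mus])]
    print(mu_hat, x.mean())  # the grid maximizer matches the sample mean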


Maximum a posteriori estimation

  • The Bayesian approach to estimating the parameters of a distribution from a set of observations is to maximize the posterior distribution: θMAP = argmaxθ p(θ|X) = argmaxθ p(X|θ) p(θ).

  • This makes it possible to take prior information into account.


MAP for Gaussian distribution

With a Gaussian likelihood N(xn|μ, σ²) and a Gaussian prior N(μ|μ0, σ0²), the posterior distribution over μ is Gaussian with mean

μN = (σ² μ0 + N σ0² μML) / (N σ0² + σ²) – a weighted average of the prior mean μ0 and the sample average μML.
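
A minimal sketch of this update (NumPy; the prior and noise values are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(2.0, 1.0, 20)               # data with known sigma^2 = 1
    mu0, sigma0_sq, sigma_sq = 0.0, 0.5, 1.0   # assumed prior mean/variance, noise
    N, mu_ml = len(x), x.mean()

    # Posterior mean: weighted average of prior mean and sample average.
    mu_map = (sigma_sq * mu0 + N * sigma0_sq * mu_ml) / (N * sigma0_sq + sigma_sq)
    print(mu_ml, mu_map)  # the MAP estimate is pulled from mu_ml toward mu0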


Conjugate prior

  • In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior.

  • For any member of the exponential family, there exists a conjugate prior that can be written in the form p(η|χ, ν) = f(χ, ν) g(η)^ν exp(ν ηᵀ χ).

  • Important conjugate pairs include (a Binomial–Beta sketch follows the list):

    Binomial – Beta

    Multinomial – Dirichlet

    Gaussian – Gaussian (for mean)

    Gaussian – Gamma (for precision)

    Exponential – Gamma
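
As a minimal sketch of the Binomial–Beta pair (NumPy/SciPy; the prior parameters and the observed counts are illustrative assumptions), the posterior after observing m heads in N tosses is again a Beta, confirming conjugacy:

    import numpy as np
    from scipy import stats

    a, b = 2.0, 2.0          # assumed Beta prior parameters
    N, m = 10, 7             # assumed observations: 7 heads out of 10 tosses

    # Conjugacy: Beta(a, b) prior + Binomial likelihood -> Beta(a + m, b + N - m).
    posterior = stats.beta(a + m, b + (N - m))

    # Numerical check: likelihood * prior, normalized on a grid, matches the
    # analytic Beta posterior.
    mu = np.linspace(0.01, 0.99, 99)
    unnorm = stats.binom.pmf(m, N, mu) * stats.beta.pdf(mu, a, b)
    unnorm /= unnorm.sum() * (mu[1] - mu[0])
    print(np.allclose(unnorm, posterior.pdf(mu), atol=1e-2))  # True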


MLE for Binomial distribution

  • The Binomial distribution models the probability of m “heads” out of N tosses: Bin(m|N, μ) = (N choose m) μ^m (1 − μ)^(N−m).

  • Its only parameter μ encodes the probability of a single event (a “head”).

  • The maximum likelihood estimate is given by μML = m / N.


MAP for Binomial distribution

  • The conjugate prior for this distribution is the Beta: Beta(μ|a, b) ∝ μ^(a−1) (1 − μ)^(b−1).

  • The posterior is then given by p(μ|m, l, a, b) ∝ μ^(m+a−1) (1 − μ)^(l+b−1),

where l = N − m is simply the number of “tails”.


Models as collections of priors - 1

  • Take a simple regression model: tn = wᵀφ(xn) + εn, with Gaussian noise εn ~ N(0, β⁻¹)

  • Add a prior on the weights: p(w) = N(w|0, α⁻¹I)

  • And get Bayesian linear regression! (A sketch of the posterior update follows.)
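
A minimal sketch of the conjugate posterior over the weights (NumPy; α, β, the straight-line features, and the synthetic data are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.uniform(-1, 1, 30)
    t = 0.5 + 2.0 * x + rng.normal(0, 0.2, 30)   # synthetic targets

    Phi = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
    alpha, beta = 1.0, 25.0                      # assumed prior/noise precisions

    # Gaussian prior N(0, alpha^-1 I) + Gaussian likelihood -> Gaussian posterior:
    # S_N^-1 = alpha*I + beta*Phi^T Phi,  m_N = beta * S_N Phi^T t
    S_N = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    print(m_N)  # posterior mean of the weights, close to [0.5, 2.0]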


Models as collections of priors - 2

  • Take again a simple regression model, tn = yn + εn, where yn is some function of xn

  • Add a prior on the function: y ~ N(0, K), a Gaussian process prior with covariance (kernel) matrix K

  • And get Gaussian processes! (A prior-sampling sketch follows.)

[Figure: graphical model — latent function values yn, observations tn, noise precision β, GP covariance K.]
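
A minimal sketch of drawing functions from a GP prior (NumPy; the squared-exponential kernel and its length-scale are illustrative assumptions):

    import numpy as np

    def sq_exp_kernel(xa, xb, length=0.3):
        # Squared-exponential covariance: k(x, x') = exp(-(x - x')^2 / (2 l^2))
        return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length ** 2)

    x = np.linspace(0, 1, 100)
    K = sq_exp_kernel(x, x) + 1e-8 * np.eye(100)  # jitter for numerical stability

    rng = np.random.default_rng(5)
    # Each sample is one random smooth function y ~ N(0, K).
    samples = rng.multivariate_normal(np.zeros(100), K, size=3)
    print(samples.shape)  # (3, 100): three functions evaluated at 100 inputs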


Models as collections of priors - 3

  • Take a model where xn is discrete and unknown

  • Add a prior on the states xn, assuming they are temporally smooth: each xn depends only on xn−1

  • And get a Hidden Markov Model! (A generative sketch follows.)

[Figure: HMM graphical model — a Markov chain of hidden states x1, x2, …, xn−1, xn, xn+1 governed by parameters θ, each state emitting an observation t1, t2, …, tn.]
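
A minimal generative sketch (NumPy; the transition and emission matrices are illustrative assumptions — the sticky transitions encode the temporal-smoothness prior):

    import numpy as np

    rng = np.random.default_rng(6)
    A = np.array([[0.95, 0.05],   # sticky transitions: states tend to persist
                  [0.05, 0.95]])
    E = np.array([[0.9, 0.1],     # emission probabilities per hidden state
                  [0.2, 0.8]])

    x, t = [0], []
    for n in range(200):
        t.append(rng.choice(2, p=E[x[-1]]))   # emit observation t_n
        x.append(rng.choice(2, p=A[x[-1]]))   # transition to the next state
    print(np.mean(np.array(x[:-1]) == 0))     # states form long, smooth runs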


Noninformative priors

  • Sometimes we have no strong prior belief but still want to apply Bayesian inference. Then we need noninformative priors.

  • If our parameter λ is a discrete variable with K states, then we can simply set each prior probability to 1/K.

  • For continuous variables, however, it is not so clear.

  • One example of a noninformative prior is the prior over μ for a Gaussian distribution: p(μ) = N(μ|μ0, σ0²) with σ0² → ∞.

  • In this limit the effect of the prior on the posterior over μ vanishes, and the MAP estimate approaches the maximum likelihood estimate (see the sketch below).
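
A numeric sketch of that limit (NumPy; reusing the weighted-average posterior mean from the MAP slide, with illustrative data and prior values):

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.normal(2.0, 1.0, 20)
    N, mu_ml, mu0, sigma_sq = len(x), x.mean(), 0.0, 1.0

    for sigma0_sq in [0.1, 1.0, 100.0, 1e6]:
        # Posterior mean under the Gaussian prior N(mu0, sigma0_sq).
        mu_map = (sigma_sq * mu0 + N * sigma0_sq * mu_ml) / (N * sigma0_sq + sigma_sq)
        print(sigma0_sq, mu_map)  # approaches mu_ml as the prior flattens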


Empirical Bayes

  • But what if we still want to use prior information, only learned from the data rather than fixed in advance?

  • Imagine the following hierarchical model: a hyperparameter λ governs group parameters θs, which in turn generate the observations xn.

[Figure: plate diagram — λ at the top; θs inside a plate over S groups; xn inside a nested plate over N observations.]

  • We cannot do full Bayesian inference, but we can approximate it by finding the best λ* that maximizes the marginal likelihood p(X|λ) = ∫ p(X|θ) p(θ|λ) dθ.


Empirical Bayes

  • We can estimate the result by the following iterative procedure (an EM algorithm):

  • Initialize λ*

  • E-step: compute p(θ|X, λ*) given the current fixed λ*

  • M-step: update λ* to maximize the expected complete-data log-likelihood, λ* ← argmaxλ ∫ p(θ|X, λ*old) ln p(X, θ|λ) dθ

  • This illustrates the other name for Empirical Bayes: maximum marginal likelihood.

  • This is not a fully Bayesian treatment; however, it offers a useful compromise between the Bayesian and frequentist approaches. (A worked sketch follows.)
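
A minimal worked sketch (NumPy; the Gaussian hierarchy, one observation per group, and unit noise variance are illustrative assumptions): θs ~ N(0, λ), xs ~ N(θs, 1), and EM learns the prior variance λ from the data.

    import numpy as np

    rng = np.random.default_rng(8)
    true_lam, S = 4.0, 5000
    theta = rng.normal(0.0, np.sqrt(true_lam), S)   # group parameters
    x = rng.normal(theta, 1.0)                      # one observation per group

    lam = 1.0                                       # initialize lambda*
    for _ in range(100):
        # E-step: posterior of each theta_s given x_s and the current lambda:
        # theta_s | x_s ~ N(lam/(lam+1) * x_s, lam/(lam+1))
        m = lam / (lam + 1.0) * x
        v = lam / (lam + 1.0)
        # M-step: maximize the expected complete-data log-likelihood over lambda,
        # which gives lambda = mean of E[theta_s^2].
        lam = np.mean(m ** 2 + v)
    print(lam, np.mean(x ** 2) - 1.0)  # EM matches the analytic MML answer (~4)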