- By **zenia**

## PowerPoint Slideshow about 'Bayesian Inference' - zenia


Presentation Transcript

Outline

- Probability distributions
- Maximum likelihood estimation
- Maximum a posteriori estimation
- Conjugate priors
- Conceptualizing models as collections of priors
- Noninformative priors
- Empirical Bayes

Probability distribution

- Density estimation – to model the distribution p(x) of a random variable x given a finite set of observations x1, …, xN.

Nonparametric approach

- Histogram
- Kernel density estimation
- Nearest neighbor approach

Parametric approach

- Gaussian distribution
- Beta distribution
- …
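As a rough illustration of the two approaches, the sketch below estimates a density from the same sample with a histogram (nonparametric) and with a fitted Gaussian (parametric). The function names, the bin count, and the sample values are assumptions made for this example and do not come from the slides.

```python
import math

def histogram_density(samples, num_bins):
    """Nonparametric estimate: piecewise-constant density over equal-width bins."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in samples:
        # Clamp the index so that x == hi falls into the last bin.
        index = min(int((x - lo) / width), num_bins - 1)
        counts[index] += 1
    # Dividing by N * width makes the bars integrate to one.
    return [c / (len(samples) * width) for c in counts], width

def gaussian_fit(samples):
    """Parametric estimate: a Gaussian summarized by its mean and variance."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, variance

def gaussian_pdf(x, mean, variance):
    """Density of a Gaussian with the given mean and variance at point x."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2 * math.pi * variance)

# Illustrative sample (hypothetical values).
samples = [1.2, 1.9, 2.1, 2.4, 2.6, 3.0, 3.3, 3.8]
densities, width = histogram_density(samples, num_bins=4)
mean, variance = gaussian_fit(samples)
```

The histogram needs the whole sample (or its bin counts) to describe the density, whereas the parametric fit compresses it into two numbers.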

Gaussian distribution

- The central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and a well-defined variance, will be approximately normally distributed.

Bean machine by Sir Francis Galton
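The bean machine can be imitated numerically: each ball takes a series of independent left or right steps, and its final position is their sum (the mean scaled by the number of rows). The sketch below is a simulation under assumed settings; the number of rows, the number of balls, and the fixed seed are illustrative choices.

```python
import random
import statistics

def bean_machine(num_rows, num_balls, seed=0):
    """Simulate final positions of balls falling through rows of pegs."""
    rng = random.Random(seed)
    positions = []
    for _ in range(num_balls):
        # Each peg deflects the ball one step left (-1) or right (+1).
        positions.append(sum(rng.choice((-1, 1)) for _ in range(num_rows)))
    return positions

positions = bean_machine(num_rows=100, num_balls=5000)
sample_mean = statistics.fmean(positions)
sample_std = statistics.pstdev(positions)
# Each step has mean 0 and variance 1, so the sum of 100 steps has
# mean 0 and standard deviation sqrt(100) = 10 in theory.
```

A histogram of `positions` should show the familiar bell shape, even though each individual step is far from Gaussian.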

Maximum likelihood estimation

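- The frequentist approach to estimating the parameters of a distribution from a set of observations is to maximize the likelihood.
- The data are assumed to be i.i.d., so the likelihood factorizes into a product over the observations.
- The logarithm is a monotonic transformation, so maximizing the log-likelihood yields the same estimate.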
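In symbols, the two remarks above correspond to the following standard expressions. The notation is assumed here: θ stands for the parameters and X = {x1, …, xN} for the observations.

```latex
p(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta)
\qquad \text{(data are i.i.d.)}

\theta_{\mathrm{ML}}
  = \arg\max_{\theta} \, p(X \mid \theta)
  = \arg\max_{\theta} \sum_{n=1}^{N} \ln p(x_n \mid \theta)
\qquad \text{(the logarithm is monotonic)}
```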

MLE for Gaussian distribution

- The maximum likelihood estimate of the mean is the simple average of the observations.
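A minimal sketch of the Gaussian maximum likelihood estimates, assuming a one-dimensional sample. The function name and the data are hypothetical; note that the maximum likelihood variance divides by N rather than N − 1.

```python
def gaussian_mle(samples):
    """Return the maximum likelihood mean and variance of a 1-D Gaussian."""
    n = len(samples)
    mu_ml = sum(samples) / n  # the simple average
    var_ml = sum((x - mu_ml) ** 2 for x in samples) / n  # divides by N, not N - 1
    return mu_ml, var_ml

mu_ml, var_ml = gaussian_mle([2.0, 4.0, 4.0, 6.0])
```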

Maximum a posteriori estimation

- The Bayesian approach to estimating the parameters of a distribution from a set of observations is to maximize the posterior distribution.
- This allows prior information to be taken into account.
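Written out with Bayes' theorem, and assuming the same notation as for maximum likelihood (θ for the parameters, X for the observations, both not defined on the slide), the estimate is:

```latex
\theta_{\mathrm{MAP}}
  = \arg\max_{\theta} \, p(\theta \mid X)
  = \arg\max_{\theta} \, \frac{p(X \mid \theta)\, p(\theta)}{p(X)}
  = \arg\max_{\theta} \, \bigl[ \ln p(X \mid \theta) + \ln p(\theta) \bigr]
```

The evidence p(X) does not depend on θ, so it drops out of the maximization; the only difference from maximum likelihood is the extra term ln p(θ).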

Conjugate prior

- In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior.
- For any member of the exponential family, there exists a conjugate prior that can be written in the form
- Important conjugate pairs include:
  - Binomial – Beta
  - Multinomial – Dirichlet
  - Gaussian – Gaussian (for mean)
  - Gaussian – Gamma (for precision)
  - Exponential – Gamma
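One common way to write the exponential family and its conjugate prior is shown below. It follows the notation of Bishop's *Pattern Recognition and Machine Learning*, which is an assumption: the slide does not define h, g, u, f, χ, or ν.

```latex
p(x \mid \eta) = h(x)\, g(\eta)\, \exp\{\eta^{\mathsf{T}} u(x)\}

p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{\mathsf{T}} \chi\}

p(\eta \mid X, \chi, \nu) \propto g(\eta)^{\nu + N}
  \exp\Bigl\{ \eta^{\mathsf{T}} \Bigl( \sum_{n=1}^{N} u(x_n) + \nu \chi \Bigr) \Bigr\}
```

The posterior has the same functional form as the prior, with ν acting like a count of prior pseudo-observations.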

MLE for Binomial distribution

- The binomial distribution models the probability of m “heads” out of N tosses.
- Its only parameter, μ, encodes the probability of a single event (“head”).
- The maximum likelihood estimate is given by $\mu_{\mathrm{ML}} = m / N$.
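A short derivation of the binomial maximum likelihood estimate, using the standard form of the binomial distribution (the symbol Bin is notation assumed here):

```latex
\mathrm{Bin}(m \mid N, \mu) = \binom{N}{m} \mu^{m} (1-\mu)^{N-m}

\frac{\partial}{\partial \mu} \ln \mathrm{Bin}(m \mid N, \mu)
  = \frac{m}{\mu} - \frac{N-m}{1-\mu} = 0
\quad\Longrightarrow\quad
\mu_{\mathrm{ML}} = \frac{m}{N}
```

With few tosses this estimate can be extreme: three heads in three tosses gives an estimate of 1.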

MAP for Binomial distribution

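- The conjugate prior for this distribution is the Beta distribution, $\mathrm{Beta}(\mu \mid a, b) \propto \mu^{a-1}(1-\mu)^{b-1}$.
- The posterior is then given by $p(\mu \mid m, l, a, b) \propto \mu^{m+a-1}(1-\mu)^{l+b-1}$, where l = N – m is simply the number of “tails”.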
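The Beta posterior update amounts to adding the observed counts to the prior's parameters. The sketch below assumes a Beta(a, b) prior with a, b > 1 and uses hypothetical function names; the MAP estimate is the mode of the resulting Beta distribution.

```python
def beta_posterior(m, n_tosses, a, b):
    """Posterior Beta parameters after m heads in n_tosses tosses."""
    l = n_tosses - m  # number of tails
    return a + m, b + l

def beta_mode(a, b):
    """Mode of Beta(a, b); this formula is valid for a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

# Three heads in three tosses with an assumed Beta(2, 2) prior.
a_post, b_post = beta_posterior(m=3, n_tosses=3, a=2, b=2)
mu_map = beta_mode(a_post, b_post)
mu_ml = 3 / 3  # maximum likelihood estimate m / N for comparison
```

Here the prior pulls the estimate away from the extreme value that maximum likelihood gives for such a small sample.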

Models as collections of priors - 1

- Take a simple regression model

- Add a prior on weights

- And get Bayesian linear regression!
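A minimal sketch of this idea for the simplest possible case. It assumes a model y = w·x + noise with a single weight and no bias, a zero-mean Gaussian prior on w with precision `alpha`, and Gaussian noise with known precision `beta`. These names and the data are illustrative assumptions, since the slide's equations are not available.

```python
def posterior_over_weight(xs, ys, alpha, beta):
    """Gaussian posterior over a single regression weight w.

    Prior: w ~ N(0, 1/alpha). Likelihood: y_n ~ N(w * x_n, 1/beta).
    Returns the posterior mean and variance of w.
    """
    precision = alpha + beta * sum(x * x for x in xs)
    mean = beta * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, 1.0 / precision

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # noiseless data generated with w = 2 (hypothetical)
w_mean, w_var = posterior_over_weight(xs, ys, alpha=1.0, beta=1.0)
# With an almost flat prior the posterior mean approaches least squares.
w_flat, _ = posterior_over_weight(xs, ys, alpha=1e-9, beta=1.0)
```

The prior shrinks the posterior mean toward zero; as `alpha` goes to zero the result approaches the ordinary least-squares weight.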

Models as collections of priors - 2

- Take again a simple regression model, where yn is some function of xn

- Add a prior on the function

- And get Gaussian processes!

(Diagram: graphical models with nodes yn, β, and K)
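A prior over functions can be sketched with a kernel that says nearby inputs have similar outputs. The example below is a minimal Gaussian process regression in pure Python. It assumes a squared-exponential kernel, reads K as the kernel matrix and β as the noise precision, and uses a small hand-written linear solver; all of these are choices made for this illustration.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x

def gp_posterior_mean(x_train, y_train, x_test, beta=1e6):
    """Posterior mean of a zero-mean GP at the test inputs."""
    n = len(x_train)
    # Covariance of the noisy targets: K plus noise variance 1/beta on the diagonal.
    C = [[rbf_kernel(x_train[i], x_train[j]) + (1.0 / beta if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve(C, y_train)
    return [sum(rbf_kernel(xs, x_train[i]) * weights[i] for i in range(n))
            for xs in x_test]

x_train = [0.0, 1.0, 2.0]
y_train = [0.0, 1.0, 0.5]  # hypothetical targets
at_train = gp_posterior_mean(x_train, y_train, x_train)
far_away = gp_posterior_mean(x_train, y_train, [100.0])
```

With very little noise the posterior mean passes almost exactly through the training targets, and far from the data it falls back to the prior mean of zero.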

Models as collections of priors - 3

- Take a model where xn is discrete and unknown

- Add a prior on the states (xn), assuming they are temporally smooth

- And get a Hidden Markov Model!

(Diagram: parameter node θ; chain of states x1, x2, …, xn-1, xn, xn+1 with nodes t1, t2, …, tn-1, tn, tn+1)
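Temporal smoothness can be encoded as a transition matrix that favors staying in the same state. The sketch below runs a normalized forward pass for a two-state model, treating the t values as discrete observations of the hidden states xn. The probabilities and names are illustrative assumptions, not values from the slides.

```python
def forward_filter(observations, initial, transition, emission):
    """Return p(x_n | t_1, ..., t_n) for each n using the forward recursion."""
    num_states = len(initial)
    belief = [initial[k] * emission[k][observations[0]] for k in range(num_states)]
    total = sum(belief)
    belief = [b / total for b in belief]
    filtered = [belief]
    for obs in observations[1:]:
        # Predict: propagate the belief through the transition prior.
        predicted = [sum(belief[j] * transition[j][k] for j in range(num_states))
                     for k in range(num_states)]
        # Update: weight by the likelihood of the new observation, then normalize.
        belief = [predicted[k] * emission[k][obs] for k in range(num_states)]
        total = sum(belief)
        belief = [b / total for b in belief]
        filtered.append(belief)
    return filtered

initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]  # "sticky" states: the smoothness prior
emission = [[0.8, 0.2], [0.2, 0.8]]    # emission[state][observation]
filtered = forward_filter([0, 0, 1, 0, 0], initial, transition, emission)
```

In this run the single observation of 1 at the third step is not enough to flip the belief away from state 0, which is the smoothing effect of the prior on the states.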

Noninformative priors

- Sometimes we have no strong prior belief but still want to apply Bayesian inference. Then we need noninformative priors.
- If our parameter λ is a discrete variable with K states, then we can simply set each prior probability to 1/K.
- However, for continuous variables it is not so clear.
- One example is a noninformative prior over the mean μ of a Gaussian distribution: $p(\mu) = \mathcal{N}(\mu \mid \mu_0, \sigma_0^2)$ with $\sigma_0^2 \to \infty$.
- In this case the effect of the prior on the posterior over μ vanishes.
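The vanishing influence of the prior can be checked numerically. The sketch below assumes a Gaussian likelihood with known variance `sigma2` and a Gaussian prior on the mean with mean `mu0` and variance `sigma0_2`; these names and the data are assumptions for illustration. It uses the standard closed-form posterior mean for this conjugate pair.

```python
def posterior_mean(samples, sigma2, mu0, sigma0_2):
    """Posterior mean of a Gaussian mean under a Gaussian prior, known variance."""
    n = len(samples)
    mu_ml = sum(samples) / n
    # Weighted average of the prior mean and the maximum likelihood estimate.
    return (sigma2 * mu0 + n * sigma0_2 * mu_ml) / (n * sigma0_2 + sigma2)

samples = [4.8, 5.1, 5.3, 4.9, 5.4]  # hypothetical data, sample mean 5.1
for sigma0_2 in (0.01, 1.0, 100.0, 1e6):
    print(sigma0_2, posterior_mean(samples, sigma2=1.0, mu0=0.0, sigma0_2=sigma0_2))

broad = posterior_mean(samples, sigma2=1.0, mu0=0.0, sigma0_2=1e9)
narrow = posterior_mean(samples, sigma2=1.0, mu0=0.0, sigma0_2=1e-9)
```

As the prior variance grows, the posterior mean moves from the prior mean toward the sample mean, and in the limit it no longer depends on `mu0`.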

Empirical Bayes

- But what if we still want to assume some prior information, but want to learn it from the data instead of fixing it in advance?
- Imagine the following model (diagram: nodes λ, θs, xn; plates N and S).
- We cannot use full Bayesian inference, but we can approximate it by finding the best λ* that maximizes p(X|λ).
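The quantity being maximized is the marginal likelihood, in which the latent parameters are integrated out. The expression below assumes that θ collects all of the θs and that the data depend on λ only through θ:

```latex
\lambda^{*} = \arg\max_{\lambda} \, p(X \mid \lambda),
\qquad
p(X \mid \lambda) = \int p(X \mid \theta)\, p(\theta \mid \lambda)\, \mathrm{d}\theta
```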

Empirical Bayes

- We can estimate the result by the following iterative procedure (EM algorithm):
  - Initialize λ*
  - E-step: compute p(θ|X,λ) given fixed λ*
  - M-step: update λ* using the posterior over θ from the E-step
- This illustrates the other term for Empirical Bayes: maximum marginal likelihood.
- This is not a fully Bayesian treatment, but it offers a useful compromise between the Bayesian and frequentist approaches.
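The iterative procedure can be sketched for a simple Gaussian hierarchy. The model is an assumption made for this example: each group s has a mean θs drawn from N(λ, `tau2`), the observations in the group are drawn from N(θs, `sigma2`), and only λ is learned while both variances are known.

```python
def empirical_bayes_em(groups, sigma2, tau2, lam=0.0, num_iters=50):
    """Estimate the prior mean lam by EM.

    groups: list of lists; group s holds observations x_n ~ N(theta_s, sigma2),
    with theta_s ~ N(lam, tau2).
    """
    for _ in range(num_iters):
        # E-step: posterior mean of each theta_s given the current lam.
        post_means = []
        for xs in groups:
            precision = 1.0 / tau2 + len(xs) / sigma2
            post_means.append((lam / tau2 + sum(xs) / sigma2) / precision)
        # M-step: the lam that maximizes the expected complete-data
        # log-likelihood is the average of the posterior means.
        lam = sum(post_means) / len(post_means)
    return lam

groups = [[1.0, 1.4, 0.9], [2.1, 2.5, 2.3], [2.9, 3.2, 3.1]]  # hypothetical data
lam_star = empirical_bayes_em(groups, sigma2=1.0, tau2=1.0)
```

For equally sized groups the iteration converges to the average of the group means, which is also the value that maximizes the marginal likelihood p(X|λ) in this model.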
