4. Maximum Likelihood


### 4. Maximum Likelihood

Prof. A.L. Yuille

Stat 231. Fall 2004.

Learning Probability Distributions.
• Learn the likelihood functions and priors from datasets.
• Two Main Strategies: Parametric and Non-Parametric.
• This lecture and the next concentrate on Parametric methods.

(These assume a parametric form for the distributions.)

Maximum Likelihood Estimation.

Assume the distribution is of the form p(x | θ), with unknown parameters θ.

• Independent Identically Distributed (I.I.D.) samples x_1, ..., x_N;
• Choose θ* = arg max_θ ∏_{i=1}^N p(x_i | θ), equivalently maximizing the log-likelihood Σ_i log p(x_i | θ).
Supervised versus Unsupervised Learning.
• Supervised Learning assumes that we know the class label for each datapoint.
• I.e. we are given pairs (x_1, y_1), ..., (x_N, y_N),
• where x_i is the datapoint and y_i is the class label.
• Unsupervised Learning does not assume that the class labels are specified. This is a harder task.
• But "unsupervised methods" can also be used for supervised data if the goal is to determine structure in the data (e.g. a mixture of Gaussians).
• Stat 231 is almost entirely concerned with supervised learning.
Example of MLE.
• One-Dimensional Gaussian Distribution: p(x | μ, σ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)).
• Solve for μ and σ by differentiating the log-likelihood and setting the derivatives to zero: μ̂ = (1/N) Σ_i x_i, σ̂² = (1/N) Σ_i (x_i − μ̂)².
MLE
• The Gaussian is unusual because the parameters of the distribution can be expressed as an analytic (closed-form) expression of the data.
• More usually, iterative algorithms are required.
• Modeling problem: for complicated patterns – the shape of fish, natural language, etc. – considerable work is required to find a suitable parametric form for the probability distributions.
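A minimal sketch of the closed-form Gaussian MLE above, using NumPy (the dataset and its true parameters are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: N i.i.d. samples from a 1-D Gaussian.
data = rng.normal(loc=2.0, scale=1.5, size=10_000)

# Closed-form MLE: the sample mean, and the (1/N, biased) sample
# variance, obtained by setting the log-likelihood derivatives to zero.
mu_hat = data.mean()
sigma2_hat = ((data - mu_hat) ** 2).mean()

print(mu_hat, sigma2_hat)  # close to the true values 2.0 and 1.5**2 = 2.25
```

Note the 1/N (rather than 1/(N−1)) variance: MLE gives the biased estimator.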
MLE and Kullback-Leibler
• What happens if the data is not generated by the model that we assume?
• Suppose the true distribution is p_t(x) and our models are of the form p(x | θ).
• The Kullback-Leibler divergence is: D(p_t ‖ p_θ) = ∫ p_t(x) log [ p_t(x) / p(x | θ) ] dx.
• This is non-negative, and zero only when p(x | θ) = p_t(x).
• K-L is a measure of the difference between the true distribution p_t(x) and the model p(x | θ).
MLE and Kullback-Leibler
• Samples x_1, ..., x_N drawn i.i.d. from p_t(x).
• Approximate the expectation over p_t(x) by the sample average.
• By the empirical KL: D̂(θ) = (1/N) Σ_i log p_t(x_i) − (1/N) Σ_i log p(x_i | θ).
• Minimizing the empirical KL is equivalent to MLE: the first term is independent of θ, and the second is minus the average log-likelihood.
• We find the distribution of the form p(x | θ) that is closest, in the KL sense, to the true distribution.

### MLE example

We denote the log-likelihood as L(θ) = Σ_{i=1}^N log p(x_i | θ),

a function of θ.

θ* is computed by solving the equations ∂L/∂θ = 0.

For example, the Gaussian family p(x | μ, σ)

gives a closed-form solution.
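When ∂L/∂θ = 0 has no closed-form solution, θ* must be found numerically. A sketch (a hypothetical example, not from the slides): the location θ of a Cauchy distribution, fitted by gradient ascent on the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical example with no closed-form MLE: the location theta of a
# Cauchy distribution, p(x | theta) = 1 / (pi * (1 + (x - theta)^2)).
data = rng.standard_cauchy(2_000) + 3.0  # true location 3.0

def dlogL(theta):
    # Derivative of the log-likelihood L(theta) = sum_i log p(x_i | theta):
    # dL/dtheta = sum_i 2 (x_i - theta) / (1 + (x_i - theta)^2).
    d = data - theta
    return np.sum(2.0 * d / (1.0 + d ** 2))

# dL/dtheta = 0 cannot be solved analytically, so ascend the
# log-likelihood from a robust starting point (the sample median).
theta = np.median(data)
for _ in range(300):
    theta += 5e-4 * dlogL(theta)

print(theta)  # close to the true location 3.0
```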

Learning with a Prior.
• We can put a prior p(θ) on the parameter values, and estimate θ* = arg max_θ p(θ) ∏_i p(x_i | θ).
• We can estimate this recursively (if samples are i.i.d.): p(θ | x_1, ..., x_{n+1}) ∝ p(x_{n+1} | θ) p(θ | x_1, ..., x_n).
• Bayes Learning: estimate a full probability distribution on θ, the posterior p(θ | x_1, ..., x_N), rather than a single value.
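The recursive update can be sketched for a conjugate case (a hypothetical setup, not from the slides): a Gaussian likelihood with known unit variance and a Gaussian prior on the mean θ, updated one sample at a time:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: Gaussian with true mean 1.5, known variance 1.
data = rng.normal(1.5, 1.0, size=500)

m, v = 0.0, 10.0  # prior on theta: N(mean=0, variance=10)

# Recursive update: p(theta | x_1..x_{n+1}) is proportional to
# p(x_{n+1} | theta) * p(theta | x_1..x_n); for this conjugate pair
# the posterior stays Gaussian and only (m, v) need updating.
for x in data:
    v_new = 1.0 / (1.0 / v + 1.0)  # precisions add (likelihood precision = 1)
    m = v_new * (m / v + x)        # precision-weighted mean
    v = v_new

print(m, v)  # posterior concentrates near the true mean 1.5
```

The posterior variance v shrinks as samples accumulate, so the distribution on θ concentrates around the true parameter.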