
# 4. Maximum Likelihood






### 4. Maximum Likelihood

Prof. A.L. Yuille

Stat 231. Fall 2004.

### Learning Probability Distributions

• Learn the likelihood functions and priors from datasets.

• Two main strategies: parametric and non-parametric.

• This lecture and the next concentrate on parametric methods (i.e., we assume a parametric form for the distributions).

Assume the distribution is of the form $p(x\,|\,\theta)$, where $\theta$ denotes the parameters.

• Given independent identically distributed (i.i.d.) samples $x_1, \dots, x_N$,

• choose $\hat{\theta} = \arg\max_{\theta} \prod_{i=1}^{N} p(x_i\,|\,\theta)$, or equivalently maximize the log-likelihood $\sum_{i=1}^{N} \log p(x_i\,|\,\theta)$. This is the Maximum Likelihood Estimate (MLE).

• Supervised Learning assumes that we know the class label for each datapoint.

• I.e., we are given pairs $\{(x_i, y_i) : i = 1, \dots, N\}$,

• where $x_i$ is the datapoint and $y_i$ is the class label.

• Unsupervised Learning does not assume that the class labels are specified. This is a harder task.

• But “unsupervised methods” can also be used for supervised data if the goal is to determine structure in the data (e.g. mixture of Gaussians).

• Stat 231 is almost entirely concerned with supervised learning.

• One-Dimensional Gaussian Distribution: $p(x\,|\,\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.

• Solve for $\mu, \sigma$ by differentiating the log-likelihood and setting the derivatives to zero: $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i$, $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu})^2$.

• The Gaussian is unusual because the parameters of the distribution can be expressed analytically in terms of the data.

• More usually, algorithms are required.
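The closed-form Gaussian estimates above are easy to check numerically. A minimal sketch (the sample size and true parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=10_000)  # samples from N(2, 1.5^2)

# Closed-form MLE for a 1-D Gaussian:
# sample mean and (biased, 1/N) sample variance.
mu_hat = data.sum() / len(data)
sigma2_hat = ((data - mu_hat) ** 2).sum() / len(data)

print(mu_hat, np.sqrt(sigma2_hat))  # close to the true 2.0 and 1.5
```

Note the MLE variance divides by $N$, not $N-1$; the maximum-likelihood estimator is biased for finite samples.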

• Modeling problem: for complicated patterns – shape of fish, natural language, etc. – it requires considerable work to find a suitable parametric form for the probability distributions.

• What happens if the data is not generated by the model that we assume?

• Suppose the true distribution is $p_T(x)$ and our models are of the form $p(x\,|\,\theta)$.

• The Kullback-Leibler divergence is: $D(p_T \,\|\, p_\theta) = \int p_T(x) \log \frac{p_T(x)}{p(x\,|\,\theta)}\, dx$.

• This is $\geq 0$, with equality if and only if $p_T(x) = p(x\,|\,\theta)$ (almost everywhere).

• K-L is a measure of the difference between $p_T(x)$ and $p(x\,|\,\theta)$.

• Samples $x_1, \dots, x_N$ are drawn from $p_T(x)$.

• Approximate the expectation over $p_T$ by the average over the samples.

• This gives the empirical KL: $\hat{D}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \log \frac{p_T(x_i)}{p(x_i\,|\,\theta)} = \frac{1}{N}\sum_{i=1}^{N} \log p_T(x_i) \;-\; \frac{1}{N}\sum_{i=1}^{N} \log p(x_i\,|\,\theta)$.

• The first term does not depend on $\theta$, so minimizing the empirical KL is equivalent to MLE.

• We find the distribution of the form $p(x\,|\,\theta^*)$ that is closest to the true distribution in the KL sense.
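This equivalence can be seen numerically: since the $p_T$ term is constant in $\theta$, the $\theta$ minimizing the empirical KL is the $\theta$ maximizing the average log-likelihood. A small sketch using a grid search over the mean of a unit-variance Gaussian model (the grid and true mean are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=0.7, scale=1.0, size=5_000)  # draws from the "true" p_T

# Average log-likelihood of the unit-variance Gaussian model N(theta, 1).
def avg_log_lik(theta):
    return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (data - theta) ** 2)

# Minimizing the empirical KL over theta = maximizing avg_log_lik(theta),
# because the (1/N) sum log p_T(x_i) term does not involve theta.
grid = np.linspace(-2.0, 3.0, 2001)
theta_star = grid[np.argmax([avg_log_lik(t) for t in grid])]

print(theta_star, data.mean())  # agree up to the grid spacing
```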

### MLE example

We denote the log-likelihood as $L(\theta) = \sum_{i=1}^{N} \log p(x_i\,|\,\theta)$,

a function of $\theta$.

$\theta^*$ is computed by solving the equations $\frac{\partial L}{\partial \theta} = 0$.

For example, the Gaussian family

gives a closed-form solution.
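When no closed form is available, $\partial L / \partial \theta = 0$ is solved numerically, e.g. by gradient ascent on $L$. A sketch on the Gaussian mean with the variance held fixed (step size and starting point are illustrative; this family does have a closed form, which makes the iteration easy to check):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=-1.0, scale=2.0, size=1_000)

# Gradient of the log-likelihood of N(mu, sigma^2) w.r.t. mu (sigma fixed):
#   dL/dmu = sum_i (x_i - mu) / sigma^2
sigma2 = 4.0
mu = 0.0      # arbitrary starting point
lr = 1e-3     # step size (illustrative)
for _ in range(500):
    grad = np.sum(data - mu) / sigma2
    mu += lr * grad

print(mu)  # converges to the sample mean, the closed-form MLE
```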

• We can put a prior $p(\theta)$ on the parameter values, giving the posterior $p(\theta\,|\,x_1,\dots,x_N) \propto p(x_1,\dots,x_N\,|\,\theta)\, p(\theta)$.

• We can estimate this recursively (if the samples are i.i.d.): $p(\theta\,|\,x_1,\dots,x_N) \propto p(x_N\,|\,\theta)\, p(\theta\,|\,x_1,\dots,x_{N-1})$.

• Bayes Learning: estimate a probability distribution on $\theta$, rather than a single estimate $\hat{\theta}$.
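For a Gaussian likelihood with known variance and a Gaussian prior on the mean, the recursive update stays Gaussian (conjugacy), so each datapoint updates the posterior mean and variance in closed form. A sketch, with all hyperparameters illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0                  # known observation variance
data = rng.normal(loc=0.5, scale=np.sqrt(sigma2), size=2_000)

# Prior on the mean: N(m, v). Each datapoint x updates (m, v) in closed form:
#   new precision = prior precision + likelihood precision,
#   new mean = precision-weighted average of prior mean and x.
m, v = 0.0, 10.0              # prior mean and variance (illustrative)
for x in data:
    v_new = 1.0 / (1.0 / v + 1.0 / sigma2)
    m = v_new * (m / v + x / sigma2)
    v = v_new

print(m, v)  # posterior mean near the true mean; posterior variance shrinks
```

Processing the samples one at a time gives the same posterior as a single batch update, which is exactly the recursion in the bullet above.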