## 4. Maximum Likelihood

### MLE example

### Learning Probability Distributions

- Learn the likelihood functions and priors from datasets.
- Two main strategies: parametric and non-parametric.
- This lecture and the next concentrate on parametric methods, which assume a parametric form for the distributions.

### Maximum Likelihood Estimation

Assume the distribution is of the form $p(x|\theta)$ with unknown parameters $\theta$.

- Independent Identically Distributed (I.I.D.) samples $x_1, \dots, x_N$;
- Choose $\theta^* = \arg\max_\theta \prod_{i=1}^N p(x_i|\theta)$.
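A minimal sketch of this definition, using an assumed toy Bernoulli model: because the samples are i.i.d., the likelihood factorizes into a product, and maximizing the product is equivalent to maximizing the sum of logs (which is also far better behaved numerically).

```python
import math

# Assumed toy model: Bernoulli(theta), observed data of 8 heads and 2 tails.
data = [1] * 8 + [0] * 2

def likelihood(theta):
    # product over i.i.d. samples: prod_i p(x_i | theta)
    p = 1.0
    for x in data:
        p *= theta if x == 1 else (1 - theta)
    return p

def log_likelihood(theta):
    # equivalent objective: sum_i log p(x_i | theta)
    return sum(math.log(theta if x == 1 else 1 - theta) for x in data)

# crude grid search over candidate parameter values
grid = [i / 1000 for i in range(1, 1000)]
theta_star = max(grid, key=likelihood)
theta_star_log = max(grid, key=log_likelihood)
print(theta_star, theta_star_log)  # both 0.8, the MLE for 8 heads in 10 flips
```

Both objectives pick the same $\theta^*$; in practice one always works with the log-likelihood, since the raw product underflows for large $N$.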

### Supervised versus Unsupervised Learning

- Supervised learning assumes that we know the class label for each datapoint.
- I.e. we are given pairs $\{(x_i, y_i)\}$, where $x_i$ is the datapoint and $y_i$ is the class label.
- Unsupervised learning does not assume that the class labels are specified. This is a harder task.
- But "unsupervised methods" can also be used for supervised data if the goal is to determine structure in the data (e.g. a mixture of Gaussians).
- Stat 231 is almost entirely concerned with supervised learning.

### Example of MLE

- One-dimensional Gaussian distribution: $p(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}$.
- Solve for $\mu, \sigma$ by differentiating the log-likelihood and setting the derivatives to zero:
  $\hat{\mu} = \frac{1}{N}\sum_{i=1}^N x_i, \qquad \hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \hat{\mu})^2.$
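The closed-form estimates above can be checked directly; here is a small sketch on synthetic data (the true parameters 2.0 and 0.5 are arbitrary choices for illustration).

```python
import math
import random

random.seed(0)
true_mu, true_sigma = 2.0, 0.5
data = [random.gauss(true_mu, true_sigma) for _ in range(10_000)]

n = len(data)
mu_hat = sum(data) / n                               # MLE of the mean: sample mean
var_hat = sum((x - mu_hat) ** 2 for x in data) / n   # MLE of the variance (divides by N, not N-1)
sigma_hat = math.sqrt(var_hat)

print(mu_hat, sigma_hat)  # close to the true values 2.0 and 0.5
```

Note the MLE variance divides by $N$; it is slightly biased relative to the unbiased $N-1$ estimator, though the difference vanishes as $N$ grows.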

### MLE

- The Gaussian is unusual because the maximum-likelihood parameters can be expressed as an analytic expression of the data.
- More usually, iterative numerical algorithms are required.
- Modeling problem: for complicated patterns – the shape of a fish, natural language, etc. – it requires considerable work to find a suitable parametric form for the probability distributions.

### MLE and Kullback-Leibler

- What happens if the data is not generated by the model that we assume?
- Suppose the true distribution is $f(x)$ and our models are of the form $p(x|\theta)$.
- The Kullback-Leibler divergence is:
  $D(f \,\|\, p_\theta) = \int f(x) \log \frac{f(x)}{p(x|\theta)}\, dx.$
- This is $\geq 0$, with equality if and only if $f(x) = p(x|\theta)$.
- K-L is a measure of the difference between $f(x)$ and $p(x|\theta)$.

### MLE and Kullback-Leibler

- Samples $x_1, \dots, x_N$ drawn from $f(x)$.
- Approximate the expectation over $f$ by the empirical average over the samples.
- The empirical KL:
  $\hat{D}(\theta) = \frac{1}{N}\sum_{i=1}^N \log f(x_i) - \frac{1}{N}\sum_{i=1}^N \log p(x_i|\theta).$
- Minimizing the empirical KL over $\theta$ is equivalent to MLE, since the first term does not depend on $\theta$.
- We find the distribution of form $p(x|\theta^*)$ that is closest to $f$ in this sense.

We denote the log-likelihood as $L(\theta) = \sum_{i=1}^N \log p(x_i|\theta)$, a function of $\theta$.

$\theta^*$ is computed by solving the equations $\partial L / \partial \theta = 0$.

For example, the Gaussian family gives a closed-form solution.
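When no closed form exists, $\theta^*$ is found numerically. A sketch under assumed conditions (Gaussian family with known $\sigma = 1$, crude grid search standing in for a real optimizer): minimizing the $\theta$-dependent part of the empirical KL, i.e. the average negative log-likelihood, recovers the same answer as the analytic MLE.

```python
import math
import random

random.seed(1)
data = [random.gauss(3.0, 1.0) for _ in range(5_000)]

def neg_log_likelihood(mu, xs, sigma=1.0):
    # the theta-dependent part of the empirical KL; the sum of log f(x_i)
    # is a constant that does not affect the minimizer
    return sum(0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))
               for x in xs)

# crude grid search over candidate means in [2.00, 3.99]
grid = [i / 100 for i in range(200, 400)]
mu_star = min(grid, key=lambda mu: neg_log_likelihood(mu, data))

print(mu_star, sum(data) / len(data))  # both near 3.0
```

The grid minimizer lands on the grid point nearest the sample mean, which is exactly the closed-form Gaussian MLE from the earlier slide.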

### Learning with a Prior

- We can put a prior $p(\theta)$ on the parameter values, and estimate the posterior $p(\theta|x_1, \dots, x_N) \propto p(x_1, \dots, x_N|\theta)\, p(\theta)$.
- We can estimate this recursively (if the samples are i.i.d.):
  $p(\theta|x_1, \dots, x_N) \propto p(x_N|\theta)\, p(\theta|x_1, \dots, x_{N-1}).$
- Bayes learning: estimate a probability distribution on $\theta$, not just a point estimate.
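The recursive update can be sketched in a conjugate setting (an assumed example, not from the slides): a Gaussian prior on the mean $\theta$ with known observation variance stays Gaussian after each sample, so the posterior is carried forward one observation at a time.

```python
import random

random.seed(2)
sigma2 = 1.0            # known observation variance
mu, tau2 = 0.0, 10.0    # prior p(theta) = N(mu, tau2): broad, uncommitted

data = [random.gauss(1.5, 1.0) for _ in range(500)]  # true mean 1.5 (assumed)

for x in data:
    # p(theta | x_1..x_n) ∝ p(x_n | theta) p(theta | x_1..x_{n-1});
    # for Gaussians, precisions add and means combine precision-weighted
    precision = 1.0 / tau2 + 1.0 / sigma2
    mu = (mu / tau2 + x / sigma2) / precision
    tau2 = 1.0 / precision

print(mu, tau2)  # posterior concentrates near the true mean 1.5
```

After many samples the posterior mean approaches the MLE and the posterior variance shrinks toward zero, illustrating how Bayes learning maintains a full distribution on $\theta$ rather than a single estimate.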
