
Maximum Likelihood




  1. Maximum Likelihood • There are three major paradigms for estimating linear models • Method of Moments • Oldest estimation method • Population moments are best estimated by sample moments • Not too useful for complex estimation • Least Squares • Minimize the sum of the squared errors • Maximum Likelihood Estimation • Find the model that has the highest probability of producing the observed data (i.e., the maximum likelihood)

  2. MLE - A Simple Idea • Maximum Likelihood Estimation (MLE) is a relatively simple idea. • Different populations generate different samples, and any given sample is more likely to have come from one population than from another

  3. An Illustration • Suppose you have three different normally distributed populations • And a set of data points, x1, x2, …, x10

  4. Parameters of the Model • Given that these points are normally distributed, the populations differ only in their means and standard deviations • The population with a mean of 5 will generate a sample with a mean close to 5 more often than populations with means closer to 6 or 4.

  5. More likely … • It is more likely that the population with the mean of 5 generated the sample than any of the other populations • Variances can factor into this likelihood as well.
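A minimal numerical sketch of this illustration: it assumes the three candidate populations are normal with means 4, 5, and 6 and a common standard deviation of 1, and uses a made-up sample of ten points (none of these specific values come from the slides).

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample x1..x10 (values made up for illustration)
sample = np.array([4.8, 5.3, 4.9, 5.6, 4.4, 5.1, 5.0, 4.7, 5.2, 4.9])

# Likelihood of the sample under each candidate population,
# assuming means 4, 5, 6 and a common standard deviation of 1
for mu in (4.0, 5.0, 6.0):
    likelihood = np.prod(norm.pdf(sample, loc=mu, scale=1.0))
    print(f"mean = {mu}: likelihood = {likelihood:.3e}")

# The population with mean 5 yields the largest likelihood for this sample.
```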

  6. A Definition of MLE • If a random variable X has a probability distribution f(x) characterized by parameters θ1, θ2, …, θk, and if we observe a sample x1, x2, …, xn, then the maximum likelihood estimators of θ1, θ2, …, θk are those values of the parameters that would generate the observed sample most often

  7. An example • Suppose X is a binary variable that can take on the value of 1 with probability π: f(0) = 1 – π, f(1) = π • Suppose a random sample is drawn from this population: {1, 1, 0}

  8. The MLE of π • Let us consider values for π between 0.0 and 1.0 • If π = 0.0, there are no successes and we could not generate the sample. (Similarly, π = 1.0 won’t work either – we couldn’t observe the 0.) • But what about π = .1?

  9. π = .1 • The probability of drawing our sample would be: f(1, 1, 0) = f(1)f(1)f(0) = .1 × .1 × .9 = .009 • This is because the joint probability of independent events equals the product of the probabilities of the simple events
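The same calculation as a short sketch in Python (the small pmf helper below is supplied for illustration; it is not part of the slides):

```python
# Joint probability of the sample {1, 1, 0} under a Bernoulli model with pi = 0.1
def bernoulli_pmf(x, pi):
    # f(1) = pi, f(0) = 1 - pi
    return pi if x == 1 else 1 - pi

sample = [1, 1, 0]
pi = 0.1

likelihood = 1.0
for x in sample:
    likelihood *= bernoulli_pmf(x, pi)

print(likelihood)  # .1 * .1 * .9 = .009
```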

  10. A Grid search for π
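The slide's own table is not reproduced in the transcript; below is a sketch of what such a grid search could look like, assuming a step size of 0.1 (the step size is an assumption):

```python
# Grid search: evaluate the likelihood of {1, 1, 0} at pi = 0.0, 0.1, ..., 1.0
# and keep the value of pi with the highest likelihood.
sample = [1, 1, 0]

best_pi, best_likelihood = 0.0, -1.0
for step in range(11):
    pi = step / 10
    likelihood = 1.0
    for x in sample:
        likelihood *= pi if x == 1 else 1 - pi
    print(f"pi = {pi:.1f}  likelihood = {likelihood:.4f}")
    if likelihood > best_likelihood:
        best_pi, best_likelihood = pi, likelihood

print(f"grid MLE: pi = {best_pi}")  # 0.7 on this grid; the exact MLE is 2/3
```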

  11. Our MLE of π • Given the iterative grid search, we would conclude that our MLE for π equals .7 • Yes, if we took it to the next significant digit, it would be .67. • Hence we would say that a population with π = .7 would generate the sample {1, 1, 0} more often than any of the other populations considered in the grid search

  12. The Likelihood Function • In order to derive MLEs, we therefore need to express the likelihood function l: l = f(x1, x2, …, xn) • And if the observations are independent: l = f(x1)f(x2) … f(xn)

  13. To find MLE • As in least squares, set the first derivative equal to 0.0 • The second derivative must also be negative, confirming a maximum (equivalently, it is positive when minimizing the negative log-likelihood)
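A worked instance of this step for the earlier sample {1, 1, 0}; the slides state only the numerical answer, so the derivation below is supplied for illustration:

```latex
% Likelihood of the sample {1, 1, 0} as a function of pi, and its maximizer
\begin{aligned}
l(\pi) &= f(1)\,f(1)\,f(0) = \pi^{2}(1-\pi) \\
\frac{dl}{d\pi} &= 2\pi - 3\pi^{2} = \pi\,(2 - 3\pi) = 0
  \quad\Longrightarrow\quad \hat{\pi} = \tfrac{2}{3} \approx 0.67 \\
\frac{d^{2}l}{d\pi^{2}}\bigg|_{\pi = 2/3} &= 2 - 6 \cdot \tfrac{2}{3} = -2 < 0
  \quad\text{(confirming a maximum)}
\end{aligned}
```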

  14. Log-Likelihood • In practice, the log-likelihood is easier to work with. • The logs of multiplicative components are added, and constant terms then drop out when differentiating, making the derivatives easier to obtain: if a = bc, then log(a) = log(b) + log(c) • In addition, logs make otherwise intractably small numbers usable • e.g., log10(.0000001) = -7.0 • Because the log is a monotonic transformation, maximizing the likelihood is equivalent to maximizing the log-likelihood; in practice, optimization routines typically minimize the negative of the log-likelihood.
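A sketch of that practice for the Bernoulli sample {1, 1, 0}, minimizing the negative log-likelihood numerically (the use of scipy here is an assumption, not something taken from the slides):

```python
# Minimize the negative log-likelihood of the Bernoulli sample {1, 1, 0} numerically
import numpy as np
from scipy.optimize import minimize_scalar

sample = np.array([1, 1, 0])

def neg_log_likelihood(pi):
    # -sum of log f(x), where log f(x) = x*log(pi) + (1 - x)*log(1 - pi)
    return -np.sum(sample * np.log(pi) + (1 - sample) * np.log(1 - pi))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)  # approximately 0.6667, i.e. 2/3
```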

  15. Goodness-of-fit • In the likelihood-ratio (LLR) test, we compare an alternate model to a null model. The larger the likelihood of the alternate model relative to the null model, the larger -2 LLR will be. • Since the null model is nested within the alternate model, the alternate model’s likelihood can never be lower, so the statistic is always non-negative • But is the increase enough? • -2 LLR is Chi-square distributed with degrees of freedom equal to the difference in the number of parameters between the two models (e.g., #parameters - 1 when the null model contains only the intercept)
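A sketch of the test with hypothetical log-likelihood values (the numbers and the degrees of freedom below are placeholders, not figures from the slides):

```python
# Likelihood-ratio test comparing a null model with a nested alternate model
from scipy.stats import chi2

log_like_null = -120.5   # hypothetical log-likelihood of the restricted (null) model
log_like_alt = -112.3    # hypothetical log-likelihood of the fuller (alternate) model
df = 2                   # difference in the number of free parameters (assumed)

lr_stat = -2 * (log_like_null - log_like_alt)   # the -2 LLR statistic
p_value = chi2.sf(lr_stat, df)                  # upper-tail chi-square probability
print(lr_stat, p_value)
```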

  16. MLE - Definitions • The MLEs of the parameters of a given population are those values which would generate the observed sample most often • Find the likelihood function • Maximize it • Assess goodness-of-fit and carry out inference • Inference is based on the asymptotic normality of maximum likelihood estimators, and thus the test statistics are z statistics
