
Maximum Likelihood (ML) Estimation


Presentation Transcript


  1. Maximum Likelihood (ML) Estimation
  Computational statistics 2009

  2. The basic idea
  • Assume a particular model with unknown parameters.
  • Determine how the likelihood of a given event varies with the model parameters.
  • Choose the parameter values that maximize the likelihood of the observed event.
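
To make the idea concrete, here is a minimal sketch (not from the original slides; the coin-toss data are made up for illustration): the likelihood of a Bernoulli parameter p is evaluated on a grid and the maximizing value is picked out.

```python
import numpy as np

# Hypothetical data: 10 coin tosses, 7 heads.
data = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
k, n = data.sum(), data.size

# Likelihood of the observed sequence as a function of p: L(p) = p^k (1-p)^(n-k).
p_grid = np.linspace(0.01, 0.99, 99)
likelihood = p_grid**k * (1 - p_grid)**(n - k)

# Choose the parameter value that maximizes the likelihood of the observed event.
p_hat = p_grid[np.argmax(likelihood)]
print(p_hat)   # close to k/n = 0.7, the analytical ML estimate
```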

  3. A general mathematical formulation
  Consider a sample (X1, ..., Xn) which is drawn from a probability distribution P(X|θ), where θ are the parameters. If the Xs are independent, each with probability density function P(Xi|θ), then the joint probability of the whole set is

  L(\theta) = \prod_{i=1}^{n} P(X_i \mid \theta)

  Find the parameters θ that maximize this function.
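
A small numerical sketch of this formulation (the exponential model and simulated sample are illustrative assumptions, not from the slides): because the observations are independent, the joint log-likelihood is a sum of per-observation log-densities, which can be maximized over the parameter.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)    # hypothetical i.i.d. sample

# Joint probability of the whole set: log prod_i P(x_i | theta) = sum_i log P(x_i | theta)
def neg_log_lik(scale):
    return -np.sum(expon.logpdf(x, scale=scale))

res = minimize_scalar(neg_log_lik, bounds=(0.01, 20.0), method="bounded")
print(res.x, x.mean())   # for the exponential model the ML estimate equals the sample mean
```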

  4. The likelihood function for the general non-linear model
  Assume that

  y = f(X, \beta) + u, \qquad u \sim N(0, \Sigma)

  Then the likelihood function is

  L(\beta, \Sigma) = (2\pi)^{-n/2} \, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} \, (y - f(X, \beta))' \, \Sigma^{-1} \, (y - f(X, \beta)) \right)

  Note that the ML estimator of β is identical to the mean square (least squares) estimator if Σ = σ²I, where I is the identity matrix.
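
The following sketch checks the closing remark numerically (the exponential-decay mean function, noise level and starting values are assumptions for illustration): with Σ = σ²I, maximizing the Gaussian likelihood over β gives the same estimate as minimizing the sum of squared residuals.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 100)
beta_true = np.array([2.0, 0.7])
y = beta_true[0] * np.exp(-beta_true[1] * x) + 0.05 * rng.standard_normal(x.size)

def f(x, beta):                                  # hypothetical non-linear mean function
    return beta[0] * np.exp(-beta[1] * x)

# Gaussian negative log-likelihood with Sigma = sigma^2 * I; params = (beta1, beta2, log sigma)
def neg_log_lik(params):
    beta, sigma = params[:2], np.exp(params[2])
    resid = y - f(x, beta)
    return 0.5 * y.size * np.log(2 * np.pi * sigma**2) + 0.5 * np.sum(resid**2) / sigma**2

def sum_of_squares(beta):
    return np.sum((y - f(x, beta))**2)

ml = minimize(neg_log_lik, x0=[1.0, 1.0, 0.0])
ls = minimize(sum_of_squares, x0=[1.0, 1.0])
print(ml.x[:2], ls.x)    # the two estimates of beta coincide up to numerical error
```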

  5. Large sample properties of ML estimators
  • Consistency: as the sample size increases, the ML estimator converges to the true parameter value.
  • Invariance: if f(θ) is a function of the unknown parameters of the distribution, then the ML estimator of f(θ) is f(θ̂).
  • Asymptotic normality: as the sample size increases, the sampling distribution of an ML estimator converges to a normal distribution.
  • Variance: for large sample sizes, the variance of an ML estimator (assuming a single unknown parameter) is approximately the negative of the reciprocal of the second derivative of the log-likelihood function evaluated at the ML estimate.
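
A quick numerical check of the variance property (a sketch under assumptions not in the slides: a Bernoulli sample of size 1000 with success probability 0.3): the negative reciprocal of the second derivative of the log-likelihood at the ML estimate matches the familiar p̂(1 − p̂)/n.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.binomial(1, 0.3, size=1000)       # hypothetical Bernoulli sample
k, n = x.sum(), x.size
p_hat = k / n                             # ML estimate of the success probability

def log_lik(p):
    return k * np.log(p) + (n - k) * np.log(1 - p)

# Numerical second derivative of the log-likelihood at the ML estimate
h = 1e-5
d2 = (log_lik(p_hat + h) - 2 * log_lik(p_hat) + log_lik(p_hat - h)) / h**2

print(-1.0 / d2, p_hat * (1 - p_hat) / n)   # the two variance approximations agree
```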

  6. The information matrix (Hessian)
  The matrix

  I(\theta) = -\,\mathrm{E}\!\left[ \frac{\partial^{2} \log L(\theta)}{\partial \theta \, \partial \theta'} \right]

  is a measure of how 'pointy' the likelihood function is. The variance of the ML estimator is given by the inverse of this matrix, i.e. by the inverse Hessian of the negative log-likelihood.
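
A rough sketch of using the inverse Hessian as a variance estimate (the normal-mean model with known σ = 1 is an assumption for illustration): the BFGS optimizer's approximate inverse Hessian of the negative log-likelihood is compared with the theoretical large-sample variance σ²/n.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.0, size=500)    # hypothetical sample, sigma known to be 1

# Negative log-likelihood as a function of the unknown mean mu
def neg_log_lik(mu):
    return -np.sum(norm.logpdf(x, loc=mu, scale=1.0))

res = minimize(neg_log_lik, x0=np.array([0.0]), method="BFGS")

# The inverse Hessian of the negative log-likelihood approximates Var(mu_hat);
# the theoretical large-sample value here is sigma^2 / n = 1/500 = 0.002.
print(res.x, res.hess_inv)
```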

  7. The Cramer-Rao lower bound
  The Cramer-Rao lower bound is the smallest theoretical variance which can be achieved. ML estimators attain this bound asymptotically, so any other estimation technique can at best only equal it. Do we need estimators other than ML estimators?
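
For reference (added here; the slide itself does not display the formula), the bound for a single parameter θ can be written as

```latex
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)},
\qquad
I(\theta) = -\,\mathrm{E}\!\left[ \frac{\partial^{2} \log L(\theta)}{\partial \theta^{2}} \right]
```

so the variance approximation quoted on slide 5 is the estimated version of this bound.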

  8. ML estimators for dynamic models
  A general decomposition technique for the log-likelihood function allows us to extend standard ML procedures to dynamic models (time series models). From the basic definition of conditional probability,

  P(A, B) = P(A \mid B) \, P(B)

  This may be applied directly to the likelihood function.

  9. Prediction error decomposition
  Consider the decomposition

  P(Y_T, Y_{T-1}, \ldots, Y_1) = P(Y_T \mid Y_{T-1}, \ldots, Y_1) \, P(Y_{T-1}, \ldots, Y_1)

  The first term is the conditional probability of Y_T given all past values. We can then condition the second term, and so on, to give

  \log L = \sum_{t=2}^{T} \log P(Y_t \mid Y_{t-1}, \ldots, Y_1) + \log P(Y_1)

  that is, a series of one-step-ahead prediction errors conditional on actual lagged Y.
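
A minimal sketch of the decomposition in action (the Gaussian AR(1) model, the simulated series and the treatment of the first observation as fixed are all assumptions for illustration): the log-likelihood is accumulated from one-step-ahead prediction densities.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
phi_true, sigma = 0.6, 1.0
y = np.zeros(300)
for t in range(1, y.size):                       # simulate a hypothetical AR(1) series
    y[t] = phi_true * y[t - 1] + sigma * rng.standard_normal()

def log_lik(phi):
    # Prediction error decomposition: log L = sum_t log P(y_t | y_{t-1}, ..., y_1),
    # conditioning on the first observation.
    pred = phi * y[:-1]                          # one-step-ahead predictions
    return np.sum(norm.logpdf(y[1:], loc=pred, scale=sigma))

phis = np.linspace(-0.99, 0.99, 199)
phi_hat = phis[np.argmax([log_lik(p) for p in phis])]
print(phi_hat)                                   # close to the true value 0.6
```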

  10. Numerical optimization
  In simple cases (e.g. OLS) we can calculate the maximum likelihood estimates analytically, but in many cases we cannot, and then we resort to numerical optimisation of the likelihood function. This amounts to hill climbing in parameter space:
  1. set an arbitrary initial set of parameters
  2. determine a direction of movement
  3. determine a step length to move
  4. examine some termination criteria and either stop or go back to 2
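
A deliberately crude sketch of this loop for a one-parameter problem (the normal-mean model, the fixed step length and the tolerances are all illustrative assumptions): steepest ascent on the log-likelihood using a numerical gradient.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.0, size=200)     # hypothetical sample, sigma known

def log_lik(mu):
    return np.sum(norm.logpdf(x, loc=mu, scale=1.0))

mu = 0.0                                         # 1. arbitrary initial parameter value
for _ in range(10_000):
    grad = (log_lik(mu + 1e-6) - log_lik(mu - 1e-6)) / 2e-6   # 2. direction of movement
    new_mu = mu + 0.001 * grad                                # 3. fixed step length, move uphill
    if abs(new_mu - mu) < 1e-8:                               # 4. termination criterion
        break
    mu = new_mu

print(mu, x.mean())    # the ML estimate of mu is the sample mean
```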

  11. [Figure slide: plot of the likelihood function L over the parameter space]

  12. Gradient methods for determining the maximum of a function
  These methods base the direction of movement on the first derivatives of the likelihood function with respect to the parameters. Often the step length is also determined by (an approximation to) the second derivatives, so that

  \theta_{k+1} = \theta_k - H_k^{-1} g_k

  where g_k is the gradient vector and H_k the Hessian of the log-likelihood at θ_k. The class of gradient methods includes Newton, quasi-Newton, steepest descent, etc.
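
A minimal sketch of the Newton update for a single parameter (the exponential-rate model and starting value are illustrative assumptions; analytical first and second derivatives are used):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=0.5, size=500)   # hypothetical data; true rate lambda = 2
n, s = x.size, x.sum()

# log L(lam) = n*log(lam) - lam*s for the exponential distribution (rate parametrization)
def gradient(lam):
    return n / lam - s                     # first derivative of the log-likelihood

def hessian(lam):
    return -n / lam**2                     # second derivative of the log-likelihood

lam = 1.0                                  # starting value
for _ in range(25):
    lam = lam - gradient(lam) / hessian(lam)   # Newton step: theta_new = theta - H^{-1} g

print(lam, 1.0 / x.mean())   # the ML estimate of the rate is 1 / sample mean
```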

  13. Qualitative response models
  Assume that we have a quantitative model

  y_i^{*} = x_i' \beta + u_i

  but we only observe certain limited information, e.g.

  y_i = 1 \text{ if } y_i^{*} > 0, \qquad y_i = 0 \text{ otherwise}

  Then we can group the data into two groups and form a likelihood function of the following form

  L(\beta) = \prod_{y_i = 1} \left[ 1 - F(-x_i' \beta) \right] \prod_{y_i = 0} F(-x_i' \beta)

  where F is the cumulative distribution function of the error terms u_i.
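
A sketch of this set-up as a probit model (taking F to be the standard normal CDF; the design matrix, coefficients and sample size are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])   # hypothetical design matrix
beta_true = np.array([0.5, 1.2])
y_star = x @ beta_true + rng.standard_normal(n)          # latent quantitative model
y = (y_star > 0).astype(float)                           # only the sign is observed

# Probit log-likelihood: F is the standard normal CDF of the errors,
# so P(y = 1 | x) = 1 - F(-x'beta) = F(x'beta).
def neg_log_lik(beta):
    p = np.clip(norm.cdf(x @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_lik, x0=np.zeros(2))
print(res.x)   # close to beta_true
```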
