
Maximum Likelihood (ML) Estimation


Presentation Transcript


  1. Maximum Likelihood (ML) Estimation
  Computational statistics 2009

  2. The basic idea
  • Assume a particular model with unknown parameters.
  • Determine how the likelihood of a given event varies with the model parameters.
  • Choose the parameter values that maximize the likelihood of the observed event.
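
To make the idea concrete, here is a minimal sketch (not from the original slides; the coin-toss data are made up for illustration): the likelihood of a Bernoulli parameter p is evaluated on a grid and the maximizing value is picked out.

```python
import numpy as np

# Hypothetical data: 10 coin tosses, 7 heads.
data = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
k, n = data.sum(), data.size

# Likelihood of the observed sequence as a function of p: L(p) = p^k (1-p)^(n-k).
p_grid = np.linspace(0.01, 0.99, 99)
likelihood = p_grid**k * (1 - p_grid)**(n - k)

# Choose the parameter value that maximizes the likelihood of the observed event.
p_hat = p_grid[np.argmax(likelihood)]
print(p_hat)   # close to k/n = 0.7, the analytical ML estimate
```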

  3. A general mathematical formulation
  Consider a sample (X1, ..., Xn) which is drawn from a probability distribution P(X|θ), where θ are the parameters. If the Xs are independent, each with probability density function P(Xi|θ), then the joint probability of the whole set is

  L(\theta) = \prod_{i=1}^{n} P(X_i \mid \theta)

  Find the parameters θ that maximize this function.
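
A small numerical sketch of this formulation (the exponential model and simulated sample are illustrative assumptions, not from the slides): because the observations are independent, the joint log-likelihood is a sum of per-observation log-densities, which can be maximized over the parameter.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import expon

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)    # hypothetical i.i.d. sample

# Joint probability of the whole set: log prod_i P(x_i | theta) = sum_i log P(x_i | theta)
def neg_log_lik(scale):
    return -np.sum(expon.logpdf(x, scale=scale))

res = minimize_scalar(neg_log_lik, bounds=(0.01, 20.0), method="bounded")
print(res.x, x.mean())   # for the exponential model the ML estimate equals the sample mean
```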

  4. The likelihood function for the general non-linear model
  Assume that

  y = f(X, \beta) + u, \qquad u \sim N(0, \Sigma)

  Then the likelihood function is

  L(\beta, \Sigma) = (2\pi)^{-n/2} \, |\Sigma|^{-1/2} \exp\!\left( -\tfrac{1}{2} \, (y - f(X, \beta))' \, \Sigma^{-1} \, (y - f(X, \beta)) \right)

  Note that the ML estimator of β is identical to the mean square (least squares) estimator if Σ = σ²I, where I is the identity matrix.
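
The following sketch checks the closing remark numerically (the exponential-decay mean function, noise level and starting values are assumptions for illustration): with Σ = σ²I, maximizing the Gaussian likelihood over β gives the same estimate as minimizing the sum of squared residuals.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 100)
beta_true = np.array([2.0, 0.7])
y = beta_true[0] * np.exp(-beta_true[1] * x) + 0.05 * rng.standard_normal(x.size)

def f(x, beta):                                  # hypothetical non-linear mean function
    return beta[0] * np.exp(-beta[1] * x)

# Gaussian negative log-likelihood with Sigma = sigma^2 * I; params = (beta1, beta2, log sigma)
def neg_log_lik(params):
    beta, sigma = params[:2], np.exp(params[2])
    resid = y - f(x, beta)
    return 0.5 * y.size * np.log(2 * np.pi * sigma**2) + 0.5 * np.sum(resid**2) / sigma**2

def sum_of_squares(beta):
    return np.sum((y - f(x, beta))**2)

ml = minimize(neg_log_lik, x0=[1.0, 1.0, 0.0])
ls = minimize(sum_of_squares, x0=[1.0, 1.0])
print(ml.x[:2], ls.x)    # the two estimates of beta coincide up to numerical error
```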

  5. Large sample properties of ML estimators
  • Consistency: as the sample size increases, the ML estimator converges to the true parameter value.
  • Invariance: if f(θ) is a function of the unknown parameters of the distribution, then the ML estimator of f(θ) is f(θ̂).
  • Asymptotic normality: as the sample size increases, the sampling distribution of an ML estimator converges to a normal distribution.
  • Variance: for large sample sizes, the variance of an ML estimator (assuming a single unknown parameter) is approximately the negative of the reciprocal of the second derivative of the log-likelihood function evaluated at the ML estimate.
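
A quick numerical check of the variance property (a sketch under assumptions not in the slides: a Bernoulli sample of size 1000 with success probability 0.3): the negative reciprocal of the second derivative of the log-likelihood at the ML estimate matches the familiar p̂(1 − p̂)/n.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.binomial(1, 0.3, size=1000)       # hypothetical Bernoulli sample
k, n = x.sum(), x.size
p_hat = k / n                             # ML estimate of the success probability

def log_lik(p):
    return k * np.log(p) + (n - k) * np.log(1 - p)

# Numerical second derivative of the log-likelihood at the ML estimate
h = 1e-5
d2 = (log_lik(p_hat + h) - 2 * log_lik(p_hat) + log_lik(p_hat - h)) / h**2

print(-1.0 / d2, p_hat * (1 - p_hat) / n)   # the two variance approximations agree
```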

  6. The information matrix (Hessian)
  The matrix

  I(\theta) = -\,\mathrm{E}\!\left[ \frac{\partial^{2} \log L(\theta)}{\partial \theta \, \partial \theta'} \right]

  is a measure of how 'pointy' the likelihood function is. The variance of the ML estimator is given by the inverse of this matrix, i.e. by the inverse Hessian of the negative log-likelihood.
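
A rough sketch of using the inverse Hessian as a variance estimate (the normal-mean model with known σ = 1 is an assumption for illustration): the BFGS optimizer's approximate inverse Hessian of the negative log-likelihood is compared with the theoretical large-sample variance σ²/n.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.0, size=500)    # hypothetical sample, sigma known to be 1

# Negative log-likelihood as a function of the unknown mean mu
def neg_log_lik(mu):
    return -np.sum(norm.logpdf(x, loc=mu, scale=1.0))

res = minimize(neg_log_lik, x0=np.array([0.0]), method="BFGS")

# The inverse Hessian of the negative log-likelihood approximates Var(mu_hat);
# the theoretical large-sample value here is sigma^2 / n = 1/500 = 0.002.
print(res.x, res.hess_inv)
```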

  7. The Cramer-Rao lower bound
  The Cramer-Rao lower bound is the smallest theoretical variance which can be achieved. ML estimators attain this bound asymptotically, so any other estimation technique can at best only equal it. Do we need estimators other than ML estimators?
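
For reference (added here; the slide itself does not display the formula), the bound for a single parameter θ can be written as

```latex
\operatorname{Var}(\hat{\theta}) \;\ge\; \frac{1}{I(\theta)},
\qquad
I(\theta) = -\,\mathrm{E}\!\left[ \frac{\partial^{2} \log L(\theta)}{\partial \theta^{2}} \right]
```

so the variance approximation quoted on slide 5 is the estimated version of this bound.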

  8. ML estimators for dynamic models
  A general decomposition technique for the log-likelihood function allows us to extend standard ML procedures to dynamic models (time series models). From the basic definition of conditional probability,

  P(A, B) = P(A \mid B) \, P(B)

  This may be applied directly to the likelihood function.

  9. Prediction error decomposition
  Consider the decomposition

  P(Y_T, Y_{T-1}, \ldots, Y_1) = P(Y_T \mid Y_{T-1}, \ldots, Y_1) \, P(Y_{T-1}, \ldots, Y_1)

  The first term is the conditional probability of Y_T given all past values. We can then condition the second term, and so on, to give

  \log L = \sum_{t=2}^{T} \log P(Y_t \mid Y_{t-1}, \ldots, Y_1) + \log P(Y_1)

  that is, a series of one-step-ahead prediction errors conditional on actual lagged Y.
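
A minimal sketch of the decomposition in action (the Gaussian AR(1) model, the simulated series and the treatment of the first observation as fixed are all assumptions for illustration): the log-likelihood is accumulated from one-step-ahead prediction densities.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
phi_true, sigma = 0.6, 1.0
y = np.zeros(300)
for t in range(1, y.size):                       # simulate a hypothetical AR(1) series
    y[t] = phi_true * y[t - 1] + sigma * rng.standard_normal()

def log_lik(phi):
    # Prediction error decomposition: log L = sum_t log P(y_t | y_{t-1}, ..., y_1),
    # conditioning on the first observation.
    pred = phi * y[:-1]                          # one-step-ahead predictions
    return np.sum(norm.logpdf(y[1:], loc=pred, scale=sigma))

phis = np.linspace(-0.99, 0.99, 199)
phi_hat = phis[np.argmax([log_lik(p) for p in phis])]
print(phi_hat)                                   # close to the true value 0.6
```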

  10. Numerical optimization
  In simple cases (e.g. OLS) we can calculate the maximum likelihood estimates analytically, but in many cases we cannot, and then we resort to numerical optimisation of the likelihood function. This amounts to hill climbing in parameter space:
  1. set an arbitrary initial set of parameters
  2. determine a direction of movement
  3. determine a step length to move
  4. examine some termination criteria and either stop or go back to 2
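
A deliberately crude sketch of this loop for a one-parameter problem (the normal-mean model, the fixed step length and the tolerances are all illustrative assumptions): steepest ascent on the log-likelihood using a numerical gradient.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.0, size=200)     # hypothetical sample, sigma known

def log_lik(mu):
    return np.sum(norm.logpdf(x, loc=mu, scale=1.0))

mu = 0.0                                         # 1. arbitrary initial parameter value
for _ in range(10_000):
    grad = (log_lik(mu + 1e-6) - log_lik(mu - 1e-6)) / 2e-6   # 2. direction of movement
    new_mu = mu + 0.001 * grad                                # 3. fixed step length, move uphill
    if abs(new_mu - mu) < 1e-8:                               # 4. termination criterion
        break
    mu = new_mu

print(mu, x.mean())    # the ML estimate of mu is the sample mean
```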

  11. [Figure slide: plot of the likelihood function L over the parameter space]

  12. Gradient methods for determining the maximum of a function
  These methods base the direction of movement on the first derivatives of the likelihood function with respect to the parameters. Often the step length is also determined by (an approximation to) the second derivatives, so that

  \theta_{k+1} = \theta_k - H_k^{-1} g_k

  where g_k is the gradient vector and H_k the Hessian of the log-likelihood at θ_k. The class of gradient methods includes Newton, quasi-Newton, steepest descent, etc.
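
A minimal sketch of the Newton update for a single parameter (the exponential-rate model and starting value are illustrative assumptions; analytical first and second derivatives are used):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=0.5, size=500)   # hypothetical data; true rate lambda = 2
n, s = x.size, x.sum()

# log L(lam) = n*log(lam) - lam*s for the exponential distribution (rate parametrization)
def gradient(lam):
    return n / lam - s                     # first derivative of the log-likelihood

def hessian(lam):
    return -n / lam**2                     # second derivative of the log-likelihood

lam = 1.0                                  # starting value
for _ in range(25):
    lam = lam - gradient(lam) / hessian(lam)   # Newton step: theta_new = theta - H^{-1} g

print(lam, 1.0 / x.mean())   # the ML estimate of the rate is 1 / sample mean
```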

  13. Qualitative response models
  Assume that we have a quantitative model

  y_i^{*} = x_i' \beta + u_i

  but we only observe certain limited information, e.g.

  y_i = 1 \text{ if } y_i^{*} > 0, \qquad y_i = 0 \text{ otherwise}

  Then we can group the data into two groups and form a likelihood function of the following form

  L(\beta) = \prod_{y_i = 1} \left[ 1 - F(-x_i' \beta) \right] \prod_{y_i = 0} F(-x_i' \beta)

  where F is the cumulative distribution function of the error terms u_i.
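
A sketch of this set-up as a probit model (taking F to be the standard normal CDF; the design matrix, coefficients and sample size are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])   # hypothetical design matrix
beta_true = np.array([0.5, 1.2])
y_star = x @ beta_true + rng.standard_normal(n)          # latent quantitative model
y = (y_star > 0).astype(float)                           # only the sign is observed

# Probit log-likelihood: F is the standard normal CDF of the errors,
# so P(y = 1 | x) = 1 - F(-x'beta) = F(x'beta).
def neg_log_lik(beta):
    p = np.clip(norm.cdf(x @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_lik, x0=np.zeros(2))
print(res.x)   # close to beta_true
```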
