
# Tutorial 6 - PowerPoint PPT Presentation

Tutorial 6. Estimators and their properties: bias and variance of estimators, the score and Fisher information, the Cramer-Rao inequality.


## PowerPoint Slideshow about 'Tutorial 6' - Mercy

Presentation Transcript

• Bias and variance of estimators

• The score and Fisher information

• Cramer-Rao inequality

236607 Visual Recognition Tutorial

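• Let $\mathcal{F} = \{p(x \mid \theta) : \theta \in \Theta\}$ be a parametric set of distributions. Given a sample $X_1, \dots, X_n$ drawn i.i.d. from one of the distributions in the set, we would like to estimate its parameter $\theta$ (thus identifying the distribution).

• An estimator for $\theta$ w.r.t. $\mathcal{F}$ is any function $\hat\theta = \hat\theta(X_1, \dots, X_n)$ of the sample taking values in $\Theta$. Notice that an estimator is a random variable.

• How do we measure the quality of an estimator?

• Consistency: An estimator $\hat\theta_n$ for $\theta$ (based on $n$ samples) is consistent if $\hat\theta_n \to \theta$ in probability as $n \to \infty$.

This is a (desirable) asymptotic property that motivates us to acquire large samples. But we should emphasize that we are also interested in measures for finite (and small!) sample sizes.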


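• Bias: Define the bias of an estimator $\hat\theta$ to be $b(\hat\theta) = E_\theta[\hat\theta] - \theta$. Here, the expectation is w.r.t. the distribution $p(x_1, \dots, x_n \mid \theta)$.

The estimator is unbiased if its bias is zero, that is, $E_\theta[\hat\theta] = \theta$ for all $\theta$.

• Example: the estimators $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ and $X_1$, for the mean of a normal distribution, are both unbiased. The estimator $\frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^2$ for its variance is biased, whereas the estimator $\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)^2$ is unbiased.

• Variance: another important property of an estimator is its variance $\mathrm{Var}_\theta(\hat\theta) = E_\theta\big[(\hat\theta - E_\theta[\hat\theta])^2\big]$. We would like to find estimators with minimum bias and variance.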

• Which is more important, bias or variance?
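The bias of a variance estimator can be checked numerically. The following is a minimal sketch, assuming normal samples with hypothetical settings (`n=5`, `sigma=2.0`); the function name `average_variance_estimates` is illustrative and not part of the tutorial. It averages the $1/n$ and $1/(n-1)$ variance estimators over many simulated samples. The first average should settle near $\frac{n-1}{n}\sigma^2$ and the second near $\sigma^2$.

```python
import random


def average_variance_estimates(n=5, trials=20000, mu=0.0, sigma=2.0, seed=0):
    """Average the 1/n and 1/(n-1) variance estimators over many samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_biased = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        sum_sq = sum((x - mean) ** 2 for x in sample)
        total_biased += sum_sq / n          # divides by n (biased)
        total_unbiased += sum_sq / (n - 1)  # divides by n - 1 (unbiased)
    return total_biased / trials, total_unbiased / trials


if __name__ == "__main__":
    biased, unbiased = average_variance_estimates()
    print("true variance:", 2.0 ** 2)
    print("average of 1/n estimator:", biased)
    print("average of 1/(n-1) estimator:", unbiased)
```

Both estimators have nonzero variance, so any single sample can land far from either average; the simulation only illustrates the expectation.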


• Employ our decision-theoretic framework to measure the quality of estimators.

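• Abbreviate $\hat\theta = \hat\theta(X_1, \dots, X_n)$ and consider the square error loss function $\lambda(\hat\theta, \theta) = (\hat\theta - \theta)^2$.

• The conditional risk associated with $\hat\theta$ when $\theta$ is the true parameter is $R(\hat\theta \mid \theta) = E_\theta\big[(\hat\theta - \theta)^2\big]$.

• Claim: $R(\hat\theta \mid \theta) = \mathrm{Var}_\theta(\hat\theta) + b(\hat\theta)^2$, where $b(\hat\theta) = E_\theta[\hat\theta] - \theta$ is the bias.

• Proof: Write $\hat\theta - \theta = (\hat\theta - E_\theta[\hat\theta]) + b(\hat\theta)$ and expand the square:

$$E_\theta\big[(\hat\theta - \theta)^2\big] = E_\theta\big[(\hat\theta - E_\theta[\hat\theta])^2\big] + 2\, b(\hat\theta)\, E_\theta\big[\hat\theta - E_\theta[\hat\theta]\big] + b(\hat\theta)^2 = \mathrm{Var}_\theta(\hat\theta) + b(\hat\theta)^2,$$

since the middle term vanishes.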
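The decomposition of the squared-error risk into variance plus squared bias can be illustrated on simulated estimates. This is a sketch under stated assumptions: the estimator is a deliberately biased "shrunk" sample mean $c\,\bar X$ with a hypothetical factor `c=0.8`, and the names `risk_decomposition` and `simulate_shrunk_mean` are invented for this illustration. With empirical moments the identity holds up to rounding error.

```python
import random


def risk_decomposition(estimates, theta):
    """Return (risk, variance, bias) of estimator values around the true theta."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    bias = mean_est - theta
    variance = sum((t - mean_est) ** 2 for t in estimates) / m
    risk = sum((t - theta) ** 2 for t in estimates) / m  # empirical squared-error risk
    return risk, variance, bias


def simulate_shrunk_mean(c=0.8, theta=1.0, sigma=1.0, n=10, trials=5000, seed=1):
    """Draw `trials` samples of size n from N(theta, sigma^2); return c * sample mean."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        estimates.append(c * sum(sample) / n)
    return estimates


if __name__ == "__main__":
    risk, variance, bias = risk_decomposition(simulate_shrunk_mean(), theta=1.0)
    print("risk:", risk)
    print("variance + bias^2:", variance + bias ** 2)
```

Shrinking trades a bias of roughly $(c-1)\theta$ for a variance reduced by the factor $c^2$, which is the tradeoff at stake here.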


• So, for a given level of conditional risk, there is a tradeoff between bias and variance.

• This tradeoff is among the most important facts in pattern recognition and machine learning.

• Classical approach: Consider only unbiased estimators and try to find those with minimum possible variance.

• This approach is not always fruitful:

• Unbiasedness only means that the average of the estimator (w.r.t. $p(x \mid \theta)$) is $\theta$. It doesn’t mean that the estimate will be near $\theta$ for a particular sample (if the variance is large).

• In general, an unbiased estimator is not guaranteed to exist.


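• The score $V$ of the family $p(x \mid \theta)$ is the random variable

$$V = V(X, \theta) = \frac{\partial}{\partial\theta} \ln p(X \mid \theta) = \frac{\partial p(X \mid \theta) / \partial\theta}{p(X \mid \theta)}.$$

It measures the “sensitivity” of $p(x \mid \theta)$ as a function of the parameter $\theta$.

• Claim: $E_\theta[V] = 0$.

• Proof: Assuming differentiation and integration can be interchanged,

$$E_\theta[V] = \int \frac{\partial p(x \mid \theta) / \partial\theta}{p(x \mid \theta)}\, p(x \mid \theta)\, dx = \frac{\partial}{\partial\theta} \int p(x \mid \theta)\, dx = \frac{\partial}{\partial\theta} 1 = 0.$$

• Corollary: $\mathrm{Var}_\theta(V) = E_\theta[V^2]$.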


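• Consider the normal distribution $N(\mu, \sigma^2)$ with known $\sigma^2$, so that $\theta = \mu$ and $p(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / 2\sigma^2}$. The score is $V = \frac{\partial}{\partial\mu} \ln p(X \mid \mu) = \frac{X - \mu}{\sigma^2}$.

• Clearly, $E[V] = 0$,

• and $\mathrm{Var}(V) = E[V^2] = \frac{E[(X - \mu)^2]}{\sigma^4} = \frac{1}{\sigma^2}$.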
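The two score moments for a normal family can be checked by simulation. The sketch below assumes the family $N(\mu, \sigma^2)$ with known $\sigma$ and score $(x - \mu)/\sigma^2$ with respect to $\mu$; the helper names and the settings `mu=1.0`, `sigma=2.0` are hypothetical. The empirical mean of the score should be close to $0$ and its second moment close to $1/\sigma^2$.

```python
import random


def normal_score(x, mu, sigma):
    """Score of N(mu, sigma^2) w.r.t. mu: d/dmu ln p(x | mu) = (x - mu) / sigma^2."""
    return (x - mu) / sigma ** 2


def score_moments(mu=1.0, sigma=2.0, draws=100000, seed=2):
    """Empirical mean and second moment of the score under the true parameter."""
    rng = random.Random(seed)
    scores = [normal_score(rng.gauss(mu, sigma), mu, sigma) for _ in range(draws)]
    mean = sum(scores) / draws
    second_moment = sum(v * v for v in scores) / draws
    return mean, second_moment


if __name__ == "__main__":
    mean, second_moment = score_moments()
    print("E[V] estimate:", mean)
    print("E[V^2] estimate:", second_moment, "vs 1/sigma^2 =", 1 / 2.0 ** 2)
```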


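• In the case where $\theta = (\theta_1, \dots, \theta_k)$ is a vector, the score is the vector $V = (V_1, \dots, V_k)$ whose $i$th component is $V_i = \frac{\partial}{\partial\theta_i} \ln p(X \mid \theta)$.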

• Example:
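As one possible illustration of a vector score (an assumed family, not necessarily the one used on the slide), take $N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma)$. Differentiating $\ln p(x \mid \mu, \sigma)$ gives the components $(x - \mu)/\sigma^2$ and $-1/\sigma + (x - \mu)^2/\sigma^3$. The sketch below compares these closed forms against central finite differences of the log-density; all function names are hypothetical.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Natural log of the N(mu, sigma^2) density at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def score_vector(x, mu, sigma):
    """Analytic score for theta = (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma ** 3
    return d_mu, d_sigma


def numeric_score_vector(x, mu, sigma, h=1e-6):
    """Central finite-difference approximation of the same two partial derivatives."""
    d_mu = (log_normal_pdf(x, mu + h, sigma) - log_normal_pdf(x, mu - h, sigma)) / (2 * h)
    d_sigma = (log_normal_pdf(x, mu, sigma + h) - log_normal_pdf(x, mu, sigma - h)) / (2 * h)
    return d_mu, d_sigma


if __name__ == "__main__":
    print(score_vector(0.3, 1.0, 2.0))
    print(numeric_score_vector(0.3, 1.0, 2.0))
```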


• Fisher information: Designed to provide a measure of how much information the parametric probability law $p(x \mid \theta)$ carries about the parameter $\theta$.

• An adequate definition of such information should possess the following properties:

• The larger the sensitivity of $p(x \mid \theta)$ to changes in $\theta$, the larger the information should be.

• The information should be additive: the information carried by the combined law $p(x_1, x_2 \mid \theta)$ of two independent observations should be the sum of those carried by $p(x_1 \mid \theta)$ and $p(x_2 \mid \theta)$.

• The information should be insensitive to the sign of the change in $\theta$, and preferably positive.

• The information should be a deterministic quantity; it should not depend on the specific random observation.


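• Definition (scalar form): The Fisher information (about $\theta$) is the variance of the score:

$$J(\theta) = \mathrm{Var}_\theta(V) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \ln p(X \mid \theta)\right)^2\right].$$

• Example: consider a random variable $X \sim N(\mu, \sigma^2)$ with known $\sigma^2$. The score w.r.t. $\mu$ is $(X - \mu)/\sigma^2$, so $J(\mu) = E[(X - \mu)^2]/\sigma^4 = 1/\sigma^2$.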
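For a discrete family the variance of the score can be computed exactly by summing over outcomes. The sketch below uses a Bernoulli($p$) variable, which is an assumed example rather than the one on the slide; its score is $x/p - (1 - x)/(1 - p)$ and the resulting Fisher information is $1/(p(1 - p))$. The function name is hypothetical.

```python
def bernoulli_fisher_information(p):
    """Exact Var(score) for X ~ Bernoulli(p), assuming 0 < p < 1."""
    # Score: d/dp ln[p^x (1-p)^(1-x)] = x/p - (1-x)/(1-p).
    outcomes = [(1, p), (0, 1.0 - p)]  # (value, probability)
    scores = [(x / p - (1 - x) / (1.0 - p), prob) for x, prob in outcomes]
    mean = sum(v * prob for v, prob in scores)  # equals 0 up to rounding
    return sum((v - mean) ** 2 * prob for v, prob in scores)


if __name__ == "__main__":
    p = 0.3
    print(bernoulli_fisher_information(p), "vs", 1 / (p * (1 - p)))
```

The information grows without bound as $p$ approaches 0 or 1, where a single observation is most sensitive to the parameter.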


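• Whenever $\theta = (\theta_1, \dots, \theta_k)$ is a vector, the Fisher information is the matrix $J(\theta) = [J_{ij}(\theta)]$ where $J_{ij}(\theta) = \mathrm{Cov}_\theta(V_i, V_j)$ and $V_i = \frac{\partial}{\partial\theta_i} \ln p(X \mid \theta)$ are the components of the score.

• Reminder: $\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$.

• Remark: the Fisher information is only defined when the distributions satisfy some regularity conditions. (For example, they should be differentiable w.r.t. $\theta$, and all the distributions in the parametric family must have the same support set.)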


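• Claim: Let $X_1, \dots, X_n$ be i.i.d. random variables, $X_i \sim p(x \mid \theta)$. The score of $p(x_1, \dots, x_n \mid \theta)$ is the sum of the individual scores.

• Proof:

$$V(X_1, \dots, X_n, \theta) = \frac{\partial}{\partial\theta} \ln \prod_{i=1}^n p(X_i \mid \theta) = \sum_{i=1}^n \frac{\partial}{\partial\theta} \ln p(X_i \mid \theta) = \sum_{i=1}^n V(X_i, \theta).$$

• Example: If $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, the score (w.r.t. $\mu$) is $\sum_{i=1}^n \frac{X_i - \mu}{\sigma^2}$.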


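• Based on $n$ i.i.d. samples, the Fisher information about $\theta$ is

$$J_n(\theta) = \mathrm{Var}_\theta\left(\sum_{i=1}^n V(X_i, \theta)\right) = \sum_{i=1}^n \mathrm{Var}_\theta\big(V(X_i, \theta)\big) = n\, J(\theta),$$

where the variance of the sum splits because the samples are independent.

• Thus, the Fisher information is additive w.r.t. i.i.d. random variables.

• Example: Suppose $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$. From the previous example we know that the Fisher information about the parameter $\mu$ based on one sample is $J(\mu) = 1/\sigma^2$. Therefore, based on the entire sample, $J_n(\mu) = n/\sigma^2$.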
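Additivity can also be verified exactly on a small discrete case. The sketch below assumes $n$ i.i.d. Bernoulli($p$) observations (an illustrative family, chosen because all $2^n$ outcomes can be enumerated) and computes the variance of the joint score, which should equal $n$ times the single-sample information $1/(p(1-p))$. The function names are hypothetical.

```python
from itertools import product


def single_score(x, p):
    """Score of one Bernoulli(p) observation w.r.t. p."""
    return x / p - (1 - x) / (1.0 - p)


def fisher_information_n(p, n):
    """Exact variance of the joint score over all 2**n outcomes of n i.i.d. Bernoulli(p)."""
    mean = 0.0
    second_moment = 0.0
    for xs in product((0, 1), repeat=n):
        prob = 1.0
        for x in xs:
            prob *= p if x == 1 else 1.0 - p  # independence: probabilities multiply
        total_score = sum(single_score(x, p) for x in xs)  # joint score = sum of scores
        mean += prob * total_score
        second_moment += prob * total_score ** 2
    return second_moment - mean ** 2


if __name__ == "__main__":
    p, n = 0.3, 4
    print(fisher_information_n(p, n), "vs", n / (p * (1 - p)))
```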


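• Theorem: Let $\hat\theta$ be an unbiased estimator for $\theta$. Then

$$\mathrm{Var}_\theta(\hat\theta) \ge \frac{1}{J(\theta)},$$

where $J(\theta)$ is the Fisher information of the sample on which $\hat\theta$ is based.

• Proof: Let $V = \frac{\partial}{\partial\theta} \ln p(X \mid \theta)$ be the score of the sample $X$. Using $E_\theta[V] = 0$ we have:

$$\mathrm{Cov}_\theta(V, \hat\theta) = E_\theta\big[(V - E_\theta[V])(\hat\theta - E_\theta[\hat\theta])\big] = E_\theta[V \hat\theta] - E_\theta[V]\, E_\theta[\hat\theta] = E_\theta[V \hat\theta].$$

• Now

$$E_\theta[V \hat\theta] = \int \frac{\partial p(x \mid \theta) / \partial\theta}{p(x \mid \theta)}\, \hat\theta(x)\, p(x \mid \theta)\, dx = \frac{\partial}{\partial\theta} \int \hat\theta(x)\, p(x \mid \theta)\, dx = \frac{\partial}{\partial\theta} E_\theta[\hat\theta] = \frac{\partial \theta}{\partial\theta} = 1.$$

• So, $\mathrm{Cov}_\theta(V, \hat\theta) = 1$.

• By the Cauchy-Schwarz inequality,

$$1 = \mathrm{Cov}_\theta(V, \hat\theta)^2 \le \mathrm{Var}_\theta(V)\, \mathrm{Var}_\theta(\hat\theta) = J(\theta)\, \mathrm{Var}_\theta(\hat\theta).$$

• Therefore, $\mathrm{Var}_\theta(\hat\theta) \ge 1 / J(\theta)$.

• For a biased estimator with bias $b(\theta)$, the same argument gives $E_\theta[V \hat\theta] = 1 + b'(\theta)$, and we have:

$$\mathrm{Var}_\theta(\hat\theta) \ge \frac{\big(1 + b'(\theta)\big)^2}{J(\theta)}.$$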


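• The Cramer-Rao inequality is also true in a general (vector) form: the error covariance matrix for an unbiased estimator $\hat\theta$ is bounded as follows:

$$E_\theta\big[(\hat\theta - \theta)(\hat\theta - \theta)^T\big] \succeq J(\theta)^{-1},$$

meaning that the difference of the two matrices is positive semidefinite.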


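• Example: Let $X_1, \dots, X_n$ be i.i.d. $N(\mu, \sigma^2)$ with known $\sigma^2$. From the previous example, $J_n(\mu) = n/\sigma^2$.

• Now let $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$ be an (unbiased) estimator for $\mu$. Its variance is $\mathrm{Var}(\bar X) = \sigma^2/n = 1/J_n(\mu)$.

• So $\bar X$ matches the Cramer-Rao lower bound.

• Def: An unbiased estimator whose covariance meets the Cramer-Rao lower bound is called efficient.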
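A simulation can show an estimator sitting on the Cramer-Rao bound. This sketch assumes i.i.d. $N(\mu, \sigma^2)$ samples with known $\sigma$, for which the bound on an unbiased estimator of $\mu$ is $\sigma^2/n$; it estimates the variance of the sample mean over many repetitions. The function names and the settings `sigma=2.0`, `n=8` are hypothetical.

```python
import random


def sample_mean_variance(mu=0.0, sigma=2.0, n=8, trials=20000, seed=3):
    """Empirical variance of the sample mean over repeated samples of size n."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        means.append(sum(rng.gauss(mu, sigma) for _ in range(n)) / n)
    grand_mean = sum(means) / trials
    return sum((m - grand_mean) ** 2 for m in means) / (trials - 1)


def cramer_rao_bound(sigma, n):
    """1 / J_n(mu) for n i.i.d. N(mu, sigma^2) samples with known sigma."""
    return sigma ** 2 / n


if __name__ == "__main__":
    print("empirical Var(sample mean):", sample_mean_variance())
    print("Cramer-Rao bound:", cramer_rao_bound(2.0, 8))
```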


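• Theorem (Efficiency): The unbiased estimator $\hat\theta$ is efficient, that is,

$$\mathrm{Var}_\theta(\hat\theta) = \frac{1}{J(\theta)},$$

iff

$$V = J(\theta)\,(\hat\theta - \theta),$$

where $V = \frac{\partial}{\partial\theta} \ln p(X \mid \theta)$ is the score of the sample and $J(\theta) = \mathrm{Var}_\theta(V)$.

• Proof (If): If $V = J(\theta)(\hat\theta - \theta)$ then

$$J(\theta) = \mathrm{Var}_\theta(V) = J(\theta)^2\, \mathrm{Var}_\theta(\hat\theta),$$

meaning $\mathrm{Var}_\theta(\hat\theta) = 1 / J(\theta)$.

• Only if: Recall the cross covariance between $V$ and $\hat\theta$: for an unbiased $\hat\theta$, $\mathrm{Cov}_\theta(V, \hat\theta) = E_\theta[V \hat\theta] = 1$.

The Cauchy-Schwarz inequality for random variables says

$$\mathrm{Cov}_\theta(V, \hat\theta)^2 \le \mathrm{Var}_\theta(V)\, \mathrm{Var}_\theta(\hat\theta),$$

with equality iff $V - E_\theta[V] = c(\theta)\,(\hat\theta - E_\theta[\hat\theta])$ with probability 1, for some $c(\theta)$ that does not depend on the sample. If $\hat\theta$ is efficient, equality holds, thus $V = c(\theta)(\hat\theta - \theta)$. Then $1 = \mathrm{Cov}_\theta(V, \hat\theta) = c(\theta)\, \mathrm{Var}_\theta(\hat\theta) = c(\theta) / J(\theta)$, so $c(\theta) = J(\theta)$.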
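The efficiency condition "score equals Fisher information times $(\hat\theta - \theta)$" can be checked directly for the sample mean of a normal family. The sketch assumes i.i.d. $N(\mu, \sigma^2)$ data with known $\sigma$, where the joint score is $\sum_i (x_i - \mu)/\sigma^2$ and $J_n(\mu) = n/\sigma^2$; the function names and the sample values are hypothetical. For this family the identity holds exactly for every sample, not just on average.

```python
def joint_score(sample, mu, sigma):
    """Score of an i.i.d. N(mu, sigma^2) sample w.r.t. mu (sum of individual scores)."""
    return sum((x - mu) / sigma ** 2 for x in sample)


def efficiency_rhs(sample, mu, sigma):
    """J_n(mu) * (sample mean - mu), the right-hand side of the efficiency condition."""
    n = len(sample)
    fisher_n = n / sigma ** 2
    sample_mean = sum(sample) / n
    return fisher_n * (sample_mean - mu)


if __name__ == "__main__":
    data = [0.5, 1.7, -0.2, 2.4]  # arbitrary illustrative values
    print(joint_score(data, mu=0.9, sigma=1.5))
    print(efficiency_rhs(data, mu=0.9, sigma=1.5))
```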


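• Theorem: Suppose there exists an efficient estimator $\hat\theta$ for all $\theta$. Then the ML estimator is $\hat\theta$.

• Proof: By assumption, $\mathrm{Var}_\theta(\hat\theta) = 1 / J(\theta)$ for all $\theta$.

By the previous claim, $V = J(\theta)(\hat\theta - \theta)$, or

$$\frac{\partial}{\partial\theta} \ln p(x \mid \theta) = J(\theta)\,\big(\hat\theta(x) - \theta\big)$$

for all $\theta$.

This holds at $\theta = \hat\theta_{ML}$, and since this is a maximum point of the likelihood the left side is zero, so $J(\hat\theta_{ML})\,(\hat\theta - \hat\theta_{ML}) = 0$, that is, $\hat\theta_{ML} = \hat\theta$ (as $J > 0$).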
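As a numerical illustration of this theorem, the sketch below assumes i.i.d. $N(\mu, \sigma^2)$ data with known $\sigma$, where the sample mean is an efficient estimator of $\mu$. It locates the maximum of the log-likelihood by a plain grid search (a deliberately simple stand-in for a real optimizer) and compares the result with the sample mean. The function names, the grid range, and the sample values are hypothetical.

```python
def log_likelihood(sample, mu, sigma=1.0):
    """Log-likelihood of mu for N(mu, sigma^2) data, up to an additive constant."""
    return -sum((x - mu) ** 2 for x in sample) / (2 * sigma ** 2)


def grid_mle(sample, lo, hi, steps=10001):
    """Return the grid point in [lo, hi] that maximizes the log-likelihood."""
    best_mu, best_ll = lo, log_likelihood(sample, lo)
    for k in range(1, steps):
        mu = lo + (hi - lo) * k / (steps - 1)
        ll = log_likelihood(sample, mu)
        if ll > best_ll:
            best_mu, best_ll = mu, ll
    return best_mu


if __name__ == "__main__":
    data = [0.5, 1.7, -0.2, 2.4]  # arbitrary illustrative values
    print("grid ML estimate:", grid_mle(data, -5.0, 5.0))
    print("sample mean:", sum(data) / len(data))
```

The two values agree up to the grid spacing, as expected when the efficient estimator and the ML estimator coincide.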
