Chap. 8: Estimation of Parameters and Fitting of Probability Distributions
Section 8.1: INTRODUCTION
Section 8.2: Fitting the Poisson Distribution to Emissions of Alpha Particles (a classical example)
Section 8.1: INTRODUCTION
Unknown parameter values must be estimated before fitting probability laws to data.
Recall: the probability mass function of a Poisson random variable X is given by:
P(X = k) = (λ^k e^(−λ)) / k!,  k = 0, 1, 2, …
From the observed data, we must estimate a value for the parameter λ.
The estimate of λ will be viewed as a random variable which has a probability dist’n referred to as its sampling distribution.
The spread of the sampling distribution reflects the variability of the estimate.
Chap 8 is about fitting the model to data.
Chap 9 will be dealing with testing such a fit.
Example: Fit a Poisson dist’n to counts-p240
Informally, GOF is assessed by comparing the Observed (O) and the Expected (E) counts, grouped (with an expected count of at least 5 each) into the 16 cells.
Formally, use a measure of discrepancy such as Pearson’s chi-square statistic
X² = Σ_i (O_i − E_i)² / E_i
to quantify the comparison of the O and E counts.
In this example, X² is a random variable (as a function of the random counts) whose probability dist’n is called its null distribution. It can be shown that the null dist’n of X² is approximately the chi-square dist’n with degrees of freedom
df = no. of cells − no. of independent parameters fitted − 1.
Notation: df = 16 (cells) − 1 (parameter λ) − 1 = 14.
The larger the value of X², the worse the fit.
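As a minimal sketch (in Python, with hypothetical counts, not the book’s alpha-particle data), the statistic can be computed directly from the O and E cells:

```python
def pearson_chi_square(observed, expected):
    """Pearson's chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed and expected cell counts (each expected count >= 5).
observed = [10, 20, 30, 40]
expected = [15.0, 15.0, 35.0, 35.0]

x2 = pearson_chi_square(observed, expected)
# df = no. of cells - no. of independent parameters fitted - 1
# (1 fitted parameter here, as for the Poisson's lambda)
df = len(observed) - 1 - 1
```

The larger x2 is relative to a chi-square dist’n with df degrees of freedom, the stronger the evidence against the fitted model.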
Figure 8.1 on page 242 gives a nice feeling for what a p-value might be. The p-value measures the degree of evidence against the statement “the model fits the data well,” i.e., that the Poisson is the true model.
The smaller the p-value, the worse the fit or there is more evidence against the model.
A small p-value then means rejecting the null, i.e., concluding that “the model does NOT fit the data well.”
How small is small?
Reject when p-value ≤ α,
where α is the significance level (not the confidence level).
Let the observed data be a random sample, i.e., a sequence X_1, …, X_n of I.I.D. random variables whose joint distribution depends on an unknown parameter θ (scalar or vector).
An estimate θ̂ of θ will be a random variable, a function of the X_i, whose dist’n is known as its sampling dist’n.
The standard deviation of the sampling dist’n will be termed as its standard error.
Definition: the kth (pop’n) moment of a random variable X is denoted by μ_k = E(X^k), and its kth (sample) moment by μ̂_k = (1/n) Σ_{i=1}^n X_i^k.
μ̂_k is viewed as an estimate of μ_k.
Algorithm: MOM estimates parameter(s) by finding expressions for them in terms of the lowest possible (pop’n) moments and then substituting (sample) moments into the expressions.
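A minimal MOM sketch in Python (hypothetical data). For the Poisson, E(X) = λ, so the lowest pop’n moment already gives λ, and substituting the first sample moment (the sample mean) yields the estimate:

```python
def mom_poisson_lambda(sample):
    """Method of moments for the Poisson: E(X) = lambda, so substitute
    the first sample moment (the sample mean) for the first pop'n moment."""
    return sum(sample) / len(sample)

counts = [3, 1, 4, 2, 0, 2, 3, 1]  # hypothetical counts
lam_hat = mom_poisson_lambda(counts)
```

For a two-parameter family such as N(μ, σ²), the same recipe uses the two lowest moments: μ̂ = μ̂_1 and σ̂² = μ̂_2 − μ̂_1².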
Algorithm: Let X_1, …, X_n be a sequence of I.I.D. random variables with density (or PMF) f(x|θ). The MLE θ̂ maximizes the likelihood lik(θ) = Π_{i=1}^n f(X_i|θ), or equivalently the log-likelihood ℓ(θ) = Σ_{i=1}^n log f(X_i|θ).
Suppose that (X_1, …, X_m), the counts in cells 1, …, m, follows a multinomial distribution with total count n and cell probabilities p_1, …, p_m.
Caution: the marginal dist’n of each X_i is binomial(n, p_i),
BUT the X_1, …, X_m are not INDEPENDENT, i.e., their joint PMF is not the product of the marginal PMFs. The good news is that the MLE still applies.
Problem: Estimate the p’s from the x’s.
To answer the question, we assume n is given and we wish to estimate p_1, …, p_m.
From the joint PMF P(x_1, …, x_m) = [n! / (x_1! ⋯ x_m!)] p_1^{x_1} ⋯ p_m^{x_m}, the log-likelihood becomes:
ℓ(p) = log n! − Σ_i log x_i! + Σ_i x_i log p_i.
To maximize this log-likelihood subject to the constraint Σ_i p_i = 1, we use a Lagrange multiplier; maximizing yields p̂_i = x_i / n.
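The closed form that the constrained maximization yields, p̂_i = x_i / n, is easy to sketch (hypothetical counts):

```python
def multinomial_mle(x):
    """MLE of multinomial cell probabilities: p_hat_i = x_i / n,
    the closed form obtained from the Lagrange-multiplier maximization."""
    n = sum(x)
    return [xi / n for xi in x]

counts = [5, 15, 30]           # hypothetical cell counts, n = 50
p_hat = multinomial_mle(counts)
```

Note that the estimates automatically satisfy the constraint Σ_i p̂_i = 1.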
Déjà vu: note that the sampling dist’n of each p̂_i is determined by the binomial dist’n of the corresponding X_i.
Hardy-Weinberg Equilibrium: GENETICS
Here the multinomial cell probabilities are functions of other unknown parameters θ; that is, p_i = p_i(θ).
Read Example A on pages 260–261.
Let θ̂ be an estimate of a parameter θ based on X_1, …, X_n.
The variance of the sampling dist’n of many estimators decreases as the sample size n increases.
An estimate θ̂ is said to be a consistent estimate of a parameter θ if θ̂ approaches θ (in probability) as the sample size n approaches infinity.
Consistency is a limiting property that does not require any behavior of the estimator for a finite sample size.
Theorem: Under appropriate smoothness conditions on f, the MLE θ̂ from an I.I.D. sample is consistent, and the probability dist’n of sqrt(n I(θ_0)) (θ̂ − θ_0) tends to N(0, 1). In other words, the large-sample distribution of the MLE is approximately normal with mean θ_0 (say, the MLE is asymptotically unbiased) and its asymptotic variance is 1 / (n I(θ_0)),
where the (Fisher) information about the parameter θ is:
I(θ) = E[ (∂/∂θ log f(X|θ))² ].
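For the Poisson, log f(x|λ) = x log λ − λ − log x!, so ∂/∂λ log f = x/λ − 1 and I(λ) = Var(X)/λ² = 1/λ. A sketch of the resulting large-sample interval (hypothetical data; the z = 1.96 multiplier assumes a 95% level):

```python
import math

def poisson_mle_ci(sample, z=1.96):
    """Approximate CI for lambda from the large-sample normality of the MLE:
    lambda_hat +/- z * sqrt(1 / (n * I(lambda_hat))),
    where I(lambda) = 1/lambda for the Poisson, so the SE is sqrt(lambda_hat/n)."""
    n = len(sample)
    lam_hat = sum(sample) / n       # the MLE is the sample mean
    se = math.sqrt(lam_hat / n)     # estimated standard error of the MLE
    return lam_hat - z * se, lam_hat + z * se

counts = [3, 1, 4, 2, 0, 2, 3, 1]   # hypothetical counts
lo, hi = poisson_mle_ci(counts)
```

Here λ̂ = 2 and SE = sqrt(2/8) = 0.5, giving the interval (1.02, 2.98).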
Recall that a confidence interval (as seen in Chap.7) is a random interval containing the parameter of interest with some specific probability.
Three (3) methods to get CIs for MLEs are: exact methods, approximations based on the large-sample properties of MLEs, and the bootstrap.
Problem: Given a variety of possible estimates, the best one to choose should have its sampling distribution highly concentrated about the true parameter.
Because of its analytic simplicity, the mean square error, MSE, will be used as a measure of such a concentration.
Definition: Given two estimates, θ̂_0 and θ̂_1, of a parameter θ, the efficiency of θ̂_0 relative to θ̂_1 is
defined to be: eff(θ̂_0, θ̂_1) = Var(θ̂_1) / Var(θ̂_0).
Theorem: (Cramér-Rao Inequality)
Under smoothness assumptions on the density f(x|θ) of the I.I.D. sequence X_1, …, X_n, when T = t(X_1, …, X_n) is an unbiased estimate of θ, we get the lower bound:
Var(T) ≥ 1 / (n I(θ)).
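A small simulation sketch (stdlib only; all names are mine, not the book’s) illustrating that for the Poisson the sample mean, with Var(X̄) = λ/n, attains the Cramér-Rao bound 1/(n I(λ)) = λ/n:

```python
import math
import random

def poisson_rv(lam, rng):
    """Knuth's method for simulating a Poisson(lam) variate (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(0)
lam, n, reps = 2.0, 25, 20000

# Empirical sampling distribution of the sample mean over many samples.
means = [sum(poisson_rv(lam, rng) for _ in range(n)) / n for _ in range(reps)]
mu = sum(means) / reps
emp_var = sum((m - mu) ** 2 for m in means) / (reps - 1)

crlb = lam / n   # = 1 / (n * I(lambda)), since I(lambda) = 1/lambda
```

The empirical variance of X̄ should come out close to crlb = 0.08, i.e., the unbiased estimator X̄ is efficient for the Poisson mean.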
Is there a function T(X_1, …, X_n) containing all the information in the sample about the parameter θ?
If so, without loss of information the original data may be reduced to this statistic T.
Definition: a statistic T = T(X_1, …, X_n) is said to be sufficient for θ if the conditional dist’n of X_1, …, X_n, given T = t, does not depend on θ for any value t.
In other words, given the value of T, which is called a sufficient statistic, one can gain no more knowledge about the parameter from further investigation with respect to the sample dist’n.
How to get a sufficient statistic?
Theorem A: a necessary and sufficient condition for T(X_1, …, X_n) to be sufficient for a parameter θ is that the joint PDF or PMF factors in the form:
f(x_1, …, x_n | θ) = g(T(x_1, …, x_n), θ) · h(x_1, …, x_n).
Corollary A: if T is sufficient for θ, then the MLE θ̂ is a function of T.
The following theorem gives a quantitative rationale for basing an estimator of a parameter on an existing sufficient statistic.
Theorem: Rao-Blackwell Theorem
Let θ̂ be an estimator of θ with E(θ̂²) < ∞ for all θ. Suppose that T is sufficient for θ,
and let θ̃ = E(θ̂ | T).
Then, for all θ,
E[(θ̃ − θ)²] ≤ E[(θ̂ − θ)²].
The inequality is strict unless θ̂ is itself a function of T (i.e., θ̂ = θ̃).
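A classic illustration (my choice of example, not necessarily the book’s): to estimate P(X = 0) = e^(−λ) from a Poisson sample, the crude unbiased indicator 1{X_1 = 0} can be Rao-Blackwellized by conditioning on the sufficient statistic T = Σ X_i. Given T = t, X_1 ~ Binomial(t, 1/n), so the improved estimator is (1 − 1/n)^t:

```python
def rao_blackwellize_p0(t, n):
    """E[ 1{X1 = 0} | sum of the sample = t ] for an I.I.D. Poisson sample:
    given T = t, X1 ~ Binomial(t, 1/n), so the conditional
    probability of {X1 = 0} is (1 - 1/n) ** t."""
    return (1 - 1 / n) ** t

sample = [3, 1, 4, 2, 0, 2, 3, 1]        # hypothetical counts, n = 8
t, n = sum(sample), len(sample)

naive = 1.0 if sample[0] == 0 else 0.0    # crude unbiased estimate of exp(-lambda)
improved = rao_blackwellize_p0(t, n)      # never worse in MSE, by Rao-Blackwell
```

The naive estimate can only be 0 or 1, while the conditioned estimate uses the whole sample through T; the theorem guarantees the latter’s MSE is no larger.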
Some key ideas from Chap. 7, such as sampling distributions and confidence intervals, were revisited.
MOM and MLE were applied with some distributional-theory approximations.
Theoretical concepts of efficiency, the Cramér-Rao lower bound, and sufficiency were discussed.
Finally, some light was shed on parametric bootstrapping.