
Asymptotics for the Least Squares Estimator in the Classical Regression Model



  1. Asymptotics for the Least Squares Estimator in the Classical Regression Model Based on Greene’s Note 12

  2. Asymptotics for Least Squares Assumptions: Convergence of X′X/n (does not require X to be nonstochastic or stochastic). Convergence of X′ε/n to 0. These are sufficient for consistency. Assumption: convergence of (1/√n)X′ε to a normal vector gives asymptotic normality. What about asymptotic efficiency?

  3. Setting The least squares estimator is b = (X′X)⁻¹X′y = (X′X)⁻¹Σᵢxᵢyᵢ = β + (X′X)⁻¹Σᵢxᵢεᵢ. So, it is a constant vector plus a sum of random variables. Our ‘finite sample’ results established the behavior of the sum according to the rules of statistics. The question for the present is: how does this sum of random variables behave in large samples?
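The decomposition on this slide can be checked numerically. The sketch below (not from the slides; the sample size, coefficients, and design are assumed for illustration) verifies that b computed from (X′X)⁻¹X′y equals β plus the sampling-error term (X′X)⁻¹X′ε.

```python
import numpy as np

# Assumed illustrative DGP: y = X beta + eps with normal errors.
rng = np.random.default_rng(0)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 0.5, -2.0])
eps = rng.normal(size=n)
y = X @ beta + eps

# Least squares directly, and via the slide's decomposition.
b = np.linalg.solve(X.T @ X, X.T @ y)
b_decomp = beta + np.linalg.solve(X.T @ X, X.T @ eps)
assert np.allclose(b, b_decomp)  # constant vector + sum of random variables
```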

  4. Well Behaved Regressors A crucial assumption: Convergence of the moment matrix X′X/n to a positive definite matrix of finite elements, Q What kind of data will satisfy this assumption? What won’t? Does stochastic vs. nonstochastic matter? Various conditions for “well behaved X”

  5. Mean Square Convergence E[b|X] = β for any X. Var[b|X] → 0 for any specific X. Therefore b converges in mean square to β.
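A small Monte Carlo sketch of the mean square convergence claim (assumed two-regressor DGP, chosen only for illustration): the total sampling variance of b shrinks as n grows.

```python
import numpy as np

# Assumed DGP: beta = [1, 2], standard normal errors and regressor.
rng = np.random.default_rng(1)
beta = np.array([1.0, 2.0])

def var_of_b(n, reps=500):
    """Monte Carlo estimate of the total sampling variance of b."""
    draws = np.empty((reps, 2))
    for r in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = X @ beta + rng.normal(size=n)
        draws[r] = np.linalg.lstsq(X, y, rcond=None)[0]
    return draws.var(axis=0).sum()

v_small, v_large = var_of_b(50), var_of_b(5000)
assert v_large < v_small  # Var[b] -> 0 as n grows
```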

  6. Probability Limit

  7. Probability Limit

  8. Crucial Assumption of the Model

  9. Consistency of s2
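The slide's claim, plim s² = σ², can be sketched in simulation. Everything here (the DGP, σ = 2, deliberately non-normal uniform errors) is an assumed illustration, not from the slides; the point is that s² = e′e/(n−K) settles near σ² as n grows.

```python
import numpy as np

# Assumed DGP with sigma = 2 and non-normal (uniform) disturbances.
rng = np.random.default_rng(4)
sigma = 2.0

def s2(n):
    """s^2 = e'e/(n-K) from one simulated sample of size n."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta = np.array([1.0, -1.0])
    # uniform errors scaled so Var[eps] = sigma^2
    eps = sigma * rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)
    y = X @ beta + eps
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e / (n - X.shape[1])

s2_small, s2_big = s2(100), s2(100_000)
print(s2_small, s2_big)  # the large-n value should be near sigma^2 = 4
```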

  10. Asymptotic Distribution

  11. Asymptotics

  12. Asymptotic Distributions • Finding the asymptotic distribution • b → β in probability. How to describe the distribution? • b has no ‘limiting’ distribution • Its variance → 0; it is O(1/n) • Stabilize the variance? Var[√n b] ~ σ²Q⁻¹ is O(1) • But E[√n b] = √n β, which diverges • √n(b − β) → a random variable with finite mean and variance (the stabilizing transformation) • b ≈ β + (1/√n) times that random variable
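The stabilizing transformation in the bullets above can be sketched numerically (assumed DGP: standard normal regressor and errors, so σ² and the relevant element of Q⁻¹ are both about 1): the raw variance of the slope vanishes, while n times that variance stays roughly constant.

```python
import numpy as np

# Assumed DGP: slope beta_2 = 1, sigma = 1, x ~ N(0,1) so Q ~ I.
rng = np.random.default_rng(2)
beta, sigma = np.array([0.0, 1.0]), 1.0

def slope_draws(n, reps=2000):
    """Monte Carlo draws of the least squares slope for sample size n."""
    out = np.empty(reps)
    for r in range(reps):
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = X @ beta + sigma * rng.normal(size=n)
        out[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return out

d100, d1000 = slope_draws(100), slope_draws(1000)
# Var[b] is O(1/n) and shrinks; n * Var[b] is O(1) and stays near sigma^2.
print(d100.var(), 100 * d100.var())
print(d1000.var(), 1000 * d1000.var())
```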

  13. Limiting Distribution √n(b − β) = √n(X′X)⁻¹X′ε = (X′X/n)⁻¹[√n(X′ε/n)]. The limiting behavior is the same as that of √n Q⁻¹(X′ε/n), since Q is a fixed matrix. The behavior depends on the random vector √n(X′ε/n).

  14. Limiting Normality

  15. Asymptotic Distribution

  16. Asymptotic Properties • Probability Limit and Consistency • Asymptotic Variance • Asymptotic Distribution

  17. Root n Consistency • How ‘fast’ does b → β? • Asy.Var[b] = (σ²/n)Q⁻¹ is O(1/n) • Convergence is at the rate of 1/√n • √n b has variance of O(1) • Is there any other kind of convergence? • x1,…,xn = a sample from an exponential population; the sample minimum has variance of O(1/n²). This is ‘n-convergent’ • Certain nonparametric estimators have variances that are O(1/n^(2/3)). Less than root n convergent.
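The two rates in the bullets above can be compared by simulation (exponential(1) population assumed for illustration): quadrupling n should roughly quadruple the precision of the sample mean (variance O(1/n)) but multiply the precision of the sample minimum by about sixteen (variance O(1/n²)).

```python
import numpy as np

# Assumed population: exponential with scale 1, so min of n draws is
# exponential with scale 1/n and hence has variance 1/n^2.
rng = np.random.default_rng(3)

def variances(n, reps=4000):
    """Monte Carlo variances of the sample mean and sample minimum."""
    x = rng.exponential(scale=1.0, size=(reps, n))
    return x.mean(axis=1).var(), x.min(axis=1).var()

vm_1, vmin_1 = variances(100)
vm_2, vmin_2 = variances(400)
print(vm_1 / vm_2)      # near 4:  Var[mean] scales like 1/n
print(vmin_1 / vmin_2)  # near 16: Var[min] scales like 1/n^2
```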

  18. Asymptotic Results • The asymptotic distribution of b does not depend on normality of ε • The estimator of the asymptotic variance (σ²/n)Q⁻¹ is (s²/n)(X′X/n)⁻¹. (Degrees of freedom corrections are irrelevant asymptotically but conventional.) • The Slutsky theorem and the delta method apply to functions of b.

  19. Test Statistics We have established the asymptotic distribution of b. We now turn to the construction of test statistics. In particular, we base tests on the Wald statistic F[J, n−K] = (1/J)(Rb − q)′[R s²(X′X)⁻¹R′]⁻¹(Rb − q). This is the usual test statistic for testing linear hypotheses in the linear regression model, distributed exactly as F if the disturbances are normally distributed. We now obtain some general results that will let us construct test statistics in more general situations.

  20. Wald Statistics General approach to the derivation, based on a univariate distribution (just to get started). A. Core result: the square of a standard normal variable is chi-squared with 1 degree of freedom. Suppose z ~ N[0, σ²], i.e., the variance is not 1. Then (z/σ)² satisfies A. Now, suppose z ~ N[μ, σ²]. Then [(z − μ)/σ]² is chi-squared with 1 degree of freedom. This is the normalized distance between z and μ, where distance is measured in standard deviation units. Suppose zn is not exactly normally distributed, but (1) E[zn] = μ, (2) Var[zn] = σ², (3) the limiting distribution of zn is normal. Then by our earlier results, (zn − μ)/σ → N[0,1], though again, this is a limiting distribution, not the exact distribution in a finite sample.
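The core result A is easy to see in simulation. The μ and σ below are arbitrary assumed values; the squared standardized distance should behave like a chi-squared variable with 1 degree of freedom (mean 1, variance 2).

```python
import numpy as np

# Assumed parameters for illustration: z ~ N[3, 1.5^2].
rng = np.random.default_rng(5)
mu, sigma = 3.0, 1.5
z = rng.normal(mu, sigma, size=200_000)

# Squared standardized distance between z and mu.
q = ((z - mu) / sigma) ** 2
print(q.mean(), q.var())  # chi-squared[1] has mean 1 and variance 2
```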

  21. Extensions If the preceding holds, then [(zn − μ)/σ]² → {N[0,1]}², or χ²[1]. Again, a limiting result, not an exact one. Suppose σ is not a known quantity, and we substitute for σ a consistent estimator of σ, say sn, with plim sn = σ. What about the behavior of the “empirical counterpart,” tn = (zn − μ)/sn? Because plim sn = σ, the large sample behavior of this statistic will be the same as that of the original statistic using σ instead of sn. Therefore, under our assumptions, tn² = [(zn − μ)/sn]² converges to chi-squared[1], just like [(zn − μ)/σ]². tn and (zn − μ)/σ converge to the same random variable.

  22. Full Rank Quadratic Form A crucial distributional result (exact): If the random vector x has a K-variate normal distribution with mean vector μ and covariance matrix Σ, then the random variable W = (x − μ)′Σ⁻¹(x − μ) has a chi-squared distribution with K degrees of freedom.

  23. Proof of Full Rank Q-F Result Proof: (Short, but very important that you understand and are comfortable with all parts. Details appear in Section 3.10.5 of your text.) Requires the definition of a square root matrix: Σ^(1/2) is a matrix such that Σ^(1/2)Σ^(1/2) = Σ. Then V = (Σ^(1/2))⁻¹ is the inverse square root, such that VV = Σ^(-1/2)Σ^(-1/2) = Σ⁻¹. Let z = (x − μ). Then z has mean 0, covariance matrix Σ, and the normal distribution. The random vector w = Vz has mean vector V0 = 0 and covariance matrix VΣV = I. (Substitute and add exponents.) So w has a normal distribution with mean 0 and covariance I. w′w = Σk wk², where each element is the square of a standard normal, thus chi-squared(1). The sum of independent chi-squareds is chi-squared, so this gives the end result, as w′w = (x − μ)′Σ⁻¹(x − μ).
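The steps of the proof can be traced numerically. Below, the mean vector and covariance matrix are assumed for illustration, and the symmetric square root is built by eigendecomposition (one valid choice of Σ^(1/2)); the code checks that VΣV′ = I, that w′w equals the quadratic form, and that W behaves like χ²[K] (mean K, variance 2K).

```python
import numpy as np

# Assumed K-variate normal with an arbitrary positive definite Sigma.
rng = np.random.default_rng(6)
K = 3
mu = np.array([1.0, -1.0, 0.5])
A = rng.normal(size=(K, K))
Sigma = A @ A.T + K * np.eye(K)

# Inverse square root V via eigendecomposition (V is symmetric here).
vals, vecs = np.linalg.eigh(Sigma)
V = vecs @ np.diag(vals ** -0.5) @ vecs.T
assert np.allclose(V @ Sigma @ V.T, np.eye(K))  # V Sigma V' = I

x = rng.multivariate_normal(mu, Sigma, size=100_000)
d = x - mu
w = d @ V.T                      # w = Vz has covariance I
W = (w ** 2).sum(axis=1)         # w'w, a sum of squared standard normals

# w'w equals the quadratic form (x - mu)' Sigma^{-1} (x - mu).
quad = np.einsum('ni,ij,nj->n', d, np.linalg.inv(Sigma), d)
assert np.allclose(W, quad)
print(W.mean(), W.var())         # chi-squared[K]: mean K, variance 2K
```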

  24. Building the Wald Statistic-1 Suppose that the same normal distribution assumptions hold, but instead of the parameter matrix Σ we do the computation using a matrix Sn which has the property plim Sn = Σ. The exact chi-squared result no longer holds, but the limiting distribution is the same as if the true Σ were used.

  25. Building the Wald Statistic-2 Suppose the statistic is computed not with an x that has an exact normal distribution, but with an xn which has a limiting normal distribution, but whose finite sample distribution might be something else. Our earlier results for functions of random variables give us the result (xn − μ)′Sn⁻¹(xn − μ) → χ²[K]. Note that in fact nothing in this relies on the normal distribution. What we used is consistency of a certain estimator (Sn) and the central limit theorem for xn.

  26. General Result for Wald Distance The Wald distance measure: If plim xn = μ, xn is asymptotically normally distributed with a mean of μ and variance Σ, and Sn is a consistent estimator of Σ, then the Wald statistic, which is a generalized distance measure between xn and μ, converges to a chi-squared variate: (xn − μ)′Sn⁻¹(xn − μ) → χ²[K].

  27. The F Statistic An application (familiar): Suppose bn is the least squares estimator of β based on a sample of n observations. No assumption of normality of the disturbances or about nonstochastic regressors is made. The standard F statistic for testing the hypothesis H0: Rβ − q = 0 is F[J, n−K] = [(e*′e* − e′e)/J] / [e′e/(n−K)], which is built from two sums of squared residuals (restricted and unrestricted). Without normality, the statistic does not have an exact F distribution. How can we test the hypothesis?
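A useful check on this slide is that the sums-of-squares form of F is algebraically identical to the Wald form on the next slide. The sketch below uses an assumed DGP and the assumed hypothesis that the last two slopes are zero (J = 2), with the restricted fit obtained by simply dropping those columns.

```python
import numpy as np

# Assumed DGP; H0: beta_3 = beta_4 = 0 (true here), so J = 2.
rng = np.random.default_rng(7)
n, K, J = 200, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

# Unrestricted and restricted residuals.
b = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b
b_r = np.linalg.lstsq(X[:, :K - J], y, rcond=None)[0]
e_r = y - X[:, :K - J] @ b_r

# Sums-of-squares form: F = [(e*'e* - e'e)/J] / [e'e/(n-K)].
F_ssr = ((e_r @ e_r - e @ e) / J) / (e @ e / (n - K))

# Wald form: (1/J)(Rb - q)'[R s^2 (X'X)^{-1} R']^{-1}(Rb - q).
R = np.array([[0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
q = np.zeros(J)
s2 = e @ e / (n - K)
m = R @ b - q
V_b = s2 * np.linalg.inv(X.T @ X)
F_wald = (m @ np.linalg.solve(R @ V_b @ R.T, m)) / J

assert np.allclose(F_ssr, F_wald)  # the two forms agree exactly
```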

  28. F Statistic F[J, n−K] = (1/J)(Rbn − q)′[R s²(X′X)⁻¹R′]⁻¹(Rbn − q). Write m = (Rbn − q). Under the hypothesis, plim m = 0, and √n m → N[0, σ²RQ⁻¹R′]. Estimate this variance with s²R(X′X/n)⁻¹R′. Then (√n m)′[Est.Var(√n m)]⁻¹(√n m) fits exactly into the apparatus developed earlier. If plim bn = β, plim s² = σ², and the other asymptotic results we developed for least squares hold, then JF[J, n−K] → χ²[J].
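The limiting result JF → χ²[J] can be sketched by simulation. Everything below is an assumed illustration: J = 1, a true null (slope equal to zero), and deliberately non-normal uniform disturbances, so the exact F result fails but the chi-squared limit still shows up.

```python
import numpy as np

# Assumed design: H0 true (slope = 0), uniform (non-normal) errors, J = 1.
rng = np.random.default_rng(8)
n, reps = 400, 3000
R, q = np.array([[0.0, 1.0]]), np.zeros(1)

stats = np.empty(reps)
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = 1.0 + rng.uniform(-1, 1, size=n)          # slope is truly zero
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = e @ e / (n - 2)
    V_b = s2 * np.linalg.inv(X.T @ X)
    m = R @ b - q
    stats[r] = m @ np.linalg.solve(R @ V_b @ R.T, m)  # = J*F with J = 1

print(stats.mean(), stats.var())  # chi-squared[1]: mean near 1, variance near 2
```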
