
ML Estimation of the General Error Variance and Nonlinear Models


Presentation Transcript


  1. ML Estimation of the General Error Variance and Nonlinear Models

  2. ML Estimation of Linear Models with General Covariance Matrix • Let's return to our general model where: • y = Xβ + e • y is a (T x 1) vector of observations on the dependent variable • X is a (T x K) matrix of observations on the K exogenous variables • β is a (K x 1) vector of parameters • e is a (T x 1) error vector where E(e) = 0 and E(ee′) = Φ = σ2Ψ, with σ2 a scalar • Ψ is a (T x T) matrix assumed to depend on parameters Θ, Ψ = Ψ(Θ) • The dimension of Θ and how Ψ depends on Θ are functions of the assumptions made about the data generating process (e.g., heteroscedasticity vs. autocorrelation)

  3. ML Estimation of Linear Models with General Covariance Matrix • Previously, we developed a number of 2-step estimators depending on the structure of Ψ: • AR(1) • Multiplicative heteroscedasticity • In these 2-step approaches • Θ is estimated from information provided by the CRM residuals • Ψ(Θ) is built using the above estimates • FGLS procedures are then used • As an alternative, let's see how we can use ML techniques to estimate the parameters of the general linear model directly

  4. ML Estimation of Linear Models with General Covariance Matrix • If we assume that the error term, and therefore y, are both normally distributed, the total sample log-likelihood function for (β, Θ*) can be written in terms of the exogenous data, where Θ* collects σ2 and Θ and Φ(Θ*) = σ2Ψ(Θ), as shown below
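The equation on this slide was an image in the original deck; a reconstruction of the multivariate normal log-likelihood, consistent with the definitions above, is:

```latex
% y ~ N(X*beta, sigma^2 * Psi(Theta))
L(\beta,\sigma^{2},\Theta)
  = -\tfrac{T}{2}\ln(2\pi) - \tfrac{T}{2}\ln\sigma^{2}
    - \tfrac{1}{2}\ln|\Psi(\Theta)|
    - \frac{1}{2\sigma^{2}}\,(y-X\beta)'\Psi(\Theta)^{-1}(y-X\beta)
```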

  5. ML Estimation of Linear Models with General Covariance Matrix • Similar to the CRM we saw earlier, maximization of L with respect to β and σ2 conditional on Θ results in the estimators shown below • Note that βG does not depend on σ2 • If we substitute the above definition of βG into the original likelihood function • We obtain the concentrated log-likelihood function • It is only a function of the exogenous data and Θ, the parameters of the error covariance matrix
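The conditional maximizers (shown as images on the original slide) take the familiar GLS-type form:

```latex
% Maximizing L over beta and sigma^2 for a given Theta
\hat\beta_{G}(\Theta) = \bigl[X'\Psi(\Theta)^{-1}X\bigr]^{-1}X'\Psi(\Theta)^{-1}y,
\qquad
\hat\sigma^{2}_{G}(\Theta) = \frac{1}{T}\,e_{G}'\Psi(\Theta)^{-1}e_{G},
\qquad e_{G} = y - X\hat\beta_{G}(\Theta)
```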

  6. ML Estimation of Linear Models with General Covariance Matrix • This implies that the concentrated log-likelihood, apart from constants, takes the form shown below, where eG = y − XβG(Θ)
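Written out with the constants retained, the concentrated log-likelihood is:

```latex
L^{*}(\Theta) = -\tfrac{T}{2}\bigl[\ln(2\pi)+1\bigr]
  - \tfrac{T}{2}\ln\!\Bigl[\tfrac{1}{T}\,e_{G}'\Psi(\Theta)^{-1}e_{G}\Bigr]
  - \tfrac{1}{2}\ln|\Psi(\Theta)|
```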

  7. ML Estimation of Linear Models with General Covariance Matrix • The maximum likelihood estimator of Θ, ΘML, is the value of Θ for which L*(Θ) is a maximum • Given the concentrated log-likelihood: • Let's define a modified weighted sum of squared errors, S*(Θ)

  8. ML Estimation of Linear Models with General Covariance Matrix • Given the above positive monotonic transformations, the value of Θ that maximizes the concentrated log-likelihood function is also the value of Θ that minimizes S*(Θ) • The presence of the term |Ψ(Θ)|^(1/T) differs from our previous definition of the weighted sum of squared errors under the general variance model: • S(β,Θ) = eG′Ψ(Θ)-1eG S*(β,Θ) = eG′Ψ(Θ)-1eG|Ψ(Θ)|^(1/T) • If |Ψ(Θ)| does not grow with T, then as T→∞, |Ψ(Θ)|^(1/T) → 1, which implies S*(β,Θ) → S(β,Θ)

  9. Maximum Likelihood and Mult. Heteroscedasticity • Let's return to the multiplicative heteroscedasticity example we set up before, where: • yt = Xtβ + et, E(et2) = σt2 = exp(Wtα) = σ2exp(Wt*α*), with Wt = (1, Wt*), t = 1,…,T, and α′ = (ln(σ2), α*′) • The model is linear with respect to β but nonlinear with respect to α • Φ is a diagonal matrix whose tth diagonal element is σt2 = exp(Wtα) • JHGLL, pp. 538-541, 548-551; Greene, pp. 522-527

  10. Maximum Likelihood and Mult. Heteroscedasticity • If we assume the error terms follow a multivariate normal distribution, then the sample log-likelihood is as shown below, where σ2 = exp(α1)
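With σt2 = exp(Wtα), the log-likelihood referenced on this slide can be written as:

```latex
L(\beta,\alpha) = -\tfrac{T}{2}\ln(2\pi)
  - \tfrac{1}{2}\sum_{t=1}^{T} W_{t}\alpha
  - \tfrac{1}{2}\sum_{t=1}^{T}(y_{t}-X_{t}\beta)^{2}\exp(-W_{t}\alpha)
```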

  11. Maximum Likelihood and Mult. Heteroscedasticity • Under this framework (and the assumed normal distribution), the information matrix can be derived • It is block diagonal, with one block for the β's and one for the α's • W is the (T x S) matrix of variance exogenous data, where S = the number of variance exogenous variables • The implied asymptotic covariance matrix of the α's is 2(W′W)-1, versus 4.9348(W′W)-1 under the 2-step method
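A standard way to write the block-diagonal information matrix for this model (consistent with the covariance expressions used on the later test slides) is:

```latex
I(\beta,\alpha)=
\begin{pmatrix}
X'\Phi^{-1}X & 0\\
0 & \tfrac{1}{2}W'W
\end{pmatrix}
\quad\Longrightarrow\quad
\operatorname{avar}(\hat\beta_{ML}) = \bigl(X'\Phi^{-1}X\bigr)^{-1},
\qquad
\operatorname{avar}(\hat\alpha_{ML}) = 2\,(W'W)^{-1}
```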

  12. Maximum Likelihood and Mult. Heteroscedasticity • [Diagram of the estimation code: dependent and exogenous variables, a procedure defining the likelihood function, numerical gradients, starting values, the maximum likelihood procedure producing estimates of β and α, and a procedure for calculating the variances of β and α]

  13. Maximum Likelihood and Mult. Heteroscedasticity • ML estimation of the multiplicative heteroscedasticity model • JHGLL, pp. 538-541, Table 9.1 • The MATLAB code uses the BHHH algorithm, in which the update combines each observation's (K x 1) gradient of its log-likelihood contribution Lt, the (K x K) sum of outer products of those gradients, and a step length • I use the 2-step results as starting values for the ML estimation

  14. Maximum Likelihood and Mult. Heteroscedasticity • ML estimation of the multiplicative heteroscedasticity model • In the code (the commands listed are R syntax): • Z holds the per-observation gradients ∂Lt/∂θ, obtained with numerical gradients (an analytical gradient function and the LLF function are defined separately) • H = Z′Z and Pn = -solve(H) • The full-step update is db = solve(H) %*% colSums(Z)
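A minimal R sketch of the BHHH loop described above. The data objects y, X, W, the use of the numDeriv package for numerical gradients, the fixed full step, and the convergence rule are illustrative assumptions, not the original course code:

```r
# requires the numDeriv package for numerical gradients
loglik_t <- function(theta, y, X, W) {
  K  <- ncol(X)
  b  <- theta[1:K]                          # regression coefficients
  a  <- theta[-(1:K)]                       # variance-equation coefficients
  s2 <- drop(exp(W %*% a))                  # sigma_t^2 = exp(W_t a)
  e  <- drop(y - X %*% b)
  -0.5 * log(2 * pi) - 0.5 * log(s2) - 0.5 * e^2 / s2   # L_t, one value per observation
}

bhhh <- function(theta, y, X, W, maxit = 200, tol = 1e-8) {
  for (i in 1:maxit) {
    # Z: T x (K+S) matrix of per-observation gradients of L_t
    Z  <- numDeriv::jacobian(function(th) loglik_t(th, y, X, W), theta)
    H  <- crossprod(Z)                      # H = Z'Z, the BHHH information estimate
    db <- solve(H, colSums(Z))              # full step: (Z'Z)^{-1} * score
    theta <- theta + db
    if (sum(abs(db)) < tol) break
  }
  list(theta = theta, vcov = solve(H))      # solve(H) = -Pn, BHHH covariance estimate
}
```

With the 2-step estimates stacked as starting values, a call such as bhhh(c(beta2step, alpha2step), y, X, W) would return the ML estimates and their BHHH covariance matrix.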

  15. Maximum Likelihood and Mult. Heteroscedasticity • I use two methods to obtain the parameter covariance matrix: • the BHHH method with numerical gradients, i.e., -Pn from the previous slide • the previous analytical expressions, evaluated with an analytical gradient function

  16. Maximum Likelihood and Mult. Heteroscedasticity • As an alternative to the BHHH algorithm used in the above MATLAB code • I(Θ) is block diagonal (given the functional form of the heteroscedasticity) • Judge et al., p. 540, recommend using the method of scoring (GN) to undertake the iterations separately for β and α

  17. Maximum Likelihood and Mult. Heteroscedasticity • That is, for β under the method of scoring, and given the above structure of the Hessian matrix, one can show that the update reduces to the expression below
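Because the information matrix is block diagonal, the scoring step for β reduces to a GLS step evaluated at the current α (a standard result; Φ(αn) = diag[exp(Wtαn)]):

```latex
\beta_{n+1} = \beta_{n}
  + \bigl[X'\Phi(\alpha_{n})^{-1}X\bigr]^{-1}X'\Phi(\alpha_{n})^{-1}\bigl(y - X\beta_{n}\bigr)
  = \bigl[X'\Phi(\alpha_{n})^{-1}X\bigr]^{-1}X'\Phi(\alpha_{n})^{-1}y
```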

  18. Maximum Likelihood and Mult. Heteroscedasticity • where Wt is the (1 x S) row vector of variables on which the variance of et depends and Xt is (1 x K) • Note that the dimensions used in the JHGLL discussion differ from those presented here

  19. Maximum Likelihood and Mult. Heteroscedasticity • Similarly, for α under the method of scoring we have the update shown below:
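Using the score ∂L/∂α = (1/2)Σt Wt′(et2/σt2 − 1) and the information block (1/2)W′W, the scoring step for α can be written as (a standard result, consistent with the covariance matrix 2(W′W)-1 above):

```latex
\alpha_{n+1} = \alpha_{n}
  + 2(W'W)^{-1}\cdot\tfrac{1}{2}\sum_{t=1}^{T} W_{t}'\bigl[e_{t}^{2}\exp(-W_{t}\alpha_{n})-1\bigr]
  = \alpha_{n} + (W'W)^{-1}\sum_{t=1}^{T} W_{t}'\bigl[e_{t}^{2}\exp(-W_{t}\alpha_{n})-1\bigr],
\qquad e_{t} = y_{t}-X_{t}\beta_{n}
```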

  20. Maximum Likelihood and Mult. Heteroscedasticity • Testing for multiplicative heteroscedasticity using 3 asymptotic tests • H0: homoscedasticity H1: multiplicative heteroscedasticity • H0: α2 = α3 = … = αS = 0 H1: at least one of the above ≠ 0 • Continuing with our example (Table 9.1, JHGLL)

  21. Maximum Likelihood and Mult. Heteroscedasticity • Wald Test • There is 1 element in α*, with estimated coefficient α2 = 0.21732 • Remember Σα,ML = 2(W′W)-1, so Var(α2) = 2 × 0.00379774 = 0.007596, using 0.00379774 from (W*′W*)-1 for the variance exogenous variables • λW = (0.21732)2[(0.00379774)-1]/2 = 6.218 • χ2(1, .05) = 3.84 → Reject H0
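Spelled out, the Wald statistic is the squared estimate divided by its ML variance:

```latex
\lambda_{W} = \frac{(\hat\alpha_{2})^{2}}{\widehat{\operatorname{Var}}(\hat\alpha_{2})}
            = \frac{(0.21732)^{2}}{2 \times 0.00379774}
            = \frac{0.047228}{0.0075955}
            \approx 6.218 \;>\; 3.84
            \;\Rightarrow\; \text{reject } H_{0}
```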

  22. Maximum Likelihood and Mult. Heteroscedasticity • Lagrange Multiplier Test • LM = S(0)′I(0)-1S(0), where S(·) is the score vector • For the α block, the information matrix is (1/2)(W′W), so Σα = 2(W′W)-1 • Using the theoretical results shown in the previous handout, λLM = 4.028 > 3.84 → reject H0
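Written out for this model (a common form, with ẽt the OLS residuals and σ̃2 = ẽ′ẽ/T under H0):

```latex
\lambda_{LM} = S(\tilde\alpha)'\,I(\tilde\alpha)^{-1}\,S(\tilde\alpha)
             = \tfrac{1}{2}\, f'W(W'W)^{-1}W'f,
\qquad f_{t} = \frac{\tilde e_{t}^{2}}{\tilde\sigma^{2}} - 1
```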

  23. Maximum Likelihood and Mult. Heteroscedasticity • Likelihood Ratio Test • Using the full likelihood function: LU = -61.748, LR = -64.315 • λLR = 2(-61.748 − (-64.315)) = 5.134 • (difference in concentrated LLFs) • χ2(1, .05) = 3.84 → reject H0

  24. Maximum Likelihood and Mult. Heteroscedasticity • Remember that in general: λW > λLR > λLM • In our empirical example: λW = 6.218, λLR = 5.204, λLM = 4.028

  25. Maximum Likelihood and the AR(1) Model • Let's examine another example of the general linear model • Let's assume that we have a linear model where the error terms are autocorrelated with an AR(1) error structure • y = Xβ + e • y is a (T x 1) vector of observations on the dependent variable • X is a (T x K) matrix of observations on the K exogenous variables • β is a (K x 1) vector of parameters • e is a (T x 1) error vector where E(e) = 0 and E(ee′) = σ2Ψ • et = ρet-1 + νt, where νt is white noise with variance σν2 and |ρ| < 1 • Var(et) = Var(et+i) = Var(et-i) = σe2

  26. Maximum Likelihood and the AR(1) Model • As we derived earlier, the above implies a constant error variance: σe2 = σν2/(1 − ρ2)

  27. Maximum Likelihood and the AR(1) Model • The above also implies the full form of Ψ(ρ), shown below
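The implied covariance structure (the standard AR(1) result, written with σν2 as the scale factor so that E(ee′) = σν2Ψ(ρ)) is:

```latex
\Psi(\rho)=\frac{1}{1-\rho^{2}}
\begin{pmatrix}
1 & \rho & \rho^{2} & \cdots & \rho^{T-1}\\
\rho & 1 & \rho & \cdots & \rho^{T-2}\\
\rho^{2} & \rho & 1 & \cdots & \rho^{T-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\rho^{T-1} & \rho^{T-2} & \rho^{T-3} & \cdots & 1
\end{pmatrix}
```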

  28. Maximum Likelihood and the AR(1) Model • Previously, we noted that under the general model (and assuming normally distributed error terms), the sample log-likelihood function can be represented as before • Under the AR(1) structure, Ψ(Θ) = Ψ(ρ) • In order to obtain ML estimates of β and ρ, all we have to do is build the Ψ matrix, or equivalently the transformation matrix P with P′P = Ψ-1, so that e* = Pe
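One standard choice of P satisfying P′P = Ψ(ρ)-1 is the Prais–Winsten transformation matrix:

```latex
P=\begin{pmatrix}
\sqrt{1-\rho^{2}} & 0 & 0 & \cdots & 0 & 0\\
-\rho & 1 & 0 & \cdots & 0 & 0\\
0 & -\rho & 1 & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & -\rho & 1
\end{pmatrix}
```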

  29. Maximum Likelihood and the AR(1) Model • As an alternative, we can use the previous results on ML estimation of the general linear model • The ML estimators of β and ρ are those values that minimize |Ψ(ρ)|^(1/T) e*′e*, where e* = P(y − Xβ) and P′P = Ψ-1 • This expression is similar to what we derived under the general linear model, except that only σν2 (not both β and σν2) has been concentrated out

  30. Maximum Likelihood and the AR(1) Model • We can evaluate the determinant using |AB| = |A||B| and P′P = Ψ(ρ)-1
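Because P is lower triangular with diagonal elements (√(1 − ρ2), 1, …, 1):

```latex
|\Psi(\rho)^{-1}| = |P'P| = |P'|\,|P| = |P|^{2} = 1-\rho^{2}
\;\Longrightarrow\;
|\Psi(\rho)| = \frac{1}{1-\rho^{2}},
\qquad
|\Psi(\rho)|^{1/T} = (1-\rho^{2})^{-1/T}
```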

  31. Maximum Likelihood and the AR(1) Model • Summarizing the above: the ML estimators of β and ρ minimize (1 − ρ2)^(-1/T) e*′e* • This implies that if T is large and |ρ| is not too close to 1, the effect of the first term, (1 − ρ2)^(-1/T), will be small

  32. Maximum Likelihood and the AR(1) Model • Given that we can use the above to find maximum likelihood estimates of β and ρ, then, as Judge et al., p. 534 note, it can be shown that the information matrix for (β, ρ, σν2) is (K+2) x (K+2) • Combining this with the previous result gives the asymptotic covariance matrix of the ML estimators

  33. Maximum Likelihood and the AR(1) Model • Following the results we obtained using the FGLS procedures, the asymptotic covariance and variance estimators are as sketched below
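The slide's expressions were images in the original; the standard asymptotic forms, matching the FGLS results used earlier, are (stated here as a reconstruction):

```latex
\operatorname{avar}(\hat\beta_{ML}) = \sigma_{\nu}^{2}\bigl[X'\Psi(\rho)^{-1}X\bigr]^{-1},
\qquad
\operatorname{avar}(\hat\rho_{ML}) = \frac{1-\rho^{2}}{T},
\qquad
\operatorname{avar}(\hat\sigma^{2}_{\nu,ML}) = \frac{2\sigma_{\nu}^{4}}{T}
```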

  34. Maximum Likelihood Under the Nonlinear Model • y = f(X,β) + e, e ~ N(0, σ2IT), where f(X,β) is a nonlinear function of the parameters • Using general notation, we can represent the sample likelihood function in terms of the error sum of squares e′e

  35. Maximum Likelihood Under the Nonlinear Model • The sample log-likelihood function is given below, where S(β) = [y − f(X,β)]′[y − f(X,β)] • It is not, in general, possible to find an analytical expression for the maximum likelihood estimate βML such that ∂L/∂β = 0 • But ∂L/∂(σ2) = 0 → σ2ML = S(β)/T
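Written out (the slide's equation was an image), with S(β) = [y − f(X,β)]′[y − f(X,β)]:

```latex
L(\beta,\sigma^{2}) = -\tfrac{T}{2}\ln(2\pi) - \tfrac{T}{2}\ln\sigma^{2} - \frac{S(\beta)}{2\sigma^{2}},
\qquad
\frac{\partial L}{\partial\sigma^{2}} = -\frac{T}{2\sigma^{2}} + \frac{S(\beta)}{2\sigma^{4}} = 0
\;\Rightarrow\; \hat\sigma^{2}_{ML} = \frac{S(\beta)}{T}
```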

  36. Maximum Likelihood Under the Nonlinear Model • Given the above, we can generate the concentrated log-likelihood function with respect to β alone by replacing σ2 with σ2ML • For a given T, the result depends only on S(β), as shown below
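Substituting σ2ML = S(β)/T back into the log-likelihood gives the concentrated form:

```latex
L^{*}(\beta) = -\tfrac{T}{2}\bigl[\ln(2\pi)+1\bigr] - \tfrac{T}{2}\ln\!\left[\frac{S(\beta)}{T}\right]
```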

  37. Maximum Likelihood Under the Nonlinear Model • The βML that maximizes L*(β|y,X) is identical to the nonlinear least squares estimator that minimizes S(β), since L*(β) is a decreasing function of S(β) and S(β) > 0 • The equivalence only holds for nonlinear models of the form y = f(X,β) + e with e ~ N(0, σ2IT)
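A small R sketch (not the course code) illustrating this equivalence; the model y = b1·x^b2 + e, the simulated data, and the use of optim with default settings are illustrative assumptions:

```r
set.seed(1)
Tn <- 100                                   # sample size (T in the slides)
x  <- runif(Tn, 1, 10)
y  <- 2 * x^0.75 + rnorm(Tn, sd = 0.5)      # simulated nonlinear model

S        <- function(b) sum((y - b[1] * x^b[2])^2)     # error sum of squares S(beta)
negLstar <- function(b) (Tn / 2) * log(S(b) / Tn)      # minus the concentrated LLF (constants dropped)

nls_fit <- optim(c(1, 1), S)                # nonlinear least squares
ml_fit  <- optim(c(1, 1), negLstar)         # concentrated maximum likelihood

rbind(NLS = nls_fit$par, ML = ml_fit$par)   # the two estimates coincide (up to optimizer tolerance)
```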

  38. Maximum Likelihood Under the Nonlinear Model • y = f(X,β) + e, e ~ N(0, σ2Ψ), where Ψ = Ψ(θ) and θ is a vector of parameters • We can represent the log-likelihood function in terms of the weighted SSE, S(β,θ) = [y − f(X,β)]′Ψ(θ)-1[y − f(X,β)]; in the gradient expressions, Z = ∂f/∂β

  39. Maximum Likelihood Under the Nonlinear Model • Given that an estimate of σ2 is σ2ML = S(β,θ)/T • We substitute this into the above log-likelihood function to generate a log-likelihood concentrated with respect to σ2; the σ2 terms collapse into a constant, as shown below
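Under the substitution above, with S(β,θ) = [y − f(X,β)]′Ψ(θ)-1[y − f(X,β)], the concentrated log-likelihood is:

```latex
L^{*}(\beta,\theta) = -\tfrac{T}{2}\bigl[\ln(2\pi)+1\bigr]
  - \tfrac{T}{2}\ln\!\left[\frac{S(\beta,\theta)}{T}\right]
  - \tfrac{1}{2}\ln|\Psi(\theta)|
```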
