
An Overview of the Classical Regression Model



  1. An Overview of the Classical Regression Model

  2. Assumptions of the Classical Regression Model • Assume we want to investigate the general relationship: yt = g(x1t, x2t,...,xkt | β1, β2,..., βk, σ²) • To use linear regression techniques for parameter estimation we need the following assumptions: • A1- g(·) is linear in the parameter vector β: E(yt) = β1x1t + β2x2t + ... + βkxkt = xtβ • A2- The xit are non-stochastic variables • A3- The et, t = 1, 2, …, T, are independently and identically distributed: E(et) = E[yt - xtβ] = E[yt - E(yt)] = 0

  3. Assumptions of the Classical Regression Model (cont) • A4- Homoskedastic error term: E[(e - E(e))(e - E(e))′] = E(ee′) = σ²IT → V(et) = σ², t = 1, …, T • A5- The determinant of (X′X) is non-zero; the k exogenous variables are not perfectly collinear • If A5 fails → the parameters cannot be estimated • T ≥ k: there must be at least as many observations as there are parameters to estimate • |X′X| ≠ 0 but “very small” → “collinearity problem” → the β’s can be estimated but the estimator is imprecise. Why? • Var(βs) = σ²(X′X)-1 (see the sketch below)
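
A minimal numerical sketch of the "Why?" above (illustrative, not from the slides; assumes numpy and made-up data): as two regressors become nearly collinear, |X′X| shrinks and the diagonal of σ²(X′X)-1 explodes, so the β's are estimated very imprecisely.

import numpy as np

rng = np.random.default_rng(0)
T, sigma2 = 40, 1.0
x1 = rng.normal(size=T)

for noise in (1.0, 0.1, 0.01):                    # smaller noise -> x2 nearly identical to x1
    x2 = x1 + noise * rng.normal(size=T)
    X = np.column_stack([np.ones(T), x1, x2])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # Var(beta_s) = sigma^2 (X'X)^-1
    print(f"noise={noise:5.2f}  det(X'X)={np.linalg.det(X.T @ X):12.4f}  "
          f"Var(b1)={cov_beta[1, 1]:10.4f}  Var(b2)={cov_beta[2, 2]:10.4f}")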

  4. Assumptions of the Classical Regression Model (cont) • If we want to use Maximum Likelihood Techniques to obtain parameter estimates we need the following: • A6- Normality assumption: et ~ N(0, σ²), t = 1, …, T

  5. Overview of the Classical Regression Model • Although the model is linear in terms of the parameters, we can allow for a nonlinear relationship between the exogenous and dependent variables via the use of alternative functional forms or variable construction (Stewart, Ch. 6; Greene Section 7.3, 124-130) • Example: • Let xt = [1 zt ln(zt) (wt²zt)], 4 RHS variables, with β (4 x 1) • yt = [1 zt ln(zt) (wt²zt)]β + et • yt is a linear function of the βi’s • yt is a non-linear function of the explanatory variables (e.g., nonlinear marginal effects)

  6. Overview of the Classical Regression Model • yt = [1 zt ln(zt) (wt²zt)]β + et • The marginal effects can be represented as: • ∂yt/∂zt = β2 + β3/zt + β4wt² • ∂yt/∂wt = 2β4wtzt • Note the nonlinear marginal effects with respect to the exogenous variables (see the sketch below)
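
A small sketch of these marginal effects (illustrative coefficient values and evaluation point, not from the slides; assumes numpy): the model is linear in β, yet ∂yt/∂zt and ∂yt/∂wt change with zt and wt.

import numpy as np

beta = np.array([1.0, 0.5, 2.0, -0.3])            # hypothetical beta_1 ... beta_4
z, w = 4.0, 1.5                                   # hypothetical evaluation point

x_row = np.array([1.0, z, np.log(z), w**2 * z])
y_hat = x_row @ beta                              # fitted y at (z, w)

dy_dz = beta[1] + beta[2] / z + beta[3] * w**2    # beta_2 + beta_3/z + beta_4*w^2
dy_dw = 2 * beta[3] * w * z                       # 2*beta_4*w*z
print(f"y_hat={y_hat:.3f}  dy/dz={dy_dz:.3f}  dy/dw={dy_dw:.3f}")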

  7. Overview of the Classical Regression Model • Reciprocal functional form: Yt = β1 + β2(1/Xt) + et, with β1 > 0, β2 < 0 • E[Yt] = β1 + β2/Xt, which approaches the horizontal asymptote Yt = β1 as Xt grows and crosses zero at Xt = -β2/β1 • dYt/dXt = -β2/Xt² • Elasticity of Y wrt X: ηyx = [dYt/dXt]·Xt/Yt = [-β2/Xt²]·Xt/Yt = -β2/(XtYt) • [Figure: Yt plotted against Xt for this functional form]

  8. Overview of the Classical Regression Model • Yt = aXt^β1 exp(et), where exp(et) is a multiplicative error term • Taking logs: ln(Yt) = β0 + β1ln(Xt) + et, with β0 = ln(a), a = exp(β0), 0 < β1 < 1 • Elasticity of Y wrt X ≡ dln(Yt)/dln(Xt) = β1 • Sometimes referred to as the Cobb-Douglas functional form when there is more than 1 exogenous variable • [Figure: Yt plotted against Xt for this functional form]
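
A quick simulation sketch of the constant-elasticity property (made-up parameter values; assumes numpy): regressing ln(Yt) on ln(Xt) recovers β1, the elasticity, regardless of the level of Xt.

import numpy as np

rng = np.random.default_rng(1)
T, a, b1, sigma = 200, 2.0, 0.6, 0.1
X = rng.uniform(1.0, 10.0, size=T)
Y = a * X**b1 * np.exp(sigma * rng.normal(size=T))    # Y = a * X^b1 * exp(e)

Z = np.column_stack([np.ones(T), np.log(X)])          # regressors for the log-log form
coef = np.linalg.solve(Z.T @ Z, Z.T @ np.log(Y))      # OLS: (Z'Z)^-1 Z' ln(Y)
print(f"b0_hat={coef[0]:.3f} (ln a = {np.log(a):.3f})  b1_hat={coef[1]:.3f} (true {b1})")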

  9. Classical Regression Model Summary • yt = xtβ + et, where xt is (1 x K), β is (K x 1), the xt are non-stochastic and not all identical, and xtβ is the conditional mean • If et is normally distributed: et ~ N(0, σ²) → yt ~ N(xtβ, σ²)

  10. Example of Food Expenditures 40-Households • Weekly food exp. and income data

  11. Example of Food Expenditures 40-Households • Suppose we have two different estimators to obtain estimates of β0, β1 • Which estimator is “preferred” ?

  12. Our Classical Regression Model • Given the above data and theoretical model, we need to obtain parameter estimates that place the mean expenditure line in expenditure/income (X) space • We would expect the position of the line to be in the middle of the data points • What do we mean by “middle,” given that some et’s are positive and others negative? • We need to develop some rule to locate this line

  13. Our Classical Regression Model • Least Squares Estimation rule • Choose a line such that the sum of the squares of the vertical distances from each point to the line defined by the coefficients (SSE) is as small as possible • The above line refers to E(yt) • SSE = Σt et² = Σt (yt - β0 - β1xt)² • Graphically, in the above scatter plot, the vertical distances from each point to the line representing the linear relationship are called residuals or regression errors et

  14. Our Classical Regression Model • Let (β0, β1) be an initial “guess” of the intercept and slope coefficients • The corresponding error-term “guess” is et = yt - (β0 + β1xt): yt plus the negative of the initial conditional-mean “guess”

  15. Our Classical Regression Model • [Figure: scatter of yt against Xt with the conditional-mean line; the residuals for observations 3 and 4 (y3 at X3, y4 at X4) are marked]

  16. Our Classical Regression Model • Note that the SSE can be obtained via the following: SSE = e′e, where e is (T x 1), e′ is (1 x T), and the SSE is (1 x 1) (see the sketch below)
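
A short sketch (simulated data and an arbitrary “guess” of the line; assumes numpy) verifying that the matrix form e′e reproduces the scalar sum of squared residuals Σt et².

import numpy as np

rng = np.random.default_rng(2)
T = 40
x = rng.uniform(5.0, 30.0, size=T)
y = 7.0 + 0.25 * x + rng.normal(scale=5.0, size=T)   # simulated expenditure/income data

b0_guess, b1_guess = 5.0, 0.3                        # an initial "guess" of the line
e = y - b0_guess - b1_guess * x                      # residuals under that guess

sse_scalar = np.sum(e**2)      # sum_t e_t^2
sse_matrix = e @ e             # e'e, a (1 x 1) quantity
print(sse_scalar, sse_matrix)  # identical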

  17. Our Classical Regression Model • [Figure: the naïve model, a horizontal line in (Xt, yt) space at the sample mean of y, with observations y3 and y4 marked]

  18. Our Classical Regression Model • Under the least squares estimation rule we want to choose the values of β0 and β1 that minimize the error sum of squares (SSE) • Can we be assured that the values of β0 and β1 we choose do indeed minimize the SSE?

  19. Our Classical Regression Model

  20. Our Classical Regression Model • Let’s look at the FOC for the minimization of the sum of squared errors (SSE) as a means of obtaining estimates of β1, β2 • ∂SSE/∂β = -2X′y + 2X′Xβ = 0 → βs = (X′X)-1X′y, the estimated value, where (X′X) is (2 x 2), X′ is (2 x T), βs is (2 x 1), and the SSE is (1 x 1) (see the sketch below)
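
A minimal sketch of solving these first-order conditions (simulated data standing in for the 40-household sample; assumes numpy): the normal equations X′Xβ = X′y are solved directly for the estimates.

import numpy as np

rng = np.random.default_rng(3)
T = 40
income = rng.uniform(5.0, 30.0, size=T)
food = 7.0 + 0.25 * income + rng.normal(scale=5.0, size=T)   # made-up "true" line plus noise

X = np.column_stack([np.ones(T), income])      # (T x 2): intercept and income
b = np.linalg.solve(X.T @ X, X.T @ food)       # solves X'X b = X'y
print("estimated intercept and slope:", b)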

  21. Our Classical Regression Model • Can we be assured that the SSE function is convex, not concave, wrt the β’s? • The matrix of second derivatives of the SSE with respect to β1 and β2 can be shown to be HSSE = ∂²SSE/∂β∂β′ = 2X′X • HSSE must be positive definite for convexity • To be positive definite, every principal minor of HSSE needs to be positive

  22. Our Classical Regression Model • The two diagonal elements of HSSE (2T and 2Σt x2t²) must be positive, and they are • |HSSE| is positive unless all values of x2t are the same • → HSSE is positive definite • → the SSE is convex wrt the β’s

  23. Our Classical Regression Model • For our 40 HH Food Expenditure Data

  24. Our Classical Regression Model • Estimated expenditure line: Yt = 7.3832 + 0.2323It → dYt/dIt = 0.2323

  25. Sampling Properties of Estimated Coefficients • “True” relation: Y = Xβ + e • Use the random variable Y to generate an estimate of the unknown coefficients: βs = (X′X)-1X′Y • βs is a random vector with a distribution • βs will vary from sample to sample • What is E(βs)?

  26. Sampling Properties of Estimated Coefficients • Properties of the Least Squares Estimator • Does E(βs) = β (i.e., is βs unbiased)? • With Y = Xβ + e: E(βs) = E[(X′X)-1X′Y] = E[(X′X)-1X′(Xβ + e)] = β + (X′X)-1X′E(e) = β, since E(e) = 0 → βs is an unbiased estimate of the true unknown value β (see the sketch below)
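
A Monte Carlo sketch of this unbiasedness result (arbitrary true parameters and a simulated, fixed X; assumes numpy): averaging βs over many repeated samples comes out very close to the true β.

import numpy as np

rng = np.random.default_rng(4)
T, sigma = 40, 2.0
beta_true = np.array([7.0, 0.25])
X = np.column_stack([np.ones(T), rng.uniform(5.0, 30.0, size=T)])   # fixed (non-stochastic) X
A = np.linalg.inv(X.T @ X) @ X.T                                    # A = (X'X)^-1 X'

draws = np.empty((5000, 2))
for r in range(5000):
    y = X @ beta_true + sigma * rng.normal(size=T)   # new error draw for each sample
    draws[r] = A @ y                                 # beta_s = A y
print("average of beta_s over samples:", draws.mean(axis=0))        # approx. beta_true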

  27. Sampling Properties of Estimated Coefficients • Properties of the Least Squares Estimator • What is the covariance matrix, Σβ, of the estimated parameters? • Σβ = σ²(X′X)-1, a (K x K) matrix; here K = 2 • What is a reasonable estimate of σ²? • σ² ≡ variance of et = E[(et - E(et))²] = E(et²) with E(et) = 0, the same for every t due to the iid assumption • Up to this point σ² has been assumed known

  28. Sampling Properties of Estimated Coefficients • Is the average squared residual, es′es/T, an unbiased estimator of σ²? It is not, because estimating β uses up K degrees of freedom • Standardize the SSE by T - K, adjusting for the number of parameters in the regression model: σU² = es′es/(T - K) • Given the above: E(σU²) = σ²

  29. Sampling Properties of Estimated Coefficients • In contrast to our least squares estimate of β, which is a linear function of y, the above is a quadratic form in the observable random vector y • This implies that σU² is a random variable and that our estimate of σ² will vary from sample to sample • We have derived E(σU²) • Let’s now evaluate the variance of the random variable σU² • We showed in a previous handout that es′es = e′(IT - X(X′X)-1X′)e = e′Me, where M is an idempotent matrix, es is the estimated CRM error, and e is the true unknown error • Before we examine the variance of σU², let’s talk about the PDF of e′Me/σ²

  30. Sampling Properties of Estimated Coefficients • Let’s assume that e ~ N(0, σ²IT) • I will show a little later that βl = βs = (X′X)-1X′Y ~ N(β, σ²(X′X)-1), where βl is the maximum likelihood estimator of the unknown CRM coefficients assuming normality • Given this assumption, let’s look at e′Me/σ² • The numerator in the above is a quadratic form involving the normal random vector e • On page 52 of JHGLL, and in Section A.19, the distributional characteristics of quadratic forms of normal RV’s are discussed

  31. Sampling Properties of Estimated Coefficients • The implication of this discussion is that with e ~ N(0, σ²IT) and M idempotent, • e′Me/σ² is distributed χ² with DF equal to the rank of M, where the rank of an idempotent matrix equals its trace (σ² is a constant) • tr(M) = tr(IT - X(X′X)-1X′) • = tr(IT) - tr[X(X′X)-1X′] • = tr(IT) - tr[X′X(X′X)-1] (using tr(ABC) = tr(CAB)) • = tr(IT) - tr(IK) • = T - K = rank of M
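
A numerical sketch of this trace result (random made-up X; assumes numpy): M = IT - X(X′X)-1X′ is idempotent and its trace equals T - K.

import numpy as np

rng = np.random.default_rng(5)
T, K = 40, 2
X = np.column_stack([np.ones(T), rng.uniform(5.0, 30.0, size=T)])

M = np.eye(T) - X @ np.linalg.inv(X.T @ X) @ X.T
print(np.allclose(M @ M, M))   # idempotent: MM = M
print(np.trace(M))             # equals T - K = 38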

  32. Sampling Properties of Estimated Coefficients • We can use the above result to find the variance of σU² • A characteristic of a RV that is distributed χ² is that its variance equals twice its DF: Var(e′Me/σ²) = 2(T - K), so Var(σU²) = Var[σ²(e′Me/σ²)/(T - K)] = 2σ⁴/(T - K) • Note that in order for us to say something about the variance of our error-variance estimator under the CRM we needed the additional normality assumption

  33. Sampling Properties of Estimated Coefficients • Given the normality assumption on the error term, → βs = βl ~ N(β, σ²(X′X)-1) • I would now like to show that the random vector βs (= βl) is independent of the random variable σU² (pp. 29-30 of JHGLL) • Since σU² = es′es/(T - K), βl and σU² will be independent if es (= el) and βs (= βl) are independent • Given the above assumptions, both el and βl are normal random vectors • To show that they are independent it is sufficient to show that the matrix containing the covariances between the elements of el and βl is zero • This (T x K) covariance matrix can be represented by E[el(βl - β)′]

  34. Sampling Properties of Estimated Coefficients • Previously we showed that es = (IT - X(X′X)-1X′)e [= el] • We also know that βs (= βl) = (X′X)-1X′y = (X′X)-1X′(Xβ + e) = β + (X′X)-1X′e, where β is the true unknown value • → βl - β = (X′X)-1X′e • This implies the covariance matrix can be shown to be: E[el(βl - β)′] = E[(IT - X(X′X)-1X′)ee′X(X′X)-1] = (IT - X(X′X)-1X′)E[ee′]X(X′X)-1 = σ²[X(X′X)-1 - X(X′X)-1X′X(X′X)-1] = σ²[X(X′X)-1 - X(X′X)-1] = 0, using E[ee′] = σ²IT

  35. Sampling Properties of Estimated Coefficients • The above results show that βl and σU² are independent • For a more theoretical treatment refer to Section 2.5.9, bottom of page 52, in JHGLL

  36. Food Expenditure Model Results Summary • K = 2, T = 40 • SSE = e′e = 1780.4 • The reported standard error is the square root of the corresponding diagonal element of σU²(X′X)-1: 16.0669½ = 4.008 (see the sketch below)
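
A sketch of how such a standard error is assembled (simulated data standing in for the 40-household sample, so the numbers will differ; assumes numpy): σU² = e′e/(T - K), then take the square root of the corresponding diagonal element of σU²(X′X)-1.

import numpy as np

rng = np.random.default_rng(6)
T, K = 40, 2
X = np.column_stack([np.ones(T), rng.uniform(5.0, 30.0, size=T)])
y = X @ np.array([7.0, 0.25]) + rng.normal(scale=6.0, size=T)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
sigma2_U = (e @ e) / (T - K)                  # unbiased estimate of sigma^2
cov_b = sigma2_U * np.linalg.inv(X.T @ X)     # estimated covariance matrix of b
std_errors = np.sqrt(np.diag(cov_b))
print("b:", b, " std. errors:", std_errors)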

  37. Our Classical Regression Model • In summary, with K regressors, T observations and our linear model: • The random variable Y is composed of a nonstochastic conditional mean and an unobservable error term: Y = Xβ + e • βs = (X′X)-1X′Y • βs is a linear function of the observable random vector Y • βs is a random vector with a sampling distribution • βs is unbiased • βs cov. matrix ≡ Σβ = σ²(X′X)-1 • → βs ~ (β, σ²(X′X)-1) • Finite sample properties: JHGLL, 198-209 • This implies that with e ~ (0T, σ²IT), Y ~ (Xβ, σ²IT)

  38. Our Classical Regression Model • βs was obtained w/o knowing the distribution of et • Let’s compare the above estimate of β with estimates obtained from other linear and unbiased estimators (β*) • βs = AY, where A = (X′X)-1X′ • β* = CY, where C is a (K x T) matrix that is not a function of Y or the unknown parameters (A is an example of such a matrix) • By assumption, E(βs) = E(β*) = β • We are interested in finding the Best Linear Unbiased Estimator (BLUE) of the true, unknown parameter vector β

  39. Our Classical Regression Model • Is βs BLUE (i.e., minimum variance compared to β*)? • Gauss-Markov Theorem: Given the CRM assumptions A1-A5, βs is BLUE • With multiple β’s → βs is better than any other linear unbiased estimator β* if: • Var(a′βs) ≤ Var(a′β*), where a′β is any linear combination of the β’s and a is a (K x 1) constant vector • → a′Σβ*a ≥ a′Σβs a ∀ a, where Σβs and Σβ* are the (K x K) covariance matrices and Var(a′β*), Var(a′βs) are (1 x 1) • → a′(Σβ* - Σβs)a ≥ 0 ∀ a for βs to be best • To determine the above, I need to know the characteristics of definite matrices

  40. Our Classical Regression Model • Is βs BLUE (i.e., minimum variance compared to β*)? • a′(Σβ* - Σβs)a ≥ 0 ∀ a for βs to be best • I want to show that if this holds ∀ a, (Σβ* - Σβs) is positive semi-definite • The above is a characteristic of a positive semi-definite matrix (JHGLL, p. 960) • “A symmetric matrix D is positive semi-definite iff C′DC ≥ 0 ∀ C” • Let D be Σβ* - Σβs, which is symmetric • The above shows that βs has the “smallest” variance among all linear unbiased estimators of β • → βs is the Best Linear Unbiased Estimator of β, the true unknown parameter vector
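
The step from unbiasedness of β* to Σβ* - Σβs being positive semi-definite follows the standard textbook argument; a sketch (the matrix D below is introduced here for the illustration and is not notation from the slides):

• β* = CY = C(Xβ + e) → E(β*) = CXβ, so unbiasedness for every β requires CX = IK
• Define D ≡ C - (X′X)-1X′; then DX = CX - IK = 0
• Σβ* = E[(β* - β)(β* - β)′] = E[Cee′C′] = σ²CC′ = σ²[D + (X′X)-1X′][D + (X′X)-1X′]′ = σ²DD′ + σ²(X′X)-1, because DX = 0 eliminates the cross terms
• Hence Σβ* - Σβs = σ²DD′, and a′(σ²DD′)a = σ²(D′a)′(D′a) ≥ 0 ∀ a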

  41. Our Classical Regression Model • How well does the estimated equation explain the variance of the dependent variable? • Let’s first talk about the variation of the dependent variable: y′y = βs′X′Xβs + es′es, where y′y is the sum of squares of the yt’s, βs′X′Xβs is the part explained by the model, and es′es is the unexplained part (SSE); each term is (1 x 1)

  42. Our Classical Regression Model • Note that X′es = X′[(IT - X(X′X)-1X′)]e = 0, given that X′[(IT - X(X′X)-1X′)] = (X′ - X′) = 0 • Let’s use the above, but within the framework of deviations from the mean of Y (e.g., our naïve model), which gives the total sum of squares of the yt’s about their mean (TSS)

  43. Our Classical Regression Model • We can represent the total variation about the mean of the data via yt = ŷt + et → subtract the mean from both sides: yt - ȳ = (ŷt - ȳ) + et, where ŷt = xtβs is the estimated conditional mean and ȳ is the sample mean of y

  44. Our Classical Regression Model • [Figure: Yt plotted against Xt, illustrating the deviation-from-mean decomposition of the previous slide]

  45. Our Classical Regression Model • Total variation about the mean: Σt(yt - ȳ)² = Σt(ŷt - ȳ)² + Σt et² (total = explained + unexplained) • If our goal is to have an accurate prediction of Y, we would like the component explained by our exogenous variables, Σt(ŷt - ȳ)², to be large relative to the error component, Σt et² • A large unexplained/unpredictable component would mean our prediction could be “way off”

  46. Our Classical Regression Model • Note that TSS = RSS + SSE: • TSS (Total Sum of Squares): a measure of the total variation of Yt about its mean • RSS (Explained Sum of Squares): the portion of the total variation in Yt about the sample mean explained by the RHS variables, X • SSE (Error Sum of Squares): the portion of the total variation in Yt about the mean not explained by the RHS

  47. Our Classical Regression Model • In scalar notation, the above decomposition of deviations from the sample mean can be represented as: • TSS = Σt(yt - ȳ)² (Total Sum of Squares) • RSS = Σt(ŷt - ȳ)² (Explained Sum of Squares) • SSE = Σt(yt - ŷt)² = Σt et² (Error Sum of Squares)

  48. Our Classical Regression Model • How well does the estimated equation explain the variance of the dependent variable? • R² (Coefficient of Determination) is the proportion of total variation about the mean explained by the model: R² = RSS/TSS • But because TSS = RSS + SSE, R² = 1 - SSE/TSS • Note: the β’s that minimize the SSE → maximize the R² value

  49. Our Classical Regression Model • Calculation of R² (Greene: 31-38) • Use the above formulas or the following: • M0 ≡ [IT - (1/T)ii′] • i ≡ column vector of 1’s • the diagonals of M0 are (1 - (1/T)) • the off-diagonals of M0 are -(1/T) • M0 is a T x T idempotent matrix that transforms any variable into deviations from sample means • TSS = Y′M0Y = βs′X′M0Xβs + es′es, where βs′X′M0Xβs is the RSS (see the sketch below)
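
A short sketch of this M0 calculation (simulated data; assumes numpy): M0 demeans a vector, TSS = Y′M0Y splits into the RSS and SSE pieces, and R² = RSS/TSS agrees with 1 - SSE/TSS.

import numpy as np

rng = np.random.default_rng(7)
T = 40
X = np.column_stack([np.ones(T), rng.uniform(5.0, 30.0, size=T)])
Y = X @ np.array([7.0, 0.25]) + rng.normal(scale=6.0, size=T)

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b

i = np.ones((T, 1))
M0 = np.eye(T) - (1.0 / T) * (i @ i.T)   # demeaning matrix

TSS = Y @ M0 @ Y
RSS = b @ X.T @ M0 @ X @ b
SSE = e @ e
print(RSS / TSS, 1.0 - SSE / TSS)        # two equivalent R^2 computations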

  50. Our Classical Regression Model • 0 ≤ R² ≤ 1 (when an intercept is present) • R² = 0 → the regression is a horizontal line, all elements of β are zero except the constant term → the predicted value of yt equals the sample mean of y • R² = 1 → all residuals are 0, a perfect fit • R² will never decrease when another variable is added to the regression (Greene, p. 34)
