
Time Series Analysis – Chapter 2 Simple Regression








  1. Time Series Analysis – Chapter 2 Simple Regression. Essentially, all models are wrong, but some are useful. - George Box, Empirical Model-Building and Response Surfaces (1987), co-authored with Norman R. Draper, p. 424, ISBN 0471810339. George Box is the son-in-law of Sir Ronald Fisher.

  2. Time Series Analysis – Chapter 2 Simple Regression: Equation of a Line (Algebra) vs. Simple Regression (Statistics)

  3. Equation of a Line Example: y = mx + b; wage = 3.55·educ – 33.8, where y = wage in dollars per hour and x = education in years completed. Note: if I know how many years of education someone has completed, I can predict their wage perfectly. Nothing else matters.
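  A quick worked example of that deterministic claim: with educ = 12 years, wage = 3.55(12) – 33.8 = 42.60 – 33.8 = $8.80 per hour, exactly, with nothing left over.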

  4. Simple Regression Example: wage = β₀ + β₁·educ + u, where y = wage per hour ($) – dependent variable; x = education completed (years) – independent variable; β₀ = unknown intercept; β₁ = unknown slope; u = error term – factors other than x that affect y.

  5. Simple Regression Example: Need to estimate β₀ and β₁. Collect data, then conduct a “regression analysis”.

  6. Algebra vs. Statistics - Summary. Algebra: wage = 3.55·educ – 33.8, a deterministic model. Statistics: wage = β₀ + β₁·educ + u, a stochastic model.
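  To make the deterministic/stochastic contrast concrete, here is a minimal Python sketch (not part of the original slides; the noise scale is an arbitrary assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
educ = np.array([10, 12, 14, 16, 18])

# Deterministic model: educ pins down wage exactly, every run
wage_det = 3.55 * educ - 33.8

# Stochastic model: an error term u captures all other factors;
# the scale 2.0 is an arbitrary choice for illustration
u = rng.normal(loc=0.0, scale=2.0, size=educ.size)
wage_stoch = 3.55 * educ - 33.8 + u

print(wage_det)    # same output every run
print(wage_stoch)  # varies with the draw of u
```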

  7. Algebra vs. Statistics - Summary. All factors affecting y (wage) other than x (education) are treated as unobservable. The error term u represents the combined effect of these other factors. Key assumption: u is independent of x.

  8. E(y|x) = β₀ + β₁x if E(u|x) = 0.

  9. E(y|x) = β₀ + β₁x if E(u|x) = 0. The equation tells us how the “average” value of y changes with, or is related to, a particular x value.

  Student  GPA  ACT
  1        2.8  21
  2        3.4  24
  3        3.0  26
  4        3.5  27
  5        3.6  29
  6        3.0  25
  7        2.7  25
  8        3.7  30

  Fitted line: GPA-hat = 0.568 + 0.102 ACT
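  A minimal numpy sketch (not part of the slides) that reproduces the fitted line from these eight (GPA, ACT) pairs using the usual OLS formulas:

```python
import numpy as np

act = np.array([21, 24, 26, 27, 29, 25, 25, 30])
gpa = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

# OLS slope: sum of cross-deviations over sum of squared x-deviations
b1 = np.sum((act - act.mean()) * (gpa - gpa.mean())) / np.sum((act - act.mean()) ** 2)
b0 = gpa.mean() - b1 * act.mean()
print(f"GPA-hat = {b0:.3f} + {b1:.3f} ACT")  # GPA-hat = 0.568 + 0.102 ACT
```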

  10. [Scatterplot of GPA vs. ACT with the fitted line GPA-hat = 0.568 + 0.102 ACT]

  11. The Analysis of Variance Table

  Analysis of Variance
  Source          DF   SS       MS       F     P
  Regression       1   0.59402  0.59402  8.20  0.029
  Residual Error   6   0.43473  0.07245
  Total            7   1.02875

  12. ANOVA: Models can be evaluated by examining variability. Three types of variability are quantified: the overall or total variability present in the data (SST), the variability explained by the regression model (SSR), and the error variability that is unexplained (SSE). SST = SSR + SSE.

  13. ANOVA: The larger the regression variability (SSR) is compared to the error variability (SSE), the more evidence there is that the model is explanatory.

  Analysis of Variance
  Source          DF   SS       MS       F     P
  Regression       1   0.59402  0.59402  8.20  0.029
  Residual Error   6   0.43473  0.07245
  Total            7   1.02875

  14. ANOVA – R²: R² is the Coefficient of Determination. R² = SSR/SST = 1 – SSE/SST (TYPO on pg. 40!!). R² is the percent of the variation in y (response variable) explained by x (explanatory variable). Here R-Sq = SSR/SST = 0.59402/1.02875 = 57.7%.

  15. ANOVA – r: r is the correlation coefficient, r = ±√R²: positive if a positive relationship is present, negative if a negative relationship is present. Here r = +√0.577 = 0.7596.
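  The ANOVA decomposition, R², and r can all be checked from the same eight observations; a sketch (again numpy, not from the slides):

```python
import numpy as np

act = np.array([21, 24, 26, 27, 29, 25, 25, 30])
gpa = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

b1 = np.sum((act - act.mean()) * (gpa - gpa.mean())) / np.sum((act - act.mean()) ** 2)
b0 = gpa.mean() - b1 * act.mean()
fitted = b0 + b1 * act

sst = np.sum((gpa - gpa.mean()) ** 2)     # total variability: 1.02875
ssr = np.sum((fitted - gpa.mean()) ** 2)  # explained: 0.59402
sse = np.sum((gpa - fitted) ** 2)         # unexplained: 0.43473
print(np.isclose(sst, ssr + sse))         # True: SST = SSR + SSE

r2 = ssr / sst
r = np.sign(b1) * np.sqrt(r2)  # r carries the sign of the slope
print(r2, r)                   # ~0.577 and ~0.760
```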

  16. ANOVA – R² vs. r: R² exists for both simple and multiple regression and always has the same definition; r only exists and makes sense for simple regression.

  17. Nobel Prize vs. # of McDonalds: The explanatory variable is the number of McDonalds a country has; the response variable is the number of Nobel Prizes that have been awarded to that country.

  18. Logs

  19. Level – Level Model: Dependent variable: y; independent variable: x.

  Student  GPA  ACT
  1        2.8  21
  2        3.4  24
  3        3.0  26
  4        3.5  27
  5        3.6  29
  6        3.0  25
  7        2.7  25
  8        3.7  30

  GPA-hat = 0.568 + 0.102 ACT (verify) (interpret)

  20. Level – Log Model: Dependent variable: y; independent variable: log(x). Not used in this chapter; discussed in future chapters.

  21. Log – Level Model: Dependent variable: log(y); independent variable: x.

  Student  GPA  ACT  log(GPA)
  1        2.8  21   1.02962
  2        3.4  24   1.22378
  3        3.0  26   1.09861
  4        3.5  27   1.25276
  5        3.6  29   1.28093
  6        3.0  25   1.09861
  7        2.7  25   0.99325
  8        3.7  30   1.30833

  log(GPA)-hat = 0.341 + 0.0317 ACT (verify)

  22. Log – Level Model: Dependent variable: log(y); independent variable: x. log(GPA)-hat = 0.341 + 0.0317 ACT (see Appendix A). So, for every one-point increase in ACT score, predicted GPA should increase by about 3.17%.
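  A sketch verifying the log-level fit (numpy, not from the slides; log is the natural log, matching the table):

```python
import numpy as np

act = np.array([21, 24, 26, 27, 29, 25, 25, 30])
gpa = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])
log_gpa = np.log(gpa)  # natural log, as in the slide's log(GPA) column

b1 = np.sum((act - act.mean()) * (log_gpa - log_gpa.mean())) / np.sum((act - act.mean()) ** 2)
b0 = log_gpa.mean() - b1 * act.mean()
print(f"log(GPA)-hat = {b0:.3f} + {b1:.4f} ACT")  # 0.341 + 0.0317 ACT
# Since log(y) rises by b1 per ACT point, GPA rises by roughly 100*b1 = 3.17%
```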

  23. Level – Level Model: GPA-hat = 0.568 + 0.102 ACT

  24. Log – Level Model: log(GPA)-hat = 0.341 + 0.0317 ACT, so GPA-hat = e^(0.341 + 0.0317 ACT).

  25. Is this still linear regression? log(GPA)-hat = β̂₀ + β̂₁ ACT, and this equation is linear in the parameters β₀ and β₁!

  26. Log – Log Model: Dependent variable: log(y); independent variable: log(x).

  Student  GPA  ACT  log(GPA)  log(ACT)
  1        2.8  21   1.02962   3.04452
  2        3.4  24   1.22378   3.17805
  3        3.0  26   1.09861   3.25810
  4        3.5  27   1.25276   3.29584
  5        3.6  29   1.28093   3.36730
  6        3.0  25   1.09861   3.21888
  7        2.7  25   0.99325   3.21888
  8        3.7  30   1.30833   3.40120

  log(GPA)-hat = -1.41 + 0.791 log(ACT) (verify)

  27. Log – Log or Constant Elasticity Model: Dependent variable: log(y); independent variable: log(x). log(GPA)-hat = -1.41 + 0.791 log(ACT) (see Appendix A).

  28. Log – Log or Constant Elasticity Model: log(GPA)-hat = -1.41 + 0.791 log(ACT) (see Appendix A). The slope 0.791 is the estimated elasticity of GPA with respect to ACT: a 1% increase in ACT implies a 0.791% increase in GPA.
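  And the log-log (constant elasticity) fit can be verified under the same conventions:

```python
import numpy as np

act = np.array([21, 24, 26, 27, 29, 25, 25, 30])
gpa = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])
x, y = np.log(act), np.log(gpa)  # natural logs, as in the slide's table

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"log(GPA)-hat = {b0:.2f} + {b1:.3f} log(ACT)")  # -1.41 + 0.791 log(ACT)
# b1 is the elasticity: a 1% increase in ACT -> about a 0.791% increase in GPA
```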

  29. Simple Linear Regression Assumptions. SLR.1: The model to be estimated must be linear in the parameters β₀ and β₁.

  30. Simple Linear Regression Assumptions SLR.2: The sample of size n used to estimate the model parameters is a random sample (sometimes called a simple random sample). What is the definition of a random sample?

  31. Simple Linear Regression Assumptions. SLR.3: The sample x values are not all the same value (see the sketch below for why this matters).

  NOT OK          OK
  x    y          x    y
  3    24         3.4  24
  3    26         3.0  26
  3    27         3.5  27
  3    29         3.6  29
  3    25         3.0  25
  3    25         2.7  25
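  Why SLR.3 matters is visible directly in the OLS slope formula: its denominator is Σ(xᵢ - x̄)², which is zero when all x values coincide. A two-line check:

```python
import numpy as np

x_bad = np.array([3, 3, 3, 3, 3, 3])        # every x identical: violates SLR.3
print(np.sum((x_bad - x_bad.mean()) ** 2))  # 0.0 -- the slope formula divides by this
```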

  32. Simple Linear Regression Assumptions. SLR.4: The error variable u has an expected value of zero given any value of the explanatory variable x: E(u|x) = 0.

  33. Simple Linear Regression Assumptions. SLR.5: The error term u has the same variance (variability) given any value of the explanatory variable; in other words, Var(u|x) = σ². This is called homoskedasticity.

  34. Ordinary Least Squares Estimators: How do we estimate the parameters in the model y = β₀ + β₁x + u? Ordinary Least Squares gives unique estimates of β₀ and β₁. Recall that the mean of u is zero, so we don't need to estimate it.

  35. Ordinary Least Squares: Minimize the sum of the squared residuals, Σᵢ ûᵢ² = Σᵢ (yᵢ - β̂₀ - β̂₁xᵢ)².

  36. Ordinary Least Squares: Definition of residual: ûᵢ = yᵢ - ŷᵢ. Some are positive, some negative, and they sum to zero: Σᵢ ûᵢ = 0.

  Student  GPA  ACT  RESI1
  1        2.8  21    0.085714
  2        3.4  24    0.379121
  3        3.0  26   -0.225275
  4        3.5  27    0.172527
  5        3.6  29    0.068132
  6        3.0  25   -0.123077
  7        2.7  25   -0.423077
  8        3.7  30    0.065934

  37. Ordinary Least Squares: Minimize Σᵢ (yᵢ - b₀ - b₁xᵢ)² over b₀ and b₁. The solutions are β̂₁ = Σᵢ (xᵢ - x̄)(yᵢ - ȳ) / Σᵢ (xᵢ - x̄)² and β̂₀ = ȳ - β̂₁x̄ (see notes in class).
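  A sketch applying these formulas to the GPA/ACT data and confirming the residual facts from slide 36 (numpy, not from the slides):

```python
import numpy as np

act = np.array([21, 24, 26, 27, 29, 25, 25, 30])
gpa = np.array([2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7])

# Closed-form minimizers of the sum of squared residuals
b1 = np.sum((act - act.mean()) * (gpa - gpa.mean())) / np.sum((act - act.mean()) ** 2)
b0 = gpa.mean() - b1 * act.mean()

resid = gpa - (b0 + b1 * act)
print(np.round(resid, 6))      # matches the RESI1 column on slide 36
print(round(resid.sum(), 10))  # 0.0: OLS residuals always sum to zero
```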
