Chapter 4: Simple or Bivariate Regression
Terms
Dependent variable (LHS): the series we are trying to estimate
Independent variable (RHS): the data we are using to estimate the LHS
The line and the regression line: Y = f(X)… there is assumed to be a relationship between X and Y.
Because the line we are looking for is only an estimate of the population relationship, and not every observation falls on that estimated line, we have error (e).
This term (b1) can be interpreted as the rate of change in Y per unit change in X… just like the slope in a simple line equation.
Population (We don’t often have this data)
Sample (We usually have this)
$Y - \hat{Y} = e$ (a.k.a. the error or the residuals)
X’s have the same mean and St. Dev.
Y’s have the same mean and St. Dev.
From this we might conclude that the data sets are identical, but we’d be wrong.
Although they result in the same OLS regression, they are very different.
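The slide’s point matches Anscombe’s well-known quartet (Anscombe, 1973). Assuming the four data sets are Anscombe’s published values (the slide does not name its source), this Python sketch shows that all four share the same X and Y means and standard deviations and the same OLS line, roughly Y-hat = 3.00 + 0.500X. The helper name `ols` is our own, not from the course software.

```python
from statistics import mean, stdev

# Anscombe's quartet; assumed (not confirmed by the slide) to be the data shown.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def ols(x, y):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1*X."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

results = {}
for name, (x, y) in QUARTET.items():
    b0, b1 = ols(x, y)
    results[name] = (mean(x), stdev(x), mean(y), stdev(y), b0, b1)
    print(f"{name:>3}: mean X={mean(x):.2f} sd X={stdev(x):.2f} "
          f"mean Y={mean(y):.2f} sd Y={stdev(y):.2f}  Y-hat={b0:.2f} + {b1:.3f}X")
```

The summary numbers are nearly identical across the four sets; only a plot reveals that set II is curved, set III has one outlier, and set IV is driven by a single high-leverage point.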
Disposable Personal Income (DPI)
For this data set, there are 144 months.
The index (T) goes from 1 to 144.
Feb 1993: DPI2 = 4588.58 + 27.93(2) = 4644.44
Dec 2004: DPI144 = 4588.58 + 27.93(144) = 8610.50
Jan 2005: DPI145 = 4588.58 + 27.93(145) = 8638.43
And so on… To forecast, we just need the index (T) for the month.
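The forecasting step above is just arithmetic on the index. A minimal Python sketch, using the coefficients from the slide (4588.58 and 27.93) and assuming T = 1 is Jan 1993 (inferred from Feb 1993 being T = 2); the function names are hypothetical:

```python
# Trend equation from the slide: DPI_T = 4588.58 + 27.93 * T
B0, B1 = 4588.58, 27.93
START_YEAR = 1993  # assumption inferred from the slide: T = 2 is Feb 1993

def month_index(year, month):
    """Map a calendar month to the trend index T (Jan 1993 -> 1)."""
    return (year - START_YEAR) * 12 + month

def dpi_trend(t):
    """Trend forecast of DPI for index T."""
    return B0 + B1 * t

for label, (yr, mo) in [("Feb 1993", (1993, 2)), ("Dec 2004", (2004, 12)), ("Jan 2005", (2005, 1))]:
    t = month_index(yr, mo)
    print(f"{label}: T={t}, DPI={dpi_trend(t):.2f}")
```

Any month past the sample (T = 145, 146, …) is forecast the same way, which is why the trend model needs nothing but the month’s index.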
Hypothesis tests for slope = 0 and intercept = 0… What do they say?
Do we reject that the slope and intercept are each equal to 0?
Do Not Reject H0
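To make the test concrete, here is a small Python sketch of the usual OLS t-tests for H0: b0 = 0 and H0: b1 = 0 (t = coefficient / standard error, with n − 2 degrees of freedom). The data are hypothetical, and the critical value 3.182 (two-tailed 5%, 3 df) is taken from a standard t table; this is not ForecastX output.

```python
import math
from statistics import mean

def ols_t_stats(x, y):
    """OLS fit of y = b0 + b1*x; returns (b0, b1, t_b0, t_b1) for H0: coefficient = 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # residual variance with n - 2 degrees of freedom
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    return b0, b1, b0 / se_b0, b1 / se_b1

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, t0, t1 = ols_t_stats(x, y)
T_CRIT = 3.182  # two-tailed 5% critical t with n - 2 = 3 df (from a t table)
print(f"b0={b0:.3f} (t={t0:.2f}): {'reject' if abs(t0) > T_CRIT else 'do not reject'} H0")
print(f"b1={b1:.3f} (t={t1:.2f}): {'reject' if abs(t1) > T_CRIT else 'do not reject'} H0")
```

With these numbers the slope is clearly significant while the intercept falls in the "Do Not Reject H0" region.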
The large jumps in sales during the December months will make this series hard to estimate with a bivariate regression.
We will use both the unadjusted and the seasonally adjusted series to see the difference in model accuracy.
The models we are going to estimate are:
Let’s think of it a little differently…
Trend: R-sq = .9933
Causal (w/season): R-sq = .0845
Causal (w/o season): R-sq = .8641
Although the causal model explains less of the variance, we now have some evidence that sales are related to DPI.
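The R-sq values compared above all come from the same definition, R-sq = 1 − SSE/SST, the share of the variation in Y around its mean that the fitted values explain. A minimal Python sketch with hypothetical numbers (the function name is our own):

```python
from statistics import mean

def r_squared(y, y_hat):
    """R-sq = 1 - SSE/SST: share of the variance in Y explained by the model."""
    y_bar = mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)              # total variation around the mean
    return 1 - sse / sst

# Hypothetical actuals and fitted values
print(r_squared([3, 5, 7, 9], [3.2, 4.8, 7.1, 8.9]))
```

A lower R-sq for the causal model does not make it useless: the slope on DPI can still be significant even when much of the variance is left unexplained.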
Errors might not be independently distributed if we have serial correlation (or autocorrelation).
Ratio of sums of squares: $DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$
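The Durbin-Watson statistic is the ratio of two sums of squares over the residuals. A minimal Python sketch, assuming the residuals are already in time order; the function name and the example residuals are hypothetical:

```python
def durbin_watson(residuals):
    """DW = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1.0, 0.9, 0.8, 0.7]))  # slowly drifting errors: DW near 0
print(durbin_watson([1.0, -1.0] * 10))      # sign flips every period: DW near 4
```

Errors that drift together make successive differences small (DW toward 0), while errors that alternate in sign make them large (DW toward 4).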
The Durbin-Watson statistic ranges from 0 to 4.
Rule of thumb: if DW is near 2 (i.e., from 1.5 to 2.5), there is no evidence of serial correlation.
For a more precise evaluation, you have to calculate and compare five inequalities and determine which of the five is true.
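Lower (DWL) and upper (DWU) critical values of DW:
A: 4 > DW > (4 - DWL) → Negative serial correlation
B: (4 - DWL) > DW > (4 - DWU) → Indeterminate
C: (4 - DWU) > DW > DWU → No observed serial correlation
D: DWU > DW > DWL → Indeterminate
E: DWL > DW > 0 → Positive serial correlation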
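From the table: DWL = 1.41, DWU = 1.52; the computed DW is 0.21.
A: 4 > 0.21 > (4 - 1.41) False
B: (4 - 1.41) > 0.21 > (4 - 1.52) False
C: (4 - 1.52) > 0.21 > 1.52 False
D: 1.52 > 0.21 > 1.41 False
E: 1.41 > 0.21 > 0 True
Only E is true, so there is evidence of positive serial correlation.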
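The five-inequality check can be written as a small lookup. A hedged Python sketch (function name hypothetical); the slide uses strict inequalities, so assigning exact boundary values to one side is our own convenience assumption:

```python
def dw_region(dw, dwl, dwu):
    """Return which of the five DW inequalities (A-E) holds, with its conclusion."""
    if 4 - dwl < dw < 4:
        return "A", "negative serial correlation"
    if 4 - dwu < dw <= 4 - dwl:
        return "B", "indeterminate"
    if dwu < dw <= 4 - dwu:
        return "C", "no observed serial correlation"
    if dwl < dw <= dwu:
        return "D", "indeterminate"
    if 0 < dw <= dwl:
        return "E", "positive serial correlation"
    raise ValueError("DW must lie strictly between 0 and 4")

print(dw_region(0.21, dwl=1.41, dwu=1.52))  # prints ('E', 'positive serial correlation')
```

DWL and DWU must still come from a Durbin-Watson table for the right sample size and number of RHS variables.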
Each observation’s error is normally distributed around the estimated regression line.
Errors can be + or -, but they are grouped around the regression line.
A "+" error is just as likely as a "-" error, and they balance out.
Here is one specific type of non-constant variance: the mean is still 0, but the errors get larger as X gets larger.
This is referred to as heteroscedasticity.
Yes, you heard right, and it’s bad for inference.
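To see the pattern described above, this Python sketch simulates hypothetical errors with mean 0 whose standard deviation grows in proportion to X, then compares the spread of the errors for the smaller and larger X values. This is only a rough visual-style check, loosely in the spirit of the Goldfeld-Quandt idea, not a formal test; the 0.5 scale factor and the seed are arbitrary assumptions.

```python
import random
from statistics import pstdev

rng = random.Random(42)  # fixed seed so the simulation is repeatable
x = list(range(1, 201))
# Simulated errors: mean 0, but the standard deviation grows with X (hypothetical)
errors = [rng.gauss(0, 0.5 * xi) for xi in x]

half = len(x) // 2
low_spread = pstdev(errors[:half])   # errors for the smaller X values
high_spread = pstdev(errors[half:])  # errors for the larger X values
print(f"spread of errors, low X: {low_spread:.1f}; high X: {high_spread:.1f}")
```

The errors still balance around 0, yet their spread for large X is several times larger, which is exactly what undermines the usual standard errors.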
Looking at it from another angle, errors can be + or -, but their spread should be stable over time or X.
We will focus on these
(We will cover this model later, but for now…well, just think of it as magic!!!)
They are repeated down the column next to each month.
There is still a trend, but we aren’t worried about that right now.
Let’s now forecast seasonally adjusted housing sales as a function of time (a 12-month forecast).
The equation we are estimating is:
SATHS = b0 + b1(Time) + e
What sign do we expect for b1?
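Outside ForecastX, the same trend model can be sketched in a few lines of Python: regress the series on a time index t = 1..n, then extend the line 12 periods. The series below is hypothetical (not the SATHS data), and the function names are our own.

```python
from statistics import mean

def fit_trend(y):
    """OLS fit of y_t = b0 + b1*t with t = 1..n; returns (b0, b1)."""
    t = list(range(1, len(y) + 1))
    t_bar, y_bar = mean(t), mean(y)
    b1 = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
          / sum((ti - t_bar) ** 2 for ti in t))
    return y_bar - b1 * t_bar, b1

def forecast_trend(y, horizon=12):
    """Extend the fitted trend line `horizon` periods past the last observation."""
    b0, b1 = fit_trend(y)
    n = len(y)
    return [b0 + b1 * t for t in range(n + 1, n + horizon + 1)]

# Hypothetical seasonally adjusted series (not the SATHS data)
sa_sales = [50.0, 52.5, 51.0, 54.0, 55.5, 55.0, 57.5, 58.0, 60.5, 59.0, 62.0, 63.5]
print([round(f, 1) for f in forecast_trend(sa_sales)])
```

For a rising series like this one, the fitted b1 is positive, so each forecast month sits above the last.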
There are two ways to approach this in ForecastX:
Both provide essentially the same results.
Seasonally Adjusted Total Houses Sold and Time
The thing to note here is that the simple linear model is capturing some of the seasonal fluctuations… WOW!!!
Seasonally Adjusted Total Houses Sold and DPI
Serial or Autocorrelation
Remember, autocorrelation occurs when the errors of adjacent observations are correlated. This causes our estimated standard errors to be too small and our t-stats to be too big, messing up inference.
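"Adjacent observations are correlated" can be measured directly with the lag-1 autocorrelation of the residuals, r1. A hedged Python sketch with hypothetical residuals (assumed to have mean close to 0); it also prints the common large-sample approximation DW ≈ 2(1 − r1), which links this idea to the Durbin-Watson statistic:

```python
def lag1_autocorrelation(e):
    """r1 = sum_{t=2..T} e_t*e_{t-1} / sum_{t=1..T} e_t^2 (residuals assumed mean 0)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def durbin_watson(e):
    """DW = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

# Hypothetical residuals that stay positive, then negative, then positive
resid = [0.5, 0.8, 0.6, 0.9, 0.2, -0.3, -0.6, -0.4, -0.8, -0.2, 0.1, 0.4]
r1 = lag1_autocorrelation(resid)
print(f"r1 = {r1:.2f}, DW = {durbin_watson(resid):.2f}, 2(1 - r1) = {2 * (1 - r1):.2f}")
```

A strongly positive r1 corresponds to a DW well below 2, the positive serial correlation case.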
We generally aren’t concerned with AC or SC if we are just estimating the time trend (y=f(t)) in OLS.
Remember in Macroeconomics, a guy named Keynes made a couple of observations about aggregate consumption…
Marginal Propensity to Consume (MPC), which is the slope of the aggregate consumption function and a key factor in determining the "multiplier effect" of tax and spending policies.
If the relationship holds, then increasing C has what effect on GNP?
As C gets larger, GNP is expected to grow.
We can obtain an estimate of the MPC by applying OLS to the following aggregate consumption function:
GC = b0 + b1(GNP) + e
where the slope, b1, is the estimate of the MPC.
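A minimal Python sketch of this regression, using hypothetical GNP and GC figures (not the textbook data). The slope is the estimated MPC; the last line applies the simple Keynesian spending multiplier, 1 / (1 − MPC), to connect the estimate to the "multiplier effect" mentioned above.

```python
from statistics import mean

def ols(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

# Hypothetical GNP and consumption (GC) figures, illustration only
gnp = [1000, 1100, 1250, 1300, 1450, 1600]
gc = [850, 930, 1045, 1090, 1205, 1330]
b0, mpc = ols(gnp, gc)
multiplier = 1 / (1 - mpc)  # simple Keynesian spending multiplier
print(f"GC = {b0:.1f} + {mpc:.3f} GNP, implied multiplier = {multiplier:.2f}")
```

An MPC near 0.8 implies a multiplier near 5, which is why an accurate slope estimate matters for policy analysis.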
These look pretty good!
...maybe too good.
DW indicates serial correlation.
Note: If a time series has seasonality, the autocorrelation is usually highest between like seasons in different years.
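A quick way to see this note in action: compute the sample autocorrelation at lags 1 through 12 for a monthly series. This Python sketch uses a hypothetical series made of 10 years of a pure 12-month cycle; the helper name is our own.

```python
import math

def autocorrelation(y, k):
    """Sample autocorrelation at lag k: sum (y_t - ybar)(y_{t-k} - ybar) / sum (y_t - ybar)^2."""
    y_bar = sum(y) / len(y)
    dev = [v - y_bar for v in y]
    num = sum(dev[t] * dev[t - k] for t in range(k, len(y)))
    return num / sum(d * d for d in dev)

# Hypothetical monthly series: 10 years of a pure 12-month seasonal cycle
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
acf = {k: autocorrelation(series, k) for k in range(1, 13)}
best_lag = max(acf, key=acf.get)
print(f"highest autocorrelation at lag {best_lag}: {acf[best_lag]:.2f}")
```

The largest value is at lag 12 (the same month one year earlier), while lag 6 is strongly negative because those months sit at opposite points of the cycle.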
What method can we use to potentially fix a non-stationary series?…think back
Just to note: Holt’s and Winters’ models allow for trend, but not for RHS variables, so we can’t use them directly to find the MPC.
Differencing didn’t completely take care of it in this series…
but we use it anyway.
There is still a positive relationship in the differenced data, but it has more error and it’s weaker.
Let’s now estimate the differenced model.
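The differenced model regresses the period-to-period change in GC on the change in GNP. A hedged Python sketch with hypothetical levels (not the textbook data); `first_difference` and `ols` are illustrative helpers, and differencing loses the first observation.

```python
from statistics import mean

def first_difference(series):
    """Return [y_2 - y_1, y_3 - y_2, ...]: period-to-period changes."""
    return [b - a for a, b in zip(series, series[1:])]

def ols(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

# Hypothetical levels (not the textbook GC/GNP data)
gnp = [1000, 1040, 1095, 1130, 1190, 1240, 1275, 1330]
gc = [820, 850, 896, 921, 970, 1008, 1033, 1078]
d_gnp, d_gc = first_difference(gnp), first_difference(gc)
b0, b1 = ols(d_gnp, d_gc)
print(f"change in GC = {b0:.2f} + {b1:.3f} x (change in GNP)")
```

The slope on the differenced data is still read as the MPC, but the shared upward trend in the levels no longer inflates the fit.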
These are the more