Multiple Regression Applications III


Multiple Regression Applications III. Lecture 18. Dummy variables. Include qualitative indicators into the regression: e.g. gender, race, regime shifts. So far, have only seen the change in the intercept for the regression line.


PowerPoint Slideshow about ' Multiple Regression Applications III' - aisha

Dummy variables
  • Include qualitative indicators into the regression: e.g. gender, race, regime shifts.
  • So far, have only seen the change in the intercept for the regression line.
  • Suppose now we wish to investigate if the slope changes as well as the intercept.
  • This can be written as a general equation:

Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) + b5(Di*Marriedi) + ei

  • Suppose first we wish to test for the difference between males and females.
Interactive terms
  • For females and males separately, the model would be:

Wi = a + b1Agei + b2Marriedi + ei

    • in so doing we argue that the coefficients a, b1 and b2 would be different for males and females
    • we want to think about two sub-sample groups: males and females
    • we can test the hypothesis that the intercept and partial slope coefficients will be different for these 2 groups
Interactive terms (2)
  • To test our hypothesis we’ll estimate the regression equation above (Wi = a + b1Agei + b2Marriedi + e) for the whole sample and then for the two sub-sample groups
  • We test to see if our estimated coefficients are the same between males and females
  • Our null hypothesis is:

H0 : aM = aF, b1M = b1F, b2M = b2F

Interactive terms (3)
  • We have an unrestricted form and a restricted form
    • unrestricted: used when we estimate for the sub-sample groups separately
    • restricted: used when we estimate for the whole sample
  • What type of statistic will we use to carry out this test?
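    • F-statistic:

F = [(SSR* - SSR) / q] / [SSR / (n - 2k)]

where SSR* is the sum of squared residuals from the restricted form and SSR is the sum of squared residuals from the unrestricted form

q = k, the number of parameters in the model

n = n1 + n2 where n is the complete sample size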

Interactive terms (4)
  • The sum of squared residuals for the unrestricted form will be the sum of those from the two sub-sample regressions:

SSR(unrestricted) = SSR(males) + SSR(females)

  • L17_2.xls
    • the data is sorted according to the dummy variable “female”
    • there is a second dummy variable for marital status
    • there are 3 estimated regression equations, one each for the total sample, male sub-sample, and female sub-sample
Interactive terms (5)
  • The output allows us to gather the necessary sum of squared residuals and sample sizes to construct the test statistic:
  • Since F0.05,3,27 = 2.96 > F*, we cannot reject the null hypothesis that the intercept and partial slope coefficients are the same for males and females
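The arithmetic behind this test can be sketched in a few lines of Python. This is only an illustration: the function name chow_f and the sums of squared residuals below are hypothetical (they are not the values in L17_2.xls), and n = 33 with k = 3 is an assumption chosen to match the (3, 27) degrees of freedom quoted above.

```python
def chow_f(ssr_restricted, ssr_male, ssr_female, n, k):
    """Chow F-statistic for equality of all k coefficients across two groups."""
    ssr_unrestricted = ssr_male + ssr_female  # separate sub-sample fits
    q = k                                     # number of restrictions
    df = n - 2 * k                            # unrestricted degrees of freedom
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df)


# Hypothetical sums of squared residuals, NOT taken from L17_2.xls.
f_star = chow_f(ssr_restricted=120.0, ssr_male=55.0, ssr_female=45.0, n=33, k=3)
critical = 2.96  # 5% critical value for (3, 27) degrees of freedom, as quoted in the slide
decision = "reject H0" if f_star > critical else "cannot reject H0"
print(round(f_star, 3), decision)
```

With these made-up inputs the statistic falls below the critical value, so the decision mirrors the one on the slide.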
Interactive terms (6)
  • What if F* > F0.05,3, 27 ? How to read the results?
    • There’s a difference between the two sub-samples and therefore we should estimate the wage equations separately
    • Or we could interact the dummy variables with the other variables
  • To interact the dummy variables with the age and marital status variables, we multiply the dummy variable by the age and marital status variables to get:

Wi = a + b1Agei + b2Marriedi + b3Di + b4(Di*Agei) + b5(Di*Marriedi) + ei

Interactive terms (7)
  • Using L17_2.xls you can construct the interactive terms by multiplying the FEMALE column by the AGE and MARRIED columns
    • one way to see if the two sub-samples are different is to look at the t-ratios on the interactive terms
    • in this example, neither of the t-ratios is statistically significant, so we can't reject the null hypothesis that the slope coefficients are the same for the two groups
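Constructing the interactive terms is a column-by-column multiplication. A minimal Python sketch, assuming a table with the FEMALE, AGE and MARRIED columns named above; the four rows are made up for illustration, and the new column names FEMALE_AGE and FEMALE_MARRIED are hypothetical.

```python
# Hypothetical rows in the layout described for L17_2.xls; the values are made up.
rows = [
    {"AGE": 25, "MARRIED": 0, "FEMALE": 0},
    {"AGE": 40, "MARRIED": 1, "FEMALE": 0},
    {"AGE": 31, "MARRIED": 1, "FEMALE": 1},
    {"AGE": 52, "MARRIED": 0, "FEMALE": 1},
]

for row in rows:
    # Each interactive term is the dummy times the corresponding regressor,
    # so it is zero for males and equals the regressor for females.
    row["FEMALE_AGE"] = row["FEMALE"] * row["AGE"]
    row["FEMALE_MARRIED"] = row["FEMALE"] * row["MARRIED"]

for row in rows:
    print(row)
```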
Interactive terms (8)
  • If we want to estimate the equation for the first sub-sample (males) we take the expectation of the wage equation where the dummy variable for female takes the value of zero:

E(Wi|Di = 0) = a + b1Agei + b2Marriedi

  • We can do the same for the second sub-sample (females)

E(Wi|Di = 1) = (a + b3) + (b1 + b4)Agei + (b2 + b5)Marriedi

  • We can see that by using only one regression equation, we have allowed the intercept and partial slope coefficients to vary by sub-sample
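To see how one set of pooled estimates yields two equations, the sketch below maps the six coefficients of the interacted model to group-specific intercepts and slopes. The function name group_equations and the coefficient values are hypothetical, chosen only to make the arithmetic visible.

```python
def group_equations(a, b1, b2, b3, b4, b5):
    """Split the pooled interacted model into male (D = 0) and female (D = 1) equations."""
    male = {"intercept": a, "age": b1, "married": b2}
    female = {"intercept": a + b3, "age": b1 + b4, "married": b2 + b5}
    return male, female


# Made-up coefficient values, for illustration only.
male, female = group_equations(a=5.0, b1=0.2, b2=1.0, b3=-1.5, b4=-0.05, b5=0.5)
print("males:  ", male)
print("females:", female)
```

If b3, b4 and b5 were all zero, the two dictionaries would coincide, which is the restricted model.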
Phillips Curve example
  • Phillips curve as an example of a regime shift.
  • Data points from 1950 - 1970: There is a downward sloping, reciprocal relationship between wage inflation and unemployment



Phillips Curve example (2)
  • But if we look at data points from 1971 - 1996:
  • From the data we can detect an upward sloping relationship
  • ALWAYS graph the data between the 2 main variables of interest



Phillips Curve example (3)
  • There seems to be a regime shift between the two periods
    • note: this is an arbitrary choice of regime shift - it was not dictated by a specific change
  • We will use the Chow Test (F-test) to test for this regime shift
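    • the test will use a restricted form, with Wt as wage inflation and Ut as unemployment: Wt = a + b2(1/Ut) + et
    • it will also use an unrestricted form: Wt = a + b1Dt + b2(1/Ut) + b3Dt(1/Ut) + et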
    • D is the dummy variable for the regime shift, equal to 0 for 1950-1970 and 1 for 1971-1996
Phillips Curve example (4)
  • L17_3.xls estimates the restricted regression equations and calculates the F-statistic for the Chow Test:
  • The null hypothesis will be:

H0 : b1 = b3 = 0

    • we are testing to see if the dummy variable for the regime shift alters the intercept or the slope coefficient
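  • The F-statistic is (* indicates restricted)

F = [(SSR* - SSR) / q] / [SSR / (n - 4)]

where q = 2, SSR is the sum of squared residuals, and n - 4 is the degrees of freedom of the unrestricted form, which has four parameters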
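The whole test can be run end to end on a small example. The sketch below is a minimal illustration, not the spreadsheet's calculation: the series are synthetic (they are NOT the L17_3.xls data), the reciprocal specification in 1/U is an assumption based on the relationship described earlier, and ols_ssr is a hypothetical helper that solves the normal equations in plain Python.

```python
def ols_ssr(X, y):
    """Fit y = Xb by OLS (normal equations) and return the sum of squared residuals."""
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(X, y))


# Synthetic series for illustration only.
years = list(range(1950, 1997))
D = [1 if yr >= 1971 else 0 for yr in years]            # regime dummy
inv_u = [1 / (3 + t % 7) for t in range(len(years))]    # 1/U, with U cycling from 3 to 9
noise = [0.3 * (-1) ** t for t in range(len(years))]    # deterministic wiggle
w = [(8 - 10 * x if d else 1 + 12 * x) + e for d, x, e in zip(D, inv_u, noise)]

X_restricted = [[1, x] for x in inv_u]
X_unrestricted = [[1, d, x, d * x] for d, x in zip(D, inv_u)]

ssr_r = ols_ssr(X_restricted, w)    # SSR*: one line for the whole period
ssr_u = ols_ssr(X_unrestricted, w)  # SSR: intercept and slope may shift in 1971
q, n, k = 2, len(years), 4
f_star = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(f"SSR* = {ssr_r:.2f}, SSR = {ssr_u:.2f}, F* = {f_star:.1f}")
```

Because the synthetic series is built with a different intercept and slope in each regime, the restricted fit is much worse and F* is large. With real data the statistic would be compared against the tabulated F critical value for (2, n - 4) degrees of freedom.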

Phillips Curve example (5)
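  • The expectation of wage inflation for the first time period (D = 0), where b2 is the coefficient on the reciprocal of unemployment 1/Ut: E(Wt|Dt = 0) = a + b2(1/Ut)
  • The expectation of wage inflation for the second time period (D = 1), where b1 and b3 are the coefficients on the dummy and on its interaction with 1/Ut: E(Wt|Dt = 1) = (a + b1) + (b2 + b3)(1/Ut)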
  • You can use the spreadsheet data to carry out these calculations
Today’s Plan
  • A review of what we have learned in regression so far and a look forward to what will happen when we relax assumptions around the regression line
  • Introduction to new concepts:
    • Heteroskedasticity
    • Serial correlation (also known as autocorrelation)
    • Non-independence of independent variables
CLRM Revision
  • Calculating the linear regression model (using OLS)
  • Use of the sum of squared residuals: calculate the variance for the regression line and the mean squared deviation
  • Hypothesis tests: t-tests, F-tests, χ² test.
  • Coefficient of determination (R²) and the adjustment.
  • Modeling: use of log-linear, logs, reciprocal.
  • Relationship between F and R²
  • Imposing linear restrictions: e.g. H0: b2 = b3 = 0 (q = 2); H0: a + b = 1.
  • Dummy variables and interactions; Chow test.
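As a reminder of the relationship between F and R² listed above, the overall-significance F-statistic can be computed directly from R². A minimal sketch, assuming k counts every estimated parameter including the intercept; the function name f_from_r2 and the inputs are hypothetical.

```python
def f_from_r2(r2, n, k):
    """Overall-significance F from R^2 (k parameters including the intercept, n observations)."""
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Made-up inputs for illustration.
f_overall = f_from_r2(r2=0.5, n=33, k=3)
print(round(f_overall, 2))
```

A higher R² for the same n and k always gives a larger F, which is why the two measures lead to the same conclusion about overall fit.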
Relaxing assumptions
  • What are the assumptions we have used throughout?
  • Two assumptions about the population for the bi-variate case: 1. E(Y|X) = a + bX (the conditional expectation function is linear); 2. V(Y|X) = σ² (conditional variances are constant)
  • Assumptions concerning the sampling procedure (i = 1..n): 1. Values of Xi (not all equal) are prespecified; 2. Yi is drawn from the subpopulation having X = Xi; 3. The Yi are independent.
  • Consequences are: 1. E(Yi) = a + bXi; 2. V(Yi) = σ²; 3. C(Yh, Yi) = 0
    • How can we test to see if these assumptions don’t hold?
    • What can we do if the assumptions don’t hold?
  • We would like our estimates to be BLUE
  • We need to look out for three potential violations of the CLRM assumptions: heteroskedasticity, autocorrelation, and non-independence of X (or simultaneity bias).
  • Heteroskedasticity: usually found in cross-section data (and longitudinal)
  • In earlier lectures, we saw that the variance of Yi is V(Yi) = σ²
  • This is an example of homoskedasticity, where the variance is constant
Homoskedasticity (2)
  • Homoskedasticity can be illustrated like this:

[Figure: constant variance around the regression line]

  • But, we don’t always have constant variance σ²
    • We may have a variance that varies with each observation, or
  • When there is heteroskedasticity, the variance around the regression line varies with the values of X
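A small simulation makes the idea concrete. This sketch assumes, purely for illustration, that the standard deviation of the error is proportional to X. It then compares the sample variance of the errors for small and large X. All names and numbers are hypothetical.

```python
import random

random.seed(0)
n = 2000
x = [1 + 9 * i / (n - 1) for i in range(n)]     # X runs from 1 to 10
e = [random.gauss(0, 0.5 * xi) for xi in x]     # error sd grows with X


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


low = variance(e[: n // 2])    # errors where X is small
high = variance(e[n // 2:])    # errors where X is large
print(f"variance for small X: {low:.2f}, variance for large X: {high:.2f}")
```

Under homoskedasticity the two numbers would be roughly equal. Here the spread around the line is visibly larger at high X.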
Heteroskedasticity (2)
  • The non-constant variance around the regression line can be drawn like this:






Serial (auto) correlation
  • Serial correlation can be found in time series data (and longitudinal data)
  • Under serial correlation, we have covariance terms
    • where Yi and Yh are correlated or each Yi is not independently drawn
    • This results in nonzero covariance terms
Serial (auto) correlation (2)
  • Example: We can think of this using time series data such that unemployment at time t is related to unemployment in the previous time period t-1
  • If we have a model with unemployment as the dependent variable Yt then
    • Yt and Yt-1 are related
    • et and et-1 are also related
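The link between et and et-1 can be illustrated with simulated errors. The sketch below assumes a first-order autoregressive process, et = rho * et-1 + ut, which is one common way to model serial correlation and is an assumption here rather than something specified in the slides. It then estimates the correlation between neighbouring errors.

```python
import random

random.seed(1)
rho, n = 0.8, 5000          # hypothetical persistence and sample length
e = [0.0]
for _ in range(n - 1):
    # Each error carries over a share rho of the previous one plus fresh noise.
    e.append(rho * e[-1] + random.gauss(0, 1))

mean = sum(e) / n
num = sum((e[t] - mean) * (e[t - 1] - mean) for t in range(1, n))
den = sum((v - mean) ** 2 for v in e)
r1 = num / den              # sample lag-1 autocorrelation
print(f"lag-1 autocorrelation: {r1:.2f}")
```

With independent errors this correlation would be close to zero. Here it is close to the chosen rho, which is what a nonzero covariance term looks like in data.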
Non-independence
  • The non-independence of independent variables is the third violation of the ordinary least squares assumptions
  • Remember from the OLS derivation that we minimized the sum of the squared residuals
    • we needed independence between the X variable and the error term
    • if not, the values of X are not pre-specified
    • without independence, the estimates are biased
  • Heteroskedasticity and serial correlation
    • make the estimates inefficient
    • and therefore make the estimated standard errors incorrect
  • Non-independence of independent variables
    • makes estimates biased
    • instrumental variables and simultaneous equations are used to deal with this third type of violation
  • Starting next lecture we’ll take a more in-depth look at the three violations of the CLRM assumptions
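The bias from non-independence can be seen in a short simulation. This sketch assumes a deliberately simple setup in which X contains the error term itself, so X and the error are correlated by construction. The true slope is 2, and all names and numbers are hypothetical.

```python
import random

random.seed(2)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]
e = [random.gauss(0, 1) for _ in range(n)]
x = [zi + ei for zi, ei in zip(z, e)]           # X is correlated with the error term
y = [1 + 2 * xi + ei for xi, ei in zip(x, e)]   # true intercept 1, true slope 2

mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))     # bivariate OLS slope
print(f"OLS slope: {slope:.2f} (true slope is 2)")
```

In this setup the OLS slope converges to 2 + Cov(X, e)/V(X) = 2.5 rather than 2, and a larger sample does not remove the gap.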