Econometric analysis

Week 10

Pooled and panel data models


Lecture outline

  • Basics – data sets with both cross-section and time dimensions

  • Pooled regressions and the use of time period dummies

  • A note on Chow tests versus interactive dummies in the presence of heteroskedasticity

  • Panel data and longitudinal data sets

  • Attractive features of panel data sets

  • Fixed effects models – differencing and demeaning

  • Random effects models

  • An illustrative example



References and recommended reading

  • Wooldridge, J M (2006) Introductory Econometrics: A Modern Approach (Third Edition), Chapters 13 and 14

  • Kennedy, P (2003) A Guide to Econometrics (Fifth Edition), Chapter 17

  • Dougherty, C (2007) Introduction to Econometrics (Third Edition), Chapter 14

  • Gujarati, D N (2003) Basic Econometrics (Fourth Edition), Chapter 16



Basics

  • Many data sets have both a cross-section and a time dimension.

  • Two subscripts are then required for the variables:

    $x_{it}$, where $i = 1, \dots, n$ and $t = 1, \dots, T$

  • If n is large but T is quite small (say only 2 or 3), we may decide simply to apply cross-section methods, using intercept (and possibly even slope) dummies to distinguish observations from different time periods. Alternatively, we might be able just to pool the data.


Pooled regressions with time period dummies

  • If the time period dummies are insignificant then the data from the different periods can be brought together to form one pooled data set.

  • We might also use Chow tests to check the validity of pooling the data (effectively a test of structural change).

  • For example

    $y_{it} = \beta_0 + \beta_1 x_{it} + u_{it}$, $i = 1, \dots, n$; $t = 1, 2$

  • If the data are pooled then we are effectively imposing the restriction that $\beta_0$ and $\beta_1$ remain unchanged between periods 1 and 2.
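The pooling check described above can be sketched in Python as follows. This is a minimal illustration on simulated data, not the lecture's own example: the sample size, the true coefficient values and the helper name `ols` are all assumptions made for the sketch. It fits a pooled regression with a period-2 intercept dummy and reports the t-value on that dummy.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * XtX_inv))
    return beta, se

# Simulated two-period data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 500                                   # cross-section units per period
x = rng.normal(size=2 * n)
d2 = np.repeat([0.0, 1.0], n)             # 1 for period-2 observations
y = 1.0 + 2.0 * d2 + 0.5 * x + rng.normal(size=2 * n)

X = np.column_stack([np.ones(2 * n), d2, x])
beta, se = ols(X, y)
t_dummy = beta[1] / se[1]                 # t-value on the period dummy
print(beta.round(3), round(t_dummy, 2))
```

Here the simulated intercept really does shift between periods, so the dummy should come out significant and simple pooling without it would be inappropriate.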



More on pooled models with time dummies


Chow tests
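In its standard textbook form the Chow statistic compares the residual sum of squares (RSS) from the pooled regression with the combined RSS from separate regressions for each period: F = [(RSS_pooled - (RSS_1 + RSS_2)) / k] / [(RSS_1 + RSS_2) / (n_1 + n_2 - 2k)], where k is the number of parameters in each regression. The Python sketch below computes it for a two-variable model; the function names and the tiny data set are made up for illustration.

```python
import numpy as np

def rss(x, y):
    """Residual sum of squares from regressing y on a constant and x."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(x1, y1, x2, y2, k=2):
    """Chow F statistic for equal coefficients across two periods.

    k is the number of parameters per regression (intercept and slope here).
    """
    rss_pooled = rss(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
    rss_split = rss(x1, y1) + rss(x2, y2)
    dof = len(x1) + len(x2) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / dof)

# Made-up data: four observations per period.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 5.0])
print(chow_f(x, y, x, y))          # identical periods: F is numerically zero
print(chow_f(x, y, x, y + 10.0))   # intercept shift of 10: F is large
```

The statistic is compared with the F(k, n_1 + n_2 - 2k) distribution; a large value rejects the restriction that the coefficients are the same in both periods.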


Interactive slope dummies

  • The same test could be undertaken using interactive slope dummies in a pooled regression. Writing $D_t$ for a dummy equal to 1 in period 2 and 0 in period 1:

    $y_{it} = \beta_0 + \delta_0 D_t + \beta_1 x_{it} + \delta_1 D_t x_{it} + u_{it}$

    Intercept in period 2 = $\beta_0 + \delta_0$

    Slope in period 2 = $\beta_1 + \delta_1$

  • Note: The Chow test requires that the model is free of heteroskedasticity, which may not always be the case. The dummy variable approach could in this case be combined with the use of heteroskedasticity-consistent standard errors for computing t-values to assess the significance of $\delta_0$ and $\delta_1$.
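The dummy variable approach with robust standard errors might be sketched as below. The example uses simulated data whose error variance rises with x, and computes White (HC0) standard errors by hand; the function name `ols_hc0`, the coefficient values and the form of the heteroskedasticity are assumptions for illustration only.

```python
import numpy as np

def ols_hc0(X, y):
    """OLS coefficients with White (HC0) heteroskedasticity-consistent SEs."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)     # sum of e_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Simulated two-period data (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(1.0, 5.0, size=2 * n)
d2 = np.repeat([0.0, 1.0], n)                  # period-2 dummy
u = 0.5 * x * rng.normal(size=2 * n)           # error variance rises with x
y = 1.0 + 0.5 * d2 + 2.0 * x + 1.0 * d2 * x + u

# Columns: constant, intercept dummy, x, slope (interaction) dummy.
X = np.column_stack([np.ones(2 * n), d2, x, d2 * x])
beta, robust_se = ols_hc0(X, y)
robust_t = beta / robust_se                    # entries 1 and 3 test the dummies
print(beta.round(3), robust_t.round(2))
```

Entries 1 and 3 of `robust_t` are the robust t-values on the intercept and slope dummies, which play the role of the Chow test when heteroskedasticity is suspected.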



Longitudinal data sets

  • If T is large, as with so-called longitudinal data sets, then we may need to consider the nature of any trends in the variables.

  • Unit root tests for mixed cross-section time series data have been developed, but we shall not consider them here. Simpler models may just incorporate lagged dependent variables in the regression equations.

  • So, without concerning ourselves about unit roots, we will just examine two popular types of panel data model: fixed effects and random effects models.



More on longitudinal data sets

  • If the observations in the different time periods relate to exactly the same subjects then we may refer to the data set as panel data. For example, the famous US National Longitudinal Survey of Youth tracks the same individuals over several years. Another well-known US panel data set is the Panel Study of Income Dynamics (PSID). Note: tracking the same individuals sometimes means that we have to deal with unbalanced panels, when some individuals disappear from the panel sample.

  • The Current Population Survey (CPS), on the other hand, draws a different random sample each year, so the cross-section data are not matched to the same individuals. But because the samples are independently drawn, the data can be pooled with the possible addition of time dummies.



Attractive features of panel data sets

Panel data can enable us to examine issues not amenable to study using only cross-section or time series data sets.

  • For example with production functions we can deal simultaneously with issues of economies of scale and of technological change.

  • Cross-section labour market data can tell us who is unemployed in any particular year, and time series data can tell us how overall unemployment changes from year to year. Panel data enable us to track individuals to help us answer questions about unemployment duration, turnover rates etc.

  • Panel data can help us deal with issues of heterogeneity in the micro units. Unobserved factors affecting different people can cause bias in cross-section studies, but with panel data differencing or demeaning can control for these factors.

  • We can introduce some allowance for dynamic adjustment in models by including lagged variables.



Analysis of the CRIME2 data set in Wooldridge

  • CRIME2.xls contains data on (amongst other variables) the crime rate (crmrte) and the unemployment rate (unem) for 46 US cities in the years 1982 and 1987

  • Question: Would you expect cities with a higher unemployment rate to have a higher or lower crime rate?

  • So if, like Wooldridge (p460), you used just the 1987 data and ran a regression of crmrte on unem, would you expect $\beta_1$ to be positive or negative?


Replicating Wooldridge’s results on p460

EQ(1) Modelling crmrte by OLS (using crime2sorted.xls)

The estimation sample is: 47 - 92

            Coefficient   Std.Error   t-value   t-prob   Part.R^2
Constant        128.378       20.76      6.18    0.000     0.4651
unem           -4.16113       3.416     -1.22    0.230     0.0326

sigma                  34.5999   RSS                  52674.6416
R^2                  0.0326151   F(1,44) =         1.483 [0.230]
log-likelihood        -227.266   DW                         1.11
no. of observations         46   no. of parameters             2
mean(crmrte)           103.873   var(crmrte)             1183.71


  • Comment: Not only does the coefficient of unem have the “wrong” sign, it is also not significantly different from zero – see the very low t, F and R^2 values.

  • Note: the figures in brackets on p460 of Wooldridge are standard errors, not t-values.


Further comments on Wooldridge’s results

Wooldridge notes that this simple model is deficient in many ways. To improve it we might

  • introduce additional regressors including demographic factors such as the age distribution in each city, gender balance, educational data, law enforcement efforts

  • try a different functional form

  • include a lagged dependent variable (the crime rate in the earlier year)

    But Wooldridge wants to use this model to demonstrate how, with two periods of data, we can control for individual unobserved fixed effects and thus remove this potential form of bias.



Dealing with individual heterogeneity

  • Suppose we have an individual unobserved factor, or set of factors, that affects crime in different cities, but remains unchanged (fixed) between the different time periods. Denoting this by $a_i$, following Wooldridge, we can write

    $\text{crmrte}_{it} = \beta_0 + \delta_0 \, \text{d87}_t + \beta_1 \text{unem}_{it} + a_i + u_{it}$

  • Here $a_i$ picks up all those factors unique to the individual cities that don’t change, or don’t change much, between periods, including the demographic factors noted on the previous slide. Wooldridge calls this the unobserved heterogeneity error term.

  • The usual error term $u_{it}$ picks up all other factors that disturb the observed values of the dependent variable both across the different cities and between the years. Wooldridge calls this the idiosyncratic error term.

  • You can see that this model also includes a dummy variable to allow for intercept shifts (common to all cities) between periods.



Replicating Wooldridge’s results p462

If we were to ignore the heterogeneous effects and just pool the data (but including the time period dummy) we would find


EQ(2) Modelling crmrte by OLS (using crime2.in7)

The estimation sample is: 1 - 92

            Coefficient   Std.Error   t-value   t-prob   Part.R^2
Constant        93.4203       12.74      7.33    0.000     0.3766
d87             7.94041       7.975     0.996    0.322     0.0110
unem           0.426546       1.188     0.359    0.720     0.0014

sigma                  29.9917   RSS                  80055.7841
R^2                  0.0122119   F(2,89) =        0.5501 [0.579]
log-likelihood        -441.902   DW                         1.16
no. of observations         92   no. of parameters             3
mean(crmrte)           100.791   var(crmrte)             880.929

The coefficient on unem, although positive, is still not significant.

Here pooled OLS has not solved the omitted variables problem.


A first-differenced model

Because $a_i$ is constant over time we can remove its effect by first-differencing the equation. Notice that the initial intercept term $\beta_0$ gets removed too, leaving us with

$\Delta \text{crmrte}_i = \delta_0 + \beta_1 \Delta \text{unem}_i + \Delta u_i$

Here I am using Wooldridge’s labels ccrmrte and cunem respectively for $\Delta$crmrte and $\Delta$unem.

Wooldridge’s results for this regression are replicated in PcGive and shown on the next slide.
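To see why differencing helps, the following Python sketch simulates a two-period panel in which the unobserved effect $a_i$ is negatively correlated with the regressor. It does not use the CRIME2 data: the number of units, all coefficient values and the helper name `ols` are assumptions chosen for illustration. Pooled OLS with a period dummy is biased, while the first-differenced regression recovers the true slope.

```python
import numpy as np

def ols(X, y):
    """OLS coefficient vector."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 2000                                     # hypothetical number of cities
a = rng.normal(size=(n, 1))                  # unobserved fixed effect a_i
x = -a + rng.normal(size=(n, 2))             # regressor correlated with a_i
u = rng.normal(size=(n, 2))                  # idiosyncratic error
d2 = np.array([0.0, 1.0])                    # period-2 dummy
y = 1.0 + 0.5 * d2 + 2.0 * x + 3.0 * a + u   # true beta_1 = 2, delta_0 = 0.5

# Pooled OLS with the period dummy but ignoring a_i.
d2_long = np.tile(d2, n)                     # matches row-major ravel of (n, 2)
X_pooled = np.column_stack([np.ones(2 * n), d2_long, x.ravel()])
pooled_slope = ols(X_pooled, y.ravel())[2]

# First differences: a_i and beta_0 drop out, delta_0 becomes the intercept.
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
fd = ols(np.column_stack([np.ones(n), dx]), dy)   # [delta_0, beta_1]
print(round(pooled_slope, 3), fd.round(3))
```

In this design the pooled slope is pulled well below 2, mirroring the way an omitted city effect can distort the cross-section estimate.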



Replicating Wooldridge’s results p464

EQ(3) Modelling ccrmrte by OLS (using ccrime2.xls)

The estimation sample is: 1 - 46

            Coefficient   Std.Error   t-value   t-prob   Part.R^2
Constant        15.4022       4.702      3.28    0.002     0.1960
cunem           2.21800      0.8779      2.53    0.015     0.1267

sigma                  20.0508   RSS                  17689.5501
R^2                     0.1267   F(1,44) =        6.384 [0.015]*
log-likelihood        -202.169   DW                         1.15
no. of observations         46   no. of parameters             2
mean(ccrmrte)          6.16375   var(ccrmrte)            440.348


Comments:

  • In this regression the estimate of $\beta_1$ is positive and statistically significant.

  • The positive and statistically significant estimate of $\delta_0$ shows evidence of a secular increase in crime across all cities between 1982 and 1987.


The fixed effects model with time demeaned data

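Define the time-demeaned data series for y as

$\ddot{y}_{it} = y_{it} - \bar{y}_i$

where $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$ (and similarly for x and u).

Then if we have

$y_{it} = \beta_1 x_{it} + a_i + u_{it}$

and

$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i$

then we can estimate

$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}$

Again the unobserved effect has disappeared.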
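The demeaning step can be illustrated with a short Python sketch. The tiny panel, the unit effects and the function names `time_demean` and `fe_slope` are made up for this example; the idiosyncratic error is set to zero so that the within estimator recovers the slope exactly, whatever the unit effects are.

```python
import numpy as np

def time_demean(z):
    """Subtract each unit's time mean; z has shape (n, T)."""
    return z - z.mean(axis=1, keepdims=True)

def fe_slope(x, y):
    """Fixed effects (within) slope for a single regressor."""
    xd, yd = time_demean(x), time_demean(y)
    return float((xd * yd).sum() / (xd ** 2).sum())

# Made-up panel: 3 units, 3 periods, no idiosyncratic error.
x = np.array([[1.0, 2.0, 4.0],
              [2.0, 2.5, 3.0],
              [0.0, 1.0, 5.0]])
a = np.array([[10.0], [-5.0], [3.0]])     # unit effects a_i
y = 2.0 * x + a                           # slope beta_1 = 2

print(fe_slope(x, y))
```

Because each $a_i$ is constant within a unit, it is removed when the unit mean is subtracted, so changing the values in `a` leaves the estimate unchanged.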


Some comments on these two approaches

If T=2 the first-differencing approach and the fixed effects (demeaned data) approach are equivalent (see Wooldridge, third edition, p491).

But with T>2 the choice depends on the relative efficiency of the two methods.

Wooldridge says that the first-differences method is better if the differenced u term is serially uncorrelated, but he advises you to try both approaches and to explain why the results differ (if they do).
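The T=2 equivalence can be checked numerically. The sketch below uses simulated data with assumed parameter values and, to keep the algebra simple, a levels model without a period dummy, so that the first-differenced regression has no intercept. The two slope estimates then coincide up to rounding error.

```python
import numpy as np

# Simulated two-period panel (assumed values, for illustration only).
rng = np.random.default_rng(3)
n = 50
a = rng.normal(size=(n, 1))                      # unit effects a_i
x = a + rng.normal(size=(n, 2))
y = 1.5 * x + a + rng.normal(size=(n, 2))

# First differences, no intercept (no period dummy in the levels model).
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
fd = (dx * dy).sum() / (dx ** 2).sum()

# Within (time-demeaned) estimator on the same data.
xd = x - x.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
fe = (xd * yd).sum() / (xd ** 2).sum()

print(fd, fe)
```

With two periods each demeaned observation is plus or minus half the first difference, which is why the two ratios are identical.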


Random effects models

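Again suppose that we have

$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$

But here we assume that

$\mathrm{Cov}(x_{it}, a_i) = 0$

(Wooldridge allows for k separate x variables, so each must be uncorrelated with $a_i$.)

Writing $v_{it}$ as the composite error

$v_{it} = a_i + u_{it}$

we have

$y_{it} = \beta_0 + \beta_1 x_{it} + v_{it}$

The composite error $v_{it}$ is serially correlated:

$\mathrm{Corr}(v_{it}, v_{is}) = \sigma_a^2 / (\sigma_a^2 + \sigma_u^2)$, $t \neq s$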



More on the random effects model

If we knew the variances $\sigma_a^2$ and $\sigma_u^2$ we could calculate

$\lambda = 1 - [\sigma_u^2 / (\sigma_u^2 + T\sigma_a^2)]^{1/2}$

and use Generalised Least Squares based on the quasi-demeaned data

$y_{it} - \lambda \bar{y}_i$ and $x_{it} - \lambda \bar{x}_i$

This purges the disturbance term of the serial correlation.

In practice this means that we use Feasible GLS estimation (Kennedy calls it EGLS, estimated GLS), where $\lambda$ is estimated as part of the process. Wooldridge comments (p495) that the algebra is fairly unpleasant, but most econometric software packages will do this for you.
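A small Python sketch of the quasi-demeaning weight may help show how the random effects estimator sits between pooled OLS and fixed effects. It assumes the usual textbook form of the weight, $\lambda = 1 - [\sigma_u^2 / (\sigma_u^2 + T\sigma_a^2)]^{1/2}$, with known variances; the function names and the variance values are illustrative, and a full FGLS routine would estimate the variances first.

```python
import numpy as np

def re_lambda(sigma_u2, sigma_a2, T):
    """GLS weight: 1 - sqrt(sigma_u^2 / (sigma_u^2 + T * sigma_a^2))."""
    return 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))

def quasi_demean(z, lam):
    """Subtract lam times each unit's time mean; z has shape (n, T)."""
    return z - lam * z.mean(axis=1, keepdims=True)

print(re_lambda(1.0, 0.0, 5))      # no unit effect: weight 0, i.e. pooled OLS
print(re_lambda(1.0, 1.0, 3))      # equal variances, T = 3
print(re_lambda(1.0, 100.0, 10))   # dominant unit effect: close to fixed effects

z = np.array([[1.0, 3.0]])         # one unit, two periods, time mean 2
print(quasi_demean(z, 0.5))        # subtracts half of the time mean
```

A weight of 0 leaves the data untouched (pooled OLS), while a weight of 1 reproduces the full time-demeaning of the fixed effects model.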


Some final comments

  • Kennedy proposes a strategy in which you use a Hausman exogeneity test to see whether the random effects estimator is unbiased: if this null is not rejected then use the RE model, otherwise use the FE approach.

  • Wooldridge says the key issue is whether one can plausibly assume that the $a_i$ are uncorrelated with the x variables, in which case one can use the RE model via FGLS estimation. Under that assumption the choice is just a question of which estimator is more efficient, because the fixed effects estimator would still be unbiased and consistent.
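For a single slope coefficient, the Hausman comparison of the two estimators reduces to a simple ratio. The Python sketch below is only an illustration of that one-coefficient case: the function name is hypothetical and the estimates and variances passed to it are made-up numbers, not results from the CRIME2 data.

```python
def hausman_one_coef(b_fe, var_fe, b_re, var_re):
    """Hausman statistic for a single slope coefficient.

    Under the null that the unit effects are uncorrelated with x, the
    statistic is asymptotically chi-squared with 1 degree of freedom.
    """
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    return (b_fe - b_re) ** 2 / var_diff

CHI2_1_CRIT_5PCT = 3.841   # 5% critical value of chi-squared(1)

# Made-up estimates and variances, for illustration only.
h = hausman_one_coef(b_fe=2.2, var_fe=0.77, b_re=1.8, var_re=0.45)
use_random_effects = h < CHI2_1_CRIT_5PCT
print(h, use_random_effects)
```

A statistic below the critical value means the null is not rejected, which under the strategy above points to the RE model; a larger value points to fixed effects.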

