A short introduction to applied econometrics Part D: Panel Data Analysis

Presented by Dipl. Volkswirt Gerhard Kling

Advantages of panel analysis • More observations: pooling of cross-sectional and time series data • More degrees of freedom: stems from more observations • Reduced multicollinearity: especially a problem in distributed lag models • Improved efficiency (unbiased estimator with smallest variance for all possible true parameter values)

Advantages of panel analysis • Wider range of problems: you can test new hypotheses on individual behavior or policy changes that affect several entities • Causality discussion: the time structure facilitates the discussion • Dynamics of change, e.g. labor market participation

The importance of the data structure • Example: 11 countries over 10 years • General note: the cross-sectional dimension should be larger than the time dimension • But: many new models are currently being developed • Very fertile field for research! • I prefer the following data structure

The importance of the data structure [Figure: data set layout with the annotations "First cross-sectional unit", "Time dimension", and "missing"]
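
A minimal sketch of a stacked ("long") panel layout, assuming this is the structure the annotations refer to: all years of the first cross-sectional unit come first, then all years of the next unit, and missing observations stay in the table as missing. The country labels, years, and values are made up for illustration.

```python
# Hypothetical wide table: one entry per country, one value per year.
# None marks a missing observation. All values are made up.
wide = {
    "A": {1990: 2.1, 1991: 1.8, 1992: None},
    "B": {1990: 0.9, 1991: 1.2, 1992: 1.5},
}


def to_long(wide_table):
    """Stack a wide table into rows of (unit, year, value), sorted by unit, then year."""
    rows = []
    for unit in sorted(wide_table):
        for year in sorted(wide_table[unit]):
            rows.append((unit, year, wide_table[unit][year]))
    return rows


long_rows = to_long(wide)
for row in long_rows:
    print(row)
```

Each row is then identified by the pair (unit, year), which is what a panel declaration such as tsset needs.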

Pooled regression • Combine both dimensions in one data set • Neglect the time and cross-sectional structure • Run the pooled regression with POLS/SOLS, where i indexes countries and t indexes years
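
A sketch of what pooling means in practice, under assumptions: the regression equation of this slide is not preserved, so the variable names gdp, pop, and sav are borrowed from the newey2 command shown later, and the data below are made up. The helper functions solve and ols are hypothetical, written only to keep the example self-contained.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, xs):
    """OLS of y on a constant plus the regressors in xs; returns the coefficients."""
    design = [[1.0] + list(row) for row in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical stacked panel: (country, year, pop, sav). Values are made up.
panel = [
    ("A", 1990, 1.0, 20.0),
    ("A", 1991, 1.2, 22.0),
    ("A", 1992, 0.8, 19.0),
    ("B", 1990, 2.0, 15.0),
    ("B", 1991, 2.1, 14.0),
    ("B", 1992, 1.9, 17.0),
]
# Noise-free dependent variable, so the known coefficients are recovered.
gdp = [1.0 + 0.5 * pop + 0.1 * sav for _, _, pop, sav in panel]

# Pooled OLS ignores the country and year columns entirely.
coefs = ols(gdp, [(pop, sav) for _, _, pop, sav in panel])
print(coefs)  # approximately [1.0, 0.5, 0.1]
```

The point of the sketch is only that the country and year identifiers play no role in the estimation: every country-year is treated as one more observation.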

Pooled regression

Autocorrelation • The data now have a time dimension; hence, correlation among successive residuals is possible • This affects t-values and p-values because it violates the assumption $E(e_{it} e_{i,t-j}) = 0$ for all $j \neq 0$ • How can we test for this problem? • What can we do if we detect autocorrelation?

Autocorrelation • Stata has to be told that the data set is a panel • Command: tsset i year • Note: i = cross-sectional identifier, year = time variable • The normal test commands for autocorrelation do not work with panel data; hence, we develop our own test (several procedures exist!)

Test for Autocorrelation • Run the pooled regression and predict the residuals • Insert the lagged residuals into the regression • Run a t-test on the autocorrelation coefficient $\rho$ • H0: $\rho = 0$; if rejected, there is autocorrelation • Note: the test assumes an AR(1) process and strict exogeneity!
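
The two steps of the test can be written out as follows. This is a sketch under assumptions: the slide's own equation is not preserved, so the variables gdp, pop, and sav are taken from the newey2 command used later in the slides, and the coefficient symbols are chosen here for illustration.

```latex
% Step 1: pooled regression; keep the residuals \hat{e}_{it}
gdp_{it} = \beta_0 + \beta_1\, pop_{it} + \beta_2\, sav_{it} + e_{it}

% Step 2: add the lagged residual and test H_0: \rho = 0 with a t-test
gdp_{it} = \beta_0 + \beta_1\, pop_{it} + \beta_2\, sav_{it}
           + \rho\, \hat{e}_{i,t-1} + v_{it}
```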

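Hint: Construction of Lags with Panel Data • After the regress command, predict the residuals: predict r, resid • Then construct the lagged residual: gen r1=r[_n-1] • Problem: because the units are stacked, the first year of each unit would receive the last residual of the previous unit; thus, set the lagged values for the first year (1990 in our case) to missing: replace r1=. if year==1990 • Note: the t-value reaches 4.62!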
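
The pitfall described in this hint can be illustrated with a small sketch. The function panel_lag and the residual values are hypothetical; the code only shows why a plain "previous row" lag is wrong at the boundary between two cross-sectional units.

```python
def panel_lag(rows):
    """Return lagged residuals aligned with rows.

    rows: list of (unit, year, resid), sorted by unit, then year.
    The lag is None (missing) when the previous row belongs to another
    unit or is not the immediately preceding year.
    """
    lagged = []
    for idx, (unit, year, _) in enumerate(rows):
        if idx > 0 and rows[idx - 1][0] == unit and rows[idx - 1][1] == year - 1:
            lagged.append(rows[idx - 1][2])
        else:
            lagged.append(None)
    return lagged


# Made-up residuals for two units over three years.
rows = [
    ("A", 1990, 0.5), ("A", 1991, 0.4), ("A", 1992, -0.1),
    ("B", 1990, -0.3), ("B", 1991, -0.2), ("B", 1992, 0.1),
]

# Naive lag: simply the previous row, as r[_n-1] would give.
naive = [None] + [r[2] for r in rows[:-1]]
correct = panel_lag(rows)

print(naive)    # B's 1990 entry wrongly holds A's 1992 residual (-0.1)
print(correct)  # B's 1990 entry is missing instead
```

In Stata itself, once the panel has been declared with tsset, the time-series lag operator (for example gen r1 = L.r) produces a panel-aware lag without the manual replace step.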

Robust Estimation Procedure • We estimate a so-called long-run variance using the Newey-West (1987) procedure • The estimate of the variance-covariance matrix is now robust against heteroscedasticity and autocorrelation • Command: newey2 gdp pop sav, lag(5) • Number of lags = truncation parameter (can be determined!)
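
To make the idea of a long-run variance with a truncation lag concrete, here is a sketch for a single series using Bartlett weights, the weighting scheme of the Newey-West procedure. It is an illustration only and does not reproduce what newey2 does internally; the function name and the example series are hypothetical.

```python
def long_run_variance(series, lags):
    """Newey-West style long-run variance of one series with Bartlett weights.

    lags is the truncation parameter: autocovariances beyond it are ignored,
    and those up to it are down-weighted linearly.
    """
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]

    def autocov(j):
        # Sample autocovariance at lag j (divided by n, not n - j).
        return sum(dev[t] * dev[t - j] for t in range(j, n)) / n

    lrv = autocov(0)
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1)
        lrv += 2.0 * weight * autocov(j)
    return lrv


series = [1.0, 2.0, 3.0, 4.0]
print(long_run_variance(series, 0))  # 1.25, the ordinary variance
print(long_run_variance(series, 1))  # 1.5625, adds the weighted lag-1 term
```

With lags set to 0 the result collapses to the ordinary variance; positively autocorrelated data give a larger long-run variance, which is why the robust standard errors tend to be wider.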

Robust Estimation Procedure • Note: the point estimates are the same as in the pooled regression; only the standard errors change!

GLS Estimation Procedure • Make assumptions regarding heteroscedasticity and autocorrelation • Note: often called FGLS (feasible GLS)! • Command: xtgls; different specifications are then possible • Can also be used to test for specific forms of heteroscedasticity using likelihood ratio tests • Note: if the structure is too complicated, degrees of freedom are lost!
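
One possible specification is panel-level heteroscedasticity without autocorrelation. The sketch below shows the feasible step for that case only: estimate one residual variance per unit from a first-stage regression and turn it into observation weights. It is an assumption-laden illustration, not a description of how xtgls is implemented; the function names and residuals are made up.

```python
def panel_variances(resid_rows):
    """Mean squared residual per unit; resid_rows holds (unit, residual) pairs."""
    sums, counts = {}, {}
    for unit, resid in resid_rows:
        sums[unit] = sums.get(unit, 0.0) + resid ** 2
        counts[unit] = counts.get(unit, 0) + 1
    return {unit: sums[unit] / counts[unit] for unit in sums}


def gls_weights(resid_rows):
    """Weight 1/sigma_i for every observation, where sigma_i^2 is the unit's variance."""
    variances = panel_variances(resid_rows)
    return [1.0 / variances[unit] ** 0.5 for unit, _ in resid_rows]


# Made-up first-stage residuals: unit B is noisier than unit A.
resid_rows = [("A", 1.0), ("A", -1.0), ("B", 2.0), ("B", -2.0)]
variances = panel_variances(resid_rows)
weights = gls_weights(resid_rows)
print(variances)  # {'A': 1.0, 'B': 4.0}
print(weights)    # [1.0, 1.0, 0.5, 0.5]
```

Multiplying the dependent variable, the constant, and every regressor by these weights and then running OLS gives the weighted estimate. Note that one variance parameter per unit has to be estimated, which is where the loss of degrees of freedom comes from.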

GLS Estimation Procedure

Pitfalls of GLS • The specification of the form of autocorrelation and heteroscedasticity is important • If the specification is bad, the estimates are biased • General: I would prefer this procedure for larger samples because more parameters need to be estimated • Can be used to test, for instance, for panel-level heteroscedasticity!

Fixed Effects Regression • Assumption: the partial impact (slope) stays constant over time and across countries • Different methods: • Insert time dummies into the regression • Insert dummies for cross-sectional units • Insert both types of dummies • Note: sometimes the dummies are not reported if there are too many!
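
A sketch of the cross-sectional dummy variant with a single regressor. Instead of adding one dummy per unit, the code subtracts each unit's own mean from the data (the within transformation), which yields the same slope as the dummy regression. The data are made up so that the true slope is 2 and the two units have very different intercepts.

```python
def slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_unit(units, values):
    """Subtract each unit's own mean from its observations."""
    totals, counts = {}, {}
    for u, v in zip(units, values):
        totals[u] = totals.get(u, 0.0) + v
        counts[u] = counts.get(u, 0) + 1
    return [v - totals[u] / counts[u] for u, v in zip(units, values)]


units = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
# True slope is 2; unit A has intercept 1, unit B has intercept 10.
y = [3.0, 5.0, 7.0, 14.0, 18.0, 22.0]

pooled_slope = slope(x, y)
within_slope = slope(demean_by_unit(units, x), demean_by_unit(units, y))
print(pooled_slope, within_slope)  # 3.6875 2.0
```

In this constructed example the pooled slope is distorted because the unit with the higher intercept also has higher values of x; removing the unit means recovers the common slope.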

Fixed Effects Regression

Fixed Effects Regression • Joint F-tests indicate that neither time nor country dummies are relevant • But: dummies might be used for a few countries • General: you have to estimate lots of additional coefficients • But: widely applied and easy to interpret • Note: time dummies do not eliminate problems that may arise from stochastic trends!

Random Effects Regression • We assume a regression whose error term contains an individual effect in addition to the usual residual • The individual effects are random • Estimation with GLS or a maximum likelihood procedure • After estimation: Breusch-Pagan (1980) test or likelihood ratio test of whether random effects should be assumed
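
A sketch of the GLS idea behind random effects for a balanced panel, assuming the textbook error structure with an individual effect of variance sigma_u2 and an idiosyncratic residual of variance sigma_e2 (the slide's own equation is not preserved). GLS then amounts to subtracting only a fraction theta of each unit's mean. The function names are hypothetical.

```python
import math


def re_theta(sigma_e2, sigma_u2, periods):
    """Quasi-demeaning weight theta for a balanced panel with `periods` years per unit."""
    return 1.0 - math.sqrt(sigma_e2 / (sigma_e2 + periods * sigma_u2))


def quasi_demean(values, theta):
    """Subtract theta times the unit mean from one unit's observations."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]


print(re_theta(1.0, 0.0, 10))              # 0.0: no individual effect, same as pooled OLS
print(re_theta(1.0, 3.0, 1))               # 0.5
print(quasi_demean([1.0, 2.0, 3.0], 0.5))  # [0.0, 1.0, 2.0]
```

When the variance of the individual effect is zero, theta is zero and the procedure reduces to the pooled regression; as theta approaches one, it approaches the fixed effects (within) transformation.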

Random Effects Regression

Which Procedure should we use? • Neither fixed nor random effects are superior • Little evidence that individual effects matter • Hence: stick to POLS/SOLS pooled regression • Maybe: use dummies for extreme countries • Check stability of coefficients over time (goes beyond the scope of the course!)

The Causality Issue • Note: we assume that the current saving rate and population growth rate affect the GDP growth rate • But: it is possible that causality runs the other way round! • Solution: VAR model; test for Granger causality • Result: the saving rate and the population growth rate Granger-cause the GDP growth rate and not vice versa!
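
Granger causality tests of this kind ask whether lags of one variable help predict another variable beyond that variable's own lags. The sketch below assumes one lag and pooled coefficients purely for illustration; the slides do not state the lag length or the exact VAR specification, and the coefficient symbols are hypothetical.

```latex
% Equation for GDP growth: do lagged saving and population growth help predict it?
gdp_{it} = a_0 + a_1\, gdp_{i,t-1} + a_2\, sav_{i,t-1} + a_3\, pop_{i,t-1} + v_{it},
\qquad H_0: a_2 = a_3 = 0

% Reverse direction, e.g. for the saving rate: does lagged GDP growth help predict it?
sav_{it} = c_0 + c_1\, sav_{i,t-1} + c_2\, gdp_{i,t-1} + c_3\, pop_{i,t-1} + w_{it},
\qquad H_0: c_2 = 0
```

The reported result corresponds to rejecting the first null hypothesis while failing to reject null hypotheses of the second kind.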

Additional Issues • Stochastic trends in panel data • Spurious regressions • Unit-root tests: panel based; thus, more observations • First differencing or deviation from common trends • Long-term equilibria and cointegration
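
A sketch of the two transformations named in this list. First differencing is done within each unit; "deviation from common trends" is interpreted here, as an assumption, as subtracting the cross-sectional mean of each year. The function names and the data are made up.

```python
def first_difference(rows):
    """rows: (unit, year, value), sorted by unit, then year. Returns value minus own lag."""
    out = []
    for idx, (unit, year, value) in enumerate(rows):
        prev = rows[idx - 1] if idx > 0 else None
        if prev is not None and prev[0] == unit and prev[1] == year - 1:
            out.append(value - prev[2])
        else:
            out.append(None)  # first year of a unit: no lag available
    return out


def deviation_from_year_mean(rows):
    """Subtract the cross-sectional mean of each year (a common time effect)."""
    totals, counts = {}, {}
    for _, year, value in rows:
        totals[year] = totals.get(year, 0.0) + value
        counts[year] = counts.get(year, 0) + 1
    return [value - totals[year] / counts[year] for _, year, value in rows]


# Two made-up units that share the same upward movement but differ in level.
rows = [
    ("A", 1990, 1.0), ("A", 1991, 1.5), ("A", 1992, 2.5),
    ("B", 1990, 10.0), ("B", 1991, 10.5), ("B", 1992, 11.5),
]
diffs = first_difference(rows)
devs = deviation_from_year_mean(rows)
print(diffs)  # [None, 0.5, 1.0, None, 0.5, 1.0]
print(devs)   # [-4.5, -4.5, -4.5, 4.5, 4.5, 4.5]
```

In this example both transformations remove the shared upward movement: the differences are identical across units, and the deviations from the yearly mean are constant over time.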
