
A short introduction to applied econometrics
Part D: Panel Data Analysis
presented by
Dipl. Volkswirt Gerhard Kling
Advantages of panel analysis
More observations: pooling of cross-sectional and time series data
More degrees of freedom: stems from more observations
Reduced multicollinearity: especially a problem in distributed lag models
Improved efficiency (unbiased estimator with smallest variance for all possible true parameter values)
Advantages of panel analysis
Wider range of problems: you can test new hypotheses on individual behavior or policy changes that affect several entities
Dynamics of change, e.g. labor market participation
Causality discussion: time structure facilitates discussion
The importance of the data structure
Example: 11 countries over 10 years
General note: the cross-sectional dimension should be larger than the time dimension
But: many new models are currently being developed
Very fertile field for research!
I prefer the following data structure
The importance of the data structure
name code year gdp sav pop
Albania ALB 1990 6,75179343 20,9783993 1,6
Albania ALB 1991 -11,4142038 -13,0284996 -0,2
Albania ALB 1992 -27,5896031 -75,4131012 -1,6
Albania ALB 1993 -5,69153612 -33,6716003 -1,4
Albania ALB 1994 11,1974627 -9,88263035 0,2
Albania ALB 1995 9,1941036 -3,94799995 1,2
Albania ALB 1996 7,55757392 -11,8118 1,3
Albania ALB 1997 7,73893405 -9,25912952 1,2
Albania ALB 1998 -8,06352119 -6,69585991 1,1
Albania ALB 1999 -1,66910005 1,1
Algeria DZA 1990 2,29575915 27,4666996 2,5
Algeria DZA 1991 -3,72084675 36,6562004 2,4
Algeria DZA 1992 -3,55414336 32,3755989 2,4
Algeria DZA 1993 -0,79384221 27,8384991 2,3
Algeria DZA 1994 -4,35723136 27,0359993 2,2
Algeria DZA 1995 -3,31007521 28,4333992 2,2
Algeria DZA 1996 1,59040861 31,4230003 2,2
Algeria DZA 1997 1,58921549 32,1985016 2,2
Algeria DZA 1998 -1,03429441 27,0669003 2,1
Algeria DZA 1999 1,44857954 31,6912003 2,1
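Notes on the table: the Albania rows form the first cross-sectional unit; year is the time dimension; the gap in the Albania 1999 row is a missing value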
Pooled regression
Combine both dimensions in one data set
Neglect time and cross-sectional structure
Run the following regression with POLS/SOLS
$$gdp_{it} = \alpha + \beta\, sav_{it} + \gamma\, pop_{it} + e_{it}$$
where i indexes countries and t indexes years
Pooled regression
. reg gdp pop sav

variables coefficients t-values p-values
pop -1.73028 -1.95 0.055
sav 0.1766935 3.51 0.001
Adjusted R² 0.10
F-test 6.20 (0.003)
Observations 95

Autocorrelation
With a time dimension, correlation among successive residuals is possible
This affects t- and p-values because it violates the assumption $E(e_{it} e_{it-j}) = 0$ for all $j \neq 0$
How can we test for this problem?
What can we do if we detect
autocorrelation?
Autocorrelation
Stata should know that the data set is a panel
Command: tsset i year
Note: i is the cross-sectional identifier
The normal test commands for autocorrelation do not work; hence, develop your own test (several procedures!)

Test for Autocorrelation
Run the following regression and estimate residuals
$$gdp_{it} = \alpha + \beta\, sav_{it} + \gamma\, pop_{it} + e_{it}$$
Insert lagged residuals in regression
$$gdp_{it} = \alpha + \beta\, sav_{it} + \gamma\, pop_{it} + \rho\, e_{it-1} + \varepsilon_{it}$$
Run t-test for autocorrelation coefficient
$H_0: \rho = 0$; if rejected, autocorrelation is present
Note: AR(1) and assumption of strict exogeneity!
Hint: Construction of Lags with Panel Data
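After the regress command: predict r, resid
Then construct the lagged residual: gen r1=r[_n-1]
Problem: panel structure; r[_n-1] in the first year of a country picks up the last residual of the previous country
Thus, set the lagged value to missing for the first year (1990 in our case): replace r1=. if year==1990
Note: the t-value of the lagged residual reaches 4.62!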
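
The same test can be sketched with Stata's time-series lag operator instead of the manual lag. This is a minimal sketch, not the procedure used for the reported results; it assumes a numeric country identifier i and the variables year, gdp, sav and pop as in the data set shown earlier.

```stata
* Sketch only: assumes variables i (numeric country identifier), year, gdp, sav, pop
* Declare the panel structure: unit i, time variable year
tsset i year
* Pooled regression and its residuals
regress gdp sav pop
predict r, resid
* L.r takes the lag within each country, so it is missing
* in the first year of every unit and across gaps in the years
gen r1 = L.r
* The t-test on r1 tests H0: no first-order autocorrelation
regress gdp sav pop r1
```

Once the data are declared as a panel, the lag operator respects unit boundaries, so the separate replace step for 1990 is no longer needed.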
Robust Estimation Procedure
We estimate a so-called long-run variance using the Newey-West (1987) procedure
The estimate of the variance-covariance matrix is now robust against heteroscedasticity and autocorrelation
Command: newey2 gdp pop sav, lag(5)
Number of lags = truncation (can be
determined!)
Robust Estimation Procedure
. newey2 gdp pop sav, lag(5)

variables coefficients t-values p-values
constant -0.7222 -0.41 0.679
pop -1.7303 -2.28 0.025
sav 0.1767 2.62 0.010
Adjusted R² 0.10
F-test 6.20 (0.003)
Observations 95


Note: point estimates are the same as in the pooled regression!
GLS Estimation Procedure
Make assumptions regarding heteroscedasticity and autocorrelation
Note: often called FGLS (feasible GLS)!
Command: xtgls, with different specifications possible
Can also be used to test for specific forms of heteroscedasticity using log-likelihood ratio tests
Note: if the structure is too complicated, degrees of freedom are lost!
GLS Estimation Procedure
. xtgls gdp sav pop, corr(ar1) panels(hetero) force

variables coefficients z-values p-values
constant -0.2978 -0.22 0.825
pop -0.3767 -0.76 0.450
sav 0.1012 1.82 0.068
Wald chi² 3.41 (0.182)
Observations 95

Pitfalls of GLS
The specification of the form of autocorrelation and heteroscedasticity is important
If the specification is bad, estimates are biased
General: I would prefer this procedure for larger samples because more parameters need to be estimated
Can be used to test, for instance, for panel-level heteroscedasticity!
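
One way to carry out such a test is a likelihood ratio test between two xtgls models estimated by iterated GLS. The following is a sketch under the assumption that the panel has already been declared with tsset i year; the name hetero for the stored estimates is arbitrary.

```stata
* Sketch only: assumes the panel has been declared with tsset i year
* Unrestricted model: a separate error variance for each country
xtgls gdp sav pop, igls panels(heteroskedastic)
estimates store hetero
* Restricted model: one common error variance
xtgls gdp sav pop, igls
* xtgls does not know that the models are nested, so the degrees of
* freedom (number of panels minus one) are passed by hand
local df = e(N_g) - 1
lrtest hetero ., df(`df')
```

A small p-value would speak against a common error variance across countries.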
Fixed Effects Regression
Assumption: partial impact (slope) stays
constant over time and across countries
Different methods
Insert time dummies into regression
Insert dummies for cross-sectional units
Insert both types of dummies
Note: Sometimes dummies are not reported
if too many!

Fixed Effects Regression
Useful command: areg; you do not need to construct dummies by hand!
areg gdp sav pop, absorb(i)
areg gdp sav pop, absorb(year)
Both at once is not possible with areg; instead use xi: reg gdp sav pop i.year i.i
variables Year dummies Country dummies Both
constant -0.8954 (0.525) -3.8602 (0.106) 2.2578 (0.582)
pop -1.5334 (0.099) -0.7835 (0.654) -0.6431 (0.728)
sav 0.1705 (0.002) 0.2878 (0.005) 0.2710 (0.017)
Adjusted R² 0.07 0.13 0.10
F-test 5.27 (0.007) 4.60 (0.013) 1.51 (0.102)
Observations 95 95 95


Fixed Effects Regression:
Joint F-tests indicate that neither time nor country
dummies are relevant
But: For a few countries dummies might be used
General: You have to estimate lots of additional
coefficients
But: Widely applied and easy to interpret
Note: Time dummies do not eliminate problems
that may arise from stochastic trends!
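
The joint F-tests on the dummies can be sketched as follows. This assumes Stata 11 or later, where factor-variable notation replaces the xi: prefix, and the variable names used above; it is an illustration, not the code behind the reported table.

```stata
* Sketch only: factor-variable notation assumes Stata 11 or later
* Regression with both year and country dummies
regress gdp sav pop i.year i.i
* Joint F-test that all year dummies are zero
testparm i.year
* Joint F-test that all country dummies are zero
testparm i.i
```

Large p-values in both tests would match the conclusion that neither set of dummies is relevant.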
Random Effects Regression
We assume the following regression
$$gdp_{it} = \alpha + \beta\, sav_{it} + \gamma\, pop_{it} + u_i + e_{it}$$
Individual effects $u_i$ are random
Estimation with GLS or maximum likelihood procedure
After estimation: Breusch-Pagan (1980) test or likelihood ratio test of whether random effects should be assumed
Random Effects Regression
Command: xtreg gdp pop sav, re (random effects with group variable i, the countries)
Postestimation command: xttest0 carries out an LM test ($H_0: \mathrm{Var}(u_i) = 0$)
Command: xtreg gdp pop sav, mle (maximum likelihood estimation)
Note: Likelihood ratio test is reported
variables GLS ML
constant -0.9731 (0.518) -0.7222 (0.590)
pop -1.7037 (0.076) -1.7303 (0.048)
sav 0.1860 (0.001) 0.1767 (0.000)
Wald chi² 11.54 (0.003) -
LR test - 12.01 (0.003)
Observations 95 95

Test whether random effects should be used
LM test 0.11 (0.736) -
LR test - 0.00 (1.000)
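
Put together, the random effects workflow might look like the sketch below. It assumes that i is a numeric country identifier and that the panel has not yet been declared; xtset is used here in place of tsset.

```stata
* Sketch only: assumes i is a numeric country identifier
xtset i year
* Random effects, GLS estimator
xtreg gdp pop sav, re
* Breusch-Pagan LM test of H0: Var(u_i) = 0, run directly after xtreg, re
xttest0
* Random effects, maximum likelihood; the output ends with an LR test of sigma_u = 0
xtreg gdp pop sav, mle
```

With p-values as large as those in the table (0.736 and 1.000), neither test rejects the hypothesis that the variance of the individual effects is zero.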


Which Procedure should we use?
Neither fixed nor random effects are
superior
Little evidence that individual effects matter
Hence: stick to POLS/SOLS pooled
regression
Maybe: use dummies for extreme countries
Check stability of coefficients over time
(goes beyond the scope of the course!)
The Causality Issue
Note: We assume that current saving rate and
population growth rate affect GDP growth rate
But: it is possible that causality runs the other way round!
Solution: estimate a VAR model and test for Granger causality
Result: savings and population growth rate Granger-cause GDP growth rate and not vice versa!
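
To make the Granger causality test concrete, one equation of such a VAR could look as follows. The slides do not state the specification, so the lag length $p$, the coefficient names and the error term $\eta_{it}$ are assumptions made for illustration only.

```latex
gdp_{it} = a_0 + \sum_{j=1}^{p} a_j \, gdp_{i,t-j}
               + \sum_{j=1}^{p} b_j \, sav_{i,t-j}
               + \sum_{j=1}^{p} c_j \, pop_{i,t-j} + \eta_{it}

H_0: b_1 = b_2 = \dots = b_p = 0
\quad \text{(sav does not Granger-cause gdp)}
```

The hypothesis is checked with a joint F- or Wald test on the lags of sav, and analogously on the lags of pop. For the reverse direction, the same test is applied to the lags of gdp in the equations with sav and pop as dependent variables.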
Additional Issues
Stochastic trends in panel data
Spurious regressions
Unit-root tests are panel based; thus, more observations
First differencing or deviation from common trends
Long-term equilibria and cointegration
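
A panel unit-root test and a regression in first differences can be sketched as below. The Fisher-type test and the single lag are assumptions chosen for illustration (the slides name no specific test), and xtunitroot requires Stata 11 or later.

```stata
* Sketch only: test type and lag length are assumptions
tsset i year
* Fisher-type panel unit-root test based on augmented Dickey-Fuller
* regressions for each country; it allows unbalanced panels
xtunitroot fisher gdp, dfuller lags(1)
* Regression in first differences; D. takes the difference within each country
regress D.gdp D.sav D.pop
```

First differencing also removes any time-constant individual effect from the equation.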
