
Chap. 5 Unit Roots and Cointegration in Panels


5.1 Introduction
With the growing use of cross-country data over time to study purchasing power parity, growth convergence and international R&D spillovers, the focus of panel data econometrics has shifted towards the asymptotics of macro panels with large (say over 100) $n$ and large $T$, rather than the usual asymptotics of micro panels with large $n$ and small (less than 10) $T$. The fact that $T$ is allowed to increase to infinity in macro panel data generated two strands of ideas. The first rejected the homogeneity of the regression parameters implicit in the use of a pooled regression model in favor of heterogeneous specifications. The second strand of literature applied time series procedures to panels, worrying about nonstationarity, spurious regressions and cointegration. However, deriving limiting distributions in panels differs from the pure time series case in many respects: the way in which $n$, the number of cross-section units, and $T$, the length of the time series, tend to infinity is crucial for determining the asymptotic properties of estimators and tests proposed for nonstationary panels; see Phillips and Moon (1999). Several approaches are possible, including (i) sequential limits, where one index, say $n$, is fixed while $T$ is allowed to increase to infinity, giving an intermediate limit, and $n$ is then allowed to tend to infinity; (ii) diagonal limits, which allow the two indexes, $n$ and $T$, to pass to infinity along a specific diagonal path in the two-dimensional array; and (iii) joint limits, which allow both $n$ and $T$ to pass to infinity simultaneously without placing specific diagonal path restrictions on the divergence.
Testing for unit roots in time series is now common practice among applied researchers and has become an integral part of econometric courses. Testing for unit roots in panels, however, is recent. The development of panel unit root tests can be summarized as follows:

Panel unit root tests:

First generation (cross-sectional independence)
  - Homogeneous: LLC (2002), Breitung (2000), Hadri (2000)
  - Heterogeneous: IPS (2003), Maddala and Wu (1999), Choi (2001)

Second generation (cross-sectional dependence)
  - O'Connell (1998), Breitung and Das (2005), Moon and Perron (2004), Bai and Ng (2004), Pesaran (2003)

We briefly describe each of these tests in turn.


Consider the following AR(1) process for panel data:

$$y_{it} = \rho_i y_{i,t-1} + X_{it}'\delta_i + u_{it} \tag{5.1}$$

where $i = 1,2,\ldots,n$ indexes the cross-section units or series, observed over periods $t = 1,2,\ldots,T_i$. The $X_{it}$ represent the exogenous variables in the model, including any fixed effects or individual trends, the $\rho_i$ are the autoregressive coefficients, and the errors $u_{it}$ are assumed to be mutually independent idiosyncratic disturbances. If $|\rho_i| < 1$, $y_i$ is said to be weakly (trend-) stationary. On the other hand, if $\rho_i = 1$, then $y_i$ contains a unit root.

5.2 Panel unit root tests assuming cross-sectional independence (First generation)

5.2.1 Tests with a common unit root process (homogeneous)
A common root indicates that the tests are estimated assuming a common AR structure for all of the series. The Levin, Lin, and Chu (LLC), Breitung, and Hadri tests all assume that there is a common unit root process, so that $\rho_i$ is identical across cross-sections. The LLC and Breitung tests employ a null hypothesis of a unit root, while the Hadri test uses a null of no unit root.
1) Levin, Lin, and Chu (2002)
LLC argued that individual unit root tests have limited power against alternative hypotheses with highly persistent deviations from equilibrium. This is particularly severe in small samples. LLC suggest a panel unit root test that is more powerful than performing individual unit root tests for each cross-section. The null hypothesis is that each individual time series contains a unit root, against the alternative that each time series is stationary. The maintained hypothesis is that

$$\Delta y_{it} = \delta y_{i,t-1} + \sum_{L=1}^{p_i}\theta_{iL}\,\Delta y_{i,t-L} + \alpha_{mi} d_{mt} + \varepsilon_{it}, \qquad m = 1,2,3 \tag{5.2}$$

with $d_{mt}$ indicating the vector of deterministic variables and $\alpha_{mi}$ the corresponding vector of coefficients for model $m = 1,2,3$. In particular, $d_{1t} = \varnothing$ (the empty set), $d_{2t} = \{1\}$ and $d_{3t} = \{1,t\}$. Since the lag order $p_i$ is unknown, LLC suggest a three-step procedure to implement their test.

Step 1. Perform separate augmented Dickey-Fuller (ADF) regressions for each cross-section:

$$\Delta y_{it} = \rho_i y_{i,t-1} + \sum_{L=1}^{p_i}\theta_{iL}\,\Delta y_{i,t-L} + \alpha_{mi} d_{mt} + \varepsilon_{it}, \qquad m = 1,2,3 \tag{5.3}$$

The lag order $p_i$ is permitted to vary across individuals. For a given $T$, choose a maximum lag order $p_{\max}$ and then use the t-statistics of the $\theta_{iL}$ to determine whether a smaller lag order is preferred.

Once $p_i$ is determined, two auxiliary regressions are run to obtain orthogonalized residuals:

run $\Delta y_{it}$ on $\Delta y_{i,t-L}$ $(L = 1,\ldots,p_i)$ and $d_{mt}$ to get residuals $\hat e_{it}$;

run $y_{i,t-1}$ on $\Delta y_{i,t-L}$ $(L = 1,\ldots,p_i)$ and $d_{mt}$ to get residuals $\hat v_{i,t-1}$.

To control for heterogeneity across individuals, standardize these residuals by the regression standard error from equation (5.3):

$$\tilde e_{it} = \frac{\hat e_{it}}{\hat\sigma_{\varepsilon i}}, \qquad \tilde v_{i,t-1} = \frac{\hat v_{i,t-1}}{\hat\sigma_{\varepsilon i}},$$

where $\hat\sigma_{\varepsilon i}$ is the regression standard error in (5.3).

Step 2. Estimate the ratio of long-run to short-run standard deviations.

Under the null hypothesis of a unit root, the long-run variance of (5.2) can be estimated by

$$\hat\sigma_{yi}^2 = \frac{1}{T-1}\sum_{t=2}^{T}\Delta y_{it}^2 + 2\sum_{L=1}^{K} w_{KL}\left[\frac{1}{T-1}\sum_{t=2+L}^{T}\Delta y_{it}\,\Delta y_{i,t-L}\right]$$

where $K$ is a truncation lag that can be data-dependent. $K$ must be chosen in a manner that ensures the consistency of $\hat\sigma_{yi}$. For a Bartlett kernel, $w_{KL} = 1 - L/(K+1)$.

For each cross-section $i$, the ratio of the long-run standard deviation to the innovation standard deviation is estimated by $\hat s_i = \hat\sigma_{yi}/\hat\sigma_{\varepsilon i}$. The average standard deviation ratio is estimated by $\hat S_n = \frac{1}{n}\sum_{i=1}^{n}\hat s_i$. This important statistic will be used to adjust the mean of the t-statistic later in Step 3.


Step 3. Compute the panel test statistics.

Pool all cross-sectional and time series observations to estimate

$$\tilde e_{it} = \delta\,\tilde v_{i,t-1} + \tilde\varepsilon_{it},$$

based on a total of $n\tilde T$ observations, where $\tilde T = T - \bar p - 1$ is the average number of observations per individual in the panel, and $\bar p = \frac{1}{n}\sum_{i=1}^{n}p_i$ is the average lag order of the individual ADF regressions. The conventional t-statistic for testing $\delta = 0$ is given by

$$t_\delta = \frac{\hat\delta}{STD(\hat\delta)}$$

where

$$\hat\delta = \frac{\sum_{i=1}^{n}\sum_{t=2+p_i}^{T}\tilde v_{i,t-1}\tilde e_{it}}{\sum_{i=1}^{n}\sum_{t=2+p_i}^{T}\tilde v_{i,t-1}^2}, \qquad STD(\hat\delta) = \hat\sigma_{\tilde\varepsilon}\left[\sum_{i=1}^{n}\sum_{t=2+p_i}^{T}\tilde v_{i,t-1}^2\right]^{-1/2},$$

$$\hat\sigma_{\tilde\varepsilon}^2 = \frac{1}{n\tilde T}\sum_{i=1}^{n}\sum_{t=2+p_i}^{T}\left(\tilde e_{it}-\hat\delta\,\tilde v_{i,t-1}\right)^2.$$

Compute the adjusted t-statistic

$$t_\delta^* = \frac{t_\delta - n\tilde T\,\hat S_n\,\hat\sigma_{\tilde\varepsilon}^{-2}\,STD(\hat\delta)\,\mu_{m\tilde T}^*}{\sigma_{m\tilde T}^*} \tag{5.4}$$

where $\mu_{m\tilde T}^*$ and $\sigma_{m\tilde T}^*$ are the mean and standard deviation adjustments provided in Table 2 of LLC. LLC show that $t_\delta^*$ is asymptotically distributed as $N(0,1)$.

LLC suggest using their panel unit root test for panels of moderate size, with $n$ between 10 and 250 and $T$ between 25 and 250.
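As a concrete illustration, the three steps can be sketched in Python for the simplest case: no deterministic terms, lag order $p_i = 0$ for every unit, and a Bartlett kernel with a fixed truncation lag. The helper name `llc_unadjusted_t` and all tuning choices below are ours, not LLC's; the returned t-statistic still needs the mean and standard deviation adjustments from Table 2 of LLC to become the $t_\delta^*$ of (5.4).

```python
import numpy as np

def llc_unadjusted_t(y, K=4):
    """Unadjusted pooled t-statistic of the LLC three-step procedure
    (no deterministic terms, lag order p_i = 0 for every unit).
    y : (T, n) array of levels.  Returns (t_delta, S_n_bar)."""
    T, n = y.shape
    e_tilde, v_tilde, s_ratios = [], [], []
    for i in range(n):
        dy = np.diff(y[:, i])          # Delta y_it, t = 2..T
        v = y[:-1, i]                  # y_{i,t-1}
        # Step 1: with p_i = 0 and no deterministics the auxiliary
        # residuals are simply dy and v; standardize both by the
        # regression standard error of dy on v (equation (5.3)).
        rho = (v @ dy) / (v @ v)
        resid = dy - rho * v
        sig_i = np.sqrt(resid @ resid / (len(dy) - 1))
        e_tilde.append(dy / sig_i)
        v_tilde.append(v / sig_i)
        # Step 2: long-run variance of dy with a Bartlett kernel.
        s2 = dy @ dy / (T - 1)
        for L in range(1, K + 1):
            w = 1 - L / (K + 1)
            s2 += 2 * w * (dy[L:] @ dy[:-L]) / (T - 1)
        s_ratios.append(np.sqrt(max(s2, 1e-12)) / sig_i)  # numeric guard
    e = np.concatenate(e_tilde)
    v = np.concatenate(v_tilde)
    # Step 3: pooled regression e = delta * v + error.
    delta = (v @ e) / (v @ v)
    u = e - delta * v
    sig2 = u @ u / len(u)
    std_delta = np.sqrt(sig2 / (v @ v))
    return delta / std_delta, np.mean(s_ratios)

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal((100, 20)), axis=0)  # n = 20 unit-root series
t_stat, S_n = llc_unadjusted_t(y)
print(round(t_stat, 2), round(S_n, 2))
```

Note the strong negative bias of the unadjusted pooled t-statistic under the null: this is precisely what the $\mu_{m\tilde T}^*$ and $\sigma_{m\tilde T}^*$ adjustments in (5.4) correct.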
Limitations: First, the test crucially depends upon the independence assumption across cross-sections and is not applicable if cross-sectional correlation is present. Second, the assumption that all cross-sections have, or do not have, a unit root is restrictive.

2) Breitung (2000)

The LLC and IPS tests require $n\to\infty$ such that $n/T\to 0$, i.e., $n$ should be small relative to $T$. This means that both tests may not maintain nominal size well when either $n$ is small or $n$ is large relative to $T$. Breitung (2000) studies the local power of the LLC and IPS test statistics against a sequence of local alternatives. Breitung finds that the LLC and IPS tests suffer from a dramatic loss of power if individual-specific trends are included. This is due to the bias correction, which also removes the mean under the sequence of local alternatives. Breitung suggests a test statistic that does not employ a bias adjustment and whose power, in Monte Carlo experiments, is substantially higher than that of the LLC or IPS tests. The simulation results indicate that the power of the LLC and IPS tests is very sensitive to the specification of the deterministic terms.
Breitung's (2000) test statistic without bias adjustment is obtained as follows. Step 1 is the same as in LLC, but only the lagged differences $\Delta y_{i,t-L}$ are used as regressors in obtaining the residuals $\hat e_{it}$ and $\hat v_{i,t-1}$. The residuals are then adjusted (as in LLC) to correct for individual-specific variances. In Step 2, the residuals $\tilde e_{it}$ are transformed using the forward orthogonalization transformation employed by Arellano and Bover (1995):

$$e_{it}^* = \sqrt{\frac{T-t}{T-t+1}}\left(\tilde e_{it} - \frac{\tilde e_{i,t+1}+\cdots+\tilde e_{iT}}{T-t}\right)$$

Also,

$$v_{i,t-1}^* = \begin{cases} \tilde v_{i,t-1} - \tilde v_{i,1} - \frac{t-1}{T}\,\tilde v_{i,T}, & \text{with intercept and trend} \\ \tilde v_{i,t-1} - \tilde v_{i,1}, & \text{with intercept, no trend} \\ \tilde v_{i,t-1}, & \text{with no intercept or trend} \end{cases}$$

The last step is to run the pooled regression

$$e_{it}^* = \delta\,v_{i,t-1}^* + \varepsilon_{it}$$

and obtain the t-statistic for $H_0: \delta = 0$, which in the limit has a standard $N(0,1)$ distribution.

3) Hadri (2000)
Hadri (2000) derives a residual-based Lagrange multiplier (LM) test where the null hypothesis is that there is no unit root in any of the series in the panel, against the alternative of a unit root in the panel. In particular, Hadri (2000) considers the following two models:

$$y_{it} = r_{it} + \varepsilon_{it} \quad\text{and}\quad y_{it} = r_{it} + \beta_i t + \varepsilon_{it}, \qquad i = 1,\ldots,n;\; t = 1,\ldots,T \tag{5.5}$$

where $r_{it} = r_{i,t-1} + u_{it}$ is a random walk, and $\varepsilon_{it}\sim IIN(0,\sigma_\varepsilon^2)$ and $u_{it}\sim IIN(0,\sigma_u^2)$ are mutually independent normals, IID across $i$ and over $t$. Using back substitution, the second model becomes

$$y_{it} = r_{i0} + \beta_i t + \sum_{s=1}^{t}u_{is} + \varepsilon_{it} = r_{i0} + \beta_i t + v_{it} \tag{5.6}$$

where $v_{it} = \sum_{s=1}^{t}u_{is} + \varepsilon_{it}$. The stationarity hypothesis is simply $H_0: \sigma_u^2 = 0$, in which case $v_{it} = \varepsilon_{it}$. The LM statistic is given by

$$LM_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{T^2}\sum_{t=1}^{T}S_{it}^2}{\hat\sigma_\varepsilon^2} \tag{5.7}$$

where $S_{it} = \sum_{s=1}^{t}\hat\varepsilon_{is}$ are the partial sums of the OLS residuals $\hat\varepsilon_{is}$ from (5.6), and $\hat\sigma_\varepsilon^2$ is a consistent estimate of $\sigma_\varepsilon^2$ under the null hypothesis $H_0$. A possible candidate is

$$\hat\sigma_\varepsilon^2 = \frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\hat\varepsilon_{it}^2.$$

Hadri (2000) suggested an alternative LM test that allows for heteroskedasticity across $i$, say $\sigma_{\varepsilon,i}^2$:

$$LM_2 = \frac{1}{n}\sum_{i=1}^{n}\frac{\frac{1}{T^2}\sum_{t=1}^{T}S_{it}^2}{\hat\sigma_{\varepsilon,i}^2} \tag{5.8}$$

The test statistic is given by $Z = \frac{\sqrt{n}\,(LM-\xi)}{\sqrt{\zeta}}$ and is asymptotically distributed as $N(0,1)$, where the mean $\xi = 1/6$ and variance $\zeta = 1/45$ if the model only includes a constant, and $\xi = 1/15$ and $\zeta = 11/6300$ otherwise.
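A minimal sketch of the homoskedastic $LM_1$ of (5.7) for the constant-only model, assuming a $(T,n)$ data matrix; the helper name `hadri_lm` is ours:

```python
import numpy as np

def hadri_lm(y):
    """Hadri (2000) LM statistic, constant-only model: OLS residuals
    from regressing each series on an intercept, their partial sums,
    and the homoskedastic LM_1 of (5.7).  Returns (LM1, Z)."""
    T, n = y.shape
    eps = y - y.mean(axis=0)            # OLS residuals, intercept only
    S = np.cumsum(eps, axis=0)          # partial sums S_it
    sig2 = (eps ** 2).sum() / (n * T)   # pooled variance estimate
    LM1 = (S ** 2).sum(axis=0).mean() / (T ** 2 * sig2)
    # mean 1/6 and variance 1/45 apply to the constant-only model
    Z = np.sqrt(n) * (LM1 - 1 / 6) / np.sqrt(1 / 45)
    return LM1, Z

rng = np.random.default_rng(1)
y_stat = rng.standard_normal((200, 30))     # stationary panel (null true)
LM1, Z = hadri_lm(y_stat)
print(round(LM1, 3), round(Z, 2))
```

For stationary data $LM_1$ should hover near its asymptotic mean of $1/6$, so $Z$ stays moderate; a unit-root panel drives $LM_1$, and hence $Z$, sharply upward.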

5.2.2 Tests with individual unit root processes (heterogeneous)

1) Im, Pesaran and Shin Test (IPS 2003)
The LLC test is restrictive in the sense that it requires $\rho$ to be homogeneous across $i$. As Maddala (1999) points out, the null may be fine for testing convergence in growth among countries, but the alternative restricts every country to converge at the same rate. Im et al. (2003) (IPS) allow for a heterogeneous coefficient on $y_{i,t-1}$ and propose an alternative testing procedure based on averaging individual unit root test statistics. IPS suggest an average of the ADF tests when $u_{it}$ is serially correlated, with possibly different serial correlation properties across cross-sectional units, i.e., the model given in (5.3). The null hypothesis is that each series in the panel contains a unit root, i.e., $H_0: \rho_i = 0$ for all $i$, and the alternative hypothesis allows for some (but not all) of the individual series to have unit roots, i.e.,

$$H_1: \begin{cases} \rho_i < 0 & \text{for } i = 1,2,\ldots,n_1 \\ \rho_i = 0 & \text{for } i = n_1+1,\ldots,n \end{cases}$$

Formally, it requires the fraction of the individual time series that are stationary to be nonzero, i.e., $\lim_{n\to\infty}(n_1/n) = \kappa$ with $0 < \kappa \le 1$. This condition is necessary for the consistency of the panel unit root test. The IPS t-bar statistic is defined as the average of the individual ADF statistics:

$$\bar t = \frac{1}{n}\sum_{i=1}^{n}t_{\rho_i}$$

where $t_{\rho_i}$ is the individual t-statistic for testing $H_0: \rho_i = 0$ in (5.3). In case the lag order is always zero ($p_i = 0$ for all $i$), IPS provide simulated critical values for $\bar t$ for different numbers of cross-sections $n$, series lengths $T$, and Dickey-Fuller regressions containing intercepts only or intercepts and linear trends. In the general case, where the lag order $p_i$ may be nonzero for some cross-sections, IPS show that a properly standardized $\bar t$ has an asymptotic $N(0,1)$ distribution. Starting from the well-known time series result that, for a fixed $n$,

$$t_{iT} \Rightarrow \frac{\int_0^1 W_{iZ}\,dW_{iZ}}{\left[\int_0^1 W_{iZ}^2\right]^{1/2}} \tag{5.9}$$

as $T\to\infty$, where $W_{iZ}(r)$ denotes a Wiener process (with the argument $r$ suppressed in (5.9)), IPS assume that the $t_{iT}$ are IID with finite mean and variance. Then

$$\frac{\sqrt{n}\left[\frac{1}{n}\sum_{i=1}^{n}t_{iT} - E\left[t_{iT}\mid\rho_i=0\right]\right]}{\sqrt{\operatorname{var}\left[t_{iT}\mid\rho_i=0\right]}} \Rightarrow N(0,1) \tag{5.10}$$

as $n\to\infty$, by the Lindeberg-Levy central limit theorem. Hence

$$t_{IPS} = \frac{\sqrt{n}\left[\bar t - \frac{1}{n}\sum_{i=1}^{n}E\left[t_{iT}\mid\rho_i=0\right]\right]}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\operatorname{var}\left[t_{iT}\mid\rho_i=0\right]}} \Rightarrow N(0,1) \tag{5.11}$$

as $T\to\infty$ followed by $n\to\infty$ sequentially. The values of $E[t_{iT}\mid\rho_i=0]$ and $\operatorname{var}[t_{iT}\mid\rho_i=0]$ have been computed by IPS via simulations for different values of $T$ and $p_i$.
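The t-bar statistic itself is easy to compute. The sketch below (our own helper `ips_tbar`, intercept-only Dickey-Fuller regressions with $p_i = 0$) averages the individual t-statistics, leaving the standardization of (5.11) to the tabulated IPS moments:

```python
import numpy as np

def ips_tbar(y):
    """IPS t-bar: average of individual Dickey-Fuller t-statistics
    from intercept-only regressions with p_i = 0.  The standardized
    t_IPS additionally needs E[t_iT] and var[t_iT] from the IPS
    tables, which are not reproduced here."""
    T, n = y.shape
    t_stats = []
    for i in range(n):
        dy = np.diff(y[:, i])
        X = np.column_stack([np.ones(T - 1), y[:-1, i]])  # [1, y_{t-1}]
        beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
        u = dy - X @ beta
        s2 = u @ u / (T - 1 - 2)                  # residual variance
        cov = s2 * np.linalg.inv(X.T @ X)
        t_stats.append(beta[1] / np.sqrt(cov[1, 1]))
    return np.mean(t_stats)

rng = np.random.default_rng(2)
y = np.cumsum(rng.standard_normal((150, 25)), axis=0)  # all units have unit roots
print(round(ips_tbar(y), 2))
```

Under the null, each individual statistic follows the Dickey-Fuller distribution (mean around $-1.5$ for the intercept case), so the average sits in that neighborhood rather than near zero.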

2) Fisher-ADF and Fisher-PP

Let $G_{iT_i}$ be a unit root test statistic for the $i$-th group in (5.2), and assume that as the number of time series observations for the $i$-th group $T_i\to\infty$, $G_{iT_i}\Rightarrow G_i$, where $G_i$ is a nondegenerate random variable. Let $p_i$ be the asymptotic p-value of the unit root test for cross-section $i$, i.e., $p_i = F(G_{iT_i})$, where $F(\cdot)$ is the distribution function of the random variable $G_i$. Maddala and Wu (1999) and Choi (2001) proposed a Fisher-type test:

$$P = -2\sum_{i=1}^{n}\ln p_i \sim \chi^2(2n) \tag{5.12}$$

which combines the p-values from unit root tests for each cross-section $i$ to test for a unit root in panel data.

Choi (2001) proposes two other test statistics besides Fisher's inverse chi-square test statistic $P$. The first is the inverse normal test

$$Z = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Phi^{-1}(p_i) \tag{5.13}$$

where $\Phi$ is the standard normal cumulative distribution function. Since $0\le p_i\le 1$, $\Phi^{-1}(p_i)$ is an $N(0,1)$ random variable, and as $T_i\to\infty$ for all $i$, $Z\Rightarrow N(0,1)$.

The second is the logit test

$$L = \sum_{i=1}^{n}\ln\left(\frac{p_i}{1-p_i}\right) \tag{5.14}$$

where $\ln\left(\frac{p_i}{1-p_i}\right)$ has the logistic distribution with mean 0 and variance $\pi^2/3$. As $T_i\to\infty$ for all $i$, $mL\Rightarrow t_{5n+4}$, where $m = \sqrt{\frac{3(5n+4)}{\pi^2 n(5n+2)}}$.

When $n$ is large, Choi (2001) proposed a modified $P$ test,

$$P_m = \frac{1}{2\sqrt{n}}\sum_{i=1}^{n}\left(-2\ln p_i - 2\right) \Rightarrow N(0,1) \quad (T_i\to\infty \text{ followed by } n\to\infty) \tag{5.15}$$

The distribution of the $Z$ statistic is invariant to infinite $n$, and $Z\Rightarrow N(0,1)$ as $T_i\to\infty$ and then $n\to\infty$. Also, $\frac{1}{\sqrt{n\pi^2/3}}\sum_{i=1}^{n}\ln\left(\frac{p_i}{1-p_i}\right)\Rightarrow N(0,1)$ by the Lindeberg-Levy central limit theorem as $T_i\to\infty$ and then $n\to\infty$. Therefore, $Z$ and $mL$ can be used without modification for infinite $n$.
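Because the Fisher-type statistics are simple transformations of p-values, they can be sketched directly. The helper `combine_pvalues` and the sample p-values below are illustrative only:

```python
import numpy as np
from scipy import stats

def combine_pvalues(p):
    """Fisher-type combinations of cross-section unit root p-values:
    the inverse chi-square test P of (5.12), Choi's inverse normal
    Z of (5.13), and the modified P_m of (5.15)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    P = -2 * np.log(p).sum()                  # chi-square with 2n df
    P_pval = stats.chi2.sf(P, df=2 * n)
    Z = stats.norm.ppf(p).sum() / np.sqrt(n)  # N(0,1), reject in left tail
    Pm = (-2 * np.log(p) - 2).sum() / (2 * np.sqrt(n))
    return P, P_pval, Z, Pm

# hypothetical p-values from, say, n = 5 individual ADF tests
P, P_pval, Z, Pm = combine_pvalues([0.02, 0.40, 0.11, 0.74, 0.05])
print(round(P, 2), round(Z, 2), round(Pm, 2))
```

A strength of these tests, noted by Maddala and Wu (1999), is that they do not require a balanced panel and can combine p-values from different underlying unit root tests.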

5.3 Panel unit root tests allowing for cross-sectional dependence (Second generation)
5.3.1 An approach to testing for contemporaneous correlation
Pesaran (2004) suggests a simple test of disturbance cross-section dependence (CD) that is applicable to a variety of panel models, including stationary and unit root dynamic heterogeneous panels with short $T$ and large $n$. The proposed test is based on an average of the pairwise correlation coefficients of the OLS residuals from the individual regressions in the panel, rather than their squares as in the Breusch-Pagan LM test:

$$CD = \sqrt{\frac{2T}{n(n-1)}}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\hat\rho_{ij} \tag{5.16}$$

where $\hat\rho_{ij} = \frac{\sum_{t=1}^{T}e_{it}e_{jt}}{\left(\sum_{t=1}^{T}e_{it}^2\right)^{1/2}\left(\sum_{t=1}^{T}e_{jt}^2\right)^{1/2}}$, with $e_{it}$ denoting the OLS residuals based on $T$ observations for each $i = 1,\ldots,n$. Monte Carlo experiments show that the standard Breusch-Pagan LM test performs badly for $n > T$ panels, whereas Pesaran's CD test performs well even for small $T$ and large $n$.
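The CD statistic of (5.16) is essentially a scaled sum of pairwise correlations. The helper below (`pesaran_cd`, our naming) assumes the residuals come from regressions that include an intercept, so no extra demeaning is applied:

```python
import numpy as np

def pesaran_cd(E):
    """Pesaran (2004) CD statistic from a (T, n) matrix of regression
    residuals: scaled average of the pairwise correlation
    coefficients in (5.16), not their squares."""
    T, n = E.shape
    norms = np.sqrt((E ** 2).sum(axis=0))
    R = (E.T @ E) / np.outer(norms, norms)   # all pairwise rho_ij
    upper = R[np.triu_indices(n, k=1)]       # keep only i < j pairs
    return np.sqrt(2 * T / (n * (n - 1))) * upper.sum()

rng = np.random.default_rng(3)
E_indep = rng.standard_normal((100, 10))     # cross-sectionally independent
print(round(pesaran_cd(E_indep), 2))
```

Under cross-sectional independence CD is approximately $N(0,1)$, so values far outside $\pm 2$ signal dependence; adding a common factor to the residuals inflates the statistic dramatically.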

5.3.2 Second generation panel unit root tests

1) The SUR Method
Treating the panel as a seemingly unrelated regression system, O'Connell (1998) suggests estimating the system by GLS (see also Flores, Jorion, Preumont and Szafarz, 1999). Let $\hat\Omega = \frac{1}{T}\sum_{t=1}^{T}\hat u_t\hat u_t'$ denote the sample covariance matrix of the residual vector. The GLS t-statistic is given by

$$t_{gls}(n) = \frac{\sum_{t=1}^{T}y_{t-1}'\hat\Omega^{-1}\Delta y_t}{\sqrt{\sum_{t=1}^{T}y_{t-1}'\hat\Omega^{-1}y_{t-1}}} \tag{5.17}$$

where $y_t$ is the vector of demeaned variables. Harvey and Bates (2003) derive the limiting distribution of $t_{gls}(n)$ for a fixed $n$ as $T\to\infty$, and tabulate its asymptotic distribution for various values of $n$. Breitung and Das (2005) show that if $\tilde y_t = y_t - y_0$ is used to demean the variables and $T\to\infty$ is followed by $n\to\infty$, then the GLS t-statistic possesses a standard normal limiting distribution.

The GLS approach cannot be used if $T < n$, as in this case the estimated covariance matrix $\hat\Omega$ is singular. Furthermore, Monte Carlo simulations suggest that for reasonable size properties of the GLS test, $T$ must be substantially larger than $n$ (e.g., Breitung and Das, 2005). Maddala and Wu (1999) and Chang (2004) have suggested bootstrap procedures that improve the size properties of the GLS test.

An alternative approach, based on panel corrected standard errors (PCSE), is considered by Jonsson (2005) and Breitung and Das (2005). In the model with weak dependence, if $T\to\infty$ is followed by $n\to\infty$, the robust t-statistic

$$t_{rols} = \frac{\sum_{t=1}^{T}y_{t-1}'\Delta y_t}{\sqrt{\sum_{t=1}^{T}y_{t-1}'\hat\Omega\,y_{t-1}}} \tag{5.18}$$

is asymptotically standard normally distributed (Breitung and Das, 2005).

If it is assumed that the cross-correlation is due to common factors, then the largest eigenvalue of the error covariance matrix $\Omega$ is $O_p(n)$ and the robust PCSE approach breaks down. Specifically, Breitung and Das (2008) showed that in this case $t_{rols}$ is distributed as the ordinary Dickey-Fuller test applied to the first principal component.
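A sketch of the robust statistic in (5.18), assuming demeaning by the initial observation as in Breitung and Das (2005); `robust_t` is an illustrative helper, not a packaged implementation:

```python
import numpy as np

def robust_t(y):
    """Pooled OLS t-statistic with panel corrected standard errors,
    following (5.18): demean by the initial value, estimate the
    residual covariance Omega-hat, and rescale the denominator."""
    yd = y - y[0]                         # y_t - y_0 demeaning
    dy = np.diff(yd, axis=0)              # Delta y_t
    ylag = yd[:-1]                        # y_{t-1}
    phi = (ylag * dy).sum() / (ylag ** 2).sum()
    u = dy - phi * ylag                   # pooled OLS residuals
    Omega = u.T @ u / len(u)              # n x n covariance estimate
    num = (ylag * dy).sum()               # sum_t y'_{t-1} dy_t
    den = np.einsum('ti,ij,tj->', ylag, Omega, ylag)
    return num / np.sqrt(den)

rng = np.random.default_rng(4)
y = np.cumsum(rng.standard_normal((200, 8)), axis=0)  # unit-root panel
print(round(robust_t(y), 2))
```

Under the null with weakly dependent errors this statistic is approximately standard normal, so the usual left-tail critical values apply; under strong factor dependence it inherits the Dickey-Fuller behavior described above.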

2) Common Factor Methods

Cross-section dependence can arise from a variety of sources, such as omitted observed common factors, spatial spillover effects, unobserved common factors, or general residual interdependence that could remain even when all the observed and unobserved common effects are taken into account.

Dynamic factor models have been used to capture cross-section correlation. Moon and Perron (2004) consider the following model:

$$y_{it} = \alpha_i + y_{it}^0, \qquad y_{it}^0 = \rho_i y_{i,t-1}^0 + \varepsilon_{it} \tag{5.19}$$

where the $\varepsilon_{it}$ are unobservable error terms with a factor structure and the $\alpha_i$ are fixed effects. $\varepsilon_{it}$ is generated by $M$ unobservable random factors $f_t$ and idiosyncratic shocks $e_{it}$ as follows:

$$\varepsilon_{it} = \lambda_i' f_t + e_{it} \tag{5.20}$$

where the $\lambda_i$ are nonrandom factor loading coefficient vectors and the number of factors $M$ is unknown. Each $\varepsilon_{it}$ contains the common random factors $f_t$, generating correlation among the cross-sectional units of $\varepsilon_{it}$ and $y_{it}$. The extent of the correlation is determined by the factor loading coefficients $\lambda_i$, i.e., $E(\varepsilon_{it}\varepsilon_{jt}) = \lambda_i'E(f_tf_t')\lambda_j$. Moon and Perron treat the factors as nuisance parameters and suggest pooling defactored data to construct a unit root test. Let $Q$ be the matrix projecting onto the space orthogonal to the factor loadings. The defactored data $YQ$ and the defactored residuals $eQ$ no longer exhibit cross-sectional dependence, where $Y$ is a $T\times n$ matrix whose $i$-th column contains the observations for cross-sectional unit $i$.

Let $\sigma_{e,i}^2$ be the variance of $e_{it}$, $\omega_{e,i}^2$ the long-run variance of $e_{it}$, and $\lambda_{e,i}$ the one-sided long-run variance of $e_{it}$. Also, let $\sigma_e^2$, $\omega_e^2$ and $\lambda_e$ be their cross-sectional averages, and $\phi_e^4$ be the cross-sectional average of $\omega_{e,i}^4$. The pooled bias-corrected estimate of $\rho$ is

$$\hat\rho_{pool}^{+} = \frac{tr\left(Y_{-1}QY'\right) - nT\hat\lambda_e}{tr\left(Y_{-1}QY_{-1}'\right)} \tag{5.21}$$

where $Y_{-1}$ is the matrix of lagged data. Moon and Perron suggest two statistics to test $H_0: \rho_i = 1$ for all $i = 1,\ldots,n$ against the alternative hypothesis $H_A: \rho_i < 1$ for some $i$. These are

$$t_a = \frac{\sqrt{n}\,T\left(\hat\rho_{pool}^{+}-1\right)}{\sqrt{2\hat\phi_e^4/\hat\omega_e^4}} \tag{5.22}$$

and

$$t_b = \sqrt{n}\,T\left(\hat\rho_{pool}^{+}-1\right)\sqrt{\frac{1}{nT^2}\,tr\left(Y_{-1}QY_{-1}'\right)\frac{\hat\omega_e^2}{\hat\phi_e^4}} \tag{5.23}$$

These tests have a standard $N(0,1)$ limiting distribution as $n$ and $T$ tend to infinity such that $n/T\to 0$. Moon and Perron also show that estimating the factors by principal components and replacing $\omega_e^2$ and $\phi_e^4$ by consistent estimates leads to feasible statistics with the same limiting distribution.
Bai and Ng (2004) consider the following dynamic factor model:

$$y_{it} = \alpha_i + \lambda_i' f_t + y_{it}^0, \qquad y_{it}^0 = \rho_i y_{i,t-1}^0 + \varepsilon_{it} \tag{5.24}$$

They test the stationarity of the factors and of the idiosyncratic components separately. To do so, they obtain consistent estimates of the factors regardless of whether the residuals are stationary or not. They accomplish this by estimating the factors on first-differenced data and then cumulating the estimated factors. Bai and Ng suggest pooling the results from individual ADF tests on the estimated defactored data by combining p-values as in Maddala and Wu (1999) and Choi (2001):

$$P_{\hat e^c} = \frac{-2\sum_{i=1}^{n}\ln p_{\hat e^c}(i) - 2n}{\sqrt{4n}} \xrightarrow{d} N(0,1) \tag{5.25}$$

where $p_{\hat e^c}(i)$ is the p-value of the ADF test (without any deterministic component) on the estimated idiosyncratic shock for cross-section $i$.


Pesaran (2003) suggests a simpler way of getting rid of cross-sectional dependence than estimating the factor loadings. His method is based on augmenting the usual ADF regression with the lagged cross-sectional mean and its first difference to capture the cross-sectional dependence that arises through a single-factor model. This is called the cross-sectionally augmented Dickey-Fuller (CADF) test. The simple CADF regression is

$$\Delta y_{it} = \alpha_i + \rho_i y_{i,t-1} + d_0\bar y_{t-1} + d_1\Delta\bar y_t + \varepsilon_{it} \tag{5.26}$$

where $\bar y_t$ is the average at time $t$ of all $n$ observations. The presence of the lagged cross-sectional average and its first difference accounts for the cross-sectional dependence through a factor structure. If there is serial correlation in the error term or the factor, the regression must be augmented as usual in the univariate case, but lagged first differences of both $y_{it}$ and $\bar y_t$ must be added, which leads to

$$\Delta y_{it} = \alpha_i + \rho_i y_{i,t-1} + d_0\bar y_{t-1} + \sum_{j=0}^{p}d_{j+1}\Delta\bar y_{t-j} + \sum_{k=1}^{p}c_k\Delta y_{i,t-k} + \varepsilon_{it} \tag{5.27}$$

where the degree of augmentation can be chosen by an information criterion or sequential testing. After running this CADF regression for each unit $i$ in the panel, Pesaran averages the t-statistics on the lagged level (called $CADF_i$) to obtain the CIPS statistic

$$CIPS = \frac{1}{n}\sum_{i=1}^{n}CADF_i = \frac{1}{n}\sum_{i=1}^{n}t_i(n,T) \tag{5.28}$$

The joint asymptotic limit of the CIPS statistic is nonstandard, and critical values are provided for various choices of $n$ and $T$. The t-tests based on this regression should be devoid of $\lambda_i f_t$ in the limit and therefore free of cross-sectional dependence. The limiting distribution of these tests differs from the Dickey-Fuller distribution due to the presence of the cross-sectional average of the lagged level. Pesaran uses a truncated version of the IPS test that avoids the problem of moment calculation. That is,

$$t_i^*(n,T) = \begin{cases} t_i(n,T) & \text{if } -K_1 < t_i(n,T) < K_2 \\ -K_1 & \text{if } t_i(n,T) \le -K_1 \\ K_2 & \text{if } t_i(n,T) \ge K_2 \end{cases} \tag{5.29}$$

Pesaran suggests $K_1 = 6.19$ and $K_2 = 2.61$ when the CADF regression includes an intercept only; the values of $K_1$ and $K_2$ depend on the deterministic terms included in the CADF regression.
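The truncation rule in (5.29) amounts to clipping each CADF t-statistic before averaging; a minimal sketch with the intercept-only bounds (the input t-statistics below are hypothetical) is:

```python
import numpy as np

def cips_truncated(cadf_t, K1=6.19, K2=2.61):
    """Truncated CIPS statistic of (5.28)-(5.29): clip each CADF
    t-statistic to [-K1, K2] before averaging.  The default bounds
    are the intercept-only values."""
    t = np.clip(np.asarray(cadf_t, dtype=float), -K1, K2)
    return t.mean()

# hypothetical CADF t-statistics, one per cross-section unit
print(round(cips_truncated([-2.4, -8.0, -1.1, 3.5, -2.9]), 2))  # -> -2.0
```

The clipping keeps a single extreme unit (here the $-8.0$ and $3.5$ values) from dominating the average, which is what makes the moments of the truncated statistic well behaved.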

5.4 Cointegration Analysis in Panel Data
5.4.1 Residual-Based Approaches to Panel Cointegration
Like panel unit root tests, panel cointegration tests can be motivated by the search for more powerful tests than those obtained by applying individual time series cointegration tests. The latter are known to have low power, especially for short $T$ and short spans of the data.

1. Kao Tests
Kao (1999) presented two types of cointegration tests for panel data: DF-type and ADF-type tests. Consider the panel regression model

$$y_{it} = x_{it}'\beta + z_{it}'\gamma + e_{it} \tag{5.30}$$

where $y_{it}$ and $x_{it}$ are $I(1)$ and noncointegrated. For $z_{it} = \{\mu_i\}$, Kao (1999) proposed DF- and ADF-type unit root tests on $e_{it}$ as tests of the null of no cointegration. The DF-type tests can be calculated from the fixed-effects residuals

$$\hat e_{it} = \rho\,\hat e_{i,t-1} + v_{it} \tag{5.31}$$

where $\hat e_{it} = \tilde y_{it} - \tilde x_{it}'\hat\beta$, with $\tilde y_{it} = y_{it} - \bar y_i$ and $\tilde x_{it} = x_{it} - \bar x_i$. To test the null hypothesis of no cointegration, the null can be written as $H_0: \rho = 1$. The OLS estimate of $\rho$ and its t-statistic are given by

$$\hat\rho = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{it}\hat e_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{i,t-1}^2}$$

and

$$t_\rho = \frac{(\hat\rho - 1)\sqrt{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{i,t-1}^2}}{s_e} \tag{5.32}$$

where $s_e^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T}\left(\hat e_{it}-\hat\rho\,\hat e_{i,t-1}\right)^2$. Kao proposed the following four DF-type tests:

$$DF_\rho = \frac{\sqrt{N}\,T(\hat\rho-1)+3\sqrt{N}}{\sqrt{10.2}}$$

$$DF_t = \sqrt{1.25}\,t_\rho + \sqrt{1.875N}$$

$$DF_\rho^* = \frac{\sqrt{N}\,T(\hat\rho-1)+\dfrac{3\sqrt{N}\,\hat\sigma_v^2}{\hat\sigma_{0v}^2}}{\sqrt{3+\dfrac{36\,\hat\sigma_v^4}{5\,\hat\sigma_{0v}^4}}}$$

and

$$DF_t^* = \frac{t_\rho+\dfrac{\sqrt{6N}\,\hat\sigma_v}{2\hat\sigma_{0v}}}{\sqrt{\dfrac{\hat\sigma_{0v}^2}{2\hat\sigma_v^2}+\dfrac{3\hat\sigma_v^2}{10\hat\sigma_{0v}^2}}}$$

where $\hat\sigma_v^2 = \hat\Sigma_{yy}-\hat\Sigma_{yx}\hat\Sigma_{xx}^{-1}\hat\Sigma_{xy}$ and $\hat\sigma_{0v}^2 = \hat\Omega_{yy}-\hat\Omega_{yx}\hat\Omega_{xx}^{-1}\hat\Omega_{xy}$, with $\hat\Omega$ the estimator of the long-run covariance of $w_{it} = (\Delta y_{it},\Delta x_{it}')'$ and $\hat\Sigma$ the estimator of the contemporaneous covariance of $w_{it}$. While $DF_\rho$ and $DF_t$ are based on strong exogeneity of the regressors and disturbances, $DF_\rho^*$ and $DF_t^*$ allow for an endogenous relationship between regressors and disturbances. For the ADF test, we run the following regression:

$$\hat e_{it} = \rho\,\hat e_{i,t-1}+\sum_{j=1}^{p}\vartheta_j\,\Delta\hat e_{i,t-j}+v_{it} \tag{5.33}$$

Under the null hypothesis of no cointegration, the ADF test statistic is constructed as

$$ADF = \frac{t_{ADF}+\dfrac{\sqrt{6N}\,\hat\sigma_v}{2\hat\sigma_{0v}}}{\sqrt{\dfrac{\hat\sigma_{0v}^2}{2\hat\sigma_v^2}+\dfrac{3\hat\sigma_v^2}{10\hat\sigma_{0v}^2}}} \tag{5.34}$$

where $t_{ADF}$ is the t-statistic of $\rho$ in (5.33). The asymptotic distributions of $DF_\rho$, $DF_t$, $DF_\rho^*$, $DF_t^*$ and $ADF$ converge to a standard normal distribution $N(0,1)$ by sequential limit theory.
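The $DF_\rho$ statistic, which needs no long-run variance estimate, can be sketched as follows for a model with a homogeneous slope $\beta$ and individual intercepts; `kao_df_rho` is our illustrative helper:

```python
import numpy as np

def kao_df_rho(y, x):
    """Kao (1999) DF_rho statistic (strong-exogeneity case): pooled
    AR(1) on the fixed-effects residuals of y on x, then
    DF_rho = (sqrt(N)*T*(rho-1) + 3*sqrt(N)) / sqrt(10.2)."""
    T, N = y.shape
    yd = y - y.mean(axis=0)              # within (fixed effects) transformation
    xd = x - x.mean(axis=0)
    beta = (xd * yd).sum() / (xd ** 2).sum()
    e = yd - beta * xd                   # residuals e_it, shape (T, N)
    rho = (e[1:] * e[:-1]).sum() / (e[:-1] ** 2).sum()
    return (np.sqrt(N) * T * (rho - 1) + 3 * np.sqrt(N)) / np.sqrt(10.2)

rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal((100, 15)), axis=0)
y = np.cumsum(rng.standard_normal((100, 15)), axis=0)  # independent of x
print(round(kao_df_rho(y, x), 2))
```

With independent random walks (no cointegration) the residual autoregression stays near a unit root; with a genuinely cointegrated panel the residuals are stationary, $\hat\rho$ falls well below one, and $DF_\rho$ diverges to large negative values.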

2. Pedroni Tests
Pedroni (1999, 2004) also proposed several tests for the null hypothesis of no cointegration in a panel data model that allows for considerable heterogeneity. Pedroni considered the following type of regression:

$$y_{it} = \alpha_i + \delta_i t + X_{it}'\beta_i + e_{it} \tag{5.35}$$

for a time series panel of observables $y_{it}$ and $X_{it}$ for members $i = 1,\ldots,N$ over time periods $t = 1,\ldots,T$, where $X_{it}$ is an $m$-dimensional column vector for each member $i$ and $\beta_i$ is an $m$-dimensional column vector of coefficients for each member $i$. The variables $y_{it}$ and $X_{it}$ are assumed to be $I(1)$ for each member $i$ of the panel, and under the null of no cointegration the residual $e_{it}$ will also be $I(1)$. The parameters $\alpha_i$ and $\delta_i$ allow for member-specific fixed effects and deterministic trends, respectively. The slope coefficients $\beta_i$ are also permitted to vary by individual, so that in general the cointegrating vectors may be heterogeneous across members of the panel.

The DF-type and ADF-type tests can be calculated from the fixed-effects residuals

$$\hat e_{it} = \hat\rho_i\,\hat e_{i,t-1}+\hat v_{it}$$

$$\hat e_{it} = \hat\rho_i\,\hat e_{i,t-1}+\sum_{j=1}^{p}\hat\vartheta_{ij}\,\Delta\hat e_{i,t-j}+\hat v_{it}$$

The null and alternative hypotheses for the cointegration tests are

$$H_0: \rho_i = 1; \qquad H_1: \rho_i = \rho < 1 \quad (i = 1,2,\ldots,N)$$

and

$$H_0: \rho_i = 1; \qquad H_1: \rho_i < 1 \quad (i = 1,2,\ldots,N)$$

To study the distributional properties of the above tests, Pedroni described the DGP in terms of the partitioned vector $Z_{it} = (Y_{it}, X_{it}')'$ such that the true process $Z_{it}$ is generated as $Z_{it} = Z_{i,t-1}+\xi_{it}$, for $\xi_{it} = (\xi_{it}^Y,\xi_{it}^{X\prime})'$. We then assume that for each member $i$ the following condition holds with regard to the time series dimension.

Assumption 5.3.1 (Invariance Principle) The process $\xi_{it} = (\xi_{it}^Y,\xi_{it}^{X\prime})'$ satisfies $\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]}\xi_{it} \Rightarrow B_i(\Omega_i)$ for each member $i$ as $T\to\infty$, where $\Rightarrow$ signifies weak convergence and $B_i(\Omega_i)$ is vector Brownian motion with asymptotic covariance $\Omega_i$, such that the $m\times m$ lower diagonal block $\Omega_{22i} > 0$, and where the $B_i(\Omega_i)$ are taken to be defined on the same probability space for all $i$.

The above assumption states that the standard functional CLT is assumed to hold individually for each member series as $T$ grows large. The $(m+1)\times(m+1)$ asymptotic covariance matrix is given by

$$\Omega_i = \lim_{T\to\infty}E\left[T^{-1}\left(\sum_{t=1}^{T}\xi_{it}\right)\left(\sum_{t=1}^{T}\xi_{it}\right)'\right] \tag{5.36}$$

and can be decomposed as

$$\Omega_i = \Omega_i^0+\Gamma_i+\Gamma_i' \tag{5.37}$$

where $\Omega_i^0$ and $\Gamma_i$ represent the contemporaneous and dynamic covariances, respectively, of $\xi_{it}$ for a given $i$.

Assumption 5.3.2 (Cross-Sectional Independence) The individual processes are assumed to be i.i.d. cross-sectionally, so that $E(\xi_{it}\xi_{js}') = 0$ for all $s,t$ and $i\neq j$. More generally, the asymptotic long-run variance matrix for a panel of size $N\times T$ is a block diagonal positive definite matrix, $\Omega = \mathrm{diag}(\Omega_1,\ldots,\Omega_N)$.

Pedroni's tests can be classified into two categories. The first set (within-dimension) is similar to the tests discussed above, and involves averaging test statistics for cointegration in the time series across cross-sections. For the second set (between-dimension), the averaging is done in pieces, so that the limiting distributions are based on limits of piecewise numerator and denominator terms. The basic approach in both cases is to first estimate the hypothesized cointegrating relationship separately for each member of the panel and then pool the resulting residuals when constructing the panel tests for the null of no cointegration. Specifically, in the first step, one estimates the proposed cointegrating regression for each individual member of the panel in the form of (5.35), including idiosyncratic intercepts or trends as the particular model warrants, to obtain the corresponding residuals $\hat e_{it}$. In the second step, the way in which the estimated residuals are pooled differs among the various statistics, which are defined as follows.

Panel variance ratio statistic:

$$Z_{\hat v_{NT}} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\,\hat e_{i,t-1}^2\right)^{-1} \tag{5.38}$$

Panel-rho statistic:

$$Z_{\hat\rho_{NT}-1} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\,\hat e_{i,t-1}^2\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\left(\hat e_{i,t-1}\Delta\hat e_{it}-\hat\lambda_i\right) \tag{5.39}$$

Panel-t statistic:

$$Z_{t_{NT}} = \left(\tilde\sigma_{NT}^2\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\,\hat e_{i,t-1}^2\right)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\left(\hat e_{i,t-1}\Delta\hat e_{it}-\hat\lambda_i\right) \tag{5.40}$$

Group-rho statistic:

$$\tilde Z_{\hat\rho_{NT}-1} = \sum_{i=1}^{N}\left(\sum_{t=1}^{T}\hat e_{i,t-1}^2\right)^{-1}\sum_{t=1}^{T}\left(\hat e_{i,t-1}\Delta\hat e_{it}-\hat\lambda_i\right) \tag{5.41}$$

Group-t statistic:

$$\tilde Z_{t_{NT}} = \sum_{i=1}^{N}\left(\hat\sigma_i^2\sum_{t=1}^{T}\hat e_{i,t-1}^2\right)^{-1/2}\sum_{t=1}^{T}\left(\hat e_{i,t-1}\Delta\hat e_{it}-\hat\lambda_i\right) \tag{5.42}$$

where $\hat\mu_{it} = \hat e_{it}-\hat\rho_i\hat e_{i,t-1}$, $\hat\lambda_i = \frac{1}{T}\sum_{s=1}^{K_i}w_{sK_i}\sum_{t=s+1}^{T}\hat\mu_{it}\hat\mu_{i,t-s}$ for some choice of lag window $w_{sK_i} = 1-\frac{s}{1+K_i}$, $\hat s_i^2 = \frac{1}{T}\sum_{t=1}^{T}\hat\mu_{it}^2$, $\hat\sigma_i^2 = \hat s_i^2+2\hat\lambda_i$, $\tilde\sigma_{NT}^2 = \frac{1}{N}\sum_{i=1}^{N}\hat L_{11i}^{-2}\hat\sigma_i^2$, and $\hat L_{11i}^2 = \hat\Omega_{11i}-\hat\Omega_{21i}'\hat\Omega_{22i}^{-1}\hat\Omega_{21i}$, where $\hat\Omega_i$ is a consistent estimator of $\Omega_i$.

The first three statistics are based on pooling the data along the within-dimension of the panel; the last two statistics are constructed by pooling the data along the between-dimension of the panel.
Pedroni (1999) derived asymptotic distributions and critical values for several residual-based tests of the null of no cointegration in panels with multiple regressors. Each standardized statistic satisfies

$$\frac{\kappa_{NT}-\mu_K\sqrt{N}}{\sqrt{v_K}} \Rightarrow N(0,1) \quad (\text{as } T,N\to\infty)_{seq} \tag{5.43}$$

where $\kappa_{NT} = \left(T^2N^{3/2}Z_{\hat v_{NT}},\; T\sqrt{N}Z_{\hat\rho_{NT}-1},\; Z_{t_{NT}},\; TN^{-1/2}\tilde Z_{\hat\rho_{NT}-1},\; N^{-1/2}\tilde Z_{t_{NT}}\right)$ for each of the $K = 1,\ldots,5$ statistics; the values of $\mu_K$ and $v_K$ can be found in the tables of Pedroni (1999), and depend on whether the model includes estimated fixed effects $\hat\alpha_i$, or estimated fixed effects and estimated trends $\hat\alpha_i,\hat\delta_i$.

Thus, to test the null of no cointegration, one simply computes the value of the statistic in the form of (5.43) above, based on the values of $\mu_K$ and $v_K$ from Table II in Pedroni (1999), and compares it to the appropriate tail of the normal distribution. Under the alternative hypothesis, the panel variance statistic diverges to positive infinity, and consequently the right tail of the normal distribution is used to reject the null hypothesis: large positive values imply that the null of no cointegration is rejected. Each of the other four test statistics diverges to negative infinity under the alternative hypothesis, so the left tail of the normal distribution is used; for these tests, large negative values imply that the null of no cointegration is rejected.
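As an illustration of how the pieces fit together, the group-rho statistic of (5.41) can be sketched from a matrix of first-stage residuals. This is a rough sketch under our own simplifications (fixed Bartlett width, no small-sample corrections), and the result must still be standardized with the tabulated moments $\mu_K$ and $v_K$ as in (5.43):

```python
import numpy as np

def pedroni_group_rho(E, K=4):
    """Illustrative group-rho computation of (5.41), returned in the
    scaled form T*N**(-1/2)*Z that enters (5.43).  E is a (T, N)
    matrix of first-stage cointegrating-regression residuals e_it;
    K is the Bartlett window width used for lambda_i."""
    T, N = E.shape
    total = 0.0
    for i in range(N):
        e = E[:, i]
        de = np.diff(e)
        rho_i = (e[:-1] @ e[1:]) / (e[:-1] @ e[:-1])
        mu = e[1:] - rho_i * e[:-1]          # AR(1) residuals mu_it
        lam = 0.0                            # one-sided long-run variance
        for s in range(1, K + 1):
            lam += (1 - s / (1 + K)) * (mu[s:] @ mu[:-s]) / T
        num = (e[:-1] * de).sum() - (T - 1) * lam
        total += num / (e[:-1] ** 2).sum()
    return T * total / np.sqrt(N)

rng = np.random.default_rng(6)
E_null = np.cumsum(rng.standard_normal((120, 10)), axis=0)  # I(1) residuals
print(round(pedroni_group_rho(E_null), 2))
```

With $I(1)$ residuals (the null of no cointegration) the per-member ratios stay near zero; with stationary residuals each ratio is strongly negative and the statistic diverges, matching the left-tail rejection rule described above.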

5.3.2 Tests for Multiple Cointegration


It is also possible to adapt Johansens (1995) multivariate test based on a VAR

representation of the variables. Let i ( r ) denote the cross-section specific

likelihood-ratio (trace) statistic of the hypothesis that there are (at most) r stationary

linear combinations in the cointegrated VAR system given by Yit = ( yit1 ," , yitk ) .

Following the unit root test proposed in IPS (2003), Larsson et al. (2001) suggested the
standardized LR-bar statistic
N i ( r ) E i ( r )
 (r ) = 1
(5.44)
N i =1 Var i ( r )

to test the null hypothesis that $r = r_0$ against the alternative that $r > r_0$.
Using a sequential limit theory it can be shown that $\bar\lambda(r)$ is asymptotically standard
normally distributed. Asymptotic values of $E[\lambda_i(r)]$ and $Var[\lambda_i(r)]$ are tabulated in
Larsson et al. (2001) for the model without deterministic terms and in Breitung (2005) for
models with a constant and a linear time trend. Unlike the residual-based tests, the
LR-bar test allows for the possibility of multiple cointegrating relations in the panel.
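A sketch of computing (5.44) from the $N$ individual trace statistics, given the tabulated moments (variable names are illustrative):

```python
import math

def lr_bar(trace_stats, mean_lr, var_lr):
    """Standardized LR-bar statistic (5.44): center each cross-section trace
    statistic by its tabulated asymptotic mean, scale by the tabulated
    standard deviation, and average over the N units with a sqrt(N) blow-up."""
    N = len(trace_stats)
    s = sum((lam - mean_lr) / math.sqrt(var_lr) for lam in trace_stats)
    return s / math.sqrt(N)
```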

5.3.3 Panel Cointegration Tests allowing for cross-sectional

dependence: The Durbin-Hausman Tests
Westerlund (2008) proposes two new panel cointegration tests that can be applied under very
general conditions, and that are shown by simulation to be more powerful than other
existing tests.

1) Model and Assumption


We begin by assuming that the Fisher equation holds so that
$$i_{it} = \alpha_i + \beta_i \pi_{it} + z_{it} \qquad (5.45)$$
$$\pi_{it} = \rho_i \pi_{i,t-1} + w_{it} \qquad (5.46)$$
where $\pi_{it}$ is the actual rate of inflation observed at time period $t$ for country $i$, and $i_{it}$ is the
ex post nominal interest rate on a nominal bond.
We have argued above that, although the nominal interest rate can in general be
viewed as nonstationary, it seems reasonable to permit inflation to be stationary. Therefore,
we do not impose any a priori restrictions on the value taken by $\rho_i$. If $\rho_i = 1$, then
inflation is nonstationary, whereas if $|\rho_i| < 1$, inflation is stationary.

The disturbance $z_{it}$ is assumed to obey the following set of equations that allow for
cross-sectional dependence through the use of common factors:
$$z_{it} = \lambda_i' F_t + e_{it} \qquad (5.47)$$
$$F_{jt} = \delta_j F_{j,t-1} + u_{jt} \qquad (5.48)$$
$$e_{it} = \phi_i e_{i,t-1} + v_{it} \qquad (5.49)$$
where $F_t$ is a $k$-dimensional vector of common factors $F_{jt}$ with $j = 1,\ldots,k$, and $\lambda_i$ is a
conformable vector of factor loadings. By assuming that $|\delta_j| < 1$ for all $j$, we ensure that
$F_t$ is stationary, which implies that the order of integration of the composite regression
error $z_{it}$ depends only on the integratedness of the idiosyncratic disturbance $e_{it}$. Thus,
in this data-generating process, the relationship in (5.45) is cointegrated if $|\phi_i| < 1$ and it is
spurious if $\phi_i = 1$. Note that, since $i_{it}$ is assumed to be nonstationary, $|\phi_i| < 1$ implies
both that $\pi_{it}$ is nonstationary and that it is cointegrated with $i_{it}$.


Next, we lay out the key assumptions needed for developing the new tests.
Assumption 1 (error process).
(a) $v_{it}$ and $w_{it}$ are mean zero for all $i$ and $t$;
(b) $E(v_{it}v_{kj}) = 0$ and $E(w_{it}w_{kj}) = 0$ for all $i \neq k$, $t$ and $j$;
(c) $E(v_{it}w_{kj}) = 0$ for all $i$, $k$, $t$ and $j$;
(d) $var(v_{it}) = \sigma_i^2 < \infty$ and $var(w_{it}) = \Omega_i$ is positive definite.

For the asymptotic theory, the following condition is also required.

Assumption 2 (invariance principle).
The partial sum processes of $v_{it}$ and $w_{it}$ satisfy an invariance principle. In
particular, $T^{-1/2}\sum_{t=1}^{[rT]} v_{it} \Rightarrow \sigma_i W_i(r)$ as $T \to \infty$ for each $i$, where $W_i(r)$ is a standard
Brownian motion defined on the unit interval $r \in [0,1]$.

Finally, to be able to handle the common factors, the following conditions are assumed
to hold.
Assumption 3 (common factors).
(a) $E(u_t) = 0$ and $var(u_t) < \infty$;
(b) $u_t$ is independent of $v_{it}$ and $w_{it}$ for all $i$ and $t$;
(c) $\frac{1}{N}\sum_{i=1}^N \lambda_i\lambda_i' \to \Sigma$ as $N \to \infty$, where $\Sigma$ is positive definite;
(d) $|\delta_j| < 1$ for all $j$.


2) Test Construction

Our objective is to test whether $i_{it}$ and $\pi_{it}$ are cointegrated by inferring whether
$e_{it}$ is stationary. A natural way to do this is to employ the Bai and Ng (2004)
approach, which amounts to first estimating (5.45) in first-difference form by OLS and
then estimating the common factors by applying the method of principal components to
the resulting residuals. A test of the null hypothesis of no cointegration can then be
implemented as a unit root test of the recumulated sum of the defactored and
first-differenced residuals.
We begin by taking first differences, in which case (5.47) becomes
$$\Delta z_{it} = \lambda_i'\Delta F_t + \Delta e_{it}.$$
Thus, had $\Delta z_{it}$ been known, we could have estimated $\lambda_i$ and $\Delta F_t$ directly by the
method of principal components. However, $\Delta z_{it}$ is not known, and we must therefore
apply principal components to its OLS estimate instead, which can be written as
$$\Delta\hat z_{it} = \Delta i_{it} - \hat\beta_i \Delta\pi_{it}, \qquad (5.50)$$
where $\hat\beta_i$ is obtained by regressing $\Delta i_{it}$ on $\Delta\pi_{it}$. Let $\lambda$, $\Delta F$ and $\Delta\hat z$ be $K \times N$,
$(T-1) \times K$ and $(T-1) \times N$ matrices of stacked observations on $\lambda_i$, $\Delta F_t$ and $\Delta\hat z_{it}$,
respectively. The principal components estimator $\Delta\hat F$ of $\Delta F$ can be obtained by
computing $\sqrt{T-1}$ times the eigenvectors corresponding to the $K$ largest eigenvalues of
the $(T-1) \times (T-1)$ matrix $\Delta\hat z\Delta\hat z'$. The corresponding matrix of estimated factor loadings
is given by $\hat\lambda = \Delta\hat F'\Delta\hat z/(T-1)$. Given $\hat\lambda_i$ and $\Delta\hat F_t$, the defactored and first-differenced
residuals can be recovered as
$$\Delta\hat e_{it} = \Delta\hat z_{it} - \hat\lambda_i'\Delta\hat F_t, \qquad (5.51)$$
which can be recumulated to obtain
$$\hat e_{it} = \sum_{j=1}^t \Delta\hat e_{ij}.$$

As shown in the Appendix, $\hat e_{it}$ is a consistent estimate of $e_{it}$, which suggests that the
cointegration test can be implemented with $\hat e_{it}$ in place of $e_{it}$. In other
words, testing the null hypothesis of no cointegration is asymptotically equivalent to
testing whether $\phi_i = 1$ in the following autoregression:
$$\hat e_{it} = \phi_i \hat e_{i,t-1} + \text{error}. \qquad (5.52)$$
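The defactoring steps can be sketched as follows (a stylized implementation of the principal-components recipe above; the input `dz` plays the role of the $(T-1)\times N$ matrix of first-differenced OLS residuals $\Delta\hat z$):

```python
import numpy as np

def defactor_and_recumulate(dz, K):
    """Estimate K common factors from the first-differenced residuals dz
    ((T-1) x N) by principal components, remove them, and recumulate the
    defactored residuals as in (5.51)."""
    Tm1 = dz.shape[0]
    # eigenvectors of the (T-1) x (T-1) matrix dz dz' for the K largest eigenvalues
    _, vecs = np.linalg.eigh(dz @ dz.T)
    F = np.sqrt(Tm1) * vecs[:, -K:]      # estimated factors Delta F_hat
    lam = F.T @ dz / Tm1                 # estimated loadings lambda_hat
    de = dz - F @ lam                    # defactored first differences (5.51)
    return np.cumsum(de, axis=0)         # recumulated residuals e_hat
```

With a rank-one `dz` (a single exact factor) the defactored residuals are zero, which is a quick sanity check on the projection.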

At this point, it is useful to let $n$ denote the number of units for which the no-cointegration
restriction $\phi_i = 1$ is to be tested. This number can be equal to $N$ but can also be a subset.
The point of having two sets for the cross-section is to highlight the fact that, even if $n < N$,
accuracy will be gained by using all $N$ units in the estimation of the common factors. In
what follows, we shall propose two new panel cointegration tests that are based on
applying the Durbin-Hausman principle to (5.52) (see Choi, 1994). The first, the panel test,
is constructed under the maintained assumption that $\phi_i = \phi$ for all $i$, while the second,
the group mean test, is not. Both tests are composed of two estimators of $\phi_i$ that have
different probability limits under the alternative hypothesis of cointegration but share the
property of consistency under the null of no cointegration. In particular, let $\hat\phi_i$ denote the
OLS estimator of $\phi_i$ in (5.52), and let $\hat\phi$ denote its pooled counterpart. The
corresponding individual and pooled instrumental variable (IV) estimators of $\phi_i$, denoted
$\tilde\phi_i$ and $\tilde\phi$, respectively, are obtained by simply instrumenting $\hat e_{i,t-1}$ with $\hat e_{it}$.
As shown by Choi (1994), the IV estimators are consistent under the null hypothesis
but are inconsistent under the alternative. On the other hand, the OLS estimators are
consistent both under the null and alternative hypotheses (see Phillips and Ouliaris, 1990).
The IV and OLS estimators can thus be used to construct the Durbin-Hausman tests.

In so doing, consider the following kernel estimator:
$$\hat\omega_i^2 = \frac{1}{T-1}\sum_{j=-M_i}^{M_i}\left(1 - \frac{|j|}{M_i+1}\right)\sum_{t=j+1}^{T}\hat v_{it}\hat v_{i,t-j},$$
where $\hat v_{it}$ is the OLS residual obtained from (5.45) and $M_i$ is a bandwidth parameter
that determines how many autocovariances of $\hat v_{it}$ to estimate in the kernel. As indicated
in the Appendix, the quantity $\hat\omega_i^2$ is a consistent estimate of $\omega_i^2$, the long-run variance
of $v_{it}$. The corresponding contemporaneous variance estimate is denoted by $\hat\sigma_i^2$. Given
these estimates, we can construct the two variance ratios $\hat S_i = \hat\omega_i^2/\hat\sigma_i^4$ and
$\hat S_n = \hat\omega_n^2/(\hat\sigma_n^2)^2$, where
$$\hat\omega_n^2 = \frac{1}{n}\sum_{i=1}^n \hat\omega_i^2 \quad\text{and}\quad \hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n \hat\sigma_i^2.$$
The Durbin-Hausman test statistics can now be obtained as
$$DH_g = \sum_{i=1}^n \hat S_i\left(\tilde\phi_i - \hat\phi_i\right)^2 \sum_{t=2}^T \hat e_{i,t-1}^2 \quad\text{and}\quad DH_p = \hat S_n\left(\tilde\phi - \hat\phi\right)^2 \sum_{i=1}^n\sum_{t=2}^T \hat e_{i,t-1}^2. \qquad (5.53)$$
i =1 t =2 i =1 t = 2

Note that while the panel statistic, denoted $DH_p$, is constructed by summing the $n$
individual terms before multiplying them together, the group mean statistic, denoted
$DH_g$, is constructed by first multiplying the various terms and then summing. The
importance of this distinction lies in the formulation of the alternative hypothesis. For the
panel test, the null and alternative hypotheses are formulated as $H_0: \phi_i = 1$ for all
$i = 1,\ldots,n$ versus $H_1^p: \phi_i = \phi$ and $\phi < 1$ for all $i$. Hence, in this case, we are in effect
presuming a common value for the autoregressive parameter both under the null and
alternative hypotheses. Thus, if this assumption holds, a rejection of the null should be
taken as evidence in favor of cointegration for all $n$ units.
By contrast, for the group mean test, $H_0$ is tested versus the alternative
$H_1^g: \phi_i < 1$ for at least some $i$. Thus, in this case, we are not presuming a common value
for the autoregressive parameter and, as a consequence, a rejection of the null cannot be
taken to suggest that all $n$ units are cointegrated. Instead, a rejection should be interpreted
as providing evidence in favor of rejecting the null hypothesis for at least some of the
cross-sectional units.
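A stylized implementation of the pieces above, i.e. the Bartlett-kernel long-run variance, the OLS and IV estimators of $\phi_i$, and the two statistics in (5.53); the data handling and bandwidth choice are simplifying assumptions:

```python
import numpy as np

def bartlett_lrv(v, M):
    """Kernel estimator of the long-run variance omega_i^2 with Bartlett
    weights 1 - |j|/(M+1) and bandwidth M, as in the formula above."""
    T = len(v)
    total = 0.0
    for j in range(-M, M + 1):
        w = 1.0 - abs(j) / (M + 1.0)
        total += w * sum(v[t] * v[t - abs(j)] for t in range(abs(j), T))
    return total / (T - 1)

def durbin_hausman(e, omega2, sigma2):
    """DH_g and DH_p of (5.53).  e is T x n (recumulated defactored
    residuals); omega2 and sigma2 are length-n arrays of long-run and
    contemporaneous variance estimates."""
    e1, e0 = e[:-1], e[1:]                          # e_{t-1} and e_t
    phi_ols = (e1 * e0).sum(0) / (e1 ** 2).sum(0)   # OLS estimator of phi_i
    phi_iv = (e0 ** 2).sum(0) / (e0 * e1).sum(0)    # IV: e_{t-1} instrumented by e_t
    S_i = omega2 / sigma2 ** 2
    dh_g = float((S_i * (phi_iv - phi_ols) ** 2 * (e1 ** 2).sum(0)).sum())
    phi_ols_p = (e1 * e0).sum() / (e1 ** 2).sum()   # pooled OLS
    phi_iv_p = (e0 ** 2).sum() / (e0 * e1).sum()    # pooled IV
    S_n = omega2.mean() / sigma2.mean() ** 2
    dh_p = float(S_n * (phi_iv_p - phi_ols_p) ** 2 * (e1 ** 2).sum())
    return dh_g, dh_p
```

Both statistics are nonnegative by construction, since each is a squared estimator contrast scaled by positive variance ratios.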

3) Asymptotic Distribution
The Durbin-Hausman tests are based on the estimated idiosyncratic error term
$\hat e_{it}$, and are therefore asymptotically independent of the common factors. As shown in
Westerlund's Appendix, under the null and Assumptions 1-3, each of the individual group
mean statistics converges to
$$B_i = \left(\int_0^1 W_i(r)^2 dr\right)^{-1} \quad\text{as } N, T \to \infty.$$
The fact that $B_i$ does not depend on the common factors is a direct consequence of the
defactoring, which asymptotically removes the common components from the limiting
distribution of the individual tests. The following theorem shows that the effect of the
common factors is asymptotically negligible, and that $DH_g$ and $DH_p$ are indeed
asymptotically normal.

Theorem (asymptotic distribution). Under the null hypothesis $H_0$ and Assumptions
1-3(c), for $\phi_i = 1$ and $|\delta_j| < 1$, as $N, T \to \infty$ with $n/N \to 0$ and $N/T \to 0$,
$$n^{-1/2}\left(DH_g - nE(B_i)\right) \Rightarrow N\left(0, Var(B_i)\right),$$
$$n^{-1/2}\left(DH_p - nE(C_i)^{-1}\right) \Rightarrow N\left(0, E(C_i)^{-4}Var(C_i)\right),$$
where $C_i$ is the inverse of $B_i$.

5.4 Pooled Mean Group Estimation of Nonstationary

Heterogeneous Panels

5.4.1 The Model Specification


The asymptotics of large n, large T dynamic panels are different from the asymptotics
of traditional large n, small T dynamic panels. Small T panel estimation usually relies on
fixed- or random-effects estimators, or a combination of fixed-effects estimators and
instrumental-variable estimators, such as the Arellano and Bond (1991) generalized
method-of-moments estimator. These methods require pooling individual groups and
allowing only the intercepts to differ across the groups. One of the central findings from
the large n, large T literature, however, is that the assumption of homogeneity of slope
parameters is often inappropriate.
With the increase in time observations inherent in large n, large T dynamic panels,
nonstationarity is also a concern. Recent papers by Pesaran, Shin, and Smith (1997, 1999)
offer two important new techniques to estimate nonstationary dynamic panels in which the
parameters are heterogeneous across groups: the mean-group (MG) and pooled
mean-group (PMG) estimators. The MG estimator (see Pesaran and Smith 1995) relies
on estimating n time-series regressions and averaging the coefficients, whereas the PMG
estimator (see Pesaran, Shin, and Smith 1997, 1999) relies on a combination of pooling
and averaging of coefficients.
Assume an autoregressive distributed lag (ARDL) dynamic panel specification of the
form
$$y_{it} = \sum_{j=1}^p \lambda_{ij}\, y_{i,t-j} + \sum_{j=0}^q \delta_{ij}' X_{i,t-j} + \mu_i + \varepsilon_{it} \qquad (5.55)$$
where the number of groups $i = 1,2,\ldots,n$; the number of periods $t = 1,2,\ldots,T$; $X_{it}$ is a
$k \times 1$ vector of explanatory variables; $\delta_{ij}$ are the $k \times 1$ coefficient vectors; $\lambda_{ij}$ are
scalars; $\mu_i$ is the group-specific effect; and $var(\varepsilon_{it}) = \sigma_i^2$. $T$ must be large enough
that the model can be fitted for each group separately. Time trends and other fixed
regressors may be included.

If the variables in (5.55) are, for example, $I(1)$ and cointegrated, then the error term
is an $I(0)$ process for all $i$. A principal feature of cointegrated variables is their
responsiveness to any deviation from long-run equilibrium: the short-run dynamics of the
variables in the system are influenced by the deviation from equilibrium. Thus it is
common to reparameterize (5.55) into the error correction equation
$$\Delta y_{it} = \phi_i\left(y_{i,t-1} - \theta_i' X_{it}\right) + \sum_{j=1}^{p-1}\lambda_{ij}^* \Delta y_{i,t-j} + \sum_{j=0}^{q-1}\delta_{ij}^{*\prime} \Delta X_{i,t-j} + \mu_i + \varepsilon_{it} \qquad (5.56)$$
where $\phi_i = -\left(1 - \sum_{j=1}^p \lambda_{ij}\right)$, $\theta_i = \sum_{j=0}^q \delta_{ij}\Big/\left(1 - \sum_k \lambda_{ik}\right)$, $\lambda_{ij}^* = -\sum_{m=j+1}^p \lambda_{im}$ for
$j = 1,2,\ldots,p-1$, and $\delta_{ij}^* = -\sum_{m=j+1}^q \delta_{im}$ for $j = 1,2,\ldots,q-1$.

The parameter $\phi_i$ is the error-correcting speed-of-adjustment term. If $\phi_i = 0$, then
there would be no evidence of a long-run relationship. This parameter is expected to be
significantly negative under the prior assumption that the variables return to a
long-run equilibrium. Of particular importance is the vector $\theta_i$, which contains the
long-run relationships between the variables.
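For a single group with one regressor (a simplifying assumption), the mapping from the ARDL coefficients in (5.55) to the error-correction parameters in (5.56) can be sketched as:

```python
def ecm_parameters(lam, delta):
    """Map ARDL(p, q) coefficients to ECM form: lam = [lambda_1, ..., lambda_p],
    delta = [delta_0, ..., delta_q] (scalars here, assuming one regressor)."""
    p, q = len(lam), len(delta) - 1
    phi = -(1.0 - sum(lam))                                  # speed of adjustment
    theta = sum(delta) / (1.0 - sum(lam))                    # long-run coefficient
    lam_star = [-sum(lam[j:]) for j in range(1, p)]          # lambda*_j, j = 1..p-1
    delta_star = [-sum(delta[j + 1:]) for j in range(1, q)]  # delta*_j, j = 1..q-1
    return phi, theta, lam_star, delta_star
```

For an ARDL(1,1) with $\lambda_1 = 0.5$ and $(\delta_0, \delta_1) = (0.3, 0.1)$, this gives $\phi = -0.5$ and $\theta = 0.8$, with no short-run lag terms.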
The recent literature on dynamic heterogeneous panel estimation in which both n
and T are large suggests several approaches to the estimation of (5.56). On one extreme,
a fixed-effects (FE) estimation approach could be used in which the time-series data for
each group are pooled and only the intercepts are allowed to differ across groups. If the
slope coefficients are in fact not identical, however, then the FE approach produces

inconsistent and potentially misleading results. On the other extreme, the model could be
fitted separately for each group, and a simple arithmetic average of the coefficients could
be calculated. This is the MG estimator proposed by Pesaran and Smith (1995). With this
estimator, the intercepts, slope coefficients, and error variances are all allowed to differ
across groups.
More recently, Pesaran, Shin, and Smith (1997, 1999) have proposed a PMG
estimator that combines both pooling and averaging. This intermediate estimator allows
the intercept, short-run coefficients, and error variances to differ across the groups (as
would the MG estimator) but constrains the long-run coefficients to be equal across
groups (as would the FE estimator). Under this assumption, relation (5.56) can be written
more compactly as

$$\Delta y_i = \phi_i \xi_i(\theta) + W_i \kappa_i + \varepsilon_i, \qquad i = 1,2,\ldots,n \qquad (5.57)$$
where $\xi_i(\theta) = y_{i,-1} - X_i\theta$, $i = 1,2,\ldots,n$, is the error correction component,
$W_i = (\Delta y_{i,-1},\ldots,\Delta y_{i,-p+1}, \Delta X_i, \Delta X_{i,-1},\ldots,\Delta X_{i,-q+1}, \iota)$ and
$\kappa_i = (\lambda_{i1}^*,\ldots,\lambda_{i,p-1}^*, \delta_{i0}^{*\prime}, \delta_{i1}^{*\prime},\ldots,\delta_{i,q-1}^{*\prime}, \mu_i)'$.
5.4.2 The Model Estimation and Inference
Since (5.57) is nonlinear in the parameters, Pesaran, Shin, and Smith (1999)
develop a maximum likelihood method to estimate the parameters. Expressing the
likelihood as the product of each cross-section's likelihood and taking the log yields
$$l_T(\psi) = -\frac{T}{2}\sum_{i=1}^n \ln 2\pi\sigma_i^2 - \frac{1}{2}\sum_{i=1}^n \frac{1}{\sigma_i^2}\left(\Delta y_i - \phi_i \xi_i(\theta)\right)' H_i \left(\Delta y_i - \phi_i \xi_i(\theta)\right) \qquad (5.58)$$
where $H_i = I_T - W_i(W_i'W_i)^{-1}W_i'$, $\psi = (\theta', \phi', \sigma')'$, $\phi = (\phi_1, \phi_2,\ldots,\phi_n)'$, and
$\sigma = (\sigma_1^2, \sigma_2^2,\ldots,\sigma_n^2)'$.
Maximum likelihood (ML) estimation of the long-run coefficients, $\theta$, and group-specific
error-correction coefficients, $\phi_i$, can be computed by maximizing (5.58) with respect to
$\psi$. These ML estimators are termed the PMG estimators to highlight both the pooling
implied by the homogeneity restrictions on the long-run coefficients and the averaging across
groups used to obtain means of the estimated error-correction coefficients and the other
short-run parameters of the model.
The PMG estimators can be computed by the familiar Newton-Raphson algorithm,
which makes use of both the first and second derivatives. Alternatively, they can be
computed by a back-substitution algorithm that makes use of only the first derivatives of
(5.58). In this case, setting the first derivatives of the concentrated log-likelihood function
with respect to $\psi$ to 0 yields the following relations in $\theta$, $\phi_i$, and $\sigma_i^2$, which need to be
solved iteratively:
$$\hat\theta = \left(-\sum_{i=1}^n \frac{\hat\phi_i^2}{\hat\sigma_i^2} X_i' H_i X_i\right)^{-1} \sum_{i=1}^n \frac{\hat\phi_i}{\hat\sigma_i^2} X_i' H_i \left(\Delta y_i - \hat\phi_i y_{i,-1}\right) \qquad (5.59)$$
$$\hat\phi_i = \left(\hat\xi_i' H_i \hat\xi_i\right)^{-1} \hat\xi_i' H_i \Delta y_i, \qquad i = 1,2,\ldots,n \qquad (5.60)$$
$$\hat\sigma_i^2 = T^{-1}\left(\Delta y_i - \hat\phi_i \hat\xi_i\right)' H_i \left(\Delta y_i - \hat\phi_i \hat\xi_i\right), \qquad i = 1,2,\ldots,n \qquad (5.61)$$
where $\hat\xi_i = y_{i,-1} - X_i\hat\theta$. Starting with an initial estimate of $\theta$, say $\hat\theta^{(0)}$, estimates of $\phi_i$
and $\sigma_i^2$ can be computed using (5.60) and (5.61), which can then be substituted in
(5.59) to obtain a new estimate of $\theta$, say $\hat\theta^{(1)}$, and so on until convergence is achieved.
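The back-substitution loop just described can be sketched as follows (a single-regressor, balanced-panel simplification; inputs are lists of per-group numpy arrays, and all names are illustrative):

```python
import numpy as np

def pmg_backsubstitution(dy, y_lag, X, W, theta0, n_iter=100):
    """Iterate (5.60), (5.61) and (5.59): given theta, compute phi_i and
    sigma_i^2 group by group, then update the common long-run coefficient
    theta, and repeat."""
    n = len(dy)
    theta = float(theta0)
    for _ in range(n_iter):
        phi, sig2 = [], []
        for i in range(n):
            T = len(dy[i])
            Hi = np.eye(T) - W[i] @ np.linalg.solve(W[i].T @ W[i], W[i].T)
            xi = y_lag[i] - theta * X[i]                           # xi_i(theta)
            phi_i = float(xi @ Hi @ dy[i]) / float(xi @ Hi @ xi)   # (5.60)
            u = dy[i] - phi_i * xi
            phi.append(phi_i)
            sig2.append(float(u @ Hi @ u) / T)                     # (5.61)
        num = den = 0.0
        for i in range(n):
            T = len(dy[i])
            Hi = np.eye(T) - W[i] @ np.linalg.solve(W[i].T @ W[i], W[i].T)
            den += -phi[i] ** 2 / sig2[i] * float(X[i] @ Hi @ X[i])
            num += phi[i] / sig2[i] * float(X[i] @ Hi @ (dy[i] - phi[i] * y_lag[i]))
        theta = num / den                                          # (5.59)
    return theta, phi, sig2
```

On data generated from (5.57) with a common long-run coefficient, the iteration started at the true value stays at that fixed point (up to the noise level).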

Under some regularity conditions, the MLEs of the short-run coefficients $\phi$ and $\kappa$ in
the dynamic heterogeneous panel data model (5.57) are $\sqrt T$-consistent and the MLE of $\theta$
is $T$-consistent, namely $\hat\theta - \theta_0 = o_P(T^{-1/2})$, $\hat\phi - \phi_0 = o_P(1)$ and $\hat\kappa - \kappa_0 = o_P(1)$.
Furthermore, for fixed $n$ and as $T \to \infty$, the MLE of $\Phi = (\theta', \phi')'$ asymptotically
has the mixture-normal distribution
$$D^{-1}\left(\hat\Phi - \Phi_0\right) \stackrel{a}{\sim} MN\{0, I(\Phi_0)^{-1}\} \qquad (5.62)$$
where $D = diag\left(T^{-1} I_k, T^{-1/2} I_n\right)$ and $I(\Phi_0)$ is the random information matrix.
More specifically, the pooled MLE $\hat\theta$, defined by (5.59), has the following large-$T$
asymptotic distribution:
$$T\left(\hat\theta - \theta_0\right) \stackrel{a}{\sim} MN\left(0, \left\{\sum_{i=1}^n \frac{\phi_{i0}^2}{\sigma_{i0}^2} R_{X_iX_i}\right\}^{-1}\right) \qquad (5.63)$$
where $R_{X_iX_i}$, $i = 1,2,\ldots,n$, are the random probability limits defined by
$T^{-2} X_i' H_i X_i \Rightarrow R_{X_iX_i}$.

Once the pooled MLE of the long-run parameters, $\hat\theta$, is computed, the
short-run coefficients, including the group-specific error-correction coefficients $\hat\phi_i$, can be
consistently estimated by running the individual ordinary least squares (OLS) regressions
of $\Delta y_i$ on $(\hat\xi_i, W_i)$, $i = 1,2,\ldots,n$, where $\hat\xi_i = y_{i,-1} - X_i\hat\theta$. The covariance matrix of the
MLEs, $(\hat\theta', \hat\phi_1,\ldots,\hat\phi_n, \hat\kappa_1',\ldots,\hat\kappa_n')'$, is then consistently estimated by the inverse of
$$\begin{pmatrix}
\sum_{i=1}^n \frac{\hat\phi_i^2 X_i'X_i}{\hat\sigma_i^2} & -\frac{\hat\phi_1 X_1'\hat\xi_1}{\hat\sigma_1^2} & \cdots & -\frac{\hat\phi_n X_n'\hat\xi_n}{\hat\sigma_n^2} & -\frac{\hat\phi_1 X_1'W_1}{\hat\sigma_1^2} & \cdots & -\frac{\hat\phi_n X_n'W_n}{\hat\sigma_n^2} \\
 & \frac{\hat\xi_1'\hat\xi_1}{\hat\sigma_1^2} & & 0 & \frac{\hat\xi_1'W_1}{\hat\sigma_1^2} & & 0 \\
 & & \ddots & & & \ddots & \\
 & & & \frac{\hat\xi_n'\hat\xi_n}{\hat\sigma_n^2} & 0 & & \frac{\hat\xi_n'W_n}{\hat\sigma_n^2} \\
 & & & & \frac{W_1'W_1}{\hat\sigma_1^2} & & 0 \\
 & & & & & \ddots & \\
 & & & & & & \frac{W_n'W_n}{\hat\sigma_n^2}
\end{pmatrix}.$$

5.5 Specification and Estimation of Spatial Panel Data Models

5.5.1 Foundations: Spatial autocorrelation


The formal framework used for the statistical analysis of spatial autocorrelation is a
so-called spatial stochastic process, or a collection of random variables $Y$, indexed by
location $i$,
$$\{Y_i, i \in D\} \qquad (5.64)$$
where the index set $D$ is either a continuous surface or a finite set of discrete locations.
Spatial autocorrelation can be formally expressed by the moment condition
$$Cov(Y_i, Y_j) = E(Y_iY_j) - E(Y_i)E(Y_j) \neq 0 \quad\text{for } i \neq j \qquad (5.65)$$
where $i, j$ refer to individual observations (locations) and $Y_i$ ($Y_j$) is the value of a random
variable of interest at that location.


The most common approach to formally express spatial autocorrelation is through
the specification of a functional form for the spatial stochastic process (5.64) that relates
the value of a random variable at a given location to its value at other locations. For
example, for an $N \times 1$ vector of random variables $Y$, observed across space, and an
$N \times 1$ vector of iid random errors $\varepsilon$, a simultaneous spatial autoregressive (SAR) process
is defined as
$$(Y - \mu\iota) = \rho W(Y - \mu\iota) + \varepsilon \quad\text{or}\quad (Y - \mu\iota) = (I - \rho W)^{-1}\varepsilon \qquad (5.66)$$
where $\mu$ is the (constant) mean of $Y_i$, $\iota$ is an $N \times 1$ vector of ones, $\rho$ is the spatial
autoregressive parameter, and the $N \times N$ matrix $W$ is the spatial weights matrix, which
specifies which of the other locations in the system affect the value at a given location.
The spatial weights crucially depend on the definition of a neighborhood set for each
observation. A spatial lag for $Y$ at $i$ then follows as
$$[WY]_i = \sum_{j=1,\ldots,N} w_{ij} Y_j \qquad (5.67)$$
or, in matrix form, as
$$WY. \qquad (5.68)$$
Since for each $i$ the matrix elements $w_{ij}$ are only nonzero for those $j \in S_i$ (where $S_i$ is
the neighborhood set), only the matching $Y_j$ are included in the lag. For ease of
interpretation, the elements of the spatial weights matrix are typically row-standardized,
such that for each $i$, $\sum_j w_{ij} = 1$, so the spatial lag may be interpreted as a weighted
average of the neighbors.
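A minimal numerical illustration of row-standardization and the spatial lag (a hypothetical four-location line with rook contiguity; all data are made up):

```python
import numpy as np

def row_standardize(W):
    """Divide each row of a contiguity matrix by its row sum, so that the
    spatial lag WY becomes a weighted average of the neighbors."""
    return W / W.sum(axis=1, keepdims=True)

# four locations on a line: 1-2-3-4 (rook contiguity)
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
Ws = row_standardize(W)
Y = np.array([1.0, 2.0, 3.0, 4.0])
lag = Ws @ Y          # [WY]_i = sum_j w_ij Y_j
```

For the interior locations the lag is the mean of the two neighbors; for the endpoints it is just the single neighbor's value.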


For the SAR structure in (5.66), the variance matrix for $Y$ is a function of two
parameters, the variance $\sigma^2$ and the spatial coefficient $\rho$:
$$Var(Y) = E[(Y - \mu\iota)(Y - \mu\iota)'] = \sigma^2\left[(I - \rho W)'(I - \rho W)\right]^{-1}. \qquad (5.69)$$
This matrix implies that shocks at any location affect all other locations, through a so-called
spatial multiplier effect (or, global interaction).
A major distinction between processes in space and in the time domain is that, even
with iid disturbances $\varepsilon_i$, the diagonal elements in (5.69) are not constant. Furthermore,
the heteroskedasticity depends on the neighborhood structure embedded in the spatial
weights matrix $W$.
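Continuing the hypothetical four-location example, (5.69) can be computed directly; the resulting diagonal is non-constant even with iid innovations, and every off-diagonal entry is nonzero (the global spatial multiplier):

```python
import numpy as np

W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
W = W / W.sum(axis=1, keepdims=True)       # row-standardized weights
rho, sigma2 = 0.5, 1.0
A = np.eye(4) - rho * W
V = sigma2 * np.linalg.inv(A.T @ A)        # Var(Y) in (5.69)
```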
When specifying the spatial dependence between observations, the model may
incorporate a spatial autoregressive process in the disturbance, or the model may contain
a spatially autoregressive dependent variable. The first model is known as the spatial
error model and the second as the spatial lag model.

5.5.2 The Fixed Effects Spatial Error and Spatial Lag Model
The traditional fixed effects model extended to include spatial error autocorrelation
can be specified as
$$Y_t = X_t\beta + \mu + \varepsilon_t, \quad \varepsilon_t = \delta W \varepsilon_t + \nu_t, \quad E(\nu_t) = 0, \quad E(\nu_t\nu_t') = \sigma^2 I_N \qquad (5.70)$$
where $\mu = (\mu_1,\ldots,\mu_N)'$, and the traditional model extended with a spatially lagged
dependent variable reads as
$$Y_t = \rho W Y_t + X_t\beta + \mu + \varepsilon_t, \quad E(\varepsilon_t) = 0, \quad E(\varepsilon_t\varepsilon_t') = \sigma^2 I_N \qquad (5.71)$$
where $W$ denotes an $N \times N$ spatial weights matrix describing the spatial arrangement of the
spatial units; it is assumed that $W$ is a matrix of known constants. In the spatial error
specification, the properties of the disturbance structure have been changed, and $\delta$ is usually
called the spatial autocorrelation coefficient; whereas in the spatial lag specification, the
number of explanatory variables has increased by one, and $\rho$ is referred to as the spatial
autoregressive coefficient.
The spatial econometric literature has shown that ordinary least squares (OLS)
estimation is inappropriate for models incorporating spatial effects. In the case of spatial
error autocorrelation, the OLS estimator of the response parameters remains unbiased,
but it loses the efficiency property. In the case when the specification contains a spatially
lagged dependent variable, the OLS estimator of the response parameters not only loses
the property of being unbiased but also is inconsistent.
1) MLE
Instead of estimating the demeaned equation by OLS, it can also be estimated by
maximum likelihood (ML). The only difference is that ML estimators do not make
corrections for degrees of freedom. The log-likelihood function corresponding to the
demeaned equation incorporating spatial error autocorrelation is

$$-\frac{NT}{2}\ln(2\pi\sigma^2) + T\ln\left|I_N - \delta W\right| - \frac{1}{2\sigma^2}\sum_{t=1}^T e_t'e_t, \quad e_t = (I_N - \delta W)\left[(Y_t - \bar Y) - (X_t - \bar X)\beta\right] \qquad (5.72)$$
and with a spatially lagged dependent variable,
$$-\frac{NT}{2}\ln(2\pi\sigma^2) + T\ln\left|I_N - \rho W\right| - \frac{1}{2\sigma^2}\sum_{t=1}^T e_t'e_t, \quad e_t = (I_N - \rho W)(Y_t - \bar Y) - (X_t - \bar X)\beta \qquad (5.73)$$
where $\bar Y = (\bar Y_1,\ldots,\bar Y_N)'$ and $\bar X = (\bar X_1,\ldots,\bar X_N)'$ contain the averages over time for each
spatial unit. An iterative two-stage procedure can be


used to maximize the log-likelihood function of the first model, and a simple two-stage
procedure is available for the second model (Anselin 1988, 181-82). Anselin and Hudak
(1992) give instructions on how to implement these procedures in commercial
econometric software. One may also use Spacestat or the MATLAB routines for the spatial
error model (SEM) and spatial lag model (SAR), which are freely downloadable from
LeSage's Web site at www.spatial-econometrics.com. Although these routines are written
for spatial cross sections, they can easily be generalized to spatial panel models.
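As a sketch of what such a routine evaluates, the demeaned log-likelihood (5.72) of the spatial error model can be coded directly (`Y` and `X` are assumed to be already demeaned; this is an evaluator, not an optimizer):

```python
import numpy as np

def ll_spatial_error(delta, sigma2, beta, Y, X, W):
    """Evaluate (5.72): Y is T x N, X is T x N x K (both demeaned),
    W an N x N spatial weights matrix."""
    T, N = Y.shape
    A = np.eye(N) - delta * W
    ll = -N * T / 2.0 * np.log(2 * np.pi * sigma2)
    ll += T * np.log(np.abs(np.linalg.det(A)))      # Jacobian term T ln|I - delta W|
    for t in range(T):
        e = A @ (Y[t] - X[t] @ beta)                # e_t of (5.72)
        ll -= e @ e / (2 * sigma2)
    return ll
```

Setting $\delta = 0$ reduces the expression to the ordinary (non-spatial) Gaussian log-likelihood, which is a convenient consistency check.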

2) IV for the Fixed Effects Spatial Lag Model
The endogeneity of the spatially lagged dependent variable suggests a
straightforward instrumental variables strategy in which the spatially lagged (exogenous)
explanatory variables $WX$ are used as instruments (Kelejian and Robinson, 1993;
Kelejian and Prucha, 1998). This applies directly to the spatial lag in the pooled model,
where the instruments would be $(I_T \otimes W)X$ (with $X$ as a stacked $NT \times (K-1)$ matrix,
excluding the constant term). In the model with fixed effects, the within-transformed
variables should be used.
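A sketch of the spatial two-stage least squares idea, instrumenting $WY$ with $[X, WX]$ (the helper and data are illustrative):

```python
import numpy as np

def spatial_lag_iv(Y, X, W):
    """Spatial 2SLS: regress Y on [WY, X], instrumenting the endogenous
    spatial lag WY with the exogenous [X, WX].
    Returns [rho_hat, beta_hat...]."""
    Z = np.column_stack([W @ Y, X])          # regressors
    H = np.column_stack([X, W @ X])          # instruments
    P = H @ np.linalg.solve(H.T @ H, H.T)    # projection on instrument space
    Zh = P @ Z                               # first-stage fitted values
    return np.linalg.solve(Zh.T @ Z, Zh.T @ Y)
```

On noiseless data generated from the spatial lag model the estimator recovers $(\rho, \beta)$ exactly, since the structural equation is satisfied identically.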
3) GMM for the Fixed Effects Spatial Error Model
In the single cross-section, a consistent estimator can be constructed from a set of
moment conditions on the error terms, as demonstrated in the Kelejian-Prucha
generalized moments (KPGM) estimator (Kelejian and Prucha, 1999). These conditions
can be readily extended to the pooled or fixed effects model by replacing the single-equation
spatial weights by their pooled counterpart $(I_T \otimes W)$ and using the
within-transformed variables in the model with fixed effects. The point of departure is the
stacked vector of SAR errors $\varepsilon = \delta(I_T \otimes W)\varepsilon + \nu$, where both $\varepsilon$ and $\nu$ are $NT \times 1$
vectors, and $\nu \sim IID(0, \sigma_\nu^2 I_{NT})$.
The three KPGM moment conditions (Kelejian and Prucha, 1999, p. 514) pertain to
the idiosyncratic error vector $\nu$. Extending them to the pooled setting yields:
$$E\left[\frac{1}{NT}\nu'\nu\right] = \sigma_\nu^2$$
$$E\left[\frac{1}{NT}\nu'(I_T \otimes W)'(I_T \otimes W)\nu\right] = \sigma_\nu^2 \frac{1}{N}tr(W'W)$$
$$E\left[\frac{1}{NT}\nu'(I_T \otimes W)\nu\right] = 0$$
where $tr$ is the matrix trace operator and use is made of $tr(I_T \otimes W'W) = T\,tr(W'W)$ and
$tr(I_T \otimes W) = 0$. The estimator is made operational by substituting $\nu = \varepsilon - \delta(I_T \otimes W)\varepsilon$
and replacing $\varepsilon$ by the regression residuals. The result is a system of three equations in
$\delta$, $\delta^2$ and $\sigma_\nu^2$, which can be solved by nonlinear least squares (for technical details, see
Kelejian and Prucha, 1999). Under some fairly general regularity conditions, substituting
the consistent estimator for $\delta$ into the spatial FGLS will yield a consistent estimator for
$\beta$.
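The sample counterparts of the three pooled conditions can be formed as follows (a sketch: `eps` stands for the within-transformed residual vector, and the resulting function would be fed to a nonlinear least-squares routine over $(\delta, \sigma_\nu^2)$):

```python
import numpy as np

def kpgm_sample_moments(eps, W, T, delta, sigma2):
    """Deviations of the three pooled KPGM sample moments from their targets,
    with nu recovered as nu = eps - delta*(I_T kron W) eps."""
    N = len(eps) // T
    WT = np.kron(np.eye(T), W)
    nu = eps - delta * (WT @ eps)
    nub = WT @ nu
    g1 = nu @ nu / (N * T) - sigma2
    g2 = nub @ nub / (N * T) - sigma2 * np.trace(W.T @ W) / N
    g3 = nub @ nu / (N * T)
    return np.array([g1, g2, g3])
```

Evaluated at the true $\delta$, the recovered `nu` equals the generating innovations, so the first deviation is exactly zero when `sigma2` is set to the sample second moment of those innovations.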
5.5.3 The Random Effects Spatial Error Model
Kapoor et al. (2007) introduce generalizations of the GM procedure in Kelejian and
Prucha (1999) to panel data models involving a first-order spatially autoregressive
disturbance term whose innovations have an error component structure. In particular,
they introduce three GM estimators which correspond to alternative weighting schemes
for the moments and derive the large sample properties when $T$ is fixed and $N \to \infty$.
Their specifications are such that the model's disturbances are potentially both spatially
and time-wise autocorrelated, as well as heteroskedastic. They also define a feasible
generalized least squares (FGLS) estimator for the model's regression parameters. This
FGLS estimator is based on a spatial counterpart to the Cochrane-Orcutt transformation,
as well as transformations utilized in the estimation of classical error component models.
Baltagi et al. (2007) extended the GM procedure in Kapoor et al. (2007) to the
unbalanced panel data case. More specifically, we assume that in each time period
$t = 1,\ldots,T$ the DGP is:
$$y_t = X_t\beta + \varepsilon_t, \quad \varepsilon_t = \rho W\varepsilon_t + \nu_t, \quad |\rho| < 1.$$
Stacking the observations we have
$$y = X\beta + \varepsilon \qquad (5.74)$$
$$\varepsilon = \rho(I_T \otimes W)\varepsilon + \nu \qquad (5.75)$$
We assume furthermore the following error component structure for the innovation
vector $\nu$:
$$\nu = (\iota_T \otimes I_N)\mu + \eta \qquad (5.76)$$
where $\mu$ represents the $N \times 1$ vector of unit-specific error components, and $\eta$ contains
the error components that vary over both the cross-sectional units and time periods. The
error components are assumed to satisfy $\mu_i \sim IID(0, \sigma_\mu^2)$ and $\eta_{it} \sim IID(0, \sigma_\eta^2)$.
The actual estimation of the parameters of the model (5.74)-(5.76) is performed in
three steps.
Step 1: In the first step we estimate the regression model in (5.74) using ordinary
least squares to obtain $\hat\beta_{OLS} = (X'X)^{-1}X'y$ and get a consistent estimator of the
disturbances $\hat\varepsilon = y - X\hat\beta_{OLS}$.
Alternatively, Baltagi et al. (2007) run fixed effects on model (5.74) to obtain
consistent estimates of the disturbances, and they find that, although the magnitudes of some
estimates change, the results for the statistically significant estimates are basically the
same as the OLS estimates.
Step 2: We estimate the spatial autoregressive parameter $\rho$ and the variance
components $\sigma_\mu^2$ and $\sigma_\eta^2$ (or equivalently $\sigma_\eta^2$ and $\sigma_1^2 = \sigma_\eta^2 + T\sigma_\mu^2$) in terms of the
residuals obtained via the first step and the generalized moments procedure suggested in
the paper.
Defining $\bar\varepsilon = (I_T \otimes W)\varepsilon$, $\bar{\bar\varepsilon} = (I_T \otimes W)\bar\varepsilon$, $\bar\nu = (I_T \otimes W)\nu$, $Q = \left(I_T - \frac{\iota_T\iota_T'}{T}\right) \otimes I_N$ and
$P = \frac{\iota_T\iota_T'}{T} \otimes I_N$, Kapoor et al. (2007) suggest a GM estimator based on the following six
moment conditions:
$$E\left[\nu'Q\nu / N(T-1)\right] = \sigma_\eta^2$$
$$E\left[\bar\nu'Q\bar\nu / N(T-1)\right] = \sigma_\eta^2\, tr(W'W)/N$$
$$E\left[\bar\nu'Q\nu / N(T-1)\right] = 0$$
$$E\left[\nu'P\nu / N\right] = \sigma_1^2 \qquad (5.77)$$
$$E\left[\bar\nu'P\bar\nu / N\right] = \sigma_1^2\, tr(W'W)/N$$
$$E\left[\bar\nu'P\nu / N\right] = 0$$
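The transformations $Q$ and $P$ used in (5.77) are the familiar within and between operators; a quick sketch verifying their projection properties:

```python
import numpy as np

def QP(T, N):
    """Q = (I_T - J_T/T) kron I_N (within) and P = (J_T/T) kron I_N (between),
    as used in the moment conditions (5.77)."""
    J = np.ones((T, T)) / T
    Q = np.kron(np.eye(T) - J, np.eye(N))
    P = np.kron(J, np.eye(N))
    return Q, P
```

$Q$ and $P$ are complementary idempotent projections ($Q + P = I_{NT}$, $QP = 0$), which is what allows the six conditions to separate the two variance components.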

From (5.75), $\nu = \varepsilon - \rho\bar\varepsilon$ and $\bar\nu = \bar\varepsilon - \rho\bar{\bar\varepsilon}$. Substituting these expressions in (5.77),
we obtain a system of six equations involving the second moments of $\varepsilon$, $\bar\varepsilon$ and $\bar{\bar\varepsilon}$. The
GM estimator of $\sigma_\eta^2$, $\sigma_1^2$ and $\rho$ is the solution of the sample counterpart of the six
equations in (5.77).
Kapoor et al. (2007) suggest three GM estimators. The first involves only the first
three moments in (5.77), which do not involve $\sigma_1^2$ and yield estimates of $\sigma_\eta^2$ and $\rho$. The
fourth moment condition is then used to solve for $\sigma_1^2$ given estimates of $\sigma_\eta^2$ and $\rho$.
Kapoor et al. (2007) give the conditions needed for the consistency of this estimator
as $N \to \infty$. The second GM estimator is based upon weighting the moment equations by
the inverse of a properly normalized variance-covariance matrix of the sample moments
evaluated at the true parameter values. A simple version of this weighting matrix is
derived under normality of the disturbances. The third GM estimator is motivated by
computational considerations and replaces a component of the weighting matrix for the
second GM estimator by an identity matrix. Kapoor et al. (2007) perform Monte Carlo
experiments comparing MLE and these three GM estimation methods. They find that, on
average, the RMSEs of ML and their weighted GM estimators are quite similar. However,
the first, unweighted GM estimator has an RMSE that is 14% to 17% larger than that of the
weighted GM estimators.
Step 3: In this step the regression model in (5.74)-(5.75) is reestimated in terms of a
feasible GLS estimator.
