[Figure: Classification of panel unit root tests]
- First generation
  - Homogeneous: LLC (2002); Breitung (2000); Hadri (2000)
  - Heterogeneous: IPS (2003); Maddala and Wu (1999); Choi (2001)
- Second generation: O'Connell (1998); Breitung and Das (2005); Moon and Perron (2004); Bai and Ng (2004); Pesaran (2003)
Consider the model

$y_{it} = \rho_i y_{i,t-1} + X_{it}'\delta_i + \varepsilon_{it}$  (5.1)

where $i = 1, 2, \dots, n$ indexes the cross-section units or series, which are observed over the periods $t = 1, 2, \dots, T$; $X_{it}$ represents the exogenous variables in the model, including any fixed effects or individual trends; $\rho_i$ are the autoregressive coefficients; and the errors $\varepsilon_{it}$ are assumed to be mutually independent idiosyncratic disturbances. The first-generation homogeneous tests assume there is a common unit root process, so that $\rho_i$ is identical across cross-sections. The LLC and Breitung tests employ a null hypothesis of a unit root, while the Hadri test uses a null of no unit root.
1) Levin, Lin, and Chu (2002)
LLC argued that individual unit root tests have limited power against alternative hypotheses with highly persistent deviations from equilibrium. This is particularly severe in small samples. LLC therefore suggest a panel unit root test that is more powerful than performing individual unit root tests for each cross-section. The null hypothesis is that each individual time series contains a unit root, against the alternative that each time series is stationary. The maintained hypothesis is that

$\Delta y_{it} = \alpha y_{i,t-1} + \sum_{L=1}^{p_i}\theta_{iL}\,\Delta y_{i,t-L} + \alpha_{mi}' d_{mt} + \varepsilon_{it}, \quad m = 1, 2, 3$  (5.2)

where $d_{mt}$ contains the deterministic variables and $\alpha_{mi}$ is the corresponding vector of coefficients for model $m = 1, 2, 3$. In particular, $d_{1t} = \{\varnothing\}$ (the empty set), $d_{2t} = \{1\}$, and $d_{3t} = \{1, t\}$. Since the lag order $p_i$ is unknown, LLC suggest a three-step procedure.
Step 1. Perform separate ADF regressions for each cross-section. The lag order $p_i$ is permitted to vary across individuals. For a given T, choose a maximum lag order $p_{\max}$ and then use the t-statistic of $\hat\theta_{iL}$ to decide whether a smaller lag order is preferred. Two auxiliary regressions are then run: regress $\Delta y_{it}$ and $y_{i,t-1}$ on $\Delta y_{i,t-L}$ ($L = 1, \dots, p_i$) and the deterministic terms $d_{mt}$, saving the residuals $\hat e_{it}$ and $\hat v_{i,t-1}$. To control for heterogeneity across individuals, standardize both residuals by the regression standard error:

$\tilde e_{it} = \dfrac{\hat e_{it}}{\hat\sigma_{\varepsilon i}}, \qquad \tilde v_{i,t-1} = \dfrac{\hat v_{i,t-1}}{\hat\sigma_{\varepsilon i}},$

where $\hat\sigma_{\varepsilon i}$ is the estimated standard error from the ADF regression for cross-section i.
Step 2. Estimate the ratio of long-run to short-run standard deviations. Under the null of a unit root, the long-run variance of $\Delta y_{it}$ can be estimated by

$\hat\sigma_{y_i}^2 = \dfrac{1}{T-1}\sum_{t=2}^{T}\Delta y_{it}^2 + 2\sum_{L=1}^{\bar K}w_{\bar K L}\left[\dfrac{1}{T-1}\sum_{t=2+L}^{T}\Delta y_{it}\,\Delta y_{i,t-L}\right]$

where $\bar K$ is a truncation lag that can be data-dependent. $\bar K$ must be obtained in a manner that ensures the consistency of $\hat\sigma_{y_i}^2$. For a Bartlett kernel, $w_{\bar K L} = 1 - L/(\bar K + 1)$.
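A minimal numerical sketch of this Bartlett-kernel long-run variance estimator (assuming numpy is available; the data and the truncation lag are hypothetical):

```python
import numpy as np

def long_run_variance(dy, K):
    """Bartlett-kernel long-run variance of the differenced series dy."""
    Tm1 = len(dy)
    s2 = np.sum(dy**2) / Tm1                       # variance term
    for L in range(1, K + 1):
        w = 1.0 - L / (K + 1.0)                    # Bartlett weight
        s2 += 2.0 * w * np.sum(dy[L:] * dy[:-L]) / Tm1
    return s2

rng = np.random.default_rng(0)
dy = rng.standard_normal(200)                      # white noise differences
print(long_run_variance(dy, K=4))
```

For serially uncorrelated differences, the estimate should be close to the ordinary sample variance.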
For each cross-section i, the ratio of the long-run standard deviation to the innovation standard deviation is estimated by $\hat s_i = \hat\sigma_{y_i}/\hat\sigma_{\varepsilon i}$, and the average standard deviation ratio is estimated by $\hat S_n = \dfrac{1}{n}\sum_{i=1}^{n}\hat s_i$. This important statistic will be used to adjust the mean of the pooled t-statistic below.

Step 3. Compute the pooled t-statistic by running the pooled regression

$\tilde e_{it} = \delta\,\tilde v_{i,t-1} + \tilde\varepsilon_{it},$

based on a total of $n\tilde T$ observations, where $\tilde T = T - \bar p - 1$ is the average number of observations per individual in the panel, and $\bar p = \dfrac{1}{n}\sum_{i=1}^{n}p_i$ is the average lag order for the individual ADF regressions. The conventional t-statistic for testing $\delta = 0$ is given by
$t_\delta = \hat\delta\big/\mathrm{STD}\big(\hat\delta\big)$

where

$\hat\delta = \dfrac{\sum_{i=1}^{n}\sum_{t=2+p_i}^{T}\tilde v_{i,t-1}\tilde e_{it}}{\sum_{i=1}^{n}\sum_{t=2+p_i}^{T}\tilde v_{i,t-1}^2}, \qquad \mathrm{STD}\big(\hat\delta\big) = \hat\sigma_{\tilde\varepsilon}\left[\sum_{i=1}^{n}\sum_{t=2+p_i}^{T}\tilde v_{i,t-1}^2\right]^{-1/2},$

$\hat\sigma_{\tilde\varepsilon}^2 = \dfrac{1}{n\tilde T}\sum_{i=1}^{n}\sum_{t=2+p_i}^{T}\left(\tilde e_{it} - \hat\delta\,\tilde v_{i,t-1}\right)^2.$
The adjusted t-statistic

$t_\delta^* = \dfrac{t_\delta - n\tilde T\,\hat S_n\,\hat\sigma_{\tilde\varepsilon}^{-2}\,\mathrm{STD}\big(\hat\delta\big)\,\mu_{m\tilde T}^*}{\sigma_{m\tilde T}^*}$  (5.4)

is asymptotically distributed N(0,1), where $\mu_{m\tilde T}^*$ and $\sigma_{m\tilde T}^*$ are the mean and standard deviation adjustments provided by LLC.
LLC suggest using their panel unit root test for panels of moderate size with n
between 10 and 250 and T between 25 and 250.
Limitations: First, the test crucially depends upon the independence assumption across cross-sections and is not applicable if cross-sectional correlation is present. Second, the assumption that all cross-sections either have or do not have a unit root is restrictive.
2) Breitung (2000)
The LLC and IPS tests require $n/T \to 0$, i.e., n should be small relative to T. This means that both tests may not keep nominal size well when either n is small or n is large relative to T. Breitung (2000) studies the local power of the LLC and IPS test statistics against a sequence of local alternatives. Breitung finds that the LLC and IPS tests suffer from a dramatic loss of power if individual-specific trends are included. This is due to the bias correction, which also removes the mean under the sequence of local alternatives. Breitung suggests a test statistic that does not employ a bias adjustment and shows, using Monte Carlo experiments, that its power is substantially higher than that of the LLC or IPS tests. The simulation results indicate that the power of the LLC and IPS tests is very sensitive to the specification of the deterministic terms.
Breitung's (2000) test statistic without bias adjustment is obtained as follows. Step 1 is the same as in LLC, but only the lagged differences $\Delta y_{i,t-L}$ are used in obtaining the residuals $\hat e_{it}$ and $\hat v_{i,t-1}$. The residuals are then adjusted (as in LLC) to correct for individual-specific variances. In step 2, the residuals $\hat e_{it}$ are transformed using the forward orthogonalization transformation

$e_{it}^* = \sqrt{\dfrac{T-t}{T-t+1}}\left(\hat e_{it} - \dfrac{\hat e_{i,t+1} + \cdots + \hat e_{iT}}{T-t}\right).$

Also,

$v_{i,t-1}^* = \begin{cases}\hat v_{i,t-1} - \hat v_{i,1} - \dfrac{t-1}{T}\hat v_{i,T}, & \text{with intercept and trend}\\ \hat v_{i,t-1} - \hat v_{i,1}, & \text{with intercept, no trend}\\ \hat v_{i,t-1}, & \text{with no intercept or trend.}\end{cases}$

The last step is to run the pooled regression

$e_{it}^* = \delta\, v_{i,t-1}^* + \varepsilon_{it}$

and obtain the t-statistic for $H_0: \delta = 0$, which has in the limit a standard N(0,1) distribution.
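The forward orthogonalization of step 2 is easy to sketch; the helper below (numpy assumed; the series is hypothetical) transforms one residual series:

```python
import numpy as np

def forward_orthogonalize(e):
    """e*_t = sqrt((T-t)/(T-t+1)) * (e_t - mean of e_{t+1},...,e_T)."""
    T = len(e)
    out = np.empty(T - 1)
    for t in range(T - 1):            # python index t is time t+1
        future = e[t + 1:]            # remaining future values
        out[t] = np.sqrt((T - t - 1) / (T - t)) * (e[t] - future.mean())
    return out

e = np.array([1.0, 2.0, 3.0, 4.0])
print(forward_orthogonalize(e))       # first entry: sqrt(3/4)*(1-3) ≈ -1.732
```

A useful check of the transformation is that it preserves the sum of squared deviations of the series from its mean.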
3) Hadri (2000)
Hadri (2000) derives a residual-based Lagrange multiplier (LM) test where the null hypothesis is that there is no unit root in any of the series in the panel, against the alternative of a unit root in the panel. In particular, Hadri (2000) considers the following two models:

$y_{it} = r_{it} + \varepsilon_{it}$ (model 1) and $y_{it} = r_{it} + \beta_i t + \varepsilon_{it}$ (model 2)

where $r_{it} = r_{i,t-1} + u_{it}$ is a random walk, and $\varepsilon_{it} \sim \mathrm{IIN}(0, \sigma_\varepsilon^2)$ and $u_{it} \sim \mathrm{IIN}(0, \sigma_u^2)$ are mutually independent normals that are iid across i and over t. Using back-substitution, model 2 becomes

$y_{it} = r_{i0} + \beta_i t + \sum_{s=1}^{t}u_{is} + \varepsilon_{it} = r_{i0} + \beta_i t + v_{it}$  (5.6)

where $v_{it} = \sum_{s=1}^{t}u_{is} + \varepsilon_{it}$. The stationarity hypothesis is simply $H_0: \sigma_u^2 = 0$, in which case
$v_{it} = \varepsilon_{it}$. The LM statistic is

$LM_1 = \dfrac{1}{\hat\sigma_\varepsilon^2}\left[\dfrac{1}{n}\sum_{i=1}^{n}\dfrac{1}{T^2}\sum_{t=1}^{T}S_{it}^2\right]$  (5.7)

where $S_{it} = \sum_{s=1}^{t}\hat\varepsilon_{is}$ are the partial sums of the OLS residuals $\hat\varepsilon_{is}$ from (5.6), and $\hat\sigma_\varepsilon^2$ is a consistent estimate of $\sigma_\varepsilon^2$ under the null, e.g.

$\hat\sigma_\varepsilon^2 = \dfrac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}\hat\varepsilon_{it}^2.$
Hadri (2000) suggested an alternative LM test that allows for heteroskedasticity across i, say $\sigma_{\varepsilon,i}^2$:

$LM_2 = \dfrac{1}{n}\sum_{i=1}^{n}\left[\dfrac{1}{T^2}\sum_{t=1}^{T}S_{it}^2\Big/\hat\sigma_{\varepsilon,i}^2\right].$  (5.8)
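A compact sketch of the $LM_1$ computation for the intercept-only model (numpy assumed; the panel is hypothetical):

```python
import numpy as np

def hadri_lm(y):
    """y: (n, T) panel. LM_1 with intercept-only (level-stationary) residuals."""
    n, T = y.shape
    resid = y - y.mean(axis=1, keepdims=True)   # OLS residuals, intercept only
    S = np.cumsum(resid, axis=1)                # partial sums S_it
    sigma2 = np.sum(resid**2) / (n * T)         # pooled variance estimate
    return np.mean(np.sum(S**2, axis=1) / T**2) / sigma2

rng = np.random.default_rng(1)
y = rng.standard_normal((10, 50))               # stationary panel under H0
print(hadri_lm(y))
```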
4) Im, Pesaran, and Shin (2003)

IPS propose a testing procedure based on averaging individual unit root test statistics. IPS suggest an average of the ADF tests when $u_{it}$ is serially correlated, with possibly different serial correlation properties across cross-sectional units, i.e., the model given in (5.3). The null hypothesis is that each series in the panel contains a unit root, i.e., $H_0: \rho_i = 0$ for all i, and the alternative hypothesis allows for some (but not all) of the individual series to have unit roots, i.e.,

$H_1: \begin{cases}\rho_i < 0 & \text{for } i = 1, 2, \dots, n_1\\ \rho_i = 0 & \text{for } i = n_1 + 1, \dots, n.\end{cases}$

Formally, it requires the fraction of the individual time series that are stationary to be nonzero, i.e., $\lim_{n\to\infty}(n_1/n) = \delta$ with $0 < \delta \le 1$. This condition is necessary for the consistency of the panel unit root test. The IPS t-bar statistic is defined as the average of the individual ADF statistics:

$\bar t = \dfrac{1}{n}\sum_{i=1}^{n}t_{\rho_i}$

where $t_{\rho_i}$ is the individual t-statistic for testing $H_0: \rho_i = 0$ in (5.3). In the case where the lag order is always zero ($p_i = 0$ for all i), IPS provide simulated critical values for $\bar t$ for different numbers of cross-sections and sample sizes. In the general case where the lag order $p_i$ may be nonzero for some cross-sections, IPS show that a properly standardized $\bar t$ has an asymptotic N(0,1) distribution. Under the null, each individual statistic converges to the Dickey-Fuller functional

$t_{iT} \Rightarrow \dfrac{\int_0^1 W_{iZ}\,dW_{iZ}}{\left[\int_0^1 W_{iZ}^2\right]^{1/2}}$  (5.9)

so that the standardized average satisfies
$\dfrac{\sqrt n\left[\dfrac{1}{n}\sum_{i=1}^{n}t_{iT} - \dfrac{1}{n}\sum_{i=1}^{n}E\big(t_{iT}\mid\rho_i = 0\big)\right]}{\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\mathrm{Var}\big(t_{iT}\mid\rho_i = 0\big)}} \Rightarrow N(0,1)$  (5.10)

as $n \to \infty$ by the Lindeberg-Lévy central limit theorem. Hence

$t_{IPS} = \dfrac{\sqrt n\left[\bar t - \dfrac{1}{n}\sum_{i=1}^{n}E\big(t_{iT}\mid\rho_i = 0\big)\right]}{\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\mathrm{Var}\big(t_{iT}\mid\rho_i = 0\big)}} \Rightarrow N(0,1).$  (5.11)

The moments $E\big(t_{iT}\mid\rho_i = 0\big)$ and $\mathrm{Var}\big(t_{iT}\mid\rho_i = 0\big)$ have been computed by IPS via simulations for different values of T and $p_i$'s.
5) Maddala and Wu (1999) and Choi (2001)

Let $G_{iT_i}$ be a unit root test statistic for the ith group in (5.2), and assume that as the time dimension $T_i \to \infty$, $G_{iT_i}$ converges to a nondegenerate random variable $G_i$. Let $p_i$ be the asymptotic p-value of a unit root test for cross-section i, i.e., $p_i = F(G_{iT_i})$, where $F(\cdot)$ is the distribution function of the random variable $G_i$. Maddala and Wu (1999) propose the Fisher-type statistic

$P = -2\sum_{i=1}^{n}\ln p_i \Rightarrow \chi^2(2n)$  (5.12)

which combines the p-values from unit root tests for each cross-section i to test for a unit root in panel data.
Choi (2001) proposes two other test statistics besides Fisher's inverse chi-square test statistic P. The first is the inverse normal test

$Z = \dfrac{1}{\sqrt n}\sum_{i=1}^{n}\Phi^{-1}(p_i)$  (5.13)

where $\Phi^{-1}$ is the inverse of the standard normal distribution function. The second is a logit test based on

$L = \sum_{i=1}^{n}\ln\left(\dfrac{p_i}{1 - p_i}\right)$  (5.14)

where each $\ln\big(p_i/(1-p_i)\big)$ has the logistic distribution with mean 0 and variance $\pi^2/3$. As $T_i \to \infty$ for all i, $mL \Rightarrow t_{5n+4}$, where $m = \sqrt{\dfrac{3(5n+4)}{\pi^2 n(5n+2)}}$.
When n is large, Choi (2001) proposed a modified P test,

$P_m = \dfrac{1}{2\sqrt n}\sum_{i=1}^{n}\left(-2\ln p_i - 2\right) \Rightarrow N(0,1) \quad (T_i \to \infty \text{ followed by } n \to \infty).$  (5.15)

Also, under the same sequential limit, $\dfrac{1}{\sqrt{n\pi^2/3}}\sum_{i=1}^{n}\ln\left(\dfrac{p_i}{1-p_i}\right) \Rightarrow N(0,1)$ by the Lindeberg-Lévy central limit theorem.
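The three p-value combinations can be sketched directly (scipy assumed; the p-values are hypothetical inputs, e.g. from individual ADF tests):

```python
import numpy as np
from scipy import stats

def combine_pvalues_panel(p):
    """Fisher P ~ chi2(2n), inverse normal Z ~ N(0,1), modified P_m ~ N(0,1)."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    P = -2.0 * np.log(p).sum()                   # Fisher's statistic
    Z = stats.norm.ppf(p).sum() / np.sqrt(n)     # Choi's inverse normal test
    Pm = (P - 2.0 * n) / (2.0 * np.sqrt(n))      # modified P for large n
    return P, Z, Pm

P, Z, Pm = combine_pvalues_panel([0.01, 0.20, 0.50, 0.04])
print(P, Z, Pm)
```

Note that $P_m = (P - 2n)/(2\sqrt n)$, so it is just a recentering and rescaling of the Fisher statistic.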
Pesaran (2004) proposes a diagnostic for cross-sectional dependence (CD) based on the average of the pairwise correlation coefficients of the OLS residuals from the individual regressions in the panel, rather than their squares as in the Breusch-Pagan LM test:

$CD = \sqrt{\dfrac{2T}{n(n-1)}}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\hat\rho_{ij}$  (5.16)

where $\hat\rho_{ij} = \left(\sum_{t=1}^{T}e_{it}e_{jt}\right)\Big/\left[\left(\sum_{t=1}^{T}e_{it}^2\right)^{1/2}\left(\sum_{t=1}^{T}e_{jt}^2\right)^{1/2}\right]$, with $e_{it}$ denoting the OLS residuals based on T observations for each $i = 1, \dots, n$. Monte Carlo experiments show that the standard Breusch-Pagan LM test performs badly for n > T panels, whereas Pesaran's CD test performs well even for small T and large n.
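A sketch of the CD statistic from a residual matrix (numpy assumed; the residuals are hypothetical draws):

```python
import numpy as np

def pesaran_cd(e):
    """e: (n, T) matrix of OLS residuals; returns the CD statistic (5.16)."""
    n, T = e.shape
    denom = np.sqrt((e**2).sum(axis=1))
    r = (e @ e.T) / np.outer(denom, denom)    # pairwise correlations rho_ij
    iu = np.triu_indices(n, k=1)              # upper triangle, i < j
    return np.sqrt(2.0 * T / (n * (n - 1))) * r[iu].sum()

rng = np.random.default_rng(3)
e = rng.standard_normal((8, 60))              # independent series under H0
print(pesaran_cd(e))
```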
A GLS approach to cross-sectionally dependent panels has also been considered (..., Szarfarz, 1999). Let $\hat\Omega = \dfrac{1}{T}\sum_{t=1}^{T}\hat\varepsilon_t\hat\varepsilon_t'$ denote the sample covariance matrix of the residual vector. The GLS-t statistic is given by

$t_{gls}(n) = \dfrac{\sum_{t=1}^{T}\Delta y_t'\,\hat\Omega^{-1}\tilde y_{t-1}}{\sqrt{\sum_{t=1}^{T}\tilde y_{t-1}'\,\hat\Omega^{-1}\tilde y_{t-1}}}$  (5.17)

where $\tilde y_t$ is the vector of demeaned variables. Harvey and Bates (2003) derive the limiting distribution of $t_{gls}(n)$ for fixed n as $T \to \infty$, and tabulate its asymptotic distribution for various values of n. Breitung and Das (2005) show that if $\tilde y_t = y_t - y_0$ is used to demean the variables and $T \to \infty$ is followed by $n \to \infty$, then the GLS t-statistic possesses a standard normal limiting distribution.
The GLS approach cannot be used if T < n, as in this case the estimated covariance matrix $\hat\Omega$ is singular. Furthermore, Monte Carlo simulations suggest that for reasonable size properties of the GLS test, T must be substantially larger than n (e.g., Breitung and Das, 2005). Maddala and Wu (1999) and Chang (2004) have suggested a bootstrap procedure that improves the size properties of the GLS test.
An alternative approach based on panel-corrected standard errors (PCSE) is considered by Jonsson (2005) and Breitung and Das (2005). In the model with weak dependence, if $T \to \infty$ is followed by $n \to \infty$, the robust t-statistic

$t_{rols} = \dfrac{\sum_{t=1}^{T}\tilde y_{t-1}'\,\Delta y_t}{\sqrt{\sum_{t=1}^{T}\tilde y_{t-1}'\,\hat\Omega\,\tilde y_{t-1}}}$  (5.18)

has a standard normal limiting distribution. Under strong cross-sectional dependence, however, the largest eigenvalue of the error covariance matrix $\Omega$ is $O_p(n)$ and the robust PCSE approach breaks down. Specifically, Breitung and Das (2008) showed that in this case $t_{rols}$ is distributed as the ordinary Dickey-Fuller test applied to the first principal component.
Moon and Perron (2004) consider a dynamic panel with fixed effects in which $\varepsilon_{it}$ are unobservable error terms with a factor structure and $\alpha_i$ are fixed effects:

$\varepsilon_{it} = \lambda_i' f_t + e_{it}$  (5.20)

where $f_t$ is a vector of common random factors, $\lambda_i$ are nonrandom factor-loading coefficient vectors, $e_{it}$ are idiosyncratic shocks, and the number of factors M is unknown. Each $\varepsilon_{it}$ contains the common random factor $f_t$, generating the correlation among the cross-sectional units of $\varepsilon_{it}$ and $y_{it}$. The extent of the correlation is determined by the factor-loading coefficients $\lambda_i$, i.e., $E(y_{it}y_{jt}) = \lambda_i' E(f_t f_t')\lambda_j$. Moon and Perron treat the factors as nuisance parameters and suggest pooling defactored data to construct a unit root test. Let $Q_\Lambda$ be the matrix projecting onto the space orthogonal to the factor loadings. The defactored data are $YQ_\Lambda$ and the defactored residuals are $eQ_\Lambda$, where the ith row of Y contains the observations for cross-sectional unit i.

Let $\sigma_{e,i}^2$ be the variance of $e_{it}$, $\omega_{e,i}^2$ the long-run variance of $e_{it}$, and $\lambda_{e,i}$ the one-sided long-run variance of $e_{it}$. Also, let $\sigma_e^2$, $\omega_e^2$ and $\lambda_e$ be their cross-sectional averages, and let $\phi_e^4$ denote the cross-sectional average of $\omega_{e,i}^4$.
The pooled bias-corrected autoregressive estimate is

$\hat\rho_{pool}^{+} = \dfrac{\mathrm{tr}\left(Y_{-1}Q_\Lambda Y'\right) - nT\hat\lambda_e}{\mathrm{tr}\left(Y_{-1}Q_\Lambda Y_{-1}'\right)}$

where $Y_{-1}$ is the matrix of lagged data. Moon and Perron suggest two statistics to test $H_0: \rho_i = 1$ for all i against the alternative that $\rho_i < 1$ for some i. These are

$t_a = \dfrac{\sqrt n\,T\left(\hat\rho_{pool}^{+} - 1\right)}{\sqrt{2\hat\phi_e^4/\hat\omega_e^4}}$  (5.22)

and

$t_b = \sqrt n\,T\left(\hat\rho_{pool}^{+} - 1\right)\sqrt{\dfrac{1}{nT^2}\mathrm{tr}\left(Y_{-1}Q_\Lambda Y_{-1}'\right)\dfrac{\hat\omega_e^2}{\hat\phi_e^4}}.$  (5.23)

These tests have a standard N(0,1) limiting distribution when n and T tend to infinity such that $n/T \to 0$. Moon and Perron also show that the factors can be estimated by principal components, and that the feasible versions of $t_a$ and $t_b$ retain these limiting distributions.
In the related approach of Bai and Ng (2004), the common and idiosyncratic components are tested separately, where $p_{\hat e}(i)$ is the p-value of the ADF test (without any deterministic component) on the estimated idiosyncratic component $\hat e_{it}$, and these p-values are combined as in the Fisher test (5.12).
Pesaran suggests instead a cross-sectionally augmented Dickey-Fuller (CADF) regression, where $\bar y_t$ is the average at time t of all n observations. The presence of the lagged cross-sectional average and its first difference accounts for the cross-sectional dependence through a factor structure. If there is serial correlation in the error term or the factor, the regression must be augmented as usual in the univariate case, with lagged first differences of both $y_{it}$ and $\bar y_t$ added:

$\Delta y_{it} = \alpha_i + \rho_i y_{i,t-1} + d_0 \bar y_{t-1} + \sum_{j=0}^{p}d_{j+1}\Delta\bar y_{t-j} + \sum_{k=1}^{p}c_k\Delta y_{i,t-k} + \varepsilon_{it}.$  (5.27)

A cross-sectionally augmented version of the IPS statistic is then the average of the individual CADF t-statistics:

$CIPS = \dfrac{1}{n}\sum_{i=1}^{n}CADF_i = \dfrac{1}{n}\sum_{i=1}^{n}t_i(n, T).$  (5.28)
The joint asymptotic limit of the CIPS statistic is nonstandard, and critical values are provided for various choices of n and T. The t-tests based on this regression should be devoid of $\lambda_i f_t$ in the limit and therefore free of cross-sectional dependence. The limiting distribution of these tests is different from the Dickey-Fuller distribution due to the presence of the cross-sectional average of the lagged level. Pesaran uses a truncated version of the IPS test that avoids the problem of moment calculation. That is,

$t_i^*(n,T) = \begin{cases}t_i(n,T) & \text{if } -K_1 < t_i(n,T) < K_2\\ -K_1 & \text{if } t_i(n,T) \le -K_1\\ K_2 & \text{if } t_i(n,T) \ge K_2.\end{cases}$  (5.29)

Pesaran suggests $K_1 = 6.19$ and $K_2 = 2.61$ when Model (5.27) includes an intercept only; the values of $K_1$ and $K_2$ depend on the deterministic terms in the CADF regression.
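A sketch of one CADF regression with p = 0 (numpy assumed; the panel is simulated, and actual inference would require Pesaran's tabulated critical values):

```python
import numpy as np

def cadf_tstat(y, i):
    """y: (n, T) panel; t-ratio on y_{i,t-1} in the p=0 CADF regression."""
    ybar = y.mean(axis=0)                         # cross-sectional average
    dy, dybar = np.diff(y[i]), np.diff(ybar)
    X = np.column_stack([np.ones(dy.size), y[i][:-1], ybar[:-1], dybar])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (dy.size - X.shape[1])   # residual variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(4)
y = np.cumsum(rng.standard_normal((6, 100)), axis=1)  # unit-root panel
print(cadf_tstat(y, 0))
```

Averaging `cadf_tstat` over i yields the CIPS statistic of (5.28).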
5.3 Cointegration Analysis in Panel Data
5.3.1 Residual-Based Approaches to Panel Cointegration
Like the panel unit root tests, panel cointegration tests can be motivated by the search for
more powerful tests than those obtained by applying individual time series cointegration
tests. The latter tests are known to have low power, especially for short T and short span
of the data.
1. Kao Tests
Kao (1999) presented two types of cointegration tests in panel data, the DF-type and ADF-type tests. Consider the panel regression model

$y_{it} = x_{it}'\beta + z_{it}'\gamma + e_{it}$  (5.30)

where $y_{it}$ and $x_{it}$ are I(1) and not cointegrated. For $z_{it} = \{\mu_i\}$, Kao (1999) proposed DF- and ADF-type unit root tests for $\hat e_{it}$ as a test for the null of no cointegration. The DF-type tests can be calculated from the fixed-effects residuals $\hat e_{it} = \tilde y_{it} - \tilde x_{it}'\hat\beta$, where $\tilde y_{it} = y_{it} - \bar y_i$ and $\tilde x_{it} = x_{it} - \bar x_i$. In order to test the null of no cointegration, run the autoregression $\hat e_{it} = \rho\,\hat e_{i,t-1} + v_{it}$ and compute

$\hat\rho = \dfrac{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{it}\hat e_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{i,t-1}^2}$  (5.31)

and

$t_\rho = \dfrac{(\hat\rho - 1)\sqrt{\sum_{i=1}^{N}\sum_{t=2}^{T}\hat e_{i,t-1}^2}}{s_e}$  (5.32)

where $s_e$ is the standard error of the residual autoregression. Kao's DF-type statistics are

$DF_\rho = \dfrac{\sqrt N\,T(\hat\rho - 1) + 3\sqrt N}{\sqrt{10.2}}, \qquad DF_t = \sqrt{1.25}\,t_\rho + \sqrt{1.875N},$

$DF_\rho^* = \dfrac{\sqrt N\,T(\hat\rho - 1) + 3\sqrt N\,\hat\sigma_v^2/\hat\sigma_{0v}^2}{\sqrt{3 + 36\,\hat\sigma_v^4/\left(5\,\hat\sigma_{0v}^4\right)}}$

and

$DF_t^* = \dfrac{t_\rho + \sqrt{6N}\,\hat\sigma_v/\left(2\hat\sigma_{0v}\right)}{\sqrt{\hat\sigma_{0v}^2/\left(2\hat\sigma_v^2\right) + 3\hat\sigma_v^2/\left(10\hat\sigma_{0v}^2\right)}}$

where $\hat\sigma_v^2$ and $\hat\sigma_{0v}^2$ are consistent estimates of the contemporaneous and long-run variances of $v_{it}$, constructed from $w_{it} = (y_{it}, x_{it})'$. While $DF_\rho$ and $DF_t$ are based on the strong exogeneity of the regressors and disturbances, $DF_\rho^*$ and $DF_t^*$ are for cointegration with an endogenous relationship between regressors and disturbances. For the ADF test, we can run the following regression:

$\hat e_{it} = \rho\,\hat e_{i,t-1} + \sum_{j=1}^{p}\vartheta_j\,\Delta\hat e_{i,t-j} + v_{itp}.$  (5.33)

With the null hypothesis of no cointegration, the ADF test statistic can be constructed as

$ADF = \dfrac{t_{ADF} + \sqrt{6N}\,\hat\sigma_v/\left(2\hat\sigma_{0v}\right)}{\sqrt{\hat\sigma_{0v}^2/\left(2\hat\sigma_v^2\right) + 3\hat\sigma_v^2/\left(10\hat\sigma_{0v}^2\right)}}$  (5.34)

where $t_{ADF}$ is the t-statistic of ρ in (5.33). $DF_\rho^*$, $DF_t^*$ and $ADF$ converge to a standard normal distribution N(0,1) by sequential limit theory.
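The simplest of these statistics, $DF_\rho$, can be sketched from within-demeaned residuals (numpy assumed; the residual panel is hypothetical, and the strong-exogeneity variant is used):

```python
import numpy as np

def kao_df_rho(e):
    """e: (N, T) panel of within-demeaned residuals; Kao's DF_rho statistic."""
    N, T = e.shape
    num = (e[:, 1:] * e[:, :-1]).sum()          # sum of e_it * e_{i,t-1}
    den = (e[:, :-1]**2).sum()                  # sum of e_{i,t-1}^2
    rho = num / den
    return (np.sqrt(N) * T * (rho - 1.0) + 3.0 * np.sqrt(N)) / np.sqrt(10.2)

rng = np.random.default_rng(5)
e = rng.standard_normal((20, 50))               # serially uncorrelated residuals
print(kao_df_rho(e))
```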
2. Pedroni Tests
Pedroni (1999, 2004) also proposed several tests for the null hypothesis of no cointegration in a panel data model that allows for considerable heterogeneity. Pedroni considered the following type of regression:

$y_{it} = \alpha_i + \delta_i t + X_{it}'\beta_i + e_{it}$  (5.35)

for a time-series panel of observables $y_{it}$ and $X_{it}$ for members $i = 1, \dots, N$ over time periods $t = 1, \dots, T$, where $X_{it}$ is an m-dimensional column vector for each member i and $\beta_i$ is an m-dimensional column vector of slope coefficients for each member i. The variables $y_{it}$ and $X_{it}$ are assumed to be I(1) for each member i of the panel, and under the null of no cointegration the residual $e_{it}$ will also be I(1). The parameters $\alpha_i$ and $\delta_i$ allow for the possibility of member-specific fixed effects and deterministic trends, respectively. The slope coefficients $\beta_i$ are also permitted to vary by individual, so that in general the cointegrating vectors may be heterogeneous across members of the panel.
The DF-type and ADF-type tests can be calculated from the fixed-effects residuals

$\hat e_{it} = \hat\rho_i\,\hat e_{i,t-1} + \hat v_{it}$

$\hat e_{it} = \hat\rho_i\,\hat e_{i,t-1} + \sum_{j=1}^{p_i}\hat\vartheta_{ij}\,\Delta\hat e_{i,t-j} + \hat v_{it}$

with the hypotheses

$H_0: \rho_i = 1$ versus $H_1: \rho_i = \rho < 1$ for all $i = 1, 2, \dots, N$

and

$H_0: \rho_i = 1$ versus $H_1: \rho_i < 1$, $i = 1, 2, \dots, N$.
To study the distributional properties of the above tests, Pedroni described the DGP in terms of the partitioned vector $Z_{it} = (Y_{it}, X_{it})'$ such that the true process $Z_{it}$ is generated as $Z_{it} = Z_{i,t-1} + \xi_{it}$, for $\xi_{it} = (\xi_{it}^Y, \xi_{it}^X)'$. We then assume that for each member i the following condition holds with regard to the time-series dimension:

$\dfrac{1}{\sqrt T}\sum_{t=1}^{[Tr]}\xi_{it} \Rightarrow B_i(\Omega_i)$ for each member i as $T \to \infty$,

where $\Rightarrow$ signifies weak convergence and $B_i(\Omega_i)$ is vector Brownian motion with asymptotic covariance $\Omega_i$, such that the $m \times m$ lower diagonal block $\Omega_{22i} > 0$, and where the $B_i(\Omega_i)$ are taken to be defined on the same probability space for all i. The asymptotic covariance matrix is given by

$\Omega_i = \lim_{T\to\infty}E\left[T^{-1}\left(\sum_{t=1}^{T}\xi_{it}\right)\left(\sum_{t=1}^{T}\xi_{it}\right)'\right]$  (5.36)

and can be decomposed as

$\Omega_i = \Omega_i^0 + \Gamma_i + \Gamma_i'$  (5.37)

where $\Omega_i^0$ is the contemporaneous covariance and $\Gamma_i$ is a weighted sum of autocovariances. More generally, the asymptotic long-run variance matrix for a panel of size $N \times T$ is block-diagonal, reflecting the assumed independence across members.
Pedroni's tests can be classified into two categories. The first set (within dimension) is similar to the tests discussed above, and involves averaging test statistics for cointegration in the time series across cross-sections. For the second set (between dimension), the averaging is done in pieces, so that the limiting distributions are based on limits of piecewise numerator and denominator terms. The basic approach in both cases
is to first estimate the hypothesized cointegration relationship separately for each member
of the panel and then pool the resulting residuals when constructing the panel tests for the
null of no cointegration. Specifically, in the first step, one can estimate the proposed
cointegrating regression for each individual member of the panel in the form of (5.35),
including idiosyncratic intercepts or trends as the particular model warrants, to obtain the
corresponding residuals eit . In the second step, the way in which the estimated residuals
are pooled will differ among the various statistics, which are defined as follows.
Panel variance ratio statistic:

$Z_{\hat v NT} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\,\hat e_{i,t-1}^2\right)^{-1}$  (5.38)
Panel-rho statistic:

$Z_{\hat\rho NT} - 1 = \left(\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\,\hat e_{i,t-1}^2\right)^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\left(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i\right)$  (5.39)
Panel-t statistic:

$Z_{tNT} = \left(\tilde\sigma_{NT}^2\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\,\hat e_{i,t-1}^2\right)^{-1/2}\sum_{i=1}^{N}\sum_{t=1}^{T}\hat L_{11i}^{-2}\left(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i\right)$  (5.40)
Group-rho statistic:

$\tilde Z_{\hat\rho NT} - 1 = \sum_{i=1}^{N}\left(\sum_{t=1}^{T}\hat e_{i,t-1}^2\right)^{-1}\sum_{t=1}^{T}\left(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i\right)$  (5.41)
Group-t statistic:

$\tilde Z_{tNT} = \sum_{i=1}^{N}\left(\hat\sigma_i^2\sum_{t=1}^{T}\hat e_{i,t-1}^2\right)^{-1/2}\sum_{t=1}^{T}\left(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i\right)$  (5.42)
where

$\hat\eta_{it} = \Delta\hat e_{it} - \hat\rho_i\,\hat e_{i,t-1}, \qquad \hat\lambda_i = \dfrac{1}{T}\sum_{s=1}^{K_i}w_{sK_i}\sum_{t=s+1}^{T}\hat\eta_{it}\hat\eta_{i,t-s}, \qquad w_{sK_i} = 1 - \dfrac{s}{1+K_i},$

$\hat s_i^2 = \dfrac{1}{T}\sum_{t=1}^{T}\hat\eta_{it}^2, \qquad \hat\sigma_i^2 = \hat s_i^2 + 2\hat\lambda_i, \qquad \tilde\sigma_{NT}^2 = \dfrac{1}{N}\sum_{i=1}^{N}\hat L_{11i}^{-2}\hat\sigma_i^2,$

and $\hat L_{11i}^2 = \hat\Omega_{11i} - \hat\Omega_{21i}'\hat\Omega_{22i}^{-1}\hat\Omega_{21i}$ is a consistent estimator of the long-run variance of the residuals conditional on the regressors.
The first three statistics are based on pooling the data across the within group of the
panel; the next two statistics are constructed by pooling the data along the between group
of the panel.
Pedroni (1999) derived asymptotic distributions and critical values for several
residual-based tests of the null of no cointegration in panels where there are multiple
regressors.
$\dfrac{\kappa_{K,NT} - \mu_K\sqrt N}{\sqrt{v_K}} \Rightarrow N(0,1) \quad (\text{as } T, N \to \infty)_{seq}$  (5.43)

where $\kappa_{NT} = \left(T^2 N^{3/2}Z_{\hat v NT},\; T\sqrt N\,Z_{\hat\rho NT-1},\; Z_{tNT},\; TN^{-1/2}\tilde Z_{\hat\rho NT-1},\; N^{-1/2}\tilde Z_{tNT}\right)'$ for each of the $K = 1, \dots, 5$ statistics of $\kappa_{NT}$; the values of $\mu_K$ and $v_K$ can be found from the table in Pedroni (1999), and depend on whether the model includes estimated fixed effects $\hat\alpha_i$ and/or estimated trends.
Thus, to test the null of no cointegration, one simply computes the value of the statistic in the form of (5.43) above, based on the values of $\mu_K$ and $v_K$ from Table II in Pedroni (1999), and compares these to the appropriate tails of the normal
distribution. Under the alternative hypothesis, the panel variance statistic diverges to
positive infinity, and consequently the right tail of the normal distribution is used to reject
the null hypothesis. Consequently, for the panel variance statistic, large positive values
imply that the null of no cointegration is rejected. For each of the other four test statistics,
these diverge to negative infinity under the alternative hypothesis, and consequently the
left tail of the normal distribution is used to reject the null hypothesis. Thus, for any of
these latter tests, large negative values imply the null of no cointegration is rejected.
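The unstandardized group-rho statistic of (5.41) can be sketched directly (numpy assumed; the residuals are hypothetical, the Bartlett truncation lag K is a tuning choice, and the $\mu_K$, $v_K$ standardization from Pedroni's table is still required for inference):

```python
import numpy as np

def pedroni_group_rho(e, K=4):
    """e: (N, T) panel of first-stage residuals; unstandardized group-rho."""
    N, T = e.shape
    stat = 0.0
    for i in range(N):
        de, lag = np.diff(e[i]), e[i][:-1]
        gamma = (de * lag).sum() / (lag**2).sum()
        eta = de - gamma * lag                   # eta residuals
        lam = sum((1.0 - s / (1.0 + K)) * (eta[s:] * eta[:-s]).sum() / T
                  for s in range(1, K + 1))      # one-sided serial correction
        stat += ((de * lag).sum() - T * lam) / (lag**2).sum()
    return stat

rng = np.random.default_rng(6)
e = np.cumsum(rng.standard_normal((7, 80)), axis=1)  # I(1) residuals under H0
print(pedroni_group_rho(e))
```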
Larsson et al. (2001) consider the likelihood-ratio (trace) statistic of the hypothesis that there are (at most) r stationary linear combinations in the cointegrated VAR system given by $Y_{it} = (y_{it1}, \dots, y_{itk})'$. Following the unit root test proposed in IPS (2003), Larsson et al. (2001) suggested the standardized LR-bar statistic

$\bar\Upsilon(r) = \dfrac{1}{\sqrt N}\sum_{i=1}^{N}\dfrac{\lambda_i(r) - E[\lambda_i(r)]}{\sqrt{\mathrm{Var}[\lambda_i(r)]}}$  (5.44)

where $\lambda_i(r)$ is the individual trace statistic, to test the null hypothesis that the cointegrating rank is at most $r_0$ against the alternative that it is greater than $r_0$. Using a sequential limit theory, it can be shown that $\bar\Upsilon(r)$ is asymptotically standard normal. The required moments have been tabulated by Larsson et al. (2001) for the model without deterministic terms and by Breitung (2005) for models with a constant and a linear time trend. Unlike the residual-based tests, the LR-bar test allows for the possibility of multiple cointegration relations in the panel.
As an illustration, Westerlund (2008) considers testing the Fisher effect, based on the system

$i_{it} = \alpha_i + \beta_i\pi_{it} + z_{it}$  (5.45)

$\pi_{it} = \rho_i\pi_{i,t-1} + w_{it}$  (5.46)

where $\pi_{it}$ is the actual rate of inflation observed at time period t for country i and $i_{it}$ is the ex post nominal interest rate on a nominal bond. We have argued above that, although the nominal interest rate can in general be viewed as nonstationary, it seems reasonable to permit inflation to be stationary. The disturbance $z_{it}$ is assumed to obey the following set of equations that allow for cross-sectional dependence through common factors:

$z_{it} = \lambda_i' F_t + e_{it}$  (5.47)

$F_{jt} = \rho_j F_{j,t-1} + u_{jt}$  (5.48)

$e_{it} = \phi_i e_{i,t-1} + v_{it}$  (5.49)

where $F_t$ is a vector of common factors $F_{jt}$ and $\lambda_i$ is a conformable vector of factor loadings. By assuming that $|\rho_j| < 1$ for all j, we ensure that $F_t$ is stationary, which implies that the order of integration of the composite regression error $z_{it}$ depends only on the integratedness of the idiosyncratic disturbance $e_{it}$. Thus, the following assumptions are made on the errors:

(a) $v_{it}$ and $w_{it}$ are mean zero for all i and t;
(b) $E(v_{it}v_{kj}) = 0$ and $E(w_{it}w_{kj}) = 0$ for all $i \ne k$, t and j;
(c) $E(v_{it}w_{kj}) = 0$ for all i, k, t and j.
The partial sum processes of $v_{it}$ and $w_{it}$ satisfy an invariance principle. In particular, $T^{-1/2}\sum_{t=1}^{[rT]}v_{it} \Rightarrow \sigma_i W_i(r)$ as $T \to \infty$ for each i, where $W_i(r)$ is a standard Brownian motion on $r \in [0,1]$.
Finally, to be able to handle the common factors, the following conditions are assumed to hold.

Assumption 3 (common factors).
(c) $\dfrac{1}{N}\sum_{i=1}^{N}\lambda_i\lambda_i' \to \Sigma_\Lambda$ as $N \to \infty$, where $\Sigma_\Lambda$ is positive definite.
Our objective is to test whether $i_{it}$ and $\pi_{it}$ are cointegrated or not by inferring whether $e_{it}$ is stationary or not. A natural approach is to employ the Bai and Ng (2004) method, which amounts to first estimating (5.45) in its first-difference form by OLS and then estimating the common factors by applying the method of principal components to the resulting residuals. A test of the null hypothesis of no cointegration can then be implemented as a unit root test of the recumulated sum of the defactored and first-differenced residuals.

We begin by taking first differences, in which case (5.47) becomes

$\Delta z_{it} = \lambda_i'\Delta F_t + \Delta e_{it}.$

Thus, had $\Delta z_{it}$ been known, we could have estimated $\lambda_i$ and $\Delta F_t$ directly by the method of principal components. However, $\Delta z_{it}$ is not known, and we must therefore apply principal components to its OLS estimate instead, extracting the eigenvectors associated with the largest eigenvalues of the $(T-1)\times(T-1)$ matrix $\Delta\hat z\,\Delta\hat z'$; the corresponding matrix of estimated factor loadings is obtained accordingly. As shown in Westerlund's Appendix, $\hat e_{it}$ is a consistent estimate of $e_{it}$, which suggests that the cointegration test can be implemented as a unit root test of

$\hat e_{it} = \phi_i\,\hat e_{i,t-1} + \text{error}$  (5.52)

with $\hat e_{it}$ in place of $e_{it}$.
At this point, it is useful to let n denote the number of units for which the no-cointegration restriction $\phi_i = 1$ is to be tested. This number can be equal to N but can also be a subset. The point of having two sets for the cross-section is to highlight the fact that even if n < N, accuracy will be gained by using all N units in the estimation of the common factors. In what follows, we shall propose two new panel cointegration tests that are based on applying the Durbin-Hausman principle to (5.52) (see Choi, 1994). The first, the panel test, is constructed under the maintained assumption that $\phi_i = \phi$ for all i, while the second, the group mean test, is not. Both tests are composed of two estimators of $\phi_i$ that have different probability limits under the alternative hypothesis of cointegration but share the property of consistency under the null of no cointegration. In particular, let $\hat\phi_i$ denote the OLS estimator of $\phi_i$ in (5.52), and let $\tilde\phi$ denote its pooled counterpart. The tests also require consistent estimates of the relevant nuisance parameters.
In so doing, consider the following kernel estimator:

$\hat\omega_i^2 = \dfrac{1}{T-1}\sum_{j=-M_i}^{M_i}\left(1 - \dfrac{|j|}{M_i+1}\right)\sum_{t=|j|+1}^{T}\hat v_{it}\hat v_{i,t-|j|},$

where $\hat v_{it}$ is the OLS residual obtained from (5.45) and $M_i$ is a bandwidth parameter that determines how many autocovariances of $\hat v_{it}$ to estimate in the kernel. The tests use $\hat S_i = \hat\omega_i^2\left(\hat\sigma_i^2\right)^{-2}$ and $\hat S_n = \hat\omega_n^2\left(\hat\sigma_n^2\right)^{-2}$, where

$\hat\sigma_n^2 = \dfrac{1}{n}\sum_{i=1}^{n}\hat\sigma_i^2 \quad \text{and} \quad \hat\omega_n^2 = \dfrac{1}{n}\sum_{i=1}^{n}\hat\omega_i^2.$
The Durbin-Hausman test statistics can now be obtained as

$DH_g = \sum_{i=1}^{n}\hat S_i\left(\tilde\phi_i - \hat\phi_i\right)^2\sum_{t=2}^{T}\hat e_{i,t-1}^2 \quad \text{and} \quad DH_p = \hat S_n\left(\tilde\phi - \hat\phi\right)^2\sum_{i=1}^{n}\sum_{t=2}^{T}\hat e_{i,t-1}^2.$  (5.53)
Note that while the panel statistic, denoted $DH_p$, is constructed by summing the n individual terms before multiplying them together, the group mean statistic, denoted $DH_g$, is constructed by first multiplying the various terms and then summing. The importance of this distinction lies in the formulation of the alternative hypothesis. For the panel test, the null and alternative hypotheses are formulated as $H_0: \phi_i = 1$ for all $i = 1, \dots, n$ versus $H_1^p: \phi_i = \phi$ and $\phi < 1$ for all i. Hence, in this case, we are in effect presuming a common value for the autoregressive parameter both under the null and alternative hypotheses. Thus, if this assumption holds, a rejection of the null should be taken as evidence in favor of cointegration for all n units.
By contrast, for the group mean test, $H_0$ is tested versus the alternative $H_1^g: \phi_i < 1$ for at least some i. Thus, in this case, we are not presuming a common value for the autoregressive parameter and, as a consequence, a rejection of the null cannot be taken to suggest that all n units are cointegrated. Instead, a rejection should be interpreted as providing evidence in favor of rejecting the null hypothesis for at least some of the cross-sectional units.
3. Asymptotic Distribution

The Durbin-Hausman tests are based on the estimated idiosyncratic error term $\hat e_{it}$, and are therefore asymptotically independent of the common factors. As shown in Westerlund's Appendix, under the null and Assumptions 1-3, each of the individual group mean statistics converges to

$B_i = \left(\int_0^1 W_i(r)^2\,dr\right)^{-1} \quad \text{as } N, T \to \infty.$

The fact that $B_i$ does not depend on the common factors is a direct consequence of the defactoring, which asymptotically removes the common components from the limiting distribution of the individual tests. It follows that the effects of the factor estimation are negligible, and the centered statistic is asymptotically normal:

$n^{-1/2}\left(DH_g - n\,E(B_i)\right) \Rightarrow N\big(0, \mathrm{Var}(B_i)\big).$
5.4 Estimation in Dynamic Heterogeneous Panels

This section considers the mean-group (MG) and pooled mean-group (PMG) estimators. The MG estimator (see Pesaran and Smith 1995) relies on estimating n time-series regressions and averaging the coefficients, whereas the PMG estimator (see Pesaran, Shin, and Smith 1997, 1999) relies on a combination of pooling and averaging of coefficients.
Assume an autoregressive distributed lag (ARDL) dynamic panel specification of the form

$y_{it} = \sum_{j=1}^{p}\lambda_{ij}y_{i,t-j} + \sum_{j=0}^{q}\delta_{ij}'X_{i,t-j} + \mu_i + \varepsilon_{it}.$  (5.55)

If the variables in (5.55) are, for example, I(1) and cointegrated, then the error term is I(0) for all i. A principal feature of cointegrated variables is their responsiveness to any deviation from long-run equilibrium: the short-run dynamics of the variables in the system are influenced by the deviation from equilibrium. Thus it is common to reparameterize (5.55) into the error-correction equation

$\Delta y_{it} = \phi_i\left(y_{i,t-1} - \theta_i'X_{it}\right) + \sum_{j=1}^{p-1}\lambda_{ij}^*\Delta y_{i,t-j} + \sum_{j=0}^{q-1}\delta_{ij}^{*\prime}\Delta X_{i,t-j} + \mu_i + \varepsilon_{it}$  (5.56)

where $\phi_i = -\left(1 - \sum_{j=1}^{p}\lambda_{ij}\right)$, $\theta_i = \sum_{j=0}^{q}\delta_{ij}\Big/\left(1 - \sum_k\lambda_{ik}\right)$, $\lambda_{ij}^* = -\sum_{m=j+1}^{p}\lambda_{im}$ for $j = 1, 2, \dots, p-1$, and $\delta_{ij}^* = -\sum_{m=j+1}^{q}\delta_{im}$ for $j = 1, 2, \dots, q-1$.
One could fit the model with a fixed-effects (FE) estimator, which allows the intercepts to differ across groups while constraining all slope coefficients and error variances to be equal; if the slopes are in fact heterogeneous, this produces inconsistent and potentially misleading results. On the other extreme, the model could be fitted separately for each group, and a simple arithmetic average of the coefficients could be calculated. This is the MG estimator proposed by Pesaran and Smith (1995). With this estimator, the intercepts, slope coefficients, and error variances are all allowed to differ across groups.
More recently, Pesaran, Shin, and Smith (1997, 1999) have proposed a PMG
estimator that combines both pooling and averaging. This intermediate estimator allows
the intercept, short-run coefficients, and error variances to differ across the groups (as
would the MG estimator) but constrains the long-run coefficients to be equal across
groups (as would the FE estimator). Under this assumption, relation (5.56) can be written
more compactly as
$\Delta y_i = \phi_i\xi_i(\theta) + W_i\kappa_i + \varepsilon_i, \qquad i = 1, 2, \dots, n$  (5.57)

where $\xi_i(\theta) = y_{i,-1} - X_i\theta$ is the error-correction term, $W_i = \left(\Delta y_{i,-1}, \dots, \Delta y_{i,-p+1}, \Delta X_i, \dots, \Delta X_{i,-q+1}, \iota\right)$ collects the short-run regressors, and $\kappa_i = \left(\lambda_{i1}^*, \dots, \lambda_{i,p-1}^*, \delta_{i0}^{*\prime}, \dots, \delta_{i,q-1}^{*\prime}, \mu_i\right)'$.
5.4.2 The Model Estimation and Inference
Since (5.57) is nonlinear in the parameters, Pesaran, Shin, and Smith (1999) develop a maximum likelihood method to estimate the parameters. Expressing the likelihood as the product of each cross-section's likelihood and taking the log yields

$l_T(\psi) = -\dfrac{T}{2}\sum_{i=1}^{n}\ln 2\pi\sigma_i^2 - \dfrac{1}{2}\sum_{i=1}^{n}\dfrac{1}{\sigma_i^2}\left(\Delta y_i - \phi_i\xi_i(\theta)\right)'H_i\left(\Delta y_i - \phi_i\xi_i(\theta)\right)$  (5.58)

where $H_i = I_T - W_i\left(W_i'W_i\right)^{-1}W_i'$, $\psi = (\theta', \phi', \sigma')'$, $\phi = (\phi_1, \phi_2, \dots, \phi_n)'$, and $\sigma = (\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)'$.
Maximum likelihood (ML) estimation of the long-run coefficients, θ, and the group-specific error-correction coefficients, $\phi_i$, can be carried out by maximizing (5.58) with respect to ψ. These ML estimators are termed the PMG estimators, to highlight both the pooling implied by the homogeneity restrictions on the long-run coefficients and the averaging across groups used to obtain means of the estimated error-correction coefficients and the other short-run parameters of the model.
The PMG estimators can be computed by the familiar Newton-Raphson algorithm,
which makes use of both the first and second derivatives. Alternatively, they can be
computed by a back-substitution algorithm that makes use of only the first derivatives of
(5.58). In this case, setting the first derivatives of the concentrated log-likelihood function to zero yields relations that can be solved iteratively:
$\hat\theta = -\left[\sum_{i=1}^{n}\dfrac{\hat\phi_i^2}{\hat\sigma_i^2}X_i'H_iX_i\right]^{-1}\sum_{i=1}^{n}\dfrac{\hat\phi_i}{\hat\sigma_i^2}X_i'H_i\left(\Delta y_i - \hat\phi_i y_{i,-1}\right)$  (5.59)

$\hat\phi_i = \left(\hat\xi_i'H_i\hat\xi_i\right)^{-1}\hat\xi_i'H_i\Delta y_i, \qquad i = 1, 2, \dots, n$  (5.60)

$\hat\sigma_i^2 = T^{-1}\left(\Delta y_i - \hat\phi_i\hat\xi_i\right)'H_i\left(\Delta y_i - \hat\phi_i\hat\xi_i\right), \qquad i = 1, 2, \dots, n$  (5.61)

where $\hat\xi_i = \xi_i(\hat\theta)$.
Starting from an initial estimate of θ, estimates of $\phi_i$ and $\sigma_i^2$ can be computed using (5.60) and (5.61), which can then be substituted in (5.59) to obtain a new estimate of θ, say $\hat\theta^{(1)}$, and so on until convergence is achieved.
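The back-substitution loop (5.59)-(5.61) can be sketched for the special case of a single regressor and no short-run lags, so that $H_i = I$ (numpy assumed; the data are simulated with a common long-run coefficient):

```python
import numpy as np

def pmg_iterate(dy, ylag, X, iters=50):
    """dy, ylag, X: (n, T) arrays. Back-substitution for (theta, phi, sigma2)."""
    n, T = dy.shape
    theta, phi, sig2 = 0.0, -0.5 * np.ones(n), np.ones(n)
    for _ in range(iters):
        num = sum(phi[i] / sig2[i] * X[i] @ (dy[i] - phi[i] * ylag[i])
                  for i in range(n))
        den = sum(phi[i]**2 / sig2[i] * X[i] @ X[i] for i in range(n))
        theta = -num / den                      # (5.59) with H_i = I
        for i in range(n):
            xi = ylag[i] - theta * X[i]         # xi_i(theta)
            phi[i] = (xi @ dy[i]) / (xi @ xi)   # (5.60)
            resid = dy[i] - phi[i] * xi
            sig2[i] = resid @ resid / T         # (5.61)
    return theta, phi, sig2

# simulate error-correction data with common long-run coefficient theta = 2
rng = np.random.default_rng(7)
n, T, theta0 = 4, 200, 2.0
X = rng.standard_normal((n, T))
ylag = rng.standard_normal((n, T))
phi0 = -np.array([0.3, 0.4, 0.5, 0.6])
dy = phi0[:, None] * (ylag - theta0 * X) + 0.1 * rng.standard_normal((n, T))
theta, phi, sig2 = pmg_iterate(dy, ylag, X)
print(theta, phi)
```

The loop should recover a θ near 2 and heterogeneous negative error-correction coefficients $\phi_i$.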
Under some regularity conditions, the MLEs of the short-run coefficients $\phi_i$ and $\kappa_i$ in the dynamic heterogeneous panel data model (5.57) are $\sqrt T$-consistent, and the MLE of ψ satisfies

$D\left(\hat\psi - \psi_0\right) \stackrel{a}{\sim} MN\left\{0, I(\psi_0)^{-1}\right\}$  (5.62)

where $D = \mathrm{diag}\left(T^{1/2}I_k, T^{1/2}I_n\right)$ and $I(\psi_0)$ is the random information matrix. More specifically, the pooled MLE $\hat\theta$, defined by (5.59), has the following large-T asymptotic distribution:

$\sqrt T\left(\hat\theta - \theta_0\right) \stackrel{a}{\sim} MN\left(0, \left[\sum_{i=1}^{n}\dfrac{\phi_{i0}^2}{\sigma_{i0}^2}R_{X_iX_i}\right]^{-1}\right)$  (5.63)

where $R_{X_iX_i} = \mathrm{plim}_{T\to\infty}\,T^{-1}X_i'H_iX_i$.
Once the pooled MLE of the long-run parameters, $\hat\theta$, is successfully computed, the covariance matrix of the full set of MLEs, $(\hat\theta', \hat\phi_1, \dots, \hat\phi_n, \hat\kappa_1', \dots, \hat\kappa_n')'$, is then consistently estimated by the inverse of the (symmetric) information matrix

$\begin{pmatrix} \sum_{i=1}^{n}\dfrac{\hat\phi_i^2}{\hat\sigma_i^2}X_i'X_i & -\dfrac{\hat\phi_1}{\hat\sigma_1^2}X_1'\hat\xi_1 & \cdots & -\dfrac{\hat\phi_n}{\hat\sigma_n^2}X_n'\hat\xi_n & -\dfrac{\hat\phi_1}{\hat\sigma_1^2}X_1'W_1 & \cdots & -\dfrac{\hat\phi_n}{\hat\sigma_n^2}X_n'W_n \\ & \dfrac{\hat\xi_1'\hat\xi_1}{\hat\sigma_1^2} & & 0 & \dfrac{\hat\xi_1'W_1}{\hat\sigma_1^2} & & 0 \\ & & \ddots & & & \ddots & \\ & 0 & & \dfrac{\hat\xi_n'\hat\xi_n}{\hat\sigma_n^2} & 0 & & \dfrac{\hat\xi_n'W_n}{\hat\sigma_n^2} \\ & & & & \dfrac{W_1'W_1}{\hat\sigma_1^2} & & 0 \\ & & & & & \ddots & \\ & & & & 0 & & \dfrac{W_n'W_n}{\hat\sigma_n^2} \end{pmatrix}$

where only the upper triangle is shown.
5.5 Specification and Estimation of Spatial Panel Data Models
$\{Y_i, i \in D\}$  (5.64)

where the index set D is either a continuous surface or a finite set of discrete locations. The spatial autocorrelation can be formally expressed by the moment condition $\mathrm{Cov}(Y_i, Y_j) \ne 0$ for neighboring pairs $i \ne j$. A spatial autoregressive process is specified as

$(Y - \iota\mu) = \rho W(Y - \iota\mu) + \varepsilon \quad \text{or} \quad (Y - \iota\mu) = (I - \rho W)^{-1}\varepsilon$  (5.66)

with spatial lag, written element by element as $[WY]_i = \sum_j w_{ij}Y_j$  (5.67)

or, in matrix form, as

$WY.$  (5.68)

Since for each i the matrix elements $w_{ij}$ are only nonzero for those $j \in s_i$ (where $s_i$ is the neighborhood set), only the matching $Y_j$ are included in the lag. For ease of interpretation, the elements of the spatial weights matrix are typically row-standardized, such that for each i, $\sum_j w_{ij} = 1$, so the spatial lag may be interpreted as a weighted average of neighboring values.
$\mathrm{Var}(Y) = E\left[(Y - \iota\mu)(Y - \iota\mu)'\right] = \sigma^2\left[(I - \rho W)'(I - \rho W)\right]^{-1}$  (5.69)

This matrix implies that shocks at any location affect all other locations, through a so-called spatial multiplier effect (or global interaction).
A major distinction between processes in space and processes in the time domain is that, even with iid disturbances $\varepsilon_i$, the diagonal elements in (5.69) are not constant. Furthermore, the heteroskedasticity depends on the neighborhood structure embedded in the spatial weight matrix W.
When specifying the spatial dependence between observations, the model may
incorporate a spatial autoregressive process in the disturbance, or the model may contain
a spatially autoregressive dependent variable. The first model is known as the spatial
error model and the second as the spatial lag model.
5.4.2 The Fixed Effects Spatial Error and Spatial Lag Model
The traditional fixed effects model extended to included spatial error autocorrelation
can be specified as
Yt = X t + + t t = W t + t E ( t ) = 0 E ( t t) = 2 I N (5.70)
where = ( 1 ,... N ) ,and the traditional model extended with a spatially lagged
dependent variable reads as
Yt = WYt + X t + + t E ( t ) = 0 E ( t t) = 2 I N (5.71)
where W denotes a N N spatial weight matrix describing the spatial arrangement of the
spatial units, it is assumed that W is a matrix of known constants. In the spatial error
specification, the properties of the disturbance structure have been changed, is usually
called the spatial autocorrelation coefficient; where as in the spatial lag specification, the
number of explanatory variables has increased by one, is referred to as the spatial
autoregressive coefficient.
The spatial econometric literature has shown that ordinary least squares (OLS)
estimation is inappropriate for models incorporating spatial effects. In the case of spatial
error autocorrelation, the OLS estimator of the response parameters remains unbiased,
but it loses the efficiency property. In the case when the specification contains a spatially
lagged dependent variable, the OLS estimator of the response parameters not only loses
the property of being unbiased but also is inconsistent.
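A small simulation illustrates this inconsistency for the spatial lag case. The sketch below runs under assumed settings (numpy, a circular contiguity matrix, true coefficient 0.6, all my choices): data are generated from the reduced form Y = (I − δW)^{-1}(Xβ + ε), and OLS of Y on the spatial lag WY and X overstates δ because WY is correlated with the disturbance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, delta, beta = 50, 0.6, 1.0

# Illustrative row-standardized contiguity: each unit's two neighbors
# on a circle get weight 0.5 (an assumption, not from the text).
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5

A_inv = np.linalg.inv(np.eye(N) - delta * W)

est = []
for _ in range(500):
    x = rng.normal(size=N)
    eps = rng.normal(size=N)
    y = A_inv @ (beta * x + eps)        # reduced form of the spatial lag model
    Z = np.column_stack([W @ y, x])     # regress y on the spatial lag and x
    est.append(np.linalg.lstsq(Z, y, rcond=None)[0][0])

print(np.mean(est))  # exceeds the true delta = 0.6 on average
```

The upward bias arises from simultaneity: WY depends on ε through the spatial multiplier, so the spatial-lag regressor is endogenous.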
1) MLE
Instead of estimating the demeaned equation by OLS, it can also be estimated by maximum likelihood (ML). The only difference is that the ML estimators do not make corrections for degrees of freedom. The log-likelihood function corresponding to the demeaned equation incorporating spatial error autocorrelation is

LogL = −(NT/2) ln(2πσ²) + T ln|I_N − ρW| − (1/(2σ²)) Σ_{t=1}^T e_t' e_t ,
e_t = (I_N − ρW)[(Y_t − Ȳ) − (X_t − X̄)β]    (5.72)

and with a spatially lagged dependent variable,

LogL = −(NT/2) ln(2πσ²) + T ln|I_N − δW| − (1/(2σ²)) Σ_{t=1}^T e_t' e_t ,
e_t = (I_N − δW)(Y_t − Ȳ) − (X_t − X̄)β    (5.73)

where Ȳ and X̄ denote the time averages of the dependent and explanatory variables (excluding the constant term). The within-transformed variables must be used in the model with fixed effects.
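As a concrete illustration, the log-likelihood (5.72) can be evaluated directly from within-demeaned data. The helper below is a sketch (numpy; the function name and the array layout, Y as T × N and X as T × N × K, are my choices):

```python
import numpy as np

def fe_spatial_error_loglik(Y, X, beta, rho, sigma2, W):
    """Evaluate the log-likelihood (5.72). Y is T x N, X is T x N x K;
    data are demeaned over time (the within transform), and
    e_t = (I_N - rho*W) [ (Y_t - Ybar) - (X_t - Xbar) beta ]."""
    T, N = Y.shape
    B = np.eye(N) - rho * W
    Yd = Y - Y.mean(axis=0)                 # within transform
    Xd = X - X.mean(axis=0)
    ll = -N * T / 2.0 * np.log(2 * np.pi * sigma2)
    ll += T * np.log(np.linalg.det(B))      # Jacobian term T ln|I_N - rho W|
    for t in range(T):
        e = B @ (Yd[t] - Xd[t] @ beta)
        ll -= e @ e / (2 * sigma2)
    return ll
```

Maximizing this function over (β, ρ, σ²) with a numerical optimizer gives the ML estimates; at ρ = 0 the Jacobian term vanishes and (5.72) reduces to the Gaussian likelihood of the within regression.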
3) GMM for the Fixed Effects Spatial Error Model
In the single cross-section setting, a consistent estimator of ρ can be constructed from a set of moment conditions on the error terms, as demonstrated in the Kelejian-Prucha generalized moments (KPGM) estimator (Kelejian and Prucha, 1999). These conditions can be readily extended to the pooled or fixed effects model by applying them to the within-transformed (demeaned) residuals. The point of departure is the set of moment conditions
E[ ε'ε / NT ] = σ²
E[ ε'(I_T ⊗ W)'(I_T ⊗ W)ε / NT ] = σ² tr(W'W)/N
E[ ε'(I_T ⊗ W)ε / NT ] = 0
where tr is the matrix trace operator and use is made of tr(I_T ⊗ W'W) = T·tr(W'W). Substituting ε = u − ρ(I_T ⊗ W)u and replacing u by the regression residuals yields a system of three equations in ρ, ρ² and σ², which can be solved by nonlinear least squares (for technical details, see Kelejian and Prucha, 1999). Under some fairly general regularity conditions, substituting the consistent estimator for ρ into the spatial FGLS will yield a consistent estimator for β.
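These moment conditions can be turned into an estimator in a few lines. The sketch below (numpy; the function name is hypothetical, and a grid search stands in for the nonlinear least squares step) forms ε = u − ρ(I_T ⊗ W)u, solves the first condition for σ², and picks ρ to minimize the squared deviations of the remaining two.

```python
import numpy as np

def kpgm_rho(u, W, T, grid=np.linspace(-0.9, 0.9, 181)):
    """Kelejian-Prucha GM sketch: choose (rho, sigma2) minimizing the
    deviations of the second and third moment conditions, with sigma2
    concentrated out of the first. u is the NT-vector of residuals,
    stacked by time period."""
    N = W.shape[0]
    NT = N * T
    Wb = np.kron(np.eye(T), W)          # I_T kron W
    ub = Wb @ u
    ubb = Wb @ ub
    trWW = np.trace(W.T @ W)
    best = None
    for rho in grid:
        e = u - rho * ub                # eps = u - rho*(I_T kron W)u
        eb = ub - rho * ubb             # (I_T kron W) eps
        s2 = e @ e / NT                 # first condition solved for sigma^2
        g2 = eb @ eb / NT - s2 * trWW / N
        g3 = eb @ e / NT
        crit = g2 ** 2 + g3 ** 2
        if best is None or crit < best[0]:
            best = (crit, rho, s2)
    return best[1], best[2]
```

A proper implementation would replace the grid by a nonlinear least squares routine, but the objective is the same.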
5.4.3 The Random Effects Spatial Error Model
Kapoor et al. (2007) introduce generalizations of the GM procedure in Kelejian and
Prucha (1999) to panel data models involving a first order spatially autoregressive
disturbance term, whose innovations have an error component structure. In particular,
they introduce three GM estimators which correspond to alternative weighting schemes
for the moments and derive the large sample properties when T is fixed and N .
Their specifications are such that the model's disturbances are potentially both spatially and time-wise autocorrelated, as well as heteroskedastic. They also define a feasible generalized least squares (FGLS) estimator for the model's regression parameters. This FGLS estimator is based on a spatial counterpart to the Cochrane-Orcutt transformation, as well as on transformations utilized in the estimation of classical error component models.
Baltagi et al. (2007) extended the GM procedure in Kapoor et al. (2007) to the
unbalanced panel data case. More specifically, we assume that in each time period
y_t = X_t β + u_t ,  u_t = ρW u_t + ε_t ,  |ρ| < 1

Stacking the observations we have

y = Xβ + u    (5.74)
u = ρ(I_T ⊗ W)u + ε    (5.75)

We assume furthermore the following error component structure for the innovation vector ε:

ε = (ι_T ⊗ I_N)μ + ν    (5.76)

where μ represents the N × 1 vector of unit-specific error components, and ν contains the error components that vary over both the cross-sectional units and time periods. The error components are assumed to satisfy μ_i ~ IID(0, σ_μ²) and ν_it ~ IID(0, σ_ν²).
The actual estimation of the parameters of the model (5.74)-(5.76) is performed in
three steps.
Step 1: In the first step we estimate the regression model in (5.74) by ordinary least squares and compute the OLS residuals û = y − X β̂_OLS.
Alternatively, Baltagi et al. (2007) run fixed effects on model (5.74) to obtain consistent estimates of β; they find that although the magnitudes of some estimates change, the results for the statistically significant estimates are basically the same as the OLS estimates.
Step 2: We estimate the spatial autoregressive parameter ρ and the variance components σ_ν² and σ₁² = σ_ν² + Tσ_μ². Defining ū = (I_T ⊗ W)u, ū̄ = (I_T ⊗ W)ū, ε̄ = (I_T ⊗ W)ε, Q = (I_T − ι_T ι_T'/T) ⊗ I_N and P = (ι_T ι_T'/T) ⊗ I_N, Kapoor et al. (2007) suggest a GM estimator based on the following six moment conditions:

E[ ε'Qε / (N(T−1)) ] = σ_ν²
E[ ε̄'Qε̄ / (N(T−1)) ] = σ_ν² tr(W'W)/N
E[ ε̄'Qε / (N(T−1)) ] = 0
E[ ε'Pε / N ] = σ₁²
E[ ε̄'Pε̄ / N ] = σ₁² tr(W'W)/N
E[ ε̄'Pε / N ] = 0    (5.77)
Substituting ε = u − ρū and ε̄ = ū − ρū̄, we obtain a system of six equations involving the second moments of u, ū and ū̄. The three moments in (5.77) which do not involve σ₁² yield estimates of σ_ν² and ρ. The
fourth moment condition is then used to solve for σ₁², given the estimates of σ_ν² and ρ.
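The six sample moments are straightforward to compute once Q and P are written as Kronecker products. A minimal sketch (numpy; the function name is hypothetical), evaluated at a trial value of ρ:

```python
import numpy as np

def kapoor_sample_moments(u, W, T, rho):
    """Sample analogues of the six moments in (5.77). u is the NT-vector
    of disturbances stacked by time; eps = u - rho*(I_T kron W)u."""
    N = W.shape[0]
    J = np.ones((T, T)) / T                     # i_T i_T' / T
    Q = np.kron(np.eye(T) - J, np.eye(N))       # within (time-demeaning)
    P = np.kron(J, np.eye(N))                   # between (time-averaging)
    Wb = np.kron(np.eye(T), W)
    ub = Wb @ u
    e = u - rho * ub                            # eps
    eb = ub - rho * (Wb @ ub)                   # (I_T kron W) eps
    return np.array([
        e @ Q @ e / (N * (T - 1)),    # -> sigma_nu^2
        eb @ Q @ eb / (N * (T - 1)),  # -> sigma_nu^2 * tr(W'W)/N
        eb @ Q @ e / (N * (T - 1)),   # -> 0
        e @ P @ e / N,                # -> sigma_1^2 = sigma_nu^2 + T*sigma_mu^2
        eb @ P @ eb / N,              # -> sigma_1^2 * tr(W'W)/N
        eb @ P @ e / N,               # -> 0
    ])
```

Evaluated at the true ρ, the first and fourth sample moments estimate σ_ν² and σ₁², while the third and sixth are close to zero.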
Kapoor et al. (2007) give the conditions needed for the consistency of this estimator
as N → ∞. The second GM estimator is based upon weighting the moment equations by
the inverse of a properly normalized variancecovariance matrix of the sample moments
evaluated at the true parameter values. A simple version of this weighting matrix is
derived under normality of the disturbances. The third GM estimator is motivated by
computational considerations and replaces a component of the weighting matrix for the
second GM estimator by an identity matrix. Kapoor et al. (2007) perform Monte Carlo
experiments comparing MLE and these three GM estimation methods. They find that on
average, the RMSE of ML and their weighted GM estimators are quite similar. However,
the first unweighted GM estimator has a RMSE that is 17% to 14% larger than that of the
weighted GM estimators.
Step 3: In this step the regression model in (5.74)-(5.75) is re-estimated in terms of a
feasible GLS estimator.
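This last step can be sketched as a spatial Cochrane-Orcutt transform followed by error-component GLS: premultiply y and X by (I − ρ̂(I_T ⊗ W)) to remove the spatial autocorrelation, then weight the within (Q) and between (P) parts by 1/σ̂_ν and 1/σ̂₁. A hypothetical implementation (numpy; names are my choices):

```python
import numpy as np

def spatial_fgls(y, X, W, T, rho, s2_nu, s2_one):
    """Feasible GLS sketch for (5.74)-(5.76): spatial Cochrane-Orcutt
    transform with rho, then GLS using Omega^{-1/2} = Q/sigma_nu + P/sigma_1."""
    N = W.shape[0]
    Wb = np.kron(np.eye(T), W)
    ys = y - rho * (Wb @ y)             # (I - rho (I_T kron W)) y
    Xs = X - rho * (Wb @ X)
    J = np.ones((T, T)) / T
    Q = np.kron(np.eye(T) - J, np.eye(N))
    P = np.kron(J, np.eye(N))
    Om_inv_half = Q / np.sqrt(s2_nu) + P / np.sqrt(s2_one)
    yt, Xt = Om_inv_half @ ys, Om_inv_half @ Xs
    return np.linalg.lstsq(Xt, yt, rcond=None)[0]  # GLS = OLS on transformed data
```

Because Q and P are orthogonal projectors with Q + P = I, the inverse square root of the transformed error covariance σ_ν²Q + σ₁²P is exactly Q/σ_ν + P/σ₁, which is what makes this two-part weighting work.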