
ADVANCED TEXTS IN ECONOMETRICS

General Editors

C. W. J. GRANGER    G. E. MIZON


CO-INTEGRATION, ERROR
CORRECTION, AND
THE ECONOMETRIC
ANALYSIS OF
NON-STATIONARY DATA
Anindya Banerjee, Juan J. Dolado,
John W. Galbraith, and David F. Hendry

OXFORD UNIVERSITY PRESS

This book has been printed digitally and produced in a standard specification
in order to ensure its continuing availability

OXFORD
UNIVERSITY PRESS

Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford.
It furthers the University's objective of excellence in research, scholarship,
and education by publishing worldwide in
Oxford New York
Auckland Bangkok Buenos Aires Cape Town Chennai
Dar es Salaam Delhi Hong Kong Istanbul Karachi Kolkata
Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi
São Paulo Shanghai Taipei Tokyo Toronto
Oxford is a registered trade mark of Oxford University Press
in the UK and in certain other countries
Published in the United States
by Oxford University Press Inc., New York
© A. Banerjee, J. J. Dolado, J. W. Galbraith, and D. F. Hendry 1993
The moral rights of the author have been asserted
Database right Oxford University Press (maker)
Reprinted 2003
All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any means,
without the prior permission in writing of Oxford University Press,
or as expressly permitted by law, or under terms agreed with the appropriate
reprographics rights organization. Enquiries concerning reproduction
outside the scope of the above should be sent to the Rights Department,
Oxford University Press, at the address above
You must not circulate this book in any other binding or cover
and you must impose this same condition on any acquirer
ISBN 0-19-828810-7

Preface
This book is intended as a guide to the literature on co-integration and modelling of integrated processes. Time-series econometrics has developed rapidly during the past decade, but especially so in the analysis of non-stationarity. In particular, the study of integrated processes has grown in importance from the status of an exotic topic, discussed only in technical journals, to being an essential part of the econometrician's collection of techniques. It has thereby developed into an area of interest for econometric theorists and applied econometricians alike. This book is aimed at graduate students in economics, applied econometricians, econometric theorists, and the general audience of economists who use empirical methods to analyse time series.
Despite the growing importance of the literature on integration and co-integration, most accounts of this literature remain confined to journals, edited collections of papers, or survey papers. While some of the surveys are quite detailed, space restrictions usually do not allow a full exposition of many of the theoretical points. This book attempts to bridge the gap between accounts such as surveys, which are mainly descriptive, and accounts that are mainly theoretical. It explains the important concepts informally and also presents them formally. The asymptotic theory of integrated processes is described and the tools provided by this theory are used to derive, in some detail, the distributions of estimators. By taking readers step by step through some of the main derivations, our hope is to make the theory readily accessible to a wide audience.
We have tried to make the book as self-contained as possible. A knowledge of econometrics, statistics, and matrix algebra at the level of a final-year undergraduate or first-year graduate course in econometrics is assumed, but otherwise all of the important statistical concepts and techniques are described.
A book such as this one, which discusses an area that is developing rapidly, is inevitably incomplete and runs the risk of not being quite up-to-date. To limit the time taken in writing and revising, we did not seek to chase a frontier that was expanding in many directions. Rather, the topics covered reflect our views of issues, models, and methods that are likely to remain important for some time to come, many of which will continue to provide the platform for future research.

Acknowledgements
Our book was written in two continents, three years, and four universities, so the list of people, across time, space, and departments, to whom we owe extensive debts of gratitude has grown formidably large. A major part of this debt is owed to the Departments of Economics at the Universities of California at San Diego, Florida in Gainesville, McGill, and Oxford, and the Bank of Spain, where the authors either worked or visited for substantial periods. Their generous support of our work is much appreciated.
The book has also benefited greatly from the patient scrutiny of several of our colleagues, who read the entire typescript and made detailed comments. We have pleasure in thanking Michael Clements, Rob Engle, Neil Ericsson, Tony Hall (and several of his students), Colin Hargreaves, Søren Johansen, Katarina Juselius, Teun Kloek, James MacKinnon, G. S. Maddala, Grayham Mizon, Jean-François Richard, Mark Rush, Neil Shephard, Timo Teräsvirta, and four anonymous referees for their help. They have made a great contribution to this book, and found many infelicities in earlier versions, but of course are not responsible for any that remain.
Early versions of the book were inflicted by us upon our graduate students. Among those who suffered from the confusion caused by obscure notation and prose, but continued unflinchingly, Hughes Dauphin, Carol Dole, Jesus Gonzalo, Catherine Liston, Claudio Lupi, Neil Rickman, and Geeta Singh deserve special thanks.
We are also indebted to Julia Campos, Michael Clements, Steven Cook, Neil Ericsson and Claudio Lupi for proof reading.
The financial support of the Economic and Social Research Council (UK) under grants B01250024 and R231184 and the Fonds pour la Formation des Chercheurs et l'Aide à la Recherche (Quebec) is gratefully acknowledged. Finally, we thank Andrew Schuller and the editors of this series, who remained encouraging about the project despite its many difficulties.
Oxford      A. B.
Madrid      J. J. D.
Montreal    J. W. G.
Oxford      D. F. H.

Contents
Notational Conventions, Symbols, and Abbreviations

1. Introduction and Overview
1.1. Equilibrium relationships and the long run
1.2. Stationarity and equilibrium relationships
1.3. Equilibrium and the specification of dynamic models
1.4. Estimation of long-run relationships and testing for orders of integration and co-integration
1.5. Preliminary concepts and definitions
1.6. Data representation and transformations
1.7. Examples: typical ARMA processes
1.8. Empirical time series: money, prices, output, and interest rates
1.9. Outline of later chapters
Appendix

2. Linear Transformations, Error Correction, and the Long Run in Dynamic Regression
2.1. Transformations of a simple model
2.2. The error-correction model
2.3. An example
2.4. Bårdsen and Bewley transformations
2.5. Equivalence of estimates from different transformations
2.6. Homogeneity and the ECM as a linear transformation of the ADL
2.7. Variances of estimates of long-run multipliers
2.8. Expectational variables and the interpretation of long-run solutions

3. Properties of Integrated Processes
3.1. Spurious regression
3.2. Trends and random walks
3.3. Some statistical features of integrated processes
3.4. Asymptotic theory for integrated processes
3.5. Using Wiener distribution theory
3.6. Near-integrated processes

4. Testing for a Unit Root
4.1. Similar tests and exogenous regressors in the DGP
4.2. General dynamic models for the process of interest
4.3. Non-parametric tests for a unit root
4.4. Tests on more than one parameter
4.5. Further extensions
4.6. Asymptotic distributions of test statistics

5. Co-integration
5.1. An example
5.2. Polynomial matrices
5.3. Integration and co-integration: formal definitions and theorems
5.4. Significance of alternative representations
5.5. Alternative representations of co-integrated variables: two examples
5.6. Engle-Granger two-step procedure

6. Regression with Integrated Variables
6.1. Unbalanced regressions and orthogonality tests
6.2. Dynamic regressions
6.3. Functional forms and transformations
Appendix: Vector Brownian Motion

7. Co-integration in Individual Equations
7.1. Estimating a single co-integrating vector
7.2. Tests for co-integration in a single equation
7.3. Response surfaces for critical values
7.4. Finite-sample biases in OLS estimates
7.5. Powers of single-equation co-integration tests
7.6. An empirical illustration
7.7. Fully modified estimation
7.8. A fully modified least-squares estimator
7.9. Dynamic specification
7.10. Examples
Appendix: Covariance Matrices

8. Co-integration in Systems of Equations
8.1. Co-integration and error correction
8.2. Estimating co-integrating vectors in systems
8.3. Inference about the co-integration space
8.4. An empirical illustration
8.5. Extensions
8.6. A second example of the Johansen maximum likelihood approach
8.7. Asymptotic distributions of estimators of co-integrating vectors in I(1) systems

9. Conclusion
9.1. Summary
9.2. The invariance of co-integrating vectors
9.3. Invariance of co-integration under seasonal adjustment
9.4. Structured time-series models and co-integration
9.5. Recent research on integration and co-integration
9.6. Reinterpreting econometrics time-series problems

References
Acknowledgements for Quoted Extracts
Author Index
Subject Index


Notational Conventions,
Symbols, and Abbreviations
The following notational conventions will be used throughout the text:
Y, y                                endogenous variables
X, Z, x, z                          exogenous variables, or vectors containing both y and z
Greek letters                       population values (parameters)
Greek letters with ^ or ~           sample values (estimates)
Bold lower case (Roman or Greek)    vectors
Bold upper case (Roman or Greek)    matrices
Equation numbers
Equations are numbered consecutively in each chapter and referred to within that chapter by this number alone. Equations from other chapters are referred to by the chapter number and equation number within chapter; e.g. the fifth equation in Chapter 2 is (5) within Chapter 2, and (2.5) elsewhere.
Symbols
$L$             lag operator
$\Delta$        first-difference operator
$\otimes$       Kronecker product
$\forall$       for all
$|x|$           modulus or absolute value of x, where x is a scalar
$|A|$           determinant of A, where A is a matrix
$x \mid y$      x conditional on y
$\Rightarrow$   weak convergence
                convergence in distribution
                convergence in probability
Abbreviations

ADF       augmented Dickey-Fuller
ADL       autoregressive-distributed lag
AR        autoregression
ARIMA     autoregressive integrated moving average
ARMA      autoregressive-moving average
ARMAX     ARMA + additional exogenous processes
ASE       asymptotic standard error
BM        Brownian motion
CI(d, b)  co-integrated of order d, b
CLT       central limit theorem
COMFAC    common factor error representation
CRDW      co-integrating regression DW statistic
diag      diagonal matrix
d.f.      degrees of freedom
DF        Dickey-Fuller
DGP       data-generation process
DW        Durbin-Watson statistic
ECM       error-correction model/mechanism
ESE       (average) estimated standard error
FCLT      functional central limit theorem/s
FIML      full-information maximum likelihood
GLS       generalized least squares
GNP       gross national product
I(d)      integrated of order d
ID        independently distributed
IID       independently and identically distributed
IMA       integrated moving average
IN($\mu$, $\sigma^2$)  independently and normally distributed with mean $\mu$ and variance $\sigma^2$
IV        instrumental variables
LIML      limited-information maximum likelihood
MA        moving average
MDS       martingale difference sequence
MLE       maximum likelihood estimator
N($\mu$, $\sigma^2$)   normally distributed with mean $\mu$ and variance $\sigma^2$
NI        near-integrated
OLS       ordinary least squares
SC        Schwarz information criterion
SD        standard deviation
SE        standard error
SI        seasonally integrated
SSD       sample standard deviation
T         sample size or last observation in a time-series
TFE       total final expenditure
VAR       vector autoregression
var       variance
vec       vectorizing operator
W(r)      Wiener (Brownian motion) process with increments of variance r


1. Introduction and Overview
This book considers the econometric analysis of both stationary and non-stationary processes which may be linked by equilibrium relationships. It exposits the main tools, techniques, models, concepts, and distributions involved in econometric modelling of possibly non-stationary time-series data. Since the focus is on equilibrium concepts, including co-integration and error correction, the analysis begins with a discussion of the application of these concepts to stationary empirical models. Later we will show that integrated processes can be reduced to this case by suitable transformations that take advantage of co-integrating (equilibrium) relationships. In this chapter we will introduce some important concepts from time-series analysis and the theory of stochastic processes, and in particular the theory of Brownian motion processes. We also offer several empirical examples which use these concepts.
A significant re-evaluation of the statistical basis of econometric modelling took place during the 1980s. Its analytical basis expanded from the assumption of stationarity to include integrated processes. The effect of this shift is far from complete, but is already radical, influencing the choice of model forms, modelling practices, statistical inference, distribution theory, and the interpretation of many traditional concepts such as simultaneity, measurement errors, collinearity, forecasting, and exogeneity. This book attempts to analyse these issues, describe the tools necessary to investigate integrated processes, and relate the new methods to those more familiar to econometricians. Research is continuing at a rapid pace, and since this book cannot cover all of the techniques that have been explored, we will concentrate on those that we believe will remain useful.
Time-series econometrics is concerned with the estimation of relationships among groups of variables, each of which is observed at a number of consecutive points in time. The relationships among these variables may be complicated; in particular, the value of each variable may depend on the values taken by many others in several previous time periods. In consequence, the effect that a change in one variable has on another depends upon the time horizon that we consider. It is easy to imagine examples in which a change in one quantity has little or no effect on another at first and a substantial effect later. Alternatively, a variable may have a substantial effect on another for a time, but that effect may eventually die out.
It is useful, therefore, to distinguish what are often called 'short-run' relationships (those holding over a relatively short period) from 'long-run' relationships. The former relate to links that do not persist. For example, a sudden storm may temporarily reduce the supply of fresh fish and increase its price, but later fair weather will lead to the re-establishing of the earlier price if demand is unaltered. The long-run relationships determine the generally prevailing price-quantity combinations transacted in the market, and so are closely linked to the concepts of equilibrium relationships in economic theory and of persistent co-movements of economic time series in econometrics. Our first task is to clarify these concepts.

1.1. Equilibrium Relationships and the Long Run


An equilibrium state is defined as one in which there is no inherent tendency to change. A disequilibrium is any situation that is not an equilibrium and hence characterizes a state that contains the seeds of its own destruction. An equilibrium state may or may not have the property of either local or global stability; thus, it may or may not be true that the system tends to return to the equilibrium state when it is perturbed. However, we generally consider only stable equilibria, since unstable equilibria will not persist given that there are stochastic shocks to the economy. That is, equilibria are states to which the system is attracted, other things being equal. It may also be possible in some circumstances to view the forces tending to push the system back into equilibrium as depending upon the magnitude of the deviation from equilibrium at a given point in time.
Equilibrium may be either general or partial. In the latter case, a given market is viewed as having attained equilibrium in spite of the fact that we have not taken account of the feedback from other markets. In both cases, an equilibrium relationship is expressed through a function $f(x_1, x_2, \ldots, x_n) = 0$, which describes the relationships that hold among the $n$ variables $x_1$ to $x_n$ when the system is in equilibrium. The phrase 'long-run equilibrium' is also used to denote the equilibrium relationship to which a system converges over time. Over finite periods of time, the long-run or equilibrium relationships may fail to hold, but they will eventually hold to any degree of accuracy if the equilibrium is stable, and if the system does not experience further shocks from outside. Expressed differently, a long-run equilibrium relationship entails a systematic co-movement among economic variables which an economic system exemplifies precisely in the long run; we will write equations representing such co-movements without time subscripts as, e.g. $x_1 = \beta x_2$ to denote a linear long-run relation between $x_1$ and $x_2$.
Our definition of equilibrium is therefore not that in which 'equilibrium' refers to clearing in a particular market and where 'disequilibrium' means that supply is not equal to demand, as in Quandt (1978, 1982): we use the term 'market-clearing' for the former and a 'non-clearing market' for the latter. A non-clearing market involves quantity rationing of some agents and, depending on the institutional structure, may or may not involve a deviation from an equilibrium functional relationship.
There is of course a connection between the meaning of 'equilibrium' used in econometrics by Quandt and others, and that used here, which is more common in time-series analysis. When a market clears, an equilibrium relationship of the type we have defined may also occur because clearing of that market may return the system to a state in which some functional relationship among observable variables holds. Our definition is intended to be general and therefore to incorporate market-clearing equilibria, as well as others which may arise through the behaviour of a variety of different types of systems. For example, we would say that an equilibrium relationship exists between aggregate consumption and income if consumption tends toward a fraction $\gamma$ of income in the absence of shocks which may temporarily perturb the relationship. This need not be an equilibrium in the Quandt (1978) sense, however, because it may not correspond to the clearing of markets. (All consumers may remain credit-rationed, for example.)
Even if shocks to a system are constantly occurring so that the economic system is never in equilibrium, the concept of long-run equilibrium may nonetheless be useful. The present is the long-run outcome of the distant past and, as will be made precise below, a long-run relationship will often hold 'on average' over time. Moreover, a stable equilibrium has the property that a given deviation from the equilibrium becomes more and more unlikely as the magnitude of the deviation is greater, so that one may be reasonably confident that the discrepancy between the actual relationship connecting variables and this long-run relationship is within certain bounds. Precise definitions are provided in Chapter 5.
Methods for investigating such long-run relationships are our concern here. An examination of these methods will lead us to discuss aspects of time-series analysis, of dynamic modelling in general, and of the rapidly growing literature treating co-integration, error correction, and inference from non-stationary data. The first step is to clarify the statistical notion of stationarity and its links to the concept of equilibrium.

1.2. Stationarity and Equilibrium Relationships


In economic theory, the concept of equilibrium is well established and well defined. The statistical concept of equilibrium centres on that of a stationary process, which will be defined formally below. A substantial body of methods is developing around the statistical features of equilibrium relationships among time-series processes, and the concepts of stationarity and particular forms of non-stationarity are crucial to these methods.
If a particular relationship such as $x_1 = \beta x_2$ emerges as the economic system is allowed to settle down, this will describe an equilibrium to an econometrician just as to a theorist. In actual time series, however, the relation $x_{1t} = \beta x_{2t}$ may never be observed to hold. Consequently, we look for ways of characterizing the relationships that can be observed to hold between $x_{1t}$ and $x_{2t}$.
Roughly speaking (again, terms will be defined precisely in Chapter 5), we say that an equilibrium relationship $f(x_1, x_2) = 0$ holds between two variables $x_1$ and $x_2$ if the amount $\epsilon_t = f(x_{1t}, x_{2t})$ by which actual observations deviate from this equilibrium is a median-zero stationary process.¹ That is, the 'error' or discrepancy between outcome and postulated equilibrium has a fixed distribution, centred on zero, that does not change over time. This error cannot therefore grow indefinitely; if it did, the relationship could not have been an equilibrium one since the system is free to move ever further away from it. Of course, it may be difficult to distinguish in finite samples between an ever-growing discrepancy in an hypothesized equilibrium relationship and a random fluctuation; formal statistical tests for problems such as this are discussed in later chapters.
¹ Later we will consider more precisely the properties that the deviation must have. The requirement is usually stated as being that the deviation from the equilibrium relationship be integrated of order zero (see below); alternatively, we might impose only the weaker requirement that the unconditional expectation of the deviation from the equilibrium relationship be zero, implying that only the first moment need exist and be constant. For simplicity, we omit intercepts from the present discussion.
Given the characterization above, the short-run discrepancy $\epsilon_t$ in an equilibrium relationship must have no tendency to grow systematically over time. However, since this error represents shocks that are constantly occurring and affecting economic variables, in a real economic system there is no systematic tendency for this error to diminish over time either. It would fall away to zero only if shocks were to cease.
This definition of an equilibrium relationship holds automatically when applied to series that are themselves stationary. For any two stationary series $\{x_{1t}\}$ and $\{x_{2t}\}$, irrespective of any substantive economic relationship between these two alone, a difference of the form $\{x_{1t} - b x_{2t}\}$ must be a stationary series for any $b$. Thus, whether or not there exists a non-zero $\beta$ which describes a true equilibrium relationship, corresponding to a non-zero derivative between $x_1$ and $x_2$, any arbitrarily chosen $b$ will meet the statistical equilibrium condition. This does not imply that we cannot use statistical methods to determine the parameters of a long-run relationship, but simply that one stage of the process, in which we look for a stationary discrepancy, is unnecessary.
However, this concept of statistical equilibrium is necessary and useful in examining equilibrium relationships between variables tending to grow over time. In such cases, if the actual relationship is $x_1 = \beta x_2$, the discrepancy $x_{1t} - b x_{2t}$ will be non-stationary for any $b \neq \beta$, since the discrepancy deviates from the true relationship by the constant proportion $(b - \beta)$ of the growing variable $x_{2t}$; only the true relationship can yield a stationary discrepancy. With more than two variables, however, there may be more than one equilibrium relation, and this leads to another of the statistical problems that is currently being pursued: the empirical determination of the number of equilibrium relationships between three or more non-stationary time series.
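The role of the growing variable can be illustrated with a small simulation. The following is a minimal sketch, not part of the original text: the data-generating process, the value $\beta = 2$, the drift of 0.5, and the helper name `discrepancy` are all assumptions chosen for illustration. It compares the discrepancy $x_{1t} - b x_{2t}$ at $b = \beta$ with that at a nearby $b \neq \beta$.

```python
import numpy as np

def discrepancy(x1, x2, b):
    """Return the series x1_t - b * x2_t."""
    return x1 - b * x2

rng = np.random.default_rng(0)
T = 500
beta = 2.0                                  # hypothetical true long-run coefficient
x2 = np.cumsum(0.5 + rng.normal(size=T))    # growing variable: random walk with drift 0.5 (assumed)
x1 = beta * x2 + rng.normal(size=T)         # equilibrium relation plus a stationary error

for b in (beta, beta + 0.1):
    d = discrepancy(x1, x2, b)
    first, second = d[: T // 2], d[T // 2:]
    print(f"b = {b:.1f}: mean of first half = {first.mean():7.2f}, "
          f"mean of second half = {second.mean():7.2f}")
```

Under these assumptions the half-sample means of the discrepancy stay near zero when $b = \beta$ but drift with $x_{2t}$ when $b \neq \beta$. The comparison is only a rough diagnostic, not a formal test.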

1.3. Equilibrium and the Specification of Dynamic Models
Equilibrium relationships have played an explicit role in econometric modelling since its foundations (see Morgan 1990). If there exists a stable equilibrium $x_1 = \beta x_2$, the discrepancy $\{x_{1t} - \beta x_{2t}\}$ evidently contains useful information since on average the system will move towards that equilibrium if it is not already there. In particular, $(x_{1,t-1} - \beta x_{2,t-1})$ represents the previous disequilibrium. Suppose the equilibrium relationship is between a variable $\{y_t\}$ to be modelled and some series $\{z_t\}$ which is exogenous in an appropriate sense. If we let $x_{1t} = y_t$ and $x_{2t} = z_t$ to distinguish their status, and denote the equilibrium by $y = \beta z$, then the discrepancy, or error, $\{y_t - \beta z_t\}$ should be a useful explanatory variable for the next direction of movement of $y_t$. In particular, when $y_t - \beta z_t$ is positive, $y_t$ is too high relative to $z_t$, and on average we might expect a fall in $y$ in future periods relative to its trend growth. The term $(y_{t-1} - \beta z_{t-1})$, called an error-correction mechanism, is therefore sometimes included in dynamic regressions (see Sargan 1964, Hendry and Anderson 1977, and Davidson, Hendry, Srba, and Yeo 1978).
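As a hedged illustration of how such a term might enter a dynamic regression, the sketch below simulates a hypothetical data-generating process in which $\Delta y_t$ responds to $\Delta z_t$ and to the lagged discrepancy $(y_{t-1} - \beta z_{t-1})$, then recovers the two coefficients by least squares. The specific equation, the parameter values (`beta`, `alpha`, `lam`), and the assumption that $\beta$ is known are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
# Hypothetical values: long-run coefficient, impact effect, adjustment speed.
beta, alpha, lam = 1.0, 0.5, 0.3

# z is a random walk; y adjusts towards beta * z (assumed DGP, for illustration only):
#   dy_t = alpha * dz_t - lam * (y_{t-1} - beta * z_{t-1}) + u_t
z = np.cumsum(rng.normal(size=T))
y = np.empty(T)
y[0] = beta * z[0]
for t in range(1, T):
    disequilibrium = y[t - 1] - beta * z[t - 1]
    y[t] = (y[t - 1] + alpha * (z[t] - z[t - 1])
            - lam * disequilibrium + rng.normal(scale=0.5))

# Regress dy_t on dz_t and the lagged error-correction term (beta treated as known).
dy = np.diff(y)
dz = np.diff(z)
ecm_lag = (y - beta * z)[:-1]
X = np.column_stack([dz, ecm_lag])
coef, *_ = np.linalg.lstsq(X, dy, rcond=None)

print("estimated impact effect:             ", round(coef[0], 3))
print("estimated coefficient on lagged term:", round(coef[1], 3))
```

A negative coefficient on the lagged discrepancy corresponds to the behaviour described above: when $y$ is too high relative to $z$, it tends to fall in the next period.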
The true parameter $\beta$ characterizing the relationship is not known in general. This need not prevent the error-correction mechanism from being useful, however, since the unknown parameter can either be estimated separately in a prior analysis or estimated in the course of modelling the variable of interest. Moreover, the general error-correction mechanism can be shown to be equivalent to various other transformations of a general linear model incorporating past values of both the variable of interest and the explanatory variables (see Chapter 2). A particular advantage of the error-correction mechanism is that the extent of adjustment in a given period to deviations from long-run equilibrium is given by the estimated equation without any further calculation. Other forms of the estimated model are also convenient in that they allow the implied long-run relation itself to be seen directly. Considerations such as these are discussed in the following chapter.
The practice of exploiting information contained in the current deviation from an equilibrium relationship, in explaining the path of a variable, has benefited from the formalization of the concept of co-integration by Granger (1981) and Engle and Granger (1987). The informal definition of statistical equilibrium discussed above is based upon a special case of the definition of co-integration. Further, the practice of modelling co-integrated series is closely related to error-correction mechanisms: error-correcting behaviour on the part of economic agents will induce co-integrating relationships among the corresponding time series and vice versa.
A series that is tending to grow over time cannot be stationary (although it may possibly be stationary around some deterministic trend), but the changes in that series might be. To take a mechanical example, if an object has a fixed average position around which it moves, always returning after some interval to this position like a randomly perturbed weight at the end of a spring, then its displacement may be a stationary series. An object that has no such fixed position may nevertheless have a velocity (the change in position per unit time), or acceleration (the change in the velocity per unit time), that is stationary. For example, if the object is moving ever further from its point of origin, but with velocity fluctuating around some fixed positive mean according to a fixed distribution function, then the velocity of the object is a stationary series.
A series is said to be integrated of order 1 (I(1)) if, although it is itself
non-stationary, the changes in this series form a stationary series. It is
said to be integrated of order 2 (I(2)) if, although the changes are
non-stationary, the changes in the changes form a stationary series. In other
words, if the series must be differenced exactly k times to achieve
stationarity, then the series is I(k), so that a stationary series is I(0). We
will use the term 'integrated process' to refer to a series with order of
integration strictly greater than zero: precise definitions are given in
Chapter 3.
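The relation between differencing and the order of integration can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `integrate` and `difference` are hypothetical and introduced only for illustration. Cumulating a stationary series once gives an I(1) series, cumulating twice gives an I(2) series, and differencing the same number of times recovers the original stationary series.

```python
import numpy as np

def integrate(series, times=1):
    """Cumulatively sum a series `times` times, raising its order of integration."""
    out = np.asarray(series, dtype=float)
    for _ in range(times):
        out = np.cumsum(out)
    return out

def difference(series, times=1):
    """Take first differences `times` times, lowering the order of integration."""
    out = np.asarray(series, dtype=float)
    for _ in range(times):
        out = np.diff(out)
    return out

rng = np.random.default_rng(0)
noise = rng.normal(size=200)   # stationary series, I(0)
i1 = integrate(noise, 1)       # random walk, I(1)
i2 = integrate(noise, 2)       # cumulated random walk, I(2)

# Differencing k times undoes k cumulations (up to the lost initial values).
recovered_from_i1 = difference(i1, 1)
recovered_from_i2 = difference(i2, 2)
```

Each round of differencing loses one initial observation, so the recovered series are compared with `noise[1:]` and `noise[2:]` respectively.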
We can now consider the concept of co-integration, its relation to the
definition of long-run equilibrium between series given above, and its
use as part of a statistical description of the behaviour of time series that
satisfy some equilibrium relationship. A simple example concerns two
series, each of which is integrated of order 1. Assume that a long-run
equilibrium relationship holds between them, and that it is linear:
$x_1 = \beta x_2$. Then $(x_{1t} - \beta x_{2t})$ must be equal to zero in
equilibrium and the series $\{x_{1t} - \beta x_{2t}\}$ has a constant
unconditional mean of zero. This need not imply that
$\{x_{1t} - \beta x_{2t}\}$ is stationary: the variance of
$\{x_{1t} - \beta x_{2t}\}$ might be non-constant, for example. The
definition of co-integration given by Engle and Granger (1987), and
discussed in Chapter 5, does however require stationarity of the deviation
$\{x_{1t} - \beta x_{2t}\}$. When stationarity does hold, we say that $x_1$
and $x_2$ are co-integrated (1,1), denoted CI(1,1); that is, they are each
integrated of order 1, and there exists some linear combination
$\{x_{1t} - \beta x_{2t}\}$ which is integrated of an order one lower than
the components (i.e. is I(0) here). If $\{x_{1t} - \beta x_{2t}\}$ has a
constant unconditional mean but is not stationary, then we may still want
to say that an equilibrium relationship holds; the series will not,
however, fit the strict Engle-Granger definition of co-integration, which
requires that some linear combination be stationary.
A substantive long-run equilibrium relationship is something from
which the variables involved can deviate, but not by an ever-growing
amount. That is, the discrepancy or error in the relationship cannot be
integrated of any order greater than zero. Series integrated of strictly
positive orders which are linked by such an equilibrium relationship
must, therefore, be co-integrated with each other. In the example just
given, the fact that the integrated series $x_1$ and $x_2$ move together in
the long run is reflected in the fact that they are co-integrated; a linear
relation yields a stationary deviation.
More generally, we can speak of variables that are co-integrated (a, b)
when $a \geq b$ and $b > 0$, where a is the order of integration of the
variables and b is the reduction in order of integration produced by the
linear combination, which then has order of integration $a - b$. When
$b > 0$, a linear relation exists between the variables which is integrated
of lower order than either of the variables themselves, but which may
none the less not be I(0). In the latter case ($a - b > 0$), the variables
may deviate from the linear relationship by an ever-growing amount,
and so it is not the kind of relationship that we have been calling a
long-run equilibrium. Nevertheless, variables that are CI(a, b) for $b > 0$
do contain some information about the long-run behaviour of the series
involved.
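A small simulation may help to fix the CI(1,1) case. This is only a sketch under stated assumptions: the two series are built by hand from a shared random walk, the coefficient `beta = 2.0` and all variable names are hypothetical, and NumPy is assumed to be available. Each series inherits the stochastic trend and so is I(1), while the combination $x_{1t} - \beta x_{2t}$ eliminates the trend and leaves only stationary disturbances.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
beta = 2.0                               # hypothetical equilibrium coefficient

trend = np.cumsum(rng.normal(size=T))    # common stochastic trend, I(1)
u1 = rng.normal(size=T)                  # stationary disturbances
u2 = rng.normal(size=T)

x2 = trend + u2                          # I(1): trend plus stationary noise
x1 = beta * trend + u1                   # I(1): shares the same trend

# The trend cancels in the linear combination, leaving u1 - beta * u2.
deviation = x1 - beta * x2
```

By construction `deviation` equals `u1 - beta * u2`, which is stationary, so `x1` and `x2` are CI(1,1) in this artificial example.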
Since a relationship between co-integrated variables can be shown to
be representable using an error-correction mechanism (see Chapter 5),
and since such representations have been found to be valuable in
empirical modelling, there is a formal counterpart to the informal
argument above suggesting the usefulness of equilibrium information in
specifying dynamic regression models.

1.4. Estimation of Long-Run Relationships and Testing for Orders of Integration and Co-integration
The existence of long-run relationships between variables, the potential
orders of integration of particular time series, and the implications of
these for the specification of dynamic econometric models can be
understood as mathematical properties without implying that we know
whether or not such relationships exist, let alone what their forms would
be for a particular empirical problem.
When an estimated regression equation implies an equilibrium
relationship between two processes, it is a straightforward operation to
extract the estimated long-run equilibrium relation regardless of the
form in which the equation is estimated. The calculation can be made by
expressing the equation in an equilibrium form and taking its
expectation. This is analogous to assuming a state in which the values of the
variables do not change, so that the dating of variables becomes
irrelevant and the equation is treated as deterministic. Computing the
derivative between the two series is then straightforward.
Approximations to the variances of estimated long-run multipliers can also be
computed. Chapter 2 explores various transformations of the linear
model that are convenient for these and related calculations.
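The static-state calculation just described can be sketched for a simple dynamic regression. The model form used here is an assumption made for illustration, not one taken from the text: $y_t = a + b_0 x_t + b_1 x_{t-1} + c\,y_{t-1} + e_t$. Setting $y_t = y_{t-1} = y$, $x_t = x_{t-1} = x$ and taking expectations gives $y = a/(1 - c) + [(b_0 + b_1)/(1 - c)]\,x$. The function name `long_run_solution` is hypothetical.

```python
def long_run_solution(intercept, x_coefs, y_lag_coefs):
    """Static long-run intercept and multiplier implied by a dynamic regression.

    x_coefs: coefficients on x_t, x_{t-1}, ...
    y_lag_coefs: coefficients on y_{t-1}, y_{t-2}, ...
    """
    denom = 1.0 - sum(y_lag_coefs)
    if abs(denom) < 1e-12:
        # Lagged-y coefficients summing to one leave no static solution.
        raise ValueError("no static long-run solution")
    return intercept / denom, sum(x_coefs) / denom

# Illustrative (made-up) coefficients: a = 1.0, b0 = 0.3, b1 = 0.2, c = 0.5.
const, multiplier = long_run_solution(1.0, [0.3, 0.2], [0.5])
```

With these made-up coefficients the long-run relation is $y = 2 + x$. The variance approximations mentioned above are not covered by this sketch.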
Testing for the existence of such an equilibrium relationship is not
nearly so simple. First, it is difficult empirically to establish the orders of
integration of individual time series. Second, the order of integration of
a linear relationship among variables is even harder to discover than the
order of integration of a single series: drawing inferences is complicated
by the fact that the parameters of the relationship are in general
unknown.
Testing whether an individual series is I(1) as opposed to I(0) is the
problem that has been widely discussed as that of testing for a 'unit
root' in a time series. Strategies for performing such testing have had to
contend with the problem that I(0) alternatives in which the series is
'close' to being I(1) (so that the power of the test is low) are very
plausible in many economic circumstances. Further, the form of the data
generation process (e.g. the orders of dynamics; the question of which
exogenous variables enter; etc.) is not known, and critical values of test
statistics are typically sensitive to the structure of the process.
Fuller (1976) and Dickey and Fuller (1979) emphasized that testing
for non-stationarity (again, I(1) as opposed to I(0) series) is more
difficult than conventional t-tests of the hypothesis that the
autoregressive parameter is equal to one in an AR(1) model. In fact, where there
are roots greater than or equal to one, conventionally used tests do not
have standard asymptotic distributions. The original tests were variants
of conventional tests, with critical values retabulated using Monte Carlo
experiments to reflect the changes in distribution when, under the null,
the series are non-stationary.
These original tests were based on simple forms of autoregressive
model: an AR(1) model, with or without drift and time trend terms (i.e.
$y_t = \alpha y_{t-1}\ [+\beta]\ [+\gamma t] + \varepsilon_t$). Such simple
forms may often be poor approximations to the data generation process.
This will manifest itself in the failure of the estimated model to pass
various mis-specification tests. In particular, tests for residual
autocorrelation will often reflect autocorrelated processes that have been
omitted from the model specification. One way of dealing with the problem
of finding an adequate model within which to test for non-stationarity has
therefore been to retain a simple autoregressive model form, but with a
non-parametric correction to the values of the test statistic to allow for
a general form of autocorrelation in the residuals. Another approach
attempts to capture the autocorrelation through the addition of extra
lagged terms in the dependent variable. These issues are addressed in
Chapter 4.
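The simplest of these test regressions can be sketched as follows. This is an illustrative computation only, assuming NumPy; the function name `df_t_statistic` is hypothetical. It estimates $\Delta y_t = \mu + (\alpha - 1) y_{t-1} + \varepsilon_t$ by OLS and returns the t-ratio on $(\alpha - 1)$. As emphasized above, this ratio does not have a standard distribution under the null, so it must be compared with specially tabulated critical values, which are not reproduced here.

```python
import numpy as np

def df_t_statistic(y):
    """t-ratio on (alpha - 1) in: dy_t = mu + (alpha - 1) * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # constant and lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)             # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(2)
e = rng.normal(size=500)
stat_random_walk = df_t_statistic(np.cumsum(e))   # series with a unit root
stat_white_noise = df_t_statistic(e)              # stationary series
```

For the stationary series the statistic is large and negative, while for the random walk it stays much closer to zero. The sketch makes no correction for residual autocorrelation, so neither of the two refinements described above is implemented.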
When series may contain more than one 'unit root' (i.e. where they
may be I(2) or of higher orders), testing becomes yet more difficult
because the sequence in which different hypotheses are tested can affect
inference. Such issues are also considered in Chapter 4.
A related method can be applied to the problem of testing for an
equilibrium relation between integrated variables. A prior step must be
added to the method above, in which a linear relationship between or
among the variables in question is estimated. Testing for co-integration
then entails testing the order of integration of the error in this
relationship. For example, a stationary error in a model relating
integrated series entails an equilibrium relationship. Conversely, if there
were no equilibrium relationship, there would be nothing to tie these
series to any estimated linear relation, and this would imply
non-stationarity of the residuals.
It might appear at first sight, for example, that testing for
co-integration between I(1) series $\{x_{1t}\}$ and $\{x_{2t}\}$ would be
precisely the same as a test of the hypothesis that
$\{e_t\} = \{x_{1t} - \beta x_{2t}\}$ is I(1) against the alternative that
$\{e_t\}$ is I(0). However, this is true only under very strong
assumptions. Necessary conditions include that there is only one
co-integrating relation and the values of its parameters are known. In the
bivariate case, when $\beta$ is estimated, the series that one tests for
stationarity is $\{\hat{e}_t\} = \{x_{1t} - \hat{\beta} x_{2t}\}$. Since
linear regression minimizes the variance of $\hat{e}_t$, the estimated
series of deviations from equilibrium has a smaller variance than the true
deviations $\{x_{1t} - \beta x_{2t}\}$, assuming that $\beta$ exists. That
is, the method by which $\beta$ is usually estimated amounts to choosing
$\hat{\beta}$ in such a way that the two variables are given the best
chance to appear to move together. Regression makes co-integration
appear to be present more often than it should, so that the critical
values of test statistics must be adjusted to reflect the fact that $\beta$
is estimated. Co-integration tests are therefore similar, but not identical,
to standard stationarity tests.
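The point that estimation favours the appearance of co-integration can be checked directly. The following sketch assumes NumPy and an artificial pair of series constructed to be co-integrated with a known coefficient; all names and the value `beta = 1.0` are hypothetical. Because OLS (here without an intercept) chooses the coefficient that minimizes the residual sum of squares, the estimated deviations can never have a larger sum of squares than the true ones.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200
beta = 1.0                                  # hypothetical true coefficient

x2 = np.cumsum(rng.normal(size=T))          # I(1) regressor
x1 = beta * x2 + rng.normal(size=T)         # co-integrated with x2 by construction

beta_hat = (x2 @ x1) / (x2 @ x2)            # OLS slope, no intercept

true_dev = x1 - beta * x2                   # true deviations from equilibrium
est_dev = x1 - beta_hat * x2                # estimated deviations (residuals)

ss_true = true_dev @ true_dev
ss_est = est_dev @ est_dev                  # never exceeds ss_true
```

The inequality `ss_est <= ss_true` holds for any sample, which is why residual-based tests need critical values different from those used when $\beta$ is known.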
Chapter 7 explores these tests for co-integration, and Chapter 8
extends the discussion to estimation and testing in systems of equations.

1.5. Preliminary Concepts and Definitions


We assume that readers are acquainted with the fundamental principles
and methods of econometrics and statistical inference. It is nonetheless
worth reviewing some important concepts and definitions that will be
used in later chapters, establishing terminology as we do so.

1.5.1. Stochastic Processes and Time-series Models


A number of concepts from standard time-series analysis will be
necessary. Box and Jenkins (1970) give a thorough treatment of these
models.
A stochastic process is an ordered sequence of random variables
$\{x(s, t),\ s \in S,\ t \in \mathbb{T}\}$, such that, for each
$t \in \mathbb{T}$, $x(\cdot, t)$ is a random variable on the sample space
$S$ and, for each $s \in S$, $x(s, \cdot)$ is a realization of the
stochastic process on the index set $\mathbb{T}$ (that is, an ordered set
of values, each corresponding to one value of the index set). A given
realization of the process may be represented as
$\{x(t),\ t \in \mathbb{T}\}$, and this notation is also often used for the
stochastic process itself. In later chapters we will typically refer to
realizations of stochastic processes by the notation $x_t$ for a value at
$t$, and $\{x_t\}_1^T$ (or $\{x_t\}$ or $\{x_t\}_{t=1}^T$) for a full set of
values corresponding to an index set $\mathbb{T} = \{1, 2, \ldots, T\}$.
We will also restrict our attention to discrete stochastic processes, for
which the index set is a discrete set, in which case we generally use the
notation $x_t$ rather than $x(t)$, which may apply also to continuous
processes.
Next, let $\{x(t),\ t \in \mathbb{T}\}$ be a stochastic process such that
$E(|x(t)|) < \infty$ for all $t \in \mathbb{T}$, and
$E(x(t) \mid \mathcal{I}_{t-1}) = x(t - 1)$ for all $t \in \mathbb{T}$, where
$E(\cdot)$ is the expectations operator and $\mathcal{I}_{t-1}$ represents a
particular information set of data realized by time $t - 1$. Then
$\{x(t),\ t \in \mathbb{T}\}$ is called a martingale with respect to
$\{\mathcal{I}_t,\ t \in \mathbb{T}\}$. A martingale difference sequence
can then be defined by $\{y(t) = x(t) - x(t - 1),\ t \in \mathbb{T}\}$. It
follows that $E(|y(t)|) < \infty\ \forall\, t \in \mathbb{T}$ and that
$E(y(t) \mid \mathcal{I}_{t-1}) = 0\ \forall\, t \in \mathbb{T}$.
A stochastic process is called strictly stationary if, for any subset
$(t_1, t_2, \ldots, t_n)$ of $\mathbb{T}$ and any real number $h$ such that
$t_i + h \in \mathbb{T}$, $i = 1, 2, \ldots, n$, we have

$$F(x(t_1), \ldots, x(t_n)) = F(x(t_1 + h), \ldots, x(t_n + h)),$$

where $F(\cdot)$ is the joint distribution function of the $n$ values. Strict
stationarity therefore implies that all existing moments of the process
are constant through time. The process is weakly stationary (or
second-order stationary or covariance stationary) if

$$E[x(t_i)] = E[x(t_i + h)] = \mu < \infty,$$
$$E[x(t_i)^2] = E[x(t_i + h)^2] = \mu_2 < \infty,$$
$$E[x(t_i)\,x(t_j)] = E[x(t_i + h)\,x(t_j + h)] = \mu_{ij} < \infty,$$

where $\mu$, $\mu_2$, and the $\mu_{ij}$ are constant over $t$, for all
$t \in \mathbb{T}$ and $h$ such that $t_r + h \in \mathbb{T}$ ($r = i, j$).
Thus, the contemporaneous second moments do not depend on time, and the lag
dependencies are functions only of lag length. That the first two raw
moments are constant also implies that the variance of the process is
constant. If we consider a vector process
$\{\mathbf{x}(t)\} = \{x_1(t), x_2(t), \ldots, x_m(t)\}'$, then we require
in addition that covariances of the form $E[x_k(t_i)\,x_l(t_j)]$ are finite
constants and are functions of $i, j, k, l$ only, for any admissible
$i, j, k$, and $l$.
We will not offer a rigorous definition of an integrated process at this
stage but we can highlight a number of the issues involved. An
integrated process is one that can be made stationary by differencing. A
discrete process integrated of order $d$ must be differenced $d$ times to
reach stationarity; that is, $\Delta^d x_t$ is stationary where the
differencing operator $\Delta^d$ is defined by $(1 - L)^d$ (using the lag
operator $L$, itself defined by $L^n x_t = x_{t-n}$). For example, the first
difference is $\Delta x_t = x_t - x_{t-1}$, and the second difference is
$\Delta^2 x_t = \Delta x_t - \Delta x_{t-1} = x_t - 2x_{t-1} + x_{t-2}
= (1 - L)^2 x_t$. The process $(1 - L)x_t = \varepsilon_t$, where
$\{\varepsilon_t\}$ is a white-noise series (see below), is called a random
walk and is a simple example of a process integrated of order 1.
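The expansion of $(1 - L)^d$ can be made concrete with binomial coefficients: the weight on $x_{t-j}$ is $(-1)^j \binom{d}{j}$. The sketch below assumes NumPy; the function names `difference_weights` and `apply_difference` are hypothetical.

```python
from math import comb
import numpy as np

def difference_weights(d):
    """Coefficients of (1 - L)^d on x_t, x_{t-1}, ..., x_{t-d}."""
    return [(-1) ** j * comb(d, j) for j in range(d + 1)]

def apply_difference(x, d):
    """Apply (1 - L)^d to x, dropping the first d observations."""
    x = np.asarray(x, dtype=float)
    w = difference_weights(d)
    return np.array([sum(w[j] * x[t - j] for j in range(d + 1))
                     for t in range(d, len(x))])

x = np.array([1.0, 4.0, 9.0, 16.0, 25.0])   # a quadratic sequence
second = apply_difference(x, 2)             # x_t - 2 x_{t-1} + x_{t-2}
```

For $d = 2$ the weights are $1, -2, 1$, matching the expression for $\Delta^2 x_t$ given in the text, and the second difference of this quadratic sequence is constant.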
Two issues merit comment. First, if $x_t$ is stationary then so is
$\Delta x_t$, or even $\Delta^d x_t$ for $d > 0$. Thus, the stationarity of
$\Delta^d x_t$ is not sufficient for $x_t$ to be I(d). (Recall that an I(d)
process is one that must be differenced $d$ times to achieve stationarity.)
Secondly, consider the stable autoregressive process,
$x_t = \alpha_0 + \alpha_1 x_{t-1} + \varepsilon_t$, where
$|\alpha_1| < 1$, $x_0 = 0$, and $\varepsilon_t \sim \mathrm{IN}(0, \sigma^2)$,
$t = 1, \ldots, T$. Then $\{x_t\}$ is non-stationary since
$E(x_t) = \alpha_0 (1 - \alpha_1^t)(1 - \alpha_1)^{-1}$, which is not
constant over $t$, although $\{x_t\}$ is asymptotically stationary (see e.g.
Spanos 1986). Hence we have a non-stationary series that is not an
integrated process in the sense we wish to use. Chapter 3 offers precise
definitions.
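The time-varying mean of this autoregression can be verified by iterating the expectation forward from $x_0 = 0$, using $E(x_t) = \alpha_0 + \alpha_1 E(x_{t-1})$. The following is a minimal sketch; the function names and the illustrative values $\alpha_0 = 1$, $\alpha_1 = 0.5$ are assumptions.

```python
def ar1_mean_path(alpha0, alpha1, periods):
    """E(x_t) for t = 1..periods when x_0 = 0, by direct recursion."""
    means = []
    m = 0.0                          # E(x_0) = 0 because x_0 is fixed at zero
    for _ in range(periods):
        m = alpha0 + alpha1 * m      # E(x_t) = alpha0 + alpha1 * E(x_{t-1})
        means.append(m)
    return means

def ar1_mean_closed_form(alpha0, alpha1, t):
    """Closed form alpha0 * (1 - alpha1**t) / (1 - alpha1)."""
    return alpha0 * (1 - alpha1 ** t) / (1 - alpha1)

path = ar1_mean_path(1.0, 0.5, 10)
limit = 1.0 / (1 - 0.5)              # asymptotic mean alpha0 / (1 - alpha1)
```

The mean differs from period to period, so the process is not stationary, but it converges to $\alpha_0 / (1 - \alpha_1)$, which is the sense in which the process is asymptotically stationary.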
A white-noise process is a stationary process which has a zero mean
and is uncorrelated over time; that is, $\{x(t),\ t \in \mathbb{T}\}$ is
white noise if $\forall\, t \in \mathbb{T}$, $E[x(t)] = 0$,
$E[(x(t))^2] = \sigma^2 < \infty$ and $E[x(t)\,x(t + h)] = 0$ where
$h \neq 0$ and $t + h \in \mathbb{T}$. A white-noise process is therefore
necessarily second-order stationary, and if $x(t)$ is normally distributed it
is strictly stationary as well since in this case higher-order moments are
functions of the first two.
An innovation $\{v(t)\}$ against an information set $\mathcal{I}_{t-1}$ is a
process whose distribution $D[v(t) \mid \mathcal{I}_{t-1}]$ does not depend
on $\mathcal{I}_{t-1}$; also, $v(t)$ is a mean innovation if
$E[v(t) \mid \mathcal{I}_{t-1}] = 0$. Thus, an innovation must be white
noise if $\mathcal{I}_{t-1}$ contains a history of
$\{v(t - 1), \ldots, v(0)\}$, but not conversely. Consequently, an
innovation must be a martingale difference sequence. (See Spanos (1986) for
further discussion.)
For a stationary process, the covariance between two realizations at
different points in time (indices) will depend only upon the difference
between those indices, and not on the indices themselves. We can
therefore define, for a process $\{x_t\}$ that is at least second-order
stationary with $E(x_t) = \mu$, the autocovariance function

$$\gamma(h) = E[(x_t - \mu)(x_{t+h} - \mu)].$$

Stationarity implies that $\gamma(h) = \gamma(-h)$, since the autocovariance
between two values depends only on the distance between them. The
autocorrelation function is defined similarly, as

$$\rho(h) = \gamma(h) / \gamma(0),$$

$\gamma(0)$ being the variance of the process.
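Sample counterparts of the autocovariance and autocorrelation functions can be computed as in the sketch below. The divisor $T$ (rather than $T - h$) is one common convention and is an assumption here, as are the function names; NumPy is assumed to be available.

```python
import numpy as np

def sample_autocovariance(x, h):
    """Sample autocovariance at lag h, using divisor T."""
    x = np.asarray(x, dtype=float)
    h = abs(h)                       # the function is symmetric in the lag
    T = len(x)
    d = x - x.mean()                 # deviations from the sample mean
    return (d[:T - h] @ d[h:]) / T

def sample_autocorrelation(x, h):
    """Sample autocorrelation: autocovariance scaled by the lag-0 value."""
    return sample_autocovariance(x, h) / sample_autocovariance(x, 0)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
rho0 = sample_autocorrelation(x, 0)
rho1 = sample_autocorrelation(x, 1)
```

By construction the lag-0 autocorrelation equals one and the lag-0 autocovariance equals the (divisor-$T$) sample variance.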


Our understanding of and ability to forecast stochastic processes is
often enhanced by fitting models. The autoregressive-moving average
(ARMA) class of models is widely used for univariate time-series
modelling, and we will make frequent reference to such models. An
ARMA(p, q) model (with $p$ autoregressive (AR) and $q$ moving-average
(MA) parameters) for a process $\{x_t\}_1^T$ is of the form

$$x_t = \sum_{i=1}^{p} \alpha_i x_{t-i} + \sum_{j=0}^{q} \theta_j \varepsilon_{t-j},$$

with $\theta_0 = 1$ and $\{\varepsilon_t\}_1^T$ a white-noise process.


Using polynomials in the lag operator, we can express the ARMA
model as

$$\alpha(L)\,x_t = \theta(L)\,\varepsilon_t \qquad (1)$$

with

$$\alpha(L) = 1 - \alpha_1 L - \cdots - \alpha_p L^p, \qquad
\theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q.$$

The polynomials $\alpha(L)$ and $\theta(L)$ can be expressed in terms of their
factors as

$$\alpha(L) = \prod_{i=1}^{p} (1 - \lambda_i L), \qquad
\theta(L) = \prod_{j=1}^{q} (1 - \delta_j L).$$

If any factor $(1 - \lambda_m L)$ from $\alpha(L)$ matches any
$(1 - \delta_k L)$ from $\theta(L)$, then these are said to be common
factors, and can be cancelled from both sides of (1). This is important
because, if $\gamma(L)$ is any arbitrary polynomial of order $n$, from (1)
it is also true that

$$\gamma(L)\,\alpha(L)\,x_t = \gamma(L)\,\theta(L)\,\varepsilon_t.$$

Such redundant common factors must be cancelled to ensure a unique
representation. If the AR polynomial $\alpha(L)$ contains the factor
$(1 - L)$ (that is, if there is some $\lambda_i$ equal to one), then the
process is said to contain a unit root.2
When the parameters $\{\alpha_i\}$ and $\{\theta_j\}$ are chosen to fit the
autocorrelations of the observed process as well as possible, the resulting
ARMA process may be a useful predictive device. An autoregressive integrated
moving-average (ARIMA) process allows for an integrated component
in the underlying time series; thus, an ARIMA(p, d, q) process is an
I(d) process for which the dth difference follows an ARMA(p, q).
An ARMA model with given parameters implies particular
autocovariance and autocorrelation functions; see Box and Jenkins (1970:
74 ff.) for a description of an algorithm by which these can be calculated
for a general ARMA(p, q) process.
If the parameters of the ARMA process are known, checking
stationarity is not difficult. Provided that $\alpha(L)$ and $\theta(L)$
contain no common factors, stationarity of an ARMA(p, q) process depends
only on the $p$ parameters of the autoregressive part. An AR or ARMA model
is stationary if and only if the roots of the AR polynomial
$(1 - \alpha_1 L - \cdots - \alpha_p L^p)$ lie outside the unit circle (or,
equivalently, if and only if the latent roots of the polynomial, being the
roots of $(z^p - \alpha_1 z^{p-1} - \cdots - \alpha_p)$, lie inside the unit
circle). An analogous condition must hold in the MA polynomial to guarantee
invertibility of the process; see Box and Jenkins (1970) or Fuller (1976).
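When the AR parameters are known, the latent-root condition can be checked numerically. The sketch below assumes NumPy and uses `numpy.roots` on the polynomial $z^p - \alpha_1 z^{p-1} - \cdots - \alpha_p$; the function name `ar_is_stationary` is hypothetical. Roots lying exactly on the unit circle are subject to rounding error in floating-point arithmetic, so in practice a tolerance would be needed near the boundary.

```python
import numpy as np

def ar_is_stationary(ar_coefs):
    """Check stationarity of x_t = sum_i alpha_i * x_{t-i} + e_t.

    ar_coefs = [alpha_1, ..., alpha_p]. The process is stationary when all
    latent roots of z^p - alpha_1 z^{p-1} - ... - alpha_p lie strictly
    inside the unit circle.
    """
    poly = np.concatenate(([1.0], -np.asarray(ar_coefs, dtype=float)))
    latent_roots = np.roots(poly)
    return bool(np.all(np.abs(latent_roots) < 1.0))

stable_ar1 = ar_is_stationary([0.5])          # latent root 0.5
explosive_ar1 = ar_is_stationary([1.2])       # latent root 1.2
stable_ar2 = ar_is_stationary([0.5, 0.3])     # both latent roots inside
```

The check applies only to the autoregressive part, in line with the statement that stationarity depends only on the $p$ autoregressive parameters once common factors have been cancelled.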
Factors such as $(1 + L)$ or $(1 + L^2)$ yield roots with moduli of unity.
Examples of processes having these forms will be given later in this
chapter.

1.5.2. Orders of Magnitude, Convergence in Probability, and Convergence in Distribution
During the course of the analysis, we will examine the limiting
behaviour of many random variables. In particular, we will often be interested
in determining whether or not a given sequence of random variables
converges or tends to a limiting value (or to a limit random variable),
and the rate at which any such convergence occurs. The definitions
given below, taken from Fuller (1976) and based on Mann and Wald
(1943), make these concepts of convergence rigorous.
It is useful to start with a sequence of variables (say, real numbers)
that are non-stochastic.
Let $\{a_T\}_{T=1}^{\infty}$ be a sequence of real numbers and
$\{g_T\}_{T=1}^{\infty}$ be a sequence of positive real numbers. Then
1. $a_T$ is of smaller order (in magnitude) than $g_T$, denoted
$a_T = o(g_T)$, if $\lim_{T \to \infty} a_T / g_T = 0$.
2. $a_T$ is at most of order (in magnitude) $g_T$, denoted $a_T = O(g_T)$, if
there exists a real number $M$ such that $g_T^{-1} |a_T| \leq M$ for all $T$.
For a sequence of random or stochastic variables, 'order in
probability' is the relevant concept. Let $\{X_T\}$ be a sequence of random
variables with $\{g_T\}$ as above. Then
3. The sequence $\{X_T\}$ converges in probability to the random variable
$X$, denoted either $X_T \xrightarrow{p} X$ or $\operatorname{plim} X_T = X$,
if, for every $\varepsilon > 0$,
$\lim_{T \to \infty} \Pr\{|X_T - X| > \varepsilon\} = 0$. The probability
limit of $X_T$ is $X$.
4. $X_T$ is of smaller order in probability than $g_T$, denoted
$X_T = o_p(g_T)$, if $\operatorname{plim} X_T / g_T = 0$.
5. $X_T$ is at most of order in probability $g_T$, denoted
$X_T = O_p(g_T)$, if, for every $\varepsilon > 0$, there exists a positive
real number $M_\varepsilon$ such that
$\Pr\{|X_T| \geq M_\varepsilon\, g_T\} \leq \varepsilon$ for all $T$.
Two important points should be noted. First, the distinction between
the little-o and big-O concepts of convergence may be understood
intuitively by thinking of the former as scaling a random variable such
that the scaled variable tends to zero in the limit; for the latter, all that
is required is that the scaled variable remains bounded by a finite
interval of the real line. In a trivial case, say $X_T$ is $o_p(1)$. (Here the
sequence $\{g_T\}$ is a degenerate sequence of 1s.) That is, unscaled,
$X_T \xrightarrow{p} 0$. Then it is certainly true that $X_T$ is $O_p(1)$.
The converse is not true in general.


The second point concerns the specific use made of these convergence
concepts in this book. The sequence $\{X_T\}$ will in general be a sequence
of estimators. The sequence of ordinary least squares (OLS) estimators
in a regression model is a good example. The estimator $\hat{\beta}_T$ is
derived from a sample of size $T$, where in time-series analysis $T$ denotes
time. A sample of size $T$ is therefore composed of observations on a set of
variables for $T$ time periods, usually denoted $t = 1, 2, \ldots, T$. Thus,
$\hat{\beta}_T \xrightarrow{p} \beta$ if and only if
$\lim_{T \to \infty} \Pr\{|\hat{\beta}_T - \beta| > \varepsilon\} = 0$.
The corresponding sequence $\{g_T\}$ is usually a power function of time.
Thus, for OLS estimators when the variables are stationary,
$g_T = T^{-1/2}$ and $\{\hat{\beta}_T - \beta\} = O_p(T^{-1/2})$. In an
alternative terminology, we often say that $\hat{\beta}$ tends to $\beta$ at
rate $T^{1/2}$. If the variables are integrated, $g_T = T^{-1}$ or larger,
which is the case of super- (or faster than $T^{1/2}$) convergence.
The lemmata given below are often useful in determining orders in
magnitude and in probability of functions (sums, differences, products,
and quotients) of random variables (see Fuller 1976, and White 1984).
LEMMA 1. Let $\{a_T\}$, $\{b_T\}$ be sequences of real numbers. Let $\{f_T\}$
and $\{g_T\}$ be sequences of positive real numbers.


A second type of convergence is the convergence of a sequence of
distribution functions to a limit function. Important examples of such
convergence are central limit theorems, where a sequence of distribution
functions converges pointwise to the normal distribution function. The
appendix to this chapter uses the Liapunov central limit theorem to
derive the asymptotic distribution of a scaled function of the sample
mean.
6. If $\{X_T\}$ is a sequence of random variables with distribution functions
$\{F_{X_T}(x)\}$, then $\{X_T\}$ is said to converge in distribution to the
random variable $X$ with distribution function $F_X(x)$, denoted
$X_T \xrightarrow{d} X$, if $\lim_{T \to \infty} F_{X_T}(x) = F_X(x)$ at all
points of continuity $x$.
Finally, convergence in probability implies convergence in distribution.
Thus,
7. Let $\{X_T\}$ be a sequence of random variables. If there exists a
random variable $X$ such that $\operatorname{plim} X_T = X$, then
$X_T \xrightarrow{d} X$.
1.5.3. Ergodicity and Mixing Processes
The following definitions are based on Davidson and MacKinnon
(1992), Spanos (1986), and White (1984), which readers can consult for
further details.
Ergodicity, uniform mixing, and strong mixing are three types of
asymptotic independence, implying that two realizations of a time series
become ever closer to independence as the distance between them
increases. Generically, a stochastic process $\{y_t\}$ is defined as
asymptotically independent if

$$|F(y_{t_1}, \ldots, y_{t_n}, y_{t_1+h}, \ldots, y_{t_n+h})
 - F(y_{t_1}, \ldots, y_{t_n})\,F(y_{t_1+h}, \ldots, y_{t_n+h})| \to 0$$

as $h \to \infty$; that is, the joint distribution function of the two
sub-sequences of $\{y_t\}$ approaches the product of the distributions of
each of the sub-sequences as the distance between the sub-sequences increases
without bound.
A process $\{y_t\}$ is defined as ergodic if it is stationary and if, for
any $t$,

$$\lim_{T \to \infty} \frac{1}{T} \sum_{\tau=1}^{T} \operatorname{cov}(y_t, y_{t+\tau}) = 0.$$

A sufficient but not necessary condition for this to hold is that
$\operatorname{cov}(y_t, y_{t+\tau}) \to 0$ as $\tau \to \infty$. Thus
ergodicity is a weak form of average asymptotic independence, and usually we
will assume that stronger conditions hold which imply ergodicity.
If two events $A$ and $B$ are independent, then the quantities
$P(A \mid B) - P(A)$ and $P(A \cap B) - P(A)P(B)$ are both equal to zero,
where $P(A \mid B)$ is the conditional probability of $A$ given $B$ and
$P(A \cap B)$ is the joint probability of $A$ and $B$. The concepts of
uniform mixing and strong mixing are based on these two quantities
respectively, and require that expressions having these forms be equal to
zero asymptotically. Uniform mixing and strong mixing are also called
$\phi$-mixing and $\alpha$-mixing, after the sequences of numbers
$\{\phi_n\}$ and $\{\alpha_n\}$ used in defining them. Uniform mixing implies
strong mixing, and for a stationary process either of these implies
ergodicity.
Begin by defining the bounded mappings $G_1(y_i, \ldots, y_{i+h})$ and
$G_2(y_i, \ldots, y_{i+k})$ onto the real line. Then the sequence $\{y_t\}$
is defined as $\phi$-mixing if there exists a sequence $\{\phi_n\}$, with
$\phi_n > 0\ \forall\, n$, where $\phi_n \to 0$ as $n \to \infty$, such that
for $n > h$

The sequence $\{y_t\}$ is defined as $\alpha$-mixing if there exists a
sequence $\{\alpha_n\}$ with $\alpha_n > 0\ \forall\, n$ and where
$\alpha_n \to 0$ as $n \to \infty$, such that

1.5.4. Exogeneity
While our primary focus is on integrated series and the problems they
imply for standard econometric analyses, rather than on the problems
created by a failure of exogeneity (in the appropriate sense), it will be
important to consider exogeneity at several points.
Econometric analysis often proceeds on the basis of a single-equation
model of a process of interest. Implicitly, we assume that knowledge of
the processes generating the explanatory variables would carry no
information relevant to the parameters of interest. As Engle, Hendry,
and Richard (1983) indicate, concepts of exogeneity relate to the
circumstances in which this assumption is valid. Rather than refer to
particular variables as exogenous in general, Engle et al. refer to a
variable as exogenous with respect to a particular parameter if knowledge
of the process generating the exogenous variable contains no
information about that parameter.
The three different concepts introduced by Engle et al. are called
weak, strong, and super exogeneity and correspond to three different
ways in which a parameter estimate may be used: inference, forecasting
conditional on forecasts of the exogenous variables, and policy analysis.
These different uses require that different conditions must be met for
exogeneity to hold. These conditions can be examined with the
following definitions.
Let $\mathbf{x}_t = (y_t, z_t)'$ be generated by the process with conditional
density function $D(\mathbf{x}_t \mid \mathbf{X}_{t-1}, \lambda)$, where
$\mathbf{X}_{t-1}$ denotes the history of the variable $\mathbf{x}$:
$\mathbf{X}_{t-1} = (\mathbf{x}_{t-1}, \mathbf{x}_{t-2}, \ldots, \mathbf{x}_0)$.
Let the parameters $\lambda \in \Lambda$ be partitioned into
$(\lambda_1, \lambda_2)$ to support the factorization

$$D(\mathbf{x}_t \mid \mathbf{X}_{t-1}, \lambda)
 = D(y_t \mid z_t, \mathbf{X}_{t-1}, \lambda_1)\,
   D(z_t \mid \mathbf{X}_{t-1}, \lambda_2).$$

Then $[(y_t \mid z_t; \lambda_1), (z_t; \lambda_2)]$ operates a sequential cut
on $D(\mathbf{x}_t \mid \mathbf{X}_{t-1}, \lambda)$ if and only if
$\lambda_1$ and $\lambda_2$ are variation free; that is, if and only if

$$(\lambda_1, \lambda_2) \in \Lambda_1 \times \Lambda_2,$$

so that the parameter space $\Lambda$ is the direct product of $\Lambda_1$
and $\Lambda_2$. In other words, for any values of $\lambda_1$ and
$\lambda_2$, admissible values of the parameters $\lambda$ of the joint
distribution can be recovered. The essential element of weak exogeneity is
that the marginal distribution contains no information relevant to
$\lambda_1$ (for an exposition, see Ericsson 1992).
Weak exogeneity: $z_t$ is weakly exogenous for a set of parameters of
interest $\psi$ if and only if there exists a partition
$(\lambda_1, \lambda_2)$ of $\lambda$ such that (i) $\psi$ is a function of
$\lambda_1$ alone, and (ii) $[(y_t \mid z_t; \lambda_1), (z_t; \lambda_2)]$
operates a sequential cut.
Strong exogeneity: $z_t$ is strongly exogenous for $\psi$ if and only if
$z_t$ is weakly exogenous for $\psi$ and

$$D(z_t \mid \mathbf{X}_{t-1}, \lambda_2) = D(z_t \mid Z_{t-1}, Y_0, \lambda_2),$$

so that $y$ does not Granger-cause $z$.
Super exogeneity: $z_t$ is super exogenous for $\psi$ if and only if $z_t$ is
weakly exogenous for $\psi$ and $\lambda_1$ is invariant to interventions
affecting $\lambda_2$.
Weak exogeneity ensures that there is no loss of information about
parameters of interest from analysing only the conditional distribution; a
variable $z_t$ is weakly exogenous for a set of parameters $\psi$ if inference
concerning $\psi$ can be made conditional on $z_t$ with no loss of information
relative to that which could be obtained using the joint density of $y_t$ and
$z_t$. Strong exogeneity is necessary for multi-step forecasting which
proceeds by forecasting future $z$s and then forecasting $y$s conditional on
those $z$s. Super exogeneity sustains policy analysis on $\lambda_1$ when the
marginal distribution of $z_t$ is altered.
Engle et al. contrast these three types of exogeneity with the
traditional concepts of strict exogeneity and pre-determinedness. If $u_t$ is
the error term in a model, then $z_t$ is said to be strictly exogenous if
$E[z_t u_{t+i}] = 0\ \forall\, i$, whereas $z_t$ is said to be predetermined if
$E[z_t u_{t+i}] = 0\ \forall\, i \geq 0$. Engle et al. show that the latter
concepts are neither necessary nor sufficient for valid inference since
neither relates to parameters of interest.
The following example (fro m Engl e e t al. 1983 ) seeks t o clarif y thes e
concepts. Conside r th e DGP:

with

The parameters $(\beta, \delta_1, \delta_2, \sigma_{11}, \sigma_{12}, \sigma_{22})$ are assumed to be variation free beyond the requirements that ensure the error covariance matrix is positive definite. $\beta$ is the parameter of interest. The reduced-form equation for $y_t$ is
with

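and $E[y_t \mid z_t, Y_{t-1}, Z_{t-1}] = b z_t + c_1 z_{t-1} + c_2 y_{t-1}$. The corresponding conditional variance is $\operatorname{var}[y_t \mid z_t, Y_{t-1}, Z_{t-1}] = \sigma^2$.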

Consider now the regression model


If $\sigma_{12} = 0$, by substituting this value into the expressions for $b$, $c_i$, and $\sigma^2$, we see that $E[y_t \mid z_t, Y_{t-1}, Z_{t-1}] = \beta z_t$ and that the conditional variance is $\sigma_{11}$. Also, $E[z_t \mid Z_{t-1}, Y_{t-1}] = \delta_1 z_{t-1} + \delta_2 y_{t-1}$ with variance $\sigma_{22}$. Hence the conditional density of $(y_t, z_t)$ factorizes as in our earlier definition of a sequential cut, so that $(\beta, \sigma_{11})$ and $(\delta_1, \delta_2, \sigma_{22})$ correspond to $\lambda_1$ and $\lambda_2$ respectively. Since the parameter $\beta$ is (trivially) a function of $\lambda_1$ only, $z_t$ is weakly exogenous for $\beta$. In the context of the regression model, this shows up in the fact that $\beta$ can be derived from the parameters of this regression, without knowledge of the parameters of the process generating $z_t$. Here, in fact, $\sigma_{12} = 0$ implies $b = \beta$. If $\sigma_{12} \ne 0$, then $\beta$ cannot be obtained from the parameters $b$, $c_1$, $c_2$, and $\sigma^2$ of the regression model (or of the conditional distribution).
As long as $\delta_2 \ne 0$, lagged $y$s affect $z$ and so $z$ is neither strongly nor strictly exogenous for $\beta$, irrespective of its weak exogeneity status. When $\sigma_{12} \ne 0$, and $\beta$ is an invariant parameter, changes in the parameters $(\delta_1, \delta_2, \sigma_{22})$ determining the marginal process affect the parameters of the conditional process, $b$ and $c_i$. Thus, the failure of $z_t$ to be weakly exogenous can lead to a failure of constancy in the conditional model when the marginal process changes. Conversely, when $\sigma_{12} = 0$ and $(\beta, \sigma_{11})$ are invariant to changes in $(\delta_1, \delta_2, \sigma_{22})$, $z_t$ is super exogenous for $\beta$. This holds even when $\delta_2 \ne 0$ and $z_t$ is not strongly exogenous for $\beta$.
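The algebra behind this example can be sketched as follows. The display equations of the example are not legible in this copy, so the DGP written below is an assumption: it is chosen to agree with the statements in the text that $E[z_t \mid Z_{t-1}, Y_{t-1}] = \delta_1 z_{t-1} + \delta_2 y_{t-1}$ with variance $\sigma_{22}$, and that $\sigma_{12} = 0$ implies $b = \beta$. The error labels $\varepsilon_{1t}$ and $\varepsilon_{2t}$ are ours.

```latex
% Assumed DGP (a reconstruction, not a quotation):
\begin{align*}
y_t &= \beta z_t + \varepsilon_{1t}, \\
z_t &= \delta_1 z_{t-1} + \delta_2 y_{t-1} + \varepsilon_{2t},
\qquad
\begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}
\sim \mathsf{IN}\!\left[\begin{pmatrix}0\\0\end{pmatrix},
\begin{pmatrix}\sigma_{11} & \sigma_{12}\\ \sigma_{12} & \sigma_{22}\end{pmatrix}\right].
\end{align*}
% Under joint normality, E[eps_1t | eps_2t] = (sigma_12 / sigma_22) eps_2t, so
\begin{align*}
E[y_t \mid z_t, Y_{t-1}, Z_{t-1}]
 &= \beta z_t + \frac{\sigma_{12}}{\sigma_{22}}
    \left(z_t - \delta_1 z_{t-1} - \delta_2 y_{t-1}\right) \\
 &= b z_t + c_1 z_{t-1} + c_2 y_{t-1},
\end{align*}
\[
b = \beta + \frac{\sigma_{12}}{\sigma_{22}}, \qquad
c_i = -\delta_i \frac{\sigma_{12}}{\sigma_{22}} \quad (i = 1, 2), \qquad
\sigma^2 = \sigma_{11} - \frac{\sigma_{12}^2}{\sigma_{22}}.
\]
```

Under this assumed DGP, setting $\sigma_{12} = 0$ gives $b = \beta$, $c_1 = c_2 = 0$, and $\sigma^2 = \sigma_{11}$, as the text states. When $\sigma_{12} \ne 0$, $b$ and $c_i$ involve $\delta_i$ and $\sigma_{22}$, which is why changes in the marginal process then shift the parameters of the conditional model.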
1.5.5. Functions of Deterministic Trends
Various sums of powers of trends appear regularly in the derivations in this book and so it is convenient to record the most common of these here:

In each case, we also have that


These formulae are well known and easy, if tedious, to establish, and can be checked by induction. Let
$$\sum_{t=1}^{T} t^j = a_1 T + a_2 T^2 + \cdots + a_{j+1} T^{j+1}.$$
Then for $T = 1, 2, 3, \ldots, j + 1$, solve the resulting simultaneous system. For example, when $j = 1$, $\sum_{t=1}^{T} t = a_1 T + a_2 T^2$. At $T = 1$, $1 = a_1 + a_2$, while at $T = 2$, $3 = 2a_1 + 4a_2$. Solving for $a_1$ and $a_2$ gives $a_1 = 1/2$ and $a_2 = 1/2$, so that
$$\sum_{t=1}^{T} t = \tfrac{1}{2} T + \tfrac{1}{2} T^2 = \tfrac{1}{2} T(T + 1).$$
Also, $T^{-2}[(1/2)T + (1/2)T^2] \to 1/2$. The polynomial that is fitted to $\sum_{t=1}^{T} t^j$ is assumed to be of order $j + 1$. To show that this is the correct order to use, consider what would happen if $\sum_{t=1}^{T} t$ had been set equal to a third-order polynomial $a_1 T + a_2 T^2 + a_3 T^3$. Solving as above for $a_1$, $a_2$, and $a_3$ would yield $a_1 = a_2 = 1/2$ (as before) with $a_3 = 0$.
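The fitting procedure just described can be sketched in a few lines of Python. This is only an illustration: the function names `power_sum` and `fit_coefficients` are ours, and exact rational arithmetic (`fractions.Fraction`) is used so that the coefficients come out as fractions rather than rounded decimals.

```python
from fractions import Fraction


def power_sum(T, j):
    """Direct evaluation of sum_{t=1}^{T} t**j."""
    return sum(t ** j for t in range(1, T + 1))


def fit_coefficients(j, order=None):
    """Solve for a_1..a_m in  sum t^j = a_1 T + ... + a_m T^m.

    m defaults to j + 1; the equations are those obtained at T = 1, ..., m.
    """
    m = j + 1 if order is None else order
    # Augmented matrix: one row per T, columns T^1..T^m, then the power sum.
    rows = [
        [Fraction(T ** p) for p in range(1, m + 1)] + [Fraction(power_sum(T, j))]
        for T in range(1, m + 1)
    ]
    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(m):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]


if __name__ == "__main__":
    print(fit_coefficients(1))           # coefficients for sum of t
    print(fit_coefficients(1, order=3))  # over-fitted third-order polynomial
```

Calling `fit_coefficients(1, order=3)` repeats the over-fitting experiment in the text: the extra coefficient is solved along with the others rather than imposed.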
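We can summarize some of the relations above as follows. Let $A_1 = (1/2)T(T + 1)$, $A_2 = (1/3)(2T + 1)$, and $A_3 = (1/5)[3T(T + 1) - 1]$. Then
$$\sum_{t=1}^{T} t = A_1, \qquad \sum_{t=1}^{T} t^2 = A_1 A_2, \qquad \sum_{t=1}^{T} t^3 = A_1^2, \qquad \sum_{t=1}^{T} t^4 = A_1 A_2 A_3.$$
These sums are of orders $T^2$, $T^3$, $T^4$, and $T^5$ respectively, because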

1.5.6. Wiener Processes


Wiener, or Brownian motion, processes are used in exploring the properties of statistics involving integrated data. We begin our discussion by constructing an integrated process, and then map a transformation of it into a Wiener process.
Let $\{x_t\}$ be a normally distributed, zero-mean, unit variance, stationary, and ergodic martingale difference sequence, so that $x_t \sim \mathsf{IN}(0, 1)$, and let

and

Thus, $S_T \sim \mathsf{N}(0, T)$ and is an I(1) process with independent increments. Specifically, $S_T$ is a random walk: $S_T = S_{T-1} + x_T$. More generally, when the distribution of $\{x_t\}$ is well behaved, the limiting distribution of a suitably standardized $S_T$ will also be well behaved.
The analysis of regressions with integrated series uses the concept of limit theorems in function spaces known as functional limit theorems. These are also called invariance principles because the same form of limiting distribution results for a wide range of processes $\{x_t\}$, having different degrees of heterogeneity and memory (see Phillips 1987a). Figure 1.1 illustrates a sample realization of a random walk $S_T$ for $T = 10$. We will describe the various stages of analysis in terms of this example. We first consider convergence of the transformation $S_T/\sqrt{T}$ from (2) to a continuous Wiener process denoted by $W(r)$ for $r \in [0, 1]$.
A Wiener process is like a continuous random walk defined on the interval $[0, 1]$ (regard this as the horizontal axis), but has unbounded variation despite being continuous, and so can be imagined as moving extremely erratically in the vertical direction. In any sub-interval $[a, b]$ of $[0, 1]$, $W(r)$ for $r \in [a, b]$ remains equally erratic. In general, a continuous process $V(t)$, $t \ge 0$, is a Wiener process if (i) for all $t \ge 0$, $E[V(t)] = 0$; (ii) for all fixed $t > 0$, $V(t)$ is normally distributed and non-degenerate; (iii) $V(t)$ has independent increments; and (iv) $\Pr\{V(0) = 0\} = 1$. A Wiener process may be thought of as the limit of a discrete-time random walk as the interval between realizations goes to zero. Its derivative is a continuous-time normally distributed white-noise process, which is an abstraction, not a physically realizable process. Nonetheless, the limiting distributions described by the Wiener process may be useful approximations in many circumstances.
There are few convenient analytical expressions for functions of the distribution of $W(r)$, $r \in [0, 1]$, although as we have noted $W(r) \sim \mathsf{N}(0, r)$ for fixed $r$, and $W(r)$ has independent increments. Various functions of $W(r)$ have been tabulated, usually by simulation.

FIG 1.1. Realization of a random walk over 10 points

The following formulation maps the increasing interval from 0 to $T$ into the fixed interval $[0, 1]$ so that results will be invariant to the actual value of $T$. To do so, we construct from $S_T$ a new random step function $R_T(r)$ as follows. Let $[rT]$ denote the integer part of $rT$, where $r \in [0, 1]$. For example, if $T = 100$ and $r = 0.101$, then $[rT] = [10.1] = 10$. Divide the interval $[0, 1]$ into $T + 1$ parts at $0, 1/T, 2/T, \ldots, 1$, and let
$$R_T(r) = S_{[rT]}/\sqrt{T}.$$
For example, $R_{100}(0.101) = S_{10}/10$ whereas $R_{100}(0.11) = S_{11}/10$. Thus, $R_T(r)$ is constant for values of $r$ within jumps at successive integers, and is a right-continuous random variable defined over $[0, 1]$. Figure 1.2 shows this second-stage mapping, leading to the step function graph of $R_T(r)$ in Fig. 1.3. As $T \to \infty$, $R_T(r)$ becomes increasingly dense on $[0, 1]$. Figures 1.4 and 1.5 show this happening for $T = 100$ and $T = 1000$. The horizontal axis length is fixed, so the vertical axis variability increases as $T$ grows.
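The mapping from a random walk to a step function on the unit interval can be illustrated with a short Python sketch. It assumes that the walk starts from $S_0 = 0$ and that the step function is $S_{[rT]}/\sqrt{T}$, which is what the worked values $S_{10}/10$ and $S_{11}/10$ imply for $T = 100$; the function names are ours.

```python
import math
import random


def random_walk(T, rng):
    """Return [S_0, S_1, ..., S_T] with S_0 = 0 (assumed) and IN(0, 1) increments."""
    path = [0.0]
    for _ in range(T):
        path.append(path[-1] + rng.gauss(0.0, 1.0))
    return path


def step_function(path):
    """Build R_T(r) = S_[rT] / sqrt(T) on [0, 1] from a path S_0..S_T."""
    T = len(path) - 1

    def R(r):
        # Integer part [rT]; the tiny offset guards against floating-point
        # products such as 0.29 * 100 falling just below an integer.
        index = math.floor(r * T + 1e-12)
        return path[index] / math.sqrt(T)

    return R


if __name__ == "__main__":
    rng = random.Random(0)
    path = random_walk(100, rng)
    R = step_function(path)
    # R is flat between jumps: r = 0.101 and r = 0.105 share the same step.
    print(R(0.101), R(0.105), R(0.11))
```

Increasing `T` while plotting `R` over a fixed grid on $[0, 1]$ would reproduce the kind of progression that Figs. 1.3 to 1.5 describe.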
Let $\Rightarrow$ denote weak convergence in the sense that the probability measures converge: this is the analogue, for function spaces, of convergence in distribution for random variables (see Hall and Heyde 1980). Then, under weak assumptions about $\{x_t\}$,
$$R_T(r) \Rightarrow W(r). \qquad (4)$$
Furthermore, if $f(\cdot)$ is a continuous functional on $[0, 1]$, then
$$f(R_T(r)) \Rightarrow f(W(r)). \qquad (5)$$

FIG 1.2. Mapping the 10-point graph on to a step function

FIG 1.3. Step representation of a random walk over 10 points

FIG 1.4. Step representation of a random walk over 100 points


For further details, see Billingsley (1968), Dickey and Fuller (1979, 1981), Hall and Heyde (1980), and Phillips (1986, 1987a).
In distributions involving I(1) variables, functionals of Wiener processes arise quite generally, whereas conventional methods of obtaining limiting distributions tend to be specific to the assumptions made about the data or error process.³ Also, many of the statistics regularly used in empirical research involving I(1) time series have different distributions from those that arise with I(0) data. In particular, many statistics in I(1) processes do not converge to constants, as in the I(0) case, but instead converge to random variables. Thus, different critical values may be required for tests, depending on the degree of integration of the time series.

³ By this we mean that only weak restrictions need to be satisfied by the $\{x_t\}$ sequence for convergence results such as (4) and (5) to hold. Phillips (1987a) provides a good account of this issue, and a discussion is also contained in Ch. 3.

FIG 1.5. Step representation of a random walk over 1000 points

Consider the random walk, $y_t = y_{t-1} + \varepsilon_t$, with $\varepsilon_t \sim \mathsf{IN}(0, 1)$ and $y_0 = 0$. Then

Alternatively, from (7),

Similarly, $\operatorname{corr}^2(y_t, y_{t-k})$ has a numerator of $(t - k)^2$ and a denominator of $t(t - k)$ for $k > 0$, and so equals $1 - k/t$. When $k < 0$, let $s = t - k$ so that $t = s + k$, and let $r = -k > 0$, in which case

Since $y_0 = 0$, we have that

The last approximation uses


To illustrate the use of Wiener processes in deriving distributions involving I(1) variables, we will derive the limiting distribution of the sample mean, $\bar{y} = T^{-1} \sum_{t=1}^{T} y_t$. Because $\{y_t\}$ is a random walk, its mean converges to a functional of a Wiener process. Let $R_T(r) = y_{[rT]}/\sqrt{T} = y_{i-1}/\sqrt{T}$ for $(i - 1)/T \le r < i/T$ $(i = 1, \ldots, T)$, and $R_T(1) = y_T/\sqrt{T}$. $R_T(r)$ is a step function with steps at $i/T$, for $i = 1, \ldots, T$, and is constant between steps. Thus,

The last expression is $\bar{y}_{-1}/\sqrt{T}$, where $\bar{y}_{-1}$ is the lagged mean. This result uses the fact that, for any constant $c$,

From (3) and (4),

and hence

The unlagged sample mean has the same limiting distribution.
An interesting aspect of (10) is that the Lindeberg-Feller central limit theorem⁴ (which applies to independent but heterogeneously distributed observations; see White 1984) can be applied to obtain the distribution of $\bar{y}$ and hence show that

Thus, some functionals of Wiener processes are familiar random variables in disguise and we will develop this aspect as we proceed. A proof of (11) is given in the Appendix.
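A small simulation offers an informal check on the claim that the scaled sample mean of a random walk behaves like a familiar normal variate. The sketch below assumes the scaling $\bar{y}/\sqrt{T}$ and the limiting variance of $1/3$ that the Appendix attributes to result (11); the sample size, replication count, seed, and function names are arbitrary choices of ours.

```python
import random
import statistics


def scaled_mean_of_random_walk(T, rng):
    """Return ybar / sqrt(T) for one random walk y_t = y_{t-1} + e_t, y_0 = 0."""
    y, total = 0.0, 0.0
    for _ in range(T):
        y += rng.gauss(0.0, 1.0)
        total += y
    return (total / T) / T ** 0.5


def simulate(T=100, replications=4000, seed=12345):
    """Monte Carlo mean and variance of ybar / sqrt(T) across replications."""
    rng = random.Random(seed)
    draws = [scaled_mean_of_random_walk(T, rng) for _ in range(replications)]
    return statistics.mean(draws), statistics.pvariance(draws)


if __name__ == "__main__":
    mean, variance = simulate()
    print(f"mean = {mean:.3f}, variance = {variance:.3f}, compare with 1/3")
```

For finite $T$ the exact variance is $T(T + 1)(2T + 1)/(6T^3)$, slightly above $1/3$, so the simulated variance should sit close to, but not exactly at, the limiting value.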
⁴ Strictly speaking, the version we use here is a special case of this theorem, sometimes called the Liapunov central limit theorem.

1.5.7. Monte Carlo Simulation
The purpose of Monte Carlo simulation is to evaluate by experiment quantities that would be very difficult or impossible to evaluate analytically. Such experiments typically begin by creating a set of data with known statistical properties. This is achieved by specifying every aspect of a data-generating process, or class of such processes, and replacing the random errors of the DGP by pseudo-random numbers. Pseudo-random numbers are numbers generated deterministically to mimic a random process with a particular distribution. An investigator typically generates a large number of such artificial data sets (called replications) to investigate statistical techniques which analyse these data as if the process generating them were not known. The performance of the statistical technique in revealing some characteristic of the data set may then be evaluated by generating its distribution from independent replications of the experiment and comparing the results with the known characteristics of the process generating the data.
For example, an econometrician may wish to examine the performance of the standard $t$-test in data generated by a random walk. Artificial data-sets following a random walk may easily be constructed using pseudo-random disturbances, and the empirical distribution of the $t$-statistic in samples of size $T$ can be generated by replicating $N$ sets of $T$ observations. The mean, variance, or various critical values of the $t$-statistic can be calculated from the empirical distribution and, for sufficiently large $N$, will be close to their population (i.e. analytic) counterparts. The investigator can also vary the parameters of the DGP in order to observe their effects on the outcome. In each experiment, the investigator knows the true parameters of the process, and so can evaluate the estimators and tests used.
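As a minimal sketch of such an experiment, the Python code below generates $N$ random walks of length $T$ and records a $t$-statistic from each. The particular statistic is our choice for illustration, since the text does not specify one: the $t$-ratio for the hypothesis that the coefficient is unity in a regression of $y_t$ on $y_{t-1}$ without an intercept. The values of $T$, $N$, and the seed are likewise arbitrary.

```python
import random
import statistics


def t_statistic_unit_root(T, rng):
    """t-ratio for H0: rho = 1 in y_t = rho * y_{t-1} + e_t, data from a random walk."""
    y = [0.0]
    for _ in range(T):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    lagged, current = y[:-1], y[1:]
    sum_xx = sum(x * x for x in lagged)
    rho_hat = sum(x * v for x, v in zip(lagged, current)) / sum_xx
    residuals = [v - rho_hat * x for x, v in zip(lagged, current)]
    s2 = sum(r * r for r in residuals) / (T - 1)  # one estimated coefficient
    return (rho_hat - 1.0) / (s2 / sum_xx) ** 0.5


def empirical_distribution(T=100, N=2000, seed=2024):
    """Sorted t-statistics from N independent replications of T observations."""
    rng = random.Random(seed)
    return sorted(t_statistic_unit_root(T, rng) for _ in range(N))


if __name__ == "__main__":
    stats = empirical_distribution()
    print("mean:", statistics.mean(stats))
    print("variance:", statistics.pvariance(stats))
    print("empirical 5% point:", stats[int(0.05 * len(stats))])
```

If the usual Student-$t$ or normal approximation applied, the mean would be near zero and the lower 5 per cent point near $-1.645$; comparing the simulated values with these benchmarks is the kind of evaluation the paragraph above describes.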
Unlike analytical studies, Monte Carlo simulations cannot produce exact results; any result from a Monte Carlo experiment comes from a (pseudo-)random sample, and therefore has some variability attached to it. Moreover, Monte Carlo experiments are inevitably specific to the particular data generation processes examined (although it may be possible to prove analytically that results will be invariant to certain parameters in the process). Nonetheless, Monte Carlo results are useful when analytical results are difficult to obtain. In particular, Monte Carlo experiments are often used to investigate the finite-sample performance of statistical techniques, the analytical properties of which are known only asymptotically.
There are a number of subtleties to the design and interpretation of Monte Carlo experiments which demand careful attention, including the methods used to generate pseudo-random numbers, variance-reduction methods such as common random numbers, antithetic random numbers and control variates intended to improve precision, the calculation of standard errors of the experimental estimates of unknown quantities, the use of response surfaces to summarize and interpolate results, and recursive updating of quantities of interest. Expositions of Monte Carlo methods may be found in, for example, Hammersley and Handscomb (1964), Hendry (1984), Ripley (1987), Hendry, Neale, and Ericsson (1990), and Davidson and MacKinnon (1992).

1.6. Data Representation and Transformations


Since data transformations play an important role in econometrics generally, we briefly consider their impact on I(1) data. Consider the hypothesis that a set of integrated data can be described by a linear model with a constant error variance. In particular, a normally distributed random walk with drift is often postulated so that $\Delta x_t \sim \mathsf{IN}(\mu, \sigma^2)$. Many economic time series (such as consumption, national income and expenditure, or the price level) do grow over time, but the amount by which they grow in each period also tends to rise. However, $\Delta x_t = x_t - x_{t-1}$ will be stationary only if the absolute amount of growth is stationary, in which case for $\mu > 0$, $\sigma/x_t$ will tend to zero. Percentage growth, by contrast, often displays no obvious tendency to rise or fall, making it a more likely candidate for stationarity. Since the levels of many economic variables are initially positive, and recalling that

we see that stationarity of the rate of growth implies stationarity of $\Delta \log(x_t)$. Changes in the logarithms of economic data series such as those just mentioned, therefore, seem more likely to be stationary than changes in the levels. We will return to this point in Chapter 6 below, where we consider how co-integration is affected by the logarithmic transformation. We illustrate some of these points with actual data series.
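The contrast between absolute changes and changes in logarithms can be seen in a deliberately artificial Python example. The series below is hypothetical, not data from the book: it grows at a constant 3 per cent per period with no noise, so that the two transformations can be compared exactly.

```python
import math


def differences(series):
    """First differences of a sequence."""
    return [b - a for a, b in zip(series, series[1:])]


growth = 0.03  # hypothetical constant growth rate per period
levels = [100.0 * (1.0 + growth) ** t for t in range(50)]

abs_changes = differences(levels)                         # Delta x_t rises with the level
log_changes = differences([math.log(x) for x in levels])  # Delta log x_t stays constant

if __name__ == "__main__":
    print("first and last absolute change:", abs_changes[0], abs_changes[-1])
    print("first and last log change:     ", log_changes[0], log_changes[-1])
    print("log(1 + growth):", math.log(1.0 + growth))
```

The absolute changes grow in proportion to the level, while the log changes equal $\log(1 + 0.03)$ in every period, which is close to the growth rate itself when that rate is small.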
The time series that we analyse is real net national product ($Y$, in 1929 £million) for the United Kingdom over 1872-1975. The data are taken from Friedman and Schwartz (1982) and are also investigated in Hendry and Ericsson (1991a). Figures 1.6-1.9 plot this data series and various transformations of it. Figure 1.6 plots the untransformed series $Y_t$; the series is tending to grow by increasing amounts, and so would be better approximated by a convex function than by a straight line. This is visible from the upward curvature and the much closer fit of the quadratic trend line compared with the linear trend. In Fig. 1.7, we plot the logarithm of the series: the curvature is no longer apparent, and the quadratic and linear trends are very similar and fit about equally well. Thus, the logarithm of the series is relatively well approximated by a straight line and, while growing, there is no evident tendency for the growth rate to change over time.

FIG 1.6. UK real net national product ($Y$ in 1929 £million), 1872-1975

FIG 1.7. Logarithm ($\log Y$) of UK real net national product

Figure 1.8 plots the changes, $\Delta Y_t$. There is a tendency for both the mean and the variance to grow over time, and the linear trend shown highlights the former. (It requires more careful inspection to see the latter owing to the very large shock in 1919-20.) Differencing the initial series has therefore not produced a stationary series. In Fig. 1.9, however, where $\Delta \log Y_t$ is plotted, there is no longer any major change in the mean or variability of the series over the sample, with perhaps a slight tendency for the variance to be smaller in the period since 1945. Certainly, any trend in the mean of $\Delta \log Y_t$ is negligible. This series, then, may well be stationary, although neither the logarithmic transformation nor the first-difference transformation produced a stationary series on its own. Since the differences in the logarithms appear stationary, we might expect to find that the logarithms of the original series are I(1), while the untransformed initial series apparently is not and differencing it is not sufficient to produce stationarity.

FIG 1.8. Changes ($\Delta Y$) in UK real net national product

FIG 1.9. Changes in the logarithm ($\Delta \log Y$) of UK real net national product

Alternatively, any linear model of $\Delta Y_t$ will have an error term, which we denote by $u_t$, with a standard deviation $\sigma_u$ that must be in the same units as $Y_t$. Since these are 1929 £million, the linear model assumes a constant absolute error standard deviation. However, net national product has grown about six-fold over the sample so that $\sigma_u/Y_t$ (the relative error) will be much smaller in 1975 than in 1875. It would be difficult to imagine reasons for such a decline.
The log-linear model, by way of contrast, assumes a constant relative error standard deviation (e.g. 2½ per cent of $Y_t$ at all points in time), which seems much more plausible. Failing to transform the data adequately violates the statistical model of an I(1) or I(0) series, and can induce trending means and variances, making testing less reliable. Certainly, a relatively long time series is needed to make such factors obvious, but they operate even within post-war quarterly data (see e.g. Ermini and Hendry 1991). Moreover, changes in means and variances over time are very apparent in nominal time series, and can confuse attempts to determine co-integration. Granger and Hallman (1991) analyse general transformations in I(1) time series, and Chapter 4 below explores formal statistical tests of hypotheses about the degree of integration of individual time series.

1.7. Examples: Typical ARMA Processes


Figures 1.10-1.20 present graphs of typical examples of series generated by special cases of ARMA(1,1) processes. For ease of comparison, each series is computer-generated using the same set of 200 observations on normally distributed white-noise errors $\varepsilon_t \sim \mathsf{IN}(0, 1)$ with $u_0 = 0$. The data generation processes are:

Fig. 1.10   $u_t = \varepsilon_t$   [white noise]
Fig. 1.11   $u_t = \varepsilon_t + 0.8\varepsilon_{t-1}$   [MA(1), stationary]
Fig. 1.12   $u_t = \varepsilon_t - 0.8\varepsilon_{t-1}$   [MA(1), stationary]
Fig. 1.13   $u_t = 0.5u_{t-1} + \varepsilon_t$   [AR(1), stationary]
Fig. 1.14   $u_t = 0.5u_{t-1} + \varepsilon_t + 0.8\varepsilon_{t-1}$   [ARMA(1,1), stationary]
Fig. 1.15   $u_t = 0.5u_{t-1} + \varepsilon_t - 0.8\varepsilon_{t-1}$   [ARMA(1,1), stationary]
Fig. 1.16   $u_t = 0.9u_{t-1} + \varepsilon_t$   [AR(1), stationary]
Fig. 1.17   $u_t = 0.9u_{t-1} + \varepsilon_t + 0.8\varepsilon_{t-1}$   [ARMA(1,1), stationary]
Fig. 1.18   $u_t = 0.99u_{t-1} + \varepsilon_t$   [AR(1), stationary]
Fig. 1.19   $u_t = 1.00u_{t-1} + \varepsilon_t$   [AR(1), non-stationary]
Fig. 1.20   $u_t = 1.01u_{t-1} + \varepsilon_t$   [AR(1), non-stationary]
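Series of this kind are straightforward to generate. The Python sketch below applies each pair of AR and MA coefficients from the list above to one common set of 200 pseudo-random errors. It will not reproduce the book's figures, since the original errors are not available; the seed, the start-up convention $\varepsilon_0 = 0$, and the function names are assumptions of ours.

```python
import random


def simulate_arma11(ar, ma, eps):
    """u_t = ar * u_{t-1} + eps_t + ma * eps_{t-1}, with u_0 = 0 and eps_0 = 0 assumed."""
    u, prev_u, prev_e = [], 0.0, 0.0
    for e in eps:
        value = ar * prev_u + e + ma * prev_e
        u.append(value)
        prev_u, prev_e = value, e
    return u


def first_autocorrelation(x):
    """Sample first-order autocorrelation, a rough summary of smoothness."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


rng = random.Random(0)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]  # the same errors for every series

# (AR coefficient, MA coefficient) for each figure number in the list above.
specs = {
    "1.10": (0.0, 0.0), "1.11": (0.0, 0.8), "1.12": (0.0, -0.8),
    "1.13": (0.5, 0.0), "1.14": (0.5, 0.8), "1.15": (0.5, -0.8),
    "1.16": (0.9, 0.0), "1.17": (0.9, 0.8), "1.18": (0.99, 0.0),
    "1.19": (1.0, 0.0), "1.20": (1.01, 0.0),
}
series = {fig: simulate_arma11(ar, ma, eps) for fig, (ar, ma) in specs.items()}

if __name__ == "__main__":
    for fig, u in series.items():
        print(fig, round(first_autocorrelation(u), 2))
```

Because every series is driven by the same `eps`, differences among them reflect only the AR and MA coefficients, which is the point of the comparison in the text.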

FIG 1.10. AR = 0.0; MA = 0.0

FIG 1.11. AR = 0.0; MA = 0.8

FIG 1.12. AR = 0.0; MA = -0.8

FIG 1.13. AR = 0.5; MA = 0.0

FIG 1.14. AR = 0.5; MA = 0.8

FIG 1.15. AR = 0.5; MA = -0.8

FIG 1.16. AR = 0.9; MA = 0.0

FIG 1.17. AR = 0.9; MA = 0.8

FIG 1.18. AR = 0.99; MA = 0.0

FIG 1.19. AR = 1.00; MA = 0.00

FIG 1.20. AR = 1.01; MA = 0.00

A process such as that in Fig. 1.19, an AR(1) with a unit root, is a random walk and may also be expressed as ARIMA(0,1,0).
The scales on the graphs in Figs. 1.10-1.20 are not identical; for the non-stationary processes, in particular, the graphs show very wide movements relative to those of the stationary series. Non-stationary processes with roots strictly greater than unity grow very quickly even where those roots are quite close to 1, as can be seen from Fig. 1.20, an AR(1) with a root in the autoregressive part of 1.01. The stationary processes in Figs. 1.10-1.18 have unconditional means of zero and finite unconditional variances. They are 'tied' to this zero mean in the sense that deviations from it cannot accumulate indefinitely. By contrast, the process with a single root of exactly unity (Fig. 1.19) has an unconditional variance which increases over time and will tend to wander widely (see equation (7)) with an unbounded expected crossing time of the origin. The process with a root greater than unity (Fig. 1.20) is explosive and will tend to either $+\infty$ or $-\infty$.
Figures 1.11, 1.14, and 1.17 add a positive MA component to the series in Figs. 1.10, 1.13, and 1.16 respectively, to highlight the 'smoothing' effect of a positive MA term. By contrast, the series in Figs. 1.12 and 1.15 add a negative MA term of the same absolute magnitude; these negative MA terms have the opposite effect, making the series appear less smooth than the pure AR series in Figs. 1.10 and 1.13. Figure 1.15 resembles Fig. 1.10, however, reflecting the fact that the AR and MA lag polynomials are close to cancelling. (If the AR coefficient were 0.8, then the AR and MA polynomials would each be $(1 - 0.8L)$, and these redundant common factors could be cancelled, leaving white noise as in Fig. 1.10.) In each of the sets Figs. 1.10-1.12, 1.13-1.15, 1.16 and 1.17, respectively, the data series plotted have the same AR root, and differ only in their MA parts.
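The cancellation of common factors mentioned in parentheses can be checked numerically. In the Python sketch below, an ARMA(1,1) with AR coefficient 0.8 and MA coefficient $-0.8$ is generated from pseudo-random errors. The start-up condition $u_0 = \varepsilon_0$ is our assumption, chosen so that the cancellation is exact from the first observation rather than only after start-up effects have died out.

```python
import random

rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]

# ARMA(1,1) with AR coefficient 0.8 and MA coefficient -0.8:
#   (1 - 0.8L) u_t = (1 - 0.8L) eps_t
# Start-up assumption (ours): u_0 = eps_0.
u = [eps[0]]
for t in range(1, len(eps)):
    u.append(0.8 * u[-1] + eps[t] - 0.8 * eps[t - 1])

# Largest absolute gap between the ARMA series and the underlying white noise.
max_gap = max(abs(a - b) for a, b in zip(u, eps))

if __name__ == "__main__":
    print("largest |u_t - eps_t|:", max_gap)
```

Up to floating-point rounding, the generated series coincides with the white-noise errors, so the AR and MA parts are redundant in this case.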
Knowing the generating mechanism, the differences among the ARMA processes given in the figures are fairly clear. In practice, however, it is not easy to solve the converse problem of determining the generating mechanisms from observations on the variables; it may even be difficult to determine from a moderately sized sample whether or not a process is stationary. Although the distinctions among the examples of stationary and non-stationary processes above are substantial, those among 'borderline' stationary and non-stationary processes may not be. For example, $u_t = 0.99u_{t-1} + \varepsilon_t$ is a (borderline) stationary process, but will closely resemble the random walk $u_t = u_{t-1} + \varepsilon_t$ for samples of the size reproduced in Figs. 1.18 and 1.19.
It is interesting to compare the latter two processes by rewriting the AR(1) in MA form. For the process $u_t = \alpha u_{t-1} + \varepsilon_t$, it follows that $u_{t-1} = \alpha u_{t-2} + \varepsilon_{t-1}$ also. Substituting this into the first equation, we have $u_t = \alpha(\alpha u_{t-2} + \varepsilon_{t-1}) + \varepsilon_t$. If we continue to eliminate each subsequent lag of $u$, we find
$$u_t = \alpha^n u_{t-n} + \sum_{i=0}^{n-1} \alpha^i \varepsilon_{t-i}.$$
For the stationary process, $|\alpha| < 1$, so the first term and the contributions of more distant errors disappear as $n \to \infty$, and $u_t$ may be approximated by an MA($n$) process with increasing accuracy as $n \to \infty$. If $\alpha = 1$, however, the first term does not disappear, and the approximation fails; this follows from the failure of the stationarity condition stated above. When $\alpha = 1$,
$$u_t = u_{t-n} + \sum_{i=0}^{n-1} \varepsilon_{t-i},$$
so that $u_t$ is the sum of a starting value, $u_{t-n}$, and all the errors accruing between $t - n + 1$ and $t$. This representation of the process $\{u_t\}$ as a sum of past contributions is the source of the relationship of integration in this time-series sense and integration in the integral calculus, where the integral of a function may be thought of as the limit of a sum of discrete areas under a curve. Figure 1.19 is the cumulative sum, or discrete integral, of the errors recorded in Fig. 1.10.
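The repeated substitution can be verified numerically. The Python sketch below builds an AR(1) path recursively and compares a chosen observation with the value implied by substituting back $n$ periods, that is, $\alpha^n u_{t-n}$ plus the weighted sum of the last $n$ errors. The function names, the seed, and the choices $t = 150$ and $n = 40$ are ours.

```python
import random


def ar1_path(alpha, eps, u0=0.0):
    """Return [u_0, u_1, ..., u_T] for u_t = alpha * u_{t-1} + eps_t."""
    path = [u0]
    for e in eps:
        path.append(alpha * path[-1] + e)
    return path


def back_substituted(alpha, path, eps, t, n):
    """alpha**n * u_{t-n} plus the weighted sum of the last n errors."""
    # eps[k] holds the error dated k + 1, so eps_{t-i} is eps[t - i - 1].
    tail = sum(alpha ** i * eps[t - i - 1] for i in range(n))
    return alpha ** n * path[t - n] + tail


rng = random.Random(7)
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]

if __name__ == "__main__":
    for alpha in (0.5, 0.99, 1.0):
        path = ar1_path(alpha, eps)
        direct = path[150]
        rebuilt = back_substituted(alpha, path, eps, 150, 40)
        weight = alpha ** 40  # weight on the starting value u_{t-n}
        print(alpha, direct, rebuilt, weight)
```

The printed weight on the starting value is negligible for $\alpha = 0.5$, still sizeable for $\alpha = 0.99$, and exactly one for $\alpha = 1$, which is the contrast drawn in the text between the stationary and unit-root cases.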
Many economic time series have been modelled using ARMA or ARIMA processes, and models of these types will be used frequently in the following chapters in describing the methods and tests. Priestley (1989) provides examples of other types of models that may be used to characterize non-stationary processes.

1.8. Empirical Time Series: Money, Prices, Output, and Interest Rates
Figure 1.21 graphs the logarithms of quarterly, seasonally adjusted, nominal M1 and prices (the implicit deflator of total final expenditure, TFE) in the UK over the period 1963-89. The series (denoted $\log M_t$ and $\log P_t$) have strong trends and are relatively smooth, although their growth rates alter perceptibly around 1974 and again around 1980. Such data are not unlike realizations from highly autoregressive (I(1)) processes. Figure 1.22 shows their first differences $\Delta \log(M_t)$ and $\Delta \log(P_t)$. These are more erratic but are still highly autocorrelated. The growth rate of $M$ appears to have increased over time, whereas that of $P$ has fallen, especially after 1980. These data do not seem to be stationary although the graphs by themselves do not reveal the source of the non-stationarity.

FIG 1.21. Time series of money (M1) and prices (implicit deflator of total final expenditure) in the UK, seasonally adjusted, in logs

FIG 1.22. Time series of $\Delta \log M_t$ and $\Delta \log P_t$

Next, Fig. 1.23 shows the behaviour of logs of the real money supply ($\log(M_t/P_t)$) and real TFE ($\log(Y_t)$). It might have been anticipated from Fig. 1.21 that $\log(M_t)$ and $\log(P_t)$ moved sufficiently closely over the whole sample for this differential to be stationary, but Fig. 1.23 shows that the real money supply is non-stationary. The formal apparatus of testing for co-integration developed in Chapter 7 is designed to detect such relationships statistically. By way of contrast, $\log(Y_t)$ looks more like a series with a constant linear trend, subject to perturbations in 1973/4 and 1979/80.
In economic terms, surprising features of Figs. 1.22-1.23 are the low pairwise correlations between $\Delta \log(M_t)$ and $\Delta \log(P_t)$, and between $\log(M_t/P_t)$ and $\log(Y_t)$, respectively. However, such results have no implications for the existence or otherwise of well defined relationships between these variables. Monetary theory suggests that the opportunity cost of holding money is an important determinant of the demand for money, so Fig. 1.24 shows the time series of the interest rate ($R_t$, a three-month local authority bill rate adjusted for financial innovation) and the rate of inflation, plotted in units that maximize their apparent correlation. The series $\{R_t\}$ also seems to be non-stationary, but with a different time profile from the other series. In particular, it is much less smooth than the other level series, but less erratic than their changes. Finally, Fig. 1.25 shows $\Delta \log(Y_t)$ and $\Delta R_t$. These are possibly weakly stationary, although both appear to have higher variances in the middle of the sample than at the ends. However, neither is highly autocorrelated, nor do they drift noticeably in any direction. We will analyse the four series $\log(M_t)$, $\log(P_t)$, $\log(Y_t)$, and $R_t$ as a system in later chapters. (See Hendry and Ericsson (1991b), who provided the data.)

FIG 1.23. Time series of real money ($\log M_t/P_t$) and real TFE ($\log Y_t$)

FIG 1.24. Time series of a three-month interest rate ($R_t$) and the rate of inflation ($\Delta \log P_t$) in the UK

FIG 1.25. Time series of $\Delta \log Y_t$ and $\Delta R_t$

1.9. Outline of Later Chapters

Chapter 2 discusses dynamic models for stationary processes. This allows us to introduce, in a familiar context, a number of considerations which will prove important later. Various equivalent transformations of linear autoregressive-distributed lag models are considered, especially error-correction, Bewley, and Bardsen forms. The role of expectations in stationary processes is also investigated and is related to the absence of weak exogeneity for the parameters of the economic agents' decision functions.
Chapter 3 then considers the analysis of I(1) variables, and explores the concepts of unit roots, non-stationarity, orders of integration, and near integration. The behaviour of least-squares estimators applied to spurious relationships is investigated and a number of results established for Wiener processes (see Phillips 1987a). Univariate tests for unit roots are discussed in Chapter 4, and the formal definitions in Chapter 3 are related to the properties of integrated series. Monte Carlo results illustrate the various distributions. Extensions to multiple unit roots and seasonal data are considered, and several examples are described in detail.
Chapter 5 moves on to the topic of co-integration. Following a bivariate example and formal definitions, the Granger Representation Theorem is described, linking co-integration to error correction, and clarifying the status of other representations such as common trends. The original Engle-Granger two-step estimator of the co-integrating relationship is analysed. Chapter 6 first considers inconsistent regressions sometimes used in orthogonality tests; the analysis then turns to distributions of estimators in dynamic regressions with I(1) data, based on the results in Sims, Stock, and Watson (1990), and is illustrated by a number of examples.
Chapter 7 discusses testing for co-integration. A range of tests is considered, based on testing for a unit root in the residuals from the static regression. While widely used, such tests have drawbacks, and Monte Carlo experiments are used to illustrate some of these. Tests based on single-equation dynamic models are also considered.
Finally, in Chapter 8, co-integration in systems of equations is analysed. Linear co-integrated systems are expressed in error-correction form and maximum likelihood estimation and inference for co-integrating vectors are discussed, focusing on the approach proposed by Johansen (1988). A range of extensions is considered, as are various other estimators. The analysis is again illustrated by a number of examples and simulation experiments.

Appendix
Equation (11)
To prove (11), we need to construct a random variable $X_t$, where

If

then, by the Liapunov central limit theorem,


The proof of (11) is in three steps. First, consider (from (6)) the
sample mean:

and

Thus $X_t \sim \mathrm{ID}(0, \sigma_t^2)$, as required. Further, noting that


and using normality of $\varepsilon_t$,

and all the conditions of the Liapunov theorem are satisfied. Therefore,


Finally, using the results above, and noting that $\bar{y} = T\bar{X}$,

Since $\bar{y}/\sqrt{T} \Rightarrow \int_0^1 W(r)\,dr$ from results above, we have that $\bar{y}/\sqrt{T}$
converges to both $\int_0^1 W(r)\,dr$ and to $N(0, 1/3)$. Therefore

The derivations of later results follow similar lines.
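The limiting variance of 1/3 can be checked numerically. The sketch below assumes, as a setup not restated in this appendix, that $y_t$ is a driftless random walk $y_t = \varepsilon_1 + \dots + \varepsilon_t$ with $y_0 = 0$ and independent unit-variance shocks. Under that assumption the sample mean is a weighted sum of the shocks with weights $(T - s + 1)/T$, so the variance of $\bar{y}/\sqrt{T}$ is available in closed form. The function name is hypothetical.

```python
def var_scaled_mean(T):
    """Exact variance of ybar / sqrt(T) for a unit-variance random walk.

    ybar = (1/T) * sum_s (T - s + 1) * eps_s, so
    Var(ybar / sqrt(T)) = (1 / T**3) * sum_{j=1}^{T} j**2.
    """
    return sum(j * j for j in range(1, T + 1)) / T**3


# The variance approaches 1/3 as the sample size grows.
for T in (10, 100, 1000, 10000):
    print(T, var_scaled_mean(T))
```

The closed form is $(T+1)(2T+1)/(6T^2)$, which makes the rate of approach to 1/3 explicit.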

Linear Transformations, Error Correction, and the Long Run in
Dynamic Regression
We begin by considering the properties of linear autoregressive-distributed lag (ADL) models for stationary data processes. Transformations of the ADL model to error correction and to various
other forms are described. We discuss the estimation of long-run
multipliers from dynamic models, and the equivalence of the
estimates of these multipliers (and their variances) from any of
several different forms. Finally, we consider inference about long-run multipliers where expectational variables are present, and the
potential problems are shown to be special cases of the general
invalidity of inference when the regressors are not weakly exogenous for parameters of interest.
In later chapters, we will concentrate on the importance of integrated
processes for econometric modelling, and in particular on the detection
of the stochastic trends embodied in integrated processes, on identifying
series that share stochastic trends and therefore satisfy long-run equilibrium relations, and on the implications of such properties for the
estimation of economic relationships. Before beginning to explore these
concepts, however, there are a number of aspects of the use and
specification of dynamic econometric models which can be reviewed
without a thorough knowledge of integrated processes, and which will
be useful in later discussion. The calculation of the parameters of
long-run relationships from estimated models, the interpretation of
linear transformations, and the forms of particular models such as the
error-correction model are among these topics. The variables used in
this chapter may all be treated as being stationary, but readers who are
familiar with the concepts examined in later chapters will recognize that
the same results apply if the variables are co-integrated.
One simple but fundamental problem that we address is the following:
given a variable which in general depends upon its own past and on the
values of various exogenous variables, how can we determine the
long-run equilibrium relationship between the endogenous variable and
the exogenous variables? If an endogenous variable $y_t$ is expressed as a
function only of the value of a set of exogenous variables $z_t$ at the same
point in time, the effect of $z_t$ on $y_t$ is immediate and complete;
however, if a lag distribution applies to every variable in the model, the
long-run effect must be derived as a function of all the lag distributions.
Moreover, there are other types of information that can be revealed by
a dynamic equation; any of a number of equivalent forms will provide
the same information about, say, short-run and long-run adjustment, but
different forms of the equation will reveal different types of information
conveniently.
We will consider a number of ways in which to estimate long-run
multipliers from dynamic regression models, and in doing so will
examine several different types of model. After describing the general
autoregressive-distributed lag (ADL) model from which the other
models are derived, we first concentrate upon the error-correction
model, in which the terms representing the extent of deviation from
equilibrium are explicitly present in the estimated equation, and which
therefore immediately displays information about the adjustment that a
process makes to a deviation from some long-run equilibrium.
This chapter will emphasize two important points about linear
transformations. First, each of the transformations contains precisely the
same information: the estimated values of long-run multipliers, hypothesis test statistics, and explanatory powers of the differently transformed models are all identical. The choice of transformation can be
made purely on the basis of convenience, and we will consider which
ones are convenient for different purposes. The second point is a
corollary of the first, but is worth emphasizing: the estimates of
short-run adjustment parameters from the error-correction model do not
depend upon the parameter $\theta$, used in defining the error-correction
term $y_{t-i} - \theta z_{t-i}$, as long as other levels terms are present to allow for
adjustment to the chosen parameter. In particular, a value of unity for $\theta$
may be chosen, leading to what is called 'homogeneity' (an error-correction term of $y_{t-i} - z_{t-i}$), as long as the necessary extra terms are
present.
Next, we consider several other transformations of the autoregressive-distributed lag model, due to Bewley (1979) (and discussed by Wickens
and Breusch 1988) and Bardsen (1989). Each of these transformations
can be related to the error-correction transformation, and we indicate
some of the implications of this fact for estimation using one or other of
the transformations. Finally, we will discuss some potential difficulties in
the estimation of long-run equilibrium relations and their interpretation,
following McCallum (1984), Kelly (1985), and Hendry and Neale (1988).
While this chapter deals explicitly with stationary (I(0)) processes,
many of the models considered can be used with co-integrated processes
as well, as explored in Chapters 5 and 6. In particular, the equivalence
of these transformations (in the sense that each form can be derived
from any other by operating linearly on the variables) is relevant when
dealing with the Granger Representation Theorem, also discussed in
Chapter 5. This equivalence has implications for derivations of the
distributions of coefficient estimates in co-integrated systems. In a
particular transformation, for example, the variables may all be integrated of order zero, so that the asymptotic theory of stationary
processes applies to the distributions of the estimates. Such a parameterization might be convenient for inference, because its information
content is identical to that of the original parameterization, if for
example that form contained both I(1) and I(0) variables. These issues
are considered at length in Chapter 6, and the analysis in this chapter
provides useful background for that discussion.

2.1. Transformations of a Simple Model


Before beginning a general treatment, we consider the first-order linear
autoregressive-distributed lag model, denoted ADL(1,1), as an example
and derive several linear transformations of it. Each transformation is
equivalent in the sense that each implies the same relationship between
exogenous and endogenous variables. The ADL(1,1) is
where $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$ and $|\alpha_1| < 1$ (see Hendry, Pagan, and Sargan
1984).
First consider a static equilibrium defined, as above, as an environment in which all change has ceased, recalling that we are treating
$(y_t, x_t)$ as jointly stationary. The long-run values are given by the
unconditional expectations of the form $E(y_t)$ in (1a). Defining
$y^* = E(y_t)$ and $x^* = E(x_t)\ \forall t$, we have, since $E(\varepsilon_t) = 0$,
and hence

or

Then $k_1$ is the long-run multiplier of $y$ with respect to $x$.
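The algebra behind the long-run multiplier can be sketched as follows. The sketch assumes that the ADL(1,1) takes the form $y_t = \alpha_0 + \alpha_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t$; the intercept $\alpha_0$ and the label $k_0$ are assumptions made for this illustration, suggested by the subscript on $k_1$. Taking expectations and using $E(\varepsilon_t) = 0$:

```latex
\begin{align*}
y^* &= \alpha_0 + \alpha_1 y^* + \beta_0 x^* + \beta_1 x^* \\
(1 - \alpha_1)\, y^* &= \alpha_0 + (\beta_0 + \beta_1)\, x^* \\
y^* &= k_0 + k_1 x^*, \qquad
k_0 = \frac{\alpha_0}{1 - \alpha_1}, \quad
k_1 = \frac{\beta_0 + \beta_1}{1 - \alpha_1}.
\end{align*}
```

The condition $|\alpha_1| < 1$ ensures that the division is well defined. With illustrative values $\alpha_1 = 0.5$, $\beta_0 = 0.3$, and $\beta_1 = 0.2$, for instance, the multiplier would be $k_1 = 0.5/0.5 = 1$.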


Now subtract $y_{t-1}$ from both sides of (1a) and then add and subtract
$\beta_0 x_{t-1}$ on the right-hand side to get¹
¹ Equation (1a) is invariant to such linear transformations which preserve the error
process $\{\varepsilon_t\}$.

Alternatively, we could have added and subtracted $(\beta_0 + \beta_1)x_{t-1}$ on the
right side, to get
All of these equations imply the same relationship, because any one
can be derived from another without violating the equality. In equations
(1c) and (1d), however, terms representing the discrepancy between
$y_{t-1}$ and $x_{t-1}$ or between $y_{t-1}$ and $k_1 x_{t-1}$ appear explicitly; the
coefficient (the same for each form) on these terms can be taken as a
measure of the speed of adjustment of $y$ to a discrepancy between $y$
and $x$ in the previous period. We examine such error-correction models
in detail in the next section.
Equation (1b) is similar to (1c) and (1d) in that the same information
appears explicitly as a coefficient; that is, $(\alpha_1 - 1)$ represents the
short-run adjustment to a 'discrepancy', and this coefficient can be read
directly from any of the three. Equation (1b) will be seen to be a special
case of (5) below, just as (1c) is a special case of (3) below.
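A small numerical sketch may help to fix ideas. It assumes an ADL(1,1) with an intercept and compares it with the differenced form obtained by subtracting $y_{t-1}$ and adding and subtracting $\beta_0 x_{t-1}$, that is, a regression of $\Delta y_t$ on a constant, $y_{t-1}$, $\Delta x_t$, and $x_{t-1}$. The helper `fit`, the simulated data, and all parameter values are hypothetical and serve only as illustration.

```python
import random


def fit(Z, W, y):
    """Solve Z'(y - W b) = 0 for b; this is OLS when Z is W."""
    k = len(W[0])
    M = [[sum(z[i] * w[j] for z, w in zip(Z, W)) for j in range(k)]
         + [sum(z[i] * v for z, v in zip(Z, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k] for row in M]


# Simulated stationary data (illustrative parameter values only).
random.seed(1)
x, y = [0.0], [0.0]
for _ in range(300):
    x.append(0.7 * x[-1] + random.gauss(0, 1))
    y.append(1.0 + 0.5 * y[-1] + 0.3 * x[-1] + 0.2 * x[-2] + random.gauss(0, 1))

t = range(1, len(y))
# ADL(1,1): y_t on 1, y_{t-1}, x_t, x_{t-1}
X_adl = [[1.0, y[i - 1], x[i], x[i - 1]] for i in t]
adl = fit(X_adl, X_adl, [y[i] for i in t])
# Differenced form: dy_t on 1, y_{t-1}, dx_t, x_{t-1}
X_dif = [[1.0, y[i - 1], x[i] - x[i - 1], x[i - 1]] for i in t]
dif = fit(X_dif, X_dif, [y[i] - y[i - 1] for i in t])

k1_adl = (adl[2] + adl[3]) / (1.0 - adl[1])  # (b0 + b1) / (1 - a1)
k1_dif = -dif[3] / dif[1]                    # ratio of the two levels coefficients
print(adl[1] - 1.0, dif[1])  # adjustment coefficient from each form
print(k1_adl, k1_dif)        # long-run multiplier from each form
```

Because the second regression only recombines the columns of the first, the two printed pairs agree up to rounding error, whatever the sample.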
Finally, let us return to (1a) and take a different route. Subtracting
$\alpha_1 y_t$ from both sides, we have
Defining $\lambda = (1 - \alpha_1)^{-1}$ and adding and subtracting $\beta_1 x_t$, we have
This is again a special case of one of the general forms of transformed
ADL models given below (equation (4)). This form, following Bewley
(1979), conveniently reveals the long-run equilibrium multiplier as the
coefficient of $x_t$ in (1e) since $\lambda(\beta_0 + \beta_1) = k_1$. However, because a
contemporaneous value of the dependent variable appears on the right
side of the equation, ordinary least-squares estimates are not consistent;
consistent estimation can be carried out using instrumental variables.
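The role of instrumental variables here can be illustrated numerically. The sketch assumes an ADL(1,1) with an intercept, so that the steps just described lead to a regression of $y_t$ on a constant, $\Delta y_t$, $x_t$, and $\Delta x_t$, with the long-run multiplier as the coefficient on $x_t$. The ADL regressors serve as instruments. The helper `fit`, the simulated data, and the parameter values are hypothetical.

```python
import random


def fit(Z, W, y):
    """Solve Z'(y - W b) = 0 for b: IV with instruments Z, OLS when Z is W."""
    k = len(W[0])
    M = [[sum(z[i] * w[j] for z, w in zip(Z, W)) for j in range(k)]
         + [sum(z[i] * v for z, v in zip(Z, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k] for row in M]


# Simulated stationary data (illustrative parameter values only).
random.seed(1)
x, y = [0.0], [0.0]
for _ in range(300):
    x.append(0.7 * x[-1] + random.gauss(0, 1))
    y.append(1.0 + 0.5 * y[-1] + 0.3 * x[-1] + 0.2 * x[-2] + random.gauss(0, 1))

t = range(1, len(y))
yy = [y[i] for i in t]
Z = [[1.0, y[i - 1], x[i], x[i - 1]] for i in t]  # ADL regressors = instruments
adl = fit(Z, Z, yy)
k1_adl = (adl[2] + adl[3]) / (1.0 - adl[1])

# Bewley-type regressors: 1, dy_t, x_t, dx_t (dy_t is correlated with the error)
W = [[1.0, y[i] - y[i - 1], x[i], x[i] - x[i - 1]] for i in t]
bew_iv = fit(Z, W, yy)    # IV with the ADL regressors as instruments
bew_ols = fit(W, W, yy)   # OLS on the same equation, for comparison

print(k1_adl, bew_iv[2], bew_ols[2])
```

The IV coefficient on $x_t$ reproduces the multiplier computed from the ADL estimates, while the OLS coefficient on the same regressor will in general differ.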
Next, consider a data-generation process having the form of a general
autoregressive-distributed lag model (Hendry et al. 1984). An
ADL(m, n) model with a constant and p exogenous variables, which we
will also write as ADL(m, n; p), is given by²
² We use the same n for each of the p exogenous variables without loss of generality,
because any $\beta_{ji}$ may be set equal to zero, so that n is simply the maximum, rather than
uniform, lag length of the $x_{jt}$.

where $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$. We might also write this, using the lag operator

where $\alpha(L) = 1 - \sum_{i=1}^{m} \alpha_i L^i$ and $\beta_j(L) = \sum_{i=0}^{n} \beta_{ji} L^i$. As before, there
are a number of possible transformations of this equation which,
because they do not add or remove any linearly independent columns
from the data matrix, are equivalent projections of the dependent
variable on to the data. Given joint stationarity, the long-run solution of
(2) is

where $\alpha(1)$ and $\beta_j(1)$ represent the substitution of unity for the lag
operator L in the lag polynomials.
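Evaluating the lag polynomials at unity amounts to summing their coefficients, as the following sketch shows. It assumes that the model carries an intercept, here called `alpha0`; the function name, argument layout, and numerical values are hypothetical.

```python
def long_run_solution(alpha0, alphas, betas):
    """Return (intercept, multipliers) of the static solution of an ADL model.

    alphas: [alpha_1, ..., alpha_m], the coefficients on lagged y.
    betas:  one list [beta_j0, ..., beta_jn] per exogenous variable x_j.
    """
    alpha_at_1 = 1.0 - sum(alphas)  # alpha(1): set L = 1 in alpha(L)
    if alpha_at_1 == 0.0:
        raise ValueError("alpha(1) = 0: no static long-run solution")
    # beta_j(1) / alpha(1) for each exogenous variable
    multipliers = [sum(b) / alpha_at_1 for b in betas]
    return alpha0 / alpha_at_1, multipliers


# An ADL(2, 2; 2) with illustrative coefficients; unused lags are zeros.
intercept, theta = long_run_solution(
    1.0, [0.5, 0.2], [[0.3, 0.2, 0.1], [0.0, 0.6, 0.0]]
)
print(intercept, theta)
```

Padding shorter lag distributions with zeros reflects the convention that n is the maximum, rather than uniform, lag length.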

2.2. The Error-correction Model


The first of the general forms that we examine is the error-correction
model. Error-correction terms were used by Sargan (1964), Hendry and
Anderson (1977), and Davidson et al. (1978) as a way of capturing
adjustments in a dependent variable which depended not on the level of
some explanatory variable, but on the extent to which an explanatory
variable deviated from an equilibrium relationship with the dependent
variable. When the equilibrium relationship is of the form $y^* = \theta x^*$,
then an error-correction term is one such as $(y_t - \theta x_t)$, if the parameter
in the equilibrium relationship is presumed known, or $(y_t - \hat{\theta} x_t)$ if it is
estimated. However, even $(y_t - x_t)$ could be used, since the possibility
of a coefficient other than unity on $x_t$ can be captured through other
terms in the regression, as we will see below.
We can derive a generalized error-correction model (ECM), corresponding to the ADL(m, n; p) model with p exogenous variables $x_1$,
..., $x_p$, by steps similar to those used in the specific cases above. The
result, which allows us to specify directly a general dynamic regression
model in the form of an ECM, is (for $r \le m$)

with

and

By convention, in the case of any term for which summations begin
from r + 1, the term does not enter at all if the lower limit of the
summation exceeds the upper limit. For each of the 'error-correction'
terms $(y_{t-i} - \sum_{j=1}^{p} x_{j,t-i})$, one lagged term in $x_{jt}$ is present to break
'homogeneity': that is, to allow the error-correction term to take the
form $(y_{t-i} - \sum_{j=1}^{p} \theta_j x_{j,t-i})$, where $\theta_j$ is not equal to one. The $\theta_j$ are the
equilibrium multipliers given above: $\theta_j = \beta_j(1)/\alpha(1)$; and if the $\theta_j$ were
known, they could be inserted directly into the ECM terms in (3) and
the terms in lagged x could be eliminated.³ In terms of the parameters
of (3),

Since the ECM is simply a linear transformation of the ADL model,
we might ask what its distinguishing feature is. The answer is that in the
ECM formulation, parameters describing the extent of short-run adjustment to disequilibrium are immediately provided by the regression.
Although the form in (3) is analytically convenient, it is not a useful
empirical specification. In practice, a single error-correction term at lag
r is preferable, as it induces a more interpretable and more nearly
orthogonal parameterization.
³ Note that this requires
The error-correction mechanism will be of particular value where the
extent of an adjustment to a deviation from equilibrium is especially
interesting. It is clear that the ECM provides this information when the
error-correction terms are of the form $(y_{t-i} - \sum_{j=1}^{p} \theta_j x_{j,t-i})$, with $\theta_j$ a
known parameter. If $\theta_j$ is not known it can be estimated; moreover, an
unknown $\theta_j$ can implicitly be allowed for in the error-correction term
through the inclusion of extra lags in the $x_j$, without affecting the
magnitude of the estimated coefficients $\eta_i$ in (3). Hence these parameters do not need to be estimated at an earlier stage in order to allow
us to use the ECM. In fact, an important point in favour of the
generalized ECM (3) is that the estimated coefficients on the error-correction terms are unaffected by the incorporation of any constant $\theta$
into the term; this will be proved after we have established some other
results which will simplify the proof. The implication is that we can
interpret the coefficients $\eta_i$ in (3) directly as adjustments to disequilibrium even though the true disequilibrium term is given by
$(y_{t-i} - \sum_{j=1}^{p} \theta_j x_{j,t-i})$ and not by $(y_{t-i} - \sum_{j=1}^{p} x_{j,t-i})$. Hence the use of a
generalized ECM does not imply homogeneity ($\theta = 1$) as long as extra
lags in the $x_j$ are incorporated, even though the error-correction terms
that enter (3) do not explicitly allow for $\theta \ne 1$.
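The invariance of the adjustment coefficient to the value of $\theta$ imposed in the error-correction term can be checked in the simplest case. The sketch assumes a single regressor and a first-order model, regressing $\Delta y_t$ on a constant, $(y_{t-1} - \theta x_{t-1})$, $\Delta x_t$, and a free $x_{t-1}$ term. The helper `fit`, the function `ecm_adjustment`, the simulated data, and the trial values of $\theta$ are all hypothetical.

```python
import random


def fit(Z, W, y):
    """Solve Z'(y - W b) = 0 for b; this is OLS when Z is W."""
    k = len(W[0])
    M = [[sum(z[i] * w[j] for z, w in zip(Z, W)) for j in range(k)]
         + [sum(z[i] * v for z, v in zip(Z, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k] for row in M]


# Simulated stationary data (illustrative parameter values only).
random.seed(1)
x, y = [0.0], [0.0]
for _ in range(300):
    x.append(0.7 * x[-1] + random.gauss(0, 1))
    y.append(1.0 + 0.5 * y[-1] + 0.3 * x[-1] + 0.2 * x[-2] + random.gauss(0, 1))

t = range(1, len(y))
dy = [y[i] - y[i - 1] for i in t]


def ecm_adjustment(theta):
    """Coefficient on (y_{t-1} - theta * x_{t-1}) when x_{t-1} also enters freely."""
    X = [[1.0, y[i - 1] - theta * x[i - 1], x[i] - x[i - 1], x[i - 1]] for i in t]
    return fit(X, X, dy)[1]


# theta = 1 is the homogeneous case; the other values are arbitrary.
adjustments = [ecm_adjustment(th) for th in (0.0, 1.0, 2.5)]
print(adjustments)
```

The three estimates coincide up to rounding error; only the coefficient on the free $x_{t-1}$ term changes as $\theta$ varies.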

2.3. An Example
An example of the use of the error-correction mechanism can be found
in Davidson et al. (1978), who use a homogeneous ($\theta = 1$) error-correction mechanism in the modelling of consumers' expenditure. The 'error'
to which adjustment is made in the model is the difference between the
logarithms of consumption and income, each lagged four quarters. The
error-correction term is significant in a wide variety of specifications. In
particular, using quarterly seasonally unadjusted data from the United
Kingdom, expressed at constant prices over the sample period of
1958(I)-1970(IV), the authors favour the model⁴ (standard errors in
parentheses):

where the statistics $z_1$ and $z_2$ are asymptotic $\chi^2$ tests for parameter
constancy and serially independent residuals, respectively, with degrees
of freedom in parentheses; $\hat{C}_t$ is the fitted value of real consumers'
expenditure on non-durable goods and services $C_t$; $Y_t$ is real personal
disposable income; $P_t$ is the price deflator for consumption; and D is a
dummy variable for changes in taxation. The error-correction term has a
coefficient that is reasonably substantial as well as statistically significant
at conventional levels. The model can readily be derived from an ADL
model, noting that $\log(C/Y)_{t-4} = \log C_{t-4} - \log Y_{t-4} = c_{t-4} - y_{t-4}$,
using lower-case letters to denote logarithms.
⁴ The symbol $\Delta_1\Delta_4$ represents the first difference of the fourth difference; e.g.
$\Delta_4 \log Y_t - \Delta_4 \log Y_{t-1} = \Delta_1\Delta_4 \log Y_t$.
On the additional assumption that $\Delta_4 c_t$, $\Delta_4 y_t$, and $\Delta_4 p_t$ are stationary, with $E(\Delta_4 c_t) = g_c$, $E(\Delta_4 y_t) = g_y$, and $E(\Delta_4 p_t) = p_a$ (the annual
rate of inflation), then, taking expectations of the equation above for
fixed values of the estimated parameters,
Hence $C^* = kY^*$ where $k = \exp(-5.3g_y - 1.3p_a)$, noting that $g_c = g_y$
given the proportional long-run solution. This form of solution is
consistent with the life-cycle hypothesis (see Deaton and Muellbauer
1980), in which case the coefficients of $g_y$ and $p_a$ should correspond to
the negatives of the annual wealth-income and liquid asset-income
ratios. The resulting values seem sensible.
For positive real growth or inflation, $k < 1$, and k falls as $g_y$ or $p_a$
rises. Representative values of $g_y$ and $p_a$ are 0.025 and 0.05 respectively. These imply a value of k of $e^{-0.2} = 0.82$, and therefore a
(savings + durable expenditure) to income ratio of (1 - 0.82) or 18%.
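The arithmetic can be reproduced directly from the formula for k. In the sketch below the function name is hypothetical, and the exponent is evaluated exactly (-0.1975) rather than rounded to -0.2 as in the text.

```python
import math


def consumption_income_ratio(g_y, p_a):
    """Long-run C*/Y* implied by k = exp(-5.3 * g_y - 1.3 * p_a)."""
    return math.exp(-5.3 * g_y - 1.3 * p_a)


k = consumption_income_ratio(0.025, 0.05)
print(round(k, 2), round(1.0 - k, 2))

# k falls as either real growth or inflation rises.
print(consumption_income_ratio(0.03, 0.05) < k)
print(consumption_income_ratio(0.025, 0.10) < k)
```

To two decimal places the unrounded exponent gives the same ratio as the rounded one.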
This model has an additional interpretation which can often be given
to an error-correction term. The coefficient of $-0.10$ on $\log(C/Y)_{t-4}$
suggests, first of all, that the greater is the excess of income over
consumption (in logarithms) for the corresponding quarter one year ago,
the higher is consumption now. That is, as income exceeds consumption
by more, it becomes optimal to raise consumption in the future. The
'error' which is partially corrected is this discrepancy; consumers may
consume unusually much (or little) at some point in time, but will then
tend to consume relatively less (or more) at some point in the future.
This is implied simply by the negative sign of the effect; in addition, we
have an estimate of its magnitude, and it is apparent that the effect is
substantial. Moreover, by adjusting expenditure in this way, consumption and income are tied together in the long run, despite the growth
over time in the level of each. Consequently the model formulation
actually entails the co-integration of consumption and income, given that
these series are individually integrated: see Chapter 5. For a recent
update, see Hendry, Muellbauer, and Murphy (1990).

2.4. Bardsen and Bewley Transformations


Two other transformations of the ADL(m, n; p) are those of Bewley
(1979) and Bardsen (1989). The Bewley transformation has the form

where $\lambda$ is defined in (6) below. Note that in (4), as in (2) and in (5)
below, there are $k = 1 + m + [p(n + 1)]$ coefficients to be estimated.
The transformation treated by Bardsen can be seen as a variant of an
error-correction mechanism, and may be written as

where the coefficients are related to those in (2) and (4) by

Finding the long-run multiplier implied by any one of these forms
(ADL, ECM, Bewley, Bardsen) is quite straightforward. Define $\theta_j$ as
the long-run effect of a change in $x_j$ on y. Recall that in an equilibrium
state there are no stochastic shocks and values of all variables are
therefore constant, so that, writing $y^* = E(y_t)$ and $x_j^* = E(x_{jt})$,

Then, corresponding to the ADL model (2), we have

where $\lambda = (1 - \sum_{i=1}^{m} \alpha_i)^{-1} = [\alpha(1)]^{-1}$. It is important to note that this
formula is applicable only where $|\sum_{i=1}^{m} \alpha_i|$ is strictly less than 1.
Otherwise, no long-run equilibrium can be said to exist between y and
x, as these quantities may diverge increasingly as $t \to \infty$. In particular,
the unconditional expectations are not well defined.
Corresponding to the Bewley transform (4), we also have

and this can be read directly from the estimated regression (4) as the
coefficient on $x_{jt}$. Finally, using the Bardsen transformation (5), we have

(7)

This expression can be computed from Bardsen's regression (5) simply
by dividing the coefficient on $x_{j,t-n}$ by the negative of that on $y_{t-m}$.
Each of these transformations leads to numerically identical estimates
of the long-run multiplier (the same estimated equilibrium relationship)
if (2) and (5) are estimated by ordinary least squares (OLS) and (4) by
instrumental variables (IV), using the regressors from the ADL model
(2) as instruments. The necessity of IV for consistent estimation of (4)
stems from the presence of contemporaneous terms in the dependent
variable $y_t$ on the right-hand side of (4), rendering the error term
correlated with those explanatory variables.
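The numerical identity between the ADL and Bardsen estimates can be illustrated for m = n = 2 and p = 1. The sketch assumes that the Bardsen regression for this case has $\Delta y_t$ as dependent variable and a constant, $\Delta y_{t-1}$, $\Delta x_t$, $\Delta x_{t-1}$, $y_{t-2}$, and $x_{t-2}$ as regressors. The helper `fit`, the simulated data, and the parameter values are hypothetical.

```python
import random


def fit(Z, W, y):
    """Solve Z'(y - W b) = 0 for b; this is OLS when Z is W."""
    k = len(W[0])
    M = [[sum(z[i] * w[j] for z, w in zip(Z, W)) for j in range(k)]
         + [sum(z[i] * v for z, v in zip(Z, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k] for row in M]


# Simulated stationary ADL(2, 2; 1) data (illustrative parameter values only).
random.seed(2)
x, y = [0.0, 0.0], [0.0, 0.0]
for _ in range(300):
    x.append(0.6 * x[-1] + random.gauss(0, 1))
    y.append(0.5 + 0.5 * y[-1] + 0.2 * y[-2]
             + 0.4 * x[-1] + 0.1 * x[-2] + 0.3 * x[-3] + random.gauss(0, 1))

t = range(2, len(y))
# ADL: y_t on 1, y_{t-1}, y_{t-2}, x_t, x_{t-1}, x_{t-2}
X_adl = [[1.0, y[i - 1], y[i - 2], x[i], x[i - 1], x[i - 2]] for i in t]
adl = fit(X_adl, X_adl, [y[i] for i in t])
theta_adl = (adl[3] + adl[4] + adl[5]) / (1.0 - adl[1] - adl[2])

# Bardsen: dy_t on 1, dy_{t-1}, dx_t, dx_{t-1}, y_{t-2}, x_{t-2}
X_bar = [[1.0, y[i - 1] - y[i - 2], x[i] - x[i - 1], x[i - 1] - x[i - 2],
          y[i - 2], x[i - 2]] for i in t]
bar = fit(X_bar, X_bar, [y[i] - y[i - 1] for i in t])
theta_bar = bar[5] / -bar[4]  # coefficient on x_{t-2} over minus that on y_{t-2}

print(theta_adl, theta_bar)
```

Both calculations return the same estimated multiplier up to rounding error, since the Bardsen regressors are a non-singular recombination of the ADL regressors.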
Finally, it is worth pointing out the sense in which the Bardsen
transform is an error-correction form. The coefficients $\alpha_i^*$ are sums of
the terms $\eta_i$ in the ECM representing adjustment to disequilibrium (as
shown following (5) above) and the $\alpha_i^*$ may therefore be thought of as
cumulative adjustments: $\alpha_i^*$ represents the sum of the effects of error-correction terms 1, ..., i. For some purposes these cumulative adjustments are of particular interest, in which case the Bardsen form will be
especially convenient.

2.5. Equivalence of Estimates from Different Transformations
Wickens and Breusch (1988) show that the Bewley transformation
(estimated by IV with ADL regressors as instruments) yields precisely
the same estimates of the long-run multipliers $\theta_j$ as does the untransformed ADL (2) estimated by OLS. The same is true of the Bardsen
transform (5) estimated by OLS, and of the general error-correction
mechanism (again estimated by OLS), as Banerjee, Galbraith, and
Dolado (1990b) show. In demonstrating these points we will make use
of the general structure that Wickens and Breusch use to compare linear
transformations of regression models.
Take as a basic structure the regression model
where the X matrix contains lagged (but not contemporaneous) y as
well as contemporaneous and lagged x terms, and $\gamma$ is a $k \times 1$ vector.
Define this as corresponding to the ADL model (2). The representations
(4) and (5) involve transforming the matrices y and X by a transformation matrix A, such that, following Wickens and Breusch,

so that
For example, take m = n = 2 and p = 1 in (2) so that the matrix of the
transformation to the Bardsen form (5) is

$$
A_a = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 1
\end{bmatrix} \qquad (10a)
$$

since $x_t' = [y_t, 1, y_{t-1}, y_{t-2}, x_t, x_{t-1}, x_{t-2}]$ maps onto $\tilde{x}_t' = [\Delta y_t, 1, \Delta y_{t-1},
\Delta x_t, \Delta x_{t-1}, y_{t-2}, x_{t-2}]$ in (5). For the Bewley transformation (4) and
the same case (m = n = 2, p = 1), the transformation matrix is

(10b)

since

First, let us summarize the relation between the error processes u and
$\varepsilon$. Begin by partitioning the general matrix A (which may be $A_a$, $A_b$, or
another transformation) to be conformable with [y : X]:

When there are k regressors, the elements of the partition have the
following dimensions: $a_{11}$, $1 \times 1$; $a_{12}$, $1 \times k$; $a_{21}$, $k \times 1$; $A_{22}$, $k \times k$.
When there are no contemporaneous y variables on the right-hand
side of the transformed model $a_{12} = 0$, and presuming that A is of full
rank, then the two sets of errors are identical: $u = \varepsilon$. If $a_{12} \ne 0$, as in the
Bewley transformation, then a contemporaneous $y_t$ (multiplied by a
scalar) has been added to the right-hand side of the equation, and
therefore to the left to preserve the equality. For estimation, the
equation must then be renormalized so that we have only $y_t$ (unscaled)
on the left, and all elements must be multiplied by the normalization
factor. This normalization will have to be accounted for later to convert
back to the original parameters.
Transformation by A requires renormalizing by dividing the entire
equation by the factor $(a_{11} - a_{12}\delta) \ne 0$ to deliver (8) from (9a) and
(9b), so the errors of the new process are given by
where a is the normalization constant. For example, if we begin with
(where [ ] does not depend on $y_{t-i}\ \forall i$) and transform to
we do so by subtracting $\beta_1 y_t$ from each side and dividing the entire
equation by $(1 - \beta_1)$, so that $\delta_1 = -\beta_1/(1 - \beta_1)$, $\delta_2 = \beta_2/(1 - \beta_1)$, and
$u_t = \varepsilon_t/(1 - \beta_1)$. The parameters satisfy the general formula
(12)

with $a = (1 - \beta_1)$, $\delta = [-\beta_1/(1 - \beta_1) : \beta_2/(1 - \beta_1)]'$, $\gamma = [\beta_1 : \beta_2]'$, and

$$
A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$
Now consider the relationship between estimates of long-run multipliers in transformed and untransformed models, starting with the
Bardsen transformation which can be estimated consistently by OLS.
Different calculations must be performed on the two sets of regression
estimates to get long-run multipliers; if we estimate (2) we must perform
the calculation (6), and if we use (5) we must perform the division (7).
We want to show that the actual estimates that we get will be
numerically identical whichever method we use. We know from the
definition of the OLS estimator of the transformed model ((9b) or,
explicitly, (5)) that

as is easily verified by substituting the formula for the OLS estimator of
$\delta$, $\hat{\delta} = (\tilde{X}'\tilde{X})^{-1}\tilde{X}'\tilde{y}$. By definition of the OLS estimator in (8) (corresponding to (2)),

again, easily checked using the formula $\hat{\gamma} = (X'X)^{-1}X'y$.
Now in the case of the Bardsen transform, $a_{12} = 0$ and A has the form

$$
A = \begin{bmatrix} 1 & 0 \cdots 0 \\ a_{21} & A_{22} \end{bmatrix}
$$

so that (13) and $(\tilde{y} : \tilde{X}) = (\Delta y : XA_{22})$ together imply


so

(15)

since $A_{22}$ is of full rank. From (14) and (15), we can deduce that⁵
(16)

where the equality follows from the facts that the matrix $X'(y : X) = F$
has dimension $k \times (k + 1)$ and rank k, where again $k = 1 + m +
[p(n + 1)]$, and that F can be partitioned as $[F_1 : F_2]$ with $F_1$ having
dimension $k \times 1$ and $F_2$ having dimension $k \times k$ and rank k.
So the same relationship holds between estimated parameters $\hat{\gamma}$ and $\hat{\delta}$
as between the true parameters $\gamma$ and $\delta$ (i.e. (16) has the same form as
(12)). What this means is that the estimates of, say, multipliers will be
the same whichever transformation is used. To make this last step in the
argument, let $\Psi_A$ be a quantity calculated from the true parameters of
model A, and $\Psi_B$ be the same quantity calculated from the true
parameters of the transformed model B. Clearly, $\Psi_A = \Psi_B$ since the
calculation is adapted to produce the same underlying quantity in each
case. Let the functions describing the calculations be $f[1 : -\gamma']$ and $g[1 : -\delta']$
respectively. Then, by (12),

⁵ The normalizing constant a does not appear here because it happens that $u = \varepsilon$ in (8)
($a_{12} = 0$) for the case of the Bardsen transform, so that a = 1. For the Bewley transform,
there is a non-zero normalizing constant.

But since this holds for any $\gamma$, it must hold for $\hat{\gamma}$, and so

the second equality following from (16). This implies that $\hat{\Psi}_A = \hat{\Psi}_B$: we
get the same estimated quantity from either model, using the appropriate transformation matrix.
One can therefore obtain the coefficients of either the transformed or
untransformed model from the other, using the original transformation
matrix A. For example, in calculating $\theta_j$ from the ADL, we use (6); (6)
is one part of the transformation A applied to the original parameters $\gamma$.
If the parameters of the Bardsen transform are $\delta$, then the $\theta_j$ are ratios
of elements of $\delta$ as in (7). Calculating $\theta_j$ from the ADL amounts to
using ratios of sums of selected elements of the vector $A^{-1}[1 : -\hat{\gamma}']'$ in the
calculation; by (16), this formula yields the corresponding elements of
$[1 : -\hat{\delta}']'$, and so these results are precisely the same as those obtained from
the Bardsen transformation and (7). The same holds true for any linear
transformation where A is of full rank ($A^{-1}$ is non-singular).
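The general point can be checked for an arbitrary full-rank transformation with $a_{12} = 0$. The sketch assumes the normalization $a_{11} = 1$, so that $\tilde{y} = y + Xa_{21}$ and $\tilde{X} = XA_{22}$; under that assumption the two sets of OLS estimates should satisfy $A_{22}\hat{\delta} = \hat{\gamma} + a_{21}$, which is the statement $[1 : -\hat{\gamma}']' = A[1 : -\hat{\delta}']'$ written out row by row. The helper `fit`, the randomly drawn transformation, and the simulated data are hypothetical.

```python
import random


def fit(Z, W, y):
    """Solve Z'(y - W b) = 0 for b; this is OLS when Z is W."""
    k = len(W[0])
    M = [[sum(z[i] * w[j] for z, w in zip(Z, W)) for j in range(k)]
         + [sum(z[i] * v for z, v in zip(Z, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[k] for row in M]


# Illustrative regression data: n observations, k regressors.
random.seed(3)
n, k = 200, 4
X = [[1.0] + [random.gauss(0, 1) for _ in range(k - 1)] for _ in range(n)]
gamma = [0.5, 1.0, -0.7, 0.2]
y = [sum(g * v for g, v in zip(gamma, row)) + random.gauss(0, 1) for row in X]

# Hypothetical transformation with a11 = 1 and a12 = 0.
# A22 is strictly diagonally dominant, hence of full rank.
a21 = [random.uniform(-1, 1) for _ in range(k)]
A22 = [[1.0 if i == j else random.uniform(-0.2, 0.2) for j in range(k)]
       for i in range(k)]
y_tilde = [v + sum(a * r for a, r in zip(a21, row)) for v, row in zip(y, X)]
X_tilde = [[sum(row[i] * A22[i][j] for i in range(k)) for j in range(k)]
           for row in X]

g_hat = fit(X, X, y)                      # untransformed model
d_hat = fit(X_tilde, X_tilde, y_tilde)    # transformed model

# Compare A22 * d_hat with g_hat + a21, element by element.
lhs = [sum(A22[i][j] * d_hat[j] for j in range(k)) for i in range(k)]
rhs = [g + a for g, a in zip(g_hat, a21)]
gap = max(abs(l - r) for l, r in zip(lhs, rhs))
print(gap)
```

The discrepancy is of the order of rounding error, so any quantity computed from one set of estimates can be recovered exactly from the other.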
In th e cas e o f transformation s fo r whic h a ^ ^ O , suc h a s Bewley's ,
OLS estimatio n i s inconsisten t becaus e o f th e correlatio n betwee n th e
error ter m an d th e contemporaneou s dependen t variable s o n th e right hand side . This bring s us to I V estimation .
Where the instrumental variables used in estimation are those of the untransformed ADL model, IV estimation of the Bewley transformation also yields estimates of the long-run multipliers identical to those from the ADL. In this case, analogously with (13), if we let $\delta_b$ represent the parameters of the model (4) and $A_b$ the matrix of that transformation, we have

where $\hat{\delta}_{b,IV}$ is the IV estimator of $\delta_b$; again, this formula is immediately verifiable by substituting6 $\hat{\delta}_{b,IV} = (X'X^{*})^{-1}X'y^{*}$. From (17),

We must then normalize by $a_b$ (defined to be the constant that normalizes the dependent variable's coefficient to unity, analogous to $a$ in (11)) before we compare this estimator with another which has been normalized to have the dependent variable enter with a coefficient of one, and we then obtain (Wickens and Breusch 1988), following steps similar to those above,

6 The IV estimator takes this form because the original Xs are being used as instruments in the transformed regression model involving y and X.
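The numerical identity between the IV estimate from a Bewley-type regression and the ADL long-run multiplier can be illustrated as follows. This is a sketch under assumptions: the data come from a hypothetical ADL(1,1) process, the Bewley form is written for that special case as a regression of y_t on (1, Δy_t, x_t, Δx_t), and the instruments `Z` are the ADL regressors.

```python
import numpy as np

# Hypothetical ADL(1,1) data: y_t = 0.5 y_{t-1} + 0.3 x_t + 0.1 x_{t-1} + e_t
rng = np.random.default_rng(1)
T = 400
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * x[t] + 0.1 * x[t - 1] + 0.5 * rng.normal()

yt, y1, xt, x1 = y[1:], y[:-1], x[1:], x[:-1]
ones = np.ones(T - 1)

# ADL by OLS; its regressors double as the instrument set Z.
Z = np.column_stack([ones, y1, xt, x1])
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ yt)
theta_adl = (gamma_hat[2] + gamma_hat[3]) / (1.0 - gamma_hat[1])

# Bewley-type form: y_t on (1, dy_t, x_t, dx_t); dy_t is endogenous.
X_star = np.column_stack([ones, yt - y1, xt, xt - x1])
delta_iv = np.linalg.solve(Z.T @ X_star, Z.T @ yt)   # (Z'X*)^{-1} Z'y
theta_bewley = delta_iv[2]                           # coefficient on x_t

# OLS on the same Bewley-type regression, shown only for contrast.
theta_ols = np.linalg.lstsq(X_star, yt, rcond=None)[0][2]

print(theta_adl, theta_bewley, theta_ols)
```

In this just-identified setting the IV coefficient on x_t reproduces the ADL ratio exactly, while the OLS coefficient from the same regression generally differs.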


Comparing (19) with (16), it is clear that once again the estimates from the transformed model can be related back to those from OLS on the ADL model, or to those from the other transformation $A_a$, through the known transformation matrices. Moreover, comparing (19) with (12), the same relation holds in estimated parameters as in the true parameters, so that estimates of functions of these parameters (such as the long-run multipliers) will be the same regardless of the model from which they are calculated. Here, using the Bewley transformation, the long-run multipliers $\theta_j$ appear directly in $\delta_b$; to calculate them from the ADL parameters, we would use

2.6. Homogeneity and the ECM as a Linear Transformation of the ADL
The results just established allow a straightforward proof of the earlier statement that, by incorporating lags of the levels of explanatory variables, the generalized ECM makes no implicit homogeneity assumptions. Consider the two regressions

and

where r = min(m, n). The difference between (20a) and (20b) lies in the fact that the $\theta_j$ in (20b) are set to unity in (20a). We will prove that the coefficients on the error-correction terms are none the less equal for all i and arbitrary $\theta_j$. The ADL model is


We will call the full parameter vectors from (20a), (20b), and (20c) a, b, and c respectively. Then, from our examination of general linear transformations above, redefining the particular transformation matrix $A_b$:

In the m = n = 2, p = 1 case, for example, $A_a$ and $A_b$ are equal to

[Matrices $A_a$ and $A_b$: entries not recoverable from the source.]

Since the first (min(m, n) + 3) rows remain unaffected by the new terms $\theta_j$, the first (min(m, n) + 3) entries in

are unaffected by the arbitrary constants $\theta_j$. Hence the first (min(m, n) + 2) elements of the parameter vectors a and b, which correspond to the error-correction terms, must be identical.
Thus the generalized ECM, using lagged terms in the exogenous variables to break homogeneity, produces precisely the same estimates of the responses to 'disequilibrium' whether or not the error-correction terms involve postulated values of long-run multipliers explicitly.
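A small numerical check of this invariance is sketched below, assuming a hypothetical single-lag ECM with one regressor; the function `ec_coefficient` and the values of `theta` tried are illustrative only.

```python
import numpy as np

def ec_coefficient(y, x, theta):
    """OLS coefficient on (y - theta*x)_{t-1} in the regression
    dy_t = c + b*dx_t + g*(y - theta*x)_{t-1} + d*x_{t-1} + e_t."""
    dy, dx = np.diff(y), np.diff(x)
    ec = (y - theta * x)[:-1]
    X = np.column_stack([np.ones(len(dy)), dx, ec, x[:-1]])
    return np.linalg.lstsq(X, dy, rcond=None)[0][2]

rng = np.random.default_rng(2)
T = 300
x = np.cumsum(rng.normal(size=T))          # hypothetical I(1) regressor
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + 0.2 * x[t] + 0.04 * x[t - 1] + rng.normal()

# theta = 1 imposes homogeneity in the error-correction term; the other
# values are arbitrary postulated long-run multipliers.
coefs = {theta: ec_coefficient(y, x, theta) for theta in (1.0, 0.8, 0.25, -3.0)}
print(coefs)
```

Because the lagged level of x enters freely, changing the postulated multiplier only reallocates weight between the two level terms, and the estimated response to disequilibrium is unchanged.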

2.7. Variances of Estimates of Long-Run Multipliers
We want to be able to compute not only the estimates of long-run multipliers, but also the variances or standard errors of these estimates. Since the long-run multipliers are calculated as ratios of coefficients or sums of coefficients, and since there is no general formula for the exact variance of a quotient of items with known variances, we must use an approximation to the variance of the quotient. In the case of the Bewley transformation, since the long-run multipliers appear as coefficients on the $x_{jt}$, we can read the variances on these estimated coefficients from the usual estimator of the variance-covariance matrix of IV coefficient estimates; this estimate implicitly embodies an approximation to the variance of the quotient, although it might appear to be an exact estimate.7 In fact, the different transformations yield equivalent results, in that the natural approximate estimator of the variances is the same for each.
For the Bewley transformation (4), since the $\theta_j$ are coefficients in the regression, we apply the formula for the covariance matrix of coefficients estimated in an instrumental variables regression. Using V to represent the estimated variance of the estimated parameter vector,

(23)
Wickens and Breusch (1988: 198) show that this is equal to the covariance matrix of the same parameter vector $\hat{\delta}_{b,IV}$, calculated (indirectly) by applying the transformation $A_b$ to the original parameters $\gamma$ and using the Jacobian of this transformation to approximate the estimated covariance matrix of $\hat{\delta}_{b,IV}$. That is,
(24)
and the covariance matrix so obtained can be reduced from this to the same expression as that given for the IV covariance matrix in (23).
Both the original ADL model and Bardsen's transformation involve a calculation of the $\theta_j$ as nonlinear functions of coefficients in the original regression. Following Bardsen, a standard formula for an approximation to the variance of a nonlinear function of elements with known variances can be used to compute $\mathrm{var}(\hat{\theta}_j)$. Let $f = f(a_1, a_2, \ldots, a_H)$; then $\hat{f} = f(\hat{a}_1, \hat{a}_2, \ldots, \hat{a}_H)$ and

$$\mathrm{var}(\hat{f}) \approx \sum_{g=1}^{H} \sum_{h=1}^{H} \frac{\partial f}{\partial a_g} \frac{\partial f}{\partial a_h} \, \mathrm{cov}(\hat{a}_g, \hat{a}_h). \qquad (25a)$$
In the case of the ADL, we have $f = \theta_j = \lambda \sum_i \beta_{ji}$, where $\lambda = (1 - \sum_i \alpha_i)^{-1}$, as implied by (6) above. The estimated variances and covariances of the parameter estimates are based on $\hat{\sigma}^2 (X'X)^{-1}$ from (8).

7 It might seem impossible that the Bewley transformation could involve a ratio, being a case of a linear transformation matrix $A_b$ applied to the original linear regression. Recall, however, that there is also a normalization factor applied, through which division by another linear function of the original coefficients is accomplished.
from (8) .
For the Bardsen transformation, (25a) takes a particularly simple form, since $\theta_j$ is calculated from the ratio of only two parameters. Hence if estimation is via (5), we have, after taking derivatives, that

recalling that the variances and the covariance of the two coefficient estimates involved are easily calculated from $\hat{\sigma}^2 (X'X)^{-1}$.
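An equivalent way of writing (25a) is to express it in the form of (24). For $f = f(a_1, a_2, \ldots, a_H)$, as above, let $V_a$ be the $H \times H$ covariance matrix of which the (g, h)th element is $\mathrm{cov}(\hat{a}_g, \hat{a}_h)$. Then, using the Jacobian of the transformation $f(\cdot)$, defined as

$$J = \frac{\partial f}{\partial a'} = \left( \frac{\partial f}{\partial a_1}, \ldots, \frac{\partial f}{\partial a_H} \right),$$

we have

$$\mathrm{var}(\hat{f}) \approx J V_a J'. \qquad (25b)$$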
Wickens and Breusch show that, substituting $\theta_j$ for $f$ and comparing the results of (23) and (24) with those of (25a) and (25b), the estimated variances of long-run multipliers calculated from the ADL model are the same as those provided in the IV estimator of the Bewley transform. We show now that the same is also true of the error-correction transformation, or any other linear transformation, using the form (25b). That is, the method of proof does not use the features of the ECM or of any other particular transformation, but instead applies to the estimates from any non-singular linear transformation. The important point, as above, is the equivalence of results yielded by different linear transformations of the model.
Consider the long-run multiplier vector as calculated from the ADL,

The ADL approximation to the variance of the multiplier is

where $J_f$ is the Jacobian of the transformation represented by the function $f(\cdot, \cdot)$. The long-run multiplier vector calculated from the ECM is

and the ECM approximation to the variance is

where $J_g$ corresponds to the function $g(\cdot, \cdot)$. We can prove that

$$J_f A_{22} = J_g \qquad (30)$$

for $J_f$ and $J_g$ representing general linear transformations, so that we now have for the variances, as well as the point estimates, the result that the estimates obtained do not depend on which transformation is used. That is, the ADL and ECM approximations to $\mathrm{var}(\hat{\theta})$ coincide if and only if $J_f A_{22} = J_g$, because, by (27) and the relation

it follows that the ADL approximation is $J_f A_{22} V A_{22}' J_f'$ and the ECM approximation is $J_g V J_g'$, where $V$ is the estimated covariance matrix of the ECM coefficients entering the calculation.

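To prove (30), define

We know that $f(a) = \theta = g(h)$. Now $J_f = \partial f(a)/\partial a'$ and $J_g = \partial g(h)/\partial h'$, while (34) states that $a = A_{22} h$. Hence

$$J_g = \frac{\partial g(h)}{\partial h'} = \frac{\partial f(a)}{\partial a'} \frac{\partial a}{\partial h'} = J_f A_{22}.$$

Rearranging yields (30) immediately.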


So, estimating the ECM or another linear transformation and transforming to get the variance of the long-run multiplier leads to the same result as obtained by transforming the ADL model. Wickens and Breusch (1988) showed the corresponding results for the Bewley estimator; the information revealed by all three transformations is the same. As with the Bardsen transformation, $a_{12} = 0$ for the ECM, and its parameters are therefore consistently estimated by OLS.
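The equality of the approximate variances can be verified on simulated data. The sketch below assumes a hypothetical ADL(1,1) process and an ECM-type regression of Δy_t on (1, y_{t-1}, Δx_t, x_{t-1}); the Jacobians `J_f` and `J_g` are written out by hand for this special case, and `M` is the matrix linking the two coefficient vectors (an illustrative stand-in for the transformation matrix of the text).

```python
import numpy as np

rng = np.random.default_rng(3)
T = 400
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    # hypothetical ADL(1,1) data-generating process
    y[t] = 0.5 * y[t - 1] + 0.3 * x[t] + 0.1 * x[t - 1] + 0.5 * rng.normal()
yt, y1, xt, x1 = y[1:], y[:-1], x[1:], x[:-1]
ones = np.ones(T - 1)

def ols_with_cov(X, y):
    """OLS coefficients and the usual s^2 (X'X)^{-1} covariance estimate."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b, s2 * XtX_inv

# ADL: theta = (g2 + g3) / (1 - g1)
g, V_g = ols_with_cov(np.column_stack([ones, y1, xt, x1]), yt)
theta_adl = (g[2] + g[3]) / (1 - g[1])
J_f = np.array([0.0, theta_adl / (1 - g[1]), 1 / (1 - g[1]), 1 / (1 - g[1])])
var_adl = J_f @ V_g @ J_f

# ECM-type transform: theta = -d3 / d1
d, V_d = ols_with_cov(np.column_stack([ones, y1, xt - x1, x1]), yt - y1)
theta_ecm = -d[3] / d[1]
J_g = np.array([0.0, d[3] / d[1] ** 2, 0.0, -1 / d[1]])
var_ecm = J_g @ V_d @ J_g

# The coefficient vectors satisfy g = M d + e_2, so dg/dd' = M.
M = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 1]], float)

print(theta_adl, var_adl, var_ecm)
```

The product `J_f @ M` reproduces `J_g`, which is the chain-rule relation used in the proof, and the two delta-method variances agree up to rounding error.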

2.8. Expectational Variables and the Interpretation of Long-Run Solutions
So far, long-run solutions have been derived for models with valid conditioning on regressors or instruments. McCallum (1984) and Kelly (1985) have suggested potential problems in the interpretation of long-run solutions in the presence of expectational variables in the processes generating the data. The problems are, however, readily interpreted as resulting from invalid (weak) exogeneity assumptions, and do not uniquely concern long-run solutions; short-run effects may be badly estimated also. Moreover, if the variables concerned are non-stationary and have particular integration and co-integration properties (see Chapters 3-5), then the long-run solution, but not the short-run multipliers, can be estimated consistently despite the expectational variables. We follow McCallum and Kelly in describing the circumstances in question, and Hendry and Neale (1988) in relating these to weak exogeneity and non-stationarity.
McCallum (1984) offers the following example. Consider the relationship between interest rates and inflation, and in particular the Fisher effect. Let $\pi_t$ denote the inflation rate and $i_t$ the nominal interest rate. Then, in the relationship

$$i_t = \beta_0 + \beta_1 \pi_t + u_t, \qquad (32)$$

the Fisher hypothesis is interpreted as stating that $\beta_1 = 1$; i.e. in long-run equilibrium, the nominal interest rate reflects inflation one for one. Now imagine that the actual generation of these series is according to the processes

$$i_t = \rho + \hat{\pi}_{t+1|t} + v_t \qquad (33)$$

and

$$\pi_t = \mu_0 + \mu_1 \pi_{t-1} + e_t,$$

with $|\mu_1| < 1$; $v_t$ and $e_t$ white-noise processes; $E(e_t v_s) = 0 \ \forall\, t, s$, where $\rho$ is a constant real interest rate and $\hat{\pi}_{t+1|t}$ represents agents' forecasts of $\pi_{t+1}$ made at time t. Equation (33) implies that the Fisher hypothesis is valid in each period, rather than as a long-run equilibrium only. Imagine further that information is costless and that agents understand the $\{\pi_t\}$ process so that $\hat{\pi}_{t+1|t} = \mu_0 + \mu_1 \pi_t$. Then

$$i_t = (\rho + \mu_0) + \mu_1 \pi_t + v_t.$$
Hence estimation of (32) by a consistent method such as OLS will produce a coefficient estimate $\hat{\beta}_1$ which converges to $\mu_1 < 1$. McCallum emphasizes the hazards of frequency-domain time-series techniques for estimating long-run (zero-frequency) effects, but the conclusion is equally well applicable to time-domain methods. In spite of the validity of the Fisher hypothesis, the investigator examining the long-run solution through a model such as (32) would falsely conclude that it fails to hold.
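McCallum's point can be reproduced in a few lines of simulation. The sketch below assumes hypothetical parameter values (`rho`, `mu0`, `mu1`) and normal shocks; the Fisher relation is imposed exactly in every period, yet the static regression of the interest rate on current inflation recovers the autoregressive parameter rather than unity.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20_000
rho, mu0, mu1 = 2.0, 1.0, 0.6          # hypothetical parameter values

# Inflation follows a stationary AR(1) process.
pi = np.empty(T)
pi[0] = mu0 / (1 - mu1)                # start at the unconditional mean
e = rng.normal(size=T)
for t in range(1, T):
    pi[t] = mu0 + mu1 * pi[t - 1] + e[t]

forecast = mu0 + mu1 * pi              # forecast of pi_{t+1} made at time t
i = rho + forecast + 0.5 * rng.normal(size=T)   # Fisher relation holds each period

# Static regression of i_t on a constant and pi_t.
X = np.column_stack([np.ones(T), pi])
beta0_hat, beta1_hat = np.linalg.lstsq(X, i, rcond=None)[0]
print(beta0_hat, beta1_hat)
```

The slope estimate settles near `mu1` and the intercept near `rho + mu0`, so an investigator testing for a unit coefficient would reject the Fisher hypothesis even though it holds by construction.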
Kelly (1985) and Hendry and Neale (1988) use the following more general structure in order to examine the issue. The data are generated by


where $\phi(L)$, $\beta(L)$, and $\gamma(L)$ are finite-order polynomials in the lag operator of the form

Furthermore, $\phi_0 = 1$, and for simplicity in this example we take

The lag operator in the polynomial $\beta(L)$ is interpreted as applying to the first subscript of $\hat{x}_{t|t-1}$ only, so that

The underlying series $\{x_t\}$ is generated according to

The error terms $\varepsilon_t$ and $\eta_t$ are mutually and serially uncorrelated white noise. Combining (37) and (38), we have

and

The parameters $\beta_i$ cannot be determined from (40) without knowledge of the marginal process (38); hence, recalling the definition of weak exogeneity in Chapter 1, $x_t$ is not weakly exogenous for the $\beta_i$ in (40). If, however, we somehow observed $\hat{x}_{t|t-1}$ directly, then we would be able to estimate the $\beta_i$ from (39).
The problem identified by McCallum and Kelly is therefore simply one aspect of a broader, and well-known, one: we cannot in general count on unbiased estimates from models in which the explanatory variables are not (weakly) exogenous for the parameters of interest. The solution, in this circumstance, is therefore joint estimation of (40) and (38). If (40) alone is estimated, not only is the long-run solution not consistently estimated, but the short-run adjustment coefficients are incorrectly estimated as well; where $\hat{x}_{t|t-1}$ is omitted from the model, coefficients on $x_{t-i}$ are not $\beta^*(L)$ but $\beta_0 \delta(L) + \beta^*(L)$. If we do not have weak exogeneity, we cannot conduct valid conditional inference. This point is independent of whether our primary interest is in long-run equilibrium solutions or in short-run effects.
It is also interesting to note that non-stationarity in the series $y_t$ and $x_t$, and co-integration between them, can lead to consistent estimation of the long-run solution in spite of the lack of weak exogeneity. To make this clear, as well as to clarify the position of 'strict' exogeneity, express (39) as

where $\varepsilon_{2t} = (\varepsilon_t - \beta_0 \eta_t)$ and where $x_t = \hat{x}_{t|t-1} + \eta_t$ from (38); $x_t$ is in effect an error-laden measurement of $\hat{x}_{t|t-1}$. Now group the latter regressors to get

and define

From (38), $x_t$ is correlated with $\varepsilon_{2t}$ in (41) and (42) since $\varepsilon_{2t}$ depends upon $\eta_t$. However we can redefine the parameters in (42) (i.e. re-parameterize) to eliminate this correlation, using (43). Write
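yielding $\mu_t = \varepsilon_t - \beta_0 \eta_t + \beta_0 c_{11} \sigma_\eta^2 x_t + \beta_0 c_{12} \sigma_\eta^2 w_t$, from which, again using $x_t = \hat{x}_{t|t-1} + \eta_t$, and assuming that $\hat{x}_{t|t-1}$ is uncorrelated with $\varepsilon_t$ and $\eta_t$, we can calculate that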

since the first right-hand-side term is equal to zero by assumption, and from the definition of C and the fact that $CC^{-1} = I$, the square-bracketed term is zero as well.
So the re-parameterized equation (44) has an error term that is uncorrelated with the regressors; $E(\mu_t x_t) = 0$. Note that, since this re-parameterization has rendered $x_t$ 'strictly exogenous' (Engle et al. 1983) in this example, the inferential problems do not stem from a lack of exogeneity in that sense: strict exogeneity, unlike weak exogeneity, is neither necessary nor sufficient for valid inference.
Equation (44) allows us to see the large-sample results of estimating (42) by regression, because (44) is expressed such that the error is uncorrelated with the regressors so that the coefficients represent the impacts of conditional expectations. There are two points to note. First, the existence of biases in (44) (e.g. $\beta_0 (1 - c_{11} \sigma_\eta^2) \neq \beta_0$) depends upon a non-zero value for $\sigma_\eta^2$, and therefore on the discrepancy between $x_t$ and the expectation $\hat{x}_{t|t-1}$; that is, the biases are attributable to the lack of weak exogeneity implied by the fact that we use $x_t$ in place of $\hat{x}_{t|t-1}$. If this problem were not present, both short run and long run would be estimable with no bias; since the problem is present, neither short nor long run is estimated without bias (for I(0) processes).
Second, if $x_t$ is not stationary but integrated of order 1, and if $y_t$ has the same order of integration and is co-integrated with $x_t$ (that is, if there is a long-run equilibrium relationship between them) while $\eta_t$ remains a stationary process, then $c_{11} \sigma_\eta^2 \to 0$ as $t \to \infty$ (see Chapter 3). In the limit as t increases without bound, therefore, the estimated coefficient on $x_t$ will tend to $\beta_0$, and we have the possibility of consistent estimation of the long-run solution. Nevertheless, the short-run coefficients remain mis-estimated.
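The contrast between the I(0) and I(1) cases can be seen in a deliberately simplified simulation. The sketch below is not the lag-polynomial model of the text: it assumes a static relation in which y_t responds only to the expectation, with the realized x_t equal to the expectation plus a stationary error; the function name `slope_on_realized` and all parameter values are hypothetical.

```python
import numpy as np

def slope_on_realized(phi, T, rng, beta0=1.0):
    """Regress y_t on the realized x_t when y_t responds to the expectation."""
    xe = np.zeros(T)                      # expectation of x_t formed at t-1
    shocks = rng.normal(size=T)
    for t in range(1, T):
        xe[t] = phi * xe[t - 1] + shocks[t]
    x = xe + rng.normal(size=T)           # x_t = expectation + eta_t
    y = beta0 * xe + rng.normal(size=T)   # y_t depends on the expectation only
    X = np.column_stack([np.ones(T), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

rng = np.random.default_rng(5)
slope_i0 = slope_on_realized(phi=0.5, T=5000, rng=rng)   # stationary expectation
slope_i1 = slope_on_realized(phi=1.0, T=5000, rng=rng)   # integrated expectation
print(slope_i0, slope_i1)
```

With a stationary expectation the slope is attenuated well below `beta0`, as in a classical errors-in-variables problem; with an integrated expectation the variance of the signal swamps that of the stationary error and the slope approaches `beta0`.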
Clearly, a lack of weak exogeneity creates serious inferential problems, but these are not restricted to the estimation of long-run solutions. Further, there is a marked difference in the long-run outcomes between the I(0) and I(1) situations, leading us to study the properties of integrated and co-integrated processes.

Properties of Integrated Processes
A knowledge of the fundamental properties of integrated processes is essential for an understanding of tests for both non-stationarity and the existence of long-run equilibrium relationships. Here we define and present the important properties of integrated processes. We deal with the issue of spurious regressions and show how a consideration of the theory of integrated processes helps us to understand the behaviour of standard estimators in models involving non-stationary data. Several examples illustrate the use of Wiener distribution theory in deriving asymptotic results for such models.
Much conventional asymptotic theory for least-squares estimation (e.g. the standard proofs of consistency and asymptotic normality of OLS estimators) assumes stationarity of the explanatory variables, possibly around a deterministic trend. Not all economic time series are stationary, as we saw in Chapter 1, and for many important ones, including aggregate consumption and national income, stationarity is not even a sensible approximation.
Nonetheless, regression methods have often appeared to be effective when analysing such series, and it was not clear that methods developed for stationary series would not be valid elsewhere. To some extent, therefore, many analyses of unadjusted non-stationary series have been carried out on the assumption that the non-stationarity would not matter. As some potential problems in doing so became clear, however, econometricians naturally looked for methods of transforming their data in such a way that the resulting series would be stationary, and therefore amenable to analysis using 'traditional' econometric or time-series methods.
One illustration of the difficulties that can arise when performing regression with clearly non-stationary series is the problem of nonsense regression, so named by Yule (1926), or spurious regression, in the terminology of Granger and Newbold (1974): given two completely unrelated but integrated series, regression of one on the other will tend to produce an apparently significant relationship.1 The realization that such things could occur led to the interest in transformations to induce stationarity. Differencing data was one of these; 'removing' a deterministic trend from a series was another.2
Although these transformations enjoyed some popularity, it eventually became clear for a wide class of processes that standard significance tests for the hypothesis that there is no trend are biased in favour of rejecting the hypothesis even though it is true (see Granger and Newbold 1974, inter alia). Moreover, spurious correlations between unrelated integrated processes appear even in regressions containing deterministic trends. The simple method of de-trending before drawing inferences from non-stationary data was therefore found to be flawed.

3.1. Spurious Regression
The standard proof of the consistency of ordinary least squares regression uses an assumption such as that plim $(1/T)(X'X) = Q$, where X is the matrix containing the data on the explanatory variables and Q is a fixed matrix. That is, with increasing sample information, the sample moments of the data settle down to their population values. In order to have fixed population moments to which these sample moments converge, the data must be stationary; otherwise, as for example in the case of integrated series, the data might be tending to increase over time, in which case there are no fixed values in the matrix of expectations of sums of squares and cross-products of these data.3
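The behaviour of the sample second moment under stationarity and under integration can be illustrated by simulation. The sketch below assumes a scalar AR(1) process with hypothetical coefficient `phi` and unit-variance normal shocks; `second_moment_path` is an illustrative helper, not a routine from the text.

```python
import numpy as np

def second_moment_path(phi, sizes, rng):
    """(1/T) * sum of x_t^2 at each T in sizes for x_t = phi*x_{t-1} + e_t."""
    T_max = max(sizes)
    x = np.zeros(T_max)
    e = rng.normal(size=T_max)
    for t in range(1, T_max):
        x[t] = phi * x[t - 1] + e[t]
    return {T: float(np.mean(x[:T] ** 2)) for T in sizes}

rng = np.random.default_rng(6)
sizes = (100, 1_000, 10_000)
stationary = second_moment_path(0.5, sizes, rng)   # should approach 1/(1 - 0.25)
integrated = second_moment_path(1.0, sizes, rng)   # has no fixed limit
print(stationary)
print(integrated)
```

For the stationary series the sample moment settles near the population variance, whereas for the random walk it is of the same order as the sample size and varies from one realization to another.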
Some examples of what can emerge when standard regression techniques are used with non-stationary data were re-emphasized by Granger and Newbold, who considered the following data generation process:

$$y_t = y_{t-1} + u_t, \qquad (1)$$
$$x_t = x_{t-1} + v_t. \qquad (2)$$

1 The change in terminology may be misleading, since Yule also used the term 'spurious relationships', referring to a correlation induced between two variables that are causally unrelated but are both dependent on other common variables.
2 This is accomplished either by including a function of time as a regressor, or by subtracting a function of time from all series used. By the Frisch-Waugh theorem, regressing all series on time and using their residuals in a further regression is numerically equivalent to including time as a regressor when using the unadjusted series.
3 Anderson (1958) extends the standard asymptotic distribution theory to deal with the de-trending of deterministic variables that can be suitably standardized. However, here we are concerned with integrated stochastic processes.


That is, $x_t$ and $y_t$ are uncorrelated random walks. Since $x_t$ neither affects nor is affected by $y_t$, one would hope that the coefficient $\beta_1$ in the regression model

$$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t \qquad (3)$$

would converge in probability to zero, reflecting the lack of a relation between the series, and that the coefficient of determination ($R^2$) from this regression would also tend to zero. However, this is not the case. Regression methods detect correlations, and in non-stationary series (as Yule 1926 showed) spurious correlations may persist in large samples despite the absence of any connection between the underlying series. If two time series are each growing, for example, they may be correlated even though they are increasing for entirely different reasons and by increments that are uncorrelated. Hence a correlation between integrated series cannot be interpreted in the way that it could be if it arose among stationary series.
In (3), both the null hypothesis $\beta_1 = 0$ (implying $y_t = \beta_0 + \varepsilon_t$), and the alternative $\beta_1 \neq 0$ lead to false models, since the true DGP is not nested within (3). From this perspective it is not surprising that the null hypothesis, implying that $\{y_t\}$ is a white-noise process, is rejected; the autocorrelation in the random walk $\{y_t\}$ tends to project onto $\{x_t\}$, also a random walk and therefore also strongly autocorrelated. Tests based on badly specified models can often be misleading. Nonetheless, the spurious regression problem that appears among integrated processes is distinct from the inferential problems that may appear among stationary processes. If $\{y_t\}$ and $\{x_t\}$ were made stationary by introducing a coefficient between zero and one on each of the lagged terms in (1) and (2), the OLS-estimated regression coefficient $\hat{\beta}_1$ and the non-centrality of its t-statistic would both converge to zero, even though (3) does not nest the true process (although the t-test would over-reject). That is, in the stationary case, regression on a set of variables independent of the regressand produces coefficients that converge to zero; in the non-stationary case, this need not be so.
To characterize precisely some of the analytical results for integrated processes, we refer to Phillips (1986). A simple case uses (1) and (2) above as the data generation process, with the assumptions concerning the error processes $u_t$ and $v_t$ capable of being weakened substantially. Then, estimation of the model (3) by ordinary least squares can be shown to lead to results that cannot be interpreted within the conventional testing procedure. To begin with, conventionally calculated 't-statistics' on $\beta_0$ and $\beta_1$ do not have t-distributions, do not have any limiting distributions, and in fact diverge in distribution as the sample size T increases; hence, for any fixed critical value, rejection rates will tend to increase with sample size. The null hypothesis that is being rejected here is $H_0: \beta_1 = 0$; hence a rejection rate increasing with sample size implies that a null of no relationship between the series will tend to be rejected more and more frequently in larger samples. The t-statistics are of order $T^{1/2}$. Thus, the invalid inference that there is in fact a relationship is traceable directly to the non-stationarity in the data-generation process (1) and (2). When (1) and (2) are replaced with stable autoregressive processes, the non-centrality of the t-statistic to test $H_0: \beta_1 = 0$ converges to zero, reflecting the lack of relationship between the series. We will examine these asymptotic results in more detail in Section 3.5.2, after reviewing some of the necessary concepts from Wiener distribution theory.
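The divergence of the t-statistic can be made visible by repeating the spurious regression at several sample sizes. The sketch below assumes independent driftless random walks with unit-variance normal increments; the sample sizes and the number of replications are arbitrary choices, and `spurious_t_stats` is an illustrative helper.

```python
import numpy as np

def spurious_t_stats(T, n_rep, rng):
    """t-statistics on beta_1 from regressing one random walk on another."""
    y = np.cumsum(rng.normal(size=(n_rep, T)), axis=1)
    x = np.cumsum(rng.normal(size=(n_rep, T)), axis=1)
    yc = y - y.mean(axis=1, keepdims=True)      # demeaning absorbs the intercept
    xc = x - x.mean(axis=1, keepdims=True)
    sxx = np.sum(xc ** 2, axis=1)
    b1 = np.sum(xc * yc, axis=1) / sxx
    resid = yc - b1[:, None] * xc
    s2 = np.sum(resid ** 2, axis=1) / (T - 2)
    return b1 / np.sqrt(s2 / sxx)

rng = np.random.default_rng(7)
mean_abs_t = {T: float(np.mean(np.abs(spurious_t_stats(T, 500, rng))))
              for T in (50, 200, 800)}
scaled = {T: m / np.sqrt(T) for T, m in mean_abs_t.items()}
print(mean_abs_t)
print(scaled)
```

The average absolute t-statistic rises with T, while dividing by the square root of T gives values of broadly similar size across the three sample sizes, in line with the stated order of magnitude.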
Further analytical results concerning the distribution of the F-statistic for the hypothesis that $\beta_0 = \beta_1 = 0$, and those of standard autocorrelation test statistics, are also given in Phillips (1986). The F-statistic also diverges, leading to rejections growing with the sample size T, despite the lack of relation between $\{y_t\}$ and $\{x_t\}$; residual autocorrelation tests, however, provide an indication to the investigator that the model is mis-specified, by converging in probability to the values implied by a serial correlation coefficient of unity. That is, although the t- and F-statistics for the null hypothesis of interest are grossly misleading, some information which would suggest that the regression (3) is mis-specified is provided by a test for residual autocorrelation. This underlines again the importance of thoroughly testing any regression model for mis-specification, and basing inference only upon those in which no evidence of serious mis-specification is found; see e.g. Spanos (1986).
Consider now the following bivariate DGP, which extends (1) and (2) by allowing the inclusion of intercepts corresponding to potential drifts in the unit-root processes:

$$y_t = \alpha + y_{t-1} + \varepsilon_t, \qquad (4)$$
$$z_t = \gamma + z_{t-1} + \nu_t. \qquad (5)$$

To simplify the analysis, we assume that the two shocks $\varepsilon_t$ and $\nu_t$ are independent at all points in time, which implies that

$$E[\varepsilon_t \nu_s] = 0 \quad \forall\, t, s. \qquad (6)$$

Assume also that the initial values $y_0$ and $z_0$ are zero. We will mainly consider the case where $\alpha = \gamma = 0$, so that both variables are simple random walks and $y_t$ and $z_t$ are the sums of all of their respective past shocks. When $\alpha$ and $\gamma$ are not zero, $y_t$ and $z_t$ depend on linear trends which reflect the accumulation of the successive intercepts. This completes the formulation of the statistical generating mechanism in (4) and (5), other than stating a specific form for the error distributions.


Turn now to the specification of an economic hypothesis. An economist may wish to describe the relationship between $\{y_t\}$ and $\{z_t\}$ with the model

$$y_t = \beta_0 + \beta_1 z_t + u_t, \qquad (7)$$

where $\beta_1$ is interpreted as the derivative of $y_t$ with respect to $z_t$. Conventionally, equations such as (7) are estimated by ordinary least squares, treating $\{u_t\}$ as an IID process independent of $z_t$. Since $y_t$ and $z_t$ are causally unrelated here by construction, the derivative $\beta_1$ is zero in the sense that no relation exists; it is not true to say that setting $\beta_1$ to zero in (7) gives the true DGP. We want to examine the properties of the conventional estimation and hypothesis testing procedure applied to (7) when the unknown DGP is in fact (4)-(6).
Standard regression theory for models involving stationary regressors would suggest that plim $(\hat{\beta}_1) = \beta_1 = 0$, and that the probability of the absolute value of the t-statistic for $H_0: \beta_1 = 0$ exceeding 1.96 is 5 per cent. Because these regressors are integrated, however, this is not so.
Reconsider (7). Since $\{y_t\}$ and $\{z_t\}$ are both integrated processes, (7) could be a well-defined regression with a non-zero $\beta_1$, if a relationship between these two variables existed. If however $\beta_1 = 0$, as is true here by (4)-(6), we have $y_t = \beta_0 + u_t$. Now since $\{y_t\}$ is I(1), $\{u_t\}$ must be I(1), which violates the assumption made about $\{u_t\}$ above. There is an internal inconsistency in conducting hypothesis testing in the standard way here, because it is not possible for the error term to be I(0) when $\beta_1$ is zero.
We can use Monte Carlo methods to examine typical results, in finite samples, of regressions such as (7) where $\{y_t\}$ and $\{z_t\}$ are independent non-stationary processes. In the exercise that follows, we generate $\{y_t\}$ using the DGP (4), with T = 100, $\alpha = 0$, $y_0 = 0$, and $\sigma_\varepsilon = 1$. Similarly, $\{z_t\}$ is generated with $\gamma = 0$, $z_0 = 0$, and $\sigma_\nu = 1$ in (5). The random errors are normally distributed and generated independently, consistent with (6). At each replication, we record (i) the estimated coefficients, (ii) the estimated standard errors, (iii) whether the null hypothesis $\beta_1 = 0$ is rejected when conventional 5 per cent critical values of the t-distribution are used, (iv) the value of the sample correlation between y and z, and (v) the value of the Durbin-Watson statistic for residual serial correlation. There are N = 10,000 replications using PC-NAIVE.
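An experiment of this kind can be sketched with NumPy in place of PC-NAIVE. The code below is an assumption-laden re-creation rather than the authors' program: it uses 2,000 replications instead of 10,000 to keep the run short, a hard-coded approximate critical value for 98 degrees of freedom, and hypothetical names throughout, so its output will differ in detail from the figures reported in the text.

```python
import numpy as np

def spurious_monte_carlo(T=100, n_rep=2000, seed=8):
    """Regress one driftless random walk on an independent one, n_rep times."""
    rng = np.random.default_rng(seed)
    # y_0 = z_0 = 0, no drift, unit-variance normal shocks
    y = np.cumsum(rng.normal(size=(n_rep, T)), axis=1)
    z = np.cumsum(rng.normal(size=(n_rep, T)), axis=1)
    yc = y - y.mean(axis=1, keepdims=True)      # demeaning absorbs the intercept
    zc = z - z.mean(axis=1, keepdims=True)
    szz = np.sum(zc ** 2, axis=1)
    b1 = np.sum(zc * yc, axis=1) / szz
    resid = yc - b1[:, None] * zc
    se = np.sqrt(np.sum(resid ** 2, axis=1) / (T - 2) / szz)
    t = b1 / se
    corr = np.sum(zc * yc, axis=1) / np.sqrt(szz * np.sum(yc ** 2, axis=1))
    dw = np.sum(np.diff(resid, axis=1) ** 2, axis=1) / np.sum(resid ** 2, axis=1)
    return {
        "mean_b1": float(b1.mean()),
        "ssd_b1": float(b1.std(ddof=1)),
        "mcse_b1": float(b1.std(ddof=1) / np.sqrt(n_rep)),
        # 1.984 approximates the two-sided 5% critical value of t with 98 df,
        # so this line is only appropriate for the default T = 100
        "reject_rate": float(np.mean(np.abs(t) > 1.984)),
        "ssd_t": float(t.std(ddof=1)),
        "mean_abs_corr": float(np.mean(np.abs(corr))),
        "mean_dw": float(dw.mean()),
    }

results = spurious_monte_carlo()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

The rejection frequency should be far above the nominal 5 per cent and the average Durbin-Watson statistic far below 2, the latter being the residual-autocorrelation warning discussed earlier in the section.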
In this experiment, the Monte Carlo estimate of the mean value of $\hat\beta_1$ is $\bar{E}[\hat\beta_1] = -0.012$, with Monte Carlo standard error (that is, the standard error of the Monte Carlo estimate of the mean of $\hat\beta_1$) of 0.006. Because we are estimating a mean using independent replications, a central limit theorem applies to the Monte Carlo results, so that the sample mean is asymptotically normally distributed (i.e. as $N \to \infty$). Hence we can reject the hypothesis that $E[\hat\beta_1] = 0$ at T = 100, despite the fact that the estimated mean value of $\hat\beta_1$ is relatively small.


The frequency distribution of $\hat\beta_1$ is shown in Fig. 3.1, standardized to zero mean and unit variance. The sample standard deviation (SSD) of the values of $\{\hat\beta_{1i},\ i = 1, 2, \ldots, 10{,}000\}$ is 0.63, where SSD is defined as the square root of the average squared deviation of the $\hat\beta_{1i}$ from their Monte Carlo mean $\bar{E}[\hat\beta_1]$. The distribution plotted in Fig. 3.1 is that of the standardized values $(\hat\beta_{1i} - \bar{E}[\hat\beta_1])/\mathrm{SSD}(\hat\beta_1)$, which therefore have a standard deviation of unity. The Monte Carlo standard error is $\sigma = \mathrm{SSD}(\hat\beta_1)/N^{1/2}$.
FIG. 3.1. Frequency distribution of the spurious regression coefficient, standardized to zero mean, unit variance

FIG. 3.2. Frequency distribution of the 't-test' of $\beta_1 = 0$, standardized to zero mean, unit variance

The probability of rejecting $H_0$ at the conventional significance level of 0.05 is 0.753; that is, even when the null hypothesis is true, we will reject it 75.3 per cent of the time, and therefore make the wrong decision most of the time. Figure 3.2 reveals that it is not the shape of the standardized t-distribution that is at fault. Rather, the actual statistic calculated does not have a zero mean, unit variance distribution. In fact, where $\bar t$ is the mean t-statistic, $\bar t = -0.12$ (0.07) and SSD($t$) = 7.3. Values of $|t| > 1.96$ are very likely with such a large standard deviation, and the empirical critical values in the experiment that ensure a test with a size of 5 per cent are approximately $\pm 14.5$. However, these critical values are not appropriate at other sample sizes.
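A minimal sketch of an experiment of this kind, written in Python, may help to fix ideas. It is an illustration only and is not the PC-NAIVE code behind the results reported here: the function names (`random_walk`, `ols_slope_and_t`, `spurious_experiment`) are hypothetical, the errors are assumed to be standard normal with zero drift and zero starting values, and fewer replications are used by default, so the figures it produces will differ somewhat from those quoted above.

```python
import math
import random


def random_walk(T, rng):
    """Return y_1..y_T with y_0 = 0 and IID N(0, 1) increments (no drift)."""
    y, level = [], 0.0
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        y.append(level)
    return y


def ols_slope_and_t(y, z):
    """OLS of y on a constant and z: return (slope, conventional t-ratio)."""
    T = len(y)
    ybar, zbar = sum(y) / T, sum(z) / T
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    b1 = szy / szz
    b0 = ybar - b1 * zbar
    rss = sum((yi - b0 - b1 * zi) ** 2 for yi, zi in zip(y, z))
    ese = math.sqrt(rss / (T - 2) / szz)  # estimated standard error of b1
    return b1, b1 / ese


def spurious_experiment(T=100, N=2000, seed=0):
    """Regress one random walk on another, independent one, N times."""
    rng = random.Random(seed)
    slopes, tstats = [], []
    for _ in range(N):
        b1, t = ols_slope_and_t(random_walk(T, rng), random_walk(T, rng))
        slopes.append(b1)
        tstats.append(t)
    mean_b = sum(slopes) / N
    ssd_b = math.sqrt(sum((b - mean_b) ** 2 for b in slopes) / (N - 1))
    return {
        "mean_b1": mean_b,                  # Monte Carlo mean of the slope
        "ssd_b1": ssd_b,                    # sampling standard deviation
        "mcse_b1": ssd_b / math.sqrt(N),    # Monte Carlo standard error
        "reject_5pct": sum(abs(t) > 1.96 for t in tstats) / N,
    }


if __name__ == "__main__":
    print(spurious_experiment())
```

Calling `spurious_experiment()` returns the Monte Carlo mean and SSD of the slope estimate, the Monte Carlo standard error SSD/$N^{1/2}$, and the share of replications in which $|t| > 1.96$ although the two series are unrelated by construction.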
This is the spurious regression problem: regression of an integrated series on another unrelated integrated series produces t-ratios on the slope parameter which indicate a relationship much more often than they should at the nominal test level. The phenomenon is of course not specific to this sample size, and in particular the problem will not disappear as the sample size is increased. The distribution of the t-ratio will, however, depend on the sample size; Fig. 3.3 shows the graph of $\bar{E}[\hat\beta_1 \mid T]$ for T = 20, 21, ..., 100, together with $\pm 2\sigma$ at each T, where $\sigma$ denotes the Monte Carlo standard error in the graph. The bias is significantly different from zero only at the larger sample sizes, but does not change noticeably with T. Moreover, the value of $\sigma$ does not fall greatly with T, which differs from what one would expect if conventional asymptotic theory were applicable.
Figure 3.4 records the mean value of the regression coefficient together with the SSD and the mean estimated standard error (ESE) of the coefficient. There is a great difference between the two measures of uncertainty: ESE is the estimated standard error of the coefficient $\hat\beta_1$ that the investigator would obtain on average, in a regression of the form of (7) given the DGP in (4)-(6); the SSD is the Monte Carlo estimate of the true standard deviation of this parameter estimate. As Fig. 3.4 shows, the economist would report a severe underestimate of the uncertainty in the estimate of $\beta_1$.
FIG. 3.3. Mean value of the spurious regression coefficient with $\pm 2\sigma$ (the Monte Carlo standard error)

FIG. 3.4. Mean value of the spurious regression coefficient with the estimated standard error (ESE) and sampling standard deviation (SSD) across sample sizes

FIG. 3.5. Mean value of the 't-test' of $H_0\colon \beta_1 = 0$, with $\pm 2$SSD (the Monte Carlo based sampling standard deviation)

The mean value of the t-statistic shown in Fig. 3.5 changes little as T increases from 20 to 100, but the standard deviation of t increases rapidly. Thus the problem becomes worse as T increases; rejection of the null hypothesis of no relation between the $y_t$ and $z_t$ series becomes more likely, despite one's initial intuition that, if the series really are unrelated, this feature should eventually dominate as $T \to \infty$. Figure 3.6 records the rejection frequencies for every sample size considered in the simulation exercise; $\Pr(|t(\beta_1 = 0)| \ge 2)$ is 0.30 at T = 20, already greater than the nominal size of the test, and the problem worsens as T is increased because the rejection frequencies also increase steadily with T. The outcomes of the simulations reveal the dangers of using critical values justified in one context (e.g. IID processes) to conduct inferences with statistics computed from data generated by a very different probability mechanism.
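The dependence on the sample size can be explored with a short Python sketch. Again this is only an illustration under stated assumptions: the function name `simulate` is hypothetical, both series are driftless random walks with standard normal increments, and the replication count is left to the caller. For a given T it returns the mean ESE, the SSD of the slope estimate, and the frequency with which $|t| \ge 2$.

```python
import math
import random


def simulate(T, N, rng):
    """Return (mean ESE, SSD of slope, rejection frequency of |t| >= 2)."""
    slopes, eses, rejections = [], [], 0
    for _ in range(N):
        # Two independent driftless random walks started at zero.
        y, z, ylev, zlev = [], [], 0.0, 0.0
        for _ in range(T):
            ylev += rng.gauss(0.0, 1.0)
            zlev += rng.gauss(0.0, 1.0)
            y.append(ylev)
            z.append(zlev)
        # OLS of y on a constant and z, in deviations from means.
        ybar, zbar = sum(y) / T, sum(z) / T
        szz = sum((b - zbar) ** 2 for b in z)
        b1 = sum((b - zbar) * (a - ybar) for a, b in zip(y, z)) / szz
        rss = sum((a - ybar - b1 * (b - zbar)) ** 2 for a, b in zip(y, z))
        ese = math.sqrt(rss / (T - 2) / szz)
        slopes.append(b1)
        eses.append(ese)
        rejections += abs(b1 / ese) >= 2.0
    mean_b = sum(slopes) / N
    ssd = math.sqrt(sum((b - mean_b) ** 2 for b in slopes) / (N - 1))
    return sum(eses) / N, ssd, rejections / N


if __name__ == "__main__":
    rng = random.Random(0)
    for T in (20, 50, 100):
        ese, ssd, rej = simulate(T, 1000, rng)
        print(f"T={T:3d}  mean ESE={ese:.3f}  SSD={ssd:.3f}  Pr(|t|>=2)={rej:.3f}")
```

Comparing the ESE and SSD columns across T gives a rough numerical counterpart to Figs. 3.4 and 3.6.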
With the DGP in (4) and (5), the problem of discriminating between genuine interdependence and spurious regressions is difficult to solve because, under both the null and the alternative hypotheses, $y_t$ and $z_t$ have a high sample correlation (denoted R). In both cases we reject $H_0\colon \beta_1 = 0$ most of the time in large samples.

FIG. 3.6. Rejection frequency of the 't-test' of $H_0\colon \beta_1 = 0$ when the hypothesis is true
An early analysis of the spurious regressions problem is due, as we have said, to Yule (1926), who also used Monte Carlo simulations. Yule's observations on the distribution of R remain noteworthy and may be considered in three parts, representing three different situations: (i) where the $\{y_t\}$ and $\{z_t\}$ series are both mean-zero IID processes; (ii) where they are IID processes integrated once; and (iii) where they are IID processes integrated twice.⁴ In each case, the figures given below represent the frequency distribution of R obtained from estimating equation (7) 10,000 times with a sample size of 100; $\beta_0 = \beta_1 = 0$ in all the simulations (except for an irrelevant location change in case (i), owing to a program restriction, when $\beta_0 = 1$). The following features of the different cases may be observed.
Case (i). When both variables are I(0) and IID, as Fig. 3.7 shows, R is well behaved and has a symmetric, nearly Gaussian, distribution centred on zero although bounded by $\pm 1$.

Case (ii). When both variables are I(1) and the first differences are IID, the density of R, $f_R(r)$, is closer to a semi-ellipse with excess frequency at both ends of the distribution (see Fig. 3.8). Consequently, values of R well away from zero are far more likely here than in case (i).

Case (iii). When both variables are I(2), the second differences are IID. In this situation (see Fig. 3.9) $f_R(r)$ becomes U-shaped, and the most likely correlations between two such I(2) unrelated series are $\pm 1$, which is precisely what would occur if the series were truly related.

4
Order of integration was defined informally in Chapter 1; it is explored formally in Section 3.3 below.

FIG. 3.7. Frequency distribution for the correlation R between two IID independent processes

FIG. 3.8. Frequency distribution for R between two I(1) processes with independent IID first differences
FIG. 3.9. Frequency distribution of R for two I(2) processes with independent IID second differences

If a test statistic, based on R, assumes the distribution to be the one applying to case (i) when in fact the correct distribution is the one that applies to case (ii), the rejection frequency will greatly exceed the nominal size of the test (given by the expected number of rejections if (i) were true). Case (iii) is even worse: the least likely outcome here would seem to be the discovery of the truth. There is almost no probability of finding $R \approx 0$ in this last case, although the population value anticipated under the null is zero. The most likely sample value is $R \approx \pm 1$.
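Yule's three cases are easy to mimic numerically. The Python sketch below is a hedged illustration, not a reproduction of the experiments reported here: the helper names (`integrate`, `series`, `correlation`, `r_distribution`, `ssd`) are hypothetical, the underlying IID draws are assumed standard normal, and the default number of replications is smaller than 10,000. It builds pairs of independent I(0), I(1), or I(2) series by cumulating IID draws zero, one, or two times and collects the sample correlations.

```python
import math
import random


def integrate(x):
    """Cumulative sum: turns an I(d) series into an I(d+1) series."""
    out, level = [], 0.0
    for v in x:
        level += v
        out.append(level)
    return out


def series(order, T, rng):
    """IID N(0, 1) draws integrated `order` times (order = 0, 1 or 2)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(T)]
    for _ in range(order):
        x = integrate(x)
    return x


def correlation(y, z):
    """Sample correlation coefficient R between y and z."""
    T = len(y)
    ybar, zbar = sum(y) / T, sum(z) / T
    syz = sum((a - ybar) * (b - zbar) for a, b in zip(y, z))
    syy = sum((a - ybar) ** 2 for a in y)
    szz = sum((b - zbar) ** 2 for b in z)
    return syz / math.sqrt(syy * szz)


def r_distribution(order, T=100, N=1000, seed=0):
    """N draws of R for two independent series of the same order."""
    rng = random.Random(seed)
    return [correlation(series(order, T, rng), series(order, T, rng))
            for _ in range(N)]


def ssd(values):
    """Sampling standard deviation across replications."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


if __name__ == "__main__":
    for d in (0, 1, 2):
        print(f"I({d}), I({d}): SSD(R) = {ssd(r_distribution(d)):.3f}")
```

The spread of R should widen markedly as the order of integration rises, in line with the move from a bell-shaped to a U-shaped frequency distribution.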
If the degrees of integration of the data series are unknown, mixtures of cases (i)-(iii) are possible. For T = 100, Table 3.1 summarizes the outcomes.

Denote the order of integration of $y_t$ and $x_t$ by $d_1$ and $d_2$ respectively, and let $d = \max\{d_1, d_2\}$. The mean of R is close to zero in every case, but its standard deviation increases with $d_1 + d_2$. The estimate of the mean of $\hat\beta_1$ is relatively small compared with the SSD, especially when $d_1 = d_2$. The mean ESE reported by OLS is virtually unaffected by d when $d_1 = d_2$, but varies greatly when $d_1 \ne d_2$. The SSD also increases as $d_1 + d_2$ increases unless the regressor is of higher order of integration than the regressand, namely when $d_1 < d_2$. The ESE underestimates the SSD by a factor in the neighbourhood of 10 to 20 for d = 2. The probability of falsely rejecting the null that $\beta_1 = 0$ rises to about 94 per cent as d increases.

TABLE 3.1. Features of regressions among series with various orders of integration

Type(a)       d1+d2   Mean R    SSD(R)   Mean b1    ESE      SSD      Pr(|t(b1 = 0)| > 2)
I(0), I(0)    0        0.0004   0.101     0.0004    0.101     0.102   0.0493
I(1), I(1)    2       -0.006    0.490    -0.009     0.102     0.631   0.7570
I(2), I(2)    4        0.004    0.818     0.015     0.103     1.974   0.9406
I(0), I(1)    1        0.0004   0.099    -0.0001    0.031     0.033   0.0458
I(1), I(0)    1        0.0008   0.101     0.003     0.384     0.417   0.0486
I(2), I(1)    3       -0.023    0.613    -1.84      3.84     33.52    0.8530
I(1), I(2)    3       -0.013    0.610    -0.0005    0.0054    0.036   0.8444

(a) The notation I(j), I(k) describes a regression of an I(j) variable on an I(k) variable, j, k = 0, 1, 2. Thus, I(0), I(0) is a case (i) regression, I(1), I(1) a case (ii) regression, and I(2), I(2) a case (iii) regression. The remaining cases are mixtures of the primitive (i)-, (ii)-, and (iii)-type regressions. Here b1 denotes the estimate $\hat\beta_1$; ESE and SSD refer to that estimate.
Thus, the difficulties are not restricted to spurious regressions generated by regressing independent series of the same order on each other. Severe problems are revealed in regressions of an I(2) on an I(1) series (or vice versa). Less serious problems occur in regressions of I(1) on I(0) series (or vice versa). Figure 3.10 reports the distribution of R for an I(1) on an I(2) series and reveals a U-shaped distribution, as with two I(2) series. (This also occurs for an I(2) on I(1) series.) Figure 3.11 shows the distribution of the least-squares coefficient estimate for an I(2) on I(1) series; the distribution here is long-tailed but peaked and is distinctly non-normal. The t-test rejection frequencies are similar in these two cases and lie between the rejection frequencies given by the case (ii)- and case (iii)-type regressions. The distribution of R, when one of the series is I(0), is similar to the distribution of this statistic when both series are I(0).⁵ Overall, we see a pattern of potential nonsense once both time series become integrated.

5
There is good reason, as we shall see in Ch. 6, for this similarity in behaviour. In a regression of one I(0) series on another I(0) series, independent of the first series, the estimate of the regression coefficient $\hat\beta_1$ tends in probability to zero. However, when an I(0) series is regressed on an I(1) series, the only way in which OLS can make the regression consistent and minimize the sum of squares is to drive the coefficient on the I(1) variable to zero. Thus equivalent results arise. These possibilities do not occur when both series are integrated.

FIG. 3.10. Frequency distribution of R between an I(1) and an I(2) process with independent IID first and second differences respectively

FIG. 3.11. Histogram and estimated density for the regression coefficient of an I(2) series regressed on an I(1) series
Phillips (1986) also demonstrates that the Durbin-Watson statistic calculated from the residuals of (7) converges to zero as the sample size tends to infinity. When the two series are genuinely related, the DW statistic converges to a non-zero value. The behaviour of the DW statistic therefore provides one way of discriminating between spurious and genuine regressions, but a test based on this statistic may have poor power properties in small samples. Phillips's analytical results are useful in understanding the simulation evidence that Granger and Newbold (1974) advanced, bearing on the regression $R^2$ as well as the DW statistic. These authors suggested treating any regression for which $R^2 > DW$ as one that is likely to be spurious. This could be interpreted as a sign of a lack of any equilibrium relationship among the variables in the regression, which in turn implies a non-stationary error term and so very strong autocorrelation in the regression residuals.
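The $R^2 > DW$ rule of thumb is simple to compute. The following Python sketch is an illustration under our own assumptions (hypothetical function names `random_walk`, `white_noise`, `r2_and_dw`, and `share_flagged`; standard normal errors; a bivariate regression with a constant). It compares how often the rule flags a regression between two independent random walks with how often it flags a regression between two independent white-noise series.

```python
import random


def random_walk(T, rng):
    """Driftless random walk with N(0, 1) increments, started at zero."""
    y, level = [], 0.0
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        y.append(level)
    return y


def white_noise(T, rng):
    """IID N(0, 1) series."""
    return [rng.gauss(0.0, 1.0) for _ in range(T)]


def r2_and_dw(y, z):
    """R-squared and Durbin-Watson statistic from OLS of y on a constant and z."""
    T = len(y)
    ybar, zbar = sum(y) / T, sum(z) / T
    szz = sum((b - zbar) ** 2 for b in z)
    syy = sum((a - ybar) ** 2 for a in y)
    b1 = sum((b - zbar) * (a - ybar) for a, b in zip(y, z)) / szz
    resid = [(a - ybar) - b1 * (b - zbar) for a, b in zip(y, z)]
    rss = sum(e * e for e in resid)
    # DW = sum of squared first differences of residuals over the RSS.
    dw = sum((e1 - e0) ** 2 for e0, e1 in zip(resid, resid[1:])) / rss
    return 1.0 - rss / syy, dw


def share_flagged(make_series, T=100, N=1000, seed=0):
    """Return (share of replications with R^2 > DW, mean DW)."""
    rng = random.Random(seed)
    flagged, dw_total = 0, 0.0
    for _ in range(N):
        r2, dw = r2_and_dw(make_series(T, rng), make_series(T, rng))
        flagged += r2 > dw
        dw_total += dw
    return flagged / N, dw_total / N


if __name__ == "__main__":
    print("random walks:", share_flagged(random_walk))
    print("white noise: ", share_flagged(white_noise))
```

With white-noise series the DW statistic stays near 2 and the rule is almost never triggered; with independent random walks the DW statistic is typically far below 2, which is what makes the comparison with $R^2$ informative.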
Overall, simulation and analytical results show that the problem of drawing inference from non-stationary data is a serious one; OLS regression interpreted in the standard fashion can be very misleading. Resolution of this problem will lead us into a more detailed consideration of the integration properties of time series, but first we will examine the practice of de-trending time series.

3.2. Trends and Random Walks

One potential solution suggested for dealing with integrated series was to assume that the source of non-stationarity could be captured by, or approximated by, a deterministic function of time. If this were so, it would be possible to break up an integrated series into a deterministic (and therefore completely predictable) component, and a stationary series of deviations from this 'trend'. Methods for analysing stationary series could be applied to the deviations, and the whole series thereby modelled.
Unfortunately, subsequent evidence from Monte Carlo and analytical studies (e.g. Phillips 1986) showed that inference in models that contained time trends could not be carried out in the straightforward way that practitioners had hoped. First of all, time trends would appear to be statistically significant in models where they should not be, much more often than conventional test sizes would suggest. That is, the standard statistics (especially t-statistics) for the hypothesis that the time trend should not appear do not have standard t-distributions.

Second, deterministic trends did not solve the spurious regression problem, even leaving aside the difficulty involved in deciding whether or not they should be present in the regression model. The reason is that spurious correlation will tend to emerge even with deterministically 'de-trended' random walks.

We will now look at some more precise questions and their answers. The analytical results that we summarize are found in Durlauf and Phillips (1988); Monte Carlo studies of models with time trends present can be found in Said and Dickey (1984) and Schwert (1989). Section 3.5.1 describes the asymptotic theory applicable.

The two questions that we will address are: (i) What problems of inference appear in using time trends? and (ii) Can de-trending yield stationary series and therefore a solution to the problem of spurious regression?
Consider a series $\{y_t\}$ which is generated according to the random walk

$$y_t = y_{t-1} + u_t. \tag{8}$$

An investigator faced with such a series (without, of course, knowing this data-generation process precisely) might decide to attempt to deal with the apparent non-stationarity by de-trending: that is, by including a time trend in a regression equation or by removing the fitted values from a regression on time from the series. The investigator might therefore use the regression model

$$y_t = c + \gamma t + e_t. \tag{9}$$

As Durlauf and Phillips (1988) show, there are once again problems in conducting inference in this environment. When $c = \gamma = 0$, by (8), $\hat\gamma$ has a degenerate limiting distribution at 0 (as in a stationary model with a trend), whereas $\hat c$ has a divergent distribution; that is, the unscaled parameter estimate $\hat c$ has a variance that grows with the sample size. We will deal more rigorously with these limiting distributions later in the chapter.
Moreover, inference concerning $\gamma$ will be unreliable even though the estimate of that parameter is converging to its true value of zero. While the parameter estimate converges to zero, the t- and F-statistics for the hypothesis $H_0\colon \gamma = 0$ do not converge to zero, and are in fact asymptotically unbounded with probability 1. (That is, there exists some $\delta > 0$ such that, for $\tau$ representing either of the test statistics, $T^{-\delta}|\tau| \to \infty$ with probability 1.) As in the spurious regression case above, the investigator must look to mis-specification tests (in particular, tests for autocorrelated errors) for a suggestion that there is something wrong with the regression model.
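A small simulation makes the point about the time-trend t-ratio concrete. The Python sketch below is illustrative only: the names `trend_t_ratio` and `trend_rejection_rate` are hypothetical, and the data are assumed to be driftless random walks with standard normal increments, so that the true trend coefficient is zero. It records how often the conventional 5 per cent test would declare the trend significant.

```python
import math
import random


def trend_t_ratio(y):
    """Conventional t-ratio on the trend coefficient in OLS of y_t on (1, t)."""
    T = len(y)
    tbar = (T + 1) / 2.0
    ybar = sum(y) / T
    stt = sum((t - tbar) ** 2 for t in range(1, T + 1))
    g = sum((t - tbar) * (yt - ybar) for t, yt in enumerate(y, start=1)) / stt
    rss = sum((yt - ybar - g * (t - tbar)) ** 2
              for t, yt in enumerate(y, start=1))
    return g / math.sqrt(rss / (T - 2) / stt)


def trend_rejection_rate(T, N=1000, seed=0):
    """Share of driftless random walks with |t| > 1.96 on the time trend."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(N):
        y, level = [], 0.0
        for _ in range(T):
            level += rng.gauss(0.0, 1.0)
            y.append(level)
        rejections += abs(trend_t_ratio(y)) > 1.96
    return rejections / N


if __name__ == "__main__":
    for T in (25, 100, 400):
        print(T, trend_rejection_rate(T))
```

A rejection rate far above 0.05 at every sample size is what one would expect if the t-ratio is unbounded asymptotically, as described above.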
Since the spurious regression problem between integrated series remains with deterministically de-trended series, inclusion of a time trend is not a solution. Consider again the DGP (1)-(2), and an investigator who chooses this time to attempt to 'take account of' the potential non-stationarity in these series by including a time trend in the regression. The model is therefore

$$y_t = c + \gamma t + \beta z_t + u_t. \tag{10}$$

The results from (10) are much as one would expect given those implied by (3) and (9) above (see, again, Durlauf and Phillips 1988). As before, the distribution of $\hat c$ diverges and $\hat\gamma$ tends in probability to zero, but $\hat\beta$ has a non-degenerate distribution asymptotically (i.e. does not converge to zero). Tests for $H_0\colon \beta = 0$ diverge in distribution, tending to lead the investigator falsely to reject this null hypothesis. Estimation of the regressions in (9) and (10) will produce substantial residual autocorrelation. It might be thought that modelling the autoregressive error using, say, the Cochrane-Orcutt algorithm should remove the unit root and thereby allow valid tests of $\beta = 0$ in (10). Granger and Newbold (1977) present Monte Carlo evidence suggesting that such a strategy is ineffective in practice when based on conventional critical values.
In summary, the problem of falsely concluding that a relationship exists between two unrelated non-stationary series, a problem that persists even as the sample size grows without bound, is not alleviated by an attempt to remove a trend from the underlying series.
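The persistence of the problem after de-trending can be checked with a sketch along the following lines. It is a hedged illustration: the names `detrend`, `detrended_t_ratio`, and `detrended_rejection_rate` are hypothetical, both series are assumed to be independent driftless random walks with standard normal increments, and the coefficient on the second series in a regression that also contains a constant and a linear trend is obtained by the Frisch-Waugh device of regressing de-trended y on de-trended z.

```python
import math
import random


def detrend(x):
    """Residuals from OLS of x on a constant and a linear trend t = 1..T."""
    T = len(x)
    tbar = (T + 1) / 2.0
    xbar = sum(x) / T
    stt = sum((t - tbar) ** 2 for t in range(1, T + 1))
    slope = sum((t - tbar) * (xt - xbar) for t, xt in enumerate(x, start=1)) / stt
    return [xt - xbar - slope * (t - tbar) for t, xt in enumerate(x, start=1)]


def detrended_t_ratio(y, z):
    """t-ratio on z in OLS of y on (1, t, z), computed via Frisch-Waugh."""
    T = len(y)
    ey, ez = detrend(y), detrend(z)
    szz = sum(v * v for v in ez)
    b = sum(a * c for a, c in zip(ey, ez)) / szz
    rss = sum((a - b * c) ** 2 for a, c in zip(ey, ez))
    return b / math.sqrt(rss / (T - 3) / szz)  # three estimated coefficients


def detrended_rejection_rate(T=100, N=1000, seed=0):
    """Share of replications with |t| > 1.96 for two unrelated random walks."""
    rng = random.Random(seed)

    def walk():
        out, level = [], 0.0
        for _ in range(T):
            level += rng.gauss(0.0, 1.0)
            out.append(level)
        return out

    return sum(abs(detrended_t_ratio(walk(), walk())) > 1.96
               for _ in range(N)) / N


if __name__ == "__main__":
    print(detrended_rejection_rate())
```

If de-trending solved the problem, the printed frequency would be close to 0.05.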
In working with non-stationary data, the investigator must be particularly careful. While one solution is to transform the series to achieve stationarity (at the cost of losing some information about long-run behaviour, as we shall see below), it is essential that the investigator be aware of the non-stationarity in the data if procedures for modelling data of this type are to be applied appropriately. As it happens, testing for non-stationarity is also potentially misleading, in that non-standard distributions appear where the data are non-stationary, so that inferential procedures must differ from those applicable when the series are stationary.

Our discussion has therefore led us to two major areas which must be understood when working with potentially non-stationary data. The first is composed of techniques for determining whether or not series are stationary (more generally, the order of integration of a series). Chapter 4 will concentrate on these techniques, which we use to decide whether methods of inference for non-stationary data are necessary to overcome the problems that have been illustrated to this point. Methods that can be used with non-stationary data comprise the second area that we should examine, and form the subject matter of Chapter 6. Moreover, it must be noted that, in spite of the inadequacy of deterministic trends as models for series that are in fact random walks, it remains conceivable that economic time series do actually contain such deterministic components; some of the tests that we consider later will allow for this possibility.

3.3. Some Statistical Features of Integrated Processes

Before we consider testing for integration in time series, we must first define orders of integration and consider some of the properties that integrated series usually display.
DEFINITION 1.⁶ A series with no deterministic component and which has a stationary and invertible autoregressive moving average (ARMA) representation after differencing d times, but which is not stationary after differencing only d - 1 times, is said to be integrated of order d, denoted $x_t \sim I(d)$.

The definition can be extended to allow for polynomials in time of the form $\sum_{i=0}^{p} \mu_i t^i$. When $\Delta^d x_t$ contains a polynomial of order p in time, $x_t$ depends on a polynomial of order p + d.
substantially fro m thos e o f 1(0 ) series . Conside r a serie s containin g a
single unit root :

6
Thi s definitio n i s simila r t o tha t o f Engl e an d Grange r (1987) , bu t rule s ou t som e
anomalies. Conside r th e stationary , I(1) , serie s z , = et ,_1; wher e e, is 1(0) . Integrat ing {z,} give s a serie s tha t i s 1(0) ; bu t i f we cal l {z, } itsel f a n 1(0 ) series , the n w e woul d
expect its integral {ej t o be 1(1).

Properties o f Integrated Processe s 8

or, afte r integrating ,


where S, = E/=oPX-;- I f p > 1, y < is non-stationary, an d i f p = 1, it is
integrated o f orde r 1 (i.e. 1(1) ) sinc e y, is then th e su m o f al l previou s
errors {u/}, j = 1, . . ., t . Th e sequenc e {u t} need no t b e a n innovatio n
sequence; u, ma y itsel f follo w a stationar y ARMA(p , g) process , fo r
example. Belo w w e wil l assum e a fairl y genera l se t o f propertie s fo r th e
{ut} process . First , however , w e consider tw o special case s o f (llfl) :

and
In (12) , t o ensur e stationarity , le t u s assum e tha t y 0 i s draw n fro m th e
unconditional distribution o f y; that is, y0 ~ IID[0, a\/(l - p 2)].
It is interesting to compare several properties of these series, viewed as possible DGPs. Table 3.2 summarizes some of the differences between autoregressive series that are stationary, and those containing one (or more) unit roots (which require differencing to be made stationary). The properties in the right-hand column of the table hold for integrated series generally. Nonetheless, the specification (13) is a special one, and in a general treatment we want a less restrictive specification which will cover a greater variety of series. We can find one by adopting (11), for example, but the properties of the error term remain to be specified since (11a) requires only that it be I(0). We do not, however, wish to adopt the very restrictive specification in (12) and (13), whereby the error is required to be orthogonal to its own past. However some restrictions must be placed on the errors to guarantee non-degenerate limiting distributions for the statistics described below. A weak set of restrictions which suffices for many purposes is given below and is discussed in detail by Phillips (1987a); the model (11), supplemented with error terms $\{u_t\}$ required to meet only these conditions, is capable of representing a wide variety of univariate data-generation processes, including those with exogenous variables, as long as the exogenous variables are I(0) and so are capable of being subsumed in $\{u_t\}$ in (11). These conditions are given in (16a)-(16d) below.

TABLE 3.2. Some properties of stationary and integrated processes

                                          DGP (12) (I(0))                          DGP (13) (I(1))
Variance                                  Finite ($\sigma_u^2(1-\rho^2)^{-1}$)      Unbounded (grows as $t\sigma_u^2$)
Conditional variance                      $\sigma_u^2$                              $\sigma_u^2$
Autocorrelation function at lag i         $\rho_i = \rho^i$                         $\rho_i = \sqrt{1 - (i/t)} \to 1\ \forall i$ as $t \to \infty$
Expected time between crossings of y = 0  Finite                                   Infinite
Memory(a)                                 Temporary                                Permanent(b)

(a) We say that a series has a permanent memory if the effect of a shock does not disappear as $t \to \infty$.
(b) In a multivariate context, an integrated process may have some components that do not remain in the series indefinitely. If a series is integrated, there must be at least one component that will have permanent effects, but there may be others with temporary memory. For example, a random walk process plus an unrelated stationary process would yield an integrated process, but memory would be permanent only for the random walk component.
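Some of the contrasts in Table 3.2 can be checked by simulation. The Python sketch below is an illustration under stated assumptions: the function names (`simulate_paths`, `variance_at`, `correlation_at`) are hypothetical, the errors are taken to be IID N(0, 1) so that $\sigma_u^2 = 1$, the stationary series is started from its unconditional distribution, and the random walk is started at zero. Moments are computed across replications at a fixed date t.

```python
import math
import random


def simulate_paths(rho, T, N, rng):
    """N paths (y_1, ..., y_T) of y_t = rho * y_{t-1} + u_t with u_t ~ N(0, 1)."""
    paths = []
    for _ in range(N):
        if abs(rho) < 1.0:
            # Stationary start: y_0 ~ N(0, 1 / (1 - rho^2)).
            y = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho * rho))
        else:
            y = 0.0  # random walk started at zero
        path = []
        for _ in range(T):
            y = rho * y + rng.gauss(0.0, 1.0)
            path.append(y)
        paths.append(path)
    return paths


def variance_at(paths, t):
    """Cross-replication variance of y_t (t is 1-based)."""
    vals = [p[t - 1] for p in paths]
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)


def correlation_at(paths, t, lag):
    """Cross-replication correlation between y_t and y_{t-lag}."""
    a = [p[t - 1] for p in paths]
    b = [p[t - 1 - lag] for p in paths]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    rng = random.Random(0)
    rw = simulate_paths(1.0, 100, 2000, rng)
    ar = simulate_paths(0.5, 100, 2000, rng)
    print("Var(y_25), Var(y_100), random walk:", variance_at(rw, 25), variance_at(rw, 100))
    print("Var(y_25), Var(y_100), AR(1):      ", variance_at(ar, 25), variance_at(ar, 100))
    print("corr(y_100, y_90), random walk:", correlation_at(rw, 100, 10))
```

For the random walk the variance should be close to t and the lag-10 correlation at t = 100 close to $\sqrt{0.9}$; for the stationary series with $\rho = 0.5$ the variance should stay near $1/(1 - 0.25)$ at every date.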
Series that are I(0) have the important property that certain functions of the sample values converge to constants as the number of sample values increases without bound. For example, laws of large numbers (see e.g. White 1984) guarantee the convergence in probability of the sample mean to the true mean of the process for a class of processes that includes stationary time series. Other functions of the sample can have constant probability limits as well; for example, a variance estimator may converge in probability to the true variance of the series. One of the primary facts about integrated processes, however, is that convergence theorems of this type, where convergence is to constants, generally fail to hold, and such convergence theorems as can be derived will involve convergence of sample moments to random variables. Analytical results concerning limiting distributions must therefore be based on an extended asymptotic theory.

For a vector time series $x_t$ with n components, we define $x_t \sim I(d)$ if d is the highest order of integration of the individual series: $x_{it} \sim I(d_i)$ and $d = \max(d_1, d_2, \ldots, d_n)$.

3.4. Asymptotic Theory for Integrated Processes

We will now review and develop some of the asymptotic theory appropriate to integrated random variables. We use the Wiener processes introduced in Chapter 1, so that the properties of estimators and test statistics for I(1) series will be more readily interpretable. Most of our attention will be devoted to the statistical properties of series containing a single unit root (i.e. I(1) processes), extending to the more general I(d) class only where necessary.

Begin by considering the following data generation process:

$$y_t = y_{t-1} + \mu + u_t, \tag{14}$$

where $\{u_t\}_1^\infty$ is a weakly stationary, mean-zero innovation sequence. After integrating the process in (14),

$$y_t = y_0 + \mu t + S_t, \qquad S_t = \sum_{j=1}^{t} u_j. \tag{15}$$

In general, I(1) series such as $y_t$ are linear functions of time, with a slope of zero where $\mu = 0$. The deviations from this function of time are I(1), being the accumulation of past random shocks: the effects of these shocks do not die out. For example, let $u_t \sim \mathrm{IN}(0, 1)$. Then, for $0 \le \tau < T$, we have that $E(S_T - S_\tau) = 0$, and

$$E[(S_T - S_\tau)^2] = E\Bigl[\sum_{t=\tau+1}^{T} u_t^2\Bigr] = T - \tau,$$

because the cross-products of independent shocks have zero expectation and $\sum_{t=\tau+1}^{T} u_t^2$ is distributed as $\chi^2$ with $T - \tau$ degrees of freedom. Hence $S_T \sim N(0, T)$, a random walk with independent normally distributed increments.
In general, the formulation in (14) need not assume that the $\{u_t\}$ are white-noise disturbances, but only that they satisfy conditions given in (16) below. To complete the specification of the DGP, we impose these restrictions on $\{u_t\}_1^\infty$. The conditions are strong enough to sustain the derivation of non-degenerate limiting distributions for the statistics to be discussed below and weak enough to be relevant for many economic time series. This set of conditions is defined in detail in Phillips (1987a), and can be summarized as follows.

Let $\{u_t\}_1^\infty$ be a stochastic process such that, for $S_T = \sum_{t=1}^{T} u_t$,

$E(u_t) = 0$ for all t; (16a)
$\sup_t E(|u_t|^\beta) < \infty$ for some $\beta > 2$; (16b)
$\sigma^2 = \lim_{T \to \infty} E(T^{-1} S_T^2)$ exists, and $\sigma^2 > 0$; (16c)
$u_t$ is strongly mixing, with mixing coefficients $\{\alpha_m\}$ such that $\sum_{m=1}^{\infty} \alpha_m^{1 - 2/\beta} < \infty$. (16d)

For stationary $\{u_t\}$, $\sigma^2$ can be written as

$$\sigma^2 = E(u_1^2) + 2 \sum_{k=2}^{\infty} E(u_1 u_k).$$
Each of these conditions relates to an important aspect of the behaviour of the $\{u_t\}$ process. The first, in (16a), is the conventional one of having a zero unconditional mean such that all drawings of $\{u_t\}$ have the same mean. Next, (16b) is sufficient to ensure the existence of the variance and a higher non-integer moment of $\{u_t\}$ $\forall t$. However, it is a weak condition in that $E(|u_t|^\beta)$ is not assumed to be constant, so that heterogeneity is allowed in the error process. Often, third or even fourth moments will be assumed to exist, thereby ensuring that (16b) holds: normality, for example, entails that all moments of finite order exist. The third condition is needed to ensure non-degenerate limiting distributions, and either (16c) or a closely related condition is required in most central limit theorems to guarantee that information continues to accrue. Finally, we discussed mixing conditions in Chapter 1, and these serve as a useful intermediate assumption which ensures ergodicity yet allows a considerable degree of temporal dependence in the $\{u_t\}$ process. The $\beta$ in (16b) is the same as that in the mixing condition (16d): the more heterogeneity that is allowed, the less the possible temporal dependence, and vice versa.
These conditions imply that the process generating the error term in
(14) may take any one of a large number of forms. Possible examples
include most stationary ARMA models, and ARMAX models where the
exogenous variables are I(0). Note that $\sigma^2 = \sigma_u^2$ only if the error term in
(14) is IID(0, $\sigma_u^2$). This restrictive case is of interest in that it is the case
for which most limiting distributions have been tabulated; nevertheless,
it will not hold in many empirical applications.7 For example, if $u_t$ is
the MA(1) process $u_t = \varepsilon_t - \theta \varepsilon_{t-1}$, then $\sigma_u^2 = \sigma_\varepsilon^2 (1 + \theta^2)$, whereas
$\sigma^2 = \sigma_\varepsilon^2 (1 - 2\theta + \theta^2) = \sigma_\varepsilon^2 (1 - \theta)^2$.
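The MA(1) case can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the simulation settings (sample size, number of replications, seed) are our own assumptions. It compares the closed-form short-run and long-run variances with a Monte Carlo estimate of $E(T^{-1} S_T^2)$.

```python
import numpy as np

def ma1_variances(theta, sigma_eps=1.0):
    """Return (sigma_u^2, sigma^2) for u_t = eps_t - theta * eps_{t-1}."""
    gamma0 = sigma_eps**2 * (1.0 + theta**2)  # E(u_t^2)
    gamma1 = -theta * sigma_eps**2            # E(u_t u_{t-1}); higher lags are zero
    return gamma0, gamma0 + 2.0 * gamma1

def simulated_long_run_variance(theta, T=500, reps=2000, seed=0):
    """Monte Carlo estimate of E(T^{-1} S_T^2) with standard normal eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((reps, T + 1))
    u = eps[:, 1:] - theta * eps[:, :-1]      # MA(1) errors
    S_T = u.sum(axis=1)                       # partial sum over the full sample
    return float(np.mean(S_T**2) / T)

sigma_u2, sigma2 = ma1_variances(0.5)
print(sigma_u2, sigma2, simulated_long_run_variance(0.5))
```

With $\theta = 0.5$ the two variances differ by a factor of five, which illustrates why critical values tabulated for the IID case need not apply when the errors are autocorrelated.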
As noted above, ordinary probability limits and central limit theorems
do not apply in the case of integrated processes I($d$), $d \ge 1$. In order to
derive limiting distributions, it is necessary as in the stationary case to
use sequences of random variables, the convergence of which is ensured
by appropriate transformations. The evolution of a time-series process
dominated by a growing secular component can be suitably smoothed by
a choice of horizontal and vertical axes which control for explosivity and
curvature, respectively. More precisely, in the I(1) framework, we need
to focus on the sequence $\{S_t\}$ which can be transformed such that each
element of the sequence lies in the space of real-valued functions on the
interval [0, 1] which are right-continuous, and have finite left limits; this
space is denoted D(0, 1). The transformation is achieved by substituting
a concentrated series for the stochastic component $S_t$ of the original
series. In particular, we will map a transformation of $S_t$ onto the Wiener
process. The first step, as we saw in Chapter 1, is to map the interval
[0, T] onto the fixed interval [0, 1] by dividing the latter into T + 1 parts
at 0, 1/T, 2/T, ..., 1; next, we construct a new random function on
[0, 1] (see Phillips 1987a).
A suitable concentrated series is then

$$R_T(r) = \frac{S_{[Tr]}}{\sigma \sqrt{T}} = \frac{S_{t-1}}{\sigma \sqrt{T}}, \qquad (17)$$
with $(t - 1)/T \le r < t/T$ and $t = 1, 2, \ldots, T$, so that $r \in [0, 1]$. Here
$[z]$ represents the integer part of any rational number $z$. In this way we
are able to concentrate the original horizontal axis of 1 to T to the
closed interval [0, 1], indexing the observations by $r$. If, for example,
T = 100, the original observation $y_{50}$ will be indexed by $r \in [0.50, 0.51)$,
and so on. The choice of the power of T in the denominator of (17) is
such that the series $R_T$ is neither explosive nor converges to zero. Since,
for example, when $u_t$ is IID(0, $\sigma_u^2$), then var($S_T$) = $\sigma_u^2 T$, the standard
deviation of $S_T$ will be $O(T^{1/2})$, and this is precisely the power chosen
to modify the ordinate axis.

7 The parameter $\sigma^2$ has a clear interpretation in the frequency domain: it is equal to
$2\pi f_u(0)$, where $f_u(0)$ is the spectral density at frequency zero.
We then have that, as T grows without bound,

$$R_T(r) \Rightarrow W(r). \qquad (18)$$

The symbol $\Rightarrow$ is used here to signify weak convergence of the
associated probability measure,8 while $W(r)$ is a scalar Wiener process
with variance $r$, also known as a Brownian motion process, which lies in
the space C[0, 1] of all real-valued continuous functions on the interval
[0, 1]. Result (18) is known as Donsker's theorem; interested readers are
referred to Billingsley (1968) for details and proof.
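A small simulation makes the scaling concrete. The sketch below is our own illustration: it assumes the concentrated series is the partial sum $S_{[Tr]}$ divided by $\sigma \sqrt{T}$, takes IID standard normal errors, and uses arbitrary simulation sizes. It checks that the variance of the scaled partial sum at a fixed $r$ is close to $r$, the variance of $W(r)$.

```python
import numpy as np

def concentrated_series(u, sigma):
    """Scaled partial sums S_j / (sigma * sqrt(T)) for j = 0, 1, ..., T."""
    T = len(u)
    S = np.concatenate(([0.0], np.cumsum(u)))  # S_0 = 0, S_1, ..., S_T
    return S / (sigma * np.sqrt(T))

def variance_at(r, T=400, reps=4000, seed=1):
    """Sample variance across replications of the series evaluated at index [T r]."""
    rng = np.random.default_rng(seed)
    idx = int(np.floor(T * r))
    draws = np.array(
        [concentrated_series(rng.standard_normal(T), 1.0)[idx] for _ in range(reps)]
    )
    return float(draws.var())

for r in (0.25, 0.5, 1.0):
    print(r, variance_at(r))
```

The printed variances should lie near 0.25, 0.5, and 1, in line with a Wiener process whose variance at $r$ equals $r$.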
An extension of the Slutsky theorem in conventional asymptotic
theory (see e.g. White 1984) also applies in this framework, in the sense
that, if $g(\cdot)$ is any continuous functional on C[0, 1], then $R_T(r) \Rightarrow W(r)$
implies that

$$g(R_T(r)) \Rightarrow g(W(r)).$$

This result is called the continuous mapping theorem (see Billingsley
1968).
The most striking difference between conventional asymptotic theory
and this theory appropriate to integrated processes is that, whereas in
the former the sample moments converge to constants, in the latter
suitably normalized sample functions converge to random variables.
Similarly, as a result of the absence of stationarity and ergodicity in the
series $\{y_t\}$, traditional central limit theorems are replaced by functional
central limit theorems (FCLT).
A useful contrast between this asymptotic theory and that applicable
to stationary processes is provided by the distribution of the sample
mean considered in Chapter 1. Rewrite (14) as

and consider the behaviour of the last term for $\rho < 1$ and $\rho = 1$
respectively. In the former case, this term is I(0) and a straightforward
application of a Law of Large Numbers (again see, e.g., White 1984)
will show that

since $E(u_{t-i}) = 0$. In the I(1) case, when $\rho = 1$ this last term is given by
$S_t = \sum_{i=1}^{t} u_i$, and can be written in terms of the corresponding Wiener
process using the standardized sum (see Phillips 1986 and Sect. 1.5.6):9

Similarly:

Since

Thus:

Note the difference between the orders of magnitude of these limiting
distributions and the conventional stationary distributions: i.e. $O_p(T^{3/2})$
in (21) instead of $O_p(T)$, $O_p(T^2)$ in (22) instead of $O_p(T)$, $O_p(T)$ in
(23) instead of $O_p(T^{1/2})$, and $O_p(T^{5/2})$ in (24) instead of $O_p(T^{3/2})$.
These differences are behind a number of unconventional features of the
distributions of test statistics for hypotheses involving integrated series.

8 This concept, used in function spaces, is analogous to convergence in distribution for
ordinary random variables. See Hall and Heyde (1980).

$E(u_t)$, given the restrictions embodied in (16).
Many of the functionals to which these sample moments converge can
be expressed in terms of normal densities. Table 3.3 provides a set of
distributional results for a number of these functionals for IID errors
with unit variance. Section 1.5.6 and the appendix to Chapter 1 provide
examples of the method of proof of these results by showing that the
sample moment in example 1 of Table 3.3 converges to both the
functional $\int_0^1 W(r)\,dr$ and the density N(0, 1/3), implying that the
functional must have this density (also see Phillips 1987a, b, and Chan and
Wei 1988).
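The N(0, 1/3) result can be illustrated by simulation. The sketch below is an illustration under our own assumptions: a driftless random walk with $y_0 = 0$, IID standard normal errors, and arbitrary simulation sizes. It draws the normalized sum of lagged levels $T^{-3/2} \sum_{t=1}^{T} y_{t-1}$ and reports its sample mean and variance.

```python
import numpy as np

def normalized_sum_of_levels(T=400, reps=5000, seed=2):
    """Draws of T^{-3/2} * sum_{t=1}^{T} y_{t-1} for a random walk with y_0 = 0."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((reps, T))
    y = np.cumsum(u, axis=1)                               # y_1, ..., y_T
    y_lag = np.hstack([np.zeros((reps, 1)), y[:, :-1]])    # y_0, ..., y_{T-1}
    return y_lag.sum(axis=1) / T**1.5

draws = normalized_sum_of_levels()
print(draws.mean(), draws.var())
```

The sample variance should be close to 1/3 and the mean close to zero, while the same sum normalized by $T$ rather than $T^{3/2}$ would have a variance that grows with the sample size.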

3.5. Using Wiener Distribution Theory

We now present two examples of the application of the asymptotic
distribution theory for integrated processes to help understand regression
with non-stationary data. Recall that results on sums of powers of
trend terms are summarized in Section 1.5.5 above, and that the
relationships among sample moments, functionals of Wiener processes,
and densities from the normal family are summarized in Table 3.3.
TABLE 3.3. Convergence results for normalized sample moments^a

    Functional                                               Density                   Sample moment^b
1:  $\int_0^1 W(r)\,dr$                                      N(0, 1/3)                 $T^{-3/2} \sum_{t=1}^{T} y_{t-1}$
2:  $\int_0^1 r\,dW(r)$                                      N(0, 1/3)                 $T^{-3/2} \sum_{t=1}^{T} t u_t$
3:  $W(1)$                                                   N(0, 1)                   $T^{-1/2} \sum_{t=1}^{T} u_t$
4:  $\int_0^1 W(r)\,dW(r)$                                   $(1/2)(\chi^2(1) - 1)$    $T^{-1} \sum_{t=1}^{T} y_{t-1} u_t$
5:  $[\int_0^1 W(r)^2\,dr]^{-1/2} \int_0^1 W(r)\,dV(r)$      N(0, 1)
6:  $\int_0^1 (r - a) W(r)\,dr$                              N(0, $\Upsilon$)          $T^{-5/2} \sum_{t=1}^{T} t y_t$  (a = 0)
    where $\Upsilon = (1/60)(8 - 25a + 20a^2)$

a In example 5, $V(r)$ is another Wiener process independent of $W(r)$. Note
that a special case of example 6, which we will use later, is a = 0, which yields a
density of N(0, 2/15).
b These are examples of sample moments which converge to the corresponding
functionals in the first column for $y_0 = \mu = 0$ and $\sigma^2 = 1$.

3.5.1. Example: Spurious De-trending (Durlauf and Phillips 1988)

Let $\{y_t\}$ be generated as in (14) above; then
$$y_t = y_0 + \mu t + S_t. \qquad (25)$$
Consider the model
$$y_t = c + \gamma t + u_t. \qquad (26)$$
This is a model which fails to take account of the presence of the
stochastic trend in the data series and thereby attempts to de-trend
spuriously.
The OLS estimator of $c$ in (26) is

Substituting (25) into (27) and rearranging, we obtain

However, by (21);

by (24);
also;

The density of this functional can be found from example 6 in Table 3.3,
by substituting a = 2/3; it reduces to N(0, $2\sigma^2/15$). Note in particular
that $\hat{c}$ has a divergent limiting distribution.
Similarly, the OLS estimate of $\gamma$ in (26) is

Using (25) and rearranging yields

Further,

It then follows, from the limiting results given above, that

where the last equality follows from setting a = 1/2 in example 6 of
Table 3.3. Using similar techniques, Durlauf and Phillips (1988) show
that $T^{-1/2} t_{\hat{c}}$, $T^{-1/2} t_{\hat{\gamma}}$, $T^{-1} \hat{\sigma}_u^2$, $R^2$, and $T \cdot DW$ have functionals of
Wiener processes as their asymptotic distributions.10 Since the estimated
coefficient on the trend converges to $\mu$, as suggested by (29), and as the
distribution of its t-statistic is divergent, interpreting the results at face
value will lead the investigator to suppose that the trend is an important
determinant of the series $\{y_t\}$. In fact, the series would be better
modelled with a stochastic trend as in (25), which would lead to a
stationary residual series.
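The divergence of the trend t-statistic is easy to see in a simulation. The following sketch is our own illustration and not the authors' computation: it assumes a driftless random walk with IID standard normal errors, regresses it on a constant and a linear trend by OLS, and compares the median absolute t-ratio on the trend at two sample sizes. The function names, sample sizes, and seed are arbitrary choices.

```python
import numpy as np

def trend_t_statistic(y):
    """OLS of y_t on a constant and t; return (slope, conventional t-ratio of the slope)."""
    T = len(y)
    X = np.column_stack([np.ones(T), np.arange(1, T + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (T - 2)           # residual variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)      # conventional OLS covariance matrix
    return coef[1], coef[1] / np.sqrt(cov[1, 1])

def median_abs_t(T, reps=500, seed=3):
    """Median |t-ratio| on the trend when the data are a pure random walk."""
    rng = np.random.default_rng(seed)
    t_vals = [
        abs(trend_t_statistic(np.cumsum(rng.standard_normal(T)))[1])
        for _ in range(reps)
    ]
    return float(np.median(t_vals))

print(median_abs_t(100), median_abs_t(1600))
```

Although there is no deterministic trend in the simulated data, the typical t-ratio is far above conventional critical values and grows roughly with $T^{1/2}$ as the sample lengthens.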
3.5.2. Example: Spurious Regression (see Phillips 1986)
Let $\{y_t\}_1^\infty$ and $\{x_t\}_1^\infty$ be generated as pure random walks:

The spurious regression model is

In order to derive the asymptotic distributions of the estimators and test
statistics for (30), it is convenient to define $W_u(r)$ and $W_\varepsilon(r)$ as the
independent Wiener processes on C[0, 1] obtained from cumulating the
$\{u_t\}_1^\infty$ and $\{\varepsilon_t\}_1^\infty$ series, respectively. Let $\bar{x}$ and $\bar{y}$ be the sample means
of the $\{x_t\}$ and $\{y_t\}$ series. Then

From (21),

From (22),

It may also be shown, using the same method of proof, that

Substituting (32)-(35) into (31), it follows that

Also,

From (21) and (36),

The spurious regression problem becomes clear upon inspection of (36).
The true value of the derivative of $y_t$ with respect to $x_t$ is zero because
the errors generating the $\{x_t\}$ and $\{y_t\}$ series in the regression (30) are
independent. Yet $\hat{\beta}$ fails to converge in probability to zero and instead
has a non-degenerate distribution.
Using similar techniques, Phillips (1986) shows that $T^{-1/2} t_{\hat{\beta}}$ has a
non-degenerate distribution, or in other words that the t-statistic for $\beta$
has a divergent distribution. Hence as $T \to \infty$, the probability of a
significant t-value arising in a regression such as (30) approaches 1,
leading to spurious inferences about the existence of a relationship
between $y_t$ and $x_t$ (see Banerjee and Hendry 1992, for an exposition).

10 $R$ is the multiple correlation coefficient of the estimated model, and DW is the
Durbin-Watson statistic computed from the $\hat{u}_t$.
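The tendency of the rejection frequency to approach 1 can be illustrated by Monte Carlo. The sketch below rests on our own assumptions: two independent driftless random walks with IID standard normal errors, a regression of one on a constant and the other, and a nominal two-sided 5 per cent test using the normal critical value 1.96. Names, sample sizes, and seed are arbitrary.

```python
import numpy as np

def slope_t_ratio(y, x):
    """Conventional OLS t-ratio on x in a regression of y on a constant and x."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (T - 2)
    return coef[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def rejection_frequency(T, reps=1000, seed=4):
    """Share of replications with |t| > 1.96 for two independent random walks."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        y = np.cumsum(rng.standard_normal(T))
        x = np.cumsum(rng.standard_normal(T))
        hits += int(abs(slope_t_ratio(y, x)) > 1.96)
    return hits / reps

print(rejection_frequency(50), rejection_frequency(800))
```

A correctly sized test would reject in about 5 per cent of replications; here the frequency is far higher and rises with the sample size, even though the two series are unrelated by construction.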

3.6. Near-integrated Processes

In later chapters we will deal with variables that are 'borderline-' or
'near-'integrated. By this we mean that the process generating the
variables has a root close to but not on the unit circle. Phillips (1987b)
presents asymptotic results for 'unit-root' and 'near-unit-root' processes
within a unified framework to explain the special properties of regressions
estimated using borderline-stationary variables and we follow his
approach.
Consider the AR(1) model
$$y_t = \rho y_{t-1} + u_t, \qquad (39)$$
where $u_t \sim$ IN(0, $\sigma^2$). When $|\rho| < 1$ and $y_0 \sim N[0, \sigma^2 (1 - \rho^2)^{-1}]$, $\{y_t\}$ is
a stationary process. When $\rho = 1$ and $y_0 = 0$, it is I(1) and non-stationary.
Apparently, therefore, there is a discontinuity at $\rho = 1$ where
stationarity disappears, and the constant unconditional variance
($\sigma^2 (1 - \rho^2)^{-1}$) becomes a trend ($t\sigma^2$).
In fact, if $y_0 = 0$ in (39) and $|\rho| < 1$ but is close to unity, say $\rho = 1 + \varepsilon$
with $\varepsilon < 0$ for small $\varepsilon$, then

and

Thus, the variance acts like a trend for finite $t$ when terms of $O(\varepsilon^2)$ or
smaller are negligible, and there is really no discontinuity in practical
terms: for sufficiently small $\varepsilon$ and finite $t$, the process behaves like an
I(1) process even though it is asymptotically stationary. Paraphrasing, in
finite samples, for $\varepsilon$ close to zero, a better approximation is to treat the
process as I(1) than as I(0), even though asymptotically, the expansion
for the variance above approaches a finite limit not dependent upon $t$.
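The point can be made with a short calculation. The sketch below is our own illustration: it assumes an AR(1) process $y_t = \rho y_{t-1} + u_t$ with $y_0 = 0$ and error variance $\sigma^2$, for which var($y_t$) $= \sigma^2 (1 - \rho^{2t})/(1 - \rho^2)$ when $\rho \ne 1$. The function name and the chosen values of $\rho$ and $t$ are hypothetical.

```python
def ar1_variance(rho, t, sigma2=1.0):
    """var(y_t) for y_t = rho * y_{t-1} + u_t with y_0 = 0 and var(u_t) = sigma2."""
    if rho == 1.0:
        return sigma2 * t                      # random walk: variance is a trend
    return sigma2 * (1.0 - rho ** (2 * t)) / (1.0 - rho ** 2)

rho = 1.0 - 0.001                              # epsilon = -0.001
for t in (10, 50, 100_000):
    print(t, ar1_variance(rho, t), ar1_variance(1.0, t))
```

For $t = 50$ the near-integrated variance is within about 5 per cent of the random-walk value of 50, whereas for very large $t$ it settles at the stationary value $\sigma^2/(1 - \rho^2)$, roughly 500 here.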
A more convenient parameterization of nearly integrated processes is
given by writing $\rho = \exp(\varepsilon/T)$, for $\varepsilon < 0$. This parameterization defines
a sequence of local alternatives to $\rho = 1$ for the process. When $\varepsilon = 0$,
$\rho = 1$, while $\rho$ is less than but close to unity for small $\varepsilon < 0$ and as
$T \to \infty$, $\rho \to 1$. A process with such a value of $\rho$ is called
'near-integrated' because for small negative $\varepsilon$ it behaves rather like an I(1)
process.11
There are three advantages to considering near-integrated time series.
The first is the link they provide between conventional asymptotic
distribution theory and the Wiener theory described above, stressing the
continuity of the breakdown in stationarity as a root approaches unity.
The sketch of the relevant theory provided below reinforces this
consideration. The second advantage is that the resulting theory may be
empirically more relevant than that deriving from the assumption of an
exact unit root. It is too early to reach a final judgement on that issue,
but the algebra below suggests that very similar finite-sample behaviour
would be observed in unit-root and near-integrated processes.
The final advantage, and the real reason for our interest, is that
near-integration is needed when examining the power functions of
unit-root tests against stationary local alternatives. Phillips (1988)
emphasizes this role, and Johansen (1991) and Haldrup and Hylleberg
(1991) present applications to deriving power functions. We describe
and draw upon some of their results in the next chapter when discussing
testing for a unit root.
Reconsider (39) with $\rho = \exp(\varepsilon/T)$, $y_0 = 0$, and with the $\{u_t\}_1^\infty$
sequence satisfying the set of conditions given by (16a)-(16d). In order
to derive the limiting distribution of $\hat{\rho}$, the OLS estimator of $\rho$, under
$H_0$, it is convenient to define the functional $K_\varepsilon(r)$:

$$K_\varepsilon(r) = \int_0^r \exp\{(r - s)\varepsilon\}\,dW(s).$$

$K_\varepsilon(r)$ is also known as an Ornstein-Uhlenbeck process and, for fixed $r$,
is distributed normally with mean zero and variance
$(1/2)\varepsilon^{-1}[\exp(2r\varepsilon) - 1]$.12 $K_\varepsilon(r)$ is a first-order diffusion process and is
closely related to $W(r)$. (See e.g. Grimmett and Stirzaker (1982) for
details.) It is like an error-correction process, having been generated by
the stochastic differential equation
$$dK_\varepsilon(r) = \varepsilon K_\varepsilon(r)\,dr + dW(r).$$
Using arguments analogous to those employed earlier in this chapter
to derive distributions for unit root processes, Phillips (1987b) proves
the following asymptotic results for (39) when $\rho = \exp(\varepsilon/T)$:13

11 See Chan and Wei (1988) and Phillips (1987b).
12 Note that $\lim_{\varepsilon \to 0} (\varepsilon^{-1}/2)[\exp(2r\varepsilon) - 1] = r$ (using L'Hopital's rule). This is as
expected because, as $\varepsilon \to 0$, $K_\varepsilon(r) \Rightarrow W(r)$, and for fixed $r$, $W(r) \sim N(0, r)$. Alternatively,
use a Taylor series expansion to give $\exp(2r\varepsilon) = 1 + 2r\varepsilon + O(\varepsilon^2)$ and the result follows.
13 The definitions of $S_{[Tr]}$, $S_t$, $\lambda$, and $\sigma^2$ are given in equations (14)-(23).

For example, to demonstrate (40), construct step-processes given by

and then show that

Using the power-series expansion for $\exp(\varepsilon/T)$,

Now, from (39),

Thus, from (43),

Finally, using (41) and (42) in (44),

When the non-centrality parameter $\varepsilon$ is set to zero, $K_\varepsilon(r) = W(r)$ and
the Dickey-Fuller distribution is recovered as a special case of (45).
Using the Dickey-Fuller distribution as a benchmark, it can also be
seen from (45) that the effects of near-integration are revealed in a shift
in location (given by $\varepsilon$) and a change in shape of the limiting
distribution of $\hat{\rho}$: $\hat{\rho}$ converges to 1 (which is the null value of $\rho$ as
$T \to \infty$) at rate $T^{-1}$. This is the usual Dickey-Fuller rate of
convergence: see Chapter 4.
Results in Banerjee and Dolado (1987) and Banerjee, Dolado, and
Galbraith (1990a) show that some of the important distributional
features for the near-integrated case (for example, the lower-tail critical
values) can be recovered from the Dickey-Fuller tables simply by
shifting the Dickey-Fuller distribution by fixed numbers. These results
suggest that, even in fairly large samples, the non-centrality parameter
in (45) is the most important determinant of the shape of the
distribution of $\hat{\rho}$. The more subtle distributional features, which involve changes
in shape and are given by the second part of (45), become relevant only
asymptotically.

Testing for a Unit Root


This chapter describes methods of testing for a unit root in an
observed series. Both parametric regression tests and non-parametric
adjustments to these test statistics are considered, and we give
the tables of critical values necessary for the application of
commonly used tests. We also use functionals of Wiener processes
to describe the asymptotic distributions of important test statistics.

Since an I(1) series becomes stationary upon being differenced once, it
must contain one unit root. For example, if we take a random walk as
the DGP, then we can immediately derive that its first difference is
stationary. If by contrast the underlying data-generating process is
$$y_t = \rho_1 y_{t-1} + u_{1t},$$
where $|\rho_1| > 1$, then we have
$$\Delta y_t = (\rho_1 - 1) y_{t-1} + u_{1t}. \qquad (1)$$
From (1) it is clear that $\Delta y_t$ is no longer stationary: it depends not
only upon the stationary process $u_{1t}$, but also upon the non-stationary
process $y_{t-1}$ (since $\rho_1 - 1 > 0$). Hence an AR(1) process with a
coefficient of 1 is I(1), but the same process with a coefficient of 1.01 is not,
since differencing will not reduce this process to stationarity.
Many economic time series may contain an exact unit root if we
consider logarithmic transformations of the form routinely applied to
economic time series. Otherwise, roots very close to, but slightly greater
than, unity imply non-stationary series that are not I($d$) for any $d$.
Roots slightly less than unity generate near-integrated series. Such
processes will tend to be difficult to distinguish from those with roots of
exactly unity on moderately sized samples; such processes are discussed
in Chapter 3. Roots substantially greater than unity, by contrast, will be
easily detected as the explosive character of the series will be clear with
even fairly small samples.
Consider the simplest data-generation process within which we can
discuss tests for unit roots:

If one were testing the true hypothesis $H_0\colon \rho = \rho_0$ for $\rho_0 < 1$, the
test would be easily performed. Running the regression (2), the
t-statistic $(\hat{\rho} - \rho_0)/SE(\hat{\rho})$ has, asymptotically, a standard normal distribution
and can be compared with tables of significance points for N(0, 1). In
small samples the statistic is approximately t-distributed, although the
coefficient estimate $\hat{\rho}$ is biased downward slightly.
For $\rho_0 = 1$, however, this result no longer holds. The distribution of
the test statistic just given is not asymptotically normal, or even
symmetric. Tables of critical values have been tabulated by D. A.
Dickey and are reported in, e.g. Fuller (1976). It is instructive to
examine these in detail, and they are recorded as Tables 4.1 and 4.2.
The critical values in Fuller's tables pertain to each of three different
models: it is important to note at the outset that, as in many other
instances, the distributions of test statistics obtained depend not only on
the data-generation process, but also on the model with which we
investigate it. For the time being, we will consider three possible
models:

The null hypothesis is that $\rho_i = 1$ for $i = a, b, c$. The applicability of
each model depends on what is known about the DGP, since we want to
construct similar tests (that is, tests for which the distribution of the test
statistic under the null hypothesis is independent of nuisance parameters
in the DGP). If a test is not similar, then the appropriate critical values
may depend upon unknown nuisance parameters (e.g. a constant),
which will invalidate standard inferences. We will return to the similarity
of tests below. For the moment, we will follow much of the literature on
the topic in assuming that (2) is the DGP, in which case the issue does
not arise since (2) contains no nuisance parameters.
Another formulation of the DGP deals with a potential difficulty that
arises from (2) concerning the status of the nuisance parameters under
the alternative $H_1\colon \rho < 1$. Reconsider (2) when there is an intercept $\phi$:
$\Delta y_t = \phi + (\rho - 1) y_{t-1} + \varepsilon_t$ where $\varepsilon_t \sim$ IID(0, $\sigma_\varepsilon^2$) for $t = 1, \ldots, T$.
When $H_0\colon \rho = 1$ is true, $y_t$ is a random walk with drift $\phi$, and hence it
has a trend for $\phi \ne 0$. When $H_0$ is false, $y_t$ is stationary (at least
asymptotically) around a constant mean of $\phi/(1 - \rho)$, but has no trend.
This is a rather asymmetric treatment and, if the data do have a trend,
does not do justice to the alternative. Adding a trend to the model
compensates, but induces a quadratic trend under the null.

A simple solution was proposed by Bhargava (1986) as follows. Write
the DGP as

which is a common factor model (see Sargan 1980, and Hendry and
Mizon 1978). Then

Now, for $H_0\colon \rho = 1$, $y_t$ is a random walk with no drift, whereas when
$\rho < 1$ it is stationary around a non-zero mean. Similarly, if $\gamma t$ is added
to the process,

so that

When $H_0\colon \rho = 1$ holds, $\Delta y_t = \gamma + \varepsilon_t$. Thus, a trend at rate $\gamma(1 - \rho)$ is
present under the alternative, and drift at rate $\gamma$ under the null.
Bhargava develops several tests based on this formulation. More
recently, Schmidt and Phillips (1992) have also investigated the properties
and powers of tests of $H_0\colon \rho = 1$ using this approach and find them
preferable, although the power functions cross those of corresponding
Dickey-Fuller tests. In practice, unfortunately, the powers of available
unit-root tests are low for alternatives different from, but close to, the
null of unity.
In interpreting Table 4.1, note that, if the sign of an entry in the table
is negative for a given size, say $\alpha$ (where $\alpha$ is the probability of a
smaller value), then at least a fraction $\alpha$ of estimates of $\rho$ are less than
1; for models (3b) and (3c), negative entries persist up to $\alpha = 0.95$ and
$\alpha = 0.99$ respectively in large samples. Although it is not explicit in this
table, entries even for model (3a) are negative at $\alpha = 0.50$. For all of
these models, then, most estimates of $\rho$ are less than 1; for the latter
two, the overwhelming majority are less than 1. This holds in spite of
the fact that the true value is 1: errors are far from symmetric around
zero.
Generally, $\hat{\rho}$ is a downwardly biased estimator of $\rho$; this is true for
any of the three models chosen. A test conducted by the method that
would typically be used for stationary processes (that is, a test based
upon the usual t- or asymptotic normal distribution applied to the
t-statistic $(\hat{\rho} - 1)/SE(\hat{\rho})$, at conventional critical values) therefore seems
likely to give misleading results. This can be confirmed by examining
Table 4.2, again taken from Fuller (1976) and originally constructed by
Monte Carlo simulation.
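The skewness of the estimator can be reproduced with a short Monte Carlo experiment. The sketch below is our own and makes explicit assumptions: a driftless random walk with $y_0 = 0$ and IID standard normal errors, and an OLS regression of $y_t$ on $y_{t-1}$ with no constant or trend. It does not reproduce Fuller's original computations; the function name, number of replications, and seed are arbitrary.

```python
import numpy as np

def normalized_bias_draws(T=100, reps=20000, seed=5):
    """Draws of T * (rho_hat - 1) from y_t = rho * y_{t-1} + u_t with true rho = 1."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((reps, T))
    y = np.cumsum(u, axis=1)                               # y_1, ..., y_T
    y_lag = np.hstack([np.zeros((reps, 1)), y[:, :-1]])    # y_0 = 0, ..., y_{T-1}
    rho_hat = (y_lag * y).sum(axis=1) / (y_lag**2).sum(axis=1)
    return T * (rho_hat - 1.0)

draws = normalized_bias_draws()
print("share of estimates below 1:", float(np.mean(draws < 0.0)))
print("5 per cent quantile:", float(np.quantile(draws, 0.05)))
```

Roughly two-thirds of the estimates fall below 1, and the simulated 5 per cent quantile for T = 100 should be in the neighbourhood of the tabulated value for the model without constant or trend.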

TABLE 4.1. Empirical cumulative distribution of $T(\hat{\rho} - 1)$
DGP: (2) with $\rho = 1$

Sample      Probability of a smaller value^a
size (T)     0.01    0.025    0.05    0.10    0.90    0.95   0.975    0.99

(a) Model (3a)/(8a)
25          -11.9    -9.3    -7.3    -5.3    1.01    1.40    1.79    2.28
50          -12.9    -9.9    -7.7    -5.5    0.97    1.35    1.70    2.16
100         -13.3   -10.2    -7.9    -5.6    0.95    1.31    1.65    2.09
250         -13.6   -10.3    -8.0    -5.7    0.93    1.28    1.62    2.04
500         -13.7   -10.4    -8.0    -5.7    0.93    1.28    1.61    2.04
∞           -13.8   -10.5    -8.1    -5.7    0.93    1.28    1.60    2.03

(b) Model (3b)/(8b)
25          -17.2   -14.6   -12.5   -10.2   -0.76    0.01    0.65    1.40
50          -18.9   -15.7   -13.3   -10.7   -0.81   -0.07    0.53    1.22
100         -19.8   -16.3   -13.7   -11.0   -0.83   -0.10    0.47    1.14
250         -20.3   -16.6   -14.0   -11.2   -0.84   -0.12    0.43    1.09
500         -20.5   -16.8   -14.0   -11.2   -0.84   -0.13    0.42    1.06
∞           -20.7   -16.9   -14.1   -11.3   -0.85   -0.13    0.41    1.04

(c) Model (3c)/(8c)
25          -22.5   -19.9   -17.9   -15.6   -3.66   -2.51   -1.53   -0.43
50          -25.7   -22.4   -19.8   -16.8   -3.71   -2.60   -1.66   -0.65
100         -27.4   -23.6   -20.7   -17.5   -3.74   -2.62   -1.73   -0.75
250         -28.4   -24.4   -21.3   -18.0   -3.75   -2.64   -1.78   -0.82
500         -28.9   -24.8   -21.5   -18.1   -3.76   -2.65   -1.78   -0.84
∞           -29.5   -25.1   -21.8   -18.3   -3.77   -2.66   -1.79   -0.87

a e.g., for model (3a) with T = 100, Pr[$T(\hat{\rho} - 1) < 1.65$] = 0.975. All entries
in the left half of the table have standard errors less than 0.15; those in the right
half, less than 0.03.
Source: Fuller (1976: 371).

This table gives the cumulative distribution of the t-statistic for
$H_0\colon \rho = 1$ in each of the models (3a)-(3c). It is especially interesting to
compare the results for each of these models with those we would
obtain with a stationary process; because the t-statistic would
asymptotically be distributed N(0, 1) in that case, the statistics would be
distributed as indicated in the last line of the table.
For model (3a), we see that the results approximate this outcome
reasonably closely if we add (very roughly) 0.3 to each entry in part (a)
of the table; that is, the entire distribution of the t-statistic is shifted to
more negative values, by approximately this amount.

TABLE 4.2. Empirical cumulative distribution of $(\hat{\rho} - 1)/SE(\hat{\rho})$
DGP: (2) with $\rho = 1$

Sample      Probability of a smaller value
size (T)     0.01    0.025    0.05    0.10    0.90    0.95   0.975    0.99

(a) Model (3a)/(8a)
25          -2.66   -2.26   -1.95   -1.60    0.92    1.33    1.70    2.16
50          -2.62   -2.25   -1.95   -1.61    0.91    1.31    1.66    2.08
100         -2.60   -2.24   -1.95   -1.61    0.90    1.29    1.64    2.03
250         -2.58   -2.23   -1.95   -1.62    0.89    1.29    1.63    2.01
500         -2.58   -2.23   -1.95   -1.62    0.89    1.28    1.62    2.00
∞           -2.58   -2.23   -1.95   -1.62    0.89    1.28    1.62    2.00

(b) Model (3b)/(8b)
25          -3.75   -3.33   -3.00   -2.63   -0.37    0.00    0.34    0.72
50          -3.58   -3.22   -2.93   -2.60   -0.40   -0.03    0.29    0.66
100         -3.51   -3.17   -2.89   -2.58   -0.42   -0.05    0.26    0.63
250         -3.46   -3.14   -2.88   -2.57   -0.42   -0.06    0.24    0.62
500         -3.44   -3.13   -2.87   -2.57   -0.43   -0.07    0.24    0.61
∞           -3.43   -3.12   -2.86   -2.57   -0.44   -0.07    0.23    0.60

(c) Model (3c)/(8c)
25          -4.38   -3.95   -3.60   -3.24   -1.14   -0.80   -0.50   -0.15
50          -4.15   -3.80   -3.50   -3.18   -1.19   -0.87   -0.58   -0.24
100         -4.04   -3.73   -3.45   -3.15   -1.22   -0.90   -0.62   -0.28
250         -3.99   -3.69   -3.43   -3.13   -1.23   -0.92   -0.64   -0.31
500         -3.98   -3.68   -3.42   -3.13   -1.24   -0.93   -0.65   -0.32
∞           -3.96   -3.66   -3.41   -3.12   -1.25   -0.94   -0.66   -0.33

N(0, 1)
∞           -2.33   -1.96   -1.65   -1.28    1.28    1.65    1.96    2.33

Source: Fuller (1976: 373).

In models (3b) and (3c), we see greater deviations from the N(0, 1)
pattern above that would hold asymptotically for $|\rho| < 1$. As a constant
and then the trend are added to a model, we see more entries that are
negative in the tables (parts (b) and (c)); as in Table 4.1, a greater
proportion of the estimated values of $\hat{\rho} - 1$ become negative.
With the information in Table 4.2, however, we can now consider
applying a test for $\rho = 1$ using the t-statistic from any of the three
models. As long as we are aware that the distribution of the statistic is
non-standard, and so avoid making the mistake of applying t- or normal
tables, these significance points tabulated by Dickey and Fuller can be
used in their place to provide a valid test. For example, consider model
(3b). A t-statistic of +1.00 would not lead to rejection of the null
against an explosive alternative if we were applying N(0, 1) tables; by
Table 4.2(b), however, the test rejects at the 5 per cent level (or even the
1 per cent level) because the probability of the statistic exceeding even
0.60 is only 0.01. By contrast, a value of -2.50, which would lead to
rejection of $H_0$ using standard normal tables, can no longer be used to
infer that $H_0$ ($\rho = 1$) is false against a stationary alternative at the 5 per cent
level.
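The decision rule described here can be written down directly. The sketch below is a minimal illustration using the large-sample 5 per cent points from Table 4.2 (the rows for an infinite sample size); the dictionary and function names are our own, and in finite samples the row matching the actual sample size would be the appropriate choice.

```python
# Large-sample 5 per cent lower-tail points of the t-statistic, from Table 4.2.
DF_T_CRITICAL_5PCT = {"a": -1.95, "b": -2.86, "c": -3.41}
NORMAL_CRITICAL_5PCT = -1.65  # one-sided N(0, 1) point, for comparison only

def reject_unit_root(t_stat, model="b"):
    """One-sided 5 per cent test of rho = 1 against a stationary alternative."""
    return t_stat < DF_T_CRITICAL_5PCT[model]

t_stat = -2.50
print("normal tables reject:", t_stat < NORMAL_CRITICAL_5PCT)
print("Dickey-Fuller tables reject (model b):", reject_unit_root(t_stat, "b"))
```

For the value -2.50 discussed above, the normal tables would reject while the Dickey-Fuller points for the model with a constant do not.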
This was the first form of 'unit-root test' to have been developed. Its
main potential disadvantage lies in the fact that it is based upon the
assumption that the data-generation process (2) holds precisely under
the null. Many series will be integrated of order 1 but will not have this
form; in particular, the DGP may contain nuisance parameters such as a
constant or other exogenous variables, or may contain richer dynamics
in the variable of interest. As an example of the latter, consider a
general AR($p$) process in $y_t$:
$\alpha(L) y_t = u_t$,
with $\alpha(L) = (1 - L)\alpha^*(L)$, and where all latent roots of $\alpha^*(L)$ lie
within the unit circle. Such a process is I(1), and, depending upon the
form of the polynomial, $\alpha^*(L)$ may be well approximated by (2) with
$\rho = 1$. To the extent that it is not, however, the critical values in Tables
4.1 and 4.2 may be inaccurate. We will consider several methods of
dealing with this in Sections 4.2 and 4.3. First, however, we will
consider the possibility of additional exogenous regressors in the DGP,
and the problem of constructing similar tests under these conditions.

4.1. Similar Tests and Exogenous Regressors in the DGP

Kiviet and Phillips (1992) consider exact and similar tests for the
coefficient on a lagged dependent variable, in a first-order autoregressive
model that may include multiple exogenous variables. In order to
compute the exact critical values for such tests, these authors use
numerical integration based on the Imhof routine. (See Imhof (1961) or
Koerts and Abrahamse (1969) for an introduction.) While this
procedure can be used to construct exact and similar tests for a DGP with
first-order dynamics and also containing arbitrary strictly exogenous
processes, the Dickey-Fuller tests already discussed will be similar tests
for some DGPs.
Evans and Savin (1981, 1984), Nankervis and Savin (1985, 1987), and
Bhargava (1986), as well as Kiviet and Phillips, all consider the
properties of Dickey-Fuller tests for various DGPs. Some of the results
may be summarized as follows ($\mu = 0$; $\gamma = 0$).

        DGP                                                              Models yielding similar tests1
(i)     $y_t = \rho y_{t-1} + u_t$, $y_0 = 0$                            (3a), (3b), (3c)
(ii)    $y_t = \rho y_{t-1} + u_t$, arbitrary $y_0$                      (3b), (3c)
(iii)   $y_t = \mu + \rho y_{t-1} + u_t$, arbitrary $y_0$                (3c)
(iv)    $y_t = \mu + \gamma t + \rho y_{t-1} + u_t$, arbitrary $y_0$     Extension of (3c) necessary
Thus, for example, in case (i), if the model is given by (3c), the
appropriate critical values are given by Tables 4.1(c) and 4.2(c). The
same tables can be used to conduct inference in (iii), despite a non-zero
value of $\mu$ in the DGP, because (3c) yields a similar test. Similarity
implies that the distributions of $\hat\rho$ and its associated t-statistic are not
affected by the value, under the null, of the nuisance parameter, and
the critical values are the same as the ones that would apply for $\mu = 0$,
namely, those in Tables 4.1(c) and 4.2(c).
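The similarity property in case (iii) can be seen directly in a small numerical experiment. The sketch below is an illustration under assumed settings (the sample size, the drift value 0.5, the seed, and the function name `df_trend_stats` are ours): it applies a regression on a constant, a trend, and $y_{t-1}$ to two random walks built from the same shocks, one without drift and one with drift.

```python
import numpy as np

def df_trend_stats(y):
    """OLS of y_t on (1, t, y_{t-1}); returns rho_hat and its t-ratio for rho = 1."""
    y = np.asarray(y, dtype=float)
    dep, lag = y[1:], y[:-1]
    n = dep.size
    X = np.column_stack([np.ones(n), np.arange(1, n + 1), lag])
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    s2 = resid @ resid / (n - X.shape[1])          # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)              # OLS covariance matrix
    rho = beta[2]
    return rho, (rho - 1.0) / np.sqrt(cov[2, 2])

rng = np.random.default_rng(0)
shocks = rng.standard_normal(200)
t = np.arange(1, 201)

y_no_drift = np.cumsum(shocks)                     # unit root, mu = 0
y_drift = 0.5 * t + np.cumsum(shocks)              # unit root, mu = 0.5, same shocks

stats_no_drift = df_trend_stats(y_no_drift)
stats_drift = df_trend_stats(y_drift)
print(stats_no_drift, stats_drift)
```

Because the constant and trend in the regression absorb the term $\mu t$, the two pairs of statistics agree up to rounding error, which is why the value of $\mu$ under the null does not affect the critical values.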
There are a number of noteworthy additional points. In case (i) there
are no nuisance parameters, so that similarity is a trivial property. In
general, as this summary suggests, a similar test having a Dickey-Fuller
distribution requires that the model used contain more parameters than
the DGP. In order to have a similar test for (iv), one would then need a
model with a term such as $t^2$, necessitating another block of critical
values in each of Tables 4.1 and 4.2. In case (ii), for example, we need
at least model (3b) (with a constant) to allow for the unknown starting
value. In case (iii) we have an unknown constant and need the trend
term in model (3c) to allow for its effect.
Each of these similar tests is also exact in finite samples, provided
appropriate critical values are available. In general, however, it will be
necessary to abandon exact tests in order to use variants of the
Dickey-Fuller test where there are more unknown parameters. These
parameters can typically be estimated, so that asymptotically they can be
accounted for and a test provided. Again, Kiviet and Phillips offer
general exact and similar tests for DGPs where the dynamics are
restricted to first-order, as well as demonstrating the similarity of the
tests just mentioned.
In the case of exact parameterizations, such as case (iii) with model
(3b), we do not have similar tests with the Dickey-Fuller distributions.
However, as West (1988) showed, the t-statistics in the exactly
parameterized case (with exogenous items such as a constant in the DGP) are
asymptotically normal, just as are t-statistics used for standard
problems. In finite samples, however, the Dickey-Fuller distributions may
be a better approximation than the normal distribution. We will explore
this asymptotic normality further in Chapter 6 below.

¹ Critical values are those corresponding to the model used in Table 4.1 or 4.2.

4.2. General Dynamic Models for the Process of Interest


The first of the methods for allowing richer dynamics in the DGP of the
process of interest, $\{y_t\}$, was developed concurrently with the test that
we have already described for a unit root in the AR(1) model, and is
reported in Fuller (1976). These more general methods yield test
statistics that have the same limiting distributions as those already
discussed, because they are based on consistent estimates of 'nuisance'
parameters. Hence we may use the last rows of Tables 4.1(a)-(c) or
4.2(a)-(c) for inference with these statistics in large samples, but in
small samples percentage points of their distributions will not in general
be the same as for those applicable under the strong assumptions of the
simple Dickey-Fuller model.
When $y_t$ follows an AR($p$) process,

a test can be constructed with the regression model:

The coefficient $\rho$ is used to test for a unit root, and $T(\hat\rho - 1)$ and
$(\hat\rho - 1)/\mathrm{SE}(\hat\rho)$ have the limiting distributions tabulated in Tables 4.1(a)
and 4.2(a) for $T \to \infty$. Moreover, just as in the case of an AR(1)
process, we can extend this regression model to allow for the possibility
that the data-generation process contains a constant (drift) term or a
deterministic time trend. Again, for suitably modified regression models,
the asymptotic distributions of the statistics based on $\hat\rho$ are those given
in Tables 4.1(b)/(c) and 4.2(b)/(c) for $T \to \infty$. These procedures are
called 'augmented' Dickey-Fuller (ADF) tests.
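A minimal sketch of an augmented regression of this kind follows. It assumes the usual levels form, in which $y_t$ is regressed on $y_{t-1}$ and lagged first differences $\Delta y_{t-j}$, with an optional constant and trend; the function name `adf_regression`, its arguments, and the simulated series are assumptions made for illustration only.

```python
import numpy as np

def adf_regression(y, n_lags, constant=False, trend=False):
    """Regress y_t on y_{t-1} and n_lags lagged differences.

    Returns rho_hat and the t-type statistic (rho_hat - 1)/SE(rho_hat).
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                              # dy[i] = y[i+1] - y[i]
    start = n_lags + 1                           # first usable observation
    dep = y[start:]
    n = dep.size
    cols = [y[start - 1:-1]]                     # y_{t-1}
    for j in range(1, n_lags + 1):
        cols.append(dy[start - 1 - j: dy.size - j])   # Delta y_{t-j}
    if constant:
        cols.append(np.ones(n))
    if trend:
        cols.append(np.arange(1, n + 1, dtype=float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    s2 = resid @ resid / (n - X.shape[1])
    se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
    return beta[0], (beta[0] - 1.0) / se_rho

rng = np.random.default_rng(1)
e = rng.standard_normal(500)
random_walk = np.cumsum(e)                       # rho = 1
stationary = np.zeros(500)
for t in range(1, 500):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]   # rho = 0.5

rw_rho, rw_t = adf_regression(random_walk, n_lags=2)
st_rho, st_t = adf_regression(stationary, n_lags=2)
print(rw_rho, rw_t, st_rho, st_t)
```

The t-type statistic returned here is the one to be compared with the appropriate block of Table 4.2; for the stationary series it is strongly negative, while for the random walk the estimate of $\rho$ stays close to one.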
The aim in modifications such as these to the simpler form of the
Dickey-Fuller test is to use lagged changes in the dependent variable to
capture autocorrelated omitted variables which would otherwise, by
default, appear in the (necessarily autocorrelated) error term. With the
additional lagged terms it will be possible, if the DGP has the form of
(4), to produce a model (5) in which asymptotically the error terms are
white noise, because the nuisance parameters are known asymptotically
and the terms involving them may be removed from the error term.
With white-noise errors, the asymptotic Monte Carlo critical values
given in the first two tables may be applied. Moreover, the asymptotic
distribution of the coefficient on the $y_{t-1}$ term in (5) is not affected by
the inclusion of the additional $\Delta y_{t-j}$ terms. If $y_t$ is I(1), the differenced
terms are all I(0) and appropriate scaling ensures that the
variance-covariance matrix is asymptotically block-diagonal. (That is, all
cross-product terms of I(0) and I(1) variables in the matrix are asymptotically
negligible.) It is this asymptotic orthogonality that drives the result,
much as, in a standard regression model, one uses the orthogonality of
the information matrix to prove the statistical independence of the
estimated coefficient vector from the estimate of the standard error. The
asymptotic theory and the issue of 'appropriate' scaling are discussed
later in this chapter and in Chapter 6.
By allowing the DGP to take the form (4) rather than the much more
restrictive AR(1) form (3), we have expanded the class of models to
which we can validly apply unit-root tests of this type. Note that, as it
will generally be the case that $p$ is unknown even where $y_t$ is strictly an
AR($p$) process, it is generally safer to take $p$ to be a fairly generous
number; if too many lags are present in (5), the regression is free to set
them to zero at the cost of some loss in efficiency, whereas too few lags
implies some remaining autocorrelation in (5) and hence the
inapplicability of even the asymptotic distributions in Tables 4.1 and 4.2. One can,
of course, perform tests for autocorrelation on the estimated residuals
from (5) in order to check the acceptability of the premise that these
residuals are white noise. Alternatively, model selection procedures can
be used to choose $p$, and test for a unit root, jointly (see Hall 1990).
We have, therefore, a class of tests for the unit root which can validly
be applied to series that follow AR($p$) processes containing no more
than one unit root. The next natural step is to attempt to extend further
the class of series to which we can apply such tests, ideally in such a way
as to allow exogenous variables to enter the process as well. Said and
Dickey (1984) provide a test procedure valid for a general ARMA
process in the errors; Phillips (1987a) and Perron and Phillips (1988)
offer a still more general procedure.
While the Said-Dickey approach does represent a generalization of
the Dickey-Fuller procedure, it again yields test statistics with the same
asymptotic critical values as those tabulated by Dickey and Fuller. The
particular advantage of this test is that we can apply it not only to
models with MA parts in the errors, but also to models for which (as is
typically the case) the orders of the AR and MA polynomials in the
error process are unknown. The method involves approximating the true
process by an autoregression in which the number of lags increases with
sample size.
Begin by assuming that the data-generation process follows:

so that the error term in the autoregression follows an ARMA($p$, $q$),
presumed to be stationary and invertible. The DGP can be rewritten as

where $k$ is large enough to allow a good approximation to the
ARMA($p$, $q$) process $\{u_t\}$, so that $\{v_t\}$ is approximately white noise.
The null hypothesis is again that $\rho = 1$. Said and Dickey show that the
test is valid in spite of the facts that $p$ and $q$ are unknown and that the
ARMA($p$, $q$) is approximated by an AR process, as long as $k$ increases
with the sample size $T$ so that there exist numbers $c$ and $r$, $c > 0$ and
$r > 0$, such that $ck > T^{1/r}$ and $T^{-1/3}k \to 0$. Hence $T^{1/3}$ is an upper
bound on the rate at which the number of lags, $k$, should be made to
grow with the sample size. Ordinary least-squares estimation of the
model (6) is proven to yield a consistent estimator of $(\rho - 1)$; the test
can then be based on the t-type statistic, $(\hat\rho - 1)/\mathrm{SE}(\hat\rho)$, using Table
4.2(a). Clearly, the form of the regression implied by the Said-Dickey
test is precisely the same as that of the augmented Dickey-Fuller test.
In this case Table 4.2(a), corresponding to a model containing no drift
or trend, is used, but the test can also be adapted to allow for a
non-zero drift term $\mu$ in the model. The test is modified only in so far as
it is then based not on $y_t$ but on $y_t - \bar y$, where $\bar y = T^{-1}\sum_{t=1}^{T} y_t$. The
regression model (6) remains the same except for the first regressor,
which becomes $(y_{t-1} - \bar y)$, and test statistics are calculated in the same
way. By analogy to the earlier results for Dickey-Fuller and augmented
Dickey-Fuller tests, it is not surprising that we now refer to Table
4.2(b), corresponding to a model containing a drift term, for the
significance points of the (asymptotic) distributions of the statistics.
Monte Carlo studies of test power in models with autocorrelated error
processes, described by Dickey et al. (1986), suggest that the empirical
levels of the $T(\hat\rho - 1)$ statistics tend to be farther from the nominal test
levels than those of the t-type statistics. Dickey et al. therefore suggest
the use of the t-type statistics in these cases. Deviation of nominal from
actual test levels is particularly great in DGPs with MA parts such that
the MA lag polynomial contains a factor of $(1 - \theta L)$, with $\theta$ near unity.
The near-cancellation of such a factor with the factor $(1 - L)$ in the AR
lag polynomial (under the null) affects the actual levels of both $T(\hat\rho - 1)$
and t-type statistics, but is especially serious for the former.

4.3. Non-parametric Tests for a Unit Root


In extending the original tests above to allow for higher-order
autocorrelation, we added extra terms to the regression model to account for the
autocorrelation in the residuals that would otherwise be present. By
extending the model, it was possible to continue to draw valid inferences
from the asymptotic critical values given in Tables 4.1 and 4.2;
otherwise it would have been necessary to recompute these critical values for
each different DGP, which in turn would require knowledge of the
unobservable orders ($p$) of the processes in these underlying DGPs.
In expanding the set of models to which we can apply these tests, our
aim is to avoid increasing the number of tables of critical values that we
must find and use while nonetheless allowing for quite general DGPs.
Phillips (1987a) provides an alternative procedure that largely allows us
to do so; our exposition relies on further results reported in Perron
(1988) and Phillips and Perron (1988). Rather than taking account of
extra elements in the DGP by adding them to the regression model,
Phillips suggests accounting for the autocorrelation that will be present
(when these terms are omitted) through a non-parametric correction to
the standard statistics. That is, while the Dickey-Fuller procedure aims
to retain the validity of tests based on white-noise errors in the
regression model by ensuring that those errors are indeed white noise,
the Phillips procedure acts instead to modify the statistics after
estimation in order to take into account the effect that autocorrelated errors
will have on the results. Asymptotically, the statistic is corrected by the
appropriate amount, and so the same limiting distributions apply. From
one perspective, the effect is the same as that of ADF-type tests: we can
validly conduct asymptotic inference using Tables 4.1 and 4.2. This
procedure does not, however, require the estimation of additional
parameters in the regression model.
The data-generation process that is assumed to hold is

or equivalently

It is important to note, however, that the error term is not being
assumed to follow a white-noise process. The conditions that $u_t$ must
satisfy in (7a) and (7b) are those listed above in Chapter 3 as conditions
(3.16a)-(3.16d) given in Phillips (1987a).
As with the Dickey-Fuller tests, tests of the Phillips type are based
upon one of three different regression models, differing only in one case
from those used earlier, by centring the trend term:
$$y_t = \rho_a y_{t-1} + u_{at}, \tag{8a}$$
$$y_t = \mu_b + \rho_b y_{t-1} + u_{bt}, \tag{8b}$$
and
$$y_t = \mu_c + \gamma_c(t - T/2) + \rho_c y_{t-1} + u_{ct}. \tag{8c}$$
It is easy to calculate from these regressions the coefficient estimates
and the 't-statistics' for each. For tests of the significance of $\hat\rho_i$, the
statistics are then adjusted to reflect autocorrelation in the
corresponding $\hat u_{it}$ series. (We will omit subscripts $a$, $b$, or $c$ on $\hat u_t$ to simplify
notation.) If we define
$$\sigma_u^2 = \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} E(u_t^2) \tag{9}$$
and
$$\sigma^2 = \lim_{T\to\infty} E(T^{-1}S_T^2), \qquad S_T = \sum_{t=1}^{T} u_t, \tag{10}$$
then the limiting distributions of the test statistics do not depend upon
the parameters of the process determining the sequence $\{u_t\}$ if $\sigma^2 = \sigma_u^2$.
In the case of test statistics of the Dickey-Fuller (DF) type that we
examined earlier, the model is presumed to capture the relevant features
of the process in such a way that the errors are independently and
identically distributed; the latter is sufficient to guarantee that $\sigma^2 = \sigma_u^2$.
Note that the statistics used in the DF-type parametric tests do emerge
as special cases of the non-parametric statistics where the estimates of
the parameters $\sigma^2$ and $\sigma_u^2$ are equal (i.e. where the estimates $S_u^2$ and
$S_{T\ell}^2$, given in (11) and (12) below, are equal).
We will see this more clearly when we examine the non-parametric
statistics. In order to do so, we first need consistent estimators of $\sigma^2$
and $\sigma_u^2$. There are a number of possible choices. If $\mu = 0$ in the DGP
(7), then the standard estimator from any of (8a), (8b), (8c) will be
consistent for $\sigma_u^2$; that is,
$$S_u^2 = T^{-1}\sum_{t=1}^{T}\hat u_t^2, \tag{11}$$
where $\hat u_t$ represents the residuals from one of (8a), (8b), (8c), above. If
$\mu \neq 0$, the estimator is not consistent using the residuals $\{\hat u_{at}\}$, but
residuals from either of the other two models do yield a consistent
estimate.
For the estimator of $\sigma^2$, a consistent estimator can be found at the
cost of strengthening the assumptions. First, condition (3.16b) is
replaced with the condition that $\sup_t E(|u_t|^{2\beta}) < \infty$ for some $\beta > 2$. Next,
a condition must be placed on the lag truncation parameter $\ell$ which will
be used in defining the estimator of $\sigma^2$. The condition is that $\ell \to \infty$ as
$T \to \infty$, such that $\ell$ is $o(T^{1/4})$. That is, the number of lags used in
estimating autocorrelations of the residuals increases with the sample
size, but less quickly than its fourth root.
Given these conditions, a consistent estimator of $\sigma^2$ is
$$S_{T\ell}^2 = T^{-1}\sum_{t=1}^{T}\hat u_t^2 + 2T^{-1}\sum_{j=1}^{\ell}\sum_{t=j+1}^{T}\hat u_t\hat u_{t-j}. \tag{12}$$
The estimator is indexed by the lag truncation parameter $\ell$ to indicate
that different choices of $\ell$ will lead to different values. It remains only
to specify the residuals to be used in (12), and, as in (11) above, we
may choose them from any of (8a), (8b), (8c) if $\mu = 0$. Also as in (11),
$\mu \neq 0$ requires that we use the residuals from one of the models that
does contain a constant term in order to preserve the consistency of this
variance estimate. Evidently the safe strategy is to take residual
estimates from (8b) or (8c) in any case where there seems even a small
probability that the data-generation process contains a constant (drift)
term.
It is important to note that both of the variance estimates $S_u^2$ and $S_{T\ell}^2$
could be defined using the first differences $y_t - y_{t-1}$ rather than the
residuals $\hat u_t$. Under the null hypothesis that $\rho = 1$ and that the drift and
trend terms are zero, the two will of course be equivalent
asymptotically. In finite samples, which of the two methods is used can make a
substantial difference, however; we will return to this point below.
While $S_{T\ell}^2$ just defined is consistent for $\sigma^2$ given residuals from the
appropriate model, it unfortunately does not guarantee a non-negative
estimate for finite sample sizes. However, one can guarantee a
non-negative estimate with a simple modification of (12) pioneered by
Newey and West (1987), which is moreover consistent under precisely
the same conditions as is (12). Define
$$S_{T\ell}^2 = T^{-1}\sum_{t=1}^{T}\hat u_t^2 + 2T^{-1}\sum_{j=1}^{\ell}\omega_\ell(j)\sum_{t=j+1}^{T}\hat u_t\hat u_{t-j}, \tag{13}$$
where $\omega_\ell(j) = 1 - j(\ell + 1)^{-1}$. A few examples of tests using these
quantities to transform the test statistics can be presented without
further discussion. Thereafter we will present statistics for hypotheses on
$\mu_b$, $\mu_c$, and $\gamma_c$ in (8b) and (8c), and for hypotheses involving $\rho$ as well
as these parameters.
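The variance estimators referred to as (11), (12), and (13) can be sketched as follows. This is a minimal illustration that assumes the usual forms of these estimators (a residual variance, plus twice the sum of residual autocovariances up to the truncation lag, with or without the Newey-West weights); the function names and the four-point residual series are assumptions chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def short_run_variance(u):
    """S_u^2: average squared residual."""
    u = np.asarray(u, dtype=float)
    return u @ u / u.size

def long_run_variance(u, lag_trunc, newey_west=False):
    """S_Tl^2: S_u^2 plus twice the (optionally weighted) autocovariances."""
    u = np.asarray(u, dtype=float)
    T = u.size
    total = u @ u / T
    for j in range(1, lag_trunc + 1):
        # Newey-West weight 1 - j/(l + 1); weight one for the unweighted form.
        weight = 1.0 - j / (lag_trunc + 1.0) if newey_west else 1.0
        total += 2.0 * weight * (u[j:] @ u[:-j]) / T
    return total

# Artificial residuals with strong negative first-order autocorrelation.
u = np.array([1.0, -1.0, 1.0, -1.0])

s2_u = short_run_variance(u)                           # 1.0
s2_plain = long_run_variance(u, 1)                     # 1 + 2*(-3)/4 = -0.5
s2_nw = long_run_variance(u, 1, newey_west=True)       # 1 + 2*0.5*(-3)/4 = 0.25
print(s2_u, s2_plain, s2_nw)
```

In this artificial case the unweighted estimate is negative, while the weighted version remains non-negative, which is the practical reason for preferring the Newey-West form in finite samples.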
Consider the hypothesis that $\rho_b = 1$ (in (8b)).² An asymptotically
valid test consists of the statistic³

² We treat the initial observation as fixed at zero; not all statistics here are invariant to
the initial value. See Phillips (1987a) and Perron (1988).
³ These statistics are valid for either choice of $S_{T\ell}^2$ given above (i.e. the Phillips or
Newey-West forms).

or, alternatively,

where $t(\hat\rho_b)$ is the t-statistic associated with testing the null hypothesis
$\rho_b = 1$. The first of these statistics, $Z(\hat\rho_b)$, has under the null hypothesis
($H_0$: $\rho_b = 1$) the limiting distribution given in Table 4.1(b) ($T \to \infty$); the
second has the limiting distribution given in Table 4.2(b) ($T \to \infty$) under
the same null. It is especially useful to note again here the fact that the
original Dickey-Fuller statistics are special cases of these. Under Dickey
and Fuller's assumptions, the $\{u_t\}_{t=1}^{T}$ are independently and identically
distributed, implying, as we noted above, that $\sigma_u^2 = \sigma^2$ and therefore
that $E(S_{T\ell}^2) = E(S_u^2)$. Hence on average $S_{T\ell}^2 = S_u^2$, and $Z(\hat\rho_b)$ reduces
to $T(\hat\rho_b - 1)$. This is precisely the first of the statistics that Dickey and
Fuller examine. Moreover, $Z(t(\hat\rho_b))$ reduces to $t(\hat\rho_b)$, the ordinary
regression t-statistic, and has the distribution given in Table 4.2.
The corresponding statistics for models (8a) and (8c) are also given in
Perron (1988), and share this property. For (8a), the test statistics are
similar to (14) and (15). They are (with $y_0 = 0$)

and

Analogous to the tests on (8a), (16) has the significance points given in
Table 4.1(a) and (17) those in Table 4.2(a). Finally, for model (8c), we
have

and
having the limiting distributions tabulated in Tables 4.1(c) and 4.2(c)
respectively. The quantity $D_x$ is defined as the determinant of the inner
product of the data matrix with itself: for (8c),

where, again, summations are over all available elements of the vectors.

In addition to the extension of the Phillips (1987a) results to the case
of regression models containing constant and trend, Phillips and Perron
(1988) present simulation evidence regarding the power of the
Phillips-type procedures vis-a-vis that of the Said-Dickey procedure, each being
applicable to processes that have general ARMA($p$, $q$) processes in the
errors from a regression model that consists of a constant and lagged
dependent variable. The data-generation process is taken to be

To characterize the results roughly, the Phillips or Phillips-Perron test
generally has higher power, but suffers substantial size distortions for
$\theta < 0$, in samples of sizes typically found in economics. The
Said-Dickey test also involves size distortions for $\theta < 0$, but much smaller
ones: that is, each test rejects a true null of $\rho = 1$ more than the
nominal size (5 per cent in these experiments) states, but the problem is
much worse for the $Z(\hat\rho)$ and $Z(t(\hat\rho))$ statistics of Phillips and Perron,
where rejections of the true null range as high as 99.7 per cent for
$\theta = -0.8$. (Size and power also depend upon the number of lags chosen
in the Said-Dickey test and on the lag truncation parameter in the
Phillips-Perron tests.) For the Said-Dickey test, the largest size
distortions (with two lags, a true null is rejected approximately 67.7 per cent
of the time at a nominal size of 5 per cent) disappear as the number of
lags used increases, falling to only 12 per cent where 12 lags are used.
This simulation study is of course a limited one, dealing as it does
with only one ARMA process for the equation errors. It does however
suggest that the Phillips-type tests are more likely to reject the null of a
unit root, whether or not it is false; for errors with strong negative MA
components, the difference is quite large. One might suspect as well that
the power of the Said-Dickey procedure would be higher for processes
involving AR errors, because the test regression captures AR terms
precisely.
Phillips and Perron conclude by recommending their own $Z(\hat\rho)$ test
for models with positive MA or IID errors, and the Said-Dickey
statistic for models with negative MA errors.

4.4. Tests on More than One Parameter


The tests above have all been directed at testing the level autoregressive
parameter alone. In models (8b) and (8c), however, there are other
parameters present, and one may be interested in a formal test of the
hypothesis that one of these is zero, or in a joint test. Tests similar to
those above can be provided, but a further set of tables must be used to
find the significance points of the distributions of the resulting test
statistics. Tables 4.4 and 4.5 below are based on those given by Dickey
and Fuller (1981), who provide likelihood ratio, t-type, and F-type
statistics for tests on the parameters $\mu_b$, $\mu_c$, and $\gamma_c$ in (8b) and (8c). The
tables are again derived from a Monte Carlo simulation.
The statistics that Dickey and Fuller offer are derived under the
assumption that $u_{bt}$ and $u_{ct}$ are white-noise processes, but they show
that, as was the case with tests above, the same distributions can be
applied where the errors follow an autoregressive process and a
correctly specified model is used to estimate the parameters of this process.
As we noted earlier, however, it is desirable to generalize the tests to be
applicable to as broad as possible a class of error processes, of unknown
form. This can be done, once again, using a non-parametric correction.
Table 4.3 summarizes the t-type, F-type, and non-parametric test
statistics used for several null hypotheses involving the parameters $\mu$ and
$\gamma$. In addition to the quantities defined above, we require

The Phillips-Perron corrections to the standard Dickey-Fuller
statistics must however be used cautiously. Again, the accumulated evidence
of several Monte Carlo simulation studies suggests that the
non-parametrically corrected test statistics do not always have the correct sizes
even in fairly large samples.
Schwert (1989) makes this point forcefully. His results, amplifying
those in the Phillips-Perron simulations reported earlier, show that the
critical values of the augmented Dickey-Fuller test statistics, given by
the standard Dickey-Fuller tables, are much more robust to the
presence of moving average terms in the errors of the random-walk
process than are the corresponding non-parametrically adjusted
Dickey-Fuller statistics. An example, taken from Schwert, is sufficient to
illustrate the point.
The data-generation process is given by⁴ $y_t = y_{t-1} + u_t + \theta u_{t-1}$,

⁴ For conformity with the notation of Phillips-Perron used earlier, the sign of the
coefficient on $\theta$ is changed here.

TABLE 4.3(a). Test statistics for simple hypotheses in models with drift and trend^a
Statistic type    Test Statistic

^a Critical values for $Z(\tau_1)$, $Z(\tau_2)$, and $Z(\tau_3)$ are the same as those for $\tau_1$, $\tau_2$, and $\tau_3$ respectively and are tabulated in Table 4.4.
Note also that $S_u^2$ and $S_{T\ell}^2$ are defined with respect to the residuals of a particular model, and so differ across models (8a), (8b),
and (8c). $c_{ii}(j)$ is the $i$th diagonal element of the inverse second-moment matrix of the regressors in model $j$.
Sources: Dickey and Fuller (1981) and Perron (1988).

TABLE 4.3(b). Test statistics for joint hypotheses^a

^a Critical values for $Z(\Phi_1)$, $Z(\Phi_2)$, and $Z(\Phi_3)$ are the same as those for $\Phi_1$, $\Phi_2$, and $\Phi_3$ respectively and are tabulated in Table
4.5. Note also that $S_u^2$ and $S_{T\ell}^2$ are defined with respect to the residuals of a particular model, and so differ across models (8a),
(8b), and (8c).
Sources: Dickey and Fuller (1981) and Perron (1988).

($t = -19, \ldots, T$), where the $\{u_t\}$ process is a normally distributed
white-noise process. The first 20 observations are discarded to control
for the effect of the initial conditions. Samples of size $T$ = 25, 50, 100,
250, 500, and 1000 are used in the experiments and each experiment is
replicated 10,000 times. The MA parameter $\theta$ is set equal to 0.8, 0.5, 0,
-0.5, and -0.8. The model estimated is

Six different test statistics are considered, including the ordinary and
augmented Dickey-Fuller statistics and the Phillips-Perron statistics.
Both the augmented Dickey-Fuller and the Phillips-Perron statistics are
TABLE 4.4. Empirical cumulative distributions
DGP: (8a) with $\rho = 1$

Sample size (T)    Probability of a smaller value^a
                   0.90     0.95     0.975    0.99

(a) Test statistic $\tau_1$; model (8b)
25                 2.20     2.61     2.97     3.41
50                 2.18     2.56     2.89     3.28
100                2.17     2.54     2.86     3.22
250                2.16     2.53     2.84     3.19
500                2.16     2.52     2.83     3.18
∞                  2.16     2.52     2.83     3.18

(b) Test statistic $\tau_2$; model (8c)
25                 2.77     3.20     3.59     4.05
50                 2.75     3.14     3.47     3.87
100                2.73     3.11     3.42     3.78
250                2.73     3.09     3.39     3.74
500                2.72     3.08     3.38     3.72
∞                  2.72     3.08     3.38     3.71

(c) Test statistic $\tau_3$; model (8c)
25                 2.39     2.85     3.25     3.74
50                 2.38     2.81     3.18     3.60
100                2.38     2.79     3.14     3.53
250                2.38     2.79     3.12     3.49
500                2.38     2.78     3.11     3.48
∞                  2.38     2.78     3.11     3.46

^a All entries in the table have standard errors of less than 0.01. Distributions
are symmetric.
Source: Dickey and Fuller (1981: 1062).

TABLE 4.5. Empirical cumulative distributions

Sample size (T)    Probability of a smaller value^a
                   0.01    0.025   0.05    0.10    0.90    0.95    0.975   0.99

(a) Test statistic $\Phi_1$; DGP: (8b) with $\rho_b = 1$, $\mu_b = 0$; model (8b)
25                 0.29    0.38    0.49    0.65    4.12    5.18    6.30    7.88
50                 0.29    0.39    0.50    0.66    3.94    4.86    5.80    7.06
100                0.29    0.39    0.50    0.67    3.86    4.71    5.57    6.70
250                0.30    0.39    0.51    0.67    3.81    4.63    5.45    6.52
500                0.30    0.39    0.51    0.67    3.79    4.61    5.41    6.47
∞                  0.30    0.40    0.51    0.67    3.78    4.59    5.38    6.43

(b) Test statistic $\Phi_2$; DGP: (8c) with $\rho_c = 1$, $\mu_c = 0$, $\gamma_c = 0$; model (8c)
25                 0.61    0.75    0.89    1.10    4.67    5.68    6.75    8.21
50                 0.62    0.77    0.91    1.12    4.31    5.13    5.94    7.02
100                0.63    0.77    0.92    1.12    4.16    4.88    5.59    6.50
250                0.63    0.77    0.92    1.13    4.07    4.75    5.40    6.22
500                0.63    0.77    0.92    1.13    4.05    4.71    5.35    6.15
∞                  0.63    0.77    0.92    1.13    4.03    4.68    5.31    6.09

(c) Test statistic $\Phi_3$; DGP: (8c) with $\rho_c = 1$, $\gamma_c = 0$; model (8c)
25                 0.74    0.90    1.08    1.33    5.91    7.24    8.65    10.61
50                 0.76    0.93    1.11    1.37    5.61    6.73    7.81    9.31
100                0.76    0.94    1.12    1.38    5.47    6.49    7.44    8.73
250                0.76    0.94    1.13    1.39    5.39    6.34    7.25    8.43
500                0.76    0.94    1.13    1.39    5.36    6.30    7.20    8.34
∞                  0.77    0.94    1.13    1.39    5.34    6.25    7.16    8.27

^a All entries in the left half of the table have standard errors of less than
0.005; those in the right half, less than 0.06.
Source: Dickey and Fuller (1981: 1063).

computed for two different lengths of lags. The first lag length is given
by $\ell_4 = [4(T/100)^{1/4}]$ and the second by $\ell_{12} = [12(T/100)^{1/4}]$; $[x]$ denotes
the largest integer less than or equal to $x$.
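These two lag-length rules are simple to compute. A short sketch follows; the function name `schwert_lag` is ours, and the sample sizes are those of the experiment described above.

```python
import math

def schwert_lag(T, multiplier):
    """Largest integer not exceeding multiplier * (T/100)**(1/4)."""
    return math.floor(multiplier * (T / 100.0) ** 0.25)

# Lag lengths l_4 and l_12 for each sample size used in the experiment.
lags = {T: (schwert_lag(T, 4), schwert_lag(T, 12))
        for T in (25, 50, 100, 250, 500, 1000)}
print(lags)
```

Both rules grow with the fourth root of the sample size, so that even at $T = 1000$ the longer rule uses only 21 lags.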
The results of this experiment are presented in Tables 1 and 2 of
Schwert (1989: 148-9). They indicate that the distributions of the
Phillips-Perron tests are not close to the Dickey-Fuller distribution.
The distributions are closest when $\theta$ = 0.5 or 0.8 but differ markedly for
values of $\theta$ = -0.5 and -0.8. The discrepancies persist even with
sample sizes as large as $T$ = 1000. The ADF statistics, on the other
hand, have distributions that are much closer on average to the
Dickey-Fuller distribution.
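The sensitivity of test size to a negative MA component can be reproduced in a small simulation. The sketch below is an illustration only and is far smaller than the experiment described above: it uses 400 replications at $T = 100$, a regression with a constant, and an assumed 5 per cent critical value of -2.89 for the t-type statistic (the usual Dickey-Fuller tabulation for a model with a constant at this sample size). The function names, seed, and replication count are assumptions, and only unaugmented and augmented Dickey-Fuller statistics are computed.

```python
import numpy as np

def adf_t_constant(y, n_lags):
    """t-type statistic for rho = 1 from OLS of y_t on a constant,
    y_{t-1} and n_lags lagged first differences."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    start = n_lags + 1
    dep = y[start:]
    cols = [np.ones(dep.size), y[start - 1:-1]]
    for j in range(1, n_lags + 1):
        cols.append(dy[start - 1 - j: dy.size - j])
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, dep, rcond=None)
    resid = dep - X @ beta
    s2 = resid @ resid / (dep.size - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return (beta[1] - 1.0) / se

def rejection_rate(theta, n_lags, T=100, reps=400, crit=-2.89, seed=0):
    """Share of replications in which a true unit-root null is rejected."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        e = rng.standard_normal(T + 21)
        u = e[1:] + theta * e[:-1]          # MA(1) errors
        y = np.cumsum(u)[20:]               # discard the first 20 observations
        rejections += int(adf_t_constant(y, n_lags) < crit)
    return rejections / reps

size_iid_df = rejection_rate(0.0, 0)        # white-noise errors, no augmentation
size_ma_df = rejection_rate(-0.8, 0)        # negative MA, no augmentation
size_ma_adf = rejection_rate(-0.8, 12)      # negative MA, 12 lagged differences
print(size_iid_df, size_ma_df, size_ma_adf)
```

With white-noise errors the rejection frequency is near the nominal level, with a strongly negative MA coefficient the unaugmented test rejects far too often, and adding lagged differences brings the frequency back towards the nominal level.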
The poor behaviour of the Phillips-Perron tests where negative MA
terms are present persists in regressions that incorporate a time trend.
Schwert also reports the distributions of the normalized unit-root
estimators (i.e. $T(\hat\rho - 1)$) in their ADF and non-parametrically
corrected DF versions. The conclusions remain unaltered. Finally,
Schwert's simulations do suggest that the finite-sample performance
under the null of the Phillips-Perron procedures, in the cases where
MA terms cause size distortions, is better when $S_u^2$ and $S_{T\ell}^2$ are
calculated using the first differences of $y_t$ than where the regression
residuals are used. However, the tests may then fail to be consistent
against some stationary alternative hypotheses (Stock and Watson
1988b). It seems safest, therefore, to avoid these tests if there is any
evidence of the kind of MA component to the errors that causes size
distortions.
An alternative procedure is proposed by Hall (1989), who suggests
that IV be used in place of OLS in augmented Dickey-Fuller tests. The
level instrumental variable used in place of $y_{t-1}$ is $y_{t-(k+1)}$, where the
residual autocorrelation function has non-zero elements only up to lag $k$
(see Section 4.6.4 below). Hall's Monte Carlo results suggest that the
method performs well, particularly for negative MA error processes.

4.5. Further Extensions


Two more extensions of the testing procedure may be considered. The
first concerns testing for multiple unit roots in a process. The second is
testing for unit roots at seasonal frequencies. Inventories may be
regarded as a good example of a variable that is likely to be I(2)
(contains two unit roots), as it is constructed by aggregating a function
of flow variables (production and sales) which are individually I(1); a
test for multiple unit roots would therefore be important when dealing
with stock variables of this kind. Tests for seasonal unit roots are
applicable when seasonal data are used. Standard unit-root tests may
provide misleading results in the presence of integration at seasonal
frequencies.
4.5.1. Multiple Unit Roots
Consider the problem of testing for $d > 1$ unit roots in a series. The
sequence of testing which starts with a test for a single unit root in the
undifferenced series, then proceeds to a test for a second unit root (that
is, tests the first-differenced series) if the first null (of a unit root in
levels) is not rejected, and so on, does not constitute a statistically valid
testing sequence, since all of the unit-root tests considered in this
chapter take the complete absence of unit roots as the alternative
hypothesis. Dickey and Pantula (1987) suggest a more natural sequential
testing procedure for unit roots which takes the largest⁵ number of unit
roots under consideration as the first maintained hypothesis and then
decreases the order of differencing each time the current null hypothesis
is rejected. This continues until the first time the null hypothesis is not
rejected.
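The sequential procedure may be illustrated for the case $d = 2$. Let us
consider the AR(2) model,
$$(1 - \rho_1 L)(1 - \rho_2 L)\, y_t = \varepsilon_t .$$
This model can be re-parameterized as
$$\Delta^2 y_t = \beta_1 \Delta y_{t-1} + \beta_2 y_{t-1} + \varepsilon_t ,$$
where $\beta_1 = (\rho_1\rho_2 - 1)$ and $\beta_2 = -(1 - \rho_1)(1 - \rho_2)$.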
The testing procedure consists of the following steps:
1. Test the null hypothesis of two unit roots against the alternative of
a single unit root. Under this null hypothesis $\beta_1 = \beta_2 = 0$ and an $F$-test
may be used to test it. Such a test, however, does not take account of
the one-sided nature of the alternative hypothesis. A more powerful
procedure follows from noting that, under both the null and the
alternative hypotheses, $\beta_2 = 0$. However, $\beta_1 = 0$ under the null
hypothesis but is less than zero under the alternative hypothesis. Thus, a
more powerful test is given by estimating the regression of $\Delta^2 y_t$ on
$\Delta y_{t-1}$, computing the $t$-ratio of $\hat\beta_1$, and performing a one-sided
lower-tail test using the Dickey-Fuller critical values.
2. If the null hypothesis above is rejected, proceed to test the null of
one unit root versus the stationary alternative. Here $H_0$ and $H_1$ are
given by $\beta_1 < 0$, $\beta_2 = 0$, and $\beta_1 < 0$, $\beta_2 < 0$ respectively. Thus, a
one-sided $t$-test here involves estimating the regression of $\Delta^2 y_t$ on
$\Delta y_{t-1}$ and $y_{t-1}$, computing the $t$-ratio of $\hat\beta_2$, and comparing it with
the Dickey-Fuller values.
This testing procedure may be generalized to testing for three or more
unit roots. Dickey and Pantula (1987) contains the results of a
simulation study. Their general conclusion is that the sequential procedure,
consisting of testing a null hypothesis of $k$ unit roots against an
alternative of $k - 1$ unit roots, based on $t$-tests, is considerably more
powerful than an $F$-test-based procedure.
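The two steps for $d = 2$ can be sketched in code. This is a minimal illustration rather than a definitive implementation: the function names are hypothetical, the regressions contain no constant or trend, and the 5 per cent critical value of $-1.95$ is an assumed value for the Dickey-Fuller $t$-distribution in the no-constant case, which should be checked against Table 4.2(a).

```python
import numpy as np

# Assumed 5% lower-tail Dickey-Fuller critical value (no constant, no trend).
DF_CRIT_5PCT = -1.95


def ols_t_ratios(y, X):
    """OLS of y on X (no intercept added); returns coefficients and t-ratios."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b, b / np.sqrt(s2 * np.diag(XtX_inv))


def dickey_pantula_d2(y, crit=DF_CRIT_5PCT):
    """Sequential test starting from the null of two unit roots.

    Returns (number of unit roots not rejected, step-1 t-ratio, step-2 t-ratio).
    The step-2 t-ratio is None when step 1 does not reject.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)          # first differences
    d2y = np.diff(dy)        # second differences, the dependent variable
    dy_lag = dy[:-1]         # Delta y_{t-1}, aligned with d2y
    y_lag = y[1:-1]          # y_{t-1}, aligned with d2y

    # Step 1: regress Delta^2 y_t on Delta y_{t-1}; one-sided test on beta_1.
    _, t1 = ols_t_ratios(d2y, dy_lag[:, None])
    if t1[0] >= crit:
        return 2, t1[0], None

    # Step 2: add y_{t-1}; one-sided test on beta_2.
    _, t2 = ols_t_ratios(d2y, np.column_stack([dy_lag, y_lag]))
    if t2[1] >= crit:
        return 1, t1[0], t2[1]
    return 0, t1[0], t2[1]


rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.standard_normal(500))
print(dickey_pantula_d2(random_walk))
```

The order of differencing is reduced only after a rejection, which mirrors the sequence described above: the largest number of unit roots is the first maintained hypothesis.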
4.5.2. Seasonal Integration
We have so far focused attention on testing for a unit root at the zero
frequency. However, when seasonal data are used, it may be necessary
to allow for seasonal averaging or seasonal differencing to achieve
stationarity. For example, the appropriate difference to use to transform
to stationarity may not be $x_t - x_{t-1}$, but $x_t - x_{t-4}$ in quarterly data or
$x_t - x_{t-12}$ in monthly data. Seasonal integration (and co-integration) and
testing for unit roots at seasonal frequencies are discussed by Engle,
Granger, and Hallman (1988), Ghysels (1990), Hylleberg, Engle,
Granger, and Yoo (1990), Engle, Granger, Hylleberg, and Lee (1993),
and Ilmakunnas (1990) among others.

⁵ Note that the first sequence took the smallest number (i.e. 1) of unit roots as its first
maintained hypothesis.
Just as a time series with no seasonal component may be well
described by a deterministic process, a stationary stochastic process, or
an integrated process, the seasonal component of a time series may be
well described by a process from any of these classes, or may combine
elements of each. While it is common practice to model a seasonal
component as having a deterministic or stationary form, there may be
cases where it is appropriate to allow the model of the seasonal
component to drift substantially over time. This possibility is implicit in
the practice of seasonal differencing (see e.g. Box and Jenkins 1970),
whereby a process observed $s$ times per year would be transformed to
its $s$-period difference, $x_t - x_{t-s}$, on the assumption that the process
contains an integrated seasonal component.
In order to allow for a unit root at a seasonal frequency, it is useful to
factor the lag polynomial of the process. If the lag polynomial contains a
factor $(1 - L^s) = \Delta_s$, corresponding to a seasonal unit root, then it can
be factorized as
$$(1 - L^s) = (1 - L)(1 + L + L^2 + \dots + L^{s-1}) = \Delta S(L).$$
That is, the seasonal difference operator can be broken down into the
product of the first difference operator and the moving-average seasonal
filter $S(L)$ containing further roots of modulus unity.
Engle et al. (1988) define a variable $x_t$ to be seasonally integrated of
orders $d$ and $D$ (denoted $SI(d, D)$), if $\Delta^d S(L)^D x_t$ is stationary. Thus,
for quarterly data, in the terminology established above, if $\Delta_4 x_t$ is
stationary, then $x_t$ is $SI(1, 1)$ with $S(L) = 1 + L + L^2 + L^3$. Further,
$$(1 - L^4) = (1 - L)(1 + L)(1 + L^2) = (1 - L)(1 + L)(1 - iL)(1 + iL).$$
Hence the quarterly seasonal unit root process has four roots of
modulus unity: one at the zero frequency, one at the two-quarter
(half-yearly) frequency, and a pair of complex conjugate roots at the
four-quarter (annual) frequency. To relate these roots to frequencies in
an intuitive way, consider the deterministic process $a(L)x_t = 0$. For
$a(L) = (1 + L)$, then $x_{t+1} = -x_t$ and so $x_{t+2} = x_t$; the process returns to
its original value on a cycle with a period of 2. For $a(L) = (1 - iL)$,
then $x_{t+1} = ix_t$, $x_{t+2} = i^2x_t = -x_t$, $x_{t+3} = -ix_t$, and $x_{t+4} = -i^2x_t = x_t$, so
that the process repeats with a period of 4.
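The factorization of $(1 - L^4)$ and the periods attached to its roots can be checked numerically. The sketch below uses NumPy's polynomial helpers; the function name `root_periods` is illustrative only.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Lag polynomials as coefficient lists in ascending powers of L.
first_diff = [1.0, -1.0]        # 1 - L      (zero frequency)
half_year = [1.0, 1.0]          # 1 + L      (two-quarter cycle)
annual = [1.0, 0.0, 1.0]        # 1 + L^2 = (1 - iL)(1 + iL)

# Multiply the three factors back together; this should give 1 - L^4.
product = P.polymul(P.polymul(first_diff, half_year), annual)
print(product)


def root_periods(coefs):
    """Return the sorted cycle lengths 2*pi/|angle| of the polynomial's roots.

    A root with zero angle (the zero-frequency root) is given period infinity.
    """
    periods = []
    for z in P.polyroots(coefs):
        angle = abs(np.angle(z))
        periods.append(np.inf if np.isclose(angle, 0.0) else 2 * np.pi / angle)
    return np.sort(np.array(periods))


print(root_periods([1.0, 0.0, 0.0, 0.0, -1.0]))
```

The roots $-1$ and $\pm i$ correspond to cycles of two and four quarters respectively, in line with the deterministic examples above.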
As with a process with a single unit root at the zero frequency (e.g.
the random walk $(1 - L)x_t = \varepsilon_t$), a seasonally integrated process such as
$(1 - L^4)x_t = \varepsilon_t$ retains the effect of shocks indefinitely, and has a
variance which increases linearly with time. However, because the
seasonally integrated process contains multiple roots of modulus unity, it
does not behave like an I(1) process in all respects. For example, shocks
to the system will also alter the seasonal pattern of the series, so that
the sequences of observations corresponding to each quarter may evolve
in different ways. The first difference of such a seasonally integrated
process will not be stationary.
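A short simulation makes the last points concrete. In the sketch below (function name and zero pre-sample values are assumptions made for illustration), the process $(1 - L^4)x_t = \varepsilon_t$ splits into four separate random walks, one per quarter, each driven only by that quarter's shocks.

```python
import numpy as np


def seasonal_random_walk(eps, s=4):
    """Generate x_t = x_{t-s} + eps_t, taking pre-sample values as zero."""
    x = np.zeros(len(eps))
    for t in range(len(eps)):
        previous = x[t - s] if t >= s else 0.0
        x[t] = previous + eps[t]
    return x


rng = np.random.default_rng(0)
eps = rng.standard_normal(400)
x = seasonal_random_walk(eps, s=4)

# The seasonal difference recovers the shocks exactly.
seasonal_diff = x[4:] - x[:-4]

# Observations for one quarter are the cumulated shocks of that quarter alone,
# so the four quarterly sequences wander independently of one another.
first_quarter = x[0::4]
first_quarter_shocks = np.cumsum(eps[0::4])

print(np.allclose(seasonal_diff, eps[4:]))
print(np.allclose(first_quarter, first_quarter_shocks))
```

Because each quarter follows its own random walk, the seasonal pattern itself is not fixed, which is why first differencing alone does not produce a stationary series.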
Testing for a unit root at a seasonal frequency has much in common
with testing for unit roots at the zero frequency. Tests have been
proposed by Hasza and Fuller (1982), Dickey, Hasza, and Fuller (1984),
Osborn, Chui, Smith, and Birchenhall (1988), Hylleberg et al. (1990),
and Engle et al. (1993), among others. We will follow Hylleberg et al. in
describing a testing strategy.
Consider a process observed quarterly and generated by
$$\gamma(L)x_t = \varepsilon_t, \qquad (20)$$
where $\varepsilon_t$ is IID$(0, \sigma^2)$ and $\gamma(L)$ is a fourth-order lag polynomial. We
wish to test the null hypothesis that the roots of $\gamma(L)$ lie on the unit
circle, against the hypothesis that they lie outside. Defining three
positive parameters $\delta_1$, $\delta_2$, and $\delta_3$, $\gamma(L)$ can be represented as⁶
$$\gamma(L) = (1 - \delta_1 L)(1 + \delta_2 L)(1 + \delta_3 L^2).$$
For $\delta_j$ close to one, this can be further rewritten by using a Taylor
series approximation, as

where the last term is a remainder (see Engle et al. (1993) for the
approximation theorem). Making the substitutions $\pi_1 = -\lambda_1$, $\pi_2 = -\lambda_2$,
$2\lambda_3 = -\pi_3 + i\pi_4$, and $2\lambda_4 = -\pi_3 - i\pi_4$, rewriting the expression for
$\gamma(L)$, and grouping terms in $\pi_3$ and $\pi_4$, we have

Substituting this expression into (20) and rearranging, we have

(21)

⁶ The last term appears as $(1 + \delta_3 L^2)$ rather than, as might be expected, as
$(1 + \delta_3 L)(1 + \delta_4 L)$ because $\gamma(L)$ is a real lag polynomial, and hence at least two of its
roots must be complex conjugates of each other.

Equation (21) can be estimated by OLS, possibly with added lags of
the dependent variable to capture autocorrelation in the errors. To test
the null that there is a unit root at zero frequency, we test $\lambda_1 = 0$, which
corresponds to $\pi_1 = 0$; to test for a root of $-1$ (half-yearly frequency),
we test $\lambda_2 = 0$, corresponding to $\pi_2 = 0$; to test for roots of $\pm i$ (annual
frequency), we test that $\lambda_3$ or $\lambda_4 = 0$, each of which requires a joint test
that $\pi_3$ and $\pi_4$ are equal to zero. Rejection of all of these null
hypotheses implies stationarity of the process.
The critical values for these tests are related to the Dickey-Fuller
($\pi_1$ and $\pi_2$) and Dickey-Hasza-Fuller values, and are tabulated by
Hylleberg et al.
Various extensions of the basic model are considered by Hasza and
Fuller (1982), Dickey et al. (1984), and Osborn et al. (1988), notably to
allow for the presence of a deterministic constant and trend terms in
(20) and higher orders of integration.
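The testing regression can be sketched as follows. The code assumes the auxiliary-variable form given in Hylleberg et al. (1990), in which the fourth difference is regressed on $y_{1,t-1}$, $y_{2,t-1}$, $y_{3,t-2}$, and $y_{3,t-1}$, with $y_{1t} = (1 + L + L^2 + L^3)x_t$, $y_{2t} = -(1 - L + L^2 - L^3)x_t$, and $y_{3t} = -(1 - L^2)x_t$; this is an assumption about the form of the regression, and the function names are illustrative. No deterministic terms or lags of the dependent variable are included.

```python
import numpy as np


def lag(x, k):
    """Shift a series back by k periods, padding the first k entries with NaN."""
    out = np.full(len(x), np.nan)
    out[k:] = x[:len(x) - k]
    return out


def hegy_transforms(x):
    """Filtered series that isolate the roots at +1, -1 and +/-i, plus Delta_4 x."""
    x = np.asarray(x, dtype=float)
    y1 = x + lag(x, 1) + lag(x, 2) + lag(x, 3)        # keeps the zero-frequency root
    y2 = -(x - lag(x, 1) + lag(x, 2) - lag(x, 3))     # keeps the root at -1
    y3 = -(x - lag(x, 2))                             # keeps the roots at +/-i
    y4 = x - lag(x, 4)                                # seasonal difference
    return y1, y2, y3, y4


def hegy_regression(x):
    """OLS of y4_t on (y1_{t-1}, y2_{t-1}, y3_{t-2}, y3_{t-1}).

    Returns the estimates of (pi_1, pi_2, pi_3, pi_4) and their t-ratios.
    """
    y1, y2, y3, y4 = hegy_transforms(x)
    X = np.column_stack([lag(y1, 1), lag(y2, 1), lag(y3, 2), lag(y3, 1)])
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y4)
    X, y = X[keep], y4[keep]
    XtX_inv = np.linalg.inv(X.T @ X)
    pi_hat = XtX_inv @ X.T @ y
    resid = y - X @ pi_hat
    s2 = resid @ resid / (len(y) - X.shape[1])
    return pi_hat, pi_hat / np.sqrt(s2 * np.diag(XtX_inv))


rng = np.random.default_rng(0)
pi_hat, t_ratios = hegy_regression(rng.standard_normal(200))
print(pi_hat, t_ratios)
```

The $t$-ratios on $\pi_1$ and $\pi_2$ and a joint statistic on $(\pi_3, \pi_4)$ would then be compared with the tabulated critical values rather than with standard normal or $F$ tables.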

4.6. Asymptotic Distributions of Test Statistics


We will now consider some examples of the use of a functional central
limit theorem to derive the asymptotic distributions of test statistics such
as those above, for hypotheses involving integrated variables. Again,
recall that results on the sums of powers of trend terms are summarized
in Section 1.5.5, and that the relationships among particular sample
moments, functionals of Wiener processes, and densities from the
normal family are given in Table 3.3.
4.6.1. Example: Dickey-Fuller Tests
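The simplest version of this test is based on the null hypothesis that the
DGP is $y_t = y_{t-1} + u_t$, $u_t \sim \mathrm{IID}(0, \sigma^2)$ and $y_0 = 0$. The model used is
$y_t = \rho y_{t-1} + u_t$. Therefore, estimating the model by OLS,
$$T(\hat\rho - 1) = \frac{T^{-1}\sum_{t=1}^{T} y_{t-1}u_t}{T^{-2}\sum_{t=1}^{T} y_{t-1}^2}.$$
By equations (3.22) and (3.23),
$$T^{-2}\sum_{t=1}^{T} y_{t-1}^2 \Rightarrow \sigma^2\int_0^1 W(r)^2\,dr$$
and
$$T^{-1}\sum_{t=1}^{T} y_{t-1}u_t \Rightarrow \frac{\sigma^2}{2}\,[W(1)^2 - 1].$$
Hence
$$T(\hat\rho - 1) \Rightarrow \frac{\tfrac{1}{2}[W(1)^2 - 1]}{\int_0^1 W(r)^2\,dr}.$$
The percentiles of this distribution are those given in Table 4.1(a).
Further,
$$t(\hat\rho) \Rightarrow \frac{\tfrac{1}{2}[W(1)^2 - 1]}{\left(\int_0^1 W(r)^2\,dr\right)^{1/2}},$$
where
$$t(\hat\rho) = (\hat\rho - 1)\Bigl(\sum_{t=1}^{T} y_{t-1}^2\Bigr)^{1/2}\Big/ s, \qquad s^2 = T^{-1}\sum_{t=1}^{T}(y_t - \hat\rho y_{t-1})^2.$$
The percentiles of the distribution are given in Table 4.2(a).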
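The non-standard shape of these two distributions can be seen by simulation. The following is a minimal Monte Carlo sketch under the null DGP with standard normal errors; the function name, sample size, and number of replications are arbitrary choices, and the residual variance uses a degrees-of-freedom divisor of $T - 1$.

```python
import numpy as np


def simulate_df(T=200, reps=2000, seed=0):
    """Simulate T(rho_hat - 1) and the t-ratio under y_t = y_{t-1} + u_t, y_0 = 0."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((reps, T))
    y = np.cumsum(u, axis=1)                                # y_1, ..., y_T
    y_lag = np.hstack([np.zeros((reps, 1)), y[:, :-1]])     # y_0, ..., y_{T-1}

    num = np.sum(y_lag * u, axis=1)                         # sum of y_{t-1} u_t
    den = np.sum(y_lag ** 2, axis=1)                        # sum of y_{t-1}^2
    rho_hat = 1.0 + num / den                               # OLS estimate of rho

    resid = y - rho_hat[:, None] * y_lag
    s2 = np.sum(resid ** 2, axis=1) / (T - 1)
    t_ratio = (rho_hat - 1.0) / np.sqrt(s2 / den)
    return T * (rho_hat - 1.0), t_ratio


norm_bias, t_ratio = simulate_df()
for p in (0.01, 0.05, 0.10):
    print(p, np.quantile(norm_bias, p), np.quantile(t_ratio, p))
```

The simulated lower-tail quantiles can be compared with Tables 4.1(a) and 4.2(a); both distributions are skewed to the left relative to the normal.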


Now suppose that $y_t$ is generated by the slightly more elaborate
process,
$$y_t = \mu_c + y_{t-1} + u_t, \qquad (22)$$
with $u_t \sim \mathrm{IID}(0, \sigma^2)$ and $y_0 = 0$. The model is given by
$$y_t = \mu_c + \rho_c y_{t-1} + \gamma_c t + u_t, \qquad (23)$$
where $\rho_c = 1$, $\gamma_c = 0$ under the null. The null hypothesis therefore
entails that the series is a random walk with possible drift, and the
alternative is stationarity around a possibly non-zero deterministic trend.
Consider using the model in (23), with $\gamma_c$ and $\mu_c$ unconstrained, to
test null hypotheses of the form $H_1\colon \rho_c = 1$, $H_2\colon \gamma_c = 0$, and
$H_0\colon (\rho_c - 1) = \gamma_c = 0$. $H_0$ is the standard Dickey-Fuller null, given as
case (iii) in the discussion earlier on similar tests. These tests may all be
put within a common framework by using a set of transformations
suggested by Sims et al. (1990).
Under the null $H_0$, $y_t = \mu_c t + S_t$, where $S_t = \sum_{i=1}^{t} u_i$. In general,
(23) can be rewritten as
or

where z j = (zi, t, Z 2 , t , z^, 0 ' = (#1, #2 , #3) . wit h 6 l = (jU c + y c ),


#2 = PC, #3 = (y c + PA), and zu = 1, z2 , r = y, - fi ct = St, zi,, = t .
The transformed regressors are linear combinations of the original
regressors, with the linear combinations chosen to isolate the regressors
with different stochastic properties, that is, a constant, an integrated
process with no deterministic trend component, and a linear trend,
respectively. Given the rates of convergence implied in (3.21)-(3.24),
OLS estimators of the coefficients in $\theta$ converge at different rates.
Define the scaling matrix $\Upsilon_T = \mathrm{diag}(T^{1/2}, T, T^{3/2})$ partitioned
conformably with $z_t$ and $\theta$.
With these definitions, the OLS estimator of $\theta$ is

so that
where

From (3.21)-(3.24) we can derive the limiting distributions of the six
elements in the $3 \times 3$ symmetric matrix $V_T$ and the three different
elements in $\phi_T$. This is done under the additional assumption that
$\mu_c = 0$, without any loss of generality, since, having included the trend
in (23), the estimates $\hat\theta$ are invariant to the true value of $\mu_c$ given that
there is in fact no trend in the DGP.⁷ These elements are:

⁷ Refer to the discussion on similar tests earlier in the chapter.

The analytical densities of $V_{T,1,2}$, $V_{T,2,3}$, $\phi_{T,1}$, $\phi_{T,2}$, and $\phi_{T,3}$ can be
found from Table 3.3. In the case of $\phi_{T,2}$ we use the fact that the square
of $W(1)$ is distributed as $\chi^2(1)$, recalling that $W(1)$ is standard normal.
The closed-form density for the functional to which $V_{T,2,2}$ converges is
more difficult to derive, but an asymptotic expansion is given by Abadir
(1992).
If, as in this Dickey-Fuller test, we are particularly interested in the
estimator of $\rho_c$ and its $t$-ratio, $t(\hat\rho_c)$, choosing the appropriate elements
from above gives

and

where $V^{22}$ denotes the second element on the diagonal of $V^{-1}$, and
$f_i(W)$, $i = 1, 2$, are combinations of the functionals of Wiener processes
derived above. For example, from (24)ff., $\hat\rho_c = \hat\theta_2$, $\Upsilon_{T,22} = T$, and
$\theta_2 = 1$ under the null. So from (26),

the second element of the $3 \times 1$ matrix $V^{-1}\phi$. From (27) we note that
$(\hat\rho_c - 1)$ converges at rate $O_p(T^{-1})$ instead of the conventional
$O_p(T^{-1/2})$. Similarly, from (28), the corresponding $t$-ratio has a
non-degenerate distribution differing from the standardized normal
distribution which appears in the conventional asymptotic theory appropriate to
stationary processes.
There are analogous expressions for general Wald statistics for the
tests of joint hypotheses. Suppose that the Wald statistic tests the $q$
hypotheses $R\theta = r$ in (24). The test statistic is

The asymptotic behaviour of this test statistic after suitable scaling by
$\Upsilon_T$ is then a function of the limiting distributions of $V_T$ and $\phi_T$.
4.6.2. Example: Augmented Dickey-Fuller Tests
In this case we assume that the DGP is similar to (22), but that the
error term is an AR($p + 1$) process with a unit root. The corresponding
model is
with la g polynomial /?(L ) = 2f=i/3,-L' wher e th e root s o f [ 1 - /3(L)L ]
lie outsid e th e uni t circle . Unde r th e nul l hypothesi s H 0: {p c = 1 ,
yc = 0}, th e DG P i s a n AR(jC> ) generalizatio n o f (22 ) so tha t w e ca n
again use the transforme d model
where no w z' f = (i{ <t, z 2 ,> ZS. M Z 4,t) an d 0' = (0[, 6 2, 03, 04). T o defin e
the element s o f zj, le t jU c = E(Ay t) = ( 1 - j8(l))~V c = b{i c, the unconditional mea n o f the drif t unde r th e null, usin g b = (1 - ^(l))" 1. Next,
let

The 0 { ar e give n b y 0{ = (ft, ft , . . ., ft,) , 0 2 = A* c + j8(l)A c + y c ,


63 = pc, an d 0 4 = y c + p cuc. Th e scalin g matri x T r become s
diag(r 1/2 ip, T 1/2, T, r3/2) wher e i p i s the uni t vecto r o f dimensio n p .
Finally l p = E(zittz[tt), th e covarianc e matri x o f z^,. Th e element s of
the matrice s Vj - an d <J>T ar e simila r t o thos e fo r th e simpl e Dickey Fuller test . Then, usin g 4> to denot e convergenc e in probability

Again, Table 3.3 may be applied to find the densities of the Wiener
processes appearing above, with the exception of that appearing in the
expression for $V_{T,3,3}$; again, an expansion for this density is given by
Abadir (1992).
$V$ is therefore block diagonal, and the estimators of the nuisance
parameters $\beta$ are asymptotically normal and do not affect the asymptotic
distributions of the Dickey-Fuller statistics, so that the same critical
values can be used. The $b$s that appear in some of the expressions
cancel appropriately to make this possible. This may be seen in the
simplest case where the model does not include either the constant or
the trend term but does include the $\Delta y_{t-i}$ terms. Noting that in this case
the term s Vj-^2 , \T,i,4'
^r,2,3 11
^r,2,4 > Vr,3A>
$r,2 > an d 0r, 4 ar e n t
1
pp
relevant, an d tha t V" = diag(o) . . . a) , V^3,3), wher e o> " i s th e z'th
diagonal elemen t o f S2 p th e distributio n o f th e f-statisti c i s give n b y
t = (o" 2Fri3j3)^1//207-;3. Thi s ha s th e standar d Dickey-Fulle r distributio n
with the critical values given by Table 4.2(a). The results extend to the
cases where the constant and (or) trend are (is) included in the model
with the critical values given by Tables 4.2(b) and 4.2(c) respectively.
The inclusion of the I(0) terms $\Delta y_{t-i}$ leaves unchanged the asymptotic
distributions of the parameters of interest.
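The simplest augmented regression, without constant or trend, can be sketched as follows. The regression is written in the equivalent differenced form, $\Delta y_t = (\rho - 1)y_{t-1} + \sum_{i=1}^{p}\beta_i\Delta y_{t-i} + e_t$, so that the reported statistic is the $t$-ratio for $\rho = 1$; the function name `adf_t_ratio` is illustrative.

```python
import numpy as np


def adf_t_ratio(y, p):
    """t-ratio on y_{t-1} in a regression of Delta y_t on y_{t-1} and p lagged differences.

    No constant or trend is included, so the statistic is to be compared with
    the Dickey-Fuller critical values for that case.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                        # dy[j] = y[j+1] - y[j]
    n = len(dy)
    lhs = dy[p:]                           # Delta y_t
    cols = [y[p:-1]]                       # y_{t-1}
    for i in range(1, p + 1):
        cols.append(dy[p - i:n - i])       # Delta y_{t-i}
    X = np.column_stack(cols)

    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ lhs
    resid = lhs - X @ b
    s2 = resid @ resid / (len(lhs) - X.shape[1])
    return b[0] / np.sqrt(s2 * XtX_inv[0, 0])


rng = np.random.default_rng(0)
random_walk = np.cumsum(rng.standard_normal(300))
print(adf_t_ratio(random_walk, p=0), adf_t_ratio(random_walk, p=4))
```

Adding the lagged differences changes the estimate in finite samples, but the statistic is referred to the same table whatever the value of $p$, as the block-diagonality argument above implies.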

4.6.3. Example: Non-parametric Test Statistics (Phillips 1987a)


Consider the simple random-walk process $y_t = y_{t-1} + u_t$. The main
features of non-parametric corrections may be illustrated by assuming
that the only restrictions imposed on the stochastic process $\{u_t\}_{t=1}^{\infty}$ are
those given by conditions (3.16a)-(3.16d); $\{u_t\}_{t=1}^{\infty}$ may therefore be an
ARMA($p$, $q$) process in which case the $t$-statistic for $\rho$, in the model
$y_t = \rho y_{t-1} + u_t$, does not have the standard Dickey-Fuller distribution.
As discussed earlier in this chapter, a non-parametric correction is one
way of accounting for the autocorrelation in the $\{u_t\}_{t=1}^{\infty}$ series. This
correction enables us to retain the use of the Dickey-Fuller critical
values to conduct inference and therefore expands the range of models
to which the Dickey-Fuller tests can be applied.
Using the results in (3.21)-(3.24), the estimator $\hat\rho$ and its $t$-ratio $t(\hat\rho)$
have the following limiting distributions:

where $\lambda = (\sigma^2 - \sigma_u^2)/2$, where $\sigma^2$ and $\sigma_u^2$ are as defined in (10a) and
(10b). If the $u_t$ are IID$(0, \sigma_u^2)$, then $\sigma^2 = \sigma_u^2$, and $\lambda = 0$. If so, the
distributions of $\hat\rho$ and its $t$-ratio in (31) and (32) above are the usual
Dickey-Fuller distributions.
It may then be verified that the limiting distribution of the statistic
$Z(\hat\rho)$, where

is the same as the distribution obtained by setting $\lambda = 0$ in (31). This
follows from an inspection of (31) and by noting that

Similarly, the limiting distribution of $Z(t(\hat\rho))$, where

is the same as the distribution obtained by setting $\lambda = 0$ in (32).


The limiting distributions of (33) and (34) are unchanged when $\lambda$ is
replaced by $\hat\lambda$ in these expressions, where $\hat\lambda$ is a consistent estimator of
$\lambda$. Consistent estimators of $\sigma^2$ and $\sigma_u^2$ are required in order to obtain a
consistent estimator of $\lambda$ and to implement the non-parametric
corrections. A consistent estimator of $\sigma_u^2$ is given by either $T^{-1}\sum_{t=1}^{T}(y_t - y_{t-1})^2$
or $T^{-1}\sum_{t=1}^{T}(y_t - \hat\rho y_{t-1})^2$. The asymptotic equivalence of the two
estimators follows from the property that $\hat\rho \to 1$ in probability.⁸ A consistent
estimator of $\sigma^2$ can be obtained from (12) or (13) as before.
Using arguments similar to those outlined above, the non-parametric
corrections for the more elaborate models which include constant or
constant and trend, may be derived. In particular, $Z(\hat\rho_i)$ and $Z(t(\hat\rho_i))$
($i = b, c$) may be obtained.
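The corrections for the model without constant or trend can be sketched as below. Two things are assumed rather than taken from the text: the statistics are written in the form commonly attributed to Phillips (1987), $Z(\hat\rho) = T(\hat\rho - 1) - \hat\lambda/(T^{-2}\sum y_{t-1}^2)$ and $Z(t(\hat\rho)) = (S_u/S_{Tl})\,t(\hat\rho) - \hat\lambda/\{S_{Tl}(T^{-2}\sum y_{t-1}^2)^{1/2}\}$, and the long-run variance is estimated from the regression residuals with Bartlett weights. The function name is illustrative.

```python
import numpy as np


def phillips_z_stats(y, l):
    """Non-parametrically corrected statistics for y_t = rho y_{t-1} + u_t.

    y holds y_0, ..., y_T; l is the lag-truncation parameter.
    Returns (Z_rho, Z_t, T(rho_hat - 1), t_ratio).
    """
    y = np.asarray(y, dtype=float)
    y_lag, y_cur = y[:-1], y[1:]
    T = len(y_cur)

    den = np.sum(y_lag ** 2)
    rho_hat = np.sum(y_lag * y_cur) / den
    u = y_cur - rho_hat * y_lag                      # regression residuals

    s2_u = np.sum(u ** 2) / T                        # estimate of sigma_u^2
    s2_tl = s2_u                                     # estimate of sigma^2
    for j in range(1, l + 1):
        weight = 1.0 - j / (l + 1.0)                 # Bartlett weight (assumed)
        s2_tl += 2.0 * weight * np.sum(u[j:] * u[:-j]) / T

    lam_hat = 0.5 * (s2_tl - s2_u)
    scaled_den = den / T ** 2                        # T^{-2} sum of y_{t-1}^2

    norm_bias = T * (rho_hat - 1.0)
    t_ratio = (rho_hat - 1.0) * np.sqrt(den / s2_u)

    z_rho = norm_bias - lam_hat / scaled_den
    z_t = np.sqrt(s2_u / s2_tl) * t_ratio - lam_hat / np.sqrt(s2_tl * scaled_den)
    return z_rho, z_t, norm_bias, t_ratio


rng = np.random.default_rng(1)
y = np.concatenate([[0.0], np.cumsum(rng.standard_normal(300))])
print(phillips_z_stats(y, l=0))
print(phillips_z_stats(y, l=4))
```

With $l = 0$ the two variance estimates coincide, $\hat\lambda = 0$, and the corrected statistics reduce to the uncorrected ones, matching the IID case discussed above.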

4.6.4. Example: Instrumental Variables Test for Unit Roots (Hall 1989)
The non-parametric statistics described in example 4.6.3 are known not
to perform well in finite samples in the presence of negative
moving-average errors (see Schwert 1989). Hall (1989) proposed estimation by
instrumental variables as an alternative to the use of non-parametric
corrections. He showed that in the regression model $y_t = \rho y_{t-1} + u_t$,
where $u_t$ is a moving-average process of some specified order and $\rho$ is
equal to 1 under $H_0$, then $\hat\rho_{IV}$ has the standard Dickey-Fuller
distribution.
The intuition for this result may be easily described: $\hat\rho_{OLS}$ in the above
model does not have the standard Dickey-Fuller distribution because of
the bias induced by the correlation between $y_{t-1}$ and $u_t$ (when $u_t$ is an
ARMA($p$, $q$) process). It is therefore necessary to use a correction
factor to remove this bias. This bias does not appear when, say, $y_{t-2}$ is
used as an instrument for $y_{t-1}$ and $u_t$ is an MA(1) process. The
Dickey-Fuller tables can thus be used directly. We formalize this
intuition next by presenting a simple example and by using some of the
distributional results derived earlier in the chapter. Throughout, to
simplify the algebra, adequate initial observations are assumed to be
available, so all sums are taken over $1 \dots T$.

⁸ As noted above, the finite-sample behaviour of these two estimators may be quite
different (see Schwert 1989).
Let the DGP be given by

Then $\hat\rho_{IV}$, the instrumental variables estimator of $\rho$ which uses $y_{t-2}$ as an
instrument for $y_{t-1}$, is given by
$$\hat\rho_{IV} = \frac{\sum y_{t-2}\,y_t}{\sum y_{t-2}\,y_{t-1}}. \qquad (36)$$
Next, we want to prove that
$$T(\hat\rho_{IV} - 1) \Rightarrow \frac{\tfrac{1}{2}[W(1)^2 - 1]}{\int_0^1 W(r)^2\,dr}, \qquad (37)$$
where $W(r)$ is the Wiener process associated with the sequence $\{\varepsilon_t\}$.
The RHS of this expression is the limiting distribution of the simple
Dickey-Fuller test for a model like (35) when the $u_t$ are IID (see
Section 4.6.1). Thus, we need to show that, for the instrument $y_{t-k}$

Note that

Proof of (i). From (35a),

This follows from the fact that

Recall now from (3.23) that

for the DGP given by (35a)-(35c). Further, for the error process $u_t$,
$\sigma_u^2 = (1 + \theta^2)\sigma_\varepsilon^2$ and $\sigma^2 = (1 + \theta)^2\sigma_\varepsilon^2$.
It also follows from (35b) that

Using (39), it is now possible to see from (38) that

But $\sigma_u^2 = (1 + \theta^2)\sigma_\varepsilon^2$. Hence

The last equality follows from the expression for $\sigma^2$ given previously. (i)
now follows routinely from (40).
Proof of (ii).

All terms of the form $T^{-2}\sum_{t=1}^{T} y_{t-1}u_{t-j}$, $j = 1, 2, \dots, (k - 1)$, converge
in probability to zero. This is because the scaling $T^{-1}$ is appropriate for
these sums to have non-degenerate distributions.⁹ The scaling $T^{-2}$
induces degeneracy. The distribution of $T^{-2}\sum_{t=1}^{T} y_{t-1}^2$ is given by
$\sigma^2\bigl(\int_0^1 W(r)^2\,dr\bigr)$ for the DGP (35a)-(35c); (ii) now follows routinely.
Finally, (37) follows from (36), using $k = 2$ in (i) and (ii), since

It also follows from (37) that the $t$-ratio form of the test,

has the Dickey-Fuller $t$-distribution where $\hat\sigma$ is a consistent estimator of
$\sigma$ (possibly equal to $(1 + \hat\theta)\hat\sigma_\varepsilon$, where $\hat\theta$ and $\hat\sigma_\varepsilon$ are OLS estimators of $\theta$
and $\sigma_\varepsilon$).

⁹ This follows from arguments similar to those used to prove (3.21)-(3.24).
Thus, estimation by instrumental variables has the same effect as the
non-parametric corrections to $\hat\rho_{OLS}$ proposed by Phillips and Perron.
In a small Monte Carlo study, Hall (1989) shows that the size
problems associated with the Phillips-Perron test are partially alleviated
by the use of this instrumental variable procedure. However, substantial
size distortions remain in the cases where $\theta < 0$ in the null model. No
power calculations are reported in Hall's paper.
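The estimator itself is simple to compute. The sketch below contrasts it with OLS on a random walk with MA(1) errors; the function names, the MA parameter, and the sample size are illustrative choices, and the errors are assumed to take the form $u_t = \varepsilon_t + \theta\varepsilon_{t-1}$.

```python
import numpy as np


def rho_ols(y):
    """OLS estimate of rho in y_t = rho y_{t-1} + u_t."""
    y = np.asarray(y, dtype=float)
    return np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)


def rho_iv(y, k=2):
    """IV estimate of rho using y_{t-k} as the instrument for y_{t-1}."""
    y = np.asarray(y, dtype=float)
    y_t = y[k:]             # y_t
    y_lag1 = y[k - 1:-1]    # y_{t-1}
    y_lagk = y[:-k]         # y_{t-k}, the instrument
    return np.sum(y_lagk * y_t) / np.sum(y_lagk * y_lag1)


# Random walk with MA(1) errors u_t = e_t + theta * e_{t-1} (theta assumed -0.5).
rng = np.random.default_rng(0)
theta, T = -0.5, 500
e = rng.standard_normal(T + 1)
u = e[1:] + theta * e[:-1]
y = np.concatenate([[0.0], np.cumsum(u)])

print(T * (rho_ols(y) - 1.0), T * (rho_iv(y, k=2) - 1.0))
```

For an MA(1) error the instrument $y_{t-2}$ is uncorrelated with $u_t$, which is the source of the bias removal described above; a single draw such as this one only illustrates the computation and says nothing about size or power.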
4.6.5. Example: Bounds Test for Unit Roots (Phillips and Ouliaris 1988)
A limitation of the testing procedures discussed in this chapter is that
the distributions of the test statistics are non-standard. Consequently, a
number of different sets of critical values have to be used to implement
the tests.
This problem is at the heart of a literature which exploits the idea that
differencing an I(0) series induces a unit root in the moving-average
representation of the process. Use is made of this fact to devise a
unit-root test based on the long-run variance, defined in (3.16c), of the
first-differenced time series. The critical values are taken from the
standard normal table.
In order to illustrate this approach, assume that $y_t$ follows the
IMA(1,1) process,
$$\Delta y_t = (1 - \theta L)\varepsilon_t = u_t, \qquad (41)$$
with $\varepsilon_t \sim \mathrm{IID}(0, \sigma_\varepsilon^2)$. The long-run variance of $\Delta y_t$ is $\sigma^2 = (1 - \theta)^2\sigma_\varepsilon^2$,
so $\sigma^2 = 0$ if and only if $\theta = 1$. In other words, if $y_t$ is I(0), $\Delta y_t$ will have
$\sigma^2 = 0$, while if it is I(1), with $|\theta| < 1$, $\sigma^2 \neq 0$. Phillips and Ouliaris
(1988) therefor e tak e a s thei r nul l hypothesi s H 0: o 2 + 0 o r (equival ently, bu t standardizin g t o eliminat e unit s o f measuremen t effects )
HQ-. T 2 = o 2/o2e = 0 agains t th e alternativ e hypothesi s // 1 :r 2 = 0. Ob taining an estimate o f a 2 a s in (13), they prove that 10
^ 2 (f 2 - T 2)/r2 ~ N(0, 1). (42
)
10

is the lag-truncatio n paramete r a s defined in (12).


They propose a bounds procedure based upon the confidence interval
corresponding to (42) and given by

where $z_\alpha$ is the $(1 - \alpha)$th percentage point of the standard normal
distribution. According to the bounds test, $H_0$ is rejected if the upper
limit of $r^2$ in (43) is sufficiently small and close to zero. Conversely, $H_0$
is not rejected if the lower bound is sufficiently large and non-zero; 0.10
is recommended as a thumb-rule value of 'nearness'. Simulation results
show that this suggested critical value can lead to very conservative tests
in some cases. For example, if the DGP is ARIMA(0, 1, 1) with values
of the parameter $\theta$ in the interval $(-0.6, 0.6)$, the average upper bound
is 0.45 while the average value of the lower bound is close to 0.10.
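The long-run variance on which this test rests can be computed directly for the IMA(1,1) example in (41). The sketch below gives the population value from the MA(1) autocovariances and a sample estimate with Bartlett weights; the function names and the choice of Bartlett weights are assumptions made for illustration, and the bounds in (43) themselves are not computed.

```python
import numpy as np


def ma1_long_run_variance(theta, sigma_e2=1.0):
    """Long-run variance of u_t = (1 - theta L) e_t from its autocovariances.

    gamma_0 = (1 + theta^2) sigma_e^2 and gamma_1 = -theta sigma_e^2, so that
    gamma_0 + 2 gamma_1 = (1 - theta)^2 sigma_e^2.
    """
    gamma0 = (1.0 + theta ** 2) * sigma_e2
    gamma1 = -theta * sigma_e2
    return gamma0 + 2.0 * gamma1


def bartlett_long_run_variance(u, l):
    """Kernel estimate of the long-run variance of a zero-mean series u."""
    u = np.asarray(u, dtype=float)
    T = len(u)
    lrv = np.sum(u * u) / T
    for j in range(1, l + 1):
        weight = 1.0 - j / (l + 1.0)
        lrv += 2.0 * weight * np.sum(u[j:] * u[:-j]) / T
    return lrv


# Differencing white noise gives theta = 1 and a long-run variance of zero.
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(2000)
over_differenced = np.diff(white_noise)

print(ma1_long_run_variance(1.0), ma1_long_run_variance(0.5))
print(bartlett_long_run_variance(over_differenced, l=10))
```

The estimate for the over-differenced series is small relative to the variance of the series, which is the feature the bounds procedure is designed to detect.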
An implication of this type of test is that, because of asymptotic
normality, it can be applied to deal with very general trend-cycle models
(for example, linear functions of time or any type of dummy variable).
All that is required is to perform the previous test on the differenced
residuals of the regression of $y_t$ on the deterministic terms.
Phillips and Ouliaris (1988) extend this approach to testing for
co-integration among a set of $n$ variables in the vector $x_t$. If $x_t$ does not
form a co-integrated set of variables, $\alpha'x_t$ is I(1) for all $\alpha$. Hence,
generalizing the analysis given above, $\alpha'\Delta x_t$ has a positive definite
long-run variance matrix $\alpha'\Omega\alpha$, where $\Omega$ is the long-run variance matrix
of $\Delta x_t$. Since $\alpha'\Omega\alpha = 0$ implies that $\Omega\alpha = 0$, Phillips and Ouliaris (1990)
suggest testing for a zero eigenvalue in $\Omega$, using a multivariate estimator
of $\Omega$, under the null hypothesis of 'no co-integration'. The test is based
on the bounds procedure discussed previously but is applied to the
minimum of the estimated eigenvalues of the consistent estimator
of $\Omega$.
Taken together, the methods of testing just presented offer a means
of discriminating between stationary and non-stationary processes in
reasonably general circumstances, without too great a proliferation of
tables of critical values. There remains work to be done, however, in
improving the power of the tests and in achieving a greater conformity
with nominal sizes in finite samples, for particular kinds of error
process. Moreover, research is needed into the effects of parameter
non-constancy, or even of the possibility that the degree of integration
may not be constant, on such tests.
Tests for unit roots are applied for a wide variety of reasons. The
tests may, first of all, be directly relevant to economic theory, which
offers a number of examples of hypotheses that imply unit roots in
observable data series. Moreover, because of the potential problem of
spurious regression, investigators working with highly autocorrelated
series will often want to test for non-stationarity in these series. If
non-stationarity can be rejected, standard regression methods can be
applied safely; otherwise, an investigator may choose to transform the
series to stationarity, or may investigate co-integrating relationships
between the data series which, if present, could again justify regression
involving the levels of the variables.
The next chapter takes up the topic of co-integration among different
processes and thereby continues the study of regression models of
non-stationary data series. Tests for co-integration, which will be
considered in Chapter 7, bear a close relationship to tests for unit roots.

Co-integration
We define the concept of co-integration of integrated time-series
and give several examples. An important theorem due to Granger
on alternative representations of a system of co-integrated variables
is stated and its proof is sketched. We then discuss the
Engle-Granger two-step procedure for estimating the parameters
characterizing the co-integrating relationship.
In Chapter 1 we discussed our use of the word 'equilibrium'. The idea
that variables hypothesized to be linked by some theoretical economic
relationship should not diverge from each other in the long run is a
fundamental one.¹ Such variables may drift apart in the short run or
because of seasonal effects, but if they were to diverge without bound,
an equilibrium relationship among such variables could not be said to
exist. The divergence from a stable equilibrium state must be
stochastically bounded and, at some point, diminishing over time. 'Co-integration'
may be viewed as the statistical expression of the nature of such
equilibrium relationships.
The concep t o f co-integration is a powerful on e becaus e i t allow s us t o
describe th e existenc e o f a n equilibrium , o r stationary , relationshi p
among tw o o r mor e time-series , eac h o f whic h i s individuall y non stationary.2 Tha t is, while the componen t time-serie s ma y have moment s
such a s means , variances , an d covariance s varyin g wit h time , som e
linear combinatio n o f thes e series , whic h define s th e equilibriu m rela tionship, ha s time-invariant linear properties .
The wor d 'co-integration ' clearl y demand s a forma l definitio n o f
'integration', an d thi s wa s provided i n Chapte r 3 . Informally , a serie s is
said t o b e integrate d i f it accumulate s some pas t effects ; suc h a serie s is
non-stationary becaus e it s futur e pat h depend s upo n al l suc h pas t
influences, an d i s no t tie d t o som e mea n t o whic h i t mus t eventuall y
1
Familia r example s o f hypothesize d long-ru n relationship s includ e th e quantit y theor y
of money , th e Fishe r effect , th e permanent-incom e hypothesi s o f consumption , an d
purchasing-power parity .
2
Typically , i n economi c application s on e look s fo r th e existenc e o f co-integratin g
relationships amon g variable s individuall y integrate d o f orde r one . Th e deviatio n fro m th e
equilibrium relationshi p i s thu s integrate d o f orde r zer o (i.e . i s stationary ) whe n th e
variables ar e co-integrated .

Co-integration 13

return. T o transfor m a n integrate d serie s t o achiev e stationarity , w e


must differenc e it a t leas t once . However , a linear combinatio n o f series
may hav e a lowe r orde r o f integratio n tha n an y on e o f the m ha s
individually. I n thi s case , th e variable s ar e sai d t o b e co-integrated. 3
Thus, fo r example , i f {x t} an d {y t} ar e integrate d o f orde r 1 an d ar e
also co-integrated , the n {A*,} , {Ay (}, an d {x t + ayt}, fo r som e a , ar e
all stationary series .
This chapte r provide s forma l definition s o f co-integratio n an d o f
related concepts . Severa l theorem s ar e stated , applying i n particula r t o
alternative representation s o f co-integrated processes .

5.1. An Example
In order to illustrate the preceding discussion, consider a simple
example. Two series $\{x_t\}$ and $\{y_t\}$ are each integrated of order 1 and
evolve according to the following data-generation process:[4]

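$$x_t + \beta y_t = u_t, \qquad u_t = u_{t-1} + \varepsilon_{1t}, \qquad (1)$$

$$x_t + \alpha y_t = e_t, \qquad e_t = \rho e_{t-1} + \varepsilon_{2t}, \quad |\rho| < 1. \qquad (2)$$

$(\varepsilon_{1t}, \varepsilon_{2t})'$ is distributed identically and independently as a bivariate
normal with

$$E(\varepsilon_{it}) = 0, \quad \mathrm{var}(\varepsilon_{it}) = \sigma_{ii} \ (i = 1, 2), \quad \mathrm{cov}(\varepsilon_{1t}, \varepsilon_{2t}) = \sigma_{12}.$$

Solving for $x_t$ and $y_t$ from the above system with $\alpha \neq \beta$ gives

$$x_t = \alpha(\alpha - \beta)^{-1} u_t - \beta(\alpha - \beta)^{-1} e_t,$$

$$y_t = -(\alpha - \beta)^{-1} u_t + (\alpha - \beta)^{-1} e_t.$$
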
Since $\{u_t\}$ is a random walk and $\{x_t\}$ and $\{y_t\}$ depend linearly on
$\{u_t\}$, these may therefore be classified as I(1) variables. Nonetheless,
$\{x_t + \alpha y_t\}$ is I(0) because $e_t$ is stationary in (2). In this example the
vector $[1 \ \ \alpha]'$ is the co-integrating vector and $x + \alpha y$ is the equilibrium
relationship. In the long run, the variables move towards the
equilibrium $x + \alpha y = 0$, recognizing that this relationship need not be
realized exactly even as $t \to \infty$.
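The example can be checked numerically. The following is a minimal sketch, assuming the Engle-Granger form of the data-generation process ($x_t + \beta y_t = u_t$ with $u_t$ a random walk, and $x_t + \alpha y_t = e_t$ with $e_t$ a stationary AR(1)), independent standard-normal shocks, and zero starting values. The function name `simulate_dgp` and all parameter values are illustrative assumptions, not taken from the text.

```python
import numpy as np

def simulate_dgp(T=500, alpha=-1.0, beta=-2.0, rho=0.5, seed=0):
    """Simulate x_t and y_t from
         x_t + beta*y_t  = u_t,  u_t = u_{t-1} + eps1_t      (random walk)
         x_t + alpha*y_t = e_t,  e_t = rho*e_{t-1} + eps2_t  (stationary AR(1))
    Assumption: eps1 and eps2 are independent N(0, 1) draws."""
    rng = np.random.default_rng(seed)
    eps1 = rng.standard_normal(T)
    eps2 = rng.standard_normal(T)
    u = np.cumsum(eps1)                      # integrated component
    e = np.zeros(T)
    e[0] = eps2[0]
    for t in range(1, T):
        e[t] = rho * e[t - 1] + eps2[t]      # stationary component
    # Solve the two-equation system for x_t and y_t (requires alpha != beta).
    x = (alpha * u - beta * e) / (alpha - beta)
    y = (-u + e) / (alpha - beta)
    return x, y, u, e

alpha, beta = -1.0, -2.0
x, y, u, e = simulate_dgp(alpha=alpha, beta=beta)
z = x + alpha * y                            # co-integrating combination
print("sample variance of x:", np.var(x))
print("sample variance of x + alpha*y:", np.var(z))
```

By construction `x + alpha*y` reproduces the stationary series `e` exactly, while `x` and `y` each inherit the random walk `u`.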

[3] When regarding a co-integrating combination as an 'equilibrium'
relationship, it is natural to expect this combination to be integrated of
order zero. However, definitionally, any reduction in the order of
integration, say from $d$ to $d - b$ (where $b > 0$), is sufficient for the
variables to be called 'co-integrated'.
[4] The example is taken from Engle and Granger (1987).
Although this is a simple example, much of the method and reasoning
can be generalized to more complex cases. What is crucial is that, while
$\{x_t\}$ and $\{y_t\}$ are integrated processes, not tied to any fixed means, a
linear combination of the two variables makes the resulting series a
stationary process and the variables $x$ and $y$ may be said to be linked by
the corresponding equilibrium relationship.
It is interesting to note that in the bivariate case we have the added
bonus that this equilibrium relationship, if such a relationship exists, is
unique. The proof is straightforward and follows by contradiction.
Suppose not: that is, suppose that there exist two distinct co-integrating
parameters $\alpha$ and $\gamma$ such that $\{x_t + \alpha y_t\}$ and $\{x_t + \gamma y_t\}$ are both I(0).
This implies that $(\alpha - \gamma) y_t$ is also I(0) because subtracting one I($d$)
series from another cannot lead to a series integrated of order ($d + 1$)
(or higher). But since $\{y_t\}$ is I(1), a non-zero constant times $\{y_t\}$ is also
I(1). Hence we have a contradiction unless $\alpha = \gamma$.
The analysis is not quite so straightforward in the multivariate case as
we must allow for the possibility of several co-integrating vectors.
Nevertheless, much of the intuition gained from the analysis of the
bivariate case carries through to richer examples.
There are at least three reasons for regarding the concept of
co-integration as central to econometric modelling with integrated
variables, as well as to the examination of long-run relationships among
those variables.
The first is the link that the concept formalizes among variables of
higher orders of integration, for which some linear combination is of a
lower order of integration. In the most widely used examples, a
reduction is made from variables that require first-differencing for
stationarity to a composite time-series that is stationary in levels. In
addition, this composite stationary variable, constructed by taking a
linear combination of the original series, may be said to characterize the
equilibrium relationship linking the series. If an equilibrium exists
among several variables so that such a stationary linear combination
exists, we may count on eventual return of this linear combination to its
mean (typically zero).
Second, and following directly from this identification of
co-integration with equilibrium, is the complementary idea of meaningful
versus spurious regression. Regressions involving levels of time series of
non-stationary variables make sense if and only if these variables are
co-integrated. A test for co-integration then yields a useful method of
distinguishing meaningful regressions from those that Yule (1926) called
'nonsense' and Granger and Newbold (1974, 1977) called 'spurious'.
Finally, another important property characterizes variables that are
co-integrated. A set of co-integrated variables is known to have, among
other representations, an error-correction representation; that is, the
relationship may be expressed so that a term representing the deviation
of observed values from the long-run equilibrium enters the model. This
is an interesting result by itself, but is even more noteworthy as a
contribution to resolving, or synthesizing, the debate between
time-series analysts and those favouring econometric methods. It allows a
reconciliation, at least in part, of time-series methods of analysing data
that traditionally considered only the properties of differenced
time-series (which could more legitimately be assumed stationary) and those
econometric methods that laid emphasis on the equilibrium relationships
between variables and therefore focused on the levels of variables. Both
methods as traditionally used could be said to have been flawed, the
former by the implied necessity of ignoring information contained in the
levels of variables, the latter by its tendency to ignore the spurious
regression problem.
Reliance on the use of differenced data, as a potential cure for the
spurious regression problem, raises a set of new issues. An example of a
potentially controversial recommendation for modelling economic
time-series appears in Granger and Newbold (1977, p. 206; emphasis in
original): 'In the presence of some autocorrelation of the errors . . . first
differencing might be expected to go a long way towards alleviating the
problem and is certainly preferable to doing nothing at all.'
As an illustration, Granger and Newbold cite the results of Sheppard
(1971), who regressed UK consumption on autonomous expenditure and
mid-year money stock for both levels and changes, using annual data
over the period 1947-62. The results were taken to indicate the
existence of a significant relationship in levels which disappeared
entirely when first differences were employed. The levels regression,
characterized by a high value of $R^2$ and a low value of the
Durbin-Watson statistic, is spurious. However, the first-differenced
regression appears to be testing a different hypothesis.[5] The differencing
operation, in particular, omits any information about long-run
adjustments that the data may contain.
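The pattern described above (high $R^2$ and low Durbin-Watson statistic in levels, with the relationship vanishing in first differences) can be reproduced with simulated data. The sketch below is illustrative only: it uses two independent Gaussian random walks rather than Sheppard's data, and the helper names `ols_r2_dw` and `monte_carlo` are assumptions introduced here.

```python
import numpy as np

def ols_r2_dw(y, x):
    """Regress y on a constant and x by OLS; return (R^2, Durbin-Watson)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    dw = np.sum(np.diff(resid) ** 2) / (resid @ resid)
    return r2, dw

def monte_carlo(reps=200, T=100, seed=0):
    """Average (R^2, DW) over replications, for levels and for differences."""
    rng = np.random.default_rng(seed)
    levels, diffs = [], []
    for _ in range(reps):
        x = np.cumsum(rng.standard_normal(T))   # independent random walks
        y = np.cumsum(rng.standard_normal(T))
        levels.append(ols_r2_dw(y, x))
        diffs.append(ols_r2_dw(np.diff(y), np.diff(x)))
    return np.mean(levels, axis=0), np.mean(diffs, axis=0)

levels_mean, diffs_mean = monte_carlo()
print("levels:      mean R^2 = %.3f, mean DW = %.3f" % tuple(levels_mean))
print("differences: mean R^2 = %.3f, mean DW = %.3f" % tuple(diffs_mean))
```

Although the two series are unrelated by construction, the levels regressions typically show a sizeable $R^2$ with a Durbin-Watson statistic far below 2, whereas the differenced regressions show neither.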
Thus, while the spurious regression problem is a serious one, the
practice of differencing integrated series to achieve stationarity, and of
treating the resulting series as the proper objects of econometric
analysis, is not without costs. Error-correction mechanisms (ECMs) are
intended to provide a way of combining the advantages of modelling
both levels and differences. In an error-correction model the dynamics
of both short-run (changes) and long-run (levels) adjustment processes
are modelled simultaneously. This idea of incorporating the dynamic
adjustment to steady-state targets in the form of error-correction terms,
suggested by Sargan (1964) and developed by Hendry and Anderson
(1977) and Davidson et al. (1978), among others, therefore offers the
possibility of revealing information about both short-run and long-run
relationships.
[5] In the next chapter we discuss the consequences of differencing (and
over-differencing) in cases where differencing (any number of times) does
not alleviate the problems of non-stationarity and where transforming the
series monotonically, prior to differencing, appears to be the appropriate
procedure.
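The way an error-correction model combines changes and levels can be sketched with a single-equation example. The code below assumes a hypothetical data-generation process $\Delta y_t = a\,\Delta x_t + b\,(y_{t-1} - x_{t-1}) + \epsilon_t$ with $x_t$ a random walk, and estimates $a$ (the short-run response) and $b$ (the adjustment towards the long-run relationship $y = x$) by OLS. The specification, the parameter values, and the names `simulate_ecm` and `fit_ecm` are assumptions for illustration, not the authors' model.

```python
import numpy as np

def simulate_ecm(T=20000, a=0.5, b=-0.2, seed=0):
    """Simulate  dy_t = a*dx_t + b*(y_{t-1} - x_{t-1}) + eps_t,
    where x_t is a random walk (hypothetical illustrative DGP)."""
    rng = np.random.default_rng(seed)
    dx = rng.standard_normal(T)
    eps = rng.standard_normal(T)
    x = np.cumsum(dx)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = y[t - 1] + a * dx[t] + b * (y[t - 1] - x[t - 1]) + eps[t]
    return x, y

def fit_ecm(x, y):
    """OLS of dy_t on dx_t and the lagged disequilibrium (y - x)_{t-1}."""
    dy = np.diff(y)
    dx = np.diff(x)
    ec_lagged = (y - x)[:-1]
    X = np.column_stack([dx, ec_lagged])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    return coef                      # [short-run effect, adjustment speed]

x, y = simulate_ecm()
a_hat, b_hat = fit_ecm(x, y)
print("short-run coefficient:", a_hat)
print("error-correction coefficient:", b_hat)
```

A regression in differences alone would recover only the short-run coefficient; the lagged levels term is what carries the information about long-run adjustment.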
The theory of co-integration provides a unified framework for the
analysis of ECMs and of time series in which the variables share one or
more stochastic trends. We elaborate upon the alternative
representations of co-integrated systems in Section 5.3, where we also
provide a more formal description of the theory; we first review the theory
of polynomial matrices which is necessary for a thorough understanding of
several proofs in the next sections and in following chapters.

5.2. Polynomial Matrices

A polynomial matrix $A(L)$ is a matrix for which the elements $\{a_{ij}(L)\}$
are scalar polynomials in an argument $L$:

$$a_{ij}(L) = \sum_{l=0}^{k_{ij}} a_{ijl} L^{l},$$

where $k_{ij} < \infty$. Useful references to the algebra of polynomial matrices
include Gel'fand (1967) and Gantmacher (1959). The degree, $k$, of
$A(L)$ is the highest of the orders $k_{ij}$ of the element polynomials:

$$k = \max_{i,j} k_{ij}.$$

Thus, $A(L)$ can be expressed as

$$A(L) = \sum_{i=0}^{k} A_i L^{i}. \qquad (10)$$

The determinant $|A(L)|$ of a polynomial matrix $A(L)$ is a scalar
polynomial.
A familiar example of a polynomial matrix is $A(\lambda) = (A_0 - \lambda I)$, which
occurs in the characteristic equation

$$|A_0 - \lambda I| = 0,$$

which may be solved for eigenvalues of the matrix $A_0$. Every matrix
satisfies its own characteristic equation (the Cayley-Hamilton theorem)
in that, if we let $f(\lambda) = |A(\lambda)|$, then $f(A_0) = 0$ (where this is interpreted
as a matrix expression). In general, if $A(L) = \sum_{i=0}^{k} A_i L^i$, then we will
also use the notation $A(B) = \sum_{i=0}^{k} A_i B^i$, for a matrix argument $B$.
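The notation $A(B)$ and the Cayley-Hamilton theorem can be illustrated numerically. The sketch below evaluates a matrix polynomial at a matrix argument and checks that a matrix annihilates its own characteristic polynomial. The helper `eval_matrix_poly` and the example matrix are assumptions made for illustration; note that `numpy.poly` returns the coefficients of $|\lambda I - A_0|$, which differs from $|A_0 - \lambda I|$ only by the factor $(-1)^n$ and so has the same roots.

```python
import numpy as np

def eval_matrix_poly(coeffs, B):
    """Evaluate A(B) = sum_i A_i B^i for coeffs = [A_0, A_1, ..., A_k]."""
    n = B.shape[0]
    result = np.zeros((n, n))
    power = np.eye(n)                 # B^0
    for A_i in coeffs:
        result = result + A_i @ power
        power = power @ B
    return result

# Symmetric example matrix, so that all eigenvalues are real.
A0 = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 1.0]])
n = A0.shape[0]

# c[0]*lam^n + c[1]*lam^(n-1) + ... + c[n]: coefficients of |lam*I - A0|.
c = np.poly(A0)

# Treat each scalar coefficient as (scalar * I), ordered by ascending power.
coeffs = [c[n - i] * np.eye(n) for i in range(n + 1)]
f_of_A0 = eval_matrix_poly(coeffs, A0)
print(np.round(f_of_A0, 10))          # numerically the zero matrix
```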

The inverse of a finite polynomial matrix $A(L)$ of degree $k$ which has
all roots of the determinantal equation $|A(z)| = 0$ strictly outside the
unit circle[6] is given, in general, by an infinite-order matrix
$C(L) = \sum_{i=0}^{\infty} C_i L^i$. This matrix is well defined if and only if
$\sum_{i=0}^{K} C_i L^i$ is a convergent sequence as $K \to \infty$. For $|z| > 1$
(equivalently, $|L| = |z^{-1}| < 1$), a sufficient condition for this to hold is
$|C_i| \le \rho^i$ where $|\rho| < 1$.[7] The $C_i$ are defined by an infinite set of
matrix identities which may be described in a simple scalar case, where
$A(L) = 1 - \rho L = a_0 + a_1 L$, as follows:

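$$c_0 a_0 = 1, \qquad c_i a_0 + c_{i-1} a_1 = 0 \quad (i = 1, 2, \ldots), \qquad (11)$$

such that

$$c_i = \rho^i, \qquad C(L) = \sum_{i=0}^{\infty} \rho^i L^i.$$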

The construction given by (11) is derived by using the property
$C(L)A(L) = 1$ and equating powers of $L$. The algebra generalizes to
high-order scalar polynomials $A(L)$ and to matrix polynomials $A(L)$. In
the next section of this chapter and in Chapter 8 we shall need to deal
with matrix polynomials that have unit roots ($z = 1$). In these cases,
while the matrix $A(L)$ may not have a well defined inverse because of
failure of rank conditions, transforming $A(L)$ and pre- and
post-multiplying it by suitable matrices will lead to an invertible matrix
provided certain conditions are satisfied.
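The construction of the inverse by equating powers of $L$ in $C(L)A(L) = I$ can be sketched as a recursion: $C_0 = A_0^{-1}$ and, for $j \ge 1$, $C_j = -\bigl(\sum_{i=1}^{\min(j,k)} C_{j-i} A_i\bigr) A_0^{-1}$. The function name `inverse_coefficients` and the test values are illustrative assumptions; the scalar case $A(L) = 1 - \rho L$ should return $C_i = \rho^i$.

```python
import numpy as np

def inverse_coefficients(A, m):
    """Return [C_0, ..., C_m], the leading coefficients of C(L) = A(L)^{-1},
    found by equating powers of L in C(L) A(L) = I.
    A = [A_0, ..., A_k] with A_0 non-singular."""
    k = len(A) - 1
    A0_inv = np.linalg.inv(A[0])
    C = [A0_inv]
    for j in range(1, m + 1):
        acc = np.zeros_like(A[0])
        for i in range(1, min(j, k) + 1):
            acc = acc + C[j - i] @ A[i]
        C.append(-acc @ A0_inv)       # coefficient of L^j must vanish
    return C

# Scalar case A(L) = 1 - rho*L, written as 1x1 matrices.
rho = 0.6
A_scalar = [np.array([[1.0]]), np.array([[-rho]])]
C = inverse_coefficients(A_scalar, 5)
print([float(c[0, 0]) for c in C])    # successive powers of rho
```

The exponential decay of these coefficients when $|\rho| < 1$ is what makes the infinite sum converge; with a unit root the coefficients no longer die out.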
[6] That is, denoting an arbitrary root of the determinant equation by $z$,
$|z| > 1 + \epsilon$, for some $\epsilon > 0$, for all $z$ satisfying this equation.
[7] Note that this exponential decay condition is only sufficient and not
necessary to guarantee convergence.
Two polynomial matrices $R(L)$ and $T(L)$ are said to be equivalent if
and only if there exist two invertible matrices $U(L)$ and $V(L)$ such that

$$R(L) = U(L)\,T(L)\,V(L).$$

Every polynomial matrix $A(L)$ can be divided on the left by a matrix
of the form $(B - LI)$ for any matrix $B$ so that, where $A(L)$ is of
degree $k$,

$$A(L) = (B - LI)H(L) + D, \qquad (13)$$

where $H(L)$ is of degree $k - 1$ and $D$ is a constant matrix, the
remainder term. To obtain the precise form of $D$, we will derive this
result, which is simply a linear transformation of the original polynomial
matrix. We have

and so on. By induction, we can continue this substitution for any $k$ to
get

A similar result holds for division on the right. In dealing with
integrated series, the case $B = I$ is of particular interest; then

$$A(L) = A(1) + (1 - L)H(L),$$

where $A(1)$ is equal to $A(L)$ evaluated at $L = 1$. Note that from (13)
and (15), for the case $B = I$,

and

Further, $A(1)$ is called the total effect. When $D = A(1) = 0$, then $A(L)$ is
divisible on the left by $(1 - L)I$ without a remainder, and hence can be
rewritten in terms of the operator $(1 - L)$ alone.
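The case $B = I$ can be made concrete. Matching coefficients in $A(L) = A(1) + (1 - L)H(L)$ gives $H_j = -\sum_{i=j+1}^{k} A_i$ for $j = 0, \ldots, k-1$, so that $H(1) = -\sum_{i=1}^{k} i A_i$. The sketch below computes these coefficients and verifies the decomposition; the function names and the example matrices are assumptions for illustration.

```python
import numpy as np

def divide_by_one_minus_L(A):
    """Given A = [A_0, ..., A_k], return (A(1), [H_0, ..., H_{k-1}]) such that
    A(L) = A(1) + (1 - L) H(L), using H_j = -(A_{j+1} + ... + A_k)."""
    k = len(A) - 1
    total_effect = sum(A)                       # A(1)
    H = [-sum(A[j + 1:]) for j in range(k)]
    return total_effect, H

def recombine(total_effect, H):
    """Rebuild the coefficients of A(1) + (1 - L) H(L), power by power."""
    k = len(H)
    coeffs = []
    for j in range(k + 1):
        term = np.zeros_like(total_effect)
        if j == 0:
            term = term + total_effect
        if j < k:
            term = term + H[j]                  # from 1 * H_j L^j
        if j > 0:
            term = term - H[j - 1]              # from -L * H_{j-1} L^{j-1}
        coeffs.append(term)
    return coeffs

A = [np.eye(2),
     np.array([[-1.2, 0.3], [0.1, -0.9]]),
     np.array([[0.3, -0.1], [0.0, 0.2]])]
A_at_1, H = divide_by_one_minus_L(A)
rebuilt = recombine(A_at_1, H)
print("A(1) =\n", A_at_1)
print("H(1) =\n", sum(H))
```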
The next main result to be proved is the isomorphic relationship
between polynomial matrices and companion matrices. This will clarify
the derivation of latent roots of polynomial matrices, which are of great
interest in analysing dynamics and co-integration. Consider the system
of $n$ deterministic linear equations:

$$A(L)x_t = \sum_{i=0}^{k} A_i x_{t-i} = 0. \qquad (16)$$

We set $A_0 = I$ as a normalization. The same information can be
represented in stacked form (called the companion form) by defining
the following matrices and vectors:

Direct multiplication of $\Phi$ into $X_{t-1}$ and comparison of that outcome
with $X_t$ reveals that the second expression in (18) merely augments the
original system with a set of identities of the form $x_{t-1} = x_{t-1}$, etc. The
corresponding advantage of companion forms is that, whatever the value
of $k$ in (16), the companion form is always of first order, and hence can
be analysed using already established tools. This advantage is
pronounced when we wish to find the eigenvalues of $A(L)$, and do so by
solving

$$|\Phi - \lambda I| = 0. \qquad (19)$$

It will be convenient to re-express (19) in terms of the negatives of the
inverses of the eigenvalues, $\mu = 1/\lambda$, and to solve

Using the definition of $\Phi$ from (17) in (20), we have

From the partitioned inverse formula, where $|D| \neq 0$,

The first equality follows from the fact that the determinant of the first
matrix following the equality is one. Repeating these operations in the
alternative direction, if $|E| \neq 0$, establishes that

Both result s wil l b e use d below. Here , w e apply (22) t o th e determinan t
in (21) , choosin g E a s th e larg e n( k - 1 ) x n(k 1) matri x i n th e
upper-left corner , an d D = I. The n FD -1G i s zer o excep t fo r it s
top-right block, which is -^A^, an d D = 1. Thus,

(23)
Comparing (21) with (23), the analysis can be seen to repeat, leading to
$|A(\mu)|$ after $k - 1$ steps. Thus,

the latent roots can be found by equating either expression to zero and
solving. Since $A(\cdot)$ is $n \times n$, $\Phi$ is $nk \times nk$ and so has $nk$ eigenvalues,
as required.
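The link between the companion matrix and the polynomial matrix can be checked numerically. The sketch below assumes the convention $x_t + A_1 x_{t-1} + \cdots + A_k x_{t-k} = 0$ with $A_0 = I$, so that the first block row of the companion matrix is $(-A_1, \ldots, -A_k)$; this sign convention, the function names, and the example matrices are assumptions, since the original display is not reproduced here. Each eigenvalue $\lambda$ of the companion matrix should then satisfy $|\lambda^k I + \lambda^{k-1} A_1 + \cdots + A_k| = 0$, which for $\lambda \neq 0$ is the same as $|A(1/\lambda)| = 0$.

```python
import numpy as np

def companion(A):
    """Companion matrix for x_t + A_1 x_{t-1} + ... + A_k x_{t-k} = 0.
    A = [A_1, ..., A_k]; assumed convention: top block row = (-A_1, ..., -A_k)."""
    k = len(A)
    n = A[0].shape[0]
    Phi = np.zeros((n * k, n * k))
    Phi[:n, :] = -np.hstack(A)
    Phi[n:, :-n] = np.eye(n * (k - 1))   # identities x_{t-1} = x_{t-1}, etc.
    return Phi

def char_det(lam, A):
    """Determinant of lam^k I + lam^(k-1) A_1 + ... + A_k."""
    k = len(A)
    n = A[0].shape[0]
    M = lam ** k * np.eye(n)
    for i, A_i in enumerate(A, start=1):
        M = M + lam ** (k - i) * A_i
    return np.linalg.det(M)

A1 = np.array([[-0.5, 0.1], [0.2, -0.3]])
A2 = np.array([[0.06, 0.0], [0.0, 0.02]])
Phi = companion([A1, A2])
eigvals = np.linalg.eigvals(Phi)
residuals = [abs(char_det(lam, [A1, A2])) for lam in eigvals]
print("number of eigenvalues:", len(eigvals))   # n*k = 4
print("largest residual:", max(residuals))
```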
From (13), when $B = I$, if $A(1)$ has rank $r < n$, then $|A(1)| = 0$ and
hence $A(L)$ has $n - r$ unit roots. Conversely, if $A(1)$ has rank $n$, $A(L)$
has none of its eigenvalues equal to unity.
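A small numerical check of the rank statement, under stated assumptions: take a hypothetical bivariate first-order system $x_t = (I + \gamma\alpha')x_{t-1}$, so that $A(L) = I - (I + \gamma\alpha')L$ and $A(1) = -\gamma\alpha'$ has rank $r = 1$. The vectors `gamma` and `alpha` below are illustrative values, not taken from the text.

```python
import numpy as np

n = 2
gamma = np.array([[-0.5], [0.2]])      # hypothetical adjustment coefficients
alpha = np.array([[1.0], [-1.0]])      # hypothetical co-integrating vector

Pi = gamma @ alpha.T
A_at_1 = -Pi                           # A(1) for A(L) = I - (I + Pi) L
Phi = np.eye(n) + Pi                   # first-order companion matrix

r = np.linalg.matrix_rank(A_at_1)
eigvals = np.linalg.eigvals(Phi)
n_unit = int(np.sum(np.isclose(eigvals, 1.0)))

print("rank of A(1):", r)
print("eigenvalues:", eigvals)
print("unit roots:", n_unit, "= n - r =", n - r)
```

Here the eigenvalues are $1$ and $1 + \alpha'\gamma$, so one unit root remains and one combination of the variables is stationary.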
Next, derivatives of polynomial matrices with respect to their
arguments will be needed, and we have

$$\left. \frac{\partial A(L)}{\partial L} \right|_{L=1} = \sum_{i=1}^{k} i A_i = \Gamma.$$

This is reminiscent of the mean-lag formula in a scalar distributed lag.
From the result that $H(1) = -\sum_{i=1}^{k} i A_i$, we now see that $H(1) = -\Gamma$.
Thus, when $A(1) = 0$, so that $A(L) = (1 - L)H(L)$, then $|H(L)| = 0$
delivers the remaining eigenvalues. If $H(1)$ did not have rank $n$ when
$A(1) = 0$, then $|H(1)| = 0$, so $H(L)$ also has unit roots. Using (13) and
(15) to write $H(L) = H(1) + (1 - L)K(L)$, we note that, in the extreme
case that $\Gamma = 0$, $H(L) = (1 - L)K(L)$, which implies that
$A(L) = (1 - L)^2 K(L)$. Consequently, equation (16) would become
$(1 - L)^2 K(L)x_t = 0$, yielding a system in second differences. There is a
close affinity between the ranks of $A(1)$, $H(1)$, etc., and the number of
differences that can be extracted from $A(L)$.
Finally, polynomial matrices are invariant under non-singular linear
transformations in that they have many equivalent representations with
the same properties. This is clear from (13) above. More generally,

In terms of (16),

For example, when $k = 1$,

Such linear transformations are used regularly in Chapter 8.

5.3. Integration and Co-integration: Formal Definitions and Theorems

DEFINITION 1 (adapted from Engle and Granger 1987). The
components of the vector $x_t$ are said to be co-integrated of order $d$, $b$,
denoted $x_t \sim CI(d, b)$, if (i) $x_t$ is I($d$) and (ii) there exists a
non-zero vector $\alpha$ such that $\alpha' x_t \sim I(d - b)$, $d \ge b > 0$. The vector
$\alpha$ is called the co-integrating vector.
If $x_t$ has $n > 2$ components, then there may be more than one
co-integrating vector $\alpha$; it is possible for several equilibrium
relationships to govern the joint evolution of the variables. If there exist
exactly $r$ linearly independent co-integrating vectors with $r \le n - 1$, then
these can be gathered into an $n \times r$ matrix $\alpha$. The rank of $\alpha$ will be $r$
and is called the co-integrating rank.
DEFINITION 2. A vector time-series $x_t$ has an error-correction
representation if it can be expressed as

$$A(L)(1 - L)x_t = -\gamma z_{t-1} + \omega_t,$$

where $\omega_t$ is a stationary multivariate disturbance, with $A(0) = I$,
$A(1)$ having only finite elements, $z_t = \alpha' x_t$, and $\gamma$ a non-zero
vector. For the case where $d = b = 1$, and with co-integrating rank
$r$, the Granger Representation Theorem holds (see Section 5.3.1).
Granger's theorem will prove that a co-integrated system of variables
can be represented in three main forms: the vector autoregressive
(VAR), error-correction, and moving-average forms. These
representations are all isomorphic to each other, and the theorem
establishes the restrictions that hold between the lag-polynomial matrices
in each representation of the process.
We may prove the theorem in at least three (equivalent) ways,
depending on the representation from which we choose to start. The
theorem is stated in Section 5.3.1. Following this statement, we take the
autoregressive representation as our starting-point and derive the main
results. This proof is due to Johansen (1991a). The sub-section after the
proof contains a detailed interpretation of the results. In Chapter 8 we
return to the theorem and provide another proof, this time starting from
the moving-average representation. Proving the theorem in two ways
highlights some interesting symmetries which exist among the equivalent
representations of the process.

5.3.1. Granger Representation Theorem (adapted from Engle and Granger 1987 and Johansen 1991a)
Let $x_t$ be an I(1) vector of $n$ components, each with (possibly)
deterministic trend in mean. Suppose that the system can be written as a
finite-order vector autoregression:

(25)
where the $\varepsilon_t$ satisfy assumptions (3.16a)-(3.16d) and the first $k$ data
points $x_{1-k}, x_{2-k}, \ldots, x_0$ are fixed. The model can then be rewritten
in error-correction form as

Both (25) and (26) can be written as

where

Equation (26) may also be written as

where $\Psi(L) = (1 - L)^{-1}(\pi(L) - \pi(1)L) = I - \sum_{i=1}^{k-1} \Gamma_i L^i$. From (13)
above, $\Psi(L)$ can always be constructed. Further, the derivative of $\pi(z)$
at $z = 1$ is equal to $-\Psi = -\Psi(1)$.
Define the orthogonal complement $P_\perp$ of any matrix $P$ of rank $q$ and
dimension $n \times q$ as follows ($0 < q < n$):
(i) $P_\perp$ is of dimension $n \times (n - q)$;
(ii) $P_\perp' P = 0_{(n-q) \times q}$, $P' P_\perp = 0_{q \times (n-q)}$;
(iii) $P_\perp$ has rank $n - q$, and lies in the null space of $P'$.
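An orthogonal complement satisfying (i)-(iii) is not unique, but one can be computed from a singular value decomposition. The sketch below is one possible construction, assuming $P$ has full column rank $q$; the function name `orth_complement` and the example matrix are illustrative assumptions.

```python
import numpy as np

def orth_complement(P):
    """Return an n x (n - q) matrix whose columns span the space orthogonal
    to the columns of the n x q matrix P (assumed to have full column rank)."""
    n, q = P.shape
    # With full_matrices=True, U is n x n; its last n - q columns are
    # orthogonal to the column space of P.
    U, _, _ = np.linalg.svd(P, full_matrices=True)
    return U[:, q:]

P = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
P_perp = orth_complement(P)
print("shape of P_perp:", P_perp.shape)
print("P_perp' P =", P_perp.T @ P)     # numerically zero
```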
Certain key assumptions may now be stated.
ASSUMPTION A1. The characteristic polynomial,

has roots either equal to or strictly greater than one; that is,
$|\pi(z)| = 0$ implies that either $|z| > 1$ or $z = 1$.
ASSUMPTION A2. The $n \times n$ matrix $\pi$ has reduced rank $r < n$ and
is therefore expressible as the product of two $n \times r$ matrices $\gamma$ and
$\alpha$, where $\gamma$ and $\alpha$ have rank $r$. Thus $\pi = \gamma\alpha'$.
ASSUMPTION A3. The $(n - r) \times (n - r)$ matrix $\gamma_\perp' \Psi \alpha_\perp$ has full
rank $n - r$.
Assumption A1 guarantees that the non-stationarity of $x_t$ can be
removed by differencing. A2 rules out a stationary $x_t$ process. If $\pi$ had
full rank (that is, if $|\pi(z)|$ had no roots at one), then from (27),
$x_t = \pi^{-1}(L)(\mu + \varepsilon_t)$, which would imply that $x_t$ was stationary. It is also
the statement, in the autoregressive form, that the system has $r$ linearly
independent co-integrating vectors. In light of Assumption A2, $\gamma'$
provides a transformation of the $\pi$ matrix (and hence a linear
combination of the $x_{it}$ which is stationary). The significance of A3 will
become evident in due course, but essentially, it ensures that $x_t$ is
integrated of order no greater than 1. Under the assumptions stated above,
the following results may be proved:
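(R1) $\Delta x_t$ is stationary.
(R2) $\alpha' x_t$ is stationary.
(R3) $E(\Delta x_t) = \alpha_\perp(\gamma_\perp' \Psi \alpha_\perp)^{-1} \gamma_\perp' \mu$.

(R4) $E(\alpha' x_t) = -($

(R5) $\Delta x_t$ has a moving-average representation given by

(R6) $C(1) = \alpha_\perp(\gamma_\perp' \Psi \alpha_\perp)^{-1} \gamma_\perp'$ has rank $n - r$.
(R7) $\alpha' C(1) = 0_{r \times n}$,
$C(1)\gamma = 0_{n \times r}$.

where $C(L) = C(1) + (1 - L)C_1(L)$, $\tau = C(1)\mu$, $x_0$ is a constant
(vector) of integration, and $S_t = C_1(L)\varepsilon_t$.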
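Results (R6) and (R7) can be verified numerically for a given pair of matrices. The sketch below builds $C(1) = \alpha_\perp(\gamma_\perp' \Psi \alpha_\perp)^{-1}\gamma_\perp'$ and checks its rank and the two orthogonality conditions. The values of `alpha` and `gamma` are hypothetical, $\Psi$ is set to the identity purely for illustration (as it would be in a first-order system with no lagged differences), and the helper names are assumptions introduced here.

```python
import numpy as np

def orth_complement(P):
    """Columns spanning the space orthogonal to the columns of P (full rank)."""
    n, q = P.shape
    U, _, _ = np.linalg.svd(P, full_matrices=True)
    return U[:, q:]

def long_run_matrix(alpha, gamma, Psi):
    """C(1) = alpha_perp (gamma_perp' Psi alpha_perp)^{-1} gamma_perp'."""
    a_perp = orth_complement(alpha)
    g_perp = orth_complement(gamma)
    middle = g_perp.T @ Psi @ a_perp      # must be non-singular (Assumption A3)
    return a_perp @ np.linalg.inv(middle) @ g_perp.T

n, r = 3, 1
alpha = np.array([[1.0], [-1.0], [0.0]])  # hypothetical co-integrating vector
gamma = np.array([[-0.5], [0.2], [0.1]])  # hypothetical adjustment vector
Psi = np.eye(n)                           # assumption: no lagged differences

C1 = long_run_matrix(alpha, gamma, Psi)
print("rank of C(1):", np.linalg.matrix_rank(C1))   # n - r
print("alpha' C(1):", alpha.T @ C1)                 # numerically zero
print("C(1) gamma:", (C1 @ gamma).ravel())          # numerically zero
```

The result does not depend on which basis is chosen for the orthogonal complements, because any change of basis cancels between the outer factors and the inverse.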
Proof. Multiply (27) by $\gamma'$ and $\gamma_\perp'$ respectively to obtain the equations

using the decomposition $\pi = \gamma\alpha'$ and the result that $\gamma_\perp' \gamma = 0_{(n-r) \times r}$. The
matrix $\pi$ is not invertible, and the system given by (28a)-(28b)
therefore cannot be inverted directly to express the $x_{it}$ in terms of the
$\varepsilon_{it}$. To obtain an invertible system, we define two new variables,
$\omega_t = (\alpha'\alpha)^{-1}\alpha' x_t$ and $v_t = (\alpha_\perp'\alpha_\perp)^{-1}\alpha_\perp' \Delta x_t$. Next, define the matrices
$\bar{\alpha} = \alpha(\alpha'\alpha)^{-1}$ and $\bar{\alpha}_\perp = \alpha_\perp(\alpha_\perp'\alpha_\perp)^{-1}$. Let $R = (\alpha, \alpha_\perp)$ be an $n \times n$
matrix of rank $n$. Then $R(R'R)^{-1}R' = I$ and hence $(\bar{\alpha}\alpha' + \bar{\alpha}_\perp\alpha_\perp') = I$.
Thus,

Substituting in (28a)-(28b) gives

where in (28a) the first term on the left-hand side needs to be written
first as $-(\gamma'\gamma)(\alpha'\alpha)(\alpha'\alpha)^{-1}\alpha' x_t$. The equations for $\omega_t$ and $v_t$ can now be
written in autoregressive form as

with

For $z = 1$, this matrix has determinant

which is non-zero by Assumptions A2 and A3. Hence $z = 1$ is not a
root. For $z \neq 1$, straightforward but tedious algebra enables us to
express the matrix $A(z)$ as

To show this, substitute for $\Psi(z)$ in $A(z)$ in terms of $\pi(z)$ and
$\pi(1) = \pi$ from (27), and use the decomposition $\pi = \gamma\alpha'$ and the
orthogonality condition $\gamma_\perp' \gamma = \alpha_\perp' \alpha = 0_{(n-r) \times r}$. For $z \neq 1$, therefore,
from (31),

where we have used the result that the determinant of a matrix obtained
by multiplying $n - r$ columns (or rows) of an $n \times n$ matrix by a
constant is the determinant of the original matrix multiplied by the
constant raised to the power $n - r$. Thus, for $z \neq 1$, $|A(z)| = 0$ if and
only if $|\pi(z)| = 0$. By Assumption A1, if we exclude $z = 1$, the only
remaining roots of this determinant lie outside the unit circle.
This shows that all the roots of $|A(z)| = 0$ are outside the unit disk.
Hence the system defined by (29a)-(29b) is invertible and $\omega_t$ and $v_t$
can be given initial distributions such that they become stationary. Since
$\Delta x_t = \alpha_\perp v_t + \alpha \Delta\omega_t$, stationarity of $v_t$ and $\omega_t$ implies stationarity of
$\Delta x_t$. Further, since $\alpha' x_t = (\alpha'\alpha)\omega_t$, then $\alpha' x_t$ is also stationary. This
completes the proof of (R1) and (R2).
To prove (R3) and (R4), note that

Using the formula for inversion of partitioned matrices,

Thus,

and

From above, $E(\Delta x_t) = \alpha_\perp E(v_t) + \alpha E(\Delta\omega_t)$. Noting that $E(\Delta\omega_t) = 0$,
we have that $E(\Delta x_t) = \alpha_\perp(\gamma_\perp' \Psi(1) \alpha_\perp)^{-1} \gamma_\perp' \mu$. This proves (R3).
Next,

This completes the proof of (R4). From (32),

But

where $C(L) = [\alpha(1 - L), \ \alpha_\perp](A(L))^{-1}(\gamma, \gamma_\perp)'$. This completes the
proof of (R5).
To prove (R6), note that

Substituting for $(A(1))^{-1}$ from above gives
$C(1) = \alpha_\perp(\gamma_\perp' \Psi(1) \alpha_\perp)^{-1} \gamma_\perp'$,
as required. The matrices $\alpha_\perp$ and $\gamma_\perp'$ and $(\gamma_\perp' \Psi(1) \alpha_\perp)^{-1}$ have rank
$(n - r)$ using Assumptions A2, A3, and the definition of the orthogonal
complements $\alpha_\perp$ and $\gamma_\perp$. Thus, $C(1)$ has rank $(n - r)$. This completes
the proof of (R6). Note that $E(\Delta x_t) = C(1)\mu = \tau$.
(R7) follows immediately from (R6).
Finally, to prove (R8), first write $C(L) = C(1) + (1 - L)C_1(L)$. Thus,
from (R5),

Integrating this expression gives

as required. This completes the proof of the theorem.


5.3.2. Interpreting the Results of the Granger Representation Theorem
Several features are noteworthy in the theorem proved above. First, it
may be seen from (R2) and (R8) respectively that, while $x_t$ is
non-stationary (because it contains the integrated errors $\sum_{i=1}^{t} \varepsilon_i$), $\alpha' x_t$ is
stationary. In fact, $\alpha' x_t$ provides the set of co-integrated combinations
of the $x_{it}$.
Second, despite the presence of a drift term in the process generating
$x_t$, there is no linear trend in the co-integrated combinations. From (R6)
and (R8), the trend in the $x_t$ process disappears if $\gamma_\perp' \mu = 0_{(n-r) \times 1}$.
Third, (R6) is the condition needed for the process to be integrated of
order 1. If this matrix is not of full rank, $|A(L)| = 0$ will have a root of
1 and a further unit root can be extracted from the system, leading to a
system of I(2) variables.
Fourth, $C(1)$ is an $n$-dimensional square matrix but has rank $n - r$.
Hence, starting from the assumption of a reduced-rank matrix $\pi$ in the
autoregressive representation, we derive a reduced-rank matrix $C(1)$ in
the moving-average representation. As we show in Chapter 8, it is
possible to go in the other direction and derive the result that a matrix
$C(1)$ with rank $(n - r)$ implies, for a co-integrated system, a matrix of
the form $\pi$ with rank $r$. Indeed, there is an interesting duality between
the singularity of the 'impact' matrix $\pi$ for the autoregressive
representation and the singularity of the impact matrix $C(1)$ for the
moving-average representation. The null space for $C(1)'$ is the range space
for $\pi$ and the range space for $\pi'$ is the null space for $C(1)$. This follows
from using (R7) and noting that $\pi(1)C(1) = C(1)\pi(1) = 0_n$. This duality
will be further in evidence in Chapter 8, when we derive the autoregressive
representation of the system from its moving-average representation.
Fifth, if $\gamma_\perp' \mu = 0_{(n-r) \times 1}$, then $\mu$ lies in the orthogonal space of $\gamma_\perp$ and
hence in the space of $\gamma$. Thus, $\mu$ may be written as $\gamma\beta_0$ where $\beta_0$ is an
arbitrary $r \times 1$ vector. From the expression for $E(\alpha' x_t)$ in (R4), note
that $E(\alpha' x_t) = -\beta_0$, and the constant enters the system only via the
error-correction term. This may be seen more clearly by rewriting (26)
as

If this restriction is not satisfied, the intercept enters the system both in
the error-correction term and as an autonomous growth component. In
Chapter 8, where we present the Johansen maximum-likelihood
procedure for estimating the co-integrating relationships, the treatment of
the constant is important in determining the estimation procedure and
the set of critical values to be used for inference.
Finally, the analysis can be extended to include seasonal components.
The theorem may also be extended (see Hylleberg and Mizon 1989a) to
incorporate several additional representations of co-integrated systems.
Among these are the Bewley and common-trends representations (the
latter due to Stock and Watson 1988b).

5.3.3. Granger Representation Theorem (supplement)

(R9) There exists a Bewley (1979) representation

where $\Omega_1(L)$ and $\Omega_2(L)$ are $(n - r) \times n$ and $r \times n$ matrices consisting
of stable lag polynomials of order $k - 1$, and where $\Omega_1(0)$ and $\Omega_2(0)$ are
matrices different from the zero matrix, while $\gamma^*$ is an $r \times r$ matrix of
rank $r$.
(RIO) Ther e exists a common-trends representatio n
where <I > i s a n n x ( n r) matri x o f ran k n - r , H, a n
(n r) x 1 vector whic h is a linear transformatio n o f 2J=i n an <3
The polynomia l matrix C*(L ) i s defined as

The proof of the supplement to the Granger Representation Theorem is
given by Hylleberg and Mizon (1989c).
Next, we will consider the DGP given by equations (1)-(6) above and
derive a few of the above alternative representations. The exercise will
then be repeated for another example taken from Engle and Yoo
(1987). But first, we need to discuss the importance of each of these
alternative representations.[8]
[8] The discussion in Ch. 2, although dealing mainly with I(0) variables, is
relevant here. The properties of linear transformations of linear models
carry through unchanged, and consequently so do the reasons for
estimating particular transformations.

5.4. Significance of Alternative Representations

The moving-average representation is a natural starting-point for
analysing variables that are covariance-stationary after first-differencing.
However, the error-correction, interim-multiplier, and Bewley
representations either offer greater insight into the equilibrium
relationships among the co-integrated variables or have operational value
in deriving the long-run multipliers or the number of co-integrating vectors.
1. The error-correction representation has the special advantage of
separating the long-run and the short-run responses. It is also an
important part of what has come to be known as the Engle-Granger
two-step procedure, which is discussed later in the chapter. A small
modification of the error-correction representation provides the
interim-multiplier representation which has been used by Johansen (1988)
to develop a maximum-likelihood estimator of the dimension of the
co-integration space. Likelihood-ratio tests can be used to determine
empirically the value of $r$, the number of co-integrating vectors.
2. The features of the Bewley representation are described in Chapter
2. In particular, if $n = 2$, one can read directly the estimate of the
co-integrating representation. However, as the co-integrating vector is
not necessarily unique for $n > 2$,[9] the Bewley transform properly
estimated (by IV) will give consistent estimates of the co-integrating
space although non-unique estimates of the long-run parameters.
3. The common-trends representation decomposes the non-stationary
series into a stationary component and a stochastic trend component.
The choice among these equivalent alternative representations is
determined primarily by the particular question the investigator wishes
to answer.

5.5. Alternative Representations of Co-integrated Variables: Two Examples

5.5.1. Example 1
Consider the DGP given by equations (1)-(6). Take $|\rho| < 1$. Then $x_t$ and $y_t$ are co-integrated, and by the Granger Representation Theorem must have vector autoregressive, error-correction, and moving-average representations. We derive each of these in turn.
⁹ By this we mean that, if $x_t$ has $n > 2$ components, then, in general, the dimension of the co-integrating space can be $1 \le r \le n - 1$.


VAR representation. Equations (33) and (34) are derived from (1)-(2) by first-differencing and using $e_t = (x_t + \alpha y_t)$. Thus,

Taking the inverse of the matrix multiplying the vector of the first-differenced $x_t$ and $y_t$, we have

which is the VAR representation, where the two linear combinations of $\varepsilon_{1t}$ and $\varepsilon_{2t}$ have been relabelled, keeping the subscripts $1t$ and $2t$.
ECM representation. This follows directly from the VAR representation:

where we let $\delta = (\alpha - \beta)^{-1}(1 - \rho)$ and $z_t = (x_t + \alpha y_t)$.
From the ECM representation, $\delta$ is non-zero if and only if $\rho$ is not equal to 1. But $\rho = 1$ is precisely the condition that makes both $u_t$ and $v_t$ random walks and leads to a non-cointegrated system. In other words, if $\rho = 1$, there does not exist an $\alpha$ that makes the linear combination of $x$ and $y$ stationary. From (36a), at $\rho = 1$, the levels variables vanish in the VAR, which is then in differences only. Testing for co-integration is considered formally in Chapter 7. Intuitively, however, tests for co-integration in this model may be conducted in two equivalent forms.
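The step from the differenced system to the error-correction form can be sketched as a worked derivation. This is only a sketch: it assumes that the DGP (1)-(6) takes the form $x_t + \beta y_t = u_t$, $x_t + \alpha y_t = v_t$, with $u_t$ a random walk driven by $\varepsilon_{1t}$ and $v_t$ an AR(1) process with coefficient $\rho$ driven by $\varepsilon_{2t}$, which is consistent with the definitions of $\delta$ and $z_t$ given above but is not displayed in this section.

```latex
\begin{align*}
% Assumed DGP, first-differenced (z_t = x_t + \alpha y_t):
\Delta x_t + \beta \Delta y_t  &= \varepsilon_{1t}, \\
\Delta x_t + \alpha \Delta y_t &= -(1 - \rho) z_{t-1} + \varepsilon_{2t}. \\
% Subtract the first line from the second and divide by (\alpha - \beta):
\Delta y_t &= -\delta z_{t-1} + (\alpha - \beta)^{-1}(\varepsilon_{2t} - \varepsilon_{1t}), \\
% Substitute back into the first line:
\Delta x_t &= \beta \delta z_{t-1}
              + (\alpha - \beta)^{-1}(\alpha \varepsilon_{1t} - \beta \varepsilon_{2t}), \\
\delta &= (\alpha - \beta)^{-1}(1 - \rho).
\end{align*}
```

Under these assumptions both loadings on $z_{t-1}$ are proportional to $\delta$, so they vanish together when $\rho = 1$, which is the point made in the text.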


1. Static regression of $x_t$ on $y_t$. The test for co-integration is a test of the null hypothesis that $\rho = 1$ in the residuals. This null may be tested by using the Sargan-Bhargava (1983) or Dickey-Fuller statistics and tables.
2. Regression using the error-correction form of the system, followed by a test of the null hypotheses $\theta_1 = 0$ or $\theta_2 = 0$ taken separately, or of the joint null $H_0$: $\theta_1 = \theta_2 = 0$. There is a problem here: if $\alpha$ is unknown, it must be estimated from the data. But if the null hypothesis that $\rho = 1$ is valid, $\alpha$ is not identified and the error-correction regression, at least in the form specified by the theorem, is invalid. Only if the series are co-integrated can $\alpha$ be simply estimated by a co-integrating regression, but a test must be based upon the distribution of the statistic assuming that the null is true.
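The first of the two forms can be illustrated with a small simulation. The sketch below assumes the DGP has the form $x_t + \beta y_t = u_t$, $x_t + \alpha y_t = v_t$ (random-walk $u_t$, AR(1) $v_t$ with coefficient $\rho$); the parameter values, the function names, and the use of a Dickey-Fuller regression without constant or lags are illustrative choices, not taken from the text. The statistic must be compared with the appropriate non-standard tables, which are not reproduced here.

```python
import numpy as np

def simulate(rho, T=500, alpha=-1.0, beta=-2.0, seed=0):
    """Simulate the assumed DGP: x + beta*y = u (random walk), x + alpha*y = v (AR(1))."""
    rng = np.random.default_rng(seed)
    e1, e2 = rng.standard_normal(T), rng.standard_normal(T)
    u = np.cumsum(e1)
    v = np.empty(T)
    v[0] = e2[0]
    for t in range(1, T):
        v[t] = rho * v[t - 1] + e2[t]
    y = (u - v) / (beta - alpha)   # solve the two equations for y and x
    x = u - beta * y
    return x, y

def df_stat(resid):
    """t-ratio on phi in  d r_t = phi * r_{t-1} + error  (no constant, no lags)."""
    dr, lag = np.diff(resid), resid[:-1]
    phi = (lag @ dr) / (lag @ lag)
    err = dr - phi * lag
    se = np.sqrt((err @ err) / (len(dr) - 1) / (lag @ lag))
    return phi / se

def residual_test(rho):
    """Static regression of x on y, then a Dickey-Fuller statistic on the residuals."""
    x, y = simulate(rho)
    slope = (y @ x) / (y @ y)
    return df_stat(x - slope * y)

coint = residual_test(0.5)     # |rho| < 1: co-integrated
nocoint = residual_test(1.0)   # rho = 1: no co-integration
print(f"DF statistic, rho = 0.5: {coint:.2f}; rho = 1.0: {nocoint:.2f}")
```

With $|\rho| < 1$ the residuals behave like the stationary $v_t$ and the statistic is strongly negative; with $\rho = 1$ they remain integrated and it is not.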
There is however a solution. It consists in specifying the error-correction quite generally, on the lines suggested in Chapter 2, and deducing the values of $\alpha$, $\theta_1$, and $\theta_2$. That is, in the absence of prior knowledge, one may simply use $x_t - y_t$ in the error-correction term with suitable lags of $x$ and $y$ added to the regression. Recall that in Chapter 2 we showed that the estimates of the short-run adjustment coefficients, given here by $\theta_1$ and $\theta_2$, are invariant to assumptions made about the long-run coefficient. Thus, $\theta_1$ and $\theta_2$ are estimated consistently regardless of whether or not a homogeneous ECM is estimated. Therefore, an equivalent test for the null of no co-integration could be constructed based on the regression coefficients without requiring knowledge of the value of $\alpha$.
The joint test of $H_0$ above is more efficient given that the cross-equation restriction in (37) and (38) implies that the error-correction term $z_{t-1}$ enters both equations. Furthermore, estimating (37) and (38) as a system is likely to lead to estimates more efficient than those derived from estimating (37) and (38) separately. This is because, in general, neither $x_t$ nor $y_t$ is weakly exogenous for the parameters of the other equation, owing to the cross-equation restriction. The issue of single-equation versus systems estimation which this example illustrates is discussed in Chapter 8.
MA representation. From (7)-(8), we have

The MA representation follows by expressing $u_t$ as $(1 - L)^{-1}\varepsilon_{1t}$ and $v_t$ as $(1 - \rho L)^{-1}\varepsilon_{2t}$. Thus, multiplying both sides of (7') and (8') by $(1 - L)$ gives

5.5.2. Example 2

Assume that, in the MA representation, the DGP is given by

where $\varepsilon_t$ is the vector $(\varepsilon_{1t}, \varepsilon_{2t})'$.

AR representation. By direct inversion of the polynomial matrix,

which implies, upon multiplying both sides of (40) by $C(L)^{-1}$ and cancelling,

This is the autoregressive representation.

ECM representation. For the ECM form, we need to express the DGP as

From the Granger Representation Theorem (R7), $(\alpha_1, \alpha_2)'$ solves

From (40),

so that $\alpha_1 = 1$ and $\alpha_2 = -2$ solve (44).


Moreover, $(\gamma_1, \gamma_2)'$ solves

Equation (45) gives $\gamma_1 = -0.4$ and $\gamma_2 = 0.1$, and so the ECM representation is given by

It is easy to see from the ECM representation that the long-run solution is given by
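The values of Example 2 can be checked numerically. The sketch below is an illustration under two labelled assumptions: the ECM is taken to have the first-order form $\Delta x_t = \gamma \alpha' x_{t-1} + \varepsilon_t$, and the signs of the adjustment coefficients are taken to be $\gamma = (-0.4, 0.1)'$. Under those assumptions the levels VAR should have exactly one unit root, and the error-correction term $z_t = \alpha' x_t$ should be a stable AR(1).

```python
import numpy as np

alpha = np.array([1.0, -2.0])    # co-integrating vector from the text
gamma = np.array([-0.4, 0.1])    # adjustment coefficients (signs are an assumption)

# Assumed ECM: dx_t = gamma * (alpha' x_{t-1}) + eps_t,
# i.e. x_t = (I + gamma alpha') x_{t-1} + eps_t in levels.
companion = np.eye(2) + np.outer(gamma, alpha)
roots = np.sort(np.linalg.eigvals(companion).real)

# Implied dynamics of z_t = alpha' x_t:
# z_t = (1 + alpha' gamma) z_{t-1} + alpha' eps_t
z_root = 1.0 + alpha @ gamma

print("roots of the levels VAR:", roots)
print("AR(1) coefficient of z_t:", z_root)
```

One root equal to one and one stable root correspond to a single common stochastic trend and a single co-integrating vector, so that in the long run $\alpha' x_t$ reverts to zero.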

5.6. Engle-Granger Two-step Procedure

Engle and Granger (1987) proposed a two-step estimator for models involving co-integrated variables. In the first step, the parameters of the co-integrating vector are estimated by running the static regression in the levels of the variables. In the second step, these are used in the error-correction form. Both steps require only OLS, and the results may be shown to be consistent for all the parameters. In particular, the estimates of the parameters in the first step converge to their probability limits at rate $T$ while the elements of the vector multiplying the error-correction term, in the second step, converge at the usual asymptotic rate of $T^{1/2}$.
This procedure is convenient because the dynamics do not need to be specified until the error-correction structure has been estimated (although it may nevertheless be sensible to do so, as we shall see below). We can illustrate this using a simple argument when there is no intercept.
An important implication of the theory of series $x_t$ integrated of order one is that the variance of $\Delta x_t$ is asymptotically negligible relative to that of $x_t$. Assume then that some dynamic relationship links the I(1) series $\{x_t\}$ and $\{y_t\}$, and that these two series are co-integrated. Consider the static regression of $y_t$ on $x_t$,
$$y_t = \alpha x_t + v_t.$$
Now $v_t$ contains all of the omitted dynamics, but these can be re-parameterized in terms of $\Delta x_{t-j}$, $\Delta y_{t-m}$, and $(y_{t-r} - \alpha x_{t-r})$, for $j, m, r > 0$, which are all I(0) if co-integrability holds. Thus, $\alpha$ is consistently estimated by the regression despite the complete omission of all dynamics. In fact,

$$\hat\alpha - \alpha = \frac{\sum_{t=1}^{T} x_t v_t}{\sum_{t=1}^{T} x_t^2}. \qquad (48)$$

Since $\{v_t\}$ is I(0) under co-integrability but $\{x_t\}$ is I(1),

whereas

Thus,

which implies that

Hence $\hat\alpha$ converges to $\alpha$ at a rate of $O_p(T)$ and not at the usual rate of $O_p(T^{1/2})$. Convergence is rapid asymptotically and it is this rapid convergence of the estimates of the coefficients that is used by Engle and Granger as the basis of their two-step estimator.
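The orders of magnitude behind this rate can be sketched as follows. This is a standard argument written under the assumptions of the paragraph above (no intercept, $x_t$ I(1), $v_t$ I(0)); it is offered as a reading aid and is not necessarily the exact form of the displayed equations.

```latex
\begin{align*}
% Sum of squares of an I(1) series, and its cross-product with an I(0) series:
\sum_{t=1}^{T} x_t^2 &= O_p(T^2), &
\sum_{t=1}^{T} x_t v_t &= O_p(T), \\
% Rescale numerator and denominator of the OLS estimation error:
T(\hat\alpha - \alpha)
  &= \frac{T^{-1}\sum_{t=1}^{T} x_t v_t}{T^{-2}\sum_{t=1}^{T} x_t^2} = O_p(1),
  & \hat\alpha - \alpha &= O_p(T^{-1}).
\end{align*}
```

The denominator grows one power of $T$ faster than the numerator, which is why the estimation error shrinks at rate $T$ rather than $T^{1/2}$.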
Since $\hat\alpha$ differs from $\alpha$ by terms of $O_p(T^{-1})$, the asymptotic results for estimation of dynamic models with I(1) variables will be the same whether $\alpha$ is estimated or known. Moreover, differencing must reduce the order of integration of an integrated variable by unity, so if $\Delta y_t$ is related to $\Delta x_t$ and perhaps lags of both of these, and if $\{x_t\}$ and $\{y_t\}$ are co-integrated, then $y_{t-1} - \alpha x_{t-1}$ is I(0) and can be included in the ECM model as if $\alpha$ were known (that is, the sampling variance of $\hat\alpha$ can be ignored). If $\{y_t\}$ and $\{x_t\}$ are not co-integrated, then we have the familiar spurious regression problem; if they are co-integrated, the benefits accruing from a static regression are potentially large.
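The rate of convergence can be illustrated by simulation. The sketch below is a minimal Monte Carlo under assumptions that are not in the text: a random-walk regressor, an AR(1) error with coefficient `phi` standing in for the omitted dynamics, and illustrative values for `alpha` and the sample sizes. If the static estimator converges at rate $T$, raising $T$ by a factor of 16 should shrink the typical error by a factor of about 16, not 4.

```python
import numpy as np

def median_abs_error(T, alpha=2.0, phi=0.5, reps=400, seed=0):
    """Median |alpha_hat - alpha| from static OLS of y on x (no intercept)."""
    rng = np.random.default_rng(seed)
    errors = np.empty(reps)
    for i in range(reps):
        x = np.cumsum(rng.standard_normal(T))   # I(1) regressor
        eps = rng.standard_normal(T)
        v = np.empty(T)
        v[0] = eps[0]
        for t in range(1, T):                   # I(0) but autocorrelated error
            v[t] = phi * v[t - 1] + eps[t]
        y = alpha * x + v
        alpha_hat = (x @ y) / (x @ x)           # static regression, all dynamics omitted
        errors[i] = abs(alpha_hat - alpha)
    return np.median(errors)

small, large = median_abs_error(100), median_abs_error(1600)
ratio = small / large
print(f"median |error|: T=100 {small:.4f}, T=1600 {large:.5f}, ratio {ratio:.1f}")
```

A ratio well above 4 is what distinguishes the super-consistent case from the usual $T^{1/2}$ asymptotics.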
The so-called 'super-consistency theorem' due to Stock (1987) may be stated formally as follows.
THEOREM (Stock 1987). Suppose that $x_t$ satisfies $(1 - L)x_t = C(L)\varepsilon_t$ with $C(L) = C(1) + (1 - L)C^*(L)$, where $C^*(L)$ has all of its latent roots inside the unit circle. If $C^*(L)$ is absolutely summable,¹⁰ the disturbances have finite fourth-order absolute moments, and $x_t$ is CI(1,1) with $r$ co-integrating vectors (incorporated in a matrix) satisfying, uniquely,

then¹¹

¹⁰ The infinite sequence $\{c_j\}_{j=1}^{\infty}$ is said to be absolutely summable if $\sum_{j=1}^{\infty} |c_j| < \infty$. For the matrix $C^*(L)$ to be absolutely summable, the condition is that $\sum_{j=0}^{\infty} \|C_j^*\| < \infty$.
¹¹ The elements of $q$ and $Q$ will typically be all zeroes and ones, defining one coefficient in each column of the co-integrating matrix to be unity and defining rotations if $r > 1$. $M = \mathrm{plim}\, E(T^{-2} \sum_{t=1}^{T} x_t x_t')$.

Thus, instead of converging at rate $T^{1/2}$, as in stationary processes, least-squares estimators converge at a rate of $T$. This theorem and the error-correction representation of co-integrated systems may be allied to give the following theorem.
THEOREM (Engle and Granger 1987). The two-step estimator of a single equation of an error-correction system with one co-integrating vector, obtained by taking the estimate $\hat\alpha$ of $\alpha$ from the static regression in place of the true value for estimation of the error-correction form at a second stage, will have the same limiting distribution as the maximum-likelihood estimator using the true value of $\alpha$. Least-squares standard errors in the second stage will provide consistent estimates of the true standard errors.
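The two steps can be sketched in a few lines. The example below is illustrative only: it assumes a bivariate system in which $x_t$ is a random walk and $y_t$ follows the error-correction equation $\Delta y_t = \beta \Delta x_t + \gamma (y_{t-1} - \alpha x_{t-1}) + \varepsilon_t$, with hypothetical parameter values, and it compares the second-stage estimates based on $\hat\alpha$ with the infeasible ones based on the true $\alpha$.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
T, alpha, beta, gamma = 5000, 2.0, 0.5, -0.3   # illustrative values only

x = np.cumsum(rng.standard_normal(T))          # I(1) regressor
eps = rng.standard_normal(T)
y = np.zeros(T)
y[0] = alpha * x[0]
for t in range(1, T):
    # assumed ECM: dy_t = beta*dx_t + gamma*(y_{t-1} - alpha*x_{t-1}) + eps_t
    y[t] = (y[t - 1] + beta * (x[t] - x[t - 1])
            + gamma * (y[t - 1] - alpha * x[t - 1]) + eps[t])

# Step 1: static regression in levels gives the co-integrating parameter.
alpha_hat = ols(x[:, None], y)[0]

# Step 2: error-correction regression using the lagged step-1 residual.
dy, dx = np.diff(y), np.diff(x)
z_hat = (y - alpha_hat * x)[:-1]
z_true = (y - alpha * x)[:-1]
two_step = ols(np.column_stack([dx, z_hat]), dy)      # (beta, gamma) with alpha_hat
infeasible = ols(np.column_stack([dx, z_true]), dy)   # (beta, gamma) with true alpha

print("alpha_hat:", round(alpha_hat, 4))
print("two-step  (beta, gamma):", np.round(two_step, 4))
print("true-alpha (beta, gamma):", np.round(infeasible, 4))
```

In a large sample the two sets of second-stage estimates should be very close, which is the practical content of the theorem; in small samples the first-step bias discussed in the next chapter can matter.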

5.6.1. Sketch-proof of Engle-Granger Theorem (Bivariate Case)


The following is a proof of this theorem for the bivariate case. Consider the estimation of $\beta$ and $\gamma$ in the two equations given by
$$\Delta y_t = \beta \Delta x_t + \gamma (y_{t-1} - \alpha x_{t-1}) + \varepsilon_t, \qquad (53)$$
$$\Delta y_t = \beta \Delta x_t + \gamma (y_{t-1} - \hat\alpha x_{t-1}) + \varepsilon_t^*. \qquad (54)$$
$y_t$ and $x_t$ are co-integrated I(1) variables with the co-integrating parameter given by $\alpha$. In the context of the discussion in this chapter, the error-correction mechanism is estimated in (53) using the true value of the co-integrating parameter, while in (54) $\hat\alpha$ is substituted for $\alpha$, where $\hat\alpha$ is derived from the static regression of $y_t$ on $x_t$. Also, $\varepsilon_t^* = \varepsilon_t + \gamma(\hat\alpha - \alpha)x_{t-1}$. Let $z_t = y_t - \alpha x_t$ and $\hat z_t = y_t - \hat\alpha x_t$.
We need to show that the asymptotic distributions of the estimators $\hat\beta$ and $\hat\gamma$, of $\beta$ and $\gamma$ respectively, are the same regardless of whether one uses $\hat\alpha$ or $\alpha$ (that is, whether one estimates (53) or (54)).
In standard fashion, we have from (53) (assuming adequate initial values)

The estimators derived from (54) are also given by (55) but with $\hat z_{t-1}$ and $\varepsilon_t^*$ replacing $z_{t-1}$ and $\varepsilon_t$. From this, it is easy to deduce that the result will be demonstrated if the following conditions are shown to be true:

(iii) the asymptotic distributions of

are the same;
(iv) the asymptotic distributions of

are the same.
In (53), we assume that $\{\varepsilon_t\}$ is an innovation process such that $E(\Delta x_t \varepsilon_t) = 0$.
Note first that, by the properties of I(0) and I(1) series, as used and discussed in Chapters 3 and 4, the following expressions are $O_p(1)$ (that is, non-explosive and non-degenerate as $T \to \infty$):

Secondly,

Using (59),

Result (i) now follows from (57) and (58). Also,

Result (ii) now follows from (56), (57), and (58).
Finally,

By (57) and (58), the last two expressions on the right-hand side of the above equality are $O_p(T^{-1/2})$. Result (iii) follows, and (iv) is proved analogously from:

Regression with Integrated Variables

We have seen how the presence of integrated variables poses some special problems which do not appear when working with stationary series. These might lead us to believe that a new range of techniques needs to be considered in order to handle such data. However, as we show in this chapter, we can continue to apply standard regressions if we pay attention to orders of integration and use dynamic specifications which take account of any co-integrating relationships among the variables.
The Engle-Granger theorem in Chapter 5, laying emphasis on simple static regressions, implies a good deal about the way in which an investigator ought to proceed with an econometric study of integrated variables. Some of this is related to the evolution of modelling practice among econometricians.
Econometricians of the 1970s began to be suspicious of regressions using data in levels. Their suspicions were reinforced by worries expressed by time-series analysts relating to spurious regressions. The focus of attention began to shift towards the need to have properly specified models with rich dynamic structures. The move, following Mizon (1977), Sims (1977), Hendry and Mizon (1978), and Hendry and Richard (1982), was towards a method of econometric research that preferred models which began with as general a specification as possible, and continued with simplification to a parsimonious econometric model following from imposing constraints consistent with observed data. (See Spanos (1986) for a detailed treatment.) The literature on co-integration reinstated some confidence in static regressions in levels, and good econometric method appeared to have taken a full circle; as long as the I(1) variables were co-integrated, such regressions made sense.
There are nonetheless several reasons for continuing to treat static regressions as being in general sub-optimal. First of all, the estimate $\hat\alpha$ is biased for the co-integrating parameter $\alpha$ and, although that bias is $O_p(T^{-1})$, it can be substantial in finite samples. The bias is likely to be a function of some parameter such as the mean lag of the dynamic adjustment process relating $\{y_t\}$ to $\{x_t\}$. In some circumstances, therefore, a return to dynamic modelling would seem to be the appropriate response to the problems of static-regression biases. Already a body of work exists demonstrating the poor performance of static regressions for many types of problem (Banerjee, Dolado, Hendry, and Smith 1986, and Stock 1987). Second, the distributions of coefficient estimates will typically take non-standard forms even where the series are co-integrated. The 'non-standardness', by which we generally mean asymptotic non-normality, comes from the property that the series are integrated of order greater than or equal to 1. The fundamental point is that the distribution theory that applies to non-stationary series is different from the familiar Gaussian asymptotic theory. The estimators have distributions, in general, which are functionals of the Wiener processes discussed in Chapters 1 and 3. However, some of the standard asymptotic theory may be restored in dynamic models.
We will elaborate on the second of these points, leaving a discussion of the first until Chapter 7. It is important to point out at the outset, in order not to mislead readers, that it is not true that single-equation dynamic models are necessarily superior to their static counterparts. The next two sections present examples where single-equation dynamic models do perform satisfactorily. Yet, as the discussion in Chapter 8 shows, it is possible to construct many cases where single-equation dynamic models by themselves are not sufficient for obtaining efficient and unbiased estimates (see Engle et al. 1983 and Phillips and Loretan 1991).
There are several interrelated difficulties which are important and which collectively imply that the issue is broader than simply a comparison of dynamic with static models. An informal description of the problems encountered in modelling non-stationary variables in a single-equation framework would identify at least five effects. First, the presence of unit roots induces non-standard distributions of the coefficient estimates. Second, the error process may not be a martingale difference sequence. Third, the explanatory variables may each be generated by processes that display autocorrelation; taken in conjunction with the second effect, this gives rise to 'second-order' biases. Fourth, there may be more than one co-integrating vector. Finally, the explanatory variables in the single equation may not be weakly exogenous for the parameters being estimated. Weak exogeneity can fail if, say, a co-integrating vector enters more than one equation in the system generating the variables.
Static regressions can be affected by all five of the problems listed above, while dynamic models may be able to accommodate the first three effects, as in the examples given in the sections that follow. However, estimates derived from single-equation dynamic models are not optimal if weak exogeneity fails to hold. This final observation extends the discussion from the realm of modelling unit-root processes to the all-encompassing realm of general econometric modelling. This discussion is formalized in Chapter 8 and illustrated with several examples.

6.1. Unbalanced Regressions and Orthogonality Tests

Mankiw and Shapiro (1985, 1986) drew attention to a problem that may arise in applying standard distributions to inference where there are non-stationary (or borderline non-stationary) series present, and in particular to the problem of inference concerning orthogonality between series. While the problem is, as with spurious regression, essentially a problem of integrated data, it will appear with near-integrated data in finite samples.¹ With this qualification, the problem may be said to arise in unbalanced regressions: that is, regressions in which the regressand is not of the same order of integration as the regressors, or any linear combination of the regressors.²
The Mankiw-Shapiro discussion centres on a condition such as
$$E_{t-1}(y_t) = c, \quad \text{implying} \quad y_t = c + v_t, \quad E_{t-1}(v_t) = 0, \qquad (1)$$
where $E_{t-1}$ is interpreted as the expectation, conditional on information realized at time $t - 1$, of the value of some variable which may be dated in the future. That such a condition holds is often tested with a regression such as
$$y_t = c_1 + c_2 x_{t-1} + v_t, \qquad (2)$$
where $c_2 = 0$ under the null hypothesis that (1) holds. Examples of such hypotheses and tests arise frequently in models that postulate the full use of all realized information. One such example from macroeconomics is Hall's (1978) formulation of the life-cycle/permanent-income model, which, given a stringent set of assumptions, implies that consumption should follow a random walk. Tests of this hypothesis have typically taken the form of regressions of differenced consumption on a constant and one or more lagged income or consumption terms; under the null hypothesis the coefficients on the lagged terms should not be significantly different from zero.
Mankiw and Shapiro suggest examining the case in which the regressor $x_t$ follows the AR(1) process:
$$x_t = \theta x_{t-1} + e_t, \qquad (3)$$
with
$$\mathrm{corr}(e_t, v_t) = \rho \quad \text{and} \quad \mathrm{corr}(e_{t+j}, v_t) = 0 \;\; \forall j \neq 0.$$
Note that this is not a problem of simultaneity bias: the regressor $x_{t-1}$ is uncorrelated with $v_t$. A structure such as this is appropriate in many models in which these tests have been used. In the Hall (1978) model, for example, $\rho = 1$ where $x_t$ and $y_t$ represent current income and the change in current consumption respectively. Mankiw and Shapiro use Monte Carlo simulations to tabulate estimates of the actual rejection frequencies and critical values in t-type tests of $H_0$: $c_2 = 0$, when standard t-values are used. Table 6.1 reproduces a selection of their results for model (2) and also for model (4), which adds a linear time trend to (2).

¹ While the experiments reported here use borderline stationary data, the results will also apply to integrated series.
² These are sometimes called inconsistent regressions. Inconsistency in this sense is unrelated to the concept of an inconsistent estimator of a parameter: see n. 3.
TABLE 6.1. Percentage rejection frequencies of standard t-tests at nominal 5 per cent levelᵃ
DGP: (1) + (3); Sample size = T; No. of replications = 1000

                      Model (2)                        Model (4)
θ \ ρ        1.0   0.9   0.8   0.5   0.0      1.0   0.9   0.8   0.5   0.0
(a) T = 50
0.999         30    24    20    11     7       60    45    36    16     6
0.99          26    20    15    10     7       54    40    33    15     6
0.98          22    17    15     8     7       50    37    30    14     5
0.95          17    12    10     7     6       38    30    25    12     6
0.90          12     9     8     6     6       28    22    19    10     6
0.00           5     6     6     5     5        6     7     7     5     6
(b) T = 200
0.999         29    23    20    10     5       61    48    38    18     5
0.99          18    15    13     8     4       41    32    27    13     5
0.98          13    10     9     7     5       29    24    20    11     6
0.95           9     7     7     6     5       17    14    12     7     6
0.90           7     6     6     6     6       10     9     8     6     7
0.00           5     4     4     5     5        5     5     4     5     5

ᵃ This table compares two sample sizes. While the test size distortions are generally smaller for the larger sample and will vanish as $T \to \infty$, this feature is specific to the borderline-stationary processes used ($\theta < 1$). For $\theta = 1$, distortions will persist as $T \to \infty$. Each of the entries $a_{ij}$ (expressed as a fraction) has a standard error which can be approximated by $[a_{ij}(1 - a_{ij})/N]^{1/2}$, where $N$ is the number of replications (equal to 1000 here).
Source: Mankiw and Shapiro (1986).


As with the Dickey-Fuller statistics seen earlier, the size distortions in Table 6.1 spring from bias and skewness in the t-type statistic.
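An experiment of this kind can be sketched in a few lines. The code below is an illustration, not a reproduction of the published design: it assumes $c = 0$, a zero starting value for $x_t$, Gaussian errors, a two-sided test against the asymptotic critical value 1.96, and the regression of $y_t$ on a constant and $x_{t-1}$; the function name and defaults are hypothetical.

```python
import numpy as np

def rejection_rate(theta, rho, T=50, reps=1000, crit=1.96, seed=0):
    """Share of replications with |t| > crit for c2, although c2 = 0 is true."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        e = rng.standard_normal(T + 1)
        # v has unit variance and corr(e_t, v_t) = rho
        v = rho * e + np.sqrt(1.0 - rho**2) * rng.standard_normal(T + 1)
        x = np.zeros(T + 1)
        for t in range(1, T + 1):
            x[t] = theta * x[t - 1] + e[t]           # AR(1) regressor
        y = v[1:]                                     # y_t = c + v_t with c = 0
        X = np.column_stack([np.ones(T), x[:-1]])     # constant and x_{t-1}
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        s2 = resid @ resid / (T - 2)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        rejections += abs(b[1] / se) > crit
    return rejections / reps

high = rejection_rate(theta=0.999, rho=1.0)   # near-integrated regressor, correlated errors
low = rejection_rate(theta=0.999, rho=0.0)    # same regressor, uncorrelated errors
print(f"rejection rate at nominal 5%: rho=1.0 -> {high:.3f}, rho=0.0 -> {low:.3f}")
```

With a nearly integrated regressor the over-rejection should be pronounced when $\rho$ is close to one and largely absent when $\rho = 0$, in line with the pattern of Table 6.1.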
The critical values reported by Mankiw and Shapiro are not reproduced here; Galbraith, Dolado, and Banerjee (1987) show that these are sensitive to unobservable parameters of the underlying DGP, and, in considering a more general DGP, it is possible to relate the problem of size distortions to co-integration among regressors, and so to what has been called the balance or imbalance of the regression model.³ Generalize (1) and (3) to

with $v_t \sim \mathrm{IN}(0, 1)$, $\varepsilon_{it} \sim \mathrm{IN}(0, 1)$, $E(v_t \varepsilon_{1s}) = \delta_{ts}\rho$, $E(\varepsilon_{1t} \varepsilon_{2s}) = 0$, and $\delta_{ts} = 1$ if $t = s$, and 0 otherwise. The fitted model is generalized from (2) to

incorporating the new regressor. A classification of possible cases is given in Table 6.2. The notation NI($k$) (nearly integrated) indicates that the series, although I($k - 1$), are close to integrated processes of the given order. Only in case C are the two regressors 'nearly' co-integrated series, and in this case the co-integrating slope is $(1 - \theta_{11})^{-1}\theta_{12}$.
The size distortions stressed by Mankiw and Shapiro for high values of $\rho$ also appear in the cases B, D, and E (see Galbraith et al. 1987), where $\{x_{1t}\}$ is (nearly) I(1) and not co-integrated with $\{x_{2t}\}$. Size distortions begin to appear in case A as $\theta_{11}$ rises and case A approaches case E. Where the regressors are co-integrated, however (case C), the regression is balanced (there exists a linear combination of the regressors that has the same order of integration as the regressand) and size distortions do not appear. Case F resembles C except that $\theta_{11}$ is close to 1, indicating that co-integration is broken between the regressors; none the less, size distortions are not detectable as long as $\theta_{12}$ remains non-zero. This last finding demonstrates the difficulty of distinguishing, at modest sample sizes, the results of regressions with co-integrated regressors from those with regressors of differing, but both strictly positive, orders.

TABLE 6.2. Classification of cases of interest

Case   θ11      θ22      θ12      {x1t}   {x2t}
A      < 1.0    0.999    0.0      I(0)    NI(1)
B      0.999    < 1.0    0.0      NI(1)   I(0)
C      < 1.0    0.999    ≠ 0.0    NI(1)   NI(1)
D      0.999    < 1.0    ≠ 0.0    NI(1)   I(0)
E      0.999    0.999    0.0      NI(1)   NI(1)
F      0.999    0.999    ≠ 0.0    NI(2)   NI(1)

³ A regression is defined to be balanced if and only if the regressand and the regressors (either individually or collectively, as a co-integrated set) are of the same order of integration. The mere fact that a regression is unbalanced may not be a matter for concern; for example, ADF statistics are computed from models that, in this terminology, are unbalanced. They are nevertheless valid tools for inference as long as the correct critical values are used.
We see, in summary, that for integrated series (or, in finite samples, for the borderline-stationary series examined in these papers), with $\rho = 1$, size distortions may emerge when there is no linear combination of regressors that has the same order of integration as the regressand.⁴
For an intuitive view of these results, let us return to the consumption example and consider the orders of integration of the variables on the two sides of the regression. Consumption and income are both typically variables integrated of order one. Thus, the regression (2) has an I(0) variable (differenced consumption) regressed on an I(1) variable (lagged income in level) and the regression is unbalanced; the investigator is attempting to explain an I(0) variable by a variable integrated of higher order. This strategy will eventually fail, as the two variables must diverge by ever-larger amounts. Therefore, a requirement of estimation with integrated variables must be balance in the orders of integration of the variables on the left-hand and right-hand sides of the regression equation. However, there may be circumstances in which a test will be designed to involve regressand and regressors having different orders of integration, for example, efficiency tests such as those mentioned above. We must bear in mind, of course, that test statistics from such regressions will have non-standard distributions.
The importance of the latter point follows from the observation that, even when the regressand (e.g. $y_t$) and the regressor ($x_t$) are both integrated of order 1 and are co-integrated, the t-statistic on the coefficient of $x_t$ still has a non-standard distribution which makes ordinary t and normal tables unusable for purposes of inference. On the other hand, if the order of integration of both sides is zero (which may be ensured by looking for a co-integrated set of regressors and using a sufficiently differenced term as the regressand), the t-statistics can be shown to have asymptotically normal distributions. This implies some advantage to the use of dynamic rather than static regressions, since lagging variables and including them as regressors often has the same effect as providing a co-integrated set of regressor variables. The essential point is to find some way of re-parameterizing the regression such that in the re-parameterized form, the regressors, either jointly or individually, are integrated of order zero. Correspondingly, the regressand must also be I(0). However, provided no restrictions are imposed, it is irrelevant whether or not the re-parameterization is actually carried out. As we have seen above in Chapter 2, non-singular linear transformations yield numerically equivalent results after transforming back, and so regressions that are linear transformations of each other have identical statistical properties. What is important, therefore, is the possibility of transforming in such a way that the regressors are integrated of the same order as the regressand, a possibility that is enhanced in a dynamic model as the probability of a co-integrated set being present is increased, although, as the discussion in the previous section shows, care must be taken if weak exogeneity fails to hold.

⁴ Campbell and Dufour (1991) offer, as a way of overcoming the Mankiw-Shapiro problem, an alternative non-parametric test of orthogonality which is independent of some nuisance parameters in the DGP.

6.2. Dynamic Regressions

The remarks above would suggest that a sensible procedure for econometric investigations, even in the presence of integrated variables, is to use a dynamic specification that is as general as the constraints of data and sample allow. The 'general to specific' modelling method is effective here for a straightforward reason: the inclusion of several variables and their lags as regressors increases the chances of obtaining a co-integrated set of regressors. A dependent variable made stationary by differencing can be regressed on this co-integrated set, and standard t-, F-, and normal tables can be used for inference. The regression would then take the form of a differenced variable as the regressand, and differences and levels of variables as regressors.
A comprehensive account of the asymptotic theory associated with dynamic regressions of this kind appears in Sims et al. (1990). In their general formulation, the variables may have drifts and are allowed to be integrated and co-integrated of arbitrary order. The intuition for their results, moreover, is straightforward. They show that estimators of those parameters which can be rewritten as coefficients on mean-zero, non-integrated regressors will have asymptotically normal joint distributions, converging at a rate $T^{1/2}$ to their probability limits. This rewriting may be accomplished either by differencing the regressors to achieve stationarity or by linearly combining subsets of these regressors as shown in Chapter 4. Stationarity, or more precisely non-integratedness, is achieved in the latter case if and only if subsets of the regressors are co-integrated.
There are three important properties of these transformations. First,
starting from the original dynamic regression and transforming linearly
in such a way that the regression is rewritten in terms of non-integrated
regressors, the original parameters of interest can be identified from the
parameters of the transformed regression. Second, because the
transformed parameter estimates are asymptotically normally distributed,
so are the untransformed estimates. Again, this is because linear
transformations do not alter any of the statistical properties of the
estimators of the regression coefficients. Finally, as shown by the
analysis in Chapter 2, because any information obtained from a
transformed regression can be obtained from an untransformed regression
as well, the essential point is not that the transformations actually be
undertaken, but rather that the scope exists for the appropriate
transformations to be made, because appropriate regressors are present.
There is at least one other important case where asymptotic normality
of coefficient estimates has been shown to hold. The result is due to
West (1988) and occurs when a stochastic trend is present in the
regression but is dominated, in the sense of orders of probability, by a
non-stochastic trend component. West considers OLS and linear IV
models of the form

$$w_t = x_{1t}'\alpha + \gamma y_t + e_t,$$

where $y_t$ is a scalar I(1) time series the first difference of which has a
non-zero unconditional mean, $x_{1t}$ is a vector of stationary observable
variables, and $e_t$ is a stationary disturbance term. The dependent
variable $w_t$ is stationary if $\gamma = 0$ but non-stationary otherwise. West
shows that the parameter estimates $\hat\alpha$ and $\hat\gamma$ are asymptotically normal,
given that $E(\Delta y_t)$ is non-zero. Note that, where we take $y_t = w_{t-1}$ and
let $x_{1t}$ be a constant term, we have

$$w_t = \alpha + \gamma w_{t-1} + e_t;$$

this is the process and model examined by Dickey and Fuller (1979),
with the exception that they took $E(\Delta w_t) = 0$. In that case, the
asymptotic distribution of the t-statistic for $H_0: \rho = 1$ (where $\rho$ denotes
the coefficient on the lagged dependent variable, $\gamma$ here) is given in
Table 4.2. Addition of a non-zero constant to the data-generation process,
however, makes the asymptotic distribution of this statistic normal.
Asymptotic normality holds only when the non-zero constant (and
trend) in the DGP is (are) matched by a constant (and trend) in the
model. Including a trend in the model when the DGP does not contain
a trend destroys this result. In the latter case, the Dickey-Fuller critical
values given in the third block of Table 4.2 are again the appropriate
ones to use.⁵

⁵ This result follows from the similarity properties of the Dickey-Fuller statistics. The
third block of Table 4.2 is computed by using a pure random walk (without constant or
trend) as the DGP and $y_t = \alpha + \gamma t + \rho y_{t-1} + u_t$ as the model. When the DGP is altered to
include a non-zero constant, with the model remaining unchanged, the critical values of the
distributions of $\hat\rho$ and of the associated t-statistic are not affected: the distributions are
invariant to the value of the constant in the DGP, which implies similarity. A detailed
discussion of this issue appears in Kiviet and Phillips (1992).

Since some non-zero constant seems likely to be present in the first
differences of the processes generating many economic time series, we
might suspect that the Dickey-Fuller distributions examined in detail in
Chapter 4 may be of limited relevance. However, the relevance of the
Dickey-Fuller (DF) values is determined by the relative magnitudes of
the drift term and the standard deviation of the process.
Hylleberg and Mizon (1989b) present some simulation evidence for
the AR(1) model. The critical values derived from our simulations, for
sample sizes and values of the constant chosen by Hylleberg and Mizon,
are given in Table 6.3. Since it is the size of $\mu$ relative to $(\mathrm{var}(\varepsilon_t))^{1/2}$
that is relevant, and $\mathrm{var}(\varepsilon_t) = 1$ in the experiments reported in Table
6.3, we treat $\mu$ as this ratio rather than the value of the constant term in
the DGP alone. In general, for $\mu/(\mathrm{var}(\varepsilon_t))^{1/2} \geq 1/2$, the critical values of
the normal density are closer to (although less than) the actual critical
values than are the DF critical values.
TABLE 6.3. Empirical cumulative distribution of $t(\rho = 1)$
DGP: $x_t = \mu + x_{t-1} + \varepsilon_t$, $\varepsilon_t \sim \mathrm{IN}(0, 1)$, $x_0 = 0$;
model: $x_t = \mu + \rho x_{t-1} + \varepsilon_t$

Sample size    Probability of a smaller valueᵃ
(T)              0.01   0.025    0.05    0.10    0.50    0.90    0.95   0.975    0.99

DFᵇ             -3.43   -3.12   -2.86   -2.57   -1.56   -0.44   -0.07    0.23    0.60

(a) μ = 0
  50            -3.57   -3.22   -2.92   -2.60   -1.55   -0.40   -0.03    0.30    0.66
 100            -3.50   -3.17   -2.89   -2.59   -1.56   -0.42   -0.06    0.26    0.62
 200            -3.47   -3.15   -2.88   -2.58   -1.56   -0.42   -0.06    0.25    0.62
 400            -3.45   -3.13   -2.87   -2.57   -1.56   -0.43   -0.07    0.25    0.62
(b) μ = 0.001
  50            -3.57   -3.22   -2.93   -2.60   -1.55   -0.40   -0.03    0.29    0.65
 100            -3.49   -3.17   -2.89   -2.58   -1.56   -0.42   -0.06    0.26    0.63
 200            -3.47   -3.14   -2.88   -2.57   -1.56   -0.42   -0.06    0.26    0.63
 400            -3.46   -3.14   -2.87   -2.57   -1.56   -0.43   -0.07    0.24    0.62
(c) μ = 0.010
  50            -3.57   -3.22   -2.93   -2.60   -1.55   -0.40   -0.03    0.29    0.67
 100            -3.50   -3.16   -2.90   -2.58   -1.56   -0.41   -0.05    0.27    0.64
 200            -3.47   -3.14   -2.88   -2.57   -1.56   -0.41   -0.06    0.26    0.64
 400            -3.46   -3.13   -2.87   -2.57   -1.56   -0.42   -0.06    0.26    0.64
(d) μ = 0.10
  50            -3.53   -3.17   -2.87   -2.54   -1.46   -0.22    0.16    0.49    0.89
 100            -3.41   -3.08   -2.80   -2.48   -1.37   -0.08    0.30    0.61    0.98
 200            -3.34   -3.00   -2.71   -2.37   -1.20    0.11    0.48    0.81    1.18
 400            -3.17   -2.83   -2.54   -2.19   -0.94    0.34    0.70    1.02    1.39
(e) μ = 0.25
  50            -3.35   -2.95   -2.64   -2.28   -1.02    0.29    0.67    1.00    1.35
 100            -3.10   -2.72   -2.41   -2.05   -0.74    0.53    0.90    1.21    1.58
 200            -2.86   -2.48   -2.16   -1.80   -0.52    0.74    1.11    1.42    1.77
 400            -2.68   -2.31   -2.00   -1.63   -0.36    0.90    1.26    1.58    1.95
(f) μ = 0.5
  50            -2.94   -2.53   -2.20   -1.81   -0.51    0.78    1.15    1.47    1.85
 100            -2.70   -2.33   -2.01   -1.64   -0.35    0.93    1.29    1.61    1.98
 200            -2.60   -2.21   -1.89   -1.53   -0.25    1.02    1.40    1.71    2.06
 400            -2.50   -2.14   -1.82   -1.46   -0.18    1.09    1.45    1.79    2.16
(g) μ = 1
  50            -2.65   -2.26   -1.93   -1.55   -0.24    1.06    1.44    1.76    2.16
 100            -2.52   -2.15   -1.84   -1.47   -0.17    1.12    1.49    1.81    2.19
 200            -2.48   -2.10   -1.77   -1.41   -0.12    1.15    1.53    1.84    2.20
 400            -2.42   -2.06   -1.74   -1.37   -0.09    1.18    1.54    1.87    2.25
(h) μ = 10
  50            -2.44   -2.03   -1.71   -1.32   -0.02    1.28    1.65    1.99    2.39
 100            -2.38   -2.00   -1.68   -1.31   -0.02    1.27    1.65    1.97    2.35
 200            -2.35   -1.99   -1.67   -1.30   -0.01    1.27    1.65    1.97    2.33
 400            -2.35   -1.98   -1.66   -1.30   -0.01    1.27    1.64    1.96    2.33

N(0, 1)         -2.32   -1.96   -1.65   -1.28    0.00    1.28    1.65    1.96    2.32

ᵃ The entries in this table are based on at least 100,000 replications using
GAUSS. For any μ, the samples at the smaller sample sizes are sub-samples of
the larger samples, tending to reduce variability across T for any μ. In
consequence the results are monotonic in T for given μ.
ᵇ Source: Second block of Table 4.2, in the limiting case $T \to \infty$.
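The kind of experiment summarized in Table 6.3 can be sketched in a few lines. The code below is only an illustrative sketch, not the authors' GAUSS program: the function names `df_t_stats` and `quantiles`, the seed, and the much smaller number of replications are assumptions made for the example. It simulates a random walk with drift μ and unit error variance, fits the model with a constant, and returns the t-statistic for ρ = 1.

```python
import numpy as np


def df_t_stats(mu, T, reps, seed=0):
    """t-statistics for H0: rho = 1 in x_t = mu + rho * x_{t-1} + e_t.

    The DGP is x_t = mu + x_{t-1} + e_t, e_t ~ IN(0, 1), x_0 = 0.
    Each row of the simulated array is one replication.
    """
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((reps, T))
    x = np.concatenate(
        [np.zeros((reps, 1)), np.cumsum(mu + e, axis=1)], axis=1
    )
    lag, cur = x[:, :-1], x[:, 1:]
    # Demeaning both series is equivalent to including a constant.
    lag_c = lag - lag.mean(axis=1, keepdims=True)
    cur_c = cur - cur.mean(axis=1, keepdims=True)
    sxx = (lag_c ** 2).sum(axis=1)
    rho_hat = (lag_c * cur_c).sum(axis=1) / sxx
    resid = cur_c - rho_hat[:, None] * lag_c
    s2 = (resid ** 2).sum(axis=1) / (T - 2)
    return (rho_hat - 1.0) / np.sqrt(s2 / sxx)


def quantiles(mu, T=100, reps=5000, probs=(0.05, 0.50, 0.95)):
    """Empirical quantiles of the t-statistic for a given drift mu."""
    return np.quantile(df_t_stats(mu, T, reps), probs)


if __name__ == "__main__":
    for mu in (0.0, 0.25, 1.0, 10.0):
        print(mu, np.round(quantiles(mu), 2))
```

With so few replications the quantiles are noisy, but the movement from Dickey-Fuller-like values at μ = 0 towards normal values at large μ should still be visible.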
For a data series such as annual GNP, $E(\Delta \log(\mathrm{GNP}_t)) = 0.025$,
which is roughly the same as the standard deviation of the series. Since
the ratio of $\mu$ to the standard deviation of the series is close to 1, we
refer to the 'μ = 1' block, which suggests that for the GNP series the
appropriate critical values are quite close to those of the normal
distribution.
West's result is in the spirit of the discussion earlier in this section.
Asymptotic normality prevails only in the absence of dominating
stochastic trends, because in that case conventional⁶ central-limit theorems
may be used to derive convergence results for the parameter estimates.
This can be achieved in a regression which can be rewritten in terms of
non-stochastically trending components, or where a deterministic trend
dominates the stochastic one.

⁶ By 'conventional' we mean those applying to stationary ergodic processes.
The next section applies the asymptotic theory derived earlier to
regressions with integrated variables. The first two examples are
derivations, for specific cases, of asymptotic distributions of estimators or
test statistics. We then consider the issue of dynamic modelling more
generally, by first presenting an example from Stock and West (1988).
Five more examples then apply this theory. The final section looks at
the issue of co-integration testing when the original data-set has been
transformed (for example, by differencing or by taking logarithms).
6.2.1. Asymptotic Normality of Unit-root Tests (West 1988)
Consider a DGP that contains a constant or a constant and a trend. If
the same variables appear in the model, the estimator of the coefficient
of the lagged dependent variable is asymptotically normally distributed
and does not have a Dickey-Fuller type distribution.
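Assume that $y_t$ is generated by ($\mu_b \neq 0$)

$$y_t = \mu_b + y_{t-1} + u_t, \qquad y_0 = 0.$$

The model is given by

$$y_t = \mu_b + \rho_b y_{t-1} + u_t. \tag{6}$$

Define a scaling matrix

$$\Gamma_T = \operatorname{diag}\left(T^{1/2},\ T^{3/2}\right). \tag{7}$$

Now, noting that $y_t = \mu_b t + \sum_{s=1}^{t} u_s = \mu_b t + S_t$, it is possible to show
that

$$\begin{pmatrix} T^{-1/2}\sum_{t=1}^{T} u_t \\ T^{-3/2}\sum_{t=1}^{T} y_{t-1}u_t \end{pmatrix} \Rightarrow N\left(0,\ \sigma_u^2 B\right), \qquad B = \begin{pmatrix} 1 & \mu_b/2 \\ \mu_b/2 & \mu_b^2/3 \end{pmatrix}. \tag{8}$$

An important feature of this derivation is that, because of the particular
scaling matrix chosen, only the deterministic part of $y_t$ plays a role in
generating the joint distribution of $T^{-1/2}\sum_{t=1}^{T} u_t$ and $T^{-3/2}\sum_{t=1}^{T} y_{t-1}u_t$.
Thus, for example, in deriving the distribution of $T^{-3/2}\sum_{t=1}^{T} y_{t-1}u_t$, we
need consider only the distribution of

$$T^{-3/2}\sum_{t=1}^{T} \mu_b (t-1)\, u_t$$

(see Table 3.3) because

$$T^{-3/2}\sum_{t=1}^{T} S_{t-1}u_t \xrightarrow{p} 0,$$

since

$$\sum_{t=1}^{T} S_{t-1}u_t = O_p(T).$$

The scaling factors are such that any term with the stochastic component
of $y_t$, namely $S_t$, has a degenerate asymptotic distribution and may be
ignored asymptotically.
From (7), we have that

$$\Gamma_T \begin{pmatrix} \hat\mu_b - \mu_b \\ \hat\rho_b - 1 \end{pmatrix} = B_T^{-1} \begin{pmatrix} T^{-1/2}\sum_{t=1}^{T} u_t \\ T^{-3/2}\sum_{t=1}^{T} y_{t-1}u_t \end{pmatrix},$$

where

$$B_T = \begin{pmatrix} 1 & T^{-2}\sum_{t=1}^{T} y_{t-1} \\ T^{-2}\sum_{t=1}^{T} y_{t-1} & T^{-3}\sum_{t=1}^{T} y_{t-1}^2 \end{pmatrix}.$$

Now, $\operatorname{plim} T^{-2}\sum_{t=1}^{T} y_{t-1} = \mu_b/2$ and

$$\operatorname{plim} T^{-3}\sum_{t=1}^{T} y_{t-1}^2 = \mu_b^2/3$$

(where, again, only the deterministic component of $y_t$ is important in
determining these limits). Thus $B_T \xrightarrow{p} B$. A simple application of
Slutsky's theorem and Cramér's theorem is then needed to prove, using (8),
that

$$\Gamma_T \begin{pmatrix} \hat\mu_b - \mu_b \\ \hat\rho_b - 1 \end{pmatrix} \Rightarrow N\left(0,\ \sigma_u^2 B^{-1}\right). \tag{9}$$

From (9) it may then be deduced that $T^{3/2}(\hat\rho_b - 1) \Rightarrow N(0,\ 12\sigma_u^2/\mu_b^2)$.
Looking now at the t-ratio,

$$t(\hat\rho_b) = \frac{T^{3/2}(\hat\rho_b - 1)}{s\,\left\{[B_T^{-1}]_{22}\right\}^{1/2}}. \tag{10}$$

$s^2$ is a consistent estimator of $\sigma_u^2$, which may be constructed from the
residuals of the estimated model, so $\operatorname{plim} s = \sigma_u$. Using the results
derived earlier,

$$[B_T^{-1}]_{22} \xrightarrow{p} [B^{-1}]_{22} = 12/\mu_b^2.$$

The result that $t(\hat\rho_b)$ is asymptotically distributed as a standard normal
variable now follows from an inspection of (10) and by noting that
$T^{3/2}(\hat\rho_b - 1) \Rightarrow N(0,\ 12\sigma_u^2/\mu_b^2)$.
This last result is remarkable. The asymptotic distributions of
estimators in models where stochastically trending variables are present are
typically of the Dickey-Fuller type. In direct contrast, the asymptotic
distribution of the estimators in (6) is bivariate normal when $\mu_b \neq 0$.
Similar results apply if both the model and the data-generation
process contain a drift term, $\mu_c$, and a trend. In this case $T^{5/2}(\hat\rho_c - 1) \Rightarrow
N(0,\ 180\sigma^2/\mu_c^2)$, and again $t(\hat\rho_c) \Rightarrow N(0, 1)$.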
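The limiting variance $12\sigma^2/\mu_b^2$ quoted in this sub-section can be checked numerically. The sketch below is an illustration under assumed settings (normal errors, the function name `scaled_rho_variance`, the seed, and the sample sizes are choices made for the example); it estimates the variance of $T^{3/2}(\hat\rho_b - 1)$ across replications of a random walk with drift.

```python
import numpy as np


def scaled_rho_variance(mu=1.0, sigma=1.0, T=400, reps=4000, seed=0):
    """Monte Carlo variance of T^{3/2} * (rho_hat - 1).

    DGP:   y_t = mu + y_{t-1} + u_t, u_t ~ N(0, sigma^2), y_0 = 0.
    Model: y_t = constant + rho * y_{t-1} + u_t, estimated by OLS.
    """
    rng = np.random.default_rng(seed)
    u = sigma * rng.standard_normal((reps, T))
    y = np.concatenate(
        [np.zeros((reps, 1)), np.cumsum(mu + u, axis=1)], axis=1
    )
    lag, cur = y[:, :-1], y[:, 1:]
    # Demeaning the regressor accounts for the estimated constant.
    lag_c = lag - lag.mean(axis=1, keepdims=True)
    rho_hat = (lag_c * cur).sum(axis=1) / (lag_c ** 2).sum(axis=1)
    return np.var(T ** 1.5 * (rho_hat - 1.0))


if __name__ == "__main__":
    for mu in (1.0, 2.0):
        print("mu =", mu,
              "simulated:", scaled_rho_variance(mu=mu),
              "asymptotic:", 12.0 / mu ** 2)
```

The simulated variance should shrink roughly in proportion to $1/\mu_b^2$, which is the sense in which a larger drift makes the deterministic component dominate the stochastic one.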
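6.2.2. Co-integrating Regression
Consider the following bivariate system of co-integrated variables $\{y_t\}_1^\infty$
and $\{x_t\}_1^\infty$:

$$y_t = \beta x_t + u_t, \tag{11}$$
$$x_t = x_{t-1} + \varepsilon_t, \tag{12}$$
$$E(u_t\varepsilon_s) = \delta_{ts}\,\sigma_{u\varepsilon},$$

where $\delta_{ts}$ is the Kronecker delta. The least-squares estimator of $\beta$ is
given by

$$\hat\beta = \frac{\sum_{t=1}^{T} x_t y_t}{\sum_{t=1}^{T} x_t^2}. \tag{13}$$

Thus,

$$T(\hat\beta - \beta) = \frac{T^{-1}\sum_{t=1}^{T} x_t u_t}{T^{-2}\sum_{t=1}^{T} x_t^2}. \tag{14}$$

From (3.22),

$$T^{-2}\sum_{t=1}^{T} x_t^2 \Rightarrow \sigma_\varepsilon^2 \int_0^1 W_\varepsilon(r)^2\,dr.$$

In order to derive the limiting distribution of $T^{-1}\sum_{t=1}^{T} x_t u_t$, it is
convenient to condition $u_t$ on $\varepsilon_t$ in the following fashion:

$$u_t = \gamma\varepsilon_t + v_t, \qquad \gamma = \sigma_{u\varepsilon}/\sigma_\varepsilon^2. \tag{15}$$

By construction, $E(\varepsilon_t v_s) = 0\ \forall\, t, s$.
Define $W_\varepsilon(r)$ and $W_v(r)$ as the independent Wiener processes on
C[0,1] obtained from the $\{\varepsilon_t\}_1^\infty$ and $\{v_t\}_1^\infty$ series, respectively. Now,
using (12) and (15),

$$T^{-1}\sum_{t=1}^{T} x_t u_t = \gamma T^{-1}\sum_{t=1}^{T} x_{t-1}\varepsilon_t + \gamma T^{-1}\sum_{t=1}^{T} \varepsilon_t^2 + T^{-1}\sum_{t=1}^{T} x_{t-1}v_t + T^{-1}\sum_{t=1}^{T} \varepsilon_t v_t. \tag{16}$$

Using (3.23) (also see Phillips 1987a: 282),

$$T^{-1}\sum_{t=1}^{T} x_{t-1}\varepsilon_t \Rightarrow \frac{\sigma_\varepsilon^2}{2}\left[W_\varepsilon(1)^2 - 1\right]. \tag{17}$$

By the property assumed for the $\varepsilon_t$ series,

$$T^{-1}\sum_{t=1}^{T} \varepsilon_t^2 \xrightarrow{p} \sigma_\varepsilon^2. \tag{18}$$

Finally,

$$T^{-1}\sum_{t=1}^{T} x_{t-1}v_t \Rightarrow \sigma_\varepsilon\sigma_v \int_0^1 W_\varepsilon(r)\,dW_v(r), \tag{19}$$
$$T^{-1}\sum_{t=1}^{T} \varepsilon_t v_t \xrightarrow{p} 0. \tag{20}$$

The proof for (19) is similar to those presented in Chapter 3 and is
given in Phillips (1986: 327). Equation (20) follows from (i) $\varepsilon_t$ and $v_t$
being identically and independently distributed processes with zero
means and variances of $\sigma_\varepsilon^2$ and $\sigma_v^2$ respectively, and (ii) the
independence of the $\varepsilon_t$ and $v_t$ processes (obtained by construction). The limiting
distribution of (16) can now be deduced by using (17)-(20) and is

$$T^{-1}\sum_{t=1}^{T} x_t u_t \Rightarrow \frac{\gamma\sigma_\varepsilon^2}{2}\left[W_\varepsilon(1)^2 + 1\right] + \sigma_\varepsilon\sigma_v \int_0^1 W_\varepsilon(r)\,dW_v(r). \tag{21}$$

It can be shown that (see Park and Phillips 1988, and Table 3.3)

$$\left[\int_0^1 W_\varepsilon(r)^2\,dr\right]^{-1/2} \int_0^1 W_\varepsilon(r)\,dW_v(r) \sim N(0, 1). \tag{22}$$

The term $s^2$ is a consistent estimator of $\sigma_u^2$ and may be calculated from
the residuals of the estimated regression of $y_t$ on $x_t$. Thus, using (22),

$$t(\hat\beta) = \frac{T^{-1}\sum_{t=1}^{T} x_t u_t}{s\left(T^{-2}\sum_{t=1}^{T} x_t^2\right)^{1/2}} \tag{23}$$
$$\Rightarrow \frac{\gamma\sigma_\varepsilon\left[W_\varepsilon(1)^2 + 1\right]}{2\sigma_u\left[\int_0^1 W_\varepsilon(r)^2\,dr\right]^{1/2}} + \frac{\sigma_v}{\sigma_u}\,N(0, 1). \tag{24}$$

In general, therefore, the t-ratio of $\hat\beta$ will not have a standard normal
distribution unless $\gamma = 0$ (that is, unless $x_t$ is strongly exogenous for the
estimation of $\beta$). When $\gamma \neq 0$, the first term in (24) gives rise to
'second-order' or 'endogeneity' bias (see Phillips and Hansen 1990),
which, although asymptotically negligible in estimating $\beta$ due to
super-consistency, can be important in finite samples.
The Durbin-Watson statistic, computed from the residuals of the
estimated regression (11), may be shown to converge in probability
to 2. This result follows from our assumption that the $u_t$ are
independently and identically distributed. If the $u_t$ are first-order autoregressive
with autocorrelation parameter $\rho_1$, the Durbin-Watson statistic tends to
the usual $2(1 - \rho_1)$, familiar from the asymptotic theory for stationary
processes. Note that, if $\rho_1 = 1$, $\{y_t\}$ and $\{x_t\}$ are not co-integrated, and
the estimated value of the Durbin-Watson statistic should be close to
zero. This property is the basis for the Sargan-Bhargava test for
co-integration (see Chapter 7).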
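The behaviour of the Durbin-Watson statistic described above is easy to see in a small simulation. The following is a hedged sketch: the helper names `durbin_watson` and `dw_for_static_regression`, the sample size, and the choice to generate the regressor independently of the errors are assumptions made for the illustration.

```python
import numpy as np


def durbin_watson(resid):
    """Durbin-Watson statistic of a residual series."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def dw_for_static_regression(rho1, T=2000, beta=1.0, seed=0):
    """DW statistic from the static regression of y_t on x_t.

    x_t is a driftless random walk; the error u_t is AR(1) with
    parameter rho1 (rho1 = 1 means there is no co-integration).
    """
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.standard_normal(T))
    innov = rng.standard_normal(T)
    u = np.zeros(T)
    u[0] = innov[0]
    for t in range(1, T):
        u[t] = rho1 * u[t - 1] + innov[t]
    y = beta * x + u
    beta_hat = (x @ y) / (x @ x)  # regression without intercept
    return durbin_watson(y - beta_hat * x)


if __name__ == "__main__":
    for rho1 in (0.0, 0.5, 1.0):
        print("rho1 =", rho1,
              "DW =", round(dw_for_static_regression(rho1), 3))
```

For stationary errors the statistic should be near $2(1 - \rho_1)$, while for $\rho_1 = 1$ it should collapse towards zero, which is the feature exploited by tests based on the Durbin-Watson statistic.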
The existence of nuisance parameters has important effects upon the
distribution of $\hat\beta$. In the light of Section 6.2.1 this is to be expected.
Suppose $\{x_t\}_1^\infty$ is generated by (for $\mu_b \neq 0$)

$$x_t = \mu_b + x_{t-1} + \varepsilon_t.$$

Then

$$T^{3/2}(\hat\beta - \beta) = \frac{T^{-3/2}\sum_{t=1}^{T} x_t u_t}{T^{-3}\sum_{t=1}^{T} x_t^2}.$$

By results in Stock (1987) and West (1988), and intuitively from the
orders of magnitude involved,

$$T^{-3/2}\sum_{t=1}^{T} x_t u_t - \mu_b T^{-3/2}\sum_{t=1}^{T} t\,u_t \xrightarrow{p} 0.$$

Following West (1988) (see also Section 6.2.1 above), it may then be
shown that $\mu_b T^{-3/2}\sum_{t=1}^{T} t\,u_t$, and hence $T^{-3/2}\sum_{t=1}^{T} x_t u_t$, is asymptotically
normally distributed with mean zero and variance $\mu_b^2\sigma_u^2/3$. From Section 6.2.1,
$\operatorname{plim} T^{-3}\sum_{t=1}^{T} x_t^2 = \mu_b^2/3$. Hence, by Slutsky's theorem and Cramér's
theorem,

$$T^{3/2}(\hat\beta - \beta) \Rightarrow N\left(0,\ 3\sigma_u^2/\mu_b^2\right).$$

6.2.3. Example (Stock and West 1988)

This example describes how a dynamic regression equation can be
transformed to validate the use of asymptotic normal-distribution
theory. We next formalize the arguments by presenting a general
theoretical framework. All of the examples discussed in this section may
be viewed as special cases of this general formulation. This
generalization is necessary to illustrate the subtleties inherent in deriving the
distribution theory. Four more examples follow the description of the
general theory. These elaborate upon and illustrate some special aspects
of the theory and yield recommendations for empirical modelling with
integrated series.
Stock and West (1988) is one of several papers dealing with tests of
the Hall (1978) permanent income hypothesis.⁷ Hall's regressions take
the following form:

$$c_t = \mu + \beta c_{t-1} + \sum_{i=1}^{p} \pi_i y^d_{t-i} + \varepsilon_t, \tag{25}$$

where $c_t$ is consumption in period $t$ and $y^d_t$ is disposable income. The
processes generating $c_t$ and $y^d_t$ are assumed to have two important
properties. First, $c_t$ and $y^d_t$ have unit roots; that is, they are both
integrated of order 1. Second, given that the permanent income
hypothesis is correct, $y^d_t$ may be shown to be co-integrated with $c_t$. Thus,
while $c_t$ and $y^d_t$ are individually non-stationary, $y^d_t - c_t$ is stationary,
possibly with a non-zero mean.

⁷ Other papers include Mankiw and Shapiro (1985, 1986), Banerjee and Dolado (1987),
and Galbraith et al. (1987).

The permanent-income hypothesis has two implications: first, $\beta = 1$;
and second, $\pi_1 = \pi_2 = \ldots = \pi_p = 0$. In most of the discussion in Stock
and West (1988), $\beta$ is restricted to its hypothesized value of one. Thus,
a test of the permanent income hypothesis takes the form of testing the
joint exclusion restrictions on the $\pi_i$. A joint test of the restrictions on $\beta$
and the $\pi_i$ raises several interesting issues, and we will deal with these in
the context of a later example. It will become clear that such a joint test
will not have the usual F distribution. The F-test on the $\pi_i$, with the
restriction on $\beta$ imposed, does however have the standard F distribution
asymptotically.
The key feature of the regression given by (25) is that all the
coefficients on income can be written as coefficients on mean-zero
stationary variables. One possible rearrangement of the variables yields

$$c_t = \left(\mu + k\sum_{i=1}^{p}\pi_i\right) + \left(\beta + \sum_{i=1}^{p}\pi_i\right)c_{t-1} + \sum_{i=1}^{p}\pi_i\left(y^d_{t-i} - c_{t-1} - k\right) + \varepsilon_t$$

or

$$c_t = m + \theta c_{t-1} + \sum_{i=1}^{p}\pi_i\left(y^d_{t-i} - c_{t-1} - k\right) + \varepsilon_t,$$

where $k$ is the intercept of the long-run consumption function,⁸
$m = \mu + k\sum_{i=1}^{p}\pi_i$ and $\theta = (\beta + \sum_{i=1}^{p}\pi_i)$.
Theorem 1 in Sims et al. (1990) implies that the OLS estimators of
$\{\pi_i\}$ are jointly asymptotically normally distributed, converging to the
true values at the rate $T^{1/2}$. Theorem 2 of Sims et al. implies that the
t- or F-tests on any or all subsets of these estimated $\pi_i$ coefficients have
the usual asymptotic distributions. It is worth re-emphasizing that it is
only the existence of a transformation, to stationary and mean-zero
regressors, that is important. There is no unique way to accomplish this
transformation, but, because nothing depends on the precise
parameterization chosen, uniqueness is not necessary for the results to hold. Tests
and coefficient estimates based on any one of the linearly transformed
regression models will be equivalent. In particular, then, this will be
true of the untransformed regression.
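The numerical equivalence of the untransformed and transformed regressions can be illustrated directly. The sketch below assumes a hypothetical data-generation process (consumption as a pure random walk, income equal to consumption plus a constant `k` plus noise, and two income lags); the function name `hall_regressions` and all parameter values are assumptions for the example. It fits the regression once in levels and once with each income lag replaced by income minus lagged consumption minus `k`.

```python
import numpy as np


def hall_regressions(T=500, k=2.0, seed=0):
    """Fit a Hall-type regression in levels and in transformed form.

    Levels:      c_t on 1, c_{t-1}, yd_{t-1}, yd_{t-2}
    Transformed: c_t on 1, c_{t-1}, (yd_{t-1} - c_{t-1} - k),
                 (yd_{t-2} - c_{t-1} - k)
    Returns the two OLS coefficient vectors.
    """
    rng = np.random.default_rng(seed)
    c = np.cumsum(rng.standard_normal(T))   # consumption: random walk
    yd = c + k + rng.standard_normal(T)     # income: co-integrated with c
    ct, c1 = c[2:], c[1:-1]                 # c_t and c_{t-1}
    y1, y2 = yd[1:-1], yd[:-2]              # yd_{t-1} and yd_{t-2}
    ones = np.ones_like(ct)
    X_levels = np.column_stack([ones, c1, y1, y2])
    X_trans = np.column_stack([ones, c1, y1 - c1 - k, y2 - c1 - k])
    b_levels = np.linalg.lstsq(X_levels, ct, rcond=None)[0]
    b_trans = np.linalg.lstsq(X_trans, ct, rcond=None)[0]
    return b_levels, b_trans


if __name__ == "__main__":
    b_levels, b_trans = hall_regressions()
    print("pi estimates, levels:     ", b_levels[2:])
    print("pi estimates, transformed:", b_trans[2:])
```

The income coefficients coincide in the two fits, while the intercept and the coefficient on lagged consumption in the transformed fit equal the corresponding levels coefficients plus `k` times the sum of the income coefficients and plus the sum of the income coefficients, respectively.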
Having established the intuition for the results derived by Sims et al.,
inter alia, it is necessary to proceed to a formalization of the model.
This sub-section of the chapter, while relying heavily on Sims, Stock,
and Watson (1990) (henceforth SSW), does not present the arguments
in all their possible generality. Reference should be made to SSW for a
complete description. Their notation is retained for convenience.

⁸ Possibly equal to zero.

6.2.4. General Formulation


Most of the examples usually discussed in this literature may be
expressed as special cases of the following linear time-series model:

$$Y_t = A Y_{t-1} + G\Omega^{1/2}\eta_t, \tag{28}$$

where $Y_t$ is a $k$-dimensional vector and $A$ is a $k \times k$ matrix of
coefficients. The $N \times 1$ vector of disturbances $\{\eta_t\}$ is a martingale
difference sequence with $E[\eta_t \mid \eta_1, \ldots, \eta_{t-1}] = 0$ and $E[\eta_t\eta_t' \mid \eta_1, \ldots,
\eta_{t-1}] = I_N$ for $t = 1, \ldots, T$.⁹ The $N \times N$ matrix $\Omega^{1/2}$ is the square root
of the covariance matrix $\Omega$ of the errors ($\Omega^{1/2}\eta_t$). The matrix $G$ is a
selection matrix for the errors. It is of size $k \times N$, is assumed to be
known a priori, and determines which errors enter which equations. It is
also assumed that $A$ has $k_1$ eigenvalues with absolute value less than 1,
and that the remaining $k - k_1$ eigenvalues are exactly equal to unity.
In general, the components of $Y_t$ are random terms of various orders
of integration, constants, and polynomials in time. Linear combinations
of elements of $Y_t$, with orders of integration lower than those of its
component elements, may also be included. As long as the system
possesses such generalized co-integrating vectors,¹⁰ SSW show that
$T^{-p}\sum_{t=1}^{T} Y_tY_t'$ converges to a singular limit, for a suitably chosen $p$.
Thus, the analysis must be undertaken with a transformed set of
variables $Z_t$:

$$Z_t = DY_t. \tag{29}$$

The variable $Z_t$ has several important properties. First, the non-singular
matrix $D$ is chosen in such a way that $Z_t$ is decomposable into its
non-stochastic and stochastic components. Second, the moment matrix
$\sum_{t=1}^{T} Z_tZ_t'$ must be invertible almost surely.
If there are no stochastic trend components in the decomposition of
$Z_t$ into its stochastic and non-stochastic components, or at least no
dominating stochastic trend components, then asymptotic normality of
the regression coefficients holds, both in the transformed and in the
untransformed regressions. In this case, we are able to transform the
original regressors and express them in terms of variables that do not
contain stochastic trends. Normality is then a natural consequence of
this transformation for the same reasons as in standard econometrics,
where the matrix of sample second moments tends in probability to a
non-random positive definite matrix and the usual central-limit theorems
apply.

⁹ $E[\eta_t \mid \eta_1, \ldots, \eta_{t-1}] = 0$ is the property that defines a martingale difference sequence;
see Ch. 1 (or Hall and Heyde 1980). This martingale difference sequence assumption is not
important for the derivation of the results. All convergence theorems in SSW can be
proved when the $\eta_t$ are mixingales (Hall and Heyde 1980) and follow a process such as the
one given in, for example, Phillips (1987a).
¹⁰ This is SSW's terminology. They refer to such vectors as generalized co-integrating
vectors to allow the possibility that not all of the component elements of the linear
combination have the same order of integration.
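The details of the derivation of the matrix $D$ and its existence are
contained in SSW. We will proceed by recording the final form of the
transformation. Letting $\xi_{1,t} = \sum_{s=1}^{t}\eta_s$, and defining $\xi_{j,t}$ (the $j$-fold
summation of the $\eta_t$) recursively as $\xi_{j,t} = \sum_{s=1}^{t}\xi_{j-1,s}$, $1 < j \leq g$, the
transformation $D$ is chosen such that

$$\begin{pmatrix} Z_{1,t} \\ Z_{2,t} \\ Z_{3,t} \\ \vdots \\ Z_{2g+1,t} \end{pmatrix} = \begin{pmatrix} F_{11}(L) & 0 & 0 & \cdots & 0 \\ F_{21}(L) & F_{22} & 0 & \cdots & 0 \\ F_{31}(L) & F_{32} & F_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ F_{2g+1,1}(L) & F_{2g+1,2} & F_{2g+1,3} & \cdots & F_{2g+1,2g+1} \end{pmatrix} \begin{pmatrix} \eta_t \\ 1 \\ \xi_{1,t} \\ \vdots \\ \xi_{g,t} \end{pmatrix} \tag{30}$$

or, equivalently

$$Z_t = F(L)\nu_t,$$

where

$$\nu_t = \left(\eta_t',\ 1,\ \xi_{1,t}',\ t,\ \xi_{2,t}',\ \ldots,\ t^{g-1},\ \xi_{g,t}'\right)'$$

and $L$ is the lag operator. The variates $\nu_t$ are referred to as the
canonical regressors associated with $Y_t$. The lag polynomial $F_{11}(L)$ has
dimension $k_1 \times N$, and $\sum_{j=0}^{\infty} F_{11j}F_{11j}'$ is non-singular. $F_{jj}$ is assumed to
have full row rank $k_j$ (may be equal to zero) for $j = 2, \ldots, 2g + 1$, so
that $\sum_{j=1}^{2g+1} k_j = k$.
Since we may be interested in estimating only some of the $k$
equations in (28), we next need to define a selection matrix $C$. If we
needed to consider only $n \leq k$, we could look at the regression of $CY_t$
on $Y_{t-1}$, where $C$ is an $n \times k$ matrix of constants. The $n$ regression
equations to be estimated are then

$$CY_t = CAY_{t-1} + CG\Omega^{1/2}\eta_t. \tag{32}$$

The asymptotic analysis in SSW is derived in stacked single-equation
form. In order to use this form, we need the symbol $\otimes$ which denotes a
Kronecker product defined as follows: consider the $m \times n$ matrix
$A = \{a_{ij}\}$ and the $p \times q$ matrix $B$; the Kronecker product of $A$ and $B$
(in that order) is the $mp \times nq$ matrix,

$$A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{pmatrix}.$$

Vec(·) denotes the column-wise vectoring operator. Thus, writing the
matrix $A$ as $A = (a_1, a_2, \ldots, a_n)$, where each of the $a_i$ is an $m \times 1$
vector, vec(A) is given by

$$\operatorname{vec}(A) = \left(a_1',\ a_2',\ \ldots,\ a_n'\right)'.$$

Letting $S = [CY_2, CY_3, \ldots, CY_T]'$, $\eta = [\eta_2, \eta_3, \ldots, \eta_T]'$,
$X = [Y_1, Y_2, \ldots, Y_{T-1}]'$, $s = \operatorname{vec}(S)$, $v = \operatorname{vec}(\eta)$, and $\beta = \operatorname{vec}((CA)')$,
then (32) can be written in stacked form as

$$s = (I_n \otimes X)\beta + (\Sigma^{1/2} \otimes I_{T-1})v, \tag{33}$$

where $\Sigma^{1/2} = CG\Omega^{1/2}$.
In order to express (33) in terms of the transformed regressors $Z =
[Z_1, Z_2, \ldots, Z_{T-1}]' = XD'$, note that the coefficient vector
corresponding to these is given by $\delta = (I_n \otimes D'^{-1})\beta$.¹¹ Thus, finally,

$$s = (I_n \otimes Z)\delta + (\Sigma^{1/2} \otimes I_{T-1})v. \tag{34}$$

¹¹ To show this, substitute for $Z = XD'$ and $\delta = (I_n \otimes D'^{-1})\beta$ in (34) giving
$s = (I_n \otimes XD')(I_n \otimes D'^{-1})\beta + (\Sigma^{1/2} \otimes I_{T-1})v$. Now $(A_1 \otimes A_2)(A_3 \otimes A_4) = (A_1A_3) \otimes (A_2A_4)$,
for arbitrary matrices $A_i$, $i = 1, 2, 3, 4$, provided the matrices are conformable. Using this
rule (33) is recovered as required.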
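The Kronecker-product manipulations used in the stacked form can be verified numerically. The sketch below is illustrative only: the matrix sizes, the random matrices, and the function names `vec` and `check_stacking` are assumptions introduced for the example. It checks the mixed-product rule, the identity $\operatorname{vec}(XB) = (I_n \otimes X)\operatorname{vec}(B)$, and the equality of the stacked regressions in the original and transformed regressors.

```python
import numpy as np


def vec(M):
    """Column-wise vectoring operator: stack the columns of M."""
    return M.reshape(-1, order="F")


def check_stacking(seed=0, rows=6, k=3, n=2):
    """Check three identities behind the stacked regression form.

    rows plays the role of T - 1, k is the number of regressors and
    n the number of equations; all values are hypothetical.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((rows, k))
    D = rng.standard_normal((k, k))   # non-singular with probability one
    B = rng.standard_normal((k, n))   # one coefficient column per equation
    beta = vec(B)
    I_n = np.eye(n)

    # (A1 kron A2)(A3 kron A4) = (A1 A3) kron (A2 A4)
    A1, A3 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    mixed = np.allclose(np.kron(A1, X) @ np.kron(A3, D),
                        np.kron(A1 @ A3, X @ D))

    # vec(X B) = (I_n kron X) vec(B)
    stacked = np.allclose(vec(X @ B), np.kron(I_n, X) @ beta)

    # (I_n kron Z) delta = (I_n kron X) beta with Z = X D'
    Z = X @ D.T
    delta = np.kron(I_n, np.linalg.inv(D.T)) @ beta
    transformed = np.allclose(np.kron(I_n, Z) @ delta,
                              np.kron(I_n, X) @ beta)
    return mixed, stacked, transformed


if __name__ == "__main__":
    print(check_stacking())
```

`np.kron` follows the same block definition as the one given in the text, so the three checks mirror the algebra used to move between the two stacked forms.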

The OLS estimator $\hat\delta$ of $\delta$ in the stacked transformed regression model
(34) is given by

$$\hat\delta = \left[I_n \otimes (Z'Z)^{-1}Z'\right]s. \tag{35}$$

It is possible to see from (30) that the moments involving the different
components of $Z_t$ converge at different rates. For example, $Z_{1,t}$ and $Z_{2,t}$
are $O_p(1)$ while $Z_{3,t}$ is $O_p(t^{1/2})$, $Z_{4,t}$ is $O_p(t)$, and so on. Hence the
sample second moments, which is what we would be interested in when
looking at the matrix $Z'Z$, converge at a rate of $T$ for the $Z_{1,t}$ and $Z_{2,t}$
components, at a rate $T^2$ for the $Z_{3,t}$ component, and at a rate $T^3$ for
the $Z_{4,t}$ component. In order to handle these different orders, SSW use
the scaling matrix $\Upsilon_T$, given by

$$\Upsilon_T = \operatorname{diag}\left(T^{1/2}I_{k_1},\ T^{1/2}I_{k_2},\ T\,I_{k_3},\ T^{3/2}I_{k_4},\ \ldots,\ T^{g-1/2}I_{k_{2g}},\ T^{g}I_{k_{2g+1}}\right). \tag{36}$$

All the convergence results use the scaled $Z'Z$ matrix $\Upsilon_T^{-1}Z'Z\Upsilon_T^{-1}$; let us
call this scaled matrix $Q$.
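The different normalizations needed for the sample second moments can be seen in a short simulation. This is a minimal sketch under assumed conditions: scalar standard-normal innovations, a single random walk, a linear trend, and the function name `scaled_moments` are all choices made for the illustration.

```python
import numpy as np


def scaled_moments(T, reps=2000, seed=0):
    """Average scaled second moments of three kinds of regressor.

    Returns (T^{-1} sum eta_t^2, T^{-2} sum xi_t^2, T^{-3} sum t^2),
    where eta_t is white noise and xi_t is its cumulative sum.
    The first two are averaged over replications.
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((reps, T))
    xi = np.cumsum(eta, axis=1)
    t = np.arange(1, T + 1, dtype=float)
    m_stationary = (eta ** 2).sum(axis=1).mean() / T
    m_random_walk = (xi ** 2).sum(axis=1).mean() / T ** 2
    m_trend = (t ** 2).sum() / T ** 3
    return m_stationary, m_random_walk, m_trend


if __name__ == "__main__":
    for T in (100, 400, 1600):
        print(T, scaled_moments(T))
```

Each scaled moment stays bounded as T grows only when divided by the power of T indicated in the text; for the random walk the individual replications remain random in the limit, and only their average is reported here.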
The first step in the proof is to derive the limiting matrix for $Q$. SSW
show that, under certain regularity conditions, $Q \Rightarrow V$ where the
elements of $V$ may be described as follows:
(a) $V_{11}$ and $V_{12}$ are non-random matrices given by $\sum_{j=0}^{\infty} F_{11j}F_{11j}'$ and
$\sum_{j=0}^{\infty} F_{11j}F_{21j}'$ respectively. Additionally, $V_{12} = V_{21}'$.
(b) $V_{1p} = V_{p1}' = 0$, $p = 3, \ldots, 2g + 1$.
(c) $V_{22}$ is also non-random, given by $F_{22}F_{22}' + \sum_{j=0}^{\infty} F_{21j}F_{21j}'$.
(d) $V_{mp}$, where $m, p = 3, 5, 7, \ldots, 2g + 1$, are random matrices
involving functionals of multivariate Wiener processes.
(e) $V_{mp}$, where $m = 2, 4, 6, \ldots, 2g$, $p = 3, 5, 7, \ldots, 2g + 1$, are also
random matrices involving functionals of multivariate Wiener
processes.
(f) $V_{mp} = [2/(p + m - 2)]F_{mm}F_{pp}'$, $p = 4, 6, \ldots, 2g$, $m = 2, 4, 6, \ldots,
2g$.
This is the first time we have used multivariate Wiener processes. The
mathematical details involved in going from univariate to multivariate
Wiener processes are complex and will not be dealt with here (for a
good account, see Phillips and Durlauf 1986). However, the
generalizations from our analysis in Chapter 3 can be understood intuitively fairly
easily and the appendix sketches the bivariate case.
Thus, each element of a standardized $n \times 1$ multivariate Wiener
process $W(r)$ is a univariate Wiener process and the elements of $W(r)$
are independent. In particular, $W(1)$ has the multivariate standard normal
distribution, that is, $N(0, I_n)$. Further, $W(r) \in C[0,1]^n$, where $C[0,1]$ is the
space of continuous functions defined on [0,1].
Convergence results analogous to (3.17), for a sequence of mean zero
random vectors $\{u_t\}$, can be proved by defining standardized sums such
as

$$R_T(r) = T^{-1/2}\,\Omega^{-1/2}S_{t-1}$$

with $(t - 1)/T \leq r < t/T$ and $t = 1, 2, \ldots, T$, so that $r \in [0,1]$. $S_t$ is the
vector of the cumulative sum of the $n \times 1$ random vectors $u_i$, i.e.
$S_t = \sum_{i=1}^{t} u_i$, and the matrix $\Omega$ is the long-run variance-covariance
matrix of $u_t$ defined by $\Omega = \lim_{T\to\infty} E(T^{-1}S_TS_T')$ analogously with
(3.16c). The $\{u_t\}$ innovation sequence satisfies conditions equivalent to
those given by (3.16a)-(3.16d) for the univariate case. Provided suitable
regularity conditions are satisfied, the following multivariate analogue of
(3.18) may be proved:
$$R_T(r) \Rightarrow W(r).$$
Finally, multivariate analogues of all the convergence results given
earlier for univariate processes may be derived. Thus, for example,
referring to Table 3.3, where $y_t = y_{t-1} + u_t$:

To derive the results above we have assumed, as in Table 3.3, that $\{u_t\}$
is a white-noise innovation sequence with $I_n$ as the variance matrix.
The next step of the argument involves rewriting the estimator $\hat\delta$ in a
form such that its distribution can be derived. This is done by first
defining a non-singular matrix $H$ which, in essence, transposes the
stacked version of the matrix $Z$. Thus,
(37)

From (35) ,

by substituting for $s$ from (34). Next, using the result that

Thus,

(38)
As noted above, the matrix $V$ is the limiting matrix of $Q$.
The asymptotic distribution of

is needed to give us the final result. This limiting vector, denoted by

takes the following form:

where
(a) $\phi_m$ for all $m \geq 3$ are functionals of multivariate Wiener processes;
(b) $\phi_2 = \phi_{21} + \phi_{22}$, where $\phi_{22} = \operatorname{vec}[F_{22}W(1)'\Sigma^{1/2\prime}]$, $W(1)$ has the
multivariate standard normal distribution, and

Finally,

where $(\phi_1, \phi_{21})$ are independent of $(\phi_{22}, \phi_3, \ldots, \phi_{2g+1})$. Consolidating
these steps, we have the following theorem.

This provides us with several interesting results. First, $\hat\delta$, and hence $\hat\beta$,
is a consistent estimator of $\delta$, respectively $\beta$, in the presence of
arbitrarily many unit roots and deterministic time trends. This
observation relies on the assumption that the model is correctly specified, in the
sense that the errors are martingale difference sequences, and the $\Upsilon_T$
may rescale by powers of $T$ greater than 1/2.
We have already noted that the estimated coefficients on the elements
of $Z_t$ converge to their probability limits at different rates. Hence, if
some of the transformed regressors are dominated, in an order of
probability sense, by stochastic components, their limiting distributions
will be non-normal. On the other hand, if there are no $Z_t$ regressors
dominated by stochastic trends (that is, if $k_3 = k_5 = \ldots = k_{2g+1} = 0$),
then $\hat\delta$, and hence $\hat\beta$, has an asymptotic normal joint distribution. This
happens because the terms involving the random integrals are no longer
present, as may be seen from (30), where $k_3, k_5, \ldots, k_{2g+1}$ are the
ranks of matrices multiplying the stochastic canonical regressors. If these
matrices are absent, the transformed regression is considerably
simplified as it is expressible solely in terms of stationary variables
and deterministic trend terms. In such a case, therefore,
$H(I_n \otimes \Upsilon_T)(\hat\delta - \delta) \Rightarrow N(0, H(\Sigma \otimes V^{-1})H')$ where $V$ is now a
non-random matrix. Additionally, the F-statistic associated with testing an
arbitrary set of $q$ linear restrictions $R\beta = r$ is asymptotically distributed
as $\chi^2_q/q$ in this case.
If a single stochastic trend is dominated by a non-stochastic trend, then, again, asymptotic normality holds. This is the result of West (1988) and may be seen using (30) and keeping track of the rates of convergence of the sample moments of the separate components of $Z_t$. Consider, for example, the set of canonical regressors given by $(\eta_t, 1, \xi_{1t}, t)'$ and suppose the transformed regression is expressible in terms of these canonical regressors. Thus, while the sample variability of the stochastic trend term is $O_p(T)$, that of the deterministic trend is $O(T^{3/2})$. As shown by West (1988), and discussed in Section 6.2.1, in deriving the asymptotic distribution for this case, the deterministic trend component dominates the stochastic component and asymptotic normality follows.
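The orders of magnitude quoted above can be illustrated by simulation. The following is a minimal sketch, not part of the original argument: it assumes a driftless Gaussian random walk as the stochastic trend, measures "sample variability" by the root sum of squares, and uses hypothetical sample sizes and replication counts. If the stochastic trend is $O_p(T)$ and the deterministic trend $O(T^{3/2})$, the ratio of the two should shrink roughly like $T^{-1/2}$.

```python
import math
import random

def variability_ratio(T, reps, rng):
    """Average of sqrt(sum xi_t^2) / sqrt(sum t^2) over `reps` simulated random walks."""
    trend_norm = math.sqrt(sum(t * t for t in range(1, T + 1)))
    total = 0.0
    for _ in range(reps):
        xi, ss = 0.0, 0.0
        for _ in range(T):
            xi += rng.gauss(0.0, 1.0)   # driftless random walk (stochastic trend)
            ss += xi * xi
        total += math.sqrt(ss) / trend_norm
    return total / reps

rng = random.Random(0)                  # fixed seed, purely for reproducibility
ratio_small = variability_ratio(100, 200, rng)
ratio_large = variability_ratio(1600, 200, rng)
print(ratio_small, ratio_large)
```

With a sixteen-fold increase in $T$ the ratio should fall by a factor of about four, which is the sense in which the deterministic trend dominates.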
The Stock-West (1988) example, discussed earlier, works because we are able to rewrite the regression in terms of canonical regressors which do not have any dominating stochastic component. The issue of domination, in this context, is best addressed by looking at the scaling matrix.
Four more examples will now be given to illustrate these arguments, using the framework developed above. The final example in this set of four contains recommendations for modelling with integrated series.
6.2.5. Example (Sims et al. 1990: 119)
Let the process $\{x_t\}$ be generated according to the following AR(2) process without drift:

$$x_t = \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \eta_t \qquad (39)$$

Under $H_0$, $\beta_0 = 0$, $\beta_1 + \beta_2 = 1$ and $|\beta_2| < 1$ so that the autoregressive polynomial in (39) has only one unit root. If a constant is included in the regression of $x_t$ on its two lags, $Y_t$ (in the notation developed earlier) is given by

Transforming to the canonical regressor form,12 we have

(40)
where $\delta_1 = -\beta_2$, $\delta_2 = \beta_0$, and $\delta_3 = \beta_1 + \beta_2$, $Z_{1,t} = \Delta x_t$, $Z_{2,t} = 1$, and $Z_{3,t} = x_t$. It may also be shown that

(41)
where $\theta(L) = (1 + \beta_2 L)^{-1}$ and $\theta^*(L) = (1 - L)^{-1}[\theta(L) - \theta(1)]$. Note from (41) that $F_{21}(L) = 0$. This implies, by referring to the
description of the $V$ matrix above, that $V$ is block-diagonal. The estimate $\hat\delta_1$ of the coefficient on the (differenced) stationary term has an asymptotically normal distribution with mean 0 and variance given by $V_{11}^{-1}$. The marginal distribution of $\hat\delta_2$, however, is not normal; because $F_{23}$ is not equal to zero, $Z_{2,t}$ and $Z_{3,t}$ are asymptotically correlated, and since $Z_{3,t}$ has a Wiener distribution, so does the coefficient on $Z_{2,t}$.
If an intercept is not included in the regression, we have a $2 \times 2$ block-diagonal $V$ matrix. The estimated coefficient $\hat\delta_1$ still has an asymptotically normal distribution, with $\hat\delta_1$ converging to its probability limit at rate $T^{1/2}$, while $\hat\delta_3$ has a Wiener distribution with convergence at rate $T$. Any joint test involving $\hat\delta_1$ and $\hat\delta_3$ will also have a non-standard distribution.
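The algebra behind the canonical rewriting can be checked in a few lines. This is a sketch under the assumption (taken from the rewritten form quoted in footnote 12) that the AR(2) model is $x_t = \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \eta_t$; the lag placement of the regressors in the last line is our own notation and may differ from that used in (40).

```latex
\begin{aligned}
x_t &= \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \eta_t \\
    &= \beta_0 + (\beta_1 + \beta_2)\,x_{t-1} - \beta_2\,(x_{t-1} - x_{t-2}) + \eta_t \\
    &= \delta_1\,\Delta x_{t-1} + \delta_2 \cdot 1 + \delta_3\,x_{t-1} + \eta_t,
\qquad \delta_1 = -\beta_2,\quad \delta_2 = \beta_0,\quad \delta_3 = \beta_1 + \beta_2 .
\end{aligned}
```

Under the null, $\beta_1 + \beta_2 = 1$ gives $1 - \beta_1 L - \beta_2 L^2 = (1 - L)(1 + \beta_2 L)$, so that $\Delta x_t = (1 + \beta_2 L)^{-1}\eta_t$ is stationary when $|\beta_2| < 1$. This is why $\delta_1$ multiplies a non-integrated regressor while $\delta_3$ multiplies an integrated one.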
The analogy with the Stock-West example is direct. In (27) we had a series of terms integrated of order zero. The coefficient estimates on all these stationary terms were jointly and individually asymptotically normally distributed. The joint distribution of the estimate of $\beta$ in (27) with any of these coefficient estimates was of course non-standard. This observation applies equally well here. There is, however, an important difference between the Stock-West example and the current example. In the former case, because $\beta$ had already been set equal to 1, our parameters of interest could all be written as coefficients on mean-zero and non-integrated variables. Inference could then be conducted using standard tables. In the latter case, although we can use standard tables to test for the significance of $\beta_2$, a test of $\beta_1 + \beta_2 = 1$ still requires us to use non-standard distribution theory (and so tables constructed by simulation). In a sense, our rewriting in terms of stationary variables is not sufficiently successful to enable us to conduct inference solely using standard tables. Example 6.2.6 examines this issue in more detail.
12 This transformation is not unique, and one could imagine choosing others; however, (39) can be rewritten as $x_t = (\beta_1 + \beta_2)x_{t-1} - \beta_2(x_{t-1} - x_{t-2}) + \eta_t$, because $\beta_0 = 0$ under the null, and this suggests the decomposition given by (40). It has the advantage of making $\delta_1$ ($= -\beta_2$) the coefficient of a non-integrated random variable, since $x_t$ is an integrated series.

6.2.6. Example (Sims et al. 1990: 128)


Suppose now that $x_t$ is generated as in Section 6.2.5 but $\beta_0$ is non-zero under the null. The canonical representation13 yields

(42)

(43)

where $\theta(L)$ and $\theta^*(L)$ are defined as in Section 6.2.5 above.
Here, unlike the example in Section 6.2.5, there are no elements of $Z_t$ dominated by a stochastic integrated process. The stochastic-trend term is dominated, in sample variability, by the deterministic-trend term $t$. A detailed discussion of this case appears in West (1988).

6.2.7. Example (Banerjee and Dolado 1988)


This example is a consolidation of most of the principal points discussed in the pages above. It is a variation of the Stock-West example, and all statements concerning the distributions of various parameter estimates may be derived from earlier general principles.
13 This decomposition again has the advantage of making $\delta_1$ the coefficient of a non-integrated variable. The motivation for choosing this transformation is therefore similar to that given for the example in Sect. 6.2.5.

Consider the following regression:

where $y_t$ denotes the logarithm of disposable income and $c_t$ the logarithm of consumption, and both variables are I(1) in levels. Here, although we have non-stationary variables as regressors, if they are co-integrated with each other, as they must be if any of the permanent-income/life-cycle models of consumption are to make sense, then this co-integration property makes both sides of the regression equation I(0) and the t-tests of the coefficients of all the regressors are asymptotically normal. The long-run multiplier between consumption and income can be deduced much as in any dynamic model.
A variant of (44) is the model

Although the individual t-ratios are asymptotically normally distributed, the distribution of the Wald statistic, used for testing the joint null hypothesis $\beta = \delta = 0$, is a functional of a Wiener process and its distribution is non-standard. More interestingly, if (45) were re-parameterized as

where $s_{t-1} = y_{t-1} - c_{t-1}$, $\gamma_1 = \beta + \delta$, $\gamma_2 = \beta$, and $s_{t-1}$ may be shown to be I(0) under the assumptions of the permanent-income hypothesis, then $t(\gamma_1 = 0)$ would be a functional of a Wiener process whereas $t(\gamma_2 = 0)$ would have an asymptotically normal distribution.
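The mapping between the two parameterizations is purely algebraic, since $\beta y_{t-1} + \delta c_{t-1} = (\beta + \delta)c_{t-1} + \beta(y_{t-1} - c_{t-1})$, and can be confirmed numerically. The sketch below is illustrative only: it assumes a stripped-down version of the regression (a constant, $y_{t-1}$ and $c_{t-1}$, with no lagged differences), a hypothetical co-integrated data-generation process, and a small hand-written `ols` helper that is not taken from any library.

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (Gauss-Jordan elimination)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

rng = random.Random(1)
T = 400
y_lvl, c_lvl = [0.0], [0.0]
for _ in range(T):
    y_new = y_lvl[-1] + rng.gauss(0, 1)        # income: random walk (assumed)
    c_lvl.append(y_new + rng.gauss(0, 0.5))    # consumption: co-integrated with income
    y_lvl.append(y_new)

dc = [c_lvl[t] - c_lvl[t - 1] for t in range(1, T + 1)]
# Regressors in the style of (45): constant, y_{t-1}, c_{t-1}
X45 = [[1.0, y_lvl[t - 1], c_lvl[t - 1]] for t in range(1, T + 1)]
# Regressors in the style of (46): constant, c_{t-1}, s_{t-1} = y_{t-1} - c_{t-1}
X46 = [[1.0, c_lvl[t - 1], y_lvl[t - 1] - c_lvl[t - 1]] for t in range(1, T + 1)]

_, beta, delta = ols(X45, dc)
_, gamma1, gamma2 = ols(X46, dc)
print(beta, delta, gamma1, gamma2)
```

The fitted values are identical in the two forms; only the interpretation of the coefficients changes, with $\gamma_1$ attached to the integrated regressor $c_{t-1}$ and $\gamma_2$ to the stationary combination $s_{t-1}$.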
In the general model given by (44), the following results may be proved, using theorems 1 and 2 in SSW (1990):
(a) The t-statistic of each coefficient individually is asymptotically normally distributed.
(b) The F-statistics of joint significance of any proper subset of the set of stationary regressors have standard asymptotic distributions. Thus, any test of the joint significance of $\Delta y_{t-j}$ ($j = 1, \ldots, n - 1$) and $\Delta c_{t-j}$ ($j = 1, \ldots, m - 1$) will have the correct size if standard tables are used. Further, given that the non-stationary variables are co-integrated, if the regressors in the non-stationary set were combined, say, to give $p$ stationary regressors and $q$ non-stationary regressors,14 an F-statistic that uses any of the derived $p$ stationary regressors in combination with any of the original stationary regressors will also have a standard distribution asymptotically.
(c) The F-statistics of joint significance of any subset of the set of non-stationary regressors have non-standard distributions. Moreover, an F-statistic that uses any stationary regressors in combination with any non-stationary regressors will have a non-standard distribution.
14 In (46), for example, $p = q = 1$ and the original number of non-stationary regressors (excluding the trend) is 2.
Point (a) is obtained from the property of the non-stationary regressors forming a co-integrated set; as in Section 6.2.3 above, both $\delta$ and $\beta$ can be written as coefficients on mean-zero stationary variables (with (46) giving one such re-parameterization for $\beta$). The next example reconsiders this point in the context of modelling practice. Point (b) is not surprising because the F-statistics considered use only stationary regressors. The fact that some of these stationary regressors may be re-parameterizations of some or all of the original non-stationary regressors is an interesting feature.
Point (c) is surprising in two respects. Consider (44) and (46); the first surprising feature is the non-standard behaviour of the F-statistic and the second is that, while the t-ratio of the coefficient of $c_{t-1}$ has a standard distribution under parameterization (45), under the linear re-parameterization given by (46) the t-ratio has a Wiener distribution. Both results follow from the asymptotic singularity of a particular variance-covariance matrix.15
Consider $\hat\gamma_1$ in (46), which tends to a non-degenerate distribution at rate $T$; $T^{1/2}\hat\gamma_1$ therefore has a degenerate distribution, and $T^{1/2}\hat\gamma_2$ is asymptotically normally distributed. Thus,

and so

This accounts for the asymptotic singularity of the variance-covariance matrix of $[\hat\delta, \hat\beta]'$ and the corresponding non-standard behaviour of the F-statistic in (45). However, the distribution of $T\hat\gamma_1$ may be shown to be non-degenerate. $\hat\gamma_1$ can be written as a functional of Wiener processes, and the scaling factor (of $T$) suggests the resulting non-standard distribution.
15 The asymptotic singularity of the variance-covariance matrix is the problem of multi-collinearity in another guise. On this, also see SSW (1990).

It is instructive to note that the regression given by (44) would not be sensible unless the right-hand variables or regressors were co-integrated. A special example of (44) was discussed in Section 6.1, where we spoke of an unbalanced regression. This is a much more general point than that made in the context of spurious regression. A regression involving a right-hand set of variables integrated of an order different from the order of integration of the left-hand side is just as problematic as a regression between two unrelated non-stationary series. In each case, the distributions of the statistics are non-standard.
6.2.8. Example (Stock and Watson 1988a)
Stock and Watson (1988a) provide an example of the dangers involved in not properly taking account of the orders of integration of the regressors and the regressand. They set up a simple data-generation process based on the permanent-income hypothesis:

where
$y_t^p$ = the permanent component of disposable income, which is assumed to follow a random walk
$c_t$ = consumption
$y_t^s$ = the transitory component of disposable income, which is a stationary innovation process
$p_t$ = price level in period $t$.
The innovation processes $u_t$ and $v_t$ are uncorrelated.
Stock and Watson relate the tale of two econometricians trying to test versions of Friedman's permanent income hypothesis. The misguided econometrician, unaware of or choosing to ignore the orders of integration of the series, estimates the following regressions:
$c_t = \alpha_1 + \beta_1 p_t$ (to check money illusion)
$c_t = \alpha_2 + \beta_2 t$ (to check whether consumption has a trend)
$\Delta c_t = \alpha_3 + \beta_3 \Delta y_t$ (to calculate the marginal propensity to consume)
$\Delta c_t = \alpha_4 + \beta_4 y_{t-1}$ (to test the permanent income hypothesis).
Each of the inferences from these regressions is invalid.

The first regression is a spurious regression of the classical Granger-Newbold kind; $c_t$ and $p_t$ are unrelated random walks, and the econometrician's finding of a large t-statistic for $\beta_1$, thereby leading him to conclude in favour of money illusion,16 is a spurious one.
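A small Monte Carlo experiment makes the spurious-regression problem concrete. The sketch below is an illustration under our own assumptions (two independent Gaussian random walks of length 100, 500 replications, a nominal 5 per cent two-sided test using the critical value 1.96); it is not a reproduction of any experiment reported in the literature cited here.

```python
import math
import random

def slope_t_stat(x, y):
    """t-ratio of the slope in an OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx
    a0 = my - b * mx
    rss = sum((yi - a0 - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b / se

def random_walk(T, rng):
    """Driftless random walk with standard normal increments."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

rng = random.Random(42)
T, reps = 100, 500
rejections = sum(
    abs(slope_t_stat(random_walk(T, rng), random_walk(T, rng))) > 1.96
    for _ in range(reps)
)
rejection_rate = rejections / reps   # would be close to 0.05 if the t-test were valid
print(rejection_rate)
```

Although the two series are unrelated by construction, the conventional t-test rejects a zero slope far more often than the nominal size would suggest.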
The second regression is also spurious since it attempts to explain a random walk (or, in other words, a stochastically trending variable) by a deterministic trend. Nelson and Kang (1981) pointed out the dangers of running regressions which attempt to de-trend stochastically trending data in the vain hope of achieving stationarity around a trend. In both cases the problems with the inferences arise because the regressions involve variables that are not co-integrated (see Chapter 3).
The third equation appears to be correctly specified but nevertheless leads to downwardly biased estimates of the coefficient for the marginal propensity to consume because disposable income measures the change in permanent income with error, since it includes the change in transitory income as well. The final regression is what we called an 'unbalanced regression' as it tries to explain a variable integrated of order zero by a variable integrated of order 1. The series of papers noted above (Mankiw and Shapiro 1985, 1986; Banerjee and Dolado 1988; Galbraith et al. 1987) consider the extent to which the t-statistics in such cases are biased away from zero, leading to misleading inferences about the significance of coefficients.
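The downward bias in the third regression is an errors-in-variables effect and can be sketched numerically. The data-generation process below is our own stylized assumption, not necessarily the one used by Stock and Watson: consumption equals permanent income (so the true response to a change in permanent income is 1), permanent income is a random walk with innovation variance $\sigma_u^2$, and transitory income is white noise with variance $\sigma_s^2$. Under these assumptions the OLS slope should settle near $\sigma_u^2/(\sigma_u^2 + 2\sigma_s^2)$ rather than 1.

```python
import random

rng = random.Random(7)
T = 20000
sigma_u, sigma_s = 1.0, 1.0           # hypothetical standard deviations

dc, dy = [], []
ys_prev = rng.gauss(0.0, sigma_s)
for _ in range(T):
    u = rng.gauss(0.0, sigma_u)       # innovation to permanent income
    ys = rng.gauss(0.0, sigma_s)      # transitory income
    dc.append(u)                      # consumption tracks permanent income
    dy.append(u + ys - ys_prev)       # measured change in disposable income
    ys_prev = ys

m_dc, m_dy = sum(dc) / T, sum(dy) / T
slope = (sum((a - m_dy) * (b - m_dc) for a, b in zip(dy, dc))
         / sum((a - m_dy) ** 2 for a in dy))
plim = sigma_u ** 2 / (sigma_u ** 2 + 2 * sigma_s ** 2)
print(slope, plim)
```

With equal variances the estimated slope should be close to one third, well below the value of 1 built into the assumed process.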
Stock and Watson compare the predicament of this econometrician with econometrician B, say, who looks at the results of the following alternative regressions:

The inferences from each of these regressions will be, by and large, correct. The first regression here is the standard co-integrating regression and this time is valid. The estimate of the coefficient $\delta_1$ will have a Wiener distribution but will be super-consistent. The reported standard error will be incorrect owing to untreated autocorrelation.
The second regression can be re-parameterized17 as

Thus, $\delta_3$ can be written as a coefficient on a stationary variable (as can $\delta_2$ treated in isolation). The theory, as described above, implies that the usual $t$ and $F$ distributions18 will apply. A similar argument applies to the third regression, with the exception that in this case $y_{t-1} - c_{t-1}$ forms the co-integrating relation. Stock and West (1988) and Banerjee and Dolado (1988) discuss regressions of this form in further detail.
16 Inference of this kind would appear to be faulty, in any case. To consider a rejection of $H_0$: $\beta_1 = 0$ as a reason for accepting any specific alternative is statistically and logically unjustifiable.
17 Or in a form analogous to that given by (44).
The moral of the econometricians' story is the need to keep track of the orders of integration on both sides of the regression equation, which usually means incorporating dynamics; models that have restrictive dynamic structures are relatively likely to give misleading inferences simply for reasons of inconsistency of orders of integration. Specificity was clearly the problem with several of the models proposed by the first econometrician. A general-to-specific method of econometric modelling would have overcome many of the problems of spurious inferences and non-standard distributions. An initial model, more general than the one postulated by the second econometrician, of the form, say,

would be more appropriate for inference when weak exogeneity conditions are satisfied.19 Account must be taken of facts (a)-(c) of Section 6.2.7 when conducting such inference; more generally, the example illustrates ways in which the theory of modelling with integrated variables has contributed to improving our understanding of what constitutes good practice in dynamic modelling.

6.3. Functional Forms and Transformations


We drew attention in Chapter 1 to the fact that many economic time series will come close to conformity with the integrated models only if a logarithmic transformation is applied. The logarithms of many such series may be integrated, but it seems unlikely that the untransformed levels of macroeconomic time series such as consumption, national income, and the price level could be made stationary by differencing alone. It is worth examining this transformation more closely, along with the effect that it may be expected to have on an equilibrium relationship. If the levels of two series are co-integrated, do we expect the logarithms to be co-integrated also, and vice versa?
Begin by examining a series with a tendency to grow over time subject to stochastic shocks which tend to grow with the underlying series. For example,

where $\varepsilon_t$ has a mean of 1 and is log-normally distributed. A series such as $Y_t$ might describe a number of economic time series, at least in broad outline. Taking the logarithmic transformation of (51) and using lower-case letters to denote the transformed variables with $Y_t > 0$,

where $\log(1 + \gamma) \approx \gamma$ and $e_t = \log(\varepsilon_t)$.
18 The F-distribution will apply when looking at tests of joint significance of subsets of regressors, each of which is I(0). In this example, because one of the regressors is I(1) and the other is I(0), the F-statistic will have a non-standard distribution.
19 See Ch. 8 and earlier discussion in this chapter.


Equation (53) is indeed commonly used as a simple characterization of the logarithms of economic time series. As a description of such a transformed data series, (52) or (53) seems at least admissible; $\Delta y_t$ is the growth rate of the level series $Y_t$, and this growth rate varies around a (typically positive) mean. That this equation could describe the level of the series (so $y_t$ denotes the original data without the logarithmic transformation) seems implausible, however: (53) would then imply that the absolute amount of growth varies around a fixed mean, and therefore that, as the series grows, the average amount of growth falls to zero as a proportion of the series itself. Moreover, $\sigma^2/\mathrm{var}(Y_t)$ would tend to zero, forcing the series to become essentially deterministic in relative terms. This criticism does not apply to (53) interpreted in logarithms, since $\sigma$ is then a proportion of $Y_t$.
Ermini and Hendry (1991) consider the issue of testing 'logarithms versus levels' by formulating a test based on the encompassing principle. The null model $M_1$ may be said to encompass the rival or alternative model $M_2$ if $M_1$ is able to explain the findings of $M_2$. Alternatively, if the rival model does not adequately characterize the properties of the process generating the series, the null model ought to be able to predict the form of mis-specification one would expect to find if the rival model were estimated.
To pursue the last point, suppose a data series $\{Y_t\}$ is well characterized by a random walk in logarithms with a stable drift and homoskedastic errors. Suppose further that this implies that regressing $\Delta Y_t$ on a constant would yield unstable estimates and heteroskedastic errors. A simple initial test would then be to estimate the random walk in both logarithms and levels and see whether the models displayed the predicted behaviour.20 If the null model also had predictions to offer about
20 The processes corresponding to 'random walk in logarithms' and 'random walk in levels' are $\Delta y_t = \mu_1 + \varepsilon_t$ and $\Delta Y_t = \mu_2 + v_t$, respectively.

the form of the instability of the parameters, the test could be sharpened by testing for the presence of particular kinds of mis-specification: say, drift or variances of errors increasing exponentially over time. In general, the entire argument should also be run in reverse by taking the rival model as the null; however, linear models do not ensure positive observations, so awkward issues arise.
We illustrate this discussion with the time series analysed in Chapter 1, namely real net national product ($Y$, in £ million at 1929 prices) for the United Kingdom over 1872-1975 (from Friedman and Schwartz 1982). The approach follows that in Ermini and Hendry (1991).
First, we model the level of net national product over the sample 1875-1975 by OLS. Only one lagged difference was needed to remove any residual serial correlation, yielding

where the standard errors of coefficient estimates are shown in parentheses, $\hat\sigma$ is the equation standard error, and SC is the Schwarz criterion. (Smaller values on balance produce preferable models.) Since the mean of $Y$ is 4701.0, $\hat\sigma$ as a percentage of $Y$ is 3.1 per cent. However, the coefficients are not constant over the sample period, as shown in Fig. 6.1 for the intercept, and Fig. 6.2 for the one-step residuals and $\hat\sigma$. (See Hendry (1989) for details.)21 The intercept trends upwards, and $\hat\sigma$ increases over time, even ignoring the large shock in 1919-20. On any constancy test, the model is rejected at far beyond the 1 per cent level (e.g. that of Hansen 1992).
Next we model growth in logs. As before, one lagged difference removed residual serial correlation, giving

FIG 6.1. Recursive estimates of intercept in levels model

FIG 6.2. One-step residuals in levels model

21 Recursive estimation involves estimating an equation over successively larger sub-samples, starting from a minimum sub-sample and extending to the full sample. Parameter instability may be tracked by looking at the behaviour of the estimated coefficients, as sample size is increased, to see whether they fluctuate significantly or remain stable. Recursive Chow (1960) tests may be computed in at least two ways. The first involves estimating the equation from, say, $t = 1$ to $t = T_1$, where $T_1$ is greater than the minimum sample size, and then from $t = 1$ to $t = T_1 + 1$. The one-step-ahead Chow test is based on a comparison of the residual variance of the two estimated equations and is an F-test under the null of parameter constancy. A second test is given by estimating the equation from, say, $t = 1$ to $t = T_1$ and comparing the residual variance of this regression with that of the equation estimated over the full sample. A sequence of these Chow tests is built up by augmenting the sub-sample size by one at each step, e.g. $T_1 + 1$ to $T_1 + 2$, and comparing the residual variance of each of these equations with the full sample residual variance. Alternatively, the sequence of one-step residuals (or forecast errors) can be examined relative to the residual variance at each sample size.
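The recursive idea can be sketched for the simplest case, a regression on a constant only, where the recursive OLS estimate of the intercept is just the running sample mean. The example is illustrative and uses simulated data with hypothetical parameter values (a random walk with drift in logarithms, drift 0.03 and shock standard deviation 0.02), not the UK series analysed in the text; it omits the lagged difference and the Chow tests themselves.

```python
import math
import random

def recursive_means(series, min_n):
    """Recursive OLS on a constant: the intercept estimate for each sub-sample 1..n, n >= min_n."""
    out, running = [], 0.0
    for n, value in enumerate(series, start=1):
        running += value
        if n >= min_n:
            out.append(running / n)
    return out

rng = random.Random(3)
T, mu, sigma = 100, 0.03, 0.02          # hypothetical drift and shock size
y = [math.log(1000.0)]
for _ in range(T):
    y.append(y[-1] + mu + rng.gauss(0.0, sigma))   # random walk with drift in logs
Y = [math.exp(v) for v in y]                        # implied series in levels

d_log = [y[t] - y[t - 1] for t in range(1, T + 1)]
d_lvl = [Y[t] - Y[t - 1] for t in range(1, T + 1)]

log_path = recursive_means(d_log, min_n=10)   # recursive intercept, model in logs
lvl_path = recursive_means(d_lvl, min_n=10)   # recursive intercept, model in levels
print(log_path[0], log_path[-1])
print(lvl_path[0], lvl_path[-1])
```

When the data are generated in logarithms, the recursive intercept of the log model should stay near the drift, while the intercept of the levels model should trend upwards as the sample is extended.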

The percentage $\hat\sigma$ is 3.3 per cent but now the intercept is constant as shown in Fig. 6.3, and little residual heteroskedasticity remains (see Fig. 6.4). The model fails constancy tests only prior to the large shock in 1919-20.
Ermini and Hendry use results from Ermini and Granger (1991) to describe the particular form of instability and heteroskedasticity one would expect in the model in levels if the data were generated by the logarithmic model. Ermini and Granger show that, if the data are generated by

with time-invariant distribution $\Delta y_t \sim \mathrm{IN}(\mu, \sigma^2)$, and if the rival model is

then $E(\Delta Y_t) = \delta\exp(\lambda t)$; $\mathrm{var}(\Delta Y_t) = \theta\exp[(2\lambda + \sigma^2)t]$, where $\theta = \exp(2y_0)\{1 - 2\exp[-(\lambda + \sigma^2)] + \exp[-(2\lambda + \sigma^2)]\}$; and $Y_0 = \exp(y_0)$ is the starting observation postulated for the model in levels.
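These moment formulae can be checked by simulation. The sketch below rests on our own assumptions: $\lambda = \mu + \sigma^2/2$, a constant $\delta = Y_0[1 - \exp(-\lambda)]$ that we derive from log-normality (the text does not define $\delta$ at this point), and hypothetical parameter values. In our calculation the expression involving $\theta$ matches the uncentred second moment $E[(\Delta Y_t)^2]$, and that is the quantity compared here.

```python
import math
import random

mu, sigma, y0, t = 0.02, 0.03, 0.0, 20     # hypothetical parameter values
lam = mu + sigma ** 2 / 2

# Closed forms implied by log-normality (delta is our own derived constant).
delta = math.exp(y0) * (1 - math.exp(-lam))
theta = math.exp(2 * y0) * (1 - 2 * math.exp(-(lam + sigma ** 2))
                            + math.exp(-(2 * lam + sigma ** 2)))
exact_mean = delta * math.exp(lam * t)
exact_second_moment = theta * math.exp((2 * lam + sigma ** 2) * t)

rng = random.Random(11)
reps = 200000
s1 = s2 = 0.0
for _ in range(reps):
    # y_{t-1} is y0 plus the sum of t-1 independent N(mu, sigma^2) increments
    y_prev = y0 + rng.gauss(mu * (t - 1), sigma * math.sqrt(t - 1))
    y_now = y_prev + rng.gauss(mu, sigma)            # y_t
    d = math.exp(y_now) - math.exp(y_prev)           # Delta Y_t
    s1 += d
    s2 += d * d
mc_mean, mc_second_moment = s1 / reps, s2 / reps
print(exact_mean, mc_mean)
print(exact_second_moment, mc_second_moment)
```

Both moments grow exponentially in $t$, which is the source of the trending intercept and rising residual variance seen when the levels model is fitted to data generated in logarithms.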
Thus, if the logarithmic model were true, the model in levels would have both a drift, $\delta\exp(\lambda t)$, and variance, $\theta\exp[(2\lambda + \sigma^2)t]$, exponentially increasing with time. Further, in the regression

with $\lambda = \hat\mu + \hat\sigma^2/2$, where $\hat\mu$ and $\hat\sigma^2$ are obtained from estimating (55), $M_1$ (the logarithmic model) encompasses $M_2$ (model in levels) only if $\delta \ne 0$ and $\gamma = 0$.

FIG 6.3. Recursive estimates of intercept in log model
We now apply their test to the linear model of UK national income over the last century. Because of the lagged dependent variable in (54), the long-run solution provides the estimate $\hat\mu$ for $\mu$ in the Ermini-Hendry test, namely

Thus, $\hat\lambda t = 0.0191t$; calculate $\exp(\hat\lambda t)$ and enter this as an additional regressor in the linear model. The empirical outcome is

The coefficient on $\exp(\hat\lambda t)$ is significant and makes the intercept insignificant. This result confirms the earlier graphical evidence on the inappropriateness of the linear model against a log-linear form. Finally, dropping the intercept in (57),

FIG 6.4. One-step residuals in log model

Figure 6.5 shows the recursive estimates of $\delta$ from (58) over the sample, and reveals greatly reduced evidence of parameter non-constancy in using the exponential trend relative to an intercept.
These principles may be extended to deciding whether it is the levels or the logarithms of variables that are co-integrated. Thus, consider two I(1) processes $X_t > 0$ and $Z_t > 0$ between which there is a co-integrating relationship in levels:

Defining the transformed series $x_t = \log(X_t)$ and $z_t = \log(Z_t)$, we have

Using a Taylor series expansion of the logarithmic function, we obtain

from which we can see that the terms in the summation will decline in importance as $Z_t$ grows, since by (59) $u_t$ is of fixed variance, while the variance of $Z_t$ is of $O(t)$. Hence we expect to find an equilibrium relation of some sort among the logarithms of variables that are co-integrated in levels. Asymptotically, this equilibrium relation is of a degenerate kind with the distribution of $x_t - z_t$ collapsing around $\log(\beta)$. This is also a testable prediction of the hypothesis that the random walk model in levels encompasses the logarithmic model,22 although the test is likely to have low power because the variance in the errors is likely to persist even in fairly large samples.
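The expansion referred to above can be written out as follows. This is a sketch under the assumption that the levels relationship (59) takes the form $X_t = \beta Z_t + u_t$ with $\beta > 0$, which is consistent with the limit $\log(\beta)$ quoted in the text but is not reproduced from it.

```latex
\begin{aligned}
x_t = \log(\beta Z_t + u_t)
    &= \log\beta + z_t + \log\!\left(1 + \frac{u_t}{\beta Z_t}\right) \\
    &= \log\beta + z_t + \sum_{j=1}^{\infty} \frac{(-1)^{j+1}}{j}
       \left(\frac{u_t}{\beta Z_t}\right)^{j},
\qquad \left|\frac{u_t}{\beta Z_t}\right| < 1 .
\end{aligned}
```

Each term in the sum is a power of $u_t/(\beta Z_t)$, so as $Z_t$ grows relative to the fixed-variance $u_t$ the sum shrinks and $x_t - z_t$ settles around $\log\beta$.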
Conversely, if we begin with a co-integrating relationship between two series which have already been transformed to logarithms,

then the relationship among the levels of the series is

which implies

or

This no longer has the form of a standard co-integrating relationship, since $W_t - kV_t = V_t(V_t^{k-1}\nu_t - k) = \eta_t$; while $\nu_t$ may remain a stationary process, the error term $\eta_t$ in the new relationship depends on the integrated series $V_t$ and is therefore not stationary in general. No co-integrating relationship may therefore appear, and a regression of the form $W_t = kV_t + \eta_t$ is likely to display considerable instability.
22 To see this, simply substitute $X_{t-1}$ for $Z_t$. The instability of the random walk model in levels made a formal test in the levels → logarithms direction unnecessary in the Ermini-Hendry discussion, although in principle such a test could be carried out.

FIG 6.5. Recursive estimates of $\delta$
At the same time, it should be noted that, in either of the above examples, only one of the logarithm and the level of a variable will be an integrated process (capable of being made stationary by differencing), although stationarity or non-stationarity will be common to both representations. The standard definition of co-integration, which describes equilibrium relations among integrated processes, can be legitimately applied to only one of the two cases at a time.
The fact remains, however, that a co-integrating relationship among the levels of variables suggests the existence of some linear equilibrium relationship among the logarithms of those same variables. The converse need not in general be true.

Appendix: Vector Brownian Motion

Consider the bivariate I(1) data generation process given by:

The DGP in (A1) is a re-parameterization of a general bivariate normal distribution for $(\Delta y_t, \Delta z_t)$ with covariance $\eta\sigma_2^2$ and defines the integrated vector process:

when $x_t = (y_t : z_t)'$ and $v_t = (\varepsilon_{1t} + \eta\varepsilon_{2t}, \varepsilon_{2t})'$. Then $v_t$ has non-unit error variance matrix $\Sigma$:

As in Chapter 1, a suitably scaled function of $x_t$ converges to a vector Brownian motion process, denoted BM($\Sigma$). We first derive the standardized Brownian motion by the transform:

and $s = \sigma_1/\sigma_2$. Then $m_t$ has a unit error variance matrix since:

Alternatively, from (A2) and (A4):

(A6)
Next, using a component by component analysis similar to that in Chapter 3, from (A5):

where $B(r) = (B_1(r), B_2(r))'$ (denoted BM($I$)), and the $B_i(r)$ are the standardized Wiener processes associated with accumulating the $\{\varepsilon_{it}\}$. Further:

These vector formulae are natural generalizations of the scalar Wiener processes in Chapter 3.
Scalar functions of vector I(1) variables can be handled as follows. Consider the distribution of the difference between $y_t$ and $z_t$, namely $u_t = d'x_t$ for $d' = (1, -1)$. Then from (A4):

(A10)
By direct calculation from (A1) however,

and $W(r)$ is the Wiener process associated with $\{w_t/\sigma_w\}$. By definition, $w_t = \varepsilon_{1t} + (\eta - 1)\varepsilon_{2t}$, so that $\sigma_w W(r) = \sigma_1 B_1(r) + (\eta - 1)\sigma_2 B_2(r)$, and hence the expressions in (A10) and (A11) are equal, but provide different insights into the behaviour of the scalar second moment.
Similarly, let $f' = (1, 0)$ so that $f'e_t = \varepsilon_{1t}/\sigma_1$, then we can derive a covariance such as:

Returning to the standardized vector Brownian motion, let $V(r) = (V_1(r), V_2(r))'$ (which is BM($\Sigma$)) be associated with the accumulation of $\{v_t\}$. Now $V_1(r)$ and $V_2(r)$ are not independent since $E(v_{1t}v_{2t}) \neq 0$. The standardized vector Brownian motion is $B(r) = K'V(r)$ where $K'$ is defined in (A4). Multiplying out, we have:

$$B_1(r) = \sigma_1^{-1}[V_1(r) - \eta V_2(r)] \quad\text{and}\quad B_2(r) = \sigma_2^{-1}V_2(r). \tag{A13}$$

Indeed, if we condition $v_{1t}$ on $v_{2t}$ (which generates $z_t$) and let $V_{1\cdot 2}(r)$ be the associated "conditional" unstandardized Wiener process, then $V_{1\cdot 2}(r)$ and $V_2(r)$ are independent. Because $\varepsilon_{1t} = v_{1t} - E(v_{1t} \mid v_{2t}) = v_{1t} - \eta v_{2t}$, we see that $V_{1\cdot 2}(r) = V_1(r) - \eta V_2(r) = \sigma_1 B_1(r)$ from (A13).
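The standardization step can be checked numerically. The following is a minimal sketch, assuming independent errors with variances $\sigma_1^2$ and $\sigma_2^2$ and a transform matrix (called `K_t` here, a hypothetical name) that maps $v_t$ back to $(\varepsilon_{1t}/\sigma_1, \varepsilon_{2t}/\sigma_2)'$. The parameter values are arbitrary and the explicit form of the matrix is an assumption consistent with the definition of $v_t$, not a quotation of (A4).

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
eta, sigma1, sigma2 = 0.7, 2.0, 0.5

# Variance matrix of v_t = (eps_1t + eta*eps_2t, eps_2t)' when eps_1t and
# eps_2t are independent with variances sigma1**2 and sigma2**2.
Sigma = np.array([
    [sigma1**2 + eta**2 * sigma2**2, eta * sigma2**2],
    [eta * sigma2**2, sigma2**2],
])

# Assumed standardizing transform: first row recovers eps_1t / sigma1,
# second row recovers eps_2t / sigma2.
K_t = np.array([
    [1.0 / sigma1, -eta / sigma1],
    [0.0, 1.0 / sigma2],
])

# The transformed errors should have an identity variance matrix.
standardized = K_t @ Sigma @ K_t.T
print(np.round(standardized, 12))
```

Under these assumptions the product is the identity matrix, which is the sense in which the transformed process is a standardized vector Brownian motion.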
Finally, consider an expression of the form:

Then the error covariance matrix is added on if the cross-product under analysis is a contemporaneous rather than a lagged one (see the appendix to Chapter 7 for an extension). Phillips and Durlauf (1986) and Phillips (1988b) provide proofs and generalizations.

Co-integration in Individual Equations
We first examine methods of testing for co-integration via static regressions, and provide simulation estimates of the upper percentage points of the distributions of statistics used in the tests. Next, we look at the properties of the estimators derived from such static regressions. In particular, we focus on the finite-sample biases in the estimates of co-integrating vectors and the powers of tests to detect co-integration. Finally, we consider modified estimators and dynamic models. In Chapter 8, systems methods of estimating co-integrating relations will be considered.
The previous chapter focused on the properties of co-integrated processes and the implications of modelling with co-integrated variables. We have discussed the 'super-consistency' of the coefficient estimates in the static or co-integrating regression, balanced and unbalanced regressions, and the distributions of the statistics commonly used to test for the significance of regression coefficients.
The two issues of being able to test for the existence of an equilibrium relationship among variables and to accurately estimate such a relationship are complementary. Indeed, as demonstrated in discussing spurious regressions in Chapter 3, static regressions among integrated series are meaningful if and only if they involve co-integrated variables. Thus, it is of interest to discover, first, how well the most frequently used tests of co-integration perform, and second, how accurately the corresponding equilibrium relationship is estimated.
The objective of this chapter is to develop tests applicable to single equations which may be used to detect a long-term relationship of the form discussed and exploited in earlier chapters. We also attempt to formulate some recommendations for efficient estimation of co-integrating parameters and testing for co-integration in finite samples. It will become clear from the discussion that the asymptotic properties of static regression estimators are often rather different from their behaviour in empirically relevant sample sizes. Further, lack of weak exogeneity due to co-integrating vectors entering several equations also alters finite sample behaviour. It therefore becomes important, in the face of data limitations, to consider alternative methods which do not rely exclusively on single-equation static regressions. These are the topic of Sections 7-9.

7.1. Estimating a Single Co-integrating Vector


Consider the problem of estimating the single co-integrating vector $\alpha$ using the static model
We conduct the discussion in this and the following sections in three stages. First, we elaborate upon the theorems presented in Chapter 5 and develop an intuitive discussion of static regressions. Next, we proceed to the issue of testing for co-integration using static regressions. The testing and the parameterization of the equilibrium relationship are seen to be complementary exercises. Finally, we discuss simulation studies which cast light on the behaviour, in finite samples, of the static-regression estimators and the powers of the tests for co-integration.
In order to keep the analysis as tractable as possible, we will restrict ourselves to considering CI(1,1) systems. Thus, suppose that all the elements in $x_t$ are $I(1)$. In general, then, any linear combination $\delta'x_t$ of the elements of $x_t$ will produce an $I(1)$ series $u_t$. The only exception, if one exists, is a co-integrating vector $\alpha$ such that $\alpha'x_t$ is $I(0)$.¹ Ordinary least squares minimizes the residual variance of $\alpha'x_t$, and therefore a simple OLS regression of the form (1) should provide an excellent approximation to the true co-integrating vector when one exists, as discussed in Chapter 5.
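The idea that a static OLS regression recovers the co-integrating coefficient can be illustrated with simulated data. The sketch below is only an illustration under assumed values: a bivariate system in which $z_t$ is a random walk, the equilibrium error is a stationary AR(1) with coefficient 0.5, and the true coefficient `beta_true` is 2. None of these choices comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
beta_true = 2.0  # hypothetical co-integrating coefficient

# z_t is a random walk, hence I(1).
z = np.cumsum(rng.normal(size=T))

# u_t is a stationary AR(1) equilibrium error, hence I(0).
u = np.empty(T)
u[0] = rng.normal()
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + rng.normal()

# y_t and z_t are co-integrated: y_t - beta_true * z_t is I(0).
y = beta_true * z + u

# Static OLS regression of y_t on z_t (no constant).
beta_hat = (z @ y) / (z @ z)
print(beta_hat)
```

With a sample this long the estimate should sit close to `beta_true`, even though the regression ignores the dynamics of the error; the finite-sample behaviour in much shorter samples is a separate matter.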
The simplicity of this method and the elegance of the theoretical argument help explain the popularity of such regressions. All that is needed to parameterize a long-run equilibrium relationship among a set of variables is a static OLS regression. This regression is performed as the first step of the Engle-Granger two-step estimator² and serves as a preliminary check on the equilibrium relationships postulated by economic theory to exist among the variables.
¹ Initially we focus on the case where (apart from normalization) the co-integrating vector $\alpha$ is unique and is therefore of dimension $n \times 1$. As the analysis in Ch. 5 showed (especially the discussion of the Granger Representation Theorem), this is clearly a restrictive assumption to make. In general, there will exist $r$ co-integrating vectors, $0 \le r \le n - 1$, and when gathered in an array, the matrix $\alpha$ will be of order $n \times r$. The problem of estimating co-integrating vectors in systems is considered in Ch. 8.
² The two-step estimator and its asymptotic properties are discussed in Ch. 5. The general case is derived by Engle and Granger (1987: 262, Theorem 2).


However, there are reasons for preferring alternatives to the simple static regression in samples of the size typical in economics. This chapter will consider dynamic regression methods and modified estimators. These techniques help to reduce or eliminate sources of finite-sample biases which arise from static estimation, and which can be very substantial in practice.

7.2. Tests for Co-integration in a Single Equation


The simplest tests for co-integration, proposed by Engle and Granger, test for the existence of a unit root in the residuals of the static regression. The methods of Chapter 4 can therefore be followed with minor modifications. We first consider the bivariate case, where $x_t = (y_t, z_t)'$.

The modifications are necessary because, while the tests for unit roots discussed in Chapter 4 use the original series, say $\{w_t\}$, the co-integration tests are based on the estimated, or derived, residual series,

Hence, as the co-integrating regression estimates $\beta$ before the test is performed, the co-integration test is not simply a standard test for a unit root in the series $u_t$.
If $\beta$ were known in the example presented in Chapter 5 (given by equations (5.1)-(5.6)), the null hypothesis of no co-integration, corresponding to $\rho$ equal to 1, could be tested by constructing the series $u_t = y_t - \beta z_t$, treating this series as the one that has the unit root under the null, and using the Dickey-Fuller tables. However, if $\beta$ is unknown, it must be estimated (e.g.) from the static regression of $y_t$ on $z_t$. The test is based on the null hypothesis of no co-integration, with the critical values for the test statistics calculated to ensure the appropriate probability of rejection of the null hypothesis.
Some of the most widely used tests of co-integration have been the co-integrating regression Durbin-Watson test (CRDW), the Dickey-Fuller test (DF), and the augmented Dickey-Fuller test (ADF).
The CRDW, suggested by Sargan and Bhargava (1983), is computed in exactly the same fashion as the usual DW statistic and is given by

$$\mathrm{CRDW} = \frac{\sum_{t=2}^{T}(\hat u_t - \hat u_{t-1})^2}{\sum_{t=1}^{T}\hat u_t^2},$$

where $\hat u_t$ denotes the OLS residual from the co-integrating regression.
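Because the CRDW is the ordinary Durbin-Watson ratio applied to co-integrating regression residuals, it takes only a few lines to compute. The sketch below is illustrative: the function name `crdw` and the two simulated residual series are assumptions made for the example, not part of the original discussion.

```python
import numpy as np

def crdw(residuals):
    """Durbin-Watson ratio: sum of squared first differences of the
    residuals divided by the sum of squared residuals."""
    u = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

rng = np.random.default_rng(1)
random_walk = np.cumsum(rng.normal(size=200))  # residuals with a unit root
white_noise = rng.normal(size=200)             # clearly stationary residuals

crdw_unit_root = crdw(random_walk)
crdw_stationary = crdw(white_noise)
print(crdw_unit_root, crdw_stationary)
```

A unit root in the residuals pushes the ratio towards zero, while uncorrelated stationary residuals give a value near 2, so the test rejects the null of no co-integration when the statistic is large.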


The null hypothesis being tested, using the CRDW statistic, is of a single unit root: that is, $u_t$ is a random walk. This is to be contrasted with the conventional use made of this statistic in standard regression analysis where the null of no first-order autocorrelation is tested.
The use of this statistic is problematic in the present setting. First, the test statistic for co-integration depends upon the number of regressors in the co-integrating equation and, more generally, on the data-generation process and hence on the precise data matrix. Only bounds on the critical values are available.³ Second, the bounds diverge as the number of regressors is increased, and eventually cease to have any practical value for the purposes of inference. Finally, the statistic assumes the null where $u_t$ is a random walk, and the alternative where $u_t$ is a stationary first-order autoregressive process. In such circumstances, Bhargava (1986) demonstrates that it has excellent power properties relative to alternative tests. However, the tabulated bounds are not correct if there is higher-order residual autocorrelation, as will commonly occur. Exact inference is therefore possible if and only if each regression exercise is augmented by the use of algorithms such as that of Imhof (1961) to compute the relevant critical values. In principle, it is possible for simulation methods to be used to compute the critical values. However, in practice this implies a proliferation of tables of different critical values for different data-generation processes and simulation exercises.
As we have argued previously, the only hope for uncomplicated inference lies in generating a robust set of critical values. Robustness is defined by lack of sensitivity of the critical values to a wide range of changes to the data-generation process. Tests that are similar for a wide range of nuisance parameters would ensure this non-sensitivity. In other words, it is important to have a set of tables that could be used regardless of the precise properties of the DGP, as long as the regression model is parameterized to satisfy certain basic properties such as balance. Tests of co-integration based not directly on the residuals but on the regression coefficients themselves might have higher power. As an alternative method, one could consider using non-parametric corrections of the sort described in Chapter 4 to conduct inference using only a small set of tables, for a range of possible data-generation processes. Examples of both these procedures will be presented in due course.
Similar qualifications apply to the use of the DF statistic and less so to the ADF, if the number of $\Delta u_{t-i}$ terms appearing in the data-generation process coincides with those used in the implementation of the test. Since the number of such terms appearing in the DGP is unknown, it seems safest to over-specify the ADF regression, and use as many lagged terms as degrees-of-freedom restrictions will allow. Of course, in practice, the choice of the lag structure in ADF tests may be ad hoc and different results can be obtained by changing the length of the autoregression. In particular, the power of the test may be affected adversely.
³ While the CRDW statistic does not have a limiting distribution with a non-zero variance, $T(\mathrm{CRDW}) = T^{-1}\sum_{t=2}^{T}(\hat u_t - \hat u_{t-1})^2 / T^{-2}\sum_{t=1}^{T}\hat u_t^2$ does.
Table 7.1 provides, for illustration (a more detailed description of applicable critical values will be given below), the 5 per cent critical values of the DW, ADF(1), and ADF(4) tests, for three sample sizes ($T = 50, 100, 200$). The data-generation process is an $n$-variate random walk with $n$ less than or equal to 5, as in Engle and Yoo (1987).
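Critical values of this kind are obtained by simulating the test statistic under the null of independent random walks. The following is a rough sketch of the idea rather than a reproduction of the published experiments: it uses an unaugmented DF $t$-statistic on the residuals of a static regression with a constant, only 2000 replications, and hypothetical function names, so the result will differ somewhat from tabulated figures.

```python
import numpy as np

def df_t_on_residuals(y, z):
    """DF t-statistic for the residuals of a static OLS regression of y
    on a constant and the columns of z."""
    X = np.column_stack([np.ones(len(y)), z])
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    se = np.sqrt((resid @ resid) / (len(du) - 1) / (lag @ lag))
    return rho / se

def simulate_critical_value(T, n, reps=2000, level=0.05, seed=0):
    """Lower-tail quantile of the statistic when the n series are
    independent random walks (so there is no co-integration)."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        walks = np.cumsum(rng.normal(size=(T, n)), axis=0)
        stats[r] = df_t_on_residuals(walks[:, 0], walks[:, 1:])
    return np.quantile(stats, level)

cv = simulate_critical_value(T=100, n=2)
print(cv)
```

The simulated 5 per cent point lies well below the value of roughly -2.9 that applies to a directly observed series, which is the effect of estimating the co-integrating regression before testing.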
It is important to emphasize that, in common with the tests for unit roots, tests for co-integration may lack power to discriminate between unit roots and borderline-stationary processes. In a small-scale study of the power properties of this test, Engle and Granger (1987) show that, when the data-generation process of the disturbances of the co-integrating equation is an AR(1) process with the autoregressive parameter equal to 0.9, the powers of the CRDW, DF, and ADF tests at the 5 per cent critical values are 20, 15, and 11 per cent respectively. When the DGP is altered to be a more general AR(1) process with a unit root, the power of the ADF test becomes 60 per cent, dominating strongly both the powers of the CRDW and DF tests at the 5 per cent level.
Engle and Granger (1987) emphasize the robustness to changes in the data-generation process of the ADF critical values. The discussion in Chapter 4 helps to explain this result. Phillips and Ouliaris (1990) show that the limiting distribution of the ADF test statistic is the same as that of the non-parametrically adjusted DF statistic. Because the limiting distribution of the latter statistic is invariant to nuisance parameters in the processes generating the data series, the result follows. Each test manages to correct for various features that may be present in the DGP, in one case by capturing the effects in a regression model, in the other by implicitly adjusting the critical values.
TABLE 7.1. Five per cent critical values for the co-integration tests

n    T      CRDW    ADF(1)   ADF(4)
2    50     0.72    -3.43    -3.29
2    100    0.38    -3.38    -3.17
2    200    0.20    -3.37    -3.25
3    50     0.89    -3.82    -3.75
3    100    0.48    -3.76    -3.62
3    200    0.25    -3.74    -3.78
4    50     1.05    -4.18    -3.98
4    100    0.58    -4.12    -4.02
4    200    0.30    -4.11    -4.13
5    50     1.19    -4.51    -4.15
5    100    0.68    -4.48    -4.36
5    200    0.35    -4.42    -4.43

Source: The CRDW critical values (see Sargan and Bhargava 1983) and the ADF(1) critical values were generated by PC-NAIVE using 10,000 replications. The ADF(4) critical values have been taken from Engle and Yoo (1987). The ADF critical values are computed by replicating the regression $\Delta\hat u_t = \rho\hat u_{t-1} + \sum_{i=1}^{k}\gamma_i\Delta\hat u_{t-i} + v_t$ for $k = 1, 4$, following estimation of $\beta$ in (2) augmented by a constant.

Phillips and Ouliaris (1990) derive the distributions of several tests of co-integration. We close this section by presenting a summary of the theoretical results presented there. They consider the linear co-integrating regressions:

and

where $y_t$ and $z_t$ satisfy (multivariate) unit-root processes. The asymptotic distributions of a number of residual-based tests are discussed, from which we will consider five (this analysis is of course related to the analysis of unit-root tests found in Chapter 4):


(i) Dickey-Fuller $\hat\rho$: $\mathrm{DF}(\hat\rho) = T\hat\rho$, where $\hat\rho$ is obtained from the regression $\Delta\hat u_t = \rho\hat u_{t-1} + \eta_t$;
(ii) Dickey-Fuller $t$ (DF): $\mathrm{DF}(t) = t_{\rho=0}$ in the regression $\Delta\hat u_t = \rho\hat u_{t-1} + \eta_t$;
(iii) augmented Dickey-Fuller (ADF)

(iv) Phillips (1987a) $Z_\rho$

where

where $\omega_\ell(j) = 1 - j(\ell + 1)^{-1}$ for some choice of lag window, and $\hat\rho$ and the $\hat\eta_t$ are derived from the DF regression given in (i);
(v) Phillips (1987a) $Z_t$

with $S^2_{T\ell}$ and $S^2_\eta$ as in (iv), and $\hat\rho$ and the $\hat\eta_t$ again derived from the DF regression given in (i).
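The regression-based statistics in (i) to (iii) can be sketched as a single routine. This is a hedged illustration: the function `adf_statistics` is a hypothetical helper, the regression includes no constant or trend, and the augmented form with `k` lagged differences is assumed to be the standard ADF regression. The first returned value corresponds to $T\hat\rho$ only when `k = 0`.

```python
import numpy as np

def adf_statistics(u, k=0):
    """Regress du_t on u_{t-1} and k lagged differences (no constant).
    Returns (number of observations * rho_hat, t-ratio on rho_hat)."""
    u = np.asarray(u, dtype=float)
    du = np.diff(u)
    y = du[k:]
    # First regressor is the lagged level u_{t-1}.
    cols = [u[k:-1]]
    # Remaining regressors are the lagged differences du_{t-i}, i = 1..k.
    cols += [du[k - i:len(du) - i] for i in range(1, k + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = (resid @ resid) / (len(y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    rho = coef[0]
    return len(y) * rho, rho / np.sqrt(cov[0, 0])

# Example: residuals that follow a stationary AR(1) with coefficient 0.5.
rng = np.random.default_rng(2)
e = rng.normal(size=200)
stationary = np.empty(200)
stationary[0] = e[0]
for t in range(1, 200):
    stationary[t] = 0.5 * stationary[t - 1] + e[t]

df_rho, df_t = adf_statistics(stationary, k=0)
_, adf_t = adf_statistics(stationary, k=4)
print(df_rho, df_t, adf_t)
```

The statistics must be compared with critical values appropriate to residual-based tests, not with the standard Dickey-Fuller tables, because the series being tested is itself estimated.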
Some properties of these tests may now be enumerated. First, under the maintained hypothesis of no co-integration, the distributions of $Z_\rho$ and $Z_{t(\rho=0)}$, for any general specification of the error process $\{\eta_t\}$, are the same as those of $\mathrm{DF}(\hat\rho)$ and $\mathrm{DF}(t)$ respectively, when the distributions are computed under the restrictive assumption of IID errors. The distributions of $Z_\rho$ and $Z_{t(\rho=0)}$ are independent of nuisance parameters (leading to asymptotically similar tests), although they do depend on the number of regressors in the system; thus, the non-parametric corrections serve the same role in the context of co-integrating regressions as they do in unit-root tests: they eliminate nuisance parameters and enable the use of a standard set of Dickey-Fuller tables. Corrections must still be made for size in the original Dickey-Fuller tables to prevent over-rejection of the null hypothesis. Some of the tables appear in Phillips and Ouliaris (1990).
Second, the ADF test and $Z_{t(\rho=0)}$ have the same asymptotic distribution. This is an interesting result because it re-emphasizes the two alternative but equivalent ways of taking account of nuisance parameters. In order to use a standard set of tables, one either augments the Dickey-Fuller regression or adjusts, non-parametrically, the unaugmented Dickey-Fuller statistic.
Third, if the statistics are based on a regression with a fitted intercept or time trend, the interpretation of the tests is not altered although the asymptotic critical values change. This issue is considered in more detail in the next section.
Fourth, if the non-parametrically adjusted statistics were constructed by imposing $\rho = 0$ and therefore using the $\hat v_t$, where $\hat v_t = \hat u_t - \hat u_{t-1}$, in the test statistic instead of the $\hat\eta_t$, the statistics would have the same asymptotic distribution under the null; however, as shown by Phillips and Ouliaris (1990), these would have inferior power properties.
Finally, an alternative class of tests of co-integration not based on regression residuals has been proposed in the literature. Prominent among these tests are those due to Johansen (1988) and Stock and Watson (1988b). These tests also apply to multivariate systems of equations and have their most natural uses when investigating multiple co-integrating vectors. A discussion of the Johansen maximum likelihood procedure appears in the next chapter.

7.3. Response Surfaces for Critical Values


When compared with the corresponding critical values for unit-root tests given in Chapter 4, the critical values in Table 7.1 are illustrative of the changes in test levels implied by the presence of estimated parameters in the relationship yielding the series to be tested for stationarity. In themselves, however, they cover only a limited set of cases. Other tables are provided in Engle and Yoo (1987) and Phillips and Ouliaris (1990). MacKinnon (1991) provides results of a more extensive set of simulations, summarized in response surfaces: that is, critical values for particular tests are given as a set of parameters of an equation relating the exact critical value to a constant term and terms involving sample size, from which a critical value for any given sample size can be approximated. We will describe the latter results.
Dickey-Fuller (or augmented Dickey-Fuller) tests for unit roots or co-integration can be considered within a common framework. Consider $n$ time series given by $y_{1t}, y_{2t}, \ldots, y_{nt}$, $n \ge 1$, $t = 1, 2, \ldots, T$. If $n = 1$, we are testing for a unit root in a single series, and to establish a uniform notation, we define the time series under test as $\{\hat u_t\}_{t=1}^{T} = \{y_{1t}\}_{t=1}^{T}$. If $n > 1$, we are first interested in obtaining a set of residuals from the estimated relationship among the $n$ variables, and so begin with the (static) co-integrating regression,⁴

Let $y_t = (y_{1t}, y_{2t}, \ldots, y_{nt})'$ be the vector of measurements at time $t$ on the $n$ variables. The series to be tested for stationarity then becomes $\hat u_t = [1 : -\hat\beta']y_t$, where $\hat\beta$ is the vector of estimated parameters. Subject to the relevance of the normalized variable, the ordering of variables in the co-integrating regression will not affect the asymptotic distribution of the test statistic, although in finite samples the value will depend upon which variable is the regressand. The null hypothesis of no co-integration implies that $\hat u_t$ is $I(1)$.
⁴ Below, the parameters $\delta_0$ and $\delta_1$ will denote coefficients on a constant and trend, respectively.
We test this null using the tests considered in Chapter 4. In particular, the augmented Dickey-Fuller test takes the form of one of the following models, with the lag length chosen to eliminate any autocorrelation in the residuals:

For $n \ge 2$, so that a co-integrating regression precedes the use of one of these models,⁵ model (6) could also be used with constant and trend, adding $\delta_0$ or $\delta_0 + \delta_1 t$ to the regression. Co-integration tests either include a constant in (6), or include a constant in the regression model (7b). If a constant is added to (6) and model (7a) is used, the strategy is equivalent to omitting the constant term and using model (7b); if constant and trend are added to (6) and model (7a) is used, then this is equivalent to using model (7c), and so on. The model type referred to in Table 7.2 describes this presence or absence of constant and trend in the models. A test with constant but no trend, for example, implies model (7b) with no constant in the co-integrating regression (6), or a constant in (6) used with model (7a).
The critical values, or upper quantiles of the distributions, can be calculated from the parameters of Table 7.2 using the relation

$$C(p) = \phi_\infty + \phi_1 T^{-1} + \phi_2 T^{-2}, \tag{8}$$

where $C(p)$ is the $p$ per cent upper-quantile estimate. The parameters were estimated from regression over a set of individual simulation results covering, for most values of $n$, 40 sets of parameters for each of 15 sample sizes. Model (8) (with an added error term) was found to represent well the various critical values that emerged from the many individual experiments; but other models could in principle have been used to fit a response surface to the results; see MacKinnon (1991) for a description of the experimental technique, including the feasible generalized least squares technique by which estimation of the final response surface model was undertaken, to allow for heteroskedasticity in (8).
⁵ If $n \ge 2$ but the values of the parameters in $\beta$ are known, then the residuals $u_t = [1 : -\beta']y_t$ can be constructed without a co-integrating regression, and the test statistic is interpreted as if $n$ were equal to unity. In this case we have one known series of observations to be tested for stationarity, not a series constructed on the basis of estimated parameters. Under the null of no co-integration, however, (6) is a spurious regression so $\hat\beta$ has a non-degenerate limiting distribution, which induces different critical values from DF tests.

TABLE 7.2. Response surfaces for critical values of co-integration tests

n   Model                   Point (%)   $\phi_\infty$   SE         $\phi_1$   $\phi_2$
1   No constant, no trend   1           -2.5658         (0.0023)   -1.960     -10.04
1   No constant, no trend   5           -1.9393         (0.0008)   -0.398     0.0
1   No constant, no trend   10          -1.6156         (0.0007)   -0.181     0.0
1   Constant, no trend      1           -3.4336         (0.0024)   -5.999     -29.25
1   Constant, no trend      5           -2.8621         (0.0011)   -2.738     -8.36
1   Constant, no trend      10          -2.5671         (0.0009)   -1.438     -4.48
1   Constant + trend        1           -3.9638         (0.0019)   -8.353     -47.44
1   Constant + trend        5           -3.4126         (0.0012)   -4.039     -17.83
1   Constant + trend        10          -3.1279         (0.0009)   -2.418     -7.58
2   Constant, no trend      1           -3.9001         (0.0022)   -10.534    -30.03
2   Constant, no trend      5           -3.3377         (0.0012)   -5.967     -8.98
2   Constant, no trend      10          -3.0462         (0.0009)   -4.069     -5.73
2   Constant + trend        1           -4.3266         (0.0022)   -15.531    -34.03
2   Constant + trend        5           -3.7809         (0.0013)   -9.421     -15.06
2   Constant + trend        10          -3.4959         (0.0009)   -7.203     -4.01
3   Constant, no trend      1           -4.2981         (0.0023)   -13.790    -46.37
3   Constant, no trend      5           -3.7429         (0.0012)   -8.352     -13.41
3   Constant, no trend      10          -3.4518         (0.0010)   -6.241     -2.79
3   Constant + trend        1           -4.6676         (0.0022)   -18.492    -49.35
3   Constant + trend        5           -4.1193         (0.0011)   -12.024    -13.13
3   Constant + trend        10          -3.8344         (0.0009)   -9.188     -4.85
4   Constant, no trend      1           -4.6493         (0.0023)   -17.188    -59.20
4   Constant, no trend      5           -4.1000         (0.0012)   -10.745    -21.57
4   Constant, no trend      10          -3.8110         (0.0009)   -8.317     -5.19
4   Constant + trend        1           -4.9695         (0.0021)   -22.504    -50.22
4   Constant + trend        5           -4.4294         (0.0012)   -14.501    -19.54
4   Constant + trend        10          -4.1474         (0.0010)   -11.165    -9.88
5   Constant, no trend      1           -4.9587         (0.0026)   -22.140    -37.29
5   Constant, no trend      5           -4.4185         (0.0013)   -13.641    -21.16
5   Constant, no trend      10          -4.1327         (0.0009)   -10.638    -5.48
5   Constant + trend        1           -5.2497         (0.0024)   -26.606    -49.56
5   Constant + trend        5           -4.7154         (0.0013)   -17.432    -16.50
5   Constant + trend        10          -4.4345         (0.0010)   -13.654    -5.77
6   Constant, no trend      1           -5.2400         (0.0029)   -26.278    -41.65
6   Constant, no trend      5           -4.7048         (0.0018)   -17.120    -11.17
6   Constant, no trend      10          -4.4242         (0.0010)   -13.347    0.0
6   Constant + trend        1           -5.5127         (0.0033)   -30.735    -52.50
6   Constant + trend        5           -4.9767         (0.0017)   -20.883    -9.05
6   Constant + trend        10          -4.6999         (0.0011)   -16.445    0.0

Source: MacKinnon (1991). We are grateful to James MacKinnon for permission to reproduce this table.

As an example, the estimated 1 per cent critical value for 150 observations, $n = 6$ and constant + trend included in the model is given by $-5.5127 - 30.735/150 - 52.50/150^2 = -5.7199$. Estimated standard errors for finite-sample critical values such as this are generally less than those reported for $\hat\phi_\infty$, although MacKinnon argues that these may understate the true standard errors by roughly a factor of 2.
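The response-surface calculation is simple enough to script. The sketch below evaluates the constant term plus the two sample-size terms for the worked example of $n = 6$ with constant and trend at the 1 per cent point; the function name is a hypothetical helper, and the three coefficients are the ones quoted in that example.

```python
def response_surface_critical_value(phi_inf, phi_1, phi_2, T):
    """Approximate critical value for sample size T from the three
    response-surface coefficients: phi_inf + phi_1/T + phi_2/T**2."""
    return phi_inf + phi_1 / T + phi_2 / T**2

# Coefficients for n = 6, constant + trend, 1 per cent point.
cv = response_surface_critical_value(-5.5127, -30.735, -52.50, 150)
print(round(cv, 4))  # -5.7199
```

As $T$ grows the two sample-size terms vanish, so the constant term is the asymptotic critical value and the remaining terms are finite-sample adjustments.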

7.4. Finite-sample Biases in OLS Estimates


In the next chapter we will consider systems estimation of co-integrating vectors. Here, we will examine one of the main reasons for using such an estimation strategy: the large finite-sample biases that can arise in static OLS estimates of co-integrating vectors or parameters. While such estimates are super-consistent ($T$-consistent), Monte Carlo experiments nonetheless suggest that a large number of observations may be necessary before the biases become small (see Banerjee et al., 1986).
Some investigators have suggested that we may explain the findings of such Monte Carlo studies by the fact that the particular data-generation processes considered were too specific, or possessed some special properties, which meant that the probability of finding large biases was unusually high. This point is partly valid, in that each DGP can be regarded as specific in some way. Moreover, it is certainly true that, with sufficient patience, the exact expressions for the static biases could be worked out for any data-generation process, as functions of the parameters of the DGP. However, the point is not that some of these data-generation processes are more likely to lead to high biases while others will give lower values, but rather that, in the absence of information on the data-generation process, some method other than static regression may give superior estimates of the co-integrating vector or tests with higher powers. In particular, dynamic regressions may be more robust to a range of data-generation processes. Even where static regressions behave poorly in finite samples, dynamic regressions may provide us with quite good estimates. Since the investigator is in general unaware of the particular properties of the data-generation process (such as whether it will tend to lead to low biases or high biases), it makes sense to allow the regression to be as flexible as possible. Robustness in the sense of adequate performance for a wide range of underlying DGPs is an important property.
Most of the evidence that has been presented in favour of the existence of finite-sample biases has come in the form of Monte Carlo experiments; we present two investigations of the bias properties of OLS estimators. By specifying the data-generation process, Monte Carlo experiments provide complete knowledge and control of the features of interest; in particular, in the present case, we know the co-integrating parameter. Performing regressions on the artificial data, while notionally ignoring the data-generation process, puts us in the position of the empirical investigator; however, we are then able to compare our results with the true parameters, for a set of chosen example cases.
The first experiment considered uses the data-generation process

The vector $(\varepsilon_{1t}, \varepsilon_{2t})'$ is distributed identically and independently as a bivariate normal with

The structure of the DGP is the same as that of Engle and Granger (1987). Three cases of interest may be distinguished. In case A, $|\rho_1| < 1$, $|\rho_2| < 1$ so that both $z$ and $y$ are $I(0)$ variables. In case B, $\rho_1 = \rho_2 = 1$ so that both variables are $I(1)$ and are not co-integrated. We will concentrate on case C, where $\rho_1 = 1$, $|\rho_2| < 1$, so that the variables are still $I(1)$ but are now co-integrated. In this last case, the co-integrating coefficient is $-2$.
For case C, the null hypothesis of a unit root in the error dynamics in (10) is false. Interest therefore lies in investigating the usefulness of the estimate of the co-integrating parameter in the static regression of $y_t$ on $z_t$ and also in checking the ability of unit-root tests to reject the false null of non-stationarity. We use 5000 replications on the parameter space $s \times T \times \rho_2$: $s = \sigma_1/\sigma_2 = (16, 8, 4, 2, 1, \tfrac{1}{2})$, $T = (25, 50, 100, 200)$, and $\rho_2 = (0.6, 0.8, 0.9)$, giving rise to 72 experiments. The range of the ratio of standard deviations, the significance of which we will describe below, is very large. Obviously, it would be difficult to distinguish between $\sigma_u^2$ and $\sigma_v^2$ when $\sigma_1$ and $T$ are small and $\sigma_2$ and $\rho_2$ are large; for large values of $s$, OLS essentially picks up equation (9) instead of equation (10).
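A Monte Carlo experiment of this general kind can be sketched in a few lines. The specification below is an assumption made for illustration, in the spirit of Engle and Granger (1987): two relations $y_t + z_t = u_t$ and $y_t + 2z_t = v_t$, with $u_t$ a random walk ($\rho_1 = 1$) and $v_t$ a stationary AR(1) with coefficient $\rho_2$, which gives a co-integrating coefficient of $-2$ in the regression of $y_t$ on $z_t$. The function name, the start-up of $v_t$, and the use of 2000 replications are also hypothetical choices.

```python
import numpy as np

def static_ols_bias(T, s, rho2, reps=2000, seed=0):
    """Mean bias of the static OLS coefficient of y on z (no constant)
    under the assumed two-equation DGP, with sigma_1 = s and sigma_2 = 1."""
    rng = np.random.default_rng(seed)
    e1 = s * rng.normal(size=(reps, T))
    e2 = rng.normal(size=(reps, T))
    u = np.cumsum(e1, axis=1)            # rho_1 = 1: random walk
    v = np.empty_like(e2)                # |rho_2| < 1: stationary AR(1)
    v[:, 0] = e2[:, 0]
    for t in range(1, T):
        v[:, t] = rho2 * v[:, t - 1] + e2[:, t]
    # Solve the assumed system y + z = u, y + 2z = v for y and z.
    z = v - u
    y = 2.0 * u - v
    beta_hat = np.sum(y * z, axis=1) / np.sum(z * z, axis=1)
    return float(np.mean(beta_hat) - (-2.0))

biases = {T: static_ols_bias(T, s=0.5, rho2=0.6) for T in (25, 50, 100, 200)}
print(biases)
```

Under these assumptions the mean bias shrinks as the sample size grows, but it remains far from negligible at the smaller sample sizes.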
The problem of finite-sample biases is illustrated in the figures.
Figures 7.1(a)-7.4(a) refer to the simplest form of static model which
contains no constant, while Figs. 7.1(b)-7.4(b) pertain to static models
which do contain constant terms. The figures show the relationship
between bias and sample size for four different values of the ratio of
standard deviations. The horizontal scale is implicitly $\log_2(T/25)$ so that
the four points shown are equidistant. First of all, it is evident that the
bias does not decline at rate T. For example, in Fig. 7.4(a)
($\sigma_1/\sigma_2 = 0.5$), with $\rho_2 = 0.6$, the bias at T = 25 is 0.45, at T = 50 is
0.32, at T = 100 is 0.21, and at T = 200 is 0.13. Thus, an eightfold
increase in sample size reduces the bias by a factor of approximately
3.5. As another example, we see in Fig. 7.2(a) ($\sigma_1/\sigma_2 = 4$), with
$\rho_2 = 0.6$, the biases at the same set of sample sizes are 0.017, 0.010,
0.005, 0.0026.⁶ Here an eightfold increase in sample size reduces the
bias by a factor of 6.5. Using a standard-deviation ratio of 4 again but a
value of $\rho_2 = 0.9$, the biases are 0.04, 0.024, 0.014, and 0.008, a fivefold
decrease in bias. The rate of decline of the bias is always faster than $T^{1/2}$
but not as fast as T for sample sizes up to 200.

FIG. 7.1(a). No constant in model, estimated bias v. sample size, s = 16
FIG. 7.1(b). Constant in model, estimated bias v. sample size, s = 16
FIG. 7.2(a). No constant in model, estimated bias v. sample size, s = 4
FIG. 7.2(b). Constant in model, estimated bias v. sample size, s = 4

⁶ These numbers are taken from the experimental output rather than read from the
figures. The standard error of the smallest of these numbers is roughly $5 \times 10^{-5}$.
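The rates of decline quoted above can be checked with a little arithmetic. The sketch below uses only the bias figures reported in the text; the dictionary labels and the "implied exponent" (the value a for which the bias behaves like $T^{-a}$ between T = 25 and T = 200) are introduced here for illustration.

```python
import math

# Biases at T = 25, 50, 100, 200, as quoted in the text.
reported = {
    "s = 0.5, rho2 = 0.6": [0.45, 0.32, 0.21, 0.13],
    "s = 4,   rho2 = 0.6": [0.017, 0.010, 0.005, 0.0026],
    "s = 4,   rho2 = 0.9": [0.04, 0.024, 0.014, 0.008],
}
growth = 200 / 25  # eightfold increase in sample size

factors = {}
for label, b in reported.items():
    factor = b[0] / b[-1]                              # reduction in bias
    exponent = math.log(factor) / math.log(growth)     # a in bias ~ T**(-a)
    factors[label] = (factor, exponent)
    print(f"{label}: factor {factor:.2f}, implied rate T^-{exponent:.2f}")
```

Each reduction factor lies between $\sqrt{8} \approx 2.83$ (the $T^{1/2}$ rate) and 8 (the T rate), so each implied exponent lies between one-half and one.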
Second, the biases increase uniformly in $\rho_2$ and decrease uniformly in
$\sigma_1/\sigma_2$. To understand this, we can rewrite (9) and (10) to get

$$y_t = 2v_t - u_t, \qquad z_t = u_t - v_t.$$

Since $\rho_1 = 1$, $\{v_t\}$ is a random walk and therefore asymptotically
dominates $\{u_t\}$. Hence the co-integrating parameter of $-2$. In finite
samples the regression will come closer to revealing this long-run
relationship if the variance of $u_t$ is small relative to that of $v_t$. Recall
that by equation (10), $u_t$ is the discrepancy from this long-run relationship.
Smaller values of $\rho_2$ and smaller values of $\sigma_2$ (larger values of
$\sigma_1/\sigma_2$) make the variance of $u_t$ relatively small, and so we obtain smaller
biases as $\rho_2$ falls or as $\sigma_1/\sigma_2$ rises.

FIG. 7.3(a). No constant in model, estimated bias v. sample size, s = 1
FIG. 7.3(b). Constant in model, estimated bias v. sample size, s = 1
FIG. 7.4(a). No constant in model, estimated bias v. sample size, s = 0.5
FIG. 7.4(b). Constant in model, estimated bias v. sample size, s = 0.5
The fact that these biases do disappear less quickly than T, and may
remain substantial for sample sizes large relative to many found in
economics, suggests that the results from pure static models must be
treated with caution. We will later examine ways in which we can
improve upon simple static estimation either by including dynamic
elements, adjusting the results of the static model, or estimating a
system of equations.
Finally, the biases are strongly positively correlated with $(1 - R^2)$,
which indicates that co-integrating regressions with values of $R^2$ well
below unity should be viewed with caution.⁷ However, in the context of
multivariate regressions, a high value of $R^2$ is not sufficient to guarantee
that the biases are small. This is because the $R^2$ of an equation cannot
fall when an additional variable is added to it. Thus, the inference that
high values of the $R^2$ imply low biases, especially where the former may
have been achieved by an ad hoc addition of regressors, is not valid.
Banerjee et al. (1986) explore the relationship between bias and
$(1 - R^2)$ in more detail.
It is useful to consider an informal explanation for the existence of
biases in static regressions. The effect of using static regressions to
estimate the co-integrating slope $\beta$ is to allow the residual $u_t$ to capture
all the dynamic adjustment terms. According to the super-consistency
theorem, this is certainly permissible asymptotically. It is important to
emphasize that the problem we are discussing here is strictly a
finite-sample one; the omission of the dynamics may be justified
asymptotically by observing that, as they are of a lower order of magnitude than
the non-stationary terms in the regression, they may be ignored in the
limit. However, the omitted dynamics, despite being of a lower order of
magnitude, can matter considerably in determining biases even in fairly
large but finite samples.⁸ Hence it seems appropriate to pay attention to
modelling the omitted terms.

⁷ We are grateful to Tom Rothenberg for pointing out that $R^2$ is a random variable in
the present context. However, it remains a useful descriptive statistic.
⁸ The problem of finite sample biases was also demonstrated by Hendry and Neale
(1987). Using recursive procedures for OLS estimation, they estimated a bivariate static
regression for sample sizes ranging from 40 to 200, considering the bias of the coefficient
estimate for each sample size. The results indicated that, even for sample sizes of 200, the
long-run coefficient from the static regression was approximately 0.7 while the true
long-run coefficient was 1.0. Convergence to the true value was not nearly as fast in
practice as $T^{-1}$, which dominates for sufficiently large T: see (18) below.

The dynamic terms can all be parameterized in terms of I(0) series of
the form $\Delta y_{t-i}$, $\Delta z_{t-j}$, and $(y - \gamma z)_{t-k}$, where the values of i, j, and k
will depend upon the nature of the ARIMA process generating $\{y_t\}$ and
$\{z_t\}$.⁹ Consider a simple model in which $\{z_t\}$ is strongly exogenous for
the regression parameters and the true dynamic relationship, apart from
deterministic components, is given by

where $\{y_t\}$ and $\{z_t\}$ are CI(1, 1).¹⁰ The errors are mean zero, mutually
and serially uncorrelated normal variates. The variances of $\varepsilon_{1t}$ and $\varepsilon_{2t}$
are denoted by $\sigma_1^2$ and $\sigma_2^2$ respectively. Suppose that economic theory
suggests that, in the long run, the homogeneity restriction $\sum_{i=1}^{3} \gamma_i = 1$
holds. Equation (14a) can be rewritten as

or as

Now $(y - z)$ and $\Delta z$ must both be I(0) using the co-integration
assumption, as is $\varepsilon_{1t}$. Hence, by estimating the static regression

the dynamics, given by $\Delta z_t$ and $(y - z)_{t-1}$, are all contained in the
residual $u_t$; when $|\gamma_1| < 1$, $\beta = (\gamma_2 + \gamma_3)/(1 - \gamma_1)$. In general, $u_t$ will be
serially correlated. Its long-run variance $\sigma^2$, which appears in the
expressions for the Wiener distributional limits of the sample moments,
is given by

where

It may then be shown that

Phillips (1986) shows that it is the presence of $\Lambda$ in (18) that causes the
biases.
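The way in which the static residual absorbs the dynamics can be made explicit with a short derivation. It is a sketch under a stated assumption: that (14a) has the first-order autoregressive-distributed lag form written on the first line below. That form is consistent with $\beta = (\gamma_2 + \gamma_3)/(1 - \gamma_1)$ and with the dynamic terms $\Delta z_t$ and $(y - z)_{t-1}$ named in the text, but it is a reconstruction rather than a quotation.

```latex
\begin{align*}
y_t &= \gamma_1 y_{t-1} + \gamma_2 z_t + \gamma_3 z_{t-1} + \varepsilon_{1t}
  && \text{(assumed form of (14a))} \\
\Delta y_t &= \gamma_2 \Delta z_t + (\gamma_1 - 1)\, y_{t-1}
  + (\gamma_2 + \gamma_3)\, z_{t-1} + \varepsilon_{1t}
  && \text{(subtract } y_{t-1}\text{; write } z_t = \Delta z_t + z_{t-1}\text{)} \\
\Delta y_t &= \gamma_2 \Delta z_t + (\gamma_1 - 1)(y - z)_{t-1} + \varepsilon_{1t}
  && \text{(homogeneity: } \gamma_2 + \gamma_3 = 1 - \gamma_1\text{)} \\
y_t - z_t &= \gamma_1 (y - z)_{t-1} + (\gamma_2 - 1)\Delta z_t + \varepsilon_{1t}
  && \text{(subtract } \Delta z_t\text{; add } (y - z)_{t-1}\text{)}
\end{align*}
```

On this reading the static residual $u_t = y_t - z_t$ follows a first-order autoregression driven by $(\gamma_2 - 1)\Delta z_t + \varepsilon_{1t}$. It is therefore serially correlated and, unless $\gamma_2 = 1$, correlated with the innovations to $z_t$.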
⁹ See e.g. the derivation of the ECM representation in Ch. 5 for CI(1, 1) series.
¹⁰ A simple rewriting of equation (10) above, to take account of the structure of the
residual autocorrelation, gives us a version of (14a) with the $\gamma_i$ suitably interpreted. Later
in this chapter we consider a generalization of (14) and investigate the consequences of
using static and dynamic regressions.

A simple way to reduce the biases is to reparameterize the equation
in such a way that $\Lambda$ is set at zero. Both (15a) and (15b) satisfy this
property. For comparison, following Banerjee et al. (1986), we ran a
second set of experiments in order to investigate the effects of such
re-parameterizations. Using the DGP given by (14a)-(14b), we estimate
equation (15a), with a lagged z included as an extra regressor. The
dynamic regression equation estimated is therefore

The extra lagged variable, $z_{t-1}$, is included to avoid imposing
homogeneity (see Chapter 2), as it would be unrealistic to assume that the
investigator knows the precise form of the data-generation process. The
co-integrating coefficient is estimated by computing the expression
$1 - d/c$: see Sect. 2.4. The static regression given by (16) is also
estimated.
The strong exogeneity property required of $z_t$ is guaranteed, in the
design of the experiment, by drawing $\varepsilon_{1t}$ and $\varepsilon_{2t}$ from uncorrelated
pseudo-normal distributions. The values of $\gamma_i$ (i = 1, ..., 3) are varied
as in Table 7.3, while ensuring that long-run homogeneity is preserved.
The sample sizes and the ratio of the standard deviations of $\varepsilon_{1t}$ and $\varepsilon_{2t}$
are also varied, to give a set of 90 experiments. The simulations are all
conducted with 5000 replications.
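A compact sketch of this second experiment is given below. It is illustrative rather than a reproduction of the authors' GAUSS code. It assumes that (14a) is $y_t = \gamma_1 y_{t-1} + \gamma_2 z_t + \gamma_3 z_{t-1} + \varepsilon_{1t}$ with $\gamma_3 = 1 - \gamma_1 - \gamma_2$, that (14b) makes $z_t$ a pure random walk, that the ratio s is $\sigma_1/\sigma_2$ with $\sigma_2 = 1$, and that neither regression contains a constant. The names `one_draw`, `g1`, `g2` and `g3` are introduced for the sketch.

```python
import numpy as np

def one_draw(T, g1, g2, s, rng):
    """Static and dynamic estimates of the long-run coefficient for one sample.

    Assumed DGP: y_t = g1*y_{t-1} + g2*z_t + g3*z_{t-1} + e1_t, g3 = 1 - g1 - g2,
                 z_t = z_{t-1} + e2_t, sd(e1) = s, sd(e2) = 1, e1 and e2 independent.
    """
    g3 = 1.0 - g1 - g2                      # long-run homogeneity
    e1 = rng.normal(0.0, s, T + 1)
    e2 = rng.normal(0.0, 1.0, T + 1)
    z = np.cumsum(e2)
    y = np.zeros(T + 1)
    for t in range(1, T + 1):
        y[t] = g1 * y[t - 1] + g2 * z[t] + g3 * z[t - 1] + e1[t]
    # Static regression: y_t on z_t.
    beta_s = (z[1:] @ y[1:]) / (z[1:] @ z[1:])
    # Dynamic regression: dy_t on dz_t, (y - z)_{t-1} and z_{t-1}.
    X = np.column_stack([np.diff(z), (y - z)[:-1], z[:-1]])
    _, c, d = np.linalg.lstsq(X, np.diff(y), rcond=None)[0]
    return beta_s, 1.0 - d / c              # long-run estimate is 1 - d/c

rng = np.random.default_rng(0)
draws = np.array([one_draw(100, 0.5, 0.1, 1.0, rng) for _ in range(1000)])
static_bias = float(draws[:, 0].mean() - 1.0)        # true coefficient is 1
dynamic_bias = float(np.median(draws[:, 1]) - 1.0)   # median: ratio may be heavy-tailed
print(round(static_bias, 3), round(dynamic_bias, 3))
```

The median is used for the dynamic estimate because $1 - d/c$ is a ratio and can take extreme values when c is estimated close to zero.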
The purpose of the first part of this exercise is to compare the biases
in the estimates of the co-integrating parameter obtained from dynamic
regression with those obtained from the static regression. (The true
value of the co-integrating parameter is 1.) Some of the results for
different configurations of the $\gamma_i$ parameters and standard-deviation
ratios are given in Table 7.3. We report the estimated biases, for five
different sample sizes, in the static model. The corresponding estimated
biases from the dynamic regression (where the co-integrating parameter
is calculated as $(1 - d/c)$) are in almost all cases so small as to be within
2 Monte Carlo standard errors of zero and so are not reported. We will
return to the comparison of these estimators (static and dynamic) below;
for the time being, the noteworthy point is simply that substantial biases
remain in static estimates for parameter combinations at which the
biases in dynamic estimates are zero, or very close to zero, since the
dynamic model has been specified so as to make $\Lambda$ close to zero.
While the dynamic estimates contain negligible biases in these
examples, $z_t$ is strongly exogenous for the parameter of interest. While it
is fairly straightforward to extend this specification to include weakly
exogenous $z_t$, the usefulness of estimates from dynamic single equations
is reduced substantially if the regressors are not weakly exogenous. It
also becomes difficult to make unambiguous comparisons between
dynamic and static single-equation estimates. We discuss this issue
below.

TABLE 7.3. Biases in static models^a
DGP: (14a) + (14b); 5000 replications

                                               Sample size (T)
                                         25      50      100     200     400
$\gamma_1 = 0.9$, $\gamma_2 = 0.5$, s = 3     -0.39   -0.25   -0.15   -0.07   -0.04
$\gamma_1 = 0.9$, $\gamma_2 = 0.5$, s = 1     -0.32   -0.22   -0.14   -0.08   -0.04
$\gamma_1 = 0.5$, $\gamma_2 = 0.1$, s = 3     -0.23   -0.13   -0.07   -0.03   -0.02
$\gamma_1 = 0.5$, $\gamma_2 = 0.1$, s = 1     -0.21   -0.12   -0.06   -0.03   -0.02

a Standard errors of these estimates vary widely, but the estimated biases are
in almost all cases significantly different from zero, for sample sizes of 50 or
greater. Note that again the biases appear to decline less quickly than $T^{-1}$, but
more quickly than $T^{-1/2}$. Calculations were undertaken using GAUSS.
Recalling the discussion in Chapter 5, a test of the null hypothesis
$H_0: c = 0$, based on the t-statistic $t_{c=0}$, is a valid test for
co-integration.¹¹ This statistic, under the null of no co-integration, is not
asymptotically normally distributed. Therefore a second part of the exercise
was used to compute the critical values of the distribution of $t_{c=0}$ and to
use these critical values to derive the power of this statistic, for a range
of cases, to detect co-integration. This is an example of a test of
co-integration based not directly on the residuals, but on a regression
coefficient. A power comparison, between a residual-based test and the
t-test, is given in Table 7.7; but first we use a more general DGP to
consider further the issue of finite sample biases.

¹¹ When y and z are not co-integrated, $(y - z)_{t-1}$ is I(1), in which case (19) can only be
balanced if c = 0. This observation forms the logical basis for a test of co-integration based
on $t_{c=0}$. The strong exogeneity of $z_t$ (for the parameters in (14a)) ensures that a test based
on estimates from a single equation such as (19) is fully efficient.

7.4.1. General Data-generation Processes

Consider now the comparison of static and dynamic estimates of the
long-run multiplier when the time series are derived from a more
general DGP. The experiments described above are special cases of this
more general DGP. The 'static' estimate of the co-integrating coefficient
$\beta$ is called $\hat\beta_s$, while the dynamic estimate is denoted $\hat\beta_d$.
The exogenous variable is generated as

so that $z_t$ can be made either I(0) or I(1) by choice of $\varphi$. The process
generating $\{y_t\}$ is an autoregressive-distributed lag model with three lags
on both endogenous and exogenous variables:

Finally, the dynamic regression model¹² is

In comparing the data-generation process with the model, three
interesting cases can be identified. These are the cases in which the
model is over-parameterized, under-parameterized, and exactly
parameterized.
For each of these cases, several sub-cases, which derive from the
integration properties of the $\{z_t\}$ and the $\{y_t\}$ series, are of interest. In
particular, we might be interested in determining whether the relative
performances of the static and dynamic regressions depend upon the
proximity of the largest latent root of either of the processes to
unity: whether, for example, performance when the $\{z_t\}$ series is I(0)
and the $\{y_t\}$ series is very nearly I(1) differs from that which holds when
the $\{z_t\}$ series is I(1) and the $\{y_t\}$ series is nearly I(2), or whether in
general the results are affected by specifying the $\{z_t\}$ series to be
non-stationary ($\varphi = 1.0$) rather than clearly stationary ($\varphi = 0.5$).
In order to facilitate interpretation, a typology of the particularly
interesting cases is presented in Table 7.4. The simulation results appear
in Table 7.5 and were computed using GAUSS.
It is important to note that the series $\{y_t\}$ has the same order of
integration as $\{z_t\}$ in each of the cases treated below; it is simply close
to deviating from the order of integration of $\{z_t\}$ if the sum of the
parameters $\rho_1 + \rho_2 + \rho_3$ is close to 1. In finite samples, it will be
difficult to distinguish this proximity ('borderline' stationarity, in case B)
from an actual difference in orders of integration.
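The sensitivity of the long-run solution to the sum of the autoregressive parameters can be illustrated with the standard long-run multiplier of an autoregressive-distributed lag model, which is the sum of the coefficients on z divided by one minus the sum of the coefficients on lagged y. The sketch below is generic: the function name and the numerical coefficient values are hypothetical and are not taken from the experiments.

```python
def long_run_multiplier(rho, beta):
    """Long-run response of y to z in the ADL model
    y_t = sum_i rho[i]*y_{t-1-i} + sum_j beta[j]*z_{t-j} + error."""
    denom = 1.0 - sum(rho)
    if denom == 0.0:
        raise ValueError("sum(rho) = 1: the long-run solution is undefined")
    return sum(beta) / denom

# Clearly stationary dynamics: sum(rho) = 0.5.
stationary = long_run_multiplier([0.5, 0.0, 0.0], [0.25, 0.25, 0.0, 0.0])
# Borderline case: sum(rho) = 0.99, so the denominator is only 0.01.
borderline = long_run_multiplier([0.9, 0.05, 0.04], [0.25, 0.25, 0.0, 0.0])
print(stationary, borderline)
```

With the same coefficients on z, moving the autoregressive sum from 0.5 to 0.99 raises the long-run multiplier from 1 to about 50, so small estimation errors in the autoregressive coefficients translate into large errors in the long-run estimate.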
¹² This produces an estimate of the co-integrating parameter equal to that produced by
linear transformations such as the error-correction form.

TABLE 7.4. Examples of properties of $\{z_t\}$ and $\{y_t\}$ for various
parameter values^a

Case   $\rho_1 + \rho_2 + \rho_3$^b   $\varphi$    Property of $\{z_t\}$   Property of $\{y_t\}$
A      < 1                            < 1          I(0)                    I(0)
B      = 1                            < 1          I(0)                    nearly^c I(1)
C      = 1                            1.0          I(1)                    nearly I(2)
D      < 1                            1.0          I(1)                    I(1)

a Parameters are those appearing in equations (20) and (21).
b In Table 7.5 below we treat values of 0.99 as '= 1' and values of 0.95 or
lower (in absolute value) as '< 1'. Note that we cannot have $\rho_1 + \rho_2 + \rho_3 = 1$
exactly, since the term $(1 - \rho_1 - \rho_2 - \rho_3)^{-1}$ appears in the equation for the
long-run equilibrium solution.
c By 'nearly', we mean that the series in question is on the borderline
between two orders of integration: in finite samples the difference between, for
example, an AR(1) with parameter 1.0 and an AR(1) with parameter 0.99 is a
difference of degree rather than of kind. We say that $z_t = 0.99z_{t-1} + e_t$ is nearly
I(1): see Ch. 3.

The Monte Carlo results are organized as follows. Table 7.5 contains
three sections, applying to models that are exactly parameterized, over-,
and under-parameterized. For each case we report percentage biases in
the estimation of a scalar co-integration parameter and the standard
errors of the experimental estimates of those biases, for a range of
parameter values representative of each of the cases A, B, C, D above.
Entries are marked as being examples of either case A, B, C, or D.
Our intent in examining the results is not simply to draw conclusions
about the relative merits of the static and dynamic regressions, bearing
in mind that in practice the investigator does not know the form of the
DGP, and so cannot in general produce a model that contains precisely
the correct number of lags of relevant variables. We are also interested
in discovering the cases in which one or both of the methods (static and
dynamic regression) yield especially large finite-sample biases. All
results pertain to a sample size of 120 observations.
The following conclusions emerge from examination of Table 7.5 and
the examples of each of the four cases A-D.
First, the dynamic regression tends to produce lower biases in
estimates of the co-integration parameter. This result does not depend
upon a close correspondence between the dynamic model and the
data-generation process: for example, even in the case where the DGP
is a simple one, so that the model used here is substantially
over-parameterized, estimates from a dynamic model tend to be at least as
good as from the static model.

[TABLE 7.5. Percentage biases in static and dynamic estimates of the co-integrating parameter, with standard errors, for exactly parameterized, over-parameterized, and under-parameterized models (parts (a)-(c)); T = 120; each entry marked as case A, B, C, or D.]

Second, an especially troublesome case arises where the roots of the
lag polynomial in the series $\{y_t\}$ are such that $\{y_t\}$ is close to being an
I(1) series in spite of the stationarity of $\{z_t\}$. This case leads to the
largest biases of those examined here. As it is already known that
regression of I(0) series on I(1) series produces troublesome results,
especially in the form of non-standard distributions of test statistics, this
result in the opposite case is unsurprising. It suggests that the
investigation of the properties of t- and F-statistics that would be generated in
the co-integrating regressions examined here may be of independent
interest.
A feature of these nearly unbalanced regressions is that the biases in
the estimates of the long-run multiplier in the static regressions tend to
be associated with much lower standard errors than those in the
corresponding dynamic regressions. This is attributable to the result that
the variance of $\hat\beta_s$ is of order $T^{-2}$ while in the dynamic regression,
because of the asymptotic normality of the coefficient estimates, the
variance of $\hat\beta_d$ is of order $T^{-1}$.¹³ Expressed differently, when c is small,
$\hat\beta_d$ can take extremely large values, and in finite samples may not have
any analytical moments (see Sargan 1980 and Hendry 1991a).
Finally, the under-parameterized dynamic regressions, for a wide
range of parameter values, perform notably worse than their correctly
and over-parameterized counterparts. In the absence of a priori
information about lag structures this would appear to support the inclusion of
a fairly rich dynamic structure in the regression. Note also that in parts
(a) and (c) of Table 7.5 there is at least one case in which the static
regression appears to be superior to the dynamic. Hence a preference
for the dynamic form seems reasonable based on the overall results, but
the results should not be interpreted to mean that the dynamic
regression is invariably superior even with strong exogeneity.
The classification recorded in Table 7.4 helps us to interpret further
the results appearing in Table 7.5. Cases labelled A and D are examples
of balanced regressions. However, while case A represents regressions of
an I(0) variable on other I(0) variables, case D represents regressions of
an I(1) variable on other I(1) variables. Thus in case A experiments, the
omitted dynamics are important and the dynamic model should perform
noticeably better than the corresponding static model. An examination
of parts (a) and (b) of Table 7.5 shows that this is indeed true.
The more interesting case, from the point of view of the study of
co-integration, is case D, in which we have two I(1) processes that are
co-integrated. It may be seen that, for this case too, in a substantial
majority of experiments the dynamic regression estimates of the
long-run coefficient are more accurate than the static estimates. This recalls,
in the context of this more general data-generation process, the
character of the results in Banerjee et al. (1986).

¹³ The rates of convergence are determined by using the SSW theorems, discussed in
Ch. 6.
Cases B and C denote nearly unbalanced regressions, and the results
here should be interpreted with caution. In a sense, one might argue
that the regressions are spurious because variables of different orders of
integration cannot be linked by an equilibrium relationship. Hence,
following from the work of Phillips (1986), these regressions are likely
to be characterized by asymptotically divergent coefficient estimates and
t-statistics. We would also expect both the static and dynamic
regressions to behave rather badly, with the behaviour worsening the further
the regression moves away from balance. For example, in case B, the
greater the absolute discrepancy between $\varphi$ and $\rho_1 + \rho_2 + \rho_3$, the larger
the biases in the estimates. As $\varphi$ approaches 1 (for $\rho_1 + \rho_2 + \rho_3$ close to
the unit circle), case B approaches case C, and the biases are generally
lower. Case C represents unbalanced regressions of a rather special
kind, namely, regressions of a near-I(2) variable on I(1) and near-I(2)
variables, and the properties of such regressions appear to be better
than might have been expected (see Chapter 3).
Where the exogenous variable is I(1), the regression estimates are
super-consistent and the bias for the properly specified model is close to
zero. Where each of the series is I(0) larger biases can appear; the
largest arise where one series deviates from another by a quantity that is
close to being non-stationary.
In sum, substantial biases in static OLS estimators exist, and
specifying dynamic regressions can help alleviate the problem. The desirability
of using dynamic regressions is reinforced by a consideration of their
ability to detect co-integration, so we can compare the power properties
of a test based on dynamic models with one based on the residuals from
static regressions. Section 7.6 illustrates the two methods empirically. In
Section 7.7, we consider methods of correcting static estimators of
co-integrating vectors and discuss their properties.

7.5. Powers of Single-equation Co-integration Tests

A range of alternative tests for co-integration has been discussed in
earlier sections, and here we comment on a number of features that
influence test power, following the analysis in Kremers, Ericsson, and
Dolado (1992).
Reconsider the DGP in (9)-(12) above, in case C:

$$\Delta z_t + \Delta y_t = \varepsilon_{1t} \tag{9'}$$
$$\Delta y_t + 2\Delta z_t = (\rho_2 - 1)(y_{t-1} + 2z_{t-1}) + \varepsilon_{2t}. \tag{10'}$$

The static regression involves estimating an equation of the form

and the DF test is conducted on

where $v_t = y_t - \hat\beta z_t$. The DGP is optimal for the DF test here because
(10') has a valid common factor when $E(\varepsilon_{1t}\varepsilon_{2t}) = 0$ (see Hendry and
Mizon 1978, and Sargan 1980). Since $\beta = -2$, $v_t = y_t + 2z_t$, so that (10')
corresponds to $\Delta v_t = (\rho_2 - 1)v_{t-1} + \varepsilon_{2t}$ and hence $\omega_t$ coincides with $\varepsilon_{2t}$
except for terms involving $(\hat\beta - \beta)z_t$, etc. For this reason, the DGP
selected by Engle and Granger (1987) is relatively favourable to the DF
test.
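The residual-based procedure described above can be sketched as follows: estimate the static regression, then compute the Dickey-Fuller t-statistic from a regression of the differenced residual on its own lag. The code is an illustration under assumptions: the level relations are taken to be $y_t + z_t$ (a random walk) and $y_t + 2z_t$ (an AR(1) with coefficient `rho2`), the two innovations are independent, and no constant or lagged differences are included. The function name is hypothetical, and the residual is called `res` to avoid confusion with the error $v_t$ of the DGP.

```python
import numpy as np

def df_t_on_static_residuals(rho2, T, rng, s=1.0):
    """DF t-statistic computed from the residuals of the static regression of y on z."""
    eps1 = rng.normal(0.0, s, T)
    eps2 = rng.normal(0.0, 1.0, T)
    rw = np.cumsum(eps1)                     # y + z: random walk
    ar = np.empty(T)                         # y + 2z: AR(1) with coefficient rho2
    ar[0] = eps2[0]
    for t in range(1, T):
        ar[t] = rho2 * ar[t - 1] + eps2[t]
    z, y = ar - rw, 2.0 * rw - ar
    beta_hat = (z @ y) / (z @ z)             # static regression, no constant
    res = y - beta_hat * z
    lag, d = res[:-1], np.diff(res)
    phi = (lag @ d) / (lag @ lag)            # DF regression: d(res)_t on res_{t-1}
    e = d - phi * lag
    se = np.sqrt((e @ e) / (len(d) - 1) / (lag @ lag))
    return phi / se

rng = np.random.default_rng(0)
t_coint = float(np.median([df_t_on_static_residuals(0.6, 100, rng) for _ in range(500)]))
t_none = float(np.median([df_t_on_static_residuals(1.0, 100, rng) for _ in range(500)]))
print(round(t_coint, 2), round(t_none, 2))
```

Setting `rho2 = 1.0` removes the co-integrating relation, so comparing the two medians shows how far the statistic moves to the left when co-integration is present. Critical values are not hard-coded here because they depend on the deterministic terms and on the number of regressors.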
By contrast, consider the DGP in (14) with the static regression in
(16) and the same form of DF test as in (24):

In this case, $u_t = y_t - \hat\beta z_t$ so that in (25), evaluated at $\hat\beta = \beta$,

hence

In (26), a common-factor restriction is imposed on the dynamics, but
this time it is not necessarily a valid representation of (14a). Indeed,
since $\beta = 1$ by homogeneity, (14a) can be written as

Comparison with (26) reveals that the new error $[\varepsilon_{1t} + (\gamma_2 - 1)\Delta z_t]$ is
white noise, but has a larger variance than that of the error in (14a).
Kremers et al. (1992) show that $t_\varphi$ in (24) retains the Dickey-Fuller
distribution under the null, $\varphi = 0$, whereas for a fixed $\beta$ (such as unity)
$t(\gamma_1 - 1)$ in (14a) can be approximated by N(0, 1) when $(\gamma_2 - 1)^2\sigma_2^2/\sigma_1^2$ is
sufficiently large. However, when $\beta$ is estimated under the null of no
co-integration, the second test ceases to have a normal distribution.
Under th e alternativ e o f co-integration , t(y\ 1) = (GW/OI)^ an d
hence ha s higher power , a resul t verifie d b y powe r approximation s
based o n near-integrate d processes . T o illustrat e thi s for m o f powe r
analysis, conside r testin g th e nul l H 0 : p = 1 i n y t = pyt-\ + v, agains t
the loca l alternativ e H T : p - exp(e/r) , < 0, usin g the Dickey-Fuller
f-test. Whe n th e v, are IID, then under HQ


using results demonstrated in Chapter 3.

Under $H_T$, however, using results on near-integrated processes in (3.40)-(3.42),

(29)
where $\eta = \left(\int_0^1 K_\varepsilon(r)\,dW(r)\right)\left(\int_0^1 K_\varepsilon(r)^2\,dr\right)^{-1}$. When $\varepsilon = 0$, we reproduce the distribution under $H_0$. Otherwise, for $\varepsilon < 0$, the distribution is shifted to the left by $\varepsilon\left(\int_0^1 K_\varepsilon(r)^2\,dr\right)^{1/2}$. When $T = 100$, $\varepsilon = -1$ implies that $\rho = 0.99$, and $\varepsilon = -5$ implies that $\rho = 0.95$; as $\varepsilon \to -\infty$, the power tends to 1.
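The local-alternative mapping $\rho = \exp(\varepsilon/T)$ can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function names are hypothetical, the DF regression has no constant or trend, the errors are standard normal, and the 5% critical value of -1.95 is the conventional tabulated value for that case rather than a figure taken from this chapter.

```python
import math
import random


def local_rho(eps, T):
    """Autoregressive root implied by the local alternative rho = exp(eps / T)."""
    return math.exp(eps / T)


def simulate_ar1(rho, T, rng):
    """Generate y_0 = 0, y_t = rho * y_{t-1} + v_t with IID N(0, 1) errors."""
    y = [0.0]
    for _ in range(T):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


def df_tstat(y):
    """t-ratio on phi in the regression dy_t = phi * y_{t-1} + error."""
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(v * v for v in lagged)
    phi = sum(l * d for l, d in zip(lagged, diffs)) / sxx
    rss = sum((d - phi * l) ** 2 for l, d in zip(lagged, diffs))
    s2 = rss / (len(diffs) - 1)
    return phi / math.sqrt(s2 / sxx)


def rejection_rate(eps, T=100, reps=2000, crit=-1.95, seed=0):
    """Share of replications in which the DF t-test rejects rho = 1."""
    rng = random.Random(seed)
    rho = local_rho(eps, T)
    hits = sum(df_tstat(simulate_ar1(rho, T, rng)) < crit for _ in range(reps))
    return hits / reps
```

Calling `rejection_rate` for increasingly negative values of `eps` traces out an empirical power curve; `local_rho(-1, 100)` and `local_rho(-5, 100)` reproduce the two roots quoted in the text to two decimal places.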
Kremers et al. (1992) argue that similar considerations show that the non-centrality parameter of the ECM-based test for co-integration is larger than that of the non-parametric statistics discussed in Chapter 4. Their Monte Carlo results support these asymptotic results.
Return now to the Monte Carlo experiment given by equations (14a)-(14b). One appealing test for co-integration that we have mentioned consists in using the model (15a), where, under the null of no co-integration, $\gamma_1 = 1$ so that the second coefficient is equal to zero. A t-test for this condition is therefore a test for co-integration. While we would expect the distribution of this test statistic to be non-standard, it is a straightforward test and would therefore be especially useful if its power were high. In particular, for strongly exogenous regressors it is similar (see Kiviet and Phillips 1992).
We examine the test with a small Monte Carlo experiment, comparing its power with that of the ADF test based on a static model to estimate the co-integrating parameter, in the DGP given by (14a)-(14b). The first test is the ADF test with one lag, computed from the residuals of the static regression (16). The second test is based on the t-statistic for $c$ in (19). As noted earlier, if the null of no co-integration is true, $c = 0$. Under the null (i.e. $\gamma_1 = 1$, $\gamma_2 = \gamma_3 = 0$, $\sigma_1 = \sigma_2 = 1$ in (14a)-(14b)), $t_{c=0}$ has a Wiener distribution. The critical values of this distribution and the ADF were computed by simulating the null model for 5000 replications using PC-NAIVE (Hendry, Neale, and Ericsson 1990). These critical values were then used for computing the test power, and are shown in Table 7.6 for regressions like (19) with an intercept. (The population constant is zero.) The same critical values result for $\gamma_2 + \gamma_3 = 0$ when these parameters are individually non-zero, so the test is similar for the impact of $\Delta z_t$: this finding is based on replicating the null experiment at different parameter values using the same random numbers. The ADF(1) critical values are also the same for all the values of the null model's parameters since the ADF test is known to be similar. (The same Monte Carlo trick was used to check that feature: see Banerjee and Hendry 1992.)

TABLE 7.6. Fractiles of t-statistic for $H_0$: $c = 0$ in (19)

         Fractiles of t(c=0) in (19)      Fractiles of ADF(1)
  T        0.10     0.05     0.01        0.10     0.05     0.01
  25      -2.99    -3.42    -4.22       -3.15    -3.51    -4.30
  50      -2.95    -3.33    -4.06       -3.10    -3.41    -4.08
  100     -2.93    -3.28    -3.95       -3.09    -3.39    -4.00

The $t_{c=0}$ fractiles are slightly closer to zero than the corresponding fractiles of the augmented Dickey-Fuller distribution. Under the alternative hypothesis of co-integration, $t_{c=0}$ is asymptotically normally distributed.
Each entry in Table 7.7 shows the proportional frequency of rejection of the false null hypothesis of no co-integration.14 The power of each test for each set of parameter values of the DGP and sample size is shown separately. At small values of $1 - \gamma_1$, the power $P_a$ of the ADF(1) test is very close to that of the $t_{c=0}$ test ($P_c$), but the latter dominates as $1 - \gamma_1$ increases. Increasing the signal-noise ratio $\sigma_2/\sigma_1$ or $(1 - \gamma_2)$ also favours $P_c$. The powers converge to unity as the sample size $T$ increases, but slowly when $(1 - \gamma_1) = 0.1$.
Thus, the power of $t_{c=0}$ relative to the ADF increases with $(1 - \gamma_1)$, $(1 - \gamma_2)^2/s$, and $T$, matching the results in Kremers et al. noted above. The first three experiments have dynamics that are close to satisfying a common factor restriction: the ADF equation has a residual standard error that is only about 4 per cent larger in (a) than the DGP. On these experiments the ADF test does relatively well, although both tests do poorly in absolute terms. When a common factor approximation is poor as in (f), the ADF test suffers about an 85 per cent increase in the residual standard error by imposing the common factor and does relatively badly, in some cases dramatically so (e.g. $T = 50$ at the 1% significance level). Owing to the large value of $(1 - \gamma_1)$, both tests do well absolutely for sample sizes of 100.
TABLE 7.7. Test rejection frequencies in ECMs
DGP: (14a) + (14b); 5000 replications

Estimated power at given fractile (each entry is t(c=0)/ADF)

                 0.10          0.05          0.01
(a): $\gamma_1 = 0.9$, $\gamma_2 = 0.5$, $s = 3$ (a)
  T = 25      0.13/0.13     0.06/0.06     0.01/0.01
  T = 50      0.21/0.17     0.10/0.10     0.02/0.02
  T = 100     0.44/0.31     0.26/0.20     0.07/0.05
(b): $\gamma_1 = 0.9$, $\gamma_2 = 0.5$, $s = 1$
  T = 25      0.14/0.11     0.06/0.05     0.01/0.01
  T = 50      0.21/0.15     0.10/0.09     0.02/0.02
  T = 100     0.49/0.30     0.30/0.19     0.08/0.04
(c): $\gamma_1 = 0.9$, $\gamma_2 = 0.5$, $s = 1/3$
  T = 25      0.13/0.10     0.07/0.05     0.02/0.01
  T = 50      0.24/0.13     0.12/0.07     0.03/0.01
  T = 100     0.59/0.24     0.40/0.14     0.13/0.03
(d): $\gamma_1 = 0.5$, $\gamma_2 = 0.1$, $s = 3$
  T = 25      0.66/0.35     0.45/0.20     0.16/0.05
  T = 50      0.99/0.84     0.97/0.72     0.78/0.34
  T = 100     1.00/1.00     1.00/1.00     1.00/0.97
(e): $\gamma_1 = 0.5$, $\gamma_2 = 0.1$, $s = 1$
  T = 25      0.79/0.31     0.66/0.18     0.29/0.04
  T = 50      1.00/0.80     1.00/0.67     0.94/0.28
  T = 100     1.00/1.00     1.00/1.00     1.00/0.96
(f): $\gamma_1 = 0.5$, $\gamma_2 = 0.1$, $s = 1/3$
  T = 25      0.94/0.23     0.87/0.12     0.64/0.03
  T = 50      1.00/0.75     1.00/0.60     1.00/0.22
  T = 100     1.00/1.00     1.00/1.00     1.00/0.94

(a) $s = \sigma_1/\sigma_2$.

14 As in Table 7.6, all the results are based on 5000 replications.

The test powers respond in a nonlinear way to changes in the design parameter values, but some understanding of the rejection frequencies in Table 7.7 can be obtained from the following analysis. Neglecting the intercept, the ADF test essentially involves testing $\gamma_1 = 1$ in

where the first step regression of $y_t$ on $z_t$ estimates $\beta$, which here has a population value of unity. Under the alternative, $y_{t-1} - \beta z_{t-1}$ is stationary, and for $\beta = 1$ the non-centrality of the ADF pseudo t-test will be given approximately by


(see Mizon and Hendry 1980), where ASE denotes the coefficient asymptotic standard error calibrated to a sample size of $T$. For given design parameter values, the ASE is easily calculated using PC-NAIVE, and some outcomes are shown below.
Similarly, the $t_{c=0}$ test is actually based on testing $\gamma_1 = 1$ in

Since the regressor $y_{t-1} - z_{t-1}$ is stationary under the alternative, if $\gamma_3 + \gamma_2 + \gamma_1 = 1$ is imposed and hence $z_{t-1}$ omitted, the asymptotic non-centrality of the t-test of $\gamma_1 = 1$ (again in PC-NAIVE), yields the following illustrative values for $T = 25$:

  Case       (a)      (b)      (c)      (d)      (e)      (f)
  NC_adf    -1.15    -1.15    -1.15    -2.89    -2.89    -2.89
  NC_ecm    -1.19    -1.28    -1.52    -3.25    -3.88    -5.32


In practice, these approximate non-centralities were close to the mean values of the corresponding test statistics in the Monte Carlo, except for (a)-(c) for the ADF, which had a mean of about -2.15 (see (4.28)). Their values help explain both the increasing powers of both tests across the experiments and the relatively better performance of $t_{c=0}$. Compared with the critical values in Table 7.6, and given the sampling standard deviations of the tests of about 0.8 for ADF and 1.0 for $t_{c=0}$, the non-centralities also account for the absolute powers of the tests: when the mean outcome is below the critical value in absolute value, a power of less than 0.5 usually results; when the mean is more than one standard deviation below the critical value, the resulting power is under 0.2; two standard deviations lower induces a very low power; and so on. Similar arguments apply for deviations of the mean above the critical value.
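The rule of thumb relating non-centralities to power can be mimicked with a normal approximation. This is a rough sketch, not the authors' calculation: it assumes each statistic is approximately normal with mean equal to the tabulated non-centrality and standard deviation 0.8 (ADF) or 1.0 ($t_{c=0}$), and it uses the $T = 25$, 5% critical values from Table 7.6. The function and dictionary names are hypothetical.

```python
import math

# Non-centralities for T = 25 as tabulated above, by experiment.
NONCENTRALITY = {
    "a": {"adf": -1.15, "ecm": -1.19},
    "b": {"adf": -1.15, "ecm": -1.28},
    "c": {"adf": -1.15, "ecm": -1.52},
    "d": {"adf": -2.89, "ecm": -3.25},
    "e": {"adf": -2.89, "ecm": -3.88},
    "f": {"adf": -2.89, "ecm": -5.32},
}
# 5% critical values for T = 25 from Table 7.6.
CRITICAL_5PCT = {"adf": -3.51, "ecm": -3.42}
# Approximate sampling standard deviations quoted in the text.
STD_DEV = {"adf": 0.8, "ecm": 1.0}


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(noncentrality, critical_value, sd):
    """P(statistic < critical_value) when statistic ~ N(noncentrality, sd^2)."""
    return normal_cdf((critical_value - noncentrality) / sd)


def power_table():
    """Approximate 5% power of each test in each experiment at T = 25."""
    return {
        case: {
            test: approx_power(nc[test], CRITICAL_5PCT[test], STD_DEV[test])
            for test in ("adf", "ecm")
        }
        for case, nc in NONCENTRALITY.items()
    }
```

A mean exactly at the critical value gives an approximate power of 0.5, and each further standard deviation short of it cuts the power sharply, which is the pattern described above. The approximation is crude, so it should be read as indicating orderings rather than reproducing the entries of Table 7.7.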
Overall, there would seem to be some advantage in modelling dynamics less restrictively than by common factors when the latter is a poor approximation. Note that the absence of any contemporaneous effect from $\Delta z_t$ always induces a violation of common factors. Finally, since the long-run parameter is not assumed known in these experiments, the $t_{c=0}$ test procedure is an operational one, and has the same number of parameters here as the ADF test.
The main drawback to such an approach is its dependence on strong exogeneity. Boswijk (1991) proposes a Wald test for co-integration in individual equations when the regressors are not even weakly exogenous. This jointly tests the null for the coefficients of all the lagged levels in a Bårdsen formulation. The resulting test is asymptotically similar and in effect tests for a common factor of unity (see Hendry and Mizon 1978). Boswijk and Franses (1992) investigate the power of this test.


7.6. An Empirical Illustration

To illustrate several tests for co-integration in single equations, we return to consider the UK seasonally adjusted quarterly data on money demand. The raw data series were shown in Chapter 1, and we concentrate here on the DW, DF, and ADF tests based on a static regression, and on their comparison with a dynamic regression, which is heavily over-parameterized. In all cases, we assume that there is only one co-integrating vector and that it enters the money-demand model. See Kremers et al. (1992) and Ericsson, Campos, and Tran (1990) for related analyses.
The long-run determinants of the demand for transactions money $M$, as measured by M1, are the price level $P$, real income as measured by constant 1985-price total final expenditure $X_{85}$, and the opportunity cost of holding money measured by $R_n$. (See Hendry and Ericsson (1991b) for details of its calculation.) We assumed a log-linear equation, consonant with price and income homogeneity, given by

where lower-case letters denote logs, $\alpha_1 = 1$ is anticipated, and $\alpha_i > 0$, $i = 1, 2, 3$. Least-squares estimation of the static regression over the sample 1963(I) to 1989(II) yielded

The residuals were then tested for a unit root using the DF and ADF tests, the latter commencing with four lags and testing down. The following results were obtained:

No lagged values of $\Delta\hat{u}_t$ proved significant, leading to the DF test:

In no case does any test reject the null of no co-integration, as the t-values on the estimated coefficient of $\hat{u}_{t-1}$ are in the neighbourhood of 2 in both the DF and the ADF regressions. That outcome continues to hold if a trend is added to the basic static-regression model (30), or if price homogeneity is imposed and $\Delta p$ added as a regressor, corresponding to allowing $m$ and $p$ to be I(2), with $(m - p)$ and $\Delta p$ being I(1). In that last case, $R^2$ for real money is equal to only 0.68.
We assume now that $\Delta p$, $x_{85}$, and $R_n$ are weakly exogenous for the parameters in the conditional money demand model. The outcome of estimating a dynamic equation in the levels of the variables with five lags on each of $m - p$, $\Delta p$, $x_{85}$, and $R_n$ (plus a constant) by least squares is shown in Table 7.8.
TABLE 7.8. Empirical results
Variable

Lag
1

0
m p

-1.000

xss

-0.041
0.115
-0.411
0.117
-0.757
0.210
-0.124
0.169

SE
SE

Rn

SE
Ap
SE
CONSTANT
SE

0.

Sum o f
lags

A
0.164
.147
0.549
0.,240
0,,251 0 .152
0,,132
0.,135 0 ,131
0.109
0,,028
0.118
0.087
0,.162
0.293 -0,,067 -0.,240
0,,130
0 .139
0.119
0 .026
0.135
0..139
0..139
-0.361 -0,,122 -0.,046 -0 .084 -0.045 -1.070
0.130
0 .187
0.178
0.,185
0.,176 0 .175
0.069 -1,.102
0.020
0,,307 -0.,412 -0 .329
0 .222
0.255
0,.253
0,,246
0 .246
0.203
- -0.12
4
0 .169

$R^2 = 0.9966$   $\hat{\sigma} = 0.0130$   F(23, 76) = 975.38   DW = 1.976
SC = -7.853   Mean = 10.896131   SD = 0.196173
Normality $\chi^2(2)$ = 4.29   AR 1-5 F[5, 71] = 0.20   ARCH 4 F[4, 68] = 0.22
$X_i^2$ F[37, 38] = 0.66   RESET F[1, 75] = 0.98
COMFAC F[15, 76] = 3.14
Tests on the significance of each variable

  Variable     F[num., denom.]    Value      Probability    Unit-root t-test
  m - p        F[5, 76]           340.201    0.000          -5.168
  x85          F[6, 76]             7.801    0.000           6.171
  Rn           F[6, 76]            12.127    0.000          -5.719
  Δp           F[6, 76]             6.846    0.000          -4.963
  CONSTANT     F[1, 76]             0.536    0.466          -0.732


Solved static long-run equation (standard errors in parentheses)

$$m - p = \underset{(0.112)}{1.102}\,x_{85} - \underset{(0.528)}{7.278}\,R_n - \underset{(1.482)}{7.493}\,\Delta p - \underset{(1.230)}{0.842}$$
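The solved long-run coefficients can be recovered from the sums of the lag coefficients. The sketch below assumes that the "Sum of lags" column of Table 7.8 reads -0.147 for $m - p$ (with the lag-0 coefficient normalized to -1), 0.162 for $x_{85}$, -1.070 for $R_n$, -1.102 for $\Delta p$, and -0.124 for the constant; these readings are inferred from a poorly reproduced table and should be treated as assumptions. The function name is hypothetical.

```python
# Assumed readings of the "Sum of lags" column in Table 7.8.
SUM_OF_LAGS = {
    "m-p": -0.147,    # -(1 - sum of the coefficients on lagged m - p)
    "x85": 0.162,
    "Rn": -1.070,
    "dp": -1.102,
    "const": -0.124,
}


def solved_long_run(sums, dependent="m-p"):
    """Divide each regressor's lag sum by minus the dependent variable's lag sum.

    In a static equilibrium all lags of a variable take the same value, so the
    dynamic equation collapses to
        0 = sums[dependent] * y + sum_k sums[k] * x_k,
    which is solved for y.
    """
    scale = -sums[dependent]
    return {name: value / scale for name, value in sums.items() if name != dependent}


LONG_RUN = solved_long_run(SUM_OF_LAGS)
```

Under these assumed inputs the implied coefficients agree with the reported long-run equation up to the rounding of the lag sums.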


These dynamic estimates are well behaved: the unit-root t-tests are all in the neighbourhood of 5 or larger in absolute value and every regressor matters as a set (i.e. testing all five lags); the solved long run is well defined and compares favourably with (30) since the three economic variables have highly significant coefficients with sensible signs and magnitudes; the goodness of fit is reasonable; and the diagnostic tests of the dynamic specification are all acceptable. Note that the sum of all the lags of the dependent variable, as shown in the final column of Table 7.8, is similar to that found in the DF regression, but has a much smaller standard error.
Only the first lag is strongly significant, as is shown in Table 7.9. Tests of common factors in the lag polynomials using the procedure in Sargan (1980) yield the results in Table 7.10.
Thus, the hypothesis of five common factors can be rejected at any reasonable level of significance. Recalling the discussion in Section 7.5 above, this outcome helps explain why the DF and ADF tests did not reject the null of no co-integration, whereas the dynamic model has done so decisively. Given that the common factor restrictions are rejected, the DF and ADF tests are not well suited to detecting co-integration. The ECM version of this equation, reported in Hendry and Ericsson (1991b), has a t-value greater than 10 in absolute value for the ECM coefficient, in a model which parsimoniously encompasses the unrestricted equation fitted above. Thus, the evidence favours rejecting no co-integration, and the results in the next chapter support that claim.
TABLE 7.9. Tests on the significance of each lag

  Lag    F[num., denom.]    Value     Probability
  5      F[4, 76]            0.691    0.600
  4      F[4, 76]            1.615    0.179
  3      F[4, 76]            1.654    0.170
  2      F[4, 76]            1.416    0.237
  1      F[4, 76]           12.967    0.000


TABLE 7.10. COMFAC Wald test statistic summary table

           Cumulative χ²        Incremental χ²
  Order    d.f.    Value        d.f.    Value
  1          3      0.086         3      0.086
  2          6      0.196         3      0.110
  3          9      4.176         3      3.980
  4         12      8.101         3      3.925
  5         15     47.128         3     39.028

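The relation between the cumulative and incremental COMFAC statistics, and the size of the final increment, can be checked with a few lines of code. This sketch assumes the cumulative values as tabulated above and uses the closed-form upper-tail probability of a chi-square variate with 3 degrees of freedom; the function names are hypothetical and the calculation is illustrative rather than a reproduction of the original software output.

```python
import math

# Cumulative Wald statistics for 1 to 5 common factors (Table 7.10).
CUMULATIVE = [0.086, 0.196, 4.176, 8.101, 47.128]


def increments(values):
    """Differences between successive cumulative statistics."""
    previous = [0.0] + values[:-1]
    return [round(b - a, 3) for a, b in zip(previous, values)]


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for X ~ chi-square with 3 d.f.

    Uses the closed form  erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2),
    which is specific to 3 degrees of freedom.
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


INCREMENTAL = increments(CUMULATIVE)
P_VALUES = [chi2_sf_3df(v) for v in INCREMENTAL]
```

On these figures only the fifth increment is far in the tail of the $\chi^2(3)$ distribution, which is consistent with the rejection of five common factors discussed in the text.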


7.7. Fully Modified Estimation

This section considers methods for correcting the finite-sample biases in static regressions. Park and Phillips (1988), Phillips and Durlauf (1986), Phillips and Hansen (1990), and Phillips (1988a, 1991) have argued that the performance of estimators of co-integrating vectors based on static regressions is adversely affected by the existence of second-order biases. As shown in the examples below, these biases have no effect on the consistency of the estimators, but result in the asymptotic distributions of scaled estimators, such as $T(\hat{\beta} - \beta)$ in (31) below, having non-zero means.
Such biases play a potentially important role in finite samples. For example, let the variables $y_{1t}$ and $y_{2t}$ be generated by

When the $\{u_{it}\}$ are autocorrelated and intercorrelated, a static regression of $y_{1t}$ on $y_{2t}$, by not using any information about the process generating $y_{2t}$, provides an estimate of $\beta$ which can be quite severely biased even in fairly large samples. Phillips et al. therefore recommend full-system maximum likelihood estimation of co-integrated systems. As an alternative to estimation of the full system, they propose correcting the single-equation estimates non-parametrically in order to obtain median-unbiased and asymptotically normal estimates. These recommended corrections, for simultaneity bias and residual autocorrelation, use expressions derived from the asymptotic distributions of the estimators although the corrections are made to estimators from finite samples. Phillips and Hansen (1990) show that these corrections work effectively in sample sizes as small as 50.15 Their example is presented in Section 7.10.4 below.
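A small simulation can make the non-zero mean of $T(\hat{\beta} - \beta)$ concrete. The sketch below is an illustration under assumptions, not the DGP of this chapter: it supposes $y_{1t} = \beta y_{2t} + u_{1t}$ and $\Delta y_{2t} = u_{2t}$, with serially independent normal errors that have unit variances and contemporaneous correlation `rho`. The function name and defaults are hypothetical.

```python
import math
import random


def mean_scaled_bias(rho, beta=1.0, T=100, reps=1000, seed=0):
    """Monte Carlo mean of T * (beta_hat - beta) from a static OLS regression.

    Assumed DGP: y1_t = beta * y2_t + u1_t,  y2_t = y2_{t-1} + u2_t,
    with corr(u1_t, u2_t) = rho and no serial correlation.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        y2 = 0.0
        sxy = 0.0
        sxx = 0.0
        for _ in range(T):
            u2 = rng.gauss(0.0, 1.0)
            u1 = rho * u2 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            y2 += u2
            y1 = beta * y2 + u1
            sxy += y2 * y1
            sxx += y2 * y2
        beta_hat = sxy / sxx          # static regression without intercept
        total += T * (beta_hat - beta)
    return total / reps
```

Comparing `mean_scaled_bias(0.8)` with `mean_scaled_bias(0.0)` shows the centre of the scaled estimator moving away from zero once the two errors are correlated, even though $\hat{\beta}$ itself remains consistent. Adding serial correlation to the errors would be a natural extension of the sketch.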
The estimates obtained from fully modified and full-information methods are asymptotically equivalent. This equivalence is of interest because it links the discussion with a third possible method of reducing finite-sample biases, namely, estimating single-equation dynamic regressions. The aim of the analysis in this section is to compare the non-parametrically corrected estimates (which are also asymptotically efficient and median-unbiased) with estimates obtained from dynamic regressions in either their ADL or ECM forms. The form of the autocorrelation in the error process in (31) and (32) is crucial to this comparison. For some specifications of the error process, a dynamic regression equation implicitly performs the same corrections as those achieved by the non-parametric correction terms. The long-run estimates obtained from this properly specified dynamic equation are then equivalent, asymptotically, to the non-parametrically corrected estimates.16 In such cases, therefore, two ways of incorporating information about the marginal process (that is, the process generating $y_{2t}$) present themselves: non-parametric correction, or dynamic specification. However, for other specifications of the autocorrelation process a single-equation dynamic regression may fail to achieve efficiency, or eliminate the effects of second-order bias, regardless of the richness of the parameterization, owing to a failure of the conditioning variables to be weakly exogenous for the parameters of the dynamic equation.

15 While it is possible to derive exact expressions for the biases in finite samples to any desired level of accuracy, using Edgeworth-type expansions, this is a complicated procedure.

Our theoretical discussion is based on Phillips (1988a). Although it is fairly straightforward to describe and categorize the circumstances under which dynamic single-equation estimates will perform well, the detailed theoretical background for this description is lengthy and complex. Readers interested in implementing the non-parametric corrections are referred to the papers by Phillips and his co-authors cited previously. We shall focus on presenting the arguments intuitively and will illustrate the theoretical analysis with two simulation exercises, the first taken from Phillips and Hansen (1990), and the second from Gonzalo (1990).

7.8. A Fully Modified Least-squares Estimator

Consider the data-generation process given by (31) and (32) and disregard, for the moment, the precise autocorrelation structure of $u_t = [u_{1t}, u_{2t}]'$. Assume only that $u_t$ is weakly stationary with its mean vector and long-run covariance matrix given by $[0, 0]'$ and $\Omega$ respectively, where $\Omega = \{\omega_{ij}\}_{i,j=1,2}$.17 The following decomposition of the $\Omega$ matrix is useful in understanding its structure: $\Omega = V + \Gamma + \Gamma'$, where $V = E[u_0 u_0']$ and $\Gamma = \sum_{k=1}^{\infty} E[u_0 u_k']$. Thus, if the $u$ process is serially uncorrelated and stationary, the $\Omega$ matrix is the usual covariance matrix. In the presence of serial correlation, additional terms in the form of $\Gamma$ need to be incorporated. The appendix explains the derivation of $\Omega$.
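The decomposition $\Omega = V + \Gamma + \Gamma'$ suggests a simple sample analogue. The sketch below is one possible non-parametric estimator, assuming a zero-mean bivariate series and Bartlett (triangular) weights with a user-chosen bandwidth; the kernel choice, the function names, and the MA(1) example generator are assumptions made for illustration and are not taken from the text.

```python
import random


def autocov(u, k):
    """Sample analogue of E[u_0 u_k'] for a list of (u1, u2) pairs with zero mean."""
    T = len(u)
    m = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(T - k):
        for i in range(2):
            for j in range(2):
                m[i][j] += u[t][i] * u[t + k][j]
    return [[m[i][j] / T for j in range(2)] for i in range(2)]


def long_run_cov(u, bandwidth):
    """Bartlett-weighted estimate of Omega = V + Gamma + Gamma'."""
    omega = autocov(u, 0)                      # V, the contemporaneous covariance
    for k in range(1, bandwidth + 1):
        weight = 1.0 - k / (bandwidth + 1.0)   # Bartlett weight (an assumed choice)
        g = autocov(u, k)
        for i in range(2):
            for j in range(2):
                omega[i][j] += weight * (g[i][j] + g[j][i])  # adds Gamma_k + Gamma_k'
    return omega


def ma1_example(T=5000, theta=0.5, seed=0):
    """Hypothetical test series: u1 is MA(1), u2 is independent white noise."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
    return [(e[t] + theta * e[t - 1], rng.gauss(0.0, 1.0)) for t in range(1, T + 1)]
```

With `bandwidth=0` the function returns $V$ alone, the usual covariance matrix; for the positively autocorrelated MA(1) example the estimated long-run variance of the first element exceeds its ordinary variance, which is the role played by the $\Gamma$ terms.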
16 In Ch. 3 we compared the performance of the ADF test with the performance of the non-parametrically corrected DF test. The two tests were equivalent asymptotically, in their ability to mop up the effects of residual autocorrelation in the DF regression, but they could behave quite differently in finite samples. The same comparisons apply here. Even when a particular dynamic specification estimator is asymptotically equivalent to a non-parametric correction, it is still of interest to compare the performances of the estimates obtained from each method.
17 The long-run covariance matrix is given by $2\pi f_{uu}(0)$, where $f_{uu}(0)$ is the spectral density matrix of $u_t$ evaluated at zero. The corrections discussed below are termed 'non-parametric' because consistent estimates of this covariance matrix, and of related matrices, must be obtained non-parametrically.


The fully modified least-squares estimator of $\beta$ takes the form

In (33)-(36), $\delta^+$ is a bias correction term, $\hat{\omega}_{21}$ and $\hat{\omega}_{22}$ are consistent estimates of the corresponding elements in the long-run covariance matrix, and $\hat{\Delta}$ is a consistent estimate of $\Delta$. Under quite general conditions,

The notation BM($\Omega_{11.2}$) is used to denote a bivariate Brownian motion process with covariance matrix $\Omega_{11.2}$ and is a matrix generalization of scalar Wiener processes, as discussed in Chapter 6. The limiting distribution (37) is a covariance matrix mixture of normals (see Table 3.3).
The 'full modification' in (33) achieves two notable aims. First, by taking account of any serial correlation in the residuals, the bias correction term $\delta^+$ mitigates the effects of second-order bias. Second, the corrections for long-run simultaneity in the system made by using $y_{1t}^+$ (in place of $y_{1t}$) permit the use of conventional (asymptotic) procedures for inference. Thus, defining the fully modified standard error by $s^+$ where,

where $\hat{\omega}_{11.2}$ is a consistent estimator of $\omega_{11.2}$, we have the following result:


Phillips and Hansen (1990) show that this approach is asymptotically equivalent to systems procedures such as full maximum likelihood estimation discussed in Chapter 8. Both (38), which simplifies the process of inference, and the reduction in the second-order bias in $\hat{\beta}^+$ help estimation and testing of single equations in co-integrated systems. Our use of a simple data-generation process is solely for the purposes of exposition; the literature to which we have referred is capable of treating co-integrated systems at a high level of generality.

7.9. Dynamic Specification

Is it possible, by suitable dynamic specification alone, to make the same corrections as those made by the technique described above? In order to answer this question, Phillips (1988a) considers a dynamic version of equation (31):

yit = /3y 2t + r% + ? (39

where $x_t$ is a vector with jointly stationary elements. Thus, $x_t$ contains lagged values of $\Delta y_{1t}$ and current and lagged values of $\Delta y_{2t}$. While far from being a general dynamic model, (39) is a linear-in-parameters ADL model.
The process of constructing a regression equation such as (39) has been extensively discussed in the literature (see, in particular, Engle et al. 1983). Thus, focusing on the DGP given by (31) and (32) and imposing no restrictions upon the autocorrelation structure of the $u_{it}$,

where $\mathcal{F}_{t-1}$ is the information set containing information on past realizations of $y_{1t}$, $y_{2t}$ and hence of $u_{i,t-j}$, $i = 1, 2$; $j \geq 1$. By construction, $\{\eta_t\}$ in (40) is a martingale difference sequence.
If the process generating $u_t$ is now specialized to the case where it is a linear process, so that

where
The variance of $\eta_t$ is given by $\sigma_{11.2} = \sigma_{11} - \sigma_{21}^2\sigma_{22}^{-1}$, and $\eta_t$ is orthogonal to $\varepsilon_{2t}$ as well as to the entire history of $\varepsilon_t$ given by $(\varepsilon_{t-1}, \varepsilon_{t-2}, \ldots)$.
18 Note that $\Sigma = \{\sigma_{ij}\}_{i,j=1,2}$ and $I_2$ is the $(2 \times 2)$ identity matrix.
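The conditional variance $\sigma_{11.2}$ and the orthogonality property can be checked numerically. The sketch below assumes that the conditional innovation takes the standard bivariate-normal form $\eta_t = \varepsilon_{1t} - \sigma_{12}\sigma_{22}^{-1}\varepsilon_{2t}$ (the defining equation is not reproduced above, so this is an assumption), and the function name and parameter values are hypothetical.

```python
import math
import random


def conditional_innovation_moments(sigma11, sigma12, sigma22, T=20000, seed=1):
    """Simulate (e1, e2) with the given covariances and return
    (sample variance of eta, sample covariance of eta with e2, sigma_11.2),
    where eta = e1 - (sigma12 / sigma22) * e2.
    """
    rng = random.Random(seed)
    slope = sigma12 / sigma11                       # used only to build correlated draws
    resid_sd = math.sqrt(sigma22 - sigma12 ** 2 / sigma11)
    sum_eta2 = 0.0
    sum_eta_e2 = 0.0
    for _ in range(T):
        e1 = math.sqrt(sigma11) * rng.gauss(0.0, 1.0)
        e2 = slope * e1 + resid_sd * rng.gauss(0.0, 1.0)
        eta = e1 - (sigma12 / sigma22) * e2         # conditional innovation
        sum_eta2 += eta * eta
        sum_eta_e2 += eta * e2
    sigma11_2 = sigma11 - sigma12 ** 2 / sigma22    # population conditional variance
    return sum_eta2 / T, sum_eta_e2 / T, sigma11_2
```

For example, `conditional_innovation_moments(2.0, 0.8, 1.0)` returns a sample variance close to the population value of 1.36 and a sample covariance with $\varepsilon_{2t}$ close to zero, in line with the two properties stated in the text.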


Estimating the regression (39) is asymptotically equivalent to maximizing the conditional likelihood function of $(\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1T})$, given $(\varepsilon_{2t}, t = 1, 2, \ldots, T)$. Assuming invertibility of $A(L)$ in (41), we have

Equation (42) implies

and

Thus, to maximize the conditional likelihood requires

which involves

Solving (43) is equivalent to least-squares estimation of the regression model

(44)
where $d_1(L) = \sum_{j=1}^{\infty} d_{1j}L^j$, $d_2(L) = \sum_{j=0}^{\infty} d_{2j}L^j$, and $v_t \sim \mathrm{IN}(0, \sigma_{11.2})$ which is independent of the regressors.
It is then possible to show that

(45)
where $\hat{\beta}$ is the estimate of the coefficient of $y_{2t}$ in (44). $B_v(r)$ and $B_2(r)$ comprise a bivariate Brownian motion process with a well-defined variance-covariance matrix.
The question posed at the beginning of this sub-section can now be answered. Comparing (37) and (45), the fully modified estimator $\hat{\beta}^+$ and the dynamic single-equation least-squares estimator are equivalent if and only if $B_v(r) = B_{1.2}(r)$. These two Brownian motion processes are not necessarily equal to each other. This is because $B_v(r)$ can be correlated with $B_2(r)$, despite its construction in (40). The generating mechanism for $u_{2t}$ may therefore be informative, and optimal inference then requires joint estimation with the error-correction model. Phillips (1988) describes this as a failure of weak exogeneity or valid conditioning. If, on the other hand, $B_v(r)$ and $B_2(r)$ are uncorrelated at all frequencies, the conditional process is completely informative for the purposes of estimation of $\beta$ and the marginal process generating $u_{2t}$ may be ignored. In such a case, $B_v(r) = B_{1.2}(r)$.


The examples following this sub-section will elaborate upon these conditions, but we will close this section with an interpretation. The non-equivalence of the dynamic regression estimator and the fully modified estimator arises from possible correlation between the residuals $\eta_t$ of the conditional process and the residuals $u_{2t}$ of the marginal process. This correlation arises because, although $\eta_t$ is orthogonal to $u_{2t}$ and the past history of $u_{2t}$ ($\eta_t$ is orthogonal to its own past by construction), $u_{2t}$ is not necessarily orthogonal to the past of $u_{1t}$ and hence $(\eta_t, u_{2t})'$ jointly is not a martingale difference sequence (MDS).
Three examples are presented below. They are adapted from Phillips (1988a) and are special cases of the examples appearing in that paper. Three different specifications of the autocorrelation structure of the $u_t$ process are considered while the data-generation process continues to be (31) and (32).
The examples help to integrate and interpret the discussions on weak exogeneity, dynamic modelling, and fully modified estimation. Exogeneity plays an important role in dealing with non-stationary variables. Dynamic regression equations in which the conditioning is on weakly or strongly exogenous variables (for the parameters of interest) provide asymptotically unbiased estimates. Further, inference may be conducted with standard tables. In cases where such conditioning is not possible, improperly conditioned equations lead to inefficient and biased estimates. The full system must therefore be estimated or the non-parametrically modified estimates used. It is seen that fully modified estimation is another way of addressing the issue of the completeness of conditional models for purposes of estimation and inference.

7.10. Examples

7.10.1. Example (Phillips 1988a: 352)

In reduced form, the DGP (31) and (32) is given by

Hence


Thus, using the formula for the conditional expectation of bivariate normal random variables, we have

Defining

and using (48), we obtain

or, alternatively,

where

Finally, substituting for

Several features are now evident. By construction, $\eta_t$ is an MDS. Second, again by construction, $\eta_t$ is uncorrelated with $u_{2t}$.19 From (47), we have that the $u_{2t}$ process is serially uncorrelated both with past $u_{2t}$ and with past $u_{1t}$. It follows that $\eta_t$ and $u_{2t}$ are incoherent (that is, uncorrelated at all lags or frequencies), that the long-run covariance matrix of $[\eta_t, u_{2t}]'$ is diagonal, and that the estimation of a single dynamic equation should provide a fully efficient and unbiased estimate of the vector $a$.
Looking at the conditional and marginal processes given by (50) and the second equation in (46) respectively, and at the properties identified in the previous paragraph, single-equation least squares on (50) is equivalent to full-information maximum likelihood for estimating $\beta$. The orthogonality of the $\eta_t$ and $u_{2t}$ processes ensures that the joint likelihood function for the system factorizes into the likelihood functions for the marginal and conditional models given by the second equation in (46) and (50) respectively. There are no cross-equation restrictions; the parameter of interest $\beta$ can be estimated and identified from (50) alone; and, recalling the discussion of weak exogeneity in Chapter 1, the marginal process generating $u_{2t}$ need not be modelled when estimating $\beta$.
7.10.2. Example (Phillips 1988a: 355)

where,

Then

The long-run covariance matrix of $(\eta_t, u_{2t})'$ is given by

where $\sigma_{11.2} = \sigma_{11} - \sigma_{12}^2\sigma_{22}^{-1}$. The expression for this long-run covariance matrix follows from application of the conditional-expectations formula and from inspection of (53). $\eta_t$ and $u_{2t}$ are again incoherent, and the limit Brownian motions are

where $B_\eta$ and $B_2$ are independent and $B_\eta = B_{1.2}$. Thus, estimating a dynamic single-equation model (the conditional model) provides estimates identical, asymptotically, to those provided by the Phillips-Hansen procedure. Here the conditional model is given by

In error-correction format, we may rewrite (54) as

Equation (54) is the one that must be estimated in order to obtain an asymptotically unbiased estimator of β. The static regression is augmented in (54) by the terms Δy_{2t} and Δy_{2,t−1}. These additional terms are incorporated to reduce or eliminate, in finite samples, the effects of second-order bias, without estimating the full system.
Phillips (1988a) notes that the bias correction term δ⁺ for this example is equal to zero since Δ = (ω_12, ω_22)'. However, to obtain fully modified estimates, from (34) y_{1t} needs to be corrected for long-run endogeneity as follows:

The same correction is achieved in the dynamic regression by the two Δy_{2,t−j} terms in (55). The static regression produces biases by ignoring these corrections.
7.10.3. Example (Phillips 1988a: 356)


We take the process (ε_{1t}, ε_{2t})' to be distributed as in Section 7.10.2.

Then it may be shown that
The long-run covariance matrix is given by

where σ_{11·2} is as defined in Section 7.10.2, and

The Brownian motions B_η and B_2 are correlated and the single-equation dynamic estimator and the fully modified estimator are no longer equivalent, unless θ_21 = 0. For the structure of the correlation between B_η and B_2 (see Phillips 1988a):
where B_{η·2}(r) is a univariate Brownian motion process with variance given by σ_{11·2} − θ_21² σ_{11·2}² ω_22^{−1} and is independent of B_2(r). Further,

From (58), setting θ_21 equal to zero makes the B_η(r) and B_{η·2}(r) processes equivalent to each other. Further, B_{η·2}(r) has a variance of σ_{11·2} and is in all respects equivalent to the B_{1·2}(r) process given in (37) above. Thus, the B_η(r) and B_{1·2}(r) processes are equivalent, and, in accordance with the previous discussion, this equivalence leads to the equivalence of the single-equation dynamic estimator and the fully modified estimator.
It should be noted that θ_21 ≠ 0 also implies that the Γ-type terms (see Section 7.8) are important in the long-run variance matrix for the (η_t, u_{2t})' process. This is just another way of saying that the past of the process is important (and so, in the (η_t, u_{2t})' construction, we have not achieved a martingale difference sequence). Thus, the equivalence of dynamic single-equation estimators and fully modified estimators may also be assessed by looking for the presence of Γ-type terms in the long-run variance matrix. These are the terms (for example, the first term in (59)) that give rise to biases in the single-equation dynamic estimates of the co-integrating vector.
The necessary and sufficient condition for non-equivalence has a natural interpretation in the language of an earlier literature on dynamic


modelling. It is evident that the condition θ_21 ≠ 0 violates weak exogeneity,²⁰ as may be verified from (57); and once again, it may be seen that the issues of fully modified estimation and dynamic specification are closely related. This example forms the basis for the simulation exercise discussed in the final sub-section.
7.10.4. Simulation Example (Phillips and Hansen 1990: 116)
The data-generation process for their simulation study is given by

The design of the experiment consisted in allowing σ_21 and θ_21 to vary. Thus, four values of σ_21 and three values of θ_21 were used. The values of σ_21 considered were -0.8, -0.4, 0.4, and 0.8, and the three values of the moving-average parameter θ_21 were 0.8, 0.4, and 0.0.²¹ β was set equal to 2 for all twelve combinations of the values of θ_21 and σ_21. The aim was to calculate and compare the distributions of estimators and t-statistics for the co-integrating parameter obtained by OLS, single-equation dynamic, and fully modified methods.
For the fully modified method, Phillips and Hansen used a Bartlett triangular window of lag length 5 and the OLS residuals û_{1t} to calculate non-parametric estimates of Δ, Ω and hence of δ⁺. We shall denote these estimates by Δ̂, Ω̂, and δ̂⁺. The OLS t-statistic was estimated by using ω̂_11 (the (1,1) element from the non-parametrically estimated long-run variance matrix) as an estimate of the standard error. The dynamic equation regressed y_{1t} on (y_{2t}, Δy_{2t}, Δy_{2,t−1}, Δy_{2,t−2}, Δy_{1,t−1}, Δy_{1,t−2}), using 30,000 replications for each simulation (that is, for each pair of values of (θ_21, σ_21)). Given the nature of the DGP, the dynamic
²⁰ The correlation between u_{2t} and ε_{1,t−1}, introduced by a non-zero value of θ_21, implies that the marginal process contains information relevant for the η_t process. Thus, the construction of η_t in this example does not lead to a complete purging of information relating to the marginal process, and efficient estimation therefore requires that the u_{2t} also be modelled. This is of course just a restatement of the weak-exogeneity condition.
²¹ In light of the simulation findings presented by Schwert (1989), this experiment design is open to one criticism. No negative values of the moving-average parameter were considered. Negative MA values were the ones that were most troublesome for the Phillips-Perron non-parametric corrections in the study by Schwert.


equation is only an approximation. The results are presented in Tables 7.11 and 7.12.
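The non-parametric step used for the fully modified method can be sketched in a few lines. The following is a minimal illustration, not Phillips and Hansen's own code: the function name `bartlett_long_run_cov` is hypothetical, the residuals are taken to have mean zero, and the one-sided sum is defined here as the weighted sum of estimates of E(u_{t−j}u_t') for j ≥ 1, which is an assumption about the convention.

```python
import numpy as np


def bartlett_long_run_cov(u, lag_length):
    """Bartlett-window estimates from a (T x k) array of residuals u.

    Returns (sigma, lam, delta, omega), where
      sigma = contemporaneous covariance,
      lam   = weighted one-sided sum of autocovariances,
      delta = sigma + lam,
      omega = sigma + lam + lam' (long-run covariance).
    """
    u = np.asarray(u, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    T, k = u.shape
    sigma = u.T @ u / T
    lam = np.zeros((k, k))
    for j in range(1, lag_length + 1):
        weight = 1.0 - j / (lag_length + 1.0)   # triangular (Bartlett) weight
        gamma_j = u[:-j].T @ u[j:] / T          # estimate of E(u_{t-j} u_t')
        lam += weight * gamma_j
    delta = sigma + lam
    omega = sigma + lam + lam.T
    return sigma, lam, delta, omega
```

With `lag_length=5` this mirrors the window length quoted in the text; with `lag_length=0` the long-run estimate collapses to the contemporaneous covariance.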
Table 7.11 presents the Monte Carlo means and standard deviations of (β̂ − β) for the OLS, dynamic (D), and fully modified (FM) estimators. It is clear from this table that, in general, OLS gives the most heavily biased estimator. However, there seems to be little to choose between the fully modified estimator (FM) and the dynamic estimator (D). No consistent pattern of superiority (measured in terms of lower absolute values of biases and standard errors) appears to be present.
Consider first the cases where θ_21 ≠ 0. For a value of σ_21 = -0.8, D is more biased than FM when θ_21 = 0.8 but is less biased when θ_21 = 0.4. When σ_21 = 0.8, the opposite is true. For the two intermediate values of σ_21 considered, the bias in FM is less than or equal to the bias in D. Thus, although FM provides lower biases in a larger number of cases than D, the evidence is mixed. For the cases where θ_21 = 0, and the single-equation and fully modified estimators have distributions that are asymptotically equivalent, D out-performs FM on three out of four occasions. In the only case where FM provides a lower bias (σ_21 = 0.4), the D and FM biases are 0.009 and 0.004 (in absolute value) respectively. This comparison is therefore made in the context of both D and FM performing well in providing low biases.
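A small-scale version of the OLS and dynamic (D) comparison can be sketched as follows. This is an illustration under stated assumptions rather than a replication: the DGP is taken to be y_{1t} = βy_{2t} + u_{1t}, Δy_{2t} = u_{2t}, u_t = ε_t + Θε_{t−1}, with unit error variances and covariance σ_21; the fixed entries of Θ (0.3, 0.4, 0.6), the sample size T = 50, the absence of an intercept, and all function names are assumptions made here for illustration. The fully modified estimator is not included.

```python
import numpy as np


def simulate_dgp(T, theta21, sigma21, beta=2.0, rng=None):
    """Draw one sample from the assumed DGP (see the lead-in for the assumptions)."""
    rng = np.random.default_rng() if rng is None else rng
    cov = np.array([[1.0, sigma21], [sigma21, 1.0]])
    theta = np.array([[0.3, 0.4], [theta21, 0.6]])    # assumed MA(1) matrix
    eps = rng.multivariate_normal(np.zeros(2), cov, size=T + 1)
    u = eps[1:] + eps[:-1] @ theta.T                  # u_t = eps_t + Theta eps_{t-1}
    y2 = np.cumsum(u[:, 1])                           # Delta y_2t = u_2t
    y1 = beta * y2 + u[:, 0]
    return y1, y2


def ols_beta(y1, y2):
    """Static regression of y_1t on y_2t (no intercept)."""
    return float(y2 @ y1 / (y2 @ y2))


def dynamic_beta(y1, y2):
    """Coefficient on y_2t when y_1t is regressed on
    (y_2t, dy_2t, dy_2,t-1, dy_2,t-2, dy_1,t-1, dy_1,t-2)."""
    dy1, dy2 = np.diff(y1), np.diff(y2)   # dy[i] is the difference dated i + 1
    t = np.arange(3, len(y1))
    X = np.column_stack([y2[t], dy2[t - 1], dy2[t - 2], dy2[t - 3],
                         dy1[t - 2], dy1[t - 3]])
    coef, *_ = np.linalg.lstsq(X, y1[t], rcond=None)
    return float(coef[0])


def monte_carlo_bias(theta21, sigma21, T=50, replications=2000, seed=0):
    """Monte Carlo mean and standard deviation of (beta_hat - beta) for OLS and D."""
    rng = np.random.default_rng(seed)
    draws = np.empty((replications, 2))
    for i in range(replications):
        y1, y2 = simulate_dgp(T, theta21, sigma21, rng=rng)
        draws[i, 0] = ols_beta(y1, y2) - 2.0
        draws[i, 1] = dynamic_beta(y1, y2) - 2.0
    return {"OLS": (draws[:, 0].mean(), draws[:, 0].std()),
            "D": (draws[:, 1].mean(), draws[:, 1].std())}
```

Calling, say, `monte_carlo_bias(0.8, -0.8)` gives one cell of a table laid out like Table 7.11; the numbers need not match the published ones, since the design details above are assumed.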
TABLE 7.11. Mean (standard deviation) of (β̂ − β)

                 θ_21 = 0.8         θ_21 = 0.4         θ_21 = 0.0
σ_21 = -0.8
  OLS          -0.137 (0.125)     -0.090 (0.089)     -0.055 (0.061)
  D            -0.062 (0.106)     -0.021 (0.066)     -0.003 (0.041)
  FM           -0.025 (0.127)     -0.028 (0.079)     -0.025 (0.052)
σ_21 = -0.4
  OLS          -0.067 (0.081)     -0.057 (0.079)     -0.040 (0.061)
  D            -0.051 (0.086)     -0.030 (0.077)     -0.007 (0.060)
  FM           -0.042 (0.094)     -0.027 (0.081)     -0.015 (0.063)
σ_21 = 0.4
  OLS          -0.024 (0.040)     -0.020 (0.046)     -0.011 (0.050)
  D            -0.023 (0.046)     -0.019 (0.053)     -0.009 (0.060)
  FM           -0.023 (0.048)     -0.012 (0.052)      0.004 (0.060)
σ_21 = 0.8
  OLS          -0.015 (0.025)     -0.010 (0.028)     -0.004 (0.036)
  D            -0.009 (0.024)     -0.008 (0.030)     -0.005 (0.039)
  FM           -0.016 (0.028)     -0.005 (0.030)      0.015 (0.043)

Reproduced from Phillips and Hansen (1990).


Phillips and Hansen also estimate the probability density functions for the D and FM estimators. When θ_21 = 0.8, the density function of FM is better centred (at zero) than D, while the opposite is true when θ_21 = 0.²²

Based then on a consideration of biases alone, there do not appear to be strong grounds for preferring the FM over the D estimator. This observation must be qualified by two cautionary remarks. First, the experiment considered here is very limited, and a more extensive simulation exercise might reveal the superiority of FM over D. Second, both FM (in finite samples) and D could be out-performed by full-information maximum likelihood estimation of the two-equation system. Such a comparison is undertaken in the next chapter.
The argument in favour of using fully modified methods is stronger when one considers the evidence presented in Table 7.12, wherein the means and standard deviations of the distributions of the t-statistics are tabulated. Here the conclusions are more nearly unambiguous. When θ_21 ≠ 0, the t-statistics from D are more heavily biased than those obtained from FM (in all but one case) and have higher standard errors. The FM t-statistic comes much closer to achieving a distribution that is roughly normal than does the t-statistic from D. As noted by Phillips and Hansen, the relatively inferior behaviour of the dynamic t-statistic may have been caused by the inclusion of an insufficient number of lag terms in the D regression. When θ_21 = 0, the dynamic t-statistic is substantially less biased (in all but one case) than the FM t-statistic, but its variance is much higher.
Since the use of the normal distribution is a considerable simplification and the bias comparisons are at best ambiguous for the dynamic estimates (when θ_21 ≠ 0), there may be reasons to prefer the FM estimator over the D estimator when only long-run parameters are of interest. This recommendation must be qualified by noting that a more richly parameterized dynamic model may have provided lower biases and a distribution of the t-statistic closer to the normal distribution. Performance with a negative MA parameter is also important; some early studies have suggested that the FM estimator performs less well in such cases. Both these qualifications point to the need for more extensive simulation studies.
What is clear from all the studies considered so far is the poor performance of unmodified estimates derived from static regressions. Some form of incorporation of the dynamic structure of the data-generation process, either by means of a non-parametric correction of the static regression estimates or by running dynamic regressions, is
²² Phillips and Hansen rationalize this behaviour by stating that 'when this condition [θ_21 = 0] does hold, the parametric nature of the [dynamic] method gives it a natural advantage over our semi-parametric approach' (1990: 119).


TABLE 7.12. Mean (standard deviation) of the t-statistic

                 θ_21 = 0.8         θ_21 = 0.4         θ_21 = 0.0
σ_21 = -0.8
  OLS          -1.616 (1.268)     -1.240 (1.105)     -0.930 (1.00)
  D            -1.259 (2.040)     -0.563 (1.701)     -0.003 (1.40)
  FM           -0.388 (1.432)     -0.449 (1.092)     -0.025 (0.896)
σ_21 = -0.4
  OLS          -1.156 (1.32)      -0.986 (1.25)      -0.754 (1.149)
  D            -1.058 (1.69)      -0.636 (1.57)      -0.163 (1.388)
  FM           -0.729 (1.49)      -0.516 (1.35)      -0.335 (1.193)
σ_21 = 0.4
  OLS          -0.711 (1.19)      -0.520 (1.21)      -0.267 (1.24)
  D            -0.664 (1.29)      -0.478 (1.34)      -0.213 (1.37)
  FM           -0.606 (1.26)      -0.267 (1.30)       0.096 (1.36)
σ_21 = 0.8
  OLS          -0.575 (0.955)     -0.302 (0.979)     -0.098 (1.04)
  D            -0.445 (1.15)      -0.339 (1.25)      -0.184 (1.36)
  FM           -0.519 (0.922)     -0.102 (0.962)      0.418 (1.12)

Reproduced from Phillips and Hansen (1990).

necessary for inference. While super-consistency theorems show that I(0) terms may be ignored asymptotically in regressions with I(1) variables, these asymptotic results have little bearing on sample sizes common in econometrics, where I(0) terms are important and need to be accommodated.
The other important issue raised by these examples is the weak exogeneity of the conditioning variables for the parameters of interest. Reconsider the DGP in (31) and (32) where u_t is a first-order autoregressive process, so that a finite lag length dynamic model is valid:

where

Then
or


in terms of I(0) variables. Let E[ε_{1t} | ε_{2t}] = σ_12 σ_22^{−1} ε_{2t} = γε_{2t}, so

Further, assume that θ = (β* : α : β : …)' denotes the parameters of interest, and indeed that θ is both constant and invariant to regime shifts affecting Δy_{2t}. Nevertheless, although (61) appears to define a valid conditional model for all values of θ, if c_21 ≠ 0 then Δy_{2t} is not weakly exogenous for θ. Because of the resulting non-diagonality of the long-run covariance matrix, this loss of weak exogeneity can have a detrimental impact on the bias and efficiency of the least-squares estimator of θ in finite samples.
In fact, c_21 ≠ 0 jointly violates the weak and strong exogeneity of y_{2t} for θ. To sort out which aspect is dominant, three cases merit comment: the following implications are based on Monte Carlo studies of (61). First, even if γ = 0, so that there is no simultaneity and β* = β, the previous conclusion holds. Second, if γ ≠ 0 whereas c_21 = 0, y_{2t} is strongly exogenous for θ and no problems result. Finally, if strong exogeneity alone is violated, but weak exogeneity holds, as would happen if Δy_{1,t−1} directly affected Δy_{2t} when c_21 = 0, there are again no serious bias effects. Thus, the presence of the co-integrating vector in another equation appears to be the primary determinant of the finite-sample bias. Consequently, co-integration forces a renewed emphasis on systems methods if potentially misleading inferences are to be avoided. That is the focus of Chapter 8.

Appendix: Covariance Matrices


Consider the DGP in (A1) where y_t is the stationary first-order vector autoregressive process:
y_t = Ay_{t−1} + ε_t, where ε_t ~ IN(0, Σ),    (A1)
and all the latent roots of A lie inside the unit circle. There are three distinct covariance matrices relevant to the analysis, as follows, noting that E(y_t) = 0.
(a) The conditional (or contemporaneous) covariance matrix


(b) The unconditional covariance matrix

obtained as shown by substituting (A1) for y_t, multiplying out, and using stationarity. The elements of G can be obtained by vectoring (A3) and solving.
(c) The long-run covariance matrix
Consider the finite sample expression, analogous to E[T^{−1}S_T²] in the scalar case:

Rewriting Ω as (I − A)(I − A)^{−1}G + Λ + Λ' + G(I − A')^{−1}(I − A') − G, on simplifying we have that:

However, a more convenient form of Ω, directly related to the spectral density at the origin, results from (A3):

(A5)

so that on pre-multiplying Σ by (I − A)^{−1} and post-multiplying by (I − A')^{−1} and using (A4):


Similar principles apply to deriving these three matrices in more general weakly stationary processes. As a second example, if (A1) is altered to the first-order moving average:
then, using I_{t−1} to denote available information:

and:
(A10)
Following Phillips and Durlauf (1986), consider a general I(1) vector process:
and v_t is a weakly stationary stochastic process with unconditional covariance E(v_t v_t') = G and long-run covariance Ω = G + Λ + Λ'. From (A4), Λ can be written as:

Extending the analysis in Chapter 3 to allow for vector processes, and in the appendix to Chapter 6 to allow for non-IID errors, x_{[Tr]}/√T converges to the vector Brownian motion BM(Ω):

Then:

These vector formulae could be standardized using V(r) = K'B(r) where Ω^{−1} = KK'.
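For the VAR(1) case (A1), the three covariance matrices of this appendix can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the unconditional covariance to satisfy G = AGA' + Σ (the usual consequence of substituting (A1) and using stationarity), takes Λ to be the one-sided sum of autocovariances A^j G for j ≥ 1, and compares Ω = G + Λ + Λ' with the spectral form (I − A)^{−1}Σ(I − A')^{−1}. The function name `var1_covariances` is hypothetical.

```python
import numpy as np


def var1_covariances(A, Sigma):
    """Covariance matrices of the stationary VAR(1) y_t = A y_{t-1} + e_t.

    Returns (G, Lam, omega_sum, omega_spectral).
    """
    A = np.asarray(A, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    n = A.shape[0]
    eye = np.eye(n)
    # Vectorize G = A G A' + Sigma:  vec(G) = (I - A kron A)^{-1} vec(Sigma)
    vec_G = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Sigma.reshape(-1))
    G = vec_G.reshape(n, n)
    # One-sided sum of autocovariances: sum_{j>=1} A^j G = A (I - A)^{-1} G
    Lam = A @ np.linalg.solve(eye - A, G)
    omega_sum = G + Lam + Lam.T
    inv = np.linalg.inv(eye - A)
    omega_spectral = inv @ Sigma @ inv.T
    return G, Lam, omega_sum, omega_spectral
```

For any A with all latent roots inside the unit circle the last two returned matrices should agree up to rounding, which is the content of the 'more convenient form' of Ω referred to in the text.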

Co-integration in Systems of Equations
We have so far considered only single-equation estimation and testing. While the estimation of single equations is convenient and often efficient, for some purposes only estimation of a system provides sufficient information. This is true, for example, when we consider the estimation of multiple co-integrating vectors, and inference about the number of such vectors. Traditionally, systems have been estimated when there is a failure of weak exogeneity in a single equation, and these considerations also apply here. This chapter examines methods of finding the co-integrating rank, considers circumstances when dynamic single-equation methods will be asymptotically equivalent to systems methods, and provides examples to illustrate these issues. Asymptotic distributions are also derived.
In earlier chapters, we investigated data series containing unit roots in their scalar autoregressive representations (i.e. their marginal distributions), and denoted such series as I(1). In this chapter we will consider a vector time series of dimension n, x_t = (x_{1t}, x_{2t}, …, x_{nt})' (generalizing the analysis to any number of variables), where x_t is I(1) so that Δx_t is I(0). Generally, any arbitrary linear combination of the elements of x_t, say w_t = α'x_t, will also be I(1), and such linear combinations imply or give rise to spurious regressions. However, there may exist vectors α_i such that
in which case the relevant components of x_t are co-integrated.
In the simplest bivariate case, as we have seen, we may take x_t = (y_t, z_t)', where y_t and z_t are individually I(1). The arbitrary linear combination (y_t − κz_t) will also be I(1), but if there exists a value κ_1 of κ such that (y_t − κ_1 z_t) ~ I(0), then y_t and z_t are co-integrated. Letting α_1' = (1, −κ_1) be the co-integrating vector in this case, α_1 must be unique, since for any other value κ*, y_t − κ*z_t = y_t − κ_1 z_t + (κ_1 − κ*)z_t = w_t + (κ_1 − κ*)z_t, which is the sum of an I(0) process and an I(1) process, and therefore I(1) unless κ_1 = κ*.


For n elements in x_t ~ I(1), there can be, at most, n − 1 co-integrating combinations.¹ Hence 0 ≤ r ≤ n − 1 and the r vectors may be gathered in an n × r matrix α = [α_1, α_2, …, α_r]. Outside the bivariate model, n > 2 and the co-integrating matrix α is no longer unique in the absence of prior information. We noted in Chapter 2 the related issue for stationary equilibria, only some of which need correspond to substantive economic hypotheses.
A simple case of non-uniqueness occurs when subsets of the x_{jt} are co-integrated. In fact, for any non-singular r × r matrix F, w_t* = Fα'x_t = α*'x_t is also I(0). This last result shows that linear combinations of the co-integrating vectors themselves form co-integrating combinations. Since α_i'x_t and α_j'x_t are I(0), so is any linear combination thereof. In the terminology of linear algebra, the dimension of the co-integrating space (given by the rank of the matrix α) is r and the columns of α form the basis vectors of this space. Pre-multiplying α' by an r × r non-singular matrix F does not alter either the co-integrating space or its dimension. Therefore, strictly speaking, estimating the co-integrating matrix essentially involves deriving the basis vectors. The matrix α is non-unique in the absence of prior information.
A brief justification may be offered for focusing on the open interval (0, n) of ℕ as the domain of values for r. When r = n, x_t must be I(0), as shown in Section 8.1 below. We therefore exclude this case when we know that x_t is I(1) and only consider stochastic processes where variables are marginally I(1). Thus, n − r > 0, and we can re-express the process {x_t} in terms of I(0) processes, using the r co-integrating relationships and n − r first differences of the process. The case of r = 0 is a trivial one as it implies the absence of even a single co-integrating vector and suggests respecification of the system in differences.
As we saw in Chapter 5, Engle and Granger (1987) established an isomorphism between co-integration and error-correction models. In order to examine co-integration in systems of equations, we will derive that result, formulating the system in ECM form, in some detail below, starting this time from the moving-average representation of the process. From that system, a maximum likelihood estimator (MLE) of r, the number of co-integrating relationships, will be obtained based on a method proposed by Johansen (1988). This will in turn enable us to test hypotheses concerning the dimension of the co-integration space, and establish a 'central value' of α.
¹ A proof of this result is given in Sect. 8.1.


8.1. Co-integration and Error Correction


We now return to the representation of a co-integrated system in autoregressive or (equivalently) in error-correction form. When {Δx_t} is a stationary process (possibly) with drift, we can express it as a multivariate moving average using the Wold (1954) decomposition theorem:
where ε_t ~ IID(0, Ω); L is again the lag operator, and C(L) is a polynomial matrix in L given by

The cumulative or total effect from C(L) is given by

where the C_j again obey an exponential decay condition of the form discussed in Chapter 5. Using C(1), we can rewrite C(L) as
where C*(L) = Σ_{i=0}^∞ C_i* L^i and C_i* = −Σ_{j=i+1}^∞ C_j, so that C_0* = I_n − C(1). Note that the existence of these matrices is again guaranteed by the exponential decay condition. Thus, from (1),
or

where μ = C(1)m.
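The rewriting of C(L) in terms of C(1) and C*(L) can be verified coefficient by coefficient for a finite-order polynomial. The sketch below assumes the decomposition takes the form C(L) = C(1) + (1 − L)C*(L), which is consistent with the definition C_i* = −Σ_{j>i} C_j given in the text; the function names are hypothetical and the polynomial is truncated at a finite order q purely for illustration.

```python
import numpy as np


def split_ma_polynomial(C):
    """C has shape (q + 1, n, n) and holds C_0, ..., C_q.

    Returns C(1) and the coefficients C*_0, ..., C*_{q-1},
    where C*_i = -(C_{i+1} + ... + C_q).
    """
    C = np.asarray(C, dtype=float)
    q = C.shape[0] - 1
    C1 = C.sum(axis=0)
    C_star = np.zeros((q,) + C.shape[1:])
    for i in range(q):
        C_star[i] = -C[i + 1:].sum(axis=0)
    return C1, C_star


def recombine(C1, C_star):
    """Coefficients of C(1) + (1 - L) C*(L), ordered by power of L."""
    q = C_star.shape[0]
    out = np.zeros((q + 1,) + C1.shape)
    out[0] += C1           # constant term C(1)
    out[:q] += C_star      # + C*(L)
    out[1:] -= C_star      # - L C*(L)
    return out
```

Applying `recombine` to the output of `split_ma_polynomial` should return the original coefficient matrices, and with C_0 = I_n the first element of `C_star` equals I_n − C(1).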
The key assumptions needed to derive the autoregressive representation of the process are given below. As in Chapter 5, the proof follows Johansen (1991a).
ASSUMPTION B1. The characteristic polynomial,

has roots either equal to 1 or strictly greater than 1 in modulus; that is, |C(z)| = 0 implies that either |z| > 1 or z = 1.
ASSUMPTION B2. The matrix C(1) has reduced rank n − r and is therefore expressible as the product of two n × (n − r) matrices φ and η, where φ and η have rank n − r. Thus, C(1) = φη'.


ASSUMPTION B3. The r × r matrix φ_⊥'C*(1)η_⊥ has full rank r.

Assumptions B1-B3 are analogous to Assumptions A1-A3 in Chapter 5. Given our results on C(1) in Chapter 5, it is natural to require that C(1) be of reduced rank and have rank n − r. Also, r = n implies that C(1) is identically the null matrix. Thus, from (3), (Δx_t − μ) = C*(L)Δε_t, which implies, after integration, that x_t is integrated (at most) of order 0. Assumption B3 then rules out the possibility that C*(L) has a root on the unit circle, so x_t cannot be integrated of order −1. In either case, we have a contradiction of the assumption that the components of x_t are I(1).
To derive the autoregressive representation, multiply (3) by φ̄' and φ_⊥' respectively to obtain the equations

using the decomposition C(1) = φη' and the result that


The matrix C(1) is not invertible and the system given by (4a) and (4b) therefore cannot be inverted directly to express the x_{it} in terms of the ε_{it}. An invertible system is obtained by defining, as in Chapter 5, two new variables, w_t = (η'η)^{−1}η'ε_t and y_t = (η_⊥'η_⊥)^{−1}η_⊥'Δε_t. Repeating the steps used in Chapter 5, the matrices η̄ and η̄_⊥ are defined as η(η'η)^{−1} and η_⊥(η_⊥'η_⊥)^{−1} respectively. Next, again as in Chapter 5,
Substituting into (4a) and (4b) gives

We therefor e hav e
with

For z = 1, this matrix has determinant

² The orthogonal complement of a matrix is defined in Sect. 5.3.1. Using this definition, φ_⊥ and η_⊥ are n × r dimensional matrices with rank r.


which is non-zero, using Assumptions B2 and B3. Thus, B(z) does not have a root at 1. For z ≠ 1,
where (7) may be shown by substituting for C*(z) in B(z) in terms of C(z) and C(1) = φη', and using the orthogonality condition φ_⊥'φ = η_⊥'η = 0_{r×(n−r)}. For z ≠ 1, from (7),
Thus for z ≠ 1, |B(z)| = 0 if and only if |C(z)| = 0. Excluding z = 1, by Assumption B1 the only remaining roots of this determinant lie outside the unit circle.
All the roots of |B(z)| = 0 are therefore outside the unit disk, and the system defined by (5a) and (5b) is invertible. Thus, from (6),
Also from (6), note that

and, using the formula for inversion of partitioned matrices,

From th e definitio n of Ae ?,

where F(L ) = [fj(l - L) , i


Integrating (9) gives
where x_0 is the constant of integration. To derive the value of F(1), note that
Substituting for (B(1))^{−1} from (8) gives F(1) = η_⊥(φ_⊥'C*(1)η_⊥)^{−1}φ_⊥'.
Thus, recalling that μ = C(1)m = (φη')m, F(1)μ = 0_{n×1}. The autoregressive representation, in its final form, is therefore given by


Several features of the derivations above are noteworthy, particularly with respect to the F(1) matrix. First, F(1)C(1) = C(1)F(1) = 0_n. This result follows from substituting η_⊥(φ_⊥'C*(1)η_⊥)^{−1}φ_⊥' for F(1) and φη' for C(1) and using the orthogonality conditions. This re-emphasizes the duality, first mentioned in Chapter 5, between the impact matrix in the MA representation, given by C(1), and the impact matrix in the AR representation, given here by F(1). The null space of the former is the range space of the latter and vice versa.
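The duality between the two impact matrices can be checked numerically for arbitrary matrices satisfying Assumptions B2 and B3. The sketch below is a hedged illustration: the orthogonal complements are computed from a singular value decomposition (one of many valid bases), φ and η are assumed to have full column rank, and the function names `orth_complement` and `impact_matrices` are hypothetical.

```python
import numpy as np


def orth_complement(M):
    """Orthonormal basis for the orthogonal complement of the column
    space of M (n x k, assumed to have full column rank)."""
    U, _, _ = np.linalg.svd(M, full_matrices=True)
    return U[:, M.shape[1]:]


def impact_matrices(phi, eta, C_star_1):
    """Return C(1) = phi eta' and
    F(1) = eta_perp (phi_perp' C*(1) eta_perp)^{-1} phi_perp'."""
    phi_perp = orth_complement(phi)
    eta_perp = orth_complement(eta)
    C1 = phi @ eta.T
    middle = phi_perp.T @ C_star_1 @ eta_perp   # r x r, non-singular under B3
    F1 = eta_perp @ np.linalg.solve(middle, phi_perp.T)
    return C1, F1
```

For n = 4 and r = 2, with φ, η and C*(1) drawn at random, both products F(1)C(1) and C(1)F(1) should be zero up to rounding, and F(1) should have rank r while C(1) has rank n − r.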
Second, the isomorphism of F(1) with the γα' matrix in Chapter 5 can be demonstrated easily. Note that

Both η_⊥(φ_⊥'C*(1)η_⊥)^{−1} and φ_⊥ are matrices of rank r and dimension n × r. Thus, redefining φ_⊥ as α and η_⊥(φ_⊥'C*(1)η_⊥)^{−1} as γ, we have F(1) = γα', which is an n × n matrix with rank r and is isomorphic to π. It is natural to define φ_⊥'x_t (α'x_t in Chapter 5) as the co-integrated combinations of the x_{it}. Integrating (4b) shows that φ_⊥'x_t does not contain an integrated component of the form Σ_{i=1}^t ε_i. Further, by the orthogonality of μ with φ_⊥, the co-integrating combinations do not contain a trend. Both these results match exactly the corresponding results on α'x_t in Chapter 5.
Third, if B(L) were not of full rank, it would be possible to extract another unit root in the representation given by (6), and the system would be I(0) instead of I(1), as assumed originally. The importance of Assumption B3 is now clear. Finally, using the result that the rank of F(1) is r, it is possible to rewrite the model in error-correction form as
where F(1), like π in Chapter 5, is a matrix of rank r and can therefore be decomposed into two n × r matrices, each of rank r. The steps involved in going from the final autoregressive form of the system to the ECM form are given in (5.25)-(5.27), with π playing the role of F(1).
Sections 5.3 and 8.1 have demonstrated the isomorphism of the moving-average, error-correction, and autoregressive representations of co-integrated processes. The next section returns to the autoregressive representation and relates this to the method used by Johansen (1988), which tests the rank of π = γα', since, if there are r co-integrating vectors and γα' = π, then rank(π) = r. The non-uniqueness of these vectors (in the absence of a priori information) is easily seen:
for all r × r non-singular matrices P. However, since rank(α) = r, we can normalize α* (perhaps after suitable rearrangement of the variables) such that α*' = (I_r : β'), and so α*'x_t = x_{at} + β'x_{bt} where x_t' = (x_{at}' : x_{bt}').
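The non-uniqueness and the normalization can be made concrete with a short numerical sketch. It assumes the rotation takes the form γ* = γP^{−1} and α*' = Pα', which leaves π = γα' unchanged, and it chooses P so that the first r columns of α*' form an identity matrix; this requires the first r rows of α to be non-singular (if they are not, the variables would need to be reordered first). The function name is hypothetical.

```python
import numpy as np


def normalize_cointegrating_vectors(gamma, alpha):
    """Rotate (gamma, alpha) so that alpha_star' = (I_r : beta')
    while gamma_star alpha_star' = gamma alpha'."""
    r = alpha.shape[1]
    P = np.linalg.inv(alpha[:r, :].T)        # r x r rotation matrix
    alpha_star = alpha @ P.T                 # alpha_star' = P alpha'
    gamma_star = gamma @ np.linalg.inv(P)    # gamma_star = gamma P^{-1}
    return gamma_star, alpha_star
```

The lower block of `alpha_star` then holds β, so that α*'x_t = x_{at} + β'x_{bt} in the partition used in the text.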


An important point for inference, given (10), is that the ECM terms α'x_{t−k} will generally enter more than one equation. This will violate weak exogeneity when α is a parameter of interest, since the ECMs will be present in some of the other marginal distributions, and will therefore necessitate joint estimation for efficiency as discussed in Chapter 7 (see e.g. Phillips 1991, and Phillips and Hansen 1990). Hence a necessary condition for the use of single-equation methods to be appropriate in the analysis of co-integrated systems is that the relevant ECM terms enter only the equation under study; this is clearly not a sufficient condition, since it is possible that there can be links between other parameters.
As an illustration of (10), consider the case where n = 2 and r = 1. Let α' = (1, −κ) and x_0 = 0, so that the respective systems become

(11')
The form in (11) is the 'canonical' representation in I(0) space, and Phillips (1991) focuses on estimation of this system. When E(u_{1t}u_{2t}) ≠ 0, a 'simultaneity problem' is present, but this can be dealt with by the inclusion of Δx_{2t} as a regressor in the first equation of (11). The functional central-limit theorems for Wiener processes noted in earlier chapters apply despite the serial dependence in u_t = [u_{1t}, u_{2t}]', and direct estimation of κ in the first equation of (11) can be seen as the method originally proposed by Engle and Granger (1987). Inference must, however, allow for the serial dependence in u_t.
The latter system, (11'), highlights the 'structural' form. At least one of γ_1 or γ_2 must be non-zero, since otherwise the system can be expressed in terms of differenced variables alone. Weak exogeneity is violated by (among other possibilities) γ_1γ_2 ≠ 0. Since we are unlikely to know a priori which other equations are influenced by any given ECM, we turn now to a method of estimating the co-integrating rank r of a system, which will also allow tests of this aspect of weak exogeneity.

8.2. Estimating Co-integrating Vectors in Systems


Consider the linear system in (10) rewritten as


where, fo r simplicity , w e hav e exclude d deterministi c term s suc h a s


trends o r constants . W e shal l retur n t o a consideratio n o f thes e i n
Section 8.5 . I n general , th e numbe r o f co-integratin g vector s wil l b e
unknown i n empirica l modelling , an d mus t firs t b e determine d fro m th e
data. Thi s step is important, becaus e both under - an d over-estimatio n of
r hav e potentiall y seriou s consequence s fo r estimatio n an d inference .
Under-estimation implie s th e omissio n o f empiricall y relevan t error correction terms , wit h thes e omitte d term s bein g relegate d t o e ( .
Over-estimation implie s tha t th e distribution s o f statistic s wil l b e non standard. Thi s ma y b e demonstrate d b y inspectio n o f (12) . I f n i s
correctly specified , al l th e variable s i n (12 ) ar e 1(0 ) an d standar d
distributional result s apply . Howeve r ft">it-k will no t b e 1(0 ) i f the matri x
contain s vector s 0,%, say , suc h tha t ax r _ i s no t a co-integratin g
combination an d i s therefore 1(1) . Th e vecto r itx. t-k W 'H hav e a mixture
of 1(0 ) an d 1(1 ) term s correspondin g t o th e correc t an d incorrec t (o r
over-estimated) co-integratin g vector s respectively . Incorrec t inference s
will resul t fro m th e us e o f conventiona l critica l value s i n tests . W e wil l
see late r tha t thi s ma y als o hav e a n advers e effec t o n forecastin g
accuracy.
Once $r$ is known, we can proceed to estimate $\alpha$ and $\gamma$, noting that
non-singular linear combinations of these matrices provide equivalent
representations. Indeed, $(\alpha : \gamma)$ is an over-parameterization of $\pi$, so only
the dimension of the co-integrating space can be established directly.
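The over-parameterization can be checked numerically. The following is a minimal sketch, assuming purely illustrative values for $\gamma$, $\alpha$, and a rotation matrix `Q` (none of these numbers come from the text): any non-singular $r \times r$ matrix $Q$ yields a different pair $(\alpha Q, \gamma Q'^{-1})$ that reproduces the same $\pi$.

```python
import numpy as np

# Hypothetical values chosen only for illustration (n = 3, r = 2).
gamma = np.array([[-0.5, 0.1],
                  [0.2, -0.3],
                  [0.0, 0.4]])            # n x r adjustment coefficients
alpha = np.array([[1.0, 0.0],
                  [-1.0, 1.0],
                  [0.0, -1.0]])           # n x r co-integrating vectors
pi = gamma @ alpha.T                      # n x n matrix of rank r

# Any non-singular r x r matrix Q gives an equivalent representation.
Q = np.array([[2.0, 1.0],
              [0.5, 3.0]])
alpha_star = alpha @ Q                    # rotated co-integrating vectors
gamma_star = gamma @ np.linalg.inv(Q).T   # compensating adjustment matrix
pi_star = gamma_star @ alpha_star.T

print(np.allclose(pi, pi_star))           # the two products coincide
print(np.linalg.matrix_rank(pi))          # rank equals r, not n
```

Only the space spanned by the columns of $\alpha$ is pinned down by $\pi$, which is why a normalization is needed before individual vectors can be reported.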
A test for the null hypothesis that there are $r$ co-integrating vectors
can be based on the maximum likelihood approach proposed by
Johansen (1988). The test is equivalent to testing whether $\pi = \gamma\alpha'$,
where $\alpha$ and $\gamma$ are $n \times r$; hence it is a test of the hypothesis that $\pi$ has
less than full rank.
We emphasize that, of the three distinct cases, (i) $r = n$, (ii) $r = 0$,
and (iii) $0 < r < n$, only case (iii) will be considered formally. We have
already shown that case (i) implies that all the variables in $x_t$ are I(0)
and would only be of interest if our initial assumption, that $x_t$ is I(1),
were incorrect. In case (ii), $\pi = 0$ and the system ought to be respecified
in differences to achieve stationarity. We can potentially cover this case
as an extreme of case (iii).
For $0 < r < n$, under the assumptions that (12) is the DGP, that all
coefficient matrices are constant, that $x_{1-k}, \ldots, x_0$ are given and that³
$$\varepsilon_t \sim \mathrm{IN}(0, \Omega),$$
the log-likelihood function is derived from the multivariate normal
distribution:⁴

3
Phillips and Durlauf (1986) derive the limiting distribution of the least-squares
estimator of (the equivalent of) $\pi$, allowing for more general error processes.

The first step is to concentrate $L(\cdot)$ with respect to $\Omega$, which involves
no new considerations, and yields the conventional result that
$\hat\Omega = T^{-1}\sum_{t=1}^{T} \hat\varepsilon_t \hat\varepsilon_t'$. Next, we remove the known I(0) variables from (12)
to focus on the matrix of interest $\pi$, which requires concentrating $L(\cdot)$
with respect to $(D_1, \ldots, D_{k-1})$. To do so, since the $\{D_i\}$ are
unrestricted, we can partial out the effects of $(\Delta x_{t-1}, \ldots, \Delta x_{t-k+1})$ from
both $\Delta x_t$ and $x_{t-k}$ by regression, to obtain residuals $R_{0t}$ and $R_{kt}$
respectively. Let $q_t = (\Delta x_{t-1}', \ldots, \Delta x_{t-k+1}')'$; then

The concentrated likelihood function $L^*(\pi)$ now depends only on
$\{R_{0t}, R_{kt}\}$ and takes the form

Next, we compute the second-moment matrices of all of these
residuals and their cross-products, $S_{00}$, $S_{0k}$, $S_{k0}$, $S_{kk}$, where

4
Note that we use the upper-case $\Pi$ for the ratio of the circumference of a circle to its
diameter, as opposed to the lower-case $\pi$ defined earlier as the matrix product $\gamma\alpha'$.
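The partialling-out step and the second-moment matrices can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function names `partial_out` and `moments` are hypothetical, the data are simulated random walks, and the moment matrices are scaled by $1/T$ (a common convention; the eigenvalues used later do not depend on this scaling).

```python
import numpy as np

def partial_out(x, k):
    """Regress dx_t and x_{t-k} on q_t = (dx_{t-1}, ..., dx_{t-k+1}).

    x is an N x n array of levels. Returns the residuals R0, Rk and q.
    """
    N, n = x.shape
    dx = np.diff(x, axis=0)                 # dx[j] = x[j+1] - x[j]
    dy = dx[k - 1:]                         # dx_t for t = k, ..., N-1
    xk = x[:N - k]                          # x_{t-k} for the same t
    if k > 1:
        q = np.hstack([dx[k - 1 - i:N - 1 - i] for i in range(1, k)])
        R0 = dy - q @ np.linalg.lstsq(q, dy, rcond=None)[0]
        Rk = xk - q @ np.linalg.lstsq(q, xk, rcond=None)[0]
    else:
        q = np.empty((dy.shape[0], 0))      # no lagged differences to remove
        R0, Rk = dy, xk
    return R0, Rk, q

def moments(R0, Rk):
    """Second-moment matrices of the residuals (assumed scaling: 1/T)."""
    T = R0.shape[0]
    return {"00": R0.T @ R0 / T, "0k": R0.T @ Rk / T,
            "k0": Rk.T @ R0 / T, "kk": Rk.T @ Rk / T}

# Simulated bivariate random walks, used only to exercise the functions.
rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal((200, 2)), axis=0)
R0, Rk, q = partial_out(x, k=2)
S = moments(R0, Rk)
print(R0.shape, Rk.shape, S["kk"].shape)
```

By construction the residuals are orthogonal to `q`, so the lagged differences play no further role once `R0` and `Rk` have been formed.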

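Consequently, from (18),
$$L^*(\pi) = \text{constant} - \frac{T}{2}\log\left|S_{00} - \pi S_{k0} - S_{0k}\pi' + \pi S_{kk}\pi'\right|. \qquad (20)$$
If $\pi$ were unrestricted, a conventional regression estimator would result.
However, we are interested in the class of solutions that result from the
imposition of the restriction that
$$\pi = \gamma\alpha'.$$
Hence, from (20),
$$L^*(\gamma, \alpha) = \text{constant} - \frac{T}{2}\log\left|S_{00} - \gamma\alpha' S_{k0} - S_{0k}\alpha\gamma' + \gamma\alpha' S_{kk}\alpha\gamma'\right|. \qquad (21)$$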

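Next, concentrate $L^*(\gamma, \alpha)$ with respect to $\gamma$, which will deliver an
expression for the MLE of $\gamma$ as a function of $\alpha$, and yields a further
concentrated likelihood function which depends only on $\alpha$. Once the
MLE of $\alpha$ is obtained, we can solve backwards for estimates of all the
other unknown parameters as functions of the MLE of $\alpha$. Thus, from
(21),
$$\hat\gamma = S_{0k}\alpha(\alpha' S_{kk}\alpha)^{-1}. \qquad (22)$$
Substituting $\hat\gamma$ into (21) yields $L^{**}(\alpha)$:
$$L^{**}(\alpha) = \text{constant} - \frac{T}{2}\log\left|S_{00} - S_{0k}\alpha(\alpha' S_{kk}\alpha)^{-1}\alpha' S_{k0}\right|. \qquad (23)$$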

At first sight, differentiating $L^{**}(\alpha)$ with respect to $\alpha$ looks formidable,
but in fact the algebra involved is close to that underlying the
well-known LIML estimator for a single equation from a simultaneous
system; both depend on reduced-rank restrictions being imposed. In
order to solve the problem, we apply partitioned inversion results to
(23) and obtain
$$L^{**}(\alpha) = \text{constant} - \frac{T}{2}\log\left\{\frac{|S_{00}|\,\left|\alpha'(S_{kk} - S_{k0}S_{00}^{-1}S_{0k})\alpha\right|}{|\alpha' S_{kk}\alpha|}\right\}. \qquad (24)$$

Then maximizing $L^{**}(\alpha)$ with respect to $\alpha$ corresponds to minimizing
the generalized variance ratio,
noting that $|S_{00}|$ is a constant. To locate that minimum, we proceed as
with LIML and impose the normalization that $\alpha' S_{kk}\alpha = I$. The MLE
now requires that we minimize, with respect to $\alpha$,

This involves finding the saddle-point of the Lagrangian,

where $\phi$ is the Lagrangian multiplier associated with the constraint. The
minimization has now been translated into a generalized eigenvalue problem,
where we need to solve a set of equations of the form⁵

where $\lambda$ is given by solving
$$\left|\lambda S_{kk} - S_{k0}S_{00}^{-1}S_{0k}\right| = 0 \qquad (27)$$
for the $r$ largest eigenvalues $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_r \geq \ldots \geq \lambda_n \geq 0$. The
columns of $\hat\alpha$ are the corresponding eigenvectors.
The complete set of eigenvectors is given by solving

subject to the normalization

Thus $\hat\alpha$ simultaneously diagonalizes $S_{kk}$ (to a unit matrix) and $S_{k0}S_{00}^{-1}S_{0k}$
(to $\Lambda$, the matrix of eigenvalues). These two diagonalizations impose $r^2$
restrictions in total on the system.
Defining a selection matrix $P' = (I_r, 0')$, we may deduce from (28)
that (a) $\hat\alpha = \hat V P$ and (b) $\hat\alpha' S_{kk}\hat\alpha = I$. Further, we can set $\hat V = (\hat\alpha, \hat d)$, for
instance, where the columns of $\hat d$ are the estimates of the $n - r$
eigenvectors corresponding to the $n - r$ smallest eigenvalues. Once $\hat\alpha$
has been calculated, all the other intermediate MLEs can be obtained.
For example, from (22),⁶ $\hat\gamma = S_{0k}\hat\alpha$ and thus, $\hat\pi = \hat\gamma\hat\alpha' = S_{0k}\hat\alpha\hat\alpha'$. Also,
from (13) and (23),⁷ $T\hat\Omega = S_{00} - S_{0k}\hat\alpha\hat\alpha' S_{k0} = S_{00} - \hat\pi S_{k0}$.
Johansen (1988) proves that $\hat\Omega$ and $\hat\pi$ are consistent for $\Omega$ and $\pi$
respectively. The expression for $\hat\Omega$ given above is a natural one, in the
sense that an unrestricted estimator of $\Omega$ would be similar in form but
would use an unrestricted estimator $\tilde\pi$ of $\pi$. Thus, denoting the
unrestricted estimator of $\Omega$ by $\tilde\Omega$, then $\tilde\Omega = S_{00} - \tilde\pi S_{k0}$ and $\tilde\pi = S_{0k}S_{kk}^{-1}$.
Conventional methods may be used to solve (27). Since $S_{kk}$ is a
symmetric, positive-definite matrix for finite $T$, its inverse can be
factorized as
$$S_{kk}^{-1} = GG',$$
where $G$ is non-singular. Substituting this expression into (27) produces
a conventional eigenvalue problem:
$$\left|\lambda I - G' S_{k0}S_{00}^{-1}S_{0k}G\right| = 0. \qquad (30)$$
In deriving (30), we have made use of the fact that $G' S_{kk} G = I$. Thus,
only conventional estimation tools are needed.

5
By a standard result from the theory of canonical correlations, an expression of the
form $|\zeta'(M_1 - M_2)\zeta|\,|\zeta' M_1 \zeta|^{-1}$ can be minimized by solving the equation $|\lambda M_1 - M_2| = 0$.
This is the result used above. Details are contained in Anderson (1958).
6
Note that we have used the restriction that $\hat\alpha' S_{kk}\hat\alpha = I$.
7
From (13) and (20), $\hat\Omega = T^{-1}\sum_t \hat\varepsilon_t\hat\varepsilon_t' =
T^{-1}(S_{00} - \hat\pi S_{k0} - S_{0k}\hat\pi' + \hat\pi S_{kk}\hat\pi')$. Substituting for $\hat\pi = S_{0k}\hat\alpha\hat\alpha'$ and simplifying gives the desired result.
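The reduction of (27) to a conventional symmetric eigenvalue problem can be sketched as follows. This is a minimal illustration, assuming a Cholesky factor is used to build $G$ (any factorization with $GG' = S_{kk}^{-1}$ would serve), simulated data with one co-integrating relation, no lagged differences ($k = 1$), and moment matrices scaled by $1/T$. The function name `johansen_eigen` is hypothetical.

```python
import numpy as np

def johansen_eigen(S00, S0k, Skk):
    """Solve |lambda*Skk - Sk0 S00^{-1} S0k| = 0 via Skk^{-1} = G G'."""
    Sk0 = S0k.T
    L = np.linalg.cholesky(Skk)               # Skk = L L'
    G = np.linalg.inv(L).T                    # G G' = Skk^{-1}, G' Skk G = I
    M = G.T @ Sk0 @ np.linalg.solve(S00, S0k) @ G
    M = (M + M.T) / 2                         # remove rounding asymmetry
    lam, E = np.linalg.eigh(M)                # ascending eigenvalues
    order = np.argsort(lam)[::-1]             # reorder: largest first
    lam, E = lam[order], E[:, order]
    V = G @ E                                 # eigenvectors with V' Skk V = I
    return lam, V

# Simulated data: x2 is a random walk, x1 = x2 + stationary noise.
rng = np.random.default_rng(1)
e = rng.standard_normal((300, 2))
x2 = np.cumsum(e[:, 0])
x = np.column_stack([x2 + e[:, 1], x2])
R0, Rk = np.diff(x, axis=0), x[:-1]           # k = 1: nothing to partial out
T = R0.shape[0]
S00, S0k, Skk = R0.T @ R0 / T, R0.T @ Rk / T, Rk.T @ Rk / T

lam, V = johansen_eigen(S00, S0k, Skk)
r = 1
alpha_hat = V[:, :r]                          # estimated co-integrating vector
gamma_hat = S0k @ alpha_hat                   # adjustment coefficients, as in (22)
print(lam, alpha_hat.ravel() / alpha_hat[0, 0])
```

Because `V` is obtained from an orthonormal eigenbasis of the transformed matrix, the normalization $\hat V' S_{kk} \hat V = I$ holds automatically.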
Further, from (29),
$$S_{k0}S_{00}^{-1}S_{0k}\hat V = S_{kk}\hat V\Lambda, \qquad (31)$$
where $\Lambda$ is the diagonal matrix of eigenvalues. Hence, as $\hat V' S_{kk}\hat V = I$,
$$\hat V' S_{k0}S_{00}^{-1}S_{0k}\hat V = \Lambda, \qquad (32)$$
so that $S_{k0}S_{00}^{-1}S_{0k}$ is diagonalized to $\Lambda$ by the $\hat V'$, $\hat V$ transformation.
Moreover, $\Lambda$ is ordered such that the first $r$ elements (denoted $\Lambda_r$) are
the largest eigenvalues, and the remaining $(n - r)$ (denoted $\Lambda_{n-r}$) are
the smallest. These eigenvalues will play a primary role in inference
about the dimension $r$ of the co-integrating space. We focus on this
issue in the next section where the asymptotic distribution of the
estimators of the eigenvalues is also discussed. Finally, from (32),

where $\rho$ is the $(n - r) \times n$ matrix, analogous to $\gamma$, and corresponds to
the omitted eigenvectors.

8.3. Inference about the Co-integration Space


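From (24) and (32), the maximized value of the likelihood function (23)
is given by
$$L^{**}(\hat\alpha) = \text{constant} - \frac{T}{2}\log|I - \Lambda_r| = \text{constant} - \frac{T}{2}\sum_{i=1}^{r}\log(1 - \lambda_i), \qquad (33)$$
since $\Lambda_r$ is the sub-matrix of $\Lambda$ corresponding to the $r$ largest
eigenvalues.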

Denote by $H_r$ the hypothesis that there are $r$ co-integrating vectors in
the system (i.e. there are $n - r$ unit roots). When $\pi$ is unrestricted, all
$n$ eigenvalues are retained and the unrestricted maximum of the
likelihood function is given by
$$L^{**}(\hat V) = \text{constant} - \frac{T}{2}\sum_{i=1}^{n}\log(1 - \lambda_i). \qquad (34)$$
Since the $r$ largest eigenvalues deliver the co-integration vectors, and
since $\lambda_{r+1}, \lambda_{r+2}, \ldots, \lambda_n$ should be zero for the non-co-integrating
combinations, tests of the hypothesis that there are at most $r$
co-integrating vectors, $0 \leq r < n$, and thus $n - r$ unit roots, can be based on
twice the difference between the log-likelihood in (33) and that in (34);
that is,
$$\eta_r = -T\sum_{i=r+1}^{n}\log(1 - \lambda_i), \qquad r = 0, 1, \ldots, n - 1. \qquad (35)$$

The distribution of the $\eta_r$ or trace statistic is derived under the
hypothesis that there are $r$ co-integrating vectors and tests $H_r$ within
$H_n$. The test strategy is, therefore, the multivariate analogue of the DF
test: the potentially stationary variant is estimated, the coefficient
(matrix) of the levels is tested for significance, and unit roots are
imposed where the null cannot be rejected. The testing therefore
proceeds in sequence from $\eta_0, \eta_1, \ldots, \eta_{n-1}$. The number of
co-integrating vectors selected is $r + 1$ where the last significant statistic is $\eta_r$,
which thereby rejects the hypothesis of $n - r$ unit roots. $H_0$ is not
rejected if $\eta_0$ is insignificant; $H_0$ is rejected in favour of $H_1$ if $\eta_0$ is
significant; etc. Since $\eta_r = -T\log|I - \Lambda_{n-r}|$, from (32) $\eta_r$ measures the
'importance' of the adjustment coefficients on the eigenvectors to be
potentially omitted. The distribution of $\eta_r$ will be discussed shortly;
however, it will not be the conventional $\chi^2$ distribution because $x_t$ is a
(multivariate) I(1) process. Thus, while $\eta_r$ still measures the cost in
likelihood terms of omitting $n - r$ linear combinations of the levels of
$x_{t-k}$, the metric for judging a significant loss of likelihood is different
from that in the I(0) case.
Alternatively, tests of significance of the largest $\lambda_r$ could be based on
$$\zeta_r = -T\log(1 - \lambda_{r+1}), \qquad r = 0, 1, \ldots, n - 1. \qquad (36)$$
From (36), $\zeta_r$ tests $H_r$ within $H_{r+1}$. The $\zeta_r$ statistic is often called the
maximal-eigenvalue or $\lambda$-max statistic.
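Both statistics are simple functions of the ordered eigenvalues. The sketch below assumes the usual forms $\eta_r = -T\sum_{i=r+1}^{n}\log(1 - \lambda_i)$ and $\zeta_r = -T\log(1 - \lambda_{r+1})$; the eigenvalues and sample size are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def rank_test_statistics(eigenvalues, T):
    """Return (trace, lambda_max) arrays indexed by r = 0, ..., n-1."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # largest first
    log_terms = np.log(1.0 - lam)
    # eta_r sums the terms for the n - r smallest eigenvalues.
    trace = -T * np.cumsum(log_terms[::-1])[::-1]
    # zeta_r uses only the (r+1)-th largest eigenvalue.
    lambda_max = -T * log_terms
    return trace, lambda_max

# Hypothetical eigenvalues for a bivariate system with T = 100.
trace, lambda_max = rank_test_statistics([0.2, 0.01], T=100)
for r in range(2):
    print(r, round(trace[r], 3), round(lambda_max[r], 3))
```

The two statistics coincide for $r = n - 1$, and the trace statistic for $r = 0$ is the sum of all the maximal-eigenvalue statistics.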
Both $\eta_r$ and $\zeta_r$ have non-standard distributions which are functionals
of multivariate Wiener processes. For $\eta_r$, this process is of dimension
$n - r$. These distributions are generalizations of the scalar
(Dickey-Fuller) Wiener processes considered in earlier chapters. The crucial
feature that makes these methods operational is that the distributions
only depend on the dimension $n$ of the process under analysis. Thus,
although there are no analytical forms for the distributions, critical
values under their respective nulls can be obtained by Monte Carlo
simulation. For example, critical values for the above tests have been
tabulated by Johansen (1988) and Osterwald-Lenum (1992), inter alia,
for a range of values of $n$. The upper percentiles of the
Osterwald-Lenum tables are given in Table 8.1.⁸ Even though the distributions are
non-standard, Johansen (1988) suggests a $\chi^2$-based approximation to the
distribution of $\eta_r$ of the form

where $h = 0.85 - 0.58/(2m^2)$ for $m = n - r$.
Once the degree of co-integration has been established, the
co-integrating combinations are given by

and these linear combinations of the data are the estimated ECMs. As
before, linear transformations are also valid co-integrating vectors, and a
choice among these could be made either on the basis of prior
information or by following tests for hypothesized vectors as considered
in Section 8.5.2.
Moreover, once the ECMs have been defined, $\hat\gamma$ reveals the importance
of each co-integrating combination in each equation, and is related
to the speeds of adjustment of each dependent variable to the associated
disequilibria. If a given ECM enters more than one equation, the
co-integration parameters are inherently cross-linked between such
equations, and hence their dependent variables cannot be weakly
exogenous in the related equations. This implies that joint estimation is
required to compute fully efficient estimators. By way of contrast, if a
given column of $\gamma$ is zero except for a single entry, and there is only one
co-integrating vector, single-equation estimation of that relation will not
lead to any loss of information on co-integration.

TABLE 8.1. Quantiles of the asymptotic distribution of the co-integration
rank test statistics $\eta_r$ and $\zeta_r$
DGP and model: $\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \pi x_{t-k} + \varepsilon_t$; $\varepsilon_t \sim \mathrm{IN}(0, \Omega)$

n - r       90%       95%     97.5%       99%

$\zeta_r$ ($\lambda$-max)
   1       2.86      3.84      4.93      6.51
   2       9.52     11.44     13.27     15.69
   3      15.59     17.89     20.02     22.99
   4      21.58     23.80     26.14     28.82
   5      27.62     30.04     32.51     35.17
   6      33.62     36.36     38.59     41.00
   7      38.98     41.51     44.28     47.15
   8      44.99     47.99     50.78     53.90
   9      50.65     53.69     56.55     59.78
  10      56.09     59.06     61.57     65.21
  11      61.96     65.30     68.35     72.36

$\eta_r$ (trace)
   1       2.86      3.84      4.93      6.51
   2      10.47     12.53     14.43     16.31
   3      21.63     24.31     26.64     29.75
   4      36.58     39.89     42.30     45.58
   5      55.44     59.46     62.91     66.52
   6      78.36     82.49     86.09     90.45
   7     104.77    109.99    114.22    119.80
   8     135.24    141.20    146.78    152.32
   9     169.45    175.77    181.44    187.31
  10     206.05    212.67    219.88    226.40
  11     248.45    255.27    261.71    269.81

Source: Osterwald-Lenum (1992: Table 0).

8
The tables in Osterwald-Lenum (1992) give critical values for values of $n$ running from
1 to 11 and are therefore more extensive than those in Johansen (1988). We are grateful to
Michael Osterwald-Lenum for permission to reproduce this table.

8.4. An Empirical Illustration

To illustrate the calculation involved in the MLE, we consider the
relationship between the (logs of) the prices of new and second-hand
houses in the U.K., denoted $p_{n,t}$ and $p_{h,t}$ respectively, over the
quarterly (seasonally unadjusted) sample 1957(III)-1981(II). A lag
length of two periods is selected to capture the main short-run dynamics
in a parsimonious way, and the system to be estimated takes the form
(see Ericsson and Hendry 1985)

The constant and the three seasonal dummy variables (denoted $q_{it}$)
included unrestrictedly in both equations were first concentrated out of
the likelihood by regressing the remaining variables on them and taking
the residuals as the 'new' data set. Next, the lagged differenced
variables were removed in a similar way (see equations (14)-(17)) to
leave the $R_{0t}$ and $R_{2t}$ terms used in calculating the second moments $S_{ij}$
in (19). Given these moments, (27) can be solved for the eigenvalues $\lambda_i$,
which yielded
The test-statistics $\eta_r$ and $\zeta_r$ based on these, together with their 5 per
cent critical values from Table 1 of Osterwald-Lenum (1992) (denoted
by $\eta_r(0.05)$ etc.) are given in Table 8.2. The hypothesis that there are
two unit roots can be rejected in favour of one unit root (and hence one
co-integrating vector) at the 5 per cent level using both statistics, but the
hypothesis that there is one unit root cannot be rejected against the
maintained hypothesis of no unit roots. We therefore select $r = 1$ in this
case.
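The sequential decision rule applied here can be written out explicitly. The following sketch uses the statistics and 5 per cent critical values as read from Table 8.2; the helper name `select_rank` is hypothetical, and the rule shown (stop at the first hypothesis that cannot be rejected) is the procedure described in Section 8.3.

```python
def select_rank(statistics, critical_values):
    """Test H_0, H_1, ... in turn; return the first r that is not rejected."""
    for r, (stat, crit) in enumerate(zip(statistics, critical_values)):
        if stat <= crit:          # cannot reject 'at most r' vectors
            return r
    return len(statistics)        # every null rejected: full rank

# Values from Table 8.2 (n = 2), ordered r = 0, 1.
trace_stats, trace_crit = [16.5, 0.41], [15.4, 3.76]
lmax_stats, lmax_crit = [16.1, 0.41], [14.1, 3.76]

print(select_rank(trace_stats, trace_crit))   # 1
print(select_rank(lmax_stats, lmax_crit))     # 1
```

Both statistics reject $r = 0$ and fail to reject $r = 1$, which reproduces the choice of one co-integrating vector.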
The corresponding estimated eigenvectors (normalized by their
diagonal elements) are given in Table 8.3. The rows are the rows of $\hat\alpha'$,
and both are approximately $(1, -1)$ and $(-1, 1)$, which corresponds to
the relative price $(p_n - p_h)$ being the co-integrating relation, as might be
expected for an ECM.
The estimates of $\gamma$ are given in Table 8.4. The first column
corresponds to the first column of $\hat\gamma$ and reveals one reasonably large
feedback coefficient of $-0.06$ from $(p_{n,t-2} - p_{h,t-2})$ on to $\Delta p_{n,t}$; most of
the remaining coefficients are relatively close to being negligible, given
the meaning and units of the ECM here. Thus, it would not be possible
to reject the hypothesis that $p_{h,t}$ was weakly exogenous in the $p_{n,t}$
equation on the basis of this evidence alone. The small values of the
coefficients in the second column are consistent with the very small
values of $\eta_1$ and $\zeta_1$; so little loss of likelihood would result from
respecifying the system in terms of the I(0) variables $\Delta p_{n,t}$, $\Delta p_{h,t}$, and
$(p_{n,t-1} - p_{h,t-1})$.

TABLE 8.2. Tests and critical values

                       $\zeta_r$   $\zeta_r(0.05)$    $\eta_r$   $\eta_r(0.05)$
r = 0 (n - r = 2)      16.1       14.1       16.5       15.4
r = 1 (n - r = 1)      0.41       3.76       0.41       3.76

Source: Osterwald-Lenum (1992), Table 1.

TABLE 8.3. Normalized eigenvectors $\hat\alpha'$

Variable      $p_n$        $p_h$
$p_n$        1.000     -1.077
$p_h$       -1.063      1.000

TABLE 8.4. Adjustment coefficients $\hat\gamma$

Variable      $p_n$        $p_h$
$p_n$       -0.063     -0.007
$p_h$        0.022     -0.019

8.5. Extensions
The preceding results hold for a simple model. Several possible
extensions and other considerations arise in this model and we shall briefly
consider eight of these:
1. dummy variables (such as constants and trends);
2. linear restrictions on co-integrating vectors;
3. powers of tests;
4. forecasting in co-integrated processes;
5. finite-sample properties;
6. selecting lag length;
7. I(2) variables;
8. weak exogeneity and conditional models.
8.5.1. Dummy Variables
The first issue of practical importance is the potential presence of
intercepts in the equations. The inclusion of intercepts in the estimated
system alters the critical values of the tests from those that obtain when
no intercepts are present (as a comparison of Table 8.1 (no constant)
with Table 8.5 below shows). Under the null of no co-integrating
vectors, non-zero intercepts would generate trends. However, even in
equations with ECMs, two possibilities arise: that the intercept enters
only in the ECM, or that it also enters as an autonomous growth factor
in the equation. Both cases are considered by Osterwald-Lenum (1992)
and Johansen and Juselius (1990). In terms of (12), the model becomes
$$\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \pi x_{t-k} + \mu + \varepsilon_t \qquad (37)$$
where $\mu$ is an $n \times 1$ vector of intercepts. When $\mu$ is unrestricted, it can
be concentrated out of the likelihood function, and merely makes all
variables deviations about their sample means. After estimation of $\gamma$ and
$\alpha$, the MLE of $\mu$ can be derived in the same way as the other
parameters, concentrated out of the likelihood function, were estimated
in Section 8.2.
If any given equation contains an ECM, then the estimated
(unrestricted) intercept could be included in that term, perhaps at the cost
of having ECMs with non-zero means. However, this could lead to the
system having different means for the same ECM in different equations.
An interesting alternative possibility is that $\mu$ is restricted to entering
only the ECMs, namely,
$$\mu = \gamma\alpha_0,$$
where $\alpha_0$ is $r \times 1$. In that case, (37) becomes
$$\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \gamma(\alpha' x_{t-k} + \alpha_0) + \varepsilon_t.$$
Equations without ECMs clearly are random walks without drift (but
may have lagged differences), while equations with ECMs have a
common mean given by $\gamma\alpha_0$, and hence also have no drift. Models of
the term structure of interest rates might be expected to have such a
property. Hall, Anderson, and Granger (1992), Johansen and Juselius
(1990), and Osterwald-Lenum (1992) discuss testing for this possibility.
More specifically, consider a system written in first-order
autoregressive form (either set $k = 1$ or regard the system as being stacked as in
Chapter 5):

where $\pi = \gamma\alpha'$ and $\pi^* = I + \gamma\alpha'$. Reformulate (38) in I(0) space by
partitioning $x_t$ into $(x_{at}' : x_{bt}')'$ where $\alpha' x_t$ and $\Delta x_{bt}$ are I(0) by
construction. From (38),

where $w_t = (x_t'\alpha : \Delta x_{bt}')' = (w_{at}' : w_{bt}')'$ and $\gamma' = (\gamma_a' : \gamma_b')$, which is
$(r \times r : r \times (n - r))$, so that normalizing by $\alpha' = (I_r : \alpha^{*\prime})$ then

where $\varepsilon_t \sim \mathrm{IN}(0, \Sigma)$. Letting $J' = (0 : I)$, it is seen that

This I(0) form allows us to determine the unconditional means and
variances of the variables and hence to establish the impact of $\mu$ on the
growth of the variables. When $\alpha'\gamma$ is non-singular, the long-run solution
for the system is defined by

so that

which determines the growth in the system. Since $\pi^*\gamma = (I + \gamma\alpha')\gamma =
\gamma(I + \alpha'\gamma) = \gamma\psi$ where, matching the structure of $C$, $\psi = (I + \alpha'\gamma)$, it
follows that $\pi^{*s}\gamma = \gamma\psi^s$. But since $C$ defines the I(0) representation,
$\psi^s \to 0$ as $s \to \infty$, so that $\pi^*$ has some roots equal to unity and a
convergent component $\psi$. In a bivariate case, $\psi$ would be the stationary
root of $\pi^*$.
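The relation between $\pi^*$ and $\psi$ is easy to verify numerically in the bivariate case. The values of $\alpha$ and $\gamma$ below are hypothetical and chosen only so that $\alpha'\gamma = -0.6$; the sketch checks that $\pi^*\gamma = \gamma\psi$ and that the roots of $\pi^*$ are unity and $\psi = 1 + \alpha'\gamma$.

```python
import numpy as np

# Hypothetical bivariate example with one co-integrating vector.
alpha = np.array([[1.0], [-1.0]])
gamma = np.array([[-0.5], [0.1]])

pi_star = np.eye(2) + gamma @ alpha.T          # pi* = I + gamma alpha'
psi = np.eye(1) + alpha.T @ gamma              # psi = I + alpha' gamma (1 x 1 here)

# pi* gamma = gamma psi, and hence pi*^s gamma = gamma psi^s.
s = 25
lhs = np.linalg.matrix_power(pi_star, s) @ gamma
rhs = gamma @ np.linalg.matrix_power(psi, s)
print(np.allclose(pi_star @ gamma, gamma @ psi), np.allclose(lhs, rhs))

# The roots of pi*: one unit root and the stationary root psi.
roots = np.sort(np.linalg.eigvals(pi_star).real)
print(roots, psi.item())
```

With $|\psi| < 1$ the term $\gamma\psi^s$ dies out as $s$ grows, which is the convergent component referred to in the text.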
The matrix $K$ is non-symmetric and idempotent with $\alpha' K = 0'$ and
$K\gamma = 0$ so that $\pi^* K = K$. Also, when $\gamma = 0$ then $K = I$. Since the
condition that $\mu$ falls in the co-integrating space is $\mu = \gamma\alpha_0$ where $\alpha_0$ is
$r \times 1$, then

confirming the absence of any linear trend in $x_t$ when $\mu = \gamma\alpha_0$.
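The stated properties of $K$ can be checked directly. The sketch below assumes the form $K = I - \gamma(\alpha'\gamma)^{-1}\alpha'$, which is an assumption made here for illustration (it satisfies every property listed above when $\alpha'\gamma$ is non-singular); the numerical values of $\alpha$, $\gamma$, and $\alpha_0$ are hypothetical.

```python
import numpy as np

alpha = np.array([[1.0], [-1.0]])              # hypothetical values
gamma = np.array([[-0.5], [0.1]])
pi_star = np.eye(2) + gamma @ alpha.T

# Assumed form of K (not taken from the text): I - gamma (alpha'gamma)^{-1} alpha'.
K = np.eye(2) - gamma @ np.linalg.inv(alpha.T @ gamma) @ alpha.T

print(np.allclose(K @ K, K))                   # idempotent
print(np.allclose(alpha.T @ K, 0))             # alpha'K = 0'
print(np.allclose(K @ gamma, 0))               # K gamma = 0
print(np.allclose(pi_star @ K, K))             # pi* K = K
print(np.allclose(K, K.T))                     # False: K is not symmetric

# An intercept lying in the space spanned by gamma is annihilated by K.
alpha_0 = np.array([[2.0]])
mu = gamma @ alpha_0
print(np.allclose(K @ mu, 0))
```

Under this assumed form, $K\mu = 0$ whenever $\mu = \gamma\alpha_0$, which is the algebraic counterpart of the absence of a linear trend.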
Further, the unconditional variance matrix of $w_t$, $\mathrm{var}[w_t] = G$, is
$G = CGC' + \Sigma$, or

This long-run variance matrix can be solved by vectorizing, and reveals
the dependence of $G$ on $\pi$ only through $\gamma_b$ and $\psi$. The diagonality or
otherwise of $G$ is important for determining the quality of
single-equation least-squares estimation of co-integrating relations (see Chapter 7).
Tables 8.5-8.7 provide critical values, again taken from
Osterwald-Lenum, for the trace and $\lambda$-max statistics for both treatments of
intercepts. The two possibilities may be dealt with more explicitly by
rewriting (37) as
$$\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \pi x_{t-k} + \gamma\alpha_0 + \gamma_\perp\beta_0 + \varepsilon_t$$
where $\gamma_\perp$ is an $n \times (n - r)$ matrix orthogonal to $\gamma$ and $\mu = \gamma\alpha_0 + \gamma_\perp\beta_0$
without loss of generality. Thus, $\beta_0 = 0$ corresponds to the case where
the intercept enters only via the ECM terms. Equivalently, the constant
$\mu$ lies in the space spanned by $\gamma$ and hence $\gamma_\perp'\mu = \gamma_\perp'\gamma\alpha_0 + \gamma_\perp'\gamma_\perp\beta_0 =
\gamma_\perp'\gamma_\perp\beta_0 = 0$ when $\beta_0 = 0$. The case $\beta_0 \neq 0$ allows the intercepts to enter
autonomously as growth factors. The critical values in the tables apply
to three interesting DGP-model combinations.
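The decomposition $\mu = \gamma\alpha_0 + \gamma_\perp\beta_0$ can be computed for any intercept vector. The following is a minimal sketch, assuming $\gamma_\perp$ is taken from the singular value decomposition of $\gamma$ (one convenient choice among many); the function name `split_intercept` and the numerical values are hypothetical.

```python
import numpy as np

def split_intercept(mu, gamma):
    """Return (alpha_0, beta_0, gamma_perp) with mu = gamma a0 + gamma_perp b0."""
    n, r = gamma.shape
    # The last n - r left singular vectors span the orthogonal complement.
    U, _, _ = np.linalg.svd(gamma, full_matrices=True)
    gamma_perp = U[:, r:]
    basis = np.hstack([gamma, gamma_perp])     # n x n, non-singular if rank r
    coef = np.linalg.solve(basis, mu)
    return coef[:r], coef[r:], gamma_perp

gamma = np.array([[-0.5], [0.1]])              # hypothetical n = 2, r = 1

# Intercept restricted to the ECM: mu = gamma * alpha_0, so beta_0 = 0.
a0, b0, g_perp = split_intercept(gamma @ np.array([2.0]), gamma)
print(a0, b0)

# Unrestricted intercept: a non-zero beta_0 generates autonomous growth.
mu_free = np.array([0.3, 0.3])
a1, b1, _ = split_intercept(mu_free, gamma)
print(a1, b1)
```

A non-zero second component signals that the intercept has a part outside the space spanned by $\gamma$, which is the case $\beta_0 \neq 0$ discussed above.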
Table 8.5 provides critical values when $\alpha_0 \neq 0$, $\beta_0 \neq 0$ in both the
DGP and the model (i.e. the intercept enters separately). Critical values
for $\alpha_0 \neq 0$, $\beta_0 = 0$ in the DGP and $\alpha_0 \neq 0$, $\beta_0 \neq 0$ in the model are given
in Table 8.6 (intercept enters only ECM but model is
over-parameterized). Table 8.7 considers the DGP-model combination given by $\alpha_0 \neq 0$,
$\beta_0 = 0$ in both the DGP and the model (intercept enters only ECM and
model is correctly parameterized). Note that the critical values for the
DGP-model combination given by $\alpha_0 = 0$, $\beta_0 = 0$ in both the DGP and
the model appear in Table 8.1.
Other possible dummy variables include a trend, which would allow
the possibility that some variables were trend stationary, and seasonal
dummy variables in quarterly data (or equivalent dummies in data of
other frequencies). Critical values for some of these additional cases are
given by Osterwald-Lenum, although the necessary critical values to
implement tests for all $r$ and for all possible DGP-model combinations
are not available.

TABLE 8.5. Quantiles of the asymptotic distribution of the co-integration
rank test statistics $\eta_r$ and $\zeta_r$
DGP and model: $\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \pi x_{t-k} + \gamma\alpha_0 + \gamma_\perp\beta_0 + \varepsilon_t$,
$\alpha_0 \neq 0$, $\beta_0 \neq 0$; $\varepsilon_t \sim \mathrm{IN}(0, \Omega)$

n - r       90%       95%     97.5%       99%

$\zeta_r$ ($\lambda$-max)
   1       2.69      3.76      4.95      6.65
   2      12.07     14.07     16.05     18.63
   3      18.60     20.97     23.09     25.52
   4      24.73     27.07     28.98     32.24
   5      30.90     33.46     35.71     38.77
   6      36.76     39.37     41.86     45.10
   7      42.32     45.28     47.96     51.57
   8      48.33     51.42     54.29     57.69
   9      53.98     57.12     59.33     62.80
  10      59.62     62.81     65.44     69.09
  11      65.38     68.83     72.11     75.95

$\eta_r$ (trace)
   1       2.69      3.76      4.95      6.65
   2      13.33     15.41     17.52     20.04
   3      26.79     29.68     32.56     35.65
   4      43.95     47.21     50.35     54.46
   5      64.84     68.52     71.80     76.07
   6      89.48     94.15     98.33    103.18
   7     118.50    124.24    128.45    133.57
   8     150.53    156.00    161.32    168.36
   9     186.39    192.89    198.82    204.95
  10     225.85    233.13    239.46    247.18
  11     269.96    277.71    284.87    293.44

Source: Osterwald-Lenum (1992: Table 1).

TABLE 8.6. Quantiles of the asymptotic distribution of the co-integration
rank test statistics $\eta_r$ and $\zeta_r$
DGP: $\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \pi x_{t-k} + \gamma\alpha_0 + \varepsilon_t$, $\alpha_0 \neq 0$
Model:(a) $\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \pi x_{t-k} + \mu + \varepsilon_t$; $\varepsilon_t \sim \mathrm{IN}(0, \Omega)$

n - r       90%       95%     97.5%       99%

$\zeta_r$ ($\lambda$-max)
   1       6.50      8.18      9.72     11.65
   2      12.91     14.90     17.07     19.19
   3      18.90     21.07     22.89     25.75
   4      24.78     27.14     29.16     32.14
   5      30.84     33.32     35.80     38.78
   6      36.35     39.43     41.86     44.59
   7      42.06     44.91     47.59     51.30
   8      48.43     51.07     53.85     57.07
   9      54.01     57.00     59.80     63.37
  10      59.19     62.42     64.98     68.61
  11      65.07     68.27     70.69     74.36

$\eta_r$ (trace)
   1       6.50      8.18      9.72     11.65
   2      15.66     17.95     20.08     23.52
   3      28.71     31.52     34.48     37.22
   4      45.23     48.28     51.54     55.43
   5      66.49     70.60     74.04     78.87
   6      90.39     95.18     99.32    104.20
   7     118.99    124.25    129.75    136.06
   8     151.38    157.11    162.75    168.92
   9     186.54    192.84    198.06    204.79
  10     226.34    232.49    238.26    246.27
  11     269.53    277.39    283.84    292.65

(a) In the model, $\mu = \gamma\alpha_0 + \gamma_\perp\beta_0$ enters unrestrictedly; that is, $\alpha_0 \neq 0$, $\beta_0 \neq 0$.
Source: Osterwald-Lenum (1992: Table 1.1*).

TABLE 8.7. Quantiles of the asymptotic distribution of the co-integration
rank test statistics $\eta_r$ and $\zeta_r$
DGP and model: $\Delta x_t = \sum_{i=1}^{k-1} D_i \Delta x_{t-i} + \pi x_{t-k} + \gamma\alpha_0 + \varepsilon_t$,
$\alpha_0 \neq 0$; $\varepsilon_t \sim \mathrm{IN}(0, \Omega)$

n - r       90%       95%     97.5%       99%

$\zeta_r$ ($\lambda$-max)
   1       7.52      9.24     10.80     12.97
   2      13.75     15.67     17.63     20.20
   3      19.77     22.00     24.07     26.81
   4      25.56     28.14     30.32     33.24
   5      31.66     34.40     36.90     39.79
   6      37.45     40.30     43.22     46.82
   7      43.25     46.45     48.99     51.91
   8      48.91     52.00     54.71     57.95
   9      54.35     57.42     60.50     63.71
  10      60.25     63.57     66.24     69.94
  11      66.02     69.74     72.64     76.63

$\eta_r$ (trace)
   1       7.52      9.24     10.80     12.97
   2      17.85     19.96     22.05     24.60
   3      32.00     34.91     37.61     41.07
   4      49.65     53.12     56.06     60.16
   5      71.86     76.07     80.06     84.45
   6      97.18    102.14    106.74    111.01
   7     126.58    131.70    136.49    143.09
   8     159.48    165.58    171.28    177.20
   9     196.37    202.92    208.81    215.74
  10     236.54    244.15    251.30    257.68
  11     282.45    291.40    298.31    307.64

Source: Osterwald-Lenum (1992: Table 1*).
8.5.2. Linear Restrictions on Co-integrating Vectors
A different set of generalizations concerns testing linear restrictions on
$\alpha$ and $\gamma$. These would correspond to investigating a priori theories about
the co-integrating vectors, and about their roles in different equations.
Conditional on $r$ being the number of co-integrating relationships, and
the model being transformed to I(0) space, the relevant hypotheses
generally involve standard $\chi^2$ distributions. (Again, see Johansen 1988,
and Johansen and Juselius 1990.)
As an example, consider testing linear restrictions on $\alpha$ of the form
$$\alpha = J\Psi,$$
where $J$ is a known $n \times s$ matrix and $\Psi$ is an $s \times r$ matrix of unknown
parameters and $r \leq s \leq n$. Maximization of the likelihood function is
unaltered until equation (26), which becomes

(39)
In place of (27), we must solve for the eigenvalues $\lambda_1^* \geq \lambda_2^* \geq \ldots \geq \lambda_s^*$
from the equation

using the principles applied above. A likelihood-ratio test against the
unrestricted value of $\alpha$ can be calculated and amounts to testing $H_J$
within $H_r$, and is therefore based on

The test results in an asymptotic $\chi^2[r(n - s)]$ distribution. It is
important to note that the analysis is now in I(0) space, conditional on
having selected $r$ earlier. Similar results obtain for testing the hypothesis
that a subset of $\alpha$ equals a known matrix.
8.5.3. Test Power
Johansen (1989) has investigated the power function of the $\eta_r$ test using
the theory of 'near-integrated' processes as developed in Phillips (1991)
and discussed in Chapter 3. In place of $\pi = \gamma\alpha'$, Johansen considers
where t/ an d t ar e n x 1 fixe d vectors . Fo r a give n standardize d
importance of the co-integrating vector effect, the power falls as $n - r$
rises (since a larger space has to be searched to find the co-integrating
vector), and depends both on the magnitude of the ECM impact and on
the position of the 'local' co-integrating vectors in the space. In the
simple case where $r = 1$, two scalar measures of the impact of the 'local'
co-integrating vector are given by

When either is zero, power rises with the other, but their effects also
interact. Otherwise, not much is known as yet about the power
properties of this systems approach.

An implication of this lack of knowledge is that more than usual care
should be taken in deciding upon the relevant value of $r$. To reject the
null of $r + 1$ co-integrating vectors, a critical value from an
$(n - r - 1)$-dimensional Brownian motion is consulted. This is a much larger value
than that associated with the usual $\chi^2$-distribution, so a larger absolute
value of the likelihood ratio seems acceptable if only $r$ co-integrating
vectors are retained. However, if the end result of a modelling exercise
is an overall test of the validity of all the over-identifying restrictions
imposed, an investigator who regarded the $(r + 1)$th co-integrating
vector as I(0) would obtain a large value of the test statistic for omitting
this component. Since tests of over-identification tend to have high
numbers of degrees of freedom, that additional likelihood loss could be
highly significant. Thus, it may not be wise simply to omit co-integrating
vectors which are close to some conventional significance value.
Alternatively, all over-identification tests should be conducted in I(0) space,
and the reduction from the original levels system for $x_t$ tested first as
I(1) $\to$ I(0) and then for further restrictions conditional on the first test
(see Hendry and Mizon 1992).
8.5.4. Forecasting with Co-integrated Systems
Engle and Yoo (1987) investigated the possible gains from utilizing
co-integration information when making $h$-step-ahead forecasts from
dynamic systems for large $h$. They considered a dynamic bivariate system
and contrasted an ECM formulation based on the Engle and Granger
(1987) two-step approach with an unrestricted VAR. From the common
trends formulation of the system (Stock and Watson 1988b) discussed in
Chapter 5,

where the first term on the right-hand side is a stochastic trend of rank $n - r$. If the $C^*(L)$ weights decline rapidly as functions of powers of $L$, then for large $h$, the $h$-step-ahead forecast conditional on information available at time $t$ is approximately

Forecast errors are given by

Such forecast errors have variances of $O(h)$ for individual series, but remain $O(1)$ for combinations of the form $\alpha' f_{t+h|t}$ since $\alpha' C(1) = 0$. Thus, the $n$ time series share only $n - r$ trends, so forecasts of the series move together in linear combinations even though forecasts of individual series diverge from outcomes. Hence

to the order of approximation in (42). An ECM imposes this condition whereas a VAR does not; hence the former may be expected to forecast better for long horizons. Engle and Yoo present a Monte Carlo example with this property. However, they find that the VAR does slightly better on short horizons; we comment on this below.
When the process has a non-zero mean $\mu$, a term of the form $\mu(t + h)$ should be included in the above analysis, which otherwise is unchanged: see Section 8.5.1 for the case where $\mu$ lies in the co-integration space.
The fact that variances of forecast errors for co-integrated combinations remain bounded does not resolve the problem of long-run forecasting with integrated variables. A simple scalar example illustrates the difficulty. Consider the process

$$x_t = \pi_0 + \pi x_{t-1} + \varepsilon_t,$$

where $|\pi| < 1$. Then, by repeated substitution, the $h$-step-ahead forecast at time $t$, denoted $x_{t+h|t}$, is given by

$$x_{t+h|t} = \pi_0 \sum_{i=0}^{h-1} \pi^i + \pi^h x_t = \pi_0 \frac{1 - \pi^h}{1 - \pi} + \pi^h x_t.$$

As $h \to \infty$, $x_{t+h|t} \to \pi_0 (1 - \pi)^{-1}$, which is the unconditional mean of the process.$^{9}$ This argument, when applied to stationary variables such as $\alpha' x_t$ or $\Delta x_{it}$ (where $x_t = (x_{1t}, x_{2t}, \ldots, x_{nt})'$), implies that the system of equations, if rewritten entirely in terms of I(0) variables, loses the ability to forecast future values based on its past. As the forecast horizon increases, the best predictor turns out to be the unconditional mean. Working in the levels of I(1) variables is equally problematic: now the past is apparently informative, but forecast errors have variances increasing with $h$.
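The scalar argument can be checked numerically. The following is a minimal sketch, assuming the first-order process $x_t = \pi_0 + \pi x_{t-1} + \varepsilon_t$ with known parameters; the function names and the numerical values are illustrative assumptions and are not taken from the text.

```python
def ar1_forecast(x_t, pi0, pi, h):
    """h-step-ahead conditional mean of x_t = pi0 + pi * x_{t-1} + e_t."""
    if pi == 1.0:
        # Unit-root case: a random walk with drift pi0.
        return x_t + pi0 * h
    return pi0 * (1.0 - pi**h) / (1.0 - pi) + pi**h * x_t


def ar1_forecast_error_variance(sigma2, pi, h):
    """Variance of the h-step forecast error: sigma2 * sum_{i=0}^{h-1} pi^(2i)."""
    return sigma2 * sum(pi ** (2 * i) for i in range(h))


# Illustrative (hypothetical) values.
pi0, pi, sigma2, x_t = 0.05, 0.5, 1.0, 2.0
unconditional_mean = pi0 / (1.0 - pi)

for h in (1, 5, 20):
    print(
        h,
        ar1_forecast(x_t, pi0, pi, h),                 # tends to the unconditional mean
        ar1_forecast_error_variance(sigma2, pi, h),    # bounded when |pi| < 1
        ar1_forecast_error_variance(sigma2, 1.0, h),   # equals h * sigma2 when pi = 1
    )
```

With $|\pi| < 1$ the forecast collapses to the unconditional mean while the error variance stays bounded; with $\pi = 1$ the variance grows linearly in $h$, which is the dilemma described above.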
$^{9}$ The algebra generalizes to the case where $x_t$ is an $n$-dimensional vector and $\pi$ is a matrix. The necessary and sufficient conditions for stationarity of a vector process are given in Ch. 1.

An example from Hendry (1991b) demonstrates the important features of the problem. Consider a system of three variables, 'consumption', 'income', and 'saving', denoted by $C$, $Y$, and $S$ respectively. The data-set is artificial but matches important properties of actual UK series, such as the growth rate of income, when the variables are viewed as logarithms of the original data (so $S_t$ is the log of the savings ratio). Using PC-NAIVE, data are generated by


where $\varepsilon_{it} \sim IN(0, \sigma_{ii})$ with $E(\varepsilon_{1t}\varepsilon_{2s}) = 0$ $\forall t, s$, $\sigma_{11} = 0.02$, and $\sigma_{22} = 0.05$. The system can be written in levels as

Note now that consumption and income are both I(1) variables, consumption and income are co-integrated, and saving is a stationary variable. The equations

define the system in I(0) space. The discussion above provided two implications, both of which may now be confirmed.
A: The system in I(0) space loses predictive power but variances of forecast errors remain bounded.
The confirmation of this prediction is twofold. First, defining the vector $w_t = (S_t, \Delta Y_t)'$ and the matrix $A$ as

we have $w_t = k + A w_{t-1} + v_t$, where $k = (-0.025, 0.050)'$ and $v_t = (0.5\varepsilon_{2t} - \varepsilon_{1t}, \varepsilon_{2t})'$. The various powers of $A$ are as follows:

Thus, noting that $w_{t+h|t} = (I_2 - A)^{-1}(I_2 - A^h)k + A^h w_t$, the ability to predict $\Delta Y_t$ vanishes rapidly and little remains five periods ahead. This is also true for $S_t$, although the rate of decay is slower.
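The forecast formula for the I(0) system can be sketched in a few lines. The intercept vector $k$ is the one quoted above; the matrix `A` below is a hypothetical stable matrix chosen only for illustration (the matrix used in the example is not reproduced here), and the function name is likewise an assumption.

```python
import numpy as np


def var1_forecast(w_t, k, A, h):
    """w_{t+h|t} = (I - A)^{-1} (I - A^h) k + A^h w_t, for A with eigenvalues inside the unit circle."""
    identity = np.eye(len(k))
    A_h = np.linalg.matrix_power(A, h)
    return np.linalg.solve(identity - A, (identity - A_h) @ k) + A_h @ w_t


k = np.array([-0.025, 0.050])        # intercepts quoted in the text
A = np.array([[0.8, 0.1],            # hypothetical coefficients, NOT the book's A
              [0.0, 0.5]])
w_t = np.array([0.3, -0.2])          # hypothetical current values of (S_t, dY_t)

long_run_mean = np.linalg.solve(np.eye(2) - A, k)
for h in (1, 5, 20, 100):
    print(h, var1_forecast(w_t, k, A, h), long_run_mean)
```

The speed at which the forecast approaches the long-run mean is governed by the eigenvalues of $A$: the component tied to the smaller eigenvalue loses its information first, which is the pattern described for $\Delta Y_t$ and $S_t$.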
Forecasting from the system using the artificially generated data provides additional confirmation of implication A. Figure 8.1 shows the forecast behaviour for the change in consumption. The forecast variances rapidly converge to a constant size, spanning about one unit, which matches the range of the observed changes in consumption in the sample used to estimate the system. The forecast reveals a return of the growth rate to its unconditional mean of 0.1 after about five periods, where it then settles.

FIG 8.1. Eight-year-ahead forecast of $\Delta C$
Figure 8.2 shows the corresponding forecast behaviour for saving. The outcome is similar to that depicted in Fig. 8.1. The forecast variances stabilize rapidly, there is some information up to about eight periods ahead, but thereafter conditional forecasts are no better than the unconditional mean of -0.125.
B: The system in I(1) space has variances of forecast errors increasing linearly with $h$.
Figure 8.3 reports the dynamic forecasts for the level of consumption together with the forecast error bars. The huge increase in the forecast standard errors as the horizon increases is obvious. They trend upwards, and at 32 periods ahead, corresponding to eight years of quarterly data, span a range almost as large as that of the previous 60 data observations. That range is about 7.5 units, whereas saving never varies outside $\pm 1$. The mean forecast quickly becomes a trend since the series is I(1) and the forecasts are uninformative after 10 periods because of the large variances. Either a large recession or a major boom would be compatible with the confidence intervals calculated. Figure 8.4 reports a recession scenario for consumption that induces a fall of over 10 per cent in final-period consumption relative to the central forecast, but nevertheless lies entirely within the 95 per cent confidence bands of the latter.

FIG 8.2. Eight-year-ahead forecast of $S$

FIG 8.3. Eight-year-ahead forecast for $C$
The discussion so far has abstracted from the problems arising from parameter uncertainty. The analysis has been conducted in what might be regarded as a Utopian world for an economic forecaster. The model coincides with the mechanism that generated the data, an assumption that seriously underestimates the uncertainty likely to be present in any realistic setting. Allowing for, say, parameter uncertainty makes forecasts even more uncertain.
Sampson (1991) describes the effects of parameter uncertainty on the variances of conditional forecast errors. The conditional forecast variance grows with the square of the forecast horizon, both for unit-root (difference-stationary) and trend-stationary models. Chong and Hendry (1986) discuss the same issue for a stationary example. Brandner and Kunst (1990) show that a marked deterioration in forecast accuracy occurs if I(1) combinations are retained, so some of the supposed ECMs are spurious.
Clements and Hendry (1991) also find that poor estimates of $\alpha$ induce a similar effect, which helps account for the Engle-Yoo Monte Carlo results. However, they also show that mean-square forecast errors (MSFEs) constitute an inadequate basis for selecting forecasting models or methods because of a lack of invariance of MSFEs to non-singular, scale-preserving linear transforms. As a result, for multi-step forecasts in systems of equations, minimum MSFE for one linear function of predicted variables does not imply minimum MSFE on another. One method can dominate all others for comparisons in the levels of variables, yet lose to one of the others for differences, to a second for co-integrating vectors, and to a third for combinations of variables. Thus, the outcome of a forecast comparison can depend on which representation is selected.

FIG 8.4. Alternative future trajectories for $C$
By re-examining the Monte Carlo study of Engle and Yoo (1987), Clements and Hendry (1991) find that different rankings of VAR and Engle-Granger (EG) estimators do indeed result from the I(0) and I(1) representations of the process. For MSFE calculations using co-integrating combinations rather than levels, the VAR dominates EG for all forecast horizons even though the differences of the variables are predicted with approximately the same accuracy. They propose an alternative invariant criterion which ensures a unique ranking across models or methods and shows that there is little to choose between the VAR and EG estimators in a bivariate process. However, both are dominated, for most of the parameter values considered, by the Johansen maximum likelihood estimator (MLE).
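The lack of invariance is easy to reproduce with a toy calculation. The sketch below assumes two hypothetical forecast-error second-moment matrices and takes 'scale-preserving' to mean $|\det M| = 1$; both are assumptions made for illustration. The determinant is used only as one example of a measure that such transforms leave unchanged; the text here does not establish that it coincides with the criterion the authors propose.

```python
import numpy as np

# Hypothetical 2x2 forecast-error second-moment matrices for two methods.
V_a = np.diag([1.0, 4.0])
V_b = np.diag([3.0, 1.5])

# A non-singular transform with |det M| = 1 (assumed meaning of scale-preserving).
M = np.diag([2.0, 0.5])


def trace_msfe(V, M=None):
    """Trace MSFE of the transformed forecast errors M e."""
    M = np.eye(V.shape[0]) if M is None else M
    return float(np.trace(M @ V @ M.T))


def det_msfe(V, M=None):
    """Determinant of the second-moment matrix of the transformed errors M e."""
    M = np.eye(V.shape[0]) if M is None else M
    return float(np.linalg.det(M @ V @ M.T))


print("trace, original variables:   ", trace_msfe(V_a), trace_msfe(V_b))
print("trace, transformed variables:", trace_msfe(V_a, M), trace_msfe(V_b, M))
print("det, original variables:     ", det_msfe(V_a), det_msfe(V_b))
print("det, transformed variables:  ", det_msfe(V_a, M), det_msfe(V_b, M))
```

On the trace measure, method b is preferred for the original variables and method a for the transformed ones, whereas the determinant gives the same ranking in both representations.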
The asymptotic formulae for the $h$-step-ahead forecast variances in co-integrated autoregressive systems are derived by Clements and Hendry (1992). The $h$-step-ahead realizations for known parameters in terms of (38) for $x_t$ over the forecast period $T + 1$ to $T + h$ are


where
The conditional expectation $E[x_{T+h} \mid X_T]$ at $T$ is

with forecast error

Thus, the forecast error variance matrix is

For $C$ and in the model defined in Section 8.5.1, using $w_t$,

Hence, the MSFE for $x_t$ in (45) is $O(h)$, while the MSFE for $w_t$ in (46) is $O(1)$ in $h$ since $C^s \to 0$ as $s \to \infty$. These results reflect the fact that $x_t$ is I(1) but $w_t \sim$ I(0). The covariance between forecast errors at $h$ and $l$, denoted by $\mathrm{cov}[\cdot]$, is

when $m = \min(l, h)$.
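The contrast between $O(h)$ and $O(1)$ forecast-error variances can be illustrated with known parameters. The sketch below assumes a hypothetical bivariate first-order system $x_t = (I + \gamma\alpha')x_{t-1} + \varepsilon_t$ with $\alpha' = (1, -1)$, $\gamma = (-0.5, 0)'$ and an identity error covariance; these values and all names are illustrative assumptions, not the book's example.

```python
import numpy as np


def forecast_error_variance(levels_matrix, omega, h):
    """V_h = sum_{s=0}^{h-1} B^s Omega (B^s)' for x_t = B x_{t-1} + e_t with known B."""
    n = omega.shape[0]
    variance = np.zeros((n, n))
    power = np.eye(n)
    for _ in range(h):
        variance += power @ omega @ power.T
        power = levels_matrix @ power
    return variance


alpha = np.array([[1.0], [-1.0]])           # hypothetical co-integrating vector
gamma = np.array([[-0.5], [0.0]])           # hypothetical adjustment coefficients
levels_matrix = np.eye(2) + gamma @ alpha.T
omega = np.eye(2)

for h in (1, 4, 16, 64):
    V_h = forecast_error_variance(levels_matrix, omega, h)
    level_variance = V_h[1, 1]                              # grows linearly in h
    coint_variance = (alpha.T @ V_h @ alpha).item()         # stays bounded
    print(h, level_variance, coint_variance)
```

In this configuration $\alpha'(I + \gamma\alpha') = 0.5\alpha'$, so the variance of the co-integrating combination converges to $2/(1 - 0.25)$, while the second series is a pure random walk whose forecast-error variance equals $h$.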
When the system is expressed in differences to forecast $\Delta x_{T+h}$, outcomes are given by

Letting $\Delta x_{T+h|T}$ denote the conditional expectation

Then, subtracting (49) from (48),

and so for known parameters, the variance formula is $\Omega$ for $h = 1$ and

Thus, the MSFE in (50) is again $O(1)$. In all cases, when parameters need to be estimated, more complicated formulae with additional terms result.


These asymptotic forecast error variance formulae reveal a great deal about the behaviour of forecast errors as horizons increase. Clements and Hendry (1992) report a Monte Carlo study for a bivariate system which shows that the formulae above reflect the main finite sample effects when $T = 100$. Their evidence also suggests that there is little benefit from imposing reduced rank co-integration restrictions in a bivariate VAR unless the forecast horizon is short or the sample size is small. However, there are losses from omitting relevant co-integrating vectors. Their conclusions are based on experiments where the number of co-integrating combinations is known. When the number of co-integrating vectors has to be determined from the data, the performance of the MLE will reflect both under- and over-specification of the degree of co-integration. Also, the MLE might be expected to dominate the unrestricted vector autoregression in larger systems when co-integrating relations impose many more restrictions.

8.5.5. Finite Sample Properties


Gonzalo (1990) has undertaken a Monte Carlo study of the small-sample behaviour of the Johansen procedure in a bivariate model, and has compared its performance with the Engle and Granger (1987) two-step approach, as well as several other procedures based on canonical correlations and principal components. Even though the parameter estimates in I(1) processes converge at a rate of $T$, rather than $T^{1/2}$, quite large differences in estimates emerge from the various methods considered. The findings are reasonably encouraging for the maximum-likelihood method. Specifically, Gonzalo finds that the MLE frequently has the smallest mean-squared error across a range of parameter values of interest to empirical research. He also delineates several features of the DGP which influence the relative performances of the various estimators significantly. For example, when there is one co-integrating vector and a common factor error representation (COMFAC) is valid (see Hendry and Mizon 1978, and Sargan 1980), then the Engle-Granger two-step method is asymptotically equivalent to MLE. Generally, MLE does better at larger sample sizes and when COMFAC does not hold. The effects of non-normal errors seem minimal. However, given the similarities of the MLE to LIML, particularly the normalizations in $\alpha$, the MLE may have no finite sample moments (see Anderson 1976).
Gonzalo's paper also provides useful derivations of the asymptotic distributions of all the estimators he considers in the Monte Carlo, and relates the simulation findings to these limiting distributions. We return to this below.


Reimers (1991) compares the powers of various tests for co-integration for bivariate and trivariate processes. He finds that the Johansen procedure over-rejects when the null is true, in small samples, and suggests correcting this using $(T - p)\log(1 - \lambda_i)$ instead of $T\log(1 - \lambda_i)$ for the test statistics where $p = nk$ takes account of the number of estimated parameters. While $nk/T$ is asymptotically negligible, it can be large in small samples. The power of the tests is dependent on the specification of the DGP, but Reimers does not relate his simulation findings to the type of analysis in Section 8.5.3.
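The size of the degrees-of-freedom adjustment can be gauged with a short calculation. The sketch below writes the statistic with a leading minus sign so that it is positive; the sample size, dimension, lag length, and eigenvalue are hypothetical, and the function name is an assumption.

```python
import math


def lr_statistic(eigenvalue, T, n=0, k=0):
    """-(T - n*k) * log(1 - eigenvalue); n = k = 0 gives the uncorrected statistic."""
    return -(T - n * k) * math.log(1.0 - eigenvalue)


# Hypothetical small-sample configuration.
T, n, k = 50, 3, 2
eigenvalue = 0.30

raw = lr_statistic(eigenvalue, T)
adjusted = lr_statistic(eigenvalue, T, n, k)
print(raw, adjusted, adjusted / raw)   # the ratio equals (T - n*k) / T
```

With $T = 50$ and $nk = 6$ the statistic shrinks by 12 per cent, which shows why the correction matters in small samples even though it vanishes asymptotically.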
8.5.6. Selecting Lag Length
Both Gonzalo's (1990) and Reimers's (1991) studies consider the effects on the MLE of using incorrect lag lengths for the short-run dynamics. Gonzalo finds that the loss of efficiency from choosing too long a lag is small, and that the MLE performs best even if a lag of four periods is used for the short-run dynamics instead of the correct value of 0. However, if too short a lag length is used (for example, zero lags instead of one) then the MLE is no longer the best method. More practical experience is required before a final judgement can be reached on the relative costs of under-specifying versus over-specifying the lag-length, but Gonzalo's simulation evidence seems intuitively reasonable since under-specification will induce residual autocorrelation. Reimers finds that the Schwarz criterion does well in a data-based lag-length selection exercise. However, since the role of the $\Delta x_{t-i}$ is to whiten the error, it is not clear that the use of the Schwarz criterion, which penalizes the addition of lags strongly, will prove optimal in this context.
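A data-based lag-length choice of the kind described can be sketched as follows. This is a minimal illustration, assuming an unrestricted VAR with a constant, estimated by least squares on a common sample, and the Schwarz criterion in the form $\log|\hat\Sigma| + (\log T/T) \times$ (number of coefficients); the simulated stationary VAR(1), its coefficients, and the function name are hypothetical and stand in for the differenced terms of an ECM.

```python
import numpy as np


def schwarz_lag(x, max_lag):
    """Return the lag length in 0..max_lag that minimises the Schwarz criterion."""
    T, n = x.shape
    y = x[max_lag:]                     # common estimation sample for every lag length
    T_eff = y.shape[0]
    best = None
    for p in range(max_lag + 1):
        columns = [np.ones((T_eff, 1))]
        columns += [x[max_lag - j:T - j] for j in range(1, p + 1)]
        Z = np.hstack(columns)
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        sigma = resid.T @ resid / T_eff
        n_coef = Z.shape[1] * n
        sc = np.log(np.linalg.det(sigma)) + np.log(T_eff) / T_eff * n_coef
        if best is None or sc < best[0]:
            best = (sc, p)
    return best[1]


# Simulate a hypothetical stationary VAR(1) and select its lag length.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.5]])
x = np.zeros((400, 2))
for t in range(1, 400):
    x[t] = A @ x[t - 1] + rng.standard_normal(2)

print(schwarz_lag(x, max_lag=4))
```

The heavy penalty on additional lags is what makes the criterion parsimonious, and it is this feature that the text questions when the lags are needed to whiten the errors.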
8.5.7. The Analysis of I(2) Variables
Reconsider the basic autoregressive system with lag length $k$, written as

where $A_0 = I$, so that


Writing this system in the usual form,

we see that

The mean-la g matrix is given by

To preclude $x_t$ being integrated of order 2, $\gamma_\perp' \Phi \alpha_\perp$ must be a full-rank matrix, where $\gamma_\perp$ and $\alpha_\perp$ are full-column-rank $n \times (n - r)$ matrices such that $\gamma'\gamma_\perp = \alpha'\alpha_\perp = 0$ (see Section 5.2). A natural issue is whether or not $\mathrm{rank}(\gamma_\perp' \Phi \alpha_\perp) = (n - r)$ can be tested and, if so, what can be done if a rank failure is found. This problem is analysed in Johansen (1991b).
First, note that the I(2) model is a sub-model of the I(1) model. This can be seen most easily in the univariate case:

If the process is not explosive, then the coefficients $(a_1, a_2)$ of the polynomial $(1 - a_1 z - a_2 z^2)$ must lie on or inside a triangular region bounded by the points $(0, 1)$, $(2, -1)$ and $(-2, -1)$. The line connecting the first two of these points describes a single unit root (the sum of the coefficients is unity), and only its right end-point determines two unit roots.
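The geometry of the univariate case can be verified directly. The sketch below counts roots of $p(z) = 1 - a_1 z - a_2 z^2$ at $z = 1$ from $p(1)$ and $p'(1)$, and checks membership of the triangle with the three vertices given above; the function names and the tolerance are assumptions.

```python
def count_unit_roots(a1, a2, tol=1e-9):
    """Number of roots at z = 1 of p(z) = 1 - a1*z - a2*z**2 (0, 1, or 2)."""
    p_at_one = 1.0 - a1 - a2          # p(1): zero when the coefficients sum to unity
    dp_at_one = -a1 - 2.0 * a2        # p'(1): also zero when z = 1 is a double root
    if abs(p_at_one) > tol:
        return 0
    return 2 if abs(dp_at_one) <= tol else 1


def in_non_explosive_triangle(a1, a2, tol=1e-9):
    """On or inside the triangle with vertices (0, 1), (2, -1), (-2, -1)."""
    return (a2 >= -1.0 - tol) and (a1 + a2 <= 1.0 + tol) and (a2 - a1 <= 1.0 + tol)


for point in [(0.5, 0.2), (0.5, 0.5), (0.0, 1.0), (2.0, -1.0), (1.5, 0.5)]:
    print(point, in_non_explosive_triangle(*point), count_unit_roots(*point))
```

Points on the edge $a_1 + a_2 = 1$ yield one unit root, and only the vertex $(2, -1)$, where $p(z) = (1 - z)^2$, yields two, so the I(2) case is a single point on the boundary of the I(1) case.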
Second, we can repeat the trick used earlier for characterizing reduced-rank matrices and express $\gamma_\perp' \Phi \alpha_\perp$ as $\phi\eta'$ where $\phi$ and $\eta$ are $(n - r) \times p$ matrices of rank $p \le (n - r)$. When $p < (n - r)$, an additional condition is needed to prevent I(3) variables, similar in form to the earlier mean-lag condition. We assume that $x_t$ is I(2) so $\Delta x_t$ is I(1), and $\Delta^2 x_t$ is I(0). However, the original series $\alpha' x_t$ will usually be I(1), and combinations of the form $\alpha^{*\prime} \Delta x_t$ and $\alpha' x_t + \delta' \Delta x_t$ will be I(0). This result helps explain why investigators often need variables such as inflation in long-run money demand equations. When nominal money and prices are I(2) but co-integrate to I(1) as real money, and real income is I(1), velocity may still be I(1) and require inflation to co-integrate to I(0). Further, the concepts of multi-co-integration (see Granger and Lee 1990) or polynomial co-integration (see Engle and Yoo 1991) can be linked by such results to the analysis of I(2) processes. Thus, earlier models of, for example, consumers' expenditure involving the wealth-income ratio as an integral correction mechanism can be appropriately re-interpreted (see Hendry and Ungern-Sternberg 1981).
Johansen (1991b) provides a statistical procedure based on an extension of the I(1) MLE, which essentially consists in repeating the I(1) method twice. The first stage proceeds as usual for the reduced-rank analysis of the levels of the variables, correcting for the lagged first differences and any dummy variables, to determine $r$, $\gamma$, and $\alpha$. Next, one transforms the variables to I(1) combinations as just described by creating $\alpha_\perp' \Delta x_{t-1}$ and $\alpha' x_{t-1}$, $\gamma_\perp' \Delta^2 x_t$ and regresses on those two plus lagged $\Delta^2 x_{t-i}$ up to lag length $k - 2$ to establish $\phi$, $\eta$, and $p$. Johansen shows that, asymptotically, this procedure determines the correct parameters. He also obtains the relevant limiting distributions of the estimators.
8.5.8. Weak Exogeneity and Conditional Models
Most large-scale econometric systems and many other empirical models are open in the sense that they treat a subset of the variables as 'exogenous'. In this sub-section, we will focus on the potential weak exogeneity of contemporaneous conditioning variables for the parameters of interest in I(1) co-integrated systems (see Engle et al. 1983). As discussed in Chapter 1, weak exogeneity requires that there is no loss of information about the parameters of interest in reducing the analysis from the joint distribution to a conditional model. The concept was developed initially in the context of stationary processes, but as the results in Chapter 7 suggested, it plays an important role in I(1) systems as well.
In particular, when the vector of observables $x_t$ is I(1) there can be cross-equation links between parameters, which are induced by the occurrence in several equations of common co-integrating combinations $\alpha' x_t$. If $\alpha' x_t$ enters both the $i$th and $j$th equations, then $x_{jt}$ cannot be weakly exogenous for the parameters of the $i$th equation since the parameters of the two equations share common components of $\alpha' x_t$ and so cannot be variation free. Failure to account for such parameter dependencies can adversely affect the validity of inference in finite samples (see Chapter 7, Phillips 1991, Phillips and Loretan 1991, and Hendry and Mizon 1992).
To develop notation for an I(1) open system, two partitions of $x_t$ are needed. To exposit the basic idea, it is convenient to return to the first-order system in (38) above, written as

where $\varepsilon_t \sim IN(0, \Sigma)$ and $\alpha'$ is $r \times n$ of rank $r$. First, we have the usual


transformed partition of $x_t$ into $w_t = (x_t'\alpha : \Delta x_t'\alpha_\perp)'$, capturing the locations of the unit roots and the co-integrating vectors, where there are $r$ elements in $x_t'\alpha$ and $(n - r)$ in $\Delta x_t'\alpha_\perp$. The history of the process up to time $t - 1$ is denoted in I(0) space by $W_{t-1} = (w_1, \ldots, w_{t-1})$. Second, we partition $\Delta x_t$ into $(\Delta x_{1t}' : \Delta x_{2t}')'$, where $\Delta x_{2t}$ is $m \times 1$ and is to be treated as weakly exogenous for the vector parameter of interest $\phi \in \Phi$, which includes those elements of $\alpha$ and $\gamma$ relevant to $\Delta x_{1t}$. For later use, we explicitly write out $\gamma\alpha' x_{t-1}$ in terms of $(x_{1t-1}' : x_{2t-1}')'$, when there are $r_1 + r_2 = r$ co-integrating relations in the two blocks, namely

The dimensions of $\gamma_{11}$, $\gamma_{12}$, $\gamma_{21}$, and $\gamma_{22}$ are $(n - m) \times r_1$, $(n - m) \times r_2$, $m \times r_1$, and $m \times r_2$ respectively; and, correspondingly, $\alpha_{11}'$, $\alpha_{12}'$, $\alpha_{21}'$, and $\alpha_{22}'$ are $r_1 \times (n - m)$, $r_1 \times m$, $r_2 \times (n - m)$, and $r_2 \times m$. If $r_2 = 0$, then the relevant elements are set to zero. Since the analysis in terms of $w_t$ is in I(0) space, the approach in Engle et al. applies.
The complete set of parameters of the joint distribution is $\theta \in \Theta$, and these are mapped one-for-one to $f(\theta) = \lambda \in \Lambda$, and partitioned into $\lambda = (\lambda_1 : \lambda_2)'$ where $\lambda_1 \in \Lambda_1$ and $\lambda_2 \in \Lambda_2$. Factorize the joint sequential density $D_x(\Delta x_t \mid W_{t-1}, \theta)$ of $\Delta x_t$ into its conditional and marginal components:

(56)
Since $w_{t-1} = (x_{t-1}'\alpha : \Delta x_{t-1}'\alpha_\perp)'$, all the information on the co-integrating vectors is retained in $W_{t-1}$. Consequently, $\Delta x_{2t}$ is weakly exogenous for $\phi$ if $\phi$ depends on $\lambda_1$ alone, and $\lambda_1$ and $\lambda_2$ are variation free, so that $\Lambda = \Lambda_1 \times \Lambda_2$. Weak exogeneity of $\Delta x_{2t}$ for $\phi$ cannot occur when $\lambda_1$ and $\lambda_2$ both depend on common components of $\alpha$.
As a consequence of the normality assumption, and using the expression in (55) for $\gamma\alpha' x_{t-1}$, conditioning $\Delta x_{1t}$ on $\Delta x_{2t}$ leads to the mean of the conditional density:


where $W = \Sigma_{12}\Sigma_{22}^{-1}$. Thus, a necessary condition for the weak exogeneity of $\Delta x_{2t}$ for $(\gamma_{11} : \alpha_{11} : \alpha_{12})$ is that either $\{\gamma_{12} - W\gamma_{22}\} = 0$ or $\gamma_{22} = 0$; i.e. $(\alpha_{21}' x_{1t-1} + \alpha_{22}' x_{2t-1})$ appears in only one of $D_{x_1|x_2}(\cdot)$ or $D_{x_2}(\cdot)$, but not both. Further, unless $\gamma_{21} = 0$, then $(\alpha_{11}' x_{1t-1} + \alpha_{12}' x_{2t-1})$ will appear in the marginal distribution of $\Delta x_{2t}$, so $\gamma_{21} = 0$ is also necessary. There are sufficient conditions for these necessary conditions to hold, including $\gamma_{21} = 0$, $\gamma_{22} = 0$ and $\gamma_{12} = 0$ where the latter two arise because $r_2 = 0$. Such conditions can be tested using the approach in Johansen (1992b) and Johansen and Juselius (1990).
Short-run parameters may depend on some of the elements in $\alpha$ without jeopardizing efficient inferences about long-run parameters of interest. However, if all the elements of $\phi$ are of interest, then again variation-free parameters are required, and any cross-restrictions violate weak exogeneity.
To illustrate this analysis, reconsider the example in equations (31) and (32) and (60) of Chapter 7. There is one co-integrating vector with parameter $\beta$, $r_1 = r = 1$, $r_2 = 0$, $m = 1$, and $n = 2$:

This representation is in terms of $w_t$ (see (38) above) but is written as a triangular system error correction as in Phillips (1991), imposing a specific first-order autoregressive parametric form for the error process $u_t$ (compared with the general processes allowed by Phillips):

The unconditional covariance matrix of $u_t$ is $\mathrm{plim}\, T^{-1} \sum u_t u_t' = G$, derived in Section 8.5.1. Let $c_{12} = c_{22} = 0$ since these parameters only determine the presence of the lagged difference of $x_{2t}$, and do not affect co-integration vectors. Then the long-run covariance matrix is (see Ch. 7 appendix):

where $\omega_{11} = \sigma_{11}/(1 - c_{11})^2$ and $\omega_{12} = \sigma_{12}/(1 - c_{11})$. The non-diagonality of $\Omega$ implies that there is information about the parameters of each equation in the other. However, by conditioning $\Delta x_{1t}$ on $\Delta x_{2t}$ in the first equation, the $\sigma_{12}$ effect is removed. Then even if the first equation is dynamic, so $c_{11} \neq 0$, the diagonality of $\Omega$ only depends on $c_{21} = 0$. When $c_{21} \neq 0$, the long-run covariance matrix is non-diagonal and there is a loss of weak exogeneity, which can have a detrimental impact on the bias and efficiency of the least-squares estimator of $\beta$ in finite samples. Note that $c_{12} \neq 0$ can be corrected within the first equation treated in isolation by adding lagged $\Delta x_{2t}$, but that $c_{21} \neq 0$ requires modelling the system (although corrections based on adding leads of $\Delta x_{2t}$ have been proposed to exploit the obverse Granger causality of $x_1$ on $x_2$: see Stock and Watson 1991).
We now derive the conditional and marginal factorizations. In terms of observables, the original system from Chapter 7 can be written as $w_t = C w_{t-1} + \varepsilon_t$, or

Rewritten as a VAR in I(0) variables as in (37), we have

where $d_{12} = c_{12} + \beta c_{22}$, $\gamma_{11} = (c_{11} - 1 + \beta c_{21})$, $d_{22} = c_{22}$, and $\gamma_{21} = c_{21}$. The restricted first column of $D$ is an incidental effect from assuming a first-order autoregressive error initially.
Finally, solving for the conditional and marginal representations, we have

where $W = \sigma_{12}\sigma_{22}^{-1}$, $\lambda_{11} = (\beta + W)$, $\lambda_{12} = (c_{11} - 1 - W c_{21})$, $\lambda_{13} = (c_{12} - W c_{22})$, $\lambda_{21} = c_{21}$, $\lambda_{22} = c_{22}$, and $E[v_t \varepsilon_{2t}] = 0$. Assume that $\phi = (\lambda_{11} : \lambda_{12} : \lambda_{13} : \beta)'$ is the vector parameter of interest. When $\lambda_{21} = 0$, least-squares estimation of $\phi$ from the first equation involves no loss of information. In fact, $x_{2t}$ is strongly exogenous for $\phi$ in such a system. However, when $\lambda_{21} \neq 0$, $\Delta x_{2t}$ is not weakly exogenous for $\phi$ and the analysis is not fully efficient. Monte Carlo studies (e.g. Phillips and Loretan 1991) confirm the impact of this loss of efficiency in finite samples (see Chapter 7).
Irrespective of the value of $\lambda_{21}$, the first equation in (62) is the conditional expectation from (58), namely

Thus, once data are I(1) but co-integrated, the fact that an equation coincides with the conditional expectation is not sufficient to justify single-equation least-squares modelling. Rather surprisingly, weak exogeneity is at least as important in I(1) processes as in I(0) processes.


8.6. A Second Example of the Johansen Maximum Likelihood Approach
We reconsider the UK seasonally adjusted quarterly data from Sect. 7.6 on money, prices, output, and interest rates, this time treated as a system, represented by a VAR with two lags on each of $m - p$, $\Delta p$, $x_{85}$, and $R_n$, plus a constant and a trend. The lag length was selected by commencing at five lags on every variable, and sequentially testing from the highest order. The sample was 1964(3)-1989(2). The residual standard deviations of the four equations were 0.0161, 0.0069, 0.0126, and 0.0127 respectively, and on recursive F-tests all four equations had acceptably constant coefficients using one-off I(0) critical values. The residuals also yielded insignificant outcomes on $\chi^2$ tests for autocorrelation but not for normality.
In almost every instance, two co-integrating combinations were significant (i.e. two unit roots were rejected); the second of these was virtually the same in all lag specifications, but the first was often a linear combination of the first two rows reported in Table 8.9. Such a finding matches that in Hendry and Mizon (1992) and Ericsson et al. (1991). Beginning with the largest statistics, two of the tests in each column are significant (see Osterwald-Lenum 1992: Table 2).
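The test statistics in Table 8.8 can be recomputed from the eigenvalues. The sketch below assumes $T = 100$, the number of quarters in 1964(3)-1989(2); this value is inferred from the sample dates rather than stated in the text. The eigenvalues and 5 per cent critical values are those of Table 8.8, and the function and variable names are assumptions.

```python
import math

T = 100  # assumed: 1964(3)-1989(2) inclusive spans 100 quarters
eigenvalues = [0.517240, 0.249694, 0.060350, 0.013817]   # Table 8.8, largest first
max_critical = [30.33, 23.78, 16.87, 3.74]               # 5 per cent values, Table 8.8
trace_critical = [54.64, 34.55, 18.17, 3.74]

# Maximal-eigenvalue statistics: -T log(1 - mu_{r+1}) for r = 0, ..., 3.
max_stats = [-T * math.log(1.0 - mu) for mu in eigenvalues]
# Trace statistics: the sum of the maximal-eigenvalue statistics from r + 1 onwards.
trace_stats = [sum(max_stats[r:]) for r in range(len(max_stats))]


def sequential_rank(stats, critical_values):
    """Count rejections from r = 0 upwards, stopping at the first non-rejection."""
    rank = 0
    for stat, critical in zip(stats, critical_values):
        if stat <= critical:
            break
        rank += 1
    return rank


for r, (m, tr) in enumerate(zip(max_stats, trace_stats)):
    print(r, round(m, 2), round(tr, 2))
print(sequential_rank(max_stats, max_critical), sequential_rank(trace_stats, trace_critical))
```

Both sequences of tests reject $r = 0$ and $r = 1$ but not $r = 2$, in line with the conclusion that two co-integrating combinations are significant.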
The corresponding eigenvectors are shown in Table 8.9, in rows, augmented by the two non-co-integrating combinations in the last two rows.

TABLE 8.8. Eigenvalues, test statistics, and 5 per cent critical values

r    n - r    Eigenvalue    $-T\log(1 - \mu_{r+1})$    (0.05)    $-T\sum\log(1 - \mu_i)$    (0.05)
0    4        0.517240      72.82                      30.33     109.17                     54.64
1    3        0.249694      28.73                      23.78     36.34                      34.55
2    2        0.060350      6.22                       16.87     7.62                       18.17
3    1        0.013817      1.39                       3.74      1.39                       3.74

TABLE 8.9. Normalized eigenvectors $\alpha'$

Row    $m - p$     $\Delta p$    $x_{85}$    $R_n$
1      1.0000      6.3966        -0.8938     7.6838
2      0.0311      1.0000        -0.3334     -0.1377
3      -0.2633     0.9435        1.0000      -1.2117
4      0.9838      4.5659        -0.7701     1.0000

The first row suggests the following long-run solution for the money equation:

$$m - p = -6.40\,\Delta p + 0.89\,x_{85} - 7.68\,R_n.$$
This is close to that found from the single-equation dynamic analysis in Chapter 7. No trend is required. The $\gamma$ matrix is given in Table 8.10. Only the first entry in the first column is at all large, so that the first co-integrating vector only affects the first equation consistent with the weak exogeneity of $x_{85}$, $R_n$, and $\Delta p$ for the parameters of the money-demand equation. This again matches the finding over a shorter sample in Hendry and Mizon (1992).
The second row of Table 8.9 delivers the approximate long-run solution

This corresponds to the impact of excess demand, as measured by the deviation from its linear trend, on inflation with a small and possibly insignificant effect from interest rates. No additional trend is then required. The second column of $\gamma$ shows a large effect of this ECM on all four equations, violating any possibility of treating any of the four variables as weakly exogenous in a model of inflation or excess demand when the parameters of interest include the long-run multipliers.
When the ordering of variables is $(m - p, \Delta p, x_{85}, R_n)$ the long-run $\pi$ matrix is

$$\pi = \begin{pmatrix} -0.082 & -0.245 & -0.081 & -0.761 \\ -0.009 & -0.474 & 0.164 & 0.112 \\ 0.007 & 0.146 & -0.108 & -0.147 \\ -0.021 & -0.119 & 0.149 & -0.059 \end{pmatrix}$$
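The long-run matrix can be recovered from the reported eigenvectors and adjustment coefficients as $\pi = \gamma\alpha'$. The sketch below assumes that the rows of `alpha_prime` are the rows of Table 8.9 and that the columns of `gamma` are the columns of Table 8.10, with variables ordered $(m - p, \Delta p, x_{85}, R_n)$; this reading of the two tables is an assumption, supported by the fact that the product then agrees with the sixteen reported entries of $\pi$ to three decimal places.

```python
import numpy as np

# Rows: the four normalized eigenvectors of Table 8.9.
alpha_prime = np.array([
    [ 1.0000, 6.3966, -0.8938,  7.6838],
    [ 0.0311, 1.0000, -0.3334, -0.1377],
    [-0.2633, 0.9435,  1.0000, -1.2117],
    [ 0.9838, 4.5659, -0.7701,  1.0000],
])

# Rows: equations for m - p, dp, x85, Rn; columns: adjustment coefficients of Table 8.10.
gamma = np.array([
    [-0.0952,  0.4268, -0.0300, -0.0076],
    [ 0.0048, -0.5147, -0.0013,  0.0024],
    [-0.0210,  0.2578, -0.0318,  0.0116],
    [-0.0001, -0.2253,  0.0796,  0.0069],
])

pi = gamma @ alpha_prime
print(np.round(pi, 3))
```

Because all four eigenvectors are used, the product reproduces the unrestricted long-run matrix; retaining only the first two rows of `alpha_prime` and the first two columns of `gamma` would give the rank-two approximation implied by the tests.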

8.7. Asymptotic Distributions of Estimators of Co-integrating Vectors in I(1) Systems
Gonzalo (1990) reviews and compares the various alternatives to OLS for the estimation of co-integrating vectors, including those proposed by Stock (1987), Stock and Watson (1988b), Johansen (1988), Phillips (1988a), and Phillips and Hansen (1990). While all of the suggested methods share the super-consistency property, we have seen that there can be substantial differences in their performance on moderately sized samples.

TABLE 8.10. Adjustment coefficients $\gamma$

Variable      $\gamma_1$    $\gamma_2$    $\gamma_3$    $\gamma_4$
$m - p$       -0.0952       0.4268        -0.0300       -0.0076
$\Delta p$    0.0048        -0.5147       -0.0013       0.0024
$x_{85}$      -0.0210       0.2578        -0.0318       0.0116
$R_n$         -0.0001       -0.2253       0.0796        0.0069
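Gonzalo makes the comparison on a simple data generation process in
which co-integration holds between the I(1) series $z_t$ and $y_t$:

$$y_t = \beta z_t + v_t, \qquad v_t = \rho v_{t-1} + \varepsilon_{1t},$$

and

$$\Delta z_t = \varepsilon_{2t}, \qquad
\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix} \sim \mathsf{IN}\left[ 0,
\begin{pmatrix} \sigma_1^2 & \theta\sigma_1\sigma_2 \\ \theta\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right]. \tag{63}$$

This system is a special case of (58) and can therefore be represented in
the error-correction form

$$\Delta y_t = (\rho - 1)(y_{t-1} - \beta z_{t-1}) + u_{1t}, \qquad \Delta z_t = u_{2t}, \tag{64}$$

where $u_{1t} = \beta\varepsilon_{2t} + \varepsilon_{1t}$, $u_{2t} = \varepsilon_{2t}$, and $E(u_t u_t') = \Lambda$, with

$$\Lambda = \begin{pmatrix}
\beta^2\sigma_2^2 + 2\beta\theta\sigma_1\sigma_2 + \sigma_1^2 & \beta\sigma_2^2 + \theta\sigma_1\sigma_2 \\
\beta\sigma_2^2 + \theta\sigma_1\sigma_2 & \sigma_2^2
\end{pmatrix}. \tag{65}$$

The logarithm of the likelihood function for the ECM is therefore

$$L(\alpha, \gamma, \Lambda) = K - (T/2)\ln|\Lambda|
- \tfrac{1}{2}\sum_{t=1}^{T} (\Delta x_t - \gamma\alpha' x_{t-1})' \Lambda^{-1} (\Delta x_t - \gamma\alpha' x_{t-1}), \tag{66}$$

where $x_t = (y_t, z_t)'$, $\gamma = (\rho - 1, 0)'$, $\alpha' = (1, -\beta)$, and $\gamma\alpha'$ is the $2 \times 2$
matrix of rank 1 given in (64).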
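The systems (63) and (64) have the property that $z_t$ is weakly
exogenous for $\beta$. Since the $u_{it}$ are normally distributed (from (63)), take
conditional expectations in (64):

$$E(\Delta y_t \mid \Delta z_t, x_{t-1}) = (\rho - 1)(y_{t-1} - \beta z_{t-1}) + \Lambda_{12}\Lambda_{22}^{-1}\Delta z_t.$$

Taking the covariances of the $u_t$ from (65), we have

$$E(\Delta y_t \mid \Delta z_t, x_{t-1}) = (\rho - 1)(y_{t-1} - \beta z_{t-1}) + (\beta + \theta\sigma_1/\sigma_2)\Delta z_t. \tag{67}$$

The parameter $\beta$ is recoverable from (67). Moreover, $\beta$ does not
enter the marginal distribution.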
Weak exogeneity of $z_t$ for $\beta$ implies that inference concerning $\beta$ can
be carried out with no loss of information by using the density of $y_t$
conditional on $z_t$ and ignoring the marginal density of $z_t$ (that is, the
DGP of $z_t$). It is then not surprising that, when the log-likelihood is
formally split into a conditional and a marginal likelihood, the marginal
density contains no information about $\beta$. That is, (66) can be rewritten
as

$$L = K - \frac{T}{2}\ln\Lambda_0 - \frac{1}{2\Lambda_0}\sum_{t=1}^{T}\eta_t^2
+ \left( -\frac{T}{2}\ln\Lambda_{22} - \frac{1}{2\Lambda_{22}}\sum_{t=1}^{T}(\Delta z_t)^2 \right), \tag{68}$$

with $\Lambda_0 = \Lambda_{11} - \Lambda_{12}\Lambda_{22}^{-1}\Lambda_{21}$, $\eta_t = \Delta y_t - (\rho - 1)(y_{t-1} - \beta z_{t-1}) - \psi\Delta z_t$,
and, finally, $\psi = \Lambda_{12}\Lambda_{22}^{-1} = (\beta + \theta\sigma_1/\sigma_2)$; $\psi$ can be interpreted as a
short-run multiplier, being the coefficient on $\Delta z_t$ in (67), while the
long-run multiplier is $\beta$, from (63). The term in parentheses in (68) is
the marginal likelihood of $z_t$ (or $\Delta z_t$) and does not involve $\beta$; estimation
of $\beta$ can be carried out by maximizing the conditional likelihood alone.
The estimate is that which would be obtained from OLS in the
regression corresponding to (67).
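
The contrast between the static regression and the conditional ECM regression can be illustrated numerically. The sketch below assumes the DGP takes the form $y_t = \beta z_t + v_t$ with an AR(1) equilibrium error and a random-walk regressor whose innovation has correlation $\theta$ with that of $v_t$. The function names, parameter values, sample size, and seed are hypothetical choices for illustration and are not taken from Gonzalo (1990).

```python
import numpy as np

def simulate_dgp(T, beta, rho, theta, sigma1=1.0, sigma2=1.0, seed=0):
    """Simulate y_t = beta*z_t + v_t, v_t = rho*v_{t-1} + e1_t, dz_t = e2_t.

    (e1_t, e2_t) are jointly normal with correlation theta (an assumed setup).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[sigma1**2, theta * sigma1 * sigma2],
                    [theta * sigma1 * sigma2, sigma2**2]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=T)
    z = np.cumsum(e[:, 1])              # random-walk regressor
    v = np.zeros(T)                     # AR(1) equilibrium error, v_0 = 0
    for t in range(1, T):
        v[t] = rho * v[t - 1] + e[t, 0]
    return beta * z + v, z

def static_ols(y, z):
    """OLS slope in the static regression of y_t on z_t (no intercept)."""
    return float(z @ y / (z @ z))

def conditional_ecm(y, z):
    """OLS of dy_t on y_{t-1}, z_{t-1} and dz_t.

    Returns (beta_hat, psi_hat): beta_hat is minus the ratio of the
    coefficients on z_{t-1} and y_{t-1}; psi_hat is the coefficient on dz_t.
    """
    dy, dz = np.diff(y), np.diff(z)
    X = np.column_stack([y[:-1], z[:-1], dz])
    c, *_ = np.linalg.lstsq(X, dy, rcond=None)
    return float(-c[1] / c[0]), float(c[2])

y, z = simulate_dgp(2000, beta=2.0, rho=0.5, theta=0.5, seed=1)
beta_static = static_ols(y, z)
beta_ecm, psi_hat = conditional_ecm(y, z)
print(beta_static, beta_ecm, psi_hat)
```

With these assumed values the short-run multiplier is $\psi = \beta + \theta\sigma_1/\sigma_2 = 2.5$, while the long-run multiplier is $\beta = 2$; both regressions should recover $\beta$ closely in a sample of this size, and only the ECM regression identifies $\psi$.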
In order to discuss the asymptotic properties of different estimation
methods, we use the multivariate functional central-limit theorem and
transformation to the unit interval described in Chapter 6. For the
vector $e_t = (v_t, \varepsilon_{2t})'$, let $p_t = p_{t-1} + e_t$. Then

$$T^{-1/2} p_{[Tr]} \Rightarrow B(r),$$

with $B(r) = (B_1(r), B_2(r))'$. The long-run covariance matrix of this
bivariate Brownian motion process can be calculated as in the appendix
to Chapter 7:

$$\Omega = \begin{pmatrix}
\sigma_1^2/(1 - \rho)^2 & \theta\sigma_1\sigma_2/(1 - \rho) \\
\theta\sigma_1\sigma_2/(1 - \rho) & \sigma_2^2
\end{pmatrix}.$$

Further,

where

Hence

Results on the asymptotic distributions of the different estimators of
co-integrating parameters will be stated without proof, but can be found
in Gonzalo (1990).
(i) Static regression estimated by OLS. For $x_t$ generated by (63), the
OLS estimator of $\beta$ in a static regression has the asymptotic distribution

using the decomposition $B_1(s) = \omega_{12}\omega_{22}^{-1}B_2(s) + (\omega_{11} - \omega_{12}^2\omega_{22}^{-1})^{1/2}W(s)$,
where $\omega_{ij}$ is the $(i, j)$th element of $\Omega$ and $W(s)$ is a Brownian motion
process independent of $B_2$, since $\omega_{12}\omega_{22}^{-1} = \theta\sigma_1/[\sigma_2(1 - \rho)]$ and

The important properties of static estimation of $\beta$ are apparent from
(69) and (70). The estimator is super-consistent, but contains second-order
biases reflected in (70b) and (70c); these terms also make
standard distributions inappropriate for hypothesis testing. Estimation
will be improved to the extent that (70b) and (70c) can be reduced or
eliminated, and they will vanish if the short-run and long-run multipliers
are equal, since $\psi = \beta$ implies $\theta = 0$, and so $A_2 = A_3 = 0$. While the
limiting distribution above is specific to the DGP (63), $\psi = \beta$ will
typically only arise because of an absence of lagged values of $z_t$ and $y_t$
from the DGP; if for example $y_t = \psi z_t + \gamma_1 y_{t-1} + \gamma_2 z_{t-1} + \text{error}$, then
the long-run multiplier is $\beta = (\psi + \gamma_2)/(1 - \gamma_1)$, in which case
$\gamma_1 = \gamma_2 = 0$ is sufficient for $\beta = \psi$. A common factor ($\gamma_2 = -\psi\gamma_1$) is
necessary and sufficient.
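
The relation between the short-run and long-run multipliers in the dynamic example can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names and the numerical values are hypothetical.

```python
import math

def long_run_multiplier(psi, gamma1, gamma2):
    """Long-run multiplier of y_t = psi*z_t + gamma1*y_{t-1} + gamma2*z_{t-1} + error."""
    if math.isclose(gamma1, 1.0):
        raise ValueError("gamma1 = 1: no finite long-run solution")
    return (psi + gamma2) / (1.0 - gamma1)

def has_common_factor(psi, gamma1, gamma2, tol=1e-12):
    """True when gamma2 = -psi*gamma1, the common-factor restriction."""
    return abs(gamma2 + psi * gamma1) <= tol

# Common factor holds: the long-run and short-run multipliers coincide.
print(long_run_multiplier(0.5, 0.4, -0.2), has_common_factor(0.5, 0.4, -0.2))
# Common factor fails: the long-run multiplier differs from psi = 0.5.
print(long_run_multiplier(0.5, 0.4, 0.1), has_common_factor(0.5, 0.4, 0.1))
```

The first case also shows that $\gamma_1 = \gamma_2 = 0$ is only one way of satisfying the restriction: any pair with $\gamma_2 = -\psi\gamma_1$ gives $\beta = \psi$.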
The terms $A_2$ and $A_3$ above can be eliminated when $\psi \neq \beta$ by the use
of other estimation methods, as will be seen below.

(ii) Non-linear least squares (Stock 1987). This method, which eliminates
the bias contained in (70c), consists in minimizing the sum of
squared residuals defined as

which is non-linear in that the coefficient on $z_{t-1}$ in the corresponding
regression model is $\gamma_1\beta$. The coefficient $\beta$ can however be recovered
from the ordinary linear regression

The asymptotic distribution of this NLS estimator is similar to that in
(69), but with the term (70c) omitted and (70b) modified to

Comparing (70b) and (70b'), we see that (70b') contains a factor of $\psi$
rather than $(\psi - \beta)$. As (70b) is one of the terms responsible for
second-order bias, it seems likely that OLS will perform relatively well
when $\psi - \beta = 0$, reducing the bias in (70b), and that NLS will perform
relatively well when $\psi = 0$, reducing the bias in (70b'). In the Monte
Carlo study of Stock (1987), the DGP chosen implies that $\psi = 0$, leading
to the superiority of the NLS technique; where $\psi = \beta$, however, OLS
may do better. Recall from the definition of $\psi$ that $\psi = \beta$ if $\theta$, a scaling
factor for the correlation between the underlying white-noise disturbances
in $y_t$ and $z_t$, is equal to zero.
(iii) Full-information maximum likelihood (FIML). The FIML procedure
of Johansen (1988) for estimating the matrix $\alpha$ of co-integrating
vectors in a system is described above. Gonzalo shows that, for the
DGP (63), the FIML estimator of $\beta$ has the asymptotic distribution

where $A_1$ is as given in (70a). Therefore (71) is equivalent to (69) with
terms $A_2$ and $A_3$ eliminated. FIML estimation eliminates two sources of
bias: the non-symmetry caused by $\psi \neq \beta$, which leads to a bias in median
(term (70b)), and the simultaneous-equations bias, which is a bias in
mean (term (70c)), which results when the long-run covariance between
$z_t$ and $v_t$ in (63) is not accounted for. The FIML estimator is
asymptotically symmetrically distributed.
Moreover, the asymptotic distribution given in (71) is a mixture of
normals. (Recall that in (70a) $B_2(s)$ and $W(s)$ are independent Brownian
motion processes.) As a result, standard asymptotic chi-squared
hypothesis tests are valid.
(iv) Other estimators. Stock and Watson (1988b) and Bossaerts (1988)
propose additional methods of estimation based on principal components
and canonical correlations respectively.
The principal-component method finds the linear combination of $y_t$
and $z_t$ with minimum variance, which amounts to finding the co-integrating
vector. Given the covariance matrix of $(y_t, z_t)$, the principal-component
estimate of the co-integrating vector is the eigenvector
corresponding to the smallest eigenvalue of this covariance matrix. For
the DGP (63), its asymptotic distribution is like that of OLS as given in
(69), with the addition of a fourth term grouped with $A_1$, $A_2$, and $A_3$.
Calling this term $A_4$,

The additional term affects the bias in mean, which may be larger or
smaller than that of OLS as this term may be positive or negative. Like
FIML, the principal-component method lends itself naturally to the
estimation of more than one co-integrating vector.
The method of canonical correlation is based on a search for the
linear combination of $(y_t, z_t)$ and $(y_{t-1}, z_{t-1})$ which has the maximal
correlation subject to normalization and identification constraints.
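
The principal-component estimate described above can be sketched in a few lines. The example assumes a bivariate system with a random-walk regressor and a white-noise equilibrium error; the function name, the simulated data, and the normalization on $y_t$ are illustrative assumptions rather than part of the methods cited.

```python
import numpy as np

def pc_cointegrating_vector(y, z):
    """Eigenvector of the sample covariance matrix of (y, z) belonging to the
    smallest eigenvalue, normalized so that the coefficient on y is 1."""
    cov = np.cov(np.vstack([y, z]))
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    vec = eigvecs[:, 0]                      # minimum-variance direction
    return vec / vec[0]

rng = np.random.default_rng(0)
T = 2000
z = np.cumsum(rng.standard_normal(T))        # I(1) regressor
y = 2.0 * z + rng.standard_normal(T)         # co-integrated with z, beta = 2

alpha_hat = pc_cointegrating_vector(y, z)    # approximately (1, -beta)
beta_hat = -alpha_hat[1]
print(alpha_hat, beta_hat)
```

Normalizing on $y_t$ makes the estimate comparable with the static OLS slope; in a system with more than one co-integrating vector the eigenvectors belonging to the several smallest eigenvalues would be retained instead.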
Gonzalo compares the methods in a Monte Carlo simulation that uses
a DGP similar to (63), but with (63a) modified to

where $a_1 = 0$ or 1 and with $a_2 = 1$. The results are consistent with the
analysis of biases given above, and in particular support the contention
that the Johansen-type FIML estimator will tend to be superior. Which
of OLS and NLS is superior depends, as anticipated, on the parameters
$\psi$ and $\psi - \beta$. Moreover, as we have seen above, it appears that the
efficiency cost of over-parameterization of the FIML or NLS estimators
is modest, while the consequences of under-parameterization may be
more serious.

Conclusion
We briefly summarize the main themes of the book, and then
consider the invariance of the matrix of co-integrating vectors in a
linear system under both linear transformations and seasonal
adjustment. Next, co-integration is related to structured time-series
models, which offer an alternative approach to modelling integrated
data. Recent research on integration and co-integration is
described, and the book concludes by re-interpreting some old
econometric problems in the light of co-integration theory.

9.1. Summary
Many economic time series appear to be non-stationary and to drift over
time. Efficient inference in time-series econometrics requires taking
account of this phenomenon. This book described the modelling of
economic variables as integrated processes, allowing for the possibility
that variables may be linked in the long run, implying that linear
combinations of them are co-integrated.
We first presented the background to the theory of integrated series,
building on concepts from time-series analysis and the theory of stochastic
processes. The resulting distributions of estimators and tests
applied to integrated data were functionals of Wiener processes, which
when combined with a functional central-limit theorem led to a powerful
and general method for deriving their limiting distributions. These were
different from the limiting distributions conventionally applied to stationary
processes, both because the normalization factor was the sample
size rather than its square root, and because the form of the asymptotic
distribution was non-normal. An important implication was that the
critical values of test statistics differed between I(0) and I(1) data.
Although the asymptotic distribution theory involved new types of
derivations, it was feasible to master the logic of Wiener processes
without excessive effort; the pay-off was that the approach simplified
other derivations (such as constancy tests, as in Hansen 1992), and, in
addition, was very general.
The Wiener process tools then allowed us to analyse such diverse
problems as spurious (or nonsense) regressions, spurious detrending,
parametric and non-parametrically adjusted univariate tests for unit
roots, regressions on I(1) data, and tests for co-integration. We showed
that even with I(1) data many tests had conventional distributions, but
some did not, so care was required in conducting inference. For
example, tests such as the Johansen statistic $-T\log(1 - \lambda)$ for co-integration
had distributions which were functionals of Wiener processes,
although tests on co-integrating vectors were asymptotically normal. In
particular, over-identification tests needed to be formulated after mapping
to the space of I(0) variables to ensure that their distributions were
not a mixture of these two types of distributions (see Hendry and Mizon
1992). Conditioning tests on the I(1) decision for the number of
co-integrating relations allowed the tests to be treated as having
conventional distributions.
Co-integration provided a conceptual framework for mapping to I(0)
space and therefore we examined it as a data-reduction tool and
investigated some of its wide-ranging implications. Tests for co-integration
based on residuals from static regressions and on systems were
derived. The Granger Representation Theorem linked co-integration to
a variety of other representations, including error-correction mechanisms
(ECMs) which have been widely used since the late 1970s.
This link in turn entails a new view of dynamics: lagged feedbacks and
ECMs do not necessarily violate rationality in an I(1) world. Further, as
in Davidson et al. (1978), the role of differencing is as a transform,
which preserves co-integration, and not as a filter, which eliminates
levels variables and hence loses co-integration. Conversely, omitting an
ECM generally induces a negative moving-average error, a point
elaborated upon below.

9.2. The Invariance of Co-integrating Vectors

Linear systems, perhaps formulated after suitable data transformations
(such as logarithms) intended to make linearity a reasonable approximation,
play a leading role in co-integration analysis. A linear system is
invariant under non-singular linear transforms, but usually its parameters
are altered by such transforms. Chapter 2 discussed the properties
of linear autoregressive distributed lag (ADL) models for stationary
data, relating transformations of ADLs to ECMs to demonstrate the
equivalence of estimators of long-run multipliers from any of the
transforms even though the parameters of the equation were altered. In
I(1) processes, the corresponding result is that co-integration defines an
invariant of a linear system, as we now show.
Consider an identified $n \times r$ co-integration matrix $\alpha$ in the I(1)
system:

$$\Delta x_t = \mu + \Gamma\Delta x_{t-1} + \gamma\alpha' x_{t-1} + \varepsilon_t, \tag{1}$$

where $\varepsilon_t \sim \mathsf{IN}(0, \Sigma)$. The system in (1) has parameters $(\Gamma, \gamma, \alpha, \mu, \Sigma)$.
Then, $x_t$ is I(1) if and only if $\mathrm{rank}(\gamma_\perp'\Psi\alpha_\perp) = n - r$, where $\Psi$ is the
mean lag matrix defined in Chapter 8. Here $(\gamma : \gamma_\perp)$ has rank $n$, with $\gamma_\perp$
being $n \times (n - r)$ such that $\gamma_\perp'\gamma = 0$, and $(\alpha : \alpha_\perp)$ has rank $n$ with
$\alpha_\perp'\alpha = 0$ for $\alpha_\perp$ of size $n \times (n - r)$. Pre-multiplying (1) by a known
$n \times n$ non-singular matrix $B$ (so $|B| \neq 0$),

$$B\Delta x_t = \mu^* + \Gamma^*\Delta x_{t-1} + \gamma^*\alpha' x_{t-1} + \varepsilon_t^*, \tag{2}$$

with $\mu^* = B\mu$, $\Gamma^* = B\Gamma$, $\gamma^* = B\gamma$, and $\varepsilon_t^* = B\varepsilon_t$.
The system in (2) has the same likelihood as (1), but with parameters
$(\Gamma^*, \gamma^*, \alpha, \mu^*, \Sigma^*)$ where $\Sigma^* = B\Sigma B'$; an example of an admissible
transform is any just-identified reformulation of (1). Only $\alpha$ is unaffected
by the linear transform, and $\alpha' x_{t-1}$ remains the co-integrating
combination, so $\alpha$ is an invariant parameter of the system.
The I(1) property of the system is also preserved as follows. The
mean-lag matrix becomes $\Psi^* = B\Psi$ and, letting $(\gamma^* : \gamma_\perp^*) =
(B\gamma : B^{-1\prime}\gamma_\perp)$ so that $\gamma^{*\prime}\gamma_\perp^* = 0$, then

$$\gamma_\perp^{*\prime}\Psi^*\alpha_\perp = \gamma_\perp' B^{-1} B \Psi\alpha_\perp = \gamma_\perp'\Psi\alpha_\perp,$$

and hence the two matrices have the same rank. The invariance of $\alpha$ is a
natural property of reduced-rank systems and extends to I(2) processes
and to conditional systems. Thus, for a given vector $x_t$, reduced forms,
marginal models, conditional models, and structural forms all can be
modelled with the same set of co-integration vectors.
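
The invariance of the co-integration space under pre-multiplication by a non-singular matrix can be verified numerically. The sketch below uses small hypothetical matrices for the adjustment coefficients, the co-integrating vector, and the transform $B$; it checks that the long-run matrices before and after the transform have the same row space, which is the space spanned by the co-integrating vector.

```python
import numpy as np

def row_space_projector(pi, r):
    """Orthogonal projector onto the row space of a rank-r matrix pi."""
    _, _, vt = np.linalg.svd(pi)
    basis = vt[:r].T                 # n x r orthonormal basis of the row space
    return basis @ basis.T

# Hypothetical values for n = 3 variables and r = 1 co-integrating vector.
gamma = np.array([[-0.5], [0.1], [0.0]])     # adjustment coefficients
alpha = np.array([[1.0], [-1.0], [0.5]])     # co-integrating vector
B = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])              # non-singular transform

pi = gamma @ alpha.T                 # long-run matrix of the original system
pi_star = B @ pi                     # long-run matrix after pre-multiplying by B

P = row_space_projector(pi, r=1)
P_star = row_space_projector(pi_star, r=1)
P_alpha = alpha @ alpha.T / (alpha.T @ alpha)

print(np.allclose(P, P_star), np.allclose(P, P_alpha))
```

The adjustment coefficients change from `gamma` to `B @ gamma`, but the projector onto the row space of the long-run matrix is the same in both systems, so the co-integrating combination is unaffected.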

9.3. Invariance of Co-integration Under Seasonal Adjustment

The co-integrating vector $\alpha$ is invariant to seasonal adjustment by a
diagonal seasonal filter $S(L)$ which satisfies the scale-preserving property
$S(1) = I$, as does a procedure like X-11. The results in this section
are drawn from Ericsson, Hendry, and Tran (1992). It is assumed that
$S(L)$ annihilates any deterministic seasonal dummies. The invariance
result holds because $S(L)$ can be written as (see Chapter 5):

$$S(L) = S(1) + S^*(L)\Delta = I + S^*(L)\Delta.$$

We first show the co-integration relation between adjusted and
unadjusted data and then establish the invariance of the co-integration
matrix $\alpha$ of $x_t$. Let $x_t^a = S(L)x_t$ denote the seasonally adjusted vector
variable. Then

$$x_t^a = x_t + S^*(L)\Delta x_t,$$

so that $x_t^a - x_t = S^*(L)\Delta x_t$. Hence $x_t^a$ and $x_t$ co-integrate with a unit
coefficient to I(0) when $x_t$ is I(1). Most seasonal adjustment filters are
two-sided and symmetric for most of the available sample, so that in fact
$S^*(1) = 0$ and $S(L) = I + S^{**}(L)\Delta^2$. Then $x_t^a - x_t = S^{**}(L)\Delta^2 x_t$, so
that co-integration to I(0) occurs between adjusted and unadjusted data
even when $x_t$ is I(2). Alternatively, if $\Delta x_t$ is I(0) with a non-zero mean
(as in GNP), then $x_t^a - x_t$ has a zero mean, as seems sensible for the
seasonal residual. Generally, if $S(L) = I + S^\dagger(L)\Delta^d$, then $x_t^a$ and $x_t$
co-integrate with a unit coefficient to I(0) when $x_t$ is I($d$), and also have
a zero mean difference when $x_t$ is I($d - 1$). When $x_t^a - x_t$ is at most
I(0), any co-integrating vector $\alpha'$ of either $x_t^a$ or $x_t$ is a co-integrating
vector of the other, so co-integration parameters are unaffected by
$S(L)$. Since $x_t^a = x_t + S^{**}(L)\Delta^2 x_t$, we have that

$$\alpha' x_t^a = \alpha' x_t + \alpha' S^{**}(L)\Delta^2 x_t,$$

and hence the difference is at least two orders of integration lower than
that of $x_t$.
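
The properties of a symmetric, scale-preserving filter can be illustrated with a simple moving average. The sketch below uses a centred 2x4 moving average for quarterly data as a stand-in for a seasonal adjustment filter; this choice, like the variable names, is an assumption for illustration and is not the X-11 procedure itself. It checks that the weights sum to one, that their first moment is zero, and that the adjusted series reproduces a linear trend exactly while differing from a quadratic trend only by a constant.

```python
import numpy as np

# Assumed illustrative filter: centred 2x4 moving average for quarterly data.
weights = np.array([1 / 8, 1 / 4, 1 / 4, 1 / 4, 1 / 8])
lags = np.arange(-2, 3)

def adjust(x, w=weights):
    """Apply the symmetric filter; returns values for interior dates only."""
    return np.convolve(x, w, mode="valid")

t = np.arange(40, dtype=float)
linear = 3.0 + 0.5 * t           # second difference is zero
quadratic = t ** 2               # second difference is the constant 2

gap_linear = adjust(linear) - linear[2:-2]
gap_quadratic = adjust(quadratic) - quadratic[2:-2]

print(weights.sum(), (lags * weights).sum())
print(gap_linear.max(), gap_quadratic.min(), gap_quadratic.max())
```

A zero gap for the linear trend and a constant gap for the quadratic trend are what a filter of the form $I + S^{**}(L)\Delta^2$ implies, since the gap depends only on second differences of the input.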
However, the adjustment parameter $\gamma$ is altered as follows. Multiply
(1) by $S(L)$ to give

$$\Delta x_t^a = S(L)\mu + S(L)\Gamma\Delta x_{t-1} + S(L)\gamma\alpha' x_{t-1} + S(L)\varepsilon_t.$$

By suitable addition and subtraction of lags and differences of $x_t^a$ on the
right-hand side,

$$\Delta x_t^a = \mu + \Gamma\Delta x_{t-1}^a + \gamma\alpha' x_{t-1}^a + v_t^a. \tag{6}$$

When $S^\dagger(L)$ is a scalar times the unit matrix (the same filter for all $x_{it}$),
$v_t^a = \varepsilon_t^a$. In (6), it looks as if $\gamma$ is also an invariant, but as $v_t^a$ involves
lagged, current, and future differences of $x_t$ of $d$th or higher order, as
well as $\varepsilon_t^a$, then one of $v_t^a$ or $\varepsilon_t$ is likely to be autocorrelated. Since
$\alpha' x_{t-1}^a$ is an I(0) variable, conventional serial correlation biases apply to
it, and hence $\gamma$ will usually be affected by whether or not the data are
seasonally adjusted. The short-run dynamics will be changed when $\varepsilon_t$ is
an innovation, because $v_t^a$ is correlated with $\Delta x_{t-1}^a$, and additional lags
are needed to remove its autocorrelation.

9.4. Structured Time-series Models and Co-integration

An alternative approach to modelling integrated processes is offered by
structured time-series models (see Harvey 1989).1 In this section, we
briefly explain their form and relate their data description properties to
a co-integrated system. A simple univariate example is given by

$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathsf{IN}(0, \sigma_\varepsilon^2), \tag{7}$$

$$\mu_t = \mu_{t-1} + v_t, \qquad v_t \sim \mathsf{IN}(0, \sigma_v^2), \tag{8}$$

and $E[\varepsilon_t v_s] = 0 \ \forall t, s$. Their form generally leads to the presence of
negative moving-average errors, since (7) and (8) imply that

$$\Delta y_t = \varepsilon_t - \varepsilon_{t-1} + v_t. \tag{9}$$

The process $\{\varepsilon_t - \varepsilon_{t-1} + v_t\}$ can be re-expressed as a first-order moving
average $\{e_t - \theta e_{t-1}\}$, where the moments of the derived process are
identical to those of the original process and determine $\theta$. The variance
of the former is $2\sigma_\varepsilon^2 + \sigma_v^2$, and that of the latter, $\{e_t - \theta e_{t-1}\}$, is
$(1 + \theta^2)\sigma_e^2$, and these must be equal to each other; their first-order
auto-covariances are $-\sigma_\varepsilon^2$ and $-\theta\sigma_e^2$, and again these must be equal. All
longer lag covariances vanish. Equating the first-order serial correlation
coefficients of the two representations yields

$$-\theta/(1 + \theta^2) = -1/(2 + q), \tag{10}$$

where $q = \sigma_v^2/\sigma_\varepsilon^2$. Equation (10) is a quadratic in $\theta$ that, given $q$, can be
solved for a value of $\theta$ between 0 and 1. Finally, equating first-order
covariances, $\sigma_e^2 = \sigma_\varepsilon^2/\theta$. Thus, $\Delta y_t$ is I(0) and has a negative
moving-average error with parameter $\theta$: $\Delta y_t = e_t - \theta e_{t-1}$.
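
The moving-average parameter implied by a given variance ratio can be computed directly. The sketch below solves the quadratic $\theta^2 - (2 + q)\theta + 1 = 0$, which follows from equating the first-order serial correlations of the two representations, and takes the root between 0 and 1; the function names and the numerical values are hypothetical.

```python
import math

def ma_parameter(q):
    """Root in (0, 1] of theta/(1 + theta**2) = 1/(2 + q), for q >= 0,
    where q is the ratio of the level-innovation variance to the
    measurement-noise variance."""
    if q < 0:
        raise ValueError("q is a variance ratio and must be non-negative")
    b = 2.0 + q
    return (b - math.sqrt(b * b - 4.0)) / 2.0

def innovation_variance(sigma_eps2, q):
    """Variance of the MA(1) innovation, from equating first-order covariances."""
    return sigma_eps2 / ma_parameter(q)

theta = ma_parameter(0.5)                     # q = 0.5
sigma_e2 = innovation_variance(1.0, 0.5)      # noise variance normalized to 1
print(theta, sigma_e2, (1 + theta**2) * sigma_e2)
```

With $q = 0.5$ and a unit noise variance, the last printed value is the variance of the moving-average representation, which should equal $2\sigma_\varepsilon^2 + \sigma_v^2 = 2.5$. As $q \to 0$ the parameter tends to 1 and the unit root cancels; as $q$ grows the parameter tends to 0 and the differenced series approaches white noise.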
There are close links between negative moving-average errors and
error-correction mechanisms as remarked earlier (see e.g. Gregoir and
Laroque 1991). Consider a simple co-integrated system,

$$\Delta y_t = -\lambda(y_{t-1} - \beta z_{t-1}) + u_t, \qquad \Delta z_t = v_t. \tag{11}$$

To marginalize with respect to $z$ at all lags in (11), first rewrite it as

$$y_t = (1 - \lambda)y_{t-1} + \lambda\beta z_{t-1} + u_t,$$

so that, in terms of differences,

$$\Delta y_t = (1 - \lambda)\Delta y_{t-1} + w_t. \tag{14}$$

In (14), $w_t = \lambda\beta v_{t-1} + \Delta u_t$, and as with (9), when $\{v_t\}$ and $\{u_s\}$ are
mutually independent, we can rewrite $w_t$ as $\xi_t - \tau\xi_{t-1}$, where equating
moments yields $-\tau/(1 + \tau^2) = -1/(2 + s)$ for $s = \lambda^2\beta^2\sigma_v^2/\sigma_u^2$. Thus, a
negative moving-average error also results from the marginalization
providing $\lambda \neq 0$ (the unit root in (14) cancels when $\lambda = 0$ since then
$s = 0$ and so $\tau = 1$). If (7) and (8) allowed for a short-run dynamic
element, the observed outcome would be similar to that entailed
by (14).

1 Harvey calls such models 'structural', but as that word is heavily over-used in
econometrics, we have substituted 'structured'.

A structured time-series model that generalizes (8) by including a
time-varying slope generates an I(2) series,

Thus, as long as $\sigma_\eta^2 \neq 0$,

Hence from (7),

When $\sigma_\eta^2 = 0$, we have $\zeta_t = \zeta_{t-1} = \zeta_0$, say, so that

and $\zeta_0$ is the mean growth rate $E[\Delta y_t] = g_y = \zeta_0$. When $\sigma_\eta^2 \neq 0$, (18)
entails changes in $E[\Delta y_t] = g_y(t)$ over time and generates $y_t$ as I(2).
The alternative possibility to evolving growth rates is that of changes
in means over time, so that $g_y(t)$ takes different values in different
epochs. Such behaviour could be approximated by a model in which the
distribution $D_\eta(\eta_t)$ was non-normal, with a large mass at zero and small
probabilities of large values. Then $\zeta_t$ would usually be constant, but
would occasionally jump to a new level. Thus, it is unsurprising that
discrimination between integrated and regime-change models is difficult
(see Perron 1989). Conversely, there are close affinities between structured
time-series and econometric models for integrated data. Indeed,
several researchers have suggested switching from a unit-root null to one
of I(0) or co-integration. For example, one might seek to test $\sigma_v^2 = 0$
when $\sigma_\eta^2 = 0$ (so $\zeta_t = \zeta_0$) as a test for a unit root (see e.g. Kwiatkowski,
Phillips, and Schmidt (1991) and Leybourne and McCabe (1992)).

9.5. Recent Research on Integration and Co-integration

During the last decade there has been an explosion of research on
integrated and co-integrated processes. Dozens of papers appeared while
we were writing the book, and many will appear between completion of
writing and its appearance in print. With such a rapidly moving target,
we focused on central research topics to explain what seem likely to
remain the major concepts, tools, techniques, models, methods, and
tests.
Consequently, some research areas received scant treatment, including
other estimation methods for co-integration vectors, as well as studies of
their properties: see inter alia Ahn and Reinsel (1988), Bewley, Orden,
and Fisher (1991), Boswijk (1991), Box and Tiao (1977), Engle and Yoo
(1991), Phillips (1991), Saikkonen (1991). Some comparative Monte
Carlo studies of finite sample behaviour and related econometric theory
have been noted, but others appear apace and we can expect many
more over the next few years clarifying the choice of method, and the
likely problems confronting each proposal. Researchers will also study
the problems of joint selection of, e.g., lag length and the number of
co-integration vectors. Another research topic is the order in which
hypothesis tests should be conducted. Intuition suggests that it should be
constancy, lag length, co-integration, congruence of the system, weak
exogeneity, structural restrictions, encompassing, intercepts (and
whether they lie in the co-integration space), etc. However, the
distributions of tests of the first hypothesis are affected by the presence of
co-integration, and it may well be difficult to implement a good order,
although if the data are indeed I(1), tests for lag length based on lagged
first differences will be in I(0) space. One recommendation concerning
choices of methods and estimators that emerged as we proceeded was
for a systems approach in preference to single-equation modelling until
weak exogeneity has been ascertained.
Further developments have occurred in testing for unit roots in
univariate processes such as instrumental variables tests and
Durbin-Hausman tests (see e.g. Hall 1991, Choi 1992, Schmidt and Phillips
1992, Kremers et al. 1992; and Banerjee and Hendry 1992 for a
summary). However, the previous recommendation of modelling the
system rather than using univariate representations brings into question
the point of conducting unit-root tests in marginal processes. One
purpose might be to reject the null of integration against trend
stationarity. Here, the available tests are known to have relatively low power. In
particular, investigators often use $t(\rho = 1)$ rather than $T(\hat{\rho} - 1)$ (see
Sect. 4.6) although Monte Carlo evidence shows the latter to have
higher power. In any case, failure to reject the null does not entail
accepting it as 'true'. For example, univariate unit-root tests can reflect
other non-modelled forms of non-stationarity such as regime shifts, and
inherent non-stationarity in mean and variance functions. Further,
variables inherit unit roots from marginalizing with respect to other
unit-root processes on which they depend. Thus, failure to reject a null of a
unit root tells us little about the persistence of shocks to the variable
being considered in isolation or in a small, highly marginalized system as
discussed in Campbell and Perron (1991).
A second purpose might be to check that variables in a system are not
I(2) (see e.g. Pantula 1991), so the null would be a unit root in the
differences of the original variables. However, if the intention is to
model the system, then it seems better to proceed from the general to
the specific here as well and test the necessary rank conditions on the
mean lag matrix of the system (see following (1) above). Nevertheless,
sequential tests in this context raise some new problems. For example,
the outcome of a pretest for a unit root (i.e. reject or not reject) affects
the critical values used to test economic hypotheses, so the possibility of
Type-I errors at the first stage may lead to size or power distortions at
the second stage when conventional initial values are used.
Finally, a unit root may be of interest in order to validate a specific
estimator (e.g. Engle-Granger) by appealing to super-consistency. Here
a unit root test may be of descriptive value as it depends on the ratio of
the covariance of the first difference with the level to the variance of the
level, and so should be close to zero when there is a unit root, although
we showed in Section 3.6 that similar distributions will result for
integrated and near-integrated processes. The ratio of the variance of
the first difference to that of the level is another index of the rapidity of
accrual of information (either from trends or from drift).
Other likely research interests concern tests of structural, long-run,
exogeneity, causality, and encompassing hypotheses (see e.g. Boswijk
1991, Hendry and Mizon 1992, and Banerjee and Hendry 1992).
Modelling I(2) systems is in its infancy (see Johansen 1991b), but has
close links to multi-co-integration and the analysis of stock-flow relations
(see Granger and Lee 1990). This last development provides an
additional explanation for such phenomena as the role of inflation in real
money demand equations: if nominal money and the price level are I(2),
and real money and inflation are I(1), then the last may be needed to
create an I(0) co-integration vector. Extensive developments also seem
likely to occur in estimation and dynamic modelling, since for many
objectives in econometrics, including forecasting and policy, the focus of
interest must be all parameters of the system and not just the long-run
parameters.
In co-integrated processes, weak exogeneity of the conditioning
variables for the parameters of interest remains as vital as it did in
stationary processes, even for the long-run parameters. Thus, it is
important to test for the presence of co-integrating vectors in other
equations as discussed in Chapter 8. Doing so, however, implies system
modelling even for an LM test (see Boswijk 1991). Further, Urbain
(1992) shows that tests for orthogonality between regressors and errors
lack power to detect such a weak exogeneity failure.

9.6. Reinterpreting Econometrics Time-series Problems

Integration and co-integration also lead to the re-interpretation of many
extant econometrics time-series problems. We consider a few of these,
commencing with multi-collinearity.

9.6.1. Multi-collinearity
When $x_t \sim$ I(1) and $\alpha' x_t \sim$ I(0), then including all the elements of $x_t$ or
$x_{t-1}$ as regressors in a single equation will induce an apparently serious
collinearity problem. The second moment matrix $(X'X)$ will be $O(T^2)$,
whereas the linear combination $(\alpha'X'X\alpha)$ will be $O(T)$. Consequently,
$(T^{-2}X'X)$ will converge on a singular matrix. Generally, it is inadvisable
to 'solve' this problem by deleting variables; for I(1) data, doing so
jeopardizes the possibility of co-integration. If the dependent variable is
I(0), then the solution is to find the co-integrating combination $\alpha' x_t$ or
$\alpha' x_{t-1}$ and use that as an explanatory variable. This strategy
corresponds to the usual recommendation of transforming to near-orthogonal
and interpretable variables. In other cases, where the dependent
variable is I(1) but is co-integrated with a subset of $x_t$, say, elimination
may be sensible, but Wiener-based critical values should be used for
variables that cannot be written implicitly as an I(0) function (see
Chapter 7). These ideas are related to the earlier technique of
confluence analysis in Hendry and Morgan (1989).

9.6.2. Measurement Errors


Measurement error s ar e a secon d proble m wher e treatmen t recommen dations ca n differ i n the light o f data bein g integrated . Whe n \t ~ 1(1),
then Ax ( ~ 1(0) , an d i f the dat a ar e i n logarithms , then th e change s ar e
growth rates . I f observed growt h rate s ar e t o b e a t al l sensible, the n th e
error wit h which the y ar e measure d mus t no t b e 1(1 ) o r higher . Lettin g
x? denot e th e observe d series , on e possibl e mode l i s Ax t = Ax, + u f ,
where u, i s 1(0), s o that
If th e measuremen t erro r i n level s i s denote d vr t = xt \t, then w r i s
apparently 1(1) . Thi s consideratio n therefor e onl y rather weakl y bounds
the scal e o f measuremen t error . Indeed , i f the DG P i s of th e for m tha t
Ax, = e t, then u, and e t ar e essentially indistinguishable in models of x.

308 Conclusio

However, when $\alpha'\mathbf{x}_t$ is an I(0) co-integrating combination, then, on
pre-multiplying (20) by $\alpha'$,
$$\alpha'\mathbf{x}^*_t = \alpha'\mathbf{x}_t + \alpha'\sum_{j=1}^{t} \mathbf{u}_j.$$
Since $\Delta\alpha'\mathbf{x}_t$ is I(−1) and $\Delta\alpha'\mathbf{x}^*_t = \Delta\alpha'\mathbf{x}_t + \alpha'\mathbf{u}_t$,
$\alpha'\mathbf{x}^*_t$ will be I(0) only if $\alpha'\mathbf{u}_t$ is I(−1). Thus, I(0) measurement errors on
growth rates must co-integrate to I(−1) with co-integration matrix $\alpha$ if
the observed series are to co-integrate in the same way as the latent
variables. Nowak (1990) calls a failure to observe $\alpha'\mathbf{x}^*_t$ being I(0) when
$\alpha'\mathbf{x}_t$ is I(0) a problem of 'hidden co-integration'. However, many
co-integration relationships, such as consumption and income, are likely to
have connected measurement errors. Governmental statistical bureaux may even
correct the data on such series in a related way to avoid divergence, which
suggests an I(0) measurement error for, say, the ratio between them.
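
A minimal simulation sketch of 'hidden co-integration' follows. It is an illustration under stated assumptions rather than anything from Nowak (1990): two latent series x and y co-integrate with vector (−1, 1), each observed growth rate carries an I(0) error, and the errors are either unrelated or 'connected' so that their co-integrating combination is over-differenced. All names and parameter values are hypothetical.

```python
import random

def spread_variance(T, connected, seed=1):
    """Sample variance of the observed spread y*_t - x*_t."""
    rng = random.Random(seed)
    x = 0.0                  # latent I(1) series
    wx = wy = 0.0            # cumulated measurement errors in levels
    eta_prev = rng.gauss(0.0, 0.5)
    spreads = []
    for _ in range(T):
        x += rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)         # latent series co-integrate: y - x is I(0)
        u_x = rng.gauss(0.0, 0.5)           # I(0) error on the growth rate of x
        if connected:
            eta = rng.gauss(0.0, 0.5)
            u_y = u_x + (eta - eta_prev)    # u_y - u_x is a differenced I(0) term, i.e. I(-1)
            eta_prev = eta
        else:
            u_y = rng.gauss(0.0, 0.5)       # unrelated error: u_y - u_x is I(0)
        wx += u_x
        wy += u_y
        spreads.append((y + wy) - (x + wx))
    mean = sum(spreads) / T
    return sum((s - mean) ** 2 for s in spreads) / T

print(spread_variance(2000, connected=False))   # spread wanders like a random walk
print(spread_variance(2000, connected=True))    # spread stays bounded in variance
```

With unrelated errors the observed spread inherits a random-walk component and its sample variance grows with the sample size; with connected errors the cumulated error in the spread is itself stationary, so the observed series co-integrate in the same way as the latent ones.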
An alternative model of measurement error for logarithms is one with
a constant-percentage standard deviation, so that the size of the absolute
error grows with the variable. This leads to $\mathbf{x}^*_t = \mathbf{x}_t + \mathbf{v}_t$ where var[$\mathbf{v}_t$] is
constant. Such a measurement error would not impede co-integration
analyses, in that inconsistency would not result as in an I(0) setting, but
would have the usual impact in I(0) representations since $\alpha'\mathbf{v}_t$ could be
I(0). An important instance is when $\mathbf{v}_t$ is an expectations error, in which
case the distributions of the long-run parameter estimates are unaffected
but short-run parameter estimates may be biased (see Engle and
Granger 1987, and Hendry and Neale 1988).
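
The contrast between the two settings can be sketched numerically. The example below assumes a single latent random-walk regressor, a co-integrating slope of 2, and unit-variance white-noise measurement error in the level of the regressor; these values, and the function name, are illustrative assumptions only. It compares the OLS slope from the static levels regression with the slope from the regression in first differences.

```python
import random

def slopes(T=5000, beta=2.0, seed=2):
    """Return (levels_slope, differences_slope) using the mismeasured regressor."""
    rng = random.Random(seed)
    x = 0.0
    prev_y = prev_xs = 0.0
    sxy = sxx = dxy = dxx = 0.0
    for _ in range(T):
        x += rng.gauss(0.0, 1.0)              # latent I(1) regressor
        y = beta * x + rng.gauss(0.0, 1.0)    # co-integrating relation with I(0) error
        xs = x + rng.gauss(0.0, 1.0)          # observed regressor with I(0) error in levels
        sxy += xs * y
        sxx += xs * xs
        dy, dxs = y - prev_y, xs - prev_xs    # first differences (initial values set to zero)
        dxy += dxs * dy
        dxx += dxs * dxs
        prev_y, prev_xs = y, xs
    return sxy / sxx, dxy / dxx

levels_slope, diff_slope = slopes()
print(levels_slope, diff_slope)
```

In this design the levels estimate should lie close to the assumed slope, because the I(1) signal dominates the I(0) error, whereas the differenced regression shows the familiar attenuation towards zero.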
9.6.3. Incorrectly Omitted and Included Variables
When a relevant I(1) variable is omitted from a relationship, I(0)
co-integration is impossible and serious biases can result. In particular,
for an I(0) dependent variable, all the remaining I(1) regressors may
cease to be significant given the appropriate critical values, leading the
model to collapse to one in differences. Including an irrelevant I(1)
variable or vector will probably lower the efficiency of estimates of the
co-integrating vectors but should be detectable in large enough samples,
with the usual possibility of Type-I errors.
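
As a hedged illustration of the first point, the sketch below assumes that y co-integrates with two independent random walks jointly, and compares the first-order residual autocorrelation of the regression that wrongly omits the second regressor with that of the full regression. The data-generating values and helper names are hypothetical, and the autocorrelation is only an informal indicator, not one of the formal tests discussed in earlier chapters.

```python
import random

def lag1_autocorr(series):
    """First-order sample autocorrelation of a list of numbers."""
    mean = sum(series) / len(series)
    dev = [v - mean for v in series]
    return sum(a * b for a, b in zip(dev[1:], dev[:-1])) / sum(d * d for d in dev)

def residual_autocorrs(T=2000, seed=3):
    """Return residual autocorrelations without and with the second regressor."""
    rng = random.Random(seed)
    x1 = x2 = 0.0
    X1, X2, Y = [], [], []
    for _ in range(T):
        x1 += rng.gauss(0.0, 1.0)                  # first I(1) regressor
        x2 += rng.gauss(0.0, 1.0)                  # second, independent I(1) regressor
        X1.append(x1)
        X2.append(x2)
        Y.append(x1 + x2 + rng.gauss(0.0, 1.0))    # y co-integrates with (x1, x2) jointly
    s11 = sum(a * a for a in X1)
    s22 = sum(b * b for b in X2)
    s12 = sum(a * b for a, b in zip(X1, X2))
    s1y = sum(a * y for a, y in zip(X1, Y))
    s2y = sum(b * y for b, y in zip(X2, Y))
    # Short regression: y on x1 only (x2 wrongly omitted).
    b_short = s1y / s11
    res_short = [y - b_short * a for a, y in zip(X1, Y)]
    # Long regression: y on x1 and x2, solved from the 2x2 normal equations.
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    res_long = [y - b1 * a - b2 * b for a, b, y in zip(X1, X2, Y)]
    return lag1_autocorr(res_short), lag1_autocorr(res_long)

rho_short, rho_long = residual_autocorrs()
print(rho_short, rho_long)
```

The residual from the short regression retains an I(1) component, so its autocorrelation is close to one, while the residual from the full regression behaves like the stationary disturbance.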
If one incorrectly includes an I(0) variable in a co-integration vector
in a static regression, its coefficient will be biased when that variable is
correlated with omitted I(0) variables. The consequences in the maximum
likelihood procedure seem less serious as it is possible to test for a
unit vector (i.e. one of the form (0 ... 0 1 0 ... 0)) lying in the
co-integration space (see Section 8.5.2). However, conditioning on the estimated
coefficients of I(0) variables is inappropriate, and spuriously small
confidence intervals for the remaining I(0) effects will usually result.
Finally, excluding an I(0) variable from a model will not affect the
long-run parameter estimates in large samples, but will usually bias the
short-run parameters as in conventional econometric derivations.
9.6.4. Parameter Change in Integrated Processes
The most serious problem arising from possible parameter change in
econometrics is the predictive failure of models that fail to incorporate
the necessary effects. Unfortunately, it is difficult even to diagnose the
problem since it is easy to confuse an I(1) process with an I(0) process
subject to shifts (see e.g. Perron 1989, Rappoport and Reichlin 1989, and Hendry
and Neale 1991). Indeed, as noted in Section 9.4 above, structured
time-series models implement the latter and produce the former.
Whether it is more useful to view economic data as integrated (in the
sense of having a unit root in the autoregressive representation subject
to regular small shocks) or as subject to large and persistent regime
shifts (the abolition of fixed exchange rates following Bretton Woods, or
their reinstatement in the ERM; the formation of OPEC; the denationalization
of large sectors of an economy; new forms of monetary control
or their removal; financial and technological innovation; etc.) remains to
be seen. However, both types are bound to play important roles, and
although we have focused on the former in this book, understanding
economic behaviour will necessitate modelling both integrated data and
breaks appropriately. Ex ante, structural breaks can lead to bad
predictions, which I(1) data alone do not seem to cause. Ex post, testing
for parameter change in I(1) data must allow for a wide range of
possible choices for break points. Useful developments are occurring in
deriving appropriate tests based on Wiener distributions, and decision
taking in this area should improve rapidly (see Nyblom 1989, Chu and
White 1991, 1992, Andrews and Ploberger 1991, Hansen 1991, and Lin
and Terasvirta 1991).
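
The ease of confusing the two kinds of process can be seen in a toy example. The sketch below is an assumption-laden illustration, not a replication of any of the cited studies: it generates white noise around a mean that jumps once at mid-sample and reports the first-order sample autocorrelation, with and without the shift. The shift size, sample length, and function names are hypothetical, and the autocorrelation is only an informal stand-in for a unit-root test.

```python
import random

def lag1_autocorr(series):
    """First-order sample autocorrelation of a list of numbers."""
    mean = sum(series) / len(series)
    dev = [v - mean for v in series]
    return sum(a * b for a, b in zip(dev[1:], dev[:-1])) / sum(d * d for d in dev)

def stationary_with_shift(T=1000, shift=5.0, seed=4):
    """White noise around a mean that jumps by `shift` at mid-sample."""
    rng = random.Random(seed)
    return [(shift if t >= T // 2 else 0.0) + rng.gauss(0.0, 1.0) for t in range(T)]

rho_shift = lag1_autocorr(stationary_with_shift(shift=5.0))
rho_flat = lag1_autocorr(stationary_with_shift(shift=0.0))
print(rho_shift, rho_flat)
```

Ignoring the break, the shifted series shows strong persistence even though every observation is an independent draw around its regime mean, whereas the unshifted series shows essentially none.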
9.6.5. Conditional Models of Co-integrated Processes
Chapter 8 emphasized the maximum-likelihood approach to testing for
and estimating co-integrating vectors in the context of a VAR. This
imposed the minimum conditioning assumptions and allowed a clear
focus on the properties of co-integration estimation. However, many
papers have begun to develop approaches in the context of systems that
treat a subset of variables as weakly exogenous for all the parameters of
interest: see Johansen (1992a, 1992b), Johansen and Juselius (1990), and
Boswijk (1991), inter alia. Related work includes that on testing for
Granger causality in co-integrated systems (see Toda and Phillips 1991,
Mosconi and Giannini 1992, and Hunter 1992).
For a long time, econometricians have 'talked' co-integration without
realizing it: for example, Klein (1953) discusses various great ratios of
economics, namely consumption-income, capital-output, wage share in
total income, and so on, implicitly assuming a stationary, or I(0), world.
From our perspective, given that the components of these relations are
I(1), Klein's ratios are early examples of co-integration hypotheses. In a
log-linear multivariate analysis, these postulate particular forms for the
rows of the co-integration matrix, highlighting the potential confirmatory
role of the methods discussed in Chapter 8. Econometricians need no
longer simply assume long-run equilibrium relations since it is feasible to
test for their existence. Once that is established the analysis is reduced
from I(1) to I(0) space, allowing the application of well established
tools.
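
As a sketch of what 'particular forms for the rows of the co-integration matrix' can mean, suppose (purely for illustration) that the system is ordered as log consumption $c_t$, log income or output $y_t$, and log capital $k_t$. The ordering, the symbols, and the choice of two ratios are assumptions made here, not notation from the text. Stationarity of the consumption-income and capital-output ratios would then correspond to:

```latex
\alpha' \mathbf{x}_t =
\begin{pmatrix} 1 & -1 & 0 \\ 0 & -1 & 1 \end{pmatrix}
\begin{pmatrix} c_t \\ y_t \\ k_t \end{pmatrix}
=
\begin{pmatrix} c_t - y_t \\ k_t - y_t \end{pmatrix}
\sim \mathrm{I}(0)
```

Each great ratio thus fixes one row of $\alpha'$ completely, which is what makes such hypotheses testable within the framework of Chapter 8.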
Thus, the recent focus on conditional or open models takes us back to
the 1970s in an important sense with the links between economic theory
or long-run equilibrium reasoning and data modelling having been
placed on a sounder footing.
As we have shown in this book, there still remain many difficult
theoretical and empirical problems to be overcome. However, the
literature on co-integration, error correction and the econometric analysis
of non-stationary data has enabled us to gain many important insights
into modelling relationships among integrated variables. This has enhanced
rather than replaced existing methods of dynamic econometric
modelling of economic time series.

References
ABADIR, K. M. (1992), 'The Limiting Distribution of the Autocorrelation
Coefficient Under a Unit Root', Annals of Statistics, forthcoming.
AHN, S. K., and REINSEL, G. C. (1988), 'Nested Reduced-Rank Autoregressive
Models for Multiple Time Series', Journal of the American Statistical
Association, 83: 849-56.
ANDERSON, T. W. (1958), An Introduction to Multivariate Statistical Analysis,
John Wiley, New York.
--- (1976), 'Estimation of Linear Functional Relationships: Approximate
Distributions and Connections with Simultaneous Equations in Econometrics
(with discussion)', Journal of the Royal Statistical Society B, 38: 1-36.
ANDREWS, D. W. K., and PLOBERGER, W. (1991), 'Optimal Tests of Parameter
Constancy', mimeo., Yale University Press.
BANERJEE, A., and DOLADO, J. (1987), 'Do We Reject Rational Expectations
Models Too Often? Interpreting Evidence using Nagar Expansions',
Economics Letters, 24: 27-32.
--- --- (1988), 'Tests of the Life Cycle-Permanent Income Hypothesis in the
Presence of Random Walks: Asymptotic Theory and Small Sample
Interpretations', Oxford Economic Papers, 40: 610-33.
--- --- and GALBRAITH, J. W. (1990a), 'Orthogonality Tests with De-trended
Data: Interpreting Monte Carlo Results using Nagar Expansions', Economics
Letters, 32: 19-24.
--- --- HENDRY, D. F., and SMITH, G. W. (1986), 'Exploring Equilibrium
Relationships in Econometrics through Static Models: Some Monte Carlo
Evidence', Oxford Bulletin of Economics and Statistics, 48: 253-77.
--- GALBRAITH, J. W., and DOLADO, J. (1990b), 'Dynamic Specification with
the General Error-Correction Form', Oxford Bulletin of Economics and
Statistics, 52: 95-104.
--- and HENDRY, D. F. (eds.) (1992), Testing Integration and Cointegration,
special issue of the Oxford Bulletin of Economics and Statistics, 54, 225-55.
BARDSEN, G. (1989), 'The Estimation of Long-Run Coefficients from Error
Correction Models', Oxford Bulletin of Economics and Statistics, 51: 345-50.
BEWLEY, R. A. (1979), 'The Direct Estimation of the Equilibrium Response in a
Linear Model', Economics Letters, 3: 357-61.
BEWLEY, R. A., ORDEN, D., and FISHER, L. (1991), 'Box-Tiao and Johansen
Canonical Estimators of Cointegrating Vectors', University of New South
Wales, Economics Discussion Paper, 91/5.
BHARGAVA, A. (1986), 'On the Theory of Testing for Unit Roots in Observed
Time Series', Review of Economic Studies, 53: 369-84.
BILLINGSLEY, P. (1968), Convergence of Probability Measures, John Wiley, New
York.
BOSSAERTS, P. (1988), 'Common Non-Stationary Components of Asset Prices',
Journal of Economic Dynamics and Control, 12: 347-64.

BOSWIJK, H. P. (1991), 'Testing for Cointegration in Structural Models',
University of Amsterdam, Econometrics Discussion Paper AE7/91.
--- (1992), 'Efficient Inference on Cointegration Parameters in Structural Error
Correction Models', University of Amsterdam, Econometrics Discussion
Paper,
--- and FRANSES, P. H. (1992), 'Dynamic Specification and Cointegration',
Oxford Bulletin of Economics and Statistics, 54: 369-81.
BOX, G. E. P., and JENKINS, G. M. (1970), Time Series Analysis: Forecasting and
Control, Holden-Day, San Francisco.
--- and TIAO, G. C. (1977), 'A Canonical Analysis of Multiple Time Series',
Biometrika, 64: 355-65.
BRANDNER, P., and KUNST, R. (1990), 'Forecasting Vector Autoregressions: The
Influence of Cointegration', Memorandum 265, IAS, Vienna.
CAMPBELL, B., and DUFOUR, J.-M. (1991), 'Over-Rejections in Rational
Expectations Models: A Non-Parametric Approach to the Mankiw-Shapiro
Problem', Economics Letters, 35: 285-90.
CAMPBELL, J. Y., and PERRON, P. (1991), 'Pitfalls and Opportunities: What
Macroeconomists Should Know About Unit Roots', in Blanchard, O. J. and
Fischer, S. (eds), NBER Economics Annual 1991, MIT Press.
--- and SHILLER, R. J. (1991), 'Cointegration and Tests of Present Value
Models', Journal of Political Economy, 95: 1062-88.
CHAMBERS, M. J. (1991), 'A Note on Forecasting in Co-Integrated Systems',
Department of Economics, University of Essex.
CHAN, N. H., and WEI, C. Z. (1988), 'Limiting Distributions of Least-Squares
Estimates of Unstable Autoregressive Processes', Annals of Statistics, 16:
367-401.
CHOI, I. (1992), 'Durbin-Hausman Tests for Unit Roots', Oxford Bulletin of
Economics and Statistics, 54: 289-304.
CHONG, Y. Y., and HENDRY, D. F. (1986), 'Econometric Evaluation of Linear
Macroeconomic Models', Review of Economic Studies, 53: 671-90.
CHOW, G. C. (1960), 'Tests of Equality Between Sets of Coefficients in Two
Linear Regressions', Econometrica, 52: 211-22.
CHU, C.-S. J., and WHITE, H. (1991), 'Testing for Structural Change in some
Simple Time Series Models', Discussion Paper 91-6, University of California,
San Diego, Dept. of Economics.
--- --- (1992), 'A Direct Test for Changing Trend', Journal of Business and
Economic Statistics, 10: 289-99.
CLEMENTS, M. P., and HENDRY, D. F. (1991), 'On the Limitations of Mean
Square Error Forecast Comparisons', Discussion paper 138, Oxford Institute
of Economics and Statistics. Forthcoming, Journal of Forecasting.
--- --- (1992), 'Forecasting in Cointegrated Systems', Discussion paper 139,
Oxford Institute of Economics and Statistics.
DAVIDSON, J. E. H., HENDRY, D. F., SRBA, F., and YEO, S. (1978), 'Econometric
Modelling of the Aggregate Time-Series Relationship Between Consumers'
Expenditure and Income in the United Kingdom', Economic Journal, 88:
661-92.
DAVIDSON, R., and MACKINNON, J. G. (1992), Estimation and Inference in
Econometrics, Oxford University Press.
DEATON, A. S., and MUELLBAUER, J. N. J. (1980), Economics and Consumer
Behavior, Cambridge University Press.
DICKEY, D. A. (1976), 'Estimation and Hypothesis Testing for Nonstationary
Time Series', Ph.D. dissertation, Iowa State University.
--- and FULLER, W. A. (1979), 'Distribution of the Estimators for
Autoregressive Time Series with a Unit Root', Journal of the American Statistical
Association, 74: 427-31.
--- --- (1981), 'Likelihood Ratio Statistics for Autoregressive Time Series
with a Unit Root', Econometrica, 49: 1057-72.
--- and PANTULA, S. G. (1987), 'Determining the Order of Differencing in
Autoregressive Processes', Journal of Business and Economic Statistics, 15:
455-61.
--- and SAID, S. E. (1981), 'Testing ARIMA(p, 1, q) against ARMA
(p + 1, q)', Proceedings of the Business and Economic Statistics Section,
American Statistical Association, 28: 318-22.
--- BELL, W. R., and MILLER, R. B. (1986), 'Unit Roots in Time Series
Models: Tests and Implications', American Statistician, 40: 12-26.
--- HASZA, D. P., and FULLER, W. A. (1984), 'Testing for a Unit Root in
Seasonal Time Series', Journal of the American Statistical Association, 79:
355-67.
DURLAUF, S. N., and PHILLIPS, P. C. B. (1988), 'Trends versus Random Walks
in Time Series Analysis', Econometrica, 56: 1333-54.
ENGLE, R. F., and GRANGER, C. W. J. (1987), 'Co-integration and Error
Correction: Representation, Estimation and Testing', Econometrica, 55:
251-76.
--- and YOO, B. S. (1987), 'Forecasting and Testing in Co-integrated Systems',
Journal of Econometrics, 35: 143-59.
--- --- (1991), 'Cointegrated Economic Time Series: An Overview with New
Results', in R. F. Engle and C. W. J. Granger (eds.), Long-Run Economic
Relationships, Oxford University Press, 237-66.
--- GRANGER, C. W. J., and HALLMAN, J. (1988), 'Merging Short- and
Long-run Forecasts: An Application of Seasonal Co-integration to Monthly
Electricity Sales Forecasting', Journal of Econometrics, 40: 45-62.
--- --- HYLLEBERG, S., and LEE, H. S. (1993), 'Seasonal Co-Integration: The
Japanese Consumption Function', Journal of Econometrics, 55: 275-98.
--- HENDRY, D. F., and RICHARD, J.-F. (1983), 'Exogeneity', Econometrica,
51: 277-304.
ERICSSON, N. R. (1992), Cointegration, Exogeneity and Policy Analysis, Special
Issue, Journal of Policy Modeling, 14, 3 and 4.
--- CAMPOS, J., and TRAN, H.-A. (1990), 'PC-GIVE and David Hendry's
Econometric Methodology', Revista de Econometrica, X, 7-117.
--- and HENDRY, D. F. (1985), 'Conditional Econometric Modelling: An
Application to New House Prices in the United Kingdom', in Atkinson, A. C.
and Fienberg, S. E. (eds), A Celebration of Statistics, Springer-Verlag,
251-85.
--- HENDRY, D. F. and TRAN, H.-A. (1992), 'Cointegration, Seasonality,
Encompassing and the Demand for Money in the United Kingdom', Discussion
Paper, Board of Governors of the Federal Reserve System, Washington,
DC.
ERMINI, L., and GRANGER, C. W. J. (1991), 'Some Generalizations on the
Algebra of I(1) Processes', Working Paper, Department of Economics,
University of Hawaii at Manoa.
ERMINI, L., and HENDRY, D. F. (1991), 'Log Income vs. Linear Income: An
Application of the Encompassing Principle', Working Paper no. 91-11,
Department of Economics, University of Hawaii at Manoa.
EVANS, G. B. A., and SAVIN, N. E. (1981), 'Testing for Unit Roots: 1',
Econometrica, 49: 753-79.
--- --- (1984), 'Testing for Unit Roots: 2', Econometrica, 52: 1241-69.
FRIEDMAN, M., and SCHWARTZ, A. J. (1982), Monetary Trends in the United
States and the United Kingdom: Their Relation to Income, Prices, and Interest
Rates, 1867-1975, University of Chicago Press.
FULLER, W. A. (1976), Introduction to Statistical Time Series, John Wiley, New
York.
GALBRAITH, J. W., DOLADO, J., and BANERJEE, A. (1987), 'Rejections of
Orthogonality in Rational Expectations Models: Further Monte Carlo Results
for an Extended Set of Regressors', Economics Letters, 25: 243-7.
GANTMACHER, F. R. (1959), Applications of the Theory of Matrices,
Interscience, New York.
GEL'FAND, J. M. (1967), Lectures on Linear Algebra, Interscience, New York.
GEWEKE, J. (1986), 'The Super-Neutrality of Money in the United States: An
Interpretation of the Evidence', Econometrica, 54: 1-21.
GHYSELS, E. (1990), 'On the Economics and Econometrics of Seasonality',
paper presented to the Sixth World Congress of the Econometric Society.
GONZALO, J. (1990), 'Comparison of Five Alternative Methods of Estimating
Long-Run Equilibrium Relationships', Discussion Paper, University of
California at San Diego.
GRANGER, C. W. J. (1981), 'Some Properties of Time Series Data and their Use
in Econometric Model Specification', Journal of Econometrics, 16: 121-30.
--- (1983), 'Forecasting White Noise', in A. Zellner (ed.), Applied Time Series
Analysis of Economic Data, Bureau of the Census, Washington, DC, 308-14.
--- (1986), 'Developments in the Study of Co-integrated Economic Variables',
Oxford Bulletin of Economics and Statistics, 48: 213-28.
--- and HALLMAN, J. (1991), 'The Algebra of I(1) Processes', Journal of Time
Series Analysis, 12: 207-24.
--- and LEE, T.-H. (1990), 'Multicointegration', in G. F. Rhodes Jr. and T. B.
Fomby (eds.), Advances in Econometrics, JAI Press, Greenwich Conn.,
71-84.
--- and NEWBOLD, P. (1974), 'Spurious Regressions in Econometrics', Journal
of Econometrics, 2: 111-20.
--- --- (1977), 'The Time Series Approach to Econometric Model Building',
in C. A. Sims (ed.), New Methods in Business Cycle Research, Federal
Reserve Bank of Minneapolis.
--- --- (1978), Forecasting Economic Time Series, Academic Press, New
York.
--- and WEISS, A. A. (1983), 'Time-Series Analysis of Error-Correction
Models', in S. Karlin, T. Amemiya, and L. A. Goodman (eds.), Studies in
Econometrics, Time Series and Multivariate Statistics, Academic Press, New
York.

GREGOIR, S., and LAROQUE, G. (1991), 'Multivariate Integrated Time Series: A
General Error Correction Representation with Associated Estimation and Test
Procedures', Discussion paper 53/G305, INSEE, Paris.
GRIMMETT, G. R., and STIRZAKER, D. R. (1982), Probability and Random
Processes, Oxford University Press.
HALDRUP, N., and HYLLEBERG, S. (1991), 'Integration, Near-Integration and
Deterministic Trends', Discussion Paper no. 1991-15, Aarhus University,
Denmark.
HALL, A. (1989), 'Testing for a Unit Root in the Presence of Moving Average
Errors', Biometrika, 79: 49-56.
--- (1990), 'Testing for a Unit Root in Time Series using Instrumental
Variables Estimators with Pre-test Data-Based Model Selection', Discussion
Paper, North Carolina State University.
--- (1991), 'Model Selection and Unit Root Tests based on Instrumental
Variables Estimators', Discussion paper, North Carolina State University.
HALL, A. D., ANDERSON, H. M., and GRANGER, C. W. J. (1992), 'A
Cointegration Analysis of Treasury Bill Yields', Review of Economics and
Statistics, 74: 116-25.
HALL, P., and HEYDE, C. C. (1980), Martingale Limit Theory and Applications,
Academic Press, New York.
HALL, R. E. (1978), 'Stochastic Implications of the Life-Cycle Permanent
Income Hypothesis', Journal of Political Economy, 86: 971-87.
HAMMERSLEY, J. M., and HANDSCOMB, D. C. (1964), Monte Carlo Methods,
Methuen, London.
HANSEN, B. E. (1991), 'Tests for Parameter Instability in Regressions with I(1)
Processes', Discussion paper, University of Rochester.
--- (1992), 'Testing for Parameter Instability in Linear Models', Journal of
Policy Modeling, 14: 517-33.
HARVEY, A. C. (1989), Forecasting, Structural Time Series Models and the
Kalman Filter, Cambridge University Press.
HASZA, D. P., and FULLER, W. A. (1982), 'Testing for Nonstationary Parameter
Specifications in Seasonal Time-Series Models', Annals of Statistics, 10:
1209-16.
HENDRY, D. F. (1984), 'Monte Carlo Experimentation in Econometrics', ch. 16
in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, ii,
North-Holland, Amsterdam, 937-76.
--- (1989), PC-GIVE: An Interactive Econometric Modelling System, Institute
of Economics and Statistics, Oxford University, Oxford.
--- (1991a), 'Using PC-NAIVE in Teaching Econometrics', Oxford Bulletin of
Economics and Statistics, 53, 199-223.
--- (1991b), 'Economic Forecasting', Report to the Treasury and Civil Service
Committee, UK.
--- and ANDERSON, G. J. (1977), 'Testing Dynamic Specification in Small
Simultaneous Models: An Application to a Model of Building Society
Behavior in the United Kingdom', ch. 8c in M. D. Intriligator (ed.), Frontiers of
Quantitative Economics, iii(a), North-Holland, Amsterdam, 361-83.
--- and CLEMENTS, M. P. (1992), 'Towards a Theory of Economic Forecasting',
unpublished paper, Institute of Economics and Statistics, Oxford University.

HENDRY, D. F., and ERICSSON, N. R. (1991a), 'An Econometric Appraisal of
U.K. Money Demand in Monetary Trends in the United States and the United
Kingdom by Milton Friedman and Anna J. Schwartz', American Economic
Review, 81: 8-38.
--- and ERICSSON, N. R. (1991b), 'Modelling the Demand for Narrow Money
in the United Kingdom and the United States', European Economic Review,
35: 833-81.
--- and MIZON, G. E. (1978), 'Serial Correlation as a Convenient
Simplification, not a Nuisance: A Comment on a Study of the Demand for Money by
the Bank of England', Economic Journal, 88: 549-63.
--- --- (1992), 'Evaluating Dynamic Models by Encompassing the VAR', in
P. C. B. Phillips (ed.), Models, Methods, and Applications of Econometrics,
Basil Blackwell, Oxford.
--- and MORGAN, M. S. (1989), 'A Re-analysis of Confluence Analysis',
Oxford Economic Papers, 41: 35-52: reprinted in N. de Marchi and C. L.
Gilbert (eds.), History and Methodology of Econometrics, Clarendon Press,
Oxford, 1990.
--- MUELLBAUER, J. N. J., and MURPHY, A. (1990), 'The Econometrics of
DHSY', in J. D. Hey and D. Winch (eds.), A Century of Economics, Basil
Blackwell, Oxford, 298-334.
--- and NEALE, A. J. (1987), 'Monte Carlo Experimentation using
PC-NAIVE', in T. Fomby and G. Rhodes (eds.), Advances in Econometrics, vi,
JAI Press, Greenwich, Conn., 91-125.
--- --- (1988), 'Interpreting Long-Run Equilibrium Solutions in Conventional
Macro Models: A Comment', Economic Journal, 98: 808-17.
--- --- (1991), 'A Monte Carlo Study of the Effects of Structural Breaks on
Unit Root Tests', in P. Hackl and A. H. Westlund (eds.), Economic
Structural Change: Analysis and Forecasting, Springer-Verlag, Vienna, 95-119.
--- --- and ERICSSON, N. R. (1990), PC-NAIVE: An Interactive Program for
Monte Carlo Experimentation in Econometrics, Institute of Economics and
Statistics, Oxford University, Oxford.
--- PAGAN, A. R., and SARGAN, J. D. (1984), 'Dynamic Specification', ch. 18
in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, ii,
North-Holland, Amsterdam, 1023-100.
--- and RICHARD, J.-F. (1982), 'On the Formulation of Empirical Models in
Dynamic Econometrics', Journal of Econometrics, 20: 3-33.
--- and UNGERN-STERNBERG, T. VON (1981), 'Liquidity and Inflation Effects on
Consumers' Behaviour', ch. 9 in A. S. Deaton (ed.) Essays in the Theory and
Measurement of Consumers' Behaviour, Cambridge University Press, 237-60.
HUNTER, J. (1992), 'Tests of Cointegrating Exogeneity for PPP and Uncovered
Interest Rate Parity in the UK', Journal of Policy Modeling, 14: 453-64.
HYLLEBERG, S. (1991), Modelling Seasonality, Oxford University Press.
--- and MIZON, G. E. (1989a), 'Cointegration and Error Correction
Mechanisms', Economic Journal (Supplement), 99: 113-25.
--- --- (1989b), 'A Note on the Distribution of the Least Squares Estimator
of a Random Walk with Drift', Economics Letters, 29: 225-30.
--- ENGLE, R. F., GRANGER, C. W. J., and YOO, B. S. (1990), 'Seasonal
Integration and Co-Integration', Journal of Econometrics, 44: 215-28.

ILMAKUNNAS, P. (1990), 'Testing the Order of Differencing in Quarterly Data:
An Illustration of the Testing Sequence', Oxford Bulletin of Economics and
Statistics, 52: 79-88.
IMHOF, P. (1961), 'Computing the Distribution of Quadratic Forms in Normal
Variates', Biometrika, 48: 419-26.
JARQUE, C. M., and BERA, A. K. (1980), 'Efficient Tests for Normality,
Homoskedasticity and Serial Independence of Regression Residuals',
Economics Letters, 6: 255-9.
JAZWINSKI, A. H. (1970), Stochastic Processes and Filtering Theory, Academic
Press, New York.
JOHANSEN, S. (1988), 'Statistical Analysis of Cointegration Vectors', Journal of
Economic Dynamics and Control, 12: 231-54.
--- (1989), 'The Power of the Likelihood Ratio Test for Cointegration', mimeo,
Institute of Mathematical Statistics, University of Copenhagen.
--- (1991a), 'Estimation and Hypothesis Testing of Cointegration Vectors in
Gaussian Vector Autoregressive Models', Econometrica, 59: 1551-80.
--- (1991b), 'A Statistical Analysis of Cointegration for I(2) variables', Institute
of Mathematical Statistics, University of Copenhagen.
--- (1992a), 'Cointegration in Partial Systems and the Efficiency of Single
Equation Analysis', Journal of Econometrics, 52: 389-402.
--- (1992b), 'Testing Weak Exogeneity and the Order of Cointegration in UK
Money Demand', Journal of Policy Modeling, 14: 313-34.
--- and JUSELIUS, K. (1990), 'Maximum Likelihood Estimation and Inference
on Cointegration with Applications to the Demand for Money', Oxford
Bulletin of Economics and Statistics, 52: 169-210.
KELLY, C. M. (1985), 'A Cautionary Note on the Interpretation of Long-Run
Equilibrium Solutions in Conventional Macro Models', Economic Journal, 95:
1078-86.
KIVIET, J., and PHILLIPS, G. D. A. (1992), 'Exact Similar Tests for Unit Roots
and Cointegration', Oxford Bulletin of Economics and Statistics, 54: 349-67.
KLEIN, L. R. (1953), A Textbook of Econometrics, Row, Peterson and
Company, Evanston, Ill.
KOERTS, J., and ABRAHAMSE, A. P. J. (1969), On the Theory and Application of
the General Linear Model, Rotterdam University Press.
KREMERS, J. J. M., ERICSSON, N. R., and DOLADO, J. (1992), 'The Power of
Co-integration Tests', Oxford Bulletin of Economics and Statistics, 54: 325-48.
KWIATKOWSKI, D., PHILLIPS, P. C. B., and SCHMIDT, P. (1991), 'Testing the Null
Hypothesis of Stationarity against the Alternative of a Unit Root: How Sure
Are We that Economic Time Series Have a Unit Root', Cowles Foundation
Discussion Paper No. 979.
LEYBOURNE, S. J., and MCCABE, B. P. M. (1992), 'A Simple Test for
Cointegration', typescript, Nottingham University.
LIN, C.-F., and TERASVIRTA, T. (1991), 'Testing the Constancy of Regression
Parameters against Continuous Structural Change', Discussion paper,
University of California at San Diego.
MCCALLUM, B. T. (1984), 'On Low-Frequency Estimates of Long-Run
Relationships in Macroeconomics', Journal of Monetary Economics, 14: 3-14.
MACKINNON, J. G. (1991), 'Critical Values for Co-Integration Tests', in R. F.
Engle and C. W. J. Granger (eds.), Long-Run Economic Relationships,
Oxford University Press, 267-76.
MANKIW, N. G., and SHAPIRO, M. D. (1985), 'Trends, Random Walks and Tests
of the Permanent Income Hypothesis', Journal of Monetary Economics, 16:
165-74.
--- --- (1986), 'Do We Reject Too Often? Small Sample Properties of Tests
of Rational Expectations Models', Economics Letters, 20: 139-45.
MANN, H. B., and WALD, A. (1943), 'On Stochastic Limit and Order
Relationships', Annals of Mathematical Statistics, 14: 217-77.
MIZON, G. E. (1977), 'Model Selection Procedures', in M. J. Artis and A. R.
Nobay (eds.), Studies in Modern Economic Analysis, Basil Blackwell, Oxford.
--- and HENDRY, D. F. (1980), 'An Empirical Application and Monte Carlo
Analysis of Tests of Dynamic Specification', Review of Economic Studies, 47:
21-45.
MORGAN, M. S. (1990), The History of Econometric Ideas, Cambridge
University Press.
MOSCONI, R., and GIANNINI, C. (1992), 'Non-Causality in Cointegrated Systems:
Representation, Estimation and Testing', Oxford Bulletin of Economics and
Statistics, 54: 399-417.
NANKERVIS, J. C., and SAVIN, N. E. (1985), 'Testing the Autoregressive
Parameter with the t-statistic', Journal of Econometrics, 27: 143-61.
--- --- (1987), 'Finite Sample Distributions of t and F Statistics in an AR(1)
model with an Exogenous Variable', Econometric Theory, 3: 387-408.
NELSON, C. R., and KANG, H. (1981), 'Spurious Periodicity in Inappropriately
Detrended Time Series', Journal of Monetary Economics, 10: 139-62.
NEWEY, W. K., and WEST, K. D. (1987), 'A Simple Positive Semi-Definite
Heteroskedasticity and Autocorrelation-Consistent Covariance Matrix',
Econometrica, 55: 703-8.
NOWAK, E. (1990), 'Hidden Cointegration', Discussion paper, University of
California at San Diego.
NYBLOM, J. (1989), 'Testing for the Constancy of Parameters over Time',
Journal of the American Statistical Association, 84: 223-30.
OSBORN, D. R., CHIU, A. P. L., SMITH, J. P., and BIRCHENHALL, C. R. (1988),
'Seasonality and the Order of Integration for Consumption', Oxford Bulletin
of Economics and Statistics, 50: 361-78.
OSTERWALD-LENUM, M. (1992), 'A Note with Fractiles of the Asymptotic
Distribution of the Maximum Likelihood Cointegration Rank Test Statistics:
Four Cases', Oxford Bulletin of Economics and Statistics, 54: 461-72.
PANTULA, S. G. (1991), 'Testing for Unit Roots in Time Series Data',
Econometric Theory, 5: 265-71.
PARK, J. Y., and PHILLIPS, P. C. B. (1988), 'Statistical Inference in Regressions
with Integrated Processes: Part I', Econometric Theory, 4: 468-97.
PERRON, P. (1988), 'Trends and Random Walks in Macroeconomic Time Series:
Further Evidence from a New Approach', Journal of Economic Dynamics and
Control, 12: 297-332.
--- (1989), 'The Great Crash, the Oil Shock and the Unit Root Hypothesis',
Econometrica, 57: 1361-402.
PHILLIPS, P. C. B. (1986), 'Understanding Spurious Regressions in
Econometrics', Journal of Econometrics, 33: 311-40.
- (1987a), 'Time Series Regression with a Unit Root', Econometrica, 55:
277-301.
- (1987b), 'Towards a Unified Asymptotic Theory of Autoregression',
Biometrika, 74: 535-48.
- (1988a), 'Reflections on Econometric Methodology', Economic Record, 64:
344-59.
- (1988b), 'Multiple Regression with Integrated Time Series', Contemporary
Mathematics, 80: 79-105.
- (1991), 'Optimal Inference in Co-integrated Systems', Econometrica, 59:
282-306.
- and DURLAUF, S. N. (1986), 'Multiple Time Series Regression with
Integrated Processes', Review of Economic Studies, 53: 473-95.
- and HANSEN, B. E. (1990), 'Statistical Inference in Instrumental Variables
Regression with I(1) Processes', Review of Economic Studies, 57: 99-125.
- and LORETAN, M. (1991), 'Estimating Long-Run Economic Equilibria',
Review of Economic Studies, 58: 407-36.
- and OULIARIS, S. (1988), 'Testing for Co-integration using Principal
Components Methods', Journal of Economic Dynamics and Control, 12:
205-30.
- - (1990), 'Asymptotic Properties of Residual Based Tests for
Cointegration', Econometrica, 58: 165-93.
- and PARK, J. Y. (1988), 'Asymptotic Equivalence of Ordinary Least
Squares and Generalized Least Squares in Regressions with Integrated
Variables', Journal of the American Statistical Association, 83: 111-15.
- and PERRON, P. (1988), 'Testing for a Unit Root in Time Series
Regression', Biometrika, 75: 335-46.
PRIESTLEY, M. B. (1989), Nonlinear and Nonstationary Time Series Analysis,
Academic Press, New York.
QUANDT, R. E. (1978), 'Tests of Equilibrium vs. Disequilibrium Hypotheses',
International Economic Review, 19: 435-52.
- (1982), 'Econometric Disequilibrium Models', Econometric Reviews, 1:
1-63.
RAPPOPORT, P., and REICHLIN, L. (1989), 'Segmented Trends and Non-Stationary
Time Series', Economic Journal, 99: 168-77.
REIMERS, H. E. (1991), 'Comparisons of Tests for Multivariate Co-integration',
Discussion Paper no. 58, Christian-Albrechts University, Kiel.
RIPLEY, B. D. (1987), Stochastic Simulation, John Wiley, New York.
SAID, S. E., and DICKEY, D. A. (1984), 'Testing for Unit Roots in
Autoregressive-Moving Average Models of Unknown Order', Biometrika, 71: 599-607.
SAIKKONNEN, P. (1991), 'Asymptotically Efficient Estimation of Cointegrating
Regressions', Econometric Theory, 1: 1-21.
SAMPSON, M. (1991), 'The Effect of Parameter Uncertainty on Forecast
Variances and Confidence Intervals for Unit Root and Trend Stationary Time
Series Models', Journal of Applied Econometrics, 6: 67-76.
SARGAN, J. D. (1964), 'Wages and Prices in the United Kingdom: A Study in
Econometric Methodology', in P. E. Hart, G. Mills, and J. K. Whitaker
(eds.), Econometric Analysis for National Economic Planning, Butterworth,
London; reprinted in D. F. Hendry and K. F. Wallis (eds.), Econometrics and
Quantitative Economics, Basil Blackwell, Oxford, 1984.
SARGAN, J. D. (1980), 'Some Tests of Dynamic Specification for a Single
Equation', Econometrica, 48: 879-97.
- and BHARGAVA, A. (1983), 'Testing Residuals from Least Squares
Regression for Being Generated by the Gaussian Random Walk', Econometrica, 51:
153-74.
SCHMIDT, P., and PHILLIPS, P. C. B. (1992), 'LM test for a Unit Root in the
Presence of Deterministic Trends', Oxford Bulletin of Economics and
Statistics, 54: 257-87.
SCHWERT, G. W. (1989), 'Tests for Unit Roots: A Monte Carlo Investigation',
Journal of Business and Economic Statistics, 1: 147-59.
SHEPPARD, D. K. (1971), The Growth and Role of UK Financial Institutions
1890-1962, Methuen, London.
SIMS, C. A. (ed.) (1977), New Methods in Business Cycle Research, Federal
Reserve Bank of Minneapolis.
- STOCK, J. H., and WATSON, M. W. (1990), 'Inference in Linear Time Series
with Some Unit Roots', Econometrica, 58: 113-44.
SPANOS, A. (1986), Statistical Foundations of Econometric Modelling, Cambridge
University Press.
STOCK, J. H. (1987), 'Asymptotic Properties of Least-Squares Estimators of
Co-integrating Vectors', Econometrica, 55: 1035-56.
- and WATSON, M. W. (1988a), 'Variable Trends in Economic Time Series',
Journal of Economic Perspectives, 2: 147-74.
- - (1988b), 'Testing for Common Trends', Journal of the American
Statistical Association, 83: 1097-107.
- - (1991), 'A Simple MLE of Cointegrating Vectors in General Integrated
Systems', Typescript, Northwestern University.
- and WEST, K. D. (1988), 'Integrated Regressors and Tests of the Permanent
Income Hypothesis', Journal of Monetary Economics, 21: 85-96.
TODA, H., and PHILLIPS, P. C. B. (1991), 'Vector Autoregressions and
Causality', Cowles Foundation Discussion Paper, 997.
URBAIN, J.-P. (1992), 'On Weak Exogeneity in Error Correction Models',
Oxford Bulletin of Economics and Statistics, 54: 187-207.
WEST, K. D. (1988), 'Asymptotic Normality, when Regressors have a Unit
Root', Econometrica, 56: 1397-418.
WHITE, H. (1980), 'A Heteroskedasticity-Consistent Covariance Matrix
Estimator and a Direct Test for Heteroskedasticity', Econometrica, 48: 817-38.
- (1984), Asymptotic Theory for Econometricians, Academic Press, New
York.
WICKENS, M. R., and BREUSCH, T. S. (1988), 'Dynamic Specification, the Long
Run and the Estimation of Transformed Regression Models', Economic
Journal, 98 (Conference 1988): 189-205.
WOLD, H. (1954), A Study in the Analysis of Stationary Time Series, Almqvist
and Wiksell, Stockholm.
YULE, G. U. (1926), 'Why Do We Sometimes Get Nonsense Correlations
Between Time Series? A Study in Sampling and the Nature of Time Series',
Journal of the Royal Statistical Society, 89: 1-64.

Acknowledgements for Quoted Extracts
The authors are grateful to the following for permission to reproduce extracts:
Elsevier Science Publishers, for material from N. G. Mankiw and M. D. Shapiro
(1986), 'Do we reject too often: Small-sample properties of rational expectations
models', Economics Letters, 20: 142-3.
The Review of Economic Studies, for material from P. C. B. Phillips and B. E.
Hansen (1990), 'Statistical Inference in Instrumental Variables Regression with
I(1) Processes', Review of Economic Studies, 57: 116-17.
The Econometric Society for material from D. A. Dickey and W. A. Fuller
(1981), 'Likelihood Ratio Statistics for Autoregressive Time Series with a Unit
Root', Econometrica, 49: 1062-3.
David A. Dickey, Professor of Statistics, North Carolina State University.
John Wiley & Sons, Inc., for material from Wayne A. Fuller (1976),
Introduction to Statistical Time Series, 371-3.

Author Index
Abadir, K. M. 126, 128
Abrahamse, A. P. J. 104
Ahn, S. K. 305
Anderson, G. J. 5, 50, 140
Anderson, H. 272
Anderson, T. W. 70n., 265n., 285
Andrews, D. W. K. 310
Banerjee, A. 55, 95, 97, 163, 166, 177n.,
    187, 191, 192, 214, 215, 220, 222, 230,
    233, 306, 307
Bardsen, G. 47, 53, 56, 62, 235
Bewley, R. 47, 49, 53, 152, 305
Bhargava, A. 101, 104, 155, 176, 207, 209
Billingsley, P. 24, 89
Birchenhall, C. R. 122
Bossaerts, P. 298
Boswijk, H. P. 235, 305, 307, 310
Box, G. E. P. 10, 13, 121, 305
Brandner, P. 282
Breusch, T. S. 47, 55, 56, 59, 62, 63, 64
Campbell, B. 167n.
Campbell, J. Y. 306
Campos, J. 236
Chan, N. H. 91, 96n.
Chiu, A. P. L. 122
Choi, I. 306
Chong, Y. Y. 282
Chow, G. C. 194n.
Chu, C.-S. J. 310
Clements, M. P. 282, 283, 285
Davidson, J. E. H. 5, 50, 52, 140, 300
Davidson, R. 16, 28
Deaton, A. S. 53
Dickey, D. A. 8, 24, 82, 100, 103, 107,
    108, 112-23, 169
Dolado, J. J. 55, 97, 163, 166, 177n., 187,
    191, 192, 230
Dufour, J.-M. 167n.
Durlauf, S. N. 82, 92, 93, 182, 203, 238,
    254, 262n.
Engle, R. F. 6, 7, 17, 18, 19, 43, 67, 84n.,
    121, 122, 137n., 145, 146, 152, 157-9,
    163, 205n., 208, 209, 211, 215, 231, 242,
    256, 261, 278, 279, 282, 283, 287, 288,
    305, 309
Ericsson, N. R. 18, 28, 29, 41, 230, 232,
    236, 238, 269, 292, 301
Ermini, L. 32, 193-7
Evans, G. B. A. 104
Fisher, I. 65
Fisher, L. 305
Frances, P.-H. 235
Friedman, M. 29, 190, 194
Fuller, W. A. 8, 13, 14, 15, 24, 26, 100-3,
    106, 107, 112-23, 169
Galbraith, J. W. 55, 98, 166, 177n., 191
Gantmacher, F. R. 140
Gel'fand, J. M. 140
Ghysels, E. 121
Giannini, C. 310
Gonzalo, J. 240, 285, 286, 293, 294, 296-8
Granger, C. W. J. 6, 7, 32, 43, 69, 70, 81,
    83, 84n., 121, 137n., 138, 139, 145, 146,
    157-9, 196, 205n., 208, 209, 215, 231,
    256, 257, 260, 261, 272, 278, 285, 287,
    307, 309, 310
Gregoir, S. 304
Grimmet, G. R. 96
Haldrup, N. 96
Hall, A. 107, 119, 130, 133, 306
Hall, A. D. 272
Hall, P. 23, 24, 89n., 179n.
Hall, R. E. 164, 165, 177
Hallman, J. 32, 121
Hammersley, J. M. 28
Handscombe, D. C. 28
Hansen, B. E. 176, 194, 238-41, 246,
    248-51, 261, 294, 299, 310
Harvey, A. C. 303
Hasza, D. P. 122, 123
Hendry, D. F. 5, 17, 28, 29, 32, 41, 47, 48,
    49, 50, 53, 65, 95, 101, 140, 162, 163,
    193-5, 197, 221, 229, 231-3, 235, 236,
    238, 269, 278, 279, 282, 283, 285, 288,
    292, 300, 301, 306-309
Heyde, C. C. 23, 24, 89n., 179n.
Hunter, J. 310
Hylleberg, S. 96, 121-3, 152, 170
Ilmakunnas, P. 121
Imhof, P. 104, 207

Jenkins, G. M. 10, 13, 121
Johansen, S. 43, 96, 146, 151, 153, 211,
    256, 257, 260, 262, 265, 268, 271, 272,
    277, 287, 288, 290, 292, 294, 297, 298,
    307, 310
Juselius, K. 271, 272, 277, 290, 310
Kang, H. 191
Kelly, C. M. 47, 64, 65, 66
Kiviet, J. 104, 105, 169n., 232
Klein, L. R. 310
Koerts, J. 104
Kremers, J. M. J. 230-3, 306
Kunst, R. 282
Kwiatkowski, D. 304
Laroque, G. 304
Lee, H. S. 121
Lee, T.-H. 287, 307
Lin, C.-F. 310
Loretan, M. 163, 288, 291
Leybourne, S. J. 304
McCabe, B. P. M. 304
McCallum, B. T. 47, 64-6
MacKinnon, J. G. 16, 28, 211, 213, 214
Mankiw, N. G. 164, 165, 166, 177n., 191
Mann, H. B. 14
Mizon, G. E. 101, 152, 162, 170, 231, 235,
    278, 285, 288, 292, 300, 307
Morgan, M. S. 5, 308
Mosconi, R. 310
Muellbauer, J. N. J. 53
Murphy, A. 53
Nankervis, J. C. 104
Neale, A. J. 47, 65, 221, 309
Nelson, C. R. 191
Newbold, P. 69, 70, 81, 83, 138, 139, 191
Newey, W. K. 111
Nowak, E. 308
Nyblom, J. 310
Orden, D. 305
Osborn, D. R. 122, 123
Osterwald-Lenum, M. 268-76, 292
Ouliaris, S. 133, 134, 208, 210, 211
Pagan, A. R. 48
Pantula, S. G. 120, 121, 306
Park, J. Y. 176, 238
Perron, P. 107, 109, 111-19, 133, 248n.,
    304, 306
Phillips, G. D. A. 104, 105, 169n., 232
Phillips, P. C. B. 22, 24, 43, 71, 72, 81-3,
    86-8, 90-3, 95, 96, 101, 107, 109, 111,
    113, 114, 119, 129, 133, 134, 163, 175,
    176, 179n., 182, 203, 208, 210, 211, 222,
    230, 238-41, 242-51, 254, 261, 262n.,
    277, 288, 290, 291, 294, 304-6, 310
Ploberger, W. 310
Priestley, M. B. 40
Quandt, R. E. 3
Rappoport, P. 309
Reichlin, L. 309
Reimers, H. E. 286
Reinsel, G. C. 305
Richard, J.-F. 18, 162
Ripley, B. D. 28
Rothenberg, T. 220n.
Said, S. E. 82, 107, 108, 113
Saikkonnen, P. 305
Sampson, M. 282
Sargan, J. D. 5, 48, 50, 101, 140, 155, 176,
    207, 209, 229, 231, 238, 285
Savin, N. E. 104
Schmidt, P. 101, 304, 306
Schwartz, A. J. 29, 194
Schwert, G. W. 82, 114, 119, 130, 248n.
Shapiro, M. D. 164-6, 177n., 191
Sheppard, D. K. 139
Sims, C. A. 43, 125, 162, 168, 178, 186-9
Smith, G. W. 163
Spanos, A. 12, 16, 72, 162
Stirzaker, D. R. 96
Stock, J. H. 43, 119, 152, 158, 163, 172,
    177, 178, 185-90, 192, 211, 278, 291,
    294, 296-8
Terasvirta, T. 310
Tiao, G. C. 305
Toda, H. 310
Tran, H.-A. 236, 301
Ungern-Sternberg, T. von 288
Urbain, J.-P. 307
Wald, A. 14, 43
Watson, M. W. 119, 152, 178, 187-90,
    211, 278, 291, 294, 298
Wei, C. Z. 91, 96n.
West, K. D. 105, 111, 169, 171, 172, 177,
    178, 185-7, 189n., 192
White, H. 15, 16, 27, 86, 89, 90, 310
Wickens, M. R. 47, 55, 56, 59, 62, 63, 64
Wold, H. 257
Yeo, S. 5
Yoo, B. S. 121, 152, 208, 209, 278, 279,
    282, 283, 287, 305
Yule, G. U. 69, 70n., 71, 77, 138

Subject Index
absolute summability 158
adjustment:
  coefficient 155
  disequilibrium 51, 52, 55, 61
  speed of 268
approximation theorem 123
asymptotic:
  convergence 158
  independence 16, 17
  normality 105, 126, 134, 163, 177, 178,
    180, 185; and drift term 169-74
asymptotic standard error (ASE) 235
Augmented Dickey-Fuller test (ADF) 106,
    108, 109, 207-12, 232-4, 238, 239n.
  asymptotic distribution 127, 128
  comparison with non-parametrically adjusted DF 114-9
  use of IV in 119
autocorrelation 13, 71-2, 83, 129, 163, 191,
    206, 207, 212, 221n., 238-42, 244,
    286, 292
  function 12, 13
autocovariance function 12, 13
autoregressive:
  -distributed lag (ADL) model 47-55,
    60-4, 224, 239, 242
  error 83, 114, 191, 291
  process 12, 72, 251, 257-60; see also
    autoregressive moving-average
    (ARMA) process
  representation (VAR), see co-integrating:
    representations of co-integrated
    systems
autoregressive integrated moving-average
    (ARIMA) process 13, 38, 39, 221
autoregressive moving-average (ARMA)
    process 12, 13, 39, 84, 85, 88, 107,
    108
  examples of 32-8
Bardsen transformation, see transformation: Bardsen
Bartlett window 248
Bewley:
  representation 152, 153
  transformation, see transformation:
    Bewley
bias 67, 68, 191, 244, 246-8, 249, 250, 290,
    309
  in AR(1) parameter 100, 101
  correction term 241, 246
  in estimates of co-integrating vector
    162-3, 214-30, 238, 239, 246, 250, 252
  second-order 163, 176, 238, 240, 246,
    296, 297
  simultaneity 238, 241, 297, 298
borderline-stationary 39, 95, 166, 208, 225
  see also near-integrated process
bounds test 133, 134
Brownian motion 21, 89, 152, 153, 241,
    243, 246, 247, 255, 278, 296, 297
  see also Wiener process
  vector 200-3
Cayley-Hamilton theorem 140
central limit theorem 16, 73, 88, 89, 171,
    295
  functional (FCLT), see functional central
    limit theorem
  Liapunov 16, 27, 44
  Lindeberg-Feller 27
co-integrating:
  combination 279, 283, 288
  parameters 215, 220, 222, 224, 248
  rank 145, 146, 262
  regression 191, 220, 229, 230; asymptotic theory of 174-7
  representations of co-integrated systems
    (EC, MA, VAR) 146, 153-7, 257-61
  vector 137, 138, 145, 158, 159, 163, 205,
    214, 236, 248, 252-6, 262, 267, 268,
    276, 277, 285, 289, 290, 293; asymptotic
    distribution of estimators of
    293-8; biases in estimation of, see
    bias; generalized 179; invariance
    of 300-3
co-integration 6-8, 67, 136-61, 167, 189,
    255, 268, 300, 308
  definition 145
  in logarithms or levels 198, 199
  multi- 287, 307
  seasonal 121, 151
  space 256, 266-99, 273, 279
  system 257, 260, 261
  testing for 9, 134, 176, 205-52, 286;
    table of critical values 213; test
    power 230-5
common factor 13, 101, 231, 233, 235, 238,
    239, 285, 296
common trend 152, 153, 278
companion form 143, 181-3, 272
concentrated series 88, 89, 263, 264, 272

conditioning, improper 244, 245
constant, inclusion of 212-19
continuous mapping theorem 89, 90
convergence:
  in distribution 16
  of functionals of Wiener processes 91,
    183
  in probability 14, 15, 16, 86, 157, 176,
    185
  to random variable 86, 89
  rate of 14, 125, 158-9, 168
  weak 23, 89
Cramer's theorem 173, 177
cross-equation restrictions 155, 245
decomposition 179, 240, 260, 296
deterministic trend, see trend: non-stochastic
de-trending 70, 82, 83, 191
  spurious 92-3
diagonalization 265, 266, 273, 290
Dickey-Fuller:
  distribution/critical values 97, 98, 100-3,
    105, 106, 121, 129-32, 167, 169, 170,
    210-11, 268; tables 102-3
  test (DF) 101, 104-10, 112, 114-19,
    207-12, 231, 233, 235, 236, 238,
    239n., 267; asymptotic distribution
    of 124-7; tests on more than one
    parameter 113, 114, 116
differencing 11, 30, 99, 111, 119, 134, 139,
    147, 153, 158, 168, 192, 199, 300
  seasonal 121, 122
diffusion process 96
discontinuity 95, 96
Donsker's theorem 89
drift term 9, 72, 101, 106, 108, 111, 151
  see also trend: non-stochastic
dummy variable 134, 270-6, 288
Durbin-Hausman tests 306
Durbin-Watson test 73, 81, 93
  in co-integrating regression (CRDW
    test) 176, 207-8, 235-6
dynamic:
  estimator 223, 224-30, 237, 243, 244,
    247-51
  modelling/regression 5, 8, 46, 47, 50, 51,
    106, 163, 167-71, 177, 178, 192, 214,
    221n., 222-4, 225-6, 229, 239, 243,
    246, 247
  omitted dynamics 157, 220, 229
  specification 168, 240, 242-4
  system 278
Edgeworth expansion 239n.
eigen-:
  value 134, 140, 143, 144, 179, 265, 266,
    267, 268, 270, 277, 292, 298
  vector 265, 270, 292, 298
empirical data/results 29-32, 40-2, 52-3,
    159, 194-7, 235-8, 269-71, 292, 293
encompassing 193, 198, 238
endogeneity 176, 246
Engle-Granger:
  theorem 159-62
  two-step procedure 153, 157-61, 205n.,
    278, 285, 283
equilibrium:
  dis- 2
  multiplier, see long-run: multiplier
  relationship 2-9, 46, 47, 50, 54, 55,
    136-9, 192, 205
  state 2, 4
  static 48
ergodicity 16, 17, 88, 89
error-correction 5, 6, 47, 51, 55, 63, 64, 96,
    224n., 246
  mechanism 5-7, 51-4, 139, 140, 151,
    232, 234, 238, 268, 270-5, 278, 279,
    294, 300, 304
  model 47, 49-52, 55, 63, 158, 159, 239,
    243, 256, 257, 260, 261, 268, 274,
    277-9, 290; generalized 50, 52, 60, 61
  representation 138, 139, 153; definition
    of 145; derivation of 154-7
  term 50-3, 60, 61, 140, 151, 155, 157,
    262
exact test 105
exogeneity 17-18, 288
  strict 19, 67
  strong 18, 20, 222-3, 244, 252, 291
  super 18-20
  in unit root tests 107
  weak 18, 20, 65-8, 163, 168, 192, 204,
    223, 240, 243-5, 248, 251-2, 261, 268,
    288-91, 295; importance in co-integrated
    processes 252, 307
finite sample biases, see bias
Fisher effect 65
forecasting 278-85
  multi-step 18, 19
frequency:
  domain 88n.
  zero v. seasonal 122
Frisch-Waugh theorem 70n.
full-information maximum-likelihood
    (FIML) 238, 239, 241, 245, 250, 297,
    298
fully modified estimation 238-41, 243, 244,
    246-50
  estimator 243, 244, 247, 248, 249, 250
  method 239, 240
functional central limit theorem
    (FCLT) 22, 89, 124-7, 261, 295, 299

generalized co-integrating vector 179
general-to-specific modelling 168, 192
Granger causality 18, 291
Granger Representation Theorem 48,
    146-53, 300
homogeneity 47, 51, 52, 60, 61, 221, 222,
    231, 236
impact matrix 151, 260
inconsistent regression 164-8, 190, 191,
    229, 230
innovation sequence 12, 85-7, 183
instrumental variables (IV) 55, 59, 62, 63,
    119, 130-3
integrated process 1, 6, 7, 11, 12, 21, 39,
    69-71, 73, 136-8, 162-99
  asymptotic theory of 86-91
  near-, see near-integrated process
  properties of 84-6
  see also non-stationary process
integration:
  order of, see order of integration
  seasonal, see seasonal integration
intercept 72, 151, 210, 232, 234, 271, 272,
    273, 274
interim multiplier representation 153
invariance 20, 282, 283
  principle 22; see also functional central
    limit theorem
invertibility 13, 84, 108, 242
invertible system 148, 149, 258, 259, 266
Jacobian 62, 63
Johansen maximum-likelihood procedure 211, 262-9, 285, 286, 300
  power of 277, 278
Kronecker product 181
lag 9, 11, 47, 50, 52, 66, 106-8, 123, 225,
    248, 250, 251, 286, 303
  length 248, 286
  mean 287
  polynomial 229
  structure 208, 222, 229
  truncation parameter 110, 111, 113
latent root 13, 104, 142, 144, 158, 224
law of large numbers 86, 90
life-cycle hypothesis 164, 188
likelihood ratio tests 153, 277, 278, 294,
    295
limited-information maximum-likelihood
    (LIML) 264, 285
linear system 300
logarithms v. levels 29-32, 193-7

long-run:
  covariance matrix 240, 241, 245-7, 252,
    290
  multiplier 8, 47-9, 51, 54, 57, 59-64,
    188, 230, 235, 293, 295, 296; variance
    of estimates of 61-4
  relationship 2, 7, 8, 140, 220; see also
    co-integrating: vector
  response 153
  solution 50, 64-8
marginal:
  distribution 18, 19, 290, 295
  process 240, 243-5, 248n.
marginalization 304
market clearing 3
martingale difference sequence (MDS) 11,
    12, 21, 163, 179n., 185, 242, 244, 245,
    247
maximal-eigenvalue statistic 267, 273
maximum-likelihood 159, 241-5, 256, 262,
    264, 265, 266, 267, 269, 277, 283, 285,
    286, 288
  full-information, see full-information
    maximum-likelihood
  limited-information, see limited-information maximum-likelihood
mean lag 144, 287, 301
memory 85
mixing:
  coefficient 87
  strong 16, 17, 87
  uniform 16, 17
mixingale 179n.
Monte Carlo:
  method 9, 27, 28
  response surfaces 28, 211, 213, 214
  results 73-83, 101, 106, 108, 114,
    117-19, 133, 165, 214, 215, 222-3,
    225-9, 232-5, 248-51, 279, 282, 283,
    285, 291, 298
  standard error 75
moving-average 12, 88; see also autoregressive moving-average (ARMA)
    process
  component of errors 107
  negative components 113, 119, 250, 304
  parameter 248n.
  representation 133, 153, 155, 156
  seasonal filter 121
multiple roots 119-22
multiplier, long-run, see long-run: multiplier
near-integrated process 95-7, 99, 164, 166,
    225, 231, 277
nearly-inconsistent regression 229, 230
non-centrality parameter 97, 98

non-parametric:
  correction/test 9, 108-10, 114-9, 130,
    208, 210, 211, 238-40, 251; asymptotic
    theory of 129-30
  estimation 244, 248, 249
nonsense regression 69, 80, 138
  see also spurious regression
non-stationarity 4, 8, 9, 65, 67, 72, 81-4,
    134, 150, 215
  transformation to stationarity 69, 70, 82,
    83, 99, 134, 147
non-stationary process 5, 6, 9, 38, 39, 70,
    71, 81, 163, 244
  v. integrated process 12
normality 180, 289
  asymptotic, see asymptotic: normality
normalization 57-9, 265, 285
nuisance parameters 100, 104-6, 172, 176,
    207, 210
order:
  of magnitude 14, 15, 21, 90
  in probability 14, 15
order of integration 6-9, 48, 79-80, 84, 85,
    147, 151, 190-2, 258
  defined 84
  first 137, 177
  higher 138, 157, 163
  zero 137
Ornstein-Uhlenbeck process 96
orthogonal complement 147
orthogonality 86, 149, 151, 242, 244, 245,
    258n., 259, 260, 273
  asymptotic 107
  testing 164-8
over-identification test 278, 300
over-rejection 206, 210, 286
parameterization 48, 207, 208, 250, 274,
    275
  of dynamics 221
  exact 105, 224
  of nearly-integrated processes 95
  over-/under- 224-9, 262
permanent income hypothesis 164, 177,
    178, 188, 190
Perron-Phillips/Phillips test, see non-parametric: correction/test
polynomial matrices 140-5, 152, 257
  isomorphism with companion matrices 142-4
power series expansion 97
power of tests 8, 15, 96, 101, 108, 113, 198,
    208, 214, 223-4, 230-5, 277, 278, 286
pre-determinedness 19
random walk 11, 21, 22, 24-9, 38, 71, 72,
    82, 87, 93, 100, 101, 114, 191, 220, 272
  in logarithms or levels 193n.
  see also unit root
rank:
  co-integrating, see co-integrating: rank
  full 56, 58, 59, 144, 147, 151, 181, 258,
    260, 287
  reduced 144, 147, 151, 256, 257, 264,
    285, 287, 288, 301
recursive estimation 194n., 221n.
re-parameterization 67, 157, 168, 189, 191,
    222
  see also transformation
representation theorem, see Granger Representation Theorem
Said-Dickey test 107, 108
  compared with Perron-Phillips test 113
Sargan-Bhargava test, see Durbin-Watson
    test (CRDW test)
Schwarz Criterion 194, 286
seasonal adjustment filter 301, 303
seasonal integration 121-3
sequential cut 18, 19
similar tests 100, 104, 105, 169n.
size distortions 113, 133, 166, 167
Slutsky's theorem 89, 173
spurious:
  correlation 70, 71; in de-trended random
    walks 82, 83
  regression 69-81, 83, 92-5, 134, 138-9,
    158, 159, 162, 191, 230, 255
stacked form, see companion form
static regression 162, 163, 167, 205, 214,
    220-3, 231, 238, 246, 251, 296
  comparison with dynamic 167, 168,
    224-30
  example of 236
  see also Engle-Granger: two-step procedure
stationarity 1, 4, 12, 13, 17, 69, 212, 262
stationary process 4, 5, 6, 7, 9, 11, 29, 38,
    39, 47, 85, 86, 134, 138, 256, 257, 267,
    279
  strictly 11, 12
  weakly/second-order/covariance 11, 12
stochastic:
  differential equation 96
  trend, see trend: stochastic
structural representation 261, 303
super-consistency 158, 176, 191, 214, 220,
    230, 251, 294, 296
total effect 142, 257
trace 267, 273
transformation 6, 28-32, 88, 111, 125,
    178-80, 185
  ADL 51, 59
  ADL to ECM 60, 61 300, 301

  Bardsen 51, 54-9, 62, 63
  Bewley 51, 53-6, 58n., 59, 60, 62, 63
  equivalence of 54-60, 62, 64
  linear 47, 51, 60, 61, 63, 64, 145, 152,
    178, 224; in dynamic regression 167-8,
    177, 178; of polynomial matrices 144,
    145
  logarithmic 99, 192-9
trend (inclusion of) 5, 9, 82, 100, 101, 106,
    125, 185, 211, 212, 213, 214, 236
  non-stochastic (deterministic) 6, 20, 21,
    69-72, 82, 84, 125, 146, 151, 172, 173,
    185, 187, 275
  stochastic 153, 169, 172, 174, 179, 180,
    185, 187, 191; see also common trend;
    unit root
  sums of powers of 20
unit circle 13, 104, 123, 141, 149, 158
unit root 8, 9, 13, 38, 72, 83-6, 95, 96,
    133, 144, 147, 163, 177, 185, 215, 236,
    255, 258-60, 267, 270, 287, 289
  multiple 122
  near- 95, 99; see also near-integrated
    process
  in polynomial matrix 141
  testing for 8, 96, 99-135, 206, 211, 215,
    306; descriptive value 306; in marginal
    processes 306; at seasonal frequency
    120-3
variance-covariance matrix 62, 107, 183,
    189, 243, 252-4, 273
  long-run 248, 249
vector autoregression (VAR) 278, 279,
    283, 291, 292
vectoring operator 181, 273
Wald statistic 127, 188, 239
Wiener process 21-3, 26, 86-91, 93, 96,
    131, 188, 189, 241, 261, 268
  distribution 191, 221
  functional of 24, 90, 93, 125-8, 163, 188,
    300
  multivariate 182-4, 200-3, 268
white noise 11, 12, 22, 87, 106, 231
Wold Decomposition Theorem 257, 258
