
Lecture 5: Stationarity and Unit Roots

Reading Asteriou P229-239 and Chapter 16


(or Enders Chapter 4)


1
Background
Up to now we have mostly been looking at
cross-section methods. Today we will begin to
move towards time-series econometrics, and
we will focus on this for the remainder of the
course.
Sometimes the methods we have looked at
are applicable to time series; however, in
many other cases they are not applicable (or
not directly at least).
2
Background
In a cross-section model such as:
$Y_i = \alpha + \beta_1 X_{1i} + \varepsilon_i$
Recall that $E(\varepsilon_i)$ was 0.
Thus if we have a sample then on average Y will be known if we know $X_1$. We can write this as:
$E(Y_i | X_{1i}) = \alpha + \beta_1 X_{1i}$
$E(Y_i | X_{1i})$ is pronounced "expectation of $Y_i$ given $X_{1i}$"
Implication: we don't need to know the errors to know what Y will be on average as long as we know $X_1$ [since on average errors are 0].
Another way to say this is, if we know X then we know the average value that Y will take.
The important point is that for a given value of X, the average for Y is constant [since there is no random term in $E(Y_i | X_{1i}) = \alpha + \beta_1 X_{1i}$].
3
This is not always the case for time-series and
this causes us some problems

(Aside: recall that the average for Y was used
to find our estimates for $\alpha$ and $\beta$ in OLS)

4
Many Economic Series Do not Conform to the
Assumptions of Classical Econometric Theory
[Figure: three time-series plots: Share Prices, Exchange Rate, and Income (1960-1975)]
5
Before considering this though we
will talk a little about time series.
6
Time series data
We view realisations of economic time series as being
generated by a stochastic process;
the particular realisation of a variable at one point in time is just one
possible outcome from an inherently random variable
So we can think of this as
$Y_t = f(\cdot) + \varepsilon_t$
where $f(\cdot)$ is some relationship to observed variables
When we are dealing with time series data, a common
starting point is to ask:
"Well, what can the past values of the series itself tell us
about the likely future path?"
For the next few lectures we will be focusing on this
question
7
Difference Equation
Since we will (for now) consider the variable
($Y_t$) to depend only on its own past
values, much of the maths underlying this
type of analysis is based on difference
equations.
E.g. $Y_t = \alpha + \phi Y_{t-1} + \varepsilon_t$
However, it is not necessary for us to go deep into
the maths in this course
8
Some observations regarding time-series
When we look at a time series, such as the exchange rate series shown here,
we often notice that:
The stochastic process generating economic data shows a
distinct tendency to sustain movements in one direction
The mean of the series is often not constant
[Figure: Exchange Rate series]
9
We distinguish between two types
of series:
Stationary and Non-stationary

[Very important to understand which you are
dealing with, as standard OLS inference is not valid when the series
is non-stationary!!!]
10
Some Notation:
Recall:
Mean of $X_t$ = $E(X_t)$ (sometimes we just say average!)

Covariance measures the degree of association between different
variables, for example between $X_t$ and $Y_t$; it can also measure the
association between the same variable at different points in time, for
example $X_t$ and $X_s$ for $t \neq s$.

Correlation (rho): $\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y.
Measures the strength of the linear relationship between X and Y.
11
Stationary Series
A stationary series has:
Constant mean
$E(Y_t) = \mu$
Constant variance
$Var(Y_t) = E(Y_t - \mu)^2 = \sigma^2$
The covariance between two time periods depends only on the
lag between the time periods and not on the point in time at which
the covariance is computed
$\gamma_k = Cov(Y_t, Y_{t-k})$ = constant for all t, for each lag $k \neq 0$.
E.g. if $Cov(Y_{2008}, Y_{2008-s}) = 0.5$ then $Cov(Y_{2004}, Y_{2004-s})$ should also be
0.5, since the gap is s years here too! This should be the same for
any pair of observations s years apart
Basically this is saying that there shouldn't be periods when the
past doesn't matter much and periods when it matters a lot: the
effect of the past should be the same in all periods


12
Example in Stata
13
Note, there are other types of
stationarity. However, we won't deal
with them in this course!
14
Non-Stationary Series
There may be no long-run mean to which the
series returns [e.g. there may be a trend in the
data]

The variance may be time dependent and go to
infinity as time approaches infinity

Theoretical autocorrelations may not decay over
time but, in finite samples, the sample
correlogram dies out slowly
[we'll talk about this later today]

15
Conditions for Stationarity in a simple model
Consider the model:
$Y_t = \phi Y_{t-1} + \varepsilon_t$
Stationarity requires that $|\phi| < 1$
so that the impact of a disturbance dies out over time!
To see why, consider the alternatives:
If $|\phi| > 1$ we have an explosive series (because shocks compound
over time rather than dying out)
To see this: remember we assume that the same process is at work in
each period, so $Y_{t-1} = \phi Y_{t-2} + \varepsilon_{t-1}$
$Y_t = \phi Y_{t-1} + \varepsilon_t$ becomes $Y_t = \phi[\phi Y_{t-2} + \varepsilon_{t-1}] + \varepsilon_t$
$Y_t = \phi^2 Y_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t$
Note that the shock from the previous period is having a bigger impact in this
period (it is multiplied by $\phi$); if we kept substituting we would see that the shock from n periods ago has an
effect of $\phi^n$ now, so the shock keeps having a bigger and bigger impact!
If $|\phi| = 1$ then the series has a unit root => a shock has the
same effect every period, i.e. it never dies out
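The substitution argument above can be checked numerically. The following is a minimal sketch (the function name `impulse_response` and the chosen coefficient values are illustrative assumptions, not part of the lecture): it feeds a single unit shock into $Y_t = \phi Y_{t-1} + \varepsilon_t$ and tracks what is left of it in later periods.

```python
def impulse_response(phi, periods):
    """Path of Y after a single unit shock at time 0, with no further shocks.

    Iterates Y_t = phi * Y_{t-1}, so entry n equals phi ** n.
    """
    y = 1.0
    path = [y]
    for _ in range(periods):
        y = phi * y
        path.append(y)
    return path


# Compare a stationary, a unit-root and an explosive coefficient.
for phi in (0.5, 1.0, 1.1):
    path = impulse_response(phi, 20)
    print(f"phi = {phi}: effect after 1, 10, 20 periods =",
          [round(path[n], 4) for n in (1, 10, 20)])
```

With a coefficient below 1 in absolute value the remaining effect shrinks towards zero, with a coefficient of exactly 1 it stays at 1 forever, and with a coefficient above 1 it grows each period.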
16
In economics, sustained explosive series
are less common (output is an example);
however, unit roots are common in many
series, so we will spend a little time
considering them now.

17
Thanks to Tom Pierse for pointing out the output example!
18
Using OLS.
Recall that the OLS estimator is:

$$\hat{\beta} = \frac{\sum_{j=1}^{N} (X_j - \bar{X})(Y_j - \bar{Y})}{\sum_{j=1}^{N} (X_j - \bar{X})^2} = \frac{Cov(X, Y)}{Var(X)}$$

Notice this includes the mean, the variance and the
covariance
This is why we need the three of these to be finite
constants.
Otherwise we would not get a stable unique
estimate for the coefficient using OLS!!
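To make the role of the mean, variance and covariance concrete, here is a small sketch of the OLS slope computed directly as the sum of cross-deviations divided by the sum of squared deviations of X. The function name `ols_slope` and the toy data are made up for illustration.

```python
def ols_slope(x, y):
    """OLS slope: sum of cross-deviations over sum of squared X deviations."""
    n = len(x)
    x_bar = sum(x) / n  # sample mean of X
    y_bar = sum(y) / n  # sample mean of Y
    cross = sum((xj - x_bar) * (yj - y_bar) for xj, yj in zip(x, y))
    squares = sum((xj - x_bar) ** 2 for xj in x)
    return cross / squares


# Hypothetical data lying close to the line Y = 2X.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print("estimated slope:", round(ols_slope(x, y), 3))
```

If the mean or variance of the series drifts over time, the sample quantities in this ratio no longer estimate fixed population values, which is the problem the slide points to.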
The Random Walk Model: An Example of
a series with a unit root.
Stock Price
The day-to-day change in a stock price should have a mean
value of 0
$y_t = \phi y_{t-1} + \varepsilon_t$
$y_t$ = the price of the stock on day t
$\varepsilon_t$ = a random disturbance term that has an expected value of
zero

Knowledge that a capital gain can be made by buying a share on
day t and selling it for an expected profit the next day will push up
the price on day t, leaving no expected gain.
Thus the price should be yesterday's price plus whatever random
news arrives today, so $\phi = 1$:
$y_t = y_{t-1} + \varepsilon_t$
Note that if we look at the change: $\Delta y_t = (y_t - y_{t-1}) = \varepsilon_t$ => unpredictable as it is
just the randomness


19
Suppose we were to plot the series
$Y_t$: what would it look like?


Well, each period we'd start at the value from
the last period, then we'd add a bit on (which may be
negative), but the change is random, so it may
look something like:
20
Non-stationary time series: Random Walk
[Figure: simulated random walk, $y_t = y_{t-1} + \varepsilon_t$, $\varepsilon_t \sim N(0, \sigma^2)$]

21
A key point is, if I'd told you the value at the
start was close to zero, you would still not
have been able to accurately predict the actual
outcome at the end, since there was just a
series of random movements...

... like watching a drunk guy (or girl) stumbling
along: you're never sure which way they'll walk next!
[Our best guess is our current information]

22
Why is a random walk nonstationary?
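$Y_t = Y_{t-1} + \varepsilon_t$
$E(\varepsilon_t) = 0$, $Var(\varepsilon_t) = \sigma^2$
Denote the initial value of Y by $y_0$

Mean:
$y_1 = y_0 + \varepsilon_1$
$y_2 = (y_0 + \varepsilon_1) + \varepsilon_2$
$y_3 = (y_0 + \varepsilon_1 + \varepsilon_2) + \varepsilon_3$
$y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i$
$E(Y_t) = y_0$ = constant
[Satisfies 1st requirement for stationarity]

Variance:
$Var(Y_t) = Var(y_0) + Var(\varepsilon_1) + \dots + Var(\varepsilon_t)$
$= 0 + \sigma^2 + \dots + \sigma^2$
$= t\sigma^2$
(variance is not constant through time)

Hence as $t \to \infty$ the variance of $Y_t$ approaches infinity

[Thus violates the 2nd requirement for stationarity => non-stationary]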
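The result that the mean stays at $y_0$ while the variance grows like $t\sigma^2$ can be illustrated by simulation. This is a rough sketch under assumed settings (2000 simulated walks, $\sigma = 1$, $y_0 = 0$, a fixed seed); the helper name `random_walk` is hypothetical.

```python
import random
import statistics


def random_walk(t_max, sigma=1.0, y0=0.0, rng=random):
    """Return [y_0, ..., y_t_max] for y_t = y_{t-1} + e_t, e_t ~ N(0, sigma^2)."""
    path = [y0]
    for _ in range(t_max):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path


rng = random.Random(12345)  # fixed seed so the sketch is reproducible
walks = [random_walk(100, rng=rng) for _ in range(2000)]

# Across many simulated walks, look at the distribution of y_t at fixed dates.
for t in (10, 50, 100):
    values_at_t = [w[t] for w in walks]
    mean_t = statistics.fmean(values_at_t)
    var_t = statistics.pvariance(values_at_t)
    print(f"t = {t:3d}: mean = {mean_t:6.3f}, variance = {var_t:7.2f} "
          f"(theory: mean 0, variance {t})")
```

The cross-sectional mean should stay near the starting value, while the cross-sectional variance should rise roughly in proportion to t.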




23
E.g. Exchange rates may have a unit root:
[Figure: Exchange Rate series]
24
Return to the Random Walk model
$y_t = y_{t-1} + \varepsilon_t$
$y_1 = y_0 + \varepsilon_1$
$y_2 = y_1 + \varepsilon_2 = (y_0 + \varepsilon_1) + \varepsilon_2$
$y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i$

Thus the general solution, if $y_0$ is a given initial condition, is
$y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i$
i.e. our best prediction of the value of the series t periods
from now is the value now (since $E(\varepsilon_i) = 0$)
25

A random walk is not the only type
of non-stationary series though
26

Non-stationary time series
UK GDP ($Y_t$)
The level of GDP (Y) is not constant and the mean increases over time. Hence the level of
GDP is an example of a non-stationary time series.
Here though the non-stationarity looks to be caused by a trend rather than a random walk!!
[Figure: GDP Level, quarterly, 1992 Q3 to 2003 Q1]
27
When we have a series with a unit
root or a trend, differencing it may
lead to a stationary series.
28


First difference of GDP is stationary
$\Delta Y_t = Y_t - Y_{t-1}$
since it fluctuates around about 0.6, and the variance is stable.

Stationary time series
[Figure: GDP Growth, quarterly, 1992 Q3 to 2003 Q1]
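As an illustration of first differencing, the sketch below builds an artificial trending series and differences it. The data are simulated, not the UK GDP figures from the slide; the drift of 0.6 and shock size of 0.2 are assumptions chosen only to mimic a series whose changes fluctuate around 0.6.

```python
import random


def first_difference(series):
    """Return [y_1 - y_0, y_2 - y_1, ...]; one element shorter than the input."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


rng = random.Random(1)  # fixed seed for reproducibility
drift = 0.6             # assumed average change per period
level = [50.0]          # assumed starting level
for _ in range(40):
    level.append(level[-1] + drift + rng.gauss(0.0, 0.2))

growth = first_difference(level)
mean_growth = sum(growth) / len(growth)
print("first and last level:", round(level[0], 2), round(level[-1], 2))
print("mean of first difference:", round(mean_growth, 3))
```

The level keeps climbing, so it has no constant mean, whereas the differenced series fluctuates around a fixed value.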

29
Testing for non-stationarity

- Graphically



30
Testing For non-stationarity
Constant covariance - use of correlogram
Covariance between two values of $Y_t$ depends only on the
distance apart in time for stationary series.

$Cov(Y_t, Y_{t-k}) = \gamma_k$

i.e. covariance is constant in t, and depends only on how far
apart the observations are.

(A) Covariance for 1980 and 1985 is the same as for 1990 and
1995. (i.e. t = 1980 and 1990, k = 5)
(B) Covariance for 1980 and 1987 is the same as for 1990 and
1997. (i.e. t = 1980 and 1990, k = 7)
But:
(C) Covariance for 1980 and 1985 may differ from that between
1990 and 1997, since gap is different (i.e. k differs between the
two periods)
31
Testing For non-stationarity
Constant covariance - use of correlogram
One simple test is based on the autocorrelation
function (ACF)
The ACF at lag k, denoted by $\rho_k$, is defined as:
$\rho_k = \gamma_k / \gamma_0$ = Covariance at lag k / Variance
Note: if k = 0 then $\rho_0 = 1$

As with any correlation coefficient, $\rho_k$ lies between -1
and 1
The quicker $\rho_k$ goes to zero as k increases, the quicker
shocks are dying out
A plot of $\rho_k$ against the lag k is known as the
correlogram (the sample correlogram when estimated from data)
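A sample correlogram can be computed by hand from the definition of the ACF as covariance at lag k divided by variance. The sketch below is one simple way to do it (the function name `sample_acf`, the seed and the sample size of 500 are assumptions for illustration); it compares a white noise series with the random walk built from the same shocks.

```python
import random


def sample_acf(series, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag, with rho_k = gamma_k / gamma_0."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    gamma_0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(max_lag + 1):
        # Sample autocovariance at lag k (divided by n, a common convention).
        gamma_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        acf.append(gamma_k / gamma_0)
    return acf


rng = random.Random(7)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(500)]

# Accumulate the same shocks into a random walk.
random_walk = []
level = 0.0
for shock in white_noise:
    level += shock
    random_walk.append(level)

print("white noise:", [round(r, 2) for r in sample_acf(white_noise, 5)])
print("random walk:", [round(r, 2) for r in sample_acf(random_walk, 5)])
```

The white noise autocorrelations beyond lag 0 should sit close to zero, while those of the random walk should start near 1 and fall only slowly.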

32

Note:
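When dealing with a single variable $Y_t$, it
turns out that the first-order autocorrelation equals
the coefficient ($\phi$)
E.g. if we have: $Y_t = \phi Y_{t-1} + \varepsilon_t$
Then: $\rho_1 = \phi$
Since: $Y_{t-1} = \phi Y_{t-2} + \varepsilon_{t-1}$
$Y_t = \phi(\phi Y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t$
$Y_t = \phi^2 Y_{t-2} + \phi \varepsilon_{t-1} + \varepsilon_t$
$\rho_2 = \phi^2$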
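The pattern generalises to any lag. A short derivation, assuming the series is stationary (coefficient $\phi$ with $|\phi| < 1$) and that the shock $\varepsilon_t$ is uncorrelated with all past values of Y, writing $\gamma_k$ for the covariance at lag k and $\rho_k$ for the autocorrelation at lag k:

```latex
\begin{aligned}
\gamma_k &= Cov(Y_t, Y_{t-k}) = Cov(\phi Y_{t-1} + \varepsilon_t,\; Y_{t-k}) \\
         &= \phi\, Cov(Y_{t-1}, Y_{t-k}) + Cov(\varepsilon_t, Y_{t-k}) \\
         &= \phi\, \gamma_{k-1} \qquad (k \ge 1) \\
\Rightarrow\; \gamma_k &= \phi^k \gamma_0, \qquad \rho_k = \gamma_k / \gamma_0 = \phi^k
\end{aligned}
```

So for a stationary series of this form the correlogram decays geometrically at rate $\phi$.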


33
Key point is:
For $Y_t = \phi Y_{t-1} + \varepsilon_t$

In a stationary series, shocks should die out
quickly if $|\phi|$ is low, but they should not die out
quickly in a non-stationary series (i.e. if
$|\phi| = 1$)!!!
[Hard to tell though when $|\phi|$ is close to 1]
34

Non-stationary time series
UK GDP ($Y_t$) - correlogram
[Figure: ACF of Y for lags 0 to 7]
35

White noise process
A white noise process is completely random.
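$y_t = \varepsilon_t$, $\varepsilon_t \sim N(0, \sigma^2)$
We can think of a white noise process as:
$y_t = 0 \cdot y_{t-1} + \varepsilon_t = \varepsilon_t$
This is stationary because:
the mean of $\varepsilon_t$ is 0,
the variance is constant ($\sigma^2$),
$Cov(\varepsilon_t, \varepsilon_{t-k}) = 0$ for $k \neq 0$ [since $\varepsilon_t$ is random!]
So the correlation (cov/var) will be 0 too
The sample ACF for a white noise process will have bars
randomly distributed around 0.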
36


WHITE NOISE PROCESS
$y_t = \varepsilon_t$, $\varepsilon_t \sim N(0, \sigma^2)$
[Figure: Stationary time series, White Noise]
[Figure: Correlogram of the white noise series]
37
Other processes
So far we have seen:
Random Walk process: $y_t = 1 \cdot y_{t-1} + \varepsilon_t$
White Noise process: $y_t = 0 \cdot y_{t-1} + \varepsilon_t = \varepsilon_t$
But often we have processes like:
$Y_t = \phi Y_{t-1} + \varepsilon_t$
Examples:
$y_t = 0.2 y_{t-1} + \varepsilon_t$
$y_t = 0.8 y_{t-1} + \varepsilon_t$
$y_t = -0.2 y_{t-1} + \varepsilon_t$

These processes are also stationary if $|\phi| < 1$, since
over time the impact of previous errors diminishes
and eventually dies out.
38
Other Stationary processes
If $0 < \phi < 1$, e.g. $y_t = 0.2 y_{t-1} + \varepsilon_t$, the correlations decrease to 0
(if $\phi$ is close to 0 the correlations fall quickly)

If $-1 < \phi < 0$, e.g. $y_t = -0.2 y_{t-1} + \varepsilon_t$, the correlations decrease to 0
but oscillate between positive and negative
correlations. (Again, if $\phi$ is close to 0 the correlations fall
quickly)

39
We will use ACF plots quite a lot so it's
important to get used to understanding
them (later in the course we'll look at
PACF plots too)!
40
Statistical Tests of Stationarity

But first we'll take a break!



41
Box-Pierce Test
The Box-Pierce statistic tests the joint hypothesis
that all $\rho_k$ are simultaneously equal to zero (i.e. the
series is a white noise process, and thus stationary)
The test statistic is approx. distributed as a $\chi^2$
distribution with m df:
$$B.P. = n \sum_{k=1}^{m} \hat{\rho}_k^2$$
n = sample size
m = lag length
If B.P. > $\chi^2_m(\alpha)$ then reject $H_0: \rho_k = 0$

42

Ljung-Box Q-Statistic
The Box-Pierce statistic performs poorly in small samples, so EViews
uses the Ljung-Box (Q) statistic instead:
$$Q = n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^2}{n-k} \sim \chi^2_m$$
n is the sample size, $\hat{\rho}_k$ is the sample autocorrelation at lag k,
and m is the number of lags being tested.
The null hypothesis is still that all $\rho_k$ are simultaneously equal to
zero [more of a test of whether the process is white noise when
used on the ACF]
A process may be stationary but not white noise!!
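Both portmanteau statistics are simple functions of the sample autocorrelations. Below is a minimal sketch that computes the Box-Pierce statistic (n times the sum of squared autocorrelations up to lag m) and the Ljung-Box Q statistic (the same sum with each term weighted by (n + 2)/(n - k)). The function names, the simulated data and the choice of m = 5 are assumptions for illustration; the 5% critical value of a chi-square with 5 degrees of freedom, 11.07, is hard-coded from standard tables.

```python
import random


def sample_autocorrelations(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    sum_sq = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / sum_sq
            for k in range(1, max_lag + 1)]


def box_pierce(series, m):
    """Box-Pierce statistic: n * sum of squared autocorrelations, lags 1..m."""
    n = len(series)
    rho = sample_autocorrelations(series, m)
    return n * sum(r * r for r in rho)


def ljung_box(series, m):
    """Ljung-Box Q: n(n+2) * sum over k of rho_k^2 / (n - k), lags 1..m."""
    n = len(series)
    rho = sample_autocorrelations(series, m)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(rho, start=1))


CHI2_5PCT_DF5 = 11.07  # 5% critical value, chi-square with 5 df (standard tables)

rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(300)]
walk = [0.0]
for e in noise:
    walk.append(walk[-1] + e)

for name, data in (("white noise", noise), ("random walk", walk)):
    bp, q = box_pierce(data, 5), ljung_box(data, 5)
    print(f"{name}: B.P. = {bp:.2f}, Q = {q:.2f}, reject H0 at 5%: {q > CHI2_5PCT_DF5}")
```

For the random walk both statistics should be far above the critical value; for white noise they would exceed it only about 5% of the time.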

43
A Second Test of Stationarity
The Dickey Fuller Test
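Consider the model:
$Y_t = \phi Y_{t-1} + u_t$
$u_t$ is a stochastic white noise error term
If $H_0: \phi = 1$ holds then $Y_t$ has a unit root and is nonstationary.
By subtracting $Y_{t-1}$ from both sides:
$Y_t - Y_{t-1} = \phi Y_{t-1} - Y_{t-1} + u_t$
$\Delta Y_t = (\phi - 1) Y_{t-1} + u_t$
This can be rewritten as $\Delta Y_t = \delta Y_{t-1} + u_t$
[where $\delta = (\phi - 1)$ and $\Delta Y_t = Y_t - Y_{t-1}$]
Now, to check for stationarity, test $H_0: \delta = 0$ (unit root) against $\delta < 0$ (stationary)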


44
A Second Test of Stationarity
The Dickey Fuller Test
We can test $H_0: \phi = 1$ [or $H_0: \delta = 0$] by a t statistic.
However, the t statistic does not follow a Student's t-distribution.
Under the null hypothesis of nonstationarity, Dickey and Fuller
(1979) have tabulated the critical values of the conventionally
calculated t-statistic. These critical values have been further
extended and improved by MacKinnon (1991)
If, by the critical values in MacKinnon (1991):
We cannot reject $H_0: \delta = 0$,
We cannot reject $H_0: \phi = 1$,
We cannot reject the unit root hypothesis,
We cannot reject nonstationarity
If we can reject $H_0: \delta = 0$,
We can reject $H_0: \phi = 1$,
We conclude the series is stationary.
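The mechanics of the basic Dickey-Fuller regression can be sketched without any econometrics package: regress the change in Y on the lagged level (no intercept, no trend) and form the usual t-ratio on the coefficient. The function name and simulated data are illustrative assumptions, and the critical value of about -1.95 is the approximate 5% Dickey-Fuller value for this no-intercept case; in practice you would use the tabulated MacKinnon values or your software's built-in test.

```python
import math
import random


def dickey_fuller_t(series):
    """t-ratio on delta in  dY_t = delta * Y_{t-1} + u_t  (no intercept, no trend)."""
    y_lag = series[:-1]
    dy = [curr - prev for prev, curr in zip(series, series[1:])]
    sxx = sum(x * x for x in y_lag)
    delta_hat = sum(x * d for x, d in zip(y_lag, dy)) / sxx
    residuals = [d - delta_hat * x for x, d in zip(y_lag, dy)]
    s2 = sum(r * r for r in residuals) / (len(dy) - 1)  # one estimated parameter
    return delta_hat / math.sqrt(s2 / sxx)


DF_5PCT_NO_CONSTANT = -1.95  # approximate 5% critical value (not from a Student's t table)

rng = random.Random(2024)
shocks = [rng.gauss(0.0, 1.0) for _ in range(500)]

stationary = [0.0]  # AR(1) with coefficient 0.5
walk = [0.0]        # random walk (coefficient 1)
for u in shocks:
    stationary.append(0.5 * stationary[-1] + u)
    walk.append(walk[-1] + u)

for name, data in (("AR(1), coefficient 0.5", stationary), ("random walk", walk)):
    t_stat = dickey_fuller_t(data)
    print(f"{name}: t = {t_stat:.2f}, reject unit root at 5%: {t_stat < DF_5PCT_NO_CONSTANT}")
```

The stationary series should produce a t-ratio far below the critical value, so the unit root null is rejected; for the random walk the t-ratio is usually not negative enough to reject.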
45
The Augmented Dickey Fuller Test
The error term in the DF test is unlikely to be white noise,
i.e. there is usually autocorrelation present!
To whiten the error terms, empirical studies often
include lags of the dependent variable as follows:

$$\Delta Y_t = \delta Y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta Y_{t-i} + u_t$$

This is known as the Augmented Dickey Fuller test
(ADF)
The appropriate lag length can be determined by AIC,
SBC or through an LM test
The $H_0: \delta = 0$ is that there is a unit root (as with the DF
test)
46
Tests
Note there are many statistical tests for stationarity:
DF and ADF tests
Discussed earlier.
$H_0$: There is a unit root (i.e. series is not stationary)
Phillips-Perron (PP) test:
Nonparametric test
$H_0$: There is a unit root (i.e. series is not stationary)
KPSS
$H_0$: The series is stationary
If the tests agree you can be reasonably confident,
but if not, you need to look into the assumptions etc.

47
Trend Stationary
- Stochastic trend
- Deterministic trend

48
Stochastic trend:
In this case the trend is due to
shocks not dying out so we see a
trend
49
Stochastic trend: e.g. random walk model
From earlier remember $E(y_t) = y_0$
If $y_t$ is the most recent observation/realisation of the
data generating process of y then $y_t$ is the unbiased
estimator of all future values $y_{t+s}$
$u_t$ is a random shock in the random walk model, but it has a
permanent effect on the conditional mean
$E(y_{t+s} | y_t) = y_t$, but $E(y_{t+s} | y_{t+1}) = y_{t+1}$
Because $u_t$, $u_{t+1}$ are random we have no reason to
believe that the series mean will revert to $y_t$ from $y_{t+1}$.
The random shock has permanent effects on the mean.
Such a sequence is said to have a stochastic trend
The series may seem to increase (decrease) due to a
sequence of positive (negative) shocks

50
Stochastic Trend
We have seen that a series with a unit root
will wander aimlessly around. However if we
have an intercept in the model:
$y_t = \alpha + y_{t-1} + \varepsilon_t$
then each period $Y_t$ is changing by $\alpha + \varepsilon_t$, so
we will observe a trend in the data. This
type of trend is known as a stochastic
trend
A random walk model with an intercept is
known as a random walk with drift
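Repeating the substitution used for the plain random walk shows where the trend comes from. A short worked derivation, assuming a fixed starting value $y_0$, intercept $\alpha$, and independent shocks $\varepsilon_t$ with mean 0 and variance $\sigma^2$:

```latex
\begin{aligned}
y_1 &= \alpha + y_0 + \varepsilon_1 \\
y_2 &= \alpha + y_1 + \varepsilon_2 = y_0 + 2\alpha + \varepsilon_1 + \varepsilon_2 \\
y_t &= y_0 + \alpha t + \sum_{i=1}^{t} \varepsilon_i \\
E(y_t) &= y_0 + \alpha t, \qquad Var(y_t) = t\sigma^2
\end{aligned}
```

So with drift neither the mean nor the variance is constant: the mean grows linearly in t and the accumulated shocks still never die out.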
51
Non-stationary time series: Random Walk with Drift
$y_t = \alpha + y_{t-1} + u_t$, $u_t \sim N(0, \sigma^2)$ ($\alpha > 0$)
Errors still persist, but $\alpha$ ensures there is a pattern: the larger $\alpha$ is, the clearer the
pattern, since the shock each period contributes less in relative terms
[Figure: simulated Random Walk with Drift]
52
Why we care about a stochastic trend:
If we have a stochastic trend the series is not
stationary => classical inference (OLS) is
not valid for $Y_t$

However, by taking the first difference we get
a series which is stationary, so we can use
OLS on the first difference, $\Delta y_t$.

We would call $Y_t$ difference stationary
53
Difference Stationary
54
Difference Stationary (=>need to difference to make it stationary)
The Random walk model
$y_t = y_{t-1} + \varepsilon_t$
Subtracting $y_{t-1}$ from both sides gives
$y_t - y_{t-1} = y_{t-1} - y_{t-1} + \varepsilon_t$
$\Delta y_t = \varepsilon_t$
Since $\varepsilon_t$ is white noise, $\Delta y_t$ is stationary

The $y_t$ series is said to be integrated of order one, I(1)

More generally a series may need to be differenced more
than once before it becomes stationary. A series which is
stationary after being differenced d times is said to be
integrated of order d, I(d).


55
Deterministic trend: In this case
there is a real tendency for the
series to increase over time (i.e. it's
not just due to errors not dying out)
56
Deterministic trend
Suppose the series increases by $\beta_1$ each
period (or decreases if $\beta_1 < 0$). We could
express this as:
$Y_t = \alpha + \beta_1 t + \beta_2 Y_{t-1} + \varepsilon_t$
We can remove the trend by subtracting $\beta_1 t$ from both sides:
$Y_t - \beta_1 t = \alpha + \beta_1 t - \beta_1 t + \beta_2 Y_{t-1} + \varepsilon_t$
$Y_t^* = \alpha + \beta_2 Y_{t-1} + \varepsilon_t$, where $Y_t^* = Y_t - \beta_1 t$
We call this detrending the series.
Classical inference is valid on the detrended series provided $|\beta_2| < 1$

57
In practice it is not always easy to tell
graphically whether a series has a stochastic or
a deterministic trend. It is particularly difficult
to distinguish between a deterministic trend
and a random walk with drift!
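The detrending step for a deterministic trend can be sketched as follows. This is a simplified illustration that assumes the series is just an intercept plus a linear time trend plus noise (no lagged term), so the trend slope can be estimated by regressing Y on t; the function name `detrend` and the data are hypothetical.

```python
def detrend(series):
    """Estimate the slope of a linear time trend by OLS and subtract slope * t.

    Returns (slope, detrended series), with t = 1, 2, ..., n.
    """
    n = len(series)
    t = list(range(1, n + 1))
    t_bar = sum(t) / n
    y_bar = sum(series) / n
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series))
             / sum((ti - t_bar) ** 2 for ti in t))
    return slope, [yi - slope * ti for ti, yi in zip(t, series)]


# Hypothetical series that rises by exactly 2 each period.
slope, detrended = detrend([3.0, 5.0, 7.0, 9.0])
print("estimated trend slope:", slope)
print("detrended series:", detrended)
```

Once the time trend is removed, what is left fluctuates around a constant level. If the series were actually a random walk with drift, removing a fitted line would not remove the accumulated shocks, which is why the distinction between the two cases matters.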
58
How to test whether the series is
stationary if we may have a
stochastic or deterministic trend??
59
Trend Stationary and Difference Stationary
Time Series
How to find out whether a trend is deterministic or
stochastic?
Use two additional forms of the ADF test:
$$\Delta Y_t = \alpha_0 + \delta Y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta Y_{t-i} + u_t$$
$\alpha_0$ is known as a drift term

$$\Delta Y_t = \alpha_0 + \delta Y_{t-1} + \alpha_2 t + \sum_{i=1}^{p} \beta_i \Delta Y_{t-i} + u_t$$
t is a time trend
60
Testing for stationarity (again)!
We would carry out all 3 forms of the DF (or
ADF) test:
No intercept and no trend
Intercept (drift) but no trend
Both intercept and trend

Then we use AIC or SBC to choose between
the models and trust the result of the test
for that model

61
Problems in Testing for Unit Roots
We need to determine the appropriate lag length in the ADF
equation
The ADF only tests for a single unit root, although we can apply
the ADF test procedure to differences of $y_t$ in order to test
for higher-order unit roots
The ADF has low power to distinguish between a unit root and a
near unit root process. In other words it has low power to reject a
false null hypothesis.
Incorrectly including or omitting a drift or trend term lowers the
power of the test to reject the null hypothesis of a unit root.
The tests fail to account for structural breaks in the time series. A
structural break may make a stationary series appear non-stationary.

62
Trend Stationary and Difference Stationary Time Series
When the true data generating process is
unknown it is sensible to plot and observe the
data and to start statistical tests with the most
general model
Also AIC and SBC criteria can be used to see
which specification best suits the data.
See A&H figure 16.5 on Page 298
63
Overview of remainder of course:
64
TIME-SERIES
  STATIONARY
    One variable
      Homoskedastic: ARMA
      Heteroskedastic: GARCH
    More than one variable: VAR
  NON-STATIONARY
    One variable: ARIMA
    More than one variable: Cointegration (VECM)
