
Multivariate Time Series

Consider n time series variables {y1t}, . . . , {ynt}. A multivariate time series is the (n × 1) vector time series {Yt} where the ith row of {Yt} is {yit}. That is, for any time t, Yt = (y1t, . . . , ynt)′.

Multivariate time series analysis is used when one wants to model and explain the interactions and co-movements among a group of time series variables:

• Consumption and income

• Stock prices and dividends

• Forward and spot exchange rates

• Interest rates, money growth, income, inflation


Stock and Watson state that macroeconometricians do four things with multivariate time series:

1. Describe and summarize macroeconomic data

2. Make macroeconomic forecasts

3. Quantify what we do or do not know about the true structure of the macroeconomy

4. Advise macroeconomic policymakers


Stationary and Ergodic Multivariate Time Series

A multivariate time series Yt is covariance stationary and ergodic if all of its component time series are stationary and ergodic.

E[Yt] = μ = (μ1, . . . , μn)′


var(Yt) = Γ0 = E[(Yt − μ)(Yt − μ)′]

       ⎛ var(y1t)        cov(y1t, y2t)   · · ·   cov(y1t, ynt) ⎞
     = ⎜ cov(y2t, y1t)   var(y2t)        · · ·   cov(y2t, ynt) ⎟
       ⎜ ...             ...             . . .   ...           ⎟
       ⎝ cov(ynt, y1t)   cov(ynt, y2t)   · · ·   var(ynt)      ⎠

The correlation matrix of Yt is the (n × n) matrix

corr(Yt) = R0 = D−1Γ0D−1

where D is an (n × n) diagonal matrix with jth diagonal element (γ^0_jj)^1/2 = var(yjt)^1/2.
The parameters μ, Γ0 and R0 are estimated from data (Y1, . . . , YT) using the sample moments

Ȳ = (1/T) Σ_{t=1}^T Yt →p E[Yt] = μ

Γ̂0 = (1/T) Σ_{t=1}^T (Yt − Ȳ)(Yt − Ȳ)′ →p var(Yt) = Γ0

R̂0 = D̂−1Γ̂0D̂−1 →p corr(Yt) = R0

where D̂ is the (n × n) diagonal matrix with the sample standard deviations of yjt along the diagonal. The Ergodic Theorem justifies convergence of the sample moments to their population counterparts.
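As an illustration, here is a minimal sketch of these sample moments in Python/NumPy; the data matrix Y (T rows, n columns) is simulated placeholder data, not a series from the notes.

    import numpy as np

    # Placeholder data: T observations on n = 3 series, one observation per row
    rng = np.random.default_rng(0)
    T, n = 500, 3
    Y = rng.standard_normal((T, n))

    Y_bar = Y.mean(axis=0)                        # sample mean vector
    U = Y - Y_bar                                 # demeaned data
    Gamma0_hat = U.T @ U / T                      # sample covariance matrix
    D_hat_inv = np.diag(1.0 / np.sqrt(np.diag(Gamma0_hat)))
    R0_hat = D_hat_inv @ Gamma0_hat @ D_hat_inv   # sample correlation matrix
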
Cross Covariance and Correlation Matrices

With a multivariate time series Yt each component has autocovariances and autocorrelations, but there are also cross lead-lag covariances and correlations between all possible pairs of components. The autocovariances and autocorrelations of yjt for j = 1, . . . , n are defined as

γ^k_jj = cov(yjt, yjt−k),

ρ^k_jj = corr(yjt, yjt−k) = γ^k_jj / γ^0_jj

and these are symmetric in k: γ^k_jj = γ^{−k}_jj and ρ^k_jj = ρ^{−k}_jj.
The cross lag covariances and cross lag correlations between yit and yjt are defined as

γ^k_ij = cov(yit, yjt−k),

ρ^k_ij = corr(yit, yjt−k) = γ^k_ij / (γ^0_ii γ^0_jj)^1/2

and they are not necessarily symmetric in k. In general,

γ^k_ij = cov(yit, yjt−k) ≠ cov(yit, yjt+k) = cov(yjt, yit−k) = γ^{−k}_ij
Defn:

• If γ^k_ij ≠ 0 for some k > 0 then yjt is said to lead yit.

• If γ^{−k}_ij ≠ 0 for some k > 0 then yit is said to lead yjt.

• It is possible that yit leads yjt and vice-versa. In this case, there is said to be feedback between the two series.
All of the lag k cross covariances and correlations are summarized in the (n × n) lag k cross covariance and lag k cross correlation matrices

Γk = E[(Yt − μ)(Yt−k − μ)′]

       ⎛ cov(y1t, y1t−k)   cov(y1t, y2t−k)   · · ·   cov(y1t, ynt−k) ⎞
     = ⎜ cov(y2t, y1t−k)   cov(y2t, y2t−k)   · · ·   cov(y2t, ynt−k) ⎟
       ⎜ ...               ...               . . .   ...             ⎟
       ⎝ cov(ynt, y1t−k)   cov(ynt, y2t−k)   · · ·   cov(ynt, ynt−k) ⎠

Rk = D−1Γk D−1

The matrices Γk and Rk are not symmetric in k, but it is easy to show that Γ−k = Γk′ and R−k = Rk′.

The matrices Γk and Rk are estimated from data (Y1, . . . , YT) using

Γ̂k = (1/T) Σ_{t=k+1}^T (Yt − Ȳ)(Yt−k − Ȳ)′

R̂k = D̂−1Γ̂k D̂−1
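A sketch of the lag k estimators in the same NumPy setting (the divisor T follows the formula above; some implementations divide by T − k instead):

    import numpy as np

    def cross_cov(Y, k):
        """Lag-k cross covariance matrix: (1/T) sum_{t=k+1}^T (Yt - Ybar)(Yt-k - Ybar)'."""
        T = Y.shape[0]
        U = Y - Y.mean(axis=0)
        return U[k:].T @ U[:T - k] / T

    def cross_corr(Y, k):
        """Lag-k cross correlation matrix: D^{-1} Gamma_k D^{-1}."""
        d_inv = np.diag(1.0 / np.sqrt(np.diag(cross_cov(Y, 0))))
        return d_inv @ cross_cov(Y, k) @ d_inv
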
Multivariate Wold Representation

Any (n × 1) covariance stationary multivariate time series Yt has a Wold or linear process representation of the form

Yt = μ + εt + Ψ1εt−1 + Ψ2εt−2 + · · ·
   = μ + Σ_{k=0}^∞ Ψk εt−k

Ψ0 = In
εt ~ WN(0, Σ)

Ψk is an (n × n) matrix with (i, j)th element ψ^k_ij. In lag operator notation, the Wold form is

Yt = μ + Ψ(L)εt

Ψ(L) = Σ_{k=0}^∞ Ψk L^k

where the elements of Ψ(L) are 1-summable. The moments of Yt are given by

E[Yt] = μ,   var(Yt) = Σ_{k=0}^∞ Ψk ΣΨk′
Long Run Variance

Let Yt be an (n × 1) stationary and ergodic multivariate time series with E[Yt] = μ. Anderson's central limit theorem for stationary and ergodic processes states

√T (Ȳ − μ) →d N(0, Σ_{j=−∞}^∞ Γj)

or, asymptotically,

Ȳ ~ N(μ, (1/T) Σ_{j=−∞}^∞ Γj)

Hence, the long-run variance of Yt is T times the asymptotic variance of Ȳ:

LRV(Yt) = T · avar(Ȳ) = Σ_{j=−∞}^∞ Γj
Since Γ−j = Γj′, LRV(Yt) may be alternatively expressed as

LRV(Yt) = Γ0 + Σ_{j=1}^∞ (Γj + Γj′)

Using the Wold representation of Yt it can be shown that

LRV(Yt) = Ψ(1)ΣΨ(1)′

where Ψ(1) = Σ_{k=0}^∞ Ψk.
Non-parametric Estimate of the Long-Run Variance

A consistent estimate of LRV(Yt) may be computed using non-parametric methods. A popular estimator is the Newey-West weighted autocovariance estimator

LRV_NW(Yt) = Γ̂0 + Σ_{j=1}^{M_T} w_{j,T} · (Γ̂j + Γ̂j′)

where the w_{j,T} are kernel weights that decline toward zero as j grows and M_T is a truncation lag parameter that satisfies M_T = O(T^{1/3}). Usually, the Bartlett weights are used:

w_{j,T}^Bartlett = 1 − j/(M_T + 1)
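A sketch of the Newey-West estimator with Bartlett weights; the default rule M_T = floor(T^{1/3}) is just one conventional choice consistent with M_T = O(T^{1/3}), not a rule prescribed in the notes.

    import numpy as np

    def newey_west_lrv(Y, M=None):
        """Newey-West estimate of the long-run variance using Bartlett weights."""
        T, n = Y.shape
        if M is None:
            M = int(np.floor(T ** (1.0 / 3.0)))      # truncation lag M_T
        U = Y - Y.mean(axis=0)
        gamma = lambda j: U[j:].T @ U[:T - j] / T    # lag-j autocovariance matrix
        lrv = gamma(0).copy()
        for j in range(1, M + 1):
            w = 1.0 - j / (M + 1.0)                  # Bartlett weight
            lrv += w * (gamma(j) + gamma(j).T)
        return lrv
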
Vector Autoregression Models

The vector autoregression (VAR) model is one of the most successful, flexible, and easy to use models for the analysis of multivariate time series.

• Made famous in Chris Sims's paper "Macroeconomics and Reality," Econometrica, 1980.

• It is a natural extension of the univariate autoregressive model to dynamic multivariate time series.

• Has proven to be especially useful for describing the dynamic behavior of economic and financial time series and for forecasting.

• It often provides superior forecasts to those from univariate time series models and elaborate theory-based simultaneous equations models.

• Used for structural inference and policy analysis. In structural analysis, certain assumptions about the causal structure of the data under investigation are imposed, and the resulting causal impacts of unexpected shocks or innovations to specified variables on the variables in the model are summarized. These causal impacts are usually summarized with impulse response functions and forecast error variance decompositions.
The Stationary Vector Autoregression Model

Let Yt = (y1t, y2t, . . . , ynt)′ denote an (n × 1) vector of time series variables. The basic p-lag vector autoregressive (VAR(p)) model has the form

Yt = c + Π1Yt−1 + Π2Yt−2 + · · · + ΠpYt−p + εt

εt ~ WN(0, Σ)
Example: Bivariate VAR(2)

⎛ y1t ⎞   ⎛ c1 ⎞   ⎛ π^1_11  π^1_12 ⎞⎛ y1t−1 ⎞   ⎛ π^2_11  π^2_12 ⎞⎛ y1t−2 ⎞   ⎛ ε1t ⎞
⎝ y2t ⎠ = ⎝ c2 ⎠ + ⎝ π^1_21  π^1_22 ⎠⎝ y2t−1 ⎠ + ⎝ π^2_21  π^2_22 ⎠⎝ y2t−2 ⎠ + ⎝ ε2t ⎠

or

y1t = c1 + π^1_11 y1t−1 + π^1_12 y2t−1 + π^2_11 y1t−2 + π^2_12 y2t−2 + ε1t
y2t = c2 + π^1_21 y1t−1 + π^1_22 y2t−1 + π^2_21 y1t−2 + π^2_22 y2t−2 + ε2t

where cov(ε1t, ε2s) = σ12 for t = s and 0 otherwise.
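A sketch that simulates a bivariate VAR(2) of this form; the parameter values below are made up for illustration (and chosen so the process is stable), not taken from the notes.

    import numpy as np

    rng = np.random.default_rng(1)
    c = np.array([0.1, 0.2])
    Pi1 = np.array([[0.5, 0.1],
                    [0.2, 0.3]])
    Pi2 = np.array([[0.1, 0.0],
                    [0.0, 0.1]])
    Sigma = np.array([[1.0, 0.3],
                      [0.3, 1.0]])              # cov(eps1t, eps2t) = 0.3
    chol = np.linalg.cholesky(Sigma)

    T = 500
    Y = np.zeros((T, 2))
    for t in range(2, T):
        eps = chol @ rng.standard_normal(2)     # draw eps_t with covariance Sigma
        Y[t] = c + Pi1 @ Y[t - 1] + Pi2 @ Y[t - 2] + eps
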
Remarks:

• Each equation has the same regressors — lagged values of y1t and y2t.

• Endogeneity is avoided by using lagged values of y1t and y2t.

• The VAR(p) model is just a seemingly unrelated regression (SUR) model with lagged variables and deterministic terms as common regressors.
In lag operator notation, the VAR(p) is written as

Π(L)Yt = c + εt
Π(L) = In − Π1L − · · · − ΠpL^p
The VAR(p) is stable if the roots of

det(In − Π1z − · · · − Πpz^p) = 0

lie outside the complex unit circle (have modulus greater than one), or, equivalently, if the eigenvalues of the companion matrix

    ⎛ Π1  Π2  · · ·  Πp−1  Πp ⎞
    ⎜ In  0   · · ·  0     0  ⎟
F = ⎜ 0   In  · · ·  0     0  ⎟
    ⎜ ...            . . .    ⎟
    ⎝ 0   0   · · ·  In    0  ⎠

have modulus less than one. A stable VAR(p) process is stationary and ergodic with time invariant means, variances, and autocovariances.
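A sketch of the stability check through the companion matrix F, reusing Pi1 and Pi2 from the simulated VAR(2) above:

    import numpy as np

    def companion(Pis):
        """Stack [Pi1, ..., Pip] into the (np x np) companion matrix F."""
        n, p = Pis[0].shape[0], len(Pis)
        F = np.zeros((n * p, n * p))
        F[:n, :] = np.hstack(Pis)                # first block row: Pi1, ..., Pip
        F[n:, :-n] = np.eye(n * (p - 1))         # identity blocks below
        return F

    F = companion([Pi1, Pi2])
    eigvals = np.linalg.eigvals(F)
    stable = np.all(np.abs(eigvals) < 1)         # stable if all moduli < 1
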
Example: Stability of bivariate VAR(1) model

Yt = ΠYt−1 + εt

⎛ y1t ⎞   ⎛ π11  π12 ⎞⎛ y1t−1 ⎞   ⎛ ε1t ⎞
⎝ y2t ⎠ = ⎝ π21  π22 ⎠⎝ y2t−1 ⎠ + ⎝ ε2t ⎠

Then det(In − Πz) = 0 becomes

(1 − π11z)(1 − π22z) − π12π21z^2 = 0

• The stability condition involves the cross terms π12 and π21.

• If π12 = π21 = 0 (diagonal VAR), then the bivariate stability condition reduces to the univariate stability conditions for each equation.
If Yt is covariance stationary, then the unconditional mean is given by

μ = (In − Π1 − · · · − Πp)−1c

The mean-adjusted form of the VAR(p) is then

Yt − μ = Π1(Yt−1 − μ) + Π2(Yt−2 − μ) + · · · + Πp(Yt−p − μ) + εt
The basic VAR(p) model may be too restrictive to adequately represent the main characteristics of the data. The general form of the VAR(p) model with deterministic terms and exogenous variables is given by

Yt = Π1Yt−1 + Π2Yt−2 + · · · + ΠpYt−p + ΦDt + GXt + εt

Dt = deterministic terms
Xt = exogenous variables (E[Xtεt′] = 0)
Wold Representation

Consider the stationary VAR(p) model

Π(L)Yt = c + εt
Π(L) = In − Π1L − · · · − ΠpL^p

Since Yt is stationary, Π(L)−1 exists so that

Yt = Π(L)−1c + Π(L)−1εt
   = μ + Σ_{k=0}^∞ Ψk εt−k

Ψ0 = In
Ψk → 0 as k → ∞

Note that

Π(L)−1 = Ψ(L) = Σ_{k=0}^∞ Ψk L^k
The Wold coefficients Ψk may be determined from the VAR coefficients Πk by solving

Π(L)Ψ(L) = In

which implies the recursions (with Ψ0 = In and Ψj = 0 for j < 0)

Ψ1 = Π1
Ψ2 = Π1Ψ1 + Π2
...
Ψs = Π1Ψs−1 + Π2Ψs−2 + · · · + ΠpΨs−p

Since Π(L)−1 = Ψ(L), the long-run variance for Yt has the form

LRV_VAR(Yt) = Ψ(1)ΣΨ(1)′
            = Π(1)−1Σ Π(1)−1′
            = (In − Π1 − · · · − Πp)−1 Σ (In − Π1 − · · · − Πp)−1′
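A sketch of the Ψk recursion and the implied long-run variance, reusing the Πk and Σ from the simulated VAR(2); the recursion takes Ψ0 = In and treats Ψj = 0 for j < 0.

    import numpy as np

    def wold_coeffs(Pis, h):
        """Psi_0, ..., Psi_h from Pi_1, ..., Pi_p via Psi_s = Pi_1 Psi_{s-1} + ... + Pi_p Psi_{s-p}."""
        n, p = Pis[0].shape[0], len(Pis)
        Psi = [np.eye(n)]                                  # Psi_0 = I_n
        for s in range(1, h + 1):
            Psi.append(sum(Pis[j] @ Psi[s - j - 1] for j in range(min(p, s))))
        return Psi

    def var_lrv(Pis, Sigma):
        """Long-run variance (I - Pi_1 - ... - Pi_p)^{-1} Sigma (I - Pi_1 - ... - Pi_p)^{-1}'."""
        n = Pis[0].shape[0]
        Pi1_inv = np.linalg.inv(np.eye(n) - sum(Pis))      # Pi(1)^{-1} = Psi(1)
        return Pi1_inv @ Sigma @ Pi1_inv.T

    lrv = var_lrv([Pi1, Pi2], Sigma)
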
Estimation

Assume that the VAR(p) model is covariance stationary, and there are no restrictions on the parameters of the model. In SUR notation, each equation in the VAR(p) may be written as

yi = Zπi + ei,  i = 1, . . . , n

• yi is a (T × 1) vector of observations on the ith equation

• Z is a (T × k) matrix with tth row given by Zt′ = (1, Yt−1′, . . . , Yt−p′)

• k = np + 1

• πi is a (k × 1) vector of parameters and ei is a (T × 1) error with covariance matrix σ²_i IT.
Since the VAR(p) is in the form of a SUR model where each equation has the same explanatory variables, each equation may be estimated separately by ordinary least squares without losing efficiency relative to generalized least squares. Let

Π̂ = [π̂1, . . . , π̂n]

denote the (k × n) matrix of least squares coefficients for the n equations, and let

vec(Π̂) = (π̂1′, . . . , π̂n′)′

stack these coefficients into a (kn × 1) vector. Under standard assumptions regarding the behavior of stationary and ergodic VAR models (see Hamilton (1994)), vec(Π̂) is consistent and asymptotically normally distributed with estimated asymptotic covariance matrix

avar(vec(Π̂)) = Σ̂ ⊗ (Z′Z)−1

where

Σ̂ = (1/(T − k)) Σ_{t=1}^T ε̂t ε̂t′
ε̂t = Yt − Π̂′Zt
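A sketch of equation-by-equation OLS in matrix form, following the Z-matrix construction above (Y is a (T × n) data array such as the simulated series from earlier):

    import numpy as np

    def var_ols(Y, p):
        """OLS estimates of a VAR(p) with a constant: Pi_hat (k x n), Sigma_hat, avar."""
        T_full, n = Y.shape
        T = T_full - p                                     # effective sample size
        # Z has rows Zt' = (1, Y_{t-1}', ..., Y_{t-p}')
        Z = np.hstack([np.ones((T, 1))] +
                      [Y[p - j - 1:T_full - j - 1] for j in range(p)])
        Yt = Y[p:]
        k = n * p + 1
        Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ Yt)        # (k x n) coefficients
        resid = Yt - Z @ Pi_hat                            # residuals, one row per t
        Sigma_hat = resid.T @ resid / (T - k)              # error covariance estimate
        avar = np.kron(Sigma_hat, np.linalg.inv(Z.T @ Z))  # Sigma_hat kron (Z'Z)^{-1}
        return Pi_hat, Sigma_hat, avar

    Pi_hat, Sigma_hat, avar = var_ols(Y, p=2)
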
Lag Length Selection

The lag length for the VAR(p) model may be determined using model selection criteria. The general approach is to fit VAR(p) models with orders p = 0, ..., pmax and choose the value of p which minimizes some model selection criterion

MSC(p) = ln|Σ̃(p)| + cT · ϕ(n, p)

Σ̃(p) = T^{−1} Σ_{t=1}^T ε̂t ε̂t′

cT = function of sample size
ϕ(n, p) = penalty function

The three most common information criteria are the Akaike (AIC), Schwarz-Bayesian (BIC) and Hannan-Quinn (HQ):

AIC(p) = ln|Σ̃(p)| + (2/T) pn²

BIC(p) = ln|Σ̃(p)| + (ln T/T) pn²

HQ(p) = ln|Σ̃(p)| + (2 ln ln T/T) pn²
Remarks:

• The AIC criterion asymptotically overestimates the order with positive probability.

• The BIC and HQ criteria estimate the order consistently under fairly general conditions if the true order p is less than or equal to pmax.
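A sketch of automatic lag selection with statsmodels, assuming the simulated data Y from earlier; select_order fits VAR(p) for p = 0, ..., maxlags and reports the order minimizing each criterion (attribute names reflect the statsmodels API as I understand it).

    from statsmodels.tsa.api import VAR

    model = VAR(Y)                           # Y: (T x n) array of observations
    order = model.select_order(maxlags=8)    # fits p = 0, ..., 8
    print(order.summary())                   # AIC, BIC, HQ (and FPE) by lag
    p_bic = order.selected_orders['bic']     # order chosen by BIC
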
Granger Causality

One of the main uses of VAR models is forecasting. The following intuitive notion of a variable's forecasting ability is due to Granger (1969).

• If a variable, or group of variables, y1 is found to be helpful for predicting another variable, or group of variables, y2, then y1 is said to Granger-cause y2; otherwise it is said to fail to Granger-cause y2.

• Formally, y1 fails to Granger-cause y2 if for all s > 0 the MSE of a forecast of y2,t+s based on (y2,t, y2,t−1, . . .) is the same as the MSE of a forecast of y2,t+s based on (y2,t, y2,t−1, . . .) and (y1,t, y1,t−1, . . .).

• The notion of Granger causality does not imply true causality. It only implies forecasting ability.
Example: Bivariate VAR Model

In a bivariate VAR(p), y2 fails to Granger-cause y1 if all of the p VAR coefficient matrices Π1, . . . , Πp are lower triangular:

⎛ y1t ⎞   ⎛ c1 ⎞   ⎛ π^1_11    0    ⎞⎛ y1t−1 ⎞           ⎛ π^p_11    0    ⎞⎛ y1t−p ⎞   ⎛ ε1t ⎞
⎝ y2t ⎠ = ⎝ c2 ⎠ + ⎝ π^1_21  π^1_22 ⎠⎝ y2t−1 ⎠ + · · · + ⎝ π^p_21  π^p_22 ⎠⎝ y2t−p ⎠ + ⎝ ε2t ⎠

Similarly, y1 fails to Granger-cause y2 if all of the coefficients on lagged values of y1 are zero in the equation for y2. Notice that if y2 fails to Granger-cause y1 and y1 fails to Granger-cause y2, then the VAR coefficient matrices Π1, . . . , Πp are diagonal.
The p linear coefficient restrictions implied by Granger non-causality may be tested using the Wald statistic

Wald = (R·vec(Π̂) − r)′ [R avar(vec(Π̂)) R′]−1 (R·vec(Π̂) − r)

Remark: In the bivariate model, testing H0: y2 does not Granger-cause y1 reduces to testing H0: π^1_12 = · · · = π^p_12 = 0 in the linear regression

y1t = c1 + π^1_11 y1t−1 + · · · + π^p_11 y1t−p + π^1_12 y2t−1 + · · · + π^p_12 y2t−p + ε1t

The test statistic is a simple F-statistic.
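A sketch of this test with statsmodels, assuming the simulated bivariate data from earlier; test_causality carries out the Wald/F test of the zero restrictions described above (details hedged on the current statsmodels API).

    import pandas as pd
    from statsmodels.tsa.api import VAR

    df = pd.DataFrame(Y, columns=['y1', 'y2'])
    results = VAR(df).fit(2)                   # fit a VAR(2) by OLS
    # H0: y2 does not Granger-cause y1
    gc = results.test_causality(caused='y1', causing='y2', kind='f')
    print(gc.summary())
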
Example: Trivariate VAR Model

In a trivariate VAR(p), y2 and y3 fail to Granger-cause y1 if π^j_12 = π^j_13 = 0 for all j:

⎛ y1t ⎞   ⎛ c1 ⎞   ⎛ π^1_11    0       0    ⎞⎛ y1t−1 ⎞           ⎛ π^p_11    0       0    ⎞⎛ y1t−p ⎞   ⎛ ε1t ⎞
⎜ y2t ⎟ = ⎜ c2 ⎟ + ⎜ π^1_21  π^1_22  π^1_23 ⎟⎜ y2t−1 ⎟ + · · · + ⎜ π^p_21  π^p_22  π^p_23 ⎟⎜ y2t−p ⎟ + ⎜ ε2t ⎟
⎝ y3t ⎠   ⎝ c3 ⎠   ⎝ π^1_31  π^1_32  π^1_33 ⎠⎝ y3t−1 ⎠           ⎝ π^p_31  π^p_32  π^p_33 ⎠⎝ y3t−p ⎠   ⎝ ε3t ⎠

Note: One can also use a simple F-statistic from a linear regression in this situation:

y1t = c1 + π^1_11 y1t−1 + · · · + π^p_11 y1t−p
         + π^1_12 y2t−1 + · · · + π^p_12 y2t−p
         + π^1_13 y3t−1 + · · · + π^p_13 y3t−p + ε1t
Example: Trivariate VAR Model

In a trivariate VAR(p), y3 fails to Granger-cause y1 and y2 if all of the p VAR coefficient matrices Π1, . . . , Πp are lower triangular:

⎛ y1t ⎞   ⎛ c1 ⎞   ⎛ π^1_11    0       0    ⎞⎛ y1t−1 ⎞           ⎛ π^p_11    0       0    ⎞⎛ y1t−p ⎞   ⎛ ε1t ⎞
⎜ y2t ⎟ = ⎜ c2 ⎟ + ⎜ π^1_21  π^1_22    0    ⎟⎜ y2t−1 ⎟ + · · · + ⎜ π^p_21  π^p_22    0    ⎟⎜ y2t−p ⎟ + ⎜ ε2t ⎟
⎝ y3t ⎠   ⎝ c3 ⎠   ⎝ π^1_31  π^1_32  π^1_33 ⎠⎝ y3t−1 ⎠           ⎝ π^p_31  π^p_32  π^p_33 ⎠⎝ y3t−p ⎠   ⎝ ε3t ⎠

Note: We cannot use a simple F-statistic in this case; we must use the general Wald statistic.
Forecasting Algorithms

Forecasting from a VAR(p) is a straightforward extension of forecasting from an AR(p). The multivariate Wold form is

Yt = μ + εt + Ψ1εt−1 + Ψ2εt−2 + · · ·
Yt+h = μ + εt+h + Ψ1εt+h−1 + · · · + Ψh−1εt+1 + Ψhεt + · · ·
εt ~ WN(0, Σ)

Note that

E[Yt] = μ

var(Yt) = E[(Yt − μ)(Yt − μ)′] = E[(Σ_{k=0}^∞ Ψk εt−k)(Σ_{k=0}^∞ Ψk εt−k)′] = Σ_{k=0}^∞ Ψk ΣΨk′
The minimum MSE linear forecast of Yt+h based on information available at time t is

Yt+h|t = μ + Ψhεt + Ψh+1εt−1 + · · ·

The forecast error is

εt+h|t = Yt+h − Yt+h|t = εt+h + Ψ1εt+h−1 + · · · + Ψh−1εt+1

The forecast error MSE is

MSE(εt+h|t) = E[εt+h|t εt+h|t′]
            = Σ + Ψ1ΣΨ1′ + · · · + Ψh−1ΣΨh−1′
            = Σ_{s=0}^{h−1} Ψs ΣΨs′
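A sketch of the h-step forecast error MSE matrices, reusing the wold_coeffs helper from the earlier snippet:

    import numpy as np

    def forecast_mse(Pis, Sigma, h):
        """MSE matrices sum_{s=0}^{j-1} Psi_s Sigma Psi_s' for horizons j = 1, ..., h."""
        Psi = wold_coeffs(Pis, h - 1)               # Psi_0, ..., Psi_{h-1}
        mse, acc = [], np.zeros_like(Sigma)
        for s in range(h):
            acc = acc + Psi[s] @ Sigma @ Psi[s].T
            mse.append(acc.copy())                  # MSE for horizon s + 1
        return mse
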
Chain-rule of Forecasting

The best linear predictor, in terms of minimum mean squared error (MSE), of YT+1, i.e. the 1-step forecast based on information available at time T, is

YT+1|T = c + Π1YT + · · · + ΠpYT−p+1

Forecasts for longer horizons h (h-step forecasts) may be obtained using the chain-rule of forecasting as

YT+h|T = c + Π1YT+h−1|T + · · · + ΠpYT+h−p|T

where YT+j|T = YT+j for j ≤ 0.
Note: The chain-rule may be derived from the state-space (companion form) representation (assume c = 0)

⎛ Yt     ⎞   ⎛ Π1  Π2  · · ·  Πp−1  Πp ⎞⎛ Yt−1 ⎞   ⎛ εt ⎞
⎜ Yt−1   ⎟   ⎜ In  0   · · ·  0     0  ⎟⎜ Yt−2 ⎟   ⎜ 0  ⎟
⎜ ...    ⎟ = ⎜ ...            . . .    ⎟⎜ ...  ⎟ + ⎜ ...⎟
⎝ Yt−p+1 ⎠   ⎝ 0   0   · · ·  In    0  ⎠⎝ Yt−p ⎠   ⎝ 0  ⎠

or

ξt = Fξt−1 + vt

Then

ξt+1|t = Fξt
ξt+2|t = Fξt+1|t = F²ξt
...
ξt+h|t = Fξt+h−1|t = F^h ξt
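A sketch of the chain-rule recursion in code, reusing c, Pi1, Pi2, and Y from the earlier snippets; observed values stand in for YT+j|T when j ≤ 0, exactly as stated above.

    import numpy as np

    def var_forecast(Y, c, Pis, h):
        """Chain-rule forecasts Y_{T+1|T}, ..., Y_{T+h|T} from a VAR(p)."""
        p = len(Pis)
        hist = list(Y[-p:])                 # Y_{T-p+1}, ..., Y_T (observed)
        fcasts = []
        for _ in range(h):
            y_next = c + sum(Pis[j] @ hist[-(j + 1)] for j in range(p))
            fcasts.append(y_next)
            hist.append(y_next)             # forecasts feed later horizons
        return np.array(fcasts)

    forecasts = var_forecast(Y, c, [Pi1, Pi2], h=4)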
