Preliminary
Abstract. This paper develops a method to identify structural shocks of a vector autoregression
(VAR) model based on a dynamic stochastic general equilibrium (DSGE) model and uses the VAR
model as a reference model to evaluate the DSGE model. In an application, I study the real
exchange rate dynamics in sticky-price models. I identify a VAR model based on a two-country
sticky-price model and compare the impulse response functions of the two models. As is well known,
a sticky-price model is misspecified in its real exchange rate dynamics. I find evidence that the
source of misspecification likely exists in the nominal exchange rate dynamics of the model.
1. Introduction
The literature estimating a dynamic stochastic general equilibrium (DSGE) model by a full-information, likelihood-based approach has been growing rapidly for the last decade. It started
for the purpose of monetary policy analysis, but many interesting questions of macroeconomics
other than monetary policy analysis are now answered based on estimation of a DSGE model. The
structure of a DSGE model is based on economic theory and gives us a chance to provide theoretical
explanations for empirical findings as well as to take the theory to the data. The structure, however,
renders it important to evaluate a DSGE model. This paper proposes a Bayesian econometric
framework to evaluate a DSGE model and investigate its possible misspecifications. I apply the
framework to a two country sticky price model and investigate the real exchange rate dynamics of
the model.
A DSGE model completely specifies the dynamic properties of the data and does not allow the
dynamics to deviate from what is implied by its structure. Though we fit a structural model to the
data, the model is not able to capture dynamics beyond what changes in parameter values allow.
Because of this, two problems arise. First, it is difficult to assess how well the model fits the data
if we do not compare it with alternative models. We may settle for the model even when there is
an alternative model with better data fit. Second, if we do not consider another model that has
different specifications or desirably nests the original model, it is hard to detect misspecifications
that the model might have. This may mislead researchers seriously.
Date: October 2010.
Fisher Hall, Princeton University, NJ 08544. Email: woongp@Princeton.EDU.
The two problems are well known. Studies involving estimation of a DSGE model often conduct
robustness exercises of one kind or another with alternative specifications for the model. See, for
example, Justiniano, Primiceri, and Tambalotti (2010). There are also specific comparative studies
of different models. See, for recent examples, Rabanal and Rubio-Ramirez (2005) and Taylor and
Wieland (2009). However, constructing a sufficiently large set of alternative models for a DSGE
model is a big obstacle. Unlike a statistical model for which it is straightforward to come up with a
nesting model, a DSGE model does not have a straightforward DSGE model that nests the model.
Additional theoretical work is often required to develop a nesting model. If the set of competing
models is not large enough, the competing models may suffer a common misspecification and the
model comparison exercise would not be able to detect the misspecification.
Del Negro and Schorfheide (2004) (DS, henceforth) propose a Bayesian framework to use a vector
autoregression (VAR) model as a benchmark model. Since a VAR model does not have any a priori
structural restrictions, it can serve as a convenient benchmark model. It is a common practice to
compare the fit and forecasting performance of a DSGE model with that of a VAR model. See, for
example, Smets and Wouters (2003) and Smets and Wouters (2007). DS use a VAR model as a
model that fits the data, but derive its prior distribution from a DSGE model of interest. Through
the prior distribution, the cross-equation restrictions and identification (see footnote 1) of the DSGE model are
imposed on the VAR model. As the prior distribution is systematically relaxed, the VAR model
captures richer dynamics than the DSGE model and we can evaluate the DSGE model by studying
the overall data fit and model characteristics of interest. The idea of inducing a prior for a VAR
model from a structural model dates back to Ingram and Whiteman (1994), and it is formalized
in the modern Bayesian framework by DS. DS and subsequent work (Del Negro and Schorfheide
2006 and Del Negro, Schorfheide, Smets, and Wouters 2007) show that combining a VAR model
with information from a DSGE model makes the VAR model have a better fit and forecasting
performance compared to the DSGE model.
It is central to model comparison and evaluation to compare impulse response functions to a
structural shock. Therefore, identification is important if we use a VAR model as a benchmark
model. However, the identification method of structural shocks of DS depends on the ordering of
the variables of a VAR model and hence the result changes as the ordering of the variables changes.
I propose an alternative identification method which is invariant to the ordering of the variables.
Identification of structural shocks of a VAR model is derived based on a DSGE model and therefore
impulse response functions to an identified structural shock are directly comparable to those of the
DSGE model.
1 The term identification is used for different meanings in the literature. In this paper, it means identifying
reduced-form shocks to obtain structural shocks in a VAR model. Another meaning is identifying parameters of a
model. The second identification is also an important issue in the DSGE model estimation. See, for example,
Canova and Sala (2009), Iskrev (2010) and Komunjer and Ng (2010). Identifying reduced-form shocks is equivalent
to identifying the contemporaneous coefficient matrix of a structural VAR (SVAR) model, and the second meaning
often includes the first meaning.
I apply the framework to evaluate a seven-variable two country sticky price model. The application not only
illustrates how the framework can be used for model evaluation, but also turns out to be interesting in its
own right. Chari, Kehoe, and McGrattan (2002) document that in a sticky
price model with staggered price contracts monetary shocks can generate the high volatility of the
real exchange rate but cannot match the high persistence found in the data. Efforts to resolve the
persistence issue have tried various features that control price dynamics or explored alternative
monetary policy rules. See, for example, Bergin and Feenstra (2001), Benigno (2004), Bouakez
(2005), Steinsson (2008), and Carvalho and Nechio (2010). I find evidence that the problem is
likely to exist in the nominal exchange rate dynamics, or the uncovered interest rate parity implied
in the model. Recently, Steinsson (2008) claims that a sticky price model can match the persistence
of the real exchange rate by generating hump-shaped dynamics in response to a shock such as a
technology shock that enters the Phillips curve equation of the model. I find that the real exchange
rate responds in a delayed, hump-shaped fashion in response to the shock, but the mechanism by
which the model generates such hump-shaped dynamics is not supported by the data. Also, the real
exchange rate is found to respond in the same delayed, hump-shaped fashion to a monetary policy
shock as opposed to the prediction of the sticky price model. This is consistent with the findings
of the identified VAR literature. See, for example, Eichenbaum and Evans (1995) and Scholl and
Uhlig (2008). The empirical study of Steinsson (2008) uses a univariate model for the real exchange
rate and does not identify any structural shocks. Therefore, it does not directly support the claim.
In the application, I identify a technology shock and a monetary policy shock based on the DSGE
model and estimate the impulse response to the shocks.
The paper is organized as follows. Section 2 introduces the structure of a generic problem and
describes the empirical methodology in detail. The two country sticky price model is described in
Section 3 and the empirical results are reported in Section 4. I investigate the real exchange rate
dynamics in detail in Section 5. All the proofs of propositions in Section 2 are in the Appendix.
2. Methodology
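Consider the following state space representation of a DSGE model
(1)  $y_t = \mu + \delta t + H(\theta) s_t$,  [Measurement equation]
(2)  $s_t = G(\theta) s_{t-1} + M(\theta) e_t$,  [Transition equation]
and the structural VAR (SVAR) model
(3)  $\Gamma(L)(y_t - \mu - \delta t) = \varepsilon_t$,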
where $\Gamma(L)$ is a $p$-th order matrix polynomial in the lag operator $L$, $\Gamma(L) = \Gamma_0 + \Gamma_1 L + \cdots + \Gamma_p L^p$;
$\varepsilon_t$ is an $n \times 1$ vector of the structural shocks of the SVAR model which follows i.i.d. $N(0, I_n)$; $\mu$ is
an $n \times 1$ vector of constant terms; and $\delta$ is an $n \times 1$ vector of coefficients on a deterministic time
trend. The deterministic time trend is included since the DSGE model may imply that some of the
observable variables have a deterministic time trend. When there is no deterministic time trend
in the DSGE model, we can drop $\delta t$. I assume that the initial observations $Y_0 = \{y_0, \ldots, y_{-p+1}\}$
are given, but omit $Y_0$ in conditional densities henceforth. In order for $y_t$ to have a non-degenerate
distribution, the contemporaneous coefficient matrix $\Gamma_0$ should be non-singular. The SVAR model
(3) can be transformed into a reduced form by multiplying through on the left by $\Gamma_0^{-1}$:
(4)  $B(L)(y_t - \mu - \delta t) = u_t$,
or
(5)  $B(L) y_t = \phi_0 + \phi_1 t + u_t$,
where $B(L) = I_n + \Gamma_0^{-1}(\Gamma_1 L + \cdots + \Gamma_p L^p)$, $u_t = \Gamma_0^{-1}\varepsilon_t$ is an $n \times 1$ vector of reduced-form
disturbance terms, and $u_t \sim$ i.i.d. $N(0, \Sigma_u)$ where $\Sigma_u = (\Gamma_0'\Gamma_0)^{-1}$. Let $B_s = -\Gamma_0^{-1}\Gamma_s$ for $s = 1, \ldots, p$.
Then $\phi_1 = B(1)\delta$ and $\phi_0 = B(1)\mu + (\sum_{s=1}^{p} sB_s)\delta$. It is convenient for derivations later to write
(5) compactly in matrix form
(6)  $Y = XB + u$,
where $Y = (y_1, \ldots, y_T)'$; $X = (x_1, \ldots, x_T)'$ with $x_t' = (1, t, y_{t-1}', \ldots, y_{t-p}')$; and $u = (u_1, \ldots, u_T)'$.
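The mapping from the structural form (3) to the reduced form (4) can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function name `structural_to_reduced` and the 2x2 example matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def structural_to_reduced(gammas):
    """Map SVAR matrices [Gamma_0, Gamma_1, ..., Gamma_p] to reduced-form objects.

    Returns the lag coefficients B_s = -Gamma_0^{-1} Gamma_s (s = 1..p) and the
    reduced-form covariance Sigma_u = (Gamma_0' Gamma_0)^{-1}.
    """
    gamma0 = gammas[0]
    gamma0_inv = np.linalg.inv(gamma0)  # requires a non-singular Gamma_0
    b_lags = [-gamma0_inv @ g for g in gammas[1:]]
    sigma_u = np.linalg.inv(gamma0.T @ gamma0)
    return b_lags, sigma_u

# Hypothetical two-variable SVAR with one lag (illustrative values only).
gamma0 = np.array([[2.0, 0.0], [0.5, 1.0]])
gamma1 = np.array([[-1.0, 0.0], [0.0, -0.4]])
b_lags, sigma_u = structural_to_reduced([gamma0, gamma1])
```

Note that `sigma_u` depends on `gamma0` only through the cross product `gamma0.T @ gamma0`, which is the source of the identification problem discussed later in this section.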
VAR model approximately nests the DSGE model. We can also recover et by observing yt if we
can identify et . Even if Assumption (3) does not hold, we may be able to recover the part of
et that we are interested in with good accuracy, as discussed in Sims and Zha (2006). In this
paper, I only consider the case where Assumption (3) holds and thus a DSGE model has a VAR
representation. Assumption (4) is necessary since the prior distribution for B in (6) is derived using
the unconditional population moments of yt in the DSGE model. It is stricter than what is actually
necessary to derive the prior distribution. What I need is the finite-sample population moments of
yt conditional on the initial observations, and thus it is fine to have non-stationary yt . However,
when Assumption (4) is satisfied, computation to derive the prior distribution of the VAR model
becomes much easier. Also, if yt is non-stationary, Assumption (3) is violated in general. Note that
Assumption (4) does not necessarily apply to the VAR model. The same assumptions are required
for the DS framework.
I use Bayesian inference to estimate the VAR model (6). The set of the parameters of the model
consists of the structural parameter $\theta$, the contemporaneous coefficient matrix $\Gamma_0$, and the reduced-form
coefficient matrix $B$. $\theta$ is included as a set of hyperparameters. It is a set of parameters for
the prior distribution and, conditional on $B$ and $\Gamma_0$, the density of the data $Y$ is not dependent on
$\theta$. The prior density and the posterior density of (6) are factorized hierarchically
$p(B, \Gamma_0, \theta) = p(B \mid \Gamma_0, \theta)\, p(\Gamma_0 \mid \theta)\, p(\theta)$,
and
$p(B, \Gamma_0, \theta \mid Y) = p(B \mid \Gamma_0, \theta, Y)\, p(\Gamma_0, \theta \mid Y)$,
respectively. In the following sections, I describe the prior distribution and derive the posterior
distribution from the prior distribution according to the hierarchical factorization. Then I discuss
how to simulate from the posterior distribution and do inference.
2.1. Prior distribution.
2.1.1. Prior distribution of $\theta$. The prior distribution of $\theta$ is not restricted to a specific form. The
prior distribution of $\Gamma_0$ and $B$ is conditional on $\theta$ and not dependent on a specific form of the prior
distribution of $\theta$. We can set a prior distribution of $\theta$ so that it is consistent with economic theories
and related empirical studies as in the literature estimating DSGE models.
2.1.2. Prior distribution of $\Gamma_0$. The prior distribution of $\Gamma_0$ is defined conditional on $\theta$. The
identification of the SVAR model is characterized by the contemporaneous coefficient matrix, $\Gamma_0$. The
contemporaneous impact of a shock to $\varepsilon_t$ on $y_t$ is computed as
$\left.\dfrac{\partial y_t}{\partial \varepsilon_t'}\right|_{SVAR} = \Gamma_0^{-1}$.
I assume that the prior distribution of $\Gamma_0$ is centered around the identifying restrictions of the
DSGE model. The initial impact of a shock to $e_t$ on $y_t$ conditional on $\theta$ in the DSGE model is
$\left.\dfrac{\partial y_t}{\partial e_t'}\right|_{DSGE} = H(\theta) M(\theta)$,
which will be denoted by $\bar\Gamma_0^{-1}(\theta)$. The prior distribution of $\Gamma_0$ is
(7)  $\Gamma_0 \mid \theta \sim N\left(\bar\Gamma_0(\theta),\; (\nu S_0^2)^{-1} \otimes I_n\right)$,
where the precision matrix $\nu S_0^2$ is an $n \times n$ positive definite symmetric matrix (see footnote 2). Note that with
this prior $\Gamma_0$ is non-singular with probability 1. The Kronecker product structure of the covariance
matrix in (7) is assumed for the tractability of the resulting posterior distribution of $\Gamma_0$. It is
somewhat restrictive, however. The covariance matrix $(\nu S_0^2)^{-1} \otimes I_n$ implies that
$\mathrm{Var}(\mathrm{vec}\,\Gamma_0) = (\nu S_0^2)^{-1} \otimes I_n$ or $\mathrm{Var}(\mathrm{vec}\,\Gamma_0') = I_n \otimes (\nu S_0^2)^{-1}$, where $\mathrm{vec}$ is a vectorization operator.
That is, the rows of $\Gamma_0$ are independent of each other, all the rows have the common covariance matrix
$(\nu S_0^2)^{-1}$, and all the elements in each column have the same variance. In the following application,
I simply let $S_0^2 = I_n$, which means that all the elements of $\Gamma_0$ have the same degree of dispersion
around their mean. The hyperparameter $\nu$ is to control the weight of the DSGE model identifying
restrictions. As $\nu$ becomes larger, the DSGE model identification is more tightly imposed.
The identification problem arises in a reduced-form VAR model since there are uncountably many
possible decompositions $\Gamma_0$ or $\Gamma_0^{-1}$ from a reduced-form covariance matrix $\Sigma_u$ of the disturbance
term. Even though we have a decomposition $\Gamma_0$ of $\Sigma_u$, any transformation $\check\Gamma_0$ satisfying
$(\check\Gamma_0'\check\Gamma_0)^{-1} = \Sigma_u$ preserves the likelihood of the reduced-form VAR model. The possibility of such an
invariant, likelihood-preserving transformation of $\Gamma_0$ is eliminated by concentrating the prior
distribution of $\Gamma_0$ around the mean $\bar\Gamma_0(\theta)$ and penalizing deviations from it. In principle, any
distribution that can be centered around $\bar\Gamma_0(\theta)$ will work. I use a normal distribution for the
tractability of the prior and posterior distribution of $\Gamma_0$. The fact that we cannot pin down $\Gamma_0$
from $\Sigma_u$ means that the data have information on $\Gamma_0$ only through $\Sigma_u$. Thus a dummy observations
prior that uses artificial observations simulated from a DSGE model cannot be used for the identification.
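The invariance described above can be checked numerically. The sketch below, assuming NumPy, rotates a hypothetical $\Gamma_0$ by an orthogonal matrix and confirms that the implied $\Sigma_u$ is unchanged, while a normal prior kernel centered at a prior mean (here with $S_0^2 = I_n$) distinguishes the two matrices. All names and values are illustrative assumptions.

```python
import numpy as np

def sigma_u_from(gamma0):
    """Reduced-form covariance implied by a contemporaneous matrix."""
    return np.linalg.inv(gamma0.T @ gamma0)

def log_prior_kernel(gamma0, prior_mean, nu):
    """Log of the normal prior kernel with S_0^2 = I_n (illustrative case)."""
    dev = gamma0 - prior_mean
    return -0.5 * nu * np.trace(dev.T @ dev)

# Hypothetical Gamma_0 and an arbitrary rotation (orthogonal) matrix Q.
gamma0 = np.array([[2.0, 0.0], [0.5, 1.0]])
angle = 0.7
q = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])
rotated = q @ gamma0  # (Q G)'(Q G) = G'Q'Q G = G'G, so Sigma_u is unchanged

same_likelihood = np.allclose(sigma_u_from(gamma0), sigma_u_from(rotated))
prior_at_mean = log_prior_kernel(gamma0, gamma0, nu=5.0)
prior_rotated = log_prior_kernel(rotated, gamma0, nu=5.0)
```

Here `same_likelihood` is true because the likelihood sees only the cross product, whereas `prior_rotated` is strictly below `prior_at_mean`, so the prior penalizes the rotated matrix.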
A DSGE model might impose zero restrictions on some elements of $\bar\Gamma_0(\theta)$. Such zero restrictions
cannot be handled with this type of prior since $\Gamma_0$ should have a non-degenerate distribution with
no restrictions. But this limitation should not be a big concern. First, for a large-scale or medium-scale
model, it is difficult to know a priori which element is restricted to zero. Also, to solve the
model, we rely on a numerical solution method which inevitably incurs numerical errors in the
solution. It is hard in practice to figure out all the zero restrictions in the solution of numerical
solution methods. Second, though the elements of $\Gamma_0$ restricted to zero are assumed to have a
non-degenerate distribution, as opposed to the DSGE model, their dispersion will be smaller in general
than the dispersion of other non-zero elements. This is because the mean of the elements restricted
to zero will always be zero while the mean of the non-zero elements will move around due to the
variation of $\theta$.
2 Here and in what follows, the hyperparameter $S_0^2$ is suppressed for notational simplicity.
DS also use a DSGE model for identification. DS decompose the impact matrix $\bar\Gamma_0^{-1}(\theta)$ of
the DSGE model using the QR decomposition as $\bar\Gamma_0^{-1}(\theta) = \bar L(\theta)\,\bar\Omega(\theta)$, where $\bar L(\theta)$ is lower
triangular and $\bar\Omega(\theta)$ is unitary: $\bar\Omega(\theta)'\bar\Omega(\theta) = \bar\Omega(\theta)\bar\Omega(\theta)' = I_n$. $\bar\Omega(\theta)$ is called a rotation matrix.
Since we assume $\bar\Gamma_0^{-1}(\theta)$ is non-singular, the decomposition is unique if we require that $\bar L(\theta)$
have positive diagonal elements. The identification method by DS is to use the rotation matrix
$\bar\Omega(\theta)$ in order to identify $\Gamma_0$ from $\Sigma_u$, the covariance matrix of the reduced-form shocks. Note that
we can find lower triangular $L$ such that $\Sigma_u = LL'$ using the Cholesky decomposition. Then we
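$p(\tilde Y_{\tilde T} \mid B, \Gamma_0) = (2\pi)^{-n\tilde T/2}\, |\Sigma_u|^{-\tilde T/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma_u^{-1}(\tilde Y_{\tilde T} - \tilde X_{\tilde T}B)'(\tilde Y_{\tilde T} - \tilde X_{\tilde T}B)\right]\right\}$
$\propto \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma_u^{-1}(B - \hat B_{\tilde T})'\tilde X_{\tilde T}'\tilde X_{\tilde T}(B - \hat B_{\tilde T})\right]\right\}$,
where the proportionality is as a function of $B$ and $\hat B_{\tilde T} = (\tilde X_{\tilde T}'\tilde X_{\tilde T})^{-1}\tilde X_{\tilde T}'\tilde Y_{\tilde T}$. Under a flat prior, the posterior
distribution of $B$ is normal with mean $\hat B_{\tilde T}$ and covariance matrix $\Sigma_u \otimes (\tilde X_{\tilde T}'\tilde X_{\tilde T})^{-1}$:
$B \mid \tilde Y_{\tilde T}, \Gamma_0 \sim N\left(\hat B_{\tilde T},\; (\Gamma_0'\Gamma_0)^{-1} \otimes (\tilde X_{\tilde T}'\tilde X_{\tilde T})^{-1}\right)$.
3 A variable with tilde (~) stands for simulated observations.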
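DS propose to use this posterior distribution of $B$ with the simulated data as a prior distribution
of $B$.4 However, it is not practical to simulate $\tilde Y_{\tilde T}$ and $\tilde X_{\tilde T}$ every time because of noise from
simulation. DS [...] samples. Suppose that we repeat simulation of the same length and generate
$\{\tilde Y_{\tilde T}^m\}_{m=1}^{M}$, where $m$ is an index for a simulation. Let us construct $\{\tilde x_t^m\}_{m=1}^{M}$ and $\{\tilde X_{\tilde T}^m\}_{m=1}^{M}$. By the
law of large numbers, as $M \to \infty$,
(8)  $\dfrac{1}{M}\sum_{m=1}^{M} \tilde X_{\tilde T}^{m\prime}\tilde X_{\tilde T}^{m} = \dfrac{1}{M}\sum_{m=1}^{M}\sum_{t=1}^{\tilde T} (\tilde x_t^m)(\tilde x_t^m)' \to_p \tilde T\, E\left[x_t x_t'\right]$,
and
(9)  $\dfrac{1}{M}\sum_{m=1}^{M} \tilde X_{\tilde T}^{m\prime}\tilde Y_{\tilde T}^{m} = \dfrac{1}{M}\sum_{m=1}^{M}\sum_{t=1}^{\tilde T} (\tilde x_t^m)(\tilde y_t^m)' \to_p \tilde T\, E\left[x_t y_t'\right]$,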
where the expectations are conditional on the DSGE model and its parameter $\theta$. Since the DSGE
model is stationary, the limiting moments in (8) and (9) exist and can be easily computed given $\theta$.
I substitute the limiting moments for the simulated moments. Conditional on $\Gamma_0$ and $\theta$, the prior
distribution of $B$ is
(10)  $B \mid \Gamma_0, \theta \sim N\left(\tilde B(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\tilde T\, E(x_t x_t')\right]^{-1}\right)$,
where $\tilde B(\theta) = E(x_t x_t')^{-1} E(x_t y_t')$. The prior distribution implies that the cross-equation
restrictions of the VAR model are centered around those of the DSGE model. The length of the simulated
data $\tilde T$ plays the same role as in DS: it controls how tightly the DSGE cross-equation restrictions
are imposed. For ease of interpretation, I use the ratio of the lengths of the simulated data and the
actual data, $\lambda = \tilde T/T$, instead of $\tilde T$. The prior distribution (10) is a natural conjugate prior: the
posterior distribution of $B$ is also a normal distribution. The prior distribution of $B$ is not limited
to a natural conjugate prior. For example, it is possible to use a generalized dummy observations
prior by Sims (2005).
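As an illustration of how the population moments behind the prior mean in (10) can be computed from a state space solution, the following sketch works through a deliberately simple case: a zero-mean model with no constant or trend, one lag ($p = 1$), and hypothetical matrices $G$, $M$, $H$. It assumes NumPy and solves the discrete Lyapunov equation by vectorization; it is not the computation used in the paper.

```python
import numpy as np

def state_covariance(G, M):
    """Solve Sigma = G Sigma G' + M M' for a stationary state equation.

    Uses the row-major identity vec(G Sigma G') = (G kron G) vec(Sigma).
    """
    k = G.shape[0]
    rhs = (M @ M.T).reshape(-1)
    vec_sigma = np.linalg.solve(np.eye(k * k) - np.kron(G, G), rhs)
    return vec_sigma.reshape(k, k)

def var1_prior_mean(G, M, H):
    """Prior mean E(x x')^{-1} E(x y') for a VAR(1) with x_t = y_{t-1}."""
    sigma_s = state_covariance(G, M)
    exx = H @ sigma_s @ H.T          # E[y_{t-1} y_{t-1}']
    exy = H @ sigma_s @ G.T @ H.T    # E[y_{t-1} y_t']
    return np.linalg.solve(exx, exy), exx

# Hypothetical stationary model in which the observables equal the states.
G = np.array([[0.8, 0.1], [0.0, 0.5]])
M = np.eye(2)
H = np.eye(2)
b_tilde, exx = var1_prior_mean(G, M, H)
sigma_s = state_covariance(G, M)
```

When the observables coincide with the states, the model is exactly a VAR(1) and the prior mean reduces to `G.T`, which gives a quick sanity check on the moment calculation.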
When the DSGE model has a deterministic time trend, I consider the autoregressive coefficient
matrix and the coefficients on the constant and time trend term separately. Consider the demeaned
and detrended form (4). Let $\tilde y_t = y_t - \mu - \delta t$ and $\tilde x_t = (1, t, \tilde x_{2,t}')'$ with $\tilde x_{2,t} = (\tilde y_{t-1}', \ldots, \tilde y_{t-p}')'$.
Then, since $\tilde y_t$ is stationary, the mean of the autoregressive coefficient matrix $B_2 = (B_1, \ldots, B_p)'$
can be obtained as $\tilde B_2(\theta) = E(\tilde x_{2,t}\tilde x_{2,t}')^{-1} E(\tilde x_{2,t}\tilde y_t')$. A DSGE model is usually solved by [...]
to compute the moments $E(\tilde x_{2,t}\tilde x_{2,t}')$ and $E(\tilde x_{2,t}\tilde y_t')$ from the DSGE model. I let the coefficients
() = ( , ) of the DSGE
on the constant and time trend term B = ( , )0 be equal to B
1
B |0 , N
(11)
0
e () = B
0 () , B
0 ()
where B
1
2
1
1
e
B () , 0 0
T QT
,
and,
1
2T
1 2
3T
1
1
QT =
2T
0
0
.
0
E x
2,t x
2,t
P =
1 0 (0 + 0 )
0 1
( 0 )
0 0
.. ..
. .
I
0 0
It follows that
(0 + p 0 )
( 0 )
B = (I P ) B ,
e (),
() = P B
and thus for B
i1
1 h
0
0 1 1
e
QT P
,
B |0 , N B (), 0 0
T P
or
(12)
() , 0 0
B|0 , N B
0
1
P0
1
QT P 1
i1
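(13)  $B \mid \Gamma_0, \theta \sim N\left(\tilde B(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\lambda T\, \Sigma_{xx}(\theta, \lambda T)\right]^{-1}\right)$,
where $\tilde B(\theta)$ is given in (10) and $\Sigma_{xx}(\theta, \lambda T) = E(x_t x_t')$ when the DSGE model does not have a
deterministic time trend, and $\tilde B(\theta)$ is given in (11) and $\Sigma_{xx}(\theta, \lambda T) = (P')^{-1} Q_{\lambda T} P^{-1}$ when the DSGE
model has a deterministic time trend.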
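Proposition 1. For the model (3), assume that $\Gamma_0$ and $B$ have the prior distributions (7) and
(13), respectively, conditional on $\theta$. Then the conditional posterior density for $\Gamma_0$ is
(14)  $p(\Gamma_0 \mid Y, \theta) = C_n^{-1}\left(T + n, \hat{\tilde S}, \nu S_0^2, \bar\Gamma_0(\theta)\right) |\Gamma_0'\Gamma_0|^{T/2} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\hat{\tilde S}\,\Gamma_0'\Gamma_0\right]\right\} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\nu S_0^2\,(\Gamma_0 - \bar\Gamma_0(\theta))'(\Gamma_0 - \bar\Gamma_0(\theta))\right]\right\}$,
and the conditional posterior distribution of $B$ is
(15)  $B \mid Y, \Gamma_0, \theta \sim N\left(\hat{\tilde B}(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\lambda T\,\Sigma_{xx}(\theta, \lambda T) + X'X\right]^{-1}\right)$,
where
$\hat{\tilde S} = Y'Y + \tilde B(\theta)'\left[\lambda T\,\Sigma_{xx}(\theta, \lambda T)\right]\tilde B(\theta) - \hat{\tilde B}(\theta)'\left[\lambda T\,\Sigma_{xx}(\theta, \lambda T) + X'X\right]\hat{\tilde B}(\theta)$,
$\hat{\tilde B}(\theta) = \left[\lambda T\,\Sigma_{xx}(\theta, \lambda T) + X'X\right]^{-1}\left[\lambda T\,\Sigma_{xx}(\theta, \lambda T)\,\tilde B(\theta) + X'X\,\hat B\right]$,
and
$\hat B = (X'X)^{-1}X'Y$.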
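The posterior mean of $B$ in (15) is a precision-weighted combination of the DSGE-implied prior mean and the OLS estimate. A minimal sketch of that weighting, assuming NumPy and using hypothetical simulated data, a hypothetical prior mean `b_tilde`, and an identity moment matrix `sxx` purely for illustration:

```python
import numpy as np

def ols(X, Y):
    """OLS estimate (X'X)^{-1} X'Y."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

def posterior_mean(lam, T, sxx, b_tilde, X, Y):
    """Combine prior mean b_tilde (weight lam*T*sxx) with OLS (weight X'X)."""
    xtx = X.T @ X
    prior_prec = lam * T * sxx
    return np.linalg.solve(prior_prec + xtx, prior_prec @ b_tilde + xtx @ ols(X, Y))

# Hypothetical data: T observations, k regressors, n equations.
rng = np.random.default_rng(0)
T, k, n = 50, 3, 2
X = rng.standard_normal((T, k))
Y = X @ np.ones((k, n)) + 0.1 * rng.standard_normal((T, n))
b_tilde = np.zeros((k, n))   # stand-in for the DSGE-implied prior mean
sxx = np.eye(k)              # stand-in for the population moment matrix

b_loose = posterior_mean(1e-12, T, sxx, b_tilde, X, Y)  # prior nearly ignored
b_tight = posterior_mean(1e9, T, sxx, b_tilde, X, Y)    # prior nearly dogmatic
```

As the weight `lam` goes to zero the posterior mean approaches the OLS estimate, and as it grows large the posterior mean approaches the prior mean, which is the sense in which the hyperparameter controls how tightly the cross-equation restrictions are imposed.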
The posterior density (14) shows how the identification derived from a DSGE model works. It
is non-standard in that its kernel is a multiplicative mixture of a Wishart density kernel and a
normal density kernel. This is because the prior on $\Gamma_0$ directly assumes a probability distribution
over $\Gamma_0$ while the likelihood depends only on its cross product, $\Gamma_0'\Gamma_0$. The part that is a kernel of
the Wishart distribution
(16)  $|\Gamma_0'\Gamma_0|^{T/2} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\hat{\tilde S}\,\Gamma_0'\Gamma_0\right]\right\}$
comes from the likelihood and the part that is a kernel of a normal distribution
(17)  $\exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\nu S_0^2\,(\Gamma_0 - \bar\Gamma_0(\theta))'(\Gamma_0 - \bar\Gamma_0(\theta))\right]\right\}$
emerges from the prior distribution. The fact that the likelihood depends only on $\Gamma_0'\Gamma_0$ is the
fundamental reason for the identification problem of a reduced-form VAR model. Here, (17) distinguishes
different $\Gamma_0$'s which have the same cross product and thus preserve the likelihood. For example,
suppose that $\Gamma_0$ is a scalar. Then, the kernel (16) is symmetric around $\Gamma_0 = 0$ and has two modes,
one on $\mathbb{R}_+$ and the other on $\mathbb{R}_-$. If $\bar\Gamma_0(\theta) > 0$, then (17) will put large probability mass on $\mathbb{R}_+$.
Combining (17) with (16), the mode on $\mathbb{R}_+$ has a higher density and the mode on $\mathbb{R}_-$ is assigned
a lower density.
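The scalar example can be made concrete with a few lines of code. The sketch below evaluates the logs of the two kernels for a scalar contemporaneous coefficient; the numerical values of the sample size, the scale term, the prior weight, and the prior mean are hypothetical and chosen only so that the two likelihood modes sit at plus and minus one.

```python
import math

def log_likelihood_kernel(g, T, s):
    """Log of the Wishart-type kernel for scalar g: |g^2|^(T/2) exp(-s g^2 / 2)."""
    return 0.5 * T * math.log(g * g) - 0.5 * s * g * g

def log_prior_kernel(g, g_bar, nu):
    """Log of the normal prior kernel centered at g_bar with precision nu."""
    return -0.5 * nu * (g - g_bar) ** 2

# Hypothetical values: the likelihood kernel peaks at +/- sqrt(T / s) = +/- 1.
T, s, nu, g_bar = 50, 50.0, 10.0, 1.0
mode = math.sqrt(T / s)

lik_plus = log_likelihood_kernel(mode, T, s)
lik_minus = log_likelihood_kernel(-mode, T, s)
post_plus = lik_plus + log_prior_kernel(mode, g_bar, nu)
post_minus = lik_minus + log_prior_kernel(-mode, g_bar, nu)
```

The likelihood kernel takes the same value at both modes, so the data alone cannot choose the sign; once the prior kernel is added, the mode with the same sign as the prior mean has the higher log density.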
The posterior density has a unique global maximum near the prior mean (see Proposition 3). The
density is non-standard, but we can compute its normalizing constant $C_n(\cdot)$ using a technique for
a non-central Wishart distribution. The details are described in the Appendix. The following proposition
computes the marginal posterior distribution of $\theta$ after integrating out $\Gamma_0$ and $B$ using an identity
implied by the Bayes rule.
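Proposition 2. For the model (3), assume that $\Gamma_0$ and $B$ have the prior distributions (7) and
(10), respectively, conditional on $\theta$. Assume that $\theta$ has a prior density $p(\theta)$. Then, the data density
conditional on $\theta$ is
(18)  $p(Y \mid \theta) = \dfrac{(2\pi)^{-nT/2}\, |\lambda T\,\Sigma_{xx}(\theta, \lambda T)|^{n/2}\, (2\pi)^{-n^2/2}\, |\nu S_0^2|^{n/2}}{|\lambda T\,\Sigma_{xx}(\theta, \lambda T) + X'X|^{n/2}\; C_n^{-1}\left(T + n;\, \hat{\tilde S};\, \nu S_0^2;\, \bar\Gamma_0(\theta)\right)}$,
where $C_n(\cdot)$ is defined in (36) in the Appendix. Its marginal posterior density is proportional to
(19)  $q(\theta \mid Y) = p(Y \mid \theta)\, p(\theta)$.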
As in DS, for posterior simulation, I can draw $\theta$ first and then, conditional on the draw of $\theta$,
sample $\Gamma_0$ and $B$. However, in the following application, it turns out that the computation of (18)
is prohibitively slow because of the function $C_n(\cdot)$ and this hierarchical simulation is not practical.
Alternatively, it is possible to draw $\theta$ and $\Gamma_0$ jointly and then sample $B$ directly from its posterior
distribution conditional on $\Gamma_0$ and $\theta$. By drawing $\theta$ and $\Gamma_0$ together, I do not have to integrate
out $\Gamma_0$ and there is no need to evaluate the function $C_n(\cdot)$. The following proposition presents a
kernel of the joint posterior distribution of $\theta$ and $\Gamma_0$ and its unique global maximum. The posterior
distribution of $B$ conditional on $\theta$ and $\Gamma_0$ is the same as in (15).
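Proposition 3. For the model (3), assume that $\Gamma_0$ and $B$ have the prior distributions (7) and
(10), respectively, conditional on $\theta$. Assume that $\theta$ has a prior density $p(\theta)$. Then, the following
hold:
(1) The data density conditional on $\Gamma_0$ and $\theta$ is
(20)  $p(Y \mid \Gamma_0, \theta) = (2\pi)^{-nT/2}\, \dfrac{|\lambda T\,\Sigma_{xx}(\theta, \lambda T)|^{n/2}}{|\lambda T\,\Sigma_{xx}(\theta, \lambda T) + X'X|^{n/2}}\, |\Gamma_0'\Gamma_0|^{T/2} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\hat{\tilde S}\,\Gamma_0'\Gamma_0\right]\right\}$.
(2) A kernel of the joint posterior density of $\theta$ and $\Gamma_0$ is
(21)  $q(\theta, \Gamma_0 \mid Y) = p(Y \mid \Gamma_0, \theta)\, p(\Gamma_0 \mid \theta)\, p(\theta)$,
where
$p(\Gamma_0 \mid \theta) = (2\pi)^{-n^2/2}\, |\nu S_0^2|^{n/2} \exp\left\{-\tfrac{1}{2}\mathrm{tr}\left[\nu S_0^2\,(\Gamma_0 - \bar\Gamma_0(\theta))'(\Gamma_0 - \bar\Gamma_0(\theta))\right]\right\}$.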
1
2
1
be
2
0 () S 2 S
In + M
+
S
,
0
0
2
1 2
1
be
2
0 () S 2
S +S
.
0
(22)
T P
1
0
0
1
0
11
2
b
+ Se + S02 In ,
where $P$ is a commutation matrix such that $P\,\mathrm{vec}(Z) = \mathrm{vec}(Z')$ for a conformable matrix $Z$. The Hessian
is negative semi-definite at $\Gamma_0 = \Gamma_0^+(\theta)$.
Since each element of $\Gamma_0$ is a parameter and $\Gamma_0$ is a large dimensional matrix, the framework
has to deal with many parameters in addition to the DSGE model parameter. A simple posterior
simulator which does not consider the structure of (21) is likely to be slow in convergence. The
global maximum and the Hessian at the maximum found in Proposition 3 turn out to be useful in
posterior simulation. Concentration of the kernel (21) with respect to $\Gamma_0$ makes it easy to maximize
the joint posterior density of $\Gamma_0$ and $\theta$ and draw $\Gamma_0$ from its posterior density conditional on $\theta$.
The fact that the posterior density of $\Gamma_0$ conditional on $\theta$ and the joint posterior density of
$\Gamma_0$ and $\theta$ with fixed $\theta$ have the unique global maximum shows that the identification issue of a
reduced-form VAR model is indeed addressed. I assume a sufficiently large precision matrix to
make the posterior density behave well. Then the normalization problem or the weak identification
problem discussed in Hamilton, Waggoner, and Zha (2007) is also mitigated.
2.3. Posterior simulation. This section describes how to simulate from the posterior distribution
and do Bayesian inference. I propose two posterior simulators: the first simulator samples $\theta$ and
$\Gamma_0$ in an alternating manner with a Gibbs sampler and then samples $B$ conditional on them. The
second simulator draws $\theta$ first and then samples $\Gamma_0$ and $B$ conditional on $\theta$.
2.3.1. Posterior simulator I. Using the kernel (21), I construct a Gibbs sampler. First, I use a
random-walk Metropolis algorithm to draw $\theta$ conditional on the previous sample of $\Gamma_0$ and $\theta$. Then
I apply a Metropolis-Hastings algorithm to draw $\Gamma_0$ conditional on the previous sample of $\Gamma_0$ and
$\theta$. This sampler is often called Metropolis-within-Gibbs in the literature. Let us denote the Gibbs
sampler by G. The Metropolis step for $\theta$ and the Metropolis-Hastings step for $\Gamma_0$ are denoted by M1
and M2, respectively. $B$ can be directly sampled from its posterior distribution (15) conditional
on $\Gamma_0$ and $\theta$.
The random-walk Metropolis algorithm to draw $\theta$ is widely used in the literature. The proposal
density of M1 is a normal distribution whose mean is the previous successful draw and whose covariance
matrix is a scaled inverse of the negative Hessian at a posterior mode. The maximization
to find a mode is actually done only for the concentrated kernel with respect to $\Gamma_0$ since for given
$\theta$ the kernel (21) is maximized at $\Gamma_0^+(\theta)$.
For M2, I propose to use a Metropolis-Hastings algorithm with a transition mixture. This
simulator is a mixture of an independence Metropolis algorithm (see footnote 5), denoted by M21, and a
random-walk Metropolis algorithm, denoted by M22. The transition density mixture chooses randomly
between the two algorithms with a specified probability. For the proposal distribution of M21, I
use a normal distribution around $\Gamma_0^+(\theta)$. (21) is locally well approximated by a normal distribution
around $\Gamma_0^+(\theta)$ when the precision matrix $\nu S_0^2$ in (7) is sufficiently large. Let $V^+(\theta)$ denote the
inverse matrix of the negative Hessian of $\log p(\Gamma_0 \mid Y, \theta)$ at $\Gamma_0^+(\theta)$, which is an approximate variance
based on a second-order Taylor expansion of the log density. Then, the proposal distribution for a
candidate draw is $N\left(\Gamma_0^+(\theta),\; c_2^2 V^+(\theta)\right)$, where $c_2$ is a scale factor. For the transition distribution
of the random-walk Metropolis part, M22, I use a normal distribution whose mean is the previous
successful draw and whose variance is a scaled $V^+(\theta)$.
5 See Tierney (1994) and Geweke (2005).
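In summary, the transition density for a candidate $\Gamma_0^*$ of the simulator M2 is
(23)  $q(\Gamma_0^* \mid \theta, \Gamma_0, M_2) = \omega_1\, q(\Gamma_0^* \mid \theta, M_{21}) + \omega_2\, q(\Gamma_0^* \mid \theta, \Gamma_0, M_{22})$,
and the acceptance probability is
$\alpha(\Gamma_0^* \mid \theta, \Gamma_0, M_{2k}) = \min\left\{ \dfrac{p(\theta, \Gamma_0^* \mid Y)\,/\,q(\Gamma_0^* \mid \theta, \Gamma_0, M_{2k})}{p(\theta, \Gamma_0 \mid Y)\,/\,q(\Gamma_0 \mid \theta, \Gamma_0^*, M_{2k})},\; 1 \right\}$,
for $k = 1$ and 2.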
Why do I mix two algorithms? Since $\Gamma_0$ has many parameters ($n^2$), a random-walk Metropolis
algorithm is in general slow to explore the posterior density of $\Gamma_0$. By mixing with an independence
Metropolis-Hastings algorithm, I can attain a high acceptance ratio and faster convergence. On
the other hand, when I tried only an independence Metropolis-Hastings algorithm, occasionally no
draws of $\Gamma_0$ were accepted for a long period. This is because the transition distribution $q(\Gamma_0 \mid \theta, M_{21})$
has a thinner tail than its target distribution $p(\Gamma_0 \mid \theta, Y)$. When M21 happens to get stuck in
a region far from $\Gamma_0^+(\theta)$, the mean of the transition distribution, the random-walk Metropolis
algorithm makes the simulator crawl around and helps the simulator to get out of the region. It
improves the convergence. The independence Metropolis algorithm works well in general, so I put
a high probability on M21: $\omega_1 = 0.8$ and $\omega_2 = 0.2$. Table 1 summarizes the Gibbs sampler G.
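The transition mixture can be sketched for the scalar case, where the target is the product of the two kernels (16) and (17). This is only an illustration under assumed values, not the sampler used in the application: the target parameters, the scale factors `c_indep` and `c_rw`, and the function names are hypothetical, and the mode and curvature are computed analytically for this particular target.

```python
import math
import random

# Hypothetical scalar target: log of kernel (16) times kernel (17).
T, S, NU, G_BAR = 50, 50.0, 10.0, 1.0
MODE = 1.0                      # solves T/g - S*g - NU*(g - G_BAR) = 0
VAR = 1.0 / (T + S + NU)        # inverse of the negative Hessian at the mode

def log_target(g):
    if g == 0.0:
        return -math.inf
    return 0.5 * T * math.log(g * g) - 0.5 * S * g * g - 0.5 * NU * (g - G_BAR) ** 2

def log_normal_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def mixture_mh(n_draws, w_indep=0.8, c_indep=1.5, c_rw=0.5, seed=0):
    """Metropolis-Hastings with a mixture of independence and random-walk moves."""
    rng = random.Random(seed)
    sd_i, sd_rw = c_indep * math.sqrt(VAR), c_rw * math.sqrt(VAR)
    current, draws = MODE, []
    for _ in range(n_draws):
        if rng.random() < w_indep:
            # Independence move: proposal centered at the mode.
            cand = rng.gauss(MODE, sd_i)
            log_ratio = (log_target(cand) - log_normal_pdf(cand, MODE, sd_i)) - (
                log_target(current) - log_normal_pdf(current, MODE, sd_i))
        else:
            # Random-walk move: symmetric proposal, so proposal terms cancel.
            cand = rng.gauss(current, sd_rw)
            log_ratio = log_target(cand) - log_target(current)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = cand
        draws.append(current)
    return draws

draws = mixture_mh(5000)
mean_draw = sum(draws) / len(draws)
```

With the mixing weights 0.8 and 0.2 from the text, most moves are independence proposals from the normal approximation, while the occasional random-walk move lets the chain leave regions where the independence proposal has too little mass.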
2.3.2. Posterior simulator II. With the second simulator, I first draw $\theta$ from its marginal posterior
distribution and then sample $\Gamma_0$ and $B$ conditional on the draw of $\theta$.
Let us denote the simulator by H. Sampling $\theta$ from its marginal posterior density which is
proportional to (19) can be done using a random-walk Metropolis algorithm in the same way as in
the Gibbs sampler G. The covariance matrix of the transition distribution is a scaled inverse matrix
of the negative Hessian at a posterior mode. Conditional on the draw of $\theta$, I can sample $\Gamma_0$ from its
conditional posterior density (14) using an algorithm similar to M2 of G. Sampling $B$ conditional
on $\theta$ and $\Gamma_0$ from its posterior distribution is straightforward.
However, for the application of the paper, it takes too much time to evaluate the function $C_n(\cdot)$
in (19). In the function $C_n(\cdot)$, which emerges from the normalizing constant of the conditional
posterior density (14) of $\Gamma_0$, the time-consuming part is the generalized hypergeometric function
of matrix argument, ${}_1F_1(\cdot)$. It is an infinite sum of a series of functions of the eigenvalues of a
matrix argument. Here I end up having such a function because the posterior kernel of $\Gamma_0$ (14)
is a multiplicative mixture of a normal density kernel and a Wishart density kernel. In other
applications, the hypergeometric function may converge quickly. It is worth a try. Computational
detail is described in the Appendix.
2.3.3. Estimation of marginal likelihoods. [why marginal likelihoods?] I construct a set of competing
models that are a continuum of models with different $\lambda$s and $\nu$s. They are indexed by $\lambda \in (0, \infty)$,
which controls the weight of the DSGE model prior on $B$, and $\nu \in (0, \infty)$, which controls the
weight of the DSGE identification prior on $\Gamma_0$. Following DS, I assume a flat prior on $(\lambda, \nu)$ and
choose $(\lambda, \nu)$ having the highest marginal likelihood. The marginal likelihood with respect to $(\lambda, \nu)$
can be estimated using the modified harmonic mean (MHM) method with the standard weighting
function by Geweke (1999) or the new elliptical weighting function proposed by Sims, Waggoner,
and Zha (2008). For the Gibbs sampler G, I use the kernel (21) of the posterior density of $\Gamma_0$ and
$\theta$ and, for the simulator H, I use the kernel (19) of the posterior density of $\theta$. In practice, it is
impossible to do posterior simulation and estimate the marginal likelihoods for all the $(\lambda, \nu)$ on
$\mathbb{R}^2_{++}$. DS propose to discretize the positive real line and simulate at each point on the grid. I follow
their strategy in the following application.
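To make the modified harmonic mean idea concrete, the sketch below applies a truncated-normal weighting function in the spirit of Geweke (1999) to a one-dimensional toy problem whose normalizing constant is known. The toy kernel, the truncation probability, and the function names are assumptions for illustration; in the application the kernel would be (21) or (19) evaluated at posterior draws for each grid point.

```python
import math
import random

def log_kernel(theta):
    """Toy unnormalized posterior kernel; its integral is sqrt(2*pi)."""
    return -0.5 * theta * theta

def mhm_log_marginal(draws, log_kernel_fn, p=0.9, chi2_cut=2.7055):
    """Modified harmonic mean estimate of the log normalizing constant.

    The weighting function is a normal density fitted to the draws and
    truncated to the region covering probability p (chi2_cut is the
    corresponding chi-square(1) quantile for p = 0.9).
    """
    n = len(draws)
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    total = 0.0
    for d in draws:
        z2 = (d - mean) ** 2 / var
        if z2 <= chi2_cut:
            log_f = -0.5 * math.log(2 * math.pi * var) - 0.5 * z2 - math.log(p)
            total += math.exp(log_f - log_kernel_fn(d))
    # The average of f/kernel estimates the inverse of the normalizing constant.
    return -math.log(total / n)

rng = random.Random(1)
posterior_draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]
log_ml_hat = mhm_log_marginal(posterior_draws, log_kernel)
log_ml_true = 0.5 * math.log(2 * math.pi)
```

Repeating such an estimate at each grid point and keeping the point with the largest value mirrors the grid strategy described above.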
of the price of non-tradable goods relative to the price of tradable goods between the countries. Engel (1999) and
Chari, Kehoe, and McGrattan (2002) show that the second term contributes a negligible part of fluctuations of real
exchange rates. The first component, the relative price of tradable goods between countries, accounts for most of the
real exchange rate fluctuations. Therefore, I ignore non-tradable goods. Burstein, Eichenbaum, and Rebelo (2005)
document that after large devaluations the relative price of purely traded goods with local non-tradable components
removed adjusts quickly and the slow adjustment of relative prices of non-tradable goods and services is the main
source of a large drop in real exchange rates. However, they also find that when considering small fluctuations of real
exchange rates in normal times, the relative price of purely traded goods accounts for the majority of the fluctuations
of real exchange rates, though not as much as the conventional measure of the relative price of tradable goods which
includes some local non-tradable components.
3.1.1. Households. There is a continuum of households in each country: the households in Home
are indexed by $i \in N_H = [0, 1]$ and the households in Foreign are indexed by $j \in N_F = (1, 2]$. The
preferences of households are assumed identical across countries. Household i in Home seeks to
maximize the expected sum of discounted utilities
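\[
E_t \left\{ \sum_{s=0}^{\infty} \beta^s \left[ \frac{\left( C_{t+s} / H_{t+s} \right)^{1-1/\sigma}}{1-1/\sigma} - \frac{L_{t+s}(i)^{1+\eta}}{1+\eta} \right] \right\},
\]
where $C_t$ is the consumption index and $L_t(i)$ is labor supply. $\beta$ is the discount factor, $\sigma$ is the elasticity of intertemporal
substitution, and $\eta$ is the inverse of the Frisch elasticity of labor supply. Consumption $C_t$ is not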
indexed by i because consumption is identical for all the households in equilibrium under the
assumption of complete asset markets. The labor market is heterogeneous and household i supplies
its specialized labor to domestic firm i.
$H_t$ is an external habit, which the households take as exogenous. Here, I assume that
$H_t = A_{w,t}$, where $A_{w,t} = \exp(\gamma t)$ is a deterministic time trend component of technology in the
production function of firms. It is common to Home and Foreign. The formulation ensures the
stationarity of labor input along the balanced growth path while keeping the additive separability
of consumption and labor.7
The consumption index $C_t$ is defined as
\[
C_t = \left[ a_H^{1/\nu} C_{H,t}^{1-1/\nu} + a_F^{1/\nu} C_{F,t}^{1-1/\nu} \right]^{\frac{1}{1-1/\nu}},
\]
where $C_{H,t}$ is the aggregate index of consumption goods produced in Home and $C_{F,t}$ is the aggregate
index of consumption goods imported from Foreign. $a_H$ is the share of local goods and $a_F = 1 - a_H$
is the share of imported goods. $\nu$ is the elasticity of intratemporal substitution between Home-produced goods and Foreign-produced goods. $a_H > 1/2$ means that the households in Home have a
home bias in consumption. $C_{H,t}$ and $C_{F,t}$ are constant elasticity of substitution (CES) aggregators
\[
C_{H,t} = \left[ \int_{N_H} c_t(i)^{1-1/\omega} \, di \right]^{\frac{1}{1-1/\omega}}
\quad \text{and} \quad
C_{F,t} = \left[ \int_{N_F} c_t(j)^{1-1/\omega} \, dj \right]^{\frac{1}{1-1/\omega}},
\]
where $\omega$, which is greater than one, is the intratemporal elasticity of substitution between goods
produced in the same country. Let us denote the Home currency price of goods produced in Home
and in Foreign by $p_{H,t}(i)$ and $p_{F,t}(j)$, respectively. Taking the prices of individual goods as given,
the households optimally allocate a given level of total expenditure among the differentiated goods
7 When a model has technology growth, a common approach in the literature, such as Del Negro, Schorfheide, Smets,
and Wouters (2007) and Justiniano and Primiceri (2008), is to use log utility (an intertemporal elasticity of substitution equal to one) and keep the additive
separability. However, I instead choose to deflate the consumption index by the trend and allow the intertemporal
elasticity of substitution (IES) to vary since I want to estimate its value.
Though in a slightly different setting, Chari, Kehoe, and McGrattan (2002) find that the IES is required to be as
small as 1/5 to match the volatility of the real exchange rate. This is because the real exchange rate is determined
by the relative consumption between countries multiplied by the inverse of the IES and consumption is much less
volatile than the real exchange rate. Chari, Kehoe, and McGrattan (2002) instead restrict the inverse of the IES to obtain a balanced
growth path. For technical details about balanced growth and the functional form of the utility function, see King,
Plosser, and Rebelo (1988).
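each period. This optimal allocation yields the demand for good $i \in N_H$ produced in Home and
the demand for good $j \in N_F$ produced in Foreign as
\[
c_{H,t}(i) = \left( \frac{p_{H,t}(i)}{P_{H,t}} \right)^{-\omega} C_{H,t}
\quad \text{and} \quad
c_{F,t}(j) = \left( \frac{p_{F,t}(j)}{P_{F,t}} \right)^{-\omega} C_{F,t},
\]
respectively, with $\omega$ the elasticity of substitution between goods produced in the same country. The price indices $P_{H,t}$ and $P_{F,t}$ associated with $C_{H,t}$ and $C_{F,t}$, respectively, are determined as
\[
P_{H,t} = \left[ \int_{N_H} p_{H,t}(i)^{1-\omega} \, di \right]^{\frac{1}{1-\omega}}
\quad \text{and} \quad
P_{F,t} = \left[ \int_{N_F} p_{F,t}(j)^{1-\omega} \, dj \right]^{\frac{1}{1-\omega}}.
\]
The demands for the Home and Foreign composite goods are
\[
C_{H,t} = a_H \left( \frac{P_{H,t}}{P_t} \right)^{-\nu} C_t
\quad \text{and} \quad
C_{F,t} = a_F \left( \frac{P_{F,t}}{P_t} \right)^{-\nu} C_t,
\]
respectively, where $a_H$ and $a_F$ are the shares of local and imported goods, $\nu$ is the elasticity of substitution between Home and Foreign goods, and the consumption-based price index $P_t$ associated with the consumption index
$C_t$ is given by
\[
P_t = \left[ a_H P_{H,t}^{1-\nu} + a_F P_{F,t}^{1-\nu} \right]^{\frac{1}{1-\nu}}.
\]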
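The two-level CES allocation above can be checked numerically. The following is a minimal sketch in Python, assuming illustrative values for the share of Home goods, the Home-Foreign elasticity of substitution, and the two price indices; the function names and all numbers are hypothetical and not taken from the paper. It verifies that the implied demands exhaust total expenditure and reproduce the consumption index.

```python
def price_index(p_h, p_f, a_h, nu):
    """Consumption-based price index for a CES aggregate of Home and Foreign goods."""
    a_f = 1.0 - a_h
    return (a_h * p_h ** (1.0 - nu) + a_f * p_f ** (1.0 - nu)) ** (1.0 / (1.0 - nu))


def composite_demands(c, p_h, p_f, a_h, nu):
    """Demands for the Home and Foreign composite goods given total consumption c."""
    a_f = 1.0 - a_h
    p = price_index(p_h, p_f, a_h, nu)
    c_h = a_h * (p_h / p) ** (-nu) * c
    c_f = a_f * (p_f / p) ** (-nu) * c
    return c_h, c_f, p


def consumption_index(c_h, c_f, a_h, nu):
    """CES consumption index built from the two composite goods (requires nu != 1)."""
    a_f = 1.0 - a_h
    e = 1.0 - 1.0 / nu
    return (a_h ** (1.0 / nu) * c_h ** e + a_f ** (1.0 / nu) * c_f ** e) ** (1.0 / e)


# Hypothetical values: home bias (a_h > 1/2), elasticity 1.5, Foreign goods 20% dearer.
a_h, nu = 0.8, 1.5
p_h, p_f, c = 1.0, 1.2, 2.0

c_h, c_f, p = composite_demands(c, p_h, p_f, a_h, nu)
expenditure = p_h * c_h + p_f * c_f      # should equal p * c
c_rebuilt = consumption_index(c_h, c_f, a_h, nu)  # should equal c
```

The check works for any elasticity different from one; the Cobb-Douglas limit would need a separate formula.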
Asset markets are complete and assets are freely traded. Household $i$ faces the following flow
budget constraint
\[
P_t C_t + E_t \left[ M_{t,t+1} B_{t+1}(i) \right] \leq B_t(i) + W_t(i) L_t(i) + \Pi_t + T_t .
\]
$B_{t+1}(i)$ is the holdings of state contingent claims, $B_t(i)$ is the net cash flow from the household's
portfolio of state contingent claims, $W_t(i)$ is the nominal wage, $T_t$ is the government transfer net
of taxes, and $\Pi_t$ is the per-capita profit accruing to households from their ownership of local firms.
$M_{t,t+1}$ is the nominal stochastic discount factor.
The problem of the households in Foreign is analogous.
3.1.2. Firms. I assume that there is a continuum of firms whose size is the same as the size of
households in each country. Firm $i \in N_H$ in Home produces its good using a linear production
technology
(24)
I follow Calvo (1983) and Yun (1996) for analytical tractability in order to introduce nominal
rigidities: a fraction $\alpha$ of firms cannot choose their prices optimally. These firms reset the
goods price in Home by gross inflation as $p_{H,t}(i) = \pi \, p_{H,t-1}(i)$ and the goods price in Foreign by
gross inflation as $p^*_{H,t}(i) = \pi^* p^*_{H,t-1}(i)$, where $\pi$ and $\pi^*$ are the steady state gross inflation rates
in Home and in Foreign, respectively. The remaining fraction $1 - \alpha$ of the firms set their prices
optimally. They maximize the present discounted value of future nominal profits
\[
E_t \left\{ \sum_{s=0}^{\infty} \alpha^s M_{t,t+s} \left[ p_{H,t}(i) \pi^s \, Y_{H,t+s}(i) + S_{t+s} \, p^*_{H,t}(i) (\pi^*)^s \, Y^*_{H,t+s}(i) - W_{t+s}(i) L_{t+s}(i) \right] \right\}
\]
subject to the demands for their goods in Home and in Foreign,
\[
Y_{H,t+s}(i) = \left( \frac{p_{H,t}(i) \pi^s}{P_{H,t+s}} \right)^{-\omega} \left( C_{H,t+s} + G_{H,t+s} \right)
\quad \text{and} \quad
Y^*_{H,t+s}(i) = \left( \frac{p^*_{H,t}(i) (\pi^*)^s}{P^*_{H,t+s}} \right)^{-\omega} \left( C^*_{H,t+s} + G^*_{H,t+s} \right),
\]
respectively, and the production function (24). The stochastic discount factor is given as $M_{t,t+s} = \prod_{j=1}^{s} M_{t+j-1,t+j}$ and $M_{t,t} = 1$. $S_t$ is the nominal exchange rate, which is units of Home currency per
unit of Foreign currency. An increase of $S_t$ means a nominal depreciation of the Home currency.
$G_{H,t+s}$ and $G^*_{H,t+s}$ are the government purchases of goods produced in Home, by the government
in Home and in Foreign, respectively. I assume that the government has the same preferences for
goods as the households. Therefore, the government allocates its expenditure in the same way as
the households. With the assumption that firms take wages as given, their pricing decisions on local
sales and exports can be solved separately.8
The problem of the firms in Foreign is defined in an analogous way with a nominal rigidity
parameter $\alpha^*$.
3.1.3. Government. The government purchases $G_t$, which consists of goods produced in Home, $G_{H,t}$,
and goods produced in Foreign, $G_{F,t}$. The government budget constraint is
\[
B_t - R_{t-1} B_{t-1} = T_t + P_t G_t ,
\]
where $B_t$ is the government bond supply and $R_t$ is the nominal interest rate between period $t-1$ and
period $t$. The fiscal policy is fully Ricardian. I do not explicitly model the government expenditure,
but assume that $g_t = \log \left[ G_t / (A_{w,t} G) \right]$, with the steady state government expenditure $G$, follows
an exogenous first-order autoregressive process $g_t = \rho_g g_{t-1} + \sigma_g \epsilon_{g,t}$, where $\epsilon_{g,t} \sim$ i.i.d. $N(0, 1)$.
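The monetary policy authority in Home sets the nominal interest rate following a feedback rule
of the form
\[
\frac{R_t}{R} = \left( \frac{R_{t-1}}{R} \right)^{\rho_R} \left[ \left( \frac{\pi_t}{\pi} \right)^{\psi_\pi} \left( \frac{Y_t}{A_{w,t} Y} \right)^{\psi_y} \right]^{1-\rho_R} \exp \left( \sigma_m \epsilon_{m,t} \right),
\]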
8 Though producers hire a different type of labor from a specific labor market, I assume that they are wage takers.
As in Woodford (2003), we may assume that there is a continuum of producers for each type. Then an individual
producer takes the wage as given. In equilibrium, however, they produce the same amount of goods and thus the result
would be the same.
where $Y_t$ is output, $\pi_t$ is gross CPI inflation, and $R$ is the steady state value of $R_t$. The feedback
rule is perturbed by a monetary policy shock, $\epsilon_{m,t} \sim$ i.i.d. $N(0, 1)$.
The government of Foreign has the same set of policy rules, but with different parameters.
3.1.4. Market clearing. Goods market clearing requires that the private and government demand
for each variety of goods is equal to the supply of the good. In the aggregate, it follows that
\[
\int_{N_H} \left[ Y_{H,t}(i) + Y^*_{H,t}(i) \right] di \equiv Y_t = C_{H,t} + G_{H,t} + C^*_{H,t} + G^*_{H,t}
\]
for Home. Since the model does not include investment, I assume that $G_{H,t}$ and $G^*_{H,t}$ include
investment. I refer to $g_t$ as an aggregate demand shock. For Foreign, an analogous equation holds:
\[
\int_{N_F} \left[ Y^*_{F,t}(j) + Y_{F,t}(j) \right] dj \equiv Y^*_t = C^*_{F,t} + G^*_{F,t} + C_{F,t} + G_{F,t} .
\]
3.2. Determination of the real exchange rate. The assumption that asset markets are complete and assets are traded freely results in complete international risk sharing and the uniqueness of the nominal discount factor. Consequently, nominal exchange rates are determined every
period by the condition
\[
\frac{S_{t+1}}{S_t} = \frac{M^*_{t,t+1}}{M_{t,t+1}} ,
\]
where $M^*_{t,t+1}$ is the nominal stochastic discount factor for Foreign households. I define the real
exchange rate as $Q_t = S_t P^*_t / P_t$, and then
\[
Q_t = Q \, \frac{U^*_{c,t}}{U_{c,t}} ,
\]
where $Q = Q_0 U_{c,0} / U^*_{c,0}$. Log-linearized around the deterministic steady state, the log deviation of
the real exchange rate is determined by relative consumption
\[
q_t = \sigma^{-1} \left( \hat{c}_t - \hat{c}^*_t \right), \tag{25}
\]
where $\sigma$ is the intertemporal elasticity of substitution and $q_t$, $\hat{c}_t$, and $\hat{c}^*_t$ are log deviations from the steady state value of the corresponding variable.
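The step from the risk-sharing condition to (25) can be sketched as follows. This is a hedged reconstruction, assuming that period utility is isoelastic in consumption deflated by the common trend $H_t$ and writing $\sigma$ for the intertemporal elasticity of substitution; both the functional form and the symbols are assumptions made for illustration.

```latex
\begin{align*}
U_{c,t} &= \frac{1}{H_t} \left( \frac{C_t}{H_t} \right)^{-1/\sigma},
\qquad
U^*_{c,t} = \frac{1}{H_t} \left( \frac{C^*_t}{H_t} \right)^{-1/\sigma}, \\
Q_t &= Q \, \frac{U^*_{c,t}}{U_{c,t}}
     = Q \left( \frac{C_t}{C^*_t} \right)^{1/\sigma}
\quad \Longrightarrow \quad
q_t = \frac{1}{\sigma} \left( \hat{c}_t - \hat{c}^*_t \right).
\end{align*}
```

Because the trend is common to both countries it cancels in the ratio of marginal utilities, so only relative consumption, scaled by the inverse of the elasticity, moves the real exchange rate.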
In the empirical exercise, for estimation, I add an ad-hoc wedge $z_t$ to this equation
\[
q_t = \sigma^{-1} \left( \hat{c}_t - \hat{c}^*_t \right) + z_t , \tag{26}
\]
where $z_t = \sigma_z \epsilon_{z,t}$ with $\epsilon_{z,t} \sim$ i.i.d. $N(0, 1)$. The wedge serves the following purposes. First of all, the
estimation uses seven observable variables, which will be explained in the next section, whereas the original specification
has six shocks. To avoid stochastic singularity, I need one or more additional shocks. However, in
order to satisfy the first assumption of Assumption 2, I need to match the number of shocks and
the number of observable variables. Therefore, I need to put exactly one additional shock in the model.
Equation (25) is a good place to add an additional shock, since the exact relationship between
the real exchange rate and relative consumption does not hold in the data. The real exchange
rate and relative consumption do not have the close relationship predicted by the equation, which is
known as the Backus-Smith puzzle (Backus and Smith 1993). In order to estimate the model,
I need an error term to capture deviations from the exact relationship in (25). It is common in the
literature estimating an open economy model to add a shock to the purchasing power parity relation
or the uncovered interest rate parity equation under the incomplete asset market assumption. This
will have the same result as the wedge in (26).9
It turns out that the parameter estimates of the DSGE model and the DSGE-VAR model are not
very different from their calibrated values in the literature. Therefore, despite the ad-hoc wedge,
the estimated impulse response of the DSGE model is close to that of the calibrated DSGE model
which does not have the wedge. This is also true for the prior distribution of the DSGE-VAR
model.
Note that to apply the econometric framework of the paper, an additional shock does not necessarily have to enter (26). The VAR model will relax Equation (25) without the wedge. I need to
put an additional shock, but the additional shock could be, for example, a common shock to the
world-wide technology trend. What is important is to figure out possible sources of uncertainties
in the model and add the additional shock to a place that is most likely to have uncertainties not
captured by the model. Considering the apparent misspecification of (25), I let $z_t$ capture the
fluctuations of the real exchange rate which are not driven by the structure of the model. I could
obtain such a wedge by adding a preference shock to the households' utility, with opposite signs in
Home and Foreign. However, such a shock is no more structural, or better grounded in behavioral rules, than the
simple wedge $z_t$. I do not try to give $z_t$ a theoretical interpretation at this stage.
3.3. Shocks. The model has seven shocks: four of the shocks follow AR(1) processes and the two
monetary shocks and the wedge for the real exchange rate are assumed serially uncorrelated. The
seven innovations to the structural shocks and the wedge are assumed to be normally distributed
with zero mean and unit variance. They are mutually independent and serially uncorrelated. Their
standard deviation is 1, but they are multiplied by
feedback rule is based on the standard Taylor rule: $\psi_y$ and $\psi_\pi$ have prior distributions described
by Gamma distributions with mean 0.125 (0.5/4) and standard deviation 0.05, and mean 1.7 and
standard deviation 0.3, respectively. Private consumption expenditure accounts for about 67% of output
in the US and 57% in the Euro area. The steady state consumption-output ratio is assumed
to have a Beta distribution with mean 0.6 and standard deviation 0.05. For the shock processes,
I assume that all the shocks have a prior distribution with the same autocorrelation coefficient
but different standard deviations of their innovations. The autocorrelation coefficient has a Beta
distribution with mean 0.6 and standard deviation around 0.2.
4.1.3. Posterior simulation. For the DSGE-VAR model, I assume the precision matrix $S_0^2$ is $I_n$
and estimate the marginal likelihoods for four values of the precision hyperparameter, $\mu = 3^2, 5^2, 7^2, 9^2$, with a set of values of the weight hyperparameter $\lambda$,
$\{0.1, 0.2, \ldots, 1.5, 2, 3, 5\}$. The structure of the precision matrix implies that all the elements
of $A_0$ have the same degree of dispersion around their mean. I choose the $\lambda$ and $\mu$ with the highest
marginal likelihood as the optimal weights on the DSGE model.
I generate a simulation chain for each model (for each pair of $\lambda$ and $\mu$). I draw 3.15 million draws
of the structural parameters and $A_0$, but keep one draw in every three. Discarding the first 0.05 million of the saved draws, I have
1 million draws in the end. When I compute the impulse response functions, I use one draw in every
20 saved draws and use the resulting 50,000 draws. The thinning is done simply in consideration
of the physical memory and the storage space of a computer system.
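The draw counts in this paragraph can be reproduced with a few lines of arithmetic. The sketch below is in Python and assumes that the burn-in of 0.05 million is counted in saved (already thinned) draws, which is the reading under which the numbers add up to 1 million; the variable names are hypothetical.

```python
# Counts reported in the text.
total_draws = 3_150_000   # raw draws from the sampler
keep_every = 3            # keep one draw in every three
burn_in_saved = 50_000    # assumption: burn-in is measured in saved draws
irf_every = 20            # one in every 20 saved draws is used for impulse responses

saved = total_draws // keep_every     # draws written to storage
retained = saved - burn_in_saved      # draws left after burn-in
irf_draws = retained // irf_every     # draws used for impulse responses

# The same selection expressed as index slicing over the raw chain.
# Slicing a range is lazy, so no large array is allocated.
irf_index = range(total_draws)[::keep_every][burn_in_saved:][::irf_every]
```

Under the alternative reading, in which the burn-in is removed before thinning, the retained count would be slightly above 1 million, so the saved-draw interpretation seems the more likely one.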
4.2. Posterior distribution.
4.2.1. Marginal likelihoods. As explained above, rather than estimating the hyperparameters $\lambda$ and
$\mu$, I construct a set of DSGE-VAR models with different values of $\lambda$ and $\mu$ and choose the model with the
highest marginal likelihood. Let the set of models contain $M$ models indexed by $i$, $\mathcal{A} = \{A_i\}_{i=1}^{M}$.
The posterior probability for a model $A_i$ given the data $Y$ is computed as
\[
p(A_i | Y) = \frac{p(A_i) \, p(Y | A_i)}{\sum_{j=1}^{M} p(A_j) \, p(Y | A_j)} , \tag{27}
\]
where $\sum_{j=1}^{M} p(A_j) = 1$. The posterior probability of a model $A_i$ is the probability of
the model among the competing models in $\mathcal{A}$ when $Y$ is observed. Assuming a flat prior on the $A_i$,
or equivalently on the pairs $(\lambda, \mu)$, (27) becomes
\[
p(A_i | Y) = \frac{p(Y | A_i)}{\sum_{j=1}^{M} p(Y | A_j)} ,
\]
which means the rank of the models based on the posterior probability is equivalent to the rank
based on the marginal likelihood, $p(Y | A_i)$. With the flat prior over $\mathcal{A}$, a posterior odds ratio between any two models, $p(A_k | Y) / p(A_l | Y)$, is equal to the Bayes factor for the two models,
$p(Y | A_k) / p(Y | A_l)$. So I choose the DSGE-VAR model with the highest marginal likelihood as the
most likely model, or the model with the best fit, among the competing models in $\mathcal{A}$ given $Y$ and
condition subsequent analysis on that model. The pair $(\lambda, \mu)$ for the model can be interpreted
as an optimal weight on the DSGE model prior distribution. As An and Schorfheide (2007) put
it, this model selection is associated with a 0-1 loss function that gives a loss of one to choosing a
wrong model.
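The model selection rule above can be illustrated with a short computation. The following Python sketch converts log marginal likelihoods into posterior model probabilities; the log marginal likelihood values are invented for illustration, and the function name is hypothetical. Subtracting the maximum before exponentiating is a standard numerical safeguard rather than something described in the text.

```python
import math


def posterior_model_probs(log_ml, log_prior=None):
    """Posterior model probabilities from log marginal likelihoods.

    With log_prior=None a flat prior over models is used, so the ranking
    coincides with the ranking by marginal likelihood.
    """
    if log_prior is None:
        log_prior = [0.0] * len(log_ml)
    log_post = [lm + lp for lm, lp in zip(log_ml, log_prior)]
    shift = max(log_post)  # avoids underflow when exponentiating
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log marginal likelihoods for four competing models.
log_ml = [-1210.4, -1205.1, -1203.7, -1206.9]
probs = posterior_model_probs(log_ml)

# Under the flat prior, the selected model has the highest marginal likelihood.
best = max(range(len(log_ml)), key=lambda i: log_ml[i])

# Posterior odds of model 2 against model 1 equal the Bayes factor.
log_odds = math.log(probs[2] / probs[1])
log_bayes_factor = log_ml[2] - log_ml[1]
```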
Figure 1 displays the estimate of the log marginal likelihood $\log p(Y | A_i)$ for the DSGE model and
the DSGE-VAR models. As in DS and Del Negro, Schorfheide, Smets, and Wouters (2007), the log
marginal likelihood estimates form a smooth inverted U-shaped curve for each $\mu$, which is quite
flat around its peak. The log marginal likelihoods with $\mu = 5^2$ (implying standard deviation 0.2
around means) are uniformly higher than with $\mu = 9^2$, but similar to those of the case with $\mu = 7^2$.
When the prior on $A_0$ is loose ($\mu = 3^2$), the log marginal likelihoods are much smaller. This implies
that imposing a properly tight prior on $A_0$, and thus on the covariance matrix implied by $A_0$, is necessary to
fit the data. The DSGE-VAR model with $(\lambda, \mu) = (0.7, 5^2)$ has the highest marginal likelihood,
which means that the optimal length of hypothetical dummy observations, or the optimal weight on
the DSGE model prior, is 70% of the
length of the actual observations. However, as the posterior
DSGE-VAR model is not big enough to make a qualitative difference in estimated impulse response
functions. The posterior distribution for the DSGE-VAR model is close to that for the DSGE
model although there are a few exceptions. The posterior mean of the inverse of the intertemporal
elasticity of substitution, $\sigma^{-1}$, is 7.565 for the DSGE model, which is much larger than 4.423 for the
DSGE-VAR model. It is also larger than 5, the value found by Chari, Kehoe, and McGrattan (2002) to match
the volatility of the real exchange rate. The standard deviation of $z_t$, $100 \sigma_z$, is also quite large for
the DSGE model compared to that for the DSGE-VAR model. Even though I put the ad-hoc wedge
in the real exchange rate equation, the DSGE model needs a very large value of $\sigma^{-1}$ to amplify
the fluctuations of the consumption ratio and a very volatile wedge to fit the fluctuations of the
real exchange rate. However, the DSGE-VAR model has looser cross-equation restrictions
and thus can match the real exchange rate dynamics with $\sigma^{-1}$ and $100 \sigma_z$ closer to their prior
distribution. The posterior means of $\psi_y$ and $\psi^*_y$ are smaller for the DSGE model than for the DSGE-VAR model. Overall, the posterior distribution of the structural parameters for the DSGE-VAR
model is closer to the prior distribution than for the DSGE model. In the DSGE-VAR model, the
prior distributions are not revised by the data as much as in the DSGE model.
4.3. Identification and structural shocks.
4.3.1. Identification: initial responses to shocks. The identifying restrictions of the DSGE model are
imposed on the DSGE-VAR model through the prior distribution of the contemporaneous matrix
$A_0$. I present the marginal posterior density estimates of the initial responses of the variables to
a positive technology shock in the US in panel (a) and to an expansionary monetary policy shock
in panel (b) of Figure 2, respectively. The initial response is $H(\theta) M(\theta)$ in (1)-(2) for the DSGE
model and $A_0^{-1}$ in (3) for the DSGE-VAR model.
When the initial response in the DSGE model is strongly positive or negative, the DSGE-VAR
model tends to follow the sharp prediction by the DSGE model. For example, in response to
a positive shock to technology in the US, inflation in the US falls a lot and the interest rate is
lowered in both models. Key effects of a structural shock that characterizes the shock in the DSGE
model are well imposed on the DSGE-VAR model. For other variables, the posterior density of
the initial response in the DSGE-VAR model is more dispersed in general than in the DSGE
model. In particular, when the DSGE model has a highly concentrated posterior distribution, the
DSGE-VAR model is unable to match the degree of the concentration. For example, in Figure 2,
the DSGE model implies that log output in the Euro area, $y^*_t$, has a highly concentrated initial
response to a positive technology shock in the US. However, for the DSGE-VAR model, the posterior
distribution of the same variable is not as concentrated as in the DSGE model. Because of the
structure of the covariance matrix in the prior distribution of $A_0$, (7), all the elements of $A_0$ have
the same degree of dispersion around their respective means in the prior distribution. Therefore,
the marginal prior distribution of the initial response of variables can be wider in the DSGE-VAR
model than in the DSGE model. This is not a big concern since the posterior density for the
DSGE-VAR model overlaps the posterior density of the DSGE model for a significant posterior
probability. Also, it is possible that the data actually imply a dispersed distribution in some cases.
For example, in response to all the structural shocks, the posterior density of the initial response
of the log real exchange rate is more dispersed in the DSGE-VAR model than that in the DSGE
model. Considering the high volatility of the real exchange rate in the data and the fact that
estimates such as half-lives about the real exchange rate have large uncertainty, the DSGE-VAR
model appears to pick up volatile and noisy fluctuations of the real exchange rate.
Why does the posterior distribution of the initial response to a shock differ between the DSGE
model and the DSGE-VAR model? There are two reasons. First, the posterior distribution of the
structural parameters is different between the two models as explained in Section 4.2.2. Second,
$A_0$ of the DSGE-VAR model may deviate from its prior mean $A_0(\theta)$ of the DSGE model to fit
the data better. The data are informative on $A_0$ through its cross product. For example, in the
scalar case in Section 2.2, the data do not tell the sign of $A_0$, but have information on its absolute
size. Therefore, when the cross product of $A_0(\theta)$ of the DSGE model does not conform to the data, $A_0$ of the
DSGE-VAR model may deviate to match the data.
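The scalar case mentioned above can be made concrete. The sketch below is in Python and assumes the simplest scalar structural form, a0 * y_t = eps_t with eps_t distributed N(0, 1), which may differ in detail from the specification in Section 2.2; the data values are made up. The likelihood depends on a0 only through its square, so the data pin down the absolute size but not the sign.

```python
import math


def log_likelihood(a0, y):
    """Gaussian log-likelihood of the scalar model a0 * y_t = eps_t, eps_t ~ N(0, 1).

    The Jacobian term uses |a0|, so a0 enters only through |a0| and a0 ** 2.
    """
    n = len(y)
    sum_sq = sum(v * v for v in y)
    return n * math.log(abs(a0)) - 0.5 * a0 ** 2 * sum_sq - 0.5 * n * math.log(2.0 * math.pi)


# Hypothetical observations.
y = [0.4, -1.1, 0.7, 0.2, -0.5]

ll_pos = log_likelihood(2.0, y)
ll_neg = log_likelihood(-2.0, y)   # identical to ll_pos: the sign is not identified

# The likelihood is maximized at |a0| = sqrt(n / sum of squares), for either sign.
a0_abs_mle = math.sqrt(len(y) / sum(v * v for v in y))
ll_mle = log_likelihood(a0_abs_mle, y)
```

In this setting only the prior, here the one derived from the DSGE model, can determine the sign.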
4.3.2. Estimated structural shocks. Since the DSGE-VAR model follows the DSGE model for identification, the structural shocks $\epsilon_t$ of the DSGE-VAR model (3) are comparable to the structural
shocks $e_t$ of the DSGE model (1)-(2). Comparison of the estimated structural shocks of the two
models provides evidence that the DSGE-VAR model indeed recovers structural shocks of the
DSGE model although with some corrections. The estimated structural shocks of both models,
presented in panel (a) of Figure 3, closely follow each other except for the wedge for the real exchange rate. There are some differences, but most of the fluctuations occur at the same time in
the same direction. The scatter plots of panel (b) of the same figure confirm the finding.
Again, there are some differences between the two models for the same reasons as in the comparison of the initial response. The biggest disagreement exists for the wedge zt for the real exchange
rate dynamics. In the DSGE model, the wedge has a strong serial correlation as opposed to its
specification of being independent over time. In the DSGE-VAR model, the estimates of the wedge
are not completely different from those of the DSGE model, but they show weaker serial correlation. As the cross equation restrictions of the DSGE model are relaxed in the DSGE-VAR model,
more endogenous variables and their lags are involved and the real exchange rate dynamics become
richer. Therefore, the burden on the wedge to absorb what is not explained by endogenous variables
is reduced.
Panel (a) of Table 3 reports the posterior means and 90% error bands of the covariance and autocorrelation coefficients of the estimated structural shocks of the DSGE model. The DSGE model
assumes that the innovations to different structural shocks are mutually uncorrelated. However,
there are many cases where two innovations are significantly correlated. First, the innovations to
monetary policy shocks are significantly correlated with the innovations to other shocks, especially
the innovations to technology shocks and demand shocks. This suggests that the interest rate
fluctuations that are not explained by the systematic feedback rule are not purely random non-systematic policy shocks. The actual monetary reaction function of monetary authorities seems to
be more complicated than the Taylor-rule type feedback function of the DSGE model. For example,
if the Fed responds to output growth as well as to the output gap, then the increase of the interest
rates responding to rapid output growth due to demand shocks will be mostly captured by the
non-systematic monetary shock in the current setup of the monetary policy feedback rule. Next,
the wedge $z_t$ for the real exchange rate dynamics is correlated with the innovations to other shocks.
The wedge is supposed to be uncorrelated with other shocks and capture misspecifications that
exist only in the real exchange rate equation such as random noise in the foreign exchange market. The significant correlation with other shocks suggests that the real exchange rate is affected
by variables other than relative consumption and pure noise. Lastly, there is a significant
autocorrelation for all the innovations whereas they are supposed to be serially uncorrelated.
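Moments of this kind can be computed draw by draw and then summarized across posterior draws. The following Python sketch assumes the estimated shocks are stored as an array with one path per posterior draw and that the 90% error band is the equal-tailed interval between the 5th and 95th percentiles; the array layout, the function names, and the simulated input are all hypothetical stand-ins for the actual estimates.

```python
import numpy as np


def shock_moments(shocks):
    """Covariance matrix and first-order autocorrelations for each posterior draw.

    shocks has shape (n_draws, T, n_shocks): one estimated shock path per draw.
    """
    n_draws, _, n = shocks.shape
    cov = np.empty((n_draws, n, n))
    ac1 = np.empty((n_draws, n))
    for d in range(n_draws):
        e = shocks[d]
        cov[d] = np.cov(e, rowvar=False)            # columns are variables
        dem = e - e.mean(axis=0)
        ac1[d] = (dem[1:] * dem[:-1]).sum(axis=0) / (dem ** 2).sum(axis=0)
    return cov, ac1


def mean_and_band(x, lower=5, upper=95):
    """Posterior mean and equal-tailed band across draws (first axis)."""
    return x.mean(axis=0), np.percentile(x, lower, axis=0), np.percentile(x, upper, axis=0)


# Simulated stand-in: 200 draws, 120 periods, 3 shocks that are truly i.i.d. N(0, 1).
rng = np.random.default_rng(0)
shocks = rng.standard_normal((200, 120, 3))

cov, ac1 = shock_moments(shocks)
cov_mean, cov_lo, cov_hi = mean_and_band(cov)
ac_mean, ac_lo, ac_hi = mean_and_band(ac1)

# An off-diagonal covariance is flagged when its band excludes zero.
flagged = (cov_lo > 0) | (cov_hi < 0)
```

With correctly specified shocks, the off-diagonal bands and the autocorrelation bands should mostly cover zero, which is the benchmark against which the reported moments are read.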
Panel (b) of Table 3 reports the same posterior moments of the identified structural shocks of the
DSGE-VAR model. The covariance matrix of the structural shocks is assumed to be an identity
matrix in (3). Interestingly, the 90% error band for the covariance of any pair of the shocks
includes zero. This is not a joint test, but the individual elements of the estimated covariance
matrix of the structural shocks do not violate the specification with 90% probability. Although
the DSGE model from which we derive the identification scheme of the structural shocks violates
the specification of structural shocks, the DSGE-VAR model alleviates the problem by allowing
deviations from the DSGE model restrictions. The error bands of the autocorrelation coefficients
include zero except for the US monetary policy shock and the wedge, and even for these two
shocks the size of serial correlation is much smaller than in the DSGE model.
Comparison of estimated structural shocks raises a complicated issue on identification of a VAR
model based on a DSGE model. What if estimated structural shocks are very different between the
DSGE model and the VAR model? It is likely that the DSGE model is seriously misspecified in
this case and the VAR model picks up completely different dynamics. However, it may be that the
VAR model fails to follow the identifying restrictions of the DSGE model. I have two suggestions.
First, if it is the case that the DSGE model is seriously misspecified, then the estimate of the weight
on the DSGE model prior, $(\lambda, \mu)$, will be very small. Second, we can use the simulated draws of
the structural parameters from the posterior distribution of the DSGE model in order to construct
the posterior distribution of the VAR model conditional on the draws. This way, we can remove
one of the two possible channels through which the DSGE model and the VAR model differ and
examine the effects of relaxing the restrictions imposed by the DSGE model.
5.1. Impulse response functions of the real exchange rate. Panel (a) of Figure 4 plots the
posterior mean and 68% error bands of the impulse response to a positive technology shock in the
US. The first plot presents the impulse response of the real exchange rate. As in Steinsson (2008),
the real exchange rate in the DSGE model responds in a delayed, hump-shaped fashion to the shock.
As the positive technology shock cuts the real marginal cost, inflation drops and the interest rate
is lowered in response in the US. However, the interest rate is not adjusted fully on impact due to
the monetary policy smoothing and keeps falling for a couple of periods. Then, it starts to rise in
response to a boom in output. This hump-shaped response in the interest rate of the US makes the
real interest rate shoot over the steady state level on impact and then decline eventually below the
steady state. Since consumption is the negative of the expected sum of future real interest rates, the change
in the sign of the impulse response of the real interest rate causes consumption to increase initially
and then decline. According to equation (25), the real exchange rate follows this hump-shaped
response of consumption in the US. Consumption in the Euro area does not respond strongly and
the real exchange rate is mainly driven by consumption in the US.
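The link between the sign change in the real interest rate and the hump in consumption can be seen in a small perfect-foresight calculation. The Python sketch below assumes the log-linear Euler equation iterated forward, so that consumption equals minus the intertemporal elasticity of substitution times the sum of current and future real rates, with consumption returning to the steady state at the end of the horizon. The real rate path and the elasticity are invented for illustration.

```python
def consumption_path(real_rate, ies):
    """Consumption response implied by a real interest rate path.

    c_t = -ies * sum of real_rate[j] for j >= t, assuming perfect foresight
    and a return to the steady state after the last period.
    """
    path, tail = [], 0.0
    for r in reversed(real_rate):
        tail += r
        path.append(-ies * tail)
    return path[::-1]


# Hypothetical response: the real rate is above the steady state on impact,
# then falls below it before dying out.
real_rate = [0.4, 0.2, -0.1, -0.3, -0.3, -0.2, -0.1, -0.05]
c = consumption_path(real_rate, ies=0.2)

# Consumption rises while the real rate is positive and falls once it turns negative,
# so the peak occurs in the first period with a negative real rate.
peak = max(range(len(c)), key=lambda t: c[t])
```

Through (25), a hump of this kind in relative consumption translates directly into a hump in the real exchange rate.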
In the DSGE-VAR model, the real exchange rate has a similar response to that of the DSGE
model. It appears that the cross-equation restrictions of the DSGE model regarding the real
exchange rate are not at odds with the data. However, the impulse response of the nominal
exchange rate in the second plot and the impulse response of the price ratio in the third plot
suggest that, though the impulse response of the real exchange rate looks similar between the two
models, a mechanism by which the two models generate such a hump-shaped impulse response is
not the same. In the DSGE model, the US price level drops and the ratio of the Euro area price
to the US price rises sharply. The sharp increase of the price ratio counteracts the hump-shaped
response of the real exchange rate, and as a result the nominal exchange rate does not show a
hump in its response but decays exponentially. The DSGE-VAR model shows different responses.
The price ratio does not respond as strongly as in the DSGE model and as a result the nominal
exchange rate responds in the same delayed, hump-shaped way.
We can think of two possibilities for the difference between the DSGE model and the DSGE-VAR model. The first is that the DSGE model may not be able to generate a sluggish response of
the price ratio, specifically for the US price, in response to a positive technology shock. If this is
true, then the solution would be to address the persistence of inflation. The second possibility is
that the similar response of the real exchange rate in the two models is just a coincidence and the
two models have different real exchange rate dynamics. Panel (b) of Figure 4 provides evidence
supporting the second possibility.
Panel (b) of Figure 4 presents the posterior mean and 68% error bands of the impulse response to an expansionary
(negative) monetary policy shock in the US. Both real and nominal exchange rates respond in a
delayed, hump-shaped fashion to the shock in the DSGE-VAR model, whereas they overshoot and
decay rapidly in the DSGE model. The responses of both models are well known in the literature.
The identified VAR literature (see, for example, Eichenbaum and Evans 1995 and Scholl and Uhlig
2008) finds that the real and nominal exchange rates respond in such a way to a monetary policy
shock. Also, Steinsson (2008) acknowledges the inability of a sticky-price DSGE model to generate
a hump-shaped response to a monetary policy shock. As in the case of the technology shock, the DSGE-VAR
model predicts that the price ratio responds more weakly to the monetary shock than the DSGE
model does. However, fixing the response of the price ratio would not help address the problem of the
DSGE model not being able to generate a hump-shaped response of the real exchange rate as well
as the nominal exchange rate in the case of a monetary policy shock.
Because of the weak and sluggish response of the price ratio to a technology shock and a monetary
policy shock, the real and nominal exchange rates have similar responses to both shocks in the DSGE-VAR model. The following evidence suggests that a main cause of the problem is likely to exist in
the nominal exchange rate dynamics of the DSGE model. The log-linearized equilibrium conditions
of the DSGE model imply the uncovered interest parity (UIP)12
\[
E_t \Delta \hat{s}_{t+1} = r_t - r^*_t , \tag{28}
\]
where $\hat{s}_t$ is the log deviation of the nominal exchange rate from its steady state value and $r_t$ and
$r^*_t$ are the log deviations of the nominal interest rate in Home and in Foreign, respectively. Unless
there is a friction in international asset trades, a DSGE model always implies (28), whether asset
markets are complete or not. The condition (28) also holds regardless of the inflation dynamics or the
monetary policy rule. Although I computed the impulse response of the nominal exchange rate by
subtracting the price ratio from the real exchange rate in Figure 4, the resulting impulse response
satisfies (28).
Panel (a) of Figure 5 shows the posterior mean and 68% error bands of the impulse response of
the nominal exchange rate and the interest rate differential to a positive technology shock in the
US. In the DSGE model, the US interest rate goes down in response to falling inflation due to the
shock. The response of the interest rate is stronger in the US than in the Euro area and hence the
interest rate differential is negative. Therefore, by (28), the nominal exchange rate continues to fall.
On the contrary, in the DSGE-VAR model, the response of the interest rate differential is weaker
and the nominal exchange rate does not respond as predicted by (28). One way for the DSGE
model to generate a hump-shaped response of the nominal exchange rate is that the interest rate
differential falls below zero in the beginning and goes up gradually. The DSGE model is not likely
to be able to generate such a response of the interest rate differential with standard parameter
values, considering the fact that the shock originates in the US and the interest rate will respond
more strongly in the US.
The expected excess US dollar return of investing in a Euro bond over the return of investing in
a US dollar bond, presented in the third plot of Panel (a), confirms that the UIP does not hold in
the DSGE-VAR model. The (log) expected excess return $p_k$ between period 1 and period $k$ ($k \geq 2$)
is defined as
\[
p_k = \sum_{j=1}^{k-1} \left( r^*_j - r_j + 400 \left[ \hat{s}_{j+1} - \hat{s}_j \right] \right) .
\]
12 The model includes the wedge $z_t$ in the condition. However, the wedge is muted when I compute the impulse
The expected excess return is always zero for $k \geq 2$ in the DSGE model, but not in the DSGE-VAR
model. It is well known that the UIP does not hold in the data. Even though we take the DSGE
model to the data, its structure keeps it from deviating from the UIP. This is a good example of why
we have to be careful in interpreting a result from an estimated DSGE model.
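The definition of the expected excess return can be sketched in a few lines. This is only an illustration under stated assumptions: interest rates are taken to be annualized percentages, the exchange rate is in logs at a quarterly frequency, array index 0 holds period 1, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np


def expected_excess_return(r_home, r_foreign, s, k):
    """Cumulative expected excess return p_k between period 1 and period k (k >= 2).

    Assumptions (not from the paper): rates are annualized percentages,
    s is the log exchange rate, and array index 0 holds period 1.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    r_home = np.asarray(r_home, dtype=float)
    r_foreign = np.asarray(r_foreign, dtype=float)
    s = np.asarray(s, dtype=float)
    # Terms j = 1, ..., k-1 correspond to array indices 0, ..., k-2.
    differential = r_foreign[:k - 1] - r_home[:k - 1]
    depreciation = 400.0 * (s[1:k] - s[:k - 1])
    return float(np.sum(differential + depreciation))


def uip_exchange_rate_path(r_home, r_foreign, s1=0.0):
    """Exchange rate path under which 400 * (s_{j+1} - s_j) = r_j - r*_j holds."""
    step = (np.asarray(r_home, float) - np.asarray(r_foreign, float)) / 400.0
    return s1 + np.concatenate(([0.0], np.cumsum(step)))
```

With an exchange rate path built by `uip_exchange_rate_path`, every term of the sum cancels and `expected_excess_return` is zero for all k, which is the DSGE case described above. Any other path generally yields a nonzero excess return.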
The impulse responses to an expansionary monetary policy shock in the US are presented in Panel (b).
The impulse response of the interest rate differential is very similar in the two models. Again, however,
the nominal exchange rate decays exponentially in the DSGE model while it responds in a delayed,
hump-shaped way in the DSGE-VAR model. The expected excess return is not zero in the DSGE-VAR model.
5.2. Robustness exercise. I assume that the wedge for the real exchange rate, $z_t$, is
independent of the other structural shocks. To check whether this assumption is restrictive and drives
the result on the real exchange rate dynamics through its effects on the posterior distribution of the
structural parameters and the contemporaneous coefficient matrix, I relax the assumption and allow
the wedge to be correlated with the demand shocks $g_t$ and $g_t^{*}$. Consequently, the demand shocks
may enter the real exchange rate equation (26) and affect the real exchange
rate. With this more general shock structure, the DSGE-VAR model has a better chance of matching
the data. I do not allow the wedge to be correlated with the technology shocks or the monetary shocks,
so that these shocks remain identified. The DSGE-VAR model is now partially identified: shocks other
than the technology shocks and the monetary shocks are left unidentified. The robustness exercise can be
implemented by modifying the covariance matrix of the DSGE structural shock, $e_t$. I do not repeat
the estimation of all the DSGE-VAR models but estimate the DSGE-VAR model only for the hyperparameter
values 0.7 and $5^2$. Figure 6 displays the impulse responses of the log real exchange rate, the log nominal
exchange rate, and the log price ratio. All the impulse responses are similar to those obtained when the
wedge is assumed to be uncorrelated with the other shocks.
p (0 , | Y )
min
,1 .
p (0 , | Y )
Otherwise, keep .
(3) [Draw 0 ]
(a) Draw u U [0, 1].
(b) [M21 ] If u
2 +
1 , draw 0 from N +
() .
0 () , c2 V
probability
p (0 , | Y ) /q (0 |, M21 )
,1 .
min
p (0 , | Y ) /q (0 |, M21 )
Otherwise, keep 0 .
(c) [M22 ] If u > 1 , draw 0 from N 0 , c23 V + () . Keep 0 with the probability
p (0 , | Y )
,1 .
min
p (0 , | Y )
Otherwise, keep 0 .
(4) Go to (2).
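The draw in step (3) mixes two Metropolis-Hastings moves: with some probability an independence proposal centered at a mode, otherwise a random-walk proposal around the current draw. The sketch below shows that mixture for a scalar toy target. It is not the paper's sampler: the names `omega`, `c_indep`, `c_rw`, and the scalar setting are assumptions made for illustration only.

```python
import numpy as np


def log_normal_pdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def mixture_mh(log_post, x0, mode, sd_mode, n_draws,
               omega=0.5, c_indep=1.5, c_rw=0.5, seed=0):
    """Metropolis-Hastings with a random mixture of two proposal kernels.

    With probability omega: independence proposal N(mode, (c_indep*sd_mode)^2).
    Otherwise: random-walk proposal N(current, (c_rw*sd_mode)^2).
    All tuning names are hypothetical.
    """
    rng = np.random.default_rng(seed)
    x, lp = float(x0), float(log_post(x0))
    draws = np.empty(n_draws)
    for i in range(n_draws):
        if rng.uniform() <= omega:
            sd = c_indep * sd_mode
            prop = rng.normal(mode, sd)
            lp_prop = float(log_post(prop))
            # Independence move: the ratio compares posterior/proposal weights.
            log_ratio = (lp_prop - log_normal_pdf(prop, mode, sd)) \
                - (lp - log_normal_pdf(x, mode, sd))
        else:
            prop = rng.normal(x, c_rw * sd_mode)
            lp_prop = float(log_post(prop))
            # Symmetric random walk: the proposal densities cancel.
            log_ratio = lp_prop - lp
        if np.log(rng.uniform()) < min(log_ratio, 0.0):
            x, lp = prop, lp_prop
        draws[i] = x
    return draws
```

Each kernel leaves the target distribution invariant on its own, so choosing between them at random in every iteration does as well. The independence move helps the chain jump across the parameter space, while the random walk explores locally.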
Parameter  Description                                  Distribution  Mean, S.D.    90% interval
           Annualized steady-state real interest rate   Normal                      [0.976, 6.997]
                                                        Normal        0.5, 0.1      [0.337, 0.665]
                                                        Gamma                       [1.402, 4.540]
                                                        Beta          0.66, 0.1     [0.500, 0.829]
                                                        Beta          0.66, 0.1     [0.498, 0.825]
                                                        Gamma                       [1.392, 4.568]
                                                        Gamma                       [5.731, 12.140]
                                                        Gamma         1.5, 0.3      [0.998, 1.975]
                                                        Beta          0.94, 0.03    [0.895, 0.986]
                                                        Beta          0.7, 0.15     [0.467, 0.943]
                                                        Beta          0.7, 0.15     [0.463, 0.936]
                                                        Gamma         0.125, 0.05   [0.045, 0.201]
                                                        Gamma         0.125, 0.05   [0.048, 0.203]
                                                        Gamma         1.7, 0.3      [1.225, 2.192]
                                                        Gamma         1.7, 0.3      [1.194, 2.161]
                                                        Beta          0.6, 0.2      [0.284, 0.929]
                                                        Beta          0.6, 0.2      [0.292, 0.934]
                                                        Beta          0.6, 0.2      [0.287, 0.932]
                                                        Beta          0.6, 0.2      [0.290, 0.931]
                                                        Inv Gamma     10            [6.846, 13.034]
                                                        Inv Gamma     0.5           [0.068, 0.949]
                                                        Inv Gamma     0.5           [0.067, 0.941]
                                                        Inv Gamma     0.5           [1.712, 3.276]
                                                        Inv Gamma     0.5           [1.703, 3.270]
                                                        Inv Gamma     0.1, 0.3      [0.012, 0.190]
                                                        Inv Gamma     0.1, 0.3      [0.012, 0.188]
                                                        Normal        0.5           [1.169, 2.828]
                                                        Normal        0.5           [1.180, 2.822]
                                                        Gamma         0.3           [0.526, 1.481]
                                                        Beta          0.6, 0.05     [0.518, 0.682]
Parameter  Prior                           Posterior: DSGE                      Posterior: DSGE-VAR
           Mean, S.D.    90% HPD           Mean     S.D.    90% HPD             Mean     S.D.    90% HPD
                         [0.976, 6.997]    3.190    0.601   [2.191, 4.155]      1.145    0.575   [0.285, 1.942]
           0.5, 0.1      [0.337, 0.665]    0.344    0.032   [0.291, 0.397]      0.259    0.034   [0.208, 0.308]
                         [1.402, 4.540]    7.565    1.293   [5.454, 9.644]      4.423    1.066   [2.711, 6.118]
           0.66, 0.1     [0.500, 0.829]    0.528    0.046   [0.452, 0.604]      0.501    0.053   [0.414, 0.588]
           0.66, 0.1     [0.498, 0.825]    0.610    0.044   [0.537, 0.683]      0.617    0.049   [0.536, 0.696]
                         [1.392, 4.568]    3.192    0.762   [1.939, 4.395]      3.147    0.916   [1.660, 4.596]
                         [5.731, 12.140]   8.599    1.771   [5.707, 11.441]     7.564    1.686   [4.812, 10.258]
           1.5, 0.3      [0.998, 1.975]    1.213    0.247   [0.813, 1.611]      1.464    0.293   [0.978, 1.932]
           0.94, 0.03    [0.895, 0.986]    0.963    0.007   [0.951, 0.975]      0.964    0.012   [0.945, 0.983]
           0.7, 0.15     [0.467, 0.943]    0.908    0.012   [0.889, 0.929]      0.876    0.028   [0.832, 0.922]
           0.7, 0.15     [0.463, 0.936]    0.874    0.016   [0.848, 0.900]      0.827    0.036   [0.770, 0.886]
           0.125, 0.05   [0.045, 0.201]    0.056    0.014   [0.033, 0.078]      0.125    0.047   [0.050, 0.197]
           0.125, 0.05   [0.048, 0.203]    0.048    0.013   [0.028, 0.069]      0.126    0.046   [0.053, 0.197]
           1.7, 0.3      [1.225, 2.192]    2.110    0.282   [1.649, 2.569]      1.638    0.284   [1.170, 2.092]
           1.7, 0.3      [1.194, 2.161]    2.241    0.257   [1.814, 2.657]      1.647    0.278   [1.188, 2.091]
           0.6, 0.2      [0.284, 0.929]    0.981    0.005   [0.974, 0.989]      0.852    0.057   [0.764, 0.943]
           0.6, 0.2      [0.292, 0.934]    0.992    0.003   [0.988, 0.997]      0.897    0.053   [0.821, 0.975]
           0.6, 0.2      [0.287, 0.932]    0.811    0.048   [0.732, 0.889]      0.457    0.114   [0.270, 0.645]
           0.6, 0.2      [0.290, 0.931]    0.787    0.038   [0.725, 0.851]      0.474    0.132   [0.257, 0.693]
           10            [6.846, 13.034]   12.833   0.986   [11.223, 14.436]    5.585    0.734   [4.399, 6.755]
           0.5           [0.068, 0.949]    1.546    0.241   [1.158, 1.923]      1.243    0.235   [0.875, 1.606]
           0.5           [0.067, 0.941]    1.350    0.207   [1.016, 1.671]      1.202    0.229   [0.843, 1.560]
           0.5           [1.712, 3.276]    2.481    0.488   [1.705, 3.227]      2.802    0.576   [1.893, 3.687]
           0.5           [1.703, 3.270]    2.464    0.476   [1.714, 3.201]      2.659    0.538   [1.805, 3.476]
           0.1, 0.3      [0.012, 0.190]    0.130    0.010   [0.114, 0.146]      0.105    0.013   [0.084, 0.126]
           0.1, 0.3      [0.012, 0.188]    0.124    0.010   [0.107, 0.140]      0.092    0.010   [0.075, 0.108]
           0.5           [1.169, 2.828]    2.260    0.443   [1.538, 2.994]      1.830    0.418   [1.149, 2.522]
           0.5           [1.180, 2.822]    1.764    0.424   [1.057, 2.451]      2.036    0.437   [1.325, 2.760]
           0.3           [0.526, 1.481]    0.874    0.033   [0.821, 0.925]      0.913    0.124   [0.724, 1.108]
           0.6, 0.05     [0.518, 0.682]    0.511    0.051   [0.427, 0.596]      0.561    0.050   [0.478, 0.643]
(a) DSGE
Wedge        1.07
             [0.84, 1.32]
Demand       0.11            1.00
             [0.05, 0.17]    [0.76, 1.25]
             0.32            -0.06           1.02
             [0.26, 0.38]    [-0.15, 0.02]   [0.79, 1.28]
Technology   -0.08           -0.27           -0.02           1.00
             [-0.14, -0.01]  [-0.36, -0.19]  [-0.07, 0.04]   [0.77, 1.25]
             0.08            0.18            -0.13           0.35            1.00
             [0.03, 0.14]    [0.12, 0.25]    [-0.21, -0.05]  [0.24, 0.45]    [0.77, 1.25]
Monetary     0.33            0.45            0.11            -0.01           0.22            1.01
             [0.23, 0.43]    [0.36, 0.55]    [0.06, 0.16]    [-0.13, 0.11]   [0.16, 0.29]    [0.78, 1.27]
             0.26            -0.07           0.39            0.05            -0.07           0.21            1.01
             [0.18, 0.35]    [-0.10, -0.03]  [0.31, 0.47]    [-0.02, 0.12]   [-0.19, 0.04]   [0.15, 0.27]    [0.78, 1.27]
AR1          0.940           0.40            0.30            -0.21           -0.19           0.44            0.52
             [0.936, 0.945]  [0.36, 0.44]    [0.27, 0.33]    [-0.29, -0.09]  [-0.31, -0.05]  [0.40, 0.49]    [0.47, 0.58]

(b) DSGE-VAR
Wedge        0.81
             [0.60, 1.04]
Demand       0.00            1.01
             [-0.16, 0.15]   [0.79, 1.26]
             0.02            0.04            1.00
             [-0.13, 0.17]   [-0.11, 0.18]   [0.78, 1.26]
Technology   0.04            -0.01           0.02            1.04
             [-0.15, 0.23]   [-0.19, 0.16]   [-0.15, 0.19]   [0.80, 1.32]
             -0.04           0.06            -0.06           0.08            1.02
             [-0.21, 0.13]   [-0.10, 0.22]   [-0.23, 0.10]   [-0.09, 0.26]   [0.78, 1.28]
Monetary     -0.04           0.09            0.06            0.06            0.05            1.00
             [-0.20, 0.11]   [-0.06, 0.24]   [-0.09, 0.20]   [-0.11, 0.24]   [-0.10, 0.22]   [0.77, 1.25]
             -0.01           0.06            0.10            0.05            0.01            0.00            1.04
             [-0.15, 0.14]   [-0.08, 0.20]   [-0.04, 0.25]   [-0.12, 0.22]   [-0.16, 0.18]   [-0.14, 0.14]   [0.81, 1.30]
AR1          0.35            0.05            0.04            0.02            -0.10           0.14            0.12
             [0.17, 0.50]    [-0.10, 0.19]   [-0.11, 0.20]   [-0.14, 0.19]   [-0.26, 0.07]   [0.00, 0.27]    [-0.03, 0.27]
Table 3. Posterior means and 90% error bands of the covariance and autocorrelation coefficients of the innovations to structural shocks
Note: 90% error bands are [5% quantile, 95% quantile]. For the DSGE-VAR, the hyperparameters are set
to 0.7 and $5^2$. The posterior distribution of the DSGE structural shocks is constructed using the
Kalman smoother. The posterior distribution of the DSGE-VAR structural shocks is constructed by
identifying reduced-form residuals with the contemporaneous coefficient matrix.
(a) DSGE
Wedge
Output
Output*
Inflation
Inflation*
Interest rate
Interest rate*
Real exchange rate
Nominal exchange rate
Demand shock
Technology shock
0.62
86.95
2.26
9.01
0.72
0.41
0.03
[0.20, 1.29]
[75.92, 94.05]
[0.96, 4.13]
[2.94, 19.49]
[0.24, 1.53]
[0.15, 0.80]
[0.01, 0.07]
1.58
3.71
86.21
2.53
5.47
0.08
0.42
[0.58, 3.06]
[1.57, 6.71]
[78.25, 92.56]
[0.72, 5.86]
[2.30, 10.44]
[0.02, 0.18]
[0.19, 0.76]
6.15
0.55
0.16
82.38
0.58
10.10
0.07
[1.65, 12.63]
[0.04, 2.00]
[0.03, 0.42]
[71.86, 92.11]
[0.17, 1.26]
[4.30, 17.02]
[0.03, 0.14]
22.59
0.59
2.12
2.44
67.48
0.41
4.37
[11.48, 34.81]
[0.18, 1.28]
[0.39, 4.80]
[0.82, 5.29]
[54.05, 80.49]
[0.18, 0.72]
[2.19, 7.28]
0.69
2.28
0.04
83.31
0.81
12.81
0.06
[0.11, 1.73]
[0.22, 5.63]
[0.00, 0.13]
[67.74, 94.70]
[0.22, 1.84]
[2.89, 27.13]
[0.02, 0.12]
2.69
0.97
0.36
3.50
74.38
0.27
17.83
[0.93, 5.20]
[0.22, 2.26]
[0.04, 1.13]
[0.87, 8.33]
[59.73, 87.51]
[0.08, 0.58]
[7.50, 30.53]
0.00
36.55
36.36
19.04
7.69
0.22
0.14
[0.00, 0.00]
[21.49, 50.26]
[22.23, 48.89]
[4.15, 43.84]
[1.60, 19.39]
[0.05, 0.52]
[0.03, 0.34]
0.17
36.71
43.03
4.54
2.72
9.83
3.00
[0.06, 0.34]
[24.13, 49.29]
[31.48, 54.89]
[0.71, 10.61]
[0.76, 5.41]
[4.45, 17.88]
[1.12, 6.13]
(b) DSGE-VAR
Wedge
Output
Output*
Inflation
Inflation*
Interest rate
Interest rate*
Real exchange rate
Nominal exchange rate
Demand shock
Technology shock
5.79
30.94
4.87
14.57
17.74
4.23
21.87
[0.55, 18.29]
[11.67, 53.42]
[0.34, 15.29]
[1.48, 35.71]
[2.25, 39.25]
[0.31, 14.06]
[4.94, 42.50]
6.54
17.11
45.43
6.14
9.70
3.69
11.40
[0.65, 18.79]
[3.52, 36.09]
[24.75, 64.34]
[0.83, 16.98]
[1.13, 24.88]
[0.35, 12.08]
[1.63, 25.81]
12.68
9.00
6.22
46.85
9.44
9.91
5.90
[3.11, 29.03]
[2.47, 19.06]
[1.75, 13.46]
[27.28, 64.62]
[2.56, 20.47]
[3.05, 20.61]
[1.46, 13.61]
14.51
3.94
4.81
20.22
48.09
3.24
5.19
[2.80, 35.08]
[0.78, 10.16]
[0.96, 11.90]
[5.72, 38.63]
[27.55, 67.77]
[0.67, 8.20]
[1.46, 10.92]
12.77
16.54
7.38
14.52
4.35
36.55
7.89
[1.85, 31.03]
[5.30, 32.76]
[1.03, 18.35]
[4.03, 29.47]
[0.55, 12.81]
[18.45, 55.50]
[0.85, 20.28]
10.09
5.14
15.94
9.39
21.44
5.60
32.40
[0.86, 29.15]
[0.30, 17.26]
[3.11, 34.07]
[0.69, 27.81]
[4.89, 44.40]
[0.28, 18.69]
[12.10, 54.61]
17.20
9.28
7.99
13.21
16.43
14.04
21.85
[0.38, 54.00]
[0.12, 33.87]
[0.08, 28.51]
[0.15, 45.20]
[0.21, 49.72]
[0.14, 44.73]
[0.50, 54.69]
20.77
8.52
7.14
11.13
11.30
17.87
23.27
[0.49, 59.72]
[0.10, 31.19]
[0.06, 26.17]
[0.11, 39.88]
[0.12, 38.61]
[0.22, 51.93]
[0.60, 56.86]
Table 4. Posterior means and 90% error bands of the variance decomposition of
the forecast errors
Note: In percentage. The horizon is 3 years after the shocks. 90% error bands are [5% quantile, 95% quantile].
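A forecast error variance decomposition of this kind can be computed directly from structural impulse responses. The sketch below assumes orthogonal shocks normalized to unit variance and an impulse response array indexed as (horizon, variable, shock); the function name and array layout are illustrative choices, not the paper's code.

```python
import numpy as np


def fevd(irf, horizon):
    """Percentage shares of the forecast error variance at a given horizon.

    irf: array of shape (H, n_vars, n_shocks) with responses to orthogonal,
         unit-variance shocks (an assumption of this sketch).
    horizon: number of periods to accumulate, e.g. 12 quarters for 3 years.
    Returns an (n_vars, n_shocks) array whose rows sum to 100.
    """
    irf = np.asarray(irf, dtype=float)
    # Each shock's contribution is the sum of its squared responses.
    contribution = np.sum(irf[:horizon] ** 2, axis=0)
    total = contribution.sum(axis=1, keepdims=True)
    return 100.0 * contribution / total
```

Applying `fevd` to every posterior draw of the impulse responses and taking means and quantiles across draws would give entries of the form reported in the table.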
(a) DSGE
Up-life
Log real exchange rate
Half-life
Log real exchange rate
mean
median
90% error band
mean
median
90% error band
7.0
7.0
[5.0, 10.0]
0.0
0.0
[0.0, 0.0]
4.8
5.0
[3.0, 7.0]
0.0
0.0
[0.0, 0.0]
0.0
0.0
[0.0, 0.0]
0.0
0.0
[0.0, 0.0]
0.0
0.0
[0.0, 0.0]
0.0
0.0
[0.0, 0.0]
mean
median
90% error band
12.8
12.0
[9.0, 17.0]
10.5
10.0
[8.0, 13.0]
3.1
3.0
[3.0, 4.0]
3.1
3.0
[3.0, 4.0]
(b) DSGE-VAR
Up-life
Log real exchange rate
Half-life
Log real exchange rate
mean
median
90% error band
mean
median
90% error band
8.6
3.0
[0.0, 39.0]
8.8
3.0
[0.0, 39.0]
13.0
8.0
[0.0, 39.0]
13.3
8.0
[0.0, 39.0]
13.6
11.0
[0.0, 39.0]
16.3
13.0
[0.0, 39.0]
19.6
19.0
[0.0, 39.0]
20.9
20.0
[0.0, 39.0]
mean
median
90% error band
14.6
10.0
[0.0, 39.0]
18.2
16.0
[0.0, 39.0]
19.6
18.0
[0.0, 39.0]
24.5
26.0
[0.0, 39.0]
Table 5. Posterior mean, median and 90% error band of up-lives and half-lives
Note: Numbers are reported in quarters. Following Steinsson (2008), the up-life is defined as the time it
takes for the exchange rate to peak after the impact. The half-life is defined as the time it takes for the
exchange rate to fall below half of the initial response. 90% error bands are [5% quantile, 95% quantile].
For the DSGE-VAR, the hyperparameters are set to 0.7 and $5^2$.
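The two statistics defined in the note are simple functions of an impulse response path. The sketch below is one possible reading of those definitions, assuming a positive impact response stored at index 0. Returning `None` when the response never falls below half of its initial value is a choice made here, not something the paper specifies.

```python
import numpy as np


def up_life(irf):
    """Number of periods after impact at which the response peaks."""
    return int(np.argmax(np.asarray(irf, dtype=float)))


def half_life(irf):
    """First period at which the response is below half of the impact response.

    Assumes irf[0] > 0. Returns None if this never happens within the
    horizon covered by irf.
    """
    irf = np.asarray(irf, dtype=float)
    below = np.nonzero(irf < 0.5 * irf[0])[0]
    return int(below[0]) if below.size else None
```

For an exponentially decaying response the up-life is zero, while a hump-shaped response has a strictly positive up-life.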
[Figure: curves for the hyperparameter values $3^2$, $5^2$, $7^2$, and $9^2$ plotted on a rescaled axis; annotation "DSGE: 1001.956".]
          5^2      7^2      9^2
0.4       0.001    0.000    0.000
0.5       0.020    0.011    0.001
0.6       0.094    0.064    0.009
0.7       0.131    0.124    0.013
0.8       0.130    0.129    0.017
0.9       0.061    0.067    0.011
1.0       0.030    0.036    0.006
1.1       0.012    0.013    0.002
1.2       0.005    0.006    0.001
1.3       0.002    0.002    0.000
1.4       0.001    0.001    0.000
All the other models have a posterior probability lower than 0.0005.
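Posterior model probabilities like these follow from the log marginal data densities of the competing models. The sketch below shows the standard computation with a max-shift for numerical stability. Equal prior model probabilities are assumed unless `log_prior` is supplied, and the function name is hypothetical.

```python
import numpy as np


def posterior_model_probs(log_mdd, log_prior=None):
    """Posterior model probabilities from log marginal data densities.

    log_mdd: log marginal data densities, one per model.
    log_prior: optional log prior model probabilities (default: equal).
    """
    log_mdd = np.asarray(log_mdd, dtype=float)
    if log_prior is None:
        log_prior = np.zeros_like(log_mdd)
    weights = log_mdd + np.asarray(log_prior, dtype=float)
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = weights - weights.max()
    probs = np.exp(weights)
    return probs / probs.sum()
```

Because only differences in log marginal data densities matter, a gap of a few log points is enough to push a model's probability close to zero.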
[Figure panels comparing the DSGE and DSGE-VAR models. Recoverable labels: Time (1985 to 2005), panels (a) and (b), log(P*/P), Excess Return, and the interest rate differential r - r*.]
Figure 6. Impulse response functions of exchange rates and price ratios with a
general covariance structure
Note: The impulse response functions are computed from the DSGE-VAR model that allows a correlation of
the wedge in the real exchange rate equation $z_t$ with the demand shocks. The thick and thin lines represent
the pointwise means and 68% error bands, respectively. The error bands are [16% quantile, 84% quantile].
[Figure 7 panels. Recoverable labels: shocks aF, mF, mH, aH, gF, gH; horizontal axes: Time.]
Figure 7. Impulse response of the DSGE model and the DSGE-VAR model at the
posterior mode of the DSGE model
Note: The red solid lines represent the response of the DSGE model and the blue dashed lines represent the
response of the DSGE-VAR model. Each plot shows the response of the variable on the left-hand side
to the shock at the top. The impulse response of the DSGE-VAR model is computed with the DSGE model
prior strictly imposed. The scales of the y-axes for output growth, inflation, and interest rates are matched
between the US and the Euro area so that they are comparable.
References
An, S., and F. Schorfheide (2007): "Bayesian Analysis of DSGE Models," Econometric Reviews, 26(2),
113-172.
Backus, D., and G. Smith (1993): "Consumption and real exchange rates in dynamic economies with
non-traded goods," Journal of International Economics, 35(3-4), 297-316.
Benigno, G. (2004): "Real exchange rate persistence and monetary policy rules," Journal of Monetary
Economics, 51(3), 473-502.
Bergin, P., and R. Feenstra (2001): "Pricing-to-market, staggered contracts, and real exchange rate
persistence," Journal of International Economics, 54(2), 333-359.
Bouakez, H. (2005): "Nominal rigidity, desired markup variations, and real exchange rate persistence,"
Journal of International Economics, 66(1), 49-74.
Burstein, A., M. Eichenbaum, and S. Rebelo (2005): "Large Devaluations and the Real Exchange
Rate," Journal of Political Economy, 113(4), 742-784.
Calvo, G. (1983): "Staggered prices in a utility-maximizing framework," Journal of Monetary Economics,
12(3), 383-398.
Canova, F., and L. Sala (2009): "Back to square one: Identification issues in DSGE models," Journal of
Monetary Economics, 56(4), 431-449.
Chari, V. V., P. J. Kehoe, and E. R. McGrattan (2002): "Can Sticky Price Models Generate Volatile
and Persistent Real Exchange Rates?," Review of Economic Studies, 69(3), 533-563.
Del Negro, M., and F. Schorfheide (2004): "Priors from General Equilibrium Models for VARs,"
International Economic Review, 45(2), 643-673.
Del Negro, M., and F. Schorfheide (2006): "How good is what you've got? DSGE-VAR as a toolkit for
evaluating DSGE models," Economic Review, 91(2).
Del Negro, M., F. Schorfheide, F. Smets, and R. Wouters (2007): "On the Fit of New Keynesian
Models," Journal of Business & Economic Statistics, 25(2), 123-143.
Demmel, J., and P. Koev (2006): "Accurate and efficient evaluation of Schur and Jack functions,"
Mathematics of Computation, 75, 223-239.
Eichenbaum, M., and C. L. Evans (1995): "Some Empirical Evidence on the Effects of Shocks to
Monetary Policy on Exchange Rates," The Quarterly Journal of Economics, 110(4), 975-1009.
Engel, C. (1999): "Accounting for U.S. Real Exchange Rate Changes," Journal of Political Economy,
107(3), 507-538.
Fernández-Villaverde, J., J. F. Rubio-Ramírez, T. J. Sargent, and M. W. Watson (2007):
"ABCs (and Ds) of Understanding VARs," American Economic Review, 97(3), 1021-1026.
Geweke, J. (1999): "Using Simulation Methods for Bayesian Econometric Models: Inference, Development,
and Communication," Econometric Reviews, 18(1), 1-73.
Geweke, J. (2005): Contemporary Bayesian Econometrics and Statistics (Wiley Series in Probability and
Statistics). Wiley-Interscience, 1 edn.
Hamilton, J. D., D. F. Waggoner, and T. Zha (2007): "Normalization in Econometrics," Econometric
Reviews, 26(2), 221-252.
Ingram, B. F., and C. H. Whiteman (1994): "Supplanting the Minnesota prior: Forecasting macroeconomic
time series using real business cycle model priors," Journal of Monetary Economics, 34(3), 497-510.
Iskrev, N. (2010): "Local identification in DSGE models," Journal of Monetary Economics, 57(2), 189-202.
Justiniano, A., G. E. Primiceri, and A. Tambalotti (2010): "Investment shocks and business cycles,"
Journal of Monetary Economics, 57(2), 132-145.
King, R. G., C. I. Plosser, and S. T. Rebelo (1988): "Production, growth and business cycles: II.
New directions," Journal of Monetary Economics, 21(2-3), 309-341.
Koev, P., and A. Edelman (2006): "The efficient evaluation of the hypergeometric function of a matrix
argument," Mathematics of Computation, 75, 833-846.
Komunjer, I., and S. Ng (2010): "Dynamic Identification of DSGE Models."
Lubik, T., and F. Schorfheide (2006): "A Bayesian Look at New Open Economy Macroeconomics," chap. 5.
Magnus, J. R., and H. Neudecker (1999): Matrix Differential Calculus with Applications in Statistics
and Econometrics, 2nd Edition. John Wiley & Sons.
Muirhead, R. J. (1982): Aspects of multivariate statistical theory. Wiley.
Rabanal, P., and J. Rubio-Ramirez (2005): "Comparing New Keynesian models of the business cycle:
A Bayesian approach," Journal of Monetary Economics, 52(6), 1151-1166.
Scholl, A., and H. Uhlig (2008): "New evidence on the puzzles: Results from agnostic identification on
monetary policy and exchange rates," Journal of International Economics, 76(1), 1-13.
Sims, C., D. Waggoner, and T. Zha (2008): "Methods for inference in large multiple-equation
Markov-switching models," Journal of Econometrics, 146(2), 255-274.
Sims, C. A. (2005): "Dummy Observation Priors Revisited."
Sims, C. A. (2007): "Comment on 'On the Fit of New Keynesian Models' by Del Negro, Schorfheide, Smets and
Wouters," Journal of Business and Economic Statistics, 25(2), 152-154.
Sims, C. A., J. H. Stock, and M. W. Watson (1990): "Inference in Linear Time Series Models with
some Unit Roots," Econometrica, 58(1).
Sims, C. A., and T. Zha (2006): "Does monetary policy generate recessions?," Macroeconomic Dynamics,
10(02), 231-272.
Smets, F., and R. Wouters (2003): "An Estimated Dynamic Stochastic General Equilibrium Model of
the Euro Area," Journal of the European Economic Association, 1(5), 1123-1175.
Smets, F., and R. Wouters (2007): "Shocks and frictions in US business cycles: A Bayesian DSGE
approach," American Economic Review, 97(3), 586-606.
Steinsson, J. (2008): "The dynamic behavior of the real exchange rate in sticky price models," American
Economic Review, 98(1), 519-533.
Taylor, J. B., and V. Wieland (2009): "Surprising Comparative Properties of Monetary Models: Results
from a New Data Base," National Bureau of Economic Research Working Paper Series, pp. 14849+.
Tierney, L. (1994): "Markov Chains for Exploring Posterior Distributions," The Annals of Statistics, 22(4),
1701-1728.
Yun, T. (1996): "Nominal price rigidity, money supply endogeneity, and business cycles," Journal of
Monetary Economics, 37(2), 345-370.
for t = 1, , T where ut i.i.d. N (0, u ) and u = 00 0 . Following Sims, Stock, and Watson
(1990) I rewrite the model as
h
0
0
yt0 = 0 + 0 t + yt1
0 0 (t 1) B1 + + ytp
0 0 (t p) Bp + u0t ,
where
and
= 0 [I B1 Bp ]1 .
The demeaned and detrended series yt0 0 0 t is covariance stationary. That is, the roots of
I B1 z B2 z 2 Bp z p = 0
are greater than one in modulus. Let us define yt = yt0 0 0 t and rewrite the model as
0
yt0 = x0
t B + ut ,
0
0
0 , , y 0
and B = , , B10 , , Bp0 . Stacking the observations, I
where xt = 1, t, yt1
tp
obtain a matrix form representation
Y = X B + u,
where Y = (y1 , , yT )0 , X = (x1 , , xT )0 , and u = (u1 , , uT )0 .
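The stacking step can be illustrated with a short sketch that builds Y and X for a VAR with a constant and a linear time trend, using lags of the demeaned and detrended series as regressors. The names `phi0` and `phi1` for the intercept and trend coefficients, the function names, and the companion-matrix stationarity check are assumptions made for this illustration rather than the paper's notation or code.

```python
import numpy as np


def stack_var_with_trend(y, p, phi0, phi1):
    """Build Y (T x n) and X (T x (2 + n*p)) for a VAR(p) with constant and trend.

    y: array of shape (T + p, n); the first p rows are initial conditions
       for periods 1 - p, ..., 0.
    phi0, phi1: intercept and trend coefficients (hypothetical names) used
       to demean and detrend the lagged regressors.
    Row t of X is [1, t, ydev_{t-1}', ..., ydev_{t-p}'].
    """
    y = np.asarray(y, dtype=float)
    n_obs = y.shape[0] - p
    time = np.arange(1 - p, n_obs + 1)
    y_dev = y - np.asarray(phi0) - np.outer(time, phi1)
    lags = [y_dev[p - s: p - s + n_obs] for s in range(1, p + 1)]
    X = np.column_stack([np.ones(n_obs), time[p:]] + lags)
    return y[p:], X


def is_stationary(B_lags):
    """Check that all companion-matrix eigenvalues lie inside the unit circle.

    B_lags: list of n x n lag matrices in the convention y_t' = y_{t-1}' B_1 + ...
    """
    n, p = B_lags[0].shape[0], len(B_lags)
    companion = np.zeros((n * p, n * p))
    companion[:n, :] = np.hstack([np.asarray(B).T for B in B_lags])
    companion[n:, :-n] = np.eye(n * (p - 1))
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < 1.0)
```

The eigenvalue condition in `is_stationary` is equivalent to requiring the roots of the lag polynomial to be greater than one in modulus.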
0
Now suppose that we simulate data Y = yp+1 , , y0 , y1 , , yT of length T from the DSGE
accordingly. Then, the likelihood of the simulated data is
t , and X
model. Define yt , x
0
1
X X
B BT
,
p Y |B , u exp tr u B BT
2
1
= X
0 X
0 0
0 X
1
:
, u X
0 X
B |Y , u N B
T
1
As for the case with no deterministic time trend, I want to replace the simulated moments with
the population moments. I use the following rescaling matrix in Sims, Stock, and Watson (1990)13
1/2
T
0
T3/2
.
T1/2
T =
..
.
1/2
0
T
13The rescaling matrix is used for asymptotics as the sample size goes to infinity in Sims, Stock, and Watson (1990).
Then,
P
T2 Tt=1 t
0
2 PT
T
3
2
0
=
t=1 t T
t=1 t
T
0
P
T
1
0
0
T
2,t x
2,t
t=1 x
T
X
1 x
x
0 1
T
t t
t=1
1
2
1
3
0 T
0
0
1 PT
2,t x
0
t=1 x
2,t
0
0 , , y
0
where x
2,t = yt1
tp
. For simplicity, I approximate the sums of time indices. When T
is large enough, the approximation error is negligible. If I repeat the simulation of Y ofthe same
P
length, then the mean of T1 T x
x
0 in the lower right corner will converge to E x
x
0 .
t=1
2,t 2,t
2,t 2,t
T
X
1 1
1
x
t x
0
= u 1
t
T
T
T
T
0 X
u X
1
t=1
is substituted for by
u T QT
where
1
2T
1 2
3T
1
1 T
QT = 2
0
1
0
.
E x
2,t x
0
2,t
0
0
and time trend term separately. Let B2 = B1 , , Bp . Then,
B
2,T
T
1 X
= x
2,t x
0
2,t
T t=1
T
1X
x
2,t yt0 ,
T t=1
1
since with the demeaned
() = E x
and thus B
2,t x
0
E x
2,t yt0 can be used in place of B
2
2,t
2,T
and detrended yt the model is stationary. I let the coefficients on the constant and time trend
() = ( , ) of the DSGE model given . Then, the prior
term B1 = ( , )0 be equal to B
1
0
0
0
distribution of B = (B1 , B2 ) is
() , u T Q
B |0 N B
T
1
0
() = B
()0 , B
()0 . The distribution of the reduced-form coefficients and can
where B
1
2
=
be obtained from the result. Let B
B10
B = P B,
Bp0
0
. Then,
where
P =
1 0 (0 + 0 )
0 1
( 0 )
0 0
.. ..
. .
I
0 0
It follows that
(0 + p 0 )
( 0 )
B = (I P ) B ,
() = P B
(),
and thus for B
i1
h
(), u T P 0 1 Q P 1
B |0 N B
,
T
or
() , u T
B| N B
P0
1
QT P 1
i1
A.2. Probability density of the impact matrix in the DS identification method. The
density of the Inverted-Wishart prior for a covariance matrix u is proportional to
+n+1
1 h 2 1 i
1 2
(29)
p (u ) u
,
exp tr S u
2
where > n 1 and n is the dimension of u . S 2 is positive definite symmetric. The posterior
density in DS is of the same form. Note that the Jacobian of the transformation from u to its
Cholesky factor L is
n
n+1 Y
i
2
lii
,
2n LL0
i=1
i=1
This kernel is not invariant to the reordering of the variables because of the power i. Since DS
1
let 1
0 = L () with () fixed, the Jacobian of transforming L into 0 is 1. Therefore, the
kernel (29) for u results in the following density kernel for 1
0
p
1
0
(
"
#) n
1
Y
1 1 0 2
1
1
1 0
2
0 0 exp tr S 0 0
2
i=1
1
!i
X
0ik ()ki
, and
Note that
nT
2
p (Y |B, 0 , ) = (2)
h
i
0 T
0 2 exp 1 tr 0 0 (Y XB)0 (Y XB) ,
0
0
n
2
T xx , T
0
0
e
e
B B ()
,
T xx , T
0 B B ()
0 k
0 2
p (B|0 , ) = (2)
0
nk
2
1
exp tr
2
2
n2
p (0 |) = (2)
n
0
1
2 2
2
0 0 ()
,
S0 exp tr S0 0 0 ()
and,
p (B, 0 |Y, ) = (2)
n
2
T xx , T + X 0 X
0
be
be
,
0 0 B B
Txx , T + X 0 X B B
()
()
0 k
0 2
0
nk
2
1
exp tr
2
T
b2
be2 0
0 () 0 0 2 exp 1 tr S
Cn1 T + n, Se , S02 ,
0 0
0
2
0
1
0 ()
0 ()
0
.
exp tr S02 0
2
Therefore,
p (Y |) =
p (Y |B, 0 , ) p (B|0 , ) p (0 |)
p (B, 0 |Y, )
n
2
n
2
n
T xx , T (2) 2 S02 2
.
n
be2 2
2 1
0
T + n; S ; S0 ; 0 ()
T xx , T + X X Cn
(2)
nT
2
p (Y | B, 0 , ) p (B | 0 , )
p (B | Y, 0 , )
(2)
nT
2
2
n
T
b
2
T xx , T |00 0 | 2 exp 12 tr Se (00 0 )
.
n
2
T xx , T + X 0 X
0 2 exp 1 tr S
0 0
exp tr S0 0 0 ()
0 0 ()
0
2
2
is maximized with respect to 0 . Result (2) and (3) immediately follow from Theorem 5 in Appendix
B.
(30)
(31)
1
1
p
where B(L)
= In +
0 (1 L + + p L ); ut = 0 t is an n1 vector of reduced-form shocks and
1
1
ut N 0, (00 0 )
. Let Bs = 0 s for s = 1, , p and u = (00 0 )1 . The reduced-form
model (31) can be written compactly in matrix form
Y = XB + u,
(32)
0
0 , , y0
yt1
tp ; u = (u1 , , uT ) ; and
p (0 ) = Cn1 , Se2 , S 2 , 0
0
i
i
1 h
1 h
exp tr Se2 00 0
,
exp tr S 2 0 0 0 0
2
2
00 0
B|0 N B, 00 0
1
H 1 .
where > n and Se2 and S 2 are positive definite symmetric. Then the posterior distribution of 0
and B is
b2
p (0 |Y ) = Cn1 T + , Se2 + Se , S 2 , 0
0 T +n
1
be2
2
0
e
2
0 0
exp tr S + S
0 0
2
0
i
1 h 2
exp tr S
2
0 0
0 0
and
B|, Y N B, 00 0
1
X 0X + H
where
B =
0
1 0
X X +H
X Y + HB ,
1
b2
Se
0
= Y 0 Y + B 0 HB B X 0 X + H B.
When = n and Se2 = 0, that is, the prior for 0 is a normal distribution, the posterior distribution
for 0 and B is
be2 2
p (0 |Y ) =
T + n, S , S , 0
2
0 T
0
i
1
1 h 2
be
0
2
exp tr S 0 0 0 0
0 0 exp tr S 0 0
,
Cn1
and
B|, Y N B, 00 0
1
X 0X + H
1
X
X
+
H
exp
B
B
B
0
0 0
2
0
i
1 h
1
b2
00 0
exp tr S 2 0 0 0 0
,
exp tr Se2 + Se
2
2
where
0
1 0
X X +H
X Y + HB ,
0 0
0
0
B =
b
S
= Y Y + B HB B X X + H B.
Conditional on 0 and the data, the posterior distribution for B is straightforward. Theorem 10
gives us the marginal posterior density for 0 .
The result of the case where = n and Se2 = 0 is straightforward.
The normal prior for 0 in (7) is a special case of the prior in Proposition 4, with = n and
Se2 = 0. If we keep the kernel of the prior for 0 of the DS procedure as a probability density
kernel for 0 and modify it by multiplying a kernel of the normal prior, we will get the prior in
Proposition 4. The prior for 0 turns out to be conjugate. When we impose a flat prior on B, the
result holds with H = 0.
The posterior density function has a unique global maximum. Let us define the domain of the
prior and posterior of 0 , G = {0 Rnn : 0 is non-singular}.
Theorem 5. Suppose that > n; Se2 and S 2 are positive definite symmetric; and 0 is non-singular.
The function p : G R+ ,
p (0 ) = Cn1 , Se2 , S 2 , 0
0 n
0
i
1 h e2 0 i
1 h 2
2
0 0
exp tr S 0 0
exp tr S 0 0 0 0
+
0
( n) P
(33)
0
1
0
1
0
2
2
e
+ S + S In ,
where P is the commutation matrix such that P Z = Z 0 for a conformable matrix Z, is negative
semidefinite at 0 = +
0.
Proof. The first differential of the log density is
o 1 h
i
n n 0 1
d log p (0 ) =
d 00 0 tr Se2 d 00 0
tr 0 0
2
2
0
i
1 h 2
tr S d 0 0 0 0
2
o 1 h
i 1 h
i
0
n n 0 1 0
20 d0 tr Se2 200 d0 tr S 2 2 0 0 d0
=
tr 0 0
2
2 i
2
n
o
h
i
h
0
1
2
2
0
= ( n)tr 0 d0 tr Se 0 d0 tr S 0 0 d0
= tr
nh
2
e2 0
( n)1
0 S 0 S 0 0
0 i
d0 .
( n)01 Se2 00 S 2 0 0 = 0
which is equivalent to
(34)
Se2 + S 2
00 0 S 2 0 0 0 ( n)In = 0
since 0 is non-singular. Since Se2 and S 2 are positive definite symmetric and 0 is non-singular,
there exists a non-singular n n matrix X for any non-singular 0 such that
0 = X 0 S 2
Se2 + S 2
1
Plugging this into the first order condition (34) and rearranging the terms, we obtain
(35)
X 0 X X = ( n) S 2 0 0
1
Se2 + S 2
0 S 2
1
Note that X is symmetric, since X 0 X and the right hand side are symmetric. Let M X 12 In .
Then, M is also symmetric and X 0 X X = M 2 41 In . Therefore,
1
1
1
M 2 = In + ( n) S 2 0 0
Se2 + S 2 0 S 2
.
4
Let $Q$ denote the right hand side. Since $Q$ is positive definite symmetric, there exists a unique $S$ and
$\Lambda$ such that $S \Lambda S' = Q$ where $S$ is unitary, $S'S = I_n$, and $\Lambda$ is a diagonal matrix with eigenvalues of
$Q$ on the diagonal. Then $M$ is uniquely determined up to the sign change as $S \Lambda^{1/2} S'$. This implies
that there exist two $X$s satisfying (35) and therefore two solutions of the first order condition
(34). We let $M = S \Lambda^{1/2} S'$ and $X = \frac{1}{2} I_n \pm M$.
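The construction of M from the eigendecomposition of Q can be checked numerically. The sketch below computes the positive definite symmetric square root of a positive definite symmetric matrix with `numpy.linalg.eigh`; the function name is hypothetical and the example matrix is arbitrary.

```python
import numpy as np


def symmetric_sqrt(Q):
    """Positive definite symmetric square root M of Q, so that M @ M equals Q."""
    Q = np.asarray(Q, dtype=float)
    eigvals, S = np.linalg.eigh(Q)  # Q = S diag(eigvals) S'
    if np.any(eigvals <= 0):
        raise ValueError("Q must be positive definite")
    # Scale the columns of S by the square roots of the eigenvalues.
    return (S * np.sqrt(eigvals)) @ S.T
```

The two candidates for X in the proof then correspond to adding and subtracting this root from half the identity matrix.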
We have found two candidates for a local maximum
1
1
2
2
2
e
+
=
S
,
I
+
M
S
+
S
n
0
0
2
1
1
2
2
2
e
S
+
S
.
=
I
S
n
0
0
2
+
> log p
We prove that log p +
0
0 . The first order condition (34) holds at 0 and 0 and
thus for 0 = +
0 or 0 = 0 ,
i
1 h
+( n) log |0 | tr S 2 0 0 0 + ( n)In + S 2 0 0 0 ,
2
where || means the absolute value of the determinant of a matrix. Therefore,
i 1 h
h
i
+
+
2
0
)
=
(
n)
log
tr
S
.
log p(+
)
log
p(
log
+
0
0
0
0
0
0
0
2
Note that
M + 1 In log M 1 In
log +
log
=
log
0
0
2
2
n
X
log
i=1
1/2
1/2
1
2
,
1
2
where i is an eigenvalue of Q or the square of an eigenvalue of M . All i s are greater than 1/4.
1
Se2 + S 2
+
2
It follows that log +
0 log 0 > 0. Since 0 0 = 2M 0 S
tr
h
S 2 0 0
+
0 0
i
= 2tr
S 2 0 0 M 0 S 2
Se2 + S 2
1
> 0.
Consequently,
log p(+
0 ) log p(0 ) > 0.
Now we are going to show that log p (0 ) has a global maximum
and the first differential
+
d log p (0 ) = 0 at a global maximum. Since log p 0 > log p 0 , it will follow that log p (0 )
yi = tr U 0 00 0 U = tr 00 0
i=1
n
X
gi ,
i=1
which is followed by
i
tr Se2 00 0 = tr S U 0 00 0 U
n
X
si yi
i=1
min (si )
= min (si )
n
X
i=1
n
X
yi
gi .
i=1
Therefore,
log p (0 )
n
n
X
nX
1
log (gi ) min (si )
gi .
2 i=1
2
i=1
0
Let us define the right-hand side as h (0) with
the eigenvalues {g1 , , gn } of (0 0 ).
Now choose R such that < log p +
and define Gc = {0 G : h(0 ) } and G = the
0
+
two points +
and +
0 and 0 on G satisfying the first order condition, log p(0 ) > log p(0 ),
0 G,
+
+
we conclude that log p (0 ) attains a unique global maximum at 0 on G. log p 0 is a global
+
2
global maximum at +
0 on G and d log p 0 0.
The second differential of log p (0 ) is
2
d log p(0 ) = tr ( n)
2
1
d
0
0
2
2
0
e
+ S + S d0 d0
and the Hessian (33) of log p(0) isby Magnus and Neudecker (1999). The Hessian is negative
+
2
semidefinite at +
0 since d log p 0 0.
${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; z) = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_p)_k}{(b_1)_k \cdots (b_q)_k} \frac{z^k}{k!}$

${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; Z) = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{(a_1)_\kappa \cdots (a_p)_\kappa}{(b_1)_\kappa \cdots (b_q)_\kappa} \frac{C_\kappa(Z)}{k!}$

where $\sum_{\kappa}$ denotes summation over all partitions $\kappa$ of $k$ such that a partition $\kappa = (k_1, k_2, \ldots, k_m)$ with
integers $k_1 \geq k_2 \geq \cdots \geq k_m \geq 0$ and $|\kappa| = k_1 + k_2 + \cdots + k_m = k$; $C_\kappa(Z)$ is the zonal polynomial;
and
$(a)_\kappa = \prod_{i=1}^{m} \left( a - \frac{1}{2}(i - 1) \right)_{k_i}$
is the generalized hypergeometric coefficient where $(a)_k = a(a + 1) \cdots (a + k - 1)$.
The sum of the series converges for all symmetric $Z$ if $p \leq q$. When $m = 1$, the function reduces
to the hypergeometric function of scalar argument. The hypergeometric function defined here is a
special case of the hypergeometric function with the Jack function and the generalized Pochhammer
symbol. See Demmel and Koev (2006). Koev and Edelman (2006) propose an algorithm and provide
C code to compute the hypergeometric function of matrix argument, which they claim is the most
efficient among the algorithms developed until then.
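For the scalar-argument case, the series can be evaluated by direct truncation. The sketch below is a naive illustration only: it is not the Koev and Edelman algorithm, it does not handle the matrix-argument case, and the function names and the fixed number of terms are assumptions.

```python
import math


def pochhammer(a, k):
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1), with (a)_0 = 1."""
    out = 1.0
    for j in range(k):
        out *= a + j
    return out


def hyp_pfq(a, b, z, terms=60):
    """Truncated series for the scalar hypergeometric function pFq(a; b; z).

    a, b: sequences of upper and lower parameters (either may be empty).
    terms: number of series terms to sum (a fixed cutoff, chosen here for
           simplicity rather than by a convergence test).
    """
    total = 0.0
    for k in range(terms):
        num = math.prod(pochhammer(ai, k) for ai in a)
        den = math.prod(pochhammer(bi, k) for bi in b)
        total += num / den * z ** k / math.factorial(k)
    return total
```

Two familiar special cases serve as checks: with no parameters the series is the exponential function, and with equal upper and lower parameters the Pochhammer terms cancel, so the series is again the exponential.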
We quote a theorem on the expectation of the zonal polynomial of a Wishart
random matrix. It is Corollary 7.2.8 of Muirhead (1982).
Theorem 8. [Corollary 7.2.8 of Muirhead (1982)] If $A$ is $W_n(m, \Sigma)$ with $m > n - 1$ and $B$ is an
arbitrary symmetric $n \times n$ (fixed) matrix, then
$E[C_\kappa(AB)] = 2^k \left( \frac{1}{2} m \right)_\kappa C_\kappa(B\Sigma)$.
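As a quick sanity check on the statement, consider $k = 1$. The only partition is $\kappa = (1)$, for which the zonal polynomial is the trace, $C_{(1)}(X) = \operatorname{tr} X$, and the generalized hypergeometric coefficient is $(a)_{(1)} = a$. The theorem then reduces to the first moment of the Wishart distribution:

```latex
E[\operatorname{tr}(AB)]
  = 2 \cdot \tfrac{m}{2} \cdot \operatorname{tr}(B\Sigma)
  = m \operatorname{tr}(B\Sigma),
\qquad \text{which agrees with } E[A] = m\Sigma .
```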
tr
A
|A| 2 (dA)
0 1
2
A>0
is finite. Then, it is
1
;
b;
2B
2n/2 n (/2) ||/2 .
F
1 1
2
n
o
n1
2
n1
1
C (BA) exp tr 1 A |A| 2 (dA)
2
A>0
1
= 2k
C (B) 2n/2 n (/2) ||/2
2
where 2n/2 n (/2) ||/2 is the corresponding normalizing constant for the Wishart distribution.
The zonal polynomial C () depends only on the eigenvalues of its matrix argument. Since A is
non-singular, BA and AB have the same eigenvalues and we can apply Theorem 8 here. Therefore,
Z
n1
1 1
F
(b;
BA)
exp
tr
A
|A| 2 (dA)
0 1
2
A>0
Z
X
X
n1
C (BA)
1 1
=
exp tr A |A| 2 (dA)
(b) k!
2
A>0 k=0
=
=
X
X
k=0
X
X
k=0
X
X
1
(b) k!
n1
1
C (BA) exp tr 1 A |A| 2 (dA)
2
A>0
1
2k
(b) k!
1
2 C (2B)
(b) k!
k=0
1
; b; 2B 2n/2 n (/2) ||/2 .
2
In the fifth equation, we used the fact that the zonal polynomial with a partition of k is homogeneous of degree k: 2k C (B) = C (2B).
= 1 F1
The next theorem computes the integral of a kernel whose normalizing constant is of interest
later.
Theorem 10. Let
n
q () = 0
i
i
1 h
1 h
exp tr S 2 ( )0 ( ) ,
exp tr S 2 0
2
2
i1
n 1 2 0 2h 2
.
1 F1
; ; S S S + S2
2 2 2
where the integral is over the set of non-singular .
io
h
n h
0 n1
i
2 exp 1 tr S 2 + S 2 0
exp tr S 2 0 HT
2
1 h 2 0 i
0
0
n
d H dH 2 exp tr S
.
2
We integrate the last terms with respect to H Vn,n , the Stiefel manifold of n n matrices with
orthonormal columns. Then, from Theorem 7.4.1. of Muirhead (1982),
2
exp tr S 2 0 HT
io
H 0 dH
HVn,n
2n n /2
1 1 2 0 2
n; S S A ,
0 F1
n (n/2)
2 4
q () (d)
2
i
n /2
1 h
=
exp tr S 2 0
n (n/2)
2
Z
i
n1
1 1 2 0 2
1 h 2
2
2
F
n;
S
S
A
|A|
exp
tr
S
+
S
A (dA)
0 1
2 4
2
A>0
2
i
n /2
1 h
=
exp tr S 2 0
n (n/2)
2
/2
i1
n 1 2 0 2h 2
2
2
2n/2 n
F
;
;
S
S
S
+
S
,
S + S 2
1 1
2
2 2 2
which completes the proof.
We define a function
i
n (/2)
1 h
(36)
Cn , S , S , 2
exp tr S 2 0
n (n/2)
2
h
i1
n
1
2
2 2
2
0 2
2
2
S + S 1 F1
.
; ; S S S +S
2 2 2
Notice that when n = 1,
C1 , s , s , =
n
2
2
2
s + s2
n2
2
exp s2 2
2
2
1 F1
1 1 s2
s2 2 .
; ;
2 2 2 s2 + s2