
EVALUATION OF DSGE MODELS

WITH AN APPLICATION TO THE REAL EXCHANGE RATE DYNAMICS IN


STICKY-PRICE MODELS
WOONG YONG PARK
PRINCETON UNIVERSITY

Preliminary
Abstract. This paper develops a method to identify structural shocks of a vector autoregression
(VAR) model based on a dynamic stochastic general equilibrium (DSGE) model and uses the VAR
model as a reference model to evaluate the DSGE model. In an application, I study the real
exchange rate dynamics in sticky-price models. I identify a VAR model based on a two-country
sticky-price model and compare the impulse response functions of the two models. As is well known,
a sticky-price model is misspecified in its real exchange rate dynamics. I find evidence that the
source of misspecification likely exists in the nominal exchange rate dynamics of the model.

1. Introduction
The literature estimating a dynamic stochastic general equilibrium (DSGE) model by a full-information, likelihood-based approach has been growing rapidly for the last decade. It started
for the purpose of monetary policy analysis, but many interesting questions of macroeconomics
other than monetary policy analysis are now answered based on estimation of a DSGE model. The
structure of a DSGE model is based on economic theory and gives us a chance to provide theoretical
explanations for empirical findings as well as to take the theory to the data. The structure, however,
renders it important to evaluate a DSGE model. This paper proposes a Bayesian econometric
framework to evaluate a DSGE model and investigate its possible misspecifications. I apply the
framework to a two country sticky price model and investigate the real exchange rate dynamics of
the model.
A DSGE model completely specifies the dynamic properties of the data and does not allow the
dynamics to deviate from what is implied by its structure. Though we fit a structural model to the
data, the model is not able to capture dynamics beyond what changes in parameter values allow.
Because of this, two problems arise. First, it is difficult to assess how well the model fits the data
if we do not compare it with alternative models. We may settle for the model even when there is
an alternative model with better data fit. Second, if we do not consider another model that has
different specifications or desirably nests the original model, it is hard to detect misspecifications
that the model might have. This may mislead researchers seriously.
Date: October 2010.
Fisher Hall, Princeton University, NJ 08544. Email: woongp@Princeton.EDU.

The two problems are well known. Studies involving estimation of a DSGE model often conduct
robustness exercises of one kind or another with alternative specifications for the model. See, for
example, Justiniano, Primiceri, and Tambalotti (2010). There are also specific comparative studies
of different models. See, for recent examples, Rabanal and Rubio-Ramirez (2005) and Taylor and
Wieland (2009). However, constructing a sufficiently large set of alternative models for a DSGE
model is a big obstacle. Unlike a statistical model for which it is straightforward to come up with a
nesting model, a DSGE model does not have a straightforward DSGE model that nests the model.
Additional theoretical work is often required to develop a nesting model. If the set of competing
models is not large enough, the competing models may suffer a common misspecification and the
model comparison exercise would not be able to detect the misspecification.
Del Negro and Schorfheide (2004) (DS, henceforth) propose a Bayesian framework to use a vector
autoregression (VAR) model as a benchmark model. Since a VAR model does not have a priori
structural restrictions, it can serve as a convenient benchmark model. It is a common practice to
compare the fit and forecasting performance of a DSGE model with that of a VAR model. See, for
example, Smets and Wouters (2003) and Smets and Wouters (2007). DS use a VAR model as a
model that fits the data, but derive its prior distribution from a DSGE model of interest. Through
the prior distribution, the cross-equation restrictions and identification1 of the DSGE model are
imposed on the VAR model. As the prior distribution is systematically relaxed, the VAR model
captures richer dynamics than the DSGE model and we can evaluate the DSGE model by studying
the overall data fit and model characteristics of interest. The idea of inducing a prior for a VAR
model from a structural model dates back to Ingram and Whiteman (1994), and it is formalized
in the modern Bayesian framework by DS. DS and subsequent work (Del Negro and Schorfheide
2006 and Del Negro, Schorfheide, Smets, and Wouters 2007) show that combining a VAR model
with information from a DSGE model makes the VAR model have a better fit and forecasting
performance compared to the DSGE model.
It is central to model comparison and evaluation to compare impulse response functions to a
structural shock. Therefore, identification is important if we use a VAR model as a benchmark
model. However, the identification method of structural shocks of DS depends on the ordering of
the variables of a VAR model and hence the result changes as the ordering of the variables changes.
I propose an alternative identification method which is invariant to the ordering of the variables.
Identification of structural shocks of a VAR model is derived based on a DSGE model and therefore
impulse response functions to an identified structural shock are directly comparable to those of the
DSGE model.
1 The term identification is used with different meanings in the literature. In this paper, it means identifying reduced-form shocks to obtain structural shocks in a VAR model. Another meaning is identifying parameters of a model. The second kind of identification is also an important issue in DSGE model estimation. See, for example, Canova and Sala (2009), Iskrev (2010) and Komunjer and Ng (2010). Identifying reduced-form shocks is equivalent to identifying the contemporaneous coefficient matrix of a structural VAR (SVAR) model, and the second meaning often includes the first meaning.
I apply the framework to evaluate a seven-variable two country sticky price model. The application not only illustrates how the framework can be used for model evaluation, but also turns out to be interesting in its own right. Chari, Kehoe, and McGrattan (2002) document that in a sticky
price model with staggered price contracts monetary shocks can generate the high volatility of the
real exchange rate but cannot match the high persistence found in the data. Efforts to resolve the
persistence issue have tried various features that control price dynamics or explored alternative
monetary policy rules. See, for example, Bergin and Feenstra (2001), Benigno (2004), Bouakez
(2005), Steinsson (2008), and Carvalho and Nechio (2010). I find evidence that the problem is
likely to exist in the nominal exchange rate dynamics, or the uncovered interest rate parity implied
in the model. Recently, Steinsson (2008) claims that a sticky price model can match the persistence
of the real exchange rate by generating hump-shaped dynamics in response to a shock such as a
technology shock that enters the Phillips curve equation of the model. I find that the real exchange
rate responds in a delayed, hump-shaped fashion in response to the shock, but the mechanism by
which the model generates such hump-shaped dynamics is not supported by the data. Also, the real
exchange rate is found to respond in the same delayed, hump-shaped fashion to a monetary policy
shock as opposed to the prediction of the sticky price model. This is consistent with the findings
of the identified VAR literature. See, for example, Eichenbaum and Evans (1995) and Scholl and
Uhlig (2008). The empirical study of Steinsson (2008) uses a univariate model for the real exchange
rate and does not identify any structural shocks. Therefore, it does not directly support the claim.
In the application, I identify a technology shock and a monetary policy shock based on the DSGE
model and estimate the impulse response to the shocks.
The paper is organized as follows. Section 2 introduces the structure of a generic problem and
describes the empirical methodology in detail. The two country sticky price model is described in
Section 3 and the empirical results are reported in Section 4. I investigate the real exchange rate
dynamics in detail in Section 5. All the proofs of propositions in Section 2 are in the Appendix.

2. Methodology
Consider the following state space representation of a DSGE model:

[Measurement equation]
$$y_t = \mu + \delta t + H(\theta)\, s_t, \tag{1}$$
[Transition equation]
$$s_t = G(\theta)\, s_{t-1} + M(\theta)\, e_t, \tag{2}$$

for $t = 1, \ldots, T$, where $y_t$ is an $n \times 1$ vector of observable variables (data), $s_t$ is an $m \times 1$ vector of state variables, and $e_t$ is an $l \times 1$ vector of structural shocks that is i.i.d. $N(0, I_l)$. $\theta \in \mathbb{R}^h$ is a vector of structural parameters of the model, which determines the coefficient matrices $H(\theta)$ and $G(\theta)$ and the impact matrix $M(\theta)$. I refer to $\theta$ simply as the structural parameter in the following. $\mu$ and $\delta$ are vectors of means and deterministic growth rates, respectively, for $y_t$, and both are elements of $\theta$. A model might include error terms in the measurement equation, but I can always rewrite such a model in the form (1)-(2) by augmenting $s_t$ with the error terms.
In order to evaluate the DSGE model, I estimate an SVAR model with its prior distribution derived from the DSGE model. For the observable variables $y_t$ of the DSGE model, consider the following SVAR model with lag order $p$:
$$\Gamma(L)\,(y_t - \mu - \delta t) = \varepsilon_t, \tag{3}$$
where $\Gamma(L)$ is a $p$-th order matrix polynomial in the lag operator $L$, $\Gamma(L) = \Gamma_0 + \Gamma_1 L + \cdots + \Gamma_p L^p$; $\varepsilon_t$ is an $n \times 1$ vector of the structural shocks of the SVAR model, which is i.i.d. $N(0, I_n)$; $\mu$ is an $n \times 1$ vector of constant terms; and $\delta$ is an $n \times 1$ vector of coefficients on a deterministic time trend. The deterministic time trend is included since the DSGE model may imply that some of the observable variables have a deterministic time trend. When there is no deterministic time trend in the DSGE model, we can drop $\delta t$. I assume that the initial observations $Y_0 = \{y_0, \ldots, y_{-p+1}\}$ are given, but omit $Y_0$ in conditional densities henceforth. In order for $y_t$ to have a non-degenerate distribution, the contemporaneous coefficient matrix $\Gamma_0$ should be non-singular. The SVAR model (3) can be transformed into a reduced form by multiplying through on the left by $\Gamma_0^{-1}$:
$$B(L)\,(y_t - \mu - \delta t) = u_t, \tag{4}$$
or
$$B(L)\, y_t = \alpha + \beta t + u_t, \tag{5}$$
where $B(L) = I_n + \Gamma_0^{-1}(\Gamma_1 L + \cdots + \Gamma_p L^p)$. $u_t = \Gamma_0^{-1}\varepsilon_t$ is an $n \times 1$ vector of reduced-form disturbance terms and $u_t \sim$ i.i.d. $N(0, \Sigma_u)$, where $\Sigma_u = (\Gamma_0'\Gamma_0)^{-1}$. Let $B_s = -\Gamma_0^{-1}\Gamma_s$ for $s = 1, \ldots, p$, so that $B(L) = I_n - \sum_{s=1}^{p} B_s L^s$. Then $\beta = B(1)\,\delta$ and $\alpha = B(1)\,\mu + \left(\sum_{s=1}^{p} s B_s\right)\delta$. It is convenient for derivations later to write (5) compactly in matrix form
$$Y = XB + u, \tag{6}$$
where $Y = (y_1, \ldots, y_T)'$; $X = (x_1, \ldots, x_T)'$ with $x_t' = (1, t, y_{t-1}', \ldots, y_{t-p}')$; and $u = (u_1, \ldots, u_T)'$. The coefficient matrix $B$ is given as $B = (\alpha, \beta, B_1, \ldots, B_p)'$.
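The mapping from the mean and trend of the demeaned form (4) to the constant and trend coefficients of the reduced form (5) can be checked numerically. The sketch below assumes the sign convention $B(L) = I_n - \sum_s B_s L^s$ implied by the matrix form (6); the function name and all numbers are illustrative and not taken from the paper.

```python
import numpy as np

def reduced_form_intercepts(B_list, mu, delta):
    """Return (alpha, beta) such that y_t = alpha + beta*t + sum_s B_s y_{t-s} + u_t
    is equivalent to B(L)(y_t - mu - delta*t) = u_t with B(L) = I - sum_s B_s L^s."""
    n = len(mu)
    B_at_1 = np.eye(n) - sum(B_list)                         # B(1)
    sum_sB = sum(s * B for s, B in enumerate(B_list, start=1))
    beta = B_at_1 @ delta                                    # trend coefficient
    alpha = B_at_1 @ mu + sum_sB @ delta                     # constant term
    return alpha, beta

# Hypothetical example with n = 2 variables and p = 2 lags.
B_list = [np.array([[0.5, 0.1], [0.0, 0.3]]),
          np.array([[0.2, 0.0], [0.1, 0.1]])]
mu = np.array([1.0, -2.0])
delta = np.array([0.05, 0.02])
alpha, beta = reduced_form_intercepts(B_list, mu, delta)

# With u_t = 0, the deterministic path y_t = mu + delta*t must solve the reduced form.
t = 7
lhs = mu + delta * t
rhs = alpha + beta * t + sum(B @ (mu + delta * (t - s))
                             for s, B in enumerate(B_list, start=1))
```

Here `lhs` and `rhs` agree, which confirms that the two parameterizations describe the same deterministic component.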


A few assumptions on the DSGE model are in order. The assumptions are necessary since I
derive the prior distribution of the SVAR model based on the DSGE model.
Assumption. The following hold for the DSGE model (1):
(1) the number of structural shocks is the same as the number of the observable variables,
(2) the initial impact of the structural shocks on the observables conditional on $\theta$, $H(\theta) M(\theta)$, is non-singular,
(3) the model is invertible, that is, the eigenvalues of $G(\theta) - M(\theta)\,[H(\theta) M(\theta)]^{-1}\,[H(\theta) G(\theta)]$ are strictly less than one in modulus, and
(4) the demeaned and detrended observable variable $(y_t - \mu - \delta t)$ is stationary.
Assumption (1) and Assumption (2) are required since I estimate the SVAR model as a reference model. Assumption (3) is equivalent to Condition 1 of Fernández-Villaverde, Rubio-Ramírez, Sargent, and Watson (2007). When Assumption (3) is satisfied, the DSGE model implies that $y_t$ follows a VAR process of possibly infinite order. Therefore, it is possible to approximate the dynamics of the DSGE model (1) with a VAR model of finite order. With a sufficiently large lag order, the VAR model approximately nests the DSGE model. We can also recover $e_t$ by observing $y_t$ if we can identify $e_t$. Even if Assumption (3) does not hold, we may be able to recover the part of $e_t$ that we are interested in with good accuracy, as discussed in Sims and Zha (2006). In this paper, I only consider the case where Assumption (3) holds and thus a DSGE model has a VAR representation. Assumption (4) is necessary since the prior distribution for $B$ in (6) is derived using the unconditional population moments of $y_t$ in the DSGE model. It is stricter than what is actually necessary to derive the prior distribution. What I need is the finite-sample population moments of $y_t$ conditional on the initial observations, and thus it is fine to have non-stationary $y_t$. However, when Assumption (4) is satisfied, the computation required to derive the prior distribution of the VAR model becomes much easier. Also, if $y_t$ is non-stationary, Assumption (3) is violated in general. Note that Assumption (4) does not necessarily apply to the VAR model. The same assumptions are required for the DS framework.
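Assumptions (2) and (3) are easy to check numerically for a given solution of the model. The following is a minimal sketch, assuming the state space matrices $G$, $M$, and $H$ are available as NumPy arrays; the function name `is_invertible` and the MA(1) example are hypothetical illustrations, not part of the paper.

```python
import numpy as np

def is_invertible(G, M, H, tol=1e-10):
    """Check Assumptions (2)-(3): H M non-singular and all eigenvalues of
    G - M (H M)^{-1} (H G) strictly inside the unit circle."""
    HM = H @ M
    if abs(np.linalg.det(HM)) < tol:
        return False
    F = G - M @ np.linalg.solve(HM, H @ G)
    return bool(np.max(np.abs(np.linalg.eigvals(F))) < 1.0)

def ma1_state_space(psi):
    """State space form of y_t = e_t + psi * e_{t-1} with s_t = (e_t, e_{t-1})'."""
    G = np.array([[0.0, 0.0], [1.0, 0.0]])
    M = np.array([[1.0], [0.0]])
    H = np.array([[1.0, psi]])
    return G, M, H

invertible_case = is_invertible(*ma1_state_space(0.5))      # |psi| < 1
non_invertible_case = is_invertible(*ma1_state_space(2.0))  # |psi| > 1
```

For the MA(1) example the relevant eigenvalues are $-\psi$ and $0$, so the check reproduces the familiar condition $|\psi| < 1$.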
I use Bayesian inference to estimate the VAR model (6). The set of the parameters of the model consists of the structural parameter $\theta$, the contemporaneous coefficient matrix $\Gamma_0$, and the reduced-form coefficient matrix $B$. $\theta$ is included as a set of hyperparameters. It is a set of parameters for the prior distribution, and conditional on $B$ and $\Gamma_0$, the density of the data $Y$ does not depend on $\theta$. The prior density and the posterior density of (6) are factorized hierarchically as
$$p(B, \Gamma_0, \theta) = p(B \mid \Gamma_0, \theta)\, p(\Gamma_0 \mid \theta)\, p(\theta),$$
and
$$p(B, \Gamma_0, \theta \mid Y) = p(B \mid \Gamma_0, \theta, Y)\, p(\Gamma_0, \theta \mid Y),$$
respectively. In the following sections, I describe the prior distribution and derive the posterior
distribution from the prior distribution according to the hierarchical factorization. Then I discuss
how to simulate from the posterior distribution and do inference.
2.1. Prior distribution.
2.1.1. Prior distribution of $\theta$. The prior distribution of $\theta$ is not restricted to a specific form. The prior distribution of $\Gamma_0$ and $B$ is conditional on $\theta$ and does not depend on the specific form of the prior distribution of $\theta$. We can set a prior distribution of $\theta$ so that it is consistent with economic theories and related empirical studies, as in the literature estimating DSGE models.
2.1.2. Prior distribution of $\Gamma_0$. The prior distribution of $\Gamma_0$ is defined conditional on $\theta$. The identification of the SVAR model is characterized by the contemporaneous coefficient matrix, $\Gamma_0$. The contemporaneous impact of a shock to $\varepsilon_t$ on $y_t$ is computed as
$$\left.\frac{\partial y_t}{\partial \varepsilon_t'}\right|_{\text{SVAR}} = \Gamma_0^{-1}.$$
I assume that the prior distribution of $\Gamma_0$ is centered around the identifying restrictions of the DSGE model. The initial impact of a shock to $e_t$ on $y_t$ conditional on $\theta$ in the DSGE model is
$$\left.\frac{\partial y_t}{\partial e_t'}\right|_{\text{DSGE}} = H(\theta) M(\theta),$$
which will be denoted by $\bar\Gamma_0^{-1}(\theta)$. Its inverse, $\bar\Gamma_0(\theta) = [H(\theta) M(\theta)]^{-1}$, determines the contemporaneous relationship of $y_t$. Therefore, I assume that conditional on $\theta$, $\Gamma_0$ is centered around $\bar\Gamma_0(\theta)$:
$$\Gamma_0 \mid \theta \sim N\!\left(\bar\Gamma_0(\theta),\; S_0^{-2} \otimes I_n\right), \tag{7}$$
where the precision matrix $S_0^2$ is an $n \times n$ positive definite symmetric matrix.$^2$ Note that with this prior $\Gamma_0$ is non-singular with probability 1. The Kronecker product structure of the covariance matrix in (7) is assumed for the tractability of the resulting posterior distribution of $\Gamma_0$. It is somewhat restrictive, however. The covariance matrix $S_0^{-2} \otimes I_n$ implies that $\mathrm{Var}(\mathrm{vec}\,\Gamma_0) = S_0^{-2} \otimes I_n$ or $\mathrm{Var}(\mathrm{vec}\,\Gamma_0') = I_n \otimes S_0^{-2}$, where $\mathrm{vec}$ is the vectorization operator. That is, the rows of $\Gamma_0$ are independent of each other, all the rows have the common covariance matrix $S_0^{-2}$, and all the elements in each column have the same variance. In the following application, I simply let $S_0^2 = \nu I_n$, which means that all the elements of $\Gamma_0$ have the same degree of dispersion around their mean. The hyperparameter $\nu$ controls the weight of the DSGE model identifying restrictions. As $\nu$ becomes larger, the DSGE model identification is more tightly imposed.
The identification problem arises in a reduced-form VAR model since there are uncountably many possible decompositions $\Gamma_0$ or $\Gamma_0^{-1}$ from a reduced-form covariance matrix $\Sigma_u$ of the disturbance term. Even though we have a decomposition $\Gamma_0$ of $\Sigma_u$, any transformation $\Gamma_0^{*}$ satisfying $(\Gamma_0^{*\prime}\Gamma_0^{*})^{-1} = \Sigma_u$ preserves the likelihood of the reduced-form VAR model. The possibility of such an invariant, likelihood-preserving transformation of $\Gamma_0$ is eliminated by concentrating the prior distribution of $\Gamma_0$ around the mean $\bar\Gamma_0(\theta)$ and penalizing deviations from it. In principle, any distribution that can be centered around $\bar\Gamma_0(\theta)$ will work. I use a normal distribution for the tractability of the prior and posterior distribution of $\Gamma_0$. The fact that we cannot pin down $\Gamma_0$ from $\Sigma_u$ means that the data have information on $\Gamma_0$ only through $\Sigma_u$. Thus a dummy observations prior that uses artificial observations simulated from a DSGE model cannot be used for the identification.
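The likelihood-preserving transformation described above can be illustrated with a small numerical sketch. It assumes a hypothetical $3 \times 3$ contemporaneous matrix and a planar rotation, both chosen only for illustration: the rotated matrix implies the same reduced-form covariance matrix, while a prior kernel centered at the original matrix penalizes the rotation.

```python
import numpy as np

# Hypothetical contemporaneous matrix (strictly diagonally dominant, hence non-singular).
Gamma0 = np.array([[2.0, 0.3, 0.0],
                   [0.1, 1.5, 0.2],
                   [0.0, 0.4, 1.0]])

# An orthogonal rotation in the plane of the first two coordinates.
a = 0.7
Q = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
Gamma0_rot = Q @ Gamma0

def sigma_u(G):
    """Reduced-form covariance matrix implied by a contemporaneous matrix G."""
    return np.linalg.inv(G.T @ G)

def log_prior_kernel(G, G_bar, S0_sq):
    """Log kernel of a normal prior for G centered at G_bar with column precision S0_sq."""
    d = G - G_bar
    return -0.5 * np.trace(S0_sq @ d.T @ d)

S0_sq = 4.0 * np.eye(3)   # illustrative precision matrix
same_sigma = np.allclose(sigma_u(Gamma0), sigma_u(Gamma0_rot))
kernel_at_mean = log_prior_kernel(Gamma0, Gamma0, S0_sq)
kernel_rotated = log_prior_kernel(Gamma0_rot, Gamma0, S0_sq)
```

Both matrices give the same covariance matrix, so the data cannot distinguish them; only the prior kernel does, which is the role played by the normal prior centered at the DSGE-implied matrix.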
A DSGE model might impose zero restrictions on some elements of $\bar\Gamma_0(\theta)$. Such zero restrictions cannot be handled with this type of prior since $\Gamma_0$ should have a non-degenerate distribution with no restrictions. But this limitation should not be a big concern. First, for a large-scale or medium-scale model, it is difficult to know a priori which element is restricted to zero. Also, to solve the model, we rely on a numerical solution method which inevitably incurs numerical errors in the solution. It is hard in practice to figure out all the zero restrictions in a numerically computed solution. Second, though the elements of $\Gamma_0$ restricted to zero are assumed to have a non-degenerate distribution, in contrast to the DSGE model, their dispersion will in general be smaller than the dispersion of the non-zero elements. This is because the mean of the elements restricted to zero will always be zero while the mean of the non-zero elements will move around due to the variation of $\theta$.

2 Here and in what follows, the hyperparameter $S_0^2$ is suppressed for notational simplicity.
DS also use a DSGE model for identification. DS decompose the impact matrix $\bar\Gamma_0^{-1}(\theta)$ of the DSGE model using the QR decomposition as
$$\bar\Gamma_0^{-1}(\theta) = \bar L(\theta)\,\bar\Omega(\theta),$$
where $\bar L(\theta)$ is lower triangular and $\bar\Omega(\theta)$ is unitary: $\bar\Omega(\theta)'\,\bar\Omega(\theta) = \bar\Omega(\theta)\,\bar\Omega(\theta)' = I_n$. $\bar\Omega(\theta)$ is called a rotation matrix. Since we assume $\bar\Gamma_0^{-1}(\theta)$ is non-singular, the decomposition is unique if we require that $\bar L(\theta)$ have positive diagonal elements. The identification method of DS is to use the rotation matrix $\bar\Omega(\theta)$ in order to identify $\Gamma_0$ from $\Sigma_u$, the covariance matrix of the reduced-form shocks. Note that we can find a lower triangular $L$ such that $\Sigma_u = LL'$ using the Cholesky decomposition. Then we can identify the impact matrix $\Gamma_0^{-1}$ from $\Sigma_u$ by multiplying $L$ by $\bar\Omega(\theta)$: $\Gamma_0^{-1} = L\,\bar\Omega(\theta)$. This operation preserves the covariance matrix and the likelihood since $\bar\Omega(\theta)$ is unitary. This method of identification is intuitively appealing since we decompose $\Sigma_u$ into a Cholesky factor and rotate it as the DSGE model rotates its lower triangular matrix. However, the method depends on the ordering of the variables in a VAR model, which is problematic since the impulse response of a variable to a shock may change depending on where the variable is placed in the vector of observable variables. As Sims (2007) pointed out, the method treats some elements of the impact matrix as deterministic while treating others as stochastic. For example, when the impact matrix is diagonal, some of the off-diagonal elements are treated as deterministic while other off-diagonal elements are treated as stochastic. In the appendix, I show that the density of $\Gamma_0^{-1}$ varies as the ordering of the variables changes.
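The ordering dependence of the Cholesky-plus-rotation identification can be seen in a two-variable sketch. The code below is an illustration under stated assumptions, not the DS implementation: the DSGE impact matrix `A_dsge` and the reduced-form covariance matrix `Sigma_u` are hypothetical numbers, and the helper names are made up for this example.

```python
import numpy as np

def lq(A):
    """Factor A = L @ Omega with L lower triangular (positive diagonal), Omega orthogonal."""
    Q, R = np.linalg.qr(A.T)            # A.T = Q R, hence A = R.T Q.T
    D = np.diag(np.sign(np.diag(R)))    # sign flips that make diag(L) positive
    return R.T @ D, D @ Q.T

def rotated_cholesky_impact(Sigma_u, A_dsge):
    """Impact matrix chol(Sigma_u) @ Omega, with Omega taken from the DSGE impact matrix."""
    _, Omega = lq(A_dsge)
    return np.linalg.cholesky(Sigma_u) @ Omega

A_dsge = np.array([[1.0, 0.5], [0.3, 1.0]])    # hypothetical DSGE impact matrix
Sigma_u = np.array([[1.0, 0.2], [0.2, 2.0]])   # hypothetical VAR covariance matrix
Pi = np.array([[0.0, 1.0], [1.0, 0.0]])        # permutation that swaps the two variables

impact = rotated_cholesky_impact(Sigma_u, A_dsge)
impact_swapped = rotated_cholesky_impact(Pi @ Sigma_u @ Pi.T, Pi @ A_dsge)
impact_back = Pi.T @ impact_swapped            # undo the reordering for comparison
```

Both `impact` and `impact_back` reproduce `Sigma_u`, yet they are different matrices, so the implied impulse responses depend on how the variables are ordered.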
2.1.3. Prior distribution of B. For the prior distribution of the reduced-form coefficient matrix B,
I follow DS. The prior distribution of B is motivated as a dummy observations prior. The idea is
that simulated data from the DSGE model given $\theta$, or dummy observations, contain information
on the dynamics of the DSGE model and we can incorporate the information by augmenting the
actual data with the simulated data and fitting the VAR model to the augmented data. The more
simulated data are added, the stronger the dynamics of the DSGE model will be imposed on the
VAR model.
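I first consider a case where there is no deterministic time trend ($\delta = 0$) and $y_t$ is stationary in the DSGE model. Suppose that we simulate artificial observations of length $\tilde T$, denoted by $\tilde Y_{\tilde T} = (\tilde y_1, \ldots, \tilde y_{\tilde T})'$, from the DSGE model given $\theta$.$^3$ Let us construct $\tilde x_t$ and $\tilde X_{\tilde T}$ accordingly as in (6). Then, the likelihood of the simulated data is
$$\begin{aligned}
p(\tilde Y_{\tilde T} \mid B, \Gamma_0) &= (2\pi)^{-n\tilde T/2}\, |\Sigma_u|^{-\tilde T/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma_u^{-1}\,(\tilde Y_{\tilde T} - \tilde X_{\tilde T} B)'(\tilde Y_{\tilde T} - \tilde X_{\tilde T} B)\right]\right\} \\
&\propto \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\Sigma_u^{-1}\,(B - \tilde B_{\tilde T})'\,\tilde X_{\tilde T}'\tilde X_{\tilde T}\,(B - \tilde B_{\tilde T})\right]\right\},
\end{aligned}$$
where $\Sigma_u = (\Gamma_0'\Gamma_0)^{-1}$ and $\tilde B_{\tilde T} = (\tilde X_{\tilde T}'\tilde X_{\tilde T})^{-1}\tilde X_{\tilde T}'\tilde Y_{\tilde T}$. Conditional on $\Gamma_0$ and the simulated data $\tilde Y_{\tilde T}$, this likelihood can be interpreted as a posterior density of $B$ with a flat prior when the simulated data are observed. It is a normal distribution with mean $\tilde B_{\tilde T}$ and variance-covariance matrix $\Sigma_u \otimes (\tilde X_{\tilde T}'\tilde X_{\tilde T})^{-1}$:
$$B \mid \tilde Y_{\tilde T}, \Gamma_0 \sim N\!\left(\tilde B_{\tilde T},\; (\Gamma_0'\Gamma_0)^{-1} \otimes (\tilde X_{\tilde T}'\tilde X_{\tilde T})^{-1}\right).$$
3 A variable with a tilde stands for simulated observations.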

DS propose to use this posterior distribution of $B$ with the simulated data as a prior distribution of $B$.$^4$ However, it is not practical to simulate $\tilde Y_{\tilde T}$ every time because of noise from simulation. DS substitute the simulated moments $\tilde X_{\tilde T}'\tilde X_{\tilde T}$ and $\tilde X_{\tilde T}'\tilde Y_{\tilde T}$ with their means over repeated simulation samples. Suppose that we repeat simulation of the same length and generate $\{\tilde Y_{\tilde T}^m\}_{m=1}^{M}$, where $m$ is an index for a simulation. Let us construct $\{\tilde x_t^m\}_{m=1}^{M}$ and $\{\tilde X_{\tilde T}^m\}_{m=1}^{M}$ accordingly. Then, by the law of large numbers, as $M \to \infty$,
$$\frac{1}{M}\sum_{m=1}^{M} \tilde X_{\tilde T}^{m\prime}\tilde X_{\tilde T}^{m} = \frac{1}{M}\sum_{m=1}^{M}\sum_{t=1}^{\tilde T} \tilde x_t^m\,(\tilde x_t^m)' \;\xrightarrow{p}\; \tilde T\, E(x_t x_t'), \tag{8}$$
and
$$\frac{1}{M}\sum_{m=1}^{M} \tilde X_{\tilde T}^{m\prime}\tilde Y_{\tilde T}^{m} = \frac{1}{M}\sum_{m=1}^{M}\sum_{t=1}^{\tilde T} \tilde x_t^m\,(\tilde y_t^m)' \;\xrightarrow{p}\; \tilde T\, E(x_t y_t'), \tag{9}$$
where the expectations are conditional on the DSGE model and its parameter $\theta$. Since the DSGE model is stationary, the limiting moments in (8) and (9) exist and can be easily computed given $\theta$. I substitute the limiting moments for the simulated moments. Conditional on $\Gamma_0$ and $\theta$, the prior distribution of $B$ is
$$B \mid \Gamma_0, \theta \sim N\!\left(\tilde B(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\tilde T\, E(x_t x_t')\right]^{-1}\right), \tag{10}$$
where $\tilde B(\theta) = E(x_t x_t')^{-1} E(x_t y_t')$. The prior distribution implies that the cross-equation restrictions of the VAR model are centered around those of the DSGE model. The length of the simulated data $\tilde T$ plays the same role as in DS: it controls how tightly the DSGE cross-equation restrictions are imposed. For ease of interpretation, I use the ratio of the lengths of the simulated data and the actual data, $\lambda = \tilde T / T$, instead of $\tilde T$. The prior distribution (10) is a natural conjugate prior: the posterior distribution of $B$ is also a normal distribution. The prior distribution of $B$ is not limited to a natural conjugate prior. For example, it is possible to use a generalized dummy observations prior by Sims (2005).
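As a rough illustration of how the limiting moments can be computed without simulation, the sketch below handles the simplest case only: zero means, no trend, and a VAR with one lag, so that $x_t = y_{t-1}$. These simplifications and the function name are assumptions of this example. The state covariance solves the discrete Lyapunov equation, and the prior mean of $B$ is $E(x_t x_t')^{-1} E(x_t y_t')$.

```python
import numpy as np

def dsge_var1_prior_mean(G, M, H):
    """Population moments and prior mean of B for a VAR(1) without constant,
    given s_t = G s_{t-1} + M e_t, y_t = H s_t, e_t ~ N(0, I)."""
    m = G.shape[0]
    # Solve Sigma_s = G Sigma_s G' + M M' via vec(Sigma_s) = (I - G kron G)^{-1} vec(M M').
    vec_sigma = np.linalg.solve(np.eye(m * m) - np.kron(G, G), (M @ M.T).reshape(-1))
    Sigma_s = vec_sigma.reshape(m, m)
    Exx = H @ Sigma_s @ H.T              # E[y_{t-1} y_{t-1}']
    Exy = H @ Sigma_s @ G.T @ H.T        # E[y_{t-1} y_t']
    return np.linalg.solve(Exx, Exy), Exx

# Hypothetical example in which y_t = s_t is itself a VAR(1) with coefficient G.
G = np.array([[0.5, 0.1], [0.0, 0.3]])
B_tilde, Exx = dsge_var1_prior_mean(G, np.eye(2), np.eye(2))
```

In this special case the prior mean `B_tilde` equals `G.T`, since the matrix form $Y = XB + u$ stacks the transposed lag coefficients.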
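4 Their original framework derives a prior distribution for $\Sigma_u$ as well.
When the DSGE model has a deterministic time trend, I consider the autoregressive coefficient matrix and the coefficients on the constant and time trend terms separately. Consider the demeaned and detrended form (4). Let $y_t^* = y_t - \mu - \delta t$ and $x_t^* = (1, t, x_{2,t}^{*\prime})'$ with $x_{2,t}^* = (y_{t-1}^{*\prime}, \ldots, y_{t-p}^{*\prime})'$. Then, since $y_t^*$ is stationary, the mean of the autoregressive coefficient matrix $B_2 = (B_1, \ldots, B_p)'$ can be obtained as $\bar B_2(\theta) = E(x_{2,t}^* x_{2,t}^{*\prime})^{-1} E(x_{2,t}^* y_t^{*\prime})$. A DSGE model is usually solved by normalizing the variables with a deterministic time trend when there is one. It is straightforward to compute the moments $E(x_{2,t}^* x_{2,t}^{*\prime})$ and $E(x_{2,t}^* y_t^{*\prime})$ from the DSGE model. I let the coefficients on the constant and time trend terms, $B_1^* = (\mu, \delta)'$, be equal to $\bar B_1^*(\theta) = (\mu(\theta), \delta(\theta))'$ of the DSGE model given $\theta$. Then, the prior distribution of $B^* = (B_1^{*\prime}, B_2')'$ is
$$B^* \mid \Gamma_0, \theta \sim N\!\left(\tilde B^*(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\tilde T\, Q_{\tilde T}\right]^{-1}\right), \tag{11}$$
where $\tilde B^*(\theta) = \left(\bar B_1^{*\prime}(\theta), \bar B_2'(\theta)\right)'$ and
$$Q_{\tilde T} = \begin{pmatrix} 1 & \tfrac{1}{2}\tilde T & 0 \\ \tfrac{1}{2}\tilde T & \tfrac{1}{3}\tilde T^2 & 0 \\ 0 & 0 & E(x_{2,t}^* x_{2,t}^{*\prime}) \end{pmatrix}.$$
The distribution of $\alpha$ and $\beta$ in (5) is computed from (11). Note that
$$B = P B^*,$$
where
$$P = \begin{pmatrix} 1 & 0 & (-\mu' + \delta') & \cdots & (-\mu' + p\,\delta') \\ 0 & 1 & (-\delta') & \cdots & (-\delta') \\ 0_{np \times 1} & 0_{np \times 1} & & I_{np} & \end{pmatrix}.$$
It follows that
$$\mathrm{vec}\, B = (I_n \otimes P)\, \mathrm{vec}\, B^*,$$
and thus for $\tilde B(\theta) = P \tilde B^*(\theta)$,
$$\mathrm{vec}\, B \mid \Gamma_0, \theta \sim N\!\left(\mathrm{vec}\, \tilde B(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\tilde T\, (P')^{-1} Q_{\tilde T} P^{-1}\right]^{-1}\right),$$
or
$$B \mid \Gamma_0, \theta \sim N\!\left(\tilde B(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\tilde T\, (P')^{-1} Q_{\tilde T} P^{-1}\right]^{-1}\right). \tag{12}$$
The derivation is explained in detail in the Appendix.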

In summary, the prior distribution of the parameters $\theta$, $\Gamma_0$, and $B$ is set up hierarchically. First, the prior distribution of $\theta$ does not have to follow a specific form. Second, the prior distribution of $\Gamma_0$ is derived based on the DSGE model, (7). Lastly, the prior distribution of $B$ is derived through the use of a dummy observations prior:
$$B \mid \Gamma_0, \theta \sim N\!\left(\tilde B(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\lambda T\, \Sigma_{xx}(\theta, \lambda T)\right]^{-1}\right), \tag{13}$$
where $\tilde B(\theta)$ is given in (10) and $\Sigma_{xx}(\theta, \lambda T) = E(x_t x_t')$ when the DSGE model does not have a deterministic time trend, and $\tilde B(\theta)$ is given in (11) and $\Sigma_{xx}(\theta, \lambda T) = (P')^{-1} Q_{\lambda T} P^{-1}$ when the DSGE model has a deterministic time trend.


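2.2. Posterior distribution. With the prior distribution (7) of $\Gamma_0$ conditional on $\theta$ and the prior distribution (13) of $B$ conditional on $\theta$ and $\Gamma_0$, the joint posterior distribution of $\Gamma_0$ and $B$ conditional on $\theta$ can be found hierarchically.
Proposition 1. For the model (3), assume that $\Gamma_0$ and $B$ have the prior distributions (7) and (13), respectively, conditional on $\theta$. Then the conditional posterior density for $\Gamma_0$ is
$$\begin{aligned}
p(\Gamma_0 \mid Y, \theta) = {} & C_n^{-1}\!\left(T + n, \hat{\tilde S}^2, S_0^2, \bar\Gamma_0(\theta)\right) |\Gamma_0'\Gamma_0|^{T/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\hat{\tilde S}^2\, \Gamma_0'\Gamma_0\right]\right\} \\
& \times \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[S_0^2\, \left(\Gamma_0 - \bar\Gamma_0(\theta)\right)'\left(\Gamma_0 - \bar\Gamma_0(\theta)\right)\right]\right\}
\end{aligned} \tag{14}$$
and the conditional posterior distribution for $B$ is
$$B \mid Y, \Gamma_0, \theta \sim N\!\left(\hat{\tilde B}(\theta),\; (\Gamma_0'\Gamma_0)^{-1} \otimes \left[\lambda T\, \Sigma_{xx}(\theta, \lambda T) + X'X\right]^{-1}\right), \tag{15}$$
where $C_n(\cdot)$ is defined in (36) in the Appendix and
$$\hat{\tilde S}^2 = Y'Y + \tilde B'(\theta)\left[\lambda T\, \Sigma_{xx}(\theta, \lambda T)\right]\tilde B(\theta) - \hat{\tilde B}'(\theta)\left[\lambda T\, \Sigma_{xx}(\theta, \lambda T) + X'X\right]\hat{\tilde B}(\theta),$$
$$\hat{\tilde B}(\theta) = \left[\lambda T\, \Sigma_{xx}(\theta, \lambda T) + X'X\right]^{-1}\left[\lambda T\, \Sigma_{xx}(\theta, \lambda T)\, \tilde B(\theta) + X'X\, \hat B\right],$$
and
$$\hat B = (X'X)^{-1} X'Y.$$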
The posterior density (14) shows how the identification derived from a DSGE model works. It is non-standard in that its kernel is a multiplicative mixture of a Wishart density kernel and a normal density kernel. This is because the prior on $\Gamma_0$ directly assumes a probability distribution over $\Gamma_0$ while the likelihood depends only on its cross product, $\Gamma_0'\Gamma_0$. The part that is a kernel of the Wishart distribution,
$$|\Gamma_0'\Gamma_0|^{T/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\hat{\tilde S}^2\, \Gamma_0'\Gamma_0\right]\right\}, \tag{16}$$
comes from the likelihood and the part that is a kernel of a normal distribution,
$$\exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[S_0^2\, \left(\Gamma_0 - \bar\Gamma_0(\theta)\right)'\left(\Gamma_0 - \bar\Gamma_0(\theta)\right)\right]\right\}, \tag{17}$$
emerges from the prior distribution. The fact that the likelihood depends only on $\Gamma_0'\Gamma_0$ is the fundamental reason for the identification problem of a reduced-form VAR model. Here, (17) distinguishes different $\Gamma_0$'s which have the same cross product and thus preserve the likelihood. For example, suppose that $\Gamma_0$ is a scalar. Then, the kernel (16) is symmetric around $\Gamma_0 = 0$ and has two modes, one on $\mathbb{R}_+$ and the other on $\mathbb{R}_-$. If $\bar\Gamma_0(\theta) > 0$, then (17) will put large probability mass on $\mathbb{R}_+$. Combining (17) with (16), the mode on $\mathbb{R}_+$ has a higher density and the mode on $\mathbb{R}_-$ is assigned a lower density.
The posterior density has a unique global maximum near the prior mean (see Proposition 3). The density is non-standard, but we can compute its normalizing constant $C_n(\cdot)$ using a technique for a non-central Wishart distribution. The details are described in the Appendix. The following proposition computes the marginal posterior distribution of $\theta$ after integrating out $\Gamma_0$ and $B$ using an identity implied by the Bayes rule.

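Proposition 2. For the model (3), assume that $\Gamma_0$ and $B$ have the prior distributions (7) and (10), respectively, conditional on $\theta$. Assume that $\theta$ has a prior density $p(\theta)$. Then, the data density conditional on $\theta$ is
$$p(Y \mid \theta) = \frac{(2\pi)^{-nT/2}\, |\lambda T\, \Sigma_{xx}(\theta, \lambda T)|^{n/2}\, (2\pi)^{-n^2/2}\, |S_0^2|^{n/2}}{|\lambda T\, \Sigma_{xx}(\theta, \lambda T) + X'X|^{n/2}\; C_n^{-1}\!\left(T + n; \hat{\tilde S}^2; S_0^2; \bar\Gamma_0(\theta)\right)}, \tag{18}$$
where $C_n(\cdot)$ is defined in (36) in the Appendix. Its marginal posterior density is proportional to
$$q(\theta \mid Y) = p(Y \mid \theta)\, p(\theta). \tag{19}$$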

As in DS, for posterior simulation, I can draw $\theta$ first and then, conditional on the draw of $\theta$, sample $\Gamma_0$ and $B$. However, in the following application, it turns out that the computation of (18) is prohibitively slow because of the function $C_n(\cdot)$, and this hierarchical simulation is not practical. Alternatively, it is possible to draw $\theta$ and $\Gamma_0$ jointly and then sample $B$ directly from its posterior distribution conditional on $\Gamma_0$ and $\theta$. By drawing $\theta$ and $\Gamma_0$ together, I don't have to integrate out $\Gamma_0$ and there is no need to evaluate the function $C_n(\cdot)$. The following proposition presents a kernel of the joint posterior distribution of $\theta$ and $\Gamma_0$ and its unique global maximum. The posterior distribution of $B$ conditional on $\theta$ and $\Gamma_0$ is the same as in (15).
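Proposition 3. For the model (3), assume that $\Gamma_0$ and $B$ have the prior distributions (7) and (10), respectively, conditional on $\theta$. Assume that $\theta$ has a prior density $p(\theta)$. Then, the following hold:
(1) The data density conditional on $\Gamma_0$ and $\theta$ is
$$p(Y \mid \Gamma_0, \theta) = \frac{(2\pi)^{-nT/2}\, |\lambda T\, \Sigma_{xx}(\theta, \lambda T)|^{n/2}\, |\Gamma_0'\Gamma_0|^{T/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[\hat{\tilde S}^2\, (\Gamma_0'\Gamma_0)\right]\right\}}{|\lambda T\, \Sigma_{xx}(\theta, \lambda T) + X'X|^{n/2}} \tag{20}$$
and the joint posterior density of $\Gamma_0$ and $\theta$ is proportional to
$$q(\Gamma_0, \theta \mid Y) \equiv p(Y \mid \Gamma_0, \theta)\, p(\Gamma_0 \mid \theta)\, p(\theta), \tag{21}$$
where
$$p(\Gamma_0 \mid \theta) = (2\pi)^{-n^2/2}\, |S_0^2|^{n/2} \exp\left\{-\tfrac{1}{2}\,\mathrm{tr}\left[S_0^2\, \left(\Gamma_0 - \bar\Gamma_0(\theta)\right)'\left(\Gamma_0 - \bar\Gamma_0(\theta)\right)\right]\right\}.$$
(2) For a fixed $\theta$, $q(\Gamma_0, \theta \mid Y)$ is maximized at
$$\Gamma_0^{+}(\theta) = \left(\tfrac{1}{2} I_n + M\right) \bar\Gamma_0(\theta)\, S_0^2 \left(\hat{\tilde S}^2 + S_0^2\right)^{-1},$$
where $M$ is the unique symmetric square root of
$$\tfrac{1}{4} I_n + T \left\{\bar\Gamma_0(\theta)\, S_0^2 \left(\hat{\tilde S}^2 + S_0^2\right)^{-1} S_0^2\, \left[\bar\Gamma_0(\theta)\right]'\right\}^{-1}.$$
(3) For a fixed $\theta$, the Hessian of $\log q(\Gamma_0, \theta \mid Y)$ is
$$-\left[T\, P \left(\Gamma_0^{-1\prime} \otimes \Gamma_0^{-1}\right) + \left(\hat{\tilde S}^2 + S_0^2\right) \otimes I_n\right], \tag{22}$$
where $P$ is a commutation matrix such that $P\, \mathrm{vec}\, Z = \mathrm{vec}\, Z'$ for a conformable matrix $Z$. The Hessian is negative semi-definite at $\Gamma_0 = \Gamma_0^{+}(\theta)$.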
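The closed-form maximizer in part (2) can be checked against the first-order condition of the log kernel. The sketch below is written under the assumption that the log kernel in the contemporaneous matrix is $\tfrac{T}{2}\log|\Gamma'\Gamma| - \tfrac12 \mathrm{tr}[S^2\Gamma'\Gamma] - \tfrac12 \mathrm{tr}[S_0^2(\Gamma - \bar\Gamma)'(\Gamma - \bar\Gamma)]$; the variable names (`S_sq` for the sum-of-squares matrix, `S0_sq` for the prior precision, `G_bar` for the prior mean) and all numbers are hypothetical.

```python
import numpy as np

def sym_sqrt(A):
    """Symmetric positive definite square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(w)) @ V.T

def posterior_mode(G_bar, S_sq, S0_sq, T):
    """Candidate maximizer (I/2 + M) G_bar S0_sq (S_sq + S0_sq)^{-1}."""
    n = G_bar.shape[0]
    A = S_sq + S0_sq
    D = G_bar @ S0_sq @ np.linalg.inv(A)
    M_sq = 0.25 * np.eye(n) + T * np.linalg.inv(D @ A @ D.T)
    M_sq = 0.5 * (M_sq + M_sq.T)          # remove rounding asymmetry before eigh
    return (0.5 * np.eye(n) + sym_sqrt(M_sq)) @ D

def log_kernel_gradient(G, G_bar, S_sq, S0_sq, T):
    """Gradient of the log kernel with respect to G."""
    return T * np.linalg.inv(G).T - G @ (S_sq + S0_sq) + G_bar @ S0_sq

# Hypothetical inputs with n = 2.
G_bar = np.array([[1.0, 0.2], [-0.3, 0.8]])
S_sq = np.array([[2.0, 0.3], [0.3, 1.0]])
S0_sq = 4.0 * np.eye(2)
T = 50

G_plus = posterior_mode(G_bar, S_sq, S0_sq, T)
grad_at_mode = log_kernel_gradient(G_plus, G_bar, S_sq, S0_sq, T)
```

The gradient vanishes at `G_plus` up to rounding error, so the formula delivers a stationary point of the kernel; the check does not by itself establish global optimality.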
Since each element of $\Gamma_0$ is a parameter and $\Gamma_0$ is a large dimensional matrix, the framework has to deal with many parameters in addition to the DSGE model parameter. A simple posterior simulator which does not consider the structure of (21) is likely to be slow in convergence. The global maximum and the Hessian at the maximum found in Proposition 3 turn out to be useful in posterior simulation. Concentration of the kernel (21) with respect to $\Gamma_0$ makes it easy to maximize the joint posterior density of $\Gamma_0$ and $\theta$ and to draw $\Gamma_0$ from its posterior density conditional on $\theta$.
The fact that the posterior density of $\Gamma_0$ conditional on $\theta$ and the joint posterior density of $\Gamma_0$ and $\theta$ with fixed $\theta$ have a unique global maximum shows that the identification issue of a reduced-form VAR model is indeed addressed. I assume a sufficiently large precision matrix to make the posterior density behave well. Then the normalization problem or the weak identification problem discussed in Hamilton, Waggoner, and Zha (2007) is also mitigated.
2.3. Posterior simulation. This section describes how to simulate from the posterior distribution
and do Bayesian inference. I propose two posterior simulators: The first simulator samples and
0 in an alternating manner with a Gibbs sampler and then sample B conditional on them. The
second simulator draws first and then samples 0 and B conditional on .
2.3.1. Posterior simulator I. Using the kernel (21), I construct a Gibbs sampler. First, I use a
random-walk Metropolis algorithm to draw conditional on the previous sample of 0 and . Then
I apply a Metropolis-Hastings algorithm to draw 0 conditional on the previous sample of 0 and
. This sampler is often called a Metropolis within Gibbs in the literature. Let us denote the Gibbs
sampler by G. The Metropolis step for and the Metropolis-Hastings step for 0 are denoted by M1
and M2 , respectively. B can be directly sampled from its posterior distribution (15) conditional
on 0 and .
The random-walk Metropolis algorithm to draw $\theta$ is widely used in the literature. The proposal density of $M_1$ is a normal distribution whose mean is the previous successful draw and whose covariance matrix is a scaled inverse of the negative Hessian at a posterior mode. The maximization to find a mode is actually done only for the kernel concentrated with respect to $\Gamma_0$, since for given $\theta$ the kernel (21) is maximized at $\Gamma_0^+(\theta)$.
For $M_2$, I propose to use a Metropolis-Hastings algorithm with a transition mixture. This simulator is a mixture of an independence Metropolis algorithm,5 denoted by $M_{21}$, and a random-walk Metropolis algorithm, denoted by $M_{22}$. The transition density mixture chooses randomly between the two algorithms with a specified probability. For the proposal distribution of $M_{21}$, I use a normal distribution around $\Gamma_0^+(\theta)$. The kernel (21) is locally well approximated by a normal distribution around $\Gamma_0^+(\theta)$ when the precision matrix $S_0^2$ in (7) is sufficiently large. Let $V^+(\theta)$ denote the inverse of the negative Hessian of $\log p(\Gamma_0 \mid Y, \theta)$ at $\Gamma_0^+(\theta)$, which is an approximate variance based on a second-order Taylor expansion of the log density. Then the proposal distribution for a candidate draw is $N\left(\Gamma_0^+(\theta),\, c_2^2 V^+(\theta)\right)$, where $c_2$ is a scale factor. For the transition distribution of the random-walk Metropolis part, $M_{22}$, I use a normal distribution whose mean is the previous successful draw and whose variance is a scaled $V^+(\theta)$.
5 See Tierney (1994) and Geweke (2005).
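In summary, the transition density for a candidate $\tilde\Gamma_0$ of the simulator $M_2$ is

$$ q(\tilde\Gamma_0 \mid \theta, \Gamma_0, M_2) = \kappa_1\, q(\tilde\Gamma_0 \mid \theta, M_{21}) + \kappa_2\, q(\tilde\Gamma_0 \mid \theta, \Gamma_0, M_{22}), $$

where $\kappa_1$ and $\kappa_2$ are the mixing probabilities, $q(\tilde\Gamma_0 \mid \theta, M_{21})$ is the density of $N\left(\Gamma_0^+(\theta),\, c_2^2 V^+(\theta)\right)$ and $q(\tilde\Gamma_0 \mid \theta, \Gamma_0, M_{22})$ is the density of $N\left(\Gamma_0,\, c_3^2 V^+(\theta)\right)$, with standard deviation scale factors $c_2$ and $c_3$. The simulator $M_2$ first chooses between $M_{21}$ and $M_{22}$, and the selected simulator generates a draw. The acceptance probability is given as in the single transition distribution case,

$$ \alpha(\tilde\Gamma_0 \mid \theta, \Gamma_0, M_{2k}) = \min\left\{ \frac{p(\theta, \tilde\Gamma_0 \mid Y)\,/\,q(\tilde\Gamma_0 \mid \theta, \Gamma_0, M_{2k})}{p(\theta, \Gamma_0 \mid Y)\,/\,q(\Gamma_0 \mid \theta, \tilde\Gamma_0, M_{2k})},\; 1 \right\}, \tag{23} $$

for $k = 1$ and $2$.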
Why do I mix two algorithms? Since $\Gamma_0$ has many parameters ($n^2$), a random-walk Metropolis algorithm is in general slow to explore the posterior density of $\Gamma_0$. By mixing it with an independence Metropolis-Hastings algorithm, I can attain a high acceptance ratio and faster convergence. On the other hand, when I tried only an independence Metropolis-Hastings algorithm, occasionally no draws of $\Gamma_0$ were accepted for a long period. This is because the transition distribution $q(\tilde\Gamma_0 \mid \theta, M_{21})$ has thinner tails than its target distribution $p(\Gamma_0 \mid \theta, Y)$. When $M_{21}$ happens to get stuck in a region far from $\Gamma_0^+(\theta)$, the mean of the transition distribution, the random-walk Metropolis algorithm makes the simulator crawl around and helps it get out of the region, which improves convergence. The independence Metropolis algorithm works well in general, so I put a high probability on $M_{21}$: $\kappa_1 = 0.8$ and $\kappa_2 = 0.2$, where $\kappa_1$ and $\kappa_2$ are the probabilities of choosing $M_{21}$ and $M_{22}$. Table 1 summarizes the Gibbs sampler $G$.
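The mixture step can be illustrated with a minimal sketch. It is not the implementation used in the paper: the one-dimensional heavy-tailed target, the function names, and the scale values are assumptions chosen for illustration. Only the structure follows the description above, namely an independence proposal centered at the mode with probability 0.8, a random-walk proposal otherwise, and an acceptance probability computed from the selected component as in (23).

```python
import math
import random
import statistics


def log_target(x):
    """Unnormalized log density of a Student-t with 3 degrees of freedom.

    Hypothetical stand-in for the conditional posterior kernel; its tails are
    heavier than those of the normal independence proposal.
    """
    return -2.0 * math.log1p(x * x / 3.0)


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def mixture_mh(n_draws, mode=0.0, sd=1.0, c2=1.0, c3=0.5, w_indep=0.8, seed=0):
    """Metropolis-Hastings with a mixture of two transition kernels."""
    rng = random.Random(seed)
    current = mode
    chain, n_accept = [], 0
    for _ in range(n_draws):
        if rng.random() < w_indep:
            # Independence step: the proposal ignores the current state.
            cand = rng.gauss(mode, c2 * sd)
            log_q_cand = log_normal_pdf(cand, mode, c2 * sd)
            log_q_curr = log_normal_pdf(current, mode, c2 * sd)
        else:
            # Random-walk step: the proposal is symmetric, so q cancels.
            cand = rng.gauss(current, c3 * sd)
            log_q_cand = log_q_curr = 0.0
        log_ratio = (log_target(cand) - log_q_cand) - (log_target(current) - log_q_curr)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = cand
            n_accept += 1
        chain.append(current)
    return chain, n_accept / n_draws


chain, acceptance_rate = mixture_mh(20000)
chain_median = statistics.median(chain)
```

Setting `w_indep=1.0` in this sketch gives the pure independence sampler, which is the case in which long runs of rejections can occur once the chain wanders into the tails.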
2.3.2. Posterior simulator II. With the second simulator, I first draw the structural parameters $\theta$ from their marginal posterior distribution and then sample the contemporaneous matrix $\Gamma_0$ and $B$ conditional on the draw of $\theta$.
Let us denote the simulator by $H$. Sampling $\theta$ from its marginal posterior density, which is proportional to (19), can be done using a random-walk Metropolis algorithm in the same way as in the Gibbs sampler $G$. The covariance matrix of the transition distribution is a scaled inverse of the negative Hessian at a posterior mode. Conditional on the draw of $\theta$, I can sample $\Gamma_0$ from its conditional posterior density (14) using an algorithm similar to $M_2$ of $G$. Sampling $B$ conditional on $\theta$ and $\Gamma_0$ from its posterior distribution is straightforward.
However, for the application of the paper, it takes too much time to evaluate the function $C_n(\cdot)$ in (19). In the function $C_n(\cdot)$, which emerges from the normalizing constant of the conditional posterior density (14) of $\Gamma_0$, the time-consuming part is the generalized hypergeometric function of matrix argument, ${}_1F_1(\cdot)$. It is an infinite sum of functions of the eigenvalues of a matrix argument. Here I end up with such a function because the posterior kernel (14) of $\Gamma_0$ is a multiplicative mixture of a normal density kernel and a Wishart density kernel. In other applications, the hypergeometric function may converge quickly, so the simulator is worth a try. Computational detail is described in the Appendix.

2.3.3. Estimation of marginal likelihoods. [why marginal likelihoods?] I construct a set of competing models that form a continuum of models with different $\lambda$'s and $\mu$'s. They are indexed by $\lambda \in (0, \infty)$, which controls the weight of the DSGE model prior on $B$, and $\mu \in (0, \infty)$, which controls the weight of the DSGE identification prior on the contemporaneous matrix $\Gamma_0$. Following DS, I assume a flat prior on $(\lambda, \mu)$ and choose the $(\lambda, \mu)$ having the highest marginal likelihood. The marginal likelihood with respect to $(\lambda, \mu)$ can be estimated using the modified harmonic mean (MHM) method with the standard weighting function by Geweke (1999) or the new elliptical weighting function proposed by Sims, Waggoner, and Zha (2008). For the Gibbs sampler $G$, I use the kernel (21) of the posterior density of $\Gamma_0$ and the structural parameters $\theta$ and, for the simulator $H$, I use the kernel (19) of the posterior density of $\theta$. In practice, it is impossible to do posterior simulation and estimate the marginal likelihoods for all the $(\lambda, \mu)$ on $\mathbb{R}_{++}$. DS propose to discretize the positive real line and simulate at each point on the grid. I follow their strategy in the following application.
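A minimal sketch of the modified harmonic mean estimator with Geweke's truncated-normal weighting function may help fix ideas. Everything model-specific here is an assumption made for illustration: the parameter is a scalar, the toy model is a normal likelihood with one observation and a standard normal prior, and the posterior draws are exact rather than MCMC output. The toy model is chosen because its marginal likelihood is known in closed form, so the estimate can be checked.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def log_mhm_marginal_likelihood(draws, log_kernel, tau=0.9):
    """Modified harmonic mean estimate of the log marginal likelihood.

    draws      : posterior draws of a scalar parameter
    log_kernel : log of likelihood times prior (unnormalized posterior)
    tau        : probability mass kept by the truncated normal weight
    """
    m = fmean(draws)
    v = variance(draws, m)
    # In one dimension the chi-square(1) quantile is a squared normal quantile.
    cutoff = NormalDist().inv_cdf(0.5 + tau / 2.0) ** 2
    log_terms = []
    for theta in draws:
        d2 = (theta - m) ** 2 / v
        if d2 <= cutoff:
            log_f = -math.log(tau) - 0.5 * math.log(2.0 * math.pi * v) - 0.5 * d2
            log_terms.append(log_f - log_kernel(theta))
    # Average f/kernel over ALL draws (draws outside the region contribute 0),
    # using the log-sum-exp trick for numerical stability.
    top = max(log_terms)
    log_mean = top + math.log(sum(math.exp(t - top) for t in log_terms) / len(draws))
    return -log_mean


# Toy model (assumption): y | theta ~ N(theta, 1), theta ~ N(0, 1), one observation.
y = 1.0
rng = random.Random(0)
posterior_draws = [rng.gauss(y / 2.0, math.sqrt(0.5)) for _ in range(20000)]


def log_kernel(theta):
    return -math.log(2.0 * math.pi) - 0.5 * (y - theta) ** 2 - 0.5 * theta ** 2


log_ml_hat = log_mhm_marginal_likelihood(posterior_draws, log_kernel)
log_ml_true = -0.5 * math.log(4.0 * math.pi) - y * y / 4.0  # y ~ N(0, 2)
```

In a grid search of the kind described above, a function like this would be called once per grid point on that point's simulation output, and the resulting estimates compared across the grid.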

3. Two-country sticky-price DSGE model


In this section, I describe a two-country sticky-price model. It is a variant of New Open Economy
Macroeconomic models and in particular based on work by Benigno (2004), Lubik and Schorfheide
(2006) and Steinsson (2008). For comparison, I estimate the DSGE model as well as a VAR model
with its prior distribution derived from the DSGE model as explained in the previous section. I denote
the VAR model by the DSGE-VAR model henceforth.
3.1. Model. There are two countries, Home and Foreign. A domestic economy is analogous to a
standard closed-economy New Keynesian model. In each country, there is a continuum of households, a continuum of firms, and the government. Asset markets are complete and assets are freely
traded. Goods markets are segmented and thus firms can price-discriminate between markets. All
the goods in the model are tradable.6 The technology in each country has a deterministic time
trend component and a transitory stochastic component. The deterministic trend component is
common across countries, but the transitory technology shock is country-specific.
I present the optimization problems of economic agents in turn. I denote the Foreign counterpart of a Home variable by the superscript $*$. Equilibrium conditions of the model and technical details on the solution and estimation are contained in the Appendix.
6 Real exchange rates can be decomposed into the relative price of tradable goods between countries and the ratio of the price of non-tradable goods relative to the price of tradable goods between the countries. Engel (1999) and Chari, Kehoe, and McGrattan (2002) show that the second term contributes a negligible part of the fluctuations of real exchange rates. The first component, the relative price of tradable goods between countries, accounts for most of the real exchange rate fluctuations. Therefore, I ignore non-tradable goods. Burstein, Eichenbaum, and Rebelo (2005) document that after large devaluations the relative price of purely traded goods with local non-tradable components removed adjusts quickly, and the slow adjustment of relative prices of non-tradable goods and services is the main source of a large drop in real exchange rates. However, they also find that when considering small fluctuations of real exchange rates in normal times, the relative price of purely traded goods accounts for the majority of the fluctuations of real exchange rates, though not as much as the conventional measure of the relative price of tradable goods, which includes some local non-tradable components.

3.1.1. Households. There is a continuum of households in each country: the households in Home are indexed by $i \in N_H = [0, 1]$ and the households in Foreign are indexed by $j \in N_F = (1, 2]$. The preferences of households are assumed identical across countries. Household $i$ in Home seeks to maximize the expected sum of discounted utilities

$$ E_t \left\{ \sum_{s=0}^{\infty} \beta^s \left[ \frac{\left(C_{t+s}/H_{t+s}\right)^{1-1/\sigma}}{1-1/\sigma} - \frac{L_{t+s}(i)^{1+\varphi}}{1+\varphi} \right] \right\}, $$

where $C_t$ is the consumption index, $L_t(i)$ is labor supply, and $\beta$ is the discount factor. $\sigma$ is the elasticity of intertemporal substitution and $\varphi$ is the inverse of the Frisch elasticity of labor supply. Consumption $C_t$ is not indexed by $i$ because consumption is identical for all the households in equilibrium under the assumption of complete asset markets. The labor market is heterogeneous and household $i$ supplies its specialized labor to domestic firm $i$.
$H_t$ is an external habit, which is taken as exogenous by the households. Here, I assume that $H_t = A_{w,t}$, where $A_{w,t} = \exp(\gamma t)$ is a deterministic time trend component of technology in the production function of firms. It is common in Home and Foreign. The formulation ensures the stationarity of labor input along the balanced growth path while keeping the additive separability of consumption and labor.7
7 When a model has technology growth, a common approach in the literature, such as Del Negro, Schorfheide, Smets, and Wouters (2007) and Justiniano and Primiceri (2008), is to use log utility ($\sigma = 1$) and keep the additive separability. However, I instead choose to deflate the consumption index by the trend and allow the intertemporal elasticity of substitution (IES) $\sigma$ to vary, since I want to estimate its value. Though in a slightly different setting, Chari, Kehoe, and McGrattan (2002) find that the IES is required to be as small as 1/5 to match the volatility of the real exchange rate. This is because the real exchange rate is determined by the relative consumption between countries multiplied by the inverse of the IES, and consumption is much less volatile than the real exchange rate. Chari, Kehoe, and McGrattan (2002) instead let $1/\sigma = \varphi$, where $\varphi$ is the curvature of the disutility of labor, to have a balanced growth path. For a technical detail about the balanced growth and the functional form of a utility function, see King, Plosser, and Rebelo (1988).

The consumption index $C_t$ is defined as

$$ C_t = \left[ \omega_H^{1/\eta}\, C_{H,t}^{\,1-1/\eta} + \omega_F^{1/\eta}\, C_{F,t}^{\,1-1/\eta} \right]^{\frac{1}{1-1/\eta}}, $$

where $C_{H,t}$ is the aggregate index of consumption goods produced in Home and $C_{F,t}$ is the aggregate index of consumption goods imported from Foreign. $\omega_H$ is the share of local goods and $\omega_F = 1 - \omega_H$ is the share of imported goods. $\eta$ is the elasticity of intratemporal substitution between Home-produced goods and Foreign-produced goods. $\omega_H > 1/2$ means that the households in Home have a home bias in consumption. $C_{H,t}$ and $C_{F,t}$ are constant elasticity of substitution (CES) aggregators

$$ C_{H,t} = \left[ \int_{N_H} c_t(i)^{1-1/\nu}\, di \right]^{\frac{1}{1-1/\nu}} \quad \text{and} \quad C_{F,t} = \left[ \int_{N_F} c_t(j)^{1-1/\nu}\, dj \right]^{\frac{1}{1-1/\nu}}, $$

where $\nu$, which is greater than one, is the intratemporal elasticity of substitution between goods produced in the same country. Let us denote the Home currency price of goods produced in Home and in Foreign by $p_{H,t}(i)$ and $p_{F,t}(j)$, respectively. Taking the prices of individual goods as given, the households optimally allocate a given level of total expenditure among the differentiated goods each period. This optimal allocation yields the demand for good $i \in N_H$ produced in Home and the demand for good $j \in N_F$ produced in Foreign as

$$ c_{H,t}(i) = \left( \frac{p_{H,t}(i)}{P_{H,t}} \right)^{-\nu} C_{H,t} \quad \text{and} \quad c_{F,t}(j) = \left( \frac{p_{F,t}(j)}{P_{F,t}} \right)^{-\nu} C_{F,t}, $$

respectively. The price indices $P_{H,t}$ and $P_{F,t}$ associated with $C_{H,t}$ and $C_{F,t}$, respectively, are determined as

$$ P_{H,t} = \left[ \int_{N_H} p_{H,t}(i)^{1-\nu}\, di \right]^{\frac{1}{1-\nu}} \quad \text{and} \quad P_{F,t} = \left[ \int_{N_F} p_{F,t}(j)^{1-\nu}\, dj \right]^{\frac{1}{1-\nu}}. $$

The demands for the composite goods produced in Home and in Foreign are determined as

$$ C_{H,t} = \omega_H \left( \frac{P_{H,t}}{P_t} \right)^{-\eta} C_t \quad \text{and} \quad C_{F,t} = \omega_F \left( \frac{P_{F,t}}{P_t} \right)^{-\eta} C_t, $$

respectively, where the consumption-based price index $P_t$ associated with the consumption index $C_t$ is given by

$$ P_t = \left[ \omega_H P_{H,t}^{\,1-\eta} + \omega_F P_{F,t}^{\,1-\eta} \right]^{\frac{1}{1-\eta}}. $$
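The CES demand system can be checked numerically. The sketch below writes $\omega_H$ for the Home share and $\eta$ for the elasticity of substitution between Home and Foreign composite goods, which are notational assumptions, and uses made-up prices and parameter values. It verifies two standard properties: spending on the two composites adds up to $P_t C_t$, and plugging the demands back into the aggregator returns $C_t$.

```python
def price_index(p_h, p_f, omega_h, eta):
    """Consumption-based price index for the two-good CES aggregator."""
    omega_f = 1.0 - omega_h
    return (omega_h * p_h ** (1.0 - eta) + omega_f * p_f ** (1.0 - eta)) ** (1.0 / (1.0 - eta))


def composite_demands(c, p_h, p_f, omega_h, eta):
    """Demands for the Home and Foreign composites given total consumption c."""
    p = price_index(p_h, p_f, omega_h, eta)
    c_h = omega_h * (p_h / p) ** (-eta) * c
    c_f = (1.0 - omega_h) * (p_f / p) ** (-eta) * c
    return c_h, c_f, p


def ces_aggregate(c_h, c_f, omega_h, eta):
    """Consumption index built from the two composites."""
    expo = 1.0 - 1.0 / eta
    inner = omega_h ** (1.0 / eta) * c_h ** expo + (1.0 - omega_h) ** (1.0 / eta) * c_f ** expo
    return inner ** (1.0 / expo)


# Illustrative values only (not taken from the paper's estimates).
c, p_h, p_f, omega_h, eta = 2.0, 1.0, 1.3, 0.7, 1.5
c_h, c_f, p = composite_demands(c, p_h, p_f, omega_h, eta)
expenditure_gap = p_h * c_h + p_f * c_f - p * c
aggregate_gap = ces_aggregate(c_h, c_f, omega_h, eta) - c
```

Both gaps should be zero up to floating-point error for any positive prices and any $\eta \neq 1$.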

Asset markets are complete and assets are freely traded. Household $i$ faces the following flow budget constraint

$$ P_t C_t + E_t\left[ M_{t,t+1} B_{t+1}(i) \right] \le B_t(i) + W_t(i) L_t(i) + \Pi_t + T_t. $$

$B_{t+1}(i)$ is the holdings of state-contingent claims, $B_t(i)$ is the net cash flow from the household's portfolio of state-contingent claims, $W_t(i)$ is the nominal wage, $T_t$ is the government transfer net of taxes, and $\Pi_t$ is the per-capita profit accruing to households from their ownership of local firms. $M_{t,t+1}$ is the nominal stochastic discount factor.
The problem of the households in Foreign is analogous.
3.1.2. Firms. I assume that there is a continuum of firms whose size is the same as the size of households in each country. Firm $i \in N_H$ in Home produces its good using a linear production technology

$$ Y_{H,t}(i) + Y^*_{H,t}(i) = A_{w,t} A_t L_t(i), \tag{24} $$

where $Y_{H,t}(i)$ is sold in Home and $Y^*_{H,t}(i)$ is exported to Foreign. The firms export their goods to the foreign market as well as sell in the domestic market. Firm $i$ hires labor $L_t(i)$ from a household of the same type. Technology consists of the deterministic time trend component $A_{w,t}$ and a country-specific shock $A_t$. I assume that $a_t = \log A_t$ follows a first-order autoregressive process $a_t = \rho_a a_{t-1} + \sigma_a \varepsilon_{a,t}$, where $0 < \rho_a < 1$ and $\varepsilon_{a,t}$ is i.i.d. $N(0, 1)$.
International arbitrage of goods is difficult and consequently goods markets are segmented. The assumption allows individual firms to price-discriminate across markets. I further assume that they set prices in the buyer's currency. Because of this market segmentation, the law of one price for goods does not hold and purchasing power parity does not pin down the real exchange rate. Purchasing power parity also fails because of the home bias in consumption.

I follow Calvo (1983) and Yun (1996) for analytical tractability in order to introduce nominal rigidities. A fraction $\alpha$ of firms cannot choose their prices optimally. They reset the goods price in Home by gross inflation as $p_{H,t}(i) = p_{H,t-1}(i)\,\bar\pi$ and the goods price in Foreign by gross inflation as $p^*_{H,t}(i) = p^*_{H,t-1}(i)\,\bar\pi^*$, where $\bar\pi$ and $\bar\pi^*$ are the steady state gross inflation rates in Home and in Foreign, respectively. The remaining fraction $1 - \alpha$ of the firms set their prices optimally. They maximize the present discounted value of future nominal profits

$$ E_t \left\{ \sum_{s=0}^{\infty} \alpha^s M_{t,t+s} \left[ p_{H,t}(i)\,\bar\pi^s\, Y_{H,t+s}(i) + S_{t+s}\, p^*_{H,t}(i)\,\bar\pi^{*s}\, Y^*_{H,t+s}(i) - W_{t+s}(i) L_{t+s}(i) \right] \right\} $$

subject to the demand functions for its good

$$ Y_{H,t+s}(i) = \left( \frac{p_{H,t}(i)\,\bar\pi^s}{P_{H,t+s}} \right)^{-\nu} \left( C_{H,t+s} + G_{H,t+s} \right) \quad \text{and} \quad Y^*_{H,t+s}(i) = \left( \frac{p^*_{H,t}(i)\,\bar\pi^{*s}}{P^*_{H,t+s}} \right)^{-\nu} \left( C^*_{H,t+s} + G^*_{H,t+s} \right), $$

respectively, and the production function (24). The stochastic discount factor is given as $M_{t,t+s} = \prod_{j=1}^{s} M_{t+j-1,t+j}$ and $M_{t,t} = 1$. $S_t$ is the nominal exchange rate, in units of Home currency per unit of Foreign currency. An increase of $S_t$ means a nominal depreciation of the Home currency. $G_{H,t+s}$ and $G^*_{H,t+s}$ are the government purchases of goods produced in Home by the government in Home and in Foreign, respectively. I assume that the government has the same preferences for goods as the households. Therefore, the government allocates its expenditure in the same way as the households. With the assumption that firms take wages as given, their pricing decisions on local sales and exports can be solved separately.8
The problem of the firms in Foreign is defined in an analogous way with a nominal rigidity parameter $\alpha^*$.
8 Though producers hire a different type of labor from a specific labor market, I assume that they are wage takers. As in Woodford (2003), we may assume that there is a continuum of producers for each type. Then an individual producer takes the wage as given. In equilibrium, however, they produce the same amount of goods and thus the result would be the same.
3.1.3. Government. The government purchases $G_t$, which consists of goods produced in Home, $G_{H,t}$, and goods produced in Foreign, $G_{F,t}$. The government budget constraint is

$$ B_t - R_{t-1} B_{t-1} = T_t + P_t G_t, $$

where $B_t$ is the government bond supply and $R_t$ is the nominal interest rate between period $t-1$ and period $t$. The fiscal policy is fully Ricardian. I do not explicitly model the government expenditure, but assume that $g_t = \log\left[ G_t / (A_{w,t} \bar G) \right]$, with the steady state government expenditure $\bar G$, follows an exogenous first-order autoregressive process $g_t = \rho_g g_{t-1} + \sigma_g \varepsilon_{g,t}$, where $\varepsilon_{g,t}$ is i.i.d. $N(0, 1)$.
The monetary policy authority in Home sets the nominal interest rate following a feedback rule of the form

$$ \frac{R_t}{\bar R} = \left( \frac{R_{t-1}}{\bar R} \right)^{\rho_R} \left[ \left( \frac{\pi_t}{\bar\pi} \right)^{\psi_\pi} \left( \frac{Y_t}{A_{w,t} \bar Y} \right)^{\psi_y} \right]^{1-\rho_R} \exp\left( \sigma_m \varepsilon_{m,t} \right), $$

where $Y_t$ is output, $\pi_t$ is gross CPI inflation, and $\bar R$ is the steady state value of $R_t$. The feedback rule is perturbed by a monetary policy shock, $\varepsilon_{m,t}$, which is i.i.d. $N(0, 1)$.
The government of Foreign has the same set of policy rules, but with different parameters.
3.1.4. Market clearing. Goods market clearing requires that the private and government demand for each variety of goods is equal to the supply of the good. In the aggregate, it follows that

$$ \int_{N_H} \left[ Y_{H,t}(i) + Y^*_{H,t}(i) \right] di \equiv Y_t = C_{H,t} + G_{H,t} + C^*_{H,t} + G^*_{H,t} $$

for Home. Since the model does not include investment, I assume that $G_{H,t}$ and $G^*_{H,t}$ include investment. I call $g_t$ an aggregate demand shock. For Foreign, an analogous equation holds:

$$ \int_{N_F} \left[ Y^*_{F,t}(j) + Y_{F,t}(j) \right] dj \equiv Y^*_t = C^*_{F,t} + G^*_{F,t} + C_{F,t} + G_{F,t}. $$

3.2. Determination of the real exchange rate. The assumption that asset markets are complete and assets are traded freely results in complete international risk sharing and the uniqueness of the nominal discount factor. Consequently, nominal exchange rates are determined every period by the condition

$$ \frac{S_{t+1}}{S_t} = \frac{M^*_{t,t+1}}{M_{t,t+1}}, $$

where $M^*_{t,t+1}$ is the nominal stochastic discount factor for Foreign households. I define the real exchange rate as $Q_t = S_t P^*_t / P_t$, and then

$$ Q_t = \bar Q\, \frac{U^*_{c,t}}{U_{c,t}}, $$

where $\bar Q = Q_0 U_{c,0} / U^*_{c,0}$. Log-linearized around the deterministic steady state, the log deviation of the real exchange rate is determined by relative consumption,

$$ q_t = \sigma^{-1} \left( c_t - c^*_t \right), \tag{25} $$

where $\sigma$ is the elasticity of intertemporal substitution and $q_t$, $c_t$ and $c^*_t$ are log deviations from the steady state value of the corresponding variable.
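The step from the risk-sharing condition to (25) can be spelled out. The following is a sketch under the preferences of Section 3.1.1, writing $\sigma$ for the elasticity of intertemporal substitution (a notational assumption) and using the fact that the habit term $H_t$ is common to Home and Foreign.

```latex
% Marginal utility of consumption in each country (H_t is common):
U_{c,t} = \frac{1}{H_t}\left(\frac{C_t}{H_t}\right)^{-1/\sigma},
\qquad
U^*_{c,t} = \frac{1}{H_t}\left(\frac{C^*_t}{H_t}\right)^{-1/\sigma}.

% The habit term cancels in the ratio:
Q_t = \bar{Q}\,\frac{U^*_{c,t}}{U_{c,t}}
    = \bar{Q}\left(\frac{C_t}{C^*_t}\right)^{1/\sigma}.

% Taking logs and deviations from the steady state:
q_t = \sigma^{-1}\left(c_t - c^*_t\right).
```

Because consumption is much less volatile than the real exchange rate in the data, this relation can match the volatility of $q_t$ only if $\sigma^{-1}$ is large.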
In the empirical exercise, for estimation, I add an ad-hoc wedge $z_t$ to this equation,

$$ q_t = \sigma^{-1} \left( c_t - c^*_t \right) + z_t, \tag{26} $$

where $z_t = \sigma_z \varepsilon_{z,t}$ with $\varepsilon_{z,t}$ i.i.d. $N(0, 1)$. The wedge serves the following purposes. First of all, I estimate the model with seven observable variables, which will be explained in the next section, and the original specification has six shocks. To avoid stochastic singularity, I need one or more additional shocks. However, in order to satisfy the first assumption of Assumption 2, I need to match the number of shocks and the number of observable variables. Therefore, I need to put exactly one additional shock in the model. Equation (25) is a good place to add an additional shock, since the exact relationship between the real exchange rate and relative consumption does not hold in the data: the real exchange rate and relative consumption do not have the close relationship predicted by the equation, which is known as the Backus-Smith puzzle (Backus and Smith 1993). In order to estimate the model, I need an error term to capture deviations from the exact relationship in (25). It is common in the literature estimating an open economy model to add a shock to the purchasing power parity relation or the uncovered interest rate parity equation under the incomplete asset market assumption. This has the same effect as the wedge in (26).9
It turns out that the parameter estimates of the DSGE model and the DSGE-VAR model are not
very different from their calibrated values in the literature. Therefore, despite the ad-hoc wedge,
the estimated impulse response of the DSGE model is close to that of the calibrated DSGE model
which does not have the wedge. This is also true for the prior distribution of the DSGE-VAR
model.
Note that to apply the econometric framework of the paper, an additional shock does not necessarily have to enter (26). The VAR model will relax Equation (25) without the wedge. I need to
put an additional shock, but the additional shock could be, for example, a common shock to the
world-wide technology trend. What is important is to figure out possible sources of uncertainties
in the model and add the additional shock to a place that is most likely to have uncertainties not
captured by the model. Considering the apparent misspecification of (25), I let zt capture the
fluctuations of the real exchange rate which are not driven by the structure of the model. I may
obtain such a wedge by adding a preference shock to the households' utility, with opposite signs in
Home and Foreign. However, it is not more structural, or well based on behavioral rules, than the
simple wedge zt . I do not try to give zt a theoretical interpretation at this stage.
Robustness exercise.
3.3. Shocks. The model has seven shocks: four of the shocks follow AR(1) processes, and the two monetary shocks and the wedge for the real exchange rate are assumed serially uncorrelated. The seven innovations to the structural shocks and the wedge are

$$ \left[ \varepsilon_{z,t},\ \varepsilon_{g,t},\ \varepsilon^*_{g,t},\ \varepsilon_{a,t},\ \varepsilon^*_{a,t},\ \varepsilon_{m,t},\ \varepsilon^*_{m,t} \right], $$

which I assume are normally distributed with zero mean and unit variance. They are mutually independent and serially uncorrelated. Their standard deviation is 1, but they are multiplied by $\left[ \sigma_z,\ \sigma_g,\ \sigma^*_g,\ \sigma_a,\ \sigma^*_a,\ \sigma_m,\ \sigma^*_m \right]$. I normalize their standard deviations to 1 in order to compare them with identified structural shocks of the VAR model.
Robustness exercise.
4. Empirical results
The empirical results are reported in two parts. This section first describes the detail of estimation and then discusses the posterior estimation result for the DSGE model and the DSGE-VAR
model. I will discuss the evidence of misspecification of the DSGE model based on various estimation results. In the next section, I focus on the exchange rate dynamics and present evidence of
misspecification of the DSGE model regarding the real exchange rate dynamics.
9 The uncovered interest parity condition is derived as $r_t - r^*_t = E_t \Delta s_{t+1} + z_t$.

4.1. Data, prior distribution, and posterior simulation.


4.1.1. Data. I estimate the models with the US and the Euro area data.10 The likelihood of the DSGE model and the DSGE-VAR model is based on the following data

$$ \left[ y_t,\ y^*_t,\ \pi_t,\ \pi^*_t,\ r_t,\ r^*_t,\ q_t \right], $$

where $y_t$ and $y^*_t$ are the log of per capita output, $\pi_t$ and $\pi^*_t$ are annualized CPI inflation rates, and $r_t$ and $r^*_t$ are annualized interest rates in the US and in the Euro area, respectively. $q_t$ is the log of the real exchange rate, which is the nominal exchange rate times the ratio of price levels between the Euro area and the US. I use the CPI rather than the output deflator for comparison with the literature.
All data are multiplied by 100. This is to normalize the magnitude of the data so that the
contemporaneous coefficient matrix of the DSGE-VAR model has elements of similar magnitudes.
The data are quarterly and the sample period spans the period from 1982:2 through 2006:4. The
data before 1984:1 are used as initial observations for both models. The lag for the DSGE-VAR
model is 4. The Euro area data are based on the Area Wide Model (AWM) of the European Central
Bank, and the US data are obtained from the Federal Reserve Economic Data (FRED) system of
St. Louis Fed.11 For the nominal US Dollar (USD) - the Euro (EUR) exchange rates, I use the
corresponding series in the AWM dataset. Prior to the introduction of the Euro it is a weighted
average of individual bilateral exchange rates of the member countries against the USD.
10 The countries included are Austria, Belgium, Cyprus, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, Malta, Netherlands, Portugal, Slovenia, and Spain.
11 I follow Lubik and Schorfheide (2006) for construction of the dataset. For the Euro area data, the readers are referred to an explanatory note by the Euro Area Business Cycle Network at http://www.eabcn.org/area-wide-model for detail.
4.1.2. Prior distribution of the structural parameters. For the structural parameters, $\theta$, I follow the convention in the literature: they are assumed a priori independent and their prior distributions are truncated at the boundary of the determinacy region. The prior distribution of $\theta$ is the same for the DSGE model and the DSGE-VAR model.
Specific distributions, which are similar to those assumed in the related literature, are reported in Table 1. The prior distributions of a few parameters are worth mentioning. I assume that the inverse of the intertemporal elasticity of substitution, $\sigma^{-1}$, has a Gamma distribution with mean 3 and standard deviation 1. Chari, Kehoe, and McGrattan (2002) show that the parameter needs to be around 5 to match the volatility of real exchange rates. I assume a more conventional range for $\sigma^{-1}$ but with a large standard deviation to allow for their finding. The nominal rigidity parameters $\alpha$ and $\alpha^*$ have a Beta distribution with mean 0.66 (prices change on average about every three quarters) and standard deviation 0.1. I assume that the elasticity of substitution between goods produced in the same country, $\nu$, follows a Gamma distribution with mean 9 and standard deviation 2, and the elasticity of substitution between the US-produced goods and the Euro-area-produced goods, $\eta$, has a Gamma distribution with mean 1.5 and standard deviation 0.3. Chari, Kehoe, and McGrattan (2002) set $\nu = 10$ and $\eta = 1.5$. The prior distribution on the coefficients of the monetary policy feedback rule is based on the standard Taylor rule: the output coefficient $\psi_y$ and the inflation coefficient $\psi_\pi$ have Gamma prior distributions with mean 0.125 (0.5/4) and standard deviation 0.05 and with mean 1.7 and standard deviation 0.3, respectively. Private consumption expenditure accounts for about 67% of output in the US and 57% in the Euro area. The steady state consumption-output ratio is assumed to have a Beta distribution with mean 0.6 and standard deviation 0.05. For the shock processes, I assume that all the shocks have a prior distribution with the same autocorrelation coefficient but different standard deviations of their innovations. The autocorrelation coefficient has a Beta distribution with mean 0.6 and standard deviation around 0.2.
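The link between the prior mean of the nominal rigidity parameter and the frequency of price changes follows from the Calvo structure. Writing $\alpha$ for the per-period probability that a firm cannot reoptimize (a notational assumption), the time between reoptimizations is geometrically distributed, so a short worked calculation gives the expected duration:

```latex
E[\text{duration}]
  = \sum_{k=1}^{\infty} k\,(1-\alpha)\,\alpha^{k-1}
  = \frac{1}{1-\alpha},
\qquad
\alpha = 0.66 \;\Rightarrow\; \frac{1}{1-0.66} \approx 2.94 \text{ quarters}.
```

A prior mean of 0.66 therefore corresponds to prices being reset optimally about once every three quarters on average.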
4.1.3. Posterior simulation. For the DSGE-VAR model, I assume the precision matrix $S_0^2$ is $I_n$ and estimate the marginal likelihoods for four values of $\mu = 3^2, 5^2, 7^2, 9^2$ with a set of values of $\lambda$, $\{0.1, 0.2, \ldots, 1.5, 2, 3, 5\}$, where $\lambda$ is the weight of the DSGE model prior on $B$ and $\mu$ is the weight of the DSGE identification prior on the contemporaneous matrix $\Gamma_0$. The structure of the precision matrix implies that all the elements of $\Gamma_0$ have the same degree of dispersion around their mean. I choose the $\lambda$ and $\mu$ with the highest marginal likelihood as the optimal weights on the DSGE model.
I generate a simulation chain for each model (for each pair of $\lambda$ and $\mu$). I draw 3.15 million draws of the structural parameters $\theta$ and $\Gamma_0$, but keep one draw in every three. Discarding the first 0.05 million saved draws, I have 1 million draws in the end. When I compute the impulse response functions, I use one draw in every 20 saved draws, which gives 50,000 draws. The thinning is done simply in consideration of the physical memory and the storage space of a computer system.
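The draw counts can be reproduced with a few lines of arithmetic. The sketch below reads the burn-in of 0.05 million as applying to the saved (already thinned) draws, which is an assumption; it is the reading under which the reported totals add up. The variable names are illustrative.

```python
total_draws = 3_150_000   # raw draws generated by the sampler
keep_every = 3            # first thinning: keep one draw in every three
burn_in_saved = 50_000    # saved draws discarded as burn-in (assumed reading)
irf_every = 20            # second thinning used for impulse responses

saved_draws = total_draws // keep_every          # draws kept after thinning
retained_draws = saved_draws - burn_in_saved     # draws used for inference
irf_draws = retained_draws // irf_every          # draws used for impulse responses
```

If the burn-in were instead removed from the raw draws before thinning, the retained count would be about 1.03 million rather than 1 million.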
4.2. Posterior distribution.
4.2.1. Marginal likelihoods. As explained above, rather than estimating the hyperparameters $\lambda$ and $\mu$, I construct a set of DSGE-VAR models with different $\lambda$'s and $\mu$'s and choose the model with the highest marginal likelihood. Let the set of models contain $M$ models indexed by $i$, $\mathcal{A} = \{A_i\}_{i=1}^{M}$. The posterior probability of a model $A_i$ given the data $Y$ is computed as

$$ p(A_i \mid Y) = \frac{p(A_i)\, p(Y \mid A_i)}{\sum_{j=1}^{M} p(A_j)\, p(Y \mid A_j)}, \tag{27} $$

where $\sum_{j=1}^{M} p(A_j) = 1$. The posterior probability of a model $A_i$ is literally the probability of the model among the competing models in $\mathcal{A}$ when $Y$ is observed. Assuming a flat prior on the $A_i$'s, or the $(\lambda, \mu)$'s, (27) becomes

$$ p(A_i \mid Y) = \frac{p(Y \mid A_i)}{\sum_{j=1}^{M} p(Y \mid A_j)}, $$

which means that the ranking of the models based on the posterior probability is equivalent to the ranking based on the marginal likelihood, $p(Y \mid A_i)$. With the flat prior over $\mathcal{A}$, the posterior odds ratio between any two models, $p(A_k \mid Y) / p(A_l \mid Y)$, is equal to the Bayes factor for the two models, $p(Y \mid A_k) / p(Y \mid A_l)$. So I choose the DSGE-VAR model with the highest marginal likelihood as the most likely model, or the model with the best fit, among the competing models in $\mathcal{A}$ given $Y$ and condition subsequent analysis on that model. The pair $(\lambda, \mu)$ for the model can be interpreted as an optimal weight on the DSGE model prior distribution. As An and Schorfheide (2007) put it, this model selection is associated with a 0-1 loss function that gives a loss of one to choosing a wrong model.
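In practice the posterior model probabilities in (27) are computed from log marginal likelihoods, and a direct exponentiation would underflow. A minimal sketch of the usual log-sum-exp computation follows; the function name and the log marginal likelihood values are hypothetical and are not the estimates reported in the paper.

```python
import math


def posterior_model_probs(log_marginal_likelihoods, log_priors=None):
    """Posterior model probabilities from log marginal likelihoods.

    With log_priors=None a flat prior over the models is used, so the
    ranking by posterior probability equals the ranking by marginal likelihood.
    """
    n = len(log_marginal_likelihoods)
    if log_priors is None:
        log_priors = [-math.log(n)] * n
    scores = [lml + lp for lml, lp in zip(log_marginal_likelihoods, log_priors)]
    top = max(scores)  # subtract the maximum before exponentiating
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log marginal likelihoods for four competing models.
log_ml = [-1250.3, -1247.9, -1248.4, -1262.0]
probs = posterior_model_probs(log_ml)
best_model = max(range(len(probs)), key=probs.__getitem__)
bayes_factor_1_vs_2 = probs[1] / probs[2]  # equals exp(log_ml[1] - log_ml[2])
```

Under the flat prior, the ratio of two posterior probabilities reproduces the Bayes factor, which is what the last line illustrates.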
Figure 1 displays the estimate of the log marginal likelihood $\log p(Y \mid A_i)$ for the DSGE model and the DSGE-VAR models. As in DS and Del Negro, Schorfheide, Smets, and Wouters (2007), the log marginal likelihood estimates form a smooth upside-down U-shaped curve for each $\mu$, which is quite flat around a peak. The log marginal likelihoods with $\mu = 5^2$ (implying standard deviation 0.2 around means) are uniformly higher than with $\mu = 9^2$, but similar to those of the case with $\mu = 7^2$. When the prior on the contemporaneous matrix $\Gamma_0$ is loose ($\mu = 3^2$), the log marginal likelihoods are much smaller. This implies that imposing a properly tight prior on $\Gamma_0$ and thus on the covariance matrix $\left(\Gamma_0' \Gamma_0\right)^{-1}$ is necessary to fit the data. The DSGE-VAR model with $(\hat\lambda, \hat\mu) = (0.7, 5^2)$ has the highest marginal likelihood, which means that the optimal length of hypothetical dummy observations, or the optimal weight on the DSGE model prior, is 70% of the length of the actual observations. However, as the posterior probability of the model with $(\hat\lambda, \hat\mu)$, reported in the note of Figure 1, is not particularly high and is similar to that of the models around it, the model selection based on the marginal likelihoods is not very sharp. This is as expected since $\lambda$ varies smoothly around $\hat\lambda$ and thus the model space is dense. The models around $(\hat\lambda, \hat\mu)$ are similar. However, as the prior on $B$ gets very tight (high $\lambda$) or very loose (low $\lambda$), the posterior probability of a model becomes negligible. This shows a necessity to loosen the tight restrictions imposed by the DSGE model to improve the data fit and at the same time a necessity to shrink a prior distribution properly. In the following, I only report the empirical results of the best DSGE-VAR model with $(\lambda, \mu) = (0.7, 5^2)$ and compare the model with the estimated DSGE model.
Estimating a VAR model with the DSGE model prior substantially improves the overall fit
over the estimated DSGE model. The posterior probability of the DSGE model in the set A is
nearly zero. The improvement occurs via two channels. The cross-equation restrictions of the
DSGE-VAR model are weaker than the DSGE model so that the DSGE-VAR model can capture
richer dynamics. Also, the contemporaneous covariance structure of the DSGE-VAR model is more
flexible. The result strongly suggests that the restrictions of the DSGE model are too tight or it is
misspecified. However, comparison based on the marginal likelihoods does not indicate which part
is likely misspecified.
4.2.2. Posterior distribution of the structural parameters. Table 2 reports the mean, standard deviation and 90% highest posterior density (HPD) interval of the marginal posterior distribution for the structural parameters. The posterior distribution for the DSGE-VAR model should be interpreted with caution since the DSGE model is not a model of the data in this case. Once we condition on the contemporaneous matrix $\Gamma_0$ and $B$, the structural parameters do not affect the likelihood of the VAR model. The DSGE-VAR model therefore cannot be used to estimate the structural parameters. The deep parameters should be estimated by the DSGE model itself.
The posterior distribution for the DSGE-VAR model shows that the cross-equation restrictions imposed by the DSGE model prior on the VAR model are comparable to those of the estimated DSGE model. The difference in the posterior distribution between the DSGE model and the DSGE-VAR model is not big enough to make a qualitative difference in estimated impulse response functions. The posterior distribution for the DSGE-VAR model is close to that for the DSGE model, although there are a few exceptions. The posterior mean of the inverse of the intertemporal elasticity of substitution, $\sigma^{-1}$, is 7.565 for the DSGE model, which is much larger than 4.423 for the DSGE-VAR model. It is also larger than 5, the value found by Chari, Kehoe, and McGrattan (2002) to match the volatility of the real exchange rate. The standard deviation of $z_t$, $100\sigma_z$, is also quite large for the DSGE model compared to that for the DSGE-VAR model. Even though I put the ad-hoc wedge in the real exchange rate equation, the DSGE model needs a very large value of $\sigma^{-1}$ to amplify the fluctuations of the consumption ratio and a very volatile wedge to fit the fluctuations of the real exchange rate. However, the DSGE-VAR model has looser cross-equation restrictions and thus can match the real exchange rate dynamics with $\sigma^{-1}$ and $100\sigma_z$ closer to their prior distribution. The posterior mean of the output coefficients of the monetary policy rules, $\psi_y$ and $\psi^*_y$, is smaller for the DSGE model than for the DSGE-VAR model. Overall, the posterior distribution of the structural parameters for the DSGE-VAR model is closer to the prior distribution than for the DSGE model. In the DSGE-VAR model, their prior distributions are not revised by the data as much as in the DSGE model. [change in the
table to ]
4.3. Identification and structural shocks.
4.3.1. Identification: initial responses to shocks. The identifying restrictions of the DSGE model are
imposed on the DSGE-VAR model through the prior distribution of the contemporaneous matrix.
I present the marginal posterior density estimates of the initial responses of the variables to
a positive technology shock in the US in panel (a) and to an expansionary monetary policy shock
in panel (b) of Figure 2. The initial response is given by the product of H and M in (1)-(2) for
the DSGE model and by the inverse of the contemporaneous matrix in (3) for the DSGE-VAR model.
When the initial response in the DSGE model is strongly positive or negative, the DSGE-VAR
model tends to follow the sharp prediction of the DSGE model. For example, in response to
a positive technology shock in the US, US inflation falls substantially and the interest rate is
lowered in both models. The key effects that characterize a structural shock in the DSGE
model are thus well imposed on the DSGE-VAR model. For other variables, the posterior densities of
the initial response in the DSGE-VAR model are in general more dispersed than in the DSGE
model. In particular, when the DSGE model has a highly concentrated posterior distribution, the
DSGE-VAR model is unable to match the degree of concentration. For example, in Figure 2,
the DSGE model implies that log output in the Euro area, $y_t^*$, has a highly concentrated initial
response to a positive technology shock in the US. For the DSGE-VAR model, however, the posterior
distribution of the same response is not as concentrated. Because of the structure of the
covariance matrix in the prior distribution of the contemporaneous matrix, (7), all the elements
of the matrix have the same degree of dispersion around their respective means in the prior
distribution. Therefore, the marginal prior distribution of the initial response of a variable can
be wider in the DSGE-VAR model than in the DSGE model. This is not a big concern since the
posterior density for the DSGE-VAR model overlaps the posterior density of the DSGE model over a
region of substantial posterior probability. Also, it is possible that the data actually imply a
dispersed distribution in some cases. For example, in response to all the structural shocks, the
posterior density of the initial response of the log real exchange rate is more dispersed in the
DSGE-VAR model than in the DSGE model. Considering the high volatility of the real exchange rate
in the data and the large uncertainty surrounding estimates such as the half-life of the real
exchange rate, the DSGE-VAR model appears to pick up volatile and noisy fluctuations of the real
exchange rate.
Why does the posterior distribution of the initial response to a shock differ between the DSGE
model and the DSGE-VAR model? There are two reasons. First, the posterior distribution of the
structural parameters is different between the two models, as explained in Section 4.2.2. Second,
the contemporaneous matrix of the DSGE-VAR model may deviate from its prior mean, the
contemporaneous matrix implied by the DSGE model, to fit the data better. The data are informative
on the contemporaneous matrix through its cross product. For example, in the scalar case in
Section 2.2, the data do not reveal the sign of the contemporaneous coefficient, but they are
informative about its absolute size. Therefore, when the cross product of the contemporaneous
matrix implied by the DSGE model does not conform to the data, the contemporaneous matrix of the
DSGE-VAR model may deviate to match the data.
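The point that the data speak to the contemporaneous matrix only through its cross product can be checked numerically. The sketch below assumes structural shocks with identity covariance, so that the reduced-form covariance is the inverse of the cross product; the name `A0` for the contemporaneous matrix and the random numbers are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Hypothetical contemporaneous matrix (well conditioned by construction).
A0 = rng.normal(size=(n, n)) + 3.0 * np.eye(n)

# Random orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
A0_rotated = A0 @ Q

def reduced_form_cov(a0):
    # With identity shock covariance, Sigma = (A0 A0')^{-1}.
    return np.linalg.inv(a0 @ a0.T)

# The rotated matrix is different, but it implies the same reduced-form covariance.
same_cov = np.allclose(reduced_form_cov(A0), reduced_form_cov(A0_rotated))
same_matrix = np.allclose(A0, A0_rotated)
```

In the scalar case the orthogonal matrix reduces to plus or minus one, which is the sign ambiguity mentioned in the text; the prior centered on the DSGE model is what selects among observationally equivalent matrices.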

4.3.2. Estimated structural shocks. Since the DSGE-VAR model follows the DSGE model for
identification, the structural shocks of the DSGE-VAR model (3) are comparable to the structural
shocks $e_t$ of the DSGE model (1)-(2). A comparison of the estimated structural shocks of the two
models provides evidence that the DSGE-VAR model indeed recovers the structural shocks of the
DSGE model, although with some corrections. The estimated structural shocks of the two models,
presented in panel (a) of Figure 3, closely follow each other except for the wedge for the real
exchange rate. There are some differences, but most of the fluctuations occur at the same time and
in the same direction. The scatter plots in panel (b) of the same figure confirm this finding.
Again, there are some differences between the two models, for the same reasons as in the
comparison of the initial responses. The biggest disagreement concerns the wedge $z_t$ for the real
exchange rate dynamics. In the DSGE model, the estimated wedge is strongly serially correlated,
contrary to its specification as independent over time. In the DSGE-VAR model, the estimates of
the wedge are not completely different from those of the DSGE model, but they show weaker serial
correlation. As the cross-equation restrictions of the DSGE model are relaxed in the DSGE-VAR
model, more endogenous variables and their lags are involved and the real exchange rate dynamics
become richer. Therefore, the burden on the wedge to absorb what is not explained by the
endogenous variables is reduced.
Panel (a) of Table 3 reports the posterior means and 90% error bands of the covariance and
autocorrelation coefficients of the estimated structural shocks of the DSGE model. The DSGE model
assumes that the innovations to different structural shocks are mutually uncorrelated. However,
there are many cases where two innovations are significantly correlated. First, the innovations to
monetary policy shocks are significantly correlated with the innovations to other shocks, especially
the innovations to technology shocks and demand shocks. This suggests that the interest rate
fluctuations that are not explained by the systematic feedback rule are not purely random
non-systematic policy shocks. The actual reaction function of the monetary authorities seems to
be more complicated than the Taylor-rule type feedback function of the DSGE model. For example,
if the Fed responds to output growth as well as to the output gap, then an increase in the interest
rate in response to rapid output growth due to demand shocks will be mostly captured by the
non-systematic monetary shock under the current specification of the monetary policy feedback rule.
Next, the wedge $z_t$ for the real exchange rate dynamics is correlated with the innovations to
other shocks. The wedge is supposed to be uncorrelated with other shocks and to capture
misspecification that exists only in the real exchange rate equation, such as random noise in the
foreign exchange market. The significant correlation with other shocks suggests that the real
exchange rate is affected by variables other than relative consumption and pure noise. Lastly, all
the innovations are significantly autocorrelated, whereas they are supposed to be serially
uncorrelated.
Panel (b) of Table 3 reports the same posterior moments of the identified structural shocks of the
DSGE-VAR model. The covariance matrix of the structural shocks is assumed to be an identity
matrix in (3). Interestingly, the 90% error band for the covariance of every pair of shocks
includes zero. This is not a joint test, but none of the individual off-diagonal elements of the
estimated covariance matrix of the structural shocks is at odds with the specification at the 90%
level. Although the DSGE model, from which we derive the identification scheme of the structural
shocks, violates the specification of the structural shocks, the DSGE-VAR model alleviates the
problem by allowing deviations from the DSGE model restrictions. The error bands of the
autocorrelation coefficients include zero except for the US monetary policy shock m and the wedge
z, and even for these two shocks the serial correlation is much smaller than in the DSGE model.
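The moments reported in Table 3 can be computed draw by draw and then summarized across draws. The sketch below assumes that the estimated shock series are stored in an array with one series per posterior draw; the function name and the simulated input, which uses i.i.d. standard normal numbers in place of estimated shocks, are hypothetical.

```python
import numpy as np

def shock_moment_bands(shock_draws, lower=0.05, upper=0.95):
    """Posterior mean and quantile bands of shock covariances and AR(1) coefficients.

    shock_draws has shape (n_draws, T, n_shocks): one estimated shock
    series of length T per posterior draw.
    """
    covs = np.array([np.cov(e, rowvar=False) for e in shock_draws])
    ar1 = np.array([
        [np.corrcoef(e[1:, j], e[:-1, j])[0, 1] for j in range(e.shape[1])]
        for e in shock_draws
    ])

    def summarize(x):
        return x.mean(axis=0), np.quantile(x, lower, axis=0), np.quantile(x, upper, axis=0)

    return summarize(covs), summarize(ar1)

# Illustrative input only: 200 draws, 100 quarters, 3 well-behaved shocks.
rng = np.random.default_rng(1)
fake_draws = rng.normal(size=(200, 100, 3))
(cov_mean, cov_lo, cov_hi), (ar_mean, ar_lo, ar_hi) = shock_moment_bands(fake_draws)
```

For shocks that satisfy the specification, the bands of the off-diagonal covariances and of the autocorrelations should cover zero, which is the element-by-element check described above. It is not a joint test.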
Comparing estimated structural shocks raises a complicated issue regarding the identification of a
VAR model based on a DSGE model. What if the estimated structural shocks are very different between
the DSGE model and the VAR model? It is likely that the DSGE model is seriously misspecified in
this case and that the VAR model picks up completely different dynamics. However, it may also be
that the VAR model fails to follow the identifying restrictions of the DSGE model. I have two
suggestions. First, if the DSGE model is seriously misspecified, then the estimated weight on the
DSGE model prior will be very small. Second, we can use the simulated draws of the structural
parameters from the posterior distribution of the DSGE model to construct the posterior
distribution of the VAR model conditional on those draws. In this way, we remove one of the two
channels through which the DSGE model and the VAR model can differ and isolate the effects of
relaxing the restrictions imposed by the DSGE model.

5. Real exchange rate dynamics


I investigate the real exchange rate dynamics of the DSGE model in detail by comparing the
impulse responses of the real and nominal exchange rates between the DSGE model and the
DSGE-VAR model.

5.1. Impulse response functions of the real exchange rate. Panel (a) of Figure 4 plots the
posterior mean and 68% error bands of the impulse response to a positive technology shock in the
US. The first plot presents the impulse response of the real exchange rate. As in Steinsson (2008),
the real exchange rate in the DSGE model responds to the shock in a delayed, hump-shaped fashion.
As the positive technology shock cuts the real marginal cost, inflation drops and the interest rate
is lowered in response in the US. However, because of monetary policy smoothing, the interest rate
is not adjusted fully on impact and keeps falling for a couple of periods. Then it starts to rise in
response to the boom in output. This hump-shaped response of the US interest rate makes the
real interest rate rise above its steady-state level on impact and then eventually decline below
it. Since consumption equals the negative of the expected sum of future real interest rates, the
change in the sign of the impulse response of the real interest rate causes consumption to increase
initially and then decline. According to equation (25), the real exchange rate follows this
hump-shaped response of consumption in the US. Consumption in the Euro area does not respond
strongly, so the real exchange rate is mainly driven by consumption in the US.
In the DSGE-VAR model, the response of the real exchange rate is similar to that in the DSGE
model. It appears that the cross-equation restrictions of the DSGE model regarding the real
exchange rate are not at odds with the data. However, the impulse response of the nominal
exchange rate in the second plot and the impulse response of the price ratio in the third plot
suggest that, although the impulse response of the real exchange rate looks similar in the two
models, the mechanism by which the two models generate such a hump-shaped impulse response is
not the same. In the DSGE model, the US price level drops and the ratio of the Euro area price
to the US price rises sharply. The sharp increase in the price ratio counteracts the hump-shaped
response of the real exchange rate, and as a result the nominal exchange rate does not show a
hump in its response but decays exponentially. The DSGE-VAR model shows different responses.
The price ratio does not respond as strongly as in the DSGE model, and as a result the nominal
exchange rate responds in the same delayed, hump-shaped way as the real exchange rate.
We can think of two explanations for the difference between the DSGE model and the DSGE-VAR
model. The first is that the DSGE model may not be able to generate a sluggish response of
the price ratio, specifically of the US price, to a positive technology shock. If this is
true, then the solution would be to address the persistence of inflation. The second is
that the similar response of the real exchange rate in the two models is just a coincidence and
the two models have different real exchange rate dynamics. Panel (b) of Figure 4 provides evidence
supporting the second possibility.
Panel (b) of Figure 4 presents the posterior mean and 68% error bands of the impulse response to
an expansionary (negative) monetary policy shock in the US. Both the real and the nominal exchange
rate respond to the shock in a delayed, hump-shaped fashion in the DSGE-VAR model, whereas they
overshoot and decay rapidly in the DSGE model. The responses of both models are well known in the
literature. The identified VAR literature (see, for example, Eichenbaum and Evans 1995 and Scholl
and Uhlig 2008) finds that the real and nominal exchange rates respond in this delayed,
hump-shaped way to a monetary policy shock. Also, Steinsson (2008) acknowledges the inability of a
sticky-price DSGE model to generate a hump-shaped response to a monetary policy shock. As in the
case of the technology shock, the DSGE-VAR model predicts that the price ratio responds more
weakly to the monetary shock than the DSGE model does. However, fixing the response of the price
ratio would not help the DSGE model generate a hump-shaped response of the real exchange rate or
of the nominal exchange rate in the case of a monetary policy shock.
Because of the weak and sluggish response of the price ratio to a technology shock and to a
monetary policy shock, the real and nominal exchange rates respond similarly to each other after
both shocks in the DSGE-VAR model. The following evidence suggests that a main cause of the problem
is likely to lie in the nominal exchange rate dynamics of the DSGE model. The log-linearized
equilibrium conditions of the DSGE model imply the uncovered interest parity (UIP)^12
\[
E_t \Delta s_{t+1} = r_t - r_t^*, \qquad (28)
\]
where $s_t$ is the log deviation of the nominal exchange rate from its steady state value and $r_t$
and $r_t^*$ are the log deviations of the nominal interest rate in Home and in Foreign,
respectively. Unless there is a friction in international asset trade, a DSGE model always implies
(28), whether asset markets are complete or not. The condition (28) also holds regardless of the
inflation dynamics or the monetary policy rule. Although I computed the impulse response of the
nominal exchange rate by subtracting the price ratio from the real exchange rate in Figure 4, the
resulting impulse response satisfies (28).
Panel (a) of Figure 5 shows the posterior mean and 68% error bands of the impulse response of
the nominal exchange rate and the interest rate differential to a positive technology shock in the
US. In the DSGE model, the US interest rate goes down in response to the fall in inflation caused
by the shock. The response of the interest rate is stronger in the US than in the Euro area and
hence the interest rate differential is negative. Therefore, by (28), the nominal exchange rate
continues to fall. In contrast, in the DSGE-VAR model, the response of the interest rate
differential is weaker and the nominal exchange rate does not respond as predicted by (28). For the
DSGE model to generate a hump-shaped response of the nominal exchange rate, the interest rate
differential would have to fall below zero in the beginning and go up gradually. The DSGE model is
not likely to be able to generate such a response of the interest rate differential with standard
parameter values, considering that the shock originates in the US and the interest rate will
respond more strongly in the US.
The expected excess US dollar return of investing in a Euro bond over the return of investing in
a US dollar bond, presented in the third plot of Panel (a), confirms that the UIP does not hold in
the DSGE-VAR model. The (log) expected excess return $p_k$ between period 1 and period $k$
($k \geq 2$) is defined as
\[
p_k = \sum_{j=1}^{k-1} \left( r_j^* - r_j + 400 \left[ s_{j+1} - s_j \right] \right).
\]
The expected excess return is always zero for $k \geq 2$ in the DSGE model, but not in the DSGE-VAR
model. It is well known that the UIP does not hold in the data. Even though we take the DSGE
model to the data, its structure keeps it from deviating from the UIP. This is a good example of
why we have to be careful in interpreting a result from an estimated DSGE model.

^12 The model includes the wedge $z_t$ in the condition. However, the wedge is muted when I compute
the impulse response to structural shocks other than $z_t$.
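The expected excess return defined above can be evaluated directly on impulse responses. The sketch below assumes interest rate responses in annualized percentage points and the exchange rate response in logs, so that the factor 400 puts the quarterly change of the exchange rate on the same scale; all paths are made-up numbers and the function name is hypothetical.

```python
def expected_excess_return(r_home, r_foreign, s, k):
    """Cumulative excess return p_k for k >= 2.

    Element j-1 of each list holds the response in period j: interest rates
    in annualized percent, the nominal exchange rate s in logs.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    return sum(
        r_foreign[j] - r_home[j] + 400.0 * (s[j + 1] - s[j])
        for j in range(k - 1)
    )

# Hypothetical responses: the home rate falls more than the foreign rate.
r_home = [-1.0 * 0.8 ** j for j in range(12)]
r_foreign = [-0.2 * 0.8 ** j for j in range(12)]

# Exchange rate path constructed to satisfy the UIP period by period.
s_uip = [0.0]
for j in range(11):
    s_uip.append(s_uip[-1] + (r_home[j] - r_foreign[j]) / 400.0)

# A delayed, hump-shaped path (made up) that violates the UIP.
s_hump = [0.001 * j * 0.7 ** j for j in range(12)]

p_uip = [expected_excess_return(r_home, r_foreign, s_uip, k) for k in range(2, 13)]
p_hump = [expected_excess_return(r_home, r_foreign, s_hump, k) for k in range(2, 13)]
```

With the UIP-consistent path every entry of `p_uip` is zero up to rounding, whereas the hump-shaped path produces non-zero excess returns, which is the pattern that separates the two models in the third plot.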
The impulse response to an expansionary monetary policy shock in the US is presented in Panel (b).
The impulse response of the interest rate differential is very similar in the two models. Again,
however, the nominal exchange rate decays exponentially in the DSGE model while it responds in a
delayed, hump-shaped way in the DSGE-VAR model. The expected excess return is not zero in the
DSGE-VAR model.
5.2. Robustness exercise. I assume that the wedge for the real exchange rate, $z_t$, is
independent of the other structural shocks. To check whether this assumption is restrictive and
drives the result on the real exchange rate dynamics through its effects on the posterior
distribution of the structural parameters and the contemporaneous coefficient matrix, I relax the
assumption and allow the wedge to be correlated with the demand shocks $g_t$ and $g_t^*$.
Consequently, the demand shocks may enter the real exchange rate equation (26) and affect the real
exchange rate. With a more general shock structure, the DSGE-VAR model has a better chance of
matching the data. I do not allow the wedge to be correlated with the technology shocks or the
monetary shocks, so that these shocks remain identified. The DSGE-VAR model is now partially
identified: shocks other than the technology shocks and the monetary shocks are left unidentified.
The robustness exercise can be implemented by modifying the covariance matrix of the DSGE
structural shocks, $e_t$. I do not re-estimate all the DSGE-VAR models but estimate only the
DSGE-VAR model at the optimal hyperparameter values, 0.7 and $5^2$. Figure 6 displays the impulse
responses of the log real exchange rate, the log nominal exchange rate, and the log price ratio.
All the impulse responses are similar to those obtained when the wedge is assumed to be
uncorrelated with the other shocks.


6. Tables and Figures


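Algorithm 1 Gibbs sampler G to draw the contemporaneous matrix and the structural parameters
(1) [Initialize] Choose starting values for the contemporaneous matrix and the structural
    parameters.
(2) [Draw the structural parameters]
    (a) Draw a candidate from a normal distribution centered at the current value of the
        structural parameters, with covariance scaled by $c_1^2$.
    (b) Keep the candidate with probability
            min { p(current matrix, candidate parameters | Y) / p(current matrix, current parameters | Y), 1 }.
        Otherwise, keep the current value.
(3) [Draw the contemporaneous matrix]
    (a) Draw u ~ U[0, 1].
    (b) [M21] If u does not exceed the threshold probability, draw a candidate from a normal
        distribution centered at the matrix implied by the DSGE model at the current structural
        parameters, with covariance $c_2^2 V$. Keep the candidate with probability
            min { [p(candidate, parameters | Y) / q(candidate | parameters, M21)]
                  / [p(current, parameters | Y) / q(current | parameters, M21)], 1 }.
        Otherwise, keep the current matrix.
    (c) [M22] If u exceeds the threshold probability, draw a candidate from a normal distribution
        centered at the current matrix, with covariance $c_3^2 V$. Keep the candidate with
        probability
            min { p(candidate, parameters | Y) / p(current, parameters | Y), 1 }.
        Otherwise, keep the current matrix.
(4) Go to (2).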
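The structure of Algorithm 1, a random-walk Metropolis step for one block followed by a mixture of an independence proposal and a random-walk proposal for the other block, can be illustrated on a toy problem. The sketch below is not the sampler used in the paper: both blocks are scalars, the target density is an assumed bivariate normal, and the names `mixture_sampler`, `center`, `threshold` and `v_sd` are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)

def mixture_sampler(log_post, center, n_iter, c1=0.5, c2=1.0, c3=0.5,
                    threshold=0.5, v_sd=1.0, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0
    gamma = center(theta)
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        # Step (2): random-walk Metropolis update of theta given gamma.
        theta_new = theta + c1 * rng.normal()
        if np.log(rng.uniform()) < log_post(gamma, theta_new) - log_post(gamma, theta):
            theta = theta_new
        # Step (3): update gamma with a mixture of two proposals.
        if rng.uniform() <= threshold:
            # Independence proposal centered on the model-implied value;
            # the proposal density enters the acceptance ratio.
            mean, sd = center(theta), c2 * v_sd
            gamma_new = mean + sd * rng.normal()
            log_ratio = (log_post(gamma_new, theta) - log_normal_pdf(gamma_new, mean, sd)
                         - log_post(gamma, theta) + log_normal_pdf(gamma, mean, sd))
        else:
            # Random-walk proposal centered on the current value (symmetric).
            gamma_new = gamma + c3 * v_sd * rng.normal()
            log_ratio = log_post(gamma_new, theta) - log_post(gamma, theta)
        if np.log(rng.uniform()) < log_ratio:
            gamma = gamma_new
        draws[i] = theta, gamma
    return draws

# Toy target (an assumption): theta ~ N(1, 1) and gamma | theta ~ N(theta, 0.5^2).
def toy_log_post(gamma, theta):
    return -0.5 * (theta - 1.0) ** 2 - 0.5 * ((gamma - theta) / 0.5) ** 2

draws = mixture_sampler(toy_log_post, center=lambda theta: theta, n_iter=20000)
posterior_means = draws[2000:].mean(axis=0)  # discard a burn-in of 2000 draws
```

Mixing the two proposals is a common design choice: the independence proposal can make large moves toward the region favored by the model-implied center, while the random-walk proposal explores locally when the posterior has moved away from that center.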


Description                                                           Distribution  Mean, S.D.    90% HPD interval
Annualized steady-state real interest rate                            Normal                      [0.976, 6.997]
Steady-state world-wide technology growth rate                        Normal        0.5, 0.1      [0.337, 0.665]
Inverse elasticity of intertemporal substitution                      Gamma                       [1.402, 4.540]
Home firm Calvo parameter                                             Beta          0.66, 0.1     [0.500, 0.829]
Foreign firm Calvo parameter                                          Beta          0.66, 0.1     [0.498, 0.825]
Inverse Frisch elasticity of labor supply                             Gamma                       [1.392, 4.568]
Elasticity of substitution between goods produced within a country    Gamma                       [5.731, 12.140]
Elasticity of substitution between domestic goods and imported goods  Gamma         1.5, 0.3      [0.998, 1.975]
Home bias in consumption                                              Beta          0.94, 0.03    [0.895, 0.986]
Home monetary policy smoothing                                        Beta          0.7, 0.15     [0.467, 0.943]
Foreign monetary policy smoothing                                     Beta          0.7, 0.15     [0.463, 0.936]
Home monetary policy output gap                                       Gamma         0.125, 0.05   [0.045, 0.201]
Foreign monetary policy output gap                                    Gamma         0.125, 0.05   [0.048, 0.203]
Home monetary policy inflation                                        Gamma         1.7, 0.3      [1.225, 2.192]
Foreign monetary policy inflation                                     Gamma         1.7, 0.3      [1.194, 2.161]
AR coefficient for Home demand shock                                  Beta          0.6, 0.2      [0.284, 0.929]
AR coefficient for Foreign demand shock                               Beta          0.6, 0.2      [0.292, 0.934]
AR coefficient for Home technology shock                              Beta          0.6, 0.2      [0.287, 0.932]
AR coefficient for Foreign technology shock                           Beta          0.6, 0.2      [0.290, 0.931]
SD for the real exchange rate wedge                                   Inv Gamma     10            [6.846, 13.034]
SD for Home demand shock                                              Inv Gamma     0.5           [0.068, 0.949]
SD for Foreign demand shock                                           Inv Gamma     0.5           [0.067, 0.941]
SD for Home technology shock                                          Inv Gamma     0.5           [1.712, 3.276]
SD for Foreign technology shock                                       Inv Gamma     0.5           [1.703, 3.270]
SD for Home monetary policy shock                                     Inv Gamma     0.1, 0.3      [0.012, 0.190]
SD for Foreign monetary policy shock                                  Inv Gamma     0.1, 0.3      [0.012, 0.188]
Steady-state inflation rate for Home                                  Normal        0.5           [1.169, 2.828]
Steady-state inflation rate for Foreign                               Normal        0.5           [1.180, 2.822]
Steady-state real exchange rate                                       Gamma         0.3           [0.526, 1.481]
Steady-state Home consumption-output ratio                            Beta          0.6, 0.05     [0.518, 0.682]

Table 1. Prior distributions of the DSGE parameters


Note: r = 1/ 1 is the annualized, steady state real interest rate. is a percentage growth rate. is
reparameterized so that > 1 .

                                                                      Prior                           Posterior: DSGE                     Posterior: DSGE-VAR
Parameter                                                             Mean, S.D.    90% HPD           Mean     S.D.    90% HPD            Mean     S.D.    90% HPD
Annualized steady-state real interest rate                                          [0.976, 6.997]    3.190    0.601   [2.191, 4.155]     1.145    0.575   [0.285, 1.942]
Steady-state world-wide technology growth rate                        0.5, 0.1      [0.337, 0.665]    0.344    0.032   [0.291, 0.397]     0.259    0.034   [0.208, 0.308]
Inverse elasticity of intertemporal substitution                                    [1.402, 4.540]    7.565    1.293   [5.454, 9.644]     4.423    1.066   [2.711, 6.118]
Home firm Calvo parameter                                             0.66, 0.1     [0.500, 0.829]    0.528    0.046   [0.452, 0.604]     0.501    0.053   [0.414, 0.588]
Foreign firm Calvo parameter                                          0.66, 0.1     [0.498, 0.825]    0.610    0.044   [0.537, 0.683]     0.617    0.049   [0.536, 0.696]
Inverse Frisch elasticity of labor supply                                           [1.392, 4.568]    3.192    0.762   [1.939, 4.395]     3.147    0.916   [1.660, 4.596]
Elasticity of substitution between goods produced within a country                  [5.731, 12.140]   8.599    1.771   [5.707, 11.441]    7.564    1.686   [4.812, 10.258]
Elasticity of substitution between domestic goods and imported goods  1.5, 0.3      [0.998, 1.975]    1.213    0.247   [0.813, 1.611]     1.464    0.293   [0.978, 1.932]
Home bias in consumption                                              0.94, 0.03    [0.895, 0.986]    0.963    0.007   [0.951, 0.975]     0.964    0.012   [0.945, 0.983]
Home monetary policy smoothing                                        0.7, 0.15     [0.467, 0.943]    0.908    0.012   [0.889, 0.929]     0.876    0.028   [0.832, 0.922]
Foreign monetary policy smoothing                                     0.7, 0.15     [0.463, 0.936]    0.874    0.016   [0.848, 0.900]     0.827    0.036   [0.770, 0.886]
Home monetary policy output gap                                       0.125, 0.05   [0.045, 0.201]    0.056    0.014   [0.033, 0.078]     0.125    0.047   [0.050, 0.197]
Foreign monetary policy output gap                                    0.125, 0.05   [0.048, 0.203]    0.048    0.013   [0.028, 0.069]     0.126    0.046   [0.053, 0.197]
Home monetary policy inflation                                        1.7, 0.3      [1.225, 2.192]    2.110    0.282   [1.649, 2.569]     1.638    0.284   [1.170, 2.092]
Foreign monetary policy inflation                                     1.7, 0.3      [1.194, 2.161]    2.241    0.257   [1.814, 2.657]     1.647    0.278   [1.188, 2.091]
AR coefficient for Home demand shock                                  0.6, 0.2      [0.284, 0.929]    0.981    0.005   [0.974, 0.989]     0.852    0.057   [0.764, 0.943]
AR coefficient for Foreign demand shock                               0.6, 0.2      [0.292, 0.934]    0.992    0.003   [0.988, 0.997]     0.897    0.053   [0.821, 0.975]
AR coefficient for Home technology shock                              0.6, 0.2      [0.287, 0.932]    0.811    0.048   [0.732, 0.889]     0.457    0.114   [0.270, 0.645]
AR coefficient for Foreign technology shock                           0.6, 0.2      [0.290, 0.931]    0.787    0.038   [0.725, 0.851]     0.474    0.132   [0.257, 0.693]
SD for the real exchange rate wedge                                   10            [6.846, 13.034]   12.833   0.986   [11.223, 14.436]   5.585    0.734   [4.399, 6.755]
SD for Home demand shock                                              0.5           [0.068, 0.949]    1.546    0.241   [1.158, 1.923]     1.243    0.235   [0.875, 1.606]
SD for Foreign demand shock                                           0.5           [0.067, 0.941]    1.350    0.207   [1.016, 1.671]     1.202    0.229   [0.843, 1.560]
SD for Home technology shock                                          0.5           [1.712, 3.276]    2.481    0.488   [1.705, 3.227]     2.802    0.576   [1.893, 3.687]
SD for Foreign technology shock                                       0.5           [1.703, 3.270]    2.464    0.476   [1.714, 3.201]     2.659    0.538   [1.805, 3.476]
SD for Home monetary policy shock                                     0.1, 0.3      [0.012, 0.190]    0.130    0.010   [0.114, 0.146]     0.105    0.013   [0.084, 0.126]
SD for Foreign monetary policy shock                                  0.1, 0.3      [0.012, 0.188]    0.124    0.010   [0.107, 0.140]     0.092    0.010   [0.075, 0.108]
Steady-state inflation rate for Home                                  0.5           [1.169, 2.828]    2.260    0.443   [1.538, 2.994]     1.830    0.418   [1.149, 2.522]
Steady-state inflation rate for Foreign                               0.5           [1.180, 2.822]    1.764    0.424   [1.057, 2.451]     2.036    0.437   [1.325, 2.760]
Steady-state real exchange rate                                       0.3           [0.526, 1.481]    0.874    0.033   [0.821, 0.925]     0.913    0.124   [0.724, 1.108]
Steady-state Home consumption-output ratio                            0.6, 0.05     [0.518, 0.682]    0.511    0.051   [0.427, 0.596]     0.561    0.050   [0.478, 0.643]

Table 2. Prior and posterior distributions of the structural parameters

Note: See the note in Table 1. For the DSGE-VAR model, the hyperparameters are set to (0.7, 5^2).

Entries are posterior means with 90% error bands in brackets below. Columns (1)-(7) follow the
order of the rows.

(a) DSGE
                          (1)             (2)             (3)             (4)             (5)             (6)             (7)
(1) Wedge                 1.07
                          [0.84, 1.32]
(2) Demand, Home          0.11            1.00
                          [0.05, 0.17]    [0.76, 1.25]
(3) Demand, Foreign       0.32            -0.06           1.02
                          [0.26, 0.38]    [-0.15, 0.02]   [0.79, 1.28]
(4) Technology, Home      -0.08           -0.27           -0.02           1.00
                          [-0.14, -0.01]  [-0.36, -0.19]  [-0.07, 0.04]   [0.77, 1.25]
(5) Technology, Foreign   0.08            0.18            -0.13           0.35            1.00
                          [0.03, 0.14]    [0.12, 0.25]    [-0.21, -0.05]  [0.24, 0.45]    [0.77, 1.25]
(6) Monetary, Home        0.33            0.45            0.11            -0.01           0.22            1.01
                          [0.23, 0.43]    [0.36, 0.55]    [0.06, 0.16]    [-0.13, 0.11]   [0.16, 0.29]    [0.78, 1.27]
(7) Monetary, Foreign     0.26            -0.07           0.39            0.05            -0.07           0.21            1.01
                          [0.18, 0.35]    [-0.10, -0.03]  [0.31, 0.47]    [-0.02, 0.12]   [-0.19, 0.04]   [0.15, 0.27]    [0.78, 1.27]
AR1                       0.940           0.40            0.30            -0.21           -0.19           0.44            0.52
                          [0.936, 0.945]  [0.36, 0.44]    [0.27, 0.33]    [-0.29, -0.09]  [-0.31, -0.05]  [0.40, 0.49]    [0.47, 0.58]

(b) DSGE-VAR
                          (1)             (2)             (3)             (4)             (5)             (6)             (7)
(1) Wedge                 0.81
                          [0.60, 1.04]
(2) Demand, Home          0.00            1.01
                          [-0.16, 0.15]   [0.79, 1.26]
(3) Demand, Foreign       0.02            0.04            1.00
                          [-0.13, 0.17]   [-0.11, 0.18]   [0.78, 1.26]
(4) Technology, Home      0.04            -0.01           0.02            1.04
                          [-0.15, 0.23]   [-0.19, 0.16]   [-0.15, 0.19]   [0.80, 1.32]
(5) Technology, Foreign   -0.04           0.06            -0.06           0.08            1.02
                          [-0.21, 0.13]   [-0.10, 0.22]   [-0.23, 0.10]   [-0.09, 0.26]   [0.78, 1.28]
(6) Monetary, Home        -0.04           0.09            0.06            0.06            0.05            1.00
                          [-0.20, 0.11]   [-0.06, 0.24]   [-0.09, 0.20]   [-0.11, 0.24]   [-0.10, 0.22]   [0.77, 1.25]
(7) Monetary, Foreign     -0.01           0.06            0.10            0.05            0.01            0.00            1.04
                          [-0.15, 0.14]   [-0.08, 0.20]   [-0.04, 0.25]   [-0.12, 0.22]   [-0.16, 0.18]   [-0.14, 0.14]   [0.81, 1.30]
AR1                       0.35            0.05            0.04            0.02            -0.10           0.14            0.12
                          [0.17, 0.50]    [-0.10, 0.19]   [-0.11, 0.20]   [-0.14, 0.19]   [-0.26, 0.07]   [0.00, 0.27]    [-0.03, 0.27]

Table 3. Posterior means and 90% error bands of the covariance and autocorrelation coefficients of the innovations to structural shocks

Note: 90% error bands are [5% quantile and 95% quantile]. For the DSGE-VAR, the hyperparameters are
set to (0.7, 5^2). The posterior distribution of the DSGE structural shocks is constructed using
the Kalman smoother. The posterior distribution of the DSGE-VAR structural shocks is constructed by
identifying reduced-form residuals by the contemporaneous matrix.

Entries are posterior means with 90% error bands in brackets below.

(a) DSGE
                        Wedge           Demand shock                    Technology shock                Monetary policy shock
                                        Home            Foreign         Home            Foreign         Home            Foreign
Output                  0.62            86.95           2.26            9.01            0.72            0.41            0.03
                        [0.20, 1.29]    [75.92, 94.05]  [0.96, 4.13]    [2.94, 19.49]   [0.24, 1.53]    [0.15, 0.80]    [0.01, 0.07]
Output*                 1.58            3.71            86.21           2.53            5.47            0.08            0.42
                        [0.58, 3.06]    [1.57, 6.71]    [78.25, 92.56]  [0.72, 5.86]    [2.30, 10.44]   [0.02, 0.18]    [0.19, 0.76]
Inflation               6.15            0.55            0.16            82.38           0.58            10.10           0.07
                        [1.65, 12.63]   [0.04, 2.00]    [0.03, 0.42]    [71.86, 92.11]  [0.17, 1.26]    [4.30, 17.02]   [0.03, 0.14]
Inflation*              22.59           0.59            2.12            2.44            67.48           0.41            4.37
                        [11.48, 34.81]  [0.18, 1.28]    [0.39, 4.80]    [0.82, 5.29]    [54.05, 80.49]  [0.18, 0.72]    [2.19, 7.28]
Interest rate           0.69            2.28            0.04            83.31           0.81            12.81           0.06
                        [0.11, 1.73]    [0.22, 5.63]    [0.00, 0.13]    [67.74, 94.70]  [0.22, 1.84]    [2.89, 27.13]   [0.02, 0.12]
Interest rate*          2.69            0.97            0.36            3.50            74.38           0.27            17.83
                        [0.93, 5.20]    [0.22, 2.26]    [0.04, 1.13]    [0.87, 8.33]    [59.73, 87.51]  [0.08, 0.58]    [7.50, 30.53]
Real exchange rate      0.00            36.55           36.36           19.04           7.69            0.22            0.14
                        [0.00, 0.00]    [21.49, 50.26]  [22.23, 48.89]  [4.15, 43.84]   [1.60, 19.39]   [0.05, 0.52]    [0.03, 0.34]
Nominal exchange rate   0.17            36.71           43.03           4.54            2.72            9.83            3.00
                        [0.06, 0.34]    [24.13, 49.29]  [31.48, 54.89]  [0.71, 10.61]   [0.76, 5.41]    [4.45, 17.88]   [1.12, 6.13]

(b) DSGE-VAR
                        Wedge           Demand shock                    Technology shock                Monetary policy shock
                                        Home            Foreign         Home            Foreign         Home            Foreign
Output                  5.79            30.94           4.87            14.57           17.74           4.23            21.87
                        [0.55, 18.29]   [11.67, 53.42]  [0.34, 15.29]   [1.48, 35.71]   [2.25, 39.25]   [0.31, 14.06]   [4.94, 42.50]
Output*                 6.54            17.11           45.43           6.14            9.70            3.69            11.40
                        [0.65, 18.79]   [3.52, 36.09]   [24.75, 64.34]  [0.83, 16.98]   [1.13, 24.88]   [0.35, 12.08]   [1.63, 25.81]
Inflation               12.68           9.00            6.22            46.85           9.44            9.91            5.90
                        [3.11, 29.03]   [2.47, 19.06]   [1.75, 13.46]   [27.28, 64.62]  [2.56, 20.47]   [3.05, 20.61]   [1.46, 13.61]
Inflation*              14.51           3.94            4.81            20.22           48.09           3.24            5.19
                        [2.80, 35.08]   [0.78, 10.16]   [0.96, 11.90]   [5.72, 38.63]   [27.55, 67.77]  [0.67, 8.20]    [1.46, 10.92]
Interest rate           12.77           16.54           7.38            14.52           4.35            36.55           7.89
                        [1.85, 31.03]   [5.30, 32.76]   [1.03, 18.35]   [4.03, 29.47]   [0.55, 12.81]   [18.45, 55.50]  [0.85, 20.28]
Interest rate*          10.09           5.14            15.94           9.39            21.44           5.60            32.40
                        [0.86, 29.15]   [0.30, 17.26]   [3.11, 34.07]   [0.69, 27.81]   [4.89, 44.40]   [0.28, 18.69]   [12.10, 54.61]
Real exchange rate      17.20           9.28            7.99            13.21           16.43           14.04           21.85
                        [0.38, 54.00]   [0.12, 33.87]   [0.08, 28.51]   [0.15, 45.20]   [0.21, 49.72]   [0.14, 44.73]   [0.50, 54.69]
Nominal exchange rate   20.77           8.52            7.14            11.13           11.30           17.87           23.27
                        [0.49, 59.72]   [0.10, 31.19]   [0.06, 26.17]   [0.11, 39.88]   [0.12, 38.61]   [0.22, 51.93]   [0.60, 56.86]

Table 4. Posterior means and 90% error bands of the variance decomposition of
the forecast errors
Note: In percentage. The horizon is 3 years after the shocks. 90% error bands are [5% quantile and
95% quantile]. For the DSGE-VAR, the hyperparameters are set to (0.7, 5^2).
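A forecast error variance decomposition of the kind reported in Table 4 can be computed from structural impulse responses when the shocks are mutually uncorrelated with unit variance. The sketch below uses a hypothetical two-variable VAR(1) with made-up coefficients and a horizon of 12 quarters; it is not the model of this paper, and the function name is an assumption.

```python
import numpy as np

def fevd(irf):
    """Percentage shares of each shock in the forecast error variance.

    irf has shape (H, n_vars, n_shocks) and holds responses at horizons
    0, ..., H-1 to uncorrelated shocks with unit variance.
    """
    contrib = (irf ** 2).sum(axis=0)  # cumulative contribution, (n_vars, n_shocks)
    return 100.0 * contrib / contrib.sum(axis=1, keepdims=True)

# Hypothetical VAR(1): y_t = Phi y_{t-1} + C eps_t.
Phi = np.array([[0.5, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0], [0.5, 1.0]])
H = 12  # 3 years of quarterly data
irf = np.stack([np.linalg.matrix_power(Phi, h) @ C for h in range(H)])
shares = fevd(irf)
```

Each row of `shares` sums to 100, as do the rows of Table 4. Repeating the calculation for every posterior draw and taking quantiles across draws gives error bands of the kind shown in the table.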


Each of the four numerical columns corresponds to one column of the original table.

(a) DSGE
Up-life
  Log real exchange rate      mean            7.0           4.8           0.0           0.0
                              median          7.0           5.0           0.0           0.0
                              90% error band  [5.0, 10.0]   [3.0, 7.0]    [0.0, 0.0]    [0.0, 0.0]
  Log nominal exchange rate   mean            0.0           0.0           0.0           0.0
                              median          0.0           0.0           0.0           0.0
                              90% error band  [0.0, 0.0]    [0.0, 0.0]    [0.0, 0.0]    [0.0, 0.0]
Half-life
  Log real exchange rate      mean            12.8          10.5          3.1           3.1
                              median          12.0          10.0          3.0           3.0
                              90% error band  [9.0, 17.0]   [8.0, 13.0]   [3.0, 4.0]    [3.0, 4.0]

(b) DSGE-VAR
Up-life
  Log real exchange rate      mean            8.6           13.0          13.6          19.6
                              median          3.0           8.0           11.0          19.0
                              90% error band  [0.0, 39.0]   [0.0, 39.0]   [0.0, 39.0]   [0.0, 39.0]
  Log nominal exchange rate   mean            8.8           13.3          16.3          20.9
                              median          3.0           8.0           13.0          20.0
                              90% error band  [0.0, 39.0]   [0.0, 39.0]   [0.0, 39.0]   [0.0, 39.0]
Half-life
  Log real exchange rate      mean            14.6          18.2          19.6          24.5
                              median          10.0          16.0          18.0          26.0
                              90% error band  [0.0, 39.0]   [0.0, 39.0]   [0.0, 39.0]   [0.0, 39.0]

Table 5. Posterior mean, median and 90% error band of up-lives and half-lives
Note: Numbers are reported in quarters. Following Steinsson (2008), the up-life is defined as the time it
takes for the exchange rate to peak after the impact. The half-life is defined as the time it takes for the
exchange rate to fall below half of the initial response. 90% error bands are [5% quantile and 95% quantile].
For the DSGE-VAR, the hyperparameters are set to (0.7, 5^2).
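The two statistics defined in the note can be computed from an impulse response as follows. This is a minimal sketch of one possible reading of the definitions: the peak is taken in the direction of the impact response, the half-life is the first period after impact in which the response is below half of the impact response, and a response that never falls that far is assigned the last available horizon. The function names and the two example paths are hypothetical.

```python
def up_life(irf):
    """Periods after impact until the response peaks."""
    sign = 1.0 if irf[0] >= 0 else -1.0
    signed = [sign * x for x in irf]
    return signed.index(max(signed))

def half_life(irf):
    """First period after impact with a response below half of the impact response."""
    sign = 1.0 if irf[0] >= 0 else -1.0
    threshold = 0.5 * sign * irf[0]
    for h in range(1, len(irf)):
        if sign * irf[h] < threshold:
            return h
    return len(irf) - 1  # never falls below the threshold within the horizon

hump = [1.0, 1.5, 2.0, 1.2, 0.8, 0.4, 0.1]   # made-up hump-shaped response
decay = [1.0, 0.7, 0.49, 0.343, 0.2401]      # made-up monotone decay
```

A monotonically decaying response has an up-life of zero, as in the DSGE columns with zero up-lives, while a hump-shaped response has a positive up-life and a longer half-life.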


[Figure 1: log marginal likelihoods (vertical axis) of the DSGE-VAR models plotted against the first
hyperparameter (horizontal axis), with one curve for each value 3^2, 5^2, 7^2 and 9^2 of the second
hyperparameter and an annotation "DSGE: 1001.956".]

Figure 1. Log marginal likelihood estimates

Note: A dot represents the log marginal likelihood estimate of a model A_i. The horizontal axis
plots the first hyperparameter x on the scale x/(1 + x). The optimal hyperparameter pair is
(0.7, 5^2). Assuming a flat prior over the models including the DSGE model, the posterior
probabilities p(A_i | Y) are estimated as follows (rows: second hyperparameter; columns: first
hyperparameter).

        0.4    0.5    0.6    0.7    0.8    0.9    1.0    1.1    1.2    1.3    1.4
5^2     0.001  0.020  0.094  0.131  0.130  0.061  0.030  0.012  0.005  0.002  0.001
7^2     0.000  0.011  0.064  0.124  0.129  0.067  0.036  0.013  0.006  0.002  0.001
9^2     0.000  0.001  0.009  0.013  0.017  0.011  0.006  0.002  0.001  0.000  0.000

All the other models have a posterior probability lower than 0.0005.

[Figure 2: marginal posterior densities of the initial responses for the DSGE and DSGE-VAR models.
Panel (a): A positive technology shock in the US. Panel (b): An expansionary (negative) monetary
policy shock in the US.]

Figure 2. Marginal posterior density estimates of the initial responses

Note: The estimates for the DSGE-VAR are from the model with the optimal hyperparameters,
(0.7, 5^2). The x-axis is in percentage (inflation and interest rates: annualized). The y-axis is a
density. The gray vertical lines are at 0.

[Figure 3 plots: panel (a) shows time series of the shocks for the DSGE and DSGE-VAR models (x-axis: Time, 1985 to 2005); panel (b) shows scatter plots of the same series.]

Figure 3. Posterior means of the structural shocks
Note: Panel (a) displays the posterior means of the structural shocks at each point in time. The red solid
lines represent the estimated series of the DSGE model and the blue dashed lines represent the estimated
series of the DSGE-VAR model. The posterior distribution of the DSGE structural shocks is constructed
using the Kalman smoother. The posterior distribution of the DSGE-VAR structural shocks is constructed
by identifying the reduced-form residuals with the contemporaneous coefficient matrix. Panel (b) presents
scatter plots of the same series. The x-coordinates are from the DSGE model and the y-coordinates are from
the DSGE-VAR model. The gray diagonal lines are 45-degree lines. For the DSGE-VAR, the hyperparameters
are (0.7, 5^2).

[Figure 4 plots: impulse responses of the DSGE and DSGE-VAR models, including the price ratio log(P*/P), with the horizon axis running to 40. Panel (a): A positive technology shock in the US. Panel (b): An expansionary (negative) monetary policy shock in the US.]

Figure 4. Impulse response functions of exchange rates and price ratios
Note: The thick and thin lines represent the pointwise means and 68% error bands, respectively. The error
bands are [16% quantile, 84% quantile]. For the DSGE-VAR, the hyperparameters are (0.7, 5^2). The y-axis is
percentage deviations from the deterministic steady state values.
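Pointwise means and [16% quantile, 84% quantile] bands of the kind described in the note can be computed horizon by horizon from posterior draws of an impulse response. The sketch below assumes the draws are stored as an array with one row per draw and one column per horizon; the function name `pointwise_bands` and the simulated draws are hypothetical and serve only as an illustration.

```python
import numpy as np


def pointwise_bands(draws, lower=0.16, upper=0.84):
    """Pointwise mean and quantile band across posterior draws.

    draws: array of shape (n_draws, horizon), one impulse response per row.
    Returns (mean, lower_band, upper_band), each of length `horizon`.
    """
    draws = np.asarray(draws, dtype=float)
    mean = draws.mean(axis=0)
    lower_band, upper_band = np.quantile(draws, [lower, upper], axis=0)
    return mean, lower_band, upper_band


# Hypothetical draws: a geometrically decaying response with a random scale.
rng = np.random.default_rng(0)
decay = 0.9 ** np.arange(40)
draws = decay * (1.0 + 0.2 * rng.standard_normal((1000, 1)))
mean, band_lo, band_hi = pointwise_bands(draws)
```

Because the bands are computed separately at each horizon, they describe marginal uncertainty at that horizon and not a joint statement about the whole path.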

[Figure 5 plots: impulse responses of the DSGE and DSGE-VAR models for the nominal exchange rate, the interest rate differential and the excess return, with the horizon axis running to 40. Panel (a): A positive technology shock in the US. Panel (b): An expansionary (negative) monetary policy shock in the US.]

Figure 5. Impulse response of nominal exchange rates, interest rate differentials, and expected excess returns
Note: The thick and thin lines represent the pointwise means and 68% error bands, respectively. The error
bands are [16% quantile, 84% quantile]. For the DSGE-VAR, the hyperparameters are (0.7, 5^2). The y-axis is
percentage deviations from the deterministic steady state values. The interest rate differential and the
expected excess returns are in annualized percentage. The expected excess return between period 1 and
period k is computed as
$$p_k = \sum_{j=1}^{k-1} \left\{ r_j^* - r_j + 400\,[\hat{s}_{j+1} - \hat{s}_j] \right\}.$$
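The cumulative excess return in the note can be computed from the response paths of the two interest rates and the nominal exchange rate. The sketch below rests on an assumed sign convention, namely that the per-period term is the foreign rate minus the home rate plus 400 times the change in the log exchange rate (home-currency price of foreign currency). The function name `expected_excess_returns` and the numbers are hypothetical.

```python
def expected_excess_returns(r_home, r_foreign, s):
    """Cumulative expected excess returns p_1, ..., p_K.

    p_k = sum_{j=1}^{k-1} ( r*_j - r_j + 400 * (s_{j+1} - s_j) ),
    where position 0 of each list holds period 1. The sign convention
    (foreign minus home) is an assumption of this sketch.
    """
    p = [0.0]  # p_1 is an empty sum
    for j in range(len(s) - 1):
        one_period = r_foreign[j] - r_home[j] + 400.0 * (s[j + 1] - s[j])
        p.append(p[-1] + one_period)
    return p


# Hypothetical paths: the foreign rate is one annualized point below the home
# rate and the exchange rate depreciates by 0.25 percent per quarter, so the
# interest differential is exactly offset and excess returns are zero.
excess = expected_excess_returns(
    r_home=[4.0, 4.0, 4.0],
    r_foreign=[3.0, 3.0, 3.0],
    s=[0.0, 0.0025, 0.005],
)
```

In this example uncovered interest parity holds by construction, so any nonzero value of `excess` in an estimated model would measure a deviation from it.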

[Figure 6 plots: impulse responses of the DSGE-VAR model, including the price ratio log(P*/P), with the horizon axis running to 40. Panel (a): A positive technology shock in the US. Panel (b): An expansionary (negative) monetary policy shock in the US.]

Figure 6. Impulse response functions of exchange rates and price ratios with a
general covariance structure
Note: The impulse response functions are computed from the DSGE-VAR model that allows a correlation of
the wedge in the real exchange rate equation, z_t, with the demand shocks. The thick and thin lines represent
the pointwise means and 68% error bands, respectively. The error bands are [16% quantile, 84% quantile].
For the DSGE-VAR, the hyperparameters are (0.7, 5^2). The y-axis is percentage deviations from the
deterministic steady state values.

[Figure 7 plots: grid of impulse responses (x-axis: Time), with columns labeled by the shocks aF, mF, mH, aH, gF and gH.]

Figure 7. Impulse response of the DSGE model and the DSGE-VAR model at the
posterior mode of the DSGE model
Note: The red solid lines represent the response of the DSGE model and the blue dashed lines represent the
response of the DSGE-VAR model. Each plot shows the response of the variable on the left-hand side to the
shock at the top. The impulse response of the DSGE-VAR model is computed with the DSGE model prior
strictly imposed. The scales of the y-axes for output growth, inflation and interest rates are matched
between the US and the Euro area so that they are comparable.

References
An, S., and F. Schorfheide (2007): "Bayesian Analysis of DSGE Models," Econometric Reviews, 26(2), 113–172.
Backus, D., and G. Smith (1993): "Consumption and real exchange rates in dynamic economies with non-traded goods," Journal of International Economics, 35(3-4), 297–316.
Benigno, G. (2004): "Real exchange rate persistence and monetary policy rules," Journal of Monetary Economics, 51(3), 473–502.
Bergin, P., and R. Feenstra (2001): "Pricing-to-market, staggered contracts, and real exchange rate persistence," Journal of International Economics, 54(2), 333–359.
Bouakez, H. (2005): "Nominal rigidity, desired markup variations, and real exchange rate persistence," Journal of International Economics, 66(1), 49–74.
Burstein, A., M. Eichenbaum, and S. Rebelo (2005): "Large Devaluations and the Real Exchange Rate," Journal of Political Economy, 113(4), 742–784.
Calvo, G. (1983): "Staggered prices in a utility-maximizing framework," Journal of Monetary Economics, 12(3), 383–398.
Canova, F., and L. Sala (2009): "Back to square one: Identification issues in DSGE models," Journal of Monetary Economics, 56(4), 431–449.
Chari, V. V., P. J. Kehoe, and E. R. McGrattan (2002): "Can Sticky Price Models Generate Volatile and Persistent Real Exchange Rates?," Review of Economic Studies, 69(3), 533–563.
Del Negro, M., and F. Schorfheide (2004): "Priors from General Equilibrium Models for VARS," International Economic Review, 45(2), 643–673.
Del Negro, M., and F. Schorfheide (2006): "How good is what you've got? DSGE-VAR as a toolkit for evaluating DSGE models," Economic Review, 91(2).
Del Negro, M., F. Schorfheide, F. Smets, and R. Wouters (2007): "On the Fit of New Keynesian Models," Journal of Business & Economic Statistics, 25(2), 123–143.
Demmel, J., and P. Koev (2006): "Accurate and efficient evaluation of Schur and Jack functions," Mathematics of Computation, 75, 223–239.
Eichenbaum, M., and C. L. Evans (1995): "Some Empirical Evidence on the Effects of Shocks to Monetary Policy on Exchange Rates," The Quarterly Journal of Economics, 110(4), 975–1009.
Engel, C. (1999): "Accounting for U.S. Real Exchange Rate Changes," Journal of Political Economy, 107(3), 507–538.
Fernández-Villaverde, J., J. F. Rubio-Ramírez, T. J. Sargent, and M. W. Watson (2007): "ABCs (and Ds) of Understanding VARs," American Economic Review, 97(3), 1021–1026.
Geweke, J. (1999): "Using Simulation Methods for Bayesian Econometric Models: Inference, Development, and Communication," Econometric Reviews, 18(1), 1–73.
Geweke, J. (2005): Contemporary Bayesian Econometrics and Statistics (Wiley Series in Probability and Statistics). Wiley-Interscience, 1 edn.
Hamilton, J. D., D. F. Waggoner, and T. Zha (2007): "Normalization in Econometrics," Econometric Reviews, 26(2), 221–252.
Ingram, B. F., and C. H. Whiteman (1994): "Supplanting the 'Minnesota' prior: Forecasting macroeconomic time series using real business cycle model priors," Journal of Monetary Economics, 34(3), 497–510.
Iskrev, N. (2010): "Local identification in DSGE models," Journal of Monetary Economics, 57(2), 189–202.
Justiniano, A., G. E. Primiceri, and A. Tambalotti (2010): "Investment shocks and business cycles," Journal of Monetary Economics, 57(2), 132–145.
King, R. G., C. I. Plosser, and S. T. Rebelo (1988): "Production, growth and business cycles: II. New directions," Journal of Monetary Economics, 21(2-3), 309–341.
Koev, P., and A. Edelman (2006): "The efficient evaluation of the hypergeometric function of a matrix argument," Mathematics of Computation, 75, 833–846.
Komunjer, I., and S. Ng (2010): "Dynamic Identification of DSGE Models."
Lubik, T., and F. Schorfheide (2006): "A Bayesian Look at New Open Economy Macroeconomics," chap. 5.
Magnus, J. R., and H. Neudecker (1999): Matrix Differential Calculus with Applications in Statistics and Econometrics, 2nd Edition. John Wiley & Sons.
Muirhead, R. J. (1982): Aspects of multivariate statistical theory. Wiley.
Rabanal, P., and J. Rubio-Ramirez (2005): "Comparing New Keynesian models of the business cycle: A Bayesian approach," Journal of Monetary Economics, 52(6), 1151–1166.
Scholl, A., and H. Uhlig (2008): "New evidence on the puzzles: Results from agnostic identification on monetary policy and exchange rates," Journal of International Economics, 76(1), 1–13.
Sims, C., D. Waggoner, and T. Zha (2008): "Methods for inference in large multiple-equation Markov-switching models," Journal of Econometrics, 146(2), 255–274.
Sims, C. A. (2005): "Dummy Observation Priors Revisited."
Sims, C. A. (2007): "Comment on 'On the Fit of New Keynesian Models' by Del Negro, Schorfheide, Smets and Wouters," Journal of Business and Economic Statistics, 25(2), 152–154.
Sims, C. A., J. H. Stock, and M. W. Watson (1990): "Inference in Linear Time Series Models with some Unit Roots," Econometrica, 58(1).
Sims, C. A., and T. Zha (2006): "Does monetary policy generate recessions?," Macroeconomic Dynamics, 10(02), 231–272.
Smets, F., and R. Wouters (2003): "An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area," Journal of the European Economic Association, 1(5), 1123–1175.
Smets, F., and R. Wouters (2007): "Shocks and frictions in US business cycles: A Bayesian DSGE approach," American Economic Review, 97(3), 586–606.
Steinsson, J. (2008): "The dynamic behavior of the real exchange rate in sticky price models," American Economic Review, 98(1), 519–533.
Taylor, J. B., and V. Wieland (2009): "Surprising Comparative Properties of Monetary Models: Results from a New Data Base," National Bureau of Economic Research Working Paper Series, pp. 14849+.
Tierney, L. (1994): "Markov Chains for Exploring Posterior Distributions," The Annals of Statistics, 22(4), 1701–1728.
Yun, T. (1996): "Nominal price rigidity, money supply endogeneity, and business cycles," Journal of Monetary Economics, 37(2), 345–370.

Appendix A. Derivations and Proofs


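A.1. Prior distribution when a DSGE model has a deterministic time trend. Suppose
that a DSGE model of interest implies that some of the variables have a deterministic trend.
Following the DSGE model, I consider a reduced-form VAR model with a deterministic trend
conditional on $\Gamma_0$ and $\theta$,
$$y_t' = \Phi_0' + \Phi_1' t + y_{t-1}' B_1 + \cdots + y_{t-p}' B_p + u_t',$$
for $t = 1, \ldots, T$, where $u_t \sim$ i.i.d. $N(0, \Sigma_u)$ and $\Sigma_u = (\Gamma_0' \Gamma_0)^{-1}$. Following Sims, Stock, and Watson
(1990) I rewrite the model as
$$y_t' = \mu' + \delta' t + \left[ y_{t-1}' - \mu' - \delta'(t-1) \right] B_1 + \cdots + \left[ y_{t-p}' - \mu' - \delta'(t-p) \right] B_p + u_t',$$
where
$$\mu' = \left[ \Phi_0' - \delta' (B_1 + 2B_2 + \cdots + pB_p) \right] \left[ I - B_1 - \cdots - B_p \right]^{-1},$$
and
$$\delta' = \Phi_1' \left[ I - B_1 - \cdots - B_p \right]^{-1}.$$
The demeaned and detrended series $y_t' - \mu' - \delta' t$ is covariance stationary. That is, the roots of
$$\left| I - B_1 z - B_2 z^2 - \cdots - B_p z^p \right| = 0$$
are greater than one in modulus. Let us define $\tilde{y}_t' = y_t' - \mu' - \delta' t$ and rewrite the model as
$$y_t' = \tilde{x}_t' \tilde{B} + u_t',$$
where $\tilde{x}_t = (1, t, \tilde{y}_{t-1}', \ldots, \tilde{y}_{t-p}')'$ and $\tilde{B} = (\mu, \delta, B_1', \ldots, B_p')'$. Stacking the observations, I
obtain a matrix form representation
$$Y = \tilde{X} \tilde{B} + u,$$
where $Y = (y_1, \ldots, y_T)'$, $\tilde{X} = (\tilde{x}_1, \ldots, \tilde{x}_T)'$, and $u = (u_1, \ldots, u_T)'$.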
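The stationarity condition and the mapping from the reduced-form constant and trend coefficients to the detrending coefficients can be checked numerically. The sketch below is illustrative only: the names `phi0`, `phi1`, `mu` and `delta`, the helper functions, and the coefficient values are all assumptions introduced here, and the lag matrices follow the row-vector convention y_t' = ... + y_{t-s}' B_s + u_t'.

```python
import numpy as np


def companion_matrix(B_list):
    """Companion matrix of a VAR written as y_t' = sum_s y_{t-s}' B_s + u_t'."""
    n = B_list[0].shape[0]
    p = len(B_list)
    F = np.zeros((n * p, n * p))
    # In column form the lag matrices are the transposes B_s'.
    F[:n, :] = np.hstack([B.T for B in B_list])
    F[n:, :-n] = np.eye(n * (p - 1))
    return F


def is_stationary(B_list):
    """True if all roots of |I - B_1 z - ... - B_p z^p| = 0 lie outside the
    unit circle, i.e. all companion eigenvalues lie strictly inside it."""
    eigenvalues = np.linalg.eigvals(companion_matrix(B_list))
    return bool(np.max(np.abs(eigenvalues)) < 1.0)


def trend_coefficients(phi0, phi1, B_list):
    """Solve delta' = phi1' A^{-1} and mu' = [phi0' - delta' W] A^{-1},
    with A = I - sum_s B_s and W = sum_s s * B_s."""
    n = B_list[0].shape[0]
    A = np.eye(n) - sum(B_list)
    W = sum((s + 1) * B for s, B in enumerate(B_list))
    delta = np.linalg.solve(A.T, phi1)
    mu = np.linalg.solve(A.T, phi0 - W.T @ delta)
    return mu, delta


# Hypothetical one-lag system with two variables.
B1 = np.array([[0.5, 0.0], [0.1, 0.3]])
stationary = is_stationary([B1])
mu, delta = trend_coefficients(np.array([0.2, 0.1]), np.array([1.0, 0.0]), [B1])
```

Solving the linear systems with the transpose of A avoids forming the inverse explicitly and keeps the row-vector convention of the text.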

0
Now suppose that we simulate data Y = yp+1 , , y0 , y1 , , yT of length T from the DSGE
accordingly. Then, the likelihood of the simulated data is
t , and X
model. Define yt , x



0 




1

X X
B BT
,
p Y |B , u exp tr u B BT
2

1 

= X
0 X

0 Y . Conditional on = 0 and the simulated data Y , this


where B
X
u

0 0

likelihood implies that a posterior distribution of


when the simulated data are observed with
and variance-covariance matrix

a flat prior for B is a normal distribution with mean B


u
T


0 X

1

:


, u X
0 X

B |Y , u N B
T

1 

As for the case with no deterministic time trend, I want to replace the simulated moments with
the population moments. I use the following rescaling matrix in Sims, Stock, and Watson (1990)13
1/2

T
0

T3/2

.
T1/2
T =

..

.
1/2

0
T
13The rescaling matrix is used for asymptotics as the sample size goes to infinity in Sims, Stock, and Watson (1990).

Then,
P
T2 Tt=1 t
0
2 PT

T
3
2

0
=
t=1 t T
t=1 t
T


0
P

T
1

0
0
T
2,t x
2,t
t=1 x

T
X
1 x
x
0 1
T

t t

t=1

1
2
1
3

0 T

0
0
1 PT

2,t x
0
t=1 x
2,t

0

0 , , y
0
where x
2,t = yt1
tp
. For simplicity, I approximate the sums of time indices. When T
is large enough, the approximation error is negligible. If I repeat the simulation of Y ofthe same

P
length, then the mean of T1 T x
x
0 in the lower right corner will converge to E x
x
0 .
t=1

2,t 2,t

2,t 2,t

Therefore, the variance covariance matrix

T
X

1 1
1
x
t x
0
= u 1

t
T
T
T
T

0 X

u X

1

t=1

is substituted for by


u T QT
where

1
2T
1 2
3T

1
1 T

QT = 2
0

1

 0
 .
E x
2,t x
0
2,t

For the mean, I consider the autoregressive coefficient matrix


and the coefficients on the constant
0

0
0
and time trend term separately. Let B2 = B1 , , Bp . Then,


B
2,T

T
1 X

= x
2,t x
0
2,t
T t=1

T
1X

x
2,t yt0 ,
T t=1

1
since with the demeaned
() = E x
and thus B
2,t x
0
E x
2,t yt0 can be used in place of B
2
2,t
2,T
and detrended yt the model is stationary. I let the coefficients on the constant and time trend
() = ( , ) of the DSGE model given . Then, the prior
term B1 = ( , )0 be equal to B
1
0

0
0
distribution of B = (B1 , B2 ) is

() , u T Q
B |0 N B
T


1 

0

() = B
()0 , B
()0 . The distribution of the reduced-form coefficients and can
where B
1
2
=
be obtained from the result. Let B

B10
B = P B,


Bp0

0

. Then,

where

P =

1 0 (0 + 0 )
0 1
( 0 )

0 0
.. ..
. .
I
0 0

It follows that

(0 + p 0 )

( 0 )

B = (I P ) B ,

() = P B
(),
and thus for B

i1 
h



(), u T P 0 1 Q P 1
B |0 N B
,
T
or

() , u T
B| N B

P0

1

QT P 1

i1 

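A.2. Probability density of the impact matrix in the DS identification method. The
density of the Inverted-Wishart prior for a covariance matrix $\Sigma_u$ is proportional to
$$p(\Sigma_u) \propto |\Sigma_u|^{-\frac{\nu+n+1}{2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left[ S^2 \Sigma_u^{-1} \right] \right\}, \tag{29}$$
where $\nu > n - 1$ and $n$ is the dimension of $\Sigma_u$. $S^2$ is positive definite symmetric. The posterior
density in DS is of the same form. Note that the Jacobian of the transformation from $\Sigma_u$ to its
Cholesky factor $L$ is
$$2^n \left| LL' \right|^{\frac{n+1}{2}} \prod_{i=1}^{n} l_{ii}^{-i},$$
where $l_{ii}$ is the $i$-th diagonal element of $L$. This leads us to
$$p(L) \propto \left| LL' \right|^{-\frac{\nu}{2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left[ S^2 (LL')^{-1} \right] \right\} \prod_{i=1}^{n} l_{ii}^{-i}.$$
This kernel is not invariant to the reordering of the variables because of the power $-i$. Since DS
let $\Gamma_0^{-1} = L\,\Omega(\theta)$ with $\Omega(\theta)$ fixed, the Jacobian of transforming $L$ into $\Gamma_0^{-1}$ is 1. Therefore, the
kernel (29) for $\Sigma_u$ results in the following density kernel for $\Gamma_0^{-1}$:
$$p(\Gamma_0^{-1}) \propto \left| \Gamma_0^{-1} \Gamma_0^{-1\prime} \right|^{-\frac{\nu}{2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left[ S^2 \left( \Gamma_0^{-1} \Gamma_0^{-1\prime} \right)^{-1} \right] \right\} \prod_{i=1}^{n} \left( \sum_{k} \gamma_0^{ik}\, \omega(\theta)^{ki} \right)^{-i},$$
where $\gamma_0^{ik}$ is the $ik$-th entry of $\Gamma_0^{-1}$, $\omega(\theta)^{ki}$ is the $ki$-th entry of $\Omega(\theta)^{-1}$, and
$\sum_{k} \gamma_0^{ik}\, \omega(\theta)^{ki} = l_{ii}$.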

A.3. Proof of Proposition 1. This is an application of Proposition 4 in Appendix B.

A.4. Proof of Proposition 2. By the Bayes rule,


p (Y |) p (B, 0 |Y, ) = p (Y |B, 0 , ) p (B|0 , ) p (0 |) .

Note that
nT
2

p (Y |B, 0 , ) = (2)


h
i
0 T

0 2 exp 1 tr 0 0 (Y XB)0 (Y XB) ,
0
0



 n
2

T xx , T

 

0 

0
e
e

B B ()
,
T xx , T
0 B B ()

0 k
0 2
p (B|0 , ) = (2)
0


nk
2

1
exp tr
2
2

n2

p (0 |) = (2)


 
n
0 

1
2 2
2

0 0 ()
,
S0 exp tr S0 0 0 ()

and,
p (B, 0 |Y, ) = (2)

n



2

T xx , T + X 0 X

0 





be
be
,
0 0 B B
Txx , T + X 0 X B B
()
()

0 k
0 2
0



nk
2

1
exp tr
2


T
b2
be2 0 
0 () 0 0 2 exp 1 tr S
Cn1 T + n, Se , S02 ,
0 0
0
2
 

0 

1
0 ()
0 ()
0
.
exp tr S02 0
2




Therefore,
p (Y |) =

p (Y |B, 0 , ) p (B|0 , ) p (0 |)
p (B, 0 |Y, )


 n
2
n

2
n
T xx , T (2) 2 S02 2

.
n



be2 2
2 1

0

T + n; S ; S0 ; 0 ()
T xx , T + X X Cn

(2)

nT
2

A.5. Proof of Proposition 3. From an identity


p (B | Y, 0 , ) p (Y | 0 , ) = p (Y | B, 0 , ) p (B | 0 , ) ,
Result (1) follows since
p (Y | 0 , ) =

p (Y | B, 0 , ) p (B | 0 , )
p (B | Y, 0 , )
(2)

nT
2


 2



 n
T
b

2
T xx , T |00 0 | 2 exp 12 tr Se (00 0 )
.

n



2
T xx , T + X 0 X

Now fix . Then q (0 , | Y ) is maximized when





 2
 
0 

0 T

1
be
0
2

0 2 exp 1 tr S
0 0
exp tr S0 0 0 ()
0 0 ()
0
2
2
is maximized with respect to 0 . Result (2) and (3) immediately follow from Theorem 5 in Appendix
B.

Appendix B. Inference on a SVAR model with a prior on the contemporaneous
coefficient matrix

We have a SVAR model for $y_t$
$$\Gamma(L) y_t = \varepsilon_t, \tag{30}$$
where $\Gamma(L)$ is a $p$-th order matrix polynomial in the lag operator $L$, $\Gamma_0 + \Gamma_1 L + \cdots + \Gamma_p L^p$; $\varepsilon_t$ is
an $n \times 1$ vector of VAR model structural shocks and $\varepsilon_t \sim N(0, I_n)$. A constant term has been
suppressed for notational convenience. We assume that $\Gamma_0$ is non-singular. The assumption that
the structural shocks have a non-degenerate covariance matrix conditional on the past
observations in fact implies that $\Gamma_0$ is non-singular. We also assume that the initial observations
$Y_0 = \{y_0, \ldots, y_{-p+1}\}$ are given. For notational simplicity, we suppress the initial observations
in the conditioning sets of the probability density functions in what follows. The SVAR model (30) can be
transformed into a reduced form by multiplying through on the left by $\Gamma_0^{-1}$:
$$B(L) y_t = u_t, \tag{31}$$
where $B(L) = I_n + \Gamma_0^{-1} (\Gamma_1 L + \cdots + \Gamma_p L^p)$; $u_t = \Gamma_0^{-1} \varepsilon_t$ is an $n \times 1$ vector of reduced-form shocks and
$u_t \sim N\left(0, (\Gamma_0' \Gamma_0)^{-1}\right)$. Let $B_s = -\Gamma_0^{-1} \Gamma_s$ for $s = 1, \ldots, p$ and $\Sigma_u = (\Gamma_0' \Gamma_0)^{-1}$. The reduced-form
model (31) can be written compactly in matrix form
$$Y = XB + u, \tag{32}$$
where $Y = (y_1, \ldots, y_T)'$; a typical row of $X$ is $x_t' = (y_{t-1}', \ldots, y_{t-p}')$; $u = (u_1, \ldots, u_T)'$; and
$B = (B_1, \ldots, B_p)'$. Define $k = np$.

Instead of dealing with the lagged coefficient matrices of the SVAR model (30), we condition on
$\Gamma_0$ and do inference with respect to the reduced-form coefficient matrix $B$. Assuming a prior for
$\Gamma_0$ and a natural conjugate prior for $B$, the following proposition derives the posterior distribution
for $\Gamma_0$ and $B$. The function $C_n$ is defined in equation (36) in Appendix C.
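The mapping from the structural coefficients to the reduced form in (31) and (32) is a pair of linear-algebra operations. The sketch below assumes the structural lag polynomial is supplied as a list of matrices, labeled `Gammas` here purely for illustration; the function name `reduced_form` and the numerical values are hypothetical.

```python
import numpy as np


def reduced_form(Gammas):
    """Map structural matrices [Gamma_0, Gamma_1, ..., Gamma_p] to (B, Sigma_u).

    B_s = -Gamma_0^{-1} Gamma_s, stacked as B = (B_1, ..., B_p)' (k x n, k = n p),
    and Sigma_u = (Gamma_0' Gamma_0)^{-1}.
    """
    G0 = Gammas[0]
    B_list = [-np.linalg.solve(G0, G) for G in Gammas[1:]]
    B = np.vstack([B_s.T for B_s in B_list])
    Sigma_u = np.linalg.inv(G0.T @ G0)
    return B, Sigma_u


# Hypothetical two-variable system with one lag.
G0 = np.array([[2.0, 0.0], [1.0, 1.0]])
G1 = np.array([[-1.0, 0.0], [0.0, -0.5]])
B, Sigma_u = reduced_form([G0, G1])
```

Conditioning on the contemporaneous matrix and working with `B` keeps the likelihood in the familiar multivariate regression form, which is what makes the conjugate prior in the proposition convenient.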
Proposition 4. For the model (32), assume the following prior


p (0 ) = Cn1 , Se2 , S 2 , 0

0
i
i
1 h
1 h
exp tr Se2 00 0
,
exp tr S 2 0 0 0 0
2
2


00 0

B|0 N B, 00 0

1

H 1 .

where > n and Se2 and S 2 are positive definite symmetric. Then the posterior distribution of 0
and B is
b2

p (0 |Y ) = Cn1 T + , Se2 + Se , S 2 , 0





0 T +n

1
be2
2
0
e


2
0 0
exp tr S + S
0 0
2


0
i
1 h 2

exp tr S
2

0 0

0 0

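and
$$B \mid \Gamma_0, Y \sim N\left( \bar{B},\; (\Gamma_0' \Gamma_0)^{-1} \otimes (X'X + H)^{-1} \right),$$
where
$$\bar{B} = (X'X + H)^{-1} (X'Y + H\underline{B}), \qquad \hat{\tilde{S}}^2 = Y'Y + \underline{B}' H \underline{B} - \bar{B}' (X'X + H) \bar{B},$$
with $\underline{B}$ denoting the prior mean of $B$.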


When = n and Se2 = 0, that is, the prior for 0 is a normal distribution, the posterior distribution
for 0 and B is

be2 2
p (0 |Y ) =
T + n, S , S , 0



 2

0 T

0
i
1
1 h 2
be
0


2
exp tr S 0 0 0 0
0 0 exp tr S 0 0
,

Cn1

and


B|, Y N B, 00 0

1

X 0X + H


1 

Proof. The joint prior density is proportional to



h
i
0 k

0 2 exp 1 tr 0 0 (B B)0 H (B B)
0
0
2




0 n
0
i
1 h e2 0 i
1 h 2


2
0 0
exp tr S 0 0
,
exp tr S 0 0 0
2
2
and a kernel of the pdf of the data conditional on the initial observations Y0 is

i
h
0 T

0 2 exp 1 tr 0 0 (Y XB)0 (Y XB) .
0
0
2
Then, the joint posterior density for B and 0 is proportional to


0

0 T ++kn


1
0
0
0
2
tr

X
X
+
H
exp

B
B

B
0
0 0
2






0
i

1 h
1
b2
00 0
exp tr S 2 0 0 0 0
,
exp tr Se2 + Se
2
2
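where
$$\bar{B} = (X'X + H)^{-1} (X'Y + H\underline{B}), \qquad \hat{\tilde{S}}^2 = Y'Y + \underline{B}' H \underline{B} - \bar{B}' (X'X + H) \bar{B},$$
with $\underline{B}$ denoting the prior mean of $B$.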

Conditional on 0 and the data, the posterior distribution for B is straightforward. Theorem 10
gives us the marginal posterior density for 0 .
The result of the case where = n and Se2 = 0 is straightforward.
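The conjugate update for the reduced-form coefficients amounts to completing the square in B. The sketch below computes the posterior mean and the residual cross-product matrix for given data, prior mean and prior precision factor H. The function name `conjugate_update` and the simulated inputs are hypothetical; setting H to zero corresponds to the flat-prior case.

```python
import numpy as np


def conjugate_update(Y, X, B_prior, H):
    """Posterior mean of B and the residual cross-product matrix.

    B_post = (X'X + H)^{-1} (X'Y + H B_prior)
    S_hat  = Y'Y + B_prior' H B_prior - B_post' (X'X + H) B_post
    """
    P = X.T @ X + H
    B_post = np.linalg.solve(P, X.T @ Y + H @ B_prior)
    S_hat = Y.T @ Y + B_prior.T @ H @ B_prior - B_post.T @ P @ B_post
    return B_post, S_hat


# Hypothetical data: 50 observations, 3 regressors, 2 equations.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
Y = rng.standard_normal((50, 2))
B_prior = rng.standard_normal((3, 2))
H = 2.0 * np.eye(3)
B_post, S_hat = conjugate_update(Y, X, B_prior, H)
```

For any coefficient matrix B, the sum of the data and prior quadratic forms equals the quadratic form around `B_post` plus `S_hat`, which is the identity that separates the conditional posterior of B from the marginal posterior of the contemporaneous matrix.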

The normal prior for 0 in (7) is a special case of the prior in Proposition 4, with = n and
Se2 = 0. If we keep the kernel of the prior for 0 of the DS procedure as a probability density
kernel for 0 and modify it by multiplying a kernel of the normal prior, we will get the prior in
Proposition 4. The prior for 0 turns out to be conjugate. When we impose a flat prior on B, the
result holds with H = 0.
The posterior density function has a unique global maximum. Let us define the domain of the
prior and posterior of 0 , G = {0 Rnn : 0 is non-singular}.
Theorem 5. Suppose that > n; Se2 and S 2 are positive definite symmetric; and 0 is non-singular.
The function p : G R+ ,


p (0 ) = Cn1 , Se2 , S 2 , 0







0 n
0
i
1 h e2 0 i
1 h 2


2
0 0
exp tr S 0 0
exp tr S 0 0 0 0

has a unique global maximum at




1
1
,
=
In + M 0 S 2 Se2 + S 2
2
where M is a unique symmetric square root of
1 

1

1
Se2 + S 2 0 S 2
.
In + ( n) S 2 0 0
4
The Hessian of log p (0 ) which is given by


+
0

( n) P

(33)



0
1
0

1
0





2
2
e
+ S + S In ,


where P is the commutation matrix such that P Z = Z 0 for a conformable matrix Z, is negative
semidefinite at 0 = +
0.
Proof. The first differential of the log density is
o 1 h
i
n n 0 1
d log p (0 ) =
d 00 0 tr Se2 d 00 0
tr 0 0
2
2
0
i
1 h 2
tr S d 0 0 0 0
2
o 1 h
i 1 h
i
0
n n 0 1 0
20 d0 tr Se2 200 d0 tr S 2 2 0 0 d0
=
tr 0 0
2
2 i
2
n
o
h
i
h
0
1
2
2
0
= ( n)tr 0 d0 tr Se 0 d0 tr S 0 0 d0
= tr

nh

2
e2 0
( n)1
0 S 0 S 0 0

0 i

d0 .

It follows that the first order condition to maximize log p (0 ) is


0

( n)01 Se2 00 S 2 0 0 = 0
which is equivalent to
(34)

Se2 + S 2

00 0 S 2 0 0 0 ( n)In = 0


since 0 is non-singular. Since Se2 and S 2 are positive definite symmetric and 0 is non-singular,
there exists a non-singular n n matrix X for any non-singular 0 such that


0 = X 0 S 2



Se2 + S 2

1

Plugging this into the first order condition (34) and rearranging the terms, we obtain
(35)

X 0 X X = ( n) S 2 0 0

1 

Se2 + S 2



0 S 2

1

Note that X is symmetric, since X 0 X and the right hand side are symmetric. Let M X 12 In .
Then, M is also symmetric and X 0 X X = M 2 41 In . Therefore,

1 

1
1
M 2 = In + ( n) S 2 0 0
Se2 + S 2 0 S 2
.
4
Let Q denote the right hand side. Since Q is positive definite symmetric, there exists a unique S and
such that SS 0 = Q where S is unitary, S 0 S = In , and is a diagonal matrix with eigenvalues of
Q on the diagonal. Then M is uniquely determined up to the sign change as S1/2 S 0 . This implies

that there exist two Xs satisfying (35) and therefore two 0 s satisfying the first order condition
(34). We let M = S1/2 S 0 and X = 21 In M .
We have found two candidates for a local maximum



1
1
2
2
2
e
+
=
S
,
I
+
M
S
+
S

n
0
0
2



1
1
2
2
2
e

S
+
S
.
=
I

S
n
0
0
2


+
> log p
We prove that log p +
0
0 . The first order condition (34) holds at 0 and 0 and

thus for 0 = +
0 or 0 = 0 ,

log p(0 ) = log Cn1 , Se2 , S 2 , 0


i
1 h 
+( n) log |0 | tr S 2 0 0 0 + ( n)In + S 2 0 0 0 ,
2
where || means the absolute value of the determinant of a matrix. Therefore,

i 1 h
h

i
+

+
2
0
)
=
(

n)
log

tr
S

.
log p(+
)

log
p(

log

+




0
0
0
0
0
0
0
2
Note that












M + 1 In log M 1 In
log +

log

=
log



0
0



2
2

n
X

log

i=1

1/2

1/2

1
2
,
1
2

where i is an eigenvalue of Q or the square of an eigenvalue of M . All i s are greater than 1/4.



1

Se2 + S 2

+
2
It follows that log +
0 log 0 > 0. Since 0 0 = 2M 0 S

tr

h

S 2 0 0



+
0 0

i

= 2tr



S 2 0 0 M 0 S 2



Se2 + S 2

1

> 0.

Consequently,

log p(+
0 ) log p(0 ) > 0.
Now we are going to show that log p (0 ) has a global maximum
and the first differential
 

+
d log p (0 ) = 0 at a global maximum. Since log p 0 > log p 0 , it will follow that log p (0 )

has a unique global maximum at +


0 . We proceed by splitting G into a compact set and a set which
is a union of the complement of the compact set and some of the boundary of the compact set.
Note that

1 h
i
n
log p (0 )
log 00 0 tr Se2 00 0 .
2
2
Since Se2 and (00 0 ) are positive definite symmetric, they are decomposed as
Se2 = U S U 0 ,
00 0 = V V 0 ,
where U and V are unitary and S and are a diagonal matrix with the eigenvalues of Se2 and
(00 0 ) on the diagonal, respectively. Let {s1 , , sn } and {g1 , , gn } be the eigenvalues of Se2
and (00 0 ), respectively. Then si > 0 and gi > 0 for i = 1, . . . , n. Let {y1 , , yn } be the diagonal


elements of U 0 (00 0 ) U . Then, yi > 0 for i = 1, . . . , n and


n
X

yi = tr U 0 00 0 U = tr 00 0




i=1

n
X

gi ,

i=1

which is followed by
i


 

tr Se2 00 0 = tr S U 0 00 0 U

n
X

si yi

i=1

min (si )
= min (si )

n
X
i=1
n
X

yi
gi .

i=1

Therefore,
log p (0 )

n
n
X
nX
1
log (gi ) min (si )
gi .
2 i=1
2
i=1

0
Let us define the right-hand side as h (0) with
 the eigenvalues {g1 , , gn } of (0 0 ).
Now choose  R such that  < log p +
and define Gc = {0 G : h(0 ) } and G = the
0

closure of G\Gc . Since k0 k = (tr00 0 )1/2 = ( ni=1 gi )1/2 , for k0 k = K,


  1
n
h (0 ) <
n log K 2 min (si ) K 2
2
2
and we can choose K so that h (0 ) < . Therefore, every 0 G is bounded and G is compact. It
is obvious that +
0 G.
By the Weierstrass theorem, log p (0 ) which is continuous on the compact set G has a maximum
on G. A maximum of log p (0 ) on G is at an interior point of G, since log p (0 ) is continuous on
G = Gc G and thus for 0 at the boundary of G
P

log p (0 )  < log p +


0 M
where M is the maximum value of log p (0 ) on G. With the fact that log p (0 ) is twice differentiable
on G, we conclude that d log p (0 ) = 0 and d2 log p (0 ) 0 at a maximum. Since we have only

+
two points +
and +
0 and 0 on G satisfying the first order condition, log p(0 ) > log p(0 ), 
0 G,
+
+
we conclude that log p (0 ) attains a unique global maximum at 0 on G. log p 0 is a global


maximum on G, since 0 Gc , log p (0 )  < log p +


0 . In summary, log p (0 ) has a unique


+
2
global maximum at +
0 on G and d log p 0 0.
The second differential of log p (0 ) is
2

d log p(0 ) = tr ( n)

2
1
d
0
0



2
2
0
e
+ S + S d0 d0


and the Hessian (33) of log p(0) isby Magnus and Neudecker (1999). The Hessian is negative
+
2
semidefinite at +
0 since d log p 0 0.


In a univariate case, both +


0 and 0 are a local maximum.

Appendix C. Some results on multivariate statistical analysis


This section defines some matrix argument functions and proves important lemmas and theorems.
Most of the definitions follow Muirhead (1982).
The non-centrality bias in a prior or posterior density for a precision matrix (the inverse of
a covariance matrix) which results from assuming a nonzero mean for the inverse of an impact
matrix (a contemporaneous coefficient matrix) in a linear system is represented by hypergeometric
functions. We first define the hypergeometric function of scalar argument.
Definition 6. The generalized hypergeometric function of scalar argument is defined as
$${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; z) = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_p)_k}{(b_1)_k \cdots (b_q)_k} \frac{z^k}{k!},$$
where $(a)_k = a(a+1) \cdots (a+k-1)$.

The sum of the series converges for all finite (complex) $z$ if $p \le q$. We next define the hypergeometric function of matrix argument which is a generalization of the scalar argument case.
Definition 7. The generalized hypergeometric function of matrix argument is defined as, for a
symmetric $m \times m$ matrix $Z$,
$${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; Z) = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{(a_1)_\kappa \cdots (a_p)_\kappa}{(b_1)_\kappa \cdots (b_q)_\kappa} \frac{C_\kappa(Z)}{k!},$$
where $\sum_\kappa$ denotes summation over all partitions $\kappa$ of $k$ such that a partition $\kappa = (k_1, k_2, \ldots, k_m)$ with
integers $k_1 \ge k_2 \ge \cdots \ge k_m \ge 0$ and $|\kappa| = k_1 + k_2 + \cdots + k_m = k$; $C_\kappa(Z)$ is the zonal polynomial;
and
$$(a)_\kappa = \prod_{i=1}^{m} \left( a - \frac{1}{2}(i-1) \right)_{k_i}$$
is the generalized hypergeometric coefficient where $(a)_k = a(a+1) \cdots (a+k-1)$.
The sum of the series converges for all symmetric $Z$ if $p \le q$. When $m = 1$, the function reduces
to the hypergeometric function of scalar argument. The hypergeometric function defined here is a
special case of the hypergeometric function with the Jack function and the generalized Pochhammer
symbol. See Demmel and Koev (2006). Koev and Edelman (2006) propose an algorithm and provide
C code to compute the hypergeometric function of matrix argument, which they claim is the most
efficient among the algorithms developed until then.
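When m = 1 the matrix-argument series reduces to the scalar series of Definition 6, which can be summed directly. The sketch below evaluates a truncated scalar series and the generalized hypergeometric coefficient for a given partition. It is not the Koev and Edelman algorithm, it does not compute zonal polynomials, and the function names and the truncation length are assumptions made for illustration.

```python
import math


def pochhammer(a, k):
    """Rising factorial (a)_k = a (a+1) ... (a+k-1), with (a)_0 = 1."""
    out = 1.0
    for j in range(k):
        out *= a + j
    return out


def generalized_pochhammer(a, kappa):
    """(a)_kappa = prod_i (a - (i-1)/2)_{k_i} for a partition kappa = (k_1, ..., k_m)."""
    out = 1.0
    for i, k_i in enumerate(kappa):  # i starts at 0, matching (i - 1) in the text
        out *= pochhammer(a - 0.5 * i, k_i)
    return out


def hyp_pfq(a_list, b_list, z, terms=200):
    """Truncated scalar series pFq(a_1..a_p; b_1..b_q; z), summed term by term."""
    total = 0.0
    term = 1.0  # the k = 0 term
    for k in range(terms):
        total += term
        ratio = z / (k + 1)
        for a in a_list:
            ratio *= a + k
        for b in b_list:
            ratio /= b + k
        term *= ratio
    return total


# Known special cases used as sanity checks.
value_exp = hyp_pfq([], [], 1.5)           # 0F0(z) = exp(z)
value_cos = hyp_pfq([], [0.5], -0.25)      # 0F1(; 1/2; -x^2/4) = cos(x) at x = 1
```

Updating each term from the previous one through the ratio of consecutive terms avoids computing large factorials and Pochhammer symbols separately.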
We quote a theorem regarding the expectation of the zonal polynomial of a Wishart
random matrix. It is Corollary 7.2.8 of Muirhead (1982).
Theorem 8. [Corollary 7.2.8 of Muirhead (1982)] If $A$ is $W_n(m, \Sigma)$ with $m > n - 1$ and $B$ is an
arbitrary symmetric $n \times n$ (fixed) matrix, then
$$E\left[ C_\kappa(AB) \right] = 2^k \left( \frac{1}{2} m \right)_\kappa C_\kappa(B\Sigma),$$
where $\kappa$ is a partition of an integer $k$.

Proof. See the proof in Muirhead (1982).

The following lemma is a special case of Theorem 7.3.4 of Muirhead (1982).

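Lemma 9. Suppose that $B$ is an $n \times n$ symmetric matrix and $\Sigma$ is an $n \times n$ positive definite
symmetric matrix. Also, suppose that for $\nu > n - 1$, the integral
$$\int_{A>0} {}_0F_1(b; BA) \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} A \right) \right\} |A|^{\frac{\nu-n-1}{2}} (dA)$$
is finite. Then, it is
$${}_1F_1\left( \frac{1}{2}\nu; b; 2B\Sigma \right) 2^{n\nu/2}\, \Gamma_n(\nu/2)\, |\Sigma|^{\nu/2}.$$

Proof. Notice that $\exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} A \right) \right\} |A|^{\frac{\nu-n-1}{2}}$ for $A > 0$ is a kernel of the central Wishart
$W_n(\nu, \Sigma)$. It follows by Theorem 8 that
$$\int_{A>0} C_\kappa(BA) \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} A \right) \right\} |A|^{\frac{\nu-n-1}{2}} (dA) = 2^k \left( \frac{1}{2}\nu \right)_\kappa C_\kappa(B\Sigma)\, 2^{n\nu/2}\, \Gamma_n(\nu/2)\, |\Sigma|^{\nu/2},$$
where $2^{n\nu/2}\, \Gamma_n(\nu/2)\, |\Sigma|^{\nu/2}$ is the corresponding normalizing constant for the Wishart distribution.
The zonal polynomial $C_\kappa(\cdot)$ depends only on the eigenvalues of its matrix argument. Since $A$ is
non-singular, $BA$ and $AB$ have the same eigenvalues and we can apply Theorem 8 here. Therefore,
$$\begin{aligned}
&\int_{A>0} {}_0F_1(b; BA) \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} A \right) \right\} |A|^{\frac{\nu-n-1}{2}} (dA) \\
&\quad = \int_{A>0} \sum_{k=0}^{\infty} \sum_{\kappa} \frac{C_\kappa(BA)}{(b)_\kappa\, k!} \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} A \right) \right\} |A|^{\frac{\nu-n-1}{2}} (dA) \\
&\quad = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{1}{(b)_\kappa\, k!} \int_{A>0} C_\kappa(BA) \exp\left\{ -\frac{1}{2} \mathrm{tr}\left( \Sigma^{-1} A \right) \right\} |A|^{\frac{\nu-n-1}{2}} (dA) \\
&\quad = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{1}{(b)_\kappa\, k!}\, 2^k \left( \frac{1}{2}\nu \right)_\kappa C_\kappa(B\Sigma)\, 2^{n\nu/2}\, \Gamma_n(\nu/2)\, |\Sigma|^{\nu/2} \\
&\quad = \sum_{k=0}^{\infty} \sum_{\kappa} \frac{\left( \frac{1}{2}\nu \right)_\kappa C_\kappa(2B\Sigma)}{(b)_\kappa\, k!}\, 2^{n\nu/2}\, \Gamma_n(\nu/2)\, |\Sigma|^{\nu/2} \\
&\quad = {}_1F_1\left( \frac{1}{2}\nu; b; 2B\Sigma \right) 2^{n\nu/2}\, \Gamma_n(\nu/2)\, |\Sigma|^{\nu/2}.
\end{aligned}$$
In the fifth line, we used the fact that the zonal polynomial with a partition $\kappa$ of $k$ is homogeneous of degree $k$: $2^k C_\kappa(B\Sigma) = C_\kappa(2B\Sigma)$.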

The next theorem computes the integral of a kernel whose normalizing constant is of interest
later.
Theorem 10. Let
n

q () = 0

i
i
1 h
1 h
exp tr S 2 ( )0 ( ) ,
exp tr S 2 0
2
2


where > n 1 and S 2 and S 2 are positive definite symmetric. Then,




Z

i 2
n
n2 n (/2)
1 h
2
q () (d) = 2 2 2
exp tr S 2 0
S + S 2
n (n/2)
2

i1
n 1 2 0  2h 2
.
1 F1
; ; S S S + S2
2 2 2
where the integral is over the set of non-singular .


Proof. Note that


i
1 h
exp tr S 2 ( )0 ( ) 1.
2
We make the change of variables. Put = HT , where H is n n, with H 0 H = In and T being
upper-triangular. Then,
1



(d) = 2n 0 2 d 0 H 0 dH
and



n1
i


1 h
0 q () (d) 2n 0 2 exp tr S 2 0
d 0 H 0 dH .
2
0
0
The double integral over (d ( )) and (H dH) on the right hand side is finite (to be explained in
detail). Therefore,
Z


|q ()| (d) < .

Now note that after the change of variables,


q () (d) =



io
h

n h
0 n1
i
2 exp 1 tr S 2 + S 2 0
exp tr S 2 0 HT
2




1 h 2 0 i
0
0
n
d H dH 2 exp tr S
.

2
We integrate the last terms with respect to H Vn,n , the Stiefel manifold of n n matrices with
orthonormal columns. Then, from Theorem 7.4.1. of Muirhead (1982),
2

exp tr S 2 0 HT

io

H 0 dH

HVn,n

2n n /2
1 1 2 0  2
n; S S A ,
0 F1
n (n/2)
2 4


where A = T 0 T = 0 . The constant term


2n n /2 /n (n/2) arises due to the normalization on the
0  2
1 2
Stiefel manifold, Vn,n . Since 4 S S is symmetric, by Lemma 9,
Z

q () (d)
2

i
n /2
1 h
=
exp tr S 2 0
n (n/2)
2



Z
 i
n1
1 1 2 0  2
1 h 2
2
2

F
n;
S

S
A
|A|
exp

tr
S
+
S
A (dA)
0 1
2 4
2
A>0
2

i
n /2
1 h
=
exp tr S 2 0
n (n/2)
2
 

/2
i1 

n 1 2 0  2h 2
2

2
2n/2 n
F
;
;
S

S
S
+
S
,
S + S 2
1 1
2
2 2 2
which completes the proof.




We define a function
i
n (/2)
1 h
(36)
Cn , S , S , 2
exp tr S 2 0
n (n/2)
2



h
i1 

n
1
2
2 2
2
0  2
2
2
S + S 1 F1
.
; ; S S S +S
2 2 2
Notice that when n = 1,


C1 , s , s , =

n
2

2
2
s + s2

n2
2

exp s2 2
2
2
 



1 F1

1 1 s2
s2 2 .
; ;
2 2 2 s2 + s2
