
International Journal of Forecasting 25 (2009) 602–628

www.elsevier.com/locate/ijforecast

Multi-step forecasting in emerging economies: An investigation of
the South African GDP
Guillaume Chevillon ∗
ESSEC Business School, Paris, France
CREST–INSEE, France

Abstract

To forecast at several, say h, periods into the future, a modeller faces a choice between iterating one-step-ahead forecasts
(the IMS technique), or directly modeling the relationship between observations separated by an h-period interval and using
it for forecasting (DMS forecasting). It is known that structural breaks, unit-root non-stationarity and residual autocorrelation
may improve DMS accuracy in finite samples, all of which occur when modelling the South African GDP over the period
1965–2000. This paper analyzes the forecasting properties of 779 multivariate and univariate models that combine different
techniques of robust forecasting. We find strong evidence supporting the use of DMS and intercept correction, and attribute
their superior forecasting performance to their robustness in the presence of breaks.
© 2008 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.

Keywords: Multi-step forecasting; Intercept correction; Structural breaks

1. Introduction

When a forecaster uses a model with a given periodicity, but wishes to forecast at several, say h > 1, periods
into the future, she is faced with a choice between iterating one-step-ahead forecasts (iterated multi-step, or
IMS, forecasting), or directly modelling the relationship between the end-of-sample observation and its hth
successor in order to forecast the latter (direct multi-step, or DMS). This direct technique was first suggested
by Cox (1961) in the context of exponential smoothing. Klein (1971) then applied it to dynamic forecasting, and
Johnston (1974) put a temporary end to the analysis by concluding that, when a quadratic loss function is used
as the criterion for both estimation and forecast accuracy, the former constitutes a “reliable indicator of
prediction efficiency”, and hence direct methods are inefficient for forecasting. However, first Findley (1983)
and then Weiss (1991) re-examined the previous results in the context of misspecified models, and found
asymptotic relevance in matching criteria of estimation and forecast efficiency, as with the direct multi-step
method. The renewed interest these authors brought to the topic spurred new theoretical analyses

∗ Corresponding address: ESSEC Business School, Paris, France.
E-mail address: chevillon@essec.fr.

0169-2070/$ - see front matter © 2008 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
doi:10.1016/j.ijforecast.2008.12.004
G. Chevillon / International Journal of Forecasting 25 (2009) 602–628 603

by, inter alia, Clements and Hendry (1996) and Tiao and Xu (1993) for ARMA processes; Bhansali (1996, 1997),
Bhansali and Kokoszka (2002), and Brodsky and Hurvich (1999) for long memory processes; Haywood and
Tunnicliffe-Wilson (1997) in the frequency domain; Schorfheide (2002) for asymptotically vanishing
misspecification; Findley, Potscher, and Wei (2004) and Ing (2003, 2004) for very general settings; and
Chevillon and Hendry (2005), Chevillon (2008), and Proietti (2008) for misspecified (AR)IMA processes; see
Chevillon (2007) for a survey. Another stream of articles on the relative merits of DMS versus IMS proposes
new techniques such as Partial Least-Squares (see Lin & Tsay, 2005), or designs ex ante tests for potential
improvements using DMS (see Clark & McCracken, 2005, Giacomini & White, 2006 and Haywood &
Tunnicliffe-Wilson, 2004).

Despite the number of theoretical results in favour of DMS, few empirical analyses exhibit successful uses of
direct multi-step methods, except, notably, Tiao and Tsay (1994) on the US monthly price index for food, Tsay
(1993) on the quarterly US unemployment rate, and Liu (1996) on the monthly unemployment rate in Taiwan. The
results from other authors were more mixed, such as Kang (2003) and Lin and Tsay (1996), who found evidence
both in favour and against; while the major analysis of 171 US monthly macroeconomic time series performed by
Marcellino, Stock and Watson (2006) strongly favoured iterative forecasting techniques. Also working on US
series, Proietti (2008) showed a net improvement from using DMS for consumer inflation, but not for GDP. Other
recent partially successful uses of DMS techniques include, inter alia, Eklund and Karlsson (2005), Jorda and
Marcellino (2007) and Schumacher and Breitung (2008). It should be noted that the authors who found evidence
against direct multi-step methods mostly used post-war US data, although over extended periods. The United
States has not undergone massive shifts in this era, and by aggregation over such a large economy, some
stability is expected. However, several of the theoretical analyses – such as Peña (1994) for breaks, Bhansali
(1997), whose framework of long memory can be seen as relevant to regularly occurring breaks in the light of
Perron and Qu (2007), and Chevillon and Hendry (2005) for occasional location shifts that induce negative
serial correlation – point towards deterministic shifts as a potential source of success of DMS in finite
samples.

The aims of this paper are threefold: first, we run a forecast competition over 779 different univariate and
multivariate methods, the latter having rarely been used in conjunction with DMS (some exceptions exist, such
as Haug & Smith, 2007). Second, we specifically assess the DMS model, which was developed by Aron and
Muellbauer (2002) (AM henceforth) for forecasting the South African GDP. Unlike Marcellino et al. (2006), who
differenced all of the integrated series to ensure that they were working with stationary variables, we focus
on accuracy in forecasting the level of the variable, since differencing corrects for some location shifts at
the cost of higher variance, in effect transforming step dummies into blips. Thus, AM established a
relationship which allowed them, in spite of the many breaks experienced by the economy, to forecast the
annual change in the quarterly GDP series over the last twenty years. Third, we establish which features
improve the forecast accuracy of the standard IMS and DMS methods (in level, differences or
equilibrium-correction form), with additional robustifying techniques such as intercept correction (see
Clements & Hendry, 1999), double-differencing, and the use of various indicator variables and deterministic
or stochastic trends.

We find that DMS and methods that are designed to be robust to breaks, such as intercept correction, improve
forecast accuracy in multivariate models. There also appear to be very different patterns for multivariate and
univariate techniques, and the former prove to be more accurate in this experiment. Also, our results favour
the use of fractional integration in ARMA models.

The plan of this paper is as follows. First, we review the South African context and the model developed by
Aron and Muellbauer (2002). In Section 3, we derive alternative multivariate IMS and DMS models. We then
proceed to a general comparison of the forecasting techniques, analysing the methods that perform best, with a
special focus on their relationship to breaks. A conclusion follows. Supplementary tables to those that appear
in this paper are available from the IJF website, www.forecasters.org/ijf.
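
To make the IMS/DMS distinction described in this introduction concrete, the following Python sketch contrasts the
two approaches for a univariate AR(1) with an intercept. It is not taken from the paper: the function names
(fit_ols, ims_forecast, dms_forecast), the AR(1) specification and the simulated series are all assumptions
made purely for illustration.

```python
import numpy as np

def fit_ols(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

def ims_forecast(series, h):
    """IMS: fit a one-step AR(1), then iterate it h times from the last observation."""
    c, phi = fit_ols(series[1:], series[:-1])
    forecast = series[-1]
    for _ in range(h):
        forecast = c + phi * forecast
    return forecast

def dms_forecast(series, h):
    """DMS: regress y_t directly on y_{t-h} and project once from the last observation."""
    c_h, phi_h = fit_ols(series[h:], series[:-h])
    return c_h + phi_h * series[-1]

# Simulated AR(1) data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.5 + 0.8 * y[t - 1] + rng.normal(scale=0.1)

print("IMS, h=4:", ims_forecast(y, 4))
print("DMS, h=4:", dms_forecast(y, 4))
```

At h = 1 the two forecasts coincide by construction. At longer horizons they differ in finite samples because
IMS powers up the one-step estimates, whereas DMS estimates the h-step relationship in a single regression.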

2. Forecasting the South African GDP

2.1. Thirty years of breaks

South Africa has undergone a profound transition in the last thirty years, and hence, any model of its economy
is subject to frequently occurring breaks. From 1976, with the government’s policy of Apartheid, the country
began suffering from increasing international isolation, which culminated, between late 1985 and the ‘free’
elections of 1994, in a period of almost no access to international capital. These factors, when taken in
combination with the high degree of reliance of the economy on mineral exports, might explain some of the
shocks and large variations observed in the economic variables (see the articles by Aron and Muellbauer for an
extended analysis of the South African context).

Following Aron and Muellbauer, we can distinguish three main monetary regimes since the 1960s. Until the late
1970s, there existed quantitative controls on interest rates and credit, and the main criteria used for
monetary policy were liquid asset ratios, while the corrective effect of interest rates was largely neglected
by the regulatory authorities. Financial liberalization and the transition towards a more flexible,
cash-reserves based system took place during the first half of the 1980s. From 1986 onwards, the monetary
authorities made use of the discount rate to influence the interbank overnight refinancing market in order to
achieve pre-declared monetary targets. The credit growth which followed the financial liberalization soon
lessened the usefulness of monetary targets, thus leading, from 1998 onwards, to a new regime. The South
African Reserve Bank (SARB) then offered some amount at the daily tender for repurchase transactions, thus
signalling its preferences on short-term interest rates via an auction mechanism. In early 2000, inflation
targeting was reinstated as part of the medium-term monetary objectives.

Following the development of monetary policy, the foreign exchange market experienced various regimes pointing
toward greater flexibility. From US dollar or pound Sterling pegs, combined with restrictions on resident and
nonresident capital flows, the system moved, in 1979, to a regime of dual currency. Most non-resident
transactions operated at the floating ‘financial’ exchange rate, while a ‘commercial’ rate was instated and
announced in line with market forces. The latter became market determined in 1983 and the dual rates were soon
re-unified. A debt crisis and the collapse of the Rand provoked a return to the dual currency system after
1985. In 1995, unification of the dual currency was initiated under a managed rate, which became fully
floating at the introduction of inflation targeting.

Table 1
Monetary policy and exchange rate policy regimes (Aron & Muellbauer, 2002).

Period           Monetary policy regimes
1960–81          Liquid asset ratio-based system with quantitative controls on interest rates and credit
1981–85          Mixed system during transition
1986–98          Cost of cash reserves-based system with pre-announced M3 targets
1998–99          Daily tenders of liquidity through repurchase transactions (repo system), plus pre-announced
                 M3 targets and targets for core inflation
2000–            Repo system with inflation targeting

Period           Exchange rate policy regime
1961(1)–71(2)    Pegged to fixed pound Sterling
1971(3)–74(2)    Pegged in episodes to floating US dollar/pound Sterling
1974(3)–75(2)    ‘Controlled independent float’: devaluations every few weeks
1975(3)–79(1)    Fixed Regime: pegged to the US dollar
1979(2)–82(4)    Dual foreign exchange system: controlled floating commercial rand and floating financial rand
1983(1)–85(3)    Unification to a controlled floating rand
1985(4)–95(1)    Return to a dual system
1995(2)–         Unification to a controlled floating rand

We reproduce Table 1 of AM (Table 1 here), where they present the various regimes experienced by the South
African economy. In order to measure the degree of non-stability in the economy, we also estimate a series
that represents the break intensity. For this, we focus on five key variables: the logarithm of real GDP, CPI
inflation (measured as log-differences), the short-term interest rate, and the logarithms of the real exchange
rate (with the US dollar) and of narrow money. For both the level and the first difference of each of the
variables, we estimate autoregressive processes of order 1, AR(1), with a constant, over rolling windows of 30
observations (dated according to their endpoint). For each of these subsamples, we use the technique developed
by Bai and Perron

Fig. 1. Break intensity in the South African economy.

(2003)¹ to estimate the number of breaks the AR(1) undergoes, assuming that breaks affect either the mean or
the autoregressive coefficient. We therefore obtain ten series which, when added up, give the break intensity
of the South African economy. Fig. 1 records the resulting variable.

¹ We also use the Gauss code that they provide on their website.

In accordance with Table 1, we observe that the several exchange rate regimes that were experienced in the
1970s, and in particular the recurring devaluations toward the middle of the decade, imply a non-negligible
break intensity. The monetary transition of the early 1980s results in a large break intensity (BI) that
specifically affects the interest rate. The more autarkic 8–10 year period starting in 1986 was quieter, and BI
is indeed temporarily lower until the beginning of the political transition, as new economic policies were
installed in the second half of the 1990s. This measure of break intensity will find its use when we assess
the relationship between the most accurate forecasting techniques and breaks.

2.2. The AM model

Aron and Muellbauer developed a model for forecasting the annual change of the quarterly South African real
GDP (Y or y in log) via an equilibrium correction equation which depends on a set of variables $\{X_i\}$:

\[
\Delta_4 y_t = \alpha \Big( y_{t-4} - \mu_{t-4} - \sum_{i=1}^{k} \gamma_i X_{i,t-4} - \beta \Big)
             + \sum_{i=1}^{k} \delta_i Z_{i,t-5} + \epsilon_{t-4}, \qquad (1)
\]

where $\alpha < 0$, $\epsilon_t$ is white noise and $\mu_t$ is a smooth stochastic trend which aims to capture
the underlying production capacity of the economy. This trend is defined, as in Harvey (1993) and Harvey and
Jaeger (1993), by a smooth I(2) extracted from $y_t$ using STAMP (see Koopman, Harvey, Doornik & Shephard,
2000). AM study several sets of variables for $\{X_i\}$, and settle on a few. They find that their equation,
estimated using quarterly data over the period 1963(1)–2000(2), is stable over the various regimes, using as
regressors the four-quarter moving average (RPRIMA) of the real prime interest rate (RPRIME), and the fourth
lag thereof, a twelve-quarter moving average of the government surplus to GDP ratio (RGSURMA12), and the ratio
of current account surplus to current GDP (RCASUR). The non-integrated $\{Z_i\}$ series comprise the annual
change ($\Delta_4$ RPRIME) in the real prime interest rate; the long-term growth in terms of trade
($\Delta_{12}$ tot); the first difference of a financial liberalization indicator (a spline indicator variable
FLIB); a monetary regime

Fig. 2. Variables entering the AM equation; see Table 4 for definitions.

shift dummy (for the 1983(2)–85(4) period, denoted by N) interacting with both a four-quarter moving average
and an annual difference of RPRIME; and finally an indicator dummy variable for the 1991(3)–92(2) drought
(DUM92). These variables are presented in Fig. 2. The proximity of the I(2) trend to $y_t$ clearly plays a role
in the forecasting accuracy of Eq. (1): the correlation between these variables is very large, at 0.994.

In order to assess the forecasting power of this equation, we need to develop alternative techniques. This is
what we turn to in the next section. However, it should be noted that several dummy variables which enter the
equation could only be defined ex-post. Hence, for ex-ante forecast comparisons, these should be omitted from
the equation (this involves at least DUM92, and potentially N and FLIB, although the latter two could be kept,
since they were ‘predictable’ from the government’s decisions). Finally, there remains the issue of the I(2)
trend: in AM, this is computed over the whole sample. For ex ante forecasting we resort to estimating it
recursively. To stress the importance of this trend, we also assess the forecasting models with the smooth
trend estimated over the whole sample, with a linear deterministic trend, and with no trend at all.

Thus, the AM technique and its variants are not seen as well-specified models of the process generating the
South African GDP, but rather as misspecified yet parsimonious DMS forecasting tools.

3. Competing forecasts

The forecasting equation developed by Aron and Muellbauer includes a few exogenous variables which ought to be
modelled for a proper IMS technique to be used. Unfortunately, doing so might increase the degree of
misspecification, which would in turn be detrimental to our assessment of forecast accuracy.

Hence, we resort to three simpler multivariate models which provide the possibility for both IMS and DMS
forecasting. The first two models define small monetary systems for South Africa, while the third uses the
main variables of the AM equation.

Another issue arises regarding the variable to be forecast: the AM technique provides forecasts of
$\Delta_4 y_{T+4}$ from an end of sample T. This annual difference does not fit the definition of the IMS
forecast as it is usually referred to. Indeed, if a model in differences provides some
$\Delta\hat{x}_{T+1} = \hat{B}\,\Delta x_T$, with $y_t = P x_t$, then the IMS forecast is
$\Delta\hat{x}_{T+4} = \hat{B}^4 \Delta x_T$, and hence
$\Delta_4\hat{x}_{T+4} = \sum_{i=1}^{4} \hat{B}^i \Delta x_T$. The IMS forecast is also nonlinear in the
parameters when, as in AM, the model is $\Delta\hat{x}_{T+1} = \hat{A} x_T$, for which
$\Delta_4\hat{x}_{T+4} = \big[(\hat{A} + I)^4 - I\big] x_T$. Thus, this choice of target seems to benefit
direct multi-step estimation. In order not to favour DMS a priori, we assume that the forecasts to be
evaluated concern the levels of the GDP and not the annual differences. Hence, from the AM equation
$\Delta_4\hat{y}_{T+4} = \hat{c}_1 y_T + \hat{C} x_T$, we retrieve
$\hat{y}_{T+4} = (1 + \hat{c}_1) y_T + \hat{C} x_T$.

We also consider forecasting at various horizons in order to mitigate the issue of levels versus differences,
and potential differences in seasonal adjustments (all of the variables are seasonally adjusted, but the
changes in regime and breaks imply that seasonality may not be taken into account well). In the comparisons,
we will see that whether h = 2, 3, 4, 5 or 8 is important for the relative forecast accuracy. Hence, we will
estimate Eq. (1) with the left-hand side (LHS) variable defined as $\Delta_h y_t$ for the different horizons h.
We will also produce iterated multi-step forecasts by cumulating the one-step-ahead growth forecasts
$\Delta\hat{y}_{t+1}$. All of our methods use quarterly data and are estimated recursively over the period
1965(2)–2000(2), starting with an initializing sample of 30 observations.

3.1. A small monetary model

3.1.1. Data description

The variables which we include in our model are the M1 narrow money aggregate (denoted by $M_1$) or
quasi-money (denoted by $M_*$), the consumer price index (CPI), the 3-month treasury bill interest rate (per
annum, R), and the South African Rand/US dollar exchange rate, which were obtained from the International
Financial Statistics (IFS) database. In order to maintain a unique series for forecasting across models, we
used the real GDP series from AM throughout.

Following Jonsson (2001), we conduct a cointegration analysis of five variables²,³: the log of real money
$m_k - cpi$, for $k \in \{1, *\}$; the log of real exchange rate $rer$; the log of real GDP $y$; the nominal
treasury interest rate $R$; and inflation $\Delta cpi$. Fig. 3 presents graphs of real money, together with the
inflation rate and their differences; interest rates, the GDP and the real exchange rate are recorded in
Fig. 4. We see that annual changes are much more persistent than first differences, but that they also exhibit
large fluctuation amplitudes. This may benefit DMS models.

We also notice the large contractions of real narrow money in the late 1970s and mid 1980s during periods of
higher inflation, and the continuous depreciation of the Rand with respect to the US dollar. Real $M_*$ also
contracted severely over the period.

² The variables modelled by Jonsson (2001) were the short and long interest rates, real income, the exchange
rate, broad money and prices. Broad money may seem an interesting variable to use in the presence of a
financial liberalization; however, we were unfortunately unable to work with it, for reasons of vintage
comparability. We use quasi-money instead, noting that our aim is not so much to model the economy as to
obtain an operational model which we can easily modify to see which features matter for forecast accuracy.
³ Computations and tests were conducted using PcGive and OxMetrics.

3.1.2. A VAR system

In addition to the five economic variables, where the analysis below is conducted using $M_1$ (cases where
there are differences in using a model that contains $M_*$ instead of $M_1$ are reported where they occur), we
allow for a constant and a trend to enter the cointegration space unrestrictedly. We restrict our attention to
a VAR(2), as tests showed that lags beyond 2 are not significant. The VAR in levels, estimated over the period
1965(1)–2000(2), fits the data reasonably well, as is shown in Fig. 5. Indeed, except for inflation, the vector
of variables is well explained by its past, as can be seen from the figure.

A corresponding test summary is presented in Table 2, which records statistical information about the VAR,
namely the equation residual standard errors ($\hat{\sigma}$); single-equation evaluation statistics for

Fig. 3. Quarterly South African monetary series and CPI inflation; see Table 4 for definitions.

Table 2
Specification statistics for the VARSAM1 model with a linear trend estimated as a VAR(2).

Statistic               R         ∆cpi      (m − cpi)   y        rer       VAR
σ̂ (%)                   8.03      0.86      4.84        1.11     6.29
F_ar (5, 127)           0.517     1.22      2.03*       0.108    1.41      –
F_arch (4, 124)         2.39      0.684     0.715       0.740    0.625     –
F_het (22, 109)         1.80      0.773     0.931       1.25     1.09      –
χ²_nd (2)               40.6**    10.8**    0.805       7.48*    10.1**    –
F^v_ar (125, 511)       –         –         –           –        –         1.06
F^v_het (330, 1228)     –         –         –           –        –         1.06
χ²v_nd (10)             –         –         –           –        –         66.1**

* Denotes significance at the 5% level.
** Denotes significance at the 1% level.

no serial correlation ($F_{ar}$, against 5th-order residual autoregression); no ARCH ($F_{arch}$, against
fourth-order); no heteroscedasticity ($F_{het}$, see White, 1980); and a test for normality ($\chi^2_{nd}$, see
Doornik & Hansen, 1994). Analogous system (vector) tests are labelled as ‘v’. Hence, the normality tests fail
here for most of the variables and the system; this reflects what can be seen from the graphs in the third
column of Fig. 5, namely the presence of large outliers. Given our interest in breaks, and since there seems
to be non-normality caused by the outliers, we therefore retain our model. Replacing $m_1$ with $m_*$ yields
the same results, with the

Fig. 4. Quarterly seasonally adjusted series of South African interest rates, real GDP and real exchange rates; see Table 4 for definitions. The
left column records the series in first and fourth differences (the latter are only reported over the sample that is used for estimation).

exception of $F_{ar}$ for $(m_* - cpi)$, which rejects at the 1% level, and $\chi^2_{nd}$, which is significant
for $(m_* - cpi)$ but not for $y$.

Finally, as regards parameter constancy, Fig. 6 presents the equation residuals obtained by recursive
estimation and their $0 \pm 2\hat{\sigma}$ boundaries, which approximately represent their 95% confidence
intervals if the VAR is stationary. Fig. 7 records the 1↑ and N↓ (BreakPoint) Chow constancy tests for the
individual equations and the VAR system (see Chow, 1960), together with the 1% bound. We notice that,
according to both figures, we can be reasonably confident of the overall constancy of the relationships,
although they do exhibit occasional breaks, and we therefore retain this system for further analysis.

3.1.3. Stationarity analysis

Assuming that the VAR is reasonably specified, we investigate the cointegration properties of the five
variables, noting that the Johansen procedure has been shown by Gonzalo (1994) to have good finite sample
properties even in the presence of non-normal errors.⁴

⁴ I thank an anonymous referee for pointing this reference out to me.

The cointegration statistics reported in Table 3 – where a constant and a trend enter the cointegration space
unrestrictedly and restrictedly, respectively – support the hypothesis that there are two cointegrating
relationships (see Johansen, 1995).

We then follow Jonsson (2001), and identify two cointegrating vectors:

\[ c_{1t} = R_t - 0.126\, rer_t - 76.5\, \Delta cpi_t - (m_{1t} - cpi_t) \]
\[ c_{2t} = rer_t - (y_t - 0.019\,t) - 20.1\, \Delta cpi_t + 2.11\, R_t - -.357\,(m_{1t} - cpi_t), \]

which are presented in Fig. 8. When replacing $m_1$ with $m_*$, the estimated rank of cointegration is altered:

Fig. 5. Goodness of fit of the VAR(2) model, with the VARSAM1 variables listed in Table 4, with an unrestricted constant and restricted linear
trend. This is estimated over the period 1965(1)–2000(2).

we cannot reject the presence of a unique common trend, i.e., four cointegration relationships. Recursive
estimation of the VAR model with quasi money appears to be much less stable than with narrow money; following
the evidence of Jonsson (2001), we only retain two cointegrating vectors, bearing in mind that we might thus
underestimate their number.

Table 3
Cointegration statistics: eigenvalues (λ), log-likelihood (l), Trace statistic (Tr) and the corresponding
p-values. The model is a VAR(2) for VARSAM1 with a restricted linear trend.

Rank    λ        l           Tr          p-value
0       –        1457.141    156.95**    [0.000]
1       0.453    1499.968    71.293**    [0.009]
2       0.243    1519.728    31.773      [0.407]
3       0.124    1529.099    13.031      [0.737]
4       0.058    1533.359    4.5108      [0.671]
5       0.031    1535.615    –           –

** Denotes significance at the 1% level.

Unfortunately, as has been shown by Clements and Hendry (1999) and Hendry (2006), the use of cointegrating
relationships which experience breaks tends to reduce the accuracy of the forecasts. South Africa may provide
such an example, as, when estimating the model over various subperiods, the trace statistic provides
justification for varying numbers of cointegrating vectors. However, in the following we will compare the
forecasting properties of the VAR in level (VAR) and the VAR in differences taking cointegration into account
(VEC) or not (DV). The parameters of the cointegrating vectors will be recursively estimated under the
constraints that $y_t$ and the trend do not belong to $c_{1t}$. As for the AM equation, we will compare models
with four types of trends; for consistency, this choice will also be imposed in $c_{2t}$ instead of the linear
trend.

In addition, we explore two types of DMS forecasting in the VAR in differences: one consists of modelling
$\Delta x_t$ conditional on $I_{t-h}$, while the other focuses on $\Delta_h x_t \mid I_{t-h}$ as the AM
equation. We refer to the former as DMS and the latter as DMSh.
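
The difference between the DMS and DMSh variants can be sketched in Python for a single series. This is only
an illustration under strong simplifying assumptions that are not in the paper: the information set
$I_{t-h}$ is reduced to a constant and the most recent first difference, the model is univariate rather than a
VAR, and the function names and simulated data are hypothetical.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def design(dx_lagged):
    """Regressor matrix: a constant and the lagged first difference."""
    return np.column_stack([np.ones(len(dx_lagged)), dx_lagged])

def dms_level_forecast(x, h):
    """DMS: forecast each one-period change dx_{T+j} directly from dx_T, j = 1..h, and cumulate."""
    dx = np.diff(x)
    total = 0.0
    for j in range(1, h + 1):
        b = ols(dx[j:], design(dx[:-j]))     # regress dx_t on dx_{t-j}
        total += b[0] + b[1] * dx[-1]
    return x[-1] + total

def dmsh_level_forecast(x, h):
    """DMSh: forecast the h-period change x_{T+h} - x_T in a single regression on dx_T."""
    dx = np.diff(x)
    dhx = x[h:] - x[:-h]                     # dhx[k] = x[k+h] - x[k]
    # For k >= 1, the regressor is the change ending at time k, i.e. dx[k-1].
    b = ols(dhx[1:], design(dx[:len(dhx) - 1]))
    return x[-1] + b[0] + b[1] * dx[-1]

# Simulated random walk with drift (hypothetical data, for illustration only).
rng = np.random.default_rng(1)
x = np.cumsum(0.01 + rng.normal(scale=0.02, size=150))

print("DMS,  h=4:", dms_level_forecast(x, 4))
print("DMSh, h=4:", dmsh_level_forecast(x, 4))
```

Both functions return a forecast of the level $x_{T+h}$, and they coincide at h = 1. For h > 1 they generally
differ, since DMS runs h separate regressions on samples of different lengths while DMSh runs one regression
for the cumulated change.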

Fig. 6. Residuals from recursive estimation of the VAR(2) model of VARSAM1.

3.2. An IMS version of AM

In order to analyze the properties of the multi-step AM forecasting procedure, below we develop a multivariate
equivalent. It must be noted that the parsimonious version of the AM only uses a few variables: besides the
real GDP itself and dummies, the regressors entering the equation are RPRIME (via transformations), RCASUR,
RGSUR, the log of TOT, and an I(2) trend (see the definitions in Section 2.2 above). Hence, it seems natural to
develop a VAR model which includes these variables. As regards the N and FLIB indicators, they could be either
included or not, depending on how acute the forecaster’s perception of the economic environment could be
assumed to have been at the time. We also consider various models for the trend.

Augmented Dickey–Fuller tests reject the presence of a unit root in the RCASUR and RGSUR series, but fail to
reject it as regards y, RPRIME and TOT. Thus, a cointegration analysis should include the latter three
variables, and a trace test for reduced rank claims the existence of one cointegrating relationship between
the three, when a trend is allowed to enter the cointegration space. By contrast, if we allow N and FLIB to
enter the system unrestrictedly, then we only marginally reject the hypothesis that the matrix is of full
rank, and hence that the variables are not cointegrated (p-value of 4%, given by PcGive in estimation over the
period 1960(3)–2000(2), for a VAR(2) model, with the lags beyond 2 being statistically insignificant).⁵ This
small model, also detailed in Table 4, allows us to generate IMS – and DMS – forecasts which we can compare to
the solved out AM equation.

⁵ This is why RGSUR and RCASUR appeared among the set of $\{X_i\}$ variables in Eq. (1).

3.3. Univariate methods

3.3.1. Robust techniques

We follow the results from the research of Clements and Hendry, and use alternative forecasting

Fig. 7. 1↑ and N↓ (BreakPoint) Chow constancy tests for the individual equations and the system. The model is VARSAM1 and is estimated
as a VAR(2) with a restricted linear trend over the period 1965(1)–2000(2).

Fig. 8. Cointegrating vectors from the VAR(2) including m 1 .



Table 4
Description of the data, sources and model definitions.

Data definitions

Source: Aron and Muellbauer (2002), seasonally adjusted
y          Log Real GDP
RPRIME     Real prime interest rate
RCASUR     Ratio of current account surplus to current GDP
RGSUR      Government surplus to GDP ratio
tot        Log terms of trade
FLIB       Spline financial liberalization indicator
N          Monetary regime shift dummy for 1983(2)–85(4)
DUM92      Indicator variable for the 1991(3)–92(2) drought

Source: International Financial Statistics, seasonally adjusted
m1         Log M1 monetary aggregate
m∗         Log quasi money
cpi        Log Consumer price index
R          Short term interest rate
rer        Log Real exchange rate (with US)

Models (estimated recursively over 1965(1)–2000(2))

Model                     Regressors
AM for $\Delta_h y_t$     $\Delta_4 PRIME_{t-h}$
                          $RPRIMA_{t-h} = \frac{1}{4}\sum_{i=0}^{3} RPRIME_{t-h-i}$ and $RPRIMA_{t-h-4}$
                          $RGSURMA12_{t-h} = \frac{1}{12}\sum_{i=0}^{11} RGSUR_{t-h-i}$
                          $\Delta_{12}(tot)_{t-h}$
                          $\Delta(FLIB)_{t-h}$
AMIMS                     $(y_t, RPRIME_t, RCASUR_t, RGSUR_t, tot_t)$
VARSAM1                   $(m_{1t}, cpi_t, R_t, rer_t)$
VARSAM∗                   $(m_{*t}, cpi_t, R_t, rer_t)$

Dummies
1          $(DUM92_t, N_t, FLIB_t)$
2          $(\sum_{i=1}^{t} DUM92_i, \sum_{i=1}^{t} N_i, FLIB_t)$
3          $(N \times \Delta_4 PRIME_t, N \times PRIMA_t)$
∅          No dummy

Trend
I2         Smooth I(2) trend estimated over the whole sample
recI2      Smooth I(2) trend estimated recursively
L          Linear trend
∅          No trend

techniques, focusing on two classes: those of by:


differencing and intercept correcting. Thus, if we wish
to forecast yT +4 from a forecast origin T , it has been (DV) : yT +4 = yT ,
b (2a)
shown that the models: (DDVIMS) : yT +4 = yT + 4(yT − yT −1 ),
b (2b)
∆yt = ζ1t , (DV) (DDVDMS) : yT +4 = yT + (yT − yT −4 ).
b (2c)
∆∆yt = ζ2t , (DDVIMS)
∆4 ∆4 yt = ζ4t , (DDVDMS), By contrast, intercept correcting constitutes an
adjustment to an existing model, such that if b
yT +4 and
where the ζit are assumed to be independent white yT +4 are forecasts from the IMS and DMS models:
e
noise, can exhibit some degrees of robustness to
breaks. They respectively lead to forecasts b
yT +4 given IMS : b b 4 yT ,
yT +4 = Ψ
DMS: $\tilde{y}_{T+4} = \tilde{\Psi}_4 y_T$,

then the intercept corrected forecasts are defined as the original forecasts corrected for the in-sample forecast errors of the latest available observations:

ICIMS: $\hat{y}_{T+4} = \hat{\Psi}^4 y_T + (y_T - \hat{\Psi}^4 y_{T-4})$,
ICDMS: $\tilde{y}_{T+4} = \tilde{\Psi}_4 y_T + (y_T - \tilde{\Psi}_4 y_{T-4})$.

More generally, we also consider intercept correction for all forecasting methods.

3.3.2. ARIMA models
In addition to the robust methods presented above, we also consider ARFIMA(p, d, q) models, where d is restricted to be 0 or 1, or is estimated as a fractional parameter, using ARFIMA in Ox, see Doornik and Ooms (2003). For the order of the AR and MA polynomials, we allowed for either fixing them at values between 0 and 2, or choosing them in-sample by minimizing the Schwarz criterion. The resulting forecasts were also possibly intercept corrected, but we did not allow for dummy variables or trends in these univariate models. We denote as ARIMA(p, d, q) the models with estimated (p, q) and ARIMA(ar, ma) for fixed values $(ar, ma) \in \{0, 1, 2\}^2$. IMS, DIMS and FIMS then refer to d = 0, 1 or being estimated; the same holds for DMS, DDMS and FDMS. With or without intercept correction (IC), univariate ARFIMA models provide 120 different techniques.

4. Forecast comparison

4.1. Techniques

We proceed to a comparison of ex-ante forecast accuracy using the various methods delineated above. There are 779 techniques in total, which are labelled as follows. The models are AM (univariate Aron & Muellbauer), AMIMS (multivariate version of AM), VARSAM1 (the monetary VAR with narrow money), VARSAM∗ (with quasi money), ARIMA(p, d, q) (estimated lag orders), and ARIMA(ar, ma) (with (ar, ma) replaced with their values). For each of the previous multivariate models, the representations can be VAR (in levels), DV or VEC (the last for VARSAM1 and VARSAM∗ only; the cointegrating vectors are specified using the whole sample but are estimated recursively). Then, the trend can be the overall I(2) (labelled as I2), a linear trend (L), a recursively estimated I(2) trend (recI2), or not included (∅). For the dummies, we include the three discussed: DUM92, N and FLIB; the models with the three dummies are labelled as d1. We also allow for the cumulated DUM92 and N to be used instead, so that their first difference provides the version in AM; together with FLIB this gives the models d2. We also tried, in d3, using $N \times \Delta_4 \mathrm{PRIME}$ and $N \times \frac{1}{4}\sum_{i=1}^{4} \mathrm{RPRIME}_{t-i}$. Finally, ∅ refers to models without dummies; see Table 4 for a definition of the models.
The forecasting technique can be any of IMS, ICIMS (with intercept correction), DMS, DMSh, ICDMS or ICDMSh. We also used DV, DDVDMS and DDVIMS. In the ARIMA(p, d, q) and ARIMA(ar, ma) cases, we denote the three cases with, respectively, d = 0, 1 and estimated d ∈ (0, 1), by IMS, DIMS and FIMS. The same holds for DMS, DDMS and FDMS, where the ARFIMA models are augmented with additional h lags, and the first h − 1 are restricted to zero. In total, we obtain 779 models, each of which is estimated recursively over the period 1965(1)–2000(2) and assessed for h ∈ {2, 3, 4, 5, 8} over the period 1974(2)–2000(2); see Table 5 for a complete description of the forecasting techniques. Should we forecast using expanding or rolling windows? There does not seem to be a clear cut answer, but the evidence of Pesaran and Timmermann (2005) points towards the use of expanding windows, hence our use thereof. We report below the key features arising from the exercise; additional tables are available from the IJF website (www.forecasters.org/ijf).

4.2. Forecast accuracy

Prior to ranking the individual performances of the forecasting techniques, we first assess, in Table 6, how these methods perform out-of-sample compared to their in-sample fit. The table presents the average ratios, over the set of forecasts and for all possible choices of trends and dummy variables, of the root mean square forecast error (RMSFE) over the in-sample estimated standard error of the residuals. In order to compare them at several horizons h, we scaled the ratios for IMS and ARIMA by $\sqrt{h}$, since the variance of $\sum_{i=1}^{h} \varepsilon_{t+i}$ is $h\,\mathrm{V}(\varepsilon_t)$ if $\varepsilon_t$ is white noise
(an assumption which can clearly be violated for the successive forecast errors over the forecast horizon). It is clear from the table that for most techniques the ratios increase with the forecast horizons, which means that forecast accuracy worsens, and the in-sample fit provides a less reliable gauge for DMS and DMSh than for IMS. The univariate methods are the only ones for which the ratios remain close to unity, especially when the lag orders are not chosen from the data. ARIMA models in levels perform particularly badly when assessing ex-ante vs ex-post fit. This is due to the exceptional fit of these flexible models in some subsamples. The ratios, whose average is reported in Table 6, often have almost zero denominators.6 This sits well with the choice made by authors such as Marcellino et al. (2006) to work with stationary series only.

Table 5
Description of the forecasting techniques.

It is important to stress that Table 6 reports averages over the models with the different types of dummies and trends that we consider in the exercise. These averages conceal various patterns. In particular, with respect to the types of trends, the ratios tend to be smallest when using the overall I(2); then either a linear or no trend yield approximately similar reliability of in-sample fit. However, when a recursively estimated I(2) trend is used, the RMSFE tends to be much larger than the in-sample residual standard errors. We will see below that, in addition, recursive I(2) trends do not bring much forecast accuracy benefit. By contrast, the choice of dummy variables seems irrelevant in the context of predictability of the forecast error.

4.3. Rankings

Turning to forecast accuracy, we evaluate it via RMSFE in Tables 7–9. We record the rankings for the 45 most accurate forecasts at horizons h = 2 and 4, and these ‘league tables’ differ significantly. Tables for h = 3 and 5 (not given here) show that there is a continuity which allows one to approximately interpolate over these values of the horizon. Noticing that the best forecasts are generally obtained using models where the smooth trend is estimated over the whole sample, we also report the same rankings excluding models with I2. We comment on these tables below, analysing each of the four points that we believe need stressing out of this exercise in turn. Also, and this holds for all of the tables, whether we consider the Mean Absolute Prediction Error (MAPE) or RMSFE does not modify the relative rankings significantly; hence our focus on the RMSFE, with the unreported MAPE tables being available from the IJF website.

6 The median IMS ratios of ex-post MSFE over ex-ante residual standard error – taking values between 417 and 468 when scaled by $\sqrt{h}$ – are also large, but for DMS, the equivalent scaled medians are between 1.6 and 2.4.

Table 6
Average ratios of the RMSFE over the in-sample residual standard error.
IMS DMS DMSh
Estimation Model h=2 h=3 h=4 h=5 h=8 h=2 h=3 h=4 h=5 h=8 h=2 h=3 h=4 h=5 h=8
VAR AMIMS 1.445 1.638 1.803 1.986 2.32 2.252 2.750 3.223 3.824 5.151
VARSAM1 1.289 1.448 1.612 1.815 2.40 2.204 2.632 3.093 3.814 5.715
VARSAM∗ 1.276 1.424 1.602 1.805 2.34 2.225 2.628 2.979 3.698 5.939
ARIMA( p, d, q) 82 155 80 617 77 570 70 892 59 773 59 177 451 75 38 310 17.4 3.07
ARIMA(ma, ar ) 1.152 1.154 1.143 1.142 1.141 0.943 31.2 0.814 206 1.154

DV AMIMS 1.081 1.189 1.280 1.411 1.575 1.232 1.593 1.914 2.234 3.060 1.387 1.832 1.996 1.819 3.009
VARSAM1 0.990 1.113 1.186 1.298 1.530 1.183 1.519 1.867 2.408 3.118 1.273 1.528 1.602 1.843 3.184
VARSAM∗ 0.986 1.115 1.184 1.293 1.524 1.215 1.530 1.872 2.487 3.029 1.343 1.595 1.766 1.985 3.649
AM 1.298 1.520 1.720 1.884 2.36 1.679 1.947 2.252 2.420 3.824
ARIMA( p, d, q) 1.113 1.150 1.245 1.266 1.295 1.197 1.017 0.836 1.923 1.369
ARIMA(ma, ar ) 0.450 0.506 0.524 0.572 0.652 0.654 0.777 0.591 0.814 1.086

VEC VARSAM1 1.112 1.243 1.329 1.429 1.648 1.332 1.692 2.139 2.714 3.725 1.446 1.782 1.900 2.102 3.783
VARSAM∗ 1.131 1.263 1.373 1.487 1.743 1.341 1.673 2.086 2.692 3.508 1.464 1.861 1.921 2.234 3.992
Fractional ARIMA( p, d, q) 2.72 2.71 2.67 2.57 3.42 2413 640 12.7 4.58 2.49 2.29
ARIMA(ma, ar ) 0.968 1.056 1.130 1.188 1.272 0.853 1.000 0.901 1.985 1.912
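
The kind of ratio averaged in Table 6 can be illustrated with a minimal sketch. It assumes that "scaling by √h" means dividing the RMSFE-to-standard-error ratio by √h; the function names and the numbers are hypothetical and are not taken from the paper.

```python
import math


def rmsfe(errors):
    """Root mean square forecast error of a sequence of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def fit_ratio(forecast_errors, resid_std_error, h, scale_by_sqrt_h):
    """RMSFE divided by the in-sample residual standard error.

    When scale_by_sqrt_h is True the ratio is further divided by sqrt(h)
    (an assumed reading of the scaling applied to IMS and ARIMA), because
    the sum of h white-noise errors has variance h * Var(error).
    """
    ratio = rmsfe(forecast_errors) / resid_std_error
    return ratio / math.sqrt(h) if scale_by_sqrt_h else ratio


# Hypothetical 4-step-ahead forecast errors, for illustration only.
errors_h4 = [0.02, -0.01, 0.03, -0.02]
print(fit_ratio(errors_h4, resid_std_error=0.01, h=4, scale_by_sqrt_h=True))
```

A value close to unity would indicate that the in-sample fit is a reliable gauge of out-of-sample accuracy; a near-zero residual standard error inflates the ratio, which is the effect noted in footnote 6.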

Table 7
RMSFE Rankings, ranked for h = 2.

Overall Excluding the I2 trend


Model Dum Trend Method h = 2 h = 4 h = 8 Model Dum Trend Method h = 2 h = 4 h = 8
VARSAM1 VEC ∅ I2 DMS 1 116 116 VARSAM1 VEC ∅ L DMS 4 128 99
VARSAM1 VEC 1 I2 DMS 2 136 117 VARSAM1 VEC 1 L DMS 13 156 107
VARSAM∗ VEC ∅ I2 DMS 3 192 81 VARSAM1 DV ∅ L DMS 14 60 92
VARSAM1 VEC ∅ L DMS 4 128 99 VARSAM1 DV ∅ ∅ DMS 14 62 92
VARSAM∗ VAR ∅ I2 IMS 5 4 3 VARSAM∗ VEC ∅ L DMS 16 144 77
VARSAM1 DV ∅ I2 IMS 6 15 7 AMIMS DV ∅ L DMS 18 160 68
VARSAM∗ DV ∅ I2 IMS 7 16 6 AMIMS DV ∅ ∅ DMS 18 162 68
AMIMS DV ∅ I2 IMS 8 14 5 ARIMA ( p, d, q) ∅ ∅ FIMS 21 81 43
AMIMS DV ∅ I2 DMSh 9 2 19 ARIMA (1, 1) ∅ ∅ FIMS 24 109 59
VARSAM1 DV ∅ I2 DMS 10 51 110 VARSAM∗ DV ∅ L DMS 25 75 61
AMIMS VAR ∅ I2 IMS 11 3 1 VARSAM∗ DV ∅ ∅ DMS 25 77 61
VARSAM∗ VEC 1 I2 DMS 12 207 83 VARSAM∗ DV ∅ L ICIMS 27 44 37
VARSAM1 VEC 1 L DMS 13 156 107 VARSAM∗ DV ∅ ∅ ICIMS 27 46 37
VARSAM1 DV ∅ L DMS 14 60 92 VARSAM∗ VEC 1 L DMS 31 169 84
VARSAM1 DV ∅ ∅ DMS 14 62 92 ARIMA (2, 0) ∅ ∅ FIMS 32 95 45
VARSAM∗ VEC ∅ L DMS 16 144 77 VARSAM1 DV 1 L DMS 35 70 94
VARSAM∗ DV ∅ I2 DMS 17 58 66 VARSAM1 DV 1 ∅ DMS 35 72 94
AMIMS DV ∅ L DMS 18 160 68 ARIMA (2, 1) ∅ ∅ FIMS 37 167 115
AMIMS DV ∅ ∅ DMS 18 162 68 VARSAM1 DV ∅ L ICIMS 38 40 34
AMIMS DV ∅ I2 DMS 20 150 86 VARSAM1 DV ∅ ∅ ICIMS 38 42 34
ARIMA (1, 1) ∅ ∅ FIMS 21 81 43 ARIMA (1, 0) ∅ ∅ FDMS 41 138 122
VARSAM1 DV 1 I2 DMS 22 57 112 VARSAM1 DV ∅ L IMS 42 86 50
VARSAM∗ DV ∅ I2 ICIMS 23 19 10 VARSAM1 DV ∅ ∅ IMS 42 88 50
ARIMA (2, 0) ∅ ∅ FIMS 24 109 59 VARSAM∗ VEC ∅ recI2 DMS 44 158 387
VARSAM∗ DV ∅ L DMS 25 75 61 VARSAM∗ DV ∅ L IMS 45 92 53
VARSAM∗ DV ∅ ∅ DMS 25 77 61 VARSAM∗ DV ∅ ∅ IMS 45 94 53
VARSAM∗ DV ∅ L ICIMS 27 44 37 VARSAM1 VEC ∅ recI2 DMS 47 114 147
VARSAM∗ DV ∅ ∅ ICIMS 27 46 37 VARSAM1 DV ∅ recI2 DMS 48 61 133
VARSAM1 DV ∅ I2 DMSh 29 5 22 VARSAM1 VAR ∅ L IMS 52 84 60
VARSAM1 VAR ∅ I2 IMS 30 7 2 AMIMS DV ∅ L IMS 53 78 47
VARSAM∗ VEC 1 L DMS 31 169 84 AMIMS DV ∅ ∅ IMS 53 80 47
ARIMA (2, 1) ∅ ∅ FIMS 32 95 45 VARSAM1 VAR ∅ ∅ IMS 55 143 230
VARSAM∗ DV 1 I2 DMS 33 59 67 AMIMS DV ∅ L ICIMS 56 64 40
VARSAM1 DV ∅ I2 ICIMS 34 20 9 AMIMS DV ∅ ∅ ICIMS 56 66 40
VARSAM1 DV 1 L DMS 35 70 94 VARSAM1 DV ∅ L DMSh 58 37 57
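
As an illustration of how such a league table could be built, the sketch below ranks techniques by RMSFE at one horizon. It assumes that techniques with identical RMSFE share the lowest available rank, which is how tied entries such as 14, 14, 16 appear in Table 7; the function name and the input values are hypothetical.

```python
def league_table(rmsfe_by_technique):
    """Rank techniques by RMSFE (1 = most accurate).

    Techniques with exactly equal RMSFE share the same rank, and the next
    distinct value takes the rank given by its position in the ordering.
    """
    ordered = sorted(rmsfe_by_technique.items(), key=lambda kv: kv[1])
    ranks = {}
    for position, (name, value) in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and value == previous[1]:
            ranks[name] = ranks[previous[0]]  # tie: reuse the earlier rank
        else:
            ranks[name] = position
    return ranks


# Hypothetical RMSFE values at a single horizon, for illustration only.
example = {"VEC DMS": 0.011, "DV L DMS": 0.014, "DV no-trend DMS": 0.014, "VAR IMS": 0.020}
print(league_table(example))
```

Applying the same function separately at h = 2, 4 and 8 would give the three rank columns reported for each technique.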

4.3.1. Models
Comparing the three different VAR models that we considered, none is always preferred. However, when using the I2 trend, AMIMS tends to come first in the tables for h ≥ 4, whereas VARSAM1 then VARSAM∗ are more accurate for shorter (h < 4) horizons,7 and whether M1 or M∗ is used does not matter much. When excluding I2, AMIMS performs worse than the other two VAR models. This seems to indicate that I2 is key for the economic model that was obtained by Aron and Muellbauer. Indeed, we see from the left columns of Table 8 that their univariate AM technique is the most accurate at horizon h = 4, but that there is no need here for dummy variables. It also forecasts well at horizons h > 2, provided that I2 is used. When excluding this type of trend, its performance unfortunately drops in the tables at short horizons, but retains some good medium-term (here, over eight quarters) forecasting power, since the two versions

7 When writing h < 4, we also comment on the unreported tables for h = 3. The same holds for h > 2 and the unreported table with h = 5.

Table 8
RMSFE Rankings, ranked for h = 4.

Overall Excluding the I2 trend


Model Dum Trend Method h = 2 h = 4 h = 8 Model Dum Trend Method h = 2 h = 4 h = 8
AM ∅ I2 DMSh 90 1 4 VARSAM1 VEC ∅ recI2 ICIMS 431 33 561
AMIMS DV ∅ I2 DMSh 9 2 19 VARSAM1 DV ∅ L DMSh 58 37 57
AMIMS VAR ∅ I2 IMS 11 3 1 VARSAM1 DV ∅ recI2 DMSh 436 38 82
VARSAM∗ VAR ∅ I2 IMS 5 4 3 VARSAM1 DV ∅ ∅ DMSh 58 39 57
VARSAM1 DV ∅ I2 DMSh 29 5 22 VARSAM1 DV ∅ L ICIMS 38 40 34
VARSAM∗ DV ∅ I2 DMSh 50 6 23 VARSAM1 DV ∅ recI2 ICIMS 366 41 521
VARSAM1 VAR ∅ I2 IMS 30 7 2 VARSAM1 DV ∅ ∅ ICIMS 38 42 34
VARSAM1 VEC ∅ I2 DMSh 191 8 36 VARSAM∗ DV ∅ L ICIMS 27 44 37
AMIMS DV ∅ I2 ICDMSh 177 9 46 VARSAM∗ DV ∅ recI2 ICIMS 367 45 522
VARSAM∗ VEC ∅ I2 DMSh 179 10 39 VARSAM∗ DV ∅ ∅ ICIMS 27 46 37
VARSAM∗ VAR ∅ I2 DMS 210 11 443 VARSAM∗ VEC ∅ recI2 ICIMS 460 47 554
AMIMS VAR ∅ I2 DMS 207 12 17 VARSAM1 VEC ∅ L ICIMS 68 48 30
VARSAM1 DV ∅ I2 ICDMSh 315 13 91 AMIMS VAR ∅ recI2 ICDMS 827 55 780
AMIMS DV ∅ I2 IMS 8 14 5 AMIMS VAR ∅ ∅ ICDMS 295 56 55
VARSAM1 DV ∅ I2 IMS 6 15 7 VARSAM1 DV ∅ L DMS 14 60 92
VARSAM∗ DV ∅ I2 IMS 7 16 6 VARSAM1 DV ∅ recI2 DMS 48 61 133
VARSAM∗ DV 1 I2 DMSh 390 17 486 VARSAM1 DV ∅ ∅ DMS 14 62 92
VARSAM∗ VAR 1 I2 DMS 318 18 482 AMIMS DV ∅ L ICIMS 56 64 40
VARSAM∗ DV ∅ I2 ICIMS 23 19 10 AMIMS DV ∅ recI2 ICIMS 398 65 524
VARSAM1 DV ∅ I2 ICIMS 34 20 9 AMIMS DV ∅ ∅ ICIMS 56 66 40
VARSAM∗ DV ∅ I2 ICDMSh 321 21 49 VARSAM1 DV 1 L DMS 35 70 94
VARSAM∗ VEC ∅ I2 ICDMSh 362 22 78 VARSAM1 DV 1 recI2 DMS 79 71 140
VARSAM1 VEC ∅ I2 ICDMSh 355 23 111 VARSAM1 DV 1 ∅ DMS 35 72 94
VARSAM1 DV 1 I2 DMSh 383 24 444 VARSAM∗ DV ∅ L DMS 25 75 61
VARSAM∗ VEC ∅ I2 IMS 40 25 8 VARSAM∗ DV ∅ recI2 DMS 62 76 142
AMIMS VAR ∅ I2 ICDMS 120 26 426 VARSAM∗ DV ∅ ∅ DMS 25 77 61
AMIMS DV 1 I2 ICDMSh 523 27 432 AMIMS DV ∅ L IMS 53 78 47
VARSAM∗ VEC 1 I2 DMSh 358 28 489 AMIMS DV ∅ recI2 IMS 446 79 535
VARSAM∗ VAR ∅ I2 ICDMS 93 29 519 AMIMS DV ∅ ∅ IMS 53 80 47
AMIMS DV ∅ I2 ICIMS 51 30 12 ARIMA (1, 1) ∅ ∅ FIMS 21 81 43
VARSAM∗ VEC 2 I2 DMSh 644 31 901 VARSAM1 VAR ∅ L IMS 52 84 60
VARSAM∗ DV 1 I2 ICDMSh 549 32 497 VARSAM1 VEC 1 recI2 ICIMS 521 85 571
VARSAM1 VEC ∅ recI2 ICIMS 431 33 561 VARSAM1 DV ∅ L IMS 42 86 50
VARSAM1 DV 1 I2 ICDMSh 536 34 465 VARSAM1 DV ∅ recI2 IMS 415 87 531
VARSAM∗ VEC ∅ I2 ICIMS 112 35 14 VARSAM1 DV ∅ ∅ IMS 42 88 50

with no trend, or a recursively estimated I2, are the best in Table 9.
Univariate ARIMA models also feature in the tables, though more so when I2 is excluded. Also, the autoregressive lag order is always at least as large as the moving average order, and only once – at h = 2 when the I2 trend is excluded – is a model with endogenously chosen lag orders (ARIMA(p, d, q)) present in the rankings. As the literature on ARIMA forecasting has shown, these models are therefore good, yet never best here. We will comment on them further in the next subsection.

4.3.2. Best techniques

First, we notice that VEC models are the best at short horizons h = 2, with or without I2, but that their relative performance worsens as h increases. This corroborates the risks posed by equilibrium-correction in a non-stationary world, see Hendry (2006). The fact that VEC models remain in the tables even for rankings over longer horizons is in line with the noncausal single or double difference models – Eqs. (2a)–(2c) – being absent: Hendry (2006) shows that both types should not be as accurate, and hence

Table 9
RMSFE Rankings, ranked for h = 8.

Overall Excluding the I2 trend


Model Dum Trend Method h = 2 h = 4 h = 8 Model Dum Trend Method h=2 h=4 h=8
AMIMS VAR ∅ I2 IMS 11 3 1 AM ∅ ∅ DMSh 88 250 25
VARSAM1 VAR ∅ I2 IMS 30 7 2 AM ∅ recI2 DMSh 299 249 28
VARSAM∗ VAR ∅ I2 IMS 5 4 3 VARSAM1 VEC ∅ L ICIMS 68 48 30
AM ∅ I2 DMSh 90 1 4 ARIMA (2, 0) ∅ ∅ ICFIMS 359 273 32
AMIMS DV ∅ I2 IMS 8 14 5 ARIMA (1, 0) ∅ ∅ ICFIMS 372 298 33
VARSAM∗ DV ∅ I2 IMS 7 16 6 VARSAM1 DV ∅ L ICIMS 38 40 34
VARSAM1 DV ∅ I2 IMS 6 15 7 VARSAM1 DV ∅ ∅ ICIMS 38 42 34
VARSAM∗ VEC ∅ I2 IMS 40 25 8 VARSAM∗ DV ∅ L ICIMS 27 44 37
VARSAM1 DV ∅ I2 ICIMS 34 20 9 VARSAM∗ DV ∅ ∅ ICIMS 27 46 37
VARSAM∗ DV ∅ I2 ICIMS 23 19 10 AMIMS DV ∅ L ICIMS 56 64 40
VARSAM1 VEC ∅ I2 IMS 84 49 11 AMIMS DV ∅ ∅ ICIMS 56 66 40
AMIMS DV ∅ I2 ICIMS 51 30 12 VARSAM1 VEC 1 L ICIMS 187 132 42
VARSAM1 VEC ∅ I2 ICIMS 96 36 13 ARIMA (1, 1) ∅ ∅ FIMS 21 81 43
VARSAM∗ VEC ∅ I2 ICIMS 112 35 14 VARSAM1 VEC ∅ L IMS 110 166 44
VARSAM1 VEC 1 I2 IMS 118 50 15 ARIMA (2, 1) ∅ ∅ FIMS 32 95 45
VARSAM1 VEC 1 I2 ICIMS 203 54 16 AMIMS DV ∅ L IMS 53 78 47
AMIMS VAR ∅ I2 DMS 207 12 17 AMIMS DV ∅ ∅ IMS 53 80 47
VARSAM∗ VEC 1 I2 IMS 117 53 18 VARSAM1 DV ∅ L IMS 42 86 50
AMIMS DV ∅ I2 DMSh 9 2 19 VARSAM1 DV ∅ ∅ IMS 42 88 50
VARSAM∗ VEC 1 I2 ICIMS 242 83 20 VARSAM∗ VEC ∅ L ICIMS 197 108 52
VARSAM∗ DV 1 I2 IMS 86 67 21 VARSAM∗ DV ∅ L IMS 45 92 53
VARSAM1 DV ∅ I2 DMSh 29 5 22 VARSAM∗ DV ∅ ∅ IMS 45 94 53
VARSAM∗ DV ∅ I2 DMSh 50 6 23 AMIMS VAR ∅ ∅ ICDMS 295 56 55
VARSAM∗ DV 1 I2 ICIMS 105 74 24 VARSAM1 VEC 1 L IMS 219 231 56
AM ∅ ∅ DMSh 88 250 25 VARSAM1 DV ∅ L DMSh 58 37 57
VARSAM1 DV 1 I2 IMS 89 89 26 VARSAM1 DV ∅ ∅ DMSh 58 39 57
AMIMS DV 1 I2 IMS 67 63 27 ARIMA (2, 0) ∅ ∅ FIMS 24 109 59
AM ∅ recI2 DMSh 299 249 28 VARSAM1 VAR ∅ L IMS 52 84 60
VARSAM1 DV 1 I2 ICIMS 181 110 29 VARSAM∗ DV ∅ L DMS 25 75 61
VARSAM1 VEC ∅ L ICIMS 68 48 30 VARSAM∗ DV ∅ ∅ DMS 25 77 61
AMIMS DV 1 I2 ICIMS 178 137 31 VARSAM∗ DV 1 L DMS 60 96 63
ARIMA (2, 0) ∅ ∅ FIMS 359 273 32 VARSAM∗ DV 1 ∅ DMS 60 98 63
ARIMA (1, 0) ∅ ∅ FIMS 372 298 33 AMIMS DV ∅ L DMS 18 160 68
VARSAM1 DV ∅ L ICIMS 38 40 34 AMIMS DV ∅ ∅ DMS 18 162 68
VARSAM1 DV ∅ ∅ ICIMS 38 42 34 AMIMS DV 1 L DMS 65 189 70

we can be confident that any location shifts that the South African economy may have experienced over the sample are not too detrimental to short horizon forecasting, although they matter for larger h.

In contrast, level forecasting, such as with VAR techniques, performs best at h = 8 in Table 9, in the left-hand columns, and, interestingly, we notice that AMIMS, VARSAM1 and VARSAM∗ are then the best performing; they are also generally good at shorter horizons. Models in differences that exclude cointegrating relationships, DV, also appear in the tables, never as the most accurate, but always among the best techniques.
How do these models interact with the choice of iterated or multi-step forecasting, with or without intercept correction? When VEC is most accurate, it is using either DMS (or occasionally DMSh) over short horizons, or ICIMS for larger h. At short horizons, ICDMSh also appears later on in the table when the DMSh version of VEC is best (see Table 8, overall ranking). Interestingly, when forecasting the level (VAR) performs well, it is mostly associated with IMS (although DMS is present in Table 8, overall ranking)
when the I2 trend is used, but with ICIMS when it is not. This implies that ICIMS provides a robust way to account for the evolutions of the trend that occur immediately following the forecast origin, especially at longer horizons.
As for DV models, all combinations occur. However, it emerges that at short horizons, DV IMS is best if combined with I2, while DV DMS is best if not; DMSh is better at intermediate horizons (an annual horizon for quarterly data with residual seasonal patterns); and for h = 8, again, DV IMS is best if combined with I2, but if not, DV ICIMS is preferred.
The above results, which show that methods deemed robust to breaks, namely DMS and IC, feature prominently, are all the more interesting when compared to the univariate models that perform well. Indeed, it is striking that the ARIMA models that appear in the tables are without exception fractionally integrated. Recent work by Perron and Qu (2007) has shown how occasional breaks can yield a pattern akin to long memory. R. Bhansali had already shown in several papers, see e.g. Bhansali and Kokoszka (2002), that DMS could prove useful when dealing with long range dependence, such as that arising from fractional integration; see also Brodsky and Hurvich (1999) and Bos, Franses, and Ooms (2002). The results of our experiment are not fully satisfactory in this respect, since most of the occurrences of ARFIMA models in the tables involve FIMS, and occasionally ICFIMS, but only once FDMS. However, the clear link between breaks and long memory shows promise for further research.

4.3.3. Horizons
Clearly, horizons matter, and in particular as regards the forecasting technique developed by AM. It performs best for its horizon of focus, h = 4, and over longer horizons, but not for h < 4. It is interesting to notice that DMSh, which essentially models seasonal differences, like AM, performs particularly well at horizons that are multiples of 4, i.e. (pluri-)annual horizons. This shows that, although the data we use are seasonally adjusted, there must remain some seasonal effects. It is obvious that in an economy that has experienced many breaks, seasonal components cannot be perfectly estimated.
More generally, it appears that the most accurate methods vary with h. In particular, very short horizons, h = 2, do not rank the same forecasting techniques as those where h > 2, see Table 7: when preferred at h = 2, the techniques perform increasingly worse as h grows. In contrast, most models which are ranked at h = 8 do not fare so badly for h = 2. Hence, good medium term forecast accuracy is a reasonable gauge for shorter-term quality, but the converse does not seem to hold, particularly when short-run techniques involve cointegrating relationships. The latter should therefore not be referred to as long-term equilibria, as implicitly argued by Hendry (2006), since they are subject to intermittent breaks.

4.3.4. Trends and dummies
The I(2) trend, when estimated over the whole sample, incorporates bilateral information at each point in time t, and hence alters the forecasts for the better. Indeed, I2 features most of the time in the left columns of Tables 7–9. When this trend is excluded, the relative performances of the various techniques are altered; IMS in particular fares less well if not combined with intercept correction. Surprisingly, no alternative trend specification dominates. Both the linear trend (L) and no trend at all (∅) feature prominently, and each model appears twice, first with L then without a trend. Hence, linear trends are preferable to no trend, but this does not change the rankings of the models. Only at horizon h = 4 does the recursive I(2) appear consistently, and it then tends to lie between the other two, and provides little gain, except for the very best models.
For each model, either three different sets of dummy variables were used or none. In all of the tables, the models without dummies perform best. The same techniques appear again further down in the tables, but with the dummies DUM92, N and FLIB included. Hence, there does not seem to be an argument favouring their use here.

4.4. Forecast accuracy improvements

We now assess the average improvement in forecast accuracy that is brought by the choice of method. Tables 10–13 record the average ratios of RMSFEs for the methods delineated in the first three columns over that of a reference model, e.g., with respect to recI2 trend in Table 10. We analyze four elements of improvement in turn. Extended tables with h = 3, 5

Table 10
Forecast accuracy comparisons: average RMSFE ratios over the corresponding models with a recursively estimated I(2) trend; the averages are computed over the AMIMS and VARSAM models (VEC is only for the latter).

Model I2 trend Linear trend No trend


h=2 h=4 h=8 h=2 h=4 h=8 h=2 h=4 h=8
VAR IMS 0.853 0.82 0.848 0.988 1.106 1.194 1.104 1.408 1.852
DMS 0.671 0.665 0.764 0.779 0.794 0.882 0.844 0.825 0.923
ICIMS 0.946 0.912 1.006 1.050 1.180 1.443 0.998 0.916 1.041
ICDMS 0.591 0.567 0.699 0.695 0.686 0.876 0.783 0.666 0.711
DV IMS 0.805 0.741 0.613 0.816 0.772 0.690 0.816 0.772 0.690
DMS 0.939 0.941 0.937 0.947 0.962 0.929 0.947 0.962 0.929
DMSh 0.739 0.757 0.928 0.715 0.924 0.974 0.715 0.924 0.974
ICIMS 0.852 0.772 0.630 0.829 0.773 0.688 0.829 0.773 0.688
ICDMS 0.967 0.970 0.978 0.969 0.995 0.996 0.969 0.995 0.996
ICDMSh 0.79 0.734 0.931 0.758 0.867 0.977 0.758 0.867 0.977
VEC IMS 0.837 0.744 0.622 0.856 0.757 0.660 0.906 0.916 0.999
DMS 0.895 0.916 0.889 0.935 0.911 0.886 0.703 0.693 0.911
DMSh 0.786 0.854 0.818 0.761 1.091 0.941 0.959 0.921 1.043
ICIMS 0.863 0.752 0.629 0.832 0.726 0.641 0.639 0.655 0.844
ICDMS 0.961 0.934 0.948 0.970 0.957 0.970 0.814 0.764 0.639
ICDMSh 0.804 0.822 0.848 0.771 1.015 0.968 0.936 0.945 0.935
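
A minimal sketch of how an entry of such a table could be computed is given below. It assumes that each entry is the plain average, across models, of the RMSFE of a technique divided by the RMSFE of its reference counterpart (here, the same technique with a recursively estimated I(2) trend); the function name, the keys and the values are hypothetical.

```python
def average_rmsfe_ratio(rmsfe, reference_rmsfe):
    """Average of RMSFE / reference RMSFE over the models present in both inputs.

    Values below 1 mean that, on average, the technique is more accurate
    than the reference specification.
    """
    common = [model for model in rmsfe if model in reference_rmsfe]
    if not common:
        raise ValueError("no model is present in both inputs")
    return sum(rmsfe[m] / reference_rmsfe[m] for m in common) / len(common)


# Hypothetical RMSFEs at one horizon for VAR IMS with two trend choices.
with_i2 = {"AMIMS": 0.012, "VARSAM1": 0.010, "VARSAM*": 0.011}
with_rec_i2 = {"AMIMS": 0.015, "VARSAM1": 0.012, "VARSAM*": 0.013}
print(average_rmsfe_ratio(with_i2, with_rec_i2))
```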

Table 11
Forecast accuracy comparisons: average RMSFE ratios over the corresponding models without taking cointegration into account.

Model                      Forecast horizon
                           2      4      8
VARSAM1 VEC   IMS        0.942  0.938  0.924
              DMS        0.885  0.989  0.936
              DMSh       0.989  1.013  1.009
              ICIMS      0.941  0.929  0.918
              ICDMS      0.908  0.945  0.923
              ICDMSh     0.962  1.008  0.966
VARSAM∗ VEC   IMS        1.064  1.035  1.005
              DMS        0.957  1.091  1.056
              DMSh       0.972  0.941  1.042
              ICIMS      1.077  1.033  0.999
              ICDMS      0.972  1.029  1.024
              ICDMSh     0.972  0.945  1.024

that also report the MAPE are available from the IJF website.

4.4.1. Trends
When comparing the various models with respect to the choice of trend, as in Table 10, it is clear that the recursively estimated I(2) trend leads to the worst accuracy in most cases. However, its performance improves at longer horizons, in particular when using IMS in the level of the VAR, where recI2 improves accuracy over both the linear trend and no trend.
The main message from Table 10 is that there is an ordering between the trends: I2 is best, followed by L, ∅, and finally recI2. Exceptions to this rule, besides that which we reported above, concern (i) IC methods in VAR level forecasting, where at longer horizons it is better not to include a trend than to use L; and (ii) all methods, bar IMS, with the model in differences without (DV) or with cointegration (VEC, except for DMSh) at short horizons: methods including an I(2) trend, no matter how it is estimated, are not preferable.8
Three recommendations follow: (i) since the I2 trend cannot be estimated ex ante, a modeller should avoid computing it recursively when forecasting with any method that is meant to improve robustness to misspecification. In contrast, (ii) recI2 proves useful when forecasting nonstationary variables in level with IMS, but if intercept correction is used for the nonstationary variables, then any kind of trend should be omitted. Finally, (iii) when dealing with the differenced series, including the differences in the

8 A linear trend and no trend are equivalent for the model estimated in differences without cointegration (DV), since a constant is always included.

Table 12
Forecast accuracy comparisons: average RMSFE ratios over the corresponding models estimated in differences rather than in levels.

Model I2 trend recI2 trend Linear or no trend


h=2 h=4 h=8 h=2 h=4 h=8 h=2 h=4 h=8
AMIMS or VARSAM IMS 0.978 0.955 1.162 0.950 0.907 0.910 1.178 1.388 1.839
DMS 1.352 1.315 1.588 1.903 1.823 1.899 1.616 1.544 1.863
ICIMS 0.862 0.819 1.008 0.769 0.649 0.556 0.951 0.899 1.043
ICDMS 1.118 1.278 1.716 1.821 2.147 2.350 1.387 1.475 1.889

Model Overall ar = 0 ar ≠ 0
h=2 h=4 h=8 h=2 h=4 h=8 h=2 h=4 h=8

ARIMA( p, d, q) IC/no IC IMS 0.919 0.807 0.538


FIMS 0.717 0.593 0.48
DMS 7.84e07 1.60e07 4.49
FDMS 6.16e06 13.646 1.091
ARIMA(ar, ma) no IC IMS 7.820 6.392 4.633 0.718 0.753 0.77
FIMS 2.066 2.307 2.283 0.666 0.669 0.662
DMS 0.528 0.668 0.901 1.357 1.581 1.598
FDMS 0.521 0.569 0.574 0.979 1.062 1.091

ARIMA(ar, ma) with IC IMS 245 1.016 1.036 2.274 2.283 2.125
FIMS 0.833 0.878 0.92 1.148 1.053 0.955
DMS 1.611 1.081 0.766 0.469 0.362 0.214
FDMS 0.754 0.858 0.794 0.526 0.459 0.193
Note: for the ARIMA models, the comparison is with the model imposing d = 0.

stochastic trend reduces accuracy and does not capture the potential breaks in mean growth.

4.4.2. Cointegration
We now turn to the usefulness of cointegration relationships: they were specified using the whole sample, and, despite reasonable constancy of the cointegration vectors, will necessarily be misspecified in subsamples. Table 11 shows three key results. First, neither the gains nor the losses in forecast accuracy are as substantial as in other comparisons: RMSFE ratios vary between 0.88 and 1.077. Second, if cointegration relationships improve accuracy at all, they do so mostly at short forecast horizons for multi-step methods and at longer horizons for iterated methods. The ratios in Table 11 do indeed tend to increase with h for (IC)DMS(h) and decrease with h for (IC)IMS. Third, the differences between VARSAM1 and VARSAM∗ do not permit us to conclude which of VEC or DV seems better overall. However, as we noted that the rank of cointegration may be underestimated in the case of VARSAM∗, the table suggests that including relevant equilibrium correction mechanisms benefits accuracy, and that it is costly to underestimate the number of cointegrating vectors, or, even worse, omit them altogether. Care must be taken with these results: the analysis that we have carried out here is by no means sufficient to obtain a definitive answer, and questions remain as to whether or not equilibrium relationships are useful in forecasting, see Hendry (2006). All we can say is that the forecast failure that they potentially induce may not be as prevalent as was feared: South Africa has experienced numerous breaks over its recent history, and yet cointegration is useful in forecasting.

4.4.3. Differences versus levels
Now for a vexed issue: is it preferable to forecast in differences or in levels? Marcellino et al. (2006) chose to compare forecasts with differenced variables, but we also considered levels in our analysis. Table 12 records the ratios of accuracy statistics for level forecasts to those for the differences. The discrepancy between multivariate models and their univariate counterparts is striking. It appears that DMS forecasting in levels does not compare advantageously with targeting the differences: the latter is more accurate for all models, except in the

Table 13
Forecast accuracy comparisons: average RMSFE ratios over the corresponding IMS models.

Model Forecast horizon


2 4 8
AMIMS and VARSAM VAR DMS 1.266 0.947 0.965
ICDMS 1.709 1.662 1.899
DV DMS 0.834 0.716 0.751
DMSh 1.069 1.050 1.141
ICDMS 1.034 0.839 0.855
ICDMSh 1.236 1.091 2.462
DDVDMS 0.858 0.762 0.622

VEC DMS 0.740 0.715 0.766


DMSh 1.548 1.0145 3.047
ICDMS 0.938 0.797 0.850
ICDMSh 1.297 0.995 2.814
AM dum ≠ 2 DMSh 1.039 0.748 0.779
dum = 2 DMSh 1.129 0.938 1.932
ARIMA( p, d, q) DMS 8.21e07 9.53e06 6.611
DDMS 0.946 0.487 0.783
FDMS 7.85e06 11.31 1.779
ARIMA(ar, ma) ar = 0 DMS 0.094 0.155 0.331
DDMS 2.945 1.583 1.947
FDMS 0.346 0.380 0.473
ARIMA(ar, ma) ar ≠ 0 DMS 1.760 1.749 2.049
DDMS 2.810 3.290 5.981
FDMS 2.004 2.213 2.206
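
To make the techniques compared in Table 13 concrete, the following minimal sketch computes IMS, DMS, their intercept-corrected versions and the differencing devices for a single series. It is an illustration under strong assumptions that are not the paper's setup: a zero-mean scalar AR(1) estimated by least squares without an intercept, and a generalisation of Eqs. (2a)–(2c) from 4 to h steps. All function names are hypothetical.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x, with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def multi_step_forecasts(y, h):
    """Forecasts of y[T+h] made at the last observation y[T].

    psi_hat ** h is the iterated (IMS) coefficient and psi_tilde_h the
    direct (DMS) one, both for an assumed zero-mean scalar AR(1).
    """
    psi_hat = ols_slope(y[:-1], y[1:])      # one-step regression
    psi_tilde_h = ols_slope(y[:-h], y[h:])  # direct h-step regression
    y_T, y_T_minus_h = y[-1], y[-1 - h]
    ims = psi_hat ** h * y_T
    dms = psi_tilde_h * y_T
    return {
        "IMS": ims,
        "DMS": dms,
        # Intercept correction: add the latest in-sample h-step error.
        "ICIMS": ims + (y_T - psi_hat ** h * y_T_minus_h),
        "ICDMS": dms + (y_T - psi_tilde_h * y_T_minus_h),
        # Differencing devices, Eqs. (2a)-(2c) with 4 replaced by h.
        "DV": y_T,
        "DDVIMS": y_T + h * (y_T - y[-2]),
        "DDVDMS": y_T + (y_T - y_T_minus_h),
    }


# Hypothetical series, for illustration only.
series = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(multi_step_forecasts(series, h=2))
```

In this sketch IMS and DMS coincide whenever the direct coefficient equals the h-th power of the one-step coefficient; they differ in finite samples and under misspecification, which is where the comparison in Table 13 becomes informative.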

cases of (i) IMA models, and (ii) ARIMA with IC and nonzero autoregressive lag order. In both cases, the improvements from using levels over differences are substantial, and are similar for integrated and fractionally integrated models. The other instances where levels perform unambiguously better are (iii) for ICIMS, with stochastic trends (I2 or recI2), and in a few cases with a linear or no trend, (iv) IMS with recI2, and (v) ARIMA(p, d, q) models with IMS or FIMS. It should be noted that when using a recursive I(2) trend, and this type of trend only, using levels rather than differences enhances the accuracy of ICIMS, and in most cases of IMS, and that this improvement increases with the horizon.

Two conclusions follow from these results. First, DMS is to be preferred in levels for univariate models (but do not let the Schwarz criterion pick the model in this case!) and in differences for multivariate models (and maybe with cointegration, as noted in the previous subsection).

Second, in multivariate models, ICIMS works best in levels (without a trend, as we saw in Section 4.4.1), but IMS should be used in differences, unless a stochastic trend is included in the model, in which case the levels are preferable. The recommendation is the opposite in univariate models: it is better to use the levels when forecasting with IMS in autoregressive processes, and the differences when using ICIMS, with the exception that levels are always preferable in conjunction with (IC)IMS when lag orders are chosen using the Schwarz criterion.

4.4.4. IMS versus DMS

Finally, we consider the question at the core of this paper, that of IMS versus DMS, as in Table 13. We consider the various versions in turn. First, the standard version of DMS tends to perform better than IMS, and especially so at intermediate horizons (2 < h < 5). The notable exception is ARIMA models with an imposed nonzero autoregressive lag order. The relative accuracy improvement of DMS over IMS is stronger for DV and VEC models than for
VAR. Indeed, as a rule DMS cannot be recommended for the multivariate models in levels. For univariate models, the picture is more mixed: gains are not straightforward for AM, although DMSh marginally improves upon IMS (cumulated dummies, d2, strongly modify the results, but the accuracy deteriorates when they are used). DMS also performs well for IMA models, but not for the other reported cases. However, in ARIMA models, DMS performs appallingly when used for models estimated in differences, unless the Schwarz criterion is used for specification, in which case they perform much better than IMS. Note that fractional DMS is not a good idea: it is better to reserve fractional models for IMS.

It also appears from the table that DMSh does improve upon DMS in general. Also, intercept correction is better used in conjunction with IMS rather than DMS, although both ICDMS and ICDMSh exhibit patterns similar to DMS: they are better at intermediate horizons.

Finally, robust double differencing seems to perform well in its direct multi-step version. However, DDVDMS never appears in the models with the best results in terms of forecast accuracy, and hence it is not clear that its use should be recommended as a rule.

The conclusions that we draw are twofold: DMS is better than IMS in multivariate models in differences or equilibrium-correction forms, and at longer horizons for models in levels. Additional corrections to DMS are not necessary. However, these corrections may benefit IMS and render it more accurate than DMS (see the previous discussions). DMS is also useful in IMA models, but not so much in ARIMA models with fixed lag orders. This confirms the results of Stoica and Nehorai (1989) and Tiao and Xu (1993) for IMA models and Ing (2003) for autoregressive models. As pointed out by a referee, it may be possible that data mismeasurement plays a role here. Measurement errors may indeed induce negative serial correlation, and this could explain why DMS performs well with IMA models.

4.5. Robustness to breaks

We now address the question which underlies the choice of South Africa as a country for the analysis: is robustness to breaks essential to forecast accuracy? To answer this, we contrast the forecast performance of the most accurate techniques with the break intensity, as defined in Section 2.1.

We compute the correlation between the absolute ex ante forecast errors and break intensity, over the whole sample and for each technique.⁹ We then order the estimated correlations according to the overall forecast accuracy of each technique (as reported in the left columns of Tables 7–9). For each of h ∈ {2, 4, 8}, we obtain an ordered sample of 779 correlations. We then consider recursive samples of the n first correlation coefficients (i.e., corresponding to the n most accurate techniques), where n varies from 10 to 500. For each of these recursive samples, Fig. 9 records the quartiles of the distribution of the n correlations, where n is reported on the horizontal axis. The purpose of this exercise is to check whether the accuracy is negatively related to the break intensity, and whether this holds for the best performing techniques and only for them. We use quartiles because some techniques which have low accuracy may also, by chance or not, exhibit negative correlation, but we do not expect this to occur too much as we move to the least accurate methods.

⁹ In computing the correlation, we consider break intensity measured over the 30 observations prior to the value to be forecast (not the forecast origin). Hence, we also allow for breaks occurring over the forecast horizon.

Fig. 9 records these quartiles for rankings at h = 2 and h = 8, while Fig. 10 does the same for h = 4, distinguishing two subperiods. For the choice of these subsamples, we refer to Table 1, so that the two periods exhibit different features: the first covers 1973(4)–1986(4), when the South African economy went through many changes and the breaks were rather frequent; whereas in period 2, from 1986 to 1995, the country was not very involved in the international economy, and the system (i.e., the legal–political environment) did not evolve as fast (which is not to say that the economy did not suffer, say, from the 1992 drought). From 1995 to 1998, after the democratic elections, the economic environment (banking and financial sectors) was relatively stable, and deregulation took place again afterward. The columns on the right of the figures present magnified versions of the left columns, focusing on the 100 most accurate techniques.

From these figures, two remarks can be drawn. First, there is a large contrast between short (h = 2)
Fig. 9. Quartiles of the distribution of coefficients of correlation between the absolute forecast error and the break intensity. The quartiles are
computed for the sample of the n most accurate techniques at that horizon, where n is on the horizontal axis. Dotted lines represent the first and
third quartiles, crosses are used for the median.
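The computation summarised in this caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's code: the function name `recursive_quartiles` is hypothetical, techniques are ranked here by mean squared forecast error (the paper ranks them by the accuracy measures of Tables 7–9), and the break-intensity series of Section 2.1 is taken as a given input aligned with the forecast targets.

```python
import numpy as np

def recursive_quartiles(abs_errors, break_intensity, n_values):
    """Quartiles of error/break-intensity correlations for the n most accurate techniques.

    abs_errors      : array of shape (T, K), absolute forecast errors of K techniques
    break_intensity : array of shape (T,), break intensity aligned with the forecast targets
    n_values        : iterable of sample sizes n (each n <= K)
    """
    abs_errors = np.asarray(abs_errors, dtype=float)
    break_intensity = np.asarray(break_intensity, dtype=float)
    n_techniques = abs_errors.shape[1]

    # Assumption: accuracy is ranked by mean squared forecast error, smallest first.
    msfe = np.mean(abs_errors ** 2, axis=0)
    order = np.argsort(msfe)

    # Correlation between each technique's absolute errors and the break intensity.
    corr = np.array([
        np.corrcoef(abs_errors[:, k], break_intensity)[0, 1]
        for k in range(n_techniques)
    ])
    ordered_corr = corr[order]

    # First quartile, median and third quartile over the n most accurate techniques.
    return {n: np.percentile(ordered_corr[:n], [25, 50, 75]) for n in n_values}
```

Calling it with `n_values=range(10, 501)` on a 779-column error matrix would give one triple of quartiles per point on the horizontal axis of the figure.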

and long horizons (h = 8). Indeed, robustness to breaks does not seem to matter for h = 2, whereas we observe a clear pattern at h = 8: the 80 most accurate forecasting techniques do better in the presence of breaks, but this feature disappears when considering the remaining methods. Similar patterns arise at h = 4 on either subperiod for the best 40 techniques, although this is clearer in the second subperiod.

From the figures, we can infer that, at longer horizons, being accurate in the presence of breaks is key to forecast accuracy. It does not seem to matter at short horizons. This is in line with the discussions above, namely that equilibrium correction is useful at short horizons (Hendry, 2006) and that both DMS and ICIMS improve upon IMS at long horizons.

5. Conclusions

The purpose of our analysis was to observe, in an empirical exercise, whether direct multi-step estimation can improve the accuracy of forecasts, and if so, when it can. We knew from theoretical results obtained by, inter alia, Peña (1994) and Schorfheide (2002), that amongst the conditions beneficial to DMS are those of model misspecification and non-constancy of the DGP. Given that Aron and Muellbauer (2002) have derived an equation for forecasting the South African GDP which uses direct multi-step estimation, and that the national economy of this country has experienced several regime changes and extraneous shocks, we decided to build our forecast accuracy comparison drawing on this research. The strategy on which we settled was to derive 779 models and to record measures of their corresponding historical ex-ante forecast accuracy.

The results are, essentially, that the direct equation derived by Aron and Muellbauer (2002) has impressive forecasting power at horizons at least as large as that for which it was designed, viz. four quarters ahead. It is therefore worth the
Fig. 10. Quartiles of the distribution of coefficients of correlation between the absolute forecast error and the break intensity. The quartiles are
computed for the sample of the n most accurate techniques at that horizon, where n is on the horizontal axis. Dotted lines represent the first and
third quartiles, crosses are used for the median. Correlations are computed over two subsamples: pre-1987 (period 1) and post-1986 (period 2).

effort of designing a parsimonious economic equation aimed at DMS forecasting. However, as these authors specified their model using the whole sample, it is only reasonable that it should have good forecasting properties: it is difficult to judge its performance further since this model does not forecast so well at all horizons. At short horizons, we found that equilibrium-correction models rank among the most accurate when combined with DMS. However, the best techniques at h = 2 are no guide for their performances at longer horizons (though the converse holds). Indeed, it is multivariate models for the differences of the variables that perform best, with DMS at intermediate horizons, and with IMS combined with intercept correction at large horizons. It is hence important to study multi-step forecasting in multivariate models further, as in Jorda and Marcellino (2007), and more needs to be discovered of the properties of IC and IMS. Univariate ARFIMA models also rank high. In accordance with these results, we showed that robustness to breaks was achieved by these methods at intermediate and long horizons, and that this seems to be a significant feature for forecast accuracy.

Acknowledgements

I am grateful to David Hendry, Janine Aron, John Muellbauer, Michael Clements and three anonymous referees for helpful suggestions.

References

Aron, J., & Muellbauer, J. (2002). Interest rate effects on output: Evidence from a GDP forecasting model for South Africa. IMF Staff Papers, 49, 185–213.
Bai, J., & Perron, P. (2003). Computation and analysis of multiple structural change models. Journal of Applied Econometrics, 18, 1–22.
Bhansali, R. J. (1996). Asymptotically efficient autoregressive model selection for multistep prediction. Annals of the Institute of Statistical Mathematics, 48, 577–602.
Bhansali, R. J. (1997). Direct autoregressive predictors for multistep prediction: Order selection and performance relative to the plug-in predictors. Statistica Sinica, 7, 425–449.
Bhansali, R. J., & Kokoszka, P. (2002). Computation of the forecast coefficients for multistep prediction of long-range dependent time series. International Journal of Forecasting, 18, 181–206.
Bos, C. S., Franses, P. H., & Ooms, M. (2002). Inflation, forecast intervals and long memory regression models. International Journal of Forecasting, 18(2), 243–264.
Brodsky, J., & Hurvich, C. M. (1999). Multi-step forecasting for long-memory processes. Journal of Forecasting, 18, 59–75.
Chevillon, G. (2007). Direct multi-step estimation and forecasting. Journal of Economic Surveys, 21(4), 746–785.
Chevillon, G. (2008). Multi-step forecasting in the presence of weak trends. Mimeo. ESSEC Business School.
Chevillon, G., & Hendry, D. F. (2005). Non-parametric direct multi-step estimation for forecasting economic processes. International Journal of Forecasting, 21, 201–218.
Chow, G. C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591–605.
Clark, T. E., & McCracken, M. W. (2005). Evaluating direct multistep forecasts. Econometric Reviews, 24(4), 369–404.
Clements, M. P., & Hendry, D. F. (1996). Multi-step estimation for forecasting. Oxford Bulletin of Economics and Statistics, 58, 657–683.
Clements, M. P., & Hendry, D. F. (1999). Forecasting non-stationary economic time series. Cambridge, MA: The MIT Press.
Cox, D. R. (1961). Prediction by exponentially weighted moving averages and related methods. Journal of the Royal Statistical Society, B 23, 414–422.
Doornik, J. A., & Hansen, H. (1994). An omnibus test for univariate and multivariate normality. Discussion paper. Nuffield College.
Doornik, J., & Ooms, M. (2003). Computational aspects of maximum likelihood estimation of autoregressive fractionally integrated moving average models. Computational Statistics and Data Analysis, 42, 333–348.
Eklund, J., & Karlsson, S. (2005). Forecast combination and model averaging using predictive measures. Discussion paper no. 5268. CEPR.
Findley, D. F. (1983). On the use of multiple models for multi-period forecasting. Proceedings of Business and Economic Statistics, American Statistical Association, 528–531.
Findley, D. F., Potscher, B. M., & Wei, C. Z. (2004). Modeling of time series arrays by multistep prediction or likelihood methods. Journal of Econometrics, 118, 151–187.
Giacomini, R., & White, H. (2006). Tests of conditional predictive ability. Econometrica, 74(6), 1545–1578.
Gonzalo, J. (1994). Five alternative methods of estimating long-run equilibrium relationships. Journal of Econometrics, 60, 203–233.
Harvey, A. C. (1993). Time series models (2nd ed.). Hemel Hempstead: Harvester Wheatsheaf.
Harvey, A. C., & Jaeger, A. (1993). Detrending, stylized facts and the business cycle. Journal of Applied Econometrics, 8, 231–247.
Haug, A. A., & Smith, C. (2007). Local linear impulse responses for a small open economy. Discussion paper 2007/09. Reserve Bank of New Zealand.
Haywood, J., & Tunnicliffe-Wilson, G. (1997). Fitting time series model by minimizing multistep-ahead errors: A frequency domain approach. Journal of the Royal Statistical Society, Series B, 59, 237–254.
Haywood, J., & Tunnicliffe-Wilson, G. (2004). A test for improved multi-step forecasting. Mimeo. Victoria University of Wellington.
Hendry, D. F. (2006). Robustifying forecasts from equilibrium-correction systems. Journal of Econometrics, 135, 399–426.
Ing, C.-K. (2003). Multistep prediction in autoregressive processes. Econometric Theory, 19, 254–279.
Ing, C.-K. (2004). Selecting optimal multistep predictors for autoregressive process of unknown order. The Annals of Statistics, 32, 693–722.
Johansen, S. (1995). Likelihood-based inference in cointegrated vector auto-regressive models. Oxford: Oxford University Press.
Johnston, H. N. (1974). A note on the estimation and prediction inefficiency of dynamic estimators. International Economic Review, 15, 251–255.
Jonsson, G. (2001). Inflation, money demand and purchasing power parity in South Africa. IMF Staff Papers, 42, 243–265.
Jorda, O., & Marcellino, M. (2007). Path forecast evaluation. Mimeo. UC Davis.
Kang, I.-B. (2003). Multi-period forecasting using different models for different horizons: An application to US economic time series data. International Journal of Forecasting, 19, 387–400.
Klein, L. R. (1971). An essay on the theory of economic prediction. Chicago, IL: Markham.
Koopman, S. J., Harvey, A. C., Doornik, J. A., & Shephard, N. (2000). STAMP: Structural time series analyser, modeller and predictor. London: Timberlake Consultant Press.
Lin, J., & Tsay, R. (2005). Comparisons of forecasting methods with many predictors. Mimeo. Academia Sinica.
Lin, J. L., & Tsay, R. S. (1996). Co-integration constraint and forecasting: An empirical examination. Journal of Applied Econometrics, 11, 519–538.
Liu, S. I. (1996). Model selection for multiperiod forecasts. Biometrika, 83(4), 861–873.
Marcellino, M., Stock, J. H., & Watson, M. W. (2006). A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series. Journal of Econometrics, 135, 499–526.
Peña, D. (1994). Discussion: Second-generation time-series model, A comment. Journal of Forecasting, 13, 133–140.
Perron, P., & Qu, Z. (2007). An analytical evaluation of the log-periodogram estimate in the presence of level shifts. Mimeo. Boston University.
Pesaran, M. H., & Timmermann, A. (2005). Small sample properties of forecasts from autoregressive models under structural breaks. Journal of Econometrics, 129, 183–217.
Proietti, T. (2008). Direct and iterated multistep AR methods for difference stationary processes. Mimeo. University of Rome, Tor Vergata.
Schorfheide, F. (2002). VAR forecasting under local misspecification: Loss function estimation and model selection. Working paper. Economics department, University of Pennsylvania.
Schumacher, C., & Breitung, J. (2008). Real-time forecasting of German GDP based on a large factor model with monthly and quarterly data. International Journal of Forecasting, 24(3), 386–398.
Stoica, P., & Nehorai, A. (1989). On multi-step prediction errors methods for time series models. Journal of Forecasting, 13, 109–131.
Tiao, G. C., & Tsay, R. S. (1994). Some advances in non-linear and adaptive modelling in time-series analysis. Journal of Forecasting, 13, 109–131.
Tiao, G. C., & Xu, D. (1993). Robustness of maximum likelihood estimates for multi-step predictions: The exponential smoothing case. Biometrika, 80, 623–641.
Tsay, R. S. (1993). Comment: Adaptive forecasting. Journal of Business and Economic Statistics, 11(2), 140–142.
Weiss, A. A. (1991). Multi-step estimation and forecasting in dynamic models. Journal of Econometrics, 48, 135–149.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817–838.
